mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2026-07-20 19:04:54 +00:00

98dc191194 Fix: Lock AI index during reading and don't index documents many times during a bulk update (#12899)

* Fix: Move LLM index lock outside index dir and skip per-doc tasks on bulk update

Two concurrency bugs from #12893:

[P1] Lock file lived inside LLM_INDEX_DIR. A rebuild calls
shutil.rmtree(LLM_INDEX_DIR), deleting the lock while a worker still
held it. A second worker then acquired a fresh lock on the new path and
ran concurrently, defeating serialisation. Move the lock to
DATA_DIR/locks/llm_index.lock (a new settings constant LLM_INDEX_LOCK)
so rmtree cannot touch it. The locks/ dir is created at settings load
time, matching the existing pattern for LOGGING_DIR.
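The failure mode behind [P1] can be reproduced with plain files. The following is a minimal sketch, assuming a throwaway temp directory; the names mirror the commit message but are stand-ins, not the project's actual settings module, and `index.lock` is a hypothetical name for the old in-directory lock.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical layout for illustration only.
DATA_DIR = Path(tempfile.mkdtemp())
LLM_INDEX_DIR = DATA_DIR / "llm_index"
LLM_INDEX_LOCK = DATA_DIR / "locks" / "llm_index.lock"


def rebuild_index() -> None:
    """Simulate a rebuild: wipe the index directory and recreate it."""
    shutil.rmtree(LLM_INDEX_DIR, ignore_errors=True)
    LLM_INDEX_DIR.mkdir(parents=True)


# Create both directories up front, as the commit does for locks/ at
# settings load time.
LLM_INDEX_LOCK.parent.mkdir(parents=True, exist_ok=True)
LLM_INDEX_DIR.mkdir(parents=True, exist_ok=True)

old_style_lock = LLM_INDEX_DIR / "index.lock"  # lock inside the index dir
old_style_lock.touch()
LLM_INDEX_LOCK.touch()  # lock in a sibling directory

rebuild_index()

lock_inside_survived = old_style_lock.exists()   # removed by rmtree
lock_outside_survived = LLM_INDEX_LOCK.exists()  # untouched by rmtree
```

Only the lock outside the index directory is still on disk after the rebuild, which is why a second worker can no longer acquire a "fresh" lock at the same path.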

[P2] document_updated was connected to add_or_update_document_in_llm_index
in apps.py. bulk_update_documents() emits document_updated for every
document in the batch, queuing N per-document LLM tasks, and then also
calls update_llm_index(rebuild=False) once at the end. Pass
skip_ai_index=True when sending document_updated from the bulk path so
the handler skips the per-document enqueue; the existing batch call at
the end of bulk_update_documents is the only LLM update for that path.
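The guard described in [P2] can be sketched without Django. In this simplified stand-in, `send_document_updated`, `enqueued`, and `batch_updates` are hypothetical helpers that replace the real signal machinery and task queue; only the `skip_ai_index` kwarg and the function names come from the commit message.

```python
from typing import Any, Callable

enqueued: list[int] = []             # stands in for per-document LLM tasks
batch_updates: list[list[int]] = []  # stands in for update_llm_index calls
_receivers: list[Callable[..., None]] = []


def send_document_updated(**kwargs: Any) -> None:
    """Tiny stand-in for a signal send: call every connected receiver."""
    for receiver in _receivers:
        receiver(**kwargs)


def add_or_update_document_in_llm_index(
    document_id: int,
    skip_ai_index: bool = False,
    **kwargs: Any,
) -> None:
    # The guard: the bulk path sets skip_ai_index=True, so nothing is queued.
    if skip_ai_index:
        return
    enqueued.append(document_id)


_receivers.append(add_or_update_document_in_llm_index)


def bulk_update_documents(document_ids: list[int]) -> None:
    for doc_id in document_ids:
        send_document_updated(document_id=doc_id, skip_ai_index=True)
    # One batch update at the end instead of N per-document tasks.
    batch_updates.append(list(document_ids))


send_document_updated(document_id=1)  # a single edit still enqueues
bulk_update_documents([2, 3, 4])      # the bulk path enqueues nothing per doc
```

After both calls, `enqueued` holds only document 1 and `batch_updates` holds one entry covering documents 2 to 4.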

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: ghost vectors leave KeyError-prone nodes_dict entries after deletion

docstore.delete_document() removes a node from the docstore but leaves its
entry in index_struct.nodes_dict (the FAISS positional-id to node-UUID map).
A subsequent similarity query resolves the ghost position to the deleted UUID,
finds nothing in fetched_nodes_by_id, and raises KeyError inside
_insert_fetched_nodes_into_query_result.

Purge stale nodes_dict entries after each docstore deletion and re-sync the
mutated index_struct into the kvstore so persist() writes the updated mapping.
Dead FAISS vectors remain in the flat index until the next full rebuild
(IndexFlatL2 is append-only); add a try/except KeyError around
retriever.retrieve() as a defensive fallback for any residual ghost positions.
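The ghost-entry problem and its purge can be illustrated with plain dictionaries. This is a sketch under stated assumptions: `purge_stale_positions` and `resolve_hits` are hypothetical helpers, and the dicts only imitate the position-to-node-id map and the docstore; none of this is the vector library's real API.

```python
def purge_stale_positions(
    nodes_dict: dict[str, str],
    deleted_node_ids: set[str],
) -> dict[str, str]:
    """Drop position -> node-id entries whose node was deleted."""
    return {
        pos: node_id
        for pos, node_id in nodes_dict.items()
        if node_id not in deleted_node_ids
    }


def resolve_hits(
    positions: list[str],
    nodes_dict: dict[str, str],
    docstore: dict[str, str],
) -> list[str]:
    """Map vector positions to node text, skipping residual ghosts."""
    results = []
    for pos in positions:
        node_id = nodes_dict.get(pos)
        if node_id is None or node_id not in docstore:
            continue  # dead vector still in the flat index; ignore it
        results.append(docstore[node_id])
    return results


nodes_dict = {"0": "node-a", "1": "node-b", "2": "node-c"}
docstore = {"node-a": "text a", "node-b": "text b", "node-c": "text c"}

del docstore["node-b"]  # deletion that leaves a ghost entry at position "1"

try:
    docstore[nodes_dict["1"]]
    ghost_raises = False
except KeyError:
    ghost_raises = True  # the failure described above

nodes_dict = purge_stale_positions(nodes_dict, {"node-b"})
hits = resolve_hits(["0", "1", "2"], nodes_dict, docstore)
```

The vector at position "1" is still returned by the similarity search in this model, but it no longer resolves to a deleted node.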

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: acquire index lock in query_similar_documents

query_similar_documents() loaded the index and ran the FAISS retriever
without holding the file lock. All write paths (update_llm_index,
llm_index_add_or_update_document, llm_index_remove_document) hold
FileLock(_index_lock_path()), so a concurrent rebuild calling
shutil.rmtree(LLM_INDEX_DIR) while a read is mid-load produces an IOError
or corrupt partial state.

Wrap the load_or_build_index() call and all subsequent retriever work inside
FileLock. The early-return guards (vector_store_file_exists check, empty
allowed_document_ids) remain outside the lock; the DB query for the final
result set also stays outside.
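The shape of the read path after this change might look roughly like the sketch below. It uses `threading.Lock` as an in-process stand-in for the cross-process file lock, and `query_similar`, `load_index`, and `retrieve` are hypothetical names chosen for illustration.

```python
import threading
from typing import Callable

index_lock = threading.Lock()  # stand-in for the shared index file lock


def query_similar(
    index_exists: bool,
    allowed_ids: set[int],
    load_index: Callable[[], list[int]],
    retrieve: Callable[[list[int]], list[int]],
) -> list[int]:
    # Cheap early-return guards stay outside the lock.
    if not index_exists or not allowed_ids:
        return []
    # Loading and retrieval share the lock with every write path.
    with index_lock:
        index = load_index()
        hits = retrieve(index)
    # Final filtering (the DB query in the real code) runs after release.
    return [hit for hit in hits if hit in allowed_ids]


held_during_load: list[bool] = []


def load_index() -> list[int]:
    held_during_load.append(index_lock.locked())
    return [1, 2, 3]


result = query_similar(True, {2, 3}, load_index, lambda index: index)
skipped = query_similar(False, {2, 3}, load_index, lambda index: index)
```

Keeping the guards and the final filtering outside the critical section limits how long a reader can block a pending rebuild.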

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: skip LLM index enqueue on document_updated during version addition

When a document is consumed as a new version of an existing document, the
consumer fires document_consumption_finished (which triggers
add_or_update_document_in_llm_index) and then document_updated for the root
document. Both signals are connected to the same handler, so the root document
was enqueued for LLM indexing twice per version-addition event.

Pass skip_ai_index=True on the consumer's version-addition document_updated
send so the handler's existing guard suppresses the duplicate enqueue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Test: bulk_update_documents must not enqueue per-doc LLM tasks

With AI enabled, bulk_update_documents() sends document_updated for every
document in the batch. The skip_ai_index=True kwarg (added in the P2 fix)
prevents add_or_update_document_in_llm_index from enqueuing a per-document
task for each one. Only the single update_llm_index call at the end should run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Debug level log sure

* Update src/paperless_ai/indexing.py

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>

* Apply suggestion from @shamoon

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>

2026-06-02 10:46:29 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| .devcontainer | Breaking: Remove pybzar as a barcode reader (#12065) | 2026-02-13 08:14:00 -08:00 |
| .github | Fix: Handle dash or plus operators in search queries (#12734) | 2026-05-07 17:26:11 +00:00 |
| docker | Security: Improve overall security in a few ways (#12501) | 2026-04-02 15:30:26 -07:00 |
| docs | Enhancement: AI LLM chunk size and context window config (#12891) | 2026-06-01 17:56:21 +00:00 |
| scripts | Fix: Changes bare metal webserver to use uvloop (#12626) | 2026-04-22 17:34:25 -07:00 |
| src | Fix: Lock AI index during reading and don't index documents many times during a bulk update (#12899) | 2026-06-02 10:46:29 -07:00 |
| src-ui | Enhancement: AI LLM chunk size and context window config (#12891) | 2026-06-01 17:56:21 +00:00 |
| .codecov.yml | Breaking: Drop support for Python 3.10 (#12234) | 2026-03-04 15:03:33 -08:00 |
| .dockerignore | Chore: Enable mypy checking in CI (#11991) | 2026-02-03 16:02:33 -08:00 |
| .editorconfig | Breaking: Refactor advanced database settings to allow more user configuration (#12165) | 2026-02-27 14:37:26 -08:00 |
| .env | Chore: Remove unneeded .env entry, revert crowdin action rm, reduce frequency | 2023-12-02 08:24:17 -08:00 |
| .gitignore | Chore: Add generic type params and update our baselines (#12566) | 2026-04-13 14:12:59 -07:00 |
| .hadolint.yml | Configure Hadolint in a single location for both hooks and CI | 2022-07-19 13:54:33 -07:00 |
| .mypy-baseline.txt | Chore: Update typing and baselines again (#12641) | 2026-04-28 09:28:05 -07:00 |
| .pre-commit-config.yaml | Chore(deps): Bump the pre-commit-dependencies group with 4 updates (#12694) | 2026-05-02 14:39:32 -07:00 |
| .prettierrc.js | Chore(deps): Bump the pre-commit-dependencies group with 4 updates (#12323) | 2026-03-12 16:29:57 +00:00 |
| .pyrefly-baseline.json | Chore: Update typing and baselines again (#12641) | 2026-04-28 09:28:05 -07:00 |
| .yamlfmt | Chore(deps): Bump bootstrap from 5.3.7 to 5.3.8 in /src-ui (#10740) | 2025-09-03 21:58:53 +00:00 |
| CODE_OF_CONDUCT.md | Chore(deps-dev): Bump the development group across 1 directory with 2 updates (#6851) | 2024-05-29 07:04:01 +00:00 |
| CODEOWNERS | Chore: Switch from pipenv to uv (#9251) | 2025-03-04 16:15:51 +00:00 |
| CONTRIBUTING.md | Breaking: Drop support for Python 3.10 (#12234) | 2026-03-04 15:03:33 -08:00 |
| crowdin.yml | Chore: Implement crowdin GHA (#4706) | 2023-12-01 17:44:33 -08:00 |
| Dockerfile | Fix: Makes the font cache folder writeable to all users, like ourselves (#12726) | 2026-05-06 12:24:30 -07:00 |
| install-paperless-ngx.sh | Chore: fix Postgres compose volume mount path in install script (#11184) | 2025-10-26 14:40:37 +00:00 |
| LICENSE | Initial commit | 2015-12-20 12:54:28 +00:00 |
| paperless-ngx.code-workspace | Chore: Upgrades tantivy-py to the latest release (#12605) | 2026-04-29 10:09:50 -07:00 |
| paperless.conf.example | Security: Improve overall security in a few ways (#12501) | 2026-04-02 15:30:26 -07:00 |
| pyproject.toml | Enhancement: support ollama embeddings (#12753) | 2026-05-09 00:06:14 +00:00 |
| README.md | Enhancement: Paperless-ngx v3 Logo (#12673) | 2026-04-29 16:38:25 -07:00 |
| SECURITY.md | Update SECURITY.md | 2026-05-04 14:20:25 -07:00 |
| uv.lock | Upgrades this dep so it handles newer models, like gpt-5-5 which require a locked 1.0 temperature value (#12824) | 2026-05-18 12:30:03 -07:00 |
| zensical.toml | Enhancement: Paperless-ngx v3 Logo (#12673) | 2026-04-29 16:38:25 -07:00 |

README.md

Paperless-ngx

Paperless-ngx is a document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.

Paperless-ngx is the official successor to the original Paperless & Paperless-ng projects and is designed to distribute the responsibility of advancing and supporting the project among a team of people. Consider joining us!

Thanks to the generous folks at DigitalOcean, a demo is available at demo.paperless-ngx.com using login demo / demo. Note: demo content is reset frequently and confidential information should not be uploaded.

Features
Getting started
Contributing
Related Projects
Important Note

Features

A full list of features, with screenshots, is available in the documentation.

Getting started

The easiest way to deploy paperless is docker compose. The files in the /docker/compose directory are configured to pull the image from the GitHub container registry.

If you'd like to jump right in, you can configure a docker compose environment with our install script:

bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"

More details and step-by-step guides for alternative installation methods can be found in the documentation.

Migrating from Paperless-ng is easy: just drop in the new docker image! See the documentation on migrating for more details.

Documentation

The documentation for Paperless-ngx is available at https://docs.paperless-ngx.com.

Contributing

If you feel like contributing to the project, please do! Bug fixes, enhancements, visual fixes, etc. are always welcome. If you want to implement something big, please start a discussion about it first! The documentation has some basic information on how to get started.

Community Support

People interested in continuing the work on Paperless-ngx are encouraged to reach out here on GitHub and in the Matrix Room. If you would like to contribute to the project on an ongoing basis, there are multiple teams (frontend, CI/CD, etc.) that could use your help, so please reach out!

Translation

Paperless-ngx is available in many languages that are coordinated on Crowdin. If you want to help out by translating paperless-ngx into your language, please head over to https://crowdin.com/project/paperless-ngx, and thank you! More details can be found in CONTRIBUTING.md.

Feature Requests

Feature requests can be submitted via GitHub Discussions, where you can search for existing ideas, add your own, and vote for the ones you care about.

Bugs

For bugs, please open an issue, or start a discussion if you have questions.

Related Projects

Please see the wiki for a user-maintained list of related projects and software that is compatible with Paperless-ngx.

Important Note

Document scanners are typically used to scan sensitive documents like your social insurance number, tax records, invoices, etc. Paperless-ngx should never be run on an untrusted host because information is stored in clear text without encryption. No guarantees are made regarding security (but we do try!) and you use the app at your own risk. The safest way to run Paperless-ngx is on a local server in your own home with backups in place.
