Commit Graph

34 Commits

Author SHA1 Message Date
Trenton H 2c58d86380 Fix: Minor fixes for the AI indexing (#12893)
* Fix: Remove all nodes for multi-chunk documents in update_llm_index incremental path

The existing_nodes dict comprehension keyed on document_id silently dropped all
but the last node per document, so only that one node was deleted when a
modified document was re-indexed, leaving all other chunks as ghost vectors in
the FAISS index. Switch to a defaultdict(list) that collects every node per
document_id, then iterate and delete all of them before inserting fresh nodes.
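
A minimal sketch of the keying problem described above, using made-up (document_id, node_id) pairs rather than the real index structures. The names nodes, last_only and by_document are illustrative assumptions, not identifiers from the codebase.

```python
from collections import defaultdict

# Hypothetical data: document 1 was split into three chunks, document 2 into one.
nodes = [(1, "n1"), (1, "n2"), (1, "n3"), (2, "n4")]

# A dict comprehension keyed on document_id keeps only the last node per key,
# so the earlier chunks of document 1 are silently dropped.
last_only = {doc_id: node_id for doc_id, node_id in nodes}

# A defaultdict(list) collects every node per document_id instead.
by_document = defaultdict(list)
for doc_id, node_id in nodes:
    by_document[doc_id].append(node_id)

print(last_only[1])    # only the final chunk survives
print(by_document[1])  # all chunks are available for deletion
```

With the grouped form, a re-index can iterate by_document[doc_id] and delete each node before inserting the fresh ones.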

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: Wire document_updated signal to LLM index update handler

Connect document_updated to add_or_update_document_in_llm_index in
DocumentsConfig.ready() so REST API edits (PATCH /api/documents/{id}/)
enqueue an LLM vector store update, matching the existing
document_consumption_finished behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: Add file lock around FAISS index mutations to prevent concurrent write corruption

Two concurrent Celery workers calling llm_index_add_or_update_document or
llm_index_remove_document each loaded the same on-disk index independently,
made their own change, and the last writer silently overwrote the first's
update. Wrap both functions and the rebuild/persist body of update_llm_index
in a filelock.FileLock keyed on LLM_INDEX_DIR/index.lock. Add a TOCTOU
comment on queue_llm_index_update_if_needed explaining the residual risk
(duplicate rebuild tasks are wasteful but not corrupting because the lock
serialises the actual write).
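
The load, mutate, persist pattern under a lock could be sketched roughly as follows. This is not the project's code: it swaps in the standard library's fcntl.flock (POSIX only) for filelock.FileLock, uses a JSON list in place of the FAISS index, and the names index_lock, add_document and demo are hypothetical.

```python
import fcntl
import json
import os
import tempfile
import threading
from contextlib import contextmanager


@contextmanager
def index_lock(index_dir):
    """Hold an exclusive advisory lock on <index_dir>/index.lock."""
    lock_path = os.path.join(index_dir, "index.lock")
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def add_document(index_dir, doc_id):
    """Load the toy index, add one ID, and persist it, all inside the lock."""
    index_path = os.path.join(index_dir, "index.json")
    with index_lock(index_dir):
        ids = []
        if os.path.exists(index_path):
            with open(index_path) as f:
                ids = json.load(f)
        ids.append(doc_id)
        with open(index_path, "w") as f:
            json.dump(ids, f)


def demo():
    """Run eight concurrent writers and return the IDs that were persisted."""
    with tempfile.TemporaryDirectory() as index_dir:
        workers = [
            threading.Thread(target=add_document, args=(index_dir, i))
            for i in range(8)
        ]
        for worker in workers:
            worker.start()
        for worker in workers:
            worker.join()
        with open(os.path.join(index_dir, "index.json")) as f:
            return sorted(json.load(f))
```

Because the read and the write both happen while the lock is held, no writer can persist a copy of the index that predates another writer's update.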

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: Apply _normalize() in extract_unmatched_names to prevent duplicate suggestions

extract_unmatched_names was using .lower() while _match_names_to_queryset
uses _normalize() (which also strips punctuation). A name like "J. Smith"
matched to existing correspondent "J Smith" would still appear in the
unmatched list, causing duplicate object creation.
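
To illustrate the mismatch, here is a small sketch. The real _normalize() is not shown in this log, so normalize() below is an assumed stand-in (lowercase, strip punctuation, collapse whitespace), and unmatched() is a hypothetical helper rather than the actual extract_unmatched_names.

```python
import re


def normalize(name: str) -> str:
    """Assumed stand-in for _normalize(): lowercase, drop punctuation, collapse spaces."""
    stripped = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(stripped.split())


def unmatched(suggested, existing, key):
    """Return the suggested names whose key does not match any existing name."""
    known = {key(name) for name in existing}
    return [name for name in suggested if key(name) not in known]


existing = ["J Smith"]
suggested = ["J. Smith"]

# Comparing with .lower() alone treats "j. smith" and "j smith" as different names.
print(unmatched(suggested, existing, str.lower))

# Comparing with the same normalisation on both sides recognises the match.
print(unmatched(suggested, existing, normalize))
```

The point is only that both the matching step and the unmatched-name step must apply the same key function, whatever that function does in detail.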

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: Skip LLM index update gracefully when document has no indexable content

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: Persist empty index when all documents are deleted to clear stale FAISS vectors

The early-return guard in update_llm_index fired before persist() when no
documents existed, leaving a stale on-disk FAISS index that returned phantom
hits for deleted document IDs. Now the guard only returns early for the
incremental (rebuild=False) path when no index exists on disk; the rebuild
path always continues through to persist(), producing an empty clean index.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Chore: Simplify incremental index update — use docs.values() and deduplicate node extend

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 13:40:49 -07:00
shamoon 299dac21ee Enhancement: “live” document updates (#12141) 2026-03-04 00:27:07 +00:00
Sebastian Steinbeißer 3b5ffbf9fa Chore(mypy): Annotate None returns for typing improvements (#11213) 2026-02-02 08:44:12 -08:00
shamoon e940764fe0 Feature: Paperless AI (#10319) 2026-01-13 16:24:42 +00:00
shamoon 1dc80f04cb Feature: openapi spec, full api browser (#8948) 2025-02-10 16:43:07 +00:00
shamoon 51b0f6e325 Fix: remove outdated admin logentry handler (#8580) 2025-01-01 10:11:54 -08:00
shamoon dcc8d4046a Chore: Unify workflow logic (#7880) 2024-10-10 20:28:44 +00:00
shamoon 3b6ce16f1c Feature: Workflows (#5121) 2024-01-03 08:19:19 +00:00
Trenton H 8d60506884 Standarizes the imports across all the files and modules (#4248) 2023-09-23 20:17:01 -07:00
Trenton H 6f163111ce Upgrades black to v23, upgrades ruff 2023-04-26 09:35:27 -07:00
Trenton H 3bcbd05252 Fixes ruff not running isort against the codebase 2023-04-26 09:35:27 -07:00
Markus 69ef26dab0 Feature: Dynamic document storage pathes (#916)
* Added devcontainer

* Add feature storage pathes

* Exclude tests and add versioning

* Check escaping

* Check escaping

* Check quoting

* Echo

* Escape

* Escape :

* Double escape \

* Escaping

* Remove if

* Escape colon

* Missing \

* Esacpe :

* Escape all

* test

* Remove sed

* Fix exclude

* Remove SED command

* Add LD_LIBRARY_PATH

* Adjusted to v1.7

* Updated test-cases

* Remove devcontainer

* Removed internal build-file

* Run pre-commit

* Corrected flak8 error

* Adjusted to v1.7

* Updated test-cases

* Corrected flak8 error

* Adjusted to new plural translations

* Small adjustments due to code-review backend

* Adjusted line-break

* Removed PAPERLESS prefix from settings variables

* Corrected style change due to search+replace

* First documentation draft

* Revert changes to Pipfile

* Add sphinx-autobuild with keep-outdated

* Revert merge error that results in wrong storage path is evaluated

* Adjust styles of generated files ...

* Adds additional testing to cover dynamic storage path functionality

* Remove unnecessary condition

* Add hint to edit storage path dialog

* Correct spelling of pathes to paths

* Minor documentation tweaks

* Minor typo

* improving wrapping of filter editor buttons with new storage path button

* Update .gitignore

* Fix select border radius in non input-groups

* Better storage path edit hint

* Add note to edit storage path dialog re document_renamer

* Add note to bulk edit storage path re document_renamer

* Rename FILTER_STORAGE_DIRECTORY to PATH

* Fix broken filter rule parsing

* Show default storage if unspecified

* Remove note re storage path on bulk edit

* Add basic validation of filename variables

Co-authored-by: Markus Kling <markus@markus-kling.net>
Co-authored-by: Trenton Holmes <holmes.trenton@gmail.com>
Co-authored-by: Michael Shamoon <4887959+shamoon@users.noreply.github.com>
Co-authored-by: Quinn Casey <quinn@quinncasey.com>
2022-05-19 14:42:25 -07:00
Trenton Holmes 1771d18a21 Runs the pre-commit hooks over all the Python files 2022-03-11 11:34:28 -08:00
kpj fc695896dd Format Python code with black 2022-02-27 15:26:41 +01:00
jonaswinkler bcf17bfdc0 fix some translation issues 2021-01-02 00:45:23 +01:00
jonaswinkler 4b74cd5677 fix #236 2021-01-01 23:27:55 +01:00
Jonas Winkler 2e04ba1c04 code style fixes 2020-11-12 21:09:45 +01:00
Jonas Winkler 734da28b69 fixed the file handling implementation. The feature is cool, but the original implementation had so many small flaws it wasn't even funny. 2020-11-11 14:21:33 +01:00
Jonas Winkler 942fab7298 I removed the model save/delete hooks for index updates since they were causing too much trouble with migrations 2020-11-08 11:24:57 +01:00
Jonas Winkler 6f3d25d7b1 this was not required since saving a document updates the index anyway 2020-11-03 13:51:49 +01:00
Jonas Winkler 11af74ba36 unified document matching, legacy and automatching work alongside now 2020-10-28 11:45:11 +01:00
Jonas Winkler 052c1680f3 added
- document index
- api access for thumbnails/downloads
- more api filters

updated
- pipfile

removed
- filename handling
- legacy thumb/download access
- obsolete admin gui settings (per page items, FY, inline view)
2020-10-25 23:03:02 +01:00
Jonas Winkler 46a5bc00d7 Merge branch 'machine-learning' into dev 2018-09-11 14:36:21 +02:00
Jonas Winkler d46ee11143 The classifier works with ids now, not names. Minor changes. 2018-09-11 14:30:18 +02:00
Jonas Winkler 11adc94e5e mode change 2018-09-06 12:00:01 +02:00
Jonas Winkler c091eba26e Implemented the classifier model, including automatic tagging of new documents 2018-09-04 14:39:55 +02:00
Jonas Winkler d7ab69fed9 Added document type 2018-08-24 13:45:15 +02:00
CkuT 45e18d7094 Add LogEntry after document consumption
See #319
2018-03-11 17:09:43 +01:00
Daniel Quinn f5daded930 Fix for #131: delete files on document.delete 2016-08-16 19:13:37 +01:00
Lenz Weber e7566d2b1c style changes, variable renames
* PEP8 conformity
* rename run_post_consume_external_script to run_post_consume_script
* rename run_pre_consume_external_script to run_pre_consume_script
* change order of declaration and use from post...pre to pre...post
2016-06-24 16:53:55 +02:00
Lenz Weber c728b1dd21 add pre-consume hook
a script hook can be defined in /etc/paperless.conf as
PAPERLESS_PRE_CONSUME_SCRIPT
2016-06-23 21:57:17 +02:00
Daniel Quinn cb2df58b27 Everything appears to be working 2016-03-28 19:47:11 +01:00
Daniel Quinn b92e007e15 Removed log components and introduced signals for tags & correspondents 2016-03-28 11:11:15 +01:00
Daniel Quinn 855ee64097 It works! 2015-12-20 19:23:33 +00:00