paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2026-07-02 18:24:17 +00:00

Author	SHA1	Message	Date
Trenton H	2c58d86380	Fix: Minor fixes for the AI indexing (#12893) * Fix: Remove all nodes for multi-chunk documents in update_llm_index incremental path The existing_nodes dict comprehension keyed on document_id silently dropped all but the last node per document, so only that one node was deleted when a modified document was re-indexed, leaving all other chunks as ghost vectors in the FAISS index. Switch to a defaultdict(list) that collects every node per document_id, then iterate and delete all of them before inserting fresh nodes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix: Wire document_updated signal to LLM index update handler Connect document_updated to add_or_update_document_in_llm_index in DocumentsConfig.ready() so REST API edits (PATCH /api/documents/{id}/) enqueue an LLM vector store update, matching the existing document_consumption_finished behavior. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix: Add file lock around FAISS index mutations to prevent concurrent write corruption Two concurrent Celery workers calling llm_index_add_or_update_document or llm_index_remove_document each loaded the same on-disk index independently, made their own change, and the last writer silently overwrote the first's update. Wrap both functions and the rebuild/persist body of update_llm_index in a filelock.FileLock keyed on LLM_INDEX_DIR/index.lock. Add a TOCTOU comment on queue_llm_index_update_if_needed explaining the residual risk (duplicate rebuild tasks are wasteful but not corrupting because the lock serialises the actual write). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix: Apply _normalize() in extract_unmatched_names to prevent duplicate suggestions extract_unmatched_names was using .lower() while _match_names_to_queryset uses _normalize() (which also strips punctuation). A name like "J. Smith" matched to existing correspondent "J Smith" would still appear in the unmatched list, causing duplicate object creation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix: Skip LLM index update gracefully when document has no indexable content Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix: Persist empty index when all documents are deleted to clear stale FAISS vectors The early-return guard in update_llm_index fired before persist() when no documents existed, leaving a stale on-disk FAISS index that returned phantom hits for deleted document IDs. Now the guard only returns early for the incremental (rebuild=False) path when no index exists on disk; the rebuild path always continues through to persist(), producing an empty clean index. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Chore: Simplify incremental index update — use docs.values() and deduplicate node extend --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-01 13:40:49 -07:00
shamoon	299dac21ee	Enhancement: “live” document updates (#12141)	2026-03-04 00:27:07 +00:00
Sebastian Steinbeißer	3b5ffbf9fa	Chore(mypy): Annotate `None` returns for typing improvements (#11213)	2026-02-02 08:44:12 -08:00
shamoon	e940764fe0	Feature: Paperless AI (#10319)	2026-01-13 16:24:42 +00:00
shamoon	1dc80f04cb	Feature: openapi spec, full api browser (#8948)	2025-02-10 16:43:07 +00:00
shamoon	51b0f6e325	Fix: remove outdated admin logentry handler (#8580)	2025-01-01 10:11:54 -08:00
shamoon	dcc8d4046a	Chore: Unify workflow logic (#7880)	2024-10-10 20:28:44 +00:00
shamoon	3b6ce16f1c	Feature: Workflows (#5121)	2024-01-03 08:19:19 +00:00
Trenton H	8d60506884	Standarizes the imports across all the files and modules (#4248)	2023-09-23 20:17:01 -07:00
Trenton H	6f163111ce	Upgrades black to v23, upgrades ruff	2023-04-26 09:35:27 -07:00
Trenton H	3bcbd05252	Fixes ruff not running isort against the codebase	2023-04-26 09:35:27 -07:00
Markus	69ef26dab0	Feature: Dynamic document storage pathes (#916 ) * Added devcontainer * Add feature storage pathes * Exclude tests and add versioning * Check escaping * Check escaping * Check quoting * Echo * Escape * Escape : * Double escape \ * Escaping * Remove if * Escape colon * Missing \ * Esacpe : * Escape all * test * Remove sed * Fix exclude * Remove SED command * Add LD_LIBRARY_PATH * Adjusted to v1.7 * Updated test-cases * Remove devcontainer * Removed internal build-file * Run pre-commit * Corrected flak8 error * Adjusted to v1.7 * Updated test-cases * Corrected flak8 error * Adjusted to new plural translations * Small adjustments due to code-review backend * Adjusted line-break * Removed PAPERLESS prefix from settings variables * Corrected style change due to search+replace * First documentation draft * Revert changes to Pipfile * Add sphinx-autobuild with keep-outdated * Revert merge error that results in wrong storage path is evaluated * Adjust styles of generated files ... * Adds additional testing to cover dynamic storage path functionality * Remove unnecessary condition * Add hint to edit storage path dialog * Correct spelling of pathes to paths * Minor documentation tweaks * Minor typo * improving wrapping of filter editor buttons with new storage path button * Update .gitignore * Fix select border radius in non input-groups * Better storage path edit hint * Add note to edit storage path dialog re document_renamer * Add note to bulk edit storage path re document_renamer * Rename FILTER_STORAGE_DIRECTORY to PATH * Fix broken filter rule parsing * Show default storage if unspecified * Remove note re storage path on bulk edit * Add basic validation of filename variables Co-authored-by: Markus Kling <markus@markus-kling.net> Co-authored-by: Trenton Holmes <holmes.trenton@gmail.com> Co-authored-by: Michael Shamoon <4887959+shamoon@users.noreply.github.com> Co-authored-by: Quinn Casey <quinn@quinncasey.com>	2022-05-19 14:42:25 -07:00
Trenton Holmes	1771d18a21	Runs the pre-commit hooks over all the Python files	2022-03-11 11:34:28 -08:00
kpj	fc695896dd	Format Python code with black	2022-02-27 15:26:41 +01:00
jonaswinkler	bcf17bfdc0	fix some translation issues	2021-01-02 00:45:23 +01:00
jonaswinkler	4b74cd5677	fix #236	2021-01-01 23:27:55 +01:00
Jonas Winkler	2e04ba1c04	code style fixes	2020-11-12 21:09:45 +01:00
Jonas Winkler	734da28b69	fixed the file handling implementation. The feature is cool, but the original implementation had so many small flaws it wasn't even funny.	2020-11-11 14:21:33 +01:00
Jonas Winkler	942fab7298	I removed the model save/delete hooks for index updates since they were causing too much trouble with migrations	2020-11-08 11:24:57 +01:00
Jonas Winkler	6f3d25d7b1	this was not required since saving a document updates the index anyway	2020-11-03 13:51:49 +01:00
Jonas Winkler	11af74ba36	unified document matching, legacy and automatching work alongside now	2020-10-28 11:45:11 +01:00
Jonas Winkler	052c1680f3	added - document index - api access for thumbnails/downloads - more api filters updated - pipfile removed - filename handling - legacy thumb/download access - obsolete admin gui settings (per page items, FY, inline view)	2020-10-25 23:03:02 +01:00
Jonas Winkler	46a5bc00d7	Merge branch 'machine-learning' into dev	2018-09-11 14:36:21 +02:00
Jonas Winkler	d46ee11143	The classifier works with ids now, not names. Minor changes.	2018-09-11 14:30:18 +02:00
Jonas Winkler	11adc94e5e	mode change	2018-09-06 12:00:01 +02:00
Jonas Winkler	c091eba26e	Implemented the classifier model, including automatic tagging of new documents	2018-09-04 14:39:55 +02:00
Jonas Winkler	d7ab69fed9	Added document type	2018-08-24 13:45:15 +02:00
CkuT	45e18d7094	Add LogEntry after document consumption See #319	2018-03-11 17:09:43 +01:00
Daniel Quinn	f5daded930	Fix for #131: delete files on document.delete	2016-08-16 19:13:37 +01:00
Lenz Weber	e7566d2b1c	style changes, variable renames * PEP8 conformity * rename run_post_consume_external_script to run_post_consume_script * rename run_pre_consume_external_script to run_pre_consume_script * change order of declaration and use from post...pre to pre...post	2016-06-24 16:53:55 +02:00
Lenz Weber	c728b1dd21	add pre-consume hook a script hook can be defined in /etc/paperless.conf as PAPERLESS_PRE_CONSUME_SCRIPT	2016-06-23 21:57:17 +02:00
Daniel Quinn	cb2df58b27	Everything appears to be working	2016-03-28 19:47:11 +01:00
Daniel Quinn	b92e007e15	Removed log components and introduced signals for tags & correspondents	2016-03-28 11:11:15 +01:00
Daniel Quinn	855ee64097	It works!	2015-12-20 19:23:33 +00:00

34 Commits
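
The multi-chunk fix described in commit 2c58d86380 can be illustrated with a small standalone sketch. It is not the project's code: the node tuples, the `nodes_to_delete` helper, and all identifiers below are hypothetical, and no index library is involved. It only shows why a dict comprehension keyed on document_id keeps a single node per document, while a defaultdict(list) keeps every chunk so that all of them can be deleted before re-indexing.

```python
from collections import defaultdict

# Hypothetical stand-in for index nodes: (node_id, document_id) pairs.
# "doc-1" was split into three chunks, "doc-2" into one.
nodes = [
    ("node-a", "doc-1"),
    ("node-b", "doc-1"),
    ("node-c", "doc-1"),
    ("node-d", "doc-2"),
]

# Keying a dict comprehension on document_id overwrites earlier entries,
# so only the last node per document survives.
last_only = {doc_id: node_id for node_id, doc_id in nodes}

# Collecting into a defaultdict(list) keeps every node per document.
by_document = defaultdict(list)
for node_id, doc_id in nodes:
    by_document[doc_id].append(node_id)


def nodes_to_delete(grouped, document_id):
    """Return every node id recorded for a document (hypothetical helper)."""
    # .get() avoids creating an empty entry for unknown documents.
    return list(grouped.get(document_id, []))


print(last_only["doc-1"])
print(nodes_to_delete(by_document, "doc-1"))
```

With the first mapping, re-indexing "doc-1" would remove one node and leave the other two behind; with the grouped mapping, all three are available for deletion.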