mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2026-07-02 10:14:17 +00:00
2c58d86380
* Fix: Remove all nodes for multi-chunk documents in update_llm_index incremental path

The existing_nodes dict comprehension keyed on document_id silently dropped all but the last node per document, so only that one node was deleted when a modified document was re-indexed, leaving all other chunks as ghost vectors in the FAISS index. Switch to a defaultdict(list) that collects every node per document_id, then iterate and delete all of them before inserting fresh nodes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: Wire document_updated signal to LLM index update handler

Connect document_updated to add_or_update_document_in_llm_index in DocumentsConfig.ready() so REST API edits (PATCH /api/documents/{id}/) enqueue an LLM vector store update, matching the existing document_consumption_finished behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: Add file lock around FAISS index mutations to prevent concurrent write corruption

Two concurrent Celery workers calling llm_index_add_or_update_document or llm_index_remove_document each loaded the same on-disk index independently, made their own change, and the last writer silently overwrote the first's update. Wrap both functions and the rebuild/persist body of update_llm_index in a filelock.FileLock keyed on LLM_INDEX_DIR/index.lock. Add a TOCTOU comment on queue_llm_index_update_if_needed explaining the residual risk (duplicate rebuild tasks are wasteful but not corrupting because the lock serialises the actual write).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: Apply _normalize() in extract_unmatched_names to prevent duplicate suggestions

extract_unmatched_names was using .lower() while _match_names_to_queryset uses _normalize() (which also strips punctuation). A name like "J. Smith" matched to existing correspondent "J Smith" would still appear in the unmatched list, causing duplicate object creation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: Skip LLM index update gracefully when document has no indexable content

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: Persist empty index when all documents are deleted to clear stale FAISS vectors

The early-return guard in update_llm_index fired before persist() when no documents existed, leaving a stale on-disk FAISS index that returned phantom hits for deleted document IDs. Now the guard only returns early for the incremental (rebuild=False) path when no index exists on disk; the rebuild path always continues through to persist(), producing an empty clean index.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Chore: Simplify incremental index update - use docs.values() and deduplicate node extend

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
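The multi-chunk fix described in the first bullet comes down to how nodes are grouped by document_id. The following is a minimal sketch of that grouping only, not the project's actual code: the `Node` dataclass and both helper functions are hypothetical stand-ins, and nothing here touches FAISS or the real index.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for one index node (one chunk of one document)."""

    node_id: str
    document_id: int


def nodes_by_document_buggy(nodes):
    # A dict comprehension keeps one value per key, so each later chunk
    # of a document overwrites the earlier ones.
    return {n.document_id: n for n in nodes}


def nodes_by_document(nodes):
    # Collect every chunk under its document_id so all of them can be
    # deleted before fresh nodes are inserted.
    grouped = defaultdict(list)
    for n in nodes:
        grouped[n.document_id].append(n)
    return grouped


# Document 1 was split into three chunks, document 2 into one.
nodes = [Node("a", 1), Node("b", 1), Node("c", 1), Node("d", 2)]

buggy = nodes_by_document_buggy(nodes)
fixed = nodes_by_document(nodes)

# With the buggy mapping only chunk "c" would be found for document 1;
# the grouped mapping exposes all three chunks for deletion.
stale_ids = [n.node_id for n in fixed[1]]
```

With the comprehension, chunks "a" and "b" are never seen by the delete step, which is the ghost-vector situation the commit message describes.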
39 lines
1.8 KiB
Python
from django.apps import AppConfig
from django.utils.translation import gettext_lazy as _


class DocumentsConfig(AppConfig):
    name = "documents"

    verbose_name = _("Documents")

    def ready(self) -> None:
        from documents.signals import document_consumption_finished
        from documents.signals import document_updated
        from documents.signals.handlers import add_inbox_tags
        from documents.signals.handlers import add_or_update_document_in_llm_index
        from documents.signals.handlers import add_to_index
        from documents.signals.handlers import run_workflows_added
        from documents.signals.handlers import run_workflows_updated
        from documents.signals.handlers import send_websocket_document_updated
        from documents.signals.handlers import set_correspondent
        from documents.signals.handlers import set_document_type
        from documents.signals.handlers import set_storage_path
        from documents.signals.handlers import set_tags

        document_consumption_finished.connect(add_inbox_tags)
        document_consumption_finished.connect(set_correspondent)
        document_consumption_finished.connect(set_document_type)
        document_consumption_finished.connect(set_tags)
        document_consumption_finished.connect(set_storage_path)
        document_consumption_finished.connect(add_to_index)
        document_consumption_finished.connect(run_workflows_added)
        document_consumption_finished.connect(add_or_update_document_in_llm_index)
        document_updated.connect(run_workflows_updated)
        document_updated.connect(send_websocket_document_updated)
        document_updated.connect(add_or_update_document_in_llm_index)

        import documents.schema  # noqa: F401

        AppConfig.ready(self)
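The connect calls in ready() can be read as a fan-out: one handler may be attached to several signals, and each send runs every connected receiver. The sketch below shows that pattern with a deliberately simplified, hypothetical `MiniSignal` class. It is not Django's `django.dispatch.Signal`, and the handler bodies and the `document_id` keyword argument are assumptions made for illustration only.

```python
from collections.abc import Callable
from typing import Any


class MiniSignal:
    """Hypothetical, heavily simplified stand-in for a Django signal."""

    def __init__(self) -> None:
        self._receivers: list[Callable[..., Any]] = []

    def connect(self, receiver: Callable[..., Any]) -> None:
        # Ignore repeat registrations of the same receiver.
        if receiver not in self._receivers:
            self._receivers.append(receiver)

    def send(self, sender: Any, **kwargs: Any) -> list[tuple[Callable[..., Any], Any]]:
        # Call every receiver in connection order and collect the results.
        return [(r, r(sender=sender, **kwargs)) for r in self._receivers]


document_consumption_finished = MiniSignal()
document_updated = MiniSignal()

# Records what each illustrative handler would have enqueued.
queued: list[tuple[str, int]] = []


def add_or_update_document_in_llm_index(sender: Any, document_id: int, **kwargs: Any) -> None:
    # Assumed behaviour: note that an LLM index update is wanted.
    queued.append(("llm_index", document_id))


def send_websocket_document_updated(sender: Any, document_id: int, **kwargs: Any) -> None:
    # Assumed behaviour: note that clients should be notified.
    queued.append(("websocket", document_id))


# The same handler is attached to both signals, mirroring ready() above.
document_consumption_finished.connect(add_or_update_document_in_llm_index)
document_updated.connect(send_websocket_document_updated)
document_updated.connect(add_or_update_document_in_llm_index)

# Simulate an edit to document 7: both document_updated receivers run.
document_updated.send(sender=None, document_id=7)
```

Before document_updated was wired to the LLM index handler, an edit like the simulated one would have reached only the other document_updated receivers, which matches the gap the second commit bullet describes.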