* Fix: Remove all nodes for multi-chunk documents in update_llm_index incremental path
The existing_nodes dict comprehension keyed on document_id silently dropped all
but the last node per document, so only that one node was deleted when a
modified document was re-indexed, leaving all other chunks as ghost vectors in
the FAISS index. Switch to a defaultdict(list) that collects every node per
document_id, then iterate and delete all of them before inserting fresh nodes.
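
Illustrative sketch of the keying bug and the fix, not the actual
update_llm_index code. The Node class, the sample IDs, and
node_ids_to_delete are hypothetical stand-ins for the real index nodes:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one chunk stored in the vector index."""
    node_id: str
    document_id: str


nodes = [Node("n1", "doc-1"), Node("n2", "doc-1"), Node("n3", "doc-2")]

# Old pattern: a dict comprehension keeps only the last node per key,
# so "n1" is silently dropped for doc-1.
last_only = {n.document_id: n for n in nodes}

# New pattern: collect every node under its document_id.
nodes_by_document = defaultdict(list)
for n in nodes:
    nodes_by_document[n.document_id].append(n)


def node_ids_to_delete(document_id: str) -> list[str]:
    """Return all node IDs to remove before re-inserting a document."""
    # .get() avoids creating an empty entry for unknown documents.
    return [n.node_id for n in nodes_by_document.get(document_id, [])]
```

With the old mapping only "n2" would be deleted for doc-1; the grouped
mapping yields both chunks.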
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: Wire document_updated signal to LLM index update handler
Connect document_updated to add_or_update_document_in_llm_index in
DocumentsConfig.ready() so REST API edits (PATCH /api/documents/{id}/)
enqueue an LLM vector store update, matching the existing
document_consumption_finished behavior.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: Add file lock around FAISS index mutations to prevent concurrent write corruption
Two concurrent Celery workers calling llm_index_add_or_update_document or
llm_index_remove_document each loaded the same on-disk index independently,
made their own change, and the last writer silently overwrote the first's
update. Wrap both functions and the rebuild/persist body of update_llm_index
in a filelock.FileLock keyed on LLM_INDEX_DIR/index.lock. Add a TOCTOU
comment on queue_llm_index_update_if_needed explaining the residual risk
(duplicate rebuild tasks are wasteful but not corrupting because the lock
serialises the actual write).
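
Rough sketch of the load/mutate/persist pattern that the lock protects.
Assumptions: a JSON list stands in for the FAISS index, and a
threading.Lock stands in for filelock.FileLock (it only serialises
threads within one process, whereas the real fix needs a cross-process
file lock). add_document and run_concurrent_adds are hypothetical names:

```python
import json
import tempfile
import threading
from pathlib import Path

# Stand-in for the file lock on LLM_INDEX_DIR/index.lock (assumption:
# in-process only, unlike a real file lock shared by Celery workers).
index_lock = threading.Lock()


def add_document(index_path: Path, document_id: int) -> None:
    """Load the on-disk index, apply one change, and persist it under the lock."""
    with index_lock:
        ids = json.loads(index_path.read_text()) if index_path.exists() else []
        if document_id not in ids:
            ids.append(document_id)
        index_path.write_text(json.dumps(ids))


def run_concurrent_adds(document_ids) -> list[int]:
    """Run one writer thread per document ID and return the final index contents."""
    with tempfile.TemporaryDirectory() as tmp:
        index_path = Path(tmp) / "index.json"
        threads = [
            threading.Thread(target=add_document, args=(index_path, doc_id))
            for doc_id in document_ids
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        return sorted(json.loads(index_path.read_text()))
```

Because the whole read-modify-write cycle runs inside the lock, no
writer can persist a copy of the index that predates another writer's
update.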
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: Apply _normalize() in extract_unmatched_names to prevent duplicate suggestions
extract_unmatched_names was using .lower() while _match_names_to_queryset
uses _normalize() (which also strips punctuation). A name like "J. Smith"
matched to existing correspondent "J Smith" would still appear in the
unmatched list, causing duplicate object creation.
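
Minimal sketch of the mismatch, assuming a normalisation that lowercases,
strips punctuation, and collapses whitespace. The normalize function and
both helpers below are hypothetical and are not the project's _normalize
or extract_unmatched_names:

```python
import re


def normalize(name: str) -> str:
    """Hypothetical normaliser: lowercase, drop punctuation, collapse spaces."""
    return " ".join(re.sub(r"[^\w\s]", "", name.lower()).split())


def unmatched_names_lower_only(suggested: list[str], existing: list[str]) -> list[str]:
    """Old behaviour: compare with .lower() only, so punctuation still differs."""
    known = {name.lower() for name in existing}
    return [name for name in suggested if name.lower() not in known]


def unmatched_names(suggested: list[str], existing: list[str]) -> list[str]:
    """New behaviour: compare with the same normalisation used for matching."""
    known = {normalize(name) for name in existing}
    return [name for name in suggested if normalize(name) not in known]
```

Under these assumptions, "J. Smith" is reported as unmatched against
"J Smith" by the lower-only variant but not by the normalised one.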
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: Skip LLM index update gracefully when document has no indexable content
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: Persist empty index when all documents are deleted to clear stale FAISS vectors
The early-return guard in update_llm_index fired before persist() when no
documents existed, leaving a stale on-disk FAISS index that returned phantom
hits for deleted document IDs. Now the guard only returns early for the
incremental (rebuild=False) path when no index exists on disk; the rebuild
path always continues through to persist(), producing an empty clean index.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Chore: Simplify incremental index update (use docs.values() and deduplicate node extend)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* PEP8 conformity
* rename run_post_consume_external_script to run_post_consume_script
* rename run_pre_consume_external_script to run_pre_consume_script
* change order of declaration and use from post...pre to pre...post