paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2026-08-03 01:22:17 +00:00

Author	SHA1	Message	Date
stumpylog	75a31ee09b	Extracts some common code into helpers instead of duplication	2026-06-05 13:31:32 -07:00
stumpylog	a23888aa1b	Simplify table existence checking and ensuring a table exists	2026-06-05 13:15:50 -07:00
stumpylog	e3ebe7cda1	Abstracts the modified check into the vector store	2026-06-05 13:06:44 -07:00
stumpylog	8533b99adf	Don't need a dead parameter	2026-06-05 13:02:45 -07:00
stumpylog	b54b8a23ce	Don't always re-create the document_id index, do it only if not already existing	2026-06-05 12:58:22 -07:00
stumpylog	a5f7a5561d	ensure the llm_dir exists for the write_store too	2026-06-05 12:49:41 -07:00
stumpylog	3c2ef25edd	Fixes this test so it works regardless of cwd	2026-06-05 11:56:49 -07:00
stumpylog	09b3063344	Small targeted tests for coverage or pragma no cover	2026-06-05 11:53:08 -07:00
stumpylog	60faa3f20f	Removes the spec and plan files	2026-06-05 11:43:42 -07:00
stumpylog	ca6dca0efe	Adds a new compact sub-command + handler to force compact lancedb version	2026-06-05 11:43:42 -07:00
stumpylog	3aa83c9e4c	To reduce embedding size, don't store the metadata in the body. Body is content + a few other things, metadata keys hold the metadata	2026-06-05 11:43:42 -07:00
stumpylog	e7f8bf0542	Globally reduces httpx logging	2026-06-05 11:43:42 -07:00
stumpylog, shamoon and Claude Opus 4.8	707c3d7842	fix(ai): sort document_id filter values; add chat filter scoping test - chat.py: use sorted() for doc_ids in the MetadataFilters IN clause, matching the same pattern used in query_similar_documents. Ensures deterministic filter construction regardless of document iteration order. - test_chat.py: add test_chat_filter_contains_only_requested_document_ids verifying that the retriever receives a filter scoped only to the requested documents (not all indexed documents). Inspired by test_document_filtered_retriever_applies_lancedb_metadata_filter in origin/feature/beta-lancedb. Co-Authored-By: shamoon <shamoon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:42 -07:00
stumpylog and Claude Opus 4.8	eab0a4abea	fix(ai): rename vector_store_file_exists -> llm_index_exists in views.py Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:42 -07:00
stumpylog and Claude Opus 4.8	a9d339157f	refactor(ai): write_store() context manager wraps the FileLock All mutating index operations (upsert, delete, rebuild, compact) now use with write_store() as store: instead of explicit FileLock + get_vector_store() at each call site. Read paths continue to use get_vector_store() directly (no lock needed). Also type-annotates test fixture params throughout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:42 -07:00
stumpylog and Claude Opus 4.8	2c3c892dae	test(ai): type-annotate fixture parameters Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:42 -07:00
stumpylog and Claude Opus 4.8	17755a2c58	refactor(ai): cleanup pass — naming, batched embedding, remove dead wrappers - Rename vector_store_file_exists -> llm_index_exists (accurate now) - Rename _iter_existing_modified -> _stored_modified_times; project away vector column (cheap scan) and return dict[doc_id, modified_str] directly - Drop _index_lock_path() indirection; inline settings.LLM_INDEX_LOCK - Move LLM_INDEX_LOCK alongside the index dir (drop_table is safe; no rmtree) - Drop current_embedding_dim() redirect; callers use get_embedding_dim() - Drop lazy-import explanatory comments (constraint lives in CLAUDE.md) - Batch embedding calls via get_text_embedding_batch() in all three loops - get_nodes: raise NotImplementedError for node_ids (was silently ignored) - has_nodes(): cheap limit(1) existence check; chat.py uses it instead of get_nodes() which materialized all matching rows - conftest: use mocker fixture (pytest-mock) instead of bare patch; add LLM_INDEX_LOCK to temp_llm_index_dir override; type-annotate mock_embed_model Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:42 -07:00
stumpylog and Claude Opus 4.8	6775fed68e	types(ai): pass pyrefly for the LanceDB vector store code Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:42 -07:00
stumpylog and Claude Opus 4.8	2673bc7b9c	refactor(ai): drop unused delete_nodes and node_ids path from the adapter Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:42 -07:00
stumpylog and Claude Opus 4.8	97fb1ec313	test(ai): drop FAISS-internal assertions Remove tests that validated removed internals (get_or_create_storage_context, remove_document_docstore_nodes, index.docstore.docs) and rewrite the remaining ones to assert against the LanceDB store directly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:42 -07:00
stumpylog and Claude Opus 4.8	e5acadff52	refactor(ai): chat uses a stock filtered retriever Delete _get_document_filtered_retriever (74-line custom FAISS retriever with expanding top_k loop) and rewrite _stream_chat_with_documents to use a stock VectorIndexRetriever with MetadataFilters(IN). The no-content pre-check now calls index.vector_store.get_nodes(filters=...) which returns [] cleanly for un-indexed documents. Move FakeEmbedding and mock_embed_model fixture to conftest.py so both test_chat.py and test_ai_indexing.py share them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:42 -07:00
stumpylog and Claude Opus 4.8	6e69dc78a7	feat(ai): dimension guard and FAISS index migration Drops migrate_stale_faiss_index (users delete llm_index/ manually on upgrade). Keeps embedding_dim_mismatch to force a rebuild when the model changes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:42 -07:00
stumpylog and Claude Opus 4.8	b855eba878	feat(ai): dimension guard and FAISS index migration Adds current_embedding_dim() to embedding.py, migrate_stale_faiss_index() and embedding_dim_mismatch() to indexing.py, and wires both into update_llm_index so that stale FAISS directories are wiped on startup and embedding model changes force a full index rebuild. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:42 -07:00
stumpylog and Claude Opus 4.8	1f2af9087c	refactor(ai): query_similar_documents via metadata filter Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:42 -07:00
stumpylog and Claude Opus 4.8	6bb8212f20	refactor(ai): group new LanceDB indexing tests in a class Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:42 -07:00
stumpylog and Claude Opus 4.8	f41e32cfcd	refactor(ai): build the index from the LanceDB store alone (lazy import) Replace get_or_create_storage_context with get_vector_store() (lazy import of paperless_ai.vector_store inside the function), rewrite load_or_build_index to use VectorStoreIndex.from_vector_store, and rewrite vector_store_file_exists to use store.table_exists(). Add LLM_INDEX_TABLE constant and TYPE_CHECKING-only import of PaperlessLanceVectorStore. Delete remove_document_docstore_nodes and rewire llm_index_add_or_update_document, llm_index_remove_document, and update_llm_index to use upsert_document/delete/drop_table on the LanceDB store. Serialize tags list as JSON string to satisfy flat_metadata validation. Add test_get_vector_store_roundtrip, test_add_then_remove_document, test_update_shrinks_chunks_without_orphans, and the subprocess lazy-import guard. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:41 -07:00
stumpylog and Claude Opus 4.8	2f5d199fef	feat(ai): tie LlamaDocument id to the paperless document id Set id_=str(document.id) on the LlamaDocument constructor in build_document_node so that every chunk node's ref_doc_id equals the paperless document pk, enabling the LanceDB adapter's delete(str(doc.id)) and doc_id column to work correctly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:41 -07:00
stumpylog and Claude Opus 4.8	848f140c04	refactor(ai): drop version-defensive vector-index check (lancedb is pinned) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:41 -07:00
stumpylog and Claude Opus 4.8	4219622e1b	refactor(ai): log when the vector-index check fails Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:41 -07:00
stumpylog and Claude Opus 4.8	52b77413f9	feat(ai): ANN index threshold, scalar index, and compaction Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:41 -07:00
stumpylog and Claude Opus 4.8	849c0fbe23	fix(ai): upsert empty-nodes path deletes by document_id When upsert_document receives an empty nodes list, delete existing chunks using the document_id column directly (consistent with the merge_insert prune predicate) rather than calling delete() which filters on doc_id. Guard for a missing table to avoid a no-op. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:41 -07:00
stumpylog and Claude Opus 4.8	f9e5480c64	feat(ai): atomic upsert_document on the LanceDB store Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:41 -07:00
stumpylog and Claude Opus 4.8	9367cf531e	docs(plan): add Task 13 — pass new AI code through pyrefly Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:41 -07:00
stumpylog and Claude Opus 4.8	5dac7c897f	refactor(ai): address review on the LanceDB adapter - Fix delete() to use single-quote delimiter consistent with _escape - Fix _distance comment: L2 not squared-L2 - Fix similarity_top_k zero-guard to use explicit None check - Replace deprecated table_names() with list_tables().tables (lancedb 0.33) - Add add() Sequence[BaseNode] signature with collections.abc.Sequence import - Add test_build_where_or_condition for OR filter branch coverage Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:41 -07:00
stumpylog and Claude Opus 4.8	0bab965d9c	feat(ai): add LanceDB-backed vector store adapter Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:41 -07:00
stumpylog and Claude Opus 4.8	a7ea06c820	build: replace faiss-cpu with lancedb for the AI vector store Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:41 -07:00
stumpylog and Claude Opus 4.8	714d3d68c5	Design: Implementation plan for the LanceDB vector store Task-by-task TDD plan implementing the LanceDB design spec: dependency swap, the PaperlessLanceVectorStore adapter, atomic merge_insert upsert, ANN threshold + scalar index + compaction, the indexing/chat/similar rewires, FAISS migration, and a lazy-import guard test so non-AI paths (management commands) never drag in llama_index/lancedb/pyarrow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:41 -07:00
stumpylog and Claude Opus 4.8	f0d233631a	Design: Replace FAISS vector store with LanceDB (custom adapter) Spec for swapping the AI feature's llama-index FAISS StorageContext trio (FaissVectorStore + SimpleDocumentStore + SimpleIndexStore) for LanceDB via a custom BasePydanticVectorStore adapter (no llama-index-vector-stores-lancedb, no pandas). Covers: disk-resident memory-mapped storage, native merge_insert upsert with when_not_matched_by_source_delete, MetadataFilters(IN) filtering on a top-level document_id column, auto IVF ANN threshold (IVF_FLAT fallback), MVCC compaction via optimize(cleanup_older_than=...), migration, concurrency, and testing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 11:43:41 -07:00
shamoon and GitHub	449fd97b1f	Fix (beta): respect disable state for suggest endpoint, require change perms (#12942)	2026-06-05 14:16:53 +00:00
Trenton H and GitHub	fa0c4368d7	Fix: Ensure checksum comparison is using SHA256 in file handling (#12939)	2026-06-05 06:46:45 -07:00
shamoon	289d797837	Merge branch 'dev' into beta	2026-06-03 15:12:44 -07:00
dependabot[bot] and GitHub	f3eb8d4f58	docker-compose(deps): bump apache/tika in /docker/compose (#12912) Bumps apache/tika from 3.2.3.0 to 3.3.1.0. --- updated-dependencies: - dependency-name: apache/tika dependency-version: 3.3.1.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-03 13:13:14 -07:00
dependabot[bot] and GitHub	eab964124d	docker-compose(deps): bump gotenberg/gotenberg in /docker/compose (#12910) Bumps gotenberg/gotenberg from 8.27 to 8.33. --- updated-dependencies: - dependency-name: gotenberg/gotenberg dependency-version: '8.33' dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-03 12:40:18 -07:00
Trenton H and GitHub	7ef6ba69e6	Fix: Validate the AI backend settings earlier instead of crashing inside the AI module (#12903)	2026-06-03 12:16:09 -07:00
dependabot[bot] and GitHub	2e9b07b77f	docker-compose(deps): Bump nginx in /docker/compose (#12911) Bumps nginx from 1.29.5-alpine to 1.31.1-alpine. --- updated-dependencies: - dependency-name: nginx dependency-version: 1.31.1-alpine dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-03 11:41:13 -07:00
Trenton H and GitHub	abdcdccf08	Chore(deps): Silence a couple more vulnerabilities here (#12797)	2026-06-03 09:28:00 -07:00
shamoon and GitHub	1663ed170c	Enhancement (beta): add direct LLM language setting (#12906)	2026-06-03 15:53:22 +00:00
dependabot[bot], GitHub and shamoon	59f22a3d59	Chore(deps-dev): Bump @playwright/test from 1.59.1 to 1.60.0 in /src-ui (#12919) Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com> Signed-off-by: dependabot[bot] <support@github.com>	2026-06-03 15:49:50 +00:00
shamoon and GitHub	47a6fcfc39	Fix (beta): correctly apply i18n in suggestions dropdown (#12905)	2026-06-03 08:40:06 -07:00
dependabot[bot] and GitHub	edcc78d450	Chore(deps-dev): Bump @types/node from 25.6.0 to 25.9.1 in /src-ui (#12915) Bumps [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node) from 25.6.0 to 25.9.1. - [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases) - [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node) --- updated-dependencies: - dependency-name: "@types/node" dependency-version: 25.9.1 dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-03 15:26:15 +00:00

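Commit a9d339157f routes every mutating index operation (upsert, delete, rebuild, compact) through a write_store() context manager that holds a FileLock, while read paths call get_vector_store() without locking. A minimal sketch of that pattern follows, under stated assumptions: a threading.Lock stands in for the cross-process filelock.FileLock on settings.LLM_INDEX_LOCK, and open_store and FakeStore are hypothetical stand-ins for get_vector_store() and the LanceDB store.

```python
import threading
from collections.abc import Callable, Iterator
from contextlib import AbstractContextManager, contextmanager
from typing import Any, TypeVar

StoreT = TypeVar("StoreT")

# In-process stand-in for a cross-process lock such as filelock.FileLock.
INDEX_LOCK = threading.Lock()


@contextmanager
def write_store(
    open_store: Callable[[], StoreT],
    lock: AbstractContextManager[Any] = INDEX_LOCK,
) -> Iterator[StoreT]:
    """Hold the index lock for the whole mutation; release it even on error."""
    with lock:
        yield open_store()


class FakeStore:
    """Hypothetical in-memory store used only to demonstrate the pattern."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def upsert_document(self, doc_id: str, text: str) -> None:
        self.rows[doc_id] = text


store = FakeStore()
with write_store(lambda: store) as locked_store:
    locked_store.upsert_document("1", "hello")
```

Keeping the lock inside one context manager means a call site cannot forget to acquire it, and an exception raised inside the with block still releases it on the way out.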
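
Commits 17755a2c58 and e3ebe7cda1 describe reading a dict[doc_id, modified_str] of stored modification times (with the vector column projected away) and moving the modified check into the vector store. One way such a comparison could drive an incremental update is sketched below. plan_index_update is a hypothetical helper, and the sketch assumes both sides serialize the modified timestamp in the same string format.

```python
def plan_index_update(
    current_modified: dict[str, str],
    stored_modified: dict[str, str],
) -> tuple[list[str], list[str]]:
    """Return (ids to re-embed, ids to delete) by comparing modified stamps.

    Hypothetical helper: both dicts map a document id to its modified time,
    serialized the same way (for example with datetime.isoformat()).
    """
    # New documents, or documents whose stored stamp differs, need (re)indexing.
    to_upsert = sorted(
        doc_id
        for doc_id, modified in current_modified.items()
        if stored_modified.get(doc_id) != modified
    )
    # Ids still in the index but no longer present in the database are stale.
    to_delete = sorted(set(stored_modified) - set(current_modified))
    return to_upsert, to_delete


current = {"1": "2026-06-01T00:00:00", "2": "2026-06-02T00:00:00", "3": "2026-06-03T00:00:00"}
stored = {"1": "2026-06-01T00:00:00", "2": "2026-05-01T00:00:00", "4": "2026-04-01T00:00:00"}
print(plan_index_update(current, stored))  # (['2', '3'], ['4'])
```

Only the id and modified columns are needed for this comparison, which is why skipping the vector column keeps the scan cheap.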