paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2026-04-02 14:22:43 +00:00

Author	SHA1	Message	Date
shamoon	2cbbbaf170	Add a couple deprecation notes	2026-04-01 11:08:01 -07:00
shamoon	71889e6e90	Use tantivy for global search too	2026-04-01 10:58:39 -07:00
shamoon	9139507bd6	Wire the simple searches to view	2026-04-01 10:18:26 -07:00
shamoon	631074e4ed	Add simple text search mode and API param	2026-04-01 10:08:27 -07:00
Trenton H	eac4a6ca05	Merge remote-tracking branch 'origin/dev' into feature-tantivy-search-backend Hopefully the conflicts are good	2026-03-31 11:57:10 -07:00
shamoon	d2328b776a	Performance: support bulk edit without id lists (#12355)	2026-03-31 18:23:28 +00:00
shamoon	245514ad10	Performance: deprecate and remove usage of `all` in API results (#12309)	2026-03-31 07:55:59 -07:00
Trenton H	eaa23751de	Merge: resolve conflict with include_selection_data from #12300 The origin/dev branch added include_selection_data support to the search list() view (PR #12300). Our Tantivy list() had replaced the Whoosh implementation entirely, causing a conflict. Resolution: keep the Tantivy implementation and incorporate the include_selection_data feature. When requested, selection_data is computed over all matching document IDs from ordered_hits (the full Tantivy result set, not just the current page). Also update test_search_with_include_selection_data from #12300 to use the Tantivy indexing API (get_backend().add_or_update) instead of the removed Whoosh AsyncWriter. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 17:08:39 -07:00
Trenton H	50f6b2d4c3	feat(search): wire Tantivy backend into all callsites; remove Whoosh - Replace all `from documents import index` + Whoosh writer usage across admin.py, bulk_edit.py, tasks.py, views.py, signals/handlers.py with `get_backend().add_or_update/remove/batch_update` - Add `effective_content` param to `_build_tantivy_doc` / `add_or_update` (used by signal handler to re-index root doc with version's OCR text) - Add `wipe_index()` (renamed from `_wipe_index`) to public API; use from `document_index --recreate` flag - `index_optimize()` replaced with deprecation log message; Tantivy manages segment merging automatically - `index_reindex()` now calls `get_backend().rebuild()` + `reset_backend()` with select_related/prefetch_related for efficiency - `document_index` management command: add `--recreate` flag - Status view: use `get_backend()` + dir mtime scan instead of Whoosh `ix.last_modified()` - Delete `documents/index.py`, `test_index.py`, `test_delayedquery.py` - Update all tests: patch `documents.search.get_backend` (lazy imports); `DirectoriesMixin` calls `reset_backend()` in setUp/tearDown; `TestDocumentConsumptionFinishedSignal` likewise - `test_api_search.py`: fix order-independent assertions for date-range queries; fix `_rewrite_8digit_date` to be field-aware and timezone-correct for DateTimeField vs DateField Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 10:43:30 -07:00
shamoon	f715533770	Performance: support passing selection data with filtered document requests (#12300)	2026-03-30 16:38:52 +00:00
shamoon	129da3ade7	Tweakhancement: show file extension in StoragePath test (#12452)	2026-03-28 13:58:33 -07:00
shamoon	ae0474450f	Chore: logger, response and template sanitization cleanup (#12439)	2026-03-26 07:36:02 -07:00
Trenton H	701735f6e5	Chore: Drop old signal and unneeded apps, transition to parser registry instead (#12405 ) * refactor: switch consumer and callers to ParserRegistry (Phase 4) Replace all Django signal-based parser discovery with direct registry calls. Removes `_parser_cleanup`, `parser_is_new_style` shims, and all old-style isinstance checks. All parser instantiation now uses the `with parser_class() as parser:` context manager pattern. - documents/parsers.py: delegate to get_parser_registry(); drop lru_cache - documents/consumer.py: use registry + context manager; remove shims - documents/tasks.py: same pattern - documents/management/commands/document_thumbnails.py: same pattern - documents/views.py: get_metadata uses context manager - documents/checks.py: use get_parser_registry().all_parsers() - paperless/parsers/registry.py: add all_parsers() public method - tests: update mocks to target documents.consumer.get_parser_class_for_mime_type Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor: drop get_parser_class_for_mime_type; callers use registry directly All callers now call get_parser_registry().get_parser_for_file() with the actual filename and path, enabling score() to use file extension hints. The MIME-only helper is removed. - consumer.py: passes self.filename + self.working_copy - tasks.py: passes document.original_filename + document.source_path - document_thumbnails.py: same pattern - views.py: passes Path(file).name + Path(file) - parsers.py: internal helpers inline the registry call with filename="" - test_parsers.py: drop TestParserDiscovery (was testing mock behavior); TestParserAvailability uses registry directly - test_consumer.py: mocks switch to documents.consumer.get_parser_registry Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor: remove document_consumer_declaration signal infrastructure Remove the document_consumer_declaration signal that was previously used for parser registration. 
Each parser app no longer connects to this signal, and the signal declaration itself has been removed from documents/signals. Changes: - Remove document_consumer_declaration from documents/signals/__init__.py - Remove ready() methods and signal imports from all parser app configs - Delete signal shim files (signals.py) from all parser apps: - paperless_tesseract/signals.py - paperless_text/signals.py - paperless_tika/signals.py - paperless_mail/signals.py - paperless_remote/signals.py Parser discovery now happens exclusively through the ParserRegistry system introduced in the previous refactor phases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor: remove empty paperless_text and paperless_tika Django apps After parser classes were moved to paperless/parsers/ in the plugin refactor, these Django apps contained only empty AppConfig classes with no models, views, tasks, migrations, or other functionality. - Remove paperless_text and paperless_tika from INSTALLED_APPS - Delete empty app directories entirely - Update pyproject.toml test exclusions - Clean stale mypy baseline entries for moved parser files paperless_remote app is retained as it contains meaningful system checks for Azure AI configuration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Moves the checks and tests to the main application and removes the old applications * Adds a comment to satisfy Sonar * refactor: remove automatic log_summary() call from get_parser_registry() The summary was logged once per process, causing it to appear repeatedly during Docker startup (management commands, web server, each Celery worker subprocess). External parsers are already announced individually at INFO when discovered; the full summary is redundant noise. log_summary() is retained on ParserRegistry for manual/debug use. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Cleans up the duplicate test file/fixture * Fixes a race condition where webserver threads could race to populate the registry --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-22 06:53:32 -07:00
shamoon	0f84af27d0	Merge branch 'main' into dev # Conflicts: # docs/setup.md # src-ui/src/main.ts # src/documents/tests/test_api_bulk_edit.py # src/documents/tests/test_api_custom_fields.py # src/documents/tests/test_api_search.py # src/documents/tests/test_api_status.py # src/documents/tests/test_workflows.py # src/paperless_mail/tests/test_api.py	2026-03-21 02:12:19 -07:00
shamoon	3cbdf5d0b7	Fix: require view permission for more-like search	2026-03-21 01:20:59 -07:00
shamoon	9e9fc6213c	Resolve GHSA-96jx-fj7m-qh6x	2026-03-20 15:39:15 -07:00
shamoon	87ebd13abc	Fix: remove pagination from document notes api spec (#12388)	2026-03-18 06:48:05 -07:00
Trenton H	aea2927a02	Feature: Convert Tika parser to the plugin system (#12333 ) * Chore: move Tika parser and tests to paperless/ Move TikaDocumentParser and its tests to the canonical parser package location, matching the pattern established for TextDocumentParser: - src/paperless_tika/parsers.py → src/paperless/parsers/tika.py - src/paperless_tika/tests/test_tika_parser.py → src/paperless/tests/parsers/test_tika_parser.py - src/paperless_tika/tests/samples/ → src/paperless/tests/samples/tika/ Merge tika fixtures (tika_parser, sample_odt_file, sample_docx_file, sample_doc_file, sample_broken_odt) into the shared parsers conftest. Remove the now-empty src/paperless_tika/tests/conftest.py. Content is unchanged — this commit is rename-only so git history is preserved on the moved files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Feature: Phase 3 — migrate TikaDocumentParser to ParserProtocol Refactor TikaDocumentParser to satisfy ParserProtocol without subclassing the legacy DocumentParser ABC: - Add ClassVars: name, version, author, url - Add supported_mime_types() classmethod (12 Office/ODF/RTF MIME types) - Add score() classmethod — returns None when TIKA_ENABLED is False, 10 otherwise - can_produce_archive = False (PDF is for display, not an OCR archive) - requires_pdf_rendition = True (Office formats need PDF for browser display) - __enter__/__exit__ via ExitStack: TikaClient opened once per parser lifetime and shared across parse() and extract_metadata() calls - extract_metadata() falls back to a short-lived TikaClient when called outside a context manager (legacy view-layer metadata path) - _convert_to_pdf() uses OutputTypeConfig() to honour the database-stored ApplicationConfiguration before falling back to the env-var setting - Rename convert_to_pdf → _convert_to_pdf (private helper) Update paperless_tika/signals.py shim to import from the new module path and drop the legacy logging_group/progress_callback kwargs. 
Update documents/consumer.py to extend the existing TextDocumentParser special cases to also cover TikaDocumentParser (parse/get_thumbnail signatures, __exit__ cleanup). Add TestTikaParserRegistryInterface (7 tests) covering score(), properties, and ParserProtocol isinstance check. Update existing tests to use the new accessor API (get_text, get_date, get_archive_path, _convert_to_pdf). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix: update remaining imports and move live Tika tests after parser migration - src/documents/tests/test_parsers.py: import TikaDocumentParser from paperless.parsers.tika (old paperless_tika.parsers no longer exists) - git mv paperless_tika/tests/test_live_tika.py → paperless/tests/parsers/test_live_tika.py to co-locate all Tika tests with the parser; update import and replace old attribute API (tika_parser.text/.archive_path) with accessor methods (get_text/get_archive_path) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix: satisfy mypy and pyrefly for TikaDocumentParser Use a TYPE_CHECKING-guarded assert to narrow self._tika_client from TikaClient \| None to TikaClient at the point of use in parse(). The assert is visible to type checkers (TYPE_CHECKING=True) so both mypy and pyrefly accept the subsequent attribute accesses without error; at runtime TYPE_CHECKING is False so the assert never executes and no ruff S101 suppression is required. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix: require context manager for TikaDocumentParser; clean up client lifecycle - consumer.py: call __enter__ for new-style parsers so _tika_client and _gotenberg_client are set before parse() is invoked - views.py: use `with parser` (via nullcontext for old-style parsers) in get_metadata so extract_metadata always runs inside a context manager - tika.py: GotenbergClient added to ExitStack alongside TikaClient; inline client creation removed from extract_metadata and _convert_to_pdf; __exit__ uses ExitStack.close() instead of __exit__ pass-through Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-17 15:43:28 -07:00
shamoon	2bb4af2be6	Change: sort custom fields alphabetically by default (#12358)	2026-03-15 22:52:02 -07:00
shamoon	45b363659e	Chore: mark document detail email action as deprecated (#12308)	2026-03-12 15:42:14 +00:00
shamoon	86573fc1a0	Chore: separate actions from bulk edit endpoint (#12286)	2026-03-10 18:55:36 +00:00
shamoon	85a18e5911	Enhancement: saved view sharing (#12142)	2026-03-04 14:15:43 -08:00
shamoon	d51a118aac	Merge branch 'main' into dev	2026-03-04 13:31:20 -08:00
shamoon	5b809122b5	Fix: apply ordering after annotating tag document count (#12238)	2026-03-04 00:33:13 -08:00
shamoon	96ac7b2336	Tweak: Ignore version docs for workflows (#12217)	2026-03-02 08:21:14 -08:00
shamoon	f65807b906	Merge branch 'main' into dev # Conflicts: # docs/setup.md # src-ui/src/app/components/manage/document-attributes/management-list/management-list.component.ts # src/documents/tests/test_api_documents.py	2026-02-28 02:31:20 -08:00
shamoon	c7f83212a3	Enforce on selection_data too	2026-02-28 01:27:40 -08:00
Jan Kleine	c86ebc0260	Enhancement: Formatted filename for single document downloads (#12095) --------- Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2026-02-26 18:06:47 +00:00
shamoon	ceee769e26	Feature: document file versions (#12061)	2026-02-26 16:46:54 +00:00
Trenton H	53ac338946	Breaking: Removes API v1 and the related serializer (#12166)	2026-02-26 04:06:43 +00:00
shamoon	426c0a8974	Chore: typing fixes	2026-02-16 09:54:06 -08:00
shamoon	4884b67714	Fix more typing failures	2026-02-16 09:37:33 -08:00
shamoon	02896a15fd	Fix: only pass user to SerializerWithPerms serializers	2026-02-16 09:31:33 -08:00
shamoon	d8e07b8d84	Fix typing issue	2026-02-16 09:17:20 -08:00
shamoon	be4e29a19c	Merge branch 'main' into dev	2026-02-16 09:01:19 -08:00
shamoon	afaf39e43a	Fix/GHSA-x395-6h48-wr8v	2026-02-16 00:02:15 -08:00
shamoon	5b45b89d35	Performance fix: use subqueries to improve object retrieval in large installs (#11950)	2026-02-05 08:46:32 -08:00
Trenton H	2ec8ec96c8	Feature: Enable users to customize date parsing via plugins (#11931)	2026-02-03 20:09:13 +00:00
Sebastian Steinbeißer	3b5ffbf9fa	Chore(mypy): Annotate `None` returns for typing improvements (#11213)	2026-02-02 08:44:12 -08:00
shamoon	c3b036e0d3	Merge branch 'main' into dev	2026-01-31 09:10:33 -08:00
shamoon	c8c4c7c749	Security: enforce permissions for post_document	2026-01-30 12:14:18 -08:00
shamoon	e4b861d76f	Fix: prevent note deletion outside doc	2026-01-29 13:35:01 -08:00
shamoon	1f074390e4	Feature: sharelink bundles (#11682)	2026-01-27 18:54:51 +00:00
Antoine Mérino	df07b8a03e	Performance: faster statistics panel on dashboard (#11760)	2026-01-26 12:10:57 -08:00
shamoon	857aaca493	Merge branch 'release/v2.20.x' into dev	2026-01-26 09:25:58 -08:00
shamoon	891f4a2faf	Fix: correctly extract all ids for nested tags (#11888)	2026-01-26 09:12:03 -08:00
shamoon	2312314aa7	Performance: improve treenode inefficiencies (#11606)	2026-01-25 21:47:08 -08:00
Trenton H	d0032c18be	Breaking: Remove support for document and thumbnail encryption (#11850)	2026-01-24 19:29:54 -08:00
shamoon	00bb92e3e1	Fix: support ordering by storage path name (#11661)	2026-01-13 09:36:14 -08:00
shamoon	e940764fe0	Feature: Paperless AI (#10319)	2026-01-13 16:24:42 +00:00

444 Commits