paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2026-04-03 14:48:50 +00:00

Author	SHA1	Message	Date
shamoon	51880b04f8	Anchor later query tokens in regex search	2026-04-02 16:03:39 -07:00
shamoon	e578090c30	Fix sonar thing, 1 more coverage line	2026-04-02 16:03:39 -07:00
shamoon	d9d3b4f81b	Add deprecated filter logging	2026-04-02 16:03:39 -07:00
shamoon	2184fb460d	Drop backwards migration	2026-04-02 16:03:39 -07:00
shamoon	97c804709a	De-dupe this	2026-04-02 16:03:39 -07:00
shamoon	6df75a03bd	Document TITLE search mode in TantivyBackend	2026-04-02 16:03:39 -07:00
shamoon	8f90aa368b	Rename migration	2026-04-02 16:03:39 -07:00
shamoon	2171b735cf	Switch simple substring search to simple_search analyzer	2026-04-02 16:03:39 -07:00
shamoon	a3da9fd8e0	Ok make it a proper filter type	2026-04-02 16:03:39 -07:00
shamoon	13e97ffa0f	Add a couple deprecation notes	2026-04-02 16:03:39 -07:00
shamoon	a0639f4830	Backend tests	2026-04-02 16:03:39 -07:00
shamoon	55f5404afb	Use tantivy for global search too	2026-04-02 16:03:39 -07:00
shamoon	7dfba1f38f	Wire the simple searches to view	2026-04-02 16:03:39 -07:00
shamoon	24efaeb5a6	Add a simple title query	2026-04-02 16:03:39 -07:00
shamoon	0a76c71925	Add simple text search mode and API param	2026-04-02 16:03:39 -07:00
GitHub Actions	83501757df	Auto translate strings	2026-04-02 22:36:32 +00:00
Trenton H	dda05a7c00	Security: Improve overall security in a few ways (#12501) - Make sure we're always using regex with timeouts for user controlled data - Adds rate limiting to the token endpoint (configurable) - Signs the classifier pickle file with the SECRET_KEY and refuse to load one which doesn't verify. - Require the user to set a secret key, instead of falling back to our old hard coded one	2026-04-02 15:30:26 -07:00
Trenton H	376af81b9c	Fix: Resolve another TC assuming an object has been created somewhere (#12503)	2026-04-02 14:58:28 -07:00
GitHub Actions	05c9e21fac	Auto translate strings	2026-04-02 19:40:05 +00:00
Trenton H	aed9abe48c	Feature: Replace Whoosh with tantivy search backend (#12471) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Antoine Mérino <3023499+Merinorus@users.noreply.github.com>	2026-04-02 12:38:22 -07:00
GitHub Actions	2aa0c9f0b4	Auto translate strings	2026-03-31 18:25:03 +00:00
shamoon	d2328b776a	Performance: support bulk edit without id lists (#12355)	2026-03-31 18:23:28 +00:00
GitHub Actions	e1da2a1efe	Auto translate strings	2026-03-31 14:57:34 +00:00
shamoon	245514ad10	Performance: deprecate and remove usage of `all` in API results (#12309)	2026-03-31 07:55:59 -07:00
GitHub Actions	020057e1a4	Auto translate strings	2026-03-30 16:40:47 +00:00
shamoon	f715533770	Performance: support passing selection data with filtered document requests (#12300)	2026-03-30 16:38:52 +00:00
Jan Kleine	0292edbee7	Fixhancement: include trashed documents in document exporter/importer (#12425)	2026-03-30 16:30:22 +00:00
Andreas Schneider	85e0d1842a	Tests: add regression test for redis URL with empty username (#12460) * Tests: add regression test for redis URL with empty username and password Covers the unix://:SECRET@/path.sock format (empty username, password only), which was missing from the existing test cases for PR #12239. * Update src/paperless/tests/settings/test_custom_parsers.py --------- Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2026-03-29 06:31:18 -07:00
GitHub Actions	62f79c088e	Auto translate strings	2026-03-28 21:00:05 +00:00
shamoon	129da3ade7	Tweakhancement: show file extension in StoragePath test (#12452)	2026-03-28 13:58:33 -07:00
Trenton H	9383471fa0	Feature: Transition all checksums to use SHA256 (#12432)	2026-03-26 11:28:02 -07:00
GitHub Actions	b153ec803b	Auto translate strings	2026-03-26 14:38:10 +00:00
shamoon	ae0474450f	Chore: logger, response and template sanitization cleanup (#12439)	2026-03-26 07:36:02 -07:00
Trenton H	8efb01010c	fix: Don't silently drop the change_groups and switch to a couple slightly more efficient implementations (#12431)	2026-03-26 14:15:42 +00:00
Trenton H	d18bbfa9c3	Chore: Instead of manual temporary directory management, use a context manager (#12430)	2026-03-26 14:05:58 +00:00
GitHub Actions	79def8a200	Auto translate strings	2026-03-22 13:55:02 +00:00
Trenton H	701735f6e5	Chore: Drop old signal and unneeded apps, transition to parser registry instead (#12405 ) * refactor: switch consumer and callers to ParserRegistry (Phase 4) Replace all Django signal-based parser discovery with direct registry calls. Removes `_parser_cleanup`, `parser_is_new_style` shims, and all old-style isinstance checks. All parser instantiation now uses the `with parser_class() as parser:` context manager pattern. - documents/parsers.py: delegate to get_parser_registry(); drop lru_cache - documents/consumer.py: use registry + context manager; remove shims - documents/tasks.py: same pattern - documents/management/commands/document_thumbnails.py: same pattern - documents/views.py: get_metadata uses context manager - documents/checks.py: use get_parser_registry().all_parsers() - paperless/parsers/registry.py: add all_parsers() public method - tests: update mocks to target documents.consumer.get_parser_class_for_mime_type Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor: drop get_parser_class_for_mime_type; callers use registry directly All callers now call get_parser_registry().get_parser_for_file() with the actual filename and path, enabling score() to use file extension hints. The MIME-only helper is removed. - consumer.py: passes self.filename + self.working_copy - tasks.py: passes document.original_filename + document.source_path - document_thumbnails.py: same pattern - views.py: passes Path(file).name + Path(file) - parsers.py: internal helpers inline the registry call with filename="" - test_parsers.py: drop TestParserDiscovery (was testing mock behavior); TestParserAvailability uses registry directly - test_consumer.py: mocks switch to documents.consumer.get_parser_registry Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor: remove document_consumer_declaration signal infrastructure Remove the document_consumer_declaration signal that was previously used for parser registration. 
Each parser app no longer connects to this signal, and the signal declaration itself has been removed from documents/signals. Changes: - Remove document_consumer_declaration from documents/signals/__init__.py - Remove ready() methods and signal imports from all parser app configs - Delete signal shim files (signals.py) from all parser apps: - paperless_tesseract/signals.py - paperless_text/signals.py - paperless_tika/signals.py - paperless_mail/signals.py - paperless_remote/signals.py Parser discovery now happens exclusively through the ParserRegistry system introduced in the previous refactor phases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor: remove empty paperless_text and paperless_tika Django apps After parser classes were moved to paperless/parsers/ in the plugin refactor, these Django apps contained only empty AppConfig classes with no models, views, tasks, migrations, or other functionality. - Remove paperless_text and paperless_tika from INSTALLED_APPS - Delete empty app directories entirely - Update pyproject.toml test exclusions - Clean stale mypy baseline entries for moved parser files paperless_remote app is retained as it contains meaningful system checks for Azure AI configuration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Moves the checks and tests to the main application and removes the old applications * Adds a comment to satisy Sonar * refactor: remove automatic log_summary() call from get_parser_registry() The summary was logged once per process, causing it to appear repeatedly during Docker startup (management commands, web server, each Celery worker subprocess). External parsers are already announced individually at INFO when discovered; the full summary is redundant noise. log_summary() is retained on ParserRegistry for manual/debug use. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Cleans up the duplicate test file/fixture * Fixes a race condition where webserver threads could race to populate the registry --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-22 06:53:32 -07:00
GitHub Actions	07f54bfdab	Auto translate strings	2026-03-21 09:26:23 +00:00
shamoon	0f84af27d0	Merge branch 'main' into dev # Conflicts: # docs/setup.md # src-ui/src/main.ts # src/documents/tests/test_api_bulk_edit.py # src/documents/tests/test_api_custom_fields.py # src/documents/tests/test_api_search.py # src/documents/tests/test_api_status.py # src/documents/tests/test_workflows.py # src/paperless_mail/tests/test_api.py	2026-03-21 02:12:19 -07:00
shamoon	9646b8c67d	Bump version to 2.20.13	2026-03-21 01:50:04 -07:00
shamoon	e590d7df69	Merge branch 'release/v2.20.x'	2026-03-21 01:49:32 -07:00
shamoon	cc71aad058	Fix: suggest corrections only if visible results	2026-03-21 01:24:23 -07:00
shamoon	3cbdf5d0b7	Fix: require view permission for more-like search	2026-03-21 01:20:59 -07:00
shamoon	f84e0097e5	Fix validate document link targets	2026-03-21 00:55:36 -07:00
shamoon	7dbf8bdd4a	Fix: enforce permissions when attaching accounts to mail rules	2026-03-21 00:44:28 -07:00
shamoon	2cb155e717	Bump version to 2.20.12	2026-03-20 15:47:37 -07:00
shamoon	9e9fc6213c	Resolve GHSA-96jx-fj7m-qh6x	2026-03-20 15:39:15 -07:00
Trenton H	a9756f9462	Chore: Convert Tesseract parser to plugin style (#12403 ) * Move tesseract parser, tests, and samples to paperless.parsers Relocates files in preparation for the Phase 3 Protocol-based parser refactor, preserving full git history via rename. - src/paperless_tesseract/parsers.py -> src/paperless/parsers/tesseract.py - src/paperless_tesseract/tests/test_parser.py -> src/paperless/tests/parsers/test_tesseract_parser.py - src/paperless_tesseract/tests/test_parser_custom_settings.py -> src/paperless/tests/parsers/test_tesseract_custom_settings.py - src/paperless_tesseract/tests/samples/* -> src/paperless/tests/samples/tesseract/ - Moves RUF001 suppression from broad per-file pyproject.toml ignore to inline noqa comments on the two affected lines Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Refactor RasterisedDocumentParser to ParserProtocol interface - Add RasterisedDocumentParser to registry.register_defaults() - Update parser class: remove DocumentParser inheritance, add Protocol class attrs/classmethods/properties, context-manager lifecycle - Add read_file_handle_unicode_errors() to shared parsers/utils.py - Replace inline unicode-error-handling with shared utility call Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Update tesseract signals.py to import from new parser location RasterisedDocumentParser moved to paperless.parsers.tesseract; update the lazy import in signals.get_parser so the signal-based consumer declaration continues to work during the registry transition. Pop logging_group and progress_callback kwargs for constructor compatibility. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * tests: rewrite test_tesseract_parser to pytest style with typed fixtures - Converts all tests from Django TestCase to pytest-style classes - Adds tesseract_samples_dir, null_app_config, tesseract_parser, and make_tesseract_parser fixtures in conftest.py; all DB-free except TestOcrmypdfParameters which uses @pytest.mark.django_db - Defines MakeTesseractParser type alias in conftest.py for autocomplete - Fixes FBT001 (boolean positional args) by making bool params keyword-only with * separator in parametrize test signatures - Adds type annotations to all fixture parameters for IDE support - Uses pytest.param(..., id="...") throughout; pytest-mock for patching Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(types): fully annotate paperless/parsers/tesseract.py Fixes all mypy and pyrefly errors in the new parser file: - Add missing type annotations to is_image, has_alpha, get_dpi, calculate_a4_dpi, construct_ocrmypdf_parameters, post_process_text - Narrow Path-only (no str) for image helper args; convert to str when building list[str] args for run_subprocess - Annotate ocrmypdf_args as dict[str, Any] so operator expressions on its values type-check and ocrmypdf.ocr(*args) resolves cleanly - Declare text: str \| None = None at top of extract_text to unify all assignments to the same type across both branches - Import Any from typing Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Fixes isort * fix: add RasterisedDocumentParser to new-style parser shim checks The new RasterisedDocumentParser uses __enter__/__exit__ for resource management instead of cleanup(). 
Update all existing new-style shims to include it in the isinstance checks: - documents/consumer.py: _parser_cleanup(), parser_is_new_style - documents/tasks.py: parser_is_new_style, finally cleanup branch (also adds RemoteDocumentParser which was missing from the latter) - documents/management/commands/document_thumbnails.py: adds new-style handling from scratch (enter/exit + 2-arg get_thumbnail signature) Fix stale import paths in three test files that were still importing from paperless_tesseract.parsers instead of paperless.parsers.tesseract. Fix two registry tests that used application/pdf as a proxy for "no handler" — now that RasterisedDocumentParser is registered, PDF always has a handler, so switch to a truly unsupported MIME type. Signal infrastructure and shims remain intact; this is plumbing only. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * One missed import (cherry pick?) * Adds a no cover for a special case of handling unicode errors in PDF metadata --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-20 12:46:07 -07:00
Trenton H	c2b8b22fb4	Chore: Convert mail parser to plugin style (#12397 ) * Refactor(mail): rename paperless_mail/parsers.py → paperless/parsers/mail.py Preserve git history for MailDocumentParser by committing the rename separately before editing, following the project convention. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Refactor(mail): move mail parser tests to paperless/tests/parsers/ Move test_parsers.py → test_mail_parser.py and test_parsers_live.py → test_mail_parser_live.py alongside the other built-in parser tests, preserving git history before editing. Update MailDocumentParser import to the new canonical location. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Chore: move mail parser sample files to paperless/tests/samples/mail/ Relocate all mail test fixtures from src/paperless_mail/tests/samples/ to src/paperless/tests/samples/mail/ ahead of the parser plugin refactor. Add the new path to the codespell skip list to prevent false-positive spell corrections in binary/fixture email files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Feat(tests): add mail parser fixtures to paperless/tests/parsers/conftest.py Add mail_samples_dir, per-file sample fixtures, and mail_parser (context-manager style) to mirror the old paperless_mail conftest but rooted at the new samples/mail/ location. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Feat(parsers): migrate MailDocumentParser to ParserProtocol Move the mail parser from paperless_mail/parsers.py to paperless/parsers/mail.py and refactor it to implement ParserProtocol: - Class-level name/version/author/url attributes - supported_mime_types() and score() classmethods (score=20) - can_produce_archive=False, requires_pdf_rendition=True - Context manager lifecycle (__enter__/__exit__) - New parse() signature without mailrule_id kwarg; consumer sets parser.mailrule_id before calling parse() instead - get_text()/get_date()/get_archive_path() accessor methods - extract_metadata() returning email headers and attachment info Register MailDocumentParser in the ParserRegistry alongside Text and Tika parsers. Update consumer, signals, and all import sites to use the new location. Update tests to use the new accessor API, patch paths, and context-manager fixture. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix(parsers): pop legacy constructor args in mail signal wrapper MailDocumentParser.__init__ takes no constructor args in the new protocol. Update the get_parser() signal wrapper to pop logging_group and progress_callback (passed by the legacy consumer dispatch path) before instantiating — the same pattern used by TextDocumentParser. Also update test_mail_parser_receives_mailrule to use the real signal wrapper (mail_get_parser) instead of MailDocumentParser directly, so the test exercises the actual dispatch path and matches the new parse() call signature (no mailrule kwarg). 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Bumps this so we can run * Fixes location of the fixture * Removes fixtures which were duplicated * Feat(parsers): add ParserContext and configure() to ParserProtocol Replace the ad-hoc mailrule_id attribute assignment with a typed, immutable ParserContext dataclass and a configure() method on the Protocol: - ParserContext(frozen=True, slots=True) lives in paperless/parsers/ alongside ParserProtocol and MetadataEntry; currently carries only mailrule_id but is designed to grow with output_type, ocr_mode, and ocr_language in a future phase (decoupling parsers from settings.) - ParserProtocol.configure(context: ParserContext) -> None is the extension point; no-op by default - MailDocumentParser.configure() reads mailrule_id into _mailrule_id - TextDocumentParser and TikaDocumentParser implement a no-op configure() - Consumer calls document_parser.configure(ParserContext(...)) before parse(), replacing the isinstance(parser, MailDocumentParser) guard and the direct attribute mutation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Feat(parsers): call configure(ParserContext()) in update_document task Apply the same new-style parser shim pattern as the consumer to update_document_content_maybe_archive_file: - Call __enter__ for Text/Tika parsers after instantiation - Call configure(ParserContext()) before parse() for all new-style parsers (mailrule_id is not available here — this is a re-process of an existing document, so the default empty context is correct) - Call parse(path, mime_type) with 2 args for new-style parsers - Call get_thumbnail(path, mime_type) with 2 args for new-style parsers - Call __exit__ instead of cleanup() in the finally block Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix(tests): add configure() to DummyParser and missing-method parametrize ParserProtocol now requires configure(context: ParserContext) -> None. 
Update DummyParser in test_registry.py to implement it, and add 'missing-configure' to the test_partial_compliant_fails_isinstance parametrize list so the new method is covered by the negative test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Cleans up the reprocess task and generally reduces duplicate of classes * Corrects the score return * Updates so we can report a page count for these parsers, assuming we do have an archive produced when called * Increases test coverage * One more coverage * Updates typing * Updates typing --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-20 09:22:18 -07:00
Trenton H	68fc898042	Fix: Resolve more instances of tests which mutated global states (#12395)	2026-03-19 10:05:07 -07:00


3925 Commits