paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2026-04-20 23:09:28 +00:00

Author	SHA1	Message	Date
Trenton H	c232d443fa	Breaking: Decouple OCR control from archive file control (#12448 ) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2026-04-06 15:50:21 -07:00
Trenton H	9383471fa0	Feature: Transition all checksums to use SHA256 (#12432 )	2026-03-26 11:28:02 -07:00
Trenton H	701735f6e5	Chore: Drop old signal and unneeded apps, transition to parser registry instead (#12405 ) * refactor: switch consumer and callers to ParserRegistry (Phase 4) Replace all Django signal-based parser discovery with direct registry calls. Removes `_parser_cleanup`, `parser_is_new_style` shims, and all old-style isinstance checks. All parser instantiation now uses the `with parser_class() as parser:` context manager pattern. - documents/parsers.py: delegate to get_parser_registry(); drop lru_cache - documents/consumer.py: use registry + context manager; remove shims - documents/tasks.py: same pattern - documents/management/commands/document_thumbnails.py: same pattern - documents/views.py: get_metadata uses context manager - documents/checks.py: use get_parser_registry().all_parsers() - paperless/parsers/registry.py: add all_parsers() public method - tests: update mocks to target documents.consumer.get_parser_class_for_mime_type Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor: drop get_parser_class_for_mime_type; callers use registry directly All callers now call get_parser_registry().get_parser_for_file() with the actual filename and path, enabling score() to use file extension hints. The MIME-only helper is removed. - consumer.py: passes self.filename + self.working_copy - tasks.py: passes document.original_filename + document.source_path - document_thumbnails.py: same pattern - views.py: passes Path(file).name + Path(file) - parsers.py: internal helpers inline the registry call with filename="" - test_parsers.py: drop TestParserDiscovery (was testing mock behavior); TestParserAvailability uses registry directly - test_consumer.py: mocks switch to documents.consumer.get_parser_registry Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor: remove document_consumer_declaration signal infrastructure Remove the document_consumer_declaration signal that was previously used for parser registration. 
Each parser app no longer connects to this signal, and the signal declaration itself has been removed from documents/signals. Changes: - Remove document_consumer_declaration from documents/signals/__init__.py - Remove ready() methods and signal imports from all parser app configs - Delete signal shim files (signals.py) from all parser apps: - paperless_tesseract/signals.py - paperless_text/signals.py - paperless_tika/signals.py - paperless_mail/signals.py - paperless_remote/signals.py Parser discovery now happens exclusively through the ParserRegistry system introduced in the previous refactor phases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor: remove empty paperless_text and paperless_tika Django apps After parser classes were moved to paperless/parsers/ in the plugin refactor, these Django apps contained only empty AppConfig classes with no models, views, tasks, migrations, or other functionality. - Remove paperless_text and paperless_tika from INSTALLED_APPS - Delete empty app directories entirely - Update pyproject.toml test exclusions - Clean stale mypy baseline entries for moved parser files paperless_remote app is retained as it contains meaningful system checks for Azure AI configuration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Moves the checks and tests to the main application and removes the old applications * Adds a comment to satisy Sonar * refactor: remove automatic log_summary() call from get_parser_registry() The summary was logged once per process, causing it to appear repeatedly during Docker startup (management commands, web server, each Celery worker subprocess). External parsers are already announced individually at INFO when discovered; the full summary is redundant noise. log_summary() is retained on ParserRegistry for manual/debug use. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Cleans up the duplicate test file/fixture * Fixes a race condition where webserver threads could race to populate the registry --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-22 06:53:32 -07:00
Trenton H	c2b8b22fb4	Chore: Convert mail parser to plugin style (#12397 ) * Refactor(mail): rename paperless_mail/parsers.py → paperless/parsers/mail.py Preserve git history for MailDocumentParser by committing the rename separately before editing, following the project convention. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Refactor(mail): move mail parser tests to paperless/tests/parsers/ Move test_parsers.py → test_mail_parser.py and test_parsers_live.py → test_mail_parser_live.py alongside the other built-in parser tests, preserving git history before editing. Update MailDocumentParser import to the new canonical location. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Chore: move mail parser sample files to paperless/tests/samples/mail/ Relocate all mail test fixtures from src/paperless_mail/tests/samples/ to src/paperless/tests/samples/mail/ ahead of the parser plugin refactor. Add the new path to the codespell skip list to prevent false-positive spell corrections in binary/fixture email files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Feat(tests): add mail parser fixtures to paperless/tests/parsers/conftest.py Add mail_samples_dir, per-file sample fixtures, and mail_parser (context-manager style) to mirror the old paperless_mail conftest but rooted at the new samples/mail/ location. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Feat(parsers): migrate MailDocumentParser to ParserProtocol Move the mail parser from paperless_mail/parsers.py to paperless/parsers/mail.py and refactor it to implement ParserProtocol: - Class-level name/version/author/url attributes - supported_mime_types() and score() classmethods (score=20) - can_produce_archive=False, requires_pdf_rendition=True - Context manager lifecycle (__enter__/__exit__) - New parse() signature without mailrule_id kwarg; consumer sets parser.mailrule_id before calling parse() instead - get_text()/get_date()/get_archive_path() accessor methods - extract_metadata() returning email headers and attachment info Register MailDocumentParser in the ParserRegistry alongside Text and Tika parsers. Update consumer, signals, and all import sites to use the new location. Update tests to use the new accessor API, patch paths, and context-manager fixture. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix(parsers): pop legacy constructor args in mail signal wrapper MailDocumentParser.__init__ takes no constructor args in the new protocol. Update the get_parser() signal wrapper to pop logging_group and progress_callback (passed by the legacy consumer dispatch path) before instantiating — the same pattern used by TextDocumentParser. Also update test_mail_parser_receives_mailrule to use the real signal wrapper (mail_get_parser) instead of MailDocumentParser directly, so the test exercises the actual dispatch path and matches the new parse() call signature (no mailrule kwarg). 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Bumps this so we can run * Fixes location of the fixture * Removes fixtures which were duplicated * Feat(parsers): add ParserContext and configure() to ParserProtocol Replace the ad-hoc mailrule_id attribute assignment with a typed, immutable ParserContext dataclass and a configure() method on the Protocol: - ParserContext(frozen=True, slots=True) lives in paperless/parsers/ alongside ParserProtocol and MetadataEntry; currently carries only mailrule_id but is designed to grow with output_type, ocr_mode, and ocr_language in a future phase (decoupling parsers from settings.) - ParserProtocol.configure(context: ParserContext) -> None is the extension point; no-op by default - MailDocumentParser.configure() reads mailrule_id into _mailrule_id - TextDocumentParser and TikaDocumentParser implement a no-op configure() - Consumer calls document_parser.configure(ParserContext(...)) before parse(), replacing the isinstance(parser, MailDocumentParser) guard and the direct attribute mutation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Feat(parsers): call configure(ParserContext()) in update_document task Apply the same new-style parser shim pattern as the consumer to update_document_content_maybe_archive_file: - Call __enter__ for Text/Tika parsers after instantiation - Call configure(ParserContext()) before parse() for all new-style parsers (mailrule_id is not available here — this is a re-process of an existing document, so the default empty context is correct) - Call parse(path, mime_type) with 2 args for new-style parsers - Call get_thumbnail(path, mime_type) with 2 args for new-style parsers - Call __exit__ instead of cleanup() in the finally block Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix(tests): add configure() to DummyParser and missing-method parametrize ParserProtocol now requires configure(context: ParserContext) -> None. 
Update DummyParser in test_registry.py to implement it, and add 'missing-configure' to the test_partial_compliant_fails_isinstance parametrize list so the new method is covered by the negative test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Cleans up the reprocess task and generally reduces duplicate of classes * Corrects the score return * Updates so we can report a page count for these parsers, assuming we do have an archive produced when called * Increases test coverage * One more coverage * Updates typing * Updates typing --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-20 09:22:18 -07:00
shamoon	ca5879a54e	Fix one test with explicit override	2026-03-16 23:03:31 -07:00
shamoon	24a2cfd957	Change: use explicit doc creation instead of clone for versions (#12226 )	2026-03-04 15:57:44 -08:00
shamoon	df03207eef	Fix: correct doc version filename handling (#12223 )	2026-03-04 23:28:07 +00:00
shamoon	d51a118aac	Merge branch 'main' into dev	2026-03-04 13:31:20 -08:00
shamoon	8b8307571a	Fix: enforce path limit for db filename fields (#12235 )	2026-03-03 13:19:56 -08:00
shamoon	ceee769e26	Feature: document file versions (#12061 )	2026-02-26 16:46:54 +00:00
shamoon	6192915be7	Fixhancement: improve ASN handling with PDF operations (#11689 )	2026-02-06 21:14:02 +00:00
Sebastian Steinbeißer	3b5ffbf9fa	Chore(mypy): Annotate `None` returns for typing improvements (#11213 )	2026-02-02 08:44:12 -08:00
shamoon	4428354150	Feature: allow duplicates with warnings, UI for discovery (#11815 )	2026-01-26 18:55:08 +00:00
shamoon	7604a0b583	Fix: prevent ASN collisions for merge operations (#11634 )	2025-12-19 20:05:34 -08:00
sidey79	9e11e7fd05	Enhancement: jinja template support for workflow title assignment (#10700 ) --------- Co-authored-by: Trenton Holmes <797416+stumpylog@users.noreply.github.com> Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2025-09-11 06:56:16 -07:00
Sebastian Steinbeißer	d2064a2535	Chore: switch from os.path to pathlib.Path (#10539 )	2025-09-03 08:12:41 -07:00
shamoon	e97cfb9b5e	Chore: refactor consumer plugin checks to a pre-flight plugin (#9994 )	2025-06-03 19:28:49 +00:00
shamoon	bc2facc87f	Chore: use pathlib in remaining tests	2025-06-03 11:48:17 -07:00
matthesrieke	e9746aa0e3	Enhancement: include DOCUMENT_TYPE to post consume scripts (#9977 ) * expose DOCUMENT_TYPE to post consume scripts * Apply suggestions from code review Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com> --------- Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2025-05-28 23:32:59 +00:00
shamoon	1a6f32534c	Change: treat created as date not datetime (#9793 )	2025-05-16 14:23:04 +00:00
shamoon	edc7181843	Enhancement: support assigning custom field values in workflows (#9272 )	2025-03-05 12:30:19 -08:00
Trenton H	f205c4d0e2	Removes undocumented FileInfo (#9298 )	2025-03-04 13:49:47 -08:00
Silvia Bigler	71472a6a82	Enhancement: add layout options for email conversion (#8907 ) --------- Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2025-02-07 18:32:35 +00:00
Sebastian Steinbeißer	e560fa3be0	Chore: Enable ruff FBT (#8645 )	2025-02-07 09:12:03 -08:00
shamoon	5e687d9a93	Feature: auto-clean some invalid pdfs (#7651 )	2024-09-25 15:57:20 +00:00
shamoon	0ee85aae21	Enhancement: log when pre-check fails for documents in trash (#7355 )	2024-08-05 17:01:01 -07:00
Trenton H	622f624132	Chore: Change the code formatter to Ruff (#6756 ) * Changing the formatting to ruff-format * Replaces references to black to ruff or ruff format, removes black from dependencies	2024-05-18 02:26:50 +00:00
Trenton H	b720aa3cd1	Chore: Convert the consumer to a plugin (#6361 )	2024-04-18 02:59:14 +00:00
Trenton H	2c43b06910	Chore: Standardize subprocess running and logging (#6275 )	2024-04-04 13:11:43 -07:00
shamoon	6d5f4e92cc	Enhancement: title assignment placeholder error handling, fallback (#5282 )	2024-01-10 10:18:55 -08:00
shamoon	f525ac0af6	Chore: add pre-commit hook for codespell (#5324 )	2024-01-08 13:03:05 -08:00
Trenton H	a82e3771ae	Fix: Allows pre-consume scripts to modify the working path again (#5260 ) * Allows pre-consume scripts to modify the working path again and generally cleans up some confusion about working copy vs original	2024-01-05 21:01:57 -08:00
Trenton H	061f33fb05	Feature: Allow setting backend configuration settings via the UI (#5126 ) * Saving some start on this * At least partially working for the tesseract parser * Problems with migration testing need to figure out * Work around that error * Fixes max m_pixels * Moving the settings to main paperless application * Starting some consumer options * More fixes and work * Fixes these last tests * Fix max_length on OcrSettings.mode field * Fix all fields on Common & Ocr settings serializers * Umbrellla config view * Revert "Umbrellla config view" This reverts commit fbaf9f4be30f89afeb509099180158a3406416a5. * Updates to use a single configuration object for all settings * Squashed commit of the following: commit `8a0a49dd57` Author: shamoon <4887959+shamoon@users.noreply.github.com> Date: Tue Dec 19 23:02:47 2023 -0800 Fix formatting commit `66b2d90c50` Author: shamoon <4887959+shamoon@users.noreply.github.com> Date: Tue Dec 19 22:36:35 2023 -0800 Refactor frontend data models commit `5723bd8dd8` Author: Adam Bogdał <adam@bogdal.pl> Date: Wed Dec 20 01:17:43 2023 +0100 Fix: speed up admin panel for installs with a large number of documents (#5052) commit `9b08ce1761` Author: shamoon <4887959+shamoon@users.noreply.github.com> Date: Tue Dec 19 15:18:51 2023 -0800 Update PULL_REQUEST_TEMPLATE.md commit `a6248bec2d` Author: shamoon <4887959+shamoon@users.noreply.github.com> Date: Tue Dec 19 15:02:05 2023 -0800 Chore: Update Angular to v17 (#4980) commit `b1f6f52486` Author: shamoon <4887959+shamoon@users.noreply.github.com> Date: Tue Dec 19 13:53:56 2023 -0800 Fix: Dont allow null custom_fields property via API (#5063) commit `638d9970fd` Author: shamoon <4887959+shamoon@users.noreply.github.com> Date: Tue Dec 19 13:43:50 2023 -0800 Enhancement: symmetric document links (#4907) commit `5e8de4c1da` Author: shamoon <4887959+shamoon@users.noreply.github.com> Date: Tue Dec 19 12:45:04 2023 -0800 Enhancement: shared icon & shared by me filter (#4859) commit 
`088bad9030` Author: Trenton H <797416+stumpylog@users.noreply.github.com> Date: Tue Dec 19 12:04:03 2023 -0800 Bulk updates all the backend libraries (#5061) * Saving some work on frontend config * Very basic but dynamically-generated config form * Saving work on slightly less ugly frontend config * JSON validation for user_args field * Fully dynamic config form * Adds in some additional validators for a nicer error message * Cleaning up the testing and coverage more * Reverts unintentional change * Adds documentation about the settings and the precedence * Couple more commenting and style fixes --------- Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2023-12-29 15:42:56 -08:00
shamoon	f27f25aa03	Enhancement: support assigning custom fields via consumption templates (#4727 )	2023-12-03 15:35:30 -08:00
Trenton H	8d60506884	Standarizes the imports across all the files and modules (#4248 )	2023-09-23 20:17:01 -07:00
shamoon	9712ac109d	Feature: consumption templates (#4196 ) * Initial implementation of consumption templates * Frontend implementation of consumption templates Testing * Support consumption template source * order templates, automatically add permissions * Support title assignment in consumption templates * Refactoring, filters to and, show sources on list Show sources on template list, update some translation strings Make filters and minor testing * Update strings * Only update django-multiselectfield * Basic docs, document some methods * Improve testing coverage, template multi-assignment merges	2023-09-22 16:53:13 -07:00
Trenton Holmes	650c816a7b	Removes support for Python 3.8 and lower from the code base	2023-09-10 11:42:59 -07:00
Trenton H	714995877a	Merge pull request #4037 from andreheuer/dev Enhancement: add task id to pre/post consume script as env	2023-09-08 10:00:05 -07:00
shamoon	61566a34d1	Fix consumer error typo	2023-09-01 00:11:32 -07:00
shamoon	e14f4c94c2	Fix: ghostscript rendering error doesnt trigger frontend failure message (#4092 ) * Raise ParseError from gs rendering error * catch all parser errors as generic exception * Differentiate generic vs parse errors during consumption	2023-08-31 19:49:00 -07:00
André Heuer	88ee3bdb6d	Removed parameter, added documentation	2023-08-29 23:09:47 -07:00
André Heuer	8f8a99a645	Added task id to pre/post consume script as env	2023-08-29 23:09:47 -07:00
Trenton Holmes	07e7bcd30b	Small improvement to the consumer status with stronger typing	2023-07-26 07:03:43 -07:00
Trenton H	802e5591ce	Also handles confirming returned predictions are still automatic matching, in case the classifier hasn't been run since a type was changed	2023-07-24 12:31:56 -07:00
Trenton H	452c79f9a1	Improves the logging mixin and allows it to be typed better	2023-05-23 17:16:39 -07:00
Trenton H	6f163111ce	Upgrades black to v23, upgrades ruff	2023-04-26 09:35:27 -07:00
Trenton H	3bcbd05252	Fixes ruff not running isort against the codebase	2023-04-26 09:35:27 -07:00
Trenton H	ce41ac9158	Configures ruff as the one stop linter and resolves warnings it raised	2023-04-01 17:03:52 -07:00
Trenton Holmes	0df91c31f1	Creates a mix-in for asserting file system states	2023-02-20 10:25:21 -08:00
Trenton Holmes	3e777f2a5b	Fixes up some minor warnings from test code	2023-02-11 14:35:16 -08:00


137 Commits