* refactor: switch consumer and callers to ParserRegistry (Phase 4)
Replace all Django signal-based parser discovery with direct registry
calls. Removes `_parser_cleanup`, `parser_is_new_style` shims, and all
old-style isinstance checks. All parser instantiation now uses the
`with parser_class() as parser:` context manager pattern.
- documents/parsers.py: delegate to get_parser_registry(); drop lru_cache
- documents/consumer.py: use registry + context manager; remove shims
- documents/tasks.py: same pattern
- documents/management/commands/document_thumbnails.py: same pattern
- documents/views.py: get_metadata uses context manager
- documents/checks.py: use get_parser_registry().all_parsers()
- paperless/parsers/registry.py: add all_parsers() public method
- tests: update mocks to target documents.consumer.get_parser_class_for_mime_type
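A minimal sketch of the `with parser_class() as parser:` pattern described above. `SketchParser` and `consume()` are hypothetical stand-ins, not the project's actual classes; the sketch only shows cleanup moving into `__exit__` so that no separate cleanup shim is needed.

```python
import shutil
import tempfile
from pathlib import Path


class SketchParser:
    """Hypothetical parser that owns a scratch directory for its lifetime."""

    def __enter__(self):
        # Acquire per-parse resources when the with-block is entered.
        self.tempdir = Path(tempfile.mkdtemp(prefix="sketch-parser-"))
        return self

    def __exit__(self, exc_type, exc, tb):
        # Cleanup runs here even if parse() raised.
        shutil.rmtree(self.tempdir, ignore_errors=True)
        return False  # never swallow the caller's exception

    def parse(self, path: Path, mime_type: str) -> str:
        return path.read_text()


def consume(parser_class, path: Path, mime_type: str) -> str:
    # Callers no longer need a try/finally around a cleanup call.
    with parser_class() as parser:
        return parser.parse(path, mime_type)
```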
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: drop get_parser_class_for_mime_type; callers use registry directly
All callers now call get_parser_registry().get_parser_for_file() with
the actual filename and path, enabling score() to use file extension
hints. The MIME-only helper is removed.
- consumer.py: passes self.filename + self.working_copy
- tasks.py: passes document.original_filename + document.source_path
- document_thumbnails.py: same pattern
- views.py: passes Path(file).name + Path(file)
- parsers.py: internal helpers inline the registry call with filename=""
- test_parsers.py: drop TestParserDiscovery (was testing mock behavior);
TestParserAvailability uses registry directly
- test_consumer.py: mocks switch to documents.consumer.get_parser_registry
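A rough illustration of why passing the filename matters. The argument order of `get_parser_for_file()` and `score()` below is an assumption for this sketch, and `SketchRegistry`, `PlainTextParser`, and `CsvAwareParser` are hypothetical; only the idea (highest score wins, and a score may depend on the file extension) comes from the description above.

```python
from pathlib import Path
from typing import Optional


class PlainTextParser:
    @classmethod
    def score(cls, mime_type: str, filename: str, path: Optional[Path] = None) -> Optional[int]:
        return 10 if mime_type == "text/plain" else None


class CsvAwareParser:
    @classmethod
    def score(cls, mime_type: str, filename: str, path: Optional[Path] = None) -> Optional[int]:
        # Extension hint: claim text/plain files only when they are named *.csv.
        if mime_type == "text/plain" and filename.lower().endswith(".csv"):
            return 20
        return None


class SketchRegistry:
    def __init__(self, parsers):
        self._parsers = list(parsers)

    def get_parser_for_file(self, mime_type: str, filename: str, path: Optional[Path] = None):
        best, best_score = None, None
        for parser in self._parsers:
            s = parser.score(mime_type, filename, path)
            # None means "cannot handle"; otherwise the highest score wins.
            if s is not None and (best_score is None or s > best_score):
                best, best_score = parser, s
        return best
```

With an empty filename, as the internal helpers in parsers.py pass, an extension-aware parser in this sketch simply falls back to its MIME-only behaviour.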
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: remove document_consumer_declaration signal infrastructure
Remove the document_consumer_declaration signal that was previously used
for parser registration. Each parser app no longer connects to this signal,
and the signal declaration itself has been removed from documents/signals.
Changes:
- Remove document_consumer_declaration from documents/signals/__init__.py
- Remove ready() methods and signal imports from all parser app configs
- Delete signal shim files (signals.py) from all parser apps:
  - paperless_tesseract/signals.py
  - paperless_text/signals.py
  - paperless_tika/signals.py
  - paperless_mail/signals.py
  - paperless_remote/signals.py
Parser discovery now happens exclusively through the ParserRegistry
system introduced in the previous refactor phases.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: remove empty paperless_text and paperless_tika Django apps
After parser classes were moved to paperless/parsers/ in the plugin
refactor, these Django apps contained only empty AppConfig classes
with no models, views, tasks, migrations, or other functionality.
- Remove paperless_text and paperless_tika from INSTALLED_APPS
- Delete empty app directories entirely
- Update pyproject.toml test exclusions
- Clean stale mypy baseline entries for moved parser files
paperless_remote app is retained as it contains meaningful system
checks for Azure AI configuration.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Moves the checks and tests to the main application and removes the old applications
* Adds a comment to satisfy Sonar
* refactor: remove automatic log_summary() call from get_parser_registry()
The summary was logged once per process, causing it to appear repeatedly
during Docker startup (management commands, web server, each Celery
worker subprocess). External parsers are already announced individually
at INFO when discovered; the full summary is redundant noise.
log_summary() is retained on ParserRegistry for manual/debug use.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Cleans up the duplicate test file/fixture
* Fixes a race condition where webserver threads could race to populate the registry
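The commit does not say how the race was fixed. One common approach, shown here purely as an assumption, is double-checked locking around the lazily built singleton. `SketchRegistry` and its `discover()` method are hypothetical.

```python
import threading

_registry = None
_registry_lock = threading.Lock()


class SketchRegistry:
    """Hypothetical registry; discover() stands in for parser discovery."""

    def __init__(self):
        self.parsers = []

    def discover(self):
        self.parsers.append("builtin")


def get_parser_registry():
    global _registry
    if _registry is None:  # fast path, no lock once populated
        with _registry_lock:
            if _registry is None:  # re-check: another thread may have won
                registry = SketchRegistry()
                registry.discover()
                # Publish only after the registry is fully populated, so no
                # thread can observe a half-built instance.
                _registry = registry
    return _registry
```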
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Refactor(mail): rename paperless_mail/parsers.py → paperless/parsers/mail.py
Preserve git history for MailDocumentParser by committing the rename
separately before editing, following the project convention.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Refactor(mail): move mail parser tests to paperless/tests/parsers/
Move test_parsers.py → test_mail_parser.py and test_parsers_live.py →
test_mail_parser_live.py alongside the other built-in parser tests,
preserving git history before editing. Update MailDocumentParser import
to the new canonical location.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Chore: move mail parser sample files to paperless/tests/samples/mail/
Relocate all mail test fixtures from src/paperless_mail/tests/samples/ to
src/paperless/tests/samples/mail/ ahead of the parser plugin refactor.
Add the new path to the codespell skip list to prevent false-positive
spell corrections in binary/fixture email files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Feat(tests): add mail parser fixtures to paperless/tests/parsers/conftest.py
Add mail_samples_dir, per-file sample fixtures, and mail_parser
(context-manager style) to mirror the old paperless_mail conftest
but rooted at the new samples/mail/ location.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Feat(parsers): migrate MailDocumentParser to ParserProtocol
Move the mail parser from paperless_mail/parsers.py to
paperless/parsers/mail.py and refactor it to implement ParserProtocol:
- Class-level name/version/author/url attributes
- supported_mime_types() and score() classmethods (score=20)
- can_produce_archive=False, requires_pdf_rendition=True
- Context manager lifecycle (__enter__/__exit__)
- New parse() signature without mailrule_id kwarg; consumer sets
parser.mailrule_id before calling parse() instead
- get_text()/get_date()/get_archive_path() accessor methods
- extract_metadata() returning email headers and attachment info
Register MailDocumentParser in the ParserRegistry alongside Text and
Tika parsers. Update consumer, signals, and all import sites to use
the new location. Update tests to use the new accessor API, patch
paths, and context-manager fixture.
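A sketch of the shape listed above, under stated assumptions: `SketchMailParser` is hypothetical, the return type of `supported_mime_types()` and the `score()` arguments are guesses, and the body extraction uses the standard library `email` package instead of the real parser's logic.

```python
from email import message_from_bytes, policy
from pathlib import Path


class SketchMailParser:
    # Class-level metadata, as in the bullet list above (values are made up).
    name = "Sketch Mail Parser"
    version = "0.0.0"
    can_produce_archive = False
    requires_pdf_rendition = True

    @classmethod
    def supported_mime_types(cls):
        return {"message/rfc822": ".eml"}

    @classmethod
    def score(cls, mime_type, filename, path=None):
        return 20 if mime_type in cls.supported_mime_types() else None

    def __init__(self):
        self._text = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False

    def parse(self, path: Path, mime_type: str) -> None:
        # parse() returns nothing; results are read through accessors.
        msg = message_from_bytes(Path(path).read_bytes(), policy=policy.default)
        body = msg.get_body(preferencelist=("plain",))
        self._text = body.get_content() if body is not None else ""

    def get_text(self):
        return self._text
```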
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix(parsers): pop legacy constructor args in mail signal wrapper
MailDocumentParser.__init__ takes no constructor args in the new
protocol. Update the get_parser() signal wrapper to pop logging_group
and progress_callback (passed by the legacy consumer dispatch path)
before instantiating — the same pattern used by TextDocumentParser.
Also update test_mail_parser_receives_mailrule to use the real signal
wrapper (mail_get_parser) instead of MailDocumentParser directly, so
the test exercises the actual dispatch path and matches the new
parse() call signature (no mailrule kwarg).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Bumps this so we can run
* Fixes location of the fixture
* Removes fixtures which were duplicated
* Feat(parsers): add ParserContext and configure() to ParserProtocol
Replace the ad-hoc mailrule_id attribute assignment with a typed,
immutable ParserContext dataclass and a configure() method on the
Protocol:
- ParserContext(frozen=True, slots=True) lives in paperless/parsers/
alongside ParserProtocol and MetadataEntry; currently carries only
mailrule_id but is designed to grow with output_type, ocr_mode, and
ocr_language in a future phase (decoupling parsers from settings.*)
- ParserProtocol.configure(context: ParserContext) -> None is the
extension point; no-op by default
- MailDocumentParser.configure() reads mailrule_id into _mailrule_id
- TextDocumentParser and TikaDocumentParser implement a no-op configure()
- Consumer calls document_parser.configure(ParserContext(...)) before
parse(), replacing the isinstance(parser, MailDocumentParser) guard
and the direct attribute mutation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Feat(parsers): call configure(ParserContext()) in update_document task
Apply the same new-style parser shim pattern as the consumer to
update_document_content_maybe_archive_file:
- Call __enter__ for Text/Tika parsers after instantiation
- Call configure(ParserContext()) before parse() for all new-style parsers
(mailrule_id is not available here — this is a re-process of an
existing document, so the default empty context is correct)
- Call parse(path, mime_type) with 2 args for new-style parsers
- Call get_thumbnail(path, mime_type) with 2 args for new-style parsers
- Call __exit__ instead of cleanup() in the finally block
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix(tests): add configure() to DummyParser and missing-method parametrize
ParserProtocol now requires configure(context: ParserContext) -> None.
Update DummyParser in test_registry.py to implement it, and add
'missing-configure' to the test_partial_compliant_fails_isinstance
parametrize list so the new method is covered by the negative test.
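For illustration, this is how a negative isinstance test of that kind behaves with a `typing.runtime_checkable` protocol. `SketchProtocol` is a cut-down, hypothetical stand-in for ParserProtocol, which has more members than shown here.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SketchProtocol(Protocol):
    def configure(self, context: object) -> None: ...

    def parse(self, path: str, mime_type: str) -> None: ...


class Compliant:
    def configure(self, context):
        pass

    def parse(self, path, mime_type):
        pass


class MissingConfigure:
    # Deliberately lacks configure(), so the isinstance() check fails.
    def parse(self, path, mime_type):
        pass


def is_parser(obj) -> bool:
    # runtime_checkable only verifies that the members exist,
    # not that their signatures match.
    return isinstance(obj, SketchProtocol)
```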
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Cleans up the reprocess task and generally reduces duplication across classes
* Corrects the score return
* Updates so we can report a page count for these parsers, assuming we do have an archive produced when called
* Increases test coverage
* One more coverage
* Updates typing
* Updates typing
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Saving initial work on this
* At least partially working for the tesseract parser
* Problems with migration testing still need to be figured out
* Work around that error
* Fixes max m_pixels
* Moving the settings to main paperless application
* Starting some consumer options
* More fixes and work
* Fixes these last tests
* Fix max_length on OcrSettings.mode field
* Fix all fields on Common & Ocr settings serializers
* Umbrella config view
* Revert "Umbrella config view"
This reverts commit fbaf9f4be30f89afeb509099180158a3406416a5.
* Updates to use a single configuration object for all settings
* Squashed commit of the following:
commit 8a0a49dd57
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 23:02:47 2023 -0800
Fix formatting
commit 66b2d90c50
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 22:36:35 2023 -0800
Refactor frontend data models
commit 5723bd8dd8
Author: Adam Bogdał <adam@bogdal.pl>
Date: Wed Dec 20 01:17:43 2023 +0100
Fix: speed up admin panel for installs with a large number of documents (#5052)
commit 9b08ce1761
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 15:18:51 2023 -0800
Update PULL_REQUEST_TEMPLATE.md
commit a6248bec2d
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 15:02:05 2023 -0800
Chore: Update Angular to v17 (#4980)
commit b1f6f52486
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 13:53:56 2023 -0800
Fix: Dont allow null custom_fields property via API (#5063)
commit 638d9970fd
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 13:43:50 2023 -0800
Enhancement: symmetric document links (#4907)
commit 5e8de4c1da
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 12:45:04 2023 -0800
Enhancement: shared icon & shared by me filter (#4859)
commit 088bad9030
Author: Trenton H <797416+stumpylog@users.noreply.github.com>
Date: Tue Dec 19 12:04:03 2023 -0800
Bulk updates all the backend libraries (#5061)
* Saving some work on frontend config
* Very basic but dynamically-generated config form
* Saving work on slightly less ugly frontend config
* JSON validation for user_args field
* Fully dynamic config form
* Adds in some additional validators for a nicer error message
* Cleaning up the testing and coverage more
* Reverts unintentional change
* Adds documentation about the settings and the precedence
* Couple more commenting and style fixes
---------
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
* Initial implementation of consumption templates
* Frontend implementation of consumption templates
Testing
* Support consumption template source
* order templates, automatically add permissions
* Support title assignment in consumption templates
* Refactoring, combine filters with AND, show sources on list
Show sources on template list, update some translation strings
Combine filters with AND
Minor testing
* Update strings
* Only update django-multiselectfield
* Basic docs, document some methods
* Improve testing coverage, template multi-assignment merges