Mirror of https://github.com/paperless-ngx/paperless-ngx.git (synced 2026-03-27 19:32:45 +00:00)
Introduces the foundation of the entrypoint-based parser discovery system to
replace the signal-based document_consumer_declaration approach.

- Add ParserProtocol: runtime_checkable Protocol defining the full contract
  for document parsers (supported_mime_types, score, parse, context manager,
  result accessors)
- Add ParserRegistry: lazy singleton with entrypoint discovery via
  importlib.metadata group 'paperless_ngx.parsers', uniform score-based
  selection across external and built-in parsers
- Add get_parser_registry(), init_builtin_parsers(), reset_parser_registry()
  module-level helpers
- Wire Celery worker_process_init to call init_builtin_parsers() eagerly in
  each worker, deferring third-party discovery to first task use
- Add 28 pytest tests covering Protocol compliance, singleton lifecycle,
  scoring logic, entrypoint discovery, and log output

Built-in parsers and consumer migration follow in Phases 3-6.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>