* Chore: move Tika parser and tests to paperless/
Move TikaDocumentParser and its tests to the canonical parser package
location, matching the pattern established for TextDocumentParser:
- src/paperless_tika/parsers.py → src/paperless/parsers/tika.py
- src/paperless_tika/tests/test_tika_parser.py → src/paperless/tests/parsers/test_tika_parser.py
- src/paperless_tika/tests/samples/ → src/paperless/tests/samples/tika/
Merge tika fixtures (tika_parser, sample_odt_file, sample_docx_file,
sample_doc_file, sample_broken_odt) into the shared parsers conftest.
Remove the now-empty src/paperless_tika/tests/conftest.py.
Content is unchanged — this commit is rename-only so git history is
preserved on the moved files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Feature: Phase 3 — migrate TikaDocumentParser to ParserProtocol
Refactor TikaDocumentParser to satisfy ParserProtocol without subclassing
the legacy DocumentParser ABC:
- Add ClassVars: name, version, author, url
- Add supported_mime_types() classmethod (12 Office/ODF/RTF MIME types)
- Add score() classmethod — returns None when TIKA_ENABLED is False, 10 otherwise
- can_produce_archive = False (PDF is for display, not an OCR archive)
- requires_pdf_rendition = True (Office formats need PDF for browser display)
- __enter__/__exit__ via ExitStack: TikaClient opened once per parser
lifetime and shared across parse() and extract_metadata() calls
- extract_metadata() falls back to a short-lived TikaClient when called
outside a context manager (legacy view-layer metadata path)
- _convert_to_pdf() uses OutputTypeConfig() to honour the database-stored
ApplicationConfiguration before falling back to the env-var setting
- Rename convert_to_pdf → _convert_to_pdf (private helper)
Update paperless_tika/signals.py shim to import from the new module path
and drop the legacy logging_group/progress_callback kwargs.
Update documents/consumer.py to extend the existing TextDocumentParser
special cases to also cover TikaDocumentParser (parse/get_thumbnail
signatures, __exit__ cleanup).
Add TestTikaParserRegistryInterface (7 tests) covering score(), properties,
and ParserProtocol isinstance check. Update existing tests to use the new
accessor API (get_text, get_date, get_archive_path, _convert_to_pdf).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: update remaining imports and move live Tika tests after parser migration
- src/documents/tests/test_parsers.py: import TikaDocumentParser from
paperless.parsers.tika (old paperless_tika.parsers no longer exists)
- git mv paperless_tika/tests/test_live_tika.py →
paperless/tests/parsers/test_live_tika.py to co-locate all Tika tests
with the parser; update import and replace old attribute API
(tika_parser.text/.archive_path) with accessor methods
(get_text/get_archive_path)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: satisfy mypy and pyrefly for TikaDocumentParser
Use a TYPE_CHECKING-guarded assert to narrow self._tika_client from
TikaClient | None to TikaClient at the point of use in parse(). The
assert is visible to type checkers (TYPE_CHECKING=True) so both mypy
and pyrefly accept the subsequent attribute accesses without error;
at runtime TYPE_CHECKING is False so the assert never executes and no
ruff S101 suppression is required.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: require context manager for TikaDocumentParser; clean up client lifecycle
- consumer.py: call __enter__ for new-style parsers so _tika_client and
_gotenberg_client are set before parse() is invoked
- views.py: use `with parser` (via nullcontext for old-style parsers) in
get_metadata so extract_metadata always runs inside a context manager
- tika.py: GotenbergClient added to ExitStack alongside TikaClient;
inline client creation removed from extract_metadata and _convert_to_pdf;
__exit__ uses ExitStack.close() instead of __exit__ pass-through
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Updates the Gotenberg version to use 7.10 and gotenberg-client to match
* Fixes a long standing bug in this test where a whole page was missing from the expected
* Replaces references to docker-compose (the v1 executable) with docker compose (the v2 plugin) as well as fixing up some referenes between the tool vs the command
* Update docs/setup.md
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
* Replaces references to docker-compose (the v1 executable) with docker compose (the v2 plugin) as well as fixing up some referenes between the tool vs the command
---------
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>