24 Commits

Author SHA1 Message Date
Trenton H
aea2927a02 Feature: Convert Tika parser to the plugin system (#12333)
* Chore: move Tika parser and tests to paperless/

Move TikaDocumentParser and its tests to the canonical parser package
location, matching the pattern established for TextDocumentParser:

- src/paperless_tika/parsers.py → src/paperless/parsers/tika.py
- src/paperless_tika/tests/test_tika_parser.py → src/paperless/tests/parsers/test_tika_parser.py
- src/paperless_tika/tests/samples/ → src/paperless/tests/samples/tika/

Merge tika fixtures (tika_parser, sample_odt_file, sample_docx_file,
sample_doc_file, sample_broken_odt) into the shared parsers conftest.
Remove the now-empty src/paperless_tika/tests/conftest.py.

Content is unchanged — this commit is rename-only so git history is
preserved on the moved files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Feature: Phase 3 — migrate TikaDocumentParser to ParserProtocol

Refactor TikaDocumentParser to satisfy ParserProtocol without subclassing
the legacy DocumentParser ABC:

- Add ClassVars: name, version, author, url
- Add supported_mime_types() classmethod (12 Office/ODF/RTF MIME types)
- Add score() classmethod — returns None when TIKA_ENABLED is False, 10 otherwise
- can_produce_archive = False (PDF is for display, not an OCR archive)
- requires_pdf_rendition = True (Office formats need PDF for browser display)
- __enter__/__exit__ via ExitStack: TikaClient opened once per parser
  lifetime and shared across parse() and extract_metadata() calls
- extract_metadata() falls back to a short-lived TikaClient when called
  outside a context manager (legacy view-layer metadata path)
- _convert_to_pdf() uses OutputTypeConfig() to honour the database-stored
  ApplicationConfiguration before falling back to the env-var setting
- Rename convert_to_pdf → _convert_to_pdf (private helper)
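
The registry-facing surface listed above could look roughly like the sketch below. It is only an illustration: `ParserLike`, `SketchTikaParser`, the `enabled` flag (standing in for the TIKA_ENABLED setting), the `score()` parameter, and the single MIME type shown are assumptions, since this log does not show the real ParserProtocol signatures.

```python
from typing import ClassVar, Optional, Protocol, runtime_checkable


@runtime_checkable
class ParserLike(Protocol):
    """Hypothetical, reduced stand-in for ParserProtocol."""

    name: ClassVar[str]

    @classmethod
    def supported_mime_types(cls) -> dict[str, str]: ...

    @classmethod
    def score(cls, mime_type: str) -> Optional[int]: ...


class SketchTikaParser:
    """Satisfies ParserLike structurally, without subclassing anything."""

    name: ClassVar[str] = "sketch-tika"
    # Assumption: a class flag replaces the real TIKA_ENABLED setting lookup.
    enabled: ClassVar[bool] = True

    @classmethod
    def supported_mime_types(cls) -> dict[str, str]:
        # One illustrative entry; the commit mentions 12 MIME types.
        return {"application/vnd.oasis.opendocument.text": ".odt"}

    @classmethod
    def score(cls, mime_type: str) -> Optional[int]:
        # None means "do not consider this parser"; 10 otherwise.
        return 10 if cls.enabled else None

    @property
    def can_produce_archive(self) -> bool:
        return False  # the PDF is for display, not an OCR archive

    @property
    def requires_pdf_rendition(self) -> bool:
        return True  # Office formats need a PDF for browser display
```

An `isinstance(SketchTikaParser(), ParserLike)` check succeeds here because `runtime_checkable` protocols only test that the named members exist.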

Update paperless_tika/signals.py shim to import from the new module path
and drop the legacy logging_group/progress_callback kwargs.

Update documents/consumer.py to extend the existing TextDocumentParser
special cases to also cover TikaDocumentParser (parse/get_thumbnail
signatures, __exit__ cleanup).

Add TestTikaParserRegistryInterface (7 tests) covering score(), properties,
and ParserProtocol isinstance check.  Update existing tests to use the new
accessor API (get_text, get_date, get_archive_path, _convert_to_pdf).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: update remaining imports and move live Tika tests after parser migration

- src/documents/tests/test_parsers.py: import TikaDocumentParser from
  paperless.parsers.tika (old paperless_tika.parsers no longer exists)
- git mv paperless_tika/tests/test_live_tika.py →
  paperless/tests/parsers/test_live_tika.py to co-locate all Tika tests
  with the parser; update import and replace old attribute API
  (tika_parser.text/.archive_path) with accessor methods
  (get_text/get_archive_path)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: satisfy mypy and pyrefly for TikaDocumentParser

Use a TYPE_CHECKING-guarded assert to narrow self._tika_client from
TikaClient | None to TikaClient at the point of use in parse().  The
assert is visible to type checkers (TYPE_CHECKING=True) so both mypy
and pyrefly accept the subsequent attribute accesses without error;
at runtime TYPE_CHECKING is False so the assert never executes and no
ruff S101 suppression is required.
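
A minimal sketch of the narrowing pattern described above, assuming a hypothetical `FakeClient` in place of TikaClient and a simplified `SketchParser`; it is not the project's actual code.

```python
from typing import TYPE_CHECKING, Optional


class FakeClient:
    """Hypothetical stand-in for TikaClient."""

    def parse(self, path: str) -> str:
        return f"parsed:{path}"


class SketchParser:
    def __init__(self) -> None:
        self._client: Optional[FakeClient] = None

    def __enter__(self) -> "SketchParser":
        self._client = FakeClient()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self._client = None

    def parse(self, path: str) -> str:
        if TYPE_CHECKING:
            # Type checkers treat this branch as reachable and narrow
            # Optional[FakeClient] to FakeClient; at runtime TYPE_CHECKING
            # is False, so the assert is never executed.
            assert self._client is not None
        return self._client.parse(path)
```

One consequence of this choice: calling `parse()` outside a `with` block is not caught by the assert and instead fails with an `AttributeError` on `None`.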

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: require context manager for TikaDocumentParser; clean up client lifecycle

- consumer.py: call __enter__ for new-style parsers so _tika_client and
  _gotenberg_client are set before parse() is invoked
- views.py: use `with parser` (via nullcontext for old-style parsers) in
  get_metadata so extract_metadata always runs inside a context manager
- tika.py: GotenbergClient added to ExitStack alongside TikaClient;
  inline client creation removed from extract_metadata and _convert_to_pdf;
  __exit__ uses ExitStack.close() instead of __exit__ pass-through
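
The client lifecycle above can be sketched as follows. `FakeClient`, `SketchParser`, and the `as_context` helper are hypothetical; in particular, the `hasattr` test for old-style parsers is an assumption about how the nullcontext fallback might be selected.

```python
from contextlib import ExitStack, nullcontext


class FakeClient:
    """Hypothetical stand-in for TikaClient / GotenbergClient."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append(f"open {self.name}")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append(f"close {self.name}")


class SketchParser:
    def __init__(self, log):
        self._log = log
        self._stack = ExitStack()
        self._tika_client = None
        self._gotenberg_client = None

    def __enter__(self):
        # Both clients are opened once and owned by the same ExitStack.
        self._tika_client = self._stack.enter_context(
            FakeClient("tika", self._log)
        )
        self._gotenberg_client = self._stack.enter_context(
            FakeClient("gotenberg", self._log)
        )
        return self

    def __exit__(self, exc_type, exc, tb):
        # close() unwinds the registered clients in reverse (LIFO) order.
        self._stack.close()
        self._tika_client = None
        self._gotenberg_client = None


def as_context(parser):
    """Wrap parsers lacking __enter__ so callers can always use `with`."""
    return parser if hasattr(parser, "__enter__") else nullcontext(parser)
```

With this shape, callers such as a metadata view can write `with as_context(parser) as p:` regardless of parser style.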

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 15:43:28 -07:00
dependabot[bot]
484bef00c1 docker-compose(deps): Bump gotenberg/gotenberg in /docker/compose (#12190)
Bumps gotenberg/gotenberg from 8.26 to 8.27.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.27'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 10:14:48 -08:00
dependabot[bot]
b9b90ec9f7 docker-compose(deps): Bump nginx in /docker/compose (#12018)
Bumps nginx from 1.29-alpine to 1.29.5-alpine.

---
updated-dependencies:
- dependency-name: nginx
  dependency-version: 1.29.5-alpine
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-06 12:26:29 -08:00
dependabot[bot]
4a5116adf8 docker-compose(deps): Bump gotenberg/gotenberg in /docker/compose (#11979)
Bumps gotenberg/gotenberg from 8.25 to 8.26.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.26'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-04 13:24:19 -08:00
Trenton H
01b21377af Chore: Use a local http server instead of external to reduce flakiness (#11916)
2026-01-28 03:57:12 +00:00
Trenton H
c84f2f04b3 Chore: Switch to a local IMAP server instead of a real email service (#11913)
2026-01-27 11:35:12 -08:00
dependabot[bot]
4bf681387a docker-compose(deps): bump gotenberg/gotenberg in /docker/compose (#11393)
Bumps gotenberg/gotenberg from 8.24 to 8.25.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.25'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-18 09:29:39 -08:00
dependabot[bot]
7326224888 docker-compose(deps): Bump gotenberg/gotenberg in /docker/compose (#11050)
Bumps gotenberg/gotenberg from 8.23 to 8.24.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.24'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Trenton H <797416+stumpylog@users.noreply.github.com>
2025-10-15 13:52:58 -07:00
dependabot[bot]
84d85d7a23 docker-compose(deps): Bump gotenberg/gotenberg from 8.22 to 8.23 in /docker/compose (#10812)
Bumps gotenberg/gotenberg from 8.22 to 8.23.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.23'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-09 12:47:25 -07:00
dependabot[bot]
27d72ebb18 docker-compose(deps): Bump gotenberg/gotenberg from 8.20 to 8.22 in /docker/compose (#10687)
Bumps gotenberg/gotenberg from 8.20 to 8.22.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.22'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-27 14:53:35 -07:00
Katrin Leinweber
5410074062 Documentation: copy-edits (#10417)
2025-07-20 17:27:04 +00:00
Trenton H
3d2a3ede71 Chore: Updates dependency groups (#10339)
2025-07-07 17:37:58 -07:00
dependabot[bot]
915584551c docker-compose(deps): bump gotenberg/gotenberg from 8.19 to 8.20 in /docker/compose (#9661)
Bumps gotenberg/gotenberg from 8.19 to 8.20.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.20'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-24 10:06:07 -07:00
dependabot[bot]
c9bc9acd1a docker-compose(deps): Bump gotenberg/gotenberg in /docker/compose (#9532)
Bumps gotenberg/gotenberg from 8.17 to 8.19.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.19'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-02 18:53:44 +00:00
Max Mehl
6b7fb286f7 Chore: bump gotenberg docker images (#9189)
* Chore: update gotenberg Docker images to latest minor version

* Chore: update gotenberg Docker images to latest minor version for devcontainer
2025-02-21 13:29:21 -08:00
Trenton H
29e6371cd1 Feature: Upgrade Gotenberg to v8 (#7094)
2024-06-27 02:37:50 +00:00
Bruno Willenborg
2116964f67 docs: drop obsolete docker compose version (#6806)
2024-05-22 15:21:48 -07:00
Trenton H
f7ce32f471 Updates the Tika image to the official now that Apache publishes multi-arch images (#6802)
2024-05-21 20:45:56 +00:00
luzpaz
58bf9c552b Documentation: Fix typos with automated tool (#5319)
---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2024-01-08 16:58:41 +00:00
Trenton H
771c1fab92 Chore: Raise Gotenberg container version (#4815)
* Updates the Gotenberg version to use 7.10 and gotenberg-client to match
* Fixes a long-standing bug in this test where a whole page was missing from the expected
2023-12-05 15:36:25 +00:00
Trenton H
c8ee35692c Documentation: Update documentation to refer only to Docker Compose v2 command (#4650)
* Replaces references to docker-compose (the v1 executable) with docker compose (the v2 plugin) as well as fixing up some references to the tool vs the command

* Update docs/setup.md

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>

* Replaces references to docker-compose (the v1 executable) with docker compose (the v2 plugin) as well as fixing up some references to the tool vs the command

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2023-11-20 20:14:33 -08:00
shamoon
8955d25a00 update gotenberg to 7.8
2023-02-05 20:17:22 -08:00
phail
fe2db4dbf7 adapt compose file for eml parsing
2022-11-30 10:16:39 +01:00
Trenton H
fed7d3e993 Use docker compose to start and stop containers which match directly to our command overrides
2022-11-29 20:11:40 -08:00