83 Commits

Author SHA1 Message Date
Trenton H
aea2927a02 Feature: Convert Tika parser to the plugin system (#12333)
* Chore: move Tika parser and tests to paperless/

Move TikaDocumentParser and its tests to the canonical parser package
location, matching the pattern established for TextDocumentParser:

- src/paperless_tika/parsers.py → src/paperless/parsers/tika.py
- src/paperless_tika/tests/test_tika_parser.py → src/paperless/tests/parsers/test_tika_parser.py
- src/paperless_tika/tests/samples/ → src/paperless/tests/samples/tika/

Merge tika fixtures (tika_parser, sample_odt_file, sample_docx_file,
sample_doc_file, sample_broken_odt) into the shared parsers conftest.
Remove the now-empty src/paperless_tika/tests/conftest.py.

Content is unchanged — this commit is rename-only so git history is
preserved on the moved files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Feature: Phase 3 — migrate TikaDocumentParser to ParserProtocol

Refactor TikaDocumentParser to satisfy ParserProtocol without subclassing
the legacy DocumentParser ABC:

- Add ClassVars: name, version, author, url
- Add supported_mime_types() classmethod (12 Office/ODF/RTF MIME types)
- Add score() classmethod — returns None when TIKA_ENABLED is False, 10 otherwise
- can_produce_archive = False (PDF is for display, not an OCR archive)
- requires_pdf_rendition = True (Office formats need PDF for browser display)
- __enter__/__exit__ via ExitStack: TikaClient opened once per parser
  lifetime and shared across parse() and extract_metadata() calls
- extract_metadata() falls back to a short-lived TikaClient when called
  outside a context manager (legacy view-layer metadata path)
- _convert_to_pdf() uses OutputTypeConfig() to honour the database-stored
  ApplicationConfiguration before falling back to the env-var setting
- Rename convert_to_pdf → _convert_to_pdf (private helper)
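
A minimal sketch of the structural-typing idea described in the list above, not the project's actual code. `ParserLike`, `SketchTikaParser`, the `enabled` flag (standing in for the TIKA_ENABLED setting) and the `score()` signature are all hypothetical; only the "None when disabled, 10 otherwise" behaviour is taken from the commit message.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class ParserLike(Protocol):
    """Hypothetical stand-in for ParserProtocol (methods only)."""

    @classmethod
    def supported_mime_types(cls) -> dict: ...

    @classmethod
    def score(cls, mime_type: str) -> Optional[int]: ...


class SketchTikaParser:
    """Satisfies ParserLike structurally, without subclassing anything."""

    name = "tika-sketch"
    enabled = True  # assumption: stands in for the TIKA_ENABLED setting

    @classmethod
    def supported_mime_types(cls) -> dict:
        # Illustrative subset only; the real parser lists 12 MIME types.
        return {"application/rtf": ".rtf"}

    @classmethod
    def score(cls, mime_type: str) -> Optional[int]:
        # None means "do not pick this parser"; 10 is the value quoted above.
        return 10 if cls.enabled else None
```

Because the protocol is `runtime_checkable`, `isinstance(SketchTikaParser(), ParserLike)` only checks that the named methods exist, which is presumably what the registry-interface tests mentioned below rely on.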

Update paperless_tika/signals.py shim to import from the new module path
and drop the legacy logging_group/progress_callback kwargs.

Update documents/consumer.py to extend the existing TextDocumentParser
special cases to also cover TikaDocumentParser (parse/get_thumbnail
signatures, __exit__ cleanup).

Add TestTikaParserRegistryInterface (7 tests) covering score(), properties,
and ParserProtocol isinstance check.  Update existing tests to use the new
accessor API (get_text, get_date, get_archive_path, _convert_to_pdf).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: update remaining imports and move live Tika tests after parser migration

- src/documents/tests/test_parsers.py: import TikaDocumentParser from
  paperless.parsers.tika (old paperless_tika.parsers no longer exists)
- git mv paperless_tika/tests/test_live_tika.py →
  paperless/tests/parsers/test_live_tika.py to co-locate all Tika tests
  with the parser; update import and replace old attribute API
  (tika_parser.text/.archive_path) with accessor methods
  (get_text/get_archive_path)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: satisfy mypy and pyrefly for TikaDocumentParser

Use a TYPE_CHECKING-guarded assert to narrow self._tika_client from
TikaClient | None to TikaClient at the point of use in parse().  The
assert is visible to type checkers (TYPE_CHECKING=True) so both mypy
and pyrefly accept the subsequent attribute accesses without error;
at runtime TYPE_CHECKING is False so the assert never executes and no
ruff S101 suppression is required.
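
The narrowing trick described above might look roughly like the following. This is a sketch under assumptions: `FakeClient` and `SketchParser` are hypothetical stand-ins for TikaClient and TikaDocumentParser, and whether a given type checker honours the narrowing is taken from the commit message rather than verified here.

```python
from typing import TYPE_CHECKING, Optional


class FakeClient:
    """Hypothetical stand-in for TikaClient."""

    def parse(self, path: str) -> str:
        return f"parsed:{path}"


class SketchParser:
    def __init__(self) -> None:
        self._client: Optional[FakeClient] = None

    def __enter__(self) -> "SketchParser":
        self._client = FakeClient()
        return self

    def __exit__(self, *exc: object) -> None:
        self._client = None

    def parse(self, path: str) -> str:
        if TYPE_CHECKING:
            # Seen by static type checkers only; never executed at runtime,
            # so there is no runtime assert for ruff S101 to flag.
            assert self._client is not None
        return self._client.parse(path)
```

One consequence of this design is that the assert gives no runtime protection: calling `parse()` outside a `with` block fails with an `AttributeError` on `None` rather than an `AssertionError`.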

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: require context manager for TikaDocumentParser; clean up client lifecycle

- consumer.py: call __enter__ for new-style parsers so _tika_client and
  _gotenberg_client are set before parse() is invoked
- views.py: use `with parser` (via nullcontext for old-style parsers) in
  get_metadata so extract_metadata always runs inside a context manager
- tika.py: GotenbergClient added to ExitStack alongside TikaClient;
  inline client creation removed from extract_metadata and _convert_to_pdf;
  __exit__ uses ExitStack.close() instead of __exit__ pass-through
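
The client lifecycle described in the list above can be sketched with `contextlib.ExitStack`. The class names and the `log` list are hypothetical and exist only to make the open/close order observable; the real parser's attributes and constructor arguments may differ.

```python
from contextlib import ExitStack


class FakeClient:
    """Hypothetical stand-in for TikaClient / GotenbergClient."""

    def __init__(self, name: str, log: list) -> None:
        self.name = name
        self.log = log

    def __enter__(self) -> "FakeClient":
        self.log.append(f"open {self.name}")
        return self

    def __exit__(self, *exc: object) -> None:
        self.log.append(f"close {self.name}")


class SketchParser:
    def __init__(self, log: list) -> None:
        self._log = log
        self._stack = ExitStack()
        self._tika_client = None
        self._gotenberg_client = None

    def __enter__(self) -> "SketchParser":
        # Both clients are opened once and live as long as the parser.
        self._tika_client = self._stack.enter_context(
            FakeClient("tika", self._log)
        )
        self._gotenberg_client = self._stack.enter_context(
            FakeClient("gotenberg", self._log)
        )
        return self

    def __exit__(self, *exc: object) -> None:
        # close() unwinds the registered clients in reverse order.
        self._stack.close()
        self._tika_client = None
        self._gotenberg_client = None
```

Used as `with SketchParser(log) as parser: ...`, the clients are closed in reverse order of opening when the block exits, including when the body raises.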

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 15:43:28 -07:00
Trenton H
86fa74c115 Fix: Postgres selection, DBENGINE and migrations (#12299) 2026-03-11 11:54:24 -07:00
dependabot[bot]
484bef00c1 docker-compose(deps): Bump gotenberg/gotenberg in /docker/compose (#12190)
Bumps gotenberg/gotenberg from 8.26 to 8.27.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.27'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 10:14:48 -08:00
dependabot[bot]
b9b90ec9f7 docker-compose(deps): Bump nginx in /docker/compose (#12018)
Bumps nginx from 1.29-alpine to 1.29.5-alpine.

---
updated-dependencies:
- dependency-name: nginx
  dependency-version: 1.29.5-alpine
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-06 12:26:29 -08:00
dependabot[bot]
4a5116adf8 docker-compose(deps): Bump gotenberg/gotenberg in /docker/compose (#11979)
Bumps gotenberg/gotenberg from 8.25 to 8.26.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.26'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-04 13:24:19 -08:00
Trenton H
01b21377af Chore: Use a local http server instead of external to reduce flakiness (#11916) 2026-01-28 03:57:12 +00:00
Trenton H
c84f2f04b3 Chore: Switch to a local IMAP server instead of a real email service (#11913) 2026-01-27 11:35:12 -08:00
dependabot[bot]
4bf681387a docker-compose(deps): bump gotenberg/gotenberg in /docker/compose (#11393)
Bumps gotenberg/gotenberg from 8.24 to 8.25.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.25'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-18 09:29:39 -08:00
shamoon
a206ac78dd Chore: update Postgres compose volume mount path (#11084) 2025-10-20 16:18:36 +00:00
dependabot[bot]
7326224888 docker-compose(deps): Bump gotenberg/gotenberg in /docker/compose (#11050)
Bumps gotenberg/gotenberg from 8.23 to 8.24.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.24'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Trenton H <797416+stumpylog@users.noreply.github.com>
2025-10-15 13:52:58 -07:00
dependabot[bot]
92ee906701 docker-compose(deps): Bump library/postgres in /docker/compose (#10965)
Bumps library/postgres from 17 to 18.

---
updated-dependencies:
- dependency-name: library/postgres
  dependency-version: '18'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-30 20:25:02 +00:00
dependabot[bot]
84d85d7a23 docker-compose(deps): Bump gotenberg/gotenberg from 8.22 to 8.23 in /docker/compose (#10812)
Bumps gotenberg/gotenberg from 8.22 to 8.23.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.23'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-09 12:47:25 -07:00
dependabot[bot]
10ccccc987 docker-compose(deps): Bump library/mariadb from 11 to 12 in /docker/compose (#10621)
Bumps library/mariadb from 11 to 12.

---
updated-dependencies:
- dependency-name: library/mariadb
  dependency-version: '12'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-27 15:07:50 -07:00
dependabot[bot]
27d72ebb18 docker-compose(deps): Bump gotenberg/gotenberg from 8.20 to 8.22 in /docker/compose (#10687)
Bumps gotenberg/gotenberg from 8.20 to 8.22.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.22'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-27 14:53:35 -07:00
Katrin Leinweber
5410074062 Documentation: copy-edits (#10417) 2025-07-20 17:27:04 +00:00
Boyuan Yang
f8689c4819 Documentation: Fix URL for PAPERLESS_OCR_LANGUAGE example in docker-compose.env (#10408) 2025-07-19 02:25:31 +00:00
Trenton H
3d2a3ede71 Chore: Updates dependency groups (#10339) 2025-07-07 17:37:58 -07:00
dependabot[bot]
bcb0ae1ee5 docker-compose(deps): Bump library/redis from 7 to 8 in /docker/compose (#9879)
Bumps library/redis from 7 to 8.

---
updated-dependencies:
- dependency-name: library/redis
  dependency-version: '8'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-07 16:49:18 +00:00
dependabot[bot]
915584551c docker-compose(deps): bump gotenberg/gotenberg from 8.19 to 8.20 in /docker/compose (#9661)
Bumps gotenberg/gotenberg from 8.19 to 8.20.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.20'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-24 10:06:07 -07:00
shamoon
312bb743b9 Chore: add ymlfmt (#9745) 2025-04-22 22:20:54 +00:00
dependabot[bot]
c9bc9acd1a docker-compose(deps): Bump gotenberg/gotenberg in /docker/compose (#9532)
Bumps gotenberg/gotenberg from 8.17 to 8.19.

---
updated-dependencies:
- dependency-name: gotenberg/gotenberg
  dependency-version: '8.19'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-02 18:53:44 +00:00
shamoon
32a7f9cd5a Enhancement: allow webUI first account signup (#9500) 2025-03-29 17:12:34 +00:00
Trenton H
9c68100dc0 Fix: Make management commands aware of the container environment (#9499) 2025-03-26 14:17:10 -07:00
dependabot[bot]
032bada221 docker-compose(deps): Bump library/postgres in /docker/compose (#9353)
Bumps library/postgres from 16 to 17.

---
updated-dependencies:
- dependency-name: library/postgres
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-10 09:37:53 -07:00
Max Mehl
6b7fb286f7 Chore: bump gotenberg docker images (#9189)
* Chore: update gotenberg Docker images to latest minor version

* Chore: update gotenberg Docker images to latest minor version for devcontainer
2025-02-21 13:29:21 -08:00
shamoon
5821033e3d Merge branch 'dev' 2025-01-13 07:47:08 -08:00
shamoon
fcf532f13e Documentation: documentation updates 2024-11-24 14:20:20 -08:00
ftibi93
6c3d6d562d Documentation: fix docker-compose.portainer.yml GID (#8273) 2024-11-16 04:19:18 +00:00
Trenton H
29e6371cd1 Feature: Upgrade Gotenberg to v8 (#7094) 2024-06-27 02:37:50 +00:00
Bruno Willenborg
2116964f67 docs: drop obsolete docker compose version (#6806) 2024-05-22 15:21:48 -07:00
Trenton H
f7ce32f471 Updates the Tika image to the official now that Apache publishes multi-arch images (#6802) 2024-05-21 20:45:56 +00:00
Trenton H
48092d47c5 Updates the recommended versions of databases to their latest (#6639) 2024-05-08 20:32:17 +00:00
Joakim Berglund
85b596d20d Lowercase stack name in docker-compose.portainer.yml (#5491)
Portainer does not allow upper case letters in the stack name. Update documentation to adhere to this limitation.
2024-01-21 10:34:04 -08:00
luzpaz
58bf9c552b Documentation: Fix typos with automated tool (#5319)
---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2024-01-08 16:58:41 +00:00
Colin Hebert
4f85dcecfc Deployment: Use the default Docker healthcheck from the Dockerfile (Part 2) (#5224)
* Set default healthcheck

* Rely on default healthcheck
2024-01-07 22:49:29 +00:00
Trenton H
771c1fab92 Chore: Raise Gotenberg container version (#4815)
* Updates the Gotenberg version to use 7.10 and gotenberg-client to match
* Fixes a long standing bug in this test where a whole page was missing from the expected
2023-12-05 15:36:25 +00:00
Trenton H
c8ee35692c Documentation: Update documentation to refer only to Docker Compose v2 command (#4650)
* Replaces references to docker-compose (the v1 executable) with docker compose (the v2 plugin) as well as fixing up some references between the tool vs the command

* Update docs/setup.md

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>

* Replaces references to docker-compose (the v1 executable) with docker compose (the v2 plugin) as well as fixing up some references between the tool vs the command

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2023-11-20 20:14:33 -08:00
Trenton H
c4407dccf6 Updates the default Postgres to 15 for new installs 2023-06-20 10:35:48 -07:00
shamoon
38de2a7767 Merge branch 'dev' 2023-02-16 20:07:50 -08:00
Omar Saleem
37ddc3b8f7 wrapping ports in quotes 2023-02-10 15:14:22 -08:00
shamoon
8955d25a00 update gotenberg to 7.8 2023-02-05 20:17:22 -08:00
Michael Shamoon
8c9a74ee0c Merge branch 'dev' 2022-12-29 19:39:38 -08:00
ThellraAK
800e842ab3 Removing Mariadb default open port (#2227)
* Removing Mariadb default open port

Removing the listening port 3306 for the DB; Docker networks will let the containers talk to one another. The existing setup would allow anyone to connect to the DB and use the default passwords.

* Update docker-compose.mariadb-tika.yml

Adding change to the other compose file to remove open port

* Remove excess blank lines

* Remove excess blank lines

Co-authored-by: Felix E <felix@eckhofer.com>
2022-12-21 02:36:37 -08:00
phail
fe2db4dbf7 adapt compose file for eml parsing 2022-11-30 10:16:39 +01:00
phail
47c88a6bdd Merge remote-tracking branch 'paperless/dev' into feature-consume-eml 2022-11-30 10:10:57 +01:00
Trenton H
fed7d3e993 Use docker compose to start and stop containers which match directly to our command overrides 2022-11-29 20:11:40 -08:00
Trenton H
ba1366f49a Merge branch 'dev' into beta 2022-11-09 13:51:10 -08:00
phail
08988e11f8 Merge remote-tracking branch 'paperless/dev' into feature-consume-eml 2022-10-23 20:37:22 +02:00
phail
f1f5227ccd add unittest for external images 2022-10-22 00:44:32 +02:00
Trenton Holmes
694ad53ef9 Updates Gotenberg container to the latest 2022-10-09 17:55:09 -07:00