44 Commits

Author SHA1 Message Date
Trenton H
aea2927a02 Feature: Convert Tika parser to the plugin system (#12333)
* Chore: move Tika parser and tests to paperless/

Move TikaDocumentParser and its tests to the canonical parser package
location, matching the pattern established for TextDocumentParser:

- src/paperless_tika/parsers.py → src/paperless/parsers/tika.py
- src/paperless_tika/tests/test_tika_parser.py → src/paperless/tests/parsers/test_tika_parser.py
- src/paperless_tika/tests/samples/ → src/paperless/tests/samples/tika/

Merge tika fixtures (tika_parser, sample_odt_file, sample_docx_file,
sample_doc_file, sample_broken_odt) into the shared parsers conftest.
Remove the now-empty src/paperless_tika/tests/conftest.py.

Content is unchanged — this commit is rename-only so git history is
preserved on the moved files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Feature: Phase 3 — migrate TikaDocumentParser to ParserProtocol

Refactor TikaDocumentParser to satisfy ParserProtocol without subclassing
the legacy DocumentParser ABC:

- Add ClassVars: name, version, author, url
- Add supported_mime_types() classmethod (12 Office/ODF/RTF MIME types)
- Add score() classmethod — returns None when TIKA_ENABLED is False, 10 otherwise
- can_produce_archive = False (PDF is for display, not an OCR archive)
- requires_pdf_rendition = True (Office formats need PDF for browser display)
- __enter__/__exit__ via ExitStack: TikaClient opened once per parser
  lifetime and shared across parse() and extract_metadata() calls
- extract_metadata() falls back to a short-lived TikaClient when called
  outside a context manager (legacy view-layer metadata path)
- _convert_to_pdf() uses OutputTypeConfig() to honour the database-stored
  ApplicationConfiguration before falling back to the env-var setting
- Rename convert_to_pdf → _convert_to_pdf (private helper)

Update paperless_tika/signals.py shim to import from the new module path
and drop the legacy logging_group/progress_callback kwargs.

Update documents/consumer.py to extend the existing TextDocumentParser
special cases to also cover TikaDocumentParser (parse/get_thumbnail
signatures, __exit__ cleanup).

Add TestTikaParserRegistryInterface (7 tests) covering score(), properties,
and ParserProtocol isinstance check.  Update existing tests to use the new
accessor API (get_text, get_date, get_archive_path, _convert_to_pdf).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: update remaining imports and move live Tika tests after parser migration

- src/documents/tests/test_parsers.py: import TikaDocumentParser from
  paperless.parsers.tika (old paperless_tika.parsers no longer exists)
- git mv paperless_tika/tests/test_live_tika.py →
  paperless/tests/parsers/test_live_tika.py to co-locate all Tika tests
  with the parser; update import and replace old attribute API
  (tika_parser.text/.archive_path) with accessor methods
  (get_text/get_archive_path)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: satisfy mypy and pyrefly for TikaDocumentParser

Use a TYPE_CHECKING-guarded assert to narrow self._tika_client from
TikaClient | None to TikaClient at the point of use in parse().  The
assert is visible to type checkers (TYPE_CHECKING=True) so both mypy
and pyrefly accept the subsequent attribute accesses without error;
at runtime TYPE_CHECKING is False so the assert never executes and no
ruff S101 suppression is required.
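A hedged illustration of the `TYPE_CHECKING`-guarded narrowing pattern. `Client`, `Parser` and `from_file` are hypothetical names, not the real TikaClient API.

```python
from __future__ import annotations

from typing import TYPE_CHECKING


class Client:
    """Hypothetical client returning a parsed-content dict."""

    def from_file(self, path: str) -> dict[str, str]:
        return {"content": f"text of {path}"}


class Parser:
    def __init__(self, client: Client | None = None) -> None:
        # Declared Optional because it is only set once the parser is entered.
        self._client: Client | None = client

    def parse(self, path: str) -> str:
        if TYPE_CHECKING:
            # Type checkers treat this branch as taken, so the assert
            # narrows Client | None to Client for the line below.
            assert self._client is not None
        # At runtime the branch is skipped, so a None client surfaces as
        # AttributeError here rather than as an AssertionError.
        return self._client.from_file(path)["content"]
```

The trade-off is that the runtime gives no explicit guard. If `parse()` is called before the client is set, the failure is an `AttributeError` on `None`.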

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: require context manager for TikaDocumentParser; clean up client lifecycle

- consumer.py: call __enter__ for new-style parsers so _tika_client and
  _gotenberg_client are set before parse() is invoked
- views.py: use `with parser` (via nullcontext for old-style parsers) in
  get_metadata so extract_metadata always runs inside a context manager
- tika.py: GotenbergClient added to ExitStack alongside TikaClient;
  inline client creation removed from extract_metadata and _convert_to_pdf;
  __exit__ uses ExitStack.close() instead of __exit__ pass-through
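A minimal sketch of the `nullcontext` approach described for `views.py`. It assumes that a parser "is new-style" when it defines `__enter__` and `__exit__`. `OldStyleParser`, `NewStyleParser`, `as_context` and `metadata_for` are all hypothetical.

```python
from __future__ import annotations

from contextlib import AbstractContextManager, ExitStack, nullcontext
from typing import Any


class OldStyleParser:
    """Hypothetical legacy parser that is not a context manager."""

    def extract_metadata(self, path: str) -> list[dict[str, Any]]:
        return [{"key": "source", "value": path}]


class NewStyleParser:
    """Hypothetical new-style parser that must be entered before use."""

    def __init__(self) -> None:
        self._stack = ExitStack()
        self.is_open = False

    def __enter__(self) -> NewStyleParser:
        self.is_open = True
        # Per the commit, the real parser registers its TikaClient and
        # GotenbergClient on the stack; this sketch registers a callback.
        self._stack.callback(self._mark_closed)
        return self

    def _mark_closed(self) -> None:
        self.is_open = False

    def __exit__(self, *exc: object) -> None:
        self._stack.close()

    def extract_metadata(self, path: str) -> list[dict[str, Any]]:
        if not self.is_open:
            raise RuntimeError("extract_metadata() needs an active 'with' block")
        return [{"key": "source", "value": path}]


def as_context(parser: Any) -> AbstractContextManager[Any]:
    # New-style parsers are their own context managers; legacy ones are
    # wrapped in nullcontext so a single `with` works for both.
    if hasattr(parser, "__enter__") and hasattr(parser, "__exit__"):
        return parser
    return nullcontext(parser)


def metadata_for(parser: Any, path: str) -> list[dict[str, Any]]:
    with as_context(parser) as p:
        return p.extract_metadata(path)
```

`nullcontext(parser)` returns the wrapped object from `__enter__` and does nothing on exit. The calling code can therefore use one `with` statement regardless of which kind of parser it holds.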

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 15:43:28 -07:00
Trenton H
7494161c95 Add dependency groups for pre-commit dependencies 2026-03-12 08:04:21 -07:00
Trenton H
5331312699 Remove cooldown for pre-commit updates (it's not supported)
Removed the default cooldown period for pre-commit updates.
2026-03-12 07:59:27 -07:00
Trenton H
b5a002b8ed Chore: Enable dependabot for pre-commit (#12305) 2026-03-12 07:52:43 -07:00
Trenton H
8db1c4e08b Breaking: Remove pyzbar as a barcode reader (#12065) 2026-02-13 08:14:00 -08:00
shamoon
ab328e0212 Chore: move to Zensical for docs (#12011)
(cherry picked from commit 3c51b3f9cd)
2026-02-07 10:58:55 -08:00
Trenton H
71663fdbe2 Chore: Switches all locations to use prek in place of pre-commit (#12002) 2026-02-05 10:51:23 -08:00
Trenton H
2e5bd02e7e chore: Improves dependabot groups, in particular the Django group not catching everything (#11397) 2025-12-03 09:25:59 -08:00
Trenton H
3d2a3ede71 Chore: Updates dependency groups (#10339) 2025-07-07 17:37:58 -07:00
sidey79
7c33785c07 Development: devcontainer setup, docs and enable dependabot (#10081)
* fix: container setup and task description

* feat: enable dependabot for devcontainer

* fix: don't install latest uv and don't install uvx

* Clean up devcontainer README

* Fix the reset venv command

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2025-06-01 17:59:46 +00:00
Trenton H
51e70f0a20 Removes the reviewers field, excludes Django and related from the small changes group (#10076) 2025-05-31 15:35:51 +00:00
shamoon
312bb743b9 Chore: add ymlfmt (#9745) 2025-04-22 22:20:54 +00:00
Trenton H
c3df7d3439 Chore: Group additional Django dependencies together (#9741) 2025-04-21 12:16:34 -07:00
shamoon
2f70d58219 Development: change frontend package manager to pnpm (#9363) 2025-03-11 17:59:44 +00:00
Trenton Holmes
5570d20625 Fixes the package ecosystem 2025-03-09 19:46:34 -07:00
Trenton H
ba2cb1dec8 Chore: Enables dependabot for Dockerfile and our Compose files (#9342) 2025-03-09 19:43:07 -07:00
Trenton H
eb8e124971 Chore: Switch from pipenv to uv (#9251) 2025-03-04 16:15:51 +00:00
Trenton H
654c9ca273 Feature: Switch webserver to granian (#9218)
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2025-02-28 19:37:45 +00:00
shamoon
bb83c1eb0a Chore: upgrade to DRF 3.15 (#7134) 2024-07-09 16:57:53 +00:00
Trenton H
e799d757c2 Ignores DRF 3.15.2 (#7122) 2024-06-27 20:40:16 +00:00
Trenton H
5bd248578a Configures dependabot to ignore djangorestframework specific versions (#6967) 2024-06-11 21:36:43 +00:00
Trenton H
622f624132 Chore: Change the code formatter to Ruff (#6756)
* Changing the formatting to ruff-format

* Replaces references to black to ruff or ruff format, removes black from dependencies
2024-05-18 02:26:50 +00:00
Trenton H
f43013a746 Ignores uvicorn updates in dependabot (#5906) 2024-02-26 07:55:59 -08:00
shamoon
0f08796e1b Allow more dependabot updates at once 2023-11-01 13:53:02 -07:00
Trenton H
613b429540 Updates dependabot to group some backend deps and all GitHub Actions updates (#4280) 2023-09-27 09:32:25 -07:00
shamoon
0d01295e79 Update GitHub strings
See #4024
2023-09-19 20:34:27 -07:00
shamoon
8e0adbb0fb Add other jest dependencies to group 2023-09-01 14:17:32 -07:00
shamoon
c5d18b03cd Add eslint to eslint dependabot group 2023-08-01 13:27:58 -07:00
shamoon
62e81d8bf0 Add more frontend dependabot groups 2023-07-19 09:48:23 -07:00
shamoon
aa1f2d3b59 Group frontend angular dependabot updates 2023-07-05 09:57:44 -07:00
Quinn Casey
8c19c2c2e9 Fix comment directory 2022-04-28 12:52:20 -07:00
Quinn Casey
6f0fee4c43 Assign GHA bumps to CI/CD team 2022-04-28 12:51:43 -07:00
Quinn Casey
a5b4a7caad Add back review groups 2022-04-28 12:49:44 -07:00
Quinn Casey
c1b9db19c6 Add back labels removed in 30834e 2022-04-28 12:47:37 -07:00
Trenton Holmes
e5f5030e9c Updates the target branch for GHA updates to be dev 2022-04-25 15:27:15 -07:00
Michael Shamoon
3afb3a905c fix ci.yml comment
[ci skip]
2022-03-27 14:44:39 -07:00
Michael Shamoon
117157f02c Merge branch 'main' into testing 2022-03-23 07:22:52 -07:00
shamoon
30834eb8ff Change dependabot to check npm monthly
[ci skip]
2022-03-21 14:53:56 -07:00
Michael Shamoon
98d677dc0b Merge branch 'main' into dev 2022-03-21 08:36:44 -07:00
shamoon
a6d2a390f0 Add review groups 2022-03-13 20:54:06 -07:00
Trenton Holmes
0edfe83a23 Enables dependabot for GitHub Actions versions 2022-03-12 18:03:39 -08:00
Quinn Casey
296d1d1b61 Make dependabot labels consistent 2022-03-10 14:01:58 -08:00
Johann Bauer
9484eb9f6b Enable dependabot for repository (#86) 2022-03-02 15:49:50 +01:00
Johann Bauer
c7541cb516 Enable dependabot for repository (#86) 2022-02-18 16:06:56 +01:00