Compare commits

14 Commits

Author SHA1 Message Date
Trenton H
2098a11eb1 Fix: text parser get_parser forwards logging_group, drops progress_callback
TextDocumentParser.__init__ accepts logging_group: object = None, same
as RemoteDocumentParser. The old shim incorrectly dropped it; fix to
forward it as a positional arg and only drop progress_callback.
Add type annotations and from __future__ import annotations for
consistency with the remote parser signals shim.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 12:36:24 -07:00
Trenton H
af8a8e791b Fix: get_parser factory forwards logging_group, drops progress_callback
consumer.py calls parser_class(logging_group, progress_callback=...).
RemoteDocumentParser.__init__ accepts logging_group but not
progress_callback, so only the latter is dropped — matching the pattern
established by the TextDocumentParser signals shim.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 12:35:16 -07:00
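
The call shape described in the two fixes above can be sketched as follows. This is a minimal illustration, not the project's code: `FakeRemoteParser` and the exact `get_parser` signature are assumptions. Only the behaviour (forward `logging_group`, drop `progress_callback`) comes from the commit messages.

```python
from __future__ import annotations

from typing import Any


class FakeRemoteParser:
    """Hypothetical stand-in: accepts logging_group but no progress_callback."""

    def __init__(self, logging_group: object = None) -> None:
        self.logging_group = logging_group


def get_parser(
    logging_group: object = None,
    progress_callback: Any = None,
) -> FakeRemoteParser:
    """Factory shim (assumed signature): forward one argument, drop the other."""
    # progress_callback is accepted so the consumer's call does not fail,
    # but it is deliberately not passed on to the parser constructor.
    return FakeRemoteParser(logging_group)


# Mirrors the call shape the commit message attributes to consumer.py:
# parser_class(logging_group, progress_callback=...)
parser = get_parser("group-1", progress_callback=lambda *args: None)
```

Naming both parameters explicitly, rather than swallowing `*args`, keeps the shim type-checkable and makes the dropped argument visible in the signature.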
Trenton H
8d4163bef3 Refactor: fix type errors in remote parser and signals
- remote.py: add `if TYPE_CHECKING: assert` guards before the Azure
  client construction to narrow config.endpoint and config.api_key from
  str|None to str. The narrowing is safe: engine_is_valid() guarantees
  both are non-None when it returns True (api_key explicitly; endpoint
  via `not (engine=="azureai" and endpoint is None)` for the only valid
  engine). Asserts are wrapped in TYPE_CHECKING so they carry zero
  runtime cost.

- signals.py: add full type annotations — return types, Any-typed
  sender parameter, and explicit logging_group argument replacing *args.
  Add `from __future__ import annotations` for consistent annotation style.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 12:31:17 -07:00
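
The `if TYPE_CHECKING: assert` narrowing described for remote.py might look roughly like this. It is a sketch under stated assumptions: `FakeRemoteConfig` and `client_args` are hypothetical names, and the validity rule is paraphrased from the commit message rather than copied from the source.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING


@dataclass
class FakeRemoteConfig:
    """Hypothetical stand-in for the remote engine configuration."""

    engine: str | None = None
    api_key: str | None = None
    endpoint: str | None = None

    def engine_is_valid(self) -> bool:
        # Paraphrase of the rule in the commit message: api_key is checked
        # explicitly, endpoint is required for the only valid engine.
        return (
            self.engine == "azureai"
            and self.api_key is not None
            and self.endpoint is not None
        )


def client_args(config: FakeRemoteConfig) -> tuple[str, str]:
    """Return (endpoint, api_key) with both narrowed from str | None to str."""
    if not config.engine_is_valid():
        raise ValueError("remote engine is not configured")
    if TYPE_CHECKING:
        # Type checkers evaluate this branch and narrow the attributes.
        # At runtime TYPE_CHECKING is False, so the asserts never execute.
        assert config.endpoint is not None
        assert config.api_key is not None
    return config.endpoint, config.api_key
```

The guard is only sound because `engine_is_valid()` has already established the invariant; without that check the asserts would hide a real `None` from the type checker.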
Trenton H
e9e1d4ccca Refactor: wire RemoteDocumentParser into consumer and fix signals
- paperless_remote/signals.py: import from paperless.parsers.remote
  (new location after git mv). supported_mime_types() is now a
  classmethod that always returns the full set, so get_supported_mime_types()
  in the signal layer explicitly checks RemoteEngineConfig validity and
  returns {} when unconfigured — preserving the old behaviour where an
  unconfigured remote parser does not register for any MIME types.

- documents/consumer.py: extend the _parser_cleanup() shim, parse()
  dispatch, and get_thumbnail() dispatch to include RemoteDocumentParser
  alongside TextDocumentParser. Both new-style parsers use __exit__
  for cleanup and take (document_path, mime_type) without a file_name
  argument.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 12:09:33 -07:00
Trenton H
c955ba7d07 Refactor: improve remote parser test fixture structure
- make_azure_mock moved from conftest.py back into test_remote_parser.py;
  it is specific to that module and does not belong in shared fixtures
- azure_client fixture composes azure_settings + make_azure_mock + patch
  in one step; tests no longer repeat the mocker.patch call or carry an
  unused azure_settings parameter
- failing_azure_client fixture similarly composes azure_settings + patch
  with a RuntimeError side effect; TestRemoteParserParseError now only
  receives the mock it actually uses
- All @pytest.mark.parametrize calls use pytest.param with explicit ids
  (pdf, png, jpeg, ...) for readable test output

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 12:00:37 -07:00
Trenton H
7028bb2163 Refactor: use fixture factory and usefixtures in remote parser tests
- `_make_azure_mock` helper promoted to `make_azure_mock` factory fixture
  in conftest.py; tests call `make_azure_mock()` or
  `make_azure_mock("custom text")` instead of a module-level function
- `azure_settings` and `no_engine_settings` applied via
  `@pytest.mark.usefixtures` wherever their value is not referenced
  inside the test body; `TestRemoteParserParseError` marked at the class
  level since all three tests need the same setting

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 11:56:38 -07:00
Trenton H
5d4d87764c Feature: migrate RemoteDocumentParser to ParserProtocol interface
Rewrites the remote OCR parser to the new plugin system contract:

- `supported_mime_types()` is now a classmethod that always returns the
  full set of 7 MIME types; the old instance-method hack (returning {}
  when unconfigured) is removed
- `score()` classmethod returns None when no remote engine is configured
  (making the parser invisible to the registry), and 20 when active —
  higher than the tesseract default of 10 so the remote engine takes
  priority when both are available
- No longer inherits from RasterisedDocumentParser; inherits no parser
  class at all — just implements the protocol directly
- `can_produce_archive = True`; `requires_pdf_rendition = False`
- `_azure_ai_vision_parse()` takes explicit config arg; API client
  created and closed within the method
- `get_page_count()` returns the PDF page count for application/pdf,
  delegating to the new `get_page_count_for_pdf()` utility
- `extract_metadata()` delegates to `extract_pdf_metadata()` for PDFs;
  returns [] for all other MIME types

New files:
- `src/paperless/parsers/utils.py` — shared `extract_pdf_metadata()` and
  `get_page_count_for_pdf()` utilities (pikepdf-based); both the remote
  and tesseract parsers will use these going forward
- `src/paperless/tests/parsers/test_remote_parser.py` — 42 pytest-style
  tests using pytest-django `settings` and pytest-mock `mocker` fixtures
- `src/paperless/tests/parsers/conftest.py` — remote parser instance,
  sample-file, and settings-helper fixtures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 11:52:11 -07:00
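
The scoring rule in this commit message (None hides the parser, 20 beats the tesseract default of 10) can be illustrated with a small selection loop. This is a sketch, not the registry's actual implementation: `pick_parser`, `TesseractLike`, `RemoteLike`, and the `configured` flag are hypothetical names. Only the score values and the None-means-decline rule come from the text above.

```python
from __future__ import annotations

MIME_TYPES = {"application/pdf": ".pdf", "image/png": ".png"}


class TesseractLike:
    """Hypothetical local OCR parser with the default score of 10."""

    @classmethod
    def supported_mime_types(cls) -> dict[str, str]:
        return MIME_TYPES

    @classmethod
    def score(cls, mime_type: str, filename: str, path=None) -> int | None:
        return 10


class RemoteLike:
    """Hypothetical remote parser: always lists its types, may decline."""

    configured = False  # assumed toggle standing in for engine configuration

    @classmethod
    def supported_mime_types(cls) -> dict[str, str]:
        return MIME_TYPES

    @classmethod
    def score(cls, mime_type: str, filename: str, path=None) -> int | None:
        return 20 if cls.configured else None


def pick_parser(candidates, mime_type: str, filename: str):
    """Return the highest-scoring candidate, skipping those that decline."""
    best, best_score = None, None
    for candidate in candidates:
        if mime_type not in candidate.supported_mime_types():
            continue
        score = candidate.score(mime_type, filename)
        if score is None:  # parser declined: invisible for this file
            continue
        if best_score is None or score > best_score:
            best, best_score = candidate, score
    return best
```

With `RemoteLike.configured = False` the loop falls back to `TesseractLike`; setting it to `True` makes the remote parser win for the same file.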
Trenton H
75dce7f19f Refactor: move remote parser, test, and sample to paperless.parsers
Relocates three files to their new homes in the parser plugin system:

- src/paperless_remote/parsers.py
    → src/paperless/parsers/remote.py
- src/paperless_remote/tests/test_parser.py
    → src/paperless/tests/parsers/test_remote_parser.py
- src/paperless_remote/tests/samples/simple-digital.pdf
    → src/paperless/tests/samples/remote/simple-digital.pdf

Content and imports will be updated in the follow-up commit that
rewrites the parser to the new ParserProtocol interface.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 11:32:34 -07:00
dependabot[bot]
365ff99934 Bump ocrmypdf from 16.13.0 to 17.3.0 in the document-processing group (#12267)
* Bump ocrmypdf from 16.13.0 to 17.3.0 in the document-processing group

Bumps the document-processing group with 1 update: [ocrmypdf](https://github.com/ocrmypdf/OCRmyPDF).


Updates `ocrmypdf` from 16.13.0 to 17.3.0
- [Release notes](https://github.com/ocrmypdf/OCRmyPDF/releases)
- [Commits](https://github.com/ocrmypdf/OCRmyPDF/compare/v16.13.0...v17.3.0)

---
updated-dependencies:
- dependency-name: ocrmypdf
  dependency-version: 17.3.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: document-processing
...

Signed-off-by: dependabot[bot] <support@github.com>

* Updates the argument name for v17

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Trenton H <797416+stumpylog@users.noreply.github.com>
2026-03-13 09:51:21 -07:00
Trenton H
d86cfdb088 Feature: Initial document parser plugin framework (#12294) 2026-03-12 21:53:17 +00:00
dependabot[bot]
c2e1085418 Chore(deps): Bump tornado from 6.5.4 to 6.5.5 (#12327)
Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.5.4 to 6.5.5.
- [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst)
- [Commits](https://github.com/tornadoweb/tornado/compare/v6.5.4...v6.5.5)

---
updated-dependencies:
- dependency-name: tornado
  dependency-version: 6.5.5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-12 13:44:41 -07:00
Trenton H
ee0d1a3094 Enhancement: Make the StatusConsumer truly async (#12298) 2026-03-12 13:27:35 -07:00
Trenton H
f15394fa5c Fix: Removes the double exec that prevented migrations from running (#12317) 2026-03-12 12:46:12 -07:00
dependabot[bot]
773eb25f7d Chore(deps): Bump the utilities-minor group across 1 directory with 5 updates (#12324)
* Chore(deps): Bump the utilities-minor group across 1 directory with 5 updates

Bumps the utilities-minor group with 5 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [drf-spectacular-sidecar](https://github.com/tfranzel/drf-spectacular-sidecar) | `2026.1.1` | `2026.3.1` |
| [filelock](https://github.com/tox-dev/py-filelock) | `3.20.3` | `3.25.0` |
| [scikit-learn](https://github.com/scikit-learn/scikit-learn) | `1.7.2` | `1.8.0` |
| [faker](https://github.com/joke2k/faker) | `40.5.1` | `40.8.0` |
| [pyrefly](https://github.com/facebook/pyrefly) | `0.54.0` | `0.55.0` |



Updates `drf-spectacular-sidecar` from 2026.1.1 to 2026.3.1
- [Commits](https://github.com/tfranzel/drf-spectacular-sidecar/compare/2026.1.1...2026.3.1)

Updates `filelock` from 3.20.3 to 3.25.0
- [Release notes](https://github.com/tox-dev/py-filelock/releases)
- [Changelog](https://github.com/tox-dev/filelock/blob/main/docs/changelog.rst)
- [Commits](https://github.com/tox-dev/py-filelock/compare/3.20.3...3.25.0)

Updates `scikit-learn` from 1.7.2 to 1.8.0
- [Release notes](https://github.com/scikit-learn/scikit-learn/releases)
- [Commits](https://github.com/scikit-learn/scikit-learn/compare/1.7.2...1.8.0)

Updates `faker` from 40.5.1 to 40.8.0
- [Release notes](https://github.com/joke2k/faker/releases)
- [Changelog](https://github.com/joke2k/faker/blob/master/CHANGELOG.md)
- [Commits](https://github.com/joke2k/faker/compare/v40.5.1...v40.8.0)

Updates `pyrefly` from 0.54.0 to 0.55.0
- [Release notes](https://github.com/facebook/pyrefly/releases)
- [Commits](https://github.com/facebook/pyrefly/compare/0.54.0...0.55.0)

---
updated-dependencies:
- dependency-name: drf-spectacular-sidecar
  dependency-version: 2026.3.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: filelock
  dependency-version: 3.25.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: scikit-learn
  dependency-version: 1.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: faker
  dependency-version: 40.8.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: pyrefly
  dependency-version: 0.55.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Dont know what your problem is dependabot

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2026-03-12 12:30:42 -07:00
34 changed files with 3797 additions and 750 deletions

View File

@@ -10,12 +10,10 @@ cd "${PAPERLESS_SRC_DIR}"
 # The whole migrate, with flock, needs to run as the right user
 if [[ -n "${USER_IS_NON_ROOT}" ]]; then
-    exec s6-setlock -n "${data_dir}/migration_lock" python3 manage.py check --tag compatibility paperless
+    python3 manage.py check --tag compatibility paperless || exit 1
     exec s6-setlock -n "${data_dir}/migration_lock" python3 manage.py migrate --skip-checks --no-input
 else
-    exec s6-setuidgid paperless \
-        s6-setlock -n "${data_dir}/migration_lock" \
-        python3 manage.py check --tag compatibility paperless
+    s6-setuidgid paperless python3 manage.py check --tag compatibility paperless || exit 1
     exec s6-setuidgid paperless \
         s6-setlock -n "${data_dir}/migration_lock" \
         python3 manage.py migrate --skip-checks --no-input

View File

@@ -42,10 +42,10 @@ dependencies = [
   "djangorestframework~=3.16",
   "djangorestframework-guardian~=0.4.0",
   "drf-spectacular~=0.28",
-  "drf-spectacular-sidecar~=2026.1.1",
+  "drf-spectacular-sidecar~=2026.3.1",
   "drf-writable-nested~=0.7.1",
   "faiss-cpu>=1.10",
-  "filelock~=3.20.3",
+  "filelock~=3.25.2",
   "flower~=2.0.1",
   "gotenberg-client~=0.13.1",
   "httpx-oauth~=0.16",
@@ -60,7 +60,7 @@ dependencies = [
   "llama-index-llms-openai>=0.6.13",
   "llama-index-vector-stores-faiss>=0.5.2",
   "nltk~=3.9.1",
-  "ocrmypdf~=16.13.0",
+  "ocrmypdf~=17.3.0",
   "openai>=1.76",
   "pathvalidate~=3.3.1",
   "pdf2image~=1.17.0",
@@ -72,7 +72,7 @@ dependencies = [
   "rapidfuzz~=3.14.0",
   "redis[hiredis]~=5.2.1",
   "regex>=2025.9.18",
-  "scikit-learn~=1.7.0",
+  "scikit-learn~=1.8.0",
   "sentence-transformers>=4.1",
   "setproctitle~=1.3.4",
   "tika-client~=0.10.0",
@@ -111,7 +111,7 @@ docs = [
 testing = [
   "daphne",
   "factory-boy~=3.3.1",
-  "faker~=40.5.1",
+  "faker~=40.8.0",
   "imagehash",
   "pytest~=9.0.0",
   "pytest-cov~=7.0.0",

View File

@@ -51,11 +51,29 @@ from documents.templating.workflows import parse_w_workflow_placeholders
from documents.utils import copy_basic_file_stats
from documents.utils import copy_file_with_basic_stats
from documents.utils import run_subprocess
from paperless.parsers.remote import RemoteDocumentParser
from paperless.parsers.text import TextDocumentParser
from paperless_mail.parsers import MailDocumentParser
LOGGING_NAME: Final[str] = "paperless.consumer"
def _parser_cleanup(parser: DocumentParser) -> None:
"""
Call cleanup on a parser, handling the new-style context-manager parsers.
New-style parsers (e.g. TextDocumentParser) use __exit__ for teardown
instead of a cleanup() method. This shim will be removed once all existing parsers
have switched to the new style and this consumer is updated to use it
TODO(stumpylog): Remove me in the future
"""
if isinstance(parser, (TextDocumentParser, RemoteDocumentParser)):
parser.__exit__(None, None, None)
else:
parser.cleanup()
class WorkflowTriggerPlugin(
NoCleanupPluginMixin,
NoSetupPluginMixin,
@@ -459,6 +477,12 @@ class ConsumerPlugin(
self.filename,
self.input_doc.mailrule_id,
)
elif isinstance(
document_parser,
(TextDocumentParser, RemoteDocumentParser),
):
# TODO(stumpylog): Remove me in the future
document_parser.parse(self.working_copy, mime_type)
else:
document_parser.parse(self.working_copy, mime_type, self.filename)
@@ -469,11 +493,15 @@ class ConsumerPlugin(
ProgressStatusOptions.WORKING,
ConsumerStatusShortMessage.GENERATING_THUMBNAIL,
)
thumbnail = document_parser.get_thumbnail(
self.working_copy,
mime_type,
self.filename,
)
if isinstance(document_parser, (TextDocumentParser, RemoteDocumentParser)):
# TODO(stumpylog): Remove me in the future
thumbnail = document_parser.get_thumbnail(self.working_copy, mime_type)
else:
thumbnail = document_parser.get_thumbnail(
self.working_copy,
mime_type,
self.filename,
)
text = document_parser.get_text()
date = document_parser.get_date()
@@ -490,7 +518,7 @@ class ConsumerPlugin(
page_count = document_parser.get_page_count(self.working_copy, mime_type)
except ParseError as e:
document_parser.cleanup()
_parser_cleanup(document_parser)
if tempdir:
tempdir.cleanup()
self._fail(
@@ -500,7 +528,7 @@ class ConsumerPlugin(
exception=e,
)
except Exception as e:
document_parser.cleanup()
_parser_cleanup(document_parser)
if tempdir:
tempdir.cleanup()
self._fail(
@@ -702,7 +730,7 @@ class ConsumerPlugin(
exception=e,
)
finally:
document_parser.cleanup()
_parser_cleanup(document_parser)
tempdir.cleanup()
self.run_post_consume_script(document)
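
The dispatch performed by `_parser_cleanup()` can be exercised in isolation with two stand-in classes. This is only an illustrative sketch: `OldStyleParser` and `NewStyleParser` are hypothetical placeholders for the legacy parsers and for `TextDocumentParser` / `RemoteDocumentParser`.

```python
class OldStyleParser:
    """Hypothetical legacy parser that tears down via cleanup()."""

    def __init__(self) -> None:
        self.cleaned = False

    def cleanup(self) -> None:
        self.cleaned = True


class NewStyleParser:
    """Hypothetical new-style parser that tears down via __exit__."""

    def __init__(self) -> None:
        self.closed = False

    def __enter__(self) -> "NewStyleParser":
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        self.closed = True


def parser_cleanup(parser) -> None:
    """Route teardown to whichever mechanism the parser provides."""
    if isinstance(parser, NewStyleParser):
        # No exception is in flight, so all three arguments are None.
        parser.__exit__(None, None, None)
    else:
        parser.cleanup()


old, new = OldStyleParser(), NewStyleParser()
parser_cleanup(old)
parser_cleanup(new)
```

Once every parser is a context manager, a shim like this becomes unnecessary because a plain `with` block covers the same teardown, which is presumably what the TODO in the diff is pointing at.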

View File

@@ -30,6 +30,7 @@ def _process_document(doc_id: int) -> None:
)
shutil.move(thumb, document.thumbnail_path)
finally:
# TODO(stumpylog): Cleanup once all parsers are handled
parser.cleanup()

View File

@@ -399,6 +399,7 @@ def update_document_content_maybe_archive_file(document_id) -> None:
f"Error while parsing document {document} (ID: {document_id})",
)
finally:
# TODO(stumpylog): Cleanup once all parsers are handled
parser.cleanup()

View File

 from documents.parsers import get_default_file_extension
 from documents.parsers import get_parser_class_for_mime_type
 from documents.parsers import get_supported_file_extensions
 from documents.parsers import is_file_ext_supported
+from paperless.parsers.text import TextDocumentParser
 from paperless_tesseract.parsers import RasterisedDocumentParser
-from paperless_text.parsers import TextDocumentParser
 from paperless_tika.parsers import TikaDocumentParser

View File

@@ -1,6 +1,7 @@
 import os
 from celery import Celery
+from celery.signals import worker_process_init
 # Set the default Django settings module for the 'celery' program.
 os.environ.setdefault("DJANGO_SETTINGS_MODULE", "paperless.settings")
@@ -15,3 +16,19 @@ app.config_from_object("django.conf:settings", namespace="CELERY")
 # Load task modules from all registered Django apps.
 app.autodiscover_tasks()
+@worker_process_init.connect
+def on_worker_process_init(**kwargs) -> None:  # pragma: no cover
+    """
+    Register built-in parsers eagerly in each Celery worker process.
+    This registers only the built-in parsers (no entrypoint discovery) so
+    that workers can begin consuming documents immediately. Entrypoint
+    discovery for third-party parsers is deferred to the first call of
+    get_parser_registry() inside a task, keeping worker_process_init
+    well within its 4-second timeout budget.
+    """
+    from paperless.parsers.registry import init_builtin_parsers
+    init_builtin_parsers()
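
The two-stage start-up described in the docstring above (built-ins eagerly, entrypoint discovery on first registry access) can be sketched without Celery. `MiniRegistry` is a hypothetical stand-in that only counts calls, and the function bodies are simplified; they are not the registry module's code.

```python
from __future__ import annotations


class MiniRegistry:
    """Hypothetical stand-in that records which stages have run."""

    def __init__(self) -> None:
        self.builtins_registered = False
        self.discover_calls = 0

    def register_defaults(self) -> None:
        self.builtins_registered = True

    def discover(self) -> None:
        # The real registry would load entrypoints here; this only counts.
        self.discover_calls += 1


_registry: MiniRegistry | None = None
_discovery_complete = False


def init_builtin_parsers() -> None:
    """Cheap stage: safe to run in a worker start-up hook."""
    global _registry
    if _registry is None:
        _registry = MiniRegistry()
        _registry.register_defaults()


def get_parser_registry() -> MiniRegistry:
    """Full stage: runs discovery exactly once, on first access."""
    global _registry, _discovery_complete
    if _registry is None:
        _registry = MiniRegistry()
        _registry.register_defaults()
    if not _discovery_complete:
        _registry.discover()
        _discovery_complete = True
    return _registry


init_builtin_parsers()            # what the worker hook would do
registry = get_parser_registry()  # first task triggers discovery
```

Keeping a separate completion flag lets the eager stage create the registry object without discovery being skipped later.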

View File

@@ -1,62 +1,51 @@
 import json
 from typing import Any
-from asgiref.sync import async_to_sync
-from channels.exceptions import AcceptConnection
-from channels.exceptions import DenyConnection
-from channels.generic.websocket import WebsocketConsumer
+from channels.generic.websocket import AsyncWebsocketConsumer
-class StatusConsumer(WebsocketConsumer):
-    def _authenticated(self):
-        return "user" in self.scope and self.scope["user"].is_authenticated
+class StatusConsumer(AsyncWebsocketConsumer):
+    def _authenticated(self) -> bool:
+        user: Any = self.scope.get("user")
+        return user is not None and user.is_authenticated
-    def _can_view(self, data):
-        user = self.scope.get("user") if self.scope.get("user") else None
+    async def _can_view(self, data: dict[str, Any]) -> bool:
+        user: Any = self.scope.get("user")
+        if user is None:
+            return False
         owner_id = data.get("owner_id")
         users_can_view = data.get("users_can_view", [])
         groups_can_view = data.get("groups_can_view", [])
-        return (
-            user.is_superuser
-            or user.id == owner_id
-            or user.id in users_can_view
-            or any(
-                user.groups.filter(pk=group_id).exists() for group_id in groups_can_view
-            )
-        )
-    def connect(self):
+        if user.is_superuser or user.id == owner_id or user.id in users_can_view:
+            return True
+        return await user.groups.filter(pk__in=groups_can_view).aexists()
+    async def connect(self) -> None:
         if not self._authenticated():
-            raise DenyConnection
-        else:
-            async_to_sync(self.channel_layer.group_add)(
-                "status_updates",
-                self.channel_name,
-            )
-            raise AcceptConnection
+            await self.close()
+            return
+        await self.channel_layer.group_add("status_updates", self.channel_name)
+        await self.accept()
-    def disconnect(self, close_code) -> None:
-        async_to_sync(self.channel_layer.group_discard)(
-            "status_updates",
-            self.channel_name,
-        )
+    async def disconnect(self, code: int) -> None:
+        await self.channel_layer.group_discard("status_updates", self.channel_name)
-    def status_update(self, event) -> None:
+    async def status_update(self, event: dict[str, Any]) -> None:
         if not self._authenticated():
-            self.close()
-        else:
-            if self._can_view(event["data"]):
-                self.send(json.dumps(event))
+            await self.close()
+        elif await self._can_view(event["data"]):
+            await self.send(json.dumps(event))
-    def documents_deleted(self, event) -> None:
+    async def documents_deleted(self, event: dict[str, Any]) -> None:
         if not self._authenticated():
-            self.close()
+            await self.close()
         else:
-            self.send(json.dumps(event))
+            await self.send(json.dumps(event))
-    def document_updated(self, event: Any) -> None:
+    async def document_updated(self, event: dict[str, Any]) -> None:
         if not self._authenticated():
-            self.close()
-        else:
-            if self._can_view(event["data"]):
-                self.send(json.dumps(event))
+            await self.close()
+        elif await self._can_view(event["data"]):
+            await self.send(json.dumps(event))
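
The reworked permission check resolves the cheap in-memory conditions first and awaits the group lookup only when they all fail. That ordering can be shown with plain asyncio. `FakeUser` and its `in_any_group` coroutine are hypothetical substitutes for the Django user and the `aexists()` query; no Django or Channels API is used here.

```python
from __future__ import annotations

import asyncio
from typing import Any


class FakeUser:
    """Hypothetical user object; group membership is an in-memory set."""

    def __init__(self, user_id: int, group_ids: set[int], is_superuser: bool = False):
        self.id = user_id
        self.group_ids = group_ids
        self.is_superuser = is_superuser
        self.group_lookups = 0  # counts simulated database round trips

    async def in_any_group(self, group_ids: list[int]) -> bool:
        self.group_lookups += 1
        return bool(self.group_ids & set(group_ids))


async def can_view(user: FakeUser | None, data: dict[str, Any]) -> bool:
    if user is None:
        return False
    owner_id = data.get("owner_id")
    users_can_view = data.get("users_can_view", [])
    groups_can_view = data.get("groups_can_view", [])
    # Synchronous checks first: no awaiting needed for these.
    if user.is_superuser or user.id == owner_id or user.id in users_can_view:
        return True
    # A single awaited membership test replaces one query per group.
    return await user.in_any_group(groups_can_view)


event_data = {"owner_id": 1, "users_can_view": [], "groups_can_view": [7, 8]}
owner = FakeUser(1, set())
member = FakeUser(2, {7})
owner_allowed = asyncio.run(can_view(owner, event_data))
member_allowed = asyncio.run(can_view(member, event_data))
```

In this sketch the owner never triggers a group lookup, while the group member triggers exactly one regardless of how many groups are listed.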

View File

@@ -0,0 +1,379 @@
"""
Public interface for the Paperless-ngx parser plugin system.
This module defines ParserProtocol — the structural contract that every
document parser must satisfy, whether it is a built-in parser shipped with
Paperless-ngx or a third-party parser installed via a Python entrypoint.
Phase 1/2 scope: only the Protocol is defined here. The transitional
DocumentParser ABC (Phase 3) and concrete built-in parsers (Phase 3+) will
be added in later phases, so there are intentionally no imports of parser
implementations here.
Usage example (third-party parser)::
from paperless.parsers import ParserProtocol
class MyParser:
name = "my-parser"
version = "1.0.0"
author = "Acme Corp"
url = "https://example.com/my-parser"
@classmethod
def supported_mime_types(cls) -> dict[str, str]:
return {"application/x-my-format": ".myf"}
@classmethod
def score(cls, mime_type, filename, path=None):
return 10
# … implement remaining protocol methods …
assert isinstance(MyParser(), ParserProtocol)
"""
from __future__ import annotations
from typing import TYPE_CHECKING
from typing import Protocol
from typing import Self
from typing import TypedDict
from typing import runtime_checkable
if TYPE_CHECKING:
import datetime
from pathlib import Path
from types import TracebackType
__all__ = [
"MetadataEntry",
"ParserProtocol",
]
class MetadataEntry(TypedDict):
"""A single metadata field extracted from a document.
All four keys are required. Values are always serialised to strings —
type-specific conversion (dates, integers, lists) is the responsibility
of the parser before returning.
"""
namespace: str
"""URI of the metadata namespace (e.g. 'http://ns.adobe.com/pdf/1.3/')."""
prefix: str
"""Conventional namespace prefix (e.g. 'pdf', 'xmp', 'dc')."""
key: str
"""Field name within the namespace (e.g. 'Author', 'CreateDate')."""
value: str
"""String representation of the field value."""
@runtime_checkable
class ParserProtocol(Protocol):
"""Structural contract for all Paperless-ngx document parsers.
Both built-in parsers and third-party plugins (discovered via the
"paperless_ngx.parsers" entrypoint group) must satisfy this Protocol.
Because it is decorated with runtime_checkable, isinstance(obj,
ParserProtocol) works at runtime based on method presence, which is
useful for validation in ParserRegistry.discover.
Parsers must expose four string attributes at the class level so the
registry can log attribution information without instantiating the parser:
name : str
Human-readable parser name (e.g. "Tesseract OCR").
version : str
Semantic version string (e.g. "1.2.3").
author : str
Author or organisation name.
url : str
URL for documentation, source code, or issue tracker.
"""
# ------------------------------------------------------------------
# Class-level identity (checked by the registry, not Protocol methods)
# ------------------------------------------------------------------
name: str
version: str
author: str
url: str
# ------------------------------------------------------------------
# Class methods
# ------------------------------------------------------------------
@classmethod
def supported_mime_types(cls) -> dict[str, str]:
"""Return a mapping of supported MIME types to preferred file extensions.
The keys are MIME type strings (e.g. "application/pdf"), and the
values are the preferred file extension including the leading dot
(e.g. ".pdf"). The registry uses this mapping both to decide whether
a parser is a candidate for a given file and to determine the default
extension when creating archive copies.
Returns
-------
dict[str, str]
{mime_type: extension} mapping — may be empty if the parser
has been temporarily disabled.
"""
...
@classmethod
def score(
cls,
mime_type: str,
filename: str,
path: Path | None = None,
) -> int | None:
"""Return a priority score for handling this file, or None to decline.
The registry calls this after confirming that the MIME type is in
supported_mime_types. Parsers may inspect filename and optionally
the file at path to refine their confidence level.
A higher score wins. Return None to explicitly decline handling a file
even though the MIME type is listed as supported (e.g. when a feature
flag is disabled, or a required service is not configured).
Parameters
----------
mime_type:
The detected MIME type of the file to be parsed.
filename:
The original filename, including extension.
path:
Optional filesystem path to the file. Parsers that need to
inspect file content (e.g. magic-byte sniffing) may use this.
May be None when scoring happens before the file is available locally.
Returns
-------
int | None
Priority score (higher wins), or None to decline.
"""
...
# ------------------------------------------------------------------
# Properties
# ------------------------------------------------------------------
@property
def can_produce_archive(self) -> bool:
"""Whether this parser can produce a searchable PDF archive copy.
If True, the consumption pipeline may request an archive version when
processing the document, subject to the ARCHIVE_FILE_GENERATION
setting. If False, only thumbnail and text extraction are performed.
"""
...
@property
def requires_pdf_rendition(self) -> bool:
"""Whether the parser must produce a PDF for the frontend to display.
True for formats the browser cannot display natively (e.g. DOCX, ODT).
When True, the pipeline always stores the PDF output regardless of the
ARCHIVE_FILE_GENERATION setting, since the original format cannot be
shown to the user.
"""
...
# ------------------------------------------------------------------
# Core parsing interface
# ------------------------------------------------------------------
def parse(
self,
document_path: Path,
mime_type: str,
*,
produce_archive: bool = True,
) -> None:
"""Parse document_path and populate internal state.
After a successful call, callers retrieve results via get_text,
get_date, and get_archive_path.
Parameters
----------
document_path:
Absolute path to the document file to parse.
mime_type:
Detected MIME type of the document.
produce_archive:
When True (the default) and can_produce_archive is also True,
the parser should produce a searchable PDF at the path returned
by get_archive_path. Pass False when only text extraction and
thumbnail generation are required and disk I/O should be minimised.
Raises
------
documents.parsers.ParseError
If parsing fails for any reason.
"""
...
# ------------------------------------------------------------------
# Result accessors
# ------------------------------------------------------------------
def get_text(self) -> str | None:
"""Return the plain-text content extracted during parse.
Returns
-------
str | None
Extracted text, or None if no text could be found.
"""
...
def get_date(self) -> datetime.datetime | None:
"""Return the document date detected during parse.
Returns
-------
datetime.datetime | None
Detected document date, or None if no date was found.
"""
...
def get_archive_path(self) -> Path | None:
"""Return the path to the generated archive PDF, or None.
Returns
-------
Path | None
Path to the searchable PDF archive, or None if no archive was
produced (e.g. because produce_archive=False or the parser does
not support archive generation).
"""
...
# ------------------------------------------------------------------
# Thumbnail and metadata
# ------------------------------------------------------------------
def get_thumbnail(self, document_path: Path, mime_type: str) -> Path:
"""Generate and return the path to a thumbnail image for the document.
May be called independently of parse. The returned path must point to
an existing WebP image file inside the parser's temporary working
directory.
Parameters
----------
document_path:
Absolute path to the source document.
mime_type:
Detected MIME type of the document.
Returns
-------
Path
Path to the generated thumbnail image (WebP format preferred).
"""
...
def get_page_count(
self,
document_path: Path,
mime_type: str,
) -> int | None:
"""Return the number of pages in the document, if determinable.
Parameters
----------
document_path:
Absolute path to the source document.
mime_type:
Detected MIME type of the document.
Returns
-------
int | None
Page count, or None if the parser cannot determine it.
"""
...
def extract_metadata(
self,
document_path: Path,
mime_type: str,
) -> list[MetadataEntry]:
"""Extract format-specific metadata from the document.
Called by the API view layer on demand — not during the consumption
pipeline. Results are returned to the frontend for per-file display.
For documents with an archive version, this method is called twice:
once for the original file (with its native MIME type) and once for
the archive file (with ``"application/pdf"``). Parsers that produce
archives should handle both cases.
Implementations must not raise. A failure to read metadata is not
fatal — log a warning and return whatever partial results were
collected, or ``[]`` if none.
Parameters
----------
document_path:
Absolute path to the file to extract metadata from.
mime_type:
MIME type of the file at ``document_path``. May be
``"application/pdf"`` when called for the archive version.
Returns
-------
list[MetadataEntry]
Zero or more metadata entries. Returns ``[]`` if no metadata
could be extracted or the format does not support it.
"""
...
# ------------------------------------------------------------------
# Context manager
# ------------------------------------------------------------------
def __enter__(self) -> Self:
"""Enter the parser context, returning the parser instance.
Implementations should perform any resource allocation here if not
done in __init__ (e.g. creating API clients or temp directories).
Returns
-------
Self
The parser instance itself.
"""
...
def __exit__(
self,
exc_type: type[BaseException] | None,
exc_val: BaseException | None,
exc_tb: TracebackType | None,
) -> None:
"""Exit the parser context and release all resources.
Implementations must clean up all temporary files and other resources
regardless of whether an exception occurred.
Parameters
----------
exc_type:
The exception class, or None if no exception was raised.
exc_val:
The exception instance, or None.
exc_tb:
The traceback, or None.
"""
...
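
Since the protocol is decorated with `runtime_checkable`, `isinstance` checks only that each member is present, not that signatures or return types match. A reduced example makes the limitation concrete. `MiniParserProtocol` and the three classes below are hypothetical and far smaller than `ParserProtocol`.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class MiniParserProtocol(Protocol):
    """Hypothetical cut-down protocol with one attribute and two methods."""

    name: str

    @classmethod
    def supported_mime_types(cls) -> dict[str, str]: ...

    def get_text(self) -> str | None: ...


class Complete:
    name = "complete"

    @classmethod
    def supported_mime_types(cls) -> dict[str, str]:
        return {"text/plain": ".txt"}

    def get_text(self) -> str | None:
        return "hello"


class MissingMethod:
    name = "missing"

    @classmethod
    def supported_mime_types(cls) -> dict[str, str]:
        return {}


class WrongSignature:
    name = "wrong"

    @classmethod
    def supported_mime_types(cls) -> dict[str, str]:
        return {}

    def get_text(self, unexpected_argument) -> str | None:
        return None


complete_ok = isinstance(Complete(), MiniParserProtocol)
missing_ok = isinstance(MissingMethod(), MiniParserProtocol)
wrong_ok = isinstance(WrongSignature(), MiniParserProtocol)
```

`WrongSignature` passes the runtime check even though calling `get_text()` without an argument would fail, so a presence check of this kind is a first filter rather than full validation.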

View File

@@ -0,0 +1,366 @@
"""
Singleton registry that tracks all document parsers available to
Paperless-ngx — both built-ins shipped with the application and third-party
plugins installed via Python entrypoints.
Public surface
--------------
get_parser_registry
Lazy-initialise and return the shared ParserRegistry. This is the primary
entry point for production code.
init_builtin_parsers
Register built-in parsers only, without entrypoint discovery. Safe to
call from Celery worker_process_init where importing all entrypoints
would be wasteful or cause side effects.
reset_parser_registry
Reset module-level state. For tests only.
Entrypoint group
----------------
Third-party parsers must advertise themselves under the
"paperless_ngx.parsers" entrypoint group in their pyproject.toml::
[project.entry-points."paperless_ngx.parsers"]
my_parser = "my_package.parsers:MyParser"
The loaded class must expose the following attributes at the class level
(not just on instances) for the registry to accept it:
name, version, author, url, supported_mime_types (callable), score (callable).
"""
from __future__ import annotations
import logging
from importlib.metadata import entry_points
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from pathlib import Path
from paperless.parsers import ParserProtocol
logger = logging.getLogger("paperless.parsers.registry")
# ---------------------------------------------------------------------------
# Module-level singleton state
# ---------------------------------------------------------------------------
_registry: ParserRegistry | None = None
_discovery_complete: bool = False
# Attribute names that every registered external parser class must expose.
_REQUIRED_ATTRS: tuple[str, ...] = (
"name",
"version",
"author",
"url",
"supported_mime_types",
"score",
)
# ---------------------------------------------------------------------------
# Module-level accessor functions
# ---------------------------------------------------------------------------
def get_parser_registry() -> ParserRegistry:
"""Return the shared ParserRegistry instance.
On the first call this function:
1. Creates a new ParserRegistry.
2. Calls register_defaults to install built-in parsers.
3. Calls discover to load third-party plugins via importlib.metadata entrypoints.
4. Calls log_summary to emit a startup summary.
Subsequent calls return the same instance immediately.
Returns
-------
ParserRegistry
The shared registry singleton.
"""
global _registry, _discovery_complete
if _registry is None:
_registry = ParserRegistry()
_registry.register_defaults()
if not _discovery_complete:
_registry.discover()
_registry.log_summary()
_discovery_complete = True
return _registry
def init_builtin_parsers() -> None:
"""Register built-in parsers without performing entrypoint discovery.
Intended for use in Celery worker_process_init handlers where importing
all installed entrypoints would be wasteful, slow, or could produce
undesirable side effects. Entrypoint discovery (third-party plugins) is
deliberately not performed.
Safe to call multiple times — subsequent calls are no-ops.
Returns
-------
None
"""
global _registry
if _registry is None:
_registry = ParserRegistry()
_registry.register_defaults()
def reset_parser_registry() -> None:
"""Reset the module-level registry state to its initial values.
Resets _registry and _discovery_complete so the next call to
get_parser_registry will re-initialise everything from scratch.
FOR TESTS ONLY. Do not call this in production code. Resetting the
registry mid-request forces the next call to get_parser_registry to
rebuild the registry and re-run entrypoint discovery, which is expensive
and may have unexpected side effects in multi-threaded environments.
Returns
-------
None
"""
global _registry, _discovery_complete
_registry = None
_discovery_complete = False
# ---------------------------------------------------------------------------
# Registry class
# ---------------------------------------------------------------------------
class ParserRegistry:
"""Registry that maps MIME types to the best available parser class.
Parsers are partitioned into two lists:
_builtins
Parser classes registered via register_builtin (populated by
register_defaults in Phase 3+).
_external
Parser classes loaded from installed Python entrypoints via discover.
When resolving a parser for a file, external parsers are evaluated
alongside built-in parsers using a uniform scoring mechanism. Both lists
are iterated together; the class with the highest score wins. If an
external parser wins, its attribution details are logged so users can
identify which third-party package handled their document.
"""
def __init__(self) -> None:
self._external: list[type[ParserProtocol]] = []
self._builtins: list[type[ParserProtocol]] = []
# ------------------------------------------------------------------
# Registration
# ------------------------------------------------------------------
def register_builtin(self, parser_class: type[ParserProtocol]) -> None:
"""Register a built-in parser class.
Built-in parsers are shipped with Paperless-ngx and are appended to
the _builtins list. Registration never removes or replaces them; an
external parser takes precedence for a given file only by scoring at
least as high as every eligible built-in.
Parameters
----------
parser_class:
The parser class to register. Must satisfy ParserProtocol.
"""
self._builtins.append(parser_class)
def register_defaults(self) -> None:
"""Register the built-in parsers that ship with Paperless-ngx.
Each parser that has been migrated to the new ParserProtocol interface
is registered here. Parsers are added in ascending weight order so
that log output is predictable; scoring determines which parser wins
at runtime regardless of registration order.
"""
from paperless.parsers.remote import RemoteDocumentParser
from paperless.parsers.text import TextDocumentParser
self.register_builtin(TextDocumentParser)
self.register_builtin(RemoteDocumentParser)
# ------------------------------------------------------------------
# Discovery
# ------------------------------------------------------------------
def discover(self) -> None:
"""Load third-party parsers from the "paperless_ngx.parsers" entrypoint group.
For each advertised entrypoint the method:
1. Calls ep.load() to import the class.
2. Validates that the class exposes all required attributes.
3. On success, appends the class to _external and logs an info message.
4. On failure (import error or missing attributes), logs an appropriate
warning/error and continues to the next entrypoint.
Errors during discovery of a single parser do not prevent other parsers
from being loaded.
Returns
-------
None
"""
eps = entry_points(group="paperless_ngx.parsers")
for ep in eps:
try:
parser_class = ep.load()
except Exception:
logger.exception(
"Failed to load parser entrypoint '%s' — skipping.",
ep.name,
)
continue
missing = [
attr for attr in _REQUIRED_ATTRS if not hasattr(parser_class, attr)
]
if missing:
logger.warning(
"Parser loaded from entrypoint '%s' is missing required "
"attributes %r — skipping.",
ep.name,
missing,
)
continue
self._external.append(parser_class)
logger.info(
"Loaded third-party parser '%s' v%s by %s (entrypoint: '%s').",
parser_class.name,
parser_class.version,
parser_class.author,
ep.name,
)
# ------------------------------------------------------------------
# Summary logging
# ------------------------------------------------------------------
def log_summary(self) -> None:
"""Log a startup summary of all registered parsers.
Built-in parsers are listed first, followed by any external parsers
discovered from entrypoints. If no external parsers were found a
short informational message is logged instead of an empty list.
Returns
-------
None
"""
logger.info(
"Built-in parsers (%d):",
len(self._builtins),
)
for cls in self._builtins:
logger.info(
" [built-in] %s v%s (%s)",
getattr(cls, "name", repr(cls)),
getattr(cls, "version", "unknown"),
getattr(cls, "url", "built-in"),
)
if not self._external:
logger.info("No third-party parsers discovered.")
return
logger.info(
"Third-party parsers (%d):",
len(self._external),
)
for cls in self._external:
logger.info(
" [external] %s v%s by %s — report issues at %s",
getattr(cls, "name", repr(cls)),
getattr(cls, "version", "unknown"),
getattr(cls, "author", "unknown"),
getattr(cls, "url", "unknown"),
)
# ------------------------------------------------------------------
# Parser resolution
# ------------------------------------------------------------------
def get_parser_for_file(
self,
mime_type: str,
filename: str,
path: Path | None = None,
) -> type[ParserProtocol] | None:
"""Return the best parser class for the given file, or None.
All registered parsers (external first, then built-ins) are evaluated
against the file. A parser is eligible if mime_type appears in the dict
returned by its supported_mime_types classmethod, and its score
classmethod returns a non-None integer.
The parser with the highest score wins. When two parsers return the
same score, the one that appears earlier in the evaluation order wins
(external parsers are evaluated before built-ins, giving third-party
packages a chance to override defaults at equal priority).
When an external parser is selected, its identity is logged at INFO
level so operators can trace which package handled a document.
Parameters
----------
mime_type:
The detected MIME type of the file.
filename:
The original filename, including extension.
path:
Optional filesystem path to the file. Forwarded to each
parser's score method.
Returns
-------
type[ParserProtocol] | None
The winning parser class, or None if no parser can handle the file.
"""
best_score: int | None = None
best_parser: type[ParserProtocol] | None = None
# External parsers are placed first so that, at equal scores, an
# external parser wins over a built-in (first-seen policy).
for parser_class in (*self._external, *self._builtins):
if mime_type not in parser_class.supported_mime_types():
continue
score = parser_class.score(mime_type, filename, path)
if score is None:
continue
if best_score is None or score > best_score:
best_score = score
best_parser = parser_class
if best_parser is not None and best_parser in self._external:
logger.info(
"Document handled by third-party parser '%s' v%s (%s)",
getattr(best_parser, "name", repr(best_parser)),
getattr(best_parser, "version", "unknown"),
getattr(best_parser, "url", "unknown"),
)
return best_parser
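
The resolution rule in get_parser_for_file (highest score wins, and on a tie the parser evaluated first wins) can be sketched in isolation. The two parser classes and the `pick_parser` helper below are hypothetical stand-ins written for this example; only the loop mirrors the registry code above.

```python
from __future__ import annotations

from pathlib import Path


class BuiltinTextParser:
    """Hypothetical stand-in for a built-in parser."""

    name = "builtin-text"

    @classmethod
    def supported_mime_types(cls) -> dict[str, str]:
        return {"text/plain": ".txt"}

    @classmethod
    def score(
        cls,
        mime_type: str,
        filename: str,
        path: Path | None = None,
    ) -> int | None:
        return 10


class ExternalTextParser(BuiltinTextParser):
    """Hypothetical third-party parser that returns the same score."""

    name = "external-text"


def pick_parser(
    external: list[type],
    builtins: list[type],
    mime_type: str,
    filename: str,
) -> type | None:
    best_score: int | None = None
    best_parser: type | None = None
    # External parsers are iterated first, as in the registry.
    for parser_class in (*external, *builtins):
        if mime_type not in parser_class.supported_mime_types():
            continue
        score = parser_class.score(mime_type, filename, None)
        if score is None:
            continue
        # Strict ">" keeps the first-seen parser when scores are equal.
        if best_score is None or score > best_score:
            best_score, best_parser = score, parser_class
    return best_parser
```

With both classes scoring 10 for `text/plain`, `pick_parser([ExternalTextParser], [BuiltinTextParser], "text/plain", "a.txt")` selects the external class, which is the tie-break behaviour the docstring describes.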

@@ -0,0 +1,429 @@
"""
Built-in remote-OCR document parser.
Handles documents by sending them to a configured remote OCR engine
(currently Azure AI Vision / Document Intelligence) and retrieving both
the extracted text and a searchable PDF with an embedded text layer.
When no engine is configured, ``score()`` returns ``None`` so the parser
is effectively invisible to the registry — the tesseract parser handles
these MIME types instead.
"""
from __future__ import annotations
import logging
import shutil
import tempfile
from pathlib import Path
from typing import TYPE_CHECKING
from typing import Self
from django.conf import settings
from paperless.version import __full_version_str__
if TYPE_CHECKING:
import datetime
from types import TracebackType
from paperless.parsers import MetadataEntry
logger = logging.getLogger("paperless.parsing.remote")
_SUPPORTED_MIME_TYPES: dict[str, str] = {
"application/pdf": ".pdf",
"image/png": ".png",
"image/jpeg": ".jpg",
"image/tiff": ".tiff",
"image/bmp": ".bmp",
"image/gif": ".gif",
"image/webp": ".webp",
}
class RemoteEngineConfig:
"""Holds and validates the remote OCR engine configuration."""
def __init__(
self,
engine: str | None,
api_key: str | None = None,
endpoint: str | None = None,
) -> None:
self.engine = engine
self.api_key = api_key
self.endpoint = endpoint
def engine_is_valid(self) -> bool:
"""Return True when the engine is known and fully configured."""
return (
self.engine in ("azureai",)
and self.api_key is not None
and not (self.engine == "azureai" and self.endpoint is None)
)
class RemoteDocumentParser:
"""Parse documents via a remote OCR API (currently Azure AI Vision).
This parser sends documents to a remote engine that returns both
extracted text and a searchable PDF with an embedded text layer.
It does not depend on Tesseract or ocrmypdf.
Class attributes
----------------
name : str
Human-readable parser name.
version : str
Semantic version string, kept in sync with Paperless-ngx releases.
author : str
Maintainer name.
url : str
Issue tracker / source URL.
"""
name: str = "Paperless-ngx Remote OCR Parser"
version: str = __full_version_str__
author: str = "Paperless-ngx Contributors"
url: str = "https://github.com/paperless-ngx/paperless-ngx"
# ------------------------------------------------------------------
# Class methods
# ------------------------------------------------------------------
@classmethod
def supported_mime_types(cls) -> dict[str, str]:
"""Return the MIME types this parser can handle.
The full set is always returned regardless of whether a remote
engine is configured. The ``score()`` method handles the
"am I active?" logic by returning ``None`` when not configured.
Returns
-------
dict[str, str]
Mapping of MIME type to preferred file extension.
"""
return _SUPPORTED_MIME_TYPES
@classmethod
def score(
cls,
mime_type: str,
filename: str,
path: Path | None = None,
) -> int | None:
"""Return the priority score for handling this file, or None.
Returns ``None`` when no valid remote engine is configured,
making the parser invisible to the registry for this file.
When configured, returns 20 — higher than the Tesseract parser's
default of 10 — so the remote engine takes priority.
Parameters
----------
mime_type:
Detected MIME type of the file.
filename:
Original filename including extension.
path:
Optional filesystem path. Not inspected by this parser.
Returns
-------
int | None
20 when the remote engine is configured and the MIME type is
supported, otherwise None.
"""
config = RemoteEngineConfig(
engine=settings.REMOTE_OCR_ENGINE,
api_key=settings.REMOTE_OCR_API_KEY,
endpoint=settings.REMOTE_OCR_ENDPOINT,
)
if not config.engine_is_valid():
return None
if mime_type not in _SUPPORTED_MIME_TYPES:
return None
return 20
# ------------------------------------------------------------------
# Properties
# ------------------------------------------------------------------
@property
def can_produce_archive(self) -> bool:
"""Whether this parser can produce a searchable PDF archive copy.
Returns
-------
bool
Always True — the remote engine always returns a PDF with an
embedded text layer that serves as the archive copy.
"""
return True
@property
def requires_pdf_rendition(self) -> bool:
"""Whether the parser must produce a PDF for the frontend to display.
Returns
-------
bool
Always False — all supported originals are displayable by
the browser (PDF) or handled via the archive copy (images).
"""
return False
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
def __init__(self, logging_group: object = None) -> None:
settings.SCRATCH_DIR.mkdir(parents=True, exist_ok=True)
self._tempdir = Path(
tempfile.mkdtemp(prefix="paperless-", dir=settings.SCRATCH_DIR),
)
self._logging_group = logging_group
self._text: str | None = None
self._archive_path: Path | None = None
def __enter__(self) -> Self:
return self
def __exit__(
self,
exc_type: type[BaseException] | None,
exc_val: BaseException | None,
exc_tb: TracebackType | None,
) -> None:
logger.debug("Cleaning up temporary directory %s", self._tempdir)
shutil.rmtree(self._tempdir, ignore_errors=True)
# ------------------------------------------------------------------
# Core parsing interface
# ------------------------------------------------------------------
def parse(
self,
document_path: Path,
mime_type: str,
*,
produce_archive: bool = True,
) -> None:
"""Send the document to the remote engine and store results.
Parameters
----------
document_path:
Absolute path to the document file to parse.
mime_type:
Detected MIME type of the document.
produce_archive:
Ignored — the remote engine always returns a searchable PDF,
which is stored as the archive copy regardless of this flag.
"""
config = RemoteEngineConfig(
engine=settings.REMOTE_OCR_ENGINE,
api_key=settings.REMOTE_OCR_API_KEY,
endpoint=settings.REMOTE_OCR_ENDPOINT,
)
if not config.engine_is_valid():
logger.warning(
"No valid remote parser engine is configured, content will be empty.",
)
self._text = ""
return
if config.engine == "azureai":
self._text = self._azure_ai_vision_parse(document_path, config)
# ------------------------------------------------------------------
# Result accessors
# ------------------------------------------------------------------
def get_text(self) -> str | None:
"""Return the plain-text content extracted during parse."""
return self._text
def get_date(self) -> datetime.datetime | None:
"""Return the document date detected during parse.
Returns
-------
datetime.datetime | None
Always None — the remote parser does not detect dates.
"""
return None
def get_archive_path(self) -> Path | None:
"""Return the path to the generated archive PDF, or None."""
return self._archive_path
# ------------------------------------------------------------------
# Thumbnail and metadata
# ------------------------------------------------------------------
def get_thumbnail(self, document_path: Path, mime_type: str) -> Path:
"""Generate a thumbnail image for the document.
Uses the archive PDF produced by the remote engine when available,
otherwise falls back to the original document path (PDF inputs).
Parameters
----------
document_path:
Absolute path to the source document.
mime_type:
Detected MIME type of the document.
Returns
-------
Path
Path to the generated WebP thumbnail inside the temp directory.
"""
# make_thumbnail_from_pdf lives in documents.parsers for now;
# it will move to paperless.parsers.utils when the tesseract
# parser is migrated in a later phase.
from documents.parsers import make_thumbnail_from_pdf
return make_thumbnail_from_pdf(
self._archive_path or document_path,
self._tempdir,
self._logging_group,
)
def get_page_count(
self,
document_path: Path,
mime_type: str,
) -> int | None:
"""Return the number of pages in a PDF document.
Parameters
----------
document_path:
Absolute path to the source document.
mime_type:
Detected MIME type of the document.
Returns
-------
int | None
Page count for PDF inputs, or ``None`` for other MIME types.
"""
if mime_type != "application/pdf":
return None
from paperless.parsers.utils import get_page_count_for_pdf
return get_page_count_for_pdf(document_path, log=logger)
def extract_metadata(
self,
document_path: Path,
mime_type: str,
) -> list[MetadataEntry]:
"""Extract format-specific metadata from the document.
Delegates to the shared pikepdf-based extractor for PDF files.
Returns ``[]`` for all other MIME types.
Parameters
----------
document_path:
Absolute path to the file to extract metadata from.
mime_type:
MIME type of the file. May be ``"application/pdf"`` when
called for the archive version of an image original.
Returns
-------
list[MetadataEntry]
Zero or more metadata entries.
"""
if mime_type != "application/pdf":
return []
from paperless.parsers.utils import extract_pdf_metadata
return extract_pdf_metadata(document_path, log=logger)
# ------------------------------------------------------------------
# Private helpers
# ------------------------------------------------------------------
def _azure_ai_vision_parse(
self,
file: Path,
config: RemoteEngineConfig,
) -> str | None:
"""Send ``file`` to Azure AI Document Intelligence and return text.
Downloads the searchable PDF output from Azure and stores it at
``self._archive_path``. Returns the extracted text content, or
``None`` on failure (the error is logged).
Parameters
----------
file:
Absolute path to the document to analyse.
config:
Validated remote engine configuration.
Returns
-------
str | None
Extracted text, or None if the Azure call failed.
"""
if TYPE_CHECKING:
# Callers must have already validated config via engine_is_valid(),
# which returns True only when api_key is not None and (for azureai)
# endpoint is not None, so these narrowing assertions are safe.
assert config.endpoint is not None
assert config.api_key is not None
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
from azure.ai.documentintelligence.models import AnalyzeOutputOption
from azure.ai.documentintelligence.models import DocumentContentFormat
from azure.core.credentials import AzureKeyCredential
client = DocumentIntelligenceClient(
endpoint=config.endpoint,
credential=AzureKeyCredential(config.api_key),
)
try:
with file.open("rb") as f:
analyze_request = AnalyzeDocumentRequest(bytes_source=f.read())
poller = client.begin_analyze_document(
model_id="prebuilt-read",
body=analyze_request,
output_content_format=DocumentContentFormat.TEXT,
output=[AnalyzeOutputOption.PDF],
content_type="application/json",
)
poller.wait()
result_id = poller.details["operation_id"]
result = poller.result()
self._archive_path = self._tempdir / "archive.pdf"
with self._archive_path.open("wb") as f:
for chunk in client.get_analyze_result_pdf(
model_id="prebuilt-read",
result_id=result_id,
):
f.write(chunk)
return result.content
except Exception as e:
logger.error("Azure AI Vision parsing failed: %s", e)
# Do not leave a reference to a missing or partially written archive.
self._archive_path = None
finally:
client.close()
return None
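
The way score() hides the remote parser when no engine is configured can be sketched without Django. Everything below is a simplified stand-in: `EngineConfig`, `gated_score`, the abbreviated `SUPPORTED` set, and the example endpoint are assumptions for illustration. The validity check is written for the single supported engine, which should be equivalent to engine_is_valid() above under that assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

# Abbreviated MIME set for the sketch; the real parser supports more types.
SUPPORTED = {"application/pdf", "image/png"}


@dataclass(frozen=True)
class EngineConfig:
    """Hypothetical, simplified mirror of RemoteEngineConfig."""

    engine: str | None
    api_key: str | None = None
    endpoint: str | None = None

    def is_valid(self) -> bool:
        # With "azureai" as the only known engine, both the key and the
        # endpoint must be present.
        return (
            self.engine == "azureai"
            and self.api_key is not None
            and self.endpoint is not None
        )


def gated_score(config: EngineConfig, mime_type: str) -> int | None:
    """Return 20 only when the engine is configured and the type is supported."""
    if not config.is_valid():
        return None  # parser is invisible to the registry
    if mime_type not in SUPPORTED:
        return None
    return 20
```

A config with a key but no endpoint yields `None`, so a lower-scoring parser for the same MIME type would be chosen instead.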

@@ -0,0 +1,320 @@
"""
Built-in plain-text document parser.
Handles text/plain, text/csv, and application/csv MIME types by reading the
file content directly. Thumbnails are generated by rendering a page-sized
WebP image from the first 100,000 characters using Pillow.
"""
from __future__ import annotations
import logging
import shutil
import tempfile
from pathlib import Path
from typing import TYPE_CHECKING
from typing import Self
from django.conf import settings
from PIL import Image
from PIL import ImageDraw
from PIL import ImageFont
from paperless.version import __full_version_str__
if TYPE_CHECKING:
import datetime
from types import TracebackType
from paperless.parsers import MetadataEntry
logger = logging.getLogger("paperless.parsing.text")
_SUPPORTED_MIME_TYPES: dict[str, str] = {
"text/plain": ".txt",
"text/csv": ".csv",
"application/csv": ".csv",
}
class TextDocumentParser:
"""Parse plain-text documents (txt, csv) for Paperless-ngx.
This parser reads the file content directly as UTF-8 text and renders a
simple thumbnail using Pillow. It does not perform OCR and does not
produce a searchable PDF archive copy.
Class attributes
----------------
name : str
Human-readable parser name.
version : str
Semantic version string, kept in sync with Paperless-ngx releases.
author : str
Maintainer name.
url : str
Issue tracker / source URL.
"""
name: str = "Paperless-ngx Text Parser"
version: str = __full_version_str__
author: str = "Paperless-ngx Contributors"
url: str = "https://github.com/paperless-ngx/paperless-ngx"
# ------------------------------------------------------------------
# Class methods
# ------------------------------------------------------------------
@classmethod
def supported_mime_types(cls) -> dict[str, str]:
"""Return the MIME types this parser handles.
Returns
-------
dict[str, str]
Mapping of MIME type to preferred file extension.
"""
return _SUPPORTED_MIME_TYPES
@classmethod
def score(
cls,
mime_type: str,
filename: str,
path: Path | None = None,
) -> int | None:
"""Return the priority score for handling this file.
Parameters
----------
mime_type:
Detected MIME type of the file.
filename:
Original filename including extension.
path:
Optional filesystem path. Not inspected by this parser.
Returns
-------
int | None
10 if the MIME type is supported, otherwise None.
"""
if mime_type in _SUPPORTED_MIME_TYPES:
return 10
return None
# ------------------------------------------------------------------
# Properties
# ------------------------------------------------------------------
@property
def can_produce_archive(self) -> bool:
"""Whether this parser can produce a searchable PDF archive copy.
Returns
-------
bool
Always False — the text parser does not produce a PDF archive.
"""
return False
@property
def requires_pdf_rendition(self) -> bool:
"""Whether the parser must produce a PDF for the frontend to display.
Returns
-------
bool
Always False — plain text files are displayable as-is.
"""
return False
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
def __init__(self, logging_group: object = None) -> None:
settings.SCRATCH_DIR.mkdir(parents=True, exist_ok=True)
self._tempdir = Path(
tempfile.mkdtemp(prefix="paperless-", dir=settings.SCRATCH_DIR),
)
self._text: str | None = None
def __enter__(self) -> Self:
return self
def __exit__(
self,
exc_type: type[BaseException] | None,
exc_val: BaseException | None,
exc_tb: TracebackType | None,
) -> None:
logger.debug("Cleaning up temporary directory %s", self._tempdir)
shutil.rmtree(self._tempdir, ignore_errors=True)
# ------------------------------------------------------------------
# Core parsing interface
# ------------------------------------------------------------------
def parse(
self,
document_path: Path,
mime_type: str,
*,
produce_archive: bool = True,
) -> None:
"""Read the document and store its text content.
Parameters
----------
document_path:
Absolute path to the text file.
mime_type:
Detected MIME type of the document.
produce_archive:
Ignored — this parser never produces a PDF archive.
Raises
------
OSError
If the file cannot be opened or read. Invalid UTF-8 bytes do not
raise; they are replaced (see _read_text).
"""
self._text = self._read_text(document_path)
# ------------------------------------------------------------------
# Result accessors
# ------------------------------------------------------------------
def get_text(self) -> str | None:
"""Return the plain-text content extracted during parse.
Returns
-------
str | None
Extracted text, or None if parse has not been called yet.
"""
return self._text
def get_date(self) -> datetime.datetime | None:
"""Return the document date detected during parse.
Returns
-------
datetime.datetime | None
Always None — the text parser does not detect dates.
"""
return None
def get_archive_path(self) -> Path | None:
"""Return the path to a generated archive PDF, or None.
Returns
-------
Path | None
Always None — the text parser does not produce a PDF archive.
"""
return None
# ------------------------------------------------------------------
# Thumbnail and metadata
# ------------------------------------------------------------------
def get_thumbnail(self, document_path: Path, mime_type: str) -> Path:
"""Render the first portion of the document as a WebP thumbnail.
Parameters
----------
document_path:
Absolute path to the source document.
mime_type:
Detected MIME type of the document.
Returns
-------
Path
Path to the generated WebP thumbnail inside the temporary directory.
"""
max_chars = 100_000
file_size_limit = 50 * 1024 * 1024
if document_path.stat().st_size > file_size_limit:
text = "[File too large to preview]"
else:
with Path(document_path).open("r", encoding="utf-8", errors="replace") as f:
text = f.read(max_chars)
img = Image.new("RGB", (500, 700), color="white")
draw = ImageDraw.Draw(img)
font = ImageFont.truetype(
font=settings.THUMBNAIL_FONT_NAME,
size=20,
layout_engine=ImageFont.Layout.BASIC,
)
draw.multiline_text((5, 5), text, font=font, fill="black", spacing=4)
out_path = self._tempdir / "thumb.webp"
img.save(out_path, format="WEBP")
return out_path
def get_page_count(
self,
document_path: Path,
mime_type: str,
) -> int | None:
"""Return the number of pages in the document.
Parameters
----------
document_path:
Absolute path to the source document.
mime_type:
Detected MIME type of the document.
Returns
-------
int | None
Always None — page count is not meaningful for plain text.
"""
return None
def extract_metadata(
self,
document_path: Path,
mime_type: str,
) -> list[MetadataEntry]:
"""Extract format-specific metadata from the document.
Returns
-------
list[MetadataEntry]
Always ``[]`` — plain text files carry no structured metadata.
"""
return []
# ------------------------------------------------------------------
# Private helpers
# ------------------------------------------------------------------
def _read_text(self, filepath: Path) -> str:
"""Read file content, replacing invalid UTF-8 bytes rather than failing.
Parameters
----------
filepath:
Path to the file to read.
Returns
-------
str
File content as a string.
"""
try:
return filepath.read_text(encoding="utf-8")
except UnicodeDecodeError as exc:
logger.warning(
"Unicode error reading %s, replacing bad bytes: %s",
filepath,
exc,
)
return filepath.read_bytes().decode("utf-8", errors="replace")
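
The strict-then-lenient decoding in _read_text can be tried on its own. The sketch below is a stand-alone approximation: `read_text_lenient` and `demo` are hypothetical names, logging is omitted, and the sample bytes are invented for the example.

```python
import tempfile
from pathlib import Path


def read_text_lenient(filepath: Path) -> str:
    """Decode as UTF-8, substituting U+FFFD for undecodable bytes."""
    try:
        # Fast path: the file is valid UTF-8.
        return filepath.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        # Fallback: re-read the raw bytes and replace the bad ones.
        return filepath.read_bytes().decode("utf-8", errors="replace")


def demo() -> str:
    """Write a file containing one invalid byte and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "decode_error.txt"
        sample.write_bytes(b"ok \xff end")
        return read_text_lenient(sample)
```

Here `demo()` returns the text with the invalid `0xFF` byte replaced by the Unicode replacement character, while the surrounding ASCII is preserved.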

@@ -0,0 +1,130 @@
"""
Shared utilities for Paperless-ngx document parsers.
Functions here are format-neutral helpers that multiple parsers need.
Keeping them here avoids parsers inheriting from each other just to
share implementation.
"""
from __future__ import annotations
import logging
import re
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from pathlib import Path
from paperless.parsers import MetadataEntry
logger = logging.getLogger("paperless.parsers.utils")
def get_page_count_for_pdf(
document_path: Path,
log: logging.Logger | None = None,
) -> int | None:
"""Return the number of pages in a PDF file using pikepdf.
Parameters
----------
document_path:
Absolute path to the PDF file.
log:
Logger to use for warnings. Falls back to the module-level logger
when omitted.
Returns
-------
int | None
Page count, or ``None`` if the file cannot be opened or is not a
valid PDF.
"""
import pikepdf
_log = log or logger
try:
with pikepdf.Pdf.open(document_path) as pdf:
return len(pdf.pages)
except Exception as e:
_log.warning("Unable to determine PDF page count for %s: %s", document_path, e)
return None
def extract_pdf_metadata(
document_path: Path,
log: logging.Logger | None = None,
) -> list[MetadataEntry]:
"""Extract XMP/PDF metadata from a PDF file using pikepdf.
Reads all XMP metadata entries from the document and returns them as a
list of ``MetadataEntry`` dicts. The function never raises: any failure
to open the file or read a specific key is logged and skipped.
Parameters
----------
document_path:
Absolute path to the PDF file.
log:
Logger to use for warnings and debug messages. Falls back to the
module-level logger when omitted.
Returns
-------
list[MetadataEntry]
Zero or more metadata entries. Returns ``[]`` if the file cannot
be opened or contains no readable XMP metadata.
"""
import pikepdf
from paperless.parsers import MetadataEntry
_log = log or logger
result: list[MetadataEntry] = []
namespace_pattern = re.compile(r"\{(.*)\}(.*)")
try:
pdf = pikepdf.open(document_path)
meta = pdf.open_metadata()
except Exception as e:
_log.warning("Could not open PDF metadata for %s: %s", document_path, e)
return []
for key, value in meta.items():
if isinstance(value, list):
value = " ".join(str(e) for e in value)
value = str(value)
try:
m = namespace_pattern.match(key)
if m is None:
continue
namespace = m.group(1)
key_value = m.group(2)
try:
namespace.encode("utf-8")
key_value.encode("utf-8")
except UnicodeEncodeError as enc_err:
_log.debug("Skipping metadata key %s: %s", key, enc_err)
continue
result.append(
MetadataEntry(
namespace=namespace,
prefix=meta.REVERSE_NS[namespace],
key=key_value,
value=value,
),
)
except Exception as e:
_log.warning(
"Error reading metadata key %s value %s: %s",
key,
value,
e,
)
return result
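
The key handling in extract_pdf_metadata relies on metadata keys of the form `{namespace-uri}local-name`. The sketch below isolates that split using the same regular expression; `split_qualified_key` is a hypothetical helper, and the Dublin Core key is only an example input, not data taken from this change.

```python
from __future__ import annotations

import re

# Same pattern as in the function above: "{namespace}key".
NAMESPACE_PATTERN = re.compile(r"\{(.*)\}(.*)")


def split_qualified_key(key: str) -> tuple[str, str] | None:
    """Split '{namespace}name' into (namespace, name), or None if unqualified."""
    m = NAMESPACE_PATTERN.match(key)
    if m is None:
        # Keys without a braced namespace are skipped by the extractor.
        return None
    return m.group(1), m.group(2)
```

For instance, `split_qualified_key("{http://purl.org/dc/elements/1.1/}title")` separates the namespace URI from `title`, while a bare `"title"` yields `None` and would be skipped.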

@@ -0,0 +1,48 @@
"""
Fixtures defined here are available to every test module under
src/paperless/tests/ (including sub-packages such as parsers/).
Session-scoped fixtures for the shared samples directory live here so
sub-package conftest files can reference them without duplicating path logic.
Parser-specific fixtures (concrete parser instances, format-specific sample
files) live in paperless/tests/parsers/conftest.py.
"""
from __future__ import annotations
from pathlib import Path
from typing import TYPE_CHECKING
import pytest
from paperless.parsers.registry import reset_parser_registry
if TYPE_CHECKING:
from collections.abc import Generator
@pytest.fixture(scope="session")
def samples_dir() -> Path:
"""Absolute path to the shared parser sample files directory.
Sub-package conftest files derive format-specific paths from this root,
e.g. ``samples_dir / "text" / "test.txt"``.
Returns
-------
Path
Directory containing all sample documents used by parser tests.
"""
return (Path(__file__).parent / "samples").resolve()
@pytest.fixture(autouse=True)
def clean_registry() -> Generator[None, None, None]:
"""Reset the parser registry before and after every test.
This prevents registry state from leaking between tests that call
get_parser_registry() or init_builtin_parsers().
"""
reset_parser_registry()
yield
reset_parser_registry()

@@ -0,0 +1,165 @@
"""
Parser fixtures that are used across multiple test modules in this package
are defined here. Format-specific sample-file fixtures are grouped by parser
so it is easy to see which files belong to which test module.
"""

from __future__ import annotations

from typing import TYPE_CHECKING

import pytest

from paperless.parsers.remote import RemoteDocumentParser
from paperless.parsers.text import TextDocumentParser

if TYPE_CHECKING:
    from collections.abc import Generator
    from pathlib import Path

    from pytest_django.fixtures import SettingsWrapper


# ------------------------------------------------------------------
# Text parser sample files
# ------------------------------------------------------------------


@pytest.fixture(scope="session")
def text_samples_dir(samples_dir: Path) -> Path:
    """Absolute path to the text parser sample files directory.

    Returns
    -------
    Path
        ``<samples_dir>/text/``
    """
    return samples_dir / "text"


@pytest.fixture(scope="session")
def sample_txt_file(text_samples_dir: Path) -> Path:
    """Path to a valid UTF-8 plain-text sample file.

    Returns
    -------
    Path
        Absolute path to ``text/test.txt``.
    """
    return text_samples_dir / "test.txt"


@pytest.fixture(scope="session")
def malformed_txt_file(text_samples_dir: Path) -> Path:
    """Path to a text file containing invalid UTF-8 bytes.

    Returns
    -------
    Path
        Absolute path to ``text/decode_error.txt``.
    """
    return text_samples_dir / "decode_error.txt"


# ------------------------------------------------------------------
# Text parser instance
# ------------------------------------------------------------------


@pytest.fixture()
def text_parser() -> Generator[TextDocumentParser, None, None]:
    """Yield a TextDocumentParser and clean up its temporary directory afterwards.

    Yields
    ------
    TextDocumentParser
        A ready-to-use parser instance.
    """
    with TextDocumentParser() as parser:
        yield parser


# ------------------------------------------------------------------
# Remote parser sample files
# ------------------------------------------------------------------


@pytest.fixture(scope="session")
def remote_samples_dir(samples_dir: Path) -> Path:
    """Absolute path to the remote parser sample files directory.

    Returns
    -------
    Path
        ``<samples_dir>/remote/``
    """
    return samples_dir / "remote"


@pytest.fixture(scope="session")
def sample_pdf_file(remote_samples_dir: Path) -> Path:
    """Path to a simple digital PDF sample file.

    Returns
    -------
    Path
        Absolute path to ``remote/simple-digital.pdf``.
    """
    return remote_samples_dir / "simple-digital.pdf"


# ------------------------------------------------------------------
# Remote parser instance
# ------------------------------------------------------------------


@pytest.fixture()
def remote_parser() -> Generator[RemoteDocumentParser, None, None]:
    """Yield a RemoteDocumentParser and clean up its temporary directory afterwards.

    Yields
    ------
    RemoteDocumentParser
        A ready-to-use parser instance.
    """
    with RemoteDocumentParser() as parser:
        yield parser


# ------------------------------------------------------------------
# Remote parser settings helpers
# ------------------------------------------------------------------


@pytest.fixture()
def azure_settings(settings: SettingsWrapper) -> SettingsWrapper:
    """Configure Django settings for a valid Azure AI OCR engine.

    Sets ``REMOTE_OCR_ENGINE``, ``REMOTE_OCR_API_KEY``, and
    ``REMOTE_OCR_ENDPOINT`` to test values. Settings are restored
    automatically after the test by pytest-django.

    Returns
    -------
    SettingsWrapper
        The modified settings object (for chaining further overrides).
    """
    settings.REMOTE_OCR_ENGINE = "azureai"
    settings.REMOTE_OCR_API_KEY = "test-api-key"
    settings.REMOTE_OCR_ENDPOINT = "https://test.cognitiveservices.azure.com"
    return settings


@pytest.fixture()
def no_engine_settings(settings: SettingsWrapper) -> SettingsWrapper:
    """Configure Django settings with no remote engine configured.

    Returns
    -------
    SettingsWrapper
        The modified settings object.
    """
    settings.REMOTE_OCR_ENGINE = None
    settings.REMOTE_OCR_API_KEY = None
    settings.REMOTE_OCR_ENDPOINT = None
    return settings
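
Note: the `text_parser` and `remote_parser` fixtures rely on the parser's context-manager protocol to remove its temporary directory. The sketch below shows one way such a lifecycle could work; `TempdirParser` is a hypothetical stand-in, and the real parsers may create and remove their `_tempdir` differently.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path
from types import TracebackType


class TempdirParser:
    """Hypothetical parser-like class that owns a scratch directory."""

    def __init__(self) -> None:
        # Assumption: the directory is created eagerly in the constructor.
        self._tempdir = Path(tempfile.mkdtemp(prefix="parser-sketch-"))

    def __enter__(self) -> TempdirParser:
        return self

    def __exit__(
        self,
        exc_type: type[BaseException] | None,
        exc_val: BaseException | None,
        exc_tb: TracebackType | None,
    ) -> None:
        # Clean up unconditionally; returning None lets any exception propagate.
        shutil.rmtree(self._tempdir, ignore_errors=True)


# Normal exit: the directory exists inside the block and is gone afterwards.
with TempdirParser() as parser:
    normal_dir = parser._tempdir
    existed_inside = normal_dir.exists()

# Exit via exception: cleanup still runs before the error reaches the caller.
failed_dir = None
try:
    with TempdirParser() as parser:
        failed_dir = parser._tempdir
        raise RuntimeError("boom")
except RuntimeError:
    pass
```

Wrapping the `with` block in a generator fixture, as the conftest does, ties this cleanup to pytest's fixture teardown.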


@@ -0,0 +1,490 @@
"""
Tests for paperless.parsers.remote.RemoteDocumentParser.
All tests use the context-manager protocol for parser lifecycle.
Fixture layout
--------------
make_azure_mock
    Factory (defined here; specific to this module).
azure_client
    Composes azure_settings + make_azure_mock + patch;
    use when a test needs the client to succeed.
failing_azure_client
    Composes azure_settings + patch with RuntimeError;
    use when a test needs the client to fail.
"""
from __future__ import annotations
from typing import TYPE_CHECKING
from unittest.mock import Mock
import pytest
from paperless.parsers import ParserProtocol
from paperless.parsers.remote import RemoteDocumentParser
if TYPE_CHECKING:
from collections.abc import Callable
from pathlib import Path
from pytest_django.fixtures import SettingsWrapper
from pytest_mock import MockerFixture
# ---------------------------------------------------------------------------
# Module-local fixtures
# ---------------------------------------------------------------------------
_AZURE_CLIENT_TARGET = "azure.ai.documentintelligence.DocumentIntelligenceClient"
_DEFAULT_TEXT = "Extracted text."
@pytest.fixture()
def make_azure_mock() -> Callable[[str], Mock]:
"""Return a factory that builds a mock Azure DocumentIntelligenceClient.
Usage::
mock_client = make_azure_mock() # default extracted text
mock_client = make_azure_mock("My text.") # custom extracted text
"""
def _factory(text: str = _DEFAULT_TEXT) -> Mock:
mock_client = Mock()
mock_poller = Mock()
mock_poller.wait.return_value = None
mock_poller.details = {"operation_id": "fake-op-id"}
mock_poller.result.return_value.content = text
mock_client.begin_analyze_document.return_value = mock_poller
mock_client.get_analyze_result_pdf.return_value = [b"%PDF-1.4 FAKE"]
return mock_client
return _factory
@pytest.fixture()
def azure_client(
azure_settings: SettingsWrapper,
make_azure_mock: Callable[[str], Mock],
mocker: MockerFixture,
) -> Mock:
"""Patch the Azure DI client with a succeeding mock and return the instance.
Implicitly applies ``azure_settings`` so tests using this fixture do not
also need ``@pytest.mark.usefixtures("azure_settings")``.
"""
mock_client = make_azure_mock()
mocker.patch(_AZURE_CLIENT_TARGET, return_value=mock_client)
return mock_client
@pytest.fixture()
def failing_azure_client(
azure_settings: SettingsWrapper,
mocker: MockerFixture,
) -> Mock:
"""Patch the Azure DI client so ``begin_analyze_document`` raises RuntimeError.
Implicitly applies ``azure_settings``. Returns the mock instance so
tests can assert on calls such as ``close()``.
"""
mock_client = Mock()
mock_client.begin_analyze_document.side_effect = RuntimeError("network failure")
mocker.patch(_AZURE_CLIENT_TARGET, return_value=mock_client)
return mock_client
# ---------------------------------------------------------------------------
# Protocol contract
# ---------------------------------------------------------------------------
class TestRemoteParserProtocol:
"""Verify that RemoteDocumentParser satisfies the ParserProtocol contract."""
def test_isinstance_satisfies_protocol(
self,
remote_parser: RemoteDocumentParser,
) -> None:
assert isinstance(remote_parser, ParserProtocol)
def test_class_attributes_present(self) -> None:
assert isinstance(RemoteDocumentParser.name, str) and RemoteDocumentParser.name
assert (
isinstance(RemoteDocumentParser.version, str)
and RemoteDocumentParser.version
)
assert (
isinstance(RemoteDocumentParser.author, str) and RemoteDocumentParser.author
)
assert isinstance(RemoteDocumentParser.url, str) and RemoteDocumentParser.url
# ---------------------------------------------------------------------------
# supported_mime_types
# ---------------------------------------------------------------------------
class TestRemoteParserSupportedMimeTypes:
"""supported_mime_types() always returns the full set regardless of config."""
def test_returns_dict(self) -> None:
mime_types = RemoteDocumentParser.supported_mime_types()
assert isinstance(mime_types, dict)
def test_includes_all_expected_types(self) -> None:
mime_types = RemoteDocumentParser.supported_mime_types()
expected = {
"application/pdf",
"image/png",
"image/jpeg",
"image/tiff",
"image/bmp",
"image/gif",
"image/webp",
}
assert expected == set(mime_types.keys())
@pytest.mark.usefixtures("no_engine_settings")
def test_returns_full_set_when_not_configured(self) -> None:
"""
GIVEN: No remote engine is configured
WHEN: supported_mime_types() is called
THEN: The full MIME type dict is still returned (score() handles activation)
"""
mime_types = RemoteDocumentParser.supported_mime_types()
assert len(mime_types) == 7
# ---------------------------------------------------------------------------
# score()
# ---------------------------------------------------------------------------
class TestRemoteParserScore:
"""score() encodes the activation logic: None when unconfigured, 20 when active."""
@pytest.mark.usefixtures("azure_settings")
@pytest.mark.parametrize(
"mime_type",
[
pytest.param("application/pdf", id="pdf"),
pytest.param("image/png", id="png"),
pytest.param("image/jpeg", id="jpeg"),
pytest.param("image/tiff", id="tiff"),
pytest.param("image/bmp", id="bmp"),
pytest.param("image/gif", id="gif"),
pytest.param("image/webp", id="webp"),
],
)
def test_score_returns_20_when_configured(self, mime_type: str) -> None:
result = RemoteDocumentParser.score(mime_type, "doc.pdf")
assert result == 20
@pytest.mark.usefixtures("no_engine_settings")
@pytest.mark.parametrize(
"mime_type",
[
pytest.param("application/pdf", id="pdf"),
pytest.param("image/png", id="png"),
pytest.param("image/jpeg", id="jpeg"),
],
)
def test_score_returns_none_when_no_engine(self, mime_type: str) -> None:
result = RemoteDocumentParser.score(mime_type, "doc.pdf")
assert result is None
def test_score_returns_none_when_api_key_missing(
self,
settings: SettingsWrapper,
) -> None:
settings.REMOTE_OCR_ENGINE = "azureai"
settings.REMOTE_OCR_API_KEY = None
settings.REMOTE_OCR_ENDPOINT = "https://test.cognitiveservices.azure.com"
result = RemoteDocumentParser.score("application/pdf", "doc.pdf")
assert result is None
def test_score_returns_none_when_endpoint_missing(
self,
settings: SettingsWrapper,
) -> None:
settings.REMOTE_OCR_ENGINE = "azureai"
settings.REMOTE_OCR_API_KEY = "key"
settings.REMOTE_OCR_ENDPOINT = None
result = RemoteDocumentParser.score("application/pdf", "doc.pdf")
assert result is None
@pytest.mark.usefixtures("azure_settings")
def test_score_returns_none_for_unsupported_mime_type(self) -> None:
result = RemoteDocumentParser.score("text/plain", "doc.txt")
assert result is None
@pytest.mark.usefixtures("azure_settings")
def test_score_higher_than_tesseract_default(self) -> None:
"""Remote parser (20) outranks the tesseract default (10) when configured."""
score = RemoteDocumentParser.score("application/pdf", "doc.pdf")
assert score is not None and score > 10
# ---------------------------------------------------------------------------
# Properties
# ---------------------------------------------------------------------------
class TestRemoteParserProperties:
def test_can_produce_archive_is_true(
self,
remote_parser: RemoteDocumentParser,
) -> None:
assert remote_parser.can_produce_archive is True
def test_requires_pdf_rendition_is_false(
self,
remote_parser: RemoteDocumentParser,
) -> None:
assert remote_parser.requires_pdf_rendition is False
# ---------------------------------------------------------------------------
# Lifecycle
# ---------------------------------------------------------------------------
class TestRemoteParserLifecycle:
def test_context_manager_cleans_up_tempdir(self) -> None:
with RemoteDocumentParser() as parser:
tempdir = parser._tempdir
assert tempdir.exists()
assert not tempdir.exists()
def test_context_manager_cleans_up_after_exception(self) -> None:
tempdir: Path | None = None
with pytest.raises(RuntimeError):
with RemoteDocumentParser() as parser:
tempdir = parser._tempdir
raise RuntimeError("boom")
assert tempdir is not None
assert not tempdir.exists()
# ---------------------------------------------------------------------------
# parse() — happy path
# ---------------------------------------------------------------------------
class TestRemoteParserParse:
def test_parse_returns_text_from_azure(
self,
remote_parser: RemoteDocumentParser,
sample_pdf_file: Path,
azure_client: Mock,
) -> None:
remote_parser.parse(sample_pdf_file, "application/pdf")
assert remote_parser.get_text() == _DEFAULT_TEXT
def test_parse_sets_archive_path(
self,
remote_parser: RemoteDocumentParser,
sample_pdf_file: Path,
azure_client: Mock,
) -> None:
remote_parser.parse(sample_pdf_file, "application/pdf")
archive = remote_parser.get_archive_path()
assert archive is not None
assert archive.exists()
assert archive.suffix == ".pdf"
def test_parse_closes_client_on_success(
self,
remote_parser: RemoteDocumentParser,
sample_pdf_file: Path,
azure_client: Mock,
) -> None:
remote_parser.parse(sample_pdf_file, "application/pdf")
azure_client.close.assert_called_once()
@pytest.mark.usefixtures("no_engine_settings")
def test_parse_sets_empty_text_when_not_configured(
self,
remote_parser: RemoteDocumentParser,
sample_pdf_file: Path,
) -> None:
remote_parser.parse(sample_pdf_file, "application/pdf")
assert remote_parser.get_text() == ""
assert remote_parser.get_archive_path() is None
def test_get_text_none_before_parse(
self,
remote_parser: RemoteDocumentParser,
) -> None:
assert remote_parser.get_text() is None
def test_get_date_always_none(
self,
remote_parser: RemoteDocumentParser,
sample_pdf_file: Path,
azure_client: Mock,
) -> None:
remote_parser.parse(sample_pdf_file, "application/pdf")
assert remote_parser.get_date() is None
# ---------------------------------------------------------------------------
# parse() — Azure failure path
# ---------------------------------------------------------------------------
class TestRemoteParserParseError:
def test_parse_returns_none_on_azure_error(
self,
remote_parser: RemoteDocumentParser,
sample_pdf_file: Path,
failing_azure_client: Mock,
) -> None:
remote_parser.parse(sample_pdf_file, "application/pdf")
assert remote_parser.get_text() is None
def test_parse_closes_client_on_error(
self,
remote_parser: RemoteDocumentParser,
sample_pdf_file: Path,
failing_azure_client: Mock,
) -> None:
remote_parser.parse(sample_pdf_file, "application/pdf")
failing_azure_client.close.assert_called_once()
def test_parse_logs_error_on_azure_failure(
self,
remote_parser: RemoteDocumentParser,
sample_pdf_file: Path,
failing_azure_client: Mock,
mocker: MockerFixture,
) -> None:
mock_log = mocker.patch("paperless.parsers.remote.logger")
remote_parser.parse(sample_pdf_file, "application/pdf")
mock_log.error.assert_called_once()
assert "Azure AI Vision parsing failed" in mock_log.error.call_args[0][0]
# ---------------------------------------------------------------------------
# get_page_count()
# ---------------------------------------------------------------------------
class TestRemoteParserPageCount:
def test_page_count_for_pdf(
self,
remote_parser: RemoteDocumentParser,
sample_pdf_file: Path,
) -> None:
count = remote_parser.get_page_count(sample_pdf_file, "application/pdf")
assert isinstance(count, int)
assert count >= 1
def test_page_count_returns_none_for_image_mime(
self,
remote_parser: RemoteDocumentParser,
sample_pdf_file: Path,
) -> None:
count = remote_parser.get_page_count(sample_pdf_file, "image/png")
assert count is None
def test_page_count_returns_none_for_invalid_pdf(
self,
remote_parser: RemoteDocumentParser,
tmp_path: Path,
) -> None:
bad_pdf = tmp_path / "bad.pdf"
bad_pdf.write_bytes(b"not a pdf at all")
count = remote_parser.get_page_count(bad_pdf, "application/pdf")
assert count is None
# ---------------------------------------------------------------------------
# extract_metadata()
# ---------------------------------------------------------------------------
class TestRemoteParserMetadata:
def test_extract_metadata_non_pdf_returns_empty(
self,
remote_parser: RemoteDocumentParser,
sample_pdf_file: Path,
) -> None:
result = remote_parser.extract_metadata(sample_pdf_file, "image/png")
assert result == []
def test_extract_metadata_pdf_returns_list(
self,
remote_parser: RemoteDocumentParser,
sample_pdf_file: Path,
) -> None:
result = remote_parser.extract_metadata(sample_pdf_file, "application/pdf")
assert isinstance(result, list)
def test_extract_metadata_pdf_entries_have_required_keys(
self,
remote_parser: RemoteDocumentParser,
sample_pdf_file: Path,
) -> None:
result = remote_parser.extract_metadata(sample_pdf_file, "application/pdf")
for entry in result:
assert "namespace" in entry
assert "prefix" in entry
assert "key" in entry
assert "value" in entry
assert isinstance(entry["value"], str)
def test_extract_metadata_does_not_raise_on_invalid_pdf(
self,
remote_parser: RemoteDocumentParser,
tmp_path: Path,
) -> None:
bad_pdf = tmp_path / "bad.pdf"
bad_pdf.write_bytes(b"not a pdf at all")
result = remote_parser.extract_metadata(bad_pdf, "application/pdf")
assert result == []
# ---------------------------------------------------------------------------
# Registry integration
# ---------------------------------------------------------------------------
class TestRemoteParserRegistry:
def test_registered_in_defaults(self) -> None:
from paperless.parsers.registry import ParserRegistry
registry = ParserRegistry()
registry.register_defaults()
assert RemoteDocumentParser in registry._builtins
@pytest.mark.usefixtures("azure_settings")
def test_get_parser_returns_remote_when_configured(self) -> None:
from paperless.parsers.registry import get_parser_registry
registry = get_parser_registry()
parser_cls = registry.get_parser_for_file("application/pdf", "doc.pdf")
assert parser_cls is RemoteDocumentParser
@pytest.mark.usefixtures("no_engine_settings")
def test_get_parser_returns_none_for_pdf_when_not_configured(self) -> None:
"""With no tesseract parser registered yet, PDF has no handler if remote is off."""
from paperless.parsers.registry import ParserRegistry
registry = ParserRegistry()
registry.register_defaults()
parser_cls = registry.get_parser_for_file("application/pdf", "doc.pdf")
assert parser_cls is None
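
Note: the activation behaviour these tests pin down (a score of 20 when the engine, API key, and endpoint are all set, `None` otherwise, and a higher score beating the tesseract default of 10) can be sketched as below. `RemoteConfig`, `remote_score`, and `pick` are hypothetical names for illustration; the real `score()` and `get_parser_for_file()` signatures differ, and the rule for breaking ties between equal scores is not modelled here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative subset of the MIME types the tests expect.
SUPPORTED = {"application/pdf", "image/png", "image/jpeg"}


@dataclass(frozen=True)
class RemoteConfig:
    """Hypothetical stand-in for the three REMOTE_OCR_* settings."""

    engine: str | None = None
    api_key: str | None = None
    endpoint: str | None = None

    def is_valid(self) -> bool:
        # Assumption: "azureai" is the only valid engine and needs both values.
        return (
            self.engine == "azureai"
            and self.api_key is not None
            and self.endpoint is not None
        )


def remote_score(config: RemoteConfig, mime_type: str) -> int | None:
    """Return 20 when configured and the MIME type is supported, else None."""
    if not config.is_valid() or mime_type not in SUPPORTED:
        return None
    return 20


def pick(scores: dict[str, int | None]) -> str | None:
    """Choose the candidate with the highest score, skipping None scores."""
    live = {name: score for name, score in scores.items() if score is not None}
    return max(live, key=live.get) if live else None


configured = RemoteConfig(
    engine="azureai",
    api_key="test-api-key",
    endpoint="https://test.cognitiveservices.azure.com",
)
winner = pick({"remote": remote_score(configured, "application/pdf"), "tesseract": 10})
```

Keeping the configuration check in the score rather than in the MIME type list is what lets `supported_mime_types()` return the full set even when no engine is configured.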


@@ -0,0 +1,256 @@
"""
Tests for paperless.parsers.text.TextDocumentParser.
All tests use the context-manager protocol for parser lifecycle. Sample
files are provided by session-scoped fixtures defined in conftest.py.
"""
from __future__ import annotations
import tempfile
from pathlib import Path
import pytest
from paperless.parsers import ParserProtocol
from paperless.parsers.text import TextDocumentParser
class TestTextParserProtocol:
"""Verify that TextDocumentParser satisfies the ParserProtocol contract."""
def test_isinstance_satisfies_protocol(
self,
text_parser: TextDocumentParser,
) -> None:
assert isinstance(text_parser, ParserProtocol)
def test_class_attributes_present(self) -> None:
assert isinstance(TextDocumentParser.name, str) and TextDocumentParser.name
assert (
isinstance(TextDocumentParser.version, str) and TextDocumentParser.version
)
assert isinstance(TextDocumentParser.author, str) and TextDocumentParser.author
assert isinstance(TextDocumentParser.url, str) and TextDocumentParser.url
def test_supported_mime_types_returns_dict(self) -> None:
mime_types = TextDocumentParser.supported_mime_types()
assert isinstance(mime_types, dict)
assert "text/plain" in mime_types
assert "text/csv" in mime_types
assert "application/csv" in mime_types
@pytest.mark.parametrize(
("mime_type", "expected"),
[
("text/plain", 10),
("text/csv", 10),
("application/csv", 10),
("application/pdf", None),
("image/png", None),
],
)
def test_score(self, mime_type: str, expected: int | None) -> None:
assert TextDocumentParser.score(mime_type, "file.txt") == expected
def test_can_produce_archive_is_false(
self,
text_parser: TextDocumentParser,
) -> None:
assert text_parser.can_produce_archive is False
def test_requires_pdf_rendition_is_false(
self,
text_parser: TextDocumentParser,
) -> None:
assert text_parser.requires_pdf_rendition is False
class TestTextParserLifecycle:
"""Verify context-manager behaviour and temporary directory cleanup."""
def test_context_manager_cleans_up_tempdir(self) -> None:
with TextDocumentParser() as parser:
tempdir = parser._tempdir
assert tempdir.exists()
assert not tempdir.exists()
def test_context_manager_cleans_up_after_exception(self) -> None:
tempdir: Path | None = None
with pytest.raises(RuntimeError):
with TextDocumentParser() as parser:
tempdir = parser._tempdir
raise RuntimeError("boom")
assert tempdir is not None
assert not tempdir.exists()
class TestTextParserParse:
"""Verify parse() and the result accessors."""
def test_parse_valid_utf8(
self,
text_parser: TextDocumentParser,
sample_txt_file: Path,
) -> None:
text_parser.parse(sample_txt_file, "text/plain")
assert text_parser.get_text() == "This is a test file.\n"
def test_parse_returns_none_for_archive_path(
self,
text_parser: TextDocumentParser,
sample_txt_file: Path,
) -> None:
text_parser.parse(sample_txt_file, "text/plain")
assert text_parser.get_archive_path() is None
def test_parse_returns_none_for_date(
self,
text_parser: TextDocumentParser,
sample_txt_file: Path,
) -> None:
text_parser.parse(sample_txt_file, "text/plain")
assert text_parser.get_date() is None
def test_parse_invalid_utf8_bytes_replaced(
self,
text_parser: TextDocumentParser,
malformed_txt_file: Path,
) -> None:
"""
GIVEN:
- A text file containing invalid UTF-8 byte sequences
WHEN:
- The file is parsed
THEN:
- Parsing succeeds
- Invalid bytes are replaced with the Unicode replacement character
"""
text_parser.parse(malformed_txt_file, "text/plain")
assert text_parser.get_text() == "Pantothens\ufffdure\n"
def test_get_text_none_before_parse(
self,
text_parser: TextDocumentParser,
) -> None:
assert text_parser.get_text() is None
class TestTextParserThumbnail:
"""Verify thumbnail generation."""
def test_thumbnail_exists_and_is_file(
self,
text_parser: TextDocumentParser,
sample_txt_file: Path,
) -> None:
thumb = text_parser.get_thumbnail(sample_txt_file, "text/plain")
assert thumb.exists()
assert thumb.is_file()
def test_thumbnail_large_file_does_not_read_all(
self,
text_parser: TextDocumentParser,
) -> None:
"""
GIVEN:
- A text file larger than 50 MB
WHEN:
- A thumbnail is requested
THEN:
- The thumbnail is generated without loading the full file
"""
with tempfile.NamedTemporaryFile(
delete=False,
mode="w",
encoding="utf-8",
suffix=".txt",
) as tmp:
tmp.write("A" * (51 * 1024 * 1024))
large_file = Path(tmp.name)
try:
thumb = text_parser.get_thumbnail(large_file, "text/plain")
assert thumb.exists()
assert thumb.is_file()
finally:
large_file.unlink(missing_ok=True)
def test_get_page_count_returns_none(
self,
text_parser: TextDocumentParser,
sample_txt_file: Path,
) -> None:
assert text_parser.get_page_count(sample_txt_file, "text/plain") is None
class TestTextParserMetadata:
"""Verify extract_metadata behaviour."""
def test_extract_metadata_returns_empty_list(
self,
text_parser: TextDocumentParser,
sample_txt_file: Path,
) -> None:
result = text_parser.extract_metadata(sample_txt_file, "text/plain")
assert result == []
def test_extract_metadata_returns_list_type(
self,
text_parser: TextDocumentParser,
sample_txt_file: Path,
) -> None:
result = text_parser.extract_metadata(sample_txt_file, "text/plain")
assert isinstance(result, list)
def test_extract_metadata_ignores_mime_type(
self,
text_parser: TextDocumentParser,
sample_txt_file: Path,
) -> None:
"""extract_metadata returns [] regardless of the mime_type argument."""
assert text_parser.extract_metadata(sample_txt_file, "application/pdf") == []
assert text_parser.extract_metadata(sample_txt_file, "text/csv") == []
class TestTextParserRegistry:
"""Verify that TextDocumentParser is registered by default."""
def test_registered_in_defaults(self) -> None:
from paperless.parsers.registry import ParserRegistry
registry = ParserRegistry()
registry.register_defaults()
assert TextDocumentParser in registry._builtins
def test_get_parser_for_text_plain(self) -> None:
from paperless.parsers.registry import get_parser_registry
registry = get_parser_registry()
parser_cls = registry.get_parser_for_file("text/plain", "doc.txt")
assert parser_cls is TextDocumentParser
def test_get_parser_for_text_csv(self) -> None:
from paperless.parsers.registry import get_parser_registry
registry = get_parser_registry()
parser_cls = registry.get_parser_for_file("text/csv", "data.csv")
assert parser_cls is TextDocumentParser
def test_get_parser_for_unknown_type_returns_none(self) -> None:
from paperless.parsers.registry import get_parser_registry
registry = get_parser_registry()
parser_cls = registry.get_parser_for_file("application/pdf", "doc.pdf")
assert parser_cls is None
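
Note: the replacement behaviour asserted in `test_parse_invalid_utf8_bytes_replaced` matches what Python's `errors="replace"` decoding produces. The sketch below is an illustration only; `read_text_lossy` is a hypothetical helper, and the sample bytes are an assumption about what `decode_error.txt` might contain, not a copy of the file.

```python
def read_text_lossy(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for any undecodable bytes."""
    return raw.decode("utf-8", errors="replace")


# Assumed sample: "Pantothensäure" saved as Latin-1, so "ä" is the lone byte
# 0xE4, which is not a valid UTF-8 sequence on its own.
decoded = read_text_lossy(b"Pantothens\xe4ure\n")

# Valid UTF-8 input passes through unchanged.
roundtrip = read_text_lossy("Pantothensäure\n".encode("utf-8"))
```

With strict decoding the same input would raise `UnicodeDecodeError`, so lossy decoding trades exactness for a parse that never fails on encoding errors.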


@@ -0,0 +1,714 @@
"""
Tests for :mod:`paperless.parsers` (ParserProtocol) and
:mod:`paperless.parsers.registry` (ParserRegistry + module-level helpers).
All tests use pytest-style functions/classes — no unittest.TestCase.
The ``clean_registry`` fixture ensures complete isolation between tests by
resetting the module-level singleton before and after every test.
"""
from __future__ import annotations
import logging
from importlib.metadata import EntryPoint
from pathlib import Path
from typing import Self
from unittest.mock import MagicMock
from unittest.mock import patch
import pytest
from paperless.parsers import ParserProtocol
from paperless.parsers.registry import ParserRegistry
from paperless.parsers.registry import get_parser_registry
from paperless.parsers.registry import init_builtin_parsers
from paperless.parsers.registry import reset_parser_registry
@pytest.fixture()
def dummy_parser_cls() -> type:
"""Return a class that fully satisfies :class:`ParserProtocol`.
GIVEN: A need to exercise registry and Protocol logic with a minimal
but complete parser.
WHEN: A test requests this fixture.
THEN: A class with all required attributes and methods is returned.
"""
class DummyParser:
name = "dummy-parser"
version = "0.1.0"
author = "Test Author"
url = "https://example.com/dummy-parser"
@classmethod
def supported_mime_types(cls) -> dict[str, str]:
return {"text/plain": ".txt"}
@classmethod
def score(
cls,
mime_type: str,
filename: str,
path: Path | None = None,
) -> int | None:
return 10
@property
def can_produce_archive(self) -> bool:
return False
@property
def requires_pdf_rendition(self) -> bool:
return False
def parse(
self,
document_path: Path,
mime_type: str,
*,
produce_archive: bool = True,
) -> None:
"""
Required to exist, but doesn't need to do anything
"""
def get_text(self) -> str | None:
return None
def get_date(self) -> None:
return None
def get_archive_path(self) -> Path | None:
return None
def get_thumbnail(
self,
document_path: Path,
mime_type: str,
) -> Path:
return Path("/tmp/thumbnail.webp")
def get_page_count(
self,
document_path: Path,
mime_type: str,
) -> int | None:
return None
def extract_metadata(
self,
document_path: Path,
mime_type: str,
) -> list:
return []
def __enter__(self) -> Self:
return self
def __exit__(self, exc_type, exc_val, exc_tb) -> None:
"""
Required to exist, but doesn't need to do anything
"""
return DummyParser
class TestParserProtocol:
"""Verify runtime isinstance() checks against ParserProtocol."""
def test_compliant_class_instance_passes_isinstance(
self,
dummy_parser_cls: type,
) -> None:
"""
GIVEN: A class that implements every method required by ParserProtocol.
WHEN: isinstance() is called with the Protocol.
THEN: The check passes (returns True).
"""
instance = dummy_parser_cls()
assert isinstance(instance, ParserProtocol)
def test_non_compliant_class_instance_fails_isinstance(self) -> None:
"""
GIVEN: A plain class with no parser-related methods.
WHEN: isinstance() is called with ParserProtocol.
THEN: The check fails (returns False).
"""
class Unrelated:
pass
assert not isinstance(Unrelated(), ParserProtocol)
@pytest.mark.parametrize(
"missing_method",
[
pytest.param("parse", id="missing-parse"),
pytest.param("get_text", id="missing-get_text"),
pytest.param("get_thumbnail", id="missing-get_thumbnail"),
pytest.param("__enter__", id="missing-__enter__"),
pytest.param("__exit__", id="missing-__exit__"),
],
)
def test_partial_compliant_fails_isinstance(
self,
dummy_parser_cls: type,
missing_method: str,
) -> None:
"""
GIVEN: A class that satisfies ParserProtocol except for one method.
WHEN: isinstance() is called with ParserProtocol.
THEN: The check fails because the Protocol is not fully satisfied.
"""
# Create a subclass that shadows the specified method with None to break compliance.
partial_cls = type(
"PartialParser",
(dummy_parser_cls,),
{missing_method: None}, # Replace with None — not callable
)
assert not isinstance(partial_cls(), ParserProtocol)
class TestRegistrySingleton:
"""Verify the module-level singleton lifecycle functions."""
def test_get_parser_registry_returns_instance(self) -> None:
"""
GIVEN: No registry has been created yet.
WHEN: get_parser_registry() is called.
THEN: A ParserRegistry instance is returned.
"""
registry = get_parser_registry()
assert isinstance(registry, ParserRegistry)
def test_get_parser_registry_same_instance_on_repeated_calls(self) -> None:
"""
GIVEN: A registry instance was created by a prior call.
WHEN: get_parser_registry() is called a second time.
THEN: The exact same object (identity) is returned.
"""
first = get_parser_registry()
second = get_parser_registry()
assert first is second
def test_reset_parser_registry_gives_fresh_instance(self) -> None:
"""
GIVEN: A registry instance already exists.
WHEN: reset_parser_registry() is called and then get_parser_registry()
is called again.
THEN: A new, distinct registry instance is returned.
"""
first = get_parser_registry()
reset_parser_registry()
second = get_parser_registry()
assert first is not second
def test_init_builtin_parsers_does_not_run_discover(
self,
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""
GIVEN: discover() would raise an exception if called.
WHEN: init_builtin_parsers() is called.
THEN: No exception is raised, confirming discover() was not invoked.
"""
def exploding_discover(self) -> None:
raise RuntimeError(
"discover() must not be called from init_builtin_parsers",
)
monkeypatch.setattr(ParserRegistry, "discover", exploding_discover)
# Should complete without raising.
init_builtin_parsers()
def test_init_builtin_parsers_idempotent(self) -> None:
"""
GIVEN: init_builtin_parsers() has already been called once.
WHEN: init_builtin_parsers() is called a second time.
THEN: No error is raised and the same registry instance is reused.
"""
init_builtin_parsers()
# Capture the registry created by the first call.
import paperless.parsers.registry as reg_module
first_registry = reg_module._registry
init_builtin_parsers()
assert reg_module._registry is first_registry
class TestParserRegistryGetParserForFile:
"""Verify parser selection logic in get_parser_for_file()."""
def test_returns_none_when_no_parsers_registered(self) -> None:
"""
GIVEN: A registry with no parsers registered.
WHEN: get_parser_for_file() is called for any MIME type.
THEN: None is returned.
"""
registry = ParserRegistry()
result = registry.get_parser_for_file("text/plain", "doc.txt")
assert result is None
def test_returns_none_for_unsupported_mime_type(
self,
dummy_parser_cls: type,
) -> None:
"""
GIVEN: A registry with a parser that supports only 'text/plain'.
WHEN: get_parser_for_file() is called with 'application/pdf'.
THEN: None is returned.
"""
registry = ParserRegistry()
registry.register_builtin(dummy_parser_cls)
result = registry.get_parser_for_file("application/pdf", "file.pdf")
assert result is None
def test_returns_parser_for_supported_mime_type(
self,
dummy_parser_cls: type,
) -> None:
"""
GIVEN: A registry with a parser registered for 'text/plain'.
WHEN: get_parser_for_file() is called with 'text/plain'.
THEN: The registered parser class is returned.
"""
registry = ParserRegistry()
registry.register_builtin(dummy_parser_cls)
result = registry.get_parser_for_file("text/plain", "readme.txt")
assert result is dummy_parser_cls
def test_highest_score_wins(self) -> None:
"""
GIVEN: Two parsers both supporting 'text/plain' with scores 5 and 20.
WHEN: get_parser_for_file() is called for 'text/plain'.
THEN: The parser with score 20 is returned.
"""
class LowScoreParser:
name = "low"
version = "1.0"
author = "A"
url = "https://example.com/low"
@classmethod
def supported_mime_types(cls):
return {"text/plain": ".txt"}
@classmethod
def score(cls, mime_type, filename, path=None):
return 5
class HighScoreParser:
name = "high"
version = "1.0"
author = "B"
url = "https://example.com/high"
@classmethod
def supported_mime_types(cls):
return {"text/plain": ".txt"}
@classmethod
def score(cls, mime_type, filename, path=None):
return 20
registry = ParserRegistry()
registry.register_builtin(LowScoreParser)
registry.register_builtin(HighScoreParser)
result = registry.get_parser_for_file("text/plain", "readme.txt")
assert result is HighScoreParser
def test_parser_returning_none_score_is_skipped(self) -> None:
"""
GIVEN: A parser that returns None from score() for the given file.
WHEN: get_parser_for_file() is called.
THEN: That parser is skipped and None is returned (no other candidates).
"""
class DecliningParser:
name = "declining"
version = "1.0"
author = "A"
url = "https://example.com"
@classmethod
def supported_mime_types(cls):
return {"text/plain": ".txt"}
@classmethod
def score(cls, mime_type, filename, path=None):
return None # Explicitly declines
registry = ParserRegistry()
registry.register_builtin(DecliningParser)
result = registry.get_parser_for_file("text/plain", "readme.txt")
assert result is None
def test_all_parsers_decline_returns_none(self) -> None:
"""
GIVEN: Multiple parsers that all return None from score().
WHEN: get_parser_for_file() is called.
THEN: None is returned.
"""
class AlwaysDeclines:
name = "declines"
version = "1.0"
author = "A"
url = "https://example.com"
@classmethod
def supported_mime_types(cls):
return {"text/plain": ".txt"}
@classmethod
def score(cls, mime_type, filename, path=None):
return None
registry = ParserRegistry()
registry.register_builtin(AlwaysDeclines)
registry._external.append(AlwaysDeclines)
result = registry.get_parser_for_file("text/plain", "file.txt")
assert result is None
def test_external_parser_beats_builtin_same_score(self) -> None:
"""
GIVEN: An external and a built-in parser both returning score 10.
WHEN: get_parser_for_file() is called.
THEN: The external parser wins because externals are evaluated first
and the first-seen-wins policy applies at equal scores.
"""
class BuiltinParser:
name = "builtin"
version = "1.0"
author = "Core"
url = "https://example.com/builtin"
@classmethod
def supported_mime_types(cls):
return {"text/plain": ".txt"}
@classmethod
def score(cls, mime_type, filename, path=None):
return 10
class ExternalParser:
name = "external"
version = "2.0"
author = "Third Party"
url = "https://example.com/external"
@classmethod
def supported_mime_types(cls):
return {"text/plain": ".txt"}
@classmethod
def score(cls, mime_type, filename, path=None):
return 10
registry = ParserRegistry()
registry.register_builtin(BuiltinParser)
registry._external.append(ExternalParser)
result = registry.get_parser_for_file("text/plain", "file.txt")
assert result is ExternalParser
def test_builtin_wins_when_external_declines(self) -> None:
"""
GIVEN: An external parser that declines (score None) and a built-in
that returns score 5.
WHEN: get_parser_for_file() is called.
THEN: The built-in parser is returned.
"""
class DecliningExternal:
name = "declining-external"
version = "1.0"
author = "Third Party"
url = "https://example.com/declining"
@classmethod
def supported_mime_types(cls):
return {"text/plain": ".txt"}
@classmethod
def score(cls, mime_type, filename, path=None):
return None
class AcceptingBuiltin:
name = "accepting-builtin"
version = "1.0"
author = "Core"
url = "https://example.com/accepting"
@classmethod
def supported_mime_types(cls):
return {"text/plain": ".txt"}
@classmethod
def score(cls, mime_type, filename, path=None):
return 5
registry = ParserRegistry()
registry.register_builtin(AcceptingBuiltin)
registry._external.append(DecliningExternal)
result = registry.get_parser_for_file("text/plain", "file.txt")
assert result is AcceptingBuiltin
class TestDiscover:
"""Verify entrypoint discovery in ParserRegistry.discover()."""
def test_discover_with_no_entrypoints(self) -> None:
"""
GIVEN: No entrypoints are registered under 'paperless_ngx.parsers'.
WHEN: discover() is called.
THEN: _external remains empty and no errors are raised.
"""
registry = ParserRegistry()
with patch(
"paperless.parsers.registry.entry_points",
return_value=[],
):
registry.discover()
assert registry._external == []
def test_discover_adds_valid_external_parser(self) -> None:
"""
GIVEN: One valid entrypoint whose loaded class has all required attrs.
WHEN: discover() is called.
THEN: The class is appended to _external.
"""
class ValidExternal:
name = "valid-external"
version = "3.0.0"
author = "Someone"
url = "https://example.com/valid"
@classmethod
def supported_mime_types(cls):
return {"application/pdf": ".pdf"}
@classmethod
def score(cls, mime_type, filename, path=None):
return 5
mock_ep = MagicMock(spec=EntryPoint)
mock_ep.name = "valid_external"
mock_ep.load.return_value = ValidExternal
registry = ParserRegistry()
with patch(
"paperless.parsers.registry.entry_points",
return_value=[mock_ep],
):
registry.discover()
assert ValidExternal in registry._external
def test_discover_skips_entrypoint_with_load_error(
self,
caplog: pytest.LogCaptureFixture,
) -> None:
"""
GIVEN: An entrypoint whose load() method raises ImportError.
WHEN: discover() is called.
THEN: The entrypoint is skipped, an error is logged, and _external
remains empty.
"""
mock_ep = MagicMock(spec=EntryPoint)
mock_ep.name = "broken_ep"
mock_ep.load.side_effect = ImportError("missing dependency")
registry = ParserRegistry()
with caplog.at_level(logging.ERROR, logger="paperless.parsers.registry"):
with patch(
"paperless.parsers.registry.entry_points",
return_value=[mock_ep],
):
registry.discover()
assert registry._external == []
assert any(
"broken_ep" in record.message
for record in caplog.records
if record.levelno >= logging.ERROR
)
def test_discover_skips_entrypoint_with_missing_attrs(
self,
caplog: pytest.LogCaptureFixture,
) -> None:
"""
GIVEN: A class loaded from an entrypoint that is missing the 'score'
attribute.
WHEN: discover() is called.
THEN: The entrypoint is skipped, a warning is logged, and _external
remains empty.
"""
class MissingScore:
name = "missing-score"
version = "1.0"
author = "Someone"
url = "https://example.com"
# 'score' classmethod is intentionally absent.
@classmethod
def supported_mime_types(cls):
return {"text/plain": ".txt"}
mock_ep = MagicMock(spec=EntryPoint)
mock_ep.name = "missing_score_ep"
mock_ep.load.return_value = MissingScore
registry = ParserRegistry()
with caplog.at_level(logging.WARNING, logger="paperless.parsers.registry"):
with patch(
"paperless.parsers.registry.entry_points",
return_value=[mock_ep],
):
registry.discover()
assert registry._external == []
assert any(
"missing_score_ep" in record.message
for record in caplog.records
if record.levelno >= logging.WARNING
)
def test_discover_logs_loaded_parser_info(
self,
caplog: pytest.LogCaptureFixture,
) -> None:
"""
GIVEN: A valid entrypoint that loads successfully.
WHEN: discover() is called.
THEN: An INFO log message is emitted containing the parser name,
version, author, and entrypoint name.
"""
class LoggableParser:
name = "loggable"
version = "4.2.0"
author = "Log Tester"
url = "https://example.com/loggable"
@classmethod
def supported_mime_types(cls):
return {"image/png": ".png"}
@classmethod
def score(cls, mime_type, filename, path=None):
return 1
mock_ep = MagicMock(spec=EntryPoint)
mock_ep.name = "loggable_ep"
mock_ep.load.return_value = LoggableParser
registry = ParserRegistry()
with caplog.at_level(logging.INFO, logger="paperless.parsers.registry"):
with patch(
"paperless.parsers.registry.entry_points",
return_value=[mock_ep],
):
registry.discover()
info_messages = " ".join(
r.message for r in caplog.records if r.levelno == logging.INFO
)
assert "loggable" in info_messages
assert "4.2.0" in info_messages
assert "Log Tester" in info_messages
assert "loggable_ep" in info_messages
class TestLogSummary:
"""Verify log output from ParserRegistry.log_summary()."""
def test_log_summary_with_no_external_parsers(
self,
dummy_parser_cls: type,
caplog: pytest.LogCaptureFixture,
) -> None:
"""
GIVEN: A registry with one built-in parser and no external parsers.
WHEN: log_summary() is called.
THEN: The built-in parser name appears in the logs.
"""
registry = ParserRegistry()
registry.register_builtin(dummy_parser_cls)
with caplog.at_level(logging.INFO, logger="paperless.parsers.registry"):
registry.log_summary()
all_messages = " ".join(r.message for r in caplog.records)
assert dummy_parser_cls.name in all_messages
def test_log_summary_with_external_parsers(
self,
caplog: pytest.LogCaptureFixture,
) -> None:
"""
GIVEN: A registry with one external parser registered.
WHEN: log_summary() is called.
THEN: The external parser name, version, author, and url appear in
the log output.
"""
class ExtParser:
name = "ext-parser"
version = "9.9.9"
author = "Ext Corp"
url = "https://ext.example.com"
@classmethod
def supported_mime_types(cls):
return {}
@classmethod
def score(cls, mime_type, filename, path=None):
return None
registry = ParserRegistry()
registry._external.append(ExtParser)
with caplog.at_level(logging.INFO, logger="paperless.parsers.registry"):
registry.log_summary()
all_messages = " ".join(r.message for r in caplog.records)
assert "ext-parser" in all_messages
assert "9.9.9" in all_messages
assert "Ext Corp" in all_messages
assert "https://ext.example.com" in all_messages
def test_log_summary_logs_no_third_party_message_when_none(
self,
caplog: pytest.LogCaptureFixture,
) -> None:
"""
GIVEN: A registry with no external parsers.
WHEN: log_summary() is called.
THEN: A message containing 'No third-party parsers discovered.' is
logged.
"""
registry = ParserRegistry()
with caplog.at_level(logging.INFO, logger="paperless.parsers.registry"):
registry.log_summary()
all_messages = " ".join(r.message for r in caplog.records)
assert "No third-party parsers discovered." in all_messages
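
Note: the selection rules exercised by the tests above (highest score wins, a `None` score declines, externals are evaluated first and the first candidate seen wins a tie) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `ParserRegistry` implementation; `select_parser` and `make_parser` are hypothetical names introduced only for this example.

```python
from __future__ import annotations


def select_parser(
    external: list[type],
    builtins: list[type],
    mime_type: str,
    filename: str,
) -> type | None:
    """Return the best-scoring parser class, or None if every candidate declines."""
    best: type | None = None
    best_score: int | None = None
    # Externals are evaluated first, so they win ties against built-ins.
    for candidate in [*external, *builtins]:
        if mime_type not in candidate.supported_mime_types():
            continue
        score = candidate.score(mime_type, filename)
        if score is None:
            continue  # the parser explicitly declined this file
        # Strict comparison keeps the first-seen candidate at equal scores.
        if best_score is None or score > best_score:
            best, best_score = candidate, score
    return best


def make_parser(name: str, result: int | None) -> type:
    """Build a hypothetical parser class whose score() always returns `result`."""
    return type(
        name,
        (),
        {
            "supported_mime_types": classmethod(lambda cls: {"text/plain": ".txt"}),
            "score": classmethod(lambda cls, mime_type, filename, path=None: result),
        },
    )
```

The strict `>` comparison is what gives an external parser priority over a built-in at equal scores, provided externals are iterated first.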

@@ -1,186 +1,175 @@
from unittest import mock
import pytest
from channels.layers import get_channel_layer
from channels.testing import WebsocketCommunicator
from django.test import TestCase
from django.test import override_settings
from pytest_mock import MockerFixture
from documents.plugins.helpers import DocumentsStatusManager
from documents.plugins.helpers import ProgressManager
from documents.plugins.helpers import ProgressStatusOptions
from paperless.asgi import application
TEST_CHANNEL_LAYERS = {
"default": {
"BACKEND": "channels.layers.InMemoryChannelLayer",
},
}
class TestWebSockets:
@pytest.fixture(autouse=True)
def anyio_backend(self) -> str:
return "asyncio"
@override_settings(CHANNEL_LAYERS=TEST_CHANNEL_LAYERS)
class TestWebSockets(TestCase):
@pytest.mark.anyio
async def test_no_auth(self) -> None:
communicator = WebsocketCommunicator(application, "/ws/status/")
connected, _ = await communicator.connect()
self.assertFalse(connected)
assert not connected
await communicator.disconnect()
@mock.patch("paperless.consumers.StatusConsumer.close")
@mock.patch("paperless.consumers.StatusConsumer._authenticated")
async def test_close_on_no_auth(self, _authenticated, mock_close) -> None:
_authenticated.return_value = True
@pytest.mark.anyio
async def test_close_on_no_auth(self, mocker: MockerFixture) -> None:
mock_auth = mocker.patch(
"paperless.consumers.StatusConsumer._authenticated",
return_value=True,
)
mock_close = mocker.patch(
"paperless.consumers.StatusConsumer.close",
new_callable=mocker.AsyncMock,
)
communicator = WebsocketCommunicator(application, "/ws/status/")
connected, _ = await communicator.connect()
self.assertTrue(connected)
message = {"type": "status_update", "data": {"task_id": "test"}}
_authenticated.return_value = False
assert connected
mock_auth.return_value = False
channel_layer = get_channel_layer()
assert channel_layer is not None
await channel_layer.group_send(
"status_updates",
message,
{"type": "status_update", "data": {"task_id": "test"}},
)
await communicator.receive_nothing()
mock_close.assert_called_once()
mock_close.assert_awaited_once()
mock_close.reset_mock()
message = {
"type": "document_updated",
"data": {"document_id": 10, "modified": "2026-02-17T00:00:00Z"},
}
await channel_layer.group_send(
"status_updates",
message,
{
"type": "document_updated",
"data": {"document_id": 10, "modified": "2026-02-17T00:00:00Z"},
},
)
await communicator.receive_nothing()
mock_close.assert_called_once()
mock_close.assert_awaited_once()
mock_close.reset_mock()
message = {"type": "documents_deleted", "data": {"documents": [1, 2, 3]}}
await channel_layer.group_send(
"status_updates",
message,
{"type": "documents_deleted", "data": {"documents": [1, 2, 3]}},
)
await communicator.receive_nothing()
mock_close.assert_awaited_once()
mock_close.assert_called_once()
@mock.patch("paperless.consumers.StatusConsumer._authenticated")
async def test_auth(self, _authenticated) -> None:
_authenticated.return_value = True
communicator = WebsocketCommunicator(application, "/ws/status/")
connected, _ = await communicator.connect()
self.assertTrue(connected)
await communicator.disconnect()
@mock.patch("paperless.consumers.StatusConsumer._authenticated")
async def test_receive_status_update(self, _authenticated) -> None:
_authenticated.return_value = True
communicator = WebsocketCommunicator(application, "/ws/status/")
connected, _ = await communicator.connect()
self.assertTrue(connected)
message = {"type": "status_update", "data": {"task_id": "test"}}
channel_layer = get_channel_layer()
await channel_layer.group_send(
"status_updates",
message,
@pytest.mark.anyio
async def test_auth(self, mocker: MockerFixture) -> None:
mocker.patch(
"paperless.consumers.StatusConsumer._authenticated",
return_value=True,
)
response = await communicator.receive_json_from()
self.assertEqual(response, message)
communicator = WebsocketCommunicator(application, "/ws/status/")
connected, _ = await communicator.connect()
assert connected
await communicator.disconnect()
async def test_status_update_check_perms(self) -> None:
@pytest.mark.anyio
async def test_receive_status_update(self, mocker: MockerFixture) -> None:
mocker.patch(
"paperless.consumers.StatusConsumer._authenticated",
return_value=True,
)
communicator = WebsocketCommunicator(application, "/ws/status/")
communicator.scope["user"] = mock.Mock()
communicator.scope["user"].is_authenticated = True
communicator.scope["user"].is_superuser = False
communicator.scope["user"].id = 1
connected, _ = await communicator.connect()
self.assertTrue(connected)
assert connected
# Test as owner
message = {"type": "status_update", "data": {"task_id": "test"}}
channel_layer = get_channel_layer()
assert channel_layer is not None
await channel_layer.group_send("status_updates", message)
assert await communicator.receive_json_from() == message
await communicator.disconnect()
@pytest.mark.anyio
async def test_status_update_check_perms(self, mocker: MockerFixture) -> None:
user = mocker.MagicMock()
user.is_authenticated = True
user.is_superuser = False
user.id = 1
communicator = WebsocketCommunicator(application, "/ws/status/")
communicator.scope["user"] = user # type: ignore[typeddict-unknown-key]
connected, _ = await communicator.connect()
assert connected
channel_layer = get_channel_layer()
assert channel_layer is not None
# Message received as owner
message = {"type": "status_update", "data": {"task_id": "test", "owner_id": 1}}
channel_layer = get_channel_layer()
await channel_layer.group_send(
"status_updates",
message,
)
response = await communicator.receive_json_from()
self.assertEqual(response, message)
await channel_layer.group_send("status_updates", message)
assert await communicator.receive_json_from() == message
# Test with a group that the user belongs to
communicator.scope["user"].groups.filter.return_value.exists.return_value = True
# Message received via group membership
user.groups.filter.return_value.aexists = mocker.AsyncMock(return_value=True)
message = {
"type": "status_update",
"data": {"task_id": "test", "owner_id": 2, "groups_can_view": [1]},
}
channel_layer = get_channel_layer()
await channel_layer.group_send(
"status_updates",
message,
)
response = await communicator.receive_json_from()
self.assertEqual(response, message)
await channel_layer.group_send("status_updates", message)
assert await communicator.receive_json_from() == message
# Test with a different owner_id
# Message not received for different owner with no group match
user.groups.filter.return_value.aexists = mocker.AsyncMock(return_value=False)
message = {"type": "status_update", "data": {"task_id": "test", "owner_id": 2}}
channel_layer = get_channel_layer()
await channel_layer.group_send(
"status_updates",
message,
)
response = await communicator.receive_nothing()
self.assertNotEqual(response, message)
await channel_layer.group_send("status_updates", message)
assert await communicator.receive_nothing()
await communicator.disconnect()
@mock.patch("paperless.consumers.StatusConsumer._authenticated")
async def test_receive_documents_deleted(self, _authenticated) -> None:
_authenticated.return_value = True
@pytest.mark.anyio
async def test_receive_documents_deleted(self, mocker: MockerFixture) -> None:
mocker.patch(
"paperless.consumers.StatusConsumer._authenticated",
return_value=True,
)
communicator = WebsocketCommunicator(application, "/ws/status/")
connected, _ = await communicator.connect()
self.assertTrue(connected)
assert connected
message = {"type": "documents_deleted", "data": {"documents": [1, 2, 3]}}
channel_layer = get_channel_layer()
await channel_layer.group_send(
"status_updates",
message,
)
assert channel_layer is not None
await channel_layer.group_send("status_updates", message)
response = await communicator.receive_json_from()
self.assertEqual(response, message)
assert await communicator.receive_json_from() == message
await communicator.disconnect()
@mock.patch("paperless.consumers.StatusConsumer._can_view")
@mock.patch("paperless.consumers.StatusConsumer._authenticated")
async def test_receive_document_updated(self, _authenticated, _can_view) -> None:
_authenticated.return_value = True
_can_view.return_value = True
@pytest.mark.anyio
async def test_receive_document_updated(self, mocker: MockerFixture) -> None:
mocker.patch(
"paperless.consumers.StatusConsumer._authenticated",
return_value=True,
)
mocker.patch(
"paperless.consumers.StatusConsumer._can_view",
return_value=True,
)
communicator = WebsocketCommunicator(application, "/ws/status/")
connected, _ = await communicator.connect()
self.assertTrue(connected)
assert connected
message = {
"type": "document_updated",
@@ -192,67 +181,52 @@ class TestWebSockets(TestCase):
"groups_can_view": [],
},
}
channel_layer = get_channel_layer()
assert channel_layer is not None
await channel_layer.group_send(
"status_updates",
message,
)
await channel_layer.group_send("status_updates", message)
response = await communicator.receive_json_from()
self.assertEqual(response, message)
assert await communicator.receive_json_from() == message
await communicator.disconnect()
@mock.patch("channels.layers.InMemoryChannelLayer.group_send")
def test_manager_send_progress(self, mock_group_send) -> None:
def test_manager_send_progress(self, mocker: MockerFixture) -> None:
mock_group_send = mocker.patch(
"channels.layers.InMemoryChannelLayer.group_send",
)
with ProgressManager(task_id="test") as manager:
manager.send_progress(
ProgressStatusOptions.STARTED,
"Test message",
1,
10,
extra_args={
"foo": "bar",
},
extra_args={"foo": "bar"},
)
message = mock_group_send.call_args[0][1]
self.assertEqual(
message,
{
"type": "status_update",
"data": {
"filename": None,
"task_id": "test",
"current_progress": 1,
"max_progress": 10,
"status": ProgressStatusOptions.STARTED,
"message": "Test message",
"foo": "bar",
},
assert mock_group_send.call_args[0][1] == {
"type": "status_update",
"data": {
"filename": None,
"task_id": "test",
"current_progress": 1,
"max_progress": 10,
"status": ProgressStatusOptions.STARTED,
"message": "Test message",
"foo": "bar",
},
}
def test_manager_send_documents_deleted(self, mocker: MockerFixture) -> None:
mock_group_send = mocker.patch(
"channels.layers.InMemoryChannelLayer.group_send",
)
@mock.patch("channels.layers.InMemoryChannelLayer.group_send")
def test_manager_send_documents_deleted(
self,
mock_group_send: mock.MagicMock,
) -> None:
with DocumentsStatusManager() as manager:
manager.send_documents_deleted([1, 2, 3])
message = mock_group_send.call_args[0][1]
self.assertEqual(
message,
{
"type": "documents_deleted",
"data": {
"documents": [1, 2, 3],
},
assert mock_group_send.call_args[0][1] == {
"type": "documents_deleted",
"data": {
"documents": [1, 2, 3],
},
)
}
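
Note: the WebSocket tests above switch from `assert_called_once()` to `assert_awaited_once()` on a patched async `close()`. A minimal sketch of why the await-based assertion fits an `AsyncMock`, using only the standard library; `close_connection` and `run_demo` are hypothetical stand-ins, not code from this change.

```python
import asyncio
from unittest.mock import AsyncMock


async def close_connection(close) -> None:
    """Stand-in for consumer code that awaits an async close() method."""
    await close()


def run_demo() -> AsyncMock:
    mock_close = AsyncMock()
    asyncio.run(close_connection(mock_close))
    # Passes only because the coroutine returned by the mock was awaited.
    mock_close.assert_awaited_once()
    # reset_mock() clears both call and await counts between test phases.
    mock_close.reset_mock()
    return mock_close
```

Resetting the mock between phases is what lets each `group_send` in the test be checked with a fresh `assert_awaited_once()`.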

@@ -1,118 +0,0 @@
from pathlib import Path
from django.conf import settings
from paperless_tesseract.parsers import RasterisedDocumentParser
class RemoteEngineConfig:
def __init__(
self,
engine: str,
api_key: str | None = None,
endpoint: str | None = None,
):
self.engine = engine
self.api_key = api_key
self.endpoint = endpoint
def engine_is_valid(self):
valid = self.engine in ["azureai"] and self.api_key is not None
if self.engine == "azureai":
valid = valid and self.endpoint is not None
return valid
class RemoteDocumentParser(RasterisedDocumentParser):
"""
This parser uses a remote OCR engine to parse documents. Currently, it supports Azure AI Vision
as this is the only service that provides a remote OCR API with text-embedded PDF output.
"""
logging_name = "paperless.parsing.remote"
def get_settings(self) -> RemoteEngineConfig:
"""
Returns the configuration for the remote OCR engine, loaded from Django settings.
"""
return RemoteEngineConfig(
engine=settings.REMOTE_OCR_ENGINE,
api_key=settings.REMOTE_OCR_API_KEY,
endpoint=settings.REMOTE_OCR_ENDPOINT,
)
def supported_mime_types(self):
if self.settings.engine_is_valid():
return {
"application/pdf": ".pdf",
"image/png": ".png",
"image/jpeg": ".jpg",
"image/tiff": ".tiff",
"image/bmp": ".bmp",
"image/gif": ".gif",
"image/webp": ".webp",
}
else:
return {}
def azure_ai_vision_parse(
self,
file: Path,
) -> str | None:
"""
Uses Azure AI Vision to parse the document and return the text content.
It requests a searchable PDF output with embedded text.
The PDF is saved to the archive_path attribute.
Returns the text content extracted from the document.
If the parsing fails, it returns None.
"""
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
from azure.ai.documentintelligence.models import AnalyzeOutputOption
from azure.ai.documentintelligence.models import DocumentContentFormat
from azure.core.credentials import AzureKeyCredential
client = DocumentIntelligenceClient(
endpoint=self.settings.endpoint,
credential=AzureKeyCredential(self.settings.api_key),
)
try:
with file.open("rb") as f:
analyze_request = AnalyzeDocumentRequest(bytes_source=f.read())
poller = client.begin_analyze_document(
model_id="prebuilt-read",
body=analyze_request,
output_content_format=DocumentContentFormat.TEXT,
output=[AnalyzeOutputOption.PDF], # request searchable PDF output
content_type="application/json",
)
poller.wait()
result_id = poller.details["operation_id"]
result = poller.result()
# Download the PDF with embedded text
self.archive_path = self.tempdir / "archive.pdf"
with self.archive_path.open("wb") as f:
for chunk in client.get_analyze_result_pdf(
model_id="prebuilt-read",
result_id=result_id,
):
f.write(chunk)
return result.content
except Exception as e:
self.log.error(f"Azure AI Vision parsing failed: {e}")
finally:
client.close()
return None
def parse(self, document_path: Path, mime_type, file_name=None):
if not self.settings.engine_is_valid():
self.log.warning(
"No valid remote parser engine is configured, content will be empty.",
)
self.text = ""
elif self.settings.engine == "azureai":
self.text = self.azure_ai_vision_parse(document_path)
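
Note: the validity check in the removed `RemoteEngineConfig.engine_is_valid()` reduces to "the engine is `azureai` and both the API key and the endpoint are set". A minimal sketch of that rule as a dataclass, assuming `azureai` stays the only supported engine; `EngineConfig` and `is_valid` are hypothetical names and do not describe the relocated class.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EngineConfig:
    engine: str | None
    api_key: str | None = None
    endpoint: str | None = None

    def is_valid(self) -> bool:
        # Only one engine is recognised in this sketch.
        if self.engine != "azureai":
            return False
        # Azure needs both a key and an endpoint to build a client.
        return self.api_key is not None and self.endpoint is not None
```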

@@ -1,16 +1,36 @@
def get_parser(*args, **kwargs):
from paperless_remote.parsers import RemoteDocumentParser
from __future__ import annotations
from typing import Any
def get_parser(*args: Any, **kwargs: Any) -> Any:
from paperless.parsers.remote import RemoteDocumentParser
# The new RemoteDocumentParser does not accept the progress_callback
# kwarg injected by the old signal-based consumer. logging_group is
# forwarded as a positional arg.
# Phase 4 will replace this signal path with the new ParserRegistry.
kwargs.pop("progress_callback", None)
return RemoteDocumentParser(*args, **kwargs)
def get_supported_mime_types():
from paperless_remote.parsers import RemoteDocumentParser
def get_supported_mime_types() -> dict[str, str]:
from django.conf import settings
return RemoteDocumentParser(None).supported_mime_types()
from paperless.parsers.remote import RemoteDocumentParser
from paperless.parsers.remote import RemoteEngineConfig
config = RemoteEngineConfig(
engine=settings.REMOTE_OCR_ENGINE,
api_key=settings.REMOTE_OCR_API_KEY,
endpoint=settings.REMOTE_OCR_ENDPOINT,
)
if not config.engine_is_valid():
return {}
return RemoteDocumentParser.supported_mime_types()
def remote_consumer_declaration(sender, **kwargs):
def remote_consumer_declaration(sender: Any, **kwargs: Any) -> dict[str, Any]:
return {
"parser": get_parser,
"weight": 5,

@@ -1,131 +0,0 @@
import uuid
from pathlib import Path
from unittest import mock
from django.test import TestCase
from django.test import override_settings
from documents.tests.utils import DirectoriesMixin
from documents.tests.utils import FileSystemAssertsMixin
from paperless_remote.parsers import RemoteDocumentParser
from paperless_remote.signals import get_parser
class TestParser(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
SAMPLE_FILES = Path(__file__).resolve().parent / "samples"
def assertContainsStrings(self, content: str, strings: list[str]) -> None:
# Asserts that all strings appear in content, in the given order.
indices = []
for s in strings:
if s in content:
indices.append(content.index(s))
else:
self.fail(f"'{s}' is not in '{content}'")
self.assertListEqual(indices, sorted(indices))
@mock.patch("paperless_tesseract.parsers.run_subprocess")
@mock.patch("azure.ai.documentintelligence.DocumentIntelligenceClient")
def test_get_text_with_azure(self, mock_client_cls, mock_subprocess) -> None:
# Arrange mock Azure client
mock_client = mock.Mock()
mock_client_cls.return_value = mock_client
# Simulate poller result and its `.details`
mock_poller = mock.Mock()
mock_poller.wait.return_value = None
mock_poller.details = {"operation_id": "fake-op-id"}
mock_client.begin_analyze_document.return_value = mock_poller
mock_poller.result.return_value.content = "This is a test document."
# Return dummy PDF bytes
mock_client.get_analyze_result_pdf.return_value = [
b"%PDF-",
b"1.7 ",
b"FAKEPDF",
]
# Simulate pdftotext by writing dummy text to sidecar file
def fake_run(cmd, *args, **kwargs) -> None:
with Path(cmd[-1]).open("w", encoding="utf-8") as f:
f.write("This is a test document.")
mock_subprocess.side_effect = fake_run
with override_settings(
REMOTE_OCR_ENGINE="azureai",
REMOTE_OCR_API_KEY="somekey",
REMOTE_OCR_ENDPOINT="https://endpoint.cognitiveservices.azure.com",
):
parser = get_parser(uuid.uuid4())
parser.parse(
self.SAMPLE_FILES / "simple-digital.pdf",
"application/pdf",
)
self.assertContainsStrings(
parser.text.strip(),
["This is a test document."],
)
@mock.patch("azure.ai.documentintelligence.DocumentIntelligenceClient")
def test_get_text_with_azure_error_logged_and_returns_none(
self,
mock_client_cls,
) -> None:
mock_client = mock.Mock()
mock_client.begin_analyze_document.side_effect = RuntimeError("fail")
mock_client_cls.return_value = mock_client
with override_settings(
REMOTE_OCR_ENGINE="azureai",
REMOTE_OCR_API_KEY="somekey",
REMOTE_OCR_ENDPOINT="https://endpoint.cognitiveservices.azure.com",
):
parser = get_parser(uuid.uuid4())
with mock.patch.object(parser.log, "error") as mock_log_error:
parser.parse(
self.SAMPLE_FILES / "simple-digital.pdf",
"application/pdf",
)
self.assertIsNone(parser.text)
mock_client.begin_analyze_document.assert_called_once()
mock_client.close.assert_called_once()
mock_log_error.assert_called_once()
self.assertIn(
"Azure AI Vision parsing failed",
mock_log_error.call_args[0][0],
)
@override_settings(
REMOTE_OCR_ENGINE="azureai",
REMOTE_OCR_API_KEY="key",
REMOTE_OCR_ENDPOINT="https://endpoint.cognitiveservices.azure.com",
)
def test_supported_mime_types_valid_config(self) -> None:
parser = RemoteDocumentParser(uuid.uuid4())
expected_types = {
"application/pdf": ".pdf",
"image/png": ".png",
"image/jpeg": ".jpg",
"image/tiff": ".tiff",
"image/bmp": ".bmp",
"image/gif": ".gif",
"image/webp": ".webp",
}
self.assertEqual(parser.supported_mime_types(), expected_types)
def test_supported_mime_types_invalid_config(self) -> None:
parser = get_parser(uuid.uuid4())
self.assertEqual(parser.supported_mime_types(), {})
@override_settings(
REMOTE_OCR_ENGINE=None,
REMOTE_OCR_API_KEY=None,
REMOTE_OCR_ENDPOINT=None,
)
def test_parse_with_invalid_config(self) -> None:
parser = get_parser(uuid.uuid4())
parser.parse(self.SAMPLE_FILES / "simple-digital.pdf", "application/pdf")
self.assertEqual(parser.text, "")

@@ -221,7 +221,7 @@ class RasterisedDocumentParser(DocumentParser):
if TYPE_CHECKING:
assert isinstance(self.settings, OcrConfig)
ocrmypdf_args = {
"input_file": input_file,
"input_file_or_options": input_file,
"output_file": output_file,
# need to use threads, since this will be run in daemonized
# processes via the task library.
@@ -285,7 +285,7 @@ class RasterisedDocumentParser(DocumentParser):
"for compatibility with img2pdf",
)
# Replace the input file with the non-alpha
ocrmypdf_args["input_file"] = self.remove_alpha(input_file)
ocrmypdf_args["input_file_or_options"] = self.remove_alpha(input_file)
if dpi:
self.log.debug(f"Detected DPI for image {input_file}: {dpi}")

@@ -778,7 +778,7 @@ class TestParser(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
safe_fallback=False,
)
self.assertEqual(params["input_file"], "input.pdf")
self.assertEqual(params["input_file_or_options"], "input.pdf")
self.assertEqual(params["output_file"], "output.pdf")
self.assertEqual(params["sidecar"], "sidecar.txt")

@@ -1,50 +0,0 @@
from pathlib import Path
from django.conf import settings
from PIL import Image
from PIL import ImageDraw
from PIL import ImageFont
from documents.parsers import DocumentParser
class TextDocumentParser(DocumentParser):
"""
This parser directly parses a text document (.txt, .md, or .csv)
"""
logging_name = "paperless.parsing.text"
def get_thumbnail(self, document_path: Path, mime_type, file_name=None) -> Path:
# Avoid reading entire file into memory
max_chars = 100_000
file_size_limit = 50 * 1024 * 1024
if document_path.stat().st_size > file_size_limit:
text = "[File too large to preview]"
else:
with Path(document_path).open("r", encoding="utf-8", errors="replace") as f:
text = f.read(max_chars)
img = Image.new("RGB", (500, 700), color="white")
draw = ImageDraw.Draw(img)
font = ImageFont.truetype(
font=settings.THUMBNAIL_FONT_NAME,
size=20,
layout_engine=ImageFont.Layout.BASIC,
)
draw.multiline_text((5, 5), text, font=font, fill="black", spacing=4)
out_path = self.tempdir / "thumb.webp"
img.save(out_path, format="WEBP")
return out_path
def parse(self, document_path, mime_type, file_name=None) -> None:
self.text = self.read_file_handle_unicode_errors(document_path)
def get_settings(self) -> None:
"""
This parser does not implement additional settings yet
"""
return None
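
Note: the thumbnail code in the removed `TextDocumentParser` bounds how much text it reads before rendering. A minimal sketch of just that bounded read, without the Pillow rendering step; `read_preview_text` is a hypothetical helper, and the default limits simply mirror the constants shown above.

```python
from pathlib import Path


def read_preview_text(
    path: Path,
    max_chars: int = 100_000,
    size_limit: int = 50 * 1024 * 1024,
) -> str:
    """Return at most max_chars characters, or a placeholder for huge files."""
    # Check the size first so very large files are never opened for reading.
    if path.stat().st_size > size_limit:
        return "[File too large to preview]"
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    with path.open("r", encoding="utf-8", errors="replace") as f:
        return f.read(max_chars)
```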

@@ -1,10 +1,20 @@
-def get_parser(*args, **kwargs):
-    from paperless_text.parsers import TextDocumentParser
+from __future__ import annotations
+
+from typing import Any
+
+
+def get_parser(*args: Any, **kwargs: Any) -> Any:
+    from paperless.parsers.text import TextDocumentParser
 
+    # The new TextDocumentParser does not accept the progress_callback
+    # kwarg injected by the old signal-based consumer. logging_group is
+    # forwarded as a positional arg.
+    # Phase 4 will replace this signal path with the new ParserRegistry.
+    kwargs.pop("progress_callback", None)
     return TextDocumentParser(*args, **kwargs)
 
 
-def text_consumer_declaration(sender, **kwargs):
+def text_consumer_declaration(sender: Any, **kwargs: Any) -> dict[str, Any]:
     return {
         "parser": get_parser,
         "weight": 10,
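
Note on the shim in this hunk: the factory forwards everything except the one keyword the new constructor rejects. The sketch below shows that pattern in isolation. `NewStyleParser` is a hypothetical stand-in, not the real `TextDocumentParser`; the only thing assumed about the real class is what the commit message states, namely that `__init__` accepts `logging_group` and not `progress_callback`.

```python
from typing import Any


class NewStyleParser:
    """Hypothetical parser: accepts logging_group, rejects progress_callback."""

    def __init__(self, logging_group: object = None) -> None:
        self.logging_group = logging_group


def get_parser(*args: Any, **kwargs: Any) -> NewStyleParser:
    # Drop the one keyword the new constructor would reject. The None default
    # keeps pop() from raising when the caller did not pass it.
    kwargs.pop("progress_callback", None)
    return NewStyleParser(*args, **kwargs)


# Call shape taken from the commit message:
# parser_class(logging_group, progress_callback=...)
parser = get_parser("group-1", progress_callback=lambda *a: None)
```

Popping by name rather than discarding all of `**kwargs` means any other keyword still reaches the constructor, so a genuinely unexpected argument fails loudly instead of being swallowed.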


@@ -1,30 +0,0 @@
from collections.abc import Generator
from pathlib import Path

import pytest

from paperless_text.parsers import TextDocumentParser


@pytest.fixture(scope="session")
def sample_dir() -> Path:
    return (Path(__file__).parent / Path("samples")).resolve()


@pytest.fixture()
def text_parser() -> Generator[TextDocumentParser, None, None]:
    try:
        parser = TextDocumentParser(logging_group=None)
        yield parser
    finally:
        parser.cleanup()


@pytest.fixture(scope="session")
def sample_txt_file(sample_dir: Path) -> Path:
    return sample_dir / "test.txt"


@pytest.fixture(scope="session")
def malformed_txt_file(sample_dir: Path) -> Path:
    return sample_dir / "decode_error.txt"


@@ -1,69 +0,0 @@
import tempfile
from pathlib import Path

from paperless_text.parsers import TextDocumentParser


class TestTextParser:
    def test_thumbnail(
        self,
        text_parser: TextDocumentParser,
        sample_txt_file: Path,
    ) -> None:
        # just make sure that it does not crash
        f = text_parser.get_thumbnail(sample_txt_file, "text/plain")
        assert f.exists()
        assert f.is_file()

    def test_parse(
        self,
        text_parser: TextDocumentParser,
        sample_txt_file: Path,
    ) -> None:
        text_parser.parse(sample_txt_file, "text/plain")

        assert text_parser.get_text() == "This is a test file.\n"
        assert text_parser.get_archive_path() is None

    def test_parse_invalid_bytes(
        self,
        text_parser: TextDocumentParser,
        malformed_txt_file: Path,
    ) -> None:
        """
        GIVEN:
            - Text file which contains invalid UTF bytes
        WHEN:
            - The file is parsed
        THEN:
            - Parsing continues
            - Invalid bytes are removed
        """

        text_parser.parse(malformed_txt_file, "text/plain")

        assert text_parser.get_text() == "Pantothens\ufffdure\n"
        assert text_parser.get_archive_path() is None

    def test_thumbnail_large_file(self, text_parser: TextDocumentParser) -> None:
        """
        GIVEN:
            - A very large text file (>50MB)
        WHEN:
            - A thumbnail is requested
        THEN:
            - A thumbnail is created without reading the entire file into memory
        """
        with tempfile.NamedTemporaryFile(
            delete=False,
            mode="w",
            encoding="utf-8",
            suffix=".txt",
        ) as tmp:
            tmp.write("A" * (51 * 1024 * 1024))  # 51 MB of 'A'
            large_file = Path(tmp.name)

        thumb = text_parser.get_thumbnail(large_file, "text/plain")
        assert thumb.exists()
        assert thumb.is_file()
        large_file.unlink()
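
Note on `test_parse_invalid_bytes`: the assertion expects the U+FFFD replacement character (UTF-8 bytes EF BF BD) in place of the undecodable byte. `read_file_handle_unicode_errors` is not shown in this diff, so the sketch below only demonstrates the standard-library behaviour such a fallback could rely on. The input byte `0xE4` (Latin-1 `ä`) is an assumption about the sample file, not something the diff confirms.

```python
# Assumed sample content: "Pantothensäure" saved as Latin-1, so the "ä" is the
# single byte 0xE4, which is not valid UTF-8 in this position.
raw = b"Pantothens\xe4ure\n"

# A strict decode rejects the byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="replace" substitutes U+FFFD for the invalid byte and keeps going.
text = raw.decode("utf-8", errors="replace")
```

Under that assumption the invalid byte is replaced rather than dropped, which matches the string the test compares against.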


@@ -12,6 +12,7 @@ def tika_parser() -> Generator[TikaDocumentParser, None, None]:
         parser = TikaDocumentParser(logging_group=None)
         yield parser
     finally:
+        # TODO(stumpylog): Cleanup once all parsers are handled
         parser.cleanup()

uv.lock generated

@@ -831,6 +831,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/2d/82/e5d2c1c67d19841e9edc74954c827444ae826978499bde3dfc1d007c8c11/deepmerge-2.0-py3-none-any.whl", hash = "sha256:6de9ce507115cff0bed95ff0ce9ecc31088ef50cbdf09bc90a09349a318b3d00", size = 13475, upload-time = "2024-08-30T05:31:48.659Z" },
]
[[package]]
name = "defusedxml"
version = "0.7.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/0f/d5/c66da9b79e5bdb124974bfe172b4daf3c984ebd9c2a06e2b8a4dc7331c72/defusedxml-0.7.1.tar.gz", hash = "sha256:1bb3032db185915b62d7c6209c5a8792be6a32ab2fedacc84e01b52c51aa3e69", size = 75520, upload-time = "2021-03-08T10:59:26.269Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/07/6c/aa3f2f849e01cb6a001cd8554a88d4c77c5c1a31c95bdf1cf9301e6d9ef4/defusedxml-0.7.1-py2.py3-none-any.whl", hash = "sha256:a352e7e428770286cc899e2542b6cdaedb2b4953ff269a210103ec58f6198a61", size = 25604, upload-time = "2021-03-08T10:59:24.45Z" },
]
[[package]]
name = "deprecated"
version = "1.3.1"
@@ -1172,14 +1181,14 @@ wheels = [
[[package]]
name = "drf-spectacular-sidecar"
version = "2026.1.1"
version = "2026.3.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "django", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/1e/81/c7b0e3ccbd5a039c4f4fcfecf88391a666ca1406a953886e2f39295b1c90/drf_spectacular_sidecar-2026.1.1.tar.gz", hash = "sha256:6f7c173a8ddbbbdafc7a27e028614b65f07a89ca90f996a432d57460463b56be", size = 2468060, upload-time = "2026-01-01T11:27:12.682Z" }
sdist = { url = "https://files.pythonhosted.org/packages/aa/42/2f8c1b2846399d47094ec414bc0d6a7cce7ba95fd6545a97285eee89f7f1/drf_spectacular_sidecar-2026.3.1.tar.gz", hash = "sha256:5b7fedad66e3851f2f442480792c08115d79217959d01645b93d3d2258938be1", size = 2461501, upload-time = "2026-03-01T11:31:19.708Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/db/96/38725edda526f3e9e597f531beeec94b0ef433d9494f06a13b7636eecb6e/drf_spectacular_sidecar-2026.1.1-py3-none-any.whl", hash = "sha256:af8df62f1b594ec280351336d837eaf2402ab25a6bc2a1fad7aee9935821070f", size = 2489520, upload-time = "2026-01-01T11:27:11.056Z" },
{ url = "https://files.pythonhosted.org/packages/c1/28/2d5e64d101ebc5180674fcaf7b5a35e398e2f8d9688b2e8d52b0e1394e7d/drf_spectacular_sidecar-2026.3.1-py3-none-any.whl", hash = "sha256:864edb83e022e13e3941c325c3cc0c954c843fa2e1d0bc95e81887664b2d3dad", size = 2481725, upload-time = "2026-03-01T11:31:18.469Z" },
]
[[package]]
@@ -1230,11 +1239,11 @@ wheels = [
[[package]]
name = "faker"
version = "40.5.1"
version = "40.8.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/03/2a/96fff3edcb10f6505143448a4b91535f77b74865cec45be52690ee280443/faker-40.5.1.tar.gz", hash = "sha256:70222361cd82aa10cb86066d1a4e8f47f2bcdc919615c412045a69c4e6da0cd3", size = 1952684, upload-time = "2026-02-23T21:34:38.362Z" }
sdist = { url = "https://files.pythonhosted.org/packages/70/03/14428edc541467c460d363f6e94bee9acc271f3e62470630fc9a647d0cf2/faker-40.8.0.tar.gz", hash = "sha256:936a3c9be6c004433f20aa4d99095df5dec82b8c7ad07459756041f8c1728875", size = 1956493, upload-time = "2026-03-04T16:18:48.161Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/4d/a9/1eed4db92d0aec2f9bfdf1faae0ab0418b5e121dda5701f118a7a4f0cd6a/faker-40.5.1-py3-none-any.whl", hash = "sha256:c69640c1e13bad49b4bcebcbf1b52f9f1a872b6ea186c248ada34d798f1661bf", size = 1987053, upload-time = "2026-02-23T21:34:36.418Z" },
{ url = "https://files.pythonhosted.org/packages/4c/3b/c6348f1e285e75b069085b18110a4e6325b763a5d35d5e204356fc7c20b3/faker-40.8.0-py3-none-any.whl", hash = "sha256:eb21bdba18f7a8375382eb94fb436fce07046893dc94cb20817d28deb0c3d579", size = 1989124, upload-time = "2026-03-04T16:18:46.45Z" },
]
[[package]]
@@ -1251,11 +1260,11 @@ wheels = [
[[package]]
name = "filelock"
version = "3.20.3"
version = "3.25.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/1d/65/ce7f1b70157833bf3cb851b556a37d4547ceafc158aa9b34b36782f23696/filelock-3.20.3.tar.gz", hash = "sha256:18c57ee915c7ec61cff0ecf7f0f869936c7c30191bb0cf406f1341778d0834e1", size = 19485, upload-time = "2026-01-09T17:55:05.421Z" }
sdist = { url = "https://files.pythonhosted.org/packages/94/b8/00651a0f559862f3bb7d6f7477b192afe3f583cc5e26403b44e59a55ab34/filelock-3.25.2.tar.gz", hash = "sha256:b64ece2b38f4ca29dd3e810287aa8c48182bbecd1ae6e9ae126c9b35f1382694", size = 40480, upload-time = "2026-03-11T20:45:38.487Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/b5/36/7fb70f04bf00bc646cd5bb45aa9eddb15e19437a28b8fb2b4a5249fac770/filelock-3.20.3-py3-none-any.whl", hash = "sha256:4b0dda527ee31078689fc205ec4f1c1bf7d56cf88b6dc9426c4f230e46c2dce1", size = 16701, upload-time = "2026-01-09T17:55:04.334Z" },
{ url = "https://files.pythonhosted.org/packages/a4/a5/842ae8f0c08b61d6484b52f99a03510a3a72d23141942d216ebe81fefbce/filelock-3.25.2-py3-none-any.whl", hash = "sha256:ca8afb0da15f229774c9ad1b455ed96e85a81373065fb10446672f64444ddf70", size = 26759, upload-time = "2026-03-11T20:45:37.437Z" },
]
[[package]]
@@ -1283,6 +1292,59 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/a6/ff/ee2f67c0ff146ec98b5df1df637b2bc2d17beeb05df9f427a67bd7a7d79c/flower-2.0.1-py2.py3-none-any.whl", hash = "sha256:9db2c621eeefbc844c8dd88be64aef61e84e2deb29b271e02ab2b5b9f01068e2", size = 383553, upload-time = "2023-08-13T14:37:41.552Z" },
]
[[package]]
name = "fonttools"
version = "4.62.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/5a/96/686339e0fda8142b7ebed39af53f4a5694602a729662f42a6209e3be91d0/fonttools-4.62.0.tar.gz", hash = "sha256:0dc477c12b8076b4eb9af2e440421b0433ffa9e1dcb39e0640a6c94665ed1098", size = 3579521, upload-time = "2026-03-09T16:50:06.217Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e4/33/63d79ca41020dd460b51f1e0f58ad1ff0a36b7bcbdf8f3971d52836581e9/fonttools-4.62.0-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:196cafef9aeec5258425bd31a4e9a414b2ee0d1557bca184d7923d3d3bcd90f9", size = 2870816, upload-time = "2026-03-09T16:48:32.39Z" },
{ url = "https://files.pythonhosted.org/packages/c0/7a/9aeec114bc9fc00d757a41f092f7107863d372e684a5b5724c043654477c/fonttools-4.62.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:153afc3012ff8761b1733e8fbe5d98623409774c44ffd88fbcb780e240c11d13", size = 2416127, upload-time = "2026-03-09T16:48:34.627Z" },
{ url = "https://files.pythonhosted.org/packages/5a/71/12cfd8ae0478b7158ffa8850786781f67e73c00fd897ef9d053415c5f88b/fonttools-4.62.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:13b663fb197334de84db790353d59da2a7288fd14e9be329f5debc63ec0500a5", size = 5100678, upload-time = "2026-03-09T16:48:36.454Z" },
{ url = "https://files.pythonhosted.org/packages/8a/d7/8e4845993ee233c2023d11babe9b3dae7d30333da1d792eeccebcb77baab/fonttools-4.62.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:591220d5333264b1df0d3285adbdfe2af4f6a45bbf9ca2b485f97c9f577c49ff", size = 5070859, upload-time = "2026-03-09T16:48:38.786Z" },
{ url = "https://files.pythonhosted.org/packages/ae/a0/287ae04cd883a52e7bb1d92dfc4997dcffb54173761c751106845fa9e316/fonttools-4.62.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:579f35c121528a50c96bf6fcb6a393e81e7f896d4326bf40e379f1c971603db9", size = 5076689, upload-time = "2026-03-09T16:48:41.886Z" },
{ url = "https://files.pythonhosted.org/packages/6d/4e/a2377ad26c36fcd3e671a1c316ea5ed83107de1588e2d897a98349363bc7/fonttools-4.62.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:44956b003151d5a289eba6c71fe590d63509267c37e26de1766ba15d9c589582", size = 5202053, upload-time = "2026-03-09T16:48:43.867Z" },
{ url = "https://files.pythonhosted.org/packages/ab/9d/7ad1ffc080619f67d0b1e0fa6a0578f0be077404f13fd8e448d1616a94a3/fonttools-4.62.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:22bde4dc12a9e09b5ced77f3b5053d96cf10c4976c6ac0dee293418ef289d221", size = 2870004, upload-time = "2026-03-09T16:48:50.837Z" },
{ url = "https://files.pythonhosted.org/packages/4d/8b/ba59069a490f61b737e064c3129453dbd28ee38e81d56af0d04d7e6b4de4/fonttools-4.62.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:7199c73b326bad892f1cb53ffdd002128bfd58a89b8f662204fbf1daf8d62e85", size = 2414662, upload-time = "2026-03-09T16:48:53.295Z" },
{ url = "https://files.pythonhosted.org/packages/8c/8c/c52a4310de58deeac7e9ea800892aec09b00bb3eb0c53265b31ec02be115/fonttools-4.62.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d732938633681d6e2324e601b79e93f7f72395ec8681f9cdae5a8c08bc167e72", size = 5032975, upload-time = "2026-03-09T16:48:55.718Z" },
{ url = "https://files.pythonhosted.org/packages/0b/a1/d16318232964d786907b9b3613b8409f74cf0be2da400854509d3a864e43/fonttools-4.62.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:31a804c16d76038cc4e3826e07678efb0a02dc4f15396ea8e07088adbfb2578e", size = 4988544, upload-time = "2026-03-09T16:48:57.715Z" },
{ url = "https://files.pythonhosted.org/packages/b2/8d/7e745ca3e65852adc5e52a83dc213fe1b07d61cb5b394970fcd4b1199d1e/fonttools-4.62.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:090e74ac86e68c20150e665ef8e7e0c20cb9f8b395302c9419fa2e4d332c3b51", size = 4971296, upload-time = "2026-03-09T16:48:59.678Z" },
{ url = "https://files.pythonhosted.org/packages/e6/d4/b717a4874175146029ca1517e85474b1af80c9d9a306fc3161e71485eea5/fonttools-4.62.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:8f086120e8be9e99ca1288aa5ce519833f93fe0ec6ebad2380c1dee18781f0b5", size = 5122503, upload-time = "2026-03-09T16:49:02.464Z" },
{ url = "https://files.pythonhosted.org/packages/82/c7/985c1670aa6d82ef270f04cde11394c168f2002700353bd2bde405e59b8f/fonttools-4.62.0-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:274c8b8a87e439faf565d3bcd3f9f9e31bca7740755776a4a90a4bfeaa722efa", size = 2864929, upload-time = "2026-03-09T16:49:09.331Z" },
{ url = "https://files.pythonhosted.org/packages/c1/dc/c409c8ceec0d3119e9ab0b7b1a2e3c76d1f4d66e4a9db5c59e6b7652e7df/fonttools-4.62.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:93e27131a5a0ae82aaadcffe309b1bae195f6711689722af026862bede05c07c", size = 2412586, upload-time = "2026-03-09T16:49:11.378Z" },
{ url = "https://files.pythonhosted.org/packages/5f/ac/8e300dbf7b4d135287c261ffd92ede02d9f48f0d2db14665fbc8b059588a/fonttools-4.62.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:83c6524c5b93bad9c2939d88e619fedc62e913c19e673f25d5ab74e7a5d074e5", size = 5013708, upload-time = "2026-03-09T16:49:14.063Z" },
{ url = "https://files.pythonhosted.org/packages/fb/bc/60d93477b653eeb1ddf5f9ec34be689b79234d82dbdded269ac0252715b8/fonttools-4.62.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:106aec9226f9498fc5345125ff7200842c01eda273ae038f5049b0916907acee", size = 4964355, upload-time = "2026-03-09T16:49:16.515Z" },
{ url = "https://files.pythonhosted.org/packages/cb/eb/6dc62bcc3c3598c28a3ecb77e69018869c3e109bd83031d4973c059d318b/fonttools-4.62.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:15d86b96c79013320f13bc1b15f94789edb376c0a2d22fb6088f33637e8dfcbc", size = 4953472, upload-time = "2026-03-09T16:49:18.494Z" },
{ url = "https://files.pythonhosted.org/packages/82/b3/3af7592d9b254b7b7fec018135f8776bfa0d1ad335476c2791b1334dc5e4/fonttools-4.62.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:4f16c07e5250d5d71d0f990a59460bc5620c3cc456121f2cfb5b60475699905f", size = 5094701, upload-time = "2026-03-09T16:49:21.67Z" },
{ url = "https://files.pythonhosted.org/packages/1a/64/61f69298aa6e7c363dcf00dd6371a654676900abe27d1effd1a74b43e5d0/fonttools-4.62.0-cp314-cp314-macosx_10_15_universal2.whl", hash = "sha256:4fa5a9c716e2f75ef34b5a5c2ca0ee4848d795daa7e6792bf30fd4abf8993449", size = 2864222, upload-time = "2026-03-09T16:49:28.285Z" },
{ url = "https://files.pythonhosted.org/packages/c6/57/6b08756fe4455336b1fe160ab3c11fccc90768ccb6ee03fb0b45851aace4/fonttools-4.62.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:625f5cbeb0b8f4e42343eaeb4bc2786718ddd84760a2f5e55fdd3db049047c00", size = 2410674, upload-time = "2026-03-09T16:49:30.504Z" },
{ url = "https://files.pythonhosted.org/packages/6f/86/db65b63bb1b824b63e602e9be21b18741ddc99bcf5a7850f9181159ae107/fonttools-4.62.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6247e58b96b982709cd569a91a2ba935d406dccf17b6aa615afaed37ac3856aa", size = 4999387, upload-time = "2026-03-09T16:49:32.593Z" },
{ url = "https://files.pythonhosted.org/packages/86/c8/c6669e42d2f4efd60d38a3252cebbb28851f968890efb2b9b15f9d1092b0/fonttools-4.62.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:840632ea9c1eab7b7f01c369e408c0721c287dfd7500ab937398430689852fd1", size = 4912506, upload-time = "2026-03-09T16:49:34.927Z" },
{ url = "https://files.pythonhosted.org/packages/2e/49/0ae552aa098edd0ec548413fbf818f52ceb70535016215094a5ce9bf8f70/fonttools-4.62.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:28a9ea2a7467a816d1bec22658b0cce4443ac60abac3e293bdee78beb74588f3", size = 4951202, upload-time = "2026-03-09T16:49:37.1Z" },
{ url = "https://files.pythonhosted.org/packages/71/65/ae38fc8a4cea6f162d74cf11f58e9aeef1baa7d0e3d1376dabd336c129e5/fonttools-4.62.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:5ae611294f768d413949fd12693a8cba0e6332fbc1e07aba60121be35eac68d0", size = 5060758, upload-time = "2026-03-09T16:49:39.464Z" },
{ url = "https://files.pythonhosted.org/packages/f8/65/f47f9b3db1ec156a1f222f1089ba076b2cc9ee1d024a8b0a60c54258517e/fonttools-4.62.0-cp314-cp314t-macosx_10_15_universal2.whl", hash = "sha256:0361a7d41d86937f1f752717c19f719d0fde064d3011038f9f19bdf5fc2f5c95", size = 2947079, upload-time = "2026-03-09T16:49:46.471Z" },
{ url = "https://files.pythonhosted.org/packages/52/73/bc62e5058a0c22cf02b1e0169ef0c3ca6c3247216d719f95bead3c05a991/fonttools-4.62.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:d4108c12773b3c97aa592311557c405d5b4fc03db2b969ed928fcf68e7b3c887", size = 2448802, upload-time = "2026-03-09T16:49:48.328Z" },
{ url = "https://files.pythonhosted.org/packages/2b/df/bfaa0e845884935355670e6e68f137185ab87295f8bc838db575e4a66064/fonttools-4.62.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b448075f32708e8fb377fe7687f769a5f51a027172c591ba9a58693631b077a8", size = 5137378, upload-time = "2026-03-09T16:49:50.223Z" },
{ url = "https://files.pythonhosted.org/packages/32/32/04f616979a18b48b52e634988b93d847b6346260faf85ecccaf7e2e9057f/fonttools-4.62.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:e5f1fa8cc9f1a56a3e33ee6b954d6d9235e6b9d11eb7a6c9dfe2c2f829dc24db", size = 4920714, upload-time = "2026-03-09T16:49:53.172Z" },
{ url = "https://files.pythonhosted.org/packages/3b/2e/274e16689c1dfee5c68302cd7c444213cfddd23cf4620374419625037ec6/fonttools-4.62.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:f8c8ea812f82db1e884b9cdb663080453e28f0f9a1f5027a5adb59c4cc8d38d1", size = 5016012, upload-time = "2026-03-09T16:49:55.762Z" },
{ url = "https://files.pythonhosted.org/packages/7f/0c/b08117270626e7117ac2f89d732fdd4386ec37d2ab3a944462d29e6f89a1/fonttools-4.62.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:03c6068adfdc67c565d217e92386b1cdd951abd4240d65180cec62fa74ba31b2", size = 5042766, upload-time = "2026-03-09T16:49:57.726Z" },
{ url = "https://files.pythonhosted.org/packages/9c/57/c2487c281dde03abb2dec244fd67059b8d118bd30a653cbf69e94084cb23/fonttools-4.62.0-py3-none-any.whl", hash = "sha256:75064f19a10c50c74b336aa5ebe7b1f89fd0fb5255807bfd4b0c6317098f4af3", size = 1152427, upload-time = "2026-03-09T16:50:04.074Z" },
]
[[package]]
name = "fpdf2"
version = "2.8.7"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "defusedxml", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "fonttools", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "pillow", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/27/f2/72feae0b2827ed38013e4307b14f95bf0b3d124adfef4d38a7d57533f7be/fpdf2-2.8.7.tar.gz", hash = "sha256:7060ccee5a9c7ab0a271fb765a36a23639f83ef8996c34e3d46af0a17ede57f9", size = 362351, upload-time = "2026-02-28T05:39:16.456Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/66/0a/cf50ecffa1e3747ed9380a3adfc829259f1f86b3fdbd9e505af789003141/fpdf2-2.8.7-py3-none-any.whl", hash = "sha256:d391fc508a3ce02fc43a577c830cda4fe6f37646f2d143d489839940932fbc19", size = 327056, upload-time = "2026-02-28T05:39:14.619Z" },
]
[[package]]
name = "frozenlist"
version = "1.8.0"
@@ -1393,74 +1455,74 @@ wheels = [
[[package]]
name = "granian"
version = "2.7.0"
version = "2.7.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "click", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/43/75/bdea4ab49a02772a3007e667284764081d401169e96d0270d95509e3e240/granian-2.7.0.tar.gz", hash = "sha256:bee8e8a81a259e6f08613c973062df9db5f8451b521bb0259ed8f27d3e2bab23", size = 127963, upload-time = "2026-02-02T11:39:57.525Z" }
sdist = { url = "https://files.pythonhosted.org/packages/57/19/d4ea523715ba8dd2ed295932cc3dda6bb197060f78aada6e886ff08587b2/granian-2.7.2.tar.gz", hash = "sha256:cdae2f3a26fa998d41fefad58f1d1c84a0b035a6cc9377addd81b51ba82f927f", size = 128969, upload-time = "2026-02-24T23:04:23.314Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/8a/28/a3ee3f2220c0b9045f8caa2a2cb7484618961b7500f88594349a7889d391/granian-2.7.0-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:e76afb483d7f42a0b911bdb447d282f70ad7a96caabd4c99cdc300117c5f8977", size = 4580966, upload-time = "2026-02-02T11:38:14.077Z" },
{ url = "https://files.pythonhosted.org/packages/1b/60/b53da9c255f6853a5516d0f8a3e7325c24123f0f7e77856558c49810f4ce/granian-2.7.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:628523302274f95ca967f295a9aa7bc4ade5e1eced42afc60d06dfe20f2da07a", size = 4210344, upload-time = "2026-02-02T11:38:15.34Z" },
{ url = "https://files.pythonhosted.org/packages/a2/bb/c3380106565bc99edfb90baafa1a8081a4334709ce0200d207ddda36275e/granian-2.7.0-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a62560b64a17e1cbae61038285d5fa8a32613ada9a46f05047dc607ea7d38f23", size = 5130258, upload-time = "2026-02-02T11:38:17.175Z" },
{ url = "https://files.pythonhosted.org/packages/a2/8f/2c3348d6d33807e3b818ac07366b5251e811ce2548fbe82e0b55982d8a13/granian-2.7.0-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:47b8e0e9497d24466d6511443cc18f22f18405aab5a7e2fece1dd38206af88c4", size = 4576496, upload-time = "2026-02-02T11:38:18.577Z" },
{ url = "https://files.pythonhosted.org/packages/f6/71/d1d146170a23f3523d8629b47f849b30ba0d513eb519188ce5d7bfd1b916/granian-2.7.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:cc6039c61a07b2d36462c487b66b131ae3fd862bdc8fb81d6e5c206c1a2b683c", size = 4975062, upload-time = "2026-02-02T11:38:20.084Z" },
{ url = "https://files.pythonhosted.org/packages/16/f9/f3acbf8c41cd10ff81109bd9078d3228f23e52bab8673763c65739a87e30/granian-2.7.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:f3b0442beb11b035ee09959726f44b3730d0b55688110defd1d9a9a6c7486955", size = 4827755, upload-time = "2026-02-02T11:38:21.817Z" },
{ url = "https://files.pythonhosted.org/packages/9f/f8/503135b89539feea2be495b47858c22409ba77ffcb71920ae0727c674189/granian-2.7.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:741d0b58a5133cc5902b3129a8a4c55143f0f8769a80e7aa80caadc64c9f1d8b", size = 4939033, upload-time = "2026-02-02T11:38:23.033Z" },
{ url = "https://files.pythonhosted.org/packages/99/90/aaabe2c1162d07a6af55532b6f616199aa237805ef1d732fa78d9883d217/granian-2.7.0-cp311-cp311-musllinux_1_1_armv7l.whl", hash = "sha256:02a6fe6a19f290b70bc23feeb3809511becdaff2263b0469f02c28772af97652", size = 5292980, upload-time = "2026-02-02T11:38:24.823Z" },
{ url = "https://files.pythonhosted.org/packages/eb/aa/d1eb7342676893ab0ec1e66cceca4450bec3f29c488db2a92af5b4211d4d/granian-2.7.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:8239b1a661271428c3e358e4bdcaaaf877a432cc593e93fc6b5a612ae521b06a", size = 5087230, upload-time = "2026-02-02T11:38:26.09Z" },
{ url = "https://files.pythonhosted.org/packages/97/1a/b6d7840bfd9cd9bed627b138e6e8e49d1961997adba30ee39ad75d07ed58/granian-2.7.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:d9c42562dcbf52848d0a9d0db58f8f2e790586eb0c363b8ad1b30fe0bd362117", size = 4572728, upload-time = "2026-02-02T11:38:30.143Z" },
{ url = "https://files.pythonhosted.org/packages/15/93/f8f7224d9eaaaf4dbf493035a85287fa2e27c17e5f7aacc01821d8aa66b4/granian-2.7.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:a3421bd5c90430073e1f3f88fc63bc8d0a8ee547a9a5c06d577a281f384160bd", size = 4195034, upload-time = "2026-02-02T11:38:32.007Z" },
{ url = "https://files.pythonhosted.org/packages/4b/db/66843a35e1b6345da2a1c71839fb9aa7eb0f17d380fbf4cb5c7e06eb6f85/granian-2.7.0-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:8b8057dc81772932e208f2327b5e347459eb78896118e27af9845801e267cec5", size = 5123768, upload-time = "2026-02-02T11:38:33.449Z" },
{ url = "https://files.pythonhosted.org/packages/10/ce/631c5c1f7a4e6b8c98ec857b3e6795fe64e474b6f48df388ac701a21f3fe/granian-2.7.0-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:1e5e70f438b1a4787d76566770e98bf7732407efa02802f38f10c960247107d7", size = 4562424, upload-time = "2026-02-02T11:38:34.815Z" },
{ url = "https://files.pythonhosted.org/packages/28/41/19bdfa3719e22c4dcf6fa1a53323551a37aa58a4ca7a768db6a0ba714ab0/granian-2.7.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:213dd224a47c7bfcbb91718c7eeb56d6067825a28dcae50f537964e2dafb729a", size = 5006002, upload-time = "2026-02-02T11:38:36.76Z" },
{ url = "https://files.pythonhosted.org/packages/3c/5b/3b40f489e2449eb58df93ad38f42d1a6c2910502a4bc8017c047e16d637c/granian-2.7.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:bb5be27c0265268d43bab9a878ac27a20b4288843ffc9fda1009b8226673f629", size = 4825073, upload-time = "2026-02-02T11:38:37.998Z" },
{ url = "https://files.pythonhosted.org/packages/04/92/b6de6f8c4146409efb58aee75277b810d54de03a1687d33f1f3f1feb3395/granian-2.7.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:a6ff95aede82903c06eb560a32b10e9235fdafc4568c8fe7dcac28d62be5ffa2", size = 4928628, upload-time = "2026-02-02T11:38:39.481Z" },
{ url = "https://files.pythonhosted.org/packages/39/21/d8a191dcfbf8422b868ab847829670075ba3e4325611e0a9fd2dc909a142/granian-2.7.0-cp312-cp312-musllinux_1_1_armv7l.whl", hash = "sha256:e44f0c1676b27582df26d47cf466fedebd72f520edc2025f125c83ff58af77f9", size = 5282898, upload-time = "2026-02-02T11:38:40.815Z" },
{ url = "https://files.pythonhosted.org/packages/d0/46/2746f1a4f0f093576fb64b63c3f022f254c6d2c4cc66d37dd881608397ce/granian-2.7.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:9241b72f95ceb57e2bbce55e0f61c250c1c02e9d2f8531b027dd3dc204209fdd", size = 5118453, upload-time = "2026-02-02T11:38:42.716Z" },
{ url = "https://files.pythonhosted.org/packages/f8/df/b68626242fb4913df0968ee5662f5a394857b3d6fc4ee17c94be69664491/granian-2.7.0-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:bc61451791c8963232e4921c6805e7c2e366635e1e658267b1854889116ff6d7", size = 4572200, upload-time = "2026-02-02T11:38:46.194Z" },
{ url = "https://files.pythonhosted.org/packages/c0/15/2fe28bca0751d9dc46e5c7e9e4b0c4fd1a55e3e8ba062f28292322ee160b/granian-2.7.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:e274a0d6a01c475b9135212106ca5b69f5ec2f67f4ca6ce812d185d80255cdf5", size = 4195415, upload-time = "2026-02-02T11:38:47.78Z" },
{ url = "https://files.pythonhosted.org/packages/07/2a/d4dc40e58a55835cac5296f5090cc3ce2d43332ad486bbf78b3a00e46199/granian-2.7.0-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:34bd28075adae3453c596ee20089e0288379e3fdf1cec8bafff89bb175ea0eb4", size = 5122981, upload-time = "2026-02-02T11:38:49.55Z" },
{ url = "https://files.pythonhosted.org/packages/bd/fe/8c79837df620dc0eca6a8b799505910cbba2d85d92ccc58d1c549f7027be/granian-2.7.0-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f526583b72cf9e6ca9a4849c781ed546f44005f0ad4b5c7eb1090e1ebec209bf", size = 4561440, upload-time = "2026-02-02T11:38:50.799Z" },
{ url = "https://files.pythonhosted.org/packages/4f/e7/d7abfaa9829ff50cddc27919bd3ce5a335402ebbbaa650e96fe579136674/granian-2.7.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:95ac07d5314e03e667210349dfc76124d69726731007c24716e21a2554cc15ca", size = 5005076, upload-time = "2026-02-02T11:38:52.157Z" },
{ url = "https://files.pythonhosted.org/packages/1a/45/108afaa0636c93b6a8ff12810787e4a1ea27fffe59f12ca0de7c784b119a/granian-2.7.0-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:f6812e342c41ca80e1b34fb6c9a7e51a4bbd14f59025bd1bb59d45a39e02b8d5", size = 4825142, upload-time = "2026-02-02T11:38:53.506Z" },
{ url = "https://files.pythonhosted.org/packages/4b/eb/cedf4675b1047490f819ce8bd1ee1ea74b6c772ae9d9dd1c117ae690a3eb/granian-2.7.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:7a4099ba59885123405699a5313757556ff106f90336dccdf4ceda76f32657d0", size = 4927830, upload-time = "2026-02-02T11:38:54.92Z" },
{ url = "https://files.pythonhosted.org/packages/f9/b5/2d7a2e03ba29a6915ad41502e2870899b9eb54861e3d06ad8470c5e70b41/granian-2.7.0-cp313-cp313-musllinux_1_1_armv7l.whl", hash = "sha256:c487731fbae86808410e88c587eb4071213812c5f52570b7981bf07a1b84be25", size = 5282142, upload-time = "2026-02-02T11:38:56.445Z" },
{ url = "https://files.pythonhosted.org/packages/a9/e7/c851b2e2351727186b4bc4a35df832e2e97e4f77b8a93dfdb6daa098cf9e/granian-2.7.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:ca4877ebf8873488ba72a299206621bd0c6febb8f091f3da62117c1fe344501f", size = 5117907, upload-time = "2026-02-02T11:38:57.852Z" },
{ url = "https://files.pythonhosted.org/packages/e1/2f/c9bcd4aa36d3092fe88a623e60aa89bd4ff16836803a633b8b454946a845/granian-2.7.0-cp313-cp313t-macosx_10_12_x86_64.whl", hash = "sha256:e1df8e4669b4fb69b373b2ab40a10a8c511eeb41838d65adb375d1c0e4e7454c", size = 4493110, upload-time = "2026-02-02T11:39:01.294Z" },
{ url = "https://files.pythonhosted.org/packages/6a/b4/02d11870255920d35f8eab390e509d3688fe0018011bb606aa00057b778f/granian-2.7.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:6331ed9d3eb06cfba737dfb8efa3f0a8b4d4312a5af91c0a67bfbaa078b62eb4", size = 4122388, upload-time = "2026-02-02T11:39:02.509Z" },
{ url = "https://files.pythonhosted.org/packages/98/50/dfad5a414a2e3e14c30cd0d54cef1dab4874a67c1e6f8b1124d9998ed8b2/granian-2.7.0-cp313-cp313t-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:093e1c277eddba00eaa94ca82ff7a9ab57b0554cd7013e5b2f3468635dbe520d", size = 4379344, upload-time = "2026-02-02T11:39:04.489Z" },
{ url = "https://files.pythonhosted.org/packages/6e/53/ef086af03ef31aa3c1dbff2da5928a9b5dd1f48d8ebee18dd6628951ae9e/granian-2.7.0-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:8e8e317bdc9ca9905d0b20f665f8fe31080c7f13d90675439113932bb3272c24", size = 5069172, upload-time = "2026-02-02T11:39:05.757Z" },
{ url = "https://files.pythonhosted.org/packages/c3/57/117864ea46c6cbcbeff733a4da736e814b06d6634beeb201b9db176bd6be/granian-2.7.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:391e8589265178fd7f444b6711b6dda157a6b66059a15bf1033ffceeaf26918c", size = 4848246, upload-time = "2026-02-02T11:39:07.048Z" },
{ url = "https://files.pythonhosted.org/packages/60/da/2d45b7b6638a77362228d6770a61fa2bc3feae6c52a80993c230f344b197/granian-2.7.0-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:49b6873f4a8ee7a1ea627ff98d67ecdd644cfc18aab475b2e15f651dbcbe4140", size = 4669023, upload-time = "2026-02-02T11:39:09.612Z" },
{ url = "https://files.pythonhosted.org/packages/22/69/49e54eb6ed67ccf471c19d4c65f64197dd5a416d501620519e28ea92c82e/granian-2.7.0-cp313-cp313t-musllinux_1_1_aarch64.whl", hash = "sha256:39778147c7527de0bcda12cd9c38863d4e6a80d3a8a96ddeb6fe2d1342f337db", size = 4896002, upload-time = "2026-02-02T11:39:10.996Z" },
{ url = "https://files.pythonhosted.org/packages/c5/f1/a864a78029265d06a6fd61c760c8facf032be0d345deca5081718cbb006f/granian-2.7.0-cp313-cp313t-musllinux_1_1_armv7l.whl", hash = "sha256:8135d0a4574dc5a0acf3a815fc6cad5bbe9075ef86df2c091ec34fbd21639c1c", size = 5239945, upload-time = "2026-02-02T11:39:12.726Z" },
{ url = "https://files.pythonhosted.org/packages/26/33/feef40e4570b771d815c1ddd1008ccc9c0e81ce5a015deded6788e919f18/granian-2.7.0-cp313-cp313t-musllinux_1_1_x86_64.whl", hash = "sha256:47df2d9e50f22fa820b34fd38ceeeedc0b97994fa164425fa30e746759db8a44", size = 5078968, upload-time = "2026-02-02T11:39:14.048Z" },
{ url = "https://files.pythonhosted.org/packages/b9/6a/b8d58474bbcbca450f030fd41b65c94ae0afb5e8f58c39fbea2df4efee2b/granian-2.7.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:23c6531b75c94c7b533812aed4f40dc93008c406cfa5629ec93397cd0f6770cb", size = 4569780, upload-time = "2026-02-02T11:39:16.671Z" },
{ url = "https://files.pythonhosted.org/packages/c2/dc/a8b11425ebdf6cb58e1084fdb7759d853ca7f0b00376e4bb66300322f5d3/granian-2.7.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:e4939b86f2b7918202ce56cb01c2efe20a393c742d41640b444e82c8b444b614", size = 4195285, upload-time = "2026-02-02T11:39:18.596Z" },
{ url = "https://files.pythonhosted.org/packages/7e/b5/6cc0b94f997d93f4b1510b2d953f07a7f1d16a143d60b53e0e50b887fa12/granian-2.7.0-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:38fa10adf3c4d50e31a08401e6701ee2488613d905bb316cad456e5ebad5aa81", size = 5121311, upload-time = "2026-02-02T11:39:20.092Z" },
{ url = "https://files.pythonhosted.org/packages/f4/f9/df3d862874cf4b233f97253bb78991ae4f31179a5581beaa41a2100e3bce/granian-2.7.0-cp314-cp314-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:9b366a9fd713a20321e668768b122b7b0140bfaeb3cb0557b6cb11dce827a4fb", size = 4557737, upload-time = "2026-02-02T11:39:21.992Z" },
{ url = "https://files.pythonhosted.org/packages/c7/7f/e3063368345f39188afe5baa1ab62fdd951097656cd83bec3964f91f6e66/granian-2.7.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a916413e0dcd5c6eaf7f7413a6d899f7ba53a988d08e3b3c7ab2e0b5fa687559", size = 5004108, upload-time = "2026-02-02T11:39:23.306Z" },
{ url = "https://files.pythonhosted.org/packages/bc/eb/892bcc0cfc44ed791795bab251e0b6ed767397182bac134d9f0fcecc552e/granian-2.7.0-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:e315adf24162294d35ca4bed66c8f66ac15a0696f2cb462e729122d148f6d958", size = 4823143, upload-time = "2026-02-02T11:39:24.696Z" },
{ url = "https://files.pythonhosted.org/packages/b3/e0/ff8528bf620b6da7833171f6d30bfe4b4b1d6e7d155b634bd17590e0c4b4/granian-2.7.0-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:486f8785e716f76f96534aaba25acd5dee1a8398725ffd2a55f0833689c75933", size = 4926328, upload-time = "2026-02-02T11:39:26.111Z" },
{ url = "https://files.pythonhosted.org/packages/02/f7/fb0a761d39245295660703a42e9448f3c04ce1f26b2f62e044d179167880/granian-2.7.0-cp314-cp314-musllinux_1_1_armv7l.whl", hash = "sha256:0e5e2c1c6ff1501e3675e5237096b90b767f506bb0ef88594310b7b9eaa95532", size = 5281190, upload-time = "2026-02-02T11:39:27.68Z" },
{ url = "https://files.pythonhosted.org/packages/d6/d8/860e7e96ea109c6db431c8284040d265758bded35f9ce2de05f3969d7c0c/granian-2.7.0-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:d4418b417f9c2162b4fa9ec41ec34ed3e8ed891463bb058873034222be53542f", size = 5117989, upload-time = "2026-02-02T11:39:29.008Z" },
{ url = "https://files.pythonhosted.org/packages/fb/9a/500ab01ae273870e8fc056956cc49716707b4a0e76fb2b5993258e1494f7/granian-2.7.0-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:b4367c088c00bdc38a8a495282070010914931edb4c488499f290c91018d9e80", size = 4492656, upload-time = "2026-02-02T11:39:31.614Z" },
{ url = "https://files.pythonhosted.org/packages/d0/26/86dc5a6fff60ee0cc38c2fcd1a0d4cebd52e6764a9f752a20458001ca57e/granian-2.7.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:c8f3df224284ed1ff673f61de652337d7721100bf4cfc336b2047005b0edb2e0", size = 4122201, upload-time = "2026-02-02T11:39:33.162Z" },
{ url = "https://files.pythonhosted.org/packages/0f/60/887dc5a099135ff449adcdea9a2aa38f39673baf99de9acb78077b701432/granian-2.7.0-cp314-cp314t-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:6682c08b0d82ad75f8e9d1571254630133e1563c49f0600c2e2dc26cec743ae7", size = 4377489, upload-time = "2026-02-02T11:39:34.532Z" },
{ url = "https://files.pythonhosted.org/packages/5a/6b/68c12f8c4c1f1c109bf55d66beeb37a817fd908af5d5d9b48afcbdc3e623/granian-2.7.0-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:d6ccc3bdc2248775b6bd292d7d37a1bff79eb1aaf931f3a217ea9fb9a6fe7ca4", size = 5067294, upload-time = "2026-02-02T11:39:35.84Z" },
{ url = "https://files.pythonhosted.org/packages/ff/4f/be4f9c129f5f80f52654f257abe91f647defec020fa134b3600013b7219d/granian-2.7.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5431272a4d6f49a200aeb7b01010a3785b93b9bd8cd813d98ed29c8e9ba1c476", size = 4848356, upload-time = "2026-02-02T11:39:37.443Z" },
{ url = "https://files.pythonhosted.org/packages/d7/aa/f6efcfb435f370a6f3626bd5837465bfb71950f6b3cb3c74e54b176c72e2/granian-2.7.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:790b150255576775672f26dbcbd6eb05f70260dd661b91ce462f6f3846db9501", size = 4669022, upload-time = "2026-02-02T11:39:38.782Z" },
{ url = "https://files.pythonhosted.org/packages/1d/36/e86050c476046ef1f0aae0eb86d098fa787abfc8887a131c82baccc7565e/granian-2.7.0-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:ce9be999273c181e4b65efbbd82a5bc6f223f1db3463660514d1dc229c8ba760", size = 4895567, upload-time = "2026-02-02T11:39:40.144Z" },
{ url = "https://files.pythonhosted.org/packages/2b/5e/25283ff7fc12fcf42ae8a5687243119739cf4b0bf5ccb1c32d11d37987b1/granian-2.7.0-cp314-cp314t-musllinux_1_1_armv7l.whl", hash = "sha256:319b34f18ed3162354513acb5a9e8cee720ac166cd88fe05f0f057703eb47e4f", size = 5238652, upload-time = "2026-02-02T11:39:41.648Z" },
{ url = "https://files.pythonhosted.org/packages/5f/60/06148781120e086c7437aa9513198025ea1eb847cb2e244d5e2b9801782e/granian-2.7.0-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:b01bed8ad748840e7ab49373f642076f3bc459e39937a4ce11c5be03e67cdfd9", size = 5079018, upload-time = "2026-02-02T11:39:43.309Z" },
{ url = "https://files.pythonhosted.org/packages/0f/0b/39ebf1b791bbd4049239ecfee8f072321211879e5617a023921961be1d55/granian-2.7.0-pp311-pypy311_pp73-macosx_10_12_x86_64.whl", hash = "sha256:24a1f6a894bea95ef0e603bebacbccd19c319c0da493bb4fde8b94b8629f3dc8", size = 4581648, upload-time = "2026-02-02T11:39:45.991Z" },
{ url = "https://files.pythonhosted.org/packages/2f/cd/4642192520478bba4cd547124d92607c958a0786864ebe378f3008b40048/granian-2.7.0-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:c2799497ac896cffea85512983c5d9eb4ae51ebacd7a9a5fd3d2ac81f1755fac", size = 4214257, upload-time = "2026-02-02T11:39:47.507Z" },
{ url = "https://files.pythonhosted.org/packages/e2/3f/615f93753c3b682219fe546196fc9eb3a045d846e57883312c97de4d785a/granian-2.7.0-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b66a15d004136e641706e0e5522b3509151e2027a0677cf4fa97d049d9ddfa41", size = 4979656, upload-time = "2026-02-02T11:39:48.838Z" },
{ url = "https://files.pythonhosted.org/packages/6e/68/1f2c36a964f93bfe8d6189431b8425acc591b735e47d8898b2e70c478398/granian-2.7.0-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:de5a6fa93d2138ba2372d20d97b87c1af75fa16a59a93841745326825c3ddf83", size = 4844448, upload-time = "2026-02-02T11:39:50.5Z" },
{ url = "https://files.pythonhosted.org/packages/df/23/d8c83fe6a6656026c734c2ea771cbcdec6f0010e749f8ab0db1bfc8a3dfe/granian-2.7.0-pp311-pypy311_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:aacda2ad46724490c4cd811b8dcadff2260603a3e95ca0d8c33552d791a3c6ac", size = 4930755, upload-time = "2026-02-02T11:39:51.866Z" },
{ url = "https://files.pythonhosted.org/packages/20/e5/2a86ee18544185e72fc50b50985b6bfb4504f7835875d2636f573e100071/granian-2.7.0-pp311-pypy311_pp73-musllinux_1_1_armv7l.whl", hash = "sha256:7efb5ebdb308ed1685a80cded6ea51447753e8afe92c21fc3abf9a06a9eb5d2e", size = 5295728, upload-time = "2026-02-02T11:39:53.364Z" },
{ url = "https://files.pythonhosted.org/packages/7e/bd/0d47d17769601c56d876b289456f27799611571227b99ad300e221600bbd/granian-2.7.0-pp311-pypy311_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:ae96b75420d01d9a7dbe1bd84f1898b2b0ade6883db59bfe2b233d7c28c6b0df", size = 5095149, upload-time = "2026-02-02T11:39:54.767Z" },
{ url = "https://files.pythonhosted.org/packages/f8/58/dcf0e8a54b9a7f8b7482ed617bca08503a47eb6b702aea73cda9efd2c81c/granian-2.7.2-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:3a0d33ada95a1421e5a22d447d918e5615ff0aa37f12de5b84455afe89970875", size = 6522860, upload-time = "2026-02-24T23:02:15.901Z" },
{ url = "https://files.pythonhosted.org/packages/2b/dd/398de0f273fdcf0e96bd70d8cd97364625176990e67457f11e23f95772bd/granian-2.7.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:ee26f0258cc1b6ccf87c7bdcee6d1f90710505522fc9880ec02b299fb15679ad", size = 6135934, upload-time = "2026-02-24T23:02:18.52Z" },
{ url = "https://files.pythonhosted.org/packages/67/b7/7bf635bbdfb88dfc6591fa2ce5c3837ab9535e57e197a780c4a338363de7/granian-2.7.2-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f52338cfab08b8cdaadaa5b93665e0be5b4c4f718fbd132d76ceacacb9ff864e", size = 7138393, upload-time = "2026-02-24T23:02:19.911Z" },
{ url = "https://files.pythonhosted.org/packages/0a/90/e424fd8a703add1e8922390503be8d057882b35b42ba51796157aabd659b/granian-2.7.2-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6e377d03a638fecb6949ab05c8fd4a76f892993aed17c602d179bfd56aebc2de", size = 6467189, upload-time = "2026-02-24T23:02:21.896Z" },
{ url = "https://files.pythonhosted.org/packages/65/9a/5de24d7e2dba1aa9fbac6f0a80dace975cfac1b7c7624ece21da75a38987/granian-2.7.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7f742f3ca1797a746fae4a9337fe5d966460c957fa8efeaccf464b872e158d3d", size = 6870813, upload-time = "2026-02-24T23:02:23.972Z" },
{ url = "https://files.pythonhosted.org/packages/ac/cd/a604e38237857f4ad4262eadc409f94fe08fed3e86fa0b8734479cc5bfb1/granian-2.7.2-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:ca4402e8f28a958f0c0f6ebff94cd0b04ca79690aded785648a438bc3c875ba3", size = 7046583, upload-time = "2026-02-24T23:02:25.94Z" },
{ url = "https://files.pythonhosted.org/packages/cc/ad/79eaae0cddd90c4e191b37674cedd8f4863b44465cb435b10396d0f12c82/granian-2.7.2-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:1f9a899123b0d084783626e5225608094f1d2f6fc81b3a7c77ab8daac33ab74a", size = 7121958, upload-time = "2026-02-24T23:02:27.641Z" },
{ url = "https://files.pythonhosted.org/packages/ca/51/e5c923b1baa003f5b4b7fc148be6f8d2e3cabe55d41040fe8139da52e31b/granian-2.7.2-cp311-cp311-musllinux_1_1_armv7l.whl", hash = "sha256:56ba4bef79d0ae3736328038deed2b5d281b11672bc0b08ffc8ce6210e406ef8", size = 7303047, upload-time = "2026-02-24T23:02:30.863Z" },
{ url = "https://files.pythonhosted.org/packages/06/c0/ebd68144a3ce9ead1a3192ac02e1c26e4874df1257435ce6137adf92fedb/granian-2.7.2-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:ea46e3f43d94715aa89d1f2f5754753d46e6b653d561b82b0291e62a31bdfb35", size = 7011349, upload-time = "2026-02-24T23:02:32.887Z" },
{ url = "https://files.pythonhosted.org/packages/ec/ed/37f5d7d887ec9159dd8f5b1c9c38cee711d51016d203959f2d51c536a33b/granian-2.7.2-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:a836f3f8ebfe61cb25d9afb655f2e5d3851154fd2ad97d47bb4fb202817212fc", size = 6451593, upload-time = "2026-02-24T23:02:36.203Z" },
{ url = "https://files.pythonhosted.org/packages/1e/06/84ee67a68504836a52c48ec3b4b2b406cbd927c9b43aae89d82db8d097a0/granian-2.7.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:09b1c543ba30886dea515a156baf6d857bbb8b57dbfd8b012c578b93c80ef0c3", size = 6101239, upload-time = "2026-02-24T23:02:37.636Z" },
{ url = "https://files.pythonhosted.org/packages/ed/50/ece7dc8efe144542cd626b88b1475b649e2eaa3eb5f7541ca57390151b05/granian-2.7.2-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:6d334d4fbefb97001e78aa8067deafb107b867c102ba2120b4b2ec989fa58a89", size = 7079443, upload-time = "2026-02-24T23:02:39.651Z" },
{ url = "https://files.pythonhosted.org/packages/7e/e8/0f37b531d3cc96b8538cca2dc86eda92102e0ee345b30aa689354194a4cb/granian-2.7.2-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:8c86081d8c87989db69650e9d0e50ed925b8cd5dad21e0a86aa72d7a45f45925", size = 6428683, upload-time = "2026-02-24T23:02:41.827Z" },
{ url = "https://files.pythonhosted.org/packages/47/09/228626706554b389407270e2a6b19b7dee06d6890e8c01a39c6a785827fd/granian-2.7.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d9eda33dca2c8bc6471bb6e9e25863077bca3877a1bba4069cd5e0ee2de41765", size = 6959520, upload-time = "2026-02-24T23:02:43.488Z" },
{ url = "https://files.pythonhosted.org/packages/61/c0/a639ceabd59b8acae2d71b5c918fcb2d42f8ef98994eedcf9a8b6813731d/granian-2.7.2-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:9cf69aaff6f632074ffbe7c1ee214e50f64be36101b7cb8253eeec1d460f2dba", size = 6991548, upload-time = "2026-02-24T23:02:44.954Z" },
{ url = "https://files.pythonhosted.org/packages/b1/99/a35ed838a3095dcad02ae3944d19ebafe1d5a98cdc72bb61835fb5faf933/granian-2.7.2-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:f761a748cc7f3843b430422d2539da679daf5d3ef0259a101b90d5e55a0aafa7", size = 7121475, upload-time = "2026-02-24T23:02:46.991Z" },
{ url = "https://files.pythonhosted.org/packages/ce/24/3952c464432b904ec1cf537d2bd80d2dfde85524fa428ab9db2b5afe653c/granian-2.7.2-cp312-cp312-musllinux_1_1_armv7l.whl", hash = "sha256:41c7b8390b78647fe34662ed7296e1465dad4a5112af9b0ecf8e367083d6c76a", size = 7243647, upload-time = "2026-02-24T23:02:49.165Z" },
{ url = "https://files.pythonhosted.org/packages/c9/fa/ab39e39c6b78eab6b42cf5bb36f56badde2aaafc3807f03f781d00e7861a/granian-2.7.2-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:a052ed466da5922cb443435a95a0c751566943278a6f22cef3d2e19d4e7ecdea", size = 7048915, upload-time = "2026-02-24T23:02:50.773Z" },
{ url = "https://files.pythonhosted.org/packages/ab/bc/cf0bc29f583096a842cf0f26ae2fe40c72ed5286d4548be99ecfcdbb17e2/granian-2.7.2-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:76b840ff13dde8838fd33cd096f2e7cadf2c21a499a67f695f53de57deab6ff8", size = 6440868, upload-time = "2026-02-24T23:02:53.619Z" },
{ url = "https://files.pythonhosted.org/packages/2f/0d/bae1dcd2182ba5d9a5df33eb50b56dc5bbe67e31033d822e079aa8c1ff30/granian-2.7.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:00ccc8d7284bc7360f310179d0b4d17e5ca3077bbe24427e9e9310df397e3831", size = 6097336, upload-time = "2026-02-24T23:02:55.185Z" },
{ url = "https://files.pythonhosted.org/packages/65/7d/3e0a7f32b0ad5faa1d847c51191391552fa239821c95fc7c022688985df2/granian-2.7.2-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:675987c1b321dc8af593db8639e00c25277449b32e8c1b2ddd46b35f28d9fac4", size = 7098742, upload-time = "2026-02-24T23:02:57.898Z" },
{ url = "https://files.pythonhosted.org/packages/89/41/3b44386d636ac6467f0f13f45474c71fc3b90a4f0ba8b536de91b2845a09/granian-2.7.2-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:681c6fbe3354aaa6251e6191ec89f5174ac3b9fbc4b4db606fea456d01969fcb", size = 6430667, upload-time = "2026-02-24T23:02:59.789Z" },
{ url = "https://files.pythonhosted.org/packages/52/70/7b24e187aed3fb7ac2b29d2480a045559a509ef9fec54cffb8694a2d94af/granian-2.7.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8e5c9ae65af5e572dca27d8ca0da4c5180b08473ac47e6f5329699e9455a5cc3", size = 6948424, upload-time = "2026-02-24T23:03:01.406Z" },
{ url = "https://files.pythonhosted.org/packages/fa/4c/cb74c367f9efb874f2c8433fe9bf3e824f05cf719f2251d40e29e07f08c0/granian-2.7.2-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:e37fab2be919ceb195db00d7f49ec220444b1ecaa07c03f7c1c874cacff9de83", size = 7000407, upload-time = "2026-02-24T23:03:03.214Z" },
{ url = "https://files.pythonhosted.org/packages/58/98/dfed3966ed7fbd3aae56e123598f90dc206484092b8373d0a71e2d8b82a8/granian-2.7.2-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:8ec167ab30f5396b5caaff16820a39f4e91986d2fe5bdc02992a03c2b2b2b313", size = 7121626, upload-time = "2026-02-24T23:03:05.349Z" },
{ url = "https://files.pythonhosted.org/packages/39/82/acec732a345cd03b2f6e48ac04b66b7b8b61f5c50eb08d7421fc8c56591a/granian-2.7.2-cp313-cp313-musllinux_1_1_armv7l.whl", hash = "sha256:63f426d793f2116d23be265dd826bec1e623680baf94cc270fe08923113a86ba", size = 7253447, upload-time = "2026-02-24T23:03:06.986Z" },
{ url = "https://files.pythonhosted.org/packages/c5/2b/64779e69b08c1ff1bfc09a4ede904ab761ff63f936c275710886057c52f7/granian-2.7.2-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:1617cbb4efe3112f07fb6762cf81d2d9fe4bdb78971d1fd0a310f8b132f6a51e", size = 7053005, upload-time = "2026-02-24T23:03:09.021Z" },
{ url = "https://files.pythonhosted.org/packages/4c/49/9eb88875d709db7e7844e1c681546448dab5ff5651cd1c1d80ac4b1de4e3/granian-2.7.2-cp313-cp313t-macosx_10_12_x86_64.whl", hash = "sha256:016c5857c8baedeab7eb065f98417f5ea26bb72b0f7e0544fe76071efc5ab255", size = 6401748, upload-time = "2026-02-24T23:03:12.802Z" },
{ url = "https://files.pythonhosted.org/packages/e3/80/85726ad9999ed89cb6a32f7f57eb50ce7261459d9c30c3b194ae4c5aa2c5/granian-2.7.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:dcbe01fa141adf3f90964e86a959e250754aa7c6dad8fa7a855e6fd382de4c13", size = 6101265, upload-time = "2026-02-24T23:03:14.435Z" },
{ url = "https://files.pythonhosted.org/packages/07/82/0df56a42b9f4c327d0e0b052f43369127e1b565b9e66bf2c9488f1c8d759/granian-2.7.2-cp313-cp313t-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:283ba23817a685784b66f45423d2f25715fdc076c8ffb43c49a807ee56a0ffc0", size = 6249488, upload-time = "2026-02-24T23:03:16.387Z" },
{ url = "https://files.pythonhosted.org/packages/ef/cc/d83a351560a3d6377672636129c52f06f8393f5831c5ee0f06f274883ea6/granian-2.7.2-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:3258419c741897273ce155568b5a9cbacb7700a00516e87119a90f7d520d6783", size = 7104734, upload-time = "2026-02-24T23:03:17.993Z" },
{ url = "https://files.pythonhosted.org/packages/84/d1/539907ee96d0ee2bcceabb4a6a9643b75378d6dfea09b7a9e4fd22cdf977/granian-2.7.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a196125c4837491c139c9cc83541b48c408c92b9cfbbf004fd28717f9c02ad21", size = 6785504, upload-time = "2026-02-24T23:03:19.763Z" },
{ url = "https://files.pythonhosted.org/packages/86/bf/4b6f45882f8341e7c6cb824d693deb94c306be6525b483c76fb373d1e749/granian-2.7.2-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:746555ac8a2dcd9257bfe7ad58f1d7a60892bc4613df6a7d8f736692b3bb3b88", size = 6902790, upload-time = "2026-02-24T23:03:22.215Z" },
{ url = "https://files.pythonhosted.org/packages/44/b8/832970d2d4b144b87be39f5b9dfd31fdb17f298dc238a0b2100c95002cf8/granian-2.7.2-cp313-cp313t-musllinux_1_1_aarch64.whl", hash = "sha256:5ac1843c6084933a54a07d9dcae643365f1d83aaff3fd4f2676ea301185e4e8b", size = 7082682, upload-time = "2026-02-24T23:03:23.875Z" },
{ url = "https://files.pythonhosted.org/packages/38/bc/1521dbf026d1c9d2465cd54e016efd8ff6e1e72eff521071dab20dd61c44/granian-2.7.2-cp313-cp313t-musllinux_1_1_armv7l.whl", hash = "sha256:3612eb6a3f4351dd2c4df246ed0d21056c0556a6b1ed772dd865310aa55a9ba9", size = 7264742, upload-time = "2026-02-24T23:03:25.562Z" },
{ url = "https://files.pythonhosted.org/packages/19/ae/00884ab77045a2f54db90932f9d1ca522201e2a6b2cf2a9b38840db0fd54/granian-2.7.2-cp313-cp313t-musllinux_1_1_x86_64.whl", hash = "sha256:34708b145e31b4538e0556704a07454a76d6776c55c5bc3a1335e80ef6b3bae3", size = 7062571, upload-time = "2026-02-24T23:03:27.278Z" },
{ url = "https://files.pythonhosted.org/packages/69/4a/8ce622f4f7d58e035d121b9957dd5a8929028dc99cfc5d2bf7f2aa28912c/granian-2.7.2-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:592806c28c491f9c1d1501bac706ecf5e72b73969f20f912678d53308786d658", size = 6442041, upload-time = "2026-02-24T23:03:30.986Z" },
{ url = "https://files.pythonhosted.org/packages/27/62/7d36ed38a40a68c2856b6d2a6fedd40833e7f82eb90ba0d03f2d69ffadf5/granian-2.7.2-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:c9dcde3968b921654bde999468e97d03031f28668bc1fc145c81d8bedb0fb2a4", size = 6100793, upload-time = "2026-02-24T23:03:32.734Z" },
{ url = "https://files.pythonhosted.org/packages/b4/c5/17fea68f4cb280c217cbd65534664722c9c9b0138c2754e20c235d70b5f4/granian-2.7.2-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:6d4d78408283ec51f0fb00557856b4593947ad5b48287c04e1c22764a0ac28a5", size = 7119810, upload-time = "2026-02-24T23:03:34.807Z" },
{ url = "https://files.pythonhosted.org/packages/0a/76/35e240d107e0f158662652fd61191de4fb0c2c080e3786ca8f16c71547b7/granian-2.7.2-cp314-cp314-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:66d28b078e8087f794b83822055f95caf93d83b23f47f4efcd5e2f0f7a5d8a81", size = 6450789, upload-time = "2026-02-24T23:03:36.81Z" },
{ url = "https://files.pythonhosted.org/packages/4c/55/a6d08cfecc808149a910e51c57883ab26fad69d922dc2e76fb2d87469e2d/granian-2.7.2-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4ff7a93123ab339ba6cad51cc7141f8880ec47b152ce2491595bb08edda20106", size = 6902672, upload-time = "2026-02-24T23:03:38.655Z" },
{ url = "https://files.pythonhosted.org/packages/98/2e/c86d95f324248fcc5dcaf034c9f688b32f7a488f0b2a4a25e6673776107f/granian-2.7.2-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:a52effb9889f0944f0353afd6ce5a9d9aa83826d44bbf3c8013e978a3d6ef7b7", size = 6964399, upload-time = "2026-02-24T23:03:40.459Z" },
{ url = "https://files.pythonhosted.org/packages/37/4b/44fde33fe10245a3fba76bf843c387fad2d548244345115b9d87e1c40994/granian-2.7.2-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:76c987c3ca78bf7666ab053c3ed7e3af405af91b2e5ce2f1cf92634c1581e238", size = 7034929, upload-time = "2026-02-24T23:03:42.149Z" },
{ url = "https://files.pythonhosted.org/packages/90/76/38d205cb527046241a9ee4f51048bf44101c626ad4d2af16dd9d14dc1db6/granian-2.7.2-cp314-cp314-musllinux_1_1_armv7l.whl", hash = "sha256:6590f8092c2bb6614e561ba771f084cbf72ecbc38dbf9849762ac38718085c29", size = 7259609, upload-time = "2026-02-24T23:03:43.852Z" },
{ url = "https://files.pythonhosted.org/packages/00/37/04245c7259e65f1083ce193875c6c44da4c98604d3b00a264a74dd4f042b/granian-2.7.2-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:7c1ce9b0c9446b680e9545e7fc95a75f0c53a25dedcf924b1750c3e5ba5bf908", size = 7073161, upload-time = "2026-02-24T23:03:45.655Z" },
{ url = "https://files.pythonhosted.org/packages/cc/07/0e56fb4f178e14b4c1fa1f6f00586ca81761ccbe2d8803f2c12b6b17a7d6/granian-2.7.2-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:a698d9b662d5648c8ae3dc01ad01688e1a8afc3525e431e7cddb841c53e5e291", size = 6415279, upload-time = "2026-02-24T23:03:48.932Z" },
{ url = "https://files.pythonhosted.org/packages/27/bc/3e69305bf34806cd852f4683deec844a2cb9a4d8888d7f172b507f6080a8/granian-2.7.2-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:17516095b520b3c039ddbe41a6beb2c59d554b668cc229d36d82c93154a799af", size = 6090528, upload-time = "2026-02-24T23:03:50.52Z" },
{ url = "https://files.pythonhosted.org/packages/ec/10/7d58a922b44417a6207c0a3230b0841cd7385a36fc518ac15fed16ebf6f7/granian-2.7.2-cp314-cp314t-manylinux_2_12_i686.manylinux2010_i686.whl", hash = "sha256:96b0fd9eac60f939b3cbe44c8f32a42fdb7c1a1a9e07ca89e7795cdc7a606beb", size = 6252291, upload-time = "2026-02-24T23:03:52.248Z" },
{ url = "https://files.pythonhosted.org/packages/54/56/65776c6d759dcef9cce15bc11bdea2c64fe668088faf35d87916bd88f595/granian-2.7.2-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e50fb13e053384b8bd3823d4967606c6fd89f2b0d20e64de3ae212b85ffdfed2", size = 7106748, upload-time = "2026-02-24T23:03:53.994Z" },
{ url = "https://files.pythonhosted.org/packages/81/ee/d9ed836316607401f158ac264a3f770469d1b1edbf119402777a9eff1833/granian-2.7.2-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9bb1ef13125bc05ab2e18869ed311beaeb085a4c4c195d55d0865f5753a4c0b4", size = 6778883, upload-time = "2026-02-24T23:03:55.574Z" },
{ url = "https://files.pythonhosted.org/packages/a1/46/eabab80e07a14527c336dec6d902329399f3ba2b82dc94b6435651021359/granian-2.7.2-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:b1c77189335070c6ba6b8d158518fde4c50f892753620f0b22a7552ad4347143", size = 6903426, upload-time = "2026-02-24T23:03:57.296Z" },
{ url = "https://files.pythonhosted.org/packages/24/8a/8ce186826066f6d453316229383a5be3b0b8a4130146c21f321ee64fe2cb/granian-2.7.2-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:1777166c3c853eed4440adb3cbbf34bba2b77d595bfc143a5826904a80b22f34", size = 7083877, upload-time = "2026-02-24T23:03:59.425Z" },
{ url = "https://files.pythonhosted.org/packages/cf/eb/91ed4646ce1c920ad39db0bcddb6f4755e1823002b14fb026104e3eb8bce/granian-2.7.2-cp314-cp314t-musllinux_1_1_armv7l.whl", hash = "sha256:0ffac19208ae548f3647c849579b803beaed2b50dfb0f3790ad26daac0033484", size = 7267282, upload-time = "2026-02-24T23:04:01.218Z" },
{ url = "https://files.pythonhosted.org/packages/49/2f/58cba479254530ab09132e150e4ab55362f6e875d9e82b6790477843e0aa/granian-2.7.2-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:82f34e78c1297bf5a1b6a5097e30428db98b59fce60a7387977b794855c0c3bc", size = 7054941, upload-time = "2026-02-24T23:04:03.211Z" },
{ url = "https://files.pythonhosted.org/packages/59/71/f21b26c7dc7a8bc9d8288552c9c12128e73f1c3f04799b6e28a0a269b9b0/granian-2.7.2-pp311-pypy311_pp73-macosx_10_12_x86_64.whl", hash = "sha256:5613ee8c1233a79e56e1735e19c8c70af22a8c6b5808d7c1423dc5387bee4c05", size = 6504773, upload-time = "2026-02-24T23:04:06.498Z" },
{ url = "https://files.pythonhosted.org/packages/6e/68/282fbf5418f9348f657f505dc744cdca70ac850d39a805b21395211bf099/granian-2.7.2-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:0cd6fee79f585de2e1a90b6a311f62b3768c7cda649bc0e02908157ffa2553cc", size = 6138096, upload-time = "2026-02-24T23:04:09.138Z" },
{ url = "https://files.pythonhosted.org/packages/e7/e0/b578709020f84c07ad2ca88f77ac67fd2c62e6b16f93ff8c8d65b7d99296/granian-2.7.2-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e94c825f8b327114f7062d158c502a540ef5819f809e10158f0edddddaf41bb9", size = 6900043, upload-time = "2026-02-24T23:04:11.015Z" },
{ url = "https://files.pythonhosted.org/packages/c7/2f/a2671cc160f29ccf8e605eb8fa113c01051b0d7947048c5b29eb4e603384/granian-2.7.2-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:a6adea5fb8a537d18f3f2b848023151063bc45896415fdebfeb0bf0663d5a03b", size = 7040211, upload-time = "2026-02-24T23:04:13.31Z" },
{ url = "https://files.pythonhosted.org/packages/36/ce/df9bba3b211cda2d47535bb21bc040007e021e8c8adc20ce36619f903bc4/granian-2.7.2-pp311-pypy311_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:2392ab03cb92b1b2d4363f450b2d875177e10f0e22d67a4423052e6885e430f2", size = 7118085, upload-time = "2026-02-24T23:04:15.05Z" },
{ url = "https://files.pythonhosted.org/packages/a9/87/37124b2ee0cddce6ba438b0ff879ddae094ae2c92b24b28ffbe35110931f/granian-2.7.2-pp311-pypy311_pp73-musllinux_1_1_armv7l.whl", hash = "sha256:406c0bb1f5bf55c72cfbfdfd2ccec21299eb3f7b311d85c4889dde357fd36f33", size = 7314667, upload-time = "2026-02-24T23:04:16.783Z" },
{ url = "https://files.pythonhosted.org/packages/8c/ac/8b142ed352bc525e3c97440aab312928beebc735927b0cf979692bfcda3b/granian-2.7.2-pp311-pypy311_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:362a6001daa2ce62532a49df407fe545076052ef29289a76d5760064d820f48b", size = 7004934, upload-time = "2026-02-24T23:04:19.059Z" },
]
[package.optional-dependencies]
@@ -2722,10 +2784,11 @@ wheels = [
[[package]]
name = "ocrmypdf"
version = "16.13.0"
version = "17.3.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "deprecation", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "fpdf2", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "img2pdf", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "packaging", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "pdfminer-six", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -2733,11 +2796,14 @@ dependencies = [
{ name = "pikepdf", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "pillow", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "pluggy", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "pydantic", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "pypdfium2", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "rich", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "uharfbuzz", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/8c/52/be1aaece0703a736757d8957c0d4f19c37561054169b501eb0e7132f15e5/ocrmypdf-16.13.0.tar.gz", hash = "sha256:29d37e915234ce717374863a9cc5dd32d29e063dfe60c51380dda71254c88248", size = 7042247, upload-time = "2025-12-24T07:58:35.86Z" }
sdist = { url = "https://files.pythonhosted.org/packages/fa/fe/60bdc79529be1ad8b151d426ed2020d5ac90328c54e9ba92bd808e1535c1/ocrmypdf-17.3.0.tar.gz", hash = "sha256:4022f13aad3f405e330056a07aa8bd63714b48b414693831b56e2cf2c325f52d", size = 7378015, upload-time = "2026-02-21T09:30:07.207Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/41/b1/e2e7ad98de0d3ee05b44dbc3f78ccb158a620f3add82d00c85490120e7f2/ocrmypdf-16.13.0-py3-none-any.whl", hash = "sha256:fad8a6f7cc52cdc6225095c401a1766c778c47efe9f1e854ae4dc64a550a3d37", size = 165377, upload-time = "2025-12-24T07:58:33.925Z" },
{ url = "https://files.pythonhosted.org/packages/3d/b1/b7ae057a1bcb1495067ee3c4d48c1ce5fc66addd9492307c5a0ff799a7f2/ocrmypdf-17.3.0-py3-none-any.whl", hash = "sha256:c8882e7864954d3db6bcee49cc9f261b65bff66b7e5925eb68a1c281f41cad23", size = 488130, upload-time = "2026-02-21T09:30:05.236Z" },
]
[[package]]
@@ -2958,10 +3024,10 @@ requires-dist = [
{ name = "djangorestframework", specifier = "~=3.16" },
{ name = "djangorestframework-guardian", specifier = "~=0.4.0" },
{ name = "drf-spectacular", specifier = "~=0.28" },
{ name = "drf-spectacular-sidecar", specifier = "~=2026.1.1" },
{ name = "drf-spectacular-sidecar", specifier = "~=2026.3.1" },
{ name = "drf-writable-nested", specifier = "~=0.7.1" },
{ name = "faiss-cpu", specifier = ">=1.10" },
{ name = "filelock", specifier = "~=3.20.3" },
{ name = "filelock", specifier = "~=3.25.2" },
{ name = "flower", specifier = "~=2.0.1" },
{ name = "gotenberg-client", specifier = "~=0.13.1" },
{ name = "granian", extras = ["uvloop"], marker = "extra == 'webserver'", specifier = "~=2.7.0" },
@@ -2978,7 +3044,7 @@ requires-dist = [
{ name = "llama-index-vector-stores-faiss", specifier = ">=0.5.2" },
{ name = "mysqlclient", marker = "extra == 'mariadb'", specifier = "~=2.2.7" },
{ name = "nltk", specifier = "~=3.9.1" },
{ name = "ocrmypdf", specifier = "~=16.13.0" },
{ name = "ocrmypdf", specifier = "~=17.3.0" },
{ name = "openai", specifier = ">=1.76" },
{ name = "pathvalidate", specifier = "~=3.3.1" },
{ name = "pdf2image", specifier = "~=1.17.0" },
@@ -2995,7 +3061,7 @@ requires-dist = [
{ name = "rapidfuzz", specifier = "~=3.14.0" },
{ name = "redis", extras = ["hiredis"], specifier = "~=5.2.1" },
{ name = "regex", specifier = ">=2025.9.18" },
{ name = "scikit-learn", specifier = "~=1.7.0" },
{ name = "scikit-learn", specifier = "~=1.8.0" },
{ name = "sentence-transformers", specifier = ">=4.1" },
{ name = "setproctitle", specifier = "~=1.3.4" },
{ name = "tika-client", specifier = "~=0.10.0" },
@@ -3011,7 +3077,7 @@ provides-extras = ["mariadb", "postgres", "webserver"]
dev = [
{ name = "daphne" },
{ name = "factory-boy", specifier = "~=3.3.1" },
{ name = "faker", specifier = "~=40.5.1" },
{ name = "faker", specifier = "~=40.8.0" },
{ name = "imagehash" },
{ name = "prek", specifier = "~=0.3.0" },
{ name = "pytest", specifier = "~=9.0.0" },
@@ -3034,7 +3100,7 @@ lint = [
testing = [
{ name = "daphne" },
{ name = "factory-boy", specifier = "~=3.3.1" },
{ name = "faker", specifier = "~=40.5.1" },
{ name = "faker", specifier = "~=40.8.0" },
{ name = "imagehash" },
{ name = "pytest", specifier = "~=9.0.0" },
{ name = "pytest-cov", specifier = "~=7.0.0" },
@@ -3656,16 +3722,40 @@ wheels = [
]
[[package]]
name = "pyrefly"
version = "0.54.0"
name = "pypdfium2"
version = "5.6.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/81/44/c10b16a302fda90d0af1328f880b232761b510eab546616a7be2fdf35a57/pyrefly-0.54.0.tar.gz", hash = "sha256:c6663be64d492f0d2f2a411ada9f28a6792163d34133639378b7f3dd9a8dca94", size = 5098893, upload-time = "2026-02-23T15:44:35.111Z" }
sdist = { url = "https://files.pythonhosted.org/packages/3b/01/be763b9081c7eb823196e7d13d9c145bf75ac43f3c1466de81c21c24b381/pypdfium2-5.6.0.tar.gz", hash = "sha256:bcb9368acfe3547054698abbdae68ba0cbd2d3bda8e8ee437e061deef061976d", size = 270714, upload-time = "2026-03-08T01:05:06.5Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/5f/99/8fdcdb4e55f0227fdd9f6abce36b619bab1ecb0662b83b66adc8cba3c788/pyrefly-0.54.0-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:58a3f092b6dc25ef79b2dc6c69a40f36784ca157c312bfc0baea463926a9db6d", size = 12223973, upload-time = "2026-02-23T15:44:14.278Z" },
{ url = "https://files.pythonhosted.org/packages/90/35/c2aaf87a76003ad27b286594d2e5178f811eaa15bfe3d98dba2b47d56dd1/pyrefly-0.54.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:615081414106dd95873bc39c3a4bed68754c6cc24a8177ac51d22f88f88d3eb3", size = 11785585, upload-time = "2026-02-23T15:44:17.468Z" },
{ url = "https://files.pythonhosted.org/packages/c4/4a/ced02691ed67e5a897714979196f08ad279ec7ec7f63c45e00a75a7f3c0e/pyrefly-0.54.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0cbcaf20f5fe585079079a95205c1f3cd4542d17228cdf1df560288880623b70", size = 33381977, upload-time = "2026-02-23T15:44:19.736Z" },
{ url = "https://files.pythonhosted.org/packages/0b/ce/72a117ed437c8f6950862181014b41e36f3c3997580e29b772b71e78d587/pyrefly-0.54.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:66d5da116c0d34acfbd66663addd3ca8aa78a636f6692a66e078126d3620a883", size = 35962821, upload-time = "2026-02-23T15:44:22.357Z" },
{ url = "https://files.pythonhosted.org/packages/85/de/89013f5ae0a35d2b6b01274a92a35ee91431ea001050edf0a16748d39875/pyrefly-0.54.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:6ef3ac27f1a4baaf67aead64287d3163350844794aca6315ad1a9650b16ec26a", size = 38496689, upload-time = "2026-02-23T15:44:25.236Z" },
{ url = "https://files.pythonhosted.org/packages/6e/f6/9f9e190fe0e5a6b86b82f83bd8b5d3490348766062381140ca5cad8e00b1/pypdfium2-5.6.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:e468c38997573f0e86f03273c2c1fbdea999de52ba43fee96acaa2f6b2ad35f7", size = 3412541, upload-time = "2026-03-08T01:04:25.45Z" },
{ url = "https://files.pythonhosted.org/packages/ee/8d/e57492cb2228ba56ed57de1ff044c8ac114b46905f8b1445c33299ba0488/pypdfium2-5.6.0-py3-none-macosx_11_0_x86_64.whl", hash = "sha256:ad3abddc5805424f962e383253ccad6a0d1d2ebd86afa9a9e1b9ca659773cd0d", size = 3592320, upload-time = "2026-03-08T01:04:27.509Z" },
{ url = "https://files.pythonhosted.org/packages/f9/8a/8ab82e33e9c551494cbe1526ea250ca8cc4e9e98d6a4fc6b6f8d959aa1d1/pypdfium2-5.6.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f6b5eb9eae5c45076395454522ca26add72ba8bd1fe473e1e4721aa58521470c", size = 3596450, upload-time = "2026-03-08T01:04:29.183Z" },
{ url = "https://files.pythonhosted.org/packages/f5/b5/602a792282312ccb158cc63849528079d94b0a11efdc61f2a359edfb41e9/pypdfium2-5.6.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:258624da8ef45cdc426e11b33e9d83f9fb723c1c201c6e0f4ab5a85966c6b876", size = 3325442, upload-time = "2026-03-08T01:04:30.886Z" },
{ url = "https://files.pythonhosted.org/packages/81/1f/9e48ec05ed8d19d736c2d1f23c1bd0f20673f02ef846a2576c69e237f15d/pypdfium2-5.6.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:e9367451c8a00931d6612db0822525a18c06f649d562cd323a719e46ac19c9bb", size = 3727434, upload-time = "2026-03-08T01:04:33.619Z" },
{ url = "https://files.pythonhosted.org/packages/33/90/0efd020928b4edbd65f4f3c2af0c84e20b43a3ada8fa6d04f999a97afe7a/pypdfium2-5.6.0-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:a757869f891eac1cc1372e38a4aa01adac8abc8fe2a8a4e2ebf50595e3bf5937", size = 4139029, upload-time = "2026-03-08T01:04:36.08Z" },
{ url = "https://files.pythonhosted.org/packages/ff/49/a640b288a48dab1752281dd9b72c0679fccea107874e80a65a606b00efa9/pypdfium2-5.6.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:515be355222cc57ae9e62cd5c7c350b8e0c863efc539f80c7d75e2811ba45cb6", size = 3646387, upload-time = "2026-03-08T01:04:38.151Z" },
{ url = "https://files.pythonhosted.org/packages/b0/3b/a344c19c01021eeb5d830c102e4fc9b1602f19c04aa7d11abbe2d188fd8e/pypdfium2-5.6.0-py3-none-manylinux_2_27_s390x.manylinux_2_28_s390x.whl", hash = "sha256:d1c4753c7caf7d004211d7f57a21f10d127f5e0e5510a14d24bc073e7220a3ea", size = 3097212, upload-time = "2026-03-08T01:04:40.776Z" },
{ url = "https://files.pythonhosted.org/packages/50/96/e48e13789ace22aeb9b7510904a1b1493ec588196e11bbacc122da330b3d/pypdfium2-5.6.0-py3-none-manylinux_2_38_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:c49729090281fdd85775fb8912c10bd19e99178efaa98f145ab06e7ce68554d2", size = 2965026, upload-time = "2026-03-08T01:04:42.857Z" },
{ url = "https://files.pythonhosted.org/packages/cb/06/3100e44d4935f73af8f5d633d3bd40f0d36d606027085a0ef1f0566a6320/pypdfium2-5.6.0-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:a4a1749a8d4afd62924a8d95cfa4f2e26fc32957ce34ac3b674be6f127ed252e", size = 4131431, upload-time = "2026-03-08T01:04:44.982Z" },
{ url = "https://files.pythonhosted.org/packages/64/ef/d8df63569ce9a66c8496057782eb8af78e0d28667922d62ec958434e3d4b/pypdfium2-5.6.0-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:36469ebd0fdffb7130ce45ed9c44f8232d91571c89eb851bd1633c64b6f6114f", size = 3747469, upload-time = "2026-03-08T01:04:46.702Z" },
{ url = "https://files.pythonhosted.org/packages/a6/47/fd2c6a67a49fade1acd719fbd11f7c375e7219912923ef2de0ea0ac1544e/pypdfium2-5.6.0-py3-none-musllinux_1_2_i686.whl", hash = "sha256:9da900df09be3cf546b637a127a7b6428fb22d705951d731269e25fd3adef457", size = 4337578, upload-time = "2026-03-08T01:04:49.007Z" },
{ url = "https://files.pythonhosted.org/packages/6b/f5/836c83e54b01e09478c4d6bf4912651d6053c932250fcee953f5c72d8e4a/pypdfium2-5.6.0-py3-none-musllinux_1_2_ppc64le.whl", hash = "sha256:45fccd5622233c5ec91a885770ae7dd4004d4320ac05a4ad8fa03a66dea40244", size = 4376104, upload-time = "2026-03-08T01:04:51.04Z" },
{ url = "https://files.pythonhosted.org/packages/6e/7f/b940b6a1664daf8f9bad87c6c99b84effa3611615b8708d10392dc33036c/pypdfium2-5.6.0-py3-none-musllinux_1_2_riscv64.whl", hash = "sha256:282dc030e767cd61bd0299f9d581052b91188e2b87561489057a8e7963e7e0cb", size = 3929824, upload-time = "2026-03-08T01:04:53.544Z" },
{ url = "https://files.pythonhosted.org/packages/88/79/00267d92a6a58c229e364d474f5698efe446e0c7f4f152f58d0138715e99/pypdfium2-5.6.0-py3-none-musllinux_1_2_s390x.whl", hash = "sha256:a1c1dfe950382c76a7bba1ba160ec5e40df8dd26b04a1124ae268fda55bc4cbe", size = 4270201, upload-time = "2026-03-08T01:04:55.81Z" },
{ url = "https://files.pythonhosted.org/packages/e1/ab/b127f38aba41746bdf9ace15ba08411d7ef6ecba1326d529ba414eb1ed50/pypdfium2-5.6.0-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:43b0341ca6feb6c92e4b7a9eb4813e5466f5f5e8b6baeb14df0a94d5f312c00b", size = 4180793, upload-time = "2026-03-08T01:04:57.961Z" },
]
[[package]]
name = "pyrefly"
version = "0.55.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/bf/c4/76e0797215e62d007f81f86c9c4fb5d6202685a3f5e70810f3fd94294f92/pyrefly-0.55.0.tar.gz", hash = "sha256:434c3282532dd4525c4840f2040ed0eb79b0ec8224fe18d957956b15471f2441", size = 5135682, upload-time = "2026-03-03T00:46:38.122Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/39/b0/16e50cf716784513648e23e726a24f71f9544aa4f86103032dcaa5ff71a2/pyrefly-0.55.0-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:49aafcefe5e2dd4256147db93e5b0ada42bff7d9a60db70e03d1f7055338eec9", size = 12210073, upload-time = "2026-03-03T00:46:15.51Z" },
{ url = "https://files.pythonhosted.org/packages/3a/ad/89500c01bac3083383011600370289fbc67700c5be46e781787392628a3a/pyrefly-0.55.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:2827426e6b28397c13badb93c0ede0fb0f48046a7a89e3d774cda04e8e2067cd", size = 11767474, upload-time = "2026-03-03T00:46:18.003Z" },
{ url = "https://files.pythonhosted.org/packages/78/68/4c66b260f817f304ead11176ff13985625f7c269e653304b4bdb546551af/pyrefly-0.55.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7346b2d64dc575bd61aa3bca854fbf8b5a19a471cbdb45e0ca1e09861b63488c", size = 33260395, upload-time = "2026-03-03T00:46:20.509Z" },
{ url = "https://files.pythonhosted.org/packages/47/09/10bd48c9f860064f29f412954126a827d60f6451512224912c265e26bbe6/pyrefly-0.55.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:233b861b4cff008b1aff62f4f941577ed752e4d0060834229eb9b6826e6973c9", size = 35848269, upload-time = "2026-03-03T00:46:23.418Z" },
{ url = "https://files.pythonhosted.org/packages/a9/39/bc65cdd5243eb2dfea25dd1321f9a5a93e8d9c3a308501c4c6c05d011585/pyrefly-0.55.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f5aa85657d76da1d25d081a49f0e33c8fc3ec91c1a0f185a8ed393a5a3d9e178", size = 38449820, upload-time = "2026-03-03T00:46:26.309Z" },
]
[[package]]
@@ -4283,7 +4373,7 @@ wheels = [
[[package]]
name = "scikit-learn"
version = "1.7.2"
version = "1.8.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "joblib", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -4291,28 +4381,32 @@ dependencies = [
{ name = "scipy", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "threadpoolctl", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/98/c2/a7855e41c9d285dfe86dc50b250978105dce513d6e459ea66a6aeb0e1e0c/scikit_learn-1.7.2.tar.gz", hash = "sha256:20e9e49ecd130598f1ca38a1d85090e1a600147b9c02fa6f15d69cb53d968fda", size = 7193136, upload-time = "2025-09-09T08:21:29.075Z" }
sdist = { url = "https://files.pythonhosted.org/packages/0e/d4/40988bf3b8e34feec1d0e6a051446b1f66225f8529b9309becaeef62b6c4/scikit_learn-1.8.0.tar.gz", hash = "sha256:9bccbb3b40e3de10351f8f5068e105d0f4083b1a65fa07b6634fbc401a6287fd", size = 7335585, upload-time = "2025-12-10T07:08:53.618Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/43/83/564e141eef908a5863a54da8ca342a137f45a0bfb71d1d79704c9894c9d1/scikit_learn-1.7.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:c7509693451651cd7361d30ce4e86a1347493554f172b1c72a39300fa2aea79e", size = 9331967, upload-time = "2025-09-09T08:20:32.421Z" },
{ url = "https://files.pythonhosted.org/packages/18/d6/ba863a4171ac9d7314c4d3fc251f015704a2caeee41ced89f321c049ed83/scikit_learn-1.7.2-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:0486c8f827c2e7b64837c731c8feff72c0bd2b998067a8a9cbc10643c31f0fe1", size = 8648645, upload-time = "2025-09-09T08:20:34.436Z" },
{ url = "https://files.pythonhosted.org/packages/ef/0e/97dbca66347b8cf0ea8b529e6bb9367e337ba2e8be0ef5c1a545232abfde/scikit_learn-1.7.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:89877e19a80c7b11a2891a27c21c4894fb18e2c2e077815bcade10d34287b20d", size = 9715424, upload-time = "2025-09-09T08:20:36.776Z" },
{ url = "https://files.pythonhosted.org/packages/f7/32/1f3b22e3207e1d2c883a7e09abb956362e7d1bd2f14458c7de258a26ac15/scikit_learn-1.7.2-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8da8bf89d4d79aaec192d2bda62f9b56ae4e5b4ef93b6a56b5de4977e375c1f1", size = 9509234, upload-time = "2025-09-09T08:20:38.957Z" },
{ url = "https://files.pythonhosted.org/packages/a7/aa/3996e2196075689afb9fce0410ebdb4a09099d7964d061d7213700204409/scikit_learn-1.7.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:8d91a97fa2b706943822398ab943cde71858a50245e31bc71dba62aab1d60a96", size = 9259818, upload-time = "2025-09-09T08:20:43.19Z" },
{ url = "https://files.pythonhosted.org/packages/43/5d/779320063e88af9c4a7c2cf463ff11c21ac9c8bd730c4a294b0000b666c9/scikit_learn-1.7.2-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:acbc0f5fd2edd3432a22c69bed78e837c70cf896cd7993d71d51ba6708507476", size = 8636997, upload-time = "2025-09-09T08:20:45.468Z" },
{ url = "https://files.pythonhosted.org/packages/5c/d0/0c577d9325b05594fdd33aa970bf53fb673f051a45496842caee13cfd7fe/scikit_learn-1.7.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:e5bf3d930aee75a65478df91ac1225ff89cd28e9ac7bd1196853a9229b6adb0b", size = 9478381, upload-time = "2025-09-09T08:20:47.982Z" },
{ url = "https://files.pythonhosted.org/packages/82/70/8bf44b933837ba8494ca0fc9a9ab60f1c13b062ad0197f60a56e2fc4c43e/scikit_learn-1.7.2-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b4d6e9deed1a47aca9fe2f267ab8e8fe82ee20b4526b2c0cd9e135cea10feb44", size = 9300296, upload-time = "2025-09-09T08:20:50.366Z" },
{ url = "https://files.pythonhosted.org/packages/ae/93/a3038cb0293037fd335f77f31fe053b89c72f17b1c8908c576c29d953e84/scikit_learn-1.7.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:0b7dacaa05e5d76759fb071558a8b5130f4845166d88654a0f9bdf3eb57851b7", size = 9212382, upload-time = "2025-09-09T08:20:54.731Z" },
{ url = "https://files.pythonhosted.org/packages/40/dd/9a88879b0c1104259136146e4742026b52df8540c39fec21a6383f8292c7/scikit_learn-1.7.2-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:abebbd61ad9e1deed54cca45caea8ad5f79e1b93173dece40bb8e0c658dbe6fe", size = 8592042, upload-time = "2025-09-09T08:20:57.313Z" },
{ url = "https://files.pythonhosted.org/packages/46/af/c5e286471b7d10871b811b72ae794ac5fe2989c0a2df07f0ec723030f5f5/scikit_learn-1.7.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:502c18e39849c0ea1a5d681af1dbcf15f6cce601aebb657aabbfe84133c1907f", size = 9434180, upload-time = "2025-09-09T08:20:59.671Z" },
{ url = "https://files.pythonhosted.org/packages/f1/fd/df59faa53312d585023b2da27e866524ffb8faf87a68516c23896c718320/scikit_learn-1.7.2-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7a4c328a71785382fe3fe676a9ecf2c86189249beff90bf85e22bdb7efaf9ae0", size = 9283660, upload-time = "2025-09-09T08:21:01.71Z" },
{ url = "https://files.pythonhosted.org/packages/55/87/ef5eb1f267084532c8e4aef98a28b6ffe7425acbfd64b5e2f2e066bc29b3/scikit_learn-1.7.2-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:9acb6c5e867447b4e1390930e3944a005e2cb115922e693c08a323421a6966e8", size = 9558731, upload-time = "2025-09-09T08:21:06.381Z" },
{ url = "https://files.pythonhosted.org/packages/93/f8/6c1e3fc14b10118068d7938878a9f3f4e6d7b74a8ddb1e5bed65159ccda8/scikit_learn-1.7.2-cp313-cp313t-macosx_12_0_arm64.whl", hash = "sha256:2a41e2a0ef45063e654152ec9d8bcfc39f7afce35b08902bfe290c2498a67a6a", size = 9038852, upload-time = "2025-09-09T08:21:08.628Z" },
{ url = "https://files.pythonhosted.org/packages/83/87/066cafc896ee540c34becf95d30375fe5cbe93c3b75a0ee9aa852cd60021/scikit_learn-1.7.2-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:98335fb98509b73385b3ab2bd0639b1f610541d3988ee675c670371d6a87aa7c", size = 9527094, upload-time = "2025-09-09T08:21:11.486Z" },
{ url = "https://files.pythonhosted.org/packages/9c/2b/4903e1ccafa1f6453b1ab78413938c8800633988c838aa0be386cbb33072/scikit_learn-1.7.2-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:191e5550980d45449126e23ed1d5e9e24b2c68329ee1f691a3987476e115e09c", size = 9367436, upload-time = "2025-09-09T08:21:13.602Z" },
{ url = "https://files.pythonhosted.org/packages/d9/82/dee5acf66837852e8e68df6d8d3a6cb22d3df997b733b032f513d95205b7/scikit_learn-1.7.2-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:fa8f63940e29c82d1e67a45d5297bdebbcb585f5a5a50c4914cc2e852ab77f33", size = 9208906, upload-time = "2025-09-09T08:21:18.557Z" },
{ url = "https://files.pythonhosted.org/packages/3c/30/9029e54e17b87cb7d50d51a5926429c683d5b4c1732f0507a6c3bed9bf65/scikit_learn-1.7.2-cp314-cp314-macosx_12_0_arm64.whl", hash = "sha256:f95dc55b7902b91331fa4e5845dd5bde0580c9cd9612b1b2791b7e80c3d32615", size = 8627836, upload-time = "2025-09-09T08:21:20.695Z" },
{ url = "https://files.pythonhosted.org/packages/60/18/4a52c635c71b536879f4b971c2cedf32c35ee78f48367885ed8025d1f7ee/scikit_learn-1.7.2-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:9656e4a53e54578ad10a434dc1f993330568cfee176dff07112b8785fb413106", size = 9426236, upload-time = "2025-09-09T08:21:22.645Z" },
{ url = "https://files.pythonhosted.org/packages/99/7e/290362f6ab582128c53445458a5befd471ed1ea37953d5bcf80604619250/scikit_learn-1.7.2-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:96dc05a854add0e50d3f47a1ef21a10a595016da5b007c7d9cd9d0bffd1fcc61", size = 9312593, upload-time = "2025-09-09T08:21:24.65Z" },
{ url = "https://files.pythonhosted.org/packages/c9/92/53ea2181da8ac6bf27170191028aee7251f8f841f8d3edbfdcaf2008fde9/scikit_learn-1.8.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:146b4d36f800c013d267b29168813f7a03a43ecd2895d04861f1240b564421da", size = 8595835, upload-time = "2025-12-10T07:07:39.385Z" },
{ url = "https://files.pythonhosted.org/packages/01/18/d154dc1638803adf987910cdd07097d9c526663a55666a97c124d09fb96a/scikit_learn-1.8.0-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:f984ca4b14914e6b4094c5d52a32ea16b49832c03bd17a110f004db3c223e8e1", size = 8080381, upload-time = "2025-12-10T07:07:41.93Z" },
{ url = "https://files.pythonhosted.org/packages/8a/44/226142fcb7b7101e64fdee5f49dbe6288d4c7af8abf593237b70fca080a4/scikit_learn-1.8.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5e30adb87f0cc81c7690a84f7932dd66be5bac57cfe16b91cb9151683a4a2d3b", size = 8799632, upload-time = "2025-12-10T07:07:43.899Z" },
{ url = "https://files.pythonhosted.org/packages/36/4d/4a67f30778a45d542bbea5db2dbfa1e9e100bf9ba64aefe34215ba9f11f6/scikit_learn-1.8.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ada8121bcb4dac28d930febc791a69f7cb1673c8495e5eee274190b73a4559c1", size = 9103788, upload-time = "2025-12-10T07:07:45.982Z" },
{ url = "https://files.pythonhosted.org/packages/90/74/e6a7cc4b820e95cc38cf36cd74d5aa2b42e8ffc2d21fe5a9a9c45c1c7630/scikit_learn-1.8.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:5fb63362b5a7ddab88e52b6dbb47dac3fd7dafeee740dc6c8d8a446ddedade8e", size = 8548242, upload-time = "2025-12-10T07:07:51.568Z" },
{ url = "https://files.pythonhosted.org/packages/49/d8/9be608c6024d021041c7f0b3928d4749a706f4e2c3832bbede4fb4f58c95/scikit_learn-1.8.0-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:5025ce924beccb28298246e589c691fe1b8c1c96507e6d27d12c5fadd85bfd76", size = 8079075, upload-time = "2025-12-10T07:07:53.697Z" },
{ url = "https://files.pythonhosted.org/packages/dd/47/f187b4636ff80cc63f21cd40b7b2d177134acaa10f6bb73746130ee8c2e5/scikit_learn-1.8.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4496bb2cf7a43ce1a2d7524a79e40bc5da45cf598dbf9545b7e8316ccba47bb4", size = 8660492, upload-time = "2025-12-10T07:07:55.574Z" },
{ url = "https://files.pythonhosted.org/packages/97/74/b7a304feb2b49df9fafa9382d4d09061a96ee9a9449a7cbea7988dda0828/scikit_learn-1.8.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a0bcfe4d0d14aec44921545fd2af2338c7471de9cb701f1da4c9d85906ab847a", size = 8931904, upload-time = "2025-12-10T07:07:57.666Z" },
{ url = "https://files.pythonhosted.org/packages/03/aa/e22e0768512ce9255eba34775be2e85c2048da73da1193e841707f8f039c/scikit_learn-1.8.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:0d6ae97234d5d7079dc0040990a6f7aeb97cb7fa7e8945f1999a429b23569e0a", size = 8513770, upload-time = "2025-12-10T07:08:03.251Z" },
{ url = "https://files.pythonhosted.org/packages/58/37/31b83b2594105f61a381fc74ca19e8780ee923be2d496fcd8d2e1147bd99/scikit_learn-1.8.0-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:edec98c5e7c128328124a029bceb09eda2d526997780fef8d65e9a69eead963e", size = 8044458, upload-time = "2025-12-10T07:08:05.336Z" },
{ url = "https://files.pythonhosted.org/packages/2d/5a/3f1caed8765f33eabb723596666da4ebbf43d11e96550fb18bdec42b467b/scikit_learn-1.8.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:74b66d8689d52ed04c271e1329f0c61635bcaf5b926db9b12d58914cdc01fe57", size = 8610341, upload-time = "2025-12-10T07:08:07.732Z" },
{ url = "https://files.pythonhosted.org/packages/38/cf/06896db3f71c75902a8e9943b444a56e727418f6b4b4a90c98c934f51ed4/scikit_learn-1.8.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8fdf95767f989b0cfedb85f7ed8ca215d4be728031f56ff5a519ee1e3276dc2e", size = 8900022, upload-time = "2025-12-10T07:08:09.862Z" },
{ url = "https://files.pythonhosted.org/packages/d2/7d/a630359fc9dcc95496588c8d8e3245cc8fd81980251079bc09c70d41d951/scikit_learn-1.8.0-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:7cc267b6108f0a1499a734167282c00c4ebf61328566b55ef262d48e9849c735", size = 8826045, upload-time = "2025-12-10T07:08:15.215Z" },
{ url = "https://files.pythonhosted.org/packages/cc/56/a0c86f6930cfcd1c7054a2bc417e26960bb88d32444fe7f71d5c2cfae891/scikit_learn-1.8.0-cp313-cp313t-macosx_12_0_arm64.whl", hash = "sha256:fe1c011a640a9f0791146011dfd3c7d9669785f9fed2b2a5f9e207536cf5c2fd", size = 8420324, upload-time = "2025-12-10T07:08:17.561Z" },
{ url = "https://files.pythonhosted.org/packages/46/1e/05962ea1cebc1cf3876667ecb14c283ef755bf409993c5946ade3b77e303/scikit_learn-1.8.0-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:72358cce49465d140cc4e7792015bb1f0296a9742d5622c67e31399b75468b9e", size = 8680651, upload-time = "2025-12-10T07:08:19.952Z" },
{ url = "https://files.pythonhosted.org/packages/fe/56/a85473cd75f200c9759e3a5f0bcab2d116c92a8a02ee08ccd73b870f8bb4/scikit_learn-1.8.0-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:80832434a6cc114f5219211eec13dcbc16c2bac0e31ef64c6d346cde3cf054cb", size = 8925045, upload-time = "2025-12-10T07:08:22.11Z" },
{ url = "https://files.pythonhosted.org/packages/24/05/1af2c186174cc92dcab2233f327336058c077d38f6fe2aceb08e6ab4d509/scikit_learn-1.8.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:c22a2da7a198c28dd1a6e1136f19c830beab7fdca5b3e5c8bba8394f8a5c45b3", size = 8528667, upload-time = "2025-12-10T07:08:27.541Z" },
{ url = "https://files.pythonhosted.org/packages/a8/25/01c0af38fe969473fb292bba9dc2b8f9b451f3112ff242c647fee3d0dfe7/scikit_learn-1.8.0-cp314-cp314-macosx_12_0_arm64.whl", hash = "sha256:6b595b07a03069a2b1740dc08c2299993850ea81cce4fe19b2421e0c970de6b7", size = 8066524, upload-time = "2025-12-10T07:08:29.822Z" },
{ url = "https://files.pythonhosted.org/packages/be/ce/a0623350aa0b68647333940ee46fe45086c6060ec604874e38e9ab7d8e6c/scikit_learn-1.8.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:29ffc74089f3d5e87dfca4c2c8450f88bdc61b0fc6ed5d267f3988f19a1309f6", size = 8657133, upload-time = "2025-12-10T07:08:31.865Z" },
{ url = "https://files.pythonhosted.org/packages/b8/cb/861b41341d6f1245e6ca80b1c1a8c4dfce43255b03df034429089ca2a2c5/scikit_learn-1.8.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:fb65db5d7531bccf3a4f6bec3462223bea71384e2cda41da0f10b7c292b9e7c4", size = 8923223, upload-time = "2025-12-10T07:08:34.166Z" },
{ url = "https://files.pythonhosted.org/packages/2d/d1/ef294ca754826daa043b2a104e59960abfab4cf653891037d19dd5b6f3cf/scikit_learn-1.8.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:4511be56637e46c25721e83d1a9cea9614e7badc7040c4d573d75fbe257d6fd7", size = 8848305, upload-time = "2025-12-10T07:08:41.013Z" },
{ url = "https://files.pythonhosted.org/packages/5b/e2/b1f8b05138ee813b8e1a4149f2f0d289547e60851fd1bb268886915adbda/scikit_learn-1.8.0-cp314-cp314t-macosx_12_0_arm64.whl", hash = "sha256:a69525355a641bf8ef136a7fa447672fb54fe8d60cab5538d9eb7c6438543fb9", size = 8432257, upload-time = "2025-12-10T07:08:42.873Z" },
{ url = "https://files.pythonhosted.org/packages/26/11/c32b2138a85dcb0c99f6afd13a70a951bfdff8a6ab42d8160522542fb647/scikit_learn-1.8.0-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c2656924ec73e5939c76ac4c8b026fc203b83d8900362eb2599d8aee80e4880f", size = 8678673, upload-time = "2025-12-10T07:08:45.362Z" },
{ url = "https://files.pythonhosted.org/packages/c7/57/51f2384575bdec454f4fe4e7a919d696c9ebce914590abf3e52d47607ab8/scikit_learn-1.8.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:15fc3b5d19cc2be65404786857f2e13c70c83dd4782676dd6814e3b89dc8f5b9", size = 8922467, upload-time = "2025-12-10T07:08:47.408Z" },
]
[[package]]
@@ -4809,18 +4903,16 @@ wheels = [
[[package]]
name = "tornado"
version = "6.5.4"
version = "6.5.5"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/37/1d/0a336abf618272d53f62ebe274f712e213f5a03c0b2339575430b8362ef2/tornado-6.5.4.tar.gz", hash = "sha256:a22fa9047405d03260b483980635f0b041989d8bcc9a313f8fe18b411d84b1d7", size = 513632, upload-time = "2025-12-15T19:21:03.836Z" }
sdist = { url = "https://files.pythonhosted.org/packages/f8/f1/3173dfa4a18db4a9b03e5d55325559dab51ee653763bb8745a75af491286/tornado-6.5.5.tar.gz", hash = "sha256:192b8f3ea91bd7f1f50c06955416ed76c6b72f96779b962f07f911b91e8d30e9", size = 516006, upload-time = "2026-03-10T21:31:02.067Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/ab/a9/e94a9d5224107d7ce3cc1fab8d5dc97f5ea351ccc6322ee4fb661da94e35/tornado-6.5.4-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:d6241c1a16b1c9e4cc28148b1cda97dd1c6cb4fb7068ac1bedc610768dff0ba9", size = 443909, upload-time = "2025-12-15T19:20:48.382Z" },
{ url = "https://files.pythonhosted.org/packages/db/7e/f7b8d8c4453f305a51f80dbb49014257bb7d28ccb4bbb8dd328ea995ecad/tornado-6.5.4-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:2d50f63dda1d2cac3ae1fa23d254e16b5e38153758470e9956cbc3d813d40843", size = 442163, upload-time = "2025-12-15T19:20:49.791Z" },
{ url = "https://files.pythonhosted.org/packages/ba/b5/206f82d51e1bfa940ba366a8d2f83904b15942c45a78dd978b599870ab44/tornado-6.5.4-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d1cf66105dc6acb5af613c054955b8137e34a03698aa53272dbda4afe252be17", size = 445746, upload-time = "2025-12-15T19:20:51.491Z" },
{ url = "https://files.pythonhosted.org/packages/8e/9d/1a3338e0bd30ada6ad4356c13a0a6c35fbc859063fa7eddb309183364ac1/tornado-6.5.4-cp39-abi3-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:50ff0a58b0dc97939d29da29cd624da010e7f804746621c78d14b80238669335", size = 445083, upload-time = "2025-12-15T19:20:52.778Z" },
{ url = "https://files.pythonhosted.org/packages/50/d4/e51d52047e7eb9a582da59f32125d17c0482d065afd5d3bc435ff2120dc5/tornado-6.5.4-cp39-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e5fb5e04efa54cf0baabdd10061eb4148e0be137166146fff835745f59ab9f7f", size = 445315, upload-time = "2025-12-15T19:20:53.996Z" },
{ url = "https://files.pythonhosted.org/packages/27/07/2273972f69ca63dbc139694a3fc4684edec3ea3f9efabf77ed32483b875c/tornado-6.5.4-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:9c86b1643b33a4cd415f8d0fe53045f913bf07b4a3ef646b735a6a86047dda84", size = 446003, upload-time = "2025-12-15T19:20:56.101Z" },
{ url = "https://files.pythonhosted.org/packages/d1/83/41c52e47502bf7260044413b6770d1a48dda2f0246f95ee1384a3cd9c44a/tornado-6.5.4-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:6eb82872335a53dd063a4f10917b3efd28270b56a33db69009606a0312660a6f", size = 445412, upload-time = "2025-12-15T19:20:57.398Z" },
{ url = "https://files.pythonhosted.org/packages/10/c7/bc96917f06cbee182d44735d4ecde9c432e25b84f4c2086143013e7b9e52/tornado-6.5.4-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:6076d5dda368c9328ff41ab5d9dd3608e695e8225d1cd0fd1e006f05da3635a8", size = 445392, upload-time = "2025-12-15T19:20:58.692Z" },
{ url = "https://files.pythonhosted.org/packages/59/8c/77f5097695f4dd8255ecbd08b2a1ed8ba8b953d337804dd7080f199e12bf/tornado-6.5.5-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:487dc9cc380e29f58c7ab88f9e27cdeef04b2140862e5076a66fb6bb68bb1bfa", size = 445983, upload-time = "2026-03-10T21:30:44.28Z" },
{ url = "https://files.pythonhosted.org/packages/ab/5e/7625b76cd10f98f1516c36ce0346de62061156352353ef2da44e5c21523c/tornado-6.5.5-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:65a7f1d46d4bb41df1ac99f5fcb685fb25c7e61613742d5108b010975a9a6521", size = 444246, upload-time = "2026-03-10T21:30:46.571Z" },
{ url = "https://files.pythonhosted.org/packages/b2/04/7b5705d5b3c0fab088f434f9c83edac1573830ca49ccf29fb83bf7178eec/tornado-6.5.5-cp39-abi3-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:e74c92e8e65086b338fd56333fb9a68b9f6f2fe7ad532645a290a464bcf46be5", size = 447229, upload-time = "2026-03-10T21:30:48.273Z" },
{ url = "https://files.pythonhosted.org/packages/34/01/74e034a30ef59afb4097ef8659515e96a39d910b712a89af76f5e4e1f93c/tornado-6.5.5-cp39-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:435319e9e340276428bbdb4e7fa732c2d399386d1de5686cb331ec8eee754f07", size = 448192, upload-time = "2026-03-10T21:30:51.22Z" },
{ url = "https://files.pythonhosted.org/packages/be/00/fe9e02c5a96429fce1a1d15a517f5d8444f9c412e0bb9eadfbe3b0fc55bf/tornado-6.5.5-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:3f54aa540bdbfee7b9eb268ead60e7d199de5021facd276819c193c0fb28ea4e", size = 448039, upload-time = "2026-03-10T21:30:53.52Z" },
{ url = "https://files.pythonhosted.org/packages/82/9e/656ee4cec0398b1d18d0f1eb6372c41c6b889722641d84948351ae19556d/tornado-6.5.5-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:36abed1754faeb80fbd6e64db2758091e1320f6bba74a4cf8c09cd18ccce8aca", size = 447445, upload-time = "2026-03-10T21:30:55.541Z" },
]
[[package]]
@@ -5141,6 +5233,23 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b1/5e/512aeb40fd819f4660d00f96f5c7371ee36fc8c6b605128c5ee59e0b28c6/u_msgpack_python-2.8.0-py2.py3-none-any.whl", hash = "sha256:1d853d33e78b72c4228a2025b4db28cda81214076e5b0422ed0ae1b1b2bb586a", size = 10590, upload-time = "2023-05-18T09:28:10.323Z" },
]
[[package]]
name = "uharfbuzz"
version = "0.53.3"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/1c/8d/7c82298bfa5c96f018541661bc2ccdf90dfe397bb2724db46725bf495466/uharfbuzz-0.53.3.tar.gz", hash = "sha256:9a87175c14d1361322ce2a3504e63c6b66062934a5edf47266aed5b33416806c", size = 1714488, upload-time = "2026-01-24T13:10:43.693Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/51/88/5df9337adb60d7b1ad150b162bbc5c56d783d15546714085d92b9531f8f3/uharfbuzz-0.53.3-cp310-abi3-macosx_10_9_universal2.whl", hash = "sha256:d977e41a501d9e8af3f2c329d75031037ee79634bc29ca3872e9115c44e67d25", size = 2722639, upload-time = "2026-01-24T13:10:22.436Z" },
{ url = "https://files.pythonhosted.org/packages/39/c4/8b4b050e77d6cb9a84af509e5796734f0e687bd02ad11757a581bd6f197d/uharfbuzz-0.53.3-cp310-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:21d512c94aa992691aaf5b433deaca7e51f4ea54c68b99f535974073364f806f", size = 1647506, upload-time = "2026-01-24T13:10:24.16Z" },
{ url = "https://files.pythonhosted.org/packages/30/ff/8e7cf78d525604f3e0a43b9468263fcf2acb5d208a3979c3bfa8dc61112d/uharfbuzz-0.53.3-cp310-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:dca9a2e071c0c59ba8f382356f31a2518ac3dc7cc77e4f3519defc454c5b9a97", size = 1706448, upload-time = "2026-01-24T13:10:25.729Z" },
{ url = "https://files.pythonhosted.org/packages/9b/a0/739471cdd52723ecc9fc80f36fb92c706a87265dc258521c1b14d99414f7/uharfbuzz-0.53.3-cp310-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:1191a74ddcf18ec721161b6b33a8ab31b0c6a2b15c6724a9b663127bf7f07d2e", size = 2664628, upload-time = "2026-01-24T13:10:27.814Z" },
{ url = "https://files.pythonhosted.org/packages/ae/4a/63a81e9eef922b9f26bd948b518b73704d01a8d8e83324b2f99084ab7af0/uharfbuzz-0.53.3-cp310-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:35ec3b600b3f63e7659792f9bd43e1ffb389d3d2aac8285f269d11efbe04787d", size = 2757384, upload-time = "2026-01-24T13:10:29.669Z" },
{ url = "https://files.pythonhosted.org/packages/e2/d2/27be1201488323d0ff0c99fb966a0522b2736f79bd5a5b7b99526fca3d98/uharfbuzz-0.53.3-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:6f0ad2812303d2c7ccff596fd6c9d5629874f3a83f30255e11639c9b7ba4e89d", size = 1335822, upload-time = "2026-01-24T13:10:34.774Z" },
{ url = "https://files.pythonhosted.org/packages/70/99/53e39bcd4dec5981eb70a6a76285a862c8a76b80cd52e8f40fe51adab032/uharfbuzz-0.53.3-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:757d9ed1841912e8f229319f335cf7dd25a2fd377e444bda9deb720617192e12", size = 1237560, upload-time = "2026-01-24T13:10:36.971Z" },
{ url = "https://files.pythonhosted.org/packages/aa/2b/04d8cde466acfe70373d4f489da5c6eab0aba07d50442dd21217cb0fd167/uharfbuzz-0.53.3-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5d3a0b824811bd1be129356818e6cdbf0e4b056bb60aa9a5eb270bff9d21f24c", size = 1497923, upload-time = "2026-01-24T13:10:38.743Z" },
{ url = "https://files.pythonhosted.org/packages/f3/01/a250521491bc995609275e0062c552b16f437a3ce15de83250176245093e/uharfbuzz-0.53.3-pp311-pypy311_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9211d798b2921a99b8c34e810676137f66372d3b5447765b72d969bdfa6abe6a", size = 1556794, upload-time = "2026-01-24T13:10:40.262Z" },
]
[[package]]
name = "ujson"
version = "5.11.0"