Compare commits

..

14 Commits

Author SHA1 Message Date
shamoon
59414df8f1 Docs stuff 2026-02-26 17:06:54 -08:00
Trenton H
30e25a5522 This link doesn't exist anymore 2026-02-26 12:37:02 -08:00
Trenton H
09ec99e4a4 Sure, fix this one 2026-02-26 12:34:47 -08:00
Trenton H
25bb20fa33 Probably brings back the inferring of Postgres vs SQLite 2026-02-26 12:30:05 -08:00
Trenton H
b1c0593b3e We're down one more level here 2026-02-26 11:07:30 -08:00
Trenton H
21425147ad Cleans up config formatting 2026-02-26 11:07:30 -08:00
Trenton H
e2c48164db Finshes the writeup and migration guide 2026-02-26 11:07:30 -08:00
Trenton H
32cbf41c4c Relaxes the typing here 2026-02-26 11:07:30 -08:00
Trenton H
25a5abfa8d Makes the system checks return multiples, not mashed together 2026-02-26 11:07:30 -08:00
Trenton H
14d015b3f6 Missing the required parameters 2026-02-26 11:07:30 -08:00
Trenton H
41cee07a0d Smaller changeset, keep the rest for later 2026-02-26 11:07:30 -08:00
Trenton H
996b48940f Adds the checks for depracated env vars 2026-02-26 11:07:29 -08:00
Trenton H
0a5c8ae4dc Moves to the new parser module 2026-02-26 11:07:29 -08:00
Trenton H
8b272d59ff Renames the settings to a module for the next steps 2026-02-26 11:07:29 -08:00
36 changed files with 1585 additions and 2119 deletions

View File

@@ -39,3 +39,6 @@ max_line_length = off
[Dockerfile*]
indent_style = space
[*.toml]
indent_style = space

View File

@@ -51,137 +51,172 @@ matcher.
### Database
By default, Paperless uses **SQLite** with a database stored at `data/db.sqlite3`.
To switch to **PostgreSQL** or **MariaDB**, set [`PAPERLESS_DBHOST`](#PAPERLESS_DBHOST) and optionally configure other
database-related environment variables.
For multi-user or higher-throughput deployments, **PostgreSQL** (recommended) or
**MariaDB** can be used instead by setting [`PAPERLESS_DBENGINE`](#PAPERLESS_DBENGINE)
and the relevant connection variables.
#### [`PAPERLESS_DBENGINE=<engine>`](#PAPERLESS_DBENGINE) {#PAPERLESS_DBENGINE}
: Specifies the database engine to use. Accepted values are `sqlite`, `postgresql`,
and `mariadb`.
Defaults to `sqlite` if not set.
PostgreSQL and MariaDB both require [`PAPERLESS_DBHOST`](#PAPERLESS_DBHOST) to be
set. SQLite does not use any other connection variables; the database file is always
located at `<PAPERLESS_DATA_DIR>/db.sqlite3`.
!!! warning
Using MariaDB comes with some caveats.
See [MySQL Caveats](advanced_usage.md#mysql-caveats).
#### [`PAPERLESS_DBHOST=<hostname>`](#PAPERLESS_DBHOST) {#PAPERLESS_DBHOST}
: If unset, Paperless uses **SQLite** by default.
Set `PAPERLESS_DBHOST` to switch to PostgreSQL or MariaDB instead.
#### [`PAPERLESS_DBENGINE=<engine_name>`](#PAPERLESS_DBENGINE) {#PAPERLESS_DBENGINE}
: Optional. Specifies the database engine to use when connecting to a remote database.
Available options are `postgresql` and `mariadb`.
Defaults to `postgresql` if `PAPERLESS_DBHOST` is set.
!!! warning
Using MariaDB comes with some caveats. See [MySQL Caveats](advanced_usage.md#mysql-caveats).
: Hostname of the PostgreSQL or MariaDB database server. Required when
`PAPERLESS_DBENGINE` is `postgresql` or `mariadb`.
#### [`PAPERLESS_DBPORT=<port>`](#PAPERLESS_DBPORT) {#PAPERLESS_DBPORT}
: Port to use when connecting to PostgreSQL or MariaDB.
Default is `5432` for PostgreSQL and `3306` for MariaDB.
Defaults to `5432` for PostgreSQL and `3306` for MariaDB.
#### [`PAPERLESS_DBNAME=<name>`](#PAPERLESS_DBNAME) {#PAPERLESS_DBNAME}
: Name of the database to connect to when using PostgreSQL or MariaDB.
: Name of the PostgreSQL or MariaDB database to connect to.
Defaults to "paperless".
Defaults to `paperless`.
#### [`PAPERLESS_DBUSER=<name>`](#PAPERLESS_DBUSER) {#PAPERLESS_DBUSER}
#### [`PAPERLESS_DBUSER=<user>`](#PAPERLESS_DBUSER) {#PAPERLESS_DBUSER}
: Username for authenticating with the PostgreSQL or MariaDB database.
Defaults to "paperless".
Defaults to `paperless`.
#### [`PAPERLESS_DBPASS=<password>`](#PAPERLESS_DBPASS) {#PAPERLESS_DBPASS}
: Password for the PostgreSQL or MariaDB database user.
Defaults to "paperless".
Defaults to `paperless`.
#### [`PAPERLESS_DBSSLMODE=<mode>`](#PAPERLESS_DBSSLMODE) {#PAPERLESS_DBSSLMODE}
#### [`PAPERLESS_DB_OPTIONS=<options>`](#PAPERLESS_DB_OPTIONS) {#PAPERLESS_DB_OPTIONS}
: SSL mode to use when connecting to PostgreSQL or MariaDB.
: Advanced database connection options as a semicolon-delimited key-value string.
Keys and values are separated by `=`. Dot-notation produces nested option
dictionaries; for example, `pool.max_size=20` sets
`OPTIONS["pool"]["max_size"] = 20`.
See [the official documentation about
sslmode for PostgreSQL](https://www.postgresql.org/docs/current/libpq-ssl.html).
Options specified here are merged over the engine defaults. Unrecognised keys
are passed through to the underlying database driver without validation, so a
typo will be silently ignored rather than producing an error.
See [the official documentation about
sslmode for MySQL and MariaDB](https://dev.mysql.com/doc/refman/8.0/en/connection-options.html#option_general_ssl-mode).
Refer to your database driver's documentation for the full set of accepted keys:
*Note*: SSL mode values differ between PostgreSQL and MariaDB.
- PostgreSQL: [libpq connection parameters](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS)
- MariaDB: [MariaDB Connector/Python](https://mariadb.com/kb/en/mariadb-connector-python/)
- SQLite: [SQLite PRAGMA statements](https://www.sqlite.org/pragma.html)
Default is `prefer` for PostgreSQL and `PREFERRED` for MariaDB.
!!! note "PostgreSQL connection pooling"
#### [`PAPERLESS_DBSSLROOTCERT=<ca-path>`](#PAPERLESS_DBSSLROOTCERT) {#PAPERLESS_DBSSLROOTCERT}
Pool size is controlled via `pool.min_size` and `pool.max_size`. When
configuring pooling, ensure your PostgreSQL `max_connections` is large enough
to handle all pool connections across all workers:
`(web_workers + celery_workers) * pool.max_size + safety_margin`.
: Path to the SSL root certificate used to verify the database server.
**Examples:**
See [the official documentation about
sslmode for PostgreSQL](https://www.postgresql.org/docs/current/libpq-ssl.html).
Changes the location of `root.crt`.
```bash title="PostgreSQL: require SSL, set a custom CA certificate, and limit the pool size"
PAPERLESS_DB_OPTIONS="sslmode=require;sslrootcert=/certs/ca.pem;pool.max_size=5"
```
See [the official documentation about
sslmode for MySQL and MariaDB](https://dev.mysql.com/doc/refman/8.0/en/connection-options.html#option_general_ssl-ca).
```bash title="MariaDB: require SSL with a custom CA certificate"
PAPERLESS_DB_OPTIONS="ssl_mode=REQUIRED;ssl.ca=/certs/ca.pem"
```
Defaults to unset, using the standard location in the home directory.
```bash title="SQLite: set a busy timeout of 30 seconds"
# PostgreSQL: set a connection timeout
PAPERLESS_DB_OPTIONS="connect_timeout=10"
```
#### [`PAPERLESS_DBSSLCERT=<client-cert-path>`](#PAPERLESS_DBSSLCERT) {#PAPERLESS_DBSSLCERT}
#### ~~[`PAPERLESS_DBSSLMODE`](#PAPERLESS_DBSSLMODE)~~ {#PAPERLESS_DBSSLMODE}
: Path to the client SSL certificate used when connecting securely.
!!! failure "Removed in v3"
See [the official documentation about
sslmode for PostgreSQL](https://www.postgresql.org/docs/current/libpq-ssl.html).
Use [`PAPERLESS_DB_OPTIONS`](#PAPERLESS_DB_OPTIONS) instead.
See [the official documentation about
sslmode for MySQL and MariaDB](https://dev.mysql.com/doc/refman/8.0/en/connection-options.html#option_general_ssl-cert).
```bash title="PostgreSQL"
PAPERLESS_DB_OPTIONS="sslmode=require"
```
Changes the location of `postgresql.crt`.
```bash title="MariaDB"
PAPERLESS_DB_OPTIONS="ssl_mode=REQUIRED"
```
Defaults to unset, using the standard location in the home directory.
#### ~~[`PAPERLESS_DBSSLROOTCERT`](#PAPERLESS_DBSSLROOTCERT)~~ {#PAPERLESS_DBSSLROOTCERT}
#### [`PAPERLESS_DBSSLKEY=<client-cert-key>`](#PAPERLESS_DBSSLKEY) {#PAPERLESS_DBSSLKEY}
!!! failure "Removed in v3"
: Path to the client SSL private key used when connecting securely.
Use [`PAPERLESS_DB_OPTIONS`](#PAPERLESS_DB_OPTIONS) instead.
See [the official documentation about
sslmode for PostgreSQL](https://www.postgresql.org/docs/current/libpq-ssl.html).
```bash title="PostgreSQL"
PAPERLESS_DB_OPTIONS="sslrootcert=/path/to/ca.pem"
```
See [the official documentation about
sslmode for MySQL and MariaDB](https://dev.mysql.com/doc/refman/8.0/en/connection-options.html#option_general_ssl-key).
```bash title="MariaDB"
PAPERLESS_DB_OPTIONS="ssl.ca=/path/to/ca.pem"
```
Changes the location of `postgresql.key`.
#### ~~[`PAPERLESS_DBSSLCERT`](#PAPERLESS_DBSSLCERT)~~ {#PAPERLESS_DBSSLCERT}
Defaults to unset, using the standard location in the home directory.
!!! failure "Removed in v3"
#### [`PAPERLESS_DB_TIMEOUT=<int>`](#PAPERLESS_DB_TIMEOUT) {#PAPERLESS_DB_TIMEOUT}
Use [`PAPERLESS_DB_OPTIONS`](#PAPERLESS_DB_OPTIONS) instead.
: Sets how long a database connection should wait before timing out.
```bash title="PostgreSQL"
PAPERLESS_DB_OPTIONS="sslcert=/path/to/client.crt"
```
For SQLite, this sets how long to wait if the database is locked.
For PostgreSQL or MariaDB, this sets the connection timeout.
```bash title="MariaDB"
PAPERLESS_DB_OPTIONS="ssl.cert=/path/to/client.crt"
```
Defaults to unset, which uses Djangos built-in defaults.
#### ~~[`PAPERLESS_DBSSLKEY`](#PAPERLESS_DBSSLKEY)~~ {#PAPERLESS_DBSSLKEY}
#### [`PAPERLESS_DB_POOLSIZE=<int>`](#PAPERLESS_DB_POOLSIZE) {#PAPERLESS_DB_POOLSIZE}
!!! failure "Removed in v3"
: Defines the maximum number of database connections to keep in the pool.
Use [`PAPERLESS_DB_OPTIONS`](#PAPERLESS_DB_OPTIONS) instead.
Only applies to PostgreSQL. This setting is ignored for other database engines.
```bash title="PostgreSQL"
PAPERLESS_DB_OPTIONS="sslkey=/path/to/client.key"
```
The value must be greater than or equal to 1 to be used.
Defaults to unset, which disables connection pooling.
```bash title="MariaDB"
PAPERLESS_DB_OPTIONS="ssl.key=/path/to/client.key"
```
!!! note
#### ~~[`PAPERLESS_DB_TIMEOUT`](#PAPERLESS_DB_TIMEOUT)~~ {#PAPERLESS_DB_TIMEOUT}
A pool of 8-10 connections per worker is typically sufficient.
If you encounter error messages such as `couldn't get a connection`
or database connection timeouts, you probably need to increase the pool size.
!!! failure "Removed in v3"
!!! warning
Make sure your PostgreSQL `max_connections` setting is large enough to handle the connection pools:
`(NB_PAPERLESS_WORKERS + NB_CELERY_WORKERS) × POOL_SIZE + SAFETY_MARGIN`. For example, with
4 Paperless workers and 2 Celery workers, and a pool size of 8:``(4 + 2) × 8 + 10 = 58`,
so `max_connections = 60` (or even more) is appropriate.
Use [`PAPERLESS_DB_OPTIONS`](#PAPERLESS_DB_OPTIONS) instead.
This assumes only Paperless-ngx connects to your PostgreSQL instance. If you have other applications,
you should increase `max_connections` accordingly.
```bash title="SQLite"
PAPERLESS_DB_OPTIONS="timeout=30"
```
```bash title="PostgreSQL or MariaDB"
PAPERLESS_DB_OPTIONS="connect_timeout=30"
```
#### ~~[`PAPERLESS_DB_POOLSIZE`](#PAPERLESS_DB_POOLSIZE)~~ {#PAPERLESS_DB_POOLSIZE}
!!! failure "Removed in v3"
Use [`PAPERLESS_DB_OPTIONS`](#PAPERLESS_DB_OPTIONS) instead.
```bash
PAPERLESS_DB_OPTIONS="pool.max_size=10"
```
#### [`PAPERLESS_DB_READ_CACHE_ENABLED=<bool>`](#PAPERLESS_DB_READ_CACHE_ENABLED) {#PAPERLESS_DB_READ_CACHE_ENABLED}

View File

@@ -48,3 +48,58 @@ The `CONSUMER_BARCODE_SCANNER` setting has been removed. zxing-cpp is now the on
reliability.
- The `libzbar0` / `libzbar-dev` system packages are no longer required and can be removed from any custom Docker
images or host installations.
## Database Engine
`PAPERLESS_DBENGINE` is now required to use PostgreSQL or MariaDB. Previously, the
engine was inferred from the presence of `PAPERLESS_DBHOST`, with `PAPERLESS_DBENGINE`
only needed to select MariaDB over PostgreSQL.
SQLite users require no changes, though they may explicitly set their engine if desired.
#### Action Required
PostgreSQL and MariaDB users must add `PAPERLESS_DBENGINE` to their environment:
```yaml
# v2 (PostgreSQL inferred from PAPERLESS_DBHOST)
PAPERLESS_DBHOST: postgres
# v3 (engine must be explicit)
PAPERLESS_DBENGINE: postgresql
PAPERLESS_DBHOST: postgres
```
See [`PAPERLESS_DBENGINE`](configuration.md#PAPERLESS_DBENGINE) for accepted values.
## Database Advanced Options
The individual SSL, timeout, and pooling variables have been removed in favour of a
single [`PAPERLESS_DB_OPTIONS`](configuration.md#PAPERLESS_DB_OPTIONS) string. This
consolidates a growing set of engine-specific variables into one place, and allows
any option supported by the underlying database driver to be set without requiring a
dedicated environment variable for each.
The removed variables and their replacements are:
| Removed Variable | Replacement in `PAPERLESS_DB_OPTIONS` |
| ------------------------- | ---------------------------------------------------------------------------- |
| `PAPERLESS_DBSSLMODE` | `sslmode=<value>` (PostgreSQL) or `ssl_mode=<value>` (MariaDB) |
| `PAPERLESS_DBSSLROOTCERT` | `sslrootcert=<path>` (PostgreSQL) or `ssl.ca=<path>` (MariaDB) |
| `PAPERLESS_DBSSLCERT` | `sslcert=<path>` (PostgreSQL) or `ssl.cert=<path>` (MariaDB) |
| `PAPERLESS_DBSSLKEY` | `sslkey=<path>` (PostgreSQL) or `ssl.key=<path>` (MariaDB) |
| `PAPERLESS_DB_POOLSIZE` | `pool.max_size=<value>` (PostgreSQL only) |
| `PAPERLESS_DB_TIMEOUT` | `timeout=<value>` (SQLite) or `connect_timeout=<value>` (PostgreSQL/MariaDB) |
The deprecated variables will continue to function for now but will be removed in a
future release. A deprecation warning is logged at startup for each deprecated variable
that is still set.
#### Action Required
Users with any of the deprecated variables set should migrate to `PAPERLESS_DB_OPTIONS`.
Multiple options are combined in a single value:
```bash
PAPERLESS_DB_OPTIONS="sslmode=require;sslrootcert=/certs/ca.pem;pool.max_size=10"
```

View File

@@ -504,8 +504,7 @@ installation. Keep these points in mind:
- Read the [changelog](changelog.md) and
take note of breaking changes.
- Decide whether to stay on SQLite or migrate to PostgreSQL.
See [documentation](#sqlite_to_psql) for details on moving data
from SQLite to PostgreSQL. Both work fine with
Both work fine with
Paperless. However, if you already have a database server running
for other services, you might as well use it for Paperless as well.
- The task scheduler of Paperless, which is used to execute periodic

View File

@@ -37,7 +37,6 @@ dependencies = [
"django-filter~=25.1",
"django-guardian~=3.3.0",
"django-multiselectfield~=1.0.1",
"django-rich~=2.2.0",
"django-soft-delete~=1.0.18",
"django-treenode>=0.23.2",
"djangorestframework~=3.16",
@@ -77,6 +76,7 @@ dependencies = [
"setproctitle~=1.3.4",
"tika-client~=0.10.0",
"torch~=2.10.0",
"tqdm~=4.67.1",
"watchfiles>=1.1.1",
"whitenoise~=6.11",
"whoosh-reloaded>=2.7.5",
@@ -149,6 +149,7 @@ typing = [
"types-pytz",
"types-redis",
"types-setuptools",
"types-tqdm",
]
[tool.uv]
@@ -303,7 +304,6 @@ markers = [
"tika: Tests requiring Tika service",
"greenmail: Tests requiring Greenmail service",
"date_parsing: Tests which cover date parsing from content or filename",
"management: Tests which cover management commands/functionality",
]
[tool.pytest_env]

View File

@@ -1,320 +0,0 @@
"""
Base command class for Paperless-ngx management commands.
Provides automatic progress bar and multiprocessing support with minimal boilerplate.
"""
from __future__ import annotations
import os
from collections.abc import Iterable
from collections.abc import Sized
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import as_completed
from dataclasses import dataclass
from typing import TYPE_CHECKING
from typing import Any
from typing import ClassVar
from typing import Generic
from typing import TypeVar
from django import db
from django.core.management import CommandError
from django.db.models import QuerySet
from django_rich.management import RichCommand
from rich.console import Console
from rich.progress import BarColumn
from rich.progress import MofNCompleteColumn
from rich.progress import Progress
from rich.progress import SpinnerColumn
from rich.progress import TextColumn
from rich.progress import TimeElapsedColumn
from rich.progress import TimeRemainingColumn
if TYPE_CHECKING:
from collections.abc import Callable
from collections.abc import Generator
from collections.abc import Iterable
from collections.abc import Sequence
from django.core.management import CommandParser
T = TypeVar("T")
R = TypeVar("R")
@dataclass(frozen=True, slots=True)
class ProcessResult(Generic[T, R]):
"""
Result of processing a single item in parallel.
Attributes:
item: The input item that was processed.
result: The return value from the processing function, or None if an error occurred.
error: The exception if processing failed, or None on success.
"""
item: T
result: R | None
error: BaseException | None
@property
def success(self) -> bool:
"""Return True if the item was processed successfully."""
return self.error is None
class PaperlessCommand(RichCommand):
"""
Base command class with automatic progress bar and multiprocessing support.
Features are opt-in via class attributes:
supports_progress_bar: Adds --no-progress-bar argument (default: True)
supports_multiprocessing: Adds --processes argument (default: False)
Example usage:
class Command(PaperlessCommand):
help = "Process all documents"
def handle(self, *args, **options):
documents = Document.objects.all()
for doc in self.track(documents, description="Processing..."):
process_document(doc)
class Command(PaperlessCommand):
help = "Regenerate thumbnails"
supports_multiprocessing = True
def handle(self, *args, **options):
ids = list(Document.objects.values_list("id", flat=True))
for result in self.process_parallel(process_doc, ids):
if result.error:
self.console.print(f"[red]Failed: {result.error}[/red]")
"""
supports_progress_bar: ClassVar[bool] = True
supports_multiprocessing: ClassVar[bool] = False
# Instance attributes set by execute() before handle() runs
no_progress_bar: bool
process_count: int
def add_arguments(self, parser: CommandParser) -> None:
"""Add arguments based on supported features."""
super().add_arguments(parser)
if self.supports_progress_bar:
parser.add_argument(
"--no-progress-bar",
default=False,
action="store_true",
help="Disable the progress bar",
)
if self.supports_multiprocessing:
default_processes = max(1, (os.cpu_count() or 1) // 4)
parser.add_argument(
"--processes",
default=default_processes,
type=int,
help=f"Number of processes to use (default: {default_processes})",
)
def execute(self, *args: Any, **options: Any) -> str | None:
"""
Set up instance state before handle() is called.
This is called by Django's command infrastructure after argument parsing
but before handle(). We use it to set instance attributes from options.
"""
# Set progress bar state
if self.supports_progress_bar:
self.no_progress_bar = options.get("no_progress_bar", False)
else:
self.no_progress_bar = True
# Set multiprocessing state
if self.supports_multiprocessing:
self.process_count = options.get("processes", 1)
if self.process_count < 1:
raise CommandError("--processes must be at least 1")
else:
self.process_count = 1
return super().execute(*args, **options)
def _create_progress(self, description: str) -> Progress:
"""
Create a configured Progress instance.
Progress output is directed to stderr to match the convention that
progress bars are transient UI feedback, not command output. This
mirrors tqdm's default behavior and prevents progress bar rendering
from interfering with stdout-based assertions in tests or piped
command output.
Args:
description: Text to display alongside the progress bar.
Returns:
A Progress instance configured with appropriate columns.
"""
return Progress(
SpinnerColumn(),
TextColumn("[progress.description]{task.description}"),
BarColumn(),
MofNCompleteColumn(),
TimeElapsedColumn(),
TimeRemainingColumn(),
console=Console(stderr=True),
transient=False,
)
def _get_iterable_length(self, iterable: Iterable[object]) -> int | None:
"""
Attempt to determine the length of an iterable without consuming it.
Tries .count() first (for Django querysets - executes SELECT COUNT(*)),
then falls back to len() for sequences.
Args:
iterable: The iterable to measure.
Returns:
The length if determinable, None otherwise.
"""
if isinstance(iterable, QuerySet):
return iterable.count()
if isinstance(iterable, Sized):
return len(iterable)
return None
def track(
self,
iterable: Iterable[T],
*,
description: str = "Processing...",
total: int | None = None,
) -> Generator[T, None, None]:
"""
Iterate over items with an optional progress bar.
Respects --no-progress-bar flag. When disabled, simply yields items
without any progress display.
Args:
iterable: The items to iterate over.
description: Text to display alongside the progress bar.
total: Total number of items. If None, attempts to determine
automatically via .count() (for querysets) or len().
Yields:
Items from the iterable.
Example:
for doc in self.track(documents, description="Renaming..."):
process(doc)
"""
if self.no_progress_bar:
yield from iterable
return
# Attempt to determine total if not provided
if total is None:
total = self._get_iterable_length(iterable)
with self._create_progress(description) as progress:
task_id = progress.add_task(description, total=total)
for item in iterable:
yield item
progress.advance(task_id)
def process_parallel(
self,
fn: Callable[[T], R],
items: Sequence[T],
*,
description: str = "Processing...",
) -> Generator[ProcessResult[T, R], None, None]:
"""
Process items in parallel with progress tracking.
When --processes=1, runs sequentially in the main process without
spawning subprocesses. This is critical for testing, as multiprocessing
breaks fixtures, mocks, and database transactions.
When --processes > 1, uses ProcessPoolExecutor and automatically closes
database connections before spawning workers (required for PostgreSQL).
Args:
fn: Function to apply to each item. Must be picklable for parallel
execution (i.e., defined at module level, not a lambda or closure).
items: Sequence of items to process.
description: Text to display alongside the progress bar.
Yields:
ProcessResult for each item, containing the item, result, and any error.
Example:
def regenerate_thumbnail(doc_id: int) -> Path:
...
for result in self.process_parallel(regenerate_thumbnail, doc_ids):
if result.error:
self.console.print(f"[red]Failed {result.item}[/red]")
"""
total = len(items)
if self.process_count == 1:
# Sequential execution in main process - critical for testing
yield from self._process_sequential(fn, items, description, total)
else:
# Parallel execution with ProcessPoolExecutor
yield from self._process_parallel(fn, items, description, total)
def _process_sequential(
self,
fn: Callable[[T], R],
items: Sequence[T],
description: str,
total: int,
) -> Generator[ProcessResult[T, R], None, None]:
"""Process items sequentially in the main process."""
for item in self.track(items, description=description, total=total):
try:
result = fn(item)
yield ProcessResult(item=item, result=result, error=None)
except Exception as e:
yield ProcessResult(item=item, result=None, error=e)
def _process_parallel(
self,
fn: Callable[[T], R],
items: Sequence[T],
description: str,
total: int,
) -> Generator[ProcessResult[T, R], None, None]:
"""Process items in parallel using ProcessPoolExecutor."""
# Close database connections before forking - required for PostgreSQL
db.connections.close_all()
with self._create_progress(description) as progress:
task_id = progress.add_task(description, total=total)
with ProcessPoolExecutor(max_workers=self.process_count) as executor:
# Submit all tasks and map futures back to items
future_to_item = {executor.submit(fn, item): item for item in items}
# Yield results as they complete
for future in as_completed(future_to_item):
item = future_to_item[future]
try:
result = future.result()
yield ProcessResult(item=item, result=result, error=None)
except Exception as e:
yield ProcessResult(item=item, result=None, error=e)
finally:
progress.advance(task_id)

View File

@@ -1,15 +1,20 @@
import logging
import multiprocessing
import tqdm
from django import db
from django.conf import settings
from django.core.management.base import BaseCommand
from documents.management.commands.base import PaperlessCommand
from documents.management.commands.mixins import MultiProcessMixin
from documents.management.commands.mixins import ProgressBarMixin
from documents.models import Document
from documents.tasks import update_document_content_maybe_archive_file
logger = logging.getLogger("paperless.management.archiver")
class Command(PaperlessCommand):
class Command(MultiProcessMixin, ProgressBarMixin, BaseCommand):
help = (
"Using the current classification model, assigns correspondents, tags "
"and document types to all documents, effectively allowing you to "
@@ -17,10 +22,7 @@ class Command(PaperlessCommand):
"modified) after their initial import."
)
supports_multiprocessing = True
def add_arguments(self, parser):
super().add_arguments(parser)
parser.add_argument(
"-f",
"--overwrite",
@@ -42,8 +44,13 @@ class Command(PaperlessCommand):
"run on this specific document."
),
)
self.add_argument_progress_bar_mixin(parser)
self.add_argument_processes_mixin(parser)
def handle(self, *args, **options):
self.handle_processes_mixin(**options)
self.handle_progress_bar_mixin(**options)
settings.SCRATCH_DIR.mkdir(parents=True, exist_ok=True)
overwrite = options["overwrite"]
@@ -53,21 +60,35 @@ class Command(PaperlessCommand):
else:
documents = Document.objects.all()
document_ids = [
doc.id for doc in documents if overwrite or not doc.has_archive_version
]
document_ids = list(
map(
lambda doc: doc.id,
filter(lambda d: overwrite or not d.has_archive_version, documents),
),
)
# Note to future self: this prevents django from reusing database
# connections between processes, which is bad and does not work
# with postgres.
db.connections.close_all()
try:
logging.getLogger().handlers[0].level = logging.ERROR
for result in self.process_parallel(
update_document_content_maybe_archive_file,
document_ids,
description="Archiving...",
):
if result.error:
self.console.print(
f"[red]Failed document {result.item}: {result.error}[/red]",
if self.process_count == 1:
for doc_id in document_ids:
update_document_content_maybe_archive_file(doc_id)
else: # pragma: no cover
with multiprocessing.Pool(self.process_count) as pool:
list(
tqdm.tqdm(
pool.imap_unordered(
update_document_content_maybe_archive_file,
document_ids,
),
total=len(document_ids),
disable=self.no_progress_bar,
),
)
except KeyboardInterrupt: # pragma: no cover
self.console.print("[yellow]Aborting...[/yellow]")
except KeyboardInterrupt:
self.stdout.write(self.style.NOTICE("Aborting..."))

View File

@@ -1,20 +1,24 @@
import dataclasses
import multiprocessing
from typing import Final
import rapidfuzz
import tqdm
from django.core.management import BaseCommand
from django.core.management import CommandError
from documents.management.commands.base import PaperlessCommand
from documents.management.commands.mixins import MultiProcessMixin
from documents.management.commands.mixins import ProgressBarMixin
from documents.models import Document
@dataclasses.dataclass(frozen=True, slots=True)
@dataclasses.dataclass(frozen=True)
class _WorkPackage:
first_doc: Document
second_doc: Document
@dataclasses.dataclass(frozen=True, slots=True)
@dataclasses.dataclass(frozen=True)
class _WorkResult:
doc_one_pk: int
doc_two_pk: int
@@ -27,23 +31,22 @@ class _WorkResult:
def _process_and_match(work: _WorkPackage) -> _WorkResult:
"""
Does basic processing of document content, gets the basic ratio
and returns the result package.
and returns the result package
"""
# Normalize the string some, lower case, whitespace, etc
first_string = rapidfuzz.utils.default_process(work.first_doc.content)
second_string = rapidfuzz.utils.default_process(work.second_doc.content)
# Basic matching ratio
match = rapidfuzz.fuzz.ratio(first_string, second_string)
return _WorkResult(work.first_doc.pk, work.second_doc.pk, match)
class Command(PaperlessCommand):
class Command(MultiProcessMixin, ProgressBarMixin, BaseCommand):
help = "Searches for documents where the content almost matches"
supports_multiprocessing = True
def add_arguments(self, parser):
super().add_arguments(parser)
parser.add_argument(
"--ratio",
default=85.0,
@@ -56,11 +59,16 @@ class Command(PaperlessCommand):
action="store_true",
help="If set, one document of matches above the ratio WILL BE DELETED",
)
self.add_argument_progress_bar_mixin(parser)
self.add_argument_processes_mixin(parser)
def handle(self, *args, **options):
RATIO_MIN: Final[float] = 0.0
RATIO_MAX: Final[float] = 100.0
self.handle_processes_mixin(**options)
self.handle_progress_bar_mixin(**options)
if options["delete"]:
self.stdout.write(
self.style.WARNING(
@@ -72,58 +80,66 @@ class Command(PaperlessCommand):
checked_pairs: set[tuple[int, int]] = set()
work_pkgs: list[_WorkPackage] = []
# Ratio is a float from 0.0 to 100.0
if opt_ratio < RATIO_MIN or opt_ratio > RATIO_MAX:
raise CommandError("The ratio must be between 0 and 100")
all_docs = Document.objects.all().order_by("id")
# Build work packages for processing
for first_doc in all_docs:
for second_doc in all_docs:
# doc to doc is obviously not useful
if first_doc.pk == second_doc.pk:
continue
# Skip empty documents (e.g. password-protected)
if first_doc.content.strip() == "" or second_doc.content.strip() == "":
continue
# Skip matching which have already been matched together
# doc 1 to doc 2 is the same as doc 2 to doc 1
doc_1_to_doc_2 = (first_doc.pk, second_doc.pk)
doc_2_to_doc_1 = doc_1_to_doc_2[::-1]
if doc_1_to_doc_2 in checked_pairs or doc_2_to_doc_1 in checked_pairs:
continue
checked_pairs.update([doc_1_to_doc_2, doc_2_to_doc_1])
# Actually something useful to work on now
work_pkgs.append(_WorkPackage(first_doc, second_doc))
results: list[_WorkResult] = []
# Don't spin up a pool of 1 process
if self.process_count == 1:
for work in self.track(work_pkgs, description="Matching..."):
results = []
for work in tqdm.tqdm(work_pkgs, disable=self.no_progress_bar):
results.append(_process_and_match(work))
else: # pragma: no cover
for proc_result in self.process_parallel(
_process_and_match,
work_pkgs,
description="Matching...",
):
if proc_result.error:
self.console.print(
f"[red]Failed: {proc_result.error}[/red]",
)
elif proc_result.result is not None:
results.append(proc_result.result)
messages: list[str] = []
maybe_delete_ids: list[int] = []
for match_result in sorted(results):
if match_result.ratio >= opt_ratio:
messages.append(
self.style.NOTICE(
f"Document {match_result.doc_one_pk} fuzzy match"
f" to {match_result.doc_two_pk}"
f" (confidence {match_result.ratio:.3f})\n",
with multiprocessing.Pool(processes=self.process_count) as pool:
results = list(
tqdm.tqdm(
pool.imap_unordered(_process_and_match, work_pkgs),
total=len(work_pkgs),
disable=self.no_progress_bar,
),
)
maybe_delete_ids.append(match_result.doc_two_pk)
# Check results
messages = []
maybe_delete_ids = []
for result in sorted(results):
if result.ratio >= opt_ratio:
messages.append(
self.style.NOTICE(
f"Document {result.doc_one_pk} fuzzy match"
f" to {result.doc_two_pk} (confidence {result.ratio:.3f})\n",
),
)
maybe_delete_ids.append(result.doc_two_pk)
if len(messages) == 0:
messages.append(self.style.SUCCESS("No matches found\n"))
self.stdout.writelines(messages)
messages.append(
self.style.SUCCESS("No matches found\n"),
)
self.stdout.writelines(
messages,
)
if options["delete"]:
self.stdout.write(
self.style.NOTICE(

View File

@@ -1,12 +1,25 @@
import logging
import tqdm
from django.core.management.base import BaseCommand
from django.db.models.signals import post_save
from documents.management.commands.base import PaperlessCommand
from documents.management.commands.mixins import ProgressBarMixin
from documents.models import Document
class Command(PaperlessCommand):
help = "Rename all documents"
class Command(ProgressBarMixin, BaseCommand):
help = "This will rename all documents to match the latest filename format."
def add_arguments(self, parser):
self.add_argument_progress_bar_mixin(parser)
def handle(self, *args, **options):
for document in self.track(Document.objects.all(), description="Renaming..."):
self.handle_progress_bar_mixin(**options)
logging.getLogger().handlers[0].level = logging.ERROR
for document in tqdm.tqdm(
Document.objects.all(),
disable=self.no_progress_bar,
):
post_save.send(Document, instance=document, created=False)

View File

@@ -1,7 +1,10 @@
import logging
import tqdm
from django.core.management.base import BaseCommand
from documents.classifier import load_classifier
from documents.management.commands.base import PaperlessCommand
from documents.management.commands.mixins import ProgressBarMixin
from documents.models import Document
from documents.signals.handlers import set_correspondent
from documents.signals.handlers import set_document_type
@@ -11,7 +14,7 @@ from documents.signals.handlers import set_tags
logger = logging.getLogger("paperless.management.retagger")
class Command(PaperlessCommand):
class Command(ProgressBarMixin, BaseCommand):
help = (
"Using the current classification model, assigns correspondents, tags "
"and document types to all documents, effectively allowing you to "
@@ -20,7 +23,6 @@ class Command(PaperlessCommand):
)
def add_arguments(self, parser):
super().add_arguments(parser)
parser.add_argument("-c", "--correspondent", default=False, action="store_true")
parser.add_argument("-T", "--tags", default=False, action="store_true")
parser.add_argument("-t", "--document_type", default=False, action="store_true")
@@ -32,7 +34,7 @@ class Command(PaperlessCommand):
action="store_true",
help=(
"By default this command won't try to assign a correspondent "
"if more than one matches the document. Use this flag if "
"if more than one matches the document. Use this flag if "
"you'd rather it just pick the first one it finds."
),
)
@@ -47,6 +49,7 @@ class Command(PaperlessCommand):
"and tags that do not match anymore due to changed rules."
),
)
self.add_argument_progress_bar_mixin(parser)
parser.add_argument(
"--suggest",
default=False,
@@ -65,6 +68,8 @@ class Command(PaperlessCommand):
)
def handle(self, *args, **options):
self.handle_progress_bar_mixin(**options)
if options["inbox_only"]:
queryset = Document.objects.filter(tags__is_inbox_tag=True)
else:
@@ -79,7 +84,7 @@ class Command(PaperlessCommand):
classifier = load_classifier()
for document in self.track(documents, description="Retagging..."):
for document in tqdm.tqdm(documents, disable=self.no_progress_bar):
if options["correspondent"]:
set_correspondent(
sender=None,
@@ -117,7 +122,6 @@ class Command(PaperlessCommand):
stdout=self.stdout,
style_func=self.style,
)
if options["storage_path"]:
set_storage_path(
sender=None,

View File

@@ -1,97 +1,17 @@
"""Management command to check the document archive for issues."""
from django.core.management.base import BaseCommand
from __future__ import annotations
import logging
from typing import Any
from rich.panel import Panel
from rich.table import Table
from rich.text import Text
from documents.management.commands.base import PaperlessCommand
from documents.models import Document
from documents.sanity_checker import SanityCheckMessages
from documents.management.commands.mixins import ProgressBarMixin
from documents.sanity_checker import check_sanity
_LEVEL_STYLE: dict[int, tuple[str, str]] = {
logging.ERROR: ("bold red", "ERROR"),
logging.WARNING: ("yellow", "WARN"),
logging.INFO: ("dim", "INFO"),
}
class Command(PaperlessCommand):
class Command(ProgressBarMixin, BaseCommand):
help = "This command checks your document archive for issues."
def _render_results(self, messages: SanityCheckMessages) -> None:
"""Render sanity check results as a Rich table."""
console = self.console
def add_arguments(self, parser):
self.add_argument_progress_bar_mixin(parser)
if len(messages) == 0:
console.print(
Panel(
"[green]No issues detected.[/green]",
title="Sanity Check",
border_style="green",
),
)
return
def handle(self, *args, **options):
self.handle_progress_bar_mixin(**options)
messages = check_sanity(progress=self.use_progress_bar, scheduled=False)
# Build a lookup for document titles
doc_pks = [pk for pk in messages.document_pks() if pk is not None]
titles: dict[int, str] = {}
if doc_pks:
titles = dict(
Document.global_objects.filter(pk__in=doc_pks)
.only("pk", "title")
.values_list("pk", "title"),
)
table = Table(
title="Sanity Check Results",
show_lines=True,
title_style="bold",
)
table.add_column("Level", width=7, no_wrap=True)
table.add_column("Document", min_width=20)
table.add_column("Issue", ratio=1)
for doc_pk, doc_messages in messages.iter_messages():
if doc_pk is not None:
title = titles.get(doc_pk, "Unknown")
doc_label = f"#{doc_pk} {title}"
else:
doc_label = "(global)"
for msg in doc_messages:
style, label = _LEVEL_STYLE.get(
msg["level"],
("dim", "INFO"),
)
table.add_row(
Text(label, style=style),
doc_label,
msg["message"],
)
console.print(table)
# Summary line
parts: list[str] = []
if messages.has_error:
parts.append("[bold red]errors[/bold red]")
if messages.has_warning:
parts.append("[yellow]warnings[/yellow]")
summary = " and ".join(parts) if parts else "infos"
console.print(f"\nFound {len(messages)} document(s) with {summary}.")
def handle(self, *args: Any, **options: Any) -> None:
messages = check_sanity(
scheduled=False,
iter_wrapper=lambda docs: self.track(
docs,
description="Checking documents...",
),
)
self._render_results(messages)
messages.log_messages()

View File

@@ -1,45 +1,43 @@
import logging
import multiprocessing
import shutil
from documents.management.commands.base import PaperlessCommand
import tqdm
from django import db
from django.core.management.base import BaseCommand
from documents.management.commands.mixins import MultiProcessMixin
from documents.management.commands.mixins import ProgressBarMixin
from documents.models import Document
from documents.parsers import get_parser_class_for_mime_type
logger = logging.getLogger("paperless.management.thumbnails")
def _process_document(doc_id: int) -> None:
def _process_document(doc_id) -> None:
document: Document = Document.objects.get(id=doc_id)
parser_class = get_parser_class_for_mime_type(document.mime_type)
if parser_class is None:
logger.warning(
"%s: No parser for mime type %s",
document,
document.mime_type,
)
if parser_class:
parser = parser_class(logging_group=None)
else:
print(f"{document} No parser for mime type {document.mime_type}") # noqa: T201
return
parser = parser_class(logging_group=None)
try:
thumb = parser.get_thumbnail(
document.source_path,
document.mime_type,
document.get_public_filename(),
)
shutil.move(thumb, document.thumbnail_path)
finally:
parser.cleanup()
class Command(PaperlessCommand):
class Command(MultiProcessMixin, ProgressBarMixin, BaseCommand):
help = "This will regenerate the thumbnails for all documents."
supports_multiprocessing = True
def add_arguments(self, parser) -> None:
super().add_arguments(parser)
parser.add_argument(
"-d",
"--document",
@@ -51,23 +49,36 @@ class Command(PaperlessCommand):
"run on this specific document."
),
)
self.add_argument_progress_bar_mixin(parser)
self.add_argument_processes_mixin(parser)
def handle(self, *args, **options):
logging.getLogger().handlers[0].level = logging.ERROR
self.handle_processes_mixin(**options)
self.handle_progress_bar_mixin(**options)
if options["document"]:
documents = Document.objects.filter(pk=options["document"])
else:
documents = Document.objects.all()
ids = list(documents.values_list("id", flat=True))
ids = [doc.id for doc in documents]
for result in self.process_parallel(
_process_document,
ids,
description="Regenerating thumbnails...",
):
if result.error: # pragma: no cover
self.console.print(
f"[red]Failed document {result.item}: {result.error}[/red]",
# Note to future self: this prevents django from reusing database
# connections between processes, which is bad and does not work
# with postgres.
db.connections.close_all()
if self.process_count == 1:
for doc_id in ids:
_process_document(doc_id)
else: # pragma: no cover
with multiprocessing.Pool(processes=self.process_count) as pool:
list(
tqdm.tqdm(
pool.imap_unordered(_process_document, ids),
total=len(ids),
disable=self.no_progress_bar,
),
)

View File

@@ -21,6 +21,26 @@ class CryptFields(TypedDict):
fields: list[str]
class MultiProcessMixin:
"""
Small class to handle adding an argument and validating it
for the use of multiple processes
"""
def add_argument_processes_mixin(self, parser: ArgumentParser) -> None:
parser.add_argument(
"--processes",
default=max(1, os.cpu_count() // 4),
type=int,
help="Number of processes to distribute work amongst",
)
def handle_processes_mixin(self, *args, **options) -> None:
self.process_count = options["processes"]
if self.process_count < 1:
raise CommandError("There must be at least 1 process")
class ProgressBarMixin:
"""
Many commands use a progress bar, which can be disabled

View File

@@ -1,21 +1,27 @@
from auditlog.models import LogEntry
from django.core.management.base import BaseCommand
from django.db import transaction
from tqdm import tqdm
from documents.management.commands.base import PaperlessCommand
from documents.management.commands.mixins import ProgressBarMixin
class Command(PaperlessCommand):
"""Prune the audit logs of objects that no longer exist."""
class Command(BaseCommand, ProgressBarMixin):
"""
Prune the audit logs of objects that no longer exist.
"""
help = "Prunes the audit logs of objects that no longer exist."
def handle(self, *args, **options):
def add_arguments(self, parser):
self.add_argument_progress_bar_mixin(parser)
def handle(self, **options):
self.handle_progress_bar_mixin(**options)
with transaction.atomic():
for log_entry in self.track(
LogEntry.objects.all(),
description="Pruning audit logs...",
):
for log_entry in tqdm(LogEntry.objects.all(), disable=self.no_progress_bar):
model_class = log_entry.content_type.model_class()
# use global_objects for SoftDeleteModel
objects = (
model_class.global_objects
if hasattr(model_class, "global_objects")
@@ -26,8 +32,8 @@ class Command(PaperlessCommand):
and not objects.filter(pk=log_entry.object_id).exists()
):
log_entry.delete()
self.console.print(
f"Deleted audit log entry for "
f"{model_class.__name__} #{log_entry.object_id}",
style="yellow",
tqdm.write(
self.style.NOTICE(
f"Deleted audit log entry for {model_class.__name__} #{log_entry.object_id}",
),
)

View File

@@ -1,141 +1,80 @@
"""
Sanity checker for the Paperless-ngx document archive.
Verifies that all documents have valid files, correct checksums,
and consistent metadata. Reports orphaned files in the media directory.
Progress display is the caller's responsibility -- pass an ``iter_wrapper``
to wrap the document queryset (e.g., with a progress bar). The default
is an identity function that adds no overhead.
"""
from __future__ import annotations
import hashlib
import logging
import uuid
from collections import defaultdict
from collections.abc import Callable
from collections.abc import Iterable
from collections.abc import Iterator
from pathlib import Path
from typing import TYPE_CHECKING
from typing import Final
from typing import TypedDict
from typing import TypeVar
from celery import states
from django.conf import settings
from django.utils import timezone
from tqdm import tqdm
from documents.models import Document
from documents.models import PaperlessTask
from paperless.config import GeneralConfig
logger = logging.getLogger("paperless.sanity_checker")
_T = TypeVar("_T")
IterWrapper = Callable[[Iterable[_T]], Iterable[_T]]
class MessageEntry(TypedDict):
"""A single sanity check message with its severity level."""
level: int
message: str
def _identity(iterable: Iterable[_T]) -> Iterable[_T]:
"""Pass through an iterable unchanged (default iter_wrapper)."""
return iterable
class SanityCheckMessages:
"""Collects sanity check messages grouped by document primary key.
Messages are categorized as error, warning, or info. ``None`` is used
as the key for messages not associated with a specific document
(e.g., orphaned files).
"""
def __init__(self) -> None:
self._messages: dict[int | None, list[MessageEntry]] = defaultdict(list)
self.has_error: bool = False
self.has_warning: bool = False
self._messages: dict[int, list[dict]] = defaultdict(list)
self.has_error = False
self.has_warning = False
# -- Recording ----------------------------------------------------------
def error(self, doc_pk: int | None, message: str) -> None:
def error(self, doc_pk, message) -> None:
self._messages[doc_pk].append({"level": logging.ERROR, "message": message})
self.has_error = True
def warning(self, doc_pk: int | None, message: str) -> None:
def warning(self, doc_pk, message) -> None:
self._messages[doc_pk].append({"level": logging.WARNING, "message": message})
self.has_warning = True
def info(self, doc_pk: int | None, message: str) -> None:
def info(self, doc_pk, message) -> None:
self._messages[doc_pk].append({"level": logging.INFO, "message": message})
# -- Iteration / query --------------------------------------------------
def document_pks(self) -> list[int | None]:
"""Return all document PKs (including None for global messages)."""
return list(self._messages.keys())
def iter_messages(self) -> Iterator[tuple[int | None, list[MessageEntry]]]:
"""Iterate over (doc_pk, messages) pairs."""
yield from self._messages.items()
def __len__(self) -> int:
return len(self._messages)
def __getitem__(self, item: int | None) -> list[MessageEntry]:
return self._messages[item]
# -- Logging output (used by Celery task path) --------------------------
def log_messages(self) -> None:
"""Write all messages to the ``paperless.sanity_checker`` logger.
logger = logging.getLogger("paperless.sanity_checker")
This is the output path for headless / Celery execution.
Management commands use Rich rendering instead.
"""
if len(self._messages) == 0:
logger.info("Sanity checker detected no issues.")
return
else:
# Query once
all_docs = Document.global_objects.all()
doc_pks = [pk for pk in self._messages if pk is not None]
titles: dict[int, str] = {}
if doc_pks:
titles = dict(
Document.global_objects.filter(pk__in=doc_pks)
.only("pk", "title")
.values_list("pk", "title"),
)
for doc_pk in self._messages:
if doc_pk is not None:
doc = all_docs.get(pk=doc_pk)
logger.info(
f"Detected following issue(s) with document #{doc.pk},"
f" titled {doc.title}",
)
for msg in self._messages[doc_pk]:
logger.log(msg["level"], msg["message"])
for doc_pk, entries in self._messages.items():
if doc_pk is not None:
title = titles.get(doc_pk, "Unknown")
logger.info(
"Detected following issue(s) with document #%s, titled %s",
doc_pk,
title,
)
for msg in entries:
logger.log(msg["level"], msg["message"])
def __len__(self):
return len(self._messages)
def __getitem__(self, item):
return self._messages[item]
class SanityCheckFailedException(Exception):
pass
# ---------------------------------------------------------------------------
# Internal helpers
# ---------------------------------------------------------------------------
def check_sanity(*, progress=False, scheduled=True) -> SanityCheckMessages:
paperless_task = PaperlessTask.objects.create(
task_id=uuid.uuid4(),
type=PaperlessTask.TaskType.SCHEDULED_TASK
if scheduled
else PaperlessTask.TaskType.MANUAL_TASK,
task_name=PaperlessTask.TaskName.CHECK_SANITY,
status=states.STARTED,
date_created=timezone.now(),
date_started=timezone.now(),
)
messages = SanityCheckMessages()
def _build_present_files() -> set[Path]:
"""Collect all files in MEDIA_ROOT, excluding directories and ignorable files."""
present_files = {
x.resolve()
for x in Path(settings.MEDIA_ROOT).glob("**/*")
@@ -143,167 +82,95 @@ def _build_present_files() -> set[Path]:
}
lockfile = Path(settings.MEDIA_LOCK).resolve()
present_files.discard(lockfile)
if lockfile in present_files:
present_files.remove(lockfile)
general_config = GeneralConfig()
app_logo = general_config.app_logo or settings.APP_LOGO
if app_logo:
logo_file = Path(settings.MEDIA_ROOT / Path(app_logo.lstrip("/"))).resolve()
present_files.discard(logo_file)
if logo_file in present_files:
present_files.remove(logo_file)
return present_files
def _check_thumbnail(
doc: Document,
messages: SanityCheckMessages,
present_files: set[Path],
) -> None:
"""Verify the thumbnail exists and is readable."""
thumbnail_path: Final[Path] = Path(doc.thumbnail_path).resolve()
if not thumbnail_path.exists() or not thumbnail_path.is_file():
messages.error(doc.pk, "Thumbnail of document does not exist.")
return
present_files.discard(thumbnail_path)
try:
_ = thumbnail_path.read_bytes()
except OSError as e:
messages.error(doc.pk, f"Cannot read thumbnail file of document: {e}")
def _check_original(
doc: Document,
messages: SanityCheckMessages,
present_files: set[Path],
) -> None:
"""Verify the original file exists, is readable, and has matching checksum."""
source_path: Final[Path] = Path(doc.source_path).resolve()
if not source_path.exists() or not source_path.is_file():
messages.error(doc.pk, "Original of document does not exist.")
return
present_files.discard(source_path)
try:
checksum = hashlib.md5(source_path.read_bytes()).hexdigest()
except OSError as e:
messages.error(doc.pk, f"Cannot read original file of document: {e}")
else:
if checksum != doc.checksum:
messages.error(
doc.pk,
f"Checksum mismatch. Stored: {doc.checksum}, actual: {checksum}.",
)
def _check_archive(
doc: Document,
messages: SanityCheckMessages,
present_files: set[Path],
) -> None:
"""Verify archive file consistency: checksum/filename pairing and file integrity."""
if doc.archive_checksum is not None and doc.archive_filename is None:
messages.error(
doc.pk,
"Document has an archive file checksum, but no archive filename.",
)
elif doc.archive_checksum is None and doc.archive_filename is not None:
messages.error(
doc.pk,
"Document has an archive file, but its checksum is missing.",
)
elif doc.has_archive_version:
if TYPE_CHECKING:
assert isinstance(doc.archive_path, Path)
archive_path: Final[Path] = Path(doc.archive_path).resolve()
if not archive_path.exists() or not archive_path.is_file():
messages.error(doc.pk, "Archived version of document does not exist.")
return
present_files.discard(archive_path)
try:
checksum = hashlib.md5(archive_path.read_bytes()).hexdigest()
except OSError as e:
messages.error(
doc.pk,
f"Cannot read archive file of document : {e}",
)
for doc in tqdm(Document.global_objects.all(), disable=not progress):
# Check sanity of the thumbnail
thumbnail_path: Final[Path] = Path(doc.thumbnail_path).resolve()
if not thumbnail_path.exists() or not thumbnail_path.is_file():
messages.error(doc.pk, "Thumbnail of document does not exist.")
else:
if checksum != doc.archive_checksum:
messages.error(
doc.pk,
"Checksum mismatch of archived document. "
f"Stored: {doc.archive_checksum}, actual: {checksum}.",
)
if thumbnail_path in present_files:
present_files.remove(thumbnail_path)
try:
_ = thumbnail_path.read_bytes()
except OSError as e:
messages.error(doc.pk, f"Cannot read thumbnail file of document: {e}")
# Check sanity of the original file
# TODO: extract method
source_path: Final[Path] = Path(doc.source_path).resolve()
if not source_path.exists() or not source_path.is_file():
messages.error(doc.pk, "Original of document does not exist.")
else:
if source_path in present_files:
present_files.remove(source_path)
try:
checksum = hashlib.md5(source_path.read_bytes()).hexdigest()
except OSError as e:
messages.error(doc.pk, f"Cannot read original file of document: {e}")
else:
if checksum != doc.checksum:
messages.error(
doc.pk,
"Checksum mismatch. "
f"Stored: {doc.checksum}, actual: {checksum}.",
)
def _check_content(doc: Document, messages: SanityCheckMessages) -> None:
"""Flag documents with no OCR content."""
if not doc.content:
messages.info(doc.pk, "Document contains no OCR data")
# Check sanity of the archive file.
if doc.archive_checksum is not None and doc.archive_filename is None:
messages.error(
doc.pk,
"Document has an archive file checksum, but no archive filename.",
)
elif doc.archive_checksum is None and doc.archive_filename is not None:
messages.error(
doc.pk,
"Document has an archive file, but its checksum is missing.",
)
elif doc.has_archive_version:
archive_path: Final[Path] = Path(doc.archive_path).resolve()
if not archive_path.exists() or not archive_path.is_file():
messages.error(doc.pk, "Archived version of document does not exist.")
else:
if archive_path in present_files:
present_files.remove(archive_path)
try:
checksum = hashlib.md5(archive_path.read_bytes()).hexdigest()
except OSError as e:
messages.error(
doc.pk,
f"Cannot read archive file of document : {e}",
)
else:
if checksum != doc.archive_checksum:
messages.error(
doc.pk,
"Checksum mismatch of archived document. "
f"Stored: {doc.archive_checksum}, "
f"actual: {checksum}.",
)
def _check_document(
doc: Document,
messages: SanityCheckMessages,
present_files: set[Path],
) -> None:
"""Run all checks for a single document."""
_check_thumbnail(doc, messages, present_files)
_check_original(doc, messages, present_files)
_check_archive(doc, messages, present_files)
_check_content(doc, messages)
# ---------------------------------------------------------------------------
# Public entry point
# ---------------------------------------------------------------------------
def check_sanity(
*,
scheduled: bool = True,
iter_wrapper: IterWrapper[Document] = _identity,
) -> SanityCheckMessages:
"""Run a full sanity check on the document archive.
Args:
scheduled: Whether this is a scheduled (automatic) or manual check.
Controls the task type recorded in the database.
iter_wrapper: A callable that wraps the document iterable, e.g.,
for progress bar display. Defaults to identity (no wrapping).
Returns:
A SanityCheckMessages instance containing all detected issues.
"""
paperless_task = PaperlessTask.objects.create(
task_id=uuid.uuid4(),
type=(
PaperlessTask.TaskType.SCHEDULED_TASK
if scheduled
else PaperlessTask.TaskType.MANUAL_TASK
),
task_name=PaperlessTask.TaskName.CHECK_SANITY,
status=states.STARTED,
date_created=timezone.now(),
date_started=timezone.now(),
)
messages = SanityCheckMessages()
present_files = _build_present_files()
documents = Document.global_objects.all()
for doc in iter_wrapper(documents):
_check_document(doc, messages, present_files)
# other document checks
if not doc.content:
messages.info(doc.pk, "Document contains no OCR data")
for extra_file in present_files:
messages.warning(None, f"Orphaned file in media dir: {extra_file}")
paperless_task.status = states.SUCCESS if not messages.has_error else states.FAILURE
# result is concatenated messages
paperless_task.result = f"{len(messages)} issues found."
if messages.has_error:
paperless_task.result += " Check logs for details."
paperless_task.date_done = timezone.now()
paperless_task.save(update_fields=["status", "result", "date_done"])
return messages

View File

@@ -1,96 +1,10 @@
import shutil
import zoneinfo
from dataclasses import dataclass
from pathlib import Path
from typing import TYPE_CHECKING
import filelock
import pytest
from django.contrib.auth import get_user_model
from pytest_django.fixtures import SettingsWrapper
from rest_framework.test import APIClient
from documents.tests.factories import DocumentFactory
if TYPE_CHECKING:
from documents.models import Document
@dataclass(frozen=True, slots=True)
class PaperlessDirs:
"""Standard Paperless-ngx directory layout for tests."""
media: Path
originals: Path
archive: Path
thumbnails: Path
@pytest.fixture(scope="session")
def samples_dir() -> Path:
"""Path to the shared test sample documents."""
return Path(__file__).parent / "samples" / "documents"
@pytest.fixture()
def paperless_dirs(tmp_path: Path) -> PaperlessDirs:
"""Create and return the directory structure for testing."""
media = tmp_path / "media"
dirs = PaperlessDirs(
media=media,
originals=media / "documents" / "originals",
archive=media / "documents" / "archive",
thumbnails=media / "documents" / "thumbnails",
)
for d in (dirs.originals, dirs.archive, dirs.thumbnails):
d.mkdir(parents=True)
return dirs
@pytest.fixture()
def _media_settings(paperless_dirs: PaperlessDirs, settings) -> None:
"""Configure Django settings to point at temp directories."""
settings.MEDIA_ROOT = paperless_dirs.media
settings.ORIGINALS_DIR = paperless_dirs.originals
settings.ARCHIVE_DIR = paperless_dirs.archive
settings.THUMBNAIL_DIR = paperless_dirs.thumbnails
settings.MEDIA_LOCK = paperless_dirs.media / "media.lock"
settings.IGNORABLE_FILES = {".DS_Store", "Thumbs.db", "desktop.ini"}
settings.APP_LOGO = ""
@pytest.fixture()
def sample_doc(
paperless_dirs: PaperlessDirs,
_media_settings: None,
samples_dir: Path,
) -> "Document":
"""Create a document with valid files and matching checksums."""
with filelock.FileLock(paperless_dirs.media / "media.lock"):
shutil.copy(
samples_dir / "originals" / "0000001.pdf",
paperless_dirs.originals / "0000001.pdf",
)
shutil.copy(
samples_dir / "archive" / "0000001.pdf",
paperless_dirs.archive / "0000001.pdf",
)
shutil.copy(
samples_dir / "thumbnails" / "0000001.webp",
paperless_dirs.thumbnails / "0000001.webp",
)
return DocumentFactory(
title="test",
checksum="42995833e01aea9b3edee44bbfdd7ce1",
archive_checksum="62acb0bcbfbcaa62ca6ad3668e4e404b",
content="test content",
pk=1,
filename="0000001.pdf",
mime_type="application/pdf",
archive_filename="0000001.pdf",
)
@pytest.fixture()
def settings_timezone(settings: SettingsWrapper) -> zoneinfo.ZoneInfo:

View File

@@ -1,518 +0,0 @@
"""Tests for PaperlessCommand base class."""
from __future__ import annotations
import io
from typing import TYPE_CHECKING
import pytest
from django.core.management import CommandError
from django.db.models import QuerySet
from rich.console import Console
from documents.management.commands.base import PaperlessCommand
from documents.management.commands.base import ProcessResult
if TYPE_CHECKING:
from pytest_mock import MockerFixture
# --- Test Commands ---
# These simulate real command implementations for testing
class SimpleCommand(PaperlessCommand):
"""Command with default settings (progress bar, no multiprocessing)."""
help = "Simple test command"
def handle(self, *args, **options):
items = list(range(5))
results = []
for item in self.track(items, description="Processing..."):
results.append(item * 2)
self.stdout.write(f"Results: {results}")
class NoProgressBarCommand(PaperlessCommand):
"""Command with progress bar disabled."""
help = "No progress bar command"
supports_progress_bar = False
def handle(self, *args, **options):
items = list(range(3))
for _ in self.track(items):
# We don't need to actually work
pass
self.stdout.write("Done")
class MultiprocessCommand(PaperlessCommand):
"""Command with multiprocessing support."""
help = "Multiprocess test command"
supports_multiprocessing = True
def handle(self, *args, **options):
items = list(range(5))
results = []
for result in self.process_parallel(
_double_value,
items,
description="Processing...",
):
results.append(result)
successes = sum(1 for r in results if r.success)
self.stdout.write(f"Successes: {successes}")
# --- Helper Functions for Multiprocessing ---
# Must be at module level to be picklable
def _double_value(x: int) -> int:
"""Double the input value."""
return x * 2
def _divide_ten_by(x: int) -> float:
"""Divide 10 by x. Raises ZeroDivisionError if x is 0."""
return 10 / x
# --- Fixtures ---
@pytest.fixture
def console() -> Console:
"""Create a non-interactive console for testing."""
return Console(force_terminal=False, force_interactive=False)
@pytest.fixture
def simple_command(console: Console) -> SimpleCommand:
"""Create a SimpleCommand instance configured for testing."""
command = SimpleCommand()
command.stdout = io.StringIO()
command.stderr = io.StringIO()
command.console = console
command.no_progress_bar = True
command.process_count = 1
return command
@pytest.fixture
def multiprocess_command(console: Console) -> MultiprocessCommand:
"""Create a MultiprocessCommand instance configured for testing."""
command = MultiprocessCommand()
command.stdout = io.StringIO()
command.stderr = io.StringIO()
command.console = console
command.no_progress_bar = True
command.process_count = 1
return command
@pytest.fixture
def mock_queryset():
"""
Create a mock Django QuerySet that tracks method calls.
This verifies we use .count() instead of len() for querysets.
"""
class MockQuerySet(QuerySet):
def __init__(self, items: list):
self._items = items
self.count_called = False
def count(self) -> int:
self.count_called = True
return len(self._items)
def __iter__(self):
return iter(self._items)
def __len__(self):
raise AssertionError("len() should not be called on querysets")
return MockQuerySet
# --- Test Classes ---
@pytest.mark.management
class TestProcessResult:
"""Tests for the ProcessResult dataclass."""
def test_success_result(self):
result = ProcessResult(item=1, result=2, error=None)
assert result.item == 1
assert result.result == 2
assert result.error is None
assert result.success is True
def test_error_result(self):
error = ValueError("test error")
result = ProcessResult(item=1, result=None, error=error)
assert result.item == 1
assert result.result is None
assert result.error is error
assert result.success is False
@pytest.mark.management
class TestPaperlessCommandArguments:
"""Tests for argument parsing behavior."""
def test_progress_bar_argument_added_by_default(self):
command = SimpleCommand()
parser = command.create_parser("manage.py", "simple")
options = parser.parse_args(["--no-progress-bar"])
assert options.no_progress_bar is True
options = parser.parse_args([])
assert options.no_progress_bar is False
def test_progress_bar_argument_not_added_when_disabled(self):
command = NoProgressBarCommand()
parser = command.create_parser("manage.py", "noprogress")
options = parser.parse_args([])
assert not hasattr(options, "no_progress_bar")
def test_processes_argument_added_when_multiprocessing_enabled(self):
command = MultiprocessCommand()
parser = command.create_parser("manage.py", "multiprocess")
options = parser.parse_args(["--processes", "4"])
assert options.processes == 4
options = parser.parse_args([])
assert options.processes >= 1
def test_processes_argument_not_added_when_multiprocessing_disabled(self):
command = SimpleCommand()
parser = command.create_parser("manage.py", "simple")
options = parser.parse_args([])
assert not hasattr(options, "processes")
@pytest.mark.management
class TestPaperlessCommandExecute:
"""Tests for the execute() setup behavior."""
@pytest.fixture
def base_options(self) -> dict:
"""Base options required for execute()."""
return {
"verbosity": 1,
"no_color": True,
"force_color": False,
"skip_checks": True,
}
@pytest.mark.parametrize(
("no_progress_bar_flag", "expected"),
[
pytest.param(False, False, id="progress-bar-enabled"),
pytest.param(True, True, id="progress-bar-disabled"),
],
)
def test_no_progress_bar_state_set(
self,
base_options: dict,
*,
no_progress_bar_flag: bool,
expected: bool,
):
command = SimpleCommand()
command.stdout = io.StringIO()
command.stderr = io.StringIO()
options = {**base_options, "no_progress_bar": no_progress_bar_flag}
command.execute(**options)
assert command.no_progress_bar is expected
def test_no_progress_bar_always_true_when_not_supported(self, base_options: dict):
command = NoProgressBarCommand()
command.stdout = io.StringIO()
command.stderr = io.StringIO()
command.execute(**base_options)
assert command.no_progress_bar is True
@pytest.mark.parametrize(
("processes", "expected"),
[
pytest.param(1, 1, id="single-process"),
pytest.param(4, 4, id="four-processes"),
],
)
def test_process_count_set(
self,
base_options: dict,
processes: int,
expected: int,
):
command = MultiprocessCommand()
command.stdout = io.StringIO()
command.stderr = io.StringIO()
options = {**base_options, "processes": processes, "no_progress_bar": True}
command.execute(**options)
assert command.process_count == expected
@pytest.mark.parametrize(
"invalid_count",
[
pytest.param(0, id="zero"),
pytest.param(-1, id="negative"),
],
)
def test_process_count_validation_rejects_invalid(
self,
base_options: dict,
invalid_count: int,
):
command = MultiprocessCommand()
command.stdout = io.StringIO()
command.stderr = io.StringIO()
options = {**base_options, "processes": invalid_count, "no_progress_bar": True}
with pytest.raises(CommandError, match="--processes must be at least 1"):
command.execute(**options)
def test_process_count_defaults_to_one_when_not_supported(self, base_options: dict):
command = SimpleCommand()
command.stdout = io.StringIO()
command.stderr = io.StringIO()
options = {**base_options, "no_progress_bar": True}
command.execute(**options)
assert command.process_count == 1
@pytest.mark.management
class TestGetIterableLength:
"""Tests for the _get_iterable_length() method."""
def test_uses_count_for_querysets(
self,
simple_command: SimpleCommand,
mock_queryset,
):
"""Should call .count() on Django querysets rather than len()."""
queryset = mock_queryset([1, 2, 3, 4, 5])
result = simple_command._get_iterable_length(queryset)
assert result == 5
assert queryset.count_called is True
def test_uses_len_for_sized(self, simple_command: SimpleCommand):
"""Should use len() for sequences and other Sized types."""
result = simple_command._get_iterable_length([1, 2, 3, 4])
assert result == 4
def test_returns_none_for_unsized_iterables(self, simple_command: SimpleCommand):
"""Should return None for generators and other iterables without len()."""
result = simple_command._get_iterable_length(x for x in [1, 2, 3])
assert result is None
@pytest.mark.management
class TestTrack:
"""Tests for the track() method."""
def test_with_progress_bar_disabled(self, simple_command: SimpleCommand):
simple_command.no_progress_bar = True
items = ["a", "b", "c"]
result = list(simple_command.track(items, description="Test..."))
assert result == items
def test_with_progress_bar_enabled(self, simple_command: SimpleCommand):
simple_command.no_progress_bar = False
items = [1, 2, 3]
result = list(simple_command.track(items, description="Processing..."))
assert result == items
def test_with_explicit_total(self, simple_command: SimpleCommand):
simple_command.no_progress_bar = False
def gen():
yield from [1, 2, 3]
result = list(simple_command.track(gen(), total=3))
assert result == [1, 2, 3]
def test_with_generator_no_total(self, simple_command: SimpleCommand):
def gen():
yield from [1, 2, 3]
result = list(simple_command.track(gen()))
assert result == [1, 2, 3]
def test_empty_iterable(self, simple_command: SimpleCommand):
result = list(simple_command.track([]))
assert result == []
def test_uses_queryset_count(
self,
simple_command: SimpleCommand,
mock_queryset,
mocker: MockerFixture,
):
"""Verify track() uses .count() for querysets."""
simple_command.no_progress_bar = False
queryset = mock_queryset([1, 2, 3])
spy = mocker.spy(simple_command, "_get_iterable_length")
result = list(simple_command.track(queryset))
assert result == [1, 2, 3]
spy.assert_called_once_with(queryset)
assert queryset.count_called is True
@pytest.mark.management
class TestProcessParallel:
"""Tests for the process_parallel() method."""
def test_sequential_processing_single_process(
self,
multiprocess_command: MultiprocessCommand,
):
multiprocess_command.process_count = 1
items = [1, 2, 3, 4, 5]
results = list(multiprocess_command.process_parallel(_double_value, items))
assert len(results) == 5
assert all(r.success for r in results)
result_map = {r.item: r.result for r in results}
assert result_map == {1: 2, 2: 4, 3: 6, 4: 8, 5: 10}
def test_sequential_processing_handles_errors(
self,
multiprocess_command: MultiprocessCommand,
):
multiprocess_command.process_count = 1
items = [1, 2, 0, 4] # 0 causes ZeroDivisionError
results = list(multiprocess_command.process_parallel(_divide_ten_by, items))
assert len(results) == 4
successes = [r for r in results if r.success]
failures = [r for r in results if not r.success]
assert len(successes) == 3
assert len(failures) == 1
assert failures[0].item == 0
assert isinstance(failures[0].error, ZeroDivisionError)
def test_parallel_closes_db_connections(
self,
multiprocess_command: MultiprocessCommand,
mocker: MockerFixture,
):
multiprocess_command.process_count = 2
items = [1, 2, 3]
mock_connections = mocker.patch(
"documents.management.commands.base.db.connections",
)
results = list(multiprocess_command.process_parallel(_double_value, items))
mock_connections.close_all.assert_called_once()
assert len(results) == 3
def test_parallel_processing_handles_errors(
self,
multiprocess_command: MultiprocessCommand,
mocker: MockerFixture,
):
multiprocess_command.process_count = 2
items = [1, 2, 0, 4]
mocker.patch("documents.management.commands.base.db.connections")
results = list(multiprocess_command.process_parallel(_divide_ten_by, items))
failures = [r for r in results if not r.success]
assert len(failures) == 1
assert failures[0].item == 0
def test_empty_items(self, multiprocess_command: MultiprocessCommand):
results = list(multiprocess_command.process_parallel(_double_value, []))
assert results == []
def test_result_contains_original_item(
self,
multiprocess_command: MultiprocessCommand,
):
items = [10, 20, 30]
results = list(multiprocess_command.process_parallel(_double_value, items))
for result in results:
assert result.item in items
assert result.result == result.item * 2
def test_sequential_path_used_for_single_process(
self,
multiprocess_command: MultiprocessCommand,
mocker: MockerFixture,
):
"""Verify single process uses sequential path (important for testing)."""
multiprocess_command.process_count = 1
spy_sequential = mocker.spy(multiprocess_command, "_process_sequential")
spy_parallel = mocker.spy(multiprocess_command, "_process_parallel")
list(multiprocess_command.process_parallel(_double_value, [1, 2, 3]))
spy_sequential.assert_called_once()
spy_parallel.assert_not_called()
def test_parallel_path_used_for_multiple_processes(
self,
multiprocess_command: MultiprocessCommand,
mocker: MockerFixture,
):
"""Verify multiple processes uses parallel path."""
multiprocess_command.process_count = 2
mocker.patch("documents.management.commands.base.db.connections")
spy_sequential = mocker.spy(multiprocess_command, "_process_sequential")
spy_parallel = mocker.spy(multiprocess_command, "_process_parallel")
list(multiprocess_command.process_parallel(_double_value, [1, 2, 3]))
spy_parallel.assert_called_once()
spy_sequential.assert_not_called()

View File

@@ -1,173 +0,0 @@
"""Tests for the document_sanity_checker management command.
Verifies Rich rendering (table, panel, summary) and end-to-end CLI behavior.
"""
from __future__ import annotations
from io import StringIO
from pathlib import Path
from typing import TYPE_CHECKING
import pytest
from django.core.management import call_command
from rich.console import Console
from documents.management.commands.document_sanity_checker import Command
from documents.sanity_checker import SanityCheckMessages
from documents.tests.factories import DocumentFactory
if TYPE_CHECKING:
from documents.models import Document
from documents.tests.conftest import PaperlessDirs
def _render_to_string(messages: SanityCheckMessages) -> str:
"""Render command output to a plain string for assertion."""
buf = StringIO()
cmd = Command()
cmd.console = Console(file=buf, width=120, no_color=True)
cmd._render_results(messages)
return buf.getvalue()
# ---------------------------------------------------------------------------
# Rich rendering
# ---------------------------------------------------------------------------
class TestRenderResultsNoIssues:
"""No DB access needed -- renders an empty SanityCheckMessages."""
def test_shows_panel(self) -> None:
output = _render_to_string(SanityCheckMessages())
assert "No issues detected" in output
assert "Sanity Check" in output
@pytest.mark.django_db
class TestRenderResultsWithIssues:
def test_error_row(self, sample_doc: Document) -> None:
msgs = SanityCheckMessages()
msgs.error(sample_doc.pk, "Original missing")
output = _render_to_string(msgs)
assert "Sanity Check Results" in output
assert "ERROR" in output
assert "Original missing" in output
assert f"#{sample_doc.pk}" in output
assert sample_doc.title in output
def test_warning_row(self, sample_doc: Document) -> None:
msgs = SanityCheckMessages()
msgs.warning(sample_doc.pk, "Suspicious file")
output = _render_to_string(msgs)
assert "WARN" in output
assert "Suspicious file" in output
def test_info_row(self, sample_doc: Document) -> None:
msgs = SanityCheckMessages()
msgs.info(sample_doc.pk, "No OCR data")
output = _render_to_string(msgs)
assert "INFO" in output
assert "No OCR data" in output
@pytest.mark.usefixtures("_media_settings")
def test_global_message(self) -> None:
msgs = SanityCheckMessages()
msgs.warning(None, "Orphaned file: /tmp/stray.pdf")
output = _render_to_string(msgs)
assert "(global)" in output
assert "Orphaned file" in output
def test_multiple_messages_same_doc(self, sample_doc: Document) -> None:
msgs = SanityCheckMessages()
msgs.error(sample_doc.pk, "Thumbnail missing")
msgs.error(sample_doc.pk, "Checksum mismatch")
output = _render_to_string(msgs)
assert "Thumbnail missing" in output
assert "Checksum mismatch" in output
@pytest.mark.usefixtures("_media_settings")
def test_unknown_doc_pk(self) -> None:
msgs = SanityCheckMessages()
msgs.error(99999, "Ghost document")
output = _render_to_string(msgs)
assert "#99999" in output
assert "Unknown" in output
@pytest.mark.django_db
class TestRenderResultsSummary:
def test_errors_only(self, sample_doc: Document) -> None:
msgs = SanityCheckMessages()
msgs.error(sample_doc.pk, "broken")
output = _render_to_string(msgs)
assert "errors" in output
assert "Found 1 document(s)" in output
def test_warnings_only(self, sample_doc: Document) -> None:
msgs = SanityCheckMessages()
msgs.warning(sample_doc.pk, "odd")
output = _render_to_string(msgs)
assert "warnings" in output
def test_errors_and_warnings(self, sample_doc: Document) -> None:
msgs = SanityCheckMessages()
msgs.error(sample_doc.pk, "broken")
msgs.warning(None, "orphan")
output = _render_to_string(msgs)
assert "errors" in output
assert "warnings" in output
assert "Found 2 document(s)" in output
def test_infos_only(self, sample_doc: Document) -> None:
msgs = SanityCheckMessages()
msgs.info(sample_doc.pk, "no OCR")
output = _render_to_string(msgs)
assert "infos" in output
# ---------------------------------------------------------------------------
# End-to-end command execution
# ---------------------------------------------------------------------------
@pytest.mark.django_db
@pytest.mark.management
class TestDocumentSanityCheckerCommand:
def test_no_issues(self, sample_doc: Document) -> None:
out = StringIO()
call_command("document_sanity_checker", "--no-progress-bar", stdout=out)
assert "No issues detected" in out.getvalue()
@pytest.mark.usefixtures("_media_settings")
def test_no_issues_empty_archive(self) -> None:
out = StringIO()
call_command("document_sanity_checker", "--no-progress-bar", stdout=out)
assert "No issues detected" in out.getvalue()
def test_missing_original(self, sample_doc: Document) -> None:
Path(sample_doc.source_path).unlink()
out = StringIO()
call_command("document_sanity_checker", "--no-progress-bar", stdout=out)
output = out.getvalue()
assert "ERROR" in output
assert "Original of document does not exist" in output
@pytest.mark.usefixtures("_media_settings")
def test_checksum_mismatch(self, paperless_dirs: PaperlessDirs) -> None:
"""Lightweight document with zero-byte files triggers checksum mismatch."""
doc = DocumentFactory(
title="test",
content="test",
filename="test.pdf",
checksum="abc",
)
Path(doc.source_path).touch()
Path(doc.thumbnail_path).touch()
out = StringIO()
call_command("document_sanity_checker", "--no-progress-bar", stdout=out)
output = out.getvalue()
assert "ERROR" in output
assert "Checksum mismatch. Stored: abc, actual:" in output

View File

@@ -4,7 +4,6 @@ from io import StringIO
from pathlib import Path
from unittest import mock
import pytest
from auditlog.models import LogEntry
from django.contrib.contenttypes.models import ContentType
from django.core.management import call_command
@@ -20,7 +19,6 @@ from documents.tests.utils import FileSystemAssertsMixin
sample_file: Path = Path(__file__).parent / "samples" / "simple.pdf"
@pytest.mark.management
@override_settings(FILENAME_FORMAT="{correspondent}/{title}")
class TestArchiver(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
def make_models(self):
@@ -96,7 +94,6 @@ class TestArchiver(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
self.assertEqual(doc2.archive_filename, "document_01.pdf")
@pytest.mark.management
class TestMakeIndex(TestCase):
@mock.patch("documents.management.commands.document_index.index_reindex")
def test_reindex(self, m) -> None:
@@ -109,7 +106,6 @@ class TestMakeIndex(TestCase):
m.assert_called_once()
@pytest.mark.management
class TestRenamer(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
@override_settings(FILENAME_FORMAT="")
def test_rename(self) -> None:
@@ -134,7 +130,6 @@ class TestRenamer(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
self.assertIsFile(doc2.archive_path)
@pytest.mark.management
class TestCreateClassifier(TestCase):
@mock.patch(
"documents.management.commands.document_create_classifier.train_classifier",
@@ -145,7 +140,31 @@ class TestCreateClassifier(TestCase):
m.assert_called_once()
@pytest.mark.management
class TestSanityChecker(DirectoriesMixin, TestCase):
def test_no_issues(self) -> None:
with self.assertLogs() as capture:
call_command("document_sanity_checker")
self.assertEqual(len(capture.output), 1)
self.assertIn("Sanity checker detected no issues.", capture.output[0])
def test_errors(self) -> None:
doc = Document.objects.create(
title="test",
content="test",
filename="test.pdf",
checksum="abc",
)
Path(doc.source_path).touch()
Path(doc.thumbnail_path).touch()
with self.assertLogs() as capture:
call_command("document_sanity_checker")
self.assertEqual(len(capture.output), 2)
self.assertIn("Checksum mismatch. Stored: abc, actual:", capture.output[1])
class TestConvertMariaDBUUID(TestCase):
@mock.patch("django.db.connection.schema_editor")
def test_convert(self, m) -> None:
@@ -159,7 +178,6 @@ class TestConvertMariaDBUUID(TestCase):
self.assertIn("Successfully converted", stdout.getvalue())
@pytest.mark.management
class TestPruneAuditLogs(TestCase):
def test_prune_audit_logs(self) -> None:
LogEntry.objects.create(

View File

@@ -577,7 +577,6 @@ class TestTagsFromPath:
assert len(tag_ids) == 0
@pytest.mark.management
class TestCommandValidation:
"""Tests for command argument validation."""
@@ -606,7 +605,6 @@ class TestCommandValidation:
cmd.handle(directory=str(sample_pdf), oneshot=True, testing=False)
@pytest.mark.management
@pytest.mark.usefixtures("mock_supported_extensions")
class TestCommandOneshot:
"""Tests for oneshot mode."""
@@ -777,7 +775,6 @@ def start_consumer(
)
@pytest.mark.management
@pytest.mark.django_db
class TestCommandWatch:
"""Integration tests for the watch loop."""
@@ -899,7 +896,6 @@ class TestCommandWatch:
assert not thread.is_alive()
@pytest.mark.management
@pytest.mark.django_db
class TestCommandWatchPolling:
"""Tests for polling mode."""
@@ -932,7 +928,6 @@ class TestCommandWatchPolling:
mock_consume_file_delay.delay.assert_called()
@pytest.mark.management
@pytest.mark.django_db
class TestCommandWatchRecursive:
"""Tests for recursive watching."""
@@ -996,7 +991,6 @@ class TestCommandWatchRecursive:
assert len(overrides.tag_ids) == 2
@pytest.mark.management
@pytest.mark.django_db
class TestCommandWatchEdgeCases:
"""Tests for edge cases and error handling."""

View File

@@ -7,7 +7,6 @@ from pathlib import Path
from unittest import mock
from zipfile import ZipFile
import pytest
from allauth.socialaccount.models import SocialAccount
from allauth.socialaccount.models import SocialApp
from allauth.socialaccount.models import SocialToken
@@ -46,7 +45,6 @@ from documents.tests.utils import paperless_environment
from paperless_mail.models import MailAccount
@pytest.mark.management
class TestExportImport(
DirectoriesMixin,
FileSystemAssertsMixin,
@@ -848,7 +846,6 @@ class TestExportImport(
self.assertEqual(Document.objects.all().count(), 4)
@pytest.mark.management
class TestCryptExportImport(
DirectoriesMixin,
FileSystemAssertsMixin,

View File

@@ -1,6 +1,5 @@
from io import StringIO
import pytest
from django.core.management import CommandError
from django.core.management import call_command
from django.test import TestCase
@@ -8,7 +7,6 @@ from django.test import TestCase
from documents.models import Document
@pytest.mark.management
class TestFuzzyMatchCommand(TestCase):
MSG_REGEX = r"Document \d fuzzy match to \d \(confidence \d\d\.\d\d\d\)"
@@ -51,6 +49,19 @@ class TestFuzzyMatchCommand(TestCase):
self.call_command("--ratio", "101")
self.assertIn("The ratio must be between 0 and 100", str(e.exception))
def test_invalid_process_count(self) -> None:
"""
GIVEN:
- Invalid process count less than 0 above upper
WHEN:
- Command is called
THEN:
- Error is raised indicating issue
"""
with self.assertRaises(CommandError) as e:
self.call_command("--processes", "0")
self.assertIn("There must be at least 1 process", str(e.exception))
def test_no_matches(self) -> None:
"""
GIVEN:
@@ -140,7 +151,7 @@ class TestFuzzyMatchCommand(TestCase):
mime_type="application/pdf",
filename="final_test.pdf",
)
stdout, _ = self.call_command("--no-progress-bar")
stdout, _ = self.call_command()
lines = [x.strip() for x in stdout.splitlines() if x.strip()]
self.assertEqual(len(lines), 3)
for line in lines:
@@ -183,7 +194,7 @@ class TestFuzzyMatchCommand(TestCase):
self.assertEqual(Document.objects.count(), 3)
stdout, _ = self.call_command("--delete", "--no-progress-bar")
stdout, _ = self.call_command("--delete")
self.assertIn(
"The command is configured to delete documents. Use with caution",

View File

@@ -4,7 +4,6 @@ from io import StringIO
from pathlib import Path
from zipfile import ZipFile
import pytest
from django.contrib.auth.models import User
from django.core.management import call_command
from django.core.management.base import CommandError
@@ -19,7 +18,6 @@ from documents.tests.utils import FileSystemAssertsMixin
from documents.tests.utils import SampleDirMixin
@pytest.mark.management
class TestCommandImport(
DirectoriesMixin,
FileSystemAssertsMixin,

View File

@@ -1,4 +1,3 @@
import pytest
from django.core.management import call_command
from django.core.management.base import CommandError
from django.test import TestCase
@@ -11,7 +10,6 @@ from documents.models import Tag
from documents.tests.utils import DirectoriesMixin
@pytest.mark.management
class TestRetagger(DirectoriesMixin, TestCase):
def make_models(self) -> None:
self.sp1 = StoragePath.objects.create(

View File

@@ -2,7 +2,6 @@ import os
from io import StringIO
from unittest import mock
import pytest
from django.contrib.auth.models import User
from django.core.management import call_command
from django.test import TestCase
@@ -10,7 +9,6 @@ from django.test import TestCase
from documents.tests.utils import DirectoriesMixin
@pytest.mark.management
class TestManageSuperUser(DirectoriesMixin, TestCase):
def call_command(self, environ):
out = StringIO()

View File

@@ -2,7 +2,6 @@ import shutil
from pathlib import Path
from unittest import mock
import pytest
from django.core.management import call_command
from django.test import TestCase
@@ -13,7 +12,6 @@ from documents.tests.utils import DirectoriesMixin
from documents.tests.utils import FileSystemAssertsMixin
@pytest.mark.management
class TestMakeThumbnails(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
def make_models(self) -> None:
self.d1 = Document.objects.create(

View File

@@ -1,295 +1,192 @@
"""Tests for the sanity checker module.
Tests exercise ``check_sanity`` as a whole, verifying document validation,
orphan detection, task recording, and the iter_wrapper contract.
"""
from __future__ import annotations
import logging
import shutil
from pathlib import Path
from typing import TYPE_CHECKING
import pytest
import filelock
from django.conf import settings
from django.test import TestCase
from django.test import override_settings
from documents.models import Document
from documents.models import PaperlessTask
from documents.sanity_checker import check_sanity
if TYPE_CHECKING:
from collections.abc import Iterable
from documents.tests.conftest import PaperlessDirs
from documents.tests.utils import DirectoriesMixin
@pytest.mark.django_db
class TestCheckSanityNoDocuments:
"""Sanity checks against an empty archive."""
class TestSanityCheck(DirectoriesMixin, TestCase):
def make_test_data(self):
with filelock.FileLock(settings.MEDIA_LOCK):
# just make sure that the lockfile is present.
shutil.copy(
(
Path(__file__).parent
/ "samples"
/ "documents"
/ "originals"
/ "0000001.pdf"
),
Path(self.dirs.originals_dir) / "0000001.pdf",
)
shutil.copy(
(
Path(__file__).parent
/ "samples"
/ "documents"
/ "archive"
/ "0000001.pdf"
),
Path(self.dirs.archive_dir) / "0000001.pdf",
)
shutil.copy(
(
Path(__file__).parent
/ "samples"
/ "documents"
/ "thumbnails"
/ "0000001.webp"
),
Path(self.dirs.thumbnail_dir) / "0000001.webp",
)
@pytest.mark.usefixtures("_media_settings")
def test_no_documents(self) -> None:
return Document.objects.create(
title="test",
checksum="42995833e01aea9b3edee44bbfdd7ce1",
archive_checksum="62acb0bcbfbcaa62ca6ad3668e4e404b",
content="test",
pk=1,
filename="0000001.pdf",
mime_type="application/pdf",
archive_filename="0000001.pdf",
)
def assertSanityError(self, doc: Document, messageRegex) -> None:
messages = check_sanity()
assert not messages.has_error
assert not messages.has_warning
assert len(messages) == 0
@pytest.mark.usefixtures("_media_settings")
def test_no_issues_logs_clean(self, caplog: pytest.LogCaptureFixture) -> None:
messages = check_sanity()
with caplog.at_level(logging.INFO, logger="paperless.sanity_checker"):
self.assertTrue(messages.has_error)
with self.assertLogs() as capture:
messages.log_messages()
assert "Sanity checker detected no issues." in caplog.text
@pytest.mark.django_db
class TestCheckSanityHealthyDocument:
def test_no_errors(self, sample_doc: Document) -> None:
messages = check_sanity()
assert not messages.has_error
assert not messages.has_warning
assert len(messages) == 0
@pytest.mark.django_db
class TestCheckSanityThumbnail:
def test_missing(self, sample_doc: Document) -> None:
Path(sample_doc.thumbnail_path).unlink()
messages = check_sanity()
assert messages.has_error
assert any(
"Thumbnail of document does not exist" in m["message"]
for m in messages[sample_doc.pk]
)
def test_unreadable(self, sample_doc: Document) -> None:
thumb = Path(sample_doc.thumbnail_path)
thumb.chmod(0o000)
try:
messages = check_sanity()
assert messages.has_error
assert any(
"Cannot read thumbnail" in m["message"] for m in messages[sample_doc.pk]
self.assertEqual(
capture.records[0].message,
f"Detected following issue(s) with document #{doc.pk}, titled {doc.title}",
)
finally:
thumb.chmod(0o644)
self.assertRegex(capture.records[1].message, messageRegex)
@pytest.mark.django_db
class TestCheckSanityOriginal:
def test_missing(self, sample_doc: Document) -> None:
Path(sample_doc.source_path).unlink()
def test_no_issues(self) -> None:
self.make_test_data()
messages = check_sanity()
assert messages.has_error
assert any(
"Original of document does not exist" in m["message"]
for m in messages[sample_doc.pk]
)
def test_checksum_mismatch(self, sample_doc: Document) -> None:
sample_doc.checksum = "badhash"
sample_doc.save()
messages = check_sanity()
assert messages.has_error
assert any(
"Checksum mismatch" in m["message"] and "badhash" in m["message"]
for m in messages[sample_doc.pk]
)
def test_unreadable(self, sample_doc: Document) -> None:
src = Path(sample_doc.source_path)
src.chmod(0o000)
try:
messages = check_sanity()
assert messages.has_error
assert any(
"Cannot read original" in m["message"] for m in messages[sample_doc.pk]
self.assertFalse(messages.has_error)
self.assertFalse(messages.has_warning)
with self.assertLogs() as capture:
messages.log_messages()
self.assertEqual(len(capture.output), 1)
self.assertEqual(capture.records[0].levelno, logging.INFO)
self.assertEqual(
capture.records[0].message,
"Sanity checker detected no issues.",
)
finally:
src.chmod(0o644)
def test_no_docs(self) -> None:
self.assertEqual(len(check_sanity()), 0)
@pytest.mark.django_db
class TestCheckSanityArchive:
def test_checksum_without_filename(self, sample_doc: Document) -> None:
sample_doc.archive_filename = None
sample_doc.save()
def test_success(self) -> None:
self.make_test_data()
self.assertEqual(len(check_sanity()), 0)
def test_no_thumbnail(self) -> None:
doc = self.make_test_data()
Path(doc.thumbnail_path).unlink()
self.assertSanityError(doc, "Thumbnail of document does not exist")
def test_thumbnail_no_access(self) -> None:
doc = self.make_test_data()
Path(doc.thumbnail_path).chmod(0o000)
self.assertSanityError(doc, "Cannot read thumbnail file of document")
Path(doc.thumbnail_path).chmod(0o777)
def test_no_original(self) -> None:
doc = self.make_test_data()
Path(doc.source_path).unlink()
self.assertSanityError(doc, "Original of document does not exist.")
def test_original_no_access(self) -> None:
doc = self.make_test_data()
Path(doc.source_path).chmod(0o000)
self.assertSanityError(doc, "Cannot read original file of document")
Path(doc.source_path).chmod(0o777)
def test_original_checksum_mismatch(self) -> None:
doc = self.make_test_data()
doc.checksum = "WOW"
doc.save()
self.assertSanityError(doc, "Checksum mismatch. Stored: WOW, actual: ")
def test_no_archive(self) -> None:
doc = self.make_test_data()
Path(doc.archive_path).unlink()
self.assertSanityError(doc, "Archived version of document does not exist.")
def test_archive_no_access(self) -> None:
doc = self.make_test_data()
Path(doc.archive_path).chmod(0o000)
self.assertSanityError(doc, "Cannot read archive file of document")
Path(doc.archive_path).chmod(0o777)
def test_archive_checksum_mismatch(self) -> None:
doc = self.make_test_data()
doc.archive_checksum = "WOW"
doc.save()
self.assertSanityError(doc, "Checksum mismatch of archived document")
def test_empty_content(self) -> None:
doc = self.make_test_data()
doc.content = ""
doc.save()
messages = check_sanity()
assert messages.has_error
assert any(
"checksum, but no archive filename" in m["message"]
for m in messages[sample_doc.pk]
self.assertFalse(messages.has_error)
self.assertFalse(messages.has_warning)
self.assertEqual(len(messages), 1)
self.assertRegex(
messages[doc.pk][0]["message"],
"Document contains no OCR data",
)
def test_filename_without_checksum(self, sample_doc: Document) -> None:
sample_doc.archive_checksum = None
sample_doc.save()
def test_orphaned_file(self) -> None:
self.make_test_data()
Path(self.dirs.originals_dir, "orphaned").touch()
messages = check_sanity()
assert messages.has_error
assert any(
"checksum is missing" in m["message"] for m in messages[sample_doc.pk]
self.assertTrue(messages.has_warning)
self.assertRegex(
messages._messages[None][0]["message"],
"Orphaned file in media dir",
)
def test_missing_file(self, sample_doc: Document) -> None:
Path(sample_doc.archive_path).unlink()
messages = check_sanity()
assert messages.has_error
assert any(
"Archived version of document does not exist" in m["message"]
for m in messages[sample_doc.pk]
)
def test_checksum_mismatch(self, sample_doc: Document) -> None:
sample_doc.archive_checksum = "wronghash"
sample_doc.save()
messages = check_sanity()
assert messages.has_error
assert any(
"Checksum mismatch of archived document" in m["message"]
for m in messages[sample_doc.pk]
)
def test_unreadable(self, sample_doc: Document) -> None:
archive = Path(sample_doc.archive_path)
archive.chmod(0o000)
try:
messages = check_sanity()
assert messages.has_error
assert any(
"Cannot read archive" in m["message"] for m in messages[sample_doc.pk]
)
finally:
archive.chmod(0o644)
def test_no_archive_at_all(self, sample_doc: Document) -> None:
"""Document with neither archive checksum nor filename is valid."""
Path(sample_doc.archive_path).unlink()
sample_doc.archive_checksum = None
sample_doc.archive_filename = None
sample_doc.save()
messages = check_sanity()
assert not messages.has_error
@pytest.mark.django_db
class TestCheckSanityContent:
@pytest.mark.parametrize(
"content",
[
pytest.param("", id="empty-string"),
],
@override_settings(
APP_LOGO="logo/logo.png",
)
def test_no_content(self, sample_doc: Document, content: str) -> None:
sample_doc.content = content
sample_doc.save()
def test_ignore_logo(self) -> None:
self.make_test_data()
logo_dir = Path(self.dirs.media_dir, "logo")
logo_dir.mkdir(parents=True, exist_ok=True)
Path(self.dirs.media_dir, "logo", "logo.png").touch()
messages = check_sanity()
assert not messages.has_error
assert not messages.has_warning
assert any("no OCR data" in m["message"] for m in messages[sample_doc.pk])
self.assertFalse(messages.has_warning)
@pytest.mark.django_db
class TestCheckSanityOrphans:
def test_orphaned_file(
self,
sample_doc: Document,
paperless_dirs: PaperlessDirs,
) -> None:
(paperless_dirs.originals / "orphan.pdf").touch()
def test_ignore_ignorable_files(self) -> None:
self.make_test_data()
Path(self.dirs.media_dir, ".DS_Store").touch()
Path(self.dirs.media_dir, "desktop.ini").touch()
messages = check_sanity()
assert messages.has_warning
assert any("Orphaned file" in m["message"] for m in messages[None])
self.assertFalse(messages.has_warning)
@pytest.mark.usefixtures("_media_settings")
def test_ignorable_files_not_flagged(
self,
paperless_dirs: PaperlessDirs,
) -> None:
(paperless_dirs.media / ".DS_Store").touch()
(paperless_dirs.media / "desktop.ini").touch()
messages = check_sanity()
assert not messages.has_warning
def test_archive_filename_no_checksum(self) -> None:
doc = self.make_test_data()
doc.archive_checksum = None
doc.save()
self.assertSanityError(doc, "has an archive file, but its checksum is missing.")
@pytest.mark.django_db
class TestCheckSanityIterWrapper:
def test_wrapper_receives_documents(self, sample_doc: Document) -> None:
seen: list[Document] = []
def tracking(iterable: Iterable[Document]) -> Iterable[Document]:
for item in iterable:
seen.append(item)
yield item
check_sanity(iter_wrapper=tracking)
assert len(seen) == 1
assert seen[0].pk == sample_doc.pk
def test_default_works_without_wrapper(self, sample_doc: Document) -> None:
messages = check_sanity()
assert not messages.has_error
@pytest.mark.django_db
class TestCheckSanityTaskRecording:
@pytest.mark.parametrize(
("expected_type", "scheduled"),
[
pytest.param(PaperlessTask.TaskType.SCHEDULED_TASK, True, id="scheduled"),
pytest.param(PaperlessTask.TaskType.MANUAL_TASK, False, id="manual"),
],
)
@pytest.mark.usefixtures("_media_settings")
def test_task_type(self, expected_type: str, *, scheduled: bool) -> None:
check_sanity(scheduled=scheduled)
task = PaperlessTask.objects.latest("date_created")
assert task.task_name == PaperlessTask.TaskName.CHECK_SANITY
assert task.type == expected_type
def test_success_status(self, sample_doc: Document) -> None:
check_sanity()
task = PaperlessTask.objects.latest("date_created")
assert task.status == "SUCCESS"
def test_failure_status(self, sample_doc: Document) -> None:
Path(sample_doc.source_path).unlink()
check_sanity()
task = PaperlessTask.objects.latest("date_created")
assert task.status == "FAILURE"
assert "Check logs for details" in task.result
@pytest.mark.django_db
class TestCheckSanityLogMessages:
def test_logs_doc_issues(
self,
sample_doc: Document,
caplog: pytest.LogCaptureFixture,
) -> None:
Path(sample_doc.source_path).unlink()
messages = check_sanity()
with caplog.at_level(logging.INFO, logger="paperless.sanity_checker"):
messages.log_messages()
assert f"document #{sample_doc.pk}" in caplog.text
assert "Original of document does not exist" in caplog.text
def test_logs_global_issues(
self,
sample_doc: Document,
paperless_dirs: PaperlessDirs,
caplog: pytest.LogCaptureFixture,
) -> None:
(paperless_dirs.originals / "orphan.pdf").touch()
messages = check_sanity()
with caplog.at_level(logging.WARNING, logger="paperless.sanity_checker"):
messages.log_messages()
assert "Orphaned file" in caplog.text
@pytest.mark.usefixtures("_media_settings")
def test_logs_unknown_doc_pk(self, caplog: pytest.LogCaptureFixture) -> None:
"""A doc PK not in the DB logs 'Unknown' as the title."""
messages = check_sanity()
messages.error(99999, "Ghost document")
with caplog.at_level(logging.INFO, logger="paperless.sanity_checker"):
messages.log_messages()
assert "#99999" in caplog.text
assert "Unknown" in caplog.text
def test_archive_checksum_no_filename(self) -> None:
doc = self.make_test_data()
doc.archive_filename = None
doc.save()
self.assertSanityError(
doc,
"has an archive file checksum, but no archive filename.",
)

View File

@@ -202,3 +202,43 @@ def audit_log_check(app_configs, **kwargs):
)
return result
@register()
def check_deprecated_db_settings(
app_configs: object,
**kwargs: object,
) -> list[Warning]:
"""Check for deprecated database environment variables.
Detects legacy advanced options that should be migrated to
PAPERLESS_DB_OPTIONS. Returns one Warning per deprecated variable found.
"""
deprecated_vars: dict[str, str] = {
"PAPERLESS_DB_TIMEOUT": "timeout",
"PAPERLESS_DB_POOLSIZE": "pool.min_size / pool.max_size",
"PAPERLESS_DBSSLMODE": "sslmode",
"PAPERLESS_DBSSLROOTCERT": "sslrootcert",
"PAPERLESS_DBSSLCERT": "sslcert",
"PAPERLESS_DBSSLKEY": "sslkey",
}
warnings: list[Warning] = []
for var_name, db_option_key in deprecated_vars.items():
if not os.getenv(var_name):
continue
warnings.append(
Warning(
f"Deprecated environment variable: {var_name}",
hint=(
f"{var_name} is no longer supported and will be removed in v3.2. "
f"Set the equivalent option via PAPERLESS_DB_OPTIONS instead. "
f'Example: PAPERLESS_DB_OPTIONS=\'{{"{db_option_key}": "<value>"}}\'. '
"See https://docs.paperless-ngx.com/migration/ for the full reference."
),
id="paperless.W001",
),
)
return warnings

View File

@@ -17,6 +17,8 @@ from dateparser.languages.loader import LocaleDataLoader
from django.utils.translation import gettext_lazy as _
from dotenv import load_dotenv
from paperless.settings.custom import parse_db_settings
logger = logging.getLogger("paperless.settings")
# Tap paperless.conf if it's available
@@ -282,7 +284,7 @@ DEBUG = __get_boolean("PAPERLESS_DEBUG", "NO")
# Directories #
###############################################################################
BASE_DIR: Path = Path(__file__).resolve().parent.parent
BASE_DIR: Path = Path(__file__).resolve().parent.parent.parent
STATIC_ROOT = __get_path("PAPERLESS_STATICDIR", BASE_DIR.parent / "static")
@@ -722,83 +724,8 @@ EMAIL_CERTIFICATE_FILE = __get_optional_path("PAPERLESS_EMAIL_CERTIFICATE_LOCATI
###############################################################################
# Database #
###############################################################################
def _parse_db_settings() -> dict:
databases = {
"default": {
"ENGINE": "django.db.backends.sqlite3",
"NAME": DATA_DIR / "db.sqlite3",
"OPTIONS": {},
},
}
if os.getenv("PAPERLESS_DBHOST"):
# Have sqlite available as a second option for management commands
# This is important when migrating to/from sqlite
databases["sqlite"] = databases["default"].copy()
databases["default"] = {
"HOST": os.getenv("PAPERLESS_DBHOST"),
"NAME": os.getenv("PAPERLESS_DBNAME", "paperless"),
"USER": os.getenv("PAPERLESS_DBUSER", "paperless"),
"PASSWORD": os.getenv("PAPERLESS_DBPASS", "paperless"),
"OPTIONS": {},
}
if os.getenv("PAPERLESS_DBPORT"):
databases["default"]["PORT"] = os.getenv("PAPERLESS_DBPORT")
# Leave room for future extensibility
if os.getenv("PAPERLESS_DBENGINE") == "mariadb":
engine = "django.db.backends.mysql"
# Contrary to Postgres, Django does not natively support connection pooling for MariaDB.
# However, since MariaDB uses threads instead of forks, establishing connections is significantly faster
# compared to PostgreSQL, so the lack of pooling is not an issue
options = {
"read_default_file": "/etc/mysql/my.cnf",
"charset": "utf8mb4",
"ssl_mode": os.getenv("PAPERLESS_DBSSLMODE", "PREFERRED"),
"ssl": {
"ca": os.getenv("PAPERLESS_DBSSLROOTCERT", None),
"cert": os.getenv("PAPERLESS_DBSSLCERT", None),
"key": os.getenv("PAPERLESS_DBSSLKEY", None),
},
}
else: # Default to PostgresDB
engine = "django.db.backends.postgresql"
options = {
"sslmode": os.getenv("PAPERLESS_DBSSLMODE", "prefer"),
"sslrootcert": os.getenv("PAPERLESS_DBSSLROOTCERT", None),
"sslcert": os.getenv("PAPERLESS_DBSSLCERT", None),
"sslkey": os.getenv("PAPERLESS_DBSSLKEY", None),
}
if int(os.getenv("PAPERLESS_DB_POOLSIZE", 0)) > 0:
options.update(
{
"pool": {
"min_size": 1,
"max_size": int(os.getenv("PAPERLESS_DB_POOLSIZE")),
},
},
)
databases["default"]["ENGINE"] = engine
databases["default"]["OPTIONS"].update(options)
if os.getenv("PAPERLESS_DB_TIMEOUT") is not None:
if databases["default"]["ENGINE"] == "django.db.backends.sqlite3":
databases["default"]["OPTIONS"].update(
{"timeout": int(os.getenv("PAPERLESS_DB_TIMEOUT"))},
)
else:
databases["default"]["OPTIONS"].update(
{"connect_timeout": int(os.getenv("PAPERLESS_DB_TIMEOUT"))},
)
databases["sqlite"]["OPTIONS"].update(
{"timeout": int(os.getenv("PAPERLESS_DB_TIMEOUT"))},
)
return databases
DATABASES = _parse_db_settings()
DATABASES = parse_db_settings(DATA_DIR)
if os.getenv("PAPERLESS_DBENGINE") == "mariadb":
# Silence Django error on old MariaDB versions.

View File

@@ -0,0 +1,133 @@
import os
from pathlib import Path
from typing import Any
from paperless.settings.parsers import get_choice_from_env
from paperless.settings.parsers import get_int_from_env
from paperless.settings.parsers import parse_dict_from_str
def parse_db_settings(data_dir: Path) -> dict[str, dict[str, Any]]:
"""Parse database settings from environment variables.
Core connection variables (no deprecation):
- PAPERLESS_DBENGINE (sqlite/postgresql/mariadb)
- PAPERLESS_DBHOST, PAPERLESS_DBPORT
- PAPERLESS_DBNAME, PAPERLESS_DBUSER, PAPERLESS_DBPASS
Advanced options can be set via:
- Legacy individual env vars (deprecated in v3.0, removed in v3.2)
- PAPERLESS_DB_OPTIONS (recommended v3+ approach)
Args:
data_dir: The data directory path for SQLite database location.
Returns:
A databases dict suitable for Django DATABASES setting.
"""
try:
engine = get_choice_from_env(
"PAPERLESS_DBENGINE",
{"sqlite", "postgresql", "mariadb"},
default="sqlite",
)
except ValueError:
# MariaDB users already had to set PAPERLESS_DBENGINE, so it was picked up above
# SQLite users didn't need to set anything
engine = "postgresql" if "PAPERLESS_DBHOST" in os.environ else "sqlite"
db_config: dict[str, Any]
base_options: dict[str, Any]
match engine:
case "sqlite":
db_config = {
"ENGINE": "django.db.backends.sqlite3",
"NAME": str((data_dir / "db.sqlite3").resolve()),
}
base_options = {}
case "postgresql":
db_config = {
"ENGINE": "django.db.backends.postgresql",
"HOST": os.getenv("PAPERLESS_DBHOST"),
"NAME": os.getenv("PAPERLESS_DBNAME", "paperless"),
"USER": os.getenv("PAPERLESS_DBUSER", "paperless"),
"PASSWORD": os.getenv("PAPERLESS_DBPASS", "paperless"),
}
base_options = {
"sslmode": os.getenv("PAPERLESS_DBSSLMODE", "prefer"),
"sslrootcert": os.getenv("PAPERLESS_DBSSLROOTCERT"),
"sslcert": os.getenv("PAPERLESS_DBSSLCERT"),
"sslkey": os.getenv("PAPERLESS_DBSSLKEY"),
}
if (pool_size := get_int_from_env("PAPERLESS_DB_POOLSIZE")) is not None:
base_options["pool"] = {
"min_size": 1,
"max_size": pool_size,
}
case "mariadb":
db_config = {
"ENGINE": "django.db.backends.mysql",
"HOST": os.getenv("PAPERLESS_DBHOST"),
"NAME": os.getenv("PAPERLESS_DBNAME", "paperless"),
"USER": os.getenv("PAPERLESS_DBUSER", "paperless"),
"PASSWORD": os.getenv("PAPERLESS_DBPASS", "paperless"),
}
base_options = {
"read_default_file": "/etc/mysql/my.cnf",
"charset": "utf8mb4",
"collation": "utf8mb4_unicode_ci",
"ssl_mode": os.getenv("PAPERLESS_DBSSLMODE", "PREFERRED"),
"ssl": {
"ca": os.getenv("PAPERLESS_DBSSLROOTCERT"),
"cert": os.getenv("PAPERLESS_DBSSLCERT"),
"key": os.getenv("PAPERLESS_DBSSLKEY"),
},
}
case _: # pragma: no cover
raise NotImplementedError(engine)
# Handle port setting for external databases
if (
engine in ("postgresql", "mariadb")
and (port := get_int_from_env("PAPERLESS_DBPORT")) is not None
):
db_config["PORT"] = port
# Handle timeout setting (common across all engines, different key names)
if (timeout := get_int_from_env("PAPERLESS_DB_TIMEOUT")) is not None:
timeout_key = "timeout" if engine == "sqlite" else "connect_timeout"
base_options[timeout_key] = timeout
# Apply PAPERLESS_DB_OPTIONS overrides
db_config["OPTIONS"] = parse_dict_from_str(
os.getenv("PAPERLESS_DB_OPTIONS"),
defaults=base_options,
separator=";",
type_map={
# SQLite options
"timeout": int,
# Postgres/MariaDB options
"connect_timeout": int,
"pool.min_size": int,
"pool.max_size": int,
},
)
databases = {"default": db_config}
# Add SQLite fallback for PostgreSQL/MariaDB
# TODO: Is this really useful/used?
if engine in ("postgresql", "mariadb"):
databases["sqlite"] = {
"ENGINE": "django.db.backends.sqlite3",
"NAME": str((data_dir / "db.sqlite3").resolve()),
"OPTIONS": {},
}
return databases

View File

@@ -0,0 +1,192 @@
import copy
import os
from collections.abc import Callable
from collections.abc import Mapping
from pathlib import Path
from typing import Any
from typing import TypeVar
from typing import overload
T = TypeVar("T")
def str_to_bool(value: str) -> bool:
"""
Converts a string representation of truth to a boolean value.
Recognizes 'true', '1', 't', 'y', 'yes' as True, and
'false', '0', 'f', 'n', 'no' as False. Case-insensitive.
Args:
value: The string to convert.
Returns:
The boolean representation of the string.
Raises:
ValueError: If the string is not a recognized boolean value.
"""
val_lower = value.strip().lower()
if val_lower in ("true", "1", "t", "y", "yes"):
return True
elif val_lower in ("false", "0", "f", "n", "no"):
return False
raise ValueError(f"Cannot convert '{value}' to a boolean.")
@overload
def get_int_from_env(key: str) -> int | None: ...
@overload
def get_int_from_env(key: str, default: None) -> int | None: ...
@overload
def get_int_from_env(key: str, default: int) -> int: ...
def get_int_from_env(key: str, default: int | None = None) -> int | None:
"""
Return an integer value based on the environment variable.
If default is provided, returns that value when key is missing.
If default is None, returns None when key is missing.
"""
if key not in os.environ:
return default
return int(os.environ[key])
def parse_dict_from_str(
env_str: str | None,
defaults: dict[str, Any] | None = None,
type_map: Mapping[str, Callable[[str], Any]] | None = None,
separator: str = ",",
) -> dict[str, Any]:
"""
Parses a key-value string into a dictionary, applying defaults and casting types.
Supports nested keys via dot-notation, e.g.:
"database.host=localhost,database.port=5432"
Args:
env_str: The string from the environment variable (e.g., "port=9090,debug=true").
defaults: A dictionary of default values (can contain nested dicts).
type_map: A dictionary mapping keys (dot-notation allowed) to a type or a parsing
function (e.g., {'port': int, 'debug': bool, 'database.port': int}).
The special `bool` type triggers custom boolean parsing.
separator: The character used to separate key-value pairs. Defaults to ','.
Returns:
A dictionary with the parsed and correctly-typed settings.
Raises:
ValueError: If a value cannot be cast to its specified type.
"""
def _set_nested(d: dict, keys: list[str], value: Any) -> None:
"""Set a nested value, creating intermediate dicts as needed."""
cur = d
for k in keys[:-1]:
if k not in cur or not isinstance(cur[k], dict):
cur[k] = {}
cur = cur[k]
cur[keys[-1]] = value
def _get_nested(d: dict, keys: list[str]) -> Any:
"""Get nested value or raise KeyError if not present."""
cur = d
for k in keys:
if not isinstance(cur, dict) or k not in cur:
raise KeyError
cur = cur[k]
return cur
def _has_nested(d: dict, keys: list[str]) -> bool:
try:
_get_nested(d, keys)
return True
except KeyError:
return False
settings: dict[str, Any] = copy.deepcopy(defaults) if defaults else {}
_type_map = type_map if type_map else {}
if not env_str:
return settings
# Parse the environment string using the specified separator
pairs = [p.strip() for p in env_str.split(separator) if p.strip()]
for pair in pairs:
if "=" not in pair:
# ignore malformed pairs
continue
key, val = pair.split("=", 1)
key = key.strip()
val = val.strip()
if not key:
continue
parts = key.split(".")
_set_nested(settings, parts, val)
# Apply type casting to the updated settings (supports nested keys in type_map)
for key, caster in _type_map.items():
key_parts = key.split(".")
if _has_nested(settings, key_parts):
raw_val = _get_nested(settings, key_parts)
# Only cast if it's a string (i.e. from env parsing). If defaults already provided
# a different type we leave it as-is.
if isinstance(raw_val, str):
try:
if caster is bool:
parsed = str_to_bool(raw_val)
elif caster is Path:
parsed = Path(raw_val).resolve()
else:
parsed = caster(raw_val)
except (ValueError, TypeError) as e:
caster_name = getattr(caster, "__name__", repr(caster))
raise ValueError(
f"Error casting key '{key}' with value '{raw_val}' "
f"to type '{caster_name}'",
) from e
_set_nested(settings, key_parts, parsed)
return settings
def get_choice_from_env(
env_key: str,
choices: set[str],
default: str | None = None,
) -> str:
"""
Gets and validates an environment variable against a set of allowed choices.
Args:
env_key: The environment variable key to validate
choices: Set of valid choices for the environment variable
default: Optional default value if environment variable is not set
Returns:
The validated environment variable value
Raises:
ValueError: If the environment variable value is not in choices
or if no default is provided and env var is missing
"""
value = os.environ.get(env_key, default)
if value is None:
raise ValueError(
f"Environment variable '{env_key}' is required but not set.",
)
if value not in choices:
raise ValueError(
f"Environment variable '{env_key}' has invalid value '{value}'. "
f"Valid choices are: {', '.join(sorted(choices))}",
)
return value

View File

@@ -2,13 +2,17 @@ import os
from pathlib import Path
from unittest import mock
import pytest
from django.core.checks import Warning
from django.test import TestCase
from django.test import override_settings
from pytest_mock import MockerFixture
from documents.tests.utils import DirectoriesMixin
from documents.tests.utils import FileSystemAssertsMixin
from paperless.checks import audit_log_check
from paperless.checks import binaries_check
from paperless.checks import check_deprecated_db_settings
from paperless.checks import debug_mode_check
from paperless.checks import paths_check
from paperless.checks import settings_values_check
@@ -237,3 +241,157 @@ class TestAuditLogChecks(TestCase):
("auditlog table was found but audit log is disabled."),
msg.msg,
)
DEPRECATED_VARS: dict[str, str] = {
"PAPERLESS_DB_TIMEOUT": "timeout",
"PAPERLESS_DB_POOLSIZE": "pool.min_size / pool.max_size",
"PAPERLESS_DBSSLMODE": "sslmode",
"PAPERLESS_DBSSLROOTCERT": "sslrootcert",
"PAPERLESS_DBSSLCERT": "sslcert",
"PAPERLESS_DBSSLKEY": "sslkey",
}
class TestDeprecatedDbSettings:
"""Test suite for the check_deprecated_db_settings system check."""
def test_no_deprecated_vars_returns_empty(
self,
mocker: MockerFixture,
) -> None:
"""No warnings when none of the deprecated vars are present."""
# clear=True ensures vars from the outer test environment do not leak in
mocker.patch.dict(os.environ, {}, clear=True)
result = check_deprecated_db_settings(None)
assert result == []
@pytest.mark.parametrize(
("env_var", "db_option_key"),
[
("PAPERLESS_DB_TIMEOUT", "timeout"),
("PAPERLESS_DB_POOLSIZE", "pool.min_size / pool.max_size"),
("PAPERLESS_DBSSLMODE", "sslmode"),
("PAPERLESS_DBSSLROOTCERT", "sslrootcert"),
("PAPERLESS_DBSSLCERT", "sslcert"),
("PAPERLESS_DBSSLKEY", "sslkey"),
],
ids=[
"db-timeout",
"db-poolsize",
"ssl-mode",
"ssl-rootcert",
"ssl-cert",
"ssl-key",
],
)
def test_single_deprecated_var_produces_one_warning(
self,
mocker: MockerFixture,
env_var: str,
db_option_key: str,
) -> None:
"""Each deprecated var in isolation produces exactly one warning."""
mocker.patch.dict(os.environ, {env_var: "some_value"}, clear=True)
result = check_deprecated_db_settings(None)
assert len(result) == 1
warning = result[0]
assert isinstance(warning, Warning)
assert warning.id == "paperless.W001"
assert env_var in warning.hint
assert db_option_key in warning.hint
def test_multiple_deprecated_vars_produce_one_warning_each(
self,
mocker: MockerFixture,
) -> None:
"""Each deprecated var present in the environment gets its own warning."""
set_vars = {
"PAPERLESS_DB_TIMEOUT": "30",
"PAPERLESS_DB_POOLSIZE": "10",
"PAPERLESS_DBSSLMODE": "require",
}
mocker.patch.dict(os.environ, set_vars, clear=True)
result = check_deprecated_db_settings(None)
assert len(result) == len(set_vars)
assert all(isinstance(w, Warning) for w in result)
assert all(w.id == "paperless.W001" for w in result)
all_hints = " ".join(w.hint for w in result)
for var_name in set_vars:
assert var_name in all_hints
def test_all_deprecated_vars_produces_one_warning_each(
self,
mocker: MockerFixture,
) -> None:
"""All deprecated vars set simultaneously produces one warning per var."""
all_vars = dict.fromkeys(DEPRECATED_VARS, "some_value")
mocker.patch.dict(os.environ, all_vars, clear=True)
result = check_deprecated_db_settings(None)
assert len(result) == len(DEPRECATED_VARS)
assert all(isinstance(w, Warning) for w in result)
assert all(w.id == "paperless.W001" for w in result)
def test_unset_vars_not_mentioned_in_warnings(
self,
mocker: MockerFixture,
) -> None:
"""Vars absent from the environment do not appear in any warning."""
mocker.patch.dict(
os.environ,
{"PAPERLESS_DB_TIMEOUT": "30"},
clear=True,
)
result = check_deprecated_db_settings(None)
assert len(result) == 1
assert "PAPERLESS_DB_TIMEOUT" in result[0].hint
unset_vars = [v for v in DEPRECATED_VARS if v != "PAPERLESS_DB_TIMEOUT"]
for var_name in unset_vars:
assert var_name not in result[0].hint
def test_empty_string_var_not_treated_as_set(
self,
mocker: MockerFixture,
) -> None:
"""A var set to an empty string is not flagged as a deprecated setting."""
mocker.patch.dict(
os.environ,
{"PAPERLESS_DB_TIMEOUT": ""},
clear=True,
)
result = check_deprecated_db_settings(None)
assert result == []
def test_warning_mentions_migration_target(
self,
mocker: MockerFixture,
) -> None:
"""Each warning hints at PAPERLESS_DB_OPTIONS as the migration target."""
mocker.patch.dict(
os.environ,
{"PAPERLESS_DBSSLMODE": "require"},
clear=True,
)
result = check_deprecated_db_settings(None)
assert len(result) == 1
assert "PAPERLESS_DB_OPTIONS" in result[0].hint
def test_warning_message_identifies_var(
self,
mocker: MockerFixture,
) -> None:
"""The warning message (not just the hint) identifies the offending var."""
mocker.patch.dict(
os.environ,
{"PAPERLESS_DBSSLCERT": "/path/to/cert.pem"},
clear=True,
)
result = check_deprecated_db_settings(None)
assert len(result) == 1
assert "PAPERLESS_DBSSLCERT" in result[0].msg

View File

@@ -1,19 +1,21 @@
import datetime
import os
from pathlib import Path
from unittest import TestCase
from unittest import mock
import pytest
from celery.schedules import crontab
from pytest_mock import MockerFixture
from paperless.settings import _parse_base_paths
from paperless.settings import _parse_beat_schedule
from paperless.settings import _parse_dateparser_languages
from paperless.settings import _parse_db_settings
from paperless.settings import _parse_ignore_dates
from paperless.settings import _parse_paperless_url
from paperless.settings import _parse_redis_url
from paperless.settings import default_threads_per_worker
from paperless.settings.custom import parse_db_settings
class TestIgnoreDateParsing(TestCase):
@@ -378,62 +380,302 @@ class TestCeleryScheduleParsing(TestCase):
)
class TestDBSettings(TestCase):
def test_db_timeout_with_sqlite(self) -> None:
"""
GIVEN:
- PAPERLESS_DB_TIMEOUT is set
WHEN:
- Settings are parsed
THEN:
- PAPERLESS_DB_TIMEOUT set for sqlite
"""
with mock.patch.dict(
os.environ,
{
"PAPERLESS_DB_TIMEOUT": "10",
},
):
databases = _parse_db_settings()
class TestParseDbSettings:
"""Test suite for parse_db_settings function."""
self.assertDictEqual(
@pytest.mark.parametrize(
("env_vars", "expected_database_settings"),
[
pytest.param(
{},
{
"timeout": 10.0,
"default": {
"ENGINE": "django.db.backends.sqlite3",
"NAME": None, # Will be replaced with tmp_path
"OPTIONS": {},
},
},
databases["default"]["OPTIONS"],
)
def test_db_timeout_with_not_sqlite(self) -> None:
"""
GIVEN:
- PAPERLESS_DB_TIMEOUT is set but db is not sqlite
WHEN:
- Settings are parsed
THEN:
- PAPERLESS_DB_TIMEOUT set correctly in non-sqlite db & for fallback sqlite db
"""
with mock.patch.dict(
os.environ,
{
"PAPERLESS_DBHOST": "127.0.0.1",
"PAPERLESS_DB_TIMEOUT": "10",
},
):
databases = _parse_db_settings()
self.assertDictEqual(
databases["default"]["OPTIONS"],
databases["default"]["OPTIONS"]
| {
"connect_timeout": 10.0,
},
)
self.assertDictEqual(
id="default-sqlite",
),
pytest.param(
{
"timeout": 10.0,
"PAPERLESS_DBENGINE": "sqlite",
"PAPERLESS_DB_OPTIONS": "timeout=30",
},
databases["sqlite"]["OPTIONS"],
{
"default": {
"ENGINE": "django.db.backends.sqlite3",
"NAME": None, # Will be replaced with tmp_path
"OPTIONS": {
"timeout": 30,
},
},
},
id="sqlite-with-timeout-override",
),
pytest.param(
{
"PAPERLESS_DBENGINE": "postgresql",
"PAPERLESS_DBHOST": "localhost",
},
{
"default": {
"ENGINE": "django.db.backends.postgresql",
"HOST": "localhost",
"NAME": "paperless",
"USER": "paperless",
"PASSWORD": "paperless",
"OPTIONS": {
"sslmode": "prefer",
"sslrootcert": None,
"sslcert": None,
"sslkey": None,
},
},
"sqlite": {
"ENGINE": "django.db.backends.sqlite3",
"NAME": None, # Will be replaced with tmp_path
"OPTIONS": {},
},
},
id="postgresql-defaults",
),
pytest.param(
{
"PAPERLESS_DBENGINE": "postgresql",
"PAPERLESS_DBHOST": "paperless-db-host",
"PAPERLESS_DBPORT": "1111",
"PAPERLESS_DBNAME": "customdb",
"PAPERLESS_DBUSER": "customuser",
"PAPERLESS_DBPASS": "custompass",
"PAPERLESS_DB_OPTIONS": "pool.max_size=50;pool.min_size=2;sslmode=require",
},
{
"default": {
"ENGINE": "django.db.backends.postgresql",
"HOST": "paperless-db-host",
"PORT": 1111,
"NAME": "customdb",
"USER": "customuser",
"PASSWORD": "custompass",
"OPTIONS": {
"sslmode": "require",
"sslrootcert": None,
"sslcert": None,
"sslkey": None,
"pool": {
"min_size": 2,
"max_size": 50,
},
},
},
"sqlite": {
"ENGINE": "django.db.backends.sqlite3",
"NAME": None, # Will be replaced with tmp_path
"OPTIONS": {},
},
},
id="postgresql-overrides",
),
pytest.param(
{
"PAPERLESS_DBENGINE": "postgresql",
"PAPERLESS_DBHOST": "pghost",
"PAPERLESS_DB_POOLSIZE": "10",
},
{
"default": {
"ENGINE": "django.db.backends.postgresql",
"HOST": "pghost",
"NAME": "paperless",
"USER": "paperless",
"PASSWORD": "paperless",
"OPTIONS": {
"sslmode": "prefer",
"sslrootcert": None,
"sslcert": None,
"sslkey": None,
"pool": {
"min_size": 1,
"max_size": 10,
},
},
},
"sqlite": {
"ENGINE": "django.db.backends.sqlite3",
"NAME": None, # Will be replaced with tmp_path
"OPTIONS": {},
},
},
id="postgresql-legacy-poolsize",
),
pytest.param(
{
"PAPERLESS_DBENGINE": "postgresql",
"PAPERLESS_DBHOST": "pghost",
"PAPERLESS_DBSSLMODE": "require",
"PAPERLESS_DBSSLROOTCERT": "/certs/ca.crt",
"PAPERLESS_DB_TIMEOUT": "30",
},
{
"default": {
"ENGINE": "django.db.backends.postgresql",
"HOST": "pghost",
"NAME": "paperless",
"USER": "paperless",
"PASSWORD": "paperless",
"OPTIONS": {
"sslmode": "require",
"sslrootcert": "/certs/ca.crt",
"sslcert": None,
"sslkey": None,
"connect_timeout": 30,
},
},
"sqlite": {
"ENGINE": "django.db.backends.sqlite3",
"NAME": None, # Will be replaced with tmp_path
"OPTIONS": {},
},
},
id="postgresql-legacy-ssl-and-timeout",
),
pytest.param(
{
"PAPERLESS_DBENGINE": "mariadb",
"PAPERLESS_DBHOST": "localhost",
},
{
"default": {
"ENGINE": "django.db.backends.mysql",
"HOST": "localhost",
"NAME": "paperless",
"USER": "paperless",
"PASSWORD": "paperless",
"OPTIONS": {
"read_default_file": "/etc/mysql/my.cnf",
"charset": "utf8mb4",
"collation": "utf8mb4_unicode_ci",
"ssl_mode": "PREFERRED",
"ssl": {
"ca": None,
"cert": None,
"key": None,
},
},
},
"sqlite": {
"ENGINE": "django.db.backends.sqlite3",
"NAME": None, # Will be replaced with tmp_path
"OPTIONS": {},
},
},
id="mariadb-defaults",
),
pytest.param(
{
"PAPERLESS_DBENGINE": "mariadb",
"PAPERLESS_DBHOST": "paperless-mariadb-host",
"PAPERLESS_DBPORT": "5555",
"PAPERLESS_DBUSER": "my-cool-user",
"PAPERLESS_DBPASS": "my-secure-password",
"PAPERLESS_DB_OPTIONS": "ssl.ca=/path/to/ca.pem;ssl_mode=REQUIRED",
},
{
"default": {
"ENGINE": "django.db.backends.mysql",
"HOST": "paperless-mariadb-host",
"PORT": 5555,
"NAME": "paperless",
"USER": "my-cool-user",
"PASSWORD": "my-secure-password",
"OPTIONS": {
"read_default_file": "/etc/mysql/my.cnf",
"charset": "utf8mb4",
"collation": "utf8mb4_unicode_ci",
"ssl_mode": "REQUIRED",
"ssl": {
"ca": "/path/to/ca.pem",
"cert": None,
"key": None,
},
},
},
"sqlite": {
"ENGINE": "django.db.backends.sqlite3",
"NAME": None, # Will be replaced with tmp_path
"OPTIONS": {},
},
},
id="mariadb-overrides",
),
pytest.param(
{
"PAPERLESS_DBENGINE": "mariadb",
"PAPERLESS_DBHOST": "mariahost",
"PAPERLESS_DBSSLMODE": "REQUIRED",
"PAPERLESS_DBSSLROOTCERT": "/certs/ca.pem",
"PAPERLESS_DBSSLCERT": "/certs/client.pem",
"PAPERLESS_DBSSLKEY": "/certs/client.key",
"PAPERLESS_DB_TIMEOUT": "25",
},
{
"default": {
"ENGINE": "django.db.backends.mysql",
"HOST": "mariahost",
"NAME": "paperless",
"USER": "paperless",
"PASSWORD": "paperless",
"OPTIONS": {
"read_default_file": "/etc/mysql/my.cnf",
"charset": "utf8mb4",
"collation": "utf8mb4_unicode_ci",
"ssl_mode": "REQUIRED",
"ssl": {
"ca": "/certs/ca.pem",
"cert": "/certs/client.pem",
"key": "/certs/client.key",
},
"connect_timeout": 25,
},
},
"sqlite": {
"ENGINE": "django.db.backends.sqlite3",
"NAME": None, # Will be replaced with tmp_path
"OPTIONS": {},
},
},
id="mariadb-legacy-ssl-and-timeout",
),
],
)
def test_parse_db_settings(
self,
tmp_path: Path,
mocker: MockerFixture,
env_vars: dict[str, str],
expected_database_settings: dict[str, dict],
) -> None:
"""Test various database configurations with defaults and overrides."""
# Clear environment and set test vars
mocker.patch.dict(os.environ, env_vars, clear=True)
# Update expected paths with actual tmp_path
if (
"default" in expected_database_settings
and expected_database_settings["default"]["NAME"] is None
):
expected_database_settings["default"]["NAME"] = str(
tmp_path / "db.sqlite3",
)
if "sqlite" in expected_database_settings:
expected_database_settings["sqlite"]["NAME"] = str(
tmp_path / "db.sqlite3",
)
settings = parse_db_settings(tmp_path)
assert settings == expected_database_settings
class TestPaperlessURLSettings(TestCase):

173
uv.lock generated
View File

@@ -721,14 +721,14 @@ wheels = [
[[package]]
name = "concurrent-log-handler"
version = "0.9.29"
version = "0.9.28"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "portalocker", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/9c/2c/ba185acc438cff6b58cd8f8dec27e7f4fcabf6968a1facbb6d0cacbde7fe/concurrent_log_handler-0.9.29.tar.gz", hash = "sha256:bc37a76d3f384cbf4a98f693ebd770543edc0f4cd5c6ab6bc70e9e1d7d582265", size = 42114, upload-time = "2026-02-22T18:18:25.758Z" }
sdist = { url = "https://files.pythonhosted.org/packages/6b/ed/68b9c3a07a2331361a09a194e4375c4ee680a799391cfb1ca924ca2b6523/concurrent_log_handler-0.9.28.tar.gz", hash = "sha256:4cc27969b3420239bd153779266f40d9713ece814e312b7aa753ce62c6eacdb8", size = 30935, upload-time = "2025-06-10T19:02:15.622Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/4a/f3/3e3188fdb3e53c6343fd1c7de41c55d4db626f07db3877eae77b28d58bd2/concurrent_log_handler-0.9.29-py3-none-any.whl", hash = "sha256:0d6c077fbaef2dae49a25975dcf72a602fe0a6a4ce80a3b7c37696d37e10459a", size = 32052, upload-time = "2026-02-22T18:18:24.558Z" },
{ url = "https://files.pythonhosted.org/packages/d0/a0/1331c3f12d95adc8d0385dc620001054c509db88376d2e17be36b6353020/concurrent_log_handler-0.9.28-py3-none-any.whl", hash = "sha256:65db25d05506651a61573937880789fc51c7555e7452303042b5a402fd78939c", size = 28983, upload-time = "2025-06-10T19:02:14.223Z" },
]
[[package]]
@@ -1137,29 +1137,16 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/6d/10/23c0644cf67567bbe4e3a2eeeec0e9c79b701990c0e07c5ee4a4f8897f91/django_multiselectfield-1.0.1-py3-none-any.whl", hash = "sha256:18dc14801f7eca844a48e21cba6d8ec35b9b581f2373bbb2cb75e6994518259a", size = 20481, upload-time = "2025-06-12T14:41:20.107Z" },
]
[[package]]
name = "django-rich"
version = "2.2.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "django", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "rich", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/a6/67/e307a5fef657e7992468f567b521534c52e01bdda5a1ae5b12de679a670f/django_rich-2.2.0.tar.gz", hash = "sha256:ecec7842d040024ed8a225699388535e46b87277550c33f46193b52cece2f780", size = 62427, upload-time = "2025-09-18T11:42:17.182Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/27/ed/23fa669493d78cd67e7f6734fa380f8690f2b4d75b4f72fd645a52c3b32a/django_rich-2.2.0-py3-none-any.whl", hash = "sha256:a0d2c916bd9750b6e9beb57407aef5e836c8705d7dbe9e4fd4725f7bbe41c407", size = 9210, upload-time = "2025-09-18T11:42:15.779Z" },
]
[[package]]
name = "django-soft-delete"
version = "1.0.23"
version = "1.0.22"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "django", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/aa/98/c7c52a85b070b1703774df817b6460a7714655302a2d503f6447544f1a29/django_soft_delete-1.0.23.tar.gz", hash = "sha256:814659f0d19d4f2afc58b31ff73f88f0af66715ccef3b4fcd8f6b3a011d59b2a", size = 22458, upload-time = "2026-02-21T17:48:41.345Z" }
sdist = { url = "https://files.pythonhosted.org/packages/98/d1/c990b731676f93bd4594dee4b5133df52f5d0eee1eb8a969b4030014ac54/django_soft_delete-1.0.22.tar.gz", hash = "sha256:32d0bb95f180c28a40163e78a558acc18901fd56011f91f8ee735c171a6d4244", size = 21982, upload-time = "2025-10-25T13:11:46.199Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/91/9e/77375a163c340fff03d037eac7d970ce006626e6c3aea87b5d159f052f8b/django_soft_delete-1.0.23-py3-none-any.whl", hash = "sha256:dd2133d4925d58308680f389daa2e150abf7b81a4f0abbbf2161a9db3b9f1e74", size = 19308, upload-time = "2026-02-21T17:48:39.974Z" },
{ url = "https://files.pythonhosted.org/packages/f5/c2/fca2bf69b7ca7e18aed9ac059e89f1043663e207a514e8fb652450e49631/django_soft_delete-1.0.22-py3-none-any.whl", hash = "sha256:81973c541d21452d249151085d617ebbfb5ec463899f47cd6b1306677481e94c", size = 19221, upload-time = "2025-10-25T13:11:44.755Z" },
]
[[package]]
@@ -2194,7 +2181,7 @@ wheels = [
[[package]]
name = "llama-index-core"
version = "0.14.15"
version = "0.14.13"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "aiohttp", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -2222,15 +2209,14 @@ dependencies = [
{ name = "sqlalchemy", extra = ["asyncio"], marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "tenacity", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "tiktoken", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "tinytag", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "tqdm", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "typing-extensions", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "typing-inspect", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "wrapt", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/0c/4f/7c714bdf94dd229707b43e7f8cedf3aed0a99938fd46a9ad8a418c199988/llama_index_core-0.14.15.tar.gz", hash = "sha256:3766aeeb95921b3a2af8c2a51d844f75f404215336e1639098e3652db52c68ce", size = 11593505, upload-time = "2026-02-18T19:05:48.274Z" }
sdist = { url = "https://files.pythonhosted.org/packages/74/54/d6043a088e5e9c1d62300db7ad0ef417c6b9a92f7b4a5cade066aeafdaca/llama_index_core-0.14.13.tar.gz", hash = "sha256:c3b30d20ae0407e5d0a1d35bb3376a98e242661ebfc22da754b5a3da1f8108c0", size = 11589074, upload-time = "2026-01-21T20:44:16.287Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/41/9e/262f6465ee4fffa40698b3cc2177e377ce7d945d3bd8b7d9c6b09448625d/llama_index_core-0.14.15-py3-none-any.whl", hash = "sha256:e02b321c10673871a38aaefdc4a93d5ae8ec324cad4408683189e5a1aa1e3d52", size = 11937002, upload-time = "2026-02-18T19:05:45.855Z" },
{ url = "https://files.pythonhosted.org/packages/76/59/9769f03f1cccadcc014b3b65c166de18999b51459a0f0a579d80f6c91d80/llama_index_core-0.14.13-py3-none-any.whl", hash = "sha256:392f0a5a09433e9dea786964ef5fe5ca2a2b10aee9f979a9507c19a14da2a20a", size = 11934761, upload-time = "2026-01-21T20:44:18.892Z" },
]
[[package]]
@@ -2288,27 +2274,27 @@ wheels = [
[[package]]
name = "llama-index-llms-openai"
version = "0.6.21"
version = "0.6.18"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "llama-index-core", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "openai", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/d8/5b/775289b3064302966cc839bbccfdbe314f706eaf58ad4233b86e5d53343d/llama_index_llms_openai-0.6.21.tar.gz", hash = "sha256:0b92dcfb01cbc7752f5b8bdf6d93430643d295210cf9392b45291d6fdd81e0ee", size = 25961, upload-time = "2026-02-26T04:19:33.604Z" }
sdist = { url = "https://files.pythonhosted.org/packages/56/78/298de76242aee7f5fdd65a0bffb541b3f81759613de1e8ebc719eec8e8af/llama_index_llms_openai-0.6.18.tar.gz", hash = "sha256:36c0256a7a211bbbc5ecc00d3f2caa9730eea1971ced3b68b7c94025c0448020", size = 25946, upload-time = "2026-02-06T12:01:03.095Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e3/d7/5b513acbf0bfc2b6ef281b6bbca764062facc431e8f13763c16005fbd34b/llama_index_llms_openai-0.6.21-py3-none-any.whl", hash = "sha256:ef8c048849f844c7db9ff4208cca9878a799bc5fcdd72954197ea11e64b37c97", size = 26965, upload-time = "2026-02-26T04:19:34.561Z" },
{ url = "https://files.pythonhosted.org/packages/39/46/5a4b62108fb94febe27d35c8476dea042d7a609ee4bf14f5b61f03d5a75a/llama_index_llms_openai-0.6.18-py3-none-any.whl", hash = "sha256:73bbbf233d38116d48350391a3649884829564f4c8f6168c8fa3f3ae1b557376", size = 26945, upload-time = "2026-02-06T12:01:01.25Z" },
]
[[package]]
name = "llama-index-vector-stores-faiss"
version = "0.5.3"
version = "0.5.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "llama-index-core", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/c5/e6/57da31b38d173cd9124fdcdd47487b9a917b69bd49e8f6e551407ccfa860/llama_index_vector_stores_faiss-0.5.3.tar.gz", hash = "sha256:9620b1e27e96233fda88878c453532fba6061cf7ba7a53698a34703faab21ece", size = 6048, upload-time = "2026-02-12T14:22:14.612Z" }
sdist = { url = "https://files.pythonhosted.org/packages/2d/5f/c4ae340f178f202cf09dcc24dd0953a41d9ab24bc33e1f7220544ba86e41/llama_index_vector_stores_faiss-0.5.2.tar.gz", hash = "sha256:924504765e68b1f84ec602feb2d9516be6a6c12fad5e133f19cc5da3b23f4282", size = 5910, upload-time = "2025-12-17T21:01:13.21Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/ed/ad/ad192dd624ca2875b8ca74e55fddf9b083d6614524004f7830379d0a0cfd/llama_index_vector_stores_faiss-0.5.3-py3-none-any.whl", hash = "sha256:ef186e38a820e696a1adca15432c8539d73f2959eb05671011db21091a286c8c", size = 7738, upload-time = "2026-02-12T14:22:13.756Z" },
{ url = "https://files.pythonhosted.org/packages/8f/c1/c8317250c2a83d1d439814d1a7f41fa34a23c224b3099da898f08a249859/llama_index_vector_stores_faiss-0.5.2-py3-none-any.whl", hash = "sha256:72a3a03d9f25c70bbcc8c61aa860cd1db69f2a8070606ecc3266d767b71ff2a2", size = 7605, upload-time = "2025-12-17T21:01:12.429Z" },
]
[[package]]
@@ -2780,9 +2766,9 @@ wheels = [
[[package]]
name = "mysqlclient"
version = "2.2.8"
version = "2.2.7"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/eb/b0/9df076488cb2e536d40ce6dbd4273c1f20a386e31ffe6e7cb613902b3c2a/mysqlclient-2.2.8.tar.gz", hash = "sha256:8ed20c5615a915da451bb308c7d0306648a4fd9a2809ba95c992690006306199", size = 92287, upload-time = "2026-02-10T10:58:37.405Z" }
sdist = { url = "https://files.pythonhosted.org/packages/61/68/810093cb579daae426794bbd9d88aa830fae296e85172d18cb0f0e5dd4bc/mysqlclient-2.2.7.tar.gz", hash = "sha256:24ae22b59416d5fcce7e99c9d37548350b4565baac82f95e149cac6ce4163845", size = 91383, upload-time = "2025-01-10T12:06:00.763Z" }
[[package]]
name = "nest-asyncio"
@@ -3055,7 +3041,6 @@ dependencies = [
{ name = "django-filter", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "django-guardian", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "django-multiselectfield", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "django-rich", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "django-soft-delete", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "django-treenode", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "djangorestframework", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -3096,6 +3081,7 @@ dependencies = [
{ name = "tika-client", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "torch", version = "2.10.0", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "sys_platform == 'darwin'" },
{ name = "torch", version = "2.10.0+cpu", source = { registry = "https://download.pytorch.org/whl/cpu" }, marker = "sys_platform == 'linux'" },
{ name = "tqdm", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "watchfiles", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "whitenoise", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "whoosh-reloaded", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -3176,6 +3162,7 @@ typing = [
{ name = "types-pytz", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "types-redis", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "types-setuptools", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "types-tqdm", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
[package.metadata]
@@ -3199,7 +3186,6 @@ requires-dist = [
{ name = "django-filter", specifier = "~=25.1" },
{ name = "django-guardian", specifier = "~=3.3.0" },
{ name = "django-multiselectfield", specifier = "~=1.0.1" },
{ name = "django-rich", specifier = "~=2.2.0" },
{ name = "django-soft-delete", specifier = "~=1.0.18" },
{ name = "django-treenode", specifier = ">=0.23.2" },
{ name = "djangorestframework", specifier = "~=3.16" },
@@ -3246,6 +3232,7 @@ requires-dist = [
{ name = "setproctitle", specifier = "~=1.3.4" },
{ name = "tika-client", specifier = "~=0.10.0" },
{ name = "torch", specifier = "~=2.10.0", index = "https://download.pytorch.org/whl/cpu" },
{ name = "tqdm", specifier = "~=4.67.1" },
{ name = "watchfiles", specifier = ">=1.1.1" },
{ name = "whitenoise", specifier = "~=6.11" },
{ name = "whoosh-reloaded", specifier = ">=2.7.5" },
@@ -3310,6 +3297,7 @@ typing = [
{ name = "types-pytz" },
{ name = "types-redis" },
{ name = "types-setuptools" },
{ name = "types-tqdm" },
]
[[package]]
@@ -3549,23 +3537,23 @@ wheels = [
[[package]]
name = "prek"
version = "0.3.3"
version = "0.3.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/bf/f1/7613dc8347a33e40fc5b79eec6bc7d458d8bbc339782333d8433b665f86f/prek-0.3.3.tar.gz", hash = "sha256:117bd46ebeb39def24298ce021ccc73edcf697b81856fcff36d762dd56093f6f", size = 343697, upload-time = "2026-02-15T13:33:28.723Z" }
sdist = { url = "https://files.pythonhosted.org/packages/d3/f5/ee52def928dd1355c20bcfcf765e1e61434635c33f3075e848e7b83a157b/prek-0.3.2.tar.gz", hash = "sha256:dce0074ff1a21290748ca567b4bda7553ee305a8c7b14d737e6c58364a499364", size = 334229, upload-time = "2026-02-06T13:49:47.539Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/2d/8b/dce13d2a3065fd1e8ffce593a0e51c4a79c3cde9c9a15dc0acc8d9d1573d/prek-0.3.3-py3-none-linux_armv6l.whl", hash = "sha256:e8629cac4bdb131be8dc6e5a337f0f76073ad34a8305f3fe2bc1ab6201ede0a4", size = 4644636, upload-time = "2026-02-15T13:33:43.609Z" },
{ url = "https://files.pythonhosted.org/packages/01/30/06ab4dbe7ce02a8ce833e92deb1d9a8e85ae9d40e33d1959a2070b7494c6/prek-0.3.3-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:4b9e819b9e4118e1e785047b1c8bd9aec7e4d836ed034cb58b7db5bcaaf49437", size = 4651410, upload-time = "2026-02-15T13:33:34.277Z" },
{ url = "https://files.pythonhosted.org/packages/d4/fc/da3bc5cb38471e7192eda06b7a26b7c24ef83e82da2c1dbc145f2bf33640/prek-0.3.3-py3-none-macosx_11_0_arm64.whl", hash = "sha256:bf29db3b5657c083eb8444c25aadeeec5167dc492e9019e188f87932f01ea50a", size = 4273163, upload-time = "2026-02-15T13:33:42.106Z" },
{ url = "https://files.pythonhosted.org/packages/b4/74/47839395091e2937beced81a5dd2f8ea9c8239c853da8611aaf78ee21a8b/prek-0.3.3-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.musllinux_1_1_aarch64.whl", hash = "sha256:ae09736149815b26e64a9d350ca05692bab32c2afdf2939114d3211aaad68a3e", size = 4631808, upload-time = "2026-02-15T13:33:20.076Z" },
{ url = "https://files.pythonhosted.org/packages/e2/89/3f5ef6f7c928c017cb63b029349d6bc03598ab7f6979d4a770ce02575f82/prek-0.3.3-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:856c2b55c51703c366bb4ce81c6a91102b70573a9fc8637db2ac61c66e4565f9", size = 4548959, upload-time = "2026-02-15T13:33:36.325Z" },
{ url = "https://files.pythonhosted.org/packages/b2/18/80002c4c4475f90ca025f27739a016927a0e5d905c60612fc95da1c56ab7/prek-0.3.3-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:3acdf13a018f685beaff0a71d4b0d2ccbab4eaa1aced6d08fd471c1a654183eb", size = 4862256, upload-time = "2026-02-15T13:33:37.754Z" },
{ url = "https://files.pythonhosted.org/packages/c5/25/648bf084c2468fa7cfcdbbe9e59956bbb31b81f36e113bc9107d80af26a7/prek-0.3.3-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:0f035667a8bd0a77b2bfa2b2e125da8cb1793949e9eeef0d8daab7f8ac8b57fe", size = 5404486, upload-time = "2026-02-15T13:33:39.239Z" },
{ url = "https://files.pythonhosted.org/packages/8b/43/261fb60a11712a327da345912bd8b338dc5a050199de800faafa278a6133/prek-0.3.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d09b2ad14332eede441d977de08eb57fb3f61226ed5fd2ceb7aadf5afcdb6794", size = 4887513, upload-time = "2026-02-15T13:33:40.702Z" },
{ url = "https://files.pythonhosted.org/packages/c7/2c/581e757ee57ec6046b32e0ee25660fc734bc2622c319f57119c49c0cab58/prek-0.3.3-py3-none-manylinux_2_28_aarch64.whl", hash = "sha256:c0c3ffac16e37a9daba43a7e8316778f5809b70254be138761a8b5b9ef0df28e", size = 4632336, upload-time = "2026-02-15T13:33:25.867Z" },
{ url = "https://files.pythonhosted.org/packages/d5/d8/aa276ce5d11b77882da4102ca0cb7161095831105043ae7979bbfdcc3dc4/prek-0.3.3-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:a3dc7720b580c07c0386e17af2486a5b4bc2f6cc57034a288a614dcbc4abe555", size = 4679370, upload-time = "2026-02-15T13:33:22.247Z" },
{ url = "https://files.pythonhosted.org/packages/70/19/9d4fa7bde428e58d9f48a74290c08736d42aeb5690dcdccc7a713e34a449/prek-0.3.3-py3-none-musllinux_1_1_armv7l.whl", hash = "sha256:60e0fa15da5020a03df2ee40268145ec5b88267ec2141a205317ad4df8c992d6", size = 4540316, upload-time = "2026-02-15T13:33:24.088Z" },
{ url = "https://files.pythonhosted.org/packages/25/b5/973cce29257e0b47b16cc9b4c162772ea01dbb7c080791ea0c068e106e05/prek-0.3.3-py3-none-musllinux_1_1_i686.whl", hash = "sha256:553515da9586d9624dc42db32b744fdb91cf62b053753037a0cadb3c2d8d82a2", size = 4724566, upload-time = "2026-02-15T13:33:29.832Z" },
{ url = "https://files.pythonhosted.org/packages/d6/8b/ad8b2658895a8ed2b0bc630bf38686fe38b7ff2c619c58953a80e4de3048/prek-0.3.3-py3-none-musllinux_1_1_x86_64.whl", hash = "sha256:9512cf370e0d1496503463a4a65621480efb41b487841a9e9ff1661edf14b238", size = 4995072, upload-time = "2026-02-15T13:33:27.417Z" },
{ url = "https://files.pythonhosted.org/packages/76/69/70a5fc881290a63910494df2677c0fb241d27cfaa435bbcd0de5cd2e2443/prek-0.3.2-py3-none-linux_armv6l.whl", hash = "sha256:4f352f9c3fc98aeed4c8b2ec4dbf16fc386e45eea163c44d67e5571489bd8e6f", size = 4614960, upload-time = "2026-02-06T13:50:05.818Z" },
{ url = "https://files.pythonhosted.org/packages/c0/15/a82d5d32a2207ccae5d86ea9e44f2b93531ed000faf83a253e8d1108e026/prek-0.3.2-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:4a000cfbc3a6ec7d424f8be3c3e69ccd595448197f92daac8652382d0acc2593", size = 4622889, upload-time = "2026-02-06T13:49:53.662Z" },
{ url = "https://files.pythonhosted.org/packages/89/75/ea833b58a12741397017baef9b66a6e443bfa8286ecbd645d14111446280/prek-0.3.2-py3-none-macosx_11_0_arm64.whl", hash = "sha256:5436bdc2702cbd7bcf9e355564ae66f8131211e65fefae54665a94a07c3d450a", size = 4239653, upload-time = "2026-02-06T13:50:02.88Z" },
{ url = "https://files.pythonhosted.org/packages/10/b4/d9c3885987afac6e20df4cb7db14e3b0d5a08a77ae4916488254ebac4d0b/prek-0.3.2-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.musllinux_1_1_aarch64.whl", hash = "sha256:0161b5f584f9e7f416d6cf40a17b98f17953050ff8d8350ec60f20fe966b86b6", size = 4595101, upload-time = "2026-02-06T13:49:49.813Z" },
{ url = "https://files.pythonhosted.org/packages/21/a6/1a06473ed83dbc898de22838abdb13954e2583ce229f857f61828384634c/prek-0.3.2-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:4e641e8533bca38797eebb49aa89ed0e8db0e61225943b27008c257e3af4d631", size = 4521978, upload-time = "2026-02-06T13:49:41.266Z" },
{ url = "https://files.pythonhosted.org/packages/0c/5e/c38390d5612e6d86b32151c1d2fdab74a57913473193591f0eb00c894c21/prek-0.3.2-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:cfca1810d49d3f9ef37599c958c4e716bc19a1d78a7e88cbdcb332e0b008994f", size = 4829108, upload-time = "2026-02-06T13:49:44.598Z" },
{ url = "https://files.pythonhosted.org/packages/80/a6/cecce2ab623747ff65ed990bb0d95fa38449ee19b348234862acf9392fff/prek-0.3.2-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:e5d69d754299a95a85dc20196f633232f306bee7e7c8cba61791f49ce70404ec", size = 5357520, upload-time = "2026-02-06T13:49:48.512Z" },
{ url = "https://files.pythonhosted.org/packages/a5/18/d6bcb29501514023c76d55d5cd03bdbc037737c8de8b6bc41cdebfb1682c/prek-0.3.2-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:539dcb90ad9b20837968539855df6a29493b328a1ae87641560768eed4f313b0", size = 4852635, upload-time = "2026-02-06T13:49:58.347Z" },
{ url = "https://files.pythonhosted.org/packages/1b/0a/ae46f34ba27ba87aea5c9ad4ac9cd3e07e014fd5079ae079c84198f62118/prek-0.3.2-py3-none-manylinux_2_28_aarch64.whl", hash = "sha256:1998db3d0cbe243984736c82232be51318f9192e2433919a6b1c5790f600b5fd", size = 4599484, upload-time = "2026-02-06T13:49:43.296Z" },
{ url = "https://files.pythonhosted.org/packages/1a/a9/73bfb5b3f7c3583f9b0d431924873928705cdef6abb3d0461c37254a681b/prek-0.3.2-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:07ab237a5415a3e8c0db54de9d63899bcd947624bdd8820d26f12e65f8d19eb7", size = 4657694, upload-time = "2026-02-06T13:50:01.074Z" },
{ url = "https://files.pythonhosted.org/packages/a7/bc/0994bc176e1a80110fad3babce2c98b0ac4007630774c9e18fc200a34781/prek-0.3.2-py3-none-musllinux_1_1_armv7l.whl", hash = "sha256:0ced19701d69c14a08125f14a5dd03945982edf59e793c73a95caf4697a7ac30", size = 4509337, upload-time = "2026-02-06T13:49:54.891Z" },
{ url = "https://files.pythonhosted.org/packages/f9/13/e73f85f65ba8f626468e5d1694ab3763111513da08e0074517f40238c061/prek-0.3.2-py3-none-musllinux_1_1_i686.whl", hash = "sha256:ffb28189f976fa111e770ee94e4f298add307714568fb7d610c8a7095cb1ce59", size = 4697350, upload-time = "2026-02-06T13:50:04.526Z" },
{ url = "https://files.pythonhosted.org/packages/14/47/98c46dcd580305b9960252a4eb966f1a7b1035c55c363f378d85662ba400/prek-0.3.2-py3-none-musllinux_1_1_x86_64.whl", hash = "sha256:f63134b3eea14421789a7335d86f99aee277cb520427196f2923b9260c60e5c5", size = 4955860, upload-time = "2026-02-06T13:49:56.581Z" },
]
[[package]]
@@ -4631,24 +4619,24 @@ wheels = [
[[package]]
name = "ruff"
version = "0.15.4"
version = "0.15.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/da/31/d6e536cdebb6568ae75a7f00e4b4819ae0ad2640c3604c305a0428680b0c/ruff-0.15.4.tar.gz", hash = "sha256:3412195319e42d634470cc97aa9803d07e9d5c9223b99bcb1518f0c725f26ae1", size = 4569550, upload-time = "2026-02-26T20:04:14.959Z" }
sdist = { url = "https://files.pythonhosted.org/packages/c8/39/5cee96809fbca590abea6b46c6d1c586b49663d1d2830a751cc8fc42c666/ruff-0.15.0.tar.gz", hash = "sha256:6bdea47cdbea30d40f8f8d7d69c0854ba7c15420ec75a26f463290949d7f7e9a", size = 4524893, upload-time = "2026-02-03T17:53:35.357Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/f2/82/c11a03cfec3a4d26a0ea1e571f0f44be5993b923f905eeddfc397c13d360/ruff-0.15.4-py3-none-linux_armv6l.whl", hash = "sha256:a1810931c41606c686bae8b5b9a8072adac2f611bb433c0ba476acba17a332e0", size = 10453333, upload-time = "2026-02-26T20:04:20.093Z" },
{ url = "https://files.pythonhosted.org/packages/ce/5d/6a1f271f6e31dffb31855996493641edc3eef8077b883eaf007a2f1c2976/ruff-0.15.4-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:5a1632c66672b8b4d3e1d1782859e98d6e0b4e70829530666644286600a33992", size = 10853356, upload-time = "2026-02-26T20:04:05.808Z" },
{ url = "https://files.pythonhosted.org/packages/b1/d8/0fab9f8842b83b1a9c2bf81b85063f65e93fb512e60effa95b0be49bfc54/ruff-0.15.4-py3-none-macosx_11_0_arm64.whl", hash = "sha256:a4386ba2cd6c0f4ff75252845906acc7c7c8e1ac567b7bc3d373686ac8c222ba", size = 10187434, upload-time = "2026-02-26T20:03:54.656Z" },
{ url = "https://files.pythonhosted.org/packages/85/cc/cc220fd9394eff5db8d94dec199eec56dd6c9f3651d8869d024867a91030/ruff-0.15.4-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b2496488bdfd3732747558b6f95ae427ff066d1fcd054daf75f5a50674411e75", size = 10535456, upload-time = "2026-02-26T20:03:52.738Z" },
{ url = "https://files.pythonhosted.org/packages/fa/0f/bced38fa5cf24373ec767713c8e4cadc90247f3863605fb030e597878661/ruff-0.15.4-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:3f1c4893841ff2d54cbda1b2860fa3260173df5ddd7b95d370186f8a5e66a4ac", size = 10287772, upload-time = "2026-02-26T20:04:08.138Z" },
{ url = "https://files.pythonhosted.org/packages/2b/90/58a1802d84fed15f8f281925b21ab3cecd813bde52a8ca033a4de8ab0e7a/ruff-0.15.4-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:820b8766bd65503b6c30aaa6331e8ef3a6e564f7999c844e9a547c40179e440a", size = 11049051, upload-time = "2026-02-26T20:04:03.53Z" },
{ url = "https://files.pythonhosted.org/packages/d2/ac/b7ad36703c35f3866584564dc15f12f91cb1a26a897dc2fd13d7cb3ae1af/ruff-0.15.4-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:c9fb74bab47139c1751f900f857fa503987253c3ef89129b24ed375e72873e85", size = 11890494, upload-time = "2026-02-26T20:04:10.497Z" },
{ url = "https://files.pythonhosted.org/packages/93/3d/3eb2f47a39a8b0da99faf9c54d3eb24720add1e886a5309d4d1be73a6380/ruff-0.15.4-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:f80c98765949c518142b3a50a5db89343aa90f2c2bf7799de9986498ae6176db", size = 11326221, upload-time = "2026-02-26T20:04:12.84Z" },
{ url = "https://files.pythonhosted.org/packages/ff/90/bf134f4c1e5243e62690e09d63c55df948a74084c8ac3e48a88468314da6/ruff-0.15.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:451a2e224151729b3b6c9ffb36aed9091b2996fe4bdbd11f47e27d8f2e8888ec", size = 11168459, upload-time = "2026-02-26T20:04:00.969Z" },
{ url = "https://files.pythonhosted.org/packages/b5/e5/a64d27688789b06b5d55162aafc32059bb8c989c61a5139a36e1368285eb/ruff-0.15.4-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:a8f157f2e583c513c4f5f896163a93198297371f34c04220daf40d133fdd4f7f", size = 11104366, upload-time = "2026-02-26T20:03:48.099Z" },
{ url = "https://files.pythonhosted.org/packages/f1/f6/32d1dcb66a2559763fc3027bdd65836cad9eb09d90f2ed6a63d8e9252b02/ruff-0.15.4-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:917cc68503357021f541e69b35361c99387cdbbf99bd0ea4aa6f28ca99ff5338", size = 10510887, upload-time = "2026-02-26T20:03:45.771Z" },
{ url = "https://files.pythonhosted.org/packages/ff/92/22d1ced50971c5b6433aed166fcef8c9343f567a94cf2b9d9089f6aa80fe/ruff-0.15.4-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:e9737c8161da79fd7cfec19f1e35620375bd8b2a50c3e77fa3d2c16f574105cc", size = 10285939, upload-time = "2026-02-26T20:04:22.42Z" },
{ url = "https://files.pythonhosted.org/packages/e6/f4/7c20aec3143837641a02509a4668fb146a642fd1211846634edc17eb5563/ruff-0.15.4-py3-none-musllinux_1_2_i686.whl", hash = "sha256:291258c917539e18f6ba40482fe31d6f5ac023994ee11d7bdafd716f2aab8a68", size = 10765471, upload-time = "2026-02-26T20:03:58.924Z" },
{ url = "https://files.pythonhosted.org/packages/d0/09/6d2f7586f09a16120aebdff8f64d962d7c4348313c77ebb29c566cefc357/ruff-0.15.4-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:3f83c45911da6f2cd5936c436cf86b9f09f09165f033a99dcf7477e34041cbc3", size = 11263382, upload-time = "2026-02-26T20:04:24.424Z" },
{ url = "https://files.pythonhosted.org/packages/bc/88/3fd1b0aa4b6330d6aaa63a285bc96c9f71970351579152d231ed90914586/ruff-0.15.0-py3-none-linux_armv6l.whl", hash = "sha256:aac4ebaa612a82b23d45964586f24ae9bc23ca101919f5590bdb368d74ad5455", size = 10354332, upload-time = "2026-02-03T17:52:54.892Z" },
{ url = "https://files.pythonhosted.org/packages/72/f6/62e173fbb7eb75cc29fe2576a1e20f0a46f671a2587b5f604bfb0eaf5f6f/ruff-0.15.0-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:dcd4be7cc75cfbbca24a98d04d0b9b36a270d0833241f776b788d59f4142b14d", size = 10767189, upload-time = "2026-02-03T17:53:19.778Z" },
{ url = "https://files.pythonhosted.org/packages/99/e4/968ae17b676d1d2ff101d56dc69cf333e3a4c985e1ec23803df84fc7bf9e/ruff-0.15.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:d747e3319b2bce179c7c1eaad3d884dc0a199b5f4d5187620530adf9105268ce", size = 10075384, upload-time = "2026-02-03T17:53:29.241Z" },
{ url = "https://files.pythonhosted.org/packages/a2/bf/9843c6044ab9e20af879c751487e61333ca79a2c8c3058b15722386b8cae/ruff-0.15.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:650bd9c56ae03102c51a5e4b554d74d825ff3abe4db22b90fd32d816c2e90621", size = 10481363, upload-time = "2026-02-03T17:52:43.332Z" },
{ url = "https://files.pythonhosted.org/packages/55/d9/4ada5ccf4cd1f532db1c8d44b6f664f2208d3d93acbeec18f82315e15193/ruff-0.15.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a6664b7eac559e3048223a2da77769c2f92b43a6dfd4720cef42654299a599c9", size = 10187736, upload-time = "2026-02-03T17:53:00.522Z" },
{ url = "https://files.pythonhosted.org/packages/86/e2/f25eaecd446af7bb132af0a1d5b135a62971a41f5366ff41d06d25e77a91/ruff-0.15.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6f811f97b0f092b35320d1556f3353bf238763420ade5d9e62ebd2b73f2ff179", size = 10968415, upload-time = "2026-02-03T17:53:15.705Z" },
{ url = "https://files.pythonhosted.org/packages/e7/dc/f06a8558d06333bf79b497d29a50c3a673d9251214e0d7ec78f90b30aa79/ruff-0.15.0-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:761ec0a66680fab6454236635a39abaf14198818c8cdf691e036f4bc0f406b2d", size = 11809643, upload-time = "2026-02-03T17:53:23.031Z" },
{ url = "https://files.pythonhosted.org/packages/dd/45/0ece8db2c474ad7df13af3a6d50f76e22a09d078af63078f005057ca59eb/ruff-0.15.0-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:940f11c2604d317e797b289f4f9f3fa5555ffe4fb574b55ed006c3d9b6f0eb78", size = 11234787, upload-time = "2026-02-03T17:52:46.432Z" },
{ url = "https://files.pythonhosted.org/packages/8a/d9/0e3a81467a120fd265658d127db648e4d3acfe3e4f6f5d4ea79fac47e587/ruff-0.15.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bcbca3d40558789126da91d7ef9a7c87772ee107033db7191edefa34e2c7f1b4", size = 11112797, upload-time = "2026-02-03T17:52:49.274Z" },
{ url = "https://files.pythonhosted.org/packages/b2/cb/8c0b3b0c692683f8ff31351dfb6241047fa873a4481a76df4335a8bff716/ruff-0.15.0-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:9a121a96db1d75fa3eb39c4539e607f628920dd72ff1f7c5ee4f1b768ac62d6e", size = 11033133, upload-time = "2026-02-03T17:53:33.105Z" },
{ url = "https://files.pythonhosted.org/packages/f8/5e/23b87370cf0f9081a8c89a753e69a4e8778805b8802ccfe175cc410e50b9/ruff-0.15.0-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5298d518e493061f2eabd4abd067c7e4fb89e2f63291c94332e35631c07c3662", size = 10442646, upload-time = "2026-02-03T17:53:06.278Z" },
{ url = "https://files.pythonhosted.org/packages/e1/9a/3c94de5ce642830167e6d00b5c75aacd73e6347b4c7fc6828699b150a5ee/ruff-0.15.0-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:afb6e603d6375ff0d6b0cee563fa21ab570fd15e65c852cb24922cef25050cf1", size = 10195750, upload-time = "2026-02-03T17:53:26.084Z" },
{ url = "https://files.pythonhosted.org/packages/30/15/e396325080d600b436acc970848d69df9c13977942fb62bb8722d729bee8/ruff-0.15.0-py3-none-musllinux_1_2_i686.whl", hash = "sha256:77e515f6b15f828b94dc17d2b4ace334c9ddb7d9468c54b2f9ed2b9c1593ef16", size = 10676120, upload-time = "2026-02-03T17:53:09.363Z" },
{ url = "https://files.pythonhosted.org/packages/8d/c9/229a23d52a2983de1ad0fb0ee37d36e0257e6f28bfd6b498ee2c76361874/ruff-0.15.0-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:6f6e80850a01eb13b3e42ee0ebdf6e4497151b48c35051aab51c101266d187a3", size = 11201636, upload-time = "2026-02-03T17:52:57.281Z" },
]
[[package]]
@@ -4839,7 +4827,7 @@ wheels = [
[[package]]
name = "sentence-transformers"
version = "5.2.3"
version = "5.2.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "huggingface-hub", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -4854,9 +4842,9 @@ dependencies = [
{ name = "transformers", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "typing-extensions", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/5b/30/21664028fc0776eb1ca024879480bbbab36f02923a8ff9e4cae5a150fa35/sentence_transformers-5.2.3.tar.gz", hash = "sha256:3cd3044e1f3fe859b6a1b66336aac502eaae5d3dd7d5c8fc237f37fbf58137c7", size = 381623, upload-time = "2026-02-17T14:05:20.238Z" }
sdist = { url = "https://files.pythonhosted.org/packages/a6/bc/0bc9c0ec1cf83ab2ec6e6f38667d167349b950fff6dd2086b79bd360eeca/sentence_transformers-5.2.2.tar.gz", hash = "sha256:7033ee0a24bc04c664fd490abf2ef194d387b3a58a97adcc528783ff505159fa", size = 381607, upload-time = "2026-01-27T11:11:02.658Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/46/9f/dba4b3e18ebbe1eaa29d9f1764fbc7da0cd91937b83f2b7928d15c5d2d36/sentence_transformers-5.2.3-py3-none-any.whl", hash = "sha256:6437c62d4112b615ddebda362dfc16a4308d604c5b68125ed586e3e95d5b2e30", size = 494225, upload-time = "2026-02-17T14:05:18.596Z" },
{ url = "https://files.pythonhosted.org/packages/cc/21/7e925890636791386e81b52878134f114d63072e79fffe14cdcc5e7a5e6a/sentence_transformers-5.2.2-py3-none-any.whl", hash = "sha256:280ac54bffb84c110726b4d8848ba7b7c60813b9034547f8aea6e9a345cd1c23", size = 494106, upload-time = "2026-01-27T11:11:00.983Z" },
]
[[package]]
@@ -5145,15 +5133,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/ab/0d/c1ad6f4016a3968c048545f5d9b8ffebf577774b2ede3e2e352553b685fe/tiktoken-0.12.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:5edb8743b88d5be814b1a8a8854494719080c28faaa1ccbef02e87354fe71ef0", size = 1253706, upload-time = "2025-10-06T20:22:33.385Z" },
]
[[package]]
name = "tinytag"
version = "2.2.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/98/07/fb260bac73119f369a10e884016516d07cd760b5068e703773f83dd5e7bf/tinytag-2.2.0.tar.gz", hash = "sha256:f15b082510f6e0fc717e597edc8759d6f2d3ff6194ac0f3bcd675a9a09d9b798", size = 38120, upload-time = "2025-12-15T21:10:19.093Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/b1/e2/9818fcebb348237389d2ac2fea97cf2b2638378a0866105a45ae9be49728/tinytag-2.2.0-py3-none-any.whl", hash = "sha256:d2cf3ef8ee0f6c854663f77d9d5f8159ee1c834c70f5ea4f214ddc4af8148f79", size = 32861, upload-time = "2025-12-15T21:10:17.63Z" },
]
[[package]]
name = "tokenizers"
version = "0.22.2"
@@ -5502,11 +5481,11 @@ wheels = [
[[package]]
name = "types-markdown"
version = "3.10.2.20260211"
version = "3.10.0.20251106"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/6d/2e/35b30a09f6ee8a69142408d3ceb248c4454aa638c0a414d8704a3ef79563/types_markdown-3.10.2.20260211.tar.gz", hash = "sha256:66164310f88c11a58c6c706094c6f8c537c418e3525d33b76276a5fbd66b01ce", size = 19768, upload-time = "2026-02-11T04:19:29.497Z" }
sdist = { url = "https://files.pythonhosted.org/packages/de/e4/060f0dadd9b551cae77d6407f2bc84b168f918d90650454aff219c1b3ed2/types_markdown-3.10.0.20251106.tar.gz", hash = "sha256:12836f7fcbd7221db8baeb0d3a2f820b95050d0824bfa9665c67b4d144a1afa1", size = 19486, upload-time = "2025-11-06T03:06:44.317Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/54/c9/659fa2df04b232b0bfcd05d2418e683080e91ec68f636f3c0a5a267350e7/types_markdown-3.10.2.20260211-py3-none-any.whl", hash = "sha256:2d94d08587e3738203b3c4479c449845112b171abe8b5cadc9b0c12fcf3e99da", size = 25854, upload-time = "2026-02-11T04:19:28.647Z" },
{ url = "https://files.pythonhosted.org/packages/92/58/f666ca9391f2a8bd33bb0b0797cde6ac3e764866708d5f8aec6fab215320/types_markdown-3.10.0.20251106-py3-none-any.whl", hash = "sha256:2c39512a573899b59efae07e247ba088a75b70e3415e81277692718f430afd7e", size = 25862, upload-time = "2025-11-06T03:06:43.082Z" },
]
[[package]]
@@ -5595,6 +5574,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/2b/7f/016dc5cc718ec6ccaa84fb73ed409ef1c261793fd5e637cdfaa18beb40a9/types_setuptools-80.10.0.20260124-py3-none-any.whl", hash = "sha256:efed7e044f01adb9c2806c7a8e1b6aa3656b8e382379b53d5f26ee3db24d4c01", size = 64333, upload-time = "2026-01-24T03:18:38.344Z" },
]
[[package]]
name = "types-tqdm"
version = "4.67.3.20260205"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "types-requests", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/53/46/790b9872523a48163bdda87d47849b4466017640e5259d06eed539340afd/types_tqdm-4.67.3.20260205.tar.gz", hash = "sha256:f3023682d4aa3bbbf908c8c6bb35f35692d319460d9bbd3e646e8852f3dd9f85", size = 17597, upload-time = "2026-02-05T04:03:19.721Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/cc/da/7f761868dbaa328392356fab30c18ab90d14cce86b269e7e63328f29d4a3/types_tqdm-4.67.3.20260205-py3-none-any.whl", hash = "sha256:85c31731e81dc3c5cecc34c6c8b2e5166fafa722468f58840c2b5ac6a8c5c173", size = 23894, upload-time = "2026-02-05T04:03:18.48Z" },
]
[[package]]
name = "types-webencodings"
version = "0.5.0.20251108"
@@ -6104,7 +6095,7 @@ wheels = [
[[package]]
name = "zensical"
version = "0.0.24"
version = "0.0.21"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "click", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -6115,18 +6106,18 @@ dependencies = [
{ name = "pyyaml", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "tomli", marker = "(python_full_version < '3.11' and sys_platform == 'darwin') or (python_full_version < '3.11' and sys_platform == 'linux')" },
]
sdist = { url = "https://files.pythonhosted.org/packages/3b/96/9c6cbdd7b351d1023cdbbcf7872d4cb118b0334cfe5821b99e0dd18e3f00/zensical-0.0.24.tar.gz", hash = "sha256:b5d99e225329bf4f98c8022bdf0a0ee9588c2fada7b4df1b7b896fcc62b37ec3", size = 3840688, upload-time = "2026-02-26T09:43:44.557Z" }
sdist = { url = "https://files.pythonhosted.org/packages/8a/50/2655b5f72d0c72f4366be580f5e2354ff05280d047ea986fe89570e44589/zensical-0.0.21.tar.gz", hash = "sha256:c13563836fa63a3cabeffd83fe3a770ca740cfa5ae7b85df85d89837e31b3b4a", size = 3819731, upload-time = "2026-02-04T17:47:59.396Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/8e/aa/b8201af30e376a67566f044a1c56210edac5ae923fd986a836d2cf593c9c/zensical-0.0.24-cp310-abi3-macosx_10_12_x86_64.whl", hash = "sha256:d390c5453a5541ca35d4f9e1796df942b6612c546e3153dd928236d3b758409a", size = 12263407, upload-time = "2026-02-26T09:43:14.716Z" },
{ url = "https://files.pythonhosted.org/packages/78/8e/3d910214471ade604fd39b080db3696864acc23678b5b4b8475c7dbfd2ce/zensical-0.0.24-cp310-abi3-macosx_11_0_arm64.whl", hash = "sha256:81ac072869cf4d280853765b2bfb688653da0dfb9408f3ab15aca96455ab8223", size = 12142610, upload-time = "2026-02-26T09:43:17.546Z" },
{ url = "https://files.pythonhosted.org/packages/cf/d7/eb0983640aa0419ddf670298cfbcf8b75629b6484925429b857851e00784/zensical-0.0.24-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b5eb1dfa84cae8e960bfa2c6851d2bc8e9710c4c4c683bd3aaf23185f646ae46", size = 12508380, upload-time = "2026-02-26T09:43:20.114Z" },
{ url = "https://files.pythonhosted.org/packages/a3/04/4405b9e6f937a75db19f0d875798a7eb70817d6a3bec2a2d289a2d5e8aea/zensical-0.0.24-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:57d7c9e589da99c1879a1c703e67c85eaa6be4661cdc6ce6534f7bb3575983f4", size = 12440807, upload-time = "2026-02-26T09:43:22.679Z" },
{ url = "https://files.pythonhosted.org/packages/12/dc/a7ca2a4224b3072a2c2998b6611ad7fd4f8f131ceae7aa23238d97d26e22/zensical-0.0.24-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:42fcc121c3095734b078a95a0dae4d4924fb8fbf16bf730456146ad6cab48ad0", size = 12782727, upload-time = "2026-02-26T09:43:25.347Z" },
{ url = "https://files.pythonhosted.org/packages/42/37/22f1727da356ed3fcbd31f68d4a477f15c232997c87e270cfffb927459ac/zensical-0.0.24-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:832d4a2a051b9f49561031a2986ace502326f82d9a401ddf125530d30025fdd4", size = 12547616, upload-time = "2026-02-26T09:43:28.031Z" },
{ url = "https://files.pythonhosted.org/packages/6d/ff/c75ff111b8e12157901d00752beef9d691dbb5a034b6a77359972262416a/zensical-0.0.24-cp310-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:e5fea3bb61238dba9f930f52669db67b0c26be98e1c8386a05eb2b1e3cb875dc", size = 12684883, upload-time = "2026-02-26T09:43:30.642Z" },
{ url = "https://files.pythonhosted.org/packages/b9/92/4f6ea066382e3d068d3cadbed99e9a71af25e46c84a403e0f747960472a2/zensical-0.0.24-cp310-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:75eef0428eec2958590633fdc82dc2a58af124879e29573aa7e153b662978073", size = 12713825, upload-time = "2026-02-26T09:43:33.273Z" },
{ url = "https://files.pythonhosted.org/packages/bc/fb/bf735b19bce0034b1f3b8e1c50b2896ebbd0c5d92d462777e759e78bb083/zensical-0.0.24-cp310-abi3-musllinux_1_2_i686.whl", hash = "sha256:3c6b39659156394ff805b4831dac108c839483d9efa4c9b901eaa913efee1ac7", size = 12854318, upload-time = "2026-02-26T09:43:35.632Z" },
{ url = "https://files.pythonhosted.org/packages/7e/28/0ddab6c1237e3625e7763ff666806f31e5760bb36d18624135a6bb6e8643/zensical-0.0.24-cp310-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:9eef82865a18b3ca4c3cd13e245dff09a865d1da3c861e2fc86eaa9253a90f02", size = 12818270, upload-time = "2026-02-26T09:43:37.749Z" },
{ url = "https://files.pythonhosted.org/packages/1d/98/90710d232cb35b633815fa7b493da542391b89283b6103a5bb4ae9fc0dd9/zensical-0.0.21-cp310-abi3-macosx_10_12_x86_64.whl", hash = "sha256:67404cc70c330246dfb7269bcdb60a25be0bb60a212a09c9c50229a1341b1f84", size = 12237120, upload-time = "2026-02-04T17:47:28.615Z" },
{ url = "https://files.pythonhosted.org/packages/97/fb/4280b3781157e8f051711732192f949bf29beeafd0df3e33c1c8bf9b7a1a/zensical-0.0.21-cp310-abi3-macosx_11_0_arm64.whl", hash = "sha256:d4fd253ccfbf5af56434124f13bac01344e456c020148369b18d8836b6537c3c", size = 12118047, upload-time = "2026-02-04T17:47:31.369Z" },
{ url = "https://files.pythonhosted.org/packages/74/b3/b7f85ae9cf920cf9f17bf157ae6c274919477148feb7716bf735636caa0e/zensical-0.0.21-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:440e40cdc30a29bf7466bcd6f43ed7bd1c54ea3f1a0fefca65619358b481a5bc", size = 12473440, upload-time = "2026-02-04T17:47:33.577Z" },
{ url = "https://files.pythonhosted.org/packages/d8/ac/1dc6e98f79ed19b9f103c88a0bd271f9140565d7d26b64bc1542b3ef6d91/zensical-0.0.21-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:368e832fc8068e75dc45cab59379db4cefcd81eb116f48d058db8fb7b7aa8d14", size = 12412588, upload-time = "2026-02-04T17:47:36.491Z" },
{ url = "https://files.pythonhosted.org/packages/bd/76/16a580f6dd32b387caa4a41615451e7dddd1917a2ff2e5b08744f41b4e11/zensical-0.0.21-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f4ab962d47f9dd73510eed168469326c7a452554dfbfdb9cdf85efc7140244df", size = 12749438, upload-time = "2026-02-04T17:47:38.969Z" },
{ url = "https://files.pythonhosted.org/packages/95/30/4baaa1c910eee61db5f49d0d45f2e550a0027218c618f3dd7f8da966a019/zensical-0.0.21-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:b846d53dfce007f056ff31848f87f3f2a388228e24d4851c0cafdce0fa204c9b", size = 12514504, upload-time = "2026-02-04T17:47:41.31Z" },
{ url = "https://files.pythonhosted.org/packages/76/77/931fccae5580b94409a0448a26106f922dcfa7822e7b93cacd2876dd63a8/zensical-0.0.21-cp310-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:daac1075552d230d52d621d2e4754ba24d5afcaa201a7a991f1a8d57e320c9de", size = 12647832, upload-time = "2026-02-04T17:47:44.073Z" },
{ url = "https://files.pythonhosted.org/packages/5b/82/3cf75de64340829d55c87c36704f4d1d8c952bd2cdc8a7bc48cbfb8ab333/zensical-0.0.21-cp310-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:7b380f545adb6d40896f9bd698eb0e1540ed4258d35b83f55f91658d0fdae312", size = 12678537, upload-time = "2026-02-04T17:47:46.899Z" },
{ url = "https://files.pythonhosted.org/packages/77/91/6f4938dceeaa241f78bbfaf58a94acef10ba18be3468795173e3087abeb6/zensical-0.0.21-cp310-abi3-musllinux_1_2_i686.whl", hash = "sha256:5c2227fdab64616bea94b40b8340bafe00e2e23631cc58eeea1e7267167e6ac5", size = 12822164, upload-time = "2026-02-04T17:47:49.231Z" },
{ url = "https://files.pythonhosted.org/packages/a2/4e/a9c9d25ef0766f767db7b4f09da68da9b3d8a28c3d68cfae01f8e3f9e297/zensical-0.0.21-cp310-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:2e0f5154d236ed0f98662ee68785b67e8cd2138ea9d5e26070649e93c22eeee0", size = 12785632, upload-time = "2026-02-04T17:47:52.613Z" },
]
[[package]]

View File

@@ -16,6 +16,7 @@ repo_name = "paperless-ngx/paperless-ngx"
nav = [
"index.md",
"setup.md",
"migration.md",
"usage.md",
"configuration.md",
"administration.md",