Compare commits

..

11 Commits

Author SHA1 Message Date
Trenton H
0887203d45 feat(profiling): add workflow trigger matching profiling
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 14:55:25 -07:00
Trenton H
ea14c0b06f fix(profiling): use sha256 for sanity corpus checksums
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 14:50:21 -07:00
Trenton H
a8dc332abb feat(profiling): add sanity checker profiling
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 14:42:00 -07:00
Trenton H
e64b9a4cfd feat(profiling): add matching pipeline profiling
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 14:30:33 -07:00
Trenton H
6ba1acd7d3 fix(profiling): fix stale docstring and add module_db docstring in doclist test 2026-04-11 14:19:51 -07:00
Trenton H
d006b79fd1 feat(profiling): add document list API and selection_data profiling
Adds test_doclist_profile.py with 8 profiling tests covering the
/api/documents/ list path (ORM ordering, page sizes, single-doc detail,
cProfile) and _get_selection_data_for_queryset in isolation and via API.
Also registers the 'profiling' pytest marker in pyproject.toml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 14:13:17 -07:00
Trenton H
24b754b44c fix(profiling): fix stale run paths in docstrings and consolidate profiling imports 2026-04-11 13:57:00 -07:00
Trenton H
a1a3520a8c refactor(profiling): use shared profile_cpu in backend search test
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 13:46:43 -07:00
Trenton H
23449cda17 refactor(profiling): use shared profile_cpu/measure_memory in classifier test
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 13:44:57 -07:00
Trenton H
ca3f5665ba fix(profiling): correct docstring import path and add Callable type annotation 2026-04-11 13:29:48 -07:00
Trenton H
9aa0914c3f feat(profiling): add profile_cpu and measure_memory shared helpers
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 13:22:43 -07:00
61 changed files with 11755 additions and 9231 deletions

View File

@@ -165,7 +165,6 @@ jobs:
contents: read
env:
DEFAULT_PYTHON: "3.12"
PAPERLESS_SECRET_KEY: "ci-typing-not-a-real-secret"
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

View File

@@ -88,7 +88,6 @@ jobs:
uv export --quiet --no-dev --all-extras --format requirements-txt --output-file requirements.txt
- name: Compile messages
env:
PAPERLESS_SECRET_KEY: "ci-release-not-a-real-secret"
PYTHON_VERSION: ${{ steps.setup-python.outputs.python-version }}
run: |
cd src/
@@ -97,7 +96,6 @@ jobs:
manage.py compilemessages
- name: Collect static files
env:
PAPERLESS_SECRET_KEY: "ci-release-not-a-real-secret"
PYTHON_VERSION: ${{ steps.setup-python.outputs.python-version }}
run: |
cd src/

View File

@@ -36,8 +36,6 @@ jobs:
--group dev \
--frozen
- name: Generate backend translation strings
env:
PAPERLESS_SECRET_KEY: "ci-translate-not-a-real-secret"
run: cd src/ && uv run manage.py makemessages -l en_US -i "samples*"
- name: Install pnpm
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v5.0.0

.gitignore vendored
View File

@@ -79,7 +79,6 @@ virtualenv
/docker-compose.env
/docker-compose.yml
.ruff_cache/
.mypy_cache/
# Used for development
scripts/import-for-development
@@ -112,6 +111,4 @@ celerybeat-schedule*
# ignore pnpm package store folder created when setting up the devcontainer
.pnpm-store/
# Git worktree local folder
.worktrees

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

View File

@@ -241,66 +241,3 @@ For example:
}
}
```
## Consume Script Positional Arguments Removed
Pre- and post-consumption scripts no longer receive positional arguments. All information is
now passed exclusively via environment variables, which have been available since earlier versions.
### Pre-consumption script
Previously, the original file path was passed as `$1`. It is now only available as
`DOCUMENT_SOURCE_PATH`.
**Before:**
```bash
#!/usr/bin/env bash
# $1 was the original file path
process_document "$1"
```
**After:**
```bash
#!/usr/bin/env bash
process_document "${DOCUMENT_SOURCE_PATH}"
```
### Post-consumption script
Previously, document metadata was passed as positional arguments `$1` through `$8`:
| Argument | Environment Variable Equivalent |
| -------- | ------------------------------- |
| `$1` | `DOCUMENT_ID` |
| `$2` | `DOCUMENT_FILE_NAME` |
| `$3` | `DOCUMENT_SOURCE_PATH` |
| `$4` | `DOCUMENT_THUMBNAIL_PATH` |
| `$5` | `DOCUMENT_DOWNLOAD_URL` |
| `$6` | `DOCUMENT_THUMBNAIL_URL` |
| `$7` | `DOCUMENT_CORRESPONDENT` |
| `$8` | `DOCUMENT_TAGS` |
**Before:**
```bash
#!/usr/bin/env bash
DOCUMENT_ID=$1
CORRESPONDENT=$7
TAGS=$8
```
**After:**
```bash
#!/usr/bin/env bash
# Use environment variables directly
echo "Document ${DOCUMENT_ID} from ${DOCUMENT_CORRESPONDENT} tagged: ${DOCUMENT_TAGS}"
```
### Action Required
Update any pre- or post-consumption scripts that read `$1`, `$2`, etc. to use the
corresponding environment variables instead. Environment variables have been the preferred
option since v1.8.0.
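
Since the hooks can be any executable, the same post-consumption logic also works from a Python script. A minimal sketch using only the environment variables documented above (the script itself is illustrative and not part of this change):

```python
#!/usr/bin/env python3
# Post-consumption hook reading only the documented environment variables;
# no positional arguments are used.
import os

document_id = os.environ["DOCUMENT_ID"]
correspondent = os.environ.get("DOCUMENT_CORRESPONDENT", "")
tags = os.environ.get("DOCUMENT_TAGS", "")

print(f"Document {document_id} from {correspondent} tagged: {tags}")
```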

profiling.py Normal file
View File

@@ -0,0 +1,150 @@
"""
Temporary profiling utilities for comparing implementations.
Usage in a management command or shell::
from profiling import profile_block, profile_cpu, measure_memory
with profile_block("new check_sanity"):
messages = check_sanity()
with profile_block("old check_sanity"):
messages = check_sanity_old()
Drop this file when done.
"""
from __future__ import annotations
import tracemalloc
from collections.abc import Callable # noqa: TC003
from collections.abc import Generator # noqa: TC003
from contextlib import contextmanager
from time import perf_counter
from typing import Any
from django.db import connection
from django.db import reset_queries
from django.test.utils import override_settings
@contextmanager
def profile_block(label: str = "block") -> Generator[None, None, None]:
"""Profile memory, wall time, and DB queries for a code block.
Prints a summary to stdout on exit. Requires no external packages.
Enables DEBUG temporarily to capture Django's query log.
"""
tracemalloc.start()
snapshot_before = tracemalloc.take_snapshot()
with override_settings(DEBUG=True):
reset_queries()
start = perf_counter()
yield
elapsed = perf_counter() - start
queries = list(connection.queries)
snapshot_after = tracemalloc.take_snapshot()
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
# Compare snapshots for top allocations
stats = snapshot_after.compare_to(snapshot_before, "lineno")
query_time = sum(float(q["time"]) for q in queries)
mem_diff = sum(s.size_diff for s in stats)
print(f"\n{'=' * 60}") # noqa: T201
print(f" Profile: {label}") # noqa: T201
print(f"{'=' * 60}") # noqa: T201
print(f" Wall time: {elapsed:.4f}s") # noqa: T201
print(f" Queries: {len(queries)} ({query_time:.4f}s)") # noqa: T201
print(f" Memory delta: {mem_diff / 1024:.1f} KiB") # noqa: T201
print(f" Peak memory: {peak / 1024:.1f} KiB") # noqa: T201
print("\n Top 5 allocations:") # noqa: T201
for stat in stats[:5]:
print(f" {stat}") # noqa: T201
print(f"{'=' * 60}\n") # noqa: T201
def profile_cpu(
fn: Callable[[], Any],
*,
label: str,
top: int = 30,
sort: str = "cumtime",
) -> tuple[Any, float]:
"""Run *fn()* under cProfile, print stats, return (result, elapsed_s).
Args:
fn: Zero-argument callable to profile.
label: Human-readable label printed in the header.
top: Number of cProfile rows to print.
sort: cProfile sort key (default: cumulative time).
Returns:
``(result, elapsed_s)`` where *result* is the return value of *fn()*.
"""
import cProfile
import io
import pstats
pr = cProfile.Profile()
t0 = perf_counter()
pr.enable()
result = fn()
pr.disable()
elapsed = perf_counter() - t0
buf = io.StringIO()
ps = pstats.Stats(pr, stream=buf).sort_stats(sort)
ps.print_stats(top)
print(f"\n{'=' * 72}") # noqa: T201
print(f" {label}") # noqa: T201
print(f" wall time: {elapsed * 1000:.1f} ms") # noqa: T201
print(f"{'=' * 72}") # noqa: T201
print(buf.getvalue()) # noqa: T201
return result, elapsed
def measure_memory(fn: Callable[[], Any], *, label: str) -> tuple[Any, float, float]:
"""Run *fn()* under tracemalloc, print allocation report.
Args:
fn: Zero-argument callable to profile.
label: Human-readable label printed in the header.
Returns:
``(result, peak_kib, delta_kib)``.
"""
tracemalloc.start()
snapshot_before = tracemalloc.take_snapshot()
t0 = perf_counter()
result = fn()
elapsed = perf_counter() - t0
snapshot_after = tracemalloc.take_snapshot()
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
stats = snapshot_after.compare_to(snapshot_before, "lineno")
delta_kib = sum(s.size_diff for s in stats) / 1024
print(f"\n{'=' * 72}") # noqa: T201
print(f" [memory] {label}") # noqa: T201
print(f" wall time: {elapsed * 1000:.1f} ms") # noqa: T201
print(f" memory delta: {delta_kib:+.1f} KiB") # noqa: T201
print(f" peak traced: {peak / 1024:.1f} KiB") # noqa: T201
print(f"{'=' * 72}") # noqa: T201
print(" Top allocation sites (by size_diff):") # noqa: T201
for stat in stats[:20]:
if stat.size_diff != 0:
print( # noqa: T201
f" {stat.size_diff / 1024:+8.1f} KiB {stat.traceback.format()[0]}",
)
return result, peak / 1024, delta_kib
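
A minimal sketch of how `profile_cpu` and `measure_memory` might be invoked from a test carrying the new `profiling` marker; the test name and queryset below are illustrative and not part of this diff:

```python
import pytest

from documents.models import Document
from profiling import measure_memory, profile_cpu


@pytest.mark.profiling
@pytest.mark.django_db
def test_profile_document_list():
    def fetch():
        # Illustrative workload: materialize the first 100 documents
        return list(Document.objects.all()[:100])

    # cProfile report plus wall time for the callable
    result, elapsed_s = profile_cpu(fetch, label="document list", top=20)

    # tracemalloc-based allocation report for the same callable
    result, peak_kib, delta_kib = measure_memory(fetch, label="document list")
```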

View File

@@ -24,10 +24,11 @@ dependencies = [
"dateparser~=1.2",
# WARNING: django does not use semver.
# Only patch versions are guaranteed to not introduce breaking changes.
"django~=5.2.13",
"django~=5.2.10",
"django-allauth[mfa,socialaccount]~=65.15.0",
"django-auditlog~=3.4.1",
"django-cachalot~=2.9.0",
"django-celery-results~=2.6.0",
"django-compression-middleware~=0.5.0",
"django-cors-headers~=4.9.0",
"django-extensions~=4.1",
@@ -112,7 +113,7 @@ testing = [
"factory-boy~=3.3.1",
"faker~=40.12.0",
"imagehash",
"pytest~=9.0.3",
"pytest~=9.0.0",
"pytest-cov~=7.1.0",
"pytest-django~=4.12.0",
"pytest-env~=1.6.0",
@@ -311,7 +312,7 @@ markers = [
"date_parsing: Tests which cover date parsing from content or filename",
"management: Tests which cover management commands/functionality",
"search: Tests for the Tantivy search backend",
"api: Tests for REST API endpoints",
"profiling: Performance profiling tests — print measurements, no assertions",
]
[tool.pytest_env]

src-ui/pnpm-lock.yaml generated
View File

@@ -4363,8 +4363,8 @@ packages:
flatted@3.4.2:
resolution: {integrity: sha512-PjDse7RzhcPkIJwy5t7KPWQSZ9cAbzQXcafsetQoD7sOJRQlGikNbx7yZp2OotDnJyrDcbyRq3Ttb18iYOqkxA==}
follow-redirects@1.16.0:
resolution: {integrity: sha512-y5rN/uOsadFT/JfYwhxRS5R7Qce+g3zG97+JrtFZlC9klX/W5hD7iiLzScI4nZqUS7DNUdhPgw4xI8W2LuXlUw==}
follow-redirects@1.15.11:
resolution: {integrity: sha512-deG2P0JfjrTxl50XGCDyfI97ZGVCxIpfKYmfyrQ54n5FO/0gfIES8C/Psl6kWVDolizcaaxZJnTS0QSMxvnsBQ==}
engines: {node: '>=4.0'}
peerDependencies:
debug: '*'
@@ -11427,7 +11427,7 @@ snapshots:
flatted@3.4.2: {}
follow-redirects@1.16.0(debug@4.4.3):
follow-redirects@1.15.11(debug@4.4.3):
optionalDependencies:
debug: 4.4.3
@@ -11634,7 +11634,7 @@ snapshots:
http-proxy@1.18.1(debug@4.4.3):
dependencies:
eventemitter3: 4.0.7
follow-redirects: 1.16.0(debug@4.4.3)
follow-redirects: 1.15.11(debug@4.4.3)
requires-port: 1.0.0
transitivePeerDependencies:
- debug

View File

@@ -76,27 +76,33 @@
<label class="form-check-label" for="task{{task.id}}"></label>
</div>
</td>
<td class="overflow-auto name-col">{{ task.input_data?.filename }}</td>
<td class="overflow-auto name-col">{{ task.task_file_name }}</td>
<td class="d-none d-lg-table-cell">{{ task.date_created | customDate:'short' }}</td>
@if (activeTab !== 'started' && activeTab !== 'queued') {
<td class="d-none d-lg-table-cell">
@if (task.result_message?.length > 50) {
@if (task.result?.length > 50) {
<div class="result" (click)="expandTask(task); $event.stopPropagation();"
[ngbPopover]="resultPopover" popoverClass="shadow small mobile" triggers="mouseenter:mouseleave" container="body">
<span class="small d-none d-md-inline-block font-monospace text-muted">{{ task.result_message | slice:0:50 }}&hellip;</span>
<span class="small d-none d-md-inline-block font-monospace text-muted">{{ task.result | slice:0:50 }}&hellip;</span>
</div>
}
@if (task.result_message?.length <= 50) {
<span class="small d-none d-md-inline-block font-monospace text-muted">{{ task.result_message }}</span>
@if (task.result?.length <= 50) {
<span class="small d-none d-md-inline-block font-monospace text-muted">{{ task.result }}</span>
}
<ng-template #resultPopover>
<pre class="small mb-0">{{ task.result_message | slice:0:300 }}@if (task.result_message.length > 300) {
<pre class="small mb-0">{{ task.result | slice:0:300 }}@if (task.result.length > 300) {
&hellip;
}</pre>
@if (task.result_message?.length > 300) {
@if (task.result?.length > 300) {
<br/><em>(<ng-container i18n>click for full output</ng-container>)</em>
}
</ng-template>
@if (task.duplicate_documents?.length > 0) {
<div class="small text-warning-emphasis d-flex align-items-center gap-1">
<i-bs class="lh-1" width="1em" height="1em" name="exclamation-triangle"></i-bs>
<span i18n>Duplicate(s) detected</span>
</div>
}
</td>
}
<td class="d-lg-none">
@@ -110,7 +116,7 @@
<i-bs name="check" class="me-1"></i-bs><ng-container i18n>Dismiss</ng-container>
</button>
<ng-container *pngxIfPermissions="{ action: PermissionAction.View, type: PermissionType.Document }">
@if (task.related_document_ids?.[0]) {
@if (task.related_document) {
<button class="btn btn-sm btn-outline-primary" (click)="dismissAndGo(task); $event.stopPropagation();">
<i-bs name="file-text" class="me-1"></i-bs><ng-container i18n>Open Document</ng-container>
</button>
@@ -121,7 +127,7 @@
</tr>
<tr>
<td class="p-0" [class.border-0]="expandedTask !== task.id" colspan="5">
<pre #collapse="ngbCollapse" [ngbCollapse]="expandedTask !== task.id" class="small mb-0"><div class="small p-1 p-lg-3 ms-lg-3">{{ task.result_message }}</div></pre>
<pre #collapse="ngbCollapse" [ngbCollapse]="expandedTask !== task.id" class="small mb-0"><div class="small p-1 p-lg-3 ms-lg-3">{{ task.result }}</div></pre>
</td>
</tr>
}

View File

@@ -20,8 +20,8 @@ import { throwError } from 'rxjs'
import { routes } from 'src/app/app-routing.module'
import {
PaperlessTask,
PaperlessTaskName,
PaperlessTaskStatus,
PaperlessTaskTriggerSource,
PaperlessTaskType,
} from 'src/app/data/paperless-task'
import { IfPermissionsDirective } from 'src/app/directives/if-permissions.directive'
@@ -39,100 +39,81 @@ const tasks: PaperlessTask[] = [
{
id: 467,
task_id: '11ca1a5b-9f81-442c-b2c8-7e4ae53657f1',
input_data: { filename: 'test.pdf' },
task_file_name: 'test.pdf',
date_created: new Date('2023-03-01T10:26:03.093116Z'),
date_done: new Date('2023-03-01T10:26:07.223048Z'),
task_type: PaperlessTaskType.ConsumeFile,
task_type_display: 'Consume File',
trigger_source: PaperlessTaskTriggerSource.FolderConsume,
trigger_source_display: 'Folder Consume',
status: PaperlessTaskStatus.Failure,
status_display: 'Failure',
result_message:
'test.pd: Not consuming test.pdf: It is a duplicate of test (#100)',
type: PaperlessTaskType.Auto,
task_name: PaperlessTaskName.ConsumeFile,
status: PaperlessTaskStatus.Failed,
result: 'test.pd: Not consuming test.pdf: It is a duplicate of test (#100)',
acknowledged: false,
related_document_ids: [],
related_document: null,
},
{
id: 466,
task_id: '10ca1a5b-3c08-442c-b2c8-7e4ae53657f1',
input_data: { filename: '191092.pdf' },
task_file_name: '191092.pdf',
date_created: new Date('2023-03-01T09:26:03.093116Z'),
date_done: new Date('2023-03-01T09:26:07.223048Z'),
task_type: PaperlessTaskType.ConsumeFile,
task_type_display: 'Consume File',
trigger_source: PaperlessTaskTriggerSource.FolderConsume,
trigger_source_display: 'Folder Consume',
status: PaperlessTaskStatus.Failure,
status_display: 'Failure',
result_message:
type: PaperlessTaskType.Auto,
task_name: PaperlessTaskName.ConsumeFile,
status: PaperlessTaskStatus.Failed,
result:
'191092.pd: Not consuming 191092.pdf: It is a duplicate of 191092 (#311)',
acknowledged: false,
related_document_ids: [],
related_document: null,
},
{
id: 465,
task_id: '3612d477-bb04-44e3-985b-ac580dd496d8',
input_data: { filename: 'Scan Jun 6, 2023 at 3.19 PM.pdf' },
task_file_name: 'Scan Jun 6, 2023 at 3.19 PM.pdf',
date_created: new Date('2023-06-06T15:22:05.722323-07:00'),
date_done: new Date('2023-06-06T15:22:14.564305-07:00'),
task_type: PaperlessTaskType.ConsumeFile,
task_type_display: 'Consume File',
trigger_source: PaperlessTaskTriggerSource.FolderConsume,
trigger_source_display: 'Folder Consume',
type: PaperlessTaskType.Auto,
task_name: PaperlessTaskName.ConsumeFile,
status: PaperlessTaskStatus.Pending,
status_display: 'Pending',
result_message: null,
result: null,
acknowledged: false,
related_document_ids: [],
related_document: null,
},
{
id: 464,
task_id: '2eac4716-2aa6-4dcd-9953-264e11656d7e',
input_data: { filename: 'paperless-mail-l4dkg8ir' },
task_file_name: 'paperless-mail-l4dkg8ir',
date_created: new Date('2023-06-04T11:24:32.898089-07:00'),
date_done: new Date('2023-06-04T11:24:44.678605-07:00'),
task_type: PaperlessTaskType.ConsumeFile,
task_type_display: 'Consume File',
trigger_source: PaperlessTaskTriggerSource.EmailConsume,
trigger_source_display: 'Email Consume',
status: PaperlessTaskStatus.Success,
status_display: 'Success',
result_message: 'Success. New document id 422 created',
type: PaperlessTaskType.Auto,
task_name: PaperlessTaskName.ConsumeFile,
status: PaperlessTaskStatus.Complete,
result: 'Success. New document id 422 created',
acknowledged: false,
related_document_ids: [422],
related_document: 422,
},
{
id: 463,
task_id: '28125528-1575-4d6b-99e6-168906e8fa5c',
input_data: { filename: 'onlinePaymentSummary.pdf' },
task_file_name: 'onlinePaymentSummary.pdf',
date_created: new Date('2023-06-01T13:49:51.631305-07:00'),
date_done: new Date('2023-06-01T13:49:54.190220-07:00'),
task_type: PaperlessTaskType.ConsumeFile,
task_type_display: 'Consume File',
trigger_source: PaperlessTaskTriggerSource.FolderConsume,
trigger_source_display: 'Folder Consume',
status: PaperlessTaskStatus.Success,
status_display: 'Success',
result_message: 'Success. New document id 421 created',
type: PaperlessTaskType.Auto,
task_name: PaperlessTaskName.ConsumeFile,
status: PaperlessTaskStatus.Complete,
result: 'Success. New document id 421 created',
acknowledged: false,
related_document_ids: [421],
related_document: 421,
},
{
id: 462,
task_id: 'a5b9ca47-0c8e-490f-a04c-6db5d5fc09e5',
input_data: { filename: 'paperless-mail-_rrpmqk6' },
task_file_name: 'paperless-mail-_rrpmqk6',
date_created: new Date('2023-06-07T02:54:35.694916Z'),
date_done: null,
task_type: PaperlessTaskType.ConsumeFile,
task_type_display: 'Consume File',
trigger_source: PaperlessTaskTriggerSource.EmailConsume,
trigger_source_display: 'Email Consume',
type: PaperlessTaskType.Auto,
task_name: PaperlessTaskName.ConsumeFile,
status: PaperlessTaskStatus.Started,
status_display: 'Started',
result_message: null,
result: null,
acknowledged: false,
related_document_ids: [],
related_document: null,
},
]
@@ -186,7 +167,7 @@ describe('TasksComponent', () => {
fixture.detectChanges()
httpTestingController
.expectOne(
`${environment.apiBaseUrl}tasks/?task_type=consume_file&acknowledged=false`
`${environment.apiBaseUrl}tasks/?task_name=consume_file&acknowledged=false`
)
.flush(tasks)
})
@@ -195,7 +176,7 @@ describe('TasksComponent', () => {
const tabButtons = fixture.debugElement.queryAll(By.directive(NgbNavItem))
let currentTasksLength = tasks.filter(
(t) => t.status === PaperlessTaskStatus.Failure
(t) => t.status === PaperlessTaskStatus.Failed
).length
component.activeTab = TaskTab.Failed
fixture.detectChanges()
@@ -207,7 +188,7 @@ describe('TasksComponent', () => {
).toHaveLength(currentTasksLength + 1)
currentTasksLength = tasks.filter(
(t) => t.status === PaperlessTaskStatus.Success
(t) => t.status === PaperlessTaskStatus.Complete
).length
component.activeTab = TaskTab.Completed
fixture.detectChanges()
@@ -327,7 +308,7 @@ describe('TasksComponent', () => {
expect(component.selectedTasks).toEqual(
new Set(
tasks
.filter((t) => t.status === PaperlessTaskStatus.Failure)
.filter((t) => t.status === PaperlessTaskStatus.Failed)
.map((t) => t.id)
)
)
@@ -341,7 +322,7 @@ describe('TasksComponent', () => {
component.dismissAndGo(tasks[3])
expect(routerSpy).toHaveBeenCalledWith([
'documents',
tasks[3].related_document_ids?.[0],
tasks[3].related_document,
])
})

View File

@@ -175,7 +175,7 @@ export class TasksComponent
dismissAndGo(task: PaperlessTask) {
this.dismissTask(task)
this.router.navigate(['documents', task.related_document_ids?.[0]])
this.router.navigate(['documents', task.related_document])
}
expandTask(task: PaperlessTask) {
@@ -207,13 +207,11 @@ export class TasksComponent
if (this._filterText.length) {
tasks = tasks.filter((t) => {
if (this.filterTargetID == TaskFilterTargetID.Name) {
return (t.input_data?.filename as string)
?.toLowerCase()
return t.task_file_name
.toLowerCase()
.includes(this._filterText.toLowerCase())
} else if (this.filterTargetID == TaskFilterTargetID.Result) {
return t.result_message
?.toLowerCase()
.includes(this._filterText.toLowerCase())
return t.result.toLowerCase().includes(this._filterText.toLowerCase())
}
})
}

View File

@@ -169,10 +169,10 @@
}
</button>
@if (currentUserIsSuperUser) {
@if (isRunning(PaperlessTaskType.IndexOptimize)) {
@if (isRunning(PaperlessTaskName.IndexOptimize)) {
<div class="spinner-border spinner-border-sm ms-2" role="status"></div>
} @else {
<button class="btn btn-sm d-flex align-items-center btn-dark small ms-2" (click)="runTask(PaperlessTaskType.IndexOptimize)">
<button class="btn btn-sm d-flex align-items-center btn-dark small ms-2" (click)="runTask(PaperlessTaskName.IndexOptimize)">
<i-bs name="play-fill" class="me-1"></i-bs>
<ng-container i18n>Run Task</ng-container>
</button>
@@ -203,10 +203,10 @@
}
</button>
@if (currentUserIsSuperUser) {
@if (isRunning(PaperlessTaskType.TrainClassifier)) {
@if (isRunning(PaperlessTaskName.TrainClassifier)) {
<div class="spinner-border spinner-border-sm ms-2" role="status"></div>
} @else {
<button class="btn btn-sm d-flex align-items-center btn-dark small ms-2" (click)="runTask(PaperlessTaskType.TrainClassifier)">
<button class="btn btn-sm d-flex align-items-center btn-dark small ms-2" (click)="runTask(PaperlessTaskName.TrainClassifier)">
<i-bs name="play-fill" class="me-1"></i-bs>
<ng-container i18n>Run Task</ng-container>
</button>
@@ -237,10 +237,10 @@
}
</button>
@if (currentUserIsSuperUser) {
@if (isRunning(PaperlessTaskType.SanityCheck)) {
@if (isRunning(PaperlessTaskName.SanityCheck)) {
<div class="spinner-border spinner-border-sm ms-2" role="status"></div>
} @else {
<button class="btn btn-sm d-flex align-items-center btn-dark small ms-2" (click)="runTask(PaperlessTaskType.SanityCheck)">
<button class="btn btn-sm d-flex align-items-center btn-dark small ms-2" (click)="runTask(PaperlessTaskName.SanityCheck)">
<i-bs name="play-fill" class="me-1"></i-bs>
<ng-container i18n>Run Task</ng-container>
</button>
@@ -285,10 +285,10 @@
}
</button>
@if (currentUserIsSuperUser) {
@if (isRunning(PaperlessTaskType.LlmIndex)) {
@if (isRunning(PaperlessTaskName.LLMIndexUpdate)) {
<div class="spinner-border spinner-border-sm ms-2" role="status"></div>
} @else {
<button class="btn btn-sm d-flex align-items-center btn-dark small ms-2" (click)="runTask(PaperlessTaskType.LlmIndex)">
<button class="btn btn-sm d-flex align-items-center btn-dark small ms-2" (click)="runTask(PaperlessTaskName.LLMIndexUpdate)">
<i-bs name="play-fill" class="me-1"></i-bs>
<ng-container i18n>Run Task</ng-container>
</button>

View File

@@ -25,7 +25,7 @@ import {
import { NgbActiveModal } from '@ng-bootstrap/ng-bootstrap'
import { NgxBootstrapIconsModule, allIcons } from 'ngx-bootstrap-icons'
import { Subject, of, throwError } from 'rxjs'
import { PaperlessTaskType } from 'src/app/data/paperless-task'
import { PaperlessTaskName } from 'src/app/data/paperless-task'
import {
InstallType,
SystemStatus,
@@ -138,9 +138,9 @@ describe('SystemStatusDialogComponent', () => {
})
it('should check if task is running', () => {
component.runTask(PaperlessTaskType.IndexOptimize)
expect(component.isRunning(PaperlessTaskType.IndexOptimize)).toBeTruthy()
expect(component.isRunning(PaperlessTaskType.SanityCheck)).toBeFalsy()
component.runTask(PaperlessTaskName.IndexOptimize)
expect(component.isRunning(PaperlessTaskName.IndexOptimize)).toBeTruthy()
expect(component.isRunning(PaperlessTaskName.SanityCheck)).toBeFalsy()
})
it('should support running tasks, refresh status and show toasts', () => {
@@ -151,22 +151,22 @@ describe('SystemStatusDialogComponent', () => {
// fail first
runSpy.mockReturnValue(throwError(() => new Error('error')))
component.runTask(PaperlessTaskType.IndexOptimize)
expect(runSpy).toHaveBeenCalledWith(PaperlessTaskType.IndexOptimize)
component.runTask(PaperlessTaskName.IndexOptimize)
expect(runSpy).toHaveBeenCalledWith(PaperlessTaskName.IndexOptimize)
expect(toastErrorSpy).toHaveBeenCalledWith(
`Failed to start task ${PaperlessTaskType.IndexOptimize}, see the logs for more details`,
`Failed to start task ${PaperlessTaskName.IndexOptimize}, see the logs for more details`,
expect.any(Error)
)
// succeed
runSpy.mockReturnValue(of({}))
getStatusSpy.mockReturnValue(of(status))
component.runTask(PaperlessTaskType.IndexOptimize)
expect(runSpy).toHaveBeenCalledWith(PaperlessTaskType.IndexOptimize)
component.runTask(PaperlessTaskName.IndexOptimize)
expect(runSpy).toHaveBeenCalledWith(PaperlessTaskName.IndexOptimize)
expect(getStatusSpy).toHaveBeenCalled()
expect(toastSpy).toHaveBeenCalledWith(
`Task ${PaperlessTaskType.IndexOptimize} started`
`Task ${PaperlessTaskName.IndexOptimize} started`
)
})

View File

@@ -8,7 +8,7 @@ import {
} from '@ng-bootstrap/ng-bootstrap'
import { NgxBootstrapIconsModule } from 'ngx-bootstrap-icons'
import { Subject, takeUntil } from 'rxjs'
import { PaperlessTaskType } from 'src/app/data/paperless-task'
import { PaperlessTaskName } from 'src/app/data/paperless-task'
import {
SystemStatus,
SystemStatusItemStatus,
@@ -49,14 +49,14 @@ export class SystemStatusDialogComponent implements OnInit, OnDestroy {
private settingsService = inject(SettingsService)
public SystemStatusItemStatus = SystemStatusItemStatus
public PaperlessTaskType = PaperlessTaskType
public PaperlessTaskName = PaperlessTaskName
public status: SystemStatus
public frontendVersion: string = environment.version
public versionMismatch: boolean = false
public copied: boolean = false
private runningTasks: Set<PaperlessTaskType> = new Set()
private runningTasks: Set<PaperlessTaskName> = new Set()
private unsubscribeNotifier: Subject<any> = new Subject()
get currentUserIsSuperUser(): boolean {
@@ -107,11 +107,11 @@ export class SystemStatusDialogComponent implements OnInit, OnDestroy {
return now.getTime() - date.getTime() > hours * 60 * 60 * 1000
}
public isRunning(taskName: PaperlessTaskType): boolean {
public isRunning(taskName: PaperlessTaskName): boolean {
return this.runningTasks.has(taskName)
}
public runTask(taskName: PaperlessTaskType) {
public runTask(taskName: PaperlessTaskName) {
this.runningTasks.add(taskName)
this.toastService.showInfo(`Task ${taskName} started`)
this.tasksService.run(taskName).subscribe({

View File

@@ -1,63 +1,49 @@
import { Document } from './document'
import { ObjectWithId } from './object-with-id'
export enum PaperlessTaskType {
ConsumeFile = 'consume_file',
TrainClassifier = 'train_classifier',
SanityCheck = 'sanity_check',
IndexOptimize = 'index_optimize',
IndexRebuild = 'index_rebuild',
MailFetch = 'mail_fetch',
LlmIndex = 'llm_index',
Auto = 'auto_task',
ScheduledTask = 'scheduled_task',
ManualTask = 'manual_task',
}
export enum PaperlessTaskTriggerSource {
Scheduled = 'scheduled',
WebUI = 'web_ui',
ApiUpload = 'api_upload',
FolderConsume = 'folder_consume',
EmailConsume = 'email_consume',
System = 'system',
Manual = 'manual',
export enum PaperlessTaskName {
ConsumeFile = 'consume_file',
TrainClassifier = 'train_classifier',
SanityCheck = 'check_sanity',
IndexOptimize = 'index_optimize',
LLMIndexUpdate = 'llmindex_update',
}
export enum PaperlessTaskStatus {
Pending = 'pending',
Started = 'started',
Success = 'success',
Failure = 'failure',
Revoked = 'revoked',
Pending = 'PENDING',
Started = 'STARTED',
Complete = 'SUCCESS',
Failed = 'FAILURE',
}
export interface PaperlessTask extends ObjectWithId {
task_id: string
task_type: PaperlessTaskType
task_type_display: string
trigger_source: PaperlessTaskTriggerSource
trigger_source_display: string
type: PaperlessTaskType
status: PaperlessTaskStatus
status_display: string
date_created: Date
date_started?: Date
date_done?: Date
duration_seconds?: number
wait_time_seconds?: number
input_data: Record<string, unknown>
result_data?: Record<string, unknown>
result_message?: string
related_document_ids: number[]
acknowledged: boolean
task_id: string
task_file_name: string
task_name: PaperlessTaskName
date_created: Date
date_done?: Date
result?: string
related_document?: number
duplicate_documents?: Document[]
owner?: number
}
export interface PaperlessTaskSummary {
task_type: PaperlessTaskType
total_count: number
pending_count: number
success_count: number
failure_count: number
avg_duration_seconds: number | null
avg_wait_time_seconds: number | null
last_run: Date | null
last_success: Date | null
last_failure: Date | null
}

View File

@@ -5,7 +5,11 @@ import {
} from '@angular/common/http/testing'
import { TestBed } from '@angular/core/testing'
import { environment } from 'src/environments/environment'
import { PaperlessTaskStatus, PaperlessTaskType } from '../data/paperless-task'
import {
PaperlessTaskName,
PaperlessTaskStatus,
PaperlessTaskType,
} from '../data/paperless-task'
import { TasksService } from './tasks.service'
describe('TasksService', () => {
@@ -33,7 +37,7 @@ describe('TasksService', () => {
it('calls tasks api endpoint on reload', () => {
tasksService.reload()
const req = httpTestingController.expectOne(
`${environment.apiBaseUrl}tasks/?task_type=consume_file&acknowledged=false`
`${environment.apiBaseUrl}tasks/?task_name=consume_file&acknowledged=false`
)
expect(req.request.method).toEqual('GET')
})
@@ -42,7 +46,7 @@ describe('TasksService', () => {
tasksService.loading = true
tasksService.reload()
httpTestingController.expectNone(
`${environment.apiBaseUrl}tasks/?task_type=consume_file&acknowledged=false`
`${environment.apiBaseUrl}tasks/?task_name=consume_file&acknowledged=false`
)
})
@@ -59,7 +63,7 @@ describe('TasksService', () => {
// reload is then called
httpTestingController
.expectOne(
`${environment.apiBaseUrl}tasks/?task_type=consume_file&acknowledged=false`
`${environment.apiBaseUrl}tasks/?task_name=consume_file&acknowledged=false`
)
.flush([])
})
@@ -68,56 +72,56 @@ describe('TasksService', () => {
expect(tasksService.total).toEqual(0)
const mockTasks = [
{
task_type: PaperlessTaskType.ConsumeFile,
status: PaperlessTaskStatus.Success,
type: PaperlessTaskType.Auto,
task_name: PaperlessTaskName.ConsumeFile,
status: PaperlessTaskStatus.Complete,
acknowledged: false,
task_id: '1234',
input_data: { filename: 'file1.pdf' },
task_file_name: 'file1.pdf',
date_created: new Date(),
related_document_ids: [],
},
{
task_type: PaperlessTaskType.ConsumeFile,
status: PaperlessTaskStatus.Failure,
type: PaperlessTaskType.Auto,
task_name: PaperlessTaskName.ConsumeFile,
status: PaperlessTaskStatus.Failed,
acknowledged: false,
task_id: '1235',
input_data: { filename: 'file2.pdf' },
task_file_name: 'file2.pdf',
date_created: new Date(),
related_document_ids: [],
},
{
task_type: PaperlessTaskType.ConsumeFile,
type: PaperlessTaskType.Auto,
task_name: PaperlessTaskName.ConsumeFile,
status: PaperlessTaskStatus.Pending,
acknowledged: false,
task_id: '1236',
input_data: { filename: 'file3.pdf' },
task_file_name: 'file3.pdf',
date_created: new Date(),
related_document_ids: [],
},
{
task_type: PaperlessTaskType.ConsumeFile,
type: PaperlessTaskType.Auto,
task_name: PaperlessTaskName.ConsumeFile,
status: PaperlessTaskStatus.Started,
acknowledged: false,
task_id: '1237',
input_data: { filename: 'file4.pdf' },
task_file_name: 'file4.pdf',
date_created: new Date(),
related_document_ids: [],
},
{
task_type: PaperlessTaskType.ConsumeFile,
status: PaperlessTaskStatus.Success,
type: PaperlessTaskType.Auto,
task_name: PaperlessTaskName.ConsumeFile,
status: PaperlessTaskStatus.Complete,
acknowledged: false,
task_id: '1238',
input_data: { filename: 'file5.pdf' },
task_file_name: 'file5.pdf',
date_created: new Date(),
related_document_ids: [],
},
]
tasksService.reload()
const req = httpTestingController.expectOne(
`${environment.apiBaseUrl}tasks/?task_type=consume_file&acknowledged=false`
`${environment.apiBaseUrl}tasks/?task_name=consume_file&acknowledged=false`
)
req.flush(mockTasks)
@@ -130,9 +134,9 @@ describe('TasksService', () => {
})
it('supports running tasks', () => {
tasksService.run(PaperlessTaskType.SanityCheck).subscribe((res) => {
tasksService.run(PaperlessTaskName.SanityCheck).subscribe((res) => {
expect(res).toEqual({
task_id: 'abc-123',
result: 'success',
})
})
const req = httpTestingController.expectOne(
@@ -140,7 +144,7 @@ describe('TasksService', () => {
)
expect(req.request.method).toEqual('POST')
req.flush({
task_id: 'abc-123',
result: 'success',
})
})
})

View File

@@ -4,8 +4,8 @@ import { Observable, Subject } from 'rxjs'
import { first, takeUntil, tap } from 'rxjs/operators'
import {
PaperlessTask,
PaperlessTaskName,
PaperlessTaskStatus,
PaperlessTaskType,
} from 'src/app/data/paperless-task'
import { environment } from 'src/environments/environment'
@@ -18,7 +18,7 @@ export class TasksService {
private baseUrl: string = environment.apiBaseUrl
private endpoint: string = 'tasks'
public loading: boolean = false
public loading: boolean
private fileTasks: PaperlessTask[] = []
@@ -33,27 +33,21 @@ export class TasksService {
}
public get queuedFileTasks(): PaperlessTask[] {
return this.fileTasks.filter(
(t) => t.status === PaperlessTaskStatus.Pending
)
return this.fileTasks.filter((t) => t.status == PaperlessTaskStatus.Pending)
}
public get startedFileTasks(): PaperlessTask[] {
return this.fileTasks.filter(
(t) => t.status === PaperlessTaskStatus.Started
)
return this.fileTasks.filter((t) => t.status == PaperlessTaskStatus.Started)
}
public get completedFileTasks(): PaperlessTask[] {
return this.fileTasks.filter(
(t) => t.status === PaperlessTaskStatus.Success
(t) => t.status == PaperlessTaskStatus.Complete
)
}
public get failedFileTasks(): PaperlessTask[] {
return this.fileTasks.filter(
(t) => t.status === PaperlessTaskStatus.Failure
)
return this.fileTasks.filter((t) => t.status == PaperlessTaskStatus.Failed)
}
public reload() {
@@ -62,16 +56,18 @@ export class TasksService {
this.http
.get<PaperlessTask[]>(
`${this.baseUrl}${this.endpoint}/?task_type=${PaperlessTaskType.ConsumeFile}&acknowledged=false`
`${this.baseUrl}${this.endpoint}/?task_name=consume_file&acknowledged=false`
)
.pipe(takeUntil(this.unsubscribeNotifer), first())
.subscribe((r) => {
this.fileTasks = r
this.fileTasks = r.filter(
(t) => t.task_name == PaperlessTaskName.ConsumeFile
)
this.loading = false
})
}
public dismissTasks(task_ids: Set<number>): Observable<any> {
public dismissTasks(task_ids: Set<number>) {
return this.http
.post(`${this.baseUrl}tasks/acknowledge/`, {
tasks: [...task_ids],
@@ -85,24 +81,16 @@ export class TasksService {
)
}
public dismissAllTasks(): Observable<any> {
return this.http.post(`${this.baseUrl}tasks/acknowledge_all/`, {}).pipe(
first(),
takeUntil(this.unsubscribeNotifer),
tap(() => {
this.reload()
})
)
}
public cancelPending(): void {
this.unsubscribeNotifer.next(true)
}
public run(taskType: PaperlessTaskType): Observable<{ task_id: string }> {
return this.http.post<{ task_id: string }>(
public run(taskName: PaperlessTaskName): Observable<any> {
return this.http.post<any>(
`${environment.apiBaseUrl}${this.endpoint}/run/`,
{ task_type: taskType }
{
task_name: taskName,
}
)
}
}

View File

@@ -144,30 +144,18 @@ class StoragePathAdmin(GuardedModelAdmin):
class TaskAdmin(admin.ModelAdmin):
list_display = (
"task_id",
"task_type",
"trigger_source",
"status",
"date_created",
"date_done",
"duration_seconds",
)
list_filter = ("status", "task_type", "trigger_source", "date_done")
search_fields = ("task_id", "task_type", "status")
list_display = ("task_id", "task_file_name", "task_name", "date_done", "status")
list_filter = ("status", "date_done", "task_name")
search_fields = ("task_name", "task_id", "status", "task_file_name")
readonly_fields = (
"task_id",
"task_type",
"trigger_source",
"task_file_name",
"task_name",
"status",
"date_created",
"date_started",
"date_done",
"duration_seconds",
"wait_time_seconds",
"input_data",
"result_data",
"result_message",
"result",
)

View File

@@ -313,6 +313,7 @@ class ConsumerPlugin(
run_subprocess(
[
settings.PRE_CONSUME_SCRIPT,
original_file_path,
],
script_env,
self.log,
@@ -382,6 +383,14 @@ class ConsumerPlugin(
run_subprocess(
[
settings.POST_CONSUME_SCRIPT,
str(document.pk),
document.get_public_filename(),
os.path.normpath(document.source_path),
os.path.normpath(document.thumbnail_path),
reverse("document-download", kwargs={"pk": document.pk}),
reverse("document-thumb", kwargs={"pk": document.pk}),
str(document.correspondent),
str(",".join(document.tags.all().values_list("name", flat=True))),
],
script_env,
self.log,
@@ -641,10 +650,6 @@ class ConsumerPlugin(
# If we get here, it was successful. Proceed with post-consume
# hooks. If they fail, nothing will get changed.
document = Document.objects.prefetch_related("versions").get(
pk=document.pk,
)
document_consumption_finished.send(
sender=self.__class__,
document=document,

View File

@@ -26,10 +26,8 @@ from django.db.models.functions import Cast
from django.utils.translation import gettext_lazy as _
from django_filters import DateFilter
from django_filters.rest_framework import BooleanFilter
from django_filters.rest_framework import DateTimeFilter
from django_filters.rest_framework import Filter
from django_filters.rest_framework import FilterSet
from django_filters.rest_framework import MultipleChoiceFilter
from drf_spectacular.utils import extend_schema_field
from guardian.utils import get_group_obj_perms_model
from guardian.utils import get_user_obj_perms_model
@@ -864,56 +862,18 @@ class ShareLinkBundleFilterSet(FilterSet):
class PaperlessTaskFilterSet(FilterSet):
task_type = MultipleChoiceFilter(
choices=PaperlessTask.TaskType.choices,
label="Task Type",
)
trigger_source = MultipleChoiceFilter(
choices=PaperlessTask.TriggerSource.choices,
label="Trigger Source",
)
status = MultipleChoiceFilter(
choices=PaperlessTask.Status.choices,
label="Status",
)
is_complete = BooleanFilter(
method="filter_is_complete",
label="Is Complete",
)
acknowledged = BooleanFilter(
label="Acknowledged",
field_name="acknowledged",
)
date_created_after = DateTimeFilter(
field_name="date_created",
lookup_expr="gte",
label="Created After",
)
date_created_before = DateTimeFilter(
field_name="date_created",
lookup_expr="lte",
label="Created Before",
)
class Meta:
model = PaperlessTask
fields = ["task_type", "trigger_source", "status", "acknowledged", "owner"]
def filter_is_complete(self, queryset, name, value):
complete = [
PaperlessTask.Status.SUCCESS,
PaperlessTask.Status.FAILURE,
PaperlessTask.Status.REVOKED,
]
if value:
return queryset.filter(status__in=complete)
return queryset.exclude(status__in=complete)
fields = {
"type": ["exact"],
"task_name": ["exact"],
"status": ["exact"],
}
class ObjectOwnedOrGrantedPermissionsFilter(ObjectPermissionsFilter):

View File

@@ -22,6 +22,7 @@ class Command(PaperlessCommand):
self.buffered_logging("paperless.classifier"),
):
train_classifier(
scheduled=False,
status_callback=lambda msg: self.console.print(f" {msg}"),
)

View File

@@ -17,6 +17,7 @@ class Command(PaperlessCommand):
def handle(self, *args: Any, **options: Any) -> None:
llmindex_index(
rebuild=options["command"] == "rebuild",
scheduled=False,
iter_wrapper=lambda docs: self.track(
docs,
description="Indexing documents...",

View File

@@ -111,6 +111,7 @@ class Command(PaperlessCommand):
def handle(self, *args: Any, **options: Any) -> None:
messages = check_sanity(
scheduled=False,
iter_wrapper=lambda docs: self.track(
docs,
description="Checking documents...",

View File

@@ -1,213 +0,0 @@
"""
Drop and recreate the PaperlessTask table with the new structured schema.
We intentionally drop all existing task data -- the old schema was
string-based and incompatible with the new JSONField result storage.
"""
import django.db.models.deletion
import django.utils.timezone
from django.conf import settings
from django.db import migrations
from django.db import models
class Migration(migrations.Migration):
dependencies = [
("documents", "0018_saved_view_simple_search_rules"),
migrations.swappable_dependency(settings.AUTH_USER_MODEL),
]
operations = [
migrations.DeleteModel(name="PaperlessTask"),
migrations.CreateModel(
name="PaperlessTask",
fields=[
(
"id",
models.AutoField(
auto_created=True,
primary_key=True,
serialize=False,
verbose_name="ID",
),
),
(
"owner",
models.ForeignKey(
blank=True,
default=None,
null=True,
on_delete=django.db.models.deletion.SET_NULL,
to=settings.AUTH_USER_MODEL,
verbose_name="owner",
),
),
(
"task_id",
models.CharField(
help_text="Celery task ID",
max_length=255,
unique=True,
verbose_name="Task ID",
),
),
(
"task_type",
models.CharField(
choices=[
("consume_file", "Consume File"),
("train_classifier", "Train Classifier"),
("sanity_check", "Sanity Check"),
("index_optimize", "Index Optimize"),
("index_rebuild", "Index Rebuild"),
("mail_fetch", "Mail Fetch"),
("llm_index", "LLM Index"),
],
db_index=True,
help_text="The kind of work being performed",
max_length=50,
verbose_name="Task Type",
),
),
(
"trigger_source",
models.CharField(
choices=[
("scheduled", "Scheduled"),
("web_ui", "Web UI"),
("api_upload", "API Upload"),
("folder_consume", "Folder Consume"),
("email_consume", "Email Consume"),
("system", "System"),
("manual", "Manual"),
],
db_index=True,
help_text="What initiated this task",
max_length=50,
verbose_name="Trigger Source",
),
),
(
"status",
models.CharField(
choices=[
("pending", "Pending"),
("started", "Started"),
("success", "Success"),
("failure", "Failure"),
("revoked", "Revoked"),
],
db_index=True,
default="pending",
max_length=30,
verbose_name="Status",
),
),
(
"date_created",
models.DateTimeField(
db_index=True,
default=django.utils.timezone.now,
verbose_name="Created",
),
),
(
"date_started",
models.DateTimeField(
blank=True,
null=True,
verbose_name="Started",
),
),
(
"date_done",
models.DateTimeField(
blank=True,
db_index=True,
null=True,
verbose_name="Completed",
),
),
(
"duration_seconds",
models.FloatField(
blank=True,
help_text="Elapsed time from start to completion",
null=True,
verbose_name="Duration (seconds)",
),
),
(
"wait_time_seconds",
models.FloatField(
blank=True,
help_text="Time from task creation to worker pickup",
null=True,
verbose_name="Wait Time (seconds)",
),
),
(
"input_data",
models.JSONField(
blank=True,
default=dict,
help_text="Structured input parameters for the task",
verbose_name="Input Data",
),
),
(
"result_data",
models.JSONField(
blank=True,
help_text="Structured result data from task execution",
null=True,
verbose_name="Result Data",
),
),
(
"result_message",
models.TextField(
blank=True,
help_text="Human-readable result message",
null=True,
verbose_name="Result Message",
),
),
(
"acknowledged",
models.BooleanField(
db_index=True,
default=False,
verbose_name="Acknowledged",
),
),
],
options={
"verbose_name": "Task",
"verbose_name_plural": "Tasks",
"ordering": ["-date_created"],
},
),
migrations.AddIndex(
model_name="paperlesstask",
index=models.Index(
fields=["status", "date_created"],
name="documents_p_status_8aa687_idx",
),
),
migrations.AddIndex(
model_name="paperlesstask",
index=models.Index(
fields=["task_type", "status"],
name="documents_p_task_ty_e4a93f_idx",
),
),
migrations.AddIndex(
model_name="paperlesstask",
index=models.Index(
fields=["owner", "acknowledged", "date_created"],
name="documents_p_owner_i_62c545_idx",
),
),
]

View File

@@ -1,22 +0,0 @@
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
("documents", "0019_task_system_redesign"),
]
operations = [
migrations.RunSQL(
sql="DROP TABLE IF EXISTS django_celery_results_taskresult;",
reverse_sql=migrations.RunSQL.noop,
),
migrations.RunSQL(
sql="DROP TABLE IF EXISTS django_celery_results_groupresult;",
reverse_sql=migrations.RunSQL.noop,
),
migrations.RunSQL(
sql="DROP TABLE IF EXISTS django_celery_results_chordcounter;",
reverse_sql=migrations.RunSQL.noop,
),
]

View File

@@ -3,6 +3,7 @@ from pathlib import Path
from typing import Final
import pathvalidate
from celery import states
from django.conf import settings
from django.contrib.auth.models import Group
from django.contrib.auth.models import User
@@ -380,10 +381,7 @@ class Document(SoftDeleteModel, ModelWithOwner): # type: ignore[django-manager-
if isinstance(prefetched_cache, dict)
else None
)
if prefetched_versions is not None:
# Empty list means prefetch ran and found no versions — use own content.
if not prefetched_versions:
return self.content
if prefetched_versions:
latest_prefetched = max(prefetched_versions, key=lambda doc: doc.id)
return latest_prefetched.content
@@ -662,170 +660,97 @@ class UiSettings(models.Model):
class PaperlessTask(ModelWithOwner):
"""
Tracks background task execution for user visibility and debugging.
State transitions:
PENDING -> STARTED -> SUCCESS
PENDING -> STARTED -> FAILURE
PENDING -> REVOKED (if cancelled before starting)
"""
class Status(models.TextChoices):
PENDING = "pending", _("Pending")
STARTED = "started", _("Started")
SUCCESS = "success", _("Success")
FAILURE = "failure", _("Failure")
REVOKED = "revoked", _("Revoked")
ALL_STATES = sorted(states.ALL_STATES)
TASK_STATE_CHOICES = sorted(zip(ALL_STATES, ALL_STATES))
class TaskType(models.TextChoices):
CONSUME_FILE = "consume_file", _("Consume File")
TRAIN_CLASSIFIER = "train_classifier", _("Train Classifier")
SANITY_CHECK = "sanity_check", _("Sanity Check")
INDEX_OPTIMIZE = "index_optimize", _("Index Optimize")
INDEX_REBUILD = "index_rebuild", _("Index Rebuild")
MAIL_FETCH = "mail_fetch", _("Mail Fetch")
LLM_INDEX = "llm_index", _("LLM Index")
AUTO = ("auto_task", _("Auto Task"))
SCHEDULED_TASK = ("scheduled_task", _("Scheduled Task"))
MANUAL_TASK = ("manual_task", _("Manual Task"))
class TriggerSource(models.TextChoices):
SCHEDULED = "scheduled", _("Scheduled") # Celery beat
WEB_UI = "web_ui", _("Web UI") # Document uploaded via web
API_UPLOAD = "api_upload", _("API Upload") # Document uploaded via API
FOLDER_CONSUME = "folder_consume", _("Folder Consume") # Consume folder
EMAIL_CONSUME = "email_consume", _("Email Consume") # Email attachment
SYSTEM = (
"system",
_("System"),
) # Auto-triggered by system (self-heal, config side-effect)
MANUAL = "manual", _("Manual") # User explicitly ran via /api/tasks/run/
class TaskName(models.TextChoices):
CONSUME_FILE = ("consume_file", _("Consume File"))
TRAIN_CLASSIFIER = ("train_classifier", _("Train Classifier"))
CHECK_SANITY = ("check_sanity", _("Check Sanity"))
INDEX_OPTIMIZE = ("index_optimize", _("Index Optimize"))
LLMINDEX_UPDATE = ("llmindex_update", _("LLM Index Update"))
# Identification
task_id = models.CharField(
max_length=255,
unique=True,
verbose_name=_("Task ID"),
help_text=_("Celery task ID"),
help_text=_("Celery ID for the Task that was run"),
)
task_type = models.CharField(
max_length=50,
choices=TaskType.choices,
verbose_name=_("Task Type"),
help_text=_("The kind of work being performed"),
db_index=True,
acknowledged = models.BooleanField(
default=False,
verbose_name=_("Acknowledged"),
help_text=_("If the task is acknowledged via the frontend or API"),
)
trigger_source = models.CharField(
max_length=50,
choices=TriggerSource.choices,
verbose_name=_("Trigger Source"),
help_text=_("What initiated this task"),
db_index=True,
task_file_name = models.CharField(
null=True,
max_length=255,
verbose_name=_("Task Filename"),
help_text=_("Name of the file which the Task was run for"),
)
task_name = models.CharField(
null=True,
max_length=255,
choices=TaskName.choices,
verbose_name=_("Task Name"),
help_text=_("Name of the task that was run"),
)
# State tracking
status = models.CharField(
max_length=30,
choices=Status.choices,
default=Status.PENDING,
verbose_name=_("Status"),
db_index=True,
default=states.PENDING,
choices=TASK_STATE_CHOICES,
verbose_name=_("Task State"),
help_text=_("Current state of the task being run"),
)
# Timestamps
date_created = models.DateTimeField(
null=True,
default=timezone.now,
verbose_name=_("Created"),
db_index=True,
verbose_name=_("Created DateTime"),
help_text=_("Datetime field when the task result was created in UTC"),
)
date_started = models.DateTimeField(
null=True,
blank=True,
verbose_name=_("Started"),
default=None,
verbose_name=_("Started DateTime"),
help_text=_("Datetime field when the task was started in UTC"),
)
date_done = models.DateTimeField(
null=True,
blank=True,
verbose_name=_("Completed"),
db_index=True,
default=None,
verbose_name=_("Completed DateTime"),
help_text=_("Datetime field when the task was completed in UTC"),
)
# Duration fields -- populated by task_postrun signal handler
duration_seconds = models.FloatField(
result = models.TextField(
null=True,
blank=True,
verbose_name=_("Duration (seconds)"),
help_text=_("Elapsed time from start to completion"),
)
wait_time_seconds = models.FloatField(
null=True,
blank=True,
verbose_name=_("Wait Time (seconds)"),
help_text=_("Time from task creation to worker pickup"),
)
# Input/Output data
input_data = models.JSONField(
default=dict,
blank=True,
verbose_name=_("Input Data"),
help_text=_("Structured input parameters for the task"),
)
result_data = models.JSONField(
null=True,
blank=True,
default=None,
verbose_name=_("Result Data"),
help_text=_("Structured result data from task execution"),
help_text=_(
"The data returned by the task",
),
)
result_message = models.TextField(
null=True,
blank=True,
verbose_name=_("Result Message"),
help_text=_("Human-readable result message"),
type = models.CharField(
max_length=30,
choices=TaskType.choices,
default=TaskType.AUTO,
verbose_name=_("Task Type"),
help_text=_("The type of task that was run"),
)
# Acknowledgment
acknowledged = models.BooleanField(
default=False,
verbose_name=_("Acknowledged"),
db_index=True,
)
class Meta:
verbose_name = _("Task")
verbose_name_plural = _("Tasks")
ordering = ["-date_created"]
indexes = [
models.Index(fields=["status", "date_created"]),
models.Index(fields=["task_type", "status"]),
models.Index(fields=["owner", "acknowledged", "date_created"]),
]
def __str__(self) -> str:
return f"{self.get_task_type_display()} [{self.task_id[:8]}]"
@property
def is_complete(self) -> bool:
return self.status in (
self.Status.SUCCESS,
self.Status.FAILURE,
self.Status.REVOKED,
)
@property
def related_document_ids(self) -> list[int]:
if not self.result_data:
return []
if doc_id := self.result_data.get("document_id"):
return [doc_id]
if dup_id := self.result_data.get("duplicate_of"):
return [dup_id]
return []
return f"Task {self.task_id}"
class Note(SoftDeleteModel):

View File

@@ -10,6 +10,7 @@ is an identity function that adds no overhead.
"""
import logging
import uuid
from collections import defaultdict
from collections.abc import Iterator
from pathlib import Path
@@ -17,9 +18,12 @@ from typing import TYPE_CHECKING
from typing import Final
from typing import TypedDict
from celery import states
from django.conf import settings
from django.utils import timezone
from documents.models import Document
from documents.models import PaperlessTask
from documents.utils import IterWrapper
from documents.utils import compute_checksum
from documents.utils import identity
@@ -178,9 +182,8 @@ def _check_thumbnail(
present_files: set[Path],
) -> None:
"""Verify the thumbnail exists and is readable."""
# doc.thumbnail_path already returns a resolved Path; no need to re-resolve.
thumbnail_path: Final[Path] = doc.thumbnail_path
if not thumbnail_path.is_file():
thumbnail_path: Final[Path] = Path(doc.thumbnail_path).resolve()
if not thumbnail_path.exists() or not thumbnail_path.is_file():
messages.error(doc.pk, "Thumbnail of document does not exist.")
return
@@ -197,9 +200,8 @@ def _check_original(
present_files: set[Path],
) -> None:
"""Verify the original file exists, is readable, and has matching checksum."""
# doc.source_path already returns a resolved Path; no need to re-resolve.
source_path: Final[Path] = doc.source_path
if not source_path.is_file():
source_path: Final[Path] = Path(doc.source_path).resolve()
if not source_path.exists() or not source_path.is_file():
messages.error(doc.pk, "Original of document does not exist.")
return
@@ -235,9 +237,8 @@ def _check_archive(
elif doc.has_archive_version:
if TYPE_CHECKING:
assert isinstance(doc.archive_path, Path)
# doc.archive_path already returns a resolved Path; no need to re-resolve.
archive_path: Final[Path] = doc.archive_path # type: ignore[assignment]
if not archive_path.is_file():
archive_path: Final[Path] = Path(doc.archive_path).resolve()
if not archive_path.exists() or not archive_path.is_file():
messages.error(doc.pk, "Archived version of document does not exist.")
return
@@ -283,33 +284,59 @@ def _check_document(
def check_sanity(
*,
scheduled: bool = True,
iter_wrapper: IterWrapper[Document] = identity,
) -> SanityCheckMessages:
"""Run a full sanity check on the document archive.
Args:
scheduled: Whether this is a scheduled (automatic) or manual check.
Controls the task type recorded in the database.
iter_wrapper: A callable that wraps the document iterable, e.g.,
for progress bar display. Defaults to identity (no wrapping).
Returns:
A SanityCheckMessages instance containing all detected issues.
"""
paperless_task = PaperlessTask.objects.create(
task_id=uuid.uuid4(),
type=(
PaperlessTask.TaskType.SCHEDULED_TASK
if scheduled
else PaperlessTask.TaskType.MANUAL_TASK
),
task_name=PaperlessTask.TaskName.CHECK_SANITY,
status=states.STARTED,
date_created=timezone.now(),
date_started=timezone.now(),
)
messages = SanityCheckMessages()
present_files = _build_present_files()
documents = Document.global_objects.only(
"pk",
"filename",
"mime_type",
"checksum",
"archive_checksum",
"archive_filename",
"content",
).iterator(chunk_size=500)
documents = Document.global_objects.all()
for doc in iter_wrapper(documents):
_check_document(doc, messages, present_files)
for extra_file in present_files:
messages.warning(None, f"Orphaned file in media dir: {extra_file}")
paperless_task.status = states.SUCCESS if not messages.has_error else states.FAILURE
if messages.total_issue_count == 0:
paperless_task.result = "No issues found."
else:
parts: list[str] = []
if messages.document_error_count:
parts.append(f"{messages.document_error_count} document(s) with errors")
if messages.document_warning_count:
parts.append(f"{messages.document_warning_count} document(s) with warnings")
if messages.global_warning_count:
parts.append(f"{messages.global_warning_count} global warning(s)")
paperless_task.result = ", ".join(parts) + " found."
if messages.has_error:
paperless_task.result += " Check logs for details."
paperless_task.date_done = timezone.now()
paperless_task.save(update_fields=["status", "result", "date_done"])
return messages

View File

@@ -12,6 +12,7 @@ from typing import Literal
from typing import TypedDict
import magic
from celery import states
from django.conf import settings
from django.contrib.auth.models import Group
from django.contrib.auth.models import User
@@ -99,7 +100,7 @@ logger = logging.getLogger("paperless.serializers")
# https://www.django-rest-framework.org/api-guide/serializers/#example
class DynamicFieldsModelSerializer(serializers.ModelSerializer[Any]):
class DynamicFieldsModelSerializer(serializers.ModelSerializer):
"""
A ModelSerializer that takes an additional `fields` argument that
controls which fields should be displayed.
@@ -120,7 +121,7 @@ class DynamicFieldsModelSerializer(serializers.ModelSerializer[Any]):
self.fields.pop(field_name)
class MatchingModelSerializer(serializers.ModelSerializer[Any]):
class MatchingModelSerializer(serializers.ModelSerializer):
document_count = serializers.IntegerField(read_only=True)
def get_slug(self, obj) -> str:
@@ -260,7 +261,7 @@ class SetPermissionsSerializer(serializers.DictField):
class OwnedObjectSerializer(
SerializerWithPerms,
serializers.ModelSerializer[Any],
serializers.ModelSerializer,
SetPermissionsMixin,
):
def __init__(self, *args, **kwargs) -> None:
@@ -468,7 +469,7 @@ class OwnedObjectSerializer(
return super().update(instance, validated_data)
class OwnedObjectListSerializer(serializers.ListSerializer[Any]):
class OwnedObjectListSerializer(serializers.ListSerializer):
def to_representation(self, documents):
self.child.context["shared_object_pks"] = self.child.get_shared_object_pks(
documents,
@@ -681,27 +682,27 @@ class TagSerializer(MatchingModelSerializer, OwnedObjectSerializer):
return super().validate(attrs)
class CorrespondentField(serializers.PrimaryKeyRelatedField[Correspondent]):
class CorrespondentField(serializers.PrimaryKeyRelatedField):
def get_queryset(self):
return Correspondent.objects.all()
class TagsField(serializers.PrimaryKeyRelatedField[Tag]):
class TagsField(serializers.PrimaryKeyRelatedField):
def get_queryset(self):
return Tag.objects.all()
class DocumentTypeField(serializers.PrimaryKeyRelatedField[DocumentType]):
class DocumentTypeField(serializers.PrimaryKeyRelatedField):
def get_queryset(self):
return DocumentType.objects.all()
class StoragePathField(serializers.PrimaryKeyRelatedField[StoragePath]):
class StoragePathField(serializers.PrimaryKeyRelatedField):
def get_queryset(self):
return StoragePath.objects.all()
class CustomFieldSerializer(serializers.ModelSerializer[CustomField]):
class CustomFieldSerializer(serializers.ModelSerializer):
data_type = serializers.ChoiceField(
choices=CustomField.FieldDataType,
read_only=False,
@@ -815,7 +816,7 @@ def validate_documentlink_targets(user, doc_ids):
)
class CustomFieldInstanceSerializer(serializers.ModelSerializer[CustomFieldInstance]):
class CustomFieldInstanceSerializer(serializers.ModelSerializer):
field = serializers.PrimaryKeyRelatedField(queryset=CustomField.objects.all())
value = ReadWriteSerializerMethodField(allow_null=True)
@@ -921,14 +922,14 @@ class CustomFieldInstanceSerializer(serializers.ModelSerializer[CustomFieldInsta
]
class BasicUserSerializer(serializers.ModelSerializer[User]):
class BasicUserSerializer(serializers.ModelSerializer):
# Different than paperless.serializers.UserSerializer
class Meta:
model = User
fields = ["id", "username", "first_name", "last_name"]
class NotesSerializer(serializers.ModelSerializer[Note]):
class NotesSerializer(serializers.ModelSerializer):
user = BasicUserSerializer(read_only=True)
class Meta:
@@ -1255,7 +1256,7 @@ class DocumentSerializer(
list_serializer_class = OwnedObjectListSerializer
class SearchResultListSerializer(serializers.ListSerializer[Document]):
class SearchResultListSerializer(serializers.ListSerializer):
def to_representation(self, hits):
document_ids = [hit["id"] for hit in hits]
# Fetch all Document objects in the list in one SQL query.
@@ -1312,7 +1313,7 @@ class SearchResultSerializer(DocumentSerializer):
list_serializer_class = SearchResultListSerializer
class SavedViewFilterRuleSerializer(serializers.ModelSerializer[SavedViewFilterRule]):
class SavedViewFilterRuleSerializer(serializers.ModelSerializer):
class Meta:
model = SavedViewFilterRule
fields = ["rule_type", "value"]
@@ -2400,7 +2401,7 @@ class StoragePathSerializer(MatchingModelSerializer, OwnedObjectSerializer):
return super().update(instance, validated_data)
class UiSettingsViewSerializer(serializers.ModelSerializer[UiSettings]):
class UiSettingsViewSerializer(serializers.ModelSerializer):
settings = serializers.DictField(required=False, allow_null=True)
class Meta:
@@ -2428,81 +2429,7 @@ class UiSettingsViewSerializer(serializers.ModelSerializer[UiSettings]):
return ui_settings
class TaskSerializerV10(OwnedObjectSerializer):
"""Task serializer for API v10+ using new field names."""
related_document_ids = serializers.ListField(
child=serializers.IntegerField(),
read_only=True,
)
task_type_display = serializers.CharField(
source="get_task_type_display",
read_only=True,
)
trigger_source_display = serializers.CharField(
source="get_trigger_source_display",
read_only=True,
)
status_display = serializers.CharField(
source="get_status_display",
read_only=True,
)
class Meta:
model = PaperlessTask
fields = (
"id",
"task_id",
"task_type",
"task_type_display",
"trigger_source",
"trigger_source_display",
"status",
"status_display",
"date_created",
"date_started",
"date_done",
"duration_seconds",
"wait_time_seconds",
"input_data",
"result_data",
"result_message",
"related_document_ids",
"acknowledged",
"owner",
)
read_only_fields = fields
class TaskSerializerV9(serializers.ModelSerializer):
"""Task serializer for API v9 backwards compatibility.
Maps old field names to the new model fields so existing clients continue
to work unchanged.
"""
# v9 field: task_name -> task_type
task_name = serializers.CharField(source="task_type", read_only=True)
# v9 field: task_file_name -> input_data.filename
task_file_name = serializers.SerializerMethodField()
# v9 field: type -> trigger_source (mapped to old enum labels)
type = serializers.SerializerMethodField()
# v9 field: result -> result_message (with legacy format fallback)
result = serializers.CharField(
source="result_message",
read_only=True,
allow_null=True,
)
# v9 field: related_document -> first document ID from result_data
related_document = serializers.SerializerMethodField()
# v9 field: duplicate_documents -> list of duplicate IDs from result_data
duplicate_documents = serializers.SerializerMethodField()
class TasksViewSerializer(OwnedObjectSerializer):
class Meta:
model = PaperlessTask
fields = (
@@ -2510,59 +2437,59 @@ class TaskSerializerV9(serializers.ModelSerializer):
"task_id",
"task_name",
"task_file_name",
"type",
"status",
"date_created",
"date_done",
"type",
"status",
"result",
"acknowledged",
"related_document",
"duplicate_documents",
"owner",
)
read_only_fields = fields
def get_task_file_name(self, obj: PaperlessTask) -> str | None:
if not obj.input_data:
return None
return obj.input_data.get("filename")
related_document = serializers.SerializerMethodField()
duplicate_documents = serializers.SerializerMethodField()
created_doc_re = re.compile(r"New document id (\d+) created")
duplicate_doc_re = re.compile(r"It is a duplicate of .* \(#(\d+)\)")
def get_type(self, obj: PaperlessTask) -> str:
# Old type values: AUTO_TASK, SCHEDULED_TASK, MANUAL_TASK
source_to_old_type = {
PaperlessTask.TriggerSource.SCHEDULED: "SCHEDULED_TASK",
PaperlessTask.TriggerSource.SYSTEM: "AUTO_TASK",
}
return source_to_old_type.get(obj.trigger_source, "MANUAL_TASK")
def get_related_document(self, obj) -> str | None:
result = None
re = None
if obj.result:
match obj.status:
case states.SUCCESS:
re = self.created_doc_re
case states.FAILURE:
re = (
self.duplicate_doc_re
if "existing document is in the trash" not in obj.result
else None
)
if re is not None:
try:
result = re.search(obj.result).group(1)
except Exception:
pass
def get_related_document(self, obj: PaperlessTask) -> int | None:
ids = obj.related_document_ids
return ids[0] if ids else None
return result
def get_duplicate_documents(self, obj: PaperlessTask) -> list[int]:
if not obj.result_data:
@extend_schema_field(DuplicateDocumentSummarySerializer(many=True))
def get_duplicate_documents(self, obj):
related_document = self.get_related_document(obj)
request = self.context.get("request")
user = request.user if request else None
document = Document.global_objects.filter(pk=related_document).first()
if not related_document or not user or not document:
return []
dup_of = obj.result_data.get("duplicate_of")
return [dup_of] if dup_of is not None else []
duplicates = _get_viewable_duplicates(document, user)
return list(duplicates.values("id", "title", "deleted_at"))
class TaskSummarySerializer(serializers.Serializer):
task_type = serializers.CharField()
total_count = serializers.IntegerField()
pending_count = serializers.IntegerField()
success_count = serializers.IntegerField()
failure_count = serializers.IntegerField()
avg_duration_seconds = serializers.FloatField(allow_null=True)
avg_wait_time_seconds = serializers.FloatField(allow_null=True)
last_run = serializers.DateTimeField(allow_null=True)
last_success = serializers.DateTimeField(allow_null=True)
last_failure = serializers.DateTimeField(allow_null=True)
class RunTaskSerializer(serializers.Serializer):
task_type = serializers.ChoiceField(
choices=PaperlessTask.TaskType.choices,
label="Task Type",
class RunTaskViewSerializer(serializers.Serializer[dict[str, Any]]):
task_name = serializers.ChoiceField(
choices=PaperlessTask.TaskName.choices,
label="Task Name",
write_only=True,
)
@@ -2833,7 +2760,7 @@ class BulkEditObjectsSerializer(SerializerWithPerms, SetPermissionsMixin):
return attrs
class WorkflowTriggerSerializer(serializers.ModelSerializer[WorkflowTrigger]):
class WorkflowTriggerSerializer(serializers.ModelSerializer):
id = serializers.IntegerField(required=False, allow_null=True)
sources = fields.MultipleChoiceField(
choices=WorkflowTrigger.DocumentSourceChoices.choices,
@@ -2943,7 +2870,7 @@ class WorkflowTriggerSerializer(serializers.ModelSerializer[WorkflowTrigger]):
return super().update(instance, validated_data)
class WorkflowActionEmailSerializer(serializers.ModelSerializer[WorkflowActionEmail]):
class WorkflowActionEmailSerializer(serializers.ModelSerializer):
id = serializers.IntegerField(allow_null=True, required=False)
class Meta:
@@ -2957,9 +2884,7 @@ class WorkflowActionEmailSerializer(serializers.ModelSerializer[WorkflowActionEm
]
class WorkflowActionWebhookSerializer(
serializers.ModelSerializer[WorkflowActionWebhook],
):
class WorkflowActionWebhookSerializer(serializers.ModelSerializer):
id = serializers.IntegerField(allow_null=True, required=False)
def validate_url(self, url):
@@ -2980,7 +2905,7 @@ class WorkflowActionWebhookSerializer(
]
class WorkflowActionSerializer(serializers.ModelSerializer[WorkflowAction]):
class WorkflowActionSerializer(serializers.ModelSerializer):
id = serializers.IntegerField(required=False, allow_null=True)
assign_correspondent = CorrespondentField(allow_null=True, required=False)
assign_tags = TagsField(many=True, allow_null=True, required=False)
@@ -3102,7 +3027,7 @@ class WorkflowActionSerializer(serializers.ModelSerializer[WorkflowAction]):
return attrs
class WorkflowSerializer(serializers.ModelSerializer[Workflow]):
class WorkflowSerializer(serializers.ModelSerializer):
order = serializers.IntegerField(required=False)
triggers = WorkflowTriggerSerializer(many=True)

View File

@@ -8,6 +8,7 @@ from typing import TYPE_CHECKING
from typing import Any
from celery import shared_task
from celery import states
from celery.signals import before_task_publish
from celery.signals import task_failure
from celery.signals import task_postrun
@@ -30,7 +31,6 @@ from documents import matching
from documents.caching import clear_document_caches
from documents.caching import invalidate_llm_suggestions_cache
from documents.data_models import ConsumableDocument
from documents.data_models import DocumentSource
from documents.file_handling import create_source_path_directory
from documents.file_handling import delete_empty_directories
from documents.file_handling import generate_filename
@@ -999,175 +999,68 @@ def run_workflows(
return overrides, "\n".join(messages)
# ---------------------------------------------------------------------------
# Task tracking -- Celery signal handlers
# ---------------------------------------------------------------------------
TRACKED_TASKS: dict[str, PaperlessTask.TaskType] = {
"documents.tasks.consume_file": PaperlessTask.TaskType.CONSUME_FILE,
"documents.tasks.train_classifier": PaperlessTask.TaskType.TRAIN_CLASSIFIER,
"documents.tasks.sanity_check": PaperlessTask.TaskType.SANITY_CHECK,
"documents.tasks.index_optimize": PaperlessTask.TaskType.INDEX_OPTIMIZE,
"documents.tasks.llmindex_index": PaperlessTask.TaskType.LLM_INDEX,
"paperless_mail.tasks.process_mail_accounts": PaperlessTask.TaskType.MAIL_FETCH,
}
_DOCUMENT_SOURCE_TO_TRIGGER: dict[Any, PaperlessTask.TriggerSource] = {
DocumentSource.ConsumeFolder: PaperlessTask.TriggerSource.FOLDER_CONSUME,
DocumentSource.ApiUpload: PaperlessTask.TriggerSource.API_UPLOAD,
DocumentSource.MailFetch: PaperlessTask.TriggerSource.EMAIL_CONSUME,
DocumentSource.WebUI: PaperlessTask.TriggerSource.WEB_UI,
}
def _extract_input_data(
task_type: PaperlessTask.TaskType,
args: tuple,
task_kwargs: dict,
) -> dict:
if task_type == PaperlessTask.TaskType.CONSUME_FILE:
input_doc = args[0] if args else task_kwargs.get("input_doc")
overrides = args[1] if len(args) >= 2 else task_kwargs.get("overrides")
if input_doc is None:
return {}
data: dict = {
"filename": input_doc.original_file.name,
"mime_type": input_doc.mime_type,
}
if input_doc.original_path:
data["source_path"] = str(input_doc.original_path)
if input_doc.mailrule_id:
data["mailrule_id"] = input_doc.mailrule_id
if overrides:
override_dict = {
k: v
for k, v in vars(overrides).items()
if v is not None and not k.startswith("_")
}
if override_dict:
data["overrides"] = override_dict
return data
if task_type == PaperlessTask.TaskType.MAIL_FETCH:
account_ids = args[0] if args else task_kwargs.get("account_ids")
return {"account_ids": account_ids}
return {}
def _determine_trigger_source(
task_type: PaperlessTask.TaskType,
args: tuple,
task_kwargs: dict,
headers: dict,
) -> PaperlessTask.TriggerSource:
# Explicit header takes priority -- covers beat ("scheduled") and system auto-runs ("system")
header_source = headers.get("trigger_source")
if header_source == "scheduled":
return PaperlessTask.TriggerSource.SCHEDULED
if header_source == "system":
return PaperlessTask.TriggerSource.SYSTEM
if task_type == PaperlessTask.TaskType.CONSUME_FILE:
input_doc = args[0] if args else task_kwargs.get("input_doc")
if input_doc is not None:
return _DOCUMENT_SOURCE_TO_TRIGGER.get(
input_doc.source,
PaperlessTask.TriggerSource.API_UPLOAD,
)
return PaperlessTask.TriggerSource.MANUAL
def _extract_owner_id(
task_type: PaperlessTask.TaskType,
args: tuple,
task_kwargs: dict,
) -> int | None:
if task_type != PaperlessTask.TaskType.CONSUME_FILE:
return None
overrides = args[1] if len(args) >= 2 else task_kwargs.get("overrides")
if overrides and hasattr(overrides, "owner_id"):
return overrides.owner_id
return None
def _parse_legacy_result(result: str) -> dict | None:
import re as _re
if match := _re.search(r"New document id (\d+) created", result):
return {"document_id": int(match.group(1))}
if match := _re.search(r"It is a duplicate of .* \(#(\d+)\)", result):
return {
"duplicate_of": int(match.group(1)),
"duplicate_in_trash": "existing document is in the trash" in result,
}
return None
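
The two branches above mirror the legacy consumer messages that the old string-based `result` field used to carry; anything else parses to `None`. For illustration (message formats taken from the regexes above):

# Illustration of _parse_legacy_result() defined above; only the two
# legacy consumer message formats are recognised.
assert _parse_legacy_result("Success. New document id 42 created") == {
    "document_id": 42,
}
assert _parse_legacy_result(
    "Not consuming x.pdf: It is a duplicate of y.pdf (#7)",
) == {"duplicate_of": 7, "duplicate_in_trash": False}
assert _parse_legacy_result("OCR failed") is None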
@before_task_publish.connect
def before_task_publish_handler(
sender=None,
headers=None,
body=None,
**kwargs,
) -> None:
def before_task_publish_handler(sender=None, headers=None, body=None, **kwargs) -> None:
"""
Creates the PaperlessTask record when the task is published to the broker.
Creates the PaperlessTask object in a pending state. This is sent before
the task reaches the broker and before it begins executing on a worker.
https://docs.celeryq.dev/en/stable/userguide/signals.html#before-task-publish
https://docs.celeryq.dev/en/stable/internals/protocol.html#version-2
"""
if headers is None or body is None:
return
task_name = headers.get("task", "")
task_type = TRACKED_TASKS.get(task_name)
if task_type is None:
https://docs.celeryq.dev/en/stable/internals/protocol.html#version-2
"""
if "task" not in headers or headers["task"] != "documents.tasks.consume_file":
# Assumption: this is only ever a v2 message
return
try:
close_old_connections()
args, task_kwargs, _ = body
task_id = headers["id"]
input_data = _extract_input_data(task_type, args, task_kwargs)
trigger_source = _determine_trigger_source(
task_type,
args,
task_kwargs,
headers,
)
owner_id = _extract_owner_id(task_type, args, task_kwargs)
task_args = body[0]
input_doc, overrides = task_args
task_file_name = input_doc.original_file.name
user_id = overrides.owner_id if overrides else None
PaperlessTask.objects.create(
task_id=task_id,
task_type=task_type,
trigger_source=trigger_source,
status=PaperlessTask.Status.PENDING,
input_data=input_data,
owner_id=owner_id,
type=PaperlessTask.TaskType.AUTO,
task_id=headers["id"],
status=states.PENDING,
task_file_name=task_file_name,
task_name=PaperlessTask.TaskName.CONSUME_FILE,
result=None,
date_created=timezone.now(),
date_started=None,
date_done=None,
owner_id=user_id,
)
except Exception:
except Exception: # pragma: no cover
# Don't let an exception in the signal handlers prevent
# a document from being consumed.
logger.exception("Creating PaperlessTask failed")
@task_prerun.connect
def task_prerun_handler(sender=None, task_id=None, task=None, **kwargs) -> None:
"""
Marks the task STARTED when execution begins on a worker.
Updates the PaperlessTask to be started. Sent before the task begins execution
on a worker.
https://docs.celeryq.dev/en/stable/userguide/signals.html#task-prerun
"""
if task_id is None:
return
try:
close_old_connections()
PaperlessTask.objects.filter(task_id=task_id).update(
status=PaperlessTask.Status.STARTED,
date_started=timezone.now(),
)
except Exception:
task_instance = PaperlessTask.objects.filter(task_id=task_id).first()
if task_instance is not None:
task_instance.status = states.STARTED
task_instance.date_started = timezone.now()
task_instance.save()
except Exception: # pragma: no cover
# Don't let an exception in the signal handlers prevent
# a document from being consumed.
logger.exception("Setting PaperlessTask started failed")
@@ -1181,53 +1074,22 @@ def task_postrun_handler(
**kwargs,
) -> None:
"""
Records task completion and result data.
Updates the result of the PaperlessTask.
https://docs.celeryq.dev/en/stable/userguide/signals.html#task-postrun
"""
if task_id is None:
return
try:
close_old_connections()
status_map = {
"SUCCESS": PaperlessTask.Status.SUCCESS,
"FAILURE": PaperlessTask.Status.FAILURE,
"REVOKED": PaperlessTask.Status.REVOKED,
}
new_status = status_map.get(state, PaperlessTask.Status.FAILURE)
result_data: dict | None = None
result_message: str | None = None
if isinstance(retval, dict):
result_data = retval
elif isinstance(retval, str):
result_message = retval
result_data = _parse_legacy_result(retval)
now = timezone.now()
task_instance = PaperlessTask.objects.filter(task_id=task_id).first()
if task_instance is None:
return
duration_seconds: float | None = None
wait_time_seconds: float | None = None
if task_instance.date_started:
duration_seconds = (now - task_instance.date_started).total_seconds()
if task_instance.date_started and task_instance.date_created:
wait_time_seconds = (
task_instance.date_started - task_instance.date_created
).total_seconds()
PaperlessTask.objects.filter(task_id=task_id).update(
status=new_status,
result_data=result_data,
result_message=result_message,
date_done=now,
duration_seconds=duration_seconds,
wait_time_seconds=wait_time_seconds,
)
except Exception:
if task_instance is not None:
task_instance.status = state or states.FAILURE
task_instance.result = retval
task_instance.date_done = timezone.now()
task_instance.save()
except Exception: # pragma: no cover
# Don't let an exception in the signal handlers prevent
# a document from being consumed.
logger.exception("Updating PaperlessTask failed")
@@ -1241,33 +1103,21 @@ def task_failure_handler(
**kwargs,
) -> None:
"""
Records failure details when a task raises an exception.
Updates the result of a failed PaperlessTask.
https://docs.celeryq.dev/en/stable/userguide/signals.html#task-failure
"""
if task_id is None:
return
try:
close_old_connections()
task_instance = PaperlessTask.objects.filter(task_id=task_id).first()
result_data: dict = {
"error_type": type(exception).__name__ if exception else "Unknown",
"error_message": str(exception) if exception else "Unknown error",
}
if traceback:
import traceback as _tb
tb_str = "".join(_tb.format_tb(traceback))
result_data["traceback"] = tb_str[:5000]
PaperlessTask.objects.filter(task_id=task_id).update(
status=PaperlessTask.Status.FAILURE,
result_data=result_data,
result_message=str(exception) if exception else None,
date_done=timezone.now(),
)
except Exception:
logger.exception("Updating PaperlessTask on failure failed")
if task_instance is not None and task_instance.result is None:
task_instance.status = states.FAILURE
task_instance.result = traceback
task_instance.date_done = timezone.now()
task_instance.save()
except Exception: # pragma: no cover
logger.exception("Updating PaperlessTask failed")
@worker_process_init.connect

View File

@@ -10,6 +10,7 @@ from tempfile import mkstemp
from celery import Task
from celery import shared_task
from celery import states
from django.conf import settings
from django.contrib.contenttypes.models import ContentType
from django.db import models
@@ -40,6 +41,7 @@ from documents.models import Correspondent
from documents.models import CustomFieldInstance
from documents.models import Document
from documents.models import DocumentType
from documents.models import PaperlessTask
from documents.models import ShareLink
from documents.models import ShareLinkBundle
from documents.models import StoragePath
@@ -82,8 +84,19 @@ def index_optimize() -> None:
@shared_task
def train_classifier(
*,
scheduled=True,
status_callback: Callable[[str], None] | None = None,
) -> str:
) -> None:
task = PaperlessTask.objects.create(
type=PaperlessTask.TaskType.SCHEDULED_TASK
if scheduled
else PaperlessTask.TaskType.MANUAL_TASK,
task_id=uuid.uuid4(),
task_name=PaperlessTask.TaskName.TRAIN_CLASSIFIER,
status=states.STARTED,
date_created=timezone.now(),
date_started=timezone.now(),
)
if (
not Tag.objects.filter(matching_algorithm=Tag.MATCH_AUTO).exists()
and not DocumentType.objects.filter(matching_algorithm=Tag.MATCH_AUTO).exists()
@@ -97,22 +110,37 @@ def train_classifier(
if settings.MODEL_FILE.exists():
logger.info(f"Removing {settings.MODEL_FILE} so it won't be used")
settings.MODEL_FILE.unlink()
return result
task.status = states.SUCCESS
task.result = result
task.date_done = timezone.now()
task.save()
return
classifier = load_classifier()
if not classifier:
classifier = DocumentClassifier()
if classifier.train(status_callback=status_callback):
logger.info(
f"Saving updated classifier model to {settings.MODEL_FILE}...",
)
classifier.save()
return "Training completed successfully"
else:
logger.debug("Training data unchanged.")
return "Training data unchanged"
try:
if classifier.train(status_callback=status_callback):
logger.info(
f"Saving updated classifier model to {settings.MODEL_FILE}...",
)
classifier.save()
task.result = "Training completed successfully"
else:
logger.debug("Training data unchanged.")
task.result = "Training data unchanged"
task.status = states.SUCCESS
except Exception as e:
logger.warning("Classifier error: " + str(e))
task.status = states.FAILURE
task.result = str(e)
task.date_done = timezone.now()
task.save(update_fields=["status", "result", "date_done"])
@shared_task(bind=True)
@@ -203,8 +231,8 @@ def consume_file(
@shared_task
def sanity_check(*, raise_on_error: bool = True) -> str:
messages = sanity_checker.check_sanity()
def sanity_check(*, scheduled=True, raise_on_error=True):
messages = sanity_checker.check_sanity(scheduled=scheduled)
messages.log_messages()
if not messages.has_error and not messages.has_warning and not messages.has_info:
@@ -607,19 +635,42 @@ def update_document_parent_tags(tag: Tag, new_parent: Tag) -> None:
def llmindex_index(
*,
iter_wrapper: IterWrapper[Document] = identity,
rebuild: bool = False,
) -> str | None:
rebuild=False,
scheduled=True,
auto=False,
) -> None:
ai_config = AIConfig()
if not ai_config.llm_index_enabled:
if ai_config.llm_index_enabled:
task = PaperlessTask.objects.create(
type=PaperlessTask.TaskType.SCHEDULED_TASK
if scheduled
else PaperlessTask.TaskType.AUTO
if auto
else PaperlessTask.TaskType.MANUAL_TASK,
task_id=uuid.uuid4(),
task_name=PaperlessTask.TaskName.LLMINDEX_UPDATE,
status=states.STARTED,
date_created=timezone.now(),
date_started=timezone.now(),
)
from paperless_ai.indexing import update_llm_index
try:
result = update_llm_index(
iter_wrapper=iter_wrapper,
rebuild=rebuild,
)
task.status = states.SUCCESS
task.result = result
except Exception as e:
logger.error("LLM index error: " + str(e))
task.status = states.FAILURE
task.result = str(e)
task.date_done = timezone.now()
task.save(update_fields=["status", "result", "date_done"])
else:
logger.info("LLM index is disabled, skipping update.")
return None
from paperless_ai.indexing import update_llm_index
return update_llm_index(
iter_wrapper=iter_wrapper,
rebuild=rebuild,
)
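
Both `train_classifier` and `llmindex_index` above follow the same bookkeeping shape: create a STARTED `PaperlessTask`, do the work, then persist SUCCESS or FAILURE plus a result string. A condensed sketch of that pattern (the helper name and wiring are illustrative, not from the codebase):

# Hypothetical helper distilling the status-recording pattern used by
# train_classifier and llmindex_index above.
import uuid
from collections.abc import Callable

from celery import states
from django.utils import timezone

from documents.models import PaperlessTask


def run_tracked(task_name: str, work: Callable[[], str]) -> None:
    task = PaperlessTask.objects.create(
        type=PaperlessTask.TaskType.MANUAL_TASK,
        task_id=uuid.uuid4(),
        task_name=task_name,
        status=states.STARTED,
        date_created=timezone.now(),
        date_started=timezone.now(),
    )
    try:
        task.result = work()
        task.status = states.SUCCESS
    except Exception as e:
        task.result = str(e)
        task.status = states.FAILURE
    task.date_done = timezone.now()
    task.save(update_fields=["status", "result", "date_done"])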
@shared_task

View File

@@ -13,8 +13,6 @@ from rest_framework.test import APIClient
from documents.tests.factories import DocumentFactory
UserModelT = get_user_model()
if TYPE_CHECKING:
from documents.models import Document
@@ -128,34 +126,15 @@ def rest_api_client():
yield APIClient()
@pytest.fixture()
def regular_user(django_user_model: type[UserModelT]) -> UserModelT:
"""Unprivileged authenticated user for permission boundary tests."""
return django_user_model.objects.create_user(username="regular", password="regular")
@pytest.fixture()
def admin_client(rest_api_client: APIClient, admin_user: UserModelT) -> APIClient:
"""Admin client pre-authenticated and sending the v10 Accept header."""
rest_api_client.force_authenticate(user=admin_user)
rest_api_client.credentials(HTTP_ACCEPT="application/json; version=10")
return rest_api_client
@pytest.fixture()
def v9_client(rest_api_client: APIClient, admin_user: UserModelT) -> APIClient:
"""Admin client pre-authenticated and sending the v9 Accept header."""
rest_api_client.force_authenticate(user=admin_user)
rest_api_client.credentials(HTTP_ACCEPT="application/json; version=9")
return rest_api_client
@pytest.fixture()
def user_client(rest_api_client: APIClient, regular_user: UserModelT) -> APIClient:
"""Regular-user client pre-authenticated and sending the v10 Accept header."""
rest_api_client.force_authenticate(user=regular_user)
rest_api_client.credentials(HTTP_ACCEPT="application/json; version=10")
return rest_api_client
@pytest.fixture
def authenticated_rest_api_client(rest_api_client: APIClient):
"""
The basic DRF APIClient which has been authenticated.
"""
UserModel = get_user_model()
user = UserModel.objects.create_user(username="testuser", password="password")
rest_api_client.force_authenticate(user=user)
yield rest_api_client
@pytest.fixture(scope="session", autouse=True)

View File

@@ -11,7 +11,6 @@ from documents.models import Correspondent
from documents.models import Document
from documents.models import DocumentType
from documents.models import MatchingModel
from documents.models import PaperlessTask
from documents.models import StoragePath
from documents.models import Tag
@@ -66,17 +65,3 @@ class DocumentFactory(DjangoModelFactory):
correspondent = None
document_type = None
storage_path = None
class PaperlessTaskFactory(DjangoModelFactory):
class Meta:
model = PaperlessTask
task_id = factory.LazyFunction(lambda: str(__import__("uuid").uuid4()))
task_type = PaperlessTask.TaskType.CONSUME_FILE
trigger_source = PaperlessTask.TriggerSource.WEB_UI
status = PaperlessTask.Status.PENDING
input_data = factory.LazyFunction(dict)
result_data = None
result_message = None
acknowledged = False

View File

@@ -831,7 +831,7 @@ class TestApiAppConfig(DirectoriesMixin, APITestCase):
config.save()
with (
patch("documents.tasks.llmindex_index.apply_async") as mock_update,
patch("documents.tasks.llmindex_index.delay") as mock_update,
patch("paperless_ai.indexing.vector_store_file_exists") as mock_exists,
):
mock_exists.return_value = False

View File

@@ -4,6 +4,7 @@ import tempfile
from pathlib import Path
from unittest import mock
from celery import states
from django.contrib.auth.models import Permission
from django.contrib.auth.models import User
from django.test import override_settings
@@ -12,7 +13,6 @@ from rest_framework.test import APITestCase
from documents.models import PaperlessTask
from documents.permissions import has_system_status_permission
from documents.tests.factories import PaperlessTaskFactory
from paperless import version
@@ -258,10 +258,10 @@ class TestSystemStatus(APITestCase):
THEN:
- The response contains an OK classifier status
"""
PaperlessTaskFactory(
task_type=PaperlessTask.TaskType.TRAIN_CLASSIFIER,
trigger_source=PaperlessTask.TriggerSource.SCHEDULED,
status=PaperlessTask.Status.SUCCESS,
PaperlessTask.objects.create(
type=PaperlessTask.TaskType.SCHEDULED_TASK,
status=states.SUCCESS,
task_name=PaperlessTask.TaskName.TRAIN_CLASSIFIER,
)
self.client.force_login(self.user)
response = self.client.get(self.ENDPOINT)
@@ -295,11 +295,11 @@ class TestSystemStatus(APITestCase):
THEN:
- The response contains an ERROR classifier status
"""
PaperlessTaskFactory(
task_type=PaperlessTask.TaskType.TRAIN_CLASSIFIER,
trigger_source=PaperlessTask.TriggerSource.SCHEDULED,
status=PaperlessTask.Status.FAILURE,
result_message="Classifier training failed",
PaperlessTask.objects.create(
type=PaperlessTask.TaskType.SCHEDULED_TASK,
status=states.FAILURE,
task_name=PaperlessTask.TaskName.TRAIN_CLASSIFIER,
result="Classifier training failed",
)
self.client.force_login(self.user)
response = self.client.get(self.ENDPOINT)
@@ -319,10 +319,10 @@ class TestSystemStatus(APITestCase):
THEN:
- The response contains an OK sanity check status
"""
PaperlessTaskFactory(
task_type=PaperlessTask.TaskType.SANITY_CHECK,
trigger_source=PaperlessTask.TriggerSource.SCHEDULED,
status=PaperlessTask.Status.SUCCESS,
PaperlessTask.objects.create(
type=PaperlessTask.TaskType.SCHEDULED_TASK,
status=states.SUCCESS,
task_name=PaperlessTask.TaskName.CHECK_SANITY,
)
self.client.force_login(self.user)
response = self.client.get(self.ENDPOINT)
@@ -356,11 +356,11 @@ class TestSystemStatus(APITestCase):
THEN:
- The response contains an ERROR sanity check status
"""
PaperlessTaskFactory(
task_type=PaperlessTask.TaskType.SANITY_CHECK,
trigger_source=PaperlessTask.TriggerSource.SCHEDULED,
status=PaperlessTask.Status.FAILURE,
result_message="5 issues found.",
PaperlessTask.objects.create(
type=PaperlessTask.TaskType.SCHEDULED_TASK,
status=states.FAILURE,
task_name=PaperlessTask.TaskName.CHECK_SANITY,
result="5 issues found.",
)
self.client.force_login(self.user)
response = self.client.get(self.ENDPOINT)
@@ -405,10 +405,10 @@ class TestSystemStatus(APITestCase):
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.assertEqual(response.data["tasks"]["llmindex_status"], "WARNING")
PaperlessTaskFactory(
task_type=PaperlessTask.TaskType.LLM_INDEX,
trigger_source=PaperlessTask.TriggerSource.SCHEDULED,
status=PaperlessTask.Status.SUCCESS,
PaperlessTask.objects.create(
type=PaperlessTask.TaskType.SCHEDULED_TASK,
status=states.SUCCESS,
task_name=PaperlessTask.TaskName.LLMINDEX_UPDATE,
)
response = self.client.get(self.ENDPOINT)
self.assertEqual(response.status_code, status.HTTP_200_OK)
@@ -425,11 +425,11 @@ class TestSystemStatus(APITestCase):
- The response contains the correct AI status
"""
with override_settings(AI_ENABLED=True, LLM_EMBEDDING_BACKEND="openai"):
PaperlessTaskFactory(
task_type=PaperlessTask.TaskType.LLM_INDEX,
trigger_source=PaperlessTask.TriggerSource.SCHEDULED,
status=PaperlessTask.Status.FAILURE,
result_message="AI index update failed",
PaperlessTask.objects.create(
type=PaperlessTask.TaskType.SCHEDULED_TASK,
status=states.FAILURE,
task_name=PaperlessTask.TaskName.LLMINDEX_UPDATE,
result="AI index update failed",
)
self.client.force_login(self.user)
response = self.client.get(self.ENDPOINT)

View File

@@ -1,586 +1,425 @@
"""Tests for the /api/tasks/ endpoint.
Covers:
- v10 serializer (new field names)
- v9 serializer (backwards-compatible field names)
- Filtering, ordering, acknowledge, acknowledge_all, summary, active, run
"""
import uuid
from datetime import timedelta
from unittest import mock
import pytest
import celery
from django.contrib.auth.models import Permission
from django.contrib.auth.models import User
from django.utils import timezone
from rest_framework import status
from rest_framework.test import APIClient
from rest_framework.test import APITestCase
from documents.models import Document
from documents.models import PaperlessTask
from documents.tests.factories import PaperlessTaskFactory
pytestmark = pytest.mark.api
ENDPOINT = "/api/tasks/"
ACCEPT_V10 = "application/json; version=10"
ACCEPT_V9 = "application/json; version=9"
from documents.tests.utils import DirectoriesMixin
from documents.views import TasksViewSet
@pytest.mark.django_db()
class TestGetTasksV10:
def test_list_returns_tasks(self, admin_client: APIClient) -> None:
"""GET /api/tasks/ returns all tasks visible to the admin."""
PaperlessTaskFactory.create_batch(2)
class TestTasks(DirectoriesMixin, APITestCase):
ENDPOINT = "/api/tasks/"
response = admin_client.get(ENDPOINT)
def setUp(self) -> None:
super().setUp()
assert response.status_code == status.HTTP_200_OK
assert len(response.data) == 2
self.user = User.objects.create_superuser(username="temp_admin")
self.client.force_authenticate(user=self.user)
def test_related_document_ids_populated_from_result_data(
self,
admin_client: APIClient,
) -> None:
"""related_document_ids includes the consumed document_id from result_data."""
PaperlessTaskFactory(
status=PaperlessTask.Status.SUCCESS,
result_data={"document_id": 7},
def test_get_tasks(self) -> None:
"""
GIVEN:
- Attempted celery tasks
WHEN:
- API call is made to get tasks
THEN:
- Attempted and pending tasks are serialized and provided
"""
task1 = PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="task_one.pdf",
)
response = admin_client.get(ENDPOINT)
assert response.status_code == status.HTTP_200_OK
assert response.data[0]["related_document_ids"] == [7]
def test_related_document_ids_includes_duplicate_of(
self,
admin_client: APIClient,
) -> None:
"""related_document_ids includes duplicate_of when the file was already archived."""
PaperlessTaskFactory(
status=PaperlessTask.Status.SUCCESS,
result_data={"duplicate_of": 12},
task2 = PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="task_two.pdf",
)
response = admin_client.get(ENDPOINT)
response = self.client.get(self.ENDPOINT)
assert response.status_code == status.HTTP_200_OK
assert response.data[0]["related_document_ids"] == [12]
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.assertEqual(len(response.data), 2)
returned_task1 = response.data[1]
returned_task2 = response.data[0]
def test_filter_by_task_type(self, admin_client: APIClient) -> None:
"""?task_type= filters results to tasks of that type only."""
PaperlessTaskFactory(task_type=PaperlessTask.TaskType.CONSUME_FILE)
PaperlessTaskFactory(task_type=PaperlessTask.TaskType.TRAIN_CLASSIFIER)
self.assertEqual(returned_task1["task_id"], task1.task_id)
self.assertEqual(returned_task1["status"], celery.states.PENDING)
self.assertEqual(returned_task1["task_file_name"], task1.task_file_name)
response = admin_client.get(
ENDPOINT,
{"task_type": PaperlessTask.TaskType.TRAIN_CLASSIFIER},
self.assertEqual(returned_task2["task_id"], task2.task_id)
self.assertEqual(returned_task2["status"], celery.states.PENDING)
self.assertEqual(returned_task2["task_file_name"], task2.task_file_name)
def test_get_single_task_status(self) -> None:
"""
GIVEN:
- Query parameter for a valid task ID
WHEN:
- API call is made to get task status
THEN:
- Single task data is returned
"""
id1 = str(uuid.uuid4())
task1 = PaperlessTask.objects.create(
task_id=id1,
task_file_name="task_one.pdf",
)
assert response.status_code == status.HTTP_200_OK
assert len(response.data) == 1
assert response.data[0]["task_type"] == PaperlessTask.TaskType.TRAIN_CLASSIFIER
def test_filter_by_status(self, admin_client: APIClient) -> None:
"""?status= filters results to tasks with that status only."""
PaperlessTaskFactory(status=PaperlessTask.Status.PENDING)
PaperlessTaskFactory(status=PaperlessTask.Status.SUCCESS)
response = admin_client.get(
ENDPOINT,
{"status": PaperlessTask.Status.SUCCESS},
_ = PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="task_two.pdf",
)
assert response.status_code == status.HTTP_200_OK
assert len(response.data) == 1
assert response.data[0]["status"] == PaperlessTask.Status.SUCCESS
response = self.client.get(self.ENDPOINT + f"?task_id={id1}")
def test_filter_by_task_id(self, admin_client: APIClient) -> None:
"""?task_id= returns only the task with that UUID."""
task = PaperlessTaskFactory()
PaperlessTaskFactory() # unrelated task that should not appear
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.assertEqual(len(response.data), 1)
returned_task1 = response.data[0]
response = admin_client.get(ENDPOINT, {"task_id": task.task_id})
self.assertEqual(returned_task1["task_id"], task1.task_id)
assert response.status_code == status.HTTP_200_OK
assert len(response.data) == 1
assert response.data[0]["task_id"] == task.task_id
def test_filter_by_acknowledged(self, admin_client: APIClient) -> None:
"""?acknowledged=false returns only tasks that have not been acknowledged."""
PaperlessTaskFactory(acknowledged=False)
PaperlessTaskFactory(acknowledged=True)
response = admin_client.get(ENDPOINT, {"acknowledged": "false"})
assert response.status_code == status.HTTP_200_OK
assert len(response.data) == 1
assert response.data[0]["acknowledged"] is False
def test_filter_is_complete_true(self, admin_client: APIClient) -> None:
"""?is_complete=true returns only SUCCESS and FAILURE tasks."""
PaperlessTaskFactory(status=PaperlessTask.Status.PENDING)
PaperlessTaskFactory(status=PaperlessTask.Status.SUCCESS)
PaperlessTaskFactory(status=PaperlessTask.Status.FAILURE)
response = admin_client.get(ENDPOINT, {"is_complete": "true"})
assert response.status_code == status.HTTP_200_OK
assert len(response.data) == 2
returned_statuses = {t["status"] for t in response.data}
assert returned_statuses == {
PaperlessTask.Status.SUCCESS,
PaperlessTask.Status.FAILURE,
}
def test_filter_is_complete_false(self, admin_client: APIClient) -> None:
"""?is_complete=false returns only PENDING and STARTED tasks."""
PaperlessTaskFactory(status=PaperlessTask.Status.PENDING)
PaperlessTaskFactory(status=PaperlessTask.Status.STARTED)
PaperlessTaskFactory(status=PaperlessTask.Status.SUCCESS)
response = admin_client.get(ENDPOINT, {"is_complete": "false"})
assert response.status_code == status.HTTP_200_OK
assert len(response.data) == 2
returned_statuses = {t["status"] for t in response.data}
assert returned_statuses == {
PaperlessTask.Status.PENDING,
PaperlessTask.Status.STARTED,
}
def test_default_ordering_is_newest_first(self, admin_client: APIClient) -> None:
"""Tasks are returned in descending date_created order (newest first)."""
base = timezone.now()
t1 = PaperlessTaskFactory(date_created=base)
t2 = PaperlessTaskFactory(date_created=base + timedelta(seconds=1))
t3 = PaperlessTaskFactory(date_created=base + timedelta(seconds=2))
response = admin_client.get(ENDPOINT)
assert response.status_code == status.HTTP_200_OK
ids = [t["task_id"] for t in response.data]
assert ids == [t3.task_id, t2.task_id, t1.task_id]
def test_list_is_owner_aware(
self,
admin_user: User,
regular_user: User,
) -> None:
"""The task list only shows tasks the user owns or that are unowned."""
regular_user.user_permissions.add(
Permission.objects.get(codename="view_paperlesstask"),
def test_get_single_task_status_not_valid(self) -> None:
"""
GIVEN:
- Query parameter for a non-existent task ID
WHEN:
- API call is made to get task status
THEN:
- No task data is returned
"""
PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="task_one.pdf",
)
client = APIClient()
client.force_authenticate(user=regular_user)
client.credentials(HTTP_ACCEPT=ACCEPT_V10)
PaperlessTaskFactory(owner=admin_user)
shared_task = PaperlessTaskFactory()
own_task = PaperlessTaskFactory(owner=regular_user)
response = client.get(ENDPOINT)
assert response.status_code == status.HTTP_200_OK
assert len(response.data) == 2
returned_task_ids = {t["task_id"] for t in response.data}
assert shared_task.task_id in returned_task_ids
assert own_task.task_id in returned_task_ids
@pytest.mark.django_db()
class TestGetTasksV9:
def test_task_name_equals_task_type_value(self, v9_client: APIClient) -> None:
"""task_name mirrors the task_type value for v9 backwards compatibility."""
PaperlessTaskFactory(task_type=PaperlessTask.TaskType.CONSUME_FILE)
response = v9_client.get(ENDPOINT)
assert response.status_code == status.HTTP_200_OK
assert response.data[0]["task_name"] == "consume_file"
def test_task_file_name_from_input_data(self, v9_client: APIClient) -> None:
"""task_file_name is read from input_data['filename']."""
PaperlessTaskFactory(input_data={"filename": "report.pdf"})
response = v9_client.get(ENDPOINT)
assert response.status_code == status.HTTP_200_OK
assert response.data[0]["task_file_name"] == "report.pdf"
def test_task_file_name_none_when_no_filename_key(
self,
v9_client: APIClient,
) -> None:
"""task_file_name is None when filename is absent from input_data."""
PaperlessTaskFactory(input_data={})
response = v9_client.get(ENDPOINT)
assert response.status_code == status.HTTP_200_OK
assert response.data[0]["task_file_name"] is None
def test_type_scheduled_maps_to_scheduled_task(self, v9_client: APIClient) -> None:
"""trigger_source=scheduled maps to type='SCHEDULED_TASK' in v9."""
PaperlessTaskFactory(trigger_source=PaperlessTask.TriggerSource.SCHEDULED)
response = v9_client.get(ENDPOINT)
assert response.status_code == status.HTTP_200_OK
assert response.data[0]["type"] == "SCHEDULED_TASK"
def test_type_system_maps_to_auto_task(self, v9_client: APIClient) -> None:
"""trigger_source=system maps to type='AUTO_TASK' in v9."""
PaperlessTaskFactory(trigger_source=PaperlessTask.TriggerSource.SYSTEM)
response = v9_client.get(ENDPOINT)
assert response.status_code == status.HTTP_200_OK
assert response.data[0]["type"] == "AUTO_TASK"
def test_type_web_ui_maps_to_manual_task(self, v9_client: APIClient) -> None:
"""trigger_source=web_ui maps to type='MANUAL_TASK' in v9."""
PaperlessTaskFactory(trigger_source=PaperlessTask.TriggerSource.WEB_UI)
response = v9_client.get(ENDPOINT)
assert response.status_code == status.HTTP_200_OK
assert response.data[0]["type"] == "MANUAL_TASK"
def test_type_manual_maps_to_manual_task(self, v9_client: APIClient) -> None:
"""trigger_source=manual maps to type='MANUAL_TASK' in v9."""
PaperlessTaskFactory(trigger_source=PaperlessTask.TriggerSource.MANUAL)
response = v9_client.get(ENDPOINT)
assert response.status_code == status.HTTP_200_OK
assert response.data[0]["type"] == "MANUAL_TASK"
def test_related_document_from_result_data_document_id(
self,
v9_client: APIClient,
) -> None:
"""related_document is taken from result_data['document_id'] in v9."""
PaperlessTaskFactory(
status=PaperlessTask.Status.SUCCESS,
result_data={"document_id": 99},
_ = PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="task_two.pdf",
)
response = v9_client.get(ENDPOINT)
response = self.client.get(self.ENDPOINT + "?task_id=bad-task-id")
assert response.status_code == status.HTTP_200_OK
assert response.data[0]["related_document"] == 99
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.assertEqual(len(response.data), 0)
def test_related_document_none_when_no_result_data(
self,
v9_client: APIClient,
) -> None:
"""related_document is None when result_data is absent in v9."""
PaperlessTaskFactory(result_data=None)
response = v9_client.get(ENDPOINT)
assert response.status_code == status.HTTP_200_OK
assert response.data[0]["related_document"] is None
def test_duplicate_documents_from_result_data(self, v9_client: APIClient) -> None:
"""duplicate_documents includes duplicate_of from result_data in v9."""
PaperlessTaskFactory(
status=PaperlessTask.Status.SUCCESS,
result_data={"duplicate_of": 55},
def test_acknowledge_tasks(self) -> None:
"""
GIVEN:
- Attempted celery tasks
WHEN:
- API call is made to mark a task as acknowledged
THEN:
- Task is marked as acknowledged
"""
task = PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="task_one.pdf",
)
response = v9_client.get(ENDPOINT)
response = self.client.get(self.ENDPOINT)
self.assertEqual(len(response.data), 1)
assert response.status_code == status.HTTP_200_OK
assert response.data[0]["duplicate_documents"] == [55]
def test_duplicate_documents_empty_when_no_result_data(
self,
v9_client: APIClient,
) -> None:
"""duplicate_documents is an empty list when result_data is absent in v9."""
PaperlessTaskFactory(result_data=None)
response = v9_client.get(ENDPOINT)
assert response.status_code == status.HTTP_200_OK
assert response.data[0]["duplicate_documents"] == []
def test_filter_by_task_name_maps_to_task_type(self, v9_client: APIClient) -> None:
"""?task_name=consume_file filter maps to the task_type field for v9 compatibility."""
PaperlessTaskFactory(task_type=PaperlessTask.TaskType.CONSUME_FILE)
PaperlessTaskFactory(task_type=PaperlessTask.TaskType.TRAIN_CLASSIFIER)
response = v9_client.get(ENDPOINT, {"task_name": "consume_file"})
assert response.status_code == status.HTTP_200_OK
assert len(response.data) == 1
assert response.data[0]["task_name"] == "consume_file"
def test_filter_by_type_maps_to_trigger_source(self, v9_client: APIClient) -> None:
"""?type=SCHEDULED_TASK filter maps to trigger_source=scheduled for v9 compatibility."""
PaperlessTaskFactory(trigger_source=PaperlessTask.TriggerSource.SCHEDULED)
PaperlessTaskFactory(trigger_source=PaperlessTask.TriggerSource.WEB_UI)
response = v9_client.get(ENDPOINT, {"type": "SCHEDULED_TASK"})
assert response.status_code == status.HTTP_200_OK
assert len(response.data) == 1
assert response.data[0]["type"] == "SCHEDULED_TASK"
@pytest.mark.django_db()
class TestAcknowledge:
def test_returns_count(self, admin_client: APIClient) -> None:
"""POST acknowledge/ returns the count of tasks that were acknowledged."""
task1 = PaperlessTaskFactory()
task2 = PaperlessTaskFactory()
response = admin_client.post(
ENDPOINT + "acknowledge/",
{"tasks": [task1.id, task2.id]},
format="json",
)
assert response.status_code == status.HTTP_200_OK
assert response.data == {"result": 2}
def test_acknowledged_tasks_excluded_from_unacked_filter(
self,
admin_client: APIClient,
) -> None:
"""Acknowledged tasks no longer appear when filtering with ?acknowledged=false."""
task = PaperlessTaskFactory()
admin_client.post(
ENDPOINT + "acknowledge/",
response = self.client.post(
self.ENDPOINT + "acknowledge/",
{"tasks": [task.id]},
format="json",
)
self.assertEqual(response.status_code, status.HTTP_200_OK)
response = self.client.get(self.ENDPOINT + "?acknowledged=false")
self.assertEqual(len(response.data), 0)
def test_acknowledge_tasks_requires_change_permission(self) -> None:
"""
GIVEN:
- A regular user initially without change permissions
- A regular user with change permissions
WHEN:
- API call is made to acknowledge tasks
THEN:
- The first user is forbidden from acknowledging tasks
- The second user is allowed to acknowledge tasks
"""
regular_user = User.objects.create_user(username="test")
self.client.force_authenticate(user=regular_user)
task = PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="task_one.pdf",
)
response = admin_client.get(ENDPOINT, {"acknowledged": "false"})
assert response.status_code == status.HTTP_200_OK
assert len(response.data) == 0
def test_requires_change_permission(self, user_client: APIClient) -> None:
"""Regular users without change_paperlesstask permission receive 403."""
task = PaperlessTaskFactory()
response = user_client.post(
ENDPOINT + "acknowledge/",
response = self.client.post(
self.ENDPOINT + "acknowledge/",
{"tasks": [task.id]},
format="json",
)
self.assertEqual(response.status_code, status.HTTP_403_FORBIDDEN)
assert response.status_code == status.HTTP_403_FORBIDDEN
def test_succeeds_with_change_permission(self, regular_user: User) -> None:
"""Users granted change_paperlesstask permission can acknowledge tasks."""
regular_user.user_permissions.add(
regular_user2 = User.objects.create_user(username="test2")
regular_user2.user_permissions.add(
Permission.objects.get(codename="change_paperlesstask"),
)
regular_user.save()
regular_user2.save()
self.client.force_authenticate(user=regular_user2)
client = APIClient()
client.force_authenticate(user=regular_user)
client.credentials(HTTP_ACCEPT=ACCEPT_V10)
task = PaperlessTaskFactory()
response = client.post(
ENDPOINT + "acknowledge/",
response = self.client.post(
self.ENDPOINT + "acknowledge/",
{"tasks": [task.id]},
format="json",
)
self.assertEqual(response.status_code, status.HTTP_200_OK)
def test_tasks_owner_aware(self) -> None:
"""
GIVEN:
- Existing PaperlessTasks with owner and with no owner
WHEN:
- API call is made to get tasks
THEN:
- Only tasks with no owner or request user are returned
"""
regular_user = User.objects.create_user(username="test")
regular_user.user_permissions.add(*Permission.objects.all())
self.client.logout()
self.client.force_authenticate(user=regular_user)
task1 = PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="task_one.pdf",
owner=self.user,
)
assert response.status_code == status.HTTP_200_OK
@pytest.mark.django_db()
class TestAcknowledgeAll:
def test_marks_only_completed_tasks(self, admin_client: APIClient) -> None:
"""acknowledge_all/ marks only SUCCESS and FAILURE tasks as acknowledged."""
PaperlessTaskFactory(status=PaperlessTask.Status.SUCCESS, acknowledged=False)
PaperlessTaskFactory(status=PaperlessTask.Status.FAILURE, acknowledged=False)
PaperlessTaskFactory(status=PaperlessTask.Status.PENDING, acknowledged=False)
response = admin_client.post(ENDPOINT + "acknowledge_all/")
assert response.status_code == status.HTTP_200_OK
assert response.data == {"result": 2}
def test_skips_already_acknowledged(self, admin_client: APIClient) -> None:
"""acknowledge_all/ does not re-acknowledge tasks that are already acknowledged."""
PaperlessTaskFactory(status=PaperlessTask.Status.SUCCESS, acknowledged=True)
PaperlessTaskFactory(status=PaperlessTask.Status.SUCCESS, acknowledged=False)
response = admin_client.post(ENDPOINT + "acknowledge_all/")
assert response.status_code == status.HTTP_200_OK
assert response.data == {"result": 1}
def test_skips_pending_and_started(self, admin_client: APIClient) -> None:
"""acknowledge_all/ does not touch PENDING or STARTED tasks."""
PaperlessTaskFactory(status=PaperlessTask.Status.PENDING)
PaperlessTaskFactory(status=PaperlessTask.Status.STARTED)
response = admin_client.post(ENDPOINT + "acknowledge_all/")
assert response.status_code == status.HTTP_200_OK
assert response.data == {"result": 0}
def test_includes_revoked(self, admin_client: APIClient) -> None:
"""acknowledge_all/ marks REVOKED tasks as acknowledged."""
PaperlessTaskFactory(status=PaperlessTask.Status.REVOKED, acknowledged=False)
response = admin_client.post(ENDPOINT + "acknowledge_all/")
assert response.status_code == status.HTTP_200_OK
assert response.data == {"result": 1}
@pytest.mark.django_db()
class TestSummary:
def test_returns_per_type_totals(self, admin_client: APIClient) -> None:
"""summary/ returns per-type counts of total, success, and failure tasks."""
PaperlessTaskFactory(
task_type=PaperlessTask.TaskType.CONSUME_FILE,
status=PaperlessTask.Status.SUCCESS,
)
PaperlessTaskFactory(
task_type=PaperlessTask.TaskType.CONSUME_FILE,
status=PaperlessTask.Status.FAILURE,
)
PaperlessTaskFactory(
task_type=PaperlessTask.TaskType.TRAIN_CLASSIFIER,
status=PaperlessTask.Status.SUCCESS,
task2 = PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="task_two.pdf",
)
response = admin_client.get(ENDPOINT + "summary/")
task3 = PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="task_three.pdf",
owner=regular_user,
)
assert response.status_code == status.HTTP_200_OK
by_type = {item["task_type"]: item for item in response.data}
assert by_type["consume_file"]["total_count"] == 2
assert by_type["consume_file"]["success_count"] == 1
assert by_type["consume_file"]["failure_count"] == 1
assert by_type["train_classifier"]["total_count"] == 1
response = self.client.get(self.ENDPOINT)
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.assertEqual(len(response.data), 2)
self.assertEqual(response.data[0]["task_id"], task3.task_id)
self.assertEqual(response.data[1]["task_id"], task2.task_id)
@pytest.mark.django_db()
class TestActive:
def test_returns_pending_and_started_only(self, admin_client: APIClient) -> None:
"""active/ returns only tasks in PENDING or STARTED status."""
PaperlessTaskFactory(status=PaperlessTask.Status.PENDING)
PaperlessTaskFactory(status=PaperlessTask.Status.STARTED)
PaperlessTaskFactory(status=PaperlessTask.Status.SUCCESS)
PaperlessTaskFactory(status=PaperlessTask.Status.FAILURE)
acknowledge_response = self.client.post(
self.ENDPOINT + "acknowledge/",
{"tasks": [task1.id, task2.id, task3.id]},
)
self.assertEqual(acknowledge_response.status_code, status.HTTP_200_OK)
self.assertEqual(acknowledge_response.data, {"result": 2})
response = admin_client.get(ENDPOINT + "active/")
def test_task_result_no_error(self) -> None:
"""
GIVEN:
- A celery task completed without error
WHEN:
- API call is made to get tasks
THEN:
- The returned data includes the task result
"""
PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="task_one.pdf",
status=celery.states.SUCCESS,
result="Success. New document id 1 created",
)
assert response.status_code == status.HTTP_200_OK
assert len(response.data) == 2
active_statuses = {t["status"] for t in response.data}
assert active_statuses == {
PaperlessTask.Status.PENDING,
PaperlessTask.Status.STARTED,
response = self.client.get(self.ENDPOINT)
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.assertEqual(len(response.data), 1)
returned_data = response.data[0]
self.assertEqual(returned_data["result"], "Success. New document id 1 created")
self.assertEqual(returned_data["related_document"], "1")
def test_task_result_with_error(self) -> None:
"""
GIVEN:
- A celery task completed with an exception
WHEN:
- API call is made to get tasks
THEN:
- The returned result is the exception info
"""
PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="task_one.pdf",
status=celery.states.FAILURE,
result="test.pdf: Unexpected error during ingestion.",
)
response = self.client.get(self.ENDPOINT)
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.assertEqual(len(response.data), 1)
returned_data = response.data[0]
self.assertEqual(
returned_data["result"],
"test.pdf: Unexpected error during ingestion.",
)
def test_task_name_webui(self) -> None:
"""
GIVEN:
- Attempted celery task
- Task was created through the webui
WHEN:
- API call is made to get tasks
THEN:
- Returned data includes the filename
"""
PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="test.pdf",
task_name=PaperlessTask.TaskName.CONSUME_FILE,
status=celery.states.SUCCESS,
)
response = self.client.get(self.ENDPOINT)
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.assertEqual(len(response.data), 1)
returned_data = response.data[0]
self.assertEqual(returned_data["task_file_name"], "test.pdf")
def test_task_name_consume_folder(self) -> None:
"""
GIVEN:
- Attempted celery task
- Task was created through the consume folder
WHEN:
- API call is made to get tasks
THEN:
- Returned data includes the filename
"""
PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="anothertest.pdf",
task_name=PaperlessTask.TaskName.CONSUME_FILE,
status=celery.states.SUCCESS,
)
response = self.client.get(self.ENDPOINT)
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.assertEqual(len(response.data), 1)
returned_data = response.data[0]
self.assertEqual(returned_data["task_file_name"], "anothertest.pdf")
def test_task_result_duplicate_warning_includes_count(self) -> None:
"""
GIVEN:
- A celery task succeeds, but a duplicate exists
WHEN:
- API call is made to get tasks
THEN:
- The returned data includes duplicate warning metadata
"""
checksum = "duplicate-checksum"
Document.objects.create(
title="Existing",
content="",
mime_type="application/pdf",
checksum=checksum,
)
created_doc = Document.objects.create(
title="Created",
content="",
mime_type="application/pdf",
checksum=checksum,
archive_checksum="another-checksum",
)
PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_file_name="task_one.pdf",
status=celery.states.SUCCESS,
result=f"Success. New document id {created_doc.pk} created",
)
response = self.client.get(self.ENDPOINT)
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.assertEqual(len(response.data), 1)
returned_data = response.data[0]
self.assertEqual(returned_data["related_document"], str(created_doc.pk))
def test_run_train_classifier_task(self) -> None:
"""
GIVEN:
- A superuser
WHEN:
- API call is made to run the train classifier task
THEN:
- The task is run
"""
mock_train_classifier = mock.Mock(return_value="Task started")
TasksViewSet.TASK_AND_ARGS_BY_NAME = {
PaperlessTask.TaskName.TRAIN_CLASSIFIER: (
mock_train_classifier,
{"scheduled": False},
),
}
def test_excludes_revoked_tasks_from_active(self, admin_client: APIClient) -> None:
"""active/ excludes REVOKED tasks."""
PaperlessTaskFactory(status=PaperlessTask.Status.REVOKED)
response = admin_client.get(ENDPOINT + "active/")
assert response.status_code == status.HTTP_200_OK
assert len(response.data) == 0
@pytest.mark.django_db()
class TestRun:
def test_forbidden_for_regular_user(self, user_client: APIClient) -> None:
"""Regular users without add_paperlesstask permission receive 403 from run/."""
response = user_client.post(
ENDPOINT + "run/",
{"task_type": PaperlessTask.TaskType.TRAIN_CLASSIFIER},
format="json",
response = self.client.post(
self.ENDPOINT + "run/",
{"task_name": PaperlessTask.TaskName.TRAIN_CLASSIFIER},
)
assert response.status_code == status.HTTP_403_FORBIDDEN
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.assertEqual(response.data, {"result": "Task started"})
mock_train_classifier.assert_called_once_with(scheduled=False)
def test_dispatches_via_apply_async_with_manual_trigger_header(
self,
admin_client: APIClient,
) -> None:
"""run/ dispatches the task via apply_async with trigger_source=manual in headers."""
fake_task_id = str(uuid.uuid4())
mock_async_result = mock.Mock()
mock_async_result.id = fake_task_id
mock_apply_async = mock.Mock(return_value=mock_async_result)
with mock.patch(
"documents.views.train_classifier.apply_async",
mock_apply_async,
):
response = admin_client.post(
ENDPOINT + "run/",
{"task_type": PaperlessTask.TaskType.TRAIN_CLASSIFIER},
format="json",
)
assert response.status_code == status.HTTP_200_OK
assert response.data == {"task_id": fake_task_id}
mock_apply_async.assert_called_once_with(
kwargs={},
headers={"trigger_source": "manual"},
# mock error
mock_train_classifier.reset_mock()
mock_train_classifier.side_effect = Exception("Error")
response = self.client.post(
self.ENDPOINT + "run/",
{"task_name": PaperlessTask.TaskName.TRAIN_CLASSIFIER},
)
def test_returns_400_for_consume_file(self, admin_client: APIClient) -> None:
"""consume_file cannot be manually triggered via the run endpoint."""
response = admin_client.post(
ENDPOINT + "run/",
{"task_type": PaperlessTask.TaskType.CONSUME_FILE},
format="json",
self.assertEqual(response.status_code, status.HTTP_500_INTERNAL_SERVER_ERROR)
mock_train_classifier.assert_called_once_with(scheduled=False)
@mock.patch("documents.tasks.sanity_check")
def test_run_task_requires_superuser(self, mock_check_sanity) -> None:
"""
GIVEN:
- A regular user
WHEN:
- API call is made to run a task
THEN:
- The task is not run
"""
regular_user = User.objects.create_user(username="test")
regular_user.user_permissions.add(*Permission.objects.all())
self.client.logout()
self.client.force_authenticate(user=regular_user)
response = self.client.post(
self.ENDPOINT + "run/",
{"task_name": PaperlessTask.TaskName.CHECK_SANITY},
)
assert response.status_code == status.HTTP_400_BAD_REQUEST
def test_returns_400_for_invalid_task_type(self, admin_client: APIClient) -> None:
"""run/ returns 400 for an unrecognized task_type value."""
response = admin_client.post(
ENDPOINT + "run/",
{"task_type": "not_a_real_type"},
format="json",
)
assert response.status_code == status.HTTP_400_BAD_REQUEST
def test_sanity_check_dispatched_with_correct_kwargs(
self,
admin_client: APIClient,
) -> None:
"""run/ dispatches sanity_check with raise_on_error=False and manual trigger header."""
fake_task_id = str(uuid.uuid4())
mock_async_result = mock.Mock()
mock_async_result.id = fake_task_id
mock_apply_async = mock.Mock(return_value=mock_async_result)
with mock.patch(
"documents.views.sanity_check.apply_async",
mock_apply_async,
):
response = admin_client.post(
ENDPOINT + "run/",
{"task_type": PaperlessTask.TaskType.SANITY_CHECK},
format="json",
)
assert response.status_code == status.HTTP_200_OK
assert response.data == {"task_id": fake_task_id}
mock_apply_async.assert_called_once_with(
kwargs={"raise_on_error": False},
headers={"trigger_source": "manual"},
)
self.assertEqual(response.status_code, status.HTTP_403_FORBIDDEN)
mock_check_sanity.assert_not_called()

View File

@@ -1328,7 +1328,7 @@ class PreConsumeTestCase(DirectoriesMixin, GetConsumerMixin, TestCase):
environment = args[1]
self.assertEqual(command[0], script.name)
self.assertEqual(len(command), 1)
self.assertEqual(command[1], str(self.test_file))
subset = {
"DOCUMENT_SOURCE_PATH": str(c.input_doc.original_file),
@@ -1478,7 +1478,11 @@ class PostConsumeTestCase(DirectoriesMixin, GetConsumerMixin, TestCase):
environment = args[1]
self.assertEqual(command[0], script.name)
self.assertEqual(len(command), 1)
self.assertEqual(command[1], str(doc.pk))
self.assertEqual(command[5], f"/api/documents/{doc.pk}/download/")
self.assertEqual(command[6], f"/api/documents/{doc.pk}/thumb/")
self.assertEqual(command[7], "my_bank")
self.assertCountEqual(command[8].split(","), ["a", "b"])
subset = {
"DOCUMENT_ID": str(doc.pk),

View File

@@ -211,7 +211,7 @@ class TestCreateClassifier:
call_command("document_create_classifier", skip_checks=True)
m.assert_called_once_with(status_callback=mocker.ANY)
m.assert_called_once_with(scheduled=False, status_callback=mocker.ANY)
assert callable(m.call_args.kwargs["status_callback"])
def test_create_classifier_callback_output(self, mocker: MockerFixture) -> None:

View File

@@ -1,7 +1,7 @@
"""Tests for the sanity checker module.
Tests exercise ``check_sanity`` as a whole, verifying document validation,
orphan detection, and the iter_wrapper contract.
orphan detection, task recording, and the iter_wrapper contract.
"""
from __future__ import annotations
@@ -12,12 +12,13 @@ from typing import TYPE_CHECKING
import pytest
from documents.models import Document
from documents.models import PaperlessTask
from documents.sanity_checker import check_sanity
if TYPE_CHECKING:
from collections.abc import Iterable
from documents.models import Document
from documents.tests.conftest import PaperlessDirs
@@ -228,6 +229,35 @@ class TestCheckSanityIterWrapper:
assert not messages.has_error
@pytest.mark.django_db
class TestCheckSanityTaskRecording:
@pytest.mark.parametrize(
("expected_type", "scheduled"),
[
pytest.param(PaperlessTask.TaskType.SCHEDULED_TASK, True, id="scheduled"),
pytest.param(PaperlessTask.TaskType.MANUAL_TASK, False, id="manual"),
],
)
@pytest.mark.usefixtures("_media_settings")
def test_task_type(self, expected_type: str, *, scheduled: bool) -> None:
check_sanity(scheduled=scheduled)
task = PaperlessTask.objects.latest("date_created")
assert task.task_name == PaperlessTask.TaskName.CHECK_SANITY
assert task.type == expected_type
def test_success_status(self, sample_doc: Document) -> None:
check_sanity()
task = PaperlessTask.objects.latest("date_created")
assert task.status == "SUCCESS"
def test_failure_status(self, sample_doc: Document) -> None:
Path(sample_doc.source_path).unlink()
check_sanity()
task = PaperlessTask.objects.latest("date_created")
assert task.status == "FAILURE"
assert "Check logs for details" in task.result
@pytest.mark.django_db
class TestCheckSanityLogMessages:
def test_logs_doc_issues(

View File

@@ -1,302 +1,250 @@
import uuid
from unittest import mock
import pytest
import celery
from django.contrib.auth import get_user_model
from django.test import TestCase
from documents.data_models import ConsumableDocument
from documents.data_models import DocumentMetadataOverrides
from documents.data_models import DocumentSource
from documents.models import Document
from documents.models import PaperlessTask
from documents.signals.handlers import add_to_index
from documents.signals.handlers import before_task_publish_handler
from documents.signals.handlers import task_failure_handler
from documents.signals.handlers import task_postrun_handler
from documents.signals.handlers import task_prerun_handler
from documents.tests.test_consumer import fake_magic_from_file
from documents.tests.utils import DirectoriesMixin
@pytest.fixture
def consume_input_doc():
doc = mock.MagicMock(spec=ConsumableDocument)
# original_file is a Path; configure the nested mock so .name works
doc.original_file = mock.MagicMock()
doc.original_file.name = "invoice.pdf"
doc.original_path = None
doc.mime_type = "application/pdf"
doc.mailrule_id = None
doc.source = DocumentSource.WebUI
return doc
@mock.patch("documents.consumer.magic.from_file", fake_magic_from_file)
class TestTaskSignalHandler(DirectoriesMixin, TestCase):
@classmethod
def setUpTestData(cls) -> None:
super().setUpTestData()
cls.user = get_user_model().objects.create_user(username="testuser")
@pytest.fixture
def consume_overrides(django_user_model):
user = django_user_model.objects.create_user(username="testuser")
overrides = mock.MagicMock(spec=DocumentMetadataOverrides)
overrides.owner_id = user.id
return overrides
def send_publish(
task_name: str,
args: tuple,
kwargs: dict,
headers: dict | None = None,
) -> str:
from documents.signals.handlers import before_task_publish_handler
task_id = str(uuid.uuid4())
hdrs = {"task": task_name, "id": task_id, **(headers or {})}
before_task_publish_handler(sender=task_name, headers=hdrs, body=(args, kwargs, {}))
return task_id
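# Note (added for clarity, not part of the original diff): the (args, kwargs, {})
# body mirrors Celery's v2 message protocol, where the third element is the
# "embed" mapping (callbacks/errbacks/chain/chord) that these tests leave empty.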
@pytest.mark.django_db
class TestBeforeTaskPublishHandler:
def test_creates_task_for_consume_file(self, consume_input_doc, consume_overrides):
task_id = send_publish(
"documents.tasks.consume_file",
(consume_input_doc, consume_overrides),
{},
)
task = PaperlessTask.objects.get(task_id=task_id)
assert task.task_type == PaperlessTask.TaskType.CONSUME_FILE
assert task.status == PaperlessTask.Status.PENDING
assert task.trigger_source == PaperlessTask.TriggerSource.WEB_UI
assert task.input_data["filename"] == "invoice.pdf"
assert task.owner_id == consume_overrides.owner_id
def test_creates_task_for_train_classifier(self):
task_id = send_publish("documents.tasks.train_classifier", (), {})
task = PaperlessTask.objects.get(task_id=task_id)
assert task.task_type == PaperlessTask.TaskType.TRAIN_CLASSIFIER
assert task.trigger_source == PaperlessTask.TriggerSource.MANUAL
def test_creates_task_for_sanity_check(self):
task_id = send_publish("documents.tasks.sanity_check", (), {})
task = PaperlessTask.objects.get(task_id=task_id)
assert task.task_type == PaperlessTask.TaskType.SANITY_CHECK
def test_creates_task_for_process_mail_accounts(self):
task_id = send_publish(
"paperless_mail.tasks.process_mail_accounts",
(),
{"account_ids": [1, 2]},
)
task = PaperlessTask.objects.get(task_id=task_id)
assert task.task_type == PaperlessTask.TaskType.MAIL_FETCH
assert task.input_data["account_ids"] == [1, 2]
def test_scheduled_header_sets_trigger_source(self):
task_id = send_publish(
"documents.tasks.train_classifier",
(),
{},
headers={"trigger_source": "scheduled"},
)
task = PaperlessTask.objects.get(task_id=task_id)
assert task.trigger_source == PaperlessTask.TriggerSource.SCHEDULED
def test_system_header_sets_trigger_source(self):
task_id = send_publish(
"documents.tasks.llmindex_index",
(),
{"rebuild": True},
headers={"trigger_source": "system"},
)
task = PaperlessTask.objects.get(task_id=task_id)
assert task.trigger_source == PaperlessTask.TriggerSource.SYSTEM
def test_ignores_untracked_task(self):
send_publish("documents.tasks.bulk_update_documents", ([1, 2],), {})
assert PaperlessTask.objects.count() == 0
def test_ignores_none_headers(self):
from documents.signals.handlers import before_task_publish_handler
before_task_publish_handler(sender=None, headers=None, body=None)
assert PaperlessTask.objects.count() == 0
def test_consume_folder_source_maps_correctly(
def util_call_before_task_publish_handler(
self,
consume_input_doc,
consume_overrides,
):
consume_input_doc.source = DocumentSource.ConsumeFolder
task_id = send_publish(
"documents.tasks.consume_file",
(consume_input_doc, consume_overrides),
headers_to_use,
body_to_use,
) -> None:
"""
Simple utility to call the before-task-publish handler and ensure it created a
single task instance
"""
self.assertEqual(PaperlessTask.objects.all().count(), 0)
before_task_publish_handler(headers=headers_to_use, body=body_to_use)
self.assertEqual(PaperlessTask.objects.all().count(), 1)
def test_before_task_publish_handler_consume(self) -> None:
"""
GIVEN:
- A celery task is started via the consume folder
WHEN:
- Task before publish handler is called
THEN:
- The task is created and marked as pending
"""
headers = {
"id": str(uuid.uuid4()),
"task": "documents.tasks.consume_file",
}
body = (
# args
(
ConsumableDocument(
source=DocumentSource.ConsumeFolder,
original_file="/consume/hello-999.pdf",
),
DocumentMetadataOverrides(
title="Hello world",
owner_id=self.user.id,
),
),
# kwargs
{},
# celery stuff
{"callbacks": None, "errbacks": None, "chain": None, "chord": None},
)
self.util_call_before_task_publish_handler(
headers_to_use=headers,
body_to_use=body,
)
task = PaperlessTask.objects.get(task_id=task_id)
assert task.trigger_source == PaperlessTask.TriggerSource.FOLDER_CONSUME
def test_email_source_maps_correctly(self, consume_input_doc, consume_overrides):
consume_input_doc.source = DocumentSource.MailFetch
task_id = send_publish(
"documents.tasks.consume_file",
(consume_input_doc, consume_overrides),
task = PaperlessTask.objects.get()
self.assertIsNotNone(task)
self.assertEqual(headers["id"], task.task_id)
self.assertEqual("hello-999.pdf", task.task_file_name)
self.assertEqual(PaperlessTask.TaskName.CONSUME_FILE, task.task_name)
self.assertEqual(self.user.id, task.owner_id)
self.assertEqual(celery.states.PENDING, task.status)
def test_task_prerun_handler(self) -> None:
"""
GIVEN:
- A celery task is started via the consume folder
WHEN:
- Task starts execution
THEN:
- The task is marked as started
"""
headers = {
"id": str(uuid.uuid4()),
"task": "documents.tasks.consume_file",
}
body = (
# args
(
ConsumableDocument(
source=DocumentSource.ConsumeFolder,
original_file="/consume/hello-99.pdf",
),
None,
),
# kwargs
{},
)
task = PaperlessTask.objects.get(task_id=task_id)
assert task.trigger_source == PaperlessTask.TriggerSource.EMAIL_CONSUME
@pytest.mark.django_db
class TestTaskPrerunHandler:
def test_marks_task_started(self):
task = PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_type=PaperlessTask.TaskType.CONSUME_FILE,
trigger_source=PaperlessTask.TriggerSource.MANUAL,
status=PaperlessTask.Status.PENDING,
)
from documents.signals.handlers import task_prerun_handler
task_prerun_handler(task_id=task.task_id)
task.refresh_from_db()
assert task.status == PaperlessTask.Status.STARTED
assert task.date_started is not None
def test_ignores_unknown_task_id(self):
from documents.signals.handlers import task_prerun_handler
task_prerun_handler(task_id="nonexistent-id") # must not raise
def test_ignores_none_task_id(self):
from documents.signals.handlers import task_prerun_handler
task_prerun_handler(task_id=None) # must not raise
@pytest.mark.django_db
class TestTaskPostrunHandler:
def _started_task(self) -> PaperlessTask:
from django.utils import timezone
return PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_type=PaperlessTask.TaskType.TRAIN_CLASSIFIER,
trigger_source=PaperlessTask.TriggerSource.MANUAL,
status=PaperlessTask.Status.STARTED,
date_started=timezone.now(),
# celery stuff
{"callbacks": None, "errbacks": None, "chain": None, "chord": None},
)
def test_records_success_with_dict_result(self):
task = self._started_task()
from documents.signals.handlers import task_postrun_handler
self.util_call_before_task_publish_handler(
headers_to_use=headers,
body_to_use=body,
)
task_prerun_handler(task_id=headers["id"])
task = PaperlessTask.objects.get()
self.assertEqual(celery.states.STARTED, task.status)
def test_task_postrun_handler(self) -> None:
"""
GIVEN:
- A celery task is started via the consume folder
WHEN:
- Task finished execution
THEN:
- The task is marked as succeeded
"""
headers = {
"id": str(uuid.uuid4()),
"task": "documents.tasks.consume_file",
}
body = (
# args
(
ConsumableDocument(
source=DocumentSource.ConsumeFolder,
original_file="/consume/hello-9.pdf",
),
None,
),
# kwargs
{},
# celery stuff
{"callbacks": None, "errbacks": None, "chain": None, "chord": None},
)
self.util_call_before_task_publish_handler(
headers_to_use=headers,
body_to_use=body,
)
task_postrun_handler(
task_id=task.task_id,
retval={"document_id": 42},
state="SUCCESS",
task_id=headers["id"],
retval="Success. New document id 1 created",
state=celery.states.SUCCESS,
)
task.refresh_from_db()
assert task.status == PaperlessTask.Status.SUCCESS
assert task.result_data == {"document_id": 42}
assert task.date_done is not None
assert task.duration_seconds is not None
assert task.wait_time_seconds is not None
def test_records_failure_state(self):
task = self._started_task()
from documents.signals.handlers import task_postrun_handler
task = PaperlessTask.objects.get()
task_postrun_handler(task_id=task.task_id, retval="some error", state="FAILURE")
task.refresh_from_db()
assert task.status == PaperlessTask.Status.FAILURE
self.assertEqual(celery.states.SUCCESS, task.status)
def test_parses_legacy_new_document_string(self):
task = self._started_task()
from documents.signals.handlers import task_postrun_handler
task_postrun_handler(
task_id=task.task_id,
retval="New document id 42 created",
state="SUCCESS",
def test_task_failure_handler(self) -> None:
"""
GIVEN:
- A celery task is started via the consume folder
WHEN:
- Task failed execution
THEN:
- The task is marked as failed
"""
headers = {
"id": str(uuid.uuid4()),
"task": "documents.tasks.consume_file",
}
body = (
# args
(
ConsumableDocument(
source=DocumentSource.ConsumeFolder,
original_file="/consume/hello-9.pdf",
),
None,
),
# kwargs
{},
# celery stuff
{"callbacks": None, "errbacks": None, "chain": None, "chord": None},
)
task.refresh_from_db()
assert task.result_data["document_id"] == 42
assert task.result_message == "New document id 42 created"
def test_parses_legacy_duplicate_string(self):
task = self._started_task()
from documents.signals.handlers import task_postrun_handler
task_postrun_handler(
task_id=task.task_id,
retval="It is a duplicate of some document (#99).",
state="FAILURE",
self.util_call_before_task_publish_handler(
headers_to_use=headers,
body_to_use=body,
)
task.refresh_from_db()
assert task.result_data["duplicate_of"] == 99
assert task.result_data["duplicate_in_trash"] is False
def test_ignores_unknown_task_id(self):
from documents.signals.handlers import task_postrun_handler
task_postrun_handler(
task_id="nonexistent",
retval=None,
state="SUCCESS",
) # must not raise
def test_records_revoked_state(self):
task = self._started_task()
from documents.signals.handlers import task_postrun_handler
task_postrun_handler(task_id=task.task_id, retval=None, state="REVOKED")
task.refresh_from_db()
assert task.status == PaperlessTask.Status.REVOKED
@pytest.mark.django_db
class TestTaskFailureHandler:
def test_records_failure_with_exception(self):
from django.utils import timezone
task = PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_type=PaperlessTask.TaskType.CONSUME_FILE,
trigger_source=PaperlessTask.TriggerSource.WEB_UI,
status=PaperlessTask.Status.STARTED,
date_started=timezone.now(),
)
from documents.signals.handlers import task_failure_handler
task_failure_handler(
task_id=task.task_id,
exception=ValueError("PDF parse failed"),
traceback=None,
task_id=headers["id"],
exception="Example failure",
)
task.refresh_from_db()
assert task.status == PaperlessTask.Status.FAILURE
assert task.result_data["error_type"] == "ValueError"
assert task.result_data["error_message"] == "PDF parse failed"
assert task.date_done is not None
def test_records_traceback_when_provided(self):
import sys
task = PaperlessTask.objects.get()
from django.utils import timezone
self.assertEqual(celery.states.FAILURE, task.status)
task = PaperlessTask.objects.create(
task_id=str(uuid.uuid4()),
task_type=PaperlessTask.TaskType.CONSUME_FILE,
trigger_source=PaperlessTask.TriggerSource.WEB_UI,
status=PaperlessTask.Status.STARTED,
date_started=timezone.now(),
def test_add_to_index_indexes_root_once_for_root_documents(self) -> None:
root = Document.objects.create(
title="root",
checksum="root",
mime_type="application/pdf",
)
try:
raise ValueError("test error")
except ValueError:
tb = sys.exc_info()[2]
from documents.signals.handlers import task_failure_handler
with mock.patch("documents.search.get_backend") as mock_get_backend:
mock_backend = mock.MagicMock()
mock_get_backend.return_value = mock_backend
add_to_index(sender=None, document=root)
task_failure_handler(
task_id=task.task_id,
exception=ValueError("test error"),
traceback=tb,
mock_backend.add_or_update.assert_called_once_with(root, effective_content="")
def test_add_to_index_reindexes_root_for_version_documents(self) -> None:
root = Document.objects.create(
title="root",
checksum="root",
mime_type="application/pdf",
)
version = Document.objects.create(
title="version",
checksum="version",
mime_type="application/pdf",
root_document=root,
)
task.refresh_from_db()
assert "traceback" in task.result_data
assert len(task.result_data["traceback"]) <= 5000
def test_ignores_none_task_id(self):
from documents.signals.handlers import task_failure_handler
with mock.patch("documents.search.get_backend") as mock_get_backend:
mock_backend = mock.MagicMock()
mock_get_backend.return_value = mock_backend
add_to_index(sender=None, document=version)
task_failure_handler(task_id=None, exception=ValueError("x"), traceback=None)
self.assertEqual(mock_backend.add_or_update.call_count, 1)
self.assertEqual(
mock_backend.add_or_update.call_args_list[0].args[0].id,
version.id,
)
self.assertEqual(
mock_backend.add_or_update.call_args_list[0].kwargs,
{"effective_content": version.content},
)

View File

@@ -4,6 +4,7 @@ from pathlib import Path
from unittest import mock
import pytest
from celery import states
from django.conf import settings
from django.test import TestCase
from django.test import override_settings
@@ -13,6 +14,7 @@ from documents import tasks
from documents.models import Correspondent
from documents.models import Document
from documents.models import DocumentType
from documents.models import PaperlessTask
from documents.models import Tag
from documents.sanity_checker import SanityCheckFailedException
from documents.sanity_checker import SanityCheckMessages
@@ -38,8 +40,7 @@ class TestClassifier(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
def test_train_classifier_with_auto_tag(self, load_classifier) -> None:
load_classifier.return_value = None
Tag.objects.create(matching_algorithm=Tag.MATCH_AUTO, name="test")
with self.assertRaises(ValueError):
tasks.train_classifier()
tasks.train_classifier()
load_classifier.assert_called_once()
self.assertIsNotFile(settings.MODEL_FILE)
@@ -47,8 +48,7 @@ class TestClassifier(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
def test_train_classifier_with_auto_type(self, load_classifier) -> None:
load_classifier.return_value = None
DocumentType.objects.create(matching_algorithm=Tag.MATCH_AUTO, name="test")
with self.assertRaises(ValueError):
tasks.train_classifier()
tasks.train_classifier()
load_classifier.assert_called_once()
self.assertIsNotFile(settings.MODEL_FILE)
@@ -56,8 +56,7 @@ class TestClassifier(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
def test_train_classifier_with_auto_correspondent(self, load_classifier) -> None:
load_classifier.return_value = None
Correspondent.objects.create(matching_algorithm=Tag.MATCH_AUTO, name="test")
with self.assertRaises(ValueError):
tasks.train_classifier()
tasks.train_classifier()
load_classifier.assert_called_once()
self.assertIsNotFile(settings.MODEL_FILE)
@@ -299,7 +298,7 @@ class TestAIIndex(DirectoriesMixin, TestCase):
WHEN:
- llmindex_index task is called
THEN:
- update_llm_index is called and its result is returned
- update_llm_index is called, and the task is marked as success
"""
Document.objects.create(
title="test",
@@ -309,9 +308,13 @@ class TestAIIndex(DirectoriesMixin, TestCase):
# lazy-loaded so mock the actual function
with mock.patch("paperless_ai.indexing.update_llm_index") as update_llm_index:
update_llm_index.return_value = "LLM index updated successfully."
result = tasks.llmindex_index()
tasks.llmindex_index()
update_llm_index.assert_called_once()
self.assertEqual(result, "LLM index updated successfully.")
task = PaperlessTask.objects.get(
task_name=PaperlessTask.TaskName.LLMINDEX_UPDATE,
)
self.assertEqual(task.status, states.SUCCESS)
self.assertEqual(task.result, "LLM index updated successfully.")
@override_settings(
AI_ENABLED=True,
@@ -322,9 +325,9 @@ class TestAIIndex(DirectoriesMixin, TestCase):
GIVEN:
- Document exists, AI is enabled, llm index backend is set
WHEN:
- llmindex_index task is called and update_llm_index raises an exception
- llmindex_index task is called
THEN:
- the exception propagates to the caller
- update_llm_index raises an exception, and the task is marked as failure
"""
Document.objects.create(
title="test",
@@ -334,9 +337,13 @@ class TestAIIndex(DirectoriesMixin, TestCase):
# lazy-loaded so mock the actual function
with mock.patch("paperless_ai.indexing.update_llm_index") as update_llm_index:
update_llm_index.side_effect = Exception("LLM index update failed.")
with self.assertRaises(Exception, msg="LLM index update failed."):
tasks.llmindex_index()
tasks.llmindex_index()
update_llm_index.assert_called_once()
task = PaperlessTask.objects.get(
task_name=PaperlessTask.TaskName.LLMINDEX_UPDATE,
)
self.assertEqual(task.status, states.FAILURE)
self.assertIn("LLM index update failed.", task.result)
def test_update_document_in_llm_index(self) -> None:
"""

View File

@@ -8,7 +8,6 @@ import zipfile
from collections import defaultdict
from collections import deque
from datetime import datetime
from datetime import timedelta
from pathlib import Path
from time import mktime
from typing import TYPE_CHECKING
@@ -21,6 +20,7 @@ from urllib.parse import urlparse
import httpx
import magic
import pathvalidate
from celery import states
from django.conf import settings
from django.contrib.auth.models import Group
from django.contrib.auth.models import User
@@ -91,7 +91,6 @@ from rest_framework.mixins import DestroyModelMixin
from rest_framework.mixins import ListModelMixin
from rest_framework.mixins import RetrieveModelMixin
from rest_framework.mixins import UpdateModelMixin
from rest_framework.permissions import IsAdminUser
from rest_framework.permissions import IsAuthenticated
from rest_framework.request import Request
from rest_framework.response import Response
@@ -192,7 +191,7 @@ from documents.serialisers import PostDocumentSerializer
from documents.serialisers import RemovePasswordDocumentsSerializer
from documents.serialisers import ReprocessDocumentsSerializer
from documents.serialisers import RotateDocumentsSerializer
from documents.serialisers import RunTaskSerializer
from documents.serialisers import RunTaskViewSerializer
from documents.serialisers import SavedViewSerializer
from documents.serialisers import SearchResultSerializer
from documents.serialisers import SerializerWithPerms
@@ -201,9 +200,7 @@ from documents.serialisers import ShareLinkSerializer
from documents.serialisers import StoragePathSerializer
from documents.serialisers import StoragePathTestSerializer
from documents.serialisers import TagSerializer
from documents.serialisers import TaskSerializerV9
from documents.serialisers import TaskSerializerV10
from documents.serialisers import TaskSummarySerializer
from documents.serialisers import TasksViewSerializer
from documents.serialisers import TrashSerializer
from documents.serialisers import UiSettingsViewSerializer
from documents.serialisers import WorkflowActionSerializer
@@ -294,7 +291,7 @@ class IndexView(TemplateView):
return context
class PassUserMixin(GenericAPIView[Any]):
class PassUserMixin(GenericAPIView):
"""
Pass a user object to serializer
"""
@@ -460,10 +457,7 @@ class PermissionsAwareDocumentCountMixin(BulkPermissionMixin, PassUserMixin):
@extend_schema_view(**generate_object_with_permissions_schema(CorrespondentSerializer))
class CorrespondentViewSet(
PermissionsAwareDocumentCountMixin,
ModelViewSet[Correspondent],
):
class CorrespondentViewSet(PermissionsAwareDocumentCountMixin, ModelViewSet):
model = Correspondent
queryset = Correspondent.objects.select_related("owner").order_by(Lower("name"))
@@ -500,7 +494,7 @@ class CorrespondentViewSet(
@extend_schema_view(**generate_object_with_permissions_schema(TagSerializer))
class TagViewSet(PermissionsAwareDocumentCountMixin, ModelViewSet[Tag]):
class TagViewSet(PermissionsAwareDocumentCountMixin, ModelViewSet):
model = Tag
serializer_class = TagSerializer
document_count_through = Document.tags.through
@@ -579,10 +573,7 @@ class TagViewSet(PermissionsAwareDocumentCountMixin, ModelViewSet[Tag]):
@extend_schema_view(**generate_object_with_permissions_schema(DocumentTypeSerializer))
class DocumentTypeViewSet(
PermissionsAwareDocumentCountMixin,
ModelViewSet[DocumentType],
):
class DocumentTypeViewSet(PermissionsAwareDocumentCountMixin, ModelViewSet):
model = DocumentType
queryset = DocumentType.objects.select_related("owner").order_by(Lower("name"))
@@ -817,7 +808,7 @@ class DocumentViewSet(
UpdateModelMixin,
DestroyModelMixin,
ListModelMixin,
GenericViewSet[Document],
GenericViewSet,
):
model = Document
queryset = Document.objects.all()
@@ -1257,10 +1248,7 @@ class DocumentViewSet(
),
)
def suggestions(self, request, pk=None):
doc = get_object_or_404(
Document.objects.select_related("owner").prefetch_related("versions"),
pk=pk,
)
doc = get_object_or_404(Document.objects.select_related("owner"), pk=pk)
if request.user is not None and not has_perms_owner_aware(
request.user,
"view_document",
@@ -1964,7 +1952,7 @@ class ChatStreamingSerializer(serializers.Serializer):
],
name="dispatch",
)
class ChatStreamingView(GenericAPIView[Any]):
class ChatStreamingView(GenericAPIView):
permission_classes = (IsAuthenticated,)
serializer_class = ChatStreamingSerializer
@@ -2290,7 +2278,7 @@ class LogViewSet(ViewSet):
@extend_schema_view(**generate_object_with_permissions_schema(SavedViewSerializer))
class SavedViewViewSet(BulkPermissionMixin, PassUserMixin, ModelViewSet[SavedView]):
class SavedViewViewSet(BulkPermissionMixin, PassUserMixin, ModelViewSet):
model = SavedView
queryset = SavedView.objects.select_related("owner").prefetch_related(
@@ -2768,7 +2756,7 @@ class RemovePasswordDocumentsView(DocumentOperationPermissionMixin):
},
),
)
class PostDocumentView(GenericAPIView[Any]):
class PostDocumentView(GenericAPIView):
permission_classes = (IsAuthenticated,)
serializer_class = PostDocumentSerializer
parser_classes = (parsers.MultiPartParser,)
@@ -2889,7 +2877,7 @@ class PostDocumentView(GenericAPIView[Any]):
},
),
)
class SelectionDataView(GenericAPIView[Any]):
class SelectionDataView(GenericAPIView):
permission_classes = (IsAuthenticated,)
serializer_class = DocumentListSerializer
parser_classes = (parsers.MultiPartParser, parsers.JSONParser)
@@ -2993,7 +2981,7 @@ class SelectionDataView(GenericAPIView[Any]):
},
),
)
class SearchAutoCompleteView(GenericAPIView[Any]):
class SearchAutoCompleteView(GenericAPIView):
permission_classes = (IsAuthenticated,)
def get(self, request, format=None):
@@ -3274,7 +3262,7 @@ class GlobalSearchView(PassUserMixin):
},
),
)
class StatisticsView(GenericAPIView[Any]):
class StatisticsView(GenericAPIView):
permission_classes = (IsAuthenticated,)
def get(self, request, format=None):
@@ -3376,7 +3364,7 @@ class StatisticsView(GenericAPIView[Any]):
)
class BulkDownloadView(DocumentSelectionMixin, GenericAPIView[Any]):
class BulkDownloadView(DocumentSelectionMixin, GenericAPIView):
permission_classes = (IsAuthenticated,)
serializer_class = BulkDownloadSerializer
parser_classes = (parsers.JSONParser,)
@@ -3429,7 +3417,7 @@ class BulkDownloadView(DocumentSelectionMixin, GenericAPIView[Any]):
@extend_schema_view(**generate_object_with_permissions_schema(StoragePathSerializer))
class StoragePathViewSet(PermissionsAwareDocumentCountMixin, ModelViewSet[StoragePath]):
class StoragePathViewSet(PermissionsAwareDocumentCountMixin, ModelViewSet):
model = StoragePath
queryset = StoragePath.objects.select_related("owner").order_by(
@@ -3493,7 +3481,7 @@ class StoragePathViewSet(PermissionsAwareDocumentCountMixin, ModelViewSet[Storag
return Response(result)
class UiSettingsView(GenericAPIView[Any]):
class UiSettingsView(GenericAPIView):
queryset = UiSettings.objects.all()
permission_classes = (IsAuthenticated, PaperlessObjectPermissions)
serializer_class = UiSettingsViewSerializer
@@ -3591,7 +3579,7 @@ class UiSettingsView(GenericAPIView[Any]):
},
),
)
class RemoteVersionView(GenericAPIView[Any]):
class RemoteVersionView(GenericAPIView):
cache_key = "remote_version_view_latest_release"
def get(self, request, format=None):
@@ -3668,52 +3656,37 @@ class RemoteVersionView(GenericAPIView[Any]):
),
],
)
class TasksViewSet(ReadOnlyModelViewSet[PaperlessTask]):
class TasksViewSet(ReadOnlyModelViewSet):
permission_classes = (IsAuthenticated, PaperlessObjectPermissions)
serializer_class = TasksViewSerializer
filter_backends = (
DjangoFilterBackend,
OrderingFilter,
ObjectOwnedOrGrantedPermissionsFilter,
)
filterset_class = PaperlessTaskFilterSet
ordering_fields = [
"date_created",
"date_done",
"status",
"task_type",
"duration_seconds",
"wait_time_seconds",
]
ordering = ["-date_created"]
def get_serializer_class(self):
# v9: use backwards-compatible serializer with old field names
if self.request.version and int(self.request.version) < 10:
return TaskSerializerV9
return TaskSerializerV10
TASK_AND_ARGS_BY_NAME = {
PaperlessTask.TaskName.INDEX_OPTIMIZE: (index_optimize, {}),
PaperlessTask.TaskName.TRAIN_CLASSIFIER: (
train_classifier,
{"scheduled": False},
),
PaperlessTask.TaskName.CHECK_SANITY: (
sanity_check,
{"scheduled": False, "raise_on_error": False},
),
PaperlessTask.TaskName.LLMINDEX_UPDATE: (
llmindex_index,
{"scheduled": False, "rebuild": False},
),
}
def get_queryset(self):
queryset = PaperlessTask.objects.all()
# v9 backwards compat: map old query params to new field names
if self.request.version and int(self.request.version) < 10:
task_name = self.request.query_params.get("task_name")
if task_name is not None:
queryset = queryset.filter(task_type=task_name)
task_type_old = self.request.query_params.get("type")
if task_type_old is not None:
# Old type values: AUTO_TASK -> SYSTEM, SCHEDULED_TASK -> SCHEDULED, MANUAL_TASK -> MANUAL
old_to_new = {
"AUTO_TASK": PaperlessTask.TriggerSource.SYSTEM,
"SCHEDULED_TASK": PaperlessTask.TriggerSource.SCHEDULED,
"MANUAL_TASK": PaperlessTask.TriggerSource.MANUAL,
}
new_source = old_to_new.get(task_type_old)
if new_source:
queryset = queryset.filter(trigger_source=new_source)
# v10+: direct task_id param for backwards compat
queryset = PaperlessTask.objects.all().order_by("-date_created")
task_id = self.request.query_params.get("task_id")
if task_id is not None:
queryset = queryset.filter(task_id=task_id)
queryset = PaperlessTask.objects.filter(task_id=task_id)
return queryset
@action(
@@ -3725,123 +3698,39 @@ class TasksViewSet(ReadOnlyModelViewSet[PaperlessTask]):
serializer = AcknowledgeTasksViewSerializer(data=request.data)
serializer.is_valid(raise_exception=True)
task_ids = serializer.validated_data.get("tasks")
tasks = self.get_queryset().filter(id__in=task_ids)
count = tasks.update(acknowledged=True)
return Response({"result": count})
@action(
methods=["post"],
detail=False,
permission_classes=[IsAuthenticated, AcknowledgeTasksPermissions],
)
def acknowledge_all(self, request):
"""Acknowledge all completed tasks visible to the requesting user."""
count = (
self.get_queryset()
.filter(
acknowledged=False,
status__in=[
PaperlessTask.Status.SUCCESS,
PaperlessTask.Status.FAILURE,
PaperlessTask.Status.REVOKED,
],
)
.update(acknowledged=True)
)
return Response({"result": count})
@action(methods=["get"], detail=False)
def summary(self, request):
"""Aggregated task statistics per task_type over the last N days (default 30)."""
from django.db.models import Avg
from django.db.models import Count
from django.db.models import Max
from django.db.models import Q
days = int(request.query_params.get("days", 30))
cutoff = timezone.now() - timedelta(days=days)
queryset = self.get_queryset().filter(date_created__gte=cutoff)
data = queryset.values("task_type").annotate(
total_count=Count("id"),
pending_count=Count("id", filter=Q(status=PaperlessTask.Status.PENDING)),
success_count=Count("id", filter=Q(status=PaperlessTask.Status.SUCCESS)),
failure_count=Count("id", filter=Q(status=PaperlessTask.Status.FAILURE)),
avg_duration_seconds=Avg(
"duration_seconds",
filter=Q(duration_seconds__isnull=False),
),
avg_wait_time_seconds=Avg(
"wait_time_seconds",
filter=Q(wait_time_seconds__isnull=False),
),
last_run=Max("date_created"),
last_success=Max(
"date_done",
filter=Q(status=PaperlessTask.Status.SUCCESS),
),
last_failure=Max(
"date_done",
filter=Q(status=PaperlessTask.Status.FAILURE),
),
)
serializer = TaskSummarySerializer(data, many=True)
return Response(serializer.data)
@action(methods=["get"], detail=False)
def active(self, request):
"""Currently pending and running tasks (capped at 50)."""
queryset = (
self.get_queryset()
.filter(
status__in=[PaperlessTask.Status.PENDING, PaperlessTask.Status.STARTED],
)
.order_by("-date_created")[:50]
)
serializer = self.get_serializer(queryset, many=True)
return Response(serializer.data)
@action(methods=["post"], detail=False, permission_classes=[IsAdminUser])
def run(self, request):
"""Manually dispatch a background task. Superuser (admin) only."""
serializer = RunTaskSerializer(data=request.data)
serializer.is_valid(raise_exception=True)
task_type = serializer.validated_data.get("task_type")
task_func_map = {
PaperlessTask.TaskType.INDEX_OPTIMIZE: (index_optimize, {}),
PaperlessTask.TaskType.TRAIN_CLASSIFIER: (train_classifier, {}),
PaperlessTask.TaskType.SANITY_CHECK: (
sanity_check,
{"raise_on_error": False},
),
PaperlessTask.TaskType.LLM_INDEX: (
llmindex_index,
{"rebuild": False},
),
}
if task_type not in task_func_map:
return Response(
{"error": f"Task type '{task_type}' cannot be manually triggered"},
status=status.HTTP_400_BAD_REQUEST,
)
try:
task_func, task_kwargs = task_func_map[task_type]
async_result = task_func.apply_async(
kwargs=task_kwargs,
headers={"trigger_source": "manual"},
tasks = PaperlessTask.objects.filter(id__in=task_ids)
if request.user is not None and not request.user.is_superuser:
tasks = tasks.filter(owner=request.user) | tasks.filter(owner=None)
result = tasks.update(
acknowledged=True,
)
return Response({"task_id": async_result.id})
return Response({"result": result})
except Exception:
return HttpResponseBadRequest()
@action(methods=["post"], detail=False)
def run(self, request):
serializer = RunTaskViewSerializer(data=request.data)
serializer.is_valid(raise_exception=True)
task_name = serializer.validated_data.get("task_name")
if not request.user.is_superuser:
return HttpResponseForbidden("Insufficient permissions")
try:
task_func, task_args = self.TASK_AND_ARGS_BY_NAME[task_name]
result = task_func(**task_args)
return Response({"result": result})
except Exception as e:
logger.warning(f"Error running task: {e!s}")
logger.warning(f"An error occurred running task: {e!s}")
return HttpResponseServerError(
"Error running task, check logs for more detail.",
)
class ShareLinkViewSet(PassUserMixin, ModelViewSet[ShareLink]):
class ShareLinkViewSet(ModelViewSet, PassUserMixin):
model = ShareLink
queryset = ShareLink.objects.all()
@@ -3858,7 +3747,7 @@ class ShareLinkViewSet(PassUserMixin, ModelViewSet[ShareLink]):
ordering_fields = ("created", "expiration", "document")
class ShareLinkBundleViewSet(PassUserMixin, ModelViewSet[ShareLinkBundle]):
class ShareLinkBundleViewSet(ModelViewSet, PassUserMixin):
model = ShareLinkBundle
queryset = ShareLinkBundle.objects.all()
@@ -4215,7 +4104,7 @@ class BulkEditObjectsView(PassUserMixin):
return Response({"result": "OK"})
class WorkflowTriggerViewSet(ModelViewSet[WorkflowTrigger]):
class WorkflowTriggerViewSet(ModelViewSet):
permission_classes = (IsAuthenticated, PaperlessObjectPermissions)
serializer_class = WorkflowTriggerSerializer
@@ -4233,7 +4122,7 @@ class WorkflowTriggerViewSet(ModelViewSet[WorkflowTrigger]):
return super().partial_update(request, *args, **kwargs)
class WorkflowActionViewSet(ModelViewSet[WorkflowAction]):
class WorkflowActionViewSet(ModelViewSet):
permission_classes = (IsAuthenticated, PaperlessObjectPermissions)
serializer_class = WorkflowActionSerializer
@@ -4258,7 +4147,7 @@ class WorkflowActionViewSet(ModelViewSet[WorkflowAction]):
return super().partial_update(request, *args, **kwargs)
class WorkflowViewSet(ModelViewSet[Workflow]):
class WorkflowViewSet(ModelViewSet):
permission_classes = (IsAuthenticated, PaperlessObjectPermissions)
serializer_class = WorkflowSerializer
@@ -4276,7 +4165,7 @@ class WorkflowViewSet(ModelViewSet[Workflow]):
)
class CustomFieldViewSet(PermissionsAwareDocumentCountMixin, ModelViewSet[CustomField]):
class CustomFieldViewSet(PermissionsAwareDocumentCountMixin, ModelViewSet):
permission_classes = (IsAuthenticated, PaperlessObjectPermissions)
serializer_class = CustomFieldSerializer
@@ -4460,11 +4349,11 @@ class SystemStatusView(PassUserMixin):
last_trained_task = (
PaperlessTask.objects.filter(
task_type=PaperlessTask.TaskType.TRAIN_CLASSIFIER,
task_name=PaperlessTask.TaskName.TRAIN_CLASSIFIER,
status__in=[
PaperlessTask.Status.SUCCESS,
PaperlessTask.Status.FAILURE,
PaperlessTask.Status.REVOKED,
states.SUCCESS,
states.FAILURE,
states.REVOKED,
], # ignore running tasks
)
.order_by("-date_done")
@@ -4475,23 +4364,20 @@ class SystemStatusView(PassUserMixin):
if last_trained_task is None:
classifier_status = "WARNING"
classifier_error = "No classifier training tasks found"
elif (
last_trained_task
and last_trained_task.status != PaperlessTask.Status.SUCCESS
):
elif last_trained_task and last_trained_task.status != states.SUCCESS:
classifier_status = "ERROR"
classifier_error = last_trained_task.result_message
classifier_error = last_trained_task.result
classifier_last_trained = (
last_trained_task.date_done if last_trained_task else None
)
last_sanity_check = (
PaperlessTask.objects.filter(
task_type=PaperlessTask.TaskType.SANITY_CHECK,
task_name=PaperlessTask.TaskName.CHECK_SANITY,
status__in=[
PaperlessTask.Status.SUCCESS,
PaperlessTask.Status.FAILURE,
PaperlessTask.Status.REVOKED,
states.SUCCESS,
states.FAILURE,
states.REVOKED,
], # ignore running tasks
)
.order_by("-date_done")
@@ -4502,12 +4388,9 @@ class SystemStatusView(PassUserMixin):
if last_sanity_check is None:
sanity_check_status = "WARNING"
sanity_check_error = "No sanity check tasks found"
elif (
last_sanity_check
and last_sanity_check.status != PaperlessTask.Status.SUCCESS
):
elif last_sanity_check and last_sanity_check.status != states.SUCCESS:
sanity_check_status = "ERROR"
sanity_check_error = last_sanity_check.result_message
sanity_check_error = last_sanity_check.result
sanity_check_last_run = (
last_sanity_check.date_done if last_sanity_check else None
)
@@ -4520,7 +4403,7 @@ class SystemStatusView(PassUserMixin):
else:
last_llmindex_update = (
PaperlessTask.objects.filter(
task_type=PaperlessTask.TaskType.LLM_INDEX,
task_name=PaperlessTask.TaskName.LLMINDEX_UPDATE,
)
.order_by("-date_done")
.first()
@@ -4530,12 +4413,9 @@ class SystemStatusView(PassUserMixin):
if last_llmindex_update is None:
llmindex_status = "WARNING"
llmindex_error = "No LLM index update tasks found"
elif (
last_llmindex_update
and last_llmindex_update.status == PaperlessTask.Status.FAILURE
):
elif last_llmindex_update and last_llmindex_update.status == states.FAILURE:
llmindex_status = "ERROR"
llmindex_error = last_llmindex_update.result_message
llmindex_error = last_llmindex_update.result
llmindex_last_modified = (
last_llmindex_update.date_done if last_llmindex_update else None
)

File diff suppressed because it is too large

View File

@@ -74,7 +74,7 @@ class PaperlessAuthTokenSerializer(AuthTokenSerializer):
return attrs
class UserSerializer(PasswordValidationMixin, serializers.ModelSerializer[User]):
class UserSerializer(PasswordValidationMixin, serializers.ModelSerializer):
password = ObfuscatedPasswordField(required=False)
user_permissions = serializers.SlugRelatedField(
many=True,
@@ -142,7 +142,7 @@ class UserSerializer(PasswordValidationMixin, serializers.ModelSerializer[User])
return user
class GroupSerializer(serializers.ModelSerializer[Group]):
class GroupSerializer(serializers.ModelSerializer):
permissions = serializers.SlugRelatedField(
many=True,
queryset=Permission.objects.exclude(content_type__app_label="admin"),
@@ -158,7 +158,7 @@ class GroupSerializer(serializers.ModelSerializer[Group]):
)
class SocialAccountSerializer(serializers.ModelSerializer[SocialAccount]):
class SocialAccountSerializer(serializers.ModelSerializer):
name = serializers.SerializerMethodField()
class Meta:
@@ -176,7 +176,7 @@ class SocialAccountSerializer(serializers.ModelSerializer[SocialAccount]):
return "Unknown App"
class ProfileSerializer(PasswordValidationMixin, serializers.ModelSerializer[User]):
class ProfileSerializer(PasswordValidationMixin, serializers.ModelSerializer):
email = serializers.EmailField(allow_blank=True, required=False)
password = ObfuscatedPasswordField(required=False, allow_null=False)
auth_token = serializers.SlugRelatedField(read_only=True, slug_field="key")
@@ -209,9 +209,7 @@ class ProfileSerializer(PasswordValidationMixin, serializers.ModelSerializer[Use
)
class ApplicationConfigurationSerializer(
serializers.ModelSerializer[ApplicationConfiguration],
):
class ApplicationConfigurationSerializer(serializers.ModelSerializer):
user_args = serializers.JSONField(binary=True, allow_null=True)
barcode_tag_mapping = serializers.JSONField(binary=True, allow_null=True)
llm_api_key = ObfuscatedPasswordField(

View File

@@ -133,6 +133,7 @@ INSTALLED_APPS = [
"rest_framework",
"rest_framework.authtoken",
"django_filters",
"django_celery_results",
"guardian",
"allauth",
"allauth.account",
@@ -668,6 +669,8 @@ CELERY_BROKER_TRANSPORT_OPTIONS = {
CELERY_TASK_TRACK_STARTED = True
CELERY_TASK_TIME_LIMIT: Final[int] = get_int_from_env("PAPERLESS_WORKER_TIMEOUT", 1800)
CELERY_RESULT_EXTENDED = True
CELERY_RESULT_BACKEND = "django-db"
CELERY_CACHE_BACKEND = "default"
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#task-serializer

View File

@@ -181,10 +181,7 @@ def parse_beat_schedule() -> dict:
schedule[task["name"]] = {
"task": task["task"],
"schedule": crontab(minute, hour, day_week, day_month, month),
"options": {
**task["options"],
"headers": {"trigger_source": "scheduled"},
},
"options": task["options"],
}
return schedule

View File

@@ -186,66 +186,42 @@ def make_expected_schedule(
"Check all e-mail accounts": {
"task": "paperless_mail.tasks.process_mail_accounts",
"schedule": crontab(minute="*/10"),
"options": {
"expires": mail_expire,
"headers": {"trigger_source": "scheduled"},
},
"options": {"expires": mail_expire},
},
"Train the classifier": {
"task": "documents.tasks.train_classifier",
"schedule": crontab(minute="5", hour="*/1"),
"options": {
"expires": classifier_expire,
"headers": {"trigger_source": "scheduled"},
},
"options": {"expires": classifier_expire},
},
"Optimize the index": {
"task": "documents.tasks.index_optimize",
"schedule": crontab(minute=0, hour=0),
"options": {
"expires": index_expire,
"headers": {"trigger_source": "scheduled"},
},
"options": {"expires": index_expire},
},
"Perform sanity check": {
"task": "documents.tasks.sanity_check",
"schedule": crontab(minute=30, hour=0, day_of_week="sun"),
"options": {
"expires": sanity_expire,
"headers": {"trigger_source": "scheduled"},
},
"options": {"expires": sanity_expire},
},
"Empty trash": {
"task": "documents.tasks.empty_trash",
"schedule": crontab(minute=0, hour="1"),
"options": {
"expires": empty_trash_expire,
"headers": {"trigger_source": "scheduled"},
},
"options": {"expires": empty_trash_expire},
},
"Check and run scheduled workflows": {
"task": "documents.tasks.check_scheduled_workflows",
"schedule": crontab(minute="5", hour="*/1"),
"options": {
"expires": workflow_expire,
"headers": {"trigger_source": "scheduled"},
},
"options": {"expires": workflow_expire},
},
"Rebuild LLM index": {
"task": "documents.tasks.llmindex_index",
"schedule": crontab(minute="10", hour="2"),
"options": {
"expires": llm_index_expire,
"headers": {"trigger_source": "scheduled"},
},
"options": {"expires": llm_index_expire},
},
"Cleanup expired share link bundles": {
"task": "documents.tasks.cleanup_expired_share_link_bundles",
"schedule": crontab(minute=0, hour="2"),
"options": {
"expires": share_link_cleanup_expire,
"headers": {"trigger_source": "scheduled"},
},
"options": {"expires": share_link_cleanup_expire},
},
}
@@ -308,16 +284,6 @@ class TestParseBeatSchedule:
schedule = parse_beat_schedule()
assert schedule == expected
def test_parse_beat_schedule_all_entries_have_trigger_source_header(self) -> None:
"""Every beat entry must carry trigger_source=scheduled so the task signal
handler can identify scheduler-originated tasks."""
schedule = parse_beat_schedule()
for name, entry in schedule.items():
headers = entry.get("options", {}).get("headers", {})
assert headers.get("trigger_source") == "scheduled", (
f"Beat entry '{name}' is missing trigger_source header"
)
class TestParseDbSettings:
"""Test suite for parse_db_settings function."""

View File

@@ -1,6 +1,5 @@
from collections import OrderedDict
from pathlib import Path
from typing import Any
from allauth.mfa import signals
from allauth.mfa.adapter import get_adapter as get_mfa_adapter
@@ -115,7 +114,7 @@ class FaviconView(View):
return HttpResponseNotFound("favicon.ico not found")
class UserViewSet(ModelViewSet[User]):
class UserViewSet(ModelViewSet):
_BOOL_NOT_PROVIDED = object()
model = User
@@ -217,7 +216,7 @@ class UserViewSet(ModelViewSet[User]):
return HttpResponseNotFound("TOTP not found")
class GroupViewSet(ModelViewSet[Group]):
class GroupViewSet(ModelViewSet):
model = Group
queryset = Group.objects.order_by(Lower("name"))
@@ -230,7 +229,7 @@ class GroupViewSet(ModelViewSet[Group]):
ordering_fields = ("name",)
class ProfileView(GenericAPIView[Any]):
class ProfileView(GenericAPIView):
"""
User profile view, only available when logged in
"""
@@ -289,7 +288,7 @@ class ProfileView(GenericAPIView[Any]):
},
),
)
class TOTPView(GenericAPIView[Any]):
class TOTPView(GenericAPIView):
"""
TOTP views
"""
@@ -369,7 +368,7 @@ class TOTPView(GenericAPIView[Any]):
},
),
)
class GenerateAuthTokenView(GenericAPIView[Any]):
class GenerateAuthTokenView(GenericAPIView):
"""
Generates (or re-generates) an auth token, requires a logged in user
unlike the default DRF endpoint
@@ -398,7 +397,7 @@ class GenerateAuthTokenView(GenericAPIView[Any]):
},
),
)
class ApplicationConfigurationViewSet(ModelViewSet[ApplicationConfiguration]):
class ApplicationConfigurationViewSet(ModelViewSet):
model = ApplicationConfiguration
queryset = ApplicationConfiguration.objects
@@ -427,9 +426,10 @@ class ApplicationConfigurationViewSet(ModelViewSet[ApplicationConfiguration]):
and not vector_store_file_exists()
):
# AI index was just enabled and vector store file does not exist
llmindex_index.apply_async(
kwargs={"rebuild": True},
headers={"trigger_source": "system"},
llmindex_index.delay(
rebuild=True,
scheduled=False,
auto=True,
)
@@ -450,7 +450,7 @@ class ApplicationConfigurationViewSet(ModelViewSet[ApplicationConfiguration]):
},
),
)
class DisconnectSocialAccountView(GenericAPIView[Any]):
class DisconnectSocialAccountView(GenericAPIView):
"""
Disconnects a social account provider from the user account
"""
@@ -476,7 +476,7 @@ class DisconnectSocialAccountView(GenericAPIView[Any]):
},
),
)
class SocialAccountProvidersView(GenericAPIView[Any]):
class SocialAccountProvidersView(GenericAPIView):
"""
List of social account providers
"""

View File

@@ -4,6 +4,7 @@ from datetime import timedelta
from pathlib import Path
from typing import TYPE_CHECKING
from celery import states
from django.conf import settings
from django.utils import timezone
@@ -27,20 +28,17 @@ def queue_llm_index_update_if_needed(*, rebuild: bool, reason: str) -> bool:
from documents.tasks import llmindex_index
has_running = PaperlessTask.objects.filter(
task_type=PaperlessTask.TaskType.LLM_INDEX,
status__in=[PaperlessTask.Status.PENDING, PaperlessTask.Status.STARTED],
task_name=PaperlessTask.TaskName.LLMINDEX_UPDATE,
status__in=[states.PENDING, states.STARTED],
).exists()
has_recent = PaperlessTask.objects.filter(
task_type=PaperlessTask.TaskType.LLM_INDEX,
task_name=PaperlessTask.TaskName.LLMINDEX_UPDATE,
date_created__gte=(timezone.now() - timedelta(minutes=5)),
).exists()
if has_running or has_recent:
return False
llmindex_index.apply_async(
kwargs={"rebuild": rebuild},
headers={"trigger_source": "system"},
)
llmindex_index.delay(rebuild=rebuild, scheduled=False, auto=True)
logger.warning(
"Queued LLM index update%s: %s",
" (rebuild)" if rebuild else "",

View File

@@ -3,13 +3,13 @@ from unittest.mock import MagicMock
from unittest.mock import patch
import pytest
from celery import states
from django.test import override_settings
from django.utils import timezone
from llama_index.core.base.embeddings.base import BaseEmbedding
from documents.models import Document
from documents.models import PaperlessTask
from documents.tests.factories import PaperlessTaskFactory
from paperless_ai import indexing
@@ -292,15 +292,13 @@ def test_queue_llm_index_update_if_needed_enqueues_when_idle_or_skips_recent() -
)
assert result is True
mock_task.apply_async.assert_called_once_with(
kwargs={"rebuild": True},
headers={"trigger_source": "system"},
)
mock_task.delay.assert_called_once_with(rebuild=True, scheduled=False, auto=True)
PaperlessTaskFactory(
task_type=PaperlessTask.TaskType.LLM_INDEX,
trigger_source=PaperlessTask.TriggerSource.SYSTEM,
status=PaperlessTask.Status.STARTED,
PaperlessTask.objects.create(
task_id="task-1",
task_name=PaperlessTask.TaskName.LLMINDEX_UPDATE,
status=states.STARTED,
date_created=timezone.now(),
)
# Existing running task
@@ -311,7 +309,7 @@ def test_queue_llm_index_update_if_needed_enqueues_when_idle_or_skips_recent() -
)
assert result is False
mock_task.apply_async.assert_not_called()
mock_task.delay.assert_not_called()
@override_settings(

View File

@@ -57,7 +57,7 @@ class MailAccountSerializer(OwnedObjectSerializer):
return instance
class AccountField(serializers.PrimaryKeyRelatedField[MailAccount]):
class AccountField(serializers.PrimaryKeyRelatedField):
def get_queryset(self):
return MailAccount.objects.all().order_by("-id")

View File

@@ -1,7 +1,6 @@
import datetime
import logging
from datetime import timedelta
from typing import Any
from django.http import HttpResponseBadRequest
from django.http import HttpResponseForbidden
@@ -66,7 +65,7 @@ from paperless_mail.tasks import process_mail_accounts
},
),
)
class MailAccountViewSet(PassUserMixin, ModelViewSet[MailAccount]):
class MailAccountViewSet(ModelViewSet, PassUserMixin):
model = MailAccount
queryset = MailAccount.objects.all().order_by("pk")
@@ -160,7 +159,7 @@ class MailAccountViewSet(PassUserMixin, ModelViewSet[MailAccount]):
return Response({"result": "OK"})
class ProcessedMailViewSet(PassUserMixin, ReadOnlyModelViewSet[ProcessedMail]):
class ProcessedMailViewSet(ReadOnlyModelViewSet, PassUserMixin):
permission_classes = (IsAuthenticated, PaperlessObjectPermissions)
serializer_class = ProcessedMailSerializer
pagination_class = StandardPagination
@@ -188,7 +187,7 @@ class ProcessedMailViewSet(PassUserMixin, ReadOnlyModelViewSet[ProcessedMail]):
return Response({"result": "OK", "deleted_mail_ids": mail_ids})
class MailRuleViewSet(PassUserMixin, ModelViewSet[MailRule]):
class MailRuleViewSet(ModelViewSet, PassUserMixin):
model = MailRule
queryset = MailRule.objects.all().order_by("order")
@@ -204,7 +203,7 @@ class MailRuleViewSet(PassUserMixin, ModelViewSet[MailRule]):
responses={200: None},
),
)
class OauthCallbackView(GenericAPIView[Any]):
class OauthCallbackView(GenericAPIView):
permission_classes = (IsAuthenticated,)
def get(self, request, format=None):

test_backend_profile.py Normal file
View File

@@ -0,0 +1,346 @@
# ruff: noqa: T201
"""
cProfile-based search pipeline profiling with a 20k-document dataset.
Run with:
uv run pytest ../test_backend_profile.py \
-m profiling --override-ini="addopts=" -s -v
Each scenario prints:
- Wall time for the operation
- cProfile stats sorted by cumulative time (top 25 callers)
This is a developer tool, not a correctness test. Nothing here should
fail unless the code is broken.
"""
from __future__ import annotations
import random
import time
from typing import TYPE_CHECKING
import pytest
from profiling import profile_cpu
from documents.models import Document
from documents.search._backend import TantivyBackend
from documents.search._backend import reset_backend
if TYPE_CHECKING:
from pathlib import Path
# transaction=False (default): tests roll back, but the module-scoped fixture
# commits its data outside the test transaction so it remains visible throughout.
pytestmark = [pytest.mark.profiling, pytest.mark.django_db]
# ---------------------------------------------------------------------------
# Dataset constants
# ---------------------------------------------------------------------------
NUM_DOCS = 20_000
SEED = 42
# Terms and their approximate match rates across the corpus.
# "rechnung" -> ~70% of docs (~14 000)
# "mahnung" -> ~20% of docs (~4 000)
# "kontonummer" -> ~5% of docs (~1 000)
# "rarewort" -> ~1% of docs (~200)
COMMON_TERM = "rechnung"
MEDIUM_TERM = "mahnung"
RARE_TERM = "kontonummer"
VERY_RARE_TERM = "rarewort"
PAGE_SIZE = 25
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
_FILLER_WORDS = [
"dokument", # codespell:ignore
"seite",
"datum",
"betrag",
"nummer",
"konto",
"firma",
"vertrag",
"lieferant",
"bestellung",
"steuer",
"mwst",
"leistung",
"auftrag",
"zahlung",
]
def _build_content(rng: random.Random) -> str:
"""Return a short paragraph with terms embedded at the desired rates."""
words = rng.choices(_FILLER_WORDS, k=15)
if rng.random() < 0.70:
words.append(COMMON_TERM)
if rng.random() < 0.20:
words.append(MEDIUM_TERM)
if rng.random() < 0.05:
words.append(RARE_TERM)
if rng.random() < 0.01:
words.append(VERY_RARE_TERM)
rng.shuffle(words)
return " ".join(words)
def _time(fn, *, label: str, runs: int = 3):
"""Run *fn()* several times and report min/avg/max wall time (no cProfile)."""
times = []
result = None
for _ in range(runs):
t0 = time.perf_counter()
result = fn()
times.append(time.perf_counter() - t0)
mn, avg, mx = min(times), sum(times) / len(times), max(times)
print(
f" {label}: min={mn * 1000:.1f}ms avg={avg * 1000:.1f}ms max={mx * 1000:.1f}ms (n={runs})",
)
return result
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
@pytest.fixture(scope="module")
def module_db(django_db_setup, django_db_blocker):
"""Unlock the DB for the whole module (module-scoped)."""
with django_db_blocker.unblock():
yield
@pytest.fixture(scope="module")
def large_backend(tmp_path_factory, module_db) -> TantivyBackend:
"""
Build a 20 000-document DB + on-disk Tantivy index, shared across all
profiling scenarios in this module. Teardown deletes all documents.
"""
index_path: Path = tmp_path_factory.mktemp("tantivy_profile")
# ---- 1. Bulk-create Document rows ----------------------------------------
rng = random.Random(SEED)
docs = [
Document(
title=f"Document {i:05d}",
content=_build_content(rng),
checksum=f"{i:064x}",
pk=i + 1,
)
for i in range(NUM_DOCS)
]
t0 = time.perf_counter()
Document.objects.bulk_create(docs, batch_size=1_000)
db_time = time.perf_counter() - t0
print(f"\n[setup] bulk_create {NUM_DOCS} docs: {db_time:.2f}s")
# ---- 2. Build Tantivy index -----------------------------------------------
backend = TantivyBackend(path=index_path)
backend.open()
t0 = time.perf_counter()
with backend.batch_update() as batch:
for doc in Document.objects.iterator(chunk_size=500):
batch.add_or_update(doc)
idx_time = time.perf_counter() - t0
print(f"[setup] index {NUM_DOCS} docs: {idx_time:.2f}s")
# ---- 3. Report corpus stats -----------------------------------------------
for term in (COMMON_TERM, MEDIUM_TERM, RARE_TERM, VERY_RARE_TERM):
count = len(backend.search_ids(term, user=None))
print(f"[setup] '{term}' -> {count} hits")
yield backend
# ---- Teardown ------------------------------------------------------------
backend.close()
reset_backend()
Document.objects.all().delete()
# ---------------------------------------------------------------------------
# Profiling tests — each scenario is a separate function so pytest can run
# them individually or all together with -m profiling.
# ---------------------------------------------------------------------------
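# Example (illustrative; test IDs taken from this file, flags from the module docstring):
#   uv run pytest ../test_backend_profile.py::TestSearchIdsProfile::test_search_ids_large \
#       -m profiling --override-ini="addopts=" -s -v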
class TestSearchIdsProfile:
"""Profile backend.search_ids() — pure Tantivy, no DB."""
def test_search_ids_large(self, large_backend: TantivyBackend):
"""~14 000 hits: how long does Tantivy take to collect all IDs?"""
profile_cpu(
lambda: large_backend.search_ids(COMMON_TERM, user=None),
label=f"search_ids('{COMMON_TERM}') [large result set ~14k]",
)
def test_search_ids_medium(self, large_backend: TantivyBackend):
"""~4 000 hits."""
profile_cpu(
lambda: large_backend.search_ids(MEDIUM_TERM, user=None),
label=f"search_ids('{MEDIUM_TERM}') [medium result set ~4k]",
)
def test_search_ids_rare(self, large_backend: TantivyBackend):
"""~1 000 hits."""
profile_cpu(
lambda: large_backend.search_ids(RARE_TERM, user=None),
label=f"search_ids('{RARE_TERM}') [rare result set ~1k]",
)
class TestIntersectAndOrderProfile:
"""
Profile the DB intersection step: filter(pk__in=search_ids).
This is the 'intersect_and_order' logic from views.py.
"""
def test_intersect_large(self, large_backend: TantivyBackend):
"""Intersect 14k Tantivy IDs with all 20k ORM-visible docs."""
all_ids = large_backend.search_ids(COMMON_TERM, user=None)
qs = Document.objects.all()
print(f"\n Tantivy returned {len(all_ids)} IDs")
profile_cpu(
lambda: list(qs.filter(pk__in=all_ids).values_list("pk", flat=True)),
label=f"filter(pk__in={len(all_ids)} ids) [large, use_tantivy_sort=True path]",
)
# Also time it a few times to get stable numbers
print()
_time(
lambda: list(qs.filter(pk__in=all_ids).values_list("pk", flat=True)),
label=f"filter(pk__in={len(all_ids)}) repeated",
)
def test_intersect_rare(self, large_backend: TantivyBackend):
"""Intersect ~1k Tantivy IDs — the happy path."""
all_ids = large_backend.search_ids(RARE_TERM, user=None)
qs = Document.objects.all()
print(f"\n Tantivy returned {len(all_ids)} IDs")
profile_cpu(
lambda: list(qs.filter(pk__in=all_ids).values_list("pk", flat=True)),
label=f"filter(pk__in={len(all_ids)} ids) [rare, use_tantivy_sort=True path]",
)
class TestHighlightHitsProfile:
"""Profile backend.highlight_hits() — per-doc Tantivy lookups with BM25 scoring."""
def test_highlight_page1(self, large_backend: TantivyBackend):
"""25-doc highlight for page 1 (rank_start=1)."""
all_ids = large_backend.search_ids(COMMON_TERM, user=None)
page_ids = all_ids[:PAGE_SIZE]
profile_cpu(
lambda: large_backend.highlight_hits(
COMMON_TERM,
page_ids,
rank_start=1,
),
label=f"highlight_hits page 1 (ids {all_ids[0]}..{all_ids[PAGE_SIZE - 1]})",
)
def test_highlight_page_middle(self, large_backend: TantivyBackend):
"""25-doc highlight for a mid-corpus page (rank_start=page_offset+1)."""
all_ids = large_backend.search_ids(COMMON_TERM, user=None)
mid = len(all_ids) // 2
page_ids = all_ids[mid : mid + PAGE_SIZE]
page_offset = mid
profile_cpu(
lambda: large_backend.highlight_hits(
COMMON_TERM,
page_ids,
rank_start=page_offset + 1,
),
label=f"highlight_hits page ~{mid // PAGE_SIZE} (offset {page_offset})",
)
def test_highlight_repeated(self, large_backend: TantivyBackend):
"""Multiple runs of page-1 highlight to see variance."""
all_ids = large_backend.search_ids(COMMON_TERM, user=None)
page_ids = all_ids[:PAGE_SIZE]
print()
_time(
lambda: large_backend.highlight_hits(COMMON_TERM, page_ids, rank_start=1),
label="highlight_hits page 1",
runs=5,
)
class TestFullPipelineProfile:
"""
Profile the combined pipeline as it runs in views.py:
search_ids -> filter(pk__in) -> highlight_hits
"""
def _run_pipeline(
self,
backend: TantivyBackend,
term: str,
page: int = 1,
):
all_ids = backend.search_ids(term, user=None)
qs = Document.objects.all()
visible_ids = set(qs.filter(pk__in=all_ids).values_list("pk", flat=True))
ordered_ids = [i for i in all_ids if i in visible_ids]
page_offset = (page - 1) * PAGE_SIZE
page_ids = ordered_ids[page_offset : page_offset + PAGE_SIZE]
hits = backend.highlight_hits(
term,
page_ids,
rank_start=page_offset + 1,
)
return ordered_ids, hits
def test_pipeline_large_page1(self, large_backend: TantivyBackend):
"""Full pipeline: large result set, page 1."""
ordered_ids, hits = profile_cpu(
lambda: self._run_pipeline(large_backend, COMMON_TERM, page=1),
label=f"full pipeline '{COMMON_TERM}' page 1",
)[0]
print(f" -> {len(ordered_ids)} total results, {len(hits)} hits on page")
def test_pipeline_large_page5(self, large_backend: TantivyBackend):
"""Full pipeline: large result set, page 5."""
ordered_ids, hits = profile_cpu(
lambda: self._run_pipeline(large_backend, COMMON_TERM, page=5),
label=f"full pipeline '{COMMON_TERM}' page 5",
)[0]
print(f" -> {len(ordered_ids)} total results, {len(hits)} hits on page")
def test_pipeline_rare(self, large_backend: TantivyBackend):
"""Full pipeline: rare term, page 1 (fast path)."""
ordered_ids, hits = profile_cpu(
lambda: self._run_pipeline(large_backend, RARE_TERM, page=1),
label=f"full pipeline '{RARE_TERM}' page 1",
)[0]
print(f" -> {len(ordered_ids)} total results, {len(hits)} hits on page")
def test_pipeline_repeated(self, large_backend: TantivyBackend):
"""Repeated runs to get stable timing (no cProfile overhead)."""
print()
for term, label in [
(COMMON_TERM, f"'{COMMON_TERM}' (large)"),
(MEDIUM_TERM, f"'{MEDIUM_TERM}' (medium)"),
(RARE_TERM, f"'{RARE_TERM}' (rare)"),
]:
_time(
lambda t=term: self._run_pipeline(large_backend, t, page=1),
label=f"full pipeline {label} page 1",
runs=3,
)

605
test_classifier_profile.py Normal file
View File

@@ -0,0 +1,605 @@
# ruff: noqa: T201
"""
cProfile + tracemalloc classifier profiling test.
Run with:
uv run pytest ../test_classifier_profile.py \
-m profiling --override-ini="addopts=" -s -v
Corpus: 5 000 documents, 40 correspondents (25 AUTO), 25 doc types (15 AUTO),
50 tags (30 AUTO), 20 storage paths (12 AUTO).
Document content is generated with Faker for realistic base text, with a
per-label fingerprint injected so the MLP has a real learning signal.
Scenarios:
- train() full corpus — memory and CPU profiles
- second train() no-op path — shows cost of the skip check
- save()/load() round-trip — model file size and memory cost
- _update_data_vectorizer_hash() isolated hash overhead
- predict_*() four independent calls per document — the 4x redundant
vectorization path used by the signal handlers
- _vectorize() cache-miss vs cache-hit breakdown
Memory: tracemalloc (delta + peak + top-20 allocation sites).
CPU: cProfile sorted by cumulative time (top 30).
"""
from __future__ import annotations
import random
import time
from typing import TYPE_CHECKING
import pytest
from django.test import override_settings
from faker import Faker
from profiling import measure_memory
from profiling import profile_cpu
from documents.classifier import DocumentClassifier
from documents.models import Correspondent
from documents.models import Document
from documents.models import DocumentType
from documents.models import MatchingModel
from documents.models import StoragePath
from documents.models import Tag
if TYPE_CHECKING:
from pathlib import Path
pytestmark = [pytest.mark.profiling, pytest.mark.django_db]
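# NOTE: ``measure_memory`` is the shared helper imported above; it is not defined
# in this file. Its call sites (``result, peak_kib, delta_kib = measure_memory(fn,
# label=...)``) and the docstring ("tracemalloc delta + peak + top-20 allocation
# sites") suggest roughly the hypothetical sketch below; the name and the exact
# printed format are assumptions.
def _measure_memory_sketch(fn, *, label: str, top: int = 20):
    import tracemalloc

    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    result = fn()
    current, peak = tracemalloc.get_traced_memory()
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    peak_kib = peak / 1024
    delta_kib = (current - before) / 1024
    print(f"\n=== {label}: peak {peak_kib:.1f} KiB, delta {delta_kib:+.1f} KiB ===")
    for stat in snapshot.statistics("lineno")[:top]:
        print(f"  {stat}")
    return result, peak_kib, delta_kib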
# ---------------------------------------------------------------------------
# Corpus parameters
# ---------------------------------------------------------------------------
NUM_DOCS = 5_000
NUM_CORRESPONDENTS = 40 # first 25 are MATCH_AUTO
NUM_DOC_TYPES = 25 # first 15 are MATCH_AUTO
NUM_TAGS = 50 # first 30 are MATCH_AUTO
NUM_STORAGE_PATHS = 20 # first 12 are MATCH_AUTO
NUM_AUTO_CORRESPONDENTS = 25
NUM_AUTO_DOC_TYPES = 15
NUM_AUTO_TAGS = 30
NUM_AUTO_STORAGE_PATHS = 12
SEED = 42
# ---------------------------------------------------------------------------
# Content generation
# ---------------------------------------------------------------------------
def _make_label_fingerprint(
fake: Faker,
label_seed: int,
n_words: int = 6,
) -> list[str]:
"""
Generate a small set of unique-looking words to use as the learning
fingerprint for a label. Each label gets its own seeded Faker so the
fingerprints are distinct and reproducible.
"""
per_label_fake = Faker()
per_label_fake.seed_instance(label_seed)
# Draw distinct word() tokens so the fingerprint is varied and pronounceable
words: list[str] = []
while len(words) < n_words:
w = per_label_fake.word().lower()
if w not in words:
words.append(w)
return words
def _build_fingerprints(
num_correspondents: int,
num_doc_types: int,
num_tags: int,
num_paths: int,
) -> tuple[list[list[str]], list[list[str]], list[list[str]], list[list[str]]]:
"""Pre-generate per-label fingerprints. Expensive once, free to reuse."""
fake = Faker()
# Use deterministic seeds offset by type so fingerprints don't collide
corr_fps = [
_make_label_fingerprint(fake, 1_000 + i) for i in range(num_correspondents)
]
dtype_fps = [_make_label_fingerprint(fake, 2_000 + i) for i in range(num_doc_types)]
tag_fps = [_make_label_fingerprint(fake, 3_000 + i) for i in range(num_tags)]
path_fps = [_make_label_fingerprint(fake, 4_000 + i) for i in range(num_paths)]
return corr_fps, dtype_fps, tag_fps, path_fps
def _build_content(
fake: Faker,
corr_fp: list[str] | None,
dtype_fp: list[str] | None,
tag_fps: list[list[str]],
path_fp: list[str] | None,
) -> str:
"""
Combine a Faker paragraph (realistic base text) with per-label
fingerprint words so the classifier has a genuine learning signal.
"""
# 3-sentence paragraph provides realistic vocabulary
base = fake.paragraph(nb_sentences=3)
extras: list[str] = []
if corr_fp:
extras.extend(corr_fp)
if dtype_fp:
extras.extend(dtype_fp)
for fp in tag_fps:
extras.extend(fp)
if path_fp:
extras.extend(path_fp)
if extras:
return base + " " + " ".join(extras)
return base
# ---------------------------------------------------------------------------
# Module-scoped corpus fixture
# ---------------------------------------------------------------------------
@pytest.fixture(scope="module")
def module_db(django_db_setup, django_db_blocker):
"""Unlock the DB for the whole module (module-scoped)."""
with django_db_blocker.unblock():
yield
@pytest.fixture(scope="module")
def classifier_corpus(tmp_path_factory, module_db):
"""
Build the full 5 000-document corpus once for all profiling tests.
Label objects are created individually (small number), documents are
bulk-inserted, and tag M2M rows go through the through-table.
Yields a dict with the model path and a sample content string for
prediction tests. All rows are deleted on teardown.
"""
model_path: Path = tmp_path_factory.mktemp("cls_profile") / "model.pickle"
with override_settings(MODEL_FILE=model_path):
fake = Faker()
Faker.seed(SEED)
rng = random.Random(SEED)
# Pre-generate fingerprints for all labels
print("\n[setup] Generating label fingerprints...")
corr_fps, dtype_fps, tag_fps, path_fps = _build_fingerprints(
NUM_CORRESPONDENTS,
NUM_DOC_TYPES,
NUM_TAGS,
NUM_STORAGE_PATHS,
)
# -----------------------------------------------------------------
# 1. Create label objects
# -----------------------------------------------------------------
print(f"[setup] Creating {NUM_CORRESPONDENTS} correspondents...")
correspondents: list[Correspondent] = []
for i in range(NUM_CORRESPONDENTS):
algo = (
MatchingModel.MATCH_AUTO
if i < NUM_AUTO_CORRESPONDENTS
else MatchingModel.MATCH_NONE
)
correspondents.append(
Correspondent.objects.create(
name=fake.company(),
matching_algorithm=algo,
),
)
print(f"[setup] Creating {NUM_DOC_TYPES} document types...")
doc_types: list[DocumentType] = []
for i in range(NUM_DOC_TYPES):
algo = (
MatchingModel.MATCH_AUTO
if i < NUM_AUTO_DOC_TYPES
else MatchingModel.MATCH_NONE
)
doc_types.append(
DocumentType.objects.create(
name=fake.bs()[:64],
matching_algorithm=algo,
),
)
print(f"[setup] Creating {NUM_TAGS} tags...")
tags: list[Tag] = []
for i in range(NUM_TAGS):
algo = (
MatchingModel.MATCH_AUTO
if i < NUM_AUTO_TAGS
else MatchingModel.MATCH_NONE
)
tags.append(
Tag.objects.create(
name=f"{fake.word()} {i}",
matching_algorithm=algo,
is_inbox_tag=False,
),
)
print(f"[setup] Creating {NUM_STORAGE_PATHS} storage paths...")
storage_paths: list[StoragePath] = []
for i in range(NUM_STORAGE_PATHS):
algo = (
MatchingModel.MATCH_AUTO
if i < NUM_AUTO_STORAGE_PATHS
else MatchingModel.MATCH_NONE
)
storage_paths.append(
StoragePath.objects.create(
name=fake.word(),
path=f"{fake.word()}/{fake.word()}/{{title}}",
matching_algorithm=algo,
),
)
# -----------------------------------------------------------------
# 2. Build document rows and M2M assignments
# -----------------------------------------------------------------
print(f"[setup] Building {NUM_DOCS} document rows...")
doc_rows: list[Document] = []
doc_tag_map: list[tuple[int, int]] = [] # (doc_position, tag_index)
for i in range(NUM_DOCS):
corr_idx = (
rng.randrange(NUM_CORRESPONDENTS) if rng.random() < 0.80 else None
)
dt_idx = rng.randrange(NUM_DOC_TYPES) if rng.random() < 0.80 else None
sp_idx = rng.randrange(NUM_STORAGE_PATHS) if rng.random() < 0.70 else None
# 1-4 tags; most documents get at least one
n_tags = rng.randint(1, 4) if rng.random() < 0.85 else 0
assigned_tag_indices = rng.sample(range(NUM_TAGS), min(n_tags, NUM_TAGS))
content = _build_content(
fake,
corr_fp=corr_fps[corr_idx] if corr_idx is not None else None,
dtype_fp=dtype_fps[dt_idx] if dt_idx is not None else None,
tag_fps=[tag_fps[ti] for ti in assigned_tag_indices],
path_fp=path_fps[sp_idx] if sp_idx is not None else None,
)
doc_rows.append(
Document(
title=fake.sentence(nb_words=5),
content=content,
checksum=f"{i:064x}",
correspondent=correspondents[corr_idx]
if corr_idx is not None
else None,
document_type=doc_types[dt_idx] if dt_idx is not None else None,
storage_path=storage_paths[sp_idx] if sp_idx is not None else None,
),
)
for ti in assigned_tag_indices:
doc_tag_map.append((i, ti))
t0 = time.perf_counter()
Document.objects.bulk_create(doc_rows, batch_size=500)
print(
f"[setup] bulk_create {NUM_DOCS} documents: {time.perf_counter() - t0:.2f}s",
)
# -----------------------------------------------------------------
# 3. Bulk-create M2M through-table rows
# -----------------------------------------------------------------
created_docs = list(Document.objects.order_by("pk"))
through_rows = [
Document.tags.through(
document_id=created_docs[pos].pk,
tag_id=tags[ti].pk,
)
for pos, ti in doc_tag_map
if pos < len(created_docs)
]
t0 = time.perf_counter()
Document.tags.through.objects.bulk_create(
through_rows,
batch_size=1_000,
ignore_conflicts=True,
)
print(
f"[setup] bulk_create {len(through_rows)} tag M2M rows: "
f"{time.perf_counter() - t0:.2f}s",
)
# Sample content for prediction tests
sample_content = _build_content(
fake,
corr_fp=corr_fps[0],
dtype_fp=dtype_fps[0],
tag_fps=[tag_fps[0], tag_fps[1], tag_fps[5]],
path_fp=path_fps[0],
)
yield {
"model_path": model_path,
"sample_content": sample_content,
}
# Teardown
print("\n[teardown] Removing corpus...")
Document.objects.all().delete()
Correspondent.objects.all().delete()
DocumentType.objects.all().delete()
Tag.objects.all().delete()
StoragePath.objects.all().delete()
# ---------------------------------------------------------------------------
# Training profiles
# ---------------------------------------------------------------------------
class TestClassifierTrainingProfile:
"""Profile DocumentClassifier.train() on the full corpus."""
def test_train_memory(self, classifier_corpus, tmp_path):
"""
Peak memory allocated during train().
tracemalloc reports the delta and top allocation sites.
"""
model_path = tmp_path / "model.pickle"
with override_settings(MODEL_FILE=model_path):
classifier = DocumentClassifier()
result, _, _ = measure_memory(
classifier.train,
label=(
f"train() [{NUM_DOCS} docs | "
f"{NUM_CORRESPONDENTS} correspondents ({NUM_AUTO_CORRESPONDENTS} AUTO) | "
f"{NUM_DOC_TYPES} doc types ({NUM_AUTO_DOC_TYPES} AUTO) | "
f"{NUM_TAGS} tags ({NUM_AUTO_TAGS} AUTO) | "
f"{NUM_STORAGE_PATHS} paths ({NUM_AUTO_STORAGE_PATHS} AUTO)]"
),
)
assert result is True, "train() must return True on first run"
print("\n Classifiers trained:")
print(
f" tags_classifier: {classifier.tags_classifier is not None}",
)
print(
f" correspondent_classifier: {classifier.correspondent_classifier is not None}",
)
print(
f" document_type_classifier: {classifier.document_type_classifier is not None}",
)
print(
f" storage_path_classifier: {classifier.storage_path_classifier is not None}",
)
if classifier.data_vectorizer is not None:
vocab_size = len(classifier.data_vectorizer.vocabulary_)
print(f" vocabulary size: {vocab_size} terms")
def test_train_cpu(self, classifier_corpus, tmp_path):
"""
CPU profile of train() — shows time spent in DB queries,
CountVectorizer.fit_transform(), and four MLPClassifier.fit() calls.
"""
model_path = tmp_path / "model_cpu.pickle"
with override_settings(MODEL_FILE=model_path):
classifier = DocumentClassifier()
profile_cpu(
classifier.train,
label=f"train() [{NUM_DOCS} docs]",
top=30,
)
def test_train_second_call_noop(self, classifier_corpus, tmp_path):
"""
No-op path: second train() on unchanged data should return False.
Still queries the DB to build the hash — shown here as the remaining cost.
"""
model_path = tmp_path / "model_noop.pickle"
with override_settings(MODEL_FILE=model_path):
classifier = DocumentClassifier()
t0 = time.perf_counter()
classifier.train()
first_ms = (time.perf_counter() - t0) * 1000
result, second_elapsed = profile_cpu(
classifier.train,
label="train() second call (no-op — same data unchanged)",
top=20,
)
assert result is False, "second train() should skip and return False"
print(f"\n First train: {first_ms:.1f} ms (full fit)")
print(f" Second train: {second_elapsed * 1000:.1f} ms (skip)")
print(f" Speedup: {first_ms / (second_elapsed * 1000):.1f}x")
def test_vectorizer_hash_cost(self, classifier_corpus, tmp_path):
"""
Isolate _update_data_vectorizer_hash() — pickles the entire
CountVectorizer just to SHA256 it. Called at both save and load.
"""
import pickle
model_path = tmp_path / "model_hash.pickle"
with override_settings(MODEL_FILE=model_path):
classifier = DocumentClassifier()
classifier.train()
profile_cpu(
classifier._update_data_vectorizer_hash,
label="_update_data_vectorizer_hash() [pickle.dumps vectorizer + sha256]",
top=10,
)
pickled_size = len(pickle.dumps(classifier.data_vectorizer))
vocab_size = len(classifier.data_vectorizer.vocabulary_)
print(f"\n Vocabulary size: {vocab_size} terms")
print(f" Pickled vectorizer: {pickled_size / 1024:.1f} KiB")
def test_save_load_roundtrip(self, classifier_corpus, tmp_path):
"""
Profile save() and load() — the model file size directly reflects how
much space the classifier occupies on disk (and roughly how much RAM it
needs once loaded).
"""
model_path = tmp_path / "model_saveload.pickle"
with override_settings(MODEL_FILE=model_path):
classifier = DocumentClassifier()
classifier.train()
_, save_peak, _ = measure_memory(
classifier.save,
label="save() [pickle.dumps + HMAC + atomic rename]",
)
file_size_kib = model_path.stat().st_size / 1024
print(f"\n Model file size: {file_size_kib:.1f} KiB")
classifier2 = DocumentClassifier()
_, load_peak, _ = measure_memory(
classifier2.load,
label="load() [read file + verify HMAC + pickle.loads]",
)
print("\n Summary:")
print(f" Model file size: {file_size_kib:.1f} KiB")
print(f" Save peak memory: {save_peak:.1f} KiB")
print(f" Load peak memory: {load_peak:.1f} KiB")
# ---------------------------------------------------------------------------
# Prediction profiles
# ---------------------------------------------------------------------------
class TestClassifierPredictionProfile:
"""
Profile the four predict_*() methods — specifically the redundant
per-call vectorization overhead from the signal handler pattern.
"""
@pytest.fixture(autouse=True)
def trained_classifier(self, classifier_corpus, tmp_path):
model_path = tmp_path / "model_pred.pickle"
self._ctx = override_settings(MODEL_FILE=model_path)
self._ctx.enable()
self.classifier = DocumentClassifier()
self.classifier.train()
self.content = classifier_corpus["sample_content"]
yield
self._ctx.disable()
def test_predict_all_four_separately_cpu(self):
"""
Profile all four predict_*() calls in the order the signal handlers
fire them. Call 1 is a cache miss; calls 2-4 hit the locmem cache
but still pay sha256 + pickle.loads each time.
"""
from django.core.cache import caches
caches["read-cache"].clear()
content = self.content
print(f"\n Content length: {len(content)} chars")
calls = [
("predict_correspondent", self.classifier.predict_correspondent),
("predict_document_type", self.classifier.predict_document_type),
("predict_tags", self.classifier.predict_tags),
("predict_storage_path", self.classifier.predict_storage_path),
]
timings: list[tuple[str, float]] = []
for name, fn in calls:
_, elapsed = profile_cpu(
lambda f=fn: f(content),
label=f"{name}() [call {len(timings) + 1}/4]",
top=15,
)
timings.append((name, elapsed * 1000))
print("\n Per-call timings (sequential, locmem cache):")
for name, ms in timings:
print(f" {name:<32s} {ms:8.3f} ms")
print(f" {'TOTAL':<32s} {sum(t for _, t in timings):8.3f} ms")
def test_predict_all_four_memory(self):
"""
Memory allocated for the full four-prediction sequence, both cold
and warm, to show pickle serialization allocation per call.
"""
from django.core.cache import caches
content = self.content
calls = [
self.classifier.predict_correspondent,
self.classifier.predict_document_type,
self.classifier.predict_tags,
self.classifier.predict_storage_path,
]
caches["read-cache"].clear()
measure_memory(
lambda: [fn(content) for fn in calls],
label="all four predict_*() [cache COLD — first call misses]",
)
measure_memory(
lambda: [fn(content) for fn in calls],
label="all four predict_*() [cache WARM — all calls hit]",
)
def test_vectorize_cache_miss_vs_hit(self):
"""
Isolate the cost of a cache miss (sha256 + transform + pickle.dumps)
vs a cache hit (sha256 + pickle.loads).
"""
from django.core.cache import caches
read_cache = caches["read-cache"]
content = self.content
read_cache.clear()
_, miss_elapsed = profile_cpu(
lambda: self.classifier._vectorize(content),
label="_vectorize() [MISS: sha256 + transform + pickle.dumps]",
top=15,
)
_, hit_elapsed = profile_cpu(
lambda: self.classifier._vectorize(content),
label="_vectorize() [HIT: sha256 + pickle.loads]",
top=15,
)
print(f"\n Cache miss: {miss_elapsed * 1000:.3f} ms")
print(f" Cache hit: {hit_elapsed * 1000:.3f} ms")
print(f" Hit is {miss_elapsed / hit_elapsed:.1f}x faster than miss")
def test_content_hash_overhead(self):
"""
Micro-benchmark the sha256 of the content string — paid on every
_vectorize() call regardless of cache state, i.e. four times per document.
"""
import hashlib
content = self.content
encoded = content.encode()
runs = 5_000
t0 = time.perf_counter()
for _ in range(runs):
hashlib.sha256(encoded).hexdigest()
us_per_call = (time.perf_counter() - t0) / runs * 1_000_000
print(f"\n Content: {len(content)} chars / {len(encoded)} bytes")
print(f" sha256 cost per call: {us_per_call:.2f} us (avg over {runs} runs)")
print(f" x4 calls per document: {us_per_call * 4:.2f} us total overhead")

293
test_doclist_profile.py Normal file
View File

@@ -0,0 +1,293 @@
"""
Document list API profiling — no search, pure ORM path.
Run with:
uv run pytest ../test_doclist_profile.py \
-m profiling --override-ini="addopts=" -s -v
Corpus: 5 000 documents, 30 correspondents, 20 doc types, 80 tags,
~500 notes (10 %), 10 custom fields with instances on ~50 % of docs.
Scenarios
---------
TestDocListProfile
- test_list_default_ordering GET /api/documents/ created desc, page 1, page_size=25
- test_list_title_ordering same with ordering=title
- test_list_page_size_comparison page_size=10 / 25 / 100 in sequence
- test_list_detail_fields GET /api/documents/{id}/ — single document serializer cost
- test_list_cpu_profile cProfile of one list request
TestSelectionDataProfile
- test_selection_data_unfiltered _get_selection_data_for_queryset(all docs) in isolation
- test_selection_data_via_api GET /api/documents/?include_selection_data=true
- test_selection_data_filtered filtered vs unfiltered COUNT query comparison
"""
from __future__ import annotations
import datetime
import random
import time
import pytest
from django.contrib.auth.models import User
from faker import Faker
from profiling import profile_block
from profiling import profile_cpu
from rest_framework.test import APIClient
from documents.models import Correspondent
from documents.models import CustomField
from documents.models import CustomFieldInstance
from documents.models import Document
from documents.models import DocumentType
from documents.models import Note
from documents.models import Tag
from documents.views import DocumentViewSet
pytestmark = [pytest.mark.profiling, pytest.mark.django_db]
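# NOTE: ``profile_block`` is the shared helper imported above; it is not defined
# here. It is used as a context manager and, per the sanity-checker module below,
# reports query count, query time, and wall time in one summary. A minimal sketch
# under those assumptions (hypothetical name, made-up output format):
from contextlib import contextmanager


@contextmanager
def _profile_block_sketch(label: str):
    from django.db import connection
    from django.test.utils import CaptureQueriesContext

    t0 = time.perf_counter()
    with CaptureQueriesContext(connection) as ctx:
        yield
    wall_ms = (time.perf_counter() - t0) * 1000
    sql_ms = sum(float(q["time"]) for q in ctx.captured_queries) * 1000
    print(
        f"\n=== {label}: {wall_ms:.1f} ms wall, {len(ctx)} queries, "
        f"{sql_ms:.1f} ms in SQL ===",
    )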
# ---------------------------------------------------------------------------
# Corpus parameters
# ---------------------------------------------------------------------------
NUM_DOCS = 5_000
NUM_CORRESPONDENTS = 30
NUM_DOC_TYPES = 20
NUM_TAGS = 80
NOTE_FRACTION = 0.10
CUSTOM_FIELD_COUNT = 10
CUSTOM_FIELD_FRACTION = 0.50
PAGE_SIZE = 25
SEED = 42
# ---------------------------------------------------------------------------
# Module-scoped corpus fixture
# ---------------------------------------------------------------------------
@pytest.fixture(scope="module")
def module_db(django_db_setup, django_db_blocker):
"""Unlock the DB for the whole module (module-scoped)."""
with django_db_blocker.unblock():
yield
@pytest.fixture(scope="module")
def doclist_corpus(module_db):
"""
Build a 5 000-document corpus with tags, notes, custom fields, correspondents,
and doc types. All objects are deleted on teardown.
"""
fake = Faker()
Faker.seed(SEED)
rng = random.Random(SEED)
print(f"\n[setup] Creating {NUM_CORRESPONDENTS} correspondents...") # noqa: T201
correspondents = [
Correspondent.objects.create(name=f"dlcorp-{i}-{fake.company()}"[:128])
for i in range(NUM_CORRESPONDENTS)
]
print(f"[setup] Creating {NUM_DOC_TYPES} doc types...") # noqa: T201
doc_types = [
DocumentType.objects.create(name=f"dltype-{i}-{fake.word()}"[:128])
for i in range(NUM_DOC_TYPES)
]
print(f"[setup] Creating {NUM_TAGS} tags...") # noqa: T201
tags = [
Tag.objects.create(name=f"dltag-{i}-{fake.word()}"[:100])
for i in range(NUM_TAGS)
]
print(f"[setup] Creating {CUSTOM_FIELD_COUNT} custom fields...") # noqa: T201
custom_fields = [
CustomField.objects.create(
name=f"Field {i}",
data_type=CustomField.FieldDataType.STRING,
)
for i in range(CUSTOM_FIELD_COUNT)
]
note_user = User.objects.create_user(username="doclistnoteuser", password="x")
owner = User.objects.create_superuser(username="doclistowner", password="admin")
print(f"[setup] Building {NUM_DOCS} document rows...") # noqa: T201
base_date = datetime.date(2018, 1, 1)
raw_docs = []
for i in range(NUM_DOCS):
day_offset = rng.randint(0, 6 * 365)
raw_docs.append(
Document(
title=fake.sentence(nb_words=rng.randint(3, 8)).rstrip("."),
content="\n\n".join(
fake.paragraph(nb_sentences=rng.randint(2, 5))
for _ in range(rng.randint(1, 3))
),
checksum=f"DL{i:07d}",
correspondent=rng.choice(correspondents + [None] * 5),
document_type=rng.choice(doc_types + [None] * 4),
created=base_date + datetime.timedelta(days=day_offset),
owner=owner if rng.random() < 0.8 else None,
),
)
t0 = time.perf_counter()
documents = Document.objects.bulk_create(raw_docs)
print(f"[setup] bulk_create {NUM_DOCS} docs: {time.perf_counter() - t0:.2f}s") # noqa: T201
t0 = time.perf_counter()
for doc in documents:
k = rng.randint(0, 5)
if k:
doc.tags.add(*rng.sample(tags, k))
print(f"[setup] tag M2M assignments: {time.perf_counter() - t0:.2f}s") # noqa: T201
note_docs = rng.sample(documents, int(NUM_DOCS * NOTE_FRACTION))
Note.objects.bulk_create(
[
Note(
document=doc,
note=fake.sentence(nb_words=rng.randint(4, 15)),
user=note_user,
)
for doc in note_docs
],
)
cf_docs = rng.sample(documents, int(NUM_DOCS * CUSTOM_FIELD_FRACTION))
CustomFieldInstance.objects.bulk_create(
[
CustomFieldInstance(
document=doc,
field=rng.choice(custom_fields),
value_text=fake.word(),
)
for doc in cf_docs
],
)
first_doc_pk = documents[0].pk
yield {"owner": owner, "first_doc_pk": first_doc_pk, "tags": tags}
print("\n[teardown] Removing doclist corpus...") # noqa: T201
Document.objects.all().delete()
Correspondent.objects.all().delete()
DocumentType.objects.all().delete()
Tag.objects.all().delete()
CustomField.objects.all().delete()
User.objects.filter(username__in=["doclistnoteuser", "doclistowner"]).delete()
# ---------------------------------------------------------------------------
# TestDocListProfile
# ---------------------------------------------------------------------------
class TestDocListProfile:
"""Profile GET /api/documents/ — pure ORM path, no Tantivy."""
@pytest.fixture(autouse=True)
def _client(self, doclist_corpus):
owner = doclist_corpus["owner"]
self.client = APIClient()
self.client.force_authenticate(user=owner)
self.first_doc_pk = doclist_corpus["first_doc_pk"]
def test_list_default_ordering(self):
"""GET /api/documents/ default ordering (-created), page 1, page_size=25."""
with profile_block(
f"GET /api/documents/ default ordering [page_size={PAGE_SIZE}]",
):
response = self.client.get(
f"/api/documents/?page=1&page_size={PAGE_SIZE}",
)
assert response.status_code == 200
def test_list_title_ordering(self):
"""GET /api/documents/ ordered by title — tests ORM sort path."""
with profile_block(
f"GET /api/documents/?ordering=title [page_size={PAGE_SIZE}]",
):
response = self.client.get(
f"/api/documents/?ordering=title&page=1&page_size={PAGE_SIZE}",
)
assert response.status_code == 200
def test_list_page_size_comparison(self):
"""Compare serializer cost at page_size=10, 25, 100."""
for page_size in [10, 25, 100]:
with profile_block(f"GET /api/documents/ [page_size={page_size}]"):
response = self.client.get(
f"/api/documents/?page=1&page_size={page_size}",
)
assert response.status_code == 200
def test_list_detail_fields(self):
"""GET /api/documents/{id}/ — per-doc serializer cost with all relations."""
pk = self.first_doc_pk
with profile_block(f"GET /api/documents/{pk}/ — single doc serializer"):
response = self.client.get(f"/api/documents/{pk}/")
assert response.status_code == 200
def test_list_cpu_profile(self):
"""cProfile of one list request — surfaces hot frames in serializer."""
profile_cpu(
lambda: self.client.get(
f"/api/documents/?page=1&page_size={PAGE_SIZE}",
),
label=f"GET /api/documents/ cProfile [page_size={PAGE_SIZE}]",
top=30,
)
# ---------------------------------------------------------------------------
# TestSelectionDataProfile
# ---------------------------------------------------------------------------
class TestSelectionDataProfile:
"""Profile _get_selection_data_for_queryset — the 5+ COUNT queries per request."""
@pytest.fixture(autouse=True)
def _setup(self, doclist_corpus):
owner = doclist_corpus["owner"]
self.client = APIClient()
self.client.force_authenticate(user=owner)
self.tags = doclist_corpus["tags"]
def test_selection_data_unfiltered(self):
"""Call _get_selection_data_for_queryset(all docs) directly — COUNT queries in isolation."""
viewset = DocumentViewSet()
qs = Document.objects.all()
with profile_block("_get_selection_data_for_queryset(all docs) — direct call"):
viewset._get_selection_data_for_queryset(qs)
def test_selection_data_via_api(self):
"""Full API round-trip with include_selection_data=true."""
with profile_block(
f"GET /api/documents/?include_selection_data=true [page_size={PAGE_SIZE}]",
):
response = self.client.get(
f"/api/documents/?page=1&page_size={PAGE_SIZE}&include_selection_data=true",
)
assert response.status_code == 200
assert "selection_data" in response.data
def test_selection_data_filtered(self):
"""selection_data on a tag-filtered queryset — filtered COUNT vs unfiltered."""
tag = self.tags[0]
viewset = DocumentViewSet()
filtered_qs = Document.objects.filter(tags=tag)
unfiltered_qs = Document.objects.all()
print(f"\n Tag '{tag.name}' matches {filtered_qs.count()} docs") # noqa: T201
with profile_block("_get_selection_data_for_queryset(unfiltered)"):
viewset._get_selection_data_for_queryset(unfiltered_qs)
with profile_block("_get_selection_data_for_queryset(filtered by tag)"):
viewset._get_selection_data_for_queryset(filtered_qs)

284
test_matching_profile.py Normal file
View File

@@ -0,0 +1,284 @@
"""
Matching pipeline profiling.
Run with:
uv run pytest ../test_matching_profile.py \
-m profiling --override-ini="addopts=" -s -v
Corpus: 1 document + 50 correspondents, 100 tags, 25 doc types, 20 storage
paths. Labels are spread across all seven matching algorithms
(NONE, ANY, ALL, LITERAL, REGEX, FUZZY, AUTO).
Classifier is passed as None -- MATCH_AUTO models skip prediction gracefully,
which is correct for isolating the ORM query and Python-side evaluation cost.
Scenarios
---------
TestMatchingPipelineProfile
- test_match_correspondents 50 correspondents, algorithm mix
- test_match_tags 100 tags
- test_match_document_types 25 doc types
- test_match_storage_paths 20 storage paths
- test_full_match_sequence all four in order (cumulative consumption cost)
- test_algorithm_breakdown each MATCH_* algorithm in isolation
"""
from __future__ import annotations
import random
import pytest
from faker import Faker
from profiling import profile_block
from documents.matching import match_correspondents
from documents.matching import match_document_types
from documents.matching import match_storage_paths
from documents.matching import match_tags
from documents.models import Correspondent
from documents.models import Document
from documents.models import DocumentType
from documents.models import MatchingModel
from documents.models import StoragePath
from documents.models import Tag
pytestmark = [pytest.mark.profiling, pytest.mark.django_db]
NUM_CORRESPONDENTS = 50
NUM_TAGS = 100
NUM_DOC_TYPES = 25
NUM_STORAGE_PATHS = 20
SEED = 42
# Algorithm distribution across labels (cycles through in order)
_ALGORITHMS = [
MatchingModel.MATCH_NONE,
MatchingModel.MATCH_ANY,
MatchingModel.MATCH_ALL,
MatchingModel.MATCH_LITERAL,
MatchingModel.MATCH_REGEX,
MatchingModel.MATCH_FUZZY,
MatchingModel.MATCH_AUTO,
]
def _algo(i: int) -> int:
return _ALGORITHMS[i % len(_ALGORITHMS)]
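# Worked example: with NUM_CORRESPONDENTS = 50 and 7 algorithms, indices 0..49
# cycle through _ALGORITHMS, so each algorithm is used for 7 or 8 correspondents
# (50 = 7 * 7 + 1); the 100 tags hit each algorithm 14 or 15 times.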
# ---------------------------------------------------------------------------
# Module-scoped corpus fixture
# ---------------------------------------------------------------------------
@pytest.fixture(scope="module")
def module_db(django_db_setup, django_db_blocker):
"""Unlock the DB for the whole module (module-scoped)."""
with django_db_blocker.unblock():
yield
@pytest.fixture(scope="module")
def matching_corpus(module_db):
"""
1 document with realistic content + dense matching model sets.
Classifier=None so MATCH_AUTO models are simply skipped.
"""
fake = Faker()
Faker.seed(SEED)
random.seed(SEED)
# ---- matching models ---------------------------------------------------
print(f"\n[setup] Creating {NUM_CORRESPONDENTS} correspondents...") # noqa: T201
correspondents = []
for i in range(NUM_CORRESPONDENTS):
algo = _algo(i)
match_text = (
fake.word()
if algo not in (MatchingModel.MATCH_NONE, MatchingModel.MATCH_AUTO)
else ""
)
if algo == MatchingModel.MATCH_REGEX:
match_text = r"\b" + fake.word() + r"\b"
correspondents.append(
Correspondent.objects.create(
name=f"mcorp-{i}-{fake.company()}"[:128],
matching_algorithm=algo,
match=match_text,
),
)
print(f"[setup] Creating {NUM_TAGS} tags...") # noqa: T201
tags = []
for i in range(NUM_TAGS):
algo = _algo(i)
match_text = (
fake.word()
if algo not in (MatchingModel.MATCH_NONE, MatchingModel.MATCH_AUTO)
else ""
)
if algo == MatchingModel.MATCH_REGEX:
match_text = r"\b" + fake.word() + r"\b"
tags.append(
Tag.objects.create(
name=f"mtag-{i}-{fake.word()}"[:100],
matching_algorithm=algo,
match=match_text,
),
)
print(f"[setup] Creating {NUM_DOC_TYPES} doc types...") # noqa: T201
doc_types = []
for i in range(NUM_DOC_TYPES):
algo = _algo(i)
match_text = (
fake.word()
if algo not in (MatchingModel.MATCH_NONE, MatchingModel.MATCH_AUTO)
else ""
)
if algo == MatchingModel.MATCH_REGEX:
match_text = r"\b" + fake.word() + r"\b"
doc_types.append(
DocumentType.objects.create(
name=f"mtype-{i}-{fake.word()}"[:128],
matching_algorithm=algo,
match=match_text,
),
)
print(f"[setup] Creating {NUM_STORAGE_PATHS} storage paths...") # noqa: T201
storage_paths = []
for i in range(NUM_STORAGE_PATHS):
algo = _algo(i)
match_text = (
fake.word()
if algo not in (MatchingModel.MATCH_NONE, MatchingModel.MATCH_AUTO)
else ""
)
if algo == MatchingModel.MATCH_REGEX:
match_text = r"\b" + fake.word() + r"\b"
storage_paths.append(
StoragePath.objects.create(
name=f"mpath-{i}-{fake.word()}",
path=f"{fake.word()}/{{title}}",
matching_algorithm=algo,
match=match_text,
),
)
# ---- document with diverse content ------------------------------------
doc = Document.objects.create(
title="quarterly invoice payment tax financial statement",
content=" ".join(fake.paragraph(nb_sentences=5) for _ in range(3)),
checksum="MATCHPROF0001",
)
print(f"[setup] Document pk={doc.pk}, content length={len(doc.content)} chars") # noqa: T201
print( # noqa: T201
f" Correspondents: {NUM_CORRESPONDENTS} "
f"({sum(1 for c in correspondents if c.matching_algorithm == MatchingModel.MATCH_AUTO)} AUTO)",
)
print( # noqa: T201
f" Tags: {NUM_TAGS} "
f"({sum(1 for t in tags if t.matching_algorithm == MatchingModel.MATCH_AUTO)} AUTO)",
)
yield {"doc": doc}
# Teardown
print("\n[teardown] Removing matching corpus...") # noqa: T201
Document.objects.all().delete()
Correspondent.objects.all().delete()
Tag.objects.all().delete()
DocumentType.objects.all().delete()
StoragePath.objects.all().delete()
# ---------------------------------------------------------------------------
# TestMatchingPipelineProfile
# ---------------------------------------------------------------------------
class TestMatchingPipelineProfile:
"""Profile the matching functions called per document during consumption."""
@pytest.fixture(autouse=True)
def _setup(self, matching_corpus):
self.doc = matching_corpus["doc"]
def test_match_correspondents(self):
"""50 correspondents, algorithm mix. Query count + time."""
with profile_block(
f"match_correspondents() [{NUM_CORRESPONDENTS} correspondents, mixed algorithms]",
):
result = match_correspondents(self.doc, classifier=None)
print(f" -> {len(result)} matched") # noqa: T201
def test_match_tags(self):
"""100 tags -- densest set in real installs."""
with profile_block(f"match_tags() [{NUM_TAGS} tags, mixed algorithms]"):
result = match_tags(self.doc, classifier=None)
print(f" -> {len(result)} matched") # noqa: T201
def test_match_document_types(self):
"""25 doc types."""
with profile_block(
f"match_document_types() [{NUM_DOC_TYPES} types, mixed algorithms]",
):
result = match_document_types(self.doc, classifier=None)
print(f" -> {len(result)} matched") # noqa: T201
def test_match_storage_paths(self):
"""20 storage paths."""
with profile_block(
f"match_storage_paths() [{NUM_STORAGE_PATHS} paths, mixed algorithms]",
):
result = match_storage_paths(self.doc, classifier=None)
print(f" -> {len(result)} matched") # noqa: T201
def test_full_match_sequence(self):
"""All four match_*() calls in order -- cumulative cost per document consumed."""
with profile_block(
"full match sequence: correspondents + doc_types + tags + storage_paths",
):
match_correspondents(self.doc, classifier=None)
match_document_types(self.doc, classifier=None)
match_tags(self.doc, classifier=None)
match_storage_paths(self.doc, classifier=None)
def test_algorithm_breakdown(self):
"""Create one correspondent per algorithm and time each independently."""
import time
from documents.matching import matches
fake = Faker()
algo_names = {
MatchingModel.MATCH_NONE: "MATCH_NONE",
MatchingModel.MATCH_ANY: "MATCH_ANY",
MatchingModel.MATCH_ALL: "MATCH_ALL",
MatchingModel.MATCH_LITERAL: "MATCH_LITERAL",
MatchingModel.MATCH_REGEX: "MATCH_REGEX",
MatchingModel.MATCH_FUZZY: "MATCH_FUZZY",
}
doc = self.doc
print() # noqa: T201
for algo, name in algo_names.items():
match_text = fake.word() if algo != MatchingModel.MATCH_NONE else ""
if algo == MatchingModel.MATCH_REGEX:
match_text = r"\b" + fake.word() + r"\b"
model = Correspondent(
name=f"algo-test-{name}",
matching_algorithm=algo,
match=match_text,
)
# Time 1000 iterations to get stable microsecond readings
runs = 1_000
t0 = time.perf_counter()
for _ in range(runs):
matches(model, doc)
us_per_call = (time.perf_counter() - t0) / runs * 1_000_000
print( # noqa: T201
f" {name:<20s} {us_per_call:8.2f} us/call (match={match_text[:20]!r})",
)

154
test_sanity_profile.py Normal file
View File

@@ -0,0 +1,154 @@
"""
Sanity checker profiling.
Run with:
uv run pytest ../test_sanity_profile.py \
-m profiling --override-ini="addopts=" -s -v
Corpus: 2 000 documents with stub files (original + archive + thumbnail)
created in a temp MEDIA_ROOT.
Scenarios
---------
TestSanityCheckerProfile
- test_sanity_full_corpus full check_sanity() -- cProfile + tracemalloc
- test_sanity_query_pattern profile_block summary: query count + time
"""
from __future__ import annotations
import hashlib
import time
import pytest
from django.test import override_settings
from profiling import measure_memory
from profiling import profile_block
from profiling import profile_cpu
from documents.models import Document
from documents.sanity_checker import check_sanity
pytestmark = [pytest.mark.profiling, pytest.mark.django_db]
NUM_DOCS = 2_000
SEED = 42
# ---------------------------------------------------------------------------
# Module-scoped fixture: temp directories + corpus
# ---------------------------------------------------------------------------
@pytest.fixture(scope="module")
def module_db(django_db_setup, django_db_blocker):
"""Unlock the DB for the whole module (module-scoped)."""
with django_db_blocker.unblock():
yield
@pytest.fixture(scope="module")
def sanity_corpus(tmp_path_factory, module_db):
"""
Build a 2 000-document corpus. For each document create stub files (real
byte content for original and archive so checksums verify, plus a 1-byte
thumbnail) in ORIGINALS_DIR, ARCHIVE_DIR, and THUMBNAIL_DIR
so the sanity checker's file-existence and checksum checks have real targets.
"""
media = tmp_path_factory.mktemp("sanity_media")
originals_dir = media / "documents" / "originals"
archive_dir = media / "documents" / "archive"
thumb_dir = media / "documents" / "thumbnails"
for d in (originals_dir, archive_dir, thumb_dir):
d.mkdir(parents=True)
# Enable override_settings manually (enable/disable) so it spans the whole module-scoped fixture lifetime
settings_ctx = override_settings(
MEDIA_ROOT=media,
ORIGINALS_DIR=originals_dir,
ARCHIVE_DIR=archive_dir,
THUMBNAIL_DIR=thumb_dir,
MEDIA_LOCK=media / "media.lock",
)
settings_ctx.enable()
print(f"\n[setup] Creating {NUM_DOCS} documents with stub files...") # noqa: T201
t0 = time.perf_counter()
docs = []
for i in range(NUM_DOCS):
content = f"document content for doc {i}"
checksum = hashlib.sha256(content.encode()).hexdigest()
orig_filename = f"{i:07d}.pdf"
arch_filename = f"{i:07d}.pdf"
orig_path = originals_dir / orig_filename
arch_path = archive_dir / arch_filename
orig_path.write_bytes(content.encode())
arch_path.write_bytes(content.encode())
docs.append(
Document(
title=f"Document {i:05d}",
content=content,
checksum=checksum,
archive_checksum=checksum,
filename=orig_filename,
archive_filename=arch_filename,
mime_type="application/pdf",
),
)
created = Document.objects.bulk_create(docs, batch_size=500)
# Thumbnails use doc.pk, so create them after bulk_create assigns pks
for doc in created:
thumb_path = thumb_dir / f"{doc.pk:07d}.webp"
thumb_path.write_bytes(b"\x00") # minimal thumbnail stub
print( # noqa: T201
f"[setup] bulk_create + file creation: {time.perf_counter() - t0:.2f}s",
)
yield {"media": media}
# Teardown
print("\n[teardown] Removing sanity corpus...") # noqa: T201
Document.objects.all().delete()
settings_ctx.disable()
# ---------------------------------------------------------------------------
# TestSanityCheckerProfile
# ---------------------------------------------------------------------------
class TestSanityCheckerProfile:
"""Profile check_sanity() on a realistic corpus with real files."""
@pytest.fixture(autouse=True)
def _setup(self, sanity_corpus):
self.media = sanity_corpus["media"]
def test_sanity_full_corpus(self):
"""Full check_sanity() -- cProfile surfaces hot frames, tracemalloc shows peak."""
_, elapsed = profile_cpu(
lambda: check_sanity(scheduled=False),
label=f"check_sanity() [{NUM_DOCS} docs, real files]",
top=25,
)
_, peak_kib, delta_kib = measure_memory(
lambda: check_sanity(scheduled=False),
label=f"check_sanity() [{NUM_DOCS} docs] -- memory",
)
print("\n Summary:") # noqa: T201
print(f" Wall time (CPU profile run): {elapsed * 1000:.1f} ms") # noqa: T201
print(f" Peak memory (second run): {peak_kib:.1f} KiB") # noqa: T201
print(f" Memory delta: {delta_kib:+.1f} KiB") # noqa: T201
def test_sanity_query_pattern(self):
"""profile_block view: query count + query time + wall time in one summary."""
with profile_block(f"check_sanity() [{NUM_DOCS} docs] -- query count"):
check_sanity(scheduled=False)

273
test_search_profiling.py Normal file
View File

@@ -0,0 +1,273 @@
"""
Search performance profiling tests.
Run explicitly — excluded from the normal test suite:
uv run pytest -m profiling -s -p no:xdist --override-ini="addopts=" -v
The ``-s`` flag is required to see profile_block() output.
The ``-p no:xdist`` flag disables parallel execution for accurate measurements.
Corpus: 5 000 documents generated deterministically from a fixed Faker seed,
with realistic variety: 30 correspondents, 15 document types, 50 tags, ~500
notes spread across ~10 % of documents.
"""
from __future__ import annotations
import random
import pytest
from django.contrib.auth.models import User
from faker import Faker
from profiling import profile_block
from rest_framework.test import APIClient
from documents.models import Correspondent
from documents.models import Document
from documents.models import DocumentType
from documents.models import Note
from documents.models import Tag
from documents.search import get_backend
from documents.search import reset_backend
from documents.search._backend import SearchMode
pytestmark = [pytest.mark.profiling, pytest.mark.search, pytest.mark.django_db]
# ---------------------------------------------------------------------------
# Corpus parameters
# ---------------------------------------------------------------------------
DOC_COUNT = 5_000
SEED = 42
NUM_CORRESPONDENTS = 30
NUM_DOC_TYPES = 15
NUM_TAGS = 50
NOTE_FRACTION = 0.10 # ~500 documents get a note
PAGE_SIZE = 25
def _build_corpus(rng: random.Random, fake: Faker) -> None:
"""
Insert the full corpus into the database and index it.
Uses bulk_create for the Document rows (fast) then handles the M2M tag
relationships and notes individually. Indexes the full corpus with a
single backend.rebuild() call.
"""
import datetime
# ---- lookup objects -------------------------------------------------
correspondents = [
Correspondent.objects.create(name=f"profcorp-{i}-{fake.company()}"[:128])
for i in range(NUM_CORRESPONDENTS)
]
doc_types = [
DocumentType.objects.create(name=f"proftype-{i}-{fake.word()}"[:128])
for i in range(NUM_DOC_TYPES)
]
tags = [
Tag.objects.create(name=f"proftag-{i}-{fake.word()}"[:100])
for i in range(NUM_TAGS)
]
note_user = User.objects.create_user(username="profnoteuser", password="x")
# ---- bulk-create documents ------------------------------------------
base_date = datetime.date(2018, 1, 1)
raw_docs = []
for i in range(DOC_COUNT):
day_offset = rng.randint(0, 6 * 365)
created = base_date + datetime.timedelta(days=day_offset)
raw_docs.append(
Document(
title=fake.sentence(nb_words=rng.randint(3, 9)).rstrip("."),
content="\n\n".join(
fake.paragraph(nb_sentences=rng.randint(3, 7))
for _ in range(rng.randint(2, 5))
),
checksum=f"PROF{i:07d}",
correspondent=rng.choice(correspondents + [None] * 8),
document_type=rng.choice(doc_types + [None] * 4),
created=created,
),
)
documents = Document.objects.bulk_create(raw_docs)
# ---- tags (M2M, post-bulk) ------------------------------------------
for doc in documents:
k = rng.randint(0, 5)
if k:
doc.tags.add(*rng.sample(tags, k))
# ---- notes on ~10 % of docs -----------------------------------------
note_docs = rng.sample(documents, int(DOC_COUNT * NOTE_FRACTION))
for doc in note_docs:
Note.objects.create(
document=doc,
note=fake.sentence(nb_words=rng.randint(6, 20)),
user=note_user,
)
# ---- build Tantivy index --------------------------------------------
backend = get_backend()
qs = Document.objects.select_related(
"correspondent",
"document_type",
"storage_path",
"owner",
).prefetch_related("tags", "notes__user", "custom_fields__field")
backend.rebuild(qs)
class TestSearchProfiling:
"""
Performance profiling for the Tantivy search backend and DRF API layer.
Each test builds a fresh 5 000-document corpus, exercises one hot path,
and prints profile_block() measurements to stdout. Beyond basic
status-code checks there are no correctness assertions — the goal is to
surface hot spots and track regressions.
"""
@pytest.fixture(autouse=True)
def _setup(self, tmp_path, settings):
index_dir = tmp_path / "index"
index_dir.mkdir()
settings.INDEX_DIR = index_dir
reset_backend()
rng = random.Random(SEED)
fake = Faker()
Faker.seed(SEED)
self.user = User.objects.create_superuser(
username="profiler",
password="admin",
)
self.client = APIClient()
self.client.force_authenticate(user=self.user)
_build_corpus(rng, fake)
yield
reset_backend()
# -- 1. Backend: search_ids relevance ---------------------------------
def test_profile_search_ids_relevance(self):
"""Profile: search_ids() with relevance ordering across several queries."""
backend = get_backend()
queries = [
"invoice payment",
"annual report",
"bank statement",
"contract agreement",
"receipt",
]
with profile_block(f"search_ids — relevance ({len(queries)} queries)"):
for q in queries:
backend.search_ids(q, user=None)
# -- 2. Backend: search_ids with Tantivy-native sort ------------------
def test_profile_search_ids_sorted(self):
"""Profile: search_ids() sorted by a Tantivy fast field (created)."""
backend = get_backend()
with profile_block("search_ids — sorted by created (asc + desc)"):
backend.search_ids(
"the",
user=None,
sort_field="created",
sort_reverse=False,
)
backend.search_ids(
"the",
user=None,
sort_field="created",
sort_reverse=True,
)
# -- 3. Backend: highlight_hits for a page of 25 ----------------------
def test_profile_highlight_hits(self):
"""Profile: highlight_hits() for a 25-document page."""
backend = get_backend()
all_ids = backend.search_ids("report", user=None)
page_ids = all_ids[:PAGE_SIZE]
with profile_block(f"highlight_hits — {len(page_ids)} docs"):
backend.highlight_hits("report", page_ids)
# -- 4. Backend: autocomplete -----------------------------------------
def test_profile_autocomplete(self):
"""Profile: autocomplete() with eight common prefixes."""
backend = get_backend()
prefixes = ["inv", "pay", "con", "rep", "sta", "acc", "doc", "fin"]
with profile_block(f"autocomplete — {len(prefixes)} prefixes"):
for prefix in prefixes:
backend.autocomplete(prefix, limit=10)
# -- 5. Backend: simple-mode search (TEXT and TITLE) ------------------
def test_profile_search_ids_simple_modes(self):
"""Profile: search_ids() in TEXT and TITLE simple-search modes."""
backend = get_backend()
queries = ["invoice 2023", "annual report", "bank statement"]
with profile_block(
f"search_ids — TEXT + TITLE modes ({len(queries)} queries each)",
):
for q in queries:
backend.search_ids(q, user=None, search_mode=SearchMode.TEXT)
backend.search_ids(q, user=None, search_mode=SearchMode.TITLE)
# -- 6. API: full round-trip, relevance + page 1 ----------------------
def test_profile_api_relevance_search(self):
"""Profile: full API search round-trip, relevance order, page 1."""
with profile_block(
f"API /documents/?query=… relevance (page 1, page_size={PAGE_SIZE})",
):
response = self.client.get(
f"/api/documents/?query=invoice+payment&page=1&page_size={PAGE_SIZE}",
)
assert response.status_code == 200
# -- 7. API: full round-trip, ORM-ordered (title) ---------------------
def test_profile_api_orm_sorted_search(self):
"""Profile: full API search round-trip with ORM-delegated sort (title)."""
with profile_block("API /documents/?query=…&ordering=title"):
response = self.client.get(
f"/api/documents/?query=report&ordering=title&page=1&page_size={PAGE_SIZE}",
)
assert response.status_code == 200
# -- 8. API: full round-trip, score sort ------------------------------
def test_profile_api_score_sort(self):
"""Profile: full API search with ordering=-score (relevance, preserve order)."""
with profile_block("API /documents/?query=…&ordering=-score"):
response = self.client.get(
f"/api/documents/?query=statement&ordering=-score&page=1&page_size={PAGE_SIZE}",
)
assert response.status_code == 200
# -- 9. API: full round-trip, with selection_data ---------------------
def test_profile_api_with_selection_data(self):
"""Profile: full API search including include_selection_data=true."""
with profile_block("API /documents/?query=…&include_selection_data=true"):
response = self.client.get(
f"/api/documents/?query=contract&page=1&page_size={PAGE_SIZE}"
"&include_selection_data=true",
)
assert response.status_code == 200
assert "selection_data" in response.data
# -- 10. API: paginated (page 2) --------------------------------------
def test_profile_api_page_2(self):
"""Profile: full API search, page 2 — exercises page offset arithmetic."""
with profile_block(f"API /documents/?query=…&page=2&page_size={PAGE_SIZE}"):
response = self.client.get(
f"/api/documents/?query=the&page=2&page_size={PAGE_SIZE}",
)
assert response.status_code == 200

231
test_workflow_profile.py Normal file
View File

@@ -0,0 +1,231 @@
"""
Workflow trigger matching profiling.
Run with:
uv run pytest ../test_workflow_profile.py \
-m profiling --override-ini="addopts=" -s -v
Corpus: 500 documents plus correspondents and tags, with WorkflowTrigger
sets of 5 and 20 to allow scaling comparisons.
Scenarios
---------
TestWorkflowMatchingProfile
- test_existing_document_5_workflows -- existing_document_matches_workflow x 5 triggers
- test_existing_document_20_workflows -- the same call x 20 triggers
- test_workflow_prefilter -- prefilter_documents_by_workflowtrigger on 500 docs
- test_trigger_type_comparison -- DOCUMENT_ADDED vs DOCUMENT_UPDATED per-call overhead
"""
from __future__ import annotations
import random
import time
import pytest
from faker import Faker
from profiling import profile_block
from documents.matching import existing_document_matches_workflow
from documents.matching import prefilter_documents_by_workflowtrigger
from documents.models import Correspondent
from documents.models import Document
from documents.models import Tag
from documents.models import Workflow
from documents.models import WorkflowAction
from documents.models import WorkflowTrigger
pytestmark = [pytest.mark.profiling, pytest.mark.django_db]
NUM_DOCS = 500
NUM_CORRESPONDENTS = 10
NUM_TAGS = 20
SEED = 42
# ---------------------------------------------------------------------------
# Module-scoped fixture
# ---------------------------------------------------------------------------
@pytest.fixture(scope="module")
def module_db(django_db_setup, django_db_blocker):
"""Unlock the DB for the whole module (module-scoped)."""
with django_db_blocker.unblock():
yield
@pytest.fixture(scope="module")
def workflow_corpus(module_db):
"""
500 documents plus correspondents and tags, with workflow trigger sets
of 5 and 20 to allow scaling comparisons.
"""
fake = Faker()
Faker.seed(SEED)
rng = random.Random(SEED)
# ---- lookup objects ---------------------------------------------------
print("\n[setup] Creating lookup objects...") # noqa: T201
correspondents = [
Correspondent.objects.create(name=f"wfcorp-{i}-{fake.company()}"[:128])
for i in range(NUM_CORRESPONDENTS)
]
tags = [
Tag.objects.create(name=f"wftag-{i}-{fake.word()}"[:100])
for i in range(NUM_TAGS)
]
# ---- documents --------------------------------------------------------
print(f"[setup] Building {NUM_DOCS} documents...") # noqa: T201
raw_docs = []
for i in range(NUM_DOCS):
raw_docs.append(
Document(
title=fake.sentence(nb_words=4).rstrip("."),
content=fake.paragraph(nb_sentences=3),
checksum=f"WF{i:07d}",
correspondent=rng.choice(correspondents + [None] * 3),
),
)
documents = Document.objects.bulk_create(raw_docs, batch_size=500)
for doc in documents:
k = rng.randint(0, 3)
if k:
doc.tags.add(*rng.sample(tags, k))
sample_doc = documents[0]
print(f"[setup] Sample doc pk={sample_doc.pk}") # noqa: T201
# ---- build triggers at scale 5 and 20 --------------------------------
_wf_counter = [0]
def _make_triggers(n: int, trigger_type: int) -> list[WorkflowTrigger]:
triggers = []
for i in range(n):
# Every third trigger gets a correspondent filter; the rest have no filter
corr = correspondents[i % NUM_CORRESPONDENTS] if i % 3 == 0 else None
trigger = WorkflowTrigger.objects.create(
type=trigger_type,
filter_has_correspondent=corr,
)
action = WorkflowAction.objects.create(
type=WorkflowAction.WorkflowActionType.ASSIGNMENT,
)
idx = _wf_counter[0]
_wf_counter[0] += 1
wf = Workflow.objects.create(name=f"wf-profile-{idx}")
wf.triggers.add(trigger)
wf.actions.add(action)
triggers.append(trigger)
return triggers
print("[setup] Creating workflow triggers...") # noqa: T201
triggers_5 = _make_triggers(5, WorkflowTrigger.WorkflowTriggerType.DOCUMENT_UPDATED)
triggers_20 = _make_triggers(
20,
WorkflowTrigger.WorkflowTriggerType.DOCUMENT_UPDATED,
)
triggers_added = _make_triggers(
5,
WorkflowTrigger.WorkflowTriggerType.DOCUMENT_ADDED,
)
yield {
"doc": sample_doc,
"triggers_5": triggers_5,
"triggers_20": triggers_20,
"triggers_added": triggers_added,
}
# Teardown
print("\n[teardown] Removing workflow corpus...") # noqa: T201
Workflow.objects.all().delete()
WorkflowTrigger.objects.all().delete()
WorkflowAction.objects.all().delete()
Document.objects.all().delete()
Correspondent.objects.all().delete()
Tag.objects.all().delete()
# ---------------------------------------------------------------------------
# TestWorkflowMatchingProfile
# ---------------------------------------------------------------------------
class TestWorkflowMatchingProfile:
"""Profile workflow trigger evaluation per document save."""
@pytest.fixture(autouse=True)
def _setup(self, workflow_corpus):
self.doc = workflow_corpus["doc"]
self.triggers_5 = workflow_corpus["triggers_5"]
self.triggers_20 = workflow_corpus["triggers_20"]
self.triggers_added = workflow_corpus["triggers_added"]
def test_existing_document_5_workflows(self):
"""existing_document_matches_workflow x 5 DOCUMENT_UPDATED triggers."""
doc = self.doc
triggers = self.triggers_5
with profile_block(
f"existing_document_matches_workflow [{len(triggers)} triggers]",
):
for trigger in triggers:
existing_document_matches_workflow(doc, trigger)
def test_existing_document_20_workflows(self):
"""existing_document_matches_workflow x 20 triggers -- shows linear scaling."""
doc = self.doc
triggers = self.triggers_20
with profile_block(
f"existing_document_matches_workflow [{len(triggers)} triggers]",
):
for trigger in triggers:
existing_document_matches_workflow(doc, trigger)
# Also time each call individually to show per-trigger overhead
timings = []
for trigger in triggers:
t0 = time.perf_counter()
existing_document_matches_workflow(doc, trigger)
timings.append((time.perf_counter() - t0) * 1_000_000)
avg_us = sum(timings) / len(timings)
print(f"\n Per-trigger avg: {avg_us:.1f} us (n={len(timings)})") # noqa: T201
def test_workflow_prefilter(self):
"""prefilter_documents_by_workflowtrigger on 500 docs -- tag + correspondent filters."""
qs = Document.objects.all()
print(f"\n Corpus: {qs.count()} documents") # noqa: T201
for trigger in self.triggers_20[:3]:
label = (
f"prefilter_documents_by_workflowtrigger "
f"[corr={trigger.filter_has_correspondent_id}]"
)
with profile_block(label):
result = prefilter_documents_by_workflowtrigger(qs, trigger)
# Evaluate the queryset
count = result.count()
print(f" -> {count} docs passed filter") # noqa: T201
def test_trigger_type_comparison(self):
"""Compare per-call overhead of DOCUMENT_UPDATED vs DOCUMENT_ADDED."""
doc = self.doc
runs = 200
for label, triggers in [
("DOCUMENT_UPDATED", self.triggers_5),
("DOCUMENT_ADDED", self.triggers_added),
]:
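# Time a tight loop of `runs` passes over each trigger set, then report the
# average cost per existing_document_matches_workflow call in microseconds.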
t0 = time.perf_counter()
for _ in range(runs):
for trigger in triggers:
existing_document_matches_workflow(doc, trigger)
total_calls = runs * len(triggers)
us_per_call = (time.perf_counter() - t0) / total_calls * 1_000_000
print( # noqa: T201
f" {label:<22s} {us_per_call:.2f} us/call "
f"({total_calls} calls, {len(triggers)} triggers)",
)

157
uv.lock generated
View File

@@ -875,15 +875,15 @@ wheels = [
[[package]]
name = "django"
version = "5.2.13"
version = "5.2.12"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "asgiref", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "sqlparse", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/1f/c5/c69e338eb2959f641045802e5ea87ca4bf5ac90c5fd08953ca10742fad51/django-5.2.13.tar.gz", hash = "sha256:a31589db5188d074c63f0945c3888fad104627dfcc236fb2b97f71f89da33bc4", size = 10890368, upload-time = "2026-04-07T14:02:15.072Z" }
sdist = { url = "https://files.pythonhosted.org/packages/bd/55/b9445fc0695b03746f355c05b2eecc54c34e05198c686f4fc4406b722b52/django-5.2.12.tar.gz", hash = "sha256:6b809af7165c73eff5ce1c87fdae75d4da6520d6667f86401ecf55b681eb1eeb", size = 10860574, upload-time = "2026-03-03T13:56:05.509Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/59/b1/51ab36b2eefcf8cdb9338c7188668a157e29e30306bfc98a379704c9e10d/django-5.2.13-py3-none-any.whl", hash = "sha256:5788fce61da23788a8ce6f02583765ab060d396720924789f97fa42119d37f7a", size = 8310982, upload-time = "2026-04-07T14:02:08.883Z" },
{ url = "https://files.pythonhosted.org/packages/4e/32/4b144e125678efccf5d5b61581de1c4088d6b0286e46096e3b8de0d556c8/django-5.2.12-py3-none-any.whl", hash = "sha256:4853482f395c3a151937f6991272540fcbf531464f254a347bf7c89f53c8cff7", size = 8310245, upload-time = "2026-03-03T13:56:01.174Z" },
]
[[package]]
@@ -935,6 +935,19 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/12/bf/af8ad2aa5a402f278b444ca70729fb12ee96ddb89c19c32a2d7c5189358f/django_cachalot-2.9.0-py3-none-any.whl", hash = "sha256:b80ac4930613a7849988ea772a53598d262a15eaf55e5ec8c78accae7fdd99ff", size = 57814, upload-time = "2026-01-28T05:23:28.741Z" },
]
[[package]]
name = "django-celery-results"
version = "2.6.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "celery", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "django", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/a6/b5/9966c28e31014c228305e09d48b19b35522a8f941fe5af5f81f40dc8fa80/django_celery_results-2.6.0.tar.gz", hash = "sha256:9abcd836ae6b61063779244d8887a88fe80bbfaba143df36d3cb07034671277c", size = 83985, upload-time = "2025-04-10T08:23:52.677Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/2c/da/70f0f3c5364735344c4bc89e53413bcaae95b4fc1de4e98a7a3b9fb70c88/django_celery_results-2.6.0-py3-none-any.whl", hash = "sha256:b9ccdca2695b98c7cbbb8dea742311ba9a92773d71d7b4944a676e69a7df1c73", size = 38351, upload-time = "2025-04-10T08:23:49.965Z" },
]
[[package]]
name = "django-compression-middleware"
version = "0.5.0"
@@ -2856,6 +2869,7 @@ dependencies = [
{ name = "django-allauth", extra = ["mfa", "socialaccount"], marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "django-auditlog", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "django-cachalot", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "django-celery-results", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "django-compression-middleware", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "django-cors-headers", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "django-extensions", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -3000,10 +3014,11 @@ requires-dist = [
{ name = "channels-redis", specifier = "~=4.2" },
{ name = "concurrent-log-handler", specifier = "~=0.9.25" },
{ name = "dateparser", specifier = "~=1.2" },
{ name = "django", specifier = "~=5.2.13" },
{ name = "django", specifier = "~=5.2.10" },
{ name = "django-allauth", extras = ["mfa", "socialaccount"], specifier = "~=65.15.0" },
{ name = "django-auditlog", specifier = "~=3.4.1" },
{ name = "django-cachalot", specifier = "~=2.9.0" },
{ name = "django-celery-results", specifier = "~=2.6.0" },
{ name = "django-compression-middleware", specifier = "~=0.5.0" },
{ name = "django-cors-headers", specifier = "~=4.9.0" },
{ name = "django-extensions", specifier = "~=4.1" },
@@ -3072,7 +3087,7 @@ dev = [
{ name = "faker", specifier = "~=40.12.0" },
{ name = "imagehash" },
{ name = "prek", specifier = "~=0.3.0" },
{ name = "pytest", specifier = "~=9.0.3" },
{ name = "pytest", specifier = "~=9.0.0" },
{ name = "pytest-cov", specifier = "~=7.1.0" },
{ name = "pytest-django", specifier = "~=4.12.0" },
{ name = "pytest-env", specifier = "~=1.6.0" },
@@ -3095,7 +3110,7 @@ testing = [
{ name = "factory-boy", specifier = "~=3.3.1" },
{ name = "faker", specifier = "~=40.12.0" },
{ name = "imagehash" },
{ name = "pytest", specifier = "~=9.0.3" },
{ name = "pytest", specifier = "~=9.0.0" },
{ name = "pytest-cov", specifier = "~=7.1.0" },
{ name = "pytest-django", specifier = "~=4.12.0" },
{ name = "pytest-env", specifier = "~=1.6.0" },
@@ -3250,70 +3265,70 @@ wheels = [
[[package]]
name = "pillow"
version = "12.2.0"
version = "12.1.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/8c/21/c2bcdd5906101a30244eaffc1b6e6ce71a31bd0742a01eb89e660ebfac2d/pillow-12.2.0.tar.gz", hash = "sha256:a830b1a40919539d07806aa58e1b114df53ddd43213d9c8b75847eee6c0182b5", size = 46987819, upload-time = "2026-04-01T14:46:17.687Z" }
sdist = { url = "https://files.pythonhosted.org/packages/1f/42/5c74462b4fd957fcd7b13b04fb3205ff8349236ea74c7c375766d6c82288/pillow-12.1.1.tar.gz", hash = "sha256:9ad8fa5937ab05218e2b6a4cff30295ad35afd2f83ac592e68c0d871bb0fdbc4", size = 46980264, upload-time = "2026-02-11T04:23:07.146Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/68/e1/748f5663efe6edcfc4e74b2b93edfb9b8b99b67f21a854c3ae416500a2d9/pillow-12.2.0-cp311-cp311-macosx_10_10_x86_64.whl", hash = "sha256:8be29e59487a79f173507c30ddf57e733a357f67881430449bb32614075a40ab", size = 5354347, upload-time = "2026-04-01T14:42:44.255Z" },
{ url = "https://files.pythonhosted.org/packages/47/a1/d5ff69e747374c33a3b53b9f98cca7889fce1fd03d79cdc4e1bccc6c5a87/pillow-12.2.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:71cde9a1e1551df7d34a25462fc60325e8a11a82cc2e2f54578e5e9a1e153d65", size = 4695873, upload-time = "2026-04-01T14:42:46.452Z" },
{ url = "https://files.pythonhosted.org/packages/df/21/e3fbdf54408a973c7f7f89a23b2cb97a7ef30c61ab4142af31eee6aebc88/pillow-12.2.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:f490f9368b6fc026f021db16d7ec2fbf7d89e2edb42e8ec09d2c60505f5729c7", size = 6280168, upload-time = "2026-04-01T14:42:49.228Z" },
{ url = "https://files.pythonhosted.org/packages/d3/f1/00b7278c7dd52b17ad4329153748f87b6756ec195ff786c2bdf12518337d/pillow-12.2.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:8bd7903a5f2a4545f6fd5935c90058b89d30045568985a71c79f5fd6edf9b91e", size = 8088188, upload-time = "2026-04-01T14:42:51.735Z" },
{ url = "https://files.pythonhosted.org/packages/ad/cf/220a5994ef1b10e70e85748b75649d77d506499352be135a4989c957b701/pillow-12.2.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3997232e10d2920a68d25191392e3a4487d8183039e1c74c2297f00ed1c50705", size = 6394401, upload-time = "2026-04-01T14:42:54.343Z" },
{ url = "https://files.pythonhosted.org/packages/e9/bd/e51a61b1054f09437acfbc2ff9106c30d1eb76bc1453d428399946781253/pillow-12.2.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e74473c875d78b8e9d5da2a70f7099549f9eb37ded4e2f6a463e60125bccd176", size = 7079655, upload-time = "2026-04-01T14:42:56.954Z" },
{ url = "https://files.pythonhosted.org/packages/6b/3d/45132c57d5fb4b5744567c3817026480ac7fc3ce5d4c47902bc0e7f6f853/pillow-12.2.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:56a3f9c60a13133a98ecff6197af34d7824de9b7b38c3654861a725c970c197b", size = 6503105, upload-time = "2026-04-01T14:42:59.847Z" },
{ url = "https://files.pythonhosted.org/packages/7d/2e/9df2fc1e82097b1df3dce58dc43286aa01068e918c07574711fcc53e6fb4/pillow-12.2.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:90e6f81de50ad6b534cab6e5aef77ff6e37722b2f5d908686f4a5c9eba17a909", size = 7203402, upload-time = "2026-04-01T14:43:02.664Z" },
{ url = "https://files.pythonhosted.org/packages/58/be/7482c8a5ebebbc6470b3eb791812fff7d5e0216c2be3827b30b8bb6603ed/pillow-12.2.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:2d192a155bbcec180f8564f693e6fd9bccff5a7af9b32e2e4bf8c9c69dbad6b5", size = 5308279, upload-time = "2026-04-01T14:43:13.246Z" },
{ url = "https://files.pythonhosted.org/packages/d8/95/0a351b9289c2b5cbde0bacd4a83ebc44023e835490a727b2a3bd60ddc0f4/pillow-12.2.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:f3f40b3c5a968281fd507d519e444c35f0ff171237f4fdde090dd60699458421", size = 4695490, upload-time = "2026-04-01T14:43:15.584Z" },
{ url = "https://files.pythonhosted.org/packages/de/af/4e8e6869cbed569d43c416fad3dc4ecb944cb5d9492defaed89ddd6fe871/pillow-12.2.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:03e7e372d5240cc23e9f07deca4d775c0817bffc641b01e9c3af208dbd300987", size = 6284462, upload-time = "2026-04-01T14:43:18.268Z" },
{ url = "https://files.pythonhosted.org/packages/e9/9e/c05e19657fd57841e476be1ab46c4d501bffbadbafdc31a6d665f8b737b6/pillow-12.2.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:b86024e52a1b269467a802258c25521e6d742349d760728092e1bc2d135b4d76", size = 8094744, upload-time = "2026-04-01T14:43:20.716Z" },
{ url = "https://files.pythonhosted.org/packages/2b/54/1789c455ed10176066b6e7e6da1b01e50e36f94ba584dc68d9eebfe9156d/pillow-12.2.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7371b48c4fa448d20d2714c9a1f775a81155050d383333e0a6c15b1123dda005", size = 6398371, upload-time = "2026-04-01T14:43:23.443Z" },
{ url = "https://files.pythonhosted.org/packages/43/e3/fdc657359e919462369869f1c9f0e973f353f9a9ee295a39b1fea8ee1a77/pillow-12.2.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:62f5409336adb0663b7caa0da5c7d9e7bdbaae9ce761d34669420c2a801b2780", size = 7087215, upload-time = "2026-04-01T14:43:26.758Z" },
{ url = "https://files.pythonhosted.org/packages/8b/f8/2f6825e441d5b1959d2ca5adec984210f1ec086435b0ed5f52c19b3b8a6e/pillow-12.2.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:01afa7cf67f74f09523699b4e88c73fb55c13346d212a59a2db1f86b0a63e8c5", size = 6509783, upload-time = "2026-04-01T14:43:29.56Z" },
{ url = "https://files.pythonhosted.org/packages/67/f9/029a27095ad20f854f9dba026b3ea6428548316e057e6fc3545409e86651/pillow-12.2.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:fc3d34d4a8fbec3e88a79b92e5465e0f9b842b628675850d860b8bd300b159f5", size = 7212112, upload-time = "2026-04-01T14:43:32.091Z" },
{ url = "https://files.pythonhosted.org/packages/4a/01/53d10cf0dbad820a8db274d259a37ba50b88b24768ddccec07355382d5ad/pillow-12.2.0-cp313-cp313-ios_13_0_arm64_iphoneos.whl", hash = "sha256:8297651f5b5679c19968abefd6bb84d95fe30ef712eb1b2d9b2d31ca61267f4c", size = 4100837, upload-time = "2026-04-01T14:43:41.506Z" },
{ url = "https://files.pythonhosted.org/packages/0f/98/f3a6657ecb698c937f6c76ee564882945f29b79bad496abcba0e84659ec5/pillow-12.2.0-cp313-cp313-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:50d8520da2a6ce0af445fa6d648c4273c3eeefbc32d7ce049f22e8b5c3daecc2", size = 4176528, upload-time = "2026-04-01T14:43:43.773Z" },
{ url = "https://files.pythonhosted.org/packages/69/bc/8986948f05e3ea490b8442ea1c1d4d990b24a7e43d8a51b2c7d8b1dced36/pillow-12.2.0-cp313-cp313-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:766cef22385fa1091258ad7e6216792b156dc16d8d3fa607e7545b2b72061f1c", size = 3640401, upload-time = "2026-04-01T14:43:45.87Z" },
{ url = "https://files.pythonhosted.org/packages/34/46/6c717baadcd62bc8ed51d238d521ab651eaa74838291bda1f86fe1f864c9/pillow-12.2.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:5d2fd0fa6b5d9d1de415060363433f28da8b1526c1c129020435e186794b3795", size = 5308094, upload-time = "2026-04-01T14:43:48.438Z" },
{ url = "https://files.pythonhosted.org/packages/71/43/905a14a8b17fdb1ccb58d282454490662d2cb89a6bfec26af6d3520da5ec/pillow-12.2.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:56b25336f502b6ed02e889f4ece894a72612fe885889a6e8c4c80239ff6e5f5f", size = 4695402, upload-time = "2026-04-01T14:43:51.292Z" },
{ url = "https://files.pythonhosted.org/packages/73/dd/42107efcb777b16fa0393317eac58f5b5cf30e8392e266e76e51cff28c3d/pillow-12.2.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:f1c943e96e85df3d3478f7b691f229887e143f81fedab9b20205349ab04d73ed", size = 6280005, upload-time = "2026-04-01T14:43:54.242Z" },
{ url = "https://files.pythonhosted.org/packages/a8/68/b93e09e5e8549019e61acf49f65b1a8530765a7f812c77a7461bca7e4494/pillow-12.2.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:03f6fab9219220f041c74aeaa2939ff0062bd5c364ba9ce037197f4c6d498cd9", size = 8090669, upload-time = "2026-04-01T14:43:57.335Z" },
{ url = "https://files.pythonhosted.org/packages/4b/6e/3ccb54ce8ec4ddd1accd2d89004308b7b0b21c4ac3d20fa70af4760a4330/pillow-12.2.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5cdfebd752ec52bf5bb4e35d9c64b40826bc5b40a13df7c3cda20a2c03a0f5ed", size = 6395194, upload-time = "2026-04-01T14:43:59.864Z" },
{ url = "https://files.pythonhosted.org/packages/67/ee/21d4e8536afd1a328f01b359b4d3997b291ffd35a237c877b331c1c3b71c/pillow-12.2.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:eedf4b74eda2b5a4b2b2fb4c006d6295df3bf29e459e198c90ea48e130dc75c3", size = 7082423, upload-time = "2026-04-01T14:44:02.74Z" },
{ url = "https://files.pythonhosted.org/packages/78/5f/e9f86ab0146464e8c133fe85df987ed9e77e08b29d8d35f9f9f4d6f917ba/pillow-12.2.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:00a2865911330191c0b818c59103b58a5e697cae67042366970a6b6f1b20b7f9", size = 6505667, upload-time = "2026-04-01T14:44:05.381Z" },
{ url = "https://files.pythonhosted.org/packages/ed/1e/409007f56a2fdce61584fd3acbc2bbc259857d555196cedcadc68c015c82/pillow-12.2.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:1e1757442ed87f4912397c6d35a0db6a7b52592156014706f17658ff58bbf795", size = 7208580, upload-time = "2026-04-01T14:44:08.39Z" },
{ url = "https://files.pythonhosted.org/packages/4d/a4/b342930964e3cb4dce5038ae34b0eab4653334995336cd486c5a8c25a00c/pillow-12.2.0-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:042db20a421b9bafecc4b84a8b6e444686bd9d836c7fd24542db3e7df7baad9b", size = 5309927, upload-time = "2026-04-01T14:44:18.89Z" },
{ url = "https://files.pythonhosted.org/packages/9f/de/23198e0a65a9cf06123f5435a5d95cea62a635697f8f03d134d3f3a96151/pillow-12.2.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:dd025009355c926a84a612fecf58bb315a3f6814b17ead51a8e48d3823d9087f", size = 4698624, upload-time = "2026-04-01T14:44:21.115Z" },
{ url = "https://files.pythonhosted.org/packages/01/a6/1265e977f17d93ea37aa28aa81bad4fa597933879fac2520d24e021c8da3/pillow-12.2.0-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:88ddbc66737e277852913bd1e07c150cc7bb124539f94c4e2df5344494e0a612", size = 6321252, upload-time = "2026-04-01T14:44:23.663Z" },
{ url = "https://files.pythonhosted.org/packages/3c/83/5982eb4a285967baa70340320be9f88e57665a387e3a53a7f0db8231a0cd/pillow-12.2.0-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:d362d1878f00c142b7e1a16e6e5e780f02be8195123f164edf7eddd911eefe7c", size = 8126550, upload-time = "2026-04-01T14:44:26.772Z" },
{ url = "https://files.pythonhosted.org/packages/4e/48/6ffc514adce69f6050d0753b1a18fd920fce8cac87620d5a31231b04bfc5/pillow-12.2.0-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:2c727a6d53cb0018aadd8018c2b938376af27914a68a492f59dfcaca650d5eea", size = 6433114, upload-time = "2026-04-01T14:44:29.615Z" },
{ url = "https://files.pythonhosted.org/packages/36/a3/f9a77144231fb8d40ee27107b4463e205fa4677e2ca2548e14da5cf18dce/pillow-12.2.0-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:efd8c21c98c5cc60653bcb311bef2ce0401642b7ce9d09e03a7da87c878289d4", size = 7115667, upload-time = "2026-04-01T14:44:32.773Z" },
{ url = "https://files.pythonhosted.org/packages/c1/fc/ac4ee3041e7d5a565e1c4fd72a113f03b6394cc72ab7089d27608f8aaccb/pillow-12.2.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:9f08483a632889536b8139663db60f6724bfcb443c96f1b18855860d7d5c0fd4", size = 6538966, upload-time = "2026-04-01T14:44:35.252Z" },
{ url = "https://files.pythonhosted.org/packages/c0/a8/27fb307055087f3668f6d0a8ccb636e7431d56ed0750e07a60547b1e083e/pillow-12.2.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:dac8d77255a37e81a2efcbd1fc05f1c15ee82200e6c240d7e127e25e365c39ea", size = 7238241, upload-time = "2026-04-01T14:44:37.875Z" },
{ url = "https://files.pythonhosted.org/packages/bf/98/4595daa2365416a86cb0d495248a393dfc84e96d62ad080c8546256cb9c0/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphoneos.whl", hash = "sha256:3adc9215e8be0448ed6e814966ecf3d9952f0ea40eb14e89a102b87f450660d8", size = 4100848, upload-time = "2026-04-01T14:44:48.48Z" },
{ url = "https://files.pythonhosted.org/packages/0b/79/40184d464cf89f6663e18dfcf7ca21aae2491fff1a16127681bf1fa9b8cf/pillow-12.2.0-cp314-cp314-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:6a9adfc6d24b10f89588096364cc726174118c62130c817c2837c60cf08a392b", size = 4176515, upload-time = "2026-04-01T14:44:51.353Z" },
{ url = "https://files.pythonhosted.org/packages/b0/63/703f86fd4c422a9cf722833670f4f71418fb116b2853ff7da722ea43f184/pillow-12.2.0-cp314-cp314-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:6a6e67ea2e6feda684ed370f9a1c52e7a243631c025ba42149a2cc5934dec295", size = 3640159, upload-time = "2026-04-01T14:44:53.588Z" },
{ url = "https://files.pythonhosted.org/packages/71/e0/fb22f797187d0be2270f83500aab851536101b254bfa1eae10795709d283/pillow-12.2.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:2bb4a8d594eacdfc59d9e5ad972aa8afdd48d584ffd5f13a937a664c3e7db0ed", size = 5312185, upload-time = "2026-04-01T14:44:56.039Z" },
{ url = "https://files.pythonhosted.org/packages/ba/8c/1a9e46228571de18f8e28f16fabdfc20212a5d019f3e3303452b3f0a580d/pillow-12.2.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:80b2da48193b2f33ed0c32c38140f9d3186583ce7d516526d462645fd98660ae", size = 4695386, upload-time = "2026-04-01T14:44:58.663Z" },
{ url = "https://files.pythonhosted.org/packages/70/62/98f6b7f0c88b9addd0e87c217ded307b36be024d4ff8869a812b241d1345/pillow-12.2.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:22db17c68434de69d8ecfc2fe821569195c0c373b25cccb9cbdacf2c6e53c601", size = 6280384, upload-time = "2026-04-01T14:45:01.5Z" },
{ url = "https://files.pythonhosted.org/packages/5e/03/688747d2e91cfbe0e64f316cd2e8005698f76ada3130d0194664174fa5de/pillow-12.2.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7b14cc0106cd9aecda615dd6903840a058b4700fcb817687d0ee4fc8b6e389be", size = 8091599, upload-time = "2026-04-01T14:45:04.5Z" },
{ url = "https://files.pythonhosted.org/packages/f6/35/577e22b936fcdd66537329b33af0b4ccfefaeabd8aec04b266528cddb33c/pillow-12.2.0-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8cbeb542b2ebc6fcdacabf8aca8c1a97c9b3ad3927d46b8723f9d4f033288a0f", size = 6396021, upload-time = "2026-04-01T14:45:07.117Z" },
{ url = "https://files.pythonhosted.org/packages/11/8d/d2532ad2a603ca2b93ad9f5135732124e57811d0168155852f37fbce2458/pillow-12.2.0-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4bfd07bc812fbd20395212969e41931001fd59eb55a60658b0e5710872e95286", size = 7083360, upload-time = "2026-04-01T14:45:09.763Z" },
{ url = "https://files.pythonhosted.org/packages/5e/26/d325f9f56c7e039034897e7380e9cc202b1e368bfd04d4cbe6a441f02885/pillow-12.2.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:9aba9a17b623ef750a4d11b742cbafffeb48a869821252b30ee21b5e91392c50", size = 6507628, upload-time = "2026-04-01T14:45:12.378Z" },
{ url = "https://files.pythonhosted.org/packages/5f/f7/769d5632ffb0988f1c5e7660b3e731e30f7f8ec4318e94d0a5d674eb65a4/pillow-12.2.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:deede7c263feb25dba4e82ea23058a235dcc2fe1f6021025dc71f2b618e26104", size = 7209321, upload-time = "2026-04-01T14:45:15.122Z" },
{ url = "https://files.pythonhosted.org/packages/b6/ab/1b426a3974cb0e7da5c29ccff4807871d48110933a57207b5a676cccc155/pillow-12.2.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:57850958fe9c751670e49b2cecf6294acc99e562531f4bd317fa5ddee2068463", size = 5314225, upload-time = "2026-04-01T14:45:25.637Z" },
{ url = "https://files.pythonhosted.org/packages/19/1e/dce46f371be2438eecfee2a1960ee2a243bbe5e961890146d2dee1ff0f12/pillow-12.2.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:d5d38f1411c0ed9f97bcb49b7bd59b6b7c314e0e27420e34d99d844b9ce3b6f3", size = 4698541, upload-time = "2026-04-01T14:45:28.355Z" },
{ url = "https://files.pythonhosted.org/packages/55/c3/7fbecf70adb3a0c33b77a300dc52e424dc22ad8cdc06557a2e49523b703d/pillow-12.2.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5c0a9f29ca8e79f09de89293f82fc9b0270bb4af1d58bc98f540cc4aedf03166", size = 6322251, upload-time = "2026-04-01T14:45:30.924Z" },
{ url = "https://files.pythonhosted.org/packages/1c/3c/7fbc17cfb7e4fe0ef1642e0abc17fc6c94c9f7a16be41498e12e2ba60408/pillow-12.2.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1610dd6c61621ae1cf811bef44d77e149ce3f7b95afe66a4512f8c59f25d9ebe", size = 8127807, upload-time = "2026-04-01T14:45:33.908Z" },
{ url = "https://files.pythonhosted.org/packages/ff/c3/a8ae14d6defd2e448493ff512fae903b1e9bd40b72efb6ec55ce0048c8ce/pillow-12.2.0-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0a34329707af4f73cf1782a36cd2289c0368880654a2c11f027bcee9052d35dd", size = 6433935, upload-time = "2026-04-01T14:45:36.623Z" },
{ url = "https://files.pythonhosted.org/packages/6e/32/2880fb3a074847ac159d8f902cb43278a61e85f681661e7419e6596803ed/pillow-12.2.0-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8e9c4f5b3c546fa3458a29ab22646c1c6c787ea8f5ef51300e5a60300736905e", size = 7116720, upload-time = "2026-04-01T14:45:39.258Z" },
{ url = "https://files.pythonhosted.org/packages/46/87/495cc9c30e0129501643f24d320076f4cc54f718341df18cc70ec94c44e1/pillow-12.2.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fb043ee2f06b41473269765c2feae53fc2e2fbf96e5e22ca94fb5ad677856f06", size = 6540498, upload-time = "2026-04-01T14:45:41.879Z" },
{ url = "https://files.pythonhosted.org/packages/18/53/773f5edca692009d883a72211b60fdaf8871cbef075eaa9d577f0a2f989e/pillow-12.2.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:f278f034eb75b4e8a13a54a876cc4a5ab39173d2cdd93a638e1b467fc545ac43", size = 7239413, upload-time = "2026-04-01T14:45:44.705Z" },
{ url = "https://files.pythonhosted.org/packages/4e/b7/2437044fb910f499610356d1352e3423753c98e34f915252aafecc64889f/pillow-12.2.0-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:0538bd5e05efec03ae613fd89c4ce0368ecd2ba239cc25b9f9be7ed426b0af1f", size = 5273969, upload-time = "2026-04-01T14:45:55.538Z" },
{ url = "https://files.pythonhosted.org/packages/f6/f4/8316e31de11b780f4ac08ef3654a75555e624a98db1056ecb2122d008d5a/pillow-12.2.0-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:394167b21da716608eac917c60aa9b969421b5dcbbe02ae7f013e7b85811c69d", size = 4659674, upload-time = "2026-04-01T14:45:58.093Z" },
{ url = "https://files.pythonhosted.org/packages/d4/37/664fca7201f8bb2aa1d20e2c3d5564a62e6ae5111741966c8319ca802361/pillow-12.2.0-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5d04bfa02cc2d23b497d1e90a0f927070043f6cbf303e738300532379a4b4e0f", size = 5288479, upload-time = "2026-04-01T14:46:01.141Z" },
{ url = "https://files.pythonhosted.org/packages/49/62/5b0ed78fce87346be7a5cfcfaaad91f6a1f98c26f86bdbafa2066c647ef6/pillow-12.2.0-pp311-pypy311_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:0c838a5125cee37e68edec915651521191cef1e6aa336b855f495766e77a366e", size = 7032230, upload-time = "2026-04-01T14:46:03.874Z" },
{ url = "https://files.pythonhosted.org/packages/c3/28/ec0fc38107fc32536908034e990c47914c57cd7c5a3ece4d8d8f7ffd7e27/pillow-12.2.0-pp311-pypy311_pp73-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4a6c9fa44005fa37a91ebfc95d081e8079757d2e904b27103f4f5fa6f0bf78c0", size = 5355404, upload-time = "2026-04-01T14:46:06.33Z" },
{ url = "https://files.pythonhosted.org/packages/5e/8b/51b0eddcfa2180d60e41f06bd6d0a62202b20b59c68f5a132e615b75aecf/pillow-12.2.0-pp311-pypy311_pp73-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:25373b66e0dd5905ed63fa3cae13c82fbddf3079f2c8bf15c6fb6a35586324c1", size = 6002215, upload-time = "2026-04-01T14:46:08.83Z" },
{ url = "https://files.pythonhosted.org/packages/2b/46/5da1ec4a5171ee7bf1a0efa064aba70ba3d6e0788ce3f5acd1375d23c8c0/pillow-12.1.1-cp311-cp311-macosx_10_10_x86_64.whl", hash = "sha256:e879bb6cd5c73848ef3b2b48b8af9ff08c5b71ecda8048b7dd22d8a33f60be32", size = 5304084, upload-time = "2026-02-11T04:20:27.501Z" },
{ url = "https://files.pythonhosted.org/packages/78/93/a29e9bc02d1cf557a834da780ceccd54e02421627200696fcf805ebdc3fb/pillow-12.1.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:365b10bb9417dd4498c0e3b128018c4a624dc11c7b97d8cc54effe3b096f4c38", size = 4657866, upload-time = "2026-02-11T04:20:29.827Z" },
{ url = "https://files.pythonhosted.org/packages/13/84/583a4558d492a179d31e4aae32eadce94b9acf49c0337c4ce0b70e0a01f2/pillow-12.1.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:d4ce8e329c93845720cd2014659ca67eac35f6433fd3050393d85f3ecef0dad5", size = 6232148, upload-time = "2026-02-11T04:20:31.329Z" },
{ url = "https://files.pythonhosted.org/packages/d5/e2/53c43334bbbb2d3b938978532fbda8e62bb6e0b23a26ce8592f36bcc4987/pillow-12.1.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:fc354a04072b765eccf2204f588a7a532c9511e8b9c7f900e1b64e3e33487090", size = 8038007, upload-time = "2026-02-11T04:20:34.225Z" },
{ url = "https://files.pythonhosted.org/packages/b8/a6/3d0e79c8a9d58150dd98e199d7c1c56861027f3829a3a60b3c2784190180/pillow-12.1.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7e7976bf1910a8116b523b9f9f58bf410f3e8aa330cd9a2bb2953f9266ab49af", size = 6345418, upload-time = "2026-02-11T04:20:35.858Z" },
{ url = "https://files.pythonhosted.org/packages/a2/c8/46dfeac5825e600579157eea177be43e2f7ff4a99da9d0d0a49533509ac5/pillow-12.1.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:597bd9c8419bc7c6af5604e55847789b69123bbe25d65cc6ad3012b4f3c98d8b", size = 7034590, upload-time = "2026-02-11T04:20:37.91Z" },
{ url = "https://files.pythonhosted.org/packages/af/bf/e6f65d3db8a8bbfeaf9e13cc0417813f6319863a73de934f14b2229ada18/pillow-12.1.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:2c1fc0f2ca5f96a3c8407e41cca26a16e46b21060fe6d5b099d2cb01412222f5", size = 6458655, upload-time = "2026-02-11T04:20:39.496Z" },
{ url = "https://files.pythonhosted.org/packages/f9/c2/66091f3f34a25894ca129362e510b956ef26f8fb67a0e6417bc5744e56f1/pillow-12.1.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:578510d88c6229d735855e1f278aa305270438d36a05031dfaae5067cc8eb04d", size = 7159286, upload-time = "2026-02-11T04:20:41.139Z" },
{ url = "https://files.pythonhosted.org/packages/07/d3/8df65da0d4df36b094351dce696f2989bec731d4f10e743b1c5f4da4d3bf/pillow-12.1.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:ab323b787d6e18b3d91a72fc99b1a2c28651e4358749842b8f8dfacd28ef2052", size = 5262803, upload-time = "2026-02-11T04:20:47.653Z" },
{ url = "https://files.pythonhosted.org/packages/d6/71/5026395b290ff404b836e636f51d7297e6c83beceaa87c592718747e670f/pillow-12.1.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:adebb5bee0f0af4909c30db0d890c773d1a92ffe83da908e2e9e720f8edf3984", size = 4657601, upload-time = "2026-02-11T04:20:49.328Z" },
{ url = "https://files.pythonhosted.org/packages/b1/2e/1001613d941c67442f745aff0f7cc66dd8df9a9c084eb497e6a543ee6f7e/pillow-12.1.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:bb66b7cc26f50977108790e2456b7921e773f23db5630261102233eb355a3b79", size = 6234995, upload-time = "2026-02-11T04:20:51.032Z" },
{ url = "https://files.pythonhosted.org/packages/07/26/246ab11455b2549b9233dbd44d358d033a2f780fa9007b61a913c5b2d24e/pillow-12.1.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:aee2810642b2898bb187ced9b349e95d2a7272930796e022efaf12e99dccd293", size = 8045012, upload-time = "2026-02-11T04:20:52.882Z" },
{ url = "https://files.pythonhosted.org/packages/b2/8b/07587069c27be7535ac1fe33874e32de118fbd34e2a73b7f83436a88368c/pillow-12.1.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a0b1cd6232e2b618adcc54d9882e4e662a089d5768cd188f7c245b4c8c44a397", size = 6349638, upload-time = "2026-02-11T04:20:54.444Z" },
{ url = "https://files.pythonhosted.org/packages/ff/79/6df7b2ee763d619cda2fb4fea498e5f79d984dae304d45a8999b80d6cf5c/pillow-12.1.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7aac39bcf8d4770d089588a2e1dd111cbaa42df5a94be3114222057d68336bd0", size = 7041540, upload-time = "2026-02-11T04:20:55.97Z" },
{ url = "https://files.pythonhosted.org/packages/2c/5e/2ba19e7e7236d7529f4d873bdaf317a318896bac289abebd4bb00ef247f0/pillow-12.1.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:ab174cd7d29a62dd139c44bf74b698039328f45cb03b4596c43473a46656b2f3", size = 6462613, upload-time = "2026-02-11T04:20:57.542Z" },
{ url = "https://files.pythonhosted.org/packages/03/03/31216ec124bb5c3dacd74ce8efff4cc7f52643653bad4825f8f08c697743/pillow-12.1.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:339ffdcb7cbeaa08221cd401d517d4b1fe7a9ed5d400e4a8039719238620ca35", size = 7166745, upload-time = "2026-02-11T04:20:59.196Z" },
{ url = "https://files.pythonhosted.org/packages/d5/11/6db24d4bd7685583caeae54b7009584e38da3c3d4488ed4cd25b439de486/pillow-12.1.1-cp313-cp313-ios_13_0_arm64_iphoneos.whl", hash = "sha256:d242e8ac078781f1de88bf823d70c1a9b3c7950a44cdf4b7c012e22ccbcd8e4e", size = 4062689, upload-time = "2026-02-11T04:21:06.804Z" },
{ url = "https://files.pythonhosted.org/packages/33/c0/ce6d3b1fe190f0021203e0d9b5b99e57843e345f15f9ef22fcd43842fd21/pillow-12.1.1-cp313-cp313-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:02f84dfad02693676692746df05b89cf25597560db2857363a208e393429f5e9", size = 4138535, upload-time = "2026-02-11T04:21:08.452Z" },
{ url = "https://files.pythonhosted.org/packages/a0/c6/d5eb6a4fb32a3f9c21a8c7613ec706534ea1cf9f4b3663e99f0d83f6fca8/pillow-12.1.1-cp313-cp313-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:e65498daf4b583091ccbb2556c7000abf0f3349fcd57ef7adc9a84a394ed29f6", size = 3601364, upload-time = "2026-02-11T04:21:10.194Z" },
{ url = "https://files.pythonhosted.org/packages/14/a1/16c4b823838ba4c9c52c0e6bbda903a3fe5a1bdbf1b8eb4fff7156f3e318/pillow-12.1.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:6c6db3b84c87d48d0088943bf33440e0c42370b99b1c2a7989216f7b42eede60", size = 5262561, upload-time = "2026-02-11T04:21:11.742Z" },
{ url = "https://files.pythonhosted.org/packages/bb/ad/ad9dc98ff24f485008aa5cdedaf1a219876f6f6c42a4626c08bc4e80b120/pillow-12.1.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:8b7e5304e34942bf62e15184219a7b5ad4ff7f3bb5cca4d984f37df1a0e1aee2", size = 4657460, upload-time = "2026-02-11T04:21:13.786Z" },
{ url = "https://files.pythonhosted.org/packages/9e/1b/f1a4ea9a895b5732152789326202a82464d5254759fbacae4deea3069334/pillow-12.1.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:18e5bddd742a44b7e6b1e773ab5db102bd7a94c32555ba656e76d319d19c3850", size = 6232698, upload-time = "2026-02-11T04:21:15.949Z" },
{ url = "https://files.pythonhosted.org/packages/95/f4/86f51b8745070daf21fd2e5b1fe0eb35d4db9ca26e6d58366562fb56a743/pillow-12.1.1-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:fc44ef1f3de4f45b50ccf9136999d71abb99dca7706bc75d222ed350b9fd2289", size = 8041706, upload-time = "2026-02-11T04:21:17.723Z" },
{ url = "https://files.pythonhosted.org/packages/29/9b/d6ecd956bb1266dd1045e995cce9b8d77759e740953a1c9aad9502a0461e/pillow-12.1.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5a8eb7ed8d4198bccbd07058416eeec51686b498e784eda166395a23eb99138e", size = 6346621, upload-time = "2026-02-11T04:21:19.547Z" },
{ url = "https://files.pythonhosted.org/packages/71/24/538bff45bde96535d7d998c6fed1a751c75ac7c53c37c90dc2601b243893/pillow-12.1.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:47b94983da0c642de92ced1702c5b6c292a84bd3a8e1d1702ff923f183594717", size = 7038069, upload-time = "2026-02-11T04:21:21.378Z" },
{ url = "https://files.pythonhosted.org/packages/94/0e/58cb1a6bc48f746bc4cb3adb8cabff73e2742c92b3bf7a220b7cf69b9177/pillow-12.1.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:518a48c2aab7ce596d3bf79d0e275661b846e86e4d0e7dec34712c30fe07f02a", size = 6460040, upload-time = "2026-02-11T04:21:23.148Z" },
{ url = "https://files.pythonhosted.org/packages/6c/57/9045cb3ff11eeb6c1adce3b2d60d7d299d7b273a2e6c8381a524abfdc474/pillow-12.1.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:a550ae29b95c6dc13cf69e2c9dc5747f814c54eeb2e32d683e5e93af56caa029", size = 7164523, upload-time = "2026-02-11T04:21:25.01Z" },
{ url = "https://files.pythonhosted.org/packages/19/2a/b9d62794fc8a0dd14c1943df68347badbd5511103e0d04c035ffe5cf2255/pillow-12.1.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:0330d233c1a0ead844fc097a7d16c0abff4c12e856c0b325f231820fee1f39da", size = 5264880, upload-time = "2026-02-11T04:21:32.865Z" },
{ url = "https://files.pythonhosted.org/packages/26/9d/e03d857d1347fa5ed9247e123fcd2a97b6220e15e9cb73ca0a8d91702c6e/pillow-12.1.1-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:5dae5f21afb91322f2ff791895ddd8889e5e947ff59f71b46041c8ce6db790bc", size = 4660616, upload-time = "2026-02-11T04:21:34.97Z" },
{ url = "https://files.pythonhosted.org/packages/f7/ec/8a6d22afd02570d30954e043f09c32772bfe143ba9285e2fdb11284952cd/pillow-12.1.1-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:2e0c664be47252947d870ac0d327fea7e63985a08794758aa8af5b6cb6ec0c9c", size = 6269008, upload-time = "2026-02-11T04:21:36.623Z" },
{ url = "https://files.pythonhosted.org/packages/3d/1d/6d875422c9f28a4a361f495a5f68d9de4a66941dc2c619103ca335fa6446/pillow-12.1.1-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:691ab2ac363b8217f7d31b3497108fb1f50faab2f75dfb03284ec2f217e87bf8", size = 8073226, upload-time = "2026-02-11T04:21:38.585Z" },
{ url = "https://files.pythonhosted.org/packages/a1/cd/134b0b6ee5eda6dc09e25e24b40fdafe11a520bc725c1d0bbaa5e00bf95b/pillow-12.1.1-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e9e8064fb1cc019296958595f6db671fba95209e3ceb0c4734c9baf97de04b20", size = 6380136, upload-time = "2026-02-11T04:21:40.562Z" },
{ url = "https://files.pythonhosted.org/packages/7a/a9/7628f013f18f001c1b98d8fffe3452f306a70dc6aba7d931019e0492f45e/pillow-12.1.1-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:472a8d7ded663e6162dafdf20015c486a7009483ca671cece7a9279b512fcb13", size = 7067129, upload-time = "2026-02-11T04:21:42.521Z" },
{ url = "https://files.pythonhosted.org/packages/1e/f8/66ab30a2193b277785601e82ee2d49f68ea575d9637e5e234faaa98efa4c/pillow-12.1.1-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:89b54027a766529136a06cfebeecb3a04900397a3590fd252160b888479517bf", size = 6491807, upload-time = "2026-02-11T04:21:44.22Z" },
{ url = "https://files.pythonhosted.org/packages/da/0b/a877a6627dc8318fdb84e357c5e1a758c0941ab1ddffdafd231983788579/pillow-12.1.1-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:86172b0831b82ce4f7877f280055892b31179e1576aa00d0df3bb1bbf8c3e524", size = 7190954, upload-time = "2026-02-11T04:21:46.114Z" },
{ url = "https://files.pythonhosted.org/packages/03/d0/bebb3ffbf31c5a8e97241476c4cf8b9828954693ce6744b4a2326af3e16b/pillow-12.1.1-cp314-cp314-ios_13_0_arm64_iphoneos.whl", hash = "sha256:417423db963cb4be8bac3fc1204fe61610f6abeed1580a7a2cbb2fbda20f12af", size = 4062652, upload-time = "2026-02-11T04:21:53.19Z" },
{ url = "https://files.pythonhosted.org/packages/2d/c0/0e16fb0addda4851445c28f8350d8c512f09de27bbb0d6d0bbf8b6709605/pillow-12.1.1-cp314-cp314-ios_13_0_arm64_iphonesimulator.whl", hash = "sha256:b957b71c6b2387610f556a7eb0828afbe40b4a98036fc0d2acfa5a44a0c2036f", size = 4138823, upload-time = "2026-02-11T04:22:03.088Z" },
{ url = "https://files.pythonhosted.org/packages/6b/fb/6170ec655d6f6bb6630a013dd7cf7bc218423d7b5fa9071bf63dc32175ae/pillow-12.1.1-cp314-cp314-ios_13_0_x86_64_iphonesimulator.whl", hash = "sha256:097690ba1f2efdeb165a20469d59d8bb03c55fb6621eb2041a060ae8ea3e9642", size = 3601143, upload-time = "2026-02-11T04:22:04.909Z" },
{ url = "https://files.pythonhosted.org/packages/59/04/dc5c3f297510ba9a6837cbb318b87dd2b8f73eb41a43cc63767f65cb599c/pillow-12.1.1-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:2815a87ab27848db0321fb78c7f0b2c8649dee134b7f2b80c6a45c6831d75ccd", size = 5266254, upload-time = "2026-02-11T04:22:07.656Z" },
{ url = "https://files.pythonhosted.org/packages/05/30/5db1236b0d6313f03ebf97f5e17cda9ca060f524b2fcc875149a8360b21c/pillow-12.1.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:f7ed2c6543bad5a7d5530eb9e78c53132f93dfa44a28492db88b41cdab885202", size = 4657499, upload-time = "2026-02-11T04:22:09.613Z" },
{ url = "https://files.pythonhosted.org/packages/6f/18/008d2ca0eb612e81968e8be0bbae5051efba24d52debf930126d7eaacbba/pillow-12.1.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:652a2c9ccfb556235b2b501a3a7cf3742148cd22e04b5625c5fe057ea3e3191f", size = 6232137, upload-time = "2026-02-11T04:22:11.434Z" },
{ url = "https://files.pythonhosted.org/packages/70/f1/f14d5b8eeb4b2cd62b9f9f847eb6605f103df89ef619ac68f92f748614ea/pillow-12.1.1-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:d6e4571eedf43af33d0fc233a382a76e849badbccdf1ac438841308652a08e1f", size = 8042721, upload-time = "2026-02-11T04:22:13.321Z" },
{ url = "https://files.pythonhosted.org/packages/5a/d6/17824509146e4babbdabf04d8171491fa9d776f7061ff6e727522df9bd03/pillow-12.1.1-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b574c51cf7d5d62e9be37ba446224b59a2da26dc4c1bb2ecbe936a4fb1a7cb7f", size = 6347798, upload-time = "2026-02-11T04:22:15.449Z" },
{ url = "https://files.pythonhosted.org/packages/d1/ee/c85a38a9ab92037a75615aba572c85ea51e605265036e00c5b67dfafbfe2/pillow-12.1.1-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a37691702ed687799de29a518d63d4682d9016932db66d4e90c345831b02fb4e", size = 7039315, upload-time = "2026-02-11T04:22:17.24Z" },
{ url = "https://files.pythonhosted.org/packages/ec/f3/bc8ccc6e08a148290d7523bde4d9a0d6c981db34631390dc6e6ec34cacf6/pillow-12.1.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:f95c00d5d6700b2b890479664a06e754974848afaae5e21beb4d83c106923fd0", size = 6462360, upload-time = "2026-02-11T04:22:19.111Z" },
{ url = "https://files.pythonhosted.org/packages/f6/ab/69a42656adb1d0665ab051eec58a41f169ad295cf81ad45406963105408f/pillow-12.1.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:559b38da23606e68681337ad74622c4dbba02254fc9cb4488a305dd5975c7eeb", size = 7165438, upload-time = "2026-02-11T04:22:21.041Z" },
{ url = "https://files.pythonhosted.org/packages/6c/9d/efd18493f9de13b87ede7c47e69184b9e859e4427225ea962e32e56a49bc/pillow-12.1.1-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:1f90cff8aa76835cba5769f0b3121a22bd4eb9e6884cfe338216e557a9a548b8", size = 5268612, upload-time = "2026-02-11T04:22:29.884Z" },
{ url = "https://files.pythonhosted.org/packages/f8/f1/4f42eb2b388eb2ffc660dcb7f7b556c1015c53ebd5f7f754965ef997585b/pillow-12.1.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:1f1be78ce9466a7ee64bfda57bdba0f7cc499d9794d518b854816c41bf0aa4e9", size = 4660567, upload-time = "2026-02-11T04:22:31.799Z" },
{ url = "https://files.pythonhosted.org/packages/01/54/df6ef130fa43e4b82e32624a7b821a2be1c5653a5fdad8469687a7db4e00/pillow-12.1.1-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:42fc1f4677106188ad9a55562bbade416f8b55456f522430fadab3cef7cd4e60", size = 6269951, upload-time = "2026-02-11T04:22:33.921Z" },
{ url = "https://files.pythonhosted.org/packages/a9/48/618752d06cc44bb4aae8ce0cd4e6426871929ed7b46215638088270d9b34/pillow-12.1.1-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:98edb152429ab62a1818039744d8fbb3ccab98a7c29fc3d5fcef158f3f1f68b7", size = 8074769, upload-time = "2026-02-11T04:22:35.877Z" },
{ url = "https://files.pythonhosted.org/packages/c3/bd/f1d71eb39a72fa088d938655afba3e00b38018d052752f435838961127d8/pillow-12.1.1-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d470ab1178551dd17fdba0fef463359c41aaa613cdcd7ff8373f54be629f9f8f", size = 6381358, upload-time = "2026-02-11T04:22:37.698Z" },
{ url = "https://files.pythonhosted.org/packages/64/ef/c784e20b96674ed36a5af839305f55616f8b4f8aa8eeccf8531a6e312243/pillow-12.1.1-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:6408a7b064595afcab0a49393a413732a35788f2a5092fdc6266952ed67de586", size = 7068558, upload-time = "2026-02-11T04:22:39.597Z" },
{ url = "https://files.pythonhosted.org/packages/73/cb/8059688b74422ae61278202c4e1ad992e8a2e7375227be0a21c6b87ca8d5/pillow-12.1.1-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:5d8c41325b382c07799a3682c1c258469ea2ff97103c53717b7893862d0c98ce", size = 6493028, upload-time = "2026-02-11T04:22:42.73Z" },
{ url = "https://files.pythonhosted.org/packages/c6/da/e3c008ed7d2dd1f905b15949325934510b9d1931e5df999bb15972756818/pillow-12.1.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:c7697918b5be27424e9ce568193efd13d925c4481dd364e43f5dff72d33e10f8", size = 7191940, upload-time = "2026-02-11T04:22:44.543Z" },
{ url = "https://files.pythonhosted.org/packages/56/11/5d43209aa4cb58e0cc80127956ff1796a68b928e6324bbf06ef4db34367b/pillow-12.1.1-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:600fd103672b925fe62ed08e0d874ea34d692474df6f4bf7ebe148b30f89f39f", size = 5228606, upload-time = "2026-02-11T04:22:52.106Z" },
{ url = "https://files.pythonhosted.org/packages/5f/d5/3b005b4e4fda6698b371fa6c21b097d4707585d7db99e98d9b0b87ac612a/pillow-12.1.1-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:665e1b916b043cef294bc54d47bf02d87e13f769bc4bc5fa225a24b3a6c5aca9", size = 4622321, upload-time = "2026-02-11T04:22:53.827Z" },
{ url = "https://files.pythonhosted.org/packages/df/36/ed3ea2d594356fd8037e5a01f6156c74bc8d92dbb0fa60746cc96cabb6e8/pillow-12.1.1-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:495c302af3aad1ca67420ddd5c7bd480c8867ad173528767d906428057a11f0e", size = 5247579, upload-time = "2026-02-11T04:22:56.094Z" },
{ url = "https://files.pythonhosted.org/packages/54/9a/9cc3e029683cf6d20ae5085da0dafc63148e3252c2f13328e553aaa13cfb/pillow-12.1.1-pp311-pypy311_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:8fd420ef0c52c88b5a035a0886f367748c72147b2b8f384c9d12656678dfdfa9", size = 6989094, upload-time = "2026-02-11T04:22:58.288Z" },
{ url = "https://files.pythonhosted.org/packages/00/98/fc53ab36da80b88df0967896b6c4b4cd948a0dc5aa40a754266aa3ae48b3/pillow-12.1.1-pp311-pypy311_pp73-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f975aa7ef9684ce7e2c18a3aa8f8e2106ce1e46b94ab713d156b2898811651d3", size = 5313850, upload-time = "2026-02-11T04:23:00.554Z" },
{ url = "https://files.pythonhosted.org/packages/30/02/00fa585abfd9fe9d73e5f6e554dc36cc2b842898cbfc46d70353dae227f8/pillow-12.1.1-pp311-pypy311_pp73-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8089c852a56c2966cf18835db62d9b34fef7ba74c726ad943928d494fa7f4735", size = 5963343, upload-time = "2026-02-11T04:23:02.934Z" },
]
[[package]]
@@ -3754,7 +3769,7 @@ wheels = [
[[package]]
name = "pytest"
version = "9.0.3"
version = "9.0.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "iniconfig", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -3762,9 +3777,9 @@ dependencies = [
{ name = "pluggy", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "pygments", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/7d/0d/549bd94f1a0a402dc8cf64563a117c0f3765662e2e668477624baeec44d5/pytest-9.0.3.tar.gz", hash = "sha256:b86ada508af81d19edeb213c681b1d48246c1a91d304c6c81a427674c17eb91c", size = 1572165, upload-time = "2026-04-07T17:16:18.027Z" }
sdist = { url = "https://files.pythonhosted.org/packages/d1/db/7ef3487e0fb0049ddb5ce41d3a49c235bf9ad299b6a25d5780a89f19230f/pytest-9.0.2.tar.gz", hash = "sha256:75186651a92bd89611d1d9fc20f0b4345fd827c41ccd5c299a868a05d70edf11", size = 1568901, upload-time = "2025-12-06T21:30:51.014Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d4/24/a372aaf5c9b7208e7112038812994107bc65a84cd00e0354a88c2c77a617/pytest-9.0.3-py3-none-any.whl", hash = "sha256:2c5efc453d45394fdd706ade797c0a81091eccd1d6e4bccfcd476e2b8e0ab5d9", size = 375249, upload-time = "2026-04-07T17:16:16.13Z" },
{ url = "https://files.pythonhosted.org/packages/3b/ab/b3226f0bd7cdcf710fbede2b3548584366da3b19b5021e74f5bde2a8fa3f/pytest-9.0.2-py3-none-any.whl", hash = "sha256:711ffd45bf766d5264d487b917733b453d917afd2b0ad65223959f59089f875b", size = 374801, upload-time = "2025-12-06T21:30:49.154Z" },
]
[[package]]