mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2026-06-28 16:24:19 +00:00
Docs: bulk-edit operation registry spec + implementation plan
Adds the extensibility design that hoists the bulk-edit operation definition (today smeared across serialisers.py, views.py, bulk_edit.py and keyed three different ways) behind a BulkEditOperation registry + PermissionRequirements value object, with per-operation OpenAPI examples. Contract-preserving refactor; both docs reviewed across multiple passes (permission matrix verified, both view call sites accounted for). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,524 @@
|
||||
# Bulk-Edit Operation Registry Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Collapse the bulk-edit operation definition — today smeared across 8 sites in 3 files, keyed 3 different ways — into a single `BulkEditOperation` object per operation, held in an ordered registry. The serializer and both view call sites consume the registry instead of re-encoding the operation list. The wire/API contract is preserved byte-for-byte; per-operation OpenAPI examples are added so the bulk API documents itself.
|
||||
|
||||
**Architecture:** A new `documents/bulk_operations.py` defines a `BulkEditOperation` ABC, a frozen `PermissionRequirements` value object, a per-operation DRF parameter serializer (validation + coercion), and an ordered `BULK_EDIT_OPERATIONS` registry whose 16 entries wrap the existing `bulk_edit.py` functions (which are unchanged). `BulkEditSerializer` resolves a method string to an operation and delegates parameter validation; `BulkEditView.post` and `_execute_document_action` read `op.needs_user` / `op.audit_field` / `op.required_permissions(...)` instead of the `METHOD_NAMES_*` sets, `MODIFIED_FIELD_BY_METHOD`, and the three `method in [...]` permission blocks.
|
||||
|
||||
**Tech Stack:** Python ≥3.11, Django REST Framework, drf-spectacular, pytest + pytest-mock + factory-boy. Backend tests run on the Linux VM (this is a Windows host); `ruff` runs locally.
|
||||
|
||||
**Spec:** `docs/superpowers/specs/2026-06-16-bulk-edit-operation-registry-design.md` (rev. 2 — read the Operation inventory matrix and the Parameter coercion contract before starting; they are the source of truth for every per-op cell).
|
||||
|
||||
---
|
||||
|
||||
## Conventions for every task
|
||||
|
||||
- **Run backend tests on the VM** via the helper (never locally — the lockfile is linux/macOS only):
|
||||
```bash
|
||||
bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "<pytest targets/args>"
|
||||
```
|
||||
- **Lint locally** with the global ruff binary (not `uv run`):
|
||||
```bash
|
||||
ruff check src/documents/bulk_operations.py src/documents/serialisers.py src/documents/views.py
|
||||
ruff format src/documents/bulk_operations.py src/documents/serialisers.py src/documents/views.py
|
||||
```
|
||||
- **New tests are pytest-style** (per CLAUDE.md): grouped in classes, `@pytest.mark.django_db` on the class where DB is needed, factory-boy factories (`UserFactory`, `DocumentFactory`, `TagFactory`, …), the `mocker` fixture, `@pytest.mark.parametrize`, full type annotations on fixtures and tests.
|
||||
- **`CustomFieldFactory` does not exist yet** in `tests/factories.py` (only `Correspondent`/`DocumentType`/`Tag`/`StoragePath`/`Document`/`User`/`PaperlessTask`). The `modify_custom_fields` `clean_parameters` tests need `CustomField` rows — add a `CustomFieldFactory` there first (per CLAUDE.md's "add a factory when a model lacks one").
|
||||
- **Do NOT convert the existing `test_api_bulk_edit.py`** (DRF `APITestCase` style) — it is the regression net and stays as-is. It must be green at every commit. Its `mock.patch("documents.serialisers.bulk_edit.<fn>")` / `documents.views.bulk_edit.<fn>` targets keep working **only if** the two invariants below hold — verify them, do not assume them.
|
||||
|
||||
### Two load-bearing invariants (the contract-preservation kernel)
|
||||
|
||||
1. **Module identity:** `serialisers.py`, `views.py`, and the new `bulk_operations.py` must each import the operations module as `from documents import bulk_edit` (module import, not `from documents.bulk_edit import merge`). All three then reference the _same_ `sys.modules["documents.bulk_edit"]` object, so a `mock.patch("documents.serialisers.bulk_edit.merge")` mutates the attribute every call site sees. **Verify** `serialisers.py` and `views.py` already use `from documents import bulk_edit` before relying on this.
|
||||
2. **Call-time lookup:** each `BulkEditOperation.execute` must call `bulk_edit.merge(doc_ids, **kw)` (attribute lookup at call time), NOT capture the function at class-definition time (`fn = bulk_edit.merge` as a class attribute). Otherwise the patch — applied after import — won't be seen.
|
||||
|
||||
## File structure
|
||||
|
||||
- **Create** `src/documents/bulk_operations.py` — `PermissionRequirements`, `BulkEditOperation` ABC, the per-op parameter serializers, the 16 operation classes, and the ordered `BULK_EDIT_OPERATIONS` registry. One cohesive module.
|
||||
- **Create** `src/documents/tests/test_bulk_operations.py` — pytest-style unit tests: the permission-matrix characterization (Task 1), then `required_permissions` / `clean_parameters` / registry-parity unit tests (Task 2).
|
||||
- **Modify** `src/documents/serialisers.py` — rewrite `BulkEditSerializer.method` choices, `validate_method`, and `validate()`; delete the `_validate_parameters_*` methods (their logic moves into the per-op serializers).
|
||||
- **Modify** `src/documents/views.py` — rewrite `_has_document_permissions`; delete `METHOD_NAMES_REQUIRING_USER`/`_TRIGGER_SOURCE` and `MODIFIED_FIELD_BY_METHOD`; route `BulkEditView.post` through the registry; change `_execute_document_action`'s signature from `method` to `op`; and update the **six** moved-endpoint caller views (`RotateDocumentsView`, `MergeDocumentsView`, `DeleteDocumentsView`, `ReprocessDocumentsView`, `EditPdfDocumentsView`, `RemovePasswordDocumentsView`, `views.py:2964-3109`) to pass `op=BULK_EDIT_OPERATIONS["<name>"]` instead of `method=bulk_edit.<fn>`. Add `from drf_spectacular.utils import OpenApiExample` (Task 4 needs it — not currently imported).
|
||||
|
||||
---
|
||||
|
||||
## Task 1: Permission-matrix characterization test (the safety net)
|
||||
|
||||
This test freezes today's permission behavior **before** any refactor. It must PASS against the current code unchanged — if any case is red now, the spec's matrix (or your reading of it) is wrong; stop and reconcile before proceeding. After the cutover (Task 3) it must still pass identically.
|
||||
|
||||
**Files:**
|
||||
|
||||
- Create: `src/documents/tests/test_bulk_operations.py`
|
||||
|
||||
- [ ] **Step 1: Write the behavior-level permission test against the live API**
|
||||
|
||||
Drive the real `bulk_edit/` endpoint so the test is independent of internal structure (it survives the refactor without edits). Build users with precise permission sets and owners, and assert the 200-vs-403 outcome per operation and parameter combination. Cover, at minimum, the conditional cases the spec calls out:
|
||||
|
||||
- ownership required: `set_permissions`, `delete`, `rotate`, `delete_pages`, `edit_pdf`, `remove_password` (unconditional); `merge`/`split` only when `delete_originals=true`.
|
||||
- `add_document` required: `split`, `merge` (unconditional); `edit_pdf`/`remove_password` only when `update_document` is falsy.
|
||||
- `delete_document` required: `delete` (unconditional); `merge`/`split` only when `delete_originals=true`.
|
||||
|
||||
```python
|
||||
import pytest
|
||||
from rest_framework import status
|
||||
from rest_framework.test import APIClient
|
||||
|
||||
from documents.models import Document
|
||||
from documents.tests.factories import DocumentFactory
|
||||
from documents.tests.factories import UserFactory
|
||||
|
||||
|
||||
@pytest.mark.django_db
|
||||
class TestBulkEditPermissionMatrix:
|
||||
@pytest.fixture()
|
||||
def owned_docs(self, ...) -> list[Document]: ...
|
||||
|
||||
# parametrize (method, parameters, perms_to_grant, is_owner) -> expected_status
|
||||
@pytest.mark.parametrize(("method", "parameters", "grant", "owner", "expected"), [
|
||||
("set_correspondent", {"correspondent": None}, ["change"], False, status.HTTP_200_OK),
|
||||
("delete", {}, ["change"], True, status.HTTP_200_OK),
|
||||
("delete", {}, ["change"], False, status.HTTP_403_FORBIDDEN), # ownership
|
||||
("delete", {}, ["change", "delete"], False, status.HTTP_403_FORBIDDEN), # still needs ownership
|
||||
("merge", {"delete_originals": False}, ["change", "add"], False, status.HTTP_200_OK), # no ownership when not deleting
|
||||
("merge", {"delete_originals": True}, ["change", "add", "delete"], False, status.HTTP_403_FORBIDDEN), # ownership now required
|
||||
("edit_pdf", {"operations": [{"page": 1}], "update_document": False}, ["change"], True, status.HTTP_403_FORBIDDEN), # needs add_document
|
||||
("edit_pdf", {"operations": [{"page": 1}], "update_document": True}, ["change"], True, status.HTTP_200_OK), # update => owner+change only
|
||||
("remove_password", {"password": "x", "update_document": False}, ["change"], True, status.HTTP_403_FORBIDDEN), # needs add_document
|
||||
("remove_password", {"password": "x", "update_document": True}, ["change"], True, status.HTTP_200_OK),
|
||||
# ... fill every row of the spec matrix, both polarities of each conditional ...
|
||||
])
|
||||
def test_permission_outcome(self, method, parameters, grant, owner, expected, ...) -> None:
|
||||
# mock the actual bulk_edit.<fn> so execution is a no-op; we test ONLY the
|
||||
# permission gate's status code, not the operation's effect.
|
||||
...
|
||||
```
|
||||
|
||||
Notes:
|
||||
|
||||
- Mock the underlying `bulk_edit.<fn>` (patch `documents.views.bulk_edit.<fn>`) so the operations don't actually run — this test is purely about the permission gate returning 200 vs 403.
|
||||
- A superuser short-circuits to allowed (`views.py:2697`); include one superuser row to pin that.
|
||||
- This is verbose by design; the matrix is the security contract. Prefer one parametrized test over hand-written methods.
|
||||
- **Cover the six moved single-action endpoints too (REQUIRED — C2).** `/api/documents/rotate/`, `/merge/`, `/delete/`, `/reprocess/`, `/edit_pdf/`, `/remove_password/` run the **same** `_has_document_permissions` gate via `_execute_document_action`, and that path is rewritten in Task 3 (C1). Add a parallel parametrized test that POSTs to each (their request bodies are the dedicated serializers' fields — e.g. `{"documents": [...], "degrees": 90}` for rotate — **not** a `method`+`parameters` envelope). The existing `test_api_bulk_edit.py` already covers these endpoints' permission gates (`test_rotate_insufficient_permissions:1320`, `test_merge_and_delete_insufficient_permissions:1381`, `test_edit_pdf_insufficient_permissions:1635`, `test_remove_password_insufficient_permissions:1719`), so this is hardening rather than the sole net — but make the moved-endpoint matrix explicit here so the `_execute_document_action` rewrite is guarded by a parametrized characterization, not scattered one-offs.
|
||||
- **`edit_pdf` test docs need a `page_count` (M3).** `clean_parameters` for `edit_pdf` bounds-checks `op["page"]` against `Document.page_count` (`serialisers.py:2052-2059`); this test mocks execution but **not** validation, so an `edit_pdf` row with `page: 1` needs its target doc created with `page_count >= 1`, else it fails with a 400 (out-of-bounds) instead of the expected 200/403.
|
||||
|
||||
- [ ] **Step 2: Run it against CURRENT code — it must PASS**
|
||||
|
||||
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_bulk_operations.py -v"`
|
||||
Expected: PASS. If any row is red, the spec matrix is misread — reconcile against `views.py:2713-2760` before writing any production code.
|
||||
|
||||
- [ ] **Step 3: Commit**
|
||||
|
||||
```bash
|
||||
git add src/documents/tests/test_bulk_operations.py
|
||||
git commit -m "Test: characterize bulk-edit permission matrix before refactor"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 2: Build `bulk_operations.py` (registry, ABC, ops, serializers) — old path untouched
|
||||
|
||||
Build the entire new module with full unit coverage **while the existing dispatch still runs**, so the whole suite stays green throughout. Nothing in `serialisers.py`/`views.py` changes in this task.
|
||||
|
||||
**Files:**
|
||||
|
||||
- Create: `src/documents/bulk_operations.py`
|
||||
- Modify (append): `src/documents/tests/test_bulk_operations.py`
|
||||
|
||||
- [ ] **Step 1: Write failing unit tests for `PermissionRequirements`, `required_permissions`, and `clean_parameters`**
|
||||
|
||||
Append to `test_bulk_operations.py`. White-box this time — assert the value objects directly:
|
||||
|
||||
```python
|
||||
from documents import bulk_operations as ops
|
||||
|
||||
|
||||
class TestRequiredPermissions:
|
||||
@pytest.mark.parametrize(("name", "params", "expected"), [
|
||||
("set_correspondent", {}, ops.PermissionRequirements(change=True)),
|
||||
("delete", {}, ops.PermissionRequirements(change=True, ownership=True, delete_document=True)),
|
||||
("merge", {"delete_originals": False}, ops.PermissionRequirements(change=True, add_document=True)),
|
||||
("merge", {"delete_originals": True}, ops.PermissionRequirements(change=True, add_document=True, ownership=True, delete_document=True)),
|
||||
("edit_pdf", {"update_document": False}, ops.PermissionRequirements(change=True, ownership=True, add_document=True)),
|
||||
("edit_pdf", {"update_document": True}, ops.PermissionRequirements(change=True, ownership=True)),
|
||||
("remove_password", {"update_document": False}, ops.PermissionRequirements(change=True, ownership=True, add_document=True)),
|
||||
("remove_password", {"update_document": True}, ops.PermissionRequirements(change=True, ownership=True)),
|
||||
# ... every operation, both polarities of each conditional (spec matrix) ...
|
||||
])
|
||||
def test_required_permissions(self, name, params, expected) -> None:
|
||||
assert ops.BULK_EDIT_OPERATIONS[name].required_permissions(params) == expected
|
||||
|
||||
|
||||
class TestRegistryParity:
|
||||
def test_choices_are_16_unique_in_canonical_order(self) -> None:
|
||||
# 8 field-ops, then MOVED_DOCUMENT_ACTION_ENDPOINTS key order
|
||||
assert list(ops.BULK_EDIT_OPERATIONS) == [
|
||||
"set_correspondent", "set_document_type", "set_storage_path",
|
||||
"add_tag", "remove_tag", "modify_tags", "modify_custom_fields",
|
||||
"set_permissions",
|
||||
"delete", "reprocess", "rotate", "merge",
|
||||
"edit_pdf", "remove_password", "split", "delete_pages",
|
||||
]
|
||||
assert "redo_ocr" not in ops.BULK_EDIT_OPERATIONS
|
||||
|
||||
def test_every_op_executes_via_module_attribute(self, mocker) -> None:
|
||||
# guards invariant #2: call-time lookup so patches still bite
|
||||
m = mocker.patch("documents.bulk_operations.bulk_edit.merge", return_value="OK")
|
||||
ops.BULK_EDIT_OPERATIONS["merge"].execute([1], delete_originals=False)
|
||||
m.assert_called_once()
|
||||
|
||||
|
||||
@pytest.mark.django_db
|
||||
class TestCleanParameters:
|
||||
# mirror the existing _validate_parameters_* tests: defaults applied, pages
|
||||
# string parse, page-bounds vs page_count, custom-field list-or-dict +
|
||||
# documentlink targets, owner existence, source_mode gating. Assert the SAME
|
||||
# ValidationError message strings the old validators raised.
|
||||
...
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run to verify it fails**
|
||||
|
||||
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_bulk_operations.py::TestRequiredPermissions -v"`
|
||||
Expected: FAIL with `ModuleNotFoundError: No module named 'documents.bulk_operations'`.
|
||||
|
||||
- [ ] **Step 3: Implement `PermissionRequirements` and the `BulkEditOperation` ABC**
|
||||
|
||||
```python
|
||||
from __future__ import annotations
|
||||
|
||||
import dataclasses
|
||||
from abc import ABC
|
||||
from abc import abstractmethod
|
||||
from typing import ClassVar
|
||||
|
||||
from rest_framework import serializers
|
||||
|
||||
from documents import bulk_edit # module import — invariant #1
|
||||
|
||||
|
||||
@dataclasses.dataclass(frozen=True)
|
||||
class PermissionRequirements:
|
||||
change: bool = True # documents.change_document + object-level, always
|
||||
ownership: bool = False # user owns (or doc.owner is None for) ALL docs
|
||||
add_document: bool = False # documents.add_document
|
||||
delete_document: bool = False # documents.delete_document
|
||||
|
||||
|
||||
class BulkEditOperation(ABC):
|
||||
name: ClassVar[str]
|
||||
audit_field: ClassVar[str | None] = None
|
||||
supports_all: ClassVar[bool] = True
|
||||
max_documents: ClassVar[int | None] = None
|
||||
too_many_documents_message: ClassVar[str | None] = None
|
||||
needs_user: ClassVar[bool] = False
|
||||
needs_trigger_source: ClassVar[bool] = False
|
||||
parameter_serializer_class: ClassVar[type[serializers.Serializer] | None] = None
|
||||
example_parameters: ClassVar[dict] = {}
|
||||
|
||||
def clean_parameters(self, parameters: dict, *, user, documents: list[int]) -> dict:
|
||||
if self.parameter_serializer_class is None:
|
||||
return parameters
|
||||
serializer = self.parameter_serializer_class(
|
||||
data=parameters,
|
||||
context={"user": user, "documents": documents},
|
||||
)
|
||||
serializer.is_valid(raise_exception=True)
|
||||
# merge coerced/validated values back over the raw dict so passthrough
|
||||
# keys (e.g. metadata_document_id, source_mode) survive.
|
||||
return {**parameters, **serializer.validated_data}
|
||||
|
||||
def required_permissions(self, parameters: dict) -> PermissionRequirements:
|
||||
return PermissionRequirements()
|
||||
|
||||
@abstractmethod
|
||||
def execute(self, doc_ids: list[int], **parameters) -> str: ...
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Implement the 16 operation classes + parameter serializers**
|
||||
|
||||
Follow the spec's Operation inventory matrix for every cell. Representative examples — the simple assignment op, and the two conditional ones:
|
||||
|
||||
```python
|
||||
class SetCorrespondentOperation(BulkEditOperation):
|
||||
name = "set_correspondent"
|
||||
audit_field = "correspondent"
|
||||
parameter_serializer_class = SetCorrespondentParametersSerializer # validates correspondent id|null
|
||||
example_parameters = {"correspondent": 1}
|
||||
|
||||
def execute(self, doc_ids, **kw):
|
||||
return bulk_edit.set_correspondent(doc_ids, **kw)
|
||||
|
||||
|
||||
class MergeOperation(BulkEditOperation):
|
||||
name = "merge"
|
||||
supports_all = False
|
||||
needs_user = needs_trigger_source = True
|
||||
parameter_serializer_class = MergeParametersSerializer
|
||||
example_parameters = {"delete_originals": False, "archive_fallback": False}
|
||||
|
||||
def required_permissions(self, parameters):
|
||||
delete = parameters.get("delete_originals", False)
|
||||
return PermissionRequirements(
|
||||
change=True, add_document=True,
|
||||
ownership=delete, delete_document=delete,
|
||||
)
|
||||
|
||||
def execute(self, doc_ids, **kw):
|
||||
return bulk_edit.merge(doc_ids, **kw)
|
||||
|
||||
|
||||
class EditPdfOperation(BulkEditOperation):
|
||||
name = "edit_pdf"
|
||||
supports_all = False
|
||||
max_documents = 1
|
||||
too_many_documents_message = "Edit PDF method only supports one document"
|
||||
needs_user = needs_trigger_source = True
|
||||
parameter_serializer_class = EditPdfParametersSerializer
|
||||
example_parameters = {"operations": [{"page": 1, "rotate": 90}], "update_document": False, "include_metadata": True}
|
||||
|
||||
def required_permissions(self, parameters):
|
||||
# edit_pdf is ALWAYS ownership-gated (views.py:2722); add_document only
|
||||
# when NOT update_document (views.py:2740-2741).
|
||||
update = parameters.get("update_document", False)
|
||||
return PermissionRequirements(change=True, ownership=True, add_document=not update)
|
||||
|
||||
def execute(self, doc_ids, **kw):
|
||||
return bulk_edit.edit_pdf(doc_ids, **kw)
|
||||
```
|
||||
|
||||
Parameter serializers carry the validation+coercion the spec's "Parameter coercion contract to preserve" section enumerates — preserve the exact `ValidationError` message strings. Example for the DB/cross-field case:
|
||||
|
||||
```python
|
||||
class EditPdfParametersSerializer(serializers.Serializer):
|
||||
operations = serializers.ListField(child=serializers.DictField())
|
||||
update_document = serializers.BooleanField(required=False, default=False)
|
||||
include_metadata = serializers.BooleanField(required=False, default=True)
|
||||
# source_mode handled here too, only when present
|
||||
|
||||
def validate(self, attrs):
|
||||
# reproduce serialisers.py:2045-2059 verbatim, incl. messages:
|
||||
# - "update_document only allowed with a single output document"
|
||||
# - page-bounds: "Page {n} is out of bounds for document with {k} pages."
|
||||
# using self.context["documents"][0] / Document.objects.get(...)
|
||||
return attrs
|
||||
```
|
||||
|
||||
`RemovePasswordOperation` keeps an `update_document` param (it exists — `bulk_edit.py:881`); its `required_permissions` mirrors `EditPdfOperation`'s `add_document=not update` (but ownership is unconditional too — see matrix). `DeleteOperation` / `ReprocessOperation` set `parameter_serializer_class = None`. Do **not** register `redo_ocr`.
|
||||
|
||||
**Defaulting parity (H3) — match each old validator exactly, no more, no less.** `test_api_bulk_edit.py` asserts `mock.call_args` kwargs, so a serializer that injects a default the old validator didn't will break those asserts. `edit_pdf` _did_ default `update_document=False` / `include_metadata=True` (`serialisers.py:2038-2043`) → keep them. `remove_password` validated **only** `password` (`serialisers.py:2061-2065`) and did **not** default `update_document` / `include_metadata` / `delete_original` → `RemovePasswordParametersSerializer` must declare only `password`. `update_document` then survives as a **raw passthrough key** in `parameters` (so `required_permissions` still reads it via `parameters.get("update_document", False)`), and no extra kwargs reach `bulk_edit.remove_password`. Apply the same "match the old defaulting" rule to every op.
|
||||
|
||||
**`set_permissions` transform (H2) — the QuerySet shape is load-bearing.** `SetPermissionsParametersSerializer` must run `validate_set_permissions` (from `SetPermissionsMixin`, which `BulkEditSerializer` already inherits) so that `validated_data["set_permissions"]` carries the **QuerySet-dict** structure `bulk_edit.set_permissions` consumes — not the raw `{view:{users:[ids]}}` dict. A plain `DictField` would leave the raw dict in `validated_data`, and `{**parameters, **validated_data}` would then feed the function the wrong shape. Also default `merge=False` and validate `owner` existence (`serialisers.py:1946-1952`).
|
||||
|
||||
Build the **ordered** registry (legacy section in `MOVED_DOCUMENT_ACTION_ENDPOINTS` key order — `edit_pdf, remove_password` before `split, delete_pages`):
|
||||
|
||||
```python
|
||||
BULK_EDIT_OPERATIONS: dict[str, BulkEditOperation] = {
|
||||
op.name: op
|
||||
for op in (
|
||||
SetCorrespondentOperation(), SetDocumentTypeOperation(),
|
||||
SetStoragePathOperation(), AddTagOperation(), RemoveTagOperation(),
|
||||
ModifyTagsOperation(), ModifyCustomFieldsOperation(), SetPermissionsOperation(),
|
||||
DeleteOperation(), ReprocessOperation(), RotateOperation(), MergeOperation(),
|
||||
EditPdfOperation(), RemovePasswordOperation(), SplitOperation(), DeletePagesOperation(),
|
||||
)
|
||||
}
|
||||
```
|
||||
|
||||
- [ ] **Step 5: Run unit tests to green**
|
||||
|
||||
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_bulk_operations.py -v"`
|
||||
Expected: PASS (permission matrix, required_permissions, registry parity, clean_parameters). The existing `test_api_bulk_edit.py` is untouched and still green (old path runs).
|
||||
|
||||
- [ ] **Step 6: Lint & commit**
|
||||
|
||||
```bash
|
||||
ruff check src/documents/bulk_operations.py && ruff format src/documents/bulk_operations.py
|
||||
git add src/documents/bulk_operations.py src/documents/tests/test_bulk_operations.py
|
||||
git commit -m "Feature: add bulk-edit operation registry (not yet wired)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 3: Cutover — wire the serializer and BOTH view call sites
|
||||
|
||||
This is the atomic swap: `validate_method` returning an operation object ripples to both view sites, so serializer + views land in **one commit**. The full `test_api_bulk_edit.py` regression suite plus Task 1's matrix test are the contract; both must be green at the end.
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `src/documents/serialisers.py`
|
||||
- Modify: `src/documents/views.py`
|
||||
|
||||
- [ ] **Step 1: Confirm invariant #1**
|
||||
|
||||
Grep that `serialisers.py` and `views.py` import `from documents import bulk_edit` (not `from documents.bulk_edit import ...`). If they use member imports, the existing patches break — convert to module import as part of this task and note it.
|
||||
|
||||
- [ ] **Step 2: Rewrite `BulkEditSerializer`**
|
||||
|
||||
- `method = serializers.ChoiceField(choices=list(bulk_operations.BULK_EDIT_OPERATIONS), ...)` — registry alone (16, canonical order), **not** `+ LEGACY_DOCUMENT_ACTION_METHODS`.
|
||||
- `validate_method` → `return bulk_operations.BULK_EDIT_OPERATIONS[method]` (returns the op; raise `ValidationError("Unsupported method.")` on KeyError to preserve the message).
|
||||
- `validate()`:
|
||||
```python
|
||||
op = attrs["method"]
|
||||
if attrs.get("all", False) and not op.supports_all:
|
||||
raise serializers.ValidationError("This method does not support all=true.")
|
||||
if op.max_documents is not None and len(attrs["documents"]) > op.max_documents:
|
||||
raise serializers.ValidationError(op.too_many_documents_message)
|
||||
attrs["parameters"] = op.clean_parameters(
|
||||
attrs["parameters"], user=self.user, documents=attrs["documents"],
|
||||
)
|
||||
return attrs
|
||||
```
|
||||
- **Delete** all `_validate_parameters_*` / `_validate_storage_path` / `validate_parameters_remove_password` methods (their logic now lives in the per-op serializers). Keep `MOVED_DOCUMENT_ACTION_ENDPOINTS` / `LEGACY_DOCUMENT_ACTION_METHODS` (still used by the view's deprecation warning).
|
||||
|
||||
- [ ] **Step 3: Rewrite `_has_document_permissions` to consume `PermissionRequirements`**
|
||||
|
||||
```python
|
||||
def _has_document_permissions(self, *, user, documents, op, parameters) -> bool:
|
||||
if user.is_superuser:
|
||||
return True
|
||||
document_objs = Document.objects.select_related("owner").filter(pk__in=documents)
|
||||
reqs = op.required_permissions(parameters)
|
||||
ok = user.has_perm("documents.change_document") and all(
|
||||
has_perms_owner_aware(user, "change_document", doc) for doc in document_objs
|
||||
)
|
||||
if ok and reqs.ownership:
|
||||
ok = all((doc.owner == user or doc.owner is None) for doc in document_objs)
|
||||
if ok and reqs.add_document:
|
||||
ok = user.has_perm("documents.add_document")
|
||||
if ok and reqs.delete_document:
|
||||
ok = user.has_perm("documents.delete_document")
|
||||
return ok
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Route BOTH call sites through the op — they obtain the op differently**
|
||||
|
||||
There are two distinct paths, and `_execute_document_action` does **NOT** read `validated_data["method"]` (its serializers have no `method` field — it receives the operation as an argument). Handle each:
|
||||
|
||||
- **Delete** `METHOD_NAMES_REQUIRING_USER`, `METHOD_NAMES_REQUIRING_TRIGGER_SOURCE` (note: it is an alias — `METHOD_NAMES_REQUIRING_TRIGGER_SOURCE = METHOD_NAMES_REQUIRING_USER` at `views.py:2687` — so they are one object), and `MODIFIED_FIELD_BY_METHOD`.
|
||||
|
||||
- **`BulkEditView.post`** (`views.py:2852-2947`) — the `/bulk_edit/` path: `op = serializer.validated_data["method"]` (the registry object `validate_method` now returns). Replace `method.__name__ in METHOD_NAMES_REQUIRING_USER` → `op.needs_user`; trigger-source check → `op.needs_trigger_source`; the permission call → `_has_document_permissions(op=op, ...)`; `method(documents, **parameters)` → `op.execute(documents, **parameters)`. Audit block: `modified_field = op.audit_field` (replaces `MODIFIED_FIELD_BY_METHOD.get(method.__name__)`), reason → `f"Bulk edit: {op.name}"`. Snapshot/`log_create` otherwise unchanged.
|
||||
|
||||
- **`_execute_document_action`** (`views.py:2764-2807`) — the moved single-action path used by six views: change its signature from `method` to `op: BulkEditOperation`. Inside, replace `method.__name__ in METHOD_NAMES_REQUIRING_USER` → `op.needs_user`; trigger check → `op.needs_trigger_source`; `_has_document_permissions(method=method, ...)` → `_has_document_permissions(op=op, ...)`; `method(documents, **parameters)` → `op.execute(documents, **parameters)`. This path has **no** audit block — leave it that way. `op.clean_parameters` is **not** called here: each moved view's own serializer (`RotateDocumentsSerializer`, `MergeDocumentsSerializer`, …) already validated its parameters; the op supplies only needs_user / needs_trigger_source / required_permissions / execute.
|
||||
|
||||
- **The six caller views** (`RotateDocumentsView:2964`, `MergeDocumentsView:2991`, `DeleteDocumentsView:3018`, `ReprocessDocumentsView:3045`, `EditPdfDocumentsView:3072`, `RemovePasswordDocumentsView:3099`): change each `method=bulk_edit.<fn>` argument to `op=BULK_EDIT_OPERATIONS["<name>"]` (e.g. `op=BULK_EDIT_OPERATIONS["rotate"]`).
|
||||
|
||||
- [ ] **Step 5: Run the FULL regression + matrix suites**
|
||||
|
||||
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_api_bulk_edit.py src/documents/tests/test_bulk_operations.py -v"`
|
||||
Expected: PASS — every existing `test_api_bulk_edit.py` test (patch targets still bite via invariant #1; `__name__`-dependent asserts gone), plus Task 1's matrix unchanged. If a `documents.serialisers.bulk_edit.X` / `documents.views.bulk_edit.X` patch stops biting, invariant #1 or #2 is violated — check the import style and that `execute` does call-time lookup.
|
||||
|
||||
- [ ] **Step 6: Run the broader API + audit suites** (signals/audit log touch this path)
|
||||
|
||||
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_api_documents.py src/documents/tests/test_api_bulk_download.py -k bulk or audit -v"`
|
||||
Expected: PASS.
|
||||
|
||||
- [ ] **Step 7: Lint & commit**
|
||||
|
||||
```bash
|
||||
ruff check src/documents/serialisers.py src/documents/views.py && ruff format src/documents/serialisers.py src/documents/views.py
|
||||
git add src/documents/serialisers.py src/documents/views.py
|
||||
git commit -m "Refactor: route bulk_edit through the operation registry"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 4: Registry-driven OpenAPI examples
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `src/documents/views.py`
|
||||
- Test: `src/documents/tests/test_bulk_operations.py`
|
||||
|
||||
- [ ] **Step 1: Write a failing test that every example validates**
|
||||
|
||||
```python
|
||||
class TestBulkEditExamples:
|
||||
def test_every_operation_has_a_valid_example(self) -> None:
|
||||
from documents.views import _bulk_edit_examples
|
||||
examples = _bulk_edit_examples()
|
||||
assert {e.summary for e in examples} == set(ops.BULK_EDIT_OPERATIONS)
|
||||
for ex in examples:
|
||||
op = ops.BULK_EDIT_OPERATIONS[ex.value["method"]]
|
||||
if op.parameter_serializer_class is not None:
|
||||
s = op.parameter_serializer_class(data=ex.value["parameters"], context={...})
|
||||
assert s.is_valid(), s.errors
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Implement the helper and wire `@extend_schema`**
|
||||
|
||||
First add the import — `OpenApiExample` is **not** currently in `views.py` (extend the existing `from drf_spectacular.utils import ...` line):
|
||||
|
||||
```python
|
||||
from drf_spectacular.utils import OpenApiExample
|
||||
```
|
||||
|
||||
```python
|
||||
def _bulk_edit_examples() -> list[OpenApiExample]:
|
||||
return [
|
||||
OpenApiExample(
|
||||
name=op.name, summary=op.name,
|
||||
value={"documents": [1, 2], "method": op.name, "parameters": op.example_parameters},
|
||||
request_only=True,
|
||||
)
|
||||
for op in BULK_EDIT_OPERATIONS.values()
|
||||
]
|
||||
```
|
||||
|
||||
Add `examples=_bulk_edit_examples()` to the existing `bulk_edit` `extend_schema(...)` (`views.py:2811-2825`). Leave `operation_id`, `description`, and the `responses` inline serializer unchanged.
|
||||
|
||||
- [ ] **Step 3: Run the example test + a schema smoke check**
|
||||
|
||||
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_bulk_operations.py::TestBulkEditExamples -v"`
|
||||
Then regenerate the OpenAPI schema on the VM and confirm the diff is **examples-only** — the `method` enum membership/order is byte-identical and the request/response structure is unchanged:
|
||||
|
||||
```bash
|
||||
ssh -o BatchMode=yes -p 2244 trenton@localhost 'bash -lc "cd ~/projects/paperless-ngx && uv run manage.py spectacular --file /tmp/schema.yml"'
|
||||
```
|
||||
|
||||
Expected: schema generates without error; the `bulk_edit` `method` enum lists the 16 methods in canonical order; examples appear.
|
||||
|
||||
- [ ] **Step 4: Lint & commit**
|
||||
|
||||
```bash
|
||||
ruff check src/documents/views.py && ruff format src/documents/views.py
|
||||
git add src/documents/views.py src/documents/tests/test_bulk_operations.py
|
||||
git commit -m "Feature: document bulk_edit parameters via per-operation OpenAPI examples"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Task 5: Final verification
|
||||
|
||||
**Files:** none (verification only).
|
||||
|
||||
- [ ] **Step 1: Full bulk-edit-related suite**
|
||||
|
||||
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_api_bulk_edit.py src/documents/tests/test_bulk_operations.py src/documents/tests/test_api_bulk_download.py -v"`
|
||||
Expected: PASS, no failures, no errors.
|
||||
|
||||
- [ ] **Step 2: Type-check on the VM (pyrefly, with baseline)**
|
||||
|
||||
```bash
|
||||
tar czf - src pyproject.toml uv.lock .pyrefly-baseline.json | ssh -o BatchMode=yes -p 2244 trenton@localhost 'tar xzf - -C ~/projects/paperless-ngx'
|
||||
ssh -o BatchMode=yes -p 2244 trenton@localhost 'bash -lc "cd ~/projects/paperless-ngx && uv run pyrefly check"'
|
||||
```
|
||||
|
||||
Expected: no new type errors beyond the baseline.
|
||||
|
||||
- [ ] **Step 3: Final lint/format pass**
|
||||
|
||||
Run: `ruff check src/documents/bulk_operations.py src/documents/serialisers.py src/documents/views.py src/documents/tests/test_bulk_operations.py && ruff format --check src/documents/bulk_operations.py src/documents/serialisers.py src/documents/views.py`
|
||||
Expected: clean.
|
||||
|
||||
- [ ] **Step 4: Confirm the smear is gone**
|
||||
|
||||
Grep to verify no orphaned references remain: `MODIFIED_FIELD_BY_METHOD`, `METHOD_NAMES_REQUIRING_USER`, `_validate_parameters_`, and `method.__name__` in `views.py` should all be gone; `bulk_edit.<fn>` should appear only inside `bulk_operations.py` `execute` methods.
|
||||
|
||||
---
|
||||
|
||||
## Notes for the implementer
|
||||
|
||||
- **The permission matrix is the whole ballgame.** A wrong `required_permissions` cell is a privilege-escalation bug, not a cosmetic one. Task 1's parametrized characterization test (written and green _before_ the refactor) is the guardrail — never weaken a case to make the refactor pass; if it goes red, the production code is wrong.
|
||||
- **Preserve `ValidationError` message text verbatim** when porting `_validate_parameters_*` into the per-op serializers — `test_api_bulk_edit.py` asserts specific strings (e.g. the three distinct "only one document" messages, the all=true message, "out of bounds", "update_document only allowed with a single output document").
|
||||
- **Two call sites, obtained differently.** `BulkEditView.post` reads `op` from `validated_data["method"]` and owns the audit logging; `_execute_document_action` receives `op` as an argument from its **six** caller views (which change `method=bulk_edit.<fn>` → `op=BULK_EDIT_OPERATIONS["<name>"]`) and has no audit. Convert both paths and all six caller views; Task 1 must characterize both before cutover.
|
||||
- **`redo_ocr` stays unregistered** (dead/unreachable today; registering it would newly accept it on the wire).
|
||||
- **Out of scope:** a discriminated `oneOf` request schema for `parameters` — examples (Task 4) are the agreed approach; the polymorphic schema is a possible later follow-up (the discriminator `method` and payload `parameters` are sibling fields, which `PolymorphicProxySerializer` does not model cleanly).
|
||||
@@ -0,0 +1,513 @@
|
||||
# Bulk-Edit Operation Registry — Design
|
||||
|
||||
**Date:** 2026-06-16
|
||||
**Branch base:** `dev`
|
||||
**Status:** Draft (rev. 2 — corrected per critical review)
|
||||
|
||||
## Problem
|
||||
|
||||
A single bulk-edit operation's definition is smeared across **eight sites in
|
||||
three files**, keyed **three different ways**, with no single source of truth.
|
||||
Taking `merge` as the worked example:
|
||||
|
||||
| # | Location | What it holds | Keyed by |
|
||||
| --- | ------------------------- | --------------------------------------------------------- | --------------------- |
|
||||
| 1 | `serialisers.py:1758` | name in the `method` `ChoiceField.choices` | **string** |
|
||||
| 2 | `serialisers.py:1849` | `validate_method` `elif` → returns `bulk_edit.merge` | **string → function** |
|
||||
| 3 | `serialisers.py:2070` | the `all=true`-unsupported list | **function identity** |
|
||||
| 4 | `serialisers.py:2115` | `validate()` dispatch → `_validate_parameters_merge` | **function identity** |
|
||||
| 5 | `serialisers.py:2008` | `_validate_parameters_merge` (validate + coerce defaults) | — |
|
||||
| 6 | `views.py:2687` | `METHOD_NAMES_REQUIRING_USER` / `_TRIGGER_SOURCE` | **`__name__`** |
|
||||
| 7 | `views.py:2727,2738,2754` | three permission blocks (`method in [...]`) | **function identity** |
|
||||
| 8 | `views.py:2844` | `MODIFIED_FIELD_BY_METHOD` audit field | **string** |
|
||||
|
||||
Plus the execution function itself in `bulk_edit.py`.
|
||||
|
||||
Three structural problems follow:
|
||||
|
||||
- **`validate_method` resolves the request string to a _function object_**
|
||||
(`serialisers.py:1826-1860`), so everything downstream compares either
|
||||
`method == bulk_edit.merge` (identity), `method.__name__` (string), or the raw
|
||||
request string. Three keying schemes for one concept. Adding an operation — or
|
||||
editing one — means touching all eight sites, and forgetting one fails
|
||||
_silently_ (an op that runs but isn't audited, or skips an ownership check)
|
||||
rather than loudly.
|
||||
|
||||
- **The permission matrix is parameter-conditional and security-critical.** From
|
||||
`views.py:2713-2760`: ownership is required for `merge`/`split` _only_ when
|
||||
`delete_originals` is set; `add_document` is required for `edit_pdf`/
|
||||
`remove_password` _only_ when `update_document` is not set; `delete_document`
|
||||
for `merge`/`split` _only_ when `delete_originals`. This logic is correct but
|
||||
lives far from the operations it governs, so it is hard to audit and easy to
|
||||
break.
|
||||
|
||||
- **The API is self-undocumenting.** `parameters` is a bare
|
||||
`serializers.DictField` (`serialisers.py:1773`). drf-spectacular renders it as
|
||||
a free-form object, so the OpenAPI schema tells a caller nothing about what
|
||||
`merge` versus `set_correspondent` actually expect. The repo uses
|
||||
`@extend_schema`/`inline_serializer` widely (62 sites) but has **no**
|
||||
`PolymorphicProxySerializer`, `OpenApiExample`, or `discriminator` usage to
|
||||
describe this polymorphic endpoint.
|
||||
|
||||
## Goal
|
||||
|
||||
Make each bulk-edit operation a **single object** that owns all eight facts —
|
||||
name, execution callable, parameter validation/coercion, audit field, the
|
||||
`all=`/single-document constraints, the user/trigger-source needs, and its
|
||||
parameter-conditional permission requirements. Operations live in a registry;
|
||||
the serializer and view consume the registry instead of re-encoding the
|
||||
operation list. Adding an operation becomes one class plus one registry entry,
|
||||
not an eight-site edit. As a deliberate, contract-preserving bonus, each
|
||||
operation also contributes a **per-operation request example** so the bulk API
|
||||
finally documents itself in the OpenAPI schema.
|
||||
|
||||
**The wire contract does not change.** This is a relocation of internal logic,
|
||||
not a redefinition of the endpoint.
|
||||
|
||||
## Scope
|
||||
|
||||
In scope:
|
||||
|
||||
- New `documents/bulk_operations.py` (registry + `BulkEditOperation` classes +
|
||||
`PermissionRequirements`). The execution functions stay in `bulk_edit.py`;
|
||||
operation classes wrap them.
|
||||
- Rewrite `BulkEditSerializer.validate_method` / `validate()` and the
|
||||
`_validate_parameters_*` methods to delegate to the operation's parameter
|
||||
serializer.
|
||||
- Rewrite `BulkEditView._has_document_permissions`, the `METHOD_NAMES_*` sets,
|
||||
and `MODIFIED_FIELD_BY_METHOD` to read from the registry.
|
||||
- Add `examples=[...]` to the `bulk_edit` `@extend_schema`, generated from the
|
||||
registry (one example per operation).
|
||||
- Unit tests per operation; keep every existing `test_api_bulk_edit*` test green.
|
||||
|
||||
Out of scope:
|
||||
|
||||
- Changing any operation's behavior, accepted method strings, parameter names,
|
||||
defaults, coercion, or permission outcome. Byte-for-byte wire compatibility.
|
||||
- The legacy-method deprecation-warning machinery
|
||||
(`MOVED_DOCUMENT_ACTION_ENDPOINTS`, the API-v9-drop TODO at `views.py:2855`):
|
||||
legacy methods log a warning and process **inline** — there is **no** redirect
|
||||
(`views.py:2856-2866`). Preserved as-is.
|
||||
- A full polymorphic request schema (`oneOf`/discriminated `parameters`). Examples
|
||||
(option 1) are in scope; a discriminated schema is a possible future follow-up
|
||||
and is **not** built here — the discriminator (`method`) and the variant
|
||||
payload (`parameters`) are sibling fields, which `PolymorphicProxySerializer`
|
||||
does not model cleanly. YAGNI until examples prove insufficient.
|
||||
- Converting `bulk_edit.py` into a package, or touching the execution functions'
|
||||
internals.
|
||||
- Any third-party / entry-point registration of operations. The registry is
|
||||
in-tree only; an entry point could be layered on later but the PDF/page ops are
|
||||
tightly bound to internal helpers, so ecosystem value is low and unproven.
|
||||
|
||||
## Decisions
|
||||
|
||||
These shape the design and are the reviewable choices:
|
||||
|
||||
1. **Operations wrap, not replace, the `bulk_edit.py` functions.** Each
|
||||
`BulkEditOperation.execute` calls the existing function. The execution code is
|
||||
correct and well-tested; this refactor is about the metadata and dispatch
|
||||
around it, exactly as the export-sink refactor moved _plumbing_ without
|
||||
touching export _contents_.
|
||||
2. **Parameter validation moves into a per-operation DRF `Serializer`**, not an
|
||||
ad-hoc `clean_*` method. A real serializer (a) validates and coerces in one
|
||||
place (replacing the `_validate_parameters_*` methods _and_ their in-place
|
||||
mutation of defaults / the `pages`-string parse), (b) accepts `context`
|
||||
(`user`, `documents`) for the cross-field/DB checks (page-bounds vs
|
||||
`document.page_count`, documentlink targets, owner existence), and (c) is a
|
||||
structure drf-spectacular already understands. Operations with no parameters
|
||||
(`delete`, `reprocess`) declare `parameter_serializer_class = None`.
|
||||
3. **Permission requirements are computed by the operation, given the validated
|
||||
parameters**, returning a `PermissionRequirements` value object. The
|
||||
parameter-conditional kernel (ownership iff `delete_originals`, etc.) lives
|
||||
next to the operation it governs. The view's three permission blocks collapse
|
||||
to "build requirements, then check each flag generically."
|
||||
4. **Examples are derived from the registry** (option 1 from the design
|
||||
discussion). Each operation declares a canonical `example_parameters` dict; a
|
||||
helper builds one `OpenApiExample` per operation for the `bulk_edit`
|
||||
`@extend_schema`. Adding an operation therefore auto-adds its example — the
|
||||
examples cannot drift out of sync with the registry. This is the only piece
|
||||
that _adds_ to the schema; it does not alter the request/response structure.
|
||||
5. **The registry is the single source of the method enum.** Today's enum is the
|
||||
8 hardcoded field-ops (`serialisers.py:1758-1766`) plus
|
||||
`LEGACY_DOCUMENT_ACTION_METHODS` — but the legacy methods (`delete, reprocess,
|
||||
rotate, merge, edit_pdf, remove_password, split, delete_pages`) **are
|
||||
themselves operations**, not a disjoint set, so all **16 unique** methods live
|
||||
in the registry. `ChoiceField.choices` is therefore
|
||||
`list(BULK_EDIT_OPERATIONS)` **alone** — do NOT append
|
||||
`LEGACY_DOCUMENT_ACTION_METHODS` (that would duplicate 8 entries and churn the
|
||||
enum, the exact thing this decision prevents). The registry must be **ordered**
|
||||
to reproduce today's member order — the 8 field-ops first (in
|
||||
`serialisers.py:1758-1766` order), then the 8 legacy methods in
|
||||
`MOVED_DOCUMENT_ACTION_ENDPOINTS` **key/insertion order** (`delete, reprocess,
|
||||
rotate, merge, edit_pdf, remove_password, split, delete_pages`;
|
||||
`serialisers.py:1745-1754`) — so the generated OpenAPI `enum` is byte-identical.
|
||||
NB: that legacy order is `edit_pdf, remove_password` _before_ `split,
|
||||
delete_pages` — do not reorder them.
|
||||
|
||||
## Architecture
|
||||
|
||||
### `PermissionRequirements`
|
||||
|
||||
```python
|
||||
@dataclass(frozen=True)
|
||||
class PermissionRequirements:
|
||||
change: bool = True # documents.change_document + object-level, always
|
||||
ownership: bool = False # user owns (or doc.owner is None for) ALL docs
|
||||
add_document: bool = False # documents.add_document
|
||||
delete_document: bool = False # documents.delete_document
|
||||
```
|
||||
|
||||
### `BulkEditOperation`
|
||||
|
||||
New module `documents/bulk_operations.py`:
|
||||
|
||||
```python
|
||||
class BulkEditOperation(ABC):
|
||||
name: ClassVar[str]
|
||||
audit_field: ClassVar[str | None] = None # → MODIFIED_FIELD_BY_METHOD
|
||||
supports_all: ClassVar[bool] = True # → the all=true guard
|
||||
max_documents: ClassVar[int | None] = None # split/delete_pages/edit_pdf = 1
|
||||
too_many_documents_message: ClassVar[str | None] = None # per-op error text (H3)
|
||||
needs_user: ClassVar[bool] = False # → METHOD_NAMES_REQUIRING_USER
|
||||
needs_trigger_source: ClassVar[bool] = False # → ..._REQUIRING_TRIGGER_SOURCE
|
||||
parameter_serializer_class: ClassVar[type[serializers.Serializer] | None] = None
|
||||
example_parameters: ClassVar[dict] = {} # → OpenApiExample payload
|
||||
|
||||
def clean_parameters(self, parameters: dict, *, user, documents) -> dict:
|
||||
"""Validate + coerce via parameter_serializer_class (context=user,documents).
|
||||
Returns the normalized parameters. Raises serializers.ValidationError.
|
||||
No-op passthrough when parameter_serializer_class is None."""
|
||||
|
||||
def required_permissions(self, parameters: dict) -> PermissionRequirements:
|
||||
"""The parameter-conditional permission kernel. Default: change only."""
|
||||
return PermissionRequirements()
|
||||
|
||||
@abstractmethod
|
||||
def execute(self, doc_ids: list[int], **parameters) -> str: ...
|
||||
```
|
||||
|
||||
The two subtle operations, stated next to their own rules:
|
||||
|
||||
```python
|
||||
class MergeOperation(BulkEditOperation):
|
||||
name = "merge"
|
||||
supports_all = False
|
||||
needs_user = needs_trigger_source = True
|
||||
parameter_serializer_class = MergeParametersSerializer
|
||||
example_parameters = {"delete_originals": False, "archive_fallback": False}
|
||||
|
||||
def required_permissions(self, parameters):
|
||||
delete = parameters.get("delete_originals", False)
|
||||
return PermissionRequirements(
|
||||
change=True, add_document=True,
|
||||
ownership=delete, delete_document=delete,
|
||||
)
|
||||
|
||||
def execute(self, doc_ids, **kw):
|
||||
return bulk_edit.merge(doc_ids, **kw)
|
||||
|
||||
|
||||
class EditPdfOperation(BulkEditOperation):
|
||||
name = "edit_pdf"
|
||||
supports_all = False
|
||||
max_documents = 1
|
||||
needs_user = needs_trigger_source = True
|
||||
parameter_serializer_class = EditPdfParametersSerializer
|
||||
example_parameters = {
|
||||
"operations": [{"page": 1, "rotate": 90}],
|
||||
"update_document": False,
|
||||
"include_metadata": True,
|
||||
}
|
||||
|
||||
def required_permissions(self, parameters):
|
||||
update = parameters.get("update_document", False)
|
||||
# edit_pdf is ALWAYS ownership-gated (views.py:2722); add_document only
|
||||
# when NOT update_document (views.py:2740-2741).
|
||||
return PermissionRequirements(
|
||||
change=True, ownership=True, add_document=not update,
|
||||
)
|
||||
```
|
||||
|
||||
### Registry
|
||||
|
||||
```python
|
||||
BULK_EDIT_OPERATIONS: dict[str, BulkEditOperation] = {
|
||||
op.name: op
|
||||
for op in (
|
||||
SetCorrespondentOperation(), SetDocumentTypeOperation(),
|
||||
SetStoragePathOperation(), AddTagOperation(), RemoveTagOperation(),
|
||||
ModifyTagsOperation(), ModifyCustomFieldsOperation(),
|
||||
SetPermissionsOperation(),
|
||||
# legacy section — MUST match MOVED_DOCUMENT_ACTION_ENDPOINTS key order
|
||||
# (serialisers.py:1745-1754) so the generated enum is byte-identical:
|
||||
DeleteOperation(), ReprocessOperation(), RotateOperation(),
|
||||
MergeOperation(), EditPdfOperation(), RemovePasswordOperation(),
|
||||
SplitOperation(), DeletePagesOperation(),
|
||||
)
|
||||
}
|
||||
```
|
||||
|
||||
There is **no** `redo_ocr` entry. `validate_method` has a `method == "redo_ocr"`
|
||||
branch (`serialisers.py:1843`), but `"redo_ocr"` is absent from `choices`
|
||||
(`serialisers.py:1758-1768`), so the `ChoiceField` rejects it _before_
|
||||
`validate_method` runs — that branch is unreachable dead code today. Do **not**
|
||||
add `redo_ocr` to the registry: doing so would make it a valid `choices` entry
|
||||
and newly accept it on the wire (a contract change). `reprocess` is registered
|
||||
once, under `reprocess`.
|
||||
|
||||
### How each call site collapses
|
||||
|
||||
- **`ChoiceField.choices`** → `list(BULK_EDIT_OPERATIONS)` (the 16 unique
|
||||
methods, registry ordered to match today). Legacy methods are already registry
|
||||
ops, so they are **not** appended separately (see Decision 5).
|
||||
- **`validate_method`** → `return BULK_EDIT_OPERATIONS[method]` (the validated
|
||||
value becomes an _operation object_ instead of a function — internal only,
|
||||
`method` is `write_only`).
|
||||
- **`validate()`** → `op.clean_parameters(parameters, user=…, documents=…)`; the
|
||||
`all=true` guard becomes `if attrs.get("all") and not op.supports_all: raise
|
||||
ValidationError("This method does not support all=true.")` (today's single
|
||||
shared message, `serialisers.py:2077`, asserted verbatim by
|
||||
`test_api_bulk_edit.py:763`); the per-method "only one document" checks become
|
||||
an `op.max_documents` check that raises `op.too_many_documents_message`. That
|
||||
text is **per-op** — "Split method only supports one document", "Delete pages
|
||||
method only supports one document", "Edit PDF method only supports one document"
|
||||
(`serialisers.py:2105,2111,2119`) — and is asserted verbatim (e.g.
|
||||
`test_api_bulk_edit.py:1519`), so it **cannot** be collapsed to one generic
|
||||
string.
|
||||
- **`METHOD_NAMES_REQUIRING_USER` / `_TRIGGER_SOURCE`** → `op.needs_user` /
|
||||
`op.needs_trigger_source`.
|
||||
- **The three permission blocks** → one pass:
|
||||
```python
|
||||
reqs = op.required_permissions(parameters)
|
||||
ok = user.has_perm("documents.change_document") and all(
|
||||
has_perms_owner_aware(user, "change_document", d) for d in document_objs
|
||||
)
|
||||
if ok and reqs.ownership: ok = user_is_owner_of_all_documents
|
||||
if ok and reqs.add_document: ok = user.has_perm("documents.add_document")
|
||||
if ok and reqs.delete_document: ok = user.has_perm("documents.delete_document")
|
||||
```
|
||||
- **`MODIFIED_FIELD_BY_METHOD`** → `op.audit_field`.
|
||||
|
||||
**Two call sites consume this, not one.** `BulkEditView.post`
|
||||
(`views.py:2852-2947`) is a fully **inlined** path — it is the only path the
|
||||
`bulk_edit/` endpoint uses. It checks permissions, sets `user`/`trigger_source`,
|
||||
runs the audit-log block (`views.py:2896-2940`, currently keyed on
|
||||
`method.__name__` → becomes `op.audit_field`), and calls `method(documents,
|
||||
**parameters)`. `_execute_document_action` (`views.py:2764-2807`) is a
|
||||
**separate** path used by the _moved single-action_ endpoints
|
||||
(`/api/documents/delete/`, `/rotate/`, …); it builds `parameters`, sets
|
||||
user/trigger, and checks permissions independently and has **no** audit logging.
|
||||
The refactor must convert **both** to the registry; audit logging stays only in
|
||||
`post`.
|
||||
|
||||
## Operation inventory (the faithful matrix)
|
||||
|
||||
Compiled from `bulk_edit.py` signatures, `serialisers.py:2067-2126`, and
|
||||
`views.py:2679-2760`. `change` is required for every operation and omitted.
|
||||
`[source_mode]` is the shared optional param accepted by the PDF-touching ops
|
||||
(validated by `_validate_source_mode` only when present).
|
||||
|
||||
| Operation (`name`) | Parameters | `supports_all` | `max_documents` | user/trigger | `audit_field` | ownership | add_doc | delete_doc |
|
||||
| ------------------------------ | ------------------------------------------------------------------------------------------------------------- | -------------- | --------------- | ------------ | --------------- | ---------------------- | ------------------------- | ---------------------- |
|
||||
| `set_correspondent` | `correspondent: int\|null` | yes | — | no | `correspondent` | — | — | — |
|
||||
| `set_document_type` | `document_type: int\|null` | yes | — | no | `document_type` | — | — | — |
|
||||
| `set_storage_path` | `storage_path: int\|null` | yes | — | no | `storage_path` | — | — | — |
|
||||
| `add_tag` | `tag: int` | yes | — | no | `tags` | — | — | — |
|
||||
| `remove_tag` | `tag: int` | yes | — | no | `tags` | — | — | — |
|
||||
| `modify_tags` | `add_tags: int[]`, `remove_tags: int[]` | yes | — | no | `tags` | — | — | — |
|
||||
| `modify_custom_fields` | `add_custom_fields: int[]\|{id:val}`, `remove_custom_fields: int[]` | yes | — | no | `custom_fields` | — | — | — |
|
||||
| `set_permissions` | `set_permissions: obj`, `owner: int\|null`, `merge: bool=false` | yes | — | no | `None` | **yes** | — | — |
|
||||
| `delete` | _(none)_ | yes | — | no | `deleted_at` | **yes** | — | **yes** |
|
||||
| `reprocess` (alias `redo_ocr`) | _(none)_ | yes | — | no | `checksum` | — | — | — |
|
||||
| `rotate` | `degrees: int`, `[source_mode]` | yes | — | **yes** | `None` | **yes** | — | — |
|
||||
| `merge` | `delete_originals: bool=false`, `archive_fallback: bool=false`, `metadata_document_id?: int`, `[source_mode]` | **no** | — | **yes** | `None` | iff `delete_originals` | **yes** | iff `delete_originals` |
|
||||
| `split` | `pages: str→int[][]`, `delete_originals: bool=false`, `[source_mode]` | **no** | **1** | **yes** | `None` | iff `delete_originals` | **yes** | iff `delete_originals` |
|
||||
| `delete_pages` | `pages: int[]`, `[source_mode]` | **no** | **1** | **yes** | `None` | **yes** | — | — |
|
||||
| `edit_pdf` | `operations: obj[]`, `update_document: bool=false`, `include_metadata: bool=true`, `[source_mode]` | **no** | **1** | **yes** | `None` | **yes** | iff not `update_document` | — |
|
||||
| `remove_password` | `password: str`, `update_document: bool=false`, `[source_mode]` | **no** | — | **yes** | `None` | **yes** | iff not `update_document` | — |
|
||||
|
||||
Notes that are easy to get wrong and are pinned here:
|
||||
|
||||
- `edit_pdf` ownership is **unconditional** — it is in the unconditional
|
||||
ownership list (`views.py:2722`); the separate `edit_pdf and update_document`
|
||||
clause (`views.py:2730`) is redundant and folds away.
|
||||
- `remove_password` **does** accept an `update_document` param
|
||||
(`bulk_edit.py:881`), and `parameters` is a passthrough `DictField` whose
|
||||
validator (`serialisers.py:2061-2065`) neither strips nor defaults it. So its
|
||||
`add_document` requirement is `not parameters.get("update_document", False)` —
|
||||
identical to `edit_pdf`, **not** an unconditional `True`. Sending
|
||||
`update_document: true` legitimately drops the add_document requirement today,
|
||||
and that behavior must be preserved. (Earlier drafts claimed the param did not
|
||||
exist — that was a permission-correctness bug.)
|
||||
- `merge` and `remove_password` are **not** single-document (no `max_documents`),
|
||||
even though both set `supports_all = False`.
|
||||
|
||||
## Parameter coercion contract to preserve
|
||||
|
||||
`clean_parameters` must reproduce every in-place coercion the current
|
||||
`_validate_parameters_*` methods perform, not merely the validation. Full list
|
||||
(an implementation-plan checklist):
|
||||
|
||||
- `merge` / `split`: default `delete_originals=False`
|
||||
(`serialisers.py:1998,2013`); `merge` also defaults `archive_fallback=False`
|
||||
(`:2018`).
|
||||
- `edit_pdf`: default `update_document=False`, `include_metadata=True`
|
||||
(`:2038,2043`); reject `update_document=True` with multiple output docs
|
||||
(`:2045-2050`).
|
||||
- `set_permissions`: default `merge=False` (`:1951-1952`) and **mutate**
|
||||
`parameters["set_permissions"]` in place via `validate_set_permissions`
|
||||
(`:1946`, `SetPermissionsMixin`); validate `owner` existence
|
||||
(`:1939-1943,1949-1950`). Needs its own `SetPermissionsParametersSerializer`.
|
||||
- `split`: parse the `pages` string `"1-3,5"` → `[[1,2,3],[5]]` (`:1974-1990`).
|
||||
- `source_mode`: validated and applied **only when present** in `parameters`
|
||||
(`:2084-2085` gate → `validate_source_mode`, `:1964-1969`), independent of
|
||||
method — so each PDF-touching op's serializer opts into it conditionally.
|
||||
- `modify_custom_fields`: accept **list OR `{id: value}` dict**, and for
|
||||
DOCUMENTLINK fields validate targets via `validate_documentlink_targets`
|
||||
(`:1787-1824`).
|
||||
- **Param-name spelling differs by op** and must match exactly: `merge`/`split`
|
||||
use `delete_originals` (plural); `edit_pdf`/`remove_password` use
|
||||
`delete_original` (singular) (`bulk_edit.py:509,619,751,882`).
|
||||
|
||||
## OpenAPI examples (the "make it useful" piece)
|
||||
|
||||
A single helper builds the examples from the registry:
|
||||
|
||||
```python
|
||||
def _bulk_edit_examples() -> list[OpenApiExample]:
|
||||
return [
|
||||
OpenApiExample(
|
||||
name=op.name,
|
||||
summary=op.name,
|
||||
value={"documents": [1, 2], "method": op.name,
|
||||
"parameters": op.example_parameters},
|
||||
request_only=True,
|
||||
)
|
||||
for op in BULK_EDIT_OPERATIONS.values()
|
||||
]
|
||||
```
|
||||
|
||||
wired into the existing decorator (the response schema at `views.py:2818-2825`
|
||||
is untouched):
|
||||
|
||||
```python
|
||||
@extend_schema_view(
|
||||
post=extend_schema(
|
||||
operation_id="bulk_edit",
|
||||
description="Perform a bulk edit operation on a list of documents",
|
||||
examples=_bulk_edit_examples(),
|
||||
responses={200: inline_serializer(name="BulkEditDocumentsResult",
|
||||
fields={"result": serializers.CharField()})},
|
||||
),
|
||||
)
|
||||
```
|
||||
|
||||
Result: the Swagger/Redoc page shows a concrete, valid request body for every
|
||||
operation (`merge`, `edit_pdf`, …), generated from the same objects that
|
||||
validate the request — they cannot drift apart. The request _structure_
|
||||
(`{documents, method, parameters, …}`) and the `method` `enum` are unchanged;
|
||||
examples are purely additive.
|
||||
|
||||
## Data flow
|
||||
|
||||
```
|
||||
POST /api/documents/bulk_edit/ {documents|all|filters, method, parameters, from_webui}
|
||||
├─ legacy method? → log deprecation warning, then process INLINE (no redirect; views.py:2856-2866)
|
||||
├─ BulkEditSerializer.validate_method(method) → op = BULK_EDIT_OPERATIONS[method]
|
||||
├─ validate():
|
||||
│ ├─ all=true and not op.supports_all → ValidationError (shared message)
|
||||
│ ├─ op.max_documents and len(documents) > it → ValidationError (op.too_many_documents_message)
|
||||
│ └─ parameters = op.clean_parameters(parameters, user=…, documents=…)
|
||||
└─ BulkEditView.post (inlined; the only path bulk_edit/ uses):
|
||||
├─ if op.needs_user: parameters["user"] = user
|
||||
├─ if op.needs_trigger_source: parameters["trigger_source"] = WEB_UI|API_UPLOAD
|
||||
├─ reqs = op.required_permissions(parameters); check change/ownership/add/delete
|
||||
│ → 403 HttpResponseForbidden on failure (unchanged)
|
||||
├─ if op.audit_field and AUDIT_LOG_ENABLED: snapshot old values (views.py:2896-2910)
|
||||
├─ result = op.execute(documents, **parameters) (call-time bulk_edit.<fn> lookup)
|
||||
└─ if op.audit_field and AUDIT_LOG_ENABLED: LogEntry per doc → Response({"result": …})
|
||||
note: _execute_document_action (views.py:2764-2807) is the SEPARATE moved-single-action path
|
||||
(/api/documents/delete/, /rotate/, …); it converts to the registry too, but has NO audit log.
|
||||
audit "reason" string uses op.name (== bulk_edit.<fn>.__name__ today, so unchanged at runtime).
|
||||
```
|
||||
|
||||
## Backwards compatibility
|
||||
|
||||
- **Wire contract:** request/response shapes, accepted `method` strings,
|
||||
parameter names, defaults, coercion, and every permission outcome are
|
||||
byte-for-byte preserved. `method` becoming an operation object is internal
|
||||
(`write_only`).
|
||||
- **`bulk_edit.<fn>` patching keeps working — by module identity, not luck.**
|
||||
Existing tests patch `documents.serialisers.bulk_edit.<fn>` and
|
||||
`documents.views.bulk_edit.<fn>` (e.g. `test_api_bulk_edit.py:203,485,1100,1271`).
|
||||
All of `documents.serialisers.bulk_edit`, `documents.views.bulk_edit`, and the
|
||||
new `documents.bulk_operations.bulk_edit` are the **same module object** in
|
||||
`sys.modules`; patching an attribute via any path mutates the one shared module.
|
||||
So as long as each `op.execute` does a **call-time** lookup
|
||||
(`return bulk_edit.merge(doc_ids, **kw)`, not a function captured at
|
||||
class-definition time), the existing patches still intercept and those tests
|
||||
stay untouched.
|
||||
- **The `method.__name__` dependency disappears.** `setup_mock`
|
||||
(`test_api_bulk_edit.py:61-63`) sets `m.__name__` because dispatch reads
|
||||
`method.__name__` (`views.py:2783,2879,2896,2938`). The refactor replaces every
|
||||
such read with `op.name` / `op.needs_user` / `op.audit_field`, so the mock's
|
||||
`__name__` no longer affects dispatch. The audit "reason" becomes
|
||||
`f"Bulk edit: {op.name}"`; since `bulk_edit.merge.__name__ == "merge" ==
|
||||
op.name`, real-run audit text is unchanged. No test asserts
|
||||
`validated_data["method"]` identity (verified), so `validate_method` returning
|
||||
an operation object is safe.
|
||||
- **Legacy methods:** `MOVED_DOCUMENT_ACTION_ENDPOINTS` /
|
||||
`LEGACY_DOCUMENT_ACTION_METHODS` and the v9-drop TODO are unchanged. They drive
|
||||
only the inline deprecation warning (`views.py:2856-2866`), **not** the
|
||||
`choices` — which come wholly from the registry, since the legacy methods _are_
|
||||
registry ops (see C1/Decision 5).
|
||||
- **OpenAPI:** the `method` `enum` and request/response structure are unchanged
|
||||
(Decision 5); `examples` are additive. Regenerated schema diff should be
|
||||
_examples only_.
|
||||
|
||||
## Testing
|
||||
|
||||
New `documents/tests/test_bulk_operations.py` (pytest classes, factory-boy
|
||||
factories, the `mocker` fixture, `parametrize`, full type annotations; run on the
|
||||
Linux VM):
|
||||
|
||||
- **Permission matrix, parametrized over every operation** — the highest-value
|
||||
test. For each op and each relevant parameter combination
|
||||
(`delete_originals` on/off, `update_document` on/off), assert
|
||||
`op.required_permissions(params)` equals the expected
|
||||
`PermissionRequirements`. This freezes the security kernel against drift.
|
||||
- **Registry/serializer parity** — `ChoiceField.choices` equals the **16 unique**
|
||||
method strings in today's exact order (8 field-ops, then the 8
|
||||
`MOVED_DOCUMENT_ACTION_ENDPOINTS` keys); **no duplicates**; `redo_ocr` absent;
|
||||
every method resolves to an operation. (Guards against the C1 duplication bug.)
|
||||
- **Parameter validation/coercion** per op — defaults applied (`merge` →
|
||||
`delete_originals=False`, `archive_fallback=False`; `split`/`edit_pdf` defaults),
|
||||
the `pages` string→list parse, page-bounds-vs-`page_count`, documentlink target
|
||||
and owner-existence checks — mirroring the current `_validate_parameters_*` tests.
|
||||
- **`supports_all` / `max_documents`** — `all=true` rejected for the five
|
||||
no-all ops; `>1` document rejected for `split`/`delete_pages`/`edit_pdf`.
|
||||
- **Examples** — `_bulk_edit_examples()` yields one entry per distinct operation,
|
||||
each `value["parameters"]` validates clean through that op's
|
||||
`parameter_serializer_class` (guarantees documented examples are valid).
|
||||
|
||||
Existing `test_api_bulk_edit.py` / `test_api_bulk_download.py` stay green
|
||||
unchanged — external behavior (accepted methods, validation errors, permission
|
||||
403s, audit fields, results) is preserved.
|
||||
|
||||
## Risks
|
||||
|
||||
- **Permission-matrix mistranslation is a privilege-escalation bug, not a
|
||||
cosmetic one.** This is the whole ballgame. Mitigation: move the logic verbatim
|
||||
into per-op `required_permissions`, and the parametrized permission test above
|
||||
is written _first_ against the current behavior, then held invariant across the
|
||||
refactor.
|
||||
- **The `method`-as-function-object contract** is relied on by existing tests
|
||||
(identity compares, `bulk_edit.<fn>` patching). Mitigation: keep `execute`
|
||||
delegating to the module-level function so patches still bite; adjust only the
|
||||
identity asserts. Audit `test_api_bulk_edit.py` before coding.
|
||||
- **Serializer-based validation subtly changing error messages/shapes.** The
|
||||
current validators raise specific `ValidationError` strings that tests assert
|
||||
on. Mitigation: preserve message text when porting each `_validate_parameters_*`
|
||||
into its serializer; diff the test expectations.
|
||||
- **Enum churn in the generated schema.** Mitigation: Decision 5 fixes member set
|
||||
and order; the schema-diff check in CI should show examples-only changes.
|
||||
Reference in New Issue
Block a user