Docs: bulk-edit operation registry spec + implementation plan

Adds the extensibility design that hoists the bulk-edit operation
definition (today smeared across serialisers.py, views.py, bulk_edit.py
and keyed three different ways) behind a BulkEditOperation registry +
PermissionRequirements value object, with per-operation OpenAPI examples.
Contract-preserving refactor; both docs reviewed across multiple passes
(permission matrix verified, both view call sites accounted for).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
stumpylog
2026-06-16 09:45:03 -07:00
parent cef47eff76
commit 8ef3e4ad00
2 changed files with 1037 additions and 0 deletions
@@ -0,0 +1,524 @@
# Bulk-Edit Operation Registry Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Collapse the bulk-edit operation definition — today smeared across 8 sites in 3 files, keyed 3 different ways — into a single `BulkEditOperation` object per operation, held in an ordered registry. The serializer and both view call sites consume the registry instead of re-encoding the operation list. The wire/API contract is preserved byte-for-byte; per-operation OpenAPI examples are added so the bulk API documents itself.
**Architecture:** A new `documents/bulk_operations.py` defines a `BulkEditOperation` ABC, a frozen `PermissionRequirements` value object, a per-operation DRF parameter serializer (validation + coercion), and an ordered `BULK_EDIT_OPERATIONS` registry whose 16 entries wrap the existing `bulk_edit.py` functions (which are unchanged). `BulkEditSerializer` resolves a method string to an operation and delegates parameter validation; `BulkEditView.post` and `_execute_document_action` read `op.needs_user` / `op.audit_field` / `op.required_permissions(...)` instead of the `METHOD_NAMES_*` sets, `MODIFIED_FIELD_BY_METHOD`, and the three `method in [...]` permission blocks.
**Tech Stack:** Python ≥3.11, Django REST Framework, drf-spectacular, pytest + pytest-mock + factory-boy. Backend tests run on the Linux VM (this is a Windows host); `ruff` runs locally.
**Spec:** `docs/superpowers/specs/2026-06-16-bulk-edit-operation-registry-design.md` (rev. 2 — read the Operation inventory matrix and the Parameter coercion contract before starting; they are the source of truth for every per-op cell).
---
## Conventions for every task
- **Run backend tests on the VM** via the helper (never locally — the lockfile is linux/macOS only):
```bash
bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "<pytest targets/args>"
```
- **Lint locally** with the global ruff binary (not `uv run`):
```bash
ruff check src/documents/bulk_operations.py src/documents/serialisers.py src/documents/views.py
ruff format src/documents/bulk_operations.py src/documents/serialisers.py src/documents/views.py
```
- **New tests are pytest-style** (per CLAUDE.md): grouped in classes, `@pytest.mark.django_db` on the class where DB is needed, factory-boy factories (`UserFactory`, `DocumentFactory`, `TagFactory`, …), the `mocker` fixture, `@pytest.mark.parametrize`, full type annotations on fixtures and tests.
- **`CustomFieldFactory` does not exist yet** in `tests/factories.py` (only `Correspondent`/`DocumentType`/`Tag`/`StoragePath`/`Document`/`User`/`PaperlessTask`). The `modify_custom_fields` `clean_parameters` tests need `CustomField` rows — add a `CustomFieldFactory` there first (per CLAUDE.md's "add a factory when a model lacks one").
- **Do NOT convert the existing `test_api_bulk_edit.py`** (DRF `APITestCase` style) — it is the regression net and stays as-is. It must be green at every commit. Its `mock.patch("documents.serialisers.bulk_edit.<fn>")` / `documents.views.bulk_edit.<fn>` targets keep working **only if** the two invariants below hold — verify them, do not assume them.
### Two load-bearing invariants (the contract-preservation kernel)
1. **Module identity:** `serialisers.py`, `views.py`, and the new `bulk_operations.py` must each import the operations module as `from documents import bulk_edit` (module import, not `from documents.bulk_edit import merge`). All three then reference the _same_ `sys.modules["documents.bulk_edit"]` object, so a `mock.patch("documents.serialisers.bulk_edit.merge")` mutates the attribute every call site sees. **Verify** `serialisers.py` and `views.py` already use `from documents import bulk_edit` before relying on this.
2. **Call-time lookup:** each `BulkEditOperation.execute` must call `bulk_edit.merge(doc_ids, **kw)` (attribute lookup at call time), NOT capture the function at class-definition time (`fn = bulk_edit.merge` as a class attribute). Otherwise the patch — applied after import — won't be seen.
## File structure
- **Create** `src/documents/bulk_operations.py` — `PermissionRequirements`, `BulkEditOperation` ABC, the per-op parameter serializers, the 16 operation classes, and the ordered `BULK_EDIT_OPERATIONS` registry. One cohesive module.
- **Create** `src/documents/tests/test_bulk_operations.py` — pytest-style unit tests: the permission-matrix characterization (Task 1), then `required_permissions` / `clean_parameters` / registry-parity unit tests (Task 2).
- **Modify** `src/documents/serialisers.py` — rewrite `BulkEditSerializer.method` choices, `validate_method`, and `validate()`; delete the `_validate_parameters_*` methods (their logic moves into the per-op serializers).
- **Modify** `src/documents/views.py` — rewrite `_has_document_permissions`; delete `METHOD_NAMES_REQUIRING_USER`/`_TRIGGER_SOURCE` and `MODIFIED_FIELD_BY_METHOD`; route `BulkEditView.post` through the registry; change `_execute_document_action`'s signature from `method` to `op`; and update the **six** moved-endpoint caller views (`RotateDocumentsView`, `MergeDocumentsView`, `DeleteDocumentsView`, `ReprocessDocumentsView`, `EditPdfDocumentsView`, `RemovePasswordDocumentsView`, `views.py:2964-3109`) to pass `op=BULK_EDIT_OPERATIONS["<name>"]` instead of `method=bulk_edit.<fn>`. Add `from drf_spectacular.utils import OpenApiExample` (Task 4 needs it — not currently imported).
---
## Task 1: Permission-matrix characterization test (the safety net)
This test freezes today's permission behavior **before** any refactor. It must PASS against the current code unchanged — if any case is red now, the spec's matrix (or your reading of it) is wrong; stop and reconcile before proceeding. After the cutover (Task 3) it must still pass identically.
**Files:**
- Create: `src/documents/tests/test_bulk_operations.py`
- [ ] **Step 1: Write the behavior-level permission test against the live API**
Drive the real `bulk_edit/` endpoint so the test is independent of internal structure (it survives the refactor without edits). Build users with precise permission sets and owners, and assert the 200-vs-403 outcome per operation and parameter combination. Cover, at minimum, the conditional cases the spec calls out:
- ownership required: `set_permissions`, `delete`, `rotate`, `delete_pages`, `edit_pdf`, `remove_password` (unconditional); `merge`/`split` only when `delete_originals=true`.
- `add_document` required: `split`, `merge` (unconditional); `edit_pdf`/`remove_password` only when `update_document` is falsy.
- `delete_document` required: `delete` (unconditional); `merge`/`split` only when `delete_originals=true`.
```python
import pytest
from rest_framework import status
from rest_framework.test import APIClient
from documents.models import Document
from documents.tests.factories import DocumentFactory
from documents.tests.factories import UserFactory
@pytest.mark.django_db
class TestBulkEditPermissionMatrix:
@pytest.fixture()
def owned_docs(self, ...) -> list[Document]: ...
# parametrize (method, parameters, perms_to_grant, is_owner) -> expected_status
@pytest.mark.parametrize(("method", "parameters", "grant", "owner", "expected"), [
("set_correspondent", {"correspondent": None}, ["change"], False, status.HTTP_200_OK),
("delete", {}, ["change"], True, status.HTTP_200_OK),
("delete", {}, ["change"], False, status.HTTP_403_FORBIDDEN), # ownership
("delete", {}, ["change", "delete"], False, status.HTTP_403_FORBIDDEN), # still needs ownership
("merge", {"delete_originals": False}, ["change", "add"], False, status.HTTP_200_OK), # no ownership when not deleting
("merge", {"delete_originals": True}, ["change", "add", "delete"], False, status.HTTP_403_FORBIDDEN), # ownership now required
("edit_pdf", {"operations": [{"page": 1}], "update_document": False}, ["change"], True, status.HTTP_403_FORBIDDEN), # needs add_document
("edit_pdf", {"operations": [{"page": 1}], "update_document": True}, ["change"], True, status.HTTP_200_OK), # update => owner+change only
("remove_password", {"password": "x", "update_document": False}, ["change"], True, status.HTTP_403_FORBIDDEN), # needs add_document
("remove_password", {"password": "x", "update_document": True}, ["change"], True, status.HTTP_200_OK),
# ... fill every row of the spec matrix, both polarities of each conditional ...
])
def test_permission_outcome(self, method, parameters, grant, owner, expected, ...) -> None:
# mock the actual bulk_edit.<fn> so execution is a no-op; we test ONLY the
# permission gate's status code, not the operation's effect.
...
```
Notes:
- Mock the underlying `bulk_edit.<fn>` (patch `documents.views.bulk_edit.<fn>`) so the operations don't actually run — this test is purely about the permission gate returning 200 vs 403.
- A superuser short-circuits to allowed (`views.py:2697`); include one superuser row to pin that.
- This is verbose by design; the matrix is the security contract. Prefer one parametrized test over hand-written methods.
- **Cover the six moved single-action endpoints too (REQUIRED — C2).** `/api/documents/rotate/`, `/merge/`, `/delete/`, `/reprocess/`, `/edit_pdf/`, `/remove_password/` run the **same** `_has_document_permissions` gate via `_execute_document_action`, and that path is rewritten in Task 3 (C1). Add a parallel parametrized test that POSTs to each (their request bodies are the dedicated serializers' fields — e.g. `{"documents": [...], "degrees": 90}` for rotate — **not** a `method`+`parameters` envelope). The existing `test_api_bulk_edit.py` already covers these endpoints' permission gates (`test_rotate_insufficient_permissions:1320`, `test_merge_and_delete_insufficient_permissions:1381`, `test_edit_pdf_insufficient_permissions:1635`, `test_remove_password_insufficient_permissions:1719`), so this is hardening rather than the sole net — but make the moved-endpoint matrix explicit here so the `_execute_document_action` rewrite is guarded by a parametrized characterization, not scattered one-offs.
- **`edit_pdf` test docs need a `page_count` (M3).** `clean_parameters` for `edit_pdf` bounds-checks `op["page"]` against `Document.page_count` (`serialisers.py:2052-2059`); this test mocks execution but **not** validation, so an `edit_pdf` row with `page: 1` needs its target doc created with `page_count >= 1`, else it fails with a 400 (out-of-bounds) instead of the expected 200/403.
- [ ] **Step 2: Run it against CURRENT code — it must PASS**
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_bulk_operations.py -v"`
Expected: PASS. If any row is red, the spec matrix is misread — reconcile against `views.py:2713-2760` before writing any production code.
- [ ] **Step 3: Commit**
```bash
git add src/documents/tests/test_bulk_operations.py
git commit -m "Test: characterize bulk-edit permission matrix before refactor"
```
---
## Task 2: Build `bulk_operations.py` (registry, ABC, ops, serializers) — old path untouched
Build the entire new module with full unit coverage **while the existing dispatch still runs**, so the whole suite stays green throughout. Nothing in `serialisers.py`/`views.py` changes in this task.
**Files:**
- Create: `src/documents/bulk_operations.py`
- Modify (append): `src/documents/tests/test_bulk_operations.py`
- [ ] **Step 1: Write failing unit tests for `PermissionRequirements`, `required_permissions`, and `clean_parameters`**
Append to `test_bulk_operations.py`. White-box this time — assert the value objects directly:
```python
from documents import bulk_operations as ops
class TestRequiredPermissions:
@pytest.mark.parametrize(("name", "params", "expected"), [
("set_correspondent", {}, ops.PermissionRequirements(change=True)),
("delete", {}, ops.PermissionRequirements(change=True, ownership=True, delete_document=True)),
("merge", {"delete_originals": False}, ops.PermissionRequirements(change=True, add_document=True)),
("merge", {"delete_originals": True}, ops.PermissionRequirements(change=True, add_document=True, ownership=True, delete_document=True)),
("edit_pdf", {"update_document": False}, ops.PermissionRequirements(change=True, ownership=True, add_document=True)),
("edit_pdf", {"update_document": True}, ops.PermissionRequirements(change=True, ownership=True)),
("remove_password", {"update_document": False}, ops.PermissionRequirements(change=True, ownership=True, add_document=True)),
("remove_password", {"update_document": True}, ops.PermissionRequirements(change=True, ownership=True)),
# ... every operation, both polarities of each conditional (spec matrix) ...
])
def test_required_permissions(self, name, params, expected) -> None:
assert ops.BULK_EDIT_OPERATIONS[name].required_permissions(params) == expected
class TestRegistryParity:
def test_choices_are_16_unique_in_canonical_order(self) -> None:
# 8 field-ops, then MOVED_DOCUMENT_ACTION_ENDPOINTS key order
assert list(ops.BULK_EDIT_OPERATIONS) == [
"set_correspondent", "set_document_type", "set_storage_path",
"add_tag", "remove_tag", "modify_tags", "modify_custom_fields",
"set_permissions",
"delete", "reprocess", "rotate", "merge",
"edit_pdf", "remove_password", "split", "delete_pages",
]
assert "redo_ocr" not in ops.BULK_EDIT_OPERATIONS
def test_every_op_executes_via_module_attribute(self, mocker) -> None:
# guards invariant #2: call-time lookup so patches still bite
m = mocker.patch("documents.bulk_operations.bulk_edit.merge", return_value="OK")
ops.BULK_EDIT_OPERATIONS["merge"].execute([1], delete_originals=False)
m.assert_called_once()
@pytest.mark.django_db
class TestCleanParameters:
# mirror the existing _validate_parameters_* tests: defaults applied, pages
# string parse, page-bounds vs page_count, custom-field list-or-dict +
# documentlink targets, owner existence, source_mode gating. Assert the SAME
# ValidationError message strings the old validators raised.
...
```
- [ ] **Step 2: Run to verify it fails**
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_bulk_operations.py::TestRequiredPermissions -v"`
Expected: FAIL with `ModuleNotFoundError: No module named 'documents.bulk_operations'`.
- [ ] **Step 3: Implement `PermissionRequirements` and the `BulkEditOperation` ABC**
```python
from __future__ import annotations
import dataclasses
from abc import ABC
from abc import abstractmethod
from typing import ClassVar
from rest_framework import serializers
from documents import bulk_edit # module import — invariant #1
@dataclasses.dataclass(frozen=True)
class PermissionRequirements:
change: bool = True # documents.change_document + object-level, always
ownership: bool = False # user owns (or doc.owner is None for) ALL docs
add_document: bool = False # documents.add_document
delete_document: bool = False # documents.delete_document
class BulkEditOperation(ABC):
name: ClassVar[str]
audit_field: ClassVar[str | None] = None
supports_all: ClassVar[bool] = True
max_documents: ClassVar[int | None] = None
too_many_documents_message: ClassVar[str | None] = None
needs_user: ClassVar[bool] = False
needs_trigger_source: ClassVar[bool] = False
parameter_serializer_class: ClassVar[type[serializers.Serializer] | None] = None
example_parameters: ClassVar[dict] = {}
def clean_parameters(self, parameters: dict, *, user, documents: list[int]) -> dict:
if self.parameter_serializer_class is None:
return parameters
serializer = self.parameter_serializer_class(
data=parameters,
context={"user": user, "documents": documents},
)
serializer.is_valid(raise_exception=True)
# merge coerced/validated values back over the raw dict so passthrough
# keys (e.g. metadata_document_id, source_mode) survive.
return {**parameters, **serializer.validated_data}
def required_permissions(self, parameters: dict) -> PermissionRequirements:
return PermissionRequirements()
@abstractmethod
def execute(self, doc_ids: list[int], **parameters) -> str: ...
```
- [ ] **Step 4: Implement the 16 operation classes + parameter serializers**
Follow the spec's Operation inventory matrix for every cell. Representative examples — the simple assignment op, and the two conditional ones:
```python
class SetCorrespondentOperation(BulkEditOperation):
name = "set_correspondent"
audit_field = "correspondent"
parameter_serializer_class = SetCorrespondentParametersSerializer # validates correspondent id|null
example_parameters = {"correspondent": 1}
def execute(self, doc_ids, **kw):
return bulk_edit.set_correspondent(doc_ids, **kw)
class MergeOperation(BulkEditOperation):
name = "merge"
supports_all = False
needs_user = needs_trigger_source = True
parameter_serializer_class = MergeParametersSerializer
example_parameters = {"delete_originals": False, "archive_fallback": False}
def required_permissions(self, parameters):
delete = parameters.get("delete_originals", False)
return PermissionRequirements(
change=True, add_document=True,
ownership=delete, delete_document=delete,
)
def execute(self, doc_ids, **kw):
return bulk_edit.merge(doc_ids, **kw)
class EditPdfOperation(BulkEditOperation):
name = "edit_pdf"
supports_all = False
max_documents = 1
too_many_documents_message = "Edit PDF method only supports one document"
needs_user = needs_trigger_source = True
parameter_serializer_class = EditPdfParametersSerializer
example_parameters = {"operations": [{"page": 1, "rotate": 90}], "update_document": False, "include_metadata": True}
def required_permissions(self, parameters):
# edit_pdf is ALWAYS ownership-gated (views.py:2722); add_document only
# when NOT update_document (views.py:2740-2741).
update = parameters.get("update_document", False)
return PermissionRequirements(change=True, ownership=True, add_document=not update)
def execute(self, doc_ids, **kw):
return bulk_edit.edit_pdf(doc_ids, **kw)
```
Parameter serializers carry the validation+coercion the spec's "Parameter coercion contract to preserve" section enumerates — preserve the exact `ValidationError` message strings. Example for the DB/cross-field case:
```python
class EditPdfParametersSerializer(serializers.Serializer):
operations = serializers.ListField(child=serializers.DictField())
update_document = serializers.BooleanField(required=False, default=False)
include_metadata = serializers.BooleanField(required=False, default=True)
# source_mode handled here too, only when present
def validate(self, attrs):
# reproduce serialisers.py:2045-2059 verbatim, incl. messages:
# - "update_document only allowed with a single output document"
# - page-bounds: "Page {n} is out of bounds for document with {k} pages."
# using self.context["documents"][0] / Document.objects.get(...)
return attrs
```
`RemovePasswordOperation` keeps an `update_document` param (it exists — `bulk_edit.py:881`); its `required_permissions` mirrors `EditPdfOperation`'s `add_document=not update` (but ownership is unconditional too — see matrix). `DeleteOperation` / `ReprocessOperation` set `parameter_serializer_class = None`. Do **not** register `redo_ocr`.
**Defaulting parity (H3) — match each old validator exactly, no more, no less.** `test_api_bulk_edit.py` asserts `mock.call_args` kwargs, so a serializer that injects a default the old validator didn't will break those asserts. `edit_pdf` _did_ default `update_document=False` / `include_metadata=True` (`serialisers.py:2038-2043`) → keep them. `remove_password` validated **only** `password` (`serialisers.py:2061-2065`) and did **not** default `update_document` / `include_metadata` / `delete_original` → `RemovePasswordParametersSerializer` must declare only `password`. `update_document` then survives as a **raw passthrough key** in `parameters` (so `required_permissions` still reads it via `parameters.get("update_document", False)`), and no extra kwargs reach `bulk_edit.remove_password`. Apply the same "match the old defaulting" rule to every op.
**`set_permissions` transform (H2) — the QuerySet shape is load-bearing.** `SetPermissionsParametersSerializer` must run `validate_set_permissions` (from `SetPermissionsMixin`, which `BulkEditSerializer` already inherits) so that `validated_data["set_permissions"]` carries the **QuerySet-dict** structure `bulk_edit.set_permissions` consumes — not the raw `{view:{users:[ids]}}` dict. A plain `DictField` would leave the raw dict in `validated_data`, and `{**parameters, **validated_data}` would then feed the function the wrong shape. Also default `merge=False` and validate `owner` existence (`serialisers.py:1946-1952`).
Build the **ordered** registry (legacy section in `MOVED_DOCUMENT_ACTION_ENDPOINTS` key order — `edit_pdf, remove_password` before `split, delete_pages`):
```python
BULK_EDIT_OPERATIONS: dict[str, BulkEditOperation] = {
op.name: op
for op in (
SetCorrespondentOperation(), SetDocumentTypeOperation(),
SetStoragePathOperation(), AddTagOperation(), RemoveTagOperation(),
ModifyTagsOperation(), ModifyCustomFieldsOperation(), SetPermissionsOperation(),
DeleteOperation(), ReprocessOperation(), RotateOperation(), MergeOperation(),
EditPdfOperation(), RemovePasswordOperation(), SplitOperation(), DeletePagesOperation(),
)
}
```
- [ ] **Step 5: Run unit tests to green**
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_bulk_operations.py -v"`
Expected: PASS (permission matrix, required_permissions, registry parity, clean_parameters). The existing `test_api_bulk_edit.py` is untouched and still green (old path runs).
- [ ] **Step 6: Lint & commit**
```bash
ruff check src/documents/bulk_operations.py && ruff format src/documents/bulk_operations.py
git add src/documents/bulk_operations.py src/documents/tests/test_bulk_operations.py
git commit -m "Feature: add bulk-edit operation registry (not yet wired)"
```
---
## Task 3: Cutover — wire the serializer and BOTH view call sites
This is the atomic swap: `validate_method` returning an operation object ripples to both view sites, so serializer + views land in **one commit**. The full `test_api_bulk_edit.py` regression suite plus Task 1's matrix test are the contract; both must be green at the end.
**Files:**
- Modify: `src/documents/serialisers.py`
- Modify: `src/documents/views.py`
- [ ] **Step 1: Confirm invariant #1**
Grep that `serialisers.py` and `views.py` import `from documents import bulk_edit` (not `from documents.bulk_edit import ...`). If they use member imports, the existing patches break — convert to module import as part of this task and note it.
- [ ] **Step 2: Rewrite `BulkEditSerializer`**
- `method = serializers.ChoiceField(choices=list(bulk_operations.BULK_EDIT_OPERATIONS), ...)` — registry alone (16, canonical order), **not** `+ LEGACY_DOCUMENT_ACTION_METHODS`.
- `validate_method` → `return bulk_operations.BULK_EDIT_OPERATIONS[method]` (returns the op; raise `ValidationError("Unsupported method.")` on KeyError to preserve the message).
- `validate()`:
```python
op = attrs["method"]
if attrs.get("all", False) and not op.supports_all:
raise serializers.ValidationError("This method does not support all=true.")
if op.max_documents is not None and len(attrs["documents"]) > op.max_documents:
raise serializers.ValidationError(op.too_many_documents_message)
attrs["parameters"] = op.clean_parameters(
attrs["parameters"], user=self.user, documents=attrs["documents"],
)
return attrs
```
- **Delete** all `_validate_parameters_*` / `_validate_storage_path` / `validate_parameters_remove_password` methods (their logic now lives in the per-op serializers). Keep `MOVED_DOCUMENT_ACTION_ENDPOINTS` / `LEGACY_DOCUMENT_ACTION_METHODS` (still used by the view's deprecation warning).
- [ ] **Step 3: Rewrite `_has_document_permissions` to consume `PermissionRequirements`**
```python
def _has_document_permissions(self, *, user, documents, op, parameters) -> bool:
if user.is_superuser:
return True
document_objs = Document.objects.select_related("owner").filter(pk__in=documents)
reqs = op.required_permissions(parameters)
ok = user.has_perm("documents.change_document") and all(
has_perms_owner_aware(user, "change_document", doc) for doc in document_objs
)
if ok and reqs.ownership:
ok = all((doc.owner == user or doc.owner is None) for doc in document_objs)
if ok and reqs.add_document:
ok = user.has_perm("documents.add_document")
if ok and reqs.delete_document:
ok = user.has_perm("documents.delete_document")
return ok
```
- [ ] **Step 4: Route BOTH call sites through the op — they obtain the op differently**
There are two distinct paths, and `_execute_document_action` does **NOT** read `validated_data["method"]` (its serializers have no `method` field — it receives the operation as an argument). Handle each:
- **Delete** `METHOD_NAMES_REQUIRING_USER`, `METHOD_NAMES_REQUIRING_TRIGGER_SOURCE` (note: it is an alias — `METHOD_NAMES_REQUIRING_TRIGGER_SOURCE = METHOD_NAMES_REQUIRING_USER` at `views.py:2687` — so they are one object), and `MODIFIED_FIELD_BY_METHOD`.
- **`BulkEditView.post`** (`views.py:2852-2947`) — the `/bulk_edit/` path: `op = serializer.validated_data["method"]` (the registry object `validate_method` now returns). Replace `method.__name__ in METHOD_NAMES_REQUIRING_USER` → `op.needs_user`; trigger-source check → `op.needs_trigger_source`; the permission call → `_has_document_permissions(op=op, ...)`; `method(documents, **parameters)` → `op.execute(documents, **parameters)`. Audit block: `modified_field = op.audit_field` (replaces `MODIFIED_FIELD_BY_METHOD.get(method.__name__)`), reason → `f"Bulk edit: {op.name}"`. Snapshot/`log_create` otherwise unchanged.
- **`_execute_document_action`** (`views.py:2764-2807`) — the moved single-action path used by six views: change its signature from `method` to `op: BulkEditOperation`. Inside, replace `method.__name__ in METHOD_NAMES_REQUIRING_USER` → `op.needs_user`; trigger check → `op.needs_trigger_source`; `_has_document_permissions(method=method, ...)` → `_has_document_permissions(op=op, ...)`; `method(documents, **parameters)` → `op.execute(documents, **parameters)`. This path has **no** audit block — leave it that way. `op.clean_parameters` is **not** called here: each moved view's own serializer (`RotateDocumentsSerializer`, `MergeDocumentsSerializer`, …) already validated its parameters; the op supplies only needs_user / needs_trigger_source / required_permissions / execute.
- **The six caller views** (`RotateDocumentsView:2964`, `MergeDocumentsView:2991`, `DeleteDocumentsView:3018`, `ReprocessDocumentsView:3045`, `EditPdfDocumentsView:3072`, `RemovePasswordDocumentsView:3099`): change each `method=bulk_edit.<fn>` argument to `op=BULK_EDIT_OPERATIONS["<name>"]` (e.g. `op=BULK_EDIT_OPERATIONS["rotate"]`).
- [ ] **Step 5: Run the FULL regression + matrix suites**
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_api_bulk_edit.py src/documents/tests/test_bulk_operations.py -v"`
Expected: PASS — every existing `test_api_bulk_edit.py` test (patch targets still bite via invariant #1; `__name__`-dependent asserts gone), plus Task 1's matrix unchanged. If a `documents.serialisers.bulk_edit.X` / `documents.views.bulk_edit.X` patch stops biting, invariant #1 or #2 is violated — check the import style and that `execute` does call-time lookup.
- [ ] **Step 6: Run the broader API + audit suites** (signals/audit log touch this path)
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_api_documents.py src/documents/tests/test_api_bulk_download.py -k bulk or audit -v"`
Expected: PASS.
- [ ] **Step 7: Lint & commit**
```bash
ruff check src/documents/serialisers.py src/documents/views.py && ruff format src/documents/serialisers.py src/documents/views.py
git add src/documents/serialisers.py src/documents/views.py
git commit -m "Refactor: route bulk_edit through the operation registry"
```
---
## Task 4: Registry-driven OpenAPI examples
**Files:**
- Modify: `src/documents/views.py`
- Test: `src/documents/tests/test_bulk_operations.py`
- [ ] **Step 1: Write a failing test that every example validates**
```python
class TestBulkEditExamples:
def test_every_operation_has_a_valid_example(self) -> None:
from documents.views import _bulk_edit_examples
examples = _bulk_edit_examples()
assert {e.summary for e in examples} == set(ops.BULK_EDIT_OPERATIONS)
for ex in examples:
op = ops.BULK_EDIT_OPERATIONS[ex.value["method"]]
if op.parameter_serializer_class is not None:
s = op.parameter_serializer_class(data=ex.value["parameters"], context={...})
assert s.is_valid(), s.errors
```
- [ ] **Step 2: Implement the helper and wire `@extend_schema`**
First add the import — `OpenApiExample` is **not** currently in `views.py` (extend the existing `from drf_spectacular.utils import ...` line):
```python
from drf_spectacular.utils import OpenApiExample
```
```python
def _bulk_edit_examples() -> list[OpenApiExample]:
return [
OpenApiExample(
name=op.name, summary=op.name,
value={"documents": [1, 2], "method": op.name, "parameters": op.example_parameters},
request_only=True,
)
for op in BULK_EDIT_OPERATIONS.values()
]
```
Add `examples=_bulk_edit_examples()` to the existing `bulk_edit` `extend_schema(...)` (`views.py:2811-2825`). Leave `operation_id`, `description`, and the `responses` inline serializer unchanged.
- [ ] **Step 3: Run the example test + a schema smoke check**
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_bulk_operations.py::TestBulkEditExamples -v"`
Then regenerate the OpenAPI schema on the VM and confirm the diff is **examples-only** — the `method` enum membership/order is byte-identical and the request/response structure is unchanged:
```bash
ssh -o BatchMode=yes -p 2244 trenton@localhost 'bash -lc "cd ~/projects/paperless-ngx && uv run manage.py spectacular --file /tmp/schema.yml"'
```
Expected: schema generates without error; the `bulk_edit` `method` enum lists the 16 methods in canonical order; examples appear.
- [ ] **Step 4: Lint & commit**
```bash
ruff check src/documents/views.py && ruff format src/documents/views.py
git add src/documents/views.py src/documents/tests/test_bulk_operations.py
git commit -m "Feature: document bulk_edit parameters via per-operation OpenAPI examples"
```
---
## Task 5: Final verification
**Files:** none (verification only).
- [ ] **Step 1: Full bulk-edit-related suite**
Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_api_bulk_edit.py src/documents/tests/test_bulk_operations.py src/documents/tests/test_api_bulk_download.py -v"`
Expected: PASS, no failures, no errors.
- [ ] **Step 2: Type-check on the VM (pyrefly, with baseline)**
```bash
tar czf - src pyproject.toml uv.lock .pyrefly-baseline.json | ssh -o BatchMode=yes -p 2244 trenton@localhost 'tar xzf - -C ~/projects/paperless-ngx'
ssh -o BatchMode=yes -p 2244 trenton@localhost 'bash -lc "cd ~/projects/paperless-ngx && uv run pyrefly check"'
```
Expected: no new type errors beyond the baseline.
- [ ] **Step 3: Final lint/format pass**
Run: `ruff check src/documents/bulk_operations.py src/documents/serialisers.py src/documents/views.py src/documents/tests/test_bulk_operations.py && ruff format --check src/documents/bulk_operations.py src/documents/serialisers.py src/documents/views.py`
Expected: clean.
- [ ] **Step 4: Confirm the smear is gone**
Grep to verify no orphaned references remain: `MODIFIED_FIELD_BY_METHOD`, `METHOD_NAMES_REQUIRING_USER`, `_validate_parameters_`, and `method.__name__` in `views.py` should all be gone; `bulk_edit.<fn>` should appear only inside `bulk_operations.py` `execute` methods.
---
## Notes for the implementer
- **The permission matrix is the whole ballgame.** A wrong `required_permissions` cell is a privilege-escalation bug, not a cosmetic one. Task 1's parametrized characterization test (written and green _before_ the refactor) is the guardrail — never weaken a case to make the refactor pass; if it goes red, the production code is wrong.
- **Preserve `ValidationError` message text verbatim** when porting `_validate_parameters_*` into the per-op serializers — `test_api_bulk_edit.py` asserts specific strings (e.g. the three distinct "only one document" messages, the all=true message, "out of bounds", "update_document only allowed with a single output document").
- **Two call sites, obtained differently.** `BulkEditView.post` reads `op` from `validated_data["method"]` and owns the audit logging; `_execute_document_action` receives `op` as an argument from its **six** caller views (which change `method=bulk_edit.<fn>``op=BULK_EDIT_OPERATIONS["<name>"]`) and has no audit. Convert both paths and all six caller views; Task 1 must characterize both before cutover.
- **`redo_ocr` stays unregistered** (dead/unreachable today; registering it would newly accept it on the wire).
- **Out of scope:** a discriminated `oneOf` request schema for `parameters` — examples (Task 4) are the agreed approach; the polymorphic schema is a possible later follow-up (the discriminator `method` and payload `parameters` are sibling fields, which `PolymorphicProxySerializer` does not model cleanly).
@@ -0,0 +1,513 @@
# Bulk-Edit Operation Registry — Design
**Date:** 2026-06-16
**Branch base:** `dev`
**Status:** Draft (rev. 2 — corrected per critical review)
## Problem
A single bulk-edit operation's definition is smeared across **eight sites in
three files**, keyed **three different ways**, with no single source of truth.
Taking `merge` as the worked example:
| # | Location | What it holds | Keyed by |
| --- | ------------------------- | --------------------------------------------------------- | --------------------- |
| 1 | `serialisers.py:1758` | name in the `method` `ChoiceField.choices` | **string** |
| 2 | `serialisers.py:1849` | `validate_method` `elif` → returns `bulk_edit.merge` | **string → function** |
| 3 | `serialisers.py:2070` | the `all=true`-unsupported list | **function identity** |
| 4 | `serialisers.py:2115` | `validate()` dispatch → `_validate_parameters_merge` | **function identity** |
| 5 | `serialisers.py:2008` | `_validate_parameters_merge` (validate + coerce defaults) | — |
| 6 | `views.py:2687` | `METHOD_NAMES_REQUIRING_USER` / `_TRIGGER_SOURCE` | **`__name__`** |
| 7 | `views.py:2727,2738,2754` | three permission blocks (`method in [...]`) | **function identity** |
| 8 | `views.py:2844` | `MODIFIED_FIELD_BY_METHOD` audit field | **string** |
Plus the execution function itself in `bulk_edit.py`.
Three structural problems follow:
- **`validate_method` resolves the request string to a _function object_**
(`serialisers.py:1826-1860`), so everything downstream compares either
`method == bulk_edit.merge` (identity), `method.__name__` (string), or the raw
request string. Three keying schemes for one concept. Adding an operation — or
editing one — means touching all eight sites, and forgetting one fails
_silently_ (an op that runs but isn't audited, or skips an ownership check)
rather than loudly.
- **The permission matrix is parameter-conditional and security-critical.** From
`views.py:2713-2760`: ownership is required for `merge`/`split` _only_ when
`delete_originals` is set; `add_document` is required for `edit_pdf`/
`remove_password` _only_ when `update_document` is not set; `delete_document`
for `merge`/`split` _only_ when `delete_originals`. This logic is correct but
lives far from the operations it governs, so it is hard to audit and easy to
break.
- **The API is self-undocumenting.** `parameters` is a bare
`serializers.DictField` (`serialisers.py:1773`). drf-spectacular renders it as
a free-form object, so the OpenAPI schema tells a caller nothing about what
`merge` versus `set_correspondent` actually expect. The repo uses
`@extend_schema`/`inline_serializer` widely (62 sites) but has **no**
`PolymorphicProxySerializer`, `OpenApiExample`, or `discriminator` usage to
describe this polymorphic endpoint.
## Goal
Make each bulk-edit operation a **single object** that owns all eight facts —
name, execution callable, parameter validation/coercion, audit field, the
`all=`/single-document constraints, the user/trigger-source needs, and its
parameter-conditional permission requirements. Operations live in a registry;
the serializer and view consume the registry instead of re-encoding the
operation list. Adding an operation becomes one class plus one registry entry,
not an eight-site edit. As a deliberate, contract-preserving bonus, each
operation also contributes a **per-operation request example** so the bulk API
finally documents itself in the OpenAPI schema.
**The wire contract does not change.** This is a relocation of internal logic,
not a redefinition of the endpoint.
## Scope
In scope:
- New `documents/bulk_operations.py` (registry + `BulkEditOperation` classes +
`PermissionRequirements`). The execution functions stay in `bulk_edit.py`;
operation classes wrap them.
- Rewrite `BulkEditSerializer.validate_method` / `validate()` and the
`_validate_parameters_*` methods to delegate to the operation's parameter
serializer.
- Rewrite `BulkEditView._has_document_permissions`, the `METHOD_NAMES_*` sets,
and `MODIFIED_FIELD_BY_METHOD` to read from the registry.
- Add `examples=[...]` to the `bulk_edit` `@extend_schema`, generated from the
registry (one example per operation).
- Unit tests per operation; keep every existing `test_api_bulk_edit*` test green.
Out of scope:
- Changing any operation's behavior, accepted method strings, parameter names,
defaults, coercion, or permission outcome. Byte-for-byte wire compatibility.
- The legacy-method deprecation-warning machinery
(`MOVED_DOCUMENT_ACTION_ENDPOINTS`, the API-v9-drop TODO at `views.py:2855`):
legacy methods log a warning and process **inline** — there is **no** redirect
(`views.py:2856-2866`). Preserved as-is.
- A full polymorphic request schema (`oneOf`/discriminated `parameters`). Examples
(option 1) are in scope; a discriminated schema is a possible future follow-up
and is **not** built here — the discriminator (`method`) and the variant
payload (`parameters`) are sibling fields, which `PolymorphicProxySerializer`
does not model cleanly. YAGNI until examples prove insufficient.
- Converting `bulk_edit.py` into a package, or touching the execution functions'
internals.
- Any third-party / entry-point registration of operations. The registry is
in-tree only; an entry point could be layered on later but the PDF/page ops are
tightly bound to internal helpers, so ecosystem value is low and unproven.
## Decisions
These shape the design and are the reviewable choices:
1. **Operations wrap, not replace, the `bulk_edit.py` functions.** Each
`BulkEditOperation.execute` calls the existing function. The execution code is
correct and well-tested; this refactor is about the metadata and dispatch
around it, exactly as the export-sink refactor moved _plumbing_ without
touching export _contents_.
2. **Parameter validation moves into a per-operation DRF `Serializer`**, not an
ad-hoc `clean_*` method. A real serializer (a) validates and coerces in one
place (replacing the `_validate_parameters_*` methods _and_ their in-place
mutation of defaults / the `pages`-string parse), (b) accepts `context`
(`user`, `documents`) for the cross-field/DB checks (page-bounds vs
`document.page_count`, documentlink targets, owner existence), and (c) is a
structure drf-spectacular already understands. Operations with no parameters
(`delete`, `reprocess`) declare `parameter_serializer_class = None`.
3. **Permission requirements are computed by the operation, given the validated
parameters**, returning a `PermissionRequirements` value object. The
parameter-conditional kernel (ownership iff `delete_originals`, etc.) lives
next to the operation it governs. The view's three permission blocks collapse
to "build requirements, then check each flag generically."
4. **Examples are derived from the registry** (option 1 from the design
discussion). Each operation declares a canonical `example_parameters` dict; a
helper builds one `OpenApiExample` per operation for the `bulk_edit`
`@extend_schema`. Adding an operation therefore auto-adds its example — the
examples cannot drift out of sync with the registry. This is the only piece
that _adds_ to the schema; it does not alter the request/response structure.
5. **The registry is the single source of the method enum.** Today's enum is the
8 hardcoded field-ops (`serialisers.py:1758-1766`) plus
`LEGACY_DOCUMENT_ACTION_METHODS` — but the legacy methods (`delete, reprocess,
rotate, merge, edit_pdf, remove_password, split, delete_pages`) **are
themselves operations**, not a disjoint set, so all **16 unique** methods live
in the registry. `ChoiceField.choices` is therefore
`list(BULK_EDIT_OPERATIONS)` **alone** — do NOT append
`LEGACY_DOCUMENT_ACTION_METHODS` (that would duplicate 8 entries and churn the
enum, the exact thing this decision prevents). The registry must be **ordered**
to reproduce today's member order — the 8 field-ops first (in
`serialisers.py:1758-1766` order), then the 8 legacy methods in
`MOVED_DOCUMENT_ACTION_ENDPOINTS` **key/insertion order** (`delete, reprocess,
rotate, merge, edit_pdf, remove_password, split, delete_pages`;
`serialisers.py:1745-1754`) — so the generated OpenAPI `enum` is byte-identical.
NB: that legacy order is `edit_pdf, remove_password` _before_ `split,
delete_pages` — do not reorder them.
## Architecture
### `PermissionRequirements`
```python
@dataclass(frozen=True)
class PermissionRequirements:
change: bool = True # documents.change_document + object-level, always
ownership: bool = False # user owns (or doc.owner is None for) ALL docs
add_document: bool = False # documents.add_document
delete_document: bool = False # documents.delete_document
```
### `BulkEditOperation`
New module `documents/bulk_operations.py`:
```python
class BulkEditOperation(ABC):
name: ClassVar[str]
audit_field: ClassVar[str | None] = None # → MODIFIED_FIELD_BY_METHOD
supports_all: ClassVar[bool] = True # → the all=true guard
max_documents: ClassVar[int | None] = None # split/delete_pages/edit_pdf = 1
too_many_documents_message: ClassVar[str | None] = None # per-op error text (H3)
needs_user: ClassVar[bool] = False # → METHOD_NAMES_REQUIRING_USER
needs_trigger_source: ClassVar[bool] = False # → ..._REQUIRING_TRIGGER_SOURCE
parameter_serializer_class: ClassVar[type[serializers.Serializer] | None] = None
example_parameters: ClassVar[dict] = {} # → OpenApiExample payload
def clean_parameters(self, parameters: dict, *, user, documents) -> dict:
"""Validate + coerce via parameter_serializer_class (context=user,documents).
Returns the normalized parameters. Raises serializers.ValidationError.
No-op passthrough when parameter_serializer_class is None."""
def required_permissions(self, parameters: dict) -> PermissionRequirements:
"""The parameter-conditional permission kernel. Default: change only."""
return PermissionRequirements()
@abstractmethod
def execute(self, doc_ids: list[int], **parameters) -> str: ...
```
The two subtle operations, stated next to their own rules:
```python
class MergeOperation(BulkEditOperation):
name = "merge"
supports_all = False
needs_user = needs_trigger_source = True
parameter_serializer_class = MergeParametersSerializer
example_parameters = {"delete_originals": False, "archive_fallback": False}
def required_permissions(self, parameters):
delete = parameters.get("delete_originals", False)
return PermissionRequirements(
change=True, add_document=True,
ownership=delete, delete_document=delete,
)
def execute(self, doc_ids, **kw):
return bulk_edit.merge(doc_ids, **kw)
class EditPdfOperation(BulkEditOperation):
name = "edit_pdf"
supports_all = False
max_documents = 1
needs_user = needs_trigger_source = True
parameter_serializer_class = EditPdfParametersSerializer
example_parameters = {
"operations": [{"page": 1, "rotate": 90}],
"update_document": False,
"include_metadata": True,
}
def required_permissions(self, parameters):
update = parameters.get("update_document", False)
# edit_pdf is ALWAYS ownership-gated (views.py:2722); add_document only
# when NOT update_document (views.py:2740-2741).
return PermissionRequirements(
change=True, ownership=True, add_document=not update,
)
```
### Registry
```python
BULK_EDIT_OPERATIONS: dict[str, BulkEditOperation] = {
op.name: op
for op in (
SetCorrespondentOperation(), SetDocumentTypeOperation(),
SetStoragePathOperation(), AddTagOperation(), RemoveTagOperation(),
ModifyTagsOperation(), ModifyCustomFieldsOperation(),
SetPermissionsOperation(),
# legacy section — MUST match MOVED_DOCUMENT_ACTION_ENDPOINTS key order
# (serialisers.py:1745-1754) so the generated enum is byte-identical:
DeleteOperation(), ReprocessOperation(), RotateOperation(),
MergeOperation(), EditPdfOperation(), RemovePasswordOperation(),
SplitOperation(), DeletePagesOperation(),
)
}
```
There is **no** `redo_ocr` entry. `validate_method` has a `method == "redo_ocr"`
branch (`serialisers.py:1843`), but `"redo_ocr"` is absent from `choices`
(`serialisers.py:1758-1768`), so the `ChoiceField` rejects it _before_
`validate_method` runs — that branch is unreachable dead code today. Do **not**
add `redo_ocr` to the registry: doing so would make it a valid `choices` entry
and newly accept it on the wire (a contract change). `reprocess` is registered
once, under `reprocess`.
### How each call site collapses
- **`ChoiceField.choices`** → `list(BULK_EDIT_OPERATIONS)` (the 16 unique
methods, registry ordered to match today). Legacy methods are already registry
ops, so they are **not** appended separately (see Decision 5).
- **`validate_method`** → `return BULK_EDIT_OPERATIONS[method]` (the validated
value becomes an _operation object_ instead of a function — internal only,
`method` is `write_only`).
- **`validate()`** → `op.clean_parameters(parameters, user=…, documents=…)`; the
`all=true` guard becomes `if attrs.get("all") and not op.supports_all: raise
ValidationError("This method does not support all=true.")` (today's single
shared message, `serialisers.py:2077`, asserted verbatim by
`test_api_bulk_edit.py:763`); the per-method "only one document" checks become
an `op.max_documents` check that raises `op.too_many_documents_message`. That
text is **per-op** — "Split method only supports one document", "Delete pages
method only supports one document", "Edit PDF method only supports one document"
(`serialisers.py:2105,2111,2119`) — and is asserted verbatim (e.g.
`test_api_bulk_edit.py:1519`), so it **cannot** be collapsed to one generic
string.
- **`METHOD_NAMES_REQUIRING_USER` / `_TRIGGER_SOURCE`** → `op.needs_user` /
`op.needs_trigger_source`.
- **The three permission blocks** → one pass:
```python
reqs = op.required_permissions(parameters)
ok = user.has_perm("documents.change_document") and all(
has_perms_owner_aware(user, "change_document", d) for d in document_objs
)
if ok and reqs.ownership: ok = user_is_owner_of_all_documents
if ok and reqs.add_document: ok = user.has_perm("documents.add_document")
if ok and reqs.delete_document: ok = user.has_perm("documents.delete_document")
```
- **`MODIFIED_FIELD_BY_METHOD`** → `op.audit_field`.
**Two call sites consume this, not one.** `BulkEditView.post`
(`views.py:2852-2947`) is a fully **inlined** path — it is the only path the
`bulk_edit/` endpoint uses. It checks permissions, sets `user`/`trigger_source`,
runs the audit-log block (`views.py:2896-2940`, currently keyed on
`method.__name__` → becomes `op.audit_field`), and calls `method(documents,
**parameters)`. `_execute_document_action` (`views.py:2764-2807`) is a
**separate** path used by the _moved single-action_ endpoints
(`/api/documents/delete/`, `/rotate/`, …); it builds `parameters`, sets
user/trigger, and checks permissions independently and has **no** audit logging.
The refactor must convert **both** to the registry; audit logging stays only in
`post`.
## Operation inventory (the faithful matrix)
Compiled from `bulk_edit.py` signatures, `serialisers.py:2067-2126`, and
`views.py:2679-2760`. `change` is required for every operation and omitted.
`[source_mode]` is the shared optional param accepted by the PDF-touching ops
(validated by `_validate_source_mode` only when present).
| Operation (`name`) | Parameters | `supports_all` | `max_documents` | user/trigger | `audit_field` | ownership | add_doc | delete_doc |
| ------------------------------ | ------------------------------------------------------------------------------------------------------------- | -------------- | --------------- | ------------ | --------------- | ---------------------- | ------------------------- | ---------------------- |
| `set_correspondent` | `correspondent: int\|null` | yes | — | no | `correspondent` | — | — | — |
| `set_document_type` | `document_type: int\|null` | yes | — | no | `document_type` | — | — | — |
| `set_storage_path` | `storage_path: int\|null` | yes | — | no | `storage_path` | — | — | — |
| `add_tag` | `tag: int` | yes | — | no | `tags` | — | — | — |
| `remove_tag` | `tag: int` | yes | — | no | `tags` | — | — | — |
| `modify_tags` | `add_tags: int[]`, `remove_tags: int[]` | yes | — | no | `tags` | — | — | — |
| `modify_custom_fields` | `add_custom_fields: int[]\|{id:val}`, `remove_custom_fields: int[]` | yes | — | no | `custom_fields` | — | — | — |
| `set_permissions` | `set_permissions: obj`, `owner: int\|null`, `merge: bool=false` | yes | — | no | `None` | **yes** | — | — |
| `delete` | _(none)_ | yes | — | no | `deleted_at` | **yes** | — | **yes** |
| `reprocess` (alias `redo_ocr`) | _(none)_ | yes | — | no | `checksum` | — | — | — |
| `rotate` | `degrees: int`, `[source_mode]` | yes | — | **yes** | `None` | **yes** | — | — |
| `merge` | `delete_originals: bool=false`, `archive_fallback: bool=false`, `metadata_document_id?: int`, `[source_mode]` | **no** | — | **yes** | `None` | iff `delete_originals` | **yes** | iff `delete_originals` |
| `split` | `pages: str→int[][]`, `delete_originals: bool=false`, `[source_mode]` | **no** | **1** | **yes** | `None` | iff `delete_originals` | **yes** | iff `delete_originals` |
| `delete_pages` | `pages: int[]`, `[source_mode]` | **no** | **1** | **yes** | `None` | **yes** | — | — |
| `edit_pdf` | `operations: obj[]`, `update_document: bool=false`, `include_metadata: bool=true`, `[source_mode]` | **no** | **1** | **yes** | `None` | **yes** | iff not `update_document` | — |
| `remove_password` | `password: str`, `update_document: bool=false`, `[source_mode]` | **no** | — | **yes** | `None` | **yes** | iff not `update_document` | — |
Notes that are easy to get wrong and are pinned here:
- `edit_pdf` ownership is **unconditional** — it is in the unconditional
ownership list (`views.py:2722`); the separate `edit_pdf and update_document`
clause (`views.py:2730`) is redundant and folds away.
- `remove_password` **does** accept an `update_document` param
(`bulk_edit.py:881`), and `parameters` is a passthrough `DictField` whose
validator (`serialisers.py:2061-2065`) neither strips nor defaults it. So its
`add_document` requirement is `not parameters.get("update_document", False)` —
identical to `edit_pdf`, **not** an unconditional `True`. Sending
`update_document: true` legitimately drops the add_document requirement today,
and that behavior must be preserved. (Earlier drafts claimed the param did not
exist — that was a permission-correctness bug.)
- `merge` and `remove_password` are **not** single-document (no `max_documents`),
even though both set `supports_all = False`.
## Parameter coercion contract to preserve
`clean_parameters` must reproduce every in-place coercion the current
`_validate_parameters_*` methods perform, not merely the validation. Full list
(an implementation-plan checklist):
- `merge` / `split`: default `delete_originals=False`
(`serialisers.py:1998,2013`); `merge` also defaults `archive_fallback=False`
(`:2018`).
- `edit_pdf`: default `update_document=False`, `include_metadata=True`
(`:2038,2043`); reject `update_document=True` with multiple output docs
(`:2045-2050`).
- `set_permissions`: default `merge=False` (`:1951-1952`) and **mutate**
`parameters["set_permissions"]` in place via `validate_set_permissions`
(`:1946`, `SetPermissionsMixin`); validate `owner` existence
(`:1939-1943,1949-1950`). Needs its own `SetPermissionsParametersSerializer`.
- `split`: parse the `pages` string `"1-3,5"` → `[[1,2,3],[5]]` (`:1974-1990`).
- `source_mode`: validated and applied **only when present** in `parameters`
(`:2084-2085` gate → `validate_source_mode`, `:1964-1969`), independent of
method — so each PDF-touching op's serializer opts into it conditionally.
- `modify_custom_fields`: accept **list OR `{id: value}` dict**, and for
DOCUMENTLINK fields validate targets via `validate_documentlink_targets`
(`:1787-1824`).
- **Param-name spelling differs by op** and must match exactly: `merge`/`split`
use `delete_originals` (plural); `edit_pdf`/`remove_password` use
`delete_original` (singular) (`bulk_edit.py:509,619,751,882`).
## OpenAPI examples (the "make it useful" piece)
A single helper builds the examples from the registry:
```python
def _bulk_edit_examples() -> list[OpenApiExample]:
return [
OpenApiExample(
name=op.name,
summary=op.name,
value={"documents": [1, 2], "method": op.name,
"parameters": op.example_parameters},
request_only=True,
)
for op in BULK_EDIT_OPERATIONS.values()
]
```
wired into the existing decorator (the response schema at `views.py:2818-2825`
is untouched):
```python
@extend_schema_view(
post=extend_schema(
operation_id="bulk_edit",
description="Perform a bulk edit operation on a list of documents",
examples=_bulk_edit_examples(),
responses={200: inline_serializer(name="BulkEditDocumentsResult",
fields={"result": serializers.CharField()})},
),
)
```
Result: the Swagger/Redoc page shows a concrete, valid request body for every
operation (`merge`, `edit_pdf`, …), generated from the same objects that
validate the request — they cannot drift apart. The request _structure_
(`{documents, method, parameters, …}`) and the `method` `enum` are unchanged;
examples are purely additive.
## Data flow
```
POST /api/documents/bulk_edit/ {documents|all|filters, method, parameters, from_webui}
├─ legacy method? → log deprecation warning, then process INLINE (no redirect; views.py:2856-2866)
├─ BulkEditSerializer.validate_method(method) → op = BULK_EDIT_OPERATIONS[method]
├─ validate():
│ ├─ all=true and not op.supports_all → ValidationError (shared message)
│ ├─ op.max_documents and len(documents) > it → ValidationError (op.too_many_documents_message)
│ └─ parameters = op.clean_parameters(parameters, user=…, documents=…)
└─ BulkEditView.post (inlined; the only path bulk_edit/ uses):
├─ if op.needs_user: parameters["user"] = user
├─ if op.needs_trigger_source: parameters["trigger_source"] = WEB_UI|API_UPLOAD
├─ reqs = op.required_permissions(parameters); check change/ownership/add/delete
│ → 403 HttpResponseForbidden on failure (unchanged)
├─ if op.audit_field and AUDIT_LOG_ENABLED: snapshot old values (views.py:2896-2910)
├─ result = op.execute(documents, **parameters) (call-time bulk_edit.<fn> lookup)
└─ if op.audit_field and AUDIT_LOG_ENABLED: LogEntry per doc → Response({"result": …})
note: _execute_document_action (views.py:2764-2807) is the SEPARATE moved-single-action path
(/api/documents/delete/, /rotate/, …); it converts to the registry too, but has NO audit log.
audit "reason" string uses op.name (== bulk_edit.<fn>.__name__ today, so unchanged at runtime).
```
## Backwards compatibility
- **Wire contract:** request/response shapes, accepted `method` strings,
parameter names, defaults, coercion, and every permission outcome are
byte-for-byte preserved. `method` becoming an operation object is internal
(`write_only`).
- **`bulk_edit.<fn>` patching keeps working — by module identity, not luck.**
Existing tests patch `documents.serialisers.bulk_edit.<fn>` and
`documents.views.bulk_edit.<fn>` (e.g. `test_api_bulk_edit.py:203,485,1100,1271`).
All of `documents.serialisers.bulk_edit`, `documents.views.bulk_edit`, and the
new `documents.bulk_operations.bulk_edit` are the **same module object** in
`sys.modules`; patching an attribute via any path mutates the one shared module.
So as long as each `op.execute` does a **call-time** lookup
(`return bulk_edit.merge(doc_ids, **kw)`, not a function captured at
class-definition time), the existing patches still intercept and those tests
stay untouched.
- **The `method.__name__` dependency disappears.** `setup_mock`
(`test_api_bulk_edit.py:61-63`) sets `m.__name__` because dispatch reads
`method.__name__` (`views.py:2783,2879,2896,2938`). The refactor replaces every
such read with `op.name` / `op.needs_user` / `op.audit_field`, so the mock's
`__name__` no longer affects dispatch. The audit "reason" becomes
`f"Bulk edit: {op.name}"`; since `bulk_edit.merge.__name__ == "merge" ==
op.name`, real-run audit text is unchanged. No test asserts
`validated_data["method"]` identity (verified), so `validate_method` returning
an operation object is safe.
- **Legacy methods:** `MOVED_DOCUMENT_ACTION_ENDPOINTS` /
`LEGACY_DOCUMENT_ACTION_METHODS` and the v9-drop TODO are unchanged. They drive
only the inline deprecation warning (`views.py:2856-2866`), **not** the
`choices` — which come wholly from the registry, since the legacy methods _are_
registry ops (see C1/Decision 5).
- **OpenAPI:** the `method` `enum` and request/response structure are unchanged
(Decision 5); `examples` are additive. Regenerated schema diff should be
_examples only_.
## Testing
New `documents/tests/test_bulk_operations.py` (pytest classes, factory-boy
factories, the `mocker` fixture, `parametrize`, full type annotations; run on the
Linux VM):
- **Permission matrix, parametrized over every operation** — the highest-value
test. For each op and each relevant parameter combination
(`delete_originals` on/off, `update_document` on/off), assert
`op.required_permissions(params)` equals the expected
`PermissionRequirements`. This freezes the security kernel against drift.
- **Registry/serializer parity** — `ChoiceField.choices` equals the **16 unique**
method strings in today's exact order (8 field-ops, then the 8
`MOVED_DOCUMENT_ACTION_ENDPOINTS` keys); **no duplicates**; `redo_ocr` absent;
every method resolves to an operation. (Guards against the C1 duplication bug.)
- **Parameter validation/coercion** per op — defaults applied (`merge` →
`delete_originals=False`, `archive_fallback=False`; `split`/`edit_pdf` defaults),
the `pages` string→list parse, page-bounds-vs-`page_count`, documentlink target
and owner-existence checks — mirroring the current `_validate_parameters_*` tests.
- **`supports_all` / `max_documents`** — `all=true` rejected for the five
no-all ops; `>1` document rejected for `split`/`delete_pages`/`edit_pdf`.
- **Examples** — `_bulk_edit_examples()` yields one entry per distinct operation,
each `value["parameters"]` validates clean through that op's
`parameter_serializer_class` (guarantees documented examples are valid).
Existing `test_api_bulk_edit.py` / `test_api_bulk_download.py` stay green
unchanged — external behavior (accepted methods, validation errors, permission
403s, audit fields, results) is preserved.
## Risks
- **Permission-matrix mistranslation is a privilege-escalation bug, not a
cosmetic one.** This is the whole ballgame. Mitigation: move the logic verbatim
into per-op `required_permissions`, and the parametrized permission test above
is written _first_ against the current behavior, then held invariant across the
refactor.
- **The `method`-as-function-object contract** is relied on by existing tests
(identity compares, `bulk_edit.<fn>` patching). Mitigation: keep `execute`
delegating to the module-level function so patches still bite; adjust only the
identity asserts. Audit `test_api_bulk_edit.py` before coding.
- **Serializer-based validation subtly changing error messages/shapes.** The
current validators raise specific `ValidationError` strings that tests assert
on. Mitigation: preserve message text when porting each `_validate_parameters_*`
into its serializer; diff the test expectations.
- **Enum churn in the generated schema.** Mitigation: Decision 5 fixes member set
and order; the schema-diff check in CI should show examples-only changes.