From 8ef3e4ad0089c61d6768a9018d25ecfef3ee68e7 Mon Sep 17 00:00:00 2001 From: stumpylog <797416+stumpylog@users.noreply.github.com> Date: Tue, 16 Jun 2026 09:45:03 -0700 Subject: [PATCH] Docs: bulk-edit operation registry spec + implementation plan Adds the extensibility design that hoists the bulk-edit operation definition (today smeared across serialisers.py, views.py, bulk_edit.py and keyed three different ways) behind a BulkEditOperation registry + PermissionRequirements value object, with per-operation OpenAPI examples. Contract-preserving refactor; both docs reviewed across multiple passes (permission matrix verified, both view call sites accounted for). Co-Authored-By: Claude Opus 4.8 --- ...2026-06-16-bulk-edit-operation-registry.md | 524 ++++++++++++++++++ ...-16-bulk-edit-operation-registry-design.md | 513 +++++++++++++++++ 2 files changed, 1037 insertions(+) create mode 100644 docs/superpowers/plans/2026-06-16-bulk-edit-operation-registry.md create mode 100644 docs/superpowers/specs/2026-06-16-bulk-edit-operation-registry-design.md diff --git a/docs/superpowers/plans/2026-06-16-bulk-edit-operation-registry.md b/docs/superpowers/plans/2026-06-16-bulk-edit-operation-registry.md new file mode 100644 index 000000000..3a6dc880f --- /dev/null +++ b/docs/superpowers/plans/2026-06-16-bulk-edit-operation-registry.md @@ -0,0 +1,524 @@ +# Bulk-Edit Operation Registry Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Collapse the bulk-edit operation definition — today smeared across 8 sites in 3 files, keyed 3 different ways — into a single `BulkEditOperation` object per operation, held in an ordered registry. The serializer and both view call sites consume the registry instead of re-encoding the operation list. The wire/API contract is preserved byte-for-byte; per-operation OpenAPI examples are added so the bulk API documents itself. + +**Architecture:** A new `documents/bulk_operations.py` defines a `BulkEditOperation` ABC, a frozen `PermissionRequirements` value object, a per-operation DRF parameter serializer (validation + coercion), and an ordered `BULK_EDIT_OPERATIONS` registry whose 16 entries wrap the existing `bulk_edit.py` functions (which are unchanged). `BulkEditSerializer` resolves a method string to an operation and delegates parameter validation; `BulkEditView.post` and `_execute_document_action` read `op.needs_user` / `op.audit_field` / `op.required_permissions(...)` instead of the `METHOD_NAMES_*` sets, `MODIFIED_FIELD_BY_METHOD`, and the three `method in [...]` permission blocks. + +**Tech Stack:** Python ≥3.11, Django REST Framework, drf-spectacular, pytest + pytest-mock + factory-boy. Backend tests run on the Linux VM (this is a Windows host); `ruff` runs locally. + +**Spec:** `docs/superpowers/specs/2026-06-16-bulk-edit-operation-registry-design.md` (rev. 2 — read the Operation inventory matrix and the Parameter coercion contract before starting; they are the source of truth for every per-op cell). + +--- + +## Conventions for every task + +- **Run backend tests on the VM** via the helper (never locally — the lockfile is linux/macOS only): + ```bash + bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "" + ``` +- **Lint locally** with the global ruff binary (not `uv run`): + ```bash + ruff check src/documents/bulk_operations.py src/documents/serialisers.py src/documents/views.py + ruff format src/documents/bulk_operations.py src/documents/serialisers.py src/documents/views.py + ``` +- **New tests are pytest-style** (per CLAUDE.md): grouped in classes, `@pytest.mark.django_db` on the class where DB is needed, factory-boy factories (`UserFactory`, `DocumentFactory`, `TagFactory`, …), the `mocker` fixture, `@pytest.mark.parametrize`, full type annotations on fixtures and tests. +- **`CustomFieldFactory` does not exist yet** in `tests/factories.py` (only `Correspondent`/`DocumentType`/`Tag`/`StoragePath`/`Document`/`User`/`PaperlessTask`). The `modify_custom_fields` `clean_parameters` tests need `CustomField` rows — add a `CustomFieldFactory` there first (per CLAUDE.md's "add a factory when a model lacks one"). +- **Do NOT convert the existing `test_api_bulk_edit.py`** (DRF `APITestCase` style) — it is the regression net and stays as-is. It must be green at every commit. Its `mock.patch("documents.serialisers.bulk_edit.")` / `documents.views.bulk_edit.` targets keep working **only if** the two invariants below hold — verify them, do not assume them. + +### Two load-bearing invariants (the contract-preservation kernel) + +1. **Module identity:** `serialisers.py`, `views.py`, and the new `bulk_operations.py` must each import the operations module as `from documents import bulk_edit` (module import, not `from documents.bulk_edit import merge`). All three then reference the _same_ `sys.modules["documents.bulk_edit"]` object, so a `mock.patch("documents.serialisers.bulk_edit.merge")` mutates the attribute every call site sees. **Verify** `serialisers.py` and `views.py` already use `from documents import bulk_edit` before relying on this. +2. **Call-time lookup:** each `BulkEditOperation.execute` must call `bulk_edit.merge(doc_ids, **kw)` (attribute lookup at call time), NOT capture the function at class-definition time (`fn = bulk_edit.merge` as a class attribute). Otherwise the patch — applied after import — won't be seen. + +## File structure + +- **Create** `src/documents/bulk_operations.py` — `PermissionRequirements`, `BulkEditOperation` ABC, the per-op parameter serializers, the 16 operation classes, and the ordered `BULK_EDIT_OPERATIONS` registry. One cohesive module. +- **Create** `src/documents/tests/test_bulk_operations.py` — pytest-style unit tests: the permission-matrix characterization (Task 1), then `required_permissions` / `clean_parameters` / registry-parity unit tests (Task 2). +- **Modify** `src/documents/serialisers.py` — rewrite `BulkEditSerializer.method` choices, `validate_method`, and `validate()`; delete the `_validate_parameters_*` methods (their logic moves into the per-op serializers). +- **Modify** `src/documents/views.py` — rewrite `_has_document_permissions`; delete `METHOD_NAMES_REQUIRING_USER`/`_TRIGGER_SOURCE` and `MODIFIED_FIELD_BY_METHOD`; route `BulkEditView.post` through the registry; change `_execute_document_action`'s signature from `method` to `op`; and update the **six** moved-endpoint caller views (`RotateDocumentsView`, `MergeDocumentsView`, `DeleteDocumentsView`, `ReprocessDocumentsView`, `EditPdfDocumentsView`, `RemovePasswordDocumentsView`, `views.py:2964-3109`) to pass `op=BULK_EDIT_OPERATIONS[""]` instead of `method=bulk_edit.`. Add `from drf_spectacular.utils import OpenApiExample` (Task 4 needs it — not currently imported). + +--- + +## Task 1: Permission-matrix characterization test (the safety net) + +This test freezes today's permission behavior **before** any refactor. It must PASS against the current code unchanged — if any case is red now, the spec's matrix (or your reading of it) is wrong; stop and reconcile before proceeding. After the cutover (Task 3) it must still pass identically. + +**Files:** + +- Create: `src/documents/tests/test_bulk_operations.py` + +- [ ] **Step 1: Write the behavior-level permission test against the live API** + +Drive the real `bulk_edit/` endpoint so the test is independent of internal structure (it survives the refactor without edits). Build users with precise permission sets and owners, and assert the 200-vs-403 outcome per operation and parameter combination. Cover, at minimum, the conditional cases the spec calls out: + +- ownership required: `set_permissions`, `delete`, `rotate`, `delete_pages`, `edit_pdf`, `remove_password` (unconditional); `merge`/`split` only when `delete_originals=true`. +- `add_document` required: `split`, `merge` (unconditional); `edit_pdf`/`remove_password` only when `update_document` is falsy. +- `delete_document` required: `delete` (unconditional); `merge`/`split` only when `delete_originals=true`. + +```python +import pytest +from rest_framework import status +from rest_framework.test import APIClient + +from documents.models import Document +from documents.tests.factories import DocumentFactory +from documents.tests.factories import UserFactory + + +@pytest.mark.django_db +class TestBulkEditPermissionMatrix: + @pytest.fixture() + def owned_docs(self, ...) -> list[Document]: ... + + # parametrize (method, parameters, perms_to_grant, is_owner) -> expected_status + @pytest.mark.parametrize(("method", "parameters", "grant", "owner", "expected"), [ + ("set_correspondent", {"correspondent": None}, ["change"], False, status.HTTP_200_OK), + ("delete", {}, ["change"], True, status.HTTP_200_OK), + ("delete", {}, ["change"], False, status.HTTP_403_FORBIDDEN), # ownership + ("delete", {}, ["change", "delete"], False, status.HTTP_403_FORBIDDEN), # still needs ownership + ("merge", {"delete_originals": False}, ["change", "add"], False, status.HTTP_200_OK), # no ownership when not deleting + ("merge", {"delete_originals": True}, ["change", "add", "delete"], False, status.HTTP_403_FORBIDDEN), # ownership now required + ("edit_pdf", {"operations": [{"page": 1}], "update_document": False}, ["change"], True, status.HTTP_403_FORBIDDEN), # needs add_document + ("edit_pdf", {"operations": [{"page": 1}], "update_document": True}, ["change"], True, status.HTTP_200_OK), # update => owner+change only + ("remove_password", {"password": "x", "update_document": False}, ["change"], True, status.HTTP_403_FORBIDDEN), # needs add_document + ("remove_password", {"password": "x", "update_document": True}, ["change"], True, status.HTTP_200_OK), + # ... fill every row of the spec matrix, both polarities of each conditional ... + ]) + def test_permission_outcome(self, method, parameters, grant, owner, expected, ...) -> None: + # mock the actual bulk_edit. so execution is a no-op; we test ONLY the + # permission gate's status code, not the operation's effect. + ... +``` + +Notes: + +- Mock the underlying `bulk_edit.` (patch `documents.views.bulk_edit.`) so the operations don't actually run — this test is purely about the permission gate returning 200 vs 403. +- A superuser short-circuits to allowed (`views.py:2697`); include one superuser row to pin that. +- This is verbose by design; the matrix is the security contract. Prefer one parametrized test over hand-written methods. +- **Cover the six moved single-action endpoints too (REQUIRED — C2).** `/api/documents/rotate/`, `/merge/`, `/delete/`, `/reprocess/`, `/edit_pdf/`, `/remove_password/` run the **same** `_has_document_permissions` gate via `_execute_document_action`, and that path is rewritten in Task 3 (C1). Add a parallel parametrized test that POSTs to each (their request bodies are the dedicated serializers' fields — e.g. `{"documents": [...], "degrees": 90}` for rotate — **not** a `method`+`parameters` envelope). The existing `test_api_bulk_edit.py` already covers these endpoints' permission gates (`test_rotate_insufficient_permissions:1320`, `test_merge_and_delete_insufficient_permissions:1381`, `test_edit_pdf_insufficient_permissions:1635`, `test_remove_password_insufficient_permissions:1719`), so this is hardening rather than the sole net — but make the moved-endpoint matrix explicit here so the `_execute_document_action` rewrite is guarded by a parametrized characterization, not scattered one-offs. +- **`edit_pdf` test docs need a `page_count` (M3).** `clean_parameters` for `edit_pdf` bounds-checks `op["page"]` against `Document.page_count` (`serialisers.py:2052-2059`); this test mocks execution but **not** validation, so an `edit_pdf` row with `page: 1` needs its target doc created with `page_count >= 1`, else it fails with a 400 (out-of-bounds) instead of the expected 200/403. + +- [ ] **Step 2: Run it against CURRENT code — it must PASS** + +Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_bulk_operations.py -v"` +Expected: PASS. If any row is red, the spec matrix is misread — reconcile against `views.py:2713-2760` before writing any production code. + +- [ ] **Step 3: Commit** + +```bash +git add src/documents/tests/test_bulk_operations.py +git commit -m "Test: characterize bulk-edit permission matrix before refactor" +``` + +--- + +## Task 2: Build `bulk_operations.py` (registry, ABC, ops, serializers) — old path untouched + +Build the entire new module with full unit coverage **while the existing dispatch still runs**, so the whole suite stays green throughout. Nothing in `serialisers.py`/`views.py` changes in this task. + +**Files:** + +- Create: `src/documents/bulk_operations.py` +- Modify (append): `src/documents/tests/test_bulk_operations.py` + +- [ ] **Step 1: Write failing unit tests for `PermissionRequirements`, `required_permissions`, and `clean_parameters`** + +Append to `test_bulk_operations.py`. White-box this time — assert the value objects directly: + +```python +from documents import bulk_operations as ops + + +class TestRequiredPermissions: + @pytest.mark.parametrize(("name", "params", "expected"), [ + ("set_correspondent", {}, ops.PermissionRequirements(change=True)), + ("delete", {}, ops.PermissionRequirements(change=True, ownership=True, delete_document=True)), + ("merge", {"delete_originals": False}, ops.PermissionRequirements(change=True, add_document=True)), + ("merge", {"delete_originals": True}, ops.PermissionRequirements(change=True, add_document=True, ownership=True, delete_document=True)), + ("edit_pdf", {"update_document": False}, ops.PermissionRequirements(change=True, ownership=True, add_document=True)), + ("edit_pdf", {"update_document": True}, ops.PermissionRequirements(change=True, ownership=True)), + ("remove_password", {"update_document": False}, ops.PermissionRequirements(change=True, ownership=True, add_document=True)), + ("remove_password", {"update_document": True}, ops.PermissionRequirements(change=True, ownership=True)), + # ... every operation, both polarities of each conditional (spec matrix) ... + ]) + def test_required_permissions(self, name, params, expected) -> None: + assert ops.BULK_EDIT_OPERATIONS[name].required_permissions(params) == expected + + +class TestRegistryParity: + def test_choices_are_16_unique_in_canonical_order(self) -> None: + # 8 field-ops, then MOVED_DOCUMENT_ACTION_ENDPOINTS key order + assert list(ops.BULK_EDIT_OPERATIONS) == [ + "set_correspondent", "set_document_type", "set_storage_path", + "add_tag", "remove_tag", "modify_tags", "modify_custom_fields", + "set_permissions", + "delete", "reprocess", "rotate", "merge", + "edit_pdf", "remove_password", "split", "delete_pages", + ] + assert "redo_ocr" not in ops.BULK_EDIT_OPERATIONS + + def test_every_op_executes_via_module_attribute(self, mocker) -> None: + # guards invariant #2: call-time lookup so patches still bite + m = mocker.patch("documents.bulk_operations.bulk_edit.merge", return_value="OK") + ops.BULK_EDIT_OPERATIONS["merge"].execute([1], delete_originals=False) + m.assert_called_once() + + +@pytest.mark.django_db +class TestCleanParameters: + # mirror the existing _validate_parameters_* tests: defaults applied, pages + # string parse, page-bounds vs page_count, custom-field list-or-dict + + # documentlink targets, owner existence, source_mode gating. Assert the SAME + # ValidationError message strings the old validators raised. + ... +``` + +- [ ] **Step 2: Run to verify it fails** + +Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_bulk_operations.py::TestRequiredPermissions -v"` +Expected: FAIL with `ModuleNotFoundError: No module named 'documents.bulk_operations'`. + +- [ ] **Step 3: Implement `PermissionRequirements` and the `BulkEditOperation` ABC** + +```python +from __future__ import annotations + +import dataclasses +from abc import ABC +from abc import abstractmethod +from typing import ClassVar + +from rest_framework import serializers + +from documents import bulk_edit # module import — invariant #1 + + +@dataclasses.dataclass(frozen=True) +class PermissionRequirements: + change: bool = True # documents.change_document + object-level, always + ownership: bool = False # user owns (or doc.owner is None for) ALL docs + add_document: bool = False # documents.add_document + delete_document: bool = False # documents.delete_document + + +class BulkEditOperation(ABC): + name: ClassVar[str] + audit_field: ClassVar[str | None] = None + supports_all: ClassVar[bool] = True + max_documents: ClassVar[int | None] = None + too_many_documents_message: ClassVar[str | None] = None + needs_user: ClassVar[bool] = False + needs_trigger_source: ClassVar[bool] = False + parameter_serializer_class: ClassVar[type[serializers.Serializer] | None] = None + example_parameters: ClassVar[dict] = {} + + def clean_parameters(self, parameters: dict, *, user, documents: list[int]) -> dict: + if self.parameter_serializer_class is None: + return parameters + serializer = self.parameter_serializer_class( + data=parameters, + context={"user": user, "documents": documents}, + ) + serializer.is_valid(raise_exception=True) + # merge coerced/validated values back over the raw dict so passthrough + # keys (e.g. metadata_document_id, source_mode) survive. + return {**parameters, **serializer.validated_data} + + def required_permissions(self, parameters: dict) -> PermissionRequirements: + return PermissionRequirements() + + @abstractmethod + def execute(self, doc_ids: list[int], **parameters) -> str: ... +``` + +- [ ] **Step 4: Implement the 16 operation classes + parameter serializers** + +Follow the spec's Operation inventory matrix for every cell. Representative examples — the simple assignment op, and the two conditional ones: + +```python +class SetCorrespondentOperation(BulkEditOperation): + name = "set_correspondent" + audit_field = "correspondent" + parameter_serializer_class = SetCorrespondentParametersSerializer # validates correspondent id|null + example_parameters = {"correspondent": 1} + + def execute(self, doc_ids, **kw): + return bulk_edit.set_correspondent(doc_ids, **kw) + + +class MergeOperation(BulkEditOperation): + name = "merge" + supports_all = False + needs_user = needs_trigger_source = True + parameter_serializer_class = MergeParametersSerializer + example_parameters = {"delete_originals": False, "archive_fallback": False} + + def required_permissions(self, parameters): + delete = parameters.get("delete_originals", False) + return PermissionRequirements( + change=True, add_document=True, + ownership=delete, delete_document=delete, + ) + + def execute(self, doc_ids, **kw): + return bulk_edit.merge(doc_ids, **kw) + + +class EditPdfOperation(BulkEditOperation): + name = "edit_pdf" + supports_all = False + max_documents = 1 + too_many_documents_message = "Edit PDF method only supports one document" + needs_user = needs_trigger_source = True + parameter_serializer_class = EditPdfParametersSerializer + example_parameters = {"operations": [{"page": 1, "rotate": 90}], "update_document": False, "include_metadata": True} + + def required_permissions(self, parameters): + # edit_pdf is ALWAYS ownership-gated (views.py:2722); add_document only + # when NOT update_document (views.py:2740-2741). + update = parameters.get("update_document", False) + return PermissionRequirements(change=True, ownership=True, add_document=not update) + + def execute(self, doc_ids, **kw): + return bulk_edit.edit_pdf(doc_ids, **kw) +``` + +Parameter serializers carry the validation+coercion the spec's "Parameter coercion contract to preserve" section enumerates — preserve the exact `ValidationError` message strings. Example for the DB/cross-field case: + +```python +class EditPdfParametersSerializer(serializers.Serializer): + operations = serializers.ListField(child=serializers.DictField()) + update_document = serializers.BooleanField(required=False, default=False) + include_metadata = serializers.BooleanField(required=False, default=True) + # source_mode handled here too, only when present + + def validate(self, attrs): + # reproduce serialisers.py:2045-2059 verbatim, incl. messages: + # - "update_document only allowed with a single output document" + # - page-bounds: "Page {n} is out of bounds for document with {k} pages." + # using self.context["documents"][0] / Document.objects.get(...) + return attrs +``` + +`RemovePasswordOperation` keeps an `update_document` param (it exists — `bulk_edit.py:881`); its `required_permissions` mirrors `EditPdfOperation`'s `add_document=not update` (but ownership is unconditional too — see matrix). `DeleteOperation` / `ReprocessOperation` set `parameter_serializer_class = None`. Do **not** register `redo_ocr`. + +**Defaulting parity (H3) — match each old validator exactly, no more, no less.** `test_api_bulk_edit.py` asserts `mock.call_args` kwargs, so a serializer that injects a default the old validator didn't will break those asserts. `edit_pdf` _did_ default `update_document=False` / `include_metadata=True` (`serialisers.py:2038-2043`) → keep them. `remove_password` validated **only** `password` (`serialisers.py:2061-2065`) and did **not** default `update_document` / `include_metadata` / `delete_original` → `RemovePasswordParametersSerializer` must declare only `password`. `update_document` then survives as a **raw passthrough key** in `parameters` (so `required_permissions` still reads it via `parameters.get("update_document", False)`), and no extra kwargs reach `bulk_edit.remove_password`. Apply the same "match the old defaulting" rule to every op. + +**`set_permissions` transform (H2) — the QuerySet shape is load-bearing.** `SetPermissionsParametersSerializer` must run `validate_set_permissions` (from `SetPermissionsMixin`, which `BulkEditSerializer` already inherits) so that `validated_data["set_permissions"]` carries the **QuerySet-dict** structure `bulk_edit.set_permissions` consumes — not the raw `{view:{users:[ids]}}` dict. A plain `DictField` would leave the raw dict in `validated_data`, and `{**parameters, **validated_data}` would then feed the function the wrong shape. Also default `merge=False` and validate `owner` existence (`serialisers.py:1946-1952`). + +Build the **ordered** registry (legacy section in `MOVED_DOCUMENT_ACTION_ENDPOINTS` key order — `edit_pdf, remove_password` before `split, delete_pages`): + +```python +BULK_EDIT_OPERATIONS: dict[str, BulkEditOperation] = { + op.name: op + for op in ( + SetCorrespondentOperation(), SetDocumentTypeOperation(), + SetStoragePathOperation(), AddTagOperation(), RemoveTagOperation(), + ModifyTagsOperation(), ModifyCustomFieldsOperation(), SetPermissionsOperation(), + DeleteOperation(), ReprocessOperation(), RotateOperation(), MergeOperation(), + EditPdfOperation(), RemovePasswordOperation(), SplitOperation(), DeletePagesOperation(), + ) +} +``` + +- [ ] **Step 5: Run unit tests to green** + +Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_bulk_operations.py -v"` +Expected: PASS (permission matrix, required_permissions, registry parity, clean_parameters). The existing `test_api_bulk_edit.py` is untouched and still green (old path runs). + +- [ ] **Step 6: Lint & commit** + +```bash +ruff check src/documents/bulk_operations.py && ruff format src/documents/bulk_operations.py +git add src/documents/bulk_operations.py src/documents/tests/test_bulk_operations.py +git commit -m "Feature: add bulk-edit operation registry (not yet wired)" +``` + +--- + +## Task 3: Cutover — wire the serializer and BOTH view call sites + +This is the atomic swap: `validate_method` returning an operation object ripples to both view sites, so serializer + views land in **one commit**. The full `test_api_bulk_edit.py` regression suite plus Task 1's matrix test are the contract; both must be green at the end. + +**Files:** + +- Modify: `src/documents/serialisers.py` +- Modify: `src/documents/views.py` + +- [ ] **Step 1: Confirm invariant #1** + +Grep that `serialisers.py` and `views.py` import `from documents import bulk_edit` (not `from documents.bulk_edit import ...`). If they use member imports, the existing patches break — convert to module import as part of this task and note it. + +- [ ] **Step 2: Rewrite `BulkEditSerializer`** + +- `method = serializers.ChoiceField(choices=list(bulk_operations.BULK_EDIT_OPERATIONS), ...)` — registry alone (16, canonical order), **not** `+ LEGACY_DOCUMENT_ACTION_METHODS`. +- `validate_method` → `return bulk_operations.BULK_EDIT_OPERATIONS[method]` (returns the op; raise `ValidationError("Unsupported method.")` on KeyError to preserve the message). +- `validate()`: + ```python + op = attrs["method"] + if attrs.get("all", False) and not op.supports_all: + raise serializers.ValidationError("This method does not support all=true.") + if op.max_documents is not None and len(attrs["documents"]) > op.max_documents: + raise serializers.ValidationError(op.too_many_documents_message) + attrs["parameters"] = op.clean_parameters( + attrs["parameters"], user=self.user, documents=attrs["documents"], + ) + return attrs + ``` +- **Delete** all `_validate_parameters_*` / `_validate_storage_path` / `validate_parameters_remove_password` methods (their logic now lives in the per-op serializers). Keep `MOVED_DOCUMENT_ACTION_ENDPOINTS` / `LEGACY_DOCUMENT_ACTION_METHODS` (still used by the view's deprecation warning). + +- [ ] **Step 3: Rewrite `_has_document_permissions` to consume `PermissionRequirements`** + +```python +def _has_document_permissions(self, *, user, documents, op, parameters) -> bool: + if user.is_superuser: + return True + document_objs = Document.objects.select_related("owner").filter(pk__in=documents) + reqs = op.required_permissions(parameters) + ok = user.has_perm("documents.change_document") and all( + has_perms_owner_aware(user, "change_document", doc) for doc in document_objs + ) + if ok and reqs.ownership: + ok = all((doc.owner == user or doc.owner is None) for doc in document_objs) + if ok and reqs.add_document: + ok = user.has_perm("documents.add_document") + if ok and reqs.delete_document: + ok = user.has_perm("documents.delete_document") + return ok +``` + +- [ ] **Step 4: Route BOTH call sites through the op — they obtain the op differently** + +There are two distinct paths, and `_execute_document_action` does **NOT** read `validated_data["method"]` (its serializers have no `method` field — it receives the operation as an argument). Handle each: + +- **Delete** `METHOD_NAMES_REQUIRING_USER`, `METHOD_NAMES_REQUIRING_TRIGGER_SOURCE` (note: it is an alias — `METHOD_NAMES_REQUIRING_TRIGGER_SOURCE = METHOD_NAMES_REQUIRING_USER` at `views.py:2687` — so they are one object), and `MODIFIED_FIELD_BY_METHOD`. + +- **`BulkEditView.post`** (`views.py:2852-2947`) — the `/bulk_edit/` path: `op = serializer.validated_data["method"]` (the registry object `validate_method` now returns). Replace `method.__name__ in METHOD_NAMES_REQUIRING_USER` → `op.needs_user`; trigger-source check → `op.needs_trigger_source`; the permission call → `_has_document_permissions(op=op, ...)`; `method(documents, **parameters)` → `op.execute(documents, **parameters)`. Audit block: `modified_field = op.audit_field` (replaces `MODIFIED_FIELD_BY_METHOD.get(method.__name__)`), reason → `f"Bulk edit: {op.name}"`. Snapshot/`log_create` otherwise unchanged. + +- **`_execute_document_action`** (`views.py:2764-2807`) — the moved single-action path used by six views: change its signature from `method` to `op: BulkEditOperation`. Inside, replace `method.__name__ in METHOD_NAMES_REQUIRING_USER` → `op.needs_user`; trigger check → `op.needs_trigger_source`; `_has_document_permissions(method=method, ...)` → `_has_document_permissions(op=op, ...)`; `method(documents, **parameters)` → `op.execute(documents, **parameters)`. This path has **no** audit block — leave it that way. `op.clean_parameters` is **not** called here: each moved view's own serializer (`RotateDocumentsSerializer`, `MergeDocumentsSerializer`, …) already validated its parameters; the op supplies only needs_user / needs_trigger_source / required_permissions / execute. + +- **The six caller views** (`RotateDocumentsView:2964`, `MergeDocumentsView:2991`, `DeleteDocumentsView:3018`, `ReprocessDocumentsView:3045`, `EditPdfDocumentsView:3072`, `RemovePasswordDocumentsView:3099`): change each `method=bulk_edit.` argument to `op=BULK_EDIT_OPERATIONS[""]` (e.g. `op=BULK_EDIT_OPERATIONS["rotate"]`). + +- [ ] **Step 5: Run the FULL regression + matrix suites** + +Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_api_bulk_edit.py src/documents/tests/test_bulk_operations.py -v"` +Expected: PASS — every existing `test_api_bulk_edit.py` test (patch targets still bite via invariant #1; `__name__`-dependent asserts gone), plus Task 1's matrix unchanged. If a `documents.serialisers.bulk_edit.X` / `documents.views.bulk_edit.X` patch stops biting, invariant #1 or #2 is violated — check the import style and that `execute` does call-time lookup. + +- [ ] **Step 6: Run the broader API + audit suites** (signals/audit log touch this path) + +Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_api_documents.py src/documents/tests/test_api_bulk_download.py -k bulk or audit -v"` +Expected: PASS. + +- [ ] **Step 7: Lint & commit** + +```bash +ruff check src/documents/serialisers.py src/documents/views.py && ruff format src/documents/serialisers.py src/documents/views.py +git add src/documents/serialisers.py src/documents/views.py +git commit -m "Refactor: route bulk_edit through the operation registry" +``` + +--- + +## Task 4: Registry-driven OpenAPI examples + +**Files:** + +- Modify: `src/documents/views.py` +- Test: `src/documents/tests/test_bulk_operations.py` + +- [ ] **Step 1: Write a failing test that every example validates** + +```python +class TestBulkEditExamples: + def test_every_operation_has_a_valid_example(self) -> None: + from documents.views import _bulk_edit_examples + examples = _bulk_edit_examples() + assert {e.summary for e in examples} == set(ops.BULK_EDIT_OPERATIONS) + for ex in examples: + op = ops.BULK_EDIT_OPERATIONS[ex.value["method"]] + if op.parameter_serializer_class is not None: + s = op.parameter_serializer_class(data=ex.value["parameters"], context={...}) + assert s.is_valid(), s.errors +``` + +- [ ] **Step 2: Implement the helper and wire `@extend_schema`** + +First add the import — `OpenApiExample` is **not** currently in `views.py` (extend the existing `from drf_spectacular.utils import ...` line): + +```python +from drf_spectacular.utils import OpenApiExample +``` + +```python +def _bulk_edit_examples() -> list[OpenApiExample]: + return [ + OpenApiExample( + name=op.name, summary=op.name, + value={"documents": [1, 2], "method": op.name, "parameters": op.example_parameters}, + request_only=True, + ) + for op in BULK_EDIT_OPERATIONS.values() + ] +``` + +Add `examples=_bulk_edit_examples()` to the existing `bulk_edit` `extend_schema(...)` (`views.py:2811-2825`). Leave `operation_id`, `description`, and the `responses` inline serializer unchanged. + +- [ ] **Step 3: Run the example test + a schema smoke check** + +Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_bulk_operations.py::TestBulkEditExamples -v"` +Then regenerate the OpenAPI schema on the VM and confirm the diff is **examples-only** — the `method` enum membership/order is byte-identical and the request/response structure is unchanged: + +```bash +ssh -o BatchMode=yes -p 2244 trenton@localhost 'bash -lc "cd ~/projects/paperless-ngx && uv run manage.py spectacular --file /tmp/schema.yml"' +``` + +Expected: schema generates without error; the `bulk_edit` `method` enum lists the 16 methods in canonical order; examples appear. + +- [ ] **Step 4: Lint & commit** + +```bash +ruff check src/documents/views.py && ruff format src/documents/views.py +git add src/documents/views.py src/documents/tests/test_bulk_operations.py +git commit -m "Feature: document bulk_edit parameters via per-operation OpenAPI examples" +``` + +--- + +## Task 5: Final verification + +**Files:** none (verification only). + +- [ ] **Step 1: Full bulk-edit-related suite** + +Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_api_bulk_edit.py src/documents/tests/test_bulk_operations.py src/documents/tests/test_api_bulk_download.py -v"` +Expected: PASS, no failures, no errors. + +- [ ] **Step 2: Type-check on the VM (pyrefly, with baseline)** + +```bash +tar czf - src pyproject.toml uv.lock .pyrefly-baseline.json | ssh -o BatchMode=yes -p 2244 trenton@localhost 'tar xzf - -C ~/projects/paperless-ngx' +ssh -o BatchMode=yes -p 2244 trenton@localhost 'bash -lc "cd ~/projects/paperless-ngx && uv run pyrefly check"' +``` + +Expected: no new type errors beyond the baseline. + +- [ ] **Step 3: Final lint/format pass** + +Run: `ruff check src/documents/bulk_operations.py src/documents/serialisers.py src/documents/views.py src/documents/tests/test_bulk_operations.py && ruff format --check src/documents/bulk_operations.py src/documents/serialisers.py src/documents/views.py` +Expected: clean. + +- [ ] **Step 4: Confirm the smear is gone** + +Grep to verify no orphaned references remain: `MODIFIED_FIELD_BY_METHOD`, `METHOD_NAMES_REQUIRING_USER`, `_validate_parameters_`, and `method.__name__` in `views.py` should all be gone; `bulk_edit.` should appear only inside `bulk_operations.py` `execute` methods. + +--- + +## Notes for the implementer + +- **The permission matrix is the whole ballgame.** A wrong `required_permissions` cell is a privilege-escalation bug, not a cosmetic one. Task 1's parametrized characterization test (written and green _before_ the refactor) is the guardrail — never weaken a case to make the refactor pass; if it goes red, the production code is wrong. +- **Preserve `ValidationError` message text verbatim** when porting `_validate_parameters_*` into the per-op serializers — `test_api_bulk_edit.py` asserts specific strings (e.g. the three distinct "only one document" messages, the all=true message, "out of bounds", "update_document only allowed with a single output document"). +- **Two call sites, obtained differently.** `BulkEditView.post` reads `op` from `validated_data["method"]` and owns the audit logging; `_execute_document_action` receives `op` as an argument from its **six** caller views (which change `method=bulk_edit.` → `op=BULK_EDIT_OPERATIONS[""]`) and has no audit. Convert both paths and all six caller views; Task 1 must characterize both before cutover. +- **`redo_ocr` stays unregistered** (dead/unreachable today; registering it would newly accept it on the wire). +- **Out of scope:** a discriminated `oneOf` request schema for `parameters` — examples (Task 4) are the agreed approach; the polymorphic schema is a possible later follow-up (the discriminator `method` and payload `parameters` are sibling fields, which `PolymorphicProxySerializer` does not model cleanly). diff --git a/docs/superpowers/specs/2026-06-16-bulk-edit-operation-registry-design.md b/docs/superpowers/specs/2026-06-16-bulk-edit-operation-registry-design.md new file mode 100644 index 000000000..5f5295505 --- /dev/null +++ b/docs/superpowers/specs/2026-06-16-bulk-edit-operation-registry-design.md @@ -0,0 +1,513 @@ +# Bulk-Edit Operation Registry — Design + +**Date:** 2026-06-16 +**Branch base:** `dev` +**Status:** Draft (rev. 2 — corrected per critical review) + +## Problem + +A single bulk-edit operation's definition is smeared across **eight sites in +three files**, keyed **three different ways**, with no single source of truth. +Taking `merge` as the worked example: + +| # | Location | What it holds | Keyed by | +| --- | ------------------------- | --------------------------------------------------------- | --------------------- | +| 1 | `serialisers.py:1758` | name in the `method` `ChoiceField.choices` | **string** | +| 2 | `serialisers.py:1849` | `validate_method` `elif` → returns `bulk_edit.merge` | **string → function** | +| 3 | `serialisers.py:2070` | the `all=true`-unsupported list | **function identity** | +| 4 | `serialisers.py:2115` | `validate()` dispatch → `_validate_parameters_merge` | **function identity** | +| 5 | `serialisers.py:2008` | `_validate_parameters_merge` (validate + coerce defaults) | — | +| 6 | `views.py:2687` | `METHOD_NAMES_REQUIRING_USER` / `_TRIGGER_SOURCE` | **`__name__`** | +| 7 | `views.py:2727,2738,2754` | three permission blocks (`method in [...]`) | **function identity** | +| 8 | `views.py:2844` | `MODIFIED_FIELD_BY_METHOD` audit field | **string** | + +Plus the execution function itself in `bulk_edit.py`. + +Three structural problems follow: + +- **`validate_method` resolves the request string to a _function object_** + (`serialisers.py:1826-1860`), so everything downstream compares either + `method == bulk_edit.merge` (identity), `method.__name__` (string), or the raw + request string. Three keying schemes for one concept. Adding an operation — or + editing one — means touching all eight sites, and forgetting one fails + _silently_ (an op that runs but isn't audited, or skips an ownership check) + rather than loudly. + +- **The permission matrix is parameter-conditional and security-critical.** From + `views.py:2713-2760`: ownership is required for `merge`/`split` _only_ when + `delete_originals` is set; `add_document` is required for `edit_pdf`/ + `remove_password` _only_ when `update_document` is not set; `delete_document` + for `merge`/`split` _only_ when `delete_originals`. This logic is correct but + lives far from the operations it governs, so it is hard to audit and easy to + break. + +- **The API is self-undocumenting.** `parameters` is a bare + `serializers.DictField` (`serialisers.py:1773`). drf-spectacular renders it as + a free-form object, so the OpenAPI schema tells a caller nothing about what + `merge` versus `set_correspondent` actually expect. The repo uses + `@extend_schema`/`inline_serializer` widely (62 sites) but has **no** + `PolymorphicProxySerializer`, `OpenApiExample`, or `discriminator` usage to + describe this polymorphic endpoint. + +## Goal + +Make each bulk-edit operation a **single object** that owns all eight facts — +name, execution callable, parameter validation/coercion, audit field, the +`all=`/single-document constraints, the user/trigger-source needs, and its +parameter-conditional permission requirements. Operations live in a registry; +the serializer and view consume the registry instead of re-encoding the +operation list. Adding an operation becomes one class plus one registry entry, +not an eight-site edit. As a deliberate, contract-preserving bonus, each +operation also contributes a **per-operation request example** so the bulk API +finally documents itself in the OpenAPI schema. + +**The wire contract does not change.** This is a relocation of internal logic, +not a redefinition of the endpoint. + +## Scope + +In scope: + +- New `documents/bulk_operations.py` (registry + `BulkEditOperation` classes + + `PermissionRequirements`). The execution functions stay in `bulk_edit.py`; + operation classes wrap them. +- Rewrite `BulkEditSerializer.validate_method` / `validate()` and the + `_validate_parameters_*` methods to delegate to the operation's parameter + serializer. +- Rewrite `BulkEditView._has_document_permissions`, the `METHOD_NAMES_*` sets, + and `MODIFIED_FIELD_BY_METHOD` to read from the registry. +- Add `examples=[...]` to the `bulk_edit` `@extend_schema`, generated from the + registry (one example per operation). +- Unit tests per operation; keep every existing `test_api_bulk_edit*` test green. + +Out of scope: + +- Changing any operation's behavior, accepted method strings, parameter names, + defaults, coercion, or permission outcome. Byte-for-byte wire compatibility. +- The legacy-method deprecation-warning machinery + (`MOVED_DOCUMENT_ACTION_ENDPOINTS`, the API-v9-drop TODO at `views.py:2855`): + legacy methods log a warning and process **inline** — there is **no** redirect + (`views.py:2856-2866`). Preserved as-is. +- A full polymorphic request schema (`oneOf`/discriminated `parameters`). Examples + (option 1) are in scope; a discriminated schema is a possible future follow-up + and is **not** built here — the discriminator (`method`) and the variant + payload (`parameters`) are sibling fields, which `PolymorphicProxySerializer` + does not model cleanly. YAGNI until examples prove insufficient. +- Converting `bulk_edit.py` into a package, or touching the execution functions' + internals. +- Any third-party / entry-point registration of operations. The registry is + in-tree only; an entry point could be layered on later but the PDF/page ops are + tightly bound to internal helpers, so ecosystem value is low and unproven. + +## Decisions + +These shape the design and are the reviewable choices: + +1. **Operations wrap, not replace, the `bulk_edit.py` functions.** Each + `BulkEditOperation.execute` calls the existing function. The execution code is + correct and well-tested; this refactor is about the metadata and dispatch + around it, exactly as the export-sink refactor moved _plumbing_ without + touching export _contents_. +2. **Parameter validation moves into a per-operation DRF `Serializer`**, not an + ad-hoc `clean_*` method. A real serializer (a) validates and coerces in one + place (replacing the `_validate_parameters_*` methods _and_ their in-place + mutation of defaults / the `pages`-string parse), (b) accepts `context` + (`user`, `documents`) for the cross-field/DB checks (page-bounds vs + `document.page_count`, documentlink targets, owner existence), and (c) is a + structure drf-spectacular already understands. Operations with no parameters + (`delete`, `reprocess`) declare `parameter_serializer_class = None`. +3. **Permission requirements are computed by the operation, given the validated + parameters**, returning a `PermissionRequirements` value object. The + parameter-conditional kernel (ownership iff `delete_originals`, etc.) lives + next to the operation it governs. The view's three permission blocks collapse + to "build requirements, then check each flag generically." +4. **Examples are derived from the registry** (option 1 from the design + discussion). Each operation declares a canonical `example_parameters` dict; a + helper builds one `OpenApiExample` per operation for the `bulk_edit` + `@extend_schema`. Adding an operation therefore auto-adds its example — the + examples cannot drift out of sync with the registry. This is the only piece + that _adds_ to the schema; it does not alter the request/response structure. +5. **The registry is the single source of the method enum.** Today's enum is the + 8 hardcoded field-ops (`serialisers.py:1758-1766`) plus + `LEGACY_DOCUMENT_ACTION_METHODS` — but the legacy methods (`delete, reprocess, +rotate, merge, edit_pdf, remove_password, split, delete_pages`) **are + themselves operations**, not a disjoint set, so all **16 unique** methods live + in the registry. `ChoiceField.choices` is therefore + `list(BULK_EDIT_OPERATIONS)` **alone** — do NOT append + `LEGACY_DOCUMENT_ACTION_METHODS` (that would duplicate 8 entries and churn the + enum, the exact thing this decision prevents). The registry must be **ordered** + to reproduce today's member order — the 8 field-ops first (in + `serialisers.py:1758-1766` order), then the 8 legacy methods in + `MOVED_DOCUMENT_ACTION_ENDPOINTS` **key/insertion order** (`delete, reprocess, +rotate, merge, edit_pdf, remove_password, split, delete_pages`; + `serialisers.py:1745-1754`) — so the generated OpenAPI `enum` is byte-identical. + NB: that legacy order is `edit_pdf, remove_password` _before_ `split, +delete_pages` — do not reorder them. + +## Architecture + +### `PermissionRequirements` + +```python +@dataclass(frozen=True) +class PermissionRequirements: + change: bool = True # documents.change_document + object-level, always + ownership: bool = False # user owns (or doc.owner is None for) ALL docs + add_document: bool = False # documents.add_document + delete_document: bool = False # documents.delete_document +``` + +### `BulkEditOperation` + +New module `documents/bulk_operations.py`: + +```python +class BulkEditOperation(ABC): + name: ClassVar[str] + audit_field: ClassVar[str | None] = None # → MODIFIED_FIELD_BY_METHOD + supports_all: ClassVar[bool] = True # → the all=true guard + max_documents: ClassVar[int | None] = None # split/delete_pages/edit_pdf = 1 + too_many_documents_message: ClassVar[str | None] = None # per-op error text (H3) + needs_user: ClassVar[bool] = False # → METHOD_NAMES_REQUIRING_USER + needs_trigger_source: ClassVar[bool] = False # → ..._REQUIRING_TRIGGER_SOURCE + parameter_serializer_class: ClassVar[type[serializers.Serializer] | None] = None + example_parameters: ClassVar[dict] = {} # → OpenApiExample payload + + def clean_parameters(self, parameters: dict, *, user, documents) -> dict: + """Validate + coerce via parameter_serializer_class (context=user,documents). + Returns the normalized parameters. Raises serializers.ValidationError. + No-op passthrough when parameter_serializer_class is None.""" + + def required_permissions(self, parameters: dict) -> PermissionRequirements: + """The parameter-conditional permission kernel. Default: change only.""" + return PermissionRequirements() + + @abstractmethod + def execute(self, doc_ids: list[int], **parameters) -> str: ... +``` + +The two subtle operations, stated next to their own rules: + +```python +class MergeOperation(BulkEditOperation): + name = "merge" + supports_all = False + needs_user = needs_trigger_source = True + parameter_serializer_class = MergeParametersSerializer + example_parameters = {"delete_originals": False, "archive_fallback": False} + + def required_permissions(self, parameters): + delete = parameters.get("delete_originals", False) + return PermissionRequirements( + change=True, add_document=True, + ownership=delete, delete_document=delete, + ) + + def execute(self, doc_ids, **kw): + return bulk_edit.merge(doc_ids, **kw) + + +class EditPdfOperation(BulkEditOperation): + name = "edit_pdf" + supports_all = False + max_documents = 1 + needs_user = needs_trigger_source = True + parameter_serializer_class = EditPdfParametersSerializer + example_parameters = { + "operations": [{"page": 1, "rotate": 90}], + "update_document": False, + "include_metadata": True, + } + + def required_permissions(self, parameters): + update = parameters.get("update_document", False) + # edit_pdf is ALWAYS ownership-gated (views.py:2722); add_document only + # when NOT update_document (views.py:2740-2741). + return PermissionRequirements( + change=True, ownership=True, add_document=not update, + ) +``` + +### Registry + +```python +BULK_EDIT_OPERATIONS: dict[str, BulkEditOperation] = { + op.name: op + for op in ( + SetCorrespondentOperation(), SetDocumentTypeOperation(), + SetStoragePathOperation(), AddTagOperation(), RemoveTagOperation(), + ModifyTagsOperation(), ModifyCustomFieldsOperation(), + SetPermissionsOperation(), + # legacy section — MUST match MOVED_DOCUMENT_ACTION_ENDPOINTS key order + # (serialisers.py:1745-1754) so the generated enum is byte-identical: + DeleteOperation(), ReprocessOperation(), RotateOperation(), + MergeOperation(), EditPdfOperation(), RemovePasswordOperation(), + SplitOperation(), DeletePagesOperation(), + ) +} +``` + +There is **no** `redo_ocr` entry. `validate_method` has a `method == "redo_ocr"` +branch (`serialisers.py:1843`), but `"redo_ocr"` is absent from `choices` +(`serialisers.py:1758-1768`), so the `ChoiceField` rejects it _before_ +`validate_method` runs — that branch is unreachable dead code today. Do **not** +add `redo_ocr` to the registry: doing so would make it a valid `choices` entry +and newly accept it on the wire (a contract change). `reprocess` is registered +once, under `reprocess`. + +### How each call site collapses + +- **`ChoiceField.choices`** → `list(BULK_EDIT_OPERATIONS)` (the 16 unique + methods, registry ordered to match today). Legacy methods are already registry + ops, so they are **not** appended separately (see Decision 5). +- **`validate_method`** → `return BULK_EDIT_OPERATIONS[method]` (the validated + value becomes an _operation object_ instead of a function — internal only, + `method` is `write_only`). +- **`validate()`** → `op.clean_parameters(parameters, user=…, documents=…)`; the + `all=true` guard becomes `if attrs.get("all") and not op.supports_all: raise +ValidationError("This method does not support all=true.")` (today's single + shared message, `serialisers.py:2077`, asserted verbatim by + `test_api_bulk_edit.py:763`); the per-method "only one document" checks become + an `op.max_documents` check that raises `op.too_many_documents_message`. That + text is **per-op** — "Split method only supports one document", "Delete pages + method only supports one document", "Edit PDF method only supports one document" + (`serialisers.py:2105,2111,2119`) — and is asserted verbatim (e.g. + `test_api_bulk_edit.py:1519`), so it **cannot** be collapsed to one generic + string. +- **`METHOD_NAMES_REQUIRING_USER` / `_TRIGGER_SOURCE`** → `op.needs_user` / + `op.needs_trigger_source`. +- **The three permission blocks** → one pass: + ```python + reqs = op.required_permissions(parameters) + ok = user.has_perm("documents.change_document") and all( + has_perms_owner_aware(user, "change_document", d) for d in document_objs + ) + if ok and reqs.ownership: ok = user_is_owner_of_all_documents + if ok and reqs.add_document: ok = user.has_perm("documents.add_document") + if ok and reqs.delete_document: ok = user.has_perm("documents.delete_document") + ``` +- **`MODIFIED_FIELD_BY_METHOD`** → `op.audit_field`. + +**Two call sites consume this, not one.** `BulkEditView.post` +(`views.py:2852-2947`) is a fully **inlined** path — it is the only path the +`bulk_edit/` endpoint uses. It checks permissions, sets `user`/`trigger_source`, +runs the audit-log block (`views.py:2896-2940`, currently keyed on +`method.__name__` → becomes `op.audit_field`), and calls `method(documents, +**parameters)`. `_execute_document_action` (`views.py:2764-2807`) is a +**separate** path used by the _moved single-action_ endpoints +(`/api/documents/delete/`, `/rotate/`, …); it builds `parameters`, sets +user/trigger, and checks permissions independently and has **no** audit logging. +The refactor must convert **both** to the registry; audit logging stays only in +`post`. + +## Operation inventory (the faithful matrix) + +Compiled from `bulk_edit.py` signatures, `serialisers.py:2067-2126`, and +`views.py:2679-2760`. `change` is required for every operation and omitted. +`[source_mode]` is the shared optional param accepted by the PDF-touching ops +(validated by `_validate_source_mode` only when present). + +| Operation (`name`) | Parameters | `supports_all` | `max_documents` | user/trigger | `audit_field` | ownership | add_doc | delete_doc | +| ------------------------------ | ------------------------------------------------------------------------------------------------------------- | -------------- | --------------- | ------------ | --------------- | ---------------------- | ------------------------- | ---------------------- | +| `set_correspondent` | `correspondent: int\|null` | yes | — | no | `correspondent` | — | — | — | +| `set_document_type` | `document_type: int\|null` | yes | — | no | `document_type` | — | — | — | +| `set_storage_path` | `storage_path: int\|null` | yes | — | no | `storage_path` | — | — | — | +| `add_tag` | `tag: int` | yes | — | no | `tags` | — | — | — | +| `remove_tag` | `tag: int` | yes | — | no | `tags` | — | — | — | +| `modify_tags` | `add_tags: int[]`, `remove_tags: int[]` | yes | — | no | `tags` | — | — | — | +| `modify_custom_fields` | `add_custom_fields: int[]\|{id:val}`, `remove_custom_fields: int[]` | yes | — | no | `custom_fields` | — | — | — | +| `set_permissions` | `set_permissions: obj`, `owner: int\|null`, `merge: bool=false` | yes | — | no | `None` | **yes** | — | — | +| `delete` | _(none)_ | yes | — | no | `deleted_at` | **yes** | — | **yes** | +| `reprocess` (alias `redo_ocr`) | _(none)_ | yes | — | no | `checksum` | — | — | — | +| `rotate` | `degrees: int`, `[source_mode]` | yes | — | **yes** | `None` | **yes** | — | — | +| `merge` | `delete_originals: bool=false`, `archive_fallback: bool=false`, `metadata_document_id?: int`, `[source_mode]` | **no** | — | **yes** | `None` | iff `delete_originals` | **yes** | iff `delete_originals` | +| `split` | `pages: str→int[][]`, `delete_originals: bool=false`, `[source_mode]` | **no** | **1** | **yes** | `None` | iff `delete_originals` | **yes** | iff `delete_originals` | +| `delete_pages` | `pages: int[]`, `[source_mode]` | **no** | **1** | **yes** | `None` | **yes** | — | — | +| `edit_pdf` | `operations: obj[]`, `update_document: bool=false`, `include_metadata: bool=true`, `[source_mode]` | **no** | **1** | **yes** | `None` | **yes** | iff not `update_document` | — | +| `remove_password` | `password: str`, `update_document: bool=false`, `[source_mode]` | **no** | — | **yes** | `None` | **yes** | iff not `update_document` | — | + +Notes that are easy to get wrong and are pinned here: + +- `edit_pdf` ownership is **unconditional** — it is in the unconditional + ownership list (`views.py:2722`); the separate `edit_pdf and update_document` + clause (`views.py:2730`) is redundant and folds away. +- `remove_password` **does** accept an `update_document` param + (`bulk_edit.py:881`), and `parameters` is a passthrough `DictField` whose + validator (`serialisers.py:2061-2065`) neither strips nor defaults it. So its + `add_document` requirement is `not parameters.get("update_document", False)` — + identical to `edit_pdf`, **not** an unconditional `True`. Sending + `update_document: true` legitimately drops the add_document requirement today, + and that behavior must be preserved. (Earlier drafts claimed the param did not + exist — that was a permission-correctness bug.) +- `merge` and `remove_password` are **not** single-document (no `max_documents`), + even though both set `supports_all = False`. + +## Parameter coercion contract to preserve + +`clean_parameters` must reproduce every in-place coercion the current +`_validate_parameters_*` methods perform, not merely the validation. Full list +(an implementation-plan checklist): + +- `merge` / `split`: default `delete_originals=False` + (`serialisers.py:1998,2013`); `merge` also defaults `archive_fallback=False` + (`:2018`). +- `edit_pdf`: default `update_document=False`, `include_metadata=True` + (`:2038,2043`); reject `update_document=True` with multiple output docs + (`:2045-2050`). +- `set_permissions`: default `merge=False` (`:1951-1952`) and **mutate** + `parameters["set_permissions"]` in place via `validate_set_permissions` + (`:1946`, `SetPermissionsMixin`); validate `owner` existence + (`:1939-1943,1949-1950`). Needs its own `SetPermissionsParametersSerializer`. +- `split`: parse the `pages` string `"1-3,5"` → `[[1,2,3],[5]]` (`:1974-1990`). +- `source_mode`: validated and applied **only when present** in `parameters` + (`:2084-2085` gate → `validate_source_mode`, `:1964-1969`), independent of + method — so each PDF-touching op's serializer opts into it conditionally. +- `modify_custom_fields`: accept **list OR `{id: value}` dict**, and for + DOCUMENTLINK fields validate targets via `validate_documentlink_targets` + (`:1787-1824`). +- **Param-name spelling differs by op** and must match exactly: `merge`/`split` + use `delete_originals` (plural); `edit_pdf`/`remove_password` use + `delete_original` (singular) (`bulk_edit.py:509,619,751,882`). + +## OpenAPI examples (the "make it useful" piece) + +A single helper builds the examples from the registry: + +```python +def _bulk_edit_examples() -> list[OpenApiExample]: + return [ + OpenApiExample( + name=op.name, + summary=op.name, + value={"documents": [1, 2], "method": op.name, + "parameters": op.example_parameters}, + request_only=True, + ) + for op in BULK_EDIT_OPERATIONS.values() + ] +``` + +wired into the existing decorator (the response schema at `views.py:2818-2825` +is untouched): + +```python +@extend_schema_view( + post=extend_schema( + operation_id="bulk_edit", + description="Perform a bulk edit operation on a list of documents", + examples=_bulk_edit_examples(), + responses={200: inline_serializer(name="BulkEditDocumentsResult", + fields={"result": serializers.CharField()})}, + ), +) +``` + +Result: the Swagger/Redoc page shows a concrete, valid request body for every +operation (`merge`, `edit_pdf`, …), generated from the same objects that +validate the request — they cannot drift apart. The request _structure_ +(`{documents, method, parameters, …}`) and the `method` `enum` are unchanged; +examples are purely additive. + +## Data flow + +``` +POST /api/documents/bulk_edit/ {documents|all|filters, method, parameters, from_webui} + ├─ legacy method? → log deprecation warning, then process INLINE (no redirect; views.py:2856-2866) + ├─ BulkEditSerializer.validate_method(method) → op = BULK_EDIT_OPERATIONS[method] + ├─ validate(): + │ ├─ all=true and not op.supports_all → ValidationError (shared message) + │ ├─ op.max_documents and len(documents) > it → ValidationError (op.too_many_documents_message) + │ └─ parameters = op.clean_parameters(parameters, user=…, documents=…) + └─ BulkEditView.post (inlined; the only path bulk_edit/ uses): + ├─ if op.needs_user: parameters["user"] = user + ├─ if op.needs_trigger_source: parameters["trigger_source"] = WEB_UI|API_UPLOAD + ├─ reqs = op.required_permissions(parameters); check change/ownership/add/delete + │ → 403 HttpResponseForbidden on failure (unchanged) + ├─ if op.audit_field and AUDIT_LOG_ENABLED: snapshot old values (views.py:2896-2910) + ├─ result = op.execute(documents, **parameters) (call-time bulk_edit. lookup) + └─ if op.audit_field and AUDIT_LOG_ENABLED: LogEntry per doc → Response({"result": …}) +note: _execute_document_action (views.py:2764-2807) is the SEPARATE moved-single-action path + (/api/documents/delete/, /rotate/, …); it converts to the registry too, but has NO audit log. + audit "reason" string uses op.name (== bulk_edit..__name__ today, so unchanged at runtime). +``` + +## Backwards compatibility + +- **Wire contract:** request/response shapes, accepted `method` strings, + parameter names, defaults, coercion, and every permission outcome are + byte-for-byte preserved. `method` becoming an operation object is internal + (`write_only`). +- **`bulk_edit.` patching keeps working — by module identity, not luck.** + Existing tests patch `documents.serialisers.bulk_edit.` and + `documents.views.bulk_edit.` (e.g. `test_api_bulk_edit.py:203,485,1100,1271`). + All of `documents.serialisers.bulk_edit`, `documents.views.bulk_edit`, and the + new `documents.bulk_operations.bulk_edit` are the **same module object** in + `sys.modules`; patching an attribute via any path mutates the one shared module. + So as long as each `op.execute` does a **call-time** lookup + (`return bulk_edit.merge(doc_ids, **kw)`, not a function captured at + class-definition time), the existing patches still intercept and those tests + stay untouched. +- **The `method.__name__` dependency disappears.** `setup_mock` + (`test_api_bulk_edit.py:61-63`) sets `m.__name__` because dispatch reads + `method.__name__` (`views.py:2783,2879,2896,2938`). The refactor replaces every + such read with `op.name` / `op.needs_user` / `op.audit_field`, so the mock's + `__name__` no longer affects dispatch. The audit "reason" becomes + `f"Bulk edit: {op.name}"`; since `bulk_edit.merge.__name__ == "merge" == +op.name`, real-run audit text is unchanged. No test asserts + `validated_data["method"]` identity (verified), so `validate_method` returning + an operation object is safe. +- **Legacy methods:** `MOVED_DOCUMENT_ACTION_ENDPOINTS` / + `LEGACY_DOCUMENT_ACTION_METHODS` and the v9-drop TODO are unchanged. They drive + only the inline deprecation warning (`views.py:2856-2866`), **not** the + `choices` — which come wholly from the registry, since the legacy methods _are_ + registry ops (see C1/Decision 5). +- **OpenAPI:** the `method` `enum` and request/response structure are unchanged + (Decision 5); `examples` are additive. Regenerated schema diff should be + _examples only_. + +## Testing + +New `documents/tests/test_bulk_operations.py` (pytest classes, factory-boy +factories, the `mocker` fixture, `parametrize`, full type annotations; run on the +Linux VM): + +- **Permission matrix, parametrized over every operation** — the highest-value + test. For each op and each relevant parameter combination + (`delete_originals` on/off, `update_document` on/off), assert + `op.required_permissions(params)` equals the expected + `PermissionRequirements`. This freezes the security kernel against drift. +- **Registry/serializer parity** — `ChoiceField.choices` equals the **16 unique** + method strings in today's exact order (8 field-ops, then the 8 + `MOVED_DOCUMENT_ACTION_ENDPOINTS` keys); **no duplicates**; `redo_ocr` absent; + every method resolves to an operation. (Guards against the C1 duplication bug.) +- **Parameter validation/coercion** per op — defaults applied (`merge` → + `delete_originals=False`, `archive_fallback=False`; `split`/`edit_pdf` defaults), + the `pages` string→list parse, page-bounds-vs-`page_count`, documentlink target + and owner-existence checks — mirroring the current `_validate_parameters_*` tests. +- **`supports_all` / `max_documents`** — `all=true` rejected for the five + no-all ops; `>1` document rejected for `split`/`delete_pages`/`edit_pdf`. +- **Examples** — `_bulk_edit_examples()` yields one entry per distinct operation, + each `value["parameters"]` validates clean through that op's + `parameter_serializer_class` (guarantees documented examples are valid). + +Existing `test_api_bulk_edit.py` / `test_api_bulk_download.py` stay green +unchanged — external behavior (accepted methods, validation errors, permission +403s, audit fields, results) is preserved. + +## Risks + +- **Permission-matrix mistranslation is a privilege-escalation bug, not a + cosmetic one.** This is the whole ballgame. Mitigation: move the logic verbatim + into per-op `required_permissions`, and the parametrized permission test above + is written _first_ against the current behavior, then held invariant across the + refactor. +- **The `method`-as-function-object contract** is relied on by existing tests + (identity compares, `bulk_edit.` patching). Mitigation: keep `execute` + delegating to the module-level function so patches still bite; adjust only the + identity asserts. Audit `test_api_bulk_edit.py` before coding. +- **Serializer-based validation subtly changing error messages/shapes.** The + current validators raise specific `ValidationError` strings that tests assert + on. Mitigation: preserve message text when porting each `_validate_parameters_*` + into its serializer; diff the test expectations. +- **Enum churn in the generated schema.** Mitigation: Decision 5 fixes member set + and order; the schema-diff check in CI should show examples-only changes.