mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2026-06-30 09:14:17 +00:00
Docs: export zip compression implementation plan
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

# Export Zip Compression Control Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add `--zip-compression {stored,deflated,bzip2,lzma,zstd}` and `--zip-compression-level N` flags to `document_exporter`, threaded into `ZipExportSink`, with import-side safety for codecs the running Python can't read.

**Architecture:** A new pure-data module `documents/export/compression.py` owns the method↔constant map, the per-method level bounds, the runtime availability probe, and a compress-type readability check. `ZipExportSink` gains `compression`/`compresslevel` constructor params. The command validates the flags up front (fail-fast `CommandError`) and then constructs the sink; the importer pre-checks entry compress types before extracting.

**Tech Stack:** Python ≥3.11 (zstd only on 3.14+), `zipfile`, `compression.zstd` (PEP 784), pytest + pytest-mock + factory-boy. Backend tests run on the Linux VM under Python 3.11, so zstd positive tests are `skipif`-guarded; `ruff` runs locally.

**Spec:** `docs/superpowers/specs/2026-06-16-export-zip-compression-design.md`

**PREREQUISITE:** The base refactor `docs/superpowers/plans/2026-06-16-export-sink-architecture.md` MUST be merged first. This plan assumes `src/documents/export/sinks.py` exists with `ZipExportSink(target, zip_name, *, delete=False)` opening its `ZipFile` in `_open()`.

---

## Verified facts (CPython 3.14.3, via `uv run --python 3.14 --no-project`)

- Constants: `ZIP_STORED=0`, `ZIP_DEFLATED=8`, `ZIP_BZIP2=12`, `ZIP_LZMA=14`, `ZIP_ZSTANDARD=93` (zstd was added in 3.14 and is absent on < 3.14).
- `ZipFile(file, "w", compression=…, compresslevel=…)` applies both as the default for every `write`/`writestr`, so no per-entry arguments are needed (verified).
- Level bounds: `deflated` 0–9, `bzip2` 1–9, `zstd` -131072…22 (`compression.zstd.CompressionParameter.compression_level.bounds() == (-131072, 22)`); `stored` and `lzma` ignore the level.
- The `ZipFile` constructor does not check the level. An invalid level fails only at the **first write** (`ValueError: Invalid initialization option` / `compresslevel must be between 1 and 9`), followed by GC-time `AttributeError` noise on close; hence the up-front validation.
- zstd is backed by `compression.zstd`; `zipfile` raises `RuntimeError` if that module is unavailable.

## Conventions for every task

- **Run backend tests on the VM:** `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "<targets>"` (never locally).
- **Lint locally:** `ruff check <paths> && ruff format <paths>` (global ruff, not `uv run`).
- **Test style:** new test modules are pytest-style: classes, `@pytest.mark.django_db` on the class only where the DB is needed (the `compression.py` and sink tests need no DB), factory-boy, `mocker`, `parametrize`, full type annotations. When adding to an existing `TestCase`-style file (the exporter and importer command tests in Tasks 3 and 4), match that file's style instead.
- The VM runs Python 3.11, so **zstd positive tests must be `@pytest.mark.skipif(...)`-guarded**; they will simply not run there. zstd _rejection_ tests (the < 3.14 path) DO run on the VM.

## File structure

- **Create** `src/documents/export/compression.py`: method map, CLI choices, level bounds, `compression_available()`, `level_error()`, `compress_type_readable()`, `unreadable_method_names()`. Pure, no Django.
- **Create** `src/documents/tests/export/test_compression.py`: unit tests for the above.
- **Modify** `src/documents/export/sinks.py`: `ZipExportSink.__init__` gains `compression`/`compresslevel`; `_open()` passes them to `ZipFile`.
- **Modify** `src/documents/tests/export/test_sinks.py`: assert the chosen `compress_type` is applied.
- **Modify** `src/documents/management/commands/document_exporter.py`: add the two CLI flags and up-front validation, and pass the resolved values to `ZipExportSink`.
- **Modify** `src/documents/tests/test_management_exporter.py`: flag-validation and default-unchanged tests.
- **Modify** `src/documents/management/commands/document_importer.py`: pre-extract compress-type check.
- **Modify** `src/documents/tests/test_management_importer.py`: unsupported codec → `CommandError`.
- **Modify** `docs/administration.md`: document both flags and the zstd portability caveat.

---

## Task 1: `documents/export/compression.py` (pure compression policy)

**Files:**

- Create: `src/documents/export/compression.py`
- Test: `src/documents/tests/export/test_compression.py`

- [ ] **Step 1: Write the failing tests**

Create `src/documents/tests/export/test_compression.py`:

```python
import sys
import zipfile

import pytest

from documents.export import compression


class TestCompressionMethods:
    def test_choices_always_include_zstd(self) -> None:
        # zstd is offered regardless of runtime; availability is checked separately
        assert compression.COMPRESSION_CHOICES == (
            "stored",
            "deflated",
            "bzip2",
            "lzma",
            "zstd",
        )

    @pytest.mark.parametrize(
        ("name", "constant"),
        [
            ("stored", zipfile.ZIP_STORED),
            ("deflated", zipfile.ZIP_DEFLATED),
            ("bzip2", zipfile.ZIP_BZIP2),
            ("lzma", zipfile.ZIP_LZMA),
        ],
    )
    def test_method_maps_to_zipfile_constant(self, name: str, constant: int) -> None:
        assert compression.COMPRESSION_METHODS[name] == constant

    def test_stored_and_deflated_always_available(self) -> None:
        assert compression.compression_available("stored")
        assert compression.compression_available("deflated")

    def test_zstd_availability_tracks_runtime(self) -> None:
        expected: bool = sys.version_info >= (3, 14)
        assert compression.compression_available("zstd") == expected


class TestLevelError:
    @pytest.mark.parametrize(
        ("method", "level"),
        [
            ("deflated", 0),
            ("deflated", 9),
            ("bzip2", 1),
            ("bzip2", 9),
            ("deflated", None),
            ("stored", None),
        ],
    )
    def test_valid_levels_return_none(self, method: str, level: int | None) -> None:
        assert compression.level_error(method, level) is None

    @pytest.mark.parametrize(
        ("method", "level"),
        [
            ("deflated", 10),
            ("deflated", -1),
            ("bzip2", 0),
            ("bzip2", 10),
        ],
    )
    def test_out_of_range_levels_return_message(
        self,
        method: str,
        level: int,
    ) -> None:
        msg: str | None = compression.level_error(method, level)
        assert msg is not None
        assert "between" in msg

    @pytest.mark.parametrize("method", ["stored", "lzma"])
    def test_level_on_levelless_method_is_rejected(self, method: str) -> None:
        msg: str | None = compression.level_error(method, 5)
        assert msg is not None
        assert "no effect" in msg


class TestCompressTypeReadable:
    @pytest.mark.parametrize("ct", [zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED])
    def test_stored_and_deflated_always_readable(self, ct: int) -> None:
        assert compression.compress_type_readable(ct)

    def test_zstd_compress_type_readability_tracks_runtime(self) -> None:
        # 93 = ZIP_ZSTANDARD; 20 = legacy zstd method id (read-only)
        expected: bool = sys.version_info >= (3, 14)
        assert compression.compress_type_readable(93) == expected
        assert compression.compress_type_readable(20) == expected

    def test_unknown_compress_type_is_unreadable(self) -> None:
        assert not compression.compress_type_readable(9999)

    def test_unreadable_method_names_lists_methods(self) -> None:
        # An unknown method id maps to no name and is reported generically.
        names: set[str] = compression.unreadable_method_names({9999})
        assert names == {"method 9999"}
```

- [ ] **Step 2: Run to verify it fails**

Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/export/test_compression.py -v"`
Expected: FAIL at collection with `ImportError: cannot import name 'compression' from 'documents.export'` (the module does not exist yet).

- [ ] **Step 3: Implement `compression.py`**

Create `src/documents/export/compression.py`:

```python
from __future__ import annotations

import importlib
import zipfile

# ZIP_ZSTANDARD exists only on Python 3.14+ (PEP 784); None elsewhere.
ZSTD: int | None = getattr(zipfile, "ZIP_ZSTANDARD", None)

# CLI choices are fixed across runtimes so argparse never hides zstd; runtime
# availability is enforced separately in compression_available().
COMPRESSION_CHOICES: tuple[str, ...] = (
    "stored",
    "deflated",
    "bzip2",
    "lzma",
    "zstd",
)

# Method name -> zipfile compression constant (zstd only when supported).
COMPRESSION_METHODS: dict[str, int] = {
    "stored": zipfile.ZIP_STORED,
    "deflated": zipfile.ZIP_DEFLATED,
    "bzip2": zipfile.ZIP_BZIP2,
    "lzma": zipfile.ZIP_LZMA,
}
if ZSTD is not None:
    COMPRESSION_METHODS["zstd"] = ZSTD

# Inclusive (min, max) level bounds per method; None => level not applicable.
# Verified on CPython 3.14.3.
LEVEL_BOUNDS: dict[str, tuple[int, int] | None] = {
    "stored": None,
    "deflated": (0, 9),
    "bzip2": (1, 9),
    "lzma": None,
    "zstd": (-131072, 22),
}

# zipfile compress_type id -> method name. 93 = current zstd id, 20 = legacy
# zstd id that zipfile can still read.
_COMPRESS_TYPE_TO_METHOD: dict[int, str] = {
    zipfile.ZIP_STORED: "stored",
    zipfile.ZIP_DEFLATED: "deflated",
    zipfile.ZIP_BZIP2: "bzip2",
    zipfile.ZIP_LZMA: "lzma",
    93: "zstd",
    20: "zstd",
}


def compression_available(method: str) -> bool:
    """Whether the running interpreter can actually use the given method."""
    if method in ("stored", "deflated"):
        # zlib is a hard CPython dependency; stored needs nothing.
        return True
    if method == "bzip2":
        return _module_importable("bz2")
    if method == "lzma":
        return _module_importable("lzma")
    if method == "zstd":
        return ZSTD is not None and _module_importable("compression.zstd")
    return False


def _module_importable(name: str) -> bool:
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def level_error(method: str, level: int | None) -> str | None:
    """Return a human message if (method, level) is invalid, else None."""
    if level is None:
        return None
    bounds = LEVEL_BOUNDS[method]
    if bounds is None:
        return f"--zip-compression-level has no effect for '{method}'"
    low, high = bounds
    if not (low <= level <= high):
        return (
            f"--zip-compression-level for '{method}' must be between "
            f"{low} and {high}"
        )
    return None


def compress_type_readable(compress_type: int) -> bool:
    """Whether this interpreter can decompress an entry of the given type."""
    method = _COMPRESS_TYPE_TO_METHOD.get(compress_type)
    if method is None:
        return False
    return compression_available(method)


def unreadable_method_names(compress_types: set[int]) -> set[str]:
    """Map a set of compress_type ids to human method names for error messages."""
    names: set[str] = set()
    for ct in compress_types:
        names.add(_COMPRESS_TYPE_TO_METHOD.get(ct, f"method {ct}"))
    return names
```

- [ ] **Step 4: Run to verify it passes**

Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/export/test_compression.py -v"`
Expected: PASS. On the 3.11 VM, `test_zstd_availability_tracks_runtime` and `test_zstd_compress_type_readability_tracks_runtime` pass by confirming that zstd is unavailable (`expected` is `False`).

- [ ] **Step 5: Lint**

Run: `ruff check src/documents/export/compression.py src/documents/tests/export/test_compression.py && ruff format src/documents/export/compression.py src/documents/tests/export/test_compression.py`
Expected: no errors.

- [ ] **Step 6: Commit**

```bash
git add src/documents/export/compression.py src/documents/tests/export/test_compression.py
git commit -m "Feature: add export compression policy module"
```

---

## Task 2: `ZipExportSink` accepts compression method + level

**Files:**

- Modify: `src/documents/export/sinks.py`
- Test: `src/documents/tests/export/test_sinks.py`

- [ ] **Step 1: Write the failing test**

Append to `src/documents/tests/export/test_sinks.py` (the top-of-file block already imports `zipfile`, `Path`, `pytest`, `ZipExportSink`, and `StreamingManifestWriter` from the base-refactor plan):

```python
class TestZipExportSinkCompression:
    @pytest.fixture()
    def source_file(self, tmp_path: Path) -> Path:
        src: Path = tmp_path / "src" / "doc.pdf"
        src.parent.mkdir(parents=True)
        src.write_bytes(b"PDF-CONTENT" * 100)
        return src

    @pytest.mark.parametrize(
        ("method", "constant"),
        [
            ("stored", zipfile.ZIP_STORED),
            ("deflated", zipfile.ZIP_DEFLATED),
            ("bzip2", zipfile.ZIP_BZIP2),
            ("lzma", zipfile.ZIP_LZMA),
        ],
    )
    def test_compression_method_is_applied_to_file_entries(
        self,
        tmp_path: Path,
        source_file: Path,
        method: str,
        constant: int,
    ) -> None:
        target: Path = tmp_path / "out"
        target.mkdir()
        with ZipExportSink(
            target,
            "export",
            delete=False,
            compression=constant,
        ) as sink:
            sink.add_file(source_file, "doc.pdf")
        with zipfile.ZipFile(target / "export.zip") as zf:
            info = zf.getinfo("doc.pdf")
            assert info.compress_type == constant

    def test_compressing_method_beats_stored(
        self,
        tmp_path: Path,
        source_file: Path,
    ) -> None:
        # Robust size invariant: a compressing method must be <= stored on
        # compressible content (avoids flaky level-9-vs-level-1 comparisons).
        sizes: dict[str, int] = {}
        for name, constant in (
            ("stored", zipfile.ZIP_STORED),
            ("deflated", zipfile.ZIP_DEFLATED),
        ):
            target: Path = tmp_path / name
            target.mkdir()
            with ZipExportSink(
                target,
                "export",
                delete=False,
                compression=constant,
            ) as sink:
                sink.add_file(source_file, "doc.pdf")
            sizes[name] = (target / "export.zip").stat().st_size
        assert sizes["deflated"] <= sizes["stored"]
```

- [ ] **Step 2: Run to verify it fails**

Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/export/test_sinks.py::TestZipExportSinkCompression -v"`
Expected: FAIL with `TypeError: ZipExportSink.__init__() got an unexpected keyword argument 'compression'`.

- [ ] **Step 3: Add the params to `ZipExportSink`**

In `src/documents/export/sinks.py`, change `ZipExportSink.__init__` to accept the new keyword-only params and store them, and pass them in `_open()`:

```python
def __init__(
    self,
    target: Path,
    zip_name: str,
    *,
    delete: bool = False,
    compression: int = zipfile.ZIP_DEFLATED,
    compresslevel: int | None = None,
) -> None:
    self._target = target.resolve()
    self._zip_path = (self._target / zip_name).with_suffix(".zip")
    self._tmp_path = self._zip_path.with_name(self._zip_path.name + ".tmp")
    self._delete = delete
    self._compression = compression
    self._compresslevel = compresslevel
    self._zip: zipfile.ZipFile | None = None
    self._dirs: set[str] = set()
    self._pending_manifest: tuple[Path, str] | None = None
    self._stream_open = False
```

And in `_open()`:

```python
def _open(self) -> None:
    settings.SCRATCH_DIR.mkdir(parents=True, exist_ok=True)
    self._zip = zipfile.ZipFile(
        self._tmp_path,
        "w",
        compression=self._compression,
        compresslevel=self._compresslevel,
        allowZip64=True,
    )
```

- [ ] **Step 4: Run to verify it passes**

Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/export/test_sinks.py -v"`
Expected: PASS (all sink tests, including the four method params and the size invariant). `bzip2`/`lzma` are present on the VM's CPython, so those params pass.

- [ ] **Step 5: Lint**

Run: `ruff check src/documents/export/sinks.py src/documents/tests/export/test_sinks.py && ruff format src/documents/export/sinks.py src/documents/tests/export/test_sinks.py`
Expected: no errors.

- [ ] **Step 6: Commit**

```bash
git add src/documents/export/sinks.py src/documents/tests/export/test_sinks.py
git commit -m "Feature: ZipExportSink accepts compression method and level"
```

---

## Task 3: Wire CLI flags + validation into `document_exporter`

**Files:**

- Modify: `src/documents/management/commands/document_exporter.py`
- Test: `src/documents/tests/test_management_exporter.py`

- [ ] **Step 1: Add the argparse flags**

In `document_exporter.py`, add the imports near the other `documents.export` import:

```python
from documents.export.compression import COMPRESSION_CHOICES
from documents.export.compression import COMPRESSION_METHODS
from documents.export.compression import ZSTD
from documents.export.compression import compression_available
from documents.export.compression import level_error
```

In `add_arguments`, after the `--zip-name` argument, add:

```python
parser.add_argument(
    "--zip-compression",
    choices=COMPRESSION_CHOICES,
    default=None,
    help=(
        "Compression method for the export zip (requires --zip). "
        "Default: deflated. 'zstd' requires Python 3.14+ on both the "
        "exporting and importing machine."
    ),
)

parser.add_argument(
    "--zip-compression-level",
    type=int,
    default=None,
    help=(
        "Compression level for the export zip (requires --zip). "
        "deflated: 0-9, bzip2: 1-9, zstd: -131072..22; not accepted "
        "for stored/lzma."
    ),
)
```

- [ ] **Step 2: Read + validate the flags in `handle()`**

In `handle()`, after the existing `--compare-*` + `--zip` guard, add the compression flag handling. Insert it before the sink construction:

```python
zip_compression: str | None = options["zip_compression"]
zip_compression_level: int | None = options["zip_compression_level"]

if not self.zip_export and (
    zip_compression is not None or zip_compression_level is not None
):
    raise CommandError(
        "--zip-compression and --zip-compression-level require --zip",
    )

compression_method = zip_compression or "deflated"
if self.zip_export:
    if not compression_available(compression_method):
        if compression_method == "zstd" and ZSTD is None:
            raise CommandError(
                "zstd compression requires Python 3.14 or newer",
            )
        raise CommandError(
            f"Compression method '{compression_method}' is not "
            "available on this Python runtime",
        )
    level_msg = level_error(compression_method, zip_compression_level)
    if level_msg is not None:
        raise CommandError(level_msg)
```

- [ ] **Step 3: Pass the resolved values into `ZipExportSink`**

Change the `ZipExportSink(...)` construction in `handle()` to:

```python
if self.zip_export:
    sink = ZipExportSink(
        self.target,
        options["zip_name"],
        delete=self.delete,
        compression=COMPRESSION_METHODS[compression_method],
        compresslevel=zip_compression_level,
    )
else:
    sink = DirectoryExportSink(
        self.target,
        compare_checksums=self.compare_checksums,
        compare_json=self.compare_json,
        delete=self.delete,
    )
```

- [ ] **Step 4: Write the command-level tests**

Add to the `TestExportImport` class in `src/documents/tests/test_management_exporter.py` (imports `call_command`, `CommandError`, `ZipFile`, `timezone` already present). The zstd test patches the command module's `compression_available` and `ZSTD` so the < 3.14 rejection path is exercised on any runtime:

```python
def test_compression_flags_require_zip(self) -> None:
    for args in (
        ["--zip-compression", "lzma"],
        ["--zip-compression-level", "5"],
    ):
        with self.assertRaises(CommandError):
            call_command(
                "document_exporter",
                self.target,
                *args,
                skip_checks=True,
            )

def test_zip_compression_level_out_of_range_raises(self) -> None:
    with self.assertRaises(CommandError):
        call_command(
            "document_exporter",
            self.target,
            "--zip",
            "--zip-compression",
            "deflated",
            "--zip-compression-level",
            "99",
            skip_checks=True,
        )

def test_zip_compression_level_rejected_for_stored(self) -> None:
    with self.assertRaises(CommandError):
        call_command(
            "document_exporter",
            self.target,
            "--zip",
            "--zip-compression",
            "stored",
            "--zip-compression-level",
            "5",
            skip_checks=True,
        )

def test_zstd_rejected_when_runtime_lacks_it(self) -> None:
    from unittest import mock

    # Patch the probe so the < 3.14 rejection path runs on any runtime.
    with (
        mock.patch(
            "documents.management.commands.document_exporter.compression_available",
            return_value=False,
        ),
        mock.patch("documents.management.commands.document_exporter.ZSTD", None),
        self.assertRaises(CommandError) as e,
    ):
        call_command(
            "document_exporter",
            self.target,
            "--zip",
            "--zip-compression",
            "zstd",
            skip_checks=True,
        )
    self.assertIn("3.14", str(e.exception))

def test_zip_lzma_compression_is_applied(self) -> None:
    call_command(
        "document_exporter",
        self.target,
        "--zip",
        "--zip-compression",
        "lzma",
        skip_checks=True,
    )
    expected = str(
        self.target / f"export-{timezone.localdate().isoformat()}.zip",
    )
    self.assertIsFile(expected)
    with ZipFile(expected) as zip_file:
        info = zip_file.getinfo("manifest.json")
        # manifest.json carries the chosen method; deflated is the default
        self.assertEqual(info.compress_type, 14)  # ZIP_LZMA

def test_default_zip_uses_deflate(self) -> None:
    call_command(
        "document_exporter",
        self.target,
        "--zip",
        skip_checks=True,
    )
    expected = str(
        self.target / f"export-{timezone.localdate().isoformat()}.zip",
    )
    with ZipFile(expected) as zip_file:
        info = zip_file.getinfo("manifest.json")
        self.assertEqual(info.compress_type, 8)  # ZIP_DEFLATED
```

- [ ] **Step 5: Run the tests**

Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_management_exporter.py -v"`
Expected: PASS; the new tests and all existing exporter tests stay green.

- [ ] **Step 6: Lint**

Run: `ruff check src/documents/management/commands/document_exporter.py src/documents/tests/test_management_exporter.py && ruff format src/documents/management/commands/document_exporter.py src/documents/tests/test_management_exporter.py`
Expected: no errors.

- [ ] **Step 7: Commit**

```bash
git add src/documents/management/commands/document_exporter.py src/documents/tests/test_management_exporter.py
git commit -m "Feature: add --zip-compression and --zip-compression-level flags"
```

---

## Task 4: Importer pre-check for unreadable codecs

**Files:**

- Modify: `src/documents/management/commands/document_importer.py`
- Test: `src/documents/tests/test_management_importer.py`

- [ ] **Step 1: Write the failing test**

The importer test file `src/documents/tests/test_management_importer.py` is
`TestCase`-style (`class TestCommandImport(... TestCase)`, `self.assertRaises`,
`DirectoriesMixin` gives `self.dirs.scratch_dir`). Match that style. Add this
method to `TestCommandImport`. It builds a valid zip and patches the readability
probe so the check fires deterministically on any runtime:

```python
def test_import_rejects_unreadable_compression(self) -> None:
    """
    GIVEN:
        - A zip archive with an entry whose compression this Python can't read
    WHEN:
        - Import is attempted
    THEN:
        - A CommandError naming the issue is raised, before extraction
    """
    import zipfile
    from unittest import mock

    archive = Path(self.dirs.scratch_dir) / "export.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("manifest.json", "[]")

    with mock.patch(
        "documents.management.commands.document_importer.compress_type_readable",
        return_value=False,
    ):
        with self.assertRaises(CommandError) as e:
            call_command(
                "document_importer",
                str(archive),
                "--no-progress-bar",
                skip_checks=True,
            )
        self.assertIn("compression", str(e.exception))
```

- [ ] **Step 2: Run to verify it fails**

Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_management_importer.py -k unreadable_compression -v"`
Expected: FAIL. No pre-check exists yet, so the import proceeds (or fails with a different error).

- [ ] **Step 3: Implement the pre-check**

In `document_importer.py`, add the imports:

```python
from documents.export.compression import compress_type_readable
from documents.export.compression import unreadable_method_names
```

Find the zip-handling block (around `document_importer.py:453`):

```python
with ZipFile(self.source) as zf:
    zf.extractall(tmp_dir)
```

Replace it with a version that checks every entry before extracting:

```python
with ZipFile(self.source) as zf:
    unsupported = {
        info.compress_type
        for info in zf.infolist()
        if not compress_type_readable(info.compress_type)
    }
    if unsupported:
        names = ", ".join(sorted(unreadable_method_names(unsupported)))
        raise CommandError(
            "This archive uses compression this Python cannot "
            f"read ({names}). zstd archives require Python 3.14+.",
        )
    zf.extractall(tmp_dir)
```

Confirm `CommandError` is imported in `document_importer.py` (it is used elsewhere; if not, add `from django.core.management.base import CommandError`).
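
The Step 1 test patches `compress_type_readable`, so it never builds a truly unreadable archive. To see the failure this pre-check guards against, one can be produced with the standard library alone. The sketch below is illustrative only. It writes a stored entry and rewrites the 2-byte compression-method field of the central directory header to 98 (PPMd, an APPNOTE method that CPython's `zipfile` does not implement). It then runs a metadata-only scan like the one above, with a fixed `readable` set standing in for `compress_type_readable()`.

```python
import io
import struct
import zipfile

PPMD = 98  # APPNOTE method id that CPython's zipfile cannot decompress


def archive_with_method(method_id: int) -> bytes:
    """Build a one-entry stored zip, then relabel the entry's compression method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("manifest.json", "[]")
    data = bytearray(buf.getvalue())
    header = data.index(b"PK\x01\x02")  # central directory file header signature
    struct.pack_into("<H", data, header + 10, method_id)  # method field at offset 10
    return bytes(data)


def unreadable_types(zf: zipfile.ZipFile, readable: set[int]) -> set[int]:
    """Metadata-only scan of infolist(); no entry data is decompressed."""
    return {
        info.compress_type
        for info in zf.infolist()
        if info.compress_type not in readable
    }


def read_raises_not_implemented(zf: zipfile.ZipFile, name: str) -> bool:
    """What happens without the pre-check: reading the entry fails mid-import."""
    try:
        zf.read(name)
    except NotImplementedError:
        return True
    return False


if __name__ == "__main__":
    readable = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}
    with zipfile.ZipFile(io.BytesIO(archive_with_method(PPMD))) as zf:
        print(unreadable_types(zf, readable))  # {98}
        print(read_raises_not_implemented(zf, "manifest.json"))  # True
```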

- [ ] **Step 4: Run to verify it passes**

Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/test_management_importer.py -v"`
Expected: PASS; the new test and all existing importer tests pass (normal deflated/stored archives still import).

- [ ] **Step 5: Lint**

Run: `ruff check src/documents/management/commands/document_importer.py src/documents/tests/test_management_importer.py && ruff format src/documents/management/commands/document_importer.py src/documents/tests/test_management_importer.py`
Expected: no errors.

- [ ] **Step 6: Commit**

```bash
git add src/documents/management/commands/document_importer.py src/documents/tests/test_management_importer.py
git commit -m "Feature: importer rejects archives with unreadable compression"
```

---

## Task 5: Document the flags

**Files:**

- Modify: `docs/administration.md`

- [ ] **Step 1: Add the flags to the option list**

In `docs/administration.md`, update the usage block (around line 257) to include the new flags:

```
document_exporter target [-c] [-d] [-f] [-na] [-nt] [-p] [-sm] [-z]

optional arguments:
-c, --compare-checksums
-cj, --compare-json
-d, --delete
-f, --use-filename-format
-na, --no-archive
-nt, --no-thumbnail
-p, --use-folder-prefix
-sm, --split-manifest
-z, --zip
-zn, --zip-name
--zip-compression
--zip-compression-level
--data-only
--no-progress-bar
--passphrase
```

- [ ] **Step 2: Add the prose**

After the `-z`/`--zip` paragraph (around line 330), add:

```markdown
The compression method for the zip can be set with `--zip-compression`
(`stored`, `deflated` (default), `bzip2`, `lzma`, or `zstd`) and tuned with
`--zip-compression-level` (deflated: 0–9, bzip2: 1–9, zstd: -131072–22). A
level cannot be combined with `stored` or `lzma`, which take no level. Both
options require `--zip`.

!!! warning

    `zstd` compression requires Python 3.14 or newer on **both** the machine
    creating the export and any machine importing it. An archive compressed with
    `zstd` (or `lzma`/`bzip2` where those modules are unavailable) cannot be
    imported on a runtime that lacks the codec; the importer will refuse it with
    a clear error. The default `deflated` is universally readable.
```

- [ ] **Step 3: Check the Markdown formatting**

There is no code to test, and `ruff` does not lint Markdown. The prettier pre-commit hook reformats `docs/administration.md` on commit.

- [ ] **Step 4: Commit**

```bash
git add docs/administration.md
git commit -m "Docs: document --zip-compression and --zip-compression-level"
```

---

## Task 6: Final verification

**Files:** none (verification only).

- [ ] **Step 1: Full backend suites on the VM**

Run: `bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/documents/tests/export/ src/documents/tests/test_management_exporter.py src/documents/tests/test_management_importer.py -v"`
Expected: PASS, no failures.

- [ ] **Step 2: Spot-check the zstd happy path on Python 3.14 (cannot run under Django on the 3.11 VM)**

The zstd positive round-trip can't run in the 3.11 test environment. Instead, confirm on a real 3.14 interpreter, with a standalone check that needs no Django, that the constant, the level bounds, and a zstd round-trip behave as `compression.py` assumes:

Run:

```bash
uv run --python 3.14 --no-project python -c "
import io
import zipfile
from compression.zstd import CompressionParameter as CP

print('zstd const', zipfile.ZIP_ZSTANDARD, 'bounds', CP.compression_level.bounds())
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', compression=zipfile.ZIP_ZSTANDARD, compresslevel=19) as zf:
    zf.writestr('a.txt', 'x' * 1000)
with zipfile.ZipFile(buf) as zf:
    assert zf.getinfo('a.txt').compress_type == zipfile.ZIP_ZSTANDARD
    assert zf.read('a.txt') == b'x' * 1000
print('zstd round-trip OK')
"
```

Expected: prints `zstd const 93 bounds (-131072, 22)` and `zstd round-trip OK`. This validates the constant, the bounds, and a zstd archive round-trip, which are the parts the 3.11 CI cannot exercise.

- [ ] **Step 3: Type-check on the VM (pyrefly)**

```bash
tar czf - src pyproject.toml uv.lock .pyrefly-baseline.json | ssh -o BatchMode=yes -p 2244 trenton@localhost 'tar xzf - -C ~/projects/paperless-ngx'
ssh -o BatchMode=yes -p 2244 trenton@localhost 'bash -lc "cd ~/projects/paperless-ngx && uv run pyrefly check"'
```

Expected: no new type errors beyond the baseline. (Note: `import compression.zstd` is guarded behind `importlib.import_module`, so it is never statically resolved on the 3.11 baseline.)

- [ ] **Step 4: Final lint**

Run: `ruff check src/documents/export/ src/documents/management/commands/document_exporter.py src/documents/management/commands/document_importer.py && ruff format --check src/documents/export/ src/documents/management/commands/document_exporter.py src/documents/management/commands/document_importer.py`
Expected: clean.

---

## Notes for the implementer

- **Default behavior is unchanged:** with no flags, the sink is constructed with `compression=ZIP_DEFLATED, compresslevel=None`, the same method and default level as before (`shutil.make_archive` used `ZIP_DEFLATED` with no level). `test_default_zip_uses_deflate` pins this.
- **zstd availability is gated three ways and never imported statically:** the constant via `getattr`, the codec via `importlib.import_module("compression.zstd")`, and the CLI value rejected with a friendly message on < 3.14. The choices list always contains `zstd` so argparse doesn't hide it.
- **The importer pre-check is the safety net** for portability foot-guns: without it, an unreadable entry raises a bare `NotImplementedError` (or a `RuntimeError` when the `bz2`/`lzma` module is missing) partway through `extractall`. The check runs on `infolist()` (metadata only) before any extraction.
- **Why `--zip-compression` defaults to `None`, not `"deflated"`:** so `handle()` can detect "user passed it without `--zip`" and fail fast. The effective default is resolved as `zip_compression or "deflated"`.
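
The first note's claim about `shutil.make_archive` can be spot-checked with the standard library alone. The sketch below archives a small, illustrative directory tree and collects the compress types of its file entries; directory entries are skipped since they carry no data.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def make_archive_file_methods() -> set[int]:
    """Return the compress_type of every file entry shutil.make_archive writes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        (src / "originals").mkdir(parents=True)
        (src / "manifest.json").write_text("[]")
        (src / "originals" / "doc.pdf").write_bytes(b"PDF-CONTENT" * 100)
        archive = shutil.make_archive(str(Path(tmp) / "export"), "zip", root_dir=str(src))
        with zipfile.ZipFile(archive) as zf:
            return {info.compress_type for info in zf.infolist() if not info.is_dir()}


if __name__ == "__main__":
    print(make_archive_file_methods())  # {8}, i.e. ZIP_DEFLATED
```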