Commit Graph

3850 Commits

Author SHA1 Message Date
GitHub Actions
217b5df591 Auto translate strings 2026-03-10 23:47:25 +00:00
shamoon
3efc9a5733 Fix: use effective content for matching and suggestion content (#12293) 2026-03-10 23:45:56 +00:00
GitHub Actions
2b4ea570ef Auto translate strings 2026-03-10 18:58:20 +00:00
shamoon
86573fc1a0 Chore: separate actions from bulk edit endpoint (#12286) 2026-03-10 18:55:36 +00:00
GitHub Actions
1221e7f21c Auto translate strings 2026-03-09 22:37:56 +00:00
shamoon
3e32e90355 Breaking: drop support for api versions < 9 (#12284) 2026-03-09 22:36:22 +00:00
Trenton H
63cb75564e Chore: Remove some further old items (encryption passphrase and PNG handling) (#12290) 2026-03-09 22:04:51 +00:00
GitHub Actions
0c7d56c5e7 Auto translate strings 2026-03-09 17:45:53 +00:00
Trenton H
0bcf904e3a Chore: Finish settings refactor (#12263) 2026-03-09 17:43:51 +00:00
Trenton H
bcc2f11152 Performance: Stream JSON during import for memory improvements (#12276)
* Perf: stream manifest parsing with ijson in document_importer

Replace bulk json.load of the full manifest (which materializes the
entire JSON array into memory) with incremental ijson streaming.
Eliminates self.manifest entirely — records are never all in memory
at once.

- Add ijson>=3.2 dependency
- New module-level iter_manifest_records() generator
- load_manifest_files() collects paths only; no parsing at load time
- check_manifest_validity() streams without accumulating records
- decrypt_secret_fields() streams each manifest to a .decrypted.json
  temp file record-by-record; temp files cleaned up after file copy
- _import_files_from_manifest() collects only document records (small
  fraction of manifest) for the tqdm progress bar

Measured on 200 docs + 200 CustomFieldInstances:
- Streaming validation: peak memory 3081 KiB -> 333 KiB (89% reduction)
- Stream-decrypt to file: peak memory 3081 KiB -> 549 KiB (82% reduction)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Perf: slim dict in _import_files_from_manifest, discard fields

When collecting document records for the file-copy step, extract only
the 4 keys the loop actually uses (pk + 3 exported filename keys) and
discard the full fields dict (content, checksum, tags, etc.).

Peak memory for the document-record list: 939 KiB -> 375 KiB (60% reduction).
Wall time unchanged.
2026-03-09 10:20:48 -07:00
Trenton H
e30676f889 Feature: Migrate import/export to rich progress (#12260)
* Refactor: migrate exporter/importer from tqdm to PaperlessCommand.track()

Replace direct tqdm usage in document_exporter and document_importer with
the PaperlessCommand base class and its track() method, which is backed by
Rich and handles --no-progress-bar automatically. Also removes the unused
ProgressBarMixin from mixins.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Refactor: add explicit supports_progress_bar and supports_multiprocessing to all PaperlessCommand subclasses

Each management command now explicitly declares both class attributes
rather than relying on defaults, making intent unambiguous at a glance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 08:59:17 -07:00
GitHub Actions
4badf0e7c2 Auto translate strings 2026-03-09 01:52:08 +00:00
Paul Gessinger
bc26d94593 Chore: Add saved view compatibility in API version 9 (#12280)
---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2026-03-08 18:50:31 -07:00
Trenton H
2cdb1424ef Performance: Further export memory improvements (#12273)
* Perf: streaming manifest writer for document exporter (Phase 3)

Replaces the in-memory manifest dict accumulation with a
StreamingManifestWriter that writes records to manifest.json
incrementally, keeping only one batch resident in memory at a time.

Key changes:
- Add StreamingManifestWriter: writes to .tmp atomically, BLAKE2b
  compare for --compare-json, discard() on exception
- Add _encrypt_record_inline(): per-record encryption replacing the
  bulk encrypt_secret_fields() call; crypto setup moved before streaming
- Add _write_split_manifest(): extracted per-document manifest writing
- Refactor dump(): non-doc records streamed during transaction, documents
  accumulated then written after filenames are assigned
- Upgrade check_and_write_json() from MD5 to BLAKE2b
- Remove encrypt_secret_fields() and unused itertools.chain import
- Add profiling marker to pyproject.toml

Measured improvement (200 docs + 200 CustomFieldInstances, same
dump() code path, only writer differs):
- Peak memory: ~50% reduction
- Memory delta: ~70% reduction
- Wall time and query count: unchanged

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Refactor: O(1) lookup table for CRYPT_FIELDS in per-record encryption

Add CRYPT_FIELDS_BY_MODEL to CryptMixin, derived from CRYPT_FIELDS at
class definition time. _encrypt_record_inline() now does a single dict
lookup instead of a linear scan per record, eliminating the loop and
break pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 14:24:50 -08:00
Trenton H
f5c0c21922 Chore: Lazy imports of the heavy AI modules (#12275) 2026-03-07 12:53:22 -08:00
Trenton H
9d5e618de8 Chore: pytest style paperless tests (#12254) 2026-03-06 13:04:23 -08:00
GitHub Actions
7345f2e81c Auto translate strings 2026-03-06 20:01:12 +00:00
shamoon
731448a8f9 Fixhancement: support version-specific edits (#12233) 2026-03-06 11:59:26 -08:00
shamoon
24a2cfd957 Change: use explicit doc creation instead of clone for versions (#12226) 2026-03-04 15:57:44 -08:00
GitHub Actions
7cf2ef6398 Auto translate strings 2026-03-04 23:29:54 +00:00
shamoon
df03207eef Fix: correct doc version filename handling (#12223) 2026-03-04 23:28:07 +00:00
Trenton H
1e21bcd26e Breaking: Drop support for Python 3.10 (#12234) 2026-03-04 15:03:33 -08:00
Trenton H
a9cb89c633 Enhancement: Improve exporter memory efficiency (#12236)
Phase 1 -- Eliminate JSON round-trip in document exporter

Replace json.loads(serializers.serialize("json", qs)) with
serializers.serialize("python", qs) to skip the intermediate
JSON string allocation and parse step. Use DjangoJSONEncoder
in check_and_write_json() to handle native Python types
(datetime, Decimal, UUID) the Python serializer returns.

Phase 2 -- Batched QuerySet serialization in document exporter

Add serialize_queryset_batched() helper that uses QuerySet.iterator()
and itertools.islice to stream records in configurable chunks, bounding
peak memory during serialization to batch_size * avg_record_size rather
than loading the entire QuerySet at once.
2026-03-04 14:54:20 -08:00
GitHub Actions
a37e24c1ad Auto translate strings 2026-03-04 22:17:32 +00:00
shamoon
85a18e5911 Enhancement: saved view sharing (#12142) 2026-03-04 14:15:43 -08:00
GitHub Actions
ae182c459b Auto translate strings 2026-03-04 21:34:02 +00:00
shamoon
d51a118aac Merge branch 'main' into dev 2026-03-04 13:31:20 -08:00
shamoon
8f311c4b6b Bump version to 2.20.10 2026-03-04 10:38:14 -08:00
shamoon
f25322600d Merge branch 'release/v2.20.x' 2026-03-04 10:09:01 -08:00
shamoon
615f27e6fb Fix: support string coercion in filepath jinja templates (#12244) 2026-03-04 08:32:34 -08:00
Andreas Schneider
190fc70288 Fix: use maxsplit=1 in Redis URL parsing to handle URLs with multiple colons (#12239) 2026-03-04 01:06:51 -08:00
shamoon
5b809122b5 Fix: apply ordering after annotating tag document count (#12238) 2026-03-04 00:33:13 -08:00
GitHub Actions
c623234769 Auto translate strings 2026-03-04 00:29:21 +00:00
shamoon
299dac21ee Enhancement: “live” document updates (#12141) 2026-03-04 00:27:07 +00:00
Trenton H
5498503d60 Chore: Improve user migration path (#12232)
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2026-03-03 15:51:48 -08:00
shamoon
8b8307571a Fix: enforce path limit for db filename fields (#12235) 2026-03-03 13:19:56 -08:00
Trenton H
43406f44f2 Feature: Improve the retagger output using rich (#12194) 2026-03-03 07:14:59 -08:00
Trenton H
e58a35d40c Feature: Transition sanity check to rich and improve output (#12182) 2026-03-02 10:53:39 -08:00
Trenton H
20a9cd40e8 Feature: Switch all indexing to use rich (#12193) 2026-03-02 10:41:09 -08:00
GitHub Actions
62efb4078f Auto translate strings 2026-03-02 16:23:02 +00:00
shamoon
96ac7b2336 Tweak: Ignore version docs for workflows (#12217) 2026-03-02 08:21:14 -08:00
shamoon
95484db71b Update paperless_mail npm packges 2026-03-02 00:12:52 -08:00
shamoon
8e7084eba7 Bump tailwindcss to v3.4.19 2026-03-02 00:12:52 -08:00
GitHub Actions
dd06627e43 Auto translate strings 2026-02-28 10:34:26 +00:00
shamoon
f65807b906 Merge branch 'main' into dev
# Conflicts:
#	docs/setup.md
#	src-ui/src/app/components/manage/document-attributes/management-list/management-list.component.ts
#	src/documents/tests/test_api_documents.py
2026-02-28 02:31:20 -08:00
shamoon
47f9f642a9 Bump version to 2.20.9 2026-02-28 01:35:26 -08:00
shamoon
c7f83212a3 Enforce on selection_data too 2026-02-28 01:27:40 -08:00
shamoon
b010f65ae7 Fix GHSA-386h-chg4-cfw9 2026-02-28 01:16:53 -08:00
Jan Kleine
0bc032a67d Development: improve test portability (#12187)
* Fix: improve test portability

* Make settings always consistent

* Make a few more tests deterministic wrt settings

* Dont pollute settings for this one

* Fix timezone issue with mail parser

* Update test_parser.py

* Uh, I guess OCR gives variants for this

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2026-02-27 23:24:11 +00:00
GitHub Actions
8531078a54 Auto translate strings 2026-02-27 22:39:20 +00:00