* Perf: streaming manifest writer for document exporter (Phase 3)
Replaces the in-memory manifest dict accumulation with a
StreamingManifestWriter that writes records to manifest.json
incrementally, keeping only one batch resident in memory at a time.
Key changes:
- Add StreamingManifestWriter: writes to a .tmp file and atomically renames
it into place, uses a BLAKE2b comparison for --compare-json, and calls
discard() on exception
- Add _encrypt_record_inline(): per-record encryption replacing the
bulk encrypt_secret_fields() call; crypto setup moved before streaming
- Add _write_split_manifest(): extracted per-document manifest writing
- Refactor dump(): non-document records are streamed during the transaction;
documents are accumulated, then written after filenames are assigned
- Upgrade check_and_write_json() from MD5 to BLAKE2b
- Remove encrypt_secret_fields() and unused itertools.chain import
- Add profiling marker to pyproject.toml
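A minimal sketch of the streaming pattern described above, for illustration only and not the exporter's actual code. It assumes the manifest is a single JSON array, that the .tmp file is a sibling of manifest.json (so os.replace() is atomic on the same filesystem), and a context-manager interface; apart from discard(), the method names (write_batch, close) are hypothetical.

```python
import hashlib
import json
import os
from pathlib import Path
from typing import Any


def _blake2b_file(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large manifests are never fully loaded."""
    digest = hashlib.blake2b()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


class StreamingManifestWriter:
    """Hypothetical sketch: stream records into a JSON array, one batch at a time."""

    def __init__(self, path: Path, *, compare_json: bool = False) -> None:
        self.path = Path(path)
        self.compare_json = compare_json
        # Sibling temp file on the same filesystem, so os.replace() is atomic
        self.tmp_path = self.path.with_name(self.path.name + ".tmp")
        self._fh = self.tmp_path.open("w", encoding="utf-8")
        self._fh.write("[")
        self._first = True

    def write_batch(self, records: list[dict[str, Any]]) -> None:
        # Each record is serialized straight to disk; nothing accumulates here
        for record in records:
            if not self._first:
                self._fh.write(",")
            json.dump(record, self._fh, indent=2)
            self._first = False

    def close(self) -> None:
        self._fh.write("]")
        self._fh.close()
        if (
            self.compare_json
            and self.path.exists()
            and _blake2b_file(self.path) == _blake2b_file(self.tmp_path)
        ):
            # Content unchanged: keep the existing manifest (and its mtime) as-is
            self.tmp_path.unlink()
            return
        os.replace(self.tmp_path, self.path)

    def discard(self) -> None:
        # Called on failure: drop the partial temp file, never touch the manifest
        if not self._fh.closed:
            self._fh.close()
        self.tmp_path.unlink(missing_ok=True)

    def __enter__(self) -> "StreamingManifestWriter":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self.close()
        else:
            self.discard()
        return False  # never swallow the exception
```

Because the final rename only happens in close(), a crash mid-export leaves any previous manifest.json intact rather than half-written.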
Measured improvement (200 docs + 200 CustomFieldInstances, same
dump() code path, only writer differs):
- Peak memory: ~50% reduction
- Memory delta: ~70% reduction
- Wall time and query count: unchanged
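The benchmark harness itself is not shown here. As an illustration only, peak memory and memory delta for a call could be measured with the standard library's tracemalloc, assuming "memory delta" means bytes still allocated after the call relative to before. Note that tracemalloc only sees allocations made through Python's allocators. The helper name measure_memory is hypothetical.

```python
import tracemalloc
from typing import Callable


def measure_memory(fn: Callable[[], object]) -> tuple[int, int]:
    """Return (peak_bytes, delta_bytes) for Python allocations made by fn()."""
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        fn()  # return value is discarded, so only retained memory counts in delta
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak - baseline, current - baseline
```

Running the same dump() path once with each writer and comparing the two tuples gives the kind of relative reduction quoted above.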
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Refactor: O(1) lookup table for CRYPT_FIELDS in per-record encryption
Add CRYPT_FIELDS_BY_MODEL to CryptMixin, derived from CRYPT_FIELDS at
class definition time. _encrypt_record_inline() now does a single dict
lookup instead of a linear scan per record, eliminating the loop and
break pattern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
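A minimal sketch of the lookup-table idea, not the actual CryptMixin. The exact shape and entries of CRYPT_FIELDS are assumed here. encrypt_record_inline and its encrypt callable are hypothetical stand-ins for _encrypt_record_inline() and the real cipher. Records follow the shape produced by Django's "python" serializer ({"model", "pk", "fields"}).

```python
from typing import Any, Callable, ClassVar


class CryptMixin:
    # Assumed shape of CRYPT_FIELDS; the real entries may differ
    CRYPT_FIELDS: ClassVar[list[dict[str, Any]]] = [
        {
            "exporter_key": "mail_accounts",
            "model_name": "paperless_mail.mailaccount",
            "fields": ["password"],
        },
        {
            "exporter_key": "social_tokens",
            "model_name": "socialaccount.socialtoken",
            "fields": ["token"],
        },
    ]

    # Derived once, at class definition time: model label -> secret field names.
    # Only the outermost iterable of a class-body comprehension may reference
    # class attributes, which is exactly what this does.
    CRYPT_FIELDS_BY_MODEL: ClassVar[dict[str, list[str]]] = {
        entry["model_name"]: entry["fields"] for entry in CRYPT_FIELDS
    }

    @classmethod
    def encrypt_record_inline(
        cls,
        record: dict[str, Any],
        encrypt: Callable[[str], str],
    ) -> dict[str, Any]:
        # One O(1) dict lookup replaces a linear scan over CRYPT_FIELDS
        fields = cls.CRYPT_FIELDS_BY_MODEL.get(record["model"])
        if fields is None:
            return record
        for name in fields:
            value = record["fields"].get(name)
            if value:
                record["fields"][name] = encrypt(value)
        return record
```

Records for models without secret fields (the common case, e.g. documents) now cost a single failed dict lookup.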
* Phase 1 -- Eliminate JSON round-trip in document exporter
Replace json.loads(serializers.serialize("json", qs)) with
serializers.serialize("python", qs) to skip the intermediate
JSON string allocation and parse step. Use DjangoJSONEncoder
in check_and_write_json() to handle native Python types
(datetime, Decimal, UUID) the Python serializer returns.
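Why the encoder change is needed: the "python" serializer leaves datetime, Decimal and UUID values as native objects, which the plain json module rejects. The sketch below uses a simplified stdlib stand-in so it runs without Django. The real DjangoJSONEncoder also handles timedelta and lazy strings, and it formats datetimes slightly differently (millisecond precision, "Z" for UTC). NativeTypesEncoder and the example record are hypothetical.

```python
import datetime
import decimal
import json
import uuid
from typing import Any


class NativeTypesEncoder(json.JSONEncoder):
    """Simplified stand-in for django.core.serializers.json.DjangoJSONEncoder."""

    def default(self, o: Any) -> Any:
        if isinstance(o, (datetime.datetime, datetime.date, datetime.time)):
            return o.isoformat()
        if isinstance(o, (decimal.Decimal, uuid.UUID)):
            return str(o)
        return super().default(o)  # raises TypeError for anything else


# Illustrative record in the shape serializers.serialize("python", qs) yields
record = {
    "model": "example.widget",
    "pk": 1,
    "fields": {
        "created": datetime.datetime(2023, 12, 19, 12, 0),
        "amount": decimal.Decimal("1.50"),
        "uuid": uuid.UUID("12345678-1234-5678-1234-567812345678"),
    },
}
text = json.dumps(record, cls=NativeTypesEncoder, indent=2)
```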
* Phase 2 -- Batched QuerySet serialization in document exporter
Add serialize_queryset_batched() helper that uses QuerySet.iterator()
and itertools.islice to stream records in configurable chunks, bounding
peak memory during serialization to batch_size * avg_record_size rather
than loading the entire QuerySet at once.
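A generic sketch of the iterator + islice batching idea, under stated assumptions. It is written against any iterable so it runs without a database. The serialize callable and the default batch_size of 500 are assumptions, not the helper's real signature. In Django the rows would plausibly come from qs.iterator(chunk_size=batch_size) and serialize would wrap serializers.serialize("python", batch).

```python
import itertools
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def serialize_queryset_batched(
    rows: Iterable[T],
    serialize: Callable[[list[T]], list[R]],
    batch_size: int = 500,  # hypothetical default
) -> Iterator[list[R]]:
    """Yield serialized batches of at most batch_size rows.

    The generator itself only holds one batch of source rows at a time, so
    its memory use scales with batch_size rather than the total row count.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(rows)
    # islice lazily pulls the next batch_size items; an empty list ends the loop
    while batch := list(itertools.islice(it, batch_size)):
        yield serialize(batch)
```

Calling iter() once up front matters: slicing the same iterator repeatedly is what advances through the rows instead of restarting at the first one.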
* Saving some initial work on this
* At least partially working for the tesseract parser
* Problems with migration testing; need to figure this out
* Work around that error
* Fixes max m_pixels
* Moving the settings to the main paperless application
* Starting some consumer options
* More fixes and work
* Fixes these last tests
* Fix max_length on OcrSettings.mode field
* Fix all fields on Common & Ocr settings serializers
* Umbrella config view
* Revert "Umbrella config view"
This reverts commit fbaf9f4be30f89afeb509099180158a3406416a5.
* Updates to use a single configuration object for all settings
* Squashed commit of the following:
commit 8a0a49dd57
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 23:02:47 2023 -0800
Fix formatting
commit 66b2d90c50
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 22:36:35 2023 -0800
Refactor frontend data models
commit 5723bd8dd8
Author: Adam Bogdał <adam@bogdal.pl>
Date: Wed Dec 20 01:17:43 2023 +0100
Fix: speed up admin panel for installs with a large number of documents (#5052)
commit 9b08ce1761
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 15:18:51 2023 -0800
Update PULL_REQUEST_TEMPLATE.md
commit a6248bec2d
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 15:02:05 2023 -0800
Chore: Update Angular to v17 (#4980)
commit b1f6f52486
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 13:53:56 2023 -0800
Fix: Don't allow null custom_fields property via API (#5063)
commit 638d9970fd
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 13:43:50 2023 -0800
Enhancement: symmetric document links (#4907)
commit 5e8de4c1da
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Tue Dec 19 12:45:04 2023 -0800
Enhancement: shared icon & shared by me filter (#4859)
commit 088bad9030
Author: Trenton H <797416+stumpylog@users.noreply.github.com>
Date: Tue Dec 19 12:04:03 2023 -0800
Bulk updates all the backend libraries (#5061)
* Saving some work on frontend config
* Very basic but dynamically-generated config form
* Saving work on slightly less ugly frontend config
* JSON validation for user_args field
* Fully dynamic config form
* Adds some additional validators for nicer error messages
* More cleanup of the tests and coverage
* Reverts unintentional change
* Adds documentation about the settings and the precedence
* A couple more comment and style fixes
---------
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>