Commit Graph

92 Commits

Author SHA1 Message Date
Trenton H 2cdb1424ef Performance: Further export memory improvements (#12273)
* Perf: streaming manifest writer for document exporter (Phase 3)

Replaces the in-memory manifest dict accumulation with a
StreamingManifestWriter that writes records to manifest.json
incrementally, keeping only one batch resident in memory at a time.

Key changes:
- Add StreamingManifestWriter: writes to .tmp atomically, BLAKE2b
  compare for --compare-json, discard() on exception
- Add _encrypt_record_inline(): per-record encryption replacing the
  bulk encrypt_secret_fields() call; crypto setup moved before streaming
- Add _write_split_manifest(): extracted per-document manifest writing
- Refactor dump(): non-doc records streamed during transaction, documents
  accumulated then written after filenames are assigned
- Upgrade check_and_write_json() from MD5 to BLAKE2b
- Remove encrypt_secret_fields() and unused itertools.chain import
- Add profiling marker to pyproject.toml
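
The writer described in the list above could look roughly like the following sketch. It is a standard-library-only illustration, not the code from the pull request: apart from `discard()`, the method names, the constructor arguments, and the JSON array layout are all assumptions.

```python
import hashlib
import json
import os
from pathlib import Path


def _blake2b_digest(path: Path) -> bytes:
    """Hash a file in chunks so a large manifest is never fully loaded."""
    digest = hashlib.blake2b()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.digest()


class StreamingManifestWriter:
    """Hypothetical sketch: stream records to <name>.tmp, then swap it in."""

    def __init__(self, path: Path, *, compare_json: bool = False) -> None:
        self._path = path
        self._tmp = path.with_name(path.name + ".tmp")
        self._compare_json = compare_json
        self._handle = None
        self._first = True

    def __enter__(self) -> "StreamingManifestWriter":
        self._handle = self._tmp.open("w", encoding="utf-8")
        self._handle.write("[")
        return self

    def write_record(self, record: dict) -> None:
        # Only the record being written is held here, not the whole manifest.
        if not self._first:
            self._handle.write(",\n")
        self._first = False
        json.dump(record, self._handle, ensure_ascii=False)

    def write_batch(self, records) -> None:
        for record in records:
            self.write_record(record)

    def discard(self) -> None:
        # Drop the partial .tmp file and leave any existing manifest untouched.
        if self._handle is not None and not self._handle.closed:
            self._handle.close()
        self._tmp.unlink(missing_ok=True)

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is not None:
            self.discard()
            return False
        self._handle.write("]")
        self._handle.close()
        unchanged = (
            self._compare_json
            and self._path.exists()
            and _blake2b_digest(self._tmp) == _blake2b_digest(self._path)
        )
        if unchanged:
            # Same content: keep the old file (and its mtime) as it is.
            self._tmp.unlink()
        else:
            # os.replace swaps the finished file into place in one step.
            os.replace(self._tmp, self._path)
        return False
```

Used as a context manager, the sketch either replaces the target file with a complete manifest or leaves it alone, which is the property the `.tmp` and `discard()` items above are after.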

Measured improvement (200 docs + 200 CustomFieldInstances, same
dump() code path, only writer differs):
- Peak memory: ~50% reduction
- Memory delta: ~70% reduction
- Wall time and query count: unchanged

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Refactor: O(1) lookup table for CRYPT_FIELDS in per-record encryption

Add CRYPT_FIELDS_BY_MODEL to CryptMixin, derived from CRYPT_FIELDS at
class definition time. _encrypt_record_inline() now does a single dict
lookup instead of a linear scan per record, eliminating the loop and
break pattern.
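
A rough sketch of the lookup-table idea follows. The shape of the `CRYPT_FIELDS` entries, the model names, and the `encrypt` callable are assumptions made for illustration; only the names `CRYPT_FIELDS`, `CRYPT_FIELDS_BY_MODEL`, and `CryptMixin` come from the message above.

```python
from typing import Callable


class CryptMixinSketch:
    """Hypothetical stand-in for CryptMixin; the entries are illustrative."""

    CRYPT_FIELDS = [
        {"model_name": "paperless_mail.mailaccount", "fields": ["password"]},
        {"model_name": "socialaccount.socialtoken", "fields": ["token", "token_secret"]},
    ]

    # Built once, when the class body runs, so each record costs one dict lookup.
    CRYPT_FIELDS_BY_MODEL = {
        entry["model_name"]: entry["fields"] for entry in CRYPT_FIELDS
    }

    def encrypt_record_inline(self, record: dict, encrypt: Callable[[str], str]) -> None:
        """Encrypt the secret fields of one serialized record in place."""
        fields = self.CRYPT_FIELDS_BY_MODEL.get(record["model"])
        if fields is None:
            return  # this model has no secret fields
        for name in fields:
            value = record["fields"].get(name)
            if value:
                record["fields"][name] = encrypt(value)
```

The dict is derived from the list rather than written by hand, so the two cannot drift apart.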

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 14:24:50 -08:00
Trenton H a9cb89c633 Enhancement: Improve exporter memory efficiency (#12236)
Phase 1 -- Eliminate JSON round-trip in document exporter

Replace json.loads(serializers.serialize("json", qs)) with
serializers.serialize("python", qs) to skip the intermediate
JSON string allocation and parse step. Use DjangoJSONEncoder
in check_and_write_json() to handle native Python types
(datetime, Decimal, UUID) the Python serializer returns.
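
To illustrate why an encoder is needed once the JSON round-trip is gone, here is a small standard-library sketch. `NativeTypeEncoder` is a hypothetical stand-in for `DjangoJSONEncoder`, and the record contents are invented; the exact output formats of Django's encoder may differ from this one.

```python
import datetime
import decimal
import json
import uuid


class NativeTypeEncoder(json.JSONEncoder):
    """Hypothetical stand-in for DjangoJSONEncoder, stdlib only."""

    def default(self, o):
        if isinstance(o, (datetime.datetime, datetime.date)):
            return o.isoformat()
        if isinstance(o, (decimal.Decimal, uuid.UUID)):
            return str(o)
        return super().default(o)


# A record shaped like the Python serializer's output: native types, not strings.
record = {
    "model": "documents.document",
    "pk": 1,
    "fields": {
        "created": datetime.datetime(2026, 3, 4, 14, 54, 20),
        "amount": decimal.Decimal("1.50"),
        "token": uuid.UUID(int=1),
    },
}

# json.dumps(record) alone raises TypeError; the encoder makes it serializable.
encoded = json.dumps(record, cls=NativeTypeEncoder)
```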

Phase 2 -- Batched QuerySet serialization in document exporter

Add serialize_queryset_batched() helper that uses QuerySet.iterator()
and itertools.islice to stream records in configurable chunks, bounding
peak memory during serialization to batch_size * avg_record_size rather
than loading the entire QuerySet at once.
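
The chunking half of that helper can be sketched with `itertools.islice` alone. `iter_batches` is a hypothetical name, and a plain iterable stands in for `QuerySet.iterator()` here so the example runs without Django.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items, consuming the source lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded list can be serialized and written out before the next one is pulled, so only one batch is resident at a time.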
2026-03-04 14:54:20 -08:00
Sebastian Steinbeißer 3b5ffbf9fa Chore(mypy): Annotate None returns for typing improvements (#11213) 2026-02-02 08:44:12 -08:00
Trenton H d0032c18be Breaking: Remove support for document and thumbnail encryption (#11850) 2026-01-24 19:29:54 -08:00
Sebastian Steinbeißer 6dca4daea5 Chore: switch from os.path to pathlib.Path (#10397) 2025-08-06 10:50:42 -07:00
Trenton H 6804c92861 Fix: Include email and webhook objects in the export (#8790) 2025-01-17 13:00:59 -08:00
dependabot[bot] 00485138f9 Chore(deps-dev): Bump the development group with 4 updates (#8352)
Bumps the development group with 4 updates: [ruff](https://github.com/astral-sh/ruff), [pytest-httpx](https://github.com/Colin-b/pytest_httpx), [pytest-rerunfailures](https://github.com/pytest-dev/pytest-rerunfailures) and [mkdocs-material](https://github.com/squidfunk/mkdocs-material).


Updates `ruff` from 0.7.3 to 0.8.0
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/0.7.3...0.8.0)

Updates `pytest-httpx` from 0.33.0 to 0.34.0
- [Release notes](https://github.com/Colin-b/pytest_httpx/releases)
- [Changelog](https://github.com/Colin-b/pytest_httpx/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/Colin-b/pytest_httpx/compare/v0.33.0...v0.34.0)

Updates `pytest-rerunfailures` from 14.0 to 15.0
- [Changelog](https://github.com/pytest-dev/pytest-rerunfailures/blob/master/CHANGES.rst)
- [Commits](https://github.com/pytest-dev/pytest-rerunfailures/compare/14.0...15.0)

Updates `mkdocs-material` from 9.5.44 to 9.5.46
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.44...9.5.46)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: development
- dependency-name: pytest-httpx
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: development
- dependency-name: pytest-rerunfailures
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: development
- dependency-name: mkdocs-material
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-01 18:56:54 -08:00
Kevin Doren 827121808a Enhancement: Add --compare-json option to document_exporter to write json files only if changed (#8261) 2024-11-19 07:20:24 -08:00
shamoon e94a92ed59 Feature: two-factor authentication (#8012) 2024-11-18 18:34:46 +00:00
shamoon aac04e73b9 Fix: correct serializing of auth tokens for export (#8100) 2024-10-29 17:02:32 +00:00
shamoon 7649903d3c Enhancement / fix: include social accounts and api tokens in export (#8016) 2024-10-26 06:51:22 -07:00
Trenton H e6f59472e4 Chore: Drop Python 3.9 support (#7774) 2024-09-26 12:22:24 -07:00
Trenton H d9002005b1 Feature: Allow encrypting sensitive fields in export (#6927)
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2024-06-09 14:41:18 +00:00
Trenton H 085447e7c4 Feature: Allow a data only export/import cycle (#6871) 2024-06-01 18:22:59 -07:00
Trenton H d4d0604da2 Moves additional auditlog imports into protected blocks (#6638) 2024-05-08 09:04:32 -07:00
shamoon 0f8b2e69c9 Change: enable auditlog by default, fix import / export (#6267) 2024-04-04 18:51:15 +00:00
Trenton H 13201dbfff Ensure all creations of directories create the parents too (#5711) 2024-02-10 11:02:40 -08:00
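
In `pathlib` terms, creating a directory together with its parents is a one-line call. The helper name below is hypothetical and the snippet is only an illustration of the pattern, not code from the commit.

```python
from pathlib import Path


def ensure_dir(path: Path) -> Path:
    """Create path and any missing parents; succeed if it already exists."""
    path.mkdir(parents=True, exist_ok=True)
    return path
```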
Trenton H 8da2535a65 Fix: zip exports not respecting the --delete option (#5245) 2024-01-04 19:58:58 +00:00
shamoon 3b6ce16f1c Feature: Workflows (#5121) 2024-01-03 08:19:19 +00:00
Trenton H 061f33fb05 Feature: Allow setting backend configuration settings via the UI (#5126)
* Saving some start on this

* At least partially working for the tesseract parser

* Problems with migration testing need to figure out

* Work around that error

* Fixes max m_pixels

* Moving the settings to main paperless application

* Starting some consumer options

* More fixes and work

* Fixes these last tests

* Fix max_length on OcrSettings.mode field

* Fix all fields on Common & Ocr settings serializers

* Umbrellla config view

* Revert "Umbrellla config view"

This reverts commit fbaf9f4be30f89afeb509099180158a3406416a5.

* Updates to use a single configuration object for all settings

* Squashed commit of the following:

commit 8a0a49dd57
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 23:02:47 2023 -0800

    Fix formatting

commit 66b2d90c50
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 22:36:35 2023 -0800

    Refactor frontend data models

commit 5723bd8dd8
Author: Adam Bogdał <adam@bogdal.pl>
Date:   Wed Dec 20 01:17:43 2023 +0100

    Fix: speed up admin panel for installs with a large number of documents (#5052)

commit 9b08ce1761
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 15:18:51 2023 -0800

    Update PULL_REQUEST_TEMPLATE.md

commit a6248bec2d
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 15:02:05 2023 -0800

    Chore: Update Angular to v17 (#4980)

commit b1f6f52486
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 13:53:56 2023 -0800

    Fix: Dont allow null custom_fields property via API (#5063)

commit 638d9970fd
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 13:43:50 2023 -0800

    Enhancement: symmetric document links (#4907)

commit 5e8de4c1da
Author: shamoon <4887959+shamoon@users.noreply.github.com>
Date:   Tue Dec 19 12:45:04 2023 -0800

    Enhancement: shared icon & shared by me filter (#4859)

commit 088bad9030
Author: Trenton H <797416+stumpylog@users.noreply.github.com>
Date:   Tue Dec 19 12:04:03 2023 -0800

    Bulk updates all the backend libraries (#5061)

* Saving some work on frontend config

* Very basic but dynamically-generated config form

* Saving work on slightly less ugly frontend config

* JSON validation for user_args field

* Fully dynamic config form

* Adds in some additional validators for a nicer error message

* Cleaning up the testing and coverage more

* Reverts unintentional change

* Adds documentation about the settings and the precedence

* Couple more commenting and style fixes

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2023-12-29 15:42:56 -08:00
Trenton H be2de4f15d Fixes export of custom field instances during a split manifest export (#4984) 2023-12-14 19:23:39 -08:00
shamoon 90f90dc9b4 Fix: export consumption templates & custom fields in exporter (#4825) 2023-12-04 21:33:15 -08:00
Trenton H caddcaf807 Forces JSON files to be written as UTF-8, and disables the ensure ASCII option which escapes non-ASCII chars (#4574) 2023-11-13 14:18:21 -08:00
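
The effect of those two settings can be shown with the standard `json` module. The function name is hypothetical and the sketch is not the exporter's own code.

```python
import json
from pathlib import Path


def write_json_utf8(path: Path, data) -> None:
    """Write JSON as UTF-8 and keep non-ASCII characters unescaped."""
    with path.open("w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2, ensure_ascii=False)
```

With `ensure_ascii=False`, a name such as "Steinbeißer" is stored as written instead of as `Steinbei\u00dfer`, and the explicit encoding keeps the result independent of the platform default.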
Trenton H e8527ba723 Chore: Cleanup command arguments and standardize process count handling (#4541)
Cleans up some command help text and adds more control over process count for commands that use a Pool
2023-11-09 11:46:37 -08:00
Trenton H ec9ebd3026 Allow the user to specify the zip file name (#4189) 2023-09-15 16:33:28 -07:00
Trenton Holmes 650c816a7b Removes support for Python 3.8 and lower from the code base 2023-09-10 11:42:59 -07:00
Trenton H 9f5d47c320 Fixes issues with copy2 or copystat and SELinux see #3665 2023-07-22 06:27:49 -07:00
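
On Linux, `shutil.copy2` and `shutil.copystat` also try to copy extended attributes, which is where SELinux labels are stored. One way to sidestep that is to copy only the file contents and the timestamps, as in the sketch below. Whether this matches the fix in the commit is an assumption, and the function name is hypothetical.

```python
import os
import shutil
from pathlib import Path


def copy_with_basic_stats(source: Path, dest: Path) -> None:
    """Copy contents, then access and modification times, but no xattrs."""
    shutil.copyfile(source, dest)
    stats = source.stat()
    os.utime(dest, ns=(stats.st_atime_ns, stats.st_mtime_ns))
```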
shamoon bbd4659fbf Include global and object-level permissions in export / import
adds test for transaction
2023-06-23 23:33:36 -07:00
shamoon 243598ae50 Exclude consumer & AnonymousUser users from export manifest 2023-05-30 20:51:25 -07:00
Trenton H 6f163111ce Upgrades black to v23, upgrades ruff 2023-04-26 09:35:27 -07:00
Trenton H 3bcbd05252 Fixes ruff not running isort against the codebase 2023-04-26 09:35:27 -07:00
Trenton H ce41ac9158 Configures ruff as the one stop linter and resolves warnings it raised 2023-04-01 17:03:52 -07:00
shamoon bf8ae22f3f Rename comments --> notes 2023-03-18 13:59:17 -07:00
Trenton H c422a081bf Be sure the scratch directory exists before using it as temporary directory 2023-03-01 07:13:31 -08:00
Trenton H 06e2500443 Moves the mktime call into the if block where it is used, preventing exceptions during rare cases 2023-02-02 07:25:32 -08:00
Matthieu Helleboid 02a40055f5 replace --use-filename-prefix with --use-folder-prefix 2023-01-24 11:06:49 -08:00
Matthieu Helleboid aeecc10e45 sort exporter option by alphabetical order 2023-01-24 11:06:49 -08:00
Matthieu Helleboid 270f8677a7 add document comments to dedicated manifest file when using 'split-manifest' 2023-01-24 11:06:49 -08:00
Matthieu Helleboid 20763e7c26 Fix split_manifest default value 2023-01-24 11:06:49 -08:00
Matthieu Helleboid b33ba4c902 fix json serialization bug after migration to Pathlib 2023-01-24 11:06:49 -08:00
Matthieu Helleboid fae5e834b9 fix bug on administration exporter when using -d, --delete option 2023-01-24 11:06:49 -08:00
Matthieu Helleboid 4cb4bd13ad add split-manifest option to administration exporter 2023-01-24 11:06:49 -08:00
Matthieu Helleboid 896304ccaa add prefix option to administration exporter 2023-01-24 11:06:49 -08:00
Matthieu Helleboid 9ae186e6f9 add no-archive and no-thumbnail options to administration exporter and importer 2023-01-24 11:06:49 -08:00
Trenton H b25f083687 Updates the exporter to use pathlib and add a few more tests for coverage 2023-01-14 06:33:12 -08:00
Trenton Holmes a6b7beaf6b Adds option to allow a user to export directory to a zipfile 2022-12-04 16:38:25 -08:00
Michael Shamoon 15d074d39c Include storage path in exporter + tests 2022-09-11 07:39:35 -07:00
Michael Shamoon 0f4b118b61 Basic verification of Comment export & exporter comment tuple fix
From orphaned commits https://github.com/paperless-ngx/paperless-ngx/commit/b1855a4b7af689d0a7c7f18bf7ef513967da269f
https://github.com/paperless-ngx/paperless-ngx/commit/53f21574fd8af0f3561c12f709a14980f8f1cb7f

Co-Authored-By: Trenton Holmes <797416+stumpylog@users.noreply.github.com>
2022-08-24 14:24:10 -07:00
Michael Shamoon d5018af2a3 python code style 2022-08-23 19:20:08 -07:00
tim-vogel 817882ff6f add comment function 2022-08-23 19:19:21 -07:00