Commit Graph

343 Commits

Author SHA1 Message Date
Trenton H 58c117d3b4 Cleans up the old passphrase stuff and the PNG handling during import 2026-03-09 09:50:36 -07:00
Trenton H e30676f889 Feature: Migrate import/export to rich progress (#12260)
* Refactor: migrate exporter/importer from tqdm to PaperlessCommand.track()

Replace direct tqdm usage in document_exporter and document_importer with
the PaperlessCommand base class and its track() method, which is backed by
Rich and handles --no-progress-bar automatically. Also removes the unused
ProgressBarMixin from mixins.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Refactor: add explicit supports_progress_bar and supports_multiprocessing to all PaperlessCommand subclasses

Each management command now explicitly declares both class attributes
rather than relying on defaults, making intent unambiguous at a glance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 08:59:17 -07:00
Trenton H 2cdb1424ef Performance: Further export memory improvements (#12273)
* Perf: streaming manifest writer for document exporter (Phase 3)

Replaces the in-memory manifest dict accumulation with a
StreamingManifestWriter that writes records to manifest.json
incrementally, keeping only one batch resident in memory at a time.

Key changes:
- Add StreamingManifestWriter: writes to .tmp atomically, BLAKE2b
  compare for --compare-json, discard() on exception
- Add _encrypt_record_inline(): per-record encryption replacing the
  bulk encrypt_secret_fields() call; crypto setup moved before streaming
- Add _write_split_manifest(): extracted per-document manifest writing
- Refactor dump(): non-doc records streamed during transaction, documents
  accumulated then written after filenames are assigned
- Upgrade check_and_write_json() from MD5 to BLAKE2b
- Remove encrypt_secret_fields() and unused itertools.chain import
- Add profiling marker to pyproject.toml

Measured improvement (200 docs + 200 CustomFieldInstances, same
dump() code path, only writer differs):
- Peak memory: ~50% reduction
- Memory delta: ~70% reduction
- Wall time and query count: unchanged

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Refactor: O(1) lookup table for CRYPT_FIELDS in per-record encryption

Add CRYPT_FIELDS_BY_MODEL to CryptMixin, derived from CRYPT_FIELDS at
class definition time. _encrypt_record_inline() now does a single dict
lookup instead of a linear scan per record, eliminating the loop and
break pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 14:24:50 -08:00
Trenton H a9cb89c633 Enhancement: Improve exporter memory efficiency (#12236)
Phase 1 -- Eliminate JSON round-trip in document exporter

Replace json.loads(serializers.serialize("json", qs)) with
serializers.serialize("python", qs) to skip the intermediate
JSON string allocation and parse step. Use DjangoJSONEncoder
in check_and_write_json() to handle native Python types
(datetime, Decimal, UUID) the Python serializer returns.

Phase 2 -- Batched QuerySet serialization in document exporter

Add serialize_queryset_batched() helper that uses QuerySet.iterator()
and itertools.islice to stream records in configurable chunks, bounding
peak memory during serialization to batch_size * avg_record_size rather
than loading the entire QuerySet at once.
2026-03-04 14:54:20 -08:00
Trenton H 43406f44f2 Feature: Improve the retagger output using rich (#12194) 2026-03-03 07:14:59 -08:00
Trenton H e58a35d40c Feature: Transition sanity check to rich and improve output (#12182) 2026-03-02 10:53:39 -08:00
Trenton H 20a9cd40e8 Feature: Switch all indexing to use rich (#12193) 2026-03-02 10:41:09 -08:00
Trenton H c30ee1ec03 Feature: Switch progress bar library to rich (#12169) 2026-02-26 12:20:21 -08:00
Sebastian Steinbeißer 3b5ffbf9fa Chore(mypy): Annotate None returns for typing improvements (#11213) 2026-02-02 08:44:12 -08:00
Trenton H 6859e7e3c2 Chore: Resolve more flaky tests (#11920) 2026-01-28 16:13:27 +00:00
Trenton H d0032c18be Breaking: Remove support for document and thumbnail encryption (#11850) 2026-01-24 19:29:54 -08:00
Trenton H 51b466a86b Feature: Simplify and improve the consumer (#11753) 2026-01-21 14:37:48 -08:00
shamoon e940764fe0 Feature: Paperless AI (#10319) 2026-01-13 16:24:42 +00:00
shamoon b3d6359afc Chore: set signal receivers with weak=False 2025-11-17 10:02:32 -08:00
shamoon 6119c215e7 Fix: skip fuzzy matching for empty document content (#10914) 2025-09-22 23:30:24 -07:00
shamoon 0e35acaef5 Fix: add extra error handling to _consume for file checks (#10897) 2025-09-21 13:21:40 -07:00
Sebastian Steinbeißer d2064a2535 Chore: switch from os.path to pathlib.Path (#10539) 2025-09-03 08:12:41 -07:00
dependabot[bot] edb8c06e2a Chore(deps): Bump the django group across 1 directory with 9 updates (#10538)
* Chore(deps): Bump the django group across 1 directory with 9 updates

Bumps the django group with 9 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [django](https://github.com/django/django) | `5.1.8` | `5.2.5` |
| [django-auditlog](https://github.com/jazzband/django-auditlog) | `3.1.2` | `3.2.1` |
| [django-guardian](https://github.com/django-guardian/django-guardian) | `2.4.0` | `3.0.3` |
| [django-multiselectfield](https://github.com/goinnn/django-multiselectfield) | `0.1.13` | `1.0.1` |
| [django-soft-delete](https://github.com/san4ezy/django_softdelete) | `1.0.18` | `1.0.19` |
| [djangorestframework](https://github.com/encode/django-rest-framework) | `3.16.0` | `3.16.1` |
| [djangorestframework-guardian](https://github.com/rpkilby/django-rest-framework-guardian) | `0.3.0` | `0.4.0` |
| [drf-spectacular-sidecar](https://github.com/tfranzel/drf-spectacular-sidecar) | `2025.4.1` | `2025.8.1` |
| [pytest-django](https://github.com/pytest-dev/pytest-django) | `4.10.0` | `4.11.1` |



Updates `django` from 5.1.8 to 5.2.5
- [Commits](https://github.com/django/django/compare/5.1.8...5.2.5)

Updates `django-auditlog` from 3.1.2 to 3.2.1
- [Release notes](https://github.com/jazzband/django-auditlog/releases)
- [Changelog](https://github.com/jazzband/django-auditlog/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jazzband/django-auditlog/compare/v3.1.2...v3.2.1)

Updates `django-guardian` from 2.4.0 to 3.0.3
- [Release notes](https://github.com/django-guardian/django-guardian/releases)
- [Commits](https://github.com/django-guardian/django-guardian/compare/v2.4.0...3.0.3)

Updates `django-multiselectfield` from 0.1.13 to 1.0.1
- [Release notes](https://github.com/goinnn/django-multiselectfield/releases)
- [Changelog](https://github.com/goinnn/django-multiselectfield/blob/master/CHANGES.rst)
- [Commits](https://github.com/goinnn/django-multiselectfield/compare/v0.1.13...v1.0.1)

Updates `django-soft-delete` from 1.0.18 to 1.0.19
- [Changelog](https://github.com/san4ezy/django_softdelete/blob/master/CHANGELOG.md)
- [Commits](https://github.com/san4ezy/django_softdelete/commits)

Updates `djangorestframework` from 3.16.0 to 3.16.1
- [Release notes](https://github.com/encode/django-rest-framework/releases)
- [Commits](https://github.com/encode/django-rest-framework/compare/3.16.0...3.16.1)

Updates `djangorestframework-guardian` from 0.3.0 to 0.4.0
- [Changelog](https://github.com/rpkilby/django-rest-framework-guardian/blob/master/CHANGELOG)
- [Commits](https://github.com/rpkilby/django-rest-framework-guardian/compare/0.3.0...0.4.0)

Updates `drf-spectacular-sidecar` from 2025.4.1 to 2025.8.1
- [Commits](https://github.com/tfranzel/drf-spectacular-sidecar/compare/2025.4.1...2025.8.1)

Updates `pytest-django` from 4.10.0 to 4.11.1
- [Release notes](https://github.com/pytest-dev/pytest-django/releases)
- [Changelog](https://github.com/pytest-dev/pytest-django/blob/main/docs/changelog.rst)
- [Commits](https://github.com/pytest-dev/pytest-django/compare/v4.10.0...v4.11.1)

---
updated-dependencies:
- dependency-name: django
  dependency-version: 5.2.5
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: django
- dependency-name: django-auditlog
  dependency-version: 3.2.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: django
- dependency-name: django-guardian
  dependency-version: 3.0.3
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: django
- dependency-name: django-multiselectfield
  dependency-version: 1.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: django
- dependency-name: django-soft-delete
  dependency-version: 1.0.19
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: django
- dependency-name: djangorestframework
  dependency-version: 3.16.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: django
- dependency-name: djangorestframework-guardian
  dependency-version: 0.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: django
- dependency-name: drf-spectacular-sidecar
  dependency-version: 2025.8.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: django
- dependency-name: pytest-django
  dependency-version: 4.11.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: django
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix log matches related to newlines, add newlines to stdout.writelines

* Fix disable api remote auth test, Django 5.2 no longer uses process_request

* Remove postgres version check

* Update administration.md

* Handle django-multiselectfield v1.0 changes

* Update administration.md

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2025-08-11 13:45:14 -07:00
Sebastian Steinbeißer 6dca4daea5 Chore: switch from os.path to pathlib.Path (#10397) 2025-08-06 10:50:42 -07:00
Kilian 246f17c6c8 Enhancement: support import of zipped export (#10073) 2025-06-13 10:06:37 -07:00
shamoon 0b079f57ed Partially Revert "Chore(deps): Bump the small-changes group across 1 directory with 6 updates (#9921)"
This partially reverts commit 1fe166e014.
2025-05-19 12:20:05 -07:00
dependabot[bot] 1fe166e014 Chore(deps): Bump the small-changes group across 1 directory with 6 updates (#9921)
* Chore(deps): Bump the small-changes group across 1 directory with 6 updates

Bumps the small-changes group with 6 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [concurrent-log-handler](https://github.com/Preston-Landers/concurrent-log-handler) | `0.9.25` | `0.9.26` |
| [django](https://github.com/django/django) | `5.1.8` | `5.2.1` |
| [django-auditlog](https://github.com/jazzband/django-auditlog) | `3.0.0` | `3.1.2` |
| [drf-spectacular-sidecar](https://github.com/tfranzel/drf-spectacular-sidecar) | `2025.4.1` | `2025.5.1` |
| [ocrmypdf](https://github.com/ocrmypdf/OCRmyPDF) | `16.10.0` | `16.10.1` |
| [setproctitle](https://github.com/dvarrazzo/py-setproctitle) | `1.3.5` | `1.3.6` |



Updates `concurrent-log-handler` from 0.9.25 to 0.9.26
- [Release notes](https://github.com/Preston-Landers/concurrent-log-handler/releases)
- [Changelog](https://github.com/Preston-Landers/concurrent-log-handler/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Preston-Landers/concurrent-log-handler/compare/0.9.25...0.9.26)

Updates `django` from 5.1.8 to 5.2.1
- [Commits](https://github.com/django/django/compare/5.1.8...5.2.1)

Updates `django-auditlog` from 3.0.0 to 3.1.2
- [Release notes](https://github.com/jazzband/django-auditlog/releases)
- [Changelog](https://github.com/jazzband/django-auditlog/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jazzband/django-auditlog/compare/v3.0.0...v3.1.2)

Updates `drf-spectacular-sidecar` from 2025.4.1 to 2025.5.1
- [Commits](https://github.com/tfranzel/drf-spectacular-sidecar/compare/2025.4.1...2025.5.1)

Updates `ocrmypdf` from 16.10.0 to 16.10.1
- [Release notes](https://github.com/ocrmypdf/OCRmyPDF/releases)
- [Changelog](https://github.com/ocrmypdf/OCRmyPDF/blob/main/docs/release_notes.md)
- [Commits](https://github.com/ocrmypdf/OCRmyPDF/compare/v16.10.0...v16.10.1)

Updates `setproctitle` from 1.3.5 to 1.3.6
- [Changelog](https://github.com/dvarrazzo/py-setproctitle/blob/master/HISTORY.rst)
- [Commits](https://github.com/dvarrazzo/py-setproctitle/compare/version-1.3.5...version-1.3.6)

---
updated-dependencies:
- dependency-name: concurrent-log-handler
  dependency-version: 0.9.26
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: small-changes
- dependency-name: django
  dependency-version: 5.2.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: small-changes
- dependency-name: django-auditlog
  dependency-version: 3.1.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: small-changes
- dependency-name: drf-spectacular-sidecar
  dependency-version: 2025.5.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: small-changes
- dependency-name: ocrmypdf
  dependency-version: 16.10.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: small-changes
- dependency-name: setproctitle
  dependency-version: 1.3.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: small-changes
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix log matches related to newlines, add newlines to stdout.writelines

* Fix disable api remote auth test, Django 5.2 no longer uses process_request

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2025-05-19 11:03:02 -07:00
Trenton H f4413e0a08 Fix: Always clean up INotify (#9359) 2025-03-11 10:01:00 -07:00
shamoon 2d52226732 Enhancement: system status report sanity check, simpler classifier check, styling updates (#9106) 2025-02-26 22:12:20 +00:00
Sebastian Steinbeißer e560fa3be0 Chore: Enable ruff FBT (#8645) 2025-02-07 09:12:03 -08:00
dependabot[bot] 20ec8cb57b Chore(deps-dev): Bump the development group with 2 updates (#8841)
* Chore(deps-dev): Bump the development group with 2 updates

Bumps the development group with 2 updates: [ruff](https://github.com/astral-sh/ruff) and [mkdocs-material](https://github.com/squidfunk/mkdocs-material).


Updates `ruff` from 0.8.6 to 0.9.2
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/0.8.6...0.9.2)

Updates `mkdocs-material` from 9.5.49 to 9.5.50
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.49...9.5.50)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: development
- dependency-name: mkdocs-material
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: development
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update .pre-commit-config.yaml

* Run new ruff format

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2025-01-21 19:22:25 +00:00
Trenton H 6804c92861 Fix: Include email and webhook objects in the export (#8790) 2025-01-17 13:00:59 -08:00
Sebastian Steinbeißer 935d077836 Chore: Switch from os.path to pathlib.Path (#8325)
---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2025-01-06 12:12:27 -08:00
shamoon 7d182ab894 Enhancement: prune audit logs and management command (#8416) 2024-12-03 19:28:27 +00:00
shamoon 0fc1860d4c Enhancement: use stable unique IDs for custom field select options (#8299) 2024-12-02 04:15:38 +00:00
dependabot[bot] 00485138f9 Chore(deps-dev): Bump the development group with 4 updates (#8352)
Bumps the development group with 4 updates: [ruff](https://github.com/astral-sh/ruff), [pytest-httpx](https://github.com/Colin-b/pytest_httpx), [pytest-rerunfailures](https://github.com/pytest-dev/pytest-rerunfailures) and [mkdocs-material](https://github.com/squidfunk/mkdocs-material).


Updates `ruff` from 0.7.3 to 0.8.0
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/0.7.3...0.8.0)

Updates `pytest-httpx` from 0.33.0 to 0.34.0
- [Release notes](https://github.com/Colin-b/pytest_httpx/releases)
- [Changelog](https://github.com/Colin-b/pytest_httpx/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/Colin-b/pytest_httpx/compare/v0.33.0...v0.34.0)

Updates `pytest-rerunfailures` from 14.0 to 15.0
- [Changelog](https://github.com/pytest-dev/pytest-rerunfailures/blob/master/CHANGES.rst)
- [Commits](https://github.com/pytest-dev/pytest-rerunfailures/compare/14.0...15.0)

Updates `mkdocs-material` from 9.5.44 to 9.5.46
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](https://github.com/squidfunk/mkdocs-material/compare/9.5.44...9.5.46)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: development
- dependency-name: pytest-httpx
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: development
- dependency-name: pytest-rerunfailures
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: development
- dependency-name: mkdocs-material
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-01 18:56:54 -08:00
shamoon 9c1561adfb Change: change update content to handle archive disabled (#8315) 2024-11-20 20:01:13 +00:00
Kevin Doren 827121808a Enhancement: Add --compare-json option to document_exporter to write json files only if changed (#8261) 2024-11-19 07:20:24 -08:00
shamoon e94a92ed59 Feature: two-factor authentication (#8012) 2024-11-18 18:34:46 +00:00
shamoon aac04e73b9 Fix: correct serializing of auth tokens for export (#8100) 2024-10-29 17:02:32 +00:00
shamoon 28fdb170bf Fix: handle uuid fields created under mariadb and Django 4 (#8034) 2024-10-28 13:54:16 +00:00
shamoon 0d96cd03d5 Fix: disable custom field signals during import in 2.13.0 (#8065) 2024-10-27 18:43:24 -07:00
shamoon 7649903d3c Enhancement / fix: include social accounts and api tokens in export (#8016) 2024-10-26 06:51:22 -07:00
Trenton H e6f59472e4 Chore: Drop Python 3.9 support (#7774) 2024-09-26 12:22:24 -07:00
Trenton H 91585a1fa6 Prefer the metadata JSON file over the version JSON file (#7048) 2024-06-20 12:49:54 -07:00
Trenton H d9002005b1 Feature: Allow encrypting sensitive fields in export (#6927)
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2024-06-09 14:41:18 +00:00
Trenton H 085447e7c4 Feature: Allow a data only export/import cycle (#6871) 2024-06-01 18:22:59 -07:00
Trenton H 622f624132 Chore: Change the code formatter to Ruff (#6756)
* Changing the formatting to ruff-format

* Replaces references to black to ruff or ruff format, removes black from dependencies
2024-05-18 02:26:50 +00:00
Trenton H d4d0604da2 Moves additional auditlog imports into protected blocks (#6638) 2024-05-08 09:04:32 -07:00
shamoon 0f8b2e69c9 Change: enable auditlog by default, fix import / export (#6267) 2024-04-04 18:51:15 +00:00
shamoon d4963b9cbe Fix: document_renamer fails with audit_log enabled (#6175) 2024-03-24 07:26:25 -07:00
grembo 76064178f5 Fix: inotify read timeout not in ms (#5876)
* Fix inotify read timeout

This was off by a factor of 1000, leading to a lot more invocations
of the loop body than necessary.

* Incorporate review feedback
2024-02-26 00:46:47 +00:00
Trenton H 13201dbfff Ensure all creations of directories create the parents too (#5711) 2024-02-10 11:02:40 -08:00
Trenton H 4813a7bc70 Chore: Adds additional rules for Ruff linter (#5660) 2024-02-05 21:46:59 +00:00
Trenton H 2aea220c6d Ensure the scratch directory exists before consuming this source (#5579) 2024-01-28 09:07:04 -08:00