paperless_text/parsers.py was moved to paperless/parsers/text.py as part of
the Phase 3 parser migration. Update the signal-based get_parser() factory
to import from the new location and strip the legacy logging_group /
progress_callback kwargs that the new TextDocumentParser no longer accepts.
This shim keeps document consumption functional until Phase 4 replaces the
signal path with the new ParserRegistry.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Define MetadataEntry TypedDict (namespace, prefix, key, value) in
paperless.parsers and export it from __all__
- Add extract_metadata(document_path, mime_type) -> list[MetadataEntry]
to ParserProtocol; implementations must not raise
- Implement extract_metadata on TextDocumentParser (returns [])
- Update DummyParser fixture in test_registry to include extract_metadata
and align parse/get_thumbnail signatures with the current Protocol
- Add TestTextParserMetadata tests covering empty-list return and
mime_type-agnostic behaviour
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implement ParserProtocol on the moved TextDocumentParser without inheriting
from the old DocumentParser ABC:
- Add class-level identity attributes (name, version, author, url)
- Add supported_mime_types() and score() classmethods
- Add can_produce_archive and requires_pdf_rendition properties (both False)
- Replace tempdir / read_file_handle_unicode_errors from old base class with
a self-contained __init__, __enter__, __exit__, and _read_text helper
- Drop file_name parameter from parse() and get_thumbnail(); add produce_archive kwarg
- Use Self as __enter__ return type; align __exit__ exc_tb type to TracebackType | None
- Register TextDocumentParser in ParserRegistry.register_defaults()
Tests:
- Rewrite test_text_parser.py with 20 tests covering protocol compliance,
lifecycle/cleanup, parse, thumbnail, and registry integration
- Update parsers/conftest.py with text_parser fixture and sample file fixtures
- Update top-level tests/conftest.py with shared clean_registry autouse fixture
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move TextDocumentParser and its test suite from paperless_text/ into the
new paperless/ package where parsers are being consolidated:
- paperless_text/parsers.py → paperless/parsers/text.py
- paperless_text/tests/test_parser.py → paperless/tests/parsers/test_text_parser.py
- paperless_text/tests/conftest.py → paperless/tests/parsers/conftest.py
- paperless_text/tests/samples/*.txt → paperless/tests/samples/text/
Also add paperless/tests/__init__.py, paperless/tests/parsers/__init__.py,
and a new top-level paperless/tests/conftest.py for shared fixtures.
The parser and test files are unchanged; subsequent commits will update
them to implement ParserProtocol.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove all Sphinx cross-reference markup (:meth:, :class:, :func:,
:attr:, :data:, backtick quoting) from registry.py and __init__.py
docstrings; use plain prose matching the rest of the codebase
- Remove unused file_name parameter from parse() and get_thumbnail()
in ParserProtocol — no existing parser reads it and the path already
carries the filename
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces the foundation of the entrypoint-based parser discovery
system to replace the signal-based document_consumer_declaration approach.
- Add ParserProtocol: runtime_checkable Protocol defining the full
contract for document parsers (supported_mime_types, score, parse,
context manager, result accessors)
- Add ParserRegistry: lazy singleton with entrypoint discovery via
importlib.metadata group 'paperless_ngx.parsers', uniform score-based
selection across external and built-in parsers
- Add get_parser_registry(), init_builtin_parsers(), reset_parser_registry()
module-level helpers
- Wire Celery worker_process_init to call init_builtin_parsers() eagerly
in each worker, deferring third-party discovery to first task use
- Add 28 pytest tests covering Protocol compliance, singleton lifecycle,
scoring logic, entrypoint discovery, and log output
Built-in parsers and consumer migration follow in Phases 3-6.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Perf: stream manifest parsing with ijson in document_importer
Replace bulk json.load of the full manifest (which materializes the
entire JSON array into memory) with incremental ijson streaming.
Eliminates self.manifest entirely — records are never all in memory
at once.
- Add ijson>=3.2 dependency
- New module-level iter_manifest_records() generator
- load_manifest_files() collects paths only; no parsing at load time
- check_manifest_validity() streams without accumulating records
- decrypt_secret_fields() streams each manifest to a .decrypted.json
temp file record-by-record; temp files cleaned up after file copy
- _import_files_from_manifest() collects only document records (small
fraction of manifest) for the tqdm progress bar
Measured on 200 docs + 200 CustomFieldInstances:
- Streaming validation: peak memory 3081 KiB -> 333 KiB (89% reduction)
- Stream-decrypt to file: peak memory 3081 KiB -> 549 KiB (82% reduction)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Perf: slim dict in _import_files_from_manifest, discard fields
When collecting document records for the file-copy step, extract only
the 4 keys the loop actually uses (pk + 3 exported filename keys) and
discard the full fields dict (content, checksum, tags, etc.).
Peak memory for the document-record list: 939 KiB -> 375 KiB (60% reduction).
Wall time unchanged.
* Refactor: migrate exporter/importer from tqdm to PaperlessCommand.track()
Replace direct tqdm usage in document_exporter and document_importer with
the PaperlessCommand base class and its track() method, which is backed by
Rich and handles --no-progress-bar automatically. Also removes the unused
ProgressBarMixin from mixins.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Refactor: add explicit supports_progress_bar and supports_multiprocessing to all PaperlessCommand subclasses
Each management command now explicitly declares both class attributes
rather than relying on defaults, making intent unambiguous at a glance.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Perf: streaming manifest writer for document exporter (Phase 3)
Replaces the in-memory manifest dict accumulation with a
StreamingManifestWriter that writes records to manifest.json
incrementally, keeping only one batch resident in memory at a time.
Key changes:
- Add StreamingManifestWriter: writes to .tmp atomically, BLAKE2b
compare for --compare-json, discard() on exception
- Add _encrypt_record_inline(): per-record encryption replacing the
bulk encrypt_secret_fields() call; crypto setup moved before streaming
- Add _write_split_manifest(): extracted per-document manifest writing
- Refactor dump(): non-doc records streamed during transaction, documents
accumulated then written after filenames are assigned
- Upgrade check_and_write_json() from MD5 to BLAKE2b
- Remove encrypt_secret_fields() and unused itertools.chain import
- Add profiling marker to pyproject.toml
Measured improvement (200 docs + 200 CustomFieldInstances, same
dump() code path, only writer differs):
- Peak memory: ~50% reduction
- Memory delta: ~70% reduction
- Wall time and query count: unchanged
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Refactor: O(1) lookup table for CRYPT_FIELDS in per-record encryption
Add CRYPT_FIELDS_BY_MODEL to CryptMixin, derived from CRYPT_FIELDS at
class definition time. _encrypt_record_inline() now does a single dict
lookup instead of a linear scan per record, eliminating the loop and
break pattern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 1 -- Eliminate JSON round-trip in document exporter
Replace json.loads(serializers.serialize("json", qs)) with
serializers.serialize("python", qs) to skip the intermediate
JSON string allocation and parse step. Use DjangoJSONEncoder
in check_and_write_json() to handle native Python types
(datetime, Decimal, UUID) the Python serializer returns.
Phase 2 -- Batched QuerySet serialization in document exporter
Add serialize_queryset_batched() helper that uses QuerySet.iterator()
and itertools.islice to stream records in configurable chunks, bounding
peak memory during serialization to batch_size * avg_record_size rather
than loading the entire QuerySet at once.
* Fix: improve test portability
* Make settings always consistent
* Make a few more tests deterministic wrt settings
* Dont pollute settings for this one
* Fix timezone issue with mail parser
* Update test_parser.py
* Uh, I guess OCR gives variants for this
---------
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
* Uses a custom transport to resolve the slim chance of a DNS rebinding affecting the webhook
* Fix WebhookTransport hostname resolution and validation
* Fix test failures
* Lint
* Keep all internal logic inside WebhookTransport
* Fix test failure
* Update handlers.py
* Update handlers.py
---------
Co-authored-by: Trenton H <797416+stumpylog@users.noreply.github.com>
* tests: general cleanup and fixes for runnning under docker
This now allows tests to be run under a locally built or production
docker image with something like:
`docker run --rm -v $PWD:/usr/src/paperless --entrypoint=bash paperlessngx/paperless-ngx:latest -c "uv run pytest"`
Specific fixes:
- fix unreachable code around `assertRaises` blocks
- fix `assertInt` typos
- fix `str(e)` vs `str(e.exception)` issues
- skip permission-based checks when root (in a docker container)
- catch `OSError` problems when instantiating `INotify` and
skip inotify-based tests when it's unavailable.
* Reverts most files to dev while keeping the exception assert fixes
---------
Co-authored-by: Trenton H <797416+stumpylog@users.noreply.github.com>
This alters the retry/backoff logic in the init-wait-for-db script to be more
optimistic about database availability. During regular deployment and
operations of paperless-ngx, it's common to restart the application server with
the database instance already running, so we should optimize for this case.
Instead of unconditionally delaying 5 seconds between each connection attempt,
start with a minimum delay of 1 second and increase the delay linearly with
each attempt, maxing out at 10 seconds. This makes the retry count-based
failure mode less practical, so instead we just use a timeout-based approach.*
*NOTE: the original implementation would have an effective timeout of 25s. This
alters the behavior to 60s.
Additionally, this removes an unnecessary 5s delay that was injected in the
postgres case. The script uses a more comprehensive connection check for
postgres than it does mariadb, so if anything this 5s delay after getting an
"ok" response from the DB was extra unnecessary in the postgres case.
* chore(devcontainer): drop read-only host .gitconfig bind mount
The bind mount prevented adjusting git config inside the dev container, and VS Code Dev Containers already copies the host .gitconfig automatically, making the mount unnecessary. This restores ability to manage git settings within the container.
* chore(gitignore): ignore .pnpm-store folder for pnpm package management
Add .pnpm-store/ to .gitignore to prevent local pnpm package store from being tracked by git when using the devcontainer.
* docs(development): clarify VS Code devcontainer setup steps for Windows
Add instructions, how to overcome some issues caused by using Windows as host system.
This helps prevent excessive processing times on very large documents
by limiting the text analyzed during date parsing, tag prediction,
and correspondent matching.
If the document exceeds 1.2M chars, crop to 1M char.
* Security: prevent XSS with storage path template rendering
* Security: prevent XSS svg uploads
* Security: force attachment disposition for logo
* Add suggestions from code review
* Improve SVG validation with allowlist for tags and attributes
* Explicitly set the HOME environment for the migrations to fix issue with certificates
* Defines the HOME globally when we're running as root for startup
@@ -95,11 +75,8 @@ You can start and debug backend services either as debugging sessions via `launc
#### Using Tasks
#### Using Tasks
1. Open the command palette:
1. Open the command palette and select `Tasks: Run Task`.
- **Windows/Linux**: `Ctrl+Shift+P`
2. Choose the desired task:
- **Mac**: `Cmd+Shift+P`
2. Select `Tasks: Run Task`.
3. Choose the desired task:
-`Runserver`
-`Runserver`
-`Document Consumer`
-`Document Consumer`
-`Celery`
-`Celery`
@@ -112,6 +89,18 @@ Additional tasks are available for common maintenance operations:
- **Migrate Database**: To apply database migrations.
- **Migrate Database**: To apply database migrations.
- **Create Superuser**: To create an admin user for the application.
- **Create Superuser**: To create an admin user for the application.
## Committing from the Host Machine
The DevContainer automatically installs Git pre-commit hooks during setup. However, these hooks are configured for use inside the container.
If you want to commit changes from your host machine (outside the DevContainer), you need to set up prek on your host. This installs it as a standalone tool.
```bash
uv tool install prek && prek install
```
After this, you can commit either from inside the DevContainer or from your host machine.
## Let's Get Started!
## Let's Get Started!
Follow the steps above to get your development environment up and running. Happy coding!
Follow the steps above to get your development environment up and running. Happy coding!
### ⚠️ Please remember: issues are for *bugs* only! ⚠️
That is, something you believe affects every single user of Paperless-ngx, not just you. If you're not sure, start with one of the other options below.
That is, something you believe affects every single user of Paperless-ngx (and the demo, for example), not just you. If you are not sure, start with one of the other options below.
Also, note that **Paperless-ngx does not perform OCR or archive file creation itself**, those are handled by other tools. Problems with OCR or archive versions of specific files should likely be raised 'upstream', see https://github.com/ocrmypdf/OCRmyPDF/issues or https://github.com/tesseract-ocr/tesseract/issues
Also, note that **Paperless-ngx does not perform OCR or archive file creation itself**, those are handled by other tools. Problems with OCR or archive versions of specific files should likely be raised 'upstream', see https://github.com/ocrmypdf/OCRmyPDF/issues or https://github.com/tesseract-ocr/tesseract/issues
- type:markdown
- type:markdown
@@ -59,6 +59,12 @@ body:
label:Browser logs
label:Browser logs
description:Logs from the web browser related to your issue, if needed
description:Logs from the web browser related to your issue, if needed
render:bash
render:bash
- type:textarea
id:logs_services
attributes:
label:Services logs
description:Logs from other services (or containers) related to your issue, if needed. For example, the database or redis logs.
@@ -35,8 +35,8 @@ NOTE: PRs that do not address the following will not be merged, please do not sk
- [ ] I have read & agree with the [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md).
- [ ] I have read & agree with the [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md).
- [ ] If applicable, I have included testing coverage for new code in this PR, for [backend](https://docs.paperless-ngx.com/development/#testing) and / or [front-end](https://docs.paperless-ngx.com/development/#testing-and-code-style) changes.
- [ ] If applicable, I have included testing coverage for new code in this PR, for [backend](https://docs.paperless-ngx.com/development/#testing) and / or [front-end](https://docs.paperless-ngx.com/development/#testing-and-code-style) changes.
- [ ] If applicable, I have tested my code for new features & regressions on both mobile & desktop devices, using the latest version of major browsers.
- [ ] If applicable, I have tested my code for breaking changes & regressions on both mobile & desktop devices, using the latest version of major browsers.
- [ ] If applicable, I have checked that all tests pass, see [documentation](https://docs.paperless-ngx.com/development/#back-end-development).
- [ ] If applicable, I have checked that all tests pass, see [documentation](https://docs.paperless-ngx.com/development/#back-end-development).
- [ ] I have run all `pre-commit` hooks, see [documentation](https://docs.paperless-ngx.com/development/#code-formatting-with-pre-commit-hooks).
- [ ] I have run all Git `pre-commit` hooks, see [documentation](https://docs.paperless-ngx.com/development/#code-formatting-with-pre-commit-hooks).
- [ ] I have made corresponding changes to the documentation as needed.
- [ ] I have made corresponding changes to the documentation as needed.
- [ ] I have checked my modifications for any breaking changes.
- [ ] In the description of the PR above I have disclosed the use of AI tools in the coding of this PR.
This issue has been automatically marked as stale because it has not had
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. See our [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/dev/CONTRIBUTING.md#automatic-repository-maintenance) for more details.
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions. See our [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/dev/CONTRIBUTING.md#automatic-repository-maintenance) for more details.
days-before-pr-stale:14
days-before-pr-close:7
stale-pr-message:""
stale-pr-label:stale
exempt-pr-labels:'notable'
close-pr-message:>
This pull request has been automatically closed because it has not had recent activity. Thank you for your contributions. Please open a new pull request or discussion if you would like to continue working on this change.
lock-threads:
lock-threads:
name:'Lock Old Threads'
name:'Lock Old Threads'
if:github.repository_owner == 'paperless-ngx'
if:github.repository_owner == 'paperless-ngx'
runs-on:ubuntu-24.04
runs-on:ubuntu-24.04
steps:
steps:
- uses:dessant/lock-threads@v5
- uses:dessant/lock-threads@v6.0.0
with:
with:
issue-inactive-days:'30'
issue-inactive-days:'30'
pr-inactive-days:'30'
pr-inactive-days:'30'
discussion-inactive-days:'30'
discussion-inactive-days:'30'
log-output:true
log-output:true
issue-comment:>
issue-comment:>
This issue has been automatically locked since there
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns. See our [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/dev/CONTRIBUTING.md#automatic-repository-maintenance) for more details.
has not been any recent activity after it was closed.
Please open a new discussion or issue for related concerns.
See our [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/dev/CONTRIBUTING.md#automatic-repository-maintenance) for more details.
pr-comment:>
pr-comment:>
This pull request has been automatically locked since there
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns. See our [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/dev/CONTRIBUTING.md#automatic-repository-maintenance) for more details.
has not been any recent activity after it was closed.
Please open a new discussion or issue for related concerns.
See our [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/dev/CONTRIBUTING.md#automatic-repository-maintenance) for more details.
discussion-comment:>
discussion-comment:>
This discussion has been automatically locked since there
This discussion has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion for related concerns. See our [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/dev/CONTRIBUTING.md#automatic-repository-maintenance) for more details.
has not been any recent activity after it was closed.
Please open a new discussion for related concerns.
See our [contributing guidelines](https://github.com/paperless-ngx/paperless-ngx/blob/dev/CONTRIBUTING.md#automatic-repository-maintenance) for more details.
If you feel like contributing to the project, please do! Bug fixes and improvements are always welcome.
If you feel like contributing to the project, please do! Bug fixes and improvements are always welcome.
⚠️ Please note: Pull requests that implement a new feature or enhancement _should almost always target an existing feature request_ with evidence of community interest and discussion. This is in order to balance the work of implementing and maintaining new features / enhancements. Pull requests that are opened without meeting this requirement may not be merged.
If you want to implement something big:
If you want to implement something big:
-Please start a discussion about that in the issues! Maybe something similar is already in development and we can make it happen together.
-As above, please start with a discussion! Maybe something similar is already in development and we can make it happen together.
- When making additions to the project, consider if the majority of users will benefit from your change. If not, you're probably better of forking the project.
- When making additions to the project, consider if the majority of users will benefit from your change. If not, you're probably better of forking the project.
- Also consider if your change will get in the way of other users. A good change is a change that enhances the experience of some users who want that change and does not affect users who do not care about the change.
- Also consider if your change will get in the way of other users. A good change is a change that enhances the experience of some users who want that change and does not affect users who do not care about the change.
- Please see the [paperless-ngx merge process](#merging-prs) below.
- Please see the [paperless-ngx merge process](#merging-prs) below.
## Python
## Python
Paperless supports python 3.10 - 3.12 at this time. We format Python code with [ruff](https://docs.astral.sh/ruff/formatter/).
Paperless-ngx currently supports Python 3.11, 3.12, 3.13, and 3.14. As a policy, we aim to support at least the three most recent Python versions, and drop support for versions as they reach end-of-life. Older versions may be supported if dependencies permit, but this is not guaranteed.
We format Python code with [ruff](https://docs.astral.sh/ruff/formatter/).
## Branches
## Branches
@@ -37,6 +41,8 @@ Before you can run `pytest`, ensure to [properly set up your local environment](
Once you have submitted a **P**ull **R**equest it will be reviewed, approved, and merged by one or more community members of any team. Automated code tests and formatting checks must be passed.
Once you have submitted a **P**ull **R**equest it will be reviewed, approved, and merged by one or more community members of any team. Automated code tests and formatting checks must be passed.
Important: Pull requests that implement a new feature or enhancement _should almost always target an existing feature request_ with evidence of community interest and discussion. This is in order to balance the work of implementing and maintaining new features / enhancements. Instead of opening a PR which does not meet this requirement, please open a feature request instead, to gather feedback from both users and the project maintainers.
## Non-Trivial Requests
## Non-Trivial Requests
PRs deemed `non-trivial` will go through a stricter review process before being merged into `dev`. This is to ensure code quality and complete functionality (free of side effects).
PRs deemed `non-trivial` will go through a stricter review process before being merged into `dev`. This is to ensure code quality and complete functionality (free of side effects).
@@ -81,7 +87,7 @@ Some notes about translation:
If a language has already been added, and you would like to contribute new translations or change existing translations, please read the "Translation" section in the README.md file for further details on that.
If a language has already been added, and you would like to contribute new translations or change existing translations, please read the "Translation" section in the README.md file for further details on that.
If you would like the project to be translated to another language, first head over to https://crwd.in/paperless-ngx to check if that language has already been enabled for translation.
If you would like the project to be translated to another language, first head over to https://crowdin.com/project/paperless-ngx to check if that language has already been enabled for translation.
If not, please request the language to be added by creating an issue on GitHub. The issue should contain:
If not, please request the language to be added by creating an issue on GitHub. The issue should contain:
- English name of the language (the localized name can be added on Crowdin).
- English name of the language (the localized name can be added on Crowdin).
@@ -109,28 +115,12 @@ Paperless-ngx is a community project. We do our best to delegate permission and
## Structure
## Structure
As of writing, there are 21 members in paperless-ngx. 4 of these people have complete administrative privileges to the repo:
There are currently 2 members in paperless-ngx with complete administrative privileges to the repo:
There are 5 teams collaborating on specific tasks within paperless-ngx:
There are other members who occasionally contribute but we are actively seeking more dedicated maintainers of the project. Please reach out if you are interested.
-@paperless-ngx/test (General testing for larger PRs)
## Permissions
All team members are notified when mentioned or assigned to a relevant issue or pull request. Additionally, each team has slightly different access to paperless-ngx:
- The **test** team has no special permissions.
- The **issues** team has `triage` access. This means they can organize issues and pull requests.
- The **backend**, **frontend**, and **ci-cd** teams have `write` access. This means they can approve PRs and push code, containers, releases, and more.
## Joining
## Joining
@@ -141,13 +131,13 @@ The admins occasionally invite contributors directly if we believe having them o
# Automatic Repository Maintenance
# Automatic Repository Maintenance
The Paperless-ngx team appreciates all effort and interest from the community in filing bug reports, creating feature requests, sharing ideas and helping other
The Paperless-ngx team appreciates all effort and interest from the community in filing bug reports, creating feature requests, sharing ideas and helping other
community members. That said, in an effort to keep the repository organized and managebale the project uses automatic handling of certain areas:
community members. That said, in an effort to keep the repository organized and manageable the project uses automatic handling of certain areas:
- Issues that cannot be reproduced will be marked 'stale' after 7 days of inactivity and closed after 14 further days of inactivity.
- Issues that cannot be reproduced will be marked 'stale' after 7 days of inactivity and closed after 14 further days of inactivity.
- Issues, pull requests and discussions that are closed will be locked after 30 days of inactivity.
- Issues, pull requests and discussions that are closed will be locked after 30 days of inactivity.
- Discussions with a marked answer will be automatically closed.
- Discussions with a marked answer will be automatically closed.
- Discussions in the 'General' or 'Support' categories will be closed after 180 days of inactivity.
- Discussions in the 'General' or 'Support' categories will be closed after 180 days of inactivity.
- Feature requests that do not meet the following thresholds will be closed: 180 days of inactivity, < 5 "up-votes" after 180 days, < 20 "up-votes" after 1 year or < 80 "up-votes" at 2 years.
- Feature requests that do not meet the following thresholds will be closed: 180 days of inactivity with less than 80 "up-votes", < 5 "up-votes" after 180 days, < 20 "up-votes" after 1 year or < 40 "up-votes" at 2 years.
In all cases, threads can be re-opened by project maintainers and, of course, users can always create a new discussion for related concerns.
In all cases, threads can be re-opened by project maintainers and, of course, users can always create a new discussion for related concerns.
Finally, remember that all information remains searchable and 'closed' feature requests can still serve as inspiration for new features.
Finally, remember that all information remains searchable and 'closed' feature requests can still serve as inspiration for new features.
@@ -83,7 +83,7 @@ People interested in continuing the work on paperless-ngx are encouraged to reac
## Translation
## Translation
Paperless-ngx is available in many languages that are coordinated on Crowdin. If you want to help out by translating paperless-ngx into your language, please head over to https://crwd.in/paperless-ngx, and thank you! More details can be found in [CONTRIBUTING.md](https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md#translating-paperless-ngx).
Paperless-ngx is available in many languages that are coordinated on Crowdin. If you want to help out by translating paperless-ngx into your language, please head over to https://crowdin.com/project/paperless-ngx, and thank you! More details can be found in [CONTRIBUTING.md](https://github.com/paperless-ngx/paperless-ngx/blob/main/CONTRIBUTING.md#translating-paperless-ngx).
@@ -62,6 +62,10 @@ copies you created in the steps above.
## Updating Paperless {#updating}
## Updating Paperless {#updating}
!!! warning
Please review the [migration instructions](migration-v3.md) before upgrading Paperless-ngx to v3.0, it includes some breaking changes that require manual intervention before upgrading.
### Docker Route {#docker-updating}
### Docker Route {#docker-updating}
If a new release of paperless-ngx is available, upgrading depends on how
If a new release of paperless-ngx is available, upgrading depends on how
@@ -179,10 +183,14 @@ following:
### Database Upgrades
### Database Upgrades
In general, paperless does not require a specific version of PostgreSQL or MariaDB and it is
Paperless-ngx is compatible with Django-supported versions of PostgreSQL and MariaDB and it is generally
safe to update them to newer versions. However, you should always take a backup and follow
safe to update them to newer versions. However, you should always take a backup and follow
the instructions from your database's documentation for how to upgrade between major versions.
the instructions from your database's documentation for how to upgrade between major versions.
!!! note
As of Paperless-ngx v2.18, the minimum supported version of PostgreSQL is 14.
For PostgreSQL, refer to [Upgrading a PostgreSQL Cluster](https://www.postgresql.org/docs/current/upgrading.html).
For PostgreSQL, refer to [Upgrading a PostgreSQL Cluster](https://www.postgresql.org/docs/current/upgrading.html).
For MariaDB, refer to [Upgrading MariaDB](https://mariadb.com/kb/en/upgrading/)
For MariaDB, refer to [Upgrading MariaDB](https://mariadb.com/kb/en/upgrading/)
@@ -306,7 +314,7 @@ in dedicated folders according to their nature: `archive`, `originals`,
If `-sm` or `--split-manifest` is provided, information about document
If `-sm` or `--split-manifest` is provided, information about document
will be placed in individual json files, instead of a single JSON file. The main
will be placed in individual json files, instead of a single JSON file. The main
manifest.json will still contain application wide information (e.g. tags, correspondent,
manifest.json will still contain application wide information (e.g. tags, correspondent,
documenttype, etc)
documenttype, etc)
If `-z` or `--zip` is provided, the export will be a zip file
If `-z` or `--zip` is provided, the export will be a zip file
in the target directory, named according to the current local date or the
in the target directory, named according to the current local date or the
@@ -333,7 +341,7 @@ must be provided to import. If this value is lost, the export cannot be imported
The document importer takes the export produced by the [Document
The document importer takes the export produced by the [Document
exporter](#exporter) and imports it into paperless.
exporter](#exporter) and imports it into paperless.
The importer works just like the exporter. You point it at a directory,
The importer works just like the exporter. You point it at a directory or the generated .zip file,
and the script does the rest of the work:
and the script does the rest of the work:
```shell
```shell
@@ -351,9 +359,6 @@ When you use the provided docker compose script, put the export inside
the `export` folder in your paperless source directory. Specify
the `export` folder in your paperless source directory. Specify
`../export` as the `source`.
`../export` as the `source`.
Note that .zip files (as can be generated from the exporter) are not supported. You must unzip them into
the target directory first.
!!! note
!!! note
Importing from a previous version of Paperless may work, but for best
Importing from a previous version of Paperless may work, but for best
@@ -460,6 +465,22 @@ of the index and usually makes queries faster and also ensures that the
autocompletion works properly. This command is regularly invoked by the
autocompletion works properly. This command is regularly invoked by the
task scheduler.
task scheduler.
### Clearing the database read cache
If the database read cache is enabled, **you must run this command** after making any changes to the database outside the application context.
This includes operations such as restoring a database backup or executing SQL statements like UPDATE, INSERT, DELETE, ALTER, CREATE, or DROP.
Failing to invalidate the cache after such modifications can lead to stale data being served from the cache, and **may cause data corruption** or inconsistent behavior in the application.
Use the following management command to clear the cache:
```
python3 manage.py invalidate_cachalot
```
!!! info
The database read cache is based on Django-Cachalot. You can refer to their [documentation](https://django-cachalot.readthedocs.io/en/latest/quickstart.html#manage-py-command).
### Managing filenames {#renamer}
### Managing filenames {#renamer}
If you use paperless' feature to
If you use paperless' feature to
@@ -563,39 +584,9 @@ document.
documents, such as encrypted PDF documents. The archiver will skip over
documents, such as encrypted PDF documents. The archiver will skip over
these documents each time it sees them.
these documents each time it sees them.
### Managing encryption {#encryption}
!!! warning
Encryption was removed in [paperless-ng 0.9](changelog.md#paperless-ng-090)
because it did not really provide any additional security, the passphrase
was stored in a configuration file on the same system as the documents.
Furthermore, the entire text content of the documents is stored plain in
the database, even if your documents are encrypted. Filenames are not
encrypted as well. Finally, the web server provides transparent access to
your encrypted documents.
Consider running paperless on an encrypted filesystem instead, which
will then at least provide security against physical hardware theft.
#### Enabling encryption
Enabling encryption is no longer supported.
#### Disabling encryption
Basic usage to disable encryption of your document store:
(Note: If `PAPERLESS_PASSPHRASE` isn't set already, you need to specify
it here)
```
decrypt_documents [--passphrase SECR3TP4SSPHRA$E]
```
### Detecting duplicates {#fuzzy_duplicate}
### Detecting duplicates {#fuzzy_duplicate}
Paperless already catches and prevents upload of exactly matching documents,
Paperless-ngx already catches and warns of exactly matching documents,
however a new scan of an existing document may not produce an exact bit for bit
however a new scan of an existing document may not produce an exact bit for bit
duplicate. But the content should be exact or close, allowing detection.
duplicate. But the content should be exact or close, allowing detection.
| `DOCUMENT_ID` | Database primary key of the document |
| `DOCUMENT_ID` | Database primary key of the document |
| `DOCUMENT_FILE_NAME` | Formatted filename, not including paths |
| `DOCUMENT_FILE_NAME` | Formatted filename, not including paths |
| `DOCUMENT_TYPE` | The document type (if any) |
| `DOCUMENT_CREATED` | Date & time when document created |
| `DOCUMENT_CREATED` | Date & time when document created |
| `DOCUMENT_MODIFIED` | Date & time when document was last modified |
| `DOCUMENT_MODIFIED` | Date & time when document was last modified |
| `DOCUMENT_ADDED` | Date & time when document was added |
| `DOCUMENT_ADDED` | Date & time when document was added |
@@ -261,6 +262,10 @@ your files differently, you can do that by adjusting the
or using [storage paths (see below)](#storage-paths). Paperless adds the
or using [storage paths (see below)](#storage-paths). Paperless adds the
correct file extension e.g. `.pdf`, `.jpg` automatically.
correct file extension e.g. `.pdf`, `.jpg` automatically.
When a document has file versions, each version uses the same naming rules and
storage path resolution as any other document file, with an added version suffix
such as `_v1`, `_v2`, etc.
This variable allows you to configure the filename (folders are allowed)
This variable allows you to configure the filename (folders are allowed)
using placeholders. For example, configuring this to
using placeholders. For example, configuring this to
@@ -352,6 +357,8 @@ If paperless detects that two documents share the same filename,
paperless will automatically append `_01`, `_02`, etc to the filename.
paperless will automatically append `_01`, `_02`, etc to the filename.
This happens if all the placeholders in a filename evaluate to the same
This happens if all the placeholders in a filename evaluate to the same
value.
value.
For versioned files, this counter is appended after the version suffix
(for example `statement_v2_01.pdf`).
If there are any errors in the placeholders included in `PAPERLESS_FILENAME_FORMAT`,
If there are any errors in the placeholders included in `PAPERLESS_FILENAME_FORMAT`,
paperless will fall back to using the default naming scheme instead.
paperless will fall back to using the default naming scheme instead.
@@ -430,8 +437,138 @@ This allows for complex logic to be included in the format, including [logical s
and [filters](https://jinja.palletsprojects.com/en/3.1.x/templates/#id11) to manipulate the [variables](#filename-format-variables)
and [filters](https://jinja.palletsprojects.com/en/3.1.x/templates/#id11) to manipulate the [variables](#filename-format-variables)
provided. The template is provided as a string, potentially multiline, and rendered into a single line.
provided. The template is provided as a string, potentially multiline, and rendered into a single line.
In addition, the entire Document instance is available to be utilized in a more advanced way, as well as some variables which only make sense to be accessed
In addition, a limited `document` object is available for advanced templates.
with more complex logic.
This object includes common metadata fields such as `id`, `pk`, `title`, `content`, `page_count`, `created`, `added`, `modified`, `mime_type`,
`checksum`, `archive_checksum`, `archive_serial_number`, `filename`, `archive_filename`, and `original_filename`.
Related values are available as nested objects with limited fields, for example document.correspondent.name, etc.
#### Custom Jinja2 Filters
##### Custom Field Access
The `get_cf_value` filter retrieves a value from custom field data with optional default fallback.
- Version-specific file data (file, mime type, checksums, archive info, extracted text content) belongs to the selected/latest version.
Version-aware endpoints:
- `GET /api/documents/{id}/`: returns root document data; `content` resolves to latest version content by default. Use `?version={version_id}` to resolve content for a specific version.
- `PATCH /api/documents/{id}/`: content updates target the selected version (`?version={version_id}`) or latest version by default; non-content metadata updates target the root document.
: Caches the database read query results into Redis. This can significantly improve application response times by caching database queries, at the cost of slightly increased memory usage.
Defaults to `false`.
!!! danger
**Do not modify the database outside the application while it is running.**
This includes actions such as restoring a backup, upgrading the database, or performing manual inserts. All external modifications must be done **only when the application is stopped**.
After making any such changes, you **must invalidate the DB read cache** using the `invalidate_cachalot` management command.
: Specifies how long (in seconds) read data should be cached.
Allowed values are between `1` (one second) and `31536000` (one year). Defaults to `3600` (one hour).
!!! warning
A high TTL increases memory usage over time. Memory may be used until end of TTL, even if the cache is invalidated with the `invalidate_cachalot` command.
In case of an out-of-memory (OOM) situation, Redis may stop accepting new data — including cache entries, scheduled tasks, and documents to consume.
If your system has limited RAM, consider configuring a dedicated Redis instance for the read cache, with a memory limit and the eviction policy set to `allkeys-lru`.
For more details, refer to the [Redis eviction policy documentation](https://redis.io/docs/latest/develop/reference/eviction/), and see the `PAPERLESS_READ_CACHE_REDIS_URL` setting to specify a separate Redis broker.
: The default layout to use for emails that are consumed as documents. Must be one of the integer choices below. Note that mail
: The default layout to use for emails that are consumed as documents. Must be one of the integer choices below. Note that mail
rules can specify this setting, thus this fallback is used for the default selection and for .eml files consumed by other means.
rules can specify this setting, thus this fallback is used for the default selection and for .eml files consumed by other means.
@@ -598,7 +694,7 @@ system. See the corresponding
: Sync groups from the third party authentication system (e.g. OIDC) to Paperless-ngx. When enabled, users will be added or removed from groups based on their group membership in the third party authentication system. Groups must already exist in Paperless-ngx and have the same name as in the third party authentication system. Groups are updated upon logging in via the third party authentication system, see the corresponding [django-allauth documentation](https://docs.allauth.org/en/dev/socialaccount/signals.html).
: Sync groups from the third party authentication system (e.g. OIDC) to Paperless-ngx. When enabled, users will be added or removed from groups based on their group membership in the third party authentication system. Groups must already exist in Paperless-ngx and have the same name as in the third party authentication system. Groups are updated upon logging in via the third party authentication system, see the corresponding [django-allauth documentation](https://docs.allauth.org/en/dev/socialaccount/signals.html).
: In order to pass groups from the authentication system you will need to update your [PAPERLESS_SOCIALACCOUNT_PROVIDERS](#PAPERLESS_SOCIALACCOUNT_PROVIDERS) setting by adding a top-level "SCOPES" setting which includes "groups", e.g.:
: In order to pass groups from the authentication system you will need to update your [PAPERLESS_SOCIALACCOUNT_PROVIDERS](#PAPERLESS_SOCIALACCOUNT_PROVIDERS) setting by adding a top-level "SCOPES" setting which includes "groups", or the custom groups claim configured in [`PAPERLESS_SOCIAL_ACCOUNT_SYNC_GROUPS_CLAIM`](#PAPERLESS_SOCIAL_ACCOUNT_SYNC_GROUPS_CLAIM) e.g.:
: Allows you to define a custom groups claim. See [PAPERLESS_SOCIAL_ACCOUNT_SYNC_GROUPS](#PAPERLESS_SOCIAL_ACCOUNT_SYNC_GROUPS) which is required for this setting to take effect.
Specifies which language Paperless should use when parsing dates from documents.
This should be a language code supported by the dateparser library,
for example: "en", or a combination such as "en+de".
Locales are also supported (e.g., "en-AU").
Multiple languages can be combined using "+", for example: "en+de" or "en-AU+de".
For valid values, refer to the list of supported languages and locales in the [dateparser documentation](https://dateparser.readthedocs.io/en/latest/supported_locales.html).
Set this to match the languages in which most of your documents are written.
If not set, Paperless will attempt to infer the language(s) from the OCR configuration (`PAPERLESS_OCR_LANGUAGE`).
!!! note
This format differs from the `PAPERLESS_OCR_LANGUAGE` setting, which uses ISO 639-2 codes (3 letters, e.g., "eng+deu" for Tesseract OCR).
: Configures how the consumer detects new files in the consumption directory.
: If paperless won't find documents added to your consume folder, it
When set to `0` (default), paperless uses native filesystem notifications for efficient, immediate detection of new files.
might not be able to automatically detect filesystem changes. In
that case, specify a polling interval in seconds here, which will
then cause paperless to periodically check your consumption
directory for changes. This will also disable listening for file
system changes with `inotify`.
Defaults to 0, which disables polling and uses filesystem
When set to a positive number, paperless polls the consumption directory at that interval in seconds. Use polling for network filesystems (NFS, SMB/CIFS) where native notifications may not work reliably.
: The model to use for the embedding backend for RAG. This can be set to any of the embedding models supported by the current embedding backend. If not supplied, defaults to "text-embedding-3-small" for OpenAI and "sentence-transformers/all-MiniLM-L6-v2" for Huggingface.
git ls-files -- '*.ts' | xargs uv run prek run prettier --files
```
```
Front end testing uses Jest and Playwright. Unit tests and e2e tests,
Front end testing uses Jest and Playwright. Unit tests and e2e tests,
respectively, can be run non-interactively with:
respectively, can be run non-interactively with:
```bash
```bash
$ ng test
pnpm ng test
$ npx playwright test
pnpm playwright test
```
```
Playwright also includes a UI which can be run with:
Playwright also includes a UI which can be run with:
```bash
```bash
$ npx playwright test --ui
pnpm playwright test --ui
```
```
### Building the frontend
### Building the frontend
@@ -239,7 +239,7 @@ $ npx playwright test --ui
In order to build the front end and serve it as part of Django, execute:
In order to build the front end and serve it as part of Django, execute:
```bash
```bash
$ ng build --configuration production
pnpm ng build --configuration production
```
```
This will build the front end and put it in a location from which the
This will build the front end and put it in a location from which the
@@ -312,10 +312,10 @@ end (such as error messages).
- The source language of the project is "en_US".
- The source language of the project is "en_US".
- Localization files end up in the folder `src/locale/`.
- Localization files end up in the folder `src/locale/`.
- In order to extract strings from the application, call
- In order to extract strings from the application, call
`python3 manage.py makemessages -l en_US`. This is important after
`uv run manage.py makemessages -l en_US`. This is important after
making changes to translatable strings.
making changes to translatable strings.
- The message files need to be compiled for them to show up in the
- The message files need to be compiled for them to show up in the
application. Call `python3 manage.py compilemessages` to do this.
application. Call `uv run manage.py compilemessages` to do this.
The generated files don't get committed into git, since these are
The generated files don't get committed into git, since these are
derived artifacts. The build pipeline takes care of executing this
derived artifacts. The build pipeline takes care of executing this
command.
command.
@@ -338,13 +338,13 @@ LANGUAGES = [
## Building the documentation
## Building the documentation
The documentation is built using material-mkdocs, see their [documentation](https://squidfunk.github.io/mkdocs-material/reference/).
The documentation is built using Zensical, see their [documentation](https://zensical.org/docs/).
If you want to build the documentation locally, this is how you do it:
If you want to build the documentation locally, this is how you do it:
1. Build the documentation
1. Build the documentation
```bash
```bash
$ uv run mkdocs build --config-file mkdocs.yml
$ uv run zensical build
```
```
_alternatively..._
_alternatively..._
@@ -355,10 +355,10 @@ If you want to build the documentation locally, this is how you do it:
something.
something.
```bash
```bash
$ uv run mkdocs serve
$ uv run zensical serve
```
```
## Building the Docker image
## Building the Docker image {#docker_build}
The docker image is primarily built by the GitHub actions workflow, but
The docker image is primarily built by the GitHub actions workflow, but
it can be faster when developing to build and tag an image locally.
it can be faster when developing to build and tag an image locally.
@@ -470,9 +470,158 @@ To get started:
2. VS Code will prompt you with "Reopen in container". Do so and wait for the environment to start.
2. VS Code will prompt you with "Reopen in container". Do so and wait for the environment to start.
3. Initialize the project by running the task **Project Setup: Run all Init Tasks**. This
3. In case your host operating system is Windows:
- The Source Control view in Visual Studio Code might show: "The detected Git repository is potentially unsafe as the folder is owned by someone other than the current user." Use "Manage Unsafe Repositories" to fix this.
- Git might have detecteded modifications for all files, because Windows is using CRLF line endings. Run `git checkout .` in the containers terminal to fix this issue.
4. Initialize the project by running the task **Project Setup: Run all Init Tasks**. This
will initialize the database tables and create a superuser. Then you can compile the front end
will initialize the database tables and create a superuser. Then you can compile the front end
for production or run the frontend in debug mode.
for production or run the frontend in debug mode.
4. The project is ready for debugging, start either run the fullstack debug or individual debug
5. The project is ready for debugging, start either run the fullstack debug or individual debug
processes. Yo spin up the project without debugging run the task **Project Start: Run all Services**
processes. Yo spin up the project without debugging run the task **Project Start: Run all Services**
## Developing Date Parser Plugins
Paperless-ngx uses a plugin system for date parsing, allowing you to extend or replace the default date parsing behavior. Plugins are discovered using [Python entry points](https://setuptools.pypa.io/en/latest/userguide/entry_point.html).
### Creating a Date Parser Plugin
To create a custom date parser plugin, you need to:
1. Create a class that inherits from `DateParserPluginBase`
2. Implement the required abstract method
3. Register your plugin via an entry point
#### 1. Implementing the Parser Class
Your parser must extend `documents.plugins.date_parsing.DateParserPluginBase` and implement the `parse` method:
```python
from collections.abc import Iterator
import datetime
from documents.plugins.date_parsing import DateParserPluginBase
Parse dates from the document's filename and content.
Args:
filename: The original filename of the document
content: The extracted text content of the document
Yields:
datetime.datetime: Valid datetime objects found in the document
"""
# Your parsing logic here
# Use self.config to access configuration settings
# Example: parse dates from filename first
if self.config.filename_date_order:
# Your filename parsing logic
yield some_datetime
# Then parse dates from content
# Your content parsing logic
yield another_datetime
```
#### 2. Configuration and Helper Methods
Your parser instance is initialized with a `DateParserConfig` object accessible via `self.config`. This provides:
- `languages: list[str]` - List of language codes for date parsing
- `timezone_str: str` - Timezone string for date localization
- `ignore_dates: set[datetime.date]` - Dates that should be filtered out
- `reference_time: datetime.datetime` - Current time for filtering future dates
- `filename_date_order: str | None` - Date order preference for filenames (e.g., "DMY", "MDY")
- `content_date_order: str` - Date order preference for content
The base class provides two helper methods you can use:
```python
def _parse_string(
self,
date_string: str,
date_order: str,
) -> datetime.datetime | None:
"""
Parse a single date string using dateparser with configured settings.
"""
def _filter_date(
self,
date: datetime.datetime | None,
) -> datetime.datetime | None:
"""
Validate a parsed datetime against configured rules.
Filters out dates before 1900, future dates, and ignored dates.
"""
```
#### 3. Resource Management (Optional)
If your plugin needs to acquire or release resources (database connections, API clients, etc.), override the context manager methods. Paperless-ngx will always use plugins as context managers, ensuring resources can be released even in the event of errors.
#### 4. Registering Your Plugin
Register your plugin using a setuptools entry point in your package's `pyproject.toml`:
The entry point name (e.g., `"my_parser"`) is used for sorting when multiple plugins are found. Paperless-ngx will use the first plugin alphabetically by name if multiple plugins are discovered.
### Plugin Discovery
Paperless-ngx automatically discovers and loads date parser plugins at runtime. The discovery process:
1. Queries the `paperless_ngx.date_parsers` entry point group
2. Validates that each plugin is a subclass of `DateParserPluginBase`
3. Sorts valid plugins alphabetically by entry point name
4. Uses the first valid plugin, or falls back to the default `RegexDateParserPlugin` if none are found
If multiple plugins are installed, a warning is logged indicating which plugin was selected.
### Example: Simple Date Parser
Here's a minimal example that only looks for ISO 8601 dates:
```python
import datetime
import re
from collections.abc import Iterator
from documents.plugins.date_parsing.base import DateParserPluginBase
class ISODateParserPlugin(DateParserPluginBase):
"""
Parser that only matches ISO 8601 formatted dates (YYYY-MM-DD).
@@ -25,12 +29,14 @@ physical documents into a searchable online archive so you can keep, well, _less
## Features
## Features
-**Organize and index** your scanned documents with tags, correspondents, types, and more.
-**Organize and index** your scanned documents with tags, correspondents, types, and more.
-_Your_ data is stored locally on _your_ server and is never transmitted or shared in any way.
-_Your_ data is stored locally on _your_ server and is never transmitted or shared in any way, unless you explicitly choose to do so.
- Performs **OCR** on your documents, adding searchable and selectable text, even to documents scanned with only images.
- Performs **OCR** on your documents, adding searchable and selectable text, even to documents scanned with only images.
- Utilizes the open-source Tesseract engine to recognize more than 100 languages.
- Utilizes the open-source Tesseract engine to recognize more than 100 languages.
-_New!_ Supports remote OCR with Azure AI (opt-in).
- Documents are saved as PDF/A format which is designed for long term storage, alongside the unaltered originals.
- Documents are saved as PDF/A format which is designed for long term storage, alongside the unaltered originals.
- Uses machine-learning to automatically add tags, correspondents and document types to your documents.
- Uses machine-learning to automatically add tags, correspondents and document types to your documents.
-Supports PDF documents, images, plain text files, Office documents (Word, Excel, Powerpoint, and LibreOffice equivalents)[^1] and more.
-**New**: Paperless-ngx can now leverage AI (Large Language Models or LLMs) for document suggestions. This is an optional feature that can be enabled (and is disabled by default).
- Supports PDF documents, images, plain text files, Office documents (Word, Excel, PowerPoint, and LibreOffice equivalents)[^1] and more.
- Paperless stores your documents plain on disk. Filenames and folders are managed by paperless and their format can be configured freely with different configurations assigned to different documents.
- Paperless stores your documents plain on disk. Filenames and folders are managed by paperless and their format can be configured freely with different configurations assigned to different documents.
-**Beautiful, modern web application** that features:
-**Beautiful, modern web application** that features:
- Customizable dashboard with statistics.
- Customizable dashboard with statistics.
@@ -197,7 +203,7 @@ People interested in continuing the work on paperless-ngx are encouraged to reac
### Translation
### Translation
Paperless-ngx is available in many languages that are coordinated on [Crowdin](https://crwd.in/paperless-ngx). If you want to help out by translating paperless-ngx into your language, please head over to the [Paperless-ngx project at Crowdin](https://crwd.in/paperless-ngx), and thank you!
Paperless-ngx is available in many languages that are coordinated on [Crowdin](https://crowdin.com/project/paperless-ngx). If you want to help out by translating paperless-ngx into your language, please head over to the [Paperless-ngx project at Crowdin](https://crowdin.com/project/paperless-ngx), and thank you!
| `CONSUMER_POLLING` | [`CONSUMER_POLLING_INTERVAL`](configuration.md#PAPERLESS_CONSUMER_POLLING_INTERVAL) | Renamed for clarity |
| `CONSUMER_INOTIFY_DELAY` | [`CONSUMER_STABILITY_DELAY`](configuration.md#PAPERLESS_CONSUMER_STABILITY_DELAY) | Unified for all modes |
| `CONSUMER_POLLING_DELAY` | _Removed_ | Use `CONSUMER_STABILITY_DELAY` |
| `CONSUMER_POLLING_RETRY_COUNT` | _Removed_ | Automatic with stability tracking |
| `CONSUMER_IGNORE_PATTERNS` | [`CONSUMER_IGNORE_PATTERNS`](configuration.md#PAPERLESS_CONSUMER_IGNORE_PATTERNS) | **Now regex, not fnmatch**; user patterns are added to (not replacing) default ones |
| _New_ | [`CONSUMER_IGNORE_DIRS`](configuration.md#PAPERLESS_CONSUMER_IGNORE_DIRS) | Additional directories to ignore; user entries are added to (not replacing) defaults |
## Encryption Support
Document and thumbnail encryption is no longer supported. This was previously deprecated in [paperless-ng 0.9.3](https://github.com/paperless-ngx/paperless-ngx/blob/dev/docs/changelog.md#paperless-ng-093)
Users must decrypt their document using the `decrypt_documents` command before upgrading.
## Barcode Scanner Changes
Support for [pyzbar](https://github.com/NaturalHistoryMuseum/pyzbar) has been removed. The underlying libzbar library has
seen no updates in 16 years and is largely unmaintained, and the pyzbar Python wrapper last saw a release in March 2022. In
practice, pyzbar struggled with barcode detection reliability, particularly on skewed, low-contrast, or partially
obscured barcodes. [zxing-cpp](https://github.com/zxing-cpp/zxing-cpp) is actively maintained, significantly more
reliable at finding barcodes, and now ships pre-built wheels for both x86_64 and arm64, removing the need to build the library.
The `CONSUMER_BARCODE_SCANNER` setting has been removed. zxing-cpp is now the only backend.
You can go multiple routes to setup and run Paperless:
# Installation
- [Use the script to setup a Docker install](#docker_script)
!!! tip "Quick Start"
- [Use the Docker compose templates](#docker)
- [Build the Docker image yourself](#docker_build)
- [Install Paperless-ngx directly on your system manually ("bare metal")](#bare_metal)
- A user-maintained list of commercial hosting providers can be found [in the wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Related-Projects)
The Docker routes are quick & easy. These are the recommended routes.
This configures all the stuff from the above automatically so that it
just works and uses sensible defaults for all configuration options.
Here you find a cheat-sheet for docker beginners: [CLI
_If piping into a shell directly from the internet makes you nervous, inspect [the script](https://github.com/paperless-ngx/paperless-ngx/blob/main/install-paperless-ngx.sh) first!_
!!! note
## Overview
macOS users will need to install [gnu-sed](https://formulae.brew.sh/formula/gnu-sed) with support
Choose the installation route that best fits your setup:
for running as `sed` as well as [wget](https://formulae.brew.sh/formula/wget).
| [Installation script](#docker_script) | Fastest first-time setup with guided prompts (recommended for most users) | Low |
| [Docker Compose templates](#docker) | Manual control over compose files and settings | Medium |
| [Bare metal](#bare_metal) | Advanced setups, packaging, and development-adjacent workflows | High |
| [Hosted providers (wiki)](https://github.com/paperless-ngx/paperless-ngx/wiki/Related-Projects#hosting-providers) | Managed hosting options maintained by the community — check details carefully | Varies |
1. Make sure that Docker and Docker Compose are [installed](https://docs.docker.com/engine/install/){:target="\_blank"}.
For most users, Docker is the best option. It is faster to set up,
easier to maintain, and ships with sensible defaults.
2. Go to the [/docker/compose directory on the project
The bare-metal route gives you more control, but it requires manual
installation and operation of all components. It is usually best suited
for advanced users and contributors.
!!! info
Because [superuser](usage.md#superusers) accounts have full access to all objects and documents, you may want to create a separate user account for daily use,
or "downgrade" your superuser account to a normal user account after setup.
## Installation Script {#docker_script}
Paperless-ngx provides an interactive script for Docker Compose setups.
It asks a few configuration questions, then creates the required files,
pulls the image, starts the containers, and creates your [superuser](usage.md#superusers)
account. In short, it automates the [Docker Compose setup](#docker) described below.
#### Prerequisites
- Docker and Docker Compose must be [installed](https://docs.docker.com/engine/install/){:target="\_blank"}.
- macOS users will need [GNU sed](https://formulae.brew.sh/formula/gnu-sed) with support for running as `sed` as well as [wget](https://formulae.brew.sh/formula/wget).
4. Follow the [Docker setup](#docker) above except when asked to run
**File systems without inotify support (e.g. NFS)**
`docker compose pull` to pull the image, run
```shell-session
Some file systems, such as NFS network shares, don't support file system
docker compose build
notifications with `inotify`. When the consumption directory is on such a
```
file system, Paperless-ngx will not pick up new files with the default
configuration. Use [`PAPERLESS_CONSUMER_POLLING`](configuration.md#PAPERLESS_CONSUMER_POLLING)
to enable polling and disable inotify. See [here](configuration.md#polling).
instead to build the image.
## Bare Metal Install {#bare_metal}
### Bare Metal Route {#bare_metal}
#### Prerequisites
Paperless runs on linux only. The following procedure has been tested on
- Paperless runs on Linux only, Windows is not supported.
a minimal installation of Debian/Buster, which is the current stable
- Python 3.11, 3.12, 3.13, or 3.14 is required. As a policy, Paperless-ngx aims to support at least the three most recent Python versions and drops support for versions as they reach end-of-life. Newer versions may work, but some dependencies may not be fully compatible.
release at the time of writing. Windows is not and will never be
supported.
Paperless requires Python 3. At this time, 3.10 - 3.12 are tested versions.
#### Installation
Newer versions may work, but some dependencies may not fully support newer versions.
Support for older Python versions may be dropped as they reach end of life or as newer versions
are released, dependency support is confirmed, etc.
1. Install dependencies. Paperless requires the following packages.
1. Install dependencies. Paperless requires the following packages:
- `python3`
- `python3`
- `python3-pip`
- `python3-pip`
@@ -204,13 +190,12 @@ are released, dependency support is confirmed, etc.
- `libpq-dev` for PostgreSQL
- `libpq-dev` for PostgreSQL
- `libmagic-dev` for mime type detection
- `libmagic-dev` for mime type detection
- `mariadb-client` for MariaDB compile time
- `mariadb-client` for MariaDB compile time
- `libzbar0` for barcode detection
- `poppler-utils` for barcode detection
- `poppler-utils` for barcode detection
Use this list for your preferred package management:
Use this list for your preferred package management:
1. Set the environment variable [`PAPERLESS_REDIS`](configuration.md#PAPERLESS_REDIS) so it points to
1. Set the environment variable [`PAPERLESS_REDIS`](configuration.md#PAPERLESS_REDIS) so it points to
the new Redis container
the new Redis container.
4. Update user mapping
4. Update user mapping.
1. If set, change the environment variable `PUID` to `USERMAP_UID`
1. If set, change the environment variable `PUID` to `USERMAP_UID`.
1. If set, change the environment variable `PGID` to `USERMAP_GID`
1. If set, change the environment variable `PGID` to `USERMAP_GID`.
5. Update configuration paths
5. Update configuration paths.
1. Set the environment variable [`PAPERLESS_DATA_DIR`](configuration.md#PAPERLESS_DATA_DIR) to `/config`
1. Set the environment variable [`PAPERLESS_DATA_DIR`](configuration.md#PAPERLESS_DATA_DIR) to `/config`.
6. Update media paths
6. Update media paths.
1. Set the environment variable [`PAPERLESS_MEDIA_ROOT`](configuration.md#PAPERLESS_MEDIA_ROOT) to
1. Set the environment variable [`PAPERLESS_MEDIA_ROOT`](configuration.md#PAPERLESS_MEDIA_ROOT) to
`/data/media`
`/data/media`.
7. Update timezone
7. Update timezone.
1. Set the environment variable [`PAPERLESS_TIME_ZONE`](configuration.md#PAPERLESS_TIME_ZONE) to the same
1. Set the environment variable [`PAPERLESS_TIME_ZONE`](configuration.md#PAPERLESS_TIME_ZONE) to the same
value as `TZ`
value as `TZ`.
8. Modify the `image:` to point to
8. Modify `image:` to point to
`ghcr.io/paperless-ngx/paperless-ngx:latest` or a specific version
`ghcr.io/paperless-ngx/paperless-ngx:latest` or a specific version
if preferred.
if preferred.
9. Start the containers as before, using `docker compose`.
9. Start the containers as before, using `docker compose`.
## Moving data from SQLite to PostgreSQL or MySQL/MariaDB {#sqlite_to_psql}
## Running Paperless-ngx on less powerful devices {#less-powerful-devices data-toc-label="Less Powerful Devices"}
The best way to migrate between database types is to perform an [export](administration.md#exporter) and then
Paperless runs on Raspberry Pi. Some tasks can be slow on lower-powered
[import](administration.md#importer) into a clean installation of Paperless-ngx.
hardware, but a few settings can improve performance:
## Moving back to Paperless
Lets say you migrated to Paperless-ngx and used it for a while, but
decided that you don't like it and want to move back (If you do, send
me a mail about what part you didn't like!), you can totally do that
with a few simple steps.
Paperless-ngx modified the database schema slightly, however, these
changes can be reverted while keeping your current data, so that your
current data will be compatible with original Paperless. Thumbnails
were also changed from PNG to WEBP format and will need to be
re-generated.
Execute this:
```shell-session
$ cd /path/to/paperless
$ docker compose run --rm webserver migrate documents 0023
```
Or without docker:
```shell-session
$ cd /path/to/paperless/src
$ python3 manage.py migrate documents 0023
```
After regenerating thumbnails, you'll need to clear your cookies
(Paperless-ngx comes with updated dependencies that do cookie-processing
differently) and probably your cache as well.
# Considerations for less powerful devices {#less-powerful-devices}
Paperless runs on Raspberry Pi. However, some things are rather slow on
the Pi and configuring some options in paperless can help improve
performance immensely:
- Stick with SQLite to save some resources. See [troubleshooting](troubleshooting.md#log-reports-creating-paperlesstask-failed)
- Stick with SQLite to save some resources. See [troubleshooting](troubleshooting.md#log-reports-creating-paperlesstask-failed)
if you encounter issues with SQLite locking.
if you encounter issues with SQLite locking.
- If you do not need the filesystem-based consumer, consider disabling it
- If you do not need the filesystem-based consumer, consider disabling it
entirely by setting [`PAPERLESS_CONSUMER_DISABLE`](configuration.md#PAPERLESS_CONSUMER_DISABLE) to `true`.
entirely by setting [`PAPERLESS_CONSUMER_DISABLE`](configuration.md#PAPERLESS_CONSUMER_DISABLE) to `true`.
- Consider setting [`PAPERLESS_OCR_PAGES`](configuration.md#PAPERLESS_OCR_PAGES) to 1, so that paperless will
- Consider setting [`PAPERLESS_OCR_PAGES`](configuration.md#PAPERLESS_OCR_PAGES) to 1, so that Paperless
only OCR the first page of your documents. In most cases, this page
OCRs only the first page of your documents. In most cases, this page
contains enough information to be able to find it.
contains enough information to be able to find it.
- [`PAPERLESS_TASK_WORKERS`](configuration.md#PAPERLESS_TASK_WORKERS) and [`PAPERLESS_THREADS_PER_WORKER`](configuration.md#PAPERLESS_THREADS_PER_WORKER) are
- [`PAPERLESS_TASK_WORKERS`](configuration.md#PAPERLESS_TASK_WORKERS) and [`PAPERLESS_THREADS_PER_WORKER`](configuration.md#PAPERLESS_THREADS_PER_WORKER) are
configured to use all cores. The Raspberry Pi models 3 and up have 4
configured to use all cores. The Raspberry Pi models 3 and up have 4
cores, meaning that paperless will use 2 workers and 2 threads per
cores, meaning that Paperless will use 2 workers and 2 threads per
worker. This may result in sluggish response times during
worker. This may result in sluggish response times during
consumption, so you might want to lower these settings (example: 2
consumption, so you might want to lower these settings (example: 2
workers and 1 thread to always have some computing power left for
workers and 1 thread to always have some computing power left for
other tasks).
other tasks).
- Keep [`PAPERLESS_OCR_MODE`](configuration.md#PAPERLESS_OCR_MODE) at its default value `skip` and consider
- Keep [`PAPERLESS_OCR_MODE`](configuration.md#PAPERLESS_OCR_MODE) at its default value `skip` and consider
OCR'ing your documents before feeding them into paperless. Some
OCRing your documents before feeding them into Paperless. Some
scanners are able to do this!
scanners are able to do this!
- Set [`PAPERLESS_OCR_SKIP_ARCHIVE_FILE`](configuration.md#PAPERLESS_OCR_SKIP_ARCHIVE_FILE) to `with_text` to skip archive
- Set [`PAPERLESS_OCR_SKIP_ARCHIVE_FILE`](configuration.md#PAPERLESS_OCR_SKIP_ARCHIVE_FILE) to `with_text` to skip archive
file generation for already ocr'ed documents, or `always` to skip it
file generation for already OCRed documents, or `always` to skip it
for all documents.
for all documents.
- If you want to perform OCR on the device, consider using
- If you want to perform OCR on the device, consider using
`PAPERLESS_OCR_CLEAN=none`. This will speed up OCR times and use
`PAPERLESS_OCR_CLEAN=none`. This will speed up OCR times and use
less memory at the expense of slightly worse OCR results.
less memory at the expense of slightly worse OCR results.
- If using docker, consider setting [`PAPERLESS_WEBSERVER_WORKERS`](configuration.md#PAPERLESS_WEBSERVER_WORKERS) to 1. This will save some memory.
- If using Docker, consider setting [`PAPERLESS_WEBSERVER_WORKERS`](configuration.md#PAPERLESS_WEBSERVER_WORKERS) to 1. This will save some memory.
- Consider setting [`PAPERLESS_ENABLE_NLTK`](configuration.md#PAPERLESS_ENABLE_NLTK) to false, to disable the
- Consider setting [`PAPERLESS_ENABLE_NLTK`](configuration.md#PAPERLESS_ENABLE_NLTK) to false, to disable the
more advanced language processing, which can take more memory and
more advanced language processing, which can take more memory and
processing time.
processing time.
@@ -727,17 +674,19 @@ For details, refer to [configuration](configuration.md).
Updating the
Updating the
[automatic matching algorithm](advanced_usage.md#automatic-matching) takes quite a bit of time. However, the update mechanism
[automatic matching algorithm](advanced_usage.md#automatic-matching) takes quite a bit of time. However, the update mechanism
checks if your data has changed before doing the heavy lifting. If you
checks if your data has changed before doing the heavy lifting. If you
experience the algorithm taking too much cpu time, consider changing the
experience the algorithm taking too much CPU time, consider changing the
schedule in the admin interface to daily. You can also manually invoke
schedule in the admin interface to daily. You can also manually invoke
the task by changing the date and time of the next run to today/now.
the task by changing the date and time of the next run to today/now.
The actual matching of the algorithm is fast and works on Raspberry Pi
The actual matching of the algorithm is fast and works on Raspberry Pi
as well as on any other device.
as well as on any other device.
# Using nginx as a reverse proxy {#nginx}
## Additional considerations
Please see [the wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Using-a-Reverse-Proxy-with-Paperless-ngx#nginx) for user-maintained documentation of using nginx with Paperless-ngx.
**Using a reverse proxy with Paperless-ngx**
# Enhancing security {#security}
Please see [the wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Using-a-Reverse-Proxy-with-Paperless-ngx#nginx) for user-maintained documentation on using nginx with Paperless-ngx.
Please see [the wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Using-Security-Tools-with-Paperless-ngx) for user-maintained documentation of how to configure security tools like Fail2ban with Paperless-ngx.
**Enhancing security**
Please see [the wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Using-Security-Tools-with-Paperless-ngx) for user-maintained documentation on configuring security tools like Fail2ban with Paperless-ngx.
The user will need to manually move the file out of the consume folder
and back in, for the initial failing file to be consumed.
## Log reports "Creating PaperlessTask failed".
## Log reports "Creating PaperlessTask failed".
@@ -335,7 +297,7 @@ You may see errors when deleting documents like:
Data too long for column 'transaction_id' at row 1
Data too long for column 'transaction_id' at row 1
```
```
This error can occur in installations which have upgraded from a version of Paperless-ngx that used Django 4 (Paperless-ngx versions prior to v2.13.0) with a MariaDB/MySQL database. Due to the backawards-incompatible change in Django 5, the column "documents_document.transaction_id" will need to be re-created, which can be done with a one-time run of the following management command:
This error can occur in installations which have upgraded from a version of Paperless-ngx that used Django 4 (Paperless-ngx versions prior to v2.13.0) with a MariaDB/MySQL database. Due to the backwards-incompatible change in Django 5, the column "documents_document.transaction_id" will need to be re-created, which can be done with a one-time run of the following management command:
Paperless-ngx is an application that manages your personal documents. With
Paperless-ngx is an application that manages your personal documents. With
the (optional) help of a document scanner (see [the scanners wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Scanner-&-Software-Recommendations)), Paperless-ngx transforms your unwieldy
the (optional) help of a document scanner (see [the scanners wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Scanner-&-Software-Recommendations)), Paperless-ngx transforms your unwieldy
@@ -30,6 +34,9 @@ Each document has data fields that you can assign to them:
- A _document type_ is used to demarcate the type of a document such
- A _document type_ is used to demarcate the type of a document such
as letter, bank statement, invoice, contract, etc. It is used to
as letter, bank statement, invoice, contract, etc. It is used to
identify what a document is about.
identify what a document is about.
- The document _storage path_ is the location where the document files
are stored. See [Storage Paths](advanced_usage.md#storage-paths) for
more information.
- The _date added_ of a document is the date the document was scanned
- The _date added_ of a document is the date the document was scanned
into paperless. You cannot and should not change this date.
into paperless. You cannot and should not change this date.
- The _date created_ of a document is the date the document was
- The _date created_ of a document is the date the document was
@@ -82,6 +89,17 @@ You can view the document, edit its metadata, assign tags, correspondents,
document types, and custom fields. You can also view the document history,
document types, and custom fields. You can also view the document history,
download the document or share it via a share link.
download the document or share it via a share link.
### Document File Versions
Think of versions as **file history** for a document.
- Versions track the underlying file and extracted text content (OCR/text).
- Metadata such as tags, correspondent, document type, storage path and custom fields stay on the "root" document.
- Version files follow normal filename formatting (including storage paths) and add a `_vN` suffix (for example `_v1`, `_v2`).
- By default, search and document content use the latest version.
- In document detail, selecting a version switches the preview, file metadata and content (and download etc buttons) to that version.
- Deleting a non-root version keeps metadata and falls back to the latest remaining version.
### Management Lists
### Management Lists
Paperless-ngx includes management lists for tags, correspondents, document types
Paperless-ngx includes management lists for tags, correspondents, document types
@@ -89,7 +107,17 @@ and more. These areas allow you to view, add, edit, delete and manage permission
for these objects. You can also manage saved views, mail accounts, mail rules,
for these objects. You can also manage saved views, mail accounts, mail rules,
workflows and more from the management sections.
workflows and more from the management sections.
## Adding documents to paperless
### Nested Tags
Paperless-ngx v2.19 introduces support for nested tags, allowing you to create a
hierarchy of tags, which may be useful for organizing your documents. Tags can
have a 'parent' tag, creating a tree-like structure, to a maximum depth of 5. When
a tag is added to a document, all of its parent tags are also added automatically
and similarly, when a tag is removed from a document, all of its child tags are
also removed. Additionally, assigning a parent to an existing tag will automatically
update all documents that have this tag assigned, adding the parent tag as well.
## Adding documents to Paperless-ngx
Once you've got Paperless setup, you need to start feeding documents
Once you've got Paperless setup, you need to start feeding documents
into it. When adding documents to paperless, it will perform the
into it. When adding documents to paperless, it will perform the
@@ -115,7 +143,8 @@ following operations on your documents:
No matter which options you choose, Paperless will always store the
No matter which options you choose, Paperless will always store the
original document that it found in the consumption directory or in the
original document that it found in the consumption directory or in the
mail and will never overwrite that document. Archived versions are
mail and will never overwrite that document (except when using certain
document actions, which make that clear). Archived versions are
stored alongside the original versions. Any files found in the
stored alongside the original versions. Any files found in the
consumption directory will stored inside the Paperless-ngx file
consumption directory will stored inside the Paperless-ngx file
structure and will not be retained in the consumption directory.
structure and will not be retained in the consumption directory.
@@ -159,7 +188,7 @@ process.
Please see [the wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Related-Projects) for a user-maintained list of related projects and
Please see [the wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Related-Projects) for a user-maintained list of related projects and
software (e.g. for mobile devices) that is compatible with Paperless-ngx.
software (e.g. for mobile devices) that is compatible with Paperless-ngx.
### Email {#usage-email}
### Incoming Email {#incoming-mail}
You can tell paperless-ngx to consume documents from your email
You can tell paperless-ngx to consume documents from your email
accounts. This is a very flexible and powerful feature, if you regularly
accounts. This is a very flexible and powerful feature, if you regularly
@@ -247,6 +276,10 @@ different means. These are as follows:
Paperless is set up to check your mails every 10 minutes. This can be
Paperless is set up to check your mails every 10 minutes. This can be
configured via [`PAPERLESS_EMAIL_TASK_CRON`](configuration.md#PAPERLESS_EMAIL_TASK_CRON)
configured via [`PAPERLESS_EMAIL_TASK_CRON`](configuration.md#PAPERLESS_EMAIL_TASK_CRON)
#### Processed Mail
Paperless keeps track of emails it has processed in order to avoid processing the same mail multiple times. This uses the message `UID` provided by the mail server, which should be unique for each message. You can view and manage processed mails from the web UI under Mail > Processed Mails. If you need to re-process a message, you can delete the corresponding processed mail entry, which will allow Paperless-ngx to process the email again the next time the mail fetch task runs.
#### OAuth Email Setup
#### OAuth Email Setup
Paperless-ngx supports OAuth2 authentication for Gmail and Outlook email accounts. To set up an email account with OAuth2, you will need to create a 'developer' app with the respective provider and obtain the client ID and client secret and set the appropriate [configuration variables](configuration.md#email_oauth). You will also need to set either [`PAPERLESS_OAUTH_CALLBACK_BASE_URL`](configuration.md#PAPERLESS_OAUTH_CALLBACK_BASE_URL) or [`PAPERLESS_URL`](configuration.md#PAPERLESS_URL) to the correct value for the OAuth2 flow to work correctly.
Paperless-ngx supports OAuth2 authentication for Gmail and Outlook email accounts. To set up an email account with OAuth2, you will need to create a 'developer' app with the respective provider and obtain the client ID and client secret and set the appropriate [configuration variables](configuration.md#email_oauth). You will also need to set either [`PAPERLESS_OAUTH_CALLBACK_BASE_URL`](configuration.md#PAPERLESS_OAUTH_CALLBACK_BASE_URL) or [`PAPERLESS_URL`](configuration.md#PAPERLESS_URL) to the correct value for the OAuth2 flow to work correctly.
@@ -260,6 +293,55 @@ Once setup, navigating to the email settings page in Paperless-ngx will allow yo
You can also submit a document using the REST API, see [POSTing documents](api.md#file-uploads)
You can also submit a document using the REST API, see [POSTing documents](api.md#file-uploads)
for details.
for details.
## Document Suggestions
Paperless-ngx can suggest tags, correspondents, document types and storage paths for documents based on the content of the document. This is done using a (non-LLM) machine learning model that is trained on the documents in your database. The suggestions are shown in the document detail page and can be accepted or rejected by the user.
## AI Features
Paperless-ngx includes several features that use AI to enhance the document management experience. These features are optional and can be enabled or disabled in the settings. If you are using the AI features, you may want to also enable the "LLM index" feature, which supports Retrieval-Augmented Generation (RAG) designed to improve the quality of AI responses. The LLM index feature is not enabled by default and requires additional configuration.
!!! warning
Remember that Paperless-ngx will send document content to the AI provider you have configured, so consider the privacy implications of using these features, especially if using a remote model (e.g. OpenAI), instead of the default local model.
The AI features work by creating an embedding of the text content and metadata of documents, which is then used for various tasks such as similarity search and question answering. This uses the FAISS vector store.
### AI-Enhanced Suggestions
If enabled, Paperless-ngx can use an AI LLM model to suggest document titles, dates, tags, correspondents and document types for documents. This feature will always be "opt-in" and does not disable the existing classifier-based suggestion system. Currently, both remote (via the OpenAI API) and local (via Ollama) models are supported, see [configuration](configuration.md#ai) for details.
### Document Chat
Paperless-ngx can use an AI LLM model to answer questions about a document or across multiple documents. Again, this feature works best when RAG is enabled. The chat feature is available in the upper app toolbar and will switch between chatting across multiple documents or a single document based on the current view.
## Sharing documents from Paperless-ngx
Paperless-ngx supports sharing documents with other users by assigning them [permissions](#object-permissions)
to the document. Document files can also be shared externally via [share links](#share-links), [email](#email-sharing)
or using [email](#workflow-action-email) or [webhook](#workflow-action-webhook) actions in workflows.
### Share Links
"Share links" are public links to files (or an archive of files) and can be created and managed under the 'Send' button on the document detail screen or from the bulk editor.
- Share links do not require a user to login and thus link directly to a file or bundled download.
- Links are unique and are of the form `{paperless-url}/share/{randomly-generated-slug}`.
- Links can optionally have an expiration time set.
- After a link expires or is deleted users will be redirected to the regular paperless-ngx login.
- From the document detail screen you can create a share link for that single document.
- From the bulk editor you can create a **share link bundle** for any selection. Paperless-ngx prepares a ZIP archive in the background and exposes a single share link. You can revisit the "Manage share link bundles" dialog to monitor progress, retry failed bundles, or delete links.
!!! tip
If your paperless-ngx instance is behind a reverse-proxy you may want to create an exception to bypass any authentication layers that are part of your setup in order to make links truly publicly-accessible. Of course, do so with caution.
### Email Sharing {#email-sharing}
Paperless-ngx supports directly sending documents via email. If an email server has been [configured](configuration.md#email-sending)
the "Send" button on the document detail page will include an "Email" option. You can also share files via email automatically by using
a [workflow action](#workflow-action-email).
## Permissions
## Permissions
Permissions in Paperless-ngx are based around ['global' permissions](#global-permissions) as well as
Permissions in Paperless-ngx are based around ['global' permissions](#global-permissions) as well as
@@ -301,6 +383,11 @@ permissions can be granted to limit access to certain parts of the UI (and corre
Superusers can access all parts of the front and backend application as well as any and all objects. Superuser status can only be granted by another superuser.
Superusers can access all parts of the front and backend application as well as any and all objects. Superuser status can only be granted by another superuser.
!!! tip
Because superuser accounts can see all objects and documents, you may want to use a regular account for day-to-day use. Additional superuser accounts can
be created via [cli](administration.md#create-superuser) or granted superuser status from an existing superuser account.
#### Admin Status
#### Admin Status
Admin status (Django 'staff status') grants access to viewing the paperless logs and the system status dialog
Admin status (Django 'staff status') grants access to viewing the paperless logs and the system status dialog
@@ -312,25 +399,25 @@ Global permissions define what areas of the app and API endpoints users can acce
determine if a user can create, edit, delete or view _any_ documents, but individual documents themselves
determine if a user can create, edit, delete or view _any_ documents, but individual documents themselves
| UISettings | Add, edit, delete or view the UI settings that are used by the web app.<br/>:warning: **Users that will access the web UI must be granted at least _View_ permissions.** |
| UISettings | Add, edit, delete or view the UI settings that are used by the web app.<br/>:warning: **Users that will access the web UI must be granted at least _View_ permissions.** |
| User | Add, edit, delete or view Users. |
| User | Add, edit, delete or view Users. |
| Workflow | Add, edit, delete or view Workflows.<br/>Note that Workflows are global, in other words all users who can access workflows have access to the same set of them. |
| Workflow | Add, edit, delete or view Workflows.<br/>Note that Workflows are global; all users who can access workflows see the same set. Workflows have other permission implications — see [Workflow permissions](#workflow-permissions). |
#### Detailed Explanation of Object Permissions {#object-permissions}
#### Detailed Explanation of Object Permissions {#object-permissions}
@@ -369,9 +456,9 @@ fields and permissions, which will be merged.
### Workflow Triggers
### Workflow Triggers
#### Types
#### Types {#workflow-trigger-types}
Currently, there are three events that correspond to workflow trigger 'types':
Currently, there are four events that correspond to workflow trigger 'types':
1.**Consumption Started**: _before_ a document is consumed, so events can include filters by source (mail, consumption
1.**Consumption Started**: _before_ a document is consumed, so events can include filters by source (mail, consumption
folder or API), file path, file name, mail rule
folder or API), file path, file name, mail rule
@@ -379,11 +466,12 @@ Currently, there are three events that correspond to workflow trigger 'types':
but the document content has been extracted and metadata such as document type, tags, etc. have been set, so these can now
but the document content has been extracted and metadata such as document type, tags, etc. have been set, so these can now
be used for filtering.
be used for filtering.
3.**Document Updated**: when a document is updated. Similar to 'added' events, triggers can include filtering by content matching,
3.**Document Updated**: when a document is updated. Similar to 'added' events, triggers can include filtering by content matching,
tags, doc type, or correspondent.
tags, doc type, correspondent or storage path.
4.**Scheduled**: a scheduled trigger that can be used to run workflows at a specific time. The date used can be either the document
4.**Scheduled**: a scheduled trigger that can be used to run workflows at a specific time. The date used can be either the document
added, created, updated date or you can specify a (date) custom field. You can also specify a day offset from the date.
added, created, updated date or you can specify a (date) custom field. You can also specify a day offset from the date (positive
offsets will trigger after the date, negative offsets will trigger before).
The following flow diagram illustrates the three document trigger types:
The following flow diagram illustrates the four document trigger types:
```mermaid
```mermaid
flowchart TD
flowchart TD
@@ -399,6 +487,10 @@ flowchart TD
'Updated'
'Updated'
trigger(s)"}
trigger(s)"}
scheduled{"Documents
matching
trigger(s)"}
A[New Document] --> consumption
A[New Document] --> consumption
consumption --> |Yes| C[Workflow Actions Run]
consumption --> |Yes| C[Workflow Actions Run]
consumption --> |No| D
consumption --> |No| D
@@ -411,6 +503,11 @@ flowchart TD
updated --> |Yes| J[Workflow Actions Run]
updated --> |Yes| J[Workflow Actions Run]
updated --> |No| K
updated --> |No| K
J --> K[Document Saved]
J --> K[Document Saved]
L[Scheduled Task Check<br/>hourly at :05] --> M[Get All Scheduled Triggers]
M --> scheduled
scheduled --> |Yes| N[Workflow Actions Run]
scheduled --> |No| O[Document Saved]
N --> O
```
```
#### Filters {#workflow-trigger-filters}
#### Filters {#workflow-trigger-filters}
@@ -418,22 +515,32 @@ flowchart TD
Workflows allow you to filter by:
Workflows allow you to filter by:
- Source, e.g. documents uploaded via consume folder, API (& the web UI) and mail fetch
- Source, e.g. documents uploaded via consume folder, API (& the web UI) and mail fetch
- File name, including wildcards e.g. \*.pdf will apply to all pdfs
- File name, including wildcards e.g. \*.pdf will apply to all pdfs.
- File path, including wildcards. Note that enabling `PAPERLESS_CONSUMER_RECURSIVE` would allow, for
- File path, including wildcards. Note that enabling `PAPERLESS_CONSUMER_RECURSIVE` would allow, for
example, automatically assigning documents to different owners based on the upload directory.
example, automatically assigning documents to different owners based on the upload directory.
- Mail rule. Choosing this option will force 'mail fetch' to be the workflow source.
- Mail rule. Choosing this option will force 'mail fetch' to be the workflow source.
- Content matching (`Added` and `Updated` triggers only). Filter document content using the matching settings.
- Content matching (`Added`, `Updated` and `Scheduled` triggers only). Filter document content using the matching settings.
- Tags (`Added` and `Updated` triggers only). Filter for documents with any of the specified tags
- Document type (`Added` and`Updated`triggers only). Filter documents with this doc type
There are also 'advanced' filters available for `Added`,`Updated`and `Scheduled` triggers:
- Correspondent (`Added` and `Updated` triggers only). Filter documents with this correspondent
- Any Tags: Filter for documents with any of the specified tags.
- All Tags: Filter for documents with all of the specified tags.
- No Tags: Filter for documents with none of the specified tags.
- Document type: Filter documents with this document type.
- Not Document types: Filter documents without any of these document types.
- Correspondent: Filter documents with this correspondent.
- Not Correspondents: Filter documents without any of these correspondents.
- Storage path: Filter documents with this storage path.
- Not Storage paths: Filter documents without any of these storage paths.
- Custom field query: Filter documents with a custom field query (the same as used for the document list filters).
### Workflow Actions
### Workflow Actions
#### Types
#### Types {#workflow-action-types}
The following workflow action types are available:
The following workflow action types are available:
##### Assignment
##### Assignment {#workflow-action-assignment}
"Assignment" actions can assign:
"Assignment" actions can assign:
@@ -443,7 +550,7 @@ The following workflow action types are available:
- View and / or edit permissions to users or groups
- View and / or edit permissions to users or groups
- Custom fields. Note that no value for the field will be set
- Custom fields. Note that no value for the field will be set
##### Removal
##### Removal {#workflow-action-removal}
"Removal" actions can remove either all of or specific sets of the following:
"Removal" actions can remove either all of or specific sets of the following:
@@ -452,7 +559,7 @@ The following workflow action types are available:
- View and / or edit permissions
- View and / or edit permissions
- Custom fields
- Custom fields
##### Email
##### Email {#workflow-action-email}
"Email" actions can send documents via email. This action requires a mail server to be [configured](configuration.md#email-sending). You can specify:
"Email" actions can send documents via email. This action requires a mail server to be [configured](configuration.md#email-sending). You can specify:
@@ -460,7 +567,7 @@ The following workflow action types are available:
- The subject and body of the email, which can include placeholders, see [placeholders](usage.md#workflow-placeholders) below
- The subject and body of the email, which can include placeholders, see [placeholders](usage.md#workflow-placeholders) below
- Whether to include the document as an attachment
- Whether to include the document as an attachment
##### Webhook
##### Webhook {#workflow-action-webhook}
"Webhook" actions send a POST request to a specified URL. You can specify:
"Webhook" actions send a POST request to a specified URL. You can specify:
@@ -469,44 +576,79 @@ The following workflow action types are available:
- Encoding for the request body, either JSON or form data
- Encoding for the request body, either JSON or form data
- The request headers as key-value pairs
- The request headers as key-value pairs
For security reasons, webhooks can be limited to specific ports and disallowed from connecting to local URLs. See the relevant
[configuration settings](configuration.md#workflow-webhooks) to change this behavior. If you are allowing non-admins to create workflows,
you may want to adjust these settings to prevent abuse.
##### Move to Trash {#workflow-action-move-to-trash}
"Move to Trash" actions move the document to the trash. The document can be restored
from the trash until the trash is emptied (after the configured delay or manually).
The "Move to Trash" action will always be executed at the end of the workflow run,
regardless of its position in the action list. After a "Move to Trash" action is executed
no other workflow will be executed on the document.
If a "Move to Trash" action is executed in a consume pipeline, the consumption
will be aborted and the file will be deleted.
#### Workflow placeholders
#### Workflow placeholders
Some workflow text can include placeholders but the available options differ depending on the type of
Titles and webhook payloads can be generated by workflows using [Jinja templates](https://jinja.palletsprojects.com/en/3.1.x/templates/).
workflow trigger. This is because at the time of consumption (when the text is to be set), no automatic tags etc. have been
This allows for complex logic to beused, including [logical structures](https://jinja.palletsprojects.com/en/3.1.x/templates/#list-of-control-structures)
applied. You can use the following placeholders with any trigger type:
and [filters](https://jinja.palletsprojects.com/en/3.1.x/templates/#id11).
The template is provided as a string.
-`{correspondent}`: assigned correspondent name
Using Jinja2 Templates is also useful for [Date localization](advanced_usage.md#date-localization) in the title.
-`{document_type}`: assigned document type name
-`{owner_username}`: assigned owner username
The available inputs differ depending on the type of workflow trigger.
-`{added}`: added datetime
This is because at the time of consumption (when the text is to be set), no automatic tags etc. have been
-`{added_year}`: added year
applied. You can use the following placeholders in the template with any trigger type:
-`{added_year_short}`: added year
-`{added_month}`: added month
-`{{correspondent}}`: assigned correspondent name
-`{added_month_name}`: added month name
-`{{document_type}}`: assigned document type name
-`{added_month_name_short}`: added month short name
-`{{owner_username}}`: assigned owner username
-`{added_day}`: added day
-`{{added}}`: added datetime
-`{added_time}`: added time in HH:MM format
-`{{added_year}}`: added year
-`{original_filename}`: original file name without extension
-`{{added_year_short}}`: added year
-`{filename}`: current file name without extension
-`{{added_month}}`: added month
-`{{added_month_name}}`: added month name
-`{{added_month_name_short}}`: added month short name
-`{{added_day}}`: added day
-`{{added_time}}`: added time in HH:MM format
-`{{original_filename}}`: original file name without extension
-`{{filename}}`: current file name without extension (for "added" workflows this may not be final yet, you can use `{{original_filename}}`)
-`{{doc_title}}`: current document title (cannot be used in title assignment)
The following placeholders are only available for "added" or "updated" triggers
The following placeholders are only available for "added" or "updated" triggers
-`{created}`: created datetime
-`{{created}}`: created datetime
-`{created_year}`: created year
-`{{created_year}}`: created year
-`{created_year_short}`: created year
-`{{created_year_short}}`: created year
-`{created_month}`: created month
-`{{created_month}}`: created month
-`{created_month_name}`: created month name
-`{{created_month_name}}`: created month name
-`{created_month_name_short}`: created month short name
-`{{created_month_name_short}}`: created month short name
-`{created_day}`: created day
-`{{created_day}}`: created day
-`{created_time}`: created time in HH:MM format
-`{{created_time}}`: created time in HH:MM format
-`{doc_url}`: URL to the document in the web UI. Requires the `PAPERLESS_URL` setting to be set.
-`{{doc_url}}`: URL to the document in the web UI. Requires the `PAPERLESS_URL` setting to be set.
-`{{doc_id}}`: Document ID
##### Examples
```jinja2
{{ created | localize_date('MMMM', 'en_US') }}
<!-- Output: "January" -->
{{ added | localize_date('MMMM', 'de_DE') }}
<!-- Output: "Juni" --> # codespell:ignore
```
### Workflow permissions
### Workflow permissions
All users who have application permissions for editing workflows can see the same set
All users who have application permissions for editing workflows can see the same set
of workflows. In other words, workflows themselves intentionally do not have an owner or permissions.
of workflows. In other words, workflows themselves intentionally do not have an owner or permissions.
Given their potentially far-reaching capabilities, you may want to restrict access to workflows.
Given their potentially far-reaching capabilities, including changing the permissions of existing documents, you may want to restrict access to workflows.
Upon migration, existing installs will grant access to workflows to users who can add
Upon migration, existing installs will grant access to workflows to users who can add
documents (and superusers who can always access all parts of the app).
documents (and superusers who can always access all parts of the app).
@@ -544,27 +686,16 @@ The following custom field types are supported:
-`Document Link`: reference(s) to other document(s) displayed as links, automatically creates a symmetrical link in reverse
-`Document Link`: reference(s) to other document(s) displayed as links, automatically creates a symmetrical link in reverse
-`Select`: a pre-defined list of strings from which the user can choose
-`Select`: a pre-defined list of strings from which the user can choose
## Share Links
Paperless-ngx added the ability to create shareable links to files in version 2.0. You can find the button for this on the document detail screen.
- Share links do not require a user to login and thus link directly to a file.
- Links are unique and are of the form `{paperless-url}/share/{randomly-generated-slug}`.
- Links can optionally have an expiration time set.
- After a link expires or is deleted users will be redirected to the regular paperless-ngx login.
!!! tip
If your paperless-ngx instance is behind a reverse-proxy you may want to create an exception to bypass any authentication layers that are part of your setup in order to make links truly publicly-accessible. Of course, do so with caution.
## PDF Actions
## PDF Actions
Paperless-ngx supports four basic editing operations for PDFs (these operations currently cannot be performed on non-PDF files):
Paperless-ngx supports basic editing operations for PDFs (these operations currently cannot be performed on non-PDF files). When viewing an individual document you can
open the 'PDF Editor' to use a simple UI for re-arranging, rotating, deleting pages and splitting documents.
- Merging documents: available when selecting multiple documents for 'bulk editing'.
- Merging documents: available when selecting multiple documents for 'bulk editing'.
- Rotating documents: available when selecting multiple documents for 'bulk editing' and from an individual document's details page.
- Rotating documents: available when selecting multiple documents for 'bulk editing' and via the pdf editor on an individual document's details page.
- Splitting documents: available from an individual document's details page.
- Splitting documents: via the pdf editor on an individual document's details page.
- Deleting pages: available from an individual document's details page.
- Deleting pages: via the pdf editor on an individual document's details page.
- Re-arranging pages: via the pdf editor on an individual document's details page.
!!! important
!!! important
@@ -582,7 +713,7 @@ When you first delete a document it is moved to the 'trash' until either it is e
You can set how long documents remain in the trash before being automatically deleted with [`PAPERLESS_EMPTY_TRASH_DELAY`](configuration.md#PAPERLESS_EMPTY_TRASH_DELAY), which defaults
You can set how long documents remain in the trash before being automatically deleted with [`PAPERLESS_EMPTY_TRASH_DELAY`](configuration.md#PAPERLESS_EMPTY_TRASH_DELAY), which defaults
to 30 days. Until the file is actually deleted (e.g. the trash is emptied), all files and database content remains intact and can be restored at any point up until that time.
to 30 days. Until the file is actually deleted (e.g. the trash is emptied), all files and database content remains intact and can be restored at any point up until that time.
Additionally you may configure a directory where deleted files are moved to when they the trash is emptied with [`PAPERLESS_EMPTY_TRASH_DIR`](configuration.md#PAPERLESS_EMPTY_TRASH_DIR).
Additionally you may configure a directory where deleted files are moved to when the trash is emptied with [`PAPERLESS_EMPTY_TRASH_DIR`](configuration.md#PAPERLESS_EMPTY_TRASH_DIR).
Note that the empty trash directory only stores the original file, the archive file and all database information is permanently removed once a document is fully deleted.
Note that the empty trash directory only stores the original file, the archive file and all database information is permanently removed once a document is fully deleted.
## Best practices {#basic-searching}
## Best practices {#basic-searching}
@@ -827,6 +958,21 @@ how regularly you intend to scan documents and use paperless.
performed the task associated with the document, move it to the
performed the task associated with the document, move it to the
inbox.
inbox.
## Remote OCR
!!! important
This feature is disabled by default and will always remain strictly "opt-in".
Paperless-ngx supports performing OCR on documents using remote services. At the moment, this is limited to
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.