Documentation (beta): Updates documentation for new v3 features (#13033)

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
Trenton H
2026-06-18 16:20:31 -07:00
committed by GitHub
parent a009ea1f04
commit bb5d7438b1
10 changed files with 187 additions and 52 deletions
+32
@@ -65,6 +65,11 @@ copies you created in the steps above.
Please review the [migration instructions](migration-v3.md) before upgrading Paperless-ngx to v3.0; they describe breaking changes that require manual intervention before upgrading.
!!! note
Upgrading to v3 clears the existing task history; previously completed, failed, or
acknowledged tasks will no longer appear in the task list afterward. No action is required.
### Docker Route {#docker-updating}
If a new release of paperless-ngx is available, upgrading depends on how
@@ -500,6 +505,33 @@ task scheduler.
python3 manage.py document_index reindex --if-needed
```
### Managing the LLM (AI) index {#llm-index}
When the [AI features](advanced_usage.md#ai-features) are enabled with an embedding
backend, Paperless-ngx maintains a vector index of your documents used for
Retrieval-Augmented Generation (RAG), similar-document retrieval, and document chat. The
index is updated automatically on the schedule set by
[`PAPERLESS_LLM_INDEX_TASK_CRON`](configuration.md#PAPERLESS_LLM_INDEX_TASK_CRON), but you
can manage it manually:
```
document_llmindex {rebuild,update,compact}
```
Specify `rebuild` to build the index from scratch from all documents in the database. Use
this the first time you enable the feature, or after changing the embedding backend or
model.
Specify `update` to incrementally index new and changed documents. This is what the
scheduled task runs.
Specify `compact` to reclaim space and optimize the on-disk vector store.
!!! note
These commands have no effect unless AI is enabled and an embedding backend is
configured.
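If you drive these commands from a script, a small wrapper can guard against typos in the action name. The following is a minimal sketch, not part of Paperless-ngx: the helper name `llmindex_command` and the default `manage.py` path are assumptions, and only the three actions listed above are accepted.

```python
import sys

# The three actions documented for document_llmindex.
ACTIONS = ("rebuild", "update", "compact")


def llmindex_command(action: str, manage_py: str = "manage.py") -> list[str]:
    """Build the argument list for a document_llmindex call.

    Hypothetical helper for illustration; adjust manage_py to your install.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}, expected one of {ACTIONS}")
    # Use the interpreter running this script so the same environment is used.
    return [sys.executable, manage_py, "document_llmindex", action]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the directory that contains `manage.py`.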
### Clearing the database read cache
If the database read cache is enabled, **you must run this command** after making any changes to the database outside the application context.
+83 -2
@@ -97,6 +97,85 @@ when using this feature:
of these correspondents to ANY new document, if both are set to
automatic matching.
## AI features {#ai-features}
Paperless-ngx includes a set of optional features backed by a large language model
(LLM): AI-assisted suggestions, similar-document retrieval, and a document chat. They
are **off by default** and never replace the built-in, non-LLM
[matching and suggestions](#matching).
!!! warning
Enabling these features sends document content (and metadata) to the LLM backend you
configure. If that backend is a remote/hosted provider, your documents leave your
server and may incur usage charges. Consider the privacy implications before enabling,
and prefer a local backend (Ollama, or a self-hosted OpenAI-compatible gateway) if that
matters to you.
All AI settings can be supplied as `PAPERLESS_AI_*` environment variables (see
[configuration](configuration.md#ai)) or set in the admin under
**Settings → Application Configuration**; the database value takes precedence over the
environment.
### Enabling the AI features
At a minimum you need to enable AI and choose an LLM backend:
- [`PAPERLESS_AI_ENABLED`](configuration.md#PAPERLESS_AI_ENABLED) — master switch.
- [`PAPERLESS_AI_LLM_BACKEND`](configuration.md#PAPERLESS_AI_LLM_BACKEND) — `ollama`
(runs locally) or `openai-like` (OpenAI itself or any OpenAI-compatible API).
- [`PAPERLESS_AI_LLM_MODEL`](configuration.md#PAPERLESS_AI_LLM_MODEL), and for
`openai-like` usually [`PAPERLESS_AI_LLM_API_KEY`](configuration.md#PAPERLESS_AI_LLM_API_KEY)
and/or [`PAPERLESS_AI_LLM_ENDPOINT`](configuration.md#PAPERLESS_AI_LLM_ENDPOINT). Ollama
requires `PAPERLESS_AI_LLM_ENDPOINT` pointing at your Ollama server.
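The minimum settings above can be checked before starting the application. This is a rough sketch under stated assumptions: the helper `missing_ai_settings` is hypothetical, the set of accepted "true" spellings is a guess, and it treats the model as required, as the list above suggests. It does not reflect how Paperless-ngx itself validates its configuration.

```python
from collections.abc import Mapping


def missing_ai_settings(env: Mapping[str, str]) -> list[str]:
    """Return the names of AI settings that still need a value.

    Illustrative helper only; it is not part of Paperless-ngx.
    """
    # Assumption: the master switch accepts these "true"-like strings.
    enabled = env.get("PAPERLESS_AI_ENABLED", "").strip().lower()
    if enabled not in {"1", "true", "yes"}:
        return ["PAPERLESS_AI_ENABLED"]

    missing = []
    backend = env.get("PAPERLESS_AI_LLM_BACKEND", "").strip()
    if backend not in {"ollama", "openai-like"}:
        missing.append("PAPERLESS_AI_LLM_BACKEND")
    if not env.get("PAPERLESS_AI_LLM_MODEL"):
        missing.append("PAPERLESS_AI_LLM_MODEL")
    # Ollama needs an endpoint pointing at the Ollama server.
    if backend == "ollama" and not env.get("PAPERLESS_AI_LLM_ENDPOINT"):
        missing.append("PAPERLESS_AI_LLM_ENDPOINT")
    return missing
```

Calling `missing_ai_settings(os.environ)` returns an empty list when the minimum is in place. Values set in the admin take precedence over the environment, so an environment-only check like this one can disagree with the effective configuration.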
### AI-assisted suggestions
With AI enabled, Paperless-ngx can suggest a title, tags, correspondent, document type,
storage path and dates by sending the document to the LLM. This is **opt-in per request**
and surfaces through the "Suggest" control on the document detail page, alongside the
classic classifier-based suggestions — it does not disable them. Suggestion output
language can be steered with
[`PAPERLESS_AI_LLM_OUTPUT_LANGUAGE`](configuration.md#PAPERLESS_AI_LLM_OUTPUT_LANGUAGE)
(otherwise it follows the user's UI language).
### The LLM index (RAG) and similar documents
Setting an embedding backend turns on the **LLM index**, a vector index of your documents
that enables Retrieval-Augmented Generation (RAG). When enabled, suggestions are grounded
in similar existing documents, and the document chat can retrieve relevant context.
Enable it by setting
[`PAPERLESS_AI_LLM_EMBEDDING_BACKEND`](configuration.md#PAPERLESS_AI_LLM_EMBEDDING_BACKEND)
(`huggingface` for fully-local embeddings, or `ollama` / `openai-like`). The index is only
built when AI is enabled **and** an embedding backend is set.
The index is updated automatically on a schedule controlled by
[`PAPERLESS_LLM_INDEX_TASK_CRON`](configuration.md#PAPERLESS_LLM_INDEX_TASK_CRON) (daily by
default), and can be rebuilt or compacted manually — see
[Managing the LLM index](administration.md#llm-index).
!!! note
Local embeddings via `huggingface` download the embedding model on first use into the
Paperless data directory. The first run therefore needs network access and some disk
space.
### Document chat
When the LLM index is enabled, the chat control in the top app toolbar answers questions
about your documents. It operates over a single document or across multiple documents
depending on the current view, and its answers include links to the source documents it
drew from.
### AI Security notes
- Document content is passed to the LLM as **untrusted data**.
- By default Paperless-ngx allows AI endpoints that resolve to private/loopback addresses
(for local backends). Set
[`PAPERLESS_AI_LLM_ALLOW_INTERNAL_ENDPOINTS`](configuration.md#PAPERLESS_AI_LLM_ALLOW_INTERNAL_ENDPOINTS)
to `false` to block them.
## Hooking into the consumption process {#consume-hooks}
Sometimes you may want to do something arbitrary whenever a document is
@@ -846,7 +925,7 @@ Paperless is able to utilize barcodes for automatically performing some tasks. B
At this time, the library utilized for detection of barcodes supports the following types:
- AN-13/UPC-A
- EAN-13/UPC-A
- UPC-E
- EAN-8
- Code 128
@@ -855,7 +934,9 @@ At this time, the library utilized for detection of barcodes supports the follow
- Codabar
- Interleaved 2 of 5
- QR Code
- SQ Code
- Data Matrix
- Aztec
- PDF417
For usage in Paperless, the type of barcode does not matter, only the contents of it.
+7
@@ -227,6 +227,7 @@ Version-aware endpoints:
- `PATCH /api/documents/{id}/`: content updates target the selected version (`?version={version_id}`) or latest version by default; non-content metadata updates target the root document.
- `GET /api/documents/{id}/download/`, `GET /api/documents/{id}/preview/`, `GET /api/documents/{id}/thumb/`, `GET /api/documents/{id}/metadata/`: accept `?version={version_id}`.
- `POST /api/documents/{id}/update_version/`: uploads a new version using multipart form field `document` and optional `version_label`.
- `PATCH /api/documents/{id}/versions/{version_id}/`: updates the `version_label` of a specific version.
- `DELETE /api/documents/{root_id}/versions/{version_id}/`: deletes a non-root version.
## Permissions
@@ -445,3 +446,9 @@ Initial API version.
large lists of object IDs for operations affecting many objects.
- The legacy `title_content` document search parameter is deprecated and will be removed in a future version.
Clients should use `text` for simple title-and-content search and `title_search` for title-only search.
- The task tracking system was redesigned. The tasks list (`/api/tasks/`) is now paginated, and the
task object exposes `task_type` (formerly `task_name`) and `trigger_source` (formerly `type`). New
read-only endpoints `/api/tasks/summary/`, `/api/tasks/status_counts/`, and `/api/tasks/active/`
provide aggregate views, and `POST /api/tasks/run/` lets privileged users dispatch supported tasks.
API v9 continues to serve the unpaginated list with the legacy field names until support for v9 is
dropped.
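Clients that must work against both the legacy and the redesigned task list could normalize responses along these lines. This is a sketch, not the project's client code: the helper names are hypothetical, and the assumption that the paginated response wraps items in a `results` key follows the common Django REST Framework shape rather than anything stated above.

```python
from typing import Any

# Legacy field name -> new field name, as described in the changelog entry.
RENAMED_FIELDS = {"task_name": "task_type", "type": "trigger_source"}


def task_items(payload: Any) -> list[dict]:
    """Return the task objects from either list shape."""
    # Assumption: a paginated response is a dict with a "results" list.
    if isinstance(payload, dict):
        return list(payload.get("results", []))
    # The legacy response is a plain list of task objects.
    return list(payload)


def normalize_task(task: dict) -> dict:
    """Copy a task, renaming legacy fields to their new names."""
    out = dict(task)
    for old, new in RENAMED_FIELDS.items():
        if old in out and new not in out:
            out[new] = out.pop(old)
    return out
```

A client can then run `[normalize_task(t) for t in task_items(response_json)]` and read `task_type` and `trigger_source` regardless of which API version answered.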
+5 -5
@@ -62,14 +62,14 @@ and the relevant connection variables.
#### [`PAPERLESS_DBENGINE=<engine>`](#PAPERLESS_DBENGINE) {#PAPERLESS_DBENGINE}
: Specifies the database engine to use. Accepted values are `sqlite`, `postgresql`,
and `mariadb`.
Defaults to `sqlite` if not set.
and `mariadb`. PostgreSQL and MariaDB users must set this explicitly.
PostgreSQL and MariaDB both require [`PAPERLESS_DBHOST`](#PAPERLESS_DBHOST) to be
set. SQLite does not use any other connection variables; the database file is always
located at `<PAPERLESS_DATA_DIR>/db.sqlite3`.
Defaults to `sqlite`.
!!! warning
Using MariaDB comes with some caveats.
See [MySQL Caveats](advanced_usage.md#mysql-caveats).
@@ -892,7 +892,7 @@ modes are available:
The default is `auto`.
For the `skip`, `redo`, and `force` modes, read more about OCR
For the `redo` and `force` modes, read more about OCR
behaviour in the [OCRmyPDF
documentation](https://ocrmypdf.readthedocs.io/en/latest/advanced.html#when-ocr-is-skipped).
@@ -2131,7 +2131,7 @@ used with the OpenAI-compatible backend to target a custom provider or local gat
Defaults to true, which allows internal endpoints.
#### [`PAPERLESS_AI_LLM_INDEX_TASK_CRON=<cron expression>`](#PAPERLESS_AI_LLM_INDEX_TASK_CRON) {#PAPERLESS_AI_LLM_INDEX_TASK_CRON}
#### [`PAPERLESS_LLM_INDEX_TASK_CRON=<cron expression>`](#PAPERLESS_LLM_INDEX_TASK_CRON) {#PAPERLESS_LLM_INDEX_TASK_CRON}
: Configures the schedule to update the AI embeddings of text content and metadata for all documents. Only performed if
AI is enabled and the LLM embedding backend is set.
+9 -8
@@ -132,7 +132,7 @@ uv run manage.py runserver & \
```
You might need the front end to test your back end code.
This assumes that you have AngularJS installed on your system.
This assumes that you have Angular installed on your system.
Go to the [Front end development](#front-end-development) section for further details.
To build the front end once use this command:
@@ -174,7 +174,7 @@ To add a new development package `uv add --dev <package>`
## Front end development
The front end is built using AngularJS. In order to get started, you need Node.js (version 24+) and
The front end is built using Angular. In order to get started, you need Node.js (version 24+) and
`pnpm`.
!!! note
@@ -248,12 +248,12 @@ that authentication is working.
## Localization
Paperless-ngx is available in many different languages. Since Paperless-ngx
consists both of a Django application and an AngularJS front end, both
consists both of a Django application and an Angular front end, both
these parts have to be translated separately.
### Front end localization
- The AngularJS front end does localization according to the [Angular
- The Angular front end does localization according to the [Angular
documentation](https://angular.io/guide/i18n).
- The source language of the project is "en_US".
- The source strings end up in the file `src-ui/messages.xlf`.
@@ -495,7 +495,7 @@ class MyCustomParser:
self._tempdir = Path(
tempfile.mkdtemp(prefix="paperless-", dir=settings.SCRATCH_DIR)
)
self._text: str | None = None
self._text: str = ""
self._archive_path: Path | None = None
def __enter__(self) -> Self:
@@ -553,7 +553,8 @@ def parse(
**Result accessors**
```python
def get_text(self) -> str | None:
def get_text(self) -> str:
# Return the extracted text, or an empty string if none was found.
return self._text
def get_date(self) -> "datetime.datetime | None":
@@ -684,7 +685,7 @@ class XmlDocumentParser:
def __init__(self, logging_group: object = None) -> None:
settings.SCRATCH_DIR.mkdir(parents=True, exist_ok=True)
self._tempdir = Path(tempfile.mkdtemp(prefix="paperless-", dir=settings.SCRATCH_DIR))
self._text: str | None = None
self._text: str = ""
def __enter__(self) -> Self:
return self
@@ -702,7 +703,7 @@ class XmlDocumentParser:
except ET.ParseError as e:
raise ParseError(f"XML parse error: {e}") from e
def get_text(self) -> str | None:
def get_text(self) -> str:
return self._text
def get_date(self):
+20 -1
@@ -70,7 +70,16 @@ elsewhere. Here are a couple notes about that.
Paperless-ngx determines the type of a file by inspecting its content
rather than its file extensions. However, files processed via the
consumption directory will be rejected if they have a file extension that
not supported by any of the available parsers.
is not supported by any of the available parsers.
## _Are duplicate documents rejected?_
**A:** Not by default. As of v3, a file whose contents match an existing document is still
consumed, and the duplicate is flagged in the UI — open the document and check the
**Duplicates** tab to review documents that share the same content. If you prefer the old
behavior of rejecting duplicates during consumption, set
[`PAPERLESS_CONSUMER_DELETE_DUPLICATES`](configuration.md#PAPERLESS_CONSUMER_DELETE_DUPLICATES)
to `true`.
## _Will paperless-ngx run on Raspberry Pi?_
@@ -118,6 +127,16 @@ able to run paperless, you're a bit on your own. If you can't run the
docker image, the documentation has instructions for bare metal
installs.
## _Does Paperless-ngx use AI, and is my data private?_
**A:** Paperless-ngx includes optional AI features — LLM-based suggestions, document chat,
and similar-document retrieval — that are **disabled by default**. They only run when you
enable them and configure an LLM backend. The built-in tag/correspondent suggestions use a
local, non-LLM machine-learning model and do not send your data anywhere. If you enable the
LLM features, document content is sent to whichever backend you configure — this can be a
fully local backend (e.g. Ollama) or a remote provider. See
[AI features](advanced_usage.md#ai-features) for details.
## _Which message broker should I use_?
Paperless-ngx talks to a Redis-compatible message broker, so any broker that
+2 -1
@@ -35,9 +35,10 @@ physical documents into a searchable online archive so you can keep, well, _less
- _New!_ Supports remote OCR with Azure AI (opt-in).
- Documents are saved as PDF/A format which is designed for long term storage, alongside the unaltered originals.
- Uses machine-learning to automatically add tags, correspondents and document types to your documents.
- **New**: Paperless-ngx can now leverage AI (Large Language Models or LLMs) for document suggestions. This is an optional feature that can be enabled (and is disabled by default).
- **New**: Paperless-ngx can optionally leverage AI (Large Language Models or LLMs) for document suggestions, chatting with your documents, and similar-document retrieval. These features are opt-in and disabled by default.
- Supports PDF documents, images, plain text files, Office documents (Word, Excel, PowerPoint, and LibreOffice equivalents)[^1] and more.
- Paperless stores your documents plain on disk. Filenames and folders are managed by paperless and their format can be configured freely with different configurations assigned to different documents.
- Keep multiple **versions** of a document's file under a single entry, sharing one set of metadata.
- **Beautiful, modern web application** that features:
- Customizable dashboard with statistics.
- Filtering by tags, correspondents, types, and more.
+9 -3
@@ -178,7 +178,7 @@ to enable polling and disable inotify. See [here](configuration.md#polling).
- `fonts-liberation` for generating thumbnails for plain text
files
- `imagemagick` >= 6 for PDF conversion
- `gnupg` for handling encrypted documents
- `gnupg` for decrypting GPG-encrypted email
- `libpq-dev` for PostgreSQL
- `libmagic-dev` for mime type detection
- `mariadb-client` for MariaDB compile time
@@ -271,8 +271,8 @@ to enable polling and disable inotify. See [here](configuration.md#polling).
needs. Required settings for getting Paperless-ngx running are:
- [`PAPERLESS_REDIS`](configuration.md#PAPERLESS_REDIS) should point to your broker, such as
`redis://localhost:6379`.
- [`PAPERLESS_DBENGINE`](configuration.md#PAPERLESS_DBENGINE) is optional, and should be one of `postgres`,
`mariadb`, or `sqlite`
- [`PAPERLESS_DBENGINE`](configuration.md#PAPERLESS_DBENGINE) should be one of `postgresql`,
`mariadb`, or `sqlite`. PostgreSQL and MariaDB users must set this explicitly.
- [`PAPERLESS_DBHOST`](configuration.md#PAPERLESS_DBHOST) should be the hostname on which your
PostgreSQL server is running. Leave this unset to use
SQLite instead. Also configure port, database name, user and
@@ -450,6 +450,12 @@ development documentation.
You can migrate to Paperless-ngx from Paperless-ng or from the original
Paperless project.
!!! note
Upgrading an existing Paperless-ngx installation from v2 to v3 has its own
breaking changes and required steps. See the [v3 migration guide](migration-v3.md)
before upgrading.
<h3 id="migration_ng">Migrating from Paperless-ng</h3>
Paperless-ngx is meant to be a drop-in replacement for Paperless-ng, and
-31
@@ -149,37 +149,6 @@ operating system, if these are different from `1000`. See [Docker setup](setup.m
Also ensure that you are able to read and write to the consumption
directory on the host.
## OSError: \[Errno 19\] No such device when consuming files
If you experience errors such as:
```shell-session
File "/usr/local/lib/python3.7/site-packages/whoosh/codec/base.py", line 570, in open_compound_file
return CompoundStorage(dbfile, use_mmap=storage.supports_mmap)
File "/usr/local/lib/python3.7/site-packages/whoosh/filedb/compound.py", line 75, in __init__
self._source = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ)
OSError: [Errno 19] No such device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/django_q/cluster.py", line 436, in worker
res = f(*task["args"], **task["kwargs"])
File "/usr/src/paperless/src/documents/tasks.py", line 73, in consume_file
override_tag_ids=override_tag_ids)
File "/usr/src/paperless/src/documents/consumer.py", line 271, in try_consume_file
raise ConsumerError(e)
```
Paperless uses a search index to provide better and faster full text
searching. This search index is stored inside the `data` folder. The
search index uses memory-mapped files (mmap). The above error indicates
that paperless was unable to create and open these files.
This happens when you're trying to store the data directory on certain
file systems (mostly network shares) that don't support memory-mapped
files.
## Web-UI stuck at "Loading\..."
This might have multiple reasons.
+20 -1
@@ -292,6 +292,23 @@ Once setup, navigating to the email settings page in Paperless-ngx will allow yo
You can also submit a document using the REST API, see [POSTing documents](api.md#file-uploads)
for details.
### Duplicate documents
By default, Paperless-ngx **does not reject duplicates**. If you consume a file whose
contents exactly match an existing document (same checksum), the new copy is still
consumed and a warning is logged. The task entry for the upload also flags that a
duplicate was detected and links to the existing document(s).
To review duplicates, open a document and switch to the **Duplicates** tab on the
document detail page. It lists other documents that share the same content, including any
that are in the trash (shown with a badge), and links to each so you can decide which to
keep.
If you would rather reject duplicates at consumption time (the pre-v3 behavior), set
[`PAPERLESS_CONSUMER_DELETE_DUPLICATES`](configuration.md#PAPERLESS_CONSUMER_DELETE_DUPLICATES)
to `true`. The duplicate file is then deleted instead of consumed, and the task fails with
a "document already exists" message.
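To see what "same checksum" means in practice, the sketch below groups files in a folder by a content hash before they are consumed. It is illustrative only: the helper names are hypothetical, and SHA-256 is an assumption chosen for the example, so the digests are not guaranteed to match the checksums Paperless-ngx stores.

```python
import hashlib
from collections.abc import Iterable
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files stay out of memory."""
    # Assumption: SHA-256 is used here purely for illustration.
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths: Iterable[Path]) -> dict[str, list[Path]]:
    """Group paths by checksum and keep only groups with more than one file."""
    by_checksum: dict[str, list[Path]] = {}
    for path in paths:
        by_checksum.setdefault(file_checksum(path), []).append(path)
    return {c: ps for c, ps in by_checksum.items() if len(ps) > 1}
```

Two files with different names but identical bytes end up in the same group, which mirrors the exact-content match described above; files that differ by even one byte do not.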
## Document Suggestions
Paperless-ngx can suggest tags, correspondents, document types and storage paths for documents based on the content of the document. This is done using a (non-LLM) machine learning model that is trained on the documents in your database. The suggestions are shown in the document detail page and can be accepted or rejected by the user.
@@ -306,7 +323,9 @@ Paperless-ngx includes several features that use AI to enhance the document mana
so consider the privacy implications of using these features, especially if using a remote
model or API provider instead of the default local model.
The AI features work by creating an embedding of the text content and metadata of documents, which is then used for various tasks such as similarity search and question answering. This uses the FAISS vector store.
The AI features work by creating an embedding of the text content and metadata of documents, which is then used for various tasks such as similarity search and question answering.
See [AI features](advanced_usage.md#ai-features) for how to enable and configure these features, including choosing an LLM backend and setting up the LLM index for RAG.
### AI-Enhanced Suggestions