Merge branch 'dev' into feature-version-added-wf-trigger

2026-06-20 12:24:17 +00:00 · 2026-04-03 21:46:45 -07:00
parent ddce646c7e a945cd9379
commit 17ff136e1f
314 changed files with 18502 additions and 8225 deletions
@@ -180,6 +180,16 @@ following:
    This might not actually do anything. Not every new paperless version
    comes with new database migrations.

+4.  Rebuild the search index if needed.
+
+    ```shell-session
+    cd src
+    python3 manage.py document_index reindex --if-needed
+    ```
+
+    This is a no-op if the index is already up to date, so it is safe to
+    run on every upgrade.
+
 ### Database Upgrades

 Paperless-ngx is compatible with Django-supported versions of PostgreSQL and MariaDB and it is generally
@@ -453,17 +463,42 @@ the search yields non-existing documents or won't find anything, you
 may need to recreate the index manually.

 ```
-document_index {reindex,optimize}
+document_index {reindex,optimize} [--recreate] [--if-needed]
 ```

-Specify `reindex` to have the index created from scratch. This may take
-some time.
+Specify `reindex` to rebuild the index from all documents in the database. This
+may take some time.

-Specify `optimize` to optimize the index. This updates certain aspects
-of the index and usually makes queries faster and also ensures that the
-autocompletion works properly. This command is regularly invoked by the
+Pass `--recreate` to wipe the existing index before rebuilding. Use this when the
+index is corrupted or you want a fully clean rebuild.
+
+Pass `--if-needed` to skip the rebuild if the index is already up to date (schema
+version and search language match). It is safe to run on every startup or upgrade.
+
+Specify `optimize` to optimize the index. This command is regularly invoked by the
 task scheduler.

+!!! note
+
+    The `optimize` subcommand is deprecated and is now a no-op. Tantivy manages
+    segment merging automatically; no manual optimization step is needed.
+
+!!! note
+
+    **Docker users:** On every startup, the container runs
+    `document_index reindex --if-needed` automatically. Schema changes, language
+    changes, and missing indexes are all detected, and the index is rebuilt before
+    the webserver starts. No manual step is required.
+
+    **Bare metal users:** Run the following command after each upgrade (and after
+    changing `PAPERLESS_SEARCH_LANGUAGE`). It is a no-op if the index is already
+    up to date:
+
+    ```shell-session
+    cd src
+    python3 manage.py document_index reindex --if-needed
+    ```
+
 ### Clearing the database read cache

 If the database read cache is enabled, **you must run this command** after making any changes to the database outside the application context.
@@ -723,6 +723,81 @@ services:

 1. Note the `:ro` tag means the folder will be mounted as read only. This is for extra security against changes

+## Installing third-party parser plugins {#parser-plugins}
+
+Third-party parser plugins extend Paperless-ngx to support additional file
+formats. A plugin is a Python package that advertises itself under the
+`paperless_ngx.parsers` entry point group. Refer to the
+[developer documentation](development.md#making-custom-parsers) for how to
+create one.
+
+!!! warning "Third-party plugins are not officially supported"
+
+    The Paperless-ngx maintainers do not provide support for third-party
+    plugins. Issues caused by or requiring changes to a third-party plugin
+    will be closed without further investigation. Always reproduce problems
+    with all plugins removed before filing a bug report.
+
+### Docker
+
+Use a [custom container initialization script](#custom-container-initialization)
+to install the package before the webserver starts. Create a shell script and
+mount it into `/custom-cont-init.d`:
+
+```bash
+#!/bin/bash
+# /path/to/my/scripts/install-parsers.sh
+
+pip install my-paperless-parser-package
+```
+
+Mount it in your `docker-compose.yml`:
+
+```yaml
+services:
+  webserver:
+    # ...
+    volumes:
+      - /path/to/my/scripts:/custom-cont-init.d:ro
+```
+
+The script runs as `root` before the webserver starts, so the package will be
+available when Paperless-ngx discovers plugins at startup.
+
+### Bare metal
+
+Install the package into the same Python environment that runs Paperless-ngx.
+If you followed the standard bare-metal install guide, that is the `paperless`
+user's environment:
+
+```bash
+sudo -Hu paperless pip3 install my-paperless-parser-package
+```
+
+If you are using `uv` or a virtual environment, activate it first and then run:
+
+```bash
+uv pip install my-paperless-parser-package
+# or
+pip install my-paperless-parser-package
+```
+
+Restart all Paperless-ngx services after installation so the new plugin is
+discovered.
+
+### Verifying installation
+
+On the next startup, check the application logs for a line confirming
+discovery:
+
+```
+Loaded third-party parser 'My Parser' v1.0.0 by Acme Corp (entrypoint: 'my_parser').
+```
+
+If this line does not appear, verify that the package is installed in the
+correct environment and that its `pyproject.toml` declares the
+`paperless_ngx.parsers` entry point.
+
 ## MySQL Caveats {#mysql-caveats}

 ### Case Sensitivity
@@ -62,10 +62,14 @@ The REST api provides five different forms of authentication.

 ## Searching for documents

-Full text searching is available on the `/api/documents/` endpoint. Two
-specific query parameters cause the API to return full text search
+Full text searching is available on the `/api/documents/` endpoint. The
+following query parameters cause the API to return Tantivy-backed search
 results:

+- `/api/documents/?text=your%20search%20query`: Search title and content
+  using simple substring-style search.
+- `/api/documents/?title_search=your%20search%20query`: Search title only
+  using simple substring-style search.
 - `/api/documents/?query=your%20search%20query`: Search for a document
  using a full text query. For details on the syntax, see [Basic Usage - Searching](usage.md#basic-usage_searching).
 - `/api/documents/?more_like_id=1234`: Search for documents similar to
@@ -167,9 +171,8 @@ Query parameters:
 - `term`: The incomplete term.
 - `limit`: Amount of results. Defaults to 10.

-Results returned by the endpoint are ordered by importance of the term
-in the document index. The first result is the term that has the highest
-[Tf/Idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) score in the index.
+Results are ordered by how many of the user's visible documents contain
+each matching word. The first result is the word that appears in the most documents.
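
The ordering described above can be sketched as follows. This is only an illustration of ranking by document count, not the Tantivy-backed implementation; the `autocomplete` function and the in-memory `visible_documents` list are hypothetical stand-ins for the index of documents the user may see.

```python
from collections import Counter


def autocomplete(term: str, documents: list[str], limit: int = 10) -> list[str]:
    """Rank words starting with `term` by how many documents contain them."""
    doc_frequency: Counter[str] = Counter()
    for text in documents:
        # A set ensures each document counts a word at most once.
        words = {w.lower() for w in text.split()}
        doc_frequency.update(w for w in words if w.startswith(term.lower()))
    # Most widespread word first; ties broken alphabetically for stable output.
    ranked = sorted(doc_frequency.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


# Hypothetical documents visible to the requesting user.
visible_documents = [
    "invoice from the insurance company",
    "insurance renewal notice",
    "invoice for internet service",
    "insurance claim form",
]
suggestions = autocomplete("in", visible_documents)
```

Here `insurance` appears in three documents, `invoice` in two, and `internet` in one, so they are suggested in that order.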

 ```json
 ["term1", "term3", "term6", "term4"]
@@ -437,3 +440,8 @@ Initial API version.
  moved from the bulk edit endpoint to their own individual endpoints. Using these methods via
  the bulk edit endpoint is still supported for compatibility with versions < 10 until support
  for API v9 is dropped.
+- The `all` parameter of list endpoints is now deprecated and will be removed in a future version.
+- The bulk edit objects endpoint now supports `all` and `filters` parameters to avoid having to send
+  large lists of object IDs for operations affecting many objects.
+- The legacy `title_content` document search parameter is deprecated and will be removed in a future version.
+  Clients should use `text` for simple title-and-content search and `title_search` for title-only search.
@@ -1,5 +1,56 @@
 # Changelog

+## paperless-ngx 2.20.12
+
+### Security
+
+- Resolve [GHSA-96jx-fj7m-qh6x](https://github.com/paperless-ngx/paperless-ngx/security/advisories/GHSA-96jx-fj7m-qh6x)
+
+### Bug Fixes
+
+- Fix: Scope the workflow saves to prevent clobbering filename/archive_filename [@stumpylog](https://github.com/stumpylog) ([#12390](https://github.com/paperless-ngx/paperless-ngx/pull/12390))
+- Fix: don't try to usermod/groupmod when non-root + update docs (#<!---->12365) [@stumpylog](https://github.com/stumpylog) ([#12391](https://github.com/paperless-ngx/paperless-ngx/pull/12391))
+- Fix: avoid moving files if already moved [@shamoon](https://github.com/shamoon) ([#12389](https://github.com/paperless-ngx/paperless-ngx/pull/12389))
+- Fix: remove pagination from document notes api spec [@shamoon](https://github.com/shamoon) ([#12388](https://github.com/paperless-ngx/paperless-ngx/pull/12388))
+- Fix: fix file button hover color in dark mode [@shamoon](https://github.com/shamoon) ([#12367](https://github.com/paperless-ngx/paperless-ngx/pull/12367))
+- Fixhancement: only offer basic auth for appropriate requests [@shamoon](https://github.com/shamoon) ([#12362](https://github.com/paperless-ngx/paperless-ngx/pull/12362))
+
+### All App Changes
+
+<details>
+<summary>5 changes</summary>
+
+- Fix: Scope the workflow saves to prevent clobbering filename/archive_filename [@stumpylog](https://github.com/stumpylog) ([#12390](https://github.com/paperless-ngx/paperless-ngx/pull/12390))
+- Fix: avoid moving files if already moved [@shamoon](https://github.com/shamoon) ([#12389](https://github.com/paperless-ngx/paperless-ngx/pull/12389))
+- Fix: remove pagination from document notes api spec [@shamoon](https://github.com/shamoon) ([#12388](https://github.com/paperless-ngx/paperless-ngx/pull/12388))
+- Fix: fix file button hover color in dark mode [@shamoon](https://github.com/shamoon) ([#12367](https://github.com/paperless-ngx/paperless-ngx/pull/12367))
+- Fixhancement: only offer basic auth for appropriate requests [@shamoon](https://github.com/shamoon) ([#12362](https://github.com/paperless-ngx/paperless-ngx/pull/12362))
+</details>
+
+## paperless-ngx 2.20.11
+
+### Security
+
+- Resolve [GHSA-59xh-5vwx-4c4q](https://github.com/paperless-ngx/paperless-ngx/security/advisories/GHSA-59xh-5vwx-4c4q)
+
+### Bug Fixes
+
+- Fix: correct dropdown list active color in dark mode [@shamoon](https://github.com/shamoon) ([#12328](https://github.com/paperless-ngx/paperless-ngx/pull/12328))
+- Fixhancement: clear descendant selections in dropdown when parent toggled [@shamoon](https://github.com/shamoon) ([#12326](https://github.com/paperless-ngx/paperless-ngx/pull/12326))
+- Fix: prevent wrapping with larger amounts of tags on small cards, reset moreTags setting to correct count [@shamoon](https://github.com/shamoon) ([#12302](https://github.com/paperless-ngx/paperless-ngx/pull/12302))
+- Fix: prevent stale db filename during workflow actions [@shamoon](https://github.com/shamoon) ([#12289](https://github.com/paperless-ngx/paperless-ngx/pull/12289))
+
+### All App Changes
+
+<details>
+<summary>4 changes</summary>
+
+- Fix: correct dropdown list active color in dark mode [@shamoon](https://github.com/shamoon) ([#12328](https://github.com/paperless-ngx/paperless-ngx/pull/12328))
+- Fixhancement: clear descendant selections in dropdown when parent toggled [@shamoon](https://github.com/shamoon) ([#12326](https://github.com/paperless-ngx/paperless-ngx/pull/12326))
+- Fix: prevent wrapping with larger amounts of tags on small cards, reset moreTags setting to correct count [@shamoon](https://github.com/shamoon) ([#12302](https://github.com/paperless-ngx/paperless-ngx/pull/12302))
+- Fix: prevent stale db filename during workflow actions [@shamoon](https://github.com/shamoon) ([#12289](https://github.com/paperless-ngx/paperless-ngx/pull/12289))
+</details>
+
 ## paperless-ngx 2.20.10

 ### Bug Fixes
@@ -402,6 +402,12 @@ Defaults to `/usr/share/nltk_data`

 : This is where paperless will store the classification model.

+    !!! warning
+
+        The classification model uses Python's pickle serialization format.
+        Ensure this file is only writable by the paperless user, as a
+        maliciously crafted model file could execute arbitrary code when loaded.
+
    Defaults to `PAPERLESS_DATA_DIR/classification_model.pickle`.

 ## Logging
@@ -422,14 +428,20 @@ Defaults to `/usr/share/nltk_data`

 #### [`PAPERLESS_SECRET_KEY=<key>`](#PAPERLESS_SECRET_KEY) {#PAPERLESS_SECRET_KEY}

-: Paperless uses this to make session tokens. If you expose paperless
-on the internet, you need to change this, since the default secret
-is well known.
+: **Required.** Paperless uses this to make session tokens and sign
+sensitive data. Paperless will refuse to start if this is not set.

    Use any sequence of characters. The more, the better. You don't
-    need to remember this. Just face-roll your keyboard.
+    need to remember this. You can generate a suitable key with:

-    Default is listed in the file `src/paperless/settings.py`.
+        python3 -c "import secrets; print(secrets.token_urlsafe(64))"
+
+    !!! warning
+
+        This setting has no default value. You **must** set it before
+        starting Paperless. Existing installations that relied on the
+        previous default value should set `PAPERLESS_SECRET_KEY` to
+        that value to avoid invalidating existing sessions and tokens.

 #### [`PAPERLESS_URL=<url>`](#PAPERLESS_URL) {#PAPERLESS_URL}

@@ -674,6 +686,9 @@ See the corresponding [django-allauth documentation](https://docs.allauth.org/en
 for a list of provider configurations. You will also need to include the relevant Django 'application' inside the
 [PAPERLESS_APPS](#PAPERLESS_APPS) setting to activate that specific authentication provider (e.g. `allauth.socialaccount.providers.openid_connect` for the [OIDC Connect provider](https://docs.allauth.org/en/latest/socialaccount/providers/openid_connect.html)).

+: For OpenID Connect providers, set `settings.token_auth_method` if your identity provider
+requires a specific token endpoint authentication method.
+
    Defaults to None, which does not enable any third party authentication systems.

 #### [`PAPERLESS_SOCIAL_AUTO_SIGNUP=<bool>`](#PAPERLESS_SOCIAL_AUTO_SIGNUP) {#PAPERLESS_SOCIAL_AUTO_SIGNUP}
@@ -767,6 +782,14 @@ If both the [PAPERLESS_ACCOUNT_DEFAULT_GROUPS](#PAPERLESS_ACCOUNT_DEFAULT_GROUPS

    Defaults to 1209600 (2 weeks)

+#### [`PAPERLESS_TOKEN_THROTTLE_RATE=<rate>`](#PAPERLESS_TOKEN_THROTTLE_RATE) {#PAPERLESS_TOKEN_THROTTLE_RATE}
+
+: Rate limit for the API token authentication endpoint (`/api/token/`), used to mitigate brute-force login attempts.
+Uses Django REST Framework's [throttle rate format](https://www.django-rest-framework.org/api-guide/throttling/#setting-the-throttling-policy),
+e.g. `5/min`, `100/hour`, `1000/day`.
+
+    Defaults to `5/min`
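
To make the rate format concrete, here is a simplified sketch modelled on how Django REST Framework interprets rate strings. The standalone `parse_rate` helper is written for illustration and is not part of Paperless-ngx.

```python
def parse_rate(rate: str) -> tuple[int, int]:
    """Split a rate such as "5/min" into (allowed requests, window in seconds)."""
    count, period = rate.split("/")
    # Only the first letter of the period is significant: s, m, h or d.
    seconds = {"s": 1, "m": 60, "h": 3600, "d": 86400}[period[0]]
    return int(count), seconds


# The documented default: five token requests per 60-second window.
default_rate = parse_rate("5/min")
```

Under this reading, `5/min` and `5/minute` are equivalent, because only the first letter of the period is inspected.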
+
 ## OCR settings {#ocr}

 Paperless uses [OCRmyPDF](https://ocrmypdf.readthedocs.io/en/latest/)
@@ -1100,6 +1123,32 @@ should be a valid crontab(5) expression describing when to run.

    Defaults to `0 0 * * *` or daily at midnight.

+#### [`PAPERLESS_SEARCH_LANGUAGE=<language>`](#PAPERLESS_SEARCH_LANGUAGE) {#PAPERLESS_SEARCH_LANGUAGE}
+
+: Sets the stemmer language for the full-text search index.
+Stemming improves recall by matching word variants (e.g. "running" matches "run").
+Changing this setting causes the index to be rebuilt automatically on next startup.
+An invalid value raises an error at startup.
+
+: Use the ISO 639-1 two-letter code (e.g. `en`, `de`, `fr`). Lowercase full names
+(e.g. `english`, `german`, `french`) are also accepted. The capitalized names shown
+in the [Tantivy Language enum](https://docs.rs/tantivy/latest/tantivy/tokenizer/enum.Language.html)
+documentation are **not** valid — use the lowercase equivalent.
+
+: If not set, paperless infers the language from
+[`PAPERLESS_OCR_LANGUAGE`](#PAPERLESS_OCR_LANGUAGE). If the OCR language has no
+Tantivy stemmer equivalent, stemming is disabled.
+
+    Defaults to unset (inferred from `PAPERLESS_OCR_LANGUAGE`).
+
+#### [`PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD=<float>`](#PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD) {#PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD}
+
+: When set to a float value, approximate/fuzzy matching is applied alongside exact
+matching. Fuzzy results rank below exact matches. A value of `0.5` is a reasonable
+starting point. Leave unset to disable fuzzy matching entirely.
+
+    Defaults to unset (disabled).
+
 #### [`PAPERLESS_SANITY_TASK_CRON=<cron expression>`](#PAPERLESS_SANITY_TASK_CRON) {#PAPERLESS_SANITY_TASK_CRON}

 : Configures the scheduled sanity checker frequency. The value should be a
@@ -1391,6 +1440,14 @@ ports.

 ## Incoming Mail {#incoming_mail}

+#### [`PAPERLESS_EMAIL_ALLOW_INTERNAL_HOSTS=<bool>`](#PAPERLESS_EMAIL_ALLOW_INTERNAL_HOSTS) {#PAPERLESS_EMAIL_ALLOW_INTERNAL_HOSTS}
+
+: If set to false, incoming mail account connections are blocked when the
+configured IMAP hostname resolves to a non-public address (for example,
+localhost, link-local, or RFC1918 private ranges).
+
+    Defaults to true, which allows internal hosts.
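
The kind of address classification this setting implies can be sketched with the standard `ipaddress` module. This is an assumption about the general technique, not the check Paperless-ngx actually performs; `is_internal_address` is a hypothetical helper, and a real check would first resolve the hostname to its IP addresses.

```python
import ipaddress


def is_internal_address(address: str) -> bool:
    """Return True for loopback, link-local, and private (e.g. RFC1918) addresses."""
    ip = ipaddress.ip_address(address)
    return ip.is_loopback or ip.is_link_local or ip.is_private


# Classify a few sample addresses (already resolved from a hostname).
checks = {
    addr: is_internal_address(addr)
    for addr in ("127.0.0.1", "169.254.10.1", "192.168.1.20", "8.8.8.8")
}
```

With the setting disabled, a mail account whose hostname resolves to any address flagged `True` here would be rejected.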
+
 ### Email OAuth {#email_oauth}

 #### [`PAPERLESS_OAUTH_CALLBACK_BASE_URL=<str>`](#PAPERLESS_OAUTH_CALLBACK_BASE_URL) {#PAPERLESS_OAUTH_CALLBACK_BASE_URL}
@@ -1947,6 +2004,12 @@ current backend. If not supplied, defaults to "gpt-3.5-turbo" for OpenAI and "ll

    Defaults to None.

+#### [`PAPERLESS_AI_LLM_ALLOW_INTERNAL_ENDPOINTS=<bool>`](#PAPERLESS_AI_LLM_ALLOW_INTERNAL_ENDPOINTS) {#PAPERLESS_AI_LLM_ALLOW_INTERNAL_ENDPOINTS}
+
+: If set to false, Paperless blocks AI endpoint URLs that resolve to non-public addresses (for example, localhost).
+
+    Defaults to true, which allows internal endpoints.
+
 #### [`PAPERLESS_AI_LLM_INDEX_TASK_CRON=<cron expression>`](#PAPERLESS_AI_LLM_INDEX_TASK_CRON) {#PAPERLESS_AI_LLM_INDEX_TASK_CRON}

 : Configures the schedule to update the AI embeddings of text content and metadata for all documents. Only performed if
@@ -370,121 +370,367 @@ docker build --file Dockerfile --tag paperless:local .

 ## Extending Paperless-ngx

-Paperless-ngx does not have any fancy plugin systems and will probably never
-have. However, some parts of the application have been designed to allow
-easy integration of additional features without any modification to the
-base code.
+Paperless-ngx supports third-party document parsers via a Python entry point
+plugin system. Plugins are distributed as ordinary Python packages and
+discovered automatically at startup — no changes to the Paperless-ngx source
+are required.
+
+!!! warning "Third-party plugins are not officially supported"
+
+    The Paperless-ngx maintainers do not provide support for third-party
+    plugins. Issues that are caused by or require changes to a third-party
+    plugin will be closed without further investigation. If you believe you
+    have found a bug in Paperless-ngx itself (not in a plugin), please
+    reproduce it with all third-party plugins removed before filing an issue.

 ### Making custom parsers

-Paperless-ngx uses parsers to add documents. A parser is
-responsible for:
+Paperless-ngx uses parsers to add documents. A parser is responsible for:

- Retrieving the content from the original
- Creating a thumbnail
- _optional:_ Retrieving a created date from the original
- _optional:_ Creating an archived document from the original
+- Extracting plain-text content from the document
+- Generating a thumbnail image
+- _optional:_ Detecting the document's creation date
+- _optional:_ Producing a searchable PDF archive copy

-Custom parsers can be added to Paperless-ngx to support more file types. In
-order to do that, you need to write the parser itself and announce its
-existence to Paperless-ngx.
+Custom parsers are distributed as ordinary Python packages and registered
+via a [setuptools entry point](https://setuptools.pypa.io/en/latest/userguide/entry_point.html).
+No changes to the Paperless-ngx source are required.

-The parser itself must extend `documents.parsers.DocumentParser` and
-must implement the methods `parse` and `get_thumbnail`. You can provide
-your own implementation to `get_date` if you don't want to rely on
-Paperless-ngx' default date guessing mechanisms.
+#### 1. Implementing the parser class
+
+Your parser must satisfy the `ParserProtocol` structural interface defined in
+`paperless.parsers`. The simplest approach is to write a plain class — no base
+class is required, only the right attributes and methods.
+
+**Class-level identity attributes**
+
+The registry reads these before instantiating the parser, so they must be
+plain class attributes (not instance attributes or properties):

 ```python
-class MyCustomParser(DocumentParser):
-
-    def parse(self, document_path, mime_type):
-        # This method does not return anything. Rather, you should assign
-        # whatever you got from the document to the following fields:
-
-        # The content of the document.
-        self.text = "content"
-
-        # Optional: path to a PDF document that you created from the original.
-        self.archive_path = os.path.join(self.tempdir, "archived.pdf")
-
-        # Optional: "created" date of the document.
-        self.date = get_created_from_metadata(document_path)
-
-    def get_thumbnail(self, document_path, mime_type):
-        # This should return the path to a thumbnail you created for this
-        # document.
-        return os.path.join(self.tempdir, "thumb.webp")
+class MyCustomParser:
+    name    = "My Format Parser"   # human-readable name shown in logs
+    version = "1.0.0"              # semantic version string
+    author  = "Acme Corp"          # author / organisation
+    url     = "https://example.com/my-parser"  # docs or issue tracker
 ```

-If you encounter any issues during parsing, raise a
-`documents.parsers.ParseError`.
+**Declaring supported MIME types**

-The `self.tempdir` directory is a temporary directory that is guaranteed
-to be empty and removed after consumption finished. You can use that
-directory to store any intermediate files and also use it to store the
-thumbnail / archived document.
-
-After that, you need to announce your parser to Paperless-ngx. You need to
-connect a handler to the `document_consumer_declaration` signal. Have a
-look in the file `src/paperless_tesseract/apps.py` on how that's done.
-The handler is a method that returns information about your parser:
+Return a `dict` mapping MIME type strings to preferred file extensions
+(including the leading dot). Paperless-ngx uses the extension when storing
+archive copies and serving files for download.

 ```python
-def myparser_consumer_declaration(sender, **kwargs):
+@classmethod
+def supported_mime_types(cls) -> dict[str, str]:
    return {
-        "parser": MyCustomParser,
-        "weight": 0,
-        "mime_types": {
-            "application/pdf": ".pdf",
-            "image/jpeg": ".jpg",
-        }
+        "application/x-my-format": ".myf",
+        "application/x-my-format-alt": ".myf",
    }
 ```

- `parser` is a reference to a class that extends `DocumentParser`.
- `weight` is used whenever two or more parsers are able to parse a
-  file: The parser with the higher weight wins. This can be used to
-  override the parsers provided by Paperless-ngx.
- `mime_types` is a dictionary. The keys are the mime types your
-  parser supports and the value is the default file extension that
-  Paperless-ngx should use when storing files and serving them for
-  download. We could guess that from the file extensions, but some
-  mime types have many extensions associated with them and the Python
-  methods responsible for guessing the extension do not always return
-  the same value.
+**Scoring**

-## Using Visual Studio Code devcontainer
+When more than one parser can handle a file, the registry calls `score()` on
+each candidate and picks the one with the highest result. When scores are
+equal, third-party parsers are preferred over built-ins. Return `None` to
+decline handling a file even though the MIME type is listed as supported (for
+example, when a required external service is not configured).

-Another easy way to get started with development is to use Visual Studio
-Code devcontainers. This approach will create a preconfigured development
-environment with all of the required tools and dependencies.
-[Learn more about devcontainers](https://code.visualstudio.com/docs/devcontainers/containers).
-The .devcontainer/vscode/tasks.json and .devcontainer/vscode/launch.json files
-contain more information about the specific tasks and launch configurations (see the
-non-standard "description" field).
+| Score  | Meaning                                                                           |
+| ------ | --------------------------------------------------------------------------------- |
+| `None` | Decline — do not handle this file                                                 |
+| `10`   | Default priority used by most built-in parsers                                    |
+| `20`   | Priority used by the remote OCR built-in parser, allowing it to replace Tesseract |
+| `> 10` | Override a built-in parser for the same MIME type                                 |

-To get started:
+```python
+@classmethod
+def score(
+    cls,
+    mime_type: str,
+    filename: str,
+    path: "Path | None" = None,
+) -> int | None:
+    # Inspect filename or file bytes here if needed.
+    return 10
+```
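
The selection rule described above (highest score wins, `None` declines, ties favor third-party parsers) can be sketched as follows. The `Candidate` dataclass and `pick_parser` function are hypothetical and do not reflect the registry's real internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    score: Optional[int]  # value returned by the parser's score()
    third_party: bool     # False for parsers shipped with Paperless-ngx


def pick_parser(candidates: list[Candidate]) -> Optional[Candidate]:
    # Parsers that returned None have declined the file.
    willing = [c for c in candidates if c.score is not None]
    if not willing:
        return None
    # Highest score wins; on a tie, True > False puts third-party parsers first.
    return max(willing, key=lambda c: (c.score, c.third_party))


chosen = pick_parser([
    Candidate("built-in", 10, third_party=False),
    Candidate("my-plugin", 10, third_party=True),
    Candidate("unconfigured-plugin", None, third_party=True),
])
```

In this example both remaining candidates score `10`, so the third-party `my-plugin` is chosen.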

-1. Clone the repository on your machine and open the Paperless-ngx folder in VS Code.
+**Archive and rendition flags**

-2. VS Code will prompt you with "Reopen in container". Do so and wait for the environment to start.
+```python
+@property
+def can_produce_archive(self) -> bool:
+    """True if parse() can produce a searchable PDF archive copy."""
+    return True   # or False if your parser doesn't produce PDFs

-3. In case your host operating system is Windows:
-   - The Source Control view in Visual Studio Code might show: "The detected Git repository is potentially unsafe as the folder is owned by someone other than the current user." Use "Manage Unsafe Repositories" to fix this.
-   - Git might have detecteded modifications for all files, because Windows is using CRLF line endings. Run `git checkout .` in the containers terminal to fix this issue.
+@property
+def requires_pdf_rendition(self) -> bool:
+    """True if the original format cannot be displayed by a browser
+    (e.g. DOCX, ODT) and the PDF output must always be kept."""
+    return False
+```

-4. Initialize the project by running the task **Project Setup: Run all Init Tasks**. This
-   will initialize the database tables and create a superuser. Then you can compile the front end
-   for production or run the frontend in debug mode.
+**Context manager — temp directory lifecycle**

-5. The project is ready for debugging, start either run the fullstack debug or individual debug
-   processes. Yo spin up the project without debugging run the task **Project Start: Run all Services**
+Paperless-ngx always uses parsers as context managers. Create a temporary
+working directory in `__enter__` (or `__init__`) and remove it in `__exit__`
+regardless of whether an exception occurred. Store intermediate files,
+thumbnails, and archive PDFs inside this directory.

-## Developing Date Parser Plugins
+```python
+import shutil
+import tempfile
+from pathlib import Path
+from typing import Self
+from types import TracebackType
+
+from django.conf import settings
+
+class MyCustomParser:
+    ...
+
+    def __init__(self, logging_group: object = None) -> None:
+        settings.SCRATCH_DIR.mkdir(parents=True, exist_ok=True)
+        self._tempdir = Path(
+            tempfile.mkdtemp(prefix="paperless-", dir=settings.SCRATCH_DIR)
+        )
+        self._text: str | None = None
+        self._archive_path: Path | None = None
+
+    def __enter__(self) -> Self:
+        return self
+
+    def __exit__(
+        self,
+        exc_type: type[BaseException] | None,
+        exc_val: BaseException | None,
+        exc_tb: TracebackType | None,
+    ) -> None:
+        shutil.rmtree(self._tempdir, ignore_errors=True)
+```
+
+**Optional context — `configure()`**
+
+The consumer calls `configure()` with a `ParserContext` after instantiation
+and before `parse()`. If your parser doesn't need context, a no-op
+implementation is fine:
+
+```python
+from paperless.parsers import ParserContext
+
+def configure(self, context: ParserContext) -> None:
+    pass   # override if you need context.mailrule_id, etc.
+```
+
+**Parsing**
+
+`parse()` is the core method. It must not return a value; instead, store
+results in instance attributes and expose them via the accessor methods below.
+Raise `documents.parsers.ParseError` on any unrecoverable failure.
+
+```python
+from documents.parsers import ParseError
+
+def parse(
+    self,
+    document_path: Path,
+    mime_type: str,
+    *,
+    produce_archive: bool = True,
+) -> None:
+    try:
+        self._text = extract_text_from_my_format(document_path)
+    except Exception as e:
+        raise ParseError(f"Failed to parse {document_path}: {e}") from e
+
+    if produce_archive and self.can_produce_archive:
+        archive = self._tempdir / "archived.pdf"
+        convert_to_pdf(document_path, archive)
+        self._archive_path = archive
+```
+
+**Result accessors**
+
+```python
+def get_text(self) -> str | None:
+    return self._text
+
+def get_date(self) -> "datetime.datetime | None":
+    # Return a datetime extracted from the document, or None to let
+    # Paperless-ngx use its default date-guessing logic.
+    return None
+
+def get_archive_path(self) -> Path | None:
+    return self._archive_path
+
+def get_page_count(self, document_path: Path, mime_type: str) -> int | None:
+    # If the format doesn't have the concept of pages, return None
+    return count_pages(document_path)
+
+```
+
+**Thumbnail**
+
+`get_thumbnail()` may be called independently of `parse()`. Return the path
+to a WebP image inside `self._tempdir`. The image should be roughly 500 × 700
+pixels.
+
+```python
+def get_thumbnail(self, document_path: Path, mime_type: str) -> Path:
+    thumb = self._tempdir / "thumb.webp"
+    render_thumbnail(document_path, thumb)
+    return thumb
+```
+
+**Optional methods**
+
+These are called by the API on demand, not during the consumption pipeline.
+Implement them if your format supports the information; otherwise return
+`None` / `[]`.
+
+```python
+
+def extract_metadata(
+    self,
+    document_path: Path,
+    mime_type: str,
+) -> "list[MetadataEntry]":
+    # Must never raise. Return [] if metadata cannot be read.
+    from paperless.parsers import MetadataEntry
+    return [
+        MetadataEntry(
+            namespace="https://example.com/ns/",
+            prefix="ex",
+            key="Author",
+            value="Alice",
+        )
+    ]
+```
+
+#### 2. Registering via entry point
+
+Add the following to your package's `pyproject.toml`. The key (left of `=`)
+is an arbitrary name used only in log output; the value is the
+`module:ClassName` import path.
+
+```toml
+[project.entry-points."paperless_ngx.parsers"]
+my_parser = "my_package.parsers:MyCustomParser"
+```
+
+Install your package into the same Python environment as Paperless-ngx (or
+add it to the Docker image), and the parser will be discovered automatically
+on the next startup. No configuration changes are needed.
+
+To verify discovery, check the application logs at startup for a line like:
+
+```
+Loaded third-party parser 'My Format Parser' v1.0.0 by Acme Corp (entrypoint: 'my_parser').
+```
+
+#### 3. Utilities
+
+`paperless.parsers.utils` provides helpers you can import directly:
+
+| Function                                | Description                                                      |
+| --------------------------------------- | ---------------------------------------------------------------- |
+| `read_file_handle_unicode_errors(path)` | Read a file as UTF-8, replacing invalid bytes instead of raising |
+| `get_page_count_for_pdf(path)`          | Count pages in a PDF using pikepdf                               |
+| `extract_pdf_metadata(path)`            | Extract XMP metadata from a PDF as a `list[MetadataEntry]`       |
+
+#### Minimal example
+
+A complete, working parser for a hypothetical plain-XML format:
+
+```python
+from __future__ import annotations
+
+import shutil
+import tempfile
+import xml.etree.ElementTree as ET
+from pathlib import Path
+from types import TracebackType
+from typing import Self
+
+from django.conf import settings
+
+from documents.parsers import ParseError
+from paperless.parsers import ParserContext
+
+
+class XmlDocumentParser:
+    name    = "XML Parser"
+    version = "1.0.0"
+    author  = "Acme Corp"
+    url     = "https://example.com/xml-parser"
+
+    @classmethod
+    def supported_mime_types(cls) -> dict[str, str]:
+        return {"application/xml": ".xml", "text/xml": ".xml"}
+
+    @classmethod
+    def score(cls, mime_type: str, filename: str, path: Path | None = None) -> int | None:
+        return 10
+
+    @property
+    def can_produce_archive(self) -> bool:
+        return False
+
+    @property
+    def requires_pdf_rendition(self) -> bool:
+        return False
+
+    def __init__(self, logging_group: object = None) -> None:
+        settings.SCRATCH_DIR.mkdir(parents=True, exist_ok=True)
+        self._tempdir = Path(tempfile.mkdtemp(prefix="paperless-", dir=settings.SCRATCH_DIR))
+        self._text: str | None = None
+
+    def __enter__(self) -> Self:
+        return self
+
+    def __exit__(
+        self,
+        exc_type: type[BaseException] | None,
+        exc_val: BaseException | None,
+        exc_tb: TracebackType | None,
+    ) -> None:
+        shutil.rmtree(self._tempdir, ignore_errors=True)
+
+    def configure(self, context: ParserContext) -> None:
+        pass
+
+    def parse(self, document_path: Path, mime_type: str, *, produce_archive: bool = True) -> None:
+        try:
+            tree = ET.parse(document_path)
+            self._text = " ".join(tree.getroot().itertext())
+        except ET.ParseError as e:
+            raise ParseError(f"XML parse error: {e}") from e
+
+    def get_text(self) -> str | None:
+        return self._text
+
+    def get_date(self):
+        return None
+
+    def get_archive_path(self) -> Path | None:
+        return None
+
+    def get_thumbnail(self, document_path: Path, mime_type: str) -> Path:
+        from PIL import Image, ImageDraw
+        img = Image.new("RGB", (500, 700), color="white")
+        ImageDraw.Draw(img).text((10, 10), "XML Document", fill="black")
+        out = self._tempdir / "thumb.webp"
+        img.save(out, format="WEBP")
+        return out
+
+    def get_page_count(self, document_path: Path, mime_type: str) -> int | None:
+        return None
+
+    def extract_metadata(self, document_path: Path, mime_type: str) -> list:
+        return []
+```
+
+### Developing date parser plugins

 Paperless-ngx uses a plugin system for date parsing, allowing you to extend or replace the default date parsing behavior. Plugins are discovered using [Python entry points](https://setuptools.pypa.io/en/latest/userguide/entry_point.html).

-### Creating a Date Parser Plugin
+#### Creating a Date Parser Plugin

 To create a custom date parser plugin, you need to:

@@ -492,7 +738,7 @@ To create a custom date parser plugin, you need to:
 2. Implement the required abstract method
 3. Register your plugin via an entry point

-#### 1. Implementing the Parser Class
+##### 1. Implementing the Parser Class

 Your parser must extend `documents.plugins.date_parsing.DateParserPluginBase` and implement the `parse` method:

@@ -532,7 +778,7 @@ class MyDateParserPlugin(DateParserPluginBase):
        yield another_datetime
 ```

-#### 2. Configuration and Helper Methods
+##### 2. Configuration and Helper Methods

 Your parser instance is initialized with a `DateParserConfig` object accessible via `self.config`. This provides:

@@ -565,11 +811,11 @@ def _filter_date(
    """
 ```

-#### 3. Resource Management (Optional)
+##### 3. Resource Management (Optional)

 If your plugin needs to acquire or release resources (database connections, API clients, etc.), override the context manager methods. Paperless-ngx will always use plugins as context managers, ensuring resources can be released even in the event of errors.

-#### 4. Registering Your Plugin
+##### 4. Registering Your Plugin

 Register your plugin using a setuptools entry point in your package's `pyproject.toml`:

@@ -580,7 +826,7 @@ my_parser = "my_package.parsers:MyDateParserPlugin"

 The entry point name (e.g., `"my_parser"`) is used for sorting when multiple plugins are found. Paperless-ngx will use the first plugin alphabetically by name if multiple plugins are discovered.

-### Plugin Discovery
+#### Plugin Discovery

 Paperless-ngx automatically discovers and loads date parser plugins at runtime. The discovery process:

@@ -591,7 +837,7 @@ Paperless-ngx automatically discovers and loads date parser plugins at runtime.

 If multiple plugins are installed, a warning is logged indicating which plugin was selected.

-### Example: Simple Date Parser
+#### Example: Simple Date Parser

 Here's a minimal example that only looks for ISO 8601 dates:

@@ -623,3 +869,30 @@ class ISODateParserPlugin(DateParserPluginBase):
            if filtered_date is not None:
                yield filtered_date
 ```
+
+## Using Visual Studio Code devcontainer
+
+Another easy way to get started with development is to use Visual Studio
+Code devcontainers. This approach will create a preconfigured development
+environment with all of the required tools and dependencies.
+[Learn more about devcontainers](https://code.visualstudio.com/docs/devcontainers/containers).
+The `.devcontainer/vscode/tasks.json` and `.devcontainer/vscode/launch.json` files
+contain more information about the specific tasks and launch configurations (see the
+non-standard "description" field).
+
+To get started:
+
+1. Clone the repository on your machine and open the Paperless-ngx folder in VS Code.
+
+2. VS Code will prompt you with "Reopen in container". Do so and wait for the environment to start.
+
+3. In case your host operating system is Windows:
+   - The Source Control view in Visual Studio Code might show: "The detected Git repository is potentially unsafe as the folder is owned by someone other than the current user." Use "Manage Unsafe Repositories" to fix this.
+   - Git might have detected modifications for all files, because Windows uses CRLF line endings. Run `git checkout .` in the container's terminal to fix this issue.
+
+4. Initialize the project by running the task **Project Setup: Run all Init Tasks**. This
+   will initialize the database tables and create a superuser. Then you can compile the frontend
+   for production or run the frontend in debug mode.
+
+5. The project is ready for debugging. Start either the fullstack debug configuration or the individual debug
+   processes. To spin up the project without debugging, run the task **Project Start: Run all Services**.
@@ -1,5 +1,24 @@
 # v3 Migration Guide

+## Secret Key is Now Required
+
+The `PAPERLESS_SECRET_KEY` environment variable is now required. This is a critical security setting used for cryptographic signing and should be set to a long, random value.
+
+### Action Required
+
+If you are upgrading an existing installation, you must now set `PAPERLESS_SECRET_KEY` explicitly.
+
+If your installation was relying on the previous built-in default key, you have two options:
+
+- Set `PAPERLESS_SECRET_KEY` to that previous value to preserve existing sessions and tokens.
+- Set `PAPERLESS_SECRET_KEY` to a new random value to improve security, understanding that this will invalidate existing sessions and other signed tokens.
+
+For new installations, or if you choose to rotate the key, you may generate a new secret key with:
+
+```bash
+python3 -c "import secrets; print(secrets.token_urlsafe(64))"
+```
+
 ## Consumer Settings Changes

 The v3 consumer command uses a [different library](https://watchfiles.helpmanual.io/) to unify
@@ -103,3 +122,61 @@ Multiple options are combined in a single value:
 ```bash
 PAPERLESS_DB_OPTIONS="sslmode=require;sslrootcert=/certs/ca.pem;pool.max_size=10"
 ```
+
+## Search Index (Whoosh -> Tantivy)
+
+The full-text search backend has been replaced with [Tantivy](https://github.com/quickwit-oss/tantivy).
+The index format is incompatible with Whoosh, so **the search index is automatically rebuilt from
+scratch on first startup after upgrading**. No manual action is required for the rebuild itself.
+
+### Note and custom field search syntax
+
+The old Whoosh index exposed `note` and `custom_field` as flat text fields that were included in
+unqualified searches (e.g. just typing `invoice` would match note content). With Tantivy these are
+now structured JSON fields accessed via dotted paths:
+
+| Old syntax           | New syntax                  |
+| -------------------- | --------------------------- |
+| `note:query`         | `notes.note:query`          |
+| `custom_field:query` | `custom_fields.value:query` |
+
+**Saved views are migrated automatically.** Any saved view filter rule that used an explicit
+`note:` or `custom_field:` field prefix in a fulltext query is rewritten to the new syntax by a
+data migration that runs on upgrade.
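+
+If you keep search queries outside of saved views (scripts, bookmarks, API clients), you will need to update them yourself. The sketch below applies the two renames from the table above with a regular expression. It is an illustration only, not the project's data migration, and `rewrite_query` is a hypothetical name.
+
+```python
+import re
+
+# Mapping taken from the table above.
+_FIELD_RENAMES = {
+    "note": "notes.note",
+    "custom_field": "custom_fields.value",
+}
+
+# Match an old prefix only where it starts a term, i.e. when it is not
+# preceded by a word character or a dot (so `notes.note:` is left alone).
+_PREFIX_RE = re.compile(r"(?<![\w.])(note|custom_field):")
+
+
+def rewrite_query(query: str) -> str:
+    """Rewrite old-style field prefixes to the dotted-path syntax."""
+    return _PREFIX_RE.sub(lambda m: f"{_FIELD_RENAMES[m.group(1)]}:", query)
+```
+
+For example, `rewrite_query("note:tax AND custom_field:2024")` returns `notes.note:tax AND custom_fields.value:2024`, while a query that already uses the new syntax is returned unchanged.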
+
+**Unqualified queries are not migrated.** If you had a saved view with a plain search term (e.g.
+`invoice`) that happened to match note content or custom field values, it will no longer return
+those matches. Update those queries to use the explicit prefix, for example:
+
+```
+invoice OR notes.note:invoice OR custom_fields.value:invoice
+```
+
+Custom field names can also be searched with `custom_fields.name:fieldname`.
+
+## OpenID Connect Token Endpoint Authentication
+
+Some existing OpenID Connect setups may require an explicit token endpoint authentication method after upgrading to v3.
+
+### Action Required
+
+If OIDC login fails at the callback with an `invalid_client` error, add `token_auth_method` to the provider `settings` in
+[`PAPERLESS_SOCIALACCOUNT_PROVIDERS`](configuration.md#PAPERLESS_SOCIALACCOUNT_PROVIDERS).
+
+For example:
+
+```json
+{
+  "openid_connect": {
+    "APPS": [
+      {
+        ...
+        "settings": {
+          "server_url": "https://login.example.com",
+          "token_auth_method": "client_secret_basic"
+        }
+      }
+    ]
+  }
+}
+```
@@ -140,24 +140,17 @@ a [superuser](usage.md#superusers) account.

 !!! warning

-    It is currently not possible to run the container rootless if additional languages are specified via `PAPERLESS_OCR_LANGUAGES`.
+    It is not possible to run the container rootless if additional languages are specified via `PAPERLESS_OCR_LANGUAGES`.

-If you want to run Paperless as a rootless container, make this
-change in `docker-compose.yml`:
+If you want to run Paperless as a rootless container, set `user:` in `docker-compose.yml` to the UID and GID of your host user (use `id -u` and `id -g` to find these values). The container process starts directly as that user with no internal privilege remapping:

- Set the `user` running the container to map to the `paperless`
-  user in the container. This value (`user_id` below) should be
-  the same ID that `USERMAP_UID` and `USERMAP_GID` are set to in
-  `docker-compose.env`. See `USERMAP_UID` and `USERMAP_GID`
-  [here](configuration.md#docker).
+```yaml
+webserver:
+  image: ghcr.io/paperless-ngx/paperless-ngx:latest
+  user: '1000:1000'
+```

-Your entry for Paperless should contain something like:
-
-> ```
-> webserver:
->   image: ghcr.io/paperless-ngx/paperless-ngx:latest
->   user: <user_id>
-> ```
+Do not combine this with `USERMAP_UID` or `USERMAP_GID`, which are intended for the non-rootless case described in step 3.

 **File systems without inotify support (e.g. NFS)**

@@ -814,13 +814,20 @@ contract you signed 8 years ago).

 When you search paperless for a document, it tries to match this query
 against your documents. Paperless will look for matching documents by
-inspecting their content, title, correspondent, type and tags. Paperless
-returns a scored list of results, so that documents matching your query
-better will appear further up in the search results.
+inspecting their content, title, correspondent, type, tags, notes, and
+custom field values. Paperless returns a scored list of results, so that
+documents matching your query better will appear further up in the search
+results.

 By default, paperless returns only documents which contain all words
-typed in the search bar. However, paperless also offers advanced search
-syntax if you want to drill down the results further.
+typed in the search bar. A few things to know about how matching works:
+
+- **Word-order-independent**: "invoice unpaid" and "unpaid invoice" return the same results.
+- **Accent-insensitive**: searching `resume` also finds `résumé`, `cafe` finds `café`.
+- **Separator-agnostic**: punctuation and separators are stripped during indexing, so
+  searching a partial number like `1312` finds documents containing `A-1312/B`.
+
+Paperless also offers advanced search syntax if you want to drill down further.

 Matching documents with logical expressions:

@@ -849,18 +856,69 @@ Matching inexact words:
 produ*name
 ```

+Matching natural date keywords:
+
+```
+added:today
+modified:yesterday
+created:this_week
+added:last_month
+modified:this_year
+```
+
+Supported date keywords: `today`, `yesterday`, `this_week`, `last_week`,
+`this_month`, `last_month`, `this_year`, `last_year`.
+
+#### Searching custom fields
+
+Custom field values are included in the full-text index, so a plain search
+already matches documents whose custom field values contain your search terms.
+To narrow by field name or value specifically:
+
+```
+custom_fields.value:policy
+custom_fields.name:"Contract Number"
+custom_fields.name:Insurance custom_fields.value:policy
+```
+
+- `custom_fields.value` matches against the value of any custom field.
+- `custom_fields.name` matches the name of the field (use quotes for multi-word names).
+- Combine both to find documents where a specific named field contains a specific value.
+
+Because separators are stripped during indexing, individual parts of formatted
+codes are searchable on their own. A value stored as `A-1312/99.50` produces the
+tokens `a`, `1312`, `99`, `50` — each searchable independently:
+
+```
+custom_fields.value:1312
+custom_fields.name:"Contract Number" custom_fields.value:1312
+```
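+
+To build intuition for which tokens a value produces, the following sketch approximates the behavior described above: lowercase the text, strip accents, and split on anything that is not a letter or digit. It is an approximation written for this guide, not the tokenizer the search index actually uses, and `tokenize` is a hypothetical name.
+
+```python
+import re
+import unicodedata
+
+
+def tokenize(value: str) -> list[str]:
+    """Approximate index tokens: accent-free, lowercase, separator-split."""
+    # Decompose accented characters, then drop the combining marks.
+    decomposed = unicodedata.normalize("NFKD", value)
+    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
+    # Split on runs of non-word characters (and underscores); drop empty pieces.
+    return [tok for tok in re.split(r"[\W_]+", stripped.lower()) if tok]
+```
+
+With this sketch, `tokenize("A-1312/99.50")` returns `["a", "1312", "99", "50"]` and `tokenize("Résumé")` returns `["resume"]`, which matches the examples given in this section.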
+
 !!! note

-    Inexact terms are hard for search indexes. These queries might take a
-    while to execute. That's why paperless offers auto complete and query
-    correction.
+    Custom date fields do not support relative date syntax (e.g. `[now to 2 weeks]`).
+    For date ranges on custom date fields, use the document list filters in the web UI.
+
+#### Searching notes
+
+Notes content is included in full-text search automatically. To search
+by note author or content specifically:
+
+```
+notes.user:alice
+notes.note:reminder
+notes.user:alice notes.note:insurance
+```

 All of these constructs can be combined as you see fit. If you want to
-learn more about the query language used by paperless, paperless uses
-Whoosh's default query language. Head over to [Whoosh query
-language](https://whoosh.readthedocs.io/en/latest/querylang.html). For
-details on what date parsing utilities are available, see [Date
-parsing](https://whoosh.readthedocs.io/en/latest/dates.html#parsing-date-queries).
+learn more about the query language used by paperless, see the
+[Tantivy query language documentation](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).
+
+!!! note
+
+    Fuzzy (approximate) matching can be enabled by setting
+    [`PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD`](configuration.md#PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD).
+    When enabled, paperless will include near-miss results ranked below exact matches.

 ## Keyboard shortcuts / hotkeys