Try to further clarify some interactions

Trenton H
2026-03-27 08:04:07 -07:00
parent de97eea3e2
commit 84ab36ba70
2 changed files with 30 additions and 4 deletions


@@ -853,12 +853,31 @@ for display in the web interface.
stored. This saves disk space, but the web viewer will display the
original file directly.
**Behaviour by file type and mode** (`auto` column shows the default):
| Document type | `never` | `auto` (default) | `always` |
| -------------------------- | ------- | -------------------------- | -------- |
| Scanned image (TIFF, JPEG) | No | **Yes** | Yes |
| Image-based PDF | No | **Yes** (short/no text) | Yes |
| Born-digital PDF | No | No (has embedded text) | Yes |
| Plain text, email, HTML | No | No | No |
| DOCX / ODT (via Tika) | Yes\* | Yes\* | Yes\* |
\* Tika always produces a PDF rendition for display; this counts as
the archive regardless of the setting.
!!! note
This setting applies to the built-in Tesseract parser. Parsers
that must always convert documents to PDF for display (e.g. DOCX,
ODT via Tika) will produce a PDF regardless of this setting.
!!! note
The **remote OCR parser** (Azure AI) always produces a searchable
PDF and stores it as the archive copy, regardless of this setting.
`PAPERLESS_ARCHIVE_FILE_GENERATION=never` has no effect when the remote
parser handles a document.
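The table and the two notes above combine into a single decision: parser-forced archives (remote OCR, Tika) win first, text-like formats never get an archive, and only then does the mode apply. The Python sketch below restates that documented behaviour as a function. `Mode`, `DocType`, and `produces_archive` are hypothetical names chosen for illustration, not Paperless's internal API, and the `auto` branch only mirrors the table; it does not show how Paperless actually decides that a PDF has "short/no text".

```python
from enum import Enum


class Mode(Enum):
    """Values accepted by PAPERLESS_ARCHIVE_FILE_GENERATION."""
    NEVER = "never"
    AUTO = "auto"
    ALWAYS = "always"


class DocType(Enum):
    """Hypothetical document categories, one per row of the table."""
    SCANNED_IMAGE = "scanned_image"        # TIFF, JPEG
    IMAGE_PDF = "image_pdf"                # PDF with short or no text
    BORN_DIGITAL_PDF = "born_digital_pdf"  # PDF with embedded text
    PLAIN_TEXT = "plain_text"              # plain text, email, HTML
    OFFICE = "office"                      # DOCX / ODT via Tika


def produces_archive(doc: DocType, mode: Mode, remote_parser: bool = False) -> bool:
    """Return True if an archive PDF would be stored (sketch of the documented rules)."""
    # The remote OCR parser always stores its searchable PDF as the archive.
    if remote_parser:
        return True
    # Tika always renders a PDF for display, and that PDF counts as the archive.
    if doc is DocType.OFFICE:
        return True
    # Plain text, email and HTML never get an archive, whatever the mode.
    if doc is DocType.PLAIN_TEXT:
        return False
    if mode is Mode.NEVER:
        return False
    if mode is Mode.ALWAYS:
        return True
    # auto: archive only sources that lack usable embedded text.
    return doc in (DocType.SCANNED_IMAGE, DocType.IMAGE_PDF)


if __name__ == "__main__":
    # Print one row per document type, with columns in never/auto/always order.
    for doc in DocType:
        print(f"{doc.value:18}", [produces_archive(doc, m) for m in Mode])
```

Checking the parser-forced cases before the mode is what makes `never` ineffective for Tika and the remote parser, matching the footnote and the notes.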
#### [`PAPERLESS_OCR_CLEAN=<mode>`](#PAPERLESS_OCR_CLEAN) {#PAPERLESS_OCR_CLEAN}


@@ -156,6 +156,13 @@ PAPERLESS_ARCHIVE_FILE_GENERATION=auto
Paperless will emit a startup warning if the old environment variables are still set.
### Remote OCR parser
If you use the **remote OCR parser** (Azure AI), note that it always produces a
searchable PDF and stores it as the archive copy. `PAPERLESS_ARCHIVE_FILE_GENERATION=never`
has no effect for documents handled by the remote parser; the remote engine produces
the archive unconditionally.
## OpenID Connect Token Endpoint Authentication
Some existing OpenID Connect setups may require an explicit token endpoint authentication method after upgrading to v3.