Docs: update search documentation for Tantivy backend

- configuration.md: add PAPERLESS_SEARCH_LANGUAGE and
  PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD settings
- usage.md: replace Whoosh query language link with Tantivy; remove
  "inexact terms are slow" note; add full natural date keyword list;
  add fuzzy search note
- api.md: update autocomplete ordering description (alphabetical, not Tf/Idf)
- administration.md: deprecate `optimize` subcommand (now a no-op);
  add one-time reindex upgrade note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Trenton H
2026-03-30 13:19:27 -07:00
parent 7f63259f41
commit b626f5602c
4 changed files with 50 additions and 15 deletions


@@ -459,11 +459,20 @@ document_index {reindex,optimize}
Specify `reindex` to have the index created from scratch. This may take
some time.
Specify `optimize` to optimize the index. This updates certain aspects
of the index and usually makes queries faster and also ensures that the
autocompletion works properly. This command is regularly invoked by the
Specify `optimize` to optimize the index. This command is regularly invoked by the
task scheduler.
!!! note
The `optimize` subcommand is deprecated and is now a no-op. Tantivy manages
segment merging automatically; no manual optimization step is needed.
!!! note
On first startup after upgrading from a previous version, paperless detects
that the index format has changed and automatically performs a one-time full
reindex. No manual migration step is required.
### Clearing the database read cache
If the database read cache is enabled, **you must run this command** after making any changes to the database outside the application context.


@@ -167,9 +167,8 @@ Query parameters:
- `term`: The incomplete term.
- `limit`: Amount of results. Defaults to 10.
Results returned by the endpoint are ordered by importance of the term
in the document index. The first result is the term that has the highest
[Tf/Idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) score in the index.
Results are ordered alphabetically by prefix match. The first result is
the lexicographically first word in the index that starts with the given term.
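The ordering described above can be sketched as follows. This is a minimal illustration and not the paperless implementation: `autocomplete` and the `words` list are hypothetical, and the sketch assumes the index vocabulary is available as a sorted list of terms.

```python
from bisect import bisect_left


def autocomplete(words, term, limit=10):
    """Return up to `limit` words starting with `term`, in lexicographic order.

    `words` is a hypothetical, already sorted list of index terms.
    """
    # Find the first word >= term. In a sorted list, all words sharing the
    # prefix follow contiguously from that position.
    start = bisect_left(words, term)
    matches = []
    for word in words[start:]:
        if len(matches) >= limit or not word.startswith(term):
            break
        matches.append(word)
    return matches


# Illustrative vocabulary only; not taken from a real index.
words = sorted(["invoice", "insurance", "income", "index", "tax"])
print(autocomplete(words, "in", limit=3))
```

Because the list is sorted, the first element of the result is the lexicographically first word with the given prefix, which matches the ordering stated above.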
```json
["term1", "term3", "term6", "term4"]


@@ -1103,6 +1103,23 @@ should be a valid crontab(5) expression describing when to run.
Defaults to `0 0 * * *` or daily at midnight.
#### [`PAPERLESS_SEARCH_LANGUAGE=<language>`](#PAPERLESS_SEARCH_LANGUAGE) {#PAPERLESS_SEARCH_LANGUAGE}
: Sets the stemmer language for the full-text search index (e.g. `en`, `de`, `fr`).
Stemming improves recall by matching word variants (e.g. "running" matches "run").
Changing this setting causes the index to be rebuilt automatically on next startup.
Supported values are the language names accepted by Tantivy's built-in stemmer.
Defaults to `""` (no stemming).
#### [`PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD=<float>`](#PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD) {#PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD}
: When set to a float value, approximate/fuzzy matching is applied alongside exact
matching. Fuzzy results rank below exact matches. A value of `0.5` is a reasonable
starting point. Leave unset to disable fuzzy matching entirely.
Defaults to unset (disabled).
#### [`PAPERLESS_SANITY_TASK_CRON=<cron expression>`](#PAPERLESS_SANITY_TASK_CRON) {#PAPERLESS_SANITY_TASK_CRON}
: Configures the scheduled sanity checker frequency. The value should be a


@@ -839,18 +839,28 @@ Matching inexact words:
produ*name
```
!!! note
Matching natural date keywords:
Inexact terms are hard for search indexes. These queries might take a
while to execute. That's why paperless offers auto complete and query
correction.
```
added:today
modified:yesterday
created:this_week
added:last_month
modified:this_year
```
Supported date keywords: `today`, `yesterday`, `this_week`, `last_week`,
`this_month`, `last_month`, `this_year`, `last_year`.
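One plausible reading of these keywords is as inclusive calendar ranges relative to the current day. The sketch below illustrates that reading; `keyword_range` is a hypothetical helper, and the choices it makes (weeks start on Monday, each keyword covers the full calendar period) are assumptions rather than documented behavior of the search backend.

```python
from datetime import date, timedelta


def keyword_range(keyword, today):
    """Map a date keyword to an inclusive (start, end) pair of dates.

    Assumptions: weeks start on Monday, and each keyword covers the whole
    calendar period. The actual backend may resolve keywords differently.
    """
    if keyword == "today":
        return today, today
    if keyword == "yesterday":
        day = today - timedelta(days=1)
        return day, day
    if keyword in ("this_week", "last_week"):
        monday = today - timedelta(days=today.weekday())
        if keyword == "last_week":
            monday -= timedelta(days=7)
        return monday, monday + timedelta(days=6)
    if keyword in ("this_month", "last_month"):
        first = today.replace(day=1)
        if keyword == "last_month":
            first = (first - timedelta(days=1)).replace(day=1)
        # First day of the following month, minus one day, is the month's end.
        next_first = (first + timedelta(days=32)).replace(day=1)
        return first, next_first - timedelta(days=1)
    if keyword in ("this_year", "last_year"):
        year = today.year - (1 if keyword == "last_year" else 0)
        return date(year, 1, 1), date(year, 12, 31)
    raise ValueError(f"unsupported date keyword: {keyword}")


start, end = keyword_range("last_month", date(2026, 3, 30))
print(start.isoformat(), end.isoformat())
```

Under these assumptions, a query such as `added:last_month` would correspond to filtering on the returned start and end dates.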
All of these constructs can be combined as you see fit. If you want to
learn more about the query language used by paperless, paperless uses
Whoosh's default query language. Head over to [Whoosh query
language](https://whoosh.readthedocs.io/en/latest/querylang.html). For
details on what date parsing utilities are available, see [Date
parsing](https://whoosh.readthedocs.io/en/latest/dates.html#parsing-date-queries).
learn more about the query language used by paperless, see the
[Tantivy query language documentation](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).
!!! note
Fuzzy (approximate) matching can be enabled by setting
[`PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD`](configuration.md#PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD).
When enabled, paperless will include near-miss results ranked below exact matches.
## Keyboard shortcuts / hotkeys