Docs: update search documentation for Tantivy backend

- configuration.md: add PAPERLESS_SEARCH_LANGUAGE and
  PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD settings
- usage.md: replace Whoosh query language link with Tantivy; remove
  "inexact terms are slow" note; add full natural date keyword list;
  add fuzzy search note
- api.md: update autocomplete ordering description (alphabetical, not
  Tf/Idf)
- administration.md: deprecate `optimize` subcommand (now a no-op); add
  one-time reindex upgrade note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 02:55:24 +00:00 · 2026-03-30 13:19:27 -07:00
parent 7f63259f41
commit b626f5602c
4 changed files with 50 additions and 15 deletions
@@ -459,11 +459,20 @@ document_index {reindex,optimize}
 Specify `reindex` to have the index created from scratch. This may take
 some time.

-Specify `optimize` to optimize the index. This updates certain aspects
-of the index and usually makes queries faster and also ensures that the
-autocompletion works properly. This command is regularly invoked by the
+Specify `optimize` to optimize the index. This command is regularly invoked by the
 task scheduler.

+!!! note
+
+    The `optimize` subcommand is deprecated and is now a no-op. Tantivy manages
+    segment merging automatically; no manual optimization step is needed.
+
+!!! note
+
+    On first startup after upgrading from a previous version, paperless detects
+    that the index format has changed and automatically performs a one-time full
+    reindex. No manual migration step is required.
+
 ### Clearing the database read cache

 If the database read cache is enabled, **you must run this command** after making any changes to the database outside the application context.
@@ -167,9 +167,8 @@ Query parameters:
 - `term`: The incomplete term.
 - `limit`: Amount of results. Defaults to 10.

-Results returned by the endpoint are ordered by importance of the term
-in the document index. The first result is the term that has the highest
-[Tf/Idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) score in the index.
+Results are ordered alphabetically by prefix match. The first result is
+the lexicographically first word in the index that starts with the given term.

 ```json
 ["term1", "term3", "term6", "term4"]
@@ -1103,6 +1103,23 @@ should be a valid crontab(5) expression describing when to run.

    Defaults to `0 0 * * *` or daily at midnight.

+#### [`PAPERLESS_SEARCH_LANGUAGE=<language>`](#PAPERLESS_SEARCH_LANGUAGE) {#PAPERLESS_SEARCH_LANGUAGE}
+
+: Sets the stemmer language for the full-text search index (e.g. `en`, `de`, `fr`).
+Stemming improves recall by matching word variants (e.g. "running" matches "run").
+Changing this setting causes the index to be rebuilt automatically on next startup.
+Supported values are the language names accepted by Tantivy's built-in stemmer.
+
+    Defaults to `""` (no stemming).
+
+#### [`PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD=<float>`](#PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD) {#PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD}
+
+: When set to a float value, approximate/fuzzy matching is applied alongside exact
+matching. Fuzzy results rank below exact matches. A value of `0.5` is a reasonable
+starting point. Leave unset to disable fuzzy matching entirely.
+
+    Defaults to unset (disabled).
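
Reviewer sketch (not part of the commit): the "unset means disabled, a float means enabled" behaviour described above could be read from the environment roughly as follows. The function name `read_fuzzy_threshold` is hypothetical, and treating an empty string the same as unset is an assumption; the actual paperless settings code may parse this differently.

```python
import os
from typing import Mapping, Optional

# Setting name taken from the documentation hunk above.
ENV_NAME = "PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD"


def read_fuzzy_threshold(env: Mapping[str, str] = os.environ) -> Optional[float]:
    """Return the fuzzy threshold as a float, or None when fuzzy search is disabled.

    Assumption: an empty or whitespace-only value counts as unset.
    """
    raw = env.get(ENV_NAME, "").strip()
    if not raw:
        return None  # unset: fuzzy matching stays disabled
    return float(raw)  # raises ValueError on a non-numeric value
```

Passing the environment as a mapping keeps the helper easy to exercise without mutating `os.environ`, for example `read_fuzzy_threshold({ENV_NAME: "0.5"})`.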
+
 #### [`PAPERLESS_SANITY_TASK_CRON=<cron expression>`](#PAPERLESS_SANITY_TASK_CRON) {#PAPERLESS_SANITY_TASK_CRON}

 : Configures the scheduled sanity checker frequency. The value should be a
@@ -839,18 +839,28 @@ Matching inexact words:
 produ*name
 ```

-!!! note
+Matching natural date keywords:

-    Inexact terms are hard for search indexes. These queries might take a
-    while to execute. That's why paperless offers auto complete and query
-    correction.
+```
+added:today
+modified:yesterday
+created:this_week
+added:last_month
+modified:this_year
+```
+
+Supported date keywords: `today`, `yesterday`, `this_week`, `last_week`,
+`this_month`, `last_month`, `this_year`, `last_year`.
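
Reviewer sketch (not part of the commit): one plausible reading of the eight keywords listed above is that each resolves to an inclusive calendar range relative to the current day. The helper below illustrates that reading only. The name `resolve_date_keyword` is hypothetical, and the Monday-to-Sunday week boundary is an assumption; the docs in this diff do not say how paperless defines a week or handles time zones.

```python
from datetime import date, timedelta
from typing import Tuple


def resolve_date_keyword(keyword: str, today: date) -> Tuple[date, date]:
    """Return an inclusive (start, end) date range for a natural date keyword.

    Assumption: weeks run Monday through Sunday.
    """
    if keyword == "today":
        return today, today
    if keyword == "yesterday":
        day = today - timedelta(days=1)
        return day, day
    if keyword in ("this_week", "last_week"):
        # date.weekday() is 0 for Monday, so this snaps back to the week's Monday.
        monday = today - timedelta(days=today.weekday())
        if keyword == "last_week":
            monday -= timedelta(days=7)
        return monday, monday + timedelta(days=6)
    if keyword in ("this_month", "last_month"):
        first = today.replace(day=1)
        if keyword == "last_month":
            # Step back one day into the previous month, then snap to its first day.
            first = (first - timedelta(days=1)).replace(day=1)
        # Day 28 plus 4 days always lands in the following month.
        next_first = (first.replace(day=28) + timedelta(days=4)).replace(day=1)
        return first, next_first - timedelta(days=1)
    if keyword in ("this_year", "last_year"):
        year = today.year - (1 if keyword == "last_year" else 0)
        return date(year, 1, 1), date(year, 12, 31)
    raise ValueError(f"unsupported date keyword: {keyword!r}")
```

Under these assumptions, a query such as `added:last_month` would cover the whole previous calendar month, not the trailing 30 days.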

 All of these constructs can be combined as you see fit. If you want to
-learn more about the query language used by paperless, paperless uses
-Whoosh's default query language. Head over to [Whoosh query
-language](https://whoosh.readthedocs.io/en/latest/querylang.html). For
-details on what date parsing utilities are available, see [Date
-parsing](https://whoosh.readthedocs.io/en/latest/dates.html#parsing-date-queries).
+learn more about the query language used by paperless, see the
+[Tantivy query language documentation](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).
+
+!!! note
+
+    Fuzzy (approximate) matching can be enabled by setting
+    [`PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD`](configuration.md#PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD).
+    When enabled, paperless will include near-miss results ranked below exact matches.

 ## Keyboard shortcuts / hotkeys