diff --git a/docs/administration.md b/docs/administration.md
index e55b899f5..9d123cd38 100644
--- a/docs/administration.md
+++ b/docs/administration.md
@@ -459,11 +459,20 @@ document_index {reindex,optimize}
 Specify `reindex` to have the index created from scratch. This may take
 some time.
 
-Specify `optimize` to optimize the index. This updates certain aspects
-of the index and usually makes queries faster and also ensures that the
-autocompletion works properly. This command is regularly invoked by the
+Specify `optimize` to optimize the index. This command is regularly invoked by the
 task scheduler.
 
+!!! note
+
+    The `optimize` subcommand is deprecated and is now a no-op. Tantivy manages
+    segment merging automatically; no manual optimization step is needed.
+
+!!! note
+
+    On first startup after upgrading from a previous version, paperless detects
+    that the index format has changed and automatically performs a one-time full
+    reindex. No manual migration step is required.
+
 ### Clearing the database read cache
 
 If the database read cache is enabled, **you must run this command** after making any changes to the database outside the application context.
diff --git a/docs/api.md b/docs/api.md
index bd550c519..23fd2dd05 100644
--- a/docs/api.md
+++ b/docs/api.md
@@ -167,9 +167,8 @@ Query parameters:
 - `term`: The incomplete term.
 - `limit`: Amount of results. Defaults to 10.
 
-Results returned by the endpoint are ordered by importance of the term
-in the document index. The first result is the term that has the highest
-[Tf/Idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) score in the index.
+Results are ordered alphabetically by prefix match. The first result is
+the lexicographically first word in the index that starts with the given term.
 
 ```json
 ["term1", "term3", "term6", "term4"]
diff --git a/docs/configuration.md b/docs/configuration.md
index 4ce2d9dc6..0482e9ee4 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1103,6 +1103,23 @@ should be a valid crontab(5) expression describing when to run.
 
     Defaults to `0 0 * * *` or daily at midnight.
 
+#### [`PAPERLESS_SEARCH_LANGUAGE=`](#PAPERLESS_SEARCH_LANGUAGE) {#PAPERLESS_SEARCH_LANGUAGE}
+
+: Sets the stemmer language for the full-text search index (e.g. `en`, `de`, `fr`).
+Stemming improves recall by matching word variants (e.g. "running" matches "run").
+Changing this setting causes the index to be rebuilt automatically on next startup.
+Supported values are the language names accepted by Tantivy's built-in stemmer.
+
+    Defaults to `""` (no stemming).
+
+#### [`PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD=`](#PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD) {#PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD}
+
+: When set to a float value, approximate/fuzzy matching is applied alongside exact
+matching. Fuzzy results rank below exact matches. A value of `0.5` is a reasonable
+starting point. Leave unset to disable fuzzy matching entirely.
+
+    Defaults to unset (disabled).
+
 #### [`PAPERLESS_SANITY_TASK_CRON=`](#PAPERLESS_SANITY_TASK_CRON) {#PAPERLESS_SANITY_TASK_CRON}
 
 : Configures the scheduled sanity checker frequency. The value should be a
diff --git a/docs/usage.md b/docs/usage.md
index 6da6c4d77..97d1a3550 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -839,18 +839,28 @@ Matching inexact words:
 produ*name
 ```
 
-!!! note
+Matching natural date keywords:
 
-    Inexact terms are hard for search indexes. These queries might take a
-    while to execute. That's why paperless offers auto complete and query
-    correction.
+```
+added:today
+modified:yesterday
+created:this_week
+added:last_month
+modified:this_year
+```
+
+Supported date keywords: `today`, `yesterday`, `this_week`, `last_week`,
+`this_month`, `last_month`, `this_year`, `last_year`.
 
 All of these constructs can be combined as you see fit. If you want to
-learn more about the query language used by paperless, paperless uses
-Whoosh's default query language. Head over to [Whoosh query
-language](https://whoosh.readthedocs.io/en/latest/querylang.html). For
-details on what date parsing utilities are available, see [Date
-parsing](https://whoosh.readthedocs.io/en/latest/dates.html#parsing-date-queries).
+learn more about the query language used by paperless, see the
+[Tantivy query language documentation](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).
+
+!!! note
+
+    Fuzzy (approximate) matching can be enabled by setting
+    [`PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD`](configuration.md#PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD).
+    When enabled, paperless will include near-miss results ranked below exact matches.
 
 ## Keyboard shortcuts / hotkeys
 