Chore: reset Docker index version to 1 (Tantivy); add language change detection

- Reset index_version to 1 — Tantivy is a full format change so
  versioning restarts from scratch; all existing v9 installs trigger
  an automatic reindex on next container start
- Add PAPERLESS_SEARCH_LANGUAGE change detection: track raw env var in
  .index_language so changing the language setting auto-reindexes;
  raw env var (not resolved language) avoids false positives from
  OCR_LANGUAGE inference
- docs/administration.md: clarify that Docker handles the post-upgrade
  reindex automatically; bare metal users need to run
  document_index reindex manually; add that as step 4 in the
  bare metal upgrade guide
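
The startup check described above can be sketched as one function. This is an
illustration only: `reindex_reason` and its reason strings are hypothetical and
do not appear in the init script; the marker file names and the order of the
comparisons follow the diff in this commit.

```shell
#!/usr/bin/env bash
# Sketch only: reindex_reason is a hypothetical helper, not part of this commit.
# It prints why a reindex would be triggered, or nothing if the index is current.
reindex_reason () {
  local data_dir="$1" want_version="$2" want_language="$3"
  local version_file="${data_dir}/.index_version"
  local language_file="${data_dir}/.index_language"

  if [[ ! -f "${version_file}" ]]; then
    echo "no version file"
  elif [[ "$(<"${version_file}")" != "${want_version}" ]]; then
    echo "version changed"
  elif [[ ! -f "${language_file}" ]]; then
    echo "no language file"
  elif [[ "$(<"${language_file}")" != "${want_language}" ]]; then
    echo "language changed"
  fi
}

# Walk an existing v9 install through the checks in a throwaway directory.
demo_dir="$(mktemp -d)"
echo 9 > "${demo_dir}/.index_version"
reindex_reason "${demo_dir}" 1 ""     # prints "version changed"
echo 1 > "${demo_dir}/.index_version"
reindex_reason "${demo_dir}" 1 ""     # prints "no language file"
echo "" > "${demo_dir}/.index_language"
reindex_reason "${demo_dir}" 1 ""     # prints nothing: index is current
reindex_reason "${demo_dir}" 1 "de"   # prints "language changed"
rm -rf "${demo_dir}"
```

An unset PAPERLESS_SEARCH_LANGUAGE is stored as an empty line, and `$(<file)`
drops the trailing newline, so an unset variable compares equal on the next
start and does not trigger a reindex.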

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Trenton H
2026-03-30 13:55:20 -07:00
parent fdf08bdc43
commit ae494d4b6a
2 changed files with 42 additions and 7 deletions
@@ -3,9 +3,14 @@
declare -r log_prefix="[init-index]"
-declare -r index_version=9
+# Version 1: Tantivy backend (replaces Whoosh; resets versioning from scratch)
+declare -r index_version=1
declare -r data_dir="${PAPERLESS_DATA_DIR:-/usr/src/paperless/data}"
declare -r index_version_file="${data_dir}/.index_version"
+declare -r index_language_file="${data_dir}/.index_language"
+# Track the raw env var (not the resolved language) so inference changes
+# don't cause spurious reindexes; only explicit setting changes trigger one.
+declare -r search_language="${PAPERLESS_SEARCH_LANGUAGE:-}"
update_index () {
echo "${log_prefix} Search index out of date. Updating..."
@@ -13,16 +18,24 @@ update_index () {
if [[ -n "${USER_IS_NON_ROOT}" ]]; then
python3 manage.py document_index reindex --no-progress-bar
echo ${index_version} | tee "${index_version_file}" > /dev/null
+echo "${search_language}" | tee "${index_language_file}" > /dev/null
else
s6-setuidgid paperless python3 manage.py document_index reindex --no-progress-bar
echo ${index_version} | s6-setuidgid paperless tee "${index_version_file}" > /dev/null
+echo "${search_language}" | s6-setuidgid paperless tee "${index_language_file}" > /dev/null
fi
}
-if [[ (! -f "${index_version_file}") ]]; then
+if [[ ! -f "${index_version_file}" ]]; then
echo "${log_prefix} No index version file found"
update_index
-elif [[ $(<"${index_version_file}") != "$index_version" ]]; then
-echo "${log_prefix} index version updated"
+elif [[ $(<"${index_version_file}") != "${index_version}" ]]; then
+echo "${log_prefix} Index version updated"
update_index
+elif [[ ! -f "${index_language_file}" ]]; then
+echo "${log_prefix} No language file found"
+update_index
+elif [[ $(<"${index_language_file}") != "${search_language}" ]]; then
+echo "${log_prefix} Search language changed"
+update_index
fi
docs/administration.md: +25 -3
@@ -180,6 +180,17 @@ following:
This might not actually do anything. Not every new paperless version
comes with new database migrations.
+4. Rebuild the search index.
+```shell-session
+cd src
+python3 manage.py document_index reindex
+```
+This is required when the search backend has changed (e.g. the upgrade
+to Tantivy). It is safe to run on every upgrade: if the index is already
+current it completes quickly.
### Database Upgrades
Paperless-ngx is compatible with Django-supported versions of PostgreSQL and MariaDB and it is generally
@@ -469,9 +480,20 @@ task scheduler.
!!! note
-On first startup after upgrading from a previous version, paperless detects
-that the index format has changed and automatically performs a one-time full
-reindex. No manual migration step is required.
+**Docker users:** On first startup after upgrading, the container automatically
+detects the index format change and runs a full reindex before starting the
+webserver. No manual step is required.
+**Bare metal users:** After upgrading, run the following command once to rebuild
+the search index in the new format:
+```shell-session
+cd src
+python3 manage.py document_index reindex
+```
+Changing `PAPERLESS_SEARCH_LANGUAGE` also requires a manual reindex on bare
+metal (Docker handles this automatically).
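
On bare metal, the language check that Docker performs could be approximated with a small wrapper. The following is a minimal sketch, not something shipped with paperless-ngx: the function name `reindex_if_language_changed` and the optional command override are hypothetical, the `.index_language` marker name is borrowed from the Docker init script, and it assumes `PAPERLESS_SEARCH_LANGUAGE` is exported in the calling shell (if it is set only in a configuration file, it would need to be loaded first).

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical bare metal helper, not part of paperless-ngx.
# Usage: reindex_if_language_changed <data_dir> [command...]
reindex_if_language_changed () {
  local data_dir="$1"; shift
  local -a reindex_cmd=("$@")
  # Default to the documented command, which must be run from the src directory.
  if (( ${#reindex_cmd[@]} == 0 )); then
    reindex_cmd=(python3 manage.py document_index reindex)
  fi
  local language_file="${data_dir}/.index_language"
  local current="${PAPERLESS_SEARCH_LANGUAGE:-}"

  # Nothing to do when the stored value matches the current setting.
  # A missing marker counts as a change, so the first run always reindexes.
  if [[ -f "${language_file}" && "$(<"${language_file}")" == "${current}" ]]; then
    return 0
  fi

  # Record the new value only after the reindex command succeeds.
  "${reindex_cmd[@]}" && echo "${current}" > "${language_file}"
}
```

Calling `reindex_if_language_changed /path/to/data` from the `src` directory after editing the setting would then rebuild the index only when the value differs from the last recorded one. Writing the marker after a successful reindex means an interrupted run is retried on the next call.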
### Clearing the database read cache