Commit Graph

11268 Commits

Author SHA1 Message Date
Trenton H
3cc78fe994 Fix: fall back to in-memory index when INDEX_DIR does not exist
When open_or_rebuild_index is called and the index directory does not exist,
return a fresh in-memory Tantivy index instead of creating the directory as
a side effect. This prevents workspace contamination during test runs where
INDEX_DIR has not been redirected to a temp directory.

In production the data directory is always created during setup, so disk-
based indexes continue to work normally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 16:27:07 -07:00
Trenton H
9e8b5ddf08 Refactor: consolidate IterWrapper/identity into documents.utils
Move the duplicated `IterWrapper` type alias and `identity` function from
tasks.py, _backend.py, sanity_checker.py, and paperless_ai/indexing.py into
a single location in documents/utils.py. All four callers now import from
there.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 16:26:49 -07:00
Trenton H
e4b63d61b9 Fix: ensure index dir exists before open, fix test isolation gaps
- `open_or_rebuild_index` now calls `index_dir.mkdir(parents=True, exist_ok=True)`
  so a missing index directory is created on demand rather than crashing on
  `iterdir()` inside `wipe_index`
- `TestTagHierarchy.setUp` calls `super().setUp()` so `DirectoriesMixin` runs
  and `self.dirs` is set before teardown tries to clean up
- `test_search_more_like` d4 content changed to words with no overlap with d2/d3
  to avoid spurious MLT hits from shared stop words at `min_doc_frequency=1`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 16:06:09 -07:00
Trenton H
e7f68c2082 docs: Enhance docstrings and test quality for Tantivy search backend
- Add comprehensive docstrings to all public methods and classes in the search package
  - Clarify purpose, parameters, return values, and implementation notes
  - Document thread safety, error handling, and usage patterns
  - Explain Tantivy-specific workarounds and design decisions

- Improve test quality and pytest compliance
  - Add descriptive comments explaining what each test verifies
  - Convert TestIndexOptimize to pytest style with @pytest.mark.django_db
  - Ensure all test docstrings focus on behavior verification rather than implementation

- Maintain existing functionality while improving code documentation
  - No changes to production logic or test coverage
  - All tests continue to pass with enhanced clarity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 15:54:18 -07:00
Trenton H
12eb9b9abf Fix: add _search_index fixture to TestDateWorkflowLocalization
Tests that create or consume documents trigger the search index signal handler,
which calls get_backend().add_or_update() against settings.INDEX_DIR. This
class only inherited SampleDirMixin, leaving INDEX_DIR pointing at the default
non-existent path and causing FileNotFoundError in CI.

Added _search_index fixture to documents/tests/conftest.py: creates a temp
index directory, overrides INDEX_DIR, and resets the backend singleton.
Applied via @pytest.mark.usefixtures on the class.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 15:36:03 -07:00
Trenton H
ac03a3d609 Fix: count notes during iteration instead of issuing extra COUNT(*) query
document.notes.count() bypasses the prefetch cache and hits the DB on every
document during rebuild. Counting in the existing loop eliminates the query
entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 15:14:15 -07:00
Trenton H
7d1af2e215 Refactor: simplify Tantivy search backend for clarity and consistency
- Remove duplicated list comprehension in search sort branches
- Simplify WriteBatch.__exit__ by removing redundant else/pass block
- Fix rebuild() to swap index once before loop instead of per-document
- Add error recovery in rebuild() to restore old index on failure
- Remove redundant re-import of register_tokenizers in rebuild()
- Use tuple unpacking in autocomplete hit iteration
- Collect tag names in single pass for autocomplete text sources
- Use lazy % formatting in logger.debug instead of f-string
- Remove redundant score list variable in normalization
- Fix stale "NLTK stopword filtering" comment (NLTK was removed)
- Remove obvious inline comments that restate the code
- Align index_optimize task message with management command wording
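
The lazy `%` formatting item above can be illustrated with the standard `logging` module. This is only a sketch: the logger name and the `Costly` helper are hypothetical and are not taken from the backend code.

```python
import logging

logger = logging.getLogger("search_sketch")  # hypothetical logger name
logger.setLevel(logging.WARNING)             # DEBUG records are discarded


class Costly:
    """Counts how often it is rendered to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "costly value"


lazy, eager = Costly(), Costly()

# Lazy form: the argument is only formatted if the record is actually emitted.
logger.debug("indexed %s", lazy)

# Eager form: the f-string is built before debug() is even called.
logger.debug(f"indexed {eager}")
```

With the level set to WARNING, `lazy.renders` stays at 0 while `eager.renders` is 1, which is the cost the refactor avoids.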

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 15:03:55 -07:00
Trenton H
061099b064 Refactor: inline index_reindex into management command; promote needs_rebuild to public API
- Rename _needs_rebuild -> needs_rebuild and export from documents.search
- document_index command imports directly from documents.search, constructs
  the queryset and calls get_backend().rebuild() inline — no tasks.py indirection
- Optimize subcommand logs deprecation directly; no longer calls index_optimize
- Remove index_reindex from tasks.py
- Convert TestMakeIndex to pytest class (no TestCase); use mocker fixtures
- Simplify TestIndexReindex -> TestIndexOptimize (wrapper test removed)

Co-Authored-By: Antoine Mérino <3023499+Merinorus@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 14:41:25 -07:00
Trenton H
6699679c29 Docs: expand search section with custom field, notes, and tokenization examples
Adds word-order, accent-insensitivity, and separator-agnostic notes to the
intro, then new subsections covering custom_fields.name/value query syntax
with tokenization examples and a limitation note for custom date fields, plus
a notes.user/notes.note subsection.

Also prefetch document versions during index_reindex.

Co-Authored-By: Antoine Mérino <3023499+Merinorus@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 14:28:46 -07:00
Trenton H
8107b7d209 Fix: break autocomplete frequency ties alphabetically for stable output
Equal-frequency words were non-deterministically ordered; sort key is
now (-count, word) so ties resolve alphabetically.
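
A minimal sketch of the `(-count, word)` sort key described above. The function name `rank_suggestions` and the sample data are illustrative assumptions, not the backend's actual code.

```python
from collections import Counter


def rank_suggestions(counts: Counter, limit: int = 10) -> list:
    """Order words by descending count, breaking ties alphabetically."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


counts = Counter({"invoice": 3, "bank": 2, "apple": 2, "zebra": 1})
suggestions = rank_suggestions(counts)
# "apple" and "bank" share a count of 2, so they appear in alphabetical order.
```

Because the key is a tuple, the result no longer depends on insertion order, which is what makes the output stable across runs.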

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 14:13:10 -07:00
Trenton H
897f7d2199 Tests: cover document_index reindex --if-needed flag
Two cases: skips when _needs_rebuild returns False; runs when True.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 14:11:07 -07:00
Trenton H
da3ff7865e Enhancement: document_index reindex --if-needed; simplify Docker startup
Add --if-needed flag to `document_index reindex`: checks _needs_rebuild()
(schema version + language sentinels) and skips if index is up to date.
Safe to run on every upgrade or startup.

Simplify Docker init-search-index script to unconditionally call
`reindex --if-needed` — schema/language change detection is now fully
delegated to Python. Removes the bash index_version and language file
tracking entirely; Tantivy's own sentinels are the source of truth.

Update docs: bare metal upgrade step uses --if-needed; Docker note
updated to describe the new always-runs-but-skips-if-current behaviour.
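
The `--if-needed` check could look roughly like the sketch below. The sentinel file names, the `SCHEMA_VERSION` constant and the `write_sentinels` helper are hypothetical; only the idea of comparing a schema version and a language sentinel comes from the commit message.

```python
from pathlib import Path
from typing import Optional

SCHEMA_VERSION = 1  # hypothetical value


def _sentinel_paths(index_dir: Path) -> tuple:
    # Hypothetical file names; the real sentinels may be named differently.
    return index_dir / ".schema_version", index_dir / ".schema_language"


def write_sentinels(index_dir: Path, language: Optional[str]) -> None:
    """Record the schema version and language the index was built with."""
    version_file, language_file = _sentinel_paths(index_dir)
    version_file.write_text(str(SCHEMA_VERSION))
    language_file.write_text(language or "")  # None is stored as ""


def needs_rebuild(index_dir: Path, language: Optional[str]) -> bool:
    """Return True when a sentinel is missing or does not match."""
    version_file, language_file = _sentinel_paths(index_dir)
    if not version_file.exists() or not language_file.exists():
        return True
    if version_file.read_text().strip() != str(SCHEMA_VERSION):
        return True
    return language_file.read_text().strip() != (language or "")
```

A startup script can then call the reindex unconditionally and let this check decide whether any work is required.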

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 14:08:49 -07:00
Trenton H
ae494d4b6a Chore: bump Docker index version to 1 (Tantivy); add language change detection
- Reset index_version to 1 — Tantivy is a full format change so
  versioning restarts from scratch; all existing v9 installs trigger
  an automatic reindex on next container start
- Add PAPERLESS_SEARCH_LANGUAGE change detection: track raw env var in
  .index_language so changing the language setting auto-reindexes;
  raw env var (not resolved language) avoids false positives from
  OCR_LANGUAGE inference
- docs/administration.md: clarify that Docker handles the post-upgrade
  reindex automatically; bare metal users need to run
  document_index reindex manually; add that as step 4 in the
  bare metal upgrade guide

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 13:55:20 -07:00
Trenton H
fdf08bdc43 Enhancement: infer SEARCH_LANGUAGE from OCR_LANGUAGE; validate if explicit
- SEARCH_LANGUAGE is now str | None (None = no stemming, not "")
- When PAPERLESS_SEARCH_LANGUAGE is set, validate it against
  SUPPORTED_LANGUAGES via get_choice_from_env (startup error on bad value)
- When not set, infer from OCR_LANGUAGE's primary Tesseract code
  (eng→en, deu→de, fra→fr, etc.) covering all 18 Tantivy-supported languages
- _schema.py sentinel normalises None → "" for on-disk comparison
- _tokenizer.py type annotations updated to str | None
- docs: recommend ISO 639-1 two-letter codes; note that capitalized
  Tantivy enum names are not valid; link to Tantivy Language enum
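
The inference step can be sketched as a lookup on the first Tesseract code. The mapping below contains only the three pairs named above, and treating the first `+`-separated code as the primary one is an assumption; `infer_search_language` is a hypothetical name.

```python
from typing import Optional

# Only the pairs mentioned in the commit message; the real table covers more.
TESSERACT_TO_ISO = {"eng": "en", "deu": "de", "fra": "fr"}


def infer_search_language(ocr_language: str) -> Optional[str]:
    """Map the primary Tesseract code to an ISO 639-1 code, or None."""
    # Tesseract accepts several languages joined by "+", e.g. "deu+eng".
    primary = ocr_language.split("+", 1)[0].strip().lower()
    return TESSERACT_TO_ISO.get(primary)
```

Returning `None` for an unknown code matches the `str | None` convention above, where `None` means no stemming.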

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 13:37:34 -07:00
Trenton H
b10f3de2eb Enhancement: rank autocomplete suggestions by document frequency
Replace set-based alphabetical autocomplete with Counter-based
document-frequency ordering. Words appearing in more of the user's
visible documents rank first — the same signal Whoosh used for
Tf/Idf-based ordering, computed permission-correctly from already-
fetched stored values at no extra index cost.
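
Document frequency, as opposed to raw term frequency, counts each word at most once per document. A small sketch, assuming each visible document has already been reduced to a list of words; `document_frequency` is a hypothetical helper name.

```python
from collections import Counter
from collections.abc import Iterable


def document_frequency(docs: Iterable, prefix: str) -> Counter:
    """Count, per word starting with `prefix`, how many documents contain it."""
    freq: Counter = Counter()
    for words in docs:
        # A set ensures a word repeated inside one document counts once.
        freq.update({w for w in words if w.startswith(prefix)})
    return freq


docs = [
    ["invoice", "invoice", "inbox"],
    ["invoice"],
    ["index", "inbox"],
]
freq = document_frequency(docs, "in")
```

Since only the documents passed in are counted, filtering them by permission beforehand keeps the ranking permission-correct.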

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 13:25:56 -07:00
Trenton H
b626f5602c Docs: update search documentation for Tantivy backend
- configuration.md: add PAPERLESS_SEARCH_LANGUAGE and
  PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD settings
- usage.md: replace Whoosh query language link with Tantivy; remove
  "inexact terms are slow" note; add full natural date keyword list;
  add fuzzy search note
- api.md: update autocomplete ordering description (alphabetical, not Tf/Idf)
- administration.md: deprecate `optimize` subcommand (now a no-op);
  add one-time reindex upgrade note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 13:19:27 -07:00
Trenton H
7f63259f41 Remove silly lines 2026-03-30 13:08:57 -07:00
Trenton H
a213c2cc9b test(search): expand query rewriting coverage for Whoosh compat shims
Fold [-N unit to now] range, field:YYYYMMDD (with TZ-aware DateField vs
DateTimeField logic), and parse_user_query fuzzy path into the renamed
TestWhooshQueryRewriting class. TestParseUserQuery covers the full pipeline
including fuzzy blend mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 12:55:23 -07:00
Trenton H
34d2897ab1 refactor(search): replace context manager smell with explicit open/close lifecycle
TantivyBackend now uses open()/close()/_ensure_open() instead of __enter__/__exit__.
get_backend() tracks _backend_path and auto-reinitializes when settings.INDEX_DIR
changes, fixing the xdist/override_settings isolation bug where parallel workers
would share a stale singleton pointing at a deleted index directory.

Test fixtures use in-memory indices (path=None) for speed and isolation.
Singleton behavior covered by TestSingleton in test_backend.py.
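
The path-tracking singleton can be sketched as follows. `FakeBackend` is a stand-in for `TantivyBackend`, and passing `index_dir` as an argument replaces reading `settings.INDEX_DIR`; both are simplifications for illustration.

```python
from pathlib import Path
from typing import Optional


class FakeBackend:
    """Stand-in backend that only records its path and open state."""

    def __init__(self, path: Optional[Path]) -> None:
        self.path = path
        self.is_open = False

    def open(self) -> None:
        self.is_open = True

    def close(self) -> None:
        self.is_open = False


_backend: Optional[FakeBackend] = None
_backend_path: Optional[Path] = None


def get_backend(index_dir: Path) -> FakeBackend:
    """Return the shared backend, reopening it if the index path changed."""
    global _backend, _backend_path
    if _backend is None or _backend_path != index_dir:
        if _backend is not None:
            _backend.close()  # drop the stale instance first
        _backend = FakeBackend(index_dir)
        _backend.open()
        _backend_path = index_dir
    return _backend
```

Repeated calls with the same path return the same object, while a changed path closes the old instance and opens a fresh one, which is the isolation property the commit describes.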

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 12:49:46 -07:00
Trenton H
50f6b2d4c3 feat(search): wire Tantivy backend into all callsites; remove Whoosh
- Replace all `from documents import index` + Whoosh writer usage across
  admin.py, bulk_edit.py, tasks.py, views.py, signals/handlers.py with
  `get_backend().add_or_update/remove/batch_update`
- Add `effective_content` param to `_build_tantivy_doc` / `add_or_update`
  (used by signal handler to re-index root doc with version's OCR text)
- Add `wipe_index()` (renamed from `_wipe_index`) to public API; use from
  `document_index --recreate` flag
- `index_optimize()` replaced with deprecation log message; Tantivy
  manages segment merging automatically
- `index_reindex()` now calls `get_backend().rebuild()` + `reset_backend()`
  with select_related/prefetch_related for efficiency
- `document_index` management command: add `--recreate` flag
- Status view: use `get_backend()` + dir mtime scan instead of Whoosh
  `ix.last_modified()`
- Delete `documents/index.py`, `test_index.py`, `test_delayedquery.py`
- Update all tests: patch `documents.search.get_backend` (lazy imports);
  `DirectoriesMixin` calls `reset_backend()` in setUp/tearDown;
  `TestDocumentConsumptionFinishedSignal` likewise
- `test_api_search.py`: fix order-independent assertions for date-range
  queries; fix `_rewrite_8digit_date` to be field-aware and
  timezone-correct for DateTimeField vs DateField

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 10:43:30 -07:00
Trenton H
9df2a603b7 feat(search): package public exports
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 08:41:02 -07:00
Trenton H
fcd4d28f37 refactor(search): replace NLTK autocomplete extraction with regex \w+ + timeout
NLTK was inappropriate here: no stopword filtering (users should be able to
autocomplete any word), no length floor, and unicode-aware \w+ splits
consistently with Tantivy's simple tokenizer. regex library used (already a
project dependency) for ReDoS protection via per-call timeout.
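
A sketch of the `\w+` extraction using the standard library `re` module. The standard module has no per-call timeout, so this example omits the ReDoS protection that the commit obtains from the third-party `regex` library; `extract_words` is a hypothetical name.

```python
import re

# For str patterns, \w is Unicode-aware by default in Python 3.
_WORD_RE = re.compile(r"\w+")


def extract_words(text: str) -> list:
    """Return every run of word characters, with no stopword or length filter."""
    return _WORD_RE.findall(text)


words = extract_words("Rechnung-2024 café a_b")
# Hyphens and spaces split tokens; accented letters and underscores do not.
```

Note that `\w` also matches digits and the underscore, so the exact token boundaries may differ slightly from Tantivy's simple tokenizer.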

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 08:38:22 -07:00
Trenton H
0fb57205db feat(search): complete TantivyBackend — search, autocomplete, more_like_this, rebuild, WriteBatch
Dual-field approach for notes/custom_fields: JSON fields support structured queries
(notes.user:alice, custom_fields.name:invoice); companion text fields (note, custom_field)
carry content for default full-text search — tantivy-py 0.25 parse_query rejects dotted
paths in default_field_names.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 08:31:52 -07:00
Trenton H
0078ef9cd5 refactor(search): add docstrings and complete type annotations to all search module functions
- Add descriptive docstrings to all functions in _schema.py, _tokenizer.py, and _query.py
- Complete type annotations for all function parameters and return values
- Fix 8 mypy strict errors in _query.py:
  - Add re.Match[str] type parameters for regex matches
  - Fix "Returning Any" error with str() cast
  - Add type annotations for build_permission_filter() and parse_user_query()
  - Remove lazy imports, move to module top level
- All 29 search module tests continue to pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 15:26:07 -07:00
Trenton H
957049c512 fix(search): register fast-field tokenizer for simple_analyzer; fix perm_index fixture
Tantivy requires register_fast_field_tokenizer for any tokenizer used by
fast=True text fields — it writes default fast column values on every commit
even when a document omits those fields, raising ValueError otherwise.

perm_index fixture simplified to use in-memory index (path=None).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 15:01:39 -07:00
Trenton H
33da63c229 feat(search): normalize_query, build_permission_filter, parse_user_query pipeline
Implement query normalization and permission filtering for Tantivy search:

- normalize_query: expands comma-separated field values with AND operator
- build_permission_filter: security-critical permission filtering for documents
  - no owner (NULL in Django) → documents without owner_id field
  - owned by user → owner_id = user.pk
  - shared with user → viewer_id = user.pk
  - uses disjunction_max_query for proper OR semantics
  - workaround for tantivy-py unsigned type detection bug via range_query
- parse_user_query: full pipeline with fuzzy search support
- DEFAULT_SEARCH_FIELDS and boost configuration

Note: Permission filter tests require Tantivy environment setup;
core functionality implemented and normalize tests passing.
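
One possible reading of the comma expansion in `normalize_query` is sketched below. The exact input and output shapes are assumptions made for illustration (for example, that `tag:a,b` becomes `tag:a AND tag:b`); the real function may handle quoting, nesting or other cases differently.

```python
import re

# field:value1,value2[,value3...] with simple word-like values only (assumption)
_FIELD_LIST_RE = re.compile(r"(\w+):(\w+(?:,\w+)+)")


def normalize_query(query: str) -> str:
    """Expand comma-separated field values into AND-joined field terms."""

    def expand(match) -> str:
        field = match.group(1)
        values = match.group(2).split(",")
        return " AND ".join(f"{field}:{value}" for value in values)

    return _FIELD_LIST_RE.sub(expand, query)
```

Under these assumptions, `normalize_query("tag:bank,invoice report")` yields `tag:bank AND tag:invoice report`, and queries without a comma list pass through unchanged.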

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 14:56:58 -07:00
Trenton H
cbeb7469a1 feat(search): natural date keyword rewriting with Whoosh compat shims
Implement date/timezone boundary math for natural language date queries:

- `created` (DateField): local calendar date to UTC midnight boundaries
- `added`/`modified` (DateTimeField): local day boundaries with full offset arithmetic
- Whoosh compat shims: compact dates (YYYYMMDDHHmmss) → ISO 8601
- Relative ranges: `[now-7d TO now]` → concrete ISO timestamps
- Natural keywords: today, yesterday, this_week, last_week, etc.
- Timezone-aware: handles UTC offset arithmetic for datetime fields
- Passthrough: bare keywords without field prefixes unchanged
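
Two of the rewrites listed above can be sketched with the standard `datetime` module. The function names are hypothetical, treating compact dates as UTC is an assumption, and a fixed-offset `timezone` is used for simplicity, so daylight saving transitions are not modelled.

```python
from datetime import date, datetime, time, timedelta, timezone


def compact_to_iso(value: str) -> str:
    """Convert YYYYMMDDHHmmss into an ISO 8601 timestamp (assumed UTC)."""
    parsed = datetime.strptime(value, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return parsed.isoformat().replace("+00:00", "Z")


def local_day_bounds_utc(day: date, tz: timezone) -> tuple:
    """Return the UTC start and exclusive end of a local calendar day."""
    start_local = datetime.combine(day, time.min, tzinfo=tz)
    end_local = start_local + timedelta(days=1)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


pacific = timezone(timedelta(hours=-7))  # fixed offset, for illustration only
start, end = local_day_bounds_utc(date(2026, 3, 30), pacific)
```

For a UTC-7 user, "today" on 2026-03-30 therefore covers 07:00 UTC on the 30th up to, but not including, 07:00 UTC on the 31st.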

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 14:47:06 -07:00
Trenton H
2cf85d9b58 chore: make whoosh imports lazy to unblock test collection during Tantivy migration
Module-level whoosh imports in tasks.py and paperless/views.py prevented
test collection after removing whoosh-reloaded. Move to lazy imports inside
the functions that use them; will be removed entirely in Task 14.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 14:37:28 -07:00
Trenton H
494d17e7ac feat(search): PAPERLESS_SEARCH_LANGUAGE and PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD settings
Add two new environment variables for Tantivy search backend:
- PAPERLESS_SEARCH_LANGUAGE: language code for stemming (empty string disables)
- PAPERLESS_ADVANCED_FUZZY_SEARCH_THRESHOLD: float threshold for fuzzy search blending (None disables)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 14:33:37 -07:00
Trenton H
e8fe3a6a62 feat(search): tokenizer registration — paperless_text with language stemming, simple_analyzer, bigram_analyzer
This commit implements Task 3 of the Tantivy search backend migration:

- Add `src/documents/search/_tokenizer.py` with three custom tokenizers:
  - `paperless_text`: simple → remove_long(65) → lowercase → ascii_fold [→ stemmer]
    Supports 18 languages via Snowball stemmer with fallback warning for unsupported languages
  - `simple_analyzer`: simple → lowercase → ascii_fold (for shadow sort fields)
  - `bigram_analyzer`: ngram(2,2) → lowercase (for CJK/no-whitespace language support)

- Add comprehensive tests in `src/documents/tests/search/test_tokenizer.py`:
  - ASCII folding test: verifies "café résumé" is findable as "cafe resume"
  - Bigram CJK test: verifies "東京都" is searchable by substring "東京"
  - Warning test: verifies unsupported languages log appropriate warnings

- `register_tokenizers()` function must be called on every Index instance
  as tantivy requires re-registration at each open

- Language support includes common ISO 639-1 codes and full names:
  Arabic, Danish, Dutch, English, Finnish, French, German, Greek,
  Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian,
  Spanish, Swedish, Tamil, Turkish
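
The filter chains above can be approximated in pure Python to show what each stage does. This is not Tantivy's implementation: the word split, the length boundary (counted in characters here) and the NFKD-based folding are simplifications, and all function names are hypothetical.

```python
import re
import unicodedata


def ascii_fold(token: str) -> str:
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def paperless_text_tokens(text: str, max_len: int = 65) -> list:
    """Approximate: simple split, remove_long, lowercase, ascii_fold."""
    tokens = re.findall(r"\w+", text)                  # split into words
    tokens = [t for t in tokens if len(t) < max_len]   # drop over-long tokens
    return [ascii_fold(t.lower()) for t in tokens]


def bigrams(text: str) -> list:
    """Approximate ngram(2,2) followed by lowercase: overlapping 2-char windows."""
    return [text[i:i + 2].lower() for i in range(len(text) - 1)]
```

With these helpers, "Café Résumé" reduces to the tokens `cafe` and `resume`, and the bigrams of "東京都" include "東京", mirroring the two tests described above.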

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 14:30:40 -07:00
Trenton H
884edd6eea feat(search): schema definition and open_or_rebuild_index with sentinel logic
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 14:08:09 -07:00
Trenton H
d00fb4f345 feat: add tantivy dependency, search package skeleton, search pytest marker
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 13:56:56 -07:00
Trenton H
9383471fa0 Feature: Transition all checksums to use SHA256 (#12432) 2026-03-26 11:28:02 -07:00
dependabot[bot]
0060b46c8b Chore(deps): Bump requests in the uv group across 1 directory (#12441)
Bumps the uv group with 1 update in the / directory: [requests](https://github.com/psf/requests).


Updates `requests` from 2.32.5 to 2.33.0
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.5...v2.33.0)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.33.0
  dependency-type: indirect
  dependency-group: uv
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-26 09:04:20 -07:00
GitHub Actions
b153ec803b Auto translate strings 2026-03-26 14:38:10 +00:00
shamoon
38dba60ceb Enhancement: auto-hide the search bar on mobile (#12404) 2026-03-26 07:36:32 -07:00
shamoon
ae0474450f Chore: logger, response and template sanitization cleanup (#12439) 2026-03-26 07:36:02 -07:00
Trenton H
8efb01010c fix: Don't silently drop the change_groups and switch to a couple of slightly more efficient implementations (#12431) 2026-03-26 14:15:42 +00:00
Trenton H
d18bbfa9c3 Chore: Instead of manual temporary directory management, use a context manager (#12430) 2026-03-26 14:05:58 +00:00
GitHub Actions
ec76d3c762 Auto translate strings 2026-03-26 12:45:29 +00:00
shamoon
bdc0a58242 Security: prevent prototype pollution in settings and list view (#12438) 2026-03-26 05:43:49 -07:00
dependabot[bot]
b049ad9626 Chore(deps): Bump cbor2 in the uv group across 1 directory (#12424)
Bumps the uv group with 1 update in the / directory: [cbor2](https://github.com/agronholm/cbor2).


Updates `cbor2` from 5.8.0 to 5.9.0
- [Release notes](https://github.com/agronholm/cbor2/releases)
- [Commits](https://github.com/agronholm/cbor2/compare/5.8.0...5.9.0)

---
updated-dependencies:
- dependency-name: cbor2
  dependency-version: 5.9.0
  dependency-type: indirect
  dependency-group: uv
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-24 09:26:24 -07:00
GitHub Actions
79def8a200 Auto translate strings 2026-03-22 13:55:02 +00:00
Trenton H
701735f6e5 Chore: Drop old signal and unneeded apps, transition to parser registry instead (#12405)
* refactor: switch consumer and callers to ParserRegistry (Phase 4)

Replace all Django signal-based parser discovery with direct registry
calls. Removes `_parser_cleanup`, `parser_is_new_style` shims, and all
old-style isinstance checks. All parser instantiation now uses the
`with parser_class() as parser:` context manager pattern.

- documents/parsers.py: delegate to get_parser_registry(); drop lru_cache
- documents/consumer.py: use registry + context manager; remove shims
- documents/tasks.py: same pattern
- documents/management/commands/document_thumbnails.py: same pattern
- documents/views.py: get_metadata uses context manager
- documents/checks.py: use get_parser_registry().all_parsers()
- paperless/parsers/registry.py: add all_parsers() public method
- tests: update mocks to target documents.consumer.get_parser_class_for_mime_type

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: drop get_parser_class_for_mime_type; callers use registry directly

All callers now call get_parser_registry().get_parser_for_file() with
the actual filename and path, enabling score() to use file extension
hints. The MIME-only helper is removed.

- consumer.py: passes self.filename + self.working_copy
- tasks.py: passes document.original_filename + document.source_path
- document_thumbnails.py: same pattern
- views.py: passes Path(file).name + Path(file)
- parsers.py: internal helpers inline the registry call with filename=""
- test_parsers.py: drop TestParserDiscovery (was testing mock behavior);
  TestParserAvailability uses registry directly
- test_consumer.py: mocks switch to documents.consumer.get_parser_registry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: remove document_consumer_declaration signal infrastructure

Remove the document_consumer_declaration signal that was previously used
for parser registration. Each parser app no longer connects to this signal,
and the signal declaration itself has been removed from documents/signals.

Changes:
- Remove document_consumer_declaration from documents/signals/__init__.py
- Remove ready() methods and signal imports from all parser app configs
- Delete signal shim files (signals.py) from all parser apps:
  - paperless_tesseract/signals.py
  - paperless_text/signals.py
  - paperless_tika/signals.py
  - paperless_mail/signals.py
  - paperless_remote/signals.py

Parser discovery now happens exclusively through the ParserRegistry
system introduced in the previous refactor phases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: remove empty paperless_text and paperless_tika Django apps

After parser classes were moved to paperless/parsers/ in the plugin
refactor, these Django apps contained only empty AppConfig classes
with no models, views, tasks, migrations, or other functionality.

- Remove paperless_text and paperless_tika from INSTALLED_APPS
- Delete empty app directories entirely
- Update pyproject.toml test exclusions
- Clean stale mypy baseline entries for moved parser files

paperless_remote app is retained as it contains meaningful system
checks for Azure AI configuration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Moves the checks and tests to the main application and removes the old applications

* Adds a comment to satisfy Sonar

* refactor: remove automatic log_summary() call from get_parser_registry()

The summary was logged once per process, causing it to appear repeatedly
during Docker startup (management commands, web server, each Celery
worker subprocess). External parsers are already announced individually
at INFO when discovered; the full summary is redundant noise.
log_summary() is retained on ParserRegistry for manual/debug use.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Cleans up the duplicate test file/fixture

* Fixes a race condition where webserver threads could race to populate the registry

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 06:53:32 -07:00
GitHub Actions
07f54bfdab Auto translate strings 2026-03-21 09:26:23 +00:00
shamoon
0f84af27d0 Merge branch 'main' into dev
# Conflicts:
#	docs/setup.md
#	src-ui/src/main.ts
#	src/documents/tests/test_api_bulk_edit.py
#	src/documents/tests/test_api_custom_fields.py
#	src/documents/tests/test_api_search.py
#	src/documents/tests/test_api_status.py
#	src/documents/tests/test_workflows.py
#	src/paperless_mail/tests/test_api.py
2026-03-21 02:12:19 -07:00
shamoon
9646b8c67d Bump version to 2.20.13 v2.20.13 2026-03-21 01:50:04 -07:00
shamoon
e590d7df69 Merge branch 'release/v2.20.x' 2026-03-21 01:49:32 -07:00
shamoon
cc71aad058 Fix: suggest corrections only if visible results 2026-03-21 01:24:23 -07:00
shamoon
3cbdf5d0b7 Fix: require view permission for more-like search 2026-03-21 01:20:59 -07:00