Previously, autocomplete scanned every visible document to extract
words, then filtered them by prefix in Python. It now builds a regex
query on autocomplete_word so that Tantivy returns only documents
containing matching words. At 5k docs, rare prefixes drop from 335ms
to <1ms and common prefixes from 342ms to 199ms, with 58-99% less peak
memory.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
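A minimal sketch of the prefix-to-regex idea from the autocomplete change above. It uses Python's `re` module as a stand-in for the index-side query, so the exact regex syntax accepted by Tantivy is an assumption here, and `build_prefix_pattern` and `matching_words` are hypothetical names rather than functions from the codebase.

```python
import re


def build_prefix_pattern(prefix: str) -> str:
    """Return a regex that matches any word starting with `prefix`."""
    # Escape user input so characters such as "." or "*" match literally.
    return re.escape(prefix.lower()) + ".*"


def matching_words(words, prefix):
    """Stand-in for the index lookup: keep only words matching the pattern."""
    pattern = re.compile(build_prefix_pattern(prefix))
    return sorted({w for w in words if pattern.fullmatch(w)})
```

Escaping the prefix before appending the wildcard keeps user-typed metacharacters from changing the meaning of the query.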
Global search only displays 3 results but was fetching all matching IDs
and hydrating them via in_bulk. Now passes limit=9 to search_ids().
more_like_this_ids could return limit-1 results when the original doc
appeared in the result set. Now fetches limit+1 and slices after
filtering.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
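The limit+1 fix described above can be sketched as follows. `fetch_similar` is a hypothetical callable standing in for the backend query, and the signature is an assumption made for illustration only.

```python
def more_like_this_ids(fetch_similar, doc_id, limit):
    """Return up to `limit` similar IDs, never including `doc_id` itself."""
    # Ask for one extra candidate in case the source document is among them.
    candidates = fetch_similar(doc_id, limit + 1)
    # Drop the source document, then trim back to the requested size.
    filtered = [cid for cid in candidates if cid != doc_id]
    return filtered[:limit]
```

Without the extra candidate, filtering out the source document would leave one slot unfilled.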
Break the monolithic list() method into typed sub-functions for
readability: parse_search_params, intersect_and_order, run_text_search,
run_more_like_this. Also defer get_backend() until after param
validation so invalid requests fail fast.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The notes SnippetGenerator was being recreated per document instead of
lazily initialized once like the content generator.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
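A rough illustration of the lazy-initialization pattern referred to above. `HighlightBuilder` and `make_generator` are hypothetical names; the real SnippetGenerator construction is not shown in this log, so a factory callable stands in for it.

```python
class HighlightBuilder:
    """Creates the notes generator on first use and reuses it afterwards."""

    def __init__(self, make_generator):
        # make_generator is an assumed factory, e.g. one that wraps the
        # construction of a snippet generator for a given field.
        self._make_generator = make_generator
        self._notes_generator = None

    def notes_generator(self):
        # Build once, on demand, instead of once per document.
        if self._notes_generator is None:
            self._notes_generator = self._make_generator("notes")
        return self._notes_generator
```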
All production callers now use search_ids() + highlight_hits(). Migrated
10 tests to search_ids(), removed 5 that tested search()-specific
features (score normalization, highlight windowing).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The global search endpoint only needs document IDs (takes top 3), not
highlights or scores. Using search_ids() avoids building SearchHit dicts
and removes the last production caller of backend.search().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The method is no longer called anywhere in production code — all callers
were migrated to more_like_this_ids() during the search pagination work.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deduplicates query parsing (3 call sites) and permission filter
wrapping (4 call sites) into private helper methods on TantivyBackend.
Also documents the N-lookup limitation of highlight_hits().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- TantivyRelevanceList.__getitem__ now handles int keys, not just slices
- search_ids() docstring corrected ("no highlights or scores")
- Empty ordering param now correctly becomes None instead of ""
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
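A small sketch of two of the fixes listed above: accepting both int and slice keys, and treating an empty ordering parameter as None. `RelevanceList` and `normalize_ordering` are simplified, hypothetical stand-ins and do not reproduce the actual TantivyRelevanceList.

```python
class RelevanceList:
    """Simplified stand-in holding an ordered list of document IDs."""

    def __init__(self, ordered_ids):
        self._ordered_ids = list(ordered_ids)

    def __len__(self):
        return len(self._ordered_ids)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Pagination asks for a slice and gets a list back.
            return self._ordered_ids[key]
        if isinstance(key, int):
            # Single-item access returns one ID.
            return self._ordered_ids[key]
        raise TypeError(f"unsupported key type: {type(key).__name__}")


def normalize_ordering(raw):
    """Map a missing or empty ordering parameter to None."""
    return raw or None
```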
Use search_ids() for the full set of matching IDs (lightweight ints,
no arbitrary cap) and highlight_hits() for just the displayed page.
TantivyRelevanceList now holds ordered IDs for count/selection_data
and a small page of rich SearchHit dicts for serialization.
Removes the hardcoded 10000 limit that silently truncated results
for large collections. Memory usage down ~10% on sorted/paginated
search paths at 200 docs, with larger gains expected at scale.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
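The two-phase approach described above can be sketched roughly as follows. The `search_ids` and `highlight_hits` callables here are passed in with assumed, simplified signatures; the real backend methods may take different arguments.

```python
def search_page(search_ids, highlight_hits, query, page, page_size):
    """Return (total_count, rich_hits) for one page of results."""
    # Phase 1: cheap, uncapped list of matching IDs (plain ints).
    all_ids = search_ids(query)
    # Phase 2: build rich hits only for the page being displayed.
    start = (page - 1) * page_size
    page_ids = all_ids[start:start + page_size]
    return len(all_ids), highlight_hits(query, page_ids)
```

The count comes from the full ID list, so no fixed cap on results is needed to keep the expensive step bounded.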
Profiling tests and helper served their purpose during the search
performance optimization work. Baseline and post-implementation
data captured in docs/superpowers/plans/.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Single source of truth for sort field mapping. The viewset now references
TantivyBackend.SORTABLE_FIELDS instead of maintaining a duplicate set.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
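One way to picture the single-source-of-truth arrangement above. The class name `Backend`, the mapping entries, and `resolve_sort_field` are all hypothetical; the actual contents of TantivyBackend.SORTABLE_FIELDS are not given in this log.

```python
class Backend:
    # Assumed example entries: request parameter -> index field name.
    SORTABLE_FIELDS = {
        "created": "created",
        "title": "title_sort",
    }


def resolve_sort_field(ordering):
    """Return (index_field, descending), or None if the field is not sortable."""
    descending = ordering.startswith("-")
    name = ordering[1:] if descending else ordering
    # The caller consults the backend's mapping rather than its own copy.
    index_field = Backend.SORTABLE_FIELDS.get(name)
    if index_field is None:
        return None
    return index_field, descending
```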
search_ids() returns only document IDs matching a query — no highlights,
no SearchHit objects. more_like_this_ids() does the same for MLT queries.
These provide lightweight paths when only IDs are needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Gate expensive snippet/highlight generation to only the requested
slice of hits, allowing the viewset to avoid generating highlights
for all 10k results when only 25 are displayed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Always use regex matching with timeouts for user-controlled data
- Add rate limiting to the token endpoint (configurable)
- Sign the classifier pickle file with the SECRET_KEY and refuse to load one that doesn't verify
- Require the user to set a secret key instead of falling back to our old hard-coded one
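A minimal sketch of signing a pickle and refusing to load it when verification fails, as the third item above describes. The HMAC-SHA256 choice, the signature-then-payload layout, and the function names are assumptions made for illustration; the log does not state how the real file is laid out.

```python
import hashlib
import hmac
import pickle

SIGNATURE_SIZE = hashlib.sha256().digest_size  # 32 bytes


def sign(payload: bytes, secret_key: str) -> bytes:
    """Compute an HMAC-SHA256 signature over the payload."""
    return hmac.new(secret_key.encode(), payload, hashlib.sha256).digest()


def dump_signed(obj, secret_key: str) -> bytes:
    """Pickle `obj` and prepend its signature (assumed layout)."""
    payload = pickle.dumps(obj)
    return sign(payload, secret_key) + payload


def load_signed(blob: bytes, secret_key: str):
    """Verify the signature before unpickling; raise if it does not match."""
    signature, payload = blob[:SIGNATURE_SIZE], blob[SIGNATURE_SIZE:]
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(signature, sign(payload, secret_key)):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)
```

Verifying before calling `pickle.loads` matters because unpickling untrusted data can execute arbitrary code.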
* Tests: add regression test for redis URL with empty username and password
Covers the unix://:SECRET@/path.sock format (empty username, password only),
which was missing from the existing test cases for PR #12239.
* Update src/paperless/tests/settings/test_custom_parsers.py
---------
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
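For reference, a short sketch of how the unix://:SECRET@/path.sock form from the regression test above breaks down with the standard library. `split_redis_socket_url` is a hypothetical helper and is not the parser used in the codebase.

```python
from urllib.parse import urlsplit


def split_redis_socket_url(url: str) -> dict:
    """Split a unix-socket style Redis URL into its parts."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        # An empty username (":SECRET@") is normalized to None here.
        "username": parts.username or None,
        "password": parts.password,
        "path": parts.path,
    }
```

With the URL from the test description, the password is recovered even though the username portion is empty.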