paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2026-08-02 00:52:20 +00:00

Author	SHA1	Message	Date
Trenton H and GitHub	67da965d21	Merge branch 'dev' into feature-search-pagination-improvements	2026-04-08 17:37:30 -07:00
Trenton H	acdee63197	Call to_html on snippets. JSON fields don't support snippets, so store a 'notes_text' to highlight instead. Use Tantivy score when sorting for that, instead of discarding it	2026-04-08 15:05:04 -07:00
GitHub Actions	ec6969e326	Auto translate strings	2026-04-08 15:42:05 +00:00
shamoon and GitHub	4629bbf83e	Enhancement: add view_global_statistics and view_system_status permissions (#12530)	2026-04-08 15:39:47 +00:00
Trenton H	759717404e	Adds notes for where we can improve, if either fixes, features or a new release drop in from Tantivy	2026-04-07 14:45:27 -07:00
Trenton H	0bdaff203c	Fixes copilot found issues, try to tune the filtering as suggested	2026-04-07 13:30:35 -07:00
Trenton H and GitHub	689f5964fc	Merge branch 'dev' into feature-search-pagination-improvements	2026-04-07 08:06:35 -07:00
GitHub Actions	51c59746a7	Auto translate strings	2026-04-06 22:51:57 +00:00
Trenton H, GitHub, Claude Sonnet 4.6, shamoon	c232d443fa	Breaking: Decouple OCR control from archive file control (#12448) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2026-04-06 15:50:21 -07:00
Trenton Holmes and Claude Opus 4.6	51624840f2	docs: note autocomplete as candidate for Redis caching Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-06 14:33:45 -07:00
Trenton Holmes and Claude Opus 4.6	48309938c6	perf: use prefix query in autocomplete to avoid full-index scan Previously autocomplete scanned every visible document to extract words, then filtered by prefix in Python. Now builds a regex query on autocomplete_word so Tantivy only returns docs containing matching words. At 5k docs: rare prefixes go from 335ms to <1ms, common prefixes from 342ms to 199ms with 58-99% less peak memory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-06 14:26:26 -07:00
Trenton Holmes and Claude Opus 4.6	b4cfc27876	docs: note potential large IN clause in selection_data query Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-06 13:52:41 -07:00
Trenton Holmes and Claude Opus 4.6	86ac3ba9f1	fix: limit global search to 9 IDs and fix more_like_this_ids off-by-one Global search only displays 3 results but was fetching all matching IDs and hydrating them via in_bulk. Now passes limit=9 to search_ids(). more_like_this_ids could return limit-1 results when the original doc appeared in the result set. Now fetches limit+1 and slices after filtering. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-06 13:49:23 -07:00
Trenton Holmes and Claude Opus 4.6	67261287d2	refactor: extract nested helpers in UnifiedSearchViewSet.list() Break the monolithic list() method into typed sub-functions for readability: parse_search_params, intersect_and_order, run_text_search, run_more_like_this. Also defer get_backend() until after param validation so invalid requests fail fast. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-06 13:31:36 -07:00
Trenton Holmes and Claude Opus 4.6	ca077ba1e3	fix: reuse notes snippet generator across docs in highlight_hits() The notes SnippetGenerator was being recreated per document instead of lazily initialized once like the content generator. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-06 13:23:14 -07:00
Trenton Holmes and Claude Opus 4.6	e3076b8d62	refactor: remove dead search() method and SearchResults from TantivyBackend All production callers now use search_ids() + highlight_hits(). Migrated 10 tests to search_ids(), removed 5 that tested search()-specific features (score normalization, highlight windowing). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-06 13:21:01 -07:00
Trenton Holmes and Claude Opus 4.6	cb851fc217	refactor: switch global search from backend.search() to search_ids() The global search endpoint only needs document IDs (takes top 3), not highlights or scores. Using search_ids() avoids building SearchHit dicts and removes the last production caller of backend.search(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-06 13:14:21 -07:00
Trenton Holmes and Claude Opus 4.6	534fcfde6b	refactor: remove dead more_like_this() method from TantivyBackend The method is no longer called anywhere in production code; all callers were migrated to more_like_this_ids() during the search pagination work. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-06 13:10:58 -07:00
Trenton Holmes and Claude Opus 4.6	0b5b6fdad5	refactor: extract _parse_query and _apply_permission_filter helpers Deduplicates query parsing (3 call sites) and permission filter wrapping (4 call sites) into private helper methods on TantivyBackend. Also documents the N-lookup limitation of highlight_hits(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-05 13:31:37 -07:00
Trenton Holmes and Claude Opus 4.6	d98dbd50f4	fix: address code review findings (int keys, docstring, empty ordering) - TantivyRelevanceList.__getitem__ now handles int keys, not just slices - search_ids() docstring corrected ("no highlights or scores") - Empty ordering param now correctly becomes None instead of "" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-05 13:26:10 -07:00
Trenton Holmes	7649e4a6b1	Merge remote-tracking branch 'origin/dev' into feature-search-pagination-improvements	2026-04-05 13:18:43 -07:00
Trenton Holmes and Claude Opus 4.6	610ba27891	feat: replace 10000 overfetch with search_ids + page-only highlights Use search_ids() for the full set of matching IDs (lightweight ints, no arbitrary cap) and highlight_hits() for just the displayed page. TantivyRelevanceList now holds ordered IDs for count/selection_data and a small page of rich SearchHit dicts for serialization. Removes the hardcoded 10000 limit that silently truncated results for large collections. Memory usage down ~10% on sorted/paginated search paths at 200 docs, with larger gains expected at scale. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-05 12:54:47 -07:00
Trenton H and GitHub	5f5fb263c9	Fix: Don't create a new note highlight generator per note in the loop (#12512)	2026-04-03 17:34:15 -07:00
Trenton Holmes and Claude Opus 4.6	7c50e0077c	chore: remove temporary profiling infrastructure Profiling tests and helper served their purpose during the search performance optimization work. Baseline and post-implementation data captured in docs/superpowers/plans/. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-03 15:53:55 -07:00
Trenton Holmes and Claude Opus 4.6	288740ea62	refactor: promote sort_field_map to class-level constant on TantivyBackend Single source of truth for sort field mapping. The viewset now references TantivyBackend.SORTABLE_FIELDS instead of maintaining a duplicate set. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-03 15:53:49 -07:00
Trenton Holmes and Claude Opus 4.6	d998d3fbaf	feat: delegate sorting to Tantivy and use page-only highlights in viewset Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-03 15:35:14 -07:00
Trenton Holmes and Claude Opus 4.6	6cf01dd383	feat: add search_ids() and more_like_this_ids() lightweight methods search_ids() returns only document IDs matching a query: no highlights, no SearchHit objects. more_like_this_ids() does the same for MLT queries. These provide lightweight paths when only IDs are needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-03 15:21:52 -07:00
Trenton Holmes and Claude Opus 4.6	0d915c58a4	feat: add highlight_page/highlight_page_size params to search() Gate expensive snippet/highlight generation to only the requested slice of hits, allowing the viewset to avoid generating highlights for all 10k results when only 25 are displayed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-03 15:10:00 -07:00
Trenton Holmes and Claude Opus 4.6	46008d2da7	test: add baseline profiling tests for search performance Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-03 14:58:11 -07:00
shamoon and GitHub	b807b107ad	Enhancement: include sharelinks + bundles in export/import (#12479)	2026-04-03 21:51:57 +00:00
Trenton H and GitHub	c2f02851da	Chore: Better typed status manager messages (#12509)	2026-04-03 21:18:01 +00:00
GitHub Actions	d0f8a98a9a	Auto translate strings	2026-04-03 20:55:14 +00:00
shamoon and GitHub	566afdffca	Enhancement: unify text search to use tantivy (#12485)	2026-04-03 13:53:45 -07:00
Trenton H and GitHub	f32ad98d8e	Feature: Update consumer logging to include task ID for log correlation (#12510)	2026-04-03 13:31:40 -07:00
Trenton H and GitHub	d365f19962	Security: Registers a custom serializer which signs the task payload (#12504)	2026-04-03 03:49:54 +00:00
GitHub Actions	2703c12f1a	Auto translate strings	2026-04-03 03:25:57 +00:00
shamoon and GitHub	e7c7978d67	Enhancement: allow opt-in blocking internal mail hosts (#12502)	2026-04-03 03:24:28 +00:00
GitHub Actions	83501757df	Auto translate strings	2026-04-02 22:36:32 +00:00
Trenton H and GitHub	dda05a7c00	Security: Improve overall security in a few ways (#12501) - Make sure we're always using regex with timeouts for user controlled data - Adds rate limiting to the token endpoint (configurable) - Signs the classifier pickle file with the SECRET_KEY and refuses to load one which doesn't verify. - Require the user to set a secret key, instead of falling back to our old hard coded one	2026-04-02 15:30:26 -07:00
Trenton H and GitHub	376af81b9c	Fix: Resolve another TC assuming an object has been created somewhere (#12503)	2026-04-02 14:58:28 -07:00
GitHub Actions	05c9e21fac	Auto translate strings	2026-04-02 19:40:05 +00:00
Trenton H, GitHub, Claude Sonnet 4.6, Antoine Mérino	aed9abe48c	Feature: Replace Whoosh with tantivy search backend (#12471) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Antoine Mérino <3023499+Merinorus@users.noreply.github.com>	2026-04-02 12:38:22 -07:00
GitHub Actions	2aa0c9f0b4	Auto translate strings	2026-03-31 18:25:03 +00:00
shamoon and GitHub	d2328b776a	Performance: support bulk edit without id lists (#12355)	2026-03-31 18:23:28 +00:00
GitHub Actions	e1da2a1efe	Auto translate strings	2026-03-31 14:57:34 +00:00
shamoon and GitHub	245514ad10	Performance: deprecate and remove usage of `all` in API results (#12309)	2026-03-31 07:55:59 -07:00
GitHub Actions	020057e1a4	Auto translate strings	2026-03-30 16:40:47 +00:00
shamoon and GitHub	f715533770	Performance: support passing selection data with filtered document requests (#12300)	2026-03-30 16:38:52 +00:00
Jan Kleine and GitHub	0292edbee7	Fixhancement: include trashed documents in document exporter/importer (#12425)	2026-03-30 16:30:22 +00:00
Andreas Schneider, GitHub, shamoon	85e0d1842a	Tests: add regression test for redis URL with empty username (#12460) * Tests: add regression test for redis URL with empty username and password Covers the unix://:SECRET@/path.sock format (empty username, password only), which was missing from the existing test cases for PR #12239. * Update src/paperless/tests/settings/test_custom_parsers.py Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2026-03-29 06:31:18 -07:00
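
Note on commit 86ac3ba9f1: the off-by-one it describes (a "more like this" query returning limit-1 results when the source document shows up in its own result set) can be illustrated with a minimal sketch. This is not the project's code. The function signature, the `search` callable, and the sample IDs below are hypothetical stand-ins chosen for illustration; only the "fetch limit+1, filter, then slice" idea comes from the commit message.

```python
from collections.abc import Callable, Sequence


def more_like_this_ids(
    search: Callable[[int], Sequence[int]],
    doc_id: int,
    limit: int,
) -> list[int]:
    """Return up to `limit` similar IDs, never including `doc_id` itself.

    `search` is a hypothetical callable that returns the top-n ranked IDs.
    """
    # Ask for one extra hit so that dropping the source document
    # cannot leave the caller one result short.
    candidates = search(limit + 1)
    filtered = [hit for hit in candidates if hit != doc_id]
    # Slice after filtering: if the source document was absent,
    # the extra hit is trimmed here instead.
    return filtered[:limit]


# Hypothetical ranked result list; document 7 is the source document.
RANKED = [7, 3, 9, 4, 1]


def fake_search(n: int) -> list[int]:
    return RANKED[:n]


print(more_like_this_ids(fake_search, doc_id=7, limit=3))
```

Fetching exactly `limit` hits and then filtering would have returned only two IDs in this example, which is the behaviour the commit message says was fixed.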
