paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2026-04-01 22:02:44 +00:00

Author	SHA1	Message	Date
shamoon	3539f3f66a	Switch simple substring search to simple_search analyzer	2026-04-01 13:37:54 -07:00
shamoon	7c98d29de2	Fix e2e	2026-04-01 11:51:42 -07:00
shamoon	0b9f67fe68	Just moving these comments	2026-04-01 11:47:12 -07:00
shamoon	6a08244c52	Fix this one failing test	2026-04-01 11:41:10 -07:00
shamoon	935d75d457	Update all these uses of FILTER_TITLE_CONTENT	2026-04-01 11:24:24 -07:00
shamoon	af671397f5	Update the filter editor too	2026-04-01 11:23:24 -07:00
shamoon	66e4409242	Bring in the new filter type to frontend	2026-04-01 11:23:04 -07:00
shamoon	8756054778	Ok make it a proper filter type	2026-04-01 11:21:04 -07:00
shamoon	2cbbbaf170	Add a couple deprecation notes	2026-04-01 11:08:01 -07:00
shamoon	4b4f656fbb	Drop the custom fields text query option, but dont break existing views	2026-04-01 11:01:54 -07:00
shamoon	54d2da2375	Backend tests	2026-04-01 10:59:11 -07:00
shamoon	71889e6e90	Use tantivy for global search too	2026-04-01 10:58:39 -07:00
shamoon	02fab43df9	Handle simple searches with frontend query param parsing	2026-04-01 10:38:08 -07:00
shamoon	9139507bd6	Wire the simple searches to view	2026-04-01 10:18:26 -07:00
shamoon	3d77e45c14	Add a simple title query	2026-04-01 10:13:13 -07:00
shamoon	631074e4ed	Add simple text search mode and API param	2026-04-01 10:08:27 -07:00
Trenton H	64fe8546ca	Custom field indexing wouldn't have matched exactly, also, index the select field label, not its ID (might break, don't want the VM)	2026-03-31 14:29:31 -07:00
Trenton H	edcadfcdc7	Merge branch 'dev' into feature-tantivy-search-backend	2026-03-31 12:07:39 -07:00
Trenton H	eac4a6ca05	Merge remote-tracking branch 'origin/dev' into feature-tantivy-search-backend Hopefully the conflicts are good	2026-03-31 11:57:10 -07:00
GitHub Actions	2aa0c9f0b4	Auto translate strings	2026-03-31 18:25:03 +00:00
shamoon	d2328b776a	Performance: support bulk edit without id lists (#12355)	2026-03-31 18:23:28 +00:00
Trenton H	c981fb26f7	Adds no cover on some defensive error handling, cover a few other cases more directly	2026-03-31 11:14:26 -07:00
Trenton H	9003bfdeea	Fine, I'll spin up the VM here. Good this got tested	2026-03-31 09:19:57 -07:00
Trenton H	2bb7c7ae17	Chore: Document the parser plugin system (#12423) Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2026-03-31 09:16:43 -07:00
Trenton H	32111b00f5	Further search coverage which maybe works, we'll find out	2026-03-31 09:02:40 -07:00
Trenton H	4ddb27afc7	better typing through fixtures + comment	2026-03-31 08:23:33 -07:00
Trenton H	65b9d69ee6	Quick testing to cover most (everything?) in needs_rebuild	2026-03-31 08:11:11 -07:00
Trenton H	8272b98f4e	Adds more coverage for the Whoosh re-writing to ISO format, covering standard and some odd cases like wrapping	2026-03-31 08:08:29 -07:00
Trenton H	25e905395c	And the filelock one, this is defensive stuff I don't see value in	2026-03-31 08:07:15 -07:00
Trenton H	977d41f3aa	Also no cover this defensive TimeoutError	2026-03-31 08:06:38 -07:00
Trenton H	1000c47d86	TimeoutError is builtin, not exported from regex	2026-03-31 08:02:58 -07:00
GitHub Actions	e1da2a1efe	Auto translate strings	2026-03-31 14:57:34 +00:00
shamoon	245514ad10	Performance: deprecate and remove usage of `all` in API results (#12309)	2026-03-31 07:55:59 -07:00
Trenton H	1e1bba1a15	Improves typing a touch	2026-03-31 07:52:21 -07:00
Trenton H	97034e8ff6	Covers the reindex --recreate path and documents the flag	2026-03-31 07:50:44 -07:00
Trenton H	0b032fffeb	Probably covers that branch, we'll find out	2026-03-31 07:33:53 -07:00
Trenton H	dd4bd8dd7e	Updates fixture for sonar	2026-03-31 07:33:04 -07:00
Trenton H	881196183c	iterdir doesn't need a list	2026-03-31 07:32:47 -07:00
Trenton H	f36ea803d1	Uses regex matching with a timeout for all Whoosh query re-writing	2026-03-31 07:18:14 -07:00
Trenton H	eaa23751de	Merge: resolve conflict with include_selection_data from #12300 The origin/dev branch added include_selection_data support to the search list() view (PR #12300). Our Tantivy list() had replaced the Whoosh implementation entirely, causing a conflict. Resolution: keep the Tantivy implementation and incorporate the include_selection_data feature. When requested, selection_data is computed over all matching document IDs from ordered_hits (the full Tantivy result set, not just the current page). Also update test_search_with_include_selection_data from #12300 to use the Tantivy indexing API (get_backend().add_or_update) instead of the removed Whoosh AsyncWriter. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 17:08:39 -07:00
Trenton H	3cc78fe994	Fix: fall back to in-memory index when INDEX_DIR does not exist When open_or_rebuild_index is called and the index directory does not exist, return a fresh in-memory Tantivy index instead of creating the directory as a side effect. This prevents workspace contamination during test runs where INDEX_DIR has not been redirected to a temp directory. In production the data directory is always created during setup, so disk- based indexes continue to work normally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 16:27:07 -07:00
Trenton H	9e8b5ddf08	Refactor: consolidate IterWrapper/identity into documents.utils Move the duplicated `IterWrapper` type alias and `identity` function from tasks.py, _backend.py, sanity_checker.py, and paperless_ai/indexing.py into a single location in documents/utils.py. All four callers now import from there. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 16:26:49 -07:00
Trenton H	e4b63d61b9	Fix: ensure index dir exists before open, fix test isolation gaps - `open_or_rebuild_index` now calls `index_dir.mkdir(parents=True, exist_ok=True)` so a missing index directory is created on demand rather than crashing on `iterdir()` inside `wipe_index` - `TestTagHierarchy.setUp` calls `super().setUp()` so `DirectoriesMixin` runs and `self.dirs` is set before teardown tries to clean up - `test_search_more_like` d4 content changed to words with no overlap with d2/d3 to avoid spurious MLT hits from shared stop words at `min_doc_frequency=1` Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 16:06:09 -07:00
Trenton H	e7f68c2082	docs: Enhance docstrings and test quality for Tantivy search backend - Add comprehensive docstrings to all public methods and classes in the search package - Clarify purpose, parameters, return values, and implementation notes - Document thread safety, error handling, and usage patterns - Explain Tantivy-specific workarounds and design decisions - Improve test quality and pytest compliance - Add descriptive comments explaining what each test verifies - Convert TestIndexOptimize to pytest style with @pytest.mark.django_db - Ensure all test docstrings focus on behavior verification rather than implementation - Maintain existing functionality while improving code documentation - No changes to production logic or test coverage - All tests continue to pass with enhanced clarity Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 15:54:18 -07:00
Trenton H	12eb9b9abf	Fix: add _search_index fixture to TestDateWorkflowLocalization Tests that create or consume documents trigger the search index signal handler, which calls get_backend().add_or_update() against settings.INDEX_DIR. This class only inherited SampleDirMixin, leaving INDEX_DIR pointing at the default non-existent path and causing FileNotFoundError in CI. Added _search_index fixture to documents/tests/conftest.py: creates a temp index directory, overrides INDEX_DIR, and resets the backend singleton. Applied via @pytest.mark.usefixtures on the class. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 15:36:03 -07:00
Trenton H	ac03a3d609	Fix: count notes during iteration instead of issuing extra COUNT(*) query document.notes.count() bypasses the prefetch cache and hits the DB on every document during rebuild. Counting in the existing loop eliminates the query entirely. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 15:14:15 -07:00
Trenton H	7d1af2e215	Refactor: simplify Tantivy search backend for clarity and consistency - Remove duplicated list comprehension in search sort branches - Simplify WriteBatch.__exit__ by removing redundant else/pass block - Fix rebuild() to swap index once before loop instead of per-document - Add error recovery in rebuild() to restore old index on failure - Remove redundant re-import of register_tokenizers in rebuild() - Use tuple unpacking in autocomplete hit iteration - Collect tag names in single pass for autocomplete text sources - Use lazy % formatting in logger.debug instead of f-string - Remove redundant score list variable in normalization - Fix stale "NLTK stopword filtering" comment (NLTK was removed) - Remove obvious inline comments that restate the code - Align index_optimize task message with management command wording Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 15:03:55 -07:00
Trenton H	061099b064	Refactor: inline index_reindex into management command; promote needs_rebuild to public API - Rename _needs_rebuild -> needs_rebuild and export from documents.search - document_index command imports directly from documents.search, constructs the queryset and calls get_backend().rebuild() inline — no tasks.py indirection - Optimize subcommand logs deprecation directly; no longer calls index_optimize - Remove index_reindex from tasks.py - Convert TestMakeIndex to pytest class (no TestCase); use mocker fixtures - Simplify TestIndexReindex -> TestIndexOptimize (wrapper test removed) Co-Authored-By: Antoine Mérino <3023499+Merinorus@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 14:41:25 -07:00
Trenton H	6699679c29	Docs: expand search section with custom field, notes, and tokenization examples Adds word-order, accent-insensitivity, and separator-agnostic notes to the intro, then new subsections covering custom_fields.name/value query syntax with tokenization examples and a limitation note for custom date fields, plus a notes.user/notes.note subsection. Also prefetch document versions during index_reindex. Co-Authored-By: Antoine Mérino <3023499+Merinorus@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 14:28:46 -07:00
Trenton H	8107b7d209	Fix: break autocomplete frequency ties alphabetically for stable output Equal-frequency words were non-deterministically ordered; sort key is now (-count, word) so ties resolve alphabetically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-30 14:13:10 -07:00
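
The tie-breaking rule described in commit 8107b7d209 can be sketched as follows. This is a minimal illustration and not the project's code: `rank_autocomplete`, its parameters, and the sample words are hypothetical. Only the sort key `(-count, word)` is taken from the commit message.

```python
from collections import Counter
from collections.abc import Iterable


def rank_autocomplete(words: Iterable[str], prefix: str, limit: int = 10) -> list[str]:
    """Hypothetical helper: rank words matching a prefix by frequency.

    Ties in frequency are broken alphabetically so the output is stable
    from one call to the next.
    """
    # Count how often each matching word occurs.
    counts = Counter(word for word in words if word.startswith(prefix))
    # Sort by descending count first, then by the word itself.
    # Negating the count lets a single ascending sort handle both criteria.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


if __name__ == "__main__":
    sample = ["invoice", "income", "invoice", "index", "income", "bank"]
    print(rank_autocomplete(sample, "in"))
```

With a key of `-count` alone, "invoice" and "income" (two occurrences each) would keep whatever order the counting step happened to produce. Adding the word as a second key element makes the order fully determined.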

11316 Commits