paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2026-05-22 14:35:25 +00:00

Author	SHA1	Message	Date
Trenton H	58789e5061	Chore: Structured consume task return values (#12612 )	2026-04-20 13:19:54 -07:00
Trenton H	8e67828bd7	Feature: Redesign the task system (#12584 ) * feat(tasks): replace PaperlessTask model with structured redesign Drop the old string-based PaperlessTask table and recreate it with Status/TaskType/TriggerSource enums, JSONField result storage, and duration tracking fields. Update all call sites to use the new API. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(tasks): rewrite signal handlers to track all task types Replace the old consume_file-only handler with a full rewrite that tracks 6 task types (consume_file, train_classifier, sanity_check, index_optimize, llm_index, mail_fetch) with proper trigger source detection, input data extraction, legacy result string parsing, duration/wait time recording, and structured error capture on failure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(tasks): add traceback and revoked state coverage to signal tests * refactor(tasks): remove manual PaperlessTask creation and scheduled/auto params All task records are now created exclusively via Celery signals (Task 2). Removed PaperlessTask creation/update from train_classifier, sanity_check, llmindex_index, and check_sanity. Removed scheduled= and auto= parameters from all 7 call sites. Updated apply_async callers to use trigger_source headers instead. Exceptions now propagate naturally from task functions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(tasks): auto-inject trigger_source=scheduled header for all beat tasks Inject `headers: {"trigger_source": "scheduled"}` into every Celery beat schedule entry so signal handlers can identify scheduler-originated tasks without per-task instrumentation. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(tasks): update serializer, filter, and viewset with v9 backwards compat - Replace TasksViewSerializer/RunTaskViewSerializer with TaskSerializerV10 (new field names), TaskSerializerV9 (v9 compat), TaskSummarySerializer, and RunTaskSerializer - Add AcknowledgeTasksViewSerializer unchanged (kept existing validation) - Expand PaperlessTaskFilterSet with MultipleChoiceFilter for task_type, trigger_source, status; add is_complete, date_created_after/before filters - Replace TasksViewSet.get_serializer_class() to branch on request.version - Add get_queryset() v9 compat for task_name/type query params - Add acknowledge_all, summary, active actions to TasksViewSet - Rewrite run action to use apply_async with trigger_source header - Add timedelta import to views.py; add MultipleChoiceFilter/DateTimeFilter to filters.py imports Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(tasks): add read_only_fields to TaskSerializerV9, enforce admin via permission_classes on run action * test(tasks): rewrite API task tests for redesigned model and v9 compat Replaces the old Django TestCase-based tests with pytest-style classes using PaperlessTaskFactory. Covers v10 field names, v9 backwards-compat field mapping, filtering, ordering, acknowledge, acknowledge_all, summary, active, and run endpoints. Also adds PaperlessTaskFactory to factories.py and fixes a redundant source= kwarg in TaskSerializerV10.related_document_ids. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(tasks): fix two spec gaps in task API test suite Move test_list_is_owner_aware to TestGetTasksV10 (it tests GET /api/tasks/, not acknowledge). Add test_related_document_ids_includes_duplicate_of to cover the duplicate_of path in the related_document_ids property. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(tasks): address code quality review findings Remove trivial field-existence tests per project conventions. Fix potentially flaky ordering test to use explicit date_created values. Add is_complete=false filter test, v9 type filter input direction test, and tighten TestActive second test to target REVOKED specifically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(tasks): update TaskAdmin for redesigned model Add date_created, duration_seconds to list_display; add trigger_source to list_filter; add input_data, duration_seconds, wait_time_seconds to readonly_fields. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(tasks): update Angular types and service for task redesign Replace PaperlessTaskName/PaperlessTaskType/PaperlessTaskStatus enums with new PaperlessTaskType, PaperlessTaskTriggerSource, PaperlessTaskStatus enums. Update PaperlessTask interface to new field names (task_type, trigger_source, input_data, result_message, related_document_ids). Update TasksService to filter by task_type instead of task_name. Update tasks component and system-status-dialog to use new field names. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore(tasks): remove django-celery-results PaperlessTask now tracks all task results via Celery signals. The django-celery-results DB backend was write-only -- nothing reads from it. Drop the package and add a migration to clean up the orphaned tables. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test: fix remaining tests broken by task system redesign Update all tests that created PaperlessTask objects with old field names to use PaperlessTaskFactory and new field names (task_type, trigger_source, status, result_message). Use apply_async instead of delay where mocked. Drop TestCheckSanityTaskRecording — tests PaperlessTask creation that was intentionally removed from check_sanity(). 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(tasks): improve test_api_tasks.py structure and add api marker - Move admin_client, v9_client, user_client fixtures to conftest.py so they can be reused by other API tests; all three now build on the rest_api_client fixture instead of creating APIClient() directly - Move regular_user fixture to conftest.py (was already done, now also used by the new client fixtures) - Add docstrings to every test method describing the behaviour under test - Move timedelta/timezone imports to module level - Register 'api' pytest marker in pyproject.toml and apply pytestmark to the entire file so all 40 tests are selectable via -m api Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(tasks): simplify task tracking code after redesign - Extract COMPLETE_STATUSES as a class constant on PaperlessTask, eliminating the repeated status tuple across models.py, views.py (3×), and filters.py - Extract _CELERY_STATE_TO_STATUS as a module-level constant instead of rebuilding the dict on every task_postrun - Extract _V9_TYPE_TO_TRIGGER_SOURCE and _RUNNABLE_TASKS as class constants on TasksViewSet instead of rebuilding on every request - Extract _TRIGGER_SOURCE_TO_V9_TYPE as a class constant on TaskSerializerV9 instead of rebuilding per serialized object - Extract _get_consume_args helper to deduplicate identical arg extraction logic in _extract_input_data, _determine_trigger_source, and _extract_owner_id - Move inline imports (re, traceback) and Avg to module level - Fix _DOCUMENT_SOURCE_TO_TRIGGER type annotation key type to DocumentSource instead of Any - Remove redundant truthiness checks in SystemStatusView branches already guarded by an is-None check Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(tasks): add docstrings and rename _parse_legacy_result - Add docstrings to _extract_input_data, _determine_trigger_source, _extract_owner_id explaining what each helper does and why - Rename 
_parse_legacy_result -> _parse_consume_result: the function parses current consume_file string outputs (consumer.py returns "New document id N created" and "It is a duplicate of X (#N)"), not legacy data; the old name was misleading Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(tasks): extend and harden the task system redesign - TaskType: add EMPTY_TRASH, CHECK_WORKFLOWS, CLEANUP_SHARE_LINKS; remove INDEX_REBUILD (no backing task — beat schedule uses index_optimize) - TRACKED_TASKS: wire up all nine task types including the three new ones and llmindex_index / process_mail_accounts - Add task_revoked_handler so cancelled/expired tasks are marked REVOKED - Fix double-write: task_postrun_handler no longer overwrites result_data when status is already FAILURE (task_failure_handler owns that write) - v9 serialiser: map EMAIL_CONSUME and FOLDER_CONSUME to AUTO_TASK - views: scope task list to owner for regular users, admins see all; validate ?days= query param and return 400 on bad input - tests: add test_list_admin_sees_all_tasks; rename/fix test_parses_duplicate_string (duplicates produce SUCCESS, not FAILURE); use PaperlessTaskFactory in modified tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(tasks): fix MAIL_FETCH null input_data and postrun double-query - _extract_input_data: return {} instead of {"account_ids": None} when process_mail_accounts is called without an explicit account list (the normal beat-scheduled path); add test to cover this path - task_postrun_handler: replace filter().first() + filter().update() with get() + save(update_fields=[...]) — single fetch, single write, consistent with task_prerun_handler Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(tasks): add queryset stub to satisfy drf-spectacular schema generation TasksViewSet.get_queryset() accesses request.user, which drf-spectacular cannot provide during static schema generation. 
Adding a class-level queryset = PaperlessTask.objects.none() gives spectacular a model to introspect without invoking get_queryset(), eliminating both warnings and the test_valid_schema failure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(tasks): fill coverage gaps in task system - test_task_signals: add TestTaskRevokedHandler (marks REVOKED, ignores None request, ignores unknown id); switch existing direct PaperlessTask.objects.create calls to PaperlessTaskFactory; import pytest_mock and use MockerFixture typing on mocker params - test_api_tasks: add test_rejects_invalid_days_param to TestSummary - tasks.service.spec: add dismissAllTasks test (POST acknowledge_all + reload) - models: add pragma: no cover to __str__, is_complete, and related_document_ids (trivial delegates, covered indirectly) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Well, that was a bad push. * Fixes v9 API compatability with testing coverage * fix(tasks): restore INDEX_OPTIMIZE enum and remove no-op run button INDEX_OPTIMIZE was dropped from the TaskType enum but still referenced in _RUNNABLE_TASKS (views.py) and the frontend system-status-dialog, causing an AttributeError at import time. Restore the enum value in the model and migration so the serializer accepts it, but remove it from _RUNNABLE_TASKS since index_optimize is a Tantivy no-op. Remove the frontend "Run Task" button for index optimization accordingly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tasks): v9 type filter now matches all equivalent trigger sources The v9 ?type= query param mapped each value to a single TriggerSource, but the serializer maps multiple sources to the same v9 type value. A task serialized as "auto_task" would not appear when filtering by ?type=auto_task if its trigger_source was email_consume or folder_consume. Same issue for "manual_task" missing web_ui and api_upload sources. 
Changed to trigger_source__in with the full set of sources for each v9 type value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tasks): give task_failure_handler full ownership of FAILURE path task_postrun_handler now early-returns for FAILURE states instead of redundantly writing status and date_done. task_failure_handler now computes duration_seconds and wait_time_seconds so failed tasks get complete timing data. This eliminates a wasted .get() + .save() round trip on every failed task and gives each handler a clean, non-overlapping responsibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tasks): resolve trigger_source header via TriggerSource enum lookup Replace two hardcoded string comparisons ("scheduled", "system") with a single TriggerSource(header_source) lookup so the enum values are the single source of truth. Any valid TriggerSource DB value passed in the header is accepted; invalid values fall through to the document-source / MANUAL logic. Update tests to pass enum values in headers rather than raw strings, and add a test for the invalid-header fallback path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(tasks): use TriggerSource enum values at all apply_async call sites Replace raw strings ("system", "manual") with PaperlessTask.TriggerSource enum values in the three callers that can import models. The settings file remains a raw string (models cannot be imported at settings load time) with a comment pointing to the enum value it must match. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(tasks): parametrize repetitive test cases in task test files test_api_tasks.py: - Collapse six trigger_source->v9-type tests into one parametrized test, adding the previously untested API_UPLOAD case - Collapse three task_name mapping tests (two remaps + pass-through) into one parametrized test - Collapse two acknowledge_all status tests into one parametrized test - Collapse two run-endpoint 400 tests into one parametrized test - Update run/ assertions to use TriggerSource enum values test_task_signals.py: - Collapse three trigger_source header tests into one parametrized test - Collapse two DocumentSource->TriggerSource mapping tests into one parametrized test - Collapse two prerun ignore-invalid-id tests into one parametrized test All parametrize cases use pytest.param with descriptive ids. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Handle JSON serialization for datetime and Path. Further restrist the v9 permissions as Copilot suggests * That should fix the generated schema/browser * Use XSerializer for the schema * A few more basic cases I see no value in covering * Drops the migration related stuff too. Just in case we want it again or it confuses people * fix: annotate tasks_summary_retrieve as array of TaskSummarySerializer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: annotate tasks_active_retrieve as array of TaskSerializerV10 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Restore task running to superuser only * Removes the acknowledge/dismiss all stuff * Aligns v10 and v9 task permissions with each other * Short blurb just to warn users about the tasks being cleared --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-20 09:28:41 -07:00
Trenton H	8c1225e120	Fixes an N+1 query in matching with the version content fetching by prefetching versions (#12562 )	2026-04-13 13:10:28 -07:00
shamoon	566afdffca	Enhancement: unify text search to use tantivy (#12485 )	2026-04-03 13:53:45 -07:00
Trenton H	aed9abe48c	Feature: Replace Whoosh with tantivy search backend (#12471 ) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Antoine Mérino <3023499+Merinorus@users.noreply.github.com>	2026-04-02 12:38:22 -07:00
Trenton H	9383471fa0	Feature: Transition all checksums to use SHA256 (#12432 )	2026-03-26 11:28:02 -07:00
shamoon	3efc9a5733	Fix: use effective content for matching and suggestion content (#12293 )	2026-03-10 23:45:56 +00:00
shamoon	df03207eef	Fix: correct doc version filename handling (#12223 )	2026-03-04 23:28:07 +00:00
shamoon	85a18e5911	Enhancement: saved view sharing (#12142 )	2026-03-04 14:15:43 -08:00
shamoon	d51a118aac	Merge branch 'main' into dev	2026-03-04 13:31:20 -08:00
shamoon	8b8307571a	Fix: enforce path limit for db filename fields (#12235 )	2026-03-03 13:19:56 -08:00
Trenton H	43406f44f2	Feature: Improve the retagger output using rich (#12194 )	2026-03-03 07:14:59 -08:00
shamoon	ceee769e26	Feature: document file versions (#12061 )	2026-02-26 16:46:54 +00:00
Jan Kleine	c4ea332c61	Feature: move to trash action for workflows (#11176 ) Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2026-02-23 16:42:50 -08:00
Trenton H	e8e027abc0	Chore: Optimizes the integer fields for choice types mostly, while leaving plenty of room to grow (#12057 )	2026-02-10 15:11:44 -08:00
shamoon	e45fca475a	Feature: password removal workflow action (#11665 )	2026-02-03 17:10:07 +00:00
Sebastian Steinbeißer	3b5ffbf9fa	Chore(mypy): Annotate `None` returns for typing improvements (#11213 )	2026-02-02 08:44:12 -08:00
shamoon	1f074390e4	Feature: sharelink bundles (#11682 )	2026-01-27 18:54:51 +00:00
Antoine Mérino	df07b8a03e	Performance: faster statistics panel on dashboard (#11760 )	2026-01-26 12:10:57 -08:00
shamoon	4428354150	Feature: allow duplicates with warnings, UI for discovery (#11815 )	2026-01-26 18:55:08 +00:00
shamoon	45f5025f78	Enhancement: Add 'any of' workflow trigger filters (#11683 )	2026-01-25 13:45:50 -08:00
Trenton H	d0032c18be	Breaking: Remove support for document and thumbnail encryption (#11850 )	2026-01-24 19:29:54 -08:00
shamoon	742c136773	Fix: use explicit order field for workflow actions (#11781 )	2026-01-16 07:39:00 -08:00
shamoon	055ce9172c	Fix: use explicit order field for workflow actions (#11781 )	2026-01-15 22:49:21 +00:00
shamoon	e940764fe0	Feature: Paperless AI (#10319 )	2026-01-13 16:24:42 +00:00
shamoon	d0bd111eab	Change: make workflowrun a softdeletemodel (#11194 )	2025-10-27 20:51:39 +00:00
shamoon	fcae006afa	Tweak: improve tag parent validation error handling (#11096 )	2025-10-20 22:42:01 -07:00
shamoon	f6c004183e	Feature: Advanced Workflow Trigger Filters (#11029 )	2025-10-13 22:23:56 +00:00
shamoon	4cff907ba0	Feature: Nested Tags (#10833 ) --------- Co-authored-by: Trenton H <797416+stumpylog@users.noreply.github.com>	2025-09-17 21:41:39 +00:00
jojo2357	feb5d534b5	Enhancement: long text custom field (#10846 ) --------- Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2025-09-14 03:19:00 +00:00
david-loe	2dc4f1f49b	Enhancement: add storage path as workflow trigger filter (#10771 ) --------- Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2025-09-11 17:41:04 +00:00
sidey79	9e11e7fd05	Enhancement: jinja template support for workflow title assignment (#10700 ) --------- Co-authored-by: Trenton Holmes <797416+stumpylog@users.noreply.github.com> Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2025-09-11 06:56:16 -07:00
Antoine Mérino	8adc26e09d	Enhancement: Limit excessively long content length when computing suggestions (#10656 ) This helps prevent excessive processing times on very large documents by limiting the text analyzed during date parsing, tag prediction, and correspondent matching. If the document exceeds 1.2M chars, crop to 1M char.	2025-09-09 13:02:16 -07:00
shamoon	59bf25edb1	Fix: created date fixes in v2.16 (#10026 )	2025-05-24 09:45:07 -07:00
shamoon	1a6f32534c	Change: treat created as date not datetime (#9793 )	2025-05-16 14:23:04 +00:00
shamoon	344cc70cd5	Enhancement: support negative offset in scheduled workflows (#9746 )	2025-05-11 20:04:46 +00:00
Sebastian Steinbeißer	76d363f22d	Chore: switch from os.path to pathlib.Path (#9060 )	2025-03-05 21:06:01 +00:00
shamoon	edc7181843	Enhancement: support assigning custom field values in workflows (#9272 )	2025-03-05 12:30:19 -08:00
Trenton H	f205c4d0e2	Removes undocumented FileInfo (#9298 )	2025-03-04 13:49:47 -08:00
Trenton H	f3e6ed56b9	Removes the unused Log model and LogFilterSet (#9294 )	2025-03-04 18:26:25 +00:00
shamoon	2d52226732	Enhancement: system status report sanity check, simpler classifier check, styling updates (#9106 )	2025-02-26 22:12:20 +00:00
shamoon	4f08b5fa20	Enhancement: "webui" workflowtrigger source option (#9170 )	2025-02-21 08:26:00 -08:00
shamoon	e49ecd4dfe	Enhancement: use charfield for webhook url, custom validation (#9128 ) --------- Co-authored-by: Trenton H <797416+stumpylog@users.noreply.github.com>	2025-02-16 14:26:30 -08:00
shamoon	63bb3644f6	Enhancement: filter by file type (#8946 )	2025-02-10 08:09:50 -08:00
Sebastian Steinbeißer	e560fa3be0	Chore: Enable ruff FBT (#8645 )	2025-02-07 09:12:03 -08:00
shamoon	e08606af6e	Enhancement: date picker and date filter dropdown improvements (#9033 )	2025-02-06 23:01:48 -08:00
shamoon	ed1775e689	Enhancement: allow specifying JSON encoding for webhooks (#8799 )	2025-01-18 12:19:50 -08:00
shamoon	1d65628132	Feature: email, webhook workflow actions (#8108 )	2024-12-03 00:12:40 +00:00
shamoon	0fc1860d4c	Enhancement: use stable unique IDs for custom field select options (#8299 )	2024-12-02 04:15:38 +00:00
shamoon	2b29233a1e	Feature: scheduled workflow trigger (#8036 )	2024-11-24 18:22:31 +00:00
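
The message for 8e67828bd7 above describes resolving the trigger_source header with a single TriggerSource(header_source) lookup, where invalid values fall through to the document-source / MANUAL logic. Below is a minimal sketch of that pattern using only the standard library. The function name resolve_trigger_source, its parameters, and the subset of enum members shown are assumptions made for illustration; they are not taken from the project's code, which uses a Django model enum.

```python
from enum import Enum
from typing import Mapping, Optional


class TriggerSource(str, Enum):
    # Hypothetical subset: the commit message mentions these values as raw
    # strings or v9 mappings; the real enum may differ.
    SCHEDULED = "scheduled"
    SYSTEM = "system"
    MANUAL = "manual"
    WEB_UI = "web_ui"
    API_UPLOAD = "api_upload"


def resolve_trigger_source(
    headers: Optional[Mapping[str, object]],
    document_source_trigger: Optional[TriggerSource] = None,
) -> TriggerSource:
    """Pick a trigger source for a task (illustrative helper, assumed name).

    1. If the header carries a valid enum value, use it.
    2. Otherwise fall back to a source derived from the document, if any.
    3. Otherwise default to MANUAL.
    """
    header_value = (headers or {}).get("trigger_source")
    if header_value is not None:
        try:
            # Enum lookup by value: the enum is the single source of truth.
            return TriggerSource(header_value)
        except ValueError:
            # Unknown header value: fall through to the other rules.
            pass
    if document_source_trigger is not None:
        return document_source_trigger
    return TriggerSource.MANUAL


# Usage: a beat-scheduled task, an invalid header, and no header at all.
scheduled = resolve_trigger_source({"trigger_source": "scheduled"})
fallback = resolve_trigger_source({"trigger_source": "bogus"}, TriggerSource.WEB_UI)
default = resolve_trigger_source(None)
```

Looking the value up through the enum, rather than comparing against hardcoded strings, means a new enum member is accepted in headers without touching the resolver.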


305 Commits