paperless-ngx

mirror of https://github.com/paperless-ngx/paperless-ngx.git synced 2026-04-06 16:18:51 +00:00

Author	SHA1	Message	Date
Trenton Holmes	a557d2210b	Actually, the system check wouldn't see the 'wrong' value like that	2026-04-05 15:16:27 -07:00
Trenton Holmes	7454ce5a20	Try to automatically migrate user's DB settings values	2026-04-05 14:51:23 -07:00
Trenton Holmes	7f01b3a6f9	If the selected OCR mode is not a valid choice, warn and default to auto instead	2026-04-05 13:38:25 -07:00
Trenton Holmes	4138319832	Merge remote-tracking branch 'origin/dev' into feature-archive-ocr-decoupling	2026-04-05 13:32:00 -07:00
Trenton H	5f5fb263c9	Fix: Don't create a new note highlight generator per note in the loop (#12512)	2026-04-03 17:34:15 -07:00
shamoon	b807b107ad	Enhancement: include sharelinks + bundles in export/import (#12479)	2026-04-03 21:51:57 +00:00
Trenton H	c2f02851da	Chore: Better typed status manager messages (#12509)	2026-04-03 21:18:01 +00:00
GitHub Actions	d0f8a98a9a	Auto translate strings	2026-04-03 20:55:14 +00:00
shamoon	566afdffca	Enhancement: unify text search to use tantivy (#12485)	2026-04-03 13:53:45 -07:00
Trenton H	f32ad98d8e	Feature: Update consumer logging to include task ID for log correlation (#12510)	2026-04-03 13:31:40 -07:00
Trenton H	91c77c42f0	Add debug level logging for why an archive is made and why we decided OCR or not	2026-04-03 09:16:00 -07:00
Trenton H	8115332cc9	Tests and fix a bug with the img2pdf functionality	2026-04-03 09:05:21 -07:00
Trenton H	c3be765761	Merge branch 'dev' into feature-archive-ocr-decoupling	2026-04-03 08:17:09 -07:00
Trenton H	d365f19962	Security: Registers a custom serializer which signs the task payload (#12504)	2026-04-03 03:49:54 +00:00
GitHub Actions	2703c12f1a	Auto translate strings	2026-04-03 03:25:57 +00:00
shamoon	e7c7978d67	Enhancement: allow opt-in blocking internal mail hosts (#12502)	2026-04-03 03:24:28 +00:00
GitHub Actions	83501757df	Auto translate strings	2026-04-02 22:36:32 +00:00
Trenton H	dda05a7c00	Security: Improve overall security in a few ways (#12501) - Make sure we're always using regex with timeouts for user controlled data - Adds rate limiting to the token endpoint (configurable) - Signs the classifier pickle file with the SECRET_KEY and refuse to load one which doesn't verify. - Require the user to set a secret key, instead of falling back to our old hard coded one	2026-04-02 15:30:26 -07:00
Trenton H	33c41dd2e7	Merge remote-tracking branch 'origin/dev' into feature-archive-ocr-decoupling	2026-04-02 15:27:08 -07:00
Trenton H	376af81b9c	Fix: Resolve another TC assuming an object has been created somewhere (#12503)	2026-04-02 14:58:28 -07:00
GitHub Actions	05c9e21fac	Auto translate strings	2026-04-02 19:40:05 +00:00
Trenton H	aed9abe48c	Feature: Replace Whoosh with tantivy search backend (#12471) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Antoine Mérino <3023499+Merinorus@users.noreply.github.com>	2026-04-02 12:38:22 -07:00
GitHub Actions	2aa0c9f0b4	Auto translate strings	2026-03-31 18:25:03 +00:00
shamoon	d2328b776a	Performance: support bulk edit without id lists (#12355)	2026-03-31 18:23:28 +00:00
GitHub Actions	e1da2a1efe	Auto translate strings	2026-03-31 14:57:34 +00:00
shamoon	245514ad10	Performance: deprecate and remove usage of `all` in API results (#12309)	2026-03-31 07:55:59 -07:00
GitHub Actions	020057e1a4	Auto translate strings	2026-03-30 16:40:47 +00:00
shamoon	f715533770	Performance: support passing selection data with filtered document requests (#12300)	2026-03-30 16:38:52 +00:00
Jan Kleine	0292edbee7	Fixhancement: include trashed documents in document exporter/importer (#12425)	2026-03-30 16:30:22 +00:00
Andreas Schneider	85e0d1842a	Tests: add regression test for redis URL with empty username (#12460) * Tests: add regression test for redis URL with empty username and password Covers the unix://:SECRET@/path.sock format (empty username, password only), which was missing from the existing test cases for PR #12239. * Update src/paperless/tests/settings/test_custom_parsers.py --------- Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2026-03-29 06:31:18 -07:00
GitHub Actions	62f79c088e	Auto translate strings	2026-03-28 21:00:05 +00:00
shamoon	129da3ade7	Tweakhancement: show file extension in StoragePath test (#12452)	2026-03-28 13:58:33 -07:00
Trenton H	5cbbe0be89	Improvements for typing purposes mostly + some reuse	2026-03-28 13:21:52 -07:00
Trenton H	d5248838ca	Whoops, the tagged PDF check catches our fixture sample files, which broke these	2026-03-27 13:47:52 -07:00
Trenton H	6eb6e352da	Adds a tagged PDF check as well, for an even better decision to skip OCR in auto mode	2026-03-27 08:45:34 -07:00
Trenton H	d89a86643d	Merge branch 'dev' into feature-archive-ocr-decoupling	2026-03-27 08:35:25 -07:00
Trenton H	68322376f2	test: use pytest-django settings fixture and pytest.param in new tests - TestShouldProduceArchive: replace @override_settings decorators with settings fixture; consolidate 10 individual tests into 2 parametrized tests (test_generation_setting, test_auto_pdf_archive_decision) - TestDeprecatedV2OcrEnvVarWarnings: call check_deprecated_v2_ocr_env_vars() directly instead of django_checks.run_checks(); use mocker.patch.dict for env isolation; consolidate warn cases into one parametrized test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 07:51:24 -07:00
Trenton H	2729b0d3dc	refactor: consolidate pdftotext utility and archive-decision logic - Add extract_pdf_text() and PDF_TEXT_MIN_LENGTH to paperless/parsers/utils.py, eliminating duplicate pdftotext call sites in consumer.py and tesseract.py - Rename _should_produce_archive → should_produce_archive (now public, imported by both consumer.py and tasks.py) - update_document_content_maybe_archive_file now calls should_produce_archive, honouring ARCHIVE_FILE_GENERATION the same as the consumer pipeline - Fallback OCR path sets archive_path when produce_archive=True; update test_with_form_redo_produces_no_archive to use produce_archive=False Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 07:51:24 -07:00
Trenton H	07c0ed5e26	feat!: add deprecated v2 OCR env var warnings to system checks	2026-03-27 07:51:24 -07:00
Trenton H	d684452588	feat: compute produce_archive from ARCHIVE_FILE_GENERATION, pass to parser Add _extract_text_for_archive_check() and _should_produce_archive() helper functions to documents/consumer.py. These compute whether the parser should produce a PDF/A archive based on the ARCHIVE_FILE_GENERATION setting (always/ never/auto), parser capabilities (can_produce_archive, requires_pdf_rendition), MIME type, and pdftotext-based born-digital detection for auto mode. Update the parse() call site to compute and pass produce_archive=... kwarg. Add 10 unit tests in test_consumer_archive.py; update two existing consumer tests that asserted run_subprocess call counts now that pdftotext is invoked during auto-mode archive detection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 07:51:24 -07:00
Trenton H	e00658375b	chore: remove dead archive_file_generation assignments from tests	2026-03-27 07:51:24 -07:00
Trenton H	a0cf673f1b	feat!: restructure parse() for OCR_MODE=auto/off and produce_archive flag Implement the new decoupled archive/OCR control in RasterisedDocumentParser: - construct_ocrmypdf_parameters(): add skip_text parameter; fix AUTO mode dispatch so skip_text is only added when explicitly requested (text-present + produce_archive case) rather than unconditionally; add OFF mode support. - parse(): remove archive_file_generation checks; control archive creation exclusively via the produce_archive bool passed by the consumer. - OFF + no archive: return pdftotext text, skip OCRmyPDF entirely. - OFF + image + archive: use new _convert_image_to_pdfa() helper. - OFF + PDF + archive: run OCRmyPDF with skip_text=True (PDF/A only). - AUTO + text + no archive: skip OCRmyPDF entirely (fast path). - AUTO + text + archive: run OCRmyPDF with skip_text=True. - AUTO + no text: run normal OCR regardless of produce_archive. - FORCE/REDO: always run OCRmyPDF; set archive_path only when produce_archive. - Add _convert_image_to_pdfa(): img2pdf wrapping + pikepdf PDF/A-2b stamping without invoking Tesseract or Ghostscript. - Add PriorOcrFoundError to the fallback exception list (same treatment as InputFileError: retry with force_ocr). - Update existing tests to use produce_archive instead of archive_file_generation: TestSkipArchive rewritten; RTL test uses mode=off to preserve Arabic text layer; AUTO mode tests clarified. - Add test_parse_modes.py: 11 focused unit tests with mocked ocrmypdf.ocr verifying control flow for all mode/produce_archive combinations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 07:51:24 -07:00
Trenton H	300432ae05	feat!: drop skip_archive_file field, add archive_file_generation to ApplicationConfiguration Replace the old skip_archive_file DB field with the correctly-named archive_file_generation field on ApplicationConfiguration. Remove the temporary getattr fallback in OcrConfig now that the migration exists. Update all test fixtures and API response assertions to use the new field name. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 07:51:24 -07:00
Trenton H	6ba1b726be	feat!: update OcrConfig to read archive_file_generation from DB field Switches OcrConfig.__post_init__ from reading the old skip_archive_file attribute to the new archive_file_generation attribute, with a getattr fallback to skip_archive_file for compatibility until Task 4 renames the DB model field. Updates null_app_config fixtures in both the parser conftest and the new test_ocr_config.py to explicitly set both attributes to None so MagicMock doesn't return truthy auto-generated attributes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 07:51:24 -07:00
Trenton H	38d2abb982	feat!: rename OCR_SKIP_ARCHIVE_FILE to ARCHIVE_FILE_GENERATION Rename the Django setting OCR_SKIP_ARCHIVE_FILE to ARCHIVE_FILE_GENERATION and the env var PAPERLESS_OCR_SKIP_ARCHIVE_FILE to PAPERLESS_ARCHIVE_FILE_GENERATION. Rename the OcrConfig attribute skip_archive_file to archive_file_generation. Update checks.py error messages and all tests accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 07:51:24 -07:00
Trenton H	cd653959d6	feat!: replace ModeChoices and ArchiveFileChoices with new v3 enums - Replace ModeChoices (SKIP/SKIP_NO_ARCHIVE/REDO/FORCE) with new values: AUTO, FORCE, REDO, OFF - Remove ArchiveFileChoices entirely; add ArchiveFileGenerationChoices with AUTO, ALWAYS, NEVER values - Update checks.py valid sets and default settings to use new enum values - Update tesseract parser to use new enum comparisons; AUTO mode maps to skip_text behavior; FORCE/REDO bypass archive-skip early-exit - Update all affected tests to use new valid mode/archive string values Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 12:50:43 -07:00
Trenton H	9383471fa0	Feature: Transition all checksums to use SHA256 (#12432)	2026-03-26 11:28:02 -07:00
GitHub Actions	b153ec803b	Auto translate strings	2026-03-26 14:38:10 +00:00
shamoon	ae0474450f	Chore: logger, response and template sanitization cleanup (#12439)	2026-03-26 07:36:02 -07:00
Trenton H	8efb01010c	fix: Don't silently drop the change_groups and switch to a couple slightly more efficient implementations (#12431)	2026-03-26 14:15:42 +00:00
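
The mode/produce_archive combinations listed in commit a0cf673f1b can be read as a small decision table. The sketch below is only an illustration of that table, not code from the repository: `OcrMode`, `plan_parse`, and the returned labels are hypothetical names chosen here, and the real `parse()` in RasterisedDocumentParser may be structured quite differently.

```python
from enum import Enum


class OcrMode(str, Enum):
    # Hypothetical stand-in for the v3 mode values named in the commit log
    AUTO = "auto"
    FORCE = "force"
    REDO = "redo"
    OFF = "off"


def plan_parse(mode, *, has_text, is_image, produce_archive):
    """Return a label for the branch the commit message describes.

    The labels are illustrative only:
      pdftotext_only     - return pdftotext text, OCRmyPDF is not run
      skip_ocrmypdf      - AUTO fast path, OCRmyPDF is not run
      image_to_pdfa      - wrap the image as PDF/A without OCR
      ocrmypdf_skip_text - run OCRmyPDF with skip_text=True (PDF/A only)
      ocrmypdf_ocr       - run OCRmyPDF with OCR enabled
    """
    if mode is OcrMode.OFF:
        if not produce_archive:
            return "pdftotext_only"
        return "image_to_pdfa" if is_image else "ocrmypdf_skip_text"

    if mode is OcrMode.AUTO:
        if not has_text:
            # No text layer: OCR runs regardless of produce_archive
            return "ocrmypdf_ocr"
        return "ocrmypdf_skip_text" if produce_archive else "skip_ocrmypdf"

    # FORCE and REDO always run OCRmyPDF; per the commit message the
    # archive path is only set when produce_archive is true.
    return "ocrmypdf_ocr"
```

For example, `plan_parse(OcrMode.AUTO, has_text=True, is_image=False, produce_archive=False)` takes the fast path and returns `"skip_ocrmypdf"`, while the same call with `has_text=False` returns `"ocrmypdf_ocr"`.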


3941 Commits