Mirror of https://github.com/paperless-ngx/paperless-ngx.git, synced 2026-03-10 03:01:23 +00:00

Compare commits: fix-versio... to feature-pa... (13 commits)

Commit SHAs: 7eb417e796, 8c08362ebc, c37ab946e1, 82068303d0, cc8e9a7108, 1870f69053,
053d590cb8, 987aa363dc, b8f63026f7, 3a232f0c8f, 404ef6b40d, 8c40491034, 0f6bdaf5de
.github/workflows/ci-backend.yml (vendored, 2 changed lines)

@@ -41,7 +41,7 @@ jobs:
 if [[ "${{ github.event_name }}" == "pull_request" ]]; then
   echo "base=${{ github.event.pull_request.base.sha }}" >> "$GITHUB_OUTPUT"
 elif [[ "${{ github.event.created }}" == "true" ]]; then
-  echo "base=${{ github.event.repository.default_branch }}" >> "$GITHUB_OUTPUT"
+  echo "base=origin/${{ github.event.repository.default_branch }}" >> "$GITHUB_OUTPUT"
 else
   echo "base=${{ github.event.before }}" >> "$GITHUB_OUTPUT"
 fi
.github/workflows/ci-docs.yml (vendored, 2 changed lines)

@@ -43,7 +43,7 @@ jobs:
 if [[ "${{ github.event_name }}" == "pull_request" ]]; then
   echo "base=${{ github.event.pull_request.base.sha }}" >> "$GITHUB_OUTPUT"
 elif [[ "${{ github.event.created }}" == "true" ]]; then
-  echo "base=${{ github.event.repository.default_branch }}" >> "$GITHUB_OUTPUT"
+  echo "base=origin/${{ github.event.repository.default_branch }}" >> "$GITHUB_OUTPUT"
 else
   echo "base=${{ github.event.before }}" >> "$GITHUB_OUTPUT"
 fi
.github/workflows/ci-frontend.yml (vendored, 2 changed lines)

@@ -38,7 +38,7 @@ jobs:
 if [[ "${{ github.event_name }}" == "pull_request" ]]; then
   echo "base=${{ github.event.pull_request.base.sha }}" >> "$GITHUB_OUTPUT"
 elif [[ "${{ github.event.created }}" == "true" ]]; then
-  echo "base=${{ github.event.repository.default_branch }}" >> "$GITHUB_OUTPUT"
+  echo "base=origin/${{ github.event.repository.default_branch }}" >> "$GITHUB_OUTPUT"
 else
   echo "base=${{ github.event.before }}" >> "$GITHUB_OUTPUT"
 fi
@@ -169,7 +169,7 @@ def match_storage_paths(document: Document, classifier: DocumentClassifier, user
 def matches(matching_model: MatchingModel, document: Document):
     search_flags = 0

-    document_content = document.get_effective_content() or ""
+    document_content = document.content

     # Check that match is not empty
     if not matching_model.match.strip():
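For readers unfamiliar with the matching module, the sketch below shows roughly what a MATCH_ANY check over that document_content amounts to. It is a simplified, self-contained stand-in for documents.matching, not the project's actual implementation, which also handles case-sensitivity flags, fuzzy matching, and the other matching algorithms.

import re


def match_any(match_terms: str, document_content: str) -> bool:
    # True if any whitespace-separated term occurs as a whole word
    # in the document content (case-insensitive).
    if not match_terms.strip():
        return False
    content = document_content.lower()
    return any(
        re.search(rf"\b{re.escape(term)}\b", content) is not None
        for term in match_terms.lower().split()
    )


# With the change above, a root document is now matched against its own
# content rather than the latest version's content:
assert match_any("keyword", "latest version contains keyword")
assert not match_any("keyword", "root content without token")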
@@ -361,42 +361,6 @@ class Document(SoftDeleteModel, ModelWithOwner):  # type: ignore[django-manager-
             res += f" {self.title}"
         return res

-    def get_effective_content(self) -> str | None:
-        """
-        Returns the effective content for the document.
-
-        For root documents, this is the latest version's content when available.
-        For version documents, this is always the document's own content.
-        If the queryset already annotated ``effective_content``, that value is used.
-        """
-        if hasattr(self, "effective_content"):
-            return getattr(self, "effective_content")
-
-        if self.root_document_id is not None or self.pk is None:
-            return self.content
-
-        prefetched_cache = getattr(self, "_prefetched_objects_cache", None)
-        prefetched_versions = (
-            prefetched_cache.get("versions")
-            if isinstance(prefetched_cache, dict)
-            else None
-        )
-        if prefetched_versions:
-            latest_prefetched = max(prefetched_versions, key=lambda doc: doc.id)
-            return latest_prefetched.content
-
-        latest_version_content = (
-            Document.objects.filter(root_document=self)
-            .order_by("-id")
-            .values_list("content", flat=True)
-            .first()
-        )
-        return (
-            latest_version_content
-            if latest_version_content is not None
-            else self.content
-        )
-
     @property
     def suggestion_content(self):
         """
@@ -409,21 +373,15 @@ class Document(SoftDeleteModel, ModelWithOwner):  # type: ignore[django-manager-
         This improves processing speed for large documents while keeping
         enough context for accurate suggestions.
         """
-        effective_content = self.get_effective_content()
-        if not effective_content or len(effective_content) <= 1200000:
-            return effective_content
+        if not self.content or len(self.content) <= 1200000:
+            return self.content
         else:
             # Use 80% from the start and 20% from the end
             # to preserve both opening and closing context.
             head_len = 800000
             tail_len = 200000

-            return " ".join(
-                (
-                    effective_content[:head_len],
-                    effective_content[-tail_len:],
-                ),
-            )
+            return " ".join((self.content[:head_len], self.content[-tail_len:]))

     @property
     def source_path(self) -> Path:
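To make the 80/20 split above concrete, here is a standalone sketch of the same truncation rule. The 1,200,000-character threshold and the 800,000/200,000 head/tail sizes come from the hunk above; the helper name and the reduced-limit demonstration are illustrative only.

def truncate_for_suggestions(content: str, limit: int = 1_200_000) -> str:
    # Short content is returned unchanged; long content keeps 80% of the
    # limit from the start and 20% from the end, so both opening and
    # closing context survive.
    if not content or len(content) <= limit:
        return content
    head_len = int(limit * 0.8)  # 800,000 at the default limit
    tail_len = limit - head_len  # 200,000 at the default limit
    return " ".join((content[:head_len], content[-tail_len:]))


# Tiny demonstration with a reduced limit:
assert truncate_for_suggestions("abcdefghij", limit=5) == "abcd j"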
@@ -156,46 +156,6 @@ class TestDocument(TestCase):
         )
         self.assertEqual(doc.get_public_filename(), "2020-12-25 test")

-    def test_suggestion_content_uses_latest_version_content_for_root_documents(
-        self,
-    ) -> None:
-        root = Document.objects.create(
-            title="root",
-            checksum="root",
-            mime_type="application/pdf",
-            content="outdated root content",
-        )
-        version = Document.objects.create(
-            title="v1",
-            checksum="v1",
-            mime_type="application/pdf",
-            root_document=root,
-            content="latest version content",
-        )
-
-        self.assertEqual(root.suggestion_content, version.content)
-
-    def test_content_length_is_per_document_row_for_versions(self) -> None:
-        root = Document.objects.create(
-            title="root",
-            checksum="root",
-            mime_type="application/pdf",
-            content="abc",
-        )
-        version = Document.objects.create(
-            title="v1",
-            checksum="v1",
-            mime_type="application/pdf",
-            root_document=root,
-            content="abcdefgh",
-        )
-
-        root.refresh_from_db()
-        version.refresh_from_db()
-
-        self.assertEqual(root.content_length, 3)
-        self.assertEqual(version.content_length, 8)
-

 def test_suggestion_content() -> None:
     """
@@ -48,52 +48,6 @@ class _TestMatchingBase(TestCase):


 class TestMatching(_TestMatchingBase):
-    def test_matches_uses_latest_version_content_for_root_documents(self) -> None:
-        root = Document.objects.create(
-            title="root",
-            checksum="root",
-            mime_type="application/pdf",
-            content="root content without token",
-        )
-        Document.objects.create(
-            title="v1",
-            checksum="v1",
-            mime_type="application/pdf",
-            root_document=root,
-            content="latest version contains keyword",
-        )
-        tag = Tag.objects.create(
-            name="tag",
-            match="keyword",
-            matching_algorithm=Tag.MATCH_ANY,
-        )
-
-        self.assertTrue(matching.matches(tag, root))
-
-    def test_matches_does_not_fall_back_to_root_content_when_version_exists(
-        self,
-    ) -> None:
-        root = Document.objects.create(
-            title="root",
-            checksum="root",
-            mime_type="application/pdf",
-            content="root contains keyword",
-        )
-        Document.objects.create(
-            title="v1",
-            checksum="v1",
-            mime_type="application/pdf",
-            root_document=root,
-            content="latest version without token",
-        )
-        tag = Tag.objects.create(
-            name="tag",
-            match="keyword",
-            matching_algorithm=Tag.MATCH_ANY,
-        )
-
-        self.assertFalse(matching.matches(tag, root))
-
     def test_match_none(self) -> None:
         self._test_matching(
             "",
@@ -2,7 +2,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: paperless-ngx\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-03-09 17:44+0000\n"
+"POT-Creation-Date: 2026-03-09 01:51+0000\n"
 "PO-Revision-Date: 2022-02-17 04:17\n"
 "Last-Translator: \n"
 "Language-Team: English\n"
@@ -1856,151 +1856,151 @@ msgstr ""
 msgid "paperless application settings"
 msgstr ""

-#: paperless/settings/__init__.py:521
+#: paperless/settings/__init__.py:752
 msgid "English (US)"
 msgstr ""

-#: paperless/settings/__init__.py:522
+#: paperless/settings/__init__.py:753
 msgid "Arabic"
 msgstr ""

-#: paperless/settings/__init__.py:523
+#: paperless/settings/__init__.py:754
 msgid "Afrikaans"
 msgstr ""

-#: paperless/settings/__init__.py:524
+#: paperless/settings/__init__.py:755
 msgid "Belarusian"
 msgstr ""

-#: paperless/settings/__init__.py:525
+#: paperless/settings/__init__.py:756
 msgid "Bulgarian"
 msgstr ""

-#: paperless/settings/__init__.py:526
+#: paperless/settings/__init__.py:757
 msgid "Catalan"
 msgstr ""

-#: paperless/settings/__init__.py:527
+#: paperless/settings/__init__.py:758
 msgid "Czech"
 msgstr ""

-#: paperless/settings/__init__.py:528
+#: paperless/settings/__init__.py:759
 msgid "Danish"
 msgstr ""

-#: paperless/settings/__init__.py:529
+#: paperless/settings/__init__.py:760
 msgid "German"
 msgstr ""

-#: paperless/settings/__init__.py:530
+#: paperless/settings/__init__.py:761
 msgid "Greek"
 msgstr ""

-#: paperless/settings/__init__.py:531
+#: paperless/settings/__init__.py:762
 msgid "English (GB)"
 msgstr ""

-#: paperless/settings/__init__.py:532
+#: paperless/settings/__init__.py:763
 msgid "Spanish"
 msgstr ""

-#: paperless/settings/__init__.py:533
+#: paperless/settings/__init__.py:764
 msgid "Persian"
 msgstr ""

-#: paperless/settings/__init__.py:534
+#: paperless/settings/__init__.py:765
 msgid "Finnish"
 msgstr ""

-#: paperless/settings/__init__.py:535
+#: paperless/settings/__init__.py:766
 msgid "French"
 msgstr ""

-#: paperless/settings/__init__.py:536
+#: paperless/settings/__init__.py:767
 msgid "Hungarian"
 msgstr ""

-#: paperless/settings/__init__.py:537
+#: paperless/settings/__init__.py:768
 msgid "Indonesian"
 msgstr ""

-#: paperless/settings/__init__.py:538
+#: paperless/settings/__init__.py:769
 msgid "Italian"
 msgstr ""

-#: paperless/settings/__init__.py:539
+#: paperless/settings/__init__.py:770
 msgid "Japanese"
 msgstr ""

-#: paperless/settings/__init__.py:540
+#: paperless/settings/__init__.py:771
 msgid "Korean"
 msgstr ""

-#: paperless/settings/__init__.py:541
+#: paperless/settings/__init__.py:772
 msgid "Luxembourgish"
 msgstr ""

-#: paperless/settings/__init__.py:542
+#: paperless/settings/__init__.py:773
 msgid "Norwegian"
 msgstr ""

-#: paperless/settings/__init__.py:543
+#: paperless/settings/__init__.py:774
 msgid "Dutch"
 msgstr ""

-#: paperless/settings/__init__.py:544
+#: paperless/settings/__init__.py:775
 msgid "Polish"
 msgstr ""

-#: paperless/settings/__init__.py:545
+#: paperless/settings/__init__.py:776
 msgid "Portuguese (Brazil)"
 msgstr ""

-#: paperless/settings/__init__.py:546
+#: paperless/settings/__init__.py:777
 msgid "Portuguese"
 msgstr ""

-#: paperless/settings/__init__.py:547
+#: paperless/settings/__init__.py:778
 msgid "Romanian"
 msgstr ""

-#: paperless/settings/__init__.py:548
+#: paperless/settings/__init__.py:779
 msgid "Russian"
 msgstr ""

-#: paperless/settings/__init__.py:549
+#: paperless/settings/__init__.py:780
 msgid "Slovak"
 msgstr ""

-#: paperless/settings/__init__.py:550
+#: paperless/settings/__init__.py:781
 msgid "Slovenian"
 msgstr ""

-#: paperless/settings/__init__.py:551
+#: paperless/settings/__init__.py:782
 msgid "Serbian"
 msgstr ""

-#: paperless/settings/__init__.py:552
+#: paperless/settings/__init__.py:783
 msgid "Swedish"
 msgstr ""

-#: paperless/settings/__init__.py:553
+#: paperless/settings/__init__.py:784
 msgid "Turkish"
 msgstr ""

-#: paperless/settings/__init__.py:554
+#: paperless/settings/__init__.py:785
 msgid "Ukrainian"
 msgstr ""

-#: paperless/settings/__init__.py:555
+#: paperless/settings/__init__.py:786
 msgid "Vietnamese"
 msgstr ""

-#: paperless/settings/__init__.py:556
+#: paperless/settings/__init__.py:787
 msgid "Chinese Simplified"
 msgstr ""

-#: paperless/settings/__init__.py:557
+#: paperless/settings/__init__.py:788
 msgid "Chinese Traditional"
 msgstr ""
@@ -1,6 +1,7 @@
 import os

 from celery import Celery
+from celery.signals import worker_process_init

 # Set the default Django settings module for the 'celery' program.
 os.environ.setdefault("DJANGO_SETTINGS_MODULE", "paperless.settings")
@@ -15,3 +16,18 @@ app.config_from_object("django.conf:settings", namespace="CELERY")

 # Load task modules from all registered Django apps.
 app.autodiscover_tasks()
+
+
+@worker_process_init.connect
+def on_worker_process_init(**kwargs) -> None:
+    """Register built-in parsers eagerly in each Celery worker process.
+
+    This registers only the built-in parsers (no entrypoint discovery) so
+    that workers can begin consuming documents immediately. Entrypoint
+    discovery for third-party parsers is deferred to the first call of
+    ``get_parser_registry()`` inside a task, keeping ``worker_process_init``
+    well within its 4-second timeout budget.
+    """
+    from paperless.parsers.registry import init_builtin_parsers
+
+    init_builtin_parsers()
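The sketch below shows how a worker-side task might rely on that split: built-in parsers are already registered by the signal handler above, and the first registry access inside a task performs entrypoint discovery. The consume_file task and its signature are hypothetical and only illustrate the pattern; get_parser_registry and the registry methods are the ones introduced in the new modules below.

from pathlib import Path

from celery import shared_task

from paperless.parsers.registry import get_parser_registry


@shared_task
def consume_file(path: str, mime_type: str) -> str | None:
    # First call in this worker also runs entrypoint discovery and logs
    # the parser summary; later calls return the cached registry.
    registry = get_parser_registry()

    filepath = Path(path)
    parser_class = registry.get_parser_for_file(mime_type, filepath.name, filepath)
    if parser_class is None:
        return None

    # Parsers are context managers; temporary files are cleaned up on exit.
    with parser_class() as parser:
        parser.parse(filepath, mime_type)
        return parser.get_text()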
src/paperless/parsers/__init__.py (new file, 379 lines)
@@ -0,0 +1,379 @@
|
||||
"""
|
||||
Public interface for the Paperless-ngx parser plugin system.
|
||||
|
||||
This module defines ParserProtocol — the structural contract that every
|
||||
document parser must satisfy, whether it is a built-in parser shipped with
|
||||
Paperless-ngx or a third-party parser installed via a Python entrypoint.
|
||||
|
||||
Phase 1/2 scope: only the Protocol is defined here. The transitional
|
||||
DocumentParser ABC (Phase 3) and concrete built-in parsers (Phase 3+) will
|
||||
be added in later phases, so there are intentionally no imports of parser
|
||||
implementations here.
|
||||
|
||||
Usage example (third-party parser)::
|
||||
|
||||
from paperless.parsers import ParserProtocol
|
||||
|
||||
class MyParser:
|
||||
name = "my-parser"
|
||||
version = "1.0.0"
|
||||
author = "Acme Corp"
|
||||
url = "https://example.com/my-parser"
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls) -> dict[str, str]:
|
||||
return {"application/x-my-format": ".myf"}
|
||||
|
||||
@classmethod
|
||||
def score(cls, mime_type, filename, path=None):
|
||||
return 10
|
||||
|
||||
# … implement remaining protocol methods …
|
||||
|
||||
assert isinstance(MyParser(), ParserProtocol)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import TYPE_CHECKING
|
||||
from typing import Protocol
|
||||
from typing import Self
|
||||
from typing import TypedDict
|
||||
from typing import runtime_checkable
|
||||
|
||||
if TYPE_CHECKING:
|
||||
import datetime
|
||||
from pathlib import Path
|
||||
from types import TracebackType
|
||||
|
||||
__all__ = [
|
||||
"MetadataEntry",
|
||||
"ParserProtocol",
|
||||
]
|
||||
|
||||
|
||||
class MetadataEntry(TypedDict):
|
||||
"""A single metadata field extracted from a document.
|
||||
|
||||
All four keys are required. Values are always serialised to strings —
|
||||
type-specific conversion (dates, integers, lists) is the responsibility
|
||||
of the parser before returning.
|
||||
"""
|
||||
|
||||
namespace: str
|
||||
"""URI of the metadata namespace (e.g. 'http://ns.adobe.com/pdf/1.3/')."""
|
||||
|
||||
prefix: str
|
||||
"""Conventional namespace prefix (e.g. 'pdf', 'xmp', 'dc')."""
|
||||
|
||||
key: str
|
||||
"""Field name within the namespace (e.g. 'Author', 'CreateDate')."""
|
||||
|
||||
value: str
|
||||
"""String representation of the field value."""
|
||||
|
||||
|
||||
@runtime_checkable
|
||||
class ParserProtocol(Protocol):
|
||||
"""Structural contract for all Paperless-ngx document parsers.
|
||||
|
||||
Both built-in parsers and third-party plugins (discovered via the
|
||||
"paperless_ngx.parsers" entrypoint group) must satisfy this Protocol.
|
||||
Because it is decorated with runtime_checkable, isinstance(obj,
|
||||
ParserProtocol) works at runtime based on method presence, which is
|
||||
useful for validation in ParserRegistry.discover.
|
||||
|
||||
Parsers must expose four string attributes at the class level so the
|
||||
registry can log attribution information without instantiating the parser:
|
||||
|
||||
name : str
|
||||
Human-readable parser name (e.g. "Tesseract OCR").
|
||||
version : str
|
||||
Semantic version string (e.g. "1.2.3").
|
||||
author : str
|
||||
Author or organisation name.
|
||||
url : str
|
||||
URL for documentation, source code, or issue tracker.
|
||||
"""
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Class-level identity (checked by the registry, not Protocol methods)
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
name: str
|
||||
version: str
|
||||
author: str
|
||||
url: str
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Class methods
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls) -> dict[str, str]:
|
||||
"""Return a mapping of supported MIME types to preferred file extensions.
|
||||
|
||||
The keys are MIME type strings (e.g. "application/pdf"), and the
|
||||
values are the preferred file extension including the leading dot
|
||||
(e.g. ".pdf"). The registry uses this mapping both to decide whether
|
||||
a parser is a candidate for a given file and to determine the default
|
||||
extension when creating archive copies.
|
||||
|
||||
Returns
|
||||
-------
|
||||
dict[str, str]
|
||||
{mime_type: extension} mapping — may be empty if the parser
|
||||
has been temporarily disabled.
|
||||
"""
|
||||
...
|
||||
|
||||
@classmethod
|
||||
def score(
|
||||
cls,
|
||||
mime_type: str,
|
||||
filename: str,
|
||||
path: Path | None = None,
|
||||
) -> int | None:
|
||||
"""Return a priority score for handling this file, or None to decline.
|
||||
|
||||
The registry calls this after confirming that the MIME type is in
|
||||
supported_mime_types. Parsers may inspect filename and optionally
|
||||
the file at path to refine their confidence level.
|
||||
|
||||
A higher score wins. Return None to explicitly decline handling a file
|
||||
even though the MIME type is listed as supported (e.g. when a feature
|
||||
flag is disabled, or a required service is not configured).
|
||||
|
||||
Parameters
|
||||
----------
|
||||
mime_type:
|
||||
The detected MIME type of the file to be parsed.
|
||||
filename:
|
||||
The original filename, including extension.
|
||||
path:
|
||||
Optional filesystem path to the file. Parsers that need to
|
||||
inspect file content (e.g. magic-byte sniffing) may use this.
|
||||
May be None when scoring happens before the file is available locally.
|
||||
|
||||
Returns
|
||||
-------
|
||||
int | None
|
||||
Priority score (higher wins), or None to decline.
|
||||
"""
|
||||
...
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Properties
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
@property
|
||||
def can_produce_archive(self) -> bool:
|
||||
"""Whether this parser can produce a searchable PDF archive copy.
|
||||
|
||||
If True, the consumption pipeline may request an archive version when
|
||||
processing the document, subject to the ARCHIVE_FILE_GENERATION
|
||||
setting. If False, only thumbnail and text extraction are performed.
|
||||
"""
|
||||
...
|
||||
|
||||
@property
|
||||
def requires_pdf_rendition(self) -> bool:
|
||||
"""Whether the parser must produce a PDF for the frontend to display.
|
||||
|
||||
True for formats the browser cannot display natively (e.g. DOCX, ODT).
|
||||
When True, the pipeline always stores the PDF output regardless of the
|
||||
ARCHIVE_FILE_GENERATION setting, since the original format cannot be
|
||||
shown to the user.
|
||||
"""
|
||||
...
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Core parsing interface
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def parse(
|
||||
self,
|
||||
document_path: Path,
|
||||
mime_type: str,
|
||||
*,
|
||||
produce_archive: bool = True,
|
||||
) -> None:
|
||||
"""Parse document_path and populate internal state.
|
||||
|
||||
After a successful call, callers retrieve results via get_text,
|
||||
get_date, and get_archive_path.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
document_path:
|
||||
Absolute path to the document file to parse.
|
||||
mime_type:
|
||||
Detected MIME type of the document.
|
||||
produce_archive:
|
||||
When True (the default) and can_produce_archive is also True,
|
||||
the parser should produce a searchable PDF at the path returned
|
||||
by get_archive_path. Pass False when only text extraction and
|
||||
thumbnail generation are required and disk I/O should be minimised.
|
||||
|
||||
Raises
|
||||
------
|
||||
documents.parsers.ParseError
|
||||
If parsing fails for any reason.
|
||||
"""
|
||||
...
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Result accessors
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def get_text(self) -> str | None:
|
||||
"""Return the plain-text content extracted during parse.
|
||||
|
||||
Returns
|
||||
-------
|
||||
str | None
|
||||
Extracted text, or None if no text could be found.
|
||||
"""
|
||||
...
|
||||
|
||||
def get_date(self) -> datetime.datetime | None:
|
||||
"""Return the document date detected during parse.
|
||||
|
||||
Returns
|
||||
-------
|
||||
datetime.datetime | None
|
||||
Detected document date, or None if no date was found.
|
||||
"""
|
||||
...
|
||||
|
||||
def get_archive_path(self) -> Path | None:
|
||||
"""Return the path to the generated archive PDF, or None.
|
||||
|
||||
Returns
|
||||
-------
|
||||
Path | None
|
||||
Path to the searchable PDF archive, or None if no archive was
|
||||
produced (e.g. because produce_archive=False or the parser does
|
||||
not support archive generation).
|
||||
"""
|
||||
...
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Thumbnail and metadata
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def get_thumbnail(self, document_path: Path, mime_type: str) -> Path:
|
||||
"""Generate and return the path to a thumbnail image for the document.
|
||||
|
||||
May be called independently of parse. The returned path must point to
|
||||
an existing WebP image file inside the parser's temporary working
|
||||
directory.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
document_path:
|
||||
Absolute path to the source document.
|
||||
mime_type:
|
||||
Detected MIME type of the document.
|
||||
|
||||
Returns
|
||||
-------
|
||||
Path
|
||||
Path to the generated thumbnail image (WebP format preferred).
|
||||
"""
|
||||
...
|
||||
|
||||
def get_page_count(
|
||||
self,
|
||||
document_path: Path,
|
||||
mime_type: str,
|
||||
) -> int | None:
|
||||
"""Return the number of pages in the document, if determinable.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
document_path:
|
||||
Absolute path to the source document.
|
||||
mime_type:
|
||||
Detected MIME type of the document.
|
||||
|
||||
Returns
|
||||
-------
|
||||
int | None
|
||||
Page count, or None if the parser cannot determine it.
|
||||
"""
|
||||
...
|
||||
|
||||
def extract_metadata(
|
||||
self,
|
||||
document_path: Path,
|
||||
mime_type: str,
|
||||
) -> list[MetadataEntry]:
|
||||
"""Extract format-specific metadata from the document.
|
||||
|
||||
Called by the API view layer on demand — not during the consumption
|
||||
pipeline. Results are returned to the frontend for per-file display.
|
||||
|
||||
For documents with an archive version, this method is called twice:
|
||||
once for the original file (with its native MIME type) and once for
|
||||
the archive file (with ``"application/pdf"``). Parsers that produce
|
||||
archives should handle both cases.
|
||||
|
||||
Implementations must not raise. A failure to read metadata is not
|
||||
fatal — log a warning and return whatever partial results were
|
||||
collected, or ``[]`` if none.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
document_path:
|
||||
Absolute path to the file to extract metadata from.
|
||||
mime_type:
|
||||
MIME type of the file at ``document_path``. May be
|
||||
``"application/pdf"`` when called for the archive version.
|
||||
|
||||
Returns
|
||||
-------
|
||||
list[MetadataEntry]
|
||||
Zero or more metadata entries. Returns ``[]`` if no metadata
|
||||
could be extracted or the format does not support it.
|
||||
"""
|
||||
...
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Context manager
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def __enter__(self) -> Self:
|
||||
"""Enter the parser context, returning the parser instance.
|
||||
|
||||
Implementations should perform any resource allocation here if not
|
||||
done in __init__ (e.g. creating API clients or temp directories).
|
||||
|
||||
Returns
|
||||
-------
|
||||
Self
|
||||
The parser instance itself.
|
||||
"""
|
||||
...
|
||||
|
||||
def __exit__(
|
||||
self,
|
||||
exc_type: type[BaseException] | None,
|
||||
exc_val: BaseException | None,
|
||||
exc_tb: TracebackType | None,
|
||||
) -> None:
|
||||
"""Exit the parser context and release all resources.
|
||||
|
||||
Implementations must clean up all temporary files and other resources
|
||||
regardless of whether an exception occurred.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
exc_type:
|
||||
The exception class, or None if no exception was raised.
|
||||
exc_val:
|
||||
The exception instance, or None.
|
||||
exc_tb:
|
||||
The traceback, or None.
|
||||
"""
|
||||
...
|
||||
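Building on the usage example in the module docstring above, the toy class below fills in every protocol member so that the runtime isinstance check passes and the registry's required-attribute validation would accept it. NullParser is purely illustrative and is not part of this change.

from pathlib import Path

from paperless.parsers import MetadataEntry
from paperless.parsers import ParserProtocol


class NullParser:
    # Class-level identity the registry logs without instantiating.
    name = "Null Parser"
    version = "0.0.1"
    author = "Example Author"
    url = "https://example.invalid/null-parser"

    @classmethod
    def supported_mime_types(cls) -> dict[str, str]:
        return {"application/x-null": ".null"}

    @classmethod
    def score(cls, mime_type: str, filename: str, path: Path | None = None) -> int | None:
        return 1 if mime_type in cls.supported_mime_types() else None

    @property
    def can_produce_archive(self) -> bool:
        return False

    @property
    def requires_pdf_rendition(self) -> bool:
        return False

    def parse(self, document_path: Path, mime_type: str, *, produce_archive: bool = True) -> None:
        self._text = document_path.read_text(errors="replace")

    def get_text(self) -> str | None:
        return getattr(self, "_text", None)

    def get_date(self):
        return None

    def get_archive_path(self) -> Path | None:
        return None

    def get_thumbnail(self, document_path: Path, mime_type: str) -> Path:
        return document_path  # a real parser returns a generated WebP file

    def get_page_count(self, document_path: Path, mime_type: str) -> int | None:
        return None

    def extract_metadata(self, document_path: Path, mime_type: str) -> list[MetadataEntry]:
        return []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        return None


# Structural check, as in the docstring example above:
assert isinstance(NullParser(), ParserProtocol)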
src/paperless/parsers/registry.py (new file, 365 lines)
@@ -0,0 +1,365 @@
|
||||
"""
|
||||
Singleton registry that tracks all document parsers available to
|
||||
Paperless-ngx — both built-ins shipped with the application and third-party
|
||||
plugins installed via Python entrypoints.
|
||||
|
||||
Public surface
|
||||
--------------
|
||||
get_parser_registry
|
||||
Lazy-initialise and return the shared ParserRegistry. This is the primary
|
||||
entry point for production code.
|
||||
|
||||
init_builtin_parsers
|
||||
Register built-in parsers only, without entrypoint discovery. Safe to
|
||||
call from Celery worker_process_init where importing all entrypoints
|
||||
would be wasteful or cause side effects.
|
||||
|
||||
reset_parser_registry
|
||||
Reset module-level state. For tests only.
|
||||
|
||||
Entrypoint group
|
||||
----------------
|
||||
Third-party parsers must advertise themselves under the
|
||||
"paperless_ngx.parsers" entrypoint group in their pyproject.toml::
|
||||
|
||||
[project.entry-points."paperless_ngx.parsers"]
|
||||
my_parser = "my_package.parsers:MyParser"
|
||||
|
||||
The loaded class must expose the following attributes at the class level
|
||||
(not just on instances) for the registry to accept it:
|
||||
name, version, author, url, supported_mime_types (callable), score (callable).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from importlib.metadata import entry_points
|
||||
from typing import TYPE_CHECKING
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from pathlib import Path
|
||||
|
||||
from paperless.parsers import ParserProtocol
|
||||
|
||||
logger = logging.getLogger("paperless.parsers.registry")
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Module-level singleton state
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_registry: ParserRegistry | None = None
|
||||
_discovery_complete: bool = False
|
||||
|
||||
# Attribute names that every registered external parser class must expose.
|
||||
_REQUIRED_ATTRS: tuple[str, ...] = (
|
||||
"name",
|
||||
"version",
|
||||
"author",
|
||||
"url",
|
||||
"supported_mime_types",
|
||||
"score",
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Module-level accessor functions
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def get_parser_registry() -> ParserRegistry:
|
||||
"""Return the shared ParserRegistry instance.
|
||||
|
||||
On the first call this function:
|
||||
|
||||
1. Creates a new ParserRegistry.
|
||||
2. Calls register_defaults to install built-in parsers.
|
||||
3. Calls discover to load third-party plugins via importlib.metadata entrypoints.
|
||||
4. Calls log_summary to emit a startup summary.
|
||||
|
||||
Subsequent calls return the same instance immediately.
|
||||
|
||||
Returns
|
||||
-------
|
||||
ParserRegistry
|
||||
The shared registry singleton.
|
||||
"""
|
||||
global _registry, _discovery_complete
|
||||
|
||||
if _registry is None:
|
||||
_registry = ParserRegistry()
|
||||
_registry.register_defaults()
|
||||
|
||||
if not _discovery_complete:
|
||||
_registry.discover()
|
||||
_registry.log_summary()
|
||||
_discovery_complete = True
|
||||
|
||||
return _registry
|
||||
|
||||
|
||||
def init_builtin_parsers() -> None:
|
||||
"""Register built-in parsers without performing entrypoint discovery.
|
||||
|
||||
Intended for use in Celery worker_process_init handlers where importing
|
||||
all installed entrypoints would be wasteful, slow, or could produce
|
||||
undesirable side effects. Entrypoint discovery (third-party plugins) is
|
||||
deliberately not performed.
|
||||
|
||||
Safe to call multiple times — subsequent calls are no-ops.
|
||||
|
||||
Returns
|
||||
-------
|
||||
None
|
||||
"""
|
||||
global _registry
|
||||
|
||||
if _registry is None:
|
||||
_registry = ParserRegistry()
|
||||
_registry.register_defaults()
|
||||
_registry.log_summary()
|
||||
|
||||
|
||||
def reset_parser_registry() -> None:
|
||||
"""Reset the module-level registry state to its initial values.
|
||||
|
||||
Resets _registry and _discovery_complete so the next call to
|
||||
get_parser_registry will re-initialise everything from scratch.
|
||||
|
||||
FOR TESTS ONLY. Do not call this in production code — resetting the
|
||||
registry mid-request causes all subsequent parser lookups to go through
|
||||
discovery again, which is expensive and may have unexpected side effects
|
||||
in multi-threaded environments.
|
||||
|
||||
Returns
|
||||
-------
|
||||
None
|
||||
"""
|
||||
global _registry, _discovery_complete
|
||||
|
||||
_registry = None
|
||||
_discovery_complete = False
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Registry class
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class ParserRegistry:
|
||||
"""Registry that maps MIME types to the best available parser class.
|
||||
|
||||
Parsers are partitioned into two lists:
|
||||
|
||||
_builtins
|
||||
Parser classes registered via register_builtin (populated by
|
||||
register_defaults in Phase 3+).
|
||||
|
||||
_external
|
||||
Parser classes loaded from installed Python entrypoints via discover.
|
||||
|
||||
When resolving a parser for a file, external parsers are evaluated
|
||||
alongside built-in parsers using a uniform scoring mechanism. Both lists
|
||||
are iterated together; the class with the highest score wins. If an
|
||||
external parser wins, its attribution details are logged so users can
|
||||
identify which third-party package handled their document.
|
||||
"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._external: list[type[ParserProtocol]] = []
|
||||
self._builtins: list[type[ParserProtocol]] = []
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Registration
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def register_builtin(self, parser_class: type[ParserProtocol]) -> None:
|
||||
"""Register a built-in parser class.
|
||||
|
||||
Built-in parsers are shipped with Paperless-ngx and are appended to
|
||||
the _builtins list. They are never overridden by external parsers;
|
||||
instead, scoring determines which parser wins for any given file.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
parser_class:
|
||||
The parser class to register. Must satisfy ParserProtocol.
|
||||
"""
|
||||
self._builtins.append(parser_class)
|
||||
|
||||
def register_defaults(self) -> None:
|
||||
"""Register the built-in parsers that ship with Paperless-ngx.
|
||||
|
||||
Each parser that has been migrated to the new ParserProtocol interface
|
||||
is registered here. Parsers are added in ascending weight order so
|
||||
that log output is predictable; scoring determines which parser wins
|
||||
at runtime regardless of registration order.
|
||||
"""
|
||||
from paperless.parsers.text import TextDocumentParser
|
||||
|
||||
self.register_builtin(TextDocumentParser)
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Discovery
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def discover(self) -> None:
|
||||
"""Load third-party parsers from the "paperless_ngx.parsers" entrypoint group.
|
||||
|
||||
For each advertised entrypoint the method:
|
||||
|
||||
1. Calls ep.load() to import the class.
|
||||
2. Validates that the class exposes all required attributes.
|
||||
3. On success, appends the class to _external and logs an info message.
|
||||
4. On failure (import error or missing attributes), logs an appropriate
|
||||
warning/error and continues to the next entrypoint.
|
||||
|
||||
Errors during discovery of a single parser do not prevent other parsers
|
||||
from being loaded.
|
||||
|
||||
Returns
|
||||
-------
|
||||
None
|
||||
"""
|
||||
eps = entry_points(group="paperless_ngx.parsers")
|
||||
|
||||
for ep in eps:
|
||||
try:
|
||||
parser_class = ep.load()
|
||||
except Exception:
|
||||
logger.exception(
|
||||
"Failed to load parser entrypoint '%s' — skipping.",
|
||||
ep.name,
|
||||
)
|
||||
continue
|
||||
|
||||
missing = [
|
||||
attr for attr in _REQUIRED_ATTRS if not hasattr(parser_class, attr)
|
||||
]
|
||||
if missing:
|
||||
logger.warning(
|
||||
"Parser loaded from entrypoint '%s' is missing required "
|
||||
"attributes %r — skipping.",
|
||||
ep.name,
|
||||
missing,
|
||||
)
|
||||
continue
|
||||
|
||||
self._external.append(parser_class)
|
||||
logger.info(
|
||||
"Loaded third-party parser '%s' v%s by %s (entrypoint: '%s').",
|
||||
parser_class.name,
|
||||
parser_class.version,
|
||||
parser_class.author,
|
||||
ep.name,
|
||||
)
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Summary logging
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def log_summary(self) -> None:
|
||||
"""Log a startup summary of all registered parsers.
|
||||
|
||||
Built-in parsers are listed first, followed by any external parsers
|
||||
discovered from entrypoints. If no external parsers were found a
|
||||
short informational message is logged instead of an empty list.
|
||||
|
||||
Returns
|
||||
-------
|
||||
None
|
||||
"""
|
||||
logger.info(
|
||||
"Built-in parsers (%d):",
|
||||
len(self._builtins),
|
||||
)
|
||||
for cls in self._builtins:
|
||||
logger.info(
|
||||
" [built-in] %s v%s — %s",
|
||||
getattr(cls, "name", repr(cls)),
|
||||
getattr(cls, "version", "unknown"),
|
||||
getattr(cls, "url", "built-in"),
|
||||
)
|
||||
|
||||
if not self._external:
|
||||
logger.info("No third-party parsers discovered.")
|
||||
return
|
||||
|
||||
logger.info(
|
||||
"Third-party parsers (%d):",
|
||||
len(self._external),
|
||||
)
|
||||
for cls in self._external:
|
||||
logger.info(
|
||||
" [external] %s v%s by %s — report issues at %s",
|
||||
getattr(cls, "name", repr(cls)),
|
||||
getattr(cls, "version", "unknown"),
|
||||
getattr(cls, "author", "unknown"),
|
||||
getattr(cls, "url", "unknown"),
|
||||
)
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Parser resolution
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def get_parser_for_file(
|
||||
self,
|
||||
mime_type: str,
|
||||
filename: str,
|
||||
path: Path | None = None,
|
||||
) -> type[ParserProtocol] | None:
|
||||
"""Return the best parser class for the given file, or None.
|
||||
|
||||
All registered parsers (external first, then built-ins) are evaluated
|
||||
against the file. A parser is eligible if mime_type appears in the dict
|
||||
returned by its supported_mime_types classmethod, and its score
|
||||
classmethod returns a non-None integer.
|
||||
|
||||
The parser with the highest score wins. When two parsers return the
|
||||
same score, the one that appears earlier in the evaluation order wins
|
||||
(external parsers are evaluated before built-ins, giving third-party
|
||||
packages a chance to override defaults at equal priority).
|
||||
|
||||
When an external parser is selected, its identity is logged at INFO
|
||||
level so operators can trace which package handled a document.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
mime_type:
|
||||
The detected MIME type of the file.
|
||||
filename:
|
||||
The original filename, including extension.
|
||||
path:
|
||||
Optional filesystem path to the file. Forwarded to each
|
||||
parser's score method.
|
||||
|
||||
Returns
|
||||
-------
|
||||
type[ParserProtocol] | None
|
||||
The winning parser class, or None if no parser can handle the file.
|
||||
"""
|
||||
best_score: int | None = None
|
||||
best_parser: type[ParserProtocol] | None = None
|
||||
|
||||
# External parsers are placed first so that, at equal scores, an
|
||||
# external parser wins over a built-in (first-seen policy).
|
||||
for parser_class in (*self._external, *self._builtins):
|
||||
if mime_type not in parser_class.supported_mime_types():
|
||||
continue
|
||||
|
||||
score = parser_class.score(mime_type, filename, path)
|
||||
if score is None:
|
||||
continue
|
||||
|
||||
if best_score is None or score > best_score:
|
||||
best_score = score
|
||||
best_parser = parser_class
|
||||
|
||||
if best_parser is not None and best_parser in self._external:
|
||||
logger.info(
|
||||
"Document handled by third-party parser '%s' v%s — %s",
|
||||
getattr(best_parser, "name", repr(best_parser)),
|
||||
getattr(best_parser, "version", "unknown"),
|
||||
getattr(best_parser, "url", "unknown"),
|
||||
)
|
||||
|
||||
return best_parser
|
||||
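As a quick illustration of the resolution rules described in get_parser_for_file, the snippet below registers a toy external parser directly on a fresh registry (discover() would normally do this from an entrypoint) and resolves a PDF. FakePdfParser and its score of 20 are assumptions for the example; ParserRegistry and its behaviour are as defined above.

from paperless.parsers.registry import ParserRegistry


class FakePdfParser:
    name = "Fake PDF Parser"
    version = "0.1.0"
    author = "Example Author"
    url = "https://example.invalid/fake-pdf-parser"

    @classmethod
    def supported_mime_types(cls):
        return {"application/pdf": ".pdf"}

    @classmethod
    def score(cls, mime_type, filename, path=None):
        return 20  # would outbid a built-in returning a lower score


registry = ParserRegistry()
registry._external.append(FakePdfParser)  # stand-in for entrypoint discovery

# The highest non-None score wins; external winners are logged with attribution.
assert registry.get_parser_for_file("application/pdf", "invoice.pdf") is FakePdfParser

# A MIME type no registered parser lists resolves to None.
assert registry.get_parser_for_file("image/png", "scan.png") is None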
src/paperless/parsers/text.py (new file, 320 lines)
@@ -0,0 +1,320 @@
|
||||
"""
|
||||
Built-in plain-text document parser.
|
||||
|
||||
Handles text/plain, text/csv, and application/csv MIME types by reading the
|
||||
file content directly. Thumbnails are generated by rendering a page-sized
|
||||
WebP image from the first 100,000 characters using Pillow.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import shutil
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
from typing import TYPE_CHECKING
|
||||
from typing import Self
|
||||
|
||||
from django.conf import settings
|
||||
from PIL import Image
|
||||
from PIL import ImageDraw
|
||||
from PIL import ImageFont
|
||||
|
||||
from paperless.version import __full_version_str__
|
||||
|
||||
if TYPE_CHECKING:
|
||||
import datetime
|
||||
from types import TracebackType
|
||||
|
||||
from paperless.parsers import MetadataEntry
|
||||
|
||||
logger = logging.getLogger("paperless.parsing.text")
|
||||
|
||||
_SUPPORTED_MIME_TYPES: dict[str, str] = {
|
||||
"text/plain": ".txt",
|
||||
"text/csv": ".csv",
|
||||
"application/csv": ".csv",
|
||||
}
|
||||
|
||||
|
||||
class TextDocumentParser:
|
||||
"""Parse plain-text documents (txt, csv) for Paperless-ngx.
|
||||
|
||||
This parser reads the file content directly as UTF-8 text and renders a
|
||||
simple thumbnail using Pillow. It does not perform OCR and does not
|
||||
produce a searchable PDF archive copy.
|
||||
|
||||
Class attributes
|
||||
----------------
|
||||
name : str
|
||||
Human-readable parser name.
|
||||
version : str
|
||||
Semantic version string, kept in sync with Paperless-ngx releases.
|
||||
author : str
|
||||
Maintainer name.
|
||||
url : str
|
||||
Issue tracker / source URL.
|
||||
"""
|
||||
|
||||
name: str = "Paperless-ngx Text Parser"
|
||||
version: str = __full_version_str__
|
||||
author: str = "Paperless-ngx Contributors"
|
||||
url: str = "https://github.com/paperless-ngx/paperless-ngx"
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Class methods
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls) -> dict[str, str]:
|
||||
"""Return the MIME types this parser handles.
|
||||
|
||||
Returns
|
||||
-------
|
||||
dict[str, str]
|
||||
Mapping of MIME type to preferred file extension.
|
||||
"""
|
||||
return _SUPPORTED_MIME_TYPES
|
||||
|
||||
@classmethod
|
||||
def score(
|
||||
cls,
|
||||
mime_type: str,
|
||||
filename: str,
|
||||
path: Path | None = None,
|
||||
) -> int | None:
|
||||
"""Return the priority score for handling this file.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
mime_type:
|
||||
Detected MIME type of the file.
|
||||
filename:
|
||||
Original filename including extension.
|
||||
path:
|
||||
Optional filesystem path. Not inspected by this parser.
|
||||
|
||||
Returns
|
||||
-------
|
||||
int | None
|
||||
10 if the MIME type is supported, otherwise None.
|
||||
"""
|
||||
if mime_type in _SUPPORTED_MIME_TYPES:
|
||||
return 10
|
||||
return None
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Properties
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
@property
|
||||
def can_produce_archive(self) -> bool:
|
||||
"""Whether this parser can produce a searchable PDF archive copy.
|
||||
|
||||
Returns
|
||||
-------
|
||||
bool
|
||||
Always False — the text parser does not produce a PDF archive.
|
||||
"""
|
||||
return False
|
||||
|
||||
@property
|
||||
def requires_pdf_rendition(self) -> bool:
|
||||
"""Whether the parser must produce a PDF for the frontend to display.
|
||||
|
||||
Returns
|
||||
-------
|
||||
bool
|
||||
Always False — plain text files are displayable as-is.
|
||||
"""
|
||||
return False
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Lifecycle
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def __init__(self, logging_group: object = None) -> None:
|
||||
settings.SCRATCH_DIR.mkdir(parents=True, exist_ok=True)
|
||||
self._tempdir = Path(
|
||||
tempfile.mkdtemp(prefix="paperless-", dir=settings.SCRATCH_DIR),
|
||||
)
|
||||
self._text: str | None = None
|
||||
|
||||
def __enter__(self) -> Self:
|
||||
return self
|
||||
|
||||
def __exit__(
|
||||
self,
|
||||
exc_type: type[BaseException] | None,
|
||||
exc_val: BaseException | None,
|
||||
exc_tb: TracebackType | None,
|
||||
) -> None:
|
||||
logger.debug("Cleaning up temporary directory %s", self._tempdir)
|
||||
shutil.rmtree(self._tempdir, ignore_errors=True)
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Core parsing interface
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def parse(
|
||||
self,
|
||||
document_path: Path,
|
||||
mime_type: str,
|
||||
*,
|
||||
produce_archive: bool = True,
|
||||
) -> None:
|
||||
"""Read the document and store its text content.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
document_path:
|
||||
Absolute path to the text file.
|
||||
mime_type:
|
||||
Detected MIME type of the document.
|
||||
produce_archive:
|
||||
Ignored — this parser never produces a PDF archive.
|
||||
|
||||
Raises
|
||||
------
|
||||
documents.parsers.ParseError
|
||||
If the file cannot be read.
|
||||
"""
|
||||
self._text = self._read_text(document_path)
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Result accessors
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def get_text(self) -> str | None:
|
||||
"""Return the plain-text content extracted during parse.
|
||||
|
||||
Returns
|
||||
-------
|
||||
str | None
|
||||
Extracted text, or None if parse has not been called yet.
|
||||
"""
|
||||
return self._text
|
||||
|
||||
def get_date(self) -> datetime.datetime | None:
|
||||
"""Return the document date detected during parse.
|
||||
|
||||
Returns
|
||||
-------
|
||||
datetime.datetime | None
|
||||
Always None — the text parser does not detect dates.
|
||||
"""
|
||||
return None
|
||||
|
||||
def get_archive_path(self) -> Path | None:
|
||||
"""Return the path to a generated archive PDF, or None.
|
||||
|
||||
Returns
|
||||
-------
|
||||
Path | None
|
||||
Always None — the text parser does not produce a PDF archive.
|
||||
"""
|
||||
return None
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Thumbnail and metadata
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def get_thumbnail(self, document_path: Path, mime_type: str) -> Path:
|
||||
"""Render the first portion of the document as a WebP thumbnail.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
document_path:
|
||||
Absolute path to the source document.
|
||||
mime_type:
|
||||
Detected MIME type of the document.
|
||||
|
||||
Returns
|
||||
-------
|
||||
Path
|
||||
Path to the generated WebP thumbnail inside the temporary directory.
|
||||
"""
|
||||
max_chars = 100_000
|
||||
file_size_limit = 50 * 1024 * 1024
|
||||
|
||||
if document_path.stat().st_size > file_size_limit:
|
||||
text = "[File too large to preview]"
|
||||
else:
|
||||
with Path(document_path).open("r", encoding="utf-8", errors="replace") as f:
|
||||
text = f.read(max_chars)
|
||||
|
||||
img = Image.new("RGB", (500, 700), color="white")
|
||||
draw = ImageDraw.Draw(img)
|
||||
font = ImageFont.truetype(
|
||||
font=settings.THUMBNAIL_FONT_NAME,
|
||||
size=20,
|
||||
layout_engine=ImageFont.Layout.BASIC,
|
||||
)
|
||||
draw.multiline_text((5, 5), text, font=font, fill="black", spacing=4)
|
||||
|
||||
out_path = self._tempdir / "thumb.webp"
|
||||
img.save(out_path, format="WEBP")
|
||||
|
||||
return out_path
|
||||
|
||||
def get_page_count(
|
||||
self,
|
||||
document_path: Path,
|
||||
mime_type: str,
|
||||
) -> int | None:
|
||||
"""Return the number of pages in the document.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
document_path:
|
||||
Absolute path to the source document.
|
||||
mime_type:
|
||||
Detected MIME type of the document.
|
||||
|
||||
Returns
|
||||
-------
|
||||
int | None
|
||||
Always None — page count is not meaningful for plain text.
|
||||
"""
|
||||
return None
|
||||
|
||||
def extract_metadata(
|
||||
self,
|
||||
document_path: Path,
|
||||
mime_type: str,
|
||||
) -> list[MetadataEntry]:
|
||||
"""Extract format-specific metadata from the document.
|
||||
|
||||
Returns
|
||||
-------
|
||||
list[MetadataEntry]
|
||||
Always ``[]`` — plain text files carry no structured metadata.
|
||||
"""
|
||||
return []
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Private helpers
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def _read_text(self, filepath: Path) -> str:
|
||||
"""Read file content, replacing invalid UTF-8 bytes rather than failing.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
filepath:
|
||||
Path to the file to read.
|
||||
|
||||
Returns
|
||||
-------
|
||||
str
|
||||
File content as a string.
|
||||
"""
|
||||
try:
|
||||
return filepath.read_text(encoding="utf-8")
|
||||
except UnicodeDecodeError as exc:
|
||||
logger.warning(
|
||||
"Unicode error reading %s, replacing bad bytes: %s",
|
||||
filepath,
|
||||
exc,
|
||||
)
|
||||
return filepath.read_bytes().decode("utf-8", errors="replace")
|
||||
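A short usage sketch for the parser above. It assumes a configured Django environment, since __init__ reads settings.SCRATCH_DIR and get_thumbnail reads settings.THUMBNAIL_FONT_NAME; the file path is illustrative.

from pathlib import Path

from paperless.parsers.text import TextDocumentParser

document = Path("/tmp/note.txt")  # illustrative path
document.write_text("Hello Paperless\n", encoding="utf-8")

# The registry would check the MIME type and score before instantiating:
assert TextDocumentParser.score("text/plain", document.name) == 10

with TextDocumentParser() as parser:
    parser.parse(document, "text/plain", produce_archive=False)
    text = parser.get_text()             # "Hello Paperless\n"
    archive = parser.get_archive_path()  # None — no PDF archive for text
    thumbnail = parser.get_thumbnail(document, "text/plain")  # WebP in the temp dir
# The temporary directory is removed when the context exits.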
@@ -6,25 +6,18 @@ import math
|
||||
import multiprocessing
|
||||
import os
|
||||
import tempfile
|
||||
from os import PathLike
|
||||
from pathlib import Path
|
||||
from typing import Final
|
||||
from urllib.parse import urlparse
|
||||
|
||||
from celery.schedules import crontab
|
||||
from compression_middleware.middleware import CompressionMiddleware
|
||||
from dateparser.languages.loader import LocaleDataLoader
|
||||
from django.utils.translation import gettext_lazy as _
|
||||
from dotenv import load_dotenv
|
||||
|
||||
from paperless.settings.custom import parse_beat_schedule
|
||||
from paperless.settings.custom import parse_dateparser_languages
|
||||
from paperless.settings.custom import parse_db_settings
|
||||
from paperless.settings.custom import parse_hosting_settings
|
||||
from paperless.settings.custom import parse_ignore_dates
|
||||
from paperless.settings.custom import parse_redis_url
|
||||
from paperless.settings.parsers import get_bool_from_env
|
||||
from paperless.settings.parsers import get_float_from_env
|
||||
from paperless.settings.parsers import get_int_from_env
|
||||
from paperless.settings.parsers import get_list_from_env
|
||||
from paperless.settings.parsers import get_path_from_env
|
||||
|
||||
logger = logging.getLogger("paperless.settings")
|
||||
|
||||
@@ -52,8 +45,239 @@ for path in [
|
||||
os.environ["OMP_THREAD_LIMIT"] = "1"
|
||||
|
||||
|
||||
def __get_boolean(key: str, default: str = "NO") -> bool:
|
||||
"""
|
||||
Return a boolean value based on whatever the user has supplied in the
|
||||
environment based on whether the value "looks like" it's True or not.
|
||||
"""
|
||||
return bool(os.getenv(key, default).lower() in ("yes", "y", "1", "t", "true"))
|
||||
|
||||
|
||||
def __get_int(key: str, default: int) -> int:
|
||||
"""
|
||||
Return an integer value based on the environment variable or a default
|
||||
"""
|
||||
return int(os.getenv(key, default))
|
||||
|
||||
|
||||
def __get_optional_int(key: str) -> int | None:
|
||||
"""
|
||||
Returns None if the environment key is not present, otherwise an integer
|
||||
"""
|
||||
if key in os.environ:
|
||||
return __get_int(key, -1) # pragma: no cover
|
||||
return None
|
||||
|
||||
|
||||
def __get_float(key: str, default: float) -> float:
|
||||
"""
|
||||
Return a float value based on the environment variable or a default
|
||||
"""
|
||||
return float(os.getenv(key, default))
|
||||
|
||||
|
||||
def __get_path(
|
||||
key: str,
|
||||
default: PathLike | str,
|
||||
) -> Path:
|
||||
"""
|
||||
Return a normalized, absolute path based on the environment variable or a default,
|
||||
if provided
|
||||
"""
|
||||
if key in os.environ:
|
||||
return Path(os.environ[key]).resolve()
|
||||
return Path(default).resolve()
|
||||
|
||||
|
||||
def __get_optional_path(key: str) -> Path | None:
|
||||
"""
|
||||
Returns None if the environment key is not present, otherwise a fully resolved Path
|
||||
"""
|
||||
if key in os.environ:
|
||||
return __get_path(key, "")
|
||||
return None
|
||||
|
||||
|
||||
def __get_list(
|
||||
key: str,
|
||||
default: list[str] | None = None,
|
||||
sep: str = ",",
|
||||
) -> list[str]:
|
||||
"""
|
||||
Return a list of elements from the environment, as separated by the given
|
||||
string, or the default if the key does not exist
|
||||
"""
|
||||
if key in os.environ:
|
||||
return list(filter(None, os.environ[key].split(sep)))
|
||||
elif default is not None:
|
||||
return default
|
||||
else:
|
||||
return []
|
||||
|
||||
|
||||
def _parse_redis_url(env_redis: str | None) -> tuple[str, str]:
|
||||
"""
|
||||
Gets the Redis information from the environment or a default and handles
|
||||
converting from incompatible django_channels and celery formats.
|
||||
|
||||
Returns a tuple of (celery_url, channels_url)
|
||||
"""
|
||||
|
||||
# Not set, return a compatible default
|
||||
if env_redis is None:
|
||||
return ("redis://localhost:6379", "redis://localhost:6379")
|
||||
|
||||
if "unix" in env_redis.lower():
|
||||
# channels_redis socket format, looks like:
|
||||
# "unix:///path/to/redis.sock"
|
||||
_, path = env_redis.split(":", 1)
|
||||
# Optionally setting a db number
|
||||
if "?db=" in env_redis:
|
||||
path, number = path.split("?db=")
|
||||
return (f"redis+socket:{path}?virtual_host={number}", env_redis)
|
||||
else:
|
||||
return (f"redis+socket:{path}", env_redis)
|
||||
|
||||
elif "+socket" in env_redis.lower():
|
||||
# celery socket style, looks like:
|
||||
# "redis+socket:///path/to/redis.sock"
|
||||
_, path = env_redis.split(":", 1)
|
||||
if "?virtual_host=" in env_redis:
|
||||
# Virtual host (aka db number)
|
||||
path, number = path.split("?virtual_host=")
|
||||
return (env_redis, f"unix:{path}?db={number}")
|
||||
else:
|
||||
return (env_redis, f"unix:{path}")
|
||||
|
||||
# Not a socket
|
||||
return (env_redis, env_redis)
|
||||
|
||||
|
||||
def _parse_beat_schedule() -> dict:
    """
    Configures the scheduled tasks, according to default or
    environment variables. Task expiration is configured so the task will
    expire (and not run), shortly before the default frequency will put another
    of the same task into the queue

    https://docs.celeryq.dev/en/stable/userguide/periodic-tasks.html#beat-entries
    https://docs.celeryq.dev/en/latest/userguide/calling.html#expiration
    """
    schedule = {}
    tasks = [
        {
            "name": "Check all e-mail accounts",
            "env_key": "PAPERLESS_EMAIL_TASK_CRON",
            # Default every ten minutes
            "env_default": "*/10 * * * *",
            "task": "paperless_mail.tasks.process_mail_accounts",
            "options": {
                # 1 minute before default schedule sends again
                "expires": 9.0 * 60.0,
            },
        },
        {
            "name": "Train the classifier",
            "env_key": "PAPERLESS_TRAIN_TASK_CRON",
            # Default hourly at 5 minutes past the hour
            "env_default": "5 */1 * * *",
            "task": "documents.tasks.train_classifier",
            "options": {
                # 1 minute before default schedule sends again
                "expires": 59.0 * 60.0,
            },
        },
        {
            "name": "Optimize the index",
            "env_key": "PAPERLESS_INDEX_TASK_CRON",
            # Default daily at midnight
            "env_default": "0 0 * * *",
            "task": "documents.tasks.index_optimize",
            "options": {
                # 1 hour before default schedule sends again
                "expires": 23.0 * 60.0 * 60.0,
            },
        },
        {
            "name": "Perform sanity check",
            "env_key": "PAPERLESS_SANITY_TASK_CRON",
            # Default Sunday at 00:30
            "env_default": "30 0 * * sun",
            "task": "documents.tasks.sanity_check",
            "options": {
                # 1 hour before default schedule sends again
                "expires": ((7.0 * 24.0) - 1.0) * 60.0 * 60.0,
            },
        },
        {
            "name": "Empty trash",
            "env_key": "PAPERLESS_EMPTY_TRASH_TASK_CRON",
            # Default daily at 01:00
            "env_default": "0 1 * * *",
            "task": "documents.tasks.empty_trash",
            "options": {
                # 1 hour before default schedule sends again
                "expires": 23.0 * 60.0 * 60.0,
            },
        },
        {
            "name": "Check and run scheduled workflows",
            "env_key": "PAPERLESS_WORKFLOW_SCHEDULED_TASK_CRON",
            # Default hourly at 5 minutes past the hour
            "env_default": "5 */1 * * *",
            "task": "documents.tasks.check_scheduled_workflows",
            "options": {
                # 1 minute before default schedule sends again
                "expires": 59.0 * 60.0,
            },
        },
        {
            "name": "Rebuild LLM index",
            "env_key": "PAPERLESS_LLM_INDEX_TASK_CRON",
            # Default daily at 02:10
            "env_default": "10 2 * * *",
            "task": "documents.tasks.llmindex_index",
            "options": {
                # 1 hour before default schedule sends again
                "expires": 23.0 * 60.0 * 60.0,
            },
        },
        {
            "name": "Cleanup expired share link bundles",
            "env_key": "PAPERLESS_SHARE_LINK_BUNDLE_CLEANUP_CRON",
            # Default daily at 02:00
            "env_default": "0 2 * * *",
            "task": "documents.tasks.cleanup_expired_share_link_bundles",
            "options": {
                # 1 hour before default schedule sends again
                "expires": 23.0 * 60.0 * 60.0,
            },
        },
    ]
    for task in tasks:
        # Either get the environment setting or use the default
        value = os.getenv(task["env_key"], task["env_default"])
        # Don't add disabled tasks to the schedule
        if value == "disable":
            continue
        # I find https://crontab.guru/ super helpful
        # crontab(5) format
        # - five time-and-date fields
        # - separated by at least one blank
        minute, hour, day_month, month, day_week = value.split(" ")

        schedule[task["name"]] = {
            "task": task["task"],
            "schedule": crontab(minute, hour, day_week, day_month, month),
            "options": task["options"],
        }

    return schedule
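
# Illustration only (hypothetical overrides): each task's cron expression can be
# replaced via its env key, and the literal value "disable" removes the task entirely.
os.environ["PAPERLESS_INDEX_TASK_CRON"] = "disable"
os.environ["PAPERLESS_EMAIL_TASK_CRON"] = "*/30 * * * *"
_example_schedule = _parse_beat_schedule()
assert "Optimize the index" not in _example_schedule
assert (
    _example_schedule["Check all e-mail accounts"]["task"]
    == "paperless_mail.tasks.process_mail_accounts"
)
for _key in ("PAPERLESS_INDEX_TASK_CRON", "PAPERLESS_EMAIL_TASK_CRON"):
    del os.environ[_key]
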
# NEVER RUN WITH DEBUG IN PRODUCTION.
DEBUG = get_bool_from_env("PAPERLESS_DEBUG", "NO")
DEBUG = __get_boolean("PAPERLESS_DEBUG", "NO")


###############################################################################
@@ -62,21 +286,21 @@ DEBUG = get_bool_from_env("PAPERLESS_DEBUG", "NO")
|
||||
|
||||
BASE_DIR: Path = Path(__file__).resolve().parent.parent.parent
|
||||
|
||||
STATIC_ROOT = get_path_from_env("PAPERLESS_STATICDIR", BASE_DIR.parent / "static")
|
||||
STATIC_ROOT = __get_path("PAPERLESS_STATICDIR", BASE_DIR.parent / "static")
|
||||
|
||||
MEDIA_ROOT = get_path_from_env("PAPERLESS_MEDIA_ROOT", BASE_DIR.parent / "media")
|
||||
MEDIA_ROOT = __get_path("PAPERLESS_MEDIA_ROOT", BASE_DIR.parent / "media")
|
||||
ORIGINALS_DIR = MEDIA_ROOT / "documents" / "originals"
|
||||
ARCHIVE_DIR = MEDIA_ROOT / "documents" / "archive"
|
||||
THUMBNAIL_DIR = MEDIA_ROOT / "documents" / "thumbnails"
|
||||
SHARE_LINK_BUNDLE_DIR = MEDIA_ROOT / "documents" / "share_link_bundles"
|
||||
|
||||
DATA_DIR = get_path_from_env("PAPERLESS_DATA_DIR", BASE_DIR.parent / "data")
|
||||
DATA_DIR = __get_path("PAPERLESS_DATA_DIR", BASE_DIR.parent / "data")
|
||||
|
||||
NLTK_DIR = get_path_from_env("PAPERLESS_NLTK_DIR", "/usr/share/nltk_data")
|
||||
NLTK_DIR = __get_path("PAPERLESS_NLTK_DIR", "/usr/share/nltk_data")
|
||||
|
||||
# Check deprecated setting first
|
||||
EMPTY_TRASH_DIR = (
|
||||
get_path_from_env("PAPERLESS_TRASH_DIR", os.getenv("PAPERLESS_EMPTY_TRASH_DIR"))
|
||||
__get_path("PAPERLESS_TRASH_DIR", os.getenv("PAPERLESS_EMPTY_TRASH_DIR"))
|
||||
if os.getenv("PAPERLESS_TRASH_DIR") or os.getenv("PAPERLESS_EMPTY_TRASH_DIR")
|
||||
else None
|
||||
)
|
||||
@@ -85,21 +309,21 @@ EMPTY_TRASH_DIR = (
|
||||
# threads.
|
||||
MEDIA_LOCK = MEDIA_ROOT / "media.lock"
|
||||
INDEX_DIR = DATA_DIR / "index"
|
||||
MODEL_FILE = get_path_from_env(
|
||||
MODEL_FILE = __get_path(
|
||||
"PAPERLESS_MODEL_FILE",
|
||||
DATA_DIR / "classification_model.pickle",
|
||||
)
|
||||
LLM_INDEX_DIR = DATA_DIR / "llm_index"
|
||||
|
||||
LOGGING_DIR = get_path_from_env("PAPERLESS_LOGGING_DIR", DATA_DIR / "log")
|
||||
LOGGING_DIR = __get_path("PAPERLESS_LOGGING_DIR", DATA_DIR / "log")
|
||||
|
||||
CONSUMPTION_DIR = get_path_from_env(
|
||||
CONSUMPTION_DIR = __get_path(
|
||||
"PAPERLESS_CONSUMPTION_DIR",
|
||||
BASE_DIR.parent / "consume",
|
||||
)
|
||||
|
||||
# This will be created if it doesn't exist
|
||||
SCRATCH_DIR = get_path_from_env(
|
||||
SCRATCH_DIR = __get_path(
|
||||
"PAPERLESS_SCRATCH_DIR",
|
||||
Path(tempfile.gettempdir()) / "paperless",
|
||||
)
|
||||
@@ -108,7 +332,7 @@ SCRATCH_DIR = get_path_from_env(
|
||||
# Application Definition #
|
||||
###############################################################################
|
||||
|
||||
env_apps = get_list_from_env("PAPERLESS_APPS")
|
||||
env_apps = __get_list("PAPERLESS_APPS")
|
||||
|
||||
INSTALLED_APPS = [
|
||||
"whitenoise.runserver_nostatic",
|
||||
@@ -181,7 +405,7 @@ MIDDLEWARE = [
|
||||
]
|
||||
|
||||
# Optional to enable compression
|
||||
if get_bool_from_env("PAPERLESS_ENABLE_COMPRESSION", "yes"): # pragma: no cover
|
||||
if __get_boolean("PAPERLESS_ENABLE_COMPRESSION", "yes"): # pragma: no cover
|
||||
MIDDLEWARE.insert(0, "compression_middleware.middleware.CompressionMiddleware")
|
||||
|
||||
# Workaround to not compress streaming responses (e.g. chat).
|
||||
@@ -200,8 +424,20 @@ CompressionMiddleware.process_response = patched_process_response
|
||||
ROOT_URLCONF = "paperless.urls"
|
||||
|
||||
|
||||
def _parse_base_paths() -> tuple[str | None, str, str, str, str]:
    script_name = os.getenv("PAPERLESS_FORCE_SCRIPT_NAME")
    base_url = (script_name or "") + "/"
    login_url = base_url + "accounts/login/"
    login_redirect_url = base_url + "dashboard"
    logout_redirect_url = os.getenv(
        "PAPERLESS_LOGOUT_REDIRECT_URL",
        login_url + "?loggedout=1",
    )
    return script_name, base_url, login_url, login_redirect_url, logout_redirect_url


FORCE_SCRIPT_NAME, BASE_URL, LOGIN_URL, LOGIN_REDIRECT_URL, LOGOUT_REDIRECT_URL = (
    parse_hosting_settings()
    _parse_base_paths()
)
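
# Illustration only (example value): mounting the app under a sub-path prefixes every
# derived URL; this mirrors the "force_script_name_only" test case later in this diff.
os.environ["PAPERLESS_FORCE_SCRIPT_NAME"] = "/paperless"
assert _parse_base_paths() == (
    "/paperless",
    "/paperless/",
    "/paperless/accounts/login/",
    "/paperless/dashboard",
    "/paperless/accounts/login/?loggedout=1",
)
del os.environ["PAPERLESS_FORCE_SCRIPT_NAME"]
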
# DRF Spectacular settings
|
||||
@@ -235,7 +471,7 @@ STORAGES = {
|
||||
"default": {"BACKEND": "django.core.files.storage.FileSystemStorage"},
|
||||
}
|
||||
|
||||
_CELERY_REDIS_URL, _CHANNELS_REDIS_URL = parse_redis_url(
|
||||
_CELERY_REDIS_URL, _CHANNELS_REDIS_URL = _parse_redis_url(
|
||||
os.getenv("PAPERLESS_REDIS", None),
|
||||
)
|
||||
_REDIS_KEY_PREFIX = os.getenv("PAPERLESS_REDIS_PREFIX", "")
|
||||
@@ -284,8 +520,8 @@ EMAIL_PORT: Final[int] = int(os.getenv("PAPERLESS_EMAIL_PORT", 25))
|
||||
EMAIL_HOST_USER: Final[str] = os.getenv("PAPERLESS_EMAIL_HOST_USER", "")
|
||||
EMAIL_HOST_PASSWORD: Final[str] = os.getenv("PAPERLESS_EMAIL_HOST_PASSWORD", "")
|
||||
DEFAULT_FROM_EMAIL: Final[str] = os.getenv("PAPERLESS_EMAIL_FROM", EMAIL_HOST_USER)
|
||||
EMAIL_USE_TLS: Final[bool] = get_bool_from_env("PAPERLESS_EMAIL_USE_TLS")
|
||||
EMAIL_USE_SSL: Final[bool] = get_bool_from_env("PAPERLESS_EMAIL_USE_SSL")
|
||||
EMAIL_USE_TLS: Final[bool] = __get_boolean("PAPERLESS_EMAIL_USE_TLS")
|
||||
EMAIL_USE_SSL: Final[bool] = __get_boolean("PAPERLESS_EMAIL_USE_SSL")
|
||||
EMAIL_SUBJECT_PREFIX: Final[str] = "[Paperless-ngx] "
|
||||
EMAIL_TIMEOUT = 30.0
|
||||
EMAIL_ENABLED = EMAIL_HOST != "localhost" or EMAIL_HOST_USER != ""
|
||||
@@ -310,22 +546,20 @@ ACCOUNT_DEFAULT_HTTP_PROTOCOL = os.getenv(
|
||||
)
|
||||
|
||||
ACCOUNT_ADAPTER = "paperless.adapter.CustomAccountAdapter"
|
||||
ACCOUNT_ALLOW_SIGNUPS = get_bool_from_env("PAPERLESS_ACCOUNT_ALLOW_SIGNUPS")
|
||||
ACCOUNT_DEFAULT_GROUPS = get_list_from_env("PAPERLESS_ACCOUNT_DEFAULT_GROUPS")
|
||||
ACCOUNT_ALLOW_SIGNUPS = __get_boolean("PAPERLESS_ACCOUNT_ALLOW_SIGNUPS")
|
||||
ACCOUNT_DEFAULT_GROUPS = __get_list("PAPERLESS_ACCOUNT_DEFAULT_GROUPS")
|
||||
|
||||
SOCIALACCOUNT_ADAPTER = "paperless.adapter.CustomSocialAccountAdapter"
|
||||
SOCIALACCOUNT_ALLOW_SIGNUPS = get_bool_from_env(
|
||||
SOCIALACCOUNT_ALLOW_SIGNUPS = __get_boolean(
|
||||
"PAPERLESS_SOCIALACCOUNT_ALLOW_SIGNUPS",
|
||||
"yes",
|
||||
)
|
||||
SOCIALACCOUNT_AUTO_SIGNUP = get_bool_from_env("PAPERLESS_SOCIAL_AUTO_SIGNUP")
|
||||
SOCIALACCOUNT_AUTO_SIGNUP = __get_boolean("PAPERLESS_SOCIAL_AUTO_SIGNUP")
|
||||
SOCIALACCOUNT_PROVIDERS = json.loads(
|
||||
os.getenv("PAPERLESS_SOCIALACCOUNT_PROVIDERS", "{}"),
|
||||
)
|
||||
SOCIAL_ACCOUNT_DEFAULT_GROUPS = get_list_from_env(
|
||||
"PAPERLESS_SOCIAL_ACCOUNT_DEFAULT_GROUPS",
|
||||
)
|
||||
SOCIAL_ACCOUNT_SYNC_GROUPS = get_bool_from_env("PAPERLESS_SOCIAL_ACCOUNT_SYNC_GROUPS")
|
||||
SOCIAL_ACCOUNT_DEFAULT_GROUPS = __get_list("PAPERLESS_SOCIAL_ACCOUNT_DEFAULT_GROUPS")
|
||||
SOCIAL_ACCOUNT_SYNC_GROUPS = __get_boolean("PAPERLESS_SOCIAL_ACCOUNT_SYNC_GROUPS")
|
||||
SOCIAL_ACCOUNT_SYNC_GROUPS_CLAIM: Final[str] = os.getenv(
|
||||
"PAPERLESS_SOCIAL_ACCOUNT_SYNC_GROUPS_CLAIM",
|
||||
"groups",
|
||||
@@ -337,8 +571,8 @@ MFA_TOTP_ISSUER = "Paperless-ngx"
|
||||
|
||||
ACCOUNT_EMAIL_SUBJECT_PREFIX = "[Paperless-ngx] "
|
||||
|
||||
DISABLE_REGULAR_LOGIN = get_bool_from_env("PAPERLESS_DISABLE_REGULAR_LOGIN")
|
||||
REDIRECT_LOGIN_TO_SSO = get_bool_from_env("PAPERLESS_REDIRECT_LOGIN_TO_SSO")
|
||||
DISABLE_REGULAR_LOGIN = __get_boolean("PAPERLESS_DISABLE_REGULAR_LOGIN")
|
||||
REDIRECT_LOGIN_TO_SSO = __get_boolean("PAPERLESS_REDIRECT_LOGIN_TO_SSO")
|
||||
|
||||
AUTO_LOGIN_USERNAME = os.getenv("PAPERLESS_AUTO_LOGIN_USERNAME")
|
||||
|
||||
@@ -351,15 +585,12 @@ ACCOUNT_EMAIL_VERIFICATION = (
|
||||
)
|
||||
)
|
||||
|
||||
ACCOUNT_EMAIL_UNKNOWN_ACCOUNTS = get_bool_from_env(
|
||||
ACCOUNT_EMAIL_UNKNOWN_ACCOUNTS = __get_boolean(
|
||||
"PAPERLESS_ACCOUNT_EMAIL_UNKNOWN_ACCOUNTS",
|
||||
"True",
|
||||
)
|
||||
|
||||
ACCOUNT_SESSION_REMEMBER = get_bool_from_env(
|
||||
"PAPERLESS_ACCOUNT_SESSION_REMEMBER",
|
||||
"True",
|
||||
)
|
||||
ACCOUNT_SESSION_REMEMBER = __get_boolean("PAPERLESS_ACCOUNT_SESSION_REMEMBER", "True")
|
||||
SESSION_EXPIRE_AT_BROWSER_CLOSE = not ACCOUNT_SESSION_REMEMBER
|
||||
SESSION_COOKIE_AGE = int(
|
||||
os.getenv("PAPERLESS_SESSION_COOKIE_AGE", 60 * 60 * 24 * 7 * 3),
|
||||
@@ -376,8 +607,8 @@ if AUTO_LOGIN_USERNAME:
|
||||
|
||||
def _parse_remote_user_settings() -> str:
|
||||
global MIDDLEWARE, AUTHENTICATION_BACKENDS, REST_FRAMEWORK
|
||||
enable = get_bool_from_env("PAPERLESS_ENABLE_HTTP_REMOTE_USER")
|
||||
enable_api = get_bool_from_env("PAPERLESS_ENABLE_HTTP_REMOTE_USER_API")
|
||||
enable = __get_boolean("PAPERLESS_ENABLE_HTTP_REMOTE_USER")
|
||||
enable_api = __get_boolean("PAPERLESS_ENABLE_HTTP_REMOTE_USER_API")
|
||||
if enable or enable_api:
|
||||
MIDDLEWARE.append("paperless.auth.HttpRemoteUserMiddleware")
|
||||
AUTHENTICATION_BACKENDS.insert(
|
||||
@@ -405,16 +636,16 @@ HTTP_REMOTE_USER_HEADER_NAME = _parse_remote_user_settings()
|
||||
X_FRAME_OPTIONS = "SAMEORIGIN"
|
||||
|
||||
# The next 3 settings can also be set using just PAPERLESS_URL
|
||||
CSRF_TRUSTED_ORIGINS = get_list_from_env("PAPERLESS_CSRF_TRUSTED_ORIGINS")
|
||||
CSRF_TRUSTED_ORIGINS = __get_list("PAPERLESS_CSRF_TRUSTED_ORIGINS")
|
||||
|
||||
if DEBUG:
|
||||
# Allow access from the angular development server during debugging
|
||||
CSRF_TRUSTED_ORIGINS.append("http://localhost:4200")
|
||||
|
||||
# We allow CORS from localhost:8000
|
||||
CORS_ALLOWED_ORIGINS = get_list_from_env(
|
||||
CORS_ALLOWED_ORIGINS = __get_list(
|
||||
"PAPERLESS_CORS_ALLOWED_HOSTS",
|
||||
default=["http://localhost:8000"],
|
||||
["http://localhost:8000"],
|
||||
)
|
||||
|
||||
if DEBUG:
|
||||
@@ -427,7 +658,7 @@ CORS_EXPOSE_HEADERS = [
|
||||
"Content-Disposition",
|
||||
]
|
||||
|
||||
ALLOWED_HOSTS = get_list_from_env("PAPERLESS_ALLOWED_HOSTS", default=["*"])
|
||||
ALLOWED_HOSTS = __get_list("PAPERLESS_ALLOWED_HOSTS", ["*"])
|
||||
if ALLOWED_HOSTS != ["*"]:
|
||||
# always allow localhost. Necessary e.g. for healthcheck in docker.
|
||||
ALLOWED_HOSTS.append("localhost")
|
||||
@@ -447,10 +678,10 @@ def _parse_paperless_url():
|
||||
PAPERLESS_URL = _parse_paperless_url()
|
||||
|
||||
# For use with trusted proxies
|
||||
TRUSTED_PROXIES = get_list_from_env("PAPERLESS_TRUSTED_PROXIES")
|
||||
TRUSTED_PROXIES = __get_list("PAPERLESS_TRUSTED_PROXIES")
|
||||
|
||||
USE_X_FORWARDED_HOST = get_bool_from_env("PAPERLESS_USE_X_FORWARD_HOST", "false")
|
||||
USE_X_FORWARDED_PORT = get_bool_from_env("PAPERLESS_USE_X_FORWARD_PORT", "false")
|
||||
USE_X_FORWARDED_HOST = __get_boolean("PAPERLESS_USE_X_FORWARD_HOST", "false")
|
||||
USE_X_FORWARDED_PORT = __get_boolean("PAPERLESS_USE_X_FORWARD_PORT", "false")
|
||||
SECURE_PROXY_SSL_HEADER = (
|
||||
tuple(json.loads(os.environ["PAPERLESS_PROXY_SSL_HEADER"]))
|
||||
if "PAPERLESS_PROXY_SSL_HEADER" in os.environ
|
||||
@@ -493,7 +724,7 @@ CSRF_COOKIE_NAME = f"{COOKIE_PREFIX}csrftoken"
|
||||
SESSION_COOKIE_NAME = f"{COOKIE_PREFIX}sessionid"
|
||||
LANGUAGE_COOKIE_NAME = f"{COOKIE_PREFIX}django_language"
|
||||
|
||||
EMAIL_CERTIFICATE_FILE = get_path_from_env("PAPERLESS_EMAIL_CERTIFICATE_LOCATION")
|
||||
EMAIL_CERTIFICATE_FILE = __get_optional_path("PAPERLESS_EMAIL_CERTIFICATE_LOCATION")
|
||||
|
||||
|
||||
###############################################################################
|
||||
@@ -644,7 +875,7 @@ CELERY_BROKER_URL = _CELERY_REDIS_URL
|
||||
CELERY_TIMEZONE = TIME_ZONE
|
||||
|
||||
CELERY_WORKER_HIJACK_ROOT_LOGGER = False
|
||||
CELERY_WORKER_CONCURRENCY: Final[int] = get_int_from_env("PAPERLESS_TASK_WORKERS", 1)
|
||||
CELERY_WORKER_CONCURRENCY: Final[int] = __get_int("PAPERLESS_TASK_WORKERS", 1)
|
||||
TASK_WORKERS = CELERY_WORKER_CONCURRENCY
|
||||
CELERY_WORKER_MAX_TASKS_PER_CHILD = 1
|
||||
CELERY_WORKER_SEND_TASK_EVENTS = True
|
||||
@@ -657,7 +888,7 @@ CELERY_BROKER_TRANSPORT_OPTIONS = {
|
||||
}
|
||||
|
||||
CELERY_TASK_TRACK_STARTED = True
|
||||
CELERY_TASK_TIME_LIMIT: Final[int] = get_int_from_env("PAPERLESS_WORKER_TIMEOUT", 1800)
|
||||
CELERY_TASK_TIME_LIMIT: Final[int] = __get_int("PAPERLESS_WORKER_TIMEOUT", 1800)
|
||||
|
||||
CELERY_RESULT_EXTENDED = True
|
||||
CELERY_RESULT_BACKEND = "django-db"
|
||||
@@ -669,7 +900,7 @@ CELERY_TASK_SERIALIZER = "pickle"
|
||||
CELERY_ACCEPT_CONTENT = ["application/json", "application/x-python-serialize"]
|
||||
|
||||
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#beat-schedule
|
||||
CELERY_BEAT_SCHEDULE = parse_beat_schedule()
|
||||
CELERY_BEAT_SCHEDULE = _parse_beat_schedule()
|
||||
|
||||
# https://docs.celeryq.dev/en/stable/userguide/configuration.html#beat-schedule-filename
|
||||
CELERY_BEAT_SCHEDULE_FILENAME = str(DATA_DIR / "celerybeat-schedule.db")
|
||||
@@ -677,14 +908,14 @@ CELERY_BEAT_SCHEDULE_FILENAME = str(DATA_DIR / "celerybeat-schedule.db")
|
||||
|
||||
# Cachalot: Database read cache.
|
||||
def _parse_cachalot_settings():
|
||||
ttl = get_int_from_env("PAPERLESS_READ_CACHE_TTL", 3600)
|
||||
ttl = __get_int("PAPERLESS_READ_CACHE_TTL", 3600)
|
||||
ttl = min(ttl, 31536000) if ttl > 0 else 3600
|
||||
_, redis_url = parse_redis_url(
|
||||
_, redis_url = _parse_redis_url(
|
||||
os.getenv("PAPERLESS_READ_CACHE_REDIS_URL", _CHANNELS_REDIS_URL),
|
||||
)
|
||||
result = {
|
||||
"CACHALOT_CACHE": "read-cache",
|
||||
"CACHALOT_ENABLED": get_bool_from_env(
|
||||
"CACHALOT_ENABLED": __get_boolean(
|
||||
"PAPERLESS_DB_READ_CACHE_ENABLED",
|
||||
default="no",
|
||||
),
|
||||
@@ -769,9 +1000,9 @@ CONSUMER_POLLING_INTERVAL = float(os.getenv("PAPERLESS_CONSUMER_POLLING_INTERVAL
|
||||
|
||||
CONSUMER_STABILITY_DELAY = float(os.getenv("PAPERLESS_CONSUMER_STABILITY_DELAY", 5))
|
||||
|
||||
CONSUMER_DELETE_DUPLICATES = get_bool_from_env("PAPERLESS_CONSUMER_DELETE_DUPLICATES")
|
||||
CONSUMER_DELETE_DUPLICATES = __get_boolean("PAPERLESS_CONSUMER_DELETE_DUPLICATES")
|
||||
|
||||
CONSUMER_RECURSIVE = get_bool_from_env("PAPERLESS_CONSUMER_RECURSIVE")
|
||||
CONSUMER_RECURSIVE = __get_boolean("PAPERLESS_CONSUMER_RECURSIVE")
|
||||
|
||||
# Ignore regex patterns, matched against filename only
|
||||
CONSUMER_IGNORE_PATTERNS = list(
|
||||
@@ -793,13 +1024,13 @@ CONSUMER_IGNORE_DIRS = list(
|
||||
),
|
||||
)
|
||||
|
||||
CONSUMER_SUBDIRS_AS_TAGS = get_bool_from_env("PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS")
|
||||
CONSUMER_SUBDIRS_AS_TAGS = __get_boolean("PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS")
|
||||
|
||||
CONSUMER_ENABLE_BARCODES: Final[bool] = get_bool_from_env(
|
||||
CONSUMER_ENABLE_BARCODES: Final[bool] = __get_boolean(
|
||||
"PAPERLESS_CONSUMER_ENABLE_BARCODES",
|
||||
)
|
||||
|
||||
CONSUMER_BARCODE_TIFF_SUPPORT: Final[bool] = get_bool_from_env(
|
||||
CONSUMER_BARCODE_TIFF_SUPPORT: Final[bool] = __get_boolean(
|
||||
"PAPERLESS_CONSUMER_BARCODE_TIFF_SUPPORT",
|
||||
)
|
||||
|
||||
@@ -808,7 +1039,7 @@ CONSUMER_BARCODE_STRING: Final[str] = os.getenv(
|
||||
"PATCHT",
|
||||
)
|
||||
|
||||
CONSUMER_ENABLE_ASN_BARCODE: Final[bool] = get_bool_from_env(
|
||||
CONSUMER_ENABLE_ASN_BARCODE: Final[bool] = __get_boolean(
|
||||
"PAPERLESS_CONSUMER_ENABLE_ASN_BARCODE",
|
||||
)
|
||||
|
||||
@@ -817,26 +1048,23 @@ CONSUMER_ASN_BARCODE_PREFIX: Final[str] = os.getenv(
|
||||
"ASN",
|
||||
)
|
||||
|
||||
CONSUMER_BARCODE_UPSCALE: Final[float] = get_float_from_env(
|
||||
CONSUMER_BARCODE_UPSCALE: Final[float] = __get_float(
|
||||
"PAPERLESS_CONSUMER_BARCODE_UPSCALE",
|
||||
0.0,
|
||||
)
|
||||
|
||||
CONSUMER_BARCODE_DPI: Final[int] = get_int_from_env(
|
||||
"PAPERLESS_CONSUMER_BARCODE_DPI",
|
||||
300,
|
||||
)
|
||||
CONSUMER_BARCODE_DPI: Final[int] = __get_int("PAPERLESS_CONSUMER_BARCODE_DPI", 300)
|
||||
|
||||
CONSUMER_BARCODE_MAX_PAGES: Final[int] = get_int_from_env(
|
||||
CONSUMER_BARCODE_MAX_PAGES: Final[int] = __get_int(
|
||||
"PAPERLESS_CONSUMER_BARCODE_MAX_PAGES",
|
||||
0,
|
||||
)
|
||||
|
||||
CONSUMER_BARCODE_RETAIN_SPLIT_PAGES = get_bool_from_env(
|
||||
CONSUMER_BARCODE_RETAIN_SPLIT_PAGES = __get_boolean(
|
||||
"PAPERLESS_CONSUMER_BARCODE_RETAIN_SPLIT_PAGES",
|
||||
)
|
||||
|
||||
CONSUMER_ENABLE_TAG_BARCODE: Final[bool] = get_bool_from_env(
|
||||
CONSUMER_ENABLE_TAG_BARCODE: Final[bool] = __get_boolean(
|
||||
"PAPERLESS_CONSUMER_ENABLE_TAG_BARCODE",
|
||||
)
|
||||
|
||||
@@ -849,11 +1077,11 @@ CONSUMER_TAG_BARCODE_MAPPING = dict(
|
||||
),
|
||||
)
|
||||
|
||||
CONSUMER_TAG_BARCODE_SPLIT: Final[bool] = get_bool_from_env(
|
||||
CONSUMER_TAG_BARCODE_SPLIT: Final[bool] = __get_boolean(
|
||||
"PAPERLESS_CONSUMER_TAG_BARCODE_SPLIT",
|
||||
)
|
||||
|
||||
CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED: Final[bool] = get_bool_from_env(
|
||||
CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED: Final[bool] = __get_boolean(
|
||||
"PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED",
|
||||
)
|
||||
|
||||
@@ -862,13 +1090,13 @@ CONSUMER_COLLATE_DOUBLE_SIDED_SUBDIR_NAME: Final[str] = os.getenv(
|
||||
"double-sided",
|
||||
)
|
||||
|
||||
CONSUMER_COLLATE_DOUBLE_SIDED_TIFF_SUPPORT: Final[bool] = get_bool_from_env(
|
||||
CONSUMER_COLLATE_DOUBLE_SIDED_TIFF_SUPPORT: Final[bool] = __get_boolean(
|
||||
"PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_TIFF_SUPPORT",
|
||||
)
|
||||
|
||||
CONSUMER_PDF_RECOVERABLE_MIME_TYPES = ("application/octet-stream",)
|
||||
|
||||
OCR_PAGES = get_int_from_env("PAPERLESS_OCR_PAGES")
|
||||
OCR_PAGES = __get_optional_int("PAPERLESS_OCR_PAGES")
|
||||
|
||||
# The default language that tesseract will attempt to use when parsing
|
||||
# documents. It should be a 3-letter language code consistent with ISO 639.
|
||||
@@ -882,20 +1110,20 @@ OCR_MODE = os.getenv("PAPERLESS_OCR_MODE", "skip")
|
||||
|
||||
OCR_SKIP_ARCHIVE_FILE = os.getenv("PAPERLESS_OCR_SKIP_ARCHIVE_FILE", "never")
|
||||
|
||||
OCR_IMAGE_DPI = get_int_from_env("PAPERLESS_OCR_IMAGE_DPI")
|
||||
OCR_IMAGE_DPI = __get_optional_int("PAPERLESS_OCR_IMAGE_DPI")
|
||||
|
||||
OCR_CLEAN = os.getenv("PAPERLESS_OCR_CLEAN", "clean")
|
||||
|
||||
OCR_DESKEW: Final[bool] = get_bool_from_env("PAPERLESS_OCR_DESKEW", "true")
|
||||
OCR_DESKEW: Final[bool] = __get_boolean("PAPERLESS_OCR_DESKEW", "true")
|
||||
|
||||
OCR_ROTATE_PAGES: Final[bool] = get_bool_from_env("PAPERLESS_OCR_ROTATE_PAGES", "true")
|
||||
OCR_ROTATE_PAGES: Final[bool] = __get_boolean("PAPERLESS_OCR_ROTATE_PAGES", "true")
|
||||
|
||||
OCR_ROTATE_PAGES_THRESHOLD: Final[float] = get_float_from_env(
|
||||
OCR_ROTATE_PAGES_THRESHOLD: Final[float] = __get_float(
|
||||
"PAPERLESS_OCR_ROTATE_PAGES_THRESHOLD",
|
||||
12.0,
|
||||
)
|
||||
|
||||
OCR_MAX_IMAGE_PIXELS: Final[int | None] = get_int_from_env(
|
||||
OCR_MAX_IMAGE_PIXELS: Final[int | None] = __get_optional_int(
|
||||
"PAPERLESS_OCR_MAX_IMAGE_PIXELS",
|
||||
)
|
||||
|
||||
@@ -906,7 +1134,7 @@ OCR_COLOR_CONVERSION_STRATEGY = os.getenv(
|
||||
|
||||
OCR_USER_ARGS = os.getenv("PAPERLESS_OCR_USER_ARGS")
|
||||
|
||||
MAX_IMAGE_PIXELS: Final[int | None] = get_int_from_env(
|
||||
MAX_IMAGE_PIXELS: Final[int | None] = __get_optional_int(
|
||||
"PAPERLESS_MAX_IMAGE_PIXELS",
|
||||
)
|
||||
|
||||
@@ -921,7 +1149,7 @@ CONVERT_MEMORY_LIMIT = os.getenv("PAPERLESS_CONVERT_MEMORY_LIMIT")
|
||||
GS_BINARY = os.getenv("PAPERLESS_GS_BINARY", "gs")
|
||||
|
||||
# Fallback layout for .eml consumption
|
||||
EMAIL_PARSE_DEFAULT_LAYOUT = get_int_from_env(
|
||||
EMAIL_PARSE_DEFAULT_LAYOUT = __get_int(
|
||||
"PAPERLESS_EMAIL_PARSE_DEFAULT_LAYOUT",
|
||||
1, # MailRule.PdfLayout.TEXT_HTML but that can't be imported here
|
||||
)
|
||||
@@ -935,9 +1163,23 @@ DATE_ORDER = os.getenv("PAPERLESS_DATE_ORDER", "DMY")
|
||||
FILENAME_DATE_ORDER = os.getenv("PAPERLESS_FILENAME_DATE_ORDER")
|
||||
|
||||
|
||||
def _parse_dateparser_languages(languages: str | None):
    language_list = languages.split("+") if languages else []
    # There is an unfixed issue in zh-Hant and zh-Hans locales in the dateparser lib.
    # See: https://github.com/scrapinghub/dateparser/issues/875
    for index, language in enumerate(language_list):
        if language.startswith("zh-") and "zh" not in language_list:
            logger.warning(
                f'Chinese locale detected: {language}. dateparser might fail to parse some dates with this locale, so Chinese ("zh") will be used as a fallback.',
            )
            language_list.append("zh")

    return list(LocaleDataLoader().get_locale_map(locales=language_list))
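
# Illustration only (assumes these locales exist in dateparser's data): languages are
# given "+"-separated, and any zh-* entry causes plain "zh" to be appended as a
# fallback, e.g. _parse_dateparser_languages("de+zh-Hans") yields the locale map keys
# for "de", "zh-Hans" and "zh".
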
# If not set, we will infer it at runtime
|
||||
DATE_PARSER_LANGUAGES = (
|
||||
parse_dateparser_languages(
|
||||
_parse_dateparser_languages(
|
||||
os.getenv("PAPERLESS_DATE_PARSER_LANGUAGES"),
|
||||
)
|
||||
if os.getenv("PAPERLESS_DATE_PARSER_LANGUAGES")
|
||||
@@ -948,7 +1190,7 @@ DATE_PARSER_LANGUAGES = (
|
||||
# Maximum number of dates taken from document start to end to show as suggestions for
|
||||
# `created` date in the frontend. Duplicates are removed, which can result in
|
||||
# fewer dates shown.
|
||||
NUMBER_OF_SUGGESTED_DATES = get_int_from_env("PAPERLESS_NUMBER_OF_SUGGESTED_DATES", 3)
|
||||
NUMBER_OF_SUGGESTED_DATES = __get_int("PAPERLESS_NUMBER_OF_SUGGESTED_DATES", 3)
|
||||
|
||||
# Specify the filename format for out files
|
||||
FILENAME_FORMAT = os.getenv("PAPERLESS_FILENAME_FORMAT")
|
||||
@@ -956,7 +1198,7 @@ FILENAME_FORMAT = os.getenv("PAPERLESS_FILENAME_FORMAT")
|
||||
# If this is enabled, variables in filename format will resolve to
|
||||
# empty-string instead of 'none'.
|
||||
# Directories with 'empty names' are omitted, too.
|
||||
FILENAME_FORMAT_REMOVE_NONE = get_bool_from_env(
|
||||
FILENAME_FORMAT_REMOVE_NONE = __get_boolean(
|
||||
"PAPERLESS_FILENAME_FORMAT_REMOVE_NONE",
|
||||
"NO",
|
||||
)
|
||||
@@ -967,7 +1209,7 @@ THUMBNAIL_FONT_NAME = os.getenv(
|
||||
)
|
||||
|
||||
# Tika settings
|
||||
TIKA_ENABLED = get_bool_from_env("PAPERLESS_TIKA_ENABLED", "NO")
|
||||
TIKA_ENABLED = __get_boolean("PAPERLESS_TIKA_ENABLED", "NO")
|
||||
TIKA_ENDPOINT = os.getenv("PAPERLESS_TIKA_ENDPOINT", "http://localhost:9998")
|
||||
TIKA_GOTENBERG_ENDPOINT = os.getenv(
|
||||
"PAPERLESS_TIKA_GOTENBERG_ENDPOINT",
|
||||
@@ -977,21 +1219,52 @@ TIKA_GOTENBERG_ENDPOINT = os.getenv(
|
||||
if TIKA_ENABLED:
|
||||
INSTALLED_APPS.append("paperless_tika.apps.PaperlessTikaConfig")
|
||||
|
||||
AUDIT_LOG_ENABLED = get_bool_from_env("PAPERLESS_AUDIT_LOG_ENABLED", "true")
|
||||
AUDIT_LOG_ENABLED = __get_boolean("PAPERLESS_AUDIT_LOG_ENABLED", "true")
|
||||
if AUDIT_LOG_ENABLED:
|
||||
INSTALLED_APPS.append("auditlog")
|
||||
MIDDLEWARE.append("auditlog.middleware.AuditlogMiddleware")
|
||||
|
||||
|
||||
def _parse_ignore_dates(
    env_ignore: str,
    date_order: str = DATE_ORDER,
) -> set[datetime.date]:
    """
    If the PAPERLESS_IGNORE_DATES environment variable is set, parse the
    user provided string(s) into dates

    Args:
        env_ignore (str): The value of the environment variable, comma separated dates
        date_order (str, optional): The format of the date strings.
            Defaults to DATE_ORDER.

    Returns:
        set[datetime.date]: The set of parsed date objects
    """
    import dateparser

    ignored_dates = set()
    for s in env_ignore.split(","):
        d = dateparser.parse(
            s,
            settings={
                "DATE_ORDER": date_order,
            },
        )
        if d:
            ignored_dates.add(d.date())
    return ignored_dates


# List dates that should be ignored when trying to parse date from document text
IGNORE_DATES: set[datetime.date] = set()

if os.getenv("PAPERLESS_IGNORE_DATES") is not None:
    IGNORE_DATES = parse_ignore_dates(os.getenv("PAPERLESS_IGNORE_DATES"), DATE_ORDER)
    IGNORE_DATES = _parse_ignore_dates(os.getenv("PAPERLESS_IGNORE_DATES"))
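
# Illustration only (example dates): with a DMY date order,
# PAPERLESS_IGNORE_DATES="01.01.2020,15/03/2021" parses as shown below; entries that
# dateparser cannot parse are silently skipped.
assert _parse_ignore_dates("01.01.2020,15/03/2021", "DMY") == {
    datetime.date(2020, 1, 1),
    datetime.date(2021, 3, 15),
}
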
ENABLE_UPDATE_CHECK = os.getenv("PAPERLESS_ENABLE_UPDATE_CHECK", "default")
|
||||
if ENABLE_UPDATE_CHECK != "default":
|
||||
ENABLE_UPDATE_CHECK = get_bool_from_env("PAPERLESS_ENABLE_UPDATE_CHECK")
|
||||
ENABLE_UPDATE_CHECK = __get_boolean("PAPERLESS_ENABLE_UPDATE_CHECK")
|
||||
|
||||
APP_TITLE = os.getenv("PAPERLESS_APP_TITLE", None)
|
||||
APP_LOGO = os.getenv("PAPERLESS_APP_LOGO", None)
|
||||
@@ -1036,7 +1309,7 @@ def _get_nltk_language_setting(ocr_lang: str) -> str | None:
|
||||
return iso_code_to_nltk.get(ocr_lang)
|
||||
|
||||
|
||||
NLTK_ENABLED: Final[bool] = get_bool_from_env("PAPERLESS_ENABLE_NLTK", "yes")
|
||||
NLTK_ENABLED: Final[bool] = __get_boolean("PAPERLESS_ENABLE_NLTK", "yes")
|
||||
|
||||
NLTK_LANGUAGE: str | None = _get_nltk_language_setting(OCR_LANGUAGE)
|
||||
|
||||
@@ -1045,7 +1318,7 @@ NLTK_LANGUAGE: str | None = _get_nltk_language_setting(OCR_LANGUAGE)
|
||||
###############################################################################
|
||||
|
||||
EMAIL_GNUPG_HOME: Final[str | None] = os.getenv("PAPERLESS_EMAIL_GNUPG_HOME")
|
||||
EMAIL_ENABLE_GPG_DECRYPTOR: Final[bool] = get_bool_from_env(
|
||||
EMAIL_ENABLE_GPG_DECRYPTOR: Final[bool] = __get_boolean(
|
||||
"PAPERLESS_ENABLE_GPG_DECRYPTOR",
|
||||
)
|
||||
|
||||
@@ -1053,7 +1326,7 @@ EMAIL_ENABLE_GPG_DECRYPTOR: Final[bool] = get_bool_from_env(
|
||||
###############################################################################
|
||||
# Soft Delete #
|
||||
###############################################################################
|
||||
EMPTY_TRASH_DELAY = max(get_int_from_env("PAPERLESS_EMPTY_TRASH_DELAY", 30), 1)
|
||||
EMPTY_TRASH_DELAY = max(__get_int("PAPERLESS_EMPTY_TRASH_DELAY", 30), 1)
|
||||
|
||||
|
||||
###############################################################################
|
||||
@@ -1078,17 +1351,21 @@ OUTLOOK_OAUTH_ENABLED = bool(
|
||||
###############################################################################
|
||||
# Webhooks
|
||||
###############################################################################
|
||||
WEBHOOKS_ALLOWED_SCHEMES = {
|
||||
WEBHOOKS_ALLOWED_SCHEMES = set(
|
||||
s.lower()
|
||||
for s in get_list_from_env(
|
||||
for s in __get_list(
|
||||
"PAPERLESS_WEBHOOKS_ALLOWED_SCHEMES",
|
||||
default=["http", "https"],
|
||||
["http", "https"],
|
||||
)
|
||||
}
|
||||
WEBHOOKS_ALLOWED_PORTS = {
|
||||
int(p) for p in get_list_from_env("PAPERLESS_WEBHOOKS_ALLOWED_PORTS", default=[])
|
||||
}
|
||||
WEBHOOKS_ALLOW_INTERNAL_REQUESTS = get_bool_from_env(
|
||||
)
|
||||
WEBHOOKS_ALLOWED_PORTS = set(
|
||||
int(p)
|
||||
for p in __get_list(
|
||||
"PAPERLESS_WEBHOOKS_ALLOWED_PORTS",
|
||||
[],
|
||||
)
|
||||
)
|
||||
WEBHOOKS_ALLOW_INTERNAL_REQUESTS = __get_boolean(
|
||||
"PAPERLESS_WEBHOOKS_ALLOW_INTERNAL_REQUESTS",
|
||||
"true",
|
||||
)
|
||||
@@ -1103,7 +1380,7 @@ REMOTE_OCR_ENDPOINT = os.getenv("PAPERLESS_REMOTE_OCR_ENDPOINT")
|
||||
################################################################################
|
||||
# AI Settings #
|
||||
################################################################################
|
||||
AI_ENABLED = get_bool_from_env("PAPERLESS_AI_ENABLED", "NO")
|
||||
AI_ENABLED = __get_boolean("PAPERLESS_AI_ENABLED", "NO")
|
||||
LLM_EMBEDDING_BACKEND = os.getenv(
|
||||
"PAPERLESS_AI_LLM_EMBEDDING_BACKEND",
|
||||
) # "huggingface" or "openai"
|
||||
|
||||
@@ -1,191 +1,11 @@
|
||||
import datetime
|
||||
import logging
|
||||
import os
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from celery.schedules import crontab
|
||||
from dateparser.languages.loader import LocaleDataLoader
|
||||
|
||||
from paperless.settings.parsers import get_choice_from_env
|
||||
from paperless.settings.parsers import get_int_from_env
|
||||
from paperless.settings.parsers import parse_dict_from_str
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def parse_hosting_settings() -> tuple[str | None, str, str, str, str]:
|
||||
script_name = os.getenv("PAPERLESS_FORCE_SCRIPT_NAME")
|
||||
base_url = (script_name or "") + "/"
|
||||
login_url = base_url + "accounts/login/"
|
||||
login_redirect_url = base_url + "dashboard"
|
||||
logout_redirect_url = os.getenv(
|
||||
"PAPERLESS_LOGOUT_REDIRECT_URL",
|
||||
login_url + "?loggedout=1",
|
||||
)
|
||||
return script_name, base_url, login_url, login_redirect_url, logout_redirect_url
|
||||
|
||||
|
||||
def parse_redis_url(env_redis: str | None) -> tuple[str, str]:
|
||||
"""
|
||||
Gets the Redis information from the environment or a default and handles
|
||||
converting from incompatible django_channels and celery formats.
|
||||
|
||||
Returns a tuple of (celery_url, channels_url)
|
||||
"""
|
||||
|
||||
# Not set, return a compatible default
|
||||
if env_redis is None:
|
||||
return ("redis://localhost:6379", "redis://localhost:6379")
|
||||
|
||||
if "unix" in env_redis.lower():
|
||||
# channels_redis socket format, looks like:
|
||||
# "unix:///path/to/redis.sock"
|
||||
_, path = env_redis.split(":", maxsplit=1)
|
||||
# Optionally setting a db number
|
||||
if "?db=" in env_redis:
|
||||
path, number = path.split("?db=")
|
||||
return (f"redis+socket:{path}?virtual_host={number}", env_redis)
|
||||
else:
|
||||
return (f"redis+socket:{path}", env_redis)
|
||||
|
||||
elif "+socket" in env_redis.lower():
|
||||
# celery socket style, looks like:
|
||||
# "redis+socket:///path/to/redis.sock"
|
||||
_, path = env_redis.split(":", maxsplit=1)
|
||||
if "?virtual_host=" in env_redis:
|
||||
# Virtual host (aka db number)
|
||||
path, number = path.split("?virtual_host=")
|
||||
return (env_redis, f"unix:{path}?db={number}")
|
||||
else:
|
||||
return (env_redis, f"unix:{path}")
|
||||
|
||||
# Not a socket
|
||||
return (env_redis, env_redis)
|
||||
|
||||
|
||||
def parse_beat_schedule() -> dict:
|
||||
"""
|
||||
Configures the scheduled tasks, according to default or
|
||||
environment variables. Task expiration is configured so the task will
|
||||
expire (and not run), shortly before the default frequency will put another
|
||||
of the same task into the queue
|
||||
|
||||
|
||||
https://docs.celeryq.dev/en/stable/userguide/periodic-tasks.html#beat-entries
|
||||
https://docs.celeryq.dev/en/latest/userguide/calling.html#expiration
|
||||
"""
|
||||
schedule = {}
|
||||
tasks = [
|
||||
{
|
||||
"name": "Check all e-mail accounts",
|
||||
"env_key": "PAPERLESS_EMAIL_TASK_CRON",
|
||||
# Default every ten minutes
|
||||
"env_default": "*/10 * * * *",
|
||||
"task": "paperless_mail.tasks.process_mail_accounts",
|
||||
"options": {
|
||||
# 1 minute before default schedule sends again
|
||||
"expires": 9.0 * 60.0,
|
||||
},
|
||||
},
|
||||
{
|
||||
"name": "Train the classifier",
|
||||
"env_key": "PAPERLESS_TRAIN_TASK_CRON",
|
||||
# Default hourly at 5 minutes past the hour
|
||||
"env_default": "5 */1 * * *",
|
||||
"task": "documents.tasks.train_classifier",
|
||||
"options": {
|
||||
# 1 minute before default schedule sends again
|
||||
"expires": 59.0 * 60.0,
|
||||
},
|
||||
},
|
||||
{
|
||||
"name": "Optimize the index",
|
||||
"env_key": "PAPERLESS_INDEX_TASK_CRON",
|
||||
# Default daily at midnight
|
||||
"env_default": "0 0 * * *",
|
||||
"task": "documents.tasks.index_optimize",
|
||||
"options": {
|
||||
# 1 hour before default schedule sends again
|
||||
"expires": 23.0 * 60.0 * 60.0,
|
||||
},
|
||||
},
|
||||
{
|
||||
"name": "Perform sanity check",
|
||||
"env_key": "PAPERLESS_SANITY_TASK_CRON",
|
||||
# Default Sunday at 00:30
|
||||
"env_default": "30 0 * * sun",
|
||||
"task": "documents.tasks.sanity_check",
|
||||
"options": {
|
||||
# 1 hour before default schedule sends again
|
||||
"expires": ((7.0 * 24.0) - 1.0) * 60.0 * 60.0,
|
||||
},
|
||||
},
|
||||
{
|
||||
"name": "Empty trash",
|
||||
"env_key": "PAPERLESS_EMPTY_TRASH_TASK_CRON",
|
||||
# Default daily at 01:00
|
||||
"env_default": "0 1 * * *",
|
||||
"task": "documents.tasks.empty_trash",
|
||||
"options": {
|
||||
# 1 hour before default schedule sends again
|
||||
"expires": 23.0 * 60.0 * 60.0,
|
||||
},
|
||||
},
|
||||
{
|
||||
"name": "Check and run scheduled workflows",
|
||||
"env_key": "PAPERLESS_WORKFLOW_SCHEDULED_TASK_CRON",
|
||||
# Default hourly at 5 minutes past the hour
|
||||
"env_default": "5 */1 * * *",
|
||||
"task": "documents.tasks.check_scheduled_workflows",
|
||||
"options": {
|
||||
# 1 minute before default schedule sends again
|
||||
"expires": 59.0 * 60.0,
|
||||
},
|
||||
},
|
||||
{
|
||||
"name": "Rebuild LLM index",
|
||||
"env_key": "PAPERLESS_LLM_INDEX_TASK_CRON",
|
||||
# Default daily at 02:10
|
||||
"env_default": "10 2 * * *",
|
||||
"task": "documents.tasks.llmindex_index",
|
||||
"options": {
|
||||
# 1 hour before default schedule sends again
|
||||
"expires": 23.0 * 60.0 * 60.0,
|
||||
},
|
||||
},
|
||||
{
|
||||
"name": "Cleanup expired share link bundles",
|
||||
"env_key": "PAPERLESS_SHARE_LINK_BUNDLE_CLEANUP_CRON",
|
||||
# Default daily at 02:00
|
||||
"env_default": "0 2 * * *",
|
||||
"task": "documents.tasks.cleanup_expired_share_link_bundles",
|
||||
"options": {
|
||||
# 1 hour before default schedule sends again
|
||||
"expires": 23.0 * 60.0 * 60.0,
|
||||
},
|
||||
},
|
||||
]
|
||||
for task in tasks:
|
||||
# Either get the environment setting or use the default
|
||||
value = os.getenv(task["env_key"], task["env_default"])
|
||||
# Don't add disabled tasks to the schedule
|
||||
if value == "disable":
|
||||
continue
|
||||
# I find https://crontab.guru/ super helpful
|
||||
# crontab(5) format
|
||||
# - five time-and-date fields
|
||||
# - separated by at least one blank
|
||||
minute, hour, day_month, month, day_week = value.split(" ")
|
||||
|
||||
schedule[task["name"]] = {
|
||||
"task": task["task"],
|
||||
"schedule": crontab(minute, hour, day_week, day_month, month),
|
||||
"options": task["options"],
|
||||
}
|
||||
|
||||
return schedule
|
||||
|
||||
|
||||
def parse_db_settings(data_dir: Path) -> dict[str, dict[str, Any]]:
|
||||
"""Parse database settings from environment variables.
|
||||
@@ -300,48 +120,3 @@ def parse_db_settings(data_dir: Path) -> dict[str, dict[str, Any]]:
|
||||
)
|
||||
|
||||
return {"default": db_config}
|
||||
|
||||
|
||||
def parse_dateparser_languages(languages: str | None) -> list[str]:
|
||||
language_list = languages.split("+") if languages else []
|
||||
# There is an unfixed issue in zh-Hant and zh-Hans locales in the dateparser lib.
|
||||
# See: https://github.com/scrapinghub/dateparser/issues/875
|
||||
for index, language in enumerate(language_list):
|
||||
if language.startswith("zh-") and "zh" not in language_list:
|
||||
logger.warning(
|
||||
f"Chinese locale detected: {language}. dateparser might fail to parse"
|
||||
f' some dates with this locale, so Chinese ("zh") will be used as a fallback.',
|
||||
)
|
||||
language_list.append("zh")
|
||||
|
||||
return list(LocaleDataLoader().get_locale_map(locales=language_list))
|
||||
|
||||
|
||||
def parse_ignore_dates(
|
||||
env_ignore: str,
|
||||
date_order: str,
|
||||
) -> set[datetime.date]:
|
||||
"""
|
||||
If the PAPERLESS_IGNORE_DATES environment variable is set, parse the
|
||||
user provided string(s) into dates
|
||||
|
||||
Args:
|
||||
env_ignore (str): The value of the environment variable, comma separated dates
|
||||
date_order (str): The format of the date strings.
|
||||
|
||||
Returns:
|
||||
set[datetime.date]: The set of parsed date objects
|
||||
"""
|
||||
import dateparser
|
||||
|
||||
ignored_dates = set()
|
||||
for s in env_ignore.split(","):
|
||||
d = dateparser.parse(
|
||||
s,
|
||||
settings={
|
||||
"DATE_ORDER": date_order,
|
||||
},
|
||||
)
|
||||
if d:
|
||||
ignored_dates.add(d.date())
|
||||
return ignored_dates
|
||||
|
||||
@@ -156,108 +156,6 @@ def parse_dict_from_str(
|
||||
return settings
|
||||
|
||||
|
||||
def get_bool_from_env(key: str, default: str = "NO") -> bool:
|
||||
"""
|
||||
Return a boolean value based on whatever the user has supplied in the
|
||||
environment based on whether the value "looks like" it's True or not.
|
||||
"""
|
||||
return str_to_bool(os.getenv(key, default))
|
||||
|
||||
|
||||
@overload
|
||||
def get_float_from_env(key: str) -> float | None: ...
|
||||
|
||||
|
||||
@overload
|
||||
def get_float_from_env(key: str, default: None) -> float | None: ...
|
||||
|
||||
|
||||
@overload
|
||||
def get_float_from_env(key: str, default: float) -> float: ...
|
||||
|
||||
|
||||
def get_float_from_env(key: str, default: float | None = None) -> float | None:
|
||||
"""
|
||||
Return a float value based on the environment variable.
|
||||
If default is provided, returns that value when key is missing.
|
||||
If default is None, returns None when key is missing.
|
||||
"""
|
||||
if key not in os.environ:
|
||||
return default
|
||||
|
||||
return float(os.environ[key])
|
||||
|
||||
|
||||
@overload
|
||||
def get_path_from_env(key: str) -> Path | None: ...
|
||||
|
||||
|
||||
@overload
|
||||
def get_path_from_env(key: str, default: None) -> Path | None: ...
|
||||
|
||||
|
||||
@overload
|
||||
def get_path_from_env(key: str, default: Path | str) -> Path: ...
|
||||
|
||||
|
||||
def get_path_from_env(key: str, default: Path | str | None = None) -> Path | None:
|
||||
"""
|
||||
Return a Path object based on the environment variable.
|
||||
If default is provided, returns that value when key is missing.
|
||||
If default is None, returns None when key is missing.
|
||||
"""
|
||||
if key not in os.environ:
|
||||
return default if default is None else Path(default).resolve()
|
||||
|
||||
return Path(os.environ[key]).resolve()
|
||||
|
||||
|
||||
def get_list_from_env(
    key: str,
    separator: str = ",",
    default: list[T] | None = None,
    *,
    strip_whitespace: bool = True,
    remove_empty: bool = True,
    required: bool = False,
) -> list[str] | list[T]:
    """
    Get and parse a list from an environment variable or return a default.

    Args:
        key: Environment variable name
        separator: Character(s) to split on (default: ',')
        default: Default value to return if env var is not set or empty
        strip_whitespace: Whether to strip whitespace from each element
        remove_empty: Whether to remove empty strings from the result
        required: If True, raise an error when the env var is missing and no default provided

    Returns:
        List of strings or list of type-cast values, or default if env var is empty/None

    Raises:
        ValueError: If required=True and env var is missing and there is no default
    """
    # Get the environment variable value
    env_value = os.environ.get(key)

    # Handle required environment variables
    if required and env_value is None and default is None:
        raise ValueError(f"Required environment variable '{key}' is not set")

    if env_value:
        items = env_value.split(separator)
        if strip_whitespace:
            items = [item.strip() for item in items]
        if remove_empty:
            items = [item for item in items if item]
        return items
    elif default is not None:
        return default
    else:
        return []
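
# Illustration only (hypothetical env values), showing the whitespace/empty-element
# handling and the required-flag behaviour documented above:
os.environ["PAPERLESS_EXAMPLE_PROXIES"] = " 10.0.0.1 , ,10.0.0.2 "
assert get_list_from_env("PAPERLESS_EXAMPLE_PROXIES") == ["10.0.0.1", "10.0.0.2"]
assert get_list_from_env("PAPERLESS_EXAMPLE_UNSET", default=["fallback"]) == ["fallback"]
try:
    get_list_from_env("PAPERLESS_EXAMPLE_UNSET", required=True)
except ValueError:
    pass  # raised: the variable is missing and no default was given
del os.environ["PAPERLESS_EXAMPLE_PROXIES"]
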
def get_choice_from_env(
|
||||
env_key: str,
|
||||
choices: set[str],
48
src/paperless/tests/conftest.py
Normal file
48
src/paperless/tests/conftest.py
Normal file
@@ -0,0 +1,48 @@
|
||||
"""
|
||||
Fixtures defined here are available to every test module under
|
||||
src/paperless/tests/ (including sub-packages such as parsers/).
|
||||
|
||||
Session-scoped fixtures for the shared samples directory live here so
|
||||
sub-package conftest files can reference them without duplicating path logic.
|
||||
Parser-specific fixtures (concrete parser instances, format-specific sample
|
||||
files) live in paperless/tests/parsers/conftest.py.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
from typing import TYPE_CHECKING
|
||||
|
||||
import pytest
|
||||
|
||||
from paperless.parsers.registry import reset_parser_registry
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Generator
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def samples_dir() -> Path:
|
||||
"""Absolute path to the shared parser sample files directory.
|
||||
|
||||
Sub-package conftest files derive format-specific paths from this root,
|
||||
e.g. ``samples_dir / "text" / "test.txt"``.
|
||||
|
||||
Returns
|
||||
-------
|
||||
Path
|
||||
Directory containing all sample documents used by parser tests.
|
||||
"""
|
||||
return (Path(__file__).parent / "samples").resolve()
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def clean_registry() -> Generator[None, None, None]:
|
||||
"""Reset the parser registry before and after every test.
|
||||
|
||||
This prevents registry state from leaking between tests that call
|
||||
get_parser_registry() or init_builtin_parsers().
|
||||
"""
|
||||
reset_parser_registry()
|
||||
yield
|
||||
reset_parser_registry()
|
||||
0
src/paperless/tests/parsers/__init__.py
Normal file
0
src/paperless/tests/parsers/__init__.py
Normal file
76
src/paperless/tests/parsers/conftest.py
Normal file
76
src/paperless/tests/parsers/conftest.py
Normal file
@@ -0,0 +1,76 @@
|
||||
"""
|
||||
Parser fixtures that are used across multiple test modules in this package
|
||||
are defined here. Format-specific sample-file fixtures are grouped by parser
|
||||
so it is easy to see which files belong to which test module.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import TYPE_CHECKING
|
||||
|
||||
import pytest
|
||||
|
||||
from paperless.parsers.text import TextDocumentParser
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Generator
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Text parser sample files
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def text_samples_dir(samples_dir: Path) -> Path:
|
||||
"""Absolute path to the text parser sample files directory.
|
||||
|
||||
Returns
|
||||
-------
|
||||
Path
|
||||
``<samples_dir>/text/``
|
||||
"""
|
||||
return samples_dir / "text"
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def sample_txt_file(text_samples_dir: Path) -> Path:
|
||||
"""Path to a valid UTF-8 plain-text sample file.
|
||||
|
||||
Returns
|
||||
-------
|
||||
Path
|
||||
Absolute path to ``text/test.txt``.
|
||||
"""
|
||||
return text_samples_dir / "test.txt"
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def malformed_txt_file(text_samples_dir: Path) -> Path:
|
||||
"""Path to a text file containing invalid UTF-8 bytes.
|
||||
|
||||
Returns
|
||||
-------
|
||||
Path
|
||||
Absolute path to ``text/decode_error.txt``.
|
||||
"""
|
||||
return text_samples_dir / "decode_error.txt"
|
||||
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Text parser instance
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def text_parser() -> Generator[TextDocumentParser, None, None]:
|
||||
"""Yield a TextDocumentParser and clean up its temporary directory afterwards.
|
||||
|
||||
Yields
|
||||
------
|
||||
TextDocumentParser
|
||||
A ready-to-use parser instance.
|
||||
"""
|
||||
with TextDocumentParser() as parser:
|
||||
yield parser
|
||||
256
src/paperless/tests/parsers/test_text_parser.py
Normal file
256
src/paperless/tests/parsers/test_text_parser.py
Normal file
@@ -0,0 +1,256 @@
|
||||
"""
|
||||
Tests for paperless.parsers.text.TextDocumentParser.
|
||||
|
||||
All tests use the context-manager protocol for parser lifecycle. Sample
|
||||
files are provided by session-scoped fixtures defined in conftest.py.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
from paperless.parsers import ParserProtocol
|
||||
from paperless.parsers.text import TextDocumentParser
|
||||
|
||||
|
||||
class TestTextParserProtocol:
|
||||
"""Verify that TextDocumentParser satisfies the ParserProtocol contract."""
|
||||
|
||||
def test_isinstance_satisfies_protocol(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
) -> None:
|
||||
assert isinstance(text_parser, ParserProtocol)
|
||||
|
||||
def test_class_attributes_present(self) -> None:
|
||||
assert isinstance(TextDocumentParser.name, str) and TextDocumentParser.name
|
||||
assert (
|
||||
isinstance(TextDocumentParser.version, str) and TextDocumentParser.version
|
||||
)
|
||||
assert isinstance(TextDocumentParser.author, str) and TextDocumentParser.author
|
||||
assert isinstance(TextDocumentParser.url, str) and TextDocumentParser.url
|
||||
|
||||
def test_supported_mime_types_returns_dict(self) -> None:
|
||||
mime_types = TextDocumentParser.supported_mime_types()
|
||||
assert isinstance(mime_types, dict)
|
||||
assert "text/plain" in mime_types
|
||||
assert "text/csv" in mime_types
|
||||
assert "application/csv" in mime_types
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
("mime_type", "expected"),
|
||||
[
|
||||
("text/plain", 10),
|
||||
("text/csv", 10),
|
||||
("application/csv", 10),
|
||||
("application/pdf", None),
|
||||
("image/png", None),
|
||||
],
|
||||
)
|
||||
def test_score(self, mime_type: str, expected: int | None) -> None:
|
||||
assert TextDocumentParser.score(mime_type, "file.txt") == expected
|
||||
|
||||
def test_can_produce_archive_is_false(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
) -> None:
|
||||
assert text_parser.can_produce_archive is False
|
||||
|
||||
def test_requires_pdf_rendition_is_false(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
) -> None:
|
||||
assert text_parser.requires_pdf_rendition is False
|
||||
|
||||
|
||||
class TestTextParserLifecycle:
|
||||
"""Verify context-manager behaviour and temporary directory cleanup."""
|
||||
|
||||
def test_context_manager_cleans_up_tempdir(self) -> None:
|
||||
with TextDocumentParser() as parser:
|
||||
tempdir = parser._tempdir
|
||||
assert tempdir.exists()
|
||||
assert not tempdir.exists()
|
||||
|
||||
def test_context_manager_cleans_up_after_exception(self) -> None:
|
||||
tempdir: Path | None = None
|
||||
with pytest.raises(RuntimeError):
|
||||
with TextDocumentParser() as parser:
|
||||
tempdir = parser._tempdir
|
||||
raise RuntimeError("boom")
|
||||
assert tempdir is not None
|
||||
assert not tempdir.exists()
|
||||
|
||||
|
||||
class TestTextParserParse:
|
||||
"""Verify parse() and the result accessors."""
|
||||
|
||||
def test_parse_valid_utf8(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
sample_txt_file: Path,
|
||||
) -> None:
|
||||
text_parser.parse(sample_txt_file, "text/plain")
|
||||
|
||||
assert text_parser.get_text() == "This is a test file.\n"
|
||||
|
||||
def test_parse_returns_none_for_archive_path(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
sample_txt_file: Path,
|
||||
) -> None:
|
||||
text_parser.parse(sample_txt_file, "text/plain")
|
||||
|
||||
assert text_parser.get_archive_path() is None
|
||||
|
||||
def test_parse_returns_none_for_date(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
sample_txt_file: Path,
|
||||
) -> None:
|
||||
text_parser.parse(sample_txt_file, "text/plain")
|
||||
|
||||
assert text_parser.get_date() is None
|
||||
|
||||
def test_parse_invalid_utf8_bytes_replaced(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
malformed_txt_file: Path,
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN:
|
||||
- A text file containing invalid UTF-8 byte sequences
|
||||
WHEN:
|
||||
- The file is parsed
|
||||
THEN:
|
||||
- Parsing succeeds
|
||||
- Invalid bytes are replaced with the Unicode replacement character
|
||||
"""
|
||||
text_parser.parse(malformed_txt_file, "text/plain")
|
||||
|
||||
assert text_parser.get_text() == "Pantothens\ufffdure\n"
|
||||
|
||||
def test_get_text_none_before_parse(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
) -> None:
|
||||
assert text_parser.get_text() is None
|
||||
|
||||
|
||||
class TestTextParserThumbnail:
|
||||
"""Verify thumbnail generation."""
|
||||
|
||||
def test_thumbnail_exists_and_is_file(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
sample_txt_file: Path,
|
||||
) -> None:
|
||||
thumb = text_parser.get_thumbnail(sample_txt_file, "text/plain")
|
||||
|
||||
assert thumb.exists()
|
||||
assert thumb.is_file()
|
||||
|
||||
def test_thumbnail_large_file_does_not_read_all(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN:
|
||||
- A text file larger than 50 MB
|
||||
WHEN:
|
||||
- A thumbnail is requested
|
||||
THEN:
|
||||
- The thumbnail is generated without loading the full file
|
||||
"""
|
||||
with tempfile.NamedTemporaryFile(
|
||||
delete=False,
|
||||
mode="w",
|
||||
encoding="utf-8",
|
||||
suffix=".txt",
|
||||
) as tmp:
|
||||
tmp.write("A" * (51 * 1024 * 1024))
|
||||
large_file = Path(tmp.name)
|
||||
|
||||
try:
|
||||
thumb = text_parser.get_thumbnail(large_file, "text/plain")
|
||||
assert thumb.exists()
|
||||
assert thumb.is_file()
|
||||
finally:
|
||||
large_file.unlink(missing_ok=True)
|
||||
|
||||
def test_get_page_count_returns_none(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
sample_txt_file: Path,
|
||||
) -> None:
|
||||
assert text_parser.get_page_count(sample_txt_file, "text/plain") is None
|
||||
|
||||
|
||||
class TestTextParserMetadata:
|
||||
"""Verify extract_metadata behaviour."""
|
||||
|
||||
def test_extract_metadata_returns_empty_list(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
sample_txt_file: Path,
|
||||
) -> None:
|
||||
result = text_parser.extract_metadata(sample_txt_file, "text/plain")
|
||||
|
||||
assert result == []
|
||||
|
||||
def test_extract_metadata_returns_list_type(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
sample_txt_file: Path,
|
||||
) -> None:
|
||||
result = text_parser.extract_metadata(sample_txt_file, "text/plain")
|
||||
|
||||
assert isinstance(result, list)
|
||||
|
||||
def test_extract_metadata_ignores_mime_type(
|
||||
self,
|
||||
text_parser: TextDocumentParser,
|
||||
sample_txt_file: Path,
|
||||
) -> None:
|
||||
"""extract_metadata returns [] regardless of the mime_type argument."""
|
||||
assert text_parser.extract_metadata(sample_txt_file, "application/pdf") == []
|
||||
assert text_parser.extract_metadata(sample_txt_file, "text/csv") == []
|
||||
|
||||
|
||||
class TestTextParserRegistry:
|
||||
"""Verify that TextDocumentParser is registered by default."""
|
||||
|
||||
def test_registered_in_defaults(self) -> None:
|
||||
from paperless.parsers.registry import ParserRegistry
|
||||
|
||||
registry = ParserRegistry()
|
||||
registry.register_defaults()
|
||||
|
||||
assert TextDocumentParser in registry._builtins
|
||||
|
||||
def test_get_parser_for_text_plain(self) -> None:
|
||||
from paperless.parsers.registry import get_parser_registry
|
||||
|
||||
registry = get_parser_registry()
|
||||
parser_cls = registry.get_parser_for_file("text/plain", "doc.txt")
|
||||
|
||||
assert parser_cls is TextDocumentParser
|
||||
|
||||
def test_get_parser_for_text_csv(self) -> None:
|
||||
from paperless.parsers.registry import get_parser_registry
|
||||
|
||||
registry = get_parser_registry()
|
||||
parser_cls = registry.get_parser_for_file("text/csv", "data.csv")
|
||||
|
||||
assert parser_cls is TextDocumentParser
|
||||
|
||||
def test_get_parser_for_unknown_type_returns_none(self) -> None:
|
||||
from paperless.parsers.registry import get_parser_registry
|
||||
|
||||
registry = get_parser_registry()
|
||||
parser_cls = registry.get_parser_for_file("application/pdf", "doc.pdf")
|
||||
|
||||
assert parser_cls is None
|
||||
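
# Rough usage sketch based only on the calls exercised in the tests above; this is not
# a documented public API and names may differ in the real module:
#
#   from pathlib import Path
#   from paperless.parsers.registry import get_parser_registry
#
#   registry = get_parser_registry()
#   parser_cls = registry.get_parser_for_file("text/plain", "notes.txt")
#   if parser_cls is not None:
#       with parser_cls() as parser:      # the context manager cleans up its temp dir
#           parser.parse(Path("notes.txt"), "text/plain")
#           text = parser.get_text()      # None until parse() has been called
#           thumb = parser.get_thumbnail(Path("notes.txt"), "text/plain")
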
1
src/paperless/tests/samples/text/decode_error.txt
Normal file
1
src/paperless/tests/samples/text/decode_error.txt
Normal file
@@ -0,0 +1 @@
|
||||
Pantothensäure
|
||||
1
src/paperless/tests/samples/text/test.txt
Normal file
1
src/paperless/tests/samples/text/test.txt
Normal file
@@ -0,0 +1 @@
|
||||
This is a test file.
|
||||
@@ -1,279 +1,10 @@
|
||||
import datetime
|
||||
import os
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import pytest
|
||||
from celery.schedules import crontab
|
||||
from pytest_mock import MockerFixture
|
||||
|
||||
from paperless.settings.custom import parse_beat_schedule
|
||||
from paperless.settings.custom import parse_dateparser_languages
|
||||
from paperless.settings.custom import parse_db_settings
|
||||
from paperless.settings.custom import parse_hosting_settings
|
||||
from paperless.settings.custom import parse_ignore_dates
|
||||
from paperless.settings.custom import parse_redis_url
|
||||
|
||||
|
||||
class TestRedisSocketConversion:
|
||||
@pytest.mark.parametrize(
|
||||
("input_url", "expected"),
|
||||
[
|
||||
pytest.param(
|
||||
None,
|
||||
("redis://localhost:6379", "redis://localhost:6379"),
|
||||
id="none_uses_default",
|
||||
),
|
||||
pytest.param(
|
||||
"redis+socket:///run/redis/redis.sock",
|
||||
(
|
||||
"redis+socket:///run/redis/redis.sock",
|
||||
"unix:///run/redis/redis.sock",
|
||||
),
|
||||
id="celery_style_socket",
|
||||
),
|
||||
pytest.param(
|
||||
"unix:///run/redis/redis.sock",
|
||||
(
|
||||
"redis+socket:///run/redis/redis.sock",
|
||||
"unix:///run/redis/redis.sock",
|
||||
),
|
||||
id="redis_py_style_socket",
|
||||
),
|
||||
pytest.param(
|
||||
"redis+socket:///run/redis/redis.sock?virtual_host=5",
|
||||
(
|
||||
"redis+socket:///run/redis/redis.sock?virtual_host=5",
|
||||
"unix:///run/redis/redis.sock?db=5",
|
||||
),
|
||||
id="celery_style_socket_with_db",
|
||||
),
|
||||
pytest.param(
|
||||
"unix:///run/redis/redis.sock?db=10",
|
||||
(
|
||||
"redis+socket:///run/redis/redis.sock?virtual_host=10",
|
||||
"unix:///run/redis/redis.sock?db=10",
|
||||
),
|
||||
id="redis_py_style_socket_with_db",
|
||||
),
|
||||
pytest.param(
|
||||
"redis://myredishost:6379",
|
||||
("redis://myredishost:6379", "redis://myredishost:6379"),
|
||||
id="host_with_port_unchanged",
|
||||
),
|
||||
# Credentials in unix:// URL contain multiple colons (user:password@)
|
||||
# Regression test for https://github.com/paperless-ngx/paperless-ngx/pull/12239
|
||||
pytest.param(
|
||||
"unix://user:password@/run/redis/redis.sock",
|
||||
(
|
||||
"redis+socket://user:password@/run/redis/redis.sock",
|
||||
"unix://user:password@/run/redis/redis.sock",
|
||||
),
|
||||
id="redis_py_style_socket_with_credentials",
|
||||
),
|
||||
pytest.param(
|
||||
"redis+socket://user:password@/run/redis/redis.sock",
|
||||
(
|
||||
"redis+socket://user:password@/run/redis/redis.sock",
|
||||
"unix://user:password@/run/redis/redis.sock",
|
||||
),
|
||||
id="celery_style_socket_with_credentials",
|
||||
),
|
||||
],
|
||||
)
|
||||
def test_redis_socket_parsing(
|
||||
self,
|
||||
input_url: str | None,
|
||||
expected: tuple[str, str],
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN:
|
||||
- Various Redis connection URI formats
|
||||
WHEN:
|
||||
- The URI is parsed
|
||||
THEN:
|
||||
- Socket based URIs are translated
|
||||
- Non-socket URIs are unchanged
|
||||
- None provided uses default
|
||||
"""
|
||||
result = parse_redis_url(input_url)
|
||||
assert expected == result
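The translation these cases pin down can be summarised in a few lines. This is an illustrative sketch of the mapping only, not the parse_redis_url implementation:

def translate(url: str | None) -> tuple[str, str]:
    """Return (celery_url, channels_url) following the expectations above."""
    if url is None:
        return "redis://localhost:6379", "redis://localhost:6379"
    if url.startswith("unix://"):
        celery, channels = "redis+socket://" + url[len("unix://"):], url
    elif url.startswith("redis+socket://"):
        celery, channels = url, "unix://" + url[len("redis+socket://"):]
    else:
        return url, url  # plain host:port URLs pass through unchanged
    # The two clients name the database selector differently.
    celery = celery.replace("?db=", "?virtual_host=")
    channels = channels.replace("?virtual_host=", "?db=")
    return celery, channels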
|
||||
|
||||
|
||||
class TestParseHostingSettings:
|
||||
@pytest.mark.parametrize(
|
||||
("env", "expected"),
|
||||
[
|
||||
pytest.param(
|
||||
{},
|
||||
(
|
||||
None,
|
||||
"/",
|
||||
"/accounts/login/",
|
||||
"/dashboard",
|
||||
"/accounts/login/?loggedout=1",
|
||||
),
|
||||
id="no_env_vars",
|
||||
),
|
||||
pytest.param(
|
||||
{"PAPERLESS_FORCE_SCRIPT_NAME": "/paperless"},
|
||||
(
|
||||
"/paperless",
|
||||
"/paperless/",
|
||||
"/paperless/accounts/login/",
|
||||
"/paperless/dashboard",
|
||||
"/paperless/accounts/login/?loggedout=1",
|
||||
),
|
||||
id="force_script_name_only",
|
||||
),
|
||||
pytest.param(
|
||||
{
|
||||
"PAPERLESS_FORCE_SCRIPT_NAME": "/docs",
|
||||
"PAPERLESS_LOGOUT_REDIRECT_URL": "/custom/logout",
|
||||
},
|
||||
(
|
||||
"/docs",
|
||||
"/docs/",
|
||||
"/docs/accounts/login/",
|
||||
"/docs/dashboard",
|
||||
"/custom/logout",
|
||||
),
|
||||
id="force_script_name_and_logout_redirect",
|
||||
),
|
||||
],
|
||||
)
|
||||
def test_parse_hosting_settings(
|
||||
self,
|
||||
mocker: MockerFixture,
|
||||
env: dict[str, str],
|
||||
expected: tuple[str | None, str, str, str, str],
|
||||
) -> None:
|
||||
"""Test parse_hosting_settings with various env configurations."""
|
||||
mocker.patch.dict(os.environ, env, clear=True)
|
||||
|
||||
result = parse_hosting_settings()
|
||||
|
||||
assert result == expected
|
||||
|
||||
|
||||
def make_expected_schedule(
|
||||
overrides: dict[str, dict[str, Any]] | None = None,
|
||||
disabled: set[str] | None = None,
|
||||
) -> dict[str, Any]:
|
||||
"""
|
||||
Build the expected schedule with optional overrides and disabled tasks.
|
||||
"""
|
||||
|
||||
mail_expire = 9.0 * 60.0
|
||||
classifier_expire = 59.0 * 60.0
|
||||
index_expire = 23.0 * 60.0 * 60.0
|
||||
sanity_expire = ((7.0 * 24.0) - 1.0) * 60.0 * 60.0
|
||||
empty_trash_expire = 23.0 * 60.0 * 60.0
|
||||
workflow_expire = 59.0 * 60.0
|
||||
llm_index_expire = 23.0 * 60.0 * 60.0
|
||||
share_link_cleanup_expire = 23.0 * 60.0 * 60.0
|
||||
|
||||
schedule: dict[str, Any] = {
|
||||
"Check all e-mail accounts": {
|
||||
"task": "paperless_mail.tasks.process_mail_accounts",
|
||||
"schedule": crontab(minute="*/10"),
|
||||
"options": {"expires": mail_expire},
|
||||
},
|
||||
"Train the classifier": {
|
||||
"task": "documents.tasks.train_classifier",
|
||||
"schedule": crontab(minute="5", hour="*/1"),
|
||||
"options": {"expires": classifier_expire},
|
||||
},
|
||||
"Optimize the index": {
|
||||
"task": "documents.tasks.index_optimize",
|
||||
"schedule": crontab(minute=0, hour=0),
|
||||
"options": {"expires": index_expire},
|
||||
},
|
||||
"Perform sanity check": {
|
||||
"task": "documents.tasks.sanity_check",
|
||||
"schedule": crontab(minute=30, hour=0, day_of_week="sun"),
|
||||
"options": {"expires": sanity_expire},
|
||||
},
|
||||
"Empty trash": {
|
||||
"task": "documents.tasks.empty_trash",
|
||||
"schedule": crontab(minute=0, hour="1"),
|
||||
"options": {"expires": empty_trash_expire},
|
||||
},
|
||||
"Check and run scheduled workflows": {
|
||||
"task": "documents.tasks.check_scheduled_workflows",
|
||||
"schedule": crontab(minute="5", hour="*/1"),
|
||||
"options": {"expires": workflow_expire},
|
||||
},
|
||||
"Rebuild LLM index": {
|
||||
"task": "documents.tasks.llmindex_index",
|
||||
"schedule": crontab(minute="10", hour="2"),
|
||||
"options": {"expires": llm_index_expire},
|
||||
},
|
||||
"Cleanup expired share link bundles": {
|
||||
"task": "documents.tasks.cleanup_expired_share_link_bundles",
|
||||
"schedule": crontab(minute=0, hour="2"),
|
||||
"options": {"expires": share_link_cleanup_expire},
|
||||
},
|
||||
}
|
||||
|
||||
overrides = overrides or {}
|
||||
disabled = disabled or set()
|
||||
|
||||
for key, val in overrides.items():
|
||||
schedule[key] = {**schedule.get(key, {}), **val}
|
||||
|
||||
for key in disabled:
|
||||
schedule.pop(key, None)
|
||||
|
||||
return schedule
|
||||
|
||||
|
||||
class TestParseBeatSchedule:
|
||||
@pytest.mark.parametrize(
|
||||
("env", "expected"),
|
||||
[
|
||||
pytest.param({}, make_expected_schedule(), id="defaults"),
|
||||
pytest.param(
|
||||
{"PAPERLESS_EMAIL_TASK_CRON": "*/50 * * * mon"},
|
||||
make_expected_schedule(
|
||||
overrides={
|
||||
"Check all e-mail accounts": {
|
||||
"schedule": crontab(minute="*/50", day_of_week="mon"),
|
||||
},
|
||||
},
|
||||
),
|
||||
id="email-changed",
|
||||
),
|
||||
pytest.param(
|
||||
{"PAPERLESS_INDEX_TASK_CRON": "disable"},
|
||||
make_expected_schedule(disabled={"Optimize the index"}),
|
||||
id="index-disabled",
|
||||
),
|
||||
pytest.param(
|
||||
{
|
||||
"PAPERLESS_EMAIL_TASK_CRON": "disable",
|
||||
"PAPERLESS_TRAIN_TASK_CRON": "disable",
|
||||
"PAPERLESS_SANITY_TASK_CRON": "disable",
|
||||
"PAPERLESS_INDEX_TASK_CRON": "disable",
|
||||
"PAPERLESS_EMPTY_TRASH_TASK_CRON": "disable",
|
||||
"PAPERLESS_WORKFLOW_SCHEDULED_TASK_CRON": "disable",
|
||||
"PAPERLESS_LLM_INDEX_TASK_CRON": "disable",
|
||||
"PAPERLESS_SHARE_LINK_BUNDLE_CLEANUP_CRON": "disable",
|
||||
},
|
||||
{},
|
||||
id="all-disabled",
|
||||
),
|
||||
],
|
||||
)
|
||||
def test_parse_beat_schedule(
|
||||
self,
|
||||
env: dict[str, str],
|
||||
expected: dict[str, Any],
|
||||
mocker: MockerFixture,
|
||||
) -> None:
|
||||
mocker.patch.dict(os.environ, env, clear=False)
|
||||
schedule = parse_beat_schedule()
|
||||
assert schedule == expected
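As the cases above show, each task reads its cron expression from a matching PAPERLESS_..._CRON environment variable, and the literal value "disable" removes that task from the schedule. A short illustration of the override path only:

import os

from celery.schedules import crontab

from paperless.settings.custom import parse_beat_schedule

os.environ["PAPERLESS_EMAIL_TASK_CRON"] = "*/50 * * * mon"
schedule = parse_beat_schedule()
assert schedule["Check all e-mail accounts"]["schedule"] == crontab(
    minute="*/50",
    day_of_week="mon",
)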
|
||||
|
||||
|
||||
class TestParseDbSettings:
|
||||
@@ -533,85 +264,3 @@ class TestParseDbSettings:
|
||||
settings = parse_db_settings(tmp_path)
|
||||
|
||||
assert settings == expected_database_settings
|
||||
|
||||
|
||||
class TestParseIgnoreDates:
|
||||
"""Tests the parsing of the PAPERLESS_IGNORE_DATES setting value."""
|
||||
|
||||
def test_no_ignore_dates_set(self) -> None:
|
||||
"""
|
||||
GIVEN:
|
||||
- No ignore dates are set
|
||||
THEN:
|
||||
- No ignore dates are parsed
|
||||
"""
|
||||
assert parse_ignore_dates("", "YMD") == set()
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
("env_str", "date_format", "expected"),
|
||||
[
|
||||
pytest.param(
|
||||
"1985-05-01",
|
||||
"YMD",
|
||||
{datetime.date(1985, 5, 1)},
|
||||
id="single-ymd",
|
||||
),
|
||||
pytest.param(
|
||||
"1985-05-01,1991-12-05",
|
||||
"YMD",
|
||||
{datetime.date(1985, 5, 1), datetime.date(1991, 12, 5)},
|
||||
id="multiple-ymd",
|
||||
),
|
||||
pytest.param(
|
||||
"2010-12-13",
|
||||
"YMD",
|
||||
{datetime.date(2010, 12, 13)},
|
||||
id="single-ymd-2",
|
||||
),
|
||||
pytest.param(
|
||||
"11.01.10",
|
||||
"DMY",
|
||||
{datetime.date(2010, 1, 11)},
|
||||
id="single-dmy",
|
||||
),
|
||||
pytest.param(
|
||||
"11.01.2001,15-06-1996",
|
||||
"DMY",
|
||||
{datetime.date(2001, 1, 11), datetime.date(1996, 6, 15)},
|
||||
id="multiple-dmy",
|
||||
),
|
||||
],
|
||||
)
|
||||
def test_ignore_dates_parsed(
|
||||
self,
|
||||
env_str: str,
|
||||
date_format: str,
|
||||
expected: set[datetime.date],
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN:
|
||||
- Ignore dates are set per certain inputs
|
||||
THEN:
|
||||
- All ignore dates are parsed
|
||||
"""
|
||||
assert parse_ignore_dates(env_str, date_format) == expected
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
("languages", "expected"),
|
||||
[
|
||||
("de", ["de"]),
|
||||
("zh", ["zh"]),
|
||||
("fr+en", ["fr", "en"]),
|
||||
# Locales must be supported
|
||||
("en-001+fr-CA", ["en-001", "fr-CA"]),
|
||||
("en-001+fr", ["en-001", "fr"]),
|
||||
# Special case for Chinese: variants seem to miss some dates,
|
||||
# so we always add "zh" as a fallback.
|
||||
("en+zh-Hans-HK", ["en", "zh-Hans-HK", "zh"]),
|
||||
("en+zh-Hans", ["en", "zh-Hans", "zh"]),
|
||||
("en+zh-Hans+zh-Hant", ["en", "zh-Hans", "zh-Hant", "zh"]),
|
||||
],
|
||||
)
|
||||
def test_parse_dateparser_languages(languages: str, expected: list[str]) -> None:
|
||||
assert sorted(parse_dateparser_languages(languages)) == sorted(expected)
|
||||
|
||||
@@ -4,12 +4,8 @@ from pathlib import Path
|
||||
import pytest
|
||||
from pytest_mock import MockerFixture
|
||||
|
||||
from paperless.settings.parsers import get_bool_from_env
|
||||
from paperless.settings.parsers import get_choice_from_env
|
||||
from paperless.settings.parsers import get_float_from_env
|
||||
from paperless.settings.parsers import get_int_from_env
|
||||
from paperless.settings.parsers import get_list_from_env
|
||||
from paperless.settings.parsers import get_path_from_env
|
||||
from paperless.settings.parsers import parse_dict_from_str
|
||||
from paperless.settings.parsers import str_to_bool
|
||||
|
||||
@@ -209,29 +205,6 @@ class TestParseDictFromString:
|
||||
assert isinstance(result["database"]["port"], int)
|
||||
|
||||
|
||||
class TestGetBoolFromEnv:
|
||||
def test_existing_env_var(self, mocker):
|
||||
"""Test that an existing environment variable is read and converted."""
|
||||
mocker.patch.dict(os.environ, {"TEST_VAR": "true"})
|
||||
assert get_bool_from_env("TEST_VAR") is True
|
||||
|
||||
def test_missing_env_var_uses_default_no(self, mocker):
|
||||
"""Test that a missing environment variable uses default 'NO' and returns False."""
|
||||
mocker.patch.dict(os.environ, {}, clear=True)
|
||||
assert get_bool_from_env("MISSING_VAR") is False
|
||||
|
||||
def test_missing_env_var_with_explicit_default(self, mocker):
|
||||
"""Test that a missing environment variable uses the provided default."""
|
||||
mocker.patch.dict(os.environ, {}, clear=True)
|
||||
assert get_bool_from_env("MISSING_VAR", default="yes") is True
|
||||
|
||||
def test_invalid_value_raises_error(self, mocker):
|
||||
"""Test that an invalid value raises ValueError (delegates to str_to_bool)."""
|
||||
mocker.patch.dict(os.environ, {"INVALID_VAR": "maybe"})
|
||||
with pytest.raises(ValueError):
|
||||
get_bool_from_env("INVALID_VAR")
|
||||
|
||||
|
||||
class TestGetIntFromEnv:
|
||||
@pytest.mark.parametrize(
|
||||
("env_value", "expected"),
|
||||
@@ -286,199 +259,6 @@ class TestGetIntFromEnv:
|
||||
get_int_from_env("INVALID_INT")
|
||||
|
||||
|
||||
class TestGetFloatFromEnv:
|
||||
@pytest.mark.parametrize(
|
||||
("env_value", "expected"),
|
||||
[
|
||||
pytest.param("3.14", 3.14, id="pi"),
|
||||
pytest.param("42", 42.0, id="int_as_float"),
|
||||
pytest.param("-2.5", -2.5, id="negative"),
|
||||
pytest.param("0.0", 0.0, id="zero_float"),
|
||||
pytest.param("0", 0.0, id="zero_int"),
|
||||
pytest.param("1.5e2", 150.0, id="sci_positive"),
|
||||
pytest.param("1e-3", 0.001, id="sci_negative"),
|
||||
pytest.param("-1.23e4", -12300.0, id="sci_large"),
|
||||
],
|
||||
)
|
||||
def test_existing_env_var_valid_floats(self, mocker, env_value, expected):
|
||||
"""Test that existing environment variables with valid floats return correct values."""
|
||||
mocker.patch.dict(os.environ, {"FLOAT_VAR": env_value})
|
||||
assert get_float_from_env("FLOAT_VAR") == expected
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
("default", "expected"),
|
||||
[
|
||||
pytest.param(3.14, 3.14, id="pi_default"),
|
||||
pytest.param(0.0, 0.0, id="zero_default"),
|
||||
pytest.param(-2.5, -2.5, id="negative_default"),
|
||||
pytest.param(None, None, id="none_default"),
|
||||
],
|
||||
)
|
||||
def test_missing_env_var_with_defaults(self, mocker, default, expected):
|
||||
"""Test that missing environment variables return provided defaults."""
|
||||
mocker.patch.dict(os.environ, {}, clear=True)
|
||||
assert get_float_from_env("MISSING_VAR", default=default) == expected
|
||||
|
||||
def test_missing_env_var_no_default(self, mocker):
|
||||
"""Test that missing environment variable with no default returns None."""
|
||||
mocker.patch.dict(os.environ, {}, clear=True)
|
||||
assert get_float_from_env("MISSING_VAR") is None
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"invalid_value",
|
||||
[
|
||||
pytest.param("not_a_number", id="text"),
|
||||
pytest.param("42.5.0", id="double_decimal"),
|
||||
pytest.param("42a", id="alpha_suffix"),
|
||||
pytest.param("", id="empty"),
|
||||
pytest.param(" ", id="whitespace"),
|
||||
pytest.param("true", id="boolean"),
|
||||
pytest.param("1.2.3", id="triple_decimal"),
|
||||
],
|
||||
)
|
||||
def test_invalid_float_values_raise_error(self, mocker, invalid_value):
|
||||
"""Test that invalid float values raise ValueError."""
|
||||
mocker.patch.dict(os.environ, {"INVALID_FLOAT": invalid_value})
|
||||
with pytest.raises(ValueError):
|
||||
get_float_from_env("INVALID_FLOAT")
|
||||
|
||||
|
||||
class TestGetPathFromEnv:
|
||||
@pytest.mark.parametrize(
|
||||
"env_value",
|
||||
[
|
||||
pytest.param("/tmp/test", id="absolute"),
|
||||
pytest.param("relative/path", id="relative"),
|
||||
pytest.param("/path/with spaces/file.txt", id="spaces"),
|
||||
pytest.param(".", id="current_dir"),
|
||||
pytest.param("..", id="parent_dir"),
|
||||
pytest.param("/", id="root"),
|
||||
],
|
||||
)
|
||||
def test_existing_env_var_paths(self, mocker, env_value):
|
||||
"""Test that existing environment variables with paths return resolved Path objects."""
|
||||
mocker.patch.dict(os.environ, {"PATH_VAR": env_value})
|
||||
result = get_path_from_env("PATH_VAR")
|
||||
assert isinstance(result, Path)
|
||||
assert result == Path(env_value).resolve()
|
||||
|
||||
def test_missing_env_var_no_default(self, mocker):
|
||||
"""Test that missing environment variable with no default returns None."""
|
||||
mocker.patch.dict(os.environ, {}, clear=True)
|
||||
assert get_path_from_env("MISSING_VAR") is None
|
||||
|
||||
def test_missing_env_var_with_none_default(self, mocker):
|
||||
"""Test that missing environment variable with None default returns None."""
|
||||
mocker.patch.dict(os.environ, {}, clear=True)
|
||||
assert get_path_from_env("MISSING_VAR", default=None) is None
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"default_path_str",
|
||||
[
|
||||
pytest.param("/default/path", id="absolute_default"),
|
||||
pytest.param("relative/default", id="relative_default"),
|
||||
pytest.param(".", id="current_default"),
|
||||
],
|
||||
)
|
||||
def test_missing_env_var_with_path_defaults(self, mocker, default_path_str):
|
||||
"""Test that missing environment variables return resolved default Path objects."""
|
||||
mocker.patch.dict(os.environ, {}, clear=True)
|
||||
default_path = Path(default_path_str)
|
||||
result = get_path_from_env("MISSING_VAR", default=default_path)
|
||||
assert isinstance(result, Path)
|
||||
assert result == default_path.resolve()
|
||||
|
||||
def test_relative_paths_are_resolved(self, mocker):
|
||||
"""Test that relative paths are properly resolved to absolute paths."""
|
||||
mocker.patch.dict(os.environ, {"REL_PATH": "relative/path"})
|
||||
result = get_path_from_env("REL_PATH")
|
||||
assert result is not None
|
||||
assert result.is_absolute()
|
||||
|
||||
|
||||
class TestGetListFromEnv:
|
||||
@pytest.mark.parametrize(
|
||||
("env_value", "expected"),
|
||||
[
|
||||
pytest.param("a,b,c", ["a", "b", "c"], id="basic_comma_separated"),
|
||||
pytest.param("single", ["single"], id="single_element"),
|
||||
pytest.param("", [], id="empty_string"),
|
||||
pytest.param("a, b , c", ["a", "b", "c"], id="whitespace_trimmed"),
|
||||
pytest.param("a,,b,c", ["a", "b", "c"], id="empty_elements_removed"),
|
||||
],
|
||||
)
|
||||
def test_existing_env_var_basic_parsing(self, mocker, env_value, expected):
|
||||
"""Test that existing environment variables are parsed correctly."""
|
||||
mocker.patch.dict(os.environ, {"LIST_VAR": env_value})
|
||||
result = get_list_from_env("LIST_VAR")
|
||||
assert result == expected
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
("separator", "env_value", "expected"),
|
||||
[
|
||||
pytest.param("|", "a|b|c", ["a", "b", "c"], id="pipe_separator"),
|
||||
pytest.param(":", "a:b:c", ["a", "b", "c"], id="colon_separator"),
|
||||
pytest.param(";", "a;b;c", ["a", "b", "c"], id="semicolon_separator"),
|
||||
],
|
||||
)
|
||||
def test_custom_separators(self, mocker, separator, env_value, expected):
|
||||
"""Test that custom separators work correctly."""
|
||||
mocker.patch.dict(os.environ, {"LIST_VAR": env_value})
|
||||
result = get_list_from_env("LIST_VAR", separator=separator)
|
||||
assert result == expected
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
("default", "expected"),
|
||||
[
|
||||
pytest.param(
|
||||
["default1", "default2"],
|
||||
["default1", "default2"],
|
||||
id="string_list_default",
|
||||
),
|
||||
pytest.param([1, 2, 3], [1, 2, 3], id="int_list_default"),
|
||||
pytest.param(None, [], id="none_default_returns_empty_list"),
|
||||
],
|
||||
)
|
||||
def test_missing_env_var_with_defaults(self, mocker, default, expected):
|
||||
"""Test that missing environment variables return provided defaults."""
|
||||
mocker.patch.dict(os.environ, {}, clear=True)
|
||||
result = get_list_from_env("MISSING_VAR", default=default)
|
||||
assert result == expected
|
||||
|
||||
def test_missing_env_var_no_default(self, mocker):
|
||||
"""Test that missing environment variable with no default returns empty list."""
|
||||
mocker.patch.dict(os.environ, {}, clear=True)
|
||||
result = get_list_from_env("MISSING_VAR")
|
||||
assert result == []
|
||||
|
||||
def test_required_env_var_missing_raises_error(self, mocker):
|
||||
"""Test that missing required environment variable raises ValueError."""
|
||||
mocker.patch.dict(os.environ, {}, clear=True)
|
||||
with pytest.raises(
|
||||
ValueError,
|
||||
match="Required environment variable 'REQUIRED_VAR' is not set",
|
||||
):
|
||||
get_list_from_env("REQUIRED_VAR", required=True)
|
||||
|
||||
def test_required_env_var_with_default_does_not_raise(self, mocker):
|
||||
"""Test that required environment variable with default does not raise error."""
|
||||
mocker.patch.dict(os.environ, {}, clear=True)
|
||||
result = get_list_from_env("REQUIRED_VAR", default=["default"], required=True)
|
||||
assert result == ["default"]
|
||||
|
||||
def test_strip_whitespace_false(self, mocker):
|
||||
"""Test that whitespace is preserved when strip_whitespace=False."""
|
||||
mocker.patch.dict(os.environ, {"LIST_VAR": " a , b , c "})
|
||||
result = get_list_from_env("LIST_VAR", strip_whitespace=False)
|
||||
assert result == [" a ", " b ", " c "]
|
||||
|
||||
def test_remove_empty_false(self, mocker):
|
||||
"""Test that empty elements are preserved when remove_empty=False."""
|
||||
mocker.patch.dict(os.environ, {"LIST_VAR": "a,,b,,c"})
|
||||
result = get_list_from_env("LIST_VAR", remove_empty=False)
|
||||
assert result == ["a", "", "b", "", "c"]
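Taken together, these cases pin down the helper's behaviour: split on the separator, strip whitespace and drop empty elements by default, fall back to the given default (or an empty list), and raise ValueError when required=True and nothing is set. A short usage sketch; the variable names are illustrative:

import os

from paperless.settings.parsers import get_list_from_env

os.environ["EXAMPLE_HOSTS"] = "a.example.com, b.example.com,,c.example.com"
assert get_list_from_env("EXAMPLE_HOSTS") == [
    "a.example.com",
    "b.example.com",
    "c.example.com",
]
assert get_list_from_env("NOT_SET", default=["fallback"]) == ["fallback"]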
|
||||
|
||||
|
||||
class TestGetEnvChoice:
|
||||
@pytest.fixture
|
||||
def valid_choices(self) -> set[str]:
|
||||
@@ -614,3 +394,21 @@ class TestGetEnvChoice:
|
||||
result = get_choice_from_env("TEST_ENV", large_choices)
|
||||
|
||||
assert result == "option_50"
|
||||
|
||||
def test_different_env_keys(
|
||||
self,
|
||||
mocker: MockerFixture,
|
||||
valid_choices: set[str],
|
||||
) -> None:
|
||||
"""Test function works with different environment variable keys."""
|
||||
test_cases = [
|
||||
("DJANGO_ENV", "development"),
|
||||
("DATABASE_BACKEND", "staging"),
|
||||
("LOG_LEVEL", "production"),
|
||||
("APP_MODE", "development"),
|
||||
]
|
||||
|
||||
for env_key, env_value in test_cases:
|
||||
mocker.patch.dict("os.environ", {env_key: env_value})
|
||||
result = get_choice_from_env(env_key, valid_choices)
|
||||
assert result == env_value
|
||||
|
||||
@@ -1,56 +0,0 @@
import os
from unittest import TestCase
from unittest import mock

from paperless.settings import _parse_paperless_url
from paperless.settings import default_threads_per_worker


class TestThreadCalculation(TestCase):
    def test_workers_threads(self) -> None:
        """
        GIVEN:
            - Certain CPU counts
        WHEN:
            - Threads per worker is calculated
        THEN:
            - Threads per worker less than or equal to CPU count
            - At least 1 thread per worker
        """
        default_workers = 1

        for i in range(1, 64):
            with mock.patch(
                "paperless.settings.multiprocessing.cpu_count",
            ) as cpu_count:
                cpu_count.return_value = i

                default_threads = default_threads_per_worker(default_workers)

                self.assertGreaterEqual(default_threads, 1)

                self.assertLessEqual(default_workers * default_threads, i)


class TestPaperlessURLSettings(TestCase):
    def test_paperless_url(self) -> None:
        """
        GIVEN:
            - PAPERLESS_URL is set
        WHEN:
            - The URL is parsed
        THEN:
            - The URL is returned and present in related settings
        """
        with mock.patch.dict(
            os.environ,
            {
                "PAPERLESS_URL": "https://example.com",
            },
        ):
            url = _parse_paperless_url()
            self.assertEqual("https://example.com", url)
            from django.conf import settings

            self.assertIn(url, settings.CSRF_TRUSTED_ORIGINS)
            self.assertIn(url, settings.CORS_ALLOWED_ORIGINS)
710
src/paperless/tests/test_registry.py
Normal file
@@ -0,0 +1,710 @@
|
||||
"""
|
||||
Tests for :mod:`paperless.parsers` (ParserProtocol) and
|
||||
:mod:`paperless.parsers.registry` (ParserRegistry + module-level helpers).
|
||||
|
||||
All tests use pytest-style functions/classes — no unittest.TestCase.
|
||||
The ``clean_registry`` fixture ensures complete isolation between tests by
|
||||
resetting the module-level singleton before and after every test.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from importlib.metadata import EntryPoint
|
||||
from pathlib import Path
|
||||
from typing import Self
|
||||
from unittest.mock import MagicMock
|
||||
from unittest.mock import patch
|
||||
|
||||
import pytest
|
||||
|
||||
from paperless.parsers import ParserProtocol
|
||||
from paperless.parsers.registry import ParserRegistry
|
||||
from paperless.parsers.registry import get_parser_registry
|
||||
from paperless.parsers.registry import init_builtin_parsers
|
||||
from paperless.parsers.registry import reset_parser_registry
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def dummy_parser_cls() -> type:
|
||||
"""Return a class that fully satisfies :class:`ParserProtocol`.
|
||||
|
||||
GIVEN: A need to exercise registry and Protocol logic with a minimal
|
||||
but complete parser.
|
||||
WHEN: A test requests this fixture.
|
||||
THEN: A class with all required attributes and methods is returned.
|
||||
"""
|
||||
|
||||
class DummyParser:
|
||||
name = "dummy-parser"
|
||||
version = "0.1.0"
|
||||
author = "Test Author"
|
||||
url = "https://example.com/dummy-parser"
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls) -> dict[str, str]:
|
||||
return {"text/plain": ".txt"}
|
||||
|
||||
@classmethod
|
||||
def score(
|
||||
cls,
|
||||
mime_type: str,
|
||||
filename: str,
|
||||
path: Path | None = None,
|
||||
) -> int | None:
|
||||
return 10
|
||||
|
||||
@property
|
||||
def can_produce_archive(self) -> bool:
|
||||
return False
|
||||
|
||||
@property
|
||||
def requires_pdf_rendition(self) -> bool:
|
||||
return False
|
||||
|
||||
def parse(
|
||||
self,
|
||||
document_path: Path,
|
||||
mime_type: str,
|
||||
*,
|
||||
produce_archive: bool = True,
|
||||
) -> None:
|
||||
pass
|
||||
|
||||
def get_text(self) -> str | None:
|
||||
return None
|
||||
|
||||
def get_date(self) -> None:
|
||||
return None
|
||||
|
||||
def get_archive_path(self) -> Path | None:
|
||||
return None
|
||||
|
||||
def get_thumbnail(
|
||||
self,
|
||||
document_path: Path,
|
||||
mime_type: str,
|
||||
) -> Path:
|
||||
return Path("/tmp/thumbnail.webp")
|
||||
|
||||
def get_page_count(
|
||||
self,
|
||||
document_path: Path,
|
||||
mime_type: str,
|
||||
) -> int | None:
|
||||
return None
|
||||
|
||||
def extract_metadata(
|
||||
self,
|
||||
document_path: Path,
|
||||
mime_type: str,
|
||||
) -> list:
|
||||
return []
|
||||
|
||||
def __enter__(self) -> Self:
|
||||
return self
|
||||
|
||||
def __exit__(self, exc_type, exc_val, exc_tb) -> None:
|
||||
pass
|
||||
|
||||
return DummyParser
|
||||
|
||||
|
||||
class TestParserProtocol:
|
||||
"""Verify runtime isinstance() checks against ParserProtocol."""
|
||||
|
||||
def test_compliant_class_instance_passes_isinstance(
|
||||
self,
|
||||
dummy_parser_cls: type,
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN: A class that implements every method required by ParserProtocol.
|
||||
WHEN: isinstance() is called with the Protocol.
|
||||
THEN: The check passes (returns True).
|
||||
"""
|
||||
instance = dummy_parser_cls()
|
||||
assert isinstance(instance, ParserProtocol)
|
||||
|
||||
def test_non_compliant_class_instance_fails_isinstance(self) -> None:
|
||||
"""
|
||||
GIVEN: A plain class with no parser-related methods.
|
||||
WHEN: isinstance() is called with ParserProtocol.
|
||||
THEN: The check fails (returns False).
|
||||
"""
|
||||
|
||||
class Unrelated:
|
||||
pass
|
||||
|
||||
assert not isinstance(Unrelated(), ParserProtocol)
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"missing_method",
|
||||
[
|
||||
pytest.param("parse", id="missing-parse"),
|
||||
pytest.param("get_text", id="missing-get_text"),
|
||||
pytest.param("get_thumbnail", id="missing-get_thumbnail"),
|
||||
pytest.param("__enter__", id="missing-__enter__"),
|
||||
pytest.param("__exit__", id="missing-__exit__"),
|
||||
],
|
||||
)
|
||||
def test_partial_compliant_fails_isinstance(
|
||||
self,
|
||||
dummy_parser_cls: type,
|
||||
missing_method: str,
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN: A class that satisfies ParserProtocol except for one method.
|
||||
WHEN: isinstance() is called with ParserProtocol.
|
||||
THEN: The check fails because the Protocol is not fully satisfied.
|
||||
"""
|
||||
# Create a subclass and delete the specified method to break compliance.
|
||||
partial_cls = type(
|
||||
"PartialParser",
|
||||
(dummy_parser_cls,),
|
||||
{missing_method: None}, # Replace with None — not callable
|
||||
)
|
||||
assert not isinstance(partial_cls(), ParserProtocol)
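The behaviour relied on here is standard for @runtime_checkable protocols: isinstance() checks that every member named by the Protocol is present (and, on current Python versions, that a method slot is not simply set to None), but it never verifies signatures or return types. A standalone illustration, independent of ParserProtocol:

from typing import Protocol
from typing import runtime_checkable


@runtime_checkable
class Greets(Protocol):
    def greet(self) -> str: ...


class Good:
    def greet(self) -> str:
        return "hi"


class WrongSignature:
    def greet(self, loudly: bool) -> int:  # still passes: only presence is checked
        return 0


assert isinstance(Good(), Greets)
assert isinstance(WrongSignature(), Greets)  # signature mismatch is not detected
assert not isinstance(object(), Greets)  # missing member fails the check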
|
||||
|
||||
|
||||
class TestRegistrySingleton:
|
||||
"""Verify the module-level singleton lifecycle functions."""
|
||||
|
||||
def test_get_parser_registry_returns_instance(self) -> None:
|
||||
"""
|
||||
GIVEN: No registry has been created yet.
|
||||
WHEN: get_parser_registry() is called.
|
||||
THEN: A ParserRegistry instance is returned.
|
||||
"""
|
||||
registry = get_parser_registry()
|
||||
assert isinstance(registry, ParserRegistry)
|
||||
|
||||
def test_get_parser_registry_same_instance_on_repeated_calls(self) -> None:
|
||||
"""
|
||||
GIVEN: A registry instance was created by a prior call.
|
||||
WHEN: get_parser_registry() is called a second time.
|
||||
THEN: The exact same object (identity) is returned.
|
||||
"""
|
||||
first = get_parser_registry()
|
||||
second = get_parser_registry()
|
||||
assert first is second
|
||||
|
||||
def test_reset_parser_registry_gives_fresh_instance(self) -> None:
|
||||
"""
|
||||
GIVEN: A registry instance already exists.
|
||||
WHEN: reset_parser_registry() is called and then get_parser_registry()
|
||||
is called again.
|
||||
THEN: A new, distinct registry instance is returned.
|
||||
"""
|
||||
first = get_parser_registry()
|
||||
reset_parser_registry()
|
||||
second = get_parser_registry()
|
||||
assert first is not second
|
||||
|
||||
def test_init_builtin_parsers_does_not_run_discover(
|
||||
self,
|
||||
monkeypatch: pytest.MonkeyPatch,
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN: discover() would raise an exception if called.
|
||||
WHEN: init_builtin_parsers() is called.
|
||||
THEN: No exception is raised, confirming discover() was not invoked.
|
||||
"""
|
||||
|
||||
def exploding_discover(self) -> None:
|
||||
raise RuntimeError(
|
||||
"discover() must not be called from init_builtin_parsers",
|
||||
)
|
||||
|
||||
monkeypatch.setattr(ParserRegistry, "discover", exploding_discover)
|
||||
|
||||
# Should complete without raising.
|
||||
init_builtin_parsers()
|
||||
|
||||
def test_init_builtin_parsers_idempotent(self) -> None:
|
||||
"""
|
||||
GIVEN: init_builtin_parsers() has already been called once.
|
||||
WHEN: init_builtin_parsers() is called a second time.
|
||||
THEN: No error is raised and the same registry instance is reused.
|
||||
"""
|
||||
init_builtin_parsers()
|
||||
# Capture the registry created by the first call.
|
||||
import paperless.parsers.registry as reg_module
|
||||
|
||||
first_registry = reg_module._registry
|
||||
|
||||
init_builtin_parsers()
|
||||
|
||||
assert reg_module._registry is first_registry
|
||||
|
||||
|
||||
class TestParserRegistryGetParserForFile:
|
||||
"""Verify parser selection logic in get_parser_for_file()."""
|
||||
|
||||
def test_returns_none_when_no_parsers_registered(self) -> None:
|
||||
"""
|
||||
GIVEN: A registry with no parsers registered.
|
||||
WHEN: get_parser_for_file() is called for any MIME type.
|
||||
THEN: None is returned.
|
||||
"""
|
||||
registry = ParserRegistry()
|
||||
result = registry.get_parser_for_file("text/plain", "doc.txt")
|
||||
assert result is None
|
||||
|
||||
def test_returns_none_for_unsupported_mime_type(
|
||||
self,
|
||||
dummy_parser_cls: type,
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN: A registry with a parser that supports only 'text/plain'.
|
||||
WHEN: get_parser_for_file() is called with 'application/pdf'.
|
||||
THEN: None is returned.
|
||||
"""
|
||||
registry = ParserRegistry()
|
||||
registry.register_builtin(dummy_parser_cls)
|
||||
result = registry.get_parser_for_file("application/pdf", "file.pdf")
|
||||
assert result is None
|
||||
|
||||
def test_returns_parser_for_supported_mime_type(
|
||||
self,
|
||||
dummy_parser_cls: type,
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN: A registry with a parser registered for 'text/plain'.
|
||||
WHEN: get_parser_for_file() is called with 'text/plain'.
|
||||
THEN: The registered parser class is returned.
|
||||
"""
|
||||
registry = ParserRegistry()
|
||||
registry.register_builtin(dummy_parser_cls)
|
||||
result = registry.get_parser_for_file("text/plain", "readme.txt")
|
||||
assert result is dummy_parser_cls
|
||||
|
||||
def test_highest_score_wins(self) -> None:
|
||||
"""
|
||||
GIVEN: Two parsers both supporting 'text/plain' with scores 5 and 20.
|
||||
WHEN: get_parser_for_file() is called for 'text/plain'.
|
||||
THEN: The parser with score 20 is returned.
|
||||
"""
|
||||
|
||||
class LowScoreParser:
|
||||
name = "low"
|
||||
version = "1.0"
|
||||
author = "A"
|
||||
url = "https://example.com/low"
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls):
|
||||
return {"text/plain": ".txt"}
|
||||
|
||||
@classmethod
|
||||
def score(cls, mime_type, filename, path=None):
|
||||
return 5
|
||||
|
||||
class HighScoreParser:
|
||||
name = "high"
|
||||
version = "1.0"
|
||||
author = "B"
|
||||
url = "https://example.com/high"
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls):
|
||||
return {"text/plain": ".txt"}
|
||||
|
||||
@classmethod
|
||||
def score(cls, mime_type, filename, path=None):
|
||||
return 20
|
||||
|
||||
registry = ParserRegistry()
|
||||
registry.register_builtin(LowScoreParser)
|
||||
registry.register_builtin(HighScoreParser)
|
||||
result = registry.get_parser_for_file("text/plain", "readme.txt")
|
||||
assert result is HighScoreParser
|
||||
|
||||
def test_parser_returning_none_score_is_skipped(self) -> None:
|
||||
"""
|
||||
GIVEN: A parser that returns None from score() for the given file.
|
||||
WHEN: get_parser_for_file() is called.
|
||||
THEN: That parser is skipped and None is returned (no other candidates).
|
||||
"""
|
||||
|
||||
class DecliningParser:
|
||||
name = "declining"
|
||||
version = "1.0"
|
||||
author = "A"
|
||||
url = "https://example.com"
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls):
|
||||
return {"text/plain": ".txt"}
|
||||
|
||||
@classmethod
|
||||
def score(cls, mime_type, filename, path=None):
|
||||
return None # Explicitly declines
|
||||
|
||||
registry = ParserRegistry()
|
||||
registry.register_builtin(DecliningParser)
|
||||
result = registry.get_parser_for_file("text/plain", "readme.txt")
|
||||
assert result is None
|
||||
|
||||
def test_all_parsers_decline_returns_none(self) -> None:
|
||||
"""
|
||||
GIVEN: Multiple parsers that all return None from score().
|
||||
WHEN: get_parser_for_file() is called.
|
||||
THEN: None is returned.
|
||||
"""
|
||||
|
||||
class AlwaysDeclines:
|
||||
name = "declines"
|
||||
version = "1.0"
|
||||
author = "A"
|
||||
url = "https://example.com"
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls):
|
||||
return {"text/plain": ".txt"}
|
||||
|
||||
@classmethod
|
||||
def score(cls, mime_type, filename, path=None):
|
||||
return None
|
||||
|
||||
registry = ParserRegistry()
|
||||
registry.register_builtin(AlwaysDeclines)
|
||||
registry._external.append(AlwaysDeclines)
|
||||
result = registry.get_parser_for_file("text/plain", "file.txt")
|
||||
assert result is None
|
||||
|
||||
def test_external_parser_beats_builtin_same_score(self) -> None:
|
||||
"""
|
||||
GIVEN: An external and a built-in parser both returning score 10.
|
||||
WHEN: get_parser_for_file() is called.
|
||||
THEN: The external parser wins because externals are evaluated first
|
||||
and the first-seen-wins policy applies at equal scores.
|
||||
"""
|
||||
|
||||
class BuiltinParser:
|
||||
name = "builtin"
|
||||
version = "1.0"
|
||||
author = "Core"
|
||||
url = "https://example.com/builtin"
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls):
|
||||
return {"text/plain": ".txt"}
|
||||
|
||||
@classmethod
|
||||
def score(cls, mime_type, filename, path=None):
|
||||
return 10
|
||||
|
||||
class ExternalParser:
|
||||
name = "external"
|
||||
version = "2.0"
|
||||
author = "Third Party"
|
||||
url = "https://example.com/external"
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls):
|
||||
return {"text/plain": ".txt"}
|
||||
|
||||
@classmethod
|
||||
def score(cls, mime_type, filename, path=None):
|
||||
return 10
|
||||
|
||||
registry = ParserRegistry()
|
||||
registry.register_builtin(BuiltinParser)
|
||||
registry._external.append(ExternalParser)
|
||||
result = registry.get_parser_for_file("text/plain", "file.txt")
|
||||
assert result is ExternalParser
|
||||
|
||||
def test_builtin_wins_when_external_declines(self) -> None:
|
||||
"""
|
||||
GIVEN: An external parser that declines (score None) and a built-in
|
||||
that returns score 5.
|
||||
WHEN: get_parser_for_file() is called.
|
||||
THEN: The built-in parser is returned.
|
||||
"""
|
||||
|
||||
class DecliningExternal:
|
||||
name = "declining-external"
|
||||
version = "1.0"
|
||||
author = "Third Party"
|
||||
url = "https://example.com/declining"
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls):
|
||||
return {"text/plain": ".txt"}
|
||||
|
||||
@classmethod
|
||||
def score(cls, mime_type, filename, path=None):
|
||||
return None
|
||||
|
||||
class AcceptingBuiltin:
|
||||
name = "accepting-builtin"
|
||||
version = "1.0"
|
||||
author = "Core"
|
||||
url = "https://example.com/accepting"
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls):
|
||||
return {"text/plain": ".txt"}
|
||||
|
||||
@classmethod
|
||||
def score(cls, mime_type, filename, path=None):
|
||||
return 5
|
||||
|
||||
registry = ParserRegistry()
|
||||
registry.register_builtin(AcceptingBuiltin)
|
||||
registry._external.append(DecliningExternal)
|
||||
result = registry.get_parser_for_file("text/plain", "file.txt")
|
||||
assert result is AcceptingBuiltin
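Collectively, these cases encode the selection policy: only parsers whose supported_mime_types() contain the file's MIME type are consulted, external parsers are asked before built-ins, a score of None means the parser declines, and the highest score wins with ties going to the first candidate seen. A sketch of that policy, not the actual registry code:

def pick_parser(registry, mime_type: str, filename: str):
    best_cls, best_score = None, None
    for cls in [*registry._external, *registry._builtins]:  # externals first
        if mime_type not in cls.supported_mime_types():
            continue
        score = cls.score(mime_type, filename)
        if score is None:  # parser declines this file
            continue
        if best_score is None or score > best_score:  # strict ">" keeps the first on ties
            best_cls, best_score = cls, score
    return best_cls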
|
||||
|
||||
|
||||
class TestDiscover:
|
||||
"""Verify entrypoint discovery in ParserRegistry.discover()."""
|
||||
|
||||
def test_discover_with_no_entrypoints(self) -> None:
|
||||
"""
|
||||
GIVEN: No entrypoints are registered under 'paperless_ngx.parsers'.
|
||||
WHEN: discover() is called.
|
||||
THEN: _external remains empty and no errors are raised.
|
||||
"""
|
||||
registry = ParserRegistry()
|
||||
|
||||
with patch(
|
||||
"paperless.parsers.registry.entry_points",
|
||||
return_value=[],
|
||||
):
|
||||
registry.discover()
|
||||
|
||||
assert registry._external == []
|
||||
|
||||
def test_discover_adds_valid_external_parser(self) -> None:
|
||||
"""
|
||||
GIVEN: One valid entrypoint whose loaded class has all required attrs.
|
||||
WHEN: discover() is called.
|
||||
THEN: The class is appended to _external.
|
||||
"""
|
||||
|
||||
class ValidExternal:
|
||||
name = "valid-external"
|
||||
version = "3.0.0"
|
||||
author = "Someone"
|
||||
url = "https://example.com/valid"
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls):
|
||||
return {"application/pdf": ".pdf"}
|
||||
|
||||
@classmethod
|
||||
def score(cls, mime_type, filename, path=None):
|
||||
return 5
|
||||
|
||||
mock_ep = MagicMock(spec=EntryPoint)
|
||||
mock_ep.name = "valid_external"
|
||||
mock_ep.load.return_value = ValidExternal
|
||||
|
||||
registry = ParserRegistry()
|
||||
|
||||
with patch(
|
||||
"paperless.parsers.registry.entry_points",
|
||||
return_value=[mock_ep],
|
||||
):
|
||||
registry.discover()
|
||||
|
||||
assert ValidExternal in registry._external
|
||||
|
||||
def test_discover_skips_entrypoint_with_load_error(
|
||||
self,
|
||||
caplog: pytest.LogCaptureFixture,
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN: An entrypoint whose load() method raises ImportError.
|
||||
WHEN: discover() is called.
|
||||
THEN: The entrypoint is skipped, an error is logged, and _external
|
||||
remains empty.
|
||||
"""
|
||||
mock_ep = MagicMock(spec=EntryPoint)
|
||||
mock_ep.name = "broken_ep"
|
||||
mock_ep.load.side_effect = ImportError("missing dependency")
|
||||
|
||||
registry = ParserRegistry()
|
||||
|
||||
with caplog.at_level(logging.ERROR, logger="paperless.parsers.registry"):
|
||||
with patch(
|
||||
"paperless.parsers.registry.entry_points",
|
||||
return_value=[mock_ep],
|
||||
):
|
||||
registry.discover()
|
||||
|
||||
assert registry._external == []
|
||||
assert any(
|
||||
"broken_ep" in record.message
|
||||
for record in caplog.records
|
||||
if record.levelno >= logging.ERROR
|
||||
)
|
||||
|
||||
def test_discover_skips_entrypoint_with_missing_attrs(
|
||||
self,
|
||||
caplog: pytest.LogCaptureFixture,
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN: A class loaded from an entrypoint that is missing the 'score'
|
||||
attribute.
|
||||
WHEN: discover() is called.
|
||||
THEN: The entrypoint is skipped, a warning is logged, and _external
|
||||
remains empty.
|
||||
"""
|
||||
|
||||
class MissingScore:
|
||||
name = "missing-score"
|
||||
version = "1.0"
|
||||
author = "Someone"
|
||||
url = "https://example.com"
|
||||
|
||||
# 'score' classmethod is intentionally absent.
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls):
|
||||
return {"text/plain": ".txt"}
|
||||
|
||||
mock_ep = MagicMock(spec=EntryPoint)
|
||||
mock_ep.name = "missing_score_ep"
|
||||
mock_ep.load.return_value = MissingScore
|
||||
|
||||
registry = ParserRegistry()
|
||||
|
||||
with caplog.at_level(logging.WARNING, logger="paperless.parsers.registry"):
|
||||
with patch(
|
||||
"paperless.parsers.registry.entry_points",
|
||||
return_value=[mock_ep],
|
||||
):
|
||||
registry.discover()
|
||||
|
||||
assert registry._external == []
|
||||
assert any(
|
||||
"missing_score_ep" in record.message
|
||||
for record in caplog.records
|
||||
if record.levelno >= logging.WARNING
|
||||
)
|
||||
|
||||
def test_discover_logs_loaded_parser_info(
|
||||
self,
|
||||
caplog: pytest.LogCaptureFixture,
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN: A valid entrypoint that loads successfully.
|
||||
WHEN: discover() is called.
|
||||
THEN: An INFO log message is emitted containing the parser name,
|
||||
version, author, and entrypoint name.
|
||||
"""
|
||||
|
||||
class LoggableParser:
|
||||
name = "loggable"
|
||||
version = "4.2.0"
|
||||
author = "Log Tester"
|
||||
url = "https://example.com/loggable"
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls):
|
||||
return {"image/png": ".png"}
|
||||
|
||||
@classmethod
|
||||
def score(cls, mime_type, filename, path=None):
|
||||
return 1
|
||||
|
||||
mock_ep = MagicMock(spec=EntryPoint)
|
||||
mock_ep.name = "loggable_ep"
|
||||
mock_ep.load.return_value = LoggableParser
|
||||
|
||||
registry = ParserRegistry()
|
||||
|
||||
with caplog.at_level(logging.INFO, logger="paperless.parsers.registry"):
|
||||
with patch(
|
||||
"paperless.parsers.registry.entry_points",
|
||||
return_value=[mock_ep],
|
||||
):
|
||||
registry.discover()
|
||||
|
||||
info_messages = " ".join(
|
||||
r.message for r in caplog.records if r.levelno == logging.INFO
|
||||
)
|
||||
assert "loggable" in info_messages
|
||||
assert "4.2.0" in info_messages
|
||||
assert "Log Tester" in info_messages
|
||||
assert "loggable_ep" in info_messages
|
||||
|
||||
|
||||
class TestLogSummary:
|
||||
"""Verify log output from ParserRegistry.log_summary()."""
|
||||
|
||||
def test_log_summary_with_no_external_parsers(
|
||||
self,
|
||||
dummy_parser_cls: type,
|
||||
caplog: pytest.LogCaptureFixture,
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN: A registry with one built-in parser and no external parsers.
|
||||
WHEN: log_summary() is called.
|
||||
THEN: The built-in parser name appears in the logs.
|
||||
"""
|
||||
registry = ParserRegistry()
|
||||
registry.register_builtin(dummy_parser_cls)
|
||||
|
||||
with caplog.at_level(logging.INFO, logger="paperless.parsers.registry"):
|
||||
registry.log_summary()
|
||||
|
||||
all_messages = " ".join(r.message for r in caplog.records)
|
||||
assert dummy_parser_cls.name in all_messages
|
||||
|
||||
def test_log_summary_with_external_parsers(
|
||||
self,
|
||||
caplog: pytest.LogCaptureFixture,
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN: A registry with one external parser registered.
|
||||
WHEN: log_summary() is called.
|
||||
THEN: The external parser name, version, author, and url appear in
|
||||
the log output.
|
||||
"""
|
||||
|
||||
class ExtParser:
|
||||
name = "ext-parser"
|
||||
version = "9.9.9"
|
||||
author = "Ext Corp"
|
||||
url = "https://ext.example.com"
|
||||
|
||||
@classmethod
|
||||
def supported_mime_types(cls):
|
||||
return {}
|
||||
|
||||
@classmethod
|
||||
def score(cls, mime_type, filename, path=None):
|
||||
return None
|
||||
|
||||
registry = ParserRegistry()
|
||||
registry._external.append(ExtParser)
|
||||
|
||||
with caplog.at_level(logging.INFO, logger="paperless.parsers.registry"):
|
||||
registry.log_summary()
|
||||
|
||||
all_messages = " ".join(r.message for r in caplog.records)
|
||||
assert "ext-parser" in all_messages
|
||||
assert "9.9.9" in all_messages
|
||||
assert "Ext Corp" in all_messages
|
||||
assert "https://ext.example.com" in all_messages
|
||||
|
||||
def test_log_summary_logs_no_third_party_message_when_none(
|
||||
self,
|
||||
caplog: pytest.LogCaptureFixture,
|
||||
) -> None:
|
||||
"""
|
||||
GIVEN: A registry with no external parsers.
|
||||
WHEN: log_summary() is called.
|
||||
THEN: A message containing 'No third-party parsers discovered.' is
|
||||
logged.
|
||||
"""
|
||||
registry = ParserRegistry()
|
||||
|
||||
with caplog.at_level(logging.INFO, logger="paperless.parsers.registry"):
|
||||
registry.log_summary()
|
||||
|
||||
all_messages = " ".join(r.message for r in caplog.records)
|
||||
assert "No third-party parsers discovered." in all_messages
|
||||
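For reference, discover() scans the 'paperless_ngx.parsers' entry-point group, so a third-party package would ship a class shaped like ValidExternal above and point an entry point at it. A minimal sketch of such a plugin class; the name, MIME type, and score are illustrative, and the packaging metadata that registers the entry point is not shown:

from pathlib import Path


class MyPdfParser:
    name = "my-pdf-parser"
    version = "1.0.0"
    author = "Example Author"
    url = "https://example.com/my-pdf-parser"

    @classmethod
    def supported_mime_types(cls) -> dict[str, str]:
        return {"application/pdf": ".pdf"}

    @classmethod
    def score(cls, mime_type: str, filename: str, path: Path | None = None) -> int | None:
        return 5 if mime_type == "application/pdf" else None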
482
src/paperless/tests/test_settings.py
Normal file
@@ -0,0 +1,482 @@
|
||||
import datetime
|
||||
import os
|
||||
from unittest import TestCase
|
||||
from unittest import mock
|
||||
|
||||
import pytest
|
||||
from celery.schedules import crontab
|
||||
|
||||
from paperless.settings import _parse_base_paths
|
||||
from paperless.settings import _parse_beat_schedule
|
||||
from paperless.settings import _parse_dateparser_languages
|
||||
from paperless.settings import _parse_ignore_dates
|
||||
from paperless.settings import _parse_paperless_url
|
||||
from paperless.settings import _parse_redis_url
|
||||
from paperless.settings import default_threads_per_worker
|
||||
|
||||
|
||||
class TestIgnoreDateParsing(TestCase):
|
||||
"""
|
||||
Tests the parsing of the PAPERLESS_IGNORE_DATES setting value
|
||||
"""
|
||||
|
||||
def _parse_checker(self, test_cases) -> None:
|
||||
"""
|
||||
Helper function to check ignore date parsing
|
||||
|
||||
        Args:
            test_cases: list of (env_str, date_format, expected_date_set) tuples
|
||||
"""
|
||||
for env_str, date_format, expected_date_set in test_cases:
|
||||
self.assertSetEqual(
|
||||
_parse_ignore_dates(env_str, date_format),
|
||||
expected_date_set,
|
||||
)
|
||||
|
||||
def test_no_ignore_dates_set(self) -> None:
|
||||
"""
|
||||
GIVEN:
|
||||
- No ignore dates are set
|
||||
THEN:
|
||||
- No ignore dates are parsed
|
||||
"""
|
||||
self.assertSetEqual(_parse_ignore_dates(""), set())
|
||||
|
||||
def test_single_ignore_dates_set(self) -> None:
|
||||
"""
|
||||
GIVEN:
|
||||
- Ignore dates are set per certain inputs
|
||||
THEN:
|
||||
- All ignore dates are parsed
|
||||
"""
|
||||
test_cases = [
|
||||
("1985-05-01", "YMD", {datetime.date(1985, 5, 1)}),
|
||||
(
|
||||
"1985-05-01,1991-12-05",
|
||||
"YMD",
|
||||
{datetime.date(1985, 5, 1), datetime.date(1991, 12, 5)},
|
||||
),
|
||||
("2010-12-13", "YMD", {datetime.date(2010, 12, 13)}),
|
||||
("11.01.10", "DMY", {datetime.date(2010, 1, 11)}),
|
||||
(
|
||||
"11.01.2001,15-06-1996",
|
||||
"DMY",
|
||||
{datetime.date(2001, 1, 11), datetime.date(1996, 6, 15)},
|
||||
),
|
||||
]
|
||||
|
||||
self._parse_checker(test_cases)
|
||||
|
||||
|
||||
class TestThreadCalculation(TestCase):
|
||||
def test_workers_threads(self) -> None:
|
||||
"""
|
||||
GIVEN:
|
||||
- Certain CPU counts
|
||||
WHEN:
|
||||
- Threads per worker is calculated
|
||||
THEN:
|
||||
- Threads per worker less than or equal to CPU count
|
||||
- At least 1 thread per worker
|
||||
"""
|
||||
default_workers = 1
|
||||
|
||||
for i in range(1, 64):
|
||||
with mock.patch(
|
||||
"paperless.settings.multiprocessing.cpu_count",
|
||||
) as cpu_count:
|
||||
cpu_count.return_value = i
|
||||
|
||||
default_threads = default_threads_per_worker(default_workers)
|
||||
|
||||
self.assertGreaterEqual(default_threads, 1)
|
||||
|
||||
self.assertLessEqual(default_workers * default_threads, i)
|
||||
|
||||
|
||||
class TestRedisSocketConversion(TestCase):
|
||||
def test_redis_socket_parsing(self) -> None:
|
||||
"""
|
||||
GIVEN:
|
||||
- Various Redis connection URI formats
|
||||
WHEN:
|
||||
- The URI is parsed
|
||||
THEN:
|
||||
- Socket based URIs are translated
|
||||
- Non-socket URIs are unchanged
|
||||
- None provided uses default
|
||||
"""
|
||||
|
||||
for input, expected in [
|
||||
# Nothing is set
|
||||
(None, ("redis://localhost:6379", "redis://localhost:6379")),
|
||||
# celery style
|
||||
(
|
||||
"redis+socket:///run/redis/redis.sock",
|
||||
(
|
||||
"redis+socket:///run/redis/redis.sock",
|
||||
"unix:///run/redis/redis.sock",
|
||||
),
|
||||
),
|
||||
# redis-py / channels-redis style
|
||||
(
|
||||
"unix:///run/redis/redis.sock",
|
||||
(
|
||||
"redis+socket:///run/redis/redis.sock",
|
||||
"unix:///run/redis/redis.sock",
|
||||
),
|
||||
),
|
||||
# celery style with db
|
||||
(
|
||||
"redis+socket:///run/redis/redis.sock?virtual_host=5",
|
||||
(
|
||||
"redis+socket:///run/redis/redis.sock?virtual_host=5",
|
||||
"unix:///run/redis/redis.sock?db=5",
|
||||
),
|
||||
),
|
||||
# redis-py / channels-redis style with db
|
||||
(
|
||||
"unix:///run/redis/redis.sock?db=10",
|
||||
(
|
||||
"redis+socket:///run/redis/redis.sock?virtual_host=10",
|
||||
"unix:///run/redis/redis.sock?db=10",
|
||||
),
|
||||
),
|
||||
# Just a host with a port
|
||||
(
|
||||
"redis://myredishost:6379",
|
||||
("redis://myredishost:6379", "redis://myredishost:6379"),
|
||||
),
|
||||
]:
|
||||
result = _parse_redis_url(input)
|
||||
self.assertTupleEqual(expected, result)
|
||||
|
||||
|
||||
class TestCeleryScheduleParsing(TestCase):
|
||||
MAIL_EXPIRE_TIME = 9.0 * 60.0
|
||||
CLASSIFIER_EXPIRE_TIME = 59.0 * 60.0
|
||||
INDEX_EXPIRE_TIME = 23.0 * 60.0 * 60.0
|
||||
SANITY_EXPIRE_TIME = ((7.0 * 24.0) - 1.0) * 60.0 * 60.0
|
||||
EMPTY_TRASH_EXPIRE_TIME = 23.0 * 60.0 * 60.0
|
||||
RUN_SCHEDULED_WORKFLOWS_EXPIRE_TIME = 59.0 * 60.0
|
||||
LLM_INDEX_EXPIRE_TIME = 23.0 * 60.0 * 60.0
|
||||
CLEANUP_EXPIRED_SHARE_BUNDLES_EXPIRE_TIME = 23.0 * 60.0 * 60.0
|
||||
|
||||
def test_schedule_configuration_default(self) -> None:
|
||||
"""
|
||||
GIVEN:
|
||||
- No configured task schedules
|
||||
WHEN:
|
||||
- The celery beat schedule is built
|
||||
THEN:
|
||||
- The default schedule is returned
|
||||
"""
|
||||
schedule = _parse_beat_schedule()
|
||||
|
||||
self.assertDictEqual(
|
||||
{
|
||||
"Check all e-mail accounts": {
|
||||
"task": "paperless_mail.tasks.process_mail_accounts",
|
||||
"schedule": crontab(minute="*/10"),
|
||||
"options": {"expires": self.MAIL_EXPIRE_TIME},
|
||||
},
|
||||
"Train the classifier": {
|
||||
"task": "documents.tasks.train_classifier",
|
||||
"schedule": crontab(minute="5", hour="*/1"),
|
||||
"options": {"expires": self.CLASSIFIER_EXPIRE_TIME},
|
||||
},
|
||||
"Optimize the index": {
|
||||
"task": "documents.tasks.index_optimize",
|
||||
"schedule": crontab(minute=0, hour=0),
|
||||
"options": {"expires": self.INDEX_EXPIRE_TIME},
|
||||
},
|
||||
"Perform sanity check": {
|
||||
"task": "documents.tasks.sanity_check",
|
||||
"schedule": crontab(minute=30, hour=0, day_of_week="sun"),
|
||||
"options": {"expires": self.SANITY_EXPIRE_TIME},
|
||||
},
|
||||
"Empty trash": {
|
||||
"task": "documents.tasks.empty_trash",
|
||||
"schedule": crontab(minute=0, hour="1"),
|
||||
"options": {"expires": self.EMPTY_TRASH_EXPIRE_TIME},
|
||||
},
|
||||
"Check and run scheduled workflows": {
|
||||
"task": "documents.tasks.check_scheduled_workflows",
|
||||
"schedule": crontab(minute="5", hour="*/1"),
|
||||
"options": {"expires": self.RUN_SCHEDULED_WORKFLOWS_EXPIRE_TIME},
|
||||
},
|
||||
"Rebuild LLM index": {
|
||||
"task": "documents.tasks.llmindex_index",
|
||||
"schedule": crontab(minute=10, hour=2),
|
||||
"options": {
|
||||
"expires": self.LLM_INDEX_EXPIRE_TIME,
|
||||
},
|
||||
},
|
||||
"Cleanup expired share link bundles": {
|
||||
"task": "documents.tasks.cleanup_expired_share_link_bundles",
|
||||
"schedule": crontab(minute=0, hour=2),
|
||||
"options": {
|
||||
"expires": self.CLEANUP_EXPIRED_SHARE_BUNDLES_EXPIRE_TIME,
|
||||
},
|
||||
},
|
||||
},
|
||||
schedule,
|
||||
)
|
||||
|
||||
def test_schedule_configuration_changed(self) -> None:
|
||||
"""
|
||||
GIVEN:
|
||||
- Email task is configured non-default
|
||||
WHEN:
|
||||
- The celery beat schedule is built
|
||||
THEN:
|
||||
- The email task is configured per environment
|
||||
- The default schedule is returned for other tasks
|
||||
"""
|
||||
with mock.patch.dict(
|
||||
os.environ,
|
||||
{"PAPERLESS_EMAIL_TASK_CRON": "*/50 * * * mon"},
|
||||
):
|
||||
schedule = _parse_beat_schedule()
|
||||
|
||||
self.assertDictEqual(
|
||||
{
|
||||
"Check all e-mail accounts": {
|
||||
"task": "paperless_mail.tasks.process_mail_accounts",
|
||||
"schedule": crontab(minute="*/50", day_of_week="mon"),
|
||||
"options": {"expires": self.MAIL_EXPIRE_TIME},
|
||||
},
|
||||
"Train the classifier": {
|
||||
"task": "documents.tasks.train_classifier",
|
||||
"schedule": crontab(minute="5", hour="*/1"),
|
||||
"options": {"expires": self.CLASSIFIER_EXPIRE_TIME},
|
||||
},
|
||||
"Optimize the index": {
|
||||
"task": "documents.tasks.index_optimize",
|
||||
"schedule": crontab(minute=0, hour=0),
|
||||
"options": {"expires": self.INDEX_EXPIRE_TIME},
|
||||
},
|
||||
"Perform sanity check": {
|
||||
"task": "documents.tasks.sanity_check",
|
||||
"schedule": crontab(minute=30, hour=0, day_of_week="sun"),
|
||||
"options": {"expires": self.SANITY_EXPIRE_TIME},
|
||||
},
|
||||
"Empty trash": {
|
||||
"task": "documents.tasks.empty_trash",
|
||||
"schedule": crontab(minute=0, hour="1"),
|
||||
"options": {"expires": self.EMPTY_TRASH_EXPIRE_TIME},
|
||||
},
|
||||
"Check and run scheduled workflows": {
|
||||
"task": "documents.tasks.check_scheduled_workflows",
|
||||
"schedule": crontab(minute="5", hour="*/1"),
|
||||
"options": {"expires": self.RUN_SCHEDULED_WORKFLOWS_EXPIRE_TIME},
|
||||
},
|
||||
"Rebuild LLM index": {
|
||||
"task": "documents.tasks.llmindex_index",
|
||||
"schedule": crontab(minute=10, hour=2),
|
||||
"options": {
|
||||
"expires": self.LLM_INDEX_EXPIRE_TIME,
|
||||
},
|
||||
},
|
||||
"Cleanup expired share link bundles": {
|
||||
"task": "documents.tasks.cleanup_expired_share_link_bundles",
|
||||
"schedule": crontab(minute=0, hour=2),
|
||||
"options": {
|
||||
"expires": self.CLEANUP_EXPIRED_SHARE_BUNDLES_EXPIRE_TIME,
|
||||
},
|
||||
},
|
||||
},
|
||||
schedule,
|
||||
)
|
||||
|
||||
def test_schedule_configuration_disabled(self) -> None:
|
||||
"""
|
||||
GIVEN:
|
||||
- Search index task is disabled
|
||||
WHEN:
|
||||
- The celery beat schedule is built
|
||||
THEN:
|
||||
- The search index task is not present
|
||||
- The default schedule is returned for other tasks
|
||||
"""
|
||||
with mock.patch.dict(os.environ, {"PAPERLESS_INDEX_TASK_CRON": "disable"}):
|
||||
schedule = _parse_beat_schedule()
|
||||
|
||||
self.assertDictEqual(
|
||||
{
|
||||
"Check all e-mail accounts": {
|
||||
"task": "paperless_mail.tasks.process_mail_accounts",
|
||||
"schedule": crontab(minute="*/10"),
|
||||
"options": {"expires": self.MAIL_EXPIRE_TIME},
|
||||
},
|
||||
"Train the classifier": {
|
||||
"task": "documents.tasks.train_classifier",
|
||||
"schedule": crontab(minute="5", hour="*/1"),
|
||||
"options": {"expires": self.CLASSIFIER_EXPIRE_TIME},
|
||||
},
|
||||
"Perform sanity check": {
|
||||
"task": "documents.tasks.sanity_check",
|
||||
"schedule": crontab(minute=30, hour=0, day_of_week="sun"),
|
||||
"options": {"expires": self.SANITY_EXPIRE_TIME},
|
||||
},
|
||||
"Empty trash": {
|
||||
"task": "documents.tasks.empty_trash",
|
||||
"schedule": crontab(minute=0, hour="1"),
|
||||
"options": {"expires": self.EMPTY_TRASH_EXPIRE_TIME},
|
||||
},
|
||||
"Check and run scheduled workflows": {
|
||||
"task": "documents.tasks.check_scheduled_workflows",
|
||||
"schedule": crontab(minute="5", hour="*/1"),
|
||||
"options": {"expires": self.RUN_SCHEDULED_WORKFLOWS_EXPIRE_TIME},
|
||||
},
|
||||
"Rebuild LLM index": {
|
||||
"task": "documents.tasks.llmindex_index",
|
||||
"schedule": crontab(minute=10, hour=2),
|
||||
"options": {
|
||||
"expires": self.LLM_INDEX_EXPIRE_TIME,
|
||||
},
|
||||
},
|
||||
"Cleanup expired share link bundles": {
|
||||
"task": "documents.tasks.cleanup_expired_share_link_bundles",
|
||||
"schedule": crontab(minute=0, hour=2),
|
||||
"options": {
|
||||
"expires": self.CLEANUP_EXPIRED_SHARE_BUNDLES_EXPIRE_TIME,
|
||||
},
|
||||
},
|
||||
},
|
||||
schedule,
|
||||
)
|
||||
|
||||
    def test_schedule_configuration_disabled_all(self) -> None:
        """
        GIVEN:
            - All tasks are disabled
        WHEN:
            - The celery beat schedule is built
        THEN:
            - No tasks are scheduled
        """
        with mock.patch.dict(
            os.environ,
            {
                "PAPERLESS_EMAIL_TASK_CRON": "disable",
                "PAPERLESS_TRAIN_TASK_CRON": "disable",
                "PAPERLESS_SANITY_TASK_CRON": "disable",
                "PAPERLESS_INDEX_TASK_CRON": "disable",
                "PAPERLESS_EMPTY_TRASH_TASK_CRON": "disable",
                "PAPERLESS_WORKFLOW_SCHEDULED_TASK_CRON": "disable",
                "PAPERLESS_LLM_INDEX_TASK_CRON": "disable",
                "PAPERLESS_SHARE_LINK_BUNDLE_CLEANUP_CRON": "disable",
            },
        ):
            schedule = _parse_beat_schedule()

        self.assertDictEqual(
            {},
            schedule,
        )


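# --- Illustrative sketch (not part of the real settings module) -------------
# The schedule tests above assert that setting a task's cron environment
# variable to the literal string "disable" removes that entry from the celery
# beat schedule entirely. The helper below is a minimal, hypothetical sketch of
# that pattern; the function name, task table and cron defaults are assumptions
# for illustration and do not mirror the actual _parse_beat_schedule code.
def _sketch_parse_beat_schedule() -> dict:
    import os

    from celery.schedules import crontab

    tasks = {
        "Train the classifier": (
            "PAPERLESS_TRAIN_TASK_CRON",
            "5 */1 * * *",
            "documents.tasks.train_classifier",
        ),
        "Perform sanity check": (
            "PAPERLESS_SANITY_TASK_CRON",
            "30 0 * * sun",
            "documents.tasks.sanity_check",
        ),
    }
    schedule: dict = {}
    for name, (env_var, default_cron, task_path) in tasks.items():
        cron_expr = os.getenv(env_var, default_cron)
        if cron_expr.strip().lower() == "disable":
            # Disabled tasks are simply omitted, which is why disabling every
            # task yields an empty schedule in the test above.
            continue
        minute, hour, day_of_month, month_of_year, day_of_week = cron_expr.split()
        schedule[name] = {
            "task": task_path,
            "schedule": crontab(
                minute=minute,
                hour=hour,
                day_of_month=day_of_month,
                month_of_year=month_of_year,
                day_of_week=day_of_week,
            ),
        }
    return schedule

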
class TestPaperlessURLSettings(TestCase):
    def test_paperless_url(self) -> None:
        """
        GIVEN:
            - PAPERLESS_URL is set
        WHEN:
            - The URL is parsed
        THEN:
            - The URL is returned and present in related settings
        """
        with mock.patch.dict(
            os.environ,
            {
                "PAPERLESS_URL": "https://example.com",
            },
        ):
            url = _parse_paperless_url()
            self.assertEqual("https://example.com", url)
            from django.conf import settings

            self.assertIn(url, settings.CSRF_TRUSTED_ORIGINS)
            self.assertIn(url, settings.CORS_ALLOWED_ORIGINS)


class TestPathSettings(TestCase):
    def test_default_paths(self) -> None:
        """
        GIVEN:
            - PAPERLESS_FORCE_SCRIPT_NAME is not set
        WHEN:
            - Settings are parsed
        THEN:
            - Paths are as expected
        """
        base_paths = _parse_base_paths()
        self.assertEqual(None, base_paths[0])  # FORCE_SCRIPT_NAME
        self.assertEqual("/", base_paths[1])  # BASE_URL
        self.assertEqual("/accounts/login/", base_paths[2])  # LOGIN_URL
        self.assertEqual("/dashboard", base_paths[3])  # LOGIN_REDIRECT_URL
        self.assertEqual(
            "/accounts/login/?loggedout=1",
            base_paths[4],
        )  # LOGOUT_REDIRECT_URL

    @mock.patch("os.environ", {"PAPERLESS_FORCE_SCRIPT_NAME": "/paperless"})
    def test_subpath(self) -> None:
        """
        GIVEN:
            - PAPERLESS_FORCE_SCRIPT_NAME is set
        WHEN:
            - Settings are parsed
        THEN:
            - The path is returned and present in related settings
        """
        base_paths = _parse_base_paths()
        self.assertEqual("/paperless", base_paths[0])  # FORCE_SCRIPT_NAME
        self.assertEqual("/paperless/", base_paths[1])  # BASE_URL
        self.assertEqual("/paperless/accounts/login/", base_paths[2])  # LOGIN_URL
        self.assertEqual("/paperless/dashboard", base_paths[3])  # LOGIN_REDIRECT_URL
        self.assertEqual(
            "/paperless/accounts/login/?loggedout=1",
            base_paths[4],
        )  # LOGOUT_REDIRECT_URL

    @mock.patch(
        "os.environ",
        {
            "PAPERLESS_FORCE_SCRIPT_NAME": "/paperless",
            "PAPERLESS_LOGOUT_REDIRECT_URL": "/foobar/",
        },
    )
    def test_subpath_with_explicit_logout_url(self) -> None:
        """
        GIVEN:
            - PAPERLESS_FORCE_SCRIPT_NAME is set and so is PAPERLESS_LOGOUT_REDIRECT_URL
        WHEN:
            - Settings are parsed
        THEN:
            - The correct logout redirect URL is returned
        """
        base_paths = _parse_base_paths()
        self.assertEqual("/paperless/", base_paths[1])  # BASE_URL
        self.assertEqual("/foobar/", base_paths[4])  # LOGOUT_REDIRECT_URL


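# --- Illustrative sketch (not the project's _parse_base_paths) --------------
# The TestPathSettings cases above pin down how the base URL, login URL,
# dashboard redirect and logout redirect are derived from
# PAPERLESS_FORCE_SCRIPT_NAME, and how an explicit PAPERLESS_LOGOUT_REDIRECT_URL
# overrides the derived default. This is a minimal, assumed reconstruction of
# that derivation based only on the assertions above.
def _sketch_parse_base_paths() -> tuple:
    import os

    script_name = os.getenv("PAPERLESS_FORCE_SCRIPT_NAME") or None
    base_url = (script_name or "") + "/"  # "/" or "/paperless/"
    login_url = base_url + "accounts/login/"
    login_redirect_url = base_url + "dashboard"
    # An explicit logout redirect URL always wins over the derived default.
    logout_redirect_url = os.getenv(
        "PAPERLESS_LOGOUT_REDIRECT_URL",
        login_url + "?loggedout=1",
    )
    return (script_name, base_url, login_url, login_redirect_url, logout_redirect_url)

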
@pytest.mark.parametrize(
    ("languages", "expected"),
    [
        ("de", ["de"]),
        ("zh", ["zh"]),
        ("fr+en", ["fr", "en"]),
        # Locales must be supported
        ("en-001+fr-CA", ["en-001", "fr-CA"]),
        ("en-001+fr", ["en-001", "fr"]),
        # Special case for Chinese: variants seem to miss some dates,
        # so we always add "zh" as a fallback.
        ("en+zh-Hans-HK", ["en", "zh-Hans-HK", "zh"]),
        ("en+zh-Hans", ["en", "zh-Hans", "zh"]),
        ("en+zh-Hans+zh-Hant", ["en", "zh-Hans", "zh-Hant", "zh"]),
    ],
)
def test_parser_date_parser_languages(languages, expected) -> None:
    assert sorted(_parse_dateparser_languages(languages)) == sorted(expected)

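# --- Illustrative sketch (not the project's _parse_dateparser_languages) ----
# The parametrized cases above encode one non-obvious rule: whenever a regional
# Chinese variant (zh-Hans, zh-Hant, zh-Hans-HK, ...) is requested, a plain
# "zh" fallback is appended, because the variants alone miss some date formats.
# A minimal, assumed reconstruction of that rule:
def _sketch_parse_dateparser_languages(value: str) -> list[str]:
    languages = [lang for lang in value.split("+") if lang]
    if any(lang.startswith("zh-") for lang in languages) and "zh" not in languages:
        languages.append("zh")
    return languages


# Example: _sketch_parse_dateparser_languages("en+zh-Hans") -> ["en", "zh-Hans", "zh"]
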
@@ -1,50 +0,0 @@
from pathlib import Path

from django.conf import settings
from PIL import Image
from PIL import ImageDraw
from PIL import ImageFont

from documents.parsers import DocumentParser


class TextDocumentParser(DocumentParser):
    """
    This parser directly parses a text document (.txt, .md, or .csv)
    """

    logging_name = "paperless.parsing.text"

    def get_thumbnail(self, document_path: Path, mime_type, file_name=None) -> Path:
        # Avoid reading entire file into memory
        max_chars = 100_000
        file_size_limit = 50 * 1024 * 1024

        if document_path.stat().st_size > file_size_limit:
            text = "[File too large to preview]"
        else:
            with Path(document_path).open("r", encoding="utf-8", errors="replace") as f:
                text = f.read(max_chars)

        img = Image.new("RGB", (500, 700), color="white")
        draw = ImageDraw.Draw(img)
        font = ImageFont.truetype(
            font=settings.THUMBNAIL_FONT_NAME,
            size=20,
            layout_engine=ImageFont.Layout.BASIC,
        )
        draw.multiline_text((5, 5), text, font=font, fill="black", spacing=4)

        out_path = self.tempdir / "thumb.webp"
        img.save(out_path, format="WEBP")

        return out_path

    def parse(self, document_path, mime_type, file_name=None) -> None:
        self.text = self.read_file_handle_unicode_errors(document_path)

    def get_settings(self) -> None:
        """
        This parser does not implement additional settings yet
        """
        return None