Feat: refactor TextDocumentParser to ParserProtocol

Starting from the moved paperless_text/parsers.py, rewrite the class to satisfy ParserProtocol without inheriting from the old DocumentParser base:

- Add class-level identity attributes (name, version, author, url)
- Add supported_mime_types() and score() classmethods
- Add can_produce_archive and requires_pdf_rendition properties (both False)
- Replace tempdir / read_file_handle_unicode_errors from old base class with a self-contained __init__, __enter__, __exit__, and _read_text helper
- Drop file_name parameter from parse() and get_thumbnail(); add produce_archive kwarg
- Add extract_metadata() returning [] (plain text has no structured metadata)
- Remove get_settings() (not part of ParserProtocol)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
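For readers unfamiliar with structural typing, the shape described in the commit message above can be sketched roughly as follows. This is an illustration only, not the code from this change: `ParserProtocolSketch` and `TextParserSketch` are hypothetical names, and the return types, the `score()` arguments, the score value, and the identity strings are assumptions. `get_thumbnail()` is left out for brevity.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class ParserProtocolSketch(Protocol):
    """Hypothetical stand-in; the real ParserProtocol may differ."""

    name: str
    version: str
    author: str
    url: str

    @classmethod
    def supported_mime_types(cls) -> dict[str, str]: ...

    @classmethod
    def score(cls, mime_type: str, filename: str) -> int | None: ...

    @property
    def can_produce_archive(self) -> bool: ...

    @property
    def requires_pdf_rendition(self) -> bool: ...

    def parse(
        self, document_path: Path, mime_type: str, *, produce_archive: bool = True
    ) -> None: ...

    def extract_metadata(self, document_path: Path, mime_type: str) -> list: ...


class TextParserSketch:
    """Satisfies the protocol structurally; no base class is inherited."""

    # Class-level identity attributes (placeholder values).
    name = "Text parser (sketch)"
    version = "0.0.0"
    author = "example"
    url = "https://example.invalid"

    @classmethod
    def supported_mime_types(cls) -> dict[str, str]:
        return {"text/plain": ".txt", "text/csv": ".csv"}

    @classmethod
    def score(cls, mime_type: str, filename: str) -> int | None:
        # Assumed convention: None means "cannot handle this type".
        return 10 if mime_type in cls.supported_mime_types() else None

    @property
    def can_produce_archive(self) -> bool:
        return False

    @property
    def requires_pdf_rendition(self) -> bool:
        return False

    def __init__(self) -> None:
        # Self-contained scratch directory instead of one from a base class.
        self._tempdir = Path(tempfile.mkdtemp(prefix="text-parser-"))
        self.text: str | None = None

    def __enter__(self) -> TextParserSketch:
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        shutil.rmtree(self._tempdir, ignore_errors=True)

    def _read_text(self, path: Path) -> str:
        # Fall back to replacement characters on invalid UTF-8.
        try:
            return path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            return path.read_bytes().decode("utf-8", errors="replace")

    def parse(
        self, document_path: Path, mime_type: str, *, produce_archive: bool = True
    ) -> None:
        self.text = self._read_text(document_path)

    def extract_metadata(self, document_path: Path, mime_type: str) -> list:
        return []  # plain text has no structured metadata
```

Used as a context manager (`with TextParserSketch() as parser: parser.parse(path, "text/plain")`), the scratch directory is removed on exit, and `isinstance(parser, ParserProtocolSketch)` holds without any inheritance.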
Chore: move paperless_text/parsers.py to paperless/parsers/text.py
2026-03-10 03:01:23 +00:00 · 2026-03-09 16:54:52 -07:00 · 2026-03-09 16:31:00 -07:00 · 2026-03-09 16:07:10 -07:00 · 2026-03-09 15:40:28 -07:00 · 2026-03-09 15:30:09 -07:00
55 changed files with 3639 additions and 721 deletions
--- a/.github/workflows/ci-backend.yml
+++ b/.github/workflows/ci-backend.yml
@@ -3,21 +3,9 @@ on:
  push:
    branches-ignore:
      - 'translations**'
-    paths:
-      - 'src/**'
-      - 'pyproject.toml'
-      - 'uv.lock'
-      - 'docker/compose/docker-compose.ci-test.yml'
-      - '.github/workflows/ci-backend.yml'
  pull_request:
    branches-ignore:
      - 'translations**'
-    paths:
-      - 'src/**'
-      - 'pyproject.toml'
-      - 'uv.lock'
-      - 'docker/compose/docker-compose.ci-test.yml'
-      - '.github/workflows/ci-backend.yml'
  workflow_dispatch:
 concurrency:
  group: backend-${{ github.event.pull_request.number || github.ref }}
@@ -26,7 +14,55 @@ env:
  DEFAULT_UV_VERSION: "0.10.x"
  NLTK_DATA: "/usr/share/nltk_data"
 jobs:
+  changes:
+    name: Detect Backend Changes
+    runs-on: ubuntu-slim
+    outputs:
+      backend_changed: ${{ steps.force.outputs.run_all == 'true' || steps.filter.outputs.backend == 'true' }}
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6.0.2
+        with:
+          fetch-depth: 0
+      - name: Decide run mode
+        id: force
+        run: |
+          if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
+            echo "run_all=true" >> "$GITHUB_OUTPUT"
+          elif [[ "${{ github.event_name }}" == "push" && ( "${{ github.ref_name }}" == "main" || "${{ github.ref_name }}" == "dev" ) ]]; then
+            echo "run_all=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "run_all=false" >> "$GITHUB_OUTPUT"
+          fi
+      - name: Set diff range
+        id: range
+        if: steps.force.outputs.run_all != 'true'
+        run: |
+          if [[ "${{ github.event_name }}" == "pull_request" ]]; then
+            echo "base=${{ github.event.pull_request.base.sha }}" >> "$GITHUB_OUTPUT"
+          elif [[ "${{ github.event.created }}" == "true" ]]; then
+            echo "base=origin/${{ github.event.repository.default_branch }}" >> "$GITHUB_OUTPUT"
+          else
+            echo "base=${{ github.event.before }}" >> "$GITHUB_OUTPUT"
+          fi
+          echo "ref=${{ github.sha }}" >> "$GITHUB_OUTPUT"
+      - name: Detect changes
+        id: filter
+        if: steps.force.outputs.run_all != 'true'
+        uses: dorny/paths-filter@v3.0.2
+        with:
+          base: ${{ steps.range.outputs.base }}
+          ref: ${{ steps.range.outputs.ref }}
+          filters: |
+            backend:
+              - 'src/**'
+              - 'pyproject.toml'
+              - 'uv.lock'
+              - 'docker/compose/docker-compose.ci-test.yml'
+              - '.github/workflows/ci-backend.yml'
  test:
+    needs: changes
+    if: needs.changes.outputs.backend_changed == 'true'
    name: "Python ${{ matrix.python-version }}"
    runs-on: ubuntu-24.04
    strategy:
@@ -100,6 +136,8 @@ jobs:
          docker compose --file docker/compose/docker-compose.ci-test.yml logs
          docker compose --file docker/compose/docker-compose.ci-test.yml down
  typing:
+    needs: changes
+    if: needs.changes.outputs.backend_changed == 'true'
    name: Check project typing
    runs-on: ubuntu-24.04
    env:
@@ -150,3 +188,27 @@ jobs:
            --show-error-codes \
            --warn-unused-configs \
            src/ | uv run mypy-baseline filter
+  gate:
+    name: Backend CI Gate
+    needs: [changes, test, typing]
+    if: always()
+    runs-on: ubuntu-slim
+    steps:
+      - name: Check gate
+        run: |
+          if [[ "${{ needs.changes.outputs.backend_changed }}" != "true" ]]; then
+            echo "No backend-relevant changes detected."
+            exit 0
+          fi
+
+          if [[ "${{ needs.test.result }}" != "success" ]]; then
+            echo "::error::Backend test job result: ${{ needs.test.result }}"
+            exit 1
+          fi
+
+          if [[ "${{ needs.typing.result }}" != "success" ]]; then
+            echo "::error::Backend typing job result: ${{ needs.typing.result }}"
+            exit 1
+          fi
+
+          echo "Backend checks passed."
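Reviewer note: the decision made by the `gate` job above can be restated as a small function. This is a sketch of the logic only, not part of the change; `gate_passes` is a hypothetical name, and the result strings are the job `result` values the workflow compares against.

```python
def gate_passes(changed: bool, results: dict[str, str]) -> bool:
    """Mirror the gate step: pass when nothing relevant changed,
    otherwise require every needed job to report "success"."""
    if not changed:
        # No relevant changes: upstream jobs were skipped on purpose.
        return True
    # Any result other than "success" (for example "failure",
    # "cancelled", or "skipped") fails the gate.
    return all(result == "success" for result in results.values())
```

Because the gate job runs with `if: always()`, it still reports a status when the upstream jobs are skipped, which is the case the first branch covers.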
--- a/.github/workflows/ci-docker.yml
+++ b/.github/workflows/ci-docker.yml
@@ -149,15 +149,16 @@ jobs:
          mkdir -p /tmp/digests
          digest="${{ steps.build.outputs.digest }}"
          echo "digest=${digest}"
-          touch "/tmp/digests/${digest#sha256:}"
+          echo "${digest}" > "/tmp/digests/digest-${{ matrix.arch }}.txt"
      - name: Upload digest
        if: steps.check-push.outputs.should-push == 'true'
        uses: actions/upload-artifact@v7.0.0
        with:
          name: digests-${{ matrix.arch }}
-          path: /tmp/digests/*
+          path: /tmp/digests/digest-${{ matrix.arch }}.txt
          if-no-files-found: error
          retention-days: 1
+          archive: false
  merge-and-push:
    name: Merge and Push Manifest
    runs-on: ubuntu-24.04
@@ -171,7 +172,7 @@ jobs:
        uses: actions/download-artifact@v8.0.0
        with:
          path: /tmp/digests
-          pattern: digests-*
+          pattern: digest-*.txt
          merge-multiple: true
      - name: List digests
        run: |
@@ -217,8 +218,9 @@ jobs:
          tags=$(jq -cr '.tags | map("-t " + .) | join(" ")' <<< "${DOCKER_METADATA_OUTPUT_JSON}")

          digests=""
-          for digest in *; do
-            digests+="${{ env.REGISTRY }}/${REPOSITORY}@sha256:${digest} "
+          for digest_file in digest-*.txt; do
+            digest=$(cat "${digest_file}")
+            digests+="${{ env.REGISTRY }}/${REPOSITORY}@${digest} "
          done

          echo "Creating manifest with tags: ${tags}"
--- a/.github/workflows/ci-docs.yml
+++ b/.github/workflows/ci-docs.yml
@@ -1,22 +1,9 @@
 name: Documentation
 on:
  push:
-    branches:
-      - main
-      - dev
-    paths:
-      - 'docs/**'
-      - 'zensical.toml'
-      - 'pyproject.toml'
-      - 'uv.lock'
-      - '.github/workflows/ci-docs.yml'
+    branches-ignore:
+      - 'translations**'
  pull_request:
-    paths:
-      - 'docs/**'
-      - 'zensical.toml'
-      - 'pyproject.toml'
-      - 'uv.lock'
-      - '.github/workflows/ci-docs.yml'
  workflow_dispatch:
 concurrency:
  group: docs-${{ github.event.pull_request.number || github.ref }}
@@ -29,7 +16,55 @@ env:
  DEFAULT_UV_VERSION: "0.10.x"
  DEFAULT_PYTHON_VERSION: "3.12"
 jobs:
+  changes:
+    name: Detect Docs Changes
+    runs-on: ubuntu-slim
+    outputs:
+      docs_changed: ${{ steps.force.outputs.run_all == 'true' || steps.filter.outputs.docs == 'true' }}
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6.0.2
+        with:
+          fetch-depth: 0
+      - name: Decide run mode
+        id: force
+        run: |
+          if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
+            echo "run_all=true" >> "$GITHUB_OUTPUT"
+          elif [[ "${{ github.event_name }}" == "push" && ( "${{ github.ref_name }}" == "main" || "${{ github.ref_name }}" == "dev" ) ]]; then
+            echo "run_all=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "run_all=false" >> "$GITHUB_OUTPUT"
+          fi
+      - name: Set diff range
+        id: range
+        if: steps.force.outputs.run_all != 'true'
+        run: |
+          if [[ "${{ github.event_name }}" == "pull_request" ]]; then
+            echo "base=${{ github.event.pull_request.base.sha }}" >> "$GITHUB_OUTPUT"
+          elif [[ "${{ github.event.created }}" == "true" ]]; then
+            echo "base=origin/${{ github.event.repository.default_branch }}" >> "$GITHUB_OUTPUT"
+          else
+            echo "base=${{ github.event.before }}" >> "$GITHUB_OUTPUT"
+          fi
+          echo "ref=${{ github.sha }}" >> "$GITHUB_OUTPUT"
+      - name: Detect changes
+        id: filter
+        if: steps.force.outputs.run_all != 'true'
+        uses: dorny/paths-filter@v3.0.2
+        with:
+          base: ${{ steps.range.outputs.base }}
+          ref: ${{ steps.range.outputs.ref }}
+          filters: |
+            docs:
+              - 'docs/**'
+              - 'zensical.toml'
+              - 'pyproject.toml'
+              - 'uv.lock'
+              - '.github/workflows/ci-docs.yml'
  build:
+    needs: changes
+    if: needs.changes.outputs.docs_changed == 'true'
    name: Build Documentation
    runs-on: ubuntu-24.04
    steps:
@@ -64,8 +99,8 @@ jobs:
          name: github-pages-${{ github.run_id }}-${{ github.run_attempt }}
  deploy:
    name: Deploy Documentation
-    needs: build
-    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+    needs: [changes, build]
+    if: github.event_name == 'push' && github.ref == 'refs/heads/main' && needs.changes.outputs.docs_changed == 'true'
    runs-on: ubuntu-24.04
    environment:
      name: github-pages
@@ -76,3 +111,22 @@ jobs:
        id: deployment
        with:
          artifact_name: github-pages-${{ github.run_id }}-${{ github.run_attempt }}
+  gate:
+    name: Docs CI Gate
+    needs: [changes, build]
+    if: always()
+    runs-on: ubuntu-slim
+    steps:
+      - name: Check gate
+        run: |
+          if [[ "${{ needs.changes.outputs.docs_changed }}" != "true" ]]; then
+            echo "No docs-relevant changes detected."
+            exit 0
+          fi
+
+          if [[ "${{ needs.build.result }}" != "success" ]]; then
+            echo "::error::Docs build job result: ${{ needs.build.result }}"
+            exit 1
+          fi
+
+          echo "Docs checks passed."
--- a/.github/workflows/ci-frontend.yml
+++ b/.github/workflows/ci-frontend.yml
@@ -3,21 +3,60 @@ on:
  push:
    branches-ignore:
      - 'translations**'
-    paths:
-      - 'src-ui/**'
-      - '.github/workflows/ci-frontend.yml'
  pull_request:
    branches-ignore:
      - 'translations**'
-    paths:
-      - 'src-ui/**'
-      - '.github/workflows/ci-frontend.yml'
  workflow_dispatch:
 concurrency:
  group: frontend-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
 jobs:
+  changes:
+    name: Detect Frontend Changes
+    runs-on: ubuntu-slim
+    outputs:
+      frontend_changed: ${{ steps.force.outputs.run_all == 'true' || steps.filter.outputs.frontend == 'true' }}
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6.0.2
+        with:
+          fetch-depth: 0
+      - name: Decide run mode
+        id: force
+        run: |
+          if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
+            echo "run_all=true" >> "$GITHUB_OUTPUT"
+          elif [[ "${{ github.event_name }}" == "push" && ( "${{ github.ref_name }}" == "main" || "${{ github.ref_name }}" == "dev" ) ]]; then
+            echo "run_all=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "run_all=false" >> "$GITHUB_OUTPUT"
+          fi
+      - name: Set diff range
+        id: range
+        if: steps.force.outputs.run_all != 'true'
+        run: |
+          if [[ "${{ github.event_name }}" == "pull_request" ]]; then
+            echo "base=${{ github.event.pull_request.base.sha }}" >> "$GITHUB_OUTPUT"
+          elif [[ "${{ github.event.created }}" == "true" ]]; then
+            echo "base=origin/${{ github.event.repository.default_branch }}" >> "$GITHUB_OUTPUT"
+          else
+            echo "base=${{ github.event.before }}" >> "$GITHUB_OUTPUT"
+          fi
+          echo "ref=${{ github.sha }}" >> "$GITHUB_OUTPUT"
+      - name: Detect changes
+        id: filter
+        if: steps.force.outputs.run_all != 'true'
+        uses: dorny/paths-filter@v3.0.2
+        with:
+          base: ${{ steps.range.outputs.base }}
+          ref: ${{ steps.range.outputs.ref }}
+          filters: |
+            frontend:
+              - 'src-ui/**'
+              - '.github/workflows/ci-frontend.yml'
  install-dependencies:
+    needs: changes
+    if: needs.changes.outputs.frontend_changed == 'true'
    name: Install Dependencies
    runs-on: ubuntu-24.04
    steps:
@@ -45,7 +84,8 @@ jobs:
        run: cd src-ui && pnpm install
  lint:
    name: Lint
-    needs: install-dependencies
+    needs: [changes, install-dependencies]
+    if: needs.changes.outputs.frontend_changed == 'true'
    runs-on: ubuntu-24.04
    steps:
      - name: Checkout
@@ -73,7 +113,8 @@ jobs:
        run: cd src-ui && pnpm run lint
  unit-tests:
    name: "Unit Tests (${{ matrix.shard-index }}/${{ matrix.shard-count }})"
-    needs: install-dependencies
+    needs: [changes, install-dependencies]
+    if: needs.changes.outputs.frontend_changed == 'true'
    runs-on: ubuntu-24.04
    strategy:
      fail-fast: false
@@ -119,7 +160,8 @@ jobs:
          directory: src-ui/coverage/
  e2e-tests:
    name: "E2E Tests (${{ matrix.shard-index }}/${{ matrix.shard-count }})"
-    needs: install-dependencies
+    needs: [changes, install-dependencies]
+    if: needs.changes.outputs.frontend_changed == 'true'
    runs-on: ubuntu-24.04
    container: mcr.microsoft.com/playwright:v1.58.2-noble
    env:
@@ -159,7 +201,8 @@ jobs:
        run: cd src-ui && pnpm exec playwright test --shard ${{ matrix.shard-index }}/${{ matrix.shard-count }}
  bundle-analysis:
    name: Bundle Analysis
-    needs: [unit-tests, e2e-tests]
+    needs: [changes, unit-tests, e2e-tests]
+    if: needs.changes.outputs.frontend_changed == 'true'
    runs-on: ubuntu-24.04
    steps:
      - name: Checkout
@@ -189,3 +232,42 @@ jobs:
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
        run: cd src-ui && pnpm run build --configuration=production
+  gate:
+    name: Frontend CI Gate
+    needs: [changes, install-dependencies, lint, unit-tests, e2e-tests, bundle-analysis]
+    if: always()
+    runs-on: ubuntu-slim
+    steps:
+      - name: Check gate
+        run: |
+          if [[ "${{ needs.changes.outputs.frontend_changed }}" != "true" ]]; then
+            echo "No frontend-relevant changes detected."
+            exit 0
+          fi
+
+          if [[ "${{ needs['install-dependencies'].result }}" != "success" ]]; then
+            echo "::error::Frontend install job result: ${{ needs['install-dependencies'].result }}"
+            exit 1
+          fi
+
+          if [[ "${{ needs.lint.result }}" != "success" ]]; then
+            echo "::error::Frontend lint job result: ${{ needs.lint.result }}"
+            exit 1
+          fi
+
+          if [[ "${{ needs['unit-tests'].result }}" != "success" ]]; then
+            echo "::error::Frontend unit-tests job result: ${{ needs['unit-tests'].result }}"
+            exit 1
+          fi
+
+          if [[ "${{ needs['e2e-tests'].result }}" != "success" ]]; then
+            echo "::error::Frontend e2e-tests job result: ${{ needs['e2e-tests'].result }}"
+            exit 1
+          fi
+
+          if [[ "${{ needs['bundle-analysis'].result }}" != "success" ]]; then
+            echo "::error::Frontend bundle-analysis job result: ${{ needs['bundle-analysis'].result }}"
+            exit 1
+          fi
+
+          echo "Frontend checks passed."
--- a/.github/workflows/pr-bot.yml
+++ b/.github/workflows/pr-bot.yml
@@ -2,13 +2,24 @@ name: PR Bot
 on:
  pull_request_target:
    types: [opened]
-permissions:
-  contents: read
-  pull-requests: write
 jobs:
+  anti-slop:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      issues: read
+      pull-requests: write
+    steps:
+      - uses: peakoss/anti-slop@v0.2.1
+        with:
+          max-failures: 4
+          failure-add-pr-labels: 'ai'
  pr-bot:
    name: Automated PR Bot
    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
    steps:
      - name: Label PR by file path or branch name
        # see .github/labeler.yml for the labeler config
--- a/2
+++ b/2
@@ -30,7 +30,7 @@ RUN set -eux \
 # Purpose: Installs s6-overlay and rootfs
 # Comments:
 #  - Don't leave anything extra in here either
-FROM ghcr.io/astral-sh/uv:0.10.8-python3.12-trixie-slim AS s6-overlay-base
+FROM ghcr.io/astral-sh/uv:0.10.7-python3.12-trixie-slim AS s6-overlay-base

 WORKDIR /usr/src/s6

--- a/docs/development.md
+++ b/docs/development.md
@@ -75,13 +75,13 @@ first-time setup.
 4.  Install the Python dependencies:

    ```bash
-    $ uv sync --group dev
+    uv sync --group dev
    ```

 5.  Install pre-commit hooks:

    ```bash
-    $ uv run prek install
+    uv run prek install
    ```

 6.  Apply migrations and create a superuser (also can be done via the web UI) for your development instance:
@@ -89,8 +89,8 @@ first-time setup.
    ```bash
    # src/

-    $ uv run manage.py migrate
-    $ uv run manage.py createsuperuser
+    uv run manage.py migrate
+    uv run manage.py createsuperuser
    ```

 7.  You can now either ...
@@ -103,7 +103,7 @@ first-time setup.

    -   spin up a bare Redis container

-        ```
+        ```bash
        docker run -d -p 6379:6379 --restart unless-stopped redis:latest
        ```

@@ -118,18 +118,18 @@ work well for development, but you can use whatever you want.
 Configure the IDE to use the `src/`-folder as the base source folder.
 Configure the following launch configurations in your IDE:

-   `python3 manage.py runserver`
-   `python3 manage.py document_consumer`
-   `celery --app paperless worker -l DEBUG` (or any other log level)
+-   `uv run manage.py runserver`
+-   `uv run manage.py document_consumer`
+-   `uv run celery --app paperless worker -l DEBUG` (or any other log level)

 To start them all:

 ```bash
 # src/

-$ python3 manage.py runserver & \
-  python3 manage.py document_consumer & \
-  celery --app paperless worker -l DEBUG
+uv run manage.py runserver & \
+  uv run manage.py document_consumer & \
+  uv run celery --app paperless worker -l DEBUG
 ```

 You might need the front end to test your back end code.
@@ -140,8 +140,8 @@ To build the front end once use this command:
 ```bash
 # src-ui/

-$ pnpm install
-$ ng build --configuration production
+pnpm install
+pnpm ng build --configuration production
 ```

 ### Testing
@@ -199,7 +199,7 @@ The front end is built using AngularJS. In order to get started, you need Node.j
 4.  You can launch a development server by running:

    ```bash
-    ng serve
+    pnpm ng serve
    ```

    This will automatically update whenever you save. However, in-place
@@ -217,21 +217,21 @@ commit. See [above](#code-formatting-with-pre-commit-hooks) for installation ins
 command such as

 ```bash
-$ git ls-files -- '*.ts' | xargs prek run prettier --files
+git ls-files -- '*.ts' | xargs uv run prek run prettier --files
 ```

 Front end testing uses Jest and Playwright. Unit tests and e2e tests,
 respectively, can be run non-interactively with:

 ```bash
-$ ng test
-$ npx playwright test
+pnpm ng test
+pnpm playwright test
 ```

 Playwright also includes a UI which can be run with:

 ```bash
-$ npx playwright test --ui
+pnpm playwright test --ui
 ```

 ### Building the frontend
@@ -239,7 +239,7 @@ $ npx playwright test --ui
 In order to build the front end and serve it as part of Django, execute:

 ```bash
-$ ng build --configuration production
+pnpm ng build --configuration production
 ```

 This will build the front end and put it in a location from which the
@@ -312,10 +312,10 @@ end (such as error messages).
 -   The source language of the project is "en_US".
 -   Localization files end up in the folder `src/locale/`.
 -   In order to extract strings from the application, call
-    `python3 manage.py makemessages -l en_US`. This is important after
+    `uv run manage.py makemessages -l en_US`. This is important after
    making changes to translatable strings.
 -   The message files need to be compiled for them to show up in the
-    application. Call `python3 manage.py compilemessages` to do this.
+    application. Call `uv run manage.py compilemessages` to do this.
    The generated files don't get committed into git, since these are
    derived artifacts. The build pipeline takes care of executing this
    command.
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -49,6 +49,7 @@ dependencies = [
  "flower~=2.0.1",
  "gotenberg-client~=0.13.1",
  "httpx-oauth~=0.16",
+  "ijson>=3.2",
  "imap-tools~=1.11.0",
  "jinja2~=3.1.5",
  "langdetect~=1.0.9",
--- a/src-ui/messages.xlf
+++ b/src-ui/messages.xlf
@@ -3434,39 +3434,46 @@
          <context context-type="linenumber">9</context>
        </context-group>
      </trans-unit>
+      <trans-unit id="6705735915615634619" datatype="html">
+        <source>{VAR_PLURAL, plural, =1 {One page} other {<x id="INTERPOLATION"/> pages}}</source>
+        <context-group purpose="location">
+          <context context-type="sourcefile">src/app/components/common/confirm-dialog/merge-confirm-dialog/merge-confirm-dialog.component.html</context>
+          <context context-type="linenumber">25</context>
+        </context-group>
+      </trans-unit>
      <trans-unit id="7508164375697837821" datatype="html">
        <source>Use metadata from:</source>
        <context-group purpose="location">
          <context context-type="sourcefile">src/app/components/common/confirm-dialog/merge-confirm-dialog/merge-confirm-dialog.component.html</context>
-          <context context-type="linenumber">22</context>
+          <context context-type="linenumber">34</context>
        </context-group>
      </trans-unit>
      <trans-unit id="2020403212524346652" datatype="html">
        <source>Regenerate all metadata</source>
        <context-group purpose="location">
          <context context-type="sourcefile">src/app/components/common/confirm-dialog/merge-confirm-dialog/merge-confirm-dialog.component.html</context>
-          <context context-type="linenumber">24</context>
+          <context context-type="linenumber">36</context>
        </context-group>
      </trans-unit>
      <trans-unit id="2710430925353472741" datatype="html">
        <source>Try to include archive version in merge for non-PDF files</source>
        <context-group purpose="location">
          <context context-type="sourcefile">src/app/components/common/confirm-dialog/merge-confirm-dialog/merge-confirm-dialog.component.html</context>
-          <context context-type="linenumber">32</context>
+          <context context-type="linenumber">44</context>
        </context-group>
      </trans-unit>
      <trans-unit id="5612366187076076264" datatype="html">
        <source>Delete original documents after successful merge</source>
        <context-group purpose="location">
          <context context-type="sourcefile">src/app/components/common/confirm-dialog/merge-confirm-dialog/merge-confirm-dialog.component.html</context>
-          <context context-type="linenumber">36</context>
+          <context context-type="linenumber">48</context>
        </context-group>
      </trans-unit>
      <trans-unit id="5138283234724909648" datatype="html">
        <source>Note that only PDFs will be included.</source>
        <context-group purpose="location">
          <context context-type="sourcefile">src/app/components/common/confirm-dialog/merge-confirm-dialog/merge-confirm-dialog.component.html</context>
-          <context context-type="linenumber">39</context>
+          <context context-type="linenumber">51</context>
        </context-group>
      </trans-unit>
      <trans-unit id="1309641780471803652" datatype="html">
--- a/src-ui/src/app/components/common/confirm-dialog/merge-confirm-dialog/merge-confirm-dialog.component.html
+++ b/src-ui/src/app/components/common/confirm-dialog/merge-confirm-dialog/merge-confirm-dialog.component.html
@@ -10,10 +10,22 @@
        <ul class="list-group"
            cdkDropList
            (cdkDropListDropped)="onDrop($event)">
-            @for (documentID of documentIDs; track documentID) {
-                <li class="list-group-item" cdkDrag>
+            @for (document of documents; track document.id) {
+                <li class="list-group-item d-flex align-items-center" cdkDrag>
                    <i-bs name="grip-vertical" class="me-2"></i-bs>
-                    {{getDocument(documentID)?.title}}
+                    <div class="d-flex flex-column">
+                        <div>
+                          @if (document.correspondent) {
+                            <b>{{document.correspondent | correspondentName | async}}: </b>
+                          }{{document.title}}
+                        </div>
+                        <small class="text-muted">
+                          {{document.created | customDate:'mediumDate'}}
+                          @if (document.page_count) {
+                            | {document.page_count, plural, =1 {One page} other {{{document.page_count}} pages}}
+                          }
+                        </small>
+                    </div>
                </li>
            }
        </ul>
--- a/src-ui/src/app/components/common/confirm-dialog/merge-confirm-dialog/merge-confirm-dialog.component.ts
+++ b/src-ui/src/app/components/common/confirm-dialog/merge-confirm-dialog/merge-confirm-dialog.component.ts
@@ -3,11 +3,14 @@ import {
  DragDropModule,
  moveItemInArray,
 } from '@angular/cdk/drag-drop'
+import { AsyncPipe } from '@angular/common'
 import { Component, OnInit, inject } from '@angular/core'
 import { FormsModule, ReactiveFormsModule } from '@angular/forms'
 import { NgxBootstrapIconsModule } from 'ngx-bootstrap-icons'
 import { takeUntil } from 'rxjs'
 import { Document } from 'src/app/data/document'
+import { CorrespondentNamePipe } from 'src/app/pipes/correspondent-name.pipe'
+import { CustomDatePipe } from 'src/app/pipes/custom-date.pipe'
 import { PermissionsService } from 'src/app/services/permissions.service'
 import { DocumentService } from 'src/app/services/rest/document.service'
 import { ConfirmDialogComponent } from '../confirm-dialog.component'
@@ -17,6 +20,9 @@ import { ConfirmDialogComponent } from '../confirm-dialog.component'
  templateUrl: './merge-confirm-dialog.component.html',
  styleUrl: './merge-confirm-dialog.component.scss',
  imports: [
+    AsyncPipe,
+    CorrespondentNamePipe,
+    CustomDatePipe,
    DragDropModule,
    FormsModule,
    ReactiveFormsModule,
--- a/src/documents/management/commands/base.py
+++ b/src/documents/management/commands/base.py
@@ -304,7 +304,7 @@ class PaperlessCommand(RichCommand):

        Progress output is directed to stderr to match the convention that
        progress bars are transient UI feedback, not command output. This
-        mirrors tqdm's default behavior and prevents progress bar rendering
+        mirrors the convention that progress bars are transient UI feedback and prevents progress bar rendering
        from interfering with stdout-based assertions in tests or piped
        command output.

--- a/src/documents/management/commands/document_archiver.py
+++ b/src/documents/management/commands/document_archiver.py
@@ -17,6 +17,7 @@ class Command(PaperlessCommand):
        "modified) after their initial import."
    )

+    supports_progress_bar = True
    supports_multiprocessing = True

    def add_arguments(self, parser):
--- a/src/documents/management/commands/document_exporter.py
+++ b/src/documents/management/commands/document_exporter.py
@@ -3,12 +3,10 @@ import json
 import os
 import shutil
 import tempfile
-from itertools import chain
 from itertools import islice
 from pathlib import Path
 from typing import TYPE_CHECKING

-import tqdm
 from allauth.mfa.models import Authenticator
 from allauth.socialaccount.models import SocialAccount
 from allauth.socialaccount.models import SocialApp
@@ -19,7 +17,6 @@ from django.contrib.auth.models import Permission
 from django.contrib.auth.models import User
 from django.contrib.contenttypes.models import ContentType
 from django.core import serializers
-from django.core.management.base import BaseCommand
 from django.core.management.base import CommandError
 from django.core.serializers.json import DjangoJSONEncoder
 from django.db import transaction
@@ -38,6 +35,7 @@ if settings.AUDIT_LOG_ENABLED:

 from documents.file_handling import delete_empty_directories
 from documents.file_handling import generate_filename
+from documents.management.commands.base import PaperlessCommand
 from documents.management.commands.mixins import CryptMixin
 from documents.models import Correspondent
 from documents.models import CustomField
@@ -81,14 +79,99 @@ def serialize_queryset_batched(
        yield serializers.serialize("python", chunk)


-class Command(CryptMixin, BaseCommand):
+class StreamingManifestWriter:
+    """Incrementally writes a JSON array to a file, one record at a time.
+
+    Writes to <target>.tmp first; on close(), optionally BLAKE2b-compares
+    with the existing file (--compare-json) and renames or discards accordingly.
+    On exception, discard() deletes the tmp file and leaves the original intact.
+    """
+
+    def __init__(
+        self,
+        path: Path,
+        *,
+        compare_json: bool = False,
+        files_in_export_dir: "set[Path] | None" = None,
+    ) -> None:
+        self._path = path.resolve()
+        self._tmp_path = self._path.with_suffix(self._path.suffix + ".tmp")
+        self._compare_json = compare_json
+        self._files_in_export_dir: set[Path] = (
+            files_in_export_dir if files_in_export_dir is not None else set()
+        )
+        self._file = None
+        self._first = True
+
+    def open(self) -> None:
+        self._path.parent.mkdir(parents=True, exist_ok=True)
+        self._file = self._tmp_path.open("w", encoding="utf-8")
+        self._file.write("[")
+        self._first = True
+
+    def write_record(self, record: dict) -> None:
+        if not self._first:
+            self._file.write(",\n")
+        else:
+            self._first = False
+        self._file.write(
+            json.dumps(record, cls=DjangoJSONEncoder, indent=2, ensure_ascii=False),
+        )
+
+    def write_batch(self, records: list[dict]) -> None:
+        for record in records:
+            self.write_record(record)
+
+    def close(self) -> None:
+        if self._file is None:
+            return
+        self._file.write("\n]")
+        self._file.close()
+        self._file = None
+        self._finalize()
+
+    def discard(self) -> None:
+        if self._file is not None:
+            self._file.close()
+            self._file = None
+        if self._tmp_path.exists():
+            self._tmp_path.unlink()
+
+    def _finalize(self) -> None:
+        """Compare with existing file (if --compare-json) then rename or discard tmp."""
+        if self._path in self._files_in_export_dir:
+            self._files_in_export_dir.remove(self._path)
+            if self._compare_json:
+                existing_hash = hashlib.blake2b(self._path.read_bytes()).hexdigest()
+                new_hash = hashlib.blake2b(self._tmp_path.read_bytes()).hexdigest()
+                if existing_hash == new_hash:
+                    self._tmp_path.unlink()
+                    return
+        self._tmp_path.rename(self._path)
+
+    def __enter__(self) -> "StreamingManifestWriter":
+        self.open()
+        return self
+
+    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
+        if exc_type is not None:
+            self.discard()
+        else:
+            self.close()
+
+
+class Command(CryptMixin, PaperlessCommand):
    help = (
        "Decrypt and rename all files in our collection into a given target "
        "directory.  And include a manifest file containing document data for "
        "easy import."
    )

+    supports_progress_bar = True
+    supports_multiprocessing = False
+
    def add_arguments(self, parser) -> None:
+        super().add_arguments(parser)
        parser.add_argument("target")

        parser.add_argument(
@@ -195,13 +278,6 @@ class Command(CryptMixin, BaseCommand):
            help="If set, only the database will be imported, not files",
        )

-        parser.add_argument(
-            "--no-progress-bar",
-            default=False,
-            action="store_true",
-            help="If set, the progress bar will not be shown",
-        )
-
        parser.add_argument(
            "--passphrase",
            help="If provided, is used to encrypt sensitive data in the export",
@@ -230,7 +306,6 @@ class Command(CryptMixin, BaseCommand):
        self.no_thumbnail: bool = options["no_thumbnail"]
        self.zip_export: bool = options["zip"]
        self.data_only: bool = options["data_only"]
-        self.no_progress_bar: bool = options["no_progress_bar"]
        self.passphrase: str | None = options.get("passphrase")
        self.batch_size: int = options["batch_size"]

@@ -322,95 +397,85 @@ class Command(CryptMixin, BaseCommand):
        if settings.AUDIT_LOG_ENABLED:
            manifest_key_to_object_query["log_entries"] = LogEntry.objects.all()

-        with transaction.atomic():
-            manifest_dict = {}
-
-            # Build an overall manifest
-            for key, object_query in manifest_key_to_object_query.items():
-                manifest_dict[key] = list(
-                    chain.from_iterable(
-                        serialize_queryset_batched(
-                            object_query,
-                            batch_size=self.batch_size,
-                        ),
-                    ),
-                )
-
-            self.encrypt_secret_fields(manifest_dict)
-
-            # These are treated specially and included in the per-document manifest
-            # if that setting is enabled.  Otherwise, they are just exported to the bulk
-            # manifest
-            document_map: dict[int, Document] = {
-                d.pk: d for d in manifest_key_to_object_query["documents"]
-            }
-            document_manifest = manifest_dict["documents"]
-
-        # 3. Export files from each document
-        for index, document_dict in tqdm.tqdm(
-            enumerate(document_manifest),
-            total=len(document_manifest),
-            disable=self.no_progress_bar,
-        ):
-            document = document_map[document_dict["pk"]]
-
-            # 3.1. generate a unique filename
-            base_name = self.generate_base_name(document)
-
-            # 3.2. write filenames into manifest
-            original_target, thumbnail_target, archive_target = (
-                self.generate_document_targets(document, base_name, document_dict)
+        # Crypto setup before streaming begins
+        if self.passphrase:
+            self.setup_crypto(passphrase=self.passphrase)
+        elif MailAccount.objects.count() > 0 or SocialToken.objects.count() > 0:
+            self.stdout.write(
+                self.style.NOTICE(
+                    "No passphrase was given, sensitive fields will be in plaintext",
+                ),
            )

-            # 3.3. write files to target folder
-            if not self.data_only:
-                self.copy_document_files(
-                    document,
-                    original_target,
-                    thumbnail_target,
-                    archive_target,
-                )
-
-            if self.split_manifest:
-                manifest_name = base_name.with_name(f"{base_name.stem}-manifest.json")
-                if self.use_folder_prefix:
-                    manifest_name = Path("json") / manifest_name
-                manifest_name = (self.target / manifest_name).resolve()
-                manifest_name.parent.mkdir(parents=True, exist_ok=True)
-                content = [document_manifest[index]]
-                content += list(
-                    filter(
-                        lambda d: d["fields"]["document"] == document_dict["pk"],
-                        manifest_dict["notes"],
-                    ),
-                )
-                content += list(
-                    filter(
-                        lambda d: d["fields"]["document"] == document_dict["pk"],
-                        manifest_dict["custom_field_instances"],
-                    ),
-                )
-
-                self.check_and_write_json(
-                    content,
-                    manifest_name,
-                )
-
-        # These were exported already
-        if self.split_manifest:
-            del manifest_dict["documents"]
-            del manifest_dict["notes"]
-            del manifest_dict["custom_field_instances"]
-
-        # 4.1 write primary manifest to target folder
-        manifest = []
-        for key, item in manifest_dict.items():
-            manifest.extend(item)
+        document_manifest: list[dict] = []
        manifest_path = (self.target / "manifest.json").resolve()
-        self.check_and_write_json(
-            manifest,
+
+        with StreamingManifestWriter(
            manifest_path,
-        )
+            compare_json=self.compare_json,
+            files_in_export_dir=self.files_in_export_dir,
+        ) as writer:
+            with transaction.atomic():
+                for key, qs in manifest_key_to_object_query.items():
+                    if key == "documents":
+                        # Accumulate for file-copy loop; written to manifest after
+                        for batch in serialize_queryset_batched(
+                            qs,
+                            batch_size=self.batch_size,
+                        ):
+                            for record in batch:
+                                self._encrypt_record_inline(record)
+                            document_manifest.extend(batch)
+                    elif self.split_manifest and key in (
+                        "notes",
+                        "custom_field_instances",
+                    ):
+                        # Written per-document in _write_split_manifest
+                        pass
+                    else:
+                        for batch in serialize_queryset_batched(
+                            qs,
+                            batch_size=self.batch_size,
+                        ):
+                            for record in batch:
+                                self._encrypt_record_inline(record)
+                            writer.write_batch(batch)
+
+            document_map: dict[int, Document] = {
+                d.pk: d for d in Document.objects.order_by("id")
+            }
+
+            # 3. Export files from each document
+            for index, document_dict in enumerate(
+                self.track(
+                    document_manifest,
+                    description="Exporting documents...",
+                    total=len(document_manifest),
+                ),
+            ):
+                document = document_map[document_dict["pk"]]
+
+                # 3.1. generate a unique filename
+                base_name = self.generate_base_name(document)
+
+                # 3.2. write filenames into manifest
+                original_target, thumbnail_target, archive_target = (
+                    self.generate_document_targets(document, base_name, document_dict)
+                )
+
+                # 3.3. write files to target folder
+                if not self.data_only:
+                    self.copy_document_files(
+                        document,
+                        original_target,
+                        thumbnail_target,
+                        archive_target,
+                    )
+
+                if self.split_manifest:
+                    self._write_split_manifest(document_dict, document, base_name)
+                else:
+                    writer.write_record(document_dict)

        # 4.2 write version information to target folder
        extra_metadata_path = (self.target / "metadata.json").resolve()
@@ -532,6 +597,42 @@ class Command(CryptMixin, BaseCommand):
                archive_target,
            )

+    def _encrypt_record_inline(self, record: dict) -> None:
+        """Encrypt sensitive fields in a single record, if passphrase is set."""
+        if not self.passphrase:
+            return
+        fields = self.CRYPT_FIELDS_BY_MODEL.get(record.get("model", ""))
+        if fields:
+            for field in fields:
+                if record["fields"].get(field):
+                    record["fields"][field] = self.encrypt_string(
+                        value=record["fields"][field],
+                    )
+
+    def _write_split_manifest(
+        self,
+        document_dict: dict,
+        document: Document,
+        base_name: Path,
+    ) -> None:
+        """Write per-document manifest file for --split-manifest mode."""
+        content = [document_dict]
+        content.extend(
+            serializers.serialize("python", Note.objects.filter(document=document)),
+        )
+        content.extend(
+            serializers.serialize(
+                "python",
+                CustomFieldInstance.objects.filter(document=document),
+            ),
+        )
+        manifest_name = base_name.with_name(f"{base_name.stem}-manifest.json")
+        if self.use_folder_prefix:
+            manifest_name = Path("json") / manifest_name
+        manifest_name = (self.target / manifest_name).resolve()
+        manifest_name.parent.mkdir(parents=True, exist_ok=True)
+        self.check_and_write_json(content, manifest_name)
+
    def check_and_write_json(
        self,
        content: list[dict] | dict,
@@ -549,14 +650,14 @@ class Command(CryptMixin, BaseCommand):
        if target in self.files_in_export_dir:
            self.files_in_export_dir.remove(target)
            if self.compare_json:
-                target_checksum = hashlib.md5(target.read_bytes()).hexdigest()
+                target_checksum = hashlib.blake2b(target.read_bytes()).hexdigest()
                src_str = json.dumps(
                    content,
                    cls=DjangoJSONEncoder,
                    indent=2,
                    ensure_ascii=False,
                )
-                src_checksum = hashlib.md5(src_str.encode("utf-8")).hexdigest()
+                src_checksum = hashlib.blake2b(src_str.encode("utf-8")).hexdigest()
                if src_checksum == target_checksum:
                    perform_write = False

@@ -606,28 +707,3 @@ class Command(CryptMixin, BaseCommand):
        if perform_copy:
            target.parent.mkdir(parents=True, exist_ok=True)
            copy_file_with_basic_stats(source, target)
-
-    def encrypt_secret_fields(self, manifest: dict) -> None:
-        """
-        Encrypts certain fields in the export.  Currently limited to the mail account password
-        """
-
-        if self.passphrase:
-            self.setup_crypto(passphrase=self.passphrase)
-
-            for crypt_config in self.CRYPT_FIELDS:
-                exporter_key = crypt_config["exporter_key"]
-                crypt_fields = crypt_config["fields"]
-                for manifest_record in manifest[exporter_key]:
-                    for field in crypt_fields:
-                        if manifest_record["fields"][field]:
-                            manifest_record["fields"][field] = self.encrypt_string(
-                                value=manifest_record["fields"][field],
-                            )
-
-        elif MailAccount.objects.count() > 0 or SocialToken.objects.count() > 0:
-            self.stdout.write(
-                self.style.NOTICE(
-                    "No passphrase was given, sensitive fields will be in plaintext",
-                ),
-            )
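The streaming writer added to the exporter above can be sketched with the standard library alone. This is a minimal illustration, not the code from this change: `JsonArrayStreamWriter` and `skip_if_unchanged` are hypothetical names, and the Django encoder and export-directory bookkeeping are left out. It shows the same three ideas: records are appended to a temporary file one at a time, the array is closed only on success, and the temporary file replaces the target unless its hash matches the existing file.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class JsonArrayStreamWriter:
    """Hypothetical, stdlib-only stand-in for the writer in the diff above."""

    def __init__(self, path: Path, *, skip_if_unchanged: bool = False) -> None:
        self._path = path
        self._tmp_path = path.with_suffix(path.suffix + ".tmp")
        self._skip_if_unchanged = skip_if_unchanged
        self._file = None
        self._first = True

    def __enter__(self) -> "JsonArrayStreamWriter":
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._file = self._tmp_path.open("w", encoding="utf-8")
        self._file.write("[")
        return self

    def write_record(self, record: dict) -> None:
        # Every record except the first is preceded by a separator.
        if not self._first:
            self._file.write(",\n")
        self._first = False
        self._file.write(json.dumps(record, indent=2, ensure_ascii=False))

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        if exc_type is None:
            self._file.write("\n]")
        self._file.close()
        if exc_type is not None:
            # On failure, drop the partial file and leave the target untouched.
            self._tmp_path.unlink(missing_ok=True)
            return
        if self._skip_if_unchanged and self._path.exists():
            old = hashlib.blake2b(self._path.read_bytes()).hexdigest()
            new = hashlib.blake2b(self._tmp_path.read_bytes()).hexdigest()
            if old == new:
                self._tmp_path.unlink()
                return
        self._tmp_path.replace(self._path)


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "manifest.json"
    with JsonArrayStreamWriter(target) as writer:
        for pk in range(3):
            writer.write_record({"model": "documents.document", "pk": pk})
    loaded = json.loads(target.read_text(encoding="utf-8"))
```

Because only one serialized record is held in memory at a time, peak memory no longer grows with the size of the manifest, which appears to be the motivation for replacing the single in-memory `manifest_dict`.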
--- a/src/documents/management/commands/document_fuzzy_match.py
+++ b/src/documents/management/commands/document_fuzzy_match.py
@@ -40,6 +40,7 @@ def _process_and_match(work: _WorkPackage) -> _WorkResult:
 class Command(PaperlessCommand):
    help = "Searches for documents where the content almost matches"

+    supports_progress_bar = True
    supports_multiprocessing = True

    def add_arguments(self, parser):
--- a/src/documents/management/commands/document_importer.py
+++ b/src/documents/management/commands/document_importer.py
@@ -8,14 +8,13 @@ from pathlib import Path
 from zipfile import ZipFile
 from zipfile import is_zipfile

-import tqdm
+import ijson
 from django.conf import settings
 from django.contrib.auth.models import Permission
 from django.contrib.auth.models import User
 from django.contrib.contenttypes.models import ContentType
 from django.core.exceptions import FieldDoesNotExist
 from django.core.management import call_command
-from django.core.management.base import BaseCommand
 from django.core.management.base import CommandError
 from django.core.serializers.base import DeserializationError
 from django.db import IntegrityError
@@ -25,6 +24,7 @@ from django.db.models.signals import post_save
 from filelock import FileLock

 from documents.file_handling import create_source_path_directory
+from documents.management.commands.base import PaperlessCommand
 from documents.management.commands.mixins import CryptMixin
 from documents.models import Correspondent
 from documents.models import CustomField
@@ -47,6 +47,15 @@ if settings.AUDIT_LOG_ENABLED:
    from auditlog.registry import auditlog


+def iter_manifest_records(path: Path) -> Generator[dict, None, None]:
+    """Yield records one at a time from a manifest JSON array via ijson."""
+    try:
+        with path.open("rb") as f:
+            yield from ijson.items(f, "item", use_float=True)
+    except ijson.JSONError as e:
+        raise CommandError(f"Failed to parse manifest file {path}: {e}") from e
+
+
 @contextmanager
 def disable_signal(sig, receiver, sender, *, weak: bool | None = None) -> Generator:
    try:
@@ -57,21 +66,18 @@ def disable_signal(sig, receiver, sender, *, weak: bool | None = None) -> Genera
        sig.connect(receiver=receiver, sender=sender, **kwargs)


-class Command(CryptMixin, BaseCommand):
+class Command(CryptMixin, PaperlessCommand):
    help = (
        "Using a manifest.json file, load the data from there, and import the "
        "documents it refers to."
    )

-    def add_arguments(self, parser) -> None:
-        parser.add_argument("source")
+    supports_progress_bar = True
+    supports_multiprocessing = False

-        parser.add_argument(
-            "--no-progress-bar",
-            default=False,
-            action="store_true",
-            help="If set, the progress bar will not be shown",
-        )
+    def add_arguments(self, parser) -> None:
+        super().add_arguments(parser)
+        parser.add_argument("source")

        parser.add_argument(
            "--data-only",
@@ -147,14 +153,9 @@ class Command(CryptMixin, BaseCommand):
        Loads manifest data from the various JSON files for parsing and loading the database
        """
        main_manifest_path: Path = self.source / "manifest.json"
-
-        with main_manifest_path.open() as infile:
-            self.manifest = json.load(infile)
        self.manifest_paths.append(main_manifest_path)

        for file in Path(self.source).glob("**/*-manifest.json"):
-            with file.open() as infile:
-                self.manifest += json.load(infile)
            self.manifest_paths.append(file)

    def load_metadata(self) -> None:
@@ -231,12 +232,10 @@ class Command(CryptMixin, BaseCommand):

        self.source = Path(options["source"]).resolve()
        self.data_only: bool = options["data_only"]
-        self.no_progress_bar: bool = options["no_progress_bar"]
        self.passphrase: str | None = options.get("passphrase")
        self.version: str | None = None
        self.salt: str | None = None
        self.manifest_paths = []
-        self.manifest = []

        # Create a temporary directory for extracting a zip file into it, even if supplied source is no zip file to keep code cleaner.
        with tempfile.TemporaryDirectory() as tmp_dir:
@@ -296,6 +295,9 @@ class Command(CryptMixin, BaseCommand):
            else:
                self.stdout.write(self.style.NOTICE("Data only import completed"))

+            for tmp in getattr(self, "_decrypted_tmp_paths", []):
+                tmp.unlink(missing_ok=True)
+
        self.stdout.write("Updating search index...")
        call_command(
            "document_index",
@@ -348,11 +350,12 @@ class Command(CryptMixin, BaseCommand):
                    ) from e

        self.stdout.write("Checking the manifest")
-        for record in self.manifest:
-            # Only check if the document files exist if this is not data only
-            # We don't care about documents for a data only import
-            if not self.data_only and record["model"] == "documents.document":
-                check_document_validity(record)
+        for manifest_path in self.manifest_paths:
+            for record in iter_manifest_records(manifest_path):
+                # Only check if the document files exist if this is not data only
+                # We don't care about documents for a data only import
+                if not self.data_only and record["model"] == "documents.document":
+                    check_document_validity(record)

    def _import_files_from_manifest(self) -> None:
        settings.ORIGINALS_DIR.mkdir(parents=True, exist_ok=True)
@@ -361,23 +364,31 @@ class Command(CryptMixin, BaseCommand):

        self.stdout.write("Copy files into paperless...")

-        manifest_documents = list(
-            filter(lambda r: r["model"] == "documents.document", self.manifest),
-        )
+        document_records = [
+            {
+                "pk": record["pk"],
+                EXPORTER_FILE_NAME: record[EXPORTER_FILE_NAME],
+                EXPORTER_THUMBNAIL_NAME: record.get(EXPORTER_THUMBNAIL_NAME),
+                EXPORTER_ARCHIVE_NAME: record.get(EXPORTER_ARCHIVE_NAME),
+            }
+            for manifest_path in self.manifest_paths
+            for record in iter_manifest_records(manifest_path)
+            if record["model"] == "documents.document"
+        ]

-        for record in tqdm.tqdm(manifest_documents, disable=self.no_progress_bar):
+        for record in self.track(document_records, description="Copying files..."):
            document = Document.objects.get(pk=record["pk"])

            doc_file = record[EXPORTER_FILE_NAME]
            document_path = self.source / doc_file

-            if EXPORTER_THUMBNAIL_NAME in record:
+            if record[EXPORTER_THUMBNAIL_NAME]:
                thumb_file = record[EXPORTER_THUMBNAIL_NAME]
                thumbnail_path = (self.source / thumb_file).resolve()
            else:
                thumbnail_path = None

-            if EXPORTER_ARCHIVE_NAME in record:
+            if record[EXPORTER_ARCHIVE_NAME]:
                archive_file = record[EXPORTER_ARCHIVE_NAME]
                archive_path = self.source / archive_file
            else:
@@ -418,33 +429,43 @@ class Command(CryptMixin, BaseCommand):

            document.save()

+    def _decrypt_record_if_needed(self, record: dict) -> dict:
+        fields = self.CRYPT_FIELDS_BY_MODEL.get(record.get("model", ""))
+        if fields:
+            for field in fields:
+                if record["fields"].get(field):
+                    record["fields"][field] = self.decrypt_string(
+                        value=record["fields"][field],
+                    )
+        return record
+
    def decrypt_secret_fields(self) -> None:
        """
-        The converse decryption of some fields out of the export before importing to database
+        Decrypts the encrypted fields of the export before importing into the database.
+        Streams records from each manifest path and writes decrypted content to a temp file.
        """
-        if self.passphrase:
-            # Salt has been loaded from metadata.json at this point, so it cannot be None
-            self.setup_crypto(passphrase=self.passphrase, salt=self.salt)
-
-            had_at_least_one_record = False
-
-            for crypt_config in self.CRYPT_FIELDS:
-                importer_model: str = crypt_config["model_name"]
-                crypt_fields: str = crypt_config["fields"]
-                for record in filter(
-                    lambda x: x["model"] == importer_model,
-                    self.manifest,
-                ):
-                    had_at_least_one_record = True
-                    for field in crypt_fields:
-                        if record["fields"][field]:
-                            record["fields"][field] = self.decrypt_string(
-                                value=record["fields"][field],
-                            )
-
-            if had_at_least_one_record:
-                # It's annoying, but the DB is loaded from the JSON directly
-                # Maybe could change that in the future?
-                (self.source / "manifest.json").write_text(
-                    json.dumps(self.manifest, indent=2, ensure_ascii=False),
-                )
+        if not self.passphrase:
+            return
+        # Salt has been loaded from metadata.json at this point, so it cannot be None
+        self.setup_crypto(passphrase=self.passphrase, salt=self.salt)
+        self._decrypted_tmp_paths: list[Path] = []
+        new_paths: list[Path] = []
+        for manifest_path in self.manifest_paths:
+            tmp = manifest_path.with_name(manifest_path.stem + ".decrypted.json")
+            with tmp.open("w", encoding="utf-8") as out:
+                out.write("[\n")
+                first = True
+                for record in iter_manifest_records(manifest_path):
+                    if not first:
+                        out.write(",\n")
+                    json.dump(
+                        self._decrypt_record_if_needed(record),
+                        out,
+                        indent=2,
+                        ensure_ascii=False,
+                    )
+                    first = False
+                out.write("\n]\n")
+            self._decrypted_tmp_paths.append(tmp)
+            new_paths.append(tmp)
+        self.manifest_paths = new_paths
--- a/src/documents/management/commands/document_index.py
+++ b/src/documents/management/commands/document_index.py
@@ -8,6 +8,9 @@ from documents.tasks import index_reindex
 class Command(PaperlessCommand):
    help = "Manages the document index."

+    supports_progress_bar = True
+    supports_multiprocessing = False
+
    def add_arguments(self, parser):
        super().add_arguments(parser)
        parser.add_argument("command", choices=["reindex", "optimize"])
--- a/src/documents/management/commands/document_llmindex.py
+++ b/src/documents/management/commands/document_llmindex.py
@@ -7,6 +7,9 @@ from documents.tasks import llmindex_index
 class Command(PaperlessCommand):
    help = "Manages the LLM-based vector index for Paperless."

+    supports_progress_bar = True
+    supports_multiprocessing = False
+
    def add_arguments(self, parser: Any) -> None:
        super().add_arguments(parser)
        parser.add_argument("command", choices=["rebuild", "update"])
--- a/src/documents/management/commands/document_renamer.py
+++ b/src/documents/management/commands/document_renamer.py
@@ -7,6 +7,9 @@ from documents.models import Document
 class Command(PaperlessCommand):
    help = "Rename all documents"

+    supports_progress_bar = True
+    supports_multiprocessing = False
+
    def handle(self, *args, **options):
        for document in self.track(Document.objects.all(), description="Renaming..."):
            post_save.send(Document, instance=document, created=False)
--- a/src/documents/management/commands/document_retagger.py
+++ b/src/documents/management/commands/document_retagger.py
@@ -180,6 +180,9 @@ class Command(PaperlessCommand):
        "modified) after their initial import."
    )

+    supports_progress_bar = True
+    supports_multiprocessing = False
+
    def add_arguments(self, parser) -> None:
        super().add_arguments(parser)
        parser.add_argument("-c", "--correspondent", default=False, action="store_true")
--- a/src/documents/management/commands/document_sanity_checker.py
+++ b/src/documents/management/commands/document_sanity_checker.py
@@ -24,6 +24,9 @@ _LEVEL_STYLE: dict[int, tuple[str, str]] = {
 class Command(PaperlessCommand):
    help = "This command checks your document archive for issues."

+    supports_progress_bar = True
+    supports_multiprocessing = False
+
    def _render_results(self, messages: SanityCheckMessages) -> None:
        """Render sanity check results as a Rich table."""

--- a/src/documents/management/commands/document_thumbnails.py
+++ b/src/documents/management/commands/document_thumbnails.py
@@ -36,6 +36,7 @@ def _process_document(doc_id: int) -> None:
 class Command(PaperlessCommand):
    help = "This will regenerate the thumbnails for all documents."

+    supports_progress_bar = True
    supports_multiprocessing = True

    def add_arguments(self, parser) -> None:
--- a/src/documents/management/commands/mixins.py
+++ b/src/documents/management/commands/mixins.py
@@ -1,6 +1,5 @@
 import base64
 import os
-from argparse import ArgumentParser
 from typing import TypedDict

 from cryptography.fernet import Fernet
@@ -21,25 +20,6 @@ class CryptFields(TypedDict):
    fields: list[str]


-class ProgressBarMixin:
-    """
-    Many commands use a progress bar, which can be disabled
-    via this class
-    """
-
-    def add_argument_progress_bar_mixin(self, parser: ArgumentParser) -> None:
-        parser.add_argument(
-            "--no-progress-bar",
-            default=False,
-            action="store_true",
-            help="If set, the progress bar will not be shown",
-        )
-
-    def handle_progress_bar_mixin(self, *args, **options) -> None:
-        self.no_progress_bar = options["no_progress_bar"]
-        self.use_progress_bar = not self.no_progress_bar
-
-
 class CryptMixin:
    """
    Fully based on:
@@ -71,7 +51,7 @@ class CryptMixin:
    key_size = 32
    kdf_algorithm = "pbkdf2_sha256"

-    CRYPT_FIELDS: CryptFields = [
+    CRYPT_FIELDS: list[CryptFields] = [
        {
            "exporter_key": "mail_accounts",
            "model_name": "paperless_mail.mailaccount",
@@ -89,6 +69,10 @@ class CryptMixin:
            ],
        },
    ]
+    # O(1) lookup for per-record encryption; derived from CRYPT_FIELDS at class definition time
+    CRYPT_FIELDS_BY_MODEL: dict[str, list[str]] = {
+        cfg["model_name"]: cfg["fields"] for cfg in CRYPT_FIELDS
+    }

    def get_crypt_params(self) -> dict[str, dict[str, str | int]]:
        return {
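The `CRYPT_FIELDS_BY_MODEL` mapping above lets each record be handled on its own as it streams past, instead of looping over the whole manifest once per configuration entry. The sketch below illustrates that lookup pattern under stated assumptions: the model names and field lists are made up for the example, and `fake_encrypt` / `fake_decrypt` are base64 placeholders, not real encryption and not the mixin's `encrypt_string` / `decrypt_string`.

```python
import base64
from typing import Callable

# Hypothetical configuration; the real entries live in CryptMixin.CRYPT_FIELDS.
CRYPT_FIELDS = [
    {"exporter_key": "accounts", "model_name": "demo.account", "fields": ["password"]},
    {"exporter_key": "tokens", "model_name": "demo.token", "fields": ["token", "secret"]},
]

# Derived once, so each record costs a single dictionary lookup.
FIELDS_BY_MODEL: dict[str, list[str]] = {
    cfg["model_name"]: cfg["fields"] for cfg in CRYPT_FIELDS
}


def transform_record(record: dict, transform: Callable[[str], str]) -> dict:
    """Apply `transform` to the configured non-empty fields of one record."""
    fields = FIELDS_BY_MODEL.get(record.get("model", ""))
    if not fields:
        return record
    for field in fields:
        if record["fields"].get(field):
            record["fields"][field] = transform(record["fields"][field])
    return record


def fake_encrypt(value: str) -> str:
    # Placeholder only: base64 is an encoding, not encryption.
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def fake_decrypt(value: str) -> str:
    return base64.b64decode(value.encode("ascii")).decode("utf-8")


records = [
    {"model": "demo.account", "pk": 1, "fields": {"name": "inbox", "password": "hunter2"}},
    {"model": "demo.document", "pk": 2, "fields": {"title": "invoice"}},
    {"model": "demo.account", "pk": 3, "fields": {"name": "spare", "password": ""}},
]

for record in records:
    transform_record(record, fake_encrypt)

encrypted_password = records[0]["fields"]["password"]
round_tripped = fake_decrypt(encrypted_password)
```

Records whose model is not in the mapping pass through unchanged, and empty values are skipped, mirroring the `record["fields"].get(field)` guard used by the per-record helpers in this change.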
--- a/src/documents/management/commands/prune_audit_logs.py
+++ b/src/documents/management/commands/prune_audit_logs.py
@@ -9,6 +9,9 @@ class Command(PaperlessCommand):

    help = "Prunes the audit logs of objects that no longer exist."

+    supports_progress_bar = True
+    supports_multiprocessing = False
+
    def handle(self, *args, **options):
        with transaction.atomic():
            for log_entry in self.track(
--- a/src/documents/serialisers.py
+++ b/src/documents/serialisers.py
@@ -1440,6 +1440,124 @@ class SavedViewSerializer(OwnedObjectSerializer):
            "set_permissions",
        ]

+    def _get_api_version(self) -> int:
+        request = self.context.get("request")
+        return int(
+            request.version if request else settings.REST_FRAMEWORK["DEFAULT_VERSION"],
+        )
+
+    def _update_legacy_visibility_preferences(
+        self,
+        saved_view_id: int,
+        *,
+        show_on_dashboard: bool | None,
+        show_in_sidebar: bool | None,
+    ) -> UiSettings | None:
+        if show_on_dashboard is None and show_in_sidebar is None:
+            return None
+
+        request = self.context.get("request")
+        user = request.user if request else self.user
+        if user is None:
+            return None
+
+        ui_settings, _ = UiSettings.objects.get_or_create(
+            user=user,
+            defaults={"settings": {}},
+        )
+        current_settings = (
+            ui_settings.settings if isinstance(ui_settings.settings, dict) else {}
+        )
+        current_settings = dict(current_settings)
+
+        saved_views_settings = current_settings.get("saved_views")
+        if isinstance(saved_views_settings, dict):
+            saved_views_settings = dict(saved_views_settings)
+        else:
+            saved_views_settings = {}
+
+        dashboard_ids = {
+            int(raw_id)
+            for raw_id in saved_views_settings.get("dashboard_views_visible_ids", [])
+            if str(raw_id).isdigit()
+        }
+        sidebar_ids = {
+            int(raw_id)
+            for raw_id in saved_views_settings.get("sidebar_views_visible_ids", [])
+            if str(raw_id).isdigit()
+        }
+
+        if show_on_dashboard is not None:
+            if show_on_dashboard:
+                dashboard_ids.add(saved_view_id)
+            else:
+                dashboard_ids.discard(saved_view_id)
+        if show_in_sidebar is not None:
+            if show_in_sidebar:
+                sidebar_ids.add(saved_view_id)
+            else:
+                sidebar_ids.discard(saved_view_id)
+
+        saved_views_settings["dashboard_views_visible_ids"] = sorted(dashboard_ids)
+        saved_views_settings["sidebar_views_visible_ids"] = sorted(sidebar_ids)
+        current_settings["saved_views"] = saved_views_settings
+        ui_settings.settings = current_settings
+        ui_settings.save(update_fields=["settings"])
+        return ui_settings
+
+    def to_representation(self, instance):
+        # TODO: remove this and related backwards compatibility code when API v9 is dropped
+        ret = super().to_representation(instance)
+        request = self.context.get("request")
+        api_version = self._get_api_version()
+
+        if api_version < 10:
+            dashboard_ids = set()
+            sidebar_ids = set()
+            user = request.user if request else None
+            if user is not None and hasattr(user, "ui_settings"):
+                ui_settings = user.ui_settings.settings or None
+                saved_views = None
+                if isinstance(ui_settings, dict):
+                    saved_views = ui_settings.get("saved_views", {})
+                if isinstance(saved_views, dict):
+                    dashboard_ids = set(
+                        saved_views.get("dashboard_views_visible_ids", []),
+                    )
+                    sidebar_ids = set(
+                        saved_views.get("sidebar_views_visible_ids", []),
+                    )
+            ret["show_on_dashboard"] = instance.id in dashboard_ids
+            ret["show_in_sidebar"] = instance.id in sidebar_ids
+
+        return ret
+
+    def to_internal_value(self, data):
+        # TODO: remove this and related backwards compatibility code when API v9 is dropped
+        api_version = self._get_api_version()
+        if api_version >= 10:
+            return super().to_internal_value(data)
+
+        normalized_data = data.copy()
+        legacy_visibility_fields = {}
+        boolean_field = serializers.BooleanField()
+
+        for field_name in ("show_on_dashboard", "show_in_sidebar"):
+            if field_name in normalized_data:
+                try:
+                    legacy_visibility_fields[field_name] = (
+                        boolean_field.to_internal_value(
+                            normalized_data.get(field_name),
+                        )
+                    )
+                except serializers.ValidationError as exc:
+                    raise serializers.ValidationError({field_name: exc.detail})
+                del normalized_data[field_name]
+
+        ret = super().to_internal_value(normalized_data)
+        ret.update(legacy_visibility_fields)
+        return ret
+
    def validate(self, attrs):
        attrs = super().validate(attrs)
        if "display_fields" in attrs and attrs["display_fields"] is not None:
@@ -1459,6 +1577,9 @@ class SavedViewSerializer(OwnedObjectSerializer):
        return attrs

    def update(self, instance, validated_data):
+        request = self.context.get("request")
+        show_on_dashboard = validated_data.pop("show_on_dashboard", None)
+        show_in_sidebar = validated_data.pop("show_in_sidebar", None)
        if "filter_rules" in validated_data:
            rules_data = validated_data.pop("filter_rules")
        else:
@@ -1480,9 +1601,19 @@ class SavedViewSerializer(OwnedObjectSerializer):
            SavedViewFilterRule.objects.filter(saved_view=instance).delete()
            for rule_data in rules_data:
                SavedViewFilterRule.objects.create(saved_view=instance, **rule_data)
+        ui_settings = self._update_legacy_visibility_preferences(
+            instance.id,
+            show_on_dashboard=show_on_dashboard,
+            show_in_sidebar=show_in_sidebar,
+        )
+        if request is not None and ui_settings is not None:
+            request.user.ui_settings = ui_settings
        return instance

    def create(self, validated_data):
+        request = self.context.get("request")
+        show_on_dashboard = validated_data.pop("show_on_dashboard", None)
+        show_in_sidebar = validated_data.pop("show_in_sidebar", None)
        rules_data = validated_data.pop("filter_rules")
        if "user" in validated_data:
            # backwards compatibility
@@ -1490,6 +1621,13 @@ class SavedViewSerializer(OwnedObjectSerializer):
        saved_view = super().create(validated_data)
        for rule_data in rules_data:
            SavedViewFilterRule.objects.create(saved_view=saved_view, **rule_data)
+        ui_settings = self._update_legacy_visibility_preferences(
+            saved_view.id,
+            show_on_dashboard=show_on_dashboard,
+            show_in_sidebar=show_in_sidebar,
+        )
+        if request is not None and ui_settings is not None:
+            request.user.ui_settings = ui_settings
        return saved_view


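Note on the legacy visibility handling in the serializer diff above: for API versions below 10, show_on_dashboard and show_in_sidebar are translated into per-user id lists stored under UiSettings. The block below is a minimal standalone sketch of that set arithmetic, not the serializer's actual code. The function name apply_legacy_visibility and its signature are assumptions made for illustration; only the two "..._visible_ids" key names and the "saved_views" key are taken from the diff.

```python
import copy
from typing import Optional


def apply_legacy_visibility(
    settings: Optional[dict],
    saved_view_id: int,
    *,
    show_on_dashboard: Optional[bool] = None,
    show_in_sidebar: Optional[bool] = None,
) -> dict:
    """Hypothetical helper: return a copy of settings with id lists updated.

    A flag of None means "not supplied by the client", so the matching
    list keeps its current members.
    """
    # Work on a copy so the caller's dict is never mutated.
    updated = copy.deepcopy(settings) if isinstance(settings, dict) else {}
    saved_views = updated.get("saved_views")
    if not isinstance(saved_views, dict):
        saved_views = {}

    for key, flag in (
        ("dashboard_views_visible_ids", show_on_dashboard),
        ("sidebar_views_visible_ids", show_in_sidebar),
    ):
        ids = set(saved_views.get(key) or [])
        if flag is True:
            ids.add(saved_view_id)
        elif flag is False:
            ids.discard(saved_view_id)
        # Sorted output keeps the stored lists deterministic.
        saved_views[key] = sorted(ids)

    updated["saved_views"] = saved_views
    return updated
```

Treating None as "leave unchanged" is what lets a PATCH that sends only show_on_dashboard keep the sidebar list intact, which is the behaviour the version 9 patch test in this diff asserts.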
--- a/src/documents/tests/test_api_documents.py
+++ b/src/documents/tests/test_api_documents.py
@@ -41,6 +41,7 @@ from documents.models import SavedView
 from documents.models import ShareLink
 from documents.models import StoragePath
 from documents.models import Tag
+from documents.models import UiSettings
 from documents.models import Workflow
 from documents.models import WorkflowAction
 from documents.models import WorkflowTrigger
@@ -2200,6 +2201,205 @@ class TestDocumentApi(DirectoriesMixin, DocumentConsumeDelayMixin, APITestCase):
        self.assertEqual(response.status_code, status.HTTP_200_OK)
        self.assertEqual(response.data["count"], 0)

+    def test_saved_view_api_version_backward_compatibility(self) -> None:
+        """
+        GIVEN:
+            - Saved views and UiSettings with visibility preferences
+        WHEN:
+            - API request with version=9 (legacy)
+            - API request with version=10 (current)
+        THEN:
+            - Version 9 returns show_on_dashboard and show_in_sidebar from UiSettings
+            - Version 10 omits these fields (moved to UiSettings)
+        """
+        v1 = SavedView.objects.create(
+            owner=self.user,
+            name="dashboard_view",
+            sort_field="created",
+        )
+        v2 = SavedView.objects.create(
+            owner=self.user,
+            name="sidebar_view",
+            sort_field="created",
+        )
+        v3 = SavedView.objects.create(
+            owner=self.user,
+            name="hidden_view",
+            sort_field="created",
+        )
+
+        UiSettings.objects.update_or_create(
+            user=self.user,
+            defaults={
+                "settings": {
+                    "saved_views": {
+                        "dashboard_views_visible_ids": [v1.id],
+                        "sidebar_views_visible_ids": [v2.id],
+                    },
+                },
+            },
+        )
+
+        response_v9 = self.client.get(
+            "/api/saved_views/",
+            headers={"Accept": "application/json; version=9"},
+            format="json",
+        )
+        self.assertEqual(response_v9.status_code, status.HTTP_200_OK)
+        results_v9 = {r["id"]: r for r in response_v9.data["results"]}
+        self.assertIn("show_on_dashboard", results_v9[v1.id])
+        self.assertIn("show_in_sidebar", results_v9[v1.id])
+        self.assertTrue(results_v9[v1.id]["show_on_dashboard"])
+        self.assertFalse(results_v9[v1.id]["show_in_sidebar"])
+        self.assertTrue(results_v9[v2.id]["show_in_sidebar"])
+        self.assertFalse(results_v9[v2.id]["show_on_dashboard"])
+        self.assertFalse(results_v9[v3.id]["show_on_dashboard"])
+        self.assertFalse(results_v9[v3.id]["show_in_sidebar"])
+
+        response_v10 = self.client.get(
+            "/api/saved_views/",
+            headers={"Accept": "application/json; version=10"},
+            format="json",
+        )
+        self.assertEqual(response_v10.status_code, status.HTTP_200_OK)
+        results_v10 = {r["id"]: r for r in response_v10.data["results"]}
+        self.assertNotIn("show_on_dashboard", results_v10[v1.id])
+        self.assertNotIn("show_in_sidebar", results_v10[v1.id])
+
+    def test_saved_view_api_version_9_user_without_ui_settings(self) -> None:
+        """
+        GIVEN:
+            - User with no UiSettings and a saved view
+        WHEN:
+            - API request with version=9
+        THEN:
+            - show_on_dashboard and show_in_sidebar are False (default)
+        """
+        SavedView.objects.create(
+            owner=self.user,
+            name="test_view",
+            sort_field="created",
+        )
+        UiSettings.objects.filter(user=self.user).delete()
+
+        response = self.client.get(
+            "/api/saved_views/",
+            headers={"Accept": "application/json; version=9"},
+            format="json",
+        )
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+        result = response.data["results"][0]
+        self.assertFalse(result["show_on_dashboard"])
+        self.assertFalse(result["show_in_sidebar"])
+
+    def test_saved_view_api_version_9_create_writes_visibility_to_ui_settings(
+        self,
+    ) -> None:
+        """
+        GIVEN:
+            - No UiSettings for the current user
+        WHEN:
+            - A saved view is created through API version 9 with visibility flags
+        THEN:
+            - Visibility is persisted in UiSettings.saved_views
+        """
+        UiSettings.objects.filter(user=self.user).delete()
+
+        response = self.client.post(
+            "/api/saved_views/",
+            {
+                "name": "legacy-v9-create",
+                "sort_field": "created",
+                "filter_rules": [],
+                "show_on_dashboard": True,
+                "show_in_sidebar": False,
+            },
+            headers={"Accept": "application/json; version=9"},
+            format="json",
+        )
+        self.assertEqual(response.status_code, status.HTTP_201_CREATED)
+        self.assertTrue(response.data["show_on_dashboard"])
+        self.assertFalse(response.data["show_in_sidebar"])
+
+        self.user.refresh_from_db()
+        self.assertTrue(hasattr(self.user, "ui_settings"))
+        saved_view_settings = self.user.ui_settings.settings["saved_views"]
+        self.assertListEqual(
+            saved_view_settings["dashboard_views_visible_ids"],
+            [response.data["id"]],
+        )
+        self.assertListEqual(saved_view_settings["sidebar_views_visible_ids"], [])
+
+    def test_saved_view_api_version_9_patch_writes_visibility_to_ui_settings(
+        self,
+    ) -> None:
+        """
+        GIVEN:
+            - Existing saved views and UiSettings visibility ids
+        WHEN:
+            - A saved view is updated through API version 9 with visibility flags
+        THEN:
+            - The per-user UiSettings visibility ids are updated
+        """
+        v1 = SavedView.objects.create(
+            owner=self.user,
+            name="legacy-v9-patch-1",
+            sort_field="created",
+        )
+        v2 = SavedView.objects.create(
+            owner=self.user,
+            name="legacy-v9-patch-2",
+            sort_field="created",
+        )
+        UiSettings.objects.update_or_create(
+            user=self.user,
+            defaults={
+                "settings": {
+                    "saved_views": {
+                        "dashboard_views_visible_ids": [v1.id],
+                        "sidebar_views_visible_ids": [v1.id, v2.id],
+                    },
+                },
+            },
+        )
+
+        response = self.client.patch(
+            f"/api/saved_views/{v1.id}/",
+            {
+                "show_on_dashboard": False,
+            },
+            headers={"Accept": "application/json; version=9"},
+            format="json",
+        )
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+        self.assertFalse(response.data["show_on_dashboard"])
+        self.assertTrue(response.data["show_in_sidebar"])
+
+        self.user.refresh_from_db()
+        saved_view_settings = self.user.ui_settings.settings["saved_views"]
+        self.assertListEqual(saved_view_settings["dashboard_views_visible_ids"], [])
+        self.assertListEqual(
+            saved_view_settings["sidebar_views_visible_ids"],
+            [v1.id, v2.id],
+        )
+
+        response = self.client.patch(
+            f"/api/saved_views/{v1.id}/",
+            {
+                "show_in_sidebar": False,
+            },
+            headers={"Accept": "application/json; version=9"},
+            format="json",
+        )
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+        self.assertFalse(response.data["show_on_dashboard"])
+        self.assertFalse(response.data["show_in_sidebar"])
+
+        self.user.refresh_from_db()
+        saved_view_settings = self.user.ui_settings.settings["saved_views"]
+        self.assertListEqual(saved_view_settings["dashboard_views_visible_ids"], [])
+        self.assertListEqual(saved_view_settings["sidebar_views_visible_ids"], [v2.id])
+
    def test_saved_view_create_update_patch(self) -> None:
        User.objects.create_user("user1")

--- a/src/documents/tests/test_management_exporter.py
+++ b/src/documents/tests/test_management_exporter.py
@@ -753,6 +753,31 @@ class TestExportImport(
            call_command("document_importer", "--no-progress-bar", self.target)
            self.assertEqual(Document.objects.count(), 4)

+    def test_folder_prefix_with_split(self) -> None:
+        """
+        GIVEN:
+            - Request to export documents to directory
+        WHEN:
+            - Option use_folder_prefix is used
+            - Option split manifest is used
+        THEN:
+            - Documents can be imported again
+        """
+        shutil.rmtree(Path(self.dirs.media_dir) / "documents")
+        shutil.copytree(
+            Path(__file__).parent / "samples" / "documents",
+            Path(self.dirs.media_dir) / "documents",
+        )
+
+        self._do_export(use_folder_prefix=True, split_manifest=True)
+
+        with paperless_environment():
+            self.assertEqual(Document.objects.count(), 4)
+            Document.objects.all().delete()
+            self.assertEqual(Document.objects.count(), 0)
+            call_command("document_importer", "--no-progress-bar", self.target)
+            self.assertEqual(Document.objects.count(), 4)
+
    def test_import_db_transaction_failed(self) -> None:
        """
        GIVEN:
--- a/src/documents/tests/test_management_importer.py
+++ b/src/documents/tests/test_management_importer.py
@@ -119,15 +119,22 @@ class TestCommandImport(
            # No read permissions
            original_path.chmod(0o222)

+            manifest_path = Path(temp_dir) / "manifest.json"
+            manifest_path.write_text(
+                json.dumps(
+                    [
+                        {
+                            "model": "documents.document",
+                            EXPORTER_FILE_NAME: "original.pdf",
+                            EXPORTER_ARCHIVE_NAME: "archive.pdf",
+                        },
+                    ],
+                ),
+            )
+
            cmd = Command()
            cmd.source = Path(temp_dir)
-            cmd.manifest = [
-                {
-                    "model": "documents.document",
-                    EXPORTER_FILE_NAME: "original.pdf",
-                    EXPORTER_ARCHIVE_NAME: "archive.pdf",
-                },
-            ]
+            cmd.manifest_paths = [manifest_path]
            cmd.data_only = False
            with self.assertRaises(CommandError) as cm:
                cmd.check_manifest_validity()
@@ -296,7 +303,7 @@ class TestCommandImport(
        (self.dirs.scratch_dir / "manifest.json").touch()

        # We're not building a manifest, so it fails, but this test doesn't care
-        with self.assertRaises(json.decoder.JSONDecodeError):
+        with self.assertRaises(CommandError):
            call_command(
                "document_importer",
                "--no-progress-bar",
@@ -325,7 +332,7 @@ class TestCommandImport(
        )

        # We're not building a manifest, so it fails, but this test doesn't care
-        with self.assertRaises(json.decoder.JSONDecodeError):
+        with self.assertRaises(CommandError):
            call_command(
                "document_importer",
                "--no-progress-bar",
--- a/src/locale/en_US/LC_MESSAGES/django.po
+++ b/src/locale/en_US/LC_MESSAGES/django.po
@@ -2,7 +2,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: paperless-ngx\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-03-06 20:00+0000\n"
+"POT-Creation-Date: 2026-03-09 01:51+0000\n"
 "PO-Revision-Date: 2022-02-17 04:17\n"
 "Last-Translator: \n"
 "Language-Team: English\n"
@@ -1299,7 +1299,7 @@ msgstr ""
 msgid "workflow runs"
 msgstr ""

-#: documents/serialisers.py:463 documents/serialisers.py:2344
+#: documents/serialisers.py:463 documents/serialisers.py:2482
 msgid "Insufficient permissions."
 msgstr ""

@@ -1307,39 +1307,39 @@ msgstr ""
 msgid "Invalid color."
 msgstr ""

-#: documents/serialisers.py:1967
+#: documents/serialisers.py:2105
 #, python-format
 msgid "File type %(type)s not supported"
 msgstr ""

-#: documents/serialisers.py:2011
+#: documents/serialisers.py:2149
 #, python-format
 msgid "Custom field id must be an integer: %(id)s"
 msgstr ""

-#: documents/serialisers.py:2018
+#: documents/serialisers.py:2156
 #, python-format
 msgid "Custom field with id %(id)s does not exist"
 msgstr ""

-#: documents/serialisers.py:2035 documents/serialisers.py:2045
+#: documents/serialisers.py:2173 documents/serialisers.py:2183
 msgid ""
 "Custom fields must be a list of integers or an object mapping ids to values."
 msgstr ""

-#: documents/serialisers.py:2040
+#: documents/serialisers.py:2178
 msgid "Some custom fields don't exist or were specified twice."
 msgstr ""

-#: documents/serialisers.py:2187
+#: documents/serialisers.py:2325
 msgid "Invalid variable detected."
 msgstr ""

-#: documents/serialisers.py:2400
+#: documents/serialisers.py:2538
 msgid "Duplicate document identifiers are not allowed."
 msgstr ""

-#: documents/serialisers.py:2430 documents/views.py:3328
+#: documents/serialisers.py:2568 documents/views.py:3328
 #, python-format
 msgid "Documents not found: %(ids)s"
 msgstr ""
--- a/src/paperless/celery.py
+++ b/src/paperless/celery.py
@@ -1,6 +1,7 @@
 import os

 from celery import Celery
+from celery.signals import worker_process_init

 # Set the default Django settings module for the 'celery' program.
 os.environ.setdefault("DJANGO_SETTINGS_MODULE", "paperless.settings")
@@ -15,3 +16,18 @@ app.config_from_object("django.conf:settings", namespace="CELERY")

 # Load task modules from all registered Django apps.
 app.autodiscover_tasks()
+
+
+@worker_process_init.connect
+def on_worker_process_init(**kwargs) -> None:
+    """Register built-in parsers eagerly in each Celery worker process.
+
+    This registers only the built-in parsers (no entrypoint discovery) so
+    that workers can begin consuming documents immediately.  Entrypoint
+    discovery for third-party parsers is deferred to the first call of
+    ``get_parser_registry()`` inside a task, keeping ``worker_process_init``
+    well within its 4-second timeout budget.
+    """
+    from paperless.parsers.registry import init_builtin_parsers
+
+    init_builtin_parsers()
--- a/src/paperless/parsers/__init__.py
+++ b/src/paperless/parsers/__init__.py
@@ -0,0 +1,379 @@
+"""
+Public interface for the Paperless-ngx parser plugin system.
+
+This module defines ParserProtocol — the structural contract that every
+document parser must satisfy, whether it is a built-in parser shipped with
+Paperless-ngx or a third-party parser installed via a Python entrypoint.
+
+Phase 1/2 scope: only the Protocol is defined here. The transitional
+DocumentParser ABC (Phase 3) and concrete built-in parsers (Phase 3+) will
+be added in later phases, so there are intentionally no imports of parser
+implementations here.
+
+Usage example (third-party parser)::
+
+    from paperless.parsers import ParserProtocol
+
+    class MyParser:
+        name = "my-parser"
+        version = "1.0.0"
+        author = "Acme Corp"
+        url = "https://example.com/my-parser"
+
+        @classmethod
+        def supported_mime_types(cls) -> dict[str, str]:
+            return {"application/x-my-format": ".myf"}
+
+        @classmethod
+        def score(cls, mime_type, filename, path=None):
+            return 10
+
+        # … implement remaining protocol methods …
+
+    assert isinstance(MyParser(), ParserProtocol)
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+from typing import Protocol
+from typing import Self
+from typing import TypedDict
+from typing import runtime_checkable
+
+if TYPE_CHECKING:
+    import datetime
+    from pathlib import Path
+    from types import TracebackType
+
+__all__ = [
+    "MetadataEntry",
+    "ParserProtocol",
+]
+
+
+class MetadataEntry(TypedDict):
+    """A single metadata field extracted from a document.
+
+    All four keys are required. Values are always serialised to strings —
+    type-specific conversion (dates, integers, lists) is the responsibility
+    of the parser before returning.
+    """
+
+    namespace: str
+    """URI of the metadata namespace (e.g. 'http://ns.adobe.com/pdf/1.3/')."""
+
+    prefix: str
+    """Conventional namespace prefix (e.g. 'pdf', 'xmp', 'dc')."""
+
+    key: str
+    """Field name within the namespace (e.g. 'Author', 'CreateDate')."""
+
+    value: str
+    """String representation of the field value."""
+
+
+@runtime_checkable
+class ParserProtocol(Protocol):
+    """Structural contract for all Paperless-ngx document parsers.
+
+    Both built-in parsers and third-party plugins (discovered via the
+    "paperless_ngx.parsers" entrypoint group) must satisfy this Protocol.
+    Because it is decorated with runtime_checkable, isinstance(obj,
+    ParserProtocol) checks at runtime that every member is present (not
+    its signature), which is useful for validation in ParserRegistry.discover.
+
+    Parsers must expose four string attributes at the class level so the
+    registry can log attribution information without instantiating the parser:
+
+    name : str
+        Human-readable parser name (e.g. "Tesseract OCR").
+    version : str
+        Semantic version string (e.g. "1.2.3").
+    author : str
+        Author or organisation name.
+    url : str
+        URL for documentation, source code, or issue tracker.
+    """
+
+    # ------------------------------------------------------------------
+    # Class-level identity (checked by the registry, not Protocol methods)
+    # ------------------------------------------------------------------
+
+    name: str
+    version: str
+    author: str
+    url: str
+
+    # ------------------------------------------------------------------
+    # Class methods
+    # ------------------------------------------------------------------
+
+    @classmethod
+    def supported_mime_types(cls) -> dict[str, str]:
+        """Return a mapping of supported MIME types to preferred file extensions.
+
+        The keys are MIME type strings (e.g. "application/pdf"), and the
+        values are the preferred file extension including the leading dot
+        (e.g. ".pdf").  The registry uses this mapping both to decide whether
+        a parser is a candidate for a given file and to determine the default
+        extension when creating archive copies.
+
+        Returns
+        -------
+        dict[str, str]
+            {mime_type: extension} mapping — may be empty if the parser
+            has been temporarily disabled.
+        """
+        ...
+
+    @classmethod
+    def score(
+        cls,
+        mime_type: str,
+        filename: str,
+        path: Path | None = None,
+    ) -> int | None:
+        """Return a priority score for handling this file, or None to decline.
+
+        The registry calls this after confirming that the MIME type is in
+        supported_mime_types. Parsers may inspect filename and optionally
+        the file at path to refine their confidence level.
+
+        A higher score wins. Return None to explicitly decline handling a file
+        even though the MIME type is listed as supported (e.g. when a feature
+        flag is disabled, or a required service is not configured).
+
+        Parameters
+        ----------
+        mime_type:
+            The detected MIME type of the file to be parsed.
+        filename:
+            The original filename, including extension.
+        path:
+            Optional filesystem path to the file. Parsers that need to
+            inspect file content (e.g. magic-byte sniffing) may use this.
+            May be None when scoring happens before the file is available locally.
+
+        Returns
+        -------
+        int | None
+            Priority score (higher wins), or None to decline.
+        """
+        ...
+
+    # ------------------------------------------------------------------
+    # Properties
+    # ------------------------------------------------------------------
+
+    @property
+    def can_produce_archive(self) -> bool:
+        """Whether this parser can produce a searchable PDF archive copy.
+
+        If True, the consumption pipeline may request an archive version when
+        processing the document, subject to the ARCHIVE_FILE_GENERATION
+        setting. If False, only thumbnail and text extraction are performed.
+        """
+        ...
+
+    @property
+    def requires_pdf_rendition(self) -> bool:
+        """Whether the parser must produce a PDF for the frontend to display.
+
+        True for formats the browser cannot display natively (e.g. DOCX, ODT).
+        When True, the pipeline always stores the PDF output regardless of the
+        ARCHIVE_FILE_GENERATION setting, since the original format cannot be
+        shown to the user.
+        """
+        ...
+
+    # ------------------------------------------------------------------
+    # Core parsing interface
+    # ------------------------------------------------------------------
+
+    def parse(
+        self,
+        document_path: Path,
+        mime_type: str,
+        *,
+        produce_archive: bool = True,
+    ) -> None:
+        """Parse document_path and populate internal state.
+
+        After a successful call, callers retrieve results via get_text,
+        get_date, and get_archive_path.
+
+        Parameters
+        ----------
+        document_path:
+            Absolute path to the document file to parse.
+        mime_type:
+            Detected MIME type of the document.
+        produce_archive:
+            When True (the default) and can_produce_archive is also True,
+            the parser should produce a searchable PDF at the path returned
+            by get_archive_path. Pass False when only text extraction and
+            thumbnail generation are required and disk I/O should be minimised.
+
+        Raises
+        ------
+        documents.parsers.ParseError
+            If parsing fails for any reason.
+        """
+        ...
+
+    # ------------------------------------------------------------------
+    # Result accessors
+    # ------------------------------------------------------------------
+
+    def get_text(self) -> str | None:
+        """Return the plain-text content extracted during parse.
+
+        Returns
+        -------
+        str | None
+            Extracted text, or None if no text could be found.
+        """
+        ...
+
+    def get_date(self) -> datetime.datetime | None:
+        """Return the document date detected during parse.
+
+        Returns
+        -------
+        datetime.datetime | None
+            Detected document date, or None if no date was found.
+        """
+        ...
+
+    def get_archive_path(self) -> Path | None:
+        """Return the path to the generated archive PDF, or None.
+
+        Returns
+        -------
+        Path | None
+            Path to the searchable PDF archive, or None if no archive was
+            produced (e.g. because produce_archive=False or the parser does
+            not support archive generation).
+        """
+        ...
+
+    # ------------------------------------------------------------------
+    # Thumbnail and metadata
+    # ------------------------------------------------------------------
+
+    def get_thumbnail(self, document_path: Path, mime_type: str) -> Path:
+        """Generate and return the path to a thumbnail image for the document.
+
+        May be called independently of parse. The returned path must point to
+        an existing WebP image file inside the parser's temporary working
+        directory.
+
+        Parameters
+        ----------
+        document_path:
+            Absolute path to the source document.
+        mime_type:
+            Detected MIME type of the document.
+
+        Returns
+        -------
+        Path
+            Path to the generated WebP thumbnail image.
+        """
+        ...
+
+    def get_page_count(
+        self,
+        document_path: Path,
+        mime_type: str,
+    ) -> int | None:
+        """Return the number of pages in the document, if determinable.
+
+        Parameters
+        ----------
+        document_path:
+            Absolute path to the source document.
+        mime_type:
+            Detected MIME type of the document.
+
+        Returns
+        -------
+        int | None
+            Page count, or None if the parser cannot determine it.
+        """
+        ...
+
+    def extract_metadata(
+        self,
+        document_path: Path,
+        mime_type: str,
+    ) -> list[MetadataEntry]:
+        """Extract format-specific metadata from the document.
+
+        Called by the API view layer on demand — not during the consumption
+        pipeline. Results are returned to the frontend for per-file display.
+
+        For documents with an archive version, this method is called twice:
+        once for the original file (with its native MIME type) and once for
+        the archive file (with ``"application/pdf"``). Parsers that produce
+        archives should handle both cases.
+
+        Implementations must not raise. A failure to read metadata is not
+        fatal — log a warning and return whatever partial results were
+        collected, or ``[]`` if none.
+
+        Parameters
+        ----------
+        document_path:
+            Absolute path to the file to extract metadata from.
+        mime_type:
+            MIME type of the file at ``document_path``. May be
+            ``"application/pdf"`` when called for the archive version.
+
+        Returns
+        -------
+        list[MetadataEntry]
+            Zero or more metadata entries. Returns ``[]`` if no metadata
+            could be extracted or the format does not support it.
+        """
+        ...
+
+    # ------------------------------------------------------------------
+    # Context manager
+    # ------------------------------------------------------------------
+
+    def __enter__(self) -> Self:
+        """Enter the parser context, returning the parser instance.
+
+        Implementations should perform any resource allocation here if not
+        done in __init__ (e.g. creating API clients or temp directories).
+
+        Returns
+        -------
+        Self
+            The parser instance itself.
+        """
+        ...
+
+    def __exit__(
+        self,
+        exc_type: type[BaseException] | None,
+        exc_val: BaseException | None,
+        exc_tb: TracebackType | None,
+    ) -> None:
+        """Exit the parser context and release all resources.
+
+        Implementations must clean up all temporary files and other resources
+        regardless of whether an exception occurred.
+
+        Parameters
+        ----------
+        exc_type:
+            The exception class, or None if no exception was raised.
+        exc_val:
+            The exception instance, or None.
+        exc_tb:
+            The traceback, or None.
+        """
+        ...
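Note on the runtime_checkable decorator used in the file above: an isinstance check against such a Protocol only verifies that each protocol member exists on the object. It does not compare signatures or return types. The block below is a small sketch of that behaviour using a cut-down stand-in; MiniParserProtocol and the three example classes are hypothetical and are not part of this change.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class MiniParserProtocol(Protocol):
    """Cut-down, hypothetical stand-in for ParserProtocol."""

    name: str

    def get_text(self) -> Optional[str]: ...


class GoodParser:
    name = "good"

    def get_text(self) -> Optional[str]:
        return "hello"


class WrongSignature:
    name = "wrong"

    # The extra required argument does not match the protocol, but
    # isinstance only looks for an attribute called get_text.
    def get_text(self, extra):
        return None


class MissingMethod:
    # Has the identity attribute but no get_text at all.
    name = "missing"


checks = {
    cls.__name__: isinstance(cls(), MiniParserProtocol)
    for cls in (GoodParser, WrongSignature, MissingMethod)
}
```

Here checks maps GoodParser and WrongSignature to True and MissingMethod to False, so a registry that relies on isinstance alone would still accept a parser whose methods have the wrong shape. Static type checking is what catches that case.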
--- a/src/paperless/parsers/registry.py
+++ b/src/paperless/parsers/registry.py
@@ -0,0 +1,365 @@
+"""
+Singleton registry that tracks all document parsers available to
+Paperless-ngx — both built-ins shipped with the application and third-party
+plugins installed via Python entrypoints.
+
+Public surface
+--------------
+get_parser_registry
+    Lazy-initialise and return the shared ParserRegistry. This is the primary
+    entry point for production code.
+
+init_builtin_parsers
+    Register built-in parsers only, without entrypoint discovery. Safe to
+    call from Celery worker_process_init where importing all entrypoints
+    would be wasteful or cause side effects.
+
+reset_parser_registry
+    Reset module-level state. For tests only.
+
+Entrypoint group
+----------------
+Third-party parsers must advertise themselves under the
+"paperless_ngx.parsers" entrypoint group in their pyproject.toml::
+
+    [project.entry-points."paperless_ngx.parsers"]
+    my_parser = "my_package.parsers:MyParser"
+
+The loaded class must expose the following attributes at the class level
+(not just on instances) for the registry to accept it:
+name, version, author, url, supported_mime_types (callable), score (callable).
+"""
+
+from __future__ import annotations
+
+import logging
+from importlib.metadata import entry_points
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from pathlib import Path
+
+    from paperless.parsers import ParserProtocol
+
+logger = logging.getLogger("paperless.parsers.registry")
+
+# ---------------------------------------------------------------------------
+# Module-level singleton state
+# ---------------------------------------------------------------------------
+
+_registry: ParserRegistry | None = None
+_discovery_complete: bool = False
+
+# Attribute names that every registered external parser class must expose.
+_REQUIRED_ATTRS: tuple[str, ...] = (
+    "name",
+    "version",
+    "author",
+    "url",
+    "supported_mime_types",
+    "score",
+)
+
+
+# ---------------------------------------------------------------------------
+# Module-level accessor functions
+# ---------------------------------------------------------------------------
+
+
+def get_parser_registry() -> ParserRegistry:
+    """Return the shared ParserRegistry instance.
+
+    On the first call this function:
+
+    1. Creates a new ParserRegistry.
+    2. Calls register_defaults to install built-in parsers.
+    3. Calls discover to load third-party plugins via importlib.metadata entrypoints.
+    4. Calls log_summary to emit a startup summary.
+
+    Subsequent calls return the same instance immediately.
+
+    Returns
+    -------
+    ParserRegistry
+        The shared registry singleton.
+    """
+    global _registry, _discovery_complete
+
+    if _registry is None:
+        _registry = ParserRegistry()
+        _registry.register_defaults()
+
+    if not _discovery_complete:
+        _registry.discover()
+        _registry.log_summary()
+        _discovery_complete = True
+
+    return _registry
+
+
+def init_builtin_parsers() -> None:
+    """Register built-in parsers without performing entrypoint discovery.
+
+    Intended for use in Celery worker_process_init handlers where importing
+    all installed entrypoints would be wasteful, slow, or could produce
+    undesirable side effects. Entrypoint discovery (third-party plugins) is
+    deliberately not performed.
+
+    Safe to call multiple times — subsequent calls are no-ops.
+
+    Returns
+    -------
+    None
+    """
+    global _registry
+
+    if _registry is None:
+        _registry = ParserRegistry()
+        _registry.register_defaults()
+        _registry.log_summary()
+
+
+def reset_parser_registry() -> None:
+    """Reset the module-level registry state to its initial values.
+
+    Resets _registry and _discovery_complete so the next call to
+    get_parser_registry will re-initialise everything from scratch.
+
+    FOR TESTS ONLY. Do not call this in production code: after a reset, the
+    next get_parser_registry call rebuilds the registry and reruns entrypoint
+    discovery, which is expensive and may have unexpected side effects
+    in multi-threaded environments.
+
+    Returns
+    -------
+    None
+    """
+    global _registry, _discovery_complete
+
+    _registry = None
+    _discovery_complete = False
+
+
+# ---------------------------------------------------------------------------
+# Registry class
+# ---------------------------------------------------------------------------
+
+
+class ParserRegistry:
+    """Registry that maps MIME types to the best available parser class.
+
+    Parsers are partitioned into two lists:
+
+    _builtins
+        Parser classes registered via register_builtin (populated by
+        register_defaults).
+
+    _external
+        Parser classes loaded from installed Python entrypoints via discover.
+
+    When resolving a parser for a file, external parsers are evaluated
+    alongside built-in parsers using a uniform scoring mechanism. Both lists
+    are iterated together; the class with the highest score wins. If an
+    external parser wins, its attribution details are logged so users can
+    identify which third-party package handled their document.
+    """
+
+    def __init__(self) -> None:
+        self._external: list[type[ParserProtocol]] = []
+        self._builtins: list[type[ParserProtocol]] = []
+
+    # ------------------------------------------------------------------
+    # Registration
+    # ------------------------------------------------------------------
+
+    def register_builtin(self, parser_class: type[ParserProtocol]) -> None:
+        """Register a built-in parser class.
+
+        Built-in parsers are shipped with Paperless-ngx and are appended to
+        the _builtins list. External parsers never remove them from the list;
+        instead, scoring determines which parser wins for any given file.
+
+        Parameters
+        ----------
+        parser_class:
+            The parser class to register. Must satisfy ParserProtocol.
+        """
+        self._builtins.append(parser_class)
+
+    def register_defaults(self) -> None:
+        """Register the built-in parsers that ship with Paperless-ngx.
+
+        Each parser that has been migrated to the new ParserProtocol interface
+        is registered here.  Parsers are added in ascending weight order so
+        that log output is predictable; scoring determines which parser wins
+        at runtime regardless of registration order.
+        """
+        from paperless.parsers.text import TextDocumentParser
+
+        self.register_builtin(TextDocumentParser)
+
+    # ------------------------------------------------------------------
+    # Discovery
+    # ------------------------------------------------------------------
+
+    def discover(self) -> None:
+        """Load third-party parsers from the "paperless_ngx.parsers" entrypoint group.
+
+        For each advertised entrypoint the method:
+
+        1. Calls ep.load() to import the class.
+        2. Validates that the class exposes all required attributes.
+        3. On success, appends the class to _external and logs an info message.
+        4. On failure (import error or missing attributes), logs an appropriate
+           warning/error and continues to the next entrypoint.
+
+        Errors during discovery of a single parser do not prevent other parsers
+        from being loaded.
+
+        Returns
+        -------
+        None
+        """
+        eps = entry_points(group="paperless_ngx.parsers")
+
+        for ep in eps:
+            try:
+                parser_class = ep.load()
+            except Exception:
+                logger.exception(
+                    "Failed to load parser entrypoint '%s' — skipping.",
+                    ep.name,
+                )
+                continue
+
+            missing = [
+                attr for attr in _REQUIRED_ATTRS if not hasattr(parser_class, attr)
+            ]
+            if missing:
+                logger.warning(
+                    "Parser loaded from entrypoint '%s' is missing required "
+                    "attributes %r — skipping.",
+                    ep.name,
+                    missing,
+                )
+                continue
+
+            self._external.append(parser_class)
+            logger.info(
+                "Loaded third-party parser '%s' v%s by %s (entrypoint: '%s').",
+                parser_class.name,
+                parser_class.version,
+                parser_class.author,
+                ep.name,
+            )
+
+    # ------------------------------------------------------------------
+    # Summary logging
+    # ------------------------------------------------------------------
+
+    def log_summary(self) -> None:
+        """Log a startup summary of all registered parsers.
+
+        Built-in parsers are listed first, followed by any external parsers
+        discovered from entrypoints.  If no external parsers were found a
+        short informational message is logged instead of an empty list.
+
+        Returns
+        -------
+        None
+        """
+        logger.info(
+            "Built-in parsers (%d):",
+            len(self._builtins),
+        )
+        for cls in self._builtins:
+            logger.info(
+                "  [built-in] %s v%s — %s",
+                getattr(cls, "name", repr(cls)),
+                getattr(cls, "version", "unknown"),
+                getattr(cls, "url", "built-in"),
+            )
+
+        if not self._external:
+            logger.info("No third-party parsers discovered.")
+            return
+
+        logger.info(
+            "Third-party parsers (%d):",
+            len(self._external),
+        )
+        for cls in self._external:
+            logger.info(
+                "  [external] %s v%s by %s — report issues at %s",
+                getattr(cls, "name", repr(cls)),
+                getattr(cls, "version", "unknown"),
+                getattr(cls, "author", "unknown"),
+                getattr(cls, "url", "unknown"),
+            )
+
+    # ------------------------------------------------------------------
+    # Parser resolution
+    # ------------------------------------------------------------------
+
+    def get_parser_for_file(
+        self,
+        mime_type: str,
+        filename: str,
+        path: Path | None = None,
+    ) -> type[ParserProtocol] | None:
+        """Return the best parser class for the given file, or None.
+
+        All registered parsers (external first, then built-ins) are evaluated
+        against the file. A parser is eligible if mime_type appears in the dict
+        returned by its supported_mime_types classmethod, and its score
+        classmethod returns a non-None integer.
+
+        The parser with the highest score wins. When two parsers return the
+        same score, the one that appears earlier in the evaluation order wins
+        (external parsers are evaluated before built-ins, giving third-party
+        packages a chance to override defaults at equal priority).
+
+        When an external parser is selected, its identity is logged at INFO
+        level so operators can trace which package handled a document.
+
+        Parameters
+        ----------
+        mime_type:
+            The detected MIME type of the file.
+        filename:
+            The original filename, including extension.
+        path:
+            Optional filesystem path to the file. Forwarded to each
+            parser's score method.
+
+        Returns
+        -------
+        type[ParserProtocol] | None
+            The winning parser class, or None if no parser can handle the file.
+        """
+        best_score: int | None = None
+        best_parser: type[ParserProtocol] | None = None
+
+        # External parsers are placed first so that, at equal scores, an
+        # external parser wins over a built-in (first-seen policy).
+        for parser_class in (*self._external, *self._builtins):
+            if mime_type not in parser_class.supported_mime_types():
+                continue
+
+            score = parser_class.score(mime_type, filename, path)
+            if score is None:
+                continue
+
+            if best_score is None or score > best_score:
+                best_score = score
+                best_parser = parser_class
+
+        if best_parser is not None and best_parser in self._external:
+            logger.info(
+                "Document handled by third-party parser '%s' v%s — %s",
+                getattr(best_parser, "name", repr(best_parser)),
+                getattr(best_parser, "version", "unknown"),
+                getattr(best_parser, "url", "unknown"),
+            )
+
+        return best_parser
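
The selection rule in get_parser_for_file (highest score wins, and at equal scores the earlier candidate wins, with external parsers listed first) can be sketched in isolation as below. This is a minimal illustration, not the registry itself: FakeBuiltin, FakeExternal, and pick_parser are hypothetical names introduced only for this example.

```python
from typing import Optional


class FakeBuiltin:
    """Hypothetical built-in parser used only for this illustration."""

    name = "builtin-text"

    @classmethod
    def supported_mime_types(cls) -> dict:
        return {"text/plain": ".txt"}

    @classmethod
    def score(cls, mime_type: str, filename: str, path=None) -> Optional[int]:
        return 10


class FakeExternal(FakeBuiltin):
    """Hypothetical third-party parser that returns the same score."""

    name = "external-text"


def pick_parser(external, builtins, mime_type, filename, path=None):
    """Mirror the selection loop: highest score wins, first seen wins ties."""
    best_score = None
    best_parser = None
    # External candidates come first, so they win ties against built-ins.
    for parser_class in (*external, *builtins):
        if mime_type not in parser_class.supported_mime_types():
            continue
        score = parser_class.score(mime_type, filename, path)
        if score is None:
            continue
        # Strict ">" keeps the earlier candidate when scores are equal.
        if best_score is None or score > best_score:
            best_score, best_parser = score, parser_class
    return best_parser


winner = pick_parser([FakeExternal], [FakeBuiltin], "text/plain", "a.txt")
no_match = pick_parser([FakeExternal], [FakeBuiltin], "application/pdf", "a.pdf")
print(winner.name, no_match)
```

Because the comparison is strict, a third-party parser only needs to match a built-in's score to take over a MIME type, while a built-in that returns a higher score still wins.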
--- a/src/paperless/parsers/text.py
+++ b/src/paperless/parsers/text.py
@@ -0,0 +1,320 @@
+"""
+Built-in plain-text document parser.
+
+Handles text/plain, text/csv, and application/csv MIME types by reading the
+file content directly.  Thumbnails are generated by rendering a page-sized
+WebP image from the first 100,000 characters using Pillow.
+"""
+
+from __future__ import annotations
+
+import logging
+import shutil
+import tempfile
+from pathlib import Path
+from typing import TYPE_CHECKING
+from typing import Self
+
+from django.conf import settings
+from PIL import Image
+from PIL import ImageDraw
+from PIL import ImageFont
+
+from paperless.version import __full_version_str__
+
+if TYPE_CHECKING:
+    import datetime
+    from types import TracebackType
+
+    from paperless.parsers import MetadataEntry
+
+logger = logging.getLogger("paperless.parsing.text")
+
+_SUPPORTED_MIME_TYPES: dict[str, str] = {
+    "text/plain": ".txt",
+    "text/csv": ".csv",
+    "application/csv": ".csv",
+}
+
+
+class TextDocumentParser:
+    """Parse plain-text documents (txt, csv) for Paperless-ngx.
+
+    This parser reads the file content directly as UTF-8 text and renders a
+    simple thumbnail using Pillow.  It does not perform OCR and does not
+    produce a searchable PDF archive copy.
+
+    Class attributes
+    ----------------
+    name : str
+        Human-readable parser name.
+    version : str
+        Semantic version string, kept in sync with Paperless-ngx releases.
+    author : str
+        Maintainer name.
+    url : str
+        Issue tracker / source URL.
+    """
+
+    name: str = "Paperless-ngx Text Parser"
+    version: str = __full_version_str__
+    author: str = "Paperless-ngx Contributors"
+    url: str = "https://github.com/paperless-ngx/paperless-ngx"
+
+    # ------------------------------------------------------------------
+    # Class methods
+    # ------------------------------------------------------------------
+
+    @classmethod
+    def supported_mime_types(cls) -> dict[str, str]:
+        """Return the MIME types this parser handles.
+
+        Returns
+        -------
+        dict[str, str]
+            Mapping of MIME type to preferred file extension.
+        """
+        return _SUPPORTED_MIME_TYPES
+
+    @classmethod
+    def score(
+        cls,
+        mime_type: str,
+        filename: str,
+        path: Path | None = None,
+    ) -> int | None:
+        """Return the priority score for handling this file.
+
+        Parameters
+        ----------
+        mime_type:
+            Detected MIME type of the file.
+        filename:
+            Original filename including extension.
+        path:
+            Optional filesystem path. Not inspected by this parser.
+
+        Returns
+        -------
+        int | None
+            10 if the MIME type is supported, otherwise None.
+        """
+        if mime_type in _SUPPORTED_MIME_TYPES:
+            return 10
+        return None
+
+    # ------------------------------------------------------------------
+    # Properties
+    # ------------------------------------------------------------------
+
+    @property
+    def can_produce_archive(self) -> bool:
+        """Whether this parser can produce a searchable PDF archive copy.
+
+        Returns
+        -------
+        bool
+            Always False — the text parser does not produce a PDF archive.
+        """
+        return False
+
+    @property
+    def requires_pdf_rendition(self) -> bool:
+        """Whether the parser must produce a PDF for the frontend to display.
+
+        Returns
+        -------
+        bool
+            Always False — plain text files are displayable as-is.
+        """
+        return False
+
+    # ------------------------------------------------------------------
+    # Lifecycle
+    # ------------------------------------------------------------------
+
+    def __init__(self, logging_group: object = None) -> None:
+        settings.SCRATCH_DIR.mkdir(parents=True, exist_ok=True)
+        self._tempdir = Path(
+            tempfile.mkdtemp(prefix="paperless-", dir=settings.SCRATCH_DIR),
+        )
+        self._text: str | None = None
+
+    def __enter__(self) -> Self:
+        return self
+
+    def __exit__(
+        self,
+        exc_type: type[BaseException] | None,
+        exc_val: BaseException | None,
+        exc_tb: TracebackType | None,
+    ) -> None:
+        logger.debug("Cleaning up temporary directory %s", self._tempdir)
+        shutil.rmtree(self._tempdir, ignore_errors=True)
+
+    # ------------------------------------------------------------------
+    # Core parsing interface
+    # ------------------------------------------------------------------
+
+    def parse(
+        self,
+        document_path: Path,
+        mime_type: str,
+        *,
+        produce_archive: bool = True,
+    ) -> None:
+        """Read the document and store its text content.
+
+        Parameters
+        ----------
+        document_path:
+            Absolute path to the text file.
+        mime_type:
+            Detected MIME type of the document.
+        produce_archive:
+            Ignored — this parser never produces a PDF archive.
+
+        Raises
+        ------
+        OSError
+            If the file cannot be opened or read.
+        """
+        self._text = self._read_text(document_path)
+
+    # ------------------------------------------------------------------
+    # Result accessors
+    # ------------------------------------------------------------------
+
+    def get_text(self) -> str | None:
+        """Return the plain-text content extracted during parse.
+
+        Returns
+        -------
+        str | None
+            Extracted text, or None if parse has not been called yet.
+        """
+        return self._text
+
+    def get_date(self) -> datetime.datetime | None:
+        """Return the document date detected during parse.
+
+        Returns
+        -------
+        datetime.datetime | None
+            Always None — the text parser does not detect dates.
+        """
+        return None
+
+    def get_archive_path(self) -> Path | None:
+        """Return the path to a generated archive PDF, or None.
+
+        Returns
+        -------
+        Path | None
+            Always None — the text parser does not produce a PDF archive.
+        """
+        return None
+
+    # ------------------------------------------------------------------
+    # Thumbnail and metadata
+    # ------------------------------------------------------------------
+
+    def get_thumbnail(self, document_path: Path, mime_type: str) -> Path:
+        """Render the first portion of the document as a WebP thumbnail.
+
+        Parameters
+        ----------
+        document_path:
+            Absolute path to the source document.
+        mime_type:
+            Detected MIME type of the document.
+
+        Returns
+        -------
+        Path
+            Path to the generated WebP thumbnail inside the temporary directory.
+        """
+        max_chars = 100_000
+        file_size_limit = 50 * 1024 * 1024
+
+        if document_path.stat().st_size > file_size_limit:
+            text = "[File too large to preview]"
+        else:
+            with document_path.open("r", encoding="utf-8", errors="replace") as f:
+                text = f.read(max_chars)
+
+        img = Image.new("RGB", (500, 700), color="white")
+        draw = ImageDraw.Draw(img)
+        font = ImageFont.truetype(
+            font=settings.THUMBNAIL_FONT_NAME,
+            size=20,
+            layout_engine=ImageFont.Layout.BASIC,
+        )
+        draw.multiline_text((5, 5), text, font=font, fill="black", spacing=4)
+
+        out_path = self._tempdir / "thumb.webp"
+        img.save(out_path, format="WEBP")
+
+        return out_path
+
+    def get_page_count(
+        self,
+        document_path: Path,
+        mime_type: str,
+    ) -> int | None:
+        """Return the number of pages in the document.
+
+        Parameters
+        ----------
+        document_path:
+            Absolute path to the source document.
+        mime_type:
+            Detected MIME type of the document.
+
+        Returns
+        -------
+        int | None
+            Always None — page count is not meaningful for plain text.
+        """
+        return None
+
+    def extract_metadata(
+        self,
+        document_path: Path,
+        mime_type: str,
+    ) -> list[MetadataEntry]:
+        """Extract format-specific metadata from the document.
+
+        Returns
+        -------
+        list[MetadataEntry]
+            Always ``[]`` — plain text files carry no structured metadata.
+        """
+        return []
+
+    # ------------------------------------------------------------------
+    # Private helpers
+    # ------------------------------------------------------------------
+
+    def _read_text(self, filepath: Path) -> str:
+        """Read file content, replacing invalid UTF-8 bytes rather than failing.
+
+        Parameters
+        ----------
+        filepath:
+            Path to the file to read.
+
+        Returns
+        -------
+        str
+            File content as a string.
+        """
+        try:
+            return filepath.read_text(encoding="utf-8")
+        except UnicodeDecodeError as exc:
+            logger.warning(
+                "Unicode error reading %s, replacing bad bytes: %s",
+                filepath,
+                exc,
+            )
+            return filepath.read_bytes().decode("utf-8", errors="replace")
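
The fallback in _read_text (strict UTF-8 first, then a second decode that replaces bad bytes) can be tried outside the parser with a short sketch like the one below. The helper name read_text_lenient and the temporary file are assumptions made for the example; the byte string mirrors the decode_error.txt sample described in the tests.

```python
import tempfile
from pathlib import Path


def read_text_lenient(filepath: Path) -> str:
    """Try strict UTF-8 first; on failure, decode again replacing bad bytes."""
    try:
        return filepath.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        # Each undecodable byte becomes U+FFFD (the replacement character).
        return filepath.read_bytes().decode("utf-8", errors="replace")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "decode_error.txt"
    # 0xE4 is "ä" in Latin-1. In UTF-8 it would start a three-byte sequence,
    # so followed directly by "u" it is invalid.
    sample.write_bytes(b"Pantothens\xe4ure\n")
    text = read_text_lenient(sample)

print(repr(text))
```

The file is read twice only on the failure path, so well-formed UTF-8 input pays no extra cost.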
--- a/src/paperless/tests/conftest.py
+++ b/src/paperless/tests/conftest.py
@@ -0,0 +1,48 @@
+"""
+Fixtures defined here are available to every test module under
+src/paperless/tests/ (including sub-packages such as parsers/).
+
+Session-scoped fixtures for the shared samples directory live here so
+sub-package conftest files can reference them without duplicating path logic.
+Parser-specific fixtures (concrete parser instances, format-specific sample
+files) live in paperless/tests/parsers/conftest.py.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from typing import TYPE_CHECKING
+
+import pytest
+
+from paperless.parsers.registry import reset_parser_registry
+
+if TYPE_CHECKING:
+    from collections.abc import Generator
+
+
+@pytest.fixture(scope="session")
+def samples_dir() -> Path:
+    """Absolute path to the shared parser sample files directory.
+
+    Sub-package conftest files derive format-specific paths from this root,
+    e.g. ``samples_dir / "text" / "test.txt"``.
+
+    Returns
+    -------
+    Path
+        Directory containing all sample documents used by parser tests.
+    """
+    return (Path(__file__).parent / "samples").resolve()
+
+
+@pytest.fixture(autouse=True)
+def clean_registry() -> Generator[None, None, None]:
+    """Reset the parser registry before and after every test.
+
+    This prevents registry state from leaking between tests that call
+    get_parser_registry() or init_builtin_parsers().
+    """
+    reset_parser_registry()
+    yield
+    reset_parser_registry()
--- a/src/paperless/tests/parsers/__init__.py
+++ b/src/paperless/tests/parsers/__init__.py
--- a/src/paperless/tests/parsers/conftest.py
+++ b/src/paperless/tests/parsers/conftest.py
@@ -0,0 +1,76 @@
+"""
+Parser fixtures that are used across multiple test modules in this package
+are defined here.  Format-specific sample-file fixtures are grouped by parser
+so it is easy to see which files belong to which test module.
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+import pytest
+
+from paperless.parsers.text import TextDocumentParser
+
+if TYPE_CHECKING:
+    from collections.abc import Generator
+    from pathlib import Path
+
+
+# ------------------------------------------------------------------
+# Text parser sample files
+# ------------------------------------------------------------------
+
+
+@pytest.fixture(scope="session")
+def text_samples_dir(samples_dir: Path) -> Path:
+    """Absolute path to the text parser sample files directory.
+
+    Returns
+    -------
+    Path
+        ``<samples_dir>/text/``
+    """
+    return samples_dir / "text"
+
+
+@pytest.fixture(scope="session")
+def sample_txt_file(text_samples_dir: Path) -> Path:
+    """Path to a valid UTF-8 plain-text sample file.
+
+    Returns
+    -------
+    Path
+        Absolute path to ``text/test.txt``.
+    """
+    return text_samples_dir / "test.txt"
+
+
+@pytest.fixture(scope="session")
+def malformed_txt_file(text_samples_dir: Path) -> Path:
+    """Path to a text file containing invalid UTF-8 bytes.
+
+    Returns
+    -------
+    Path
+        Absolute path to ``text/decode_error.txt``.
+    """
+    return text_samples_dir / "decode_error.txt"
+
+
+# ------------------------------------------------------------------
+# Text parser instance
+# ------------------------------------------------------------------
+
+
+@pytest.fixture()
+def text_parser() -> Generator[TextDocumentParser, None, None]:
+    """Yield a TextDocumentParser and clean up its temporary directory afterwards.
+
+    Yields
+    ------
+    TextDocumentParser
+        A ready-to-use parser instance.
+    """
+    with TextDocumentParser() as parser:
+        yield parser
--- a/src/paperless/tests/parsers/test_text_parser.py
+++ b/src/paperless/tests/parsers/test_text_parser.py
@@ -0,0 +1,256 @@
+"""
+Tests for paperless.parsers.text.TextDocumentParser.
+
+All tests use the context-manager protocol for parser lifecycle.  Sample
+files are provided by session-scoped fixtures defined in conftest.py.
+"""
+
+from __future__ import annotations
+
+import tempfile
+from pathlib import Path
+
+import pytest
+
+from paperless.parsers import ParserProtocol
+from paperless.parsers.text import TextDocumentParser
+
+
+class TestTextParserProtocol:
+    """Verify that TextDocumentParser satisfies the ParserProtocol contract."""
+
+    def test_isinstance_satisfies_protocol(
+        self,
+        text_parser: TextDocumentParser,
+    ) -> None:
+        assert isinstance(text_parser, ParserProtocol)
+
+    def test_class_attributes_present(self) -> None:
+        assert isinstance(TextDocumentParser.name, str) and TextDocumentParser.name
+        assert (
+            isinstance(TextDocumentParser.version, str) and TextDocumentParser.version
+        )
+        assert isinstance(TextDocumentParser.author, str) and TextDocumentParser.author
+        assert isinstance(TextDocumentParser.url, str) and TextDocumentParser.url
+
+    def test_supported_mime_types_returns_dict(self) -> None:
+        mime_types = TextDocumentParser.supported_mime_types()
+        assert isinstance(mime_types, dict)
+        assert "text/plain" in mime_types
+        assert "text/csv" in mime_types
+        assert "application/csv" in mime_types
+
+    @pytest.mark.parametrize(
+        ("mime_type", "expected"),
+        [
+            ("text/plain", 10),
+            ("text/csv", 10),
+            ("application/csv", 10),
+            ("application/pdf", None),
+            ("image/png", None),
+        ],
+    )
+    def test_score(self, mime_type: str, expected: int | None) -> None:
+        assert TextDocumentParser.score(mime_type, "file.txt") == expected
+
+    def test_can_produce_archive_is_false(
+        self,
+        text_parser: TextDocumentParser,
+    ) -> None:
+        assert text_parser.can_produce_archive is False
+
+    def test_requires_pdf_rendition_is_false(
+        self,
+        text_parser: TextDocumentParser,
+    ) -> None:
+        assert text_parser.requires_pdf_rendition is False
+
+
+class TestTextParserLifecycle:
+    """Verify context-manager behaviour and temporary directory cleanup."""
+
+    def test_context_manager_cleans_up_tempdir(self) -> None:
+        with TextDocumentParser() as parser:
+            tempdir = parser._tempdir
+            assert tempdir.exists()
+        assert not tempdir.exists()
+
+    def test_context_manager_cleans_up_after_exception(self) -> None:
+        tempdir: Path | None = None
+        with pytest.raises(RuntimeError):
+            with TextDocumentParser() as parser:
+                tempdir = parser._tempdir
+                raise RuntimeError("boom")
+        assert tempdir is not None
+        assert not tempdir.exists()
+
+
+class TestTextParserParse:
+    """Verify parse() and the result accessors."""
+
+    def test_parse_valid_utf8(
+        self,
+        text_parser: TextDocumentParser,
+        sample_txt_file: Path,
+    ) -> None:
+        text_parser.parse(sample_txt_file, "text/plain")
+
+        assert text_parser.get_text() == "This is a test file.\n"
+
+    def test_parse_returns_none_for_archive_path(
+        self,
+        text_parser: TextDocumentParser,
+        sample_txt_file: Path,
+    ) -> None:
+        text_parser.parse(sample_txt_file, "text/plain")
+
+        assert text_parser.get_archive_path() is None
+
+    def test_parse_returns_none_for_date(
+        self,
+        text_parser: TextDocumentParser,
+        sample_txt_file: Path,
+    ) -> None:
+        text_parser.parse(sample_txt_file, "text/plain")
+
+        assert text_parser.get_date() is None
+
+    def test_parse_invalid_utf8_bytes_replaced(
+        self,
+        text_parser: TextDocumentParser,
+        malformed_txt_file: Path,
+    ) -> None:
+        """
+        GIVEN:
+            - A text file containing invalid UTF-8 byte sequences
+        WHEN:
+            - The file is parsed
+        THEN:
+            - Parsing succeeds
+            - Invalid bytes are replaced with the Unicode replacement character
+        """
+        text_parser.parse(malformed_txt_file, "text/plain")
+
+        assert text_parser.get_text() == "Pantothens\ufffdure\n"
+
+    def test_get_text_none_before_parse(
+        self,
+        text_parser: TextDocumentParser,
+    ) -> None:
+        assert text_parser.get_text() is None
+
+
+class TestTextParserThumbnail:
+    """Verify thumbnail generation."""
+
+    def test_thumbnail_exists_and_is_file(
+        self,
+        text_parser: TextDocumentParser,
+        sample_txt_file: Path,
+    ) -> None:
+        thumb = text_parser.get_thumbnail(sample_txt_file, "text/plain")
+
+        assert thumb.exists()
+        assert thumb.is_file()
+
+    def test_thumbnail_large_file_does_not_read_all(
+        self,
+        text_parser: TextDocumentParser,
+    ) -> None:
+        """
+        GIVEN:
+            - A text file larger than 50 MB
+        WHEN:
+            - A thumbnail is requested
+        THEN:
+            - The thumbnail is generated without loading the full file
+        """
+        with tempfile.NamedTemporaryFile(
+            delete=False,
+            mode="w",
+            encoding="utf-8",
+            suffix=".txt",
+        ) as tmp:
+            tmp.write("A" * (51 * 1024 * 1024))
+            large_file = Path(tmp.name)
+
+        try:
+            thumb = text_parser.get_thumbnail(large_file, "text/plain")
+            assert thumb.exists()
+            assert thumb.is_file()
+        finally:
+            large_file.unlink(missing_ok=True)
+
+    def test_get_page_count_returns_none(
+        self,
+        text_parser: TextDocumentParser,
+        sample_txt_file: Path,
+    ) -> None:
+        assert text_parser.get_page_count(sample_txt_file, "text/plain") is None
+
+
+class TestTextParserMetadata:
+    """Verify extract_metadata behaviour."""
+
+    def test_extract_metadata_returns_empty_list(
+        self,
+        text_parser: TextDocumentParser,
+        sample_txt_file: Path,
+    ) -> None:
+        result = text_parser.extract_metadata(sample_txt_file, "text/plain")
+
+        assert result == []
+
+    def test_extract_metadata_returns_list_type(
+        self,
+        text_parser: TextDocumentParser,
+        sample_txt_file: Path,
+    ) -> None:
+        result = text_parser.extract_metadata(sample_txt_file, "text/plain")
+
+        assert isinstance(result, list)
+
+    def test_extract_metadata_ignores_mime_type(
+        self,
+        text_parser: TextDocumentParser,
+        sample_txt_file: Path,
+    ) -> None:
+        """extract_metadata returns [] regardless of the mime_type argument."""
+        assert text_parser.extract_metadata(sample_txt_file, "application/pdf") == []
+        assert text_parser.extract_metadata(sample_txt_file, "text/csv") == []
+
+
+class TestTextParserRegistry:
+    """Verify that TextDocumentParser is registered by default."""
+
+    def test_registered_in_defaults(self) -> None:
+        from paperless.parsers.registry import ParserRegistry
+
+        registry = ParserRegistry()
+        registry.register_defaults()
+
+        assert TextDocumentParser in registry._builtins
+
+    def test_get_parser_for_text_plain(self) -> None:
+        from paperless.parsers.registry import get_parser_registry
+
+        registry = get_parser_registry()
+        parser_cls = registry.get_parser_for_file("text/plain", "doc.txt")
+
+        assert parser_cls is TextDocumentParser
+
+    def test_get_parser_for_text_csv(self) -> None:
+        from paperless.parsers.registry import get_parser_registry
+
+        registry = get_parser_registry()
+        parser_cls = registry.get_parser_for_file("text/csv", "data.csv")
+
+        assert parser_cls is TextDocumentParser
+
+    def test_get_parser_for_unknown_type_returns_none(self) -> None:
+        from paperless.parsers.registry import get_parser_registry
+
+        registry = get_parser_registry()
+        parser_cls = registry.get_parser_for_file("application/pdf", "doc.pdf")
+
+        assert parser_cls is None
--- a/src/paperless/tests/samples/text/decode_error.txt
+++ b/src/paperless/tests/samples/text/decode_error.txt
@@ -0,0 +1 @@
+Pantothensäure
--- a/src/paperless/tests/samples/text/test.txt
+++ b/src/paperless/tests/samples/text/test.txt
@@ -0,0 +1 @@
+This is a test file.
--- a/src/paperless/tests/test_adapter.py
+++ b/src/paperless/tests/test_adapter.py
@@ -1,107 +1,100 @@
-from unittest import mock
+import logging

+import pytest
 from allauth.account.adapter import get_adapter
 from allauth.core import context
 from allauth.socialaccount.adapter import get_adapter as get_social_adapter
-from django.conf import settings
 from django.contrib.auth.models import AnonymousUser
 from django.contrib.auth.models import Group
 from django.contrib.auth.models import User
 from django.forms import ValidationError
 from django.http import HttpRequest
-from django.test import TestCase
-from django.test import override_settings
 from django.urls import reverse
+from pytest_django.fixtures import SettingsWrapper
+from pytest_mock import MockerFixture
 from rest_framework.authtoken.models import Token

 from paperless.adapter import DrfTokenStrategy


-class TestCustomAccountAdapter(TestCase):
-    def test_is_open_for_signup(self) -> None:
+@pytest.mark.django_db
+class TestCustomAccountAdapter:
+    def test_is_open_for_signup(self, settings: SettingsWrapper) -> None:
        adapter = get_adapter()

        # With no accounts, signups should be allowed
-        self.assertTrue(adapter.is_open_for_signup(None))
+        assert adapter.is_open_for_signup(None)

        User.objects.create_user("testuser")

-        # Test when ACCOUNT_ALLOW_SIGNUPS is True
        settings.ACCOUNT_ALLOW_SIGNUPS = True
-        self.assertTrue(adapter.is_open_for_signup(None))
+        assert adapter.is_open_for_signup(None)

-        # Test when ACCOUNT_ALLOW_SIGNUPS is False
        settings.ACCOUNT_ALLOW_SIGNUPS = False
-        self.assertFalse(adapter.is_open_for_signup(None))
+        assert not adapter.is_open_for_signup(None)

-    def test_is_safe_url(self) -> None:
+    def test_is_safe_url(self, settings: SettingsWrapper) -> None:
        request = HttpRequest()
-        request.get_host = mock.Mock(return_value="example.com")
+        request.get_host = lambda: "example.com"
        with context.request_context(request):
            adapter = get_adapter()
-            with override_settings(ALLOWED_HOSTS=["*"]):
-                # True because request host is same
-                url = "https://example.com"
-                self.assertTrue(adapter.is_safe_url(url))

-            url = "https://evil.com"
+            settings.ALLOWED_HOSTS = ["*"]
+            # True because request host is same
+            assert adapter.is_safe_url("https://example.com")
            # False despite wildcard because request host is different
-            self.assertFalse(adapter.is_safe_url(url))
+            assert not adapter.is_safe_url("https://evil.com")

            settings.ALLOWED_HOSTS = ["example.com"]
-            url = "https://example.com"
            # True because request host is same
-            self.assertTrue(adapter.is_safe_url(url))
+            assert adapter.is_safe_url("https://example.com")

            settings.ALLOWED_HOSTS = ["*", "example.com"]
-            url = "//evil.com"
            # False because request host is not in allowed hosts
-            self.assertFalse(adapter.is_safe_url(url))
+            assert not adapter.is_safe_url("//evil.com")

-    @mock.patch("allauth.core.internal.ratelimit.consume", return_value=True)
-    def test_pre_authenticate(self, mock_consume) -> None:
+    def test_pre_authenticate(
+        self,
+        settings: SettingsWrapper,
+        mocker: MockerFixture,
+    ) -> None:
+        mocker.patch("allauth.core.internal.ratelimit.consume", return_value=True)
        adapter = get_adapter()
        request = HttpRequest()
-        request.get_host = mock.Mock(return_value="example.com")
+        request.get_host = lambda: "example.com"

        settings.DISABLE_REGULAR_LOGIN = False
        adapter.pre_authenticate(request)

        settings.DISABLE_REGULAR_LOGIN = True
-        with self.assertRaises(ValidationError):
+        with pytest.raises(ValidationError):
            adapter.pre_authenticate(request)

-    def test_get_reset_password_from_key_url(self) -> None:
+    def test_get_reset_password_from_key_url(self, settings: SettingsWrapper) -> None:
        request = HttpRequest()
-        request.get_host = mock.Mock(return_value="foo.org")
+        request.get_host = lambda: "foo.org"
        with context.request_context(request):
            adapter = get_adapter()

-            # Test when PAPERLESS_URL is None
-            with override_settings(
-                PAPERLESS_URL=None,
-                ACCOUNT_DEFAULT_HTTP_PROTOCOL="https",
-            ):
-                expected_url = f"https://foo.org{reverse('account_reset_password_from_key', kwargs={'uidb36': 'UID', 'key': 'KEY'})}"
-                self.assertEqual(
-                    adapter.get_reset_password_from_key_url("UID-KEY"),
-                    expected_url,
-                )
+            settings.PAPERLESS_URL = None
+            settings.ACCOUNT_DEFAULT_HTTP_PROTOCOL = "https"
+            expected_url = f"https://foo.org{reverse('account_reset_password_from_key', kwargs={'uidb36': 'UID', 'key': 'KEY'})}"
+            assert adapter.get_reset_password_from_key_url("UID-KEY") == expected_url

-            # Test when PAPERLESS_URL is not None
-            with override_settings(PAPERLESS_URL="https://bar.com"):
-                expected_url = f"https://bar.com{reverse('account_reset_password_from_key', kwargs={'uidb36': 'UID', 'key': 'KEY'})}"
-                self.assertEqual(
-                    adapter.get_reset_password_from_key_url("UID-KEY"),
-                    expected_url,
-                )
+            settings.PAPERLESS_URL = "https://bar.com"
+            expected_url = f"https://bar.com{reverse('account_reset_password_from_key', kwargs={'uidb36': 'UID', 'key': 'KEY'})}"
+            assert adapter.get_reset_password_from_key_url("UID-KEY") == expected_url

-    @override_settings(ACCOUNT_DEFAULT_GROUPS=["group1", "group2"])
-    def test_save_user_adds_groups(self) -> None:
+    def test_save_user_adds_groups(
+        self,
+        settings: SettingsWrapper,
+        mocker: MockerFixture,
+    ) -> None:
+        settings.ACCOUNT_DEFAULT_GROUPS = ["group1", "group2"]
        Group.objects.create(name="group1")
        user = User.objects.create_user("testuser")
        adapter = get_adapter()
-        form = mock.Mock(
+        form = mocker.MagicMock(
            cleaned_data={
                "username": "testuser",
                "email": "user@example.com",
@@ -110,88 +103,81 @@ class TestCustomAccountAdapter(TestCase):

        user = adapter.save_user(HttpRequest(), user, form, commit=True)

-        self.assertEqual(user.groups.count(), 1)
-        self.assertTrue(user.groups.filter(name="group1").exists())
-        self.assertFalse(user.groups.filter(name="group2").exists())
+        assert user.groups.count() == 1
+        assert user.groups.filter(name="group1").exists()
+        assert not user.groups.filter(name="group2").exists()

-    def test_fresh_install_save_creates_superuser(self) -> None:
+    def test_fresh_install_save_creates_superuser(self, mocker: MockerFixture) -> None:
        adapter = get_adapter()
-        form = mock.Mock(
+        form = mocker.MagicMock(
            cleaned_data={
                "username": "testuser",
                "email": "user@paperless-ngx.com",
            },
        )
        user = adapter.save_user(HttpRequest(), User(), form, commit=True)
-        self.assertTrue(user.is_superuser)
+        assert user.is_superuser

-        # Next time, it should not create a superuser
-        form = mock.Mock(
+        form = mocker.MagicMock(
            cleaned_data={
                "username": "testuser2",
                "email": "user2@paperless-ngx.com",
            },
        )
        user2 = adapter.save_user(HttpRequest(), User(), form, commit=True)
-        self.assertFalse(user2.is_superuser)
+        assert not user2.is_superuser


-class TestCustomSocialAccountAdapter(TestCase):
-    def test_is_open_for_signup(self) -> None:
+class TestCustomSocialAccountAdapter:
+    @pytest.mark.django_db
+    def test_is_open_for_signup(self, settings: SettingsWrapper) -> None:
        adapter = get_social_adapter()

-        # Test when SOCIALACCOUNT_ALLOW_SIGNUPS is True
        settings.SOCIALACCOUNT_ALLOW_SIGNUPS = True
-        self.assertTrue(adapter.is_open_for_signup(None, None))
+        assert adapter.is_open_for_signup(None, None)

-        # Test when SOCIALACCOUNT_ALLOW_SIGNUPS is False
        settings.SOCIALACCOUNT_ALLOW_SIGNUPS = False
-        self.assertFalse(adapter.is_open_for_signup(None, None))
+        assert not adapter.is_open_for_signup(None, None)

    def test_get_connect_redirect_url(self) -> None:
        adapter = get_social_adapter()
-        request = None
-        socialaccount = None
+        assert adapter.get_connect_redirect_url(None, None) == reverse("base")

-        # Test the default URL
-        expected_url = reverse("base")
-        self.assertEqual(
-            adapter.get_connect_redirect_url(request, socialaccount),
-            expected_url,
-        )
-
-    @override_settings(SOCIAL_ACCOUNT_DEFAULT_GROUPS=["group1", "group2"])
-    def test_save_user_adds_groups(self) -> None:
+    @pytest.mark.django_db
+    def test_save_user_adds_groups(
+        self,
+        settings: SettingsWrapper,
+        mocker: MockerFixture,
+    ) -> None:
+        settings.SOCIAL_ACCOUNT_DEFAULT_GROUPS = ["group1", "group2"]
        Group.objects.create(name="group1")
        adapter = get_social_adapter()
-        request = HttpRequest()
        user = User.objects.create_user("testuser")
-        sociallogin = mock.Mock(
-            user=user,
-        )
+        sociallogin = mocker.MagicMock(user=user)

-        user = adapter.save_user(request, sociallogin, None)
+        user = adapter.save_user(HttpRequest(), sociallogin, None)

-        self.assertEqual(user.groups.count(), 1)
-        self.assertTrue(user.groups.filter(name="group1").exists())
-        self.assertFalse(user.groups.filter(name="group2").exists())
+        assert user.groups.count() == 1
+        assert user.groups.filter(name="group1").exists()
+        assert not user.groups.filter(name="group2").exists()

-    def test_error_logged_on_authentication_error(self) -> None:
+    def test_error_logged_on_authentication_error(
+        self,
+        caplog: pytest.LogCaptureFixture,
+    ) -> None:
        adapter = get_social_adapter()
-        request = HttpRequest()
-        with self.assertLogs("paperless.auth", level="INFO") as log_cm:
+        with caplog.at_level(logging.INFO, logger="paperless.auth"):
            adapter.on_authentication_error(
-                request,
+                HttpRequest(),
                provider="test-provider",
                error="Error",
                exception="Test authentication error",
            )
-        self.assertTrue(
-            any("Test authentication error" in message for message in log_cm.output),
-        )
+        assert any("Test authentication error" in msg for msg in caplog.messages)


-class TestDrfTokenStrategy(TestCase):
+@pytest.mark.django_db
+class TestDrfTokenStrategy:
    def test_create_access_token_creates_new_token(self) -> None:
        """
        GIVEN:
@@ -201,7 +187,6 @@ class TestDrfTokenStrategy(TestCase):
        THEN:
            - A new token is created and its key is returned
        """
-
        user = User.objects.create_user("testuser")
        request = HttpRequest()
        request.user = user
@@ -209,13 +194,9 @@ class TestDrfTokenStrategy(TestCase):
        strategy = DrfTokenStrategy()
        token_key = strategy.create_access_token(request)

-        # Verify a token was created
-        self.assertIsNotNone(token_key)
-        self.assertTrue(Token.objects.filter(user=user).exists())
-
-        # Verify the returned key matches the created token
-        token = Token.objects.get(user=user)
-        self.assertEqual(token_key, token.key)
+        assert token_key is not None
+        assert Token.objects.filter(user=user).exists()
+        assert token_key == Token.objects.get(user=user).key

    def test_create_access_token_returns_existing_token(self) -> None:
        """
@@ -226,7 +207,6 @@ class TestDrfTokenStrategy(TestCase):
        THEN:
            - The same token key is returned (no new token created)
        """
-
        user = User.objects.create_user("testuser")
        existing_token = Token.objects.create(user=user)

@@ -236,11 +216,8 @@ class TestDrfTokenStrategy(TestCase):
        strategy = DrfTokenStrategy()
        token_key = strategy.create_access_token(request)

-        # Verify the existing token key is returned
-        self.assertEqual(token_key, existing_token.key)
-
-        # Verify only one token exists (no duplicate created)
-        self.assertEqual(Token.objects.filter(user=user).count(), 1)
+        assert token_key == existing_token.key
+        assert Token.objects.filter(user=user).count() == 1

    def test_create_access_token_returns_none_for_unauthenticated_user(self) -> None:
        """
@@ -251,12 +228,11 @@ class TestDrfTokenStrategy(TestCase):
        THEN:
            - None is returned and no token is created
        """
-
        request = HttpRequest()
        request.user = AnonymousUser()

        strategy = DrfTokenStrategy()
        token_key = strategy.create_access_token(request)

-        self.assertIsNone(token_key)
-        self.assertEqual(Token.objects.count(), 0)
+        assert token_key is None
+        assert Token.objects.count() == 0
--- a/src/paperless/tests/test_checks.py
+++ b/src/paperless/tests/test_checks.py
@@ -1,16 +1,15 @@
 import os
+from collections.abc import Callable
+from dataclasses import dataclass
 from pathlib import Path
 from unittest import mock

 import pytest
 from django.core.checks import Error
 from django.core.checks import Warning
-from django.test import TestCase
-from django.test import override_settings
+from pytest_django.fixtures import SettingsWrapper
 from pytest_mock import MockerFixture

-from documents.tests.utils import DirectoriesMixin
-from documents.tests.utils import FileSystemAssertsMixin
 from paperless.checks import audit_log_check
 from paperless.checks import binaries_check
 from paperless.checks import check_deprecated_db_settings
@@ -20,54 +19,84 @@ from paperless.checks import paths_check
 from paperless.checks import settings_values_check


-class TestChecks(DirectoriesMixin, TestCase):
-    def test_binaries(self) -> None:
-        self.assertEqual(binaries_check(None), [])
+@dataclass(frozen=True, slots=True)
+class PaperlessTestDirs:
+    data_dir: Path
+    media_dir: Path
+    consumption_dir: Path

-    @override_settings(CONVERT_BINARY="uuuhh")
-    def test_binaries_fail(self) -> None:
-        self.assertEqual(len(binaries_check(None)), 1)

-    def test_paths_check(self) -> None:
-        self.assertEqual(paths_check(None), [])
+# TODO: consolidate with documents/tests/conftest.py PaperlessDirs/paperless_dirs
+#       once the paperless and documents test suites are ready to share fixtures.
+@pytest.fixture()
+def directories(tmp_path: Path, settings: SettingsWrapper) -> PaperlessTestDirs:
+    data_dir = tmp_path / "data"
+    media_dir = tmp_path / "media"
+    consumption_dir = tmp_path / "consumption"

-    @override_settings(
-        MEDIA_ROOT=Path("uuh"),
-        DATA_DIR=Path("whatever"),
-        CONSUMPTION_DIR=Path("idontcare"),
+    for d in (data_dir, media_dir, consumption_dir):
+        d.mkdir()
+
+    settings.DATA_DIR = data_dir
+    settings.MEDIA_ROOT = media_dir
+    settings.CONSUMPTION_DIR = consumption_dir
+
+    return PaperlessTestDirs(
+        data_dir=data_dir,
+        media_dir=media_dir,
+        consumption_dir=consumption_dir,
    )
-    def test_paths_check_dont_exist(self) -> None:
-        msgs = paths_check(None)
-        self.assertEqual(len(msgs), 3, str(msgs))

-        for msg in msgs:
-            self.assertTrue(msg.msg.endswith("is set but doesn't exist."))

-    def test_paths_check_no_access(self) -> None:
-        Path(self.dirs.data_dir).chmod(0o000)
-        Path(self.dirs.media_dir).chmod(0o000)
-        Path(self.dirs.consumption_dir).chmod(0o000)
+class TestChecks:
+    def test_binaries(self) -> None:
+        assert binaries_check(None) == []

-        self.addCleanup(os.chmod, self.dirs.data_dir, 0o777)
-        self.addCleanup(os.chmod, self.dirs.media_dir, 0o777)
-        self.addCleanup(os.chmod, self.dirs.consumption_dir, 0o777)
+    def test_binaries_fail(self, settings: SettingsWrapper) -> None:
+        settings.CONVERT_BINARY = "uuuhh"
+        assert len(binaries_check(None)) == 1
+
+    @pytest.mark.usefixtures("directories")
+    def test_paths_check(self) -> None:
+        assert paths_check(None) == []
+
+    def test_paths_check_dont_exist(self, settings: SettingsWrapper) -> None:
+        settings.MEDIA_ROOT = Path("uuh")
+        settings.DATA_DIR = Path("whatever")
+        settings.CONSUMPTION_DIR = Path("idontcare")

        msgs = paths_check(None)
-        self.assertEqual(len(msgs), 3)

+        assert len(msgs) == 3, str(msgs)
        for msg in msgs:
-            self.assertTrue(msg.msg.endswith("is not writeable"))
+            assert msg.msg.endswith("is set but doesn't exist.")

-    @override_settings(DEBUG=False)
-    def test_debug_disabled(self) -> None:
-        self.assertEqual(debug_mode_check(None), [])
+    def test_paths_check_no_access(self, directories: PaperlessTestDirs) -> None:
+        directories.data_dir.chmod(0o000)
+        directories.media_dir.chmod(0o000)
+        directories.consumption_dir.chmod(0o000)

-    @override_settings(DEBUG=True)
-    def test_debug_enabled(self) -> None:
-        self.assertEqual(len(debug_mode_check(None)), 1)
+        try:
+            msgs = paths_check(None)
+        finally:
+            directories.data_dir.chmod(0o777)
+            directories.media_dir.chmod(0o777)
+            directories.consumption_dir.chmod(0o777)
+
+        assert len(msgs) == 3
+        for msg in msgs:
+            assert msg.msg.endswith("is not writeable")
+
+    def test_debug_disabled(self, settings: SettingsWrapper) -> None:
+        settings.DEBUG = False
+        assert debug_mode_check(None) == []
+
+    def test_debug_enabled(self, settings: SettingsWrapper) -> None:
+        settings.DEBUG = True
+        assert len(debug_mode_check(None)) == 1


-class TestSettingsChecksAgainstDefaults(DirectoriesMixin, TestCase):
+class TestSettingsChecksAgainstDefaults:
    def test_all_valid(self) -> None:
        """
        GIVEN:
@@ -78,104 +107,71 @@ class TestSettingsChecksAgainstDefaults(DirectoriesMixin, TestCase):
            - No system check errors reported
        """
        msgs = settings_values_check(None)
-        self.assertEqual(len(msgs), 0)
+        assert len(msgs) == 0


-class TestOcrSettingsChecks(DirectoriesMixin, TestCase):
-    @override_settings(OCR_OUTPUT_TYPE="notapdf")
-    def test_invalid_output_type(self) -> None:
+class TestOcrSettingsChecks:
+    @pytest.mark.parametrize(
+        ("setting", "value", "expected_msg"),
+        [
+            pytest.param(
+                "OCR_OUTPUT_TYPE",
+                "notapdf",
+                'OCR output type "notapdf"',
+                id="invalid-output-type",
+            ),
+            pytest.param(
+                "OCR_MODE",
+                "makeitso",
+                'OCR output mode "makeitso"',
+                id="invalid-mode",
+            ),
+            pytest.param(
+                "OCR_MODE",
+                "skip_noarchive",
+                "deprecated",
+                id="deprecated-mode",
+            ),
+            pytest.param(
+                "OCR_SKIP_ARCHIVE_FILE",
+                "invalid",
+                'OCR_SKIP_ARCHIVE_FILE setting "invalid"',
+                id="invalid-skip-archive-file",
+            ),
+            pytest.param(
+                "OCR_CLEAN",
+                "cleanme",
+                'OCR clean mode "cleanme"',
+                id="invalid-clean",
+            ),
+        ],
+    )
+    def test_invalid_setting_produces_one_error(
+        self,
+        settings: SettingsWrapper,
+        setting: str,
+        value: str,
+        expected_msg: str,
+    ) -> None:
        """
        GIVEN:
            - Default settings
-            - OCR output type is invalid
+            - One OCR setting is set to an invalid or deprecated value
        WHEN:
            - Settings are validated
        THEN:
-            - system check error reported for OCR output type
+            - Exactly one system check message (error or warning) with the expected text
        """
+        setattr(settings, setting, value)
+
        msgs = settings_values_check(None)
-        self.assertEqual(len(msgs), 1)

-        msg = msgs[0]
-
-        self.assertIn('OCR output type "notapdf"', msg.msg)
-
-    @override_settings(OCR_MODE="makeitso")
-    def test_invalid_ocr_type(self) -> None:
-        """
-        GIVEN:
-            - Default settings
-            - OCR type is invalid
-        WHEN:
-            - Settings are validated
-        THEN:
-            - system check error reported for OCR type
-        """
-        msgs = settings_values_check(None)
-        self.assertEqual(len(msgs), 1)
-
-        msg = msgs[0]
-
-        self.assertIn('OCR output mode "makeitso"', msg.msg)
-
-    @override_settings(OCR_MODE="skip_noarchive")
-    def test_deprecated_ocr_type(self) -> None:
-        """
-        GIVEN:
-            - Default settings
-            - OCR type is deprecated
-        WHEN:
-            - Settings are validated
-        THEN:
-            - deprecation warning reported for OCR type
-        """
-        msgs = settings_values_check(None)
-        self.assertEqual(len(msgs), 1)
-
-        msg = msgs[0]
-
-        self.assertIn("deprecated", msg.msg)
-
-    @override_settings(OCR_SKIP_ARCHIVE_FILE="invalid")
-    def test_invalid_ocr_skip_archive_file(self) -> None:
-        """
-        GIVEN:
-            - Default settings
-            - OCR_SKIP_ARCHIVE_FILE is invalid
-        WHEN:
-            - Settings are validated
-        THEN:
-            - system check error reported for OCR_SKIP_ARCHIVE_FILE
-        """
-        msgs = settings_values_check(None)
-        self.assertEqual(len(msgs), 1)
-
-        msg = msgs[0]
-
-        self.assertIn('OCR_SKIP_ARCHIVE_FILE setting "invalid"', msg.msg)
-
-    @override_settings(OCR_CLEAN="cleanme")
-    def test_invalid_ocr_clean(self) -> None:
-        """
-        GIVEN:
-            - Default settings
-            - OCR cleaning type is invalid
-        WHEN:
-            - Settings are validated
-        THEN:
-            - system check error reported for OCR cleaning type
-        """
-        msgs = settings_values_check(None)
-        self.assertEqual(len(msgs), 1)
-
-        msg = msgs[0]
-
-        self.assertIn('OCR clean mode "cleanme"', msg.msg)
+        assert len(msgs) == 1
+        assert expected_msg in msgs[0].msg


-class TestTimezoneSettingsChecks(DirectoriesMixin, TestCase):
-    @override_settings(TIME_ZONE="TheMoon\\MyCrater")
-    def test_invalid_timezone(self) -> None:
+class TestTimezoneSettingsChecks:
+    def test_invalid_timezone(self, settings: SettingsWrapper) -> None:
        """
        GIVEN:
            - Default settings
@@ -185,17 +181,16 @@ class TestTimezoneSettingsChecks(DirectoriesMixin, TestCase):
        THEN:
            - system check error reported for timezone
        """
+        settings.TIME_ZONE = "TheMoon\\MyCrater"
+
        msgs = settings_values_check(None)
-        self.assertEqual(len(msgs), 1)

-        msg = msgs[0]
-
-        self.assertIn('Timezone "TheMoon\\MyCrater"', msg.msg)
+        assert len(msgs) == 1
+        assert 'Timezone "TheMoon\\MyCrater"' in msgs[0].msg


-class TestEmailCertSettingsChecks(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
-    @override_settings(EMAIL_CERTIFICATE_FILE=Path("/tmp/not_actually_here.pem"))
-    def test_not_valid_file(self) -> None:
+class TestEmailCertSettingsChecks:
+    def test_not_valid_file(self, settings: SettingsWrapper) -> None:
        """
        GIVEN:
            - Default settings
@@ -205,19 +200,22 @@ class TestEmailCertSettingsChecks(DirectoriesMixin, FileSystemAssertsMixin, Test
        THEN:
            - system check error reported for email certificate
        """
-        self.assertIsNotFile("/tmp/not_actually_here.pem")
+        cert_path = Path("/tmp/not_actually_here.pem")
+        assert not cert_path.is_file()
+        settings.EMAIL_CERTIFICATE_FILE = cert_path

        msgs = settings_values_check(None)

-        self.assertEqual(len(msgs), 1)
-
-        msg = msgs[0]
-
-        self.assertIn("Email cert /tmp/not_actually_here.pem is not a file", msg.msg)
+        assert len(msgs) == 1
+        assert "Email cert /tmp/not_actually_here.pem is not a file" in msgs[0].msg


-class TestAuditLogChecks(TestCase):
-    def test_was_enabled_once(self) -> None:
+class TestAuditLogChecks:
+    def test_was_enabled_once(
+        self,
+        settings: SettingsWrapper,
+        mocker: MockerFixture,
+    ) -> None:
        """
        GIVEN:
            - Audit log is not enabled
@@ -226,23 +224,18 @@ class TestAuditLogChecks(TestCase):
        THEN:
            - system check error reported for disabling audit log
        """
-        introspect_mock = mock.MagicMock()
+        settings.AUDIT_LOG_ENABLED = False
+        introspect_mock = mocker.MagicMock()
        introspect_mock.introspection.table_names.return_value = ["auditlog_logentry"]
-        with override_settings(AUDIT_LOG_ENABLED=False):
-            with mock.patch.dict(
-                "paperless.checks.connections",
-                {"default": introspect_mock},
-            ):
-                msgs = audit_log_check(None)
+        mocker.patch.dict(
+            "paperless.checks.connections",
+            {"default": introspect_mock},
+        )

-                self.assertEqual(len(msgs), 1)
+        msgs = audit_log_check(None)

-                msg = msgs[0]
-
-                self.assertIn(
-                    ("auditlog table was found but audit log is disabled."),
-                    msg.msg,
-                )
+        assert len(msgs) == 1
+        assert "auditlog table was found but audit log is disabled." in msgs[0].msg


 DEPRECATED_VARS: dict[str, str] = {
@@ -271,20 +264,16 @@ class TestDeprecatedDbSettings:
    @pytest.mark.parametrize(
        ("env_var", "db_option_key"),
        [
-            ("PAPERLESS_DB_TIMEOUT", "timeout"),
-            ("PAPERLESS_DB_POOLSIZE", "pool.min_size / pool.max_size"),
-            ("PAPERLESS_DBSSLMODE", "sslmode"),
-            ("PAPERLESS_DBSSLROOTCERT", "sslrootcert"),
-            ("PAPERLESS_DBSSLCERT", "sslcert"),
-            ("PAPERLESS_DBSSLKEY", "sslkey"),
-        ],
-        ids=[
-            "db-timeout",
-            "db-poolsize",
-            "ssl-mode",
-            "ssl-rootcert",
-            "ssl-cert",
-            "ssl-key",
+            pytest.param("PAPERLESS_DB_TIMEOUT", "timeout", id="db-timeout"),
+            pytest.param(
+                "PAPERLESS_DB_POOLSIZE",
+                "pool.min_size / pool.max_size",
+                id="db-poolsize",
+            ),
+            pytest.param("PAPERLESS_DBSSLMODE", "sslmode", id="ssl-mode"),
+            pytest.param("PAPERLESS_DBSSLROOTCERT", "sslrootcert", id="ssl-rootcert"),
+            pytest.param("PAPERLESS_DBSSLCERT", "sslcert", id="ssl-cert"),
+            pytest.param("PAPERLESS_DBSSLKEY", "sslkey", id="ssl-key"),
        ],
    )
    def test_single_deprecated_var_produces_one_warning(
@@ -403,7 +392,10 @@ class TestV3MinimumUpgradeVersionCheck:
    """Test suite for check_v3_minimum_upgrade_version system check."""

    @pytest.fixture
-    def build_conn_mock(self, mocker: MockerFixture):
+    def build_conn_mock(
+        self,
+        mocker: MockerFixture,
+    ) -> Callable[[list[str], list[str]], mock.MagicMock]:
        """Factory fixture that builds a connections['default'] mock.

        Usage::
@@ -423,7 +415,7 @@ class TestV3MinimumUpgradeVersionCheck:
    def test_no_migrations_table_fresh_install(
        self,
        mocker: MockerFixture,
-        build_conn_mock,
+        build_conn_mock: Callable[[list[str], list[str]], mock.MagicMock],
    ) -> None:
        """
        GIVEN:
@@ -442,7 +434,7 @@ class TestV3MinimumUpgradeVersionCheck:
    def test_no_documents_migrations_fresh_install(
        self,
        mocker: MockerFixture,
-        build_conn_mock,
+        build_conn_mock: Callable[[list[str], list[str]], mock.MagicMock],
    ) -> None:
        """
        GIVEN:
@@ -461,7 +453,7 @@ class TestV3MinimumUpgradeVersionCheck:
    def test_v3_state_with_0001_squashed(
        self,
        mocker: MockerFixture,
-        build_conn_mock,
+        build_conn_mock: Callable[[list[str], list[str]], mock.MagicMock],
    ) -> None:
        """
        GIVEN:
@@ -485,7 +477,7 @@ class TestV3MinimumUpgradeVersionCheck:
    def test_v3_state_with_0002_squashed_only(
        self,
        mocker: MockerFixture,
-        build_conn_mock,
+        build_conn_mock: Callable[[list[str], list[str]], mock.MagicMock],
    ) -> None:
        """
        GIVEN:
@@ -504,7 +496,7 @@ class TestV3MinimumUpgradeVersionCheck:
    def test_v2_20_9_state_ready_to_upgrade(
        self,
        mocker: MockerFixture,
-        build_conn_mock,
+        build_conn_mock: Callable[[list[str], list[str]], mock.MagicMock],
    ) -> None:
        """
        GIVEN:
@@ -531,7 +523,7 @@ class TestV3MinimumUpgradeVersionCheck:
    def test_v2_20_8_raises_error(
        self,
        mocker: MockerFixture,
-        build_conn_mock,
+        build_conn_mock: Callable[[list[str], list[str]], mock.MagicMock],
    ) -> None:
        """
        GIVEN:
@@ -558,7 +550,7 @@ class TestV3MinimumUpgradeVersionCheck:
    def test_very_old_version_raises_error(
        self,
        mocker: MockerFixture,
-        build_conn_mock,
+        build_conn_mock: Callable[[list[str], list[str]], mock.MagicMock],
    ) -> None:
        """
        GIVEN:
@@ -585,7 +577,7 @@ class TestV3MinimumUpgradeVersionCheck:
    def test_error_hint_mentions_v2_20_9(
        self,
        mocker: MockerFixture,
-        build_conn_mock,
+        build_conn_mock: Callable[[list[str], list[str]], mock.MagicMock],
    ) -> None:
        """
        GIVEN:
--- a/src/paperless/tests/test_registry.py
+++ b/src/paperless/tests/test_registry.py
@@ -0,0 +1,710 @@
+"""
+Tests for :mod:`paperless.parsers` (ParserProtocol) and
+:mod:`paperless.parsers.registry` (ParserRegistry + module-level helpers).
+
+All tests use pytest-style functions/classes — no unittest.TestCase.
+The ``clean_registry`` fixture ensures complete isolation between tests by
+resetting the module-level singleton before and after every test.
+"""
+
+from __future__ import annotations
+
+import logging
+from importlib.metadata import EntryPoint
+from pathlib import Path
+from typing import Self
+from unittest.mock import MagicMock
+from unittest.mock import patch
+
+import pytest
+
+from paperless.parsers import ParserProtocol
+from paperless.parsers.registry import ParserRegistry
+from paperless.parsers.registry import get_parser_registry
+from paperless.parsers.registry import init_builtin_parsers
+from paperless.parsers.registry import reset_parser_registry
+
+
+@pytest.fixture()
+def dummy_parser_cls() -> type:
+    """Return a class that fully satisfies :class:`ParserProtocol`.
+
+    GIVEN: A need to exercise registry and Protocol logic with a minimal
+           but complete parser.
+    WHEN:  A test requests this fixture.
+    THEN:  A class with all required attributes and methods is returned.
+    """
+
+    class DummyParser:
+        name = "dummy-parser"
+        version = "0.1.0"
+        author = "Test Author"
+        url = "https://example.com/dummy-parser"
+
+        @classmethod
+        def supported_mime_types(cls) -> dict[str, str]:
+            return {"text/plain": ".txt"}
+
+        @classmethod
+        def score(
+            cls,
+            mime_type: str,
+            filename: str,
+            path: Path | None = None,
+        ) -> int | None:
+            return 10
+
+        @property
+        def can_produce_archive(self) -> bool:
+            return False
+
+        @property
+        def requires_pdf_rendition(self) -> bool:
+            return False
+
+        def parse(
+            self,
+            document_path: Path,
+            mime_type: str,
+            *,
+            produce_archive: bool = True,
+        ) -> None:
+            pass
+
+        def get_text(self) -> str | None:
+            return None
+
+        def get_date(self) -> None:
+            return None
+
+        def get_archive_path(self) -> Path | None:
+            return None
+
+        def get_thumbnail(
+            self,
+            document_path: Path,
+            mime_type: str,
+        ) -> Path:
+            return Path("/tmp/thumbnail.webp")
+
+        def get_page_count(
+            self,
+            document_path: Path,
+            mime_type: str,
+        ) -> int | None:
+            return None
+
+        def extract_metadata(
+            self,
+            document_path: Path,
+            mime_type: str,
+        ) -> list:
+            return []
+
+        def __enter__(self) -> Self:
+            return self
+
+        def __exit__(self, exc_type, exc_val, exc_tb) -> None:
+            pass
+
+    return DummyParser
+
+
+class TestParserProtocol:
+    """Verify runtime isinstance() checks against ParserProtocol."""
+
+    def test_compliant_class_instance_passes_isinstance(
+        self,
+        dummy_parser_cls: type,
+    ) -> None:
+        """
+        GIVEN: A class that implements every method required by ParserProtocol.
+        WHEN:  isinstance() is called with the Protocol.
+        THEN:  The check passes (returns True).
+        """
+        instance = dummy_parser_cls()
+        assert isinstance(instance, ParserProtocol)
+
+    def test_non_compliant_class_instance_fails_isinstance(self) -> None:
+        """
+        GIVEN: A plain class with no parser-related methods.
+        WHEN:  isinstance() is called with ParserProtocol.
+        THEN:  The check fails (returns False).
+        """
+
+        class Unrelated:
+            pass
+
+        assert not isinstance(Unrelated(), ParserProtocol)
+
+    @pytest.mark.parametrize(
+        "missing_method",
+        [
+            pytest.param("parse", id="missing-parse"),
+            pytest.param("get_text", id="missing-get_text"),
+            pytest.param("get_thumbnail", id="missing-get_thumbnail"),
+            pytest.param("__enter__", id="missing-__enter__"),
+            pytest.param("__exit__", id="missing-__exit__"),
+        ],
+    )
+    def test_partial_compliant_fails_isinstance(
+        self,
+        dummy_parser_cls: type,
+        missing_method: str,
+    ) -> None:
+        """
+        GIVEN: A class that satisfies ParserProtocol except for one method.
+        WHEN:  isinstance() is called with ParserProtocol.
+        THEN:  The check fails because the Protocol is not fully satisfied.
+        """
+        # Subclass and shadow the specified method with None to break compliance.
+        partial_cls = type(
+            "PartialParser",
+            (dummy_parser_cls,),
+            {missing_method: None},  # Replace with None — not callable
+        )
+        assert not isinstance(partial_cls(), ParserProtocol)
+
+
+class TestRegistrySingleton:
+    """Verify the module-level singleton lifecycle functions."""
+
+    def test_get_parser_registry_returns_instance(self) -> None:
+        """
+        GIVEN: No registry has been created yet.
+        WHEN:  get_parser_registry() is called.
+        THEN:  A ParserRegistry instance is returned.
+        """
+        registry = get_parser_registry()
+        assert isinstance(registry, ParserRegistry)
+
+    def test_get_parser_registry_same_instance_on_repeated_calls(self) -> None:
+        """
+        GIVEN: A registry instance was created by a prior call.
+        WHEN:  get_parser_registry() is called a second time.
+        THEN:  The exact same object (identity) is returned.
+        """
+        first = get_parser_registry()
+        second = get_parser_registry()
+        assert first is second
+
+    def test_reset_parser_registry_gives_fresh_instance(self) -> None:
+        """
+        GIVEN: A registry instance already exists.
+        WHEN:  reset_parser_registry() is called and then get_parser_registry()
+               is called again.
+        THEN:  A new, distinct registry instance is returned.
+        """
+        first = get_parser_registry()
+        reset_parser_registry()
+        second = get_parser_registry()
+        assert first is not second
+
+    def test_init_builtin_parsers_does_not_run_discover(
+        self,
+        monkeypatch: pytest.MonkeyPatch,
+    ) -> None:
+        """
+        GIVEN: discover() would raise an exception if called.
+        WHEN:  init_builtin_parsers() is called.
+        THEN:  No exception is raised, confirming discover() was not invoked.
+        """
+
+        def exploding_discover(self) -> None:
+            raise RuntimeError(
+                "discover() must not be called from init_builtin_parsers",
+            )
+
+        monkeypatch.setattr(ParserRegistry, "discover", exploding_discover)
+
+        # Should complete without raising.
+        init_builtin_parsers()
+
+    def test_init_builtin_parsers_idempotent(self) -> None:
+        """
+        GIVEN: init_builtin_parsers() has already been called once.
+        WHEN:  init_builtin_parsers() is called a second time.
+        THEN:  No error is raised and the same registry instance is reused.
+        """
+        init_builtin_parsers()
+        # Capture the registry created by the first call.
+        import paperless.parsers.registry as reg_module
+
+        first_registry = reg_module._registry
+
+        init_builtin_parsers()
+
+        assert reg_module._registry is first_registry
+
+
+class TestParserRegistryGetParserForFile:
+    """Verify parser selection logic in get_parser_for_file()."""
+
+    def test_returns_none_when_no_parsers_registered(self) -> None:
+        """
+        GIVEN: A registry with no parsers registered.
+        WHEN:  get_parser_for_file() is called for any MIME type.
+        THEN:  None is returned.
+        """
+        registry = ParserRegistry()
+        result = registry.get_parser_for_file("text/plain", "doc.txt")
+        assert result is None
+
+    def test_returns_none_for_unsupported_mime_type(
+        self,
+        dummy_parser_cls: type,
+    ) -> None:
+        """
+        GIVEN: A registry with a parser that supports only 'text/plain'.
+        WHEN:  get_parser_for_file() is called with 'application/pdf'.
+        THEN:  None is returned.
+        """
+        registry = ParserRegistry()
+        registry.register_builtin(dummy_parser_cls)
+        result = registry.get_parser_for_file("application/pdf", "file.pdf")
+        assert result is None
+
+    def test_returns_parser_for_supported_mime_type(
+        self,
+        dummy_parser_cls: type,
+    ) -> None:
+        """
+        GIVEN: A registry with a parser registered for 'text/plain'.
+        WHEN:  get_parser_for_file() is called with 'text/plain'.
+        THEN:  The registered parser class is returned.
+        """
+        registry = ParserRegistry()
+        registry.register_builtin(dummy_parser_cls)
+        result = registry.get_parser_for_file("text/plain", "readme.txt")
+        assert result is dummy_parser_cls
+
+    def test_highest_score_wins(self) -> None:
+        """
+        GIVEN: Two parsers both supporting 'text/plain' with scores 5 and 20.
+        WHEN:  get_parser_for_file() is called for 'text/plain'.
+        THEN:  The parser with score 20 is returned.
+        """
+
+        class LowScoreParser:
+            name = "low"
+            version = "1.0"
+            author = "A"
+            url = "https://example.com/low"
+
+            @classmethod
+            def supported_mime_types(cls):
+                return {"text/plain": ".txt"}
+
+            @classmethod
+            def score(cls, mime_type, filename, path=None):
+                return 5
+
+        class HighScoreParser:
+            name = "high"
+            version = "1.0"
+            author = "B"
+            url = "https://example.com/high"
+
+            @classmethod
+            def supported_mime_types(cls):
+                return {"text/plain": ".txt"}
+
+            @classmethod
+            def score(cls, mime_type, filename, path=None):
+                return 20
+
+        registry = ParserRegistry()
+        registry.register_builtin(LowScoreParser)
+        registry.register_builtin(HighScoreParser)
+        result = registry.get_parser_for_file("text/plain", "readme.txt")
+        assert result is HighScoreParser
+
+    def test_parser_returning_none_score_is_skipped(self) -> None:
+        """
+        GIVEN: A parser that returns None from score() for the given file.
+        WHEN:  get_parser_for_file() is called.
+        THEN:  That parser is skipped and None is returned (no other candidates).
+        """
+
+        class DecliningParser:
+            name = "declining"
+            version = "1.0"
+            author = "A"
+            url = "https://example.com"
+
+            @classmethod
+            def supported_mime_types(cls):
+                return {"text/plain": ".txt"}
+
+            @classmethod
+            def score(cls, mime_type, filename, path=None):
+                return None  # Explicitly declines
+
+        registry = ParserRegistry()
+        registry.register_builtin(DecliningParser)
+        result = registry.get_parser_for_file("text/plain", "readme.txt")
+        assert result is None
+
+    def test_all_parsers_decline_returns_none(self) -> None:
+        """
+        GIVEN: A score()-declining parser registered as both built-in and external.
+        WHEN:  get_parser_for_file() is called.
+        THEN:  None is returned.
+        """
+
+        class AlwaysDeclines:
+            name = "declines"
+            version = "1.0"
+            author = "A"
+            url = "https://example.com"
+
+            @classmethod
+            def supported_mime_types(cls):
+                return {"text/plain": ".txt"}
+
+            @classmethod
+            def score(cls, mime_type, filename, path=None):
+                return None
+
+        registry = ParserRegistry()
+        registry.register_builtin(AlwaysDeclines)
+        registry._external.append(AlwaysDeclines)
+        result = registry.get_parser_for_file("text/plain", "file.txt")
+        assert result is None
+
+    def test_external_parser_beats_builtin_same_score(self) -> None:
+        """
+        GIVEN: An external and a built-in parser both returning score 10.
+        WHEN:  get_parser_for_file() is called.
+        THEN:  The external parser wins because externals are evaluated first
+               and the first-seen-wins policy applies at equal scores.
+        """
+
+        class BuiltinParser:
+            name = "builtin"
+            version = "1.0"
+            author = "Core"
+            url = "https://example.com/builtin"
+
+            @classmethod
+            def supported_mime_types(cls):
+                return {"text/plain": ".txt"}
+
+            @classmethod
+            def score(cls, mime_type, filename, path=None):
+                return 10
+
+        class ExternalParser:
+            name = "external"
+            version = "2.0"
+            author = "Third Party"
+            url = "https://example.com/external"
+
+            @classmethod
+            def supported_mime_types(cls):
+                return {"text/plain": ".txt"}
+
+            @classmethod
+            def score(cls, mime_type, filename, path=None):
+                return 10
+
+        registry = ParserRegistry()
+        registry.register_builtin(BuiltinParser)
+        registry._external.append(ExternalParser)
+        result = registry.get_parser_for_file("text/plain", "file.txt")
+        assert result is ExternalParser
+
+    def test_builtin_wins_when_external_declines(self) -> None:
+        """
+        GIVEN: An external parser that declines (score None) and a built-in
+               that returns score 5.
+        WHEN:  get_parser_for_file() is called.
+        THEN:  The built-in parser is returned.
+        """
+
+        class DecliningExternal:
+            name = "declining-external"
+            version = "1.0"
+            author = "Third Party"
+            url = "https://example.com/declining"
+
+            @classmethod
+            def supported_mime_types(cls):
+                return {"text/plain": ".txt"}
+
+            @classmethod
+            def score(cls, mime_type, filename, path=None):
+                return None
+
+        class AcceptingBuiltin:
+            name = "accepting-builtin"
+            version = "1.0"
+            author = "Core"
+            url = "https://example.com/accepting"
+
+            @classmethod
+            def supported_mime_types(cls):
+                return {"text/plain": ".txt"}
+
+            @classmethod
+            def score(cls, mime_type, filename, path=None):
+                return 5
+
+        registry = ParserRegistry()
+        registry.register_builtin(AcceptingBuiltin)
+        registry._external.append(DecliningExternal)
+        result = registry.get_parser_for_file("text/plain", "file.txt")
+        assert result is AcceptingBuiltin
+
+
+class TestDiscover:
+    """Verify entrypoint discovery in ParserRegistry.discover()."""
+
+    def test_discover_with_no_entrypoints(self) -> None:
+        """
+        GIVEN: No entrypoints are registered under 'paperless_ngx.parsers'.
+        WHEN:  discover() is called.
+        THEN:  _external remains empty and no errors are raised.
+        """
+        registry = ParserRegistry()
+
+        with patch(
+            "paperless.parsers.registry.entry_points",
+            return_value=[],
+        ):
+            registry.discover()
+
+        assert registry._external == []
+
+    def test_discover_adds_valid_external_parser(self) -> None:
+        """
+        GIVEN: One valid entrypoint whose loaded class has all required attrs.
+        WHEN:  discover() is called.
+        THEN:  The class is appended to _external.
+        """
+
+        class ValidExternal:
+            name = "valid-external"
+            version = "3.0.0"
+            author = "Someone"
+            url = "https://example.com/valid"
+
+            @classmethod
+            def supported_mime_types(cls):
+                return {"application/pdf": ".pdf"}
+
+            @classmethod
+            def score(cls, mime_type, filename, path=None):
+                return 5
+
+        mock_ep = MagicMock(spec=EntryPoint)
+        mock_ep.name = "valid_external"
+        mock_ep.load.return_value = ValidExternal
+
+        registry = ParserRegistry()
+
+        with patch(
+            "paperless.parsers.registry.entry_points",
+            return_value=[mock_ep],
+        ):
+            registry.discover()
+
+        assert ValidExternal in registry._external
+
+    def test_discover_skips_entrypoint_with_load_error(
+        self,
+        caplog: pytest.LogCaptureFixture,
+    ) -> None:
+        """
+        GIVEN: An entrypoint whose load() method raises ImportError.
+        WHEN:  discover() is called.
+        THEN:  The entrypoint is skipped, an error is logged, and _external
+               remains empty.
+        """
+        mock_ep = MagicMock(spec=EntryPoint)
+        mock_ep.name = "broken_ep"
+        mock_ep.load.side_effect = ImportError("missing dependency")
+
+        registry = ParserRegistry()
+
+        with caplog.at_level(logging.ERROR, logger="paperless.parsers.registry"):
+            with patch(
+                "paperless.parsers.registry.entry_points",
+                return_value=[mock_ep],
+            ):
+                registry.discover()
+
+        assert registry._external == []
+        assert any(
+            "broken_ep" in record.message
+            for record in caplog.records
+            if record.levelno >= logging.ERROR
+        )
+
+    def test_discover_skips_entrypoint_with_missing_attrs(
+        self,
+        caplog: pytest.LogCaptureFixture,
+    ) -> None:
+        """
+        GIVEN: A class loaded from an entrypoint that is missing the 'score'
+               attribute.
+        WHEN:  discover() is called.
+        THEN:  The entrypoint is skipped, a warning is logged, and _external
+               remains empty.
+        """
+
+        class MissingScore:
+            name = "missing-score"
+            version = "1.0"
+            author = "Someone"
+            url = "https://example.com"
+
+            # 'score' classmethod is intentionally absent.
+
+            @classmethod
+            def supported_mime_types(cls):
+                return {"text/plain": ".txt"}
+
+        mock_ep = MagicMock(spec=EntryPoint)
+        mock_ep.name = "missing_score_ep"
+        mock_ep.load.return_value = MissingScore
+
+        registry = ParserRegistry()
+
+        with caplog.at_level(logging.WARNING, logger="paperless.parsers.registry"):
+            with patch(
+                "paperless.parsers.registry.entry_points",
+                return_value=[mock_ep],
+            ):
+                registry.discover()
+
+        assert registry._external == []
+        assert any(
+            "missing_score_ep" in record.message
+            for record in caplog.records
+            if record.levelno >= logging.WARNING
+        )
+
+    def test_discover_logs_loaded_parser_info(
+        self,
+        caplog: pytest.LogCaptureFixture,
+    ) -> None:
+        """
+        GIVEN: A valid entrypoint that loads successfully.
+        WHEN:  discover() is called.
+        THEN:  An INFO log message is emitted containing the parser name,
+               version, author, and entrypoint name.
+        """
+
+        class LoggableParser:
+            name = "loggable"
+            version = "4.2.0"
+            author = "Log Tester"
+            url = "https://example.com/loggable"
+
+            @classmethod
+            def supported_mime_types(cls):
+                return {"image/png": ".png"}
+
+            @classmethod
+            def score(cls, mime_type, filename, path=None):
+                return 1
+
+        mock_ep = MagicMock(spec=EntryPoint)
+        mock_ep.name = "loggable_ep"
+        mock_ep.load.return_value = LoggableParser
+
+        registry = ParserRegistry()
+
+        with caplog.at_level(logging.INFO, logger="paperless.parsers.registry"):
+            with patch(
+                "paperless.parsers.registry.entry_points",
+                return_value=[mock_ep],
+            ):
+                registry.discover()
+
+        info_messages = " ".join(
+            r.message for r in caplog.records if r.levelno == logging.INFO
+        )
+        assert "loggable" in info_messages
+        assert "4.2.0" in info_messages
+        assert "Log Tester" in info_messages
+        assert "loggable_ep" in info_messages
+
+
+class TestLogSummary:
+    """Verify log output from ParserRegistry.log_summary()."""
+
+    def test_log_summary_with_no_external_parsers(
+        self,
+        dummy_parser_cls: type,
+        caplog: pytest.LogCaptureFixture,
+    ) -> None:
+        """
+        GIVEN: A registry with one built-in parser and no external parsers.
+        WHEN:  log_summary() is called.
+        THEN:  The built-in parser name appears in the logs.
+        """
+        registry = ParserRegistry()
+        registry.register_builtin(dummy_parser_cls)
+
+        with caplog.at_level(logging.INFO, logger="paperless.parsers.registry"):
+            registry.log_summary()
+
+        all_messages = " ".join(r.message for r in caplog.records)
+        assert dummy_parser_cls.name in all_messages
+
+    def test_log_summary_with_external_parsers(
+        self,
+        caplog: pytest.LogCaptureFixture,
+    ) -> None:
+        """
+        GIVEN: A registry with one external parser registered.
+        WHEN:  log_summary() is called.
+        THEN:  The external parser name, version, author, and url appear in
+               the log output.
+        """
+
+        class ExtParser:
+            name = "ext-parser"
+            version = "9.9.9"
+            author = "Ext Corp"
+            url = "https://ext.example.com"
+
+            @classmethod
+            def supported_mime_types(cls):
+                return {}
+
+            @classmethod
+            def score(cls, mime_type, filename, path=None):
+                return None
+
+        registry = ParserRegistry()
+        registry._external.append(ExtParser)
+
+        with caplog.at_level(logging.INFO, logger="paperless.parsers.registry"):
+            registry.log_summary()
+
+        all_messages = " ".join(r.message for r in caplog.records)
+        assert "ext-parser" in all_messages
+        assert "9.9.9" in all_messages
+        assert "Ext Corp" in all_messages
+        assert "https://ext.example.com" in all_messages
+
+    def test_log_summary_logs_no_third_party_message_when_none(
+        self,
+        caplog: pytest.LogCaptureFixture,
+    ) -> None:
+        """
+        GIVEN: A registry with no external parsers.
+        WHEN:  log_summary() is called.
+        THEN:  A message containing 'No third-party parsers discovered.' is
+               logged.
+        """
+        registry = ParserRegistry()
+
+        with caplog.at_level(logging.INFO, logger="paperless.parsers.registry"):
+            registry.log_summary()
+
+        all_messages = " ".join(r.message for r in caplog.records)
+        assert "No third-party parsers discovered." in all_messages
--- a/src/paperless/tests/test_utils.py
+++ b/src/paperless/tests/test_utils.py
@@ -9,35 +9,50 @@ from paperless.utils import ocr_to_dateparser_languages
@pytest.mark.parametrize(
    ("ocr_language", "expected"),
    [
-        # One language
-        ("eng", ["en"]),
-        # Multiple languages
-        ("fra+ita+lao", ["fr", "it", "lo"]),
-        # Languages that don't have a two-letter equivalent
-        ("fil", ["fil"]),
-        # Languages with a script part supported by dateparser
-        ("aze_cyrl+srp_latn", ["az-Cyrl", "sr-Latn"]),
-        # Languages with a script part not supported by dateparser
-        # In this case, default to the language without script
-        ("deu_frak", ["de"]),
-        # Traditional and simplified chinese don't have the same name in dateparser,
-        # so they're converted to the general chinese language
-        ("chi_tra+chi_sim", ["zh"]),
-        # If a language is not supported by dateparser, fallback to the supported ones
-        ("eng+unsupported_language+por", ["en", "pt"]),
-        # If no language is supported, fallback to default
-        ("unsupported1+unsupported2", []),
-        # Duplicate languages, should not duplicate in result
-        ("eng+eng", ["en"]),
-        # Language with script, but script is not mapped
-        ("ita_unknownscript", ["it"]),
+        pytest.param("eng", ["en"], id="single-language"),
+        pytest.param("fra+ita+lao", ["fr", "it", "lo"], id="multiple-languages"),
+        pytest.param("fil", ["fil"], id="no-two-letter-equivalent"),
+        pytest.param(
+            "aze_cyrl+srp_latn",
+            ["az-Cyrl", "sr-Latn"],
+            id="script-supported-by-dateparser",
+        ),
+        pytest.param(
+            "deu_frak",
+            ["de"],
+            id="script-not-supported-falls-back-to-language",
+        ),
+        pytest.param(
+            "chi_tra+chi_sim",
+            ["zh"],
+            id="chinese-variants-collapse-to-general",
+        ),
+        pytest.param(
+            "eng+unsupported_language+por",
+            ["en", "pt"],
+            id="unsupported-language-skipped",
+        ),
+        pytest.param(
+            "unsupported1+unsupported2",
+            [],
+            id="all-unsupported-returns-empty",
+        ),
+        pytest.param("eng+eng", ["en"], id="duplicates-deduplicated"),
+        pytest.param(
+            "ita_unknownscript",
+            ["it"],
+            id="unknown-script-falls-back-to-language",
+        ),
    ],
 )
-def test_ocr_to_dateparser_languages(ocr_language, expected):
+def test_ocr_to_dateparser_languages(ocr_language: str, expected: list[str]) -> None:
    assert sorted(ocr_to_dateparser_languages(ocr_language)) == sorted(expected)


-def test_ocr_to_dateparser_languages_exception(monkeypatch, caplog):
+def test_ocr_to_dateparser_languages_exception(
+    monkeypatch: pytest.MonkeyPatch,
+    caplog: pytest.LogCaptureFixture,
+) -> None:
    # Patch LocaleDataLoader.get_locale_map to raise an exception
    class DummyLoader:
        def get_locale_map(self, locales=None):
--- a/src/paperless/tests/test_views.py
+++ b/src/paperless/tests/test_views.py
@@ -1,24 +1,31 @@
-import tempfile
 from pathlib import Path

-from django.test import override_settings
+from django.test import Client
+from pytest_django.fixtures import SettingsWrapper


-def test_favicon_view(client):
-    with tempfile.TemporaryDirectory() as tmpdir:
-        static_dir = Path(tmpdir)
-        favicon_path = static_dir / "paperless" / "img" / "favicon.ico"
-        favicon_path.parent.mkdir(parents=True, exist_ok=True)
-        favicon_path.write_bytes(b"FAKE ICON DATA")
+def test_favicon_view(
+    client: Client,
+    tmp_path: Path,
+    settings: SettingsWrapper,
+) -> None:
+    favicon_path = tmp_path / "paperless" / "img" / "favicon.ico"
+    favicon_path.parent.mkdir(parents=True)
+    favicon_path.write_bytes(b"FAKE ICON DATA")

-        with override_settings(STATIC_ROOT=static_dir):
-            response = client.get("/favicon.ico")
-            assert response.status_code == 200
-            assert response["Content-Type"] == "image/x-icon"
-            assert b"".join(response.streaming_content) == b"FAKE ICON DATA"
+    settings.STATIC_ROOT = tmp_path
+
+    response = client.get("/favicon.ico")
+    assert response.status_code == 200
+    assert response["Content-Type"] == "image/x-icon"
+    assert b"".join(response.streaming_content) == b"FAKE ICON DATA"


-def test_favicon_view_missing_file(client):
-    with override_settings(STATIC_ROOT=Path(tempfile.mkdtemp())):
-        response = client.get("/favicon.ico")
-        assert response.status_code == 404
+def test_favicon_view_missing_file(
+    client: Client,
+    tmp_path: Path,
+    settings: SettingsWrapper,
+) -> None:
+    settings.STATIC_ROOT = tmp_path
+    response = client.get("/favicon.ico")
+    assert response.status_code == 404
--- a/src/paperless_ai/base_model.py
+++ b/src/paperless_ai/base_model.py
@@ -1,4 +1,4 @@
-from llama_index.core.bridge.pydantic import BaseModel
+from pydantic import BaseModel


 class DocumentClassifierSchema(BaseModel):
--- a/src/paperless_ai/chat.py
+++ b/src/paperless_ai/chat.py
@@ -1,10 +1,6 @@
 import logging
 import sys

-from llama_index.core import VectorStoreIndex
-from llama_index.core.prompts import PromptTemplate
-from llama_index.core.query_engine import RetrieverQueryEngine
-
 from documents.models import Document
 from paperless_ai.client import AIClient
 from paperless_ai.indexing import load_or_build_index
@@ -14,15 +10,13 @@ logger = logging.getLogger("paperless_ai.chat")
 MAX_SINGLE_DOC_CONTEXT_CHARS = 15000
 SINGLE_DOC_SNIPPET_CHARS = 800

-CHAT_PROMPT_TMPL = PromptTemplate(
-    template="""Context information is below.
+CHAT_PROMPT_TMPL = """Context information is below.
    ---------------------
    {context_str}
    ---------------------
    Given the context information and not prior knowledge, answer the query.
    Query: {query_str}
-    Answer:""",
-)
+    Answer:"""


 def stream_chat_with_documents(query_str: str, documents: list[Document]):
@@ -43,6 +37,10 @@ def stream_chat_with_documents(query_str: str, documents: list[Document]):
        yield "Sorry, I couldn't find any content to answer your question."
        return

+    from llama_index.core import VectorStoreIndex
+    from llama_index.core.prompts import PromptTemplate
+    from llama_index.core.query_engine import RetrieverQueryEngine
+
    local_index = VectorStoreIndex(nodes=nodes)
    retriever = local_index.as_retriever(
        similarity_top_k=3 if len(documents) == 1 else 5,
@@ -85,7 +83,8 @@ def stream_chat_with_documents(query_str: str, documents: list[Document]):
            for node in top_nodes
        )

-    prompt = CHAT_PROMPT_TMPL.partial_format(
+    prompt_template = PromptTemplate(template=CHAT_PROMPT_TMPL)
+    prompt = prompt_template.partial_format(
        context_str=context,
        query_str=query_str,
    ).format(llm=client.llm)
--- a/src/paperless_ai/client.py
+++ b/src/paperless_ai/client.py
@@ -1,9 +1,10 @@
 import logging
+from typing import TYPE_CHECKING

-from llama_index.core.llms import ChatMessage
-from llama_index.core.program.function_program import get_function_tool
-from llama_index.llms.ollama import Ollama
-from llama_index.llms.openai import OpenAI
+if TYPE_CHECKING:
+    from llama_index.core.llms import ChatMessage
+    from llama_index.llms.ollama import Ollama
+    from llama_index.llms.openai import OpenAI

 from paperless.config import AIConfig
 from paperless_ai.base_model import DocumentClassifierSchema
@@ -20,14 +21,18 @@ class AIClient:
        self.settings = AIConfig()
        self.llm = self.get_llm()

-    def get_llm(self) -> Ollama | OpenAI:
+    def get_llm(self) -> "Ollama | OpenAI":
        if self.settings.llm_backend == "ollama":
+            from llama_index.llms.ollama import Ollama
+
            return Ollama(
                model=self.settings.llm_model or "llama3.1",
                base_url=self.settings.llm_endpoint or "http://localhost:11434",
                request_timeout=120,
            )
        elif self.settings.llm_backend == "openai":
+            from llama_index.llms.openai import OpenAI
+
            return OpenAI(
                model=self.settings.llm_model or "gpt-3.5-turbo",
                api_base=self.settings.llm_endpoint or None,
@@ -43,6 +48,9 @@ class AIClient:
            self.settings.llm_model,
        )

+        from llama_index.core.llms import ChatMessage
+        from llama_index.core.program.function_program import get_function_tool
+
        user_msg = ChatMessage(role="user", content=prompt)
        tool = get_function_tool(DocumentClassifierSchema)
        result = self.llm.chat_with_tools(
@@ -58,7 +66,7 @@ class AIClient:
        parsed = DocumentClassifierSchema(**tool_calls[0].tool_kwargs)
        return parsed.model_dump()

-    def run_chat(self, messages: list[ChatMessage]) -> str:
+    def run_chat(self, messages: list["ChatMessage"]) -> str:
        logger.debug(
            "Running chat query against %s with model %s",
            self.settings.llm_backend,
--- a/src/paperless_ai/embedding.py
+++ b/src/paperless_ai/embedding.py
@@ -1,13 +1,12 @@
 import json
 from typing import TYPE_CHECKING

+from django.conf import settings
+
 if TYPE_CHECKING:
    from pathlib import Path

-from django.conf import settings
-from llama_index.core.base.embeddings.base import BaseEmbedding
-from llama_index.embeddings.huggingface import HuggingFaceEmbedding
-from llama_index.embeddings.openai import OpenAIEmbedding
+    from llama_index.core.base.embeddings.base import BaseEmbedding

 from documents.models import Document
 from documents.models import Note
@@ -15,17 +14,21 @@ from paperless.config import AIConfig
 from paperless.models import LLMEmbeddingBackend


-def get_embedding_model() -> BaseEmbedding:
+def get_embedding_model() -> "BaseEmbedding":
    config = AIConfig()

    match config.llm_embedding_backend:
        case LLMEmbeddingBackend.OPENAI:
+            from llama_index.embeddings.openai import OpenAIEmbedding
+
            return OpenAIEmbedding(
                model=config.llm_embedding_model or "text-embedding-3-small",
                api_key=config.llm_api_key,
                api_base=config.llm_endpoint or None,
            )
        case LLMEmbeddingBackend.HUGGINGFACE:
+            from llama_index.embeddings.huggingface import HuggingFaceEmbedding
+
            return HuggingFaceEmbedding(
                model_name=config.llm_embedding_model
                or "sentence-transformers/all-MiniLM-L6-v2",
--- a/src/paperless_ai/indexing.py
+++ b/src/paperless_ai/indexing.py
@@ -4,26 +4,12 @@ from collections.abc import Callable
 from collections.abc import Iterable
 from datetime import timedelta
 from pathlib import Path
+from typing import TYPE_CHECKING
 from typing import TypeVar

-import faiss
-import llama_index.core.settings as llama_settings
 from celery import states
 from django.conf import settings
 from django.utils import timezone
-from llama_index.core import Document as LlamaDocument
-from llama_index.core import StorageContext
-from llama_index.core import VectorStoreIndex
-from llama_index.core import load_index_from_storage
-from llama_index.core.indices.prompt_helper import PromptHelper
-from llama_index.core.node_parser import SimpleNodeParser
-from llama_index.core.prompts import PromptTemplate
-from llama_index.core.retrievers import VectorIndexRetriever
-from llama_index.core.schema import BaseNode
-from llama_index.core.storage.docstore import SimpleDocumentStore
-from llama_index.core.storage.index_store import SimpleIndexStore
-from llama_index.core.text_splitter import TokenTextSplitter
-from llama_index.vector_stores.faiss import FaissVectorStore

 from documents.models import Document
 from documents.models import PaperlessTask
@@ -34,6 +20,10 @@ from paperless_ai.embedding import get_embedding_model
 _T = TypeVar("_T")
 IterWrapper = Callable[[Iterable[_T]], Iterable[_T]]

+if TYPE_CHECKING:
+    from llama_index.core import VectorStoreIndex
+    from llama_index.core.schema import BaseNode
+

 def _identity(iterable: Iterable[_T]) -> Iterable[_T]:
    return iterable
@@ -75,12 +65,23 @@ def get_or_create_storage_context(*, rebuild=False):
        settings.LLM_INDEX_DIR.mkdir(parents=True, exist_ok=True)

    if rebuild or not settings.LLM_INDEX_DIR.exists():
+        import faiss
+        from llama_index.core import StorageContext
+        from llama_index.core.storage.docstore import SimpleDocumentStore
+        from llama_index.core.storage.index_store import SimpleIndexStore
+        from llama_index.vector_stores.faiss import FaissVectorStore
+
        embedding_dim = get_embedding_dim()
        faiss_index = faiss.IndexFlatL2(embedding_dim)
        vector_store = FaissVectorStore(faiss_index=faiss_index)
        docstore = SimpleDocumentStore()
        index_store = SimpleIndexStore()
    else:
+        from llama_index.core import StorageContext
+        from llama_index.core.storage.docstore import SimpleDocumentStore
+        from llama_index.core.storage.index_store import SimpleIndexStore
+        from llama_index.vector_stores.faiss import FaissVectorStore
+
        vector_store = FaissVectorStore.from_persist_dir(settings.LLM_INDEX_DIR)
        docstore = SimpleDocumentStore.from_persist_dir(settings.LLM_INDEX_DIR)
        index_store = SimpleIndexStore.from_persist_dir(settings.LLM_INDEX_DIR)
@@ -93,7 +94,7 @@ def get_or_create_storage_context(*, rebuild=False):
    )


-def build_document_node(document: Document) -> list[BaseNode]:
+def build_document_node(document: Document) -> list["BaseNode"]:
    """
    Given a Document, returns parsed Nodes ready for indexing.
    """
@@ -112,6 +113,9 @@ def build_document_node(document: Document) -> list[BaseNode]:
        "added": document.added.isoformat() if document.added else None,
        "modified": document.modified.isoformat(),
    }
+    from llama_index.core import Document as LlamaDocument
+    from llama_index.core.node_parser import SimpleNodeParser
+
    doc = LlamaDocument(text=text, metadata=metadata)
    parser = SimpleNodeParser()
    return parser.get_nodes_from_documents([doc])
@@ -122,6 +126,10 @@ def load_or_build_index(nodes=None):
    Load an existing VectorStoreIndex if present,
    or build a new one using provided nodes if storage is empty.
    """
+    import llama_index.core.settings as llama_settings
+    from llama_index.core import VectorStoreIndex
+    from llama_index.core import load_index_from_storage
+
    embed_model = get_embedding_model()
    llama_settings.Settings.embed_model = embed_model
    storage_context = get_or_create_storage_context()
@@ -143,7 +151,7 @@ def load_or_build_index(nodes=None):
        )


-def remove_document_docstore_nodes(document: Document, index: VectorStoreIndex):
+def remove_document_docstore_nodes(document: Document, index: "VectorStoreIndex"):
    """
    Removes existing documents from docstore for a given document from the index.
    This is necessary because FAISS IndexFlatL2 is append-only.
@@ -174,6 +182,8 @@ def update_llm_index(
    """
    Rebuild or update the LLM index.
    """
+    from llama_index.core import VectorStoreIndex
+
    nodes = []

    documents = Document.objects.all()
@@ -187,6 +197,8 @@ def update_llm_index(
        (settings.LLM_INDEX_DIR / "meta.json").unlink(missing_ok=True)
        # Rebuild index from scratch
        logger.info("Rebuilding LLM index.")
+        import llama_index.core.settings as llama_settings
+
        embed_model = get_embedding_model()
        llama_settings.Settings.embed_model = embed_model
        storage_context = get_or_create_storage_context(rebuild=True)
@@ -271,6 +283,10 @@ def llm_index_remove_document(document: Document):


 def truncate_content(content: str) -> str:
+    from llama_index.core.indices.prompt_helper import PromptHelper
+    from llama_index.core.prompts import PromptTemplate
+    from llama_index.core.text_splitter import TokenTextSplitter
+
    prompt_helper = PromptHelper(
        context_window=8192,
        num_output=512,
@@ -315,6 +331,8 @@ def query_similar_documents(
        else None
    )

+    from llama_index.core.retrievers import VectorIndexRetriever
+
    retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=top_k,
--- a/src/paperless_ai/tests/test_ai_indexing.py
+++ b/src/paperless_ai/tests/test_ai_indexing.py
@@ -181,11 +181,11 @@ def test_load_or_build_index_builds_when_nodes_given(
 ) -> None:
    with (
        patch(
-            "paperless_ai.indexing.load_index_from_storage",
+            "llama_index.core.load_index_from_storage",
            side_effect=ValueError("Index not found"),
        ),
        patch(
-            "paperless_ai.indexing.VectorStoreIndex",
+            "llama_index.core.VectorStoreIndex",
            return_value=MagicMock(),
        ) as mock_index_cls,
        patch(
@@ -206,7 +206,7 @@ def test_load_or_build_index_raises_exception_when_no_nodes(
 ) -> None:
    with (
        patch(
-            "paperless_ai.indexing.load_index_from_storage",
+            "llama_index.core.load_index_from_storage",
            side_effect=ValueError("Index not found"),
        ),
        patch(
@@ -225,11 +225,11 @@ def test_load_or_build_index_succeeds_when_nodes_given(
 ) -> None:
    with (
        patch(
-            "paperless_ai.indexing.load_index_from_storage",
+            "llama_index.core.load_index_from_storage",
            side_effect=ValueError("Index not found"),
        ),
        patch(
-            "paperless_ai.indexing.VectorStoreIndex",
+            "llama_index.core.VectorStoreIndex",
            return_value=MagicMock(),
        ) as mock_index_cls,
        patch(
@@ -334,7 +334,7 @@ def test_query_similar_documents(
        patch(
            "paperless_ai.indexing.vector_store_file_exists",
        ) as mock_vector_store_exists,
-        patch("paperless_ai.indexing.VectorIndexRetriever") as mock_retriever_cls,
+        patch("llama_index.core.retrievers.VectorIndexRetriever") as mock_retriever_cls,
        patch("paperless_ai.indexing.Document.objects.filter") as mock_filter,
    ):
        mock_storage.return_value = MagicMock()
--- a/src/paperless_ai/tests/test_chat.py
+++ b/src/paperless_ai/tests/test_chat.py
@@ -45,7 +45,7 @@ def test_stream_chat_with_one_document_full_content(mock_document) -> None:
        patch("paperless_ai.chat.AIClient") as mock_client_cls,
        patch("paperless_ai.chat.load_or_build_index") as mock_load_index,
        patch(
-            "paperless_ai.chat.RetrieverQueryEngine.from_args",
+            "llama_index.core.query_engine.RetrieverQueryEngine.from_args",
        ) as mock_query_engine_cls,
    ):
        mock_client = MagicMock()
@@ -76,7 +76,7 @@ def test_stream_chat_with_multiple_documents_retrieval(patch_embed_nodes) -> Non
        patch("paperless_ai.chat.AIClient") as mock_client_cls,
        patch("paperless_ai.chat.load_or_build_index") as mock_load_index,
        patch(
-            "paperless_ai.chat.RetrieverQueryEngine.from_args",
+            "llama_index.core.query_engine.RetrieverQueryEngine.from_args",
        ) as mock_query_engine_cls,
        patch.object(VectorStoreIndex, "as_retriever") as mock_as_retriever,
    ):
--- a/src/paperless_ai/tests/test_client.py
+++ b/src/paperless_ai/tests/test_client.py
@@ -18,13 +18,13 @@ def mock_ai_config():

 @pytest.fixture
 def mock_ollama_llm():
-    with patch("paperless_ai.client.Ollama") as MockOllama:
+    with patch("llama_index.llms.ollama.Ollama") as MockOllama:
        yield MockOllama


 @pytest.fixture
 def mock_openai_llm():
-    with patch("paperless_ai.client.OpenAI") as MockOpenAI:
+    with patch("llama_index.llms.openai.OpenAI") as MockOpenAI:
        yield MockOpenAI


--- a/src/paperless_ai/tests/test_embedding.py
+++ b/src/paperless_ai/tests/test_embedding.py
@@ -67,7 +67,7 @@ def test_get_embedding_model_openai(mock_ai_config):
    mock_ai_config.return_value.llm_api_key = "test_api_key"
    mock_ai_config.return_value.llm_endpoint = "http://test-url"

-    with patch("paperless_ai.embedding.OpenAIEmbedding") as MockOpenAIEmbedding:
+    with patch("llama_index.embeddings.openai.OpenAIEmbedding") as MockOpenAIEmbedding:
        model = get_embedding_model()
        MockOpenAIEmbedding.assert_called_once_with(
            model="text-embedding-3-small",
@@ -84,7 +84,7 @@ def test_get_embedding_model_huggingface(mock_ai_config):
    )

    with patch(
-        "paperless_ai.embedding.HuggingFaceEmbedding",
+        "llama_index.embeddings.huggingface.HuggingFaceEmbedding",
    ) as MockHuggingFaceEmbedding:
        model = get_embedding_model()
        MockHuggingFaceEmbedding.assert_called_once_with(
--- a/src/paperless_text/parsers.py
+++ b/src/paperless_text/parsers.py
@@ -1,50 +0,0 @@
-from pathlib import Path
-
-from django.conf import settings
-from PIL import Image
-from PIL import ImageDraw
-from PIL import ImageFont
-
-from documents.parsers import DocumentParser
-
-
-class TextDocumentParser(DocumentParser):
-    """
-    This parser directly parses a text document (.txt, .md, or .csv)
-    """
-
-    logging_name = "paperless.parsing.text"
-
-    def get_thumbnail(self, document_path: Path, mime_type, file_name=None) -> Path:
-        # Avoid reading entire file into memory
-        max_chars = 100_000
-        file_size_limit = 50 * 1024 * 1024
-
-        if document_path.stat().st_size > file_size_limit:
-            text = "[File too large to preview]"
-        else:
-            with Path(document_path).open("r", encoding="utf-8", errors="replace") as f:
-                text = f.read(max_chars)
-
-        img = Image.new("RGB", (500, 700), color="white")
-        draw = ImageDraw.Draw(img)
-        font = ImageFont.truetype(
-            font=settings.THUMBNAIL_FONT_NAME,
-            size=20,
-            layout_engine=ImageFont.Layout.BASIC,
-        )
-        draw.multiline_text((5, 5), text, font=font, fill="black", spacing=4)
-
-        out_path = self.tempdir / "thumb.webp"
-        img.save(out_path, format="WEBP")
-
-        return out_path
-
-    def parse(self, document_path, mime_type, file_name=None) -> None:
-        self.text = self.read_file_handle_unicode_errors(document_path)
-
-    def get_settings(self) -> None:
-        """
-        This parser does not implement additional settings yet
-        """
-        return None
--- a/uv.lock
+++ b/uv.lock
@@ -1748,6 +1748,73 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/0e/61/66938bbb5fc52dbdf84594873d5b51fb1f7c7794e9c0f5bd885f30bc507b/idna-3.11-py3-none-any.whl", hash = "sha256:771a87f49d9defaf64091e6e6fe9c18d4833f140bd19464795bc32d966ca37ea", size = 71008, upload-time = "2025-10-12T14:55:18.883Z" },
 ]

+[[package]]
+name = "ijson"
+version = "3.5.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/f4/57/60d1a6a512f2f0508d0bc8b4f1cc5616fd3196619b66bd6a01f9155a1292/ijson-3.5.0.tar.gz", hash = "sha256:94688760720e3f5212731b3cb8d30267f9a045fb38fb3870254e7b9504246f31", size = 68658, upload-time = "2026-02-24T03:58:30.974Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/65/da/644343198abca5e0f6e2486063f8d8f3c443ca0ef5e5c890e51ef6032e33/ijson-3.5.0-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:5616311404b858d32740b7ad8b9a799c62165f5ecb85d0a8ed16c21665a90533", size = 88964, upload-time = "2026-02-24T03:56:53.099Z" },
+    { url = "https://files.pythonhosted.org/packages/5b/63/8621190aa2baf96156dfd4c632b6aa9f1464411e50b98750c09acc0505ea/ijson-3.5.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:e9733f94029dd41702d573ef64752e2556e72aea14623d6dbb7a44ca1ccf30fd", size = 60582, upload-time = "2026-02-24T03:56:54.261Z" },
+    { url = "https://files.pythonhosted.org/packages/20/31/6a3f041fdd17dacff33b7d7d3ba3df6dca48740108340c6042f974b2ad20/ijson-3.5.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:db8398c6721b98412a4f618da8022550c8b9c5d9214040646071b5deb4d4a393", size = 60632, upload-time = "2026-02-24T03:56:55.159Z" },
+    { url = "https://files.pythonhosted.org/packages/e4/68/474541998abbdecfd46a744536878335de89aceb9f085bff1aaf35575ceb/ijson-3.5.0-cp311-cp311-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:c061314845c08163b1784b6076ea5f075372461a32e6916f4e5f211fd4130b64", size = 131988, upload-time = "2026-02-24T03:56:56.35Z" },
+    { url = "https://files.pythonhosted.org/packages/cd/32/e05ff8b72a44fe9d192f41c5dcbc35cfa87efc280cdbfe539ffaf4a7535e/ijson-3.5.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1111a1c5ac79119c5d6e836f900c1a53844b50a18af38311baa6bb61e2645aca", size = 138669, upload-time = "2026-02-24T03:56:57.555Z" },
+    { url = "https://files.pythonhosted.org/packages/49/b5/955a83b031102c7a602e2c06d03aff0a0e584212f09edb94ccc754d203ac/ijson-3.5.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:1e74aff8c681c24002b61b1822f9511d4c384f324f7dbc08c78538e01fdc9fcb", size = 135093, upload-time = "2026-02-24T03:56:59.267Z" },
+    { url = "https://files.pythonhosted.org/packages/e8/f2/30250cfcb4d2766669b31f6732689aab2bb91de426a15a3ebe482df7ee48/ijson-3.5.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:739a7229b1b0cc5f7e2785a6e7a5fc915e850d3fed9588d0e89a09f88a417253", size = 138715, upload-time = "2026-02-24T03:57:00.491Z" },
+    { url = "https://files.pythonhosted.org/packages/a2/05/785a145d7e75e04e04480d59b6323cd4b1d9013a6cd8643fa635fbc93490/ijson-3.5.0-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:ef88712160360cab3ca6471a4e5418243f8b267cf1fe1620879d1b5558babc71", size = 133194, upload-time = "2026-02-24T03:57:01.759Z" },
+    { url = "https://files.pythonhosted.org/packages/14/eb/80d6f8a748dead4034cea0939494a67d10ccf88d6413bf6e860393139676/ijson-3.5.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:6ca0d1b6b5f8166a6248f4309497585fb8553b04bc8179a0260fad636cfdb798", size = 135588, upload-time = "2026-02-24T03:57:03.131Z" },
+    { url = "https://files.pythonhosted.org/packages/aa/17/9c63c7688025f3a8c47ea717b8306649c8c7244e49e20a2be4e3515dc75c/ijson-3.5.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:1ebefbe149a6106cc848a3eaf536af51a9b5ccc9082de801389f152dba6ab755", size = 88536, upload-time = "2026-02-24T03:57:06.809Z" },
+    { url = "https://files.pythonhosted.org/packages/6f/dd/e15c2400244c117b06585452ebc63ae254f5a6964f712306afd1422daae0/ijson-3.5.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:19e30d9f00f82e64de689c0b8651b9cfed879c184b139d7e1ea5030cec401c21", size = 60499, upload-time = "2026-02-24T03:57:09.155Z" },
+    { url = "https://files.pythonhosted.org/packages/77/a9/bf4fe3538a0c965f16b406f180a06105b875da83f0743e36246be64ef550/ijson-3.5.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:a04a33ee78a6f27b9b8528c1ca3c207b1df3b8b867a4cf2fcc4109986f35c227", size = 60330, upload-time = "2026-02-24T03:57:10.574Z" },
+    { url = "https://files.pythonhosted.org/packages/31/76/6f91bdb019dd978fce1bc5ea1cd620cfc096d258126c91db2c03a20a7f34/ijson-3.5.0-cp312-cp312-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:7d48dc2984af02eb3c56edfb3f13b3f62f2f3e4fe36f058c8cfc75d93adf4fed", size = 138977, upload-time = "2026-02-24T03:57:11.932Z" },
+    { url = "https://files.pythonhosted.org/packages/11/be/bbc983059e48a54b0121ee60042979faed7674490bbe7b2c41560db3f436/ijson-3.5.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f1e73a44844d9adbca9cf2c4132cd875933e83f3d4b23881fcaf82be83644c7d", size = 149785, upload-time = "2026-02-24T03:57:13.255Z" },
+    { url = "https://files.pythonhosted.org/packages/6d/81/2fee58f9024a3449aee83edfa7167fb5ccd7e1af2557300e28531bb68e16/ijson-3.5.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7389a56b8562a19948bdf1d7bae3a2edc8c7f86fb59834dcb1c4c722818e645a", size = 149729, upload-time = "2026-02-24T03:57:14.191Z" },
+    { url = "https://files.pythonhosted.org/packages/c7/56/f1706761fcc096c9d414b3dcd000b1e6e5c24364c21cfba429837f98ee8d/ijson-3.5.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:3176f23f8ebec83f374ed0c3b4e5a0c4db7ede54c005864efebbed46da123608", size = 150697, upload-time = "2026-02-24T03:57:15.855Z" },
+    { url = "https://files.pythonhosted.org/packages/d9/6e/ee0d9c875a0193b632b3e9ccd1b22a50685fb510256ad57ba483b6529f77/ijson-3.5.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:6babd88e508630c6ef86c9bebaaf13bb2fb8ec1d8f8868773a03c20253f599bc", size = 142873, upload-time = "2026-02-24T03:57:16.831Z" },
+    { url = "https://files.pythonhosted.org/packages/d2/bf/f9d4399d0e6e3fd615035290a71e97c843f17f329b43638c0a01cf112d73/ijson-3.5.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:dc1b3836b174b6db2fa8319f1926fb5445abd195dc963368092103f8579cb8ed", size = 151583, upload-time = "2026-02-24T03:57:17.757Z" },
+    { url = "https://files.pythonhosted.org/packages/a2/71/d67e764a712c3590627480643a3b51efcc3afa4ef3cb54ee4c989073c97e/ijson-3.5.0-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:e9cedc10e40dd6023c351ed8bfc7dcfce58204f15c321c3c1546b9c7b12562a4", size = 88544, upload-time = "2026-02-24T03:57:21.293Z" },
+    { url = "https://files.pythonhosted.org/packages/1a/39/f1c299371686153fa3cf5c0736b96247a87a1bee1b7145e6d21f359c505a/ijson-3.5.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:3647649f782ee06c97490b43680371186651f3f69bebe64c6083ee7615d185e5", size = 60495, upload-time = "2026-02-24T03:57:22.501Z" },
+    { url = "https://files.pythonhosted.org/packages/16/94/b1438e204d75e01541bebe3e668fe3e68612d210e9931ae1611062dd0a56/ijson-3.5.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:90e74be1dce05fce73451c62d1118671f78f47c9f6be3991c82b91063bf01fc9", size = 60325, upload-time = "2026-02-24T03:57:23.332Z" },
+    { url = "https://files.pythonhosted.org/packages/30/e2/4aa9c116fa86cc8b0f574f3c3a47409edc1cd4face05d0e589a5a176b05d/ijson-3.5.0-cp313-cp313-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:78e9ad73e7be2dd80627504bd5cbf512348c55ce2c06e362ed7683b5220e8568", size = 138774, upload-time = "2026-02-24T03:57:24.683Z" },
+    { url = "https://files.pythonhosted.org/packages/d2/d2/738b88752a70c3be1505faa4dcd7110668c2712e582a6a36488ed1e295d4/ijson-3.5.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9577449313cc94be89a4fe4b3e716c65f09cc19636d5a6b2861c4e80dddebd58", size = 149820, upload-time = "2026-02-24T03:57:26.062Z" },
+    { url = "https://files.pythonhosted.org/packages/ed/df/0b3ab9f393ca8f72ea03bc896ba9fdc987e90ae08cdb51c32a4ee0c14d5e/ijson-3.5.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:3e4c1178fb50aff5f5701a30a5152ead82a14e189ce0f6102fa1b5f10b2f54ff", size = 149747, upload-time = "2026-02-24T03:57:27.308Z" },
+    { url = "https://files.pythonhosted.org/packages/cc/a3/b0037119f75131b78cb00acc2657b1a9d0435475f1f2c5f8f5a170b66b9c/ijson-3.5.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:0eb402ab026ffb37a918d75af2b7260fe6cfbce13232cc83728a714dd30bd81d", size = 151027, upload-time = "2026-02-24T03:57:28.522Z" },
+    { url = "https://files.pythonhosted.org/packages/22/a0/cb344de1862bf09d8f769c9d25c944078c87dd59a1b496feec5ad96309a4/ijson-3.5.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:5b08ee08355f9f729612a8eb9bf69cc14f9310c3b2a487c6f1c3c65d85216ec4", size = 142996, upload-time = "2026-02-24T03:57:29.774Z" },
+    { url = "https://files.pythonhosted.org/packages/ca/32/a8ffd67182e02ea61f70f62daf43ded4fa8a830a2520a851d2782460aba8/ijson-3.5.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:bda62b6d48442903e7bf56152108afb7f0f1293c2b9bef2f2c369defea76ab18", size = 152068, upload-time = "2026-02-24T03:57:30.969Z" },
+    { url = "https://files.pythonhosted.org/packages/42/65/13e2492d17e19a2084523e18716dc2809159f2287fd2700c735f311e76c4/ijson-3.5.0-cp313-cp313t-macosx_10_13_universal2.whl", hash = "sha256:4d4b0cd676b8c842f7648c1a783448fac5cd3b98289abd83711b3e275e143524", size = 93019, upload-time = "2026-02-24T03:57:33.976Z" },
+    { url = "https://files.pythonhosted.org/packages/33/92/483fc97ece0c3f1cecabf48f6a7a36e89d19369eec462faaeaa34c788992/ijson-3.5.0-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:252dec3680a48bb82d475e36b4ae1b3a9d7eb690b951bb98a76c5fe519e30188", size = 62714, upload-time = "2026-02-24T03:57:34.819Z" },
+    { url = "https://files.pythonhosted.org/packages/4b/88/793fe020a0fe9d9eed4c285cf4a5cfdb0a935708b3bde0d72f35c794b513/ijson-3.5.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:aa1b5dca97d323931fde2501172337384c958914d81a9dac7f00f0d4bfc76bc7", size = 62460, upload-time = "2026-02-24T03:57:35.874Z" },
+    { url = "https://files.pythonhosted.org/packages/51/69/f1a2690aa8d4df1f4e262b385e65a933ffdc250b091531bac9a449c19e16/ijson-3.5.0-cp313-cp313t-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:7a5ec7fd86d606094bba6f6f8f87494897102fa4584ef653f3005c51a784c320", size = 199273, upload-time = "2026-02-24T03:57:37.07Z" },
+    { url = "https://files.pythonhosted.org/packages/ea/a2/f1346d5299e79b988ab472dc773d5381ec2d57c23cb2f1af3ede4a810e62/ijson-3.5.0-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:009f41443e1521847701c6d87fa3923c0b1961be3c7e7de90947c8cb92ea7c44", size = 216884, upload-time = "2026-02-24T03:57:38.346Z" },
+    { url = "https://files.pythonhosted.org/packages/28/3c/8b637e869be87799e6c2c3c275a30a546f086b1aed77e2b7f11512168c5a/ijson-3.5.0-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e4c3651d1f9fe2839a93fdf8fd1d5ca3a54975349894249f3b1b572bcc4bd577", size = 207306, upload-time = "2026-02-24T03:57:39.718Z" },
+    { url = "https://files.pythonhosted.org/packages/7f/7c/18b1c1df6951ca056782d7580ec40cea4ff9a27a0947d92640d1cc8c4ae3/ijson-3.5.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:945b7abcfcfeae2cde17d8d900870f03536494245dda7ad4f8d056faa303256c", size = 211364, upload-time = "2026-02-24T03:57:40.953Z" },
+    { url = "https://files.pythonhosted.org/packages/f3/55/e795812e82851574a9dba8a53fde045378f531ef14110c6fb55dbd23b443/ijson-3.5.0-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:0574b0a841ff97495c13e9d7260fbf3d85358b061f540c52a123db9dbbaa2ed6", size = 200608, upload-time = "2026-02-24T03:57:42.272Z" },
+    { url = "https://files.pythonhosted.org/packages/5c/cd/013c85b4749b57a4cb4c2670014d1b32b8db4ab1a7be92ea7aeb5d7fe7b5/ijson-3.5.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:f969ffb2b89c5cdf686652d7fb66252bc72126fa54d416317411497276056a18", size = 205127, upload-time = "2026-02-24T03:57:43.286Z" },
+    { url = "https://files.pythonhosted.org/packages/7a/93/0868efe753dc1df80cc405cf0c1f2527a6991643607c741bff8dcb899b3b/ijson-3.5.0-cp314-cp314-macosx_10_15_universal2.whl", hash = "sha256:25a5a6b2045c90bb83061df27cfa43572afa43ba9408611d7bfe237c20a731a9", size = 89094, upload-time = "2026-02-24T03:57:46.115Z" },
+    { url = "https://files.pythonhosted.org/packages/24/94/fd5a832a0df52ef5e4e740f14ac8640725d61034a1b0c561e8b5fb424706/ijson-3.5.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:8976c54c0b864bc82b951bae06567566ac77ef63b90a773a69cd73aab47f4f4f", size = 60715, upload-time = "2026-02-24T03:57:47.552Z" },
+    { url = "https://files.pythonhosted.org/packages/70/79/1b9a90af5732491f9eec751ee211b86b11011e1158c555c06576d52c3919/ijson-3.5.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:859eb2038f7f1b0664df4241957694cc35e6295992d71c98659b22c69b3cbc10", size = 60638, upload-time = "2026-02-24T03:57:48.428Z" },
+    { url = "https://files.pythonhosted.org/packages/23/6f/2c551ea980fe56f68710a8d5389cfbd015fc45aaafd17c3c52c346db6aa1/ijson-3.5.0-cp314-cp314-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:c911aa02991c7c0d3639b6619b93a93210ff1e7f58bf7225d613abea10adc78e", size = 140667, upload-time = "2026-02-24T03:57:49.314Z" },
+    { url = "https://files.pythonhosted.org/packages/25/0e/27b887879ba6a5bc29766e3c5af4942638c952220fd63e1e442674f7883a/ijson-3.5.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:903cbdc350173605220edc19796fbea9b2203c8b3951fb7335abfa8ed37afda8", size = 149850, upload-time = "2026-02-24T03:57:50.329Z" },
+    { url = "https://files.pythonhosted.org/packages/da/1e/23e10e1bc04bf31193b21e2960dce14b17dbd5d0c62204e8401c59d62c08/ijson-3.5.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a4549d96ded5b8efa71639b2160235415f6bdb8c83367615e2dbabcb72755c33", size = 149206, upload-time = "2026-02-24T03:57:51.261Z" },
+    { url = "https://files.pythonhosted.org/packages/8e/90/e552f6495063b235cf7fa2c592f6597c057077195e517b842a0374fd470c/ijson-3.5.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:6b2dcf6349e6042d83f3f8c39ce84823cf7577eba25bac5aae5e39bbbbbe9c1c", size = 150438, upload-time = "2026-02-24T03:57:52.198Z" },
+    { url = "https://files.pythonhosted.org/packages/5c/18/45bf8f297c41b42a1c231d261141097babd953d2c28a07be57ae4c3a1a02/ijson-3.5.0-cp314-cp314-musllinux_1_2_i686.whl", hash = "sha256:e44af39e6f8a17e5627dcd89715d8279bf3474153ff99aae031a936e5c5572e5", size = 144369, upload-time = "2026-02-24T03:57:53.22Z" },
+    { url = "https://files.pythonhosted.org/packages/9b/3a/deb9772bb2c0cead7ad64f00c3598eec9072bdf511818e70e2c512eeabbe/ijson-3.5.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:9260332304b7e7828db56d43f08fc970a3ab741bf84ff10189361ea1b60c395b", size = 151352, upload-time = "2026-02-24T03:57:54.375Z" },
+    { url = "https://files.pythonhosted.org/packages/9f/d9/86f7fac35e0835faa188085ae0579e813493d5261ce056484015ad533445/ijson-3.5.0-cp314-cp314t-macosx_10_15_universal2.whl", hash = "sha256:2ea4b676ec98e374c1df400a47929859e4fa1239274339024df4716e802aa7e4", size = 93069, upload-time = "2026-02-24T03:57:57.849Z" },
+    { url = "https://files.pythonhosted.org/packages/33/d2/e7366ed9c6e60228d35baf4404bac01a126e7775ea8ce57f560125ed190a/ijson-3.5.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:014586eec043e23c80be9a923c56c3a0920a0f1f7d17478ce7bc20ba443968ef", size = 62767, upload-time = "2026-02-24T03:57:58.758Z" },
+    { url = "https://files.pythonhosted.org/packages/35/8b/3e703e8cc4b3ada79f13b28070b51d9550c578f76d1968657905857b2ddd/ijson-3.5.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:d5b8b886b0248652d437f66e7c5ac318bbdcb2c7137a7e5327a68ca00b286f5f", size = 62467, upload-time = "2026-02-24T03:58:00.261Z" },
+    { url = "https://files.pythonhosted.org/packages/21/42/0c91af32c1ee8a957fdac2e051b5780756d05fd34e4b60d94a08d51bac1d/ijson-3.5.0-cp314-cp314t-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:498fd46ae2349297e43acf97cdc421e711dbd7198418677259393d2acdc62d78", size = 200447, upload-time = "2026-02-24T03:58:01.591Z" },
+    { url = "https://files.pythonhosted.org/packages/f9/80/796ea0e391b7e2d45c5b1b451734bba03f81c2984cf955ea5eaa6c4920ad/ijson-3.5.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:22a51b4f9b81f12793731cf226266d1de2112c3c04ba4a04117ad4e466897e05", size = 217820, upload-time = "2026-02-24T03:58:02.598Z" },
+    { url = "https://files.pythonhosted.org/packages/38/14/52b6613fdda4078c62eb5b4fe3efc724ddc55a4ad524c93de51830107aa3/ijson-3.5.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9636c710dc4ac4a281baa266a64f323b4cc165cec26836af702c44328b59a515", size = 208310, upload-time = "2026-02-24T03:58:04.759Z" },
+    { url = "https://files.pythonhosted.org/packages/6a/ad/8b3105a78774fd4a65e534a21d975ef3a77e189489fe3029ebcaeba5e243/ijson-3.5.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:f7168a39e8211107666d71b25693fd1b2bac0b33735ef744114c403c6cac21e1", size = 211843, upload-time = "2026-02-24T03:58:05.836Z" },
+    { url = "https://files.pythonhosted.org/packages/36/ab/a2739f6072d6e1160581bc3ed32da614c8cced023dcd519d9c5fa66e0425/ijson-3.5.0-cp314-cp314t-musllinux_1_2_i686.whl", hash = "sha256:8696454245415bc617ab03b0dc3ae4c86987df5dc6a90bad378fe72c5409d89e", size = 200906, upload-time = "2026-02-24T03:58:07.788Z" },
+    { url = "https://files.pythonhosted.org/packages/6d/5e/e06c2de3c3d4a9cfb655c1ad08a68fb72838d271072cdd3196576ac4431a/ijson-3.5.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:c21bfb61f71f191565885bf1bc29e0a186292d866b4880637b833848360bdc1b", size = 205495, upload-time = "2026-02-24T03:58:09.163Z" },
+    { url = "https://files.pythonhosted.org/packages/d9/3b/d31ecfa63a218978617446159f3d77aab2417a5bd2885c425b176353ff78/ijson-3.5.0-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:d64c624da0e9d692d6eb0ff63a79656b59d76bf80773a17c5b0f835e4e8ef627", size = 57715, upload-time = "2026-02-24T03:58:24.545Z" },
+    { url = "https://files.pythonhosted.org/packages/30/51/b170e646d378e8cccf9637c05edb5419b00c2c4df64b0258c3af5355608e/ijson-3.5.0-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:876f7df73b7e0d6474f9caa729b9cdbfc8e76de9075a4887dfd689e29e85c4ca", size = 57205, upload-time = "2026-02-24T03:58:25.681Z" },
+    { url = "https://files.pythonhosted.org/packages/ef/83/44dbd0231b0a8c6c14d27473d10c4e27dfbce7d5d9a833c79e3e6c33eb40/ijson-3.5.0-pp311-pypy311_pp73-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:e7dbff2c8d9027809b0cde663df44f3210da10ea377121d42896fb6ee405dd31", size = 71229, upload-time = "2026-02-24T03:58:27.103Z" },
+    { url = "https://files.pythonhosted.org/packages/c8/98/cf84048b7c6cec888826e696a31f45bee7ebcac15e532b6be1fc4c2c9608/ijson-3.5.0-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4217a1edc278660679e1197c83a1a2a2d367792bfbb2a3279577f4b59b93730d", size = 71217, upload-time = "2026-02-24T03:58:28.021Z" },
+    { url = "https://files.pythonhosted.org/packages/3c/0a/e34c729a87ff67dc6540f6bcc896626158e691d433ab57db0086d73decd2/ijson-3.5.0-pp311-pypy311_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:04f0fc740311388ee745ba55a12292b722d6f52000b11acbb913982ba5fbdf87", size = 68618, upload-time = "2026-02-24T03:58:28.918Z" },
+]
+
 [[package]]
 name = "imagehash"
 version = "4.3.2"
@@ -2751,6 +2818,7 @@ dependencies = [
    { name = "flower", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
    { name = "gotenberg-client", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
    { name = "httpx-oauth", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
+    { name = "ijson", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
    { name = "imap-tools", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
    { name = "jinja2", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
    { name = "langdetect", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -2898,6 +2966,7 @@ requires-dist = [
    { name = "gotenberg-client", specifier = "~=0.13.1" },
    { name = "granian", extras = ["uvloop"], marker = "extra == 'webserver'", specifier = "~=2.7.0" },
    { name = "httpx-oauth", specifier = "~=0.16" },
+    { name = "ijson", specifier = ">=3.2" },
    { name = "imap-tools", specifier = "~=1.11.0" },
    { name = "jinja2", specifier = "~=3.1.5" },
    { name = "langdetect", specifier = "~=1.0.9" },
Author	SHA1	Message	Date
Trenton H	7eb417e796	Feat: refactor TextDocumentParser to ParserProtocol Starting from the moved paperless_text/parsers.py, rewrite the class to satisfy ParserProtocol without inheriting from the old DocumentParser base: - Add class-level identity attributes (name, version, author, url) - Add supported_mime_types() and score() classmethods - Add can_produce_archive and requires_pdf_rendition properties (both False) - Replace tempdir / read_file_handle_unicode_errors from old base class with a self-contained __init__, __enter__, __exit__, and _read_text helper - Drop file_name parameter from parse() and get_thumbnail(); add produce_archive kwarg - Add extract_metadata() returning [] (plain text has no structured metadata) - Remove get_settings() (not part of ParserProtocol) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 16:54:52 -07:00
Trenton H	8c08362ebc	Chore: move paperless_text/parsers.py to paperless/parsers/text.py Preserves git history of the original TextDocumentParser implementation. The file will be edited in the next commit to implement ParserProtocol. Consumption via the old signal-based system is temporarily broken. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 16:31:00 -07:00
Trenton H	c37ab946e1	Feat: add MetadataEntry TypedDict and extract_metadata to ParserProtocol - Define MetadataEntry TypedDict (namespace, prefix, key, value) in paperless.parsers and export it from __all__ - Add extract_metadata(document_path, mime_type) -> list[MetadataEntry] to ParserProtocol; implementations must not raise - Implement extract_metadata on TextDocumentParser (returns []) - Update DummyParser fixture in test_registry to include extract_metadata and align parse/get_thumbnail signatures with the current Protocol - Add TestTextParserMetadata tests covering empty-list return and mime_type-agnostic behaviour Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 16:07:10 -07:00
Trenton H	82068303d0	Use the main version as the built-in parser version	2026-03-09 15:40:28 -07:00
Trenton H	cc8e9a7108	Fix: type ParserRegistry lists and methods as type[ParserProtocol] _builtins, _external, register_builtin, and get_parser_for_file were typed as plain `type`, giving mypy no way to verify that supported_mime_types and score exist on the stored classes. Using type[ParserProtocol] throughout resolves the attr-defined errors and makes the registry's type contract explicit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 15:30:09 -07:00
Trenton H	1870f69053	Fix: use Self as __enter__ return type in TextDocumentParser Returning the concrete class name would give callers the wrong type if the class is ever subclassed. Self resolves to the actual runtime type, matching the ParserProtocol declaration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 15:25:54 -07:00
Trenton H	053d590cb8	Fix: align ParserProtocol.__exit__ exc_tb type with TextDocumentParser Both now use TracebackType | None instead of object. The Protocol's object annotation was overly broad — Python only ever passes TracebackType or None as the third argument to __exit__, and the narrower type is required for pyrefly's contravariant parameter check to pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 15:24:49 -07:00
Trenton H	987aa363dc	Chore: use text_parser fixture instead of direct instantiation in tests Tests that were using `with TextDocumentParser() as parser:` inline now receive the parser via the text_parser fixture. The two lifecycle tests that must control instantiation directly (cleanup and exception cleanup) are intentionally left unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 15:23:01 -07:00
Trenton H	b8f63026f7	Chore: reorganise parser tests and samples into sub-directories - Move text sample files into tests/samples/text/ so each parser type has its own folder as more parsers are migrated - Move test_text_parser.py into tests/parsers/ sub-package (new __init__.py) - Split conftest.py: top-level keeps clean_registry + samples_dir; new parsers/conftest.py holds text_samples_dir, sample_txt_file, malformed_txt_file, and text_parser fixtures Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 14:38:23 -07:00
Trenton H	3a232f0c8f	Feature: Phase 3 — migrate TextDocumentParser to ParserProtocol - Add paperless/parsers/text.py: standalone TextDocumentParser implementing ParserProtocol (no inheritance from old DocumentParser ABC); uses __enter__/ __exit__ for tempdir lifecycle, score()-based MIME registration - Register TextDocumentParser in ParserRegistry.register_defaults() - Add paperless/tests/conftest.py: session-scoped sample_dir, sample_txt_file, malformed_txt_file fixtures; function-scoped text_parser fixture using the context-manager protocol; autouse clean_registry fixture (moved from test_registry.py to avoid duplication) - Add paperless/tests/test_text_parser.py: 20 tests covering protocol compliance, lifecycle/cleanup, parse, thumbnail, and registry integration - Copy sample files (test.txt, decode_error.txt) to paperless/tests/samples/ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 14:34:08 -07:00
Trenton H	404ef6b40d	Formatting	2026-03-09 14:25:33 -07:00
Trenton H	8c40491034	Refactor: Clean up ParserProtocol docstrings and drop file_name parameter - Remove all Sphinx cross-reference markup (:meth:, :class:, :func:, :attr:, :data:, backtick quoting) from registry.py and __init__.py docstrings; use plain prose matching the rest of the codebase - Remove unused file_name parameter from parse() and get_thumbnail() in ParserProtocol — no existing parser reads it and the path already carries the filename Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 14:09:32 -07:00
Trenton H	0f6bdaf5de	Feature: Add parser plugin registry and ParserProtocol (Phase 1 & 2) Introduces the foundation of the entrypoint-based parser discovery system to replace the signal-based document_consumer_declaration approach. - Add ParserProtocol: runtime_checkable Protocol defining the full contract for document parsers (supported_mime_types, score, parse, context manager, result accessors) - Add ParserRegistry: lazy singleton with entrypoint discovery via importlib.metadata group 'paperless_ngx.parsers', uniform score-based selection across external and built-in parsers - Add get_parser_registry(), init_builtin_parsers(), reset_parser_registry() module-level helpers - Wire Celery worker_process_init to call init_builtin_parsers() eagerly in each worker, deferring third-party discovery to first task use - Add 28 pytest tests covering Protocol compliance, singleton lifecycle, scoring logic, entrypoint discovery, and log output Built-in parsers and consumer migration follow in Phases 3-6. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 13:54:52 -07:00
Trenton H	bcc2f11152	Performance: Stream JSON during import for memory improvements (#12276) * Perf: stream manifest parsing with ijson in document_importer Replace bulk json.load of the full manifest (which materializes the entire JSON array into memory) with incremental ijson streaming. Eliminates self.manifest entirely — records are never all in memory at once. - Add ijson>=3.2 dependency - New module-level iter_manifest_records() generator - load_manifest_files() collects paths only; no parsing at load time - check_manifest_validity() streams without accumulating records - decrypt_secret_fields() streams each manifest to a .decrypted.json temp file record-by-record; temp files cleaned up after file copy - _import_files_from_manifest() collects only document records (small fraction of manifest) for the tqdm progress bar Measured on 200 docs + 200 CustomFieldInstances: - Streaming validation: peak memory 3081 KiB -> 333 KiB (89% reduction) - Stream-decrypt to file: peak memory 3081 KiB -> 549 KiB (82% reduction) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Perf: slim dict in _import_files_from_manifest, discard fields When collecting document records for the file-copy step, extract only the 4 keys the loop actually uses (pk + 3 exported filename keys) and discard the full fields dict (content, checksum, tags, etc.). Peak memory for the document-record list: 939 KiB -> 375 KiB (60% reduction). Wall time unchanged.	2026-03-09 10:20:48 -07:00
shamoon	e18b1fd99d	Chore: use unified "gates" for ci tests and docs checks (#12277)	2026-03-09 17:02:34 +00:00
Trenton H	e30676f889	Feature: Migrate import/export to rich progress (#12260) * Refactor: migrate exporter/importer from tqdm to PaperlessCommand.track() Replace direct tqdm usage in document_exporter and document_importer with the PaperlessCommand base class and its track() method, which is backed by Rich and handles --no-progress-bar automatically. Also removes the unused ProgressBarMixin from mixins.py. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Refactor: add explicit supports_progress_bar and supports_multiprocessing to all PaperlessCommand subclasses Each management command now explicitly declares both class attributes rather than relying on defaults, making intent unambiguous at a glance. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 08:59:17 -07:00
Martin Kleine	2a28549c5a	Documentation: Update development commands and pnpm for Angular build commands (#12283) --------- Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2026-03-09 07:06:16 -07:00
GitHub Actions	4badf0e7c2	Auto translate strings	2026-03-09 01:52:08 +00:00
Paul Gessinger	bc26d94593	Chore: Add saved view compatibility in API version 9 (#12280) --------- Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2026-03-08 18:50:31 -07:00
shamoon	93cbbf34b7	Merge branch 'main' into dev	2026-03-07 23:30:08 -08:00
shamoon	1e8622494d	Documentation: remove broken link	2026-03-07 23:29:42 -08:00
GitHub Actions	0c3298f030	Auto translate strings	2026-03-08 03:06:59 +00:00
Sven-Hendrik Haase	2b288c094d	Enhancement: Show correspondent in document merge dialog (#12271) --------- Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>	2026-03-07 19:05:28 -08:00
Trenton H	2cdb1424ef	Performance: Further export memory improvements (#12273) * Perf: streaming manifest writer for document exporter (Phase 3) Replaces the in-memory manifest dict accumulation with a StreamingManifestWriter that writes records to manifest.json incrementally, keeping only one batch resident in memory at a time. Key changes: - Add StreamingManifestWriter: writes to .tmp atomically, BLAKE2b compare for --compare-json, discard() on exception - Add _encrypt_record_inline(): per-record encryption replacing the bulk encrypt_secret_fields() call; crypto setup moved before streaming - Add _write_split_manifest(): extracted per-document manifest writing - Refactor dump(): non-doc records streamed during transaction, documents accumulated then written after filenames are assigned - Upgrade check_and_write_json() from MD5 to BLAKE2b - Remove encrypt_secret_fields() and unused itertools.chain import - Add profiling marker to pyproject.toml Measured improvement (200 docs + 200 CustomFieldInstances, same dump() code path, only writer differs): - Peak memory: ~50% reduction - Memory delta: ~70% reduction - Wall time and query count: unchanged Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Refactor: O(1) lookup table for CRYPT_FIELDS in per-record encryption Add CRYPT_FIELDS_BY_MODEL to CryptMixin, derived from CRYPT_FIELDS at class definition time. _encrypt_record_inline() now does a single dict lookup instead of a linear scan per record, eliminating the loop and break pattern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-07 14:24:50 -08:00
Trenton H	f5c0c21922	Chore: Lazy imports of the heavy AI modules (#12275)	2026-03-07 12:53:22 -08:00
Trenton H	91ddda9256	Fix: Uploaded digest artifact name for Docker build (#12272)	2026-03-06 13:15:45 -08:00
Trenton H	9d5e618de8	Chore: pytest style paperless tests (#12254)	2026-03-06 13:04:23 -08:00
Trenton H	50ae49c7da	Chore: Uploads the digests as just files, no zips (#12264)	2026-03-06 12:56:34 -08:00
shamoon	ba023ef332	Chore: Add anti-slop job to PR workflow (#12248)	2026-03-06 20:36:24 +00:00