Compare commits

24 Commits

Author SHA1 Message Date
Trenton H
9312d94b1c Upgrades Django manually, since dependabot is failing. Resolves security alerts 2026-04-11 14:49:58 -07:00
shamoon
fdd5e3ecb2 Update SECURITY.md 2026-04-10 12:34:47 -07:00
shamoon
df3b656352 Add tests 2026-04-10 12:06:28 -07:00
shamoon
51e721733f Enhancement: validate and sanitize uploaded logos (#12551) 2026-04-10 11:50:58 -07:00
dependabot[bot]
0ad8b8c002 Chore(deps): Bump the utilities-minor group with 19 updates (#12540)
Bumps the utilities-minor group with 19 updates:

| Package | From | To |
| --- | --- | --- |
| [dateparser](https://github.com/scrapinghub/dateparser) | `1.3.0` | `1.4.0` |
| [drf-spectacular-sidecar](https://github.com/tfranzel/drf-spectacular-sidecar) | `2026.3.1` | `2026.4.1` |
| llama-index-embeddings-huggingface | `0.6.1` | `0.7.0` |
| llama-index-embeddings-openai | `0.5.2` | `0.6.0` |
| llama-index-llms-ollama | `0.9.1` | `0.10.1` |
| llama-index-llms-openai | `0.6.26` | `0.7.5` |
| llama-index-vector-stores-faiss | `0.5.3` | `0.6.0` |
| [openai](https://github.com/openai/openai-python) | `2.26.0` | `2.30.0` |
| [regex](https://github.com/mrabarnett/mrab-regex) | `2026.2.28` | `2026.3.32` |
| [sentence-transformers](https://github.com/huggingface/sentence-transformers) | `5.2.3` | `5.3.0` |
| [torch](https://github.com/pytorch/pytorch) | `2.10.0` | `2.11.0` |
| [faker](https://github.com/joke2k/faker) | `40.8.0` | `40.12.0` |
| [pytest-cov](https://github.com/pytest-dev/pytest-cov) | `7.0.0` | `7.1.0` |
| [pytest-env](https://github.com/pytest-dev/pytest-env) | `1.5.0` | `1.6.0` |
| [celery-types](https://github.com/sbdchd/celery-types) | `0.24.0` | `0.26.0` |
| [mypy](https://github.com/python/mypy) | `1.19.1` | `1.20.0` |
| [pyrefly](https://github.com/facebook/pyrefly) | `0.55.0` | `0.59.0` |
| [types-channels](https://github.com/python/typeshed) | `4.3.0.20250822` | `4.3.0.20260408` |
| [types-dateparser](https://github.com/python/typeshed) | `1.3.0.20260206` | `1.4.0.20260328` |


Updates `dateparser` from 1.3.0 to 1.4.0
- [Release notes](https://github.com/scrapinghub/dateparser/releases)
- [Changelog](https://github.com/scrapinghub/dateparser/blob/master/HISTORY.rst)
- [Commits](https://github.com/scrapinghub/dateparser/compare/v1.3.0...v1.4.0)

Updates `drf-spectacular-sidecar` from 2026.3.1 to 2026.4.1
- [Commits](https://github.com/tfranzel/drf-spectacular-sidecar/compare/2026.3.1...2026.4.1)

Updates `llama-index-embeddings-huggingface` from 0.6.1 to 0.7.0

Updates `llama-index-embeddings-openai` from 0.5.2 to 0.6.0

Updates `llama-index-llms-ollama` from 0.9.1 to 0.10.1

Updates `llama-index-llms-openai` from 0.6.26 to 0.7.5

Updates `llama-index-vector-stores-faiss` from 0.5.3 to 0.6.0

Updates `openai` from 2.26.0 to 2.30.0
- [Release notes](https://github.com/openai/openai-python/releases)
- [Changelog](https://github.com/openai/openai-python/blob/main/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-python/compare/v2.26.0...v2.30.0)

Updates `regex` from 2026.2.28 to 2026.3.32
- [Changelog](https://github.com/mrabarnett/mrab-regex/blob/hg/changelog.txt)
- [Commits](https://github.com/mrabarnett/mrab-regex/compare/2026.2.28...2026.3.32)

Updates `sentence-transformers` from 5.2.3 to 5.3.0
- [Release notes](https://github.com/huggingface/sentence-transformers/releases)
- [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.2.3...v5.3.0)

Updates `torch` from 2.10.0 to 2.11.0
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v2.10.0...v2.11.0)

Updates `faker` from 40.8.0 to 40.12.0
- [Release notes](https://github.com/joke2k/faker/releases)
- [Changelog](https://github.com/joke2k/faker/blob/master/CHANGELOG.md)
- [Commits](https://github.com/joke2k/faker/compare/v40.8.0...v40.12.0)

Updates `pytest-cov` from 7.0.0 to 7.1.0
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest-cov/compare/v7.0.0...v7.1.0)

Updates `pytest-env` from 1.5.0 to 1.6.0
- [Release notes](https://github.com/pytest-dev/pytest-env/releases)
- [Commits](https://github.com/pytest-dev/pytest-env/compare/1.5.0...1.6.0)

Updates `celery-types` from 0.24.0 to 0.26.0
- [Commits](https://github.com/sbdchd/celery-types/commits)

Updates `mypy` from 1.19.1 to 1.20.0
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.19.1...v1.20.0)

Updates `pyrefly` from 0.55.0 to 0.59.0
- [Release notes](https://github.com/facebook/pyrefly/releases)
- [Commits](https://github.com/facebook/pyrefly/compare/0.55.0...0.59.0)

Updates `types-channels` from 4.3.0.20250822 to 4.3.0.20260408
- [Commits](https://github.com/python/typeshed/commits)

Updates `types-dateparser` from 1.3.0.20260206 to 1.4.0.20260328
- [Commits](https://github.com/python/typeshed/commits)

---
updated-dependencies:
- dependency-name: dateparser
  dependency-version: 1.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: drf-spectacular-sidecar
  dependency-version: 2026.4.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: llama-index-embeddings-huggingface
  dependency-version: 0.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: llama-index-embeddings-openai
  dependency-version: 0.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: llama-index-llms-ollama
  dependency-version: 0.10.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: llama-index-llms-openai
  dependency-version: 0.7.5
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: llama-index-vector-stores-faiss
  dependency-version: 0.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: openai
  dependency-version: 2.30.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: regex
  dependency-version: 2026.3.32
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: sentence-transformers
  dependency-version: 5.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: torch
  dependency-version: 2.11.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: faker
  dependency-version: 40.12.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: pytest-cov
  dependency-version: 7.1.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: pytest-env
  dependency-version: 1.6.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: celery-types
  dependency-version: 0.26.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: mypy
  dependency-version: 1.20.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: pyrefly
  dependency-version: 0.59.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
- dependency-name: types-channels
  dependency-version: 4.3.0.20260408
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: utilities-minor
- dependency-name: types-dateparser
  dependency-version: 1.4.0.20260328
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: utilities-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 15:09:42 -07:00
dependabot[bot]
4d5d77ce15 Chore(deps): Bump cryptography in the uv group across 1 directory (#12546)
Bumps the uv group with 1 update in the / directory: [cryptography](https://github.com/pyca/cryptography).


Updates `cryptography` from 46.0.6 to 46.0.7
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/46.0.6...46.0.7)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-version: 46.0.7
  dependency-type: indirect
  dependency-group: uv
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 14:01:39 -07:00
dependabot[bot]
5ba2ce9c98 Chore(deps-dev): Bump types-python-dateutil (#12542)
Bumps [types-python-dateutil](https://github.com/python/typeshed) from 2.9.0.20260305 to 2.9.0.20260323.
- [Commits](https://github.com/python/typeshed/commits)

---
updated-dependencies:
- dependency-name: types-python-dateutil
  dependency-version: 2.9.0.20260323
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 13:46:05 -07:00
dependabot[bot]
d8fe6a9a36 Chore(deps-dev): Bump types-pytz (#12541)
Bumps [types-pytz](https://github.com/python/typeshed) from 2025.2.0.20251108 to 2026.1.1.20260304.
- [Commits](https://github.com/python/typeshed/commits)

---
updated-dependencies:
- dependency-name: types-pytz
  dependency-version: 2026.1.1.20260304
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 13:03:54 -07:00
dependabot[bot]
bd630c1280 Chore(deps): Bump django-guardian in the utilities-patch group (#12539)
Bumps the utilities-patch group with 1 update: [django-guardian](https://github.com/django-guardian/django-guardian).


Updates `django-guardian` from 3.3.0 to 3.3.1
- [Release notes](https://github.com/django-guardian/django-guardian/releases)
- [Commits](https://github.com/django-guardian/django-guardian/compare/3.3.0...3.3.1)

---
updated-dependencies:
- dependency-name: django-guardian
  dependency-version: 3.3.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: utilities-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 11:59:35 -07:00
dependabot[bot]
ab183b9982 Chore(deps-dev): Bump zensical in the development group (#12532)
Bumps the development group with 1 update: [zensical](https://github.com/zensical/zensical).


Updates `zensical` from 0.0.29 to 0.0.31
- [Release notes](https://github.com/zensical/zensical/releases)
- [Commits](https://github.com/zensical/zensical/compare/v0.0.29...v0.0.31)

---
updated-dependencies:
- dependency-name: zensical
  dependency-version: 0.0.31
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 18:19:50 +00:00
dependabot[bot]
439e10d767 Chore(deps): Bump pdfjs-dist from 5.4.624 to 5.6.205 in /src-ui (#12536)
Bumps [pdfjs-dist](https://github.com/mozilla/pdf.js) from 5.4.624 to 5.6.205.
- [Release notes](https://github.com/mozilla/pdf.js/releases)
- [Commits](https://github.com/mozilla/pdf.js/compare/v5.4.624...v5.6.205)

---
updated-dependencies:
- dependency-name: pdfjs-dist
  dependency-version: 5.6.205
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 17:31:39 +00:00
dependabot[bot]
cebfea9d94 Chore(deps): Bump the actions group with 4 updates (#12538)
Bumps the actions group with 4 updates: [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv), [codecov/codecov-action](https://github.com/codecov/codecov-action), [github/codeql-action](https://github.com/github/codeql-action) and [crowdin/github-action](https://github.com/crowdin/github-action).


Updates `astral-sh/setup-uv` from 7.3.1 to 8.0.0
- [Release notes](https://github.com/astral-sh/setup-uv/releases)
- [Commits](5a095e7a20...cec208311d)

Updates `codecov/codecov-action` from 5.5.2 to 6.0.0
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](671740ac38...57e3a136b7)

Updates `github/codeql-action` from 4.32.5 to 4.35.1
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/v4.32.5...c10b8064de6f491fea524254123dbe5e09572f13)

Updates `crowdin/github-action` from 2.15.0 to 2.16.0
- [Release notes](https://github.com/crowdin/github-action/releases)
- [Commits](8818ff65bf...7ca9c452bf)

---
updated-dependencies:
- dependency-name: astral-sh/setup-uv
  dependency-version: 8.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
- dependency-name: codecov/codecov-action
  dependency-version: 6.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
- dependency-name: github/codeql-action
  dependency-version: 4.35.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions
- dependency-name: crowdin/github-action
  dependency-version: 2.16.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 10:23:38 -07:00
dependabot[bot]
a97c0d8a06 Chore(deps-dev): Bump the frontend-eslint-dependencies group (#12535)
Bumps the frontend-eslint-dependencies group in /src-ui with 3 updates: [@typescript-eslint/eslint-plugin](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/eslint-plugin), [@typescript-eslint/parser](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/parser) and [@typescript-eslint/utils](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/utils).


Updates `@typescript-eslint/eslint-plugin` from 8.57.2 to 8.58.0
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/eslint-plugin/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.58.0/packages/eslint-plugin)

Updates `@typescript-eslint/parser` from 8.57.2 to 8.58.0
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/parser/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.58.0/packages/parser)

Updates `@typescript-eslint/utils` from 8.57.2 to 8.58.0
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/utils/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.58.0/packages/utils)

---
updated-dependencies:
- dependency-name: "@typescript-eslint/eslint-plugin"
  dependency-version: 8.58.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: frontend-eslint-dependencies
- dependency-name: "@typescript-eslint/parser"
  dependency-version: 8.58.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: frontend-eslint-dependencies
- dependency-name: "@typescript-eslint/utils"
  dependency-version: 8.58.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: frontend-eslint-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 16:50:57 +00:00
dependabot[bot]
1e571ea23c Chore(deps): Bump the frontend-angular-dependencies group (#12533)
Bumps the frontend-angular-dependencies group in /src-ui with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [@ng-select/ng-select](https://github.com/ng-select/ng-select) | `21.5.2` | `21.7.0` |
| [@angular-devkit/core](https://github.com/angular/angular-cli) | `21.2.3` | `21.2.6` |
| [@angular-devkit/schematics](https://github.com/angular/angular-cli) | `21.2.3` | `21.2.6` |
| [@angular/build](https://github.com/angular/angular-cli) | `21.2.3` | `21.2.6` |
| [@angular/cli](https://github.com/angular/angular-cli) | `21.2.3` | `21.2.6` |


Updates `@ng-select/ng-select` from 21.5.2 to 21.7.0
- [Release notes](https://github.com/ng-select/ng-select/releases)
- [Changelog](https://github.com/ng-select/ng-select/blob/master/CHANGELOG.md)
- [Commits](https://github.com/ng-select/ng-select/compare/v21.5.2...v21.7.0)

Updates `@angular-devkit/core` from 21.2.3 to 21.2.6
- [Release notes](https://github.com/angular/angular-cli/releases)
- [Changelog](https://github.com/angular/angular-cli/blob/main/CHANGELOG.md)
- [Commits](https://github.com/angular/angular-cli/compare/v21.2.3...v21.2.6)

Updates `@angular-devkit/schematics` from 21.2.3 to 21.2.6
- [Release notes](https://github.com/angular/angular-cli/releases)
- [Changelog](https://github.com/angular/angular-cli/blob/main/CHANGELOG.md)
- [Commits](https://github.com/angular/angular-cli/compare/v21.2.3...v21.2.6)

Updates `@angular/build` from 21.2.3 to 21.2.6
- [Release notes](https://github.com/angular/angular-cli/releases)
- [Changelog](https://github.com/angular/angular-cli/blob/main/CHANGELOG.md)
- [Commits](https://github.com/angular/angular-cli/compare/v21.2.3...v21.2.6)

Updates `@angular/cli` from 21.2.3 to 21.2.6
- [Release notes](https://github.com/angular/angular-cli/releases)
- [Changelog](https://github.com/angular/angular-cli/blob/main/CHANGELOG.md)
- [Commits](https://github.com/angular/angular-cli/compare/v21.2.3...v21.2.6)

---
updated-dependencies:
- dependency-name: "@ng-select/ng-select"
  dependency-version: 21.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: frontend-angular-dependencies
- dependency-name: "@angular-devkit/core"
  dependency-version: 21.2.6
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: frontend-angular-dependencies
- dependency-name: "@angular-devkit/schematics"
  dependency-version: 21.2.6
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: frontend-angular-dependencies
- dependency-name: "@angular/build"
  dependency-version: 21.2.6
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: frontend-angular-dependencies
- dependency-name: "@angular/cli"
  dependency-version: 21.2.6
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: frontend-angular-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 16:33:58 +00:00
dependabot[bot]
b80b92a2b2 Chore(deps-dev): Bump jest-preset-angular from 16.1.1 to 16.1.2 in /src-ui in the frontend-jest-dependencies group across 1 directory (#12534)
* Chore(deps-dev): Bump jest-preset-angular

Bumps the frontend-jest-dependencies group in /src-ui with 1 update: [jest-preset-angular](https://github.com/thymikee/jest-preset-angular).


Updates `jest-preset-angular` from 16.1.1 to 16.1.2
- [Release notes](https://github.com/thymikee/jest-preset-angular/releases)
- [Changelog](https://github.com/thymikee/jest-preset-angular/blob/main/CHANGELOG.md)
- [Commits](https://github.com/thymikee/jest-preset-angular/compare/v16.1.1...v16.1.2)

---
updated-dependencies:
- dependency-name: jest-preset-angular
  dependency-version: 16.1.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: frontend-jest-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>

* Circumvent setSystemTime bug

See https://github.com/sinonjs/fake-timers/issues/557

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2026-04-08 16:11:53 +00:00
dependabot[bot]
c07b802bb8 Chore(deps-dev): Bump @playwright/test from 1.58.2 to 1.59.0 in /src-ui (#12537)
* Chore(deps-dev): Bump @playwright/test from 1.58.2 to 1.59.0 in /src-ui

Bumps [@playwright/test](https://github.com/microsoft/playwright) from 1.58.2 to 1.59.0.
- [Release notes](https://github.com/microsoft/playwright/releases)
- [Commits](https://github.com/microsoft/playwright/compare/v1.58.2...v1.59.0)

---
updated-dependencies:
- dependency-name: "@playwright/test"
  dependency-version: 1.59.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* bump Playwright docker images

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2026-04-08 15:52:49 +00:00
GitHub Actions
ec6969e326 Auto translate strings 2026-04-08 15:42:05 +00:00
shamoon
4629bbf83e Enhancement: add view_global_statistics and view_system_status permissions (#12530) 2026-04-08 15:39:47 +00:00
shamoon
826ffcccef Handle the final batch of zizmor warnings 2026-04-08 08:06:00 -07:00
shamoon
b7a5255102 Chore: address more zizmor flags (#12529) 2026-04-08 14:16:09 +00:00
dependabot[bot]
962a4ddd73 Chore(deps): Bump the npm_and_yarn group across 1 directory with 2 updates (#12531)
Bumps the npm_and_yarn group with 2 updates in the /src-ui directory: [@hono/node-server](https://github.com/honojs/node-server) and [hono](https://github.com/honojs/hono).


Updates `@hono/node-server` from 1.19.12 to 1.19.13
- [Release notes](https://github.com/honojs/node-server/releases)
- [Commits](https://github.com/honojs/node-server/compare/v1.19.12...v1.19.13)

Updates `hono` from 4.12.9 to 4.12.12
- [Release notes](https://github.com/honojs/hono/releases)
- [Commits](https://github.com/honojs/hono/compare/v4.12.9...v4.12.12)

---
updated-dependencies:
- dependency-name: "@hono/node-server"
  dependency-version: 1.19.13
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: hono
  dependency-version: 4.12.12
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-07 21:39:44 -07:00
Trenton H
a5fe88d2a1 Chore: Resolves some zizmor reported code scan findings (#12516)
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2026-04-06 23:03:29 +00:00
GitHub Actions
51c59746a7 Auto translate strings 2026-04-06 22:51:57 +00:00
Trenton H
c232d443fa Breaking: Decouple OCR control from archive file control (#12448)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2026-04-06 15:50:21 -07:00
70 changed files with 3896 additions and 1301 deletions

View File

@@ -164,6 +164,8 @@ updates:
  directory: "/" # Location of package manifests
  schedule:
  interval: "monthly"
+ cooldown:
+ default-days: 7
  groups:
  pre-commit-dependencies:
  patterns:

View File

@@ -13,10 +13,13 @@ concurrency:
  env:
  DEFAULT_UV_VERSION: "0.10.x"
  NLTK_DATA: "/usr/share/nltk_data"
+ permissions: {}
  jobs:
  changes:
  name: Detect Backend Changes
  runs-on: ubuntu-slim
+ permissions:
+ contents: read
  outputs:
  backend_changed: ${{ steps.force.outputs.run_all == 'true' || steps.filter.outputs.backend == 'true' }}
  steps:
@@ -27,10 +30,13 @@ jobs:
  persist-credentials: false
  - name: Decide run mode
  id: force
+ env:
+ EVENT_NAME: ${{ github.event_name }}
+ REF_NAME: ${{ github.ref_name }}
  run: |
- if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
+ if [[ "${EVENT_NAME}" == "workflow_dispatch" ]]; then
  echo "run_all=true" >> "$GITHUB_OUTPUT"
- elif [[ "${{ github.event_name }}" == "push" && ( "${{ github.ref_name }}" == "main" || "${{ github.ref_name }}" == "dev" ) ]]; then
+ elif [[ "${EVENT_NAME}" == "push" && ( "${REF_NAME}" == "main" || "${REF_NAME}" == "dev" ) ]]; then
  echo "run_all=true" >> "$GITHUB_OUTPUT"
  else
  echo "run_all=false" >> "$GITHUB_OUTPUT"
@@ -38,15 +44,22 @@ jobs:
  - name: Set diff range
  id: range
  if: steps.force.outputs.run_all != 'true'
+ env:
+ BEFORE_SHA: ${{ github.event.before }}
+ DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
+ EVENT_CREATED: ${{ github.event.created }}
+ EVENT_NAME: ${{ github.event_name }}
+ PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
+ SHA: ${{ github.sha }}
  run: |
- if [[ "${{ github.event_name }}" == "pull_request" ]]; then
- echo "base=${{ github.event.pull_request.base.sha }}" >> "$GITHUB_OUTPUT"
- elif [[ "${{ github.event.created }}" == "true" ]]; then
- echo "base=${{ github.event.repository.default_branch }}" >> "$GITHUB_OUTPUT"
+ if [[ "${EVENT_NAME}" == "pull_request" ]]; then
+ echo "base=${PR_BASE_SHA}" >> "$GITHUB_OUTPUT"
+ elif [[ "${EVENT_CREATED}" == "true" ]]; then
+ echo "base=${DEFAULT_BRANCH}" >> "$GITHUB_OUTPUT"
  else
- echo "base=${{ github.event.before }}" >> "$GITHUB_OUTPUT"
+ echo "base=${BEFORE_SHA}" >> "$GITHUB_OUTPUT"
  fi
- echo "ref=${{ github.sha }}" >> "$GITHUB_OUTPUT"
+ echo "ref=${SHA}" >> "$GITHUB_OUTPUT"
  - name: Detect changes
  id: filter
  if: steps.force.outputs.run_all != 'true'
@@ -66,6 +79,8 @@ jobs:
  if: needs.changes.outputs.backend_changed == 'true'
  name: "Python ${{ matrix.python-version }}"
  runs-on: ubuntu-24.04
+ permissions:
+ contents: read
  strategy:
  matrix:
  python-version: ['3.11', '3.12', '3.13', '3.14']
@@ -85,7 +100,7 @@ jobs:
  with:
  python-version: "${{ matrix.python-version }}"
  - name: Install uv
- uses: astral-sh/setup-uv@5a095e7a2014a4212f075830d4f7277575a9d098 # v7.3.1
+ uses: astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0
  with:
  version: ${{ env.DEFAULT_UV_VERSION }}
  enable-cache: true
@@ -99,9 +114,11 @@ jobs:
  run: |
  sudo cp docker/rootfs/etc/ImageMagick-6/paperless-policy.xml /etc/ImageMagick-6/policy.xml
  - name: Install Python dependencies
+ env:
+ PYTHON_VERSION: ${{ steps.setup-python.outputs.python-version }}
  run: |
  uv sync \
- --python ${{ steps.setup-python.outputs.python-version }} \
+ --python "${PYTHON_VERSION}" \
  --group testing \
  --frozen
  - name: List installed Python dependencies
@@ -109,26 +126,27 @@ jobs:
  uv pip list
  - name: Install NLTK data
  run: |
- uv run python -m nltk.downloader punkt punkt_tab snowball_data stopwords -d ${{ env.NLTK_DATA }}
+ uv run python -m nltk.downloader punkt punkt_tab snowball_data stopwords -d "${NLTK_DATA}"
  - name: Run tests
  env:
  NLTK_DATA: ${{ env.NLTK_DATA }}
  PAPERLESS_CI_TEST: 1
+ PYTHON_VERSION: ${{ steps.setup-python.outputs.python-version }}
  run: |
  uv run \
- --python ${{ steps.setup-python.outputs.python-version }} \
+ --python "${PYTHON_VERSION}" \
  --dev \
  --frozen \
  pytest
  - name: Upload test results to Codecov
  if: always()
- uses: codecov/codecov-action@671740ac38dd9b0130fbe1cec585b89eea48d3de # v5.5.2
+ uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0
  with:
  flags: backend-python-${{ matrix.python-version }}
  files: junit.xml
  report_type: test_results
  - name: Upload coverage to Codecov
- uses: codecov/codecov-action@671740ac38dd9b0130fbe1cec585b89eea48d3de # v5.5.2
+ uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0
  with:
  flags: backend-python-${{ matrix.python-version }}
  files: coverage.xml
@@ -143,6 +161,8 @@ jobs:
  if: needs.changes.outputs.backend_changed == 'true'
  name: Check project typing
  runs-on: ubuntu-24.04
+ permissions:
+ contents: read
  env:
  DEFAULT_PYTHON: "3.12"
  steps:
@@ -156,15 +176,17 @@ jobs:
  with:
  python-version: "${{ env.DEFAULT_PYTHON }}"
  - name: Install uv
- uses: astral-sh/setup-uv@5a095e7a2014a4212f075830d4f7277575a9d098 # v7.3.1
+ uses: astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0
  with:
  version: ${{ env.DEFAULT_UV_VERSION }}
  enable-cache: true
  python-version: ${{ steps.setup-python.outputs.python-version }}
  - name: Install Python dependencies
+ env:
+ PYTHON_VERSION: ${{ steps.setup-python.outputs.python-version }}
  run: |
  uv sync \
- --python ${{ steps.setup-python.outputs.python-version }} \
+ --python "${PYTHON_VERSION}" \
  --group testing \
  --group typing \
  --frozen
@@ -200,19 +222,23 @@ jobs:
  runs-on: ubuntu-slim
  steps:
  - name: Check gate
+ env:
+ BACKEND_CHANGED: ${{ needs.changes.outputs.backend_changed }}
+ TEST_RESULT: ${{ needs.test.result }}
+ TYPING_RESULT: ${{ needs.typing.result }}
  run: |
- if [[ "${{ needs.changes.outputs.backend_changed }}" != "true" ]]; then
+ if [[ "${BACKEND_CHANGED}" != "true" ]]; then
  echo "No backend-relevant changes detected."
  exit 0
  fi
- if [[ "${{ needs.test.result }}" != "success" ]]; then
- echo "::error::Backend test job result: ${{ needs.test.result }}"
+ if [[ "${TEST_RESULT}" != "success" ]]; then
+ echo "::error::Backend test job result: ${TEST_RESULT}"
  exit 1
  fi
- if [[ "${{ needs.typing.result }}" != "success" ]]; then
- echo "::error::Backend typing job result: ${{ needs.typing.result }}"
+ if [[ "${TYPING_RESULT}" != "success" ]]; then
+ echo "::error::Backend typing job result: ${TYPING_RESULT}"
  exit 1
  fi
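
The hunks above move `${{ ... }}` expressions out of `run:` scripts and into `env:` entries. A minimal sketch of why that matters, assuming inline expressions are substituted into the script text before the shell parses it. The crafted value below is hypothetical and only stands in for untrusted input, and the `bash -c` calls merely imitate the two styles:

```shell
#!/usr/bin/env bash
# Hypothetical untrusted value (an assumption for illustration only).
REF_NAME='x" == "x" ]] && echo INJECTED; [[ "main'

# Inline style: the value is pasted into the script text, so the shell
# parses it as code. This imitates "${{ github.ref_name }}" inside run:.
unsafe_script="if [[ \"${REF_NAME}\" == \"main\" ]]; then echo run_all=true; else echo run_all=false; fi"
unsafe_output="$(bash -c "${unsafe_script}")"

# env style: the script text is fixed and the value arrives through the
# environment, so it is only ever compared as a string.
safe_script='if [[ "${REF_NAME}" == "main" ]]; then echo run_all=true; else echo run_all=false; fi'
safe_output="$(REF_NAME="${REF_NAME}" bash -c "${safe_script}")"

echo "inline: ${unsafe_output}"
echo "env:    ${safe_output}"
```

In the inline variant the injected `echo` runs and the check passes; in the `env` variant the same value is treated as data and the check fails as intended.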

View File

@@ -89,7 +89,7 @@ jobs:
  push_external="true"
  ;;
  esac
- case "${{ github.ref }}" in
+ case "${GITHUB_REF}" in
  refs/tags/v*|*beta.rc*)
  push_external="true"
  ;;
@@ -166,6 +166,7 @@ jobs:
  runs-on: ubuntu-24.04
  needs: build-arch
  if: needs.build-arch.outputs.should-push == 'true'
+ environment: image-publishing
  permissions:
  contents: read
  packages: write
@@ -230,8 +231,10 @@ jobs:
  docker buildx imagetools create ${tags} ${digests}
  - name: Inspect image
+ env:
+ FIRST_TAG: ${{ fromJSON(steps.docker-meta.outputs.json).tags[0] }}
  run: |
- docker buildx imagetools inspect ${{ fromJSON(steps.docker-meta.outputs.json).tags[0] }}
+ docker buildx imagetools inspect "${FIRST_TAG}"
  - name: Copy to Docker Hub
  if: needs.build-arch.outputs.push-external == 'true'
  env:
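
The `Inspect image` change also wraps the value in double quotes. A small sketch of what the quotes buy, using a hypothetical, deliberately malformed value containing a space (a real image tag would not contain one); `count_args` is an illustrative helper, not part of the workflow:

```shell
#!/usr/bin/env bash
# Hypothetical value with an embedded space (assumption for illustration).
FIRST_TAG='registry.example/app:latest extra-arg'

# Helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell splits the value on whitespace into separate words.
unquoted_count="$(count_args ${FIRST_TAG})"

# Quoted: the value is passed through as a single argument.
quoted_count="$(count_args "${FIRST_TAG}")"

echo "unquoted: ${unquoted_count} argument(s)"
echo "quoted:   ${quoted_count} argument(s)"
```

With the quotes in place, a command such as `docker buildx imagetools inspect` receives the value as one argument regardless of its content.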

View File

@@ -10,8 +10,6 @@ concurrency:
  cancel-in-progress: true
  permissions:
  contents: read
- pages: write
- id-token: write
  env:
  DEFAULT_UV_VERSION: "0.10.x"
  DEFAULT_PYTHON_VERSION: "3.12"
@@ -80,7 +78,7 @@ jobs:
  with:
  python-version: ${{ env.DEFAULT_PYTHON_VERSION }}
  - name: Install uv
- uses: astral-sh/setup-uv@5a095e7a2014a4212f075830d4f7277575a9d098 # v7.3.1
+ uses: astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0
  with:
  version: ${{ env.DEFAULT_UV_VERSION }}
  enable-cache: true
@@ -105,6 +103,9 @@ jobs:
needs: [changes, build]
if: github.event_name == 'push' && github.ref == 'refs/heads/main' && needs.changes.outputs.docs_changed == 'true'
runs-on: ubuntu-24.04
permissions:
pages: write
id-token: write
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}


@@ -10,10 +10,13 @@ on:
concurrency:
group: frontend-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
changes:
name: Detect Frontend Changes
runs-on: ubuntu-slim
permissions:
contents: read
outputs:
frontend_changed: ${{ steps.force.outputs.run_all == 'true' || steps.filter.outputs.frontend == 'true' }}
steps:
@@ -21,12 +24,16 @@ jobs:
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
- name: Decide run mode
id: force
env:
EVENT_NAME: ${{ github.event_name }}
REF_NAME: ${{ github.ref_name }}
run: |
if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
if [[ "${EVENT_NAME}" == "workflow_dispatch" ]]; then
echo "run_all=true" >> "$GITHUB_OUTPUT"
elif [[ "${{ github.event_name }}" == "push" && ( "${{ github.ref_name }}" == "main" || "${{ github.ref_name }}" == "dev" ) ]]; then
elif [[ "${EVENT_NAME}" == "push" && ( "${REF_NAME}" == "main" || "${REF_NAME}" == "dev" ) ]]; then
echo "run_all=true" >> "$GITHUB_OUTPUT"
else
echo "run_all=false" >> "$GITHUB_OUTPUT"
@@ -34,15 +41,22 @@ jobs:
- name: Set diff range
id: range
if: steps.force.outputs.run_all != 'true'
env:
BEFORE_SHA: ${{ github.event.before }}
DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
EVENT_CREATED: ${{ github.event.created }}
EVENT_NAME: ${{ github.event_name }}
PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
SHA: ${{ github.sha }}
run: |
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
echo "base=${{ github.event.pull_request.base.sha }}" >> "$GITHUB_OUTPUT"
elif [[ "${{ github.event.created }}" == "true" ]]; then
echo "base=${{ github.event.repository.default_branch }}" >> "$GITHUB_OUTPUT"
if [[ "${EVENT_NAME}" == "pull_request" ]]; then
echo "base=${PR_BASE_SHA}" >> "$GITHUB_OUTPUT"
elif [[ "${EVENT_CREATED}" == "true" ]]; then
echo "base=${DEFAULT_BRANCH}" >> "$GITHUB_OUTPUT"
else
echo "base=${{ github.event.before }}" >> "$GITHUB_OUTPUT"
echo "base=${BEFORE_SHA}" >> "$GITHUB_OUTPUT"
fi
echo "ref=${{ github.sha }}" >> "$GITHUB_OUTPUT"
echo "ref=${SHA}" >> "$GITHUB_OUTPUT"
- name: Detect changes
id: filter
if: steps.force.outputs.run_all != 'true'
@@ -59,6 +73,8 @@ jobs:
if: needs.changes.outputs.frontend_changed == 'true'
name: Install Dependencies
runs-on: ubuntu-24.04
permissions:
contents: read
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -89,6 +105,8 @@ jobs:
needs: [changes, install-dependencies]
if: needs.changes.outputs.frontend_changed == 'true'
runs-on: ubuntu-24.04
permissions:
contents: read
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -120,6 +138,8 @@ jobs:
needs: [changes, install-dependencies]
if: needs.changes.outputs.frontend_changed == 'true'
runs-on: ubuntu-24.04
permissions:
contents: read
strategy:
fail-fast: false
matrix:
@@ -154,13 +174,13 @@ jobs:
run: cd src-ui && pnpm run test --max-workers=2 --shard=${{ matrix.shard-index }}/${{ matrix.shard-count }}
- name: Upload test results to Codecov
if: always()
uses: codecov/codecov-action@671740ac38dd9b0130fbe1cec585b89eea48d3de # v5.5.2
uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0
with:
flags: frontend-node-${{ matrix.node-version }}
directory: src-ui/
report_type: test_results
- name: Upload coverage to Codecov
uses: codecov/codecov-action@671740ac38dd9b0130fbe1cec585b89eea48d3de # v5.5.2
uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0
with:
flags: frontend-node-${{ matrix.node-version }}
directory: src-ui/coverage/
@@ -169,7 +189,9 @@ jobs:
needs: [changes, install-dependencies]
if: needs.changes.outputs.frontend_changed == 'true'
runs-on: ubuntu-24.04
container: mcr.microsoft.com/playwright:v1.58.2-noble
permissions:
contents: read
container: mcr.microsoft.com/playwright:v1.59.0-noble
env:
PLAYWRIGHT_BROWSERS_PATH: /ms-playwright
PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: 1
@@ -212,6 +234,9 @@ jobs:
needs: [changes, unit-tests, e2e-tests]
if: needs.changes.outputs.frontend_changed == 'true'
runs-on: ubuntu-24.04
environment: bundle-analysis
permissions:
contents: read
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -248,34 +273,41 @@ jobs:
runs-on: ubuntu-slim
steps:
- name: Check gate
env:
BUNDLE_ANALYSIS_RESULT: ${{ needs['bundle-analysis'].result }}
E2E_RESULT: ${{ needs['e2e-tests'].result }}
FRONTEND_CHANGED: ${{ needs.changes.outputs.frontend_changed }}
INSTALL_RESULT: ${{ needs['install-dependencies'].result }}
LINT_RESULT: ${{ needs.lint.result }}
UNIT_RESULT: ${{ needs['unit-tests'].result }}
run: |
if [[ "${{ needs.changes.outputs.frontend_changed }}" != "true" ]]; then
if [[ "${FRONTEND_CHANGED}" != "true" ]]; then
echo "No frontend-relevant changes detected."
exit 0
fi
if [[ "${{ needs['install-dependencies'].result }}" != "success" ]]; then
echo "::error::Frontend install job result: ${{ needs['install-dependencies'].result }}"
if [[ "${INSTALL_RESULT}" != "success" ]]; then
echo "::error::Frontend install job result: ${INSTALL_RESULT}"
exit 1
fi
if [[ "${{ needs.lint.result }}" != "success" ]]; then
echo "::error::Frontend lint job result: ${{ needs.lint.result }}"
if [[ "${LINT_RESULT}" != "success" ]]; then
echo "::error::Frontend lint job result: ${LINT_RESULT}"
exit 1
fi
if [[ "${{ needs['unit-tests'].result }}" != "success" ]]; then
echo "::error::Frontend unit-tests job result: ${{ needs['unit-tests'].result }}"
if [[ "${UNIT_RESULT}" != "success" ]]; then
echo "::error::Frontend unit-tests job result: ${UNIT_RESULT}"
exit 1
fi
if [[ "${{ needs['e2e-tests'].result }}" != "success" ]]; then
echo "::error::Frontend e2e-tests job result: ${{ needs['e2e-tests'].result }}"
if [[ "${E2E_RESULT}" != "success" ]]; then
echo "::error::Frontend e2e-tests job result: ${E2E_RESULT}"
exit 1
fi
if [[ "${{ needs['bundle-analysis'].result }}" != "success" ]]; then
echo "::error::Frontend bundle-analysis job result: ${{ needs['bundle-analysis'].result }}"
if [[ "${BUNDLE_ANALYSIS_RESULT}" != "success" ]]; then
echo "::error::Frontend bundle-analysis job result: ${BUNDLE_ANALYSIS_RESULT}"
exit 1
fi


@@ -9,6 +9,8 @@ on:
concurrency:
group: lint-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
lint:
name: Linting via prek


@@ -10,10 +10,14 @@ concurrency:
env:
DEFAULT_UV_VERSION: "0.10.x"
DEFAULT_PYTHON_VERSION: "3.12"
permissions: {}
jobs:
wait-for-docker:
name: Wait for Docker Build
runs-on: ubuntu-24.04
permissions:
checks: read
statuses: read
steps:
- name: Wait for Docker build
uses: lewagon/wait-on-check-action@74049309dfeff245fe8009a0137eacf28136cb3c # v1.5.0
@@ -26,6 +30,8 @@ jobs:
name: Build Release
needs: wait-for-docker
runs-on: ubuntu-24.04
permissions:
contents: read
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -40,8 +46,7 @@ jobs:
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
with:
node-version: 24.x
cache: 'pnpm'
cache-dependency-path: 'src-ui/pnpm-lock.yaml'
package-manager-cache: false
- name: Install frontend dependencies
run: cd src-ui && pnpm install
- name: Build frontend
@@ -53,23 +58,27 @@ jobs:
with:
python-version: ${{ env.DEFAULT_PYTHON_VERSION }}
- name: Install uv
uses: astral-sh/setup-uv@5a095e7a2014a4212f075830d4f7277575a9d098 # v7.3.1
uses: astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0
with:
version: ${{ env.DEFAULT_UV_VERSION }}
enable-cache: true
enable-cache: false
python-version: ${{ steps.setup-python.outputs.python-version }}
- name: Install Python dependencies
env:
PYTHON_VERSION: ${{ steps.setup-python.outputs.python-version }}
run: |
uv sync --python ${{ steps.setup-python.outputs.python-version }} --dev --frozen
uv sync --python "${PYTHON_VERSION}" --dev --frozen
- name: Install system dependencies
run: |
sudo apt-get update -qq
sudo apt-get install -qq --no-install-recommends gettext liblept5
# ---- Build Documentation ----
- name: Build documentation
env:
PYTHON_VERSION: ${{ steps.setup-python.outputs.python-version }}
run: |
uv run \
--python ${{ steps.setup-python.outputs.python-version }} \
--python "${PYTHON_VERSION}" \
--dev \
--frozen \
zensical build --clean
@@ -78,16 +87,20 @@ jobs:
run: |
uv export --quiet --no-dev --all-extras --format requirements-txt --output-file requirements.txt
- name: Compile messages
env:
PYTHON_VERSION: ${{ steps.setup-python.outputs.python-version }}
run: |
cd src/
uv run \
--python ${{ steps.setup-python.outputs.python-version }} \
--python "${PYTHON_VERSION}" \
manage.py compilemessages
- name: Collect static files
env:
PYTHON_VERSION: ${{ steps.setup-python.outputs.python-version }}
run: |
cd src/
uv run \
--python ${{ steps.setup-python.outputs.python-version }} \
--python "${PYTHON_VERSION}" \
manage.py collectstatic --no-input --clear
- name: Assemble release package
run: |
@@ -129,6 +142,9 @@ jobs:
name: Publish Release
needs: build-release
runs-on: ubuntu-24.04
permissions:
contents: write
pull-requests: write
outputs:
prerelease: ${{ steps.get-version.outputs.prerelease }}
changelog: ${{ steps.create-release.outputs.body }}
@@ -141,9 +157,11 @@ jobs:
path: ./
- name: Get version info
id: get-version
env:
REF_NAME: ${{ github.ref_name }}
run: |
echo "version=${{ github.ref_name }}" >> $GITHUB_OUTPUT
if [[ "${{ github.ref_name }}" == *"-beta.rc"* ]]; then
echo "version=${REF_NAME}" >> $GITHUB_OUTPUT
if [[ "${REF_NAME}" == *"-beta.rc"* ]]; then
echo "prerelease=true" >> $GITHUB_OUTPUT
else
echo "prerelease=false" >> $GITHUB_OUTPUT
@@ -176,6 +194,9 @@ jobs:
needs: publish-release
if: needs.publish-release.outputs.prerelease == 'false'
runs-on: ubuntu-24.04
permissions:
contents: write
pull-requests: write
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -188,18 +209,24 @@ jobs:
with:
python-version: ${{ env.DEFAULT_PYTHON_VERSION }}
- name: Install uv
uses: astral-sh/setup-uv@5a095e7a2014a4212f075830d4f7277575a9d098 # v7.3.1
uses: astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0
with:
version: ${{ env.DEFAULT_UV_VERSION }}
enable-cache: true
enable-cache: false
python-version: ${{ env.DEFAULT_PYTHON_VERSION }}
- name: Update changelog
working-directory: docs
env:
CHANGELOG: ${{ needs.publish-release.outputs.changelog }}
PYTHON_VERSION: ${{ steps.setup-python.outputs.python-version }}
VERSION: ${{ needs.publish-release.outputs.version }}
run: |
git branch ${{ needs.publish-release.outputs.version }}-changelog
git checkout ${{ needs.publish-release.outputs.version }}-changelog
branch_name="${VERSION}-changelog"
echo -e "# Changelog\n\n${{ needs.publish-release.outputs.changelog }}\n" > changelog-new.md
git branch "${branch_name}"
git checkout "${branch_name}"
printf '# Changelog\n\n%s\n' "${CHANGELOG}" > changelog-new.md
echo "Manually linking usernames"
sed -i -r 's|@([a-zA-Z0-9_]+) \(\[#|[@\1](https://github.com/\1) ([#|g' changelog-new.md
@@ -212,24 +239,28 @@ jobs:
mv changelog-new.md changelog.md
uv run \
--python ${{ steps.setup-python.outputs.python-version }} \
--python "${PYTHON_VERSION}" \
--dev \
prek run --files changelog.md || true
git config --global user.name "github-actions"
git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com"
git commit -am "Changelog ${{ needs.publish-release.outputs.version }} - GHA"
git push origin ${{ needs.publish-release.outputs.version }}-changelog
git commit -am "Changelog ${VERSION} - GHA"
git push origin "${branch_name}"
- name: Create pull request
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
env:
VERSION: ${{ needs.publish-release.outputs.version }}
with:
script: |
const { repo, owner } = context.repo;
const version = process.env.VERSION;
const head = `${version}-changelog`;
const result = await github.rest.pulls.create({
title: 'Documentation: Add ${{ needs.publish-release.outputs.version }} changelog',
title: `Documentation: Add ${version} changelog`,
owner,
repo,
head: '${{ needs.publish-release.outputs.version }}-changelog',
head,
base: 'main',
body: 'This PR is auto-generated by CI.'
});


@@ -33,10 +33,18 @@ jobs:
container:
image: semgrep/semgrep:1.155.0@sha256:cc869c685dcc0fe497c86258da9f205397d8108e56d21a86082ea4886e52784d
if: github.actor != 'dependabot[bot]'
permissions:
contents: read
security-events: write
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Run Semgrep
run: semgrep scan --config auto
run: semgrep scan --config auto --sarif-output results.sarif
- name: Upload results to GitHub code scanning
uses: github/codeql-action/upload-sarif@c10b8064de6f491fea524254123dbe5e09572f13 # v4.35.1
if: always()
with:
sarif_file: results.sarif


@@ -12,11 +12,13 @@ on:
concurrency:
group: registry-tags-cleanup
cancel-in-progress: false
permissions: {}
jobs:
cleanup-images:
name: Cleanup Image Tags for ${{ matrix.primary-name }}
if: github.repository_owner == 'paperless-ngx'
runs-on: ubuntu-24.04
environment: registry-maintenance
strategy:
fail-fast: false
matrix:
@@ -43,6 +45,7 @@ jobs:
runs-on: ubuntu-24.04
needs:
- cleanup-images
environment: registry-maintenance
strategy:
fail-fast: false
matrix:


@@ -39,7 +39,7 @@ jobs:
persist-credentials: false
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@c793b717bc78562f491db7b0e93a3a178b099162 # v4.32.5
uses: github/codeql-action/init@c10b8064de6f491fea524254123dbe5e09572f13 # v4.35.1
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
@@ -47,4 +47,4 @@ jobs:
# Prefix the list here with "+" to use these queries and those in the config file.
# queries: ./path/to/local/query, your-org/your-repo/queries@main
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@c793b717bc78562f491db7b0e93a3a178b099162 # v4.32.5
uses: github/codeql-action/analyze@c10b8064de6f491fea524254123dbe5e09572f13 # v4.35.1


@@ -6,11 +6,15 @@ on:
push:
paths: ['src/locale/**', 'src-ui/messages.xlf', 'src-ui/src/locale/**']
branches: [dev]
permissions:
contents: write
pull-requests: write
jobs:
synchronize-with-crowdin:
name: Crowdin Sync
if: github.repository_owner == 'paperless-ngx'
runs-on: ubuntu-24.04
environment: translation-sync
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -18,7 +22,7 @@ jobs:
token: ${{ secrets.PNGX_BOT_PAT }}
persist-credentials: false
- name: crowdin action
uses: crowdin/github-action@8818ff65bfc4322384f983ea37e3926948c11745 # v2.15.0
uses: crowdin/github-action@7ca9c452bfe9197d3bb7fa83a4d7e2b0c9ae835d # v2.16.0
with:
upload_translations: false
download_translations: true


@@ -3,10 +3,6 @@ on:
schedule:
- cron: '0 3 * * *'
workflow_dispatch:
permissions:
issues: write
pull-requests: write
discussions: write
concurrency:
group: lock
jobs:
@@ -14,6 +10,9 @@ jobs:
name: 'Stale'
if: github.repository_owner == 'paperless-ngx'
runs-on: ubuntu-24.04
permissions:
issues: write
pull-requests: write
steps:
- uses: actions/stale@b5d41d4e1d5dceea10e7104786b73624c18a190f # v10.2.0
with:
@@ -36,6 +35,10 @@ jobs:
name: 'Lock Old Threads'
if: github.repository_owner == 'paperless-ngx'
runs-on: ubuntu-24.04
permissions:
issues: write
pull-requests: write
discussions: write
steps:
- uses: dessant/lock-threads@7266a7ce5c1df01b1c6db85bf8cd86c737dadbe7 # v6.0.0
with:
@@ -56,6 +59,8 @@ jobs:
name: 'Close Answered Discussions'
if: github.repository_owner == 'paperless-ngx'
runs-on: ubuntu-24.04
permissions:
discussions: write
steps:
- uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
with:
@@ -113,6 +118,8 @@ jobs:
name: 'Close Outdated Discussions'
if: github.repository_owner == 'paperless-ngx'
runs-on: ubuntu-24.04
permissions:
discussions: write
steps:
- uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
with:
@@ -205,6 +212,8 @@ jobs:
name: 'Close Unsupported Feature Requests'
if: github.repository_owner == 'paperless-ngx'
runs-on: ubuntu-24.04
permissions:
discussions: write
steps:
- uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
with:


@@ -7,6 +7,7 @@ jobs:
generate-translate-strings:
name: Generate Translation Strings
runs-on: ubuntu-latest
environment: translation-sync
permissions:
contents: write
steps:
@@ -26,7 +27,7 @@ jobs:
sudo apt-get update -qq
sudo apt-get install -qq --no-install-recommends gettext
- name: Install uv
uses: astral-sh/setup-uv@5a095e7a2014a4212f075830d4f7277575a9d098 # v7.3.1
uses: astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0
with:
enable-cache: true
- name: Install backend python dependencies

.github/zizmor.yml

@@ -0,0 +1,29 @@
rules:
template-injection:
ignore:
# github.event_name is a GitHub-internal constant (push/pull_request/etc.),
# not attacker-controllable.
- ci-docker.yml:74
- ci-docs.yml:33
# github.event.repository.default_branch refers to the target repo's setting,
# which only admins can change; not influenced by fork PR authors.
- ci-docs.yml:45
# steps.setup-python.outputs.python-version is always a semver string (e.g. "3.12.0")
# produced by actions/setup-python from a hardcoded env var input.
- ci-docs.yml:88
- ci-docs.yml:92
# needs.*.result is always one of: success/failure/cancelled/skipped.
- ci-docs.yml:131
- ci-docs.yml:132
# needs.changes.outputs.* is always "true" or "false".
- ci-docs.yml:126
# steps.build.outputs.digest is always a SHA256 digest (sha256:[a-f0-9]{64}).
- ci-docker.yml:152
dangerous-triggers:
ignore:
# Both workflows use pull_request_target solely to label/comment on fork PRs
# (requires write-back access unavailable to pull_request). Neither workflow
# checks out PR code or executes anything from the fork — only reads PR
# metadata via context/API. Permissions are scoped to pull-requests: write.
- pr-bot.yml:2
- project-actions.yml:2

.gitignore

@@ -111,3 +111,4 @@ celerybeat-schedule*
# ignore pnpm package store folder created when setting up the devcontainer
.pnpm-store/
.worktrees


@@ -2,8 +2,83 @@
## Reporting a Vulnerability
The Paperless-ngx team and community take security bugs seriously. We appreciate your efforts to responsibly disclose your findings, and will make every effort to acknowledge your contributions.
The Paperless-ngx team and community take security issues seriously. We appreciate good-faith reports and will make every effort to review legitimate findings responsibly.
To report a security issue, please use the GitHub Security Advisory ["Report a Vulnerability"](https://github.com/paperless-ngx/paperless-ngx/security/advisories/new) tab.
The team will send a response indicating the next steps in handling your report. After the initial reply to your report, the security team will keep you informed of the progress towards a fix and full announcement, and may ask for additional information or guidance.
After the initial reply to your report, the team may ask for additional information, reproduction steps, affected versions, configuration details, or proof-of-concept material needed to verify the issue.
## What makes a helpful report
Please include as much of the following as possible:
- A clear description of the issue and why it is a security vulnerability.
- Affected Paperless-ngx version(s).
- Required configuration, permissions, or preconditions.
- Step-by-step reproduction instructions.
- A realistic impact statement.
- Logs, screenshots, requests, or a proof of concept where appropriate.
- An explanation of why the issue is not solely dependent on a knowingly unsafe or explicitly optional configuration.
Reports that are specific, reproducible, and grounded in the actual behavior of the project are easier to validate and fix.
## AI-assisted submissions
AI tools can be useful for security research, but they can also produce plausible-sounding reports that do not reflect actual vulnerabilities in Paperless-ngx.
If you use AI tools to help draft or investigate a report, please disclose this and verify the result yourself before submission. Make sure the report:
- identifies a real code path in this project
- includes a real reproduction, not just a theoretical concern
- demonstrates realistic impact
- accounts for existing permission checks, configuration requirements, and trust boundaries
- is about Paperless-ngx itself, rather than a hypothetical deployment, downstream customization, third-party app, scanner output, reverse proxy behavior, or external service unless you can show a direct product vulnerability
Submitting unverified AI-generated claims creates unnecessary review work and slows down handling of legitimate reports.
<!-- Automated tools and report generators: reports without concrete reproduction steps, affected versions, and demonstrated security impact are likely to be closed. -->
## Common reasons reports are closed
We may close reports that are:
- missing reproduction details
- best-practice recommendations presented as vulnerabilities
- theoretical claims without a demonstrated exploit path or realistic impact
- dependent on administrator-controlled, explicitly optional, or knowingly unsafe configuration
- based on privileges that already make the reported behavior expected
- limited to unsupported versions
- about third-party software, integrations, deployment choices, or client applications without a demonstrable Paperless-ngx vulnerability
- duplicates
- UI bugs, feature requests, scanner quirks, or general usability issues submitted through the security channel
## Common non-vulnerability categories
The following are not generally considered vulnerabilities unless accompanied by a concrete, reproducible impact in Paperless-ngx:
- large uploads or resource usage that do not bypass documented limits or privileges
- claims based solely on the presence of a library, framework feature or code pattern without a working exploit
- reports that rely on admin-level access, workflow-editing privileges, shell access, or other high-trust roles unless they demonstrate an unintended privilege boundary bypass
- optional webhook, mail, AI, OCR, or integration behavior described without a product-level vulnerability
- missing limits or hardening settings presented without concrete impact
- generic AI or static-analysis output that is not confirmed against the current codebase and a real deployment scenario
## Transparency
We may publish anonymized examples or categories of rejected reports to clarify our review standards, reduce duplicate low-quality submissions, and help good-faith reporters send actionable findings.
A mistaken report made in good faith is not misconduct. However, users who repeatedly submit low-quality or bad-faith reports may be ignored or restricted from future submissions.
## Scope and expectations
Please use the security reporting channel only for security vulnerabilities in Paperless-ngx.
Please do not use the security advisory system for:
- support questions
- general bug reports
- feature requests
- browser compatibility issues
- issues in third-party mobile apps, reverse proxies, or deployment tooling unless you can demonstrate a Paperless-ngx vulnerability
The team will review reports as time permits, but submission does not guarantee that a report is valid, in scope, or will result in a fix. Reports that do not describe a reproducible product-level issue may be closed without extended back-and-forth.


@@ -821,11 +821,14 @@ parsing documents.
#### [`PAPERLESS_OCR_MODE=<mode>`](#PAPERLESS_OCR_MODE) {#PAPERLESS_OCR_MODE}
: Tell paperless when and how to perform ocr on your documents. Three
: Tell paperless when and how to perform OCR on your documents. Four
modes are available:
- `skip`: Paperless skips all pages and will perform ocr only on
pages where no text is present. This is the safest option.
- `auto` (default): Paperless detects whether a document already
has embedded text via pdftotext. If sufficient text is found,
OCR is skipped for that document (`--skip-text`). If no text is
present, OCR runs normally. This is the safest option for mixed
document collections.
- `redo`: Paperless will OCR all pages of your documents and
attempt to replace any existing text layers with new text. This
@@ -843,24 +846,59 @@ modes are available:
significantly larger and text won't appear as sharp when zoomed
in.
The default is `skip`, which only performs OCR when necessary and
always creates archived documents.
- `off`: Paperless never invokes the OCR engine. For PDFs, text
is extracted via pdftotext only. For image documents, text will
be empty. Archive file generation still works via format
conversion (no Tesseract or Ghostscript required).
Read more about this in the [OCRmyPDF
The default is `auto`.
For the `skip`, `redo`, and `force` modes, read more about OCR
behaviour in the [OCRmyPDF
documentation](https://ocrmypdf.readthedocs.io/en/latest/advanced.html#when-ocr-is-skipped).
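The four modes described above can be summarised as a small decision function. This is only an illustrative sketch: `plan_ocr`, its return labels, and the `has_sufficient_text` flag are hypothetical names, not Paperless-ngx internals, and the actual embedded-text detection (via pdftotext) is not shown.

```python
def plan_ocr(mode: str, has_sufficient_text: bool) -> str:
    """Return a label for what the OCR step would do (hypothetical helper).

    mode: one of "auto", "redo", "force", "off".
    has_sufficient_text: assumed result of an embedded-text check.
    """
    if mode == "off":
        # The OCR engine is never invoked; text comes from pdftotext only.
        return "no-ocr"
    if mode == "auto":
        # Skip OCR when the document already carries enough text.
        return "skip-ocr" if has_sufficient_text else "run-ocr"
    if mode == "redo":
        # OCR all pages and try to replace existing text layers.
        return "redo-ocr"
    if mode == "force":
        # OCR regardless of any existing text.
        return "force-ocr"
    raise ValueError(f"unknown OCR mode: {mode!r}")


print(plan_ocr("auto", has_sufficient_text=True))
print(plan_ocr("off", has_sufficient_text=False))
```

Only `auto` depends on the text check; the other three modes behave the same for every document.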
#### [`PAPERLESS_OCR_SKIP_ARCHIVE_FILE=<mode>`](#PAPERLESS_OCR_SKIP_ARCHIVE_FILE) {#PAPERLESS_OCR_SKIP_ARCHIVE_FILE}
#### [`PAPERLESS_ARCHIVE_FILE_GENERATION=<mode>`](#PAPERLESS_ARCHIVE_FILE_GENERATION) {#PAPERLESS_ARCHIVE_FILE_GENERATION}
: Specify when you would like paperless to skip creating an archived
version of your documents. This is useful if you don't want to have two
almost-identical versions of your documents in the media folder.
: Controls when paperless creates a PDF/A archive version of your
documents. Archive files are stored alongside the original and are used
for display in the web interface.
- `never`: Never skip creating an archived version.
- `with_text`: Skip creating an archived version for documents
that already have embedded text.
- `always`: Always skip creating an archived version.
- `auto` (default): Produce archives for scanned or image-based
documents. Skip archive generation for born-digital PDFs that
already contain embedded text. This is the recommended setting
for mixed document collections.
- `always`: Always produce a PDF/A archive when the parser
supports it, regardless of whether the document already has
text.
- `never`: Never produce an archive. Only the original file is
stored. Saves disk space but the web viewer will display the
original file directly.
The default is `never`.
**Behaviour by file type and mode** (`auto` column shows the default):
| Document type              | `never` | `auto` (default)                  | `always` |
| -------------------------- | ------- | --------------------------------- | -------- |
| Scanned image (TIFF, JPEG) | No      | **Yes**                           | Yes      |
| Image-based PDF            | No      | **Yes** (short/no text, untagged) | Yes      |
| Born-digital PDF           | No      | No (tagged or has embedded text)  | Yes      |
| Plain text, email, HTML    | No      | No                                | No       |
| DOCX / ODT (via Tika)      | Yes\*   | Yes\*                             | Yes\*    |
\* Tika always produces a PDF rendition for display; this counts as
the archive regardless of the setting.
!!! note
This setting applies to the built-in Tesseract parser. Parsers
that must always convert documents to PDF for display (e.g. DOCX,
ODT via Tika) will produce a PDF regardless of this setting.
!!! note
The **remote OCR parser** (Azure AI) always produces a searchable
PDF and stores it as the archive copy, regardless of this setting.
`ARCHIVE_FILE_GENERATION=never` has no effect when the remote
parser handles a document.
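For the built-in Tesseract parser, the rows of the table that depend on the setting reduce to a short rule. The following is a minimal sketch under stated assumptions: `should_produce_archive` and the `is_born_digital` flag are hypothetical, the heuristics that classify a PDF as born-digital (text length, tagging) are not modelled, and parsers that ignore the setting (Tika, remote OCR) are deliberately left out.

```python
def should_produce_archive(mode: str, is_born_digital: bool) -> bool:
    """Decide whether an archive file is produced (hypothetical helper).

    mode: value of PAPERLESS_ARCHIVE_FILE_GENERATION ("auto", "always", "never").
    is_born_digital: assumed classification of the input; True for PDFs
        that are tagged or already contain embedded text.
    """
    if mode == "never":
        return False
    if mode == "always":
        return True
    if mode == "auto":
        # Scanned or image-based input gets an archive; born-digital does not.
        return not is_born_digital
    raise ValueError(f"unknown archive mode: {mode!r}")


for mode in ("never", "auto", "always"):
    print(mode, should_produce_archive(mode, is_born_digital=True))
```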
#### [`PAPERLESS_OCR_CLEAN=<mode>`](#PAPERLESS_OCR_CLEAN) {#PAPERLESS_OCR_CLEAN}


@@ -123,7 +123,68 @@ Multiple options are combined in a single value:
PAPERLESS_DB_OPTIONS="sslmode=require;sslrootcert=/certs/ca.pem;pool.max_size=10"
```
## Search Index (Whoosh -> Tantivy)
## OCR and Archive File Generation Settings
The settings that control OCR behaviour and archive file generation have been redesigned. The old settings that coupled these two concerns together are **removed** — old values are not silently honoured; a startup warning is logged if any removed variable is still set in your environment.
### Removed settings
| Removed Setting | Replacement |
| ------------------------------------------- | --------------------------------------------------------------------- |
| `PAPERLESS_OCR_MODE=skip` | `PAPERLESS_OCR_MODE=auto` (new default) |
| `PAPERLESS_OCR_MODE=skip_noarchive` | `PAPERLESS_OCR_MODE=auto` + `PAPERLESS_ARCHIVE_FILE_GENERATION=never` |
| `PAPERLESS_OCR_SKIP_ARCHIVE_FILE=never` | `PAPERLESS_ARCHIVE_FILE_GENERATION=always` |
| `PAPERLESS_OCR_SKIP_ARCHIVE_FILE=with_text` | `PAPERLESS_ARCHIVE_FILE_GENERATION=auto` (new default) |
| `PAPERLESS_OCR_SKIP_ARCHIVE_FILE=always` | `PAPERLESS_ARCHIVE_FILE_GENERATION=never` |
### What changed and why
Previously, `OCR_MODE` conflated two independent concerns: whether to run OCR and whether to produce an archive. `skip` meant "skip OCR if text exists, but always produce an archive". `skip_noarchive` meant "skip OCR if text exists, and also skip the archive". This made it impossible to, for example, disable OCR entirely while still producing archives.
The new settings are independent:
- [`PAPERLESS_OCR_MODE`](configuration.md#PAPERLESS_OCR_MODE) controls OCR: `auto` (default), `force`, `redo`, `off`.
- [`PAPERLESS_ARCHIVE_FILE_GENERATION`](configuration.md#PAPERLESS_ARCHIVE_FILE_GENERATION) controls archive production: `auto` (default), `always`, `never`.
### Database configuration
If you changed OCR settings via the admin UI (ApplicationConfiguration), the database values are **migrated automatically** during the upgrade. `mode` values (`skip` / `skip_noarchive`) are mapped to their new equivalents and `skip_archive_file` values are converted to the new `archive_file_generation` field. After upgrading, review the OCR settings in the admin UI to confirm the migrated values match your intent.
### Action Required
Remove any `PAPERLESS_OCR_SKIP_ARCHIVE_FILE` variable from your environment. If you relied on `OCR_MODE=skip` or `OCR_MODE=skip_noarchive`, update accordingly:
```bash
# v2: skip OCR when text present, always archive
PAPERLESS_OCR_MODE=skip
# v3: equivalent (auto is the new default)
# No change needed — auto is the default
# v2: skip OCR when text present, skip archive too
PAPERLESS_OCR_MODE=skip_noarchive
# v3: equivalent
PAPERLESS_OCR_MODE=auto
PAPERLESS_ARCHIVE_FILE_GENERATION=never
# v2: always skip archive
PAPERLESS_OCR_SKIP_ARCHIVE_FILE=always
# v3: equivalent
PAPERLESS_ARCHIVE_FILE_GENERATION=never
# v2: skip archive only for born-digital docs
PAPERLESS_OCR_SKIP_ARCHIVE_FILE=with_text
# v3: equivalent (auto is the new default)
PAPERLESS_ARCHIVE_FILE_GENERATION=auto
```
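If you manage many environment files, the mapping from the "Removed settings" table can be applied mechanically. The helper below is a rough sketch, not a tool shipped with Paperless-ngx: `translate_ocr_env` is a hypothetical name, and letting `skip_noarchive` override any value derived from `PAPERLESS_OCR_SKIP_ARCHIVE_FILE` is an assumption made for illustration.

```python
# Old PAPERLESS_OCR_SKIP_ARCHIVE_FILE value -> new
# PAPERLESS_ARCHIVE_FILE_GENERATION value, per the table above.
SKIP_ARCHIVE_MAP = {"never": "always", "with_text": "auto", "always": "never"}


def translate_ocr_env(env: dict) -> dict:
    """Return a copy of env with removed v2 OCR settings translated to v3."""
    new = dict(env)

    old_skip = new.pop("PAPERLESS_OCR_SKIP_ARCHIVE_FILE", None)
    if old_skip is not None:
        if old_skip not in SKIP_ARCHIVE_MAP:
            raise ValueError(f"unexpected value: {old_skip!r}")
        new["PAPERLESS_ARCHIVE_FILE_GENERATION"] = SKIP_ARCHIVE_MAP[old_skip]

    mode = new.get("PAPERLESS_OCR_MODE")
    if mode in ("skip", "skip_noarchive"):
        new["PAPERLESS_OCR_MODE"] = "auto"
        if mode == "skip_noarchive":
            # Assumption: skip_noarchive wins over the older skip-archive value.
            new["PAPERLESS_ARCHIVE_FILE_GENERATION"] = "never"
    return new


print(translate_ocr_env({"PAPERLESS_OCR_MODE": "skip_noarchive"}))
```

Settings that were not removed, such as `PAPERLESS_OCR_MODE=redo`, pass through unchanged.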
### Remote OCR parser
If you use the **remote OCR parser** (Azure AI), note that it always produces a
searchable PDF and stores it as the archive copy. `ARCHIVE_FILE_GENERATION=never`
has no effect for documents handled by the remote parser — the archive is produced
unconditionally by the remote engine.
## Search Index (Whoosh -> Tantivy)
The full-text search backend has been replaced with [Tantivy](https://github.com/quickwit-oss/tantivy).
The index format is incompatible with Whoosh, so **the search index is automatically rebuilt from


@@ -633,12 +633,11 @@ hardware, but a few settings can improve performance:
consumption, so you might want to lower these settings (example: 2
workers and 1 thread to always have some computing power left for
other tasks).
- Keep [`PAPERLESS_OCR_MODE`](configuration.md#PAPERLESS_OCR_MODE) at its default value `skip` and consider
- Keep [`PAPERLESS_OCR_MODE`](configuration.md#PAPERLESS_OCR_MODE) at its default value `auto` and consider
OCRing your documents before feeding them into Paperless. Some
scanners are able to do this!
- Set [`PAPERLESS_OCR_SKIP_ARCHIVE_FILE`](configuration.md#PAPERLESS_OCR_SKIP_ARCHIVE_FILE) to `with_text` to skip archive
file generation for already OCRed documents, or `always` to skip it
for all documents.
- Set [`PAPERLESS_ARCHIVE_FILE_GENERATION`](configuration.md#PAPERLESS_ARCHIVE_FILE_GENERATION) to `never` to skip archive
file generation entirely, saving disk space at the cost of in-browser PDF/A viewing.
- If you want to perform OCR on the device, consider using
`PAPERLESS_OCR_CLEAN=none`. This will speed up OCR times and use
less memory at the expense of slightly worse OCR results.


@@ -134,9 +134,9 @@ following operations on your documents:
!!! tip
This process can be configured to fit your needs. If you don't want
paperless to create archived versions for digital documents, you can
configure that by configuring
`PAPERLESS_OCR_SKIP_ARCHIVE_FILE=with_text`. Please read the
paperless to create archived versions for born-digital documents, set
[`PAPERLESS_ARCHIVE_FILE_GENERATION=auto`](configuration.md#PAPERLESS_ARCHIVE_FILE_GENERATION)
(the default). To skip archives entirely, use `never`. Please read the
[relevant section in the documentation](configuration.md#ocr).
!!! note
@@ -398,25 +398,27 @@ Global permissions define what areas of the app and API endpoints users can acce
determine if a user can create, edit, delete or view _any_ documents, but individual documents themselves
still have "object-level" permissions.
| Type | Details |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| AppConfig | _Change_ or higher permissions grants access to the "Application Configuration" area. |
| Correspondent | Add, edit, delete or view Correspondents. |
| CustomField | Add, edit, delete or view Custom Fields. |
| Document | Add, edit, delete or view Documents. |
| DocumentType | Add, edit, delete or view Document Types. |
| Group | Add, edit, delete or view Groups. |
| MailAccount | Add, edit, delete or view Mail Accounts. |
| MailRule | Add, edit, delete or view Mail Rules. |
| Note | Add, edit, delete or view Notes. |
| PaperlessTask | View or dismiss (_Change_) File Tasks. |
| SavedView | Add, edit, delete or view Saved Views. |
| ShareLink | Add, delete or view Share Links. |
| StoragePath | Add, edit, delete or view Storage Paths. |
| Tag | Add, edit, delete or view Tags. |
| UISettings | Add, edit, delete or view the UI settings that are used by the web app.<br/>:warning: **Users that will access the web UI must be granted at least _View_ permissions.** |
| User | Add, edit, delete or view Users. |
| Workflow | Add, edit, delete or view Workflows.<br/>Note that Workflows are global; all users who can access workflows see the same set. Workflows have other permission implications — see [Workflow permissions](#workflow-permissions). |
| Type | Details |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| AppConfig | _Change_ or higher permissions grants access to the "Application Configuration" area. |
| Correspondent | Add, edit, delete or view Correspondents. |
| CustomField | Add, edit, delete or view Custom Fields. |
| Document | Add, edit, delete or view Documents. |
| DocumentType | Add, edit, delete or view Document Types. |
| Group | Add, edit, delete or view Groups. |
| GlobalStatistics | View aggregate object counts and statistics. This does not grant access to view individual documents. |
| MailAccount | Add, edit, delete or view Mail Accounts. |
| MailRule | Add, edit, delete or view Mail Rules. |
| Note | Add, edit, delete or view Notes. |
| PaperlessTask | View or dismiss (_Change_) File Tasks. |
| SavedView | Add, edit, delete or view Saved Views. |
| ShareLink | Add, delete or view Share Links. |
| StoragePath | Add, edit, delete or view Storage Paths. |
| SystemStatus | View the system status dialog and corresponding API endpoint. Admin users also retain system status access. |
| Tag | Add, edit, delete or view Tags. |
| UISettings | Add, edit, delete or view the UI settings that are used by the web app.<br/>:warning: **Users that will access the web UI must be granted at least _View_ permissions.** |
| User | Add, edit, delete or view Users. |
| Workflow | Add, edit, delete or view Workflows.<br/>Note that Workflows are global; all users who can access workflows see the same set. Workflows have other permission implications — see [Workflow permissions](#workflow-permissions). |
#### Detailed Explanation of Object Permissions {#object-permissions}


@@ -24,7 +24,7 @@ dependencies = [
"dateparser~=1.2",
# WARNING: django does not use semver.
# Only patch versions are guaranteed to not introduce breaking changes.
"django~=5.2.10",
"django~=5.2.13",
"django-allauth[mfa,socialaccount]~=65.15.0",
"django-auditlog~=3.4.1",
"django-cachalot~=2.9.0",
@@ -41,7 +41,7 @@ dependencies = [
"djangorestframework~=3.16",
"djangorestframework-guardian~=0.4.0",
"drf-spectacular~=0.28",
"drf-spectacular-sidecar~=2026.3.1",
"drf-spectacular-sidecar~=2026.4.1",
"drf-writable-nested~=0.7.1",
"faiss-cpu>=1.10",
"filelock~=3.25.2",
@@ -76,7 +76,7 @@ dependencies = [
"setproctitle~=1.3.4",
"tantivy>=0.25.1",
"tika-client~=0.11.0",
"torch~=2.10.0",
"torch~=2.11.0",
"watchfiles>=1.1.1",
"whitenoise~=6.11",
"zxing-cpp~=3.0.0",
@@ -111,12 +111,12 @@ lint = [
testing = [
"daphne",
"factory-boy~=3.3.1",
"faker~=40.8.0",
"faker~=40.12.0",
"imagehash",
"pytest~=9.0.0",
"pytest-cov~=7.0.0",
"pytest-cov~=7.1.0",
"pytest-django~=4.12.0",
"pytest-env~=1.5.0",
"pytest-env~=1.6.0",
"pytest-httpx",
"pytest-mock~=3.15.1",
# "pytest-randomly~=4.0.1",
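The pins in this hunk use the `~=` (compatible release) operator, which is why the Django pin is given to three components: `~=5.2.13` allows later `5.2.x` patch releases but not `5.3`. The sketch below illustrates that rule for plain numeric versions only. The function name `is_compatible_release` is hypothetical, and pre-releases, post-releases and epochs are deliberately not handled; a real project would normally rely on the `packaging` library instead.

```python
def is_compatible_release(candidate: str, pinned: str) -> bool:
    """Check whether candidate satisfies ~=pinned for purely numeric versions.

    Simplified illustration: ~=X.Y.Z is treated as >=X.Y.Z with the X.Y prefix
    fixed, and ~=X.Y as >=X.Y with the X prefix fixed.
    """
    base = tuple(int(part) for part in pinned.split("."))
    cand = tuple(int(part) for part in candidate.split("."))
    if len(base) < 2:
        raise ValueError("~= needs at least two version components")

    # Everything except the last pinned component must match exactly.
    prefix = base[:-1]

    # Pad with zeros so that 5.2 and 5.2.0 compare as equal.
    width = max(len(base), len(cand))
    base_padded = base + (0,) * (width - len(base))
    cand_padded = cand + (0,) * (width - len(cand))

    return cand_padded >= base_padded and cand_padded[: len(prefix)] == prefix


if __name__ == "__main__":
    for version in ("5.2.10", "5.2.13", "5.2.14", "5.3.0"):
        print(version, is_compatible_release(version, "5.2.13"))
```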


@@ -316,11 +316,11 @@
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">193</context>
<context context-type="linenumber">195</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">197</context>
<context context-type="linenumber">199</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/app-frame/app-frame.component.html</context>
@@ -518,7 +518,7 @@
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">136</context>
<context context-type="linenumber">138</context>
</context-group>
</trans-unit>
<trans-unit id="2180291763949669799" datatype="html">
@@ -540,7 +540,7 @@
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">399</context>
<context context-type="linenumber">401</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/confirm-dialog/confirm-dialog.component.ts</context>
@@ -615,7 +615,7 @@
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">400</context>
<context context-type="linenumber">402</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/correspondent-edit-dialog/correspondent-edit-dialog.component.html</context>
@@ -922,126 +922,126 @@
<source>Open Django Admin</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">30</context>
<context context-type="linenumber">32</context>
</context-group>
</trans-unit>
<trans-unit id="6439365426343089851" datatype="html">
<source>General</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">40</context>
<context context-type="linenumber">42</context>
</context-group>
</trans-unit>
<trans-unit id="8671234314555525900" datatype="html">
<source>Appearance</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">44</context>
<context context-type="linenumber">46</context>
</context-group>
</trans-unit>
<trans-unit id="3777637051272512093" datatype="html">
<source>Display language</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">47</context>
<context context-type="linenumber">49</context>
</context-group>
</trans-unit>
<trans-unit id="53523152145406584" datatype="html">
<source>You need to reload the page after applying a new language.</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">60</context>
<context context-type="linenumber">62</context>
</context-group>
</trans-unit>
<trans-unit id="3766032098416558788" datatype="html">
<source>Date display</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">68</context>
<context context-type="linenumber">70</context>
</context-group>
</trans-unit>
<trans-unit id="3733378544613473393" datatype="html">
<source>Date format</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">85</context>
<context context-type="linenumber">87</context>
</context-group>
</trans-unit>
<trans-unit id="3407788781115661841" datatype="html">
<source>Short: <x id="INTERPOLATION" equiv-text="{{today | customDate:&apos;shortDate&apos;:null:computedDateLocale}}"/></source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">91,92</context>
<context context-type="linenumber">93,94</context>
</context-group>
</trans-unit>
<trans-unit id="6290748171049664628" datatype="html">
<source>Medium: <x id="INTERPOLATION" equiv-text="{{today | customDate:&apos;mediumDate&apos;:null:computedDateLocale}}"/></source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">95,96</context>
<context context-type="linenumber">97,98</context>
</context-group>
</trans-unit>
<trans-unit id="7189855711197998347" datatype="html">
<source>Long: <x id="INTERPOLATION" equiv-text="{{today | customDate:&apos;longDate&apos;:null:computedDateLocale}}"/></source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">99,100</context>
<context context-type="linenumber">101,102</context>
</context-group>
</trans-unit>
<trans-unit id="3982403428275430291" datatype="html">
<source>Sidebar</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">107</context>
<context context-type="linenumber">109</context>
</context-group>
</trans-unit>
<trans-unit id="4608457133854405683" datatype="html">
<source>Use &apos;slim&apos; sidebar (icons only)</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">111</context>
<context context-type="linenumber">113</context>
</context-group>
</trans-unit>
<trans-unit id="1356890996281769972" datatype="html">
<source>Dark mode</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">118</context>
<context context-type="linenumber">120</context>
</context-group>
</trans-unit>
<trans-unit id="4913823100518391922" datatype="html">
<source>Use system settings</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">121</context>
<context context-type="linenumber">123</context>
</context-group>
</trans-unit>
<trans-unit id="5782828784040423650" datatype="html">
<source>Enable dark mode</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">122</context>
<context context-type="linenumber">124</context>
</context-group>
</trans-unit>
<trans-unit id="6336642923114460405" datatype="html">
<source>Invert thumbnails in dark mode</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">123</context>
<context context-type="linenumber">125</context>
</context-group>
</trans-unit>
<trans-unit id="7983234071833154796" datatype="html">
<source>Theme Color</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">129</context>
<context context-type="linenumber">131</context>
</context-group>
</trans-unit>
<trans-unit id="6760166989231109310" datatype="html">
<source>Global search</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">142</context>
<context context-type="linenumber">144</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/app-frame/global-search/global-search.component.ts</context>
@@ -1052,28 +1052,28 @@
<source>Do not include advanced search results</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">145</context>
<context context-type="linenumber">147</context>
</context-group>
</trans-unit>
<trans-unit id="3969258421469113318" datatype="html">
<source>Full search links to</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">151</context>
<context context-type="linenumber">153</context>
</context-group>
</trans-unit>
<trans-unit id="6631288852577115923" datatype="html">
<source>Title and content search</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">155</context>
<context context-type="linenumber">157</context>
</context-group>
</trans-unit>
<trans-unit id="1010505078885609376" datatype="html">
<source>Advanced search</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">156</context>
<context context-type="linenumber">158</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/app-frame/global-search/global-search.component.html</context>
@@ -1088,21 +1088,21 @@
<source>Update checking</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">161</context>
<context context-type="linenumber">163</context>
</context-group>
</trans-unit>
<trans-unit id="5070799004079086984" datatype="html">
<source>Enable update checking</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">164</context>
<context context-type="linenumber">166</context>
</context-group>
</trans-unit>
<trans-unit id="5752465522295465624" datatype="html">
<source>What&apos;s this?</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">165</context>
<context context-type="linenumber">167</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/page-header/page-header.component.html</context>
@@ -1121,21 +1121,21 @@
<source> Update checking works by pinging the public GitHub API for the latest release to determine whether a new version is available. Actual updating of the app must still be performed manually. </source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">169,171</context>
<context context-type="linenumber">171,173</context>
</context-group>
</trans-unit>
<trans-unit id="8416061320800650487" datatype="html">
<source>No tracking data is collected by the app in any way.</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">173</context>
<context context-type="linenumber">175</context>
</context-group>
</trans-unit>
<trans-unit id="5775451530782446954" datatype="html">
<source>Saved Views</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">179</context>
<context context-type="linenumber">181</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/app-frame/app-frame.component.html</context>
@@ -1154,126 +1154,126 @@
<source>Show warning when closing saved views with unsaved changes</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">182</context>
<context context-type="linenumber">184</context>
</context-group>
</trans-unit>
<trans-unit id="4975481913502931184" datatype="html">
<source>Show document counts in sidebar saved views</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">183</context>
<context context-type="linenumber">185</context>
</context-group>
</trans-unit>
<trans-unit id="8939587804990976924" datatype="html">
<source>Items per page</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">200</context>
<context context-type="linenumber">202</context>
</context-group>
</trans-unit>
<trans-unit id="908152367861642592" datatype="html">
<source>Document editing</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">212</context>
<context context-type="linenumber">214</context>
</context-group>
</trans-unit>
<trans-unit id="6708098108196142028" datatype="html">
<source>Use PDF viewer provided by the browser</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">215</context>
<context context-type="linenumber">217</context>
</context-group>
</trans-unit>
<trans-unit id="9003921625412907981" datatype="html">
<source>This is usually faster for displaying large PDF documents, but it might not work on some browsers.</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">215</context>
<context context-type="linenumber">217</context>
</context-group>
</trans-unit>
<trans-unit id="2678648946508279627" datatype="html">
<source>Default zoom</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">221</context>
<context context-type="linenumber">223</context>
</context-group>
</trans-unit>
<trans-unit id="2222784219255971268" datatype="html">
<source>Fit width</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">225</context>
<context context-type="linenumber">227</context>
</context-group>
</trans-unit>
<trans-unit id="8409221133589393872" datatype="html">
<source>Fit page</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">226</context>
<context context-type="linenumber">228</context>
</context-group>
</trans-unit>
<trans-unit id="7019985100624067992" datatype="html">
<source>Only applies to the Paperless-ngx PDF viewer.</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">228</context>
<context context-type="linenumber">230</context>
</context-group>
</trans-unit>
<trans-unit id="2959590948110714366" datatype="html">
<source>Automatically remove inbox tag(s) on save</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">234</context>
<context context-type="linenumber">236</context>
</context-group>
</trans-unit>
<trans-unit id="8793267604636304297" datatype="html">
<source>Show document thumbnail during loading</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">240</context>
<context context-type="linenumber">242</context>
</context-group>
</trans-unit>
<trans-unit id="1783600598811723080" datatype="html">
<source>Built-in fields to show:</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">246</context>
<context context-type="linenumber">248</context>
</context-group>
</trans-unit>
<trans-unit id="3467966318201103991" datatype="html">
<source>Uncheck fields to hide them on the document details page.</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">258</context>
<context context-type="linenumber">260</context>
</context-group>
</trans-unit>
<trans-unit id="8508424367627989968" datatype="html">
<source>Bulk editing</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">264</context>
<context context-type="linenumber">266</context>
</context-group>
</trans-unit>
<trans-unit id="8158899674926420054" datatype="html">
<source>Show confirmation dialogs</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">267</context>
<context context-type="linenumber">269</context>
</context-group>
</trans-unit>
<trans-unit id="290238406234356122" datatype="html">
<source>Apply on close</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">268</context>
<context context-type="linenumber">270</context>
</context-group>
</trans-unit>
<trans-unit id="5084275925647254161" datatype="html">
<source>PDF Editor</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">272</context>
<context context-type="linenumber">274</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/document-detail/document-detail.component.html</context>
@@ -1288,14 +1288,14 @@
<source>Default editing mode</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">275</context>
<context context-type="linenumber">277</context>
</context-group>
</trans-unit>
<trans-unit id="7273640930165035289" datatype="html">
<source>Create new document(s)</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">279</context>
<context context-type="linenumber">281</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/pdf-editor/pdf-editor.component.html</context>
@@ -1306,7 +1306,7 @@
<source>Add document version</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">280</context>
<context context-type="linenumber">282</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/pdf-editor/pdf-editor.component.html</context>
@@ -1317,7 +1317,7 @@
<source>Notes</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">285</context>
<context context-type="linenumber">287</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/document-list/document-list.component.html</context>
@@ -1336,14 +1336,14 @@
<source>Enable notes</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">288</context>
<context context-type="linenumber">290</context>
</context-group>
</trans-unit>
<trans-unit id="7314814725704332646" datatype="html">
<source>Permissions</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">297</context>
<context context-type="linenumber">299</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/group-edit-dialog/group-edit-dialog.component.html</context>
@@ -1394,28 +1394,28 @@
<source>Default Permissions</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">300</context>
<context context-type="linenumber">302</context>
</context-group>
</trans-unit>
<trans-unit id="6544153565064275581" datatype="html">
<source> Settings apply to this user account for objects (Tags, Mail Rules, etc. but not documents) created via the web UI. </source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">304,306</context>
<context context-type="linenumber">306,308</context>
</context-group>
</trans-unit>
<trans-unit id="4292903881380648974" datatype="html">
<source>Default Owner</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">311</context>
<context context-type="linenumber">313</context>
</context-group>
</trans-unit>
<trans-unit id="734147282056744882" datatype="html">
<source>Objects without an owner can be viewed and edited by all users</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">315</context>
<context context-type="linenumber">317</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/input/permissions/permissions-form/permissions-form.component.html</context>
@@ -1426,18 +1426,18 @@
<source>Default View Permissions</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">320</context>
<context context-type="linenumber">322</context>
</context-group>
</trans-unit>
<trans-unit id="2191775412581217688" datatype="html">
<source>Users:</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">325</context>
<context context-type="linenumber">327</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">352</context>
<context context-type="linenumber">354</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.html</context>
@@ -1468,11 +1468,11 @@
<source>Groups:</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">335</context>
<context context-type="linenumber">337</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">362</context>
<context context-type="linenumber">364</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.html</context>
@@ -1503,14 +1503,14 @@
<source>Default Edit Permissions</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">347</context>
<context context-type="linenumber">349</context>
</context-group>
</trans-unit>
<trans-unit id="3728984448750213892" datatype="html">
<source>Edit permissions also grant viewing permissions</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">371</context>
<context context-type="linenumber">373</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/workflow-edit-dialog/workflow-edit-dialog.component.html</context>
@@ -1529,7 +1529,7 @@
<source>Notifications</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">379</context>
<context context-type="linenumber">381</context>
</context-group>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/app-frame/toasts-dropdown/toasts-dropdown.component.html</context>
@@ -1540,42 +1540,42 @@
<source>Document processing</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">382</context>
<context context-type="linenumber">384</context>
</context-group>
</trans-unit>
<trans-unit id="3656786776644872398" datatype="html">
<source>Show notifications when new documents are detected</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">386</context>
<context context-type="linenumber">388</context>
</context-group>
</trans-unit>
<trans-unit id="6057053428592387613" datatype="html">
<source>Show notifications when document processing completes successfully</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">387</context>
<context context-type="linenumber">389</context>
</context-group>
</trans-unit>
<trans-unit id="370315664367425513" datatype="html">
<source>Show notifications when document processing fails</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">388</context>
<context context-type="linenumber">390</context>
</context-group>
</trans-unit>
<trans-unit id="6838309441164918531" datatype="html">
<source>Suppress notifications on dashboard</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">389</context>
<context context-type="linenumber">391</context>
</context-group>
</trans-unit>
<trans-unit id="2741919327232918179" datatype="html">
<source>This will suppress all messages about document processing status on the dashboard.</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/admin/settings/settings.component.html</context>
<context context-type="linenumber">389</context>
<context context-type="linenumber">391</context>
</context-group>
</trans-unit>
<trans-unit id="6839066544204061364" datatype="html">
@@ -4800,8 +4800,8 @@
<context context-type="linenumber">26</context>
</context-group>
</trans-unit>
<trans-unit id="8563400529811056364" datatype="html">
<source>Access logs, Django backend</source>
<trans-unit id="5409927574404161431" datatype="html">
<source>Access system status, logs, Django backend</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/user-edit-dialog/user-edit-dialog.component.html</context>
<context context-type="linenumber">26</context>
@@ -4814,8 +4814,8 @@
<context context-type="linenumber">30</context>
</context-group>
</trans-unit>
<trans-unit id="1403759966357927756" datatype="html">
<source>(Grants all permissions and can view objects)</source>
<trans-unit id="5622335314381948156" datatype="html">
<source>Grants all permissions and can view all objects</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/edit-dialog/user-edit-dialog/user-edit-dialog.component.html</context>
<context context-type="linenumber">30</context>
@@ -6198,7 +6198,7 @@
<source>Inherited from group</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/components/common/permissions-select/permissions-select.component.ts</context>
<context context-type="linenumber">78</context>
<context context-type="linenumber">85</context>
</context-group>
</trans-unit>
<trans-unit id="6418218602775540217" datatype="html">
@@ -10456,8 +10456,8 @@
<context context-type="linenumber">111</context>
</context-group>
</trans-unit>
<trans-unit id="6114528299376689399" datatype="html">
<source>Skip Archive File</source>
<trans-unit id="8305051609904776938" datatype="html">
<source>Archive File Generation</source>
<context-group purpose="location">
<context context-type="sourcefile">src/app/data/paperless-config.ts</context>
<context context-type="linenumber">119</context>

@@ -21,7 +21,7 @@
"@angular/platform-browser-dynamic": "~21.2.6",
"@angular/router": "~21.2.6",
"@ng-bootstrap/ng-bootstrap": "^20.0.0",
"@ng-select/ng-select": "^21.5.2",
"@ng-select/ng-select": "^21.7.0",
"@ngneat/dirty-check-forms": "^3.0.3",
"@popperjs/core": "^2.11.8",
"bootstrap": "^5.3.8",
@@ -32,7 +32,7 @@
"ngx-cookie-service": "^21.3.1",
"ngx-device-detector": "^11.0.0",
"ngx-ui-tour-ng-bootstrap": "^18.0.0",
"pdfjs-dist": "^5.4.624",
"pdfjs-dist": "^5.6.205",
"rxjs": "^7.8.2",
"tslib": "^2.8.1",
"utif": "^3.1.0",
@@ -42,28 +42,28 @@
"devDependencies": {
"@angular-builders/custom-webpack": "^21.0.3",
"@angular-builders/jest": "^21.0.3",
"@angular-devkit/core": "^21.2.3",
"@angular-devkit/schematics": "^21.2.3",
"@angular-devkit/core": "^21.2.6",
"@angular-devkit/schematics": "^21.2.6",
"@angular-eslint/builder": "21.3.1",
"@angular-eslint/eslint-plugin": "21.3.1",
"@angular-eslint/eslint-plugin-template": "21.3.1",
"@angular-eslint/schematics": "21.3.1",
"@angular-eslint/template-parser": "21.3.1",
"@angular/build": "^21.2.3",
"@angular/cli": "~21.2.3",
"@angular/build": "^21.2.6",
"@angular/cli": "~21.2.6",
"@angular/compiler-cli": "~21.2.6",
"@codecov/webpack-plugin": "^1.9.1",
"@playwright/test": "^1.58.2",
"@playwright/test": "^1.59.0",
"@types/jest": "^30.0.0",
"@types/node": "^25.5.0",
"@typescript-eslint/eslint-plugin": "^8.57.2",
"@typescript-eslint/parser": "^8.57.2",
"@typescript-eslint/utils": "^8.57.2",
"@typescript-eslint/eslint-plugin": "^8.58.0",
"@typescript-eslint/parser": "^8.58.0",
"@typescript-eslint/utils": "^8.58.0",
"eslint": "^10.1.0",
"jest": "30.3.0",
"jest-environment-jsdom": "^30.3.0",
"jest-junit": "^16.0.0",
"jest-preset-angular": "^16.1.1",
"jest-preset-angular": "^16.1.2",
"jest-websocket-mock": "^2.5.0",
"prettier-plugin-organize-imports": "^4.3.0",
"ts-node": "~10.9.1",

src-ui/pnpm-lock.yaml (generated; 1035 changes): file diff suppressed because it is too large

@@ -7,7 +7,7 @@
<button class="btn btn-sm btn-outline-primary" (click)="tourService.start()">
<i-bs class="me-2" name="airplane"></i-bs><ng-container i18n>Start tour</ng-container>
</button>
@if (permissionsService.isAdmin()) {
@if (canViewSystemStatus) {
<button class="btn btn-sm btn-outline-primary position-relative ms-md-5 me-1" (click)="showSystemStatus()"
[disabled]="!systemStatus">
@if (!systemStatus) {
@@ -26,6 +26,8 @@
}
<ng-container i18n>System Status</ng-container>
</button>
}
@if (permissionsService.isAdmin()) {
<a class="btn btn-sm btn-primary" href="admin/" target="_blank">
<ng-container i18n>Open Django Admin</ng-container>
<i-bs class="ms-2" name="arrow-up-right"></i-bs>

@@ -29,7 +29,11 @@ import { IfOwnerDirective } from 'src/app/directives/if-owner.directive'
import { IfPermissionsDirective } from 'src/app/directives/if-permissions.directive'
import { PermissionsGuard } from 'src/app/guards/permissions.guard'
import { CustomDatePipe } from 'src/app/pipes/custom-date.pipe'
import { PermissionsService } from 'src/app/services/permissions.service'
import {
PermissionAction,
PermissionType,
PermissionsService,
} from 'src/app/services/permissions.service'
import { GroupService } from 'src/app/services/rest/group.service'
import { SavedViewService } from 'src/app/services/rest/saved-view.service'
import { UserService } from 'src/app/services/rest/user.service'
@@ -328,7 +332,13 @@ describe('SettingsComponent', () => {
it('should load system status on initialize, show errors if needed', () => {
jest.spyOn(systemStatusService, 'get').mockReturnValue(of(status))
jest.spyOn(permissionsService, 'isAdmin').mockReturnValue(true)
jest
.spyOn(permissionsService, 'currentUserCan')
.mockImplementation(
(action, type) =>
action === PermissionAction.View &&
type === PermissionType.SystemStatus
)
completeSetup()
expect(component['systemStatus']).toEqual(status) // private
expect(component.systemStatusHasErrors).toBeTruthy()
@@ -344,7 +354,13 @@ describe('SettingsComponent', () => {
it('should open system status dialog', () => {
const modalOpenSpy = jest.spyOn(modalService, 'open')
jest.spyOn(systemStatusService, 'get').mockReturnValue(of(status))
jest.spyOn(permissionsService, 'isAdmin').mockReturnValue(true)
jest
.spyOn(permissionsService, 'currentUserCan')
.mockImplementation(
(action, type) =>
action === PermissionAction.View &&
type === PermissionType.SystemStatus
)
completeSetup()
component.showSystemStatus()
expect(modalOpenSpy).toHaveBeenCalledWith(SystemStatusDialogComponent, {

@@ -429,7 +429,7 @@ export class SettingsComponent
this.settingsForm.patchValue(currentFormValue)
}
if (this.permissionsService.isAdmin()) {
if (this.canViewSystemStatus) {
this.systemStatusService.get().subscribe((status) => {
this.systemStatus = status
})
@@ -647,6 +647,16 @@ export class SettingsComponent
.setValue(Array.from(hiddenFields))
}
public get canViewSystemStatus(): boolean {
return (
this.permissionsService.isAdmin() ||
this.permissionsService.currentUserCan(
PermissionAction.View,
PermissionType.SystemStatus
)
)
}
showSystemStatus() {
const modal: NgbModalRef = this.modalService.open(
SystemStatusDialogComponent,

@@ -23,11 +23,11 @@
</div>
<div class="form-check form-switch form-check-inline">
<input type="checkbox" class="form-check-input" id="is_staff" formControlName="is_staff">
<label class="form-check-label" for="is_staff"><ng-container i18n>Admin</ng-container> <small class="form-text text-muted ms-1" i18n>Access logs, Django backend</small></label>
<label class="form-check-label" for="is_staff"><ng-container i18n>Admin</ng-container> <small class="form-text text-muted ms-1" i18n>Access system status, logs, Django backend</small></label>
</div>
<div class="form-check form-switch form-check-inline">
<input type="checkbox" class="form-check-input" id="is_superuser" formControlName="is_superuser" (change)="onToggleSuperUser()">
<label class="form-check-label" for="is_superuser"><ng-container i18n>Superuser</ng-container> <small class="form-text text-muted ms-1" i18n>(Grants all permissions and can view objects)</small></label>
<label class="form-check-label" for="is_superuser"><ng-container i18n>Superuser</ng-container> <small class="form-text text-muted ms-1" i18n>Grants all permissions and can view all objects</small></label>
</div>
</div>

@@ -26,8 +26,8 @@
<input type="checkbox" class="form-check-input" id="{{type}}_all" (change)="toggleAll($event, type)" [checked]="typesWithAllActions.has(type) || isInherited(type)" [attr.disabled]="disabled || isInherited(type) ? true : null">
<label class="form-check-label visually-hidden" for="{{type}}_all" i18n>All</label>
</div>
@for (action of PermissionAction | keyvalue; track action) {
<div class="col form-check form-check-inline" [ngbPopover]="inheritedWarning" [disablePopover]="!isInherited(type, action.key)" placement="left" triggers="mouseenter:mouseleave">
@for (action of PermissionAction | keyvalue: sortActions; track action.key) {
<div class="col form-check form-check-inline" [class.invisible]="!isActionSupported(PermissionType[type], action.value)" [ngbPopover]="inheritedWarning" [disablePopover]="!isInherited(type, action.key)" placement="left" triggers="mouseenter:mouseleave">
<input type="checkbox" class="form-check-input" id="{{type}}_{{action.key}}" formControlName="{{action.key}}">
<label class="form-check-label visually-hidden" for="{{type}}_{{action.key}}">{{action.key}}</label>
</div>

@@ -26,7 +26,6 @@ const inheritedPermissions = ['change_tag', 'view_documenttype']
describe('PermissionsSelectComponent', () => {
let component: PermissionsSelectComponent
let fixture: ComponentFixture<PermissionsSelectComponent>
let permissionsChangeResult: Permissions
let settingsService: SettingsService
beforeEach(async () => {
@@ -45,7 +44,7 @@ describe('PermissionsSelectComponent', () => {
fixture = TestBed.createComponent(PermissionsSelectComponent)
fixture.debugElement.injector.get(NG_VALUE_ACCESSOR)
component = fixture.componentInstance
component.registerOnChange((r) => (permissionsChangeResult = r))
component.registerOnChange((r) => r)
fixture.detectChanges()
})
@@ -75,7 +74,6 @@ describe('PermissionsSelectComponent', () => {
it('should update on permissions set', () => {
component.ngOnInit()
component.writeValue(permissions)
expect(permissionsChangeResult).toEqual(permissions)
expect(component.typesWithAllActions).toContain('Document')
})
@@ -92,13 +90,12 @@ describe('PermissionsSelectComponent', () => {
it('disable checkboxes when permissions are inherited', () => {
component.ngOnInit()
component.inheritedPermissions = inheritedPermissions
fixture.detectChanges()
expect(component.isInherited('Document', 'Add')).toBeFalsy()
expect(component.isInherited('Document')).toBeFalsy()
expect(component.isInherited('Tag', 'Change')).toBeTruthy()
const input1 = fixture.debugElement.query(By.css('input#Document_Add'))
expect(input1.nativeElement.disabled).toBeFalsy()
const input2 = fixture.debugElement.query(By.css('input#Tag_Change'))
expect(input2.nativeElement.disabled).toBeTruthy()
expect(component.form.get('Document').get('Add').disabled).toBeFalsy()
expect(component.form.get('Tag').get('Change').disabled).toBeTruthy()
})
it('should exclude history permissions if disabled', () => {
@@ -107,4 +104,60 @@ describe('PermissionsSelectComponent', () => {
component = fixture.componentInstance
expect(component.allowedTypes).not.toContain('History')
})
it('should treat global statistics as view-only', () => {
component.ngOnInit()
fixture.detectChanges()
expect(
component.isActionSupported(
PermissionType.GlobalStatistics,
PermissionAction.View
)
).toBeTruthy()
expect(
component.isActionSupported(
PermissionType.GlobalStatistics,
PermissionAction.Add
)
).toBeFalsy()
const addInput = fixture.debugElement.query(
By.css('input#GlobalStatistics_Add')
)
const viewInput = fixture.debugElement.query(
By.css('input#GlobalStatistics_View')
)
expect(addInput.nativeElement.disabled).toBeTruthy()
expect(viewInput.nativeElement.disabled).toBeFalsy()
})
it('should treat system status as view-only', () => {
component.ngOnInit()
fixture.detectChanges()
expect(
component.isActionSupported(
PermissionType.SystemStatus,
PermissionAction.View
)
).toBeTruthy()
expect(
component.isActionSupported(
PermissionType.SystemStatus,
PermissionAction.Change
)
).toBeFalsy()
const changeInput = fixture.debugElement.query(
By.css('input#SystemStatus_Change')
)
const viewInput = fixture.debugElement.query(
By.css('input#SystemStatus_View')
)
expect(changeInput.nativeElement.disabled).toBeTruthy()
expect(viewInput.nativeElement.disabled).toBeFalsy()
})
})

@@ -1,4 +1,4 @@
import { KeyValuePipe } from '@angular/common'
import { KeyValue, KeyValuePipe } from '@angular/common'
import { Component, forwardRef, inject, Input, OnInit } from '@angular/core'
import {
AbstractControl,
@@ -58,6 +58,13 @@ export class PermissionsSelectComponent
typesWithAllActions: Set<string> = new Set()
private readonly actionOrder = [
PermissionAction.Add,
PermissionAction.Change,
PermissionAction.Delete,
PermissionAction.View,
]
_inheritedPermissions: string[] = []
@Input()
@@ -86,7 +93,7 @@ export class PermissionsSelectComponent
}
this.allowedTypes.forEach((type) => {
const control = new FormGroup({})
for (const action in PermissionAction) {
for (const action of Object.keys(PermissionAction)) {
control.addControl(action, new FormControl(null))
}
this.form.addControl(type, control)
@@ -106,18 +113,14 @@ export class PermissionsSelectComponent
this.permissionsService.getPermissionKeys(permissionStr)
if (actionKey && typeKey) {
if (this.form.get(typeKey)?.get(actionKey)) {
this.form
.get(typeKey)
.get(actionKey)
.patchValue(true, { emitEvent: false })
}
this.form
.get(typeKey)
?.get(actionKey)
?.patchValue(true, { emitEvent: false })
}
})
this.allowedTypes.forEach((type) => {
if (
Object.values(this.form.get(type).value).every((val) => val == true)
) {
if (this.typeHasAllActionsSelected(type)) {
this.typesWithAllActions.add(type)
} else {
this.typesWithAllActions.delete(type)
@@ -149,12 +152,16 @@ export class PermissionsSelectComponent
this.form.valueChanges.subscribe((newValue) => {
let permissions = []
Object.entries(newValue).forEach(([typeKey, typeValue]) => {
// e.g. [Document, { Add: true, View: true ... }]
const selectedActions = Object.entries(typeValue).filter(
([actionKey, actionValue]) => actionValue == true
([actionKey, actionValue]) =>
actionValue &&
this.isActionSupported(
PermissionType[typeKey],
PermissionAction[actionKey]
)
)
selectedActions.forEach(([actionKey, actionValue]) => {
selectedActions.forEach(([actionKey]) => {
permissions.push(
(PermissionType[typeKey] as string).replace(
'%s',
@@ -163,7 +170,7 @@ export class PermissionsSelectComponent
)
})
if (selectedActions.length == Object.entries(typeValue).length) {
if (this.typeHasAllActionsSelected(typeKey)) {
this.typesWithAllActions.add(typeKey)
} else {
this.typesWithAllActions.delete(typeKey)
@@ -174,19 +181,23 @@ export class PermissionsSelectComponent
permissions.filter((p) => !this._inheritedPermissions.includes(p))
)
})
this.updateDisabledStates()
}
toggleAll(event, type) {
const typeGroup = this.form.get(type)
if (event.target.checked) {
Object.keys(PermissionAction).forEach((action) => {
typeGroup.get(action).patchValue(true)
Object.keys(PermissionAction)
.filter((action) =>
this.isActionSupported(PermissionType[type], PermissionAction[action])
)
.forEach((action) => {
typeGroup.get(action).patchValue(event.target.checked)
})
if (this.typeHasAllActionsSelected(type)) {
this.typesWithAllActions.add(type)
} else {
Object.keys(PermissionAction).forEach((action) => {
typeGroup.get(action).patchValue(false)
})
this.typesWithAllActions.delete(type)
}
}
@@ -201,14 +212,21 @@ export class PermissionsSelectComponent
)
)
} else {
return Object.values(PermissionAction).every((action) => {
return this._inheritedPermissions.includes(
this.permissionsService.getPermissionCode(
action as PermissionAction,
PermissionType[typeKey]
return Object.keys(PermissionAction)
.filter((action) =>
this.isActionSupported(
PermissionType[typeKey],
PermissionAction[action]
)
)
})
.every((action) => {
return this._inheritedPermissions.includes(
this.permissionsService.getPermissionCode(
PermissionAction[action],
PermissionType[typeKey]
)
)
})
}
}
@@ -216,12 +234,55 @@ export class PermissionsSelectComponent
this.allowedTypes.forEach((type) => {
const control = this.form.get(type)
let actionControl: AbstractControl
for (const action in PermissionAction) {
for (const action of Object.keys(PermissionAction)) {
actionControl = control.get(action)
if (
!this.isActionSupported(
PermissionType[type],
PermissionAction[action]
)
) {
actionControl.patchValue(false, { emitEvent: false })
actionControl.disable({ emitEvent: false })
continue
}
this.isInherited(type, action) || this.disabled
? actionControl.disable()
: actionControl.enable()
? actionControl.disable({ emitEvent: false })
: actionControl.enable({ emitEvent: false })
}
})
}
public isActionSupported(
type: PermissionType,
action: PermissionAction
): boolean {
// Global statistics and system status only support view
if (
type === PermissionType.GlobalStatistics ||
type === PermissionType.SystemStatus
) {
return action === PermissionAction.View
}
return true
}
private typeHasAllActionsSelected(typeKey: string): boolean {
return Object.keys(PermissionAction)
.filter((action) =>
this.isActionSupported(
PermissionType[typeKey],
PermissionAction[action]
)
)
.every((action) => !!this.form.get(typeKey)?.get(action)?.value)
}
public sortActions = (
a: KeyValue<string, PermissionAction>,
b: KeyValue<string, PermissionAction>
): number =>
this.actionOrder.indexOf(a.value) - this.actionOrder.indexOf(b.value)
}

@@ -11,16 +11,16 @@ export enum OutputTypeConfig {
}
export enum ModeConfig {
SKIP = 'skip',
REDO = 'redo',
AUTO = 'auto',
FORCE = 'force',
SKIP_NO_ARCHIVE = 'skip_noarchive',
REDO = 'redo',
OFF = 'off',
}
export enum ArchiveFileConfig {
NEVER = 'never',
WITH_TEXT = 'with_text',
AUTO = 'auto',
ALWAYS = 'always',
NEVER = 'never',
}
export enum CleanConfig {
@@ -115,11 +115,11 @@ export const PaperlessConfigOptions: ConfigOption[] = [
category: ConfigCategory.OCR,
},
{
key: 'skip_archive_file',
title: $localize`Skip Archive File`,
key: 'archive_file_generation',
title: $localize`Archive File Generation`,
type: ConfigOptionType.Select,
choices: mapToItems(ArchiveFileConfig),
config_key: 'PAPERLESS_OCR_SKIP_ARCHIVE_FILE',
config_key: 'PAPERLESS_ARCHIVE_FILE_GENERATION',
category: ConfigCategory.OCR,
},
{
@@ -337,7 +337,7 @@ export interface PaperlessConfig extends ObjectWithId {
pages: number
language: string
mode: ModeConfig
skip_archive_file: ArchiveFileConfig
archive_file_generation: ArchiveFileConfig
image_dpi: number
unpaper_clean: CleanConfig
deskew: boolean

@@ -6,6 +6,11 @@ import {
PermissionsService,
} from './permissions.service'
const VIEW_ONLY_PERMISSION_TYPES = new Set<PermissionType>([
PermissionType.GlobalStatistics,
PermissionType.SystemStatus,
])
describe('PermissionsService', () => {
let permissionsService: PermissionsService
@@ -264,6 +269,8 @@ describe('PermissionsService', () => {
'change_applicationconfiguration',
'delete_applicationconfiguration',
'view_applicationconfiguration',
'view_global_statistics',
'view_system_status',
],
{
username: 'testuser',
@@ -274,7 +281,10 @@ describe('PermissionsService', () => {
Object.values(PermissionType).forEach((type) => {
Object.values(PermissionAction).forEach((action) => {
expect(permissionsService.currentUserCan(action, type)).toBeTruthy()
expect(permissionsService.currentUserCan(action, type)).toBe(
!VIEW_ONLY_PERMISSION_TYPES.has(type) ||
action === PermissionAction.View
)
})
})

@@ -29,6 +29,8 @@ export enum PermissionType {
CustomField = '%s_customfield',
Workflow = '%s_workflow',
ProcessedMail = '%s_processedmail',
GlobalStatistics = '%s_global_statistics',
SystemStatus = '%s_system_status',
}
@Injectable({

@@ -73,7 +73,7 @@ describe('LocalizedDateParserFormatter', () => {
it('should handle years when current year % 100 < 50', () => {
jest.useFakeTimers()
jest.setSystemTime(new Date(2026, 5, 15))
jest.setSystemTime(new Date(2026, 5, 15).getTime())
let val = dateParserFormatter.parse('5/4/26')
expect(val).toEqual({ day: 4, month: 5, year: 2026 })
@@ -87,7 +87,7 @@ describe('LocalizedDateParserFormatter', () => {
it('should handle years when current year % 100 >= 50', () => {
jest.useFakeTimers()
jest.setSystemTime(new Date(2076, 5, 15))
jest.setSystemTime(new Date(2076, 5, 15).getTime())
const val = dateParserFormatter.parse('5/4/00')
expect(val).toEqual({ day: 4, month: 5, year: 2100 })
jest.useRealTimers()

@@ -1,4 +1,5 @@
import datetime
import logging
import os
import shutil
import tempfile
@@ -50,9 +51,14 @@ from documents.utils import compute_checksum
from documents.utils import copy_basic_file_stats
from documents.utils import copy_file_with_basic_stats
from documents.utils import run_subprocess
from paperless.config import OcrConfig
from paperless.models import ArchiveFileGenerationChoices
from paperless.parsers import ParserContext
from paperless.parsers import ParserProtocol
from paperless.parsers.registry import get_parser_registry
from paperless.parsers.utils import PDF_TEXT_MIN_LENGTH
from paperless.parsers.utils import extract_pdf_text
from paperless.parsers.utils import is_tagged_pdf
LOGGING_NAME: Final[str] = "paperless.consumer"
@@ -105,6 +111,74 @@ class ConsumerStatusShortMessage(StrEnum):
FAILED = "failed"
def should_produce_archive(
parser: "ParserProtocol",
mime_type: str,
document_path: Path,
log: logging.Logger | None = None,
) -> bool:
"""Return True if a PDF/A archive should be produced for this document.
IMPORTANT: *parser* must be an instantiated parser, not the class.
``requires_pdf_rendition`` and ``can_produce_archive`` are instance
``@property`` methods — accessing them on the class returns the descriptor
(always truthy).
"""
_log = log or logging.getLogger(LOGGING_NAME)
# Must produce a PDF so the frontend can display the original format at all.
if parser.requires_pdf_rendition:
_log.debug("Archive: yes — parser requires PDF rendition for frontend display")
return True
# Parser cannot produce an archive (e.g. TextDocumentParser).
if not parser.can_produce_archive:
_log.debug("Archive: no — parser cannot produce archives")
return False
generation = OcrConfig().archive_file_generation
if generation == ArchiveFileGenerationChoices.ALWAYS:
_log.debug("Archive: yes — ARCHIVE_FILE_GENERATION=always")
return True
if generation == ArchiveFileGenerationChoices.NEVER:
_log.debug("Archive: no — ARCHIVE_FILE_GENERATION=never")
return False
# auto: produce archives for scanned/image documents; skip for born-digital PDFs.
if mime_type.startswith("image/"):
_log.debug("Archive: yes — image document, ARCHIVE_FILE_GENERATION=auto")
return True
if mime_type == "application/pdf":
if is_tagged_pdf(document_path):
_log.debug(
"Archive: no — born-digital PDF (structure tags detected),"
" ARCHIVE_FILE_GENERATION=auto",
)
return False
text = extract_pdf_text(document_path)
if text is None or len(text) <= PDF_TEXT_MIN_LENGTH:
_log.debug(
"Archive: yes — scanned PDF (text_length=%d <= %d),"
" ARCHIVE_FILE_GENERATION=auto",
len(text) if text else 0,
PDF_TEXT_MIN_LENGTH,
)
return True
_log.debug(
"Archive: no — born-digital PDF (text_length=%d > %d),"
" ARCHIVE_FILE_GENERATION=auto",
len(text),
PDF_TEXT_MIN_LENGTH,
)
return False
_log.debug(
"Archive: no — MIME type %r not eligible for auto archive generation",
mime_type,
)
return False
class ConsumerPluginMixin:
if TYPE_CHECKING:
from logging import Logger
@@ -436,7 +510,17 @@ class ConsumerPlugin(
)
self.log.debug(f"Parsing {self.filename}...")
document_parser.parse(self.working_copy, mime_type)
produce_archive = should_produce_archive(
document_parser,
mime_type,
self.working_copy,
self.log,
)
document_parser.parse(
self.working_copy,
mime_type,
produce_archive=produce_archive,
)
self.log.debug(f"Generating thumbnail for {self.filename}...")
self._send_progress(
@@ -785,7 +869,7 @@ class ConsumerPlugin(
return document
def apply_overrides(self, document) -> None:
def apply_overrides(self, document: Document) -> None:
if self.metadata.correspondent_id:
document.correspondent = Correspondent.objects.get(
pk=self.metadata.correspondent_id,
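
The decision order in `should_produce_archive` above can be sketched without Django or the real parsers. This is a minimal illustration only: `StubParser`, `decide_archive`, the string values for the generation setting, and the `min_length=50` default are hypothetical stand-ins introduced here (the real threshold is `PDF_TEXT_MIN_LENGTH`, whose value is not shown in this diff), and the tagged-PDF check and text extraction are replaced by plain arguments.

```python
from dataclasses import dataclass


@dataclass
class StubParser:
    """Hypothetical stand-in for an instantiated parser (not the real ParserProtocol)."""

    requires_pdf_rendition: bool = False
    can_produce_archive: bool = True


def decide_archive(
    parser: StubParser,
    mime_type: str,
    generation: str,
    *,
    is_tagged: bool = False,
    text_length: int = 0,
    min_length: int = 50,  # placeholder, not the real PDF_TEXT_MIN_LENGTH
) -> bool:
    """Follow the same order of checks as should_produce_archive."""
    # 1. Parser capabilities override the configuration entirely.
    if parser.requires_pdf_rendition:
        return True
    if not parser.can_produce_archive:
        return False
    # 2. Explicit configuration values.
    if generation == "always":
        return True
    if generation == "never":
        return False
    # 3. "auto": archive images and scanned PDFs, skip born-digital PDFs.
    if mime_type.startswith("image/"):
        return True
    if mime_type == "application/pdf":
        if is_tagged:
            return False
        return text_length <= min_length
    # Any other MIME type is not eligible in auto mode.
    return False
```

Note that the parser capability checks come first, so a parser that requires a PDF rendition produces an archive even when the setting is `never`.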

@@ -56,6 +56,26 @@ class PaperlessAdminPermissions(BasePermission):
return request.user.is_staff
def has_global_statistics_permission(user: User | None) -> bool:
if user is None or not getattr(user, "is_authenticated", False):
return False
return getattr(user, "is_superuser", False) or user.has_perm(
"paperless.view_global_statistics",
)
def has_system_status_permission(user: User | None) -> bool:
if user is None or not getattr(user, "is_authenticated", False):
return False
return (
getattr(user, "is_superuser", False)
or getattr(user, "is_staff", False)
or user.has_perm("paperless.view_system_status")
)
def get_groups_with_only_permission(obj, codename):
ctype = ContentType.objects.get_for_model(obj)
permission = Permission.objects.get(content_type=ctype, codename=codename)
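
The difference between the two new permission helpers can be seen with a stub user object. This is a sketch under stated assumptions: `StubUser` is a hypothetical stand-in for Django's `User`, and its `has_perm` only checks an explicit set of permission strings (it does not reproduce Django's backend logic). The two functions copy the checks from the hunk above.

```python
from dataclasses import dataclass, field


@dataclass
class StubUser:
    """Hypothetical stand-in for django.contrib.auth.models.User."""

    is_authenticated: bool = True
    is_superuser: bool = False
    is_staff: bool = False
    perms: set = field(default_factory=set)

    def has_perm(self, perm: str) -> bool:
        # Simplified: only explicitly granted permissions count.
        return perm in self.perms


def has_global_statistics_permission(user) -> bool:
    if user is None or not getattr(user, "is_authenticated", False):
        return False
    return getattr(user, "is_superuser", False) or user.has_perm(
        "paperless.view_global_statistics",
    )


def has_system_status_permission(user) -> bool:
    if user is None or not getattr(user, "is_authenticated", False):
        return False
    return (
        getattr(user, "is_superuser", False)
        or getattr(user, "is_staff", False)
        or user.has_perm("paperless.view_system_status")
    )


staff = StubUser(is_staff=True)
print(has_system_status_permission(staff))      # staff flag is enough here
print(has_global_statistics_permission(staff))  # staff flag alone is not
```

As written in the diff, staff status grants access to system status but not to global statistics, which requires superuser status or the explicit `view_global_statistics` permission.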

@@ -30,6 +30,7 @@ from documents.consumer import AsnCheckPlugin
from documents.consumer import ConsumerPlugin
from documents.consumer import ConsumerPreflightPlugin
from documents.consumer import WorkflowTriggerPlugin
from documents.consumer import should_produce_archive
from documents.data_models import ConsumableDocument
from documents.data_models import DocumentMetadataOverrides
from documents.double_sided import CollatePlugin
@@ -311,7 +312,16 @@ def update_document_content_maybe_archive_file(document_id) -> None:
parser.configure(ParserContext())
try:
parser.parse(document.source_path, mime_type)
produce_archive = should_produce_archive(
parser,
mime_type,
document.source_path,
)
parser.parse(
document.source_path,
mime_type,
produce_archive=produce_archive,
)
thumbnail = parser.get_thumbnail(document.source_path, mime_type)

@@ -6,6 +6,8 @@ from unittest.mock import patch
from django.contrib.auth.models import User
from django.core.files.uploadedfile import SimpleUploadedFile
from django.test import override_settings
from PIL import Image
from PIL.PngImagePlugin import PngInfo
from rest_framework import status
from rest_framework.test import APITestCase
@@ -46,7 +48,7 @@ class TestApiAppConfig(DirectoriesMixin, APITestCase):
"pages": None,
"language": None,
"mode": None,
"skip_archive_file": None,
"archive_file_generation": None,
"image_dpi": None,
"unpaper_clean": None,
"deskew": None,
@@ -201,6 +203,156 @@ class TestApiAppConfig(DirectoriesMixin, APITestCase):
)
self.assertFalse(Path(old_logo.path).exists())
def test_api_strips_exif_data_from_uploaded_logo(self) -> None:
"""
GIVEN:
- A JPEG logo upload containing EXIF metadata
WHEN:
- Uploaded via PATCH to app config
THEN:
- Stored logo image has EXIF metadata removed
"""
image = Image.new("RGB", (12, 12), "blue")
exif = Image.Exif()
exif[315] = "Paperless Test Author"
logo = BytesIO()
image.save(logo, format="JPEG", exif=exif)
logo.seek(0)
response = self.client.patch(
f"{self.ENDPOINT}1/",
{
"app_logo": SimpleUploadedFile(
name="logo-with-exif.jpg",
content=logo.getvalue(),
content_type="image/jpeg",
),
},
)
self.assertEqual(response.status_code, status.HTTP_200_OK)
config = ApplicationConfiguration.objects.first()
with Image.open(config.app_logo.path) as stored_logo:
stored_exif = stored_logo.getexif()
self.assertEqual(len(stored_exif), 0)
def test_api_strips_png_metadata_from_uploaded_logo(self) -> None:
"""
GIVEN:
- A PNG logo upload containing text metadata
WHEN:
- Uploaded via PATCH to app config
THEN:
- Stored logo image has metadata removed
"""
image = Image.new("RGB", (12, 12), "green")
pnginfo = PngInfo()
pnginfo.add_text("Author", "Paperless Test Author")
logo = BytesIO()
image.save(logo, format="PNG", pnginfo=pnginfo)
logo.seek(0)
response = self.client.patch(
f"{self.ENDPOINT}1/",
{
"app_logo": SimpleUploadedFile(
name="logo-with-metadata.png",
content=logo.getvalue(),
content_type="image/png",
),
},
)
self.assertEqual(response.status_code, status.HTTP_200_OK)
config = ApplicationConfiguration.objects.first()
with Image.open(config.app_logo.path) as stored_logo:
stored_text = stored_logo.text
self.assertEqual(stored_text, {})
def test_api_accepts_valid_gif_logo(self) -> None:
"""
GIVEN:
- A valid GIF logo upload
WHEN:
- Uploaded via PATCH to app config
THEN:
- Upload succeeds
"""
image = Image.new("RGB", (12, 12), "red")
logo = BytesIO()
image.save(logo, format="GIF", comment=b"Paperless Test Comment")
logo.seek(0)
response = self.client.patch(
f"{self.ENDPOINT}1/",
{
"app_logo": SimpleUploadedFile(
name="logo.gif",
content=logo.getvalue(),
content_type="image/gif",
),
},
)
self.assertEqual(response.status_code, status.HTTP_200_OK)
def test_api_rejects_invalid_raster_logo(self) -> None:
"""
GIVEN:
- A file named as a JPEG but containing non-image payload data
WHEN:
- Uploaded via PATCH to app config
THEN:
- Upload is rejected with 400
"""
response = self.client.patch(
f"{self.ENDPOINT}1/",
{
"app_logo": SimpleUploadedFile(
name="not-an-image.jpg",
content=b"<script>alert('xss')</script>",
content_type="image/jpeg",
),
},
)
self.assertEqual(response.status_code, status.HTTP_400_BAD_REQUEST)
self.assertIn("invalid logo image", str(response.data).lower())
@override_settings(MAX_IMAGE_PIXELS=100)
def test_api_rejects_logo_exceeding_max_image_pixels(self) -> None:
"""
GIVEN:
- A raster logo larger than the configured MAX_IMAGE_PIXELS limit
WHEN:
- Uploaded via PATCH to app config
THEN:
- Upload is rejected with 400
"""
image = Image.new("RGB", (12, 12), "purple")
logo = BytesIO()
image.save(logo, format="PNG")
logo.seek(0)
response = self.client.patch(
f"{self.ENDPOINT}1/",
{
"app_logo": SimpleUploadedFile(
name="too-large.png",
content=logo.getvalue(),
content_type="image/png",
),
},
)
self.assertEqual(response.status_code, status.HTTP_400_BAD_REQUEST)
self.assertIn(
"uploaded logo exceeds the maximum allowed image size",
str(response.data).lower(),
)
def test_api_rejects_malicious_svg_logo(self) -> None:
"""
GIVEN:

@@ -1309,7 +1309,7 @@ class TestCustomFieldsAPI(DirectoriesMixin, APITestCase):
# Test as user without access to the document
non_superuser = User.objects.create_user(username="non_superuser")
non_superuser.user_permissions.add(
*Permission.objects.all(),
*Permission.objects.exclude(codename="view_global_statistics"),
)
non_superuser.save()
self.client.force_authenticate(user=non_superuser)

@@ -18,6 +18,7 @@ from django.contrib.auth.models import Permission
from django.contrib.auth.models import User
from django.core import mail
from django.core.cache import cache
from django.core.files.uploadedfile import SimpleUploadedFile
from django.db import DataError
from django.test import override_settings
from django.utils import timezone
@@ -1314,6 +1315,41 @@ class TestDocumentApi(DirectoriesMixin, DocumentConsumeDelayMixin, APITestCase):
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.assertEqual(response.data["documents_inbox"], 0)
def test_statistics_with_statistics_permission(self) -> None:
owner = User.objects.create_user("owner")
stats_user = User.objects.create_user("stats-user")
stats_user.user_permissions.add(
Permission.objects.get(codename="view_global_statistics"),
)
inbox_tag = Tag.objects.create(
name="stats_inbox",
is_inbox_tag=True,
owner=owner,
)
Document.objects.create(
title="owned-doc",
checksum="stats-A",
mime_type="application/pdf",
content="abcdef",
owner=owner,
).tags.add(inbox_tag)
Correspondent.objects.create(name="stats-correspondent", owner=owner)
DocumentType.objects.create(name="stats-type", owner=owner)
StoragePath.objects.create(name="stats-path", path="archive", owner=owner)
self.client.force_authenticate(user=stats_user)
response = self.client.get("/api/statistics/")
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.assertEqual(response.data["documents_total"], 1)
self.assertEqual(response.data["documents_inbox"], 1)
self.assertEqual(response.data["inbox_tags"], [inbox_tag.pk])
self.assertEqual(response.data["character_count"], 6)
self.assertEqual(response.data["correspondent_count"], 1)
self.assertEqual(response.data["document_type_count"], 1)
self.assertEqual(response.data["storage_path_count"], 1)
def test_upload(self) -> None:
self.consume_file_mock.return_value = celery.result.AsyncResult(
id=str(uuid.uuid4()),
@@ -1342,6 +1378,79 @@ class TestDocumentApi(DirectoriesMixin, DocumentConsumeDelayMixin, APITestCase):
self.assertIsNone(overrides.document_type_id)
self.assertIsNone(overrides.tag_ids)
def test_upload_with_path_traversal_filename_is_reduced_to_basename(self) -> None:
self.consume_file_mock.return_value = celery.result.AsyncResult(
id=str(uuid.uuid4()),
)
payload = SimpleUploadedFile(
"../../outside.pdf",
(Path(__file__).parent / "samples" / "simple.pdf").read_bytes(),
content_type="application/pdf",
)
response = self.client.post(
"/api/documents/post_document/",
{"document": payload},
)
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.consume_file_mock.assert_called_once()
input_doc, overrides = self.get_last_consume_delay_call_args()
self.assertEqual(input_doc.original_file.name, "outside.pdf")
self.assertEqual(overrides.filename, "outside.pdf")
self.assertNotIn("..", input_doc.original_file.name)
self.assertNotIn("..", overrides.filename)
self.assertTrue(
input_doc.original_file.resolve(strict=False).is_relative_to(
Path(settings.SCRATCH_DIR).resolve(strict=False),
),
)
def test_upload_with_path_traversal_content_disposition_filename_is_reduced_to_basename(
self,
) -> None:
self.consume_file_mock.return_value = celery.result.AsyncResult(
id=str(uuid.uuid4()),
)
pdf_bytes = (Path(__file__).parent / "samples" / "simple.pdf").read_bytes()
boundary = "paperless-boundary"
payload = (
(
f"--{boundary}\r\n"
'Content-Disposition: form-data; name="document"; '
'filename="../../outside.pdf"\r\n'
"Content-Type: application/pdf\r\n\r\n"
).encode()
+ pdf_bytes
+ f"\r\n--{boundary}--\r\n".encode()
)
response = self.client.generic(
"POST",
"/api/documents/post_document/",
payload,
content_type=f"multipart/form-data; boundary={boundary}",
)
self.assertEqual(response.status_code, status.HTTP_200_OK)
self.consume_file_mock.assert_called_once()
input_doc, overrides = self.get_last_consume_delay_call_args()
self.assertEqual(input_doc.original_file.name, "outside.pdf")
self.assertEqual(overrides.filename, "outside.pdf")
self.assertNotIn("..", input_doc.original_file.name)
self.assertNotIn("..", overrides.filename)
self.assertTrue(
input_doc.original_file.resolve(strict=False).is_relative_to(
Path(settings.SCRATCH_DIR).resolve(strict=False),
),
)
def test_document_filters_use_latest_version_content(self) -> None:
root = Document.objects.create(
title="versioned root",
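The two upload tests above expect a client-supplied name such as "../../outside.pdf" to be reduced to "outside.pdf". The view code that does this is not part of this hunk, so the following is only a minimal sketch of the idea; `reduce_to_basename` is a hypothetical helper, not a function from the project.

```python
from pathlib import PureWindowsPath


def reduce_to_basename(client_filename: str) -> str:
    """Drop every directory component from a client-supplied filename.

    Hypothetical helper for illustration only.
    """
    # PureWindowsPath treats both "/" and "\\" as separators, so the
    # result does not depend on the operating system of the server.
    name = PureWindowsPath(client_filename).name
    if name in {"", ".", ".."}:
        raise ValueError(f"unusable filename: {client_filename!r}")
    return name
```

Under this sketch, `reduce_to_basename("../../outside.pdf")` yields `"outside.pdf"`, which is the value both tests assert for `input_doc.original_file.name` and `overrides.filename`.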

@@ -5,12 +5,14 @@ from pathlib import Path
from unittest import mock
from celery import states
from django.contrib.auth.models import Permission
from django.contrib.auth.models import User
from django.test import override_settings
from rest_framework import status
from rest_framework.test import APITestCase
from documents.models import PaperlessTask
from documents.permissions import has_system_status_permission
from paperless import version
@@ -91,6 +93,22 @@ class TestSystemStatus(APITestCase):
self.client.force_login(normal_user)
response = self.client.get(self.ENDPOINT)
self.assertEqual(response.status_code, status.HTTP_403_FORBIDDEN)
# test the permission helper function directly for good measure
self.assertFalse(has_system_status_permission(None))
def test_system_status_with_system_status_permission(self) -> None:
response = self.client.get(self.ENDPOINT)
self.assertEqual(response.status_code, status.HTTP_401_UNAUTHORIZED)
user = User.objects.create_user(username="status_user")
user.user_permissions.add(
Permission.objects.get(codename="view_system_status"),
)
self.client.force_login(user)
response = self.client.get(self.ENDPOINT)
self.assertEqual(response.status_code, status.HTTP_200_OK)
def test_system_status_with_bad_basic_auth_challenges(self) -> None:
self.client.credentials(HTTP_AUTHORIZATION="Basic invalid")

@@ -1020,7 +1020,7 @@ class TestTagBarcode(DirectoriesMixin, SampleDirMixin, GetReaderPluginMixin, Tes
CONSUMER_TAG_BARCODE_SPLIT=True,
CONSUMER_TAG_BARCODE_MAPPING={"TAG:(.*)": "\\g<1>"},
CELERY_TASK_ALWAYS_EAGER=True,
OCR_MODE="skip",
OCR_MODE="auto",
)
def test_consume_barcode_file_tag_split_and_assignment(self) -> None:
"""

@@ -230,7 +230,11 @@ class TestConsumer(
shutil.copy(src, dst)
return dst
@override_settings(FILENAME_FORMAT=None, TIME_ZONE="America/Chicago")
@override_settings(
FILENAME_FORMAT=None,
TIME_ZONE="America/Chicago",
ARCHIVE_FILE_GENERATION="always",
)
def testNormalOperation(self) -> None:
filename = self.get_test_file()
@@ -629,7 +633,10 @@ class TestConsumer(
# Database empty
self.assertEqual(Document.objects.all().count(), 0)
@override_settings(FILENAME_FORMAT="{correspondent}/{title}")
@override_settings(
FILENAME_FORMAT="{correspondent}/{title}",
ARCHIVE_FILE_GENERATION="always",
)
def testFilenameHandling(self) -> None:
with self.get_consumer(
self.get_test_file(),
@@ -646,7 +653,7 @@ class TestConsumer(
self._assert_first_last_send_progress()
@mock.patch("documents.consumer.generate_unique_filename")
@override_settings(FILENAME_FORMAT="{pk}")
@override_settings(FILENAME_FORMAT="{pk}", ARCHIVE_FILE_GENERATION="always")
def testFilenameHandlingFallsBackWhenGeneratedPathExceedsDbLimit(self, m):
m.side_effect = lambda doc, archive_filename=False: Path(
("a" * 1100 + ".pdf") if not archive_filename else ("b" * 1100 + ".pdf"),
@@ -673,7 +680,10 @@ class TestConsumer(
self._assert_first_last_send_progress()
@override_settings(FILENAME_FORMAT="{correspondent}/{title}")
@override_settings(
FILENAME_FORMAT="{correspondent}/{title}",
ARCHIVE_FILE_GENERATION="always",
)
@mock.patch("documents.signals.handlers.generate_unique_filename")
def testFilenameHandlingUnstableFormat(self, m) -> None:
filenames = ["this", "that", "now this", "i cannot decide"]
@@ -1021,7 +1031,7 @@ class TestConsumer(
self.assertEqual(Document.objects.count(), 2)
self._assert_first_last_send_progress()
@override_settings(FILENAME_FORMAT="{title}")
@override_settings(FILENAME_FORMAT="{title}", ARCHIVE_FILE_GENERATION="always")
@mock.patch("documents.consumer.get_parser_registry")
def test_similar_filenames(self, m) -> None:
shutil.copy(
@@ -1132,6 +1142,7 @@ class TestConsumer(
mock_mail_parser_parse.assert_called_once_with(
consumer.working_copy,
"message/rfc822",
produce_archive=True,
)
@@ -1279,7 +1290,14 @@ class PreConsumeTestCase(DirectoriesMixin, GetConsumerMixin, TestCase):
def test_no_pre_consume_script(self, m) -> None:
with self.get_consumer(self.test_file) as c:
c.run()
m.assert_not_called()
# Verify no pre-consume script subprocess was invoked
# (run_subprocess may still be called by _extract_text_for_archive_check)
script_calls = [
call
for call in m.call_args_list
if call.args and call.args[0] and call.args[0][0] not in ("pdftotext",)
]
self.assertEqual(script_calls, [])
@mock.patch("documents.consumer.run_subprocess")
@override_settings(PRE_CONSUME_SCRIPT="does-not-exist")
@@ -1295,9 +1313,16 @@ class PreConsumeTestCase(DirectoriesMixin, GetConsumerMixin, TestCase):
with self.get_consumer(self.test_file) as c:
c.run()
m.assert_called_once()
self.assertTrue(m.called)
args, _ = m.call_args
# Find the call that invoked the pre-consume script
# (run_subprocess may also be called by _extract_text_for_archive_check)
script_call = next(
call
for call in m.call_args_list
if call.args and call.args[0] and call.args[0][0] == script.name
)
args, _ = script_call
command = args[0]
environment = args[1]
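The changed assertions pick one call out of `m.call_args_list` instead of relying on `assert_called_once()`, because the patched `run_subprocess` may now be invoked more than once. A standalone sketch of that pattern with `unittest.mock` could look like this; the script path and the environment key are made up for illustration.

```python
from unittest import mock

run_subprocess = mock.Mock()

# Two unrelated invocations recorded on the same mock.
run_subprocess(["pdftotext", "-layout", "in.pdf", "-"], {})
run_subprocess(["/opt/hooks/pre-consume.sh"], {"EXAMPLE_VAR": "1"})

# Select the call whose command starts with the script we care about.
script_call = next(
    call
    for call in run_subprocess.call_args_list
    if call.args and call.args[0] and call.args[0][0] == "/opt/hooks/pre-consume.sh"
)
command, environment = script_call.args
```

If no call matches, `next()` raises `StopIteration`, so the test still fails when the script was never invoked.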

@@ -0,0 +1,189 @@
"""Tests for should_produce_archive()."""
from __future__ import annotations
from pathlib import Path
from typing import TYPE_CHECKING
from unittest.mock import MagicMock
import pytest
from documents.consumer import should_produce_archive
if TYPE_CHECKING:
from pytest_mock import MockerFixture
def _parser_instance(
*,
can_produce: bool = True,
requires_rendition: bool = False,
) -> MagicMock:
"""Return a mock parser instance with the given capability flags."""
instance = MagicMock()
instance.can_produce_archive = can_produce
instance.requires_pdf_rendition = requires_rendition
return instance
@pytest.fixture()
def null_app_config(mocker) -> MagicMock:
"""Mock ApplicationConfiguration with all fields None → falls back to Django settings."""
return mocker.MagicMock(
output_type=None,
pages=None,
language=None,
mode=None,
archive_file_generation=None,
image_dpi=None,
unpaper_clean=None,
deskew=None,
rotate_pages=None,
rotate_pages_threshold=None,
max_image_pixels=None,
color_conversion_strategy=None,
user_args=None,
)
@pytest.fixture(autouse=True)
def patch_app_config(mocker, null_app_config):
"""Patch BaseConfig._get_config_instance for all tests in this module."""
mocker.patch(
"paperless.config.BaseConfig._get_config_instance",
return_value=null_app_config,
)
class TestShouldProduceArchive:
@pytest.mark.parametrize(
("generation", "can_produce", "requires_rendition", "mime", "expected"),
[
pytest.param(
"never",
True,
False,
"application/pdf",
False,
id="never-returns-false",
),
pytest.param(
"always",
True,
False,
"application/pdf",
True,
id="always-returns-true",
),
pytest.param(
"never",
True,
True,
"application/pdf",
True,
id="requires-rendition-overrides-never",
),
pytest.param(
"always",
False,
False,
"text/plain",
False,
id="cannot-produce-overrides-always",
),
pytest.param(
"always",
False,
True,
"application/pdf",
True,
id="requires-rendition-wins-even-if-cannot-produce",
),
pytest.param(
"auto",
True,
False,
"image/tiff",
True,
id="auto-image-returns-true",
),
pytest.param(
"auto",
True,
False,
"message/rfc822",
False,
id="auto-non-pdf-non-image-returns-false",
),
],
)
def test_generation_setting(
self,
settings,
generation: str,
can_produce: bool, # noqa: FBT001
requires_rendition: bool, # noqa: FBT001
mime: str,
expected: bool, # noqa: FBT001
) -> None:
settings.ARCHIVE_FILE_GENERATION = generation
parser = _parser_instance(
can_produce=can_produce,
requires_rendition=requires_rendition,
)
assert should_produce_archive(parser, mime, Path("/tmp/doc")) is expected
@pytest.mark.parametrize(
("extracted_text", "expected"),
[
pytest.param(
"This is a born-digital PDF with lots of text content. " * 10,
False,
id="born-digital-long-text-skips-archive",
),
pytest.param(None, True, id="no-text-scanned-produces-archive"),
pytest.param("tiny", True, id="short-text-treated-as-scanned"),
],
)
def test_auto_pdf_archive_decision(
self,
mocker: MockerFixture,
settings,
extracted_text: str | None,
expected: bool, # noqa: FBT001
) -> None:
settings.ARCHIVE_FILE_GENERATION = "auto"
mocker.patch("documents.consumer.is_tagged_pdf", return_value=False)
mocker.patch("documents.consumer.extract_pdf_text", return_value=extracted_text)
parser = _parser_instance(can_produce=True, requires_rendition=False)
assert (
should_produce_archive(parser, "application/pdf", Path("/tmp/doc.pdf"))
is expected
)
def test_tagged_pdf_skips_archive_in_auto_mode(
self,
mocker: MockerFixture,
settings,
) -> None:
"""Tagged PDFs (e.g. Word exports) are treated as born-digital regardless of text length."""
settings.ARCHIVE_FILE_GENERATION = "auto"
mocker.patch("documents.consumer.is_tagged_pdf", return_value=True)
parser = _parser_instance(can_produce=True, requires_rendition=False)
assert (
should_produce_archive(parser, "application/pdf", Path("/tmp/doc.pdf"))
is False
)
def test_tagged_pdf_does_not_call_pdftotext(
self,
mocker: MockerFixture,
settings,
) -> None:
"""When a PDF is tagged, pdftotext is not invoked (fast path)."""
settings.ARCHIVE_FILE_GENERATION = "auto"
mocker.patch("documents.consumer.is_tagged_pdf", return_value=True)
mock_extract = mocker.patch("documents.consumer.extract_pdf_text")
parser = _parser_instance(can_produce=True, requires_rendition=False)
should_produce_archive(parser, "application/pdf", Path("/tmp/doc.pdf"))
mock_extract.assert_not_called()
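The parametrized cases above pin down a decision order without showing the function body. The sketch below is one implementation that is consistent with every case in this test module. It is not the project's `should_produce_archive()`: the signature is simplified, `ParserFlags` is a stand-in for the mocked parser, and `ASSUMED_TEXT_MIN_LENGTH` is a guess, since the real `PDF_TEXT_MIN_LENGTH` value does not appear in this diff.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed threshold; the real PDF_TEXT_MIN_LENGTH is not shown in this diff.
ASSUMED_TEXT_MIN_LENGTH = 50


@dataclass
class ParserFlags:
    can_produce_archive: bool = True
    requires_pdf_rendition: bool = False


def sketch_should_produce_archive(
    parser: ParserFlags,
    mime_type: str,
    generation: str,
    *,
    tagged: bool = False,
    extracted_text: str | None = None,
) -> bool:
    # A parser that needs a PDF rendition wins over every other rule.
    if parser.requires_pdf_rendition:
        return True
    if not parser.can_produce_archive:
        return False
    if generation == "never":
        return False
    if generation == "always":
        return True
    # Remaining case: generation == "auto".
    if mime_type.startswith("image/"):
        return True
    if mime_type == "application/pdf":
        # Tagged PDFs are treated as born-digital without extracting text.
        if tagged:
            return False
        # Little or no extractable text suggests a scanned document.
        return len(extracted_text or "") < ASSUMED_TEXT_MIN_LENGTH
    return False
```

Checking `tagged` before looking at the text matches the fast path that `test_tagged_pdf_does_not_call_pdftotext` asserts.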

@@ -27,7 +27,10 @@ sample_file: Path = Path(__file__).parent / "samples" / "simple.pdf"
@pytest.mark.management
@override_settings(FILENAME_FORMAT="{correspondent}/{title}")
@override_settings(
FILENAME_FORMAT="{correspondent}/{title}",
ARCHIVE_FILE_GENERATION="always",
)
class TestArchiver(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
def make_models(self):
return Document.objects.create(

@@ -213,6 +213,7 @@ class TestEmptyTrashTask(DirectoriesMixin, FileSystemAssertsMixin, TestCase):
self.assertEqual(Document.global_objects.count(), 0)
@override_settings(ARCHIVE_FILE_GENERATION="always")
class TestUpdateContent(DirectoriesMixin, TestCase):
def test_update_content_maybe_archive_file(self) -> None:
"""

@@ -165,7 +165,9 @@ from documents.permissions import ViewDocumentsPermissions
from documents.permissions import annotate_document_count_for_related_queryset
from documents.permissions import get_document_count_filter_for_user
from documents.permissions import get_objects_for_user_owner_aware
from documents.permissions import has_global_statistics_permission
from documents.permissions import has_perms_owner_aware
from documents.permissions import has_system_status_permission
from documents.permissions import set_permissions_for_object
from documents.plugins.date_parsing import get_date_parser
from documents.schema import generate_object_with_permissions_schema
@@ -3265,10 +3267,11 @@ class StatisticsView(GenericAPIView):
def get(self, request, format=None):
user = request.user if request.user is not None else None
can_view_global_stats = has_global_statistics_permission(user) or user is None
documents = (
Document.objects.all()
if user is None
if can_view_global_stats
else get_objects_for_user_owner_aware(
user,
"documents.view_document",
@@ -3277,12 +3280,12 @@ class StatisticsView(GenericAPIView):
)
tags = (
Tag.objects.all()
if user is None
if can_view_global_stats
else get_objects_for_user_owner_aware(user, "documents.view_tag", Tag)
).only("id", "is_inbox_tag")
correspondent_count = (
Correspondent.objects.count()
if user is None
if can_view_global_stats
else get_objects_for_user_owner_aware(
user,
"documents.view_correspondent",
@@ -3291,7 +3294,7 @@ class StatisticsView(GenericAPIView):
)
document_type_count = (
DocumentType.objects.count()
if user is None
if can_view_global_stats
else get_objects_for_user_owner_aware(
user,
"documents.view_documenttype",
@@ -3300,7 +3303,7 @@ class StatisticsView(GenericAPIView):
)
storage_path_count = (
StoragePath.objects.count()
if user is None
if can_view_global_stats
else get_objects_for_user_owner_aware(
user,
"documents.view_storagepath",
@@ -4257,7 +4260,7 @@ class SystemStatusView(PassUserMixin):
permission_classes = (IsAuthenticated,)
def get(self, request, format=None):
if not request.user.is_staff:
if not has_system_status_permission(request.user):
return HttpResponseForbidden("Insufficient permissions")
current_version = version.__full_version_str__
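The StatisticsView hunks replace each `if user is None` branch with a single `can_view_global_stats` flag. A framework-free sketch of that scoping rule follows. `SketchUser` and `SketchDoc` are illustrative stand-ins, and the owner filter is a deliberately simplified substitute for `get_objects_for_user_owner_aware()`, whose real behaviour is not shown in this diff.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchUser:
    name: str
    perms: set[str] = field(default_factory=set)


@dataclass
class SketchDoc:
    title: str
    owner: str


def can_view_global_stats(user: SketchUser | None) -> bool:
    # Mirrors `has_global_statistics_permission(user) or user is None`.
    return user is None or "view_global_statistics" in user.perms


def documents_for_stats(user: SketchUser | None, docs: list[SketchDoc]) -> list[SketchDoc]:
    if can_view_global_stats(user):
        return list(docs)
    # Simplified stand-in for the owner-aware queryset.
    return [d for d in docs if d.owner == user.name]


docs = [SketchDoc("owned-doc", owner="owner")]
stats_user = SketchUser("stats-user", {"view_global_statistics"})
other_user = SketchUser("other")
```

Here `documents_for_stats(stats_user, docs)` returns the document owned by someone else, while `documents_for_stats(other_user, docs)` returns nothing, which is the distinction `test_statistics_with_statistics_permission` exercises.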

@@ -2,7 +2,7 @@ msgid ""
msgstr ""
"Project-Id-Version: paperless-ngx\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-03 20:54+0000\n"
"POT-Creation-Date: 2026-04-08 15:41+0000\n"
"PO-Revision-Date: 2022-02-17 04:17\n"
"Last-Translator: \n"
"Language-Team: English\n"
@@ -1308,8 +1308,8 @@ msgid "workflow runs"
msgstr ""
#: documents/serialisers.py:463 documents/serialisers.py:815
#: documents/serialisers.py:2545 documents/views.py:2120
#: documents/views.py:2175 paperless_mail/serialisers.py:143
#: documents/serialisers.py:2545 documents/views.py:2122
#: documents/views.py:2177 paperless_mail/serialisers.py:143
msgid "Insufficient permissions."
msgstr ""
@@ -1349,7 +1349,7 @@ msgstr ""
msgid "Duplicate document identifiers are not allowed."
msgstr ""
#: documents/serialisers.py:2631 documents/views.py:3784
#: documents/serialisers.py:2631 documents/views.py:3787
#, python-format
msgid "Documents not found: %(ids)s"
msgstr ""
@@ -1617,28 +1617,28 @@ msgstr ""
msgid "Unable to parse URI {value}"
msgstr ""
#: documents/views.py:2077
#: documents/views.py:2079
msgid "Specify only one of text, title_search, query, or more_like_id."
msgstr ""
#: documents/views.py:2113 documents/views.py:2172
#: documents/views.py:2115 documents/views.py:2174
msgid "Invalid more_like_id"
msgstr ""
#: documents/views.py:3796
#: documents/views.py:3799
#, python-format
msgid "Insufficient permissions to share document %(id)s."
msgstr ""
#: documents/views.py:3839
#: documents/views.py:3842
msgid "Bundle is already being processed."
msgstr ""
#: documents/views.py:3896
#: documents/views.py:3899
msgid "The share link bundle is still being prepared. Please try again later."
msgstr ""
#: documents/views.py:3906
#: documents/views.py:3909
msgid "The share link bundle is unavailable."
msgstr ""
@@ -1666,32 +1666,28 @@ msgstr ""
msgid "pdfa-3"
msgstr ""
#: paperless/models.py:39
msgid "skip"
#: paperless/models.py:39 paperless/models.py:50
msgid "auto"
msgstr ""
#: paperless/models.py:40
msgid "redo"
msgstr ""
#: paperless/models.py:41
msgid "force"
msgstr ""
#: paperless/models.py:42
msgid "skip_noarchive"
#: paperless/models.py:41
msgid "redo"
msgstr ""
#: paperless/models.py:50
msgid "never"
#: paperless/models.py:42
msgid "off"
msgstr ""
#: paperless/models.py:51
msgid "with_text"
msgid "always"
msgstr ""
#: paperless/models.py:52
msgid "always"
msgid "never"
msgstr ""
#: paperless/models.py:60
@@ -1755,7 +1751,7 @@ msgid "Sets the OCR mode"
msgstr ""
#: paperless/models.py:130
msgid "Controls the generation of an archive file"
msgid "Controls archive file generation"
msgstr ""
#: paperless/models.py:138

@@ -5,6 +5,7 @@ import shutil
import stat
import subprocess
from pathlib import Path
from typing import Any
from django.conf import settings
from django.core.checks import Error
@@ -22,7 +23,7 @@ writeable_hint = (
)
def path_check(var, directory: Path) -> list[Error]:
def path_check(var: str, directory: Path) -> list[Error]:
messages: list[Error] = []
if directory:
if not directory.is_dir():
@@ -59,7 +60,7 @@ def path_check(var, directory: Path) -> list[Error]:
@register()
def paths_check(app_configs, **kwargs) -> list[Error]:
def paths_check(app_configs: Any, **kwargs: Any) -> list[Error]:
"""
Check the various paths for existence, readability and writeability
"""
@@ -73,7 +74,7 @@ def paths_check(app_configs, **kwargs) -> list[Error]:
@register()
def binaries_check(app_configs, **kwargs):
def binaries_check(app_configs: Any, **kwargs: Any) -> list[Error]:
"""
Paperless requires the existence of a few binaries, so we do some checks
for those here.
@@ -93,7 +94,7 @@ def binaries_check(app_configs, **kwargs):
@register()
def debug_mode_check(app_configs, **kwargs):
def debug_mode_check(app_configs: Any, **kwargs: Any) -> list[Warning]:
if settings.DEBUG:
return [
Warning(
@@ -109,7 +110,7 @@ def debug_mode_check(app_configs, **kwargs):
@register()
def settings_values_check(app_configs, **kwargs):
def settings_values_check(app_configs: Any, **kwargs: Any) -> list[Error | Warning]:
"""
Validates at least some of the user provided settings
"""
@@ -132,23 +133,14 @@ def settings_values_check(app_configs, **kwargs):
Error(f'OCR output type "{settings.OCR_OUTPUT_TYPE}" is not valid'),
)
if settings.OCR_MODE not in {"force", "skip", "redo", "skip_noarchive"}:
if settings.OCR_MODE not in {"auto", "force", "redo", "off"}:
msgs.append(Error(f'OCR output mode "{settings.OCR_MODE}" is not valid'))
if settings.OCR_MODE == "skip_noarchive":
msgs.append(
Warning(
'OCR output mode "skip_noarchive" is deprecated and will be '
"removed in a future version. Please use "
"PAPERLESS_OCR_SKIP_ARCHIVE_FILE instead.",
),
)
if settings.OCR_SKIP_ARCHIVE_FILE not in {"never", "with_text", "always"}:
if settings.ARCHIVE_FILE_GENERATION not in {"auto", "always", "never"}:
msgs.append(
Error(
"OCR_SKIP_ARCHIVE_FILE setting "
f'"{settings.OCR_SKIP_ARCHIVE_FILE}" is not valid',
"PAPERLESS_ARCHIVE_FILE_GENERATION setting "
f'"{settings.ARCHIVE_FILE_GENERATION}" is not valid',
),
)
@@ -191,7 +183,7 @@ def settings_values_check(app_configs, **kwargs):
@register()
def audit_log_check(app_configs, **kwargs):
def audit_log_check(app_configs: Any, **kwargs: Any) -> list[Error]:
db_conn = connections["default"]
all_tables = db_conn.introspection.table_names()
result = []
@@ -303,7 +295,42 @@ def check_deprecated_db_settings(
@register()
def check_remote_parser_configured(app_configs, **kwargs) -> list[Error]:
def check_deprecated_v2_ocr_env_vars(
app_configs: object,
**kwargs: object,
) -> list[Warning]:
"""Warn when deprecated v2 OCR environment variables are set.
Users upgrading from v2 may still have these in their environment or
config files, where they are now silently ignored.
"""
warnings: list[Warning] = []
if os.environ.get("PAPERLESS_OCR_SKIP_ARCHIVE_FILE"):
warnings.append(
Warning(
"PAPERLESS_OCR_SKIP_ARCHIVE_FILE is set but has no effect. "
"Use PAPERLESS_ARCHIVE_FILE_GENERATION=never/always/auto instead.",
id="paperless.W002",
),
)
ocr_mode = os.environ.get("PAPERLESS_OCR_MODE", "")
if ocr_mode in {"skip", "skip_noarchive"}:
warnings.append(
Warning(
f"PAPERLESS_OCR_MODE={ocr_mode!r} is not a valid value. "
"Use PAPERLESS_OCR_MODE=auto (and PAPERLESS_ARCHIVE_FILE_GENERATION=never "
"if you used skip_noarchive) instead.",
id="paperless.W003",
),
)
return warnings
@register()
def check_remote_parser_configured(app_configs: Any, **kwargs: Any) -> list[Error]:
if settings.REMOTE_OCR_ENGINE == "azureai" and not (
settings.REMOTE_OCR_ENDPOINT and settings.REMOTE_OCR_API_KEY
):
@@ -329,7 +356,7 @@ def get_tesseract_langs():
@register()
def check_default_language_available(app_configs, **kwargs):
def check_default_language_available(app_configs: Any, **kwargs: Any) -> list[Error]:
errs = []
if not settings.OCR_LANGUAGE:
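`check_deprecated_v2_ocr_env_vars` reads `os.environ` directly. For experimenting with the same rules outside Django, the logic can be sketched against any mapping. `deprecated_ocr_env_warnings` is a hypothetical name, and it returns plain strings rather than Django `Warning` objects.

```python
from __future__ import annotations

from collections.abc import Mapping


def deprecated_ocr_env_warnings(environ: Mapping[str, str]) -> list[str]:
    """Return one message per deprecated v2 OCR variable found in `environ`."""
    messages: list[str] = []
    # An empty value counts as unset, as in the check above.
    if environ.get("PAPERLESS_OCR_SKIP_ARCHIVE_FILE"):
        messages.append("PAPERLESS_OCR_SKIP_ARCHIVE_FILE is set but has no effect.")
    ocr_mode = environ.get("PAPERLESS_OCR_MODE", "")
    if ocr_mode in {"skip", "skip_noarchive"}:
        messages.append(f"PAPERLESS_OCR_MODE={ocr_mode!r} is not a valid value.")
    return messages
```

Passing a plain dict, for example `{"PAPERLESS_OCR_MODE": "skip"}`, avoids having to patch the process environment in a test.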

@@ -4,6 +4,11 @@ import json
from django.conf import settings
from paperless.models import ApplicationConfiguration
from paperless.models import ArchiveFileGenerationChoices
from paperless.models import CleanChoices
from paperless.models import ColorConvertChoices
from paperless.models import ModeChoices
from paperless.models import OutputTypeChoices
@dataclasses.dataclass
@@ -28,7 +33,7 @@ class OutputTypeConfig(BaseConfig):
Almost all parsers care about the chosen PDF output format
"""
output_type: str = dataclasses.field(init=False)
output_type: OutputTypeChoices = dataclasses.field(init=False)
def __post_init__(self) -> None:
app_config = self._get_config_instance()
@@ -45,15 +50,17 @@ class OcrConfig(OutputTypeConfig):
pages: int | None = dataclasses.field(init=False)
language: str = dataclasses.field(init=False)
mode: str = dataclasses.field(init=False)
skip_archive_file: str = dataclasses.field(init=False)
mode: ModeChoices = dataclasses.field(init=False)
archive_file_generation: ArchiveFileGenerationChoices = dataclasses.field(
init=False,
)
image_dpi: int | None = dataclasses.field(init=False)
clean: str = dataclasses.field(init=False)
clean: CleanChoices = dataclasses.field(init=False)
deskew: bool = dataclasses.field(init=False)
rotate: bool = dataclasses.field(init=False)
rotate_threshold: float = dataclasses.field(init=False)
max_image_pixel: float | None = dataclasses.field(init=False)
color_conversion_strategy: str = dataclasses.field(init=False)
color_conversion_strategy: ColorConvertChoices = dataclasses.field(init=False)
user_args: dict[str, str] | None = dataclasses.field(init=False)
def __post_init__(self) -> None:
@@ -64,8 +71,8 @@ class OcrConfig(OutputTypeConfig):
self.pages = app_config.pages or settings.OCR_PAGES
self.language = app_config.language or settings.OCR_LANGUAGE
self.mode = app_config.mode or settings.OCR_MODE
self.skip_archive_file = (
app_config.skip_archive_file or settings.OCR_SKIP_ARCHIVE_FILE
self.archive_file_generation = (
app_config.archive_file_generation or settings.ARCHIVE_FILE_GENERATION
)
self.image_dpi = app_config.image_dpi or settings.OCR_IMAGE_DPI
self.clean = app_config.unpaper_clean or settings.OCR_CLEAN
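`OcrConfig.__post_init__` resolves each field as `app_config.<field> or settings.<NAME>`. The sketch below isolates that fallback and contrasts it with an explicit `None` check. Both function names are hypothetical, and the second variant is shown only for comparison, not as something this PR does.

```python
def resolve_with_or(db_value, settings_value):
    # Any falsy database value (None, "", 0, False) falls back to settings.
    return db_value or settings_value


def resolve_if_set(db_value, settings_value):
    # Only a missing (None) database value falls back to settings.
    return settings_value if db_value is None else db_value
```

For the string choices added here (`"auto"`, `"always"`, `"never"`) both variants behave the same, since every valid choice is a non-empty string.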

@@ -0,0 +1,90 @@
# Generated by Django 5.2.12 on 2026-03-26 20:31
from django.db import migrations
from django.db import models
_MODE_MAP = {
"skip": "auto",
"redo": "redo",
"force": "force",
"skip_noarchive": "auto",
}
_ARCHIVE_MAP = {
# never skip -> always generate
"never": "always",
# skip when text present -> auto
"with_text": "auto",
# always skip -> never generate
"always": "never",
}
def migrate_old_values(apps, schema_editor):
ApplicationConfiguration = apps.get_model("paperless", "ApplicationConfiguration")
for config in ApplicationConfiguration.objects.all():
old_mode = config.mode
old_skip = config.skip_archive_file
# Map the old mode value
if old_mode in _MODE_MAP:
config.mode = _MODE_MAP[old_mode]
# Map skip_archive_file -> archive_file_generation
if old_skip in _ARCHIVE_MAP:
config.archive_file_generation = _ARCHIVE_MAP[old_skip]
# skip_noarchive implied no archive file; set that if the user
# didn't already have an explicit skip_archive_file preference
if old_mode == "skip_noarchive" and old_skip is None:
config.archive_file_generation = "never"
config.save()
class Migration(migrations.Migration):
dependencies = [
("paperless", "0007_optimize_integer_field_sizes"),
]
operations = [
# 1. Update mode choices in-place (old values still in the column)
migrations.AlterField(
model_name="applicationconfiguration",
name="mode",
field=models.CharField(
blank=True,
choices=[
("auto", "auto"),
("force", "force"),
("redo", "redo"),
("off", "off"),
],
max_length=16,
null=True,
verbose_name="Sets the OCR mode",
),
),
# 2. Add the new field
migrations.AddField(
model_name="applicationconfiguration",
name="archive_file_generation",
field=models.CharField(
blank=True,
choices=[("auto", "auto"), ("always", "always"), ("never", "never")],
max_length=8,
null=True,
verbose_name="Controls archive file generation",
),
),
# 3. Migrate data from old values to new
migrations.RunPython(
migrate_old_values,
migrations.RunPython.noop,
),
# 4. Drop the old field
migrations.RemoveField(
model_name="applicationconfiguration",
name="skip_archive_file",
),
]
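The data step of this migration can be checked without a database by expressing it as a pure function over the two old column values. The maps are copied from the migration above; `migrate_row` is a hypothetical helper written for illustration.

```python
from __future__ import annotations

_MODE_MAP = {
    "skip": "auto",
    "redo": "redo",
    "force": "force",
    "skip_noarchive": "auto",
}

_ARCHIVE_MAP = {
    "never": "always",    # never skip -> always generate
    "with_text": "auto",  # skip when text present -> auto
    "always": "never",    # always skip -> never generate
}


def migrate_row(
    mode: str | None,
    skip_archive_file: str | None,
) -> tuple[str | None, str | None]:
    """Return (new mode, new archive_file_generation) for one config row."""
    # Unknown or unset modes are left untouched.
    new_mode = _MODE_MAP.get(mode, mode)
    # An unset or unknown old value leaves the new field as None.
    new_archive = _ARCHIVE_MAP.get(skip_archive_file)
    # skip_noarchive implied "no archive" unless the user had set an
    # explicit skip_archive_file preference.
    if mode == "skip_noarchive" and skip_archive_file is None:
        new_archive = "never"
    return new_mode, new_archive
```

Note the inversion: the old field described when to skip the archive, the new one describes when to generate it, so `"never"` and `"always"` swap meaning.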

@@ -0,0 +1,22 @@
# Generated by Django 5.2.12 on 2026-04-07 23:13
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
("paperless", "0008_replace_skip_archive_file"),
]
operations = [
migrations.AlterModelOptions(
name="applicationconfiguration",
options={
"permissions": [
("view_global_statistics", "Can view global object counts"),
("view_system_status", "Can view system status information"),
],
"verbose_name": "paperless application settings",
},
),
]

@@ -36,20 +36,20 @@ class ModeChoices(models.TextChoices):
and our own custom setting
"""
SKIP = ("skip", _("skip"))
REDO = ("redo", _("redo"))
AUTO = ("auto", _("auto"))
FORCE = ("force", _("force"))
SKIP_NO_ARCHIVE = ("skip_noarchive", _("skip_noarchive"))
REDO = ("redo", _("redo"))
OFF = ("off", _("off"))
class ArchiveFileChoices(models.TextChoices):
class ArchiveFileGenerationChoices(models.TextChoices):
"""
Settings to control creation of an archive PDF file
"""
NEVER = ("never", _("never"))
WITH_TEXT = ("with_text", _("with_text"))
AUTO = ("auto", _("auto"))
ALWAYS = ("always", _("always"))
NEVER = ("never", _("never"))
class CleanChoices(models.TextChoices):
@@ -126,12 +126,12 @@ class ApplicationConfiguration(AbstractSingletonModel):
choices=ModeChoices.choices,
)
skip_archive_file = models.CharField(
verbose_name=_("Controls the generation of an archive file"),
archive_file_generation = models.CharField(
verbose_name=_("Controls archive file generation"),
null=True,
blank=True,
max_length=16,
choices=ArchiveFileChoices.choices,
max_length=8,
choices=ArchiveFileGenerationChoices.choices,
)
image_dpi = models.PositiveSmallIntegerField(
@@ -341,6 +341,10 @@ class ApplicationConfiguration(AbstractSingletonModel):
class Meta:
verbose_name = _("paperless application settings")
permissions = [
("view_global_statistics", "Can view global object counts"),
("view_system_status", "Can view system status information"),
]
def __str__(self) -> str: # pragma: no cover
return "ApplicationConfiguration"

@@ -1,5 +1,6 @@
from __future__ import annotations
import importlib.resources
import logging
import os
import re
@@ -8,6 +9,8 @@ import tempfile
from pathlib import Path
from typing import TYPE_CHECKING
from typing import Any
from typing import Final
from typing import NoReturn
from typing import Self
from django.conf import settings
@@ -15,12 +18,16 @@ from PIL import Image
from documents.parsers import ParseError
from documents.parsers import make_thumbnail_from_pdf
from documents.utils import copy_file_with_basic_stats
from documents.utils import maybe_override_pixel_limit
from documents.utils import run_subprocess
from paperless.config import OcrConfig
from paperless.models import ArchiveFileChoices
from paperless.models import CleanChoices
from paperless.models import ModeChoices
from paperless.models import OutputTypeChoices
from paperless.parsers.utils import PDF_TEXT_MIN_LENGTH
from paperless.parsers.utils import extract_pdf_text
from paperless.parsers.utils import is_tagged_pdf
from paperless.parsers.utils import read_file_handle_unicode_errors
from paperless.version import __full_version_str__
@@ -33,7 +40,11 @@ if TYPE_CHECKING:
logger = logging.getLogger("paperless.parsing.tesseract")
_SUPPORTED_MIME_TYPES: dict[str, str] = {
_SRGB_ICC_DATA: Final[bytes] = (
importlib.resources.files("ocrmypdf.data").joinpath("sRGB.icc").read_bytes()
)
_SUPPORTED_MIME_TYPES: Final[dict[str, str]] = {
"application/pdf": ".pdf",
"image/jpeg": ".jpg",
"image/png": ".png",
@@ -99,7 +110,7 @@ class RasterisedDocumentParser:
# Lifecycle
# ------------------------------------------------------------------
def __init__(self, logging_group: object = None) -> None:
def __init__(self, logging_group: object | None = None) -> None:
settings.SCRATCH_DIR.mkdir(parents=True, exist_ok=True)
self.tempdir = Path(
tempfile.mkdtemp(prefix="paperless-", dir=settings.SCRATCH_DIR),
@@ -233,7 +244,7 @@ class RasterisedDocumentParser:
if (
sidecar_file is not None
and sidecar_file.is_file()
and self.settings.mode != "redo"
and self.settings.mode != ModeChoices.REDO
):
text = read_file_handle_unicode_errors(sidecar_file)
@@ -250,36 +261,7 @@ class RasterisedDocumentParser:
if not Path(pdf_file).is_file():
return None
try:
text = None
with tempfile.NamedTemporaryFile(
mode="w+",
dir=self.tempdir,
) as tmp:
run_subprocess(
[
"pdftotext",
"-q",
"-layout",
"-enc",
"UTF-8",
str(pdf_file),
tmp.name,
],
logger=self.log,
)
text = read_file_handle_unicode_errors(Path(tmp.name))
return post_process_text(text)
except Exception:
# If pdftotext fails, fall back to OCR.
self.log.warning(
"Error while getting text from PDF document with pdftotext",
exc_info=True,
)
# probably not a PDF file.
return None
return post_process_text(extract_pdf_text(Path(pdf_file), log=self.log))
def construct_ocrmypdf_parameters(
self,
@@ -289,6 +271,7 @@ class RasterisedDocumentParser:
sidecar_file: Path,
*,
safe_fallback: bool = False,
skip_text: bool = False,
) -> dict[str, Any]:
ocrmypdf_args: dict[str, Any] = {
"input_file_or_options": input_file,
@@ -307,15 +290,14 @@ class RasterisedDocumentParser:
self.settings.color_conversion_strategy
)
if self.settings.mode == ModeChoices.FORCE or safe_fallback:
if safe_fallback or self.settings.mode == ModeChoices.FORCE:
ocrmypdf_args["force_ocr"] = True
elif self.settings.mode in {
ModeChoices.SKIP,
ModeChoices.SKIP_NO_ARCHIVE,
}:
ocrmypdf_args["skip_text"] = True
elif self.settings.mode == ModeChoices.REDO:
ocrmypdf_args["redo_ocr"] = True
elif skip_text or self.settings.mode == ModeChoices.OFF:
ocrmypdf_args["skip_text"] = True
elif self.settings.mode == ModeChoices.AUTO:
pass # no extra flag: normal OCR (text not found case)
else: # pragma: no cover
raise ParseError(f"Invalid ocr mode: {self.settings.mode}")
@@ -400,6 +382,115 @@ class RasterisedDocumentParser:
return ocrmypdf_args
def _convert_image_to_pdfa(self, document_path: Path) -> Path:
"""Convert an image to a PDF/A-2b file without invoking the OCR engine.
Uses img2pdf for the initial image->PDF wrapping, then pikepdf to stamp
PDF/A-2b conformance metadata.
No Tesseract and no Ghostscript are invoked.
"""
import img2pdf
import pikepdf
plain_pdf_path = Path(self.tempdir) / "image_plain.pdf"
try:
convert_kwargs: dict = {}
if self.settings.image_dpi is not None:
convert_kwargs["layout_fun"] = img2pdf.get_fixed_dpi_layout_fun(
(self.settings.image_dpi, self.settings.image_dpi),
)
plain_pdf_path.write_bytes(
img2pdf.convert(str(document_path), **convert_kwargs),
)
except Exception as e:
raise ParseError(
f"img2pdf conversion failed for {document_path}: {e!s}",
) from e
pdfa_path = Path(self.tempdir) / "archive.pdf"
try:
with pikepdf.open(plain_pdf_path) as pdf:
cs = pdf.make_stream(_SRGB_ICC_DATA)
cs["/N"] = 3
output_intent = pikepdf.Dictionary(
Type=pikepdf.Name("/OutputIntent"),
S=pikepdf.Name("/GTS_PDFA1"),
OutputConditionIdentifier=pikepdf.String("sRGB"),
DestOutputProfile=cs,
)
pdf.Root["/OutputIntents"] = pdf.make_indirect(
pikepdf.Array([output_intent]),
)
meta = pdf.open_metadata(set_pikepdf_as_editor=False)
meta["pdfaid:part"] = "2"
meta["pdfaid:conformance"] = "B"
pdf.save(pdfa_path)
except Exception as e:
self.log.warning(
f"PDF/A metadata stamping failed ({e!s}); falling back to plain PDF.",
)
pdfa_path.write_bytes(plain_pdf_path.read_bytes())
return pdfa_path
def _convert_pdf_to_pdfa(
self,
input_path: Path,
output_path: Path,
) -> None:
"""Convert a PDF to PDF/A using Ghostscript directly, without OCR.
Respects the user's output_type, color_conversion_strategy, and
continue_on_soft_render_error settings.
"""
from ocrmypdf._exec.ghostscript import generate_pdfa
from ocrmypdf.pdfa import generate_pdfa_ps
output_type = self.settings.output_type
if output_type == OutputTypeChoices.PDF:
# No PDF/A requested — just copy the original
copy_file_with_basic_stats(input_path, output_path)
return
# Map output_type to pdfa_part: pdfa→2, pdfa-1→1, pdfa-2→2, pdfa-3→3
pdfa_part = "2" if output_type == "pdfa" else output_type.split("-")[-1]
pdfmark = Path(self.tempdir) / "pdfa.ps"
generate_pdfa_ps(pdfmark)
color_strategy = self.settings.color_conversion_strategy or "RGB"
self.log.debug(
"Converting PDF to PDF/A-%s via Ghostscript (no OCR): %s",
pdfa_part,
input_path,
)
generate_pdfa(
pdf_pages=[pdfmark, input_path],
output_file=output_path,
compression="auto",
color_conversion_strategy=color_strategy,
pdfa_part=pdfa_part,
)
def _handle_subprocess_output_error(self, e: Exception) -> NoReturn:
"""Log context for Ghostscript failures and raise ParseError.
Called from the SubprocessOutputError handlers in parse() to avoid
duplicating the Ghostscript hint and re-raise logic.
"""
if "Ghostscript PDF/A rendering" in str(e):
self.log.warning(
"Ghostscript PDF/A rendering failed, consider setting "
"PAPERLESS_OCR_USER_ARGS: "
"'{\"continue_on_soft_render_error\": true}'",
)
raise ParseError(
f"SubprocessOutputError: {e!s}. See logs for more information.",
) from e
def parse(
self,
document_path: Path,
@@ -409,57 +500,107 @@ class RasterisedDocumentParser:
) -> None:
# This forces tesseract to use one core per page.
os.environ["OMP_THREAD_LIMIT"] = "1"
VALID_TEXT_LENGTH = 50
if mime_type == "application/pdf":
text_original = self.extract_text(None, document_path)
original_has_text = (
text_original is not None and len(text_original) > VALID_TEXT_LENGTH
)
else:
text_original = None
original_has_text = False
# If the original has text, and the user doesn't want an archive,
# we're done here
skip_archive_for_text = (
self.settings.mode == ModeChoices.SKIP_NO_ARCHIVE
or self.settings.skip_archive_file
in {
ArchiveFileChoices.WITH_TEXT,
ArchiveFileChoices.ALWAYS,
}
)
if skip_archive_for_text and original_has_text:
self.log.debug("Document has text, skipping OCRmyPDF entirely.")
self.text = text_original
return
# Either no text was in the original or there should be an archive
# file created, so OCR the file and create an archive with any
# text located via OCR
import ocrmypdf
from ocrmypdf import EncryptedPdfError
from ocrmypdf import InputFileError
from ocrmypdf import SubprocessOutputError
from ocrmypdf.exceptions import DigitalSignatureError
from ocrmypdf.exceptions import PriorOcrFoundError
if mime_type == "application/pdf":
text_original = self.extract_text(None, document_path)
original_has_text = is_tagged_pdf(document_path, log=self.log) or (
text_original is not None and len(text_original) > PDF_TEXT_MIN_LENGTH
)
else:
text_original = None
original_has_text = False
self.log.debug(
"Text detection: original_has_text=%s (text_length=%d, mode=%s, produce_archive=%s)",
original_has_text,
len(text_original) if text_original else 0,
self.settings.mode,
produce_archive,
)
# --- OCR_MODE=off: never invoke OCR engine ---
if self.settings.mode == ModeChoices.OFF:
if not produce_archive:
self.log.debug(
"OCR: skipped — OCR_MODE=off, no archive requested;"
" returning pdftotext content only",
)
self.text = text_original or ""
return
if self.is_image(mime_type):
self.log.debug(
"OCR: skipped — OCR_MODE=off, image input;"
" converting to PDF/A without OCR",
)
try:
self.archive_path = self._convert_image_to_pdfa(
document_path,
)
self.text = ""
except Exception as e:
raise ParseError(
f"Image to PDF/A conversion failed: {e!s}",
) from e
return
# PDFs in off mode: PDF/A conversion via Ghostscript, no OCR
archive_path = Path(self.tempdir) / "archive.pdf"
try:
self._convert_pdf_to_pdfa(document_path, archive_path)
self.archive_path = archive_path
self.text = text_original or ""
except SubprocessOutputError as e:
self._handle_subprocess_output_error(e)
except Exception as e:
raise ParseError(f"{e.__class__.__name__}: {e!s}") from e
return
# --- OCR_MODE=auto: skip ocrmypdf entirely if text exists and no archive needed ---
if (
self.settings.mode == ModeChoices.AUTO
and original_has_text
and not produce_archive
):
self.log.debug(
"Document has text and no archive requested; skipping OCRmyPDF entirely.",
)
self.text = text_original
return
# --- All other paths: run ocrmypdf ---
archive_path = Path(self.tempdir) / "archive.pdf"
sidecar_file = Path(self.tempdir) / "sidecar.txt"
# auto mode with existing text: PDF/A conversion only (no OCR).
skip_text = self.settings.mode == ModeChoices.AUTO and original_has_text
if skip_text:
self.log.debug(
"OCR strategy: PDF/A conversion only (skip_text)"
" — OCR_MODE=auto, document already has text",
)
else:
self.log.debug("OCR strategy: full OCR — OCR_MODE=%s", self.settings.mode)
args = self.construct_ocrmypdf_parameters(
document_path,
mime_type,
archive_path,
sidecar_file,
skip_text=skip_text,
)
try:
self.log.debug(f"Calling OCRmyPDF with args: {args}")
ocrmypdf.ocr(**args)
if self.settings.skip_archive_file != ArchiveFileChoices.ALWAYS:
if produce_archive:
self.archive_path = archive_path
self.text = self.extract_text(sidecar_file, archive_path)
@@ -474,16 +615,8 @@ class RasterisedDocumentParser:
if original_has_text:
self.text = text_original
except SubprocessOutputError as e:
if "Ghostscript PDF/A rendering" in str(e):
self.log.warning(
"Ghostscript PDF/A rendering failed, consider setting "
"PAPERLESS_OCR_USER_ARGS: '{\"continue_on_soft_render_error\": true}'",
)
raise ParseError(
f"SubprocessOutputError: {e!s}. See logs for more information.",
) from e
except (NoTextFoundException, InputFileError) as e:
self._handle_subprocess_output_error(e)
except (NoTextFoundException, InputFileError, PriorOcrFoundError) as e:
self.log.warning(
f"Encountered an error while running OCR: {e!s}. "
f"Attempting force OCR to get the text.",
@@ -492,8 +625,6 @@ class RasterisedDocumentParser:
archive_path_fallback = Path(self.tempdir) / "archive-fallback.pdf"
sidecar_file_fallback = Path(self.tempdir) / "sidecar-fallback.txt"
# Attempt to run OCR with safe settings.
args = self.construct_ocrmypdf_parameters(
document_path,
mime_type,
@@ -505,25 +636,18 @@ class RasterisedDocumentParser:
try:
self.log.debug(f"Fallback: Calling OCRmyPDF with args: {args}")
ocrmypdf.ocr(**args)
# Don't return the archived file here, since this file
# is bigger and blurry due to --force-ocr.
self.text = self.extract_text(
sidecar_file_fallback,
archive_path_fallback,
)
if produce_archive:
self.archive_path = archive_path_fallback
except Exception as e:
# If this fails, we have a serious issue at hand.
raise ParseError(f"{e.__class__.__name__}: {e!s}") from e
except Exception as e:
# Anything else is probably serious.
raise ParseError(f"{e.__class__.__name__}: {e!s}") from e
# As a last resort, if we still don't have any text for any reason,
# try to extract the text from the original document.
if not self.text:
if original_has_text:
self.text = text_original
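
The branching that `parse()` gains in this change can be summarised as a small decision function. This is a minimal sketch for illustration only, not code from the commit: `choose_strategy` and the returned labels are hypothetical names, and the force-OCR fallback taken after an OCRmyPDF error is left out.

```python
from __future__ import annotations


def choose_strategy(
    mode: str,
    *,
    is_image: bool,
    original_has_text: bool,
    produce_archive: bool,
) -> str:
    """Return a label for the path parse() appears to take (hypothetical helper)."""
    if mode == "off":
        # OCR engine is never invoked in this mode.
        if not produce_archive:
            return "text_only"  # pdftotext content only, no archive
        if is_image:
            return "img2pdf_pdfa"  # img2pdf wrapping plus pikepdf stamping
        return "ghostscript_pdfa"  # Ghostscript PDF/A conversion
    if mode == "auto" and original_has_text:
        if not produce_archive:
            return "text_only"  # early return, OCRmyPDF is skipped
        return "ocrmypdf_skip_text"  # PDF/A conversion only
    if mode == "force":
        return "ocrmypdf_force_ocr"
    if mode == "redo":
        return "ocrmypdf_redo_ocr"
    if mode == "auto":
        return "ocrmypdf_default"  # no extra flag, normal OCR
    raise ValueError(f"Invalid ocr mode: {mode}")


if __name__ == "__main__":
    print(
        choose_strategy(
            "auto",
            is_image=False,
            original_has_text=True,
            produce_archive=True,
        ),
    )
```

For images `original_has_text` is always false in the diff, so in `auto` mode an image always reaches the normal OCR path.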


@@ -10,15 +10,105 @@ from __future__ import annotations
import logging
import re
import tempfile
from pathlib import Path
from typing import TYPE_CHECKING
from typing import Final
if TYPE_CHECKING:
from pathlib import Path
from paperless.parsers import MetadataEntry
logger = logging.getLogger("paperless.parsers.utils")
# Minimum character count for a PDF to be considered "born-digital" (has real text).
# Used by both the consumer (archive decision) and the tesseract parser (skip-OCR decision).
PDF_TEXT_MIN_LENGTH: Final[int] = 50
def is_tagged_pdf(
path: Path,
log: logging.Logger | None = None,
) -> bool:
"""Return True if the PDF declares itself as tagged (born-digital indicator).
Tagged PDFs (e.g. exported from Word or LibreOffice) have ``/MarkInfo``
with ``/Marked true`` in the document root. This is a reliable signal
that the document has a logical structure and embedded text — running OCR
on it is unnecessary and archive generation can be skipped.
https://github.com/ocrmypdf/OCRmyPDF/blob/4e974ebd465a5921b2e79004f098f5d203010282/src/ocrmypdf/pdfinfo/info.py#L449
Parameters
----------
path:
Absolute path to the PDF file.
log:
Logger for warnings. Falls back to the module-level logger when omitted.
Returns
-------
bool
``True`` when the PDF is tagged, ``False`` otherwise or on any error.
"""
import pikepdf
_log = log or logger
try:
with pikepdf.open(path) as pdf:
mark_info = pdf.Root.get("/MarkInfo")
if mark_info is None:
return False
return bool(mark_info.get("/Marked", False))
except Exception:
_log.warning("Could not check PDF tag status for %s", path, exc_info=True)
return False
def extract_pdf_text(
path: Path,
log: logging.Logger | None = None,
) -> str | None:
"""Run pdftotext on *path* and return the extracted text, or None on failure.
Parameters
----------
path:
Absolute path to the PDF file.
log:
Logger for warnings. Falls back to the module-level logger when omitted.
Returns
-------
str | None
Extracted text, or ``None`` if pdftotext fails or the file is not a PDF.
"""
from documents.utils import run_subprocess
_log = log or logger
try:
with tempfile.TemporaryDirectory() as tmpdir:
out_path = Path(tmpdir) / "text.txt"
run_subprocess(
[
"pdftotext",
"-q",
"-layout",
"-enc",
"UTF-8",
str(path),
str(out_path),
],
logger=_log,
)
text = read_file_handle_unicode_errors(out_path, log=_log)
return text or None
except Exception:
_log.warning(
"Error while getting text from PDF document with pdftotext",
exc_info=True,
)
return None
def read_file_handle_unicode_errors(
filepath: Path,
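
The way the parser combines `is_tagged_pdf` with the extracted text length can be illustrated without pikepdf or pdftotext. The sketch below is an assumption-labelled stand-in: `has_usable_text` is a hypothetical name, and the `tagged` argument stands for the result of `is_tagged_pdf`. Only the threshold value and the strict comparison are taken from the diff.

```python
from __future__ import annotations

# Same value as PDF_TEXT_MIN_LENGTH in the diff.
PDF_TEXT_MIN_LENGTH = 50


def has_usable_text(text: str | None, *, tagged: bool) -> bool:
    """Mirror the original_has_text check in parse() (hypothetical helper).

    A tagged PDF counts as having text regardless of length; otherwise the
    extracted text must be strictly longer than the threshold.
    """
    return tagged or (text is not None and len(text) > PDF_TEXT_MIN_LENGTH)


if __name__ == "__main__":
    print(has_usable_text("x" * 50, tagged=False))  # exactly at the threshold
    print(has_usable_text("x" * 51, tagged=False))
```

Because `extract_pdf_text` returns `None` for empty output, an untagged PDF with no extractable text never counts as having text.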


@@ -1,4 +1,5 @@
import logging
from io import BytesIO
import magic
from allauth.mfa.adapter import get_adapter as get_mfa_adapter
@@ -11,13 +12,16 @@ from django.contrib.auth.models import Group
from django.contrib.auth.models import Permission
from django.contrib.auth.models import User
from django.contrib.auth.password_validation import validate_password
from django.core.files.uploadedfile import InMemoryUploadedFile
from django.core.files.uploadedfile import UploadedFile
from PIL import Image
from rest_framework import serializers
from rest_framework.authtoken.serializers import AuthTokenSerializer
from paperless.models import ApplicationConfiguration
from paperless.network import validate_outbound_http_url
from paperless.validators import reject_dangerous_svg
from paperless.validators import validate_raster_image
from paperless_mail.serialisers import ObfuscatedPasswordField
logger = logging.getLogger("paperless.settings")
@@ -233,9 +237,40 @@ class ApplicationConfigurationSerializer(serializers.ModelSerializer):
instance.app_logo.delete()
return super().update(instance, validated_data)
def _sanitize_raster_image(self, file: UploadedFile) -> UploadedFile:
# Re-encode through Pillow. The context manager closes the image even if
# saving fails, and avoids touching an unbound name if Image.open() raises.
data = BytesIO()
with Image.open(file) as image:
image.save(data, format=image.format)
data.seek(0)
return InMemoryUploadedFile(
file=data,
field_name=file.field_name,
name=file.name,
content_type=file.content_type,
size=data.getbuffer().nbytes,
charset=getattr(file, "charset", None),
)
def validate_app_logo(self, file: UploadedFile):
if file and magic.from_buffer(file.read(2048), mime=True) == "image/svg+xml":
reject_dangerous_svg(file)
"""
Validates and sanitizes the uploaded app logo image. Model field already restricts to
jpg/png/gif/svg.
"""
if file:
mime_type = magic.from_buffer(file.read(2048), mime=True)
if mime_type == "image/svg+xml":
reject_dangerous_svg(file)
else:
validate_raster_image(file)
if mime_type in {"image/jpeg", "image/png"}:
file = self._sanitize_raster_image(file)
return file
def validate_llm_endpoint(self, value: str | None) -> str | None:
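
The dispatch in `validate_app_logo` is easy to misread in flattened form, so here is a minimal sketch of which checks run for each MIME type. `logo_checks` is a hypothetical helper that only returns step names; it does not call the real validators.

```python
from __future__ import annotations


def logo_checks(mime_type: str) -> list[str]:
    """List the checks applied to an uploaded logo, in order (hypothetical helper)."""
    if mime_type == "image/svg+xml":
        # SVGs are screened for dangerous content and otherwise left as-is.
        return ["reject_dangerous_svg"]
    steps = ["validate_raster_image"]
    if mime_type in {"image/jpeg", "image/png"}:
        # Only JPEG and PNG are re-encoded through Pillow.
        steps.append("sanitize_raster_image")
    return steps


if __name__ == "__main__":
    for mime in ("image/svg+xml", "image/png", "image/gif"):
        print(mime, logo_checks(mime))
```

Read this way, a GIF logo is validated but not re-encoded.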


@@ -889,10 +889,23 @@ OCR_LANGUAGE = os.getenv("PAPERLESS_OCR_LANGUAGE", "eng")
# OCRmyPDF --output-type options are available.
OCR_OUTPUT_TYPE = os.getenv("PAPERLESS_OCR_OUTPUT_TYPE", "pdfa")
# skip. redo, force
OCR_MODE = os.getenv("PAPERLESS_OCR_MODE", "skip")
if os.environ.get("PAPERLESS_OCR_MODE", "") in (
"skip",
"skip_noarchive",
): # pragma: no cover
OCR_MODE = "auto"
else:
OCR_MODE = get_choice_from_env(
"PAPERLESS_OCR_MODE",
{"auto", "force", "redo", "off"},
default="auto",
)
OCR_SKIP_ARCHIVE_FILE = os.getenv("PAPERLESS_OCR_SKIP_ARCHIVE_FILE", "never")
ARCHIVE_FILE_GENERATION = get_choice_from_env(
"PAPERLESS_ARCHIVE_FILE_GENERATION",
{"auto", "always", "never"},
default="auto",
)
OCR_IMAGE_DPI = get_int_from_env("PAPERLESS_OCR_IMAGE_DPI")
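
The backwards-compatible handling of `PAPERLESS_OCR_MODE` can be sketched with the standard library alone. `resolve_ocr_mode` is a hypothetical function. The diff does not show how `get_choice_from_env` treats invalid values, so raising `ValueError` here is an assumption.

```python
from __future__ import annotations

from collections.abc import Mapping

# Values from the diff: legacy names are silently mapped to "auto".
LEGACY_MODES = {"skip", "skip_noarchive"}
VALID_MODES = {"auto", "force", "redo", "off"}


def resolve_ocr_mode(env: Mapping[str, str]) -> str:
    """Resolve the OCR mode from an environment mapping (hypothetical helper)."""
    raw = env.get("PAPERLESS_OCR_MODE")
    if raw is None or raw in LEGACY_MODES:
        return "auto"
    if raw not in VALID_MODES:
        # Assumption: the real helper rejects unknown choices in some way.
        raise ValueError(f"Unsupported PAPERLESS_OCR_MODE: {raw!r}")
    return raw


if __name__ == "__main__":
    print(resolve_ocr_mode({"PAPERLESS_OCR_MODE": "skip_noarchive"}))
```

Passing a plain mapping instead of reading `os.environ` directly keeps the sketch testable without touching the process environment.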


@@ -708,7 +708,7 @@ def null_app_config(mocker: MockerFixture) -> MagicMock:
pages=None,
language=None,
mode=None,
skip_archive_file=None,
archive_file_generation=None,
image_dpi=None,
unpaper_clean=None,
deskew=None,


@@ -0,0 +1,141 @@
"""
Tests for RasterisedDocumentParser._convert_image_to_pdfa.
The method converts an image to a PDF/A-2b file using img2pdf (wrapping)
then pikepdf (PDF/A metadata stamping), with a fallback to plain PDF when
pikepdf stamping fails. No Tesseract or Ghostscript is invoked.
These are unit/integration tests: img2pdf and pikepdf run for real; only
error-path branches mock the respective library call.
"""
from __future__ import annotations
from pathlib import Path
from typing import TYPE_CHECKING
import img2pdf
import magic
import pikepdf
import pytest
from documents.parsers import ParseError
if TYPE_CHECKING:
from pytest_mock import MockerFixture
from paperless.parsers.tesseract import RasterisedDocumentParser
class TestConvertImageToPdfa:
"""_convert_image_to_pdfa: output shape, error paths, DPI handling."""
def test_valid_png_produces_pdf_bytes(
self,
tesseract_parser: RasterisedDocumentParser,
simple_png_file: Path,
) -> None:
"""
GIVEN: a valid PNG with DPI metadata
WHEN: _convert_image_to_pdfa is called
THEN: the returned file exists and is detected as application/pdf
"""
result = tesseract_parser._convert_image_to_pdfa(simple_png_file)
assert result.exists()
assert magic.from_file(str(result), mime=True) == "application/pdf"
def test_output_path_is_archive_pdf_in_tempdir(
self,
tesseract_parser: RasterisedDocumentParser,
simple_png_file: Path,
) -> None:
"""
GIVEN: any valid image
WHEN: _convert_image_to_pdfa is called
THEN: the returned path is exactly <tempdir>/archive.pdf
"""
result = tesseract_parser._convert_image_to_pdfa(simple_png_file)
assert result == Path(tesseract_parser.tempdir) / "archive.pdf"
def test_img2pdf_failure_raises_parse_error(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
simple_png_file: Path,
) -> None:
"""
GIVEN: img2pdf.convert raises an exception
WHEN: _convert_image_to_pdfa is called
THEN: a ParseError is raised that mentions "img2pdf conversion failed"
"""
mocker.patch.object(img2pdf, "convert", side_effect=Exception("boom"))
with pytest.raises(ParseError, match="img2pdf conversion failed"):
tesseract_parser._convert_image_to_pdfa(simple_png_file)
def test_pikepdf_stamping_failure_falls_back_to_plain_pdf(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
simple_png_file: Path,
) -> None:
"""
GIVEN: pikepdf.open raises during PDF/A metadata stamping
WHEN: _convert_image_to_pdfa is called
THEN: no exception is raised and the returned file is still a valid PDF
(plain PDF bytes are used as fallback)
"""
mocker.patch.object(pikepdf, "open", side_effect=Exception("pikepdf boom"))
result = tesseract_parser._convert_image_to_pdfa(simple_png_file)
assert result.exists()
assert magic.from_file(str(result), mime=True) == "application/pdf"
def test_image_dpi_setting_applies_fixed_dpi_layout(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
simple_no_dpi_png_file: Path,
) -> None:
"""
GIVEN: parser.settings.image_dpi = 150
WHEN: _convert_image_to_pdfa is called with a no-DPI PNG
THEN: img2pdf.get_fixed_dpi_layout_fun is called with (150, 150)
and the output is still a valid PDF
"""
spy = mocker.patch.object(
img2pdf,
"get_fixed_dpi_layout_fun",
wraps=img2pdf.get_fixed_dpi_layout_fun,
)
tesseract_parser.settings.image_dpi = 150
result = tesseract_parser._convert_image_to_pdfa(simple_no_dpi_png_file)
spy.assert_called_once_with((150, 150))
assert magic.from_file(str(result), mime=True) == "application/pdf"
def test_no_image_dpi_setting_skips_fixed_dpi_layout(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
simple_png_file: Path,
) -> None:
"""
GIVEN: parser.settings.image_dpi is None (default)
WHEN: _convert_image_to_pdfa is called
THEN: img2pdf.get_fixed_dpi_layout_fun is never called
"""
spy = mocker.patch.object(
img2pdf,
"get_fixed_dpi_layout_fun",
wraps=img2pdf.get_fixed_dpi_layout_fun,
)
tesseract_parser.settings.image_dpi = None
tesseract_parser._convert_image_to_pdfa(simple_png_file)
spy.assert_not_called()


@@ -0,0 +1,440 @@
"""
Focused tests for RasterisedDocumentParser.parse() mode behaviour.
These tests mock ``ocrmypdf.ocr`` so they run without a real Tesseract/OCRmyPDF
installation and execute quickly. The intent is to verify the *control flow*
introduced by the ``produce_archive`` flag and the ``OCR_MODE=auto/off`` logic,
not to test OCRmyPDF itself.
Fixtures are pulled from conftest.py in this package.
"""
from __future__ import annotations
from pathlib import Path
from typing import TYPE_CHECKING
import pytest
if TYPE_CHECKING:
from pytest_mock import MockerFixture
from paperless.parsers.tesseract import RasterisedDocumentParser
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
_LONG_TEXT = "This is a test document with enough text. " * 5 # >50 chars
_SHORT_TEXT = "Hi." # <50 chars
def _make_extract_text(text: str | None):
"""Return a side_effect function for ``extract_text`` that returns *text*."""
def _extract(sidecar_file, pdf_file):
return text
return _extract
# ---------------------------------------------------------------------------
# AUTO mode — PDF with sufficient text layer
# ---------------------------------------------------------------------------
class TestAutoModeWithText:
"""AUTO mode, original PDF has detectable text (>50 chars)."""
def test_auto_text_no_archive_skips_ocrmypdf(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
simple_digital_pdf_file: Path,
) -> None:
"""
GIVEN:
- AUTO mode, produce_archive=False
- PDF with text > PDF_TEXT_MIN_LENGTH
WHEN:
- parse() is called
THEN:
- ocrmypdf.ocr is NOT called (early return path)
- archive_path remains None
- text is set from the original
"""
# Patch extract_text to return long text (simulating detectable text layer)
mocker.patch.object(
tesseract_parser,
"extract_text",
return_value=_LONG_TEXT,
)
mock_ocr = mocker.patch("ocrmypdf.ocr")
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(
simple_digital_pdf_file,
"application/pdf",
produce_archive=False,
)
mock_ocr.assert_not_called()
assert tesseract_parser.archive_path is None
assert tesseract_parser.get_text() == _LONG_TEXT
def test_auto_text_with_archive_calls_ocrmypdf_skip_text(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
simple_digital_pdf_file: Path,
) -> None:
"""
GIVEN:
- AUTO mode, produce_archive=True
- PDF with text > PDF_TEXT_MIN_LENGTH
WHEN:
- parse() is called
THEN:
- ocrmypdf.ocr IS called with skip_text=True
- archive_path is set
"""
mocker.patch.object(
tesseract_parser,
"extract_text",
return_value=_LONG_TEXT,
)
mock_ocr = mocker.patch("ocrmypdf.ocr")
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(
simple_digital_pdf_file,
"application/pdf",
produce_archive=True,
)
mock_ocr.assert_called_once()
call_kwargs = mock_ocr.call_args.kwargs
assert call_kwargs.get("skip_text") is True
assert "force_ocr" not in call_kwargs
assert "redo_ocr" not in call_kwargs
assert tesseract_parser.archive_path is not None
# ---------------------------------------------------------------------------
# AUTO mode — PDF without text layer (or too short)
# ---------------------------------------------------------------------------
class TestAutoModeNoText:
"""AUTO mode, original PDF has no detectable text (<= 50 chars)."""
def test_auto_no_text_with_archive_calls_ocrmypdf_no_extra_flag(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
multi_page_images_pdf_file: Path,
) -> None:
"""
GIVEN:
- AUTO mode, produce_archive=True
- PDF with no text (or text <= PDF_TEXT_MIN_LENGTH)
WHEN:
- parse() is called
THEN:
- ocrmypdf.ocr IS called WITHOUT skip_text/force_ocr/redo_ocr
- archive_path is set (since produce_archive=True)
"""
# Return "no text" for the original; return real text for archive
extract_call_count = 0
def _extract_side(sidecar_file, pdf_file):
nonlocal extract_call_count
extract_call_count += 1
if extract_call_count == 1:
return None # original has no text
return _LONG_TEXT # text from archive after OCR
mocker.patch.object(tesseract_parser, "extract_text", side_effect=_extract_side)
mock_ocr = mocker.patch("ocrmypdf.ocr")
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(
multi_page_images_pdf_file,
"application/pdf",
produce_archive=True,
)
mock_ocr.assert_called_once()
call_kwargs = mock_ocr.call_args.kwargs
assert "skip_text" not in call_kwargs
assert "force_ocr" not in call_kwargs
assert "redo_ocr" not in call_kwargs
assert tesseract_parser.archive_path is not None
def test_auto_no_text_no_archive_calls_ocrmypdf(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
multi_page_images_pdf_file: Path,
) -> None:
"""
GIVEN:
- AUTO mode, produce_archive=False
- PDF with no text
WHEN:
- parse() is called
THEN:
- ocrmypdf.ocr IS called (no early return since no text detected)
- archive_path is NOT set (produce_archive=False)
"""
extract_call_count = 0
def _extract_side(sidecar_file, pdf_file):
nonlocal extract_call_count
extract_call_count += 1
if extract_call_count == 1:
return None
return _LONG_TEXT
mocker.patch.object(tesseract_parser, "extract_text", side_effect=_extract_side)
mock_ocr = mocker.patch("ocrmypdf.ocr")
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(
multi_page_images_pdf_file,
"application/pdf",
produce_archive=False,
)
mock_ocr.assert_called_once()
assert tesseract_parser.archive_path is None
# ---------------------------------------------------------------------------
# OFF mode — PDF
# ---------------------------------------------------------------------------
class TestOffModePdf:
"""OCR_MODE=off, document is a PDF."""
def test_off_no_archive_returns_pdftotext(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
simple_digital_pdf_file: Path,
) -> None:
"""
GIVEN:
- OFF mode, produce_archive=False
- PDF with text
WHEN:
- parse() is called
THEN:
- ocrmypdf.ocr is NOT called
- archive_path is None
- text comes from pdftotext (extract_text)
"""
mocker.patch.object(
tesseract_parser,
"extract_text",
return_value=_LONG_TEXT,
)
mock_ocr = mocker.patch("ocrmypdf.ocr")
tesseract_parser.settings.mode = "off"
tesseract_parser.parse(
simple_digital_pdf_file,
"application/pdf",
produce_archive=False,
)
mock_ocr.assert_not_called()
assert tesseract_parser.archive_path is None
assert tesseract_parser.get_text() == _LONG_TEXT
def test_off_with_archive_uses_ghostscript_not_ocr(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
simple_digital_pdf_file: Path,
) -> None:
"""
GIVEN:
- OFF mode, produce_archive=True
- PDF document
WHEN:
- parse() is called
THEN:
- ocrmypdf.ocr is NOT called
- Ghostscript generate_pdfa IS called (PDF/A conversion without OCR)
- archive_path is set
- text comes from pdftotext, not OCR
"""
mocker.patch.object(
tesseract_parser,
"extract_text",
return_value=_LONG_TEXT,
)
mock_ocr = mocker.patch("ocrmypdf.ocr")
mock_gs = mocker.patch(
"ocrmypdf._exec.ghostscript.generate_pdfa",
)
mocker.patch("ocrmypdf.pdfa.generate_pdfa_ps")
tesseract_parser.settings.mode = "off"
tesseract_parser.parse(
simple_digital_pdf_file,
"application/pdf",
produce_archive=True,
)
mock_ocr.assert_not_called()
mock_gs.assert_called_once()
assert tesseract_parser.archive_path is not None
assert tesseract_parser.get_text() == _LONG_TEXT
# ---------------------------------------------------------------------------
# OFF mode — image
# ---------------------------------------------------------------------------
class TestOffModeImage:
"""OCR_MODE=off, document is an image (PNG)."""
def test_off_image_no_archive_no_ocrmypdf(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
simple_png_file: Path,
) -> None:
"""
GIVEN:
- OFF mode, produce_archive=False
- Image document (PNG)
WHEN:
- parse() is called
THEN:
- ocrmypdf.ocr is NOT called
- archive_path is None
- text is empty string (images have no text layer)
"""
mock_ocr = mocker.patch("ocrmypdf.ocr")
tesseract_parser.settings.mode = "off"
tesseract_parser.parse(simple_png_file, "image/png", produce_archive=False)
mock_ocr.assert_not_called()
assert tesseract_parser.archive_path is None
assert tesseract_parser.get_text() == ""
def test_off_image_with_archive_uses_img2pdf_path(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
simple_png_file: Path,
) -> None:
"""
GIVEN:
- OFF mode, produce_archive=True
- Image document (PNG)
WHEN:
- parse() is called
THEN:
- _convert_image_to_pdfa() is called instead of ocrmypdf.ocr
- archive_path is set to the returned path
- text is empty string
"""
fake_archive = Path("/tmp/fake-archive.pdf")
mock_convert = mocker.patch.object(
tesseract_parser,
"_convert_image_to_pdfa",
return_value=fake_archive,
)
mock_ocr = mocker.patch("ocrmypdf.ocr")
tesseract_parser.settings.mode = "off"
tesseract_parser.parse(simple_png_file, "image/png", produce_archive=True)
mock_convert.assert_called_once_with(simple_png_file)
mock_ocr.assert_not_called()
assert tesseract_parser.archive_path == fake_archive
assert tesseract_parser.get_text() == ""
# ---------------------------------------------------------------------------
# produce_archive=False never sets archive_path for FORCE / REDO / AUTO modes
# ---------------------------------------------------------------------------
class TestProduceArchiveFalse:
"""Verify produce_archive=False never results in an archive regardless of mode."""
@pytest.mark.parametrize("mode", ["force", "redo"])
def test_produce_archive_false_force_redo_modes(
self,
mode: str,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
multi_page_images_pdf_file: Path,
) -> None:
"""
GIVEN:
- FORCE or REDO mode, produce_archive=False
- Any PDF
WHEN:
- parse() is called (ocrmypdf mocked to succeed)
THEN:
- archive_path is NOT set even though ocrmypdf ran
"""
mocker.patch.object(
tesseract_parser,
"extract_text",
return_value=_LONG_TEXT,
)
mocker.patch("ocrmypdf.ocr")
tesseract_parser.settings.mode = mode
tesseract_parser.parse(
multi_page_images_pdf_file,
"application/pdf",
produce_archive=False,
)
assert tesseract_parser.archive_path is None
assert tesseract_parser.get_text() is not None
def test_produce_archive_false_auto_with_text(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
simple_digital_pdf_file: Path,
) -> None:
"""
GIVEN:
- AUTO mode, produce_archive=False
- PDF with text > VALID_TEXT_LENGTH
WHEN:
- parse() is called
THEN:
- ocrmypdf is skipped entirely (early return)
- archive_path is None
"""
mocker.patch.object(
tesseract_parser,
"extract_text",
return_value=_LONG_TEXT,
)
mock_ocr = mocker.patch("ocrmypdf.ocr")
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(
simple_digital_pdf_file,
"application/pdf",
produce_archive=False,
)
mock_ocr.assert_not_called()
assert tesseract_parser.archive_path is None
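
Reviewer note: the cases asserted in the tests above can be read as a small decision table. The sketch below is not the parser's code. `plan_parse` and the label "nothing" are hypothetical names introduced for illustration, and only the combinations these tests actually assert on are covered; anything else raises.

```python
from __future__ import annotations


def plan_parse(
    mode: str,
    *,
    is_image: bool,
    has_text: bool,
    produce_archive: bool,
) -> tuple[str, bool]:
    """Return (step, archive_expected) for the combinations asserted above.

    Hypothetical helper. "_convert_image_to_pdfa" and "ocrmypdf.ocr" are the
    names the tests patch; "nothing" is just a label used in this sketch.
    """
    if mode == "off" and is_image:
        # OFF mode never calls ocrmypdf.ocr; an archive uses the image-to-PDF/A path.
        if produce_archive:
            return ("_convert_image_to_pdfa", True)
        return ("nothing", False)
    if mode in ("force", "redo") and not produce_archive:
        # OCR still runs so text can be extracted, but no archive is kept.
        return ("ocrmypdf.ocr", False)
    if mode == "auto" and has_text and not produce_archive:
        # Early return: the existing text is enough and no archive was requested.
        return ("nothing", False)
    raise NotImplementedError("combination not covered by the tests above")
```

Raising on uncovered combinations is deliberate: the sketch should not guess at behaviour the tests do not pin down.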

View File

@@ -94,15 +94,35 @@ class TestParserSettingsFromDb(DirectoriesMixin, FileSystemAssertsMixin, TestCas
WHEN:
- OCR parameters are constructed
THEN:
- Configuration from database is utilized
- Configuration from database is utilized (AUTO mode with skip_text=True
triggers skip_text; AUTO mode alone does not add any extra flag)
"""
# AUTO mode with skip_text=True explicitly passed: skip_text is set
with override_settings(OCR_MODE="redo"):
instance = ApplicationConfiguration.objects.all().first()
instance.mode = ModeChoices.SKIP
instance.mode = ModeChoices.AUTO
instance.save()
params = RasterisedDocumentParser(None).construct_ocrmypdf_parameters(
input_file="input.pdf",
output_file="output.pdf",
sidecar_file="sidecar.txt",
mime_type="application/pdf",
safe_fallback=False,
skip_text=True,
)
self.assertTrue(params["skip_text"])
self.assertNotIn("redo_ocr", params)
self.assertNotIn("force_ocr", params)
# AUTO mode alone (no skip_text): no extra OCR flag is set
with override_settings(OCR_MODE="redo"):
instance = ApplicationConfiguration.objects.all().first()
instance.mode = ModeChoices.AUTO
instance.save()
params = self.get_params()
self.assertTrue(params["skip_text"])
self.assertNotIn("skip_text", params)
self.assertNotIn("redo_ocr", params)
self.assertNotIn("force_ocr", params)

View File

@@ -370,15 +370,26 @@ class TestParsePdf:
tesseract_parser: RasterisedDocumentParser,
tesseract_samples_dir: Path,
) -> None:
"""
GIVEN:
- Multi-page digital PDF with sufficient text layer
- Default settings (mode=auto, produce_archive=True)
WHEN:
- Document is parsed
THEN:
- Archive is created (AUTO mode + text present + produce_archive=True
→ PDF/A conversion via skip_text)
- Text is extracted
"""
tesseract_parser.parse(
tesseract_samples_dir / "simple-digital.pdf",
tesseract_samples_dir / "multi-page-digital.pdf",
"application/pdf",
)
assert tesseract_parser.archive_path is not None
assert tesseract_parser.archive_path.is_file()
assert_ordered_substrings(
tesseract_parser.get_text(),
["This is a test document."],
tesseract_parser.get_text().lower(),
["page 1", "page 2", "page 3"],
)
def test_with_form_default(
@@ -397,7 +408,7 @@ class TestParsePdf:
["Please enter your name in here:", "This is a PDF document with a form."],
)
def test_with_form_redo_produces_no_archive(
def test_with_form_redo_no_archive_when_not_requested(
self,
tesseract_parser: RasterisedDocumentParser,
tesseract_samples_dir: Path,
@@ -406,6 +417,7 @@ class TestParsePdf:
tesseract_parser.parse(
tesseract_samples_dir / "with-form.pdf",
"application/pdf",
produce_archive=False,
)
assert tesseract_parser.archive_path is None
assert_ordered_substrings(
@@ -433,7 +445,7 @@ class TestParsePdf:
tesseract_parser: RasterisedDocumentParser,
tesseract_samples_dir: Path,
) -> None:
tesseract_parser.settings.mode = "skip"
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(tesseract_samples_dir / "signed.pdf", "application/pdf")
assert tesseract_parser.archive_path is None
assert_ordered_substrings(
@@ -449,7 +461,7 @@ class TestParsePdf:
tesseract_parser: RasterisedDocumentParser,
tesseract_samples_dir: Path,
) -> None:
tesseract_parser.settings.mode = "skip"
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(
tesseract_samples_dir / "encrypted.pdf",
"application/pdf",
@@ -559,7 +571,7 @@ class TestParseMultiPage:
@pytest.mark.parametrize(
"mode",
[
pytest.param("skip", id="skip"),
pytest.param("auto", id="auto"),
pytest.param("redo", id="redo"),
pytest.param("force", id="force"),
],
@@ -587,7 +599,7 @@ class TestParseMultiPage:
tesseract_parser: RasterisedDocumentParser,
tesseract_samples_dir: Path,
) -> None:
tesseract_parser.settings.mode = "skip"
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(
tesseract_samples_dir / "multi-page-images.pdf",
"application/pdf",
@@ -735,16 +747,18 @@ class TestSkipArchive:
"""
GIVEN:
- File with existing text layer
- Mode: skip_noarchive
- Mode: auto, produce_archive=False
WHEN:
- Document is parsed
THEN:
- Text extracted; no archive created
- Text extracted from original; no archive created (text exists +
produce_archive=False skips OCRmyPDF entirely)
"""
tesseract_parser.settings.mode = "skip_noarchive"
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(
tesseract_samples_dir / "multi-page-digital.pdf",
"application/pdf",
produce_archive=False,
)
assert tesseract_parser.archive_path is None
assert_ordered_substrings(
@@ -760,13 +774,13 @@ class TestSkipArchive:
"""
GIVEN:
- File with image-only pages (no text layer)
- Mode: skip_noarchive
- Mode: auto, archive_file_generation: auto
WHEN:
- Document is parsed
THEN:
- Text extracted; archive created (OCR needed)
- Text extracted; archive created (OCR needed, no existing text)
"""
tesseract_parser.settings.mode = "skip_noarchive"
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(
tesseract_samples_dir / "multi-page-images.pdf",
"application/pdf",
@@ -778,41 +792,58 @@ class TestSkipArchive:
)
@pytest.mark.parametrize(
("skip_archive_file", "filename", "expect_archive"),
("produce_archive", "filename", "expect_archive"),
[
pytest.param("never", "multi-page-digital.pdf", True, id="never-with-text"),
pytest.param("never", "multi-page-images.pdf", True, id="never-no-text"),
pytest.param(
"with_text",
True,
"multi-page-digital.pdf",
False,
id="with-text-layer",
True,
id="produce-archive-with-text",
),
pytest.param(
"with_text",
True,
"multi-page-images.pdf",
True,
id="with-text-no-layer",
id="produce-archive-no-text",
),
pytest.param(
"always",
False,
"multi-page-digital.pdf",
False,
id="always-with-text",
id="no-archive-with-text-layer",
),
pytest.param(
False,
"multi-page-images.pdf",
False,
id="no-archive-no-text-layer",
),
pytest.param("always", "multi-page-images.pdf", False, id="always-no-text"),
],
)
def test_skip_archive_file_setting(
def test_produce_archive_flag(
self,
skip_archive_file: str,
produce_archive: bool, # noqa: FBT001
filename: str,
expect_archive: str,
expect_archive: bool, # noqa: FBT001
tesseract_parser: RasterisedDocumentParser,
tesseract_samples_dir: Path,
) -> None:
tesseract_parser.settings.skip_archive_file = skip_archive_file
tesseract_parser.parse(tesseract_samples_dir / filename, "application/pdf")
"""
GIVEN:
- Various PDFs (with and without text layers)
- produce_archive flag set to True or False
WHEN:
- Document is parsed
THEN:
- archive_path is set if and only if produce_archive=True
- Text is always extracted
"""
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(
tesseract_samples_dir / filename,
"application/pdf",
produce_archive=produce_archive,
)
text = tesseract_parser.get_text().lower()
assert_ordered_substrings(text, ["page 1", "page 2", "page 3"])
if expect_archive:
@@ -820,6 +851,59 @@ class TestSkipArchive:
else:
assert tesseract_parser.archive_path is None
def test_tagged_pdf_skips_ocr_in_auto_mode(
self,
mocker: MockerFixture,
tesseract_parser: RasterisedDocumentParser,
tesseract_samples_dir: Path,
) -> None:
"""
GIVEN:
- A tagged PDF (e.g. exported from Word, /MarkInfo /Marked true)
- Mode: auto, produce_archive=False
WHEN:
- Document is parsed
THEN:
- OCRmyPDF is not invoked (tagged ⇒ original_has_text=True)
- Text is extracted from the original via pdftotext
- No archive is produced
"""
tesseract_parser.settings.mode = "auto"
mock_ocr = mocker.patch("ocrmypdf.ocr")
tesseract_parser.parse(
tesseract_samples_dir / "simple-digital.pdf",
"application/pdf",
produce_archive=False,
)
mock_ocr.assert_not_called()
assert tesseract_parser.archive_path is None
assert tesseract_parser.get_text()
def test_tagged_pdf_produces_pdfa_archive_without_ocr(
self,
tesseract_parser: RasterisedDocumentParser,
tesseract_samples_dir: Path,
) -> None:
"""
GIVEN:
- A tagged PDF (e.g. exported from Word, /MarkInfo /Marked true)
- Mode: auto, produce_archive=True
WHEN:
- Document is parsed
THEN:
- OCRmyPDF runs with skip_text (PDF/A conversion only, no OCR)
- Archive is produced
- Text is preserved from the original
"""
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(
tesseract_samples_dir / "simple-digital.pdf",
"application/pdf",
produce_archive=True,
)
assert tesseract_parser.archive_path is not None
assert tesseract_parser.get_text()
# ---------------------------------------------------------------------------
# Parse — mixed pages / sidecar
@@ -835,13 +919,13 @@ class TestParseMixed:
"""
GIVEN:
- File with text in some pages (image) and some pages (digital)
- Mode: skip
- Mode: auto (skip_text), archive_file_generation: always
WHEN:
- Document is parsed
THEN:
- All pages extracted; archive created; sidecar notes skipped pages
"""
tesseract_parser.settings.mode = "skip"
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(
tesseract_samples_dir / "multi-page-mixed.pdf",
"application/pdf",
@@ -898,17 +982,18 @@ class TestParseMixed:
) -> None:
"""
GIVEN:
- File with mixed pages
- Mode: skip_noarchive
- File with mixed pages (some with text, some image-only)
- Mode: auto, produce_archive=False
WHEN:
- Document is parsed
THEN:
- No archive created (file has text layer); later-page text present
- No archive created (produce_archive=False); text from text layer present
"""
tesseract_parser.settings.mode = "skip_noarchive"
tesseract_parser.settings.mode = "auto"
tesseract_parser.parse(
tesseract_samples_dir / "multi-page-mixed.pdf",
"application/pdf",
produce_archive=False,
)
assert tesseract_parser.archive_path is None
assert_ordered_substrings(
@@ -923,12 +1008,12 @@ class TestParseMixed:
class TestParseRotate:
def test_rotate_skip_mode(
def test_rotate_auto_mode(
self,
tesseract_parser: RasterisedDocumentParser,
tesseract_samples_dir: Path,
) -> None:
tesseract_parser.settings.mode = "skip"
tesseract_parser.settings.mode = "auto"
tesseract_parser.settings.rotate = True
tesseract_parser.parse(tesseract_samples_dir / "rotated.pdf", "application/pdf")
assert_ordered_substrings(
@@ -955,12 +1040,19 @@ class TestParseRtl:
) -> None:
"""
GIVEN:
- PDF with RTL Arabic text
- PDF with RTL Arabic text in its text layer (short: 18 chars)
- mode=off, produce_archive=True: PDF/A conversion via skip_text, no OCR engine
WHEN:
- Document is parsed
THEN:
- Arabic content is extracted (normalised for bidi)
- Arabic content is extracted from the PDF text layer (normalised for bidi)
Note: The RTL PDF has a short text layer (< VALID_TEXT_LENGTH=50) so AUTO mode
would attempt full OCR, which fails due to PriorOcrFoundError and falls back to
force-ocr with English Tesseract (producing garbage). Using mode="off" forces
skip_text=True so the Arabic text layer is preserved through PDF/A conversion.
"""
tesseract_parser.settings.mode = "off"
tesseract_parser.parse(
tesseract_samples_dir / "rtl-test.pdf",
"application/pdf",
@@ -971,7 +1063,8 @@ class TestParseRtl:
if unicodedata.category(ch) != "Cf" and not ch.isspace()
)
assert "ةرازو" in normalised
assert any(token in normalised for token in ("ةیلخادلا", "الاخليد"))
# pdftotext uses Arabic Yeh (U+064A) where ocrmypdf used Farsi Yeh (U+06CC)
assert any(token in normalised for token in ("ةیلخادلا", "الاخليد", "ةيلخادال"))
# ---------------------------------------------------------------------------
@@ -1023,11 +1116,11 @@ class TestOcrmypdfParameters:
assert ("clean" in params) == expected_clean
assert ("clean_final" in params) == expected_clean_final
def test_clean_final_skip_mode(
def test_clean_final_auto_mode(
self,
make_tesseract_parser: MakeTesseractParser,
) -> None:
with make_tesseract_parser(OCR_CLEAN="clean-final", OCR_MODE="skip") as parser:
with make_tesseract_parser(OCR_CLEAN="clean-final", OCR_MODE="auto") as parser:
params = parser.construct_ocrmypdf_parameters("", "", "", "")
assert params["clean_final"] is True
assert "clean" not in params
@@ -1044,9 +1137,9 @@ class TestOcrmypdfParameters:
@pytest.mark.parametrize(
("ocr_mode", "ocr_deskew", "expect_deskew"),
[
pytest.param("skip", True, True, id="skip-deskew-on"),
pytest.param("auto", True, True, id="auto-deskew-on"),
pytest.param("redo", True, False, id="redo-deskew-off"),
pytest.param("skip", False, False, id="skip-no-deskew"),
pytest.param("auto", False, False, id="auto-no-deskew"),
],
)
def test_deskew_option(

View File

@@ -132,13 +132,13 @@ class TestOcrSettingsChecks:
pytest.param(
"OCR_MODE",
"skip_noarchive",
"deprecated",
id="deprecated-mode",
'OCR output mode "skip_noarchive"',
id="deprecated-mode-now-invalid",
),
pytest.param(
"OCR_SKIP_ARCHIVE_FILE",
"ARCHIVE_FILE_GENERATION",
"invalid",
'OCR_SKIP_ARCHIVE_FILE setting "invalid"',
'PAPERLESS_ARCHIVE_FILE_GENERATION setting "invalid"',
id="invalid-skip-archive-file",
),
pytest.param(

View File

@@ -0,0 +1,64 @@
"""Tests for v3 system checks: deprecated v2 OCR env var warnings."""
from __future__ import annotations
import os
from typing import TYPE_CHECKING
import pytest
from paperless.checks import check_deprecated_v2_ocr_env_vars
if TYPE_CHECKING:
from pytest_mock import MockerFixture
class TestDeprecatedV2OcrEnvVarWarnings:
def test_no_deprecated_vars_returns_empty(self, mocker: MockerFixture) -> None:
"""No warnings when neither deprecated variable is set."""
mocker.patch.dict(os.environ, {"PAPERLESS_OCR_MODE": "auto"}, clear=True)
result = check_deprecated_v2_ocr_env_vars(None)
assert result == []
@pytest.mark.parametrize(
("env_var", "env_value", "expected_id", "expected_fragment"),
[
pytest.param(
"PAPERLESS_OCR_SKIP_ARCHIVE_FILE",
"always",
"paperless.W002",
"PAPERLESS_OCR_SKIP_ARCHIVE_FILE",
id="skip-archive-file-warns",
),
pytest.param(
"PAPERLESS_OCR_MODE",
"skip",
"paperless.W003",
"skip",
id="ocr-mode-skip-warns",
),
pytest.param(
"PAPERLESS_OCR_MODE",
"skip_noarchive",
"paperless.W003",
"skip_noarchive",
id="ocr-mode-skip-noarchive-warns",
),
],
)
def test_deprecated_var_produces_one_warning(
self,
mocker: MockerFixture,
env_var: str,
env_value: str,
expected_id: str,
expected_fragment: str,
) -> None:
"""Each deprecated setting in isolation produces exactly one warning."""
mocker.patch.dict(os.environ, {env_var: env_value}, clear=True)
result = check_deprecated_v2_ocr_env_vars(None)
assert len(result) == 1
warning = result[0]
assert warning.id == expected_id
assert expected_fragment in warning.msg
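
Reviewer note: a minimal sketch of a check that would satisfy the three parametrized cases above. It is an assumption-laden stand-in, not the body of `check_deprecated_v2_ocr_env_vars`: `CheckWarning` replaces `django.core.checks.Warning` so the snippet runs without Django, the function name is hypothetical, and the message wording is invented. Only the IDs and the required message fragments come from the tests.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class CheckWarning:
    """Stand-in for django.core.checks.Warning (only the fields the tests read)."""

    msg: str
    id: str


def sketch_deprecated_v2_ocr_check(
    environ: Mapping[str, str] | None = None,
) -> list[CheckWarning]:
    """Hypothetical re-implementation driven purely by the test expectations."""
    env = os.environ if environ is None else environ
    found: list[CheckWarning] = []

    # W002: the old variable is set at all, whatever its value.
    if "PAPERLESS_OCR_SKIP_ARCHIVE_FILE" in env:
        found.append(
            CheckWarning(
                msg="PAPERLESS_OCR_SKIP_ARCHIVE_FILE is set but no longer used.",
                id="paperless.W002",
            ),
        )

    # W003: the mode variable carries one of the two removed v2 values.
    mode = env.get("PAPERLESS_OCR_MODE")
    if mode in {"skip", "skip_noarchive"}:
        found.append(
            CheckWarning(
                msg=f'PAPERLESS_OCR_MODE value "{mode}" is deprecated.',
                id="paperless.W003",
            ),
        )
    return found
```

Passing the environment as an argument keeps the sketch testable without patching `os.environ`; the tests in the diff patch the real environment instead.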

View File

@@ -0,0 +1,89 @@
from documents.tests.utils import TestMigrations
class TestMigrateSkipArchiveFile(TestMigrations):
migrate_from = "0007_optimize_integer_field_sizes"
migrate_to = "0008_replace_skip_archive_file"
def setUpBeforeMigration(self, apps):
ApplicationConfiguration = apps.get_model(
"paperless",
"ApplicationConfiguration",
)
ApplicationConfiguration.objects.all().delete()
ApplicationConfiguration.objects.create(
pk=1,
mode="skip",
skip_archive_file="always",
)
ApplicationConfiguration.objects.create(
pk=2,
mode="redo",
skip_archive_file="with_text",
)
ApplicationConfiguration.objects.create(
pk=3,
mode="force",
skip_archive_file="never",
)
ApplicationConfiguration.objects.create(
pk=4,
mode="skip_noarchive",
skip_archive_file=None,
)
ApplicationConfiguration.objects.create(
pk=5,
mode="skip_noarchive",
skip_archive_file="never",
)
ApplicationConfiguration.objects.create(pk=6, mode=None, skip_archive_file=None)
def _get_config(self, pk):
ApplicationConfiguration = self.apps.get_model(
"paperless",
"ApplicationConfiguration",
)
return ApplicationConfiguration.objects.get(pk=pk)
def test_skip_mapped_to_auto(self):
config = self._get_config(1)
assert config.mode == "auto"
def test_skip_archive_always_mapped_to_never(self):
config = self._get_config(1)
assert config.archive_file_generation == "never"
def test_redo_unchanged(self):
config = self._get_config(2)
assert config.mode == "redo"
def test_skip_archive_with_text_mapped_to_auto(self):
config = self._get_config(2)
assert config.archive_file_generation == "auto"
def test_force_unchanged(self):
config = self._get_config(3)
assert config.mode == "force"
def test_skip_archive_never_mapped_to_always(self):
config = self._get_config(3)
assert config.archive_file_generation == "always"
def test_skip_noarchive_mapped_to_auto(self):
config = self._get_config(4)
assert config.mode == "auto"
def test_skip_noarchive_implies_archive_never(self):
config = self._get_config(4)
assert config.archive_file_generation == "never"
def test_skip_noarchive_explicit_skip_archive_takes_precedence(self):
"""skip_archive_file=never maps to always, not overridden by skip_noarchive."""
config = self._get_config(5)
assert config.mode == "auto"
assert config.archive_file_generation == "always"
def test_null_values_remain_null(self):
config = self._get_config(6)
assert config.mode is None
assert config.archive_file_generation is None
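
Reviewer note: the assertions above imply the following value mapping. This is a sketch reconstructed only from the test expectations; `map_v2_ocr_config` is a hypothetical helper, and the real `0008_replace_skip_archive_file` migration may be structured differently. Combinations the tests do not create (for example `mode="skip"` with `skip_archive_file=None`) fall through to `None` here, which is an assumption.

```python
from __future__ import annotations

# Old value -> new value, as asserted by the migration tests.
_MODE_MAP = {"skip": "auto", "skip_noarchive": "auto"}
_ARCHIVE_MAP = {"always": "never", "with_text": "auto", "never": "always"}


def map_v2_ocr_config(
    mode: str | None,
    skip_archive_file: str | None,
) -> tuple[str | None, str | None]:
    """Return (new_mode, archive_file_generation) for one config row."""
    # "redo", "force" and None pass through unchanged.
    new_mode = _MODE_MAP.get(mode, mode) if mode is not None else None

    if skip_archive_file is not None:
        # An explicit skip_archive_file wins, even over skip_noarchive.
        generation = _ARCHIVE_MAP[skip_archive_file]
    elif mode == "skip_noarchive":
        # skip_noarchive on its own implies that no archive is generated.
        generation = "never"
    else:
        generation = None
    return new_mode, generation
```

Note the inversion: the old field said when to skip the archive, the new one says when to generate it, so "always" and "never" swap.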

View File

@@ -0,0 +1,66 @@
"""Tests for OcrConfig archive_file_generation field behavior."""
from __future__ import annotations
from typing import TYPE_CHECKING
import pytest
from django.test import override_settings
from paperless.config import OcrConfig
if TYPE_CHECKING:
from unittest.mock import MagicMock
@pytest.fixture()
def null_app_config(mocker) -> MagicMock:
"""Mock ApplicationConfiguration with all fields None → falls back to Django settings."""
return mocker.MagicMock(
output_type=None,
pages=None,
language=None,
mode=None,
archive_file_generation=None,
image_dpi=None,
unpaper_clean=None,
deskew=None,
rotate_pages=None,
rotate_pages_threshold=None,
max_image_pixels=None,
color_conversion_strategy=None,
user_args=None,
)
@pytest.fixture()
def make_ocr_config(mocker, null_app_config):
mocker.patch(
"paperless.config.BaseConfig._get_config_instance",
return_value=null_app_config,
)
def _make(**django_settings_overrides):
with override_settings(**django_settings_overrides):
return OcrConfig()
return _make
class TestOcrConfigArchiveFileGeneration:
def test_auto_from_settings(self, make_ocr_config) -> None:
cfg = make_ocr_config(OCR_MODE="auto", ARCHIVE_FILE_GENERATION="auto")
assert cfg.archive_file_generation == "auto"
def test_always_from_settings(self, make_ocr_config) -> None:
cfg = make_ocr_config(ARCHIVE_FILE_GENERATION="always")
assert cfg.archive_file_generation == "always"
def test_never_from_settings(self, make_ocr_config) -> None:
cfg = make_ocr_config(ARCHIVE_FILE_GENERATION="never")
assert cfg.archive_file_generation == "never"
def test_db_value_overrides_setting(self, make_ocr_config, null_app_config) -> None:
null_app_config.archive_file_generation = "never"
cfg = make_ocr_config(ARCHIVE_FILE_GENERATION="always")
assert cfg.archive_file_generation == "never"
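
Reviewer note: the precedence these tests check (database value first, Django setting as fallback when the database field is `None`) reduces to a one-line rule. `resolve_setting` below is a hypothetical helper for illustration, not a function in `paperless.config`.

```python
from __future__ import annotations


def resolve_setting(db_value: str | None, settings_value: str) -> str:
    """Prefer the database value; fall back to the settings value on None."""
    return db_value if db_value is not None else settings_value
```

With `db_value="never"` and `settings_value="always"` this returns `"never"`, which is the outcome `test_db_value_overrides_setting` asserts.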

View File

@@ -0,0 +1,25 @@
"""Tests for paperless.parsers.utils helpers."""
from __future__ import annotations
from pathlib import Path
from paperless.parsers.utils import is_tagged_pdf
SAMPLES = Path(__file__).parent / "samples" / "tesseract"
class TestIsTaggedPdf:
def test_tagged_pdf_returns_true(self) -> None:
assert is_tagged_pdf(SAMPLES / "simple-digital.pdf") is True
def test_untagged_pdf_returns_false(self) -> None:
assert is_tagged_pdf(SAMPLES / "multi-page-images.pdf") is False
def test_nonexistent_path_returns_false(self) -> None:
assert is_tagged_pdf(Path("/nonexistent/file.pdf")) is False
def test_corrupt_pdf_returns_false(self, tmp_path: Path) -> None:
bad = tmp_path / "bad.pdf"
bad.write_bytes(b"not a pdf")
assert is_tagged_pdf(bad) is False

View File

@@ -1,6 +1,10 @@
from io import BytesIO
from django.conf import settings
from django.core.exceptions import ValidationError
from django.core.files.uploadedfile import UploadedFile
from lxml import etree
from PIL import Image
ALLOWED_SVG_TAGS: set[str] = {
# Basic shapes
@@ -254,3 +258,30 @@ def reject_dangerous_svg(file: UploadedFile) -> None:
raise ValidationError(
f"URI scheme not allowed in {attr_name}: must be #anchor, relative path, or data:image/*",
)
def validate_raster_image(file: UploadedFile) -> None:
"""
Validates that the uploaded file is a valid raster image (JPEG, PNG, etc.)
and does not exceed maximum pixel limits.
Raises ValidationError if the image is invalid or exceeds the allowed size.
"""
file.seek(0)
image_data = file.read()
try:
with Image.open(BytesIO(image_data)) as image:
image.verify()
if (
settings.MAX_IMAGE_PIXELS is not None
and settings.MAX_IMAGE_PIXELS > 0
and image.width * image.height > settings.MAX_IMAGE_PIXELS
):
raise ValidationError(
"Uploaded logo exceeds the maximum allowed image size.",
)
if image.format is None: # pragma: no cover
raise ValidationError("Invalid logo image.")
except (OSError, Image.DecompressionBombError) as e:
raise ValidationError("Invalid logo image.") from e
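
Reviewer note: the size guard inside `validate_raster_image` can be isolated as a pure predicate, which makes the edge cases easier to see. `exceeds_pixel_limit` is a hypothetical name, and `max_pixels` stands in for `settings.MAX_IMAGE_PIXELS`; the sketch mirrors only the condition shown in the hunk, not the Pillow verification step.

```python
from __future__ import annotations


def exceeds_pixel_limit(width: int, height: int, max_pixels: int | None) -> bool:
    """Return True when width * height is strictly above a positive limit."""
    # None or a non-positive limit disables the check, as in the hunk above.
    if max_pixels is None or max_pixels <= 0:
        return False
    return width * height > max_pixels
```

An image whose pixel count equals the limit passes, since the comparison is strict.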

668
uv.lock generated

File diff suppressed because it is too large