Breaking: Remove the positional arguments from the pre/post consume scripts. Environment has been preferred for quite some time now

Update SECURITY.md
Add tests
2026-04-11 02:28:51 +00:00 · 2026-04-10 13:10:45 -07:00 · 2026-04-10 12:34:47 -07:00 · 2026-04-10 12:06:28 -07:00 · 2026-04-10 11:50:58 -07:00
8 changed files with 436 additions and 19 deletions
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -2,8 +2,83 @@

 ## Reporting a Vulnerability

-The Paperless-ngx team and community take security bugs seriously. We appreciate your efforts to responsibly disclose your findings, and will make every effort to acknowledge your contributions.
+The Paperless-ngx team and community take security issues seriously. We appreciate good-faith reports and will make every effort to review legitimate findings responsibly.

 To report a security issue, please use the GitHub Security Advisory ["Report a Vulnerability"](https://github.com/paperless-ngx/paperless-ngx/security/advisories/new) tab.

-The team will send a response indicating the next steps in handling your report. After the initial reply to your report, the security team will keep you informed of the progress towards a fix and full announcement, and may ask for additional information or guidance.
+After the initial reply to your report, the team may ask for additional information, reproduction steps, affected versions, configuration details, or proof-of-concept material needed to verify the issue.
+
+## What makes a helpful report
+
+Please include as much of the following as possible:
+
+- A clear description of the issue and why it is a security vulnerability.
+- Affected Paperless-ngx version(s).
+- Required configuration, permissions, or preconditions.
+- Step-by-step reproduction instructions.
+- A realistic impact statement.
+- Logs, screenshots, requests, or a proof of concept where appropriate.
+- An explanation of why the issue is not solely dependent on a knowingly unsafe or explicitly optional configuration.
+
+Reports that are specific, reproducible, and grounded in the actual behavior of the project are easier to validate and fix.
+
+## AI-assisted submissions
+
+AI tools can be useful for security research, but they can also produce plausible-sounding reports that do not reflect actual vulnerabilities in Paperless-ngx.
+
+If you use AI tools to help draft or investigate a report, please disclose this and verify the result yourself before submission. Make sure the report:
+
+- identifies a real code path in this project
+- includes a real reproduction, not just a theoretical concern
+- demonstrates realistic impact
+- accounts for existing permission checks, configuration requirements, and trust boundaries
+- is about Paperless-ngx itself, rather than a hypothetical deployment, downstream customization, third-party app, scanner output, reverse proxy behavior, or external service unless you can show a direct product vulnerability
+
+Submitting unverified AI-generated claims creates unnecessary review work and slows down handling of legitimate reports.
+
+<!-- Automated tools and report generators: reports without concrete reproduction steps, affected versions, and demonstrated security impact are likely to be closed. -->
+
+## Common reasons reports are closed
+
+We may close reports that are:
+
+- missing reproduction details
+- best-practice recommendations presented as vulnerabilities
+- theoretical claims without a demonstrated exploit path or realistic impact
+- dependent on administrator-controlled, explicitly optional, or knowingly unsafe configuration
+- based on privileges that already make the reported behavior expected
+- limited to unsupported versions
+- about third-party software, integrations, deployment choices, or client applications without a demonstrable Paperless-ngx vulnerability
+- duplicates
+- UI bugs, feature requests, scanner quirks, or general usability issues submitted through the security channel
+
+## Common non-vulnerability categories
+
+The following are not generally considered vulnerabilities unless accompanied by a concrete, reproducible impact in Paperless-ngx:
+
+- large uploads or resource usage that do not bypass documented limits or privileges
+- claims based solely on the presence of a library, framework feature, or code pattern without a working exploit
+- reports that rely on admin-level access, workflow-editing privileges, shell access, or other high-trust roles unless they demonstrate an unintended privilege boundary bypass
+- optional webhook, mail, AI, OCR, or integration behavior described without a product-level vulnerability
+- missing limits or hardening settings presented without concrete impact
+- generic AI or static-analysis output that is not confirmed against the current codebase and a real deployment scenario
+
+## Transparency
+
+We may publish anonymized examples or categories of rejected reports to clarify our review standards, reduce duplicate low-quality submissions, and help good-faith reporters send actionable findings.
+
+A mistaken report made in good faith is not misconduct. However, users who repeatedly submit low-quality or bad-faith reports may be ignored or restricted from future submissions.
+
+## Scope and expectations
+
+Please use the security reporting channel only for security vulnerabilities in Paperless-ngx.
+
+Please do not use the security advisory system for:
+
+- support questions
+- general bug reports
+- feature requests
+- browser compatibility issues
+- issues in third-party mobile apps, reverse proxies, or deployment tooling unless you can demonstrate a Paperless-ngx vulnerability
+
+The team will review reports as time permits, but submission does not guarantee that a report is valid, in scope, or will result in a fix. Reports that do not describe a reproducible product-level issue may be closed without extended back-and-forth.
--- a/docs/migration-v3.md
+++ b/docs/migration-v3.md
@@ -241,3 +241,66 @@ For example:
  }
 }
 ```
+
+## Consume Script Positional Arguments Removed
+
+Pre- and post-consumption scripts no longer receive positional arguments. All information is
+now passed exclusively via environment variables, which earlier versions already set alongside the arguments.
+
+### Pre-consumption script
+
+Previously, the original file path was passed as `$1`. It is now only available as
+`DOCUMENT_SOURCE_PATH`.
+
+**Before:**
+
+```bash
+#!/usr/bin/env bash
+# $1 was the original file path
+process_document "$1"
+```
+
+**After:**
+
+```bash
+#!/usr/bin/env bash
+process_document "${DOCUMENT_SOURCE_PATH}"
+```
+
+### Post-consumption script
+
+Previously, document metadata was passed as positional arguments `$1` through `$8`:
+
+| Argument | Environment Variable Equivalent |
+| -------- | ------------------------------- |
+| `$1`     | `DOCUMENT_ID`                   |
+| `$2`     | `DOCUMENT_FILE_NAME`            |
+| `$3`     | `DOCUMENT_SOURCE_PATH`          |
+| `$4`     | `DOCUMENT_THUMBNAIL_PATH`       |
+| `$5`     | `DOCUMENT_DOWNLOAD_URL`         |
+| `$6`     | `DOCUMENT_THUMBNAIL_URL`        |
+| `$7`     | `DOCUMENT_CORRESPONDENT`        |
+| `$8`     | `DOCUMENT_TAGS`                 |
+
+**Before:**
+
+```bash
+#!/usr/bin/env bash
+DOCUMENT_ID=$1
+CORRESPONDENT=$7
+TAGS=$8
+```
+
+**After:**
+
+```bash
+#!/usr/bin/env bash
+# Use environment variables directly
+echo "Document ${DOCUMENT_ID} from ${DOCUMENT_CORRESPONDENT} tagged: ${DOCUMENT_TAGS}"
+```
+
+### Action Required
+
+Update any pre- or post-consumption scripts that read `$1`, `$2`, etc. to use the
+corresponding environment variables instead. Environment variables have been the preferred
+option since v1.8.0.
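For post-consumption scripts written in Python rather than bash, the same migration could look roughly like the sketch below. It is an illustration only, not code from the project: `read_document_env` and its `environ` parameter are hypothetical names, and only the variable names are taken from the table in the migration notes.

```python
import os
from typing import Mapping, Optional

# Variable names come from the migration table; the helper itself is hypothetical.
POST_CONSUME_VARS = (
    "DOCUMENT_ID",
    "DOCUMENT_FILE_NAME",
    "DOCUMENT_SOURCE_PATH",
    "DOCUMENT_THUMBNAIL_PATH",
    "DOCUMENT_DOWNLOAD_URL",
    "DOCUMENT_THUMBNAIL_URL",
    "DOCUMENT_CORRESPONDENT",
    "DOCUMENT_TAGS",
)


def read_document_env(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the post-consume variables, using None for any that are unset."""
    source = os.environ if environ is None else environ
    return {name: source.get(name) for name in POST_CONSUME_VARS}


if __name__ == "__main__":
    doc = read_document_env()
    print(
        f"Document {doc['DOCUMENT_ID']} from {doc['DOCUMENT_CORRESPONDENT']} "
        f"tagged: {doc['DOCUMENT_TAGS']}"
    )
```

Reading from a mapping instead of `sys.argv` means a missing value shows up as `None` under a named key, rather than shifting every later argument by one position.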
--- a/src/documents/consumer.py
+++ b/src/documents/consumer.py
@@ -313,7 +313,6 @@ class ConsumerPlugin(
            run_subprocess(
                [
                    settings.PRE_CONSUME_SCRIPT,
-                    original_file_path,
                ],
                script_env,
                self.log,
@@ -383,14 +382,6 @@ class ConsumerPlugin(
            run_subprocess(
                [
                    settings.POST_CONSUME_SCRIPT,
-                    str(document.pk),
-                    document.get_public_filename(),
-                    os.path.normpath(document.source_path),
-                    os.path.normpath(document.thumbnail_path),
-                    reverse("document-download", kwargs={"pk": document.pk}),
-                    reverse("document-thumb", kwargs={"pk": document.pk}),
-                    str(document.correspondent),
-                    str(",".join(document.tags.all().values_list("name", flat=True))),
                ],
                script_env,
                self.log,
--- a/src/documents/tests/test_api_app_config.py
+++ b/src/documents/tests/test_api_app_config.py
@@ -6,6 +6,8 @@ from unittest.mock import patch
 from django.contrib.auth.models import User
 from django.core.files.uploadedfile import SimpleUploadedFile
 from django.test import override_settings
+from PIL import Image
+from PIL.PngImagePlugin import PngInfo
 from rest_framework import status
 from rest_framework.test import APITestCase

@@ -201,6 +203,156 @@ class TestApiAppConfig(DirectoriesMixin, APITestCase):
        )
        self.assertFalse(Path(old_logo.path).exists())

+    def test_api_strips_exif_data_from_uploaded_logo(self) -> None:
+        """
+        GIVEN:
+            - A JPEG logo upload containing EXIF metadata
+        WHEN:
+            - Uploaded via PATCH to app config
+        THEN:
+            - Stored logo image has EXIF metadata removed
+        """
+        image = Image.new("RGB", (12, 12), "blue")
+        exif = Image.Exif()
+        exif[315] = "Paperless Test Author"
+
+        logo = BytesIO()
+        image.save(logo, format="JPEG", exif=exif)
+        logo.seek(0)
+
+        response = self.client.patch(
+            f"{self.ENDPOINT}1/",
+            {
+                "app_logo": SimpleUploadedFile(
+                    name="logo-with-exif.jpg",
+                    content=logo.getvalue(),
+                    content_type="image/jpeg",
+                ),
+            },
+        )
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+
+        config = ApplicationConfiguration.objects.first()
+        with Image.open(config.app_logo.path) as stored_logo:
+            stored_exif = stored_logo.getexif()
+
+        self.assertEqual(len(stored_exif), 0)
+
+    def test_api_strips_png_metadata_from_uploaded_logo(self) -> None:
+        """
+        GIVEN:
+            - A PNG logo upload containing text metadata
+        WHEN:
+            - Uploaded via PATCH to app config
+        THEN:
+            - Stored logo image has metadata removed
+        """
+        image = Image.new("RGB", (12, 12), "green")
+        pnginfo = PngInfo()
+        pnginfo.add_text("Author", "Paperless Test Author")
+
+        logo = BytesIO()
+        image.save(logo, format="PNG", pnginfo=pnginfo)
+        logo.seek(0)
+
+        response = self.client.patch(
+            f"{self.ENDPOINT}1/",
+            {
+                "app_logo": SimpleUploadedFile(
+                    name="logo-with-metadata.png",
+                    content=logo.getvalue(),
+                    content_type="image/png",
+                ),
+            },
+        )
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+
+        config = ApplicationConfiguration.objects.first()
+        with Image.open(config.app_logo.path) as stored_logo:
+            stored_text = stored_logo.text
+
+        self.assertEqual(stored_text, {})
+
+    def test_api_accepts_valid_gif_logo(self) -> None:
+        """
+        GIVEN:
+            - A valid GIF logo upload
+        WHEN:
+            - Uploaded via PATCH to app config
+        THEN:
+            - Upload succeeds
+        """
+        image = Image.new("RGB", (12, 12), "red")
+
+        logo = BytesIO()
+        image.save(logo, format="GIF", comment=b"Paperless Test Comment")
+        logo.seek(0)
+
+        response = self.client.patch(
+            f"{self.ENDPOINT}1/",
+            {
+                "app_logo": SimpleUploadedFile(
+                    name="logo.gif",
+                    content=logo.getvalue(),
+                    content_type="image/gif",
+                ),
+            },
+        )
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+
+    def test_api_rejects_invalid_raster_logo(self) -> None:
+        """
+        GIVEN:
+            - A file named as a JPEG but containing non-image payload data
+        WHEN:
+            - Uploaded via PATCH to app config
+        THEN:
+            - Upload is rejected with 400
+        """
+        response = self.client.patch(
+            f"{self.ENDPOINT}1/",
+            {
+                "app_logo": SimpleUploadedFile(
+                    name="not-an-image.jpg",
+                    content=b"<script>alert('xss')</script>",
+                    content_type="image/jpeg",
+                ),
+            },
+        )
+        self.assertEqual(response.status_code, status.HTTP_400_BAD_REQUEST)
+        self.assertIn("invalid logo image", str(response.data).lower())
+
+    @override_settings(MAX_IMAGE_PIXELS=100)
+    def test_api_rejects_logo_exceeding_max_image_pixels(self) -> None:
+        """
+        GIVEN:
+            - A raster logo larger than the configured MAX_IMAGE_PIXELS limit
+        WHEN:
+            - Uploaded via PATCH to app config
+        THEN:
+            - Upload is rejected with 400
+        """
+        image = Image.new("RGB", (12, 12), "purple")
+        logo = BytesIO()
+        image.save(logo, format="PNG")
+        logo.seek(0)
+
+        response = self.client.patch(
+            f"{self.ENDPOINT}1/",
+            {
+                "app_logo": SimpleUploadedFile(
+                    name="too-large.png",
+                    content=logo.getvalue(),
+                    content_type="image/png",
+                ),
+            },
+        )
+        self.assertEqual(response.status_code, status.HTTP_400_BAD_REQUEST)
+        self.assertIn(
+            "uploaded logo exceeds the maximum allowed image size",
+            str(response.data).lower(),
+        )
+
    def test_api_rejects_malicious_svg_logo(self) -> None:
        """
        GIVEN:
--- a/src/documents/tests/test_api_documents.py
+++ b/src/documents/tests/test_api_documents.py
@@ -18,6 +18,7 @@ from django.contrib.auth.models import Permission
 from django.contrib.auth.models import User
 from django.core import mail
 from django.core.cache import cache
+from django.core.files.uploadedfile import SimpleUploadedFile
 from django.db import DataError
 from django.test import override_settings
 from django.utils import timezone
@@ -1377,6 +1378,79 @@ class TestDocumentApi(DirectoriesMixin, DocumentConsumeDelayMixin, APITestCase):
        self.assertIsNone(overrides.document_type_id)
        self.assertIsNone(overrides.tag_ids)

+    def test_upload_with_path_traversal_filename_is_reduced_to_basename(self) -> None:
+        self.consume_file_mock.return_value = celery.result.AsyncResult(
+            id=str(uuid.uuid4()),
+        )
+
+        payload = SimpleUploadedFile(
+            "../../outside.pdf",
+            (Path(__file__).parent / "samples" / "simple.pdf").read_bytes(),
+            content_type="application/pdf",
+        )
+
+        response = self.client.post(
+            "/api/documents/post_document/",
+            {"document": payload},
+        )
+
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+        self.consume_file_mock.assert_called_once()
+
+        input_doc, overrides = self.get_last_consume_delay_call_args()
+
+        self.assertEqual(input_doc.original_file.name, "outside.pdf")
+        self.assertEqual(overrides.filename, "outside.pdf")
+        self.assertNotIn("..", input_doc.original_file.name)
+        self.assertNotIn("..", overrides.filename)
+        self.assertTrue(
+            input_doc.original_file.resolve(strict=False).is_relative_to(
+                Path(settings.SCRATCH_DIR).resolve(strict=False),
+            ),
+        )
+
+    def test_upload_with_path_traversal_content_disposition_filename_is_reduced_to_basename(
+        self,
+    ) -> None:
+        self.consume_file_mock.return_value = celery.result.AsyncResult(
+            id=str(uuid.uuid4()),
+        )
+
+        pdf_bytes = (Path(__file__).parent / "samples" / "simple.pdf").read_bytes()
+        boundary = "paperless-boundary"
+        payload = (
+            (
+                f"--{boundary}\r\n"
+                'Content-Disposition: form-data; name="document"; '
+                'filename="../../outside.pdf"\r\n'
+                "Content-Type: application/pdf\r\n\r\n"
+            ).encode()
+            + pdf_bytes
+            + f"\r\n--{boundary}--\r\n".encode()
+        )
+
+        response = self.client.generic(
+            "POST",
+            "/api/documents/post_document/",
+            payload,
+            content_type=f"multipart/form-data; boundary={boundary}",
+        )
+
+        self.assertEqual(response.status_code, status.HTTP_200_OK)
+        self.consume_file_mock.assert_called_once()
+
+        input_doc, overrides = self.get_last_consume_delay_call_args()
+
+        self.assertEqual(input_doc.original_file.name, "outside.pdf")
+        self.assertEqual(overrides.filename, "outside.pdf")
+        self.assertNotIn("..", input_doc.original_file.name)
+        self.assertNotIn("..", overrides.filename)
+        self.assertTrue(
+            input_doc.original_file.resolve(strict=False).is_relative_to(
+                Path(settings.SCRATCH_DIR).resolve(strict=False),
+            ),
+        )
+
    def test_document_filters_use_latest_version_content(self) -> None:
        root = Document.objects.create(
            title="versioned root",
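The two upload tests above check one property: a client-supplied name such as `../../outside.pdf` must end up as `outside.pdf` inside the scratch directory. The diff does not show the code that performs this reduction, so the following is only a sketch of the property under test. `safe_upload_name` and `stays_inside` are hypothetical helpers, not project functions.

```python
from pathlib import Path, PurePosixPath


def safe_upload_name(raw_name: str) -> str:
    """Drop every directory component from a client-supplied filename."""
    # Treat backslashes as separators too, so Windows-style paths are covered.
    name = PurePosixPath(raw_name.replace("\\", "/")).name
    if name in {"", ".", ".."}:
        raise ValueError(f"unusable filename: {raw_name!r}")
    return name


def stays_inside(base_dir: Path, candidate: Path) -> bool:
    """Return True when candidate resolves to a location under base_dir."""
    resolved_base = base_dir.resolve(strict=False)
    return candidate.resolve(strict=False).is_relative_to(resolved_base)
```

The second helper mirrors the final assertion in both tests, which resolves the stored path and checks it against `SCRATCH_DIR` instead of relying on string comparison alone.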
--- a/src/documents/tests/test_consumer.py
+++ b/src/documents/tests/test_consumer.py
@@ -1328,7 +1328,7 @@ class PreConsumeTestCase(DirectoriesMixin, GetConsumerMixin, TestCase):
                    environment = args[1]

                    self.assertEqual(command[0], script.name)
-                    self.assertEqual(command[1], str(self.test_file))
+                    self.assertEqual(len(command), 1)

                    subset = {
                        "DOCUMENT_SOURCE_PATH": str(c.input_doc.original_file),
@@ -1478,11 +1478,7 @@ class PostConsumeTestCase(DirectoriesMixin, GetConsumerMixin, TestCase):
                environment = args[1]

                self.assertEqual(command[0], script.name)
-                self.assertEqual(command[1], str(doc.pk))
-                self.assertEqual(command[5], f"/api/documents/{doc.pk}/download/")
-                self.assertEqual(command[6], f"/api/documents/{doc.pk}/thumb/")
-                self.assertEqual(command[7], "my_bank")
-                self.assertCountEqual(command[8].split(","), ["a", "b"])
+                self.assertEqual(len(command), 1)

                subset = {
                    "DOCUMENT_ID": str(doc.pk),
--- a/src/paperless/serialisers.py
+++ b/src/paperless/serialisers.py
@@ -1,4 +1,5 @@
 import logging
+from io import BytesIO

 import magic
 from allauth.mfa.adapter import get_adapter as get_mfa_adapter
@@ -11,13 +12,16 @@ from django.contrib.auth.models import Group
 from django.contrib.auth.models import Permission
 from django.contrib.auth.models import User
 from django.contrib.auth.password_validation import validate_password
+from django.core.files.uploadedfile import InMemoryUploadedFile
 from django.core.files.uploadedfile import UploadedFile
+from PIL import Image
 from rest_framework import serializers
 from rest_framework.authtoken.serializers import AuthTokenSerializer

 from paperless.models import ApplicationConfiguration
 from paperless.network import validate_outbound_http_url
 from paperless.validators import reject_dangerous_svg
+from paperless.validators import validate_raster_image
 from paperless_mail.serialisers import ObfuscatedPasswordField

 logger = logging.getLogger("paperless.settings")
@@ -233,9 +237,40 @@ class ApplicationConfigurationSerializer(serializers.ModelSerializer):
            instance.app_logo.delete()
        return super().update(instance, validated_data)

+    def _sanitize_raster_image(self, file: UploadedFile) -> UploadedFile:
+        image = Image.open(file)
+        try:
+            data = BytesIO()
+            image.save(data, format=image.format)
+            data.seek(0)
+
+            return InMemoryUploadedFile(
+                file=data,
+                field_name=file.field_name,
+                name=file.name,
+                content_type=file.content_type,
+                size=data.getbuffer().nbytes,
+                charset=getattr(file, "charset", None),
+            )
+        finally:
+            image.close()
+
    def validate_app_logo(self, file: UploadedFile):
-        if file and magic.from_buffer(file.read(2048), mime=True) == "image/svg+xml":
-            reject_dangerous_svg(file)
+        """
+        Validates and sanitizes the uploaded app logo image. Model field already restricts to
+        jpg/png/gif/svg.
+        """
+        if file:
+            mime_type = magic.from_buffer(file.read(2048), mime=True)
+
+            if mime_type == "image/svg+xml":
+                reject_dangerous_svg(file)
+            else:
+                validate_raster_image(file)
+
+                if mime_type in {"image/jpeg", "image/png"}:
+                    file = self._sanitize_raster_image(file)
+
        return file

    def validate_llm_endpoint(self, value: str | None) -> str | None:
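The branching in `validate_app_logo` can be summarized as a small lookup from MIME type to the checks that run. The sketch below is not project code: `logo_checks_for` is a hypothetical helper that only names the steps from the diff, which may help explain why the GIF test asserts acceptance but not metadata removal.

```python
def logo_checks_for(mime_type: str) -> list[str]:
    """Name the steps applied to an uploaded logo of the given MIME type."""
    if mime_type == "image/svg+xml":
        return ["reject_dangerous_svg"]
    steps = ["validate_raster_image"]
    if mime_type in {"image/jpeg", "image/png"}:
        # Only JPEG and PNG uploads are re-encoded in the diff above.
        steps.append("_sanitize_raster_image")
    return steps
```

Under this reading, a GIF upload is validated but stored without being re-encoded.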
--- a/src/paperless/validators.py
+++ b/src/paperless/validators.py
@@ -1,6 +1,10 @@
+from io import BytesIO
+
+from django.conf import settings
 from django.core.exceptions import ValidationError
 from django.core.files.uploadedfile import UploadedFile
 from lxml import etree
+from PIL import Image

 ALLOWED_SVG_TAGS: set[str] = {
    # Basic shapes
@@ -254,3 +258,30 @@ def reject_dangerous_svg(file: UploadedFile) -> None:
                    raise ValidationError(
                        f"URI scheme not allowed in {attr_name}: must be #anchor, relative path, or data:image/*",
                    )
+
+
+def validate_raster_image(file: UploadedFile) -> None:
+    """
+    Validates that the uploaded file is a valid raster image (JPEG, PNG, etc.)
+    and does not exceed maximum pixel limits.
+    Raises ValidationError if the image is invalid or exceeds the allowed size.
+    """
+
+    file.seek(0)
+    image_data = file.read()
+    try:
+        with Image.open(BytesIO(image_data)) as image:
+            image.verify()
+
+            if (
+                settings.MAX_IMAGE_PIXELS is not None
+                and settings.MAX_IMAGE_PIXELS > 0
+                and image.width * image.height > settings.MAX_IMAGE_PIXELS
+            ):
+                raise ValidationError(
+                    "Uploaded logo exceeds the maximum allowed image size.",
+                )
+            if image.format is None:  # pragma: no cover
+                raise ValidationError("Invalid logo image.")
+    except (OSError, Image.DecompressionBombError) as e:
+        raise ValidationError("Invalid logo image.") from e
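The size condition in `validate_raster_image` is easy to check by hand against the test that sets `MAX_IMAGE_PIXELS=100`: a 12 x 12 image has 144 pixels, which exceeds the limit. A minimal sketch of just that condition, with `exceeds_pixel_limit` as a hypothetical stand-in that takes the limit as a parameter instead of reading Django settings:

```python
from typing import Optional


def exceeds_pixel_limit(width: int, height: int, max_pixels: Optional[int]) -> bool:
    """Mirror the size condition: a limit of None or <= 0 disables the check."""
    if max_pixels is None or max_pixels <= 0:
        return False
    return width * height > max_pixels
```

Because the comparison is strict, an image with exactly `max_pixels` pixels (for example 10 x 10 against a limit of 100) is still accepted.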
Author	SHA1	Message	Date
Trenton H	b4c26fb204	Breaking: Remove the positional arguments from the pre/post consume scripts. Environment has been preferred for quite some time now	2026-04-10 13:10:45 -07:00
shamoon	fdd5e3ecb2	Update SECURITY.md	2026-04-10 12:34:47 -07:00
shamoon	df3b656352	Add tests	2026-04-10 12:06:28 -07:00
shamoon	51e721733f	Enhancement: validate and sanitize uploaded logos (#12551)	2026-04-10 11:50:58 -07:00