Finish forensic→failure rename: archive-folder migration + dashboard/doc cleanup (#776)

The forensic→failure rename (#659) left a few loose ends and one deliberate hold-back. This closes them.

Leftover rename misses (broken paths / stale canonical names):
- CONTRIBUTING.md, dashboard-dev-bootstrap.sh: samples/forensic/* → samples/failure/*
- dashboard-dev-bootstrap.sh, dashboards/README.md: dmarc_forensic_dashboard.xml → dmarc_failure_dashboard.xml (the file was already renamed; the import path and view name were not)
- docs/source/usage.md: PARSEDMARC_GENERAL_SAVE_FORENSIC → ..._SAVE_FAILURE example
- samples/parsedmarc.ini: save_forensic → save_failure
- pyproject.toml, README.md: canonical "failure" naming
(ci.ini intentionally keeps save_forensic to smoke-test the deprecated alias.)

Archive subfolder rename + on-startup migration:
- New failure reports now archive to <archive>/Failure (was <archive>/Forensic).
- _migrate_forensic_archive_folder() runs once on startup (best-effort): renames Forensic→Failure when no Failure folder exists yet, merges the two when both exist, no-ops when there's no legacy folder, and logs-and-skips a mailbox it can't reorganize (warn, don't crash).
This consolidates pre- and post-rename failure reports into one folder, replacing the previously documented decision to keep the folder named Forensic to avoid a split archive. Uses the folder-management API (folder_exists / rename_folder / merge_folders) added in mailsuite 2.1.0; the pin is bumped to >=2.1.0.

Grafana dashboard (the rename PR updated OSD/Splunk/ES-OS but not Grafana):
- Forensic panel titles + the datasource label → Failure; the fo-column display label and its linked byName field-override matcher both → "Failure Policy" (changed together so the column-width override keeps matching).
- dev-bootstrap Grafana ES datasource: dmarc_forensic* → dmarc_f* (matches both pre-rename dmarc_forensic* and post-rename dmarc_failure*, like the OSD/Kibana dashboards); RESEED wipe loop now also clears dmarc_failure* indices.
- Removed dashboards/grafana/Grafana-DMARC_Reports.json-new_panel.json, an orphan export accidentally committed in #736 and referenced by nothing.

Tests (tests/test_init.py):
- TestMigrateForensicArchiveFolderMaildir: real on-disk Maildir round-trips via mailsuite's MaildirConnection (no mocks), covering rename, merge, no-op, and the full get_dmarc_reports_from_mailbox orchestration. Runs in CI (no network/creds).
- TestMigrateForensicArchiveFolderErrorHandling: the one path a real Maildir can't reproduce, where a backend that raises mid-operation must warn, not crash.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
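
Illustration of the dmarc_f* pattern described above. This is a minimal sketch, not part of the change itself: the index names are hypothetical examples shaped like the daily indexes named in the bootstrap script comments, and Python's fnmatchcase is only a stand-in for how a trailing-* index pattern selects names.

```python
from fnmatch import fnmatchcase

# Hypothetical daily index names (assumed for illustration only), following
# the <family>-YYYY-MM-DD shape mentioned in the bootstrap script comments.
indices = [
    "dmarc_aggregate-2024-03-30",
    "dmarc_forensic-2024-03-30",  # written before the rename
    "dmarc_failure-2026-05-21",   # written after the rename
    "smtp_tls-2024-03-30",
]


def matching(pattern: str) -> list[str]:
    """Return the index names selected by a glob-style pattern."""
    return [name for name in indices if fnmatchcase(name, pattern)]


# The combined pattern picks up both the legacy and the renamed family.
failure_family = matching("dmarc_f*")

# The old pattern alone would miss everything indexed after the rename.
legacy_only = matching("dmarc_forensic*")

print(failure_family)
print(legacy_only)
```

The pattern stays narrow enough to exclude dmarc_aggregate* because that family diverges at the character after "dmarc_".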
2026-07-09 18:25:09 +00:00 · 2026-05-21 12:29:40 -04:00
parent 327fcff2b9
commit a6778707d7
11 changed files with 181 additions and 5927 deletions
@@ -64,7 +64,7 @@ Forensic reports have been renamed to failure reports throughout the project to

 - Old function/type names preserved as aliases: `parse_forensic_report = parse_failure_report`, `ForensicReport = FailureReport`, etc.
 - CLI config accepts both old (`save_forensic`, `forensic_topic`) and new keys (`save_failure`, `failure_topic`)
- IMAP archive subfolder name is intentionally kept as `Forensic` (under `archive_folder`) so existing deployments don't end up with a split archive across `Forensic/` and `Failure/`.
+- The archive subfolder for failure reports is now `Failure` (under `archive_folder`), renamed from `Forensic`. To avoid a split archive across `Forensic/` and `Failure/`, parsedmarc migrates an existing `Forensic` subfolder into `Failure` automatically on startup (best-effort): it renames the folder when no `Failure` folder exists yet, merges the two when both already exist, and logs-and-skips any mailbox it cannot reorganize (warn, don't crash). This consolidation uses the folder-management API (`folder_exists` / `rename_folder` / `merge_folders`) added in mailsuite 2.1.0, so the required `mailsuite` version is now `>=2.1.0`.
 - RFC 7489 reports parse with `None` for RFC 9990-only fields
 - **Updated dashboards with queries are backward compatible**: queries match data indexed under both old (`dmarc_forensic*` / `dmarc:forensic`) and new (`dmarc_failure*` / `dmarc:failure`) names, so dashboards show data from before and after the rename:
  - **OpenSearch Dashboards**: Index pattern uses `dmarc_f*` to match both `dmarc_forensic*` and `dmarc_failure*`
@@ -34,7 +34,7 @@ sample reports:

 ```bash
 parsedmarc --debug -c ci.ini samples/aggregate/*
-parsedmarc --debug -c ci.ini samples/forensic/*
+parsedmarc --debug -c ci.ini samples/failure/*
 ```

 To skip DNS lookups during tests, set:
@@ -29,7 +29,7 @@ Please consider [sponsoring my work](https://github.com/sponsors/seanthegeek) if
 ## Features

 - Parses draft and 1.0 standard aggregate/rua DMARC reports
- Parses forensic/failure/ruf DMARC reports
+- Parses failure/ruf DMARC reports (formerly called forensic reports)
 - Parses reports from SMTP TLS Reporting
 - Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
 - Transparently handles gzip or zip compressed reports
@@ -176,9 +176,10 @@ else
        echo "  RESEED=1: wiping existing parsedmarc data from all backends"
        # ES 8.x rejects wildcard DELETEs by default
        # (action.destructive_requires_name=true). Enumerate the daily indexes
-        # parsedmarc rolls (dmarc_aggregate-YYYY-MM-DD, dmarc_forensic-...,
-        # smtp_tls-...) and DELETE each one explicitly.
-        for prefix in dmarc_aggregate dmarc_forensic smtp_tls; do
+        # parsedmarc rolls (dmarc_aggregate-YYYY-MM-DD, dmarc_failure-...,
+        # smtp_tls-...) and DELETE each one explicitly. dmarc_forensic-* is the
+        # pre-rename failure index family, kept here so RESEED clears old data.
+        for prefix in dmarc_aggregate dmarc_failure dmarc_forensic smtp_tls; do
            for idx in $(curl -sf "http://localhost:9200/_cat/indices/${prefix}*?h=index" 2>/dev/null); do
                curl -sS -X DELETE "http://localhost:9200/${idx}" >/dev/null 2>&1 || true
            done
@@ -231,7 +232,7 @@ else
        samples/aggregate/protection.outlook.com!example.com!1711756800!1711843200.xml
        samples/aggregate/usssa.com!example.com!1538784000!1538870399.xml
        samples/aggregate/veeam.com!example.com!1530133200!1530219600.xml
-        samples/forensic/*.eml
+        samples/failure/*.eml
        samples/smtp_tls/*.json
        samples/smtp_tls/google.com_smtp_tls_report.eml
    )
@@ -268,7 +269,9 @@ log "Configuring Grafana datasources"
 # Two Elasticsearch datasources, one per index family, matching the dashboard's
 # template variables (dmarc-ag and dmarc-fo). Skipped when already present.
 declare -a GF_DS_NAMES=("dmarc-ag" "dmarc-fo")
-declare -a GF_DS_INDEX=("dmarc_aggregate*" "dmarc_forensic*")
+# dmarc_f* matches both pre-rename dmarc_forensic* and post-rename
+# dmarc_failure* indices, mirroring the OpenSearch/Kibana dashboards.
+declare -a GF_DS_INDEX=("dmarc_aggregate*" "dmarc_f*")
 declare -a GF_DS_TIME=("date_range" "arrival_date")
 for i in 0 1; do
    name="${GF_DS_NAMES[$i]}"
@@ -335,7 +338,7 @@ splunk_import_view() {
 }

 splunk_import_view dmarc_aggregate dashboards/splunk/dmarc_aggregate_dashboard.xml
-splunk_import_view dmarc_forensic  dashboards/splunk/dmarc_forensic_dashboard.xml
+splunk_import_view dmarc_failure   dashboards/splunk/dmarc_failure_dashboard.xml
 splunk_import_view smtp_tls        dashboards/splunk/smtp_tls_dashboard.xml

 cat <<EOF
@@ -4,7 +4,7 @@ This directory holds the dashboard sources that ship with parsedmarc:

 - [opensearch/opensearch_dashboards.ndjson](opensearch/opensearch_dashboards.ndjson) — the source-of-truth saved-objects export. It is imported into both **OpenSearch Dashboards** and **Kibana** (the file format is compatible with both).
 - [grafana/Grafana-DMARC_Reports.json](grafana/Grafana-DMARC_Reports.json) — the Grafana dashboard, with two Elasticsearch datasources (`dmarc-ag`, `dmarc-fo`).
- [splunk/](splunk/) — three Splunk dashboard XML views (`dmarc_aggregate`, `dmarc_forensic`, `smtp_tls`).
+- [splunk/](splunk/) — three Splunk dashboard XML views (`dmarc_aggregate`, `dmarc_failure`, `smtp_tls`).

 Edits to any of these files should be exported from a running instance after authoring the change in the UI, not hand-edited (with the occasional exception of small XML tweaks for Splunk).

@@ -73,12 +73,12 @@ OSD imports default to the `global_tenant` so other admins on the instance can s
 2. **Dashboard settings → JSON Model**, copy the JSON, save it to [grafana/Grafana-DMARC_Reports.json](grafana/Grafana-DMARC_Reports.json).
 3. Re-run the bootstrap script.

-The bootstrap script provisions two `elasticsearch` datasources (`dmarc-ag` for `dmarc_aggregate*`, `dmarc-fo` for `dmarc_forensic*`) on first run; existing datasources are left alone.
+The bootstrap script provisions two `elasticsearch` datasources (`dmarc-ag` for `dmarc_aggregate*`, `dmarc-fo` for `dmarc_f*`, which matches both pre-rename `dmarc_forensic*` and post-rename `dmarc_failure*`) on first run; existing datasources are left alone.

 ### Splunk

 1. Edit the dashboard at http://localhost:8000/ inside the **DMARC** app.
-2. Open the dashboard's **Source** view, copy the XML, and paste it over the matching file in [splunk/](splunk/) (`dmarc_aggregate_dashboard.xml`, `dmarc_forensic_dashboard.xml`, or `smtp_tls_dashboard.xml`).
+2. Open the dashboard's **Source** view, copy the XML, and paste it over the matching file in [splunk/](splunk/) (`dmarc_aggregate_dashboard.xml`, `dmarc_failure_dashboard.xml`, or `smtp_tls_dashboard.xml`).
 3. Re-run the bootstrap script. It re-imports each view via `DELETE` + `POST` to the splunkd management API.

 ## Reseeding sample data
@@ -87,7 +87,7 @@ The bootstrap script provisions two `elasticsearch` datasources (`dmarc-ag` for
 RESEED=1 ./dashboard-dev-bootstrap.sh
 ```

-Wipes every `dmarc_aggregate*` / `dmarc_forensic*` / `smtp_tls*` index from ES and OS, drops and recreates the Splunk `email` index, then re-runs the parsedmarc CLI against the curated sample list. Use this after changing parsedmarc's enrichment or output schemas.
+Wipes every `dmarc_aggregate*` / `dmarc_failure*` / `dmarc_forensic*` / `smtp_tls*` index from ES and OS, drops and recreates the Splunk `email` index, then re-runs the parsedmarc CLI against the curated sample list. Use this after changing parsedmarc's enrichment or output schemas.

 ## Tearing the stack down

@@ -2751,7 +2751,7 @@
          {
            "matcher": {
              "id": "byName",
-              "options": "Forensic Policy"
+              "options": "Failure Policy"
            },
            "properties": [
              {
@@ -2957,7 +2957,7 @@
              "published_policy.adkim.keyword": "DKIM Policy",
              "published_policy.aspf.keyword": "SPF Policy",
              "published_policy.domain.keyword": "Domain",
-              "published_policy.fo.keyword": "Forensic Policy",
+              "published_policy.fo.keyword": "Failure Policy",
              "published_policy.p.keyword": "Policy",
              "published_policy.pct": "Percentage",
              "published_policy.sp.keyword": "Subdomain Policy",
@@ -3685,7 +3685,7 @@
      },
      "id": 32,
      "panels": [],
-      "title": "DMARC Forensic",
+      "title": "DMARC Failure",
      "type": "row"
    },
    {
@@ -3929,7 +3929,7 @@
      ],
      "timeFrom": null,
      "timeShift": null,
-      "title": "Forensic Samples",
+      "title": "Failure Samples",
      "transformations": [
        {
          "id": "organize",
@@ -4026,7 +4026,7 @@
      ],
      "timeFrom": null,
      "timeShift": null,
-      "title": "Forensic Sample Sources by Country",
+      "title": "Failure Sample Sources by Country",
      "type": "geomap",
      "options": {
        "view": {
@@ -4207,7 +4207,7 @@
      ],
      "timeFrom": null,
      "timeShift": null,
-      "title": "DMARC Forensic Sample Source Countries",
+      "title": "DMARC Failure Sample Source Countries",
      "transformations": [
        {
          "id": "organize",
@@ -4402,7 +4402,7 @@
      ],
      "timeFrom": null,
      "timeShift": null,
-      "title": "Top 1000 Forensic Sample Source IP Addresses",
+      "title": "Top 1000 Failure Sample Source IP Addresses",
      "transformations": [
        {
          "id": "organize",
@@ -4807,7 +4807,7 @@
        "error": null,
        "hide": 2,
        "includeAll": false,
-        "label": "Datasource: Forensic",
+        "label": "Datasource: Failure",
        "multi": false,
        "name": "datasourcefo",
        "options": [],
@@ -662,7 +662,7 @@ services:
      PARSEDMARC_MAILBOX_WATCH: "true"
      PARSEDMARC_ELASTICSEARCH_HOSTS: http://elasticsearch:9200
      PARSEDMARC_GENERAL_SAVE_AGGREGATE: "true"
-      PARSEDMARC_GENERAL_SAVE_FORENSIC: "true"
+      PARSEDMARC_GENERAL_SAVE_FAILURE: "true"
 ```

 ### Docker secrets (`_FILE` suffix)
@@ -2064,6 +2064,54 @@ def get_dmarc_reports_from_mbox(
    }


+def _migrate_forensic_archive_folder(
+    connection: MailboxConnection, archive_folder: str
+) -> None:
+    """Consolidate a pre-rename ``<archive>/Forensic`` subfolder into
+    ``<archive>/Failure``.
+
+    Before failure reports were renamed from "forensic" reports, they were
+    archived under ``<archive_folder>/Forensic``; they now go to
+    ``<archive_folder>/Failure``. This best-effort, run-on-startup migration
+    moves any pre-existing legacy archive into the new location so reports
+    filed before and after the rename live in the same folder.
+
+    It is a no-op when there is no legacy ``Forensic`` folder (the common
+    case), and never raises: a mailbox that cannot be reorganized is logged
+    and skipped, consistent with the rest of parsedmarc's mailbox handling
+    (warn, don't crash). Uses the folder-management API added in mailsuite
+    2.1.0 (``folder_exists`` / ``rename_folder`` / ``merge_folders``).
+    """
+    old_folder = "{0}/Forensic".format(archive_folder)
+    new_folder = "{0}/Failure".format(archive_folder)
+    try:
+        if not connection.folder_exists(old_folder):
+            return
+        if connection.folder_exists(new_folder):
+            # Both exist (e.g. a partial earlier migration, or a manually
+            # created Failure folder): move the legacy folder's messages into
+            # the new one and drop the now-empty legacy folder.
+            connection.merge_folders(old_folder, new_folder)
+            logger.info(
+                "Merged legacy archive folder {0} into {1}".format(
+                    old_folder, new_folder
+                )
+            )
+        else:
+            connection.rename_folder(old_folder, new_folder)
+            logger.info(
+                "Renamed legacy archive folder {0} to {1}".format(
+                    old_folder, new_folder
+                )
+            )
+    except Exception as error:
+        logger.warning(
+            "Could not migrate legacy archive folder {0} to {1}: {2}".format(
+                old_folder, new_folder, error
+            )
+        )
+
+
 def get_dmarc_reports_from_mailbox(
    connection: MailboxConnection,
    *,
@@ -2134,7 +2182,7 @@ def get_dmarc_reports_from_mailbox(
    failure_report_msg_uids = []
    smtp_tls_msg_uids = []
    aggregate_reports_folder = "{0}/Aggregate".format(archive_folder)
-    failure_reports_folder = "{0}/Forensic".format(archive_folder)
+    failure_reports_folder = "{0}/Failure".format(archive_folder)
    smtp_tls_reports_folder = "{0}/SMTP-TLS".format(archive_folder)
    invalid_reports_folder = "{0}/Invalid".format(archive_folder)

@@ -2144,6 +2192,7 @@ def get_dmarc_reports_from_mailbox(
        smtp_tls_reports = results["smtp_tls_reports"].copy()

    if not test and create_folders:
+        _migrate_forensic_archive_folder(connection, archive_folder)
        connection.create_folder(archive_folder)
        connection.create_folder(aggregate_reports_folder)
        connection.create_folder(failure_reports_folder)
@@ -10,7 +10,7 @@ name = "parsedmarc"
 dynamic = [
    "version",
 ]
-description = "A Python package and CLI for parsing aggregate and forensic DMARC reports"
+description = "A Python package and CLI for parsing aggregate, failure, and SMTP TLS DMARC reports"
 readme = "README.md"
 license = "Apache-2.0"
 authors = [
@@ -41,7 +41,7 @@ dependencies = [
    "expiringdict>=1.1.4",
    "kafka-python-ng>=2.2.2",
    "lxml>=4.4.0",
-    "mailsuite[gmail,msgraph]>=2.0.2",
+    "mailsuite[gmail,msgraph]>=2.1.0",
    "maxminddb>=2.0.0",
    "opensearch-py>=2.4.2,<=4.0.0",
    "publicsuffixlist>=0.10.0",
@@ -6,18 +6,22 @@ extract_report, get_dmarc_reports_from_mbox, and the CSV / JSON renderers.
 """

 import json
+import mailbox
 import os
 import unittest
 from datetime import datetime, timedelta, timezone
 from glob import glob
 from io import BytesIO
 from pathlib import Path
-from tempfile import NamedTemporaryFile
+from shutil import rmtree
+from tempfile import NamedTemporaryFile, mkdtemp
 from typing import BinaryIO, cast
+from unittest.mock import MagicMock

 from lxml import etree  # type: ignore[import-untyped]

 import parsedmarc
+from parsedmarc.mail import MaildirConnection
 from parsedmarc.types import AggregateReport, FailureReport, SMTPTLSReport

 # Detect if running in GitHub Actions to skip DNS lookups
@@ -2330,6 +2334,105 @@ class TestGetDmarcReportsFromMailboxValidation(unittest.TestCase):
        self.assertIn("connection", str(ctx.exception).lower())


+class TestMigrateForensicArchiveFolderErrorHandling(unittest.TestCase):
+    """The one migration scenario a real on-disk Maildir can't reproduce: a
+    backend that raises mid-operation. _migrate_forensic_archive_folder must
+    warn and continue (warn, don't crash) so a mailbox it cannot reorganize
+    doesn't abort the whole run.
+
+    The rename / merge / no-op behavior is covered for real (no mocks) in
+    TestMigrateForensicArchiveFolderMaildir; only this failure path needs a
+    mock, to force a folder operation to raise."""
+
+    def test_backend_error_is_warned_not_raised(self):
+        conn = MagicMock()
+        conn.folder_exists.side_effect = lambda name: name.endswith("/Forensic")
+        conn.rename_folder.side_effect = RuntimeError("server said no")
+        with self.assertLogs("parsedmarc.log", level="WARNING") as cm:
+            parsedmarc._migrate_forensic_archive_folder(conn, "Archive")
+        self.assertTrue(
+            any("Could not migrate" in line for line in cm.output),
+            cm.output,
+        )
+
+
+class TestMigrateForensicArchiveFolderMaildir(unittest.TestCase):
+    """End-to-end migration against a real on-disk Maildir via mailsuite's
+    MaildirConnection — no mocks. This exercises the actual mailsuite 2.1.0
+    folder API (folder_exists / rename_folder / merge_folders / delete_folder)
+    and the on-disk result, so it would catch a real behavioral break that a
+    mock-based test cannot (e.g. a signature mismatch, or messages left behind
+    in the legacy folder)."""
+
+    def setUp(self):
+        self._tmp = mkdtemp()
+        self.addCleanup(rmtree, self._tmp, ignore_errors=True)
+        self.conn = MaildirConnection(self._tmp, maildir_create=True)
+        # Parent must exist before nested subfolders, as get_dmarc_reports_-
+        # from_mailbox creates it (create_folder(archive_folder)) first.
+        self.conn.create_folder("Archive")
+
+    def _seed(self, folder, subject):
+        """Drop a real RFC 822 message into an on-disk Maildir subfolder."""
+        self.conn.create_folder(folder)
+        box = mailbox.Maildir(os.path.join(self._tmp, "." + folder))
+        box.add(
+            mailbox.MaildirMessage(
+                "From: reporter@example.com\n"
+                "To: dmarc@example.org\n"
+                f"Subject: {subject}\n\nbody\n"
+            )
+        )
+        box.flush()
+
+    def test_rename_moves_legacy_folder_and_its_messages(self):
+        """Only the legacy folder exists: it (and the message inside it) is
+        renamed to Failure, leaving nothing behind in Forensic."""
+        self._seed("Archive/Forensic", "legacy failure report")
+        self.assertTrue(self.conn.folder_exists("Archive/Forensic"))
+        self.assertFalse(self.conn.folder_exists("Archive/Failure"))
+
+        parsedmarc._migrate_forensic_archive_folder(self.conn, "Archive")
+
+        self.assertFalse(self.conn.folder_exists("Archive/Forensic"))
+        self.assertTrue(self.conn.folder_exists("Archive/Failure"))
+        self.assertEqual(len(self.conn.fetch_messages("Archive/Failure")), 1)
+
+    def test_merge_consolidates_messages_when_both_exist(self):
+        """Both folders exist: the legacy folder's messages are merged into
+        the existing Failure folder (which keeps its own), and the emptied
+        legacy folder is deleted."""
+        self._seed("Archive/Failure", "post-rename failure report")
+        self._seed("Archive/Forensic", "legacy failure report")
+
+        parsedmarc._migrate_forensic_archive_folder(self.conn, "Archive")
+
+        self.assertFalse(self.conn.folder_exists("Archive/Forensic"))
+        self.assertTrue(self.conn.folder_exists("Archive/Failure"))
+        self.assertEqual(len(self.conn.fetch_messages("Archive/Failure")), 2)
+
+    def test_no_legacy_folder_is_noop(self):
+        """No legacy folder (the common case): nothing is created or changed."""
+        parsedmarc._migrate_forensic_archive_folder(self.conn, "Archive")
+        self.assertFalse(self.conn.folder_exists("Archive/Forensic"))
+        self.assertFalse(self.conn.folder_exists("Archive/Failure"))
+
+    def test_orchestration_migrates_before_creating_folders(self):
+        """get_dmarc_reports_from_mailbox runs the migration *before* it
+        creates folders: a seeded legacy Forensic folder ends up consolidated
+        into the newly-created Failure subfolder (message and all), not split
+        across the two. Driven through the real orchestration with an empty
+        INBOX, so no parsing or network occurs."""
+        self._seed("Archive/Forensic", "legacy failure report")
+
+        result = parsedmarc.get_dmarc_reports_from_mailbox(connection=self.conn)
+
+        self.assertFalse(self.conn.folder_exists("Archive/Forensic"))
+        self.assertTrue(self.conn.folder_exists("Archive/Failure"))
+        self.assertEqual(len(self.conn.fetch_messages("Archive/Failure")), 1)
+        self.assertEqual(result["failure_reports"], [])
+
+
 class TestEmailResultsErrorBranches(unittest.TestCase):
    """email_results requires mail_to to be a list — this is enforced
    by an assert. A regression that dropped the assert would mean the