Finish forensic→failure rename: archive-folder migration + dashboard/doc cleanup (#776)

The forensic→failure rename (#659) left a few loose ends and one deliberate
hold-back. This closes them.

Leftover rename misses (broken paths / stale canonical names):
- CONTRIBUTING.md, dashboard-dev-bootstrap.sh: samples/forensic/* → samples/failure/*
- dashboard-dev-bootstrap.sh, dashboards/README.md: dmarc_forensic_dashboard.xml
  → dmarc_failure_dashboard.xml (the file was already renamed; the import path
  and view name were not)
- docs/source/usage.md: PARSEDMARC_GENERAL_SAVE_FORENSIC → ..._SAVE_FAILURE example
- samples/parsedmarc.ini: save_forensic → save_failure
- pyproject.toml, README.md: canonical "failure" naming
(ci.ini intentionally keeps save_forensic to smoke-test the deprecated alias.)
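
As an illustration of what a deprecated-alias smoke test exercises, here is a minimal sketch of key-alias resolution. The helper `read_flag`, the plain-dict config section, and the deprecation warning are all assumptions made for this example; parsedmarc's real config loader may resolve `save_forensic` differently.

```python
import warnings


def read_flag(section, new_key, old_key, default=False):
    """Hypothetical helper: prefer new_key, fall back to the deprecated old_key."""
    if new_key in section:
        return section[new_key]
    if old_key in section:
        # Assumption: the alias still works but announces its deprecation.
        warnings.warn(
            f"{old_key} is deprecated; use {new_key}",
            DeprecationWarning,
            stacklevel=2,
        )
        return section[old_key]
    return default


# A config that still uses the old key keeps working.
legacy_section = {"save_forensic": True}
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    legacy_value = read_flag(legacy_section, "save_failure", "save_forensic")
```

Checking the new key first means a config that sets both keys follows the canonical name.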

Archive subfolder rename + on-startup migration:
- New failure reports now archive to <archive>/Failure (was <archive>/Forensic).
- _migrate_forensic_archive_folder() runs once on startup (best-effort):
  renames Forensic→Failure when no Failure folder exists yet, merges the two
  when both exist, no-ops when there's no legacy folder, and logs-and-skips a
  mailbox it can't reorganize (warn, don't crash). This consolidates pre- and
  post-rename failure reports into one folder, replacing the previously
  documented decision to keep the folder named Forensic to avoid a split
  archive. Uses the folder-management API (folder_exists / rename_folder /
  merge_folders) added in mailsuite 2.1.0; the pin is bumped to >=2.1.0.
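
The four cases above can be sketched against an in-memory stand-in. `FakeMailbox` and `migrate` are hypothetical names for this example; only the method names `folder_exists`, `rename_folder`, and `merge_folders` come from this commit, and their merge semantics (move every message, then drop the source folder) are assumed here rather than taken from mailsuite.

```python
import logging

logger = logging.getLogger("migration_sketch")


class FakeMailbox:
    """Hypothetical in-memory backend mapping folder name -> list of messages."""

    def __init__(self, folders=None):
        self.folders = {name: list(msgs) for name, msgs in (folders or {}).items()}

    def folder_exists(self, name):
        return name in self.folders

    def rename_folder(self, old, new):
        self.folders[new] = self.folders.pop(old)

    def merge_folders(self, source, destination):
        # Assumed semantics: move every message, then drop the source folder.
        self.folders[destination].extend(self.folders.pop(source))


def migrate(connection, archive_folder):
    """Return which branch ran: 'noop', 'merged', 'renamed', or 'skipped'."""
    old = f"{archive_folder}/Forensic"
    new = f"{archive_folder}/Failure"
    try:
        if not connection.folder_exists(old):
            return "noop"  # no legacy folder: nothing to do
        if connection.folder_exists(new):
            connection.merge_folders(old, new)
            return "merged"
        connection.rename_folder(old, new)
        return "renamed"
    except Exception as error:  # best-effort: warn, don't crash
        logger.warning("Could not migrate %s to %s: %s", old, new, error)
        return "skipped"


box = FakeMailbox({"Archive/Forensic": ["legacy"], "Archive/Failure": ["recent"]})
outcome = migrate(box, "Archive")
```

Returning the branch name is a convenience for this sketch only; the function added in this commit returns `None` and reports the outcome through the logger.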

Grafana dashboard (the rename PR updated OSD/Splunk/ES-OS but not Grafana):
- Forensic panel titles + the datasource label → Failure; the fo-column display
  label and its linked byName field-override matcher both → "Failure Policy"
  (changed together so the column-width override keeps matching).
- dev-bootstrap Grafana ES datasource: dmarc_forensic* → dmarc_f* (matches both
  pre-rename dmarc_forensic* and post-rename dmarc_failure*, like the OSD/Kibana
  dashboards); RESEED wipe loop now also clears dmarc_failure* indices.
- Removed dashboards/grafana/Grafana-DMARC_Reports.json-new_panel.json, an
  orphan export accidentally committed in #736 and referenced by nothing.
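
To see why the `dmarc_f*` pattern covers both index families, the match can be approximated with shell-style globbing. This is a sketch under two assumptions: the index names below are made-up examples following the `prefix-YYYY-MM-DD` shape the bootstrap script describes, and Python's `fnmatchcase` stands in for Elasticsearch's own wildcard expansion.

```python
from fnmatch import fnmatchcase

PATTERN = "dmarc_f*"

# Hypothetical daily index names, one per family.
indices = [
    "dmarc_forensic-2024-01-01",  # pre-rename failure index
    "dmarc_failure-2026-05-21",   # post-rename failure index
    "dmarc_aggregate-2026-05-21",
    "smtp_tls-2026-05-21",
]

# Keep only the names the datasource pattern would select.
matched = [name for name in indices if fnmatchcase(name, PATTERN)]
```

The pattern works only because no other index family starts with `dmarc_f`; a future family with that prefix would be picked up as well.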

Tests (tests/test_init.py):
- TestMigrateForensicArchiveFolderMaildir: real on-disk Maildir round-trips via
  mailsuite's MaildirConnection (no mocks) — rename, merge, no-op, and the full
  get_dmarc_reports_from_mailbox orchestration. Runs in CI (no network/creds).
- TestMigrateForensicArchiveFolderErrorHandling: the one path a real Maildir
  can't reproduce — a backend that raises mid-operation must warn, not crash.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sean Whalen
2026-05-21 12:29:40 -04:00
committed by GitHub
parent 327fcff2b9
commit a6778707d7
11 changed files with 181 additions and 5927 deletions
+1 -1
@@ -64,7 +64,7 @@ Forensic reports have been renamed to failure reports throughout the project to
- Old function/type names preserved as aliases: `parse_forensic_report = parse_failure_report`, `ForensicReport = FailureReport`, etc.
- CLI config accepts both old (`save_forensic`, `forensic_topic`) and new keys (`save_failure`, `failure_topic`)
- IMAP archive subfolder name is intentionally kept as `Forensic` (under `archive_folder`) so existing deployments don't end up with a split archive across `Forensic/` and `Failure/`.
- The archive subfolder for failure reports is now `Failure` (under `archive_folder`), renamed from `Forensic`. To avoid a split archive across `Forensic/` and `Failure/`, parsedmarc migrates an existing `Forensic` subfolder into `Failure` automatically on startup (best-effort): it renames the folder when no `Failure` folder exists yet, merges the two when both already exist, and logs-and-skips any mailbox it cannot reorganize (warn, don't crash). This consolidation uses the folder-management API (`folder_exists` / `rename_folder` / `merge_folders`) added in mailsuite 2.1.0, so the required `mailsuite` version is now `>=2.1.0`.
- RFC 7489 reports parse with `None` for RFC 9990-only fields
- **Updated dashboard queries are backward compatible**: queries match data indexed under both old (`dmarc_forensic*` / `dmarc:forensic`) and new (`dmarc_failure*` / `dmarc:failure`) names, so dashboards show data from before and after the rename:
- **OpenSearch Dashboards**: Index pattern uses `dmarc_f*` to match both `dmarc_forensic*` and `dmarc_failure*`
+1 -1
@@ -34,7 +34,7 @@ sample reports:
```bash
parsedmarc --debug -c ci.ini samples/aggregate/*
parsedmarc --debug -c ci.ini samples/forensic/*
parsedmarc --debug -c ci.ini samples/failure/*
```
To skip DNS lookups during tests, set:
+1 -1
@@ -29,7 +29,7 @@ Please consider [sponsoring my work](https://github.com/sponsors/seanthegeek) if
## Features
- Parses draft and 1.0 standard aggregate/rua DMARC reports
- Parses forensic/failure/ruf DMARC reports
- Parses failure/ruf DMARC reports (formerly called forensic reports)
- Parses reports from SMTP TLS Reporting
- Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
- Transparently handles gzip or zip compressed reports
+9 -6
@@ -176,9 +176,10 @@ else
echo " RESEED=1: wiping existing parsedmarc data from all backends"
# ES 8.x rejects wildcard DELETEs by default
# (action.destructive_requires_name=true). Enumerate the daily indexes
# parsedmarc rolls (dmarc_aggregate-YYYY-MM-DD, dmarc_forensic-...,
# smtp_tls-...) and DELETE each one explicitly.
for prefix in dmarc_aggregate dmarc_forensic smtp_tls; do
# parsedmarc rolls (dmarc_aggregate-YYYY-MM-DD, dmarc_failure-...,
# smtp_tls-...) and DELETE each one explicitly. dmarc_forensic-* is the
# pre-rename failure index family, kept here so RESEED clears old data.
for prefix in dmarc_aggregate dmarc_failure dmarc_forensic smtp_tls; do
for idx in $(curl -sf "http://localhost:9200/_cat/indices/${prefix}*?h=index" 2>/dev/null); do
curl -sS -X DELETE "http://localhost:9200/${idx}" >/dev/null 2>&1 || true
done
@@ -231,7 +232,7 @@ else
samples/aggregate/protection.outlook.com!example.com!1711756800!1711843200.xml
samples/aggregate/usssa.com!example.com!1538784000!1538870399.xml
samples/aggregate/veeam.com!example.com!1530133200!1530219600.xml
samples/forensic/*.eml
samples/failure/*.eml
samples/smtp_tls/*.json
samples/smtp_tls/google.com_smtp_tls_report.eml
)
@@ -268,7 +269,9 @@ log "Configuring Grafana datasources"
# Two Elasticsearch datasources, one per index family, matching the dashboard's
# template variables (dmarc-ag and dmarc-fo). Skipped when already present.
declare -a GF_DS_NAMES=("dmarc-ag" "dmarc-fo")
declare -a GF_DS_INDEX=("dmarc_aggregate*" "dmarc_forensic*")
# dmarc_f* matches both pre-rename dmarc_forensic* and post-rename
# dmarc_failure* indices, mirroring the OpenSearch/Kibana dashboards.
declare -a GF_DS_INDEX=("dmarc_aggregate*" "dmarc_f*")
declare -a GF_DS_TIME=("date_range" "arrival_date")
for i in 0 1; do
name="${GF_DS_NAMES[$i]}"
@@ -335,7 +338,7 @@ splunk_import_view() {
}
splunk_import_view dmarc_aggregate dashboards/splunk/dmarc_aggregate_dashboard.xml
splunk_import_view dmarc_forensic dashboards/splunk/dmarc_forensic_dashboard.xml
splunk_import_view dmarc_failure dashboards/splunk/dmarc_failure_dashboard.xml
splunk_import_view smtp_tls dashboards/splunk/smtp_tls_dashboard.xml
cat <<EOF
+4 -4
@@ -4,7 +4,7 @@ This directory holds the dashboard sources that ship with parsedmarc:
- [opensearch/opensearch_dashboards.ndjson](opensearch/opensearch_dashboards.ndjson) — the source-of-truth saved-objects export. It is imported into both **OpenSearch Dashboards** and **Kibana** (the file format is compatible with both).
- [grafana/Grafana-DMARC_Reports.json](grafana/Grafana-DMARC_Reports.json) — the Grafana dashboard, with two Elasticsearch datasources (`dmarc-ag`, `dmarc-fo`).
- [splunk/](splunk/) — three Splunk dashboard XML views (`dmarc_aggregate`, `dmarc_forensic`, `smtp_tls`).
- [splunk/](splunk/) — three Splunk dashboard XML views (`dmarc_aggregate`, `dmarc_failure`, `smtp_tls`).
Edits to any of these files should be exported from a running instance after authoring the change in the UI, not hand-edited (with the occasional exception of small XML tweaks for Splunk).
@@ -73,12 +73,12 @@ OSD imports default to the `global_tenant` so other admins on the instance can s
2. **Dashboard settings → JSON Model**, copy the JSON, save it to [grafana/Grafana-DMARC_Reports.json](grafana/Grafana-DMARC_Reports.json).
3. Re-run the bootstrap script.
The bootstrap script provisions two `elasticsearch` datasources (`dmarc-ag` for `dmarc_aggregate*`, `dmarc-fo` for `dmarc_forensic*`) on first run; existing datasources are left alone.
The bootstrap script provisions two `elasticsearch` datasources (`dmarc-ag` for `dmarc_aggregate*`, `dmarc-fo` for `dmarc_f*`, which matches both pre-rename `dmarc_forensic*` and post-rename `dmarc_failure*`) on first run; existing datasources are left alone.
### Splunk
1. Edit the dashboard at http://localhost:8000/ inside the **DMARC** app.
2. Open the dashboard's **Source** view, copy the XML, and paste it over the matching file in [splunk/](splunk/) (`dmarc_aggregate_dashboard.xml`, `dmarc_forensic_dashboard.xml`, or `smtp_tls_dashboard.xml`).
2. Open the dashboard's **Source** view, copy the XML, and paste it over the matching file in [splunk/](splunk/) (`dmarc_aggregate_dashboard.xml`, `dmarc_failure_dashboard.xml`, or `smtp_tls_dashboard.xml`).
3. Re-run the bootstrap script. It re-imports each view via `DELETE` + `POST` to the splunkd management API.
## Reseeding sample data
@@ -87,7 +87,7 @@ The bootstrap script provisions two `elasticsearch` datasources (`dmarc-ag` for
RESEED=1 ./dashboard-dev-bootstrap.sh
```
Wipes every `dmarc_aggregate*` / `dmarc_forensic*` / `smtp_tls*` index from ES and OS, drops and recreates the Splunk `email` index, then re-runs the parsedmarc CLI against the curated sample list. Use this after changing parsedmarc's enrichment or output schemas.
Wipes every `dmarc_aggregate*` / `dmarc_failure*` / `dmarc_forensic*` / `smtp_tls*` index from ES and OS, drops and recreates the Splunk `email` index, then re-runs the parsedmarc CLI against the curated sample list. Use this after changing parsedmarc's enrichment or output schemas.
## Tearing the stack down
@@ -2751,7 +2751,7 @@
{
"matcher": {
"id": "byName",
"options": "Forensic Policy"
"options": "Failure Policy"
},
"properties": [
{
@@ -2957,7 +2957,7 @@
"published_policy.adkim.keyword": "DKIM Policy",
"published_policy.aspf.keyword": "SPF Policy",
"published_policy.domain.keyword": "Domain",
"published_policy.fo.keyword": "Forensic Policy",
"published_policy.fo.keyword": "Failure Policy",
"published_policy.p.keyword": "Policy",
"published_policy.pct": "Percentage",
"published_policy.sp.keyword": "Subdomain Policy",
@@ -3685,7 +3685,7 @@
},
"id": 32,
"panels": [],
"title": "DMARC Forensic",
"title": "DMARC Failure",
"type": "row"
},
{
@@ -3929,7 +3929,7 @@
],
"timeFrom": null,
"timeShift": null,
"title": "Forensic Samples",
"title": "Failure Samples",
"transformations": [
{
"id": "organize",
@@ -4026,7 +4026,7 @@
],
"timeFrom": null,
"timeShift": null,
"title": "Forensic Sample Sources by Country",
"title": "Failure Sample Sources by Country",
"type": "geomap",
"options": {
"view": {
@@ -4207,7 +4207,7 @@
],
"timeFrom": null,
"timeShift": null,
"title": "DMARC Forensic Sample Source Countries",
"title": "DMARC Failure Sample Source Countries",
"transformations": [
{
"id": "organize",
@@ -4402,7 +4402,7 @@
],
"timeFrom": null,
"timeShift": null,
"title": "Top 1000 Forensic Sample Source IP Addresses",
"title": "Top 1000 Failure Sample Source IP Addresses",
"transformations": [
{
"id": "organize",
@@ -4807,7 +4807,7 @@
"error": null,
"hide": 2,
"includeAll": false,
"label": "Datasource: Forensic",
"label": "Datasource: Failure",
"multi": false,
"name": "datasourcefo",
"options": [],
File diff suppressed because it is too large
+1 -1
@@ -662,7 +662,7 @@ services:
PARSEDMARC_MAILBOX_WATCH: "true"
PARSEDMARC_ELASTICSEARCH_HOSTS: http://elasticsearch:9200
PARSEDMARC_GENERAL_SAVE_AGGREGATE: "true"
PARSEDMARC_GENERAL_SAVE_FORENSIC: "true"
PARSEDMARC_GENERAL_SAVE_FAILURE: "true"
```
### Docker secrets (`_FILE` suffix)
+50 -1
@@ -2064,6 +2064,54 @@ def get_dmarc_reports_from_mbox(
}
def _migrate_forensic_archive_folder(
    connection: MailboxConnection, archive_folder: str
) -> None:
    """Consolidate a pre-rename ``<archive>/Forensic`` subfolder into
    ``<archive>/Failure``.

    Before failure reports were renamed from "forensic" reports, they were
    archived under ``<archive_folder>/Forensic``; they now go to
    ``<archive_folder>/Failure``. This best-effort, run-on-startup migration
    moves any pre-existing legacy archive into the new location so reports
    filed before and after the rename live in the same folder.

    It is a no-op when there is no legacy ``Forensic`` folder (the common
    case), and never raises: a mailbox that cannot be reorganized is logged
    and skipped, consistent with the rest of parsedmarc's mailbox handling
    (warn, don't crash). Uses the folder-management API added in mailsuite
    2.1.0 (``folder_exists`` / ``rename_folder`` / ``merge_folders``).
    """
    old_folder = "{0}/Forensic".format(archive_folder)
    new_folder = "{0}/Failure".format(archive_folder)
    try:
        if not connection.folder_exists(old_folder):
            return
        if connection.folder_exists(new_folder):
            # Both exist (e.g. a partial earlier migration, or a manually
            # created Failure folder): move the legacy folder's messages into
            # the new one and drop the now-empty legacy folder.
            connection.merge_folders(old_folder, new_folder)
            logger.info(
                "Merged legacy archive folder {0} into {1}".format(
                    old_folder, new_folder
                )
            )
        else:
            connection.rename_folder(old_folder, new_folder)
            logger.info(
                "Renamed legacy archive folder {0} to {1}".format(
                    old_folder, new_folder
                )
            )
    except Exception as error:
        logger.warning(
            "Could not migrate legacy archive folder {0} to {1}: {2}".format(
                old_folder, new_folder, error
            )
        )


def get_dmarc_reports_from_mailbox(
    connection: MailboxConnection,
    *,
@@ -2134,7 +2182,7 @@ def get_dmarc_reports_from_mailbox(
failure_report_msg_uids = []
smtp_tls_msg_uids = []
aggregate_reports_folder = "{0}/Aggregate".format(archive_folder)
failure_reports_folder = "{0}/Forensic".format(archive_folder)
failure_reports_folder = "{0}/Failure".format(archive_folder)
smtp_tls_reports_folder = "{0}/SMTP-TLS".format(archive_folder)
invalid_reports_folder = "{0}/Invalid".format(archive_folder)
@@ -2144,6 +2192,7 @@ def get_dmarc_reports_from_mailbox(
smtp_tls_reports = results["smtp_tls_reports"].copy()
if not test and create_folders:
_migrate_forensic_archive_folder(connection, archive_folder)
connection.create_folder(archive_folder)
connection.create_folder(aggregate_reports_folder)
connection.create_folder(failure_reports_folder)
+2 -2
@@ -10,7 +10,7 @@ name = "parsedmarc"
dynamic = [
"version",
]
description = "A Python package and CLI for parsing aggregate and forensic DMARC reports"
description = "A Python package and CLI for parsing DMARC aggregate and failure reports and SMTP TLS reports"
readme = "README.md"
license = "Apache-2.0"
authors = [
@@ -41,7 +41,7 @@ dependencies = [
"expiringdict>=1.1.4",
"kafka-python-ng>=2.2.2",
"lxml>=4.4.0",
"mailsuite[gmail,msgraph]>=2.0.2",
"mailsuite[gmail,msgraph]>=2.1.0",
"maxminddb>=2.0.0",
"opensearch-py>=2.4.2,<=4.0.0",
"publicsuffixlist>=0.10.0",
+104 -1
@@ -6,18 +6,22 @@ extract_report, get_dmarc_reports_from_mbox, and the CSV / JSON renderers.
"""
import json
import mailbox
import os
import unittest
from datetime import datetime, timedelta, timezone
from glob import glob
from io import BytesIO
from pathlib import Path
from tempfile import NamedTemporaryFile
from shutil import rmtree
from tempfile import NamedTemporaryFile, mkdtemp
from typing import BinaryIO, cast
from unittest.mock import MagicMock
from lxml import etree # type: ignore[import-untyped]
import parsedmarc
from parsedmarc.mail import MaildirConnection
from parsedmarc.types import AggregateReport, FailureReport, SMTPTLSReport
# Detect if running in GitHub Actions to skip DNS lookups
@@ -2330,6 +2334,105 @@ class TestGetDmarcReportsFromMailboxValidation(unittest.TestCase):
self.assertIn("connection", str(ctx.exception).lower())
class TestMigrateForensicArchiveFolderErrorHandling(unittest.TestCase):
    """The one migration scenario a real on-disk Maildir can't reproduce: a
    backend that raises mid-operation. _migrate_forensic_archive_folder must
    warn and continue (warn, don't crash) so a mailbox it cannot reorganize
    doesn't abort the whole run.

    The rename / merge / no-op behavior is covered for real (no mocks) in
    TestMigrateForensicArchiveFolderMaildir; only this failure path needs a
    mock, to force a folder operation to raise."""

    def test_backend_error_is_warned_not_raised(self):
        conn = MagicMock()
        conn.folder_exists.side_effect = lambda name: name.endswith("/Forensic")
        conn.rename_folder.side_effect = RuntimeError("server said no")
        with self.assertLogs("parsedmarc.log", level="WARNING") as cm:
            parsedmarc._migrate_forensic_archive_folder(conn, "Archive")
        self.assertTrue(
            any("Could not migrate" in line for line in cm.output),
            cm.output,
        )
class TestMigrateForensicArchiveFolderMaildir(unittest.TestCase):
    """End-to-end migration against a real on-disk Maildir via mailsuite's
    MaildirConnection, with no mocks. This exercises the actual mailsuite 2.1.0
    folder API (folder_exists / rename_folder / merge_folders / delete_folder)
    and the on-disk result, so it would catch a real behavioral break that a
    mock-based test cannot (e.g. a signature mismatch, or messages left behind
    in the legacy folder)."""

    def setUp(self):
        self._tmp = mkdtemp()
        self.addCleanup(rmtree, self._tmp, ignore_errors=True)
        self.conn = MaildirConnection(self._tmp, maildir_create=True)
        # The parent must exist before nested subfolders are created;
        # get_dmarc_reports_from_mailbox likewise calls
        # create_folder(archive_folder) first.
        self.conn.create_folder("Archive")

    def _seed(self, folder, subject):
        """Drop a real RFC 822 message into an on-disk Maildir subfolder."""
        self.conn.create_folder(folder)
        box = mailbox.Maildir(os.path.join(self._tmp, "." + folder))
        box.add(
            mailbox.MaildirMessage(
                "From: reporter@example.com\n"
                "To: dmarc@example.org\n"
                f"Subject: {subject}\n\nbody\n"
            )
        )
        box.flush()

    def test_rename_moves_legacy_folder_and_its_messages(self):
        """Only the legacy folder exists: it (and the message inside it) is
        renamed to Failure, leaving nothing behind in Forensic."""
        self._seed("Archive/Forensic", "legacy failure report")
        self.assertTrue(self.conn.folder_exists("Archive/Forensic"))
        self.assertFalse(self.conn.folder_exists("Archive/Failure"))
        parsedmarc._migrate_forensic_archive_folder(self.conn, "Archive")
        self.assertFalse(self.conn.folder_exists("Archive/Forensic"))
        self.assertTrue(self.conn.folder_exists("Archive/Failure"))
        self.assertEqual(len(self.conn.fetch_messages("Archive/Failure")), 1)

    def test_merge_consolidates_messages_when_both_exist(self):
        """Both folders exist: the legacy folder's messages are merged into
        the existing Failure folder (which keeps its own), and the emptied
        legacy folder is deleted."""
        self._seed("Archive/Failure", "post-rename failure report")
        self._seed("Archive/Forensic", "legacy failure report")
        parsedmarc._migrate_forensic_archive_folder(self.conn, "Archive")
        self.assertFalse(self.conn.folder_exists("Archive/Forensic"))
        self.assertTrue(self.conn.folder_exists("Archive/Failure"))
        self.assertEqual(len(self.conn.fetch_messages("Archive/Failure")), 2)

    def test_no_legacy_folder_is_noop(self):
        """No legacy folder (the common case): nothing is created or changed."""
        parsedmarc._migrate_forensic_archive_folder(self.conn, "Archive")
        self.assertFalse(self.conn.folder_exists("Archive/Forensic"))
        self.assertFalse(self.conn.folder_exists("Archive/Failure"))

    def test_orchestration_migrates_before_creating_folders(self):
        """get_dmarc_reports_from_mailbox runs the migration *before* it
        creates folders: a seeded legacy Forensic folder ends up consolidated
        into the newly-created Failure subfolder (message and all), not split
        across the two. Driven through the real orchestration with an empty
        INBOX, so no parsing or network occurs."""
        self._seed("Archive/Forensic", "legacy failure report")
        result = parsedmarc.get_dmarc_reports_from_mailbox(connection=self.conn)
        self.assertFalse(self.conn.folder_exists("Archive/Forensic"))
        self.assertTrue(self.conn.folder_exists("Archive/Failure"))
        self.assertEqual(len(self.conn.fetch_messages("Archive/Failure")), 1)
        self.assertEqual(result["failure_reports"], [])
class TestEmailResultsErrorBranches(unittest.TestCase):
"""email_results requires mail_to to be a list — this is enforced
by an assert. A regression that dropped the assert would mean the