From a6778707d7e11cfdb25402f80526b2c605bf6744 Mon Sep 17 00:00:00 2001 From: Sean Whalen <44679+seanthegeek@users.noreply.github.com> Date: Thu, 21 May 2026 12:29:40 -0400 Subject: [PATCH] =?UTF-8?q?Finish=20forensic=E2=86=92failure=20rename:=20a?= =?UTF-8?q?rchive-folder=20migration=20+=20dashboard/doc=20cleanup=20(#776?= =?UTF-8?q?)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The forensic→failure rename (#659) left a few loose ends and one deliberate hold-back. This closes them. Leftover rename misses (broken paths / stale canonical names): - CONTRIBUTING.md, dashboard-dev-bootstrap.sh: samples/forensic/* → samples/failure/* - dashboard-dev-bootstrap.sh, dashboards/README.md: dmarc_forensic_dashboard.xml → dmarc_failure_dashboard.xml (the file was already renamed; the import path and view name were not) - docs/source/usage.md: PARSEDMARC_GENERAL_SAVE_FORENSIC → ..._SAVE_FAILURE example - samples/parsedmarc.ini: save_forensic → save_failure - pyproject.toml, README.md: canonical "failure" naming (ci.ini intentionally keeps save_forensic to smoke-test the deprecated alias.) Archive subfolder rename + on-startup migration: - New failure reports now archive to /Failure (was /Forensic). - _migrate_forensic_archive_folder() runs once on startup (best-effort): renames Forensic→Failure when no Failure folder exists yet, merges the two when both exist, no-ops when there's no legacy folder, and logs-and-skips a mailbox it can't reorganize (warn, don't crash). This consolidates pre- and post-rename failure reports into one folder, replacing the previously documented decision to keep the folder named Forensic to avoid a split archive. Uses the folder-management API (folder_exists / rename_folder / merge_folders) added in mailsuite 2.1.0; the pin is bumped to >=2.1.0. Grafana dashboard (the rename PR updated OSD/Splunk/ES-OS but not Grafana): - Forensic panel titles + the datasource label → Failure; the fo-column display label and its linked byName field-override matcher both → "Failure Policy" (changed together so the column-width override keeps matching). - dev-bootstrap Grafana ES datasource: dmarc_forensic* → dmarc_f* (matches both pre-rename dmarc_forensic* and post-rename dmarc_failure*, like the OSD/Kibana dashboards); RESEED wipe loop now also clears dmarc_failure* indices. - Removed dashboards/grafana/Grafana-DMARC_Reports.json-new_panel.json, an orphan export accidentally committed in #736 and referenced by nothing. Tests (tests/test_init.py): - TestMigrateForensicArchiveFolderMaildir: real on-disk Maildir round-trips via mailsuite's MaildirConnection (no mocks) — rename, merge, no-op, and the full get_dmarc_reports_from_mailbox orchestration. Runs in CI (no network/creds). - TestMigrateForensicArchiveFolderErrorHandling: the one path a real Maildir can't reproduce — a backend that raises mid-operation must warn, not crash. Co-authored-by: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 2 +- CONTRIBUTING.md | 2 +- README.md | 2 +- dashboard-dev-bootstrap.sh | 15 +- dashboards/README.md | 8 +- dashboards/grafana/Grafana-DMARC_Reports.json | 16 +- .../Grafana-DMARC_Reports.json-new_panel.json | 5901 ----------------- docs/source/usage.md | 2 +- parsedmarc/__init__.py | 51 +- pyproject.toml | 4 +- tests/test_init.py | 105 +- 11 files changed, 181 insertions(+), 5927 deletions(-) delete mode 100644 dashboards/grafana/Grafana-DMARC_Reports.json-new_panel.json diff --git a/CHANGELOG.md b/CHANGELOG.md index 9bb50e4..2211697 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -64,7 +64,7 @@ Forensic reports have been renamed to failure reports throughout the project to - Old function/type names preserved as aliases: `parse_forensic_report = parse_failure_report`, `ForensicReport = FailureReport`, etc. - CLI config accepts both old (`save_forensic`, `forensic_topic`) and new keys (`save_failure`, `failure_topic`) -- IMAP archive subfolder name is intentionally kept as `Forensic` (under `archive_folder`) so existing deployments don't end up with a split archive across `Forensic/` and `Failure/`. +- The archive subfolder for failure reports is now `Failure` (under `archive_folder`), renamed from `Forensic`. To avoid a split archive across `Forensic/` and `Failure/`, parsedmarc migrates an existing `Forensic` subfolder into `Failure` automatically on startup (best-effort): it renames the folder when no `Failure` folder exists yet, merges the two when both already exist, and logs-and-skips any mailbox it cannot reorganize (warn, don't crash). This consolidation uses the folder-management API (`folder_exists` / `rename_folder` / `merge_folders`) added in mailsuite 2.1.0, so the required `mailsuite` version is now `>=2.1.0`. - RFC 7489 reports parse with `None` for RFC 9990-only fields - **Updated dashboards with queries are backward compatible**: queries match data indexed under both old (`dmarc_forensic*` / `dmarc:forensic`) and new (`dmarc_failure*` / `dmarc:failure`) names, so dashboards show data from before and after the rename: - **OpenSearch Dashboards**: Index pattern uses `dmarc_f*` to match both `dmarc_forensic*` and `dmarc_failure*` diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e5db8cc..aa6b24d 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -34,7 +34,7 @@ sample reports: ```bash parsedmarc --debug -c ci.ini samples/aggregate/* -parsedmarc --debug -c ci.ini samples/forensic/* +parsedmarc --debug -c ci.ini samples/failure/* ``` To skip DNS lookups during tests, set: diff --git a/README.md b/README.md index b92a945..0f59d95 100644 --- a/README.md +++ b/README.md @@ -29,7 +29,7 @@ Please consider [sponsoring my work](https://github.com/sponsors/seanthegeek) if ## Features - Parses draft and 1.0 standard aggregate/rua DMARC reports -- Parses forensic/failure/ruf DMARC reports +- Parses failure/ruf DMARC reports (formerly called forensic reports) - Parses reports from SMTP TLS Reporting - Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API - Transparently handles gzip or zip compressed reports diff --git a/dashboard-dev-bootstrap.sh b/dashboard-dev-bootstrap.sh index c3b5216..41ab042 100755 --- a/dashboard-dev-bootstrap.sh +++ b/dashboard-dev-bootstrap.sh @@ -176,9 +176,10 @@ else echo " RESEED=1: wiping existing parsedmarc data from all backends" # ES 8.x rejects wildcard DELETEs by default # (action.destructive_requires_name=true). Enumerate the daily indexes - # parsedmarc rolls (dmarc_aggregate-YYYY-MM-DD, dmarc_forensic-..., - # smtp_tls-...) and DELETE each one explicitly. - for prefix in dmarc_aggregate dmarc_forensic smtp_tls; do + # parsedmarc rolls (dmarc_aggregate-YYYY-MM-DD, dmarc_failure-..., + # smtp_tls-...) and DELETE each one explicitly. dmarc_forensic-* is the + # pre-rename failure index family, kept here so RESEED clears old data. + for prefix in dmarc_aggregate dmarc_failure dmarc_forensic smtp_tls; do for idx in $(curl -sf "http://localhost:9200/_cat/indices/${prefix}*?h=index" 2>/dev/null); do curl -sS -X DELETE "http://localhost:9200/${idx}" >/dev/null 2>&1 || true done @@ -231,7 +232,7 @@ else samples/aggregate/protection.outlook.com!example.com!1711756800!1711843200.xml samples/aggregate/usssa.com!example.com!1538784000!1538870399.xml samples/aggregate/veeam.com!example.com!1530133200!1530219600.xml - samples/forensic/*.eml + samples/failure/*.eml samples/smtp_tls/*.json samples/smtp_tls/google.com_smtp_tls_report.eml ) @@ -268,7 +269,9 @@ log "Configuring Grafana datasources" # Two Elasticsearch datasources, one per index family, matching the dashboard's # template variables (dmarc-ag and dmarc-fo). Skipped when already present. declare -a GF_DS_NAMES=("dmarc-ag" "dmarc-fo") -declare -a GF_DS_INDEX=("dmarc_aggregate*" "dmarc_forensic*") +# dmarc_f* matches both pre-rename dmarc_forensic* and post-rename +# dmarc_failure* indices, mirroring the OpenSearch/Kibana dashboards. +declare -a GF_DS_INDEX=("dmarc_aggregate*" "dmarc_f*") declare -a GF_DS_TIME=("date_range" "arrival_date") for i in 0 1; do name="${GF_DS_NAMES[$i]}" @@ -335,7 +338,7 @@ splunk_import_view() { } splunk_import_view dmarc_aggregate dashboards/splunk/dmarc_aggregate_dashboard.xml -splunk_import_view dmarc_forensic dashboards/splunk/dmarc_forensic_dashboard.xml +splunk_import_view dmarc_failure dashboards/splunk/dmarc_failure_dashboard.xml splunk_import_view smtp_tls dashboards/splunk/smtp_tls_dashboard.xml cat < None: + """Consolidate a pre-rename ``/Forensic`` subfolder into + ``/Failure``. + + Before failure reports were renamed from "forensic" reports, they were + archived under ``/Forensic``; they now go to + ``/Failure``. This best-effort, run-on-startup migration + moves any pre-existing legacy archive into the new location so reports + filed before and after the rename live in the same folder. + + It is a no-op when there is no legacy ``Forensic`` folder (the common + case), and never raises: a mailbox that cannot be reorganized is logged + and skipped, consistent with the rest of parsedmarc's mailbox handling + (warn, don't crash). Uses the folder-management API added in mailsuite + 2.1.0 (``folder_exists`` / ``rename_folder`` / ``merge_folders``). + """ + old_folder = "{0}/Forensic".format(archive_folder) + new_folder = "{0}/Failure".format(archive_folder) + try: + if not connection.folder_exists(old_folder): + return + if connection.folder_exists(new_folder): + # Both exist (e.g. a partial earlier migration, or a manually + # created Failure folder): move the legacy folder's messages into + # the new one and drop the now-empty legacy folder. + connection.merge_folders(old_folder, new_folder) + logger.info( + "Merged legacy archive folder {0} into {1}".format( + old_folder, new_folder + ) + ) + else: + connection.rename_folder(old_folder, new_folder) + logger.info( + "Renamed legacy archive folder {0} to {1}".format( + old_folder, new_folder + ) + ) + except Exception as error: + logger.warning( + "Could not migrate legacy archive folder {0} to {1}: {2}".format( + old_folder, new_folder, error + ) + ) + + def get_dmarc_reports_from_mailbox( connection: MailboxConnection, *, @@ -2134,7 +2182,7 @@ def get_dmarc_reports_from_mailbox( failure_report_msg_uids = [] smtp_tls_msg_uids = [] aggregate_reports_folder = "{0}/Aggregate".format(archive_folder) - failure_reports_folder = "{0}/Forensic".format(archive_folder) + failure_reports_folder = "{0}/Failure".format(archive_folder) smtp_tls_reports_folder = "{0}/SMTP-TLS".format(archive_folder) invalid_reports_folder = "{0}/Invalid".format(archive_folder) @@ -2144,6 +2192,7 @@ def get_dmarc_reports_from_mailbox( smtp_tls_reports = results["smtp_tls_reports"].copy() if not test and create_folders: + _migrate_forensic_archive_folder(connection, archive_folder) connection.create_folder(archive_folder) connection.create_folder(aggregate_reports_folder) connection.create_folder(failure_reports_folder) diff --git a/pyproject.toml b/pyproject.toml index 1c55123..cb6f9a1 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -10,7 +10,7 @@ name = "parsedmarc" dynamic = [ "version", ] -description = "A Python package and CLI for parsing aggregate and forensic DMARC reports" +description = "A Python package and CLI for parsing aggregate, failure, and SMTP TLS DMARC reports" readme = "README.md" license = "Apache-2.0" authors = [ @@ -41,7 +41,7 @@ dependencies = [ "expiringdict>=1.1.4", "kafka-python-ng>=2.2.2", "lxml>=4.4.0", - "mailsuite[gmail,msgraph]>=2.0.2", + "mailsuite[gmail,msgraph]>=2.1.0", "maxminddb>=2.0.0", "opensearch-py>=2.4.2,<=4.0.0", "publicsuffixlist>=0.10.0", diff --git a/tests/test_init.py b/tests/test_init.py index 586e2e0..1e1ea32 100644 --- a/tests/test_init.py +++ b/tests/test_init.py @@ -6,18 +6,22 @@ extract_report, get_dmarc_reports_from_mbox, and the CSV / JSON renderers. """ import json +import mailbox import os import unittest from datetime import datetime, timedelta, timezone from glob import glob from io import BytesIO from pathlib import Path -from tempfile import NamedTemporaryFile +from shutil import rmtree +from tempfile import NamedTemporaryFile, mkdtemp from typing import BinaryIO, cast +from unittest.mock import MagicMock from lxml import etree # type: ignore[import-untyped] import parsedmarc +from parsedmarc.mail import MaildirConnection from parsedmarc.types import AggregateReport, FailureReport, SMTPTLSReport # Detect if running in GitHub Actions to skip DNS lookups @@ -2330,6 +2334,105 @@ class TestGetDmarcReportsFromMailboxValidation(unittest.TestCase): self.assertIn("connection", str(ctx.exception).lower()) +class TestMigrateForensicArchiveFolderErrorHandling(unittest.TestCase): + """The one migration scenario a real on-disk Maildir can't reproduce: a + backend that raises mid-operation. _migrate_forensic_archive_folder must + warn and continue (warn, don't crash) so a mailbox it cannot reorganize + doesn't abort the whole run. + + The rename / merge / no-op behavior is covered for real (no mocks) in + TestMigrateForensicArchiveFolderMaildir; only this failure path needs a + mock, to force a folder operation to raise.""" + + def test_backend_error_is_warned_not_raised(self): + conn = MagicMock() + conn.folder_exists.side_effect = lambda name: name.endswith("/Forensic") + conn.rename_folder.side_effect = RuntimeError("server said no") + with self.assertLogs("parsedmarc.log", level="WARNING") as cm: + parsedmarc._migrate_forensic_archive_folder(conn, "Archive") + self.assertTrue( + any("Could not migrate" in line for line in cm.output), + cm.output, + ) + + +class TestMigrateForensicArchiveFolderMaildir(unittest.TestCase): + """End-to-end migration against a real on-disk Maildir via mailsuite's + MaildirConnection — no mocks. This exercises the actual mailsuite 2.1.0 + folder API (folder_exists / rename_folder / merge_folders / delete_folder) + and the on-disk result, so it would catch a real behavioral break that a + mock-based test cannot (e.g. a signature mismatch, or messages left behind + in the legacy folder).""" + + def setUp(self): + self._tmp = mkdtemp() + self.addCleanup(rmtree, self._tmp, ignore_errors=True) + self.conn = MaildirConnection(self._tmp, maildir_create=True) + # Parent must exist before nested subfolders, as get_dmarc_reports_- + # from_mailbox creates it (create_folder(archive_folder)) first. + self.conn.create_folder("Archive") + + def _seed(self, folder, subject): + """Drop a real RFC 822 message into an on-disk Maildir subfolder.""" + self.conn.create_folder(folder) + box = mailbox.Maildir(os.path.join(self._tmp, "." + folder)) + box.add( + mailbox.MaildirMessage( + "From: reporter@example.com\n" + "To: dmarc@example.org\n" + f"Subject: {subject}\n\nbody\n" + ) + ) + box.flush() + + def test_rename_moves_legacy_folder_and_its_messages(self): + """Only the legacy folder exists: it (and the message inside it) is + renamed to Failure, leaving nothing behind in Forensic.""" + self._seed("Archive/Forensic", "legacy failure report") + self.assertTrue(self.conn.folder_exists("Archive/Forensic")) + self.assertFalse(self.conn.folder_exists("Archive/Failure")) + + parsedmarc._migrate_forensic_archive_folder(self.conn, "Archive") + + self.assertFalse(self.conn.folder_exists("Archive/Forensic")) + self.assertTrue(self.conn.folder_exists("Archive/Failure")) + self.assertEqual(len(self.conn.fetch_messages("Archive/Failure")), 1) + + def test_merge_consolidates_messages_when_both_exist(self): + """Both folders exist: the legacy folder's messages are merged into + the existing Failure folder (which keeps its own), and the emptied + legacy folder is deleted.""" + self._seed("Archive/Failure", "post-rename failure report") + self._seed("Archive/Forensic", "legacy failure report") + + parsedmarc._migrate_forensic_archive_folder(self.conn, "Archive") + + self.assertFalse(self.conn.folder_exists("Archive/Forensic")) + self.assertTrue(self.conn.folder_exists("Archive/Failure")) + self.assertEqual(len(self.conn.fetch_messages("Archive/Failure")), 2) + + def test_no_legacy_folder_is_noop(self): + """No legacy folder (the common case): nothing is created or changed.""" + parsedmarc._migrate_forensic_archive_folder(self.conn, "Archive") + self.assertFalse(self.conn.folder_exists("Archive/Forensic")) + self.assertFalse(self.conn.folder_exists("Archive/Failure")) + + def test_orchestration_migrates_before_creating_folders(self): + """get_dmarc_reports_from_mailbox runs the migration *before* it + creates folders: a seeded legacy Forensic folder ends up consolidated + into the newly-created Failure subfolder (message and all), not split + across the two. Driven through the real orchestration with an empty + INBOX, so no parsing or network occurs.""" + self._seed("Archive/Forensic", "legacy failure report") + + result = parsedmarc.get_dmarc_reports_from_mailbox(connection=self.conn) + + self.assertFalse(self.conn.folder_exists("Archive/Forensic")) + self.assertTrue(self.conn.folder_exists("Archive/Failure")) + self.assertEqual(len(self.conn.fetch_messages("Archive/Failure")), 1) + self.assertEqual(result["failure_reports"], []) + + class TestEmailResultsErrorBranches(unittest.TestCase): """email_results requires mail_to to be a list — this is enforced by an assert. A regression that dropped the assert would mean the