Land 10.0.3 changes on master (#785)

PR #784 was stacked on the #783 branch and its base was never retargeted to
master, so it merged into fix/mailsuite-2.2.1-empty-address instead of master.
master therefore has 10.0.2 (#783's squash) but is missing the 10.0.3 changes.

This re-lands exactly that delta: the Reply-To/Delivered-To parser fix, the
ES/OS Reply-To header flattening, and the Splunk/OpenSearch/Grafana failure
dashboard fixes, with the version bumped to 10.0.3. No mailsuite re-bump is
needed (the >=2.2.1 floor is already on master from 10.0.2).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sean Whalen
2026-05-24 13:54:40 -04:00
committed by GitHub
parent 2c8b2c0f14
commit e104f1118c
12 changed files with 185 additions and 9 deletions
+15
@@ -1,5 +1,20 @@
# Changelog
## 10.0.3
### Bug fixes
- Fix `Reply-To` (and `Delivered-To`) addresses being dropped from failure-report samples. `parse_email()` looked up mailparser's underscored `reply_to` / `delivered_to` keys, but `mail_json` names those headers `reply-to` / `delivered-to`, so the lookup always missed and `parsed_sample["reply_to"]` was always `[]` regardless of the message. Failure samples now carry their parsed Reply-To addresses through to JSON/CSV output and the Elasticsearch/OpenSearch nested `sample.reply_to` field.
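The key mismatch can be reproduced with a plain dictionary. The sketch below is illustrative only: `mail_json` is a hand-written stand-in for the parser output (its shape is an assumption, not captured from mailparser), and `parse_addresses` is a hypothetical helper rather than the project's `parse_email_address()`.

```python
from email.utils import parseaddr

# Hand-written stand-in for the parsed message (assumed shape).
mail_json = {
    "from": "Sender <sender@example.com>",
    "reply-to": ["Real One <real@phish.example>"],
}


def parse_addresses(values):
    """Hypothetical helper: split each 'Name <addr>' string into a dict."""
    parsed = []
    for value in values:
        display_name, address = parseaddr(value)
        parsed.append({"display_name": display_name or None, "address": address})
    return parsed


# Old lookup: the underscored key is never present, so the result is always [].
old_reply_to = (
    parse_addresses(mail_json["reply_to"]) if "reply_to" in mail_json else []
)

# New lookup: read the hyphenated key and drop it so only "reply_to" remains.
body = dict(mail_json)
new_reply_to = parse_addresses(body.pop("reply-to")) if "reply-to" in body else []
body["reply_to"] = new_reply_to
```

With the hyphenated key, `new_reply_to` holds one parsed address, while `old_reply_to` stays empty for the same input.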
### Dashboard fixes
All failure (RUF) dashboards now render every displayed address (`From`, `To`, `Reply-To`) the same way: `Display Name <addr>`, or the bare address when there is no display name. The format is assembled at query time from fields (`display_name` / `address`) that already exist on previously indexed reports, so the panels work on historical data, not only on reports stored after upgrading. There is one unavoidable exception: `Reply-To` appears only for reports **parsed by 10.0.3 or later**. Earlier versions discarded it at parse time (the bug above), so it is absent from older stored reports; recovering it requires re-parsing the original samples.
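As a rough illustration of that display rule, here is a minimal Python sketch. The dashboards themselves implement it in their own query languages; `format_address` is a hypothetical name used only for this example.

```python
def format_address(entry):
    """Return 'Display Name <addr>', or the bare address without a name.

    `entry` is assumed to be a dict with `display_name` and `address` keys.
    """
    display_name = (entry.get("display_name") or "").strip()
    address = entry.get("address") or ""
    if display_name:
        return f"{display_name} <{address}>"
    return address


samples = [
    {"display_name": "Real One", "address": "real@phish.example"},
    {"display_name": None, "address": "two@phish.example"},
]
rendered = [format_address(sample) for sample in samples]
```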
- **Splunk failure dashboard:** the email-samples panel showed empty `from` and `reply_to` columns because it renamed `parsed_sample.headers.from{}{}` / `parsed_sample.headers.reply-to{}{}`, which are mis-cased (the header keys are `From` / `Reply-To`) and array-of-array shaped. The panel now builds `from` and `reply_to` with an `eval` that produces `display_name <address>` and falls back to the bare `address` when there is no display name. (A multi-address `Reply-To` falls back to addresses only; this is a Splunk multi-value rendering limitation, not data loss.)
- **OpenSearch failure dashboard:** the column labelled `reply_to` aggregated `sample.headers.in-reply-to.keyword` — the `In-Reply-To` threading header, not the Reply-To address. It now aggregates `sample.headers.reply-to.keyword`, and that field was added to the `dmarc_f*` index pattern. To support it, the Elasticsearch/OpenSearch failure writer now flattens the `Reply-To` header into a display string on `sample.headers["reply-to"]`, mirroring the existing `From` / `To` handling. (Re-import the dashboards, or refresh the `dmarc_f*` index pattern, to pick up the new field.)
- **Grafana (Elasticsearch) dashboard:** the *Failure Samples* panel already read `sample.headers.reply-to.keyword`, but that field previously held the raw `[[name, address]]` array (split into separate name/address terms). The failure-writer flattening above makes the existing `ReplyTo` column render a clean `Name <address>` string — no dashboard change required.
- **Grafana (PostgreSQL) dashboard:** the *Failure Reports* panel did not surface the message `From` header or `Reply-To` at all (it showed only the envelope `Mail From` / `Rcpt To`). Added `From` (from `sample_from`) and `Reply To` (aggregated from `dmarc_failure_sample_address`) columns.
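The header flattening described for the Elasticsearch/OpenSearch writer can be sketched as follows. This is a standalone approximation, not the writer's code: `flatten_address_header` is a hypothetical name, the input is assumed to be a list of `[display_name, address]` pairs, and the empty-list guard is an addition made for this sketch.

```python
def flatten_address_header(value):
    """Flatten a [[name, address], ...] header into one display string.

    Only the first pair is used, as with the From/To handling.
    """
    if not value:
        # Guard added for this sketch only.
        return ""
    name, address = value[0]
    if name == "":
        return address
    return f"{name} <{address}>"


flattened = flatten_address_header([["Real One", "real@phish.example"]])
bare = flatten_address_header([["", "two@phish.example"]])
```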
## 10.0.2
### Changes
@@ -1391,7 +1391,7 @@
 },
 "editorMode": "code",
 "format": "table",
-"rawSql": "SELECT\n f.arrival_date_utc AS \"Arrival Date\",\n COALESCE(f.feedback_type, '') AS \"Feedback Type\",\n COALESCE(f.reported_domain, '') AS \"Reported Domain\",\n COALESCE(f.source_ip_address::TEXT, '') AS \"Source IP\",\n COALESCE(f.source_reverse_dns, '') AS \"Reverse DNS\",\n COALESCE(f.source_base_domain, '') AS \"Source Domain\",\n COALESCE(f.source_country, '') AS \"Country\",\n COALESCE(array_to_string(f.auth_failure, ', '), '') AS \"Auth Failure\",\n COALESCE(f.authentication_results, '') AS \"Auth Results\",\n COALESCE(f.delivery_result, '') AS \"Delivery Result\",\n COALESCE(f.dkim_domain, '') AS \"DKIM Domain\",\n COALESCE(f.sample_subject, '') AS \"Subject\",\n COALESCE(f.original_mail_from, '') AS \"Mail From\",\n COALESCE(f.original_rcpt_to, '') AS \"Rcpt To\"\nFROM dmarc_failure_report f\nWHERE f.arrival_date_utc IS NOT NULL\n AND f.arrival_date_utc::TIMESTAMPTZ BETWEEN $__timeFrom() AND $__timeTo()\nORDER BY f.id DESC",
+"rawSql": "SELECT\n f.arrival_date_utc AS \"Arrival Date\",\n COALESCE(f.feedback_type, '') AS \"Feedback Type\",\n COALESCE(f.reported_domain, '') AS \"Reported Domain\",\n COALESCE(f.source_ip_address::TEXT, '') AS \"Source IP\",\n COALESCE(f.source_reverse_dns, '') AS \"Reverse DNS\",\n COALESCE(f.source_base_domain, '') AS \"Source Domain\",\n COALESCE(f.source_country, '') AS \"Country\",\n COALESCE(array_to_string(f.auth_failure, ', '), '') AS \"Auth Failure\",\n COALESCE(f.authentication_results, '') AS \"Auth Results\",\n COALESCE(f.delivery_result, '') AS \"Delivery Result\",\n COALESCE(f.dkim_domain, '') AS \"DKIM Domain\",\n COALESCE(\n CASE WHEN COALESCE(f.sample_from->>'display_name', '') <> ''\n THEN (f.sample_from->>'display_name') || ' <' || (f.sample_from->>'address') || '>'\n ELSE f.sample_from->>'address'\n END, '') AS \"From\",\n COALESCE((\n SELECT string_agg(\n CASE WHEN COALESCE(a.display_name, '') <> ''\n THEN a.display_name || ' <' || a.address || '>'\n ELSE a.address END, ', ')\n FROM dmarc_failure_sample_address a\n WHERE a.report_id = f.id AND a.address_type = 'reply_to'\n ), '') AS \"Reply To\",\n COALESCE(f.sample_subject, '') AS \"Subject\",\n COALESCE(f.original_mail_from, '') AS \"Mail From\",\n COALESCE(f.original_rcpt_to, '') AS \"Rcpt To\"\nFROM dmarc_failure_report f\nWHERE f.arrival_date_utc IS NOT NULL\n AND f.arrival_date_utc::TIMESTAMPTZ BETWEEN $__timeFrom() AND $__timeTo()\nORDER BY f.id DESC",
 "refId": "A"
 }
 ]
File diff suppressed because one or more lines are too long
@@ -61,7 +61,9 @@
 <title>DMARC failure email samples</title>
 <table>
 <search base="base_search">
-<query>| rename parsed_sample.headers.from{}{} as from, parsed_sample.headers.Subject as subject, parsed_sample.headers.reply-to{}{} as reply_to
+<query>| eval from=coalesce('parsed_sample.from.display_name'." &lt;".'parsed_sample.from.address'."&gt;", 'parsed_sample.from.address')
+| eval reply_to=coalesce('parsed_sample.reply_to{}.display_name'." &lt;".'parsed_sample.reply_to{}.address'."&gt;", 'parsed_sample.reply_to{}.address')
+| rename parsed_sample.subject as subject
 | table arrival_date_utc, source.ip_address, "from", subject, reply_to, authentication_results
 | sort -arrival_date_utc</query>
 </search>
+1 -1
@@ -1,4 +1,4 @@
-__version__ = "10.0.2"
+__version__ = "10.0.3"
 USER_AGENT = f"parsedmarc/{__version__}"
+10
@@ -701,6 +701,16 @@ def save_failure_report_to_elasticsearch(
         to_["sample.headers.to"] = headers["to"]
         to_query = Q(dict(match_phrase=to_)) # pyright: ignore[reportArgumentType]
         q = q & to_query
+    if "reply-to" in headers:
+        # Flatten the Reply-To header to a string so it can be displayed
+        # and aggregated like From/To. Only the first address is used,
+        # matching the From/To handling above. Not part of the dedup
+        # query.
+        headers["reply-to"] = headers["reply-to"][0]
+        if headers["reply-to"][0] == "":
+            headers["reply-to"] = headers["reply-to"][1]
+        else:
+            headers["reply-to"] = " <".join(headers["reply-to"]) + ">"
     if "subject" in headers:
         subject = headers["subject"]
         subject_query = {"match_phrase": {"sample.headers.subject": subject}}
+10
@@ -701,6 +701,16 @@ def save_failure_report_to_opensearch(
         to_["sample.headers.to"] = headers["to"]
         to_query = Q(dict(match_phrase=to_))
         q = q & to_query
+    if "reply-to" in headers:
+        # Flatten the Reply-To header to a string so it can be displayed
+        # and aggregated like From/To. Only the first address is used,
+        # matching the From/To handling above. Not part of the dedup
+        # query.
+        headers["reply-to"] = headers["reply-to"][0]
+        if headers["reply-to"][0] == "":
+            headers["reply-to"] = headers["reply-to"][1]
+        else:
+            headers["reply-to"] = " <".join(headers["reply-to"]) + ">"
     if "subject" in headers:
         subject = headers["subject"]
         subject_query = {"match_phrase": {"sample.headers.subject": subject}}
+10 -4
@@ -1133,9 +1133,15 @@ def parse_email(
         parsed_email["date"] = parsed_email["date"].replace("T", " ")
     else:
         parsed_email["date"] = None
-    if "reply_to" in parsed_email:
+    # mailparser's mail_json names these headers with hyphens
+    # ("reply-to", "delivered-to"), not underscores. Reading the
+    # underscored key always missed, so every Reply-To address was
+    # silently dropped. Convert under the underscored name consumers
+    # expect and drop the raw hyphenated key so the body carries a
+    # single representation, matching how "to"/"cc"/"bcc" are handled.
+    if "reply-to" in parsed_email:
         parsed_email["reply_to"] = list(
-            map(lambda x: parse_email_address(x), parsed_email["reply_to"])
+            map(lambda x: parse_email_address(x), parsed_email.pop("reply-to"))
         )
     else:
         parsed_email["reply_to"] = []
@@ -1161,9 +1167,9 @@ def parse_email(
     else:
         parsed_email["bcc"] = []
-    if "delivered_to" in parsed_email:
+    if "delivered-to" in parsed_email:
         parsed_email["delivered_to"] = list(
-            map(lambda x: parse_email_address(x), parsed_email["delivered_to"])
+            map(lambda x: parse_email_address(x), parsed_email.pop("delivered-to"))
         )
     if "attachments" not in parsed_email:
+30
@@ -751,6 +751,36 @@ class TestSaveFailureReport(unittest.TestCase):
save_failure_report_to_elasticsearch(report)
mock_save.assert_called_once()
def test_reply_to_header_flattened_and_indexed(self):
"""A Reply-To header is flattened to a display string on
``sample.headers["reply-to"]`` — so the failure dashboard's
``sample.headers.reply-to.keyword`` column resolves — and each
Reply-To address also populates the nested ``sample.reply_to``
docs. Asserts on the document handed to .save(), not merely
that save ran."""
report = _failure_report()
report["parsed_sample"]["headers"]["Reply-To"] = [
["Real One", "real@phish.example"]
]
report["parsed_sample"]["reply_to"] = [
{"display_name": "Real One", "address": "real@phish.example"}
]
with (
patch("parsedmarc.elastic.Search", return_value=_empty_search()),
patch("parsedmarc.elastic.Index"),
patch.object(
elastic_module._FailureReportDoc, "save", autospec=True
) as mock_save,
):
save_failure_report_to_elasticsearch(report)
doc = mock_save.call_args.args[0]
self.assertEqual(
doc.sample.headers["reply-to"], "Real One <real@phish.example>"
)
self.assertEqual(
[a.address for a in doc.sample.reply_to], ["real@phish.example"]
)
# ---------------------------------------------------------------------------
# save_smtp_tls_report_to_elasticsearch
+30
@@ -749,6 +749,36 @@ class TestSaveFailureReport(unittest.TestCase):
save_failure_report_to_opensearch(report)
mock_save.assert_called_once()
def test_reply_to_header_flattened_and_indexed(self):
"""A Reply-To header is flattened to a display string on
``sample.headers["reply-to"]`` — so the failure dashboard's
``sample.headers.reply-to.keyword`` column resolves — and each
Reply-To address also populates the nested ``sample.reply_to``
docs. Asserts on the document handed to .save(), not merely
that save ran."""
report = _failure_report()
report["parsed_sample"]["headers"]["Reply-To"] = [
["Real One", "real@phish.example"]
]
report["parsed_sample"]["reply_to"] = [
{"display_name": "Real One", "address": "real@phish.example"}
]
with (
patch("parsedmarc.opensearch.Search", return_value=_empty_search()),
patch("parsedmarc.opensearch.Index"),
patch.object(
opensearch_module._FailureReportDoc, "save", autospec=True
) as mock_save,
):
save_failure_report_to_opensearch(report)
doc = mock_save.call_args.args[0]
self.assertEqual(
doc.sample.headers["reply-to"], "Real One <real@phish.example>"
)
self.assertEqual(
[a.address for a in doc.sample.reply_to], ["real@phish.example"]
)
# ---------------------------------------------------------------------------
# save_smtp_tls_report_to_opensearch
+33
@@ -665,6 +665,39 @@ class TestPostgreSQLClientSave(unittest.TestCase):
self.assertEqual(len(addr_sqls), 1)
self.assertIn("solo@example.com", addr_sqls[0][1])
def test_save_failure_report_indexes_reply_to_address(self):
"""A parsed Reply-To address is written to
dmarc_failure_sample_address with address_type 'reply_to' — the
rows the Grafana PostgreSQL failure panel aggregates for its
'Reply To' column. Guards the path that parse_email now
populates (reply_to was always [] before the hyphen-key fix)."""
client, mock_conn = _make_client()
cur = _mock_cursor(mock_conn, [None, (1,)])
report = {
"arrival_date_utc": "2024-01-15 10:30:00",
"reported_domain": "example.com",
"source": {"ip_address": "203.0.113.1"},
"parsed_sample": {
"subject": "Test",
"reply_to": [
{"display_name": "Real One", "address": "real@phish.example"}
],
},
}
client.save_failure_report_to_postgresql(report)
reply_to_inserts = [
_named_params(c)
for c in cur.execute.call_args_list
if "dmarc_failure_sample_address" in c.args[0]
and c.args[1][1] == "reply_to"
]
self.assertEqual(len(reply_to_inserts), 1)
self.assertEqual(reply_to_inserts[0]["address"], "real@phish.example")
self.assertEqual(reply_to_inserts[0]["display_name"], "Real One")
class TestPostgreSQLSaveErrors(unittest.TestCase):
"""Driver errors raised mid-save are wrapped in PostgreSQLError."""
+40
@@ -579,6 +579,46 @@ Body text"""
self.assertEqual(result["subject"], "Test")
self.assertEqual(result["reply_to"], [])
def testReplyToHeaderIsParsed(self):
"""A Reply-To header populates reply_to with every address.
Regression: parse_email read mailparser's underscored
``reply_to`` key, but mail_json names the header ``reply-to``,
so the lookup always missed and every Reply-To address was
silently dropped (reply_to was always []).
"""
email_str = (
"From: Sender <sender@example.com>\r\n"
"Reply-To: Real One <real@phish.example>,"
" Second <two@phish.example>\r\n"
"To: victim@example.org\r\n"
"Subject: Hi\r\n\r\nBody\r\n"
)
result = parsedmarc.utils.parse_email(email_str)
self.assertEqual(
[a["address"] for a in result["reply_to"]],
["real@phish.example", "two@phish.example"],
)
self.assertEqual(result["reply_to"][0]["display_name"], "Real One")
def testDeliveredToHeaderIsParsed(self):
"""A Delivered-To header populates delivered_to.
Same hyphen/underscore key mismatch as reply_to: mail_json
names the header ``delivered-to``, so reading ``delivered_to``
dropped it.
"""
email_str = (
"From: Sender <sender@example.com>\r\n"
"Delivered-To: box@example.org\r\n"
"To: box@example.org\r\n"
"Subject: Hi\r\n\r\nBody\r\n"
)
result = parsedmarc.utils.parse_email(email_str)
self.assertEqual(
[a["address"] for a in result["delivered_to"]], ["box@example.org"]
)
def testEmailWithNoSubject(self):
"""parse_email defaults subject to None when missing"""
email_str = """From: test@example.com