mirror of https://github.com/domainaware/parsedmarc.git, synced 2026-05-26 05:35:24 +00:00
Land 10.0.3 changes on master (#785)
PR #784 was stacked on the #783 branch and its base was never retargeted to master, so it merged into fix/mailsuite-2.2.1-empty-address instead of master. master therefore has 10.0.2 (#783's squash) but is missing the 10.0.3 changes.

This re-lands exactly that delta: the Reply-To/Delivered-To parser fix, the ES/OS Reply-To header flattening, and the Splunk/OpenSearch/Grafana failure dashboard fixes, with the version bumped to 10.0.3. No mailsuite re-bump (the >=2.2.1 floor is already on master from 10.0.2).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
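The parser fix named in the message can be illustrated with a minimal sketch. It assumes the parsed message is a plain dict whose address headers are lists of `[display_name, address]` pairs stored under hyphenated keys. `to_address_dict` and `normalize_address_key` are hypothetical names used only for illustration; the actual change lives in `parse_email()` and uses `parse_email_address`. Defaulting every missing key to `[]` is also a simplification of this sketch.

```python
from typing import Any


def to_address_dict(pair: list[str]) -> dict[str, Any]:
    """Hypothetical stand-in for parsedmarc's own address parser."""
    display_name, address = pair
    return {"display_name": display_name or None, "address": address}


def normalize_address_key(parsed: dict[str, Any], hyphenated: str) -> None:
    """Move e.g. "reply-to" to "reply_to", converting each pair to a dict."""
    underscored = hyphenated.replace("-", "_")
    if hyphenated in parsed:
        # pop() drops the raw hyphenated key so only one representation remains.
        parsed[underscored] = [to_address_dict(p) for p in parsed.pop(hyphenated)]
    else:
        # Assumption of this sketch: every missing key defaults to an empty list.
        parsed[underscored] = []


message = {"reply-to": [["Real One", "real@phish.example"]]}
normalize_address_key(message, "reply-to")
normalize_address_key(message, "delivered-to")
print(message)
```

Looking up the underscored name directly, as the pre-fix code did, would never find the hyphenated key, which is why `reply_to` was always empty.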
@@ -1,5 +1,20 @@
 # Changelog

+## 10.0.3
+
+### Bug fixes
+
+- Fix `Reply-To` (and `Delivered-To`) addresses being dropped from failure-report samples. `parse_email()` looked up mailparser's underscored `reply_to` / `delivered_to` keys, but `mail_json` names those headers `reply-to` / `delivered-to`, so the lookup always missed and `parsed_sample["reply_to"]` was always `[]` regardless of the message. Failure samples now carry their parsed Reply-To addresses through to JSON/CSV output and the Elasticsearch/OpenSearch nested `sample.reply_to` field.
+
+### Dashboard fixes
+
+All failure (RUF) dashboards now render every displayed address (`From`, `To`, `Reply-To`) the same way: `Display Name <addr>`, or the bare address when there is no display name. The format is assembled at query time from fields (`display_name` / `address`) that already exist on previously-indexed reports, so the panels work on historical data, not only on reports stored after upgrading — with one unavoidable exception: a report's `Reply-To` only appears for reports **parsed by 10.0.3 or later**. Earlier versions discarded it at parse time (the bug above), so it is absent from older stored reports; recovering it requires re-parsing the original samples.
+
+- **Splunk failure dashboard:** the email-samples panel showed empty `from` and `reply_to` columns — it renamed `parsed_sample.headers.from{}{}` / `parsed_sample.headers.reply-to{}{}`, which are mis-cased (the header keys are `From` / `Reply-To`) and array-of-array shaped. The panel now builds `from` and `reply_to` with an `eval` that coalesces `display_name <address>` down to the bare `address` when there is no display name. (A multi-address `Reply-To` falls back to addresses-only — a Splunk multi-value-rendering limitation, not a data-loss one.)
+- **OpenSearch failure dashboard:** the column labelled `reply_to` aggregated `sample.headers.in-reply-to.keyword` — the `In-Reply-To` threading header, not the Reply-To address. It now aggregates `sample.headers.reply-to.keyword`, and that field was added to the `dmarc_f*` index pattern. To support it, the Elasticsearch/OpenSearch failure writer now flattens the `Reply-To` header into a display string on `sample.headers["reply-to"]`, mirroring the existing `From` / `To` handling. (Re-import the dashboards, or refresh the `dmarc_f*` index pattern, to pick up the new field.)
+- **Grafana (Elasticsearch) dashboard:** the *Failure Samples* panel already read `sample.headers.reply-to.keyword`, but that field previously held the raw `[[name, address]]` array (split into separate name/address terms). The failure-writer flattening above makes the existing `ReplyTo` column render a clean `Name <address>` string — no dashboard change required.
+- **Grafana (PostgreSQL) dashboard:** the *Failure Reports* panel did not surface the message `From` header or `Reply-To` at all (it showed only the envelope `Mail From` / `Rcpt To`). Added `From` (from `sample_from`) and `Reply To` (aggregated from `dmarc_failure_sample_address`) columns.
+
 ## 10.0.2

 ### Changes
@@ -1391,7 +1391,7 @@
 },
 "editorMode": "code",
 "format": "table",
-"rawSql": "SELECT\n f.arrival_date_utc AS \"Arrival Date\",\n COALESCE(f.feedback_type, '') AS \"Feedback Type\",\n COALESCE(f.reported_domain, '') AS \"Reported Domain\",\n COALESCE(f.source_ip_address::TEXT, '') AS \"Source IP\",\n COALESCE(f.source_reverse_dns, '') AS \"Reverse DNS\",\n COALESCE(f.source_base_domain, '') AS \"Source Domain\",\n COALESCE(f.source_country, '') AS \"Country\",\n COALESCE(array_to_string(f.auth_failure, ', '), '') AS \"Auth Failure\",\n COALESCE(f.authentication_results, '') AS \"Auth Results\",\n COALESCE(f.delivery_result, '') AS \"Delivery Result\",\n COALESCE(f.dkim_domain, '') AS \"DKIM Domain\",\n COALESCE(f.sample_subject, '') AS \"Subject\",\n COALESCE(f.original_mail_from, '') AS \"Mail From\",\n COALESCE(f.original_rcpt_to, '') AS \"Rcpt To\"\nFROM dmarc_failure_report f\nWHERE f.arrival_date_utc IS NOT NULL\n AND f.arrival_date_utc::TIMESTAMPTZ BETWEEN $__timeFrom() AND $__timeTo()\nORDER BY f.id DESC",
+"rawSql": "SELECT\n f.arrival_date_utc AS \"Arrival Date\",\n COALESCE(f.feedback_type, '') AS \"Feedback Type\",\n COALESCE(f.reported_domain, '') AS \"Reported Domain\",\n COALESCE(f.source_ip_address::TEXT, '') AS \"Source IP\",\n COALESCE(f.source_reverse_dns, '') AS \"Reverse DNS\",\n COALESCE(f.source_base_domain, '') AS \"Source Domain\",\n COALESCE(f.source_country, '') AS \"Country\",\n COALESCE(array_to_string(f.auth_failure, ', '), '') AS \"Auth Failure\",\n COALESCE(f.authentication_results, '') AS \"Auth Results\",\n COALESCE(f.delivery_result, '') AS \"Delivery Result\",\n COALESCE(f.dkim_domain, '') AS \"DKIM Domain\",\n COALESCE(\n CASE WHEN COALESCE(f.sample_from->>'display_name', '') <> ''\n THEN (f.sample_from->>'display_name') || ' <' || (f.sample_from->>'address') || '>'\n ELSE f.sample_from->>'address'\n END, '') AS \"From\",\n COALESCE((\n SELECT string_agg(\n CASE WHEN COALESCE(a.display_name, '') <> ''\n THEN a.display_name || ' <' || a.address || '>'\n ELSE a.address END, ', ')\n FROM dmarc_failure_sample_address a\n WHERE a.report_id = f.id AND a.address_type = 'reply_to'\n ), '') AS \"Reply To\",\n COALESCE(f.sample_subject, '') AS \"Subject\",\n COALESCE(f.original_mail_from, '') AS \"Mail From\",\n COALESCE(f.original_rcpt_to, '') AS \"Rcpt To\"\nFROM dmarc_failure_report f\nWHERE f.arrival_date_utc IS NOT NULL\n AND f.arrival_date_utc::TIMESTAMPTZ BETWEEN $__timeFrom() AND $__timeTo()\nORDER BY f.id DESC",
 "refId": "A"
 }
 ]
File diff suppressed because one or more lines are too long
@@ -61,7 +61,9 @@
 <title>DMARC failure email samples</title>
 <table>
 <search base="base_search">
-<query>| rename parsed_sample.headers.from{}{} as from, parsed_sample.headers.Subject as subject, parsed_sample.headers.reply-to{}{} as reply_to
+<query>| eval from=coalesce('parsed_sample.from.display_name'." <".'parsed_sample.from.address'.">", 'parsed_sample.from.address')
+| eval reply_to=coalesce('parsed_sample.reply_to{}.display_name'." <".'parsed_sample.reply_to{}.address'.">", 'parsed_sample.reply_to{}.address')
+| rename parsed_sample.subject as subject
 | table arrival_date_utc, source.ip_address, "from", subject, reply_to, authentication_results
 | sort -arrival_date_utc</query>
 </search>
@@ -1,4 +1,4 @@
-__version__ = "10.0.2"
+__version__ = "10.0.3"

 USER_AGENT = f"parsedmarc/{__version__}"

@@ -701,6 +701,16 @@ def save_failure_report_to_elasticsearch(
         to_["sample.headers.to"] = headers["to"]
         to_query = Q(dict(match_phrase=to_))  # pyright: ignore[reportArgumentType]
         q = q & to_query
+    if "reply-to" in headers:
+        # Flatten the Reply-To header to a string so it can be displayed
+        # and aggregated like From/To. Only the first address is used,
+        # matching the From/To handling above. Not part of the dedup
+        # query.
+        headers["reply-to"] = headers["reply-to"][0]
+        if headers["reply-to"][0] == "":
+            headers["reply-to"] = headers["reply-to"][1]
+        else:
+            headers["reply-to"] = " <".join(headers["reply-to"]) + ">"
     if "subject" in headers:
         subject = headers["subject"]
         subject_query = {"match_phrase": {"sample.headers.subject": subject}}
@@ -701,6 +701,16 @@ def save_failure_report_to_opensearch(
         to_["sample.headers.to"] = headers["to"]
         to_query = Q(dict(match_phrase=to_))
         q = q & to_query
+    if "reply-to" in headers:
+        # Flatten the Reply-To header to a string so it can be displayed
+        # and aggregated like From/To. Only the first address is used,
+        # matching the From/To handling above. Not part of the dedup
+        # query.
+        headers["reply-to"] = headers["reply-to"][0]
+        if headers["reply-to"][0] == "":
+            headers["reply-to"] = headers["reply-to"][1]
+        else:
+            headers["reply-to"] = " <".join(headers["reply-to"]) + ">"
     if "subject" in headers:
         subject = headers["subject"]
         subject_query = {"match_phrase": {"sample.headers.subject": subject}}
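The flattening added to both failure writers above can be read as a small pure function. The following is a sketch only, assuming the header value is a non-empty list of `[display_name, address]` pairs as in the tests further down; `flatten_first_address` is a hypothetical name, not a function in parsedmarc.

```python
def flatten_first_address(pairs: list[list[str]]) -> str:
    """Render the first [display_name, address] pair as a display string."""
    # Only the first address is used, as in the From/To handling.
    display_name, address = pairs[0]
    if display_name == "":
        # No display name: fall back to the bare address.
        return address
    # Same result as " <".join(pair) + ">" for a two-element pair.
    return f"{display_name} <{address}>"


named = flatten_first_address([["Real One", "real@phish.example"]])
bare = flatten_first_address([["", "real@phish.example"]])
print(named)
print(bare)
```

Any additional Reply-To addresses are not part of the flattened string; per the tests below they are still carried in the nested `sample.reply_to` documents.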
@@ -1133,9 +1133,15 @@ def parse_email(
         parsed_email["date"] = parsed_email["date"].replace("T", " ")
     else:
         parsed_email["date"] = None
-    if "reply_to" in parsed_email:
+    # mailparser's mail_json names these headers with hyphens
+    # ("reply-to", "delivered-to"), not underscores. Reading the
+    # underscored key always missed, so every Reply-To address was
+    # silently dropped. Convert under the underscored name consumers
+    # expect and drop the raw hyphenated key so the body carries a
+    # single representation, matching how "to"/"cc"/"bcc" are handled.
+    if "reply-to" in parsed_email:
         parsed_email["reply_to"] = list(
-            map(lambda x: parse_email_address(x), parsed_email["reply_to"])
+            map(lambda x: parse_email_address(x), parsed_email.pop("reply-to"))
         )
     else:
         parsed_email["reply_to"] = []
@@ -1161,9 +1167,9 @@ def parse_email(
     else:
         parsed_email["bcc"] = []

-    if "delivered_to" in parsed_email:
+    if "delivered-to" in parsed_email:
         parsed_email["delivered_to"] = list(
-            map(lambda x: parse_email_address(x), parsed_email["delivered_to"])
+            map(lambda x: parse_email_address(x), parsed_email.pop("delivered-to"))
         )

     if "attachments" not in parsed_email:
@@ -751,6 +751,36 @@ class TestSaveFailureReport(unittest.TestCase):
             save_failure_report_to_elasticsearch(report)
         mock_save.assert_called_once()

+    def test_reply_to_header_flattened_and_indexed(self):
+        """A Reply-To header is flattened to a display string on
+        ``sample.headers["reply-to"]`` — so the failure dashboard's
+        ``sample.headers.reply-to.keyword`` column resolves — and each
+        Reply-To address also populates the nested ``sample.reply_to``
+        docs. Asserts on the document handed to .save(), not merely
+        that save ran."""
+        report = _failure_report()
+        report["parsed_sample"]["headers"]["Reply-To"] = [
+            ["Real One", "real@phish.example"]
+        ]
+        report["parsed_sample"]["reply_to"] = [
+            {"display_name": "Real One", "address": "real@phish.example"}
+        ]
+        with (
+            patch("parsedmarc.elastic.Search", return_value=_empty_search()),
+            patch("parsedmarc.elastic.Index"),
+            patch.object(
+                elastic_module._FailureReportDoc, "save", autospec=True
+            ) as mock_save,
+        ):
+            save_failure_report_to_elasticsearch(report)
+        doc = mock_save.call_args.args[0]
+        self.assertEqual(
+            doc.sample.headers["reply-to"], "Real One <real@phish.example>"
+        )
+        self.assertEqual(
+            [a.address for a in doc.sample.reply_to], ["real@phish.example"]
+        )
+

 # ---------------------------------------------------------------------------
 # save_smtp_tls_report_to_elasticsearch
@@ -749,6 +749,36 @@ class TestSaveFailureReport(unittest.TestCase):
             save_failure_report_to_opensearch(report)
         mock_save.assert_called_once()

+    def test_reply_to_header_flattened_and_indexed(self):
+        """A Reply-To header is flattened to a display string on
+        ``sample.headers["reply-to"]`` — so the failure dashboard's
+        ``sample.headers.reply-to.keyword`` column resolves — and each
+        Reply-To address also populates the nested ``sample.reply_to``
+        docs. Asserts on the document handed to .save(), not merely
+        that save ran."""
+        report = _failure_report()
+        report["parsed_sample"]["headers"]["Reply-To"] = [
+            ["Real One", "real@phish.example"]
+        ]
+        report["parsed_sample"]["reply_to"] = [
+            {"display_name": "Real One", "address": "real@phish.example"}
+        ]
+        with (
+            patch("parsedmarc.opensearch.Search", return_value=_empty_search()),
+            patch("parsedmarc.opensearch.Index"),
+            patch.object(
+                opensearch_module._FailureReportDoc, "save", autospec=True
+            ) as mock_save,
+        ):
+            save_failure_report_to_opensearch(report)
+        doc = mock_save.call_args.args[0]
+        self.assertEqual(
+            doc.sample.headers["reply-to"], "Real One <real@phish.example>"
+        )
+        self.assertEqual(
+            [a.address for a in doc.sample.reply_to], ["real@phish.example"]
+        )
+

 # ---------------------------------------------------------------------------
 # save_smtp_tls_report_to_opensearch
@@ -665,6 +665,39 @@ class TestPostgreSQLClientSave(unittest.TestCase):
         self.assertEqual(len(addr_sqls), 1)
         self.assertIn("solo@example.com", addr_sqls[0][1])

+    def test_save_failure_report_indexes_reply_to_address(self):
+        """A parsed Reply-To address is written to
+        dmarc_failure_sample_address with address_type 'reply_to' — the
+        rows the Grafana PostgreSQL failure panel aggregates for its
+        'Reply To' column. Guards the path that parse_email now
+        populates (reply_to was always [] before the hyphen-key fix)."""
+        client, mock_conn = _make_client()
+        cur = _mock_cursor(mock_conn, [None, (1,)])
+
+        report = {
+            "arrival_date_utc": "2024-01-15 10:30:00",
+            "reported_domain": "example.com",
+            "source": {"ip_address": "203.0.113.1"},
+            "parsed_sample": {
+                "subject": "Test",
+                "reply_to": [
+                    {"display_name": "Real One", "address": "real@phish.example"}
+                ],
+            },
+        }
+
+        client.save_failure_report_to_postgresql(report)
+
+        reply_to_inserts = [
+            _named_params(c)
+            for c in cur.execute.call_args_list
+            if "dmarc_failure_sample_address" in c.args[0]
+            and c.args[1][1] == "reply_to"
+        ]
+        self.assertEqual(len(reply_to_inserts), 1)
+        self.assertEqual(reply_to_inserts[0]["address"], "real@phish.example")
+        self.assertEqual(reply_to_inserts[0]["display_name"], "Real One")
+

 class TestPostgreSQLSaveErrors(unittest.TestCase):
     """Driver errors raised mid-save are wrapped in PostgreSQLError."""
@@ -579,6 +579,46 @@ Body text"""
         self.assertEqual(result["subject"], "Test")
         self.assertEqual(result["reply_to"], [])

+    def testReplyToHeaderIsParsed(self):
+        """A Reply-To header populates reply_to with every address.
+
+        Regression: parse_email read mailparser's underscored
+        ``reply_to`` key, but mail_json names the header ``reply-to``,
+        so the lookup always missed and every Reply-To address was
+        silently dropped (reply_to was always []).
+        """
+        email_str = (
+            "From: Sender <sender@example.com>\r\n"
+            "Reply-To: Real One <real@phish.example>,"
+            " Second <two@phish.example>\r\n"
+            "To: victim@example.org\r\n"
+            "Subject: Hi\r\n\r\nBody\r\n"
+        )
+        result = parsedmarc.utils.parse_email(email_str)
+        self.assertEqual(
+            [a["address"] for a in result["reply_to"]],
+            ["real@phish.example", "two@phish.example"],
+        )
+        self.assertEqual(result["reply_to"][0]["display_name"], "Real One")
+
+    def testDeliveredToHeaderIsParsed(self):
+        """A Delivered-To header populates delivered_to.
+
+        Same hyphen/underscore key mismatch as reply_to: mail_json
+        names the header ``delivered-to``, so reading ``delivered_to``
+        dropped it.
+        """
+        email_str = (
+            "From: Sender <sender@example.com>\r\n"
+            "Delivered-To: box@example.org\r\n"
+            "To: box@example.org\r\n"
+            "Subject: Hi\r\n\r\nBody\r\n"
+        )
+        result = parsedmarc.utils.parse_email(email_str)
+        self.assertEqual(
+            [a["address"] for a in result["delivered_to"]], ["box@example.org"]
+        )
+
     def testEmailWithNoSubject(self):
         """parse_email defaults subject to None when missing"""
         email_str = """From: test@example.com