mirror of
https://github.com/domainaware/parsedmarc.git
synced 2026-07-05 16:25:09 +00:00
10.2.0 - Explain why a report is invalid instead of "Not a valid report" (#802)
* Explain why a report is invalid instead of "Not a valid report"
The parser catches broadly so one malformed report can't crash a batch,
but every failure surfaced as the generic ParserError("Not a valid
report"), telling operators nothing about the cause.
parse_report_file() now keeps each format parser's specific error as it
tries aggregate XML -> SMTP TLS JSON -> report email, and when all three
reject the input it content-sniffs the leading byte to surface the single
relevant reason (e.g. "Invalid aggregate report: Missing field:
'org_name'", or "Not a recognized report format (...)"). The CLI already
logs str(error), so this reaches the user with no cli.py change.
Every parser catch site also re-raises with `raise ... from <original>`,
preserving the underlying ExpatError / JSONDecodeError / KeyError /
archive errors on __cause__ for library callers and tracebacks. The same
exception *types* are still raised.
Finally, the catch-all "unexpected error" branches append
`(raised at <file>:<line>)` from the deepest traceback frame, but only
when the parsedmarc logger is at DEBUG level (e.g. the CLI's --debug);
normal-level output is unchanged.
Bumps the in-progress version to 10.2.0 and documents all three in the
CHANGELOG.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Cover the failure-report path in parse_report_file error tests
The reason-surfacing tests covered the aggregate and SMTP TLS branches but
not failure reports, which reach parse_report_file only via the email
path. Add a malformed multipart/report failure email (missing the required
Source-IP) and assert the message names the failure format and the missing
field rather than collapsing to "Not a valid report".
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Cover the new error-reporting lines; drop one unreachable catch
Bring the lines added by this PR to full test coverage:
- Test _exc_origin() with a never-raised exception (no __traceback__) so the
"no frames" guard is exercised.
- Test _parse_smtp_tls_failure_details() with a non-dict, which raises
TypeError (not KeyError) and exercises the generic catch-all.
- Test parse_report_email() with an unparseable Date header, which trips the
initial mail-parse catch-all and becomes a ParserError.
Two dead lines are removed rather than hidden, per the project's "delete
unreachable branches, no # pragma: no cover" rule:
- _looks_like_email() looped with a `continue` for blank lines, but every
caller passes lstrip()-ed text, so the first line is never blank. Simplified
to inspect the first line directly.
- parse_report_email()'s `except Exception` after `except InvalidFailureReport`
was unreachable: parse_failure_report wraps its entire body and provably
raises only InvalidFailureReport, which the preceding handler already catches.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Fix IndexError when backfilling envelope_from from SPF results
_parse_report_record() backfills a missing/empty envelope_from from the
last SPF auth result's domain. The "envelope_from is None" branch gated on
the raw auth_results["spf"] list but indexed the filtered
new_record["auth_results"]["spf"] list, which only holds results that have
a domain. A reporter sending an SPF result with no domain made the filtered
list empty while the raw list was non-empty, so [-1] raised IndexError and
the whole record failed to parse.
The two near-identical envelope_from backfill branches (missing identifier
vs. empty identifier) drifted apart -- only one was updated when the
filtered new_record list was introduced -- which is what let them disagree
on which list to read. Merge them into a single path, keyed on
dict.get("envelope_from") is None, that gates and indexes the same raw list
with the "domain" membership guard the missing-identifier branch already
used.
Regression test: envelope_from=None with an SPF result carrying no domain
now parses to envelope_from=None instead of raising. This is the bug that
motivated the surrounding error-reporting work in this PR.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Trim comment
* Cover the touched error branches; drop a dead UnicodeDecodeError catch
Codecov flagged the pre-existing error branches this PR touched (adding
`from e` / `_exc_origin`) as changed-and-uncovered. Most are real
malformed-report paths, so add honest tests that drive them with realistic
inputs:
- parse_smtp_tls_report_json: nested missing key (date-range without
start-datetime) -> InvalidSMTPTLSReport chaining a KeyError.
- parse_aggregate_report_xml: non-structured report_metadata -> the
AttributeError branch ("Report missing required section").
- parse_report_email: valid legacy text/plain failure report (success path),
a text report missing its fields, a base64 attachment of malformed
aggregate XML, and one of invalid SMTP TLS JSON.
- parse_report_file: gzipped junk -> the str branch of the content sniff.
The `except UnicodeDecodeError` in extract_report is removed as dead code
(no `# pragma: no cover`, per the repo rule): str-mode streams are already
rejected by explicit isinstance checks, and every decode() uses
errors="ignore", so it can never fire. str-mode still raises ParserError.
Also rename the two new failure-report tests from "Forensic" to "Failure"
and add an AGENTS.md rule: RUF reports are "failure reports"; "forensic" is
reserved for the literal backward-compat alias identifiers only.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Fix pyright errors in the new error-branch tests
CI runs pyright over the whole repo (tests included); these slipped through
because the local check only covered parsedmarc/__init__.py:
- _parse_smtp_tls_failure_details("not a dict") is a deliberate wrong-type
test -> targeted `# pyright: ignore[reportArgumentType]`.
- result["report"]["source"]["ip_address"] on a ParsedReport TypedDict ->
cast(FailureReport, result["report"]) first, matching existing tests.
This is what failed lint-docs-build (and, since `test` needs it, skipped the
Codecov upload) on the prior commits.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -60,6 +60,8 @@ To skip DNS lookups during testing, set `GITHUB_ACTIONS=true`.
|
||||
|
||||
`ReportType = Literal["aggregate", "failure", "smtp_tls"]`. Exception hierarchy: `ParserError` → `InvalidDMARCReport` → `InvalidAggregateReport`/`InvalidFailureReport`, and `InvalidSMTPTLSReport`. Legacy alias `InvalidForensicReport = InvalidFailureReport` preserved.
|
||||
|
||||
**Terminology: say "failure report", never "forensic report".** RUF reports are **failure reports** (RFC 9991 terminology) everywhere in new code, test names, docstrings, comments, CHANGELOG entries, and prose. "Forensic" is the legacy term, kept *only* as backward-compatible API aliases (`parse_forensic_report`, `InvalidForensicReport`, `parsed_forensic_reports_to_csv`, `ForensicReport`, the `Forensic` archive-folder name). Use "forensic" only when naming one of those literal pre-existing identifiers — never as a generic description of the report type.
|
||||
|
||||
### RFC 9989 / RFC 9990 / RFC 9991 support
|
||||
|
||||
Aggregate reports parse under both RFC 7489 and RFC 9990 in one code path. RFC 9990 adds these fields, all surfaced through `AggregatePolicyPublished` / `AggregateReportMetadata` / `AggregateAuthResult*`:
|
||||
|
||||
@@ -1,5 +1,17 @@
|
||||
# Changelog
|
||||
|
||||
## 10.2.0
|
||||
|
||||
### Changes
|
||||
|
||||
- **Invalid reports now explain *why* they are invalid.** The parser still catches broadly so one malformed report can never crash a batch, but the diagnostics are no longer a bare `Not a valid report`. `parse_report_file()` keeps each format parser's specific error as it tries aggregate XML → SMTP TLS JSON → report email, and when all three reject the input it sniffs the content shape to surface the single relevant reason — e.g. `Invalid aggregate report: Missing field: 'org_name'`, `Invalid SMTP TLS report: Missing required field: date-range`, or `Not a recognized report format (not a DMARC aggregate XML report, an SMTP TLS JSON report, or a DMARC report email)`. This is the message the CLI already logs (`Failed to parse <file> - <reason>`), so operators see the cause without a code change on their end.
|
||||
- **Original exceptions are now chained.** Every parser catch site re-raises with `raise … from <original>`, so the underlying `ExpatError`, `json.JSONDecodeError`, `KeyError`, and archive/decompression errors are preserved on `__cause__` for library callers and tracebacks instead of being flattened into a string. Behavior is otherwise unchanged — the same exception *types* are still raised.
|
||||
- **Unexpected errors cite their origin in debug mode.** When a catch-all branch turns an unforeseen exception into a generic `Unexpected error: …` / archive / mail-parse failure, the message now appends `(raised at <file>:<line>)` pinpointing the deepest traceback frame — but only when the `parsedmarc` logger is at `DEBUG` level (e.g. the CLI's `--debug`). Normal-level output is unchanged.
|
||||
|
||||
### Bug fixes
|
||||
|
||||
- **Fixed an `IndexError` when backfilling `envelope_from` from SPF results.** In `_parse_report_record()`, when an aggregate report's `<identifiers>` carried an empty `envelope_from`, the fallback checked the raw `auth_results["spf"]` list for length but then indexed the *filtered* `new_record["auth_results"]["spf"]` list (which only contains results that have a `domain`). A reporter that sent an SPF auth result with no `domain` made the filtered list empty while the raw list was non-empty, so the `[-1]` index raised `IndexError` and the whole record failed to parse. The two near-identical `envelope_from` backfill branches (one for a missing identifier, one for an empty one) are now a single code path that gates and indexes the same list, matching the already-correct missing-identifier branch.
|
||||
|
||||
## 10.1.1
|
||||
|
||||
### Changes
|
||||
|
||||
+127
-43
@@ -8,11 +8,13 @@ import binascii
|
||||
import email
|
||||
import email.utils
|
||||
import json
|
||||
import logging
|
||||
import mailbox
|
||||
import os
|
||||
import re
|
||||
import shutil
|
||||
import tempfile
|
||||
import traceback
|
||||
import xml.parsers.expat as expat
|
||||
import zipfile
|
||||
import zlib
|
||||
@@ -144,6 +146,26 @@ class InvalidFailureReport(InvalidDMARCReport):
|
||||
InvalidForensicReport = InvalidFailureReport
|
||||
|
||||
|
||||
def _exc_origin(error: BaseException) -> str:
|
||||
"""Returns a ``" (raised at <file>:<line>)"`` suffix pointing at where an
|
||||
unexpected exception actually originated, but only when the parsedmarc
|
||||
logger is at ``DEBUG`` level. Returns ``""`` otherwise, so normal output is
|
||||
unchanged.
|
||||
|
||||
The deepest traceback frame is the line that raised, which is the useful
|
||||
breadcrumb when a catch-all ``except Exception`` turns an unforeseen error
|
||||
into a generic ``Invalid*Report`` -- without it, the message says *what*
|
||||
failed but not *where*.
|
||||
"""
|
||||
if not logger.isEnabledFor(logging.DEBUG):
|
||||
return ""
|
||||
frames = traceback.extract_tb(error.__traceback__)
|
||||
if not frames:
|
||||
return ""
|
||||
last = frames[-1]
|
||||
return " (raised at {0}:{1})".format(last.filename, last.lineno)
|
||||
|
||||
|
||||
def _text(value: Any) -> Optional[str]:
|
||||
"""Unwrap a possibly-langAttrString value parsed by xmltodict.
|
||||
|
||||
@@ -510,7 +532,9 @@ def _parse_report_record(
|
||||
new_result["human_result"] = _text(result.get("human_result"))
|
||||
new_record["auth_results"]["spf"].append(new_result)
|
||||
|
||||
if "envelope_from" not in new_record["identifiers"]:
|
||||
# Backfill envelope_from from the last SPF result's domain when the
|
||||
# reporter omitted the identifier or sent it empty.
|
||||
if new_record["identifiers"].get("envelope_from") is None:
|
||||
envelope_from = None
|
||||
if len(auth_results["spf"]) > 0:
|
||||
spf_result = auth_results["spf"][-1]
|
||||
@@ -520,13 +544,6 @@ def _parse_report_record(
|
||||
envelope_from = str(envelope_from).lower()
|
||||
new_record["identifiers"]["envelope_from"] = envelope_from
|
||||
|
||||
elif new_record["identifiers"]["envelope_from"] is None:
|
||||
if len(auth_results["spf"]) > 0:
|
||||
envelope_from = new_record["auth_results"]["spf"][-1]["domain"]
|
||||
if envelope_from is not None:
|
||||
envelope_from = str(envelope_from).lower()
|
||||
new_record["identifiers"]["envelope_from"] = envelope_from
|
||||
|
||||
envelope_to = None
|
||||
if "envelope_to" in new_record["identifiers"]:
|
||||
envelope_to = new_record["identifiers"]["envelope_to"]
|
||||
@@ -568,9 +585,11 @@ def _parse_smtp_tls_failure_details(failure_details: dict[str, Any]):
|
||||
return new_failure_details
|
||||
|
||||
except KeyError as e:
|
||||
raise InvalidSMTPTLSReport(f"Missing required failure details field: {e}")
|
||||
raise InvalidSMTPTLSReport(
|
||||
f"Missing required failure details field: {e}"
|
||||
) from e
|
||||
except Exception as e:
|
||||
raise InvalidSMTPTLSReport(str(e))
|
||||
raise InvalidSMTPTLSReport(str(e) + _exc_origin(e)) from e
|
||||
|
||||
|
||||
def _parse_smtp_tls_report_policy(policy: dict[str, Any]):
|
||||
@@ -608,9 +627,9 @@ def _parse_smtp_tls_report_policy(policy: dict[str, Any]):
|
||||
return new_policy
|
||||
|
||||
except KeyError as e:
|
||||
raise InvalidSMTPTLSReport(f"Missing required policy field: {e}")
|
||||
raise InvalidSMTPTLSReport(f"Missing required policy field: {e}") from e
|
||||
except Exception as e:
|
||||
raise InvalidSMTPTLSReport(str(e))
|
||||
raise InvalidSMTPTLSReport(str(e) + _exc_origin(e)) from e
|
||||
|
||||
|
||||
def parse_smtp_tls_report_json(report: Union[str, bytes]) -> SMTPTLSReport:
|
||||
@@ -650,9 +669,9 @@ def parse_smtp_tls_report_json(report: Union[str, bytes]) -> SMTPTLSReport:
|
||||
return new_report
|
||||
|
||||
except KeyError as e:
|
||||
raise InvalidSMTPTLSReport(f"Missing required field: {e}")
|
||||
raise InvalidSMTPTLSReport(f"Missing required field: {e}") from e
|
||||
except Exception as e:
|
||||
raise InvalidSMTPTLSReport(str(e))
|
||||
raise InvalidSMTPTLSReport(str(e) + _exc_origin(e)) from e
|
||||
|
||||
|
||||
def parsed_smtp_tls_reports_to_csv_rows(
|
||||
@@ -1026,15 +1045,21 @@ def parse_aggregate_report_xml(
|
||||
return cast(AggregateReport, new_report)
|
||||
|
||||
except expat.ExpatError as error:
|
||||
raise InvalidAggregateReport("Invalid XML: {0}".format(error.__str__()))
|
||||
raise InvalidAggregateReport(
|
||||
"Invalid XML: {0}".format(error.__str__())
|
||||
) from error
|
||||
|
||||
except KeyError as error:
|
||||
raise InvalidAggregateReport("Missing field: {0}".format(error.__str__()))
|
||||
except AttributeError:
|
||||
raise InvalidAggregateReport("Report missing required section")
|
||||
raise InvalidAggregateReport(
|
||||
"Missing field: {0}".format(error.__str__())
|
||||
) from error
|
||||
except AttributeError as error:
|
||||
raise InvalidAggregateReport("Report missing required section") from error
|
||||
|
||||
except Exception as error:
|
||||
raise InvalidAggregateReport("Unexpected error: {0}".format(error.__str__()))
|
||||
raise InvalidAggregateReport(
|
||||
"Unexpected error: {0}{1}".format(error.__str__(), _exc_origin(error))
|
||||
) from error
|
||||
|
||||
|
||||
def extract_report(content: Union[bytes, str, BinaryIO]) -> str:
|
||||
@@ -1112,10 +1137,10 @@ def extract_report(content: Union[bytes, str, BinaryIO]) -> str:
|
||||
else:
|
||||
raise ParserError("Not a valid zip, gzip, json, or xml file")
|
||||
|
||||
except UnicodeDecodeError:
|
||||
raise ParserError("File objects must be opened in binary (rb) mode")
|
||||
except Exception as error:
|
||||
raise ParserError("Invalid archive file: {0}".format(error.__str__()))
|
||||
raise ParserError(
|
||||
"Invalid archive file: {0}{1}".format(error.__str__(), _exc_origin(error))
|
||||
) from error
|
||||
finally:
|
||||
if file_object:
|
||||
try:
|
||||
@@ -1176,7 +1201,7 @@ def parse_aggregate_report_file(
|
||||
try:
|
||||
xml = extract_report(_input)
|
||||
except Exception as e:
|
||||
raise InvalidAggregateReport(e)
|
||||
raise InvalidAggregateReport(str(e) + _exc_origin(e)) from e
|
||||
|
||||
return parse_aggregate_report_xml(
|
||||
xml,
|
||||
@@ -1557,10 +1582,14 @@ def parse_failure_report(
|
||||
return cast(FailureReport, parsed_report)
|
||||
|
||||
except KeyError as error:
|
||||
raise InvalidFailureReport("Missing value: {0}".format(error.__str__()))
|
||||
raise InvalidFailureReport(
|
||||
"Missing value: {0}".format(error.__str__())
|
||||
) from error
|
||||
|
||||
except Exception as error:
|
||||
raise InvalidFailureReport("Unexpected error: {0}".format(error.__str__()))
|
||||
raise InvalidFailureReport(
|
||||
"Unexpected error: {0}{1}".format(error.__str__(), _exc_origin(error))
|
||||
) from error
|
||||
|
||||
|
||||
def parsed_failure_reports_to_csv_rows(
|
||||
@@ -1728,7 +1757,7 @@ def parse_report_email(
|
||||
msg = email.message_from_string(input_str)
|
||||
|
||||
except Exception as e:
|
||||
raise ParserError(e.__str__())
|
||||
raise ParserError(e.__str__() + _exc_origin(e)) from e
|
||||
subject = None
|
||||
feedback_report = None
|
||||
smtp_tls_report = None
|
||||
@@ -1785,10 +1814,10 @@ def parse_report_email(
|
||||
fields["received-date"], fields["sender-ip-address"]
|
||||
)
|
||||
except Exception as e:
|
||||
error = 'Unable to parse message with subject "{0}": {1}'.format(
|
||||
subject, e
|
||||
error = 'Unable to parse message with subject "{0}": {1}{2}'.format(
|
||||
subject, e, _exc_origin(e)
|
||||
)
|
||||
raise InvalidDMARCReport(error)
|
||||
raise InvalidDMARCReport(error) from e
|
||||
|
||||
sample = parts[1].lstrip()
|
||||
logger.debug(sample)
|
||||
@@ -1827,17 +1856,18 @@ def parse_report_email(
|
||||
except (TypeError, ValueError, binascii.Error):
|
||||
pass
|
||||
|
||||
except InvalidDMARCReport:
|
||||
error = 'Message with subject "{0}" is not a valid DMARC report'.format(
|
||||
subject
|
||||
except InvalidDMARCReport as e:
|
||||
error = (
|
||||
'Message with subject "{0}" is not a valid '
|
||||
"DMARC report: {1}".format(subject, e)
|
||||
)
|
||||
raise ParserError(error)
|
||||
raise ParserError(error) from e
|
||||
|
||||
except Exception as e:
|
||||
error = 'Unable to parse message with subject "{0}": {1}'.format(
|
||||
subject, e
|
||||
error = 'Unable to parse message with subject "{0}": {1}{2}'.format(
|
||||
subject, e, _exc_origin(e)
|
||||
)
|
||||
raise ParserError(error)
|
||||
raise ParserError(error) from e
|
||||
|
||||
if feedback_report and sample:
|
||||
try:
|
||||
@@ -1861,9 +1891,7 @@ def parse_report_email(
|
||||
"is not a valid "
|
||||
"failure DMARC report: {1}".format(subject, e)
|
||||
)
|
||||
raise InvalidFailureReport(error)
|
||||
except Exception as e:
|
||||
raise InvalidFailureReport(e.__str__())
|
||||
raise InvalidFailureReport(error) from e
|
||||
|
||||
result = {"report_type": "failure", "report": failure_report}
|
||||
return result
|
||||
@@ -1875,6 +1903,54 @@ def parse_report_email(
|
||||
return result
|
||||
|
||||
|
||||
# An RFC 5322 header field name (printable ASCII excluding the colon) followed
|
||||
# by a colon at the start of a line, or an mbox "From " separator.
|
||||
_email_header_regex = re.compile(r"^(From |[\x21-\x39\x3b-\x7e]+:)")
|
||||
|
||||
|
||||
def _looks_like_email(text: str) -> bool:
|
||||
"""Returns True if the first line looks like an email header.
|
||||
|
||||
Callers pass already-``lstrip()``-ed text, so the first line is the first
|
||||
meaningful line.
|
||||
"""
|
||||
first_line = text.split("\n", 1)[0]
|
||||
return _email_header_regex.match(first_line) is not None
|
||||
|
||||
|
||||
def _describe_parse_failure(
|
||||
content: Union[str, bytes],
|
||||
aggregate_error: InvalidAggregateReport,
|
||||
smtp_tls_error: InvalidSMTPTLSReport,
|
||||
email_error: InvalidDMARCReport,
|
||||
) -> str:
|
||||
"""Builds a human-readable reason for a parse_report_file failure.
|
||||
|
||||
parse_report_file tries the aggregate XML, SMTP TLS JSON, and report-email
|
||||
parsers in turn; when all three reject the input, only the parser for the
|
||||
format the content actually resembles produced a meaningful error. The
|
||||
other two are noise -- a malformed aggregate report is also "not JSON" and
|
||||
"not an email". Sniff the leading non-whitespace byte to surface the single
|
||||
relevant reason.
|
||||
"""
|
||||
if isinstance(content, (bytes, bytearray, memoryview)):
|
||||
sniff = bytes(content)[:512].decode("utf-8", errors="replace")
|
||||
else:
|
||||
sniff = content[:512]
|
||||
sniff = sniff.lstrip()
|
||||
|
||||
if sniff.startswith("<"):
|
||||
return "Invalid aggregate report: {0}".format(aggregate_error)
|
||||
if sniff.startswith("{"):
|
||||
return "Invalid SMTP TLS report: {0}".format(smtp_tls_error)
|
||||
if _looks_like_email(sniff):
|
||||
return "Invalid report email: {0}".format(email_error)
|
||||
return (
|
||||
"Not a recognized report format (not a DMARC aggregate XML report, "
|
||||
"an SMTP TLS JSON report, or a DMARC report email)"
|
||||
)
|
||||
|
||||
|
||||
def parse_report_file(
|
||||
input_: Union[bytes, str, os.PathLike[str], os.PathLike[bytes], BinaryIO],
|
||||
*,
|
||||
@@ -1930,6 +2006,10 @@ def parse_report_file(
|
||||
|
||||
results: Optional[ParsedReport] = None
|
||||
|
||||
# parse_report_file tries the three report formats in turn. When all three
|
||||
# reject the input, keep each format's specific error so the final message
|
||||
# can explain *why* the file is invalid instead of a bare "Not a valid
|
||||
# report" (see _describe_parse_failure).
|
||||
try:
|
||||
report = parse_aggregate_report_file(
|
||||
content,
|
||||
@@ -1945,11 +2025,11 @@ def parse_report_file(
|
||||
normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
|
||||
)
|
||||
results = {"report_type": "aggregate", "report": report}
|
||||
except InvalidAggregateReport:
|
||||
except InvalidAggregateReport as aggregate_error:
|
||||
try:
|
||||
report = parse_smtp_tls_report_json(content)
|
||||
results = {"report_type": "smtp_tls", "report": report}
|
||||
except InvalidSMTPTLSReport:
|
||||
except InvalidSMTPTLSReport as smtp_tls_error:
|
||||
try:
|
||||
results = parse_report_email(
|
||||
content,
|
||||
@@ -1965,8 +2045,12 @@ def parse_report_file(
|
||||
keep_alive=keep_alive,
|
||||
normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
|
||||
)
|
||||
except InvalidDMARCReport:
|
||||
raise ParserError("Not a valid report")
|
||||
except InvalidDMARCReport as email_error:
|
||||
raise ParserError(
|
||||
_describe_parse_failure(
|
||||
content, aggregate_error, smtp_tls_error, email_error
|
||||
)
|
||||
) from email_error
|
||||
|
||||
if results is None:
|
||||
raise ParserError("Not a valid report")
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
__version__ = "10.1.1"
|
||||
__version__ = "10.2.0"
|
||||
|
||||
USER_AGENT = f"parsedmarc/{__version__}"
|
||||
|
||||
|
||||
+281
-1
@@ -5,7 +5,10 @@ parse_aggregate_report_xml, parse_failure_report, parse_smtp_tls_report_json,
|
||||
extract_report, get_dmarc_reports_from_mbox, and the CSV / JSON renderers.
|
||||
"""
|
||||
|
||||
import base64
|
||||
import gzip
|
||||
import json
|
||||
import logging
|
||||
import mailbox
|
||||
import os
|
||||
import unittest
|
||||
@@ -935,6 +938,33 @@ class Test(unittest.TestCase):
|
||||
result = parsedmarc._parse_report_record(record, offline=True)
|
||||
self.assertEqual(result["identifiers"]["envelope_from"], "spf.example.com")
|
||||
|
||||
def testParseReportRecordEnvelopeFromNullNoSpfDomain(self):
|
||||
"""envelope_from=None with SPF results that carry no domain must not
|
||||
raise IndexError (regression: the branch gated on the raw SPF list but
|
||||
indexed the filtered list, which is empty when no result has a domain)"""
|
||||
record = {
|
||||
"row": {
|
||||
"source_ip": "192.0.2.1",
|
||||
"count": "1",
|
||||
"policy_evaluated": {
|
||||
"disposition": "none",
|
||||
"dkim": "pass",
|
||||
"spf": "pass",
|
||||
},
|
||||
},
|
||||
"identifiers": {
|
||||
"header_from": "example.com",
|
||||
"envelope_from": None,
|
||||
},
|
||||
# A raw SPF result with no "domain" -> filtered list is empty.
|
||||
"auth_results": {
|
||||
"dkim": [],
|
||||
"spf": [{"scope": "mfrom", "result": "pass"}],
|
||||
},
|
||||
}
|
||||
result = parsedmarc._parse_report_record(record, offline=True)
|
||||
self.assertIsNone(result["identifiers"]["envelope_from"])
|
||||
|
||||
def testParseReportRecordEnvelopeTo(self):
|
||||
"""envelope_to is preserved and moved correctly"""
|
||||
record = {
|
||||
@@ -1013,6 +1043,14 @@ class Test(unittest.TestCase):
|
||||
with self.assertRaises(parsedmarc.InvalidSMTPTLSReport):
|
||||
parsedmarc._parse_smtp_tls_failure_details({"result-type": "err"})
|
||||
|
||||
def testParseSmtpTlsFailureDetailsNonDict(self):
|
||||
"""A non-dict failure-details value hits the catch-all (TypeError,
|
||||
not KeyError) and is wrapped as InvalidSMTPTLSReport"""
|
||||
with self.assertRaises(parsedmarc.InvalidSMTPTLSReport) as ctx:
|
||||
# Deliberate wrong type to exercise the non-KeyError catch-all.
|
||||
parsedmarc._parse_smtp_tls_failure_details("not a dict") # pyright: ignore[reportArgumentType]
|
||||
self.assertIsInstance(ctx.exception.__cause__, TypeError)
|
||||
|
||||
def testParseSmtpTlsReportPolicyValid(self):
|
||||
"""Valid STS policy parses correctly"""
|
||||
policy = {
|
||||
@@ -1822,8 +1860,25 @@ class TestMalformedXmlRecovery(unittest.TestCase):
|
||||
<auth_results><spf><domain>example.com</domain><result>pass</result></spf></auth_results>
|
||||
</record>
|
||||
</feedback>"""
|
||||
with self.assertRaises(parsedmarc.InvalidAggregateReport):
|
||||
with self.assertRaises(parsedmarc.InvalidAggregateReport) as ctx:
|
||||
parsedmarc.parse_aggregate_report_xml(xml, offline=True)
|
||||
# The missing-field error chains the underlying KeyError so a library
|
||||
# caller can inspect which field was absent.
|
||||
self.assertIsInstance(ctx.exception.__cause__, KeyError)
|
||||
|
||||
def testReportMetadataNotAStructureRaises(self):
|
||||
"""A non-structured report_metadata trips the AttributeError branch"""
|
||||
# report_metadata is plain text rather than nested elements, so the
|
||||
# parser's attribute access on it fails with AttributeError.
|
||||
xml = (
|
||||
"<feedback><report_metadata>x</report_metadata>"
|
||||
"<policy_published><domain>x.com</domain></policy_published>"
|
||||
"<record></record></feedback>"
|
||||
)
|
||||
with self.assertRaises(parsedmarc.InvalidAggregateReport) as ctx:
|
||||
parsedmarc.parse_aggregate_report_xml(xml, offline=True)
|
||||
self.assertIn("missing required section", str(ctx.exception))
|
||||
self.assertIsInstance(ctx.exception.__cause__, AttributeError)
|
||||
|
||||
|
||||
class TestPolicyPublishedEdgeCases(unittest.TestCase):
|
||||
@@ -1995,6 +2050,134 @@ class TestParseReportFile(unittest.TestCase):
|
||||
with self.assertRaises(parsedmarc.ParserError):
|
||||
parsedmarc.parse_report_file(b"this is not a report", offline=True)
|
||||
|
||||
def testParseReportFileInvalidAggregateReason(self):
|
||||
"""Malformed aggregate XML explains the aggregate-specific reason"""
|
||||
xml = (
|
||||
b'<?xml version="1.0"?>\n<feedback>\n'
|
||||
b"<report_metadata><email>dmarc@example.com</email>"
|
||||
b"<report_id>no-org</report_id></report_metadata>\n"
|
||||
b"<policy_published><domain>example.com</domain><p>none</p>"
|
||||
b"</policy_published>\n</feedback>"
|
||||
)
|
||||
with self.assertRaises(parsedmarc.ParserError) as ctx:
|
||||
parsedmarc.parse_report_file(xml, offline=True)
|
||||
message = str(ctx.exception)
|
||||
# The reason must name the aggregate format and the missing field,
|
||||
# not collapse to a bare "Not a valid report".
|
||||
self.assertIn("aggregate", message.lower())
|
||||
self.assertIn("org_name", message)
|
||||
self.assertNotIn("Not a valid report", message)
|
||||
|
||||
def testParseReportFileInvalidSmtpTlsReason(self):
|
||||
"""Malformed SMTP TLS JSON explains the SMTP-TLS-specific reason"""
|
||||
with self.assertRaises(parsedmarc.ParserError) as ctx:
|
||||
parsedmarc.parse_report_file(b'{"organization-name": "x"}', offline=True)
|
||||
message = str(ctx.exception)
|
||||
self.assertIn("SMTP TLS", message)
|
||||
self.assertIn("date-range", message)
|
||||
|
||||
def testParseReportFileInvalidFailureReason(self):
|
||||
"""A malformed failure report (email path) explains the reason"""
|
||||
# A real DMARC failure report arrives as a multipart/report email;
|
||||
# parse_report_file reaches it only via the email branch. Omit the
|
||||
# required Source-IP so parse_failure_report rejects it.
|
||||
eml = (
|
||||
b"From: dmarc-noreply@example.com\n"
|
||||
b"Subject: DMARC Failure Report\n"
|
||||
b"MIME-Version: 1.0\n"
|
||||
b"Content-Type: multipart/report; "
|
||||
b'report-type=feedback-report; boundary="b"\n\n'
|
||||
b"--b\n"
|
||||
b"Content-Type: text/plain\n\n"
|
||||
b"This is a DMARC failure report.\n"
|
||||
b"--b\n"
|
||||
b"Content-Type: message/feedback-report\n\n"
|
||||
b"Feedback-Type: auth-failure\n"
|
||||
b"Version: 1\n"
|
||||
b"--b\n"
|
||||
b"Content-Type: message/rfc822\n\n"
|
||||
b"From: spoof@victim.example\n"
|
||||
b"Subject: hi\n"
|
||||
b"--b--\n"
|
||||
)
|
||||
with self.assertRaises(parsedmarc.ParserError) as ctx:
|
||||
parsedmarc.parse_report_file(eml, offline=True)
|
||||
message = str(ctx.exception)
|
||||
# The reason must name the failure format and the missing field,
|
||||
# not collapse to a bare "Not a valid report".
|
||||
self.assertIn("failure", message.lower())
|
||||
self.assertIn("source_ip", message)
|
||||
self.assertNotIn("Not a valid report", message)
|
||||
|
||||
def testParseReportFileUnrecognizedGzip(self):
|
||||
"""Gzipped junk decompresses to a str that matches no format.
|
||||
|
||||
Exercises the str branch of the content sniff (the decompressed
|
||||
payload is a str, not bytes).
|
||||
"""
|
||||
blob = gzip.compress(b"plain junk not a report")
|
||||
with self.assertRaises(parsedmarc.ParserError) as ctx:
|
||||
parsedmarc.parse_report_file(blob, offline=True)
|
||||
self.assertIn("recognized report format", str(ctx.exception))
|
||||
|
||||
def testParseReportFileUnrecognizedFormat(self):
|
||||
"""Content matching no known format says so explicitly"""
|
||||
with self.assertRaises(parsedmarc.ParserError) as ctx:
|
||||
parsedmarc.parse_report_file(b"this is not a report", offline=True)
|
||||
self.assertIn("recognized report format", str(ctx.exception))
|
||||
|
||||
def testParseReportFilePreservesCause(self):
|
||||
"""The raised ParserError chains the underlying parse failure"""
|
||||
with self.assertRaises(parsedmarc.ParserError) as ctx:
|
||||
parsedmarc.parse_report_file(b"<feedback></feedback>", offline=True)
|
||||
# raise ... from email_error must populate __cause__ for callers and
|
||||
# tracebacks, even though the cross-format message is content-sniffed.
|
||||
self.assertIsNotNone(ctx.exception.__cause__)
|
||||
|
||||
def _parse_unexpected_error(self):
|
||||
# <feedback></feedback> parses as XML but trips a NoneType subscript
|
||||
# deep in the aggregate parser, hitting the catch-all "Unexpected
|
||||
# error" branch (not a narrow KeyError/ExpatError).
|
||||
with self.assertRaises(parsedmarc.ParserError) as ctx:
|
||||
parsedmarc.parse_report_file(b"<feedback></feedback>", offline=True)
|
||||
return str(ctx.exception)
|
||||
|
||||
def testUnexpectedErrorOmitsOriginWhenNotDebug(self):
|
||||
"""Catch-all errors stay clean when the logger is above DEBUG"""
|
||||
logger = logging.getLogger("parsedmarc.log")
|
||||
previous = logger.level
|
||||
logger.setLevel(logging.WARNING)
|
||||
try:
|
||||
message = self._parse_unexpected_error()
|
||||
finally:
|
||||
logger.setLevel(previous)
|
||||
self.assertIn("Unexpected error", message)
|
||||
self.assertNotIn("raised at", message)
|
||||
|
||||
def testUnexpectedErrorCitesOriginInDebug(self):
|
||||
"""Catch-all errors cite the source file:line when the logger is DEBUG"""
|
||||
logger = logging.getLogger("parsedmarc.log")
|
||||
previous = logger.level
|
||||
logger.setLevel(logging.DEBUG)
|
||||
try:
|
||||
message = self._parse_unexpected_error()
|
||||
finally:
|
||||
logger.setLevel(previous)
|
||||
self.assertIn("raised at", message)
|
||||
self.assertIn("__init__.py:", message)
|
||||
|
||||
def testExcOriginEmptyWhenNoTraceback(self):
|
||||
"""_exc_origin returns '' for an exception with no traceback"""
|
||||
logger = logging.getLogger("parsedmarc.log")
|
||||
previous = logger.level
|
||||
logger.setLevel(logging.DEBUG)
|
||||
try:
|
||||
# A never-raised exception has __traceback__ is None, so there is
|
||||
# no origin frame to cite even though debug logging is on.
|
||||
self.assertEqual(parsedmarc._exc_origin(ValueError("x")), "")
|
||||
finally:
|
||||
logger.setLevel(previous)
|
||||
|
||||
|
||||
class TestParseReportEmail(unittest.TestCase):
|
||||
"""Tests for parse_report_email edge cases"""
|
||||
@@ -2017,6 +2200,79 @@ This is not a DMARC report."""
|
||||
with self.assertRaises(parsedmarc.InvalidDMARCReport):
|
||||
parsedmarc.parse_report_email(email_str, offline=True)
|
||||
|
||||
def testUnparseableDateRaisesParserError(self):
|
||||
"""An unparseable Date header trips the initial mail-parse catch-all"""
|
||||
# human_timestamp_to_datetime() raises on a junk Date, which the
|
||||
# catch-all around the initial parse turns into a ParserError.
|
||||
email_str = "From: a@b.c\nDate: not-a-real-date\nSubject: x\n\nbody"
|
||||
with self.assertRaises(parsedmarc.ParserError) as ctx:
|
||||
parsedmarc.parse_report_email(email_str, offline=True)
|
||||
self.assertIn("not-a-real-date", str(ctx.exception))
|
||||
|
||||
def testFailureTextReportParses(self):
|
||||
"""A valid legacy text/plain failure report parses to a failure
|
||||
report (the success path that builds the synthetic feedback report
|
||||
and extracts the message sample)"""
|
||||
# The field-name regex matches letters and spaces only, so the legacy
|
||||
# fields are space-separated ("Received Date", "Sender IP Address").
|
||||
eml = (
|
||||
"From: report@example.com\nSubject: Failure Report\n"
|
||||
"Content-Type: text/plain\n\n"
|
||||
"A message claiming to be from you has failed authentication.\n"
|
||||
"Received Date: Mon, 01 Jan 2024 00:00:00 +0000\n"
|
||||
"Sender IP Address: 192.0.2.1\n"
|
||||
"detected.\n"
|
||||
"From: spoof@example.com\nTo: victim@example.com\nSubject: spam\n"
|
||||
)
|
||||
result = parsedmarc.parse_report_email(eml, offline=True)
|
||||
self.assertEqual(result["report_type"], "failure")
|
||||
report = cast(FailureReport, result["report"])
|
||||
self.assertEqual(report["source"]["ip_address"], "192.0.2.1")
|
||||
|
||||
def testFailureTextMissingFieldsRaises(self):
|
||||
"""A text/plain failure report missing its fields is rejected with
|
||||
the subject named (not silently dropped)"""
|
||||
# Has the trigger phrase and "detected." but none of the
|
||||
# Received-Date / Sender-IP-Address fields, so building the synthetic
|
||||
# feedback report raises KeyError, surfaced as InvalidDMARCReport.
|
||||
eml = (
|
||||
"From: a@b.c\nSubject: Failure\nContent-Type: text/plain\n\n"
|
||||
"A message claiming to be from you has failed. "
|
||||
"No fields here detected. nothing\n"
|
||||
)
|
||||
with self.assertRaises(parsedmarc.InvalidDMARCReport) as ctx:
|
||||
parsedmarc.parse_report_email(eml, offline=True)
|
||||
self.assertIn("Failure", str(ctx.exception))
|
||||
|
||||
def testAttachmentMalformedXmlRaises(self):
|
||||
"""A base64 attachment of malformed aggregate XML is rejected"""
|
||||
att = base64.b64encode(b"<feedback></feedback>").decode()
|
||||
eml = (
|
||||
"From: a@b.c\nSubject: Agg\nMIME-Version: 1.0\n"
|
||||
"Content-Type: application/octet-stream\n"
|
||||
"Content-Transfer-Encoding: base64\n\n" + att + "\n"
|
||||
)
|
||||
with self.assertRaises(parsedmarc.ParserError) as ctx:
|
||||
parsedmarc.parse_report_email(eml, offline=True)
|
||||
self.assertIn("not a valid DMARC report", str(ctx.exception))
|
||||
|
||||
def testAttachmentInvalidJsonRaises(self):
|
||||
"""A base64 attachment of invalid SMTP TLS JSON is rejected.
|
||||
|
||||
parse_smtp_tls_report_json raises InvalidSMTPTLSReport, a sibling of
|
||||
InvalidDMARCReport, so it falls through to the generic catch-all and
|
||||
becomes a ParserError naming the subject.
|
||||
"""
|
||||
att = base64.b64encode(b"{not valid json").decode()
|
||||
eml = (
|
||||
"From: a@b.c\nSubject: Tls\nMIME-Version: 1.0\n"
|
||||
"Content-Type: application/octet-stream\n"
|
||||
"Content-Transfer-Encoding: base64\n\n" + att + "\n"
|
||||
)
|
||||
with self.assertRaises(parsedmarc.ParserError) as ctx:
|
||||
parsedmarc.parse_report_email(eml, offline=True)
|
||||
self.assertIn("Tls", str(ctx.exception))
|
||||
|
||||
|
||||
class TestFailureReportParsing(unittest.TestCase):
|
||||
"""Tests for failure report field defaults and edge cases"""
|
||||
@@ -2232,6 +2488,30 @@ class TestSmtpTlsReportErrors(unittest.TestCase):
|
||||
with self.assertRaises(parsedmarc.InvalidSMTPTLSReport):
|
||||
parsedmarc.parse_smtp_tls_report_json("not json {{{")
|
||||
|
||||
def testInvalidJsonPreservesCause(self):
|
||||
"""Invalid JSON chains the underlying JSONDecodeError"""
|
||||
with self.assertRaises(parsedmarc.InvalidSMTPTLSReport) as ctx:
|
||||
parsedmarc.parse_smtp_tls_report_json("not json {{{")
|
||||
self.assertIsInstance(ctx.exception.__cause__, json.JSONDecodeError)
|
||||
|
||||
def testNestedMissingKeyNamesTheField(self):
|
||||
"""A KeyError on a nested field reports which field was missing"""
|
||||
# All five required top-level fields are present, so this gets past the
|
||||
# top-level check and raises KeyError on date-range["start-datetime"].
|
||||
report = json.dumps(
|
||||
{
|
||||
"organization-name": "x",
|
||||
"date-range": {},
|
||||
"contact-info": "x",
|
||||
"report-id": "x",
|
||||
"policies": [],
|
||||
}
|
||||
)
|
||||
with self.assertRaises(parsedmarc.InvalidSMTPTLSReport) as ctx:
|
||||
parsedmarc.parse_smtp_tls_report_json(report)
|
||||
self.assertIn("start-datetime", str(ctx.exception))
|
||||
self.assertIsInstance(ctx.exception.__cause__, KeyError)
|
||||
|
||||
|
||||
class TestBucketIntervalEdgeCases(unittest.TestCase):
|
||||
"""Tests for _bucket_interval_by_day edge cases"""
|
||||
|
||||
Reference in New Issue
Block a user