10.2.0 - Explain why a report is invalid instead of "Not a valid report" (#802)

* Explain why a report is invalid instead of "Not a valid report" The parser catches broadly so one malformed report can't crash a batch, but every failure surfaced as the generic ParserError("Not a valid report"), telling operators nothing about the cause. parse_report_file() now keeps each format parser's specific error as it tries aggregate XML -> SMTP TLS JSON -> report email, and when all three reject the input it content-sniffs the leading byte to surface the single relevant reason (e.g. "Invalid aggregate report: Missing field: 'org_name'", or "Not a recognized report format (...)"). The CLI already logs str(error), so this reaches the user with no cli.py change. Every parser catch site also re-raises with `raise ... from <original>`, preserving the underlying ExpatError / JSONDecodeError / KeyError / archive errors on __cause__ for library callers and tracebacks. The same exception *types* are still raised. Finally, the catch-all "unexpected error" branches append `(raised at <file>:<line>)` from the deepest traceback frame, but only when the parsedmarc logger is at DEBUG level (e.g. the CLI's --debug); normal-level output is unchanged. Bumps the in-progress version to 10.2.0 and documents all three in the CHANGELOG. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cover the failure-report path in parse_report_file error tests The reason-surfacing tests covered the aggregate and SMTP TLS branches but not failure reports, which reach parse_report_file only via the email path. Add a malformed multipart/report failure email (missing the required Source-IP) and assert the message names the failure format and the missing field rather than collapsing to "Not a valid report". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cover the new error-reporting lines; drop one unreachable catch Bring the lines added by this PR to full test coverage: - Test _exc_origin() with a never-raised exception (no __traceback__) so the "no frames" guard is exercised. - Test _parse_smtp_tls_failure_details() with a non-dict, which raises TypeError (not KeyError) and exercises the generic catch-all. - Test parse_report_email() with an unparseable Date header, which trips the initial mail-parse catch-all and becomes a ParserError. Two dead lines are removed rather than hidden, per the project's "delete unreachable branches, no # pragma: no cover" rule: - _looks_like_email() looped with a `continue` for blank lines, but every caller passes lstrip()-ed text, so the first line is never blank. Simplified to inspect the first line directly. - parse_report_email()'s `except Exception` after `except InvalidFailureReport` was unreachable: parse_failure_report wraps its entire body and provably raises only InvalidFailureReport, which the preceding handler already catches. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Fix IndexError when backfilling envelope_from from SPF results _parse_report_record() backfills a missing/empty envelope_from from the last SPF auth result's domain. The "envelope_from is None" branch gated on the raw auth_results["spf"] list but indexed the filtered new_record["auth_results"]["spf"] list, which only holds results that have a domain. A reporter sending an SPF result with no domain made the filtered list empty while the raw list was non-empty, so [-1] raised IndexError and the whole record failed to parse. The two near-identical envelope_from backfill branches (missing identifier vs. empty identifier) drifted apart -- only one was updated when the filtered new_record list was introduced -- which is what let them disagree on which list to read. Merge them into a single path, keyed on dict.get("envelope_from") is None, that gates and indexes the same raw list with the "domain" membership guard the missing-identifier branch already used. Regression test: envelope_from=None with an SPF result carrying no domain now parses to envelope_from=None instead of raising. This is the bug that motivated the surrounding error-reporting work in this PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Trim comment * Cover the touched error branches; drop a dead UnicodeDecodeError catch Codecov flagged the pre-existing error branches this PR touched (adding `from e` / `_exc_origin`) as changed-and-uncovered. Most are real malformed-report paths, so add honest tests that drive them with realistic inputs: - parse_smtp_tls_report_json: nested missing key (date-range without start-datetime) -> InvalidSMTPTLSReport chaining a KeyError. - parse_aggregate_report_xml: non-structured report_metadata -> the AttributeError branch ("Report missing required section"). - parse_report_email: valid legacy text/plain failure report (success path), a text report missing its fields, a base64 attachment of malformed aggregate XML, and one of invalid SMTP TLS JSON. - parse_report_file: gzipped junk -> the str branch of the content sniff. The `except UnicodeDecodeError` in extract_report is removed as dead code (no `# pragma: no cover`, per the repo rule): str-mode streams are already rejected by explicit isinstance checks, and every decode() uses errors="ignore", so it can never fire. str-mode still raises ParserError. Also rename the two new failure-report tests from "Forensic" to "Failure" and add an AGENTS.md rule: RUF reports are "failure reports"; "forensic" is reserved for the literal backward-compat alias identifiers only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Fix pyright errors in the new error-branch tests CI runs pyright over the whole repo (tests included); these slipped through because the local check only covered parsedmarc/__init__.py: - _parse_smtp_tls_failure_details("not a dict") is a deliberate wrong-type test -> targeted `# pyright: ignore[reportArgumentType]`. - result["report"]["source"]["ip_address"] on a ParsedReport TypedDict -> cast(FailureReport, result["report"]) first, matching existing tests. This is what failed lint-docs-build (and, since `test` needs it, skipped the Codecov upload) on the prior commits. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-05 16:25:09 +00:00 · 2026-06-25 14:53:56 -04:00
parent bc17e6afb2
commit a67e8d3ebc
5 changed files with 423 additions and 45 deletions
@@ -60,6 +60,8 @@ To skip DNS lookups during testing, set `GITHUB_ACTIONS=true`.

 `ReportType = Literal["aggregate", "failure", "smtp_tls"]`. Exception hierarchy: `ParserError` → `InvalidDMARCReport` → `InvalidAggregateReport`/`InvalidFailureReport`, and `InvalidSMTPTLSReport`. Legacy alias `InvalidForensicReport = InvalidFailureReport` preserved.

+**Terminology: say "failure report", never "forensic report".** RUF reports are **failure reports** (RFC 9991 terminology) everywhere in new code, test names, docstrings, comments, CHANGELOG entries, and prose. "Forensic" is the legacy term, kept *only* as backward-compatible API aliases (`parse_forensic_report`, `InvalidForensicReport`, `parsed_forensic_reports_to_csv`, `ForensicReport`, the `Forensic` archive-folder name). Use "forensic" only when naming one of those literal pre-existing identifiers — never as a generic description of the report type.
+
 ### RFC 9989 / RFC 9990 / RFC 9991 support

 Aggregate reports parse under both RFC 7489 and RFC 9990 in one code path. RFC 9990 adds these fields, all surfaced through `AggregatePolicyPublished` / `AggregateReportMetadata` / `AggregateAuthResult*`:
@@ -1,5 +1,17 @@
 # Changelog

+## 10.2.0
+
+### Changes
+
+- **Invalid reports now explain *why* they are invalid.** The parser still catches broadly so one malformed report can never crash a batch, but the diagnostics are no longer a bare `Not a valid report`. `parse_report_file()` keeps each format parser's specific error as it tries aggregate XML → SMTP TLS JSON → report email, and when all three reject the input it sniffs the content shape to surface the single relevant reason — e.g. `Invalid aggregate report: Missing field: 'org_name'`, `Invalid SMTP TLS report: Missing required field: date-range`, or `Not a recognized report format (not a DMARC aggregate XML report, an SMTP TLS JSON report, or a DMARC report email)`. This is the message the CLI already logs (`Failed to parse <file> - <reason>`), so operators see the cause without a code change on their end.
+- **Original exceptions are now chained.** Every parser catch site re-raises with `raise … from <original>`, so the underlying `ExpatError`, `json.JSONDecodeError`, `KeyError`, and archive/decompression errors are preserved on `__cause__` for library callers and tracebacks instead of being flattened into a string. Behavior is otherwise unchanged — the same exception *types* are still raised.
+- **Unexpected errors cite their origin in debug mode.** When a catch-all branch turns an unforeseen exception into a generic `Unexpected error: …` / archive / mail-parse failure, the message now appends `(raised at <file>:<line>)` pinpointing the deepest traceback frame — but only when the `parsedmarc` logger is at `DEBUG` level (e.g. the CLI's `--debug`). Normal-level output is unchanged.
+
+### Bug fixes
+
+- **Fixed an `IndexError` when backfilling `envelope_from` from SPF results.** In `_parse_report_record()`, when an aggregate report's `<identifiers>` carried an empty `envelope_from`, the fallback checked the raw `auth_results["spf"]` list for length but then indexed the *filtered* `new_record["auth_results"]["spf"]` list (which only contains results that have a `domain`). A reporter that sent an SPF auth result with no `domain` made the filtered list empty while the raw list was non-empty, so the `[-1]` index raised `IndexError` and the whole record failed to parse. The two near-identical `envelope_from` backfill branches (one for a missing identifier, one for an empty one) are now a single code path that gates and indexes the same list, matching the already-correct missing-identifier branch.
+
 ## 10.1.1

 ### Changes
@@ -8,11 +8,13 @@ import binascii
 import email
 import email.utils
 import json
+import logging
 import mailbox
 import os
 import re
 import shutil
 import tempfile
+import traceback
 import xml.parsers.expat as expat
 import zipfile
 import zlib
@@ -144,6 +146,26 @@ class InvalidFailureReport(InvalidDMARCReport):
 InvalidForensicReport = InvalidFailureReport


+def _exc_origin(error: BaseException) -> str:
+    """Returns a ``" (raised at <file>:<line>)"`` suffix pointing at where an
+    unexpected exception actually originated, but only when the parsedmarc
+    logger is at ``DEBUG`` level. Returns ``""`` otherwise, so normal output is
+    unchanged.
+
+    The deepest traceback frame is the line that raised, which is the useful
+    breadcrumb when a catch-all ``except Exception`` turns an unforeseen error
+    into a generic ``Invalid*Report`` -- without it, the message says *what*
+    failed but not *where*.
+    """
+    if not logger.isEnabledFor(logging.DEBUG):
+        return ""
+    frames = traceback.extract_tb(error.__traceback__)
+    if not frames:
+        return ""
+    last = frames[-1]
+    return " (raised at {0}:{1})".format(last.filename, last.lineno)
+
+
 def _text(value: Any) -> Optional[str]:
    """Unwrap a possibly-langAttrString value parsed by xmltodict.

@@ -510,7 +532,9 @@ def _parse_report_record(
            new_result["human_result"] = _text(result.get("human_result"))
            new_record["auth_results"]["spf"].append(new_result)

-    if "envelope_from" not in new_record["identifiers"]:
+    # Backfill envelope_from from the last SPF result's domain when the
+    # reporter omitted the identifier or sent it empty.
+    if new_record["identifiers"].get("envelope_from") is None:
        envelope_from = None
        if len(auth_results["spf"]) > 0:
            spf_result = auth_results["spf"][-1]
@@ -520,13 +544,6 @@ def _parse_report_record(
            envelope_from = str(envelope_from).lower()
        new_record["identifiers"]["envelope_from"] = envelope_from

-    elif new_record["identifiers"]["envelope_from"] is None:
-        if len(auth_results["spf"]) > 0:
-            envelope_from = new_record["auth_results"]["spf"][-1]["domain"]
-            if envelope_from is not None:
-                envelope_from = str(envelope_from).lower()
-            new_record["identifiers"]["envelope_from"] = envelope_from
-
    envelope_to = None
    if "envelope_to" in new_record["identifiers"]:
        envelope_to = new_record["identifiers"]["envelope_to"]
@@ -568,9 +585,11 @@ def _parse_smtp_tls_failure_details(failure_details: dict[str, Any]):
        return new_failure_details

    except KeyError as e:
-        raise InvalidSMTPTLSReport(f"Missing required failure details field: {e}")
+        raise InvalidSMTPTLSReport(
+            f"Missing required failure details field: {e}"
+        ) from e
    except Exception as e:
-        raise InvalidSMTPTLSReport(str(e))
+        raise InvalidSMTPTLSReport(str(e) + _exc_origin(e)) from e


 def _parse_smtp_tls_report_policy(policy: dict[str, Any]):
@@ -608,9 +627,9 @@ def _parse_smtp_tls_report_policy(policy: dict[str, Any]):
        return new_policy

    except KeyError as e:
-        raise InvalidSMTPTLSReport(f"Missing required policy field: {e}")
+        raise InvalidSMTPTLSReport(f"Missing required policy field: {e}") from e
    except Exception as e:
-        raise InvalidSMTPTLSReport(str(e))
+        raise InvalidSMTPTLSReport(str(e) + _exc_origin(e)) from e


 def parse_smtp_tls_report_json(report: Union[str, bytes]) -> SMTPTLSReport:
@@ -650,9 +669,9 @@ def parse_smtp_tls_report_json(report: Union[str, bytes]) -> SMTPTLSReport:
        return new_report

    except KeyError as e:
-        raise InvalidSMTPTLSReport(f"Missing required field: {e}")
+        raise InvalidSMTPTLSReport(f"Missing required field: {e}") from e
    except Exception as e:
-        raise InvalidSMTPTLSReport(str(e))
+        raise InvalidSMTPTLSReport(str(e) + _exc_origin(e)) from e


 def parsed_smtp_tls_reports_to_csv_rows(
@@ -1026,15 +1045,21 @@ def parse_aggregate_report_xml(
        return cast(AggregateReport, new_report)

    except expat.ExpatError as error:
-        raise InvalidAggregateReport("Invalid XML: {0}".format(error.__str__()))
+        raise InvalidAggregateReport(
+            "Invalid XML: {0}".format(error.__str__())
+        ) from error

    except KeyError as error:
-        raise InvalidAggregateReport("Missing field: {0}".format(error.__str__()))
-    except AttributeError:
-        raise InvalidAggregateReport("Report missing required section")
+        raise InvalidAggregateReport(
+            "Missing field: {0}".format(error.__str__())
+        ) from error
+    except AttributeError as error:
+        raise InvalidAggregateReport("Report missing required section") from error

    except Exception as error:
-        raise InvalidAggregateReport("Unexpected error: {0}".format(error.__str__()))
+        raise InvalidAggregateReport(
+            "Unexpected error: {0}{1}".format(error.__str__(), _exc_origin(error))
+        ) from error


 def extract_report(content: Union[bytes, str, BinaryIO]) -> str:
@@ -1112,10 +1137,10 @@ def extract_report(content: Union[bytes, str, BinaryIO]) -> str:
        else:
            raise ParserError("Not a valid zip, gzip, json, or xml file")

-    except UnicodeDecodeError:
-        raise ParserError("File objects must be opened in binary (rb) mode")
    except Exception as error:
-        raise ParserError("Invalid archive file: {0}".format(error.__str__()))
+        raise ParserError(
+            "Invalid archive file: {0}{1}".format(error.__str__(), _exc_origin(error))
+        ) from error
    finally:
        if file_object:
            try:
@@ -1176,7 +1201,7 @@ def parse_aggregate_report_file(
    try:
        xml = extract_report(_input)
    except Exception as e:
-        raise InvalidAggregateReport(e)
+        raise InvalidAggregateReport(str(e) + _exc_origin(e)) from e

    return parse_aggregate_report_xml(
        xml,
@@ -1557,10 +1582,14 @@ def parse_failure_report(
        return cast(FailureReport, parsed_report)

    except KeyError as error:
-        raise InvalidFailureReport("Missing value: {0}".format(error.__str__()))
+        raise InvalidFailureReport(
+            "Missing value: {0}".format(error.__str__())
+        ) from error

    except Exception as error:
-        raise InvalidFailureReport("Unexpected error: {0}".format(error.__str__()))
+        raise InvalidFailureReport(
+            "Unexpected error: {0}{1}".format(error.__str__(), _exc_origin(error))
+        ) from error


 def parsed_failure_reports_to_csv_rows(
@@ -1728,7 +1757,7 @@ def parse_report_email(
        msg = email.message_from_string(input_str)

    except Exception as e:
-        raise ParserError(e.__str__())
+        raise ParserError(e.__str__() + _exc_origin(e)) from e
    subject = None
    feedback_report = None
    smtp_tls_report = None
@@ -1785,10 +1814,10 @@ def parse_report_email(
                        fields["received-date"], fields["sender-ip-address"]
                    )
                except Exception as e:
-                    error = 'Unable to parse message with subject "{0}": {1}'.format(
-                        subject, e
+                    error = 'Unable to parse message with subject "{0}": {1}{2}'.format(
+                        subject, e, _exc_origin(e)
                    )
-                    raise InvalidDMARCReport(error)
+                    raise InvalidDMARCReport(error) from e

                sample = parts[1].lstrip()
                logger.debug(sample)
@@ -1827,17 +1856,18 @@ def parse_report_email(
            except (TypeError, ValueError, binascii.Error):
                pass

-            except InvalidDMARCReport:
-                error = 'Message with subject "{0}" is not a valid DMARC report'.format(
-                    subject
+            except InvalidDMARCReport as e:
+                error = (
+                    'Message with subject "{0}" is not a valid '
+                    "DMARC report: {1}".format(subject, e)
                )
-                raise ParserError(error)
+                raise ParserError(error) from e

            except Exception as e:
-                error = 'Unable to parse message with subject "{0}": {1}'.format(
-                    subject, e
+                error = 'Unable to parse message with subject "{0}": {1}{2}'.format(
+                    subject, e, _exc_origin(e)
                )
-                raise ParserError(error)
+                raise ParserError(error) from e

    if feedback_report and sample:
        try:
@@ -1861,9 +1891,7 @@ def parse_report_email(
                "is not a valid "
                "failure DMARC report: {1}".format(subject, e)
            )
-            raise InvalidFailureReport(error)
-        except Exception as e:
-            raise InvalidFailureReport(e.__str__())
+            raise InvalidFailureReport(error) from e

        result = {"report_type": "failure", "report": failure_report}
        return result
@@ -1875,6 +1903,54 @@ def parse_report_email(
    return result


+# An RFC 5322 header field name (printable ASCII excluding the colon) followed
+# by a colon at the start of a line, or an mbox "From " separator.
+_email_header_regex = re.compile(r"^(From |[\x21-\x39\x3b-\x7e]+:)")
+
+
+def _looks_like_email(text: str) -> bool:
+    """Returns True if the first line looks like an email header.
+
+    Callers pass already-``lstrip()``-ed text, so the first line is the first
+    meaningful line.
+    """
+    first_line = text.split("\n", 1)[0]
+    return _email_header_regex.match(first_line) is not None
+
+
+def _describe_parse_failure(
+    content: Union[str, bytes],
+    aggregate_error: InvalidAggregateReport,
+    smtp_tls_error: InvalidSMTPTLSReport,
+    email_error: InvalidDMARCReport,
+) -> str:
+    """Builds a human-readable reason for a parse_report_file failure.
+
+    parse_report_file tries the aggregate XML, SMTP TLS JSON, and report-email
+    parsers in turn; when all three reject the input, only the parser for the
+    format the content actually resembles produced a meaningful error. The
+    other two are noise -- a malformed aggregate report is also "not JSON" and
+    "not an email". Sniff the leading non-whitespace byte to surface the single
+    relevant reason.
+    """
+    if isinstance(content, (bytes, bytearray, memoryview)):
+        sniff = bytes(content)[:512].decode("utf-8", errors="replace")
+    else:
+        sniff = content[:512]
+    sniff = sniff.lstrip()
+
+    if sniff.startswith("<"):
+        return "Invalid aggregate report: {0}".format(aggregate_error)
+    if sniff.startswith("{"):
+        return "Invalid SMTP TLS report: {0}".format(smtp_tls_error)
+    if _looks_like_email(sniff):
+        return "Invalid report email: {0}".format(email_error)
+    return (
+        "Not a recognized report format (not a DMARC aggregate XML report, "
+        "an SMTP TLS JSON report, or a DMARC report email)"
+    )
+
+
 def parse_report_file(
    input_: Union[bytes, str, os.PathLike[str], os.PathLike[bytes], BinaryIO],
    *,
@@ -1930,6 +2006,10 @@ def parse_report_file(

    results: Optional[ParsedReport] = None

+    # parse_report_file tries the three report formats in turn. When all three
+    # reject the input, keep each format's specific error so the final message
+    # can explain *why* the file is invalid instead of a bare "Not a valid
+    # report" (see _describe_parse_failure).
    try:
        report = parse_aggregate_report_file(
            content,
@@ -1945,11 +2025,11 @@ def parse_report_file(
            normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
        )
        results = {"report_type": "aggregate", "report": report}
-    except InvalidAggregateReport:
+    except InvalidAggregateReport as aggregate_error:
        try:
            report = parse_smtp_tls_report_json(content)
            results = {"report_type": "smtp_tls", "report": report}
-        except InvalidSMTPTLSReport:
+        except InvalidSMTPTLSReport as smtp_tls_error:
            try:
                results = parse_report_email(
                    content,
@@ -1965,8 +2045,12 @@ def parse_report_file(
                    keep_alive=keep_alive,
                    normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
                )
-            except InvalidDMARCReport:
-                raise ParserError("Not a valid report")
+            except InvalidDMARCReport as email_error:
+                raise ParserError(
+                    _describe_parse_failure(
+                        content, aggregate_error, smtp_tls_error, email_error
+                    )
+                ) from email_error

    if results is None:
        raise ParserError("Not a valid report")
@@ -1,4 +1,4 @@
-__version__ = "10.1.1"
+__version__ = "10.2.0"

 USER_AGENT = f"parsedmarc/{__version__}"

@@ -5,7 +5,10 @@ parse_aggregate_report_xml, parse_failure_report, parse_smtp_tls_report_json,
 extract_report, get_dmarc_reports_from_mbox, and the CSV / JSON renderers.
 """

+import base64
+import gzip
 import json
+import logging
 import mailbox
 import os
 import unittest
@@ -935,6 +938,33 @@ class Test(unittest.TestCase):
        result = parsedmarc._parse_report_record(record, offline=True)
        self.assertEqual(result["identifiers"]["envelope_from"], "spf.example.com")

+    def testParseReportRecordEnvelopeFromNullNoSpfDomain(self):
+        """envelope_from=None with SPF results that carry no domain must not
+        raise IndexError (regression: the branch gated on the raw SPF list but
+        indexed the filtered list, which is empty when no result has a domain)"""
+        record = {
+            "row": {
+                "source_ip": "192.0.2.1",
+                "count": "1",
+                "policy_evaluated": {
+                    "disposition": "none",
+                    "dkim": "pass",
+                    "spf": "pass",
+                },
+            },
+            "identifiers": {
+                "header_from": "example.com",
+                "envelope_from": None,
+            },
+            # A raw SPF result with no "domain" -> filtered list is empty.
+            "auth_results": {
+                "dkim": [],
+                "spf": [{"scope": "mfrom", "result": "pass"}],
+            },
+        }
+        result = parsedmarc._parse_report_record(record, offline=True)
+        self.assertIsNone(result["identifiers"]["envelope_from"])
+
    def testParseReportRecordEnvelopeTo(self):
        """envelope_to is preserved and moved correctly"""
        record = {
@@ -1013,6 +1043,14 @@ class Test(unittest.TestCase):
        with self.assertRaises(parsedmarc.InvalidSMTPTLSReport):
            parsedmarc._parse_smtp_tls_failure_details({"result-type": "err"})

+    def testParseSmtpTlsFailureDetailsNonDict(self):
+        """A non-dict failure-details value hits the catch-all (TypeError,
+        not KeyError) and is wrapped as InvalidSMTPTLSReport"""
+        with self.assertRaises(parsedmarc.InvalidSMTPTLSReport) as ctx:
+            # Deliberate wrong type to exercise the non-KeyError catch-all.
+            parsedmarc._parse_smtp_tls_failure_details("not a dict")  # pyright: ignore[reportArgumentType]
+        self.assertIsInstance(ctx.exception.__cause__, TypeError)
+
    def testParseSmtpTlsReportPolicyValid(self):
        """Valid STS policy parses correctly"""
        policy = {
@@ -1822,8 +1860,25 @@ class TestMalformedXmlRecovery(unittest.TestCase):
    <auth_results><spf><domain>example.com</domain><result>pass</result></spf></auth_results>
  </record>
 </feedback>"""
-        with self.assertRaises(parsedmarc.InvalidAggregateReport):
+        with self.assertRaises(parsedmarc.InvalidAggregateReport) as ctx:
            parsedmarc.parse_aggregate_report_xml(xml, offline=True)
+        # The missing-field error chains the underlying KeyError so a library
+        # caller can inspect which field was absent.
+        self.assertIsInstance(ctx.exception.__cause__, KeyError)
+
+    def testReportMetadataNotAStructureRaises(self):
+        """A non-structured report_metadata trips the AttributeError branch"""
+        # report_metadata is plain text rather than nested elements, so the
+        # parser's attribute access on it fails with AttributeError.
+        xml = (
+            "<feedback><report_metadata>x</report_metadata>"
+            "<policy_published><domain>x.com</domain></policy_published>"
+            "<record></record></feedback>"
+        )
+        with self.assertRaises(parsedmarc.InvalidAggregateReport) as ctx:
+            parsedmarc.parse_aggregate_report_xml(xml, offline=True)
+        self.assertIn("missing required section", str(ctx.exception))
+        self.assertIsInstance(ctx.exception.__cause__, AttributeError)


 class TestPolicyPublishedEdgeCases(unittest.TestCase):
@@ -1995,6 +2050,134 @@ class TestParseReportFile(unittest.TestCase):
        with self.assertRaises(parsedmarc.ParserError):
            parsedmarc.parse_report_file(b"this is not a report", offline=True)

+    def testParseReportFileInvalidAggregateReason(self):
+        """Malformed aggregate XML explains the aggregate-specific reason"""
+        xml = (
+            b'<?xml version="1.0"?>\n<feedback>\n'
+            b"<report_metadata><email>dmarc@example.com</email>"
+            b"<report_id>no-org</report_id></report_metadata>\n"
+            b"<policy_published><domain>example.com</domain><p>none</p>"
+            b"</policy_published>\n</feedback>"
+        )
+        with self.assertRaises(parsedmarc.ParserError) as ctx:
+            parsedmarc.parse_report_file(xml, offline=True)
+        message = str(ctx.exception)
+        # The reason must name the aggregate format and the missing field,
+        # not collapse to a bare "Not a valid report".
+        self.assertIn("aggregate", message.lower())
+        self.assertIn("org_name", message)
+        self.assertNotIn("Not a valid report", message)
+
+    def testParseReportFileInvalidSmtpTlsReason(self):
+        """Malformed SMTP TLS JSON explains the SMTP-TLS-specific reason"""
+        with self.assertRaises(parsedmarc.ParserError) as ctx:
+            parsedmarc.parse_report_file(b'{"organization-name": "x"}', offline=True)
+        message = str(ctx.exception)
+        self.assertIn("SMTP TLS", message)
+        self.assertIn("date-range", message)
+
+    def testParseReportFileInvalidFailureReason(self):
+        """A malformed failure report (email path) explains the reason"""
+        # A real DMARC failure report arrives as a multipart/report email;
+        # parse_report_file reaches it only via the email branch. Omit the
+        # required Source-IP so parse_failure_report rejects it.
+        eml = (
+            b"From: dmarc-noreply@example.com\n"
+            b"Subject: DMARC Failure Report\n"
+            b"MIME-Version: 1.0\n"
+            b"Content-Type: multipart/report; "
+            b'report-type=feedback-report; boundary="b"\n\n'
+            b"--b\n"
+            b"Content-Type: text/plain\n\n"
+            b"This is a DMARC failure report.\n"
+            b"--b\n"
+            b"Content-Type: message/feedback-report\n\n"
+            b"Feedback-Type: auth-failure\n"
+            b"Version: 1\n"
+            b"--b\n"
+            b"Content-Type: message/rfc822\n\n"
+            b"From: spoof@victim.example\n"
+            b"Subject: hi\n"
+            b"--b--\n"
+        )
+        with self.assertRaises(parsedmarc.ParserError) as ctx:
+            parsedmarc.parse_report_file(eml, offline=True)
+        message = str(ctx.exception)
+        # The reason must name the failure format and the missing field,
+        # not collapse to a bare "Not a valid report".
+        self.assertIn("failure", message.lower())
+        self.assertIn("source_ip", message)
+        self.assertNotIn("Not a valid report", message)
+
+    def testParseReportFileUnrecognizedGzip(self):
+        """Gzipped junk decompresses to a str that matches no format.
+
+        Exercises the str branch of the content sniff (the decompressed
+        payload is a str, not bytes).
+        """
+        blob = gzip.compress(b"plain junk not a report")
+        with self.assertRaises(parsedmarc.ParserError) as ctx:
+            parsedmarc.parse_report_file(blob, offline=True)
+        self.assertIn("recognized report format", str(ctx.exception))
+
+    def testParseReportFileUnrecognizedFormat(self):
+        """Content matching no known format says so explicitly"""
+        with self.assertRaises(parsedmarc.ParserError) as ctx:
+            parsedmarc.parse_report_file(b"this is not a report", offline=True)
+        self.assertIn("recognized report format", str(ctx.exception))
+
+    def testParseReportFilePreservesCause(self):
+        """The raised ParserError chains the underlying parse failure"""
+        with self.assertRaises(parsedmarc.ParserError) as ctx:
+            parsedmarc.parse_report_file(b"<feedback></feedback>", offline=True)
+        # raise ... from email_error must populate __cause__ for callers and
+        # tracebacks, even though the cross-format message is content-sniffed.
+        self.assertIsNotNone(ctx.exception.__cause__)
+
+    def _parse_unexpected_error(self):
+        # <feedback></feedback> parses as XML but trips a NoneType subscript
+        # deep in the aggregate parser, hitting the catch-all "Unexpected
+        # error" branch (not a narrow KeyError/ExpatError).
+        with self.assertRaises(parsedmarc.ParserError) as ctx:
+            parsedmarc.parse_report_file(b"<feedback></feedback>", offline=True)
+        return str(ctx.exception)
+
+    def testUnexpectedErrorOmitsOriginWhenNotDebug(self):
+        """Catch-all errors stay clean when the logger is above DEBUG"""
+        logger = logging.getLogger("parsedmarc.log")
+        previous = logger.level
+        logger.setLevel(logging.WARNING)
+        try:
+            message = self._parse_unexpected_error()
+        finally:
+            logger.setLevel(previous)
+        self.assertIn("Unexpected error", message)
+        self.assertNotIn("raised at", message)
+
+    def testUnexpectedErrorCitesOriginInDebug(self):
+        """Catch-all errors cite the source file:line when the logger is DEBUG"""
+        logger = logging.getLogger("parsedmarc.log")
+        previous = logger.level
+        logger.setLevel(logging.DEBUG)
+        try:
+            message = self._parse_unexpected_error()
+        finally:
+            logger.setLevel(previous)
+        self.assertIn("raised at", message)
+        self.assertIn("__init__.py:", message)
+
+    def testExcOriginEmptyWhenNoTraceback(self):
+        """_exc_origin returns '' for an exception with no traceback"""
+        logger = logging.getLogger("parsedmarc.log")
+        previous = logger.level
+        logger.setLevel(logging.DEBUG)
+        try:
+            # A never-raised exception has __traceback__ is None, so there is
+            # no origin frame to cite even though debug logging is on.
+            self.assertEqual(parsedmarc._exc_origin(ValueError("x")), "")
+        finally:
+            logger.setLevel(previous)
+

 class TestParseReportEmail(unittest.TestCase):
    """Tests for parse_report_email edge cases"""
@@ -2017,6 +2200,79 @@ This is not a DMARC report."""
        with self.assertRaises(parsedmarc.InvalidDMARCReport):
            parsedmarc.parse_report_email(email_str, offline=True)

+    def testUnparseableDateRaisesParserError(self):
+        """An unparseable Date header trips the initial mail-parse catch-all"""
+        # human_timestamp_to_datetime() raises on a junk Date, which the
+        # catch-all around the initial parse turns into a ParserError.
+        email_str = "From: a@b.c\nDate: not-a-real-date\nSubject: x\n\nbody"
+        with self.assertRaises(parsedmarc.ParserError) as ctx:
+            parsedmarc.parse_report_email(email_str, offline=True)
+        self.assertIn("not-a-real-date", str(ctx.exception))
+
+    def testFailureTextReportParses(self):
+        """A valid legacy text/plain failure report parses to a failure
+        report (the success path that builds the synthetic feedback report
+        and extracts the message sample)"""
+        # The field-name regex matches letters and spaces only, so the legacy
+        # fields are space-separated ("Received Date", "Sender IP Address").
+        eml = (
+            "From: report@example.com\nSubject: Failure Report\n"
+            "Content-Type: text/plain\n\n"
+            "A message claiming to be from you has failed authentication.\n"
+            "Received Date: Mon, 01 Jan 2024 00:00:00 +0000\n"
+            "Sender IP Address: 192.0.2.1\n"
+            "detected.\n"
+            "From: spoof@example.com\nTo: victim@example.com\nSubject: spam\n"
+        )
+        result = parsedmarc.parse_report_email(eml, offline=True)
+        self.assertEqual(result["report_type"], "failure")
+        report = cast(FailureReport, result["report"])
+        self.assertEqual(report["source"]["ip_address"], "192.0.2.1")
+
+    def testFailureTextMissingFieldsRaises(self):
+        """A text/plain failure report missing its fields is rejected with
+        the subject named (not silently dropped)"""
+        # Has the trigger phrase and "detected." but none of the
+        # Received-Date / Sender-IP-Address fields, so building the synthetic
+        # feedback report raises KeyError, surfaced as InvalidDMARCReport.
+        eml = (
+            "From: a@b.c\nSubject: Failure\nContent-Type: text/plain\n\n"
+            "A message claiming to be from you has failed. "
+            "No fields here detected. nothing\n"
+        )
+        with self.assertRaises(parsedmarc.InvalidDMARCReport) as ctx:
+            parsedmarc.parse_report_email(eml, offline=True)
+        self.assertIn("Failure", str(ctx.exception))
+
+    def testAttachmentMalformedXmlRaises(self):
+        """A base64 attachment of malformed aggregate XML is rejected"""
+        att = base64.b64encode(b"<feedback></feedback>").decode()
+        eml = (
+            "From: a@b.c\nSubject: Agg\nMIME-Version: 1.0\n"
+            "Content-Type: application/octet-stream\n"
+            "Content-Transfer-Encoding: base64\n\n" + att + "\n"
+        )
+        with self.assertRaises(parsedmarc.ParserError) as ctx:
+            parsedmarc.parse_report_email(eml, offline=True)
+        self.assertIn("not a valid DMARC report", str(ctx.exception))
+
+    def testAttachmentInvalidJsonRaises(self):
+        """A base64 attachment of invalid SMTP TLS JSON is rejected.
+
+        parse_smtp_tls_report_json raises InvalidSMTPTLSReport, a sibling of
+        InvalidDMARCReport, so it falls through to the generic catch-all and
+        becomes a ParserError naming the subject.
+        """
+        att = base64.b64encode(b"{not valid json").decode()
+        eml = (
+            "From: a@b.c\nSubject: Tls\nMIME-Version: 1.0\n"
+            "Content-Type: application/octet-stream\n"
+            "Content-Transfer-Encoding: base64\n\n" + att + "\n"
+        )
+        with self.assertRaises(parsedmarc.ParserError) as ctx:
+            parsedmarc.parse_report_email(eml, offline=True)
+        self.assertIn("Tls", str(ctx.exception))
+

 class TestFailureReportParsing(unittest.TestCase):
    """Tests for failure report field defaults and edge cases"""
@@ -2232,6 +2488,30 @@ class TestSmtpTlsReportErrors(unittest.TestCase):
        with self.assertRaises(parsedmarc.InvalidSMTPTLSReport):
            parsedmarc.parse_smtp_tls_report_json("not json {{{")

+    def testInvalidJsonPreservesCause(self):
+        """Invalid JSON chains the underlying JSONDecodeError"""
+        with self.assertRaises(parsedmarc.InvalidSMTPTLSReport) as ctx:
+            parsedmarc.parse_smtp_tls_report_json("not json {{{")
+        self.assertIsInstance(ctx.exception.__cause__, json.JSONDecodeError)
+
+    def testNestedMissingKeyNamesTheField(self):
+        """A KeyError on a nested field reports which field was missing"""
+        # All five required top-level fields are present, so this gets past the
+        # top-level check and raises KeyError on date-range["start-datetime"].
+        report = json.dumps(
+            {
+                "organization-name": "x",
+                "date-range": {},
+                "contact-info": "x",
+                "report-id": "x",
+                "policies": [],
+            }
+        )
+        with self.assertRaises(parsedmarc.InvalidSMTPTLSReport) as ctx:
+            parsedmarc.parse_smtp_tls_report_json(report)
+        self.assertIn("start-datetime", str(ctx.exception))
+        self.assertIsInstance(ctx.exception.__cause__, KeyError)
+

 class TestBucketIntervalEdgeCases(unittest.TestCase):
    """Tests for _bucket_interval_by_day edge cases"""