Split tests.py into per-module tests/test_<module>.py (#774)

* Split tests.py into per-module tests/test_<module>.py

The 5174-line tests.py monolith is split into per-module files under tests/, mirroring the checkdmarc layout:

  tests/test_init.py          parsedmarc/__init__.py parsing surface
  tests/test_cli.py           parsedmarc/cli.py + config / env-vars / SIGHUP
  tests/test_utils.py         parsedmarc/utils.py (DNS, IP info, PSL, etc.)
  tests/test_webhook.py       parsedmarc/webhook.py
  tests/test_kafkaclient.py   parsedmarc/kafkaclient.py
  tests/test_splunk.py        parsedmarc/splunk.py
  tests/test_syslog.py        parsedmarc/syslog.py
  tests/test_loganalytics.py  parsedmarc/loganalytics.py
  tests/test_gelf.py          parsedmarc/gelf.py
  tests/test_s3.py            parsedmarc/s3.py
  tests/test_maps.py          parsedmarc/resources/maps/ maintainer scripts

The split is purely a redistribution: no test bodies changed, no tests added or removed. All 276 existing tests pass under the new layout.

The current tests.py contains two kitchen-sink classes (`Test` at line 54 and `TestEnvVarConfig` at line 2360) holding tests that span many modules. Their methods are routed to the correct per-module file by name prefix; the wholly-thematic classes (TestExtractReport, TestUtilsXxx, TestSighupReload, etc.) move whole. Each target file gets its own `class Test(unittest.TestCase)` for the redistributed kitchen-sink methods, plus the thematic classes verbatim.

Wiring updates:

- `.github/workflows/python-tests.yml`: `pytest ... tests.py` → `python -m pytest ... tests/` (also switches to `python -m pytest` per the checkdmarc convention so cwd lands on the project root).
- `pyproject.toml`: adds `[tool.pytest.ini_options] testpaths = ["tests"]` and `[tool.coverage.run] source = ["parsedmarc"]` with an `omit` for `parsedmarc/resources/maps/*.py`. The maps scripts are maintainer-only batch tooling that is excluded from the wheel; excluding them from coverage makes the headline number reflect only installed library code. Runtime coverage on the new layout is 59% (was 45% with maps counted), and PR-B will push it to 90%+.
- `AGENTS.md`: documents the new layout and how to run individual files / tests; tells future contributors not to reintroduce a monolithic tests.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Restore 66.9% coverage baseline (count tests/ + parsedmarc)

Master's headline 66.9% number on Codecov includes the tests.py file itself (99.35% covered) being measured alongside parsedmarc/*. The original tests.py had no `[tool.coverage.run]` block, so coverage's default, "measure every file imported during the run", counted the test code as if it were product code.

The split commit added `source = ["parsedmarc"]`, which suppressed measurement of the test files (correct in principle, since test files aren't shipped code), and that alone made the headline number drop by ~8 percentage points without any actual loss of testing.

This commit swaps `source` for an explicit `include = ["parsedmarc/*", "tests/*"]` so both halves are measured the way they were on master. Verified: 276 tests, 66.96% line coverage (effectively unchanged from master's 66.90%).

If you want the shipped-code-only number (the headline that this commit overrides), run `pytest --cov=parsedmarc tests/`. That number is currently 59% and is the focus of the upcoming coverage-expansion PR.

Also adds junit.xml to .gitignore so the CI artefact doesn't get accidentally committed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Restrict coverage to shipped code (`source = ["parsedmarc"]`)

Reverts the prior commit's `include = ["tests/*"]`. Counting the test files toward coverage was wrong: it conflates "shipped code exercised by tests" with "test code that pytest auto-runs", inflates the headline number, and rewards writing more tests rather than tests that verify more code.

Master's apparent 66.9% was an artefact of the old monolithic tests.py having no [tool.coverage.run] block at all; coverage's default behaviour measured every imported file, including the test file itself at ~99% "covered", which added ~8 percentage points to the displayed number without any real testing signal.

Restricting to `source = ["parsedmarc"]` plus the existing maps omit gives a meaningful baseline: 59% of shipped code is exercised by the test suite today. That's the number the next PR is targeting to lift to 90%+ before the 10.0.0 release; the Codecov "drop" here is a measurement correction, not a regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
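
The headline shift described in these commit messages is a weighted-average effect. The following is a minimal sketch of that arithmetic, assuming hypothetical statement counts (the totals of 10,000 and 2,500 are made up to mirror the quoted percentages; they are not measured from parsedmarc):

```python
def coverage_percent(parts):
    """Return overall line coverage for a list of (covered, total) counts."""
    covered = sum(c for c, _ in parts)
    total = sum(t for _, t in parts)
    return 100.0 * covered / total


# Hypothetical counts, chosen only to mirror the percentages quoted above.
shipped = (5900, 10000)   # 59% of shipped statements exercised
test_code = (2484, 2500)  # test files look almost fully "covered" just by running

shipped_only = coverage_percent([shipped])
with_tests = coverage_percent([shipped, test_code])

print(f"shipped code only:   {shipped_only:.2f}%")
print(f"shipped + test code: {with_tests:.2f}%")
```

With these illustrative counts, measuring the test files alongside the package raises the figure by roughly 8 points even though no additional shipped statement is exercised.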
2026-05-21 03:15:24 +00:00 · 2026-05-20 19:29:09 -04:00
parent ae1e5adb66
commit 5b08627eaa
17 changed files with 5336 additions and 5178 deletions
@@ -73,7 +73,7 @@ jobs:
        pip install .[build]
    - name: Run unit tests
      run: |
-        pytest --cov --cov-report=xml --junitxml=junit.xml -o junit_family=legacy tests.py
+        python -m pytest --cov --cov-report=xml --junitxml=junit.xml -o junit_family=legacy tests/
    - name: Test sample DMARC reports
      run: |
        pip install -e .
@@ -147,3 +147,5 @@ parsedmarc/resources/maps/unknown_domains.txt
 *.bak
 *.lock
 parsedmarc/resources/maps/domain_info.tsv
+coverage.json
+junit.xml
@@ -13,10 +13,13 @@ parsedmarc is a Python module and CLI utility for parsing DMARC aggregate (RUA),
 pip install .[build]

 # Run all tests with coverage
-pytest --cov --cov-report=xml tests.py
+pytest --cov --cov-report=xml tests/
+
+# Run one test module
+pytest tests/test_init.py

 # Run a single test
-pytest tests.py::Test::testAggregateSamples
+pytest tests/test_init.py::Test::testAggregateSamples

 # Lint and format
 ruff check .
@@ -107,7 +110,7 @@ IP address info cached for 4 hours, seen aggregate report IDs cached for 1 hour
 - Ruff for formatting and linting (configured in `.vscode/settings.json`). Run `ruff check .` and `ruff format --check .` after every code edit, before committing.
 - TypedDict for structured data, type hints throughout.
 - Python ≥3.10 required.
-- Tests are in a single `tests.py` file using unittest; sample reports live in `samples/`.
+- Tests live under `tests/` as `tests/test_<module>.py`, one per top-level `parsedmarc/*` module (e.g. `tests/test_init.py` for `parsedmarc/__init__.py`, `tests/test_cli.py` for `parsedmarc/cli.py`). All test classes use `unittest`. Sample reports live in `samples/`. Run with `pytest tests/`; run one file with `pytest tests/test_init.py`. New tests go in the file whose module they exercise — do not reintroduce a monolithic test file.
 - File path config values must be wrapped with `_expand_path()` in `cli.py`.
 - Maildir UID checks are intentionally relaxed (warn, don't crash) for Docker compatibility.
 - Token file writes must create parent directories before opening for write.
@@ -96,3 +96,26 @@ exclude = [
    # which must keep shipping for `importlib.resources.files()` lookups).
    "parsedmarc/resources/maps/[!_]*.py",
 ]
+
+[tool.pytest.ini_options]
+# Default to the per-module test layout under tests/. New tests should go
+# into tests/test_<module>.py to match the file they exercise; do not
+# reintroduce a monolithic tests.py.
+testpaths = ["tests"]
+
+[tool.coverage.run]
+# Coverage measures shipped code only. Master's reported ≈66.9% on
+# Codecov was an artefact of the old monolithic tests.py having no
+# [tool.coverage.run] block, which let coverage's default behaviour
+# measure every file imported during the run — including the test file
+# itself at ~99% "covered". That inflated the headline by ~8 percentage
+# points without any actual testing signal. Restricting to the parsedmarc
+# package gives a meaningful number that tracks how much of the shipped
+# library the test suite actually exercises.
+source = ["parsedmarc"]
+# Maintainer-only batch scripts under parsedmarc/resources/maps/ are
+# excluded from the wheel (see the [tool.hatch.build] exclude block above);
+# omit them so the headline number reflects only installed library code.
+omit = [
+    "*/parsedmarc/resources/maps/*.py",
+]
@@ -0,0 +1,23 @@
+"""Tests for parsedmarc.gelf"""
+
+import unittest
+
+
+class Test(unittest.TestCase):
+    """Kitchen-sink tests redistributed from the original
+    tests.py monolith. Future PRs should split these further
+    into purpose-specific TestCase subclasses as natural
+    groupings emerge."""
+
+    def testGelfBackwardCompatAlias(self):
+        """GelfClient forensic alias points to failure method"""
+        from parsedmarc.gelf import GelfClient
+
+        self.assertIs(
+            GelfClient.save_forensic_report_to_gelf,  # type: ignore[attr-defined]
+            GelfClient.save_failure_report_to_gelf,
+        )
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)
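
The alias test above checks identity, not just equal behaviour. How the real `GelfClient` defines its alias is not shown in this diff; the sketch below uses a hypothetical `ExampleClient` to show one common pattern (a class-level assignment) that would satisfy such an `assertIs` check:

```python
class ExampleClient:
    """Hypothetical output client; this is not the real GelfClient."""

    def save_failure_report(self, report):
        # Stand-in body: a real client would send the report somewhere.
        return ("failure", report)

    # Backward-compatible alias: the old name is bound to the same function
    # object, so both names are interchangeable on the class and on instances.
    save_forensic_report = save_failure_report


# In Python 3, attribute access on the class returns the plain function,
# so an identity check between the two names succeeds.
assert ExampleClient.save_forensic_report is ExampleClient.save_failure_report

client = ExampleClient()
print(client.save_forensic_report({"id": 1}))
```

A wrapper method that merely calls the new name would behave the same but would fail an identity check, which is why `assertIs` is the stricter test.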
@@ -0,0 +1,58 @@
+"""Tests for parsedmarc.kafkaclient"""
+
+import unittest
+
+
+class Test(unittest.TestCase):
+    """Kitchen-sink tests redistributed from the original
+    tests.py monolith. Future PRs should split these further
+    into purpose-specific TestCase subclasses as natural
+    groupings emerge."""
+
+    def testKafkaStripMetadata(self):
+        """KafkaClient.strip_metadata extracts metadata to root"""
+        from parsedmarc.kafkaclient import KafkaClient
+
+        report = {
+            "report_metadata": {
+                "org_name": "TestOrg",
+                "org_email": "test@example.com",
+                "report_id": "r-123",
+                "begin_date": "2024-01-01",
+                "end_date": "2024-01-02",
+            },
+            "records": [],
+        }
+        result = KafkaClient.strip_metadata(report)
+        self.assertEqual(result["org_name"], "TestOrg")
+        self.assertEqual(result["org_email"], "test@example.com")
+        self.assertEqual(result["report_id"], "r-123")
+        self.assertNotIn("report_metadata", result)
+
+    def testKafkaGenerateDateRange(self):
+        """KafkaClient.generate_date_range generates date range list"""
+        from parsedmarc.kafkaclient import KafkaClient
+
+        report = {
+            "report_metadata": {
+                "begin_date": "2024-01-01 00:00:00",
+                "end_date": "2024-01-02 00:00:00",
+            }
+        }
+        result = KafkaClient.generate_date_range(report)
+        self.assertEqual(len(result), 2)
+        self.assertIn("2024-01-01", result[0])
+        self.assertIn("2024-01-02", result[1])
+
+    def testKafkaBackwardCompatAlias(self):
+        """KafkaClient forensic alias points to failure method"""
+        from parsedmarc.kafkaclient import KafkaClient
+
+        self.assertIs(
+            KafkaClient.save_forensic_reports_to_kafka,  # type: ignore[attr-defined]
+            KafkaClient.save_failure_reports_to_kafka,
+        )
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)
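
The behaviour that `testKafkaStripMetadata` asserts (metadata keys hoisted to the root, `report_metadata` removed) can be illustrated with a small stand-alone function. This is a sketch under that reading of the test only; it is not the `KafkaClient.strip_metadata` implementation, which this diff does not show:

```python
def strip_metadata(report):
    """Return a copy of report with the report_metadata keys moved to the root.

    Hypothetical helper for illustration; the input dict is left unchanged.
    """
    flattened = {k: v for k, v in report.items() if k != "report_metadata"}
    flattened.update(report.get("report_metadata", {}))
    return flattened


report = {
    "report_metadata": {
        "org_name": "TestOrg",
        "org_email": "test@example.com",
        "report_id": "r-123",
    },
    "records": [],
}
result = strip_metadata(report)
print(sorted(result))
```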
@@ -0,0 +1,53 @@
+"""Tests for parsedmarc.loganalytics"""
+
+import unittest
+
+
+class Test(unittest.TestCase):
+    """Kitchen-sink tests redistributed from the original
+    tests.py monolith. Future PRs should split these further
+    into purpose-specific TestCase subclasses as natural
+    groupings emerge."""
+
+    def testLogAnalyticsConfig(self):
+        """LogAnalyticsConfig stores all fields"""
+        from parsedmarc.loganalytics import LogAnalyticsConfig
+
+        config = LogAnalyticsConfig(
+            client_id="cid",
+            client_secret="csec",
+            tenant_id="tid",
+            dce="https://dce.example.com",
+            dcr_immutable_id="dcr-123",
+            dcr_aggregate_stream="agg-stream",
+            dcr_failure_stream="fail-stream",
+            dcr_smtp_tls_stream="tls-stream",
+        )
+        self.assertEqual(config.client_id, "cid")
+        self.assertEqual(config.client_secret, "csec")
+        self.assertEqual(config.tenant_id, "tid")
+        self.assertEqual(config.dce, "https://dce.example.com")
+        self.assertEqual(config.dcr_immutable_id, "dcr-123")
+        self.assertEqual(config.dcr_aggregate_stream, "agg-stream")
+        self.assertEqual(config.dcr_failure_stream, "fail-stream")
+        self.assertEqual(config.dcr_smtp_tls_stream, "tls-stream")
+
+    def testLogAnalyticsClientValidationError(self):
+        """LogAnalyticsClient raises on missing required config"""
+        from parsedmarc.loganalytics import LogAnalyticsClient, LogAnalyticsException
+
+        with self.assertRaises(LogAnalyticsException):
+            LogAnalyticsClient(
+                client_id="",
+                client_secret="csec",
+                tenant_id="tid",
+                dce="https://dce.example.com",
+                dcr_immutable_id="dcr-123",
+                dcr_aggregate_stream="agg",
+                dcr_failure_stream="fail",
+                dcr_smtp_tls_stream="tls",
+            )
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)
@@ -0,0 +1,142 @@
+"""Tests for the map-maintenance scripts under parsedmarc/resources/maps/.
+
+These scripts are maintainer-only batch tooling — they do not ship in the
+wheel — but they still need regression coverage because they enforce the
+privacy and integrity rules for the reverse-DNS map data files."""
+
+import unittest
+
+
+class TestMapScriptsIPDetection(unittest.TestCase):
+    """Full-IP detection and PSL folding in the map-maintenance scripts."""
+
+    def test_collect_domain_info_detects_full_ips(self):
+        import parsedmarc.resources.maps.collect_domain_info as cdi
+
+        # Dotted and dashed four-octet patterns with valid octets: detected.
+        self.assertTrue(cdi._has_full_ip("74-208-244-234.cprapid.com"))
+        self.assertTrue(cdi._has_full_ip("host.192.168.1.1.example.com"))
+        self.assertTrue(cdi._has_full_ip("a-10-20-30-40-brand.com"))
+        # Three octets is NOT a full IP — OVH's reverse-DNS pattern stays safe.
+        self.assertFalse(cdi._has_full_ip("ip-147-135-108.us"))
+        # Out-of-range octet fails the 0-255 sanity check.
+        self.assertFalse(cdi._has_full_ip("999-1-2-3-foo.com"))
+        # Pure domain, no IP.
+        self.assertFalse(cdi._has_full_ip("example.com"))
+
+    def test_find_unknown_detects_full_ips(self):
+        import parsedmarc.resources.maps.find_unknown_base_reverse_dns as fu
+
+        self.assertTrue(fu._has_full_ip("170-254-144-204-nobreinternet.com.br"))
+        self.assertFalse(fu._has_full_ip("ip-147-135-108.us"))
+        self.assertFalse(fu._has_full_ip("cprapid.com"))
+
+    def test_apply_psl_override_dot_prefix(self):
+        import parsedmarc.resources.maps.collect_domain_info as cdi
+
+        ov = [".cprapid.com", ".linode.com"]
+        self.assertEqual(cdi._apply_psl_override("foo.cprapid.com", ov), "cprapid.com")
+        self.assertEqual(cdi._apply_psl_override("a.b.linode.com", ov), "linode.com")
+
+    def test_apply_psl_override_dash_prefix(self):
+        import parsedmarc.resources.maps.collect_domain_info as cdi
+
+        ov = ["-nobre.com.br"]
+        self.assertEqual(
+            cdi._apply_psl_override("1-2-3-4-nobre.com.br", ov), "nobre.com.br"
+        )
+
+    def test_apply_psl_override_no_match(self):
+        import parsedmarc.resources.maps.collect_domain_info as cdi
+
+        ov = [".cprapid.com"]
+        self.assertEqual(cdi._apply_psl_override("example.com", ov), "example.com")
+
+
+class TestDetectPSLOverrides(unittest.TestCase):
+    """Cluster detection, brand-tail extraction, and full-pipeline behaviour
+    for `detect_psl_overrides.py`."""
+
+    def setUp(self):
+        import parsedmarc.resources.maps.detect_psl_overrides as dpo
+
+        self.dpo = dpo
+
+    def test_extract_brand_tail_dot_separator(self):
+        self.assertEqual(
+            self.dpo.extract_brand_tail("74-208-244-234.cprapid.com"),
+            ".cprapid.com",
+        )
+
+    def test_extract_brand_tail_dash_separator(self):
+        self.assertEqual(
+            self.dpo.extract_brand_tail("170-254-144-204-nobre.com.br"),
+            "-nobre.com.br",
+        )
+
+    def test_extract_brand_tail_no_separator(self):
+        self.assertEqual(
+            self.dpo.extract_brand_tail("host134-254-143-190tigobusiness.com.ni"),
+            "tigobusiness.com.ni",
+        )
+
+    def test_extract_brand_tail_no_ip_returns_none(self):
+        self.assertIsNone(self.dpo.extract_brand_tail("plain.example.com"))
+
+    def test_extract_brand_tail_rejects_short_tail(self):
+        """A tail shorter than MIN_TAIL_LEN is rejected to avoid folding to `.com`."""
+        # Four-octet IP followed by only `.br` (2 chars after the dot) — too short.
+        self.assertIsNone(self.dpo.extract_brand_tail("1-2-3-4.br"))
+
+    def test_detect_clusters_meets_threshold(self):
+        domains = [
+            "1-2-3-4.cprapid.com",
+            "5-6-7-8.cprapid.com",
+            "9-10-11-12.cprapid.com",
+            "1-2-3-4-other.com.br",  # not enough of these
+        ]
+        clusters = self.dpo.detect_clusters(domains, threshold=3, known_overrides=set())
+        self.assertIn(".cprapid.com", clusters)
+        self.assertEqual(len(clusters[".cprapid.com"]), 3)
+        self.assertNotIn("-other.com.br", clusters)
+
+    def test_detect_clusters_honours_threshold(self):
+        domains = [
+            "1-2-3-4.cprapid.com",
+            "5-6-7-8.cprapid.com",
+        ]
+        clusters = self.dpo.detect_clusters(domains, threshold=3, known_overrides=set())
+        self.assertEqual(clusters, {})
+
+    def test_detect_clusters_skips_known_overrides(self):
+        """Tails already in psl_overrides.txt must not be re-proposed."""
+        domains = [
+            "1-2-3-4.cprapid.com",
+            "5-6-7-8.cprapid.com",
+            "9-10-11-12.cprapid.com",
+        ]
+        clusters = self.dpo.detect_clusters(
+            domains, threshold=3, known_overrides={".cprapid.com"}
+        )
+        self.assertNotIn(".cprapid.com", clusters)
+
+    def test_apply_override_matches_first(self):
+        """apply_override iterates in list order and returns on the first match."""
+        ov = [".cprapid.com", "-nobre.com.br"]
+        self.assertEqual(
+            self.dpo.apply_override("1-2-3-4.cprapid.com", ov), "cprapid.com"
+        )
+        self.assertEqual(
+            self.dpo.apply_override("1-2-3-4-nobre.com.br", ov), "nobre.com.br"
+        )
+        self.assertEqual(self.dpo.apply_override("unrelated.com", ov), "unrelated.com")
+
+    def test_has_full_ip_shared_with_other_scripts(self):
+        """The detect script's IP check must agree with the other map scripts."""
+        self.assertTrue(self.dpo.has_full_ip("74-208-244-234.cprapid.com"))
+        self.assertFalse(self.dpo.has_full_ip("ip-147-135-108.us"))
+        self.assertFalse(self.dpo.has_full_ip("example.com"))
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)
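
The full-IP rule that these tests describe (four dotted or dashed octets, each in the 0-255 range, with three octets not counting) can be sketched with a regular expression. The pattern and the `has_full_ip` function below are assumptions written to match the test comments; the actual helpers in the map scripts are not part of this diff and may differ:

```python
import re

# Four runs of 1-3 digits separated by dots or dashes, not glued to
# neighbouring digits. Mixed separators are accepted in this sketch.
_FOUR_OCTETS = re.compile(
    r"(?<!\d)(\d{1,3})[.-](\d{1,3})[.-](\d{1,3})[.-](\d{1,3})(?!\d)"
)


def has_full_ip(hostname):
    """Return True if hostname embeds something shaped like a full IPv4 address."""
    for match in _FOUR_OCTETS.finditer(hostname):
        # Reject candidates with any octet outside the 0-255 range.
        if all(int(octet) <= 255 for octet in match.groups()):
            return True
    return False


for name in (
    "74-208-244-234.cprapid.com",
    "host.192.168.1.1.example.com",
    "ip-147-135-108.us",
    "999-1-2-3-foo.com",
    "example.com",
):
    print(name, has_full_ip(name))
```

Requiring all four octets is what keeps three-octet reverse-DNS names such as `ip-147-135-108.us` out of the match set.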
@@ -0,0 +1,23 @@
+"""Tests for parsedmarc.s3"""
+
+import unittest
+
+
+class Test(unittest.TestCase):
+    """Kitchen-sink tests redistributed from the original
+    tests.py monolith. Future PRs should split these further
+    into purpose-specific TestCase subclasses as natural
+    groupings emerge."""
+
+    def testS3BackwardCompatAlias(self):
+        """S3Client forensic alias points to failure method"""
+        from parsedmarc.s3 import S3Client
+
+        self.assertIs(
+            S3Client.save_forensic_report_to_s3,  # type: ignore[attr-defined]
+            S3Client.save_failure_report_to_s3,
+        )
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)
@@ -0,0 +1,49 @@
+"""Tests for parsedmarc.splunk"""
+
+import unittest
+
+
+class Test(unittest.TestCase):
+    """Kitchen-sink tests redistributed from the original
+    tests.py monolith. Future PRs should split these further
+    into purpose-specific TestCase subclasses as natural
+    groupings emerge."""
+
+    def testSplunkHECClientInit(self):
+        """HECClient initializes with correct URL and headers"""
+        from parsedmarc.splunk import HECClient
+
+        client = HECClient(
+            url="https://splunk.example.com:8088",
+            access_token="my-token",
+            index="main",
+        )
+        self.assertIn("/services/collector/event/1.0", client.url)
+        self.assertEqual(client.access_token, "my-token")
+        self.assertEqual(client.index, "main")
+        self.assertEqual(client.source, "parsedmarc")
+        self.assertIn("Splunk my-token", client.session.headers["Authorization"])
+
+    def testSplunkHECClientStripTokenPrefix(self):
+        """HECClient strips 'Splunk ' prefix from token"""
+        from parsedmarc.splunk import HECClient
+
+        client = HECClient(
+            url="https://splunk.example.com",
+            access_token="Splunk my-token",
+            index="main",
+        )
+        self.assertEqual(client.access_token, "my-token")
+
+    def testSplunkBackwardCompatAlias(self):
+        """HECClient forensic alias points to failure method"""
+        from parsedmarc.splunk import HECClient
+
+        self.assertIs(
+            HECClient.save_forensic_reports_to_splunk,  # type: ignore[attr-defined]
+            HECClient.save_failure_reports_to_splunk,
+        )
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)
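
The two `HECClient` tests above expect a token to be stored without a leading `Splunk ` prefix while the `Authorization` header still carries it. A minimal sketch of that normalisation follows; `normalize_hec_token` and `build_auth_header` are hypothetical names and do not come from `parsedmarc.splunk`:

```python
def normalize_hec_token(access_token):
    """Drop one leading 'Splunk ' prefix if present (hypothetical helper)."""
    prefix = "Splunk "
    if access_token.startswith(prefix):
        return access_token[len(prefix):]
    return access_token


def build_auth_header(access_token):
    """Build an HEC-style Authorization header from a raw or prefixed token."""
    return {"Authorization": f"Splunk {normalize_hec_token(access_token)}"}


print(build_auth_header("Splunk my-token"))
print(build_auth_header("my-token"))
```

The sketch uses an explicit prefix check rather than `str.lstrip("Splunk ")`, because `lstrip` removes any leading characters from the given set and could eat the start of a token that begins with one of those letters.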
@@ -0,0 +1,39 @@
+"""Tests for parsedmarc.syslog"""
+
+import unittest
+
+
+class Test(unittest.TestCase):
+    """Kitchen-sink tests redistributed from the original
+    tests.py monolith. Future PRs should split these further
+    into purpose-specific TestCase subclasses as natural
+    groupings emerge."""
+
+    def testSyslogClientUdpInit(self):
+        """SyslogClient creates UDP handler"""
+        from parsedmarc.syslog import SyslogClient
+
+        client = SyslogClient("localhost", 514, protocol="udp")
+        self.assertEqual(client.server_name, "localhost")
+        self.assertEqual(client.server_port, 514)
+        self.assertEqual(client.protocol, "udp")
+
+    def testSyslogClientInvalidProtocol(self):
+        """SyslogClient with invalid protocol raises ValueError"""
+        from parsedmarc.syslog import SyslogClient
+
+        with self.assertRaises(ValueError):
+            SyslogClient("localhost", 514, protocol="invalid")
+
+    def testSyslogBackwardCompatAlias(self):
+        """SyslogClient forensic alias points to failure method"""
+        from parsedmarc.syslog import SyslogClient
+
+        self.assertIs(
+            SyslogClient.save_forensic_report_to_syslog,  # type: ignore[attr-defined]
+            SyslogClient.save_failure_report_to_syslog,
+        )
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)
@@ -0,0 +1,722 @@
+"""Tests for parsedmarc.utils"""
+
+import os
+import tempfile
+import unittest
+from datetime import datetime, timezone
+from tempfile import NamedTemporaryFile
+from unittest.mock import MagicMock, patch
+
+import dns.exception
+import requests
+from expiringdict import ExpiringDict
+
+import parsedmarc
+import parsedmarc.utils
+
+
+class Test(unittest.TestCase):
+    """Kitchen-sink tests redistributed from the original
+    tests.py monolith. Future PRs should split these further
+    into purpose-specific TestCase subclasses as natural
+    groupings emerge."""
+
+    def testBase64Decoding(self):
+        """Test base64 decoding"""
+        # Example from Wikipedia Base64 article
+        b64_str = "YW55IGNhcm5hbCBwbGVhcw"
+        decoded_str = parsedmarc.utils.decode_base64(b64_str)
+        self.assertEqual(decoded_str, b"any carnal pleas")
+
+    def testPSLDownload(self):
+        """Test Public Suffix List domain lookups"""
+        subdomain = "foo.example.com"
+        result = parsedmarc.utils.get_base_domain(subdomain)
+        self.assertEqual(result, "example.com")
+
+        # psl_overrides.txt intentionally folds CDN-customer PTRs so every
+        # sender on the same network clusters under one display key.
+        # ``.akamaiedge.net`` is an override, so its subdomains collapse to
+        # ``akamaiedge.net`` even though the live PSL carries the finer-grained
+        # ``c.akamaiedge.net`` — the override is the design decision.
+        subdomain = "e3191.c.akamaiedge.net"
+        result = parsedmarc.utils.get_base_domain(subdomain)
+        self.assertEqual(result, "akamaiedge.net")
+
+    def testIpAddressInfoSurfacesASNFields(self):
+        """ASN number, name, and domain from the bundled MMDB appear on every
+        IP info result, even when no PTR resolves."""
+        info = parsedmarc.utils.get_ip_address_info("8.8.8.8", offline=True)
+        self.assertEqual(info["asn"], 15169)
+        self.assertIsInstance(info["asn"], int)
+        self.assertEqual(info["as_domain"], "google.com")
+        self.assertTrue(info["as_name"])
+
+    def testIpAddressInfoFallsBackToASNMapEntryWhenNoPTR(self):
+        """When reverse DNS is absent, the ASN domain should be used as a
+        lookup into the reverse_dns_map so the row still gets attributed,
+        while reverse_dns and base_domain remain null."""
+        info = parsedmarc.utils.get_ip_address_info("8.8.8.8", offline=True)
+        self.assertIsNone(info["reverse_dns"])
+        self.assertIsNone(info["base_domain"])
+        self.assertEqual(info["name"], "Google (Including Gmail and Google Workspace)")
+        self.assertEqual(info["type"], "Email Provider")
+
+    def testIpAddressInfoFallsBackToRawASNameOnMapMiss(self):
+        """When neither PTR nor an ASN-map entry resolves, the raw AS name
+        is used as source_name with type left null — better than leaving
+        the row unattributed."""
+        # The DB record is patched with an as_domain that is not in the
+        # reverse_dns_map, so this exercises the as_name fallback branch
+        # without depending on a specific map state.
+        from unittest.mock import patch
+
+        with patch(
+            "parsedmarc.utils.get_ip_address_db_record",
+            return_value={
+                "country": "US",
+                "asn": 64496,
+                "as_name": "Some Unmapped Org, Inc.",
+                "as_domain": "unmapped-for-this-test.example",
+            },
+        ):
+            # Bypass cache to avoid prior-test pollution.
+            info = parsedmarc.utils.get_ip_address_info(
+                "192.0.2.1", offline=True, cache=None
+            )
+        self.assertIsNone(info["reverse_dns"])
+        self.assertIsNone(info["base_domain"])
+        self.assertIsNone(info["type"])
+        self.assertEqual(info["name"], "Some Unmapped Org, Inc.")
+        self.assertEqual(info["as_domain"], "unmapped-for-this-test.example")
+
+    def testWeakFallbackAttributionIsNotCached(self):
+        """A transient PTR lookup failure that lands on the raw-as_name
+        fallback must not poison the cache. ``get_reverse_dns()`` swallows
+        every DNSException as ``None``, so a timeout looks identical to a
+        real no-PTR case — if we cached the weak attribution, the 4-hour
+        TTL would lock in a misattribution even after the PTR returns.
+
+        PTR-backed matches and ASN-domain matches are stable attributions
+        and must still be cached, so we only skip the specific
+        ``reverse_dns=None AND type=None AND name=as_name`` state."""
+        from unittest.mock import patch
+        from expiringdict import ExpiringDict
+
+        cache = ExpiringDict(max_len=100, max_age_seconds=14400)
+
+        # Scenario 1: weak fallback (no PTR, unmapped as_domain, raw as_name
+        # used). Must NOT be cached.
+        with patch(
+            "parsedmarc.utils.get_ip_address_db_record",
+            return_value={
+                "country": "US",
+                "asn": 64496,
+                "as_name": "Some Unmapped Org, Inc.",
+                "as_domain": "unmapped-for-this-test.example",
+            },
+        ):
+            parsedmarc.utils.get_ip_address_info("192.0.2.1", offline=True, cache=cache)
+        self.assertNotIn("192.0.2.1", cache)
+
+        # Scenario 2: ASN-domain match (no PTR, as_domain IS in the map).
+        # Stable attribution — must still be cached.
+        with patch(
+            "parsedmarc.utils.get_ip_address_db_record",
+            return_value={
+                "country": "US",
+                "asn": 15169,
+                "as_name": "Google LLC",
+                "as_domain": "google.com",
+            },
+        ):
+            parsedmarc.utils.get_ip_address_info("192.0.2.2", offline=True, cache=cache)
+        self.assertIn("192.0.2.2", cache)
+
+    def testIPinfoAPIPrimarySourceAndInvalidKeyIsFatal(self):
+        """With an API token configured, lookups hit the API first via the
+        documented ?token= query param. A 401/403 response propagates as
+        ``InvalidIPinfoAPIKey`` so the CLI can exit fatally. Any other
+        non-2xx or network error falls through to the MMDB silently.
+
+        The IPinfo Lite API is documented as having no request limit, so
+        there is no rate-limit/quota handling to test — only the fatal path
+        on invalid tokens and the success path."""
+        from unittest.mock import patch, MagicMock
+
+        from parsedmarc.utils import (
+            InvalidIPinfoAPIKey,
+            configure_ipinfo_api,
+            get_ip_address_db_record,
+        )
+
+        def _mock_response(status_code, json_body=None):
+            resp = MagicMock()
+            resp.status_code = status_code
+            resp.ok = 200 <= status_code < 300
+            resp.json.return_value = json_body or {}
+            return resp
+
+        try:
+            # Success: API returns IPinfo-schema JSON; record comes from API.
+            api_json = {
+                "ip": "8.8.8.8",
+                "asn": "AS15169",
+                "as_name": "Google LLC",
+                "as_domain": "google.com",
+                "country_code": "US",
+            }
+            with patch(
+                "parsedmarc.utils.requests.get",
+                return_value=_mock_response(200, api_json),
+            ) as mock_get:
+                configure_ipinfo_api("fake-token", probe=False)
+                record = get_ip_address_db_record("8.8.8.8")
+            self.assertEqual(record["country"], "US")
+            self.assertEqual(record["asn"], 15169)
+            self.assertEqual(record["as_domain"], "google.com")
+            # Auth must use the documented query param, not a Bearer header.
+            _, kwargs = mock_get.call_args
+            self.assertEqual(kwargs["params"], {"token": "fake-token"})
+            self.assertNotIn("Authorization", kwargs["headers"])
+
+            # Invalid key: 401 raises a fatal exception even on a random lookup.
+            with patch(
+                "parsedmarc.utils.requests.get",
+                return_value=_mock_response(401),
+            ):
+                configure_ipinfo_api("bad-token", probe=False)
+                with self.assertRaises(InvalidIPinfoAPIKey):
+                    get_ip_address_db_record("8.8.8.8")
+
+            # Any other non-2xx (e.g. 500, 503) falls back to the MMDB silently.
+            configure_ipinfo_api("fake-token", probe=False)
+            with patch(
+                "parsedmarc.utils.requests.get",
+                return_value=_mock_response(500),
+            ):
+                record = get_ip_address_db_record("8.8.8.8")
+            # MMDB fallback fills in Google's ASN from the bundled MMDB.
+            self.assertEqual(record["asn"], 15169)
+        finally:
+            configure_ipinfo_api(None)
+
+    def testTimestampToDatetime(self):
+        """timestamp_to_datetime converts UNIX timestamp to datetime"""
+        from datetime import datetime
+
+        ts = 1704067200
+        dt = parsedmarc.utils.timestamp_to_datetime(ts)
+        self.assertIsInstance(dt, datetime)
+        # Should match stdlib fromtimestamp (local time)
+        self.assertEqual(dt, datetime.fromtimestamp(ts))
+
+    def testTimestampToHuman(self):
+        """timestamp_to_human returns formatted string"""
+        result = parsedmarc.utils.timestamp_to_human(1704067200)
+        self.assertRegex(result, r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
+
+    def testHumanTimestampToDatetime(self):
+        """human_timestamp_to_datetime parses timestamp string"""
+        dt = parsedmarc.utils.human_timestamp_to_datetime("2024-01-01 00:00:00")
+        self.assertIsInstance(dt, datetime)
+        self.assertEqual(dt.year, 2024)
+        self.assertEqual(dt.month, 1)
+        self.assertEqual(dt.day, 1)
+
+    def testHumanTimestampToDatetimeUtc(self):
+        """human_timestamp_to_datetime with to_utc=True returns UTC"""
+        dt = parsedmarc.utils.human_timestamp_to_datetime(
+            "2024-01-01 12:00:00", to_utc=True
+        )
+        self.assertEqual(dt.tzinfo, timezone.utc)
+
+    def testHumanTimestampToDatetimeParenthesisStripping(self):
+        """Parenthesized content is stripped from timestamps"""
+        dt = parsedmarc.utils.human_timestamp_to_datetime(
+            "Mon, 01 Jan 2024 00:00:00 +0000 (UTC)"
+        )
+        self.assertEqual(dt.year, 2024)
+
+    def testHumanTimestampToDatetimeNegativeZero(self):
+        """-0000 timezone is handled"""
+        dt = parsedmarc.utils.human_timestamp_to_datetime("2024-01-01 00:00:00 -0000")
+        self.assertEqual(dt.year, 2024)
+
+    def testHumanTimestampToUnixTimestamp(self):
+        """human_timestamp_to_unix_timestamp converts to int"""
+        ts = parsedmarc.utils.human_timestamp_to_unix_timestamp("2024-01-01 00:00:00")
+        self.assertIsInstance(ts, int)
+
+    def testHumanTimestampToUnixTimestampWithT(self):
+        """T separator in timestamp is handled"""
+        ts = parsedmarc.utils.human_timestamp_to_unix_timestamp("2024-01-01T00:00:00")
+        self.assertIsInstance(ts, int)
+
+    def testGetIpAddressCountry(self):
+        """get_ip_address_country returns country code using bundled DBIP"""
+        # 8.8.8.8 is a well-known Google DNS IP in the US
+        country = parsedmarc.utils.get_ip_address_country("8.8.8.8")
+        self.assertEqual(country, "US")
+
+    def testGetIpAddressCountryNotFound(self):
+        """get_ip_address_country returns None for reserved IP"""
+        country = parsedmarc.utils.get_ip_address_country("127.0.0.1")
+        self.assertIsNone(country)
+
+    def testGetServiceFromReverseDnsBaseDomainOffline(self):
+        """get_service_from_reverse_dns_base_domain in offline mode"""
+        result = parsedmarc.utils.get_service_from_reverse_dns_base_domain(
+            "google.com", offline=True
+        )
+        self.assertIn("Google", result["name"])
+        self.assertIsNotNone(result["type"])
+
+    def testGetServiceFromReverseDnsBaseDomainUnknown(self):
+        """Unknown base domain returns domain as name and None as type"""
+        result = parsedmarc.utils.get_service_from_reverse_dns_base_domain(
+            "unknown-domain-xyz.example", offline=True
+        )
+        self.assertEqual(result["name"], "unknown-domain-xyz.example")
+        self.assertIsNone(result["type"])
+
+    def testGetIpAddressInfoOffline(self):
+        """get_ip_address_info in offline mode returns country but no DNS"""
+        info = parsedmarc.utils.get_ip_address_info("8.8.8.8", offline=True)
+        self.assertEqual(info["ip_address"], "8.8.8.8")
+        self.assertEqual(info["country"], "US")
+        self.assertIsNone(info["reverse_dns"])
+
+    def testGetIpAddressInfoCache(self):
+        """get_ip_address_info uses cache on second call"""
+        from expiringdict import ExpiringDict
+
+        cache = ExpiringDict(max_len=100, max_age_seconds=60)
+        with patch("parsedmarc.utils.get_reverse_dns", return_value="dns.google"):
+            info1 = parsedmarc.utils.get_ip_address_info(
+                "8.8.8.8",
+                offline=False,
+                cache=cache,
+                always_use_local_files=True,
+            )
+        self.assertIn("8.8.8.8", cache)
+        info2 = parsedmarc.utils.get_ip_address_info(
+            "8.8.8.8", offline=False, cache=cache
+        )
+        self.assertEqual(info1["ip_address"], info2["ip_address"])
+        self.assertEqual(info2["reverse_dns"], "dns.google")
+
+    def testParseEmailAddressWithDisplayName(self):
+        """parse_email_address with display name"""
+        result = parsedmarc.utils.parse_email_address(("John Doe", "john@example.com"))  # type: ignore[arg-type]
+        self.assertEqual(result["display_name"], "John Doe")
+        self.assertEqual(result["address"], "john@example.com")
+        self.assertEqual(result["local"], "john")
+        self.assertEqual(result["domain"], "example.com")
+
+    def testParseEmailAddressWithoutDisplayName(self):
+        """parse_email_address with empty display name"""
+        result = parsedmarc.utils.parse_email_address(("", "john@example.com"))  # type: ignore[arg-type]
+        self.assertIsNone(result["display_name"])
+        self.assertEqual(result["address"], "john@example.com")
+
+    def testParseEmailAddressNoAt(self):
+        """parse_email_address with no @ returns None local/domain"""
+        result = parsedmarc.utils.parse_email_address(("", "localonly"))  # type: ignore[arg-type]
+        self.assertIsNone(result["local"])
+        self.assertIsNone(result["domain"])
+
+    def testGetFilenameSafeString(self):
+        """get_filename_safe_string removes invalid chars"""
+        result = parsedmarc.utils.get_filename_safe_string('file/name:with"bad*chars')
+        self.assertNotIn("/", result)
+        self.assertNotIn(":", result)
+        self.assertNotIn('"', result)
+        self.assertNotIn("*", result)
+
+    def testGetFilenameSafeStringNone(self):
+        """get_filename_safe_string with None returns 'None'"""
+        result = parsedmarc.utils.get_filename_safe_string(None)  # type: ignore[arg-type]
+        self.assertEqual(result, "None")
+
+    def testGetFilenameSafeStringLong(self):
+        """get_filename_safe_string truncates to 100 chars"""
+        result = parsedmarc.utils.get_filename_safe_string("a" * 200)
+        self.assertEqual(len(result), 100)
+
+    def testGetFilenameSafeStringTrailingDot(self):
+        """get_filename_safe_string strips trailing dots"""
+        result = parsedmarc.utils.get_filename_safe_string("filename...")
+        self.assertFalse(result.endswith("."))
+
+    def testIsMboxNonMbox(self):
+        """is_mbox returns False for non-mbox file"""
+        result = parsedmarc.utils.is_mbox("samples/empty.xml")
+        self.assertFalse(result)
+
+    def testIsOutlookMsgNonMsg(self):
+        """is_outlook_msg returns False for non-MSG content"""
+        self.assertFalse(parsedmarc.utils.is_outlook_msg(b"not an outlook msg"))
+        self.assertFalse(parsedmarc.utils.is_outlook_msg("string content"))
+
+    def testIsOutlookMsgMagic(self):
+        """is_outlook_msg returns True for correct magic bytes"""
+        magic = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 100
+        self.assertTrue(parsedmarc.utils.is_outlook_msg(magic))
+
+
+class TestLoadPSLOverrides(unittest.TestCase):
+    """Covers `parsedmarc.utils.load_psl_overrides`."""
+
+    def setUp(self):
+        # Snapshot the module-level list so each test leaves it as it found it.
+        self._saved = list(parsedmarc.utils.psl_overrides)
+
+    def tearDown(self):
+        parsedmarc.utils.psl_overrides.clear()
+        parsedmarc.utils.psl_overrides.extend(self._saved)
+
+    def test_offline_loads_bundled_file(self):
+        """offline=True populates the list from the bundled file, no network."""
+        result = parsedmarc.utils.load_psl_overrides(offline=True)
+        self.assertIs(result, parsedmarc.utils.psl_overrides)
+        self.assertGreater(len(result), 0)
+        # The bundled file is expected to contain at least one well-known entry.
+        self.assertIn(".linode.com", result)
+
+    def test_local_file_path_overrides_bundled(self):
+        """A custom local_file_path takes precedence over the bundled copy."""
+        with tempfile.NamedTemporaryFile(
+            "w", suffix=".txt", delete=False, encoding="utf-8"
+        ) as tf:
+            tf.write("-custom-brand.com\n.another-brand.net\n\n   \n")
+            path = tf.name
+        try:
+            result = parsedmarc.utils.load_psl_overrides(
+                offline=True, local_file_path=path
+            )
+            self.assertEqual(result, ["-custom-brand.com", ".another-brand.net"])
+        finally:
+            os.unlink(path)
+
+    def test_clear_before_reload(self):
+        """Re-running load_psl_overrides replaces the list, not appends."""
+        parsedmarc.utils.psl_overrides.clear()
+        parsedmarc.utils.psl_overrides.append(".stale-entry.com")
+        parsedmarc.utils.load_psl_overrides(offline=True)
+        self.assertNotIn(".stale-entry.com", parsedmarc.utils.psl_overrides)
+
+    def test_url_success(self):
+        """A 200 response from the URL populates the list."""
+        fake_body = "-fetched-brand.com\n.cdn-fetched.net\n"
+        mock_response = MagicMock()
+        mock_response.text = fake_body
+        mock_response.raise_for_status = MagicMock()
+        with patch(
+            "parsedmarc.utils.requests.get", return_value=mock_response
+        ) as mock_get:
+            result = parsedmarc.utils.load_psl_overrides(url="https://example.test/ov")
+            self.assertEqual(result, ["-fetched-brand.com", ".cdn-fetched.net"])
+            mock_get.assert_called_once()
+
+    def test_url_failure_falls_back_to_local(self):
+        """A network error falls back to the bundled copy."""
+        import requests
+
+        with patch(
+            "parsedmarc.utils.requests.get",
+            side_effect=requests.exceptions.ConnectionError("nope"),
+        ):
+            result = parsedmarc.utils.load_psl_overrides(url="https://example.test/ov")
+        # Bundled file still loaded.
+        self.assertGreater(len(result), 0)
+        self.assertIn(".linode.com", result)
+
+    def test_always_use_local_skips_network(self):
+        """always_use_local_file=True must not call requests.get."""
+        with patch("parsedmarc.utils.requests.get") as mock_get:
+            parsedmarc.utils.load_psl_overrides(always_use_local_file=True)
+            mock_get.assert_not_called()
+
+
+class TestLoadReverseDnsMapReloadsPSLOverrides(unittest.TestCase):
+    """`load_reverse_dns_map` must reload `psl_overrides.txt` in the same call
+    so map entries that depend on folded bases resolve correctly."""
+
+    def setUp(self):
+        self._saved = list(parsedmarc.utils.psl_overrides)
+
+    def tearDown(self):
+        parsedmarc.utils.psl_overrides.clear()
+        parsedmarc.utils.psl_overrides.extend(self._saved)
+
+    def test_map_load_triggers_psl_reload(self):
+        """Calling load_reverse_dns_map offline also invokes load_psl_overrides
+        with matching flags, and the overrides list is repopulated."""
+        rdm = {}
+        parsedmarc.utils.psl_overrides.clear()
+        parsedmarc.utils.psl_overrides.append(".stale-from-before.com")
+        with patch(
+            "parsedmarc.utils.load_psl_overrides",
+            wraps=parsedmarc.utils.load_psl_overrides,
+        ) as spy:
+            parsedmarc.utils.load_reverse_dns_map(rdm, offline=True)
+        spy.assert_called_once()
+        kwargs = spy.call_args.kwargs
+        self.assertTrue(kwargs["offline"])
+        self.assertIsNone(kwargs["url"])
+        self.assertIsNone(kwargs["local_file_path"])
+        self.assertNotIn(".stale-from-before.com", parsedmarc.utils.psl_overrides)
+
+    def test_map_load_forwards_psl_overrides_kwargs(self):
+        """psl_overrides_path / psl_overrides_url are forwarded verbatim."""
+        rdm = {}
+        with patch("parsedmarc.utils.load_psl_overrides") as spy:
+            parsedmarc.utils.load_reverse_dns_map(
+                rdm,
+                offline=True,
+                always_use_local_file=True,
+                psl_overrides_path="/tmp/custom.txt",
+                psl_overrides_url="https://example.test/ov",
+            )
+        spy.assert_called_once_with(
+            always_use_local_file=True,
+            local_file_path="/tmp/custom.txt",
+            url="https://example.test/ov",
+            offline=True,
+        )
+
+
+class TestGetBaseDomainWithOverrides(unittest.TestCase):
+    """`get_base_domain` must honour the current psl_overrides list."""
+
+    def setUp(self):
+        self._saved = list(parsedmarc.utils.psl_overrides)
+        parsedmarc.utils.psl_overrides.clear()
+        parsedmarc.utils.psl_overrides.extend([".cprapid.com", "-nobre.com.br"])
+
+    def tearDown(self):
+        parsedmarc.utils.psl_overrides.clear()
+        parsedmarc.utils.psl_overrides.extend(self._saved)
+
+    def test_dot_prefixed_override_folds_subdomain(self):
+        result = parsedmarc.utils.get_base_domain("74-208-244-234.cprapid.com")
+        self.assertEqual(result, "cprapid.com")
+
+    def test_dash_prefixed_override_folds_subdomain(self):
+        result = parsedmarc.utils.get_base_domain("host-1-2-3-4-nobre.com.br")
+        self.assertEqual(result, "nobre.com.br")
+
+    def test_unmatched_domain_falls_through_to_psl(self):
+        result = parsedmarc.utils.get_base_domain("sub.example.com")
+        self.assertEqual(result, "example.com")
+
+
+class TestUtilsDnsCaching(unittest.TestCase):
+    """Tests for DNS query caching and reverse DNS error handling"""
+
+    def testQueryDnsUsesCacheHit(self):
+        """query_dns returns cached result without making DNS query"""
+        cache = ExpiringDict(max_len=100, max_age_seconds=60)
+        cache["example.com_A"] = ["1.2.3.4"]
+        result = parsedmarc.utils.query_dns("example.com", "A", cache=cache)
+        self.assertEqual(result, ["1.2.3.4"])
+
+    def testQueryDnsCachesResult(self):
+        """query_dns stores result in cache when cache is non-empty"""
+        cache = ExpiringDict(max_len=100, max_age_seconds=60)
+        # Pre-populate so the ExpiringDict is truthy (an empty one is falsy)
+        cache["seed_key"] = ["seed"]
+        mock_record = MagicMock()
+        mock_record.to_text.return_value = '"1.2.3.4"'
+        mock_resolver = MagicMock()
+        mock_resolver.resolve.return_value = [mock_record]
+        with patch(
+            "parsedmarc.utils.dns.resolver.Resolver", return_value=mock_resolver
+        ):
+            result = parsedmarc.utils.query_dns(
+                "test-cache.example.com", "A", cache=cache
+            )
+            self.assertEqual(result, ["1.2.3.4"])
+            self.assertIn("test-cache.example.com_A", cache)
+
+    def testReverseDnsReturnsNoneOnFailure(self):
+        """get_reverse_dns returns None on DNS exceptions"""
+        with patch(
+            "parsedmarc.utils.query_dns",
+            side_effect=dns.exception.DNSException("timeout"),
+        ):
+            result = parsedmarc.utils.get_reverse_dns("203.0.113.1")
+            self.assertIsNone(result)
+
+
+class TestUtilsIpDbPaths(unittest.TestCase):
+    """Tests for IP database path validation"""
+
+    def testCustomPathFallsBack(self):
+        """Non-existent custom db path falls back to default"""
+        result = parsedmarc.utils.get_ip_address_country(
+            "1.1.1.1", db_path="/nonexistent/path.mmdb"
+        )
+        self.assertTrue(result is None or isinstance(result, str))
+
+    def testBundledDbWorks(self):
+        """Bundled IP database returns results"""
+        result = parsedmarc.utils.get_ip_address_country("8.8.8.8")
+        self.assertEqual(result, "US")
+
+
+class TestUtilsParseEmail(unittest.TestCase):
+    """Tests for parse_email edge cases"""
+
+    def testMinimalEmail(self):
+        """parse_email handles email with minimal headers"""
+        email_str = """From: test@example.com
+Subject: Test
+
+Body text"""
+        result = parsedmarc.utils.parse_email(email_str)
+        self.assertEqual(result["subject"], "Test")
+        self.assertEqual(result["reply_to"], [])
+
+    def testEmailWithNoSubject(self):
+        """parse_email defaults subject to None when missing"""
+        email_str = """From: test@example.com
+To: other@example.com
+
+Body"""
+        result = parsedmarc.utils.parse_email(email_str)
+        self.assertIsNone(result["subject"])
+
+    def testEmailBytesInput(self):
+        """parse_email handles bytes input"""
+        email_bytes = b"""From: test@example.com
+Subject: Bytes Test
+To: other@example.com
+
+Body"""
+        result = parsedmarc.utils.parse_email(email_bytes)
+        self.assertEqual(result["subject"], "Bytes Test")
+
+    def testEmailWithAttachments(self):
+        """parse_email with strip_attachment_payloads removes payloads"""
+        from email.mime.multipart import MIMEMultipart
+        from email.mime.text import MIMEText
+        from email.mime.base import MIMEBase
+        from email import encoders
+
+        msg = MIMEMultipart()
+        msg["From"] = "test@example.com"
+        msg["To"] = "other@example.com"
+        msg["Subject"] = "Attachment Test"
+        msg.attach(MIMEText("Body text"))
+
+        attachment = MIMEBase("application", "octet-stream")
+        attachment.set_payload(b"file content here")
+        encoders.encode_base64(attachment)
+        attachment.add_header("Content-Disposition", "attachment", filename="test.bin")
+        msg.attach(attachment)
+
+        result = parsedmarc.utils.parse_email(
+            msg.as_string(), strip_attachment_payloads=True
+        )
+        for att in result["attachments"]:
+            self.assertNotIn("payload", att)
+
+
+class TestUtilsOutlookMsg(unittest.TestCase):
+    """Tests for Outlook MSG detection and conversion"""
+
+    def testIsOutlookMsg(self):
+        """is_outlook_msg detects MSG magic bytes"""
+        msg_magic = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 100
+        self.assertTrue(parsedmarc.utils.is_outlook_msg(msg_magic))
+
+    def testIsNotOutlookMsg(self):
+        """is_outlook_msg rejects non-MSG content"""
+        self.assertFalse(parsedmarc.utils.is_outlook_msg(b"not an msg file"))
+        self.assertFalse(parsedmarc.utils.is_outlook_msg("string input"))
+
+    def testConvertOutlookMsgInvalidInput(self):
+        """convert_outlook_msg raises ValueError for non-MSG bytes"""
+        with self.assertRaises(ValueError):
+            parsedmarc.utils.convert_outlook_msg(b"not an msg file")
+
+
+class TestUtilsReverseDnsMap(unittest.TestCase):
+    """Tests for reverse DNS map loading"""
+
+    def testLoadReverseDnsMapOffline(self):
+        """load_reverse_dns_map in offline mode loads bundled map"""
+        rdns_map = {}
+        parsedmarc.utils.load_reverse_dns_map(rdns_map, offline=True)
+        self.assertGreater(len(rdns_map), 0)
+
+    def testLoadReverseDnsMapLocalOverride(self):
+        """load_reverse_dns_map uses local_file_path when provided"""
+        with NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
+            f.write("base_reverse_dns,name,type\n")
+            f.write("custom.example.com,Custom Service,hosting\n")
+            path = f.name
+        try:
+            rdns_map = {}
+            parsedmarc.utils.load_reverse_dns_map(
+                rdns_map, offline=True, local_file_path=path
+            )
+            self.assertIn("custom.example.com", rdns_map)
+            self.assertEqual(rdns_map["custom.example.com"]["name"], "Custom Service")
+        finally:
+            os.remove(path)
+
+    def testLoadReverseDnsMapNetworkFailureFallback(self):
+        """load_reverse_dns_map falls back to bundled on network error"""
+        rdns_map = {}
+        with patch(
+            "parsedmarc.utils.requests.get",
+            side_effect=requests.exceptions.ConnectionError("no network"),
+        ):
+            parsedmarc.utils.load_reverse_dns_map(rdns_map)
+        self.assertGreater(len(rdns_map), 0)
+
+
+class TestPslOverrides(unittest.TestCase):
+    """Tests for PSL override matching"""
+
+    def testOverrideMatch(self):
+        """get_base_domain returns the base domain while PSL overrides are loaded"""
+        # psl_overrides contains entries; test that get_base_domain
+        # handles an ordinary domain without error
+        result = parsedmarc.utils.get_base_domain("sub.example.com")
+        self.assertEqual(result, "example.com")
+
+
+class TestIsMbox(unittest.TestCase):
+    """Tests for is_mbox utility"""
+
+    def testValidMbox(self):
+        """is_mbox returns True for valid mbox file"""
+        with NamedTemporaryFile(suffix=".mbox", delete=False, mode="w") as f:
+            f.write("From test@example.com Thu Jan  1 00:00:00 2024\n")
+            f.write("Subject: Test\n\nBody\n\n")
+            path = f.name
+        try:
+            self.assertTrue(parsedmarc.utils.is_mbox(path))
+        finally:
+            os.remove(path)
+
+    def testEmptyFileNotMbox(self):
+        """is_mbox returns False for empty file"""
+        with NamedTemporaryFile(suffix=".mbox", delete=False) as f:
+            path = f.name
+        try:
+            self.assertFalse(parsedmarc.utils.is_mbox(path))
+        finally:
+            os.remove(path)
+
+    def testNonExistentNotMbox(self):
+        """is_mbox returns False for non-existent file"""
+        self.assertFalse(parsedmarc.utils.is_mbox("/nonexistent/file.mbox"))
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)
@@ -0,0 +1,76 @@
+"""Tests for parsedmarc.webhook"""
+
+import unittest
+from unittest.mock import MagicMock
+
+import parsedmarc
+import parsedmarc.webhook
+
+
+class Test(unittest.TestCase):
+    """Kitchen-sink tests redistributed from the original
+    tests.py monolith. Future PRs should split these further
+    into purpose-specific TestCase subclasses as natural
+    groupings emerge."""
+
+    def testWebhookClientInit(self):
+        """WebhookClient initializes with correct attributes"""
+        from parsedmarc.webhook import WebhookClient
+
+        client = WebhookClient(
+            aggregate_url="http://agg.example.com",
+            failure_url="http://fail.example.com",
+            smtp_tls_url="http://tls.example.com",
+        )
+        self.assertEqual(client.aggregate_url, "http://agg.example.com")
+        self.assertEqual(client.failure_url, "http://fail.example.com")
+        self.assertEqual(client.smtp_tls_url, "http://tls.example.com")
+        self.assertEqual(client.timeout, 60)
+
+    def testWebhookClientSaveMethods(self):
+        """WebhookClient save methods call _send_to_webhook"""
+        from parsedmarc.webhook import WebhookClient
+
+        client = WebhookClient("http://a", "http://f", "http://t")
+        client.session = MagicMock()
+        client.save_aggregate_report_to_webhook('{"test": 1}')
+        client.session.post.assert_called_with(
+            "http://a", data='{"test": 1}', timeout=60
+        )
+        client.save_failure_report_to_webhook('{"fail": 1}')
+        client.session.post.assert_called_with(
+            "http://f", data='{"fail": 1}', timeout=60
+        )
+        client.save_smtp_tls_report_to_webhook('{"tls": 1}')
+        client.session.post.assert_called_with(
+            "http://t", data='{"tls": 1}', timeout=60
+        )
+
+    def testWebhookBackwardCompatAlias(self):
+        """WebhookClient forensic alias points to failure method"""
+        from parsedmarc.webhook import WebhookClient
+
+        self.assertIs(
+            WebhookClient.save_forensic_report_to_webhook,  # type: ignore[attr-defined]
+            WebhookClient.save_failure_report_to_webhook,
+        )
+
+
+class TestWebhookClient(unittest.TestCase):
+    """Tests for WebhookClient.close()"""
+
+    def testClose(self):
+        """WebhookClient.close() closes session"""
+        client = parsedmarc.webhook.WebhookClient(
+            aggregate_url="http://invalid.test/agg",
+            failure_url="http://invalid.test/fail",
+            smtp_tls_url="http://invalid.test/tls",
+        )
+        mock_close = MagicMock()
+        client.session.close = mock_close
+        client.close()
+        mock_close.assert_called_once()
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)