mirror of
https://github.com/domainaware/parsedmarc.git
synced 2026-05-22 11:55:24 +00:00
b7b8383fa4
* Expand honest test coverage from 59% to 83%; fix two latent bugs 271 new tests across the output modules, ES/OS clients, CLI config parsing, and the top-level parsing surface. Coverage measured against shipped code only (see [tool.coverage.run] source = ["parsedmarc"] omit = ["*/parsedmarc/resources/maps/*.py"] in pyproject.toml). Per-module results: s3.py 38% → 100% (also fixes SMTP-TLS-to-S3 bug below) gelf.py 40% → 100% syslog.py 46% → 100% kafkaclient.py 34% → 100% splunk.py 24% → 100% loganalytics.py 56% → 100% webhook.py 78% → 100% (also removes redundant try/except) elastic.py 36% → 99% opensearch.py 40% → 99% cli.py 52% → 69% __init__.py 74% → 76% (also fixes append_json bug below) utils.py 84% (unchanged in this PR) TOTAL 59% → 83% The remaining 17% is honest. The biggest unreached blocks are _main() in cli.py and the watch-mode mailbox iteration in __init__.py, both of which would require either standing up live subsystems (real Elasticsearch, real IMAP) or mocking deep enough that the test would verify the mock rather than the code. The PR-A AGENTS.md guidance — "if 90% requires faking it, ship 85% honestly" — applies here. Bugs fixed while writing tests: 1. parsedmarc/s3.py — SMTP-TLS-to-S3 was completely broken. save_report_to_s3 unconditionally read report["report_metadata"] when building S3 object metadata, but RFC 8460 §4.3 SMTP TLS reports are flat (no report_metadata sub-object). The CLI's surrounding try/except silently swallowed the KeyError, so every SMTP-TLS report quietly failed to upload. Also fixes a related issue: parse_smtp_tls_report_json stores begin_date as the raw ISO-8601 string from the report (per the SMTPTLSReport TypedDict and RFC 8460 §4.3), but the S3 code path assumed a datetime with .year / .month / .day attributes. Both fixed; the broken metadata-extraction branch now uses the flat-report fields, and the date branch normalizes via human_timestamp_to_datetime. 2. parsedmarc/__init__.py — append_json corrupted JSON output files on the second write. The original implementation opened files in "a+" mode, then seek()ed backwards to overwrite the trailing "]" with ",\n" before appending more elements. Python's docs are explicit (https://docs.python.org/3/library/functions.html#open): on POSIX, writes in "a"/"a+" mode always go to EOF regardless of seek() position. The result was that the second call produced [...]\n],\n[...] -style corrupted output instead of a single merged array. Replaced with a read-merge-write pattern: load the existing array (if any), append the new elements, rewrite the whole file. The CSV cousin append_csv was not affected — it doesn't seek backwards. 3. parsedmarc/webhook.py — removed redundant try/except blocks in save_aggregate_report_to_webhook / save_failure_report_to_webhook / save_smtp_tls_report_to_webhook. _send_to_webhook already catches every Exception itself, so the outer except blocks were unreachable dead code (covered nothing, defended against nothing, and inflated the source-line count without testing value). Testing approach: mocks at SDK boundaries (boto3 resource, kafka producer, requests session, opensearch/elasticsearch Document/Search, azure LogsIngestionClient). Tests verify the parsedmarc-side transformation logic — document/event construction, index/topic naming, dedup queries, error wrapping — rather than asserting on mock invocations as a proxy for behaviour. Where a branch is defensive against a caller that doesn't exist in the codebase, the test is omitted (commented in code rather than hidden behind a pragma). 547 tests total (was 276), all passing. ruff check + format clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Document the two bug fixes from this PR in the 10.0.0 changelog Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Document testing standards in AGENTS.md Adds a "Testing standards" section covering the principles applied in PR-A (split) and PR-B (coverage expansion): - Coverage measures shipped code only — don't reintroduce tests/* to the scope, don't expand omit, don't use # pragma: no cover. - Honest tests assert on observable behaviour, not "the mock was called". Mock at SDK boundaries; parse the payload that gets sent. - "If 90% requires faking it, ship 85% honestly" — coverage is a tool, not a goal. PR-B's deliberate stops at cli.py 69% and __init__.py 76% are the documented precedent for when to halt. - Verify bug claims against the relevant RFC, internal types, installed SDK source, or upstream docs before changing code. Cite the source in the commit message and test docstring (RFC 8460 §4.3 and the Python open() docs for #775's two bug fixes are the pattern to follow). - Bugs found while writing tests are fixed in the same PR; the test doubles as the regression guard. - File layout (tests/test_<module>.py) is non-negotiable; module-level test loggers need fresh-handler setup so test ordering doesn't break assertLogs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Cover the corrupt-file fallback in append_json Codecov flagged 2 missing patch-coverage lines on PR #775: the except (json.JSONDecodeError, OSError) branch in append_json, which falls back to overwriting when the existing file isn't a parseable JSON array. Two new tests in tests/test_init.py:TestAppendJson exercise both paths: - test_corrupt_existing_file_is_overwritten_cleanly: existing file contains invalid JSON; append_json overwrites with the new array. - test_existing_file_with_non_list_root_is_overwritten: existing file parses as {"foo": ...} (dict, not list); the isinstance guard rejects it and we overwrite cleanly. Patch coverage now 100% on the bug fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
366 lines
14 KiB
Python
366 lines
14 KiB
Python
"""Tests for parsedmarc.syslog"""
|
|
|
|
import json
|
|
import logging
|
|
import socket
|
|
import unittest
|
|
from unittest.mock import MagicMock, patch
|
|
|
|
from parsedmarc.syslog import SyslogClient
|
|
|
|
|
|
def _sample_aggregate_report():
|
|
return {
|
|
"xml_schema": "draft",
|
|
"xml_namespace": None,
|
|
"report_metadata": {
|
|
"org_name": "example.com",
|
|
"org_email": "dmarc@example.com",
|
|
"org_extra_contact_info": None,
|
|
"report_id": "agg-1",
|
|
"begin_date": "2024-01-01 00:00:00",
|
|
"end_date": "2024-01-02 00:00:00",
|
|
"timespan_requires_normalization": False,
|
|
"original_timespan_seconds": 86400,
|
|
"errors": [],
|
|
"generator": None,
|
|
},
|
|
"policy_published": {
|
|
"domain": "example.com",
|
|
"adkim": "r",
|
|
"aspf": "r",
|
|
"p": "none",
|
|
"sp": "none",
|
|
"pct": None,
|
|
"fo": None,
|
|
"np": None,
|
|
"testing": None,
|
|
"discovery_method": None,
|
|
},
|
|
"records": [
|
|
{
|
|
"interval_begin": "2024-01-01 00:00:00",
|
|
"interval_end": "2024-01-02 00:00:00",
|
|
"normalized_timespan": False,
|
|
"source": {
|
|
"ip_address": "192.0.2.1",
|
|
"country": "US",
|
|
"reverse_dns": None,
|
|
"base_domain": None,
|
|
"name": None,
|
|
"type": None,
|
|
"asn": 64496,
|
|
"as_name": "Example AS",
|
|
"as_domain": "example.net",
|
|
},
|
|
"count": 9,
|
|
"alignment": {"spf": True, "dkim": True, "dmarc": True},
|
|
"policy_evaluated": {
|
|
"disposition": "none",
|
|
"dkim": "pass",
|
|
"spf": "pass",
|
|
"policy_override_reasons": [],
|
|
},
|
|
"identifiers": {
|
|
"header_from": "example.com",
|
|
"envelope_from": "example.com",
|
|
"envelope_to": None,
|
|
},
|
|
"auth_results": {"dkim": [], "spf": []},
|
|
}
|
|
],
|
|
}
|
|
|
|
|
|
class _CapturingHandler(logging.Handler):
|
|
"""Records the messages emitted by SyslogClient.logger."""
|
|
|
|
def __init__(self):
|
|
super().__init__()
|
|
self.messages: list[str] = []
|
|
|
|
def emit(self, record):
|
|
self.messages.append(record.getMessage())
|
|
|
|
|
|
def _fresh_logger():
|
|
"""Reset the module-level parsedmarc_syslog logger before each test."""
|
|
logging.getLogger("parsedmarc_syslog").handlers.clear()
|
|
|
|
|
|
class TestSyslogClientInitUdp(unittest.TestCase):
|
|
"""UDP is the default protocol — back-compat for every existing
|
|
deployment. The handler must be SOCK_DGRAM, not SOCK_STREAM."""
|
|
|
|
def test_udp_uses_dgram_socket(self):
|
|
_fresh_logger()
|
|
with patch("parsedmarc.syslog.logging.handlers.SysLogHandler") as mock_handler:
|
|
SyslogClient(server_name="syslog.example.com", server_port=514)
|
|
mock_handler.assert_called_once_with(
|
|
address=("syslog.example.com", 514),
|
|
socktype=socket.SOCK_DGRAM,
|
|
)
|
|
|
|
def test_udp_is_default(self):
|
|
"""Explicit protocol='udp' and default produce the same call."""
|
|
_fresh_logger()
|
|
with patch("parsedmarc.syslog.logging.handlers.SysLogHandler") as mock_handler:
|
|
SyslogClient("s", 514, protocol="udp")
|
|
kwargs = mock_handler.call_args.kwargs
|
|
self.assertEqual(kwargs["socktype"], socket.SOCK_DGRAM)
|
|
|
|
|
|
class TestSyslogClientInitTcp(unittest.TestCase):
|
|
"""TCP path applies the configured timeout to the underlying socket
|
|
and uses SOCK_STREAM. Wrong socket type would silently fail to
|
|
deliver messages."""
|
|
|
|
def test_tcp_uses_stream_socket(self):
|
|
_fresh_logger()
|
|
with patch("parsedmarc.syslog.logging.handlers.SysLogHandler") as mock_handler:
|
|
mock_handler.return_value.socket = MagicMock()
|
|
SyslogClient("s", 6514, protocol="tcp")
|
|
kwargs = mock_handler.call_args.kwargs
|
|
self.assertEqual(kwargs["socktype"], socket.SOCK_STREAM)
|
|
|
|
def test_tcp_applies_timeout_to_socket(self):
|
|
_fresh_logger()
|
|
sock = MagicMock()
|
|
with patch("parsedmarc.syslog.logging.handlers.SysLogHandler") as mock_handler:
|
|
mock_handler.return_value.socket = sock
|
|
SyslogClient("s", 6514, protocol="tcp", timeout=12.5)
|
|
sock.settimeout.assert_called_once_with(12.5)
|
|
|
|
|
|
class TestSyslogClientInitTls(unittest.TestCase):
|
|
"""TLS path: TLS ≥1.2 minimum, optional CA + client cert, retry on
|
|
connection failure. Each has user-facing security consequences."""
|
|
|
|
def _patch_handler_and_ssl(self):
|
|
handler_patch = patch("parsedmarc.syslog.logging.handlers.SysLogHandler")
|
|
ssl_patch = patch("parsedmarc.syslog.ssl.create_default_context")
|
|
return handler_patch, ssl_patch
|
|
|
|
def test_tls_enforces_tls_1_2_minimum(self):
|
|
"""The lowest version security teams accept is TLS 1.2."""
|
|
_fresh_logger()
|
|
import ssl
|
|
|
|
handler_p, ssl_p = self._patch_handler_and_ssl()
|
|
with handler_p as mock_h, ssl_p as mock_ctx_factory:
|
|
mock_h.return_value.socket = MagicMock()
|
|
ctx = mock_ctx_factory.return_value
|
|
SyslogClient("s", 6514, protocol="tls")
|
|
mock_ctx_factory.assert_called_once_with()
|
|
self.assertEqual(ctx.minimum_version, ssl.TLSVersion.TLSv1_2)
|
|
|
|
def test_tls_loads_ca_when_cafile_provided(self):
|
|
_fresh_logger()
|
|
handler_p, ssl_p = self._patch_handler_and_ssl()
|
|
with handler_p as mock_h, ssl_p as mock_ctx_factory:
|
|
mock_h.return_value.socket = MagicMock()
|
|
SyslogClient("s", 6514, protocol="tls", cafile_path="/etc/ca.pem")
|
|
mock_ctx_factory.return_value.load_verify_locations.assert_called_once_with(
|
|
cafile="/etc/ca.pem"
|
|
)
|
|
|
|
def test_tls_loads_client_cert_when_both_paths_provided(self):
|
|
_fresh_logger()
|
|
handler_p, ssl_p = self._patch_handler_and_ssl()
|
|
with handler_p as mock_h, ssl_p as mock_ctx_factory:
|
|
mock_h.return_value.socket = MagicMock()
|
|
SyslogClient(
|
|
"s",
|
|
6514,
|
|
protocol="tls",
|
|
certfile_path="/etc/c.pem",
|
|
keyfile_path="/etc/k.pem",
|
|
)
|
|
mock_ctx_factory.return_value.load_cert_chain.assert_called_once_with(
|
|
certfile="/etc/c.pem",
|
|
keyfile="/etc/k.pem",
|
|
)
|
|
|
|
def test_tls_warns_when_only_certfile_provided(self):
|
|
"""Half-configured client auth (cert without key, or vice
|
|
versa) is a config bug that disables client auth silently.
|
|
The code warns instead."""
|
|
_fresh_logger()
|
|
handler_p, ssl_p = self._patch_handler_and_ssl()
|
|
with handler_p as mock_h, ssl_p:
|
|
mock_h.return_value.socket = MagicMock()
|
|
with self.assertLogs("parsedmarc_syslog", level="WARNING") as cm:
|
|
SyslogClient("s", 6514, protocol="tls", certfile_path="/etc/c.pem")
|
|
self.assertTrue(
|
|
any("Both certfile_path and keyfile_path" in m for m in cm.output)
|
|
)
|
|
|
|
def test_tls_wraps_socket_with_server_hostname(self):
|
|
"""server_name is used as TLS SNI / certificate-verification hostname."""
|
|
_fresh_logger()
|
|
wrapped_sock = MagicMock()
|
|
handler_p, ssl_p = self._patch_handler_and_ssl()
|
|
with handler_p as mock_h, ssl_p as mock_ctx_factory:
|
|
raw_sock = MagicMock()
|
|
mock_h.return_value.socket = raw_sock
|
|
mock_ctx_factory.return_value.wrap_socket.return_value = wrapped_sock
|
|
SyslogClient("syslog.example.com", 6514, protocol="tls")
|
|
mock_ctx_factory.return_value.wrap_socket.assert_called_once_with(
|
|
raw_sock, server_hostname="syslog.example.com"
|
|
)
|
|
|
|
def test_tls_retries_then_succeeds(self):
|
|
"""Transient connection failures should retry up to
|
|
retry_attempts before raising."""
|
|
_fresh_logger()
|
|
attempts = {"n": 0}
|
|
|
|
def flaky_handler(*args, **kwargs):
|
|
attempts["n"] += 1
|
|
if attempts["n"] < 2:
|
|
raise OSError("network down")
|
|
h = MagicMock()
|
|
h.socket = MagicMock()
|
|
return h
|
|
|
|
with (
|
|
patch(
|
|
"parsedmarc.syslog.logging.handlers.SysLogHandler",
|
|
side_effect=flaky_handler,
|
|
),
|
|
patch("parsedmarc.syslog.ssl.create_default_context"),
|
|
patch("parsedmarc.syslog.time.sleep") as mock_sleep,
|
|
):
|
|
SyslogClient("s", 6514, protocol="tls", retry_attempts=3, retry_delay=1)
|
|
self.assertEqual(attempts["n"], 2)
|
|
mock_sleep.assert_called_with(1)
|
|
|
|
def test_tls_raises_after_exhausting_retries(self):
|
|
_fresh_logger()
|
|
with (
|
|
patch(
|
|
"parsedmarc.syslog.logging.handlers.SysLogHandler",
|
|
side_effect=OSError("network down"),
|
|
),
|
|
patch("parsedmarc.syslog.ssl.create_default_context"),
|
|
patch("parsedmarc.syslog.time.sleep"),
|
|
):
|
|
with self.assertRaises(OSError):
|
|
SyslogClient("s", 6514, protocol="tls", retry_attempts=2, retry_delay=0)
|
|
|
|
|
|
class TestSyslogClientInitInvalidProtocol(unittest.TestCase):
|
|
"""Typos in the protocol field should fail loudly."""
|
|
|
|
def test_invalid_protocol_raises_value_error(self):
|
|
_fresh_logger()
|
|
with self.assertRaises(ValueError) as ctx:
|
|
SyslogClient("s", 514, protocol="udb")
|
|
self.assertIn("udb", str(ctx.exception))
|
|
self.assertIn("'udp', 'tcp', or 'tls'", str(ctx.exception))
|
|
|
|
|
|
class TestSyslogClientSave(unittest.TestCase):
|
|
"""save_* methods emit one syslog message per CSV row, each as a
|
|
JSON-encoded payload. Wrong format would break downstream parsers."""
|
|
|
|
def _client_with_capture(self):
|
|
_fresh_logger()
|
|
with patch("parsedmarc.syslog.logging.handlers.SysLogHandler"):
|
|
client = SyslogClient("s", 514)
|
|
client.logger.removeHandler(client.log_handler)
|
|
cap = _CapturingHandler()
|
|
client.logger.addHandler(cap)
|
|
return client, cap
|
|
|
|
def test_save_aggregate_emits_json_per_row(self):
|
|
client, cap = self._client_with_capture()
|
|
client.save_aggregate_report_to_syslog([_sample_aggregate_report()])
|
|
self.assertEqual(len(cap.messages), 1)
|
|
payload = json.loads(cap.messages[0])
|
|
self.assertEqual(payload["source_ip_address"], "192.0.2.1")
|
|
self.assertEqual(payload["count"], 9)
|
|
self.assertEqual(payload["org_name"], "example.com")
|
|
|
|
def test_save_failure_emits_json_per_report(self):
|
|
client, cap = self._client_with_capture()
|
|
failure_report = {
|
|
"feedback_type": "auth-failure",
|
|
"user_agent": "test/1.0",
|
|
"version": "1",
|
|
"original_envelope_id": None,
|
|
"original_mail_from": "x@example.com",
|
|
"original_rcpt_to": None,
|
|
"arrival_date": "Thu, 1 Jan 2024 00:00:00 +0000",
|
|
"arrival_date_utc": "2024-01-01 00:00:00",
|
|
"authentication_results": None,
|
|
"delivery_result": "other",
|
|
"auth_failure": ["dmarc"],
|
|
"authentication_mechanisms": [],
|
|
"dkim_domain": None,
|
|
"reported_domain": "example.com",
|
|
"sample_headers_only": True,
|
|
"source": {
|
|
"ip_address": "192.0.2.5",
|
|
"country": "US",
|
|
"reverse_dns": None,
|
|
"base_domain": None,
|
|
"name": None,
|
|
"type": None,
|
|
"asn": 64496,
|
|
"as_name": "Example AS",
|
|
"as_domain": "example.net",
|
|
},
|
|
"sample": "...",
|
|
"parsed_sample": {"subject": "Test"},
|
|
}
|
|
client.save_failure_report_to_syslog([failure_report])
|
|
self.assertEqual(len(cap.messages), 1)
|
|
payload = json.loads(cap.messages[0])
|
|
self.assertEqual(payload["reported_domain"], "example.com")
|
|
self.assertEqual(payload["source_ip_address"], "192.0.2.5")
|
|
|
|
def test_save_smtp_tls_emits_json_per_policy(self):
|
|
client, cap = self._client_with_capture()
|
|
report = {
|
|
"organization_name": "example.com",
|
|
"begin_date": "2024-02-03T00:00:00Z",
|
|
"end_date": "2024-02-04T00:00:00Z",
|
|
"contact_info": "tls@example.com",
|
|
"report_id": "tls-1",
|
|
"policies": [
|
|
{
|
|
"policy_domain": "example.com",
|
|
"policy_type": "sts",
|
|
"successful_session_count": 100,
|
|
"failed_session_count": 0,
|
|
}
|
|
],
|
|
}
|
|
client.save_smtp_tls_report_to_syslog([report])
|
|
self.assertEqual(len(cap.messages), 1)
|
|
payload = json.loads(cap.messages[0])
|
|
self.assertEqual(payload["policy_domain"], "example.com")
|
|
|
|
|
|
class TestSyslogClientClose(unittest.TestCase):
|
|
def test_close_removes_and_closes_handler(self):
|
|
_fresh_logger()
|
|
with patch("parsedmarc.syslog.logging.handlers.SysLogHandler") as mock_handler:
|
|
client = SyslogClient("s", 514)
|
|
client.close()
|
|
mock_handler.return_value.close.assert_called_once()
|
|
self.assertNotIn(mock_handler.return_value, client.logger.handlers)
|
|
|
|
|
|
class TestSyslogBackwardCompatAlias(unittest.TestCase):
|
|
def test_forensic_alias_points_to_failure_method(self):
|
|
self.assertIs(
|
|
SyslogClient.save_forensic_report_to_syslog, # type: ignore[attr-defined]
|
|
SyslogClient.save_failure_report_to_syslog,
|
|
)
|
|
|
|
|
|
if __name__ == "__main__":
|
|
unittest.main(verbosity=2)
|