parsedmarc

mirror of https://github.com/domainaware/parsedmarc.git synced 2026-05-22 11:55:24 +00:00

Author	SHA1	Message	Date
DVB	bf37ded688	Add support for Elastic Cloud Serverless projects (#770)	2026-05-20 21:36:19 -04:00
Vincent	535d9db1ad	cli: support _FILE suffix on PARSEDMARC_* env vars for Docker secrets (#772)

Appending _FILE to any PARSEDMARC_{SECTION}_{KEY} env var reads the value from the referenced file, with one trailing newline stripped. This matches the Postgres/MariaDB/Redis container-image convention so Docker Compose and Kubernetes secret mounts work without extra glue, keeping credentials out of plain environment: blocks (and out of docker inspect, container logs, and /proc/<pid>/environ).

When both the direct var and its _FILE companion are set, the file wins. A missing or unreadable file raises ConfigurationError rather than silently degrading to an empty credential.

The four pre-existing config keys whose own names end in _file ([general] log_file, [msgraph] token_file, [gmail_api] credentials_file / token_file) keep their direct-path semantics; pass their values via secret by doubling the suffix (_FILE_FILE).	2026-05-20 21:11:44 -04:00
Sean Whalen	b7b8383fa4	Expand honest test coverage from 59% to 83%; fix two latent bugs (#775)

* Expand honest test coverage from 59% to 83%; fix two latent bugs

271 new tests across the output modules, ES/OS clients, CLI config parsing, and the top-level parsing surface. Coverage measured against shipped code only (see [tool.coverage.run] source = ["parsedmarc"] omit = ["/parsedmarc/resources/maps/.py"] in pyproject.toml).

Per-module results:

  s3.py            38% → 100%  (also fixes SMTP-TLS-to-S3 bug below)
  gelf.py          40% → 100%
  syslog.py        46% → 100%
  kafkaclient.py   34% → 100%
  splunk.py        24% → 100%
  loganalytics.py  56% → 100%
  webhook.py       78% → 100%  (also removes redundant try/except)
  elastic.py       36% → 99%
  opensearch.py    40% → 99%
  cli.py           52% → 69%
  __init__.py      74% → 76%   (also fixes append_json bug below)
  utils.py         84%         (unchanged in this PR)
  TOTAL            59% → 83%

The remaining 17% is honest. The biggest unreached blocks are _main() in cli.py and the watch-mode mailbox iteration in __init__.py, both of which would require either standing up live subsystems (real Elasticsearch, real IMAP) or mocking deep enough that the test would verify the mock rather than the code. The PR-A AGENTS.md guidance ("if 90% requires faking it, ship 85% honestly") applies here.

Bugs fixed while writing tests:

1. parsedmarc/s3.py: SMTP-TLS-to-S3 was completely broken. save_report_to_s3 unconditionally read report["report_metadata"] when building S3 object metadata, but RFC 8460 §4.3 SMTP TLS reports are flat (no report_metadata sub-object). The CLI's surrounding try/except silently swallowed the KeyError, so every SMTP-TLS report quietly failed to upload. Also fixes a related issue: parse_smtp_tls_report_json stores begin_date as the raw ISO-8601 string from the report (per the SMTPTLSReport TypedDict and RFC 8460 §4.3), but the S3 code path assumed a datetime with .year / .month / .day attributes. Both fixed; the broken metadata-extraction branch now uses the flat-report fields, and the date branch normalizes via human_timestamp_to_datetime.

2. parsedmarc/__init__.py: append_json corrupted JSON output files on the second write. The original implementation opened files in "a+" mode, then seek()ed backwards to overwrite the trailing "]" with ",\n" before appending more elements. Python's docs are explicit (https://docs.python.org/3/library/functions.html#open): on POSIX, writes in "a"/"a+" mode always go to EOF regardless of seek() position. The result was that the second call produced [...]\n],\n[...] -style corrupted output instead of a single merged array. Replaced with a read-merge-write pattern: load the existing array (if any), append the new elements, rewrite the whole file. The CSV cousin append_csv was not affected; it doesn't seek backwards.

3. parsedmarc/webhook.py: removed redundant try/except blocks in save_aggregate_report_to_webhook / save_failure_report_to_webhook / save_smtp_tls_report_to_webhook. _send_to_webhook already catches every Exception itself, so the outer except blocks were unreachable dead code (covered nothing, defended against nothing, and inflated the source-line count without testing value).

Testing approach: mocks at SDK boundaries (boto3 resource, kafka producer, requests session, opensearch/elasticsearch Document/Search, azure LogsIngestionClient). Tests verify the parsedmarc-side transformation logic (document/event construction, index/topic naming, dedup queries, error wrapping) rather than asserting on mock invocations as a proxy for behaviour. Where a branch is defensive against a caller that doesn't exist in the codebase, the test is omitted (commented in code rather than hidden behind a pragma).

547 tests total (was 276), all passing. ruff check + format clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Document the two bug fixes from this PR in the 10.0.0 changelog

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Document testing standards in AGENTS.md

Adds a "Testing standards" section covering the principles applied in PR-A (split) and PR-B (coverage expansion):

- Coverage measures shipped code only: don't reintroduce tests/* to the scope, don't expand omit, don't use # pragma: no cover.
- Honest tests assert on observable behaviour, not "the mock was called". Mock at SDK boundaries; parse the payload that gets sent.
- "If 90% requires faking it, ship 85% honestly": coverage is a tool, not a goal. PR-B's deliberate stops at cli.py 69% and __init__.py 76% are the documented precedent for when to halt.
- Verify bug claims against the relevant RFC, internal types, installed SDK source, or upstream docs before changing code. Cite the source in the commit message and test docstring (RFC 8460 §4.3 and the Python open() docs for #775's two bug fixes are the pattern to follow).
- Bugs found while writing tests are fixed in the same PR; the test doubles as the regression guard.
- File layout (tests/test_<module>.py) is non-negotiable; module-level test loggers need fresh-handler setup so test ordering doesn't break assertLogs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Cover the corrupt-file fallback in append_json

Codecov flagged 2 missing patch-coverage lines on PR #775: the except (json.JSONDecodeError, OSError) branch in append_json, which falls back to overwriting when the existing file isn't a parseable JSON array. Two new tests in tests/test_init.py:TestAppendJson exercise both paths:

- test_corrupt_existing_file_is_overwritten_cleanly: existing file contains invalid JSON; append_json overwrites with the new array.
- test_existing_file_with_non_list_root_is_overwritten: existing file parses as {"foo": ...} (dict, not list); the isinstance guard rejects it and we overwrite cleanly.

Patch coverage now 100% on the bug fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 20:35:22 -04:00
Sean Whalen	5b08627eaa	Split tests.py into per-module tests/test_<module>.py (#774)

* Split tests.py into per-module tests/test_<module>.py

The 5174-line tests.py monolith is split into per-module files under tests/, mirroring the checkdmarc layout:

  tests/test_init.py          parsedmarc/__init__.py parsing surface
  tests/test_cli.py           parsedmarc/cli.py + config / env-vars / SIGHUP
  tests/test_utils.py         parsedmarc/utils.py (DNS, IP info, PSL, etc.)
  tests/test_webhook.py       parsedmarc/webhook.py
  tests/test_kafkaclient.py   parsedmarc/kafkaclient.py
  tests/test_splunk.py        parsedmarc/splunk.py
  tests/test_syslog.py        parsedmarc/syslog.py
  tests/test_loganalytics.py  parsedmarc/loganalytics.py
  tests/test_gelf.py          parsedmarc/gelf.py
  tests/test_s3.py            parsedmarc/s3.py
  tests/test_maps.py          parsedmarc/resources/maps/ maintainer scripts

The split is purely a redistribution: no test bodies changed, no tests added or removed. All 276 existing tests pass under the new layout.

The current tests.py contains two kitchen-sink classes (`Test` at line 54 and `TestEnvVarConfig` at line 2360) holding tests that span many modules. Their methods are routed to the correct per-module file by name prefix; the wholly-thematic classes (TestExtractReport, TestUtilsXxx, TestSighupReload, etc.) move whole. Each target file gets its own `class Test(unittest.TestCase)` for the redistributed kitchen-sink methods, plus the thematic classes verbatim.

Wiring updates:

- `.github/workflows/python-tests.yml`: `pytest ... tests.py` → `python -m pytest ... tests/` (also switches to `python -m pytest` per the checkdmarc convention so cwd lands on the project root).
- `pyproject.toml`: adds `[tool.pytest.ini_options] testpaths = ["tests"]` and `[tool.coverage.run] source = ["parsedmarc"]` with an `omit` for `parsedmarc/resources/maps/.py`. The maps scripts are maintainer-only batch tooling that ships out of the wheel; excluding them from coverage makes the headline number reflect only installed library code. Runtime coverage on the new layout is 59% (was 45% with maps counted), and PR-B will push it to 90%+.
- `AGENTS.md`: documents the new layout and how to run individual files / tests; tells future contributors not to reintroduce a monolithic tests.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Restore 66.9% coverage baseline (count tests/ + parsedmarc)

Master's headline 66.9% number on Codecov includes the tests.py file itself (99.35% covered) being measured alongside parsedmarc/. The original tests.py had no `[tool.coverage.run]` block, so coverage's default ("measure every file imported during the run") counted the test code as if it were product code. The split commit added `source = ["parsedmarc"]` which suppressed measurement of the test files (correct in principle, since test files aren't shipped code), and that alone made the headline number drop by ~8 percentage points without any actual loss of testing.

This commit swaps `source` for an explicit `include = ["parsedmarc/", "tests/"]` so both halves are measured the way they were on master. Verified: 276 tests, 66.96% line coverage (effectively unchanged from master's 66.90%).

If you want the shipped-code-only number (was the headline that this commit overrides), run `pytest --cov=parsedmarc tests/`. That number is currently 59% and is the focus of the upcoming coverage-expansion PR.

Also adds junit.xml to .gitignore so the CI artefact doesn't get accidentally committed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Restrict coverage to shipped code (`source = ["parsedmarc"]`)

Reverts the prior commit's `include = ["tests/*"]`. Counting the test files toward coverage was wrong: it conflates "shipped code exercised by tests" with "test code that pytest auto-runs", inflates the headline number, and rewards writing more tests rather than tests that verify more code.

Master's apparent 66.9% was an artefact of the old monolithic tests.py having no [tool.coverage.run] block at all; coverage's default behaviour measured every imported file, including the test file itself at ~99% "covered", which added ~8 percentage points to the displayed number without any real testing signal.

Restricting to `source = ["parsedmarc"]` plus the existing maps omit gives a meaningful baseline: 59% of shipped code is exercised by the test suite today. That's the number the next PR is targeting to lift to 90%+ before the 10.0.0 release; the Codecov "drop" here is a measurement correction, not a regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 19:29:09 -04:00
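
The `_FILE` lookup rules described in commit 535d9db1ad (the file wins over the direct variable, one trailing newline is stripped, an unreadable file is an error) can be sketched roughly as follows. This is a minimal illustration, not parsedmarc's actual code: `read_env_setting` is a hypothetical helper, the `ConfigurationError` class here is a local stand-in for the one named in the commit message, and `PARSEDMARC_IMAP_PASSWORD` is only an example variable name.

```python
import os


class ConfigurationError(Exception):
    """Local stand-in for the ConfigurationError named in the commit message."""


def read_env_setting(name, environ=None):
    """Return the setting `name`, preferring its `<name>_FILE` companion.

    Hypothetical helper for illustration; parsedmarc's real lookup may differ.
    """
    environ = os.environ if environ is None else environ
    file_var = name + "_FILE"
    path = environ.get(file_var)
    if path is not None:
        # The companion variable is set, so the file wins over the direct value.
        try:
            with open(path, encoding="utf-8") as handle:
                value = handle.read()
        except OSError as exc:
            # Fail loudly instead of falling back to an empty credential.
            raise ConfigurationError(
                f"{file_var}: cannot read {path!r}: {exc}"
            ) from exc
        # Strip exactly one trailing newline, as secret files usually end in one.
        if value.endswith("\n"):
            value = value[:-1]
        return value
    # No companion variable: use the direct value (None when unset).
    return environ.get(name)
```

Under this sketch, a key whose own name already ends in `_FILE` needs no special case: calling `read_env_setting("PARSEDMARC_GENERAL_LOG_FILE")` looks for `PARSEDMARC_GENERAL_LOG_FILE_FILE`, which is the doubled suffix the commit message describes.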

4 Commits
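
The read-merge-write pattern that commit b7b8383fa4 describes for `append_json` (load the existing array if any, append the new elements, rewrite the whole file, and overwrite cleanly when the existing file is corrupt or not a list) might look something like the sketch below. It is an assumption-laden illustration rather than the code in parsedmarc/__init__.py: the function name `append_json_records` and its signature are hypothetical.

```python
import json
import os
import tempfile


def append_json_records(path, records):
    """Merge `records` into the JSON array stored at `path`.

    Hypothetical sketch of a read-merge-write append; not parsedmarc's API.
    """
    existing = []
    try:
        with open(path, encoding="utf-8") as handle:
            loaded = json.load(handle)
        # Only a top-level list can be extended; anything else is discarded.
        if isinstance(loaded, list):
            existing = loaded
    except (json.JSONDecodeError, OSError):
        # Missing, unreadable, or corrupt file: start from an empty array.
        existing = []
    # Rewrite the whole file in "w" mode. Seeking backwards in "a"/"a+" mode
    # does not help, because appends go to the end of the file on POSIX.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(existing + list(records), handle, indent=2)
        handle.write("\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "aggregate.json")
        append_json_records(target, [{"id": 1}])
        append_json_records(target, [{"id": 2}])
        with open(target, encoding="utf-8") as handle:
            print(json.load(handle))
```

The trade-off is that every call rereads and rewrites the full file, which costs more I/O than a true append but always leaves a single well-formed array on disk.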