6 Commits

Author SHA1 Message Date
Sean Whalen d7366d088f Add DMARCbis report support; rename forensic→failure project-wide
Rebased on top of master @ 2cda5bf (9.9.0), which added the ASN
source attribution work (#712, #713, #714, #715). Individual Copilot
iteration commits squashed into this single commit — the per-commit
history on the feature branch was iterative (add tests, fix lint,
move field, revert, etc.) and not worth preserving; GitHub squash-
merges PRs anyway.

### DMARCbis fields (new)

New fields from the DMARCbis XSD, plumbed through types, parsing, CSV
output, and the Elasticsearch / OpenSearch mappings:

- ``np`` — non-existent subdomain policy (``none`` / ``quarantine`` /
  ``reject``)
- ``testing`` — testing mode flag (``n`` / ``y``), replaces RFC 7489
  ``pct``
- ``discovery_method`` — policy discovery method (``psl`` /
  ``treewalk``)
- ``generator`` — report generator software identifier (metadata)
- ``human_result`` — optional descriptive text on DKIM / SPF results

RFC 7489 reports parse with ``None`` for DMARCbis-only fields.
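
A minimal sketch of how an absent DMARCbis element can surface as ``None``. The helper ``read_policy_published`` and the sample XML are hypothetical illustrations, not parsedmarc's actual parser:

```python
import xml.etree.ElementTree as ET

# Hypothetical list of DMARCbis-only policy elements (names taken from the
# field list above); this is not parsedmarc's real parsing code.
DMARCBIS_POLICY_FIELDS = ("np", "testing", "discovery_method")


def read_policy_published(xml_text):
    """Return the published policy, with None for missing DMARCbis fields."""
    root = ET.fromstring(xml_text)
    policy = root.find("policy_published")
    result = {"p": policy.findtext("p")}
    for name in DMARCBIS_POLICY_FIELDS:
        # findtext() returns None when the element is absent, which is
        # what an RFC 7489 report produces for these fields.
        result[name] = policy.findtext(name)
    return result


rfc7489_xml = (
    "<feedback><policy_published>"
    "<p>reject</p><pct>100</pct>"
    "</policy_published></feedback>"
)
dmarcbis_xml = (
    "<feedback><policy_published>"
    "<p>reject</p><np>quarantine</np><testing>n</testing>"
    "<discovery_method>treewalk</discovery_method>"
    "</policy_published></feedback>"
)

old_style = read_policy_published(rfc7489_xml)
new_style = read_policy_published(dmarcbis_xml)
```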

### Forensic → failure rename

Forensic reports are now called failure reports throughout the
project, matching the terminology in use since RFC 7489.

- Core: ``types.py``, ``__init__.py`` — ``ForensicReport`` →
  ``FailureReport``, ``parse_forensic_report`` →
  ``parse_failure_report``, report type ``"failure"``.
- Output modules: ``elastic.py``, ``opensearch.py``, ``splunk.py``,
  ``kafkaclient.py``, ``syslog.py``, ``gelf.py``, ``webhook.py``,
  ``loganalytics.py``, ``s3.py``.
- CLI: ``cli.py`` — args, config keys, index names
  (``dmarc_failure``).
- Docs + dashboards: all markdown, Grafana JSON, Kibana NDJSON,
  Splunk XML.

Backward compatibility preserved: old function / type names remain as
aliases (``parse_forensic_report = parse_failure_report``,
``ForensicReport = FailureReport``, etc.), CLI accepts both the old
(``save_forensic``, ``forensic_topic``) and new (``save_failure``,
``failure_topic``) config keys, and updated dashboards query both
old and new index / sourcetype names so data from before and after
the rename appears together.
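
The compatibility approach described above could look roughly like this. The function body and the ``get_option`` helper are hypothetical stand-ins; only the alias and config key names come from this commit message:

```python
def parse_failure_report(sample):
    """Stand-in for the renamed parser; the real function does far more."""
    return {"report_type": "failure", "sample": sample}


# The old name stays importable as a plain alias of the new one.
parse_forensic_report = parse_failure_report


def get_option(config, new_key, old_key, default=None):
    """Hypothetical helper: prefer the new config key, fall back to the old."""
    if new_key in config:
        return config[new_key]
    return config.get(old_key, default)


legacy_config = {"save_forensic": True, "forensic_topic": "dmarc_forensic"}
current_config = {"save_failure": False}

save_legacy = get_option(legacy_config, "save_failure", "save_forensic", False)
save_current = get_option(current_config, "save_failure", "save_forensic", True)
```

Because the alias is the same function object, callers using the old name get identical behavior, including the new ``"failure"`` report type.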

### Rebase notes

Merge conflicts resolved in ``parsedmarc/constants.py`` (took bis's
10.0.0 bump), ``parsedmarc/__init__.py`` (combined bis's "failure"
wording with master's IPinfo MMDB mention), ``parsedmarc/elastic.py``
and ``parsedmarc/opensearch.py`` (kept master's ``source_asn`` /
``source_asn_name`` / ``source_asn_domain`` on the failure doc path
while renaming ``forensic_report`` → ``failure_report``), and
``CHANGELOG.md`` (10.0.0 entry now sits above the 9.9.0 entry).

All 324 tests pass; ``ruff check`` / ``ruff format --check`` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 02:26:30 -04:00
Sean Whalen 2cda5bf59b Surface ASN info and use it for source attribution when a PTR is absent (#715)
* Surface ASN info and fall back to it when a PTR is absent

Adds three new fields to every IP source record — ``asn`` (integer,
e.g. 15169), ``asn_name`` (``"Google LLC"``), ``asn_domain``
(``"google.com"``) — sourced from the bundled IPinfo Lite MMDB. These
flow through to CSV, JSON, Elasticsearch, OpenSearch, and Splunk
outputs as ``source_asn``, ``source_asn_name``, ``source_asn_domain``.

More importantly: when an IP has no reverse DNS (common for many
large senders), source attribution now falls back to the ASN domain
as a lookup key into the same ``reverse_dns_map``. Thanks to #712
and #714, ~85% of routed IPv4 space now has an ``as_domain`` that
hits the map, so rows that were previously unattributable now get a
``source_name``/``source_type`` derived from the ASN. When the ASN
domain misses the map, the raw AS name is used as ``source_name``
with ``source_type`` left null — still better than nothing.

Crucially, ``source_reverse_dns`` and ``source_base_domain`` remain
null on ASN-derived rows, so downstream consumers can still tell a
PTR-resolved attribution apart from an ASN-derived one.
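
A rough sketch of the fallback order described above, under simplifying assumptions: ``attribute_source`` and the map layout are hypothetical, and the behavior when a PTR exists but misses the map is deliberately left out:

```python
def attribute_source(reverse_dns_base, asn_name, asn_domain, reverse_dns_map):
    """Hypothetical sketch of PTR-first, ASN-second source attribution."""
    record = {
        # Stays None on ASN-derived rows so consumers can tell them apart.
        "source_base_domain": reverse_dns_base,
        "source_name": None,
        "source_type": None,
    }
    # Use the PTR base domain as the lookup key when there is one,
    # otherwise fall back to the ASN domain.
    key = reverse_dns_base if reverse_dns_base is not None else asn_domain
    entry = reverse_dns_map.get(key) if key is not None else None
    if entry is not None:
        record["source_name"] = entry["name"]
        record["source_type"] = entry["type"]
    elif reverse_dns_base is None and asn_name is not None:
        # ASN domain missed the map: use the raw AS name, leave type null.
        record["source_name"] = asn_name
    return record


# Assumed map layout for illustration only.
example_map = {"google.com": {"name": "Google", "type": "Email Provider"}}

hit = attribute_source(None, "Google LLC", "google.com", example_map)
miss = attribute_source(None, "Example Net", "example.net", example_map)
```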

ASN is stored as an integer at the schema level (Elasticsearch /
OpenSearch mappings use ``Integer``) so consumers can do range
queries and numeric sorts; dashboards can prepend ``AS`` at display
time. The MMDB reader normalizes both IPinfo's ``"AS15169"`` string
and MaxMind's ``autonomous_system_number`` int to the same int form.
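
One way such a normalization could be written is shown below. ``normalize_asn`` is a hypothetical name, and returning ``None`` for unparseable input is an assumption rather than documented behavior:

```python
def normalize_asn(value):
    """Hypothetical sketch: coerce an ASN given as "AS15169" or 15169 to int."""
    if value is None:
        return None
    if isinstance(value, int):
        # MaxMind-style databases already provide an integer.
        return value
    text = str(value).strip().upper()
    if text.startswith("AS"):
        # IPinfo-style string: drop the "AS" prefix.
        text = text[2:]
    return int(text) if text.isdigit() else None


from_ipinfo = normalize_asn("AS15169")
from_maxmind = normalize_asn(15169)
```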

Also fixes a pre-existing caching bug in ``get_ip_address_info``:
entries without reverse DNS were never written to the IP-info cache,
so every no-PTR IP re-did the MMDB read and DNS attempt on every
call. The cache write is now unconditional.
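
The effect of the fix can be illustrated with a small wrapper. ``make_cached_lookup`` and ``fake_lookup`` are hypothetical, and a plain dict stands in for whatever cache the library actually uses:

```python
def make_cached_lookup(lookup, cache):
    """Hypothetical wrapper that caches every result, PTR or not."""

    def cached(ip_address):
        if ip_address in cache:
            return cache[ip_address]
        info = lookup(ip_address)
        # Unconditional write: entries whose reverse_dns is None are
        # cached too, so a no-PTR IP is only looked up once.
        cache[ip_address] = info
        return info

    return cached


calls = []


def fake_lookup(ip_address):
    """Records each real lookup and returns a record with no PTR."""
    calls.append(ip_address)
    return {"ip_address": ip_address, "reverse_dns": None}


cache = {}
cached_lookup = make_cached_lookup(fake_lookup, cache)
first = cached_lookup("192.0.2.1")
second = cached_lookup("192.0.2.1")
```

With a conditional write (only when ``reverse_dns`` is set), the second call would have repeated the full lookup.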

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Bump to 9.9.0 and document the ASN fallback work

Updates the changelog with a 9.9.0 entry covering the ASN-domain
aliases (#712, #714), map-maintenance tooling fixes (#713), and the
ASN-fallback source attribution added in this branch.

Extends AGENTS.md to explain that ``base_reverse_dns_map.csv`` is now
a mixed-namespace map (rDNS bases alongside ASN domains) and adds a
short recipe for finding high-value ASN-domain misses against the
bundled MMDB, so future contributors know where the map's second
lookup path comes from.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Document project conventions previously held only in agent memory

Promotes four conventions out of per-agent memory and into AGENTS.md
so every contributor — human or agent — works from the same baseline:

- Run ruff check + format before committing (Code Style).
- Store natively numeric values as numbers, not pre-formatted strings
  (e.g. ASN as int 15169, not "AS15169"; ES/OS mappings as Integer)
  (Code Style).
- Before rewriting a tracked list/data file from freshly-generated
  content, verify the existing content via git — these files
  accumulate manually-curated entries across sessions (Editing tracked
  data files).
- A release isn't done until hatch-built sdist + wheel are attached to
  the GitHub release page; full 8-step sequence documented (Releases).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 02:13:30 -04:00
Sean Whalen 1fc9f638e2 9.0.0 (#629)
* Normalize report volumes when a report timespan exceeds 24 hours
2025-12-01 17:06:58 -05:00
Sean Whalen b8088505b1 Add support for SMTP TLS reports (#453) 2024-02-19 18:45:38 -05:00
Anael Mobilia bf69ea8ccc Fix typos (#413)
Co-authored-by: Anael Mobilia <anael.mobilia@mydsomanager.com>
2023-05-14 18:07:07 -04:00
Ben Companjen 2b35b785c6 Split and Organise documentation files (#404)
* Set global TOC collapse to false

* Split documentation

I tried to split the index.md file into logical parts, not changing the contents.
I did add a space and change one HTTP URL to HTTPS.

---------

Co-authored-by: Sean Whalen <44679+seanthegeek@users.noreply.github.com>
2023-05-03 16:11:58 -04:00