Rebased on top of master @ 2cda5bf (9.9.0), which added the ASN
source attribution work (#712, #713, #714, #715). Individual Copilot
iteration commits squashed into this single commit — the per-commit
history on the feature branch was iterative (add tests, fix lint,
move field, revert, etc.) and not worth preserving; GitHub squash-
merges PRs anyway.
New fields from the DMARCbis XSD, plumbed through types, parsing, CSV
output, and the Elasticsearch / OpenSearch mappings:
- ``np`` — non-existent subdomain policy (``none`` / ``quarantine`` /
``reject``)
- ``testing`` — testing mode flag (``n`` / ``y``), replaces RFC 7489
``pct``
- ``discovery_method`` — policy discovery method (``psl`` /
``treewalk``)
- ``generator`` — report generator software identifier (metadata)
- ``human_result`` — optional descriptive text on DKIM / SPF results
RFC 7489 reports parse with ``None`` for DMARCbis-only fields.
Forensic reports have been renamed to failure reports throughout the
project to reflect the proper naming since RFC 7489.
- Core: ``types.py``, ``__init__.py`` — ``ForensicReport`` →
``FailureReport``, ``parse_forensic_report`` →
``parse_failure_report``, report type ``"failure"``.
- Output modules: ``elastic.py``, ``opensearch.py``, ``splunk.py``,
``kafkaclient.py``, ``syslog.py``, ``gelf.py``, ``webhook.py``,
``loganalytics.py``, ``s3.py``.
- CLI: ``cli.py`` — args, config keys, index names
(``dmarc_failure``).
- Docs + dashboards: all markdown, Grafana JSON, Kibana NDJSON,
Splunk XML.
Backward compatibility preserved: old function / type names remain as
aliases (``parse_forensic_report = parse_failure_report``,
``ForensicReport = FailureReport``, etc.), CLI accepts both the old
(``save_forensic``, ``forensic_topic``) and new (``save_failure``,
``failure_topic``) config keys, and updated dashboards query both
old and new index / sourcetype names so data from before and after
the rename appears together.
Merge conflicts resolved in ``parsedmarc/constants.py`` (took bis's
10.0.0 bump), ``parsedmarc/__init__.py`` (combined bis's "failure"
wording with master's IPinfo MMDB mention), ``parsedmarc/elastic.py``
and ``parsedmarc/opensearch.py`` (kept master's ``source_asn`` /
``source_asn_name`` / ``source_asn_domain`` on the failure doc path
while renaming ``forensic_report`` → ``failure_report``), and
``CHANGELOG.md`` (10.0.0 entry now sits above the 9.9.0 entry).
All 324 tests pass; ``ruff check`` / ``ruff format --check`` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Match the IPinfo Lite MMDB's native field names across the output
schemas — JSON source records now emit asn, as_name, as_domain, and
CSV / Elasticsearch / OpenSearch / Splunk integrations now emit
source_asn, source_as_name, source_as_domain. The integer asn / source_asn
field is unchanged.
Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add optional IPinfo Lite REST API with MMDB fallback
Configure [general] ipinfo_api_token (or PARSEDMARC_GENERAL_IPINFO_API_TOKEN)
and every IP lookup hits https://api.ipinfo.io/lite/<ip> first for fresh
country + ASN data. On HTTP 429 (rate-limit) or 402 (quota), the API is
disabled for the rest of the run and lookups fall through to the bundled /
cached MMDB; transient network errors fall through per-request without
disabling the API. An invalid token (401/403) raises InvalidIPinfoAPIKey,
which the CLI catches and exits fatally — including at startup via a probe
lookup so operators notice misconfiguration immediately. Added
ipinfo_api_url as a base-URL override for mirrors or proxies.
The API token is never logged. A new _normalize_ip_record() helper is
shared between the API path and the MMDB path so both paths produce the
same normalized shape (country code, asn int, asn_name, asn_domain).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* IPinfo API: cool down and retry instead of permanent disable
Previously a single 429 or 402 disabled the API for the whole run. Now
each event sets a cooldown (using Retry-After when present, defaulting to
5 minutes for rate limits and 1 hour for quota exhaustion). Once the
cooldown expires the next lookup retries; a successful retry logs
"IPinfo API recovered" once at info level so operators can see service
came back. Repeat rate-limit responses after the first event stay at
debug to avoid log spam.
Test now targets parsedmarc.log (the actual emitting logger) instead of
the parsedmarc parent — cli._main() sets the child's level to ERROR,
and assertLogs on the parent can't see warnings filtered before
propagation. Test also exercises the cooldown-then-recovery path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* IPinfo API: log plan and quota from /me at startup
Configure-time probe now hits https://ipinfo.io/me first. That endpoint
is documented as quota-free and doubles as a free-of-quota token check,
so we use it to both validate the token and surface plan / month-to-date
usage / remaining-quota numbers at info level:
IPinfo API configured — plan: Lite, usage: 12345/50000 this month, 37655 remaining
Field names in /me have drifted across IPinfo plan generations, so the
summary formatter probes a few aliases before giving up. If /me is
unreachable (custom mirror behind ipinfo_api_url, network error) we
fall back to the original 1.1.1.1 lookup probe, which still validates
the token and logs a generic "configured" message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Drop speculative ipinfo_api_url override
It was added mirroring ip_db_url, but the two serve different needs.
ip_db_url has a real use (internal hosting of the MMDB); an
authenticated IPinfo API isn't something anyone mirrors, and /me was
always hardcoded anyway, making the override half-baked. YAGNI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* AGENTS.md: warn against speculative config options
New section under Configuration spelling out that every option is
permanent surface area and must come from a real user need rather than
pattern-matching a nearby option. Cites the removed ipinfo_api_url as
the canonical cautionary tale so the next session doesn't reintroduce
it, and calls out "override the base URL" / "configurable retries" as
common YAGNI traps.
Also requires that new options land fully wired in one PR (INI schema,
_parse_config, Namespace defaults, docs, SIGHUP-reload path) rather
than half-implemented.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Rename [general] ip_db_url to ipinfo_url
The bundled MMDB is specifically IPinfo Lite, so the option name
should say so. ip_db_url stays accepted as a deprecated alias and
logs a warning when used; env-var equivalents accept either spelling
via the existing PARSEDMARC_{SECTION}_{KEY} machinery.
Updated the AGENTS.md cautionary tale to refer to ipinfo_url (with
the note about the alias) so the anti-pattern example still reads
correctly post-rename.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix testPSLDownload to reflect .akamaiedge.net override
PSL carries c.akamaiedge.net as a public suffix, but
psl_overrides.txt intentionally folds .akamaiedge.net so every
Akamai CDN-customer PTR (the aXXXX-XX.cXXXXX.akamaiedge.net pattern)
clusters under one akamaiedge.net display key. The override was added
in 2978436 as a design decision for source attribution; the test
assertion just predates it.
Updated the comment to explain why override wins over the live PSL
here so the next reader doesn't reach for the PSL answer again.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Surface ASN info and fall back to it when a PTR is absent
Adds three new fields to every IP source record — ``asn`` (integer,
e.g. 15169), ``asn_name`` (``"Google LLC"``), ``asn_domain``
(``"google.com"``) — sourced from the bundled IPinfo Lite MMDB. These
flow through to CSV, JSON, Elasticsearch, OpenSearch, and Splunk
outputs as ``source_asn``, ``source_asn_name``, ``source_asn_domain``.
More importantly: when an IP has no reverse DNS (common for many
large senders), source attribution now falls back to the ASN domain
as a lookup key into the same ``reverse_dns_map``. Thanks to #712
and #714, ~85% of routed IPv4 space now has an ``as_domain`` that
hits the map, so rows that were previously unattributable now get a
``source_name``/``source_type`` derived from the ASN. When the ASN
domain misses the map, the raw AS name is used as ``source_name``
with ``source_type`` left null — still better than nothing.
Crucially, ``source_reverse_dns`` and ``source_base_domain`` remain
null on ASN-derived rows, so downstream consumers can still tell a
PTR-resolved attribution apart from an ASN-derived one.
ASN is stored as an integer at the schema level (Elasticsearch /
OpenSearch mappings use ``Integer``) so consumers can do range
queries and numeric sorts; dashboards can prepend ``AS`` at display
time. The MMDB reader normalizes both IPinfo's ``"AS15169"`` string
and MaxMind's ``autonomous_system_number`` int to the same int form.
Also fixes a pre-existing caching bug in ``get_ip_address_info``:
entries without reverse DNS were never written to the IP-info cache,
so every no-PTR IP re-did the MMDB read and DNS attempt on every
call. The cache write is now unconditional.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Bump to 9.9.0 and document the ASN fallback work
Updates the changelog with a 9.9.0 entry covering the ASN-domain
aliases (#712, #714), map-maintenance tooling fixes (#713), and the
ASN-fallback source attribution added in this branch.
Extends AGENTS.md to explain that ``base_reverse_dns_map.csv`` is now
a mixed-namespace map (rDNS bases alongside ASN domains) and adds a
short recipe for finding high-value ASN-domain misses against the
bundled MMDB, so future contributors know where the map's second
lookup path comes from.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Document project conventions previously held only in agent memory
Promotes four conventions out of per-agent memory and into AGENTS.md
so every contributor — human or agent — works from the same baseline:
- Run ruff check + format before committing (Code Style).
- Store natively numeric values as numbers, not pre-formatted strings
(e.g. ASN as int 15169, not "AS15169"; ES/OS mappings as Integer)
(Code Style).
- Before rewriting a tracked list/data file from freshly-generated
content, verify the existing content via git — these files
accumulate manually-curated entries across sessions (Editing tracked
data files).
- A release isn't done until hatch-built sdist + wheel are attached to
the GitHub release page; full 8-step sequence documented (Releases).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch the bundled IP-to-country database from DB-IP Country Lite to
IPinfo Lite for greater lookup accuracy. The download URL, cached
filename, and packaged module path all move from
dbip/dbip-country-lite.mmdb to ipinfo/ipinfo_lite.mmdb.
IPinfo Lite uses a different MMDB schema (flat country_code) that is
incompatible with geoip2's Reader.country() helper, so get_ip_address_country()
now uses maxminddb directly and handles both the IPinfo schema and
the MaxMind/DBIP nested country.iso_code schema so users who drop in
their own MMDB from any of these providers continue to work.
Drop the geoip2 dependency (it was only used for the incompatible
helper) and add maxminddb as a direct dependency — it was already
installed transitively through geoip2.
Callers that imported parsedmarc.resources.dbip directly need to switch
to parsedmarc.resources.ipinfo. Old parsedmarc versions downloading
from the dbip/ GitHub raw URL will 404 and fall back to their bundled
copy — this is the documented behavior of load_ip_db().
Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Auto-download psl_overrides.txt at startup (and whenever the reverse DNS
map is reloaded) via load_psl_overrides(); add local_psl_overrides_path
and psl_overrides_url config options
- Add collect_domain_info.py and detect_psl_overrides.py for bulk WHOIS/HTTP
enrichment and automatic cluster-based PSL override detection
- Block full-IPv4 reverse-DNS entries from ever entering
base_reverse_dns_map.csv, known_unknown_base_reverse_dns.txt, or
unknown_base_reverse_dns.csv, and sweep pre-existing IP entries
- Add Religion and Utilities to the allowed service_type values
- Document the full map-maintenance workflow in AGENTS.md
- Substantial expansion of base_reverse_dns_map.csv (net ~+1,000 entries)
- Add 26 tests covering the new loader, IP filter, PSL fold logic, and
cluster detection
Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Download the latest DB-IP Country Lite mmdb from GitHub on startup and
SIGHUP, caching it locally, with fallback to a previously cached or
bundled copy. Skipped when the offline flag is set. Adds ip_db_url
config option (PARSEDMARC_GENERAL_IP_DB_URL) to override the download
URL. Bumps version to 9.6.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add environment variable configuration support and update documentation
- Introduced support for configuration via environment variables using the `PARSEDMARC_{SECTION}_{KEY}` format.
- Added `PARSEDMARC_CONFIG_FILE` variable to specify the config file path.
- Enabled env-only mode for file-less Docker deployments.
- Implemented explicit read permission checks on config files.
- Updated changelog and usage documentation to reflect these changes.
Elasticsearch and OpenSearch now verify SSL certificates by default when `ssl = True`, even without a `cert_path`
- Added `skip_certificate_verification` option to the `elasticsearch` and `opensearch` configuration sections for consistency with `splunk_hec`
- Splunk HEC `skip_certificate_verification` now works correctly with self-signed certificates
- SMTP TLS reports no longer fail when saving to multiple output targets (e.g. Elasticsearch and OpenSearch) due to in-place mutation of the report dict
- Output client initialization errors now identify which module failed (e.g. "OpenSearch: ConnectionError..." instead of generic "Output client error")
- Enhanced error handling for output client initialization
* Add MS Graph certificate authentication support
* Preserve MS Graph constructor compatibility
---------
Co-authored-by: Sean Whalen <44679+seanthegeek@users.noreply.github.com>
- Updated parsedmarc/syslog.py to support UDP, TCP, and TLS protocols
- Added protocol parameter with UDP as default for backward compatibility
- Implemented TLS support with CA verification and client certificate auth
- Added retry logic for TCP/TLS connections with configurable attempts and delays
- Updated parsedmarc/cli.py with new config file parsing
- Updated documentation with examples for TCP and TLS configurations
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
* Remove CLI arguments for syslog options, keep config-file only
Per user request, removed command-line argument options for syslog parameters.
All new syslog options (protocol, TLS settings, timeout, retry) are now only
available via configuration file, consistent with other similar options.
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
* Fix code review issues: remove trailing whitespace and add cert validation
- Removed trailing whitespace from syslog.py and usage.md
- Added warning when only one of certfile_path/keyfile_path is provided
- Improved error handling for incomplete TLS client certificate configuration
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
* Set minimum TLS version to 1.2 for enhanced security
Explicitly configured ssl_context.minimum_version = TLSVersion.TLSv1_2
to ensure only secure TLS versions are used for syslog connections.
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
8.19.0
- Add multi-tenant support via an index-prefix domain mapping file
- PSL overrides so that services like AWS are correctly identified
- Additional improvements to report type detection
- Fix webhook timeout parsing (PR #623)
- Output to STDOUT when the new general config boolean `silent` is set to `False` (Close#614)
- Additional services added to `base_reverse_dns_map.csv`
---------
Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Félix <felix.debloisbeaucage@gmail.com>
* Added output to webhook as an option
* added documentation for new webhook configuration
---------
Co-authored-by: Sean Whalen <44679+seanthegeek@users.noreply.github.com>
- Added `since` option under `mailbox` section to search for messages since a certain time instead of going through the complete mailbox during testing scenarios. Acceptable values -`5m|3h|2d|1w`, units - {"m":"minutes", "h":"hours", "d":"days", "w":"weeks"}). Defaults to `1d` if an incorrect value is provided.
- Not to mark messages as read if test option is selected (works only for MSGraphConnection)
- Actually save `source_type` and `source_name` to Elasticsearch and OpenSearch
- Reverse-lookup cache improvements (PR #501 closes issue #498)
- Update the included `dbip-country-lite.mmdb` to the 2024-03 version
- Update `base_reverse_dns_map.csv`
- Add new general config options (closes issue #500)
- `always_use_local_files` - Disables the download of the reverse DNS map
- `local_reverse_dns_map_path` - Overrides the default local file path to use for the reverse DNS map
- `reverse_dns_map_url` - Overrides the default download URL for the reverse DNS map