mirror of
https://github.com/domainaware/parsedmarc.git
synced 2026-05-21 11:25:23 +00:00
c752e776de12be0d435a15d3a97fee9179c8ab3c
Adds two complementary pieces of M&A drift detection over base_reverse_dns_map.csv:
- `collect_domain_info.py` gains two derived columns. `rebrand_signal` combines
a body-text regex ("now X" / "formerly known as X" / "we became X" / ...)
with a narrow path-and-alt-text regex ("rebrand", "brand-launch",
"brand-announcement", "name-change", "our-new-name", ...) that runs against
the JSON-unescaped page bytes, so URL slugs and image alt attributes inside
Elementor / hydration script blobs are reachable. The two-regex split is
what catches image-only acquisition banners like bankonitusa.com's "now
Navanta" — a `<a href="https://navanta.com/brand-launch-..."><img
alt="Brand announcement"></a>` with no visible text — that pure body-text
scanning misses. `external_links` collects the homepage's non-self,
non-social outbound link hosts as review context only.
- `detect_rebrands.py` is a new sibling drift sweep. It re-fetches every key
in base_reverse_dns_map.csv with the same fetch machinery, evaluates two
default flag triggers (`rebrand_signal` matched, or final URL host doesn't
sit under the input domain), and writes a compact TSV of just the flagged
rows. `external_links` is captured into the row as context but is not a
default trigger — most outbound links are to partners / customers / vendors,
and flagging them would flood review with noise. `--flag-external-links`
opts into that signal for thorough sweeps. Resume-safe via `-o`.
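A minimal sketch of the two default triggers described above, for orientation only. The regex patterns, the function names (`json_unescape`, `rebrand_signal`, `host_left_domain`, `should_flag`), and the tag-stripping shortcut are all assumptions made for illustration; the actual code in `collect_domain_info.py` and `detect_rebrands.py` may be structured differently.

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns; the real regexes in collect_domain_info.py may differ.
BODY_RE = re.compile(r"\b(?:now|formerly known as|we became)\s+[A-Z]\w+")
PATH_ALT_RE = re.compile(
    r"rebrand|brand[-_ ]launch|brand[-_ ]announcement|name[-_ ]change|our[-_ ]new[-_ ]name",
    re.IGNORECASE,
)


def json_unescape(text: str) -> str:
    """Undo common JSON string escapes so URLs inside script blobs are matchable."""
    text = text.replace("\\/", "/")
    return re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)


def rebrand_signal(page: str) -> str:
    """Return the matched phrase, or an empty string when nothing matches."""
    # Crude tag stripping stands in for real visible-text extraction.
    visible_text = re.sub(r"<[^>]+>", " ", page)
    body_hit = BODY_RE.search(visible_text)
    if body_hit:
        return body_hit.group(0)
    # Second pass: URL slugs and alt attributes, searched in the unescaped raw page.
    path_hit = PATH_ALT_RE.search(json_unescape(page))
    return path_hit.group(0) if path_hit else ""


def host_left_domain(input_domain: str, final_url: str) -> bool:
    """True when the final URL host is neither the input domain nor a subdomain of it."""
    host = (urlsplit(final_url).hostname or "").lower().rstrip(".")
    domain = input_domain.lower().rstrip(".")
    return not (host == domain or host.endswith("." + domain))


def should_flag(input_domain, final_url, page, external_links=(), flag_external_links=False):
    """Apply the two default triggers, plus the opt-in external-links trigger."""
    if rebrand_signal(page):
        return True
    if host_left_domain(input_domain, final_url):
        return True
    return flag_external_links and bool(external_links)
```

The suffix check prepends a dot to the domain so that a lookalike host such as `notexample.com` is not treated as sitting under `example.com`.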
Output is review fodder, not automated map mutation: a single signal is one
corroborating source, and promoting a flagged row into the map still requires
a second source per the two-corroborating-sources rule.
README and AGENTS.md updated to document the new columns and script.
Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parsedmarc
parsedmarc is a Python module and CLI utility for parsing DMARC
reports. When used with Elasticsearch and Kibana (or Splunk), it works
as a self-hosted open-source alternative to commercial DMARC report
processing services such as Agari Brand Protection, Dmarcian, OnDMARC,
Proofpoint Email Fraud Defense, and Valimail.
Note
Domain-based Message Authentication, Reporting, and Conformance (DMARC) is an email authentication protocol.
Sponsors
This project is maintained by one developer. Please consider sponsoring my work if you or your organization benefit from it.
Features
- Parses draft and 1.0 standard aggregate/rua DMARC reports
- Parses forensic/failure/ruf DMARC reports
- Parses reports from SMTP TLS Reporting
- Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
- Transparently handles gzip or zip compressed reports
- Consistent data structures
- Simple JSON and/or CSV output
- Optionally email the results
- Optionally send the results to Elasticsearch, OpenSearch, and/or Splunk, for use with premade dashboards
- Optionally send reports to Apache Kafka
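As an illustration of what transparent handling of gzip or zip compressed reports can look like, the sketch below sniffs the leading magic bytes and decompresses accordingly. It uses only the standard library; `extract_report_bytes` is a hypothetical helper name, not part of parsedmarc's API, and the project's own implementation may differ.

```python
import gzip
import io
import zipfile

GZIP_MAGIC = b"\x1f\x8b"
ZIP_MAGIC = b"PK\x03\x04"


def extract_report_bytes(data: bytes) -> bytes:
    """Return the report payload, decompressing gzip or zip input when detected."""
    if data.startswith(GZIP_MAGIC):
        return gzip.decompress(data)
    if data.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Assumption: the report is the first member of the archive.
            return archive.read(archive.namelist()[0])
    # Anything else is passed through as if it were already plain XML or JSON.
    return data
```

Detecting the format from content rather than from the file extension matters for email attachments, whose names and MIME types are not always reliable.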
Python Compatibility
This project supports the following Python versions, which are either actively maintained or are the default versions for RHEL or Debian.
| Version | Supported | Reason |
|---|---|---|
| < 3.6 | ❌ | End of Life (EOL) |
| 3.6 | ❌ | Used in RHEL 8, but not supported by project dependencies |
| 3.7 | ❌ | End of Life (EOL) |
| 3.8 | ❌ | End of Life (EOL) |
| 3.9 | ❌ | Used in Debian 11 and RHEL 9, but not supported by project dependencies |
| 3.10 | ✅ | Actively maintained |
| 3.11 | ✅ | Actively maintained; supported until June 2028 (Debian 12) |
| 3.12 | ✅ | Actively maintained; supported until May 2035 (RHEL 10) |
| 3.13 | ✅ | Actively maintained; supported until June 2030 (Debian 13) |
| 3.14 | ✅ | Supported (requires imapclient>=3.1.0) |
