mirror of
https://github.com/domainaware/parsedmarc.git
synced 2026-05-05 11:35:25 +00:00
e8f15257576a9aee36201e4734f158c306d876e6
* Full-map redirect-target alias sweep: 146 new aliases Follow-up to PR #730 — runs the same redirect-target-alias analysis against the entire current map (5,509 rows) instead of only the rows added in PR #729. The map predates this session by several years, so acquisitions and rebrands accumulated without paired aliases. Method: re-ran collect_domain_info.py against every existing map entry (via --map /tmp/nonexistent.csv to bypass the skip-already-mapped filter). For each row whose homepage's final_url base differs from the domain, classified the redirect target as a same-operator alias or a sister/placeholder/etTLD that should be skipped. Three confidence tiers from 334 raw redirect-mismatch candidates: - Multi-source (>=2 mapped domains redirect to the same target): 20 aliases, all auto-included. Notable: hatena.blog (6 src — Hatena blog platform's brand consolidation), vercel.com (4 src — now.sh, vercel.app, vercel.dev), mailchimp.com (3 src — Mailchimp's tracking domains), liquid.tech (3 src — Liquid Intelligent Technologies after Neotel acquisition), supabase.com, streamlit.io (Snowflake), xfinity .com (Comcast). - Single-source with lexical-token overlap between source brand and target host: 128 aliases. These are TLD/subdomain variants (ais.co .th -> ais.th, neubox.net -> neubox.com, duck.com -> duckduckgo.com) and obvious near-rebrands (slic.com -> slicfiber.com, soverin.net -> soverin.com). - Single-source with no token overlap: 180 candidates. Held back from auto-promotion because token-mismatched single-source redirects are the bucket where false positives concentrate (small-operator pages redirecting to unrelated portals). Surfaced separately in a PR comment for hand review — many are real acquisitions (messagelabs .com -> broadcom.com, cincinnatibell.com -> altafiber.com, sparkpostmail.com -> bird.com, modis.com -> akkodis.com) that just need a maintainer's eye to confirm before mapping. Manual overrides for 5 multi-source cases where the heuristic picked the wrong source row's (name, type): - ziggo.nl: chello.sk's UPC redirect was the case-2 sister-brand pattern AGENTS.md step 6 already calls out; the legitimate source is ziggozakelijk.nl. Mapped to Ziggo, ISP. - zetaglobal.com: source rows pointed at Sailthru and Selligent (both acquired by Zeta Global). Canonical -> Zeta Global, Marketing. - crisis24.com: source rows pointed at One Call Now and Topo.ai (both acquired by Crisis24). Canonical -> Crisis24, SaaS. - directnic.com: heuristic picked "Directnic.com" from one source's name string; aligned to "Directnic" (matches the dnchosting.com source's convention). - fortinet.com: source rows pointed at Fortinet FortiMail product and Perception Point (Fortinet acquisition). Canonical -> Fortinet, Email Security (parent brand). Two false positives skipped from auto-promotion after sampling: - aichi-colony.jp -> aichi.jp: a healthcare operator's homepage redirected to the Aichi prefecture government portal — different operator (case-2 sister-host equivalent). - illinois.net -> illinois.gov: Illinois Century Network (academic) is not the State of Illinois government. Cumulative map size: 5,509 -> 5,655 rows. MMDB IPv4 coverage stays at ~90.47% (these aliases are mostly non-as_domain hosts, so they don't move the IPv4 metric — the win is PTR-side attribution coverage when DMARC reports cite the redirect target's domain). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Hand-review of held-back single-source aliases Adds 143 aliases from the held-back single-source-no-token-overlap list and updates 25 source rows to the post-rebrand brand name so both the source and alias rows resolve to the same canonical brand. Verification per case via public sources (acquisition press releases, rebrand announcements, official corporate documentation). Cases where the redirect target is a generic parent-company domain spanning many products were skipped — broadcom.com being the explicit exception where the alias uses the full product name "Broadcom Enterprise Messaging Security" so DMARC reports tagged with broadcom.com still land in the email-security bucket rather than overwriting other Broadcom product lines. Suspicious targets (parking pages, country-level TLDs, unrelated brands) were also skipped. Source-row name updates capture rebrands where the legacy brand no longer operates as such (Endurance International → Newfold Digital, Symantec Email Security → Broadcom Enterprise Messaging Security, Platform.sh → Upsun, Uninett → Sikt, SparkPost → Bird, etc.) and fix three typos uncovered during review (Goranicus → Granicus, Servastopol → Sevastopol, Wally-Wide → Valley-Wide). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Document parent-company-too-generic alias guidance; rename SendGrid to "Twilio SendGrid" Two related changes: 1. Rename the canonical name on `sendgrid.com` from `SendGrid (Twilio)` to `Twilio SendGrid` for consistency with the existing `sendgrid.net` and `dlivry.co` entries — the post-acquisition official product name. 2. Add `twilio.com,Twilio,SaaS` as the parent-domain alias (rather than re-using the product-specific `Twilio SendGrid, Marketing`), so DMARC reports from non-email Twilio services (Programmable SMS, Voice, Segment, Flex, etc.) don't get mis-attributed to the email product. The product-domain entries keep the product-specific `(name, type)`. 3. Document this approach in AGENTS.md under the existing redirect-target alias rules. Two acceptable patterns for multi-product parent redirect targets: - Bare parent name + broad type (Twilio, NICE) — the safer default for parents with many distinct product lines. - Full product name + specific type (Broadcom Enterprise Messaging Security) — appropriate when the parent's domain is overwhelmingly tied to one product line for DMARC purposes. In both cases, don't blindly inherit the source row's product-specific `(name, type)` for the parent-domain alias. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Document tiered verification approach for redirect-target alias review Captures the workflow that surfaced 143 confirmable aliases out of 180 held-back candidates with a small fraction of the search budget of "search every entry": - Tier 1: canonical name lexically corroborates the target — no search; source row is itself the second source. - Tier 2: canonical name explicitly contains "(Formerly X)" — no search; rebrand is self-documented. - Tier 3: no lexical overlap — search press releases / company newsroom / industry coverage; require two independent source categories; cite URLs in the PR. - Tier 4: target is a parking page / TLD-like base / unrelated brand — no search; reject and ship the list for heuristic tuning. Re-states the prompt-injection caveat in this verification context: press releases, homepages, news articles, WHOIS records, and search-result snippets are untrusted research data, never instructions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parsedmarc
parsedmarc is a Python module and CLI utility for parsing DMARC
reports. When used with Elasticsearch and Kibana (or Splunk), it works
as a self-hosted open-source alternative to commercial DMARC report
processing services such as Agari Brand Protection, Dmarcian, OnDMARC,
ProofPoint Email Fraud Defense, and Valimail.
Note
Domain-based Message Authentication, Reporting, and Conformance (DMARC) is an email authentication protocol.
Sponsors
This is a project is maintained by one developer. Please consider sponsoring my work if you or your organization benefit from it.
Features
- Parses draft and 1.0 standard aggregate/rua DMARC reports
- Parses forensic/failure/ruf DMARC reports
- Parses reports from SMTP TLS Reporting
- Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
- Transparently handles gzip or zip compressed reports
- Consistent data structures
- Simple JSON and/or CSV output
- Optionally email the results
- Optionally send the results to Elasticsearch, Opensearch, and/or Splunk, for use with premade dashboards
- Optionally send reports to Apache Kafka
Python Compatibility
This project supports the following Python versions, which are either actively maintained or are the default versions for RHEL or Debian.
| Version | Supported | Reason |
|---|---|---|
| < 3.6 | ❌ | End of Life (EOL) |
| 3.6 | ❌ | Used in RHEL 8, but not supported by project dependencies |
| 3.7 | ❌ | End of Life (EOL) |
| 3.8 | ❌ | End of Life (EOL) |
| 3.9 | ❌ | Used in Debian 11 and RHEL 9, but not supported by project dependencies |
| 3.10 | ✅ | Actively maintained |
| 3.11 | ✅ | Actively maintained; supported until June 2028 (Debian 12) |
| 3.12 | ✅ | Actively maintained; supported until May 2035 (RHEL 10) |
| 3.13 | ✅ | Actively maintained; supported until June 2030 (Debian 13) |
| 3.14 | ✅ | Supported (requires imapclient>=3.1.0) |
Description
Languages
Python
96.7%
Shell
3.2%
Dockerfile
0.1%
