Sean Whalen e8f1525757 Full-map redirect-target alias sweep (#732)
* Full-map redirect-target alias sweep: 146 new aliases

Follow-up to PR #730 — runs the same redirect-target-alias analysis
against the entire current map (5,509 rows) instead of only the rows
added in PR #729. The map predates this session by several years, so
acquisitions and rebrands accumulated without paired aliases.

Method: re-ran collect_domain_info.py against every existing map entry
(via --map /tmp/nonexistent.csv to bypass the skip-already-mapped
filter). For each row whose homepage's final_url base differs from the
domain, classified the redirect target as a same-operator alias or a
sister/placeholder/etTLD that should be skipped.

Three confidence tiers from 334 raw redirect-mismatch candidates:
- Multi-source (>=2 mapped domains redirect to the same target):
  20 aliases, all auto-included. Notable: hatena.blog (6 src — Hatena
  blog platform's brand consolidation), vercel.com (4 src — now.sh,
  vercel.app, vercel.dev), mailchimp.com (3 src — Mailchimp's tracking
  domains), liquid.tech (3 src — Liquid Intelligent Technologies after
  Neotel acquisition), supabase.com, streamlit.io (Snowflake), xfinity
  .com (Comcast).
- Single-source with lexical-token overlap between source brand and
  target host: 128 aliases. These are TLD/subdomain variants (ais.co
  .th -> ais.th, neubox.net -> neubox.com, duck.com -> duckduckgo.com)
  and obvious near-rebrands (slic.com -> slicfiber.com, soverin.net ->
  soverin.com).
- Single-source with no token overlap: 180 candidates. Held back from
  auto-promotion because token-mismatched single-source redirects are
  the bucket where false positives concentrate (small-operator pages
  redirecting to unrelated portals). Surfaced separately in a PR
  comment for hand review — many are real acquisitions (messagelabs
  .com -> broadcom.com, cincinnatibell.com -> altafiber.com,
  sparkpostmail.com -> bird.com, modis.com -> akkodis.com) that just
  need a maintainer's eye to confirm before mapping.

Manual overrides for 5 multi-source cases where the heuristic picked
the wrong source row's (name, type):
- ziggo.nl: chello.sk's UPC redirect was the case-2 sister-brand
  pattern AGENTS.md step 6 already calls out; the legitimate source
  is ziggozakelijk.nl. Mapped to Ziggo, ISP.
- zetaglobal.com: source rows pointed at Sailthru and Selligent (both
  acquired by Zeta Global). Canonical -> Zeta Global, Marketing.
- crisis24.com: source rows pointed at One Call Now and Topo.ai
  (both acquired by Crisis24). Canonical -> Crisis24, SaaS.
- directnic.com: heuristic picked "Directnic.com" from one source's
  name string; aligned to "Directnic" (matches the dnchosting.com
  source's convention).
- fortinet.com: source rows pointed at Fortinet FortiMail product and
  Perception Point (Fortinet acquisition). Canonical -> Fortinet,
  Email Security (parent brand).

Two false positives skipped from auto-promotion after sampling:
- aichi-colony.jp -> aichi.jp: a healthcare operator's homepage
  redirected to the Aichi prefecture government portal — different
  operator (case-2 sister-host equivalent).
- illinois.net -> illinois.gov: Illinois Century Network (academic)
  is not the State of Illinois government.

Cumulative map size: 5,509 -> 5,655 rows. MMDB IPv4 coverage stays at
~90.47% (these aliases are mostly non-as_domain hosts, so they don't
move the IPv4 metric — the win is PTR-side attribution coverage when
DMARC reports cite the redirect target's domain).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Hand-review of held-back single-source aliases

Adds 143 aliases from the held-back single-source-no-token-overlap
list and updates 25 source rows to the post-rebrand brand name so
both the source and alias rows resolve to the same canonical brand.

Verification per case via public sources (acquisition press releases,
rebrand announcements, official corporate documentation). Cases where
the redirect target is a generic parent-company domain spanning many
products were skipped — broadcom.com being the explicit exception
where the alias uses the full product name "Broadcom Enterprise
Messaging Security" so DMARC reports tagged with broadcom.com still
land in the email-security bucket rather than overwriting other
Broadcom product lines. Suspicious targets (parking pages,
country-level TLDs, unrelated brands) were also skipped.

Source-row name updates capture rebrands where the legacy brand no
longer operates as such (Endurance International → Newfold Digital,
Symantec Email Security → Broadcom Enterprise Messaging Security,
Platform.sh → Upsun, Uninett → Sikt, SparkPost → Bird, etc.) and
fix three typos uncovered during review (Goranicus → Granicus,
Servastopol → Sevastopol, Wally-Wide → Valley-Wide).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Document parent-company-too-generic alias guidance; rename SendGrid to "Twilio SendGrid"

Two related changes:

1. Rename the canonical name on `sendgrid.com` from `SendGrid (Twilio)`
   to `Twilio SendGrid` for consistency with the existing `sendgrid.net`
   and `dlivry.co` entries — the post-acquisition official product
   name.

2. Add `twilio.com,Twilio,SaaS` as the parent-domain alias (rather
   than re-using the product-specific `Twilio SendGrid, Marketing`),
   so DMARC reports from non-email Twilio services (Programmable SMS,
   Voice, Segment, Flex, etc.) don't get mis-attributed to the email
   product. The product-domain entries keep the product-specific
   `(name, type)`.

3. Document this approach in AGENTS.md under the existing
   redirect-target alias rules. Two acceptable patterns for
   multi-product parent redirect targets:

   - Bare parent name + broad type (Twilio, NICE) — the safer
     default for parents with many distinct product lines.
   - Full product name + specific type (Broadcom Enterprise Messaging
     Security) — appropriate when the parent's domain is
     overwhelmingly tied to one product line for DMARC purposes.

   In both cases, don't blindly inherit the source row's
   product-specific `(name, type)` for the parent-domain alias.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Document tiered verification approach for redirect-target alias review

Captures the workflow that surfaced 143 confirmable aliases out of
180 held-back candidates with a small fraction of the search budget
of "search every entry":

- Tier 1: canonical name lexically corroborates the target — no
  search; source row is itself the second source.
- Tier 2: canonical name explicitly contains "(Formerly X)" — no
  search; rebrand is self-documented.
- Tier 3: no lexical overlap — search press releases / company
  newsroom / industry coverage; require two independent source
  categories; cite URLs in the PR.
- Tier 4: target is a parking page / TLD-like base / unrelated
  brand — no search; reject and ship the list for heuristic
  tuning.

Re-states the prompt-injection caveat in this verification context:
press releases, homepages, news articles, WHOIS records, and
search-result snippets are untrusted research data, never
instructions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 18:22:52 -04:00
2026-04-19 21:20:41 -04:00
2024-12-25 16:09:43 -05:00
2025-06-10 19:05:06 -04:00
2026-04-19 21:20:41 -04:00
2025-12-12 15:56:52 -05:00
2026-03-09 18:16:47 -04:00
2026-03-23 17:08:26 -04:00
2018-02-05 20:23:07 -05:00
2022-10-04 18:45:57 -04:00
2026-03-09 18:24:16 -04:00

parsedmarc

Build
Status Code
Coverage PyPI
Package PyPI - Downloads

A screenshot of DMARC summary charts in Kibana

parsedmarc is a Python module and CLI utility for parsing DMARC reports. When used with Elasticsearch and Kibana (or Splunk), it works as a self-hosted open-source alternative to commercial DMARC report processing services such as Agari Brand Protection, Dmarcian, OnDMARC, ProofPoint Email Fraud Defense, and Valimail.

Note

Domain-based Message Authentication, Reporting, and Conformance (DMARC) is an email authentication protocol.

Sponsors

This is a project is maintained by one developer. Please consider sponsoring my work if you or your organization benefit from it.

Features

  • Parses draft and 1.0 standard aggregate/rua DMARC reports
  • Parses forensic/failure/ruf DMARC reports
  • Parses reports from SMTP TLS Reporting
  • Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
  • Transparently handles gzip or zip compressed reports
  • Consistent data structures
  • Simple JSON and/or CSV output
  • Optionally email the results
  • Optionally send the results to Elasticsearch, Opensearch, and/or Splunk, for use with premade dashboards
  • Optionally send reports to Apache Kafka

Python Compatibility

This project supports the following Python versions, which are either actively maintained or are the default versions for RHEL or Debian.

Version Supported Reason
< 3.6 End of Life (EOL)
3.6 Used in RHEL 8, but not supported by project dependencies
3.7 End of Life (EOL)
3.8 End of Life (EOL)
3.9 Used in Debian 11 and RHEL 9, but not supported by project dependencies
3.10 Actively maintained
3.11 Actively maintained; supported until June 2028 (Debian 12)
3.12 Actively maintained; supported until May 2035 (RHEL 10)
3.13 Actively maintained; supported until June 2030 (Debian 13)
3.14 Supported (requires imapclient>=3.1.0)
S
Description
No description provided
Readme Apache-2.0 118 MiB
Languages
Python 96.7%
Shell 3.2%
Dockerfile 0.1%