Sean Whalen c752e776de Detect map-key rebrands via homepage drift sweep (#752)
Adds two complementary pieces of M&A drift detection over base_reverse_dns_map.csv:

- `collect_domain_info.py` gains two derived columns. `rebrand_signal` combines
  a body-text regex ("now X" / "formerly known as X" / "we became X" / ...)
  with a narrow path-and-alt-text regex ("rebrand", "brand-launch",
  "brand-announcement", "name-change", "our-new-name", ...) that runs against
  the JSON-unescaped page bytes, so URL slugs and image alt attributes inside
  Elementor / hydration script blobs are reachable. The two-regex split is
  what catches image-only acquisition banners that pure body-text scanning
  misses, such as bankonitusa.com's "now Navanta": an
  `<a href="https://navanta.com/brand-launch-..."><img alt="Brand announcement"></a>`
  with no visible text. `external_links` collects the homepage's non-self,
  non-social outbound link hosts as review context only.

- `detect_rebrands.py` is a new sibling drift sweep. It re-fetches every key
  in base_reverse_dns_map.csv with the same fetch machinery, evaluates two
  default flag triggers (`rebrand_signal` matched, or final URL host doesn't
  sit under the input domain), and writes a compact TSV of just the flagged
  rows. `external_links` is captured into the row as context but is not a
  default trigger — most outbound links are to partners / customers / vendors,
  and flagging them would flood review with noise. `--flag-external-links`
  opts into that signal for thorough sweeps. Resume-safe via `-o`.
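
The two-regex split described for `rebrand_signal` can be sketched roughly as follows. This is a minimal illustration, not the code in `collect_domain_info.py`: the pattern contents, the `json_unescape` helper, the `body:` / `markup:` result prefixes, and the `newbrand.example` URL are all assumptions made for the example.

```python
import html
import re

# Hypothetical patterns modelled on the phrases quoted above; the real
# expressions in collect_domain_info.py are not shown here and may differ.
BODY_TEXT_RE = re.compile(
    r"\b(?:now|formerly known as|we became)\s+([A-Z][\w&.-]*)"
)
PATH_ALT_RE = re.compile(
    r"rebrand|brand-launch|brand[- ]announcement|name-change|our-new-name",
    re.IGNORECASE,
)
TAG_RE = re.compile(r"<[^>]+>")


def json_unescape(text: str) -> str:
    """Undo the JSON escapes most common in script blobs (\\/ and \\uXXXX)."""
    text = text.replace("\\/", "/")
    return re.sub(
        r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text
    )


def rebrand_signal(page: str) -> str:
    """Return a short description of the first match, or '' if none."""
    # Pass 1: visible body text with the markup stripped.
    visible = html.unescape(TAG_RE.sub(" ", page))
    match = BODY_TEXT_RE.search(visible)
    if match:
        return f"body:{match.group(0)}"
    # Pass 2: raw page with JSON escapes undone, so URL slugs and alt
    # attributes buried in markup or script blobs are reachable.
    match = PATH_ALT_RE.search(json_unescape(page))
    if match:
        return f"markup:{match.group(0)}"
    return ""
```

An image-only banner has no visible text for the first pass to match, so only the second pass over the raw markup reports it.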

Output is review fodder, not automated map mutation: a single signal is one
corroborating source, and promoting a flagged row into the map still requires
a second source per the two-corroborating-sources rule.
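
The flag triggers described for `detect_rebrands.py` might look something like the sketch below. The row keys `domain` and `final_url`, the reason labels, and both function names are hypothetical; only `rebrand_signal`, `external_links`, and the opt-in behaviour of `--flag-external-links` come from the description above.

```python
from urllib.parse import urlsplit


def host_under_domain(final_url: str, domain: str) -> bool:
    """True when the final URL's host is the domain or one of its subdomains."""
    host = (urlsplit(final_url).hostname or "").lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def flag_reasons(row: dict, flag_external_links: bool = False) -> list:
    """Collect the reasons a fetched row should be written to the review TSV."""
    reasons = []
    # Default trigger 1: the rebrand regexes matched something.
    if row.get("rebrand_signal"):
        reasons.append("rebrand_signal")
    # Default trigger 2: the fetch ended on a host outside the input domain.
    final_url = row.get("final_url")
    if final_url and not host_under_domain(final_url, row["domain"]):
        reasons.append("off_domain_redirect")
    # Opt-in trigger: outbound links are noisy, so they only count on request.
    if flag_external_links and row.get("external_links"):
        reasons.append("external_links")
    return reasons
```

A row with an empty reason list would simply be left out of the output, which keeps the TSV limited to rows worth a second look.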

README and AGENTS.md updated to document the new columns and script.

Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 21:22:30 -04:00

parsedmarc


A screenshot of DMARC summary charts in Kibana

parsedmarc is a Python module and CLI utility for parsing DMARC reports. When used with Elasticsearch and Kibana (or Splunk), it works as a self-hosted open-source alternative to commercial DMARC report processing services such as Agari Brand Protection, Dmarcian, OnDMARC, Proofpoint Email Fraud Defense, and Valimail.

Note

Domain-based Message Authentication, Reporting, and Conformance (DMARC) is an email authentication protocol.

Sponsors

This project is maintained by one developer. Please consider sponsoring my work if you or your organization benefit from it.

Features

  • Parses draft and 1.0 standard aggregate/rua DMARC reports
  • Parses forensic/failure/ruf DMARC reports
  • Parses reports from SMTP TLS Reporting
  • Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
  • Transparently handles gzip or zip compressed reports
  • Consistent data structures
  • Simple JSON and/or CSV output
  • Optionally email the results
  • Optionally send the results to Elasticsearch, OpenSearch, and/or Splunk, for use with premade dashboards
  • Optionally send reports to Apache Kafka
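
As an illustration of what transparent handling of gzip or zip compressed reports can involve, here is a minimal standard-library sketch. It is not parsedmarc's own implementation; the function name `extract_report_bytes` is hypothetical, and the choice to read only the first archive member is an assumption made to keep the example short.

```python
import gzip
import io
import zipfile


def extract_report_bytes(data: bytes) -> bytes:
    """Return the report payload, decompressing gzip or zip input if needed."""
    # gzip streams start with the magic bytes 1f 8b.
    if data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    # zip archives start with the local file header signature "PK\x03\x04".
    if data[:4] == b"PK\x03\x04":
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            first_member = archive.namelist()[0]
            return archive.read(first_member)
    # Anything else is assumed to be an uncompressed report already.
    return data
```

Sniffing the leading bytes rather than trusting a file extension means the same call works for attachments whose names are missing or misleading.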

Python Compatibility

This project supports the following Python versions, which are either actively maintained or are the default versions for RHEL or Debian.

| Version | Supported | Reason |
|---------|-----------|--------|
| < 3.6 | No | End of Life (EOL) |
| 3.6 | No | Used in RHEL 8, but not supported by project dependencies |
| 3.7 | No | End of Life (EOL) |
| 3.8 | No | End of Life (EOL) |
| 3.9 | No | Used in Debian 11 and RHEL 9, but not supported by project dependencies |
| 3.10 | Yes | Actively maintained |
| 3.11 | Yes | Actively maintained; supported until June 2028 (Debian 12) |
| 3.12 | Yes | Actively maintained; supported until May 2035 (RHEL 10) |
| 3.13 | Yes | Actively maintained; supported until June 2030 (Debian 13) |
| 3.14 | Yes | Supported (requires imapclient>=3.1.0) |
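
A script that depends on parsedmarc could check the running interpreter against the versions listed above before doing any work. The sketch below is only an illustration: `SUPPORTED_MINORS` and `is_supported` are hypothetical names, and the set simply mirrors the supported rows (3.10 through 3.14) without making any claim about later releases.

```python
import sys

# Minor versions of Python 3 listed as supported above (an assumption that
# this list is current; it says nothing about versions after 3.14).
SUPPORTED_MINORS = {10, 11, 12, 13, 14}


def is_supported(version=None) -> bool:
    """Check a (major, minor, ...) tuple, defaulting to the running interpreter."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and minor in SUPPORTED_MINORS
```

Calling `is_supported()` with no argument checks the current interpreter, while passing a tuple such as `(3, 9)` checks a specific version.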