mirror of https://github.com/domainaware/parsedmarc.git synced 2026-07-06 00:35:09 +00:00

Sean Whalen 769b16bb03 Drift-detect rebrands: tighten regex; promote 11 verified rebrand-aliased map keys (#753)

* Tighten rebrand regex to drop CTA, third-party-mention, and CSS-asset FPs

The first run of detect_rebrands.py against the live map surfaced systemic
false-positive categories that drowned the real signals. Tightening over two
rounds of FP triage:

REBRAND_RE — drop bare "now <Cap>" and "joined the X" branches:

- "Buy Now PROMO", "Apply Now Who", "Order Now Free Shipping" — modern
  marketing pages saturate body text with CTA fragments and ~95% of bare
  "now <Capital>" matches were these. Replaced with the linguistically
  meaningful pattern "(is|are|was|were|am) now (?:(?:a )?part of)?" which
  still catches "BankOnIT is now Navanta", "We are now Cencora",
  "is now part of Lumen", etc.
- "joined the Festo Certified System Integrator Program", "joined the
  ClimateCAP Initiative", "joined the Fredonia Women's Rugby team" — the
  "joined the X" pattern was too generic; real "joined the X family"
  rebrand banners are rare enough that dropping the branch is the right
  trade.

REBRAND_RE — add `\b` word boundary at the start so triggers don't match
mid-word: "Stre*am* now Mystery" was matching `am now <Cap>` because the
last two letters of "Stream" satisfied the verb alternation.
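
A minimal sketch of the two REBRAND_RE changes described above, assuming simplified pattern shapes. `OLD_SKETCH_RE`, `NEW_SKETCH_RE`, and `rebrand_target` are hypothetical names for illustration; the actual expression in `detect_rebrands.py` may differ.

```python
import re

# Assumed shape of the earlier trigger: no leading \b, so it can start mid-word.
OLD_SKETCH_RE = re.compile(r"(?:is|are|was|were|am) now ([A-Z]\w*)")

# Assumed shape of the tightened trigger: leading \b plus the optional
# "(a) part of" phrase quoted in the commit message.
NEW_SKETCH_RE = re.compile(
    r"\b(?:is|are|was|were|am) now (?:(?:a )?part of )?([A-Z]\w*)"
)


def rebrand_target(pattern, text):
    """Return the capitalized word that follows the trigger, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


samples = (
    "BankOnIT is now Navanta",
    "We are now Cencora",
    "is now part of Lumen",
    "Stream now Mystery",  # "am" sits inside "Stream"
    "Buy Now PROMO",       # CTA fragment with no verb
)
for text in samples:
    print(f"{text!r}: old={rebrand_target(OLD_SKETCH_RE, text)!r} "
          f"new={rebrand_target(NEW_SKETCH_RE, text)!r}")
```

In this sketch, `\b` fails between the "e" and "a" of "Stream" because both are word characters, so the mid-word match disappears while the real banners still match.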

REBRAND_PATH_RE — drop bare `rebrand`, `name change`, `new name for`, and
`brand-update` / `brand-refresh` patterns. They appeared too often as CSS
class names (`class="rebrand-page"`), CSS variables
(`--rebrand-underline-color`), image filenames (`bms-rebrand-logo.svg`,
`brand-update.css`), and JSON/JS strings (`"name change"` user-account
labels). Adding `\b` boundaries doesn't help because dashes are non-word
characters. The remaining narrow patterns (`brand-launch`,
`brand-announcement`, `brand-reveal`, `our-new-name`, `our-new-brand`,
`acquisition-announcement`, `merger-announcement`) still catch the
canonical bankonitusa.com case via its
`brand-launch-frequently-asked-questions` URL slug and `Brand announcement`
alt text.
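
To illustrate why word boundaries do not rescue the bare `rebrand` pattern, here is a small sketch. `BROAD_RE` and `NARROW_SKETCH_RE` are hypothetical stand-ins built from the pattern names listed above, not the script's actual REBRAND_PATH_RE; matching a space as well as a dash after `brand` is an assumption made so that the alt-text case fires.

```python
import re

# The dropped pattern, with \b added as an experiment. A dash is a non-word
# character, so \b still matches on both sides of "rebrand" in "rebrand-page".
BROAD_RE = re.compile(r"\brebrand\b", re.IGNORECASE)

# Hypothetical reconstruction of the remaining narrow patterns.
NARROW_SKETCH_RE = re.compile(
    r"brand[-\s](?:launch|announcement|reveal)"
    r"|our-new-(?:name|brand)"
    r"|(?:acquisition|merger)-announcement",
    re.IGNORECASE,
)

samples = {
    "css_class": 'class="rebrand-page"',
    "css_var": "--rebrand-underline-color",
    "image": "bms-rebrand-logo.svg",
    "stylesheet": "brand-update.css",
    "slug": "/brand-launch-frequently-asked-questions",
    "alt_text": 'alt="Brand announcement"',
}

for label, text in samples.items():
    broad = bool(BROAD_RE.search(text))
    narrow = bool(NARROW_SKETCH_RE.search(text))
    print(f"{label:10} broad={broad!s:5} narrow={narrow}")
```

Under these assumptions the broad pattern still fires on the CSS and image-asset strings, while the narrow set fires only on the URL slug and the alt text.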

_REBRAND_NOISE — make the comparison case-insensitive and add
"included", "iso", "secure", "part" to suppress "is now ON" / "is now
LIVE" / "is now ISO 27001 certified" / "is now Secure Managed Wi-Fi" /
"is now Part of" patterns. Twitter/Facebook/Square (the social-platform
rebrand mentions in footers like "X (formerly Twitter)") moved to
lowercase since the comparison is now case-insensitive.
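
The case-insensitive noise check could look roughly like the sketch below. The set contents are taken from the words named above; treating "on" and "live" as existing entries, comparing only the first word of the candidate, and the names `_NOISE_SKETCH` and `is_noise` are all assumptions for illustration.

```python
# Hypothetical noise set, stored lowercase because the comparison
# case-folds the candidate before the lookup.
_NOISE_SKETCH = frozenset({
    "on", "live", "included", "iso", "secure", "part",
    "twitter", "facebook", "square",
})


def is_noise(candidate: str) -> bool:
    """Return True when the first word of a captured name is a known non-rebrand."""
    words = candidate.split()
    return bool(words) and words[0].casefold() in _NOISE_SKETCH


for candidate in ("ON", "LIVE", "ISO 27001 certified",
                  "Secure Managed Wi-Fi", "Part of", "Navanta"):
    print(f"{candidate!r}: noise={is_noise(candidate)}")
```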

Net effect on a full sweep over the ~13,100-key map: rebrand-signal
flagged-row count dropped from ~270 (initial run) to 108 (round-3),
clearing the dominant FP categories while every real signal — verified
against the bankonitusa.com canonical case plus 11 other actual
rebrands — still fires.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Promote 11 verified rebrands found by drift sweep; alias 4 acquirer domains

Renames produced by `detect_rebrands.py` running against the full ~13,100-key
map and verified by re-reading each operator's homepage. Type column
unchanged for every row — only the canonical `name` shifts to the new
operator. Where the new operator's primary domain wasn't already in the map,
a case-1 alias row is added pointing to the same `(name, type)`.

Renames:

- amerisourcebergen.com: AMERISOURCEBERGEN → Cencora
- aurorahealthcare.org: Aurora Health Care → Advocate Health
- consolidated.com: Consolidated Communications → Fidium Fiber
- databridgesites.com: Meridian Parkway Data Center Owner → TierPoint
- emarsys.com: SAP Emarsys → SAP Engagement Cloud
- rig.net: RigNet → Viasat
- rxlightning.com: RxLightning → CoverMyMeds
- telepoint.bg: Telepoint → Digital Realty
- thehostgroup.com: The Host Group → HostGo
- ultisat.com: Globecomm Services Maryland → UltiSat
- unifiedpostgroup.com: Unifiedpost Group → Banqup

New aliases (operator's primary domain not previously mapped):

- cencora.com → Cencora, Healthcare
- advocatehealth.com → Advocate Health, Healthcare
- covermymeds.com → CoverMyMeds, Healthcare
- banqup.com → Banqup, SaaS
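
A rename plus alias can be pictured with the sketch below. It assumes an in-memory dict from domain to `(name, type)`; the real map's storage format is not shown here, and `promote_rebrand` is a hypothetical helper, not a function from the repository.

```python
def promote_rebrand(domain_map, domain, new_name, alias_domain=None):
    """Rename an operator in place and optionally alias its new primary domain.

    The type column is left unchanged; only the canonical name shifts.
    """
    _old_name, type_ = domain_map[domain]
    domain_map[domain] = (new_name, type_)
    # Add the alias row only when the new domain is not already mapped.
    if alias_domain is not None and alias_domain not in domain_map:
        domain_map[alias_domain] = (new_name, type_)
    return domain_map


# Hypothetical starting row, using the first rename in the list above.
domain_map = {"amerisourcebergen.com": ("AMERISOURCEBERGEN", "Healthcare")}
promote_rebrand(domain_map, "amerisourcebergen.com", "Cencora", "cencora.com")
print(domain_map)
```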

Five sweep hits intentionally deferred for lack of a clear second source:
megatel.co.nz → Nova (`nova.co.nz` is for sale via a domain broker;
unclear which Nova entity), pogozone.com → NeuBeam (NeuBeam's homepage
doesn't acknowledge the PogoZone acquisition), prempub.com → Ingenious
Media (ingeniousmedia.com fetch failed), voltagepark.com → ? (merger
with Lightning AI rather than a clean rebrand), and a handful of more
ambiguous Synopsys/Ansys/OmniAccess/Rakuten/Indigital/Synthite signals
that need manual research.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Document detect_rebrands.py cadence as run-once-a-year

The drift sweep is for catching operator rebrands and acquisitions that
accumulated since the previous run; M&A activity over the mapped operator
set is slow enough that yearly is sufficient. Annotate the script's own
docstring, the maps README, and the AGENTS.md "Related utility scripts"
entry so a future contributor doesn't mistake it for a per-batch step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-07 11:31:58 -04:00

.claude	SIGHUP-based configuration reload for watch mode (#697)	2026-03-21 16:14:48 -04:00
.github	Align Kibana dashboards with OpenSearch Dashboards source-of-truth (#737)	2026-04-27 01:30:48 -04:00
.vscode	Update dashboard documentation	2026-05-03 12:36:06 -04:00
dashboards	Fix splunk SMTP TLS dashboard: add additional renames for failure details and adjust stats query	2026-05-03 19:58:29 -04:00
docs	docs: update installation instructions for IPinfo Lite and MaxMind GeoLite2 databases	2026-05-04 18:52:18 -04:00
parsedmarc	Drift-detect rebrands: tighten regex; promote 11 verified rebrand-aliased map keys (#753)	2026-05-07 11:31:58 -04:00
samples	Add example google SMTP-TLS report email	2024-09-04 20:03:51 -04:00
.dockerignore	Add Dockerfile & build/push task (#316)	2022-05-05 21:06:38 -04:00
.gitattributes	Add additional samples and ensure git does not touch CRLF (#456)	2024-01-02 16:29:06 -05:00
.gitignore	9.7.0 (#709)	2026-04-19 21:20:41 -04:00
AGENTS.md	Drift-detect rebrands: tighten regex; promote 11 verified rebrand-aliased map keys (#753)	2026-05-07 11:31:58 -04:00
build.sh	Format on build	2025-12-12 15:56:52 -05:00
CHANGELOG.md	Bump mailsuite to >=2.0.2 for 9.11.1 release (#743)	2026-04-30 11:59:11 -04:00
ci.ini	Skip DNS lookups in GitHub Actions to prevent test timeouts (#657)	2026-02-18 18:19:28 -05:00
CLAUDE.md	Add AGENTS.md for AI agent guidance and link from CLAUDE.md	2026-03-03 21:00:55 -05:00
codecov.yml	Tune Codecov statuses for small PRs (#678)	2026-03-09 17:43:34 -04:00
CONTRIBUTING.md	Add contributing guide (#685)	2026-03-09 18:16:47 -04:00
dashboard-dev-bootstrap.sh	Align Kibana dashboards with OpenSearch Dashboards source-of-truth (#737)	2026-04-27 01:30:48 -04:00
docker-compose.dashboard-dev.yml	9.4.0	2026-03-23 17:08:26 -04:00
docker-compose.yml	Update OpenSearch healthcheck to use HTTPS and include authentication	2026-03-16 17:53:37 -04:00
Dockerfile	Updated default python docker base image to 3.13-slim (#618)	2025-10-29 22:34:06 -04:00
LICENSE	First commit	2018-02-05 20:23:07 -05:00
publish-docs.sh	Add publish-docs.sh	2022-10-04 18:45:57 -04:00
pyproject.toml	Bump mailsuite to >=2.0.2 for 9.11.1 release (#743)	2026-04-30 11:59:11 -04:00
README.md	Update sponsorship section in README and documentation	2026-04-04 22:14:38 -04:00
SECURITY.md	Add security policy (#688)	2026-03-09 18:24:16 -04:00
tests.py	Offload mailbox layer to mailsuite>=2.0.0 (#741)	2026-04-28 00:58:36 -04:00

README.md

parsedmarc

parsedmarc is a Python module and CLI utility for parsing DMARC reports. When used with Elasticsearch and Kibana (or Splunk), it works as a self-hosted open-source alternative to commercial DMARC report processing services such as Agari Brand Protection, Dmarcian, OnDMARC, ProofPoint Email Fraud Defense, and Valimail.

Note

Domain-based Message Authentication, Reporting, and Conformance (DMARC) is an email authentication protocol.

Features

- Parses draft and 1.0 standard aggregate/rua DMARC reports
- Parses forensic/failure/ruf DMARC reports
- Parses reports from SMTP TLS Reporting
- Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
- Transparently handles gzip or zip compressed reports
- Consistent data structures
- Simple JSON and/or CSV output
- Optionally email the results
- Optionally send the results to Elasticsearch, OpenSearch, and/or Splunk, for use with premade dashboards
- Optionally send reports to Apache Kafka
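
As an illustration of what transparent handling of gzip or zip input can involve, here is a standard-library sketch that sniffs the leading magic bytes. It is not presented as parsedmarc's own implementation; `read_report_bytes` is a hypothetical helper, and taking the first member of a zip archive is an assumption.

```python
import gzip
import io
import zipfile

GZIP_MAGIC = b"\x1f\x8b"    # gzip member header
ZIP_MAGIC = b"PK\x03\x04"   # zip local file header


def read_report_bytes(data: bytes) -> bytes:
    """Return the raw report payload, decompressing gzip or zip input if needed."""
    if data.startswith(GZIP_MAGIC):
        return gzip.decompress(data)
    if data.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Assumption: the report is the first file in the archive.
            return archive.read(archive.namelist()[0])
    return data  # already uncompressed


sample = b"<feedback></feedback>"
print(read_report_bytes(gzip.compress(sample)) == sample)
```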

Python Compatibility

This project supports the following Python versions, which are either actively maintained or are the default versions for RHEL or Debian.

Version	Supported	Reason
< 3.6	❌	End of Life (EOL)
3.6	❌	Used in RHEL 8, but not supported by project dependencies
3.7	❌	End of Life (EOL)
3.8	❌	End of Life (EOL)
3.9	❌	Used in Debian 11 and RHEL 9, but not supported by project dependencies
3.10	✅	Actively maintained
3.11	✅	Actively maintained; supported until June 2028 (Debian 12)
3.12	✅	Actively maintained; supported until May 2035 (RHEL 10)
3.13	✅	Actively maintained; supported until June 2030 (Debian 13)
3.14	✅	Supported (requires `imapclient>=3.1.0`)
