Mirror of https://github.com/domainaware/parsedmarc.git, synced 2026-05-20 10:55:24 +00:00
34518585b60177bf2fd78db8554febdd0a13ba9b
The next 1000 by aggregate IPv4 weight, all sitting in the long tail (each
candidate ASN holds ~7,400 IPv4 addresses, ~0.21% of total v4 weight), so
the auto-classification rate is modest compared to head batches:
- 460 added to base_reverse_dns_map.csv (ISP 344, Web Host 60, Education 21,
MSP 12, Healthcare 8, Government 8, Finance 7).
- 540 added to known_unknown_base_reverse_dns.txt — homepages that were
parked, behind a Cloudflare bot challenge, returning a generic-server test
page, in obscure languages with no telecom-keyword cognates the classifier
recognized, or whose WHOIS / MMDB as_name didn't combine with any
homepage signal to clear two corroborating sources.
Classifier improvements applied this batch (relative to prior batches' code):
- MMDB as_name is the primary brand source, with cleaned title as fallback
and domain-derived as last resort (WHOIS is mostly privacy-redacted at
this depth in the long tail).
- Title-segment selection now prefers the segment whose simplified form
contains the domain root, catching cases like accessmontana.com whose
as_name is the holding company "MONTANA WEST, L.L.C." but whose title
surfaces the operator brand "Access Montana".
- as_name fallback for ISP added "Communications" (with a media-context
guard so "Christian Broadcasting Network" doesn't hit) plus bare
"Internet" / "Cable" / "Telephone Co." patterns common in rural-US ISP
brands.
- Government TLD list expanded for .go.id, .gv.at, .gov.cn, .gob.cl/ar/gt,
.admin.ch, etc.; Education TLD list expanded for .ac.kr / .ac.za /
.ac.nz / .edu.cn / .edu.tw / .edu.sg / .edu.my / .edu.ph / .edu.eg.
- MSP detection re-added (`it solutions` / `managed it support` /
`managed tech` patterns) for marconet.com / odyssey.uk / vmi.se type
long-tail managed-IT shops.
- Brand cleanup deepened to handle Brazilian EPP / EIRELI ME, Italian
s.c.a r.l, Polish sp z o.o variants, Lithuanian UAB, Czech Druzstvo,
Venezuelan C.A., trailing-single-letter artifacts, and double-spaces.
- Encoding-mojibake fixer for the common UTF-8-as-Latin-1 cases
("Fibra Ã³ptica" → "Fibra óptica") so Spanish/Portuguese ISP pages
classify even when collect_domain_info.py mishandled the encoding.
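The UTF-8-as-Latin-1 repair described in the last item can be illustrated with a short standard-library sketch. This is not the code from the commit: the function name `fix_mojibake` and the marker check are assumptions made for illustration. The idea is that text whose UTF-8 bytes were decoded as Latin-1 can be re-encoded as Latin-1 and decoded as UTF-8, and that the original string should be kept whenever that round trip fails.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was wrongly decoded as Latin-1 (hypothetical helper)."""
    # UTF-8 lead bytes 0xC2 and 0xC3 show up as "Â" and "Ã" when read as Latin-1.
    # If neither marker is present, assume the text is fine and leave it alone.
    if "\u00c2" not in text and "\u00c3" not in text:
        return text
    try:
        # Recover the original bytes, then decode them the way they were meant to be.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a clean Latin-1/UTF-8 round trip (for example legitimate "SÃO PAULO"),
        # so keep the input unchanged rather than risk corrupting it.
        return text


if __name__ == "__main__":
    print(fix_mojibake("Fibra \u00c3\u00b3ptica"))
```

The `try`/`except` guard matters because correctly encoded all-caps Portuguese text can contain "Ã" on its own; in that case the byte sequence is not valid UTF-8 and the input is returned as-is.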
Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
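The brand-source priority and title-segment selection described in the commit notes above might look roughly like the following sketch. All names here (`simplify`, `domain_root`, `pick_title_segment`, `choose_brand`), the title separators, and the example title string are hypothetical; the notes do not spell out how the segment preference interacts with the `as_name` priority, so the two are kept as separate steps.

```python
import re
from typing import Optional


def simplify(text: str) -> str:
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def domain_root(domain: str) -> str:
    """Naive root label: 'accessmontana.com' -> 'accessmontana' (no public-suffix handling)."""
    return domain.lower().split(".")[0]


def pick_title_segment(title: str, domain: str) -> Optional[str]:
    """Prefer the title segment whose simplified form contains the domain root."""
    # Assumption: segments are separated by "|" or by a spaced hyphen.
    segments = [s.strip() for s in re.split(r"\s*\|\s*|\s+-\s+", title)]
    segments = [s for s in segments if s]
    if not segments:
        return None
    root = simplify(domain_root(domain))
    for segment in segments:
        if root and root in simplify(segment):
            return segment
    # No segment mentions the domain root, so fall back to the first one.
    return segments[0]


def choose_brand(as_name: Optional[str], title: Optional[str], domain: str) -> str:
    """as_name first, then a cleaned title segment, then a domain-derived name."""
    if as_name and as_name.strip():
        return as_name.strip()
    if title:
        segment = pick_title_segment(title, domain)
        if segment:
            return segment
    return domain_root(domain).capitalize()


if __name__ == "__main__":
    # Hypothetical page title, used only to exercise the segment preference.
    print(pick_title_segment("Home | Access Montana | Internet & Phone", "accessmontana.com"))
```

Comparing simplified forms (lowercase, alphanumerics only) is what lets "Access Montana" match the root `accessmontana` despite the space and capitalization.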
parsedmarc
parsedmarc is a Python module and CLI utility for parsing DMARC
reports. When used with Elasticsearch and Kibana (or Splunk), it works
as a self-hosted open-source alternative to commercial DMARC report
processing services such as Agari Brand Protection, Dmarcian, OnDMARC,
Proofpoint Email Fraud Defense, and Valimail.
Note
Domain-based Message Authentication, Reporting, and Conformance (DMARC) is an email authentication protocol.
Sponsors
This project is maintained by one developer. Please consider sponsoring my work if you or your organization benefit from it.
Features
- Parses draft and 1.0 standard aggregate/rua DMARC reports
- Parses forensic/failure/ruf DMARC reports
- Parses reports from SMTP TLS Reporting
- Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
- Transparently handles gzip or zip compressed reports
- Consistent data structures
- Simple JSON and/or CSV output
- Optionally email the results
- Optionally send the results to Elasticsearch, OpenSearch, and/or Splunk for use with premade dashboards
- Optionally send reports to Apache Kafka
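As an illustration of what transparent handling of gzip or zip compressed reports can involve, here is a minimal standard-library sketch that inspects the leading magic bytes of a payload. It is not parsedmarc's own implementation; the function name `extract_report_bytes` and the choice to read the first archive member are assumptions made for this example.

```python
import gzip
import io
import zipfile

GZIP_MAGIC = b"\x1f\x8b"      # first two bytes of every gzip stream
ZIP_MAGIC = b"PK\x03\x04"     # local file header at the start of a non-empty zip


def extract_report_bytes(payload: bytes) -> bytes:
    """Return the raw report bytes, decompressing gzip or zip input if detected."""
    if payload.startswith(GZIP_MAGIC):
        return gzip.decompress(payload)
    if payload.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            # Assumption: the report is the first (typically only) archive member.
            return archive.read(archive.namelist()[0])
    # Neither signature matched, so treat the payload as uncompressed.
    return payload


if __name__ == "__main__":
    sample = b"<feedback></feedback>"
    print(extract_report_bytes(gzip.compress(sample)) == sample)
```

Detecting the format from the content rather than the file extension means the same code path works for email attachments whose names are missing or misleading.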
Python Compatibility
This project supports the following Python versions, which are either actively maintained or are the default versions for RHEL or Debian.
| Version | Supported | Reason |
|---|---|---|
| < 3.6 | ❌ | End of Life (EOL) |
| 3.6 | ❌ | Used in RHEL 8, but not supported by project dependencies |
| 3.7 | ❌ | End of Life (EOL) |
| 3.8 | ❌ | End of Life (EOL) |
| 3.9 | ❌ | Used in Debian 11 and RHEL 9, but not supported by project dependencies |
| 3.10 | ✅ | Actively maintained |
| 3.11 | ✅ | Actively maintained; supported until June 2028 (Debian 12) |
| 3.12 | ✅ | Actively maintained; supported until May 2035 (RHEL 10) |
| 3.13 | ✅ | Actively maintained; supported until June 2030 (Debian 13) |
| 3.14 | ✅ | Supported (requires imapclient>=3.1.0) |
