Sean Whalen 34518585b6 Classify reverse DNS map: next 1000 unmapped MMDB ASN domains (#754)
The next 1000 by aggregate IPv4 weight, all sitting in the long tail (each
candidate ASN holds ~7,400 IPv4 addresses, ~0.21% of total v4 weight), so
auto-classification rate is modest compared to head-batches:

- 460 added to base_reverse_dns_map.csv (ISP 344, Web Host 60, Education 21,
  MSP 12, Healthcare 8, Government 8, Finance 7).
- 540 added to known_unknown_base_reverse_dns.txt — homepages that were
  parked, behind a Cloudflare bot challenge, returning a generic-server test
  page, in obscure languages with no telecom-keyword cognates the classifier
  recognized, or whose WHOIS / MMDB as_name didn't combine with any
  homepage signal to clear two corroborating sources.
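
As a quick sanity check on the counts above, the per-category figures can be summed to confirm they match the stated totals. This is only an illustrative sketch; the dictionary and variable names are made up for the example and are not part of the repository's tooling.

```python
# Per-category counts exactly as stated in the commit message above.
added_by_category = {
    "ISP": 344,
    "Web Host": 60,
    "Education": 21,
    "MSP": 12,
    "Healthcare": 8,
    "Government": 8,
    "Finance": 7,
}

mapped = sum(added_by_category.values())  # rows added to base_reverse_dns_map.csv
known_unknown = 540                       # rows added to known_unknown_base_reverse_dns.txt
batch_size = 1000                         # domains examined in this batch

print(mapped)                                # 460
print(mapped + known_unknown == batch_size)  # True
```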

Classifier improvements applied this batch (relative to prior batches' code):

- MMDB as_name is the primary brand source, with cleaned title as fallback
  and domain-derived as last resort (WHOIS is mostly privacy-redacted at
  this depth in the long tail).
- Title-segment selection now prefers the segment whose simplified form
  contains the domain root, catching cases like accessmontana.com whose
  as_name is the holding company "MONTANA WEST, L.L.C." but whose title
  surfaces the operator brand "Access Montana".
- as_name fallback for ISP added "Communications" (with a media-context
  guard so "Christian Broadcasting Network" doesn't hit) plus bare
  "Internet" / "Cable" / "Telephone Co." patterns common in rural-US ISP
  brands.
- Government TLD list expanded for .go.id, .gv.at, .gov.cn, .gob.cl/ar/gt,
  .admin.ch, etc.; Education TLD list expanded for .ac.kr / .ac.za /
  .ac.nz / .edu.cn / .edu.tw / .edu.sg / .edu.my / .edu.ph / .edu.eg.
- MSP detection re-added (`it solutions` / `managed it support` /
  `managed tech` patterns) for marconet.com / odyssey.uk / vmi.se type
  long-tail managed-IT shops.
- Brand cleanup deepened to handle Brazilian EPP / EIRELI ME, Italian
  s.c.a r.l, Polish sp z o.o variants, Lithuanian UAB, Czech Druzstvo,
  Venezuelan C.A., trailing-single-letter artifacts, and double-spaces.
- Encoding-mojibake fixer for the common UTF-8-as-Latin-1 cases
  ("Fibra Ã³ptica" → "Fibra óptica") so Spanish/Portuguese ISP pages
  classify even when collect_domain_info.py mishandled the encoding.
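
Two of the items above can be sketched in a few lines: the UTF-8-as-Latin-1 repair and the suffix-based Government / Education check. This is a minimal illustration under stated assumptions, not the classifier's actual code. The function names `fix_mojibake` and `classify_by_suffix` and the `example.*` domains are hypothetical, and the suffix tuples contain only the entries named in this commit message (the "etc." is left unexpanded).

```python
from typing import Optional

# Suffixes taken from the commit message; the real lists are longer.
GOVERNMENT_SUFFIXES = (
    ".go.id", ".gv.at", ".gov.cn", ".gob.cl", ".gob.ar", ".gob.gt", ".admin.ch",
)
EDUCATION_SUFFIXES = (
    ".ac.kr", ".ac.za", ".ac.nz", ".edu.cn", ".edu.tw",
    ".edu.sg", ".edu.my", ".edu.ph", ".edu.eg",
)


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1 (hypothetical helper)."""
    # UTF-8 lead bytes for accented Latin letters show up as "Ã" or "Â"
    # when read as Latin-1, so skip strings that contain neither.
    if "Ã" not in text and "Â" not in text:
        return text
    try:
        # Recover the original bytes, then decode them the way they were meant.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a clean round trip, so the text was probably fine as it was.
        return text


def classify_by_suffix(domain: str) -> Optional[str]:
    """Return a category when the domain ends in a known suffix (hypothetical helper)."""
    normalized = domain.lower().rstrip(".")
    if normalized.endswith(GOVERNMENT_SUFFIXES):
        return "Government"
    if normalized.endswith(EDUCATION_SUFFIXES):
        return "Education"
    return None


print(fix_mojibake("Fibra Ã³ptica"))         # Fibra óptica
print(classify_by_suffix("example.gob.cl"))  # Government
print(classify_by_suffix("example.ac.nz"))   # Education
```

The `try`/`except` keeps legitimate text intact: a string such as "SÃO PAULO" does not form valid UTF-8 once re-encoded, so it is returned unchanged. Pages mis-decoded as Windows-1252 rather than Latin-1 would need a separate pass.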

Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 12:06:22 -04:00

parsedmarc


A screenshot of DMARC summary charts in Kibana

parsedmarc is a Python module and CLI utility for parsing DMARC reports. When used with Elasticsearch and Kibana (or Splunk), it works as a self-hosted open-source alternative to commercial DMARC report processing services such as Agari Brand Protection, Dmarcian, OnDMARC, ProofPoint Email Fraud Defense, and Valimail.

Note

Domain-based Message Authentication, Reporting, and Conformance (DMARC) is an email authentication protocol.

Sponsors

This project is maintained by one developer. Please consider sponsoring my work if you or your organization benefit from it.

Features

  • Parses draft and 1.0 standard aggregate/rua DMARC reports
  • Parses forensic/failure/ruf DMARC reports
  • Parses reports from SMTP TLS Reporting
  • Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
  • Transparently handles gzip or zip compressed reports
  • Consistent data structures
  • Simple JSON and/or CSV output
  • Optionally email the results
  • Optionally send the results to Elasticsearch, OpenSearch, and/or Splunk, for use with premade dashboards
  • Optionally send reports to Apache Kafka
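
To illustrate what "transparently handles gzip or zip compressed reports" can mean in practice, here is a minimal standard-library sketch that picks a decompression path from the payload's leading magic bytes. It is an assumption-laden illustration, not parsedmarc's own implementation: the function name `extract_report_bytes` is hypothetical, and it simply takes the first member of a zip archive.

```python
import gzip
import io
import zipfile

GZIP_MAGIC = b"\x1f\x8b"     # first two bytes of every gzip stream
ZIP_MAGIC = b"PK\x03\x04"    # local file header of a non-empty zip archive


def extract_report_bytes(payload: bytes) -> bytes:
    """Return the raw report bytes, decompressing if needed (hypothetical helper)."""
    if payload.startswith(GZIP_MAGIC):
        return gzip.decompress(payload)
    if payload.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            # Assumption: the report is the first (usually only) member.
            first_member = archive.namelist()[0]
            return archive.read(first_member)
    # Anything else is assumed to be an uncompressed report already.
    return payload


# Usage: the same XML wrapped three different ways yields the same bytes.
xml = b"<feedback><version>1.0</version></feedback>"

zipped = io.BytesIO()
with zipfile.ZipFile(zipped, "w") as archive:
    archive.writestr("report.xml", xml)

print(extract_report_bytes(xml) == xml)                 # True
print(extract_report_bytes(gzip.compress(xml)) == xml)  # True
print(extract_report_bytes(zipped.getvalue()) == xml)   # True
```

Sniffing the content rather than trusting the file extension means a mislabelled attachment is still handled correctly.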

Python Compatibility

This project supports the following Python versions, which are either actively maintained or are the default versions for RHEL or Debian.

| Version | Supported | Reason |
|---------|-----------|--------|
| < 3.6 | No | End of Life (EOL) |
| 3.6 | No | Used in RHEL 8, but not supported by project dependencies |
| 3.7 | No | End of Life (EOL) |
| 3.8 | No | End of Life (EOL) |
| 3.9 | No | Used in Debian 11 and RHEL 9, but not supported by project dependencies |
| 3.10 | Yes | Actively maintained |
| 3.11 | Yes | Actively maintained; supported until June 2028 (Debian 12) |
| 3.12 | Yes | Actively maintained; supported until May 2035 (RHEL 10) |
| 3.13 | Yes | Actively maintained; supported until June 2030 (Debian 13) |
| 3.14 | Yes | Supported (requires imapclient>=3.1.0) |
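
If you want to check an interpreter against the versions listed above before installing, a comparison on `sys.version_info` is enough. This is a small sketch, not something shipped with the project: `MIN_SUPPORTED` and `is_supported` are hypothetical names, and the 3.10 floor is read off the list above on the assumption that everything older is unsupported.

```python
import sys

# Assumption: 3.10 is the oldest version the list above marks as maintained.
MIN_SUPPORTED = (3, 10)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True when the (major, minor) pair meets the assumed minimum."""
    return tuple(version_info[:2]) >= MIN_SUPPORTED


print(is_supported((3, 9, 18)))   # False
print(is_supported((3, 10, 0)))   # True
print(is_supported((3, 14, 0)))   # True
```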