mirror of
https://github.com/domainaware/parsedmarc.git
synced 2026-05-20 10:55:24 +00:00
8cc017fe84e1a436b9d706bc89b3a83980ed1825
parsedmarc
parsedmarc is a Python module and CLI utility for parsing DMARC
reports. When used with Elasticsearch and Kibana (or Splunk), it works
as a self-hosted open-source alternative to commercial DMARC report
processing services such as Agari Brand Protection, Dmarcian, OnDMARC,
Proofpoint Email Fraud Defense, and Valimail.
Note
Domain-based Message Authentication, Reporting, and Conformance (DMARC) is an email authentication protocol.
Sponsors
This project is maintained by one developer. Please consider sponsoring my work if you or your organization benefit from it.
Features
- Parses draft and 1.0 standard aggregate/rua DMARC reports
- Parses forensic/failure/ruf DMARC reports
- Parses reports from SMTP TLS Reporting
- Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
- Transparently handles gzip or zip compressed reports
- Consistent data structures
- Simple JSON and/or CSV output
- Optionally email the results
- Optionally send the results to Elasticsearch, OpenSearch, and/or Splunk, for use with premade dashboards
- Optionally send reports to Apache Kafka
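To illustrate what transparent handling of gzip or zip compressed reports can look like, here is a minimal sketch that uses only the Python standard library. It is not parsedmarc's own implementation: the function name `extract_report_bytes` and the single-file-per-archive assumption are hypothetical and chosen for illustration only.

```python
import gzip
import io
import zipfile

# Leading "magic" bytes that identify each container format.
GZIP_MAGIC = b"\x1f\x8b"
ZIP_MAGIC = b"PK\x03\x04"


def extract_report_bytes(payload: bytes) -> bytes:
    """Return the report body, decompressing gzip or zip input if needed.

    Hypothetical helper for illustration; not part of parsedmarc's API.
    """
    if payload.startswith(GZIP_MAGIC):
        return gzip.decompress(payload)
    if payload.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            # Assumption: the archive holds exactly one report file.
            return archive.read(archive.namelist()[0])
    # Anything else is treated as an uncompressed report.
    return payload
```

Sniffing the leading bytes rather than trusting a file extension means the same call works for attachments whose names are missing or misleading.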
Python Compatibility
This project supports the following Python versions, which are either actively maintained or are the default versions for RHEL or Debian.
| Version | Supported | Reason |
|---|---|---|
| < 3.6 | ❌ | End of Life (EOL) |
| 3.6 | ❌ | Used in RHEL 8, but not supported by project dependencies |
| 3.7 | ❌ | End of Life (EOL) |
| 3.8 | ❌ | End of Life (EOL) |
| 3.9 | ❌ | Used in Debian 11 and RHEL 9, but not supported by project dependencies |
| 3.10 | ✅ | Actively maintained |
| 3.11 | ✅ | Actively maintained; supported until June 2028 (Debian 12) |
| 3.12 | ✅ | Actively maintained; supported until May 2035 (RHEL 10) |
| 3.13 | ✅ | Actively maintained; supported until June 2030 (Debian 13) |
| 3.14 | ✅ | Supported (requires imapclient>=3.1.0) |
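A script that depends on parsedmarc could check the interpreter against the table above before importing anything else. The sketch below is illustrative only: `MINIMUM_SUPPORTED` and `is_supported` are hypothetical names, and the check assumes that every version from 3.10 upward is acceptable, which the table confirms only through 3.14.

```python
import sys

# Oldest version marked as supported in the compatibility table.
MINIMUM_SUPPORTED = (3, 10)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the major.minor version meets the minimum.

    Hypothetical helper for illustration; not part of parsedmarc's API.
    """
    return tuple(version_info[:2]) >= MINIMUM_SUPPORTED


if not is_supported():
    print("Warning: this Python version is not in the supported range.")
```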
