mirror of
https://github.com/domainaware/parsedmarc.git
synced 2026-05-20 10:55:24 +00:00
Classify reverse DNS map: final cleanup batch (~2,650 unmapped MMDB ASN domains) (#762)
Final cleanup pass to clear the remaining MMDB AS-domain queue. Applied an expanded multilingual classifier covering all 44 README industry types plus an Energy concept (mapped to Utilities pending a README addition). Per-detector keyword lists now include Spanish, Portuguese, French, Italian, German, Dutch, Russian, Polish, Czech, Turkish, Greek, Chinese (simplified and traditional), Japanese, Korean, Arabic, Hebrew, Hindi, Vietnamese, Indonesian, and Thai where the concept has a recognizable local-language equivalent.

- 980 added to base_reverse_dns_map.csv (ISP 193, Education 193, Finance 155, Government 109, Healthcare 93, Web Host 37, MSP 31, Manufacturing 22, Logistics 17, Real Estate 12, Travel 11, Consulting 10, Tech 9, Nonprofit 9, Legal 9, Food 9, Retail 8, Religion 8, Utilities 7, plus smaller volumes across 14 more types).
- 1,669 added to known_unknown_base_reverse_dns.txt: the residual unfetchable / parked / Cloudflare-challenged / non-recognized-content rows.

ASN-domain coverage of the bundled IPinfo Lite MMDB after this batch:
- by domain count: 29,083 / 63,993 (45.45%)
- by IPv4 weight: 98.36%

Total since batch 5: ~16,400 map rows + ~17,400 known-unknown rows added across 9 batches.

Remaining unmapped pool size: 0. Every MMDB AS-domain has now been processed (either classified or recorded in known-unknown).

Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
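
For illustration only, a minimal Python sketch of the kind of keyword-based triage the message describes. The commit does not show the classifier itself, so the `DETECTORS` table, the keyword choices, and the `classify` / `triage` helpers are all hypothetical stand-ins, not the project's code. The sketch assumes each domain has already been fetched to a text snippet, and that an empty snippet stands for an unfetchable or parked page.

```python
# Hypothetical per-detector keyword lists (assumption: the real lists are far
# larger and are not shown in this commit). Keys are README industry types.
DETECTORS = {
    "Education": ["university", "universidad", "universität", "école", "大学"],
    "Finance": ["bank", "banco", "banque", "銀行"],
    "Healthcare": ["hospital", "clinic", "krankenhaus", "病院"],
    # The message maps the Energy concept to Utilities.
    "Utilities": ["electricity", "energy", "energía", "énergie"],
}


def classify(text):
    """Return the first industry type whose keyword occurs in text, else None.

    Plain substring matching on case-folded text; a real classifier would
    likely need word boundaries and tie-breaking between detectors.
    """
    lowered = text.casefold()
    for industry, keywords in DETECTORS.items():
        for keyword in keywords:
            if keyword.casefold() in lowered:
                return industry
    return None


def triage(pages):
    """Split {domain: page_text} into mapped rows and known-unknown domains."""
    mapped = {}
    unknown = []
    for domain, text in pages.items():
        industry = classify(text) if text else None
        if industry is not None:
            mapped[domain] = industry
        else:
            unknown.append(domain)
    return mapped, sorted(unknown)


def coverage_percent(mapped_count, total_count):
    """Coverage by domain count, as a percentage rounded to two places."""
    return round(100 * mapped_count / total_count, 2)
```

With the figures quoted in the message, `coverage_percent(29083, 63993)` evaluates to `45.45`, matching the stated coverage by domain count. The two outputs of `triage` correspond to the two destinations named above: classified rows for the map file, and everything else for the known-unknown list.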