mirror of
https://github.com/domainaware/parsedmarc.git
synced 2026-04-24 14:29:27 +00:00
Expand reverse-DNS map and PSL overrides from the live PSL (#716)
* Expand reverse-DNS map and PSL overrides from the live PSL

Parses the private-domains section of the live Public Suffix List and adds 269 brand-owned suffixes as PSL overrides paired with map entries, so customer subdomains on shared hosting / SaaS / PaaS platforms fold to the operator's brand. Adds 33 ASN-domain entries for the subset of these brands whose IP space is registered under a different corporate domain in the MMDB, so both the PTR-derived lookup and the ASN-fallback lookup hit the same (name, type).

Also normalizes ``a2hosting.com`` from ``A2Hosting`` to ``A2 Hosting`` for spelling consistency.

PTR-path wins (overrides + map entries):

- Web hosts: A2 Hosting, alwaysdata, Antagonist, Beget, bplaced, Bytemark, Combell, cyber_Folks, cyon, DreamHost, EasyWP, Gehirn, HelioHost, home.pl, HostyHosting, Hypernode, IONOS (6 suffixes), Jotelulu, JouwWeb, KaasHosting, Keyweb, LCube, LiquidNet, McHost, Memset, Mittwald, Mythic Beasts, NearlyFreeSpeech, Nimbus Hosting, One.com (20 ccTLD variants), OwnProvider, Pantheon, Planet-Work, prgmr, Rackmaze, Rad Web Hosting, Raidboxes, Servebolt, SpeedPartner, Uberspace, Whatbox, WP Engine, ZAP-Hosting, Zitcom.
- Dynamic DNS: DuckDNS, DynDNS (24), No-IP (22), Now-DNS, dynv6, freemyip, nsupdate.info, ddnss.de, GoIP, DrayTek.
- PaaS/SaaS/IaaS: Netlify, Vercel (6), Heroku, fly.io, Render, Firebase/GCP (4), Azure (5), AWS (4), DigitalOcean (2), Red Hat OpenShift, Hasura, Supabase, Snowflake/Streamlit, Read the Docs, PythonAnywhere, GitHub, GitLab, Adobe Magento.
- Hosted sites/stores: Hatena (6), Notion, Figma, Webflow, Wix (4), Shopify, Shopware, Sellfy, Spreadshop (19 ccTLDs), Datto.
- Email/Marketing: Fastmail, ActiveTrail, Leadpages, Heyflow, Carrd, Typeform.
- CDN/Technology: Akamai (7), Fastly (3), Yandex Cloud.

ASN-path wins (MMDB coverage now attributes 1,184,256 more IPv4 addresses to a named brand, 85.04% -> 85.08%): yandex.com, ya.ru, hosting.com (A2 Hosting), beget.com, cyberfolks.pl, fly.io, bytemark.co.uk, cyberfolks.ro, keyweb.de, mittwald.de, memset.com, zap-hosting.com, datto.com, jotelulu.com, yandex.cloud, github.com, asavie.com (Akamai), and 16 others.

Entries are curated from the live PSL rather than any bundled copy; brand / as_name attribution was verified against the CLAUDE.md rule that the IP-WHOIS signal is only trusted when the domain name itself matches the host's name (name-collisions in MMDB were skipped: Hypernode AU, goipgroup.com, liquidnet.com, One.com substring noise, nimbusitsolutions.com, etc.). Types follow ``base_reverse_dns_types.txt``; ``sortlists.py`` re-sorts + dedupes + validates after the batch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Document PSL-derived override workflow and load_psl_overrides gotcha

Adds three pieces of map-maintenance context learned while building this PR:

- New subsection "Discovering overrides from the live PSL private-domains section", a distinct source from live DMARC data (unknown_base_reverse_dns.csv) and MMDB coverage-gap analysis. The private section is itself a list of brand-owned suffixes; each is a candidate (psl_override + map entry) pair. Emphasizes ruthless selectivity: most of the 600+ private-section orgs are dev sandboxes or hobby zones that will never appear in DMARC reports.
- Two-path coverage as a single linked step, not two round-trips: when adding a PSL override for a hosted-content suffix (netlify.app), also add a map row for the brand's corporate as_domain (netlify.com) in the same pass. The override fixes the PTR path; the ASN-domain alias fixes the ASN-fallback path.
- The load_psl_overrides() fetch-first gotcha. The no-arg form pulls the file from master on GitHub, so end-to-end testing of local overrides silently uses the old remote version. offline=True is required to test local changes against get_base_domain().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AGENTS.md
@@ -172,6 +172,28 @@ for d, c, n in miss[:50]:
Apply the same classification rules above (precedence, naming consistency, skip-if-ambiguous, privacy). Many top misses will be brands already in the map under a different rDNS-base key — the goal there is to alias the ASN domain to the same `(name, type)` so both lookup paths hit. For ASN domains with no obvious brand identity (small resellers, parked ASNs), don't map them — the attribution code falls back to the raw `as_name` from the MMDB, which is better than a guess.
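Here is a minimal sketch of why both keys need the same `(name, type)`. Attribution tries the PTR-derived base domain first, falls back to the MMDB's ASN domain, and finally falls back to the raw `as_name`. The function `attribute_source`, the plain-dict map, and the `None` type for the raw-`as_name` case are illustrative assumptions, not parsedmarc's actual code.

```python
from typing import Optional, Tuple

# (display name, type); a None type marks the raw as_name fallback
Attribution = Tuple[str, Optional[str]]


def attribute_source(
    base_reverse_dns: Optional[str],
    as_domain: Optional[str],
    as_name: Optional[str],
    reverse_dns_map: dict[str, Attribution],
) -> Optional[Attribution]:
    """Sketch of PTR-first attribution with an ASN-domain fallback."""
    # PTR path: the base domain derived from the sender's reverse DNS name
    if base_reverse_dns and base_reverse_dns.lower() in reverse_dns_map:
        return reverse_dns_map[base_reverse_dns.lower()]
    # ASN-fallback path: the as_domain the MMDB records for the sending IP
    if as_domain and as_domain.lower() in reverse_dns_map:
        return reverse_dns_map[as_domain.lower()]
    # Neither key is mapped: keep the raw MMDB as_name rather than guessing a brand
    if as_name:
        return (as_name, None)
    return None
```

If `netlify.app` and `netlify.com` both map to the same Netlify row, a sender resolves to the same brand whether or not its IP has a usable PTR record. An unmapped reseller keeps its raw `as_name`.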
### Discovering overrides from the live PSL private-domains section
Separately from live DMARC data and the MMDB, the [Public Suffix List](https://publicsuffix.org/list/public_suffix_list.dat) is itself a source of override candidates. Every entry between `===BEGIN PRIVATE DOMAINS===` and `===END PRIVATE DOMAINS===` is a brand-owned suffix by definition (registered by the operator under their own name), so each is a candidate for a `(psl_override + map entry)` pair — folding `customer.brand.tld` → `brand.tld` and attributing it to the operator.
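The following minimal sketch shows the folding that an override performs, assuming overrides are plain `.brand.tld` suffix strings. `fold_with_overrides` is a hypothetical helper for illustration, not `parsedmarc.utils.get_base_domain()`. The longest-match rule for overlapping overrides is also an assumption.

```python
from typing import Iterable, Optional


def fold_with_overrides(hostname: str, overrides: Iterable[str]) -> Optional[str]:
    """Return the suffix a hostname folds to under the given overrides, or None.

    Overrides are assumed to be written as ".brand.tld". A hostname matches when it
    equals the bare suffix or ends with "." + suffix, so "evilnetlify.app" does not
    match ".netlify.app". If several overrides match, the longest (most specific) wins.
    """
    host = hostname.lower().rstrip(".")
    best: Optional[str] = None
    for override in overrides:
        bare = override.lower().strip(".")
        if host == bare or host.endswith("." + bare):
            if best is None or len(bare) > len(best):
                best = bare
    return best
```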
Workflow:
1. Fetch the live PSL file and parse the private section by `// Org` comment blocks → `{org: [suffixes]}`.
2. Cross-reference against `base_reverse_dns_map.csv` keys and existing `psl_overrides.txt` entries to drop already-covered orgs.
3. **Be ruthlessly selective.** The private section has 600+ orgs, most of which are dev sandboxes, dynamic DNS services, IPFS gateways, single-person hobby domains, or registry subzones that will never appear in a DMARC report. Keep only orgs that clearly host email senders — shared web hosts, PaaS / SaaS where customers publish mail-sending sites, email/marketing platforms, major ISPs, dynamic-DNS services that home mail servers actually use.
4. For each kept org, emit one override per suffix (`.brand.tld`, per the `psl_overrides.txt` format) and one map row per suffix, with every map row for that org pointing at the same `(name, type)`. Apply the README precedence rules for `type`. Before inventing a new display name, grep the existing map entries for the brand name: the goal is a single canonical display name per operator.
5. **Same-PR follow-up: two-path coverage.** For every brand added this way, also check whether the brand's corporate domain (e.g. `netlify.com` for `netlify.app`, `shopify.com` for `myshopify.com`, `beget.com` for `beget.app`) is an `as_domain` in the MMDB, and add a map row for it with the same `(name, type)`. The PSL override fixes the PTR path; the ASN-domain alias fixes the ASN-fallback path. Do these together — one pass, not two.
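Steps 1, 2, and 4 can be sketched with the standard library. The sketch makes several assumptions about the PSL layout:

- The private section is bounded by `// ===BEGIN PRIVATE DOMAINS===` / `// ===END PRIVATE DOMAINS===` markers.
- Blocks are separated by blank lines, and each block's first comment line reads `// Org : URL`.
- There is one rule per line.

The sketch skips `!` exception rules and reduces `*.` wildcards to their base suffix. The helper names, the per-suffix (rather than per-org) coverage check, and the three-column `suffix,name,type` row shape are also assumptions, not parsedmarc code. Step 3 remains a human judgment call and is deliberately not automated.

```python
import csv
import io
from urllib.request import urlopen

PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"
BEGIN = "// ===BEGIN PRIVATE DOMAINS==="
END = "// ===END PRIVATE DOMAINS==="


def fetch_psl(url: str = PSL_URL) -> str:
    """Download the live PSL as text."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def parse_private_section(psl_text: str) -> dict[str, list[str]]:
    """Step 1: map each org (first comment line of a block) to its suffixes."""
    orgs: dict[str, list[str]] = {}
    in_private = False
    org = None
    for raw in psl_text.splitlines():
        line = raw.strip()
        if line.startswith(BEGIN):
            in_private = True
        elif line.startswith(END):
            break
        elif not in_private:
            continue
        elif not line:
            org = None  # a blank line closes the current block
        elif line.startswith("//"):
            if org is None:  # e.g. "// Netlify, Inc : https://www.netlify.com"
                org = line[2:].split(" : ")[0].strip()
                orgs.setdefault(org, [])
        elif org is not None and not line.startswith("!"):
            # Reduce wildcard rules such as "*.x.tld" to "x.tld"
            orgs[org].append(line.removeprefix("*.").lower())
    # Drop comment-only blocks (e.g. section notes) that list no suffixes
    return {name: sfx for name, sfx in orgs.items() if sfx}


def drop_covered(orgs: dict[str, list[str]], map_keys: list[str],
                 existing_overrides: list[str]) -> dict[str, list[str]]:
    """Step 2: remove suffixes already present as map keys or overrides."""
    covered = {k.lower().lstrip(".") for k in map_keys}
    covered |= {o.lower().lstrip(".") for o in existing_overrides}
    result: dict[str, list[str]] = {}
    for org, suffixes in orgs.items():
        remaining = [s for s in suffixes if s not in covered]
        if remaining:  # orgs with nothing left are fully covered
            result[org] = remaining
    return result


def emit_rows(suffixes: list[str], name: str, type_: str) -> tuple[list[str], str]:
    """Step 4: one ".suffix" override line and one map row per suffix, same (name, type)."""
    overrides = [f".{s}" for s in suffixes]
    buf = io.StringIO()
    writer = csv.writer(buf)  # csv's default line terminator is "\r\n"
    for s in suffixes:
        writer.writerow([s, name, type_])  # map key written without the leading dot (assumed)
    return overrides, buf.getvalue()
```

A run would chain these calls: `drop_covered(parse_private_section(fetch_psl()), map_keys, overrides)`. Review the survivors by hand (step 3) before calling `emit_rows` for the orgs you keep.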
### The `load_psl_overrides()` fetch-first gotcha
`parsedmarc.utils.load_psl_overrides()` with no arguments fetches the overrides file from `raw.githubusercontent.com/domainaware/parsedmarc/master/...` *first* and only falls back to the bundled local file on network failure. This means end-to-end testing of local `psl_overrides.txt` changes via `get_base_domain()` silently uses the old remote version until the PR merges. When testing local changes, explicitly pass `offline=True`:
```python
from parsedmarc.utils import load_psl_overrides, get_base_domain

# Load the local psl_overrides.txt instead of the copy on master
load_psl_overrides(offline=True)
# A customer subdomain should now fold to the brand suffix
assert get_base_domain("host01.netlify.app") == "netlify.app"
```
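A sketch of the fetch-first pattern described above shows why the local file is silently ignored. This is not parsedmarc's implementation. `load_overrides_text`, the two injected callables, and the treatment of any `OSError` as a network failure are assumptions made so the control flow can be shown without network access.

```python
from typing import Callable


def load_overrides_text(
    fetch_remote: Callable[[], str],
    read_local: Callable[[], str],
    offline: bool = False,
) -> str:
    """Sketch of fetch-first loading.

    The remote copy wins unless offline=True or the fetch raises OSError
    (urllib's URLError is an OSError subclass), so local edits are only
    seen in those two cases.
    """
    if not offline:
        try:
            return fetch_remote()
        except OSError:
            pass  # network failure: fall back to the bundled local file
    return read_local()
```

In a real loader, `fetch_remote` would wrap an HTTP GET of the raw GitHub URL and `read_local` would read the bundled file. The point is that with `offline=False`, the local file is never read while the network is up.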
### After a batch merge
- Re-sort `base_reverse_dns_map.csv` alphabetically (case-insensitive) by the first column and write it out with CRLF line endings.
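Here is a minimal sketch of that re-sort using Python's `csv` module, assuming the file's first row is a header that stays on top. The commit message notes that `sortlists.py` already re-sorts, dedupes, and validates after a batch, so that script is the tool to run. `resort_map_csv` is a hypothetical helper that only illustrates the ordering and line-ending rules, and it does not dedupe.

```python
import csv
import io


def resort_map_csv(text: str) -> str:
    """Return the map CSV with rows sorted case-insensitively by the first column.

    Assumes the first row is a header that stays on top; output uses CRLF endings.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], [row for row in rows[1:] if row]  # drop blank rows
    body.sort(key=lambda row: row[0].lower())
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\r\n")
    writer.writerow(header)
    writer.writerows(body)
    return out.getvalue()
```

To apply it to the real file, read with `open(path, newline="", encoding="utf-8")` and write the result back with `newline=""` as well. Otherwise, Python's newline translation turns each `\r\n` into `\r\r\n` on Windows.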
File diff suppressed because it is too large
@@ -11,19 +11,288 @@
-tataidc.co.in
-veloxfiber.com.br
-wconect.com.br
.123hjemmeside.dk
.123hjemmeside.no
.123homepage.it
.123kotisivu.fi
.123minsida.se
.123miweb.es
.123paginaweb.pt
.123siteweb.fr
.123webseite.at
.123webseite.de
.123website.be
.123website.ch
.123website.lu
.123website.nl
.3utilities.com
.a2hosted.com
.activetrail.biz
.akadns.net
.akamai.net
.akamaiedge.net
.akamaihd.net
.akamaized.net
.alwaysdata.net
.amazonaws.com
.amplifyapp.com
.antagonist.cloud
.app-ionos.space
.apps-1and1.com
.apps-1and1.net
.appspot.com
.awsapprunner.com
.azureedge.net
.azurestaticapps.net
.azurewebsites.net
.basicserver.io
.beget.app
.begetcdn.cloud
.bounceme.net
.box.ca
.bplaced.com
.bplaced.de
.bplaced.net
.carrd.co
.cfolks.pl
.cloudaccess.net
.cloudapp.net
.cloudfront.net
.cloudfunctions.net
.cloudsite.builders
.cprapid.com
.cpserver.com
.crd.co
.customer.speedpartner.de
.cyon.link
.cyon.site
.dattorelay.com
.dattoweb.com
.ddns.net
.ddnsgeek.com
.ddnsking.com
.ddnss.de
.ddnss.org
.deltahost-ptr
.dh.bytemark.co.uk
.digitaloceanspaces.com
.dnsalias.com
.dnsalias.net
.dnsalias.org
.dnsup.net
.drayddns.com
.dreamhosters.com
.duckdns.org
.dyn-ip24.de
.dyndns.biz
.dyndns.info
.dyndns.org
.dyndns.tv
.dyndns.ws
.dyndns1.de
.dynv6.net
.e4.cz
.edgecompute.app
.edgekey.net
.edgesuite.net
.editorx.io
.elasticbeanstalk.com
.enterprisecloud.nu
.ewp.live
.fastlylb.net
.fastvps-server.com
.figma.site
.firebaseapp.com
.fly.dev
.freeddns.us
.freemyip.com
.freetls.fastly.net
.gehirn.ne.jp
.git-repos.de
.github.io
.githubusercontent.com
.gitlab.io
.goip.de
.gotdns.com
.gotdns.org
.gotpantheon.com
.hasura-app.io
.hasura.app
.hateblo.jp
.hatenablog.com
.hatenablog.jp
.hatenadiary.com
.hatenadiary.jp
.hatenadiary.org
.helioho.st
.heliohost.us
.herokuapp.com
.heyflow.page
.heyflow.site
.home-webserver.de
.homeftp.net
.homeftp.org
.homeip.net
.homelinux.net
.homelinux.org
.homesklep.pl
.homeunix.net
.homeunix.org
.hopto.me
.hopto.org
.hostedpi.com
.hosting-cluster.nl
.hostyhosting.io
.hypernode.io
.in-addr-arpa
.in-addr.arpa
.jote.cloud
.jotelulu.cloud
.jouwweb.site
.kaas.gg
.kasserver.com
.keymachine.de
.khplay.nl
.kicks-ass.net
.kicks-ass.org
.kinghost.net
.lcube-server.de
.leadpages.co
.linode.com
.linodeusercontent.com
.live-website.com
.lpages.co
.lpusercontent.com
.magentosite.cloud
.mcdir.me
.mcdir.ru
.mcpre.ru
.memset.net
.miniserver.com
.mittwald.info
.mittwaldserver.info
.mydatto.com
.mydatto.net
.mydbserver.com
.myftp.biz
.myftp.org
.myhome-server.de
.myradweb.net
.myrdbx.io
.myshopify.com
.myspreadshop.at
.myspreadshop.be
.myspreadshop.ca
.myspreadshop.ch
.myspreadshop.co.uk
.myspreadshop.com
.myspreadshop.com.au
.myspreadshop.de
.myspreadshop.dk
.myspreadshop.es
.myspreadshop.fi
.myspreadshop.fr
.myspreadshop.ie
.myspreadshop.it
.myspreadshop.net
.myspreadshop.nl
.myspreadshop.no
.myspreadshop.pl
.myspreadshop.se
.na4u.ru
.netlify.app
.nfshost.com
.nh-serv.co.uk
.nimsite.uk
.no-ip.biz
.no-ip.ca
.no-ip.co.uk
.no-ip.info
.no-ip.net
.no-ip.org
.noip.me
.noip.us
.notion.site
.now-dns.net
.now-dns.org
.now.sh
.nsupdate.info
.on-web.fr
.ondigitalocean.app
.onrender.com
.own.pm
.ownip.net
.ownprovider.com
.pantheonsite.io
.plesk.page
.podzone.net
.podzone.org
.pythonanywhere.com
.rackmaze.com
.rackmaze.net
.readthedocs-hosted.com
.readthedocs.io
.redirectme.net
.rhcloud.com
.sakura.ne.jp
.selfip.com
.selfip.net
.selfip.org
.sellfy.store
.serveblog.net
.servebolt.cloud
.servehttp.com
.serveminecraft.net
.servername.us
.service.one
.shopware.shop
.shopware.store
.simplesite.com
.simplesite.com.br
.simplesite.gr
.simplesite.pl
.site.rb-hosting.io
.snowflake.app
.square7.ch
.square7.de
.square7.net
.streamlit.app
.streamlitapp.com
.supabase.co
.supabase.in
.supabase.net
.svn-repos.de
.sytes.net
.trafficmanager.net
.typeform.com
.typo3server.info
.uber.space
.uk0.bigv.io
.user.fm
.usercontent.jp
.v0.build
.vercel.app
.vercel.dev
.vercel.run
.virtualserver.io
.vm.bytemark.co.uk
.vpndns.net
.vusercontent.net
.we.bs
.web.app
.webadorsite.com
.webflow.io
.webhosting.be
.website.one
.websitebuilder.online
.webspace-host.com
.webspaceconfig.de
.wixsite.com
.wixstudio.com
.wixstudio.io
.wpenginepowered.com
.xen.prgmr.com
.yandexcloud.net
.zap.cloud
.zapto.org
tigobusiness.com.ni