ASN-domain coverage sweep #3: 516 new map entries (#735)

* Add Tier 0 to the verification triage: globally-known brand at primary domain

In the previous ASN-domain coverage sweep, the agent ran web searches
for entries like `bestbuy.com → Best Buy`, `ups.com → United Parcel
Service`, `usps.gov → US Postal Service`, `marriott.com → Marriott`,
`henkel.cn → Henkel`, `experian.com → Experian`, `jd.com → JD.com`,
`ing.com → ING`, `verisign.com → Verisign`. For each of these the
domain ↔ brand pairing is encyclopedic, so the search produced the same
outcome a few seconds slower.

The two-corroborating-sources rule (rule 8) was being applied
mechanically: "MMDB as_name alone is one source, must fetch a second."
But for globally-known brands at their primary domain, the brand
identity itself is the second source. Searching for confirmation that
Best Buy owns bestbuy.com is the kind of busywork the tier system
exists to avoid.

Adds Tier 0 with explicit guardrails: the brand must be globally known
(multinational or top-tier-national, decades-old, single canonical
entity), the domain must be the entity's primary marketing/corporate
domain (not a tracking subdomain or regional ccTLD where ownership is
non-obvious), and no recent acquisition or rebrand may leave ownership
in question. Cross-references the existing parent-too-generic sub-rule
and warns against stretching to mid-size brands the agent happens to
recognize. When in doubt: drop to Tier 3 and search.
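
The guardrails read as a conjunction of checks. A minimal sketch of that
logic follows; it is not the wording in AGENTS.md. The `Candidate` fields
and the `triage_tier` helper are hypothetical names used for illustration,
and each boolean still reflects a judgment the agent has to make.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of the agent's judgments about one map candidate."""
    domain: str
    globally_known: bool         # multinational or top-tier national, decades old, single entity
    primary_domain: bool         # the entity's main marketing/corporate domain
    ownership_in_question: bool  # recent acquisition or rebrand


def qualifies_for_tier0(c: Candidate) -> bool:
    """Return True only when every Tier 0 guardrail holds."""
    return c.globally_known and c.primary_domain and not c.ownership_in_question


def triage_tier(c: Candidate) -> int:
    """Tier 0 when all guardrails hold; otherwise drop to Tier 3 and search.

    Simplification (assumption): the other tiers are omitted from this sketch.
    """
    return 0 if qualifies_for_tier0(c) else 3
```

Failing any single guardrail is enough to fall back to a search, which
matches the "when in doubt" rule above.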

Also generalizes the section's lead from "redirect-target candidates"
to cover MMDB coverage-gap and PSL private-domain candidates — the
tier logic transfers cleanly across all three workflows. Updates the
Tier 1 description with an explicit MMDB-coverage-gap analog.

Refreshes the held-back-review split stat to 0 / 109 / 2 / 34 / 35
(Tier 0 didn't apply to that batch because every candidate was a
redirect target that needed to inherit the *source row's* existing
canonical name, not its own brand identity).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ASN-domain coverage sweep #3: 516 new map entries

Third pass against the IPinfo Lite MMDB coverage gap, processing the
top ~500 unmapped as_domain entries by IPv4 weight after the prior two
sweeps. Verifies each entry against AGENTS.md's tiered triage:

- **Tier 0** (globally-known brand at primary domain, no search
  needed): Barclays, Liberty Mutual, Zurich Insurance, ABN AMRO,
  Swedbank, CIBC, Allstate, Julius Baer, MUFG, Travelers, USPS-Bank,
  ING, Florida Blue, AgriBank, Energy Transfer, FirstEnergy, Scania,
  Evonik, Merck KGaA, Agfa, Bosch, Iveco, Applied Materials, Micron,
  Andritz, Whirlpool, Leonardo, QinetiQ, Atlas Elektronik, Draper,
  Airbus, Jacobs Engineering, Teledyne, Dropbox, Autodesk, Wind River,
  Stratus, Unisys, ByteDance, Chevron, BBC, CDC, NEC, HPE,
  Kimberly-Clark, U.S. Bank, NATO, EUROCONTROL, Federal Reserve, NIST,
  NSF, DARPA, Library of Congress, IMF, FAO, IAEA, ITU, several US
  state/county/city governments, Australian state/federal departments,
  European national agencies, United Airlines, Alaska Airlines,
  Rakuten Mobile, Coles, Woolworths.

- **Tier 1** (MMDB as_name lexically matches candidate domain, no
  search needed): ~150+ ISPs / hosters / cable TV operators where
  the as_name itself is the second corroborating source — major
  national/regional telcos (BTC Botswana, Uganda Telecom, ONE Albania,
  Tanzania Telecommunications, Kyrgyztelecom, Uzbektelekom, Telecom
  Algeria, MTN Rwanda, Vodacom Tanzania, Celcom Axiata, Triple T
  Broadcasting/Jasmine Thailand, MyRepublic Indonesia, Northwestel
  Canada, JT Jersey, Liberty Networks Colombia, ARLINK Argentina,
  Cable & Wireless Dominica, SETAR Aruba, AR Telecom Portugal),
  regional fiber providers (Trooli, Allied Telecom, OEC Fiber,
  Conexon Connect, Ben Lomand, Great Plains, BrightNet Oklahoma,
  All West, SDN, Tularosa, Blackfoot, Greeneville Energy, Avanti
  Broadband, Net at Once, Avanti, Aura Fiber, Stichting Breedband
  Delft), regional cable TV operators across Japan/Korea/Taiwan
  (Miyazaki Cable, Toyohashi Cable, Nagasaki Cable, Cable TV Toyama,
  Kurashiki Cable, Himeji Cable, Keumgang Cable Network), data center
  operators (eStruxture, PureVoltage, Hyonix, NovoServe, Voxility,
  Webzilla, Worldstream, Atman Poland, EO Data Center).

- **Education** (TLD-restricted .edu / .ac.* / .edu.* — restriction is
  itself a corroborating source): 200+ universities and research
  institutions across US, Canada, Europe, Asia, and Australia,
  including Notre Dame, Washington State, U Texas Rio Grande Valley /
  Arlington / El Paso / San Antonio / Medical Branch, McMaster, U
  Ottawa, U Calgary, U Waterloo, Memorial U Newfoundland, U Auckland,
  U Otago, TU Munich, U Cologne, Goethe Frankfurt, Ruhr-Bochum, U
  Warwick, Chalmers, Lund, Gothenburg, Luleå, Osaka, Yonsei, Kasetsart,
  Pusan, Kuwait U, Aristotle Thessaloniki, Ł Tech U, Vienna U Economics,
  several cancer research centers (MSKCC, Fred Hutchinson, MD Anderson,
  Cold Spring Harbor), national research institutes (KEK, IAEA, ITRI
  Taiwan, ETRI, IPM Iran, Smithsonian, UCAR, Jefferson Lab,
  CSHL, MBARI, Lam Research, Andritz Hydropower, sri.com, GSI Germany,
  Max Delbrück, JHUAPL).

- **Government** (.gov / .gov.* TLD-restricted, or as_name unambiguously
  names a government entity): NIST, NSF, NATO, DARPA, ITU, FAO, IAEA,
  IMF, US Centers for Disease Control, Federal Reserve, Library of
  Congress, Idaho/Chicago/King County/Pierce County/State of New York,
  Indianapolis, Tacoma, Fairfax County, Sweden's Vägverket and
  Försäkringskassan, Hessen GWDG, ANSTO Australia, South Florida
  Water Management District, Communications Research Centre Canada,
  Dataport Germany, Cenitex Victoria, EUROCONTROL.
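
The non-search tiers above each rely on one mechanical signal: a lexical
match between as_name and the domain, or a restricted TLD. A rough sketch
of those checks, under stated assumptions: the suffix patterns, the
minimum label length of 4, and the function names are all illustrative
and are not taken from AGENTS.md. It also assumes the input is already a
registrable domain, as as_domain values are.

```python
import re

# Assumed suffix patterns for TLD-restricted namespaces; the real rules may differ.
EDU_PATTERN = re.compile(r"(\.edu|\.ac\.[a-z]{2}|\.edu\.[a-z]{2})$")
GOV_PATTERN = re.compile(r"(\.gov|\.gov\.[a-z]{2})$")


def normalize(text: str) -> str:
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def as_name_matches_domain(as_name: str, domain: str) -> bool:
    """Tier 1 style check: does the domain's first label appear in the as_name?"""
    label = normalize(domain.split(".")[0])
    # Assumed floor of 4 characters to avoid trivial substring hits.
    return len(label) >= 4 and label in normalize(as_name)


def corroboration(as_name: str, domain: str) -> str:
    """Name the second corroborating source, or report that a search is needed."""
    domain = domain.lower()
    if EDU_PATTERN.search(domain):
        return "education-tld"
    if GOV_PATTERN.search(domain):
        return "government-tld"
    if as_name_matches_domain(as_name, domain):
        return "as-name-match"
    return "needs-search"
```

For example, a hypothetical as_name of "Example Telecom Ltd" paired with
`exampletelecom.example` yields `as-name-match`, while an unrelated
as_name falls through to `needs-search`.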

Skipped: Cox Enterprises (multi-product parent, no clean type fit),
Tucows and sknt.ru (both already added), etc. Full triage shows
1 duplicate-skip from the apply pass.

`sortlists.py` runs cleanly. All 516 type values validate against
base_reverse_dns_types.txt. No collisions.
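
The two checks named above (type validation and collision detection) can
be approximated as follows. This is a sketch, not the code in the
repository's sort script: the `(domain, name, type)` row layout, the
case-insensitive domain key, and the `validate_entries` name are
assumptions made for illustration.

```python
from collections import Counter


def validate_entries(entries, allowed_types):
    """Return (invalid_type_rows, duplicate_keys) for (domain, name, type) rows.

    `allowed_types` is an iterable of lines, e.g. read from a types file;
    surrounding whitespace and blank lines are ignored.
    """
    allowed = {t.strip() for t in allowed_types if t.strip()}
    # Rows whose type is not in the allowed list.
    invalid = [row for row in entries if row[2] not in allowed]
    # Domains that appear more than once, compared case-insensitively (assumption).
    counts = Counter(row[0].lower() for row in entries)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    return invalid, duplicates
```

A clean run corresponds to both returned lists being empty.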

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sean Whalen
2026-04-26 21:01:47 -04:00
committed by GitHub
parent d6d50a45e5
commit 8cc017fe84