mirror of
https://github.com/domainaware/parsedmarc.git
synced 2026-05-01 17:52:31 +00:00
skip-weak-fallback-cache
7 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
f0781c6191 |
IPinfo API: keep only documented behavior (#721)
* Strip invented IPinfo API behavior; keep documented-only
The IPinfo Lite API docs (https://ipinfo.io/developers/lite-api) state: "The API has no daily or monthly limit and provides unlimited access." Auth is documented as a ?token= query param only. The /me shown in the docs returns geolocation for the caller's IP; it is not a documented account/quota endpoint for Lite.
Removed everything that was speculating beyond the docs:
- The /me probe that pretended to return plan/limit/remaining fields.
- 429 rate-limit handling, 402 quota-exhausted handling, Retry-After parsing, cooldown state, and the rate-limit warning / recovery-info logging around them.
- The Authorization: Bearer header (not documented for Lite).
Kept:
- Lookups against the documented /lite/<ip>?token=<token> endpoint.
- 401/403 treated as a fatal invalid-token (reasonable defensive check).
- Network-error and non-2xx fallback to the bundled/cached MMDB.
- A simple startup probe that validates the token with a single lookup and logs "IPinfo API configured" at info level.
Test consolidated to cover only documented paths: success, 401 fatal, non-2xx fallback, and that auth goes in ?token= (not Authorization).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* AGENTS.md: warn against speculating past external-service docs
New subsection under Configuration spelling out that third-party API integrations must start with a direct WebFetch of the canonical docs page, not a subagent query. Calls out the two traps that produced the IPinfo speculation: (1) asking subagents question shapes that presuppose the answer exists, and (2) treating feature asks as "build this" without first checking "does this apply to this service?". Uses the now-reverted IPinfo speculation as the cautionary tale so the next session has a concrete example to recognize the shape of the mistake.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Bump to 9.10.1; put removal under a new CHANGELOG section
Restored the 9.10.0 entry to its as-shipped wording and moved the speculation-removal note into its own 9.10.1 Fixed section. Editing the 9.10.0 entry would have misrepresented what was actually released: the shipped tag does contain the /me probe, 429/402 cooldown, Retry-After parsing, and Bearer auth, and the changelog should say so.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
c5f432c460 |
Add optional IPinfo Lite REST API with MMDB fallback (#717)
* Add optional IPinfo Lite REST API with MMDB fallback
Configure [general] ipinfo_api_token (or PARSEDMARC_GENERAL_IPINFO_API_TOKEN)
and every IP lookup hits https://api.ipinfo.io/lite/<ip> first for fresh
country + ASN data. On HTTP 429 (rate-limit) or 402 (quota), the API is
disabled for the rest of the run and lookups fall through to the bundled /
cached MMDB; transient network errors fall through per-request without
disabling the API. An invalid token (401/403) raises InvalidIPinfoAPIKey,
which the CLI catches and exits fatally — including at startup via a probe
lookup so operators notice misconfiguration immediately. Added
ipinfo_api_url as a base-URL override for mirrors or proxies.
The API token is never logged. A new _normalize_ip_record() helper is
shared between the API path and the MMDB path so both paths produce the
same normalized shape (country code, asn int, asn_name, asn_domain).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
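The API-first lookup with MMDB fallback described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not parsedmarc's actual code: `lookup_ip`, `fetch_json`, and `mmdb_lookup` are hypothetical names, the HTTP call is injected as a callable so the control flow is visible without a network dependency, and the token is passed as a `?token=` query parameter. The 429/402 handling is deliberately left out.

```python
from typing import Callable, Optional, Tuple


class InvalidIPinfoAPIKey(Exception):
    """Raised when the API rejects the token (HTTP 401 or 403)."""


def lookup_ip(
    ip: str,
    token: Optional[str],
    fetch_json: Callable[[str], Tuple[int, dict]],
    mmdb_lookup: Callable[[str], dict],
) -> dict:
    """Try the REST API first, then fall back to the local MMDB.

    fetch_json(url) is a hypothetical helper that returns
    (status_code, parsed_body) and raises OSError on network failure.
    """
    if not token:
        # No token configured: the API path is never attempted.
        return mmdb_lookup(ip)
    url = f"https://api.ipinfo.io/lite/{ip}?token={token}"
    try:
        status, body = fetch_json(url)
    except OSError:
        # Transient network error: fall through for this request only.
        return mmdb_lookup(ip)
    if status in (401, 403):
        # The message intentionally omits the token so it is never logged.
        raise InvalidIPinfoAPIKey("IPinfo API rejected the configured token")
    if not 200 <= status < 300:
        return mmdb_lookup(ip)
    return body
```

Injecting the two callables keeps the sketch testable offline; a real implementation would also pass both results through a shared normalizer so the API and MMDB paths return the same shape.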
* IPinfo API: cool down and retry instead of permanent disable
Previously a single 429 or 402 disabled the API for the whole run. Now
each event sets a cooldown (using Retry-After when present, defaulting to
5 minutes for rate limits and 1 hour for quota exhaustion). Once the
cooldown expires the next lookup retries; a successful retry logs
"IPinfo API recovered" once at info level so operators can see service
came back. Repeat rate-limit responses after the first event stay at
debug to avoid log spam.
Test now targets parsedmarc.log (the actual emitting logger) instead of
the parsedmarc parent — cli._main() sets the child's level to ERROR,
and assertLogs on the parent can't see warnings filtered before
propagation. Test also exercises the cooldown-then-recovery path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
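The cooldown-then-retry behavior in this commit might look something like the sketch below. The class name `ApiCooldown` and its methods are hypothetical, and only the delta-seconds form of Retry-After is parsed; any other value falls back to the default. The 5-minute and 1-hour defaults come from the commit message.

```python
import time
from typing import Callable, Optional

RATE_LIMIT_DEFAULT = 5 * 60  # seconds, used for HTTP 429
QUOTA_DEFAULT = 60 * 60      # seconds, used for HTTP 402


class ApiCooldown:
    """Tracks a pause window after a 429/402 and reports recovery once."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._until = 0.0
        self._was_cooling = False

    def start(self, status: int, retry_after: Optional[str] = None) -> float:
        """Begin a cooldown and return its length in seconds."""
        default = RATE_LIMIT_DEFAULT if status == 429 else QUOTA_DEFAULT
        try:
            seconds = float(retry_after) if retry_after is not None else default
        except ValueError:
            # e.g. the HTTP-date form of Retry-After, not handled in this sketch
            seconds = default
        seconds = max(seconds, 0.0)
        self._until = self._clock() + seconds
        self._was_cooling = True
        return seconds

    def active(self) -> bool:
        """True while lookups should skip the API and use the MMDB."""
        return self._clock() < self._until

    def note_success(self) -> bool:
        """Return True exactly once after a cooldown, so the caller can
        log a single "recovered" message at info level."""
        recovered = self._was_cooling
        self._was_cooling = False
        return recovered
```

The injectable clock is a testing convenience: a test can advance time past the cooldown and exercise the recovery path without sleeping.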
* IPinfo API: log plan and quota from /me at startup
Configure-time probe now hits https://ipinfo.io/me first. That endpoint
is documented as quota-free and doubles as a free-of-quota token check,
so we use it to both validate the token and surface plan / month-to-date
usage / remaining-quota numbers at info level:
IPinfo API configured — plan: Lite, usage: 12345/50000 this month, 37655 remaining
Field names in /me have drifted across IPinfo plan generations, so the
summary formatter probes a few aliases before giving up. If /me is
unreachable (custom mirror behind ipinfo_api_url, network error) we
fall back to the original 1.1.1.1 lookup probe, which still validates
the token and logs a generic "configured" message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Drop speculative ipinfo_api_url override
It was added mirroring ip_db_url, but the two serve different needs.
ip_db_url has a real use (internal hosting of the MMDB); an
authenticated IPinfo API isn't something anyone mirrors, and /me was
always hardcoded anyway, making the override half-baked. YAGNI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* AGENTS.md: warn against speculative config options
New section under Configuration spelling out that every option is
permanent surface area and must come from a real user need rather than
pattern-matching a nearby option. Cites the removed ipinfo_api_url as
the canonical cautionary tale so the next session doesn't reintroduce
it, and calls out "override the base URL" / "configurable retries" as
common YAGNI traps.
Also requires that new options land fully wired in one PR (INI schema,
_parse_config, Namespace defaults, docs, SIGHUP-reload path) rather
than half-implemented.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Rename [general] ip_db_url to ipinfo_url
The bundled MMDB is specifically IPinfo Lite, so the option name
should say so. ip_db_url stays accepted as a deprecated alias and
logs a warning when used; env-var equivalents accept either spelling
via the existing PARSEDMARC_{SECTION}_{KEY} machinery.
Updated the AGENTS.md cautionary tale to refer to ipinfo_url (with
the note about the alias) so the anti-pattern example still reads
correctly post-rename.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
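A deprecated-alias lookup of the kind described in this rename could be sketched as below. The function name `get_ipinfo_url` and the warning text are hypothetical, and giving the new key precedence when both are set is an assumption; the commit message does not say which one wins.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger(__name__)


def get_ipinfo_url(general: Mapping[str, str]) -> Optional[str]:
    """Return the MMDB URL from a [general] section mapping.

    Prefers ipinfo_url; accepts ip_db_url as a deprecated alias and
    logs a warning when the old spelling is what supplied the value.
    """
    if "ipinfo_url" in general:
        return general["ipinfo_url"]
    if "ip_db_url" in general:
        logger.warning(
            "[general] ip_db_url is deprecated; use ipinfo_url instead"
        )
        return general["ip_db_url"]
    return None
```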
* Fix testPSLDownload to reflect .akamaiedge.net override
PSL carries c.akamaiedge.net as a public suffix, but
psl_overrides.txt intentionally folds .akamaiedge.net so every
Akamai CDN-customer PTR (the aXXXX-XX.cXXXXX.akamaiedge.net pattern)
clusters under one akamaiedge.net display key. The override was added
in
|
||
|
|
2978436d89 |
Expand reverse-DNS map and PSL overrides from the live PSL (#716)
* Expand reverse-DNS map and PSL overrides from the live PSL
Parses the private-domains section of the live Public Suffix List and adds 269 brand-owned suffixes as PSL overrides paired with map entries, so customer subdomains on shared hosting / SaaS / PaaS platforms fold to the operator's brand. Adds 33 ASN-domain entries for the subset of these brands whose IP space is registered under a different corporate domain in the MMDB, so both the PTR-derived lookup and the ASN-fallback lookup hit the same (name, type). Also normalizes ``a2hosting.com`` from ``A2Hosting`` to ``A2 Hosting`` for spelling consistency.
PTR-path wins (overrides + map entries)
- Web hosts: A2 Hosting, alwaysdata, Antagonist, Beget, bplaced, Bytemark, Combell, cyber_Folks, cyon, DreamHost, EasyWP, Gehirn, HelioHost, home.pl, HostyHosting, Hypernode, IONOS (6 suffixes), Jotelulu, JouwWeb, KaasHosting, Keyweb, LCube, LiquidNet, McHost, Memset, Mittwald, Mythic Beasts, NearlyFreeSpeech, Nimbus Hosting, One.com (20 ccTLD variants), OwnProvider, Pantheon, Planet-Work, prgmr, Rackmaze, Rad Web Hosting, Raidboxes, Servebolt, SpeedPartner, Uberspace, Whatbox, WP Engine, ZAP-Hosting, Zitcom.
- Dynamic DNS: DuckDNS, DynDNS (24), No-IP (22), Now-DNS, dynv6, freemyip, nsupdate.info, ddnss.de, GoIP, DrayTek.
- PaaS/SaaS/IaaS: Netlify, Vercel (6), Heroku, fly.io, Render, Firebase/GCP (4), Azure (5), AWS (4), DigitalOcean (2), Red Hat OpenShift, Hasura, Supabase, Snowflake/Streamlit, Read the Docs, PythonAnywhere, GitHub, GitLab, Adobe Magento.
- Hosted sites/stores: Hatena (6), Notion, Figma, Webflow, Wix (4), Shopify, Shopware, Sellfy, Spreadshop (19 ccTLDs), Datto.
- Email/Marketing: Fastmail, ActiveTrail, Leadpages, Heyflow, Carrd, Typeform.
- CDN/Technology: Akamai (7), Fastly (3), Yandex Cloud.
ASN-path wins (MMDB coverage now attributes 1,184,256 more IPv4 addresses to a named brand, 85.04% -> 85.08%): yandex.com, ya.ru, hosting.com (A2 Hosting), beget.com, cyberfolks.pl, fly.io, bytemark.co.uk, cyberfolks.ro, keyweb.de, mittwald.de, memset.com, zap-hosting.com, datto.com, jotelulu.com, yandex.cloud, github.com, asavie.com (Akamai), and 16 others.
Entries are curated from the live PSL rather than any bundled copy; brand / as_name attribution was verified against the CLAUDE.md rule that the IP-WHOIS signal is only trusted when the domain name itself matches the host's name (name-collisions in MMDB were skipped: Hypernode AU, goipgroup.com, liquidnet.com, One.com substring noise, nimbusitsolutions.com, etc.). Types follow ``base_reverse_dns_types.txt``; ``sortlists.py`` re-sorts + dedupes + validates after the batch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Document PSL-derived override workflow and load_psl_overrides gotcha
Adds three pieces of map-maintenance context learned while building this PR:
- New subsection "Discovering overrides from the live PSL private-domains section", a distinct source from live DMARC data (unknown_base_reverse_dns.csv) and MMDB coverage-gap analysis. The private section is itself a list of brand-owned suffixes; each is a candidate (psl_override + map entry) pair. Emphasizes ruthless selectivity: most of the 600+ private-section orgs are dev sandboxes or hobby zones that will never appear in DMARC reports.
- Two-path coverage as a single linked step, not two round-trips: when adding a PSL override for a hosted-content suffix (netlify.app), also add a map row for the brand's corporate as_domain (netlify.com) in the same pass. The override fixes the PTR path; the ASN-domain alias fixes the ASN-fallback path.
- The load_psl_overrides() fetch-first gotcha. The no-arg form pulls the file from master on GitHub, so end-to-end testing of local overrides silently uses the old remote version. offline=True is required to test local changes against get_base_domain().
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
2cda5bf59b |
Surface ASN info and use it for source attribution when a PTR is absent (#715)
* Surface ASN info and fall back to it when a PTR is absent
Adds three new fields to every IP source record: ``asn`` (integer, e.g. 15169), ``asn_name`` (``"Google LLC"``), and ``asn_domain`` (``"google.com"``), all sourced from the bundled IPinfo Lite MMDB. These flow through to CSV, JSON, Elasticsearch, OpenSearch, and Splunk outputs as ``source_asn``, ``source_asn_name``, ``source_asn_domain``.
More importantly: when an IP has no reverse DNS (common for many large senders), source attribution now falls back to the ASN domain as a lookup key into the same ``reverse_dns_map``. Thanks to #712 and #714, ~85% of routed IPv4 space now has an ``as_domain`` that hits the map, so rows that were previously unattributable now get a ``source_name``/``source_type`` derived from the ASN. When the ASN domain misses the map, the raw AS name is used as ``source_name`` with ``source_type`` left null, which is still better than nothing.
Crucially, ``source_reverse_dns`` and ``source_base_domain`` remain null on ASN-derived rows, so downstream consumers can still tell a PTR-resolved attribution apart from an ASN-derived one.
ASN is stored as an integer at the schema level (Elasticsearch / OpenSearch mappings use ``Integer``) so consumers can do range queries and numeric sorts; dashboards can prepend ``AS`` at display time. The MMDB reader normalizes both IPinfo's ``"AS15169"`` string and MaxMind's ``autonomous_system_number`` int to the same int form.
Also fixes a pre-existing caching bug in ``get_ip_address_info``: entries without reverse DNS were never written to the IP-info cache, so every no-PTR IP re-did the MMDB read and DNS attempt on every call. The cache write is now unconditional.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Bump to 9.9.0 and document the ASN fallback work
Updates the changelog with a 9.9.0 entry covering the ASN-domain aliases (#712, #714), map-maintenance tooling fixes (#713), and the ASN-fallback source attribution added in this branch. Extends AGENTS.md to explain that ``base_reverse_dns_map.csv`` is now a mixed-namespace map (rDNS bases alongside ASN domains) and adds a short recipe for finding high-value ASN-domain misses against the bundled MMDB, so future contributors know where the map's second lookup path comes from.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Document project conventions previously held only in agent memory
Promotes four conventions out of per-agent memory and into AGENTS.md so every contributor, human or agent, works from the same baseline:
- Run ruff check + format before committing (Code Style).
- Store natively numeric values as numbers, not pre-formatted strings (e.g. ASN as int 15169, not "AS15169"; ES/OS mappings as Integer) (Code Style).
- Before rewriting a tracked list/data file from freshly-generated content, verify the existing content via git; these files accumulate manually-curated entries across sessions (Editing tracked data files).
- A release isn't done until hatch-built sdist + wheel are attached to the GitHub release page; full 8-step sequence documented (Releases).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
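The ASN normalization and the no-PTR attribution fallback from #715 can be illustrated with a short sketch. The function names `normalize_asn` and `attribute_without_ptr` are hypothetical, the map is modelled as a plain dict from domain to (name, type), and the sample map row in the usage note is made up for illustration; none of this is taken from parsedmarc's source.

```python
from typing import Dict, Optional, Tuple, Union


def normalize_asn(value: Union[str, int, None]) -> Optional[int]:
    """Turn "AS15169" or 15169 into the int 15169; None if unparseable."""
    if value is None:
        return None
    if isinstance(value, int):
        return value
    text = value.strip().upper()
    if text.startswith("AS"):
        text = text[2:]
    # isascii() guards against Unicode digits that int() would reject.
    return int(text) if text.isascii() and text.isdigit() else None


def attribute_without_ptr(
    asn_domain: Optional[str],
    asn_name: Optional[str],
    reverse_dns_map: Dict[str, Tuple[str, str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (source_name, source_type) for an IP with no reverse DNS.

    A map hit on the ASN domain yields the mapped name and type; a miss
    yields the raw AS name with the type left as None.
    """
    if asn_domain:
        hit = reverse_dns_map.get(asn_domain.lower())
        if hit is not None:
            return hit
    return asn_name, None
```

For example, with an illustrative map `{"google.com": ("Google", "Email Provider")}`, calling `attribute_without_ptr("google.com", "Google LLC", ...)` returns the mapped pair, while an unmapped domain returns the AS name with a `None` type. The reverse-DNS fields would stay null in both cases, which is how consumers can distinguish ASN-derived rows.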
||
|
|
6effd80604 |
9.7.0 (#709)
- Auto-download psl_overrides.txt at startup (and whenever the reverse DNS map is reloaded) via load_psl_overrides(); add local_psl_overrides_path and psl_overrides_url config options
- Add collect_domain_info.py and detect_psl_overrides.py for bulk WHOIS/HTTP enrichment and automatic cluster-based PSL override detection
- Block full-IPv4 reverse-DNS entries from ever entering base_reverse_dns_map.csv, known_unknown_base_reverse_dns.txt, or unknown_base_reverse_dns.csv, and sweep pre-existing IP entries
- Add Religion and Utilities to the allowed service_type values
- Document the full map-maintenance workflow in AGENTS.md
- Substantial expansion of base_reverse_dns_map.csv (net ~+1,000 entries)
- Add 26 tests covering the new loader, IP filter, PSL fold logic, and cluster detection
Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com> |
||
|
|
1542936468 | Bump version to 9.5.4, enhance Maildir folder handling, and add config key aliases for environment variable compatibility | ||
|
|
9551c8b467 | Add AGENTS.md for AI agent guidance and link from CLAUDE.md |