mirror of
https://github.com/domainaware/parsedmarc.git
synced 2026-04-24 06:19:29 +00:00
Add optional IPinfo Lite REST API with MMDB fallback (#717)
* Add optional IPinfo Lite REST API with MMDB fallback
Configure [general] ipinfo_api_token (or PARSEDMARC_GENERAL_IPINFO_API_TOKEN)
and every IP lookup hits https://api.ipinfo.io/lite/<ip> first for fresh
country + ASN data. On HTTP 429 (rate-limit) or 402 (quota), the API is
disabled for the rest of the run and lookups fall through to the bundled /
cached MMDB; transient network errors fall through per-request without
disabling the API. An invalid token (401/403) raises InvalidIPinfoAPIKey,
which the CLI catches and exits fatally — including at startup via a probe
lookup so operators notice misconfiguration immediately. Added
ipinfo_api_url as a base-URL override for mirrors or proxies.
The API token is never logged. A new _normalize_ip_record() helper is
shared between the API path and the MMDB path so both paths produce the
same normalized shape (country code, asn int, asn_name, asn_domain).
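The shared normalization is the load-bearing piece of the fallback design: whichever path answered, consumers see one record shape. A standalone sketch of the idea (simplified and illustrative; `normalize_ip_record` here is a stand-in for the PR's `_normalize_ip_record()`, not the exact implementation):

```python
from typing import Optional, TypedDict


class IPRecord(TypedDict):
    country: Optional[str]
    asn: Optional[int]
    asn_name: Optional[str]
    asn_domain: Optional[str]


def normalize_ip_record(record: dict) -> IPRecord:
    """Collapse IPinfo-API and MMDB record schemas into one shape."""
    # Country: IPinfo uses a flat "country_code"; MaxMind nests it
    # under {"country": {"iso_code": ...}}.
    country = record.get("country_code")
    if country is None and isinstance(record.get("country"), dict):
        country = record["country"].get("iso_code")

    # ASN: IPinfo returns strings like "AS15169"; MaxMind ASN DBs use
    # an integer "autonomous_system_number". Normalize to a plain int.
    asn: Optional[int] = None
    raw = record.get("asn") or record.get("autonomous_system_number")
    if isinstance(raw, int):
        asn = raw
    elif isinstance(raw, str) and raw.upper().startswith("AS") and raw[2:].isdigit():
        asn = int(raw[2:])

    name = record.get("as_name") or record.get("autonomous_system_organization")
    domain = record.get("as_domain")
    return {
        "country": country if isinstance(country, str) else None,
        "asn": asn,
        "asn_name": name if isinstance(name, str) else None,
        "asn_domain": domain.lower() if isinstance(domain, str) else None,
    }
```

Because both lookup paths funnel through one function, a schema quirk fixed for the API path is automatically fixed for the MMDB path too.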
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* IPinfo API: cool down and retry instead of permanent disable
Previously a single 429 or 402 disabled the API for the whole run. Now
each event sets a cooldown (using Retry-After when present, defaulting to
5 minutes for rate limits and 1 hour for quota exhaustion). Once the
cooldown expires the next lookup retries; a successful retry logs
"IPinfo API recovered" once at info level so operators can see service
came back. Repeat rate-limit responses after the first event stay at
debug to avoid log spam.
Test now targets parsedmarc.log (the actual emitting logger) instead of
the parsedmarc parent — cli._main() sets the child's level to ERROR,
and assertLogs on the parent can't see warnings filtered before
propagation. Test also exercises the cooldown-then-recovery path.
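The cooldown-then-retry behavior above reduces to a small state machine. A minimal self-contained sketch under assumed semantics (the class and method names are illustrative; parsedmarc tracks this in module-level globals instead):

```python
import time
from typing import Optional


class ApiCooldown:
    """Track a retry-after cooldown plus a one-shot recovery latch."""

    def __init__(self, now=time.time):
        self._now = now        # injectable clock, so tests can fake time
        self._until = 0.0      # unix time before which the API is skipped
        self._tripped = False  # True between first failure and recovery

    def available(self) -> bool:
        return self._now() >= self._until

    def trip(self, retry_after: Optional[float], default: float) -> bool:
        """Start a cooldown. Returns True only for the first event of a
        streak (log at warning then); repeats should stay at debug."""
        self._until = self._now() + (retry_after or default)
        first = not self._tripped
        self._tripped = True
        return first

    def recovered(self) -> bool:
        """Call after a successful lookup. True exactly once per event,
        so "recovered" is logged once at info level."""
        if self._tripped:
            self._tripped = False
            self._until = 0.0
            return True
        return False
```

The injectable clock is what makes the cooldown path unit-testable without sleeping through a real Retry-After window.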
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* IPinfo API: log plan and quota from /me at startup
Configure-time probe now hits https://ipinfo.io/me first. That endpoint
is documented as quota-free, so it doubles as a no-cost token check;
we use it to both validate the token and surface plan / month-to-date
usage / remaining-quota numbers at info level:
IPinfo API configured — plan: Lite, usage: 12345/50000 this month, 37655 remaining
Field names in /me have drifted across IPinfo plan generations, so the
summary formatter probes a few aliases before giving up. If /me is
unreachable (custom mirror behind ipinfo_api_url, network error) we
fall back to the original 1.1.1.1 lookup probe, which still validates
the token and logs a generic "configured" message.
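The alias-probing summary formatter can be sketched as follows (the alias lists are guesses at schema drift, mirroring the approach described above rather than IPinfo's documented field names):

```python
from typing import Optional


def format_account_summary(account: dict) -> Optional[str]:
    """Build a log line from a /me-style payload, probing alias keys."""

    def first(*keys):
        # Return the first present, non-empty value among alias keys.
        for key in keys:
            value = account.get(key)
            if value not in (None, ""):
                return value
        return None

    plan = first("plan", "tier", "token_type", "type")
    limit = first("limit", "monthly_limit")
    used = first("month", "month_requests", "used")
    remaining = first("remaining", "requests_remaining")

    parts = []
    if plan:
        parts.append(f"plan: {plan}")
    if used is not None and limit:
        parts.append(f"usage: {used}/{limit} this month")
    if remaining is not None:
        parts.append(f"{remaining} remaining")
    # None tells the caller to fall back to a generic "configured" message.
    return ", ".join(parts) if parts else None
```

Returning `None` rather than an empty string keeps the "give up and log something generic" decision in one place, at the caller.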
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Drop speculative ipinfo_api_url override
It was added mirroring ip_db_url, but the two serve different needs.
ip_db_url has a real use (internal hosting of the MMDB); an
authenticated IPinfo API isn't something anyone mirrors, and /me was
always hardcoded anyway, making the override half-baked. YAGNI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* AGENTS.md: warn against speculative config options
New section under Configuration spelling out that every option is
permanent surface area and must come from a real user need rather than
pattern-matching a nearby option. Cites the removed ipinfo_api_url as
the canonical cautionary tale so the next session doesn't reintroduce
it, and calls out "override the base URL" / "configurable retries" as
common YAGNI traps.
Also requires that new options land fully wired in one PR (INI schema,
_parse_config, Namespace defaults, docs, SIGHUP-reload path) rather
than half-implemented.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Rename [general] ip_db_url to ipinfo_url
The bundled MMDB is specifically IPinfo Lite, so the option name
should say so. ip_db_url stays accepted as a deprecated alias and
logs a warning when used; env-var equivalents accept either spelling
via the existing PARSEDMARC_{SECTION}_{KEY} machinery.
Updated the AGENTS.md cautionary tale to refer to ipinfo_url (with
the note about the alias) so the anti-pattern example still reads
correctly post-rename.
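The deprecated-alias handling follows a common resolve-new-then-old pattern; a minimal sketch (a hypothetical helper, not parsedmarc's actual `_parse_config` code, which works on a `ConfigParser` section):

```python
import logging

logger = logging.getLogger("example")


def resolve_option(section: dict, new_key: str, old_key: str, default=None):
    """Prefer the new key; accept the old one with a deprecation warning."""
    if new_key in section:
        return section[new_key]
    if old_key in section:
        # The alias keeps existing configs working while nudging users
        # toward the new spelling.
        logger.warning(
            "[general] %s is deprecated; rename it to %s", old_key, new_key
        )
        return section[old_key]
    return default
```

Checking the new key first also defines the tie-break: if both spellings are present, the new one silently wins.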
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix testPSLDownload to reflect .akamaiedge.net override
PSL carries c.akamaiedge.net as a public suffix, but
psl_overrides.txt intentionally folds .akamaiedge.net so every
Akamai CDN-customer PTR (the aXXXX-XX.cXXXXX.akamaiedge.net pattern)
clusters under one akamaiedge.net display key. The override was added
in 2978436 as a design decision for source attribution; the test
assertion just predates it.
Updated the comment to explain why override wins over the live PSL
here so the next reader doesn't reach for the PSL answer again.
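The override-wins clustering can be illustrated with a toy suffix matcher (illustrative only; the real lookup combines public-suffix-list machinery with psl_overrides.txt):

```python
def display_key(fqdn: str, overrides: set[str]) -> str:
    """Collapse a PTR name to its override suffix when one matches."""
    labels = fqdn.lower().split(".")
    # Scan from the full name down to ever-shorter suffixes, so the
    # longest matching override wins. An override beats whatever the
    # live PSL would say; that is what folds every *.c.akamaiedge.net
    # PTR under a single akamaiedge.net key.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in overrides:
            return candidate
    return fqdn
```

With `overrides={"akamaiedge.net"}`, every Akamai CDN-customer PTR collapses to one display key regardless of the customer-specific labels in front of it.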
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
11
AGENTS.md
@@ -56,6 +56,17 @@ To skip DNS lookups during testing, set `GITHUB_ACTIONS=true`.
 
 Config priority: CLI args > env vars > config file > defaults. Env var naming: `PARSEDMARC_{SECTION}_{KEY}` (e.g. `PARSEDMARC_IMAP_PASSWORD`). Section names with underscores use longest-prefix matching (`PARSEDMARC_SPLUNK_HEC_TOKEN` → `[splunk_hec] token`). Some INI keys have short aliases for env var friendliness (e.g. `[maildir] create` for `maildir_create`). File path values are expanded via `os.path.expanduser`/`os.path.expandvars`. Config can be loaded purely from env vars with no file (`PARSEDMARC_CONFIG_FILE` sets the file path).
 
+#### Adding a config option is a commitment — justify each one from a real need
+
+Every new option becomes documented surface area the project has to support forever. Before adding one, be able to answer "who asked for this and what breaks without it?" with a concrete user, request, or constraint — not "someone might want to override this someday".
+
+**Do not pattern-match from a nearby option.** Existing overrides are not templates to copy; they exist because each had a real use case. In particular:
+
+- `ipinfo_url` (formerly `ip_db_url`, still accepted as a deprecated alias) exists because users self-host the MMDB when they can't reach GitHub raw. That rationale does **not** carry over to authenticated third-party APIs (IPinfo, etc.) — nobody runs a mirror of those, and adding a "mirror URL" override for one is a YAGNI pitfall. The canonical cautionary tale: a speculative `ipinfo_api_url` was added by pattern-matching the existing download-URL override, then removed in the same PR once the lack of a real use case became obvious. Don't reintroduce it; don't add its siblings for other authenticated APIs.
+- "Override the base URL" and "configurable retry count" knobs almost always fall in this bucket. Ship the hardcoded value; add the knob when a user asks, with the use case recorded in the PR.
+
+When you do add an option: surface it in the INI schema, the `_parse_config` branch, the `Namespace` defaults, the CLI docs (`docs/source/usage.md`), and SIGHUP-reload wiring together in one PR. Half-wired options (parsed but not consulted, or consulted but not documented) are worse than none.
+
 ### Caching
 
 IP address info cached for 4 hours, seen aggregate report IDs cached for 1 hour (via `ExpiringDict`).
@@ -1,5 +1,12 @@
 # Changelog
 
+## 9.10.0
+
+### Changes
+
+- Renamed `[general] ip_db_url` to `ipinfo_url` to reflect what it actually overrides (the bundled IPinfo Lite MMDB download URL). The old name is still accepted as a deprecated alias and logs a warning on use; the env-var equivalent is now `PARSEDMARC_GENERAL_IPINFO_URL`, with `PARSEDMARC_GENERAL_IP_DB_URL` also still honored.
+- Added an optional IPinfo Lite REST API path for country + ASN lookups, so deployments that want the freshest data can query the API directly instead of waiting for the next MMDB release. Configure `[general] ipinfo_api_token` (or `PARSEDMARC_GENERAL_IPINFO_API_TOKEN`) and every IP lookup hits `https://api.ipinfo.io/lite/<ip>` first. At startup the `https://ipinfo.io/me` account endpoint is hit once to validate the token and log the plan, month-to-date usage, and remaining quota at info level (e.g. `IPinfo API configured — plan: Lite, usage: 12345/50000 this month, 37655 remaining`). An invalid token exits the process with a fatal error. Rate-limit (HTTP 429) and quota-exhausted (HTTP 402) responses put the API in a cooldown (honoring `Retry-After`, with a 5-minute / 1-hour default) and fall through to the bundled/cached MMDB; the first event is logged once at warning level and recovery is logged once at info level when the next lookup succeeds. Transient network errors fall through per-request without triggering a cooldown. The API token is never logged.
+
 ## 9.9.0
 
 ### Changes
@@ -134,8 +134,17 @@ The full set of configuration options are:
   JSON output file
 - `ip_db_path` - str: An optional custom path to a MMDB file
   from IPinfo, MaxMind, or DBIP
-- `ip_db_url` - str: Overrides the default download URL for the
-  IP-to-country database (env var: `PARSEDMARC_GENERAL_IP_DB_URL`)
+- `ipinfo_url` - str: Overrides the default download URL for the
+  bundled IPinfo Lite MMDB (env var:
+  `PARSEDMARC_GENERAL_IPINFO_URL`). The pre-9.10 name `ip_db_url` is
+  still accepted as a deprecated alias and logs a warning.
+- `ipinfo_api_token` - str: Optional [IPinfo Lite REST API] token. When
+  set, IP lookups hit the API first for the freshest country/ASN data
+  and fall back to the local MMDB on rate limit, quota exhaustion, or
+  network errors. An invalid token exits the process with a fatal error.
+  Ignored when `offline` is set. The Lite tier is free and has no
+  documented monthly request cap; see the IPinfo Lite docs for current
+  limits. (env var: `PARSEDMARC_GENERAL_IPINFO_API_TOKEN`)
 - `offline` - bool: Do not use online queries for geolocation
   or DNS. Also disables automatic downloading of the IP-to-country
   database and reverse DNS map.

@@ -801,3 +810,4 @@ journalctl -u parsedmarc.service -r
 
 [cloudflare's public resolvers]: https://1.1.1.1/
 [url encoded]: https://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters
+[ipinfo lite rest api]: https://ipinfo.io/developers/lite-api
@@ -51,6 +51,8 @@ from parsedmarc.mail import (
 from parsedmarc.mail.graph import AuthMethod
 from parsedmarc.types import ParsingResults
 from parsedmarc.utils import (
+    InvalidIPinfoAPIKey,
+    configure_ipinfo_api,
     get_base_domain,
     get_reverse_dns,
     is_mbox,

@@ -397,8 +399,15 @@ def _parse_config(config: ConfigParser, opts):
            opts.ip_db_path = _expand_path(general_config["ip_db_path"])
         else:
             opts.ip_db_path = None
-        if "ip_db_url" in general_config:
-            opts.ip_db_url = general_config["ip_db_url"]
+        if "ipinfo_url" in general_config:
+            opts.ipinfo_url = general_config["ipinfo_url"]
+        elif "ip_db_url" in general_config:
+            # ``ip_db_url`` is the pre-9.10 name for the same option. Accept
+            # it as a deprecated alias; prefer ``ipinfo_url`` going forward.
+            opts.ipinfo_url = general_config["ip_db_url"]
+            logger.warning("[general] ip_db_url is deprecated; rename it to ipinfo_url")
+        if "ipinfo_api_token" in general_config:
+            opts.ipinfo_api_token = general_config["ipinfo_api_token"]
         if "always_use_local_files" in general_config:
             opts.always_use_local_files = bool(
                 general_config.getboolean("always_use_local_files")
@@ -1832,7 +1841,8 @@ def _main():
        log_file=args.log_file,
        n_procs=1,
        ip_db_path=None,
-       ip_db_url=None,
+       ipinfo_url=None,
+       ipinfo_api_token=None,
        always_use_local_files=False,
        reverse_dns_map_path=None,
        reverse_dns_map_url=None,
@@ -1914,10 +1924,17 @@ def _main():
     load_ip_db(
         always_use_local_file=opts.always_use_local_files,
         local_file_path=opts.ip_db_path,
-        url=opts.ip_db_url,
+        url=opts.ipinfo_url,
         offline=opts.offline,
     )
 
+    if opts.ipinfo_api_token and not opts.offline:
+        try:
+            configure_ipinfo_api(opts.ipinfo_api_token)
+        except InvalidIPinfoAPIKey as e:
+            logger.critical(str(e))
+            exit(1)
+
     load_psl_overrides(
         always_use_local_file=opts.always_use_local_files,
         local_file_path=opts.psl_overrides_path,
@@ -2352,10 +2369,21 @@ def _main():
             load_ip_db(
                 always_use_local_file=new_opts.always_use_local_files,
                 local_file_path=new_opts.ip_db_path,
-                url=new_opts.ip_db_url,
+                url=new_opts.ipinfo_url,
                 offline=new_opts.offline,
             )
 
+            # Re-apply IPinfo API settings. Passing a falsy token disables
+            # the API; a rotated token picks up here too. An invalid token
+            # is fatal even on reload — the operator asked for it.
+            try:
+                configure_ipinfo_api(
+                    new_opts.ipinfo_api_token if not new_opts.offline else None,
+                )
+            except InvalidIPinfoAPIKey as e:
+                logger.critical(str(e))
+                exit(1)
+
             for k, v in vars(new_opts).items():
                 setattr(opts, k, v)
@@ -1,4 +1,4 @@
-__version__ = "9.9.0"
+__version__ = "9.10.0"
 
 USER_AGENT = f"parsedmarc/{__version__}"
 
@@ -16,6 +16,7 @@ import re
 import shutil
 import subprocess
 import tempfile
+import time
 from datetime import datetime, timedelta, timezone
 from typing import Optional, TypedDict, Union, cast
 
@@ -460,6 +461,322 @@ def load_ip_db(
         logger.info("Using bundled IP database")
 
 
+class _IPDatabaseRecord(TypedDict):
+    country: Optional[str]
+    asn: Optional[int]
+    asn_name: Optional[str]
+    asn_domain: Optional[str]
+
+
+class InvalidIPinfoAPIKey(Exception):
+    """Raised when the IPinfo API rejects the configured token."""
+
+
+# IPinfo Lite REST API. When ``_IPINFO_API_TOKEN`` is set, ``get_ip_address_db_record()``
+# queries the API first and falls through to the bundled/cached MMDB only on
+# rate-limit/quota/network errors. A 401/403 on any lookup propagates as
+# ``InvalidIPinfoAPIKey`` so the CLI exits fatally; callers of the library
+# should catch it.
+_IPINFO_API_URL = "https://api.ipinfo.io/lite"
+# Account-info / quota endpoint. Separate from the lookup URL because ``/me``
+# lives at the ipinfo.io root, not under ``/lite``. Hitting it at startup
+# both validates the token and surfaces plan/usage details; IPinfo documents
+# it as a quota-free meta endpoint.
+_IPINFO_ACCOUNT_URL = "https://ipinfo.io/me"
+_IPINFO_API_TOKEN: Optional[str] = None
+_IPINFO_API_TIMEOUT: float = 5.0
+# Default cooldowns when the API returns 429/402 without a ``Retry-After``
+# header. Rate limits are usually short; quota resets (402) are typically at a
+# day/month boundary, so we pick a longer default there.
+_IPINFO_API_RATE_LIMIT_COOLDOWN_SECONDS: float = 300.0
+_IPINFO_API_QUOTA_COOLDOWN_SECONDS: float = 3600.0
+# Unix timestamp before which lookups skip the API and go straight to the
+# MMDB. ``0`` means the API is currently available.
+_IPINFO_API_COOLDOWN_UNTIL: float = 0.0
+# Latch for recovery logging: True while the API is in a rate-limited or
+# quota-exhausted state, so the next successful lookup can log "recovered"
+# exactly once per event.
+_IPINFO_API_RATE_LIMITED: bool = False
+
+
+def configure_ipinfo_api(
+    token: Optional[str],
+    *,
+    probe: bool = True,
+) -> None:
+    """Configure the IPinfo Lite REST API as the primary source for IP lookups.
+
+    When a token is configured, ``get_ip_address_db_record()`` hits the API
+    first for every lookup and falls back to the MMDB on rate-limit, quota, or
+    network errors. An invalid token raises ``InvalidIPinfoAPIKey`` — the CLI
+    catches that and exits fatally.
+
+    Args:
+        token: IPinfo API token. ``None`` or empty disables the API.
+        probe: If ``True``, verify the token by hitting ``/me`` (and, if that
+            is unreachable, by looking up ``1.1.1.1``). A 401/403 raises
+            ``InvalidIPinfoAPIKey``; other errors are logged and the token is
+            still accepted so per-request fallback can take over.
+    """
+    global _IPINFO_API_TOKEN
+    global _IPINFO_API_COOLDOWN_UNTIL, _IPINFO_API_RATE_LIMITED
+
+    _IPINFO_API_TOKEN = token or None
+    _IPINFO_API_COOLDOWN_UNTIL = 0.0
+    _IPINFO_API_RATE_LIMITED = False
+
+    if not _IPINFO_API_TOKEN:
+        return
+
+    if probe:
+        # Verify the token. Any network/quota failure here is non-fatal — we
+        # still accept the token and let per-request fallback handle it — but
+        # an invalid-key response must fail fast so operators notice
+        # immediately instead of seeing silent MMDB-only lookups all day.
+        #
+        # The /me meta endpoint doubles as a free-of-quota token check and a
+        # plan/usage lookup, so we try it first. If /me is unreachable, fall
+        # back to a lookup of 1.1.1.1 to validate the token.
+        account: Optional[dict] = None
+        try:
+            account = _ipinfo_api_account_info()
+        except InvalidIPinfoAPIKey:
+            raise
+        except Exception as e:
+            logger.debug(f"IPinfo account info fetch failed: {e}")
+
+        if account is not None:
+            summary = _format_ipinfo_account_summary(account)
+            if summary:
+                logger.info(f"IPinfo API configured — {summary}")
+            else:
+                logger.info("IPinfo API configured")
+            return
+
+        try:
+            _ipinfo_api_lookup("1.1.1.1")
+        except InvalidIPinfoAPIKey:
+            raise
+        except Exception as e:
+            logger.warning(f"IPinfo API probe failed (will fall back per-request): {e}")
+        else:
+            logger.info("IPinfo API configured")
+
+
+def _ipinfo_api_account_info() -> Optional[dict]:
+    """Fetch the IPinfo ``/me`` account endpoint.
+
+    Returns the parsed JSON dict on success, or ``None`` when the endpoint is
+    unreachable (network error, non-JSON body, non-2xx other than 401/403).
+    A 401/403 raises ``InvalidIPinfoAPIKey`` — this endpoint is the best way
+    to validate a token since it doesn't consume a lookup-quota unit.
+    """
+    if not _IPINFO_API_TOKEN:
+        return None
+    headers = {
+        "User-Agent": USER_AGENT,
+        "Authorization": f"Bearer {_IPINFO_API_TOKEN}",
+        "Accept": "application/json",
+    }
+    response = requests.get(
+        _IPINFO_ACCOUNT_URL, headers=headers, timeout=_IPINFO_API_TIMEOUT
+    )
+    if response.status_code in (401, 403):
+        raise InvalidIPinfoAPIKey(
+            f"IPinfo API rejected the configured token (HTTP {response.status_code})"
+        )
+    if not response.ok:
+        logger.debug(f"IPinfo /me returned HTTP {response.status_code}")
+        return None
+    try:
+        payload = response.json()
+    except ValueError:
+        return None
+    return payload if isinstance(payload, dict) else None
+
+
+def _format_ipinfo_account_summary(account: dict) -> Optional[str]:
+    """Render a short, log-friendly summary of the IPinfo /me response.
+
+    Field names in /me have varied across IPinfo plan generations, so we
+    probe a few aliases rather than commit to one schema. If nothing
+    useful is present we return ``None`` and the caller falls back to a
+    generic "configured" message.
+    """
+    plan = (
+        account.get("plan")
+        or account.get("tier")
+        or account.get("token_type")
+        or account.get("type")
+    )
+    limit = account.get("limit") or account.get("monthly_limit")
+    remaining = account.get("remaining") or account.get("requests_remaining")
+    used = account.get("month") or account.get("month_requests") or account.get("used")
+
+    parts = []
+    if plan:
+        parts.append(f"plan: {plan}")
+    if used is not None and limit:
+        parts.append(f"usage: {used}/{limit} this month")
+    elif limit:
+        parts.append(f"monthly limit: {limit}")
+    if remaining is not None:
+        parts.append(f"{remaining} remaining")
+    return ", ".join(parts) if parts else None
+
+
+def _parse_retry_after(response, default_seconds: float) -> float:
+    """Parse an HTTP ``Retry-After`` header as seconds.
+
+    Supports the delta-seconds form. HTTP-date form is rare enough for an API
+    client to ignore; we just fall back to the default.
+    """
+    raw = response.headers.get("Retry-After")
+    if raw:
+        try:
+            return max(float(raw.strip()), 1.0)
+        except ValueError:
+            pass
+    return default_seconds
+
+
+def _ipinfo_api_lookup(ip_address: str) -> Optional[_IPDatabaseRecord]:
+    """Look up an IP via the IPinfo Lite REST API.
+
+    Returns the normalized record on success, or ``None`` when the API is
+    unavailable for any reason the caller should fall back from (network
+    error, 429 rate limit, 402 quota exhausted, malformed response).
+
+    On 429/402 the API is put in a cooldown (using ``Retry-After`` when
+    present) so we stop hammering it, and we log once per event at warning
+    level. After the cooldown expires the next lookup retries transparently;
+    a successful retry logs "API recovered" once at info level so operators
+    can see service came back.
+
+    Raises:
+        InvalidIPinfoAPIKey: on 401/403. Propagates to abort the run.
+    """
+    global _IPINFO_API_COOLDOWN_UNTIL, _IPINFO_API_RATE_LIMITED
+
+    if not _IPINFO_API_TOKEN:
+        return None
+    if _IPINFO_API_COOLDOWN_UNTIL and time.time() < _IPINFO_API_COOLDOWN_UNTIL:
+        return None
+
+    url = f"{_IPINFO_API_URL}/{ip_address}"
+    headers = {
+        "User-Agent": USER_AGENT,
+        "Authorization": f"Bearer {_IPINFO_API_TOKEN}",
+        "Accept": "application/json",
+    }
+    try:
+        response = requests.get(url, headers=headers, timeout=_IPINFO_API_TIMEOUT)
+    except requests.exceptions.RequestException as e:
+        logger.debug(f"IPinfo API request for {ip_address} failed: {e}")
+        return None
+
+    if response.status_code in (401, 403):
+        raise InvalidIPinfoAPIKey(
+            f"IPinfo API rejected the configured token (HTTP {response.status_code})"
+        )
+    if response.status_code == 429:
+        cooldown = _parse_retry_after(response, _IPINFO_API_RATE_LIMIT_COOLDOWN_SECONDS)
+        _IPINFO_API_COOLDOWN_UNTIL = time.time() + cooldown
+        # First hit of a rate-limit event is visible at warning; subsequent
+        # 429s after cooldown-and-retry cycles stay at debug so we don't spam
+        # the log when a run spans a long quota reset.
+        if not _IPINFO_API_RATE_LIMITED:
+            logger.warning(
+                "IPinfo API rate limit hit; falling back to the local MMDB "
+                f"for {cooldown:.0f}s before retrying"
+            )
+            _IPINFO_API_RATE_LIMITED = True
+        else:
+            logger.debug(f"IPinfo API still rate-limited; retry after {cooldown:.0f}s")
+        return None
+    if response.status_code == 402:
+        cooldown = _parse_retry_after(response, _IPINFO_API_QUOTA_COOLDOWN_SECONDS)
+        _IPINFO_API_COOLDOWN_UNTIL = time.time() + cooldown
+        if not _IPINFO_API_RATE_LIMITED:
+            logger.warning(
+                "IPinfo API quota exhausted; falling back to the local MMDB "
+                f"for {cooldown:.0f}s before retrying"
+            )
+            _IPINFO_API_RATE_LIMITED = True
+        else:
+            logger.debug(
+                f"IPinfo API quota still exhausted; retry after {cooldown:.0f}s"
+            )
+        return None
+    if not response.ok:
+        logger.debug(
+            f"IPinfo API returned HTTP {response.status_code} for {ip_address}"
+        )
+        return None
+
+    try:
+        payload = response.json()
+    except ValueError:
+        logger.debug(f"IPinfo API returned non-JSON for {ip_address}")
+        return None
+    if not isinstance(payload, dict):
+        return None
+
+    if _IPINFO_API_RATE_LIMITED:
+        logger.info("IPinfo API recovered; resuming API lookups")
+        _IPINFO_API_RATE_LIMITED = False
+    _IPINFO_API_COOLDOWN_UNTIL = 0.0
+
+    return _normalize_ip_record(payload)
+
+
+def _normalize_ip_record(record: dict) -> _IPDatabaseRecord:
+    """Normalize an IPinfo / MaxMind record to the internal shape.
+
+    Shared between the API path and the MMDB path so both schemas produce the
+    same output: country as ISO code, ASN as plain int, asn_name string,
+    asn_domain lowercased.
+    """
+    country: Optional[str] = None
+    asn: Optional[int] = None
+    asn_name: Optional[str] = None
+    asn_domain: Optional[str] = None
+
+    code = record.get("country_code")
+    if code is None:
+        nested = record.get("country")
+        if isinstance(nested, dict):
+            code = nested.get("iso_code")
+    if isinstance(code, str):
+        country = code
+
+    raw_asn = record.get("asn")
+    if isinstance(raw_asn, int):
+        asn = raw_asn
+    elif isinstance(raw_asn, str) and raw_asn:
+        digits = raw_asn.removeprefix("AS").removeprefix("as")
+        if digits.isdigit():
+            asn = int(digits)
+    if asn is None:
+        mm_asn = record.get("autonomous_system_number")
+        if isinstance(mm_asn, int):
+            asn = mm_asn
+
+    name = record.get("as_name") or record.get("autonomous_system_organization")
+    if isinstance(name, str) and name:
+        asn_name = name
+    domain = record.get("as_domain")
+    if isinstance(domain, str) and domain:
+        asn_domain = domain.lower()
+
+    return {
+        "country": country,
+        "asn": asn,
+        "asn_name": asn_name,
+        "asn_domain": asn_domain,
+    }
+
+
 def _get_ip_database_path(db_path: Optional[str]) -> str:
     db_paths = [
         "ipinfo_lite.mmdb",
@@ -505,71 +822,35 @@ def _get_ip_database_path(db_path: Optional[str]) -> str:
     return db_path
 
 
-class _IPDatabaseRecord(TypedDict):
-    country: Optional[str]
-    asn: Optional[int]
-    asn_name: Optional[str]
-    asn_domain: Optional[str]
-
-
 def get_ip_address_db_record(
     ip_address: str, *, db_path: Optional[str] = None
 ) -> _IPDatabaseRecord:
-    """Look up an IP in the configured MMDB and return country + ASN fields.
+    """Look up an IP and return country + ASN fields.
+
+    If the IPinfo Lite API is configured via ``configure_ipinfo_api()``, the
+    API is queried first; any non-fatal failure (rate limit, quota, network)
+    falls through to the MMDB. An invalid API token raises
+    ``InvalidIPinfoAPIKey`` and is not caught here.
 
     IPinfo Lite carries ``country_code``, ``as_name``, and ``as_domain`` on
     every record. MaxMind/DBIP country-only databases carry only country, so
     ``asn_name`` / ``asn_domain`` come back None for those users.
     """
+    api_record = _ipinfo_api_lookup(ip_address)
+    if api_record is not None:
+        return api_record
+
     resolved_path = _get_ip_database_path(db_path)
     db_reader = maxminddb.open_database(resolved_path)
     record = db_reader.get(ip_address)
 
-    country: Optional[str] = None
-    asn: Optional[int] = None
-    asn_name: Optional[str] = None
-    asn_domain: Optional[str] = None
-    if isinstance(record, dict):
-        # Support both the IPinfo schema (flat top-level ``country_code``) and
-        # the MaxMind/DBIP schema (nested ``country.iso_code``) so users
-        # dropping in their own MMDB from any of these providers keeps working.
-        code = record.get("country_code")
-        if code is None:
-            nested = record.get("country")
-            if isinstance(nested, dict):
-                code = nested.get("iso_code")
-        if isinstance(code, str):
-            country = code
-
-        # Normalize ASN to a plain integer. IPinfo stores it as a string like
-        # "AS15169"; MaxMind's ASN DB uses ``autonomous_system_number`` as an
-        # int. Integer form lets consumers do range queries and sort
-        # numerically; display-time formatting with an "AS" prefix is trivial.
-        raw_asn = record.get("asn")
-        if isinstance(raw_asn, int):
-            asn = raw_asn
-        elif isinstance(raw_asn, str) and raw_asn:
-            digits = raw_asn.removeprefix("AS").removeprefix("as")
-            if digits.isdigit():
-                asn = int(digits)
-        if asn is None:
-            mm_asn = record.get("autonomous_system_number")
-            if isinstance(mm_asn, int):
-                asn = mm_asn
-
-        name = record.get("as_name") or record.get("autonomous_system_organization")
-        if isinstance(name, str) and name:
-            asn_name = name
-        domain = record.get("as_domain")
-        if isinstance(domain, str) and domain:
-            asn_domain = domain.lower()
-
-    return {
-        "country": country,
-        "asn": asn,
-        "asn_name": asn_name,
-        "asn_domain": asn_domain,
-    }
+    if not isinstance(record, dict):
+        return {
+            "country": None,
+            "asn": None,
+            "asn_name": None,
+            "asn_domain": None,
+        }
+    return _normalize_ip_record(record)
 
 
 def get_ip_address_country(
171
tests.py
@@ -70,10 +70,14 @@ class Test(unittest.TestCase):
         result = parsedmarc.utils.get_base_domain(subdomain)
         assert result == "example.com"
 
-        # Test newer PSL entries
+        # psl_overrides.txt intentionally folds CDN-customer PTRs so every
+        # sender on the same network clusters under one display key.
+        # ``.akamaiedge.net`` is an override, so its subdomains collapse to
+        # ``akamaiedge.net`` even though the live PSL carries the finer-grained
+        # ``c.akamaiedge.net`` — the override is the design decision.
         subdomain = "e3191.c.akamaiedge.net"
         result = parsedmarc.utils.get_base_domain(subdomain)
-        assert result == "c.akamaiedge.net"
+        assert result == "akamaiedge.net"
 
     def testExtractReportXMLComparator(self):
         """Test XML comparator function"""
@@ -270,6 +274,137 @@ class Test(unittest.TestCase):
        self.assertEqual(info["name"], "Some Unmapped Org, Inc.")
        self.assertEqual(info["asn_domain"], "unmapped-for-this-test.example")

    def testIPinfoAPIPrimarySourceAndInvalidKeyIsFatal(self):
        """With an API token configured, lookups hit the API first. A 401/403
        response propagates as ``InvalidIPinfoAPIKey`` so the CLI can exit.
        A 429 puts the API in a cooldown (falling back to the MMDB), and a
        successful retry after the cooldown logs recovery."""
        from unittest.mock import patch, MagicMock

        import parsedmarc.utils as utils_module
        from parsedmarc.utils import (
            InvalidIPinfoAPIKey,
            configure_ipinfo_api,
            get_ip_address_db_record,
        )

        def _mock_response(status_code, json_body=None, headers=None):
            resp = MagicMock()
            resp.status_code = status_code
            resp.ok = 200 <= status_code < 300
            resp.json.return_value = json_body or {}
            resp.headers = headers or {}
            return resp

        try:
            # Success: API returns IPinfo-schema JSON; record comes from API.
            api_json = {
                "ip": "8.8.8.8",
                "asn": "AS15169",
                "as_name": "Google LLC",
                "as_domain": "google.com",
                "country_code": "US",
            }
            with patch(
                "parsedmarc.utils.requests.get",
                return_value=_mock_response(200, api_json),
            ):
                configure_ipinfo_api("fake-token", probe=False)
                record = get_ip_address_db_record("8.8.8.8")
            self.assertEqual(record["country"], "US")
            self.assertEqual(record["asn"], 15169)
            self.assertEqual(record["asn_domain"], "google.com")

            # Invalid key: 401 raises a fatal exception even on a random lookup.
            with patch(
                "parsedmarc.utils.requests.get",
                return_value=_mock_response(401),
            ):
                configure_ipinfo_api("bad-token", probe=False)
                with self.assertRaises(InvalidIPinfoAPIKey):
                    get_ip_address_db_record("8.8.8.8")

            # Rate limited: 429 sets a cooldown and falls back to the MMDB.
            # The first rate-limit event is logged at WARNING; during the
            # cooldown no further API requests are made.
            configure_ipinfo_api("rate-limited", probe=False)
            with patch(
                "parsedmarc.utils.requests.get",
                return_value=_mock_response(429, headers={"Retry-After": "120"}),
            ):
                with self.assertLogs("parsedmarc.log", level="WARNING") as cm:
                    record = get_ip_address_db_record("8.8.8.8")
            # MMDB fallback fills in Google's ASN from the bundled MMDB.
            self.assertEqual(record["asn"], 15169)
            self.assertTrue(
                any("rate limit" in line.lower() for line in cm.output),
                f"expected a rate-limit warning, got: {cm.output}",
            )
            self.assertTrue(utils_module._IPINFO_API_RATE_LIMITED)
            self.assertGreater(utils_module._IPINFO_API_COOLDOWN_UNTIL, 0.0)

            # During cooldown: no API call; fall straight through to MMDB.
            poisoned = {"asn": "AS1", "country_code": "ZZ"}
            with patch(
                "parsedmarc.utils.requests.get",
                return_value=_mock_response(200, poisoned),
            ) as mock_get:
                record = get_ip_address_db_record("8.8.8.8")
            mock_get.assert_not_called()

            # Simulate the cooldown expiring, then a successful retry: the
            # recovery is logged at INFO and API lookups resume.
            utils_module._IPINFO_API_COOLDOWN_UNTIL = 0.0
            with patch(
                "parsedmarc.utils.requests.get",
                return_value=_mock_response(200, api_json),
            ):
                with self.assertLogs("parsedmarc.log", level="INFO") as cm:
                    record = get_ip_address_db_record("8.8.8.8")
            self.assertEqual(record["asn_domain"], "google.com")
            self.assertTrue(
                any("recovered" in line.lower() for line in cm.output),
                f"expected a recovery info log, got: {cm.output}",
            )
            self.assertFalse(utils_module._IPINFO_API_RATE_LIMITED)
        finally:
            configure_ipinfo_api(None)
    def testIPinfoAPIStartupLogsAccountQuota(self):
        """``configure_ipinfo_api(..., probe=True)`` should hit the /me
        endpoint and log plan/usage info at INFO level when available."""
        from unittest.mock import patch, MagicMock

        from parsedmarc.utils import configure_ipinfo_api

        me_body = {
            "plan": "Lite",
            "month": 12345,
            "limit": 50000,
            "remaining": 37655,
        }
        mock_resp = MagicMock()
        mock_resp.status_code = 200
        mock_resp.ok = True
        mock_resp.json.return_value = me_body
        mock_resp.headers = {}

        try:
            with patch(
                "parsedmarc.utils.requests.get", return_value=mock_resp
            ) as mock_get:
                with self.assertLogs("parsedmarc.log", level="INFO") as cm:
                    configure_ipinfo_api("good-token", probe=True)
            # /me is the first (and only) probe request when it succeeds.
            called_urls = [args[0] for args, _ in mock_get.call_args_list]
            self.assertIn("https://ipinfo.io/me", called_urls)
            output = " ".join(cm.output)
            self.assertIn("Lite", output)
            self.assertIn("12345/50000", output)
            self.assertIn("37655", output)
        finally:
            configure_ipinfo_api(None)
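The startup log line that test asserts on could be produced by a formatter along these lines. The helper name is hypothetical; only the `/me` field names (`plan`, `month`, `limit`, `remaining`) are taken from the mocked body above, and the real log wording may differ as long as it contains the asserted substrings.

```python
def format_ipinfo_quota(me):
    """Render an IPinfo /me response as a startup log message carrying
    the plan name, month/limit usage, and remaining quota."""
    return (
        "IPinfo API plan: {plan}; usage this month: {month}/{limit} "
        "({remaining} requests remaining)"
    ).format(**me)
```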
    def testAggregateCsvExposesASNColumns(self):
        """The aggregate CSV output should include source_asn, source_asn_name,
        and source_asn_domain columns."""
@@ -2784,6 +2919,38 @@ class TestConfigAliases(unittest.TestCase):
        self.assertEqual(opts.maildir_path, "/original/path")
        self.assertTrue(opts.maildir_create)

    def test_ipinfo_url_option(self):
        """[general] ipinfo_url lands on opts.ipinfo_url."""
        from argparse import Namespace
        from parsedmarc.cli import _parse_config

        config = ConfigParser(interpolation=None)
        config.add_section("general")
        config.set("general", "ipinfo_url", "https://mirror.example/mmdb")

        opts = Namespace()
        _parse_config(config, opts)
        self.assertEqual(opts.ipinfo_url, "https://mirror.example/mmdb")

    def test_ip_db_url_deprecated_alias(self):
        """[general] ip_db_url is accepted as an alias for ipinfo_url but
        emits a deprecation warning."""
        from argparse import Namespace
        from parsedmarc.cli import _parse_config

        config = ConfigParser(interpolation=None)
        config.add_section("general")
        config.set("general", "ip_db_url", "https://old.example/mmdb")

        opts = Namespace()
        with self.assertLogs("parsedmarc.log", level="WARNING") as cm:
            _parse_config(config, opts)
        self.assertEqual(opts.ipinfo_url, "https://old.example/mmdb")
        self.assertTrue(
            any("ip_db_url" in line and "deprecated" in line for line in cm.output),
            f"expected deprecation warning, got: {cm.output}",
        )


class TestMaildirUidHandling(unittest.TestCase):
    """Tests for Maildir UID mismatch handling in Docker-like environments."""