diff --git a/AGENTS.md b/AGENTS.md index 42e423c..6cdc8a5 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -56,6 +56,17 @@ To skip DNS lookups during testing, set `GITHUB_ACTIONS=true`. Config priority: CLI args > env vars > config file > defaults. Env var naming: `PARSEDMARC_{SECTION}_{KEY}` (e.g. `PARSEDMARC_IMAP_PASSWORD`). Section names with underscores use longest-prefix matching (`PARSEDMARC_SPLUNK_HEC_TOKEN` → `[splunk_hec] token`). Some INI keys have short aliases for env var friendliness (e.g. `[maildir] create` for `maildir_create`). File path values are expanded via `os.path.expanduser`/`os.path.expandvars`. Config can be loaded purely from env vars with no file (`PARSEDMARC_CONFIG_FILE` sets the file path). +#### Adding a config option is a commitment — justify each one from a real need + +Every new option becomes documented surface area the project has to support forever. Before adding one, be able to answer "who asked for this and what breaks without it?" with a concrete user, request, or constraint — not "someone might want to override this someday". + +**Do not pattern-match from a nearby option.** Existing overrides are not templates to copy; they exist because each had a real use case. In particular: + +- `ipinfo_url` (formerly `ip_db_url`, still accepted as a deprecated alias) exists because users self-host the MMDB when they can't reach GitHub raw. That rationale does **not** carry over to authenticated third-party APIs (IPinfo, etc.) — nobody runs a mirror of those, and adding a "mirror URL" override for one is a YAGNI pitfall. The canonical cautionary tale: a speculative `ipinfo_api_url` was added by pattern-matching the existing download-URL override, then removed in the same PR once the lack of a real use case became obvious. Don't reintroduce it; don't add its siblings for other authenticated APIs. +- "Override the base URL" and "configurable retry count" knobs almost always fall in this bucket. 
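Reviewer note: the longest-prefix env-var mapping that the surrounding AGENTS.md text describes (`PARSEDMARC_SPLUNK_HEC_TOKEN` → `[splunk_hec] token`) can be sketched as follows. `env_var_to_ini` and the abbreviated `KNOWN_SECTIONS` list are illustrative only — they are not the names or the full section list used in `parsedmarc/cli.py`.

```python
# Illustrative sketch of longest-prefix matching for PARSEDMARC_{SECTION}_{KEY}
# env vars. Section list is abbreviated; the real CLI knows every INI section.
KNOWN_SECTIONS = ["general", "imap", "splunk_hec", "maildir"]  # assumption


def env_var_to_ini(name: str):
    """Map PARSEDMARC_{SECTION}_{KEY} to (section, key), preferring the
    longest known section name that prefixes the remainder of the variable."""
    if not name.startswith("PARSEDMARC_"):
        return None
    rest = name[len("PARSEDMARC_"):].lower()
    best = None
    for section in KNOWN_SECTIONS:
        if rest.startswith(section + "_") and (best is None or len(section) > len(best)):
            best = section
    if best is None:
        return None
    return best, rest[len(best) + 1:]


print(env_var_to_ini("PARSEDMARC_SPLUNK_HEC_TOKEN"))  # -> ('splunk_hec', 'token')
print(env_var_to_ini("PARSEDMARC_IMAP_PASSWORD"))     # -> ('imap', 'password')
```

Longest-prefix matching matters because a naive split on the first `_` would resolve `PARSEDMARC_SPLUNK_HEC_TOKEN` to a nonexistent `[splunk]` section.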
Ship the hardcoded value; add the knob when a user asks, with the use case recorded in the PR. + +When you do add an option: surface it in the INI schema, the `_parse_config` branch, the `Namespace` defaults, the CLI docs (`docs/source/usage.md`), and SIGHUP-reload wiring together in one PR. Half-wired options (parsed but not consulted, or consulted but not documented) are worse than none. + ### Caching IP address info cached for 4 hours, seen aggregate report IDs cached for 1 hour (via `ExpiringDict`). diff --git a/CHANGELOG.md b/CHANGELOG.md index df7ce0c..3e55127 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,12 @@ # Changelog +## 9.10.0 + +### Changes + +- Renamed `[general] ip_db_url` to `ipinfo_url` to reflect what it actually overrides (the bundled IPinfo Lite MMDB download URL). The old name is still accepted as a deprecated alias and logs a warning on use; the env-var equivalent is now `PARSEDMARC_GENERAL_IPINFO_URL`, with `PARSEDMARC_GENERAL_IP_DB_URL` also still honored. +- Added an optional IPinfo Lite REST API path for country + ASN lookups, so deployments that want the freshest data can query the API directly instead of waiting for the next MMDB release. Configure `[general] ipinfo_api_token` (or `PARSEDMARC_GENERAL_IPINFO_API_TOKEN`) and every IP lookup hits `https://api.ipinfo.io/lite/` first. At startup the `https://ipinfo.io/me` account endpoint is hit once to validate the token and log the plan, month-to-date usage, and remaining quota at info level when the endpoint reports those fields (e.g. `IPinfo API configured — plan: Lite, usage: 12345/50000 this month, 37655 remaining`). An invalid token exits the process with a fatal error. Rate-limit (HTTP 429) and quota-exhausted (HTTP 402) responses put the API in a cooldown (honoring `Retry-After` when present, defaulting to 5 minutes for 429 and 1 hour for 402) and fall through to the bundled/cached MMDB; the first event is logged once at warning level and recovery is logged once at info level when the next lookup succeeds.
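The deprecated-alias handling this changelog entry describes can be sketched with a minimal `configparser` example. `resolve_ipinfo_url` is an illustrative helper, not the actual `_parse_config` code, which folds the same logic into its `[general]` branch.

```python
import logging
from configparser import ConfigParser

logger = logging.getLogger("parsedmarc")


def resolve_ipinfo_url(config: ConfigParser):
    """Prefer the new option name; accept the pre-9.10 alias with a warning."""
    general = config["general"]
    if "ipinfo_url" in general:
        return general["ipinfo_url"]
    if "ip_db_url" in general:
        logger.warning("[general] ip_db_url is deprecated; rename it to ipinfo_url")
        return general["ip_db_url"]
    return None


config = ConfigParser(interpolation=None)
config.read_string("[general]\nip_db_url = https://old.example/mmdb\n")
print(resolve_ipinfo_url(config))  # -> https://old.example/mmdb
```

Checking the new name first means a config that sets both silently prefers `ipinfo_url`, which is the desired migration behavior.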
Transient network errors fall through per-request without triggering a cooldown. The API token is never logged. + ## 9.9.0 ### Changes diff --git a/docs/source/usage.md b/docs/source/usage.md index 27d682e..cd9cd57 100644 --- a/docs/source/usage.md +++ b/docs/source/usage.md @@ -134,8 +134,17 @@ The full set of configuration options are: JSON output file - `ip_db_path` - str: An optional custom path to a MMDB file from IPinfo, MaxMind, or DBIP - - `ip_db_url` - str: Overrides the default download URL for the - IP-to-country database (env var: `PARSEDMARC_GENERAL_IP_DB_URL`) + - `ipinfo_url` - str: Overrides the default download URL for the + bundled IPinfo Lite MMDB (env var: + `PARSEDMARC_GENERAL_IPINFO_URL`). The pre-9.10 name `ip_db_url` is + still accepted as a deprecated alias and logs a warning. + - `ipinfo_api_token` - str: Optional [IPinfo Lite REST API] token. When + set, IP lookups hit the API first for the freshest country/ASN data + and fall back to the local MMDB on rate limit, quota exhaustion, or + network errors. An invalid token exits the process with a fatal error. + Ignored when `offline` is set. The Lite tier is free and has no + documented monthly request cap; see the IPinfo Lite docs for current + limits. (env var: `PARSEDMARC_GENERAL_IPINFO_API_TOKEN`) - `offline` - bool: Do not use online queries for geolocation or DNS. Also disables automatic downloading of the IP-to-country database and reverse DNS map. 
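A minimal `[general]` fragment exercising the options documented above might look like this; the host name and token are placeholders, and `ipinfo_url` is only needed when self-hosting the MMDB.

```ini
[general]
# Self-hosted mirror of the bundled IPinfo Lite MMDB (optional)
ipinfo_url = https://mirror.example.internal/ipinfo_lite.mmdb
# Enables API-first lookups with MMDB fallback (optional)
ipinfo_api_token = YOUR_IPINFO_LITE_TOKEN
```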
@@ -801,3 +810,4 @@ journalctl -u parsedmarc.service -r [cloudflare's public resolvers]: https://1.1.1.1/ [url encoded]: https://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters +[ipinfo lite rest api]: https://ipinfo.io/developers/lite-api diff --git a/parsedmarc/cli.py b/parsedmarc/cli.py index bef7534..25dccf8 100644 --- a/parsedmarc/cli.py +++ b/parsedmarc/cli.py @@ -51,6 +51,8 @@ from parsedmarc.mail import ( from parsedmarc.mail.graph import AuthMethod from parsedmarc.types import ParsingResults from parsedmarc.utils import ( + InvalidIPinfoAPIKey, + configure_ipinfo_api, get_base_domain, get_reverse_dns, is_mbox, @@ -397,8 +399,15 @@ def _parse_config(config: ConfigParser, opts): opts.ip_db_path = _expand_path(general_config["ip_db_path"]) else: opts.ip_db_path = None - if "ip_db_url" in general_config: - opts.ip_db_url = general_config["ip_db_url"] + if "ipinfo_url" in general_config: + opts.ipinfo_url = general_config["ipinfo_url"] + elif "ip_db_url" in general_config: + # ``ip_db_url`` is the pre-9.10 name for the same option. Accept + # it as a deprecated alias; prefer ``ipinfo_url`` going forward. 
+ opts.ipinfo_url = general_config["ip_db_url"] + logger.warning("[general] ip_db_url is deprecated; rename it to ipinfo_url") + if "ipinfo_api_token" in general_config: + opts.ipinfo_api_token = general_config["ipinfo_api_token"] if "always_use_local_files" in general_config: opts.always_use_local_files = bool( general_config.getboolean("always_use_local_files") @@ -1832,7 +1841,8 @@ def _main(): log_file=args.log_file, n_procs=1, ip_db_path=None, - ip_db_url=None, + ipinfo_url=None, + ipinfo_api_token=None, always_use_local_files=False, reverse_dns_map_path=None, reverse_dns_map_url=None, @@ -1914,10 +1924,17 @@ def _main(): load_ip_db( always_use_local_file=opts.always_use_local_files, local_file_path=opts.ip_db_path, - url=opts.ip_db_url, + url=opts.ipinfo_url, offline=opts.offline, ) + if opts.ipinfo_api_token and not opts.offline: + try: + configure_ipinfo_api(opts.ipinfo_api_token) + except InvalidIPinfoAPIKey as e: + logger.critical(str(e)) + exit(1) + load_psl_overrides( always_use_local_file=opts.always_use_local_files, local_file_path=opts.psl_overrides_path, @@ -2352,10 +2369,21 @@ def _main(): load_ip_db( always_use_local_file=new_opts.always_use_local_files, local_file_path=new_opts.ip_db_path, - url=new_opts.ip_db_url, + url=new_opts.ipinfo_url, offline=new_opts.offline, ) + # Re-apply IPinfo API settings. Passing a falsy token disables + # the API; a rotated token picks up here too. An invalid token + # is fatal even on reload — the operator asked for it. 
+ try: + configure_ipinfo_api( + new_opts.ipinfo_api_token if not new_opts.offline else None, + ) + except InvalidIPinfoAPIKey as e: + logger.critical(str(e)) + exit(1) + for k, v in vars(new_opts).items(): setattr(opts, k, v) diff --git a/parsedmarc/constants.py b/parsedmarc/constants.py index 94c0d13..5a8eeee 100644 --- a/parsedmarc/constants.py +++ b/parsedmarc/constants.py @@ -1,4 +1,4 @@ -__version__ = "9.9.0" +__version__ = "9.10.0" USER_AGENT = f"parsedmarc/{__version__}" diff --git a/parsedmarc/utils.py b/parsedmarc/utils.py index ea37172..8535399 100644 --- a/parsedmarc/utils.py +++ b/parsedmarc/utils.py @@ -16,6 +16,7 @@ import re import shutil import subprocess import tempfile +import time from datetime import datetime, timedelta, timezone from typing import Optional, TypedDict, Union, cast @@ -460,6 +461,322 @@ def load_ip_db( logger.info("Using bundled IP database") +class _IPDatabaseRecord(TypedDict): + country: Optional[str] + asn: Optional[int] + asn_name: Optional[str] + asn_domain: Optional[str] + + +class InvalidIPinfoAPIKey(Exception): + """Raised when the IPinfo API rejects the configured token.""" + + +# IPinfo Lite REST API. When ``_IPINFO_API_TOKEN`` is set, ``get_ip_address_db_record()`` +# queries the API first and falls through to the bundled/cached MMDB only on +# rate-limit/quota/network errors. A 401/403 on any lookup propagates as +# ``InvalidIPinfoAPIKey`` so the CLI exits fatally; callers of the library +# should catch it. +_IPINFO_API_URL = "https://api.ipinfo.io/lite" +# Account-info / quota endpoint. Separate from the lookup URL because ``/me`` +# lives at the ipinfo.io root, not under ``/lite``. Hitting it at startup +# both validates the token and surfaces plan/usage details; IPinfo documents +# it as a quota-free meta endpoint. 
+_IPINFO_ACCOUNT_URL = "https://ipinfo.io/me" +_IPINFO_API_TOKEN: Optional[str] = None +_IPINFO_API_TIMEOUT: float = 5.0 +# Default cooldowns when the API returns 429/402 without a ``Retry-After`` +# header. Rate limits are usually short; quota resets (402) are typically at a +# day/month boundary, so we pick a longer default there. +_IPINFO_API_RATE_LIMIT_COOLDOWN_SECONDS: float = 300.0 +_IPINFO_API_QUOTA_COOLDOWN_SECONDS: float = 3600.0 +# Unix timestamp before which lookups skip the API and go straight to the +# MMDB. ``0`` means the API is currently available. +_IPINFO_API_COOLDOWN_UNTIL: float = 0.0 +# Latch for recovery logging: True while the API is in a rate-limited or +# quota-exhausted state, so the next successful lookup can log "recovered" +# exactly once per event. +_IPINFO_API_RATE_LIMITED: bool = False + + +def configure_ipinfo_api( + token: Optional[str], + *, + probe: bool = True, +) -> None: + """Configure the IPinfo Lite REST API as the primary source for IP lookups. + + When a token is configured, ``get_ip_address_db_record()`` hits the API + first for every lookup and falls back to the MMDB on rate-limit, quota, or + network errors. An invalid token raises ``InvalidIPinfoAPIKey`` — the CLI + catches that and exits fatally. + + Args: + token: IPinfo API token. ``None`` or empty disables the API. + probe: If ``True``, verify the token by hitting ``/me`` (and, if that + is unreachable, by looking up ``1.1.1.1``). A 401/403 raises + ``InvalidIPinfoAPIKey``; other errors are logged and the token is + still accepted so per-request fallback can take over. + """ + global _IPINFO_API_TOKEN + global _IPINFO_API_COOLDOWN_UNTIL, _IPINFO_API_RATE_LIMITED + + _IPINFO_API_TOKEN = token or None + _IPINFO_API_COOLDOWN_UNTIL = 0.0 + _IPINFO_API_RATE_LIMITED = False + + if not _IPINFO_API_TOKEN: + return + + if probe: + # Verify the token. 
Any network/quota failure here is non-fatal — we + # still accept the token and let per-request fallback handle it — but + # an invalid-key response must fail fast so operators notice + # immediately instead of seeing silent MMDB-only lookups all day. + # + # The /me meta endpoint doubles as a free-of-quota token check and a + # plan/usage lookup, so we try it first. If /me is unreachable, fall + # back to a lookup of 1.1.1.1 to validate the token. + account: Optional[dict] = None + try: + account = _ipinfo_api_account_info() + except InvalidIPinfoAPIKey: + raise + except Exception as e: + logger.debug(f"IPinfo account info fetch failed: {e}") + + if account is not None: + summary = _format_ipinfo_account_summary(account) + if summary: + logger.info(f"IPinfo API configured — {summary}") + else: + logger.info("IPinfo API configured") + return + + try: + _ipinfo_api_lookup("1.1.1.1") + except InvalidIPinfoAPIKey: + raise + except Exception as e: + logger.warning(f"IPinfo API probe failed (will fall back per-request): {e}") + else: + logger.info("IPinfo API configured") + + +def _ipinfo_api_account_info() -> Optional[dict]: + """Fetch the IPinfo ``/me`` account endpoint. + + Returns the parsed JSON dict on success, or ``None`` when the endpoint is + unreachable (network error, non-JSON body, non-2xx other than 401/403). + A 401/403 raises ``InvalidIPinfoAPIKey`` — this endpoint is the best way + to validate a token since it doesn't consume a lookup-quota unit. 
+ """ + if not _IPINFO_API_TOKEN: + return None + headers = { + "User-Agent": USER_AGENT, + "Authorization": f"Bearer {_IPINFO_API_TOKEN}", + "Accept": "application/json", + } + response = requests.get( + _IPINFO_ACCOUNT_URL, headers=headers, timeout=_IPINFO_API_TIMEOUT + ) + if response.status_code in (401, 403): + raise InvalidIPinfoAPIKey( + f"IPinfo API rejected the configured token (HTTP {response.status_code})" + ) + if not response.ok: + logger.debug(f"IPinfo /me returned HTTP {response.status_code}") + return None + try: + payload = response.json() + except ValueError: + return None + return payload if isinstance(payload, dict) else None + + +def _format_ipinfo_account_summary(account: dict) -> Optional[str]: + """Render a short, log-friendly summary of the IPinfo /me response. + + Field names in /me have varied across IPinfo plan generations, so we + probe a few aliases rather than commit to one schema. If nothing + useful is present we return ``None`` and the caller falls back to a + generic "configured" message. + """ + plan = ( + account.get("plan") + or account.get("tier") + or account.get("token_type") + or account.get("type") + ) + limit = account.get("limit") or account.get("monthly_limit") + remaining = account.get("remaining") or account.get("requests_remaining") + used = account.get("month") or account.get("month_requests") or account.get("used") + + parts = [] + if plan: + parts.append(f"plan: {plan}") + if used is not None and limit: + parts.append(f"usage: {used}/{limit} this month") + elif limit: + parts.append(f"monthly limit: {limit}") + if remaining is not None: + parts.append(f"{remaining} remaining") + return ", ".join(parts) if parts else None + + +def _parse_retry_after(response, default_seconds: float) -> float: + """Parse an HTTP ``Retry-After`` header as seconds. + + Supports the delta-seconds form. HTTP-date form is rare enough for an API + client to ignore; we just fall back to the default. 
+ """ + raw = response.headers.get("Retry-After") + if raw: + try: + return max(float(raw.strip()), 1.0) + except ValueError: + pass + return default_seconds + + +def _ipinfo_api_lookup(ip_address: str) -> Optional[_IPDatabaseRecord]: + """Look up an IP via the IPinfo Lite REST API. + + Returns the normalized record on success, or ``None`` when the API is + unavailable for any reason the caller should fall back from (network + error, 429 rate limit, 402 quota exhausted, malformed response). + + On 429/402 the API is put in a cooldown (using ``Retry-After`` when + present) so we stop hammering it, and we log once per event at warning + level. After the cooldown expires the next lookup retries transparently; + a successful retry logs "API recovered" once at info level so operators + can see service came back. + + Raises: + InvalidIPinfoAPIKey: on 401/403. Propagates to abort the run. + """ + global _IPINFO_API_COOLDOWN_UNTIL, _IPINFO_API_RATE_LIMITED + + if not _IPINFO_API_TOKEN: + return None + if _IPINFO_API_COOLDOWN_UNTIL and time.time() < _IPINFO_API_COOLDOWN_UNTIL: + return None + + url = f"{_IPINFO_API_URL}/{ip_address}" + headers = { + "User-Agent": USER_AGENT, + "Authorization": f"Bearer {_IPINFO_API_TOKEN}", + "Accept": "application/json", + } + try: + response = requests.get(url, headers=headers, timeout=_IPINFO_API_TIMEOUT) + except requests.exceptions.RequestException as e: + logger.debug(f"IPinfo API request for {ip_address} failed: {e}") + return None + + if response.status_code in (401, 403): + raise InvalidIPinfoAPIKey( + f"IPinfo API rejected the configured token (HTTP {response.status_code})" + ) + if response.status_code == 429: + cooldown = _parse_retry_after(response, _IPINFO_API_RATE_LIMIT_COOLDOWN_SECONDS) + _IPINFO_API_COOLDOWN_UNTIL = time.time() + cooldown + # First hit of a rate-limit event is visible at warning; subsequent + # 429s after cooldown-and-retry cycles stay at debug so we don't spam + # the log when a run spans a long quota 
reset. + if not _IPINFO_API_RATE_LIMITED: + logger.warning( + "IPinfo API rate limit hit; falling back to the local MMDB " + f"for {cooldown:.0f}s before retrying" + ) + _IPINFO_API_RATE_LIMITED = True + else: + logger.debug(f"IPinfo API still rate-limited; retry after {cooldown:.0f}s") + return None + if response.status_code == 402: + cooldown = _parse_retry_after(response, _IPINFO_API_QUOTA_COOLDOWN_SECONDS) + _IPINFO_API_COOLDOWN_UNTIL = time.time() + cooldown + if not _IPINFO_API_RATE_LIMITED: + logger.warning( + "IPinfo API quota exhausted; falling back to the local MMDB " + f"for {cooldown:.0f}s before retrying" + ) + _IPINFO_API_RATE_LIMITED = True + else: + logger.debug( + f"IPinfo API quota still exhausted; retry after {cooldown:.0f}s" + ) + return None + if not response.ok: + logger.debug( + f"IPinfo API returned HTTP {response.status_code} for {ip_address}" + ) + return None + + try: + payload = response.json() + except ValueError: + logger.debug(f"IPinfo API returned non-JSON for {ip_address}") + return None + if not isinstance(payload, dict): + return None + + if _IPINFO_API_RATE_LIMITED: + logger.info("IPinfo API recovered; resuming API lookups") + _IPINFO_API_RATE_LIMITED = False + _IPINFO_API_COOLDOWN_UNTIL = 0.0 + + return _normalize_ip_record(payload) + + +def _normalize_ip_record(record: dict) -> _IPDatabaseRecord: + """Normalize an IPinfo / MaxMind record to the internal shape. + + Shared between the API path and the MMDB path so both schemas produce the + same output: country as ISO code, ASN as plain int, asn_name string, + asn_domain lowercased. 
+ """ + country: Optional[str] = None + asn: Optional[int] = None + asn_name: Optional[str] = None + asn_domain: Optional[str] = None + + code = record.get("country_code") + if code is None: + nested = record.get("country") + if isinstance(nested, dict): + code = nested.get("iso_code") + if isinstance(code, str): + country = code + + raw_asn = record.get("asn") + if isinstance(raw_asn, int): + asn = raw_asn + elif isinstance(raw_asn, str) and raw_asn: + digits = raw_asn.removeprefix("AS").removeprefix("as") + if digits.isdigit(): + asn = int(digits) + if asn is None: + mm_asn = record.get("autonomous_system_number") + if isinstance(mm_asn, int): + asn = mm_asn + + name = record.get("as_name") or record.get("autonomous_system_organization") + if isinstance(name, str) and name: + asn_name = name + domain = record.get("as_domain") + if isinstance(domain, str) and domain: + asn_domain = domain.lower() + + return { + "country": country, + "asn": asn, + "asn_name": asn_name, + "asn_domain": asn_domain, + } + + def _get_ip_database_path(db_path: Optional[str]) -> str: db_paths = [ "ipinfo_lite.mmdb", @@ -505,71 +822,35 @@ def _get_ip_database_path(db_path: Optional[str]) -> str: return db_path -class _IPDatabaseRecord(TypedDict): - country: Optional[str] - asn: Optional[int] - asn_name: Optional[str] - asn_domain: Optional[str] - - def get_ip_address_db_record( ip_address: str, *, db_path: Optional[str] = None ) -> _IPDatabaseRecord: - """Look up an IP in the configured MMDB and return country + ASN fields. + """Look up an IP and return country + ASN fields. + + If the IPinfo Lite API is configured via ``configure_ipinfo_api()``, the + API is queried first; any non-fatal failure (rate limit, quota, network) + falls through to the MMDB. An invalid API token raises + ``InvalidIPinfoAPIKey`` and is not caught here. IPinfo Lite carries ``country_code``, ``as_name``, and ``as_domain`` on every record. 
MaxMind/DBIP country-only databases carry only country, so ``asn_name`` / ``asn_domain`` come back None for those users. """ + api_record = _ipinfo_api_lookup(ip_address) + if api_record is not None: + return api_record + resolved_path = _get_ip_database_path(db_path) db_reader = maxminddb.open_database(resolved_path) record = db_reader.get(ip_address) - - country: Optional[str] = None - asn: Optional[int] = None - asn_name: Optional[str] = None - asn_domain: Optional[str] = None - if isinstance(record, dict): - # Support both the IPinfo schema (flat top-level ``country_code``) and - # the MaxMind/DBIP schema (nested ``country.iso_code``) so users - # dropping in their own MMDB from any of these providers keeps working. - code = record.get("country_code") - if code is None: - nested = record.get("country") - if isinstance(nested, dict): - code = nested.get("iso_code") - if isinstance(code, str): - country = code - - # Normalize ASN to a plain integer. IPinfo stores it as a string like - # "AS15169"; MaxMind's ASN DB uses ``autonomous_system_number`` as an - # int. Integer form lets consumers do range queries and sort - # numerically; display-time formatting with an "AS" prefix is trivial. 
- raw_asn = record.get("asn") - if isinstance(raw_asn, int): - asn = raw_asn - elif isinstance(raw_asn, str) and raw_asn: - digits = raw_asn.removeprefix("AS").removeprefix("as") - if digits.isdigit(): - asn = int(digits) - if asn is None: - mm_asn = record.get("autonomous_system_number") - if isinstance(mm_asn, int): - asn = mm_asn - - name = record.get("as_name") or record.get("autonomous_system_organization") - if isinstance(name, str) and name: - asn_name = name - domain = record.get("as_domain") - if isinstance(domain, str) and domain: - asn_domain = domain.lower() - - return { - "country": country, - "asn": asn, - "asn_name": asn_name, - "asn_domain": asn_domain, - } + if not isinstance(record, dict): + return { + "country": None, + "asn": None, + "asn_name": None, + "asn_domain": None, + } + return _normalize_ip_record(record) def get_ip_address_country( diff --git a/tests.py b/tests.py index b964c85..561d0e1 100755 --- a/tests.py +++ b/tests.py @@ -70,10 +70,14 @@ class Test(unittest.TestCase): result = parsedmarc.utils.get_base_domain(subdomain) assert result == "example.com" - # Test newer PSL entries + # psl_overrides.txt intentionally folds CDN-customer PTRs so every + # sender on the same network clusters under one display key. + # ``.akamaiedge.net`` is an override, so its subdomains collapse to + # ``akamaiedge.net`` even though the live PSL carries the finer-grained + # ``c.akamaiedge.net`` — the override is the design decision. 
subdomain = "e3191.c.akamaiedge.net" result = parsedmarc.utils.get_base_domain(subdomain) - assert result == "c.akamaiedge.net" + assert result == "akamaiedge.net" def testExtractReportXMLComparator(self): """Test XML comparator function""" @@ -270,6 +274,137 @@ class Test(unittest.TestCase): self.assertEqual(info["name"], "Some Unmapped Org, Inc.") self.assertEqual(info["asn_domain"], "unmapped-for-this-test.example") + def testIPinfoAPIPrimarySourceAndInvalidKeyIsFatal(self): + """With an API token configured, lookups hit the API first. A 401/403 + response propagates as ``InvalidIPinfoAPIKey`` so the CLI can exit. + A 429 puts the API in a cooldown (falling back to the MMDB) and a + successful retry after the cooldown logs recovery.""" + from unittest.mock import patch, MagicMock + + import parsedmarc.utils as utils_module + from parsedmarc.utils import ( + InvalidIPinfoAPIKey, + configure_ipinfo_api, + get_ip_address_db_record, + ) + + def _mock_response(status_code, json_body=None, headers=None): + resp = MagicMock() + resp.status_code = status_code + resp.ok = 200 <= status_code < 300 + resp.json.return_value = json_body or {} + resp.headers = headers or {} + return resp + + try: + # Success: API returns IPinfo-schema JSON; record comes from API. + api_json = { + "ip": "8.8.8.8", + "asn": "AS15169", + "as_name": "Google LLC", + "as_domain": "google.com", + "country_code": "US", + } + with patch( + "parsedmarc.utils.requests.get", + return_value=_mock_response(200, api_json), + ): + configure_ipinfo_api("fake-token", probe=False) + record = get_ip_address_db_record("8.8.8.8") + self.assertEqual(record["country"], "US") + self.assertEqual(record["asn"], 15169) + self.assertEqual(record["asn_domain"], "google.com") + + # Invalid key: 401 raises a fatal exception even on a random lookup. 
+ with patch( + "parsedmarc.utils.requests.get", + return_value=_mock_response(401), + ): + configure_ipinfo_api("bad-token", probe=False) + with self.assertRaises(InvalidIPinfoAPIKey): + get_ip_address_db_record("8.8.8.8") + + # Rate limited: 429 sets a cooldown and falls back to the MMDB. + # The first rate-limit event is logged at WARNING; during the + # cooldown no further API requests are made. + configure_ipinfo_api("rate-limited", probe=False) + with patch( + "parsedmarc.utils.requests.get", + return_value=_mock_response(429, headers={"Retry-After": "120"}), + ): + with self.assertLogs("parsedmarc.log", level="WARNING") as cm: + record = get_ip_address_db_record("8.8.8.8") + # MMDB fallback fills in Google's ASN from the bundled MMDB. + self.assertEqual(record["asn"], 15169) + self.assertTrue( + any("rate limit" in line.lower() for line in cm.output), + f"expected a rate-limit warning, got: {cm.output}", + ) + self.assertTrue(utils_module._IPINFO_API_RATE_LIMITED) + self.assertGreater(utils_module._IPINFO_API_COOLDOWN_UNTIL, 0.0) + + # During cooldown: no API call; fall straight through to MMDB. + poisoned = {"asn": "AS1", "country_code": "ZZ"} + with patch( + "parsedmarc.utils.requests.get", + return_value=_mock_response(200, poisoned), + ) as mock_get: + record = get_ip_address_db_record("8.8.8.8") + mock_get.assert_not_called() + + # Simulate the cooldown expiring, then a successful retry: the + # recovery is logged at INFO and API lookups resume. 
+ utils_module._IPINFO_API_COOLDOWN_UNTIL = 0.0 + with patch( + "parsedmarc.utils.requests.get", + return_value=_mock_response(200, api_json), + ): + with self.assertLogs("parsedmarc.log", level="INFO") as cm: + record = get_ip_address_db_record("8.8.8.8") + self.assertEqual(record["asn_domain"], "google.com") + self.assertTrue( + any("recovered" in line.lower() for line in cm.output), + f"expected a recovery info log, got: {cm.output}", + ) + self.assertFalse(utils_module._IPINFO_API_RATE_LIMITED) + finally: + configure_ipinfo_api(None) + + def testIPinfoAPIStartupLogsAccountQuota(self): + """``configure_ipinfo_api(..., probe=True)`` should hit the /me + endpoint and log plan/usage info at INFO level when available.""" + from unittest.mock import patch, MagicMock + + from parsedmarc.utils import configure_ipinfo_api + + me_body = { + "plan": "Lite", + "month": 12345, + "limit": 50000, + "remaining": 37655, + } + mock_resp = MagicMock() + mock_resp.status_code = 200 + mock_resp.ok = True + mock_resp.json.return_value = me_body + mock_resp.headers = {} + + try: + with patch( + "parsedmarc.utils.requests.get", return_value=mock_resp + ) as mock_get: + with self.assertLogs("parsedmarc.log", level="INFO") as cm: + configure_ipinfo_api("good-token", probe=True) + # /me is the first (and only) probe request when it succeeds. 
+ called_urls = [args[0] for args, _ in mock_get.call_args_list] + self.assertIn("https://ipinfo.io/me", called_urls) + output = " ".join(cm.output) + self.assertIn("Lite", output) + self.assertIn("12345/50000", output) + self.assertIn("37655", output) + finally: + configure_ipinfo_api(None) + def testAggregateCsvExposesASNColumns(self): """The aggregate CSV output should include source_asn, source_asn_name, and source_asn_domain columns.""" @@ -2784,6 +2919,38 @@ class TestConfigAliases(unittest.TestCase): self.assertEqual(opts.maildir_path, "/original/path") self.assertTrue(opts.maildir_create) + def test_ipinfo_url_option(self): + """[general] ipinfo_url lands on opts.ipinfo_url.""" + from argparse import Namespace + from parsedmarc.cli import _parse_config + + config = ConfigParser(interpolation=None) + config.add_section("general") + config.set("general", "ipinfo_url", "https://mirror.example/mmdb") + + opts = Namespace() + _parse_config(config, opts) + self.assertEqual(opts.ipinfo_url, "https://mirror.example/mmdb") + + def test_ip_db_url_deprecated_alias(self): + """[general] ip_db_url is accepted as an alias for ipinfo_url but + emits a deprecation warning.""" + from argparse import Namespace + from parsedmarc.cli import _parse_config + + config = ConfigParser(interpolation=None) + config.add_section("general") + config.set("general", "ip_db_url", "https://old.example/mmdb") + + opts = Namespace() + with self.assertLogs("parsedmarc.log", level="WARNING") as cm: + _parse_config(config, opts) + self.assertEqual(opts.ipinfo_url, "https://old.example/mmdb") + self.assertTrue( + any("ip_db_url" in line and "deprecated" in line for line in cm.output), + f"expected deprecation warning, got: {cm.output}", + ) + class TestMaildirUidHandling(unittest.TestCase): """Tests for Maildir UID mismatch handling in Docker-like environments."""
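The schema normalization these tests exercise can be demonstrated standalone. `normalize` below is an illustrative reduction of `_normalize_ip_record` to the country and ASN fields only; the field names come from the IPinfo Lite and MaxMind schemas discussed in the diff.

```python
from typing import Optional


def normalize(record: dict) -> dict:
    """Normalize an IPinfo-style record (flat "country_code", "asn": "AS15169")
    or a MaxMind-style record (nested "country": {"iso_code": ...},
    "autonomous_system_number") to one shape, per parsedmarc.utils."""
    country: Optional[str] = None
    code = record.get("country_code")
    if code is None:
        nested = record.get("country")
        if isinstance(nested, dict):
            code = nested.get("iso_code")
    if isinstance(code, str):
        country = code

    asn: Optional[int] = None
    raw = record.get("asn")
    if isinstance(raw, int):
        asn = raw
    elif isinstance(raw, str) and raw.upper().startswith("AS") and raw[2:].isdigit():
        asn = int(raw[2:])
    if asn is None and isinstance(record.get("autonomous_system_number"), int):
        asn = record["autonomous_system_number"]

    return {"country": country, "asn": asn}


print(normalize({"country_code": "US", "asn": "AS15169"}))
# -> {'country': 'US', 'asn': 15169}
print(normalize({"country": {"iso_code": "DE"}, "autonomous_system_number": 3320}))
# -> {'country': 'DE', 'asn': 3320}
```

Because both schemas funnel through one normalizer, the API path and the MMDB path in the diff produce identical records, which is what lets `get_ip_address_db_record` fall back between them transparently.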