mirror of
https://github.com/domainaware/parsedmarc.git
synced 2026-04-24 06:19:29 +00:00
Surface ASN info and use it for source attribution when a PTR is absent (#715)
* Surface ASN info and fall back to it when a PTR is absent

  Adds three new fields to every IP source record: ``asn`` (integer, e.g. 15169), ``asn_name`` (``"Google LLC"``), and ``asn_domain`` (``"google.com"``), sourced from the bundled IPinfo Lite MMDB. These flow through to CSV, JSON, Elasticsearch, OpenSearch, and Splunk outputs as ``source_asn``, ``source_asn_name``, and ``source_asn_domain``.

  More importantly: when an IP has no reverse DNS (common for many large senders), source attribution now falls back to the ASN domain as a lookup key into the same ``reverse_dns_map``. Thanks to #712 and #714, ~85% of routed IPv4 space now has an ``as_domain`` that hits the map, so rows that were previously unattributable now get a ``source_name``/``source_type`` derived from the ASN. When the ASN domain misses the map, the raw AS name is used as ``source_name`` with ``source_type`` left null, which is still better than nothing. Crucially, ``source_reverse_dns`` and ``source_base_domain`` remain null on ASN-derived rows, so downstream consumers can still tell a PTR-resolved attribution apart from an ASN-derived one.

  ASN is stored as an integer at the schema level (the Elasticsearch / OpenSearch mappings use ``Integer``) so consumers can do range queries and numeric sorts; dashboards can prepend ``AS`` at display time. The MMDB reader normalizes both IPinfo's ``"AS15169"`` string and MaxMind's ``autonomous_system_number`` int to the same int form.

  Also fixes a pre-existing caching bug in ``get_ip_address_info``: entries without reverse DNS were never written to the IP-info cache, so every no-PTR IP re-did the MMDB read and DNS attempt on every call. The cache write is now unconditional.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Bump to 9.9.0 and document the ASN fallback work

  Updates the changelog with a 9.9.0 entry covering the ASN-domain aliases (#712, #714), map-maintenance tooling fixes (#713), and the ASN-fallback source attribution added in this branch.
  Extends AGENTS.md to explain that ``base_reverse_dns_map.csv`` is now a mixed-namespace map (rDNS bases alongside ASN domains) and adds a short recipe for finding high-value ASN-domain misses against the bundled MMDB, so future contributors know where the map's second lookup path comes from.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Document project conventions previously held only in agent memory

  Promotes four conventions out of per-agent memory and into AGENTS.md so every contributor, human or agent, works from the same baseline:

  - Run ruff check + format before committing (Code Style).
  - Store natively numeric values as numbers, not pre-formatted strings (e.g. ASN as int 15169, not "AS15169"; ES/OS mappings as Integer) (Code Style).
  - Before rewriting a tracked list/data file from freshly-generated content, verify the existing content via git; these files accumulate manually-curated entries across sessions (Editing tracked data files).
  - A release isn't done until the hatch-built sdist + wheel are attached to the GitHub release page; the full 8-step sequence is documented (Releases).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
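The attribution precedence described in this commit message can be sketched as follows. This is a minimal illustration only: the helper name `attribute_source` and its flat argument list are hypothetical, not parsedmarc's actual API, but the order (PTR-derived base domain first, then ASN domain against the same map, then the raw AS name with a null type) follows the message above.

```python
from typing import Optional, TypedDict


class Attribution(TypedDict):
    name: Optional[str]
    type: Optional[str]


def attribute_source(
    reverse_dns_base: Optional[str],
    asn_domain: Optional[str],
    asn_name: Optional[str],
    reverse_dns_map: dict[str, Attribution],
) -> Attribution:
    # A PTR-derived base domain wins when present.
    if reverse_dns_base and reverse_dns_base in reverse_dns_map:
        return reverse_dns_map[reverse_dns_base]
    # No PTR: try the ASN domain as a key into the same map.
    if asn_domain and asn_domain in reverse_dns_map:
        return reverse_dns_map[asn_domain]
    # Map miss: surface the raw AS name with no classification.
    if asn_name:
        return {"name": asn_name, "type": None}
    return {"name": None, "type": None}
```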
AGENTS.md (65 lines changed)
````diff
@@ -62,22 +62,42 @@ IP address info cached for 4 hours, seen aggregate report IDs cached for 1 hour
 
 ## Code Style
 
-- Ruff for formatting and linting (configured in `.vscode/settings.json`)
-- TypedDict for structured data, type hints throughout
-- Python ≥3.10 required
-- Tests are in a single `tests.py` file using unittest; sample reports live in `samples/`
-- File path config values must be wrapped with `_expand_path()` in `cli.py`
-- Maildir UID checks are intentionally relaxed (warn, don't crash) for Docker compatibility
-- Token file writes must create parent directories before opening for write
+- Ruff for formatting and linting (configured in `.vscode/settings.json`). Run `ruff check .` and `ruff format --check .` after every code edit, before committing.
+- TypedDict for structured data, type hints throughout.
+- Python ≥3.10 required.
+- Tests are in a single `tests.py` file using unittest; sample reports live in `samples/`.
+- File path config values must be wrapped with `_expand_path()` in `cli.py`.
+- Maildir UID checks are intentionally relaxed (warn, don't crash) for Docker compatibility.
+- Token file writes must create parent directories before opening for write.
+- Store natively numeric values as numbers, not pre-formatted strings. Example: ASN is stored as `int 15169`, not `"AS15169"`; Elasticsearch / OpenSearch mappings for such fields use `Integer()` so consumers can do range queries and numeric sorts. Display layers format with a prefix at render time.
+
+## Editing tracked data files
+
+Before rewriting a tracked list/data file from freshly-generated content (anything under `parsedmarc/resources/maps/`, CSVs, `.txt` lists), check the existing file first — `git show HEAD:<path> | wc -l`, `git log -1 -- <path>`, `git diff --stat`. Files like `known_unknown_base_reverse_dns.txt` and `base_reverse_dns_map.csv` accumulate manually-curated entries across many sessions, and a "fresh" regeneration that drops the row count is almost certainly destroying prior work. If the new content is meant to *add* rather than *replace*, use a merge/append pattern. Treat any unexpected row-count drop in the pending diff as a red flag.
+
+## Releases
+
+A release isn't done until built artifacts are attached to the GitHub release page. Full sequence:
+
+1. Bump version in `parsedmarc/constants.py`; update `CHANGELOG.md` with a new section under the new version number.
+2. Commit on a feature branch, open a PR, merge to master.
+3. `git fetch && git checkout master && git pull`.
+4. `git tag -a <version> -m "<version>" <sha>` and `git push origin <version>`.
+5. `rm -rf dist && hatch build`. Verify `git describe --tags --exact-match` matches the tag.
+6. `gh release create <version> --title "<version>" --notes-file <notes>`.
+7. `gh release upload <version> dist/parsedmarc-<version>.tar.gz dist/parsedmarc-<version>-py3-none-any.whl`.
+8. Confirm `gh release view <version> --json assets` shows both the sdist and the wheel before considering the release complete.
 
 ## Maintaining the reverse DNS maps
 
-`parsedmarc/resources/maps/base_reverse_dns_map.csv` maps reverse DNS base domains to a display name and service type. See `parsedmarc/resources/maps/README.md` for the field format and the service_type precedence rules.
+`parsedmarc/resources/maps/base_reverse_dns_map.csv` maps a base domain to a display name and service type. The same map is consulted at two points: first with a PTR-derived base domain, and — if the IP has no PTR — with the ASN domain from the bundled IPinfo Lite MMDB (`parsedmarc/resources/ipinfo/ipinfo_lite.mmdb`). See `parsedmarc/resources/maps/README.md` for the field format and the service_type precedence rules.
+
+Because both lookup paths read the same CSV, map keys are a mixed namespace — rDNS-base domains (e.g. `comcast.net`, discovered via `base_reverse_dns.csv`) coexist with ASN domains (e.g. `comcast.com`, discovered via coverage-gap analysis against the MMDB). Entries of both kinds should point to the same `(name, type)` when they describe the same operator — grep before inventing a new display name.
 
 ### File format
 
 - CSV uses **CRLF** line endings and UTF-8 encoding — preserve both when editing programmatically.
-- Entries are sorted alphabetically (case-insensitive) by the first column.
+- Entries are sorted alphabetically (case-insensitive) by the first column. `parsedmarc/resources/maps/sortlists.py` is authoritative — run it after any batch edit to re-sort, dedupe, and validate `type` values.
 - Names containing commas must be quoted.
 - Do not edit in Excel (it mangles Unicode); use LibreOffice Calc or a text editor.
@@ -125,7 +145,32 @@ When `unknown_base_reverse_dns.csv` has new entries, follow this order rather th
 
 - `detect_psl_overrides.py` — scans the lists for clustered IP-containing patterns, auto-adds brand suffixes to `psl_overrides.txt`, folds affected entries to their base, and removes any remaining full-IP entries. Run before the collector on any new batch.
 - `collect_domain_info.py` — the bulk enrichment collector described above. Respects `psl_overrides.txt` and skips full-IP entries.
 - `find_bad_utf8.py` — locates invalid UTF-8 bytes (used after past encoding corruption).
-- `sortlists.py` — sorting helper for the list files.
+- `sortlists.py` — case-insensitive sort + dedupe + `type`-column validator for the list files; the authoritative sorter run after every batch edit.
+
+### Checking ASN-domain coverage of the MMDB
+
+Separately from `base_reverse_dns.csv`, the MMDB itself is a source of keys worth mapping. To find ASN domains with high IP weight that don't yet have a map entry, walk every record in `ipinfo_lite.mmdb`, aggregate IPv4 count per `as_domain`, and subtract what's already a map key:
+
+```python
+import csv, maxminddb
+from collections import defaultdict
+
+keys = set()
+with open("parsedmarc/resources/maps/base_reverse_dns_map.csv", newline="", encoding="utf-8") as f:
+    for row in csv.DictReader(f):
+        keys.add(row["base_reverse_dns"].strip().lower())
+
+v4 = defaultdict(int)
+names = {}
+for net, rec in maxminddb.open_database("parsedmarc/resources/ipinfo/ipinfo_lite.mmdb"):
+    if net.version != 4 or not isinstance(rec, dict):
+        continue
+    d = rec.get("as_domain")
+    if not d:
+        continue
+    v4[d.lower()] += net.num_addresses
+    names[d.lower()] = rec.get("as_name", "")
+
+miss = sorted(((d, v4[d], names[d]) for d in v4 if d not in keys), key=lambda x: -x[1])
+for d, c, n in miss[:50]:
+    print(f"{c:>12,} {d:<30} {n}")
+```
+
+Apply the same classification rules above (precedence, naming consistency, skip-if-ambiguous, privacy). Many top misses will be brands already in the map under a different rDNS-base key — the goal there is to alias the ASN domain to the same `(name, type)` so both lookup paths hit. For ASN domains with no obvious brand identity (small resellers, parked ASNs), don't map them — the attribution code falls back to the raw `as_name` from the MMDB, which is better than a guess.
 
 ### After a batch merge
````
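The merge/append pattern recommended above for curated data files can be sketched as follows. This is illustrative only: the function name `merge_map_rows` is hypothetical, and the key column name follows the `base_reverse_dns_map.csv` format described above.

```python
def merge_map_rows(existing: list[dict], fresh: list[dict]) -> list[dict]:
    """Merge freshly generated rows into a curated list without dropping
    manually-curated entries: existing keys always win."""
    seen = {row["base_reverse_dns"].strip().lower() for row in existing}
    merged = existing + [
        r for r in fresh if r["base_reverse_dns"].strip().lower() not in seen
    ]
    # Guardrail from the guidance above: a merge can only grow the file.
    assert len(merged) >= len(existing)
    return merged
```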
CHANGELOG.md (24 lines changed)

```diff
@@ -1,5 +1,29 @@
 # Changelog
 
+## 9.9.0
+
+### Changes
+
+- Source attribution now has an ASN fallback. Every IP source record carries three new fields — `asn` (integer, e.g. `15169`), `asn_name` (`"Google LLC"`), and `asn_domain` (`"google.com"`) — sourced from the bundled IPinfo Lite MMDB. When an IP has no reverse DNS, `get_ip_address_info()` uses `asn_domain` as a lookup into the same `reverse_dns_map`, and if that misses, falls back to the raw `asn_name`. `reverse_dns` and `base_domain` stay null on ASN-derived rows so consumers can still distinguish PTR-derived from ASN-derived attribution.
+- Added `source_asn`, `source_asn_name`, `source_asn_domain` to CSV output (aggregate + forensic), JSON output, and the Elasticsearch / OpenSearch / Splunk integrations. `source_asn` is mapped as `Integer` at the schema level so consumers can do range queries and numeric sorts; dashboards can prepend `"AS"` at display time.
+- Expanded `base_reverse_dns_map.csv` with 500 ASN-domain aliases for the most-routed IPv4 ranges. IPv4-weighted coverage of the bundled `ipinfo_lite.mmdb` went from ~34% of routed space matching a map entry via ASN domain to ~85%. Every alias is a brand that was already in the map under a different rDNS-base key (e.g. adding `comcast.com` alongside the existing `comcast.net`), plus a small number of large operators that previously had no entry. 11 entries were also promoted out of `known_unknown_base_reverse_dns.txt` because ASN context made their identity unambiguous.
+- Added `get_ip_address_db_record()` in `parsedmarc.utils`, a single-open MMDB reader that returns country + ASN fields together. `get_ip_address_country()` is now a thin wrapper. Supports both IPinfo Lite's schema (`country_code`, `asn` as `"AS15169"`, `as_name`, `as_domain`) and MaxMind's (`country.iso_code`, `autonomous_system_number` as int, `autonomous_system_organization`) in one pass; ASN is normalized to a plain int from either. MaxMind users who drop in their own ASN MMDB get `asn` + `asn_name` populated; `asn_domain` stays null because MaxMind doesn't carry it.
+
+### Fixed
+
+- `get_ip_address_info()` now caches entries for IPs without reverse DNS. Previously the cache write was inside the `if reverse_dns is not None` branch, so every no-PTR IP re-did the MMDB read and DNS attempt on every call.
+- Fixed three bugs in `parsedmarc/resources/maps/sortlists.py` that silently disabled the `type`-column validator and sorted the map case-sensitively, contrary to its documented behavior:
+  - Validator allowed-values map was keyed on `"Type"` (capital T), but the CSV header is `"type"` (lowercase), so every row bypassed validation.
+  - Types were read with trailing newlines via `f.readlines()`, so comparisons would not have matched even if the column name had been right.
+  - `sort_csv()` was called without `case_insensitive_sort=True`, which moved the sole mixed-case key (`United-domains.de`) to the top of the file instead of into its alphabetical position.
+- Fixed eight pre-existing map rows with invalid or inconsistent `type` values that the now-working validator surfaced: casing corrections for `dhl.com` (`logistics` → `Logistics`), `ghm-grenoble.fr` (`healthcare` → `Healthcare`), and `regusnet.com` (`Real estate` → `Real Estate`); reclassified `lodestonegroup.com` from the nonexistent `Insurance` type to `Finance`; added missing `Religion` and `Utilities` entries to `base_reverse_dns_types.txt` so it matches the README's industry list.
+- Fixed the `rt.ru` map entry: was classified as `RT,Government Media`, which conflated Rostelecom (the Russian telco that owns and uses `rt.ru`) with RT / Russia Today (which uses `rt.com`). Corrected to `Rostelecom,ISP`.
+
+### Upgrade notes
+
+- Output schema change: CSV, JSON, Elasticsearch, OpenSearch, and Splunk all gain three new fields per row (`source_asn`, `source_asn_name`, `source_asn_domain`). Existing queries and dashboards keep working; dashboards that want to consume the new fields will need to be updated. Elasticsearch / OpenSearch will add the new mappings on next document write.
+- Rows for IPs without reverse DNS now populate `source_name` / `source_type` via ASN fallback. If downstream dashboards treated "null `source_name`" as a signal for "no rDNS", switch to checking `source_reverse_dns IS NULL` instead — that remains the unambiguous signal.
+
 ## 9.8.0
 
 ### Changes
```
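The ASN normalization described in the `get_ip_address_db_record()` entry above can be sketched in isolation. This is a standalone sketch of the stated behavior (IPinfo's `"AS15169"` string and MaxMind's plain int both becoming the same integer), and the helper name `normalize_asn` is hypothetical.

```python
from typing import Optional


def normalize_asn(value: object) -> Optional[int]:
    # MaxMind's autonomous_system_number is already a plain int.
    if isinstance(value, int):
        return value
    # IPinfo Lite stores the ASN as a string like "AS15169".
    if isinstance(value, str) and value:
        digits = value.removeprefix("AS").removeprefix("as")
        if digits.isdigit():
            return int(digits)
    # Anything else (None, malformed string) yields no ASN.
    return None
```

Storing the result as an integer is what lets the schema layer map the field as `Integer` for range queries and numeric sorts.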
````diff
@@ -44,7 +44,10 @@ of the report schema.
         "reverse_dns": null,
         "base_domain": null,
         "name": null,
-        "type": null
+        "type": null,
+        "asn": 7018,
+        "asn_name": "AT&T Services, Inc.",
+        "asn_domain": "att.com"
       },
       "count": 2,
       "alignment": {
@@ -90,7 +93,7 @@ of the report schema.
 ### CSV aggregate report
 
 ```text
-xml_schema,org_name,org_email,org_extra_contact_info,report_id,begin_date,end_date,normalized_timespan,errors,domain,adkim,aspf,p,sp,pct,fo,source_ip_address,source_country,source_reverse_dns,source_base_domain,source_name,source_type,count,spf_aligned,dkim_aligned,dmarc_aligned,disposition,policy_override_reasons,policy_override_comments,envelope_from,header_from,envelope_to,dkim_domains,dkim_selectors,dkim_results,spf_domains,spf_scopes,spf_results
+xml_schema,org_name,org_email,org_extra_contact_info,report_id,begin_date,end_date,normalized_timespan,errors,domain,adkim,aspf,p,sp,pct,fo,source_ip_address,source_country,source_reverse_dns,source_base_domain,source_name,source_type,source_asn,source_asn_name,source_asn_domain,count,spf_aligned,dkim_aligned,dmarc_aligned,disposition,policy_override_reasons,policy_override_comments,envelope_from,header_from,envelope_to,dkim_domains,dkim_selectors,dkim_results,spf_domains,spf_scopes,spf_results
 draft,acme.com,noreply-dmarc-support@acme.com,http://acme.com/dmarc/support,9391651994964116463,2012-04-28 00:00:00,2012-04-28 23:59:59,False,,example.com,r,r,none,none,100,0,72.150.241.94,US,,,,,2,True,False,True,none,,,example.com,example.com,,example.com,none,fail,example.com,mfrom,pass
@@ -123,7 +126,12 @@ Thanks to GitHub user [xennn](https://github.com/xennn) for the anonymized
       "ip_address": "10.10.10.10",
       "country": null,
       "reverse_dns": null,
-      "base_domain": null
+      "base_domain": null,
+      "name": null,
+      "type": null,
+      "asn": null,
+      "asn_name": null,
+      "asn_domain": null
     },
     "authentication_mechanisms": [],
     "original_envelope_id": null,
@@ -193,7 +201,7 @@ Thanks to GitHub user [xennn](https://github.com/xennn) for the anonymized
 ### CSV forensic report
 
 ```text
-feedback_type,user_agent,version,original_envelope_id,original_mail_from,original_rcpt_to,arrival_date,arrival_date_utc,subject,message_id,authentication_results,dkim_domain,source_ip_address,source_country,source_reverse_dns,source_base_domain,delivery_result,auth_failure,reported_domain,authentication_mechanisms,sample_headers_only
+feedback_type,user_agent,version,original_envelope_id,original_mail_from,original_rcpt_to,arrival_date,arrival_date_utc,subject,message_id,authentication_results,dkim_domain,source_ip_address,source_country,source_reverse_dns,source_base_domain,source_name,source_type,source_asn,source_asn_name,source_asn_domain,delivery_result,auth_failure,reported_domain,authentication_mechanisms,sample_headers_only
 auth-failure,Lua/1.0,1.0,,sharepoint@domain.de,peter.pan@domain.de,"Mon, 01 Oct 2018 11:20:27 +0200",2018-10-01 09:20:27,Subject,<38.E7.30937.BD6E1BB5@ mailrelay.de>,"dmarc=fail (p=none, dis=none) header.from=domain.de",,10.10.10.10,,,,policy,dmarc,domain.de,,False
 ```
 
@@ -238,4 +246,4 @@ auth-failure,Lua/1.0,1.0,,sharepoint@domain.de,peter.pan@domain.de,"Mon, 01 Oct
     ]
   }
 ]
 ```
````
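Per the upgrade notes, downstream consumers can still distinguish PTR-resolved from ASN-derived attributions by checking `source_reverse_dns`. A short sketch over hypothetical row dicts shaped like the CSV columns above:

```python
rows = [
    # ASN-derived: source_name populated, but source_reverse_dns is null.
    {"source_name": "AT&T", "source_reverse_dns": None, "source_asn_domain": "att.com"},
    # PTR-resolved: source_reverse_dns carries the actual PTR record.
    {"source_name": "Google", "source_reverse_dns": "mail-wr1.google.com", "source_asn_domain": "google.com"},
]

# source_reverse_dns is the unambiguous signal: null means the name came
# from the ASN fallback, not from a reverse DNS lookup.
asn_derived = [r for r in rows if r["source_reverse_dns"] is None]
ptr_derived = [r for r in rows if r["source_reverse_dns"] is not None]
```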
```diff
@@ -1114,6 +1114,9 @@ def parsed_aggregate_reports_to_csv_rows(
         row["source_base_domain"] = record["source"]["base_domain"]
         row["source_name"] = record["source"]["name"]
         row["source_type"] = record["source"]["type"]
+        row["source_asn"] = record["source"]["asn"]
+        row["source_asn_name"] = record["source"]["asn_name"]
+        row["source_asn_domain"] = record["source"]["asn_domain"]
         row["count"] = record["count"]
         row["spf_aligned"] = record["alignment"]["spf"]
         row["dkim_aligned"] = record["alignment"]["dkim"]
@@ -1205,6 +1208,9 @@ def parsed_aggregate_reports_to_csv(
         "source_base_domain",
         "source_name",
         "source_type",
+        "source_asn",
+        "source_asn_name",
+        "source_asn_domain",
         "count",
         "spf_aligned",
         "dkim_aligned",
@@ -1406,6 +1412,9 @@ def parsed_forensic_reports_to_csv_rows(
         row["source_base_domain"] = report["source"]["base_domain"]
         row["source_name"] = report["source"]["name"]
         row["source_type"] = report["source"]["type"]
+        row["source_asn"] = report["source"]["asn"]
+        row["source_asn_name"] = report["source"]["asn_name"]
+        row["source_asn_domain"] = report["source"]["asn_domain"]
         row["source_country"] = report["source"]["country"]
         del row["source"]
         row["subject"] = report["parsed_sample"].get("subject")
@@ -1451,6 +1460,9 @@ def parsed_forensic_reports_to_csv(
         "source_base_domain",
         "source_name",
         "source_type",
+        "source_asn",
+        "source_asn_name",
+        "source_asn_domain",
         "delivery_result",
         "auth_failure",
         "reported_domain",
```
```diff
@@ -1,4 +1,4 @@
-__version__ = "9.8.0"
+__version__ = "9.9.0"
 
 USER_AGENT = f"parsedmarc/{__version__}"
```
```diff
@@ -79,6 +79,9 @@ class _AggregateReportDoc(Document):
     source_base_domain = Text()
     source_type = Text()
    source_name = Text()
+    source_asn = Integer()
+    source_asn_name = Text()
+    source_asn_domain = Text()
     message_count = Integer()
     disposition = Text()
     dkim_aligned = Boolean()
@@ -173,6 +176,9 @@ class _ForensicReportDoc(Document):
     source_ip_address = Ip()
     source_country = Text()
     source_reverse_dns = Text()
+    source_asn = Integer()
+    source_asn_name = Text()
+    source_asn_domain = Text()
     source_authentication_mechanisms = Text()
     source_auth_failures = Text()
     dkim_domain = Text()
@@ -489,6 +495,9 @@ def save_aggregate_report_to_elasticsearch(
             source_base_domain=record["source"]["base_domain"],
             source_type=record["source"]["type"],
             source_name=record["source"]["name"],
+            source_asn=record["source"]["asn"],
+            source_asn_name=record["source"]["asn_name"],
+            source_asn_domain=record["source"]["asn_domain"],
             message_count=record["count"],
             disposition=record["policy_evaluated"]["disposition"],
             dkim_aligned=record["policy_evaluated"]["dkim"] is not None
@@ -673,6 +682,9 @@ def save_forensic_report_to_elasticsearch(
         source_country=forensic_report["source"]["country"],
         source_reverse_dns=forensic_report["source"]["reverse_dns"],
         source_base_domain=forensic_report["source"]["base_domain"],
+        source_asn=forensic_report["source"]["asn"],
+        source_asn_name=forensic_report["source"]["asn_name"],
+        source_asn_domain=forensic_report["source"]["asn_domain"],
         authentication_mechanisms=forensic_report["authentication_mechanisms"],
         auth_failure=forensic_report["auth_failure"],
        dkim_domain=forensic_report["dkim_domain"],
```
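Mapping `source_asn` as `Integer` is what makes numeric filtering possible. A raw Elasticsearch/OpenSearch query body sketching a range filter and numeric sort over the new field (the ASN bounds are illustrative values, not anything from this commit):

```python
# Works only because source_asn is mapped Integer, not an "AS7018"-style
# string: range comparisons and sorts are numeric, not lexicographic.
query = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"source_asn": {"gte": 7000, "lte": 7100}}}
            ]
        }
    },
    "sort": [{"source_asn": "asc"}],
}
```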
```diff
@@ -82,6 +82,9 @@ class _AggregateReportDoc(Document):
     source_base_domain = Text()
     source_type = Text()
     source_name = Text()
+    source_asn = Integer()
+    source_asn_name = Text()
+    source_asn_domain = Text()
     message_count = Integer()
     disposition = Text()
     dkim_aligned = Boolean()
@@ -176,6 +179,9 @@ class _ForensicReportDoc(Document):
     source_ip_address = Ip()
     source_country = Text()
     source_reverse_dns = Text()
+    source_asn = Integer()
+    source_asn_name = Text()
+    source_asn_domain = Text()
     source_authentication_mechanisms = Text()
     source_auth_failures = Text()
     dkim_domain = Text()
@@ -519,6 +525,9 @@ def save_aggregate_report_to_opensearch(
             source_base_domain=record["source"]["base_domain"],
             source_type=record["source"]["type"],
             source_name=record["source"]["name"],
+            source_asn=record["source"]["asn"],
+            source_asn_name=record["source"]["asn_name"],
+            source_asn_domain=record["source"]["asn_domain"],
             message_count=record["count"],
             disposition=record["policy_evaluated"]["disposition"],
             dkim_aligned=record["policy_evaluated"]["dkim"] is not None
@@ -703,6 +712,9 @@ def save_forensic_report_to_opensearch(
         source_country=forensic_report["source"]["country"],
         source_reverse_dns=forensic_report["source"]["reverse_dns"],
         source_base_domain=forensic_report["source"]["base_domain"],
+        source_asn=forensic_report["source"]["asn"],
+        source_asn_name=forensic_report["source"]["asn_name"],
+        source_asn_domain=forensic_report["source"]["asn_domain"],
         authentication_mechanisms=forensic_report["authentication_mechanisms"],
         auth_failure=forensic_report["auth_failure"],
         dkim_domain=forensic_report["dkim_domain"],
```
```diff
@@ -104,6 +104,9 @@ class HECClient(object):
             new_report["source_base_domain"] = record["source"]["base_domain"]
             new_report["source_type"] = record["source"]["type"]
             new_report["source_name"] = record["source"]["name"]
+            new_report["source_asn"] = record["source"]["asn"]
+            new_report["source_asn_name"] = record["source"]["asn_name"]
+            new_report["source_asn_domain"] = record["source"]["asn_domain"]
             new_report["message_count"] = record["count"]
             new_report["disposition"] = record["policy_evaluated"]["disposition"]
             new_report["spf_aligned"] = record["alignment"]["spf"]
```
@@ -40,6 +40,9 @@ class IPSourceInfo(TypedDict):
|
||||
base_domain: Optional[str]
|
||||
name: Optional[str]
|
||||
type: Optional[str]
|
||||
asn: Optional[int]
|
||||
asn_name: Optional[str]
|
||||
asn_domain: Optional[str]
|
||||
|
||||
|
||||
class AggregateAlignment(TypedDict):
|
||||
|
||||
@@ -151,6 +151,9 @@ class IPAddressInfo(TypedDict):
|
||||
base_domain: Optional[str]
|
||||
name: Optional[str]
|
||||
type: Optional[str]
|
||||
asn: Optional[int]
|
||||
asn_name: Optional[str]
|
||||
asn_domain: Optional[str]
|
||||
|
||||
|
||||
def decode_base64(data: str) -> bytes:
|
||||
@@ -457,20 +460,7 @@ def load_ip_db(
|
||||
logger.info("Using bundled IP database")
|
||||
|
||||
|
||||
def get_ip_address_country(
|
||||
ip_address: str, *, db_path: Optional[str] = None
|
||||
) -> Optional[str]:
|
||||
"""
|
||||
Returns the ISO code for the country associated
|
||||
with the given IPv4 or IPv6 address
|
||||
|
||||
Args:
|
||||
ip_address (str): The IP address to query for
|
||||
db_path (str): Path to a MMDB file from IPinfo, MaxMind, or DBIP
|
||||
|
||||
Returns:
|
||||
str: And ISO country code associated with the given IP address
|
||||
"""
|
||||
def _get_ip_database_path(db_path: Optional[str]) -> str:
|
||||
db_paths = [
|
||||
"ipinfo_lite.mmdb",
|
||||
"GeoLite2-Country.mmdb",
|
||||
@@ -486,14 +476,13 @@ def get_ip_address_country(
|
||||
"dbip-country.mmdb",
|
||||
]
|
||||
|
||||
if db_path is not None:
|
||||
if not os.path.isfile(db_path):
|
||||
logger.warning(
|
||||
f"No file exists at {db_path}. Falling back to an "
|
||||
"included copy of the IPinfo IP to Country "
|
||||
"Lite database."
|
||||
)
|
||||
db_path = None
|
||||
if db_path is not None and not os.path.isfile(db_path):
|
||||
logger.warning(
|
||||
f"No file exists at {db_path}. Falling back to an "
|
||||
"included copy of the IPinfo IP to Country "
|
||||
"Lite database."
|
||||
)
|
||||
db_path = None
|
||||
|
||||
if db_path is None:
|
||||
for system_path in db_paths:
|
||||
@@ -513,14 +502,37 @@ def get_ip_address_country(
|
||||
if db_age > timedelta(days=30):
|
||||
logger.warning("IP database is more than a month old")
|
||||
|
||||
db_reader = maxminddb.open_database(db_path)
|
||||
return db_path
|
||||
|
||||
|
||||
class _IPDatabaseRecord(TypedDict):
|
||||
country: Optional[str]
|
||||
asn: Optional[int]
|
||||
asn_name: Optional[str]
|
||||
asn_domain: Optional[str]
|
||||
|
||||
|
||||
def get_ip_address_db_record(
|
||||
ip_address: str, *, db_path: Optional[str] = None
|
||||
) -> _IPDatabaseRecord:
|
||||
"""Look up an IP in the configured MMDB and return country + ASN fields.
|
||||
|
||||
IPinfo Lite carries ``country_code``, ``as_name``, and ``as_domain`` on
|
||||
every record. MaxMind/DBIP country-only databases carry only country, so
|
||||
``asn_name`` / ``asn_domain`` come back None for those users.
|
||||
"""
|
||||
resolved_path = _get_ip_database_path(db_path)
|
||||
db_reader = maxminddb.open_database(resolved_path)
|
||||
record = db_reader.get(ip_address)
|
||||
|
||||
# Support both the IPinfo schema (flat top-level ``country_code``) and the
|
||||
# MaxMind/DBIP schema (nested ``country.iso_code``) so users dropping in
|
||||
# their own MMDB from any of these providers keeps working.
|
||||
country: Optional[str] = None
|
||||
asn: Optional[int] = None
|
||||
asn_name: Optional[str] = None
|
||||
asn_domain: Optional[str] = None
|
||||
if isinstance(record, dict):
|
||||
# Support both the IPinfo schema (flat top-level ``country_code``) and
|
||||
# the MaxMind/DBIP schema (nested ``country.iso_code``) so users
|
||||
# dropping in their own MMDB from any of these providers keeps working.
|
||||
code = record.get("country_code")
|
||||
if code is None:
|
||||
nested = record.get("country")
|
||||
@@ -529,7 +541,52 @@ def get_ip_address_country(
|
||||
if isinstance(code, str):
|
||||
country = code
|
||||
|
||||
return country
|
||||
# Normalize ASN to a plain integer. IPinfo stores it as a string like
|
||||
# "AS15169"; MaxMind's ASN DB uses ``autonomous_system_number`` as an
|
||||
# int. Integer form lets consumers do range queries and sort
|
||||
# numerically; display-time formatting with an "AS" prefix is trivial.
|
||||
raw_asn = record.get("asn")
|
||||
if isinstance(raw_asn, int):
|
||||
asn = raw_asn
|
||||
elif isinstance(raw_asn, str) and raw_asn:
|
||||
digits = raw_asn.removeprefix("AS").removeprefix("as")
|
||||
if digits.isdigit():
|
||||
asn = int(digits)
|
||||
if asn is None:
|
||||
mm_asn = record.get("autonomous_system_number")
|
||||
if isinstance(mm_asn, int):
|
||||
asn = mm_asn
|
||||
|
||||
name = record.get("as_name") or record.get("autonomous_system_organization")
|
||||
if isinstance(name, str) and name:
|
||||
asn_name = name
|
||||
domain = record.get("as_domain")
|
||||
if isinstance(domain, str) and domain:
|
||||
asn_domain = domain.lower()
|
||||
|
||||
return {
|
||||
"country": country,
|
||||
"asn": asn,
|
||||
"asn_name": asn_name,
|
||||
"asn_domain": asn_domain,
|
||||
}
|
||||
|
||||
|
||||
def get_ip_address_country(
|
||||
ip_address: str, *, db_path: Optional[str] = None
|
||||
) -> Optional[str]:
|
||||
"""
|
||||
Returns the ISO code for the country associated
|
||||
with the given IPv4 or IPv6 address.
|
||||
|
||||
Args:
|
||||
ip_address (str): The IP address to query for
|
||||
db_path (str): Path to a MMDB file from IPinfo, MaxMind, or DBIP
|
||||
|
||||
Returns:
|
||||
str: And ISO country code associated with the given IP address
|
||||
"""
|
||||
return get_ip_address_db_record(ip_address, db_path=db_path)["country"]
|
||||
|
||||
|
||||
def load_reverse_dns_map(
|
||||
@@ -723,6 +780,9 @@ def get_ip_address_info(
|
||||
"base_domain": None,
|
||||
"name": None,
|
||||
"type": None,
|
||||
"asn": None,
|
||||
"asn_name": None,
|
||||
"asn_domain": None,
|
||||
}
    if offline:
        reverse_dns = None
@@ -733,9 +793,13 @@ def get_ip_address_info(
            timeout=timeout,
            retries=retries,
        )
    country = get_ip_address_country(ip_address, db_path=ip_db_path)
    info["country"] = country
    db_record = get_ip_address_db_record(ip_address, db_path=ip_db_path)
    info["country"] = db_record["country"]
    info["asn"] = db_record["asn"]
    info["asn_name"] = db_record["asn_name"]
    info["asn_domain"] = db_record["asn_domain"]

    info["reverse_dns"] = reverse_dns
    if reverse_dns is not None:
        base_domain = get_base_domain(reverse_dns)
        if base_domain is not None:
@@ -750,12 +814,34 @@ def get_ip_address_info(
            info["base_domain"] = base_domain
            info["type"] = service["type"]
            info["name"] = service["name"]

        if cache is not None:
            cache[ip_address] = info
            logger.debug(f"IP address {ip_address} added to cache")
    else:
        logger.debug(f"IP address {ip_address} reverse_dns not found")
        # Fall back to ASN data for source attribution. ``reverse_dns`` and
        # ``base_domain`` are left null so consumers can still tell an
        # ASN-derived row apart from one resolved via a real PTR.
        map_value: ReverseDNSMap = (
            reverse_dns_map if reverse_dns_map is not None else {}
        )
        if len(map_value) == 0:
            load_reverse_dns_map(
                map_value,
                always_use_local_file=always_use_local_files,
                local_file_path=reverse_dns_map_path,
                url=reverse_dns_map_url,
                offline=offline,
            )
        if info["asn_domain"] and info["asn_domain"] in map_value:
            service = map_value[info["asn_domain"]]
            info["name"] = service["name"]
            info["type"] = service["type"]
        elif info["asn_name"]:
            # ASN-domain not in the map: surface the raw AS name with no
            # classification. Better than leaving the row unattributed.
            info["name"] = info["asn_name"]

    if cache is not None:
        cache[ip_address] = info
        logger.debug(f"IP address {ip_address} added to cache")

    return info
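The attribution precedence implemented above (PTR-derived base domain first, then the ASN domain as a key into the same map, then the raw AS name with no type) can be sketched in isolation. ``attribute_source`` and ``SERVICE_MAP`` below are illustrative stand-ins, not parsedmarc APIs:

```python
from typing import Optional

# Toy stand-in for the reverse_dns_map: base or ASN domain -> service metadata.
SERVICE_MAP = {
    "google.com": {"name": "Google", "type": "Email Provider"},
}


def attribute_source(
    base_domain: Optional[str],
    asn_domain: Optional[str],
    asn_name: Optional[str],
) -> tuple[Optional[str], Optional[str]]:
    """Return (name, type): PTR base_domain wins, then the ASN domain
    via the map, then the raw AS name with type left as None."""
    for key in (base_domain, asn_domain):
        if key and key in SERVICE_MAP:
            service = SERVICE_MAP[key]
            return service["name"], service["type"]
    if asn_name:
        return asn_name, None
    return None, None


# No PTR, but the ASN domain hits the map: fully attributed.
print(attribute_source(None, "google.com", "Google LLC"))
# Map miss: raw AS name surfaces with no classification.
print(attribute_source(None, "example.net", "Example AS"))
```

Because ``reverse_dns`` and ``base_domain`` stay null on ASN-derived rows, consumers can always distinguish the two attribution paths even though both populate ``source_name``.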

tests.py
@@ -223,6 +223,67 @@ class Test(unittest.TestCase):
        parsedmarc.parsed_smtp_tls_reports_to_csv(result["report"])
        print("Passed!")

    def testIpAddressInfoSurfacesASNFields(self):
        """ASN number, name, and domain from the bundled MMDB appear on every
        IP info result, even when no PTR resolves."""
        info = parsedmarc.utils.get_ip_address_info("8.8.8.8", offline=True)
        self.assertEqual(info["asn"], 15169)
        self.assertIsInstance(info["asn"], int)
        self.assertEqual(info["asn_domain"], "google.com")
        self.assertTrue(info["asn_name"])

    def testIpAddressInfoFallsBackToASNMapEntryWhenNoPTR(self):
        """When reverse DNS is absent, the ASN domain should be used as a
        lookup into the reverse_dns_map so the row still gets attributed,
        while reverse_dns and base_domain remain null."""
        info = parsedmarc.utils.get_ip_address_info("8.8.8.8", offline=True)
        self.assertIsNone(info["reverse_dns"])
        self.assertIsNone(info["base_domain"])
        self.assertEqual(info["name"], "Google (Including Gmail and Google Workspace)")
        self.assertEqual(info["type"], "Email Provider")

    def testIpAddressInfoFallsBackToRawASNameOnMapMiss(self):
        """When neither a PTR nor an ASN-map entry resolves, the raw AS name
        is used as source_name with type left null, which is still better
        than leaving the row unattributed."""
        # Mock the MMDB record with a fictional ASN domain so this exercises
        # the asn_name fallback branch without depending on a specific state
        # of the bundled map.
        from unittest.mock import patch

        with patch(
            "parsedmarc.utils.get_ip_address_db_record",
            return_value={
                "country": "US",
                "asn": 64496,
                "asn_name": "Some Unmapped Org, Inc.",
                "asn_domain": "unmapped-for-this-test.example",
            },
        ):
            # Bypass the cache to avoid prior-test pollution.
            info = parsedmarc.utils.get_ip_address_info(
                "192.0.2.1", offline=True, cache=None
            )
        self.assertIsNone(info["reverse_dns"])
        self.assertIsNone(info["base_domain"])
        self.assertIsNone(info["type"])
        self.assertEqual(info["name"], "Some Unmapped Org, Inc.")
        self.assertEqual(info["asn_domain"], "unmapped-for-this-test.example")

    def testAggregateCsvExposesASNColumns(self):
        """The aggregate CSV output should include source_asn, source_asn_name,
        and source_asn_domain columns."""
        result = parsedmarc.parse_report_file(
            "samples/aggregate/!example.com!1538204542!1538463818.xml",
            always_use_local_files=True,
            offline=True,
        )
        csv_text = parsedmarc.parsed_aggregate_reports_to_csv(result["report"])
        header = csv_text.splitlines()[0].split(",")
        self.assertIn("source_asn", header)
        self.assertIn("source_asn_name", header)
        self.assertIn("source_asn_domain", header)

    def testOpenSearchSigV4RequiresRegion(self):
        with self.assertRaises(opensearch_module.OpenSearchError):
            opensearch_module.set_hosts(