From 2cda5bf59bd4d40fc7ae1e737be3be1e684bc0de Mon Sep 17 00:00:00 2001 From: Sean Whalen <44679+seanthegeek@users.noreply.github.com> Date: Thu, 23 Apr 2026 02:13:30 -0400 Subject: [PATCH] Surface ASN info and use it for source attribution when a PTR is absent (#715) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Surface ASN info and fall back to it when a PTR is absent Adds three new fields to every IP source record — ``asn`` (integer, e.g. 15169), ``asn_name`` (``"Google LLC"``), ``asn_domain`` (``"google.com"``) — sourced from the bundled IPinfo Lite MMDB. These flow through to CSV, JSON, Elasticsearch, OpenSearch, and Splunk outputs as ``source_asn``, ``source_asn_name``, ``source_asn_domain``. More importantly: when an IP has no reverse DNS (common for many large senders), source attribution now falls back to the ASN domain as a lookup key into the same ``reverse_dns_map``. Thanks to #712 and #714, ~85% of routed IPv4 space now has an ``as_domain`` that hits the map, so rows that were previously unattributable now get a ``source_name``/``source_type`` derived from the ASN. When the ASN domain misses the map, the raw AS name is used as ``source_name`` with ``source_type`` left null — still better than nothing. Crucially, ``source_reverse_dns`` and ``source_base_domain`` remain null on ASN-derived rows, so downstream consumers can still tell a PTR-resolved attribution apart from an ASN-derived one. ASN is stored as an integer at the schema level (Elasticsearch / OpenSearch mappings use ``Integer``) so consumers can do range queries and numeric sorts; dashboards can prepend ``AS`` at display time. The MMDB reader normalizes both IPinfo's ``"AS15169"`` string and MaxMind's ``autonomous_system_number`` int to the same int form. 
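For reference, that normalization can be sketched standalone (the helper name ``normalize_asn`` is illustrative; the patch performs this inline in ``get_ip_address_db_record``):

```python
from typing import Optional

def normalize_asn(record: dict) -> Optional[int]:
    """Normalize an ASN from either MMDB schema to a plain int."""
    raw = record.get("asn")  # IPinfo Lite stores "AS15169"
    if isinstance(raw, int):
        return raw
    if isinstance(raw, str) and raw:
        digits = raw.removeprefix("AS").removeprefix("as")
        if digits.isdigit():
            return int(digits)
    mm = record.get("autonomous_system_number")  # MaxMind stores an int
    return mm if isinstance(mm, int) else None

print(normalize_asn({"asn": "AS15169"}))                   # 15169
print(normalize_asn({"autonomous_system_number": 15169}))  # 15169
print(normalize_asn({"asn": "garbage"}))                   # None
```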
Also fixes a pre-existing caching bug in ``get_ip_address_info``: entries without reverse DNS were never written to the IP-info cache, so every no-PTR IP re-did the MMDB read and DNS attempt on every call. The cache write is now unconditional. Co-Authored-By: Claude Opus 4.7 (1M context) * Bump to 9.9.0 and document the ASN fallback work Updates the changelog with a 9.9.0 entry covering the ASN-domain aliases (#712, #714), map-maintenance tooling fixes (#713), and the ASN-fallback source attribution added in this branch. Extends AGENTS.md to explain that ``base_reverse_dns_map.csv`` is now a mixed-namespace map (rDNS bases alongside ASN domains) and adds a short recipe for finding high-value ASN-domain misses against the bundled MMDB, so future contributors know where the map's second lookup path comes from. Co-Authored-By: Claude Opus 4.7 (1M context) * Document project conventions previously held only in agent memory Promotes four conventions out of per-agent memory and into AGENTS.md so every contributor — human or agent — works from the same baseline: - Run ruff check + format before committing (Code Style). - Store natively numeric values as numbers, not pre-formatted strings (e.g. ASN as int 15169, not "AS15169"; ES/OS mappings as Integer) (Code Style). - Before rewriting a tracked list/data file from freshly-generated content, verify the existing content via git — these files accumulate manually-curated entries across sessions (Editing tracked data files). - A release isn't done until hatch-built sdist + wheel are attached to the GitHub release page; full 8-step sequence documented (Releases). 
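The fixed caching flow, as a minimal sketch (function and helper names are illustrative, not the real parsedmarc API):

```python
calls = {"dns": 0, "mmdb": 0}

def lookup_ptr(ip):  # stand-in for the reverse DNS attempt
    calls["dns"] += 1
    return None  # simulate an IP with no PTR

def read_mmdb(ip):  # stand-in for the MMDB read
    calls["mmdb"] += 1
    return {"country": "US", "asn": 64496}

def get_info(ip, cache):
    if ip in cache:
        return cache[ip]
    info = {"reverse_dns": lookup_ptr(ip), **read_mmdb(ip)}
    # The fix: cache unconditionally, not only when reverse_dns is set.
    cache[ip] = info
    return info

cache = {}
get_info("192.0.2.1", cache)
get_info("192.0.2.1", cache)  # second call is served from cache
print(calls)  # {'dns': 1, 'mmdb': 1}
```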
Co-Authored-By: Claude Opus 4.7 (1M context) --------- Co-authored-by: Sean Whalen Co-authored-by: Claude Opus 4.7 (1M context) --- AGENTS.md | 65 ++++++++++++++--- CHANGELOG.md | 24 +++++++ docs/source/output.md | 18 +++-- parsedmarc/__init__.py | 12 ++++ parsedmarc/constants.py | 2 +- parsedmarc/elastic.py | 12 ++++ parsedmarc/opensearch.py | 12 ++++ parsedmarc/splunk.py | 3 + parsedmarc/types.py | 3 + parsedmarc/utils.py | 152 ++++++++++++++++++++++++++++++--------- tests.py | 61 ++++++++++++++++ 11 files changed, 315 insertions(+), 49 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 12fc094..b6d449c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -62,22 +62,42 @@ IP address info cached for 4 hours, seen aggregate report IDs cached for 1 hour ## Code Style -- Ruff for formatting and linting (configured in `.vscode/settings.json`) -- TypedDict for structured data, type hints throughout -- Python ≥3.10 required -- Tests are in a single `tests.py` file using unittest; sample reports live in `samples/` -- File path config values must be wrapped with `_expand_path()` in `cli.py` -- Maildir UID checks are intentionally relaxed (warn, don't crash) for Docker compatibility -- Token file writes must create parent directories before opening for write +- Ruff for formatting and linting (configured in `.vscode/settings.json`). Run `ruff check .` and `ruff format --check .` after every code edit, before committing. +- TypedDict for structured data, type hints throughout. +- Python ≥3.10 required. +- Tests are in a single `tests.py` file using unittest; sample reports live in `samples/`. +- File path config values must be wrapped with `_expand_path()` in `cli.py`. +- Maildir UID checks are intentionally relaxed (warn, don't crash) for Docker compatibility. +- Token file writes must create parent directories before opening for write. +- Store natively numeric values as numbers, not pre-formatted strings. 
Example: ASN is stored as `int 15169`, not `"AS15169"`; Elasticsearch / OpenSearch mappings for such fields use `Integer()` so consumers can do range queries and numeric sorts. Display layers format with a prefix at render time. + +## Editing tracked data files + +Before rewriting a tracked list/data file from freshly-generated content (anything under `parsedmarc/resources/maps/`, CSVs, `.txt` lists), check the existing file first — `git show HEAD:<path> | wc -l`, `git log -1 -- <path>`, `git diff --stat`. Files like `known_unknown_base_reverse_dns.txt` and `base_reverse_dns_map.csv` accumulate manually-curated entries across many sessions, and a "fresh" regeneration that drops the row count is almost certainly destroying prior work. If the new content is meant to *add* rather than *replace*, use a merge/append pattern. Treat any unexpected row-count drop in the pending diff as a red flag. + +## Releases + +A release isn't done until built artifacts are attached to the GitHub release page. Full sequence: + +1. Bump version in `parsedmarc/constants.py`; update `CHANGELOG.md` with a new section under the new version number. +2. Commit on a feature branch, open a PR, merge to master. +3. `git fetch && git checkout master && git pull`. +4. `git tag -a -m "<message>" <tag>` and `git push origin <tag>`. +5. `rm -rf dist && hatch build`. Verify `git describe --tags --exact-match` matches the tag. +6. `gh release create <tag> --title "<title>" --notes-file <file>`. +7. `gh release upload <tag> dist/parsedmarc-<version>.tar.gz dist/parsedmarc-<version>-py3-none-any.whl`. +8. Confirm `gh release view --json assets` shows both the sdist and the wheel before considering the release complete. ## Maintaining the reverse DNS maps -`parsedmarc/resources/maps/base_reverse_dns_map.csv` maps reverse DNS base domains to a display name and service type. See `parsedmarc/resources/maps/README.md` for the field format and the service_type precedence rules.
+`parsedmarc/resources/maps/base_reverse_dns_map.csv` maps a base domain to a display name and service type. The same map is consulted at two points: first with a PTR-derived base domain, and — if the IP has no PTR — with the ASN domain from the bundled IPinfo Lite MMDB (`parsedmarc/resources/ipinfo/ipinfo_lite.mmdb`). See `parsedmarc/resources/maps/README.md` for the field format and the service_type precedence rules. + +Because both lookup paths read the same CSV, map keys are a mixed namespace — rDNS-base domains (e.g. `comcast.net`, discovered via `base_reverse_dns.csv`) coexist with ASN domains (e.g. `comcast.com`, discovered via coverage-gap analysis against the MMDB). Entries of both kinds should point to the same `(name, type)` when they describe the same operator — grep before inventing a new display name. ### File format - CSV uses **CRLF** line endings and UTF-8 encoding — preserve both when editing programmatically. -- Entries are sorted alphabetically (case-insensitive) by the first column. +- Entries are sorted alphabetically (case-insensitive) by the first column. `parsedmarc/resources/maps/sortlists.py` is authoritative — run it after any batch edit to re-sort, dedupe, and validate `type` values. - Names containing commas must be quoted. - Do not edit in Excel (it mangles Unicode); use LibreOffice Calc or a text editor. @@ -125,7 +145,32 @@ When `unknown_base_reverse_dns.csv` has new entries, follow this order rather th - `detect_psl_overrides.py` — scans the lists for clustered IP-containing patterns, auto-adds brand suffixes to `psl_overrides.txt`, folds affected entries to their base, and removes any remaining full-IP entries. Run before the collector on any new batch. - `collect_domain_info.py` — the bulk enrichment collector described above. Respects `psl_overrides.txt` and skips full-IP entries. - `find_bad_utf8.py` — locates invalid UTF-8 bytes (used after past encoding corruption). -- `sortlists.py` — sorting helper for the list files. 
+- `sortlists.py` — case-insensitive sort + dedupe + `type`-column validator for the list files; the authoritative sorter run after every batch edit. + +### Checking ASN-domain coverage of the MMDB + +Separately from `base_reverse_dns.csv`, the MMDB itself is a source of keys worth mapping. To find ASN domains with high IP weight that don't yet have a map entry, walk every record in `ipinfo_lite.mmdb`, aggregate IPv4 count per `as_domain`, and subtract what's already a map key: + +```python +import csv, maxminddb +from collections import defaultdict +keys = set() +with open("parsedmarc/resources/maps/base_reverse_dns_map.csv", newline="", encoding="utf-8") as f: + for row in csv.DictReader(f): + keys.add(row["base_reverse_dns"].strip().lower()) +v4 = defaultdict(int); names = {} +for net, rec in maxminddb.open_database("parsedmarc/resources/ipinfo/ipinfo_lite.mmdb"): + if net.version != 4 or not isinstance(rec, dict): continue + d = rec.get("as_domain") + if not d: continue + v4[d.lower()] += net.num_addresses + names[d.lower()] = rec.get("as_name", "") +miss = sorted(((d, v4[d], names[d]) for d in v4 if d not in keys), key=lambda x: -x[1]) +for d, c, n in miss[:50]: + print(f"{c:>12,} {d:<30} {n}") +``` + +Apply the same classification rules above (precedence, naming consistency, skip-if-ambiguous, privacy). Many top misses will be brands already in the map under a different rDNS-base key — the goal there is to alias the ASN domain to the same `(name, type)` so both lookup paths hit. For ASN domains with no obvious brand identity (small resellers, parked ASNs), don't map them — the attribution code falls back to the raw `as_name` from the MMDB, which is better than a guess. ### After a batch merge diff --git a/CHANGELOG.md b/CHANGELOG.md index 149ed7f..df7ce0c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,29 @@ # Changelog +## 9.9.0 + +### Changes + +- Source attribution now has an ASN fallback. 
Every IP source record carries three new fields — `asn` (integer, e.g. `15169`), `asn_name` (`"Google LLC"`), and `asn_domain` (`"google.com"`) — sourced from the bundled IPinfo Lite MMDB. When an IP has no reverse DNS, `get_ip_address_info()` uses `asn_domain` as a lookup into the same `reverse_dns_map`, and if that misses, falls back to the raw `asn_name`. `reverse_dns` and `base_domain` stay null on ASN-derived rows so consumers can still distinguish PTR-derived from ASN-derived attribution. +- Added `source_asn`, `source_asn_name`, `source_asn_domain` to CSV output (aggregate + forensic), JSON output, and the Elasticsearch / OpenSearch / Splunk integrations. `source_asn` is mapped as `Integer` at the schema level so consumers can do range queries and numeric sorts; dashboards can prepend `"AS"` at display time. +- Expanded `base_reverse_dns_map.csv` with 500 ASN-domain aliases for the most-routed IPv4 ranges. IPv4-weighted coverage of the bundled `ipinfo_lite.mmdb` went from ~34% of routed space matching a map entry via ASN domain to ~85%. Every alias is a brand that was already in the map under a different rDNS-base key (e.g. adding `comcast.com` alongside the existing `comcast.net`), plus a small number of large operators that previously had no entry. 11 entries were also promoted out of `known_unknown_base_reverse_dns.txt` because ASN context made their identity unambiguous. +- Added `get_ip_address_db_record()` in `parsedmarc.utils`, a single-open MMDB reader that returns country + ASN fields together. `get_ip_address_country()` is now a thin wrapper. Supports both IPinfo Lite's schema (`country_code`, `asn` as `"AS15169"`, `as_name`, `as_domain`) and MaxMind's (`country.iso_code`, `autonomous_system_number` as int, `autonomous_system_organization`) in one pass; ASN is normalized to a plain int from either. MaxMind users who drop in their own ASN MMDB get `asn` + `asn_name` populated; `asn_domain` stays null because MaxMind doesn't carry it. 
+ +### Fixed + +- `get_ip_address_info()` now caches entries for IPs without reverse DNS. Previously the cache write was inside the `if reverse_dns is not None` branch, so every no-PTR IP re-did the MMDB read and DNS attempt on every call. +- Fixed three bugs in `parsedmarc/resources/maps/sortlists.py` that silently disabled the `type`-column validator and sorted the map case-sensitively, contrary to its documented behavior: + - Validator allowed-values map was keyed on `"Type"` (capital T), but the CSV header is `"type"` (lowercase), so every row bypassed validation. + - Types were read with trailing newlines via `f.readlines()`, so comparisons would not have matched even if the column name had been right. + - `sort_csv()` was called without `case_insensitive_sort=True`, which moved the sole mixed-case key (`United-domains.de`) to the top of the file instead of into its alphabetical position. +- Fixed eight pre-existing map rows with invalid or inconsistent `type` values that the now-working validator surfaced: casing corrections for `dhl.com` (`logistics` → `Logistics`), `ghm-grenoble.fr` (`healthcare` → `Healthcare`), and `regusnet.com` (`Real estate` → `Real Estate`); reclassified `lodestonegroup.com` from the nonexistent `Insurance` type to `Finance`; added missing `Religion` and `Utilities` entries to `base_reverse_dns_types.txt` so it matches the README's industry list. +- Fixed the `rt.ru` map entry: was classified as `RT,Government Media`, which conflated Rostelecom (the Russian telco that owns and uses `rt.ru`) with RT / Russia Today (which uses `rt.com`). Corrected to `Rostelecom,ISP`. + +### Upgrade notes + +- Output schema change: CSV, JSON, Elasticsearch, OpenSearch, and Splunk all gain three new fields per row (`source_asn`, `source_asn_name`, `source_asn_domain`). Existing queries and dashboards keep working; dashboards that want to consume the new fields will need to be updated. 
Elasticsearch / OpenSearch will add the new mappings on next document write. +- Rows for IPs without reverse DNS now populate `source_name` / `source_type` via ASN fallback. If downstream dashboards treated "null `source_name`" as a signal for "no rDNS", switch to checking `source_reverse_dns IS NULL` instead — that remains the unambiguous signal. + ## 9.8.0 ### Changes diff --git a/docs/source/output.md b/docs/source/output.md index a8d19e4..bc73403 100644 --- a/docs/source/output.md +++ b/docs/source/output.md @@ -44,7 +44,10 @@ of the report schema. "reverse_dns": null, "base_domain": null, "name": null, - "type": null + "type": null, + "asn": 7018, + "asn_name": "AT&T Services, Inc.", + "asn_domain": "att.com" }, "count": 2, "alignment": { @@ -90,7 +93,7 @@ of the report schema. ### CSV aggregate report ```text -xml_schema,org_name,org_email,org_extra_contact_info,report_id,begin_date,end_date,normalized_timespan,errors,domain,adkim,aspf,p,sp,pct,fo,source_ip_address,source_country,source_reverse_dns,source_base_domain,source_name,source_type,count,spf_aligned,dkim_aligned,dmarc_aligned,disposition,policy_override_reasons,policy_override_comments,envelope_from,header_from,envelope_to,dkim_domains,dkim_selectors,dkim_results,spf_domains,spf_scopes,spf_results +xml_schema,org_name,org_email,org_extra_contact_info,report_id,begin_date,end_date,normalized_timespan,errors,domain,adkim,aspf,p,sp,pct,fo,source_ip_address,source_country,source_reverse_dns,source_base_domain,source_name,source_type,source_asn,source_asn_name,source_asn_domain,count,spf_aligned,dkim_aligned,dmarc_aligned,disposition,policy_override_reasons,policy_override_comments,envelope_from,header_from,envelope_to,dkim_domains,dkim_selectors,dkim_results,spf_domains,spf_scopes,spf_results draft,acme.com,noreply-dmarc-support@acme.com,http://acme.com/dmarc/support,9391651994964116463,2012-04-28 00:00:00,2012-04-28 
23:59:59,False,,example.com,r,r,none,none,100,0,72.150.241.94,US,,,,,2,True,False,True,none,,,example.com,example.com,,example.com,none,fail,example.com,mfrom,pass draft,acme.com,noreply-dmarc-support@acme.com,http://acme.com/dmarc/support,9391651994964116463,2012-04-28 00:00:00,2012-04-28 23:59:59,False,,example.com,r,r,none,none,100,0,72.150.241.94,US,,,,,2,True,False,True,none,,,example.com,example.com,,example.com,none,fail,example.com,mfrom,pass @@ -123,7 +126,12 @@ Thanks to GitHub user [xennn](https://github.com/xennn) for the anonymized "ip_address": "10.10.10.10", "country": null, "reverse_dns": null, - "base_domain": null + "base_domain": null, + "name": null, + "type": null, + "asn": null, + "asn_name": null, + "asn_domain": null }, "authentication_mechanisms": [], "original_envelope_id": null, @@ -193,7 +201,7 @@ Thanks to GitHub user [xennn](https://github.com/xennn) for the anonymized ### CSV forensic report ```text -feedback_type,user_agent,version,original_envelope_id,original_mail_from,original_rcpt_to,arrival_date,arrival_date_utc,subject,message_id,authentication_results,dkim_domain,source_ip_address,source_country,source_reverse_dns,source_base_domain,delivery_result,auth_failure,reported_domain,authentication_mechanisms,sample_headers_only +feedback_type,user_agent,version,original_envelope_id,original_mail_from,original_rcpt_to,arrival_date,arrival_date_utc,subject,message_id,authentication_results,dkim_domain,source_ip_address,source_country,source_reverse_dns,source_base_domain,source_name,source_type,source_asn,source_asn_name,source_asn_domain,delivery_result,auth_failure,reported_domain,authentication_mechanisms,sample_headers_only auth-failure,Lua/1.0,1.0,,sharepoint@domain.de,peter.pan@domain.de,"Mon, 01 Oct 2018 11:20:27 +0200",2018-10-01 09:20:27,Subject,<38.E7.30937.BD6E1BB5@ mailrelay.de>,"dmarc=fail (p=none, dis=none) header.from=domain.de",,10.10.10.10,,,,policy,dmarc,domain.de,,False ``` @@ -238,4 +246,4 @@ 
auth-failure,Lua/1.0,1.0,,sharepoint@domain.de,peter.pan@domain.de,"Mon, 01 Oct ] } ] -``` \ No newline at end of file +``` diff --git a/parsedmarc/__init__.py b/parsedmarc/__init__.py index f15293d..103520b 100644 --- a/parsedmarc/__init__.py +++ b/parsedmarc/__init__.py @@ -1114,6 +1114,9 @@ def parsed_aggregate_reports_to_csv_rows( row["source_base_domain"] = record["source"]["base_domain"] row["source_name"] = record["source"]["name"] row["source_type"] = record["source"]["type"] + row["source_asn"] = record["source"]["asn"] + row["source_asn_name"] = record["source"]["asn_name"] + row["source_asn_domain"] = record["source"]["asn_domain"] row["count"] = record["count"] row["spf_aligned"] = record["alignment"]["spf"] row["dkim_aligned"] = record["alignment"]["dkim"] @@ -1205,6 +1208,9 @@ def parsed_aggregate_reports_to_csv( "source_base_domain", "source_name", "source_type", + "source_asn", + "source_asn_name", + "source_asn_domain", "count", "spf_aligned", "dkim_aligned", @@ -1406,6 +1412,9 @@ def parsed_forensic_reports_to_csv_rows( row["source_base_domain"] = report["source"]["base_domain"] row["source_name"] = report["source"]["name"] row["source_type"] = report["source"]["type"] + row["source_asn"] = report["source"]["asn"] + row["source_asn_name"] = report["source"]["asn_name"] + row["source_asn_domain"] = report["source"]["asn_domain"] row["source_country"] = report["source"]["country"] del row["source"] row["subject"] = report["parsed_sample"].get("subject") @@ -1451,6 +1460,9 @@ def parsed_forensic_reports_to_csv( "source_base_domain", "source_name", "source_type", + "source_asn", + "source_asn_name", + "source_asn_domain", "delivery_result", "auth_failure", "reported_domain", diff --git a/parsedmarc/constants.py b/parsedmarc/constants.py index 6039f1b..94c0d13 100644 --- a/parsedmarc/constants.py +++ b/parsedmarc/constants.py @@ -1,4 +1,4 @@ -__version__ = "9.8.0" +__version__ = "9.9.0" USER_AGENT = f"parsedmarc/{__version__}" diff --git 
a/parsedmarc/elastic.py b/parsedmarc/elastic.py index 9103a80..72223fb 100644 --- a/parsedmarc/elastic.py +++ b/parsedmarc/elastic.py @@ -79,6 +79,9 @@ class _AggregateReportDoc(Document): source_base_domain = Text() source_type = Text() source_name = Text() + source_asn = Integer() + source_asn_name = Text() + source_asn_domain = Text() message_count = Integer disposition = Text() dkim_aligned = Boolean() @@ -173,6 +176,9 @@ class _ForensicReportDoc(Document): source_ip_address = Ip() source_country = Text() source_reverse_dns = Text() + source_asn = Integer() + source_asn_name = Text() + source_asn_domain = Text() source_authentication_mechanisms = Text() source_auth_failures = Text() dkim_domain = Text() @@ -489,6 +495,9 @@ def save_aggregate_report_to_elasticsearch( source_base_domain=record["source"]["base_domain"], source_type=record["source"]["type"], source_name=record["source"]["name"], + source_asn=record["source"]["asn"], + source_asn_name=record["source"]["asn_name"], + source_asn_domain=record["source"]["asn_domain"], message_count=record["count"], disposition=record["policy_evaluated"]["disposition"], dkim_aligned=record["policy_evaluated"]["dkim"] is not None @@ -673,6 +682,9 @@ def save_forensic_report_to_elasticsearch( source_country=forensic_report["source"]["country"], source_reverse_dns=forensic_report["source"]["reverse_dns"], source_base_domain=forensic_report["source"]["base_domain"], + source_asn=forensic_report["source"]["asn"], + source_asn_name=forensic_report["source"]["asn_name"], + source_asn_domain=forensic_report["source"]["asn_domain"], authentication_mechanisms=forensic_report["authentication_mechanisms"], auth_failure=forensic_report["auth_failure"], dkim_domain=forensic_report["dkim_domain"], diff --git a/parsedmarc/opensearch.py b/parsedmarc/opensearch.py index c9dcaf2..5260c1f 100644 --- a/parsedmarc/opensearch.py +++ b/parsedmarc/opensearch.py @@ -82,6 +82,9 @@ class _AggregateReportDoc(Document): source_base_domain = Text() 
source_type = Text() source_name = Text() + source_asn = Integer() + source_asn_name = Text() + source_asn_domain = Text() message_count = Integer disposition = Text() dkim_aligned = Boolean() @@ -176,6 +179,9 @@ class _ForensicReportDoc(Document): source_ip_address = Ip() source_country = Text() source_reverse_dns = Text() + source_asn = Integer() + source_asn_name = Text() + source_asn_domain = Text() source_authentication_mechanisms = Text() source_auth_failures = Text() dkim_domain = Text() @@ -519,6 +525,9 @@ def save_aggregate_report_to_opensearch( source_base_domain=record["source"]["base_domain"], source_type=record["source"]["type"], source_name=record["source"]["name"], + source_asn=record["source"]["asn"], + source_asn_name=record["source"]["asn_name"], + source_asn_domain=record["source"]["asn_domain"], message_count=record["count"], disposition=record["policy_evaluated"]["disposition"], dkim_aligned=record["policy_evaluated"]["dkim"] is not None @@ -703,6 +712,9 @@ def save_forensic_report_to_opensearch( source_country=forensic_report["source"]["country"], source_reverse_dns=forensic_report["source"]["reverse_dns"], source_base_domain=forensic_report["source"]["base_domain"], + source_asn=forensic_report["source"]["asn"], + source_asn_name=forensic_report["source"]["asn_name"], + source_asn_domain=forensic_report["source"]["asn_domain"], authentication_mechanisms=forensic_report["authentication_mechanisms"], auth_failure=forensic_report["auth_failure"], dkim_domain=forensic_report["dkim_domain"], diff --git a/parsedmarc/splunk.py b/parsedmarc/splunk.py index ff660f0..9f83c2a 100644 --- a/parsedmarc/splunk.py +++ b/parsedmarc/splunk.py @@ -104,6 +104,9 @@ class HECClient(object): new_report["source_base_domain"] = record["source"]["base_domain"] new_report["source_type"] = record["source"]["type"] new_report["source_name"] = record["source"]["name"] + new_report["source_asn"] = record["source"]["asn"] + new_report["source_asn_name"] = 
record["source"]["asn_name"] + new_report["source_asn_domain"] = record["source"]["asn_domain"] new_report["message_count"] = record["count"] new_report["disposition"] = record["policy_evaluated"]["disposition"] new_report["spf_aligned"] = record["alignment"]["spf"] diff --git a/parsedmarc/types.py b/parsedmarc/types.py index f0d367d..91e4b35 100644 --- a/parsedmarc/types.py +++ b/parsedmarc/types.py @@ -40,6 +40,9 @@ class IPSourceInfo(TypedDict): base_domain: Optional[str] name: Optional[str] type: Optional[str] + asn: Optional[int] + asn_name: Optional[str] + asn_domain: Optional[str] class AggregateAlignment(TypedDict): diff --git a/parsedmarc/utils.py b/parsedmarc/utils.py index 9f85728..ea37172 100644 --- a/parsedmarc/utils.py +++ b/parsedmarc/utils.py @@ -151,6 +151,9 @@ class IPAddressInfo(TypedDict): base_domain: Optional[str] name: Optional[str] type: Optional[str] + asn: Optional[int] + asn_name: Optional[str] + asn_domain: Optional[str] def decode_base64(data: str) -> bytes: @@ -457,20 +460,7 @@ def load_ip_db( logger.info("Using bundled IP database") -def get_ip_address_country( - ip_address: str, *, db_path: Optional[str] = None -) -> Optional[str]: - """ - Returns the ISO code for the country associated - with the given IPv4 or IPv6 address - - Args: - ip_address (str): The IP address to query for - db_path (str): Path to a MMDB file from IPinfo, MaxMind, or DBIP - - Returns: - str: And ISO country code associated with the given IP address - """ +def _get_ip_database_path(db_path: Optional[str]) -> str: db_paths = [ "ipinfo_lite.mmdb", "GeoLite2-Country.mmdb", @@ -486,14 +476,13 @@ def get_ip_address_country( "dbip-country.mmdb", ] - if db_path is not None: - if not os.path.isfile(db_path): - logger.warning( - f"No file exists at {db_path}. Falling back to an " - "included copy of the IPinfo IP to Country " - "Lite database." 
- ) - db_path = None + if db_path is not None and not os.path.isfile(db_path): + logger.warning( + f"No file exists at {db_path}. Falling back to an " + "included copy of the IPinfo IP to Country " + "Lite database." + ) + db_path = None if db_path is None: for system_path in db_paths: @@ -513,14 +502,37 @@ def get_ip_address_country( if db_age > timedelta(days=30): logger.warning("IP database is more than a month old") - db_reader = maxminddb.open_database(db_path) + return db_path + + +class _IPDatabaseRecord(TypedDict): + country: Optional[str] + asn: Optional[int] + asn_name: Optional[str] + asn_domain: Optional[str] + + +def get_ip_address_db_record( + ip_address: str, *, db_path: Optional[str] = None +) -> _IPDatabaseRecord: + """Look up an IP in the configured MMDB and return country + ASN fields. + + IPinfo Lite carries ``country_code``, ``as_name``, and ``as_domain`` on + every record. MaxMind/DBIP country-only databases carry only country, so + ``asn_name`` / ``asn_domain`` come back None for those users. + """ + resolved_path = _get_ip_database_path(db_path) + db_reader = maxminddb.open_database(resolved_path) record = db_reader.get(ip_address) - # Support both the IPinfo schema (flat top-level ``country_code``) and the - # MaxMind/DBIP schema (nested ``country.iso_code``) so users dropping in - # their own MMDB from any of these providers keeps working. country: Optional[str] = None + asn: Optional[int] = None + asn_name: Optional[str] = None + asn_domain: Optional[str] = None if isinstance(record, dict): + # Support both the IPinfo schema (flat top-level ``country_code``) and + # the MaxMind/DBIP schema (nested ``country.iso_code``) so users + # dropping in their own MMDB from any of these providers keeps working. code = record.get("country_code") if code is None: nested = record.get("country") @@ -529,7 +541,52 @@ def get_ip_address_country( if isinstance(code, str): country = code - return country + # Normalize ASN to a plain integer. 
IPinfo stores it as a string like + # "AS15169"; MaxMind's ASN DB uses ``autonomous_system_number`` as an + # int. Integer form lets consumers do range queries and sort + # numerically; display-time formatting with an "AS" prefix is trivial. + raw_asn = record.get("asn") + if isinstance(raw_asn, int): + asn = raw_asn + elif isinstance(raw_asn, str) and raw_asn: + digits = raw_asn.removeprefix("AS").removeprefix("as") + if digits.isdigit(): + asn = int(digits) + if asn is None: + mm_asn = record.get("autonomous_system_number") + if isinstance(mm_asn, int): + asn = mm_asn + + name = record.get("as_name") or record.get("autonomous_system_organization") + if isinstance(name, str) and name: + asn_name = name + domain = record.get("as_domain") + if isinstance(domain, str) and domain: + asn_domain = domain.lower() + + return { + "country": country, + "asn": asn, + "asn_name": asn_name, + "asn_domain": asn_domain, + } + + +def get_ip_address_country( + ip_address: str, *, db_path: Optional[str] = None +) -> Optional[str]: + """ + Returns the ISO code for the country associated + with the given IPv4 or IPv6 address. 
+ + Args: + ip_address (str): The IP address to query for + db_path (str): Path to an MMDB file from IPinfo, MaxMind, or DBIP + + Returns: + str: An ISO country code associated with the given IP address + """ + return get_ip_address_db_record(ip_address, db_path=db_path)["country"] def load_reverse_dns_map( @@ -723,6 +780,9 @@ def get_ip_address_info( "base_domain": None, "name": None, "type": None, + "asn": None, + "asn_name": None, + "asn_domain": None, } if offline: reverse_dns = None @@ -733,9 +793,13 @@ def get_ip_address_info( timeout=timeout, retries=retries, ) - country = get_ip_address_country(ip_address, db_path=ip_db_path) - info["country"] = country + db_record = get_ip_address_db_record(ip_address, db_path=ip_db_path) + info["country"] = db_record["country"] + info["asn"] = db_record["asn"] + info["asn_name"] = db_record["asn_name"] + info["asn_domain"] = db_record["asn_domain"] info["reverse_dns"] = reverse_dns + if reverse_dns is not None: base_domain = get_base_domain(reverse_dns) if base_domain is not None: @@ -750,12 +814,34 @@ def get_ip_address_info( info["base_domain"] = base_domain info["type"] = service["type"] info["name"] = service["name"] - - if cache is not None: - cache[ip_address] = info - logger.debug(f"IP address {ip_address} added to cache") else: logger.debug(f"IP address {ip_address} reverse_dns not found") + # Fall back to ASN data for source attribution. ``reverse_dns`` and + # ``base_domain`` are left null so consumers can still tell an + # ASN-derived row apart from one resolved via a real PTR.
+ map_value: ReverseDNSMap = ( + reverse_dns_map if reverse_dns_map is not None else {} + ) + if len(map_value) == 0: + load_reverse_dns_map( + map_value, + always_use_local_file=always_use_local_files, + local_file_path=reverse_dns_map_path, + url=reverse_dns_map_url, + offline=offline, + ) + if info["asn_domain"] and info["asn_domain"] in map_value: + service = map_value[info["asn_domain"]] + info["name"] = service["name"] + info["type"] = service["type"] + elif info["asn_name"]: + # ASN-domain not in the map: surface the raw AS name with no + # classification. Better than leaving the row unattributed. + info["name"] = info["asn_name"] + + if cache is not None: + cache[ip_address] = info + logger.debug(f"IP address {ip_address} added to cache") return info diff --git a/tests.py b/tests.py index 1b126ce..b964c85 100755 --- a/tests.py +++ b/tests.py @@ -223,6 +223,67 @@ class Test(unittest.TestCase): parsedmarc.parsed_smtp_tls_reports_to_csv(result["report"]) print("Passed!") + def testIpAddressInfoSurfacesASNFields(self): + """ASN number, name, and domain from the bundled MMDB appear on every + IP info result, even when no PTR resolves.""" + info = parsedmarc.utils.get_ip_address_info("8.8.8.8", offline=True) + self.assertEqual(info["asn"], 15169) + self.assertIsInstance(info["asn"], int) + self.assertEqual(info["asn_domain"], "google.com") + self.assertTrue(info["asn_name"]) + + def testIpAddressInfoFallsBackToASNMapEntryWhenNoPTR(self): + """When reverse DNS is absent, the ASN domain should be used as a + lookup into the reverse_dns_map so the row still gets attributed, + while reverse_dns and base_domain remain null.""" + info = parsedmarc.utils.get_ip_address_info("8.8.8.8", offline=True) + self.assertIsNone(info["reverse_dns"]) + self.assertIsNone(info["base_domain"]) + self.assertEqual(info["name"], "Google (Including Gmail and Google Workspace)") + self.assertEqual(info["type"], "Email Provider") + + def testIpAddressInfoFallsBackToRawASNameOnMapMiss(self): 
+ """When neither PTR nor an ASN-map entry resolves, the raw AS name + is used as source_name with type left null — better than leaving + the row unattributed.""" + # The DB record is mocked with an as_domain that is not in the map, + # so this test exercises the asn_name fallback branch without + # depending on a specific map state. + from unittest.mock import patch + + with patch( + "parsedmarc.utils.get_ip_address_db_record", + return_value={ + "country": "US", + "asn": 64496, + "asn_name": "Some Unmapped Org, Inc.", + "asn_domain": "unmapped-for-this-test.example", + }, + ): + # Bypass cache to avoid prior-test pollution. + info = parsedmarc.utils.get_ip_address_info( + "192.0.2.1", offline=True, cache=None + ) + self.assertIsNone(info["reverse_dns"]) + self.assertIsNone(info["base_domain"]) + self.assertIsNone(info["type"]) + self.assertEqual(info["name"], "Some Unmapped Org, Inc.") + self.assertEqual(info["asn_domain"], "unmapped-for-this-test.example") + + def testAggregateCsvExposesASNColumns(self): + """The aggregate CSV output should include source_asn, source_asn_name, + and source_asn_domain columns.""" + result = parsedmarc.parse_report_file( + "samples/aggregate/!example.com!1538204542!1538463818.xml", + always_use_local_files=True, + offline=True, + ) + csv_text = parsedmarc.parsed_aggregate_reports_to_csv(result["report"]) + header = csv_text.splitlines()[0].split(",") + self.assertIn("source_asn", header) + self.assertIn("source_asn_name", header) + self.assertIn("source_asn_domain", header) + def testOpenSearchSigV4RequiresRegion(self): with self.assertRaises(opensearch_module.OpenSearchError): opensearch_module.set_hosts(