Update docs

Normalize timespans for aggregate reports in Elasticsearch and OpenSearch
Update launch configuration and metadata key for timespan in aggregate report
2026-02-18 15:36:24 +00:00 · 2025-12-01 17:04:37 -05:00 · 2025-12-01 16:34:40 -05:00 · 2025-12-01 16:10:41 -05:00 · 2025-11-30 19:43:14 -05:00 · 2025-11-30 16:17:07 -05:00
33 changed files with 2945 additions and 443 deletions
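
The normalization described in these commits splits a record's message count across the calendar days its report covers, in proportion to the seconds that fall in each day, and then rounds with a largest-remainder step so the daily counts still sum to the original total. The block below is a minimal standalone sketch of that idea, assuming timezone-aware UTC datetimes. The name `split_count_by_day` and the tuple return shape are illustrative only; they are not the `_bucket_interval_by_day` helper added in this change, which returns dicts.

```python
from datetime import datetime, timedelta, timezone


def split_count_by_day(begin, end, total_count):
    """Distribute total_count across the calendar days covered by [begin, end).

    Hypothetical helper for illustration; returns (begin, end, count) tuples.
    """
    span = (end - begin).total_seconds()
    if span <= 0 or total_count <= 0:
        return []

    # Collect (bucket_begin, bucket_end, seconds) for each day the span touches.
    buckets = []
    cursor = begin.replace(hour=0, minute=0, second=0, microsecond=0)
    while cursor < end:
        next_day = cursor + timedelta(days=1)
        start, stop = max(begin, cursor), min(end, next_day)
        if stop > start:
            buckets.append((start, stop, (stop - start).total_seconds()))
        cursor = next_day

    # Largest-remainder rounding: floor every share, then give the leftover
    # counts to the buckets with the largest fractional parts.
    exact = [total_count * seconds / span for _, _, seconds in buckets]
    counts = [int(value) for value in exact]
    leftover = total_count - sum(counts)
    by_fraction = sorted(
        range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        counts[i] += 1

    return [(b, e, c) for (b, e, _), c in zip(buckets, counts) if c > 0]


begin = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
end = datetime(2025, 1, 3, 6, 0, tzinfo=timezone.utc)
parts = split_count_by_day(begin, end, 10)
print([count for _, _, count in parts])  # [3, 6, 1]
```

A 42-hour report with 10 messages covers 12, 24, and 6 hours of three consecutive days, so the exact shares are roughly 2.86, 5.71, and 1.43. Flooring gives 2, 5, and 1, and the two leftover messages go to the first two days, which have the largest fractional parts.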
--- a/.github/workflows/python-tests.yml
+++ b/.github/workflows/python-tests.yml
@@ -30,7 +30,7 @@ jobs:
    strategy:
      fail-fast: false
      matrix:
-        python-version: ["3.9", "3.10", "3.11", "3.12"]
+        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]

    steps:
    - uses: actions/checkout@v4
--- a/.gitignore
+++ b/.gitignore
@@ -106,7 +106,7 @@ ENV/
 .idea/

 # VS Code launch config
-.vscode/launch.json
+#.vscode/launch.json

 # Visual Studio Code settings
 #.vscode/
@@ -142,3 +142,6 @@ scratch.py

 parsedmarc/resources/maps/base_reverse_dns.csv
 parsedmarc/resources/maps/unknown_base_reverse_dns.csv
+parsedmarc/resources/maps/sus_domains.csv
+parsedmarc/resources/maps/unknown_domains.txt
+*.bak
--- a/.vscode/launch.json
+++ b/.vscode/launch.json
@@ -0,0 +1,45 @@
+{
+  // Use IntelliSense to learn about possible attributes.
+  // Hover to view descriptions of existing attributes.
+  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
+  "version": "0.2.0",
+  "configurations": [
+    {
+      "name": "Python Debugger: Current File",
+      "type": "debugpy",
+      "request": "launch",
+      "program": "${file}",
+      "console": "integratedTerminal"
+    },
+    {
+      "name": "tests.py",
+      "type": "debugpy",
+      "request": "launch",
+      "program": "tests.py",
+      "console": "integratedTerminal"
+    },
+    {
+      "name": "sample",
+      "type": "debugpy",
+      "request": "launch",
+      "module": "parsedmarc.cli",
+      "args": ["samples/private/sample"]
+    },
+    {
+      "name": "sortlists.py",
+      "type": "debugpy",
+      "request": "launch",
+      "program": "sortlists.py",
+      "cwd": "${workspaceFolder}/parsedmarc/resources/maps",
+      "console": "integratedTerminal"
+    },
+    {
+      "name": "find_unknown_base_reverse_dns.py",
+      "type": "debugpy",
+      "request": "launch",
+      "program": "find_unknown_base_reverse_dns.py",
+      "cwd": "${workspaceFolder}/parsedmarc/resources/maps",
+      "console": "integratedTerminal"
+    }
+  ]
+}
--- a/.vscode/settings.json
+++ b/.vscode/settings.json
@@ -13,6 +13,7 @@
        "automodule",
        "backported",
        "bellsouth",
+        "boto",
        "brakhane",
        "Brightmail",
        "CEST",
@@ -36,6 +37,7 @@
        "expiringdict",
        "fieldlist",
        "genindex",
+        "geoip",
        "geoipupdate",
        "Geolite",
        "geolocation",
@@ -44,7 +46,10 @@
        "hostnames",
        "htpasswd",
        "httpasswd",
+        "httplib",
        "IMAP",
+        "imapclient",
+        "infile",
        "Interaktive",
        "IPDB",
        "journalctl",
@@ -80,14 +85,18 @@
        "nosecureimap",
        "nosniff",
        "nwettbewerb",
+        "opensearch",
        "parsedmarc",
        "passsword",
        "Postorius",
        "premade",
        "procs",
        "publicsuffix",
+        "publicsuffixlist",
        "publixsuffix",
+        "pygelf",
        "pypy",
+        "pytest",
        "quickstart",
        "Reindex",
        "replyto",
@@ -95,10 +104,13 @@
        "Rollup",
        "Rpdm",
        "SAMEORIGIN",
+        "sdist",
        "Servernameone",
        "setuptools",
        "smartquotes",
        "SMTPTLS",
+        "sortlists",
+        "sortmaps",
        "sourcetype",
        "STARTTLS",
        "tasklist",
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,68 @@
 Changelog
 =========

+9.0.0
+------
+
+- Normalize aggregate DMARC report volumes when a report timespan exceeds 24 hours
+
+8.19.1
+------
+
+- Ignore HTML content type in report email parsing (#626)
+
+8.19.0
+------
+
+- Add multi-tenant support via an index-prefix domain mapping file
+- Add PSL overrides so that services like AWS are correctly identified
+- Additional improvements to report type detection
+- Fix webhook timeout parsing (PR #623)
+- Output to STDOUT when the new general config boolean `silent` is set to `False` (Closes #614)
+- Additional services added to `base_reverse_dns_map.csv`
+
+8.18.9
+------
+
+- Complete fix for #687 and more robust report type detection
+
+8.18.8
+------
+
+- Fix parsing emails with an uncompressed aggregate report attachment (Closes #607)
+- Add `--no-prettify-json` CLI option (PR #617)
+
+8.18.7
+------
+
+- Remove improper spaces from `base_reverse_dns_map.csv` (Closes #612)
+
+8.18.6
+------
+
+- Fix the `since` option so that it works correctly with weeks (PR #604)
+- Add 183 entries to `base_reverse_dns_map.csv`
+- Add 57 entries to `known_unknown_base_reverse_dns.txt`
+- Check for invalid UTF-8 bytes in `base_reverse_dns_map.csv` at build
+- Exclude unneeded items from the `parsedmarc.resources` module at build
+
+8.18.5
+------
+
+- Fix CSV download
+
+8.18.4
+------
+
+- Fix webhooks
+
+8.18.3
+------
+
+- Move `__version__` to `parsedmarc.constants`
+- Create a constant `USER_AGENT`
+- Use the HTTP `User-Agent` header value `parsedmarc/version` for all HTTP requests
+
 8.18.2
 ------

@@ -676,7 +738,7 @@ in the ``elasticsearch`` configuration file section (closes issue #78)
 -----

 - Add filename and line number to logging output
- Improved IMAP error handling  
+- Improved IMAP error handling
 - Add CLI options

  ```text
--- a/2
+++ b/2
@@ -1,4 +1,4 @@
-ARG BASE_IMAGE=python:3.9-slim
+ARG BASE_IMAGE=python:3.13-slim
 ARG USERNAME=parsedmarc
 ARG USER_UID=1000
 ARG USER_GID=$USER_UID
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ Package](https://img.shields.io/pypi/v/parsedmarc.svg)](https://pypi.org/project
 [![PyPI - Downloads](https://img.shields.io/pypi/dm/parsedmarc?color=blue)](https://pypistats.org/packages/parsedmarc)

 <p align="center">
-  <img src="https://github.com/domainaware/parsedmarc/raw/master/docs/source/_static/screenshots/dmarc-summary-charts.png?raw=true" alt="A screenshot of DMARC summary charts in Kibana"/>
+  <img src="https://raw.githubusercontent.com/domainaware/parsedmarc/refs/heads/master/docs/source/_static/screenshots/dmarc-summary-charts.png?raw=true" alt="A screenshot of DMARC summary charts in Kibana"/>
 </p>

 `parsedmarc` is a Python module and CLI utility for parsing DMARC
@@ -34,10 +34,10 @@ Thanks to all

 ## Features

- Parses draft and 1.0 standard aggregate/rua reports
- Parses forensic/failure/ruf reports
- Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail
-    API
+- Parses draft and 1.0 standard aggregate/rua DMARC reports
+- Parses forensic/failure/ruf DMARC reports
+- Parses reports from SMTP TLS Reporting
+- Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
 - Transparently handles gzip or zip compressed reports
 - Consistent data structures
 - Simple JSON and/or CSV output
--- a/build.sh
+++ b/build.sh
@@ -18,8 +18,11 @@ if [  -d "./../parsedmarc-docs" ]; then
  cp -rf build/html/* ../../parsedmarc-docs/
 fi
 cd ..
-sort -o "parsedmarc/resources/maps/known_unknown_base_reverse_dns.txt" "parsedmarc/resources/maps/known_unknown_base_reverse_dns.txt"
-./sortmaps.py
+cd parsedmarc/resources/maps
+python3 sortlists.py
+echo "Checking for invalid UTF-8 bytes in base_reverse_dns_map.csv"
+python3 find_bad_utf8.py base_reverse_dns_map.csv
+cd ../../..
 python3 tests.py
 rm -rf dist/ build/
-hatch build
+hatch build
--- a/docs/source/api.md
+++ b/docs/source/api.md
@@ -21,7 +21,6 @@
   :members:
 ```

-
 ## parsedmarc.splunk

 ```{eval-rst}
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -33,15 +33,16 @@ and Valimail.

 ## Features

- Parses draft and 1.0 standard aggregate/rua reports
- Parses forensic/failure/ruf reports
+- Parses draft and 1.0 standard aggregate/rua DMARC reports
+- Parses forensic/failure/ruf DMARC reports
+- Parses reports from SMTP TLS Reporting
 - Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
 - Transparently handles gzip or zip compressed reports
 - Consistent data structures
 - Simple JSON and/or CSV output
 - Optionally email the results
- Optionally send the results to Elasticsearch/OpenSearch and/or Splunk, for use with
-  premade dashboards
+- Optionally send the results to Elasticsearch, OpenSearch, and/or Splunk, for use
+    with premade dashboards
 - Optionally send reports to Apache Kafka

 ```{toctree}
--- a/docs/source/output.md
+++ b/docs/source/output.md
@@ -23,6 +23,8 @@ of the report schema.
    "report_id": "9391651994964116463",
    "begin_date": "2012-04-27 20:00:00",
    "end_date": "2012-04-28 19:59:59",
+    "timespan_requires_normalization": false,
+    "original_timespan_seconds": 86399,
    "errors": []
  },
  "policy_published": {
@@ -39,8 +41,10 @@ of the report schema.
      "source": {
        "ip_address": "72.150.241.94",
        "country": "US",
-        "reverse_dns": "adsl-72-150-241-94.shv.bellsouth.net",
-        "base_domain": "bellsouth.net"
+        "reverse_dns": null,
+        "base_domain": null,
+        "name": null,
+        "type": null
      },
      "count": 2,
      "alignment": {
@@ -74,7 +78,10 @@ of the report schema.
            "result": "pass"
          }
        ]
-      }
+      },
+      "normalized_timespan": false,
+      "interval_begin": "2012-04-28 00:00:00",
+      "interval_end": "2012-04-28 23:59:59"
    }
  ]
 }
@@ -83,8 +90,10 @@ of the report schema.
 ### CSV aggregate report

 ```text
-xml_schema,org_name,org_email,org_extra_contact_info,report_id,begin_date,end_date,errors,domain,adkim,aspf,p,sp,pct,fo,source_ip_address,source_country,source_reverse_dns,source_base_domain,count,spf_aligned,dkim_aligned,dmarc_aligned,disposition,policy_override_reasons,policy_override_comments,envelope_from,header_from,envelope_to,dkim_domains,dkim_selectors,dkim_results,spf_domains,spf_scopes,spf_results
-draft,acme.com,noreply-dmarc-support@acme.com,http://acme.com/dmarc/support,9391651994964116463,2012-04-27 20:00:00,2012-04-28 19:59:59,,example.com,r,r,none,none,100,0,72.150.241.94,US,adsl-72-150-241-94.shv.bellsouth.net,bellsouth.net,2,True,False,True,none,,,example.com,example.com,,example.com,none,fail,example.com,mfrom,pass
+xml_schema,org_name,org_email,org_extra_contact_info,report_id,begin_date,end_date,normalized_timespan,errors,domain,adkim,aspf,p,sp,pct,fo,source_ip_address,source_country,source_reverse_dns,source_base_domain,source_name,source_type,count,spf_aligned,dkim_aligned,dmarc_aligned,disposition,policy_override_reasons,policy_override_comments,envelope_from,header_from,envelope_to,dkim_domains,dkim_selectors,dkim_results,spf_domains,spf_scopes,spf_results
+draft,acme.com,noreply-dmarc-support@acme.com,http://acme.com/dmarc/support,9391651994964116463,2012-04-28 00:00:00,2012-04-28 23:59:59,False,,example.com,r,r,none,none,100,0,72.150.241.94,US,,,,,2,True,False,True,none,,,example.com,example.com,,example.com,none,fail,example.com,mfrom,pass
+draft,acme.com,noreply-dmarc-support@acme.com,http://acme.com/dmarc/support,9391651994964116463,2012-04-28 00:00:00,2012-04-28 23:59:59,False,,example.com,r,r,none,none,100,0,72.150.241.94,US,,,,,2,True,False,True,none,,,example.com,example.com,,example.com,none,fail,example.com,mfrom,pass
+
 ```

 ## Sample forensic report output
--- a/docs/source/usage.md
+++ b/docs/source/usage.md
@@ -4,47 +4,50 @@

 ```text
 usage: parsedmarc [-h] [-c CONFIG_FILE] [--strip-attachment-payloads] [-o OUTPUT]
-                   [--aggregate-json-filename AGGREGATE_JSON_FILENAME]
-                   [--forensic-json-filename FORENSIC_JSON_FILENAME]
-                   [--aggregate-csv-filename AGGREGATE_CSV_FILENAME]
-                   [--forensic-csv-filename FORENSIC_CSV_FILENAME]
-                   [-n NAMESERVERS [NAMESERVERS ...]] [-t DNS_TIMEOUT] [--offline]
-                   [-s] [--verbose] [--debug] [--log-file LOG_FILE] [-v]
-                   [file_path ...]
+                  [--aggregate-json-filename AGGREGATE_JSON_FILENAME] [--forensic-json-filename FORENSIC_JSON_FILENAME]
+                  [--smtp-tls-json-filename SMTP_TLS_JSON_FILENAME] [--aggregate-csv-filename AGGREGATE_CSV_FILENAME]
+                  [--forensic-csv-filename FORENSIC_CSV_FILENAME] [--smtp-tls-csv-filename SMTP_TLS_CSV_FILENAME]
+                  [-n NAMESERVERS [NAMESERVERS ...]] [-t DNS_TIMEOUT] [--offline] [-s] [-w] [--verbose] [--debug]
+                  [--log-file LOG_FILE] [--no-prettify-json] [-v]
+                  [file_path ...]

- Parses DMARC reports
+Parses DMARC reports

- positional arguments:
-   file_path             one or more paths to aggregate or forensic report
-                         files, emails, or mbox files'
+positional arguments:
+  file_path             one or more paths to aggregate or forensic report files, emails, or mbox files'

- optional arguments:
-   -h, --help            show this help message and exit
-   -c CONFIG_FILE, --config-file CONFIG_FILE
-                         a path to a configuration file (--silent implied)
-   --strip-attachment-payloads
-                         remove attachment payloads from forensic report output
-   -o OUTPUT, --output OUTPUT
-                         write output files to the given directory
-   --aggregate-json-filename AGGREGATE_JSON_FILENAME
-                         filename for the aggregate JSON output file
-   --forensic-json-filename FORENSIC_JSON_FILENAME
-                         filename for the forensic JSON output file
-   --aggregate-csv-filename AGGREGATE_CSV_FILENAME
-                         filename for the aggregate CSV output file
-   --forensic-csv-filename FORENSIC_CSV_FILENAME
-                         filename for the forensic CSV output file
-   -n NAMESERVERS [NAMESERVERS ...], --nameservers NAMESERVERS [NAMESERVERS ...]
-                         nameservers to query
-   -t DNS_TIMEOUT, --dns_timeout DNS_TIMEOUT
-                         number of seconds to wait for an answer from DNS
-                         (default: 2.0)
-   --offline             do not make online queries for geolocation or DNS
-   -s, --silent          only print errors and warnings
-   --verbose             more verbose output
-   --debug               print debugging information
-   --log-file LOG_FILE   output logging to a file
-   -v, --version         show program's version number and exit
+options:
+  -h, --help            show this help message and exit
+  -c CONFIG_FILE, --config-file CONFIG_FILE
+                        a path to a configuration file (--silent implied)
+  --strip-attachment-payloads
+                        remove attachment payloads from forensic report output
+  -o OUTPUT, --output OUTPUT
+                        write output files to the given directory
+  --aggregate-json-filename AGGREGATE_JSON_FILENAME
+                        filename for the aggregate JSON output file
+  --forensic-json-filename FORENSIC_JSON_FILENAME
+                        filename for the forensic JSON output file
+  --smtp-tls-json-filename SMTP_TLS_JSON_FILENAME
+                        filename for the SMTP TLS JSON output file
+  --aggregate-csv-filename AGGREGATE_CSV_FILENAME
+                        filename for the aggregate CSV output file
+  --forensic-csv-filename FORENSIC_CSV_FILENAME
+                        filename for the forensic CSV output file
+  --smtp-tls-csv-filename SMTP_TLS_CSV_FILENAME
+                        filename for the SMTP TLS CSV output file
+  -n NAMESERVERS [NAMESERVERS ...], --nameservers NAMESERVERS [NAMESERVERS ...]
+                        nameservers to query
+  -t DNS_TIMEOUT, --dns_timeout DNS_TIMEOUT
+                        number of seconds to wait for an answer from DNS (default: 2.0)
+  --offline             do not make online queries for geolocation or DNS
+  -s, --silent          only print errors
+  -w, --warnings        print warnings in addition to errors
+  --verbose             more verbose output
+  --debug               print debugging information
+  --log-file LOG_FILE   output logging to a file
+  --no-prettify-json    output JSON in a single line without indentation
+  -v, --version         show program's version number and exit
 ```

 :::{note}
@@ -120,8 +123,10 @@ The full set of configuration options are:
      Elasticsearch, Splunk and/or S3
  - `save_smtp_tls` - bool: Save SMTP-STS report data to
      Elasticsearch, Splunk and/or S3
+  - `index_prefix_domain_map` - str: Path to a YAML file that maps OpenSearch/Elasticsearch index prefixes to domain names
  - `strip_attachment_payloads` - bool: Remove attachment
      payloads from results
+  - `silent` - bool: Set this to `False` to output results to STDOUT
  - `output` - str: Directory to place JSON and CSV files in.  This is required if you set either of the JSON output file options.
  - `aggregate_json_filename` - str: filename for the aggregate
      JSON output file
@@ -369,7 +374,7 @@ The full set of configuration options are:
  - `mode` - str: The GELF transport type to use. Valid modes: `tcp`, `udp`, `tls`

 - `maildir`
-  - `reports_folder` - str: Full path for mailbox maidir location (Default: `INBOX`)
+  - `maildir_path` - str: Full path for mailbox maildir location (Default: `INBOX`)
  - `maildir_create` - bool: Create maildir if not present (Default: False)

 - `webhook` - Post the individual reports to a webhook url with the report as the JSON body
@@ -445,6 +450,28 @@ PUT _cluster/settings
 Increasing this value increases resource usage.
 :::

+## Multi-tenant support
+
+Starting in `8.19.0`, ParseDMARC provides multi-tenant support by placing data into separate OpenSearch or Elasticsearch index prefixes. To set this up, create a YAML file in which each key is a tenant name and each value is a list of that tenant's domains (excluding subdomains), like this:
+
+```yaml
+example:
+  - example.com
+  - example.net
+  - example.org
+
+whalensolutions:
+  - whalensolutions.com
+```
+
+Save the file where the user running ParseDMARC can read it, then set `index_prefix_domain_map` to its path in the `[general]` section of the ParseDMARC configuration file. Do not set an `index_prefix` option in the `[elasticsearch]` or `[opensearch]` sections.
+
+When this is configured correctly and ParseDMARC finds that a report relates to a domain in the mapping, the report is saved to an index whose name is prefixed with the tenant name and a trailing underscore. You can then use the security features of OpenSearch or the ELK stack to grant users access to only the indexes they need.
+
+:::{note}
+A domain cannot be used in multiple tenant lists. Only the first prefix list that contains the matching domain is used.
+:::
+
 ## Running parsedmarc as a systemd service

 Use systemd to run `parsedmarc` as a service and process reports as
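
The multi-tenant lookup documented in the `docs/source/usage.md` changes above can be pictured as a first-match search over the tenant lists. The block below is a hedged sketch only: the lookup code itself is not part of the hunks shown here, so `TENANT_DOMAINS`, `index_prefix_for`, and the case-insensitive comparison are assumptions made for illustration, and the mapping is written as a Python dict rather than loaded from the YAML file.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory form of the YAML mapping shown in the documentation.
TENANT_DOMAINS: Dict[str, List[str]] = {
    "example": ["example.com", "example.net", "example.org"],
    "whalensolutions": ["whalensolutions.com"],
}


def index_prefix_for(
    base_domain: str, mapping: Dict[str, List[str]]
) -> Optional[str]:
    """Return "<tenant>_" for the first tenant that lists base_domain.

    Returns None when no tenant claims the domain. Lowercasing both sides
    is an assumption of this sketch, not documented behavior.
    """
    wanted = base_domain.strip().lower()
    for tenant, domains in mapping.items():
        if wanted in (domain.strip().lower() for domain in domains):
            return f"{tenant}_"
    return None


print(index_prefix_for("example.net", TENANT_DOMAINS))  # example_
print(index_prefix_for("unlisted.test", TENANT_DOMAINS))  # None
```

Because dicts preserve insertion order, a domain listed under two tenants resolves to the tenant that appears first, which matches the first-match rule stated in the note.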
--- a/kibana/export.ndjson
+++ b/kibana/export.ndjson
--- a/parsedmarc/__init__.py
+++ b/parsedmarc/__init__.py
@@ -2,6 +2,10 @@

 """A Python package for parsing DMARC reports"""

+from __future__ import annotations
+
+from typing import Dict, List, Any, Union, IO, Callable
+
 import binascii
 import email
 import email.utils
@@ -17,9 +21,8 @@ import zlib
 from base64 import b64decode
 from collections import OrderedDict
 from csv import DictWriter
-from datetime import datetime, timedelta
+from datetime import datetime, timedelta, timezone, tzinfo
 from io import BytesIO, StringIO
-from typing import Callable

 import mailparser
 import xmltodict
@@ -34,12 +37,13 @@ from parsedmarc.mail import (
    MSGraphConnection,
    GmailConnection,
 )
+
+from parsedmarc.constants import __version__
 from parsedmarc.utils import get_base_domain, get_ip_address_info
 from parsedmarc.utils import is_outlook_msg, convert_outlook_msg
 from parsedmarc.utils import parse_email
 from parsedmarc.utils import timestamp_to_human, human_timestamp_to_datetime

-__version__ = "8.18.2"

 logger.debug("parsedmarc v{0}".format(__version__))

@@ -78,15 +82,196 @@ class InvalidForensicReport(InvalidDMARCReport):
    """Raised when an invalid DMARC forensic report is encountered"""


+def _bucket_interval_by_day(
+    begin: datetime,
+    end: datetime,
+    total_count: int,
+) -> List[Dict[str, Any]]:
+    """
+    Split the interval [begin, end) into daily buckets and distribute
+    `total_count` proportionally across those buckets.
+
+    The function:
+      1. Identifies each calendar day touched by [begin, end)
+      2. Computes how many seconds of the interval fall into each day
+      3. Assigns counts in proportion to those overlaps
+      4. Ensures the final counts sum exactly to total_count
+
+    Args:
+        begin: timezone-aware datetime, inclusive start of interval
+        end: timezone-aware datetime, exclusive end of interval
+        total_count: number of messages to distribute
+
+    Returns:
+        A list of dicts like:
+            {
+                "begin": datetime,
+                "end": datetime,
+                "count": int
+            }
+    """
+    # --- Input validation ----------------------------------------------------
+    if begin.tzinfo is None or end.tzinfo is None:
+        raise ValueError("begin and end must be timezone-aware")
+    if begin.tzinfo is not end.tzinfo:
+        raise ValueError("begin and end must have the same tzinfo")
+    if begin > end:
+        raise ValueError("begin must not be later than end")
+    if total_count < 0:
+        raise ValueError("total_count must be non-negative")
+
+    # --- Short-circuit trivial cases -----------------------------------------
+    interval_seconds = (end - begin).total_seconds()
+    if interval_seconds <= 0 or total_count == 0:
+        return []
+
+    tz: tzinfo = begin.tzinfo
+
+    # --- Step 1: Determine all calendar days touched by [begin, end) ----------
+    #
+    # For example:
+    #   begin = Jan 1 12:00
+    #   end   = Jan 3 06:00
+    #
+    # We need buckets for:
+    #   Jan 1 12:00 → Jan 2 00:00
+    #   Jan 2 00:00 → Jan 3 00:00
+    #   Jan 3 00:00 → Jan 3 06:00
+    #
+
+    # Start at midnight on the day of `begin`.
+    day_cursor = datetime(begin.year, begin.month, begin.day, tzinfo=tz)
+
+    # `day_cursor` is midnight at the start of `begin`'s calendar day, so it is
+    # normally at or before `begin` (e.g. begin = 10:00 gives 00:00 that day).
+    # Defensive guard: if it ever lands after `begin`, step back one day.
+    if day_cursor > begin:
+        day_cursor -= timedelta(days=1)
+
+    day_buckets: List[Dict[str, Any]] = []
+
+    while day_cursor < end:
+        day_start = day_cursor
+        day_end = day_cursor + timedelta(days=1)
+
+        # Overlap between [begin, end) and this day
+        overlap_start = max(begin, day_start)
+        overlap_end = min(end, day_end)
+
+        overlap_seconds = (overlap_end - overlap_start).total_seconds()
+
+        if overlap_seconds > 0:
+            day_buckets.append(
+                {
+                    "begin": overlap_start,
+                    "end": overlap_end,
+                    "seconds": overlap_seconds,
+                }
+            )
+
+        day_cursor = day_end
+
+    # --- Step 2: Pro-rate counts across buckets -------------------------------
+    #
+    # Compute the exact fractional count for each bucket:
+    #     bucket_fraction = bucket_seconds / interval_seconds
+    #     bucket_exact    = total_count * bucket_fraction
+    #
+    # Then apply a "largest remainder" rounding strategy to ensure the sum
+    # equals exactly total_count.
+
+    exact_values: List[float] = [
+        (b["seconds"] / interval_seconds) * total_count for b in day_buckets
+    ]
+
+    floor_values: List[int] = [int(x) for x in exact_values]
+    fractional_parts: List[float] = [x - int(x) for x in exact_values]
+
+    # How many counts do we still need to distribute after flooring?
+    remainder = total_count - sum(floor_values)
+
+    # Sort buckets by descending fractional remainder
+    indices_by_fraction = sorted(
+        range(len(day_buckets)),
+        key=lambda i: fractional_parts[i],
+        reverse=True,
+    )
+
+    # Start with floor values
+    final_counts = floor_values[:]
+
+    # Add +1 to the buckets with the largest fractional parts
+    for idx in indices_by_fraction[:remainder]:
+        final_counts[idx] += 1
+
+    # --- Step 3: Build the final per-day result list -------------------------
+    results: List[Dict[str, Any]] = []
+    for bucket, count in zip(day_buckets, final_counts):
+        if count > 0:
+            results.append(
+                {
+                    "begin": bucket["begin"],
+                    "end": bucket["end"],
+                    "count": count,
+                }
+            )
+
+    return results
+
+
+def _append_parsed_record(
+    parsed_record: Dict[str, Any],
+    records: List[Dict[str, Any]],
+    begin_dt: datetime,
+    end_dt: datetime,
+    normalize: bool,
+) -> None:
+    """
+    Append a parsed DMARC record either unchanged or normalized.
+
+    Args:
+        parsed_record: The record returned by _parse_report_record().
+        records: Accumulating list of output records.
+        begin_dt: Report-level begin datetime (UTC).
+        end_dt: Report-level end datetime (UTC).
+        normalize: Whether this report exceeded the allowed timespan
+                   and should be normalized per-day.
+    """
+
+    if not normalize:
+        parsed_record["normalized_timespan"] = False
+        parsed_record["interval_begin"] = begin_dt.strftime("%Y-%m-%d %H:%M:%S")
+        parsed_record["interval_end"] = end_dt.strftime("%Y-%m-%d %H:%M:%S")
+
+        records.append(parsed_record)
+        return
+
+    # Normalization path: break record into daily buckets
+    total_count = int(parsed_record.get("count", 0))
+    buckets = _bucket_interval_by_day(begin_dt, end_dt, total_count)
+    if not buckets:
+        return
+
+    for bucket in buckets:
+        new_rec = parsed_record.copy()
+        new_rec["count"] = bucket["count"]
+        new_rec["normalized_timespan"] = True
+
+        new_rec["interval_begin"] = bucket["begin"].strftime("%Y-%m-%d %H:%M:%S")
+        new_rec["interval_end"] = bucket["end"].strftime("%Y-%m-%d %H:%M:%S")
+
+        records.append(new_rec)
+
+
 def _parse_report_record(
-    record,
-    ip_db_path=None,
-    always_use_local_files=False,
-    reverse_dns_map_path=None,
-    reverse_dns_map_url=None,
-    offline=False,
-    nameservers=None,
-    dns_timeout=2.0,
+    record: dict,
+    ip_db_path: str = None,
+    always_use_local_files: bool = False,
+    reverse_dns_map_path: str = None,
+    reverse_dns_map_url: str = None,
+    offline: bool = False,
+    nameservers: list[str] = None,
+    dns_timeout: float = 2.0,
 ):
    """
    Converts a record from a DMARC aggregate report into a more consistent
@@ -241,7 +426,7 @@ def _parse_report_record(
    return new_record


-def _parse_smtp_tls_failure_details(failure_details):
+def _parse_smtp_tls_failure_details(failure_details: dict):
    try:
        new_failure_details = OrderedDict(
            result_type=failure_details["result-type"],
@@ -277,7 +462,7 @@ def _parse_smtp_tls_failure_details(failure_details):
        raise InvalidSMTPTLSReport(str(e))


-def _parse_smtp_tls_report_policy(policy):
+def _parse_smtp_tls_report_policy(policy: dict):
    policy_types = ["tlsa", "sts", "no-policy-found"]
    try:
        policy_domain = policy["policy"]["policy-domain"]
@@ -314,7 +499,7 @@ def _parse_smtp_tls_report_policy(policy):
        raise InvalidSMTPTLSReport(str(e))


-def parse_smtp_tls_report_json(report):
+def parse_smtp_tls_report_json(report: dict):
    """Parses and validates an SMTP TLS report"""
    required_fields = [
        "organization-name",
@@ -353,7 +538,7 @@ def parse_smtp_tls_report_json(report):
        raise InvalidSMTPTLSReport(str(e))


-def parsed_smtp_tls_reports_to_csv_rows(reports):
+def parsed_smtp_tls_reports_to_csv_rows(
+    reports: Union[Dict[str, Any], List[Dict[str, Any]]],
+):
    """Converts one oor more parsed SMTP TLS reports into a list of single
    layer OrderedDict objects suitable for use in a CSV"""
    if type(reports) is OrderedDict:
@@ -388,7 +573,7 @@ def parsed_smtp_tls_reports_to_csv_rows(reports):
    return rows


-def parsed_smtp_tls_reports_to_csv(reports):
+def parsed_smtp_tls_reports_to_csv(
+    reports: Union[Dict[str, Any], List[Dict[str, Any]]],
+):
    """
    Converts one or more parsed SMTP TLS reports to flat CSV format, including
    headers
@@ -434,15 +619,16 @@ def parsed_smtp_tls_reports_to_csv(reports):


 def parse_aggregate_report_xml(
-    xml,
-    ip_db_path=None,
-    always_use_local_files=False,
-    reverse_dns_map_path=None,
-    reverse_dns_map_url=None,
-    offline=False,
-    nameservers=None,
-    timeout=2.0,
-    keep_alive=None,
+    xml: str,
+    ip_db_path: str = None,
+    always_use_local_files: bool = False,
+    reverse_dns_map_path: str = None,
+    reverse_dns_map_url: str = None,
+    offline: bool = False,
+    nameservers: list[str] = None,
+    timeout: float = 2.0,
+    keep_alive: Callable = None,
+    normalize_timespan_threshold_hours: float = 24.0,
 ):
    """Parses a DMARC XML report string and returns a consistent OrderedDict

@@ -457,6 +643,7 @@ def parse_aggregate_report_xml(
            (Cloudflare's public DNS resolvers by default)
        timeout (float): Sets the DNS timeout in seconds
        keep_alive (callable): Keep alive function
+        normalize_timespan_threshold_hours (float): Normalize reports whose timespan exceeds this many hours

    Returns:
        OrderedDict: The parsed aggregate DMARC report
@@ -521,13 +708,27 @@ def parse_aggregate_report_xml(
        report_id = report_id.replace("<", "").replace(">", "").split("@")[0]
        new_report_metadata["report_id"] = report_id
        date_range = report["report_metadata"]["date_range"]
-        if int(date_range["end"]) - int(date_range["begin"]) > 2 * 86400:
-            _error = "Time span > 24 hours - RFC 7489 section 7.2"
-            raise InvalidAggregateReport(_error)
-        date_range["begin"] = timestamp_to_human(date_range["begin"])
-        date_range["end"] = timestamp_to_human(date_range["end"])
+
+        begin_ts = int(date_range["begin"])
+        end_ts = int(date_range["end"])
+        span_seconds = end_ts - begin_ts
+
+        normalize_timespan = span_seconds > normalize_timespan_threshold_hours * 3600
+
+        date_range["begin"] = timestamp_to_human(begin_ts)
+        date_range["end"] = timestamp_to_human(end_ts)
+
        new_report_metadata["begin_date"] = date_range["begin"]
        new_report_metadata["end_date"] = date_range["end"]
+        new_report_metadata["timespan_requires_normalization"] = normalize_timespan
+        new_report_metadata["original_timespan_seconds"] = span_seconds
+        begin_dt = human_timestamp_to_datetime(
+            new_report_metadata["begin_date"], to_utc=True
+        )
+        end_dt = human_timestamp_to_datetime(
+            new_report_metadata["end_date"], to_utc=True
+        )
+
        if "error" in report["report_metadata"]:
            if not isinstance(report["report_metadata"]["error"], list):
                errors = [report["report_metadata"]["error"]]
@@ -586,7 +787,13 @@ def parse_aggregate_report_xml(
                        nameservers=nameservers,
                        dns_timeout=timeout,
                    )
-                    records.append(report_record)
+                    _append_parsed_record(
+                        parsed_record=report_record,
+                        records=records,
+                        begin_dt=begin_dt,
+                        end_dt=end_dt,
+                        normalize=normalize_timespan,
+                    )
                except Exception as e:
                    logger.warning("Could not parse record: {0}".format(e))

@@ -601,7 +808,13 @@ def parse_aggregate_report_xml(
                nameservers=nameservers,
                dns_timeout=timeout,
            )
-            records.append(report_record)
+            _append_parsed_record(
+                parsed_record=report_record,
+                records=records,
+                begin_dt=begin_dt,
+                end_dt=end_dt,
+                normalize=normalize_timespan,
+            )

        new_report["records"] = records

@@ -619,7 +832,7 @@ def parse_aggregate_report_xml(
        raise InvalidAggregateReport("Unexpected error: {0}".format(error.__str__()))


-def extract_report(content):
+def extract_report(content: Union[bytes, str, IO[Any]]):
    """
    Extracts text from a zip or gzip file, as a base64-encoded string,
    file-like object, or bytes.
@@ -683,15 +896,16 @@ def extract_report_from_file_path(file_path):


 def parse_aggregate_report_file(
-    _input,
-    offline=False,
-    always_use_local_files=None,
-    reverse_dns_map_path=None,
-    reverse_dns_map_url=None,
-    ip_db_path=None,
-    nameservers=None,
-    dns_timeout=2.0,
-    keep_alive=None,
+    _input: Union[str, bytes, IO[Any]],
+    offline: bool = False,
+    always_use_local_files: bool = None,
+    reverse_dns_map_path: str = None,
+    reverse_dns_map_url: str = None,
+    ip_db_path: str = None,
+    nameservers: list[str] = None,
+    dns_timeout: float = 2.0,
+    keep_alive: Callable = None,
+    normalize_timespan_threshold_hours: float = 24.0,
 ):
    """Parses a file at the given path, a file-like object. or bytes as an
    aggregate DMARC report
@@ -707,6 +921,7 @@ def parse_aggregate_report_file(
            (Cloudflare's public DNS resolvers by default)
        dns_timeout (float): Sets the DNS timeout in seconds
        keep_alive (callable): Keep alive function
+        normalize_timespan_threshold_hours (float): Normalize timespans beyond this

    Returns:
        OrderedDict: The parsed DMARC aggregate report
@@ -727,10 +942,11 @@ def parse_aggregate_report_file(
        nameservers=nameservers,
        timeout=dns_timeout,
        keep_alive=keep_alive,
+        normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
    )


-def parsed_aggregate_reports_to_csv_rows(reports):
+def parsed_aggregate_reports_to_csv_rows(reports: list[dict]):
    """
    Converts one or more parsed aggregate reports to list of dicts in flat CSV
    format
@@ -759,6 +975,9 @@ def parsed_aggregate_reports_to_csv_rows(reports):
        report_id = report["report_metadata"]["report_id"]
        begin_date = report["report_metadata"]["begin_date"]
        end_date = report["report_metadata"]["end_date"]
+        normalized_timespan = report["report_metadata"][
+            "timespan_requires_normalization"
+        ]
        errors = "|".join(report["report_metadata"]["errors"])
        domain = report["policy_published"]["domain"]
        adkim = report["policy_published"]["adkim"]
@@ -776,6 +995,7 @@ def parsed_aggregate_reports_to_csv_rows(reports):
            report_id=report_id,
            begin_date=begin_date,
            end_date=end_date,
+            normalized_timespan=normalized_timespan,
            errors=errors,
            domain=domain,
            adkim=adkim,
@@ -788,6 +1008,8 @@ def parsed_aggregate_reports_to_csv_rows(reports):

        for record in report["records"]:
            row = report_dict.copy()
+            row["begin_date"] = record["interval_begin"]
+            row["end_date"] = record["interval_end"]
            row["source_ip_address"] = record["source"]["ip_address"]
            row["source_country"] = record["source"]["country"]
            row["source_reverse_dns"] = record["source"]["reverse_dns"]
@@ -848,7 +1070,7 @@ def parsed_aggregate_reports_to_csv_rows(reports):
    return rows


-def parsed_aggregate_reports_to_csv(reports):
+def parsed_aggregate_reports_to_csv(reports: list[OrderedDict]):
    """
    Converts one or more parsed aggregate reports to flat CSV format, including
    headers
@@ -868,6 +1090,7 @@ def parsed_aggregate_reports_to_csv(reports):
        "report_id",
        "begin_date",
        "end_date",
+        "normalized_timespan",
        "errors",
        "domain",
        "adkim",
@@ -914,17 +1137,17 @@ def parsed_aggregate_reports_to_csv(reports):


 def parse_forensic_report(
-    feedback_report,
-    sample,
-    msg_date,
-    always_use_local_files=False,
-    reverse_dns_map_path=None,
-    reverse_dns_map_url=None,
-    offline=False,
-    ip_db_path=None,
-    nameservers=None,
-    dns_timeout=2.0,
-    strip_attachment_payloads=False,
+    feedback_report: str,
+    sample: str,
+    msg_date: datetime,
+    always_use_local_files: bool = False,
+    reverse_dns_map_path: str = None,
+    reverse_dns_map_url: str = None,
+    offline: bool = False,
+    ip_db_path: str = None,
+    nameservers: list[str] = None,
+    dns_timeout: float = 2.0,
+    strip_attachment_payloads: bool = False,
 ):
    """
    Converts a DMARC forensic report and sample to a ``OrderedDict``
@@ -1053,7 +1276,7 @@ def parse_forensic_report(
        raise InvalidForensicReport("Unexpected error: {0}".format(error.__str__()))


-def parsed_forensic_reports_to_csv_rows(reports):
+def parsed_forensic_reports_to_csv_rows(reports: list[OrderedDict]):
    """
    Converts one or more parsed forensic reports to a list of dicts in flat CSV
    format
@@ -1089,7 +1312,7 @@ def parsed_forensic_reports_to_csv_rows(reports):
    return rows


-def parsed_forensic_reports_to_csv(reports):
+def parsed_forensic_reports_to_csv(reports: list[dict]):
    """
    Converts one or more parsed forensic reports to flat CSV format, including
    headers
@@ -1142,16 +1365,17 @@ def parsed_forensic_reports_to_csv(reports):


 def parse_report_email(
-    input_,
-    offline=False,
-    ip_db_path=None,
-    always_use_local_files=False,
-    reverse_dns_map_path=None,
-    reverse_dns_map_url=None,
-    nameservers=None,
-    dns_timeout=2.0,
-    strip_attachment_payloads=False,
-    keep_alive=None,
+    input_: Union[bytes, str],
+    offline: bool = False,
+    ip_db_path: str = None,
+    always_use_local_files: bool = False,
+    reverse_dns_map_path: str = None,
+    reverse_dns_map_url: str = None,
+    nameservers: list[str] = None,
+    dns_timeout: float = 2.0,
+    strip_attachment_payloads: bool = False,
+    keep_alive: Callable = None,
+    normalize_timespan_threshold_hours: float = 24.0,
 ):
    """
    Parses a DMARC report from an email
@@ -1168,6 +1392,7 @@ def parse_report_email(
        strip_attachment_payloads (bool): Remove attachment payloads from
            forensic report results
        keep_alive (callable): keep alive function
+        normalize_timespan_threshold_hours (float): Normalize timespans beyond this

    Returns:
        OrderedDict:
@@ -1183,7 +1408,7 @@ def parse_report_email(
            input_ = input_.decode(encoding="utf8", errors="replace")
        msg = mailparser.parse_from_string(input_)
        msg_headers = json.loads(msg.headers_json)
-        date = email.utils.format_datetime(datetime.utcnow())
+        date = email.utils.format_datetime(datetime.now(timezone.utc))
        if "Date" in msg_headers:
            date = human_timestamp_to_datetime(msg_headers["Date"])
        msg = email.message_from_string(input_)
@@ -1199,12 +1424,16 @@ def parse_report_email(
    if "Subject" in msg_headers:
        subject = msg_headers["Subject"]
    for part in msg.walk():
-        content_type = part.get_content_type()
+        content_type = part.get_content_type().lower()
        payload = part.get_payload()
        if not isinstance(payload, list):
            payload = [payload]
        payload = payload[0].__str__()
-        if content_type == "message/feedback-report":
+        if content_type.startswith("multipart/"):
+            continue
+        if content_type == "text/html":
+            continue
+        elif content_type == "message/feedback-report":
            try:
                if "Feedback-Type" in payload:
                    feedback_report = payload
@@ -1215,13 +1444,12 @@ def parse_report_email(
                feedback_report = feedback_report.replace("\\n", "\n")
            except (ValueError, TypeError, binascii.Error):
                feedback_report = payload
-
        elif content_type == "text/rfc822-headers":
            sample = payload
        elif content_type == "message/rfc822":
            sample = payload
        elif content_type == "application/tlsrpt+json":
-            if "{" not in payload:
+            if not payload.strip().startswith("{"):
                payload = str(b64decode(payload))
            smtp_tls_report = parse_smtp_tls_report_json(payload)
            return OrderedDict(
@@ -1233,7 +1461,6 @@ def parse_report_email(
            return OrderedDict(
                [("report_type", "smtp_tls"), ("report", smtp_tls_report)]
            )
-
        elif content_type == "text/plain":
            if "A message claiming to be from you has failed" in payload:
                try:
@@ -1260,13 +1487,14 @@ def parse_report_email(
                payload = b64decode(payload)
                if payload.startswith(MAGIC_ZIP) or payload.startswith(MAGIC_GZIP):
                    payload = extract_report(payload)
-                    ns = nameservers
-                    if payload.startswith("{"):
-                        smtp_tls_report = parse_smtp_tls_report_json(payload)
-                        result = OrderedDict(
-                            [("report_type", "smtp_tls"), ("report", smtp_tls_report)]
-                        )
-                        return result
+                if isinstance(payload, bytes):
+                    payload = payload.decode("utf-8", errors="replace")
+                if payload.strip().startswith("{"):
+                    smtp_tls_report = parse_smtp_tls_report_json(payload)
+                    return OrderedDict(
+                        [("report_type", "smtp_tls"), ("report", smtp_tls_report)]
+                    )
+                elif payload.strip().startswith("<"):
                    aggregate_report = parse_aggregate_report_xml(
                        payload,
                        ip_db_path=ip_db_path,
@@ -1274,25 +1502,25 @@ def parse_report_email(
                        reverse_dns_map_path=reverse_dns_map_path,
                        reverse_dns_map_url=reverse_dns_map_url,
                        offline=offline,
-                        nameservers=ns,
+                        nameservers=nameservers,
                        timeout=dns_timeout,
                        keep_alive=keep_alive,
+                        normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
                    )
                    result = OrderedDict(
                        [("report_type", "aggregate"), ("report", aggregate_report)]
                    )
+
                    return result

            except (TypeError, ValueError, binascii.Error):
                pass

-            except InvalidAggregateReport as e:
-                error = (
-                    'Message with subject "{0}" '
-                    "is not a valid "
-                    "aggregate DMARC report: {1}".format(subject, e)
+            except InvalidDMARCReport:
+                error = 'Message with subject "{0}" is not a valid DMARC report'.format(
+                    subject
                )
-                raise InvalidDMARCReport(error)
+                raise ParserError(error)

            except Exception as e:
                error = 'Unable to parse message with subject "{0}": {1}'.format(
@@ -1334,16 +1562,17 @@ def parse_report_email(


 def parse_report_file(
-    input_,
-    nameservers=None,
-    dns_timeout=2.0,
-    strip_attachment_payloads=False,
-    ip_db_path=None,
-    always_use_local_files=False,
-    reverse_dns_map_path=None,
-    reverse_dns_map_url=None,
-    offline=False,
-    keep_alive=None,
+    input_: Union[bytes, str, IO[Any]],
+    nameservers: list[str] = None,
+    dns_timeout: float = 2.0,
+    strip_attachment_payloads: bool = False,
+    ip_db_path: str = None,
+    always_use_local_files: bool = False,
+    reverse_dns_map_path: str = None,
+    reverse_dns_map_url: str = None,
+    offline: bool = False,
+    keep_alive: Callable = None,
+    normalize_timespan_threshold_hours: float = 24.0,
 ):
    """Parses a DMARC aggregate or forensic file at the given path, a
    file-like object. or bytes
@@ -1386,6 +1615,7 @@ def parse_report_file(
            nameservers=nameservers,
            dns_timeout=dns_timeout,
            keep_alive=keep_alive,
+            normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
        )
        results = OrderedDict([("report_type", "aggregate"), ("report", report)])
    except InvalidAggregateReport:
@@ -1406,6 +1636,7 @@ def parse_report_file(
                    dns_timeout=dns_timeout,
                    strip_attachment_payloads=sa,
                    keep_alive=keep_alive,
+                    normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
                )
            except InvalidDMARCReport:
                raise ParserError("Not a valid report")
@@ -1413,15 +1644,16 @@ def parse_report_file(


 def get_dmarc_reports_from_mbox(
-    input_,
-    nameservers=None,
-    dns_timeout=2.0,
-    strip_attachment_payloads=False,
-    ip_db_path=None,
-    always_use_local_files=False,
-    reverse_dns_map_path=None,
-    reverse_dns_map_url=None,
-    offline=False,
+    input_: str,
+    nameservers: list[str] = None,
+    dns_timeout: float = 2.0,
+    strip_attachment_payloads: bool = False,
+    ip_db_path: str = None,
+    always_use_local_files: bool = False,
+    reverse_dns_map_path: str = None,
+    reverse_dns_map_url: str = None,
+    offline: bool = False,
+    normalize_timespan_threshold_hours: float = 24.0,
 ):
    """Parses a mailbox in mbox format containing e-mails with attached
    DMARC reports
@@ -1438,6 +1670,7 @@ def get_dmarc_reports_from_mbox(
        reverse_dns_map_url (str): URL to a reverse DNS map file
        ip_db_path (str): Path to a MMDB file from MaxMind or DBIP
        offline (bool): Do not make online queries for geolocation or DNS
+        normalize_timespan_threshold_hours (float): Normalize timespans beyond this

    Returns:
        OrderedDict: Lists of ``aggregate_reports`` and ``forensic_reports``
@@ -1467,6 +1700,7 @@ def get_dmarc_reports_from_mbox(
                    nameservers=nameservers,
                    dns_timeout=dns_timeout,
                    strip_attachment_payloads=sa,
+                    normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
                )
                if parsed_email["report_type"] == "aggregate":
                    report_org = parsed_email["report"]["report_metadata"]["org_name"]
@@ -1499,22 +1733,23 @@ def get_dmarc_reports_from_mbox(

 def get_dmarc_reports_from_mailbox(
    connection: MailboxConnection,
-    reports_folder="INBOX",
-    archive_folder="Archive",
-    delete=False,
-    test=False,
-    ip_db_path=None,
-    always_use_local_files=False,
-    reverse_dns_map_path=None,
-    reverse_dns_map_url=None,
-    offline=False,
-    nameservers=None,
-    dns_timeout=6.0,
-    strip_attachment_payloads=False,
-    results=None,
-    batch_size=10,
-    since=None,
-    create_folders=True,
+    reports_folder: str = "INBOX",
+    archive_folder: str = "Archive",
+    delete: bool = False,
+    test: bool = False,
+    ip_db_path: str = None,
+    always_use_local_files: bool = False,
+    reverse_dns_map_path: str = None,
+    reverse_dns_map_url: str = None,
+    offline: bool = False,
+    nameservers: list[str] = None,
+    dns_timeout: float = 6.0,
+    strip_attachment_payloads: bool = False,
+    results: dict = None,
+    batch_size: int = 10,
+    since: datetime = None,
+    create_folders: bool = True,
+    normalize_timespan_threshold_hours: float = 24.0,
 ):
    """
    Fetches and parses DMARC reports from a mailbox
@@ -1541,6 +1776,7 @@ def get_dmarc_reports_from_mailbox(
            (units - {"m":"minutes", "h":"hours", "d":"days", "w":"weeks"})
        create_folders (bool): Whether to create the destination folders
            (not used in watch)
+        normalize_timespan_threshold_hours (float): Normalize timespans beyond this

    Returns:
        OrderedDict: Lists of ``aggregate_reports`` and ``forensic_reports``
@@ -1579,7 +1815,7 @@ def get_dmarc_reports_from_mailbox(

    if since:
        _since = 1440  # default one day
-        if re.match(r"\d+[mhd]$", since):
+        if re.match(r"\d+[mhdw]$", since):
            s = re.split(r"(\d+)", since)
            if s[2] == "m":
                _since = int(s[1])
@@ -1603,14 +1839,18 @@ def get_dmarc_reports_from_mailbox(
                "Only days and weeks values in 'since' option are \
                         considered for IMAP conections. Examples: 2d or 1w"
            )
-            since = (datetime.utcnow() - timedelta(minutes=_since)).date()
-            current_time = datetime.utcnow().date()
+            since = (datetime.now(timezone.utc) - timedelta(minutes=_since)).date()
+            current_time = datetime.now(timezone.utc).date()
        elif isinstance(connection, MSGraphConnection):
-            since = (datetime.utcnow() - timedelta(minutes=_since)).isoformat() + "Z"
-            current_time = datetime.utcnow().isoformat() + "Z"
+            since = (
+                datetime.now(timezone.utc) - timedelta(minutes=_since)
+            ).isoformat().replace("+00:00", "Z")
+            current_time = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
        elif isinstance(connection, GmailConnection):
-            since = (datetime.utcnow() - timedelta(minutes=_since)).strftime("%s")
-            current_time = datetime.utcnow().strftime("%s")
+            # Epoch seconds via timestamp(); strftime("%s") is platform-specific
+            _now = datetime.now(timezone.utc)
+            since = str(int((_now - timedelta(minutes=_since)).timestamp()))
+            current_time = str(int(_now.timestamp()))
        else:
            pass

@@ -1654,6 +1894,7 @@ def get_dmarc_reports_from_mailbox(
                offline=offline,
                strip_attachment_payloads=sa,
                keep_alive=connection.keepalive,
+                normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
            )
            if parsed_email["report_type"] == "aggregate":
                report_org = parsed_email["report"]["report_metadata"]["org_name"]
@@ -1805,6 +2046,7 @@ def get_dmarc_reports_from_mailbox(
            reverse_dns_map_url=reverse_dns_map_url,
            offline=offline,
            since=current_time,
+            normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
        )

    return results
@@ -1813,20 +2055,21 @@ def get_dmarc_reports_from_mailbox(
 def watch_inbox(
    mailbox_connection: MailboxConnection,
    callback: Callable,
-    reports_folder="INBOX",
-    archive_folder="Archive",
-    delete=False,
-    test=False,
-    check_timeout=30,
-    ip_db_path=None,
-    always_use_local_files=False,
-    reverse_dns_map_path=None,
-    reverse_dns_map_url=None,
-    offline=False,
-    nameservers=None,
-    dns_timeout=6.0,
-    strip_attachment_payloads=False,
-    batch_size=None,
+    reports_folder: str = "INBOX",
+    archive_folder: str = "Archive",
+    delete: bool = False,
+    test: bool = False,
+    check_timeout: int = 30,
+    ip_db_path: str = None,
+    always_use_local_files: bool = False,
+    reverse_dns_map_path: str = None,
+    reverse_dns_map_url: str = None,
+    offline: bool = False,
+    nameservers: list[str] = None,
+    dns_timeout: float = 6.0,
+    strip_attachment_payloads: bool = False,
+    batch_size: int = None,
+    normalize_timespan_threshold_hours: float = 24.0,
 ):
    """
    Watches the mailbox for new messages and
@@ -1852,6 +2095,7 @@ def watch_inbox(
        strip_attachment_payloads (bool): Replace attachment payloads in
            forensic report samples with None
        batch_size (int): Number of messages to read and process before saving
+        normalize_timespan_threshold_hours (float): Normalize timespans beyond this
    """

    def check_callback(connection):
@@ -1872,6 +2116,7 @@ def watch_inbox(
            strip_attachment_payloads=sa,
            batch_size=batch_size,
            create_folders=False,
+            normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
        )
        callback(res)

@@ -1914,14 +2159,14 @@ def append_csv(filename, csv):


 def save_output(
-    results,
-    output_directory="output",
-    aggregate_json_filename="aggregate.json",
-    forensic_json_filename="forensic.json",
-    smtp_tls_json_filename="smtp_tls.json",
-    aggregate_csv_filename="aggregate.csv",
-    forensic_csv_filename="forensic.csv",
-    smtp_tls_csv_filename="smtp_tls.csv",
+    results: OrderedDict,
+    output_directory: str = "output",
+    aggregate_json_filename: str = "aggregate.json",
+    forensic_json_filename: str = "forensic.json",
+    smtp_tls_json_filename: str = "smtp_tls.json",
+    aggregate_csv_filename: str = "aggregate.csv",
+    forensic_csv_filename: str = "forensic.csv",
+    smtp_tls_csv_filename: str = "smtp_tls.csv",
 ):
    """
    Save report data in the given directory
@@ -1999,7 +2244,7 @@ def save_output(
            sample_file.write(sample)


-def get_report_zip(results):
+def get_report_zip(results: OrderedDict):
    """
    Creates a zip file of parsed report output

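Note on the `parse_aggregate_report_xml` hunk above: the old hard failure for long reports is replaced by a flag, set when the span strictly exceeds `normalize_timespan_threshold_hours * 3600` seconds. The snippet below is a minimal standalone sketch of that comparison only. The helper name `needs_normalization` is hypothetical and not part of parsedmarc, and the timestamps are made-up values.

```python
def needs_normalization(begin_ts, end_ts, threshold_hours=24.0):
    """Hypothetical helper mirroring the threshold check in the diff.

    Returns (normalize, span_seconds), matching the two values the diff
    stores as timespan_requires_normalization and
    original_timespan_seconds.
    """
    span_seconds = int(end_ts) - int(begin_ts)
    # Strictly greater than: a span of exactly 24 hours is not normalized.
    normalize = span_seconds > threshold_hours * 3600
    return normalize, span_seconds


# A report covering exactly one day stays as-is.
print(needs_normalization(1764547200, 1764633600))  # (False, 86400)

# A report covering two days is flagged for normalization.
print(needs_normalization(1764547200, 1764720000))  # (True, 172800)
```

Because the comparison is strict, reports that cover exactly the threshold are left untouched. The `int()` calls also let the XML string values be passed in directly.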
--- a/parsedmarc/cli.py
+++ b/parsedmarc/cli.py
@@ -9,6 +9,7 @@ from configparser import ConfigParser
 from glob import glob
 import logging
 import math
+import yaml
 from collections import OrderedDict
 import json
 from ssl import CERT_NONE, create_default_context
@@ -46,7 +47,7 @@ from parsedmarc.mail import (
 from parsedmarc.mail.graph import AuthMethod

 from parsedmarc.log import logger
-from parsedmarc.utils import is_mbox, get_reverse_dns
+from parsedmarc.utils import is_mbox, get_reverse_dns, get_base_domain
 from parsedmarc import SEEN_AGGREGATE_REPORT_IDS

 http.client._MAXHEADERS = 200  # pylint:disable=protected-access
@@ -76,6 +77,7 @@ def cli_parse(
    always_use_local_files,
    reverse_dns_map_path,
    reverse_dns_map_url,
+    normalize_timespan_threshold_hours,
    conn,
 ):
    """Separated this function for multiprocessing"""
@@ -90,6 +92,7 @@ def cli_parse(
            nameservers=nameservers,
            dns_timeout=dns_timeout,
            strip_attachment_payloads=sa,
+            normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
        )
        conn.send([file_results, file_path])
    except ParserError as error:
@@ -101,8 +104,35 @@ def cli_parse(
 def _main():
    """Called when the module is executed"""

+    def get_index_prefix(report):
+        if index_prefix_domain_map is None:
+            return None
+        if "policy_published" in report:
+            domain = report["policy_published"]["domain"]
+        elif "reported_domain" in report:
+            domain = report["reported_domain"]
+        elif "policies" in report:
+            domain = report["policies"][0]["domain"]
+        if domain:
+            domain = get_base_domain(domain)
+            for prefix in index_prefix_domain_map:
+                if domain in index_prefix_domain_map[prefix]:
+                    prefix = (
+                        prefix.lower()
+                        .strip()
+                        .strip("_")
+                        .replace(" ", "_")
+                        .replace("-", "_")
+                    )
+                    prefix = f"{prefix}_"
+                    return prefix
+        return None
+
    def process_reports(reports_):
-        output_str = "{0}\n".format(json.dumps(reports_, ensure_ascii=False, indent=2))
+        indent_value = 2 if opts.prettify_json else None
+        output_str = "{0}\n".format(
+            json.dumps(reports_, ensure_ascii=False, indent=indent_value)
+        )

        if not opts.silent:
            print(output_str)
@@ -126,7 +156,8 @@ def _main():
                        elastic.save_aggregate_report_to_elasticsearch(
                            report,
                            index_suffix=opts.elasticsearch_index_suffix,
-                            index_prefix=opts.elasticsearch_index_prefix,
+                            index_prefix=opts.elasticsearch_index_prefix
+                            or get_index_prefix(report),
                            monthly_indexes=opts.elasticsearch_monthly_indexes,
                            number_of_shards=shards,
                            number_of_replicas=replicas,
@@ -147,7 +178,8 @@ def _main():
                        opensearch.save_aggregate_report_to_opensearch(
                            report,
                            index_suffix=opts.opensearch_index_suffix,
-                            index_prefix=opts.opensearch_index_prefix,
+                            index_prefix=opts.opensearch_index_prefix
+                            or get_index_prefix(report),
                            monthly_indexes=opts.opensearch_monthly_indexes,
                            number_of_shards=shards,
                            number_of_replicas=replicas,
@@ -189,8 +221,9 @@ def _main():

                try:
                    if opts.webhook_aggregate_url:
+                        indent_value = 2 if opts.prettify_json else None
                        webhook_client.save_aggregate_report_to_webhook(
-                            json.dumps(report, ensure_ascii=False, indent=2)
+                            json.dumps(report, ensure_ascii=False, indent=indent_value)
                        )
                except Exception as error_:
                    logger.error("Webhook Error: {0}".format(error_.__str__()))
@@ -212,7 +245,8 @@ def _main():
                        elastic.save_forensic_report_to_elasticsearch(
                            report,
                            index_suffix=opts.elasticsearch_index_suffix,
-                            index_prefix=opts.elasticsearch_index_prefix,
+                            index_prefix=opts.elasticsearch_index_prefix
+                            or get_index_prefix(report),
                            monthly_indexes=opts.elasticsearch_monthly_indexes,
                            number_of_shards=shards,
                            number_of_replicas=replicas,
@@ -231,7 +265,8 @@ def _main():
                        opensearch.save_forensic_report_to_opensearch(
                            report,
                            index_suffix=opts.opensearch_index_suffix,
-                            index_prefix=opts.opensearch_index_prefix,
+                            index_prefix=opts.opensearch_index_prefix
+                            or get_index_prefix(report),
                            monthly_indexes=opts.opensearch_monthly_indexes,
                            number_of_shards=shards,
                            number_of_replicas=replicas,
@@ -271,8 +306,9 @@ def _main():

                try:
                    if opts.webhook_forensic_url:
+                        indent_value = 2 if opts.prettify_json else None
                        webhook_client.save_forensic_report_to_webhook(
-                            json.dumps(report, ensure_ascii=False, indent=2)
+                            json.dumps(report, ensure_ascii=False, indent=indent_value)
                        )
                except Exception as error_:
                    logger.error("Webhook Error: {0}".format(error_.__str__()))
@@ -294,7 +330,8 @@ def _main():
                        elastic.save_smtp_tls_report_to_elasticsearch(
                            report,
                            index_suffix=opts.elasticsearch_index_suffix,
-                            index_prefix=opts.elasticsearch_index_prefix,
+                            index_prefix=opts.elasticsearch_index_prefix
+                            or get_index_prefix(report),
                            monthly_indexes=opts.elasticsearch_monthly_indexes,
                            number_of_shards=shards,
                            number_of_replicas=replicas,
@@ -313,7 +350,8 @@ def _main():
                        opensearch.save_smtp_tls_report_to_opensearch(
                            report,
                            index_suffix=opts.opensearch_index_suffix,
-                            index_prefix=opts.opensearch_index_prefix,
+                            index_prefix=opts.opensearch_index_prefix
+                            or get_index_prefix(report),
                            monthly_indexes=opts.opensearch_monthly_indexes,
                            number_of_shards=shards,
                            number_of_replicas=replicas,
@@ -353,8 +391,9 @@ def _main():

                try:
                    if opts.webhook_smtp_tls_url:
+                        indent_value = 2 if opts.prettify_json else None
                        webhook_client.save_smtp_tls_report_to_webhook(
-                            json.dumps(report, ensure_ascii=False, indent=2)
+                            json.dumps(report, ensure_ascii=False, indent=indent_value)
                        )
                except Exception as error_:
                    logger.error("Webhook Error: {0}".format(error_.__str__()))
@@ -475,6 +514,12 @@ def _main():
        "--debug", action="store_true", help="print debugging information"
    )
    arg_parser.add_argument("--log-file", default=None, help="output logging to a file")
+    arg_parser.add_argument(
+        "--no-prettify-json",
+        action="store_false",
+        dest="prettify_json",
+        help="output JSON in a single line without indentation",
+    )
    arg_parser.add_argument("-v", "--version", action="version", version=__version__)

    aggregate_reports = []
@@ -504,6 +549,7 @@ def _main():
        dns_timeout=args.dns_timeout,
        debug=args.debug,
        verbose=args.verbose,
+        prettify_json=args.prettify_json,
        save_aggregate=False,
        save_forensic=False,
        save_smtp_tls=False,
@@ -615,6 +661,7 @@ def _main():
        webhook_forensic_url=None,
        webhook_smtp_tls_url=None,
        webhook_timeout=60,
+        normalize_timespan_threshold_hours=24.0,
    )
    args = arg_parser.parse_args()

@@ -625,9 +672,19 @@ def _main():
            exit(-1)
        opts.silent = True
        config = ConfigParser()
+        index_prefix_domain_map = None
        config.read(args.config_file)
        if "general" in config.sections():
            general_config = config["general"]
+            if "silent" in general_config:
+                opts.silent = general_config.getboolean("silent")
+            if "normalize_timespan_threshold_hours" in general_config:
+                opts.normalize_timespan_threshold_hours = general_config.getfloat(
+                    "normalize_timespan_threshold_hours"
+                )
+            if "index_prefix_domain_map" in general_config:
+                with open(general_config["index_prefix_domain_map"]) as f:
+                    index_prefix_domain_map = yaml.safe_load(f)
            if "offline" in general_config:
                opts.offline = general_config.getboolean("offline")
            if "strip_attachment_payloads" in general_config:
@@ -701,6 +758,8 @@ def _main():
                opts.reverse_dns_map_path = general_config["reverse_dns_path"]
            if "reverse_dns_map_url" in general_config:
                opts.reverse_dns_map_url = general_config["reverse_dns_url"]
+            if "prettify_json" in general_config:
+                opts.prettify_json = general_config.getboolean("prettify_json")

        if "mailbox" in config.sections():
            mailbox_config = config["mailbox"]
@@ -1167,7 +1226,7 @@ def _main():
            if "smtp_tls_url" in webhook_config:
                opts.webhook_smtp_tls_url = webhook_config["smtp_tls_url"]
            if "timeout" in webhook_config:
-                opts.webhook_timeout = webhook_config["timeout"]
+                opts.webhook_timeout = webhook_config.getint("timeout")

    logger.setLevel(logging.ERROR)

@@ -1392,6 +1451,7 @@ def _main():
                    opts.always_use_local_files,
                    opts.reverse_dns_map_path,
                    opts.reverse_dns_map_url,
+                    opts.normalize_timespan_threshold_hours,
                    child_conn,
                ),
            )
@@ -1442,6 +1502,7 @@ def _main():
            reverse_dns_map_path=opts.reverse_dns_map_path,
            reverse_dns_map_url=opts.reverse_dns_map_url,
            offline=opts.offline,
+            normalize_timespan_threshold_hours=opts.normalize_timespan_threshold_hours,
        )
        aggregate_reports += reports["aggregate_reports"]
        forensic_reports += reports["forensic_reports"]
@@ -1551,6 +1612,7 @@ def _main():
                test=opts.mailbox_test,
                strip_attachment_payloads=opts.strip_attachment_payloads,
                since=opts.mailbox_since,
+                normalize_timespan_threshold_hours=opts.normalize_timespan_threshold_hours,
            )

            aggregate_reports += reports["aggregate_reports"]
@@ -1586,6 +1648,7 @@ def _main():
                username=opts.smtp_user,
                password=opts.smtp_password,
                subject=opts.smtp_subject,
+                require_encryption=opts.smtp_ssl,
            )
        except Exception:
            logger.exception("Failed to email results")
@@ -1612,6 +1675,7 @@ def _main():
                reverse_dns_map_path=opts.reverse_dns_map_path,
                reverse_dns_map_url=opts.reverse_dns_map_url,
                offline=opts.offline,
+                normalize_timespan_threshold_hours=opts.normalize_timespan_threshold_hours,
            )
        except FileExistsError as error:
            logger.error("{0}".format(error.__str__()))
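Note on the cli.py hunks above: the new `normalize_timespan_threshold_hours` option is read with `getfloat`, and `webhook_timeout` is now read with `getint` instead of being left as a string. The following is a minimal sketch of that typed parsing using only the standard-library `ConfigParser`. The sample INI text and the `load_options` helper are assumptions made for illustration and are not part of parsedmarc.

```python
from configparser import ConfigParser

# Hypothetical sample configuration, for illustration only.
SAMPLE_INI = """
[general]
normalize_timespan_threshold_hours = 12.5

[webhook]
timeout = 30
"""


def load_options(ini_text: str) -> dict:
    """Hypothetical helper: read the two options with typed getters."""
    config = ConfigParser()
    config.read_string(ini_text)
    # Defaults mirror the values shown in the diff (24.0 hours, 60 seconds).
    options = {
        "normalize_timespan_threshold_hours": 24.0,
        "webhook_timeout": 60,
    }
    if "general" in config.sections():
        general = config["general"]
        if "normalize_timespan_threshold_hours" in general:
            options["normalize_timespan_threshold_hours"] = general.getfloat(
                "normalize_timespan_threshold_hours"
            )
    if "webhook" in config.sections():
        webhook = config["webhook"]
        if "timeout" in webhook:
            # getint returns an int; plain indexing would return the str "30".
            options["webhook_timeout"] = webhook.getint("timeout")
    return options
```

Calling `load_options(SAMPLE_INI)` yields a float threshold and an integer timeout, while an empty configuration falls back to the defaults.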
--- a/parsedmarc/constants.py
+++ b/parsedmarc/constants.py
@@ -0,0 +1,2 @@
+__version__ = "9.0.0"
+USER_AGENT = f"parsedmarc/{__version__}"
--- a/parsedmarc/elastic.py
+++ b/parsedmarc/elastic.py
@@ -67,6 +67,8 @@ class _AggregateReportDoc(Document):
    date_range = Date()
    date_begin = Date()
    date_end = Date()
+    normalized_timespan = Boolean()
+    original_timespan_seconds = Integer()
    errors = Text()
    published_policy = Object(_PublishedPolicy)
    source_ip_address = Ip()
@@ -393,52 +395,7 @@ def save_aggregate_report_to_elasticsearch(
    org_name = metadata["org_name"]
    report_id = metadata["report_id"]
    domain = aggregate_report["policy_published"]["domain"]
-    begin_date = human_timestamp_to_datetime(metadata["begin_date"], to_utc=True)
-    end_date = human_timestamp_to_datetime(metadata["end_date"], to_utc=True)
-    begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
-    end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
-    if monthly_indexes:
-        index_date = begin_date.strftime("%Y-%m")
-    else:
-        index_date = begin_date.strftime("%Y-%m-%d")
-    aggregate_report["begin_date"] = begin_date
-    aggregate_report["end_date"] = end_date
-    date_range = [aggregate_report["begin_date"], aggregate_report["end_date"]]
-
-    org_name_query = Q(dict(match_phrase=dict(org_name=org_name)))
-    report_id_query = Q(dict(match_phrase=dict(report_id=report_id)))
-    domain_query = Q(dict(match_phrase={"published_policy.domain": domain}))
-    begin_date_query = Q(dict(match=dict(date_begin=begin_date)))
-    end_date_query = Q(dict(match=dict(date_end=end_date)))
-
-    if index_suffix is not None:
-        search_index = "dmarc_aggregate_{0}*".format(index_suffix)
-    else:
-        search_index = "dmarc_aggregate*"
-    if index_prefix is not None:
-        search_index = "{0}{1}".format(index_prefix, search_index)
-    search = Search(index=search_index)
-    query = org_name_query & report_id_query & domain_query
-    query = query & begin_date_query & end_date_query
-    search.query = query
-
-    try:
-        existing = search.execute()
-    except Exception as error_:
-        raise ElasticsearchError(
-            "Elasticsearch's search for existing report \
-            error: {}".format(error_.__str__())
-        )
-
-    if len(existing) > 0:
-        raise AlreadySaved(
-            "An aggregate report ID {0} from {1} about {2} "
-            "with a date range of {3} UTC to {4} UTC already "
-            "exists in "
-            "Elasticsearch".format(
-                report_id, org_name, domain, begin_date_human, end_date_human
-            )
-        )
+
    published_policy = _PublishedPolicy(
        domain=aggregate_report["policy_published"]["domain"],
        adkim=aggregate_report["policy_published"]["adkim"],
@@ -450,6 +407,52 @@ def save_aggregate_report_to_elasticsearch(
    )

    for record in aggregate_report["records"]:
+        begin_date = human_timestamp_to_datetime(record["interval_begin"], to_utc=True)
+        end_date = human_timestamp_to_datetime(record["interval_end"], to_utc=True)
+        begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
+        end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
+        if monthly_indexes:
+            index_date = begin_date.strftime("%Y-%m")
+        else:
+            index_date = begin_date.strftime("%Y-%m-%d")
+        aggregate_report["begin_date"] = begin_date
+        aggregate_report["end_date"] = end_date
+        date_range = [aggregate_report["begin_date"], aggregate_report["end_date"]]
+
+        org_name_query = Q(dict(match_phrase=dict(org_name=org_name)))
+        report_id_query = Q(dict(match_phrase=dict(report_id=report_id)))
+        domain_query = Q(dict(match_phrase={"published_policy.domain": domain}))
+        begin_date_query = Q(dict(match=dict(date_begin=begin_date)))
+        end_date_query = Q(dict(match=dict(date_end=end_date)))
+
+        if index_suffix is not None:
+            search_index = "dmarc_aggregate_{0}*".format(index_suffix)
+        else:
+            search_index = "dmarc_aggregate*"
+        if index_prefix is not None:
+            search_index = "{0}{1}".format(index_prefix, search_index)
+        search = Search(index=search_index)
+        query = org_name_query & report_id_query & domain_query
+        query = query & begin_date_query & end_date_query
+        search.query = query
+
+        try:
+            existing = search.execute()
+        except Exception as error_:
+            raise ElasticsearchError(
+                "Elasticsearch's search for existing report "
+                "error: {}".format(error_.__str__())
+            )
+
+        if len(existing) > 0:
+            raise AlreadySaved(
+                "An aggregate report ID {0} from {1} about {2} "
+                "with a date range of {3} UTC to {4} UTC already "
+                "exists in "
+                "Elasticsearch".format(
+                    report_id, org_name, domain, begin_date_human, end_date_human
+                )
+            )
        agg_doc = _AggregateReportDoc(
            xml_schema=aggregate_report["xml_schema"],
            org_name=metadata["org_name"],
@@ -459,6 +462,7 @@ def save_aggregate_report_to_elasticsearch(
            date_range=date_range,
            date_begin=aggregate_report["begin_date"],
            date_end=aggregate_report["end_date"],
+            normalized_timespan=record["normalized_timespan"],
            errors=metadata["errors"],
            published_policy=published_policy,
            source_ip_address=record["source"]["ip_address"],
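Note on the elastic.py hunks above: the date handling and the duplicate check now run inside the per-record loop, so each record's own `interval_begin` decides which index it is written to. Records from one report can therefore land in different daily indexes. A rough sketch of that index-date choice follows. `index_date_for` is a hypothetical helper, and the `%Y-%m-%d %H:%M:%S` timestamp format is an assumption rather than the behaviour of `human_timestamp_to_datetime`.

```python
from datetime import datetime


def index_date_for(interval_begin: str, monthly_indexes: bool) -> str:
    """Hypothetical helper: pick the index date suffix for one record."""
    # Assumed timestamp format; the real parser may accept other layouts.
    begin = datetime.strptime(interval_begin, "%Y-%m-%d %H:%M:%S")
    # Monthly indexes group by year and month; daily indexes add the day.
    return begin.strftime("%Y-%m" if monthly_indexes else "%Y-%m-%d")
```

With daily indexes, two records whose intervals begin on either side of midnight get different suffixes; with monthly indexes they share one unless the month changes.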
--- a/parsedmarc/opensearch.py
+++ b/parsedmarc/opensearch.py
@@ -67,6 +67,8 @@ class _AggregateReportDoc(Document):
    date_range = Date()
    date_begin = Date()
    date_end = Date()
+    normalized_timespan = Boolean()
+    original_timespan_seconds = Integer()
    errors = Text()
    published_policy = Object(_PublishedPolicy)
    source_ip_address = Ip()
@@ -393,52 +395,7 @@ def save_aggregate_report_to_opensearch(
    org_name = metadata["org_name"]
    report_id = metadata["report_id"]
    domain = aggregate_report["policy_published"]["domain"]
-    begin_date = human_timestamp_to_datetime(metadata["begin_date"], to_utc=True)
-    end_date = human_timestamp_to_datetime(metadata["end_date"], to_utc=True)
-    begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
-    end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
-    if monthly_indexes:
-        index_date = begin_date.strftime("%Y-%m")
-    else:
-        index_date = begin_date.strftime("%Y-%m-%d")
-    aggregate_report["begin_date"] = begin_date
-    aggregate_report["end_date"] = end_date
-    date_range = [aggregate_report["begin_date"], aggregate_report["end_date"]]
-
-    org_name_query = Q(dict(match_phrase=dict(org_name=org_name)))
-    report_id_query = Q(dict(match_phrase=dict(report_id=report_id)))
-    domain_query = Q(dict(match_phrase={"published_policy.domain": domain}))
-    begin_date_query = Q(dict(match=dict(date_begin=begin_date)))
-    end_date_query = Q(dict(match=dict(date_end=end_date)))
-
-    if index_suffix is not None:
-        search_index = "dmarc_aggregate_{0}*".format(index_suffix)
-    else:
-        search_index = "dmarc_aggregate*"
-    if index_prefix is not None:
-        search_index = "{0}{1}".format(index_prefix, search_index)
-    search = Search(index=search_index)
-    query = org_name_query & report_id_query & domain_query
-    query = query & begin_date_query & end_date_query
-    search.query = query
-
-    try:
-        existing = search.execute()
-    except Exception as error_:
-        raise OpenSearchError(
-            "OpenSearch's search for existing report \
-            error: {}".format(error_.__str__())
-        )
-
-    if len(existing) > 0:
-        raise AlreadySaved(
-            "An aggregate report ID {0} from {1} about {2} "
-            "with a date range of {3} UTC to {4} UTC already "
-            "exists in "
-            "OpenSearch".format(
-                report_id, org_name, domain, begin_date_human, end_date_human
-            )
-        )
+
    published_policy = _PublishedPolicy(
        domain=aggregate_report["policy_published"]["domain"],
        adkim=aggregate_report["policy_published"]["adkim"],
@@ -450,6 +407,52 @@ def save_aggregate_report_to_opensearch(
    )

    for record in aggregate_report["records"]:
+        begin_date = human_timestamp_to_datetime(record["interval_begin"], to_utc=True)
+        end_date = human_timestamp_to_datetime(record["interval_end"], to_utc=True)
+        begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
+        end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
+        if monthly_indexes:
+            index_date = begin_date.strftime("%Y-%m")
+        else:
+            index_date = begin_date.strftime("%Y-%m-%d")
+        aggregate_report["begin_date"] = begin_date
+        aggregate_report["end_date"] = end_date
+        date_range = [aggregate_report["begin_date"], aggregate_report["end_date"]]
+
+        org_name_query = Q(dict(match_phrase=dict(org_name=org_name)))
+        report_id_query = Q(dict(match_phrase=dict(report_id=report_id)))
+        domain_query = Q(dict(match_phrase={"published_policy.domain": domain}))
+        begin_date_query = Q(dict(match=dict(date_begin=begin_date)))
+        end_date_query = Q(dict(match=dict(date_end=end_date)))
+
+        if index_suffix is not None:
+            search_index = "dmarc_aggregate_{0}*".format(index_suffix)
+        else:
+            search_index = "dmarc_aggregate*"
+        if index_prefix is not None:
+            search_index = "{0}{1}".format(index_prefix, search_index)
+        search = Search(index=search_index)
+        query = org_name_query & report_id_query & domain_query
+        query = query & begin_date_query & end_date_query
+        search.query = query
+
+        try:
+            existing = search.execute()
+        except Exception as error_:
+            raise OpenSearchError(
+                "OpenSearch's search for existing report "
+                "error: {}".format(error_.__str__())
+            )
+
+        if len(existing) > 0:
+            raise AlreadySaved(
+                "An aggregate report ID {0} from {1} about {2} "
+                "with a date range of {3} UTC to {4} UTC already "
+                "exists in "
+                "OpenSearch".format(
+                    report_id, org_name, domain, begin_date_human, end_date_human
+                )
+            )
        agg_doc = _AggregateReportDoc(
            xml_schema=aggregate_report["xml_schema"],
            org_name=metadata["org_name"],
--- a/parsedmarc/resources/maps/README.md
+++ b/parsedmarc/resources/maps/README.md
@@ -3,6 +3,8 @@
 A mapping is meant to make it easier to identify who or what a sending source is. Please consider contributing
 additional mappings in a GitHub Pull Request.

+Do not open these CSV files in Excel. It will replace Unicode characters with question marks. Use LibreOffice Calc instead.
+
 ## base_reverse_dns_map.csv

 A CSV file with three fields: `base_reverse_dns`, `name`, and `type`.
@@ -25,6 +27,7 @@ The `service_type` is based on the following rule precedence:
 - Agriculture
 - Automotive
 - Beauty
+- Conglomerate
 - Construction
 - Consulting
 - Defense
@@ -41,6 +44,7 @@ The `service_type` is based on the following rule precedence:
 - IaaS
 - Industrial
 - ISP
+- Legal
 - Logistics
 - Manufacturing
 - Marketing
@@ -50,6 +54,7 @@ The `service_type` is based on the following rule precedence:
 - Nonprofit
 - PaaS
 - Photography
+- Physical Security
 - Print
 - Publishing
 - Real Estate
@@ -72,12 +77,16 @@ A list of reverse DNS base domains that could not be identified as belonging to

 ## base_reverse_dns.csv

-A CSV with the fields `source_name` and optionally `message_countcount`. This CSV can be generated byy exporting the base DNS data from the Kibana on Splunk dashboards provided by parsedmarc. This file is not tracked by Git.
+A CSV with the fields `source_name` and optionally `message_count`. This CSV can be generated by exporting the base DNS data from the Kibana or Splunk dashboards provided by parsedmarc. This file is not tracked by Git.

 ## unknown_base_reverse_dns.csv

 A CSV file with the fields `source_name` and `message_count`. This file is not tracked by Git.

+## find_bad_utf8.py
+
+Locates invalid UTF-8 bytes in files and optionally tries to correct them. Generated by GPT-5. Helped me find where I had introduced invalid bytes in `base_reverse_dns_map.csv`.
+
 ## find_unknown_base_reverse_dns.py

 This is a python script that reads the domains in `base_reverse_dns.csv` and writes the domains that are not in `base_reverse_dns_map.csv` or `known_unknown_base_reverse_dns.txt` to `unknown_base_reverse_dns.csv`. This is useful for identifying potential additional domains to contribute to `base_reverse_dns_map.csv` and `known_unknown_base_reverse_dns.txt`.
--- a/parsedmarc/resources/maps/base_reverse_dns_map.csv
+++ b/parsedmarc/resources/maps/base_reverse_dns_map.csv
--- a/parsedmarc/resources/maps/base_reverse_dns_types.txt
+++ b/parsedmarc/resources/maps/base_reverse_dns_types.txt
@@ -0,0 +1,44 @@
+Agriculture
+Automotive
+Beauty
+Conglomerate
+Construction
+Consulting
+Defense
+Education
+Email Provider
+Email Security
+Entertainment
+Event Planning
+Finance
+Food
+Government
+Government Media
+Healthcare
+ISP
+IaaS
+Industrial
+Legal
+Logistics
+MSP
+MSSP
+Manufacturing
+Marketing
+News
+Nonprofit
+PaaS
+Photography
+Physical Security
+Print
+Publishing
+Real Estate
+Retail
+SaaS
+Science
+Search Engine
+Social Media
+Sports
+Staffing
+Technology
+Travel
+Web Host
--- a/parsedmarc/resources/maps/find_bad_utf8.py
+++ b/parsedmarc/resources/maps/find_bad_utf8.py
@@ -0,0 +1,488 @@
+#!/usr/bin/env python3
+
+
+import argparse
+import codecs
+import os
+import sys
+import shutil
+from typing import List, Tuple
+
+"""
+Locates and optionally corrects bad UTF-8 bytes in a file.
+Generated by GPT-5. Use at your own risk.
+"""
+
+# -------------------------
+# UTF-8 scanning
+# -------------------------
+
+
+def scan_line_for_utf8_errors(
+    line_bytes: bytes, line_no: int, base_offset: int, context: int
+):
+    """
+    Scan one line of raw bytes for UTF-8 decoding errors.
+    Returns a list of dicts describing each error.
+    """
+    pos = 0
+    results = []
+    while pos < len(line_bytes):
+        dec = codecs.getincrementaldecoder("utf-8")("strict")
+        try:
+            dec.decode(line_bytes[pos:], final=True)
+            break
+        except UnicodeDecodeError as e:
+            rel_index = e.start
+            abs_index_in_line = pos + rel_index
+            abs_offset = base_offset + abs_index_in_line
+
+            start_ctx = max(0, abs_index_in_line - context)
+            end_ctx = min(len(line_bytes), abs_index_in_line + 1 + context)
+            ctx_bytes = line_bytes[start_ctx:end_ctx]
+            bad_byte = line_bytes[abs_index_in_line : abs_index_in_line + 1]
+            col = abs_index_in_line + 1  # 1-based byte column
+
+            results.append(
+                {
+                    "line": line_no,
+                    "column": col,
+                    "abs_offset": abs_offset,
+                    "bad_byte_hex": bad_byte.hex(),
+                    "context_hex": ctx_bytes.hex(),
+                    "context_preview": ctx_bytes.decode("utf-8", errors="replace"),
+                }
+            )
+            # Move past the offending byte and continue
+            pos = abs_index_in_line + 1
+    return results
+
+
+def scan_file_for_utf8_errors(path: str, context: int, limit: int):
+    errors_found = 0
+    limit_val = limit if limit != 0 else float("inf")
+
+    with open(path, "rb") as f:
+        total_offset = 0
+        line_no = 0
+        while True:
+            line = f.readline()
+            if not line:
+                break
+            line_no += 1
+            results = scan_line_for_utf8_errors(line, line_no, total_offset, context)
+            for r in results:
+                errors_found += 1
+                print(
+                    f"[ERROR {errors_found}] Line {r['line']}, Column {r['column']}, "
+                    f"Absolute byte offset {r['abs_offset']}"
+                )
+                print(f"  Bad byte: 0x{r['bad_byte_hex']}")
+                print(f"  Context (hex): {r['context_hex']}")
+                print(f"  Context (preview): {r['context_preview']}")
+                print()
+                if errors_found >= limit_val:
+                    print(f"Reached limit of {limit} errors. Stopping.")
+                    return errors_found
+            total_offset += len(line)
+
+    if errors_found == 0:
+        print("No invalid UTF-8 bytes found. 🎉")
+    else:
+        print(f"Found {errors_found} invalid UTF-8 byte(s).")
+    return errors_found
+
+
+# -------------------------
+# Whole-file conversion
+# -------------------------
+
+
+def detect_encoding_text(path: str) -> Tuple[str, str]:
+    """
+    Use charset-normalizer to detect file encoding.
+    Return (encoding_name, decoded_text). Falls back to cp1252 if needed.
+    """
+    try:
+        from charset_normalizer import from_path
+    except ImportError:
+        print(
+            "Please install charset-normalizer: pip install charset-normalizer",
+            file=sys.stderr,
+        )
+        sys.exit(4)
+
+    matches = from_path(path)
+    match = matches.best()
+    if match is None or match.encoding is None:
+        # Fallback heuristic for Western single-byte text
+        with open(path, "rb") as fb:
+            data = fb.read()
+        try:
+            return "cp1252", data.decode("cp1252", errors="strict")
+        except UnicodeDecodeError:
+            print("Unable to detect encoding reliably.", file=sys.stderr)
+            sys.exit(5)
+
+    return match.encoding, str(match)
+
+
+def convert_to_utf8(src_path: str, out_path: str, src_encoding: str = None) -> str:
+    """
+    Convert an entire file to UTF-8 (re-decoding everything).
+    If src_encoding is provided, use it; else auto-detect.
+    Returns the encoding actually used.
+    """
+    if src_encoding:
+        with open(src_path, "rb") as fb:
+            data = fb.read()
+        try:
+            text = data.decode(src_encoding, errors="strict")
+        except LookupError:
+            print(f"Unknown encoding: {src_encoding}", file=sys.stderr)
+            sys.exit(6)
+        except UnicodeDecodeError as e:
+            print(f"Decoding failed with {src_encoding}: {e}", file=sys.stderr)
+            sys.exit(7)
+        used = src_encoding
+    else:
+        used, text = detect_encoding_text(src_path)
+
+    with open(out_path, "w", encoding="utf-8", newline="") as fw:
+        fw.write(text)
+    return used
+
+
+def verify_utf8_file(path: str) -> Tuple[bool, str]:
+    try:
+        with open(path, "rb") as fb:
+            fb.read().decode("utf-8", errors="strict")
+        return True, ""
+    except UnicodeDecodeError as e:
+        return False, str(e)
+
+
+# -------------------------
+# Targeted single-byte fixer
+# -------------------------
+
+
+def iter_lines_with_offsets(b: bytes):
+    """
+    Yield (line_bytes, line_start_abs_offset). Preserves LF/CRLF/CR in bytes.
+    """
+    start = 0
+    for i, byte in enumerate(b):
+        if byte == 0x0A:  # LF
+            yield b[start : i + 1], start
+            start = i + 1
+    if start < len(b):
+        yield b[start:], start
+
+
+def detect_probable_fallbacks() -> List[str]:
+    # Good defaults for Western/Portuguese text
+    return ["cp1252", "iso-8859-1", "iso-8859-15"]
+
+
+def repair_mixed_utf8_line(line: bytes, base_offset: int, fallback_chain: List[str]):
+    """
+    Strictly validate UTF-8 and fix *only* the exact offending byte when an error occurs.
+    This avoids touching adjacent valid UTF-8 (prevents mojibake like 'Ã©').
+    """
+    out_fragments: List[str] = []
+    fixes = []
+    pos = 0
+    n = len(line)
+
+    while pos < n:
+        dec = codecs.getincrementaldecoder("utf-8")("strict")
+        try:
+            s = dec.decode(line[pos:], final=True)
+            out_fragments.append(s)
+            break
+        except UnicodeDecodeError as e:
+            # Append the valid prefix before the error
+            if e.start > 0:
+                out_fragments.append(
+                    line[pos : pos + e.start].decode("utf-8", errors="strict")
+                )
+
+            bad_index = pos + e.start  # absolute index in 'line'
+            bad_slice = line[bad_index : bad_index + 1]  # FIX EXACTLY ONE BYTE
+
+            # Decode that single byte using the first working fallback
+            decoded = None
+            used_enc = None
+            for enc in fallback_chain:
+                try:
+                    decoded = bad_slice.decode(enc, errors="strict")
+                    used_enc = enc
+                    break
+                except Exception:
+                    continue
+            if decoded is None:
+                # latin-1 always succeeds (byte->same code point)
+                decoded = bad_slice.decode("latin-1")
+                used_enc = "latin-1 (fallback)"
+
+            out_fragments.append(decoded)
+
+            # Log the fix
+            col_1based = bad_index + 1  # byte-based column
+            fixes.append(
+                {
+                    "line_base_offset": base_offset,
+                    "line": None,  # caller fills line number
+                    "column": col_1based,
+                    "abs_offset": base_offset + bad_index,
+                    "bad_bytes_hex": bad_slice.hex(),
+                    "used_encoding": used_enc,
+                    "replacement_preview": decoded,
+                }
+            )
+
+            # Advance exactly one byte past the offending byte and continue
+            pos = bad_index + 1
+
+    return "".join(out_fragments), fixes
+
+
+def targeted_fix_to_utf8(
+    src_path: str,
+    out_path: str,
+    fallback_chain: List[str],
+    dry_run: bool,
+    max_fixes: int,
+):
+    with open(src_path, "rb") as fb:
+        data = fb.read()
+
+    total_fixes = 0
+    repaired_lines: List[str] = []
+    line_no = 0
+    max_val = max_fixes if max_fixes != 0 else float("inf")
+
+    for line_bytes, base_offset in iter_lines_with_offsets(data):
+        line_no += 1
+        # Fast path: keep lines that are already valid UTF-8
+        try:
+            repaired_lines.append(line_bytes.decode("utf-8", errors="strict"))
+            continue
+        except UnicodeDecodeError:
+            pass
+
+        fixed_text, fixes = repair_mixed_utf8_line(
+            line_bytes, base_offset, fallback_chain=fallback_chain
+        )
+        for f in fixes:
+            f["line"] = line_no
+
+        repaired_lines.append(fixed_text)
+
+        # Log fixes
+        for f in fixes:
+            total_fixes += 1
+            print(
+                f"[FIX {total_fixes}] Line {f['line']}, Column {f['column']}, Abs offset {f['abs_offset']}"
+            )
+            print(f"  Bad bytes: 0x{f['bad_bytes_hex']}")
+            print(f"  Used encoding: {f['used_encoding']}")
+            preview = f["replacement_preview"].replace("\r", "\\r").replace("\n", "\\n")
+            if len(preview) > 40:
+                preview = preview[:40] + "…"
+            print(f"  Replacement preview: {preview}")
+            print()
+            if total_fixes >= max_val:
+                print(f"Reached max fixes limit ({max_fixes}). Stopping scan.")
+                break
+        if total_fixes >= max_val:
+            break
+
+    if dry_run:
+        print(f"Dry run complete. Detected {total_fixes} fix(es). No file written.")
+        return total_fixes
+
+    # Join and verify result can be encoded to UTF-8
+    repaired_text = "".join(repaired_lines)
+    try:
+        repaired_text.encode("utf-8", errors="strict")
+    except UnicodeEncodeError as e:
+        print(f"Internal error: repaired text not valid UTF-8: {e}", file=sys.stderr)
+        sys.exit(3)
+
+    with open(out_path, "w", encoding="utf-8", newline="") as fw:
+        fw.write(repaired_text)
+
+    print(f"Fixed file written to: {out_path}")
+    print(f"Total fixes applied: {total_fixes}")
+    return total_fixes
+
+
+# -------------------------
+# CLI
+# -------------------------
+
+
+def main():
+    ap = argparse.ArgumentParser(
+        description=(
+            "Scan for invalid UTF-8; optionally convert whole file or fix only invalid bytes.\n\n"
+            "By default, --convert and --fix **edit the input file in place** and create a backup "
+            "named '<input>.bak' before writing. If you pass --output, the original file is left "
+            "unchanged and no backup is created. Use --dry-run to preview fixes without writing."
+        ),
+        formatter_class=argparse.RawTextHelpFormatter,
+    )
+    ap.add_argument("path", help="Path to the CSV/text file")
+    ap.add_argument(
+        "--context",
+        type=int,
+        default=20,
+        help="Bytes of context to show around errors (default: 20)",
+    )
+    ap.add_argument(
+        "--limit",
+        type=int,
+        default=100,
+        help="Max errors to report during scan (0 = unlimited)",
+    )
+    ap.add_argument(
+        "--skip-scan", action="store_true", help="Skip initial scan for speed"
+    )
+
+    # Whole-file convert
+    ap.add_argument(
+        "--convert",
+        action="store_true",
+        help="Convert entire file to UTF-8 using auto/forced encoding "
+        "(in-place by default; creates '<input>.bak').",
+    )
+    ap.add_argument(
+        "--encoding",
+        help="Force source encoding for --convert or first fallback for --fix",
+    )
+    ap.add_argument(
+        "--output",
+        help="Write to this path instead of in-place (no .bak is created in that case)",
+    )
+
+    # Targeted fix
+    ap.add_argument(
+        "--fix",
+        action="store_true",
+        help="Fix only invalid byte(s) via fallback encodings "
+        "(in-place by default; creates '<input>.bak').",
+    )
+    ap.add_argument(
+        "--fallbacks",
+        help="Comma-separated fallback encodings (default: cp1252,iso-8859-1,iso-8859-15)",
+    )
+    ap.add_argument(
+        "--dry-run",
+        action="store_true",
+        help="(fix) Print fixes but do not write or create a .bak",
+    )
+    ap.add_argument(
+        "--max-fixes",
+        type=int,
+        default=0,
+        help="(fix) Stop after N fixes (0 = unlimited)",
+    )
+
+    args = ap.parse_args()
+    path = args.path
+
+    if not os.path.isfile(path):
+        print(f"File not found: {path}", file=sys.stderr)
+        sys.exit(2)
+
+    # Optional scan first
+    if not args.skip_scan:
+        scan_file_for_utf8_errors(path, context=args.context, limit=args.limit)
+
+    # Mode selection guards
+    if args.convert and args.fix:
+        print("Choose either --convert or --fix (not both).", file=sys.stderr)
+        sys.exit(9)
+    if not args.convert and not args.fix and args.skip_scan:
+        print("No action selected (use --convert or --fix).")
+        return
+    if not args.convert and not args.fix:
+        # User only wanted a scan
+        return
+
+    # Determine output path and backup behavior
+    # In-place by default: create '<input>.bak' before overwriting.
+    if args.output:
+        out_path = args.output
+        in_place = False
+    else:
+        out_path = path
+        in_place = True
+
+    # CONVERT mode
+    if args.convert:
+        print("\n[CONVERT MODE] Converting file to UTF-8...")
+        if in_place:
+            # Create backup before overwriting original
+            backup_path = path + ".bak"
+            shutil.copy2(path, backup_path)
+            print(f"Backup created: {backup_path}")
+        used = convert_to_utf8(path, out_path, src_encoding=args.encoding)
+        print(f"Source encoding used: {used}")
+        print(f"Saved UTF-8 file as: {out_path}")
+        ok, err = verify_utf8_file(out_path)
+        if ok:
+            print("Verification: output is valid UTF-8 ✅")
+        else:
+            print(f"Verification failed: {err}")
+            sys.exit(8)
+        return
+
+    # FIX mode (targeted, single-byte)
+    if args.fix:
+        print("\n[FIX MODE] Fixing only invalid bytes to UTF-8...")
+        if args.dry_run:
+            # Dry-run: never write or create backup
+            out_path_effective = os.devnull
+            in_place_effective = False
+        else:
+            out_path_effective = out_path
+            in_place_effective = in_place
+
+        # Build fallback chain (if --encoding provided, try it first)
+        if args.fallbacks:
+            fallback_chain = [e.strip() for e in args.fallbacks.split(",") if e.strip()]
+        else:
+            fallback_chain = detect_probable_fallbacks()
+        if args.encoding and args.encoding not in fallback_chain:
+            fallback_chain = [args.encoding] + fallback_chain
+
+        if in_place_effective:
+            # Create backup before overwriting original (only when actually writing)
+            backup_path = path + ".bak"
+            shutil.copy2(path, backup_path)
+            print(f"Backup created: {backup_path}")
+
+        fix_count = targeted_fix_to_utf8(
+            path,
+            out_path_effective,
+            fallback_chain=fallback_chain,
+            dry_run=args.dry_run,
+            max_fixes=args.max_fixes,
+        )
+
+        if not args.dry_run:
+            ok, err = verify_utf8_file(out_path_effective)
+            if ok:
+                print("Verification: output is valid UTF-8 ✅")
+                print(f"Fix mode completed — {fix_count} byte(s) corrected.")
+            else:
+                print(f"Verification failed: {err}")
+                sys.exit(8)
+        return
+
+
+if __name__ == "__main__":
+    main()
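
The fallback-chain handling in FIX mode can be sketched in isolation. This is a minimal illustration, not the script's own code: `build_fallback_chain` and its `default_chain` parameter are hypothetical names standing in for the inline logic and for the result of `detect_probable_fallbacks()`, which is not reproduced here.

```python
from typing import List, Optional


def build_fallback_chain(
    fallbacks: Optional[str],
    encoding: Optional[str],
    default_chain: List[str],
) -> List[str]:
    """Hypothetical helper mirroring the FIX-mode fallback logic."""
    if fallbacks:
        # Comma-separated list from the command line; skip empty entries.
        chain = [e.strip() for e in fallbacks.split(",") if e.strip()]
    else:
        # Assumption: the caller supplies whatever the detector would return.
        chain = list(default_chain)
    # An explicit encoding is tried first unless it is already in the chain.
    if encoding and encoding not in chain:
        chain = [encoding] + chain
    return chain


if __name__ == "__main__":
    print(build_fallback_chain("cp1252, latin-1,", "shift_jis", ["cp1252"]))
```

Note that an encoding already present in the chain keeps its existing position rather than being moved to the front.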
--- a/parsedmarc/resources/maps/find_unknown_base_reverse_dns.py
+++ b/parsedmarc/resources/maps/find_unknown_base_reverse_dns.py
@@ -1,6 +1,5 @@
 #!/usr/bin/env python

-import logging
 import os
 import csv

@@ -9,60 +8,68 @@ def _main():
    input_csv_file_path = "base_reverse_dns.csv"
    base_reverse_dns_map_file_path = "base_reverse_dns_map.csv"
    known_unknown_list_file_path = "known_unknown_base_reverse_dns.txt"
+    psl_overrides_file_path = "psl_overrides.txt"
    output_csv_file_path = "unknown_base_reverse_dns.csv"

    csv_headers = ["source_name", "message_count"]

    output_rows = []

-    logging.basicConfig()
-    logger = logging.getLogger(__name__)
-    logger.setLevel(logging.INFO)
-
-    for p in [
-        input_csv_file_path,
-        base_reverse_dns_map_file_path,
-        known_unknown_list_file_path,
-    ]:
-        if not os.path.exists(p):
-            logger.error(f"{p} does not exist")
-            exit(1)
-    logger.info(f"Loading {known_unknown_list_file_path}")
    known_unknown_domains = []
-    with open(known_unknown_list_file_path) as f:
-        for line in f.readlines():
-            domain = line.lower().strip()
-            if domain in known_unknown_domains:
-                logger.warning(
-                    f"{domain} is in {known_unknown_list_file_path} multiple times"
-                )
-            else:
-                known_unknown_domains.append(domain)
-    logger.info(f"Loading {base_reverse_dns_map_file_path}")
+    psl_overrides = []
    known_domains = []
+    output_rows = []
+
+    def load_list(file_path, list_var):
+        if not os.path.exists(file_path):
+            raise SystemExit(f"Error: {file_path} does not exist")
+        print(f"Loading {file_path}")
+        with open(file_path) as f:
+            for line in f.readlines():
+                domain = line.lower().strip()
+                if domain in list_var:
+                    print(f"Error: {domain} is in {file_path} multiple times")
+                    exit(1)
+                elif domain != "":
+                    list_var.append(domain)
+
+    load_list(known_unknown_list_file_path, known_unknown_domains)
+    load_list(psl_overrides_file_path, psl_overrides)
+    if not os.path.exists(base_reverse_dns_map_file_path):
+        raise SystemExit(f"Error: {base_reverse_dns_map_file_path} does not exist")
+    print(f"Loading {base_reverse_dns_map_file_path}")
    with open(base_reverse_dns_map_file_path) as f:
        for row in csv.DictReader(f):
            domain = row["base_reverse_dns"].lower().strip()
            if domain in known_domains:
-                logger.warning(
-                    f"{domain} is in {base_reverse_dns_map_file_path} multiple times"
+                print(
+                    f"Error: {domain} is in {base_reverse_dns_map_file_path} multiple times"
                )
+                exit(1)
            else:
                known_domains.append(domain)
            if domain in known_unknown_domains and known_domains:
-                pass
-                logger.warning(
-                    f"{domain} is in {known_unknown_list_file_path} and {base_reverse_dns_map_file_path}"
+                print(
+                    f"Error: {domain} is in {known_unknown_list_file_path} and "
+                    f"{base_reverse_dns_map_file_path}"
                )
-
-    logger.info(f"Checking domains against {base_reverse_dns_map_file_path}")
+                exit(1)
+    if not os.path.exists(input_csv_file_path):
+        print(f"Error: {input_csv_file_path} does not exist")
+        exit(1)
    with open(input_csv_file_path) as f:
        for row in csv.DictReader(f):
            domain = row["source_name"].lower().strip()
+            if domain == "":
+                continue
+            for psl_domain in psl_overrides:
+                if domain.endswith(psl_domain):
+                    domain = psl_domain.strip(".").strip("-")
+                    break
            if domain not in known_domains and domain not in known_unknown_domains:
-                logger.info(f"New unknown domain found: {domain}")
+                print(f"New unknown domain found: {domain}")
                output_rows.append(row)
-    logger.info(f"Writing {output_csv_file_path}")
+    print(f"Writing {output_csv_file_path}")
    with open(output_csv_file_path, "w") as f:
        writer = csv.DictWriter(f, fieldnames=csv_headers)
        writer.writeheader()
--- a/parsedmarc/resources/maps/known_unknown_base_reverse_dns.txt
+++ b/parsedmarc/resources/maps/known_unknown_base_reverse_dns.txt
@@ -1,125 +1,601 @@
-200.in-addr.arpa
+1jli.site
+26.107
+444qcuhilla.com
+4xr1.com
+9services.com
+a7e.ru
+a94434500-blog.com
+aams8.jp
+abv-10.top
+acemail.co.in
+activaicon.com
+adcritic.net
 adlucrumnewsletter.com
 admin.corpivensa.gob.ve
+advantageiq.com
+advrider.ro
 aerospacevitro.us.com
+agenturserver.de
+aghories.com
+ai270.net
 albagroup-eg.com
+alchemy.net
+alohabeachcamp.net
+alsiscad.com
+aluminumpipetubing.com
+americanstorageca.com
+amplusserver.info
+anchorfundhub.com
+anglishment.com
 anteldata.net.uy
+antis.edu
 antonaoll.com
+anviklass.org
+anwrgrp.lat
 aosau.net
 arandomserver.com
+aransk.ru
+ardcs.cn
+armninl.met
+as29550.net
+asahachimaru.com
+aserv.co.za
 asmecam.it
+ateky.net.br
+aurelienvos.com
+automatech.lat
+avistaadvantage.com
 b8sales.com
+bahjs.com
+baliaura.com
+banaras.co
+bearandbullmarketnews.com
 bestinvestingtime.com
+bhjui.com
 biocorp.com
-bisno1.co.jp
+biosophy.net
+bitter-echo.com
+bizhostingservices.com
+blguss.com
+bluenet.ch
 bluhosting.com
+bnasg.com
 bodiax.pp.ua
 bost-law.com
+brainity.com
+brazalnde.net
+brellatransplc.shop
 brnonet.cz
+broadwaycover.com
 brushinglegal.de
+brw.net
+btes.tv
+budgeteasehub.com
+buoytoys.com
+buyjapanese.jp
+c53dw7m24rj.com
+cahtelrandom.org
+casadelmarsamara.com
+cashflowmasterypro.com
+cavabeen.com
+cbti.net
+centralmalaysia.com
+chauffeurplan.co.uk
+checkpox.fun
+chegouseuvlache.org
+chinaxingyu.xyz
 christus.mx
+churchills.market
+ci-xyz.fit
+cisumrecords.com
+ckaik.cn
+clcktoact.com
+cli-eurosignal.cz
+cloud-admin.it
 cloud-edm.com
+cloudflare-email.org
+cloudhosting.rs
 cloudlogin.co
+cloudplatformpro.com
 cnode.io
+cntcloud.com
+code-it.net
+codefriend.top
+colombiaceropapel.org
 commerceinsurance.com
+comsharempc.com
+conexiona.com
 coolblaze.com
+coowo.com
+corpemail.net
+cp2-myorderbox.com
 cps.com.ar
+crnagora.net
+cross-d-bar-troutranch.com
+ctla.co.kr
+cumbalikonakhotel.com
+currencyexconverter.com
+daakbabu.com
+daikinmae.com
+dairyvalley.com.my
+dastans.ru
+datahost36.de
+ddii.network
+deep-sek.shop
+deetownsounds.com
+descarca-counter-strike.net
 detrot.xyz
+dettlaffinc.com
+dextoolse.net
+digestivedaily.com
 digi.net.my
+dinofelis.cn
+diwkyncbi.top
 dkginternet.com
+dnexpress.info
+dns-oid.com
+dnsindia.net
+domainserver.ne.jp
+domconfig.com
 doorsrv.com
+dreampox.fun
 dreamtechmedia.com
 ds.network
+dss-group.net
+dvj.theworkpc.com
+dwlcka.com
+dynamic-wiretel.in
+dyntcorp.com
+easternkingspei.com
+economiceagles.com
+egosimail.com
+eliotporterphotos.us
+emailgids.net
 emailperegrine.com
+entendercopilot.com
+entretothom.net
+epaycontrol.com
+epicinvestmentsreview.co
+epicinvestmentsreview.com
+epik.com
 epsilon-group.com
+erestaff.com
+euro-trade-gmbh.com
+example.com
+exposervers.com-new
+extendcp.co.uk
 eyecandyhosting.xyz
+fastwebnet.it
+fd9ing7wfn.com
+feipnghardware.com
 fetscorp.shop
+fewo-usedom.net
+fin-crime.com
+financeaimpoint.com
+financeupward.com
+firmflat.com
+flex-video.bnr.la
+flourishfusionlife.com
 formicidaehunt.net
 fosterheap.com
+fredi.shop
+frontiernet.net
+ftifb7tk3c.com
+gamersprotectionvpn.online
 gendns.com
+getgreencardsfast.com
+getthatroi.com
+gibbshosting.com
+gigidea.net
+giize.com
 ginous.eu.com
+gis.net
 gist-th.com
+globalglennpartners.com
+goldsboroughplace.com
 gophermedia.com
 gqlists.us.com
 gratzl.de
+greatestworldnews.com
+greennutritioncare.com
+gsbb.com
+gumbolimbo.net
+h-serv.co.uk
+haedefpartners.com
+halcyon-aboveboard.com
+hanzubon.org
+healthfuljourneyjoy.com
 hgnbroken.us.com
+highwey-diesel.com
+hirofactory.com
+hjd.asso.fr
+hongchenggco.pro
+hongkongtaxi.co
+hopsinthehanger.com
+hosted-by-worldstream.net
+hostelsucre.com
 hosting1337.com
+hostinghane.com
+hostinglotus.cloud
 hostingmichigan.com
+hostiran.name
+hostmnl.com
 hostname.localhost
 hostnetwork.com
+hosts.net.nz
+hostserv.eu
 hostwhitelabel.com
+hpms1.jp
+hunariojmk.net
+hunriokinmuim.net
+hypericine.com
+i-mecca.net
+iaasdns.com
+iam.net.ma
+iconmarketingguy.com
 idcfcloud.net
+idealconcept.live
+igmohji.com
+igppevents.org.uk
+ihglobaldns.com
+ilmessicano.com
+imjtmn.cn
 immenzaces.com
+in-addr-arpa
+in-addr.arpa
+indsalelimited.com
+indulgent-holistic.com
+industechint.org
+inshaaegypt.com
+intal.uz
+interfarma.kz
+intocpanel.com
+ip-147-135-108.us
+ip-178-33-109.eu
+ip-ptr.tech
+iswhatpercent.com
+itsidc.com
+itwebs.com
+iuon.net
 ivol.co
 jalanet.co.id
+jimishare.com
+jlccptt.net.cn
+jlenterprises.co.uk
+jmontalto.com
+joyomokei.com
+jumanra.org
+justlongshirts.com
 kahlaa.com
+kaw.theworkpc.com
 kbronet.com.tw
 kdnursing.org
+kielnet.net
+kihy.theworkpc.com
+kingschurchwirral.org
 kitchenaildbd.com
+klaomi.shop
+knkconsult.net
+kohshikai.com
+krhfund.org
+krillaglass.com
+lancorhomes.com
+landpedia.org
+lanzatuseo.es
+layerdns.cloud
+learninglinked.com
 legenditds.com
+levertechcentre.com
+lhost.no
+lideri.net.br
 lighthouse-media.com
+lightpath.net
+limogesporcelainboxes.com
+lindsaywalt.net
+linuxsunucum.com
+listertermoformadoa.com
+llsend.com
+local.net
 lohkal.com
+londionrtim.net
 lonestarmm.net
+longmarquis.com
+longwoodmgmt.com
+lse.kz
+lunvoy.com
+luxarpro.ru
+lwl-puehringer.at
+lynx.net.lb
+lyse.net
+m-sender.com.ua
+maggiolicloud.it
 magnetmail.net
+magnumgo.uz
+maia11.com
+mail-fire.com
+mailsentinel.net
+mailset.cn
+malardino.net
+managed-vps.net
 manhattanbulletpoint.com
+manpowerservices.com
+marketmysterycode.com
+marketwizardspro.com
 masterclassjournal.com
+matroguel.cam
+maximpactipo.com
+mechanicalwalk.store
+mediavobis.com
+meqlobal.com
+mgts.by
+migrans.net
+miixta.com
+milleniumsrv.com
+mindworksunlimited.com
+mirth-gale.com
+misorpresa.com
+mitomobile.com
+mitsubachi-kibako.net
+mjinn.com
+mkegs.shop
+mobius.fr
+model-ac.ink
 moderntradingnews.com
+monnaiegroup.com
+monopolizeright.com
 moonjaws.com
+morningnewscatcher.com
 motion4ever.net
 mschosting.com
+msdp1.com
 mspnet.pro
 mts-nn.ru
+multifamilydesign.com
+mxserver.ro
 mxthunder.net
+my-ihor.ru
+mycloudmailbox.com
+myfriendforum.com
 myrewards.net
 mysagestore.com
+mysecurewebserver.com
+myshanet.net
+myvps.jp
+mywedsite.net
+mywic.eu
+name.tools
+nanshenqfurniture.com
+nask.pl
+navertise.net
+ncbb.kz
 ncport.ru
+ncsdi.ws
 nebdig.com
 neovet-base.ru
+netbri.com
+netcentertelecom.net.br
+neti.ee
+netkl.org
+newinvestingguide.com
+newwallstreetcode.com
+ngvcv.cn
 nic.name
 nidix.net
+nieuwedagnetwerk.net
+nlscanme.com
+nmeuh.cn
+noisndametal.com
+nucleusemail.com
+nutriboostlife.com
+nwo.giize.com
+nwwhalewatchers.org
+ny.adsl
+nyt1.com
+offerslatedeals.com
+office365.us
 ogicom.net
+olivettilexikon.co.uk
 omegabrasil.inf.br
 onnet21.com
+onumubunumu.com
+oppt-ac.fit
+orbitel.net.co
+orfsurface.com
+orientalspot.com
+outsidences.com
 ovaltinalization.co
 overta.ru
+ox28vgrurc.com
+pamulang.net
+panaltyspot.space
+panolacountysheriffms.com
 passionatesmiles.com
+paulinelam.com
+pdi-corp.com
+peloquinbeck.com
+perimetercenter.net
+permanentscreen.com
+permasteellisagroup.com
+perumkijhyu.net
+pesnia.com.ua
+ph8ltwdi12o.com
+pharmada.com.de
+phdns3.es
+pigelixval1.com
+pipefittingsindia.com
 planethoster.net
+playamedia.io
+plesk.page
 pmnhost.net
+pokiloandhu.net
+pokupki5.ru
+polandi.net
 popiup.com
+ports.net
+posolstvostilya.com
+potia.net
 prima.com.ar
 prima.net.ar
+profsol.co.uk
+prohealthmotion.com
+promooffermarket.site
 proudserver.com
+proxado.com
+psnm.ru
+pvcwindowsprices.live
 qontenciplc.autos
+quakeclick.com
+quasarstate.store
+quatthonggiotico.com
+qxyxab44njd.com
+radianthealthrenaissance.com
+rapidns.com
 raxa.host
+reberte.com
+reethvikintl.com
+regruhosting.ru
+reliablepanel.com
+rgb365.eu
+riddlecamera.net
+riddletrends.com
+roccopugliese.com
+runnin-rebels.com
+rupar.puglia.it
+rwdhosting.ca
+s500host.com
+sageevents.co.ke
 sahacker-2020.com
 samsales.site
+sante-lorraine.fr
+saransk.ru
 satirogluet.com
-securednshost.com
+scioncontacts.com
+sdcc.my
+seaspraymta3.net
+secorp.mx
 securen.net
 securerelay.in
 securev.net
+seductiveeyes.com
+seizethedayconsulting.com
+serroplast.shop
+server290.com
+server342.com
+server3559.cc
 servershost.biz
+sfek.kz
+sgnetway.net
+shopfox.ca
+silvestrejaguar.sbs
+silvestreonca.sbs
+simplediagnostics.org
+siriuscloud.jp
+sisglobalresearch.com
+sixpacklink.net
+sjestyle.com
 smallvillages.com
+smartape-vps.com
 solusoftware.com
+sourcedns.com
+southcoastwebhosting12.com
+specialtvvs.com
 spiritualtechnologies.io
 sprout.org
+srv.cat
 stableserver.net
+statlerfa.co.uk
+stock-smtp.top
+stockepictigers.com
 stockexchangejournal.com
+subterranean-concave.com
 suksangroup.com
+swissbluetopaz.com
+switer.shop
+sysop4.com
 system.eu.com
+szhongbing.com
 t-jon.com
+tacaindo.net
+tacom.tj
+tankertelz.co
+tataidc.com
+teamveiw.com
+tecnoxia.net
+tel-xyz.fit
 tenkids.net
+terminavalley.com
 thaicloudsolutions.com
+thaikinghost.com
 thaimonster.com
+thegermainetruth.net
+thehandmaderose.com
+thepushcase.com
+ticdns.com
+tigo.bo
+toledofibra.net.br
+topdns.com
+totaal.net
+totalplay.net
+tqh.ro
+traderlearningcenter.com
+tradeukraine.site
+traveleza.com
+trwww.com
+tsuzakij.com
 tullostrucking.com
+turbinetrends.com
+twincitiesdistinctivehomes.com
+tylerfordonline.com
+uiyum.com
+ultragate.com
+uneedacollie.com
+unified.services
 unite.services
 urawasl.com
 us.servername.us
+vagebond.net
+varvia.de
+vbcploo.com
+vdc.vn
 vendimetry.com
 vibrantwellnesscorp.com
+virtualine.org
+visit.docotor
+viviotech.us
+vlflgl.com
+volganet.ru
+vrns.net
+vulterdi.edu
+vvondertex.com
 wallstreetsgossip.com
+wamego.net
+wanekoohost.com
+wealthexpertisepro.com
+web-login.eu
 weblinkinternational.com
+webnox.io
+websale.net
+welllivinghive.com
+westparkcom.com
+wetransfer-eu.com
+wheelch.me
+whoflew.com
+whpservers.com
+wisdomhard.com
+wisewealthcircle.com
+wisvis.com
+wodeniowa.com
+wordpresshosting.xyz
+wsiph2.com
+xnt.mx
+xodiax.com
+xpnuf.cn
 xsfati.us.com
 xspmail.jp
+yourciviccompass.com
+yourinvestworkbook.com
+yoursitesecure.net
 zerowebhosting.net
+zmml.uk
 znlc.jp
+ztomy.com
--- a/parsedmarc/resources/maps/psl_overrides.txt
+++ b/parsedmarc/resources/maps/psl_overrides.txt
@@ -0,0 +1,23 @@
+-applefibernet.com
+-c3.net.pl
+-celsiainternet.com
+-clientes-izzi.mx
+-clientes-zap-izzi.mx
+-imnet.com.br
+-mcnbd.com
+-smile.com.bd
+-tataidc.co.in
+-veloxfiber.com.br
+-wconect.com.br
+.amazonaws.com
+.cloudaccess.net
+.ddnsgeek.com
+.fastvps-server.com
+.in-addr-arpa
+.in-addr.arpa
+.kasserver.com
+.kinghost.net
+.linode.com
+.linodeusercontent.com
+.na4u.ru
+.sakura.ne.jp
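
How these entries are meant to be applied can be sketched as a suffix match followed by trimming the leading `.` or `-`. This is a minimal sketch based on the matching used elsewhere in this change; `apply_psl_override` is a hypothetical name, and the sample hostnames are made up for illustration.

```python
from typing import Iterable, Optional


def apply_psl_override(domain: str, overrides: Iterable[str]) -> Optional[str]:
    """Return the overridden base domain, or None when no override matches.

    Hypothetical helper: each override is a suffix that starts with "." or "-".
    """
    domain = domain.lower()
    for override in overrides:
        if domain.endswith(override):
            # Drop the leading separator to get the bare base domain.
            return override.strip(".").strip("-")
    return None


if __name__ == "__main__":
    overrides = [".amazonaws.com", "-applefibernet.com"]
    for name in (
        "ec2-203-0-113-5.compute-1.amazonaws.com",
        "host-203-0-113-5-applefibernet.com",
        "mail.example.com",
    ):
        print(name, "->", apply_psl_override(name, overrides))
```

The leading separator matters: because the entry is `.amazonaws.com` rather than `amazonaws.com`, an unrelated name that merely ends in the same letters does not match.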
--- a/parsedmarc/resources/maps/sortlists.py
+++ b/parsedmarc/resources/maps/sortlists.py
@@ -0,0 +1,184 @@
+#!/usr/bin/env python3
+
+from __future__ import annotations
+
+import os
+import csv
+from pathlib import Path
+from typing import Mapping, Iterable, Optional, Collection, Union, List, Dict
+
+
+class CSVValidationError(Exception):
+    def __init__(self, errors: list[str]):
+        super().__init__("\n".join(errors))
+        self.errors = errors
+
+
+def sort_csv(
+    filepath: Union[str, Path],
+    field: str,
+    *,
+    sort_field_value_must_be_unique: bool = True,
+    strip_whitespace: bool = True,
+    fields_to_lowercase: Optional[Iterable[str]] = None,
+    case_insensitive_sort: bool = False,
+    required_fields: Optional[Iterable[str]] = None,
+    allowed_values: Optional[Mapping[str, Collection[str]]] = None,
+) -> None:
+    """
+    Read a CSV, optionally normalize rows (strip whitespace, lowercase certain fields),
+    validate field values, and write the sorted CSV back to the same path.
+
+    - filepath: Path to the CSV to sort.
+    - field: The field name to sort by.
+    - fields_to_lowercase: Permanently lowercases these field(s) in the data.
+    - strip_whitespace: Remove all whitespace at the beginning and end of field values.
+    - case_insensitive_sort: Ignore case when sorting without changing values.
+    - required_fields: A list of fields that must have data in all rows.
+    - allowed_values: A mapping of allowed values for fields.
+    """
+    path = Path(filepath)
+    required_fields = set(required_fields or [])
+    lower_set = set(fields_to_lowercase or [])
+    allowed_sets = {k: set(v) for k, v in (allowed_values or {}).items()}
+    if sort_field_value_must_be_unique:
+        seen_sort_field_values = []
+
+    with path.open("r", newline="") as infile:
+        reader = csv.DictReader(infile)
+        fieldnames = reader.fieldnames or []
+        if field not in fieldnames:
+            raise CSVValidationError([f"Missing sort column: {field!r}"])
+        missing_headers = required_fields - set(fieldnames)
+        if missing_headers:
+            raise CSVValidationError(
+                [f"Missing required header(s): {sorted(missing_headers)}"]
+            )
+        rows = list(reader)
+
+    def normalize_row(row: Dict[str, str]) -> None:
+        if strip_whitespace:
+            for k, v in row.items():
+                if isinstance(v, str):
+                    row[k] = v.strip()
+        for fld in lower_set:
+            if fld in row and isinstance(row[fld], str):
+                row[fld] = row[fld].lower()
+
+    def validate_row(
+        row: Dict[str, str], sort_field: str, line_no: int, errors: list[str]
+    ) -> None:
+        if sort_field_value_must_be_unique:
+            if row[sort_field] in seen_sort_field_values:
+                errors.append(f"Line {line_no}: Duplicate row for '{row[sort_field]}'")
+            else:
+                seen_sort_field_values.append(row[sort_field])
+        for rf in required_fields:
+            val = row.get(rf)
+            if val is None or val == "":
+                errors.append(
+                    f"Line {line_no}: Missing value for required field '{rf}'"
+                )
+        for field, allowed_values in allowed_sets.items():
+            if field in row:
+                val = row[field]
+                if val not in allowed_values:
+                    errors.append(
+                        f"Line {line_no}: '{val}' is not an allowed value for '{field}' "
+                        f"(allowed: {sorted(allowed_values)})"
+                    )
+
+    errors: list[str] = []
+    for idx, row in enumerate(rows, start=2):  # header is line 1
+        normalize_row(row)
+        validate_row(row, field, idx, errors)
+
+    if errors:
+        raise CSVValidationError(errors)
+
+    def sort_key(r: Dict[str, str]):
+        v = r.get(field, "")
+        if isinstance(v, str) and case_insensitive_sort:
+            return v.casefold()
+        return v
+
+    rows.sort(key=sort_key)
+
+    with open(filepath, "w", newline="") as outfile:
+        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
+        writer.writeheader()
+        writer.writerows(rows)
+
+
+def sort_list_file(
+    filepath: Union[str, Path],
+    *,
+    lowercase: bool = True,
+    strip: bool = True,
+    deduplicate: bool = True,
+    remove_blank_lines: bool = True,
+    ending_newline: bool = True,
+    newline: Optional[str] = "\n",
+):
+    """Read a list from a file, sort it, optionally strip and deduplicate the values,
+    then write that list back to the file.
+
+    - filepath: The path to the file.
+    - lowercase: Lowercase all values prior to sorting.
+    - remove_blank_lines: Remove any blank lines.
+    - ending_newline: End the file with a newline, even if remove_blank_lines is true.
+    - newline: The newline character to use.
+    """
+    with open(filepath, mode="r", newline=newline) as infile:
+        lines = infile.readlines()
+        for i in range(len(lines)):
+            if lowercase:
+                lines[i] = lines[i].lower()
+            if strip:
+                lines[i] = lines[i].strip()
+        if deduplicate:
+            lines = list(set(lines))
+        if remove_blank_lines:
+            while "" in lines:
+                lines.remove("")
+        lines = sorted(lines)
+        if ending_newline:
+            if not lines or lines[-1] != "":
+                lines.append("")
+    with open(filepath, mode="w", newline=newline) as outfile:
+        outfile.write("\n".join(lines))
+
+
+def _main():
+    map_file = "base_reverse_dns_map.csv"
+    map_key = "base_reverse_dns"
+    list_files = ["known_unknown_base_reverse_dns.txt", "psl_overrides.txt"]
+    types_file = "base_reverse_dns_types.txt"
+
+    with open(types_file) as f:
+        types = [line.strip() for line in f]
+        while "" in types:
+            types.remove("")
+
+    map_allowed_values = {"Type": types}
+
+    for list_file in list_files:
+        if not os.path.exists(list_file):
+            print(f"Error: {list_file} does not exist")
+            exit(1)
+        sort_list_file(list_file)
+    if not os.path.exists(types_file):
+        print(f"Error: {types_file} does not exist")
+        exit(1)
+    sort_list_file(types_file, lowercase=False)
+    if not os.path.exists(map_file):
+        print(f"Error: {map_file} does not exist")
+        exit(1)
+    try:
+        sort_csv(map_file, map_key, allowed_values=map_allowed_values)
+    except CSVValidationError as e:
+        print(f"{map_file} did not validate: {e}")
+
+
+if __name__ == "__main__":
+    _main()
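
The list clean-up that `sort_list_file` performs can be illustrated on an in-memory list. This is a simplified sketch under stated assumptions: `normalize_lines` is a hypothetical helper that always strips, deduplicates, and drops blank lines, whereas the function in this file makes each of those steps optional.

```python
from typing import Iterable, List


def normalize_lines(lines: Iterable[str], *, lowercase: bool = True) -> List[str]:
    """Hypothetical helper: strip, optionally lowercase, dedupe, and sort."""
    cleaned = set()
    for line in lines:
        # Lines read with readlines() keep their trailing newline, so strip
        # before comparing or deduplicating.
        value = line.strip()
        if lowercase:
            value = value.lower()
        if value:  # skip blank lines
            cleaned.add(value)
    return sorted(cleaned)


if __name__ == "__main__":
    raw = ["B.example\n", "a.example\n", "\n", "b.example"]
    result = normalize_lines(raw)
    # Join with newlines and end the file with a single trailing newline.
    print("\n".join(result) + "\n", end="")
```

Stripping before deduplication is what lets `"b.example"` and `"B.example\n"` collapse into one entry.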
--- a/parsedmarc/splunk.py
+++ b/parsedmarc/splunk.py
@@ -5,7 +5,7 @@ import json
 import urllib3
 import requests

-from parsedmarc import __version__
+from parsedmarc.constants import USER_AGENT
 from parsedmarc.log import logger
 from parsedmarc.utils import human_timestamp_to_unix_timestamp

@@ -51,7 +51,7 @@ class HECClient(object):
        self._common_data = dict(host=self.host, source=self.source, index=self.index)

        self.session.headers = {
-            "User-Agent": "parsedmarc/{0}".format(__version__),
+            "User-Agent": USER_AGENT,
            "Authorization": "Splunk {0}".format(self.access_token),
        }

@@ -78,6 +78,9 @@ class HECClient(object):
                new_report = dict()
                for metadata in report["report_metadata"]:
                    new_report[metadata] = report["report_metadata"][metadata]
+                new_report["interval_begin"] = record["interval_begin"]
+                new_report["interval_end"] = record["interval_end"]
+                new_report["normalized_timespan"] = record["normalized_timespan"]
                new_report["published_policy"] = report["policy_published"]
                new_report["source_ip_address"] = record["source"]["ip_address"]
                new_report["source_country"] = record["source"]["country"]
@@ -98,7 +101,9 @@ class HECClient(object):
                    new_report["spf_results"] = record["auth_results"]["spf"]

                data["sourcetype"] = "dmarc:aggregate"
-                timestamp = human_timestamp_to_unix_timestamp(new_report["begin_date"])
+                timestamp = human_timestamp_to_unix_timestamp(
+                    new_report["interval_begin"]
+                )
                data["time"] = timestamp
                data["event"] = new_report.copy()
                json_str += "{0}\n".format(json.dumps(data))
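
The effect of this hunk is that each aggregate event is timestamped from its own record interval rather than from the report's begin date. A minimal sketch of that idea follows. The names `to_unix_timestamp` and `build_event` are hypothetical, the record values are illustrative, and the conversion assumes UTC and a `YYYY-MM-DD HH:MM:SS` format, which may differ from what `human_timestamp_to_unix_timestamp` actually does.

```python
import json
from datetime import datetime, timezone


def to_unix_timestamp(human_timestamp: str) -> int:
    """Stand-in conversion; assumes UTC and 'YYYY-MM-DD HH:MM:SS'."""
    parsed = datetime.strptime(human_timestamp, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def build_event(record: dict, common: dict) -> str:
    """Hypothetical helper: build one HEC-style JSON line for a record."""
    event = {
        "interval_begin": record["interval_begin"],
        "interval_end": record["interval_end"],
        "normalized_timespan": record["normalized_timespan"],
    }
    data = dict(common)
    data["sourcetype"] = "dmarc:aggregate"
    # The event time comes from the record's interval, not the report metadata.
    data["time"] = to_unix_timestamp(event["interval_begin"])
    data["event"] = event
    return json.dumps(data)


if __name__ == "__main__":
    sample = {
        "interval_begin": "1970-01-02 00:00:00",
        "interval_end": "1970-01-03 00:00:00",
        "normalized_timespan": False,  # illustrative value only
    }
    print(build_event(sample, {"host": "example", "index": "email"}))
```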
--- a/parsedmarc/utils.py
+++ b/parsedmarc/utils.py
@@ -37,13 +37,19 @@ import requests
 from parsedmarc.log import logger
 import parsedmarc.resources.dbip
 import parsedmarc.resources.maps
-
+from parsedmarc.constants import USER_AGENT

 parenthesis_regex = re.compile(r"\s*\(.*\)\s*")

 null_file = open(os.devnull, "w")
 mailparser_logger = logging.getLogger("mailparser")
 mailparser_logger.setLevel(logging.CRITICAL)
+psl = publicsuffixlist.PublicSuffixList()
+psl_overrides_path = str(files(parsedmarc.resources.maps).joinpath("psl_overrides.txt"))
+with open(psl_overrides_path) as f:
+    psl_overrides = [line.rstrip() for line in f.readlines()]
+    while "" in psl_overrides:
+        psl_overrides.remove("")


 class EmailParserError(RuntimeError):
@@ -78,7 +84,8 @@ def get_base_domain(domain):

    .. note::
        Results are based on a list of public domain suffixes at
-        https://publicsuffix.org/list/public_suffix_list.dat.
+        https://publicsuffix.org/list/public_suffix_list.dat and overrides included in
+        parsedmarc.resources.maps.psl_overrides.txt

    Args:
        domain (str): A domain or subdomain
@@ -87,8 +94,12 @@ def get_base_domain(domain):
        str: The base domain of the given domain

    """
-    psl = publicsuffixlist.PublicSuffixList()
-    return psl.privatesuffix(domain)
+    domain = domain.lower()
+    publicsuffix = psl.privatesuffix(domain)
+    for override in psl_overrides:
+        if domain.endswith(override):
+            return override.strip(".").strip("-")
+    return publicsuffix


 def query_dns(domain, record_type, cache=None, nameservers=None, timeout=2.0):
@@ -345,7 +356,8 @@ def get_service_from_reverse_dns_base_domain(
    if not (offline or always_use_local_file) and len(reverse_dns_map) == 0:
        try:
            logger.debug(f"Trying to fetch reverse DNS map from {url}...")
-            response = requests.get(url)
+            headers = {"User-Agent": USER_AGENT}
+            response = requests.get(url, headers=headers)
            response.raise_for_status()
            csv_file.write(response.text)
            csv_file.seek(0)
@@ -355,6 +367,7 @@ def get_service_from_reverse_dns_base_domain(
        except Exception:
            logger.warning("Not a valid CSV file")
            csv_file.seek(0)
+            logger.debug("Response body:")
            logger.debug(csv_file.read())

    if len(reverse_dns_map) == 0:
--- a/parsedmarc/webhook.py
+++ b/parsedmarc/webhook.py
@@ -1,6 +1,7 @@
 import requests

 from parsedmarc import logger
+from parsedmarc.constants import USER_AGENT


 class WebhookClient(object):
@@ -21,7 +22,7 @@ class WebhookClient(object):
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers = {
-            "User-Agent": "parsedmarc",
+            "User-Agent": USER_AGENT,
            "Content-Type": "application/json",
        }

--- a/pyproject.toml
+++ b/pyproject.toml
@@ -55,6 +55,7 @@ dependencies = [
    "tqdm>=4.31.1",
    "urllib3>=1.25.7",
    "xmltodict>=0.12.0",
+    "PyYAML>=6.0.3",
 ]

 [project.optional-dependencies]
@@ -76,9 +77,20 @@ parsedmarc = "parsedmarc.cli:_main"
 Homepage = "https://domainaware.github.io/parsedmarc"

 [tool.hatch.version]
-path = "parsedmarc/__init__.py"
+path = "parsedmarc/constants.py"

 [tool.hatch.build.targets.sdist]
 include = [
    "/parsedmarc",
 ]
+
+[tool.hatch.build]
+exclude = [
+"base_reverse_dns.csv",
+"find_bad_utf8.py",
+"find_unknown_base_reverse_dns.py",
+"unknown_base_reverse_dns.csv",
+"sortlists.py",
+"README.md",
+"*.bak"
+]
--- a/sortmaps.py
+++ b/sortmaps.py
@@ -1,25 +0,0 @@
-#!/usr/bin/env python3
-
-import os
-import glob
-import csv
-
-
-maps_dir = os.path.join("parsedmarc", "resources", "maps")
-csv_files = glob.glob(os.path.join(maps_dir, "*.csv"))
-
-
-def sort_csv(filepath, column=0):
-    with open(filepath, mode="r", newline="") as infile:
-        reader = csv.reader(infile)
-        header = next(reader)
-        sorted_rows = sorted(reader, key=lambda row: row[column])
-
-    with open(filepath, mode="w", newline="\n") as outfile:
-        writer = csv.writer(outfile)
-        writer.writerow(header)
-        writer.writerows(sorted_rows)
-
-
-for csv_file in csv_files:
-    sort_csv(csv_file)
--- a/splunk/smtp_tls_dashboard.xml
+++ b/splunk/smtp_tls_dashboard.xml
@@ -0,0 +1,107 @@
+<form version="1.1" theme="dark">
+  <label>SMTP TLS Reporting</label>
+  <fieldset submitButton="false" autoRun="true">
+    <input type="time" token="time">
+      <label></label>
+      <default>
+        <earliest>-7d@h</earliest>
+        <latest>now</latest>
+      </default>
+    </input>
+    <input type="text" token="organization_name" searchWhenChanged="true">
+      <label>Organization name</label>
+      <default>*</default>
+      <initialValue>*</initialValue>
+    </input>
+    <input type="text" token="policy_domain">
+      <label>Policy domain</label>
+      <default>*</default>
+      <initialValue>*</initialValue>
+    </input>
+    <input type="dropdown" token="policy_type" searchWhenChanged="true">
+      <label>Policy type</label>
+      <choice value="*">Any</choice>
+      <choice value="tlsa">tlsa</choice>
+      <choice value="sts">sts</choice>
+      <choice value="no-policy-found">no-policy-found</choice>
+      <default>*</default>
+      <initialValue>*</initialValue>
+    </input>
+  </fieldset>
+  <row>
+    <panel>
+      <title>Reporting organizations</title>
+      <table>
+        <search>
+          <query>index=email sourcetype=smtp:tls organization_name=$organization_name$ policies{}.policy_domain=$policy_domain$
+| rename policies{}.policy_domain as policy_domain
+| rename policies{}.policy_type as policy_type
+| rename policies{}.failed_session_count as failed_sessions
+| rename policies{}.failure_details{}.failed_session_count as failed_sessions
+| rename policies{}.successful_session_count as successful_sessions
+| rename policies{}.failure_details{}.sending_mta_ip as sending_mta_ip
+| rename policies{}.failure_details{}.receiving_ip as receiving_ip
+| rename policies{}.failure_details{}.receiving_mx_hostname as receiving_mx_hostname
+| rename policies{}.failure_details{}.result_type as failure_type
+| fillnull value=0 failed_sessions
+| stats sum(failed_sessions) as failed_sessions sum(successful_sessions) as successful_sessions by organization_name
+| sort -successful_sessions 0</query>
+          <earliest>$time.earliest$</earliest>
+          <latest>$time.latest$</latest>
+        </search>
+        <option name="drilldown">none</option>
+        <option name="refresh.display">progressbar</option>
+      </table>
+    </panel>
+    <panel>
+      <title>Domains</title>
+      <table>
+        <search>
+          <query>index=email sourcetype=smtp:tls organization_name=$organization_name$ policies{}.policy_domain=$policy_domain$
+| rename policies{}.policy_domain as policy_domain
+| rename policies{}.policy_type as policy_type
+| rename policies{}.failed_session_count as failed_sessions
+| rename policies{}.failure_details{}.failed_session_count as failed_sessions
+| rename policies{}.successful_session_count as successful_sessions
+| rename policies{}.failure_details{}.sending_mta_ip as sending_mta_ip
+| rename policies{}.failure_details{}.receiving_ip as receiving_ip
+| rename policies{}.failure_details{}.receiving_mx_hostname as receiving_mx_hostname
+| rename policies{}.failure_details{}.result_type as failure_type
+| fillnull value=0 failed_sessions
+| stats sum(failed_sessions) as failed_sessions sum(successful_sessions) as successful_sessions  by policy_domain
+| sort -successful_sessions 0</query>
+          <earliest>$time.earliest$</earliest>
+          <latest>$time.latest$</latest>
+        </search>
+        <option name="drilldown">none</option>
+        <option name="refresh.display">progressbar</option>
+      </table>
+    </panel>
+  </row>
+  <row>
+    <panel>
+      <title>Failure details</title>
+      <table>
+        <search>
+          <query>index=email sourcetype=smtp:tls organization_name=$organization_name$ policies{}.policy_domain=$policy_domain$ policies{}.failure_details{}.result_type=*
+| rename policies{}.policy_domain as policy_domain
+| rename policies{}.policy_type as policy_type
+| rename policies{}.failed_session_count as failed_sessions
+| rename policies{}.failure_details{}.failed_session_count as failed_sessions
+| rename policies{}.successful_session_count as successful_sessions
+| rename policies{}.failure_details{}.sending_mta_ip as sending_mta_ip
+| rename policies{}.failure_details{}.receiving_ip as receiving_ip
+| rename policies{}.failure_details{}.receiving_mx_hostname as receiving_mx_hostname
+| fillnull value=0 failed_sessions
+| rename policies{}.failure_details{}.result_type as failure_type
+| table _time organization_name policy_domain policy_type failed_sessions successful_sessions sending_mta_ip receiving_ip receiving_mx_hostname failure_type
+| sort by -_time 0</query>
+          <earliest>$time.earliest$</earliest>
+          <latest>$time.latest$</latest>
+        </search>
+        <option name="drilldown">none</option>
+        <option name="refresh.display">progressbar</option>
+      </table>
+    </panel>
+  </row>
+</form>
--- a/tests.py
+++ b/tests.py
@@ -43,11 +43,12 @@ class Test(unittest.TestCase):

    def testExtractReportXMLComparator(self):
        """Test XML comparator function"""
-        print()
-        xmlnice = open("samples/extract_report/nice-input.xml").read()
-        print(xmlnice)
-        xmlchanged = minify_xml(open("samples/extract_report/changed-input.xml").read())
-        print(xmlchanged)
+        xmlnice_file = open("samples/extract_report/nice-input.xml")
+        xmlnice = xmlnice_file.read()
+        xmlnice_file.close()
+        xmlchanged_file = open("samples/extract_report/changed-input.xml")
+        xmlchanged = minify_xml(xmlchanged_file.read())
+        xmlchanged_file.close()
        self.assertTrue(compare_xml(xmlnice, xmlnice))
        self.assertTrue(compare_xml(xmlchanged, xmlchanged))
        self.assertFalse(compare_xml(xmlnice, xmlchanged))
@@ -62,7 +63,9 @@ class Test(unittest.TestCase):
            data = f.read()
        print("Testing {0}: ".format(file), end="")
        xmlout = parsedmarc.extract_report(data)
-        xmlin = open("samples/extract_report/nice-input.xml").read()
+        xmlin_file = open("samples/extract_report/nice-input.xml")
+        xmlin = xmlin_file.read()
+        xmlin_file.close()
        self.assertTrue(compare_xml(xmlout, xmlin))
        print("Passed!")

@@ -72,7 +75,9 @@ class Test(unittest.TestCase):
        file = "samples/extract_report/nice-input.xml"
        print("Testing {0}: ".format(file), end="")
        xmlout = parsedmarc.extract_report(file)
-        xmlin = open("samples/extract_report/nice-input.xml").read()
+        xmlin_file = open("samples/extract_report/nice-input.xml")
+        xmlin = xmlin_file.read()
+        xmlin_file.close()
        self.assertTrue(compare_xml(xmlout, xmlin))
        print("Passed!")

@@ -82,7 +87,9 @@ class Test(unittest.TestCase):
        file = "samples/extract_report/nice-input.xml.gz"
        print("Testing {0}: ".format(file), end="")
        xmlout = parsedmarc.extract_report_from_file_path(file)
-        xmlin = open("samples/extract_report/nice-input.xml").read()
+        xmlin_file = open("samples/extract_report/nice-input.xml")
+        xmlin = xmlin_file.read()
+        xmlin_file.close()
        self.assertTrue(compare_xml(xmlout, xmlin))
        print("Passed!")

@@ -92,12 +99,13 @@ class Test(unittest.TestCase):
        file = "samples/extract_report/nice-input.xml.zip"
        print("Testing {0}: ".format(file), end="")
        xmlout = parsedmarc.extract_report_from_file_path(file)
-        print(xmlout)
-        xmlin = minify_xml(open("samples/extract_report/nice-input.xml").read())
-        print(xmlin)
+        xmlin_file = open("samples/extract_report/nice-input.xml")
+        xmlin = minify_xml(xmlin_file.read())
+        xmlin_file.close()
        self.assertTrue(compare_xml(xmlout, xmlin))
-        xmlin = minify_xml(open("samples/extract_report/changed-input.xml").read())
-        print(xmlin)
+        xmlin_file = open("samples/extract_report/changed-input.xml")
+        xmlin = xmlin_file.read()
+        xmlin_file.close()
        self.assertFalse(compare_xml(xmlout, xmlin))
        print("Passed!")
Author	SHA1	Message	Date
Sean Whalen	110c6e507d	Update docs	2025-12-01 17:04:37 -05:00
Sean Whalen	c8cdd90a1e	Normalize timespans for aggregate reports in Elasticsearch and Opensearch	2025-12-01 16:34:40 -05:00
Sean Whalen	46a62cc10a	Update launch configuration and metadata key for timespan in aggregate report	2025-12-01 16:10:41 -05:00
Sean Whalen	67fe009145	Add sources my name table to the Kibana DMARC Summary dashboard This matches the table in the Splunk DMARC Aggregate reports dashboard	2025-11-30 19:43:14 -05:00
Sean Whalen	e405e8fa53	Update changelog to correct timespan threshold for DMARC report normalization	2025-11-30 16:17:07 -05:00
Sean Whalen	a72d08ceb7	Refactor configuration loading for normalize_timespan_threshold_hours	2025-11-30 16:16:32 -05:00
Sean Whalen	2785e3df34	More fixes for normalize_timespan_threshold_hours:	2025-11-30 13:56:50 -05:00
Sean Whalen	f4470a7dd2	Fix normalize_timespan_threshold_hours	2025-11-30 13:46:21 -05:00
Sean Whalen	18b9894a1f	Code formatting	2025-11-30 12:40:09 -05:00
Sean Whalen	d1791a97d3	Make timespan normalization hours configurable, with a 24.0 default	2025-11-30 12:23:38 -05:00
Sean Whalen	47ca6561c1	Fix changelog version	2025-11-30 10:46:48 -05:00
Sean Whalen	a0e18206ce	Bump version to 9.0.0	2025-11-29 23:01:04 -05:00
Sean Whalen	9e4ffdd54c	Add interval_begin, interval_end, and normalized_timespan to the Splunk report	2025-11-29 21:32:33 -05:00
Sean Whalen	434bd49eb3	Fix normalized_timespan in CSV output for aggregate reports	2025-11-29 21:23:39 -05:00
Sean Whalen	589038d2c9	Add normalized_timespan to CSV output for aggregate reports	2025-11-29 21:17:27 -05:00
Sean Whalen	c558224671	Rename normalized_timespan to timespan_requires_normalization and include interval_begin and interval_end in CSV output	2025-11-29 21:16:30 -05:00
Sean Whalen	044aa9e9a0	Include interval_begin in splunk output for accurate timestamping	2025-11-29 20:50:13 -05:00
Sean Whalen	6270468d30	Remove unneeded fields	2025-11-29 17:13:24 -05:00
Sean Whalen	832be7cfa3	Clean up imports	2025-11-29 16:56:12 -05:00
Sean Whalen	04dd11cf54	Fix formatting	2025-11-29 16:51:57 -05:00
Sean Whalen	0b41942916	Always include interval_begin and interval_end in records	2025-11-29 16:46:03 -05:00
Sean Whalen	f14a34202f	Add morse type hints	2025-11-29 16:33:40 -05:00
Sean Whalen	daa6653c29	Bump version to 8.20.0 and update changelog for new report volume normalization	2025-11-29 15:26:25 -05:00
Sean Whalen	45d1093a99	Normalize report volumes when a report timespan exceed 24 hours	2025-11-29 14:52:57 -05:00
Sean Whalen	c1a757ca29	Remove outdated launch config	2025-11-29 14:45:21 -05:00
Sean Whalen	69b9d25a99	Revert code formatting	2025-11-29 14:14:54 -05:00
Sean Whalen	94d65f979d	Code formatting	2025-11-29 14:04:20 -05:00
Sean Whalen	98342ecac6	8.19.1 (#627) - Ignore HTML content type in report email parsing (#626)	2025-11-29 11:37:31 -05:00
Sean Whalen	38a3d4eaae	Code formatting	2025-11-28 12:48:55 -05:00
Sean Whalen	a05c230152	8.19.0 (#622) 8.19.0 - Add multi-tenant support via an index-prefix domain mapping file - PSL overrides so that services like AWS are correctly identified - Additional improvements to report type detection - Fix webhook timeout parsing (PR #623) - Output to STDOUT when the new general config boolean `silent` is set to `False` (Close #614) - Additional services added to `base_reverse_dns_map.csv` --------- Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com> Co-authored-by: Félix <felix.debloisbeaucage@gmail.com>	2025-11-28 12:47:00 -05:00
Sean Whalen	17bdc3a134	More tests cleanup	2025-11-21 09:10:59 -05:00
Sean Whalen	858be00f22	Fix badge links and update image source branch	2025-11-21 09:03:04 -05:00
Sean Whalen	597ca64f9f	Clean up tests	2025-11-21 00:09:28 -05:00
Sean Whalen	c5dbe2c4dc	8.10.9 - Complete fix for #687 and more robust report type detection	2025-11-20 23:50:42 -05:00
Sean Whalen	082b3d355f	8.18.8 - Fix parsing emails with an uncompressed aggregate report attachment (Closes #607) - Add `--no-prettify-json` CLI option (PR #617)	2025-11-20 20:47:57 -05:00
Sean Whalen	2a7ce47bb1	Update code coverage badge link to main branch	2025-11-20 20:28:10 -05:00
daminoux	9882405d96	Update README.md fix url screenshot (#620) the url of screenshot is broken	2025-11-20 20:27:15 -05:00
Andrew	fce84763b9	add --no-prettify-json CLI option (#617) * updates process_reports to respect newly added prettify_json option * removes duplicate definition * removes redundant option * fixes typo	2025-11-02 15:54:59 -05:00
Rowan	8a299b8600	Updated default python docker base image to 3.13-slim (#618) * Updated default python docker base image to 3.13-slim * Added python 3.13 to tests	2025-10-29 22:34:06 -04:00
jandr	b4c2b21547	Sorted usage of TLS on SMTP (#613) Added a line for the `email_results` function to take into account the smtp_ssl setting.	2025-08-25 13:51:10 -04:00
Sean Whalen	865c249437	Update features list	2025-08-24 13:39:50 -04:00
Sean Whalen	013859f10e	Fix find_unknown_base_reverse_dns.py	2025-08-19 21:18:14 -04:00
Sean Whalen	6d4a31a120	Fix find_unknown_base_reverse_dns.py and sortlist.py	2025-08-19 20:59:42 -04:00
Sean Whalen	45d3dc3b2e	Fiz sortlists.py	2025-08-19 20:23:55 -04:00
Sean Whalen	4bbd97dbaa	Improve list verification	2025-08-19 20:02:55 -04:00
Sean Whalen	5df152d469	Refactor find_unknown_base_reverse_dns.py	2025-08-18 12:59:54 -04:00
Sean Whalen	d990bef342	Use \n here too	2025-08-17 21:08:28 -04:00
Sean Whalen	caf77ca6d4	Use \n when writing CSVs	2025-08-17 21:01:07 -04:00
Sean Whalen	4b3d32c5a6	Actual, actual Actual 6.18.7 release Revert back to using python csv instead of pandas to avoid conflicts with numpy in elasticsearch	2025-08-17 20:36:15 -04:00
Sean Whalen	5df5c10f80	Pin pandas an numpy versions	2025-08-17 19:59:53 -04:00
Sean Whalen	308d4657ab	Make sort_csv function more flexible	2025-08-17 19:43:19 -04:00
Sean Whalen	0f74e33094	Fix typo	2025-08-17 19:35:16 -04:00
Sean Whalen	9f339e11f5	Actual 6.18.7 release	2025-08-17 19:34:14 -04:00
Sean Whalen	391e84b717	Fix map sorting	2025-08-17 18:15:20 -04:00
Sean Whalen	8bf06ce5af	8.18.7 Removed improper spaces from `base_reverse_dns_map.csv` (Closes #612)	2025-08-17 18:13:49 -04:00
Sean Whalen	2b7ae50a27	Better wording	2025-08-17 17:01:22 -04:00
Sean Whalen	3feb478793	8.18.6 - Fix since option to correctly work with weeks (PR #604) - Add 183 entries to `base_reverse_dns_map.csv` - Add 57 entries to `known_unknown_base_reverse_dns.txt` - Check for invalid UTF-8 bytes in `base_reverse_dns_map.csv` at build - Remove unneeded items from the `parsedmarc.resources` module at build	2025-08-17 17:00:11 -04:00
Sean Whalen	01630bb61c	Update code formatting	2025-08-17 16:01:45 -04:00
Sean Whalen	39347cb244	Sdd find_bad_utf8.py	2025-08-17 15:55:47 -04:00
Sean Whalen	ed25526d59	Update maps	2025-08-17 15:17:24 -04:00
alagendijk-minddistrict	880d7110fe	Fix since option to correctly work with weeks (#604)	2025-08-14 18:39:04 -04:00
Martin Kjær Jørgensen	d62001f5a4	fix wrong configuration option for maildir (#606) Signed-off-by: Martin Kjær Jørgensen <me@lagy.org>	2025-08-14 18:36:58 -04:00
Sean Whalen	0720bffcb6	Remove extra spaces	2025-06-10 19:05:06 -04:00
Sean Whalen	fecd55a97d	Add SMTP TLS Reporting dashboard for Splunk Closes #600	2025-06-10 18:54:43 -04:00
Sean Whalen	a121306eed	Fix typo in the map	2025-06-10 10:53:55 -04:00
Sean Whalen	980c9c7904	Add Hostinger to the map	2025-06-10 10:50:06 -04:00
Sean Whalen	963f5d796f	Fix build script	2025-06-10 09:51:12 -04:00
Sean Whalen	6532f3571b	Update lists	2025-06-09 20:05:56 -04:00
Sean Whalen	ea878443a8	Update lists	2025-06-09 17:04:16 -04:00
Sean Whalen	9f6de41958	Update lists	2025-06-09 13:41:49 -04:00
Sean Whalen	119192701c	Update lists	2025-06-09 12:02:50 -04:00
Sean Whalen	1d650be48a	Fix typo	2025-06-08 21:41:07 -04:00
Sean Whalen	a85553fb18	Update lists	2025-06-08 21:40:10 -04:00
Sean Whalen	5975d8eb21	Fix sorting	2025-06-08 20:17:21 -04:00
Sean Whalen	87ae6175f2	Update lists	2025-06-08 19:51:13 -04:00
Sean Whalen	68b93ed580	Update map	2025-06-03 14:54:58 -04:00
Sean Whalen	55508b513b	Remove debugging code	2025-06-03 14:38:15 -04:00
Sean Whalen	71511c0cfc	8.18.5 - Fix CSV download	2025-06-03 11:44:42 -04:00
Sean Whalen	7c45812284	8.18.4 - Fix webhooks	2025-06-02 16:52:48 -04:00
Sean Whalen	607a091a5f	8.18.3 - Move `__version__` to `parsedmarc.constants` - Create a constant `USER_AGENT` - Use the HTTP `User-Agent` header value `parsedmarc/version` for all HTTP requests	2025-06-02 16:43:26 -04:00
Sean Whalen	c308bf938c	Update the README	2025-06-02 15:43:51 -04:00