Compare commits

...

112 Commits

Author SHA1 Message Date
Sean Whalen
2b7ae50a27 Better wording 2025-08-17 17:01:22 -04:00
Sean Whalen
3feb478793 8.18.6
- Fix since option to correctly work with weeks (PR #604)
- Add 183 entries to `base_reverse_dns_map.csv`
- Add 57 entries to `known_unknown_base_reverse_dns.txt`
- Check for invalid UTF-8 bytes in `base_reverse_dns_map.csv` at build
- Remove unneeded items from the `parsedmarc.resources` module at build
2025-08-17 17:00:11 -04:00
Sean Whalen
01630bb61c Update code formatting 2025-08-17 16:01:45 -04:00
Sean Whalen
39347cb244 Add find_bad_utf8.py 2025-08-17 15:55:47 -04:00
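The contents of `find_bad_utf8.py` are not included in this compare view, but the `build.sh` diff further down invokes it with the CSV path as an argument. A minimal sketch of what such a check might look like, assuming it reports the offsets of bytes that fail UTF-8 decoding:

```python
# Hypothetical sketch of a UTF-8 validity check like find_bad_utf8.py;
# the actual script's contents are not shown in this compare view.
import sys


def find_bad_utf8(path):
    """Yield (offset, byte) pairs where UTF-8 decoding fails."""
    with open(path, "rb") as f:
        data = f.read()
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onward is valid
        except UnicodeDecodeError as e:
            yield pos + e.start, data[pos + e.start]
            pos += e.start + 1  # skip the bad byte and keep scanning


if __name__ == "__main__":
    bad = list(find_bad_utf8(sys.argv[1]))
    for offset, byte in bad:
        print(f"invalid byte 0x{byte:02x} at offset {offset}")
    sys.exit(1 if bad else 0)
```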
Sean Whalen
ed25526d59 Update maps 2025-08-17 15:17:24 -04:00
alagendijk-minddistrict
880d7110fe Fix since option to correctly work with weeks (#604) 2025-08-14 18:39:04 -04:00
Martin Kjær Jørgensen
d62001f5a4 fix wrong configuration option for maildir (#606)
Signed-off-by: Martin Kjær Jørgensen <me@lagy.org>
2025-08-14 18:36:58 -04:00
Sean Whalen
0720bffcb6 Remove extra spaces 2025-06-10 19:05:06 -04:00
Sean Whalen
fecd55a97d Add SMTP TLS Reporting dashboard for Splunk
Closes #600
2025-06-10 18:54:43 -04:00
Sean Whalen
a121306eed Fix typo in the map 2025-06-10 10:53:55 -04:00
Sean Whalen
980c9c7904 Add Hostinger to the map 2025-06-10 10:50:06 -04:00
Sean Whalen
963f5d796f Fix build script 2025-06-10 09:51:12 -04:00
Sean Whalen
6532f3571b Update lists 2025-06-09 20:05:56 -04:00
Sean Whalen
ea878443a8 Update lists 2025-06-09 17:04:16 -04:00
Sean Whalen
9f6de41958 Update lists 2025-06-09 13:41:49 -04:00
Sean Whalen
119192701c Update lists 2025-06-09 12:02:50 -04:00
Sean Whalen
1d650be48a Fix typo 2025-06-08 21:41:07 -04:00
Sean Whalen
a85553fb18 Update lists 2025-06-08 21:40:10 -04:00
Sean Whalen
5975d8eb21 Fix sorting 2025-06-08 20:17:21 -04:00
Sean Whalen
87ae6175f2 Update lists 2025-06-08 19:51:13 -04:00
Sean Whalen
68b93ed580 Update map 2025-06-03 14:54:58 -04:00
Sean Whalen
55508b513b Remove debugging code 2025-06-03 14:38:15 -04:00
Sean Whalen
71511c0cfc 8.18.5
- Fix CSV download
2025-06-03 11:44:42 -04:00
Sean Whalen
7c45812284 8.18.4
- Fix webhooks
2025-06-02 16:52:48 -04:00
Sean Whalen
607a091a5f 8.18.3
- Move `__version__` to `parsedmarc.constants`
- Create a constant `USER_AGENT`
- Use the HTTP `User-Agent` header value `parsedmarc/version` for all HTTP requests
2025-06-02 16:43:26 -04:00
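A sketch of how the shared `USER_AGENT` constant from `parsedmarc.constants` (shown in full near the end of this page) might be attached to an outgoing request; the URL is illustrative, and the changelog does not show the exact call sites:

```python
# Illustrative only: applying a shared User-Agent to an HTTP request.
import requests

from parsedmarc.constants import USER_AGENT  # e.g. "parsedmarc/8.18.6"

response = requests.get(
    "https://example.com/reverse_dns_map.csv",  # hypothetical URL
    headers={"User-Agent": USER_AGENT},
    timeout=10,
)
```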
Sean Whalen
c308bf938c Update the README 2025-06-02 15:43:51 -04:00
Sean Whalen
918501ccb5 Better formatting 2025-06-02 15:20:40 -04:00
Sean Whalen
036c372ea3 8.18.2
- Merged PR #603
  - Fixes issue #595 - CI test fails for Elasticsearch
    - Moved Elasticsearch to a separate Docker service container for CI testing
    - Dropped Python 3.8 from CI testing
  - Fixes lookup and saving of DMARC forensic reports in Elasticsearch and OpenSearch
- Updated fallback `base_reverse_dns_map.csv`, which now includes over 1,400 lines
- Updated included `dbip-country-lite.mmdb` to the June 2025 release
- Automatically fall back to the internal `base_reverse_dns_map.csv` if the received file is not valid (Fixes #602)
  - Print the received data to the debug log
2025-06-02 15:19:19 -04:00
Sean Whalen
a969d83137 Update included IP database 2025-06-02 11:30:26 -04:00
Szasza Palmer
e299f7d161 fixing ES/OS forensic report lookup and storage, extracting ES to separate CI service (#603)
* fixing ES/OS forensic report lookup and storage, extracting ES to separate CI service

* bumping CI ES version to current latest

* reshuffling CI job attributes

* removing EOL Python 3.8 from the CI pipeline
2025-06-02 11:10:10 -04:00
Sean Whalen
4c04418dae Fix domain lists check 2025-04-24 16:03:18 -04:00
Sean Whalen
2ca9373ed0 Match dashboard fields 2025-04-24 15:44:22 -04:00
Sean Whalen
961ef6d804 Revert adding the BOM
It broke reading the file with python
2025-04-24 14:04:36 -04:00
Sean Whalen
573ba1e3e9 Add UTF-8 BOM to the CSV so Excel will open the file as UTF-8 2025-04-24 13:58:06 -04:00
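These two commits add and then revert a UTF-8 BOM because it "broke reading the file with python". A sketch of the assumed failure mode: read as plain `utf-8`, the BOM becomes part of the first header name (the sample row is illustrative; only the header fields come from this page):

```python
# Sketch of why a UTF-8 BOM can break CSV parsing in Python (assumed
# failure mode; the actual parsedmarc read path is not shown here).
import csv
import io

data = "\ufeffbase_reverse_dns,name,type\nexample.net,Example,Email Provider\n"

# With plain "utf-8" the BOM survives as part of the first header name:
reader = csv.DictReader(io.StringIO(data))
print(reader.fieldnames[0])  # '\ufeffbase_reverse_dns' -> lookups fail

# Reading with encoding="utf-8-sig" would strip the BOM transparently.
```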
Sean Whalen
1d8af3ccff Add find_unknown_base_reverse_dns.py 2025-04-24 13:48:51 -04:00
Sean Whalen
8426daa26b Remove duplicate domains 2025-04-24 13:47:07 -04:00
Sean Whalen
d1531b86f2 Sort known_unknown_base_reverse_dns.txt 2025-04-24 11:42:09 -04:00
Sean Whalen
8bb046798c Simplify sender types 2025-04-23 16:20:05 -04:00
Sean Whalen
d64e12548a Fix errors in the reverse DNS map 2025-04-23 15:57:31 -04:00
Sean Whalen
380479cbf1 Update reverse DNS map 2025-04-23 15:43:38 -04:00
Sean Whalen
ace21c8084 Update base_reverse_dns_map.csv and add known_unknown_base_reverse_dns.txt 2025-04-23 15:36:14 -04:00
Sean Whalen
1a1aef21ad Replace deprecated path call with file call 2025-04-23 15:33:27 -04:00
Sean Whalen
532dbbdb7e Fix file formatting 2025-04-23 15:32:04 -04:00
miles
45738ae688 Fix SyntaxError in elastic forensic report (#598) 2025-04-23 14:40:03 -04:00
Sean Whalen
9d77bd64bc Fix some CSV entries 2025-04-01 09:23:44 -04:00
Sean Whalen
140290221d Update elastic.py 2025-03-22 15:09:44 -04:00
Sean Whalen
187d61b770 Update elastic.py 2025-03-22 15:03:42 -04:00
Sean Whalen
0443b7365e Update elastic.py 2025-03-22 14:47:50 -04:00
Sean Whalen
d7b887a835 Debug elasticsearch 2025-03-22 14:42:45 -04:00
Tom Henderson
a805733221 Raise for failed status (#594) 2025-03-22 11:22:49 -04:00
Sean Whalen
9552c3ac92 Update README.md 2025-03-21 09:41:14 -04:00
Sean Whalen
5273948be0 Make build.sh usable without the gh-pages branch 2025-02-18 09:17:12 -05:00
Sean Whalen
b51756b8bd 8.18.1
- Add missing `https://` to the default Microsoft Graph URL
2025-02-17 12:41:57 -05:00
Sean Whalen
7fa7c24cb8 Merge branch 'master' of https://github.com/domainaware/parsedmarc 2025-02-17 12:31:47 -05:00
Sean Whalen
972237ae7e Fix default Microsoft Graph URL 2025-02-17 12:31:39 -05:00
Sean Whalen
6e5333a342 Style fixes 2025-02-03 16:11:21 -05:00
Sean Whalen
47b074c80b Merge branch 'master' of https://github.com/domainaware/parsedmarc 2025-02-03 16:11:01 -05:00
Sean Whalen
a1cfeb3081 8.18.0
- Add support for Microsoft national clouds via Graph API base URL (PR #590)
- Avoid stopping processing when an invalid DMARC report is encountered (PR #587)
- Increase `http.client._MAXHEADERS` from `100` to `200` to avoid errors connecting to Elasticsearch/OpenSearch (PR #589)
2025-02-03 16:10:51 -05:00
Paul Hecker
c7c451b1b1 Set http.client._MAXHEADERS to 200 (#589) 2025-02-03 15:26:15 -05:00
Kevin Goad
669deb9755 Add support for Microsoft national clouds via Graph API base URL (#590)
* adding support for Microsoft National Clouds

* Update usage.md
2025-02-03 15:25:15 -05:00
bendem
446c018920 do not stop processing when we encounter an invalid dmarc report (#587) 2025-02-03 15:20:52 -05:00
Sean Whalen
38c6f86973 Update CHANGELOG.md 2025-01-10 09:09:24 -05:00
Sean Whalen
62ccc11925 Update changelog 2025-01-09 22:25:43 -05:00
Sean Whalen
c32ca3cae3 Fix sortmaps.py 2025-01-09 22:24:03 -05:00
Sean Whalen
010f1f84a7 8.17.0
- Ignore duplicate aggregate DMARC reports with the same `org_name` and `report_id` seen within the same hour ([#539](https://github.com/domainaware/parsedmarc/issues/539))
- Fix saving SMTP TLS reports to OpenSearch (PR #585 closed issue #576)
- Add 303 entries to `base_reverse_dns_map.csv`
2025-01-09 22:22:55 -05:00
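A condensed sketch of the duplicate-report check described here, based on the `SEEN_AGGREGATE_REPORT_IDS` changes visible in the diffs further down this page:

```python
# Condensed sketch of the 8.17.0 duplicate-report check; values match
# the SEEN_AGGREGATE_REPORT_IDS changes shown in the diff below.
from expiringdict import ExpiringDict

SEEN_AGGREGATE_REPORT_IDS = ExpiringDict(max_len=100000000, max_age_seconds=3600)


def is_duplicate(report):
    org_name = report["report_metadata"]["org_name"]
    report_id = report["report_metadata"]["report_id"]
    report_key = f"{org_name}_{report_id}"
    if report_key in SEEN_AGGREGATE_REPORT_IDS:
        return True  # same org + report ID seen within the last hour
    SEEN_AGGREGATE_REPORT_IDS[report_key] = True
    return False
```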
Anael Mobilia
7da57c6382 Fix colors on export.ndjson (#586)
Old elements used the compatibility color palette => updated to the status color palette
2025-01-09 22:09:44 -05:00
Sean Whalen
d08e29a306 Move sortmaps.py 2025-01-09 22:08:42 -05:00
Sean Whalen
e1e53ad4cb Use Python instead of Excel for sorting map CSVs 2025-01-09 22:03:49 -05:00
Sean Whalen
4670e9687d Update base_reverse_dns_map.csv 2025-01-09 21:18:00 -05:00
Sean Whalen
7f8a2c08cd Use a smaller key value 2025-01-09 19:34:56 -05:00
Sean Whalen
e9c05dd0bf Update base_reverse_dns_map.csv 2025-01-08 20:51:44 -05:00
Sean Whalen
9348a474dd Actually fix the CLI 2025-01-08 20:49:39 -05:00
Sean Whalen
e0decaba8c Fix CLI 2025-01-07 14:33:35 -05:00
Sean Whalen
26a651cded Use a combination of report org and report ID when checking for duplicate aggregate reports 2025-01-07 14:25:57 -05:00
Sean Whalen
bcfcd93fc6 More duplicate aggregate report checks
#535
2025-01-07 13:56:26 -05:00
Sean Whalen
54d5ed3543 Remove unused import 2025-01-07 12:57:41 -05:00
Sean Whalen
1efbc87e0e Consolidate SEEN_AGGREGATE_REPORT_IDS 2025-01-07 12:56:30 -05:00
Sean Whalen
e78e7f64af Add parsedmarc.ini to .gitignore 2025-01-07 11:59:03 -05:00
Szasza Palmer
ad9de65b99 fixing SMTP TLS report saving to OpenSearch (#585) 2025-01-07 11:57:04 -05:00
Sean Whalen
b9df12700b Check for duplicate aggregate report IDs when processing a mailbox
Fix #535
2025-01-07 11:56:51 -05:00
Sean Whalen
20843b920f Sort reverse DNS map 2025-01-06 21:26:48 -05:00
Sean Whalen
e5ae89fedf Merge branch 'master' of https://github.com/domainaware/parsedmarc 2025-01-06 21:21:57 -05:00
Sean Whalen
f148cff11c Update reverse DNS map 2025-01-06 21:19:06 -05:00
Sean Whalen
4583769e04 Update reverse DNS map 2025-01-03 09:23:06 -05:00
Sean Whalen
0ecb80b27c Update reverse DNS map 2024-12-30 11:40:29 -05:00
Sean Whalen
b8e62e6d3b Remove duplicate entry 2024-12-28 14:14:00 -05:00
Sean Whalen
c67953a2c5 Update reverse DNS map 2024-12-28 14:10:39 -05:00
Sean Whalen
27dff4298c Update reverse DNS mapping 2024-12-28 11:53:50 -05:00
Sean Whalen
f2133aacd4 Fix build dependencies 2024-12-25 18:52:42 -05:00
Sean Whalen
31917e58a9 Update build backend 2024-12-25 18:28:30 -05:00
Sean Whalen
bffb98d217 Get report ID correctly 2024-12-25 16:37:40 -05:00
Sean Whalen
1f93b3a7ea Set max_len to a value 2024-12-25 16:26:38 -05:00
Sean Whalen
88debb9729 Fix SEEN_AGGREGATE_REPORT_IDS 2024-12-25 16:21:07 -05:00
Sean Whalen
a8a5564780 Merge branch 'master' of https://github.com/domainaware/parsedmarc 2024-12-25 16:14:40 -05:00
Sean Whalen
1e26f95b7b 8.16.1
- Ignore aggregate DMARC reports seen within a period of one hour (#535)
2024-12-25 16:14:33 -05:00
ericericsw
82b48e4d01 Add files via upload (#578)
Update the dashboard to the new version.

Panel model change list:
grafana-piechart-panel -> pie chart
Graph (old) -> time series
worldmap panel -> geomap

Some table panels have changed; for example, Overview adds an ARC column.

Known issue that cannot be solved at the moment: multiple DKIM entries cause table display errors.
2024-12-25 16:09:43 -05:00
Sean Whalen
617b7c5b4a Merge PR #527 2024-11-09 18:18:31 -05:00
Sean Whalen
989bfd8f07 Code cleanup 2024-11-02 11:40:37 -04:00
Sean Whalen
908cc2918c Merge branch 'ramspoluri-master' 2024-11-02 11:39:34 -04:00
Sean Whalen
bd5774d71d Merge branch 'master' of https://github.com/ramspoluri/parsedmarc into ramspoluri-master 2024-11-02 11:38:41 -04:00
Sean Whalen
8e9112bad3 Merge branch 'master' of https://github.com/ramspoluri/parsedmarc 2024-11-02 10:48:15 -04:00
Sean Whalen
40e041a8af Merge branch 'master' of https://github.com/ramspoluri/parsedmarc 2024-11-02 10:48:10 -04:00
Sean Whalen
7ba433cddb Fix code style 2024-11-02 10:39:05 -04:00
Sean Whalen
6d467c93f9 Update __init__.py
Add reference to https://www.rfc-editor.org/rfc/rfc3501#page-52
2024-11-02 10:35:22 -04:00
Sean Whalen
be38e83761 Code cleanup 2024-11-02 10:28:11 -04:00
Sean Whalen
ef4e1ac8dc Code cleanup 2024-11-02 10:26:30 -04:00
Sean Whalen
39e4c22ecc Fix syntax 2024-11-02 10:23:23 -04:00
Sean Whalen
88ff3a2c23 Update syntax to support Python < 3.10 2024-11-02 10:04:01 -04:00
Sean Whalen
d8aee569f7 Update __init__.py 2024-11-02 09:50:55 -04:00
Sean Whalen
debc28cc6e 8.15.4
- Fix crash if aggregate report timespan is > 24 hours
2024-10-24 19:53:44 -04:00
Sean Whalen
52ccf0536c 8.15.3
- Ignore aggregate reports with a timespan of > 24 hours (Fixes #282)
2024-10-24 19:43:28 -04:00
ramspoluri
f618f69c6c Added 'since' option to search for messages since a certain time
- Added a `since` option under the `mailbox` section to search for messages since a certain time instead of going through the complete mailbox during testing scenarios. Acceptable values: `5m|3h|2d|1w` (units: {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}). Defaults to `1d` if an incorrect value is provided.
    - Do not mark messages as read if the test option is selected (works only for MSGraphConnection)
2024-05-24 20:43:36 +05:30
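A condensed sketch of the unit conversion this option uses, mirroring the parsing logic in the parsedmarc/__init__.py diff below:

```python
# Condensed sketch of how the `since` value is converted to minutes,
# mirroring the parsing logic shown in the diff below.
import re


def since_to_minutes(since, default=1440):
    """Convert '5m'/'3h'/'2d'/'1w' to minutes; fall back to one day."""
    if not re.match(r"\d+[mhdw]$", since):
        return default  # incorrect format -> fetch the last 24 hours
    multipliers = {"m": 1, "h": 60, "d": 60 * 24, "w": 60 * 24 * 7}
    _, number, unit = re.split(r"(\d+)", since)
    return int(number) * multipliers[unit]
```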
32 changed files with 8579 additions and 281 deletions


@@ -11,13 +11,26 @@ on:
jobs:
build:
runs-on: ubuntu-latest
services:
elasticsearch:
image: elasticsearch:8.18.2
env:
discovery.type: single-node
cluster.name: parsedmarc-cluster
discovery.seed_hosts: elasticsearch
bootstrap.memory_lock: true
xpack.security.enabled: false
xpack.license.self_generated.type: basic
ports:
- 9200:9200
- 9300:9300
strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
python-version: ["3.9", "3.10", "3.11", "3.12"]
steps:
- uses: actions/checkout@v4
@@ -29,13 +42,6 @@ jobs:
run: |
sudo apt-get update
sudo apt-get install -y libemail-outlook-message-perl
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
sudo apt-get install apt-transport-https
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt-get update && sudo apt-get install elasticsearch
sudo sed -i 's/xpack.security.enabled: true/xpack.security.enabled: false/' /etc/elasticsearch/elasticsearch.yml
sudo systemctl restart elasticsearch
sudo systemctl --no-pager status elasticsearch
- name: Install Python dependencies
run: |
python -m pip install --upgrade pip

.gitignore

@@ -136,3 +136,10 @@ samples/private
*.html
*.sqlite-journal
parsedmarc.ini
scratch.py
parsedmarc/resources/maps/base_reverse_dns.csv
parsedmarc/resources/maps/unknown_base_reverse_dns.csv
*.bak

.vscode/settings.json

@@ -13,6 +13,7 @@
"automodule",
"backported",
"bellsouth",
"boto",
"brakhane",
"Brightmail",
"CEST",
@@ -36,6 +37,7 @@
"expiringdict",
"fieldlist",
"genindex",
"geoip",
"geoipupdate",
"Geolite",
"geolocation",
@@ -44,7 +46,10 @@
"hostnames",
"htpasswd",
"httpasswd",
"httplib",
"IMAP",
"imapclient",
"infile",
"Interaktive",
"IPDB",
"journalctl",
@@ -70,6 +75,7 @@
"modindex",
"msgconvert",
"msgraph",
"MSSP",
"Munge",
"ndjson",
"newkey",
@@ -79,14 +85,18 @@
"nosecureimap",
"nosniff",
"nwettbewerb",
"opensearch",
"parsedmarc",
"passsword",
"Postorius",
"premade",
"procs",
"publicsuffix",
"publicsuffixlist",
"publixsuffix",
"pygelf",
"pypy",
"pytest",
"quickstart",
"Reindex",
"replyto",
@@ -94,13 +104,16 @@
"Rollup",
"Rpdm",
"SAMEORIGIN",
"sdist",
"Servernameone",
"setuptools",
"smartquotes",
"SMTPTLS",
"sortmaps",
"sourcetype",
"STARTTLS",
"tasklist",
"timespan",
"tlsa",
"tlsrpt",
"toctree",

CHANGELOG.md

@@ -1,6 +1,84 @@
Changelog
=========
8.18.6
------
- Fix since option to correctly work with weeks (PR #604)
- Add 183 entries to `base_reverse_dns_map.csv`
- Add 57 entries to `known_unknown_base_reverse_dns.txt`
- Check for invalid UTF-8 bytes in `base_reverse_dns_map.csv` at build
- Exclude unneeded items from the `parsedmarc.resources` module at build
8.18.5
------
- Fix CSV download
8.18.4
------
- Fix webhooks
8.18.3
------
- Move `__version__` to `parsedmarc.constants`
- Create a constant `USER_AGENT`
- Use the HTTP `User-Agent` header value `parsedmarc/version` for all HTTP requests
8.18.2
------
- Merged PR #603
- Fixes issue #595 - CI test fails for Elasticsearch
- Moved Elasticsearch to a separate Docker service container for CI testing
- Dropped Python 3.8 from CI testing
- Fixes lookup and saving of DMARC forensic reports in Elasticsearch and OpenSearch
- Updated fallback `base_reverse_dns_map.csv`, which now includes over 1,400 lines
- Updated included `dbip-country-lite.mmdb` to the June 2025 release
- Automatically fall back to the internal `base_reverse_dns_map.csv` if the received file is not valid (Fixes #602)
- Print the received data to the debug log
8.18.1
------
- Add missing `https://` to the default Microsoft Graph URL
8.18.0
------
- Add support for Microsoft national clouds via Graph API base URL (PR #590)
- Avoid stopping processing when an invalid DMARC report is encountered (PR #587)
- Increase `http.client._MAXHEADERS` from `100` to `200` to avoid errors connecting to Elasticsearch/OpenSearch (PR #589)
8.17.0
------
- Ignore duplicate aggregate DMARC reports with the same `org_name` and `report_id` seen within the same hour (Fixes #535)
- Fix saving SMTP TLS reports to OpenSearch (PR #585 closed issue #576)
- Add 303 entries to `base_reverse_dns_map.csv`
8.16.1
------
- Failed attempt to ignore aggregate DMARC reports seen within a period of one hour (#535)
8.16.0
------
- Add a `since` option to only search for emails since a certain time (PR #527)
8.15.4
------
- Fix crash if aggregate report timespan is > 24 hours
8.15.3
------
- Ignore aggregate reports with a timespan of > 24 hours (Fixes #282)
8.15.2
------
@@ -624,7 +702,7 @@ in the ``elasticsearch`` configuration file section (closes issue #78)
-----
- Add filename and line number to logging output
- Improved IMAP error handling
- Improved IMAP error handling
- Add CLI options
```text

README.md

@@ -42,6 +42,6 @@ Thanks to all
- Consistent data structures
- Simple JSON and/or CSV output
- Optionally email the results
- Optionally send the results to Elasticsearch and/or Splunk, for use
- Optionally send the results to Elasticsearch, Opensearch, and/or Splunk, for use
with premade dashboards
- Optionally send reports to Apache Kafka

build.sh

@@ -9,12 +9,20 @@ fi
. venv/bin/activate
pip install .[build]
ruff format .
ruff check .
cd docs
make clean
make html
touch build/html/.nojekyll
cp -rf build/html/* ../../parsedmarc-docs/
if [ -d "./../parsedmarc-docs" ]; then
cp -rf build/html/* ../../parsedmarc-docs/
fi
cd ..
cd parsedmarc/resources/maps
python3 sortmaps.py
echo "Checking for invalid UTF-8 bytes in base_reverse_dns_map.csv"
python3 find_bad_utf8.py base_reverse_dns_map.csv
cd ../../..
python3 tests.py
rm -rf dist/ build/
hatch build
hatch build


@@ -28,3 +28,30 @@ services:
interval: 10s
timeout: 10s
retries: 24
opensearch:
image: opensearchproject/opensearch:2.18.0
environment:
- network.host=127.0.0.1
- http.host=0.0.0.0
- node.name=opensearch
- discovery.type=single-node
- cluster.name=parsedmarc-cluster
- discovery.seed_hosts=opensearch
- bootstrap.memory_lock=true
- OPENSEARCH_INITIAL_ADMIN_PASSWORD=${OPENSEARCH_INITIAL_ADMIN_PASSWORD}
ports:
- 127.0.0.1:9201:9200
ulimits:
memlock:
soft: -1
hard: -1
healthcheck:
test:
[
"CMD-SHELL",
"curl -s -XGET http://localhost:9201/_cluster/health?pretty | grep status | grep -q '\\(green\\|yellow\\)'"
]
interval: 10s
timeout: 10s
retries: 24

usage.md

@@ -166,6 +166,9 @@ The full set of configuration options are:
- `check_timeout` - int: Number of seconds to wait for a IMAP
IDLE response or the number of seconds until the next
mail check (Default: `30`)
- `since` - str: Search for messages since certain time. (Examples: `5m|3h|2d|1w`)
Acceptable units - {"m":"minutes", "h":"hours", "d":"days", "w":"weeks"}).
Defaults to `1d` if incorrect value is provided.
- `imap`
- `host` - str: The IMAP server hostname or IP address
- `port` - int: The IMAP server port (Default: `993`)
@@ -205,6 +208,8 @@ The full set of configuration options are:
- `mailbox` - str: The mailbox name. This defaults to the
current user if using the UsernamePassword auth method, but
could be a shared mailbox if the user has access to the mailbox
- `graph_url` - str: Microsoft Graph URL. Allows for use of National Clouds (ex Azure Gov)
(Default: https://graph.microsoft.com)
- `token_file` - str: Path to save the token file
(Default: `.token`)
- `allow_unencrypted_storage` - bool: Allows the Azure Identity
@@ -364,7 +369,7 @@ The full set of configuration options are:
- `mode` - str: The GELF transport type to use. Valid modes: `tcp`, `udp`, `tls`
- `maildir`
- `reports_folder` - str: Full path for mailbox maidir location (Default: `INBOX`)
- `maildir_path` - str: Full path for mailbox maidir location (Default: `INBOX`)
- `maildir_create` - bool: Create maildir if not present (Default: False)
- `webhook` - Post the individual reports to a webhook url with the report as the JSON body

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

parsedmarc/__init__.py

@@ -17,7 +17,7 @@ import zlib
from base64 import b64decode
from collections import OrderedDict
from csv import DictWriter
from datetime import datetime
from datetime import datetime, timedelta
from io import BytesIO, StringIO
from typing import Callable
@@ -28,13 +28,19 @@ from lxml import etree
from mailsuite.smtp import send_email
from parsedmarc.log import logger
from parsedmarc.mail import MailboxConnection
from parsedmarc.mail import (
MailboxConnection,
IMAPConnection,
MSGraphConnection,
GmailConnection,
)
from parsedmarc.constants import __version__
from parsedmarc.utils import get_base_domain, get_ip_address_info
from parsedmarc.utils import is_outlook_msg, convert_outlook_msg
from parsedmarc.utils import parse_email
from parsedmarc.utils import timestamp_to_human, human_timestamp_to_datetime
__version__ = "8.15.2"
logger.debug("parsedmarc v{0}".format(__version__))
@@ -49,6 +55,7 @@ MAGIC_XML = b"\x3c\x3f\x78\x6d\x6c\x20"
MAGIC_JSON = b"\x7b"
IP_ADDRESS_CACHE = ExpiringDict(max_len=10000, max_age_seconds=14400)
SEEN_AGGREGATE_REPORT_IDS = ExpiringDict(max_len=100000000, max_age_seconds=3600)
REVERSE_DNS_MAP = dict()
@@ -266,7 +273,7 @@ def _parse_smtp_tls_failure_details(failure_details):
return new_failure_details
except KeyError as e:
raise InvalidSMTPTLSReport(f"Missing required failure details field:" f" {e}")
raise InvalidSMTPTLSReport(f"Missing required failure details field: {e}")
except Exception as e:
raise InvalidSMTPTLSReport(str(e))
@@ -278,7 +285,7 @@ def _parse_smtp_tls_report_policy(policy):
policy_type = policy["policy"]["policy-type"]
failure_details = []
if policy_type not in policy_types:
raise InvalidSMTPTLSReport(f"Invalid policy type " f"{policy_type}")
raise InvalidSMTPTLSReport(f"Invalid policy type {policy_type}")
new_policy = OrderedDict(policy_domain=policy_domain, policy_type=policy_type)
if "policy-string" in policy["policy"]:
if isinstance(policy["policy"]["policy-string"], list):
@@ -326,9 +333,7 @@ def parse_smtp_tls_report_json(report):
raise Exception(f"Missing required field: {required_field}]")
if not isinstance(report["policies"], list):
policies_type = type(report["policies"])
raise InvalidSMTPTLSReport(
f"policies must be a list, " f"not {policies_type}"
)
raise InvalidSMTPTLSReport(f"policies must be a list, not {policies_type}")
for policy in report["policies"]:
policies.append(_parse_smtp_tls_report_policy(policy))
@@ -519,7 +524,7 @@ def parse_aggregate_report_xml(
date_range = report["report_metadata"]["date_range"]
if int(date_range["end"]) - int(date_range["begin"]) > 2 * 86400:
_error = "Time span > 24 hours - RFC 7489 section 7.2"
errors.append(_error)
raise InvalidAggregateReport(_error)
date_range["begin"] = timestamp_to_human(date_range["begin"])
date_range["end"] = timestamp_to_human(date_range["end"])
new_report_metadata["begin_date"] = date_range["begin"]
@@ -1240,11 +1245,11 @@ def parse_report_email(
field_name = match[0].lower().replace(" ", "-")
fields[field_name] = match[1].strip()
feedback_report = "Arrival-Date: {}\n" "Source-IP: {}" "".format(
feedback_report = "Arrival-Date: {}\nSource-IP: {}".format(
fields["received-date"], fields["sender-ip-address"]
)
except Exception as e:
error = "Unable to parse message with " 'subject "{0}": {1}'.format(
error = 'Unable to parse message with subject "{0}": {1}'.format(
subject, e
)
raise InvalidDMARCReport(error)
@@ -1288,10 +1293,10 @@ def parse_report_email(
"is not a valid "
"aggregate DMARC report: {1}".format(subject, e)
)
raise ParserError(error)
raise InvalidDMARCReport(error)
except Exception as e:
error = "Unable to parse message with " 'subject "{0}": {1}'.format(
error = 'Unable to parse message with subject "{0}": {1}'.format(
subject, e
)
raise ParserError(error)
@@ -1325,7 +1330,7 @@ def parse_report_email(
return result
if result is None:
error = 'Message with subject "{0}" is ' "not a valid report".format(subject)
error = 'Message with subject "{0}" is not a valid report'.format(subject)
raise InvalidDMARCReport(error)
@@ -1465,7 +1470,17 @@ def get_dmarc_reports_from_mbox(
strip_attachment_payloads=sa,
)
if parsed_email["report_type"] == "aggregate":
aggregate_reports.append(parsed_email["report"])
report_org = parsed_email["report"]["report_metadata"]["org_name"]
report_id = parsed_email["report"]["report_metadata"]["report_id"]
report_key = f"{report_org}_{report_id}"
if report_key not in SEEN_AGGREGATE_REPORT_IDS:
SEEN_AGGREGATE_REPORT_IDS[report_key] = True
aggregate_reports.append(parsed_email["report"])
else:
logger.debug(
"Skipping duplicate aggregate report "
f"from {report_org} with ID: {report_id}"
)
elif parsed_email["report_type"] == "forensic":
forensic_reports.append(parsed_email["report"])
elif parsed_email["report_type"] == "smtp_tls":
@@ -1499,6 +1514,7 @@ def get_dmarc_reports_from_mailbox(
strip_attachment_payloads=False,
results=None,
batch_size=10,
since=None,
create_folders=True,
):
"""
@@ -1522,6 +1538,8 @@ def get_dmarc_reports_from_mailbox(
results (dict): Results from the previous run
batch_size (int): Number of messages to read and process before saving
(use 0 for no limit)
since: Search for messages since certain time
(units - {"m":"minutes", "h":"hours", "d":"days", "w":"weeks"})
create_folders (bool): Whether to create the destination folders
(not used in watch)
@@ -1534,6 +1552,9 @@ def get_dmarc_reports_from_mailbox(
if connection is None:
raise ValueError("Must supply a connection")
# current_time useful to fetch_messages later in the program
current_time = None
aggregate_reports = []
forensic_reports = []
smtp_tls_reports = []
@@ -1557,11 +1578,50 @@ def get_dmarc_reports_from_mailbox(
connection.create_folder(smtp_tls_reports_folder)
connection.create_folder(invalid_reports_folder)
messages = connection.fetch_messages(reports_folder, batch_size=batch_size)
if since:
_since = 1440 # default one day
if re.match(r"\d+[mhdw]$", since):
s = re.split(r"(\d+)", since)
if s[2] == "m":
_since = int(s[1])
elif s[2] == "h":
_since = int(s[1]) * 60
elif s[2] == "d":
_since = int(s[1]) * 60 * 24
elif s[2] == "w":
_since = int(s[1]) * 60 * 24 * 7
else:
logger.warning(
"Incorrect format for 'since' option. \
Provided value:{0}, Expected values:(5m|3h|2d|1w). \
Ignoring option, fetching messages for last 24hrs"
"SMTP does not support a time or timezone in since."
"See https://www.rfc-editor.org/rfc/rfc3501#page-52".format(since)
)
if isinstance(connection, IMAPConnection):
logger.debug(
"Only days and weeks values in 'since' option are \
considered for IMAP conections. Examples: 2d or 1w"
)
since = (datetime.utcnow() - timedelta(minutes=_since)).date()
current_time = datetime.utcnow().date()
elif isinstance(connection, MSGraphConnection):
since = (datetime.utcnow() - timedelta(minutes=_since)).isoformat() + "Z"
current_time = datetime.utcnow().isoformat() + "Z"
elif isinstance(connection, GmailConnection):
since = (datetime.utcnow() - timedelta(minutes=_since)).strftime("%s")
current_time = datetime.utcnow().strftime("%s")
else:
pass
messages = connection.fetch_messages(
reports_folder, batch_size=batch_size, since=since
)
total_messages = len(messages)
logger.debug("Found {0} messages in {1}".format(len(messages), reports_folder))
if batch_size:
if batch_size and not since:
message_limit = min(total_messages, batch_size)
else:
message_limit = total_messages
@@ -1575,7 +1635,13 @@ def get_dmarc_reports_from_mailbox(
i + 1, message_limit, msg_uid
)
)
msg_content = connection.fetch_message(msg_uid)
if isinstance(mailbox, MSGraphConnection):
if test:
msg_content = connection.fetch_message(msg_uid, mark_read=False)
else:
msg_content = connection.fetch_message(msg_uid, mark_read=True)
else:
msg_content = connection.fetch_message(msg_uid)
try:
sa = strip_attachment_payloads
parsed_email = parse_report_email(
@@ -1591,7 +1657,16 @@ def get_dmarc_reports_from_mailbox(
keep_alive=connection.keepalive,
)
if parsed_email["report_type"] == "aggregate":
aggregate_reports.append(parsed_email["report"])
report_org = parsed_email["report"]["report_metadata"]["org_name"]
report_id = parsed_email["report"]["report_metadata"]["report_id"]
report_key = f"{report_org}_{report_id}"
if report_key not in SEEN_AGGREGATE_REPORT_IDS:
SEEN_AGGREGATE_REPORT_IDS[report_key] = True
aggregate_reports.append(parsed_email["report"])
else:
logger.debug(
f"Skipping duplicate aggregate report with ID: {report_id}"
)
aggregate_report_msg_uids.append(msg_uid)
elif parsed_email["report_type"] == "forensic":
forensic_reports.append(parsed_email["report"])
@@ -1632,7 +1707,7 @@ def get_dmarc_reports_from_mailbox(
except Exception as e:
message = "Error deleting message UID"
e = "{0} {1}: " "{2}".format(message, msg_uid, e)
e = "{0} {1}: {2}".format(message, msg_uid, e)
logger.error("Mailbox error: {0}".format(e))
else:
if len(aggregate_report_msg_uids) > 0:
@@ -1706,7 +1781,12 @@ def get_dmarc_reports_from_mailbox(
]
)
total_messages = len(connection.fetch_messages(reports_folder))
if current_time:
total_messages = len(
connection.fetch_messages(reports_folder, since=current_time)
)
else:
total_messages = len(connection.fetch_messages(reports_folder))
if not test and not batch_size and total_messages > 0:
# Process emails that came in during the last run
@@ -1725,6 +1805,7 @@ def get_dmarc_reports_from_mailbox(
reverse_dns_map_path=reverse_dns_map_path,
reverse_dns_map_url=reverse_dns_map_url,
offline=offline,
since=current_time,
)
return results

parsedmarc/cli.py

@@ -14,6 +14,7 @@ import json
from ssl import CERT_NONE, create_default_context
from multiprocessing import Pipe, Process
import sys
import http.client
from tqdm import tqdm
from parsedmarc import (
@@ -46,6 +47,9 @@ from parsedmarc.mail.graph import AuthMethod
from parsedmarc.log import logger
from parsedmarc.utils import is_mbox, get_reverse_dns
from parsedmarc import SEEN_AGGREGATE_REPORT_IDS
http.client._MAXHEADERS = 200 # pylint:disable=protected-access
formatter = logging.Formatter(
fmt="%(levelname)8s:%(filename)s:%(lineno)d:%(message)s",
@@ -395,7 +399,7 @@ def _main():
arg_parser.add_argument(
"-c",
"--config-file",
help="a path to a configuration file " "(--silent implied)",
help="a path to a configuration file (--silent implied)",
)
arg_parser.add_argument(
"file_path",
@@ -403,7 +407,7 @@ def _main():
help="one or more paths to aggregate or forensic "
"report files, emails, or mbox files'",
)
strip_attachment_help = "remove attachment payloads from forensic " "report output"
strip_attachment_help = "remove attachment payloads from forensic report output"
arg_parser.add_argument(
"--strip-attachment-payloads", help=strip_attachment_help, action="store_true"
)
@@ -446,14 +450,14 @@ def _main():
arg_parser.add_argument(
"-t",
"--dns_timeout",
help="number of seconds to wait for an answer " "from DNS (default: 2.0)",
help="number of seconds to wait for an answer from DNS (default: 2.0)",
type=float,
default=2.0,
)
arg_parser.add_argument(
"--offline",
action="store_true",
help="do not make online queries for geolocation " " or DNS",
help="do not make online queries for geolocation or DNS",
)
arg_parser.add_argument(
"-s", "--silent", action="store_true", help="only print errors"
@@ -510,6 +514,7 @@ def _main():
mailbox_test=False,
mailbox_batch_size=10,
mailbox_check_timeout=30,
mailbox_since=None,
imap_host=None,
imap_skip_certificate_verification=False,
imap_ssl=True,
@@ -526,6 +531,7 @@ def _main():
graph_tenant_id=None,
graph_mailbox=None,
graph_allow_unencrypted_storage=False,
graph_url="https://graph.microsoft.com",
hec=None,
hec_token=None,
hec_index=None,
@@ -714,6 +720,8 @@ def _main():
opts.mailbox_batch_size = mailbox_config.getint("batch_size")
if "check_timeout" in mailbox_config:
opts.mailbox_check_timeout = mailbox_config.getint("check_timeout")
if "since" in mailbox_config:
opts.mailbox_since = mailbox_config["since"]
if "imap" in config.sections():
imap_config = config["imap"]
@@ -726,7 +734,7 @@ def _main():
if "host" in imap_config:
opts.imap_host = imap_config["host"]
else:
logger.error("host setting missing from the " "imap config section")
logger.error("host setting missing from the imap config section")
exit(-1)
if "port" in imap_config:
opts.imap_port = imap_config.getint("port")
@@ -742,14 +750,12 @@ def _main():
if "user" in imap_config:
opts.imap_user = imap_config["user"]
else:
logger.critical("user setting missing from the " "imap config section")
logger.critical("user setting missing from the imap config section")
exit(-1)
if "password" in imap_config:
opts.imap_password = imap_config["password"]
else:
logger.critical(
"password setting missing from the " "imap config section"
)
logger.critical("password setting missing from the imap config section")
exit(-1)
if "reports_folder" in imap_config:
opts.mailbox_reports_folder = imap_config["reports_folder"]
@@ -818,21 +824,20 @@ def _main():
opts.graph_user = graph_config["user"]
else:
logger.critical(
"user setting missing from the " "msgraph config section"
"user setting missing from the msgraph config section"
)
exit(-1)
if "password" in graph_config:
opts.graph_password = graph_config["password"]
else:
logger.critical(
"password setting missing from the " "msgraph config section"
"password setting missing from the msgraph config section"
)
if "client_secret" in graph_config:
opts.graph_client_secret = graph_config["client_secret"]
else:
logger.critical(
"client_secret setting missing from the "
"msgraph config section"
"client_secret setting missing from the msgraph config section"
)
exit(-1)
@@ -845,7 +850,7 @@ def _main():
opts.graph_tenant_id = graph_config["tenant_id"]
else:
logger.critical(
"tenant_id setting missing from the " "msgraph config section"
"tenant_id setting missing from the msgraph config section"
)
exit(-1)
@@ -854,8 +859,7 @@ def _main():
opts.graph_client_secret = graph_config["client_secret"]
else:
logger.critical(
"client_secret setting missing from the "
"msgraph config section"
"client_secret setting missing from the msgraph config section"
)
exit(-1)
@@ -863,7 +867,7 @@ def _main():
opts.graph_client_id = graph_config["client_id"]
else:
logger.critical(
"client_id setting missing from the " "msgraph config section"
"client_id setting missing from the msgraph config section"
)
exit(-1)
@@ -871,10 +875,13 @@ def _main():
opts.graph_mailbox = graph_config["mailbox"]
elif opts.graph_auth_method != AuthMethod.UsernamePassword.name:
logger.critical(
"mailbox setting missing from the " "msgraph config section"
"mailbox setting missing from the msgraph config section"
)
exit(-1)
if "graph_url" in graph_config:
opts.graph_url = graph_config["graph_url"]
if "allow_unencrypted_storage" in graph_config:
opts.graph_allow_unencrypted_storage = graph_config.getboolean(
"allow_unencrypted_storage"
@@ -886,7 +893,7 @@ def _main():
opts.elasticsearch_hosts = _str_to_list(elasticsearch_config["hosts"])
else:
logger.critical(
"hosts setting missing from the " "elasticsearch config section"
"hosts setting missing from the elasticsearch config section"
)
exit(-1)
if "timeout" in elasticsearch_config:
@@ -924,7 +931,7 @@ def _main():
opts.opensearch_hosts = _str_to_list(opensearch_config["hosts"])
else:
logger.critical(
"hosts setting missing from the " "opensearch config section"
"hosts setting missing from the opensearch config section"
)
exit(-1)
if "timeout" in opensearch_config:
@@ -960,21 +967,21 @@ def _main():
opts.hec = hec_config["url"]
else:
logger.critical(
"url setting missing from the " "splunk_hec config section"
"url setting missing from the splunk_hec config section"
)
exit(-1)
if "token" in hec_config:
opts.hec_token = hec_config["token"]
else:
logger.critical(
"token setting missing from the " "splunk_hec config section"
"token setting missing from the splunk_hec config section"
)
exit(-1)
if "index" in hec_config:
opts.hec_index = hec_config["index"]
else:
logger.critical(
"index setting missing from the " "splunk_hec config section"
"index setting missing from the splunk_hec config section"
)
exit(-1)
if "skip_certificate_verification" in hec_config:
@@ -987,9 +994,7 @@ def _main():
if "hosts" in kafka_config:
opts.kafka_hosts = _str_to_list(kafka_config["hosts"])
else:
logger.critical(
"hosts setting missing from the " "kafka config section"
)
logger.critical("hosts setting missing from the kafka config section")
exit(-1)
if "user" in kafka_config:
opts.kafka_username = kafka_config["user"]
@@ -1004,21 +1009,20 @@ def _main():
opts.kafka_aggregate_topic = kafka_config["aggregate_topic"]
else:
logger.critical(
"aggregate_topic setting missing from the " "kafka config section"
"aggregate_topic setting missing from the kafka config section"
)
exit(-1)
if "forensic_topic" in kafka_config:
opts.kafka_forensic_topic = kafka_config["forensic_topic"]
else:
logger.critical(
"forensic_topic setting missing from the " "kafka config section"
"forensic_topic setting missing from the kafka config section"
)
if "smtp_tls_topic" in kafka_config:
opts.kafka_smtp_tls_topic = kafka_config["smtp_tls_topic"]
else:
logger.critical(
"forensic_topic setting missing from the "
"splunk_hec config section"
"forensic_topic setting missing from the splunk_hec config section"
)
if "smtp" in config.sections():
@@ -1026,7 +1030,7 @@ def _main():
if "host" in smtp_config:
opts.smtp_host = smtp_config["host"]
else:
logger.critical("host setting missing from the " "smtp config section")
logger.critical("host setting missing from the smtp config section")
exit(-1)
if "port" in smtp_config:
opts.smtp_port = smtp_config.getint("port")
@@ -1038,23 +1042,21 @@ def _main():
if "user" in smtp_config:
opts.smtp_user = smtp_config["user"]
else:
logger.critical("user setting missing from the " "smtp config section")
logger.critical("user setting missing from the smtp config section")
exit(-1)
if "password" in smtp_config:
opts.smtp_password = smtp_config["password"]
else:
logger.critical(
"password setting missing from the " "smtp config section"
)
logger.critical("password setting missing from the smtp config section")
exit(-1)
if "from" in smtp_config:
opts.smtp_from = smtp_config["from"]
else:
logger.critical("from setting missing from the " "smtp config section")
logger.critical("from setting missing from the smtp config section")
if "to" in smtp_config:
opts.smtp_to = _str_to_list(smtp_config["to"])
else:
logger.critical("to setting missing from the " "smtp config section")
logger.critical("to setting missing from the smtp config section")
if "subject" in smtp_config:
opts.smtp_subject = smtp_config["subject"]
if "attachment" in smtp_config:
@@ -1067,7 +1069,7 @@ def _main():
if "bucket" in s3_config:
opts.s3_bucket = s3_config["bucket"]
else:
logger.critical("bucket setting missing from the " "s3 config section")
logger.critical("bucket setting missing from the s3 config section")
exit(-1)
if "path" in s3_config:
opts.s3_path = s3_config["path"]
@@ -1092,9 +1094,7 @@ def _main():
if "server" in syslog_config:
opts.syslog_server = syslog_config["server"]
else:
logger.critical(
"server setting missing from the " "syslog config section"
)
logger.critical("server setting missing from the syslog config section")
exit(-1)
if "port" in syslog_config:
opts.syslog_port = syslog_config["port"]
@@ -1145,17 +1145,17 @@ def _main():
if "host" in gelf_config:
opts.gelf_host = gelf_config["host"]
else:
logger.critical("host setting missing from the " "gelf config section")
logger.critical("host setting missing from the gelf config section")
exit(-1)
if "port" in gelf_config:
opts.gelf_port = gelf_config["port"]
else:
logger.critical("port setting missing from the " "gelf config section")
logger.critical("port setting missing from the gelf config section")
exit(-1)
if "mode" in gelf_config:
opts.gelf_mode = gelf_config["mode"]
else:
logger.critical("mode setting missing from the " "gelf config section")
logger.critical("mode setting missing from the gelf config section")
exit(-1)
if "webhook" in config.sections():
@@ -1181,8 +1181,7 @@ def _main():
try:
fh = logging.FileHandler(opts.log_file, "a")
formatter = logging.Formatter(
"%(asctime)s - "
"%(levelname)s - [%(filename)s:%(lineno)d] - %(message)s"
"%(asctime)s - %(levelname)s - [%(filename)s:%(lineno)d] - %(message)s"
)
fh.setFormatter(formatter)
logger.addHandler(fh)
@@ -1290,7 +1289,7 @@ def _main():
if opts.hec:
if opts.hec_token is None or opts.hec_index is None:
logger.error("HEC token and HEC index are required when " "using HEC URL")
logger.error("HEC token and HEC index are required when using HEC URL")
exit(1)
verify = True
@@ -1415,7 +1414,17 @@ def _main():
logger.error("Failed to parse {0} - {1}".format(result[1], result[0]))
else:
if result[0]["report_type"] == "aggregate":
aggregate_reports.append(result[0]["report"])
report_org = result[0]["report"]["report_metadata"]["org_name"]
report_id = result[0]["report"]["report_metadata"]["report_id"]
report_key = f"{report_org}_{report_id}"
if report_key not in SEEN_AGGREGATE_REPORT_IDS:
SEEN_AGGREGATE_REPORT_IDS[report_key] = True
aggregate_reports.append(result[0]["report"])
else:
logger.debug(
"Skipping duplicate aggregate report "
f"from {report_org} with ID: {report_id}"
)
elif result[0]["report_type"] == "forensic":
forensic_reports.append(result[0]["report"])
elif result[0]["report_type"] == "smtp_tls":
@@ -1443,7 +1452,7 @@ def _main():
try:
if opts.imap_user is None or opts.imap_password is None:
logger.error(
"IMAP user and password must be specified if" "host is specified"
"IMAP user and password must be specified ifhost is specified"
)
ssl = True
@@ -1482,6 +1491,7 @@ def _main():
password=opts.graph_password,
token_file=opts.graph_token_file,
allow_unencrypted_storage=opts.graph_allow_unencrypted_storage,
graph_url=opts.graph_url,
)
except Exception:
@@ -1540,6 +1550,7 @@ def _main():
nameservers=opts.nameservers,
test=opts.mailbox_test,
strip_attachment_payloads=opts.strip_attachment_payloads,
since=opts.mailbox_since,
)
aggregate_reports += reports["aggregate_reports"]

parsedmarc/constants.py (new file)

@@ -0,0 +1,2 @@
__version__ = "8.18.6"
USER_AGENT = f"parsedmarc/{__version__}"

parsedmarc/elastic.py

@@ -552,8 +552,8 @@ def save_forensic_report_to_elasticsearch(
for original_header in original_headers:
headers[original_header.lower()] = original_headers[original_header]
arrival_date_human = forensic_report["arrival_date_utc"]
arrival_date = human_timestamp_to_datetime(arrival_date_human)
arrival_date = human_timestamp_to_datetime(forensic_report["arrival_date_utc"])
arrival_date_epoch_milliseconds = int(arrival_date.timestamp() * 1000)
if index_suffix is not None:
search_index = "dmarc_forensic_{0}*".format(index_suffix)
@@ -562,20 +562,35 @@ def save_forensic_report_to_elasticsearch(
if index_prefix is not None:
search_index = "{0}{1}".format(index_prefix, search_index)
search = Search(index=search_index)
arrival_query = {"match": {"arrival_date": arrival_date}}
q = Q(arrival_query)
q = Q(dict(match=dict(arrival_date=arrival_date_epoch_milliseconds)))
from_ = None
to_ = None
subject = None
if "from" in headers:
from_ = headers["from"]
from_query = {"match_phrase": {"sample.headers.from": from_}}
q = q & Q(from_query)
# We convert the FROM header from a string list to a flat string.
headers["from"] = headers["from"][0]
if headers["from"][0] == "":
headers["from"] = headers["from"][1]
else:
headers["from"] = " <".join(headers["from"]) + ">"
from_ = dict()
from_["sample.headers.from"] = headers["from"]
from_query = Q(dict(match_phrase=from_))
q = q & from_query
if "to" in headers:
to_ = headers["to"]
to_query = {"match_phrase": {"sample.headers.to": to_}}
q = q & Q(to_query)
# We convert the TO header from a string list to a flat string.
headers["to"] = headers["to"][0]
if headers["to"][0] == "":
headers["to"] = headers["to"][1]
else:
headers["to"] = " <".join(headers["to"]) + ">"
to_ = dict()
to_["sample.headers.to"] = headers["to"]
to_query = Q(dict(match_phrase=to_))
q = q & to_query
if "subject" in headers:
subject = headers["subject"]
subject_query = {"match_phrase": {"sample.headers.subject": subject}}
@@ -589,7 +604,9 @@ def save_forensic_report_to_elasticsearch(
"A forensic sample to {0} from {1} "
"with a subject of {2} and arrival date of {3} "
"already exists in "
"Elasticsearch".format(to_, from_, subject, arrival_date_human)
"Elasticsearch".format(
to_, from_, subject, forensic_report["arrival_date_utc"]
)
)
parsed_sample = forensic_report["parsed_sample"]
@@ -625,7 +642,7 @@ def save_forensic_report_to_elasticsearch(
user_agent=forensic_report["user_agent"],
version=forensic_report["version"],
original_mail_from=forensic_report["original_mail_from"],
arrival_date=arrival_date,
arrival_date=arrival_date_epoch_milliseconds,
domain=forensic_report["reported_domain"],
original_envelope_id=forensic_report["original_envelope_id"],
authentication_results=forensic_report["authentication_results"],

parsedmarc/mail/gmail.py

@@ -63,24 +63,36 @@ class GmailConnection(MailboxConnection):
).execute()
except HttpError as e:
if e.status_code == 409:
logger.debug(
f"Folder {folder_name} already exists, " f"skipping creation"
)
logger.debug(f"Folder {folder_name} already exists, skipping creation")
else:
raise e
def _fetch_all_message_ids(self, reports_label_id, page_token=None):
results = (
self.service.users()
.messages()
.list(
userId="me",
includeSpamTrash=self.include_spam_trash,
labelIds=[reports_label_id],
pageToken=page_token,
def _fetch_all_message_ids(self, reports_label_id, page_token=None, since=None):
if since:
results = (
self.service.users()
.messages()
.list(
userId="me",
includeSpamTrash=self.include_spam_trash,
labelIds=[reports_label_id],
pageToken=page_token,
q=f"after:{since}",
)
.execute()
)
else:
results = (
self.service.users()
.messages()
.list(
userId="me",
includeSpamTrash=self.include_spam_trash,
labelIds=[reports_label_id],
pageToken=page_token,
)
.execute()
)
.execute()
)
messages = results.get("messages", [])
for message in messages:
yield message["id"]
@@ -92,7 +104,13 @@ class GmailConnection(MailboxConnection):
def fetch_messages(self, reports_folder: str, **kwargs) -> List[str]:
reports_label_id = self._find_label_id_for_label(reports_folder)
return [id for id in self._fetch_all_message_ids(reports_label_id)]
since = kwargs.get("since")
if since:
return [
id for id in self._fetch_all_message_ids(reports_label_id, since=since)
]
else:
return [id for id in self._fetch_all_message_ids(reports_label_id)]
def fetch_message(self, message_id):
msg = (

parsedmarc/mail/graph.py

@@ -89,6 +89,7 @@ class MSGraphConnection(MailboxConnection):
self,
auth_method: str,
mailbox: str,
graph_url: str,
client_id: str,
client_secret: str,
username: str,
@@ -108,7 +109,10 @@ class MSGraphConnection(MailboxConnection):
token_path=token_path,
allow_unencrypted_storage=allow_unencrypted_storage,
)
client_params = {"credential": credential}
client_params = {
"credential": credential,
"cloud": graph_url,
}
if not isinstance(credential, ClientSecretCredential):
scopes = ["Mail.ReadWrite"]
# Detect if mailbox is shared
@@ -137,25 +141,30 @@ class MSGraphConnection(MailboxConnection):
request_url = f"/users/{self.mailbox_name}/mailFolders{sub_url}"
resp = self._client.post(request_url, json=request_body)
if resp.status_code == 409:
logger.debug(f"Folder {folder_name} already exists, " f"skipping creation")
logger.debug(f"Folder {folder_name} already exists, skipping creation")
elif resp.status_code == 201:
logger.debug(f"Created folder {folder_name}")
else:
logger.warning(f"Unknown response " f"{resp.status_code} {resp.json()}")
logger.warning(f"Unknown response {resp.status_code} {resp.json()}")
def fetch_messages(self, folder_name: str, **kwargs) -> List[str]:
"""Returns a list of message UIDs in the specified folder"""
folder_id = self._find_folder_id_from_folder_path(folder_name)
url = f"/users/{self.mailbox_name}/mailFolders/" f"{folder_id}/messages"
url = f"/users/{self.mailbox_name}/mailFolders/{folder_id}/messages"
since = kwargs.get("since")
if not since:
since = None
batch_size = kwargs.get("batch_size")
if not batch_size:
batch_size = 0
emails = self._get_all_messages(url, batch_size)
emails = self._get_all_messages(url, batch_size, since)
return [email["id"] for email in emails]
def _get_all_messages(self, url, batch_size):
def _get_all_messages(self, url, batch_size, since):
messages: list
params = {"$select": "id"}
if since:
params["$filter"] = f"receivedDateTime ge {since}"
if batch_size and batch_size > 0:
params["$top"] = batch_size
else:
@@ -166,7 +175,7 @@ class MSGraphConnection(MailboxConnection):
messages = result.json()["value"]
# Loop if next page is present and not obtained message limit.
while "@odata.nextLink" in result.json() and (
batch_size == 0 or batch_size - len(messages) > 0
since is not None or (batch_size == 0 or batch_size - len(messages) > 0)
):
result = self._client.get(result.json()["@odata.nextLink"])
if result.status_code != 200:
@@ -180,17 +189,19 @@ class MSGraphConnection(MailboxConnection):
resp = self._client.patch(url, json={"isRead": "true"})
if resp.status_code != 200:
raise RuntimeWarning(
f"Failed to mark message read" f"{resp.status_code}: {resp.json()}"
f"Failed to mark message read{resp.status_code}: {resp.json()}"
)
def fetch_message(self, message_id: str):
def fetch_message(self, message_id: str, **kwargs):
url = f"/users/{self.mailbox_name}/messages/{message_id}/$value"
result = self._client.get(url)
if result.status_code != 200:
raise RuntimeWarning(
f"Failed to fetch message" f"{result.status_code}: {result.json()}"
f"Failed to fetch message{result.status_code}: {result.json()}"
)
self.mark_message_read(message_id)
mark_read = kwargs.get("mark_read")
if mark_read:
self.mark_message_read(message_id)
return result.text
def delete_message(self, message_id: str):
@@ -198,7 +209,7 @@ class MSGraphConnection(MailboxConnection):
resp = self._client.delete(url)
if resp.status_code != 204:
raise RuntimeWarning(
f"Failed to delete message " f"{resp.status_code}: {resp.json()}"
f"Failed to delete message {resp.status_code}: {resp.json()}"
)
def move_message(self, message_id: str, folder_name: str):
@@ -208,7 +219,7 @@ class MSGraphConnection(MailboxConnection):
resp = self._client.post(url, json=request_body)
if resp.status_code != 201:
raise RuntimeWarning(
f"Failed to move message " f"{resp.status_code}: {resp.json()}"
f"Failed to move message {resp.status_code}: {resp.json()}"
)
def keepalive(self):
@@ -243,7 +254,7 @@ class MSGraphConnection(MailboxConnection):
filter = f"?$filter=displayName eq '{folder_name}'"
folders_resp = self._client.get(url + filter)
if folders_resp.status_code != 200:
raise RuntimeWarning(f"Failed to list folders." f"{folders_resp.json()}")
raise RuntimeWarning(f"Failed to list folders.{folders_resp.json()}")
folders: list = folders_resp.json()["value"]
matched_folders = [
folder for folder in folders if folder["displayName"] == folder_name

parsedmarc/mail/imap.py

@@ -39,7 +39,11 @@ class IMAPConnection(MailboxConnection):
def fetch_messages(self, reports_folder: str, **kwargs):
self._client.select_folder(reports_folder)
return self._client.search()
since = kwargs.get("since")
if since:
return self._client.search(["SINCE", since])
else:
return self._client.search()
def fetch_message(self, message_id):
return self._client.fetch_message(message_id, parse=False)
@@ -81,7 +85,5 @@ class IMAPConnection(MailboxConnection):
logger.warning("IMAP connection timeout. Reconnecting...")
sleep(check_timeout)
except Exception as e:
logger.warning(
"IMAP connection error. {0}. " "Reconnecting...".format(e)
)
logger.warning("IMAP connection error. {0}. Reconnecting...".format(e))
sleep(check_timeout)

parsedmarc/opensearch.py

@@ -202,13 +202,15 @@ class _SMTPTLSPolicyDoc(InnerDoc):
receiving_ip,
receiving_mx_helo,
failed_session_count,
sending_mta_ip=None,
receiving_mx_hostname=None,
additional_information_uri=None,
failure_reason_code=None,
):
self.failure_details.append(
_details = _SMTPTLSFailureDetailsDoc(
result_type=result_type,
ip_address=ip_address,
sending_mta_ip=sending_mta_ip,
receiving_mx_hostname=receiving_mx_hostname,
receiving_mx_helo=receiving_mx_helo,
receiving_ip=receiving_ip,
@@ -216,9 +218,10 @@ class _SMTPTLSPolicyDoc(InnerDoc):
additional_information=additional_information_uri,
failure_reason_code=failure_reason_code,
)
self.failure_details.append(_details)
class _SMTPTLSFailureReportDoc(Document):
class _SMTPTLSReportDoc(Document):
class Index:
name = "smtp_tls"
@@ -499,6 +502,7 @@ def save_aggregate_report_to_opensearch(
index = "{0}_{1}".format(index, index_suffix)
if index_prefix:
index = "{0}{1}".format(index_prefix, index)
index = "{0}-{1}".format(index, index_date)
index_settings = dict(
number_of_shards=number_of_shards, number_of_replicas=number_of_replicas
@@ -548,8 +552,8 @@ def save_forensic_report_to_opensearch(
for original_header in original_headers:
headers[original_header.lower()] = original_headers[original_header]
arrival_date_human = forensic_report["arrival_date_utc"]
arrival_date = human_timestamp_to_datetime(arrival_date_human)
arrival_date = human_timestamp_to_datetime(forensic_report["arrival_date_utc"])
arrival_date_epoch_milliseconds = int(arrival_date.timestamp() * 1000)
if index_suffix is not None:
search_index = "dmarc_forensic_{0}*".format(index_suffix)
@@ -558,20 +562,35 @@ def save_forensic_report_to_opensearch(
if index_prefix is not None:
search_index = "{0}{1}".format(index_prefix, search_index)
search = Search(index=search_index)
arrival_query = {"match": {"arrival_date": arrival_date}}
q = Q(arrival_query)
q = Q(dict(match=dict(arrival_date=arrival_date_epoch_milliseconds)))
from_ = None
to_ = None
subject = None
if "from" in headers:
from_ = headers["from"]
from_query = {"match_phrase": {"sample.headers.from": from_}}
q = q & Q(from_query)
# We convert the FROM header from a string list to a flat string.
headers["from"] = headers["from"][0]
if headers["from"][0] == "":
headers["from"] = headers["from"][1]
else:
headers["from"] = " <".join(headers["from"]) + ">"
from_ = dict()
from_["sample.headers.from"] = headers["from"]
from_query = Q(dict(match_phrase=from_))
q = q & from_query
if "to" in headers:
to_ = headers["to"]
to_query = {"match_phrase": {"sample.headers.to": to_}}
q = q & Q(to_query)
# We convert the TO header from a string list to a flat string.
headers["to"] = headers["to"][0]
if headers["to"][0] == "":
headers["to"] = headers["to"][1]
else:
headers["to"] = " <".join(headers["to"]) + ">"
to_ = dict()
to_["sample.headers.to"] = headers["to"]
to_query = Q(dict(match_phrase=to_))
q = q & to_query
if "subject" in headers:
subject = headers["subject"]
subject_query = {"match_phrase": {"sample.headers.subject": subject}}
@@ -585,7 +604,9 @@ def save_forensic_report_to_opensearch(
"A forensic sample to {0} from {1} "
"with a subject of {2} and arrival date of {3} "
"already exists in "
"OpenSearch".format(to_, from_, subject, arrival_date_human)
"OpenSearch".format(
to_, from_, subject, forensic_report["arrival_date_utc"]
)
)
parsed_sample = forensic_report["parsed_sample"]
@@ -621,7 +642,7 @@ def save_forensic_report_to_opensearch(
user_agent=forensic_report["user_agent"],
version=forensic_report["version"],
original_mail_from=forensic_report["original_mail_from"],
arrival_date=arrival_date,
arrival_date=arrival_date_epoch_milliseconds,
domain=forensic_report["reported_domain"],
original_envelope_id=forensic_report["original_envelope_id"],
authentication_results=forensic_report["authentication_results"],
@@ -685,7 +706,7 @@ def save_smtp_tls_report_to_opensearch(
AlreadySaved
"""
logger.info("Saving aggregate report to OpenSearch")
org_name = report["org_name"]
org_name = report["organization_name"]
report_id = report["report_id"]
begin_date = human_timestamp_to_datetime(report["begin_date"], to_utc=True)
end_date = human_timestamp_to_datetime(report["end_date"], to_utc=True)
@@ -741,11 +762,11 @@ def save_smtp_tls_report_to_opensearch(
number_of_shards=number_of_shards, number_of_replicas=number_of_replicas
)
smtp_tls_doc = _SMTPTLSFailureReportDoc(
organization_name=report["organization_name"],
date_range=[report["date_begin"], report["date_end"]],
date_begin=report["date_begin"],
date_end=report["date_end"],
smtp_tls_doc = _SMTPTLSReportDoc(
org_name=report["organization_name"],
date_range=[report["begin_date"], report["end_date"]],
date_begin=report["begin_date"],
date_end=report["end_date"],
contact_info=report["contact_info"],
report_id=report["report_id"],
)
@@ -760,32 +781,48 @@ def save_smtp_tls_report_to_opensearch(
policy_doc = _SMTPTLSPolicyDoc(
policy_domain=policy["policy_domain"],
policy_type=policy["policy_type"],
succesful_session_count=policy["successful_session_count"],
failed_session_count=policy["failed_session_count"],
policy_string=policy_strings,
mx_host_patterns=mx_host_patterns,
)
if "failure_details" in policy:
failure_details = policy["failure_details"]
receiving_mx_hostname = None
additional_information_uri = None
failure_reason_code = None
if "receiving_mx_hostname" in failure_details:
receiving_mx_hostname = failure_details["receiving_mx_hostname"]
if "additional_information_uri" in failure_details:
additional_information_uri = failure_details[
"additional_information_uri"
]
if "failure_reason_code" in failure_details:
failure_reason_code = failure_details["failure_reason_code"]
policy_doc.add_failure_details(
result_type=failure_details["result_type"],
ip_address=failure_details["ip_address"],
receiving_ip=failure_details["receiving_ip"],
receiving_mx_helo=failure_details["receiving_mx_helo"],
failed_session_count=failure_details["failed_session_count"],
receiving_mx_hostname=receiving_mx_hostname,
additional_information_uri=additional_information_uri,
failure_reason_code=failure_reason_code,
)
for failure_detail in policy["failure_details"]:
receiving_mx_hostname = None
additional_information_uri = None
failure_reason_code = None
ip_address = None
receiving_ip = None
receiving_mx_helo = None
sending_mta_ip = None
if "receiving_mx_hostname" in failure_detail:
receiving_mx_hostname = failure_detail["receiving_mx_hostname"]
if "additional_information_uri" in failure_detail:
additional_information_uri = failure_detail[
"additional_information_uri"
]
if "failure_reason_code" in failure_detail:
failure_reason_code = failure_detail["failure_reason_code"]
if "ip_address" in failure_detail:
ip_address = failure_detail["ip_address"]
if "receiving_ip" in failure_detail:
receiving_ip = failure_detail["receiving_ip"]
if "receiving_mx_helo" in failure_detail:
receiving_mx_helo = failure_detail["receiving_mx_helo"]
if "sending_mta_ip" in failure_detail:
sending_mta_ip = failure_detail["sending_mta_ip"]
policy_doc.add_failure_details(
result_type=failure_detail["result_type"],
ip_address=ip_address,
receiving_ip=receiving_ip,
receiving_mx_helo=receiving_mx_helo,
failed_session_count=failure_detail["failed_session_count"],
sending_mta_ip=sending_mta_ip,
receiving_mx_hostname=receiving_mx_hostname,
additional_information_uri=additional_information_uri,
failure_reason_code=failure_reason_code,
)
smtp_tls_doc.policies.append(policy_doc)
create_indexes([index], index_settings)
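For what it's worth, the per-field `if ... in failure_detail` checks above could be written more compactly with `dict.get`, which returns `None` for missing keys; a hedged sketch with a hypothetical record:

```python
# Hypothetical failure_detail record; optional keys it does not carry
# simply come back as None from the comprehension.
failure_detail = {"result_type": "validation-failure", "failed_session_count": 2}
optional_keys = (
    "receiving_mx_hostname", "additional_information_uri", "failure_reason_code",
    "ip_address", "receiving_ip", "receiving_mx_helo", "sending_mta_ip",
)
kwargs = {key: failure_detail.get(key) for key in optional_keys}
print(kwargs["sending_mta_ip"])  # None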

View File

@@ -1,7 +1,7 @@
# About
`dbip-country-lite.mmdb` is provided by [dbip][dbip] under a
[ Creative Commons Attribution 4.0 International License][cc].
[Creative Commons Attribution 4.0 International License][cc].
[dbip]: https://db-ip.com/db/lite.php
[dbip]: https://db-ip.com/db/download/ip-to-country-lite
[cc]: http://creativecommons.org/licenses/by/4.0/

View File

@@ -3,6 +3,8 @@
A mapping is meant to make it easier to identify who or what a sending source is. Please consider contributing
additional mappings in a GitHub Pull Request.
Do not open these CSV files in Excel. It will replace Unicode characters with question marks. Use LibreOffice Calc instead.
## base_reverse_dns_map.csv
A CSV file with three fields: `base_reverse_dns`, `name`, and `type`.
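A couple of hypothetical rows (illustrative only, not taken from the shipped file) showing the three-field layout and how it can be read:

```python
import csv
import io

# Hypothetical rows; the shipped base_reverse_dns_map.csv is far larger.
sample = io.StringIO(
    "base_reverse_dns,name,type\n"
    "example-isp.net,Example ISP,ISP\n"
    "example-saas.com,Example SaaS,SaaS\n"
)
for row in csv.DictReader(sample):
    print(row["base_reverse_dns"], row["name"], row["type"])
```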
@@ -19,33 +21,72 @@ The `service_type` is based on the following rule precedence:
3. All telecommunications providers that offer internet access are identified as `ISP`, even if they also offer other services, such as web hosting or email hosting.
4. All web hosting providers are identified as `Web Hosting`, even if the service also offers email hosting.
5. All email account providers are identified as `Email Provider`, no matter how or where they are hosted.
6. All legitimate platforms offering their Software as a Service SaaS) are identified as `SaaS`, regardless of industry. This helps simplify metrics.
6. All legitimate platforms offering their Software as a Service (SaaS) are identified as `SaaS`, regardless of industry. This helps simplify metrics.
7. All other senders that use their own domain as a Reverse DNS base domain should be identified based on their industry
- Agriculture
- Automotive
- Beauty
- Conglomerate
- Construction
- Consulting
- Defense
- Education
- Email Provider
- Email Security
- Entertainment
- Event Planning
- Finance
- Food
- Government
- Government Media
- Healthcare
- IaaS
- Industrial
- ISP
- Legal
- Logistics
- Manufacturing
- Marketing
- MSP
- MSSP
- News
- Nonprofit
- PaaS
- Photography
- Physical Security
- Print
- Publishing
- Real Estate
- Retail
- SaaS
- Science
- Search Engine
- Social Media
- Sports
- Staffing
- Technology
- Travel
- Web Host
The file currently contains over 600 mappings from a wide variety of email sending services, including large email
providers, SaaS platforms, small web hosts, and healthcare companies. Ideally this mapping will continuously grow to
include many other services and industries.
The file currently contains over 1,400 mappings from a wide variety of email sending sources.
## known_unknown_base_reverse_dns.txt
A list of reverse DNS base domains that could not be identified as belonging to a particular organization, service, or industry.
## base_reverse_dns.csv
A CSV with the fields `source_name` and optionally `message_count`. This CSV can be generated by exporting the base DNS data from the Kibana or Splunk dashboards provided by parsedmarc. This file is not tracked by Git.
## unknown_base_reverse_dns.csv
A CSV file with the fields `source_name` and `message_count`. This file is not tracked by Git.
## find_bad_utf8.py
Locates invalid UTF-8 bytes in files and optionally tries to correct them. Generated by GPT-5. It helped me find where I had introduced invalid bytes in `base_reverse_dns_map.csv`.
## find_unknown_base_reverse_dns.py
A Python script that reads the domains in `base_reverse_dns.csv` and writes the domains that are not in `base_reverse_dns_map.csv` or `known_unknown_base_reverse_dns.txt` to `unknown_base_reverse_dns.csv`. This is useful for identifying potential additional domains to contribute to `base_reverse_dns_map.csv` and `known_unknown_base_reverse_dns.txt`.
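In essence the script computes a set difference; a minimal sketch under the file layout described above (the map and observed entries are hypothetical, `ztomy.com` is taken from the known-unknown list):

```python
# Toy data standing in for the three files described above.
known = {"example-isp.net"}    # rows in base_reverse_dns_map.csv (hypothetical)
known_unknown = {"ztomy.com"}  # listed in known_unknown_base_reverse_dns.txt
observed = {"example-isp.net", "ztomy.com", "new-sender.example"}
unknown = observed - known - known_unknown
print(unknown)  # {'new-sender.example'}
```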

File diff suppressed because it is too large

View File

@@ -0,0 +1,488 @@
#!/usr/bin/env python3
import argparse
import codecs
import os
import sys
import shutil
from typing import List, Tuple
"""
Locates and optionally corrects bad UTF-8 bytes in a file.
Generated by GPT-5. Use at your own risk.
"""
# -------------------------
# UTF-8 scanning
# -------------------------
def scan_line_for_utf8_errors(
line_bytes: bytes, line_no: int, base_offset: int, context: int
):
"""
Scan one line of raw bytes for UTF-8 decoding errors.
Returns a list of dicts describing each error.
"""
pos = 0
results = []
while pos < len(line_bytes):
dec = codecs.getincrementaldecoder("utf-8")("strict")
try:
dec.decode(line_bytes[pos:], final=True)
break
except UnicodeDecodeError as e:
rel_index = e.start
abs_index_in_line = pos + rel_index
abs_offset = base_offset + abs_index_in_line
start_ctx = max(0, abs_index_in_line - context)
end_ctx = min(len(line_bytes), abs_index_in_line + 1 + context)
ctx_bytes = line_bytes[start_ctx:end_ctx]
bad_byte = line_bytes[abs_index_in_line : abs_index_in_line + 1]
col = abs_index_in_line + 1 # 1-based byte column
results.append(
{
"line": line_no,
"column": col,
"abs_offset": abs_offset,
"bad_byte_hex": bad_byte.hex(),
"context_hex": ctx_bytes.hex(),
"context_preview": ctx_bytes.decode("utf-8", errors="replace"),
}
)
# Move past the offending byte and continue
pos = abs_index_in_line + 1
return results
def scan_file_for_utf8_errors(path: str, context: int, limit: int):
errors_found = 0
limit_val = limit if limit != 0 else float("inf")
with open(path, "rb") as f:
total_offset = 0
line_no = 0
while True:
line = f.readline()
if not line:
break
line_no += 1
results = scan_line_for_utf8_errors(line, line_no, total_offset, context)
for r in results:
errors_found += 1
print(
f"[ERROR {errors_found}] Line {r['line']}, Column {r['column']}, "
f"Absolute byte offset {r['abs_offset']}"
)
print(f" Bad byte: 0x{r['bad_byte_hex']}")
print(f" Context (hex): {r['context_hex']}")
print(f" Context (preview): {r['context_preview']}")
print()
if errors_found >= limit_val:
print(f"Reached limit of {limit} errors. Stopping.")
return errors_found
total_offset += len(line)
if errors_found == 0:
print("No invalid UTF-8 bytes found. 🎉")
else:
print(f"Found {errors_found} invalid UTF-8 byte(s).")
return errors_found
# -------------------------
# Whole-file conversion
# -------------------------
def detect_encoding_text(path: str) -> Tuple[str, str]:
"""
Use charset-normalizer to detect file encoding.
Return (encoding_name, decoded_text). Falls back to cp1252 if needed.
"""
try:
from charset_normalizer import from_path
except ImportError:
print(
"Please install charset-normalizer: pip install charset-normalizer",
file=sys.stderr,
)
sys.exit(4)
matches = from_path(path)
match = matches.best()
if match is None or match.encoding is None:
# Fallback heuristic for Western single-byte text
with open(path, "rb") as fb:
data = fb.read()
try:
return "cp1252", data.decode("cp1252", errors="strict")
except UnicodeDecodeError:
print("Unable to detect encoding reliably.", file=sys.stderr)
sys.exit(5)
return match.encoding, str(match)
def convert_to_utf8(src_path: str, out_path: str, src_encoding: str = None) -> str:
"""
Convert an entire file to UTF-8 (re-decoding everything).
If src_encoding is provided, use it; else auto-detect.
Returns the encoding actually used.
"""
if src_encoding:
with open(src_path, "rb") as fb:
data = fb.read()
try:
text = data.decode(src_encoding, errors="strict")
except LookupError:
print(f"Unknown encoding: {src_encoding}", file=sys.stderr)
sys.exit(6)
except UnicodeDecodeError as e:
print(f"Decoding failed with {src_encoding}: {e}", file=sys.stderr)
sys.exit(7)
used = src_encoding
else:
used, text = detect_encoding_text(src_path)
with open(out_path, "w", encoding="utf-8", newline="") as fw:
fw.write(text)
return used
def verify_utf8_file(path: str) -> Tuple[bool, str]:
try:
with open(path, "rb") as fb:
fb.read().decode("utf-8", errors="strict")
return True, ""
except UnicodeDecodeError as e:
return False, str(e)
# -------------------------
# Targeted single-byte fixer
# -------------------------
def iter_lines_with_offsets(b: bytes):
"""
Yield (line_bytes, line_start_abs_offset). Preserves LF/CRLF/CR in bytes.
"""
start = 0
for i, byte in enumerate(b):
if byte == 0x0A: # LF
yield b[start : i + 1], start
start = i + 1
if start < len(b):
yield b[start:], start
def detect_probable_fallbacks() -> List[str]:
# Good defaults for Western/Portuguese text
return ["cp1252", "iso-8859-1", "iso-8859-15"]
def repair_mixed_utf8_line(line: bytes, base_offset: int, fallback_chain: List[str]):
"""
Strictly validate UTF-8 and fix *only* the exact offending byte when an error occurs.
This avoids touching adjacent valid UTF-8 (prevents mojibake such as 'é' turning into 'Ã©').
"""
out_fragments: List[str] = []
fixes = []
pos = 0
n = len(line)
while pos < n:
dec = codecs.getincrementaldecoder("utf-8")("strict")
try:
s = dec.decode(line[pos:], final=True)
out_fragments.append(s)
break
except UnicodeDecodeError as e:
# Append the valid prefix before the error
if e.start > 0:
out_fragments.append(
line[pos : pos + e.start].decode("utf-8", errors="strict")
)
bad_index = pos + e.start # absolute index in 'line'
bad_slice = line[bad_index : bad_index + 1] # FIX EXACTLY ONE BYTE
# Decode that single byte using the first working fallback
decoded = None
used_enc = None
for enc in fallback_chain:
try:
decoded = bad_slice.decode(enc, errors="strict")
used_enc = enc
break
except Exception:
continue
if decoded is None:
# latin-1 always succeeds (byte->same code point)
decoded = bad_slice.decode("latin-1")
used_enc = "latin-1 (fallback)"
out_fragments.append(decoded)
# Log the fix
col_1based = bad_index + 1 # byte-based column
fixes.append(
{
"line_base_offset": base_offset,
"line": None, # caller fills line number
"column": col_1based,
"abs_offset": base_offset + bad_index,
"bad_bytes_hex": bad_slice.hex(),
"used_encoding": used_enc,
"replacement_preview": decoded,
}
)
# Advance exactly one byte past the offending byte and continue
pos = bad_index + 1
return "".join(out_fragments), fixes
def targeted_fix_to_utf8(
src_path: str,
out_path: str,
fallback_chain: List[str],
dry_run: bool,
max_fixes: int,
):
with open(src_path, "rb") as fb:
data = fb.read()
total_fixes = 0
repaired_lines: List[str] = []
line_no = 0
max_val = max_fixes if max_fixes != 0 else float("inf")
for line_bytes, base_offset in iter_lines_with_offsets(data):
line_no += 1
# Fast path: keep lines that are already valid UTF-8
try:
repaired_lines.append(line_bytes.decode("utf-8", errors="strict"))
continue
except UnicodeDecodeError:
pass
fixed_text, fixes = repair_mixed_utf8_line(
line_bytes, base_offset, fallback_chain=fallback_chain
)
for f in fixes:
f["line"] = line_no
repaired_lines.append(fixed_text)
# Log fixes
for f in fixes:
total_fixes += 1
print(
f"[FIX {total_fixes}] Line {f['line']}, Column {f['column']}, Abs offset {f['abs_offset']}"
)
print(f" Bad bytes: 0x{f['bad_bytes_hex']}")
print(f" Used encoding: {f['used_encoding']}")
preview = f["replacement_preview"].replace("\r", "\\r").replace("\n", "\\n")
if len(preview) > 40:
preview = preview[:40] + "…"
print(f" Replacement preview: {preview}")
print()
if total_fixes >= max_val:
print(f"Reached max fixes limit ({max_fixes}). Stopping scan.")
break
if total_fixes >= max_val:
break
if dry_run:
print(f"Dry run complete. Detected {total_fixes} fix(es). No file written.")
return total_fixes
# Join and verify result can be encoded to UTF-8
repaired_text = "".join(repaired_lines)
try:
repaired_text.encode("utf-8", errors="strict")
except UnicodeEncodeError as e:
print(f"Internal error: repaired text not valid UTF-8: {e}", file=sys.stderr)
sys.exit(3)
with open(out_path, "w", encoding="utf-8", newline="") as fw:
fw.write(repaired_text)
print(f"Fixed file written to: {out_path}")
print(f"Total fixes applied: {total_fixes}")
return total_fixes
# -------------------------
# CLI
# -------------------------
def main():
ap = argparse.ArgumentParser(
description=(
"Scan for invalid UTF-8; optionally convert whole file or fix only invalid bytes.\n\n"
"By default, --convert and --fix **edit the input file in place** and create a backup "
"named '<input>.bak' before writing. If you pass --output, the original file is left "
"unchanged and no backup is created. Use --dry-run to preview fixes without writing."
),
formatter_class=argparse.RawTextHelpFormatter,
)
ap.add_argument("path", help="Path to the CSV/text file")
ap.add_argument(
"--context",
type=int,
default=20,
help="Bytes of context to show around errors (default: 20)",
)
ap.add_argument(
"--limit",
type=int,
default=100,
help="Max errors to report during scan (0 = unlimited)",
)
ap.add_argument(
"--skip-scan", action="store_true", help="Skip initial scan for speed"
)
# Whole-file convert
ap.add_argument(
"--convert",
action="store_true",
help="Convert entire file to UTF-8 using auto/forced encoding "
"(in-place by default; creates '<input>.bak').",
)
ap.add_argument(
"--encoding",
help="Force source encoding for --convert or first fallback for --fix",
)
ap.add_argument(
"--output",
help="Write to this path instead of in-place (no .bak is created in that case)",
)
# Targeted fix
ap.add_argument(
"--fix",
action="store_true",
help="Fix only invalid byte(s) via fallback encodings "
"(in-place by default; creates '<input>.bak').",
)
ap.add_argument(
"--fallbacks",
help="Comma-separated fallback encodings (default: cp1252,iso-8859-1,iso-8859-15)",
)
ap.add_argument(
"--dry-run",
action="store_true",
help="(fix) Print fixes but do not write or create a .bak",
)
ap.add_argument(
"--max-fixes",
type=int,
default=0,
help="(fix) Stop after N fixes (0 = unlimited)",
)
args = ap.parse_args()
path = args.path
if not os.path.isfile(path):
print(f"File not found: {path}", file=sys.stderr)
sys.exit(2)
# Optional scan first
if not args.skip_scan:
scan_file_for_utf8_errors(path, context=args.context, limit=args.limit)
# Mode selection guards
if args.convert and args.fix:
print("Choose either --convert or --fix (not both).", file=sys.stderr)
sys.exit(9)
if not args.convert and not args.fix and args.skip_scan:
print("No action selected (use --convert or --fix).")
return
if not args.convert and not args.fix:
# User only wanted a scan
return
# Determine output path and backup behavior
# In-place by default: create '<input>.bak' before overwriting.
if args.output:
out_path = args.output
in_place = False
else:
out_path = path
in_place = True
# CONVERT mode
if args.convert:
print("\n[CONVERT MODE] Converting file to UTF-8...")
if in_place:
# Create backup before overwriting original
backup_path = path + ".bak"
shutil.copy2(path, backup_path)
print(f"Backup created: {backup_path}")
used = convert_to_utf8(path, out_path, src_encoding=args.encoding)
print(f"Source encoding used: {used}")
print(f"Saved UTF-8 file as: {out_path}")
ok, err = verify_utf8_file(out_path)
if ok:
print("Verification: output is valid UTF-8 ✅")
else:
print(f"Verification failed: {err}")
sys.exit(8)
return
# FIX mode (targeted, single-byte)
if args.fix:
print("\n[FIX MODE] Fixing only invalid bytes to UTF-8...")
if args.dry_run:
# Dry-run: never write or create backup
out_path_effective = os.devnull
in_place_effective = False
else:
out_path_effective = out_path
in_place_effective = in_place
# Build fallback chain (if --encoding provided, try it first)
if args.fallbacks:
fallback_chain = [e.strip() for e in args.fallbacks.split(",") if e.strip()]
else:
fallback_chain = detect_probable_fallbacks()
if args.encoding and args.encoding not in fallback_chain:
fallback_chain = [args.encoding] + fallback_chain
if in_place_effective:
# Create backup before overwriting original (only when actually writing)
backup_path = path + ".bak"
shutil.copy2(path, backup_path)
print(f"Backup created: {backup_path}")
fix_count = targeted_fix_to_utf8(
path,
out_path_effective,
fallback_chain=fallback_chain,
dry_run=args.dry_run,
max_fixes=args.max_fixes,
)
if not args.dry_run:
ok, err = verify_utf8_file(out_path_effective)
if ok:
print("Verification: output is valid UTF-8 ✅")
print(f"Fix mode completed — {fix_count} byte(s) corrected.")
else:
print(f"Verification failed: {err}")
sys.exit(8)
return
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,93 @@
#!/usr/bin/env python
import logging
import os
import csv
def _main():
input_csv_file_path = "base_reverse_dns.csv"
base_reverse_dns_map_file_path = "base_reverse_dns_map.csv"
known_unknown_list_file_path = "known_unknown_base_reverse_dns.txt"
psl_overrides_file_path = "psl_overrides.txt"
output_csv_file_path = "unknown_base_reverse_dns.csv"
csv_headers = ["source_name", "message_count"]
output_rows = []
logging.basicConfig()
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
for p in [
input_csv_file_path,
base_reverse_dns_map_file_path,
known_unknown_list_file_path,
psl_overrides_file_path,
]:
if not os.path.exists(p):
logger.error(f"{p} does not exist")
exit(1)
logger.info(f"Loading {known_unknown_list_file_path}")
known_unknown_domains = []
with open(known_unknown_list_file_path) as f:
for line in f.readlines():
domain = line.lower().strip()
if domain in known_unknown_domains:
logger.warning(
f"{domain} is in {known_unknown_list_file_path} multiple times"
)
else:
known_unknown_domains.append(domain)
logger.info(f"Loading {psl_overrides_file_path}")
psl_overrides = []
with open(psl_overrides_file_path) as f:
for line in f.readlines():
domain = line.lower().strip()
if domain in psl_overrides:
logger.warning(
f"{domain} is in {psl_overrides_file_path} multiple times"
)
else:
psl_overrides.append(domain)
logger.info(f"Loading {base_reverse_dns_map_file_path}")
known_domains = []
with open(base_reverse_dns_map_file_path) as f:
for row in csv.DictReader(f):
domain = row["base_reverse_dns"].lower().strip()
if domain in known_domains:
logger.warning(
f"{domain} is in {base_reverse_dns_map_file_path} multiple times"
)
else:
known_domains.append(domain)
if domain in known_unknown_domains:
logger.warning(
f"{domain} is in {known_unknown_list_file_path} "
f"and {base_reverse_dns_map_file_path}"
)
logger.info(f"Checking domains against {base_reverse_dns_map_file_path}")
with open(input_csv_file_path) as f:
for row in csv.DictReader(f):
domain = row["source_name"].lower().strip()
if domain == "":
continue
for psl_domain in psl_overrides:
if domain.endswith(psl_domain):
domain = psl_domain
break
if domain not in known_domains and domain not in known_unknown_domains:
logger.info(f"New unknown domain found: {domain}")
output_rows.append(row)
logger.info(f"Writing {output_csv_file_path}")
with open(output_csv_file_path, "w") as f:
writer = csv.DictWriter(f, fieldnames=csv_headers)
writer.writeheader()
writer.writerows(output_rows)
if __name__ == "__main__":
_main()
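The `endswith` collapse against `psl_overrides.txt` is the subtle part; a standalone sketch using two entries from that file (the observed subdomain is hypothetical):

```python
# Collapse any observed domain that ends with a PSL override entry
# down to the override itself, mirroring the loop in _main() above.
psl_overrides = ["amazonaws.com", "plesk.page"]  # entries from psl_overrides.txt
domain = "ec2-203-0-113-7.compute-1.amazonaws.com"
for psl_domain in psl_overrides:
    if domain.endswith(psl_domain):
        domain = psl_domain
        break
print(domain)  # amazonaws.com
```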

View File

@@ -0,0 +1,309 @@
185.in-addr.arpa
190.in-addr.arpa
200.in-addr.arpa
9services.com
a7e.ru
a94434500-blog.com
abv-10.top
adlucrumnewsletter.com
admin.corpivensa.gob.ve
advantageiq.com
advrider.ro
aerospacevitro.us.com
agenturserver.de
aghories.com
ai270.net
albagroup-eg.com
alchemy.net
anchorfundhub.com
anglishment.com
anteldata.net.uy
antis.edu
antonaoll.com
anviklass.org
anwrgrp.lat
aosau.net
arandomserver.com
aransk.ru
ardcs.cn
as29550.net
asmecam.it
aurelienvos.com
automatech.lat
avistaadvantage.com
b8sales.com
bearandbullmarketnews.com
bestinvestingtime.com
biocorp.com
bitter-echo.com
blguss.com
bluenet.ch
bluhosting.com
bodiax.pp.ua
bost-law.com
brainity.com
brnonet.cz
brushinglegal.de
brw.net
budgeteasehub.com
buoytoys.com
cashflowmasterypro.com
cavabeen.com
cbti.net
checkpox.fun
chegouseuvlache.org
christus.mx
ckaik.cn
cloud-edm.com
cloudaccess.net
cloudflare-email.org
cloudhosting.rs
cloudlogin.co
cnode.io
code-it.net
colombiaceropapel.org
commerceinsurance.com
coolblaze.com
coowo.com
corpemail.net
cp2-myorderbox.com
cps.com.ar
ctla.co.kr
dastans.ru
datahost36.de
descarca-counter-strike.net
detrot.xyz
digi.net.my
dinofelis.cn
diwkyncbi.top
dkginternet.com
dns-oid.com
domconfig.com
doorsrv.com
dreampox.fun
dreamtechmedia.com
ds.network
dvj.theworkpc.com
dwlcka.com
dyntcorp.com
economiceagles.com
egosimail.com
emailgids.net
emailperegrine.com
entretothom.net
epsilon-group.com
erestaff.com
example.com
exposervers.com-new
eyecandyhosting.xyz
fetscorp.shop
fin-crime.com
financeaimpoint.com
financeupward.com
flex-video.bnr.la
formicidaehunt.net
fosterheap.com
frontiernet.net
gendns.com
getthatroi.com
gigidea.net
giize.com
ginous.eu.com
gist-th.com
gophermedia.com
gqlists.us.com
gratzl.de
greatestworldnews.com
greennutritioncare.com
h-serv.co.uk
hgnbroken.us.com
hosting1337.com
hostinglotus.cloud
hostingmichigan.com
hostiran.name
hostmnl.com
hostname.localhost
hostnetwork.com
hosts.net.nz
hostwhitelabel.com
hpms1.jp
hypericine.com
i-mecca.net
iaasdns.com
iam.net.ma
iconmarketingguy.com
idcfcloud.net
idealconcept.live
igppevents.org.uk
imjtmn.cn
immenzaces.com
inshaaegypt.com
ip-147-135-108.us
ip-178-33-109.eu
ip-ptr.tech
itsidc.com
itwebs.com
ivol.co
jalanet.co.id
jimishare.com
jumanra.org
kahlaa.com
kbronet.com.tw
kdnursing.org
kihy.theworkpc.com
kitchenaildbd.com
layerdns.cloud
legenditds.com
lighthouse-media.com
listertermoformadoa.com
llsend.com
lohkal.com
lonestarmm.net
longwoodmgmt.com
lwl-puehringer.at
lynx.net.lb
magnetmail.net
mail-fire.com
manhattanbulletpoint.com
manpowerservices.com
marketmysterycode.com
marketwizardspro.com
masterclassjournal.com
matroguel.cam
maximpactipo.com
mechanicalwalk.store
mediavobis.com
misorpresa.com
moderntradingnews.com
moonjaws.com
morningnewscatcher.com
motion4ever.net
mschosting.com
msdp1.com
mspnet.pro
mts-nn.ru
mxserver.ro
mxthunder.net
myrewards.net
mysagestore.com
mysecurewebserver.com
myvps.jp
name.tools
nanshenqfurniture.com
nask.pl
ncport.ru
ncsdi.ws
nebdig.com
neovet-base.ru
netbri.com
netkl.org
newinvestingguide.com
newwallstreetcode.com
ngvcv.cn
nic.name
nidix.net
nlscanme.com
nmeuh.cn
noisndametal.com
nwo.giize.com
offerslatedeals.com
office365.us
ogicom.net
omegabrasil.inf.br
onnet21.com
oppt-ac.fit
orbitel.net.co
ovaltinalization.co
overta.ru
panaltyspot.space
passionatesmiles.com
perimetercenter.net
permanentscreen.com
phdns3.es
planethoster.net
plesk.page
pmnhost.net
pokupki5.ru
popiup.com
ports.net
prima.com.ar
prima.net.ar
prohealthmotion.com
proudserver.com
psnm.ru
pvcwindowsprices.live
qontenciplc.autos
quatthonggiotico.com
rapidns.com
raxa.host
reliablepanel.com
rgb365.eu
riddlecamera.net
rwdhosting.ca
s500host.com
sahacker-2020.com
samsales.site
saransk.ru
satirogluet.com
seaspraymta3.net
secorp.mx
securen.net
securerelay.in
securev.net
servershost.biz
silvestrejaguar.sbs
silvestreonca.sbs
siriuscloud.jp
sisglobalresearch.com
smallvillages.com
smartape-vps.com
solusoftware.com
spiritualtechnologies.io
sprout.org
stableserver.net
stockepictigers.com
stockexchangejournal.com
suksangroup.com
sysop4.com
system.eu.com
szhongbing.com
t-jon.com
tecnoxia.net
tel-xyz.fit
tenkids.net
terminavalley.com
thaicloudsolutions.com
thaimonster.com
thepushcase.com
totaal.net
traderlearningcenter.com
tullostrucking.com
ultragate.com
unite.services
urawasl.com
us.servername.us
varvia.de
vendimetry.com
vibrantwellnesscorp.com
viviotech.us
vlflgl.com
volganet.ru
wallstreetsgossip.com
wealthexpertisepro.com
web-login.eu
weblinkinternational.com
webnox.io
welllivinghive.com
wisdomhard.com
wisewealthcircle.com
wsiph2.com
xnt.mx
xpnuf.cn
xsfati.us.com
xspmail.jp
yourciviccompass.com
yourinvestworkbook.com
yoursitesecure.net
zerowebhosting.net
znlc.jp
ztomy.com

View File

@@ -0,0 +1,6 @@
akura.ne.jp
amazonaws.com
cloudaccess.net
h-serv.co.uk
linode.com
plesk.page

View File

@@ -0,0 +1,59 @@
#!/usr/bin/env python3
import os
import csv
maps_dir = os.path.join(".")
map_files = ["base_reverse_dns_map.csv"]
list_files = ["known_unknown_base_reverse_dns.txt", "psl_overrides.txt"]
def sort_csv(filepath, column=0):
with open(filepath, mode="r", newline="") as infile:
reader = csv.reader(infile)
header = next(reader)
sorted_rows = sorted(reader, key=lambda row: row[column])
existing_values = []
for row in sorted_rows:
if row[column] in existing_values:
print(f"Warning: {row[column]} is in {filepath} multiple times")
else:
existing_values.append(row[column])
with open(filepath, mode="w", newline="\n") as outfile:
writer = csv.writer(outfile)
writer.writerow(header)
writer.writerows(sorted_rows)
def sort_list_file(
filepath,
lowercase=True,
strip=True,
deduplicate=True,
remove_blank_lines=True,
ending_newline=True,
newline="\n",
):
with open(filepath, mode="r", newline=newline) as infile:
lines = infile.readlines()
for i in range(len(lines)):
if lowercase:
lines[i] = lines[i].lower()
if strip:
lines[i] = lines[i].strip()
if deduplicate:
lines = list(set(lines))
if remove_blank_lines:
while "" in lines:
lines.remove("")
lines = sorted(lines)
if ending_newline:
if lines[-1] != "":
lines.append("")
with open(filepath, mode="w", newline=newline) as outfile:
outfile.write("\n".join(lines))
for csv_file in map_files:
sort_csv(os.path.join(maps_dir, csv_file))
for list_file in list_files:
sort_list_file(os.path.join(maps_dir, list_file))

View File

@@ -5,7 +5,7 @@ import json
import urllib3
import requests
from parsedmarc import __version__
from parsedmarc.constants import USER_AGENT
from parsedmarc.log import logger
from parsedmarc.utils import human_timestamp_to_unix_timestamp
@@ -51,7 +51,7 @@ class HECClient(object):
self._common_data = dict(host=self.host, source=self.source, index=self.index)
self.session.headers = {
"User-Agent": "parsedmarc/{0}".format(__version__),
"User-Agent": USER_AGENT,
"Authorization": "Splunk {0}".format(self.access_token),
}

View File

@@ -19,10 +19,11 @@ import csv
import io
try:
import importlib.resources as pkg_resources
from importlib.resources import files
except ImportError:
# Try backported to PY<37 `importlib_resources`
import importlib_resources as pkg_resources
# Fall back to the `importlib_resources` backport on Python < 3.9
from importlib_resources import files
from dateutil.parser import parse as parse_date
import dns.reversename
@@ -36,7 +37,7 @@ import requests
from parsedmarc.log import logger
import parsedmarc.resources.dbip
import parsedmarc.resources.maps
from parsedmarc.constants import USER_AGENT
parenthesis_regex = re.compile(r"\s*\(.*\)\s*")
@@ -280,14 +281,13 @@ def get_ip_address_country(ip_address, db_path=None):
break
if db_path is None:
with pkg_resources.path(
parsedmarc.resources.dbip, "dbip-country-lite.mmdb"
) as path:
db_path = path
db_path = str(
files(parsedmarc.resources.dbip).joinpath("dbip-country-lite.mmdb")
)
db_age = datetime.now() - datetime.fromtimestamp(os.stat(db_path).st_mtime)
if db_age > timedelta(days=30):
logger.warning("IP database is more than a month old")
db_age = datetime.now() - datetime.fromtimestamp(os.stat(db_path).st_mtime)
if db_age > timedelta(days=30):
logger.warning("IP database is more than a month old")
db_reader = geoip2.database.Reader(db_path)
@@ -344,21 +344,30 @@ def get_service_from_reverse_dns_base_domain(
if not (offline or always_use_local_file) and len(reverse_dns_map) == 0:
try:
logger.debug(f"Trying to fetch " f"reverse DNS map from {url}...")
csv_file.write(requests.get(url).text)
logger.debug(f"Trying to fetch reverse DNS map from {url}...")
headers = {"User-Agent": USER_AGENT}
response = requests.get(url, headers=headers)
response.raise_for_status()
csv_file.write(response.text)
csv_file.seek(0)
load_csv(csv_file)
except requests.exceptions.RequestException as e:
logger.warning(f"Failed to fetch reverse DNS map: {e}")
except Exception:
logger.warning("Not a valid CSV file")
csv_file.seek(0)
logger.debug("Response body:")
logger.debug(csv_file.read())
if len(reverse_dns_map) == 0:
logger.info("Loading included reverse DNS map...")
with pkg_resources.path(
parsedmarc.resources.maps, "base_reverse_dns_map.csv"
) as path:
if local_file_path is not None:
path = local_file_path
with open(path) as csv_file:
load_csv(csv_file)
path = str(
files(parsedmarc.resources.maps).joinpath("base_reverse_dns_map.csv")
)
if local_file_path is not None:
path = local_file_path
with open(path) as csv_file:
load_csv(csv_file)
try:
service = reverse_dns_map[base_domain]
except KeyError:

View File

@@ -1,6 +1,7 @@
import requests
from parsedmarc import logger
from parsedmarc.constants import USER_AGENT
class WebhookClient(object):
@@ -21,7 +22,7 @@ class WebhookClient(object):
self.timeout = timeout
self.session = requests.Session()
self.session.headers = {
"User-Agent": "parsedmarc",
"User-Agent": USER_AGENT,
"Content-Type": "application/json",
}

View File

@@ -1,6 +1,6 @@
[build-system]
requires = [
"hatchling>=1.8.1",
"hatchling>=1.27.0",
]
build-backend = "hatchling.build"
@@ -59,7 +59,7 @@ dependencies = [
[project.optional-dependencies]
build = [
"hatch",
"hatch>=1.14.0",
"myst-parser[linkify]",
"nose",
"pytest",
@@ -76,9 +76,20 @@ parsedmarc = "parsedmarc.cli:_main"
Homepage = "https://domainaware.github.io/parsedmarc"
[tool.hatch.version]
path = "parsedmarc/__init__.py"
path = "parsedmarc/constants.py"
[tool.hatch.build.targets.sdist]
include = [
"/parsedmarc",
]
[tool.hatch.build]
exclude = [
"base_reverse_dns.csv",
"find_bad_utf8.py",
"find_unknown_base_reverse_dns.py",
"unknown_base_reverse_dns.csv",
"sortmaps.py",
"README.md",
"*.bak"
]

View File

@@ -0,0 +1,107 @@
<form version="1.1" theme="dark">
<label>SMTP TLS Reporting</label>
<fieldset submitButton="false" autoRun="true">
<input type="time" token="time">
<label></label>
<default>
<earliest>-7d@h</earliest>
<latest>now</latest>
</default>
</input>
<input type="text" token="organization_name" searchWhenChanged="true">
<label>Organization name</label>
<default>*</default>
<initialValue>*</initialValue>
</input>
<input type="text" token="policy_domain">
<label>Policy domain</label>
<default>*</default>
<initialValue>*</initialValue>
</input>
<input type="dropdown" token="policy_type" searchWhenChanged="true">
<label>Policy type</label>
<choice value="*">Any</choice>
<choice value="tlsa">tlsa</choice>
<choice value="sts">sts</choice>
<choice value="no-policy-found">no-policy-found</choice>
<default>*</default>
<initialValue>*</initialValue>
</input>
</fieldset>
<row>
<panel>
<title>Reporting organizations</title>
<table>
<search>
<query>index=email sourcetype=smtp:tls organization_name=$organization_name$ policies{}.policy_domain=$policy_domain$
| rename policies{}.policy_domain as policy_domain
| rename policies{}.policy_type as policy_type
| rename policies{}.failed_session_count as failed_sessions
| rename policies{}.failure_details{}.failed_session_count as failed_sessions
| rename policies{}.successful_session_count as successful_sessions
| rename policies{}.failure_details{}.sending_mta_ip as sending_mta_ip
| rename policies{}.failure_details{}.receiving_ip as receiving_ip
| rename policies{}.failure_details{}.receiving_mx_hostname as receiving_mx_hostname
| rename policies{}.failure_details{}.result_type as failure_type
| fillnull value=0 failed_sessions
| stats sum(failed_sessions) as failed_sessions sum(successful_sessions) as successful_sessions by organization_name
| sort 0 -successful_sessions</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
<panel>
<title>Domains</title>
<table>
<search>
<query>index=email sourcetype=smtp:tls organization_name=$organization_name$ policies{}.policy_domain=$policy_domain$
| rename policies{}.policy_domain as policy_domain
| rename policies{}.policy_type as policy_type
| rename policies{}.failed_session_count as failed_sessions
| rename policies{}.failure_details{}.failed_session_count as failed_sessions
| rename policies{}.successful_session_count as successful_sessions
| rename policies{}.failure_details{}.sending_mta_ip as sending_mta_ip
| rename policies{}.failure_details{}.receiving_ip as receiving_ip
| rename policies{}.failure_details{}.receiving_mx_hostname as receiving_mx_hostname
| rename policies{}.failure_details{}.result_type as failure_type
| fillnull value=0 failed_sessions
| stats sum(failed_sessions) as failed_sessions sum(successful_sessions) as successful_sessions by policy_domain
| sort 0 -successful_sessions</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Failure details</title>
<table>
<search>
<query>index=email sourcetype=smtp:tls organization_name=$organization_name$ policies{}.policy_domain=$policy_domain$ policies{}.failure_details{}.result_type=*
| rename policies{}.policy_domain as policy_domain
| rename policies{}.policy_type as policy_type
| rename policies{}.failed_session_count as failed_sessions
| rename policies{}.failure_details{}.failed_session_count as failed_sessions
| rename policies{}.successful_session_count as successful_sessions
| rename policies{}.failure_details{}.sending_mta_ip as sending_mta_ip
| rename policies{}.failure_details{}.receiving_ip as receiving_ip
| rename policies{}.failure_details{}.receiving_mx_hostname as receiving_mx_hostname
| fillnull value=0 failed_sessions
| rename policies{}.failure_details{}.result_type as failure_type
| table _time organization_name policy_domain policy_type failed_sessions successful_sessions sending_mta_ip receiving_ip receiving_mx_hostname failure_type
| sort 0 -_time</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
</form>