Compare commits


126 Commits

Author SHA1 Message Date
copilot-swe-agent[bot]
4219306365 Update Python 3.9 version table entry to note Debian 11/RHEL 9 usage
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2026-03-03 16:27:53 +00:00
copilot-swe-agent[bot]
a6e009c149 Drop Python 3.9 support: update CI matrix, pyproject.toml, docs, and README
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2026-03-03 16:20:34 +00:00
copilot-swe-agent[bot]
33384bd612 Initial plan 2026-03-03 16:18:42 +00:00
Sean Whalen
33eb2aaf62 9.1.0
## Enhancements

- Add TCP and TLS support for syslog output. (#656)
- Skip DNS lookups in GitHub Actions to prevent DNS timeouts during tests. (#657)
- Remove microseconds from DMARC aggregate report time ranges before parsing them.
2026-02-20 14:36:37 -05:00
Sean Whalen
1387fb4899 9.0.11
- Remove microseconds from DMARC aggregate report time ranges before parsing them.
2026-02-20 14:27:51 -05:00
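The fractional-second fix noted in 9.0.11 can be sketched as follows (a minimal illustration; the function name is hypothetical and this is not the project's exact implementation):

```python
from datetime import datetime

def strip_fractional_seconds(timestamp: str) -> str:
    """Remove fractional seconds (e.g. '.123456') from a timestamp string."""
    # Keep only whole seconds; fractional digits would break a
    # "%Y-%m-%d %H:%M:%S" parse
    return timestamp.split(".")[0]

raw = "2026-02-20 14:27:51.123456"
clean = strip_fractional_seconds(raw)
parsed = datetime.strptime(clean, "%Y-%m-%d %H:%M:%S")
print(parsed.isoformat())  # 2026-02-20T14:27:51
```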
Copilot
4d97bd25aa Skip DNS lookups in GitHub Actions to prevent test timeouts (#657)
* Add offline mode for tests in GitHub Actions to skip DNS lookups

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2026-02-18 18:19:28 -05:00
Copilot
17a612df0c Add TCP and TLS transport support to syslog module (#656)
- Updated parsedmarc/syslog.py to support UDP, TCP, and TLS protocols
- Added protocol parameter with UDP as default for backward compatibility
- Implemented TLS support with CA verification and client certificate auth
- Added retry logic for TCP/TLS connections with configurable attempts and delays
- Updated parsedmarc/cli.py with new config file parsing
- Updated documentation with examples for TCP and TLS configurations

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

* Remove CLI arguments for syslog options, keep config-file only

Per user request, removed command-line argument options for syslog parameters.
All new syslog options (protocol, TLS settings, timeout, retry) are now only
available via configuration file, consistent with other similar options.

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

* Fix code review issues: remove trailing whitespace and add cert validation

- Removed trailing whitespace from syslog.py and usage.md
- Added warning when only one of certfile_path/keyfile_path is provided
- Improved error handling for incomplete TLS client certificate configuration

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

* Set minimum TLS version to 1.2 for enhanced security

Explicitly configured ssl_context.minimum_version = TLSVersion.TLSv1_2
to ensure only secure TLS versions are used for syslog connections.

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2026-02-18 18:12:59 -05:00
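The TLS behavior described in this commit chain can be sketched as a context builder. This is a minimal sketch, not the library's actual code; the function name is hypothetical, though the TLS 1.2 floor, CA verification, and the warning on an incomplete client certificate configuration mirror the commit messages:

```python
import ssl
import warnings

def make_syslog_tls_context(ca_path=None, certfile_path=None, keyfile_path=None):
    """Build an SSL context for a TLS syslog connection."""
    # Verifies the server certificate against the given CA bundle
    # (or system defaults when ca_path is None)
    context = ssl.create_default_context(cafile=ca_path)
    # Explicitly refuse TLS versions older than 1.2, as in the commit above
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile_path and keyfile_path:
        # Optional client certificate authentication
        context.load_cert_chain(certfile_path, keyfile_path)
    elif certfile_path or keyfile_path:
        # Warn when only one half of the client cert pair is provided
        warnings.warn("Both certfile_path and keyfile_path are required")
    return context

ctx = make_syslog_tls_context()
print(ctx.minimum_version.name)  # TLSv1_2
```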
Blackmoon
221bc332ef Fixed a typo in policies.successful_session_count (#654) 2026-02-09 13:57:11 -05:00
Sean Whalen
a2a75f7a81 Fix timestamp parsing in aggregate report by removing fractional seconds 2026-01-21 13:08:48 -05:00
Anael Mobilia
50fcb51577 Update supported Python versions in docs + readme (#652)
* Update README.md

* Update index.md

* Update python-tests.yml
2026-01-19 14:40:01 -05:00
Sean Whalen
dd9ef90773 9.0.10
- Support Python 3.14+
2026-01-17 14:09:18 -05:00
Sean Whalen
0e3a4b0f06 9.0.9
Validate that a string is base64-encoded before trying to base64 decode it. (PRs #648 and #649)
2026-01-08 13:29:23 -05:00
maraspr
343b53ef18 remove newlines before b64decode (#649) 2026-01-08 12:24:20 -05:00
maraspr
792079a3e8 Validate that string is base64 (#648) 2026-01-08 10:15:27 -05:00
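The base64 validation added in #648 and #649 can be sketched like this (a hypothetical helper, not the library's actual code; it combines both PRs by stripping newlines and validating before decoding):

```python
import base64
import binascii

def safe_b64decode(data: str):
    """Decode only if the string is valid base64; return None otherwise."""
    # Strip newlines first, since MTAs often wrap encoded attachments
    cleaned = data.replace("\n", "").replace("\r", "")
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(cleaned, validate=True)
    except (binascii.Error, ValueError):
        return None
```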
Sean Whalen
1f3a1fc843 Better typing 2025-12-29 17:14:54 -05:00
Sean Whalen
34fa0c145d 9.0.8
- Fix logging configuration not propagating to child parser processes (#646).
- Update `mailsuite` dependency to `>=1.11.1` to solve issues with iCloud IMAP (#493).
2025-12-29 17:07:38 -05:00
Copilot
6719a06388 Fix logging configuration not propagating to child parser processes (#646)
* Initial plan

* Fix logging configuration propagation to child parser processes

- Add _configure_logging() helper function to set up logging in child processes
- Modified cli_parse() to accept log_level and log_file parameters
- Pass current logging configuration from parent to child processes
- Logging warnings/errors from child processes now properly display

Fixes issue where logging handlers in parent process were not inherited by
child processes created via multiprocessing.Process(). Child processes now
configure their own logging with the same settings as the parent.

Tested with sample files and confirmed warnings from DNS exceptions in child
processes are now visible.

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

* Address code review feedback on logging configuration

- Use exact type check (type(h) is logging.StreamHandler) instead of isinstance
  to avoid confusion with FileHandler subclass
- Catch specific exceptions (IOError, OSError, PermissionError) instead of
  bare Exception when creating FileHandler
- Kept logging.ERROR as default to maintain consistency with existing behavior

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2025-12-29 15:07:22 -05:00
Sean Whalen
eafa435868 Code cleanup 2025-12-29 14:32:05 -05:00
Sean Whalen
5d772c3b36 Bump version to 9.0.7 and update changelog with IMAP since option fix 2025-12-29 14:23:50 -05:00
Copilot
72cabbef23 Fix IMAP SEARCH SINCE date format to RFC 3501 DD-Mon-YYYY (#645)
* Initial plan

* Fix IMAP since option date format to use RFC 3501 compliant DD-Mon-YYYY format

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2025-12-29 14:18:48 -05:00
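The RFC 3501 date requirement can be illustrated with a small formatter (a hypothetical helper; the library may build the date differently internally):

```python
from datetime import datetime

def imap_since_date(dt: datetime) -> str:
    """Format a date as RFC 3501 requires for SEARCH SINCE: DD-Mon-YYYY."""
    # %b is locale-dependent, so spell out the English month abbreviations
    # to keep the IMAP command valid regardless of runtime locale
    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
    return f"{dt.day:02d}-{months[dt.month - 1]}-{dt.year}"

print(imap_since_date(datetime(2025, 12, 29)))  # 29-Dec-2025
```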
Sean Whalen
3d74cd6ac0 Update CHANGELOG with issue reference for email read status
Added a reference to issue #625 regarding email read status.
2025-12-29 12:10:19 -05:00
Tomáš Kováčik
d1ac59a016 fix #641 (#642)
* fix smtptls and forensic reports for GELF

* add policy_domain, policy_type and failed_session_count to record row

* Remove unused import of json in gelf.py

---------

Co-authored-by: Sean Whalen <44679+seanthegeek@users.noreply.github.com>
2025-12-29 12:05:07 -05:00
Anael Mobilia
7fdd53008f Update README.md (#644) 2025-12-29 10:36:21 -05:00
Sean Whalen
35331d4b84 Add parsedmarc.types module to API reference documentation 2025-12-25 17:24:45 -05:00
Sean Whalen
de9edd3590 Add note about email read status in Microsoft 365 to changelog 2025-12-25 17:16:39 -05:00
Sean Whalen
abf4bdba13 Add type annotations for SMTP TLS and forensic report structures 2025-12-25 16:39:33 -05:00
Sean Whalen
7b842740f5 Change file permissions for tests.py to make it executable 2025-12-25 16:02:33 -05:00
Sean Whalen
ebe3ccf40a Update changelog for version 9.0.6 and set version in constants.py 2025-12-25 16:01:25 -05:00
Sean Whalen
808285658f Refactor function parameters to use non-Optional types where applicable 2025-12-25 16:01:12 -05:00
Sean Whalen
bc1dae29bd Update mailsuite dependency version to 1.11.0 2025-12-25 15:32:27 -05:00
Sean Whalen
4b904444e5 Refactor and improve parsing and extraction functions
- Updated `extract_report` to handle various input types more robustly, removing unnecessary complexity and improving error handling.
- Simplified the handling of file-like objects and added checks for binary mode.
- Enhanced the `parse_report_email` function to streamline input processing and improve type handling.
- Introduced TypedDicts for better type safety in `utils.py`, specifically for reverse DNS and IP address information.
- Refined the configuration loading in `cli.py` to ensure boolean values are consistently cast to `bool`.
- Improved overall code readability and maintainability by restructuring and clarifying logic in several functions.
2025-12-25 15:30:20 -05:00
Sean Whalen
3608bce344 Remove unused import of Union and cast from cli.py 2025-12-24 16:53:22 -05:00
Sean Whalen
fe809c4c3f Add type ignore comments for Pyright in elastic.py and opensearch.py 2025-12-24 16:49:42 -05:00
Sean Whalen
a76c2f9621 More code cleanup 2025-12-24 16:36:59 -05:00
Sean Whalen
bb8f4002bf Use literal dicts instead of ordered dicts and other code cleanup 2025-12-24 15:04:10 -05:00
Sean Whalen
b5773c6b4a Fix etree import so type checkers don't complain 2025-12-24 14:37:38 -05:00
Sean Whalen
b99bd67225 Fix get_base_domain() typing 2025-12-24 14:32:05 -05:00
Sean Whalen
af9ad568ec Specify Python version requirements in pyproject.toml 2025-12-17 16:18:24 -05:00
Sean Whalen
748164d177 Fix #638 2025-12-17 16:09:26 -05:00
Sean Whalen
487e5e1149 Format on build 2025-12-12 15:56:52 -05:00
Sean Whalen
73010cf964 Use ruff for code formatting 2025-12-12 15:44:46 -05:00
Sean Whalen
a4a5475aa8 Fix another typo before releasing 9.0.5 2025-12-08 15:29:48 -05:00
Sean Whalen
dab78880df Actual 9.0.5 release
Fix typo
2025-12-08 15:26:58 -05:00
Sean Whalen
fb54e3b742 9.0.5
- Fix report type detection bug introduced in `9.0.4` (yanked).
2025-12-08 15:22:02 -05:00
Sean Whalen
6799f10364 9.0.4
Fixes

- Fix saving reports to OpenSearch ([#637](https://github.com/domainaware/parsedmarc/issues/637))
- Fix parsing certain DMARC failure/forensic reports
- Some fixes to type hints (incomplete, but published as-is due to the above bugs)
2025-12-08 13:26:59 -05:00
Sean Whalen
445c9565a4 Update bug link in docs 2025-12-06 15:05:19 -05:00
Sean Whalen
4b786846ae Remove Python 3.14 from testing
Until cpython bug https://github.com/python/cpython/issues/142307 is fixed
2025-12-05 11:05:29 -05:00
Sean Whalen
23ae563cd8 Update Python version support details in documentation 2025-12-05 10:48:04 -05:00
Sean Whalen
cdd000e675 9.0.3
- Set `requires-python` to `>=3.9, <3.14` to avoid [this bug](https://github.com/python/cpython/issues/142307)
2025-12-05 10:43:28 -05:00
Sean Whalen
7d58abc67b Add shebang and encoding declaration to tests.py 2025-12-04 10:21:53 -05:00
Sean Whalen
a18ae439de Fix typo in RHEL version support description in documentation 2025-12-04 10:18:15 -05:00
Sean Whalen
d7061330a8 Use None for blank fields in the Top 1000 Message Sources by Name DMARC Summary dashboard widget 2025-12-03 09:22:33 -05:00
Sean Whalen
9d5654b8ec Fix bugs with the Top 1000 Message Sources by Name DMARC Summary dashboard widget 2025-12-03 09:14:52 -05:00
Sean Whalen
a0e0070dd0 Bump version to 9.0.2 2025-12-02 20:12:58 -05:00
Sean Whalen
cf3b7f2c29 ## 9.0.2
## Improvements

- Type hinting is now used properly across the entire library. (#445)

## Fixes

- Decompress report files as needed when passed via the CLI.
- Fixed incomplete removal of the ability for `parsedmarc.utils.extract_report` to accept a file path directly in `8.15.0`.

## Breaking changes

This version of the library requires consumers to pass certain arguments as keyword-only. Internally, the API uses a bare `*` in the function signature. This is standard per [PEP 3102](https://peps.python.org/pep-3102/) and as documented in the Python Language Reference.
2025-12-02 19:41:14 -05:00
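The keyword-only breaking change can be demonstrated with a toy signature (the parameter names are illustrative, not the library's exact API):

```python
def parse_report(source, *, offline=False, always_use_local_files=False):
    """Everything after the bare * must be passed by keyword (PEP 3102)."""
    # Hypothetical signature; parameter names are illustrative only
    return {"source": source, "offline": offline}

# OK: keyword arguments after the *
result = parse_report("report.xml", offline=True)

# Passing a keyword-only argument positionally raises TypeError
try:
    parse_report("report.xml", True)
except TypeError:
    print("keyword-only arguments cannot be passed positionally")
```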
Sean Whalen
d312522ab7 Enhance type hints and argument formatting in multiple files for improved clarity and consistency 2025-12-02 17:06:57 -05:00
Sean Whalen
888d717476 Enhance type hints and argument formatting in utils.py for improved clarity and consistency 2025-12-02 16:21:30 -05:00
Sean Whalen
1127f65fbb Enhance type hints and argument formatting in webhook.py for improved clarity and consistency 2025-12-02 15:52:31 -05:00
Sean Whalen
d017dfcddf Enhance type hints and argument formatting across multiple files for improved clarity and consistency 2025-12-02 15:17:37 -05:00
Sean Whalen
5fae99aacc Enhance type hints for improved clarity and consistency in __init__.py, elastic.py, and opensearch.py 2025-12-02 14:14:06 -05:00
Sean Whalen
ba57368ac3 Refactor argument formatting and type hints in elastic.py for consistency 2025-12-02 13:13:25 -05:00
Sean Whalen
dc6ee5de98 Add type hints to methods in opensearch.py for improved clarity and type checking 2025-12-02 13:11:59 -05:00
Sean Whalen
158d63d205 Complete annotations on elastic.py 2025-12-02 12:59:03 -05:00
Oscar Mattsson
f1933b906c Fix 404 link to maxmind docs (#635) 2025-12-02 09:26:01 -05:00
Anael Mobilia
4b98d795ff Define minimal Python version on pyproject (#634) 2025-12-01 20:22:49 -05:00
Sean Whalen
b1356f7dfc 9.0.1
- Allow multiple `records` for the same aggregate DMARC report in Elasticsearch and Opensearch (fixes issue in 9.0.0)
- Fix typos
2025-12-01 18:57:23 -05:00
Sean Whalen
1969196e1a Switch CHANGELOG headers 2025-12-01 18:01:54 -05:00
Sean Whalen
553f15f6a9 Code formatting 2025-12-01 17:24:10 -05:00
Sean Whalen
1fc9f638e2 9.0.0 (#629)
* Normalize report volumes when a report timespan exceeds 24 hours
2025-12-01 17:06:58 -05:00
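The timespan normalization in 9.0.0 could look roughly like an even split of the message count across 24-hour slices (a hypothetical sketch; the actual weighting in the release may differ):

```python
from datetime import datetime, timedelta

def normalize_volume(begin: datetime, end: datetime, message_count: int):
    """Split a multi-day report's message count across 24-hour slices.

    Hypothetical proportional-split approach, not the project's exact code.
    """
    total_hours = (end - begin).total_seconds() / 3600
    if total_hours <= 24:
        # Reports within a single day are left untouched
        return [(begin, message_count)]
    slices = []
    cursor = begin
    while cursor < end:
        nxt = min(cursor + timedelta(hours=24), end)
        hours = (nxt - cursor).total_seconds() / 3600
        # Apportion the count by each slice's share of the total timespan
        slices.append((cursor, round(message_count * hours / total_hours)))
        cursor = nxt
    return slices
```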
Sean Whalen
48bff504b4 Fix build script to properly publish docs 2025-12-01 11:08:21 -05:00
Sean Whalen
681b7cbf85 Formatting 2025-12-01 10:56:08 -05:00
Sean Whalen
0922d6e83a Add supported Python versions to the documentation index 2025-12-01 10:24:19 -05:00
Sean Whalen
baf3f95fb1 Update README with clarification on Python 3.6 support 2025-12-01 10:20:56 -05:00
Anael Mobilia
a51f945305 Clearly define supported Python versions policy (#633)
* Clearly define supported Python versions.

Support policy based on author's comment on https://github.com/domainaware/parsedmarc/pull/458#issuecomment-2002516299 #458

* Compile Python 3.6 manually, as ubuntu-latest now runs Ubuntu 24.04, which does not ship Python 3.6, and the Ubuntu 20.04 runner is no longer available
https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json

* Use latest versions of GH Actions

* Silence some technical GH Actions steps

* Elasticsearch / opensearch: use supported versions + align used versions

* Delete .github/workflows/python-tests-3.6.yml

Drop Python 3.6 test

* Update Python 3.6 support status in README

---------

Co-authored-by: Sean Whalen <44679+seanthegeek@users.noreply.github.com>
2025-12-01 10:02:47 -05:00
Sean Whalen
55dbf8e3db Add sources by name table to the Kibana DMARC Summary dashboard
This matches the table in the Splunk DMARC Aggregate reports dashboard
2025-11-30 19:44:14 -05:00
Anael Mobilia
00267c9847 Codestyle cleanup (#631)
* Fix typos

* Copyright - Update date

* Codestyle xxx is False -> not xxx

* Ensure "_find_label_id_for_label" always returns a str

* PEP-8 : apiKey -> api_key + backward compatibility for config files

* Duplicate variable initialization

* Fix format
2025-11-30 19:13:57 -05:00
Anael Mobilia
51356175e1 Get option on the type described on documentation (#632) 2025-11-30 19:00:04 -05:00
Anael Mobilia
3be10d30dd Fix warnings in docker-compose.yml (#630)
* Fix level=warning msg="...\parsedmarc\docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion"

* Fix "Unquoted port mapping not recommended"
2025-11-30 18:59:01 -05:00
Sean Whalen
98342ecac6 8.19.1 (#627)
- Ignore HTML content type in report email parsing (#626)
2025-11-29 11:37:31 -05:00
Sean Whalen
38a3d4eaae Code formatting 2025-11-28 12:48:55 -05:00
Sean Whalen
a05c230152 8.19.0 (#622)
8.19.0

- Add multi-tenant support via an index-prefix domain mapping file
- PSL overrides so that services like AWS are correctly identified
- Additional improvements to report type detection
- Fix webhook timeout parsing (PR #623)
- Output to STDOUT when the new general config boolean `silent` is set to `False` (Close #614)
- Additional services added to `base_reverse_dns_map.csv`

---------

Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Félix <felix.debloisbeaucage@gmail.com>
2025-11-28 12:47:00 -05:00
Sean Whalen
17bdc3a134 More tests cleanup 2025-11-21 09:10:59 -05:00
Sean Whalen
858be00f22 Fix badge links and update image source branch 2025-11-21 09:03:04 -05:00
Sean Whalen
597ca64f9f Clean up tests 2025-11-21 00:09:28 -05:00
Sean Whalen
c5dbe2c4dc 8.18.9
- Complete fix for #687 and more robust report type detection
2025-11-20 23:50:42 -05:00
Sean Whalen
082b3d355f 8.18.8
- Fix parsing emails with an uncompressed aggregate report attachment (Closes #607)
- Add `--no-prettify-json` CLI option (PR #617)
2025-11-20 20:47:57 -05:00
Sean Whalen
2a7ce47bb1 Update code coverage badge link to main branch 2025-11-20 20:28:10 -05:00
daminoux
9882405d96 Update README.md fix url screenshot (#620)
the URL of the screenshot is broken
2025-11-20 20:27:15 -05:00
Andrew
fce84763b9 add --no-prettify-json CLI option (#617)
* updates process_reports to respect newly added prettify_json option

* removes duplicate definition

* removes redundant option

* fixes typo
2025-11-02 15:54:59 -05:00
Rowan
8a299b8600 Updated default python docker base image to 3.13-slim (#618)
* Updated default python docker base image to 3.13-slim

* Added python 3.13 to tests
2025-10-29 22:34:06 -04:00
jandr
b4c2b21547 Sorted usage of TLS on SMTP (#613)
Added a line for the `email_results` function to take into account the smtp_ssl setting.
2025-08-25 13:51:10 -04:00
Sean Whalen
865c249437 Update features list 2025-08-24 13:39:50 -04:00
Sean Whalen
013859f10e Fix find_unknown_base_reverse_dns.py 2025-08-19 21:18:14 -04:00
Sean Whalen
6d4a31a120 Fix find_unknown_base_reverse_dns.py and sortlist.py 2025-08-19 20:59:42 -04:00
Sean Whalen
45d3dc3b2e Fix sortlists.py 2025-08-19 20:23:55 -04:00
Sean Whalen
4bbd97dbaa Improve list verification 2025-08-19 20:02:55 -04:00
Sean Whalen
5df152d469 Refactor find_unknown_base_reverse_dns.py 2025-08-18 12:59:54 -04:00
Sean Whalen
d990bef342 Use \n here too 2025-08-17 21:08:28 -04:00
Sean Whalen
caf77ca6d4 Use \n when writing CSVs 2025-08-17 21:01:07 -04:00
Sean Whalen
4b3d32c5a6 Actual, actual, actual 8.18.7 release
Revert to using Python's csv module instead of pandas to avoid numpy conflicts with the elasticsearch package
2025-08-17 20:36:15 -04:00
Sean Whalen
5df5c10f80 Pin pandas and numpy versions 2025-08-17 19:59:53 -04:00
Sean Whalen
308d4657ab Make sort_csv function more flexible 2025-08-17 19:43:19 -04:00
Sean Whalen
0f74e33094 Fix typo 2025-08-17 19:35:16 -04:00
Sean Whalen
9f339e11f5 Actual 8.18.7 release 2025-08-17 19:34:14 -04:00
Sean Whalen
391e84b717 Fix map sorting 2025-08-17 18:15:20 -04:00
Sean Whalen
8bf06ce5af 8.18.7
Removed improper spaces from `base_reverse_dns_map.csv` (Closes #612)
2025-08-17 18:13:49 -04:00
Sean Whalen
2b7ae50a27 Better wording 2025-08-17 17:01:22 -04:00
Sean Whalen
3feb478793 8.18.6
- Fix since option to correctly work with weeks (PR #604)
- Add 183 entries to `base_reverse_dns_map.csv`
- Add 57 entries to `known_unknown_base_reverse_dns.txt`
- Check for invalid UTF-8 bytes in `base_reverse_dns_map.csv` at build
- Remove unneeded items from the `parsedmarc.resources` module at build
2025-08-17 17:00:11 -04:00
Sean Whalen
01630bb61c Update code formatting 2025-08-17 16:01:45 -04:00
Sean Whalen
39347cb244 Add find_bad_utf8.py 2025-08-17 15:55:47 -05:00
Sean Whalen
ed25526d59 Update maps 2025-08-17 15:17:24 -04:00
alagendijk-minddistrict
880d7110fe Fix since option to correctly work with weeks (#604) 2025-08-14 18:39:04 -04:00
Martin Kjær Jørgensen
d62001f5a4 fix wrong configuration option for maildir (#606)
Signed-off-by: Martin Kjær Jørgensen <me@lagy.org>
2025-08-14 18:36:58 -04:00
Sean Whalen
0720bffcb6 Remove extra spaces 2025-06-10 19:05:06 -04:00
Sean Whalen
fecd55a97d Add SMTP TLS Reporting dashboard for Splunk
Closes #600
2025-06-10 18:54:43 -04:00
Sean Whalen
a121306eed Fix typo in the map 2025-06-10 10:53:55 -04:00
Sean Whalen
980c9c7904 Add Hostinger to the map 2025-06-10 10:50:06 -04:00
Sean Whalen
963f5d796f Fix build script 2025-06-10 09:51:12 -04:00
Sean Whalen
6532f3571b Update lists 2025-06-09 20:05:56 -04:00
Sean Whalen
ea878443a8 Update lists 2025-06-09 17:04:16 -04:00
Sean Whalen
9f6de41958 Update lists 2025-06-09 13:41:49 -04:00
Sean Whalen
119192701c Update lists 2025-06-09 12:02:50 -04:00
Sean Whalen
1d650be48a Fix typo 2025-06-08 21:41:07 -04:00
Sean Whalen
a85553fb18 Update lists 2025-06-08 21:40:10 -04:00
Sean Whalen
5975d8eb21 Fix sorting 2025-06-08 20:17:21 -04:00
Sean Whalen
87ae6175f2 Update lists 2025-06-08 19:51:13 -04:00
49 changed files with 4943 additions and 1585 deletions


@@ -24,11 +24,11 @@ jobs:
     steps:
       - name: Checkout repository
-        uses: actions/checkout@v3
+        uses: actions/checkout@v5
       - name: Docker meta
         id: meta
-        uses: docker/metadata-action@v3
+        uses: docker/metadata-action@v5
         with:
           images: |
             ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
@@ -40,16 +40,14 @@ jobs:
             type=semver,pattern={{major}}.{{minor}}
       - name: Log in to the Container registry
-        # https://github.com/docker/login-action/releases/tag/v2.0.0
-        uses: docker/login-action@49ed152c8eca782a232dede0303416e8f356c37b
+        uses: docker/login-action@v3
         with:
           registry: ${{ env.REGISTRY }}
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
       - name: Build and push Docker image
-        # https://github.com/docker/build-push-action/releases/tag/v3.0.0
-        uses: docker/build-push-action@e551b19e49efd4e98792db7592c17c09b89db8d8
+        uses: docker/build-push-action@v6
         with:
           context: .
           push: ${{ github.event_name == 'release' }}


@@ -15,7 +15,7 @@ jobs:
     services:
       elasticsearch:
-        image: elasticsearch:8.18.2
+        image: elasticsearch:8.19.7
         env:
           discovery.type: single-node
           cluster.name: parsedmarc-cluster
@@ -30,18 +30,18 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.9", "3.10", "3.11", "3.12"]
+        python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v5
       - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python-version }}
       - name: Install system dependencies
         run: |
-          sudo apt-get update
-          sudo apt-get install -y libemail-outlook-message-perl
+          sudo apt-get -q update
+          sudo apt-get -qy install libemail-outlook-message-perl
       - name: Install Python dependencies
         run: |
           python -m pip install --upgrade pip
@@ -65,6 +65,6 @@ jobs:
         run: |
           hatch build
       - name: Upload coverage to Codecov
-        uses: codecov/codecov-action@v4
+        uses: codecov/codecov-action@v5
         with:
           token: ${{ secrets.CODECOV_TOKEN }}

.gitignore

@@ -106,7 +106,7 @@ ENV/
 .idea/
 # VS Code launch config
-.vscode/launch.json
+#.vscode/launch.json
 # Visual Studio Code settings
 #.vscode/
@@ -142,3 +142,6 @@ scratch.py
 parsedmarc/resources/maps/base_reverse_dns.csv
 parsedmarc/resources/maps/unknown_base_reverse_dns.csv
+parsedmarc/resources/maps/sus_domains.csv
+parsedmarc/resources/maps/unknown_domains.txt
+*.bak

.vscode/launch.json (new file)

@@ -0,0 +1,45 @@
{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python Debugger: Current File",
      "type": "debugpy",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal"
    },
    {
      "name": "tests.py",
      "type": "debugpy",
      "request": "launch",
      "program": "tests.py",
      "console": "integratedTerminal"
    },
    {
      "name": "sample",
      "type": "debugpy",
      "request": "launch",
      "module": "parsedmarc.cli",
      "args": ["samples/private/sample"]
    },
    {
      "name": "sortlists.py",
      "type": "debugpy",
      "request": "launch",
      "program": "sortlists.py",
      "cwd": "${workspaceFolder}/parsedmarc/resources/maps",
      "console": "integratedTerminal"
    },
    {
      "name": "find_unknown_base_reverse_dns.py",
      "type": "debugpy",
      "request": "launch",
      "program": "find_unknown_base_reverse_dns.py",
      "cwd": "${workspaceFolder}/parsedmarc/resources/maps",
      "console": "integratedTerminal"
    }
  ]
}

.vscode/settings.json

@@ -1,132 +1,166 @@
 {
+  "[python]": {
+    "editor.defaultFormatter": "charliermarsh.ruff",
+    "editor.formatOnSave": true,
+    // Let Ruff handle lint fixes + import sorting on save
+    "editor.codeActionsOnSave": {
+      "source.fixAll.ruff": "explicit",
+      "source.organizeImports.ruff": "explicit"
+    }
+  },
   "markdownlint.config": {
     "MD024": false
   },
   "cSpell.words": [
     ...
   ]
 }

(The remainder of the hunk adds entries such as "boto", "GELF", "geoip", "httplib", "ifhost", "imapclient", "infile", "kafkaclient", "loganalytics", "MAXHEADERS", "mhdw", "Mimecast", "multiprocess", "opensearch", "opensearchpy", "pbar", "privatesuffix", "publicsuffixlist", "pygelf", "pytest", "sdist", "sortlists", "sortmaps", and "Uncategorized" to the cSpell.words list.)

File diff suppressed because it is too large.


@@ -1,4 +1,4 @@
-ARG BASE_IMAGE=python:3.9-slim
+ARG BASE_IMAGE=python:3.13-slim
 ARG USERNAME=parsedmarc
 ARG USER_UID=1000
 ARG USER_GID=$USER_UID


@@ -9,7 +9,7 @@ Package](https://img.shields.io/pypi/v/parsedmarc.svg)](https://pypi.org/project
 [![PyPI - Downloads](https://img.shields.io/pypi/dm/parsedmarc?color=blue)](https://pypistats.org/packages/parsedmarc)
 <p align="center">
-<img src="https://github.com/domainaware/parsedmarc/raw/master/docs/source/_static/screenshots/dmarc-summary-charts.png?raw=true" alt="A screenshot of DMARC summary charts in Kibana"/>
+<img src="https://raw.githubusercontent.com/domainaware/parsedmarc/refs/heads/master/docs/source/_static/screenshots/dmarc-summary-charts.png?raw=true" alt="A screenshot of DMARC summary charts in Kibana"/>
 </p>
 `parsedmarc` is a Python module and CLI utility for parsing DMARC
@@ -23,25 +23,42 @@ ProofPoint Email Fraud Defense, and Valimail.
 ## Help Wanted
-This project is maintained by one developer. Please consider
-reviewing the open
-[issues](https://github.com/domainaware/parsedmarc/issues) to see how
-you can contribute code, documentation, or user support. Assistance on
-the pinned issues would be particularly helpful.
+This project is maintained by one developer. Please consider reviewing the open
+[issues](https://github.com/domainaware/parsedmarc/issues) to see how you can
+contribute code, documentation, or user support. Assistance on the pinned
+issues would be particularly helpful.
 Thanks to all
 [contributors](https://github.com/domainaware/parsedmarc/graphs/contributors)!
 ## Features
-- Parses draft and 1.0 standard aggregate/rua reports
-- Parses forensic/failure/ruf reports
-- Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail
-API
+- Parses draft and 1.0 standard aggregate/rua DMARC reports
+- Parses forensic/failure/ruf DMARC reports
+- Parses reports from SMTP TLS Reporting
+- Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
 - Transparently handles gzip or zip compressed reports
 - Consistent data structures
 - Simple JSON and/or CSV output
 - Optionally email the results
-- Optionally send the results to Elasticsearch, Opensearch, and/or Splunk, for use
-with premade dashboards
+- Optionally send the results to Elasticsearch, Opensearch, and/or Splunk, for
+  use with premade dashboards
 - Optionally send reports to Apache Kafka
+## Python Compatibility
+This project supports the following Python versions, which are either actively
+maintained or are the default versions for RHEL or Debian.
+| Version | Supported | Reason |
+|---------|-----------|------------------------------------------------------------|
+| < 3.6   | ❌ | End of Life (EOL) |
+| 3.6     | ❌ | Used in RHEL 8, but not supported by project dependencies |
+| 3.7     | ❌ | End of Life (EOL) |
+| 3.8     | ❌ | End of Life (EOL) |
+| 3.9     | ❌ | Used in Debian 11 and RHEL 9, but not supported by project dependencies |
+| 3.10    | ✅ | Actively maintained |
+| 3.11    | ✅ | Actively maintained; supported until June 2028 (Debian 12) |
+| 3.12    | ✅ | Actively maintained; supported until May 2035 (RHEL 10) |
+| 3.13    | ✅ | Actively maintained; supported until June 2030 (Debian 13) |
+| 3.14    | ✅ | Actively maintained |
@@ -9,17 +9,19 @@ fi
. venv/bin/activate
pip install .[build]
ruff format .
ruff check .
cd docs
make clean
make html
touch build/html/.nojekyll
if [ -d "../../parsedmarc-docs" ]; then
  cp -rf build/html/* ../../parsedmarc-docs/
fi
cd ..
cd parsedmarc/resources/maps
python3 sortlists.py
echo "Checking for invalid UTF-8 bytes in base_reverse_dns_map.csv"
python3 find_bad_utf8.py base_reverse_dns_map.csv
cd ../../..
python3 tests.py
rm -rf dist/ build/
hatch build
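The `find_bad_utf8.py` step in the build script scans a CSV for bytes that do not decode as UTF-8. Its source is not shown here, but the idea can be sketched as follows (the reporting format and exit codes are assumptions):

```python
import sys

def find_bad_utf8(path: str) -> list:
    """Return 1-based line numbers whose raw bytes fail strict UTF-8 decoding."""
    bad_lines = []
    with open(path, "rb") as handle:  # read bytes so decoding is explicit
        for line_number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8", errors="strict")
            except UnicodeDecodeError:
                bad_lines.append(line_number)
    return bad_lines

if __name__ == "__main__":
    bad = find_bad_utf8(sys.argv[1])
    for number in bad:
        print(f"invalid UTF-8 on line {number}")
    sys.exit(1 if bad else 0)
```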
ci.ini
@@ -3,6 +3,7 @@ save_aggregate = True
save_forensic = True
save_smtp_tls = True
debug = True
offline = True
[elasticsearch]
hosts = http://localhost:9200
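parsedmarc reads this file with Python's `configparser`; a minimal sketch of how the options above parse (the `[general]` section header is assumed from the standard parsedmarc config layout, since the excerpt starts partway into the file):

```python
from configparser import ConfigParser

config = ConfigParser()
config.read_string(
    """\
[general]
save_aggregate = True
save_forensic = True
save_smtp_tls = True
debug = True
offline = True

[elasticsearch]
hosts = http://localhost:9200
"""
)

# getboolean() accepts True/False, yes/no, on/off, and 1/0, case-insensitively
offline = config.getboolean("general", "offline")
```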
@@ -1,8 +1,6 @@
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.19.7
    environment:
      - network.host=127.0.0.1
      - http.host=0.0.0.0
@@ -14,7 +12,7 @@ services:
      - xpack.security.enabled=false
      - xpack.license.self_generated.type=basic
    ports:
      - "127.0.0.1:9200:9200"
    ulimits:
      memlock:
        soft: -1
@@ -30,7 +28,7 @@ services:
      retries: 24
  opensearch:
    image: opensearchproject/opensearch:2
    environment:
      - network.host=127.0.0.1
      - http.host=0.0.0.0
@@ -41,7 +39,7 @@ services:
      - bootstrap.memory_lock=true
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=${OPENSEARCH_INITIAL_ADMIN_PASSWORD}
    ports:
      - "127.0.0.1:9201:9200"
    ulimits:
      memlock:
        soft: -1
@@ -21,7 +21,6 @@
:members:
```
## parsedmarc.splunk
```{eval-rst}
@@ -29,6 +28,13 @@
:members:
```
## parsedmarc.types
```{eval-rst}
.. automodule:: parsedmarc.types
   :members:
```
## parsedmarc.utils
```{eval-rst}
@@ -20,7 +20,7 @@ from parsedmarc import __version__
# -- Project information -----------------------------------------------------
project = "parsedmarc"
copyright = "2018 - 2025, Sean Whalen and contributors"
author = "Sean Whalen and contributors"
# The version info for the project you're documenting, acts as replacement for
@@ -33,17 +33,36 @@ and Valimail.
## Features
- Parses draft and 1.0 standard aggregate/rua DMARC reports
- Parses forensic/failure/ruf DMARC reports
- Parses reports from SMTP TLS Reporting
- Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
- Transparently handles gzip or zip compressed reports
- Consistent data structures
- Simple JSON and/or CSV output
- Optionally email the results
- Optionally send the results to Elasticsearch, Opensearch, and/or Splunk, for use
  with premade dashboards
- Optionally send reports to Apache Kafka
## Python Compatibility
This project supports the following Python versions, which are either actively
maintained or are the default versions for RHEL or Debian.
| Version | Supported | Reason                                                                  |
|---------|-----------|-------------------------------------------------------------------------|
| < 3.6   | ❌        | End of Life (EOL)                                                       |
| 3.6     | ❌        | Used in RHEL 8, but not supported by project dependencies               |
| 3.7     | ❌        | End of Life (EOL)                                                       |
| 3.8     | ❌        | End of Life (EOL)                                                       |
| 3.9     | ❌        | Used in Debian 11 and RHEL 9, but not supported by project dependencies |
| 3.10    | ✅        | Actively maintained                                                     |
| 3.11    | ✅        | Actively maintained; supported until June 2028 (Debian 12)              |
| 3.12    | ✅        | Actively maintained; supported until May 2035 (RHEL 10)                 |
| 3.13    | ✅        | Actively maintained; supported until June 2030 (Debian 13)              |
| 3.14    | ✅        | Actively maintained                                                     |
```{toctree}
:caption: 'Contents'
:maxdepth: 2
@@ -162,10 +162,10 @@ sudo -u parsedmarc virtualenv /opt/parsedmarc/venv
```
CentOS/RHEL 8 systems use Python 3.6 by default, so on those systems
explicitly tell `virtualenv` to use `python3.10` instead
```bash
sudo -u parsedmarc virtualenv -p python3.10 /opt/parsedmarc/venv
```
Activate the virtualenv
@@ -199,7 +199,7 @@ sudo apt-get install libemail-outlook-message-perl
[geoipupdate releases page on github]: https://github.com/maxmind/geoipupdate/releases
[ip to country lite database]: https://db-ip.com/db/download/ip-to-country-lite
[license keys]: https://www.maxmind.com/en/accounts/current/license-key
[maxmind geoipupdate page]: https://dev.maxmind.com/geoip/updating-databases/
[maxmind geolite2 country database]: https://dev.maxmind.com/geoip/geolite2-free-geolocation-data
[registering for a free geolite2 account]: https://www.maxmind.com/en/geolite2/signup
[to comply with various privacy regulations]: https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/
@@ -23,6 +23,8 @@ of the report schema.
"report_id": "9391651994964116463",
"begin_date": "2012-04-27 20:00:00",
"end_date": "2012-04-28 19:59:59",
"timespan_requires_normalization": false,
"original_timespan_seconds": 86399,
"errors": []
},
"policy_published": {
@@ -39,8 +41,10 @@ of the report schema.
"source": {
"ip_address": "72.150.241.94",
"country": "US",
"reverse_dns": null,
"base_domain": null,
"name": null,
"type": null
},
"count": 2,
"alignment": {
@@ -74,7 +78,10 @@ of the report schema.
"result": "pass"
}
]
},
"normalized_timespan": false,
"interval_begin": "2012-04-28 00:00:00",
"interval_end": "2012-04-28 23:59:59"
}
]
}
@@ -83,8 +90,10 @@ of the report schema.
### CSV aggregate report
```text
xml_schema,org_name,org_email,org_extra_contact_info,report_id,begin_date,end_date,normalized_timespan,errors,domain,adkim,aspf,p,sp,pct,fo,source_ip_address,source_country,source_reverse_dns,source_base_domain,source_name,source_type,count,spf_aligned,dkim_aligned,dmarc_aligned,disposition,policy_override_reasons,policy_override_comments,envelope_from,header_from,envelope_to,dkim_domains,dkim_selectors,dkim_results,spf_domains,spf_scopes,spf_results
draft,acme.com,noreply-dmarc-support@acme.com,http://acme.com/dmarc/support,9391651994964116463,2012-04-28 00:00:00,2012-04-28 23:59:59,False,,example.com,r,r,none,none,100,0,72.150.241.94,US,,,,,2,True,False,True,none,,,example.com,example.com,,example.com,none,fail,example.com,mfrom,pass
```
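The CSV output above can be consumed with the standard library; a minimal sketch (a shortened column set from the sample header is used here for readability):

```python
import csv
import io

# A subset of the columns from the sample above.
sample = (
    "xml_schema,org_name,report_id,begin_date,end_date,normalized_timespan,count\n"
    "draft,acme.com,9391651994964116463,2012-04-28 00:00:00,2012-04-28 23:59:59,False,2\n"
)

# DictReader keys each row by header name, so newly added columns such as
# normalized_timespan are reachable without positional indexing.
rows = list(csv.DictReader(io.StringIO(sample)))
total_messages = sum(int(row["count"]) for row in rows)
```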
## Sample forensic report output
```text
usage: parsedmarc [-h] [-c CONFIG_FILE] [--strip-attachment-payloads] [-o OUTPUT]
                  [--aggregate-json-filename AGGREGATE_JSON_FILENAME] [--forensic-json-filename FORENSIC_JSON_FILENAME]
                  [--smtp-tls-json-filename SMTP_TLS_JSON_FILENAME] [--aggregate-csv-filename AGGREGATE_CSV_FILENAME]
                  [--forensic-csv-filename FORENSIC_CSV_FILENAME] [--smtp-tls-csv-filename SMTP_TLS_CSV_FILENAME]
                  [-n NAMESERVERS [NAMESERVERS ...]] [-t DNS_TIMEOUT] [--offline] [-s] [-w] [--verbose] [--debug]
                  [--log-file LOG_FILE] [--no-prettify-json] [-v]
                  [file_path ...]

Parses DMARC reports

positional arguments:
  file_path             one or more paths to aggregate or forensic report files, emails, or mbox files

options:
  -h, --help            show this help message and exit
  -c CONFIG_FILE, --config-file CONFIG_FILE
                        a path to a configuration file (--silent implied)
  --strip-attachment-payloads
                        remove attachment payloads from forensic report output
  -o OUTPUT, --output OUTPUT
                        write output files to the given directory
  --aggregate-json-filename AGGREGATE_JSON_FILENAME
                        filename for the aggregate JSON output file
  --forensic-json-filename FORENSIC_JSON_FILENAME
                        filename for the forensic JSON output file
  --smtp-tls-json-filename SMTP_TLS_JSON_FILENAME
                        filename for the SMTP TLS JSON output file
  --aggregate-csv-filename AGGREGATE_CSV_FILENAME
                        filename for the aggregate CSV output file
  --forensic-csv-filename FORENSIC_CSV_FILENAME
                        filename for the forensic CSV output file
  --smtp-tls-csv-filename SMTP_TLS_CSV_FILENAME
                        filename for the SMTP TLS CSV output file
  -n NAMESERVERS [NAMESERVERS ...], --nameservers NAMESERVERS [NAMESERVERS ...]
                        nameservers to query
  -t DNS_TIMEOUT, --dns_timeout DNS_TIMEOUT
                        number of seconds to wait for an answer from DNS (default: 2.0)
  --offline             do not make online queries for geolocation or DNS
  -s, --silent          only print errors
  -w, --warnings        print warnings in addition to errors
  --verbose             more verbose output
  --debug               print debugging information
  --log-file LOG_FILE   output logging to a file
  --no-prettify-json    output JSON in a single line without indentation
  -v, --version         show program's version number and exit
```
:::{note}
@@ -120,8 +123,10 @@ The full set of configuration options are:
  Elasticsearch, Splunk and/or S3
- `save_smtp_tls` - bool: Save SMTP TLS report data to
  Elasticsearch, Splunk and/or S3
- `index_prefix_domain_map` - str: A path to a YAML file mapping Opensearch/Elasticsearch index prefixes to domain names
- `strip_attachment_payloads` - bool: Remove attachment
  payloads from results
- `silent` - bool: Set this to `False` to output results to STDOUT
- `output` - str: Directory to place JSON and CSV files in. This is required if you set either of the JSON output file options.
- `aggregate_json_filename` - str: filename for the aggregate
  JSON output file
@@ -166,8 +171,8 @@ The full set of configuration options are:
- `check_timeout` - int: Number of seconds to wait for an IMAP
  IDLE response or the number of seconds until the next
  mail check (Default: `30`)
- `since` - str: Search for messages since a certain time. (Examples: `5m|3h|2d|1w`)
  Acceptable units - {"m":"minutes", "h":"hours", "d":"days", "w":"weeks"}.
  Defaults to `1d` if an incorrect value is provided.
- `imap`
  - `host` - str: The IMAP server hostname or IP address
@@ -235,7 +240,7 @@ The full set of configuration options are:
  group and use that as the group id.
```powershell
New-ApplicationAccessPolicy -AccessRight RestrictAccess
-AppId "<CLIENT_ID>" -PolicyScopeGroupId "<MAILBOX>"
-Description "Restrict access to dmarc reports mailbox."
```
@@ -252,7 +257,7 @@ The full set of configuration options are:
:::
- `user` - str: Basic auth username
- `password` - str: Basic auth password
- `api_key` - str: API key
- `ssl` - bool: Use an encrypted SSL/TLS connection
  (Default: `True`)
- `timeout` - float: Timeout in seconds (Default: 60)
@@ -275,7 +280,7 @@ The full set of configuration options are:
:::
- `user` - str: Basic auth username
- `password` - str: Basic auth password
- `api_key` - str: API key
- `ssl` - bool: Use an encrypted SSL/TLS connection
  (Default: `True`)
- `timeout` - float: Timeout in seconds (Default: 60)
@@ -331,13 +336,65 @@ The full set of configuration options are:
  - `secret_access_key` - str: The secret access key (Optional)
- `syslog`
  - `server` - str: The Syslog server name or IP address
  - `port` - int: The port to use (Default: `514`)
  - `protocol` - str: The protocol to use: `udp`, `tcp`, or `tls` (Default: `udp`)
  - `cafile_path` - str: Path to CA certificate file for TLS server verification (Optional)
  - `certfile_path` - str: Path to client certificate file for TLS authentication (Optional)
  - `keyfile_path` - str: Path to client private key file for TLS authentication (Optional)
  - `timeout` - float: Connection timeout in seconds for TCP/TLS (Default: `5.0`)
  - `retry_attempts` - int: Number of retry attempts for failed connections (Default: `3`)
  - `retry_delay` - int: Delay in seconds between retry attempts (Default: `5`)
**Example UDP configuration (default):**
```ini
[syslog]
server = syslog.example.com
port = 514
```
**Example TCP configuration:**
```ini
[syslog]
server = syslog.example.com
port = 6514
protocol = tcp
timeout = 10.0
retry_attempts = 5
```
**Example TLS configuration with server verification:**
```ini
[syslog]
server = syslog.example.com
port = 6514
protocol = tls
cafile_path = /path/to/ca-cert.pem
timeout = 10.0
```
**Example TLS configuration with mutual authentication:**
```ini
[syslog]
server = syslog.example.com
port = 6514
protocol = tls
cafile_path = /path/to/ca-cert.pem
certfile_path = /path/to/client-cert.pem
keyfile_path = /path/to/client-key.pem
timeout = 10.0
retry_attempts = 3
retry_delay = 5
```
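Internally, the TCP/TLS transport retries failed connections using the `retry_attempts` and `retry_delay` options shown above; a simplified sketch of that logic (the function and its exact behavior are illustrative, not parsedmarc's actual implementation):

```python
import socket
import ssl
import time

def connect_with_retry(server, port, protocol="tcp", cafile_path=None,
                       timeout=5.0, retry_attempts=3, retry_delay=5):
    """Open a TCP or TLS socket, retrying transient connection failures."""
    last_error = None
    for attempt in range(1, retry_attempts + 1):
        try:
            sock = socket.create_connection((server, port), timeout=timeout)
            if protocol == "tls":
                # Verify the server against the configured CA bundle.
                context = ssl.create_default_context(cafile=cafile_path)
                sock = context.wrap_socket(sock, server_hostname=server)
            return sock
        except OSError as error:
            last_error = error
            if attempt < retry_attempts:
                time.sleep(retry_delay)
    raise ConnectionError(f"could not reach {server}:{port}") from last_error
```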
- `gmail_api`
  - `credentials_file` - str: Path to file containing the
    credentials, None to disable (Default: `None`)
  - `token_file` - str: Path to save the token file
    (Default: `.token`)
:::{note}
credentials_file and token_file can be obtained with the [quickstart](https://developers.google.com/gmail/api/quickstart/python). Please change the scope to `https://www.googleapis.com/auth/gmail.modify`.
:::
@@ -369,7 +426,7 @@ The full set of configuration options are:
- `mode` - str: The GELF transport type to use. Valid modes: `tcp`, `udp`, `tls`
- `maildir`
  - `maildir_path` - str: Full path of the mailbox maildir location (Default: `INBOX`)
  - `maildir_create` - bool: Create the maildir if not present (Default: False)
- `webhook` - Post the individual reports to a webhook URL with the report as the JSON body
@@ -437,7 +494,7 @@ Update the limit to 2k per example:
PUT _cluster/settings
{
  "persistent" : {
    "cluster.max_shards_per_node" : 2000
  }
}
```
@@ -445,6 +502,28 @@ PUT _cluster/settings
Increasing this value increases resource usage.
:::
## Multi-tenant support
Starting in `8.19.0`, ParseDMARC provides multi-tenant support by writing data to separate OpenSearch or Elasticsearch index prefixes. To set this up, create a YAML file in which each key is a tenant name and the value is a list of domains related to that tenant (not including subdomains), like this:
```yaml
example:
- example.com
- example.net
- example.org
whalensolutions:
- whalensolutions.com
```
Save it to disk where the user running ParseDMARC can read it, then set `index_prefix_domain_map` to that filepath in the `[general]` section of the ParseDMARC configuration file and do not set an `index_prefix` option in the `[elasticsearch]` or `[opensearch]` sections.
When configured correctly, if ParseDMARC finds that a report is related to a domain in the mapping, the report will be saved in an index name that has the tenant name prefixed to it with a trailing underscore. Then, you can use the security features of Opensearch or the ELK stack to only grant users access to the indexes that they need.
:::{note}
A domain cannot be used in multiple tenant lists. Only the first prefix list that contains the matching domain is used.
:::
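The prefix lookup described above can be sketched as follows (simplified; the reduction of the report's domain to its base domain and the per-report-type field handling are omitted, and the function name is illustrative):

```python
def get_tenant_prefix(domain, index_prefix_domain_map):
    """Return '<tenant>_' for the first tenant list containing domain, else None."""
    for tenant, domains in index_prefix_domain_map.items():
        if domain in domains:
            # Normalize the tenant name into a safe index prefix.
            tenant = (
                tenant.lower().strip().strip("_").replace(" ", "_").replace("-", "_")
            )
            return f"{tenant}_"
    return None

tenant_map = {
    "example": ["example.com", "example.net", "example.org"],
    "whalensolutions": ["whalensolutions.com"],
}
```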
## Running parsedmarc as a systemd service
Use systemd to run `parsedmarc` as a service and process reports as
@@ -3,53 +3,55 @@
"""A CLI for parsing DMARC reports"""

import http.client
import json
import logging
import os
import sys
from argparse import ArgumentParser, Namespace
from configparser import ConfigParser
from glob import glob
from multiprocessing import Pipe, Process
from ssl import CERT_NONE, create_default_context

import yaml
from tqdm import tqdm

from parsedmarc import (
    SEEN_AGGREGATE_REPORT_IDS,
    InvalidDMARCReport,
    ParserError,
    __version__,
    elastic,
    email_results,
    gelf,
    get_dmarc_reports_from_mailbox,
    get_dmarc_reports_from_mbox,
    kafkaclient,
    loganalytics,
    opensearch,
    parse_report_file,
    s3,
    save_output,
    splunk,
    syslog,
    watch_inbox,
    webhook,
)
from parsedmarc.log import logger
from parsedmarc.mail import (
    GmailConnection,
    IMAPConnection,
    MaildirConnection,
    MSGraphConnection,
)
from parsedmarc.mail.graph import AuthMethod
from parsedmarc.types import ParsingResults
from parsedmarc.utils import get_base_domain, get_reverse_dns, is_mbox

# Increase the max header limit for very large emails. `_MAXHEADERS` is a
# private stdlib attribute and may not exist in type stubs.
setattr(http.client, "_MAXHEADERS", 200)

formatter = logging.Formatter(
    fmt="%(levelname)8s:%(filename)s:%(lineno)d:%(message)s",
@@ -66,6 +68,48 @@ def _str_to_list(s):
    return list(map(lambda i: i.lstrip(), _list))
def _configure_logging(log_level, log_file=None):
    """
    Configure logging for the current process.

    This is needed for child processes to properly log messages.

    Args:
        log_level: The logging level (e.g., logging.DEBUG, logging.WARNING)
        log_file: Optional path to log file
    """
    # Get the logger
    from parsedmarc.log import logger

    # Set the log level
    logger.setLevel(log_level)

    # Add StreamHandler with formatter if not already present
    # Check if we already have a StreamHandler to avoid duplicates
    # Use exact type check to distinguish from FileHandler subclass
    has_stream_handler = any(type(h) is logging.StreamHandler for h in logger.handlers)
    if not has_stream_handler:
        formatter = logging.Formatter(
            fmt="%(levelname)8s:%(filename)s:%(lineno)d:%(message)s",
            datefmt="%Y-%m-%d:%H:%M:%S",
        )
        handler = logging.StreamHandler()
        handler.setFormatter(formatter)
        logger.addHandler(handler)

    # Add FileHandler if log_file is specified
    if log_file:
        try:
            fh = logging.FileHandler(log_file, "a")
            formatter = logging.Formatter(
                "%(asctime)s - %(levelname)s - [%(filename)s:%(lineno)d] - %(message)s"
            )
            fh.setFormatter(formatter)
            logger.addHandler(fh)
        except (IOError, OSError, PermissionError) as error:
            logger.warning("Unable to write to log file: {}".format(error))
def cli_parse(
    file_path,
    sa,
@@ -76,9 +120,31 @@ def cli_parse(
    always_use_local_files,
    reverse_dns_map_path,
    reverse_dns_map_url,
    normalize_timespan_threshold_hours,
    conn,
    log_level=logging.ERROR,
    log_file=None,
):
    """Separated this function for multiprocessing

    Args:
        file_path: Path to the report file
        sa: Strip attachment payloads flag
        nameservers: List of nameservers
        dns_timeout: DNS timeout
        ip_db_path: Path to IP database
        offline: Offline mode flag
        always_use_local_files: Always use local files flag
        reverse_dns_map_path: Path to reverse DNS map
        reverse_dns_map_url: URL to reverse DNS map
        normalize_timespan_threshold_hours: Timespan threshold
        conn: Pipe connection for IPC
        log_level: Logging level for this process
        log_file: Optional path to log file
    """
    # Configure logging in this child process
    _configure_logging(log_level, log_file)
    try:
        file_results = parse_report_file(
            file_path,
@@ -90,6 +156,7 @@ def cli_parse(
            nameservers=nameservers,
            dns_timeout=dns_timeout,
            strip_attachment_payloads=sa,
            normalize_timespan_threshold_hours=normalize_timespan_threshold_hours,
        )
        conn.send([file_results, file_path])
    except ParserError as error:
@@ -101,14 +168,42 @@ def cli_parse(
def _main():
    """Called when the module is executed"""
    def get_index_prefix(report):
        domain = None
        if index_prefix_domain_map is None:
            return None
        if "policy_published" in report:
            domain = report["policy_published"]["domain"]
        elif "reported_domain" in report:
            domain = report["reported_domain"]
        elif "policies" in report:
            domain = report["policies"][0]["domain"]
        if domain:
            domain = get_base_domain(domain)
            for prefix in index_prefix_domain_map:
                if domain in index_prefix_domain_map[prefix]:
                    prefix = (
                        prefix.lower()
                        .strip()
                        .strip("_")
                        .replace(" ", "_")
                        .replace("-", "_")
                    )
                    prefix = f"{prefix}_"
                    return prefix
        return None
    def process_reports(reports_):
        indent_value = 2 if opts.prettify_json else None
        output_str = "{0}\n".format(
            json.dumps(reports_, ensure_ascii=False, indent=indent_value)
        )
        if not opts.silent:
            print(output_str)
        if opts.output:
            save_output(
                reports_,
                output_directory=opts.output,
                aggregate_json_filename=opts.aggregate_json_filename,
                forensic_json_filename=opts.forensic_json_filename,
@@ -126,7 +221,8 @@ def _main():
                elastic.save_aggregate_report_to_elasticsearch(
                    report,
                    index_suffix=opts.elasticsearch_index_suffix,
                    index_prefix=opts.elasticsearch_index_prefix
                    or get_index_prefix(report),
                    monthly_indexes=opts.elasticsearch_monthly_indexes,
                    number_of_shards=shards,
                    number_of_replicas=replicas,
@@ -147,7 +243,8 @@ def _main():
                opensearch.save_aggregate_report_to_opensearch(
                    report,
                    index_suffix=opts.opensearch_index_suffix,
                    index_prefix=opts.opensearch_index_prefix
                    or get_index_prefix(report),
                    monthly_indexes=opts.opensearch_monthly_indexes,
                    number_of_shards=shards,
                    number_of_replicas=replicas,
@@ -189,8 +286,9 @@ def _main():
                try:
                    if opts.webhook_aggregate_url:
                        indent_value = 2 if opts.prettify_json else None
                        webhook_client.save_aggregate_report_to_webhook(
                            json.dumps(report, ensure_ascii=False, indent=indent_value)
                        )
                except Exception as error_:
                    logger.error("Webhook Error: {0}".format(error_.__str__()))
@@ -212,7 +310,8 @@ def _main():
                elastic.save_forensic_report_to_elasticsearch(
                    report,
                    index_suffix=opts.elasticsearch_index_suffix,
                    index_prefix=opts.elasticsearch_index_prefix
                    or get_index_prefix(report),
                    monthly_indexes=opts.elasticsearch_monthly_indexes,
                    number_of_shards=shards,
                    number_of_replicas=replicas,
@@ -231,7 +330,8 @@ def _main():
opensearch.save_forensic_report_to_opensearch( opensearch.save_forensic_report_to_opensearch(
report, report,
index_suffix=opts.opensearch_index_suffix, index_suffix=opts.opensearch_index_suffix,
index_prefix=opts.opensearch_index_prefix, index_prefix=opts.opensearch_index_prefix
or get_index_prefix(report),
monthly_indexes=opts.opensearch_monthly_indexes, monthly_indexes=opts.opensearch_monthly_indexes,
number_of_shards=shards, number_of_shards=shards,
number_of_replicas=replicas, number_of_replicas=replicas,
@@ -271,8 +371,9 @@ def _main():
try: try:
if opts.webhook_forensic_url: if opts.webhook_forensic_url:
indent_value = 2 if opts.prettify_json else None
webhook_client.save_forensic_report_to_webhook( webhook_client.save_forensic_report_to_webhook(
json.dumps(report, ensure_ascii=False, indent=2) json.dumps(report, ensure_ascii=False, indent=indent_value)
) )
except Exception as error_: except Exception as error_:
logger.error("Webhook Error: {0}".format(error_.__str__())) logger.error("Webhook Error: {0}".format(error_.__str__()))
@@ -294,7 +395,8 @@ def _main():
elastic.save_smtp_tls_report_to_elasticsearch( elastic.save_smtp_tls_report_to_elasticsearch(
report, report,
index_suffix=opts.elasticsearch_index_suffix, index_suffix=opts.elasticsearch_index_suffix,
index_prefix=opts.elasticsearch_index_prefix, index_prefix=opts.elasticsearch_index_prefix
or get_index_prefix(report),
monthly_indexes=opts.elasticsearch_monthly_indexes, monthly_indexes=opts.elasticsearch_monthly_indexes,
number_of_shards=shards, number_of_shards=shards,
number_of_replicas=replicas, number_of_replicas=replicas,
@@ -313,7 +415,8 @@ def _main():
opensearch.save_smtp_tls_report_to_opensearch( opensearch.save_smtp_tls_report_to_opensearch(
report, report,
index_suffix=opts.opensearch_index_suffix, index_suffix=opts.opensearch_index_suffix,
index_prefix=opts.opensearch_index_prefix, index_prefix=opts.opensearch_index_prefix
or get_index_prefix(report),
monthly_indexes=opts.opensearch_monthly_indexes, monthly_indexes=opts.opensearch_monthly_indexes,
number_of_shards=shards, number_of_shards=shards,
number_of_replicas=replicas, number_of_replicas=replicas,
@@ -353,8 +456,9 @@ def _main():
try: try:
if opts.webhook_smtp_tls_url: if opts.webhook_smtp_tls_url:
indent_value = 2 if opts.prettify_json else None
webhook_client.save_smtp_tls_report_to_webhook( webhook_client.save_smtp_tls_report_to_webhook(
json.dumps(report, ensure_ascii=False, indent=2) json.dumps(report, ensure_ascii=False, indent=indent_value)
) )
except Exception as error_: except Exception as error_:
logger.error("Webhook Error: {0}".format(error_.__str__())) logger.error("Webhook Error: {0}".format(error_.__str__()))
@@ -475,6 +579,12 @@ def _main():
        "--debug", action="store_true", help="print debugging information"
    )
    arg_parser.add_argument("--log-file", default=None, help="output logging to a file")
+    arg_parser.add_argument(
+        "--no-prettify-json",
+        action="store_false",
+        dest="prettify_json",
+        help="output JSON in a single line without indentation",
+    )
    arg_parser.add_argument("-v", "--version", action="version", version=__version__)
    aggregate_reports = []
@@ -504,6 +614,7 @@ def _main():
        dns_timeout=args.dns_timeout,
        debug=args.debug,
        verbose=args.verbose,
+        prettify_json=args.prettify_json,
        save_aggregate=False,
        save_forensic=False,
        save_smtp_tls=False,
@@ -547,7 +658,7 @@ def _main():
        elasticsearch_monthly_indexes=False,
        elasticsearch_username=None,
        elasticsearch_password=None,
-        elasticsearch_apiKey=None,
+        elasticsearch_api_key=None,
        opensearch_hosts=None,
        opensearch_timeout=60,
        opensearch_number_of_shards=1,
@@ -559,7 +670,7 @@ def _main():
        opensearch_monthly_indexes=False,
        opensearch_username=None,
        opensearch_password=None,
-        opensearch_apiKey=None,
+        opensearch_api_key=None,
        kafka_hosts=None,
        kafka_username=None,
        kafka_password=None,
@@ -586,6 +697,13 @@ def _main():
        s3_secret_access_key=None,
        syslog_server=None,
        syslog_port=None,
+        syslog_protocol=None,
+        syslog_cafile_path=None,
+        syslog_certfile_path=None,
+        syslog_keyfile_path=None,
+        syslog_timeout=None,
+        syslog_retry_attempts=None,
+        syslog_retry_delay=None,
        gmail_api_credentials_file=None,
        gmail_api_token_file=None,
        gmail_api_include_spam_trash=False,
@@ -615,6 +733,7 @@ def _main():
        webhook_forensic_url=None,
        webhook_smtp_tls_url=None,
        webhook_timeout=60,
+        normalize_timespan_threshold_hours=24.0,
    )
    args = arg_parser.parse_args()
@@ -625,14 +744,24 @@ def _main():
            exit(-1)
        opts.silent = True
        config = ConfigParser()
+        index_prefix_domain_map = None
        config.read(args.config_file)
        if "general" in config.sections():
            general_config = config["general"]
+            if "silent" in general_config:
+                opts.silent = bool(general_config.getboolean("silent"))
+            if "normalize_timespan_threshold_hours" in general_config:
+                opts.normalize_timespan_threshold_hours = general_config.getfloat(
+                    "normalize_timespan_threshold_hours"
+                )
+            if "index_prefix_domain_map" in general_config:
+                with open(general_config["index_prefix_domain_map"]) as f:
+                    index_prefix_domain_map = yaml.safe_load(f)
            if "offline" in general_config:
-                opts.offline = general_config.getboolean("offline")
+                opts.offline = bool(general_config.getboolean("offline"))
            if "strip_attachment_payloads" in general_config:
-                opts.strip_attachment_payloads = general_config.getboolean(
-                    "strip_attachment_payloads"
+                opts.strip_attachment_payloads = bool(
+                    general_config.getboolean("strip_attachment_payloads")
                )
            if "output" in general_config:
                opts.output = general_config["output"]
@@ -650,6 +779,8 @@ def _main():
                opts.smtp_tls_csv_filename = general_config["smtp_tls_csv_filename"]
            if "dns_timeout" in general_config:
                opts.dns_timeout = general_config.getfloat("dns_timeout")
+            if opts.dns_timeout is None:
+                opts.dns_timeout = 2
            if "dns_test_address" in general_config:
                opts.dns_test_address = general_config["dns_test_address"]
            if "nameservers" in general_config:
@@ -672,19 +803,19 @@ def _main():
                )
                exit(-1)
            if "save_aggregate" in general_config:
-                opts.save_aggregate = general_config["save_aggregate"]
+                opts.save_aggregate = bool(general_config.getboolean("save_aggregate"))
            if "save_forensic" in general_config:
-                opts.save_forensic = general_config["save_forensic"]
+                opts.save_forensic = bool(general_config.getboolean("save_forensic"))
            if "save_smtp_tls" in general_config:
-                opts.save_smtp_tls = general_config["save_smtp_tls"]
+                opts.save_smtp_tls = bool(general_config.getboolean("save_smtp_tls"))
            if "debug" in general_config:
-                opts.debug = general_config.getboolean("debug")
+                opts.debug = bool(general_config.getboolean("debug"))
            if "verbose" in general_config:
-                opts.verbose = general_config.getboolean("verbose")
+                opts.verbose = bool(general_config.getboolean("verbose"))
            if "silent" in general_config:
-                opts.silent = general_config.getboolean("silent")
+                opts.silent = bool(general_config.getboolean("silent"))
            if "warnings" in general_config:
-                opts.warnings = general_config.getboolean("warnings")
+                opts.warnings = bool(general_config.getboolean("warnings"))
            if "log_file" in general_config:
                opts.log_file = general_config["log_file"]
            if "n_procs" in general_config:
@@ -694,13 +825,15 @@ def _main():
            else:
                opts.ip_db_path = None
            if "always_use_local_files" in general_config:
-                opts.always_use_local_files = general_config.getboolean(
-                    "always_use_local_files"
+                opts.always_use_local_files = bool(
+                    general_config.getboolean("always_use_local_files")
                )
            if "reverse_dns_map_path" in general_config:
                opts.reverse_dns_map_path = general_config["reverse_dns_path"]
            if "reverse_dns_map_url" in general_config:
                opts.reverse_dns_map_url = general_config["reverse_dns_url"]
+            if "prettify_json" in general_config:
+                opts.prettify_json = bool(general_config.getboolean("prettify_json"))
        if "mailbox" in config.sections():
            mailbox_config = config["mailbox"]
@@ -711,11 +844,11 @@ def _main():
            if "archive_folder" in mailbox_config:
                opts.mailbox_archive_folder = mailbox_config["archive_folder"]
            if "watch" in mailbox_config:
-                opts.mailbox_watch = mailbox_config.getboolean("watch")
+                opts.mailbox_watch = bool(mailbox_config.getboolean("watch"))
            if "delete" in mailbox_config:
-                opts.mailbox_delete = mailbox_config.getboolean("delete")
+                opts.mailbox_delete = bool(mailbox_config.getboolean("delete"))
            if "test" in mailbox_config:
-                opts.mailbox_test = mailbox_config.getboolean("test")
+                opts.mailbox_test = bool(mailbox_config.getboolean("test"))
            if "batch_size" in mailbox_config:
                opts.mailbox_batch_size = mailbox_config.getint("batch_size")
            if "check_timeout" in mailbox_config:
@@ -739,14 +872,15 @@ def _main():
            if "port" in imap_config:
                opts.imap_port = imap_config.getint("port")
            if "timeout" in imap_config:
-                opts.imap_timeout = imap_config.getfloat("timeout")
+                opts.imap_timeout = imap_config.getint("timeout")
            if "max_retries" in imap_config:
                opts.imap_max_retries = imap_config.getint("max_retries")
            if "ssl" in imap_config:
-                opts.imap_ssl = imap_config.getboolean("ssl")
+                opts.imap_ssl = bool(imap_config.getboolean("ssl"))
            if "skip_certificate_verification" in imap_config:
-                imap_verify = imap_config.getboolean("skip_certificate_verification")
-                opts.imap_skip_certificate_verification = imap_verify
+                opts.imap_skip_certificate_verification = bool(
+                    imap_config.getboolean("skip_certificate_verification")
+                )
            if "user" in imap_config:
                opts.imap_user = imap_config["user"]
            else:
@@ -774,7 +908,7 @@ def _main():
                    "section instead."
                )
            if "watch" in imap_config:
-                opts.mailbox_watch = imap_config.getboolean("watch")
+                opts.mailbox_watch = bool(imap_config.getboolean("watch"))
                logger.warning(
                    "Use of the watch option in the imap "
                    "configuration section has been deprecated. "
@@ -789,7 +923,7 @@ def _main():
                    "section instead."
                )
            if "test" in imap_config:
-                opts.mailbox_test = imap_config.getboolean("test")
+                opts.mailbox_test = bool(imap_config.getboolean("test"))
                logger.warning(
                    "Use of the test option in the imap "
                    "configuration section has been deprecated. "
@@ -883,8 +1017,8 @@ def _main():
                opts.graph_url = graph_config["graph_url"]
            if "allow_unencrypted_storage" in graph_config:
-                opts.graph_allow_unencrypted_storage = graph_config.getboolean(
-                    "allow_unencrypted_storage"
+                opts.graph_allow_unencrypted_storage = bool(
+                    graph_config.getboolean("allow_unencrypted_storage")
                )
        if "elasticsearch" in config:
@@ -912,18 +1046,22 @@ def _main():
            if "index_prefix" in elasticsearch_config:
                opts.elasticsearch_index_prefix = elasticsearch_config["index_prefix"]
            if "monthly_indexes" in elasticsearch_config:
-                monthly = elasticsearch_config.getboolean("monthly_indexes")
+                monthly = bool(elasticsearch_config.getboolean("monthly_indexes"))
                opts.elasticsearch_monthly_indexes = monthly
            if "ssl" in elasticsearch_config:
-                opts.elasticsearch_ssl = elasticsearch_config.getboolean("ssl")
+                opts.elasticsearch_ssl = bool(elasticsearch_config.getboolean("ssl"))
            if "cert_path" in elasticsearch_config:
                opts.elasticsearch_ssl_cert_path = elasticsearch_config["cert_path"]
            if "user" in elasticsearch_config:
                opts.elasticsearch_username = elasticsearch_config["user"]
            if "password" in elasticsearch_config:
                opts.elasticsearch_password = elasticsearch_config["password"]
+            # Until 8.20
            if "apiKey" in elasticsearch_config:
                opts.elasticsearch_apiKey = elasticsearch_config["apiKey"]
+            # Since 8.20
+            if "api_key" in elasticsearch_config:
+                opts.elasticsearch_apiKey = elasticsearch_config["api_key"]
        if "opensearch" in config:
            opensearch_config = config["opensearch"]
@@ -948,18 +1086,22 @@ def _main():
            if "index_prefix" in opensearch_config:
                opts.opensearch_index_prefix = opensearch_config["index_prefix"]
            if "monthly_indexes" in opensearch_config:
-                monthly = opensearch_config.getboolean("monthly_indexes")
+                monthly = bool(opensearch_config.getboolean("monthly_indexes"))
                opts.opensearch_monthly_indexes = monthly
            if "ssl" in opensearch_config:
-                opts.opensearch_ssl = opensearch_config.getboolean("ssl")
+                opts.opensearch_ssl = bool(opensearch_config.getboolean("ssl"))
            if "cert_path" in opensearch_config:
                opts.opensearch_ssl_cert_path = opensearch_config["cert_path"]
            if "user" in opensearch_config:
                opts.opensearch_username = opensearch_config["user"]
            if "password" in opensearch_config:
                opts.opensearch_password = opensearch_config["password"]
+            # Until 8.20
            if "apiKey" in opensearch_config:
                opts.opensearch_apiKey = opensearch_config["apiKey"]
+            # Since 8.20
+            if "api_key" in opensearch_config:
+                opts.opensearch_apiKey = opensearch_config["api_key"]
        if "splunk_hec" in config.sections():
            hec_config = config["splunk_hec"]
@@ -1001,9 +1143,11 @@ def _main():
            if "password" in kafka_config:
                opts.kafka_password = kafka_config["password"]
            if "ssl" in kafka_config:
-                opts.kafka_ssl = kafka_config.getboolean("ssl")
+                opts.kafka_ssl = bool(kafka_config.getboolean("ssl"))
            if "skip_certificate_verification" in kafka_config:
-                kafka_verify = kafka_config.getboolean("skip_certificate_verification")
+                kafka_verify = bool(
+                    kafka_config.getboolean("skip_certificate_verification")
+                )
                opts.kafka_skip_certificate_verification = kafka_verify
            if "aggregate_topic" in kafka_config:
                opts.kafka_aggregate_topic = kafka_config["aggregate_topic"]
@@ -1035,9 +1179,11 @@ def _main():
            if "port" in smtp_config:
                opts.smtp_port = smtp_config.getint("port")
            if "ssl" in smtp_config:
-                opts.smtp_ssl = smtp_config.getboolean("ssl")
+                opts.smtp_ssl = bool(smtp_config.getboolean("ssl"))
            if "skip_certificate_verification" in smtp_config:
-                smtp_verify = smtp_config.getboolean("skip_certificate_verification")
+                smtp_verify = bool(
+                    smtp_config.getboolean("skip_certificate_verification")
+                )
                opts.smtp_skip_certificate_verification = smtp_verify
            if "user" in smtp_config:
                opts.smtp_user = smtp_config["user"]
@@ -1100,28 +1246,54 @@ def _main():
                opts.syslog_port = syslog_config["port"]
            else:
                opts.syslog_port = 514
+            if "protocol" in syslog_config:
+                opts.syslog_protocol = syslog_config["protocol"]
+            else:
+                opts.syslog_protocol = "udp"
+            if "cafile_path" in syslog_config:
+                opts.syslog_cafile_path = syslog_config["cafile_path"]
+            if "certfile_path" in syslog_config:
+                opts.syslog_certfile_path = syslog_config["certfile_path"]
+            if "keyfile_path" in syslog_config:
+                opts.syslog_keyfile_path = syslog_config["keyfile_path"]
+            if "timeout" in syslog_config:
+                opts.syslog_timeout = float(syslog_config["timeout"])
+            else:
+                opts.syslog_timeout = 5.0
+            if "retry_attempts" in syslog_config:
+                opts.syslog_retry_attempts = int(syslog_config["retry_attempts"])
+            else:
+                opts.syslog_retry_attempts = 3
+            if "retry_delay" in syslog_config:
+                opts.syslog_retry_delay = int(syslog_config["retry_delay"])
+            else:
+                opts.syslog_retry_delay = 5
        if "gmail_api" in config.sections():
            gmail_api_config = config["gmail_api"]
            opts.gmail_api_credentials_file = gmail_api_config.get("credentials_file")
            opts.gmail_api_token_file = gmail_api_config.get("token_file", ".token")
-            opts.gmail_api_include_spam_trash = gmail_api_config.getboolean(
-                "include_spam_trash", False
+            opts.gmail_api_include_spam_trash = bool(
+                gmail_api_config.getboolean("include_spam_trash", False)
            )
-            opts.gmail_api_paginate_messages = gmail_api_config.getboolean(
-                "paginate_messages", True
+            opts.gmail_api_paginate_messages = bool(
+                gmail_api_config.getboolean("paginate_messages", True)
            )
            opts.gmail_api_scopes = gmail_api_config.get(
                "scopes", default_gmail_api_scope
            )
            opts.gmail_api_scopes = _str_to_list(opts.gmail_api_scopes)
            if "oauth2_port" in gmail_api_config:
-                opts.gmail_api_oauth2_port = gmail_api_config.get("oauth2_port", 8080)
+                opts.gmail_api_oauth2_port = gmail_api_config.getint(
+                    "oauth2_port", 8080
+                )
        if "maildir" in config.sections():
            maildir_api_config = config["maildir"]
            opts.maildir_path = maildir_api_config.get("maildir_path")
-            opts.maildir_create = maildir_api_config.get("maildir_create")
+            opts.maildir_create = bool(
+                maildir_api_config.getboolean("maildir_create", fallback=False)
+            )
        if "log_analytics" in config.sections():
            log_analytics_config = config["log_analytics"]
@@ -1167,7 +1339,7 @@ def _main():
            if "smtp_tls_url" in webhook_config:
                opts.webhook_smtp_tls_url = webhook_config["smtp_tls_url"]
            if "timeout" in webhook_config:
-                opts.webhook_timeout = webhook_config["timeout"]
+                opts.webhook_timeout = webhook_config.getint("timeout")
    logger.setLevel(logging.ERROR)
@@ -1216,14 +1388,19 @@ def _main():
            es_aggregate_index = "{0}{1}".format(prefix, es_aggregate_index)
            es_forensic_index = "{0}{1}".format(prefix, es_forensic_index)
            es_smtp_tls_index = "{0}{1}".format(prefix, es_smtp_tls_index)
+        elastic_timeout_value = (
+            float(opts.elasticsearch_timeout)
+            if opts.elasticsearch_timeout is not None
+            else 60.0
+        )
        elastic.set_hosts(
            opts.elasticsearch_hosts,
-            opts.elasticsearch_ssl,
-            opts.elasticsearch_ssl_cert_path,
-            opts.elasticsearch_username,
-            opts.elasticsearch_password,
-            opts.elasticsearch_apiKey,
-            timeout=opts.elasticsearch_timeout,
+            use_ssl=opts.elasticsearch_ssl,
+            ssl_cert_path=opts.elasticsearch_ssl_cert_path,
+            username=opts.elasticsearch_username,
+            password=opts.elasticsearch_password,
+            api_key=opts.elasticsearch_api_key,
+            timeout=elastic_timeout_value,
        )
        elastic.migrate_indexes(
            aggregate_indexes=[es_aggregate_index],
@@ -1248,14 +1425,19 @@ def _main():
            os_aggregate_index = "{0}{1}".format(prefix, os_aggregate_index)
            os_forensic_index = "{0}{1}".format(prefix, os_forensic_index)
            os_smtp_tls_index = "{0}{1}".format(prefix, os_smtp_tls_index)
+        opensearch_timeout_value = (
+            float(opts.opensearch_timeout)
+            if opts.opensearch_timeout is not None
+            else 60.0
+        )
        opensearch.set_hosts(
            opts.opensearch_hosts,
-            opts.opensearch_ssl,
-            opts.opensearch_ssl_cert_path,
-            opts.opensearch_username,
-            opts.opensearch_password,
-            opts.opensearch_apiKey,
-            timeout=opts.opensearch_timeout,
+            use_ssl=opts.opensearch_ssl,
+            ssl_cert_path=opts.opensearch_ssl_cert_path,
+            username=opts.opensearch_username,
+            password=opts.opensearch_password,
+            api_key=opts.opensearch_api_key,
+            timeout=opensearch_timeout_value,
        )
        opensearch.migrate_indexes(
            aggregate_indexes=[os_aggregate_index],
@@ -1283,6 +1465,17 @@ def _main():
            syslog_client = syslog.SyslogClient(
                server_name=opts.syslog_server,
                server_port=int(opts.syslog_port),
+                protocol=opts.syslog_protocol or "udp",
+                cafile_path=opts.syslog_cafile_path,
+                certfile_path=opts.syslog_certfile_path,
+                keyfile_path=opts.syslog_keyfile_path,
+                timeout=opts.syslog_timeout if opts.syslog_timeout is not None else 5.0,
+                retry_attempts=opts.syslog_retry_attempts
+                if opts.syslog_retry_attempts is not None
+                else 3,
+                retry_delay=opts.syslog_retry_delay
+                if opts.syslog_retry_delay is not None
+                else 5,
            )
        except Exception as error_:
            logger.error("Syslog Error: {0}".format(error_.__str__()))
@@ -1364,16 +1557,23 @@ def _main():
        results = []
+        pbar = None
        if sys.stdout.isatty():
            pbar = tqdm(total=len(file_paths))
-        for batch_index in range(math.ceil(len(file_paths) / opts.n_procs)):
+        n_procs = int(opts.n_procs or 1)
+        if n_procs < 1:
+            n_procs = 1
+        # Capture the current log level to pass to child processes
+        current_log_level = logger.level
+        current_log_file = opts.log_file
+        for batch_index in range((len(file_paths) + n_procs - 1) // n_procs):
            processes = []
            connections = []
-            for proc_index in range(
-                opts.n_procs * batch_index, opts.n_procs * (batch_index + 1)
-            ):
+            for proc_index in range(n_procs * batch_index, n_procs * (batch_index + 1)):
                if proc_index >= len(file_paths):
                    break
@@ -1392,7 +1592,10 @@ def _main():
                        opts.always_use_local_files,
                        opts.reverse_dns_map_path,
                        opts.reverse_dns_map_url,
+                        opts.normalize_timespan_threshold_hours,
                        child_conn,
+                        current_log_level,
+                        current_log_file,
                    ),
                )
                processes.append(process)
@@ -1405,12 +1608,15 @@ def _main():
            for proc in processes:
                proc.join()
-                if sys.stdout.isatty():
+                if pbar is not None:
                    counter += 1
-                    pbar.update(counter - pbar.n)
+                    pbar.update(1)
+        if pbar is not None:
+            pbar.close()
        for result in results:
-            if type(result[0]) is ParserError:
+            if isinstance(result[0], ParserError) or result[0] is None:
                logger.error("Failed to parse {0} - {1}".format(result[1], result[0]))
            else:
                if result[0]["report_type"] == "aggregate":
@@ -1431,6 +1637,11 @@ def _main():
                    smtp_tls_reports.append(result[0]["report"])
    for mbox_path in mbox_paths:
+        normalize_timespan_threshold_hours_value = (
+            float(opts.normalize_timespan_threshold_hours)
+            if opts.normalize_timespan_threshold_hours is not None
+            else 24.0
+        )
        strip = opts.strip_attachment_payloads
        reports = get_dmarc_reports_from_mbox(
            mbox_path,
@@ -1442,12 +1653,17 @@ def _main():
            reverse_dns_map_path=opts.reverse_dns_map_path,
            reverse_dns_map_url=opts.reverse_dns_map_url,
            offline=opts.offline,
+            normalize_timespan_threshold_hours=normalize_timespan_threshold_hours_value,
        )
        aggregate_reports += reports["aggregate_reports"]
        forensic_reports += reports["forensic_reports"]
        smtp_tls_reports += reports["smtp_tls_reports"]
    mailbox_connection = None
+    mailbox_batch_size_value = 10
+    mailbox_check_timeout_value = 30
+    normalize_timespan_threshold_hours_value = 24.0
    if opts.imap_host:
        try:
            if opts.imap_user is None or opts.imap_password is None:
@@ -1460,16 +1676,23 @@ def _main():
if opts.imap_skip_certificate_verification: if opts.imap_skip_certificate_verification:
logger.debug("Skipping IMAP certificate verification") logger.debug("Skipping IMAP certificate verification")
verify = False verify = False
if opts.imap_ssl is False: if not opts.imap_ssl:
ssl = False ssl = False
imap_timeout = (
int(opts.imap_timeout) if opts.imap_timeout is not None else 30
)
imap_max_retries = (
int(opts.imap_max_retries) if opts.imap_max_retries is not None else 4
)
imap_port_value = int(opts.imap_port) if opts.imap_port is not None else 993
mailbox_connection = IMAPConnection( mailbox_connection = IMAPConnection(
host=opts.imap_host, host=opts.imap_host,
port=opts.imap_port, port=imap_port_value,
ssl=ssl, ssl=ssl,
verify=verify, verify=verify,
timeout=opts.imap_timeout, timeout=imap_timeout,
max_retries=opts.imap_max_retries, max_retries=imap_max_retries,
user=opts.imap_user, user=opts.imap_user,
password=opts.imap_password, password=opts.imap_password,
) )
@@ -1490,7 +1713,7 @@ def _main():
username=opts.graph_user, username=opts.graph_user,
password=opts.graph_password, password=opts.graph_password,
token_file=opts.graph_token_file, token_file=opts.graph_token_file,
allow_unencrypted_storage=opts.graph_allow_unencrypted_storage, allow_unencrypted_storage=bool(opts.graph_allow_unencrypted_storage),
graph_url=opts.graph_url, graph_url=opts.graph_url,
) )
@@ -1535,11 +1758,24 @@ def _main():
exit(1) exit(1)
if mailbox_connection: if mailbox_connection:
mailbox_batch_size_value = (
int(opts.mailbox_batch_size) if opts.mailbox_batch_size is not None else 10
)
mailbox_check_timeout_value = (
int(opts.mailbox_check_timeout)
if opts.mailbox_check_timeout is not None
else 30
)
normalize_timespan_threshold_hours_value = (
float(opts.normalize_timespan_threshold_hours)
if opts.normalize_timespan_threshold_hours is not None
else 24.0
)
try: try:
reports = get_dmarc_reports_from_mailbox( reports = get_dmarc_reports_from_mailbox(
connection=mailbox_connection, connection=mailbox_connection,
delete=opts.mailbox_delete, delete=opts.mailbox_delete,
batch_size=opts.mailbox_batch_size, batch_size=mailbox_batch_size_value,
reports_folder=opts.mailbox_reports_folder, reports_folder=opts.mailbox_reports_folder,
archive_folder=opts.mailbox_archive_folder, archive_folder=opts.mailbox_archive_folder,
ip_db_path=opts.ip_db_path, ip_db_path=opts.ip_db_path,
@@ -1551,6 +1787,7 @@ def _main():
test=opts.mailbox_test, test=opts.mailbox_test,
strip_attachment_payloads=opts.strip_attachment_payloads, strip_attachment_payloads=opts.strip_attachment_payloads,
since=opts.mailbox_since, since=opts.mailbox_since,
normalize_timespan_threshold_hours=normalize_timespan_threshold_hours_value,
) )
aggregate_reports += reports["aggregate_reports"] aggregate_reports += reports["aggregate_reports"]
@@ -1561,31 +1798,36 @@ def _main():
logger.exception("Mailbox Error") logger.exception("Mailbox Error")
exit(1) exit(1)
results = OrderedDict( parsing_results: ParsingResults = {
[ "aggregate_reports": aggregate_reports,
("aggregate_reports", aggregate_reports), "forensic_reports": forensic_reports,
("forensic_reports", forensic_reports), "smtp_tls_reports": smtp_tls_reports,
("smtp_tls_reports", smtp_tls_reports), }
]
)
process_reports(results) process_reports(parsing_results)
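The `parsing_results` dict above is annotated as `ParsingResults`. A plausible definition — an assumption, since the real one lives elsewhere in the parsedmarc package — is a `TypedDict` with the three report lists:

```python
from typing import Any, TypedDict

class ParsingResults(TypedDict):
    aggregate_reports: list[dict[str, Any]]
    forensic_reports: list[dict[str, Any]]
    smtp_tls_reports: list[dict[str, Any]]

results: ParsingResults = {
    "aggregate_reports": [],
    "forensic_reports": [],
    "smtp_tls_reports": [],
}
```

Unlike the `OrderedDict` it replaces, this gives static type checkers the key names and value types while remaining a plain dict at runtime.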
if opts.smtp_host: if opts.smtp_host:
try: try:
verify = True verify = True
if opts.smtp_skip_certificate_verification: if opts.smtp_skip_certificate_verification:
verify = False verify = False
smtp_port_value = int(opts.smtp_port) if opts.smtp_port is not None else 25
smtp_to_value = (
list(opts.smtp_to)
if isinstance(opts.smtp_to, list)
else _str_to_list(str(opts.smtp_to))
)
email_results( email_results(
results, parsing_results,
opts.smtp_host, opts.smtp_host,
opts.smtp_from, opts.smtp_from,
opts.smtp_to, smtp_to_value,
port=opts.smtp_port, port=smtp_port_value,
verify=verify, verify=verify,
username=opts.smtp_user, username=opts.smtp_user,
password=opts.smtp_password, password=opts.smtp_password,
subject=opts.smtp_subject, subject=opts.smtp_subject,
require_encryption=opts.smtp_ssl,
) )
except Exception: except Exception:
logger.exception("Failed to email results") logger.exception("Failed to email results")
@@ -1602,16 +1844,17 @@ def _main():
archive_folder=opts.mailbox_archive_folder, archive_folder=opts.mailbox_archive_folder,
delete=opts.mailbox_delete, delete=opts.mailbox_delete,
test=opts.mailbox_test, test=opts.mailbox_test,
check_timeout=opts.mailbox_check_timeout, check_timeout=mailbox_check_timeout_value,
nameservers=opts.nameservers, nameservers=opts.nameservers,
dns_timeout=opts.dns_timeout, dns_timeout=opts.dns_timeout,
strip_attachment_payloads=opts.strip_attachment_payloads, strip_attachment_payloads=opts.strip_attachment_payloads,
batch_size=opts.mailbox_batch_size, batch_size=mailbox_batch_size_value,
ip_db_path=opts.ip_db_path, ip_db_path=opts.ip_db_path,
always_use_local_files=opts.always_use_local_files, always_use_local_files=opts.always_use_local_files,
reverse_dns_map_path=opts.reverse_dns_map_path, reverse_dns_map_path=opts.reverse_dns_map_path,
reverse_dns_map_url=opts.reverse_dns_map_url, reverse_dns_map_url=opts.reverse_dns_map_url,
offline=opts.offline, offline=opts.offline,
normalize_timespan_threshold_hours=normalize_timespan_threshold_hours_value,
) )
except FileExistsError as error: except FileExistsError as error:
logger.error("{0}".format(error.__str__())) logger.error("{0}".format(error.__str__()))


@@ -1,2 +1,3 @@
__version__ = "8.18.5" __version__ = "9.1.0"
USER_AGENT = f"parsedmarc/{__version__}" USER_AGENT = f"parsedmarc/{__version__}"


@@ -1,27 +1,29 @@
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
from collections import OrderedDict from __future__ import annotations
from elasticsearch_dsl.search import Q from typing import Any, Optional, Union
from elasticsearch.helpers import reindex
from elasticsearch_dsl import ( from elasticsearch_dsl import (
connections, Boolean,
Object, Date,
Document, Document,
Index, Index,
Nested,
InnerDoc, InnerDoc,
Integer, Integer,
Text,
Boolean,
Ip, Ip,
Date, Nested,
Object,
Search, Search,
Text,
connections,
) )
from elasticsearch.helpers import reindex from elasticsearch_dsl.search import Q
from parsedmarc import InvalidForensicReport
from parsedmarc.log import logger from parsedmarc.log import logger
from parsedmarc.utils import human_timestamp_to_datetime from parsedmarc.utils import human_timestamp_to_datetime
from parsedmarc import InvalidForensicReport
class ElasticsearchError(Exception): class ElasticsearchError(Exception):
@@ -67,6 +69,8 @@ class _AggregateReportDoc(Document):
date_range = Date() date_range = Date()
date_begin = Date() date_begin = Date()
date_end = Date() date_end = Date()
normalized_timespan = Boolean()
original_timespan_seconds = Integer()
errors = Text() errors = Text()
published_policy = Object(_PublishedPolicy) published_policy = Object(_PublishedPolicy)
source_ip_address = Ip() source_ip_address = Ip()
@@ -87,18 +91,18 @@ class _AggregateReportDoc(Document):
dkim_results = Nested(_DKIMResult) dkim_results = Nested(_DKIMResult)
spf_results = Nested(_SPFResult) spf_results = Nested(_SPFResult)
def add_policy_override(self, type_, comment): def add_policy_override(self, type_: str, comment: str):
self.policy_overrides.append(_PolicyOverride(type=type_, comment=comment)) self.policy_overrides.append(_PolicyOverride(type=type_, comment=comment)) # pyright: ignore[reportCallIssue]
def add_dkim_result(self, domain, selector, result): def add_dkim_result(self, domain: str, selector: str, result: _DKIMResult):
self.dkim_results.append( self.dkim_results.append(
_DKIMResult(domain=domain, selector=selector, result=result) _DKIMResult(domain=domain, selector=selector, result=result)
) ) # pyright: ignore[reportCallIssue]
def add_spf_result(self, domain, scope, result): def add_spf_result(self, domain: str, scope: str, result: _SPFResult):
self.spf_results.append(_SPFResult(domain=domain, scope=scope, result=result)) self.spf_results.append(_SPFResult(domain=domain, scope=scope, result=result)) # pyright: ignore[reportCallIssue]
def save(self, **kwargs): def save(self, **kwargs): # pyright: ignore[reportIncompatibleMethodOverride]
self.passed_dmarc = False self.passed_dmarc = False
self.passed_dmarc = self.spf_aligned or self.dkim_aligned self.passed_dmarc = self.spf_aligned or self.dkim_aligned
@@ -131,26 +135,26 @@ class _ForensicSampleDoc(InnerDoc):
body = Text() body = Text()
attachments = Nested(_EmailAttachmentDoc) attachments = Nested(_EmailAttachmentDoc)
def add_to(self, display_name, address): def add_to(self, display_name: str, address: str):
self.to.append(_EmailAddressDoc(display_name=display_name, address=address)) self.to.append(_EmailAddressDoc(display_name=display_name, address=address)) # pyright: ignore[reportCallIssue]
def add_reply_to(self, display_name, address): def add_reply_to(self, display_name: str, address: str):
self.reply_to.append( self.reply_to.append(
_EmailAddressDoc(display_name=display_name, address=address) _EmailAddressDoc(display_name=display_name, address=address)
) ) # pyright: ignore[reportCallIssue]
def add_cc(self, display_name, address): def add_cc(self, display_name: str, address: str):
self.cc.append(_EmailAddressDoc(display_name=display_name, address=address)) self.cc.append(_EmailAddressDoc(display_name=display_name, address=address)) # pyright: ignore[reportCallIssue]
def add_bcc(self, display_name, address): def add_bcc(self, display_name: str, address: str):
self.bcc.append(_EmailAddressDoc(display_name=display_name, address=address)) self.bcc.append(_EmailAddressDoc(display_name=display_name, address=address)) # pyright: ignore[reportCallIssue]
def add_attachment(self, filename, content_type, sha256): def add_attachment(self, filename: str, content_type: str, sha256: str):
self.attachments.append( self.attachments.append(
_EmailAttachmentDoc( _EmailAttachmentDoc(
filename=filename, content_type=content_type, sha256=sha256 filename=filename, content_type=content_type, sha256=sha256
) )
) ) # pyright: ignore[reportCallIssue]
class _ForensicReportDoc(Document): class _ForensicReportDoc(Document):
@@ -197,15 +201,15 @@ class _SMTPTLSPolicyDoc(InnerDoc):
def add_failure_details( def add_failure_details(
self, self,
result_type, result_type: Optional[str] = None,
ip_address, ip_address: Optional[str] = None,
receiving_ip, receiving_ip: Optional[str] = None,
receiving_mx_helo, receiving_mx_helo: Optional[str] = None,
failed_session_count, failed_session_count: Optional[int] = None,
sending_mta_ip=None, sending_mta_ip: Optional[str] = None,
receiving_mx_hostname=None, receiving_mx_hostname: Optional[str] = None,
additional_information_uri=None, additional_information_uri: Optional[str] = None,
failure_reason_code=None, failure_reason_code: Union[str, int, None] = None,
): ):
_details = _SMTPTLSFailureDetailsDoc( _details = _SMTPTLSFailureDetailsDoc(
result_type=result_type, result_type=result_type,
@@ -218,7 +222,7 @@ class _SMTPTLSPolicyDoc(InnerDoc):
additional_information=additional_information_uri, additional_information=additional_information_uri,
failure_reason_code=failure_reason_code, failure_reason_code=failure_reason_code,
) )
self.failure_details.append(_details) self.failure_details.append(_details) # pyright: ignore[reportCallIssue]
class _SMTPTLSReportDoc(Document): class _SMTPTLSReportDoc(Document):
@@ -235,13 +239,14 @@ class _SMTPTLSReportDoc(Document):
def add_policy( def add_policy(
self, self,
policy_type, policy_type: str,
policy_domain, policy_domain: str,
successful_session_count, successful_session_count: int,
failed_session_count, failed_session_count: int,
policy_string=None, *,
mx_host_patterns=None, policy_string: Optional[str] = None,
failure_details=None, mx_host_patterns: Optional[list[str]] = None,
failure_details: Optional[str] = None,
): ):
self.policies.append( self.policies.append(
policy_type=policy_type, policy_type=policy_type,
@@ -251,7 +256,7 @@ class _SMTPTLSReportDoc(Document):
policy_string=policy_string, policy_string=policy_string,
mx_host_patterns=mx_host_patterns, mx_host_patterns=mx_host_patterns,
failure_details=failure_details, failure_details=failure_details,
) ) # pyright: ignore[reportCallIssue]
class AlreadySaved(ValueError): class AlreadySaved(ValueError):
@@ -259,24 +264,25 @@ class AlreadySaved(ValueError):
def set_hosts( def set_hosts(
hosts, hosts: Union[str, list[str]],
use_ssl=False, *,
ssl_cert_path=None, use_ssl: bool = False,
username=None, ssl_cert_path: Optional[str] = None,
password=None, username: Optional[str] = None,
apiKey=None, password: Optional[str] = None,
timeout=60.0, api_key: Optional[str] = None,
timeout: float = 60.0,
): ):
""" """
Sets the Elasticsearch hosts to use Sets the Elasticsearch hosts to use
Args: Args:
hosts (str): A single hostname or URL, or list of hostnames or URLs hosts (str | list[str]): A single hostname or URL, or list of hostnames or URLs
use_ssl (bool): Use a HTTPS connection to the server use_ssl (bool): Use an HTTPS connection to the server
ssl_cert_path (str): Path to the certificate chain ssl_cert_path (str): Path to the certificate chain
username (str): The username to use for authentication username (str): The username to use for authentication
password (str): The password to use for authentication password (str): The password to use for authentication
apiKey (str): The Base64 encoded API key to use for authentication api_key (str): The Base64 encoded API key to use for authentication
timeout (float): Timeout in seconds timeout (float): Timeout in seconds
""" """
if not isinstance(hosts, list): if not isinstance(hosts, list):
@@ -289,14 +295,14 @@ def set_hosts(
conn_params["ca_certs"] = ssl_cert_path conn_params["ca_certs"] = ssl_cert_path
else: else:
conn_params["verify_certs"] = False conn_params["verify_certs"] = False
if username: if username and password:
conn_params["http_auth"] = username + ":" + password conn_params["http_auth"] = username + ":" + password
if apiKey: if api_key:
conn_params["api_key"] = apiKey conn_params["api_key"] = api_key
connections.create_connection(**conn_params) connections.create_connection(**conn_params)
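The credential-selection logic in `set_hosts` can be shown standalone; this is a sketch of the `conn_params` assembly above, while the real function passes the result to `connections.create_connection`:

```python
def build_conn_params(hosts, *, use_ssl=False, ssl_cert_path=None,
                      username=None, password=None, api_key=None,
                      timeout=60.0):
    # Standalone sketch of the conn_params logic in set_hosts above.
    params = {"hosts": hosts, "timeout": timeout}
    if use_ssl:
        params["use_ssl"] = True
        if ssl_cert_path:
            params["ca_certs"] = ssl_cert_path
        else:
            # No CA bundle given: fall back to unverified TLS.
            params["verify_certs"] = False
    if username and password:
        params["http_auth"] = username + ":" + password
    if api_key:
        params["api_key"] = api_key
    return params
```

Note the new guard requires both `username` and `password` before setting `http_auth`, avoiding the `TypeError` the old `if username:` branch raised when only a username was configured.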
def create_indexes(names, settings=None): def create_indexes(names: list[str], settings: Optional[dict[str, Any]] = None):
""" """
Create Elasticsearch indexes Create Elasticsearch indexes
@@ -319,7 +325,10 @@ def create_indexes(names, settings=None):
raise ElasticsearchError("Elasticsearch error: {0}".format(e.__str__())) raise ElasticsearchError("Elasticsearch error: {0}".format(e.__str__()))
def migrate_indexes(aggregate_indexes=None, forensic_indexes=None): def migrate_indexes(
aggregate_indexes: Optional[list[str]] = None,
forensic_indexes: Optional[list[str]] = None,
):
""" """
Updates index mappings Updates index mappings
@@ -358,7 +367,7 @@ def migrate_indexes(aggregate_indexes=None, forensic_indexes=None):
} }
Index(new_index_name).create() Index(new_index_name).create()
Index(new_index_name).put_mapping(doc_type=doc, body=body) Index(new_index_name).put_mapping(doc_type=doc, body=body)
reindex(connections.get_connection(), aggregate_index_name, new_index_name) reindex(connections.get_connection(), aggregate_index_name, new_index_name) # pyright: ignore[reportArgumentType]
Index(aggregate_index_name).delete() Index(aggregate_index_name).delete()
for forensic_index in forensic_indexes: for forensic_index in forensic_indexes:
@@ -366,18 +375,18 @@ def migrate_indexes(aggregate_indexes=None, forensic_indexes=None):
def save_aggregate_report_to_elasticsearch( def save_aggregate_report_to_elasticsearch(
aggregate_report, aggregate_report: dict[str, Any],
index_suffix=None, index_suffix: Optional[str] = None,
index_prefix=None, index_prefix: Optional[str] = None,
monthly_indexes=False, monthly_indexes: Optional[bool] = False,
number_of_shards=1, number_of_shards: int = 1,
number_of_replicas=0, number_of_replicas: int = 0,
): ):
""" """
Saves a parsed DMARC aggregate report to Elasticsearch Saves a parsed DMARC aggregate report to Elasticsearch
Args: Args:
aggregate_report (OrderedDict): A parsed forensic report aggregate_report (dict): A parsed aggregate report
index_suffix (str): The suffix of the name of the index to save to index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily indexes monthly_indexes (bool): Use monthly indexes instead of daily indexes
@@ -395,21 +404,17 @@ def save_aggregate_report_to_elasticsearch(
domain = aggregate_report["policy_published"]["domain"] domain = aggregate_report["policy_published"]["domain"]
begin_date = human_timestamp_to_datetime(metadata["begin_date"], to_utc=True) begin_date = human_timestamp_to_datetime(metadata["begin_date"], to_utc=True)
end_date = human_timestamp_to_datetime(metadata["end_date"], to_utc=True) end_date = human_timestamp_to_datetime(metadata["end_date"], to_utc=True)
begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
if monthly_indexes: if monthly_indexes:
index_date = begin_date.strftime("%Y-%m") index_date = begin_date.strftime("%Y-%m")
else: else:
index_date = begin_date.strftime("%Y-%m-%d") index_date = begin_date.strftime("%Y-%m-%d")
aggregate_report["begin_date"] = begin_date
aggregate_report["end_date"] = end_date
date_range = [aggregate_report["begin_date"], aggregate_report["end_date"]]
org_name_query = Q(dict(match_phrase=dict(org_name=org_name))) org_name_query = Q(dict(match_phrase=dict(org_name=org_name))) # type: ignore
report_id_query = Q(dict(match_phrase=dict(report_id=report_id))) report_id_query = Q(dict(match_phrase=dict(report_id=report_id))) # pyright: ignore[reportArgumentType]
domain_query = Q(dict(match_phrase={"published_policy.domain": domain})) domain_query = Q(dict(match_phrase={"published_policy.domain": domain})) # pyright: ignore[reportArgumentType]
begin_date_query = Q(dict(match=dict(date_begin=begin_date))) begin_date_query = Q(dict(match=dict(date_begin=begin_date))) # pyright: ignore[reportArgumentType]
end_date_query = Q(dict(match=dict(date_end=end_date))) end_date_query = Q(dict(match=dict(date_end=end_date))) # pyright: ignore[reportArgumentType]
if index_suffix is not None: if index_suffix is not None:
search_index = "dmarc_aggregate_{0}*".format(index_suffix) search_index = "dmarc_aggregate_{0}*".format(index_suffix)
@@ -421,6 +426,8 @@ def save_aggregate_report_to_elasticsearch(
query = org_name_query & report_id_query & domain_query query = org_name_query & report_id_query & domain_query
query = query & begin_date_query & end_date_query query = query & begin_date_query & end_date_query
search.query = query search.query = query
begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
try: try:
existing = search.execute() existing = search.execute()
@@ -450,6 +457,17 @@ def save_aggregate_report_to_elasticsearch(
) )
for record in aggregate_report["records"]: for record in aggregate_report["records"]:
begin_date = human_timestamp_to_datetime(record["interval_begin"], to_utc=True)
end_date = human_timestamp_to_datetime(record["interval_end"], to_utc=True)
normalized_timespan = record["normalized_timespan"]
if monthly_indexes:
index_date = begin_date.strftime("%Y-%m")
else:
index_date = begin_date.strftime("%Y-%m-%d")
aggregate_report["begin_date"] = begin_date
aggregate_report["end_date"] = end_date
date_range = [aggregate_report["begin_date"], aggregate_report["end_date"]]
agg_doc = _AggregateReportDoc( agg_doc = _AggregateReportDoc(
xml_schema=aggregate_report["xml_schema"], xml_schema=aggregate_report["xml_schema"],
org_name=metadata["org_name"], org_name=metadata["org_name"],
@@ -457,8 +475,9 @@ def save_aggregate_report_to_elasticsearch(
org_extra_contact_info=metadata["org_extra_contact_info"], org_extra_contact_info=metadata["org_extra_contact_info"],
report_id=metadata["report_id"], report_id=metadata["report_id"],
date_range=date_range, date_range=date_range,
date_begin=aggregate_report["begin_date"], date_begin=begin_date,
date_end=aggregate_report["end_date"], date_end=end_date,
normalized_timespan=normalized_timespan,
errors=metadata["errors"], errors=metadata["errors"],
published_policy=published_policy, published_policy=published_policy,
source_ip_address=record["source"]["ip_address"], source_ip_address=record["source"]["ip_address"],
@@ -508,7 +527,7 @@ def save_aggregate_report_to_elasticsearch(
number_of_shards=number_of_shards, number_of_replicas=number_of_replicas number_of_shards=number_of_shards, number_of_replicas=number_of_replicas
) )
create_indexes([index], index_settings) create_indexes([index], index_settings)
agg_doc.meta.index = index agg_doc.meta.index = index # pyright: ignore[reportOptionalMemberAccess, reportAttributeAccessIssue]
try: try:
agg_doc.save() agg_doc.save()
@@ -517,18 +536,18 @@ def save_aggregate_report_to_elasticsearch(
def save_forensic_report_to_elasticsearch( def save_forensic_report_to_elasticsearch(
forensic_report, forensic_report: dict[str, Any],
index_suffix=None, index_suffix: Optional[Any] = None,
index_prefix=None, index_prefix: Optional[str] = None,
monthly_indexes=False, monthly_indexes: Optional[bool] = False,
number_of_shards=1, number_of_shards: int = 1,
number_of_replicas=0, number_of_replicas: int = 0,
): ):
""" """
Saves a parsed DMARC forensic report to Elasticsearch Saves a parsed DMARC forensic report to Elasticsearch
Args: Args:
forensic_report (OrderedDict): A parsed forensic report forensic_report (dict): A parsed forensic report
index_suffix (str): The suffix of the name of the index to save to index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily monthly_indexes (bool): Use monthly indexes instead of daily
@@ -548,7 +567,7 @@ def save_forensic_report_to_elasticsearch(
sample_date = forensic_report["parsed_sample"]["date"] sample_date = forensic_report["parsed_sample"]["date"]
sample_date = human_timestamp_to_datetime(sample_date) sample_date = human_timestamp_to_datetime(sample_date)
original_headers = forensic_report["parsed_sample"]["headers"] original_headers = forensic_report["parsed_sample"]["headers"]
headers = OrderedDict() headers: dict[str, Any] = {}
for original_header in original_headers: for original_header in original_headers:
headers[original_header.lower()] = original_headers[original_header] headers[original_header.lower()] = original_headers[original_header]
@@ -562,7 +581,7 @@ def save_forensic_report_to_elasticsearch(
if index_prefix is not None: if index_prefix is not None:
search_index = "{0}{1}".format(index_prefix, search_index) search_index = "{0}{1}".format(index_prefix, search_index)
search = Search(index=search_index) search = Search(index=search_index)
q = Q(dict(match=dict(arrival_date=arrival_date_epoch_milliseconds))) q = Q(dict(match=dict(arrival_date=arrival_date_epoch_milliseconds))) # pyright: ignore[reportArgumentType]
from_ = None from_ = None
to_ = None to_ = None
@@ -577,7 +596,7 @@ def save_forensic_report_to_elasticsearch(
from_ = dict() from_ = dict()
from_["sample.headers.from"] = headers["from"] from_["sample.headers.from"] = headers["from"]
from_query = Q(dict(match_phrase=from_)) from_query = Q(dict(match_phrase=from_)) # pyright: ignore[reportArgumentType]
q = q & from_query q = q & from_query
if "to" in headers: if "to" in headers:
# We convert the TO header from a string list to a flat string. # We convert the TO header from a string list to a flat string.
@@ -589,12 +608,12 @@ def save_forensic_report_to_elasticsearch(
to_ = dict() to_ = dict()
to_["sample.headers.to"] = headers["to"] to_["sample.headers.to"] = headers["to"]
to_query = Q(dict(match_phrase=to_)) to_query = Q(dict(match_phrase=to_)) # pyright: ignore[reportArgumentType]
q = q & to_query q = q & to_query
if "subject" in headers: if "subject" in headers:
subject = headers["subject"] subject = headers["subject"]
subject_query = {"match_phrase": {"sample.headers.subject": subject}} subject_query = {"match_phrase": {"sample.headers.subject": subject}}
q = q & Q(subject_query) q = q & Q(subject_query) # pyright: ignore[reportArgumentType]
search.query = q search.query = q
existing = search.execute() existing = search.execute()
@@ -672,7 +691,7 @@ def save_forensic_report_to_elasticsearch(
number_of_shards=number_of_shards, number_of_replicas=number_of_replicas number_of_shards=number_of_shards, number_of_replicas=number_of_replicas
) )
create_indexes([index], index_settings) create_indexes([index], index_settings)
forensic_doc.meta.index = index forensic_doc.meta.index = index # pyright: ignore[reportAttributeAccessIssue, reportOptionalMemberAccess]
try: try:
forensic_doc.save() forensic_doc.save()
except Exception as e: except Exception as e:
@@ -684,18 +703,18 @@ def save_forensic_report_to_elasticsearch(
def save_smtp_tls_report_to_elasticsearch( def save_smtp_tls_report_to_elasticsearch(
report, report: dict[str, Any],
index_suffix=None, index_suffix: Optional[str] = None,
index_prefix=None, index_prefix: Optional[str] = None,
monthly_indexes=False, monthly_indexes: bool = False,
number_of_shards=1, number_of_shards: int = 1,
number_of_replicas=0, number_of_replicas: int = 0,
): ):
""" """
Saves a parsed SMTP TLS report to Elasticsearch Saves a parsed SMTP TLS report to Elasticsearch
Args: Args:
report (OrderedDict): A parsed SMTP TLS report report (dict): A parsed SMTP TLS report
index_suffix (str): The suffix of the name of the index to save to index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily indexes monthly_indexes (bool): Use monthly indexes instead of daily indexes
@@ -719,10 +738,10 @@ def save_smtp_tls_report_to_elasticsearch(
report["begin_date"] = begin_date report["begin_date"] = begin_date
report["end_date"] = end_date report["end_date"] = end_date
org_name_query = Q(dict(match_phrase=dict(org_name=org_name))) org_name_query = Q(dict(match_phrase=dict(org_name=org_name))) # pyright: ignore[reportArgumentType]
report_id_query = Q(dict(match_phrase=dict(report_id=report_id))) report_id_query = Q(dict(match_phrase=dict(report_id=report_id))) # pyright: ignore[reportArgumentType]
begin_date_query = Q(dict(match=dict(date_begin=begin_date))) begin_date_query = Q(dict(match=dict(date_begin=begin_date))) # pyright: ignore[reportArgumentType]
end_date_query = Q(dict(match=dict(date_end=end_date))) end_date_query = Q(dict(match=dict(date_end=end_date))) # pyright: ignore[reportArgumentType]
if index_suffix is not None: if index_suffix is not None:
search_index = "smtp_tls_{0}*".format(index_suffix) search_index = "smtp_tls_{0}*".format(index_suffix)
@@ -781,7 +800,7 @@ def save_smtp_tls_report_to_elasticsearch(
policy_doc = _SMTPTLSPolicyDoc( policy_doc = _SMTPTLSPolicyDoc(
policy_domain=policy["policy_domain"], policy_domain=policy["policy_domain"],
policy_type=policy["policy_type"], policy_type=policy["policy_type"],
succesful_session_count=policy["successful_session_count"], successful_session_count=policy["successful_session_count"],
failed_session_count=policy["failed_session_count"], failed_session_count=policy["failed_session_count"],
policy_string=policy_strings, policy_string=policy_strings,
mx_host_patterns=mx_host_patterns, mx_host_patterns=mx_host_patterns,
@@ -823,10 +842,10 @@ def save_smtp_tls_report_to_elasticsearch(
additional_information_uri=additional_information_uri, additional_information_uri=additional_information_uri,
failure_reason_code=failure_reason_code, failure_reason_code=failure_reason_code,
) )
smtp_tls_doc.policies.append(policy_doc) smtp_tls_doc.policies.append(policy_doc) # pyright: ignore[reportCallIssue]
create_indexes([index], index_settings) create_indexes([index], index_settings)
smtp_tls_doc.meta.index = index smtp_tls_doc.meta.index = index # pyright: ignore[reportOptionalMemberAccess, reportAttributeAccessIssue]
try: try:
smtp_tls_doc.save() smtp_tls_doc.save()


@@ -1,17 +1,19 @@
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
from __future__ import annotations
import logging import logging
import logging.handlers import logging.handlers
import json
import threading import threading
from typing import Any
from pygelf import GelfTcpHandler, GelfTlsHandler, GelfUdpHandler
from parsedmarc import ( from parsedmarc import (
parsed_aggregate_reports_to_csv_rows, parsed_aggregate_reports_to_csv_rows,
parsed_forensic_reports_to_csv_rows, parsed_forensic_reports_to_csv_rows,
parsed_smtp_tls_reports_to_csv_rows, parsed_smtp_tls_reports_to_csv_rows,
) )
from pygelf import GelfTcpHandler, GelfUdpHandler, GelfTlsHandler
log_context_data = threading.local() log_context_data = threading.local()
@@ -48,7 +50,7 @@ class GelfClient(object):
) )
self.logger.addHandler(self.handler) self.logger.addHandler(self.handler)
def save_aggregate_report_to_gelf(self, aggregate_reports): def save_aggregate_report_to_gelf(self, aggregate_reports: list[dict[str, Any]]):
rows = parsed_aggregate_reports_to_csv_rows(aggregate_reports) rows = parsed_aggregate_reports_to_csv_rows(aggregate_reports)
for row in rows: for row in rows:
log_context_data.parsedmarc = row log_context_data.parsedmarc = row
@@ -56,12 +58,14 @@ class GelfClient(object):
log_context_data.parsedmarc = None log_context_data.parsedmarc = None
def save_forensic_report_to_gelf(self, forensic_reports): def save_forensic_report_to_gelf(self, forensic_reports: list[dict[str, Any]]):
rows = parsed_forensic_reports_to_csv_rows(forensic_reports) rows = parsed_forensic_reports_to_csv_rows(forensic_reports)
for row in rows: for row in rows:
self.logger.info(json.dumps(row)) log_context_data.parsedmarc = row
self.logger.info("parsedmarc forensic report")
def save_smtp_tls_report_to_gelf(self, smtp_tls_reports): def save_smtp_tls_report_to_gelf(self, smtp_tls_reports: dict[str, Any]):
rows = parsed_smtp_tls_reports_to_csv_rows(smtp_tls_reports) rows = parsed_smtp_tls_reports_to_csv_rows(smtp_tls_reports)
for row in rows: for row in rows:
self.logger.info(json.dumps(row)) log_context_data.parsedmarc = row
self.logger.info("parsedmarc smtptls report")
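The handlers above now log a fixed message while attaching the CSV row through a `threading.local`. One way such context can reach a handler is a `logging.Filter` that copies the thread-local field onto each record — a hedged sketch only, as pygelf's own field-injection mechanism may differ:

```python
import logging
import threading

log_context_data = threading.local()

class ContextFilter(logging.Filter):
    # Copy the thread-local parsedmarc row onto every record so a
    # downstream (e.g. GELF) handler can emit it as an extra field.
    def filter(self, record):
        record.parsedmarc = getattr(log_context_data, "parsedmarc", None)
        return True
```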


@@ -1,15 +1,17 @@
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
from __future__ import annotations
import json import json
from ssl import create_default_context from ssl import SSLContext, create_default_context
from typing import Any, Optional, Union
from kafka import KafkaProducer from kafka import KafkaProducer
from kafka.errors import NoBrokersAvailable, UnknownTopicOrPartitionError from kafka.errors import NoBrokersAvailable, UnknownTopicOrPartitionError
from collections import OrderedDict
from parsedmarc.utils import human_timestamp_to_datetime
from parsedmarc import __version__ from parsedmarc import __version__
from parsedmarc.log import logger from parsedmarc.log import logger
from parsedmarc.utils import human_timestamp_to_datetime
class KafkaError(RuntimeError): class KafkaError(RuntimeError):
@@ -18,7 +20,13 @@ class KafkaError(RuntimeError):
class KafkaClient(object): class KafkaClient(object):
def __init__( def __init__(
self, kafka_hosts, ssl=False, username=None, password=None, ssl_context=None self,
kafka_hosts: list[str],
*,
ssl: Optional[bool] = False,
username: Optional[str] = None,
password: Optional[str] = None,
ssl_context: Optional[SSLContext] = None,
): ):
""" """
Initializes the Kafka client Initializes the Kafka client
@@ -28,7 +36,7 @@ class KafkaClient(object):
ssl (bool): Use a SSL/TLS connection ssl (bool): Use a SSL/TLS connection
username (str): An optional username username (str): An optional username
password (str): An optional password password (str): An optional password
ssl_context: SSL context options ssl_context (SSLContext): SSL context options
Notes: Notes:
``use_ssl=True`` is implied when a username or password are ``use_ssl=True`` is implied when a username or password are
@@ -38,7 +46,7 @@ class KafkaClient(object):
``$ConnectionString``, and the password is the ``$ConnectionString``, and the password is the
Azure Event Hub connection string. Azure Event Hub connection string.
""" """
config = dict( config: dict[str, Any] = dict(
value_serializer=lambda v: json.dumps(v).encode("utf-8"), value_serializer=lambda v: json.dumps(v).encode("utf-8"),
bootstrap_servers=kafka_hosts, bootstrap_servers=kafka_hosts,
client_id="parsedmarc-{0}".format(__version__), client_id="parsedmarc-{0}".format(__version__),
@@ -55,7 +63,7 @@ class KafkaClient(object):
raise KafkaError("No Kafka brokers available") raise KafkaError("No Kafka brokers available")
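The `value_serializer` configured above can be exercised on its own: each report dict becomes UTF-8 JSON bytes before being produced to Kafka.

```python
import json

# Same serializer as in the KafkaClient config above.
value_serializer = lambda v: json.dumps(v).encode("utf-8")

payload = value_serializer({"org_name": "example.net", "report_id": "42"})
```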
@staticmethod @staticmethod
def strip_metadata(report): def strip_metadata(report: dict[str, Any]):
""" """
Duplicates org_name, org_email and report_id into JSON root Duplicates org_name, org_email and report_id into JSON root
and removes report_metadata key to bring it more inline and removes report_metadata key to bring it more inline
@@ -69,7 +77,7 @@ class KafkaClient(object):
return report return report
@staticmethod @staticmethod
def generate_daterange(report): def generate_date_range(report: dict[str, Any]):
""" """
Creates a date_range timestamp with format YYYY-MM-DD-T-HH:MM:SS Creates a date_range timestamp with format YYYY-MM-DD-T-HH:MM:SS
based on begin and end dates for easier parsing in Kibana. based on begin and end dates for easier parsing in Kibana.
@@ -86,7 +94,11 @@ class KafkaClient(object):
logger.debug("date_range is {}".format(date_range)) logger.debug("date_range is {}".format(date_range))
return date_range return date_range
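The `YYYY-MM-DD-T-HH:MM:SS` format named in the docstring corresponds to an ISO-8601-style strftime; the exact pattern below is an assumption based on that wording:

```python
from datetime import datetime, timezone

fmt = "%Y-%m-%dT%H:%M:%S"  # assumed pattern matching the docstring
begin = datetime(2026, 2, 20, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2026, 2, 21, 0, 0, 0, tzinfo=timezone.utc)
date_range = [begin.strftime(fmt), end.strftime(fmt)]
```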
def save_aggregate_reports_to_kafka(self, aggregate_reports, aggregate_topic): def save_aggregate_reports_to_kafka(
self,
aggregate_reports: Union[dict[str, Any], list[dict[str, Any]]],
aggregate_topic: str,
):
""" """
Saves aggregate DMARC reports to Kafka Saves aggregate DMARC reports to Kafka
@@ -96,16 +108,14 @@ class KafkaClient(object):
aggregate_topic (str): The name of the Kafka topic aggregate_topic (str): The name of the Kafka topic
""" """
if isinstance(aggregate_reports, dict) or isinstance( if isinstance(aggregate_reports, dict):
aggregate_reports, OrderedDict
):
aggregate_reports = [aggregate_reports] aggregate_reports = [aggregate_reports]
if len(aggregate_reports) < 1: if len(aggregate_reports) < 1:
return return
for report in aggregate_reports: for report in aggregate_reports:
report["date_range"] = self.generate_daterange(report) report["date_range"] = self.generate_date_range(report)
report = self.strip_metadata(report) report = self.strip_metadata(report)
for slice in report["records"]: for slice in report["records"]:
@@ -129,7 +139,11 @@ class KafkaClient(object):
except Exception as e: except Exception as e:
raise KafkaError("Kafka error: {0}".format(e.__str__())) raise KafkaError("Kafka error: {0}".format(e.__str__()))
def save_forensic_reports_to_kafka(self, forensic_reports, forensic_topic): def save_forensic_reports_to_kafka(
self,
forensic_reports: Union[dict[str, Any], list[dict[str, Any]]],
forensic_topic: str,
):
""" """
Saves forensic DMARC reports to Kafka, sends individual Saves forensic DMARC reports to Kafka, sends individual
records (slices) since Kafka requires messages to be <= 1MB records (slices) since Kafka requires messages to be <= 1MB
@@ -159,7 +173,11 @@ class KafkaClient(object):
except Exception as e: except Exception as e:
raise KafkaError("Kafka error: {0}".format(e.__str__())) raise KafkaError("Kafka error: {0}".format(e.__str__()))
def save_smtp_tls_reports_to_kafka(self, smtp_tls_reports, smtp_tls_topic): def save_smtp_tls_reports_to_kafka(
self,
smtp_tls_reports: Union[list[dict[str, Any]], dict[str, Any]],
smtp_tls_topic: str,
):
""" """
Saves SMTP TLS reports to Kafka, sends individual Saves SMTP TLS reports to Kafka, sends individual
records (slices) since Kafka requires messages to be <= 1MB records (slices) since Kafka requires messages to be <= 1MB
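The docstrings above note that individual records (slices) are sent because Kafka caps message size at roughly 1 MB. A hedged sketch of that splitting idea, independent of the module's actual producer code (`to_slices` and the field layout are illustrative):

```python
import json

MAX_BYTES = 1_000_000  # approximate Kafka default message ceiling

def to_slices(report):
    # Illustrative sketch: emit one JSON payload per record so that no
    # single Kafka message exceeds the broker's size limit.
    base = {k: v for k, v in report.items() if k != "records"}
    for record in report.get("records", []):
        slice_ = dict(base)
        slice_["record"] = record
        payload = json.dumps(slice_).encode("utf-8")
        if len(payload) <= MAX_BYTES:
            yield payload

report = {"org_name": "example", "records": [{"count": 1}, {"count": 2}]}
slices = list(to_slices(report))
print(len(slices))
```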


@@ -1,9 +1,15 @@
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
from parsedmarc.log import logger
from __future__ import annotations
from typing import Any
from azure.core.exceptions import HttpResponseError from azure.core.exceptions import HttpResponseError
from azure.identity import ClientSecretCredential from azure.identity import ClientSecretCredential
from azure.monitor.ingestion import LogsIngestionClient from azure.monitor.ingestion import LogsIngestionClient
from parsedmarc.log import logger
class LogAnalyticsException(Exception): class LogAnalyticsException(Exception):
"""Raised when an Elasticsearch error occurs""" """Raised when an Elasticsearch error occurs"""
@@ -102,7 +108,12 @@ class LogAnalyticsClient(object):
"Invalid configuration. " + "One or more required settings are missing." "Invalid configuration. " + "One or more required settings are missing."
) )
def publish_json(self, results, logs_client: LogsIngestionClient, dcr_stream: str): def publish_json(
self,
results,
logs_client: LogsIngestionClient,
dcr_stream: str,
):
""" """
Background function to publish given Background function to publish given
DMARC report to specific Data Collection Rule. DMARC report to specific Data Collection Rule.
@@ -121,7 +132,11 @@ class LogAnalyticsClient(object):
raise LogAnalyticsException("Upload failed: {error}".format(error=e)) raise LogAnalyticsException("Upload failed: {error}".format(error=e))
def publish_results( def publish_results(
self, results, save_aggregate: bool, save_forensic: bool, save_smtp_tls: bool self,
results: dict[str, Any],
save_aggregate: bool,
save_forensic: bool,
save_smtp_tls: bool,
): ):
""" """
Function to publish DMARC and/or SMTP TLS reports to Log Analytics Function to publish DMARC and/or SMTP TLS reports to Log Analytics


@@ -1,3 +1,7 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
from base64 import urlsafe_b64decode from base64 import urlsafe_b64decode
from functools import lru_cache from functools import lru_cache
from pathlib import Path from pathlib import Path
@@ -112,14 +116,14 @@ class GmailConnection(MailboxConnection):
else: else:
return [id for id in self._fetch_all_message_ids(reports_label_id)] return [id for id in self._fetch_all_message_ids(reports_label_id)]
def fetch_message(self, message_id): def fetch_message(self, message_id) -> str:
msg = ( msg = (
self.service.users() self.service.users()
.messages() .messages()
.get(userId="me", id=message_id, format="raw") .get(userId="me", id=message_id, format="raw")
.execute() .execute()
) )
return urlsafe_b64decode(msg["raw"]) return urlsafe_b64decode(msg["raw"]).decode(errors="replace")
def delete_message(self, message_id: str): def delete_message(self, message_id: str):
self.service.users().messages().delete(userId="me", id=message_id) self.service.users().messages().delete(userId="me", id=message_id)
@@ -152,3 +156,4 @@ class GmailConnection(MailboxConnection):
for label in labels: for label in labels:
if label_name == label["id"] or label_name == label["name"]: if label_name == label["id"] or label_name == label["name"]:
return label["id"] return label["id"]
return ""


@@ -1,8 +1,12 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
from enum import Enum from enum import Enum
from functools import lru_cache from functools import lru_cache
from pathlib import Path from pathlib import Path
from time import sleep from time import sleep
from typing import List, Optional from typing import Any, List, Optional, Union
from azure.identity import ( from azure.identity import (
UsernamePasswordCredential, UsernamePasswordCredential,
@@ -24,7 +28,7 @@ class AuthMethod(Enum):
def _get_cache_args(token_path: Path, allow_unencrypted_storage): def _get_cache_args(token_path: Path, allow_unencrypted_storage):
cache_args = { cache_args: dict[str, Any] = {
"cache_persistence_options": TokenCachePersistenceOptions( "cache_persistence_options": TokenCachePersistenceOptions(
name="parsedmarc", allow_unencrypted_storage=allow_unencrypted_storage name="parsedmarc", allow_unencrypted_storage=allow_unencrypted_storage
) )
@@ -147,9 +151,9 @@ class MSGraphConnection(MailboxConnection):
else: else:
logger.warning(f"Unknown response {resp.status_code} {resp.json()}") logger.warning(f"Unknown response {resp.status_code} {resp.json()}")
def fetch_messages(self, folder_name: str, **kwargs) -> List[str]: def fetch_messages(self, reports_folder: str, **kwargs) -> List[str]:
"""Returns a list of message UIDs in the specified folder""" """Returns a list of message UIDs in the specified folder"""
folder_id = self._find_folder_id_from_folder_path(folder_name) folder_id = self._find_folder_id_from_folder_path(reports_folder)
url = f"/users/{self.mailbox_name}/mailFolders/{folder_id}/messages" url = f"/users/{self.mailbox_name}/mailFolders/{folder_id}/messages"
since = kwargs.get("since") since = kwargs.get("since")
if not since: if not since:
@@ -162,7 +166,7 @@ class MSGraphConnection(MailboxConnection):
def _get_all_messages(self, url, batch_size, since): def _get_all_messages(self, url, batch_size, since):
messages: list messages: list
params = {"$select": "id"} params: dict[str, Union[str, int]] = {"$select": "id"}
if since: if since:
params["$filter"] = f"receivedDateTime ge {since}" params["$filter"] = f"receivedDateTime ge {since}"
if batch_size and batch_size > 0: if batch_size and batch_size > 0:
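The `_get_all_messages` hunk above assembles OData query parameters for the Graph API. A sketch of that shape (`build_params` is illustrative, and the `$top` key for the batch size is an assumption, not shown in the diff):

```python
from typing import Dict, Union

def build_params(since=None, batch_size=0):
    # Illustrative sketch of the $select/$filter parameters built in the
    # diff; "$top" for paging is an assumption on my part.
    params: Dict[str, Union[str, int]] = {"$select": "id"}
    if since:
        params["$filter"] = f"receivedDateTime ge {since}"
    if batch_size and batch_size > 0:
        params["$top"] = batch_size
    return params

print(build_params(since="2026-01-01T00:00:00Z", batch_size=50))
```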


@@ -1,3 +1,9 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
from typing import cast
from time import sleep from time import sleep
from imapclient.exceptions import IMAPClientError from imapclient.exceptions import IMAPClientError
@@ -11,14 +17,14 @@ from parsedmarc.mail.mailbox_connection import MailboxConnection
class IMAPConnection(MailboxConnection): class IMAPConnection(MailboxConnection):
def __init__( def __init__(
self, self,
host=None, host: str,
user=None, user: str,
password=None, password: str,
port=None, port: int = 993,
ssl=True, ssl: bool = True,
verify=True, verify: bool = True,
timeout=30, timeout: int = 30,
max_retries=4, max_retries: int = 4,
): ):
self._username = user self._username = user
self._password = password self._password = password
@@ -40,18 +46,18 @@ class IMAPConnection(MailboxConnection):
def fetch_messages(self, reports_folder: str, **kwargs): def fetch_messages(self, reports_folder: str, **kwargs):
self._client.select_folder(reports_folder) self._client.select_folder(reports_folder)
since = kwargs.get("since") since = kwargs.get("since")
if since: if since is not None:
return self._client.search(["SINCE", since]) return self._client.search(f"SINCE {since}")
else: else:
return self._client.search() return self._client.search()
def fetch_message(self, message_id): def fetch_message(self, message_id: int):
return self._client.fetch_message(message_id, parse=False) return cast(str, self._client.fetch_message(message_id, parse=False))
def delete_message(self, message_id: str): def delete_message(self, message_id: int):
self._client.delete_messages([message_id]) self._client.delete_messages([message_id])
def move_message(self, message_id: str, folder_name: str): def move_message(self, message_id: int, folder_name: str):
self._client.move_messages([message_id], folder_name) self._client.move_messages([message_id], folder_name)
def keepalive(self): def keepalive(self):


@@ -1,5 +1,8 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
from abc import ABC from abc import ABC
from typing import List
class MailboxConnection(ABC): class MailboxConnection(ABC):
@@ -10,16 +13,16 @@ class MailboxConnection(ABC):
def create_folder(self, folder_name: str): def create_folder(self, folder_name: str):
raise NotImplementedError raise NotImplementedError
def fetch_messages(self, reports_folder: str, **kwargs) -> List[str]: def fetch_messages(self, reports_folder: str, **kwargs):
raise NotImplementedError raise NotImplementedError
def fetch_message(self, message_id) -> str: def fetch_message(self, message_id) -> str:
raise NotImplementedError raise NotImplementedError
def delete_message(self, message_id: str): def delete_message(self, message_id):
raise NotImplementedError raise NotImplementedError
def move_message(self, message_id: str, folder_name: str): def move_message(self, message_id, folder_name: str):
raise NotImplementedError raise NotImplementedError
def keepalive(self): def keepalive(self):


@@ -1,16 +1,21 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
import mailbox
import os
from time import sleep from time import sleep
from typing import Dict
from parsedmarc.log import logger from parsedmarc.log import logger
from parsedmarc.mail.mailbox_connection import MailboxConnection from parsedmarc.mail.mailbox_connection import MailboxConnection
import mailbox
import os
class MaildirConnection(MailboxConnection): class MaildirConnection(MailboxConnection):
def __init__( def __init__(
self, self,
maildir_path=None, maildir_path: str,
maildir_create=False, maildir_create: bool = False,
): ):
self._maildir_path = maildir_path self._maildir_path = maildir_path
self._maildir_create = maildir_create self._maildir_create = maildir_create
@@ -27,27 +32,31 @@ class MaildirConnection(MailboxConnection):
) )
raise Exception(ex) raise Exception(ex)
self._client = mailbox.Maildir(maildir_path, create=maildir_create) self._client = mailbox.Maildir(maildir_path, create=maildir_create)
self._subfolder_client = {} self._subfolder_client: Dict[str, mailbox.Maildir] = {}
def create_folder(self, folder_name: str): def create_folder(self, folder_name: str):
self._subfolder_client[folder_name] = self._client.add_folder(folder_name) self._subfolder_client[folder_name] = self._client.add_folder(folder_name)
def fetch_messages(self, reports_folder: str, **kwargs): def fetch_messages(self, reports_folder: str, **kwargs):
return self._client.keys() return self._client.keys()
def fetch_message(self, message_id): def fetch_message(self, message_id: str) -> str:
return self._client.get(message_id).as_string() msg = self._client.get(message_id)
if msg is not None:
msg = msg.as_string()
if msg is not None:
return msg
return ""
def delete_message(self, message_id: str): def delete_message(self, message_id: str):
self._client.remove(message_id) self._client.remove(message_id)
def move_message(self, message_id: str, folder_name: str): def move_message(self, message_id: str, folder_name: str):
message_data = self._client.get(message_id) message_data = self._client.get(message_id)
if folder_name not in self._subfolder_client.keys(): if message_data is None:
self._subfolder_client = mailbox.Maildir( return
os.join(self.maildir_path, folder_name), create=self.maildir_create if folder_name not in self._subfolder_client:
) self._subfolder_client[folder_name] = self._client.add_folder(folder_name)
self._subfolder_client[folder_name].add(message_data) self._subfolder_client[folder_name].add(message_data)
self._client.remove(message_id) self._client.remove(message_id)


@@ -1,27 +1,29 @@
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
from collections import OrderedDict from __future__ import annotations
from typing import Any, Optional, Union
from opensearchpy import ( from opensearchpy import (
Q, Boolean,
connections, Date,
Object,
Document, Document,
Index, Index,
Nested,
InnerDoc, InnerDoc,
Integer, Integer,
Text,
Boolean,
Ip, Ip,
Date, Nested,
Object,
Q,
Search, Search,
Text,
connections,
) )
from opensearchpy.helpers import reindex from opensearchpy.helpers import reindex
from parsedmarc import InvalidForensicReport
from parsedmarc.log import logger from parsedmarc.log import logger
from parsedmarc.utils import human_timestamp_to_datetime from parsedmarc.utils import human_timestamp_to_datetime
from parsedmarc import InvalidForensicReport
class OpenSearchError(Exception): class OpenSearchError(Exception):
@@ -67,6 +69,8 @@ class _AggregateReportDoc(Document):
date_range = Date() date_range = Date()
date_begin = Date() date_begin = Date()
date_end = Date() date_end = Date()
normalized_timespan = Boolean()
original_timespan_seconds = Integer()
errors = Text() errors = Text()
published_policy = Object(_PublishedPolicy) published_policy = Object(_PublishedPolicy)
source_ip_address = Ip() source_ip_address = Ip()
@@ -87,18 +91,18 @@ class _AggregateReportDoc(Document):
dkim_results = Nested(_DKIMResult) dkim_results = Nested(_DKIMResult)
spf_results = Nested(_SPFResult) spf_results = Nested(_SPFResult)
def add_policy_override(self, type_, comment): def add_policy_override(self, type_: str, comment: str):
self.policy_overrides.append(_PolicyOverride(type=type_, comment=comment)) self.policy_overrides.append(_PolicyOverride(type=type_, comment=comment))
def add_dkim_result(self, domain, selector, result): def add_dkim_result(self, domain: str, selector: str, result: _DKIMResult):
self.dkim_results.append( self.dkim_results.append(
_DKIMResult(domain=domain, selector=selector, result=result) _DKIMResult(domain=domain, selector=selector, result=result)
) )
def add_spf_result(self, domain, scope, result): def add_spf_result(self, domain: str, scope: str, result: _SPFResult):
self.spf_results.append(_SPFResult(domain=domain, scope=scope, result=result)) self.spf_results.append(_SPFResult(domain=domain, scope=scope, result=result))
def save(self, **kwargs): def save(self, **kwargs): # pyright: ignore[reportIncompatibleMethodOverride]
self.passed_dmarc = False self.passed_dmarc = False
self.passed_dmarc = self.spf_aligned or self.dkim_aligned self.passed_dmarc = self.spf_aligned or self.dkim_aligned
@@ -131,21 +135,21 @@ class _ForensicSampleDoc(InnerDoc):
body = Text() body = Text()
attachments = Nested(_EmailAttachmentDoc) attachments = Nested(_EmailAttachmentDoc)
def add_to(self, display_name, address): def add_to(self, display_name: str, address: str):
self.to.append(_EmailAddressDoc(display_name=display_name, address=address)) self.to.append(_EmailAddressDoc(display_name=display_name, address=address))
def add_reply_to(self, display_name, address): def add_reply_to(self, display_name: str, address: str):
self.reply_to.append( self.reply_to.append(
_EmailAddressDoc(display_name=display_name, address=address) _EmailAddressDoc(display_name=display_name, address=address)
) )
def add_cc(self, display_name, address): def add_cc(self, display_name: str, address: str):
self.cc.append(_EmailAddressDoc(display_name=display_name, address=address)) self.cc.append(_EmailAddressDoc(display_name=display_name, address=address))
def add_bcc(self, display_name, address): def add_bcc(self, display_name: str, address: str):
self.bcc.append(_EmailAddressDoc(display_name=display_name, address=address)) self.bcc.append(_EmailAddressDoc(display_name=display_name, address=address))
def add_attachment(self, filename, content_type, sha256): def add_attachment(self, filename: str, content_type: str, sha256: str):
self.attachments.append( self.attachments.append(
_EmailAttachmentDoc( _EmailAttachmentDoc(
filename=filename, content_type=content_type, sha256=sha256 filename=filename, content_type=content_type, sha256=sha256
@@ -197,15 +201,15 @@ class _SMTPTLSPolicyDoc(InnerDoc):
def add_failure_details( def add_failure_details(
self, self,
result_type, result_type: Optional[str] = None,
ip_address, ip_address: Optional[str] = None,
receiving_ip, receiving_ip: Optional[str] = None,
receiving_mx_helo, receiving_mx_helo: Optional[str] = None,
failed_session_count, failed_session_count: Optional[int] = None,
sending_mta_ip=None, sending_mta_ip: Optional[str] = None,
receiving_mx_hostname=None, receiving_mx_hostname: Optional[str] = None,
additional_information_uri=None, additional_information_uri: Optional[str] = None,
failure_reason_code=None, failure_reason_code: Union[str, int, None] = None,
): ):
_details = _SMTPTLSFailureDetailsDoc( _details = _SMTPTLSFailureDetailsDoc(
result_type=result_type, result_type=result_type,
@@ -235,13 +239,14 @@ class _SMTPTLSReportDoc(Document):
def add_policy( def add_policy(
self, self,
policy_type, policy_type: str,
policy_domain, policy_domain: str,
successful_session_count, successful_session_count: int,
failed_session_count, failed_session_count: int,
policy_string=None, *,
mx_host_patterns=None, policy_string: Optional[str] = None,
failure_details=None, mx_host_patterns: Optional[list[str]] = None,
failure_details: Optional[str] = None,
): ):
self.policies.append( self.policies.append(
policy_type=policy_type, policy_type=policy_type,
@@ -259,24 +264,25 @@ class AlreadySaved(ValueError):
def set_hosts( def set_hosts(
hosts, hosts: Union[str, list[str]],
use_ssl=False, *,
ssl_cert_path=None, use_ssl: Optional[bool] = False,
username=None, ssl_cert_path: Optional[str] = None,
password=None, username: Optional[str] = None,
apiKey=None, password: Optional[str] = None,
timeout=60.0, api_key: Optional[str] = None,
timeout: Optional[float] = 60.0,
): ):
""" """
Sets the OpenSearch hosts to use Sets the OpenSearch hosts to use
Args: Args:
hosts (str|list): A hostname or URL, or list of hostnames or URLs hosts (str|list[str]): A single hostname or URL, or list of hostnames or URLs
use_ssl (bool): Use an HTTPS connection to the server use_ssl (bool): Use an HTTPS connection to the server
ssl_cert_path (str): Path to the certificate chain ssl_cert_path (str): Path to the certificate chain
username (str): The username to use for authentication username (str): The username to use for authentication
password (str): The password to use for authentication password (str): The password to use for authentication
apiKey (str): The Base64 encoded API key to use for authentication api_key (str): The Base64 encoded API key to use for authentication
timeout (float): Timeout in seconds timeout (float): Timeout in seconds
""" """
if not isinstance(hosts, list): if not isinstance(hosts, list):
@@ -289,14 +295,14 @@ def set_hosts(
conn_params["ca_certs"] = ssl_cert_path conn_params["ca_certs"] = ssl_cert_path
else: else:
conn_params["verify_certs"] = False conn_params["verify_certs"] = False
if username: if username and password:
conn_params["http_auth"] = username + ":" + password conn_params["http_auth"] = username + ":" + password
if apiKey: if api_key:
conn_params["api_key"] = apiKey conn_params["api_key"] = api_key
connections.create_connection(**conn_params) connections.create_connection(**conn_params)
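The `set_hosts` hunk above (including the new `username and password` guard and the `api_key` rename) assembles keyword arguments for `connections.create_connection`. A standalone sketch of that assembly, with the connection call itself omitted (`build_conn_params` is illustrative, not the module's function):

```python
def build_conn_params(hosts, use_ssl=False, ssl_cert_path=None,
                      username=None, password=None, api_key=None,
                      timeout=60.0):
    # Illustrative sketch of the parameter assembly in set_hosts, shape
    # inferred from the diff.
    if not isinstance(hosts, list):
        hosts = [hosts]
    params = {"hosts": hosts, "timeout": timeout}
    if use_ssl:
        params["use_ssl"] = True
        if ssl_cert_path:
            params["ca_certs"] = ssl_cert_path
        else:
            params["verify_certs"] = False
    if username and password:  # both required, as in the fixed guard
        params["http_auth"] = f"{username}:{password}"
    if api_key:
        params["api_key"] = api_key
    return params

print(build_conn_params("localhost", username="u", password="p"))
```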
def create_indexes(names, settings=None): def create_indexes(names: list[str], settings: Optional[dict[str, Any]] = None):
""" """
Create OpenSearch indexes Create OpenSearch indexes
@@ -319,7 +325,10 @@ def create_indexes(names, settings=None):
raise OpenSearchError("OpenSearch error: {0}".format(e.__str__())) raise OpenSearchError("OpenSearch error: {0}".format(e.__str__()))
def migrate_indexes(aggregate_indexes=None, forensic_indexes=None): def migrate_indexes(
aggregate_indexes: Optional[list[str]] = None,
forensic_indexes: Optional[list[str]] = None,
):
""" """
Updates index mappings Updates index mappings
@@ -366,18 +375,18 @@ def migrate_indexes(aggregate_indexes=None, forensic_indexes=None):
def save_aggregate_report_to_opensearch( def save_aggregate_report_to_opensearch(
aggregate_report, aggregate_report: dict[str, Any],
index_suffix=None, index_suffix: Optional[str] = None,
index_prefix=None, index_prefix: Optional[str] = None,
monthly_indexes=False, monthly_indexes: bool = False,
number_of_shards=1, number_of_shards: int = 1,
number_of_replicas=0, number_of_replicas: int = 0,
): ):
""" """
Saves a parsed DMARC aggregate report to OpenSearch Saves a parsed DMARC aggregate report to OpenSearch
Args: Args:
aggregate_report (OrderedDict): A parsed forensic report aggregate_report (dict): A parsed forensic report
index_suffix (str): The suffix of the name of the index to save to index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily indexes monthly_indexes (bool): Use monthly indexes instead of daily indexes
@@ -395,15 +404,11 @@ def save_aggregate_report_to_opensearch(
domain = aggregate_report["policy_published"]["domain"] domain = aggregate_report["policy_published"]["domain"]
begin_date = human_timestamp_to_datetime(metadata["begin_date"], to_utc=True) begin_date = human_timestamp_to_datetime(metadata["begin_date"], to_utc=True)
end_date = human_timestamp_to_datetime(metadata["end_date"], to_utc=True) end_date = human_timestamp_to_datetime(metadata["end_date"], to_utc=True)
begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
if monthly_indexes: if monthly_indexes:
index_date = begin_date.strftime("%Y-%m") index_date = begin_date.strftime("%Y-%m")
else: else:
index_date = begin_date.strftime("%Y-%m-%d") index_date = begin_date.strftime("%Y-%m-%d")
aggregate_report["begin_date"] = begin_date
aggregate_report["end_date"] = end_date
date_range = [aggregate_report["begin_date"], aggregate_report["end_date"]]
org_name_query = Q(dict(match_phrase=dict(org_name=org_name))) org_name_query = Q(dict(match_phrase=dict(org_name=org_name)))
report_id_query = Q(dict(match_phrase=dict(report_id=report_id))) report_id_query = Q(dict(match_phrase=dict(report_id=report_id)))
@@ -421,6 +426,8 @@ def save_aggregate_report_to_opensearch(
query = org_name_query & report_id_query & domain_query query = org_name_query & report_id_query & domain_query
query = query & begin_date_query & end_date_query query = query & begin_date_query & end_date_query
search.query = query search.query = query
begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
try: try:
existing = search.execute() existing = search.execute()
@@ -450,6 +457,17 @@ def save_aggregate_report_to_opensearch(
) )
for record in aggregate_report["records"]: for record in aggregate_report["records"]:
begin_date = human_timestamp_to_datetime(record["interval_begin"], to_utc=True)
end_date = human_timestamp_to_datetime(record["interval_end"], to_utc=True)
normalized_timespan = record["normalized_timespan"]
if monthly_indexes:
index_date = begin_date.strftime("%Y-%m")
else:
index_date = begin_date.strftime("%Y-%m-%d")
aggregate_report["begin_date"] = begin_date
aggregate_report["end_date"] = end_date
date_range = [aggregate_report["begin_date"], aggregate_report["end_date"]]
agg_doc = _AggregateReportDoc( agg_doc = _AggregateReportDoc(
xml_schema=aggregate_report["xml_schema"], xml_schema=aggregate_report["xml_schema"],
org_name=metadata["org_name"], org_name=metadata["org_name"],
@@ -457,8 +475,9 @@ def save_aggregate_report_to_opensearch(
org_extra_contact_info=metadata["org_extra_contact_info"], org_extra_contact_info=metadata["org_extra_contact_info"],
report_id=metadata["report_id"], report_id=metadata["report_id"],
date_range=date_range, date_range=date_range,
date_begin=aggregate_report["begin_date"], date_begin=begin_date,
date_end=aggregate_report["end_date"], date_end=end_date,
normalized_timespan=normalized_timespan,
errors=metadata["errors"], errors=metadata["errors"],
published_policy=published_policy, published_policy=published_policy,
source_ip_address=record["source"]["ip_address"], source_ip_address=record["source"]["ip_address"],
@@ -517,18 +536,18 @@ def save_aggregate_report_to_opensearch(
def save_forensic_report_to_opensearch( def save_forensic_report_to_opensearch(
forensic_report, forensic_report: dict[str, Any],
index_suffix=None, index_suffix: Optional[str] = None,
index_prefix=None, index_prefix: Optional[str] = None,
monthly_indexes=False, monthly_indexes: bool = False,
number_of_shards=1, number_of_shards: int = 1,
number_of_replicas=0, number_of_replicas: int = 0,
): ):
""" """
Saves a parsed DMARC forensic report to OpenSearch Saves a parsed DMARC forensic report to OpenSearch
Args: Args:
forensic_report (OrderedDict): A parsed forensic report forensic_report (dict): A parsed forensic report
index_suffix (str): The suffix of the name of the index to save to index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily monthly_indexes (bool): Use monthly indexes instead of daily
@@ -548,7 +567,7 @@ def save_forensic_report_to_opensearch(
sample_date = forensic_report["parsed_sample"]["date"] sample_date = forensic_report["parsed_sample"]["date"]
sample_date = human_timestamp_to_datetime(sample_date) sample_date = human_timestamp_to_datetime(sample_date)
original_headers = forensic_report["parsed_sample"]["headers"] original_headers = forensic_report["parsed_sample"]["headers"]
headers = OrderedDict() headers: dict[str, Any] = {}
for original_header in original_headers: for original_header in original_headers:
headers[original_header.lower()] = original_headers[original_header] headers[original_header.lower()] = original_headers[original_header]
@@ -684,18 +703,18 @@ def save_forensic_report_to_opensearch(
def save_smtp_tls_report_to_opensearch( def save_smtp_tls_report_to_opensearch(
report, report: dict[str, Any],
index_suffix=None, index_suffix: Optional[str] = None,
index_prefix=None, index_prefix: Optional[str] = None,
monthly_indexes=False, monthly_indexes: bool = False,
number_of_shards=1, number_of_shards: int = 1,
number_of_replicas=0, number_of_replicas: int = 0,
): ):
""" """
Saves a parsed SMTP TLS report to OpenSearch Saves a parsed SMTP TLS report to OpenSearch
Args: Args:
report (OrderedDict): A parsed SMTP TLS report report (dict): A parsed SMTP TLS report
index_suffix (str): The suffix of the name of the index to save to index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily indexes monthly_indexes (bool): Use monthly indexes instead of daily indexes
@@ -705,7 +724,7 @@ def save_smtp_tls_report_to_opensearch(
Raises: Raises:
AlreadySaved AlreadySaved
""" """
logger.info("Saving aggregate report to OpenSearch") logger.info("Saving SMTP TLS report to OpenSearch")
org_name = report["organization_name"] org_name = report["organization_name"]
report_id = report["report_id"] report_id = report["report_id"]
begin_date = human_timestamp_to_datetime(report["begin_date"], to_utc=True) begin_date = human_timestamp_to_datetime(report["begin_date"], to_utc=True)
@@ -781,7 +800,7 @@ def save_smtp_tls_report_to_opensearch(
policy_doc = _SMTPTLSPolicyDoc( policy_doc = _SMTPTLSPolicyDoc(
policy_domain=policy["policy_domain"], policy_domain=policy["policy_domain"],
policy_type=policy["policy_type"], policy_type=policy["policy_type"],
succesful_session_count=policy["successful_session_count"], successful_session_count=policy["successful_session_count"],
failed_session_count=policy["failed_session_count"], failed_session_count=policy["failed_session_count"],
policy_string=policy_strings, policy_string=policy_strings,
mx_host_patterns=mx_host_patterns, mx_host_patterns=mx_host_patterns,


@@ -27,6 +27,7 @@ The `service_type` is based on the following rule precedence:
- Agriculture - Agriculture
- Automotive - Automotive
- Beauty - Beauty
- Conglomerate
- Construction - Construction
- Consulting - Consulting
- Defense - Defense
@@ -43,6 +44,7 @@ The `service_type` is based on the following rule precedence:
- IaaS - IaaS
- Industrial - Industrial
- ISP - ISP
- Legal
- Logistics - Logistics
- Manufacturing - Manufacturing
- Marketing - Marketing
@@ -52,6 +54,7 @@ The `service_type` is based on the following rule precedence:
- Nonprofit - Nonprofit
- PaaS - PaaS
- Photography - Photography
- Physical Security
- Print - Print
- Publishing - Publishing
- Real Estate - Real Estate
@@ -74,12 +77,16 @@ A list of reverse DNS base domains that could not be identified as belonging to
## base_reverse_dns.csv ## base_reverse_dns.csv
A CSV with the fields `source_name` and optionally `message_count`. This CSV can be generated byy exporting the base DNS data from the Kibana on Splunk dashboards provided by parsedmarc. This file is not tracked by Git. A CSV with the fields `source_name` and optionally `message_count`. This CSV can be generated by exporting the base DNS data from the Kibana or Splunk dashboards provided by parsedmarc. This file is not tracked by Git.
## unknown_base_reverse_dns.csv ## unknown_base_reverse_dns.csv
A CSV file with the fields `source_name` and `message_count`. This file is not tracked by Git. A CSV file with the fields `source_name` and `message_count`. This file is not tracked by Git.
## find_bad_utf8.py
Locates invalid UTF-8 bytes in files and optionally tries to correct them. Generated by GPT-5. Helped me find where I had introduced invalid bytes in `base_reverse_dns_map.csv`.
## find_unknown_base_reverse_dns.py ## find_unknown_base_reverse_dns.py
This is a python script that reads the domains in `base_reverse_dns.csv` and writes the domains that are not in `base_reverse_dns_map.csv` or `known_unknown_base_reverse_dns.txt` to `unknown_base_reverse_dns.csv`. This is useful for identifying potential additional domains to contribute to `base_reverse_dns_map.csv` and `known_unknown_base_reverse_dns.txt`. This is a python script that reads the domains in `base_reverse_dns.csv` and writes the domains that are not in `base_reverse_dns_map.csv` or `known_unknown_base_reverse_dns.txt` to `unknown_base_reverse_dns.csv`. This is useful for identifying potential additional domains to contribute to `base_reverse_dns_map.csv` and `known_unknown_base_reverse_dns.txt`.


@@ -0,0 +1,44 @@
Agriculture
Automotive
Beauty
Conglomerate
Construction
Consulting
Defense
Education
Email Provider
Email Security
Entertainment
Event Planning
Finance
Food
Government
Government Media
Healthcare
ISP
IaaS
Industrial
Legal
Logistics
MSP
MSSP
Manufacturing
Marketing
News
Nonprofit
PaaS
Photography
Physical Security
Print
Publishing
Real Estate
Retail
SaaS
Science
Search Engine
Social Media
Sports
Staffing
Technology
Travel
Web Host


@@ -0,0 +1,488 @@
#!/usr/bin/env python3
import argparse
import codecs
import os
import sys
import shutil
from typing import List, Tuple
"""
Locates and optionally corrects bad UTF-8 bytes in a file.
Generated by GPT-5. Use at your own risk.
"""
# -------------------------
# UTF-8 scanning
# -------------------------
def scan_line_for_utf8_errors(
line_bytes: bytes, line_no: int, base_offset: int, context: int
):
"""
Scan one line of raw bytes for UTF-8 decoding errors.
Returns a list of dicts describing each error.
"""
pos = 0
results = []
while pos < len(line_bytes):
dec = codecs.getincrementaldecoder("utf-8")("strict")
try:
dec.decode(line_bytes[pos:], final=True)
break
except UnicodeDecodeError as e:
rel_index = e.start
abs_index_in_line = pos + rel_index
abs_offset = base_offset + abs_index_in_line
start_ctx = max(0, abs_index_in_line - context)
end_ctx = min(len(line_bytes), abs_index_in_line + 1 + context)
ctx_bytes = line_bytes[start_ctx:end_ctx]
bad_byte = line_bytes[abs_index_in_line : abs_index_in_line + 1]
col = abs_index_in_line + 1 # 1-based byte column
results.append(
{
"line": line_no,
"column": col,
"abs_offset": abs_offset,
"bad_byte_hex": bad_byte.hex(),
"context_hex": ctx_bytes.hex(),
"context_preview": ctx_bytes.decode("utf-8", errors="replace"),
}
)
# Move past the offending byte and continue
pos = abs_index_in_line + 1
return results
def scan_file_for_utf8_errors(path: str, context: int, limit: int):
errors_found = 0
limit_val = limit if limit != 0 else float("inf")
with open(path, "rb") as f:
total_offset = 0
line_no = 0
while True:
line = f.readline()
if not line:
break
line_no += 1
results = scan_line_for_utf8_errors(line, line_no, total_offset, context)
for r in results:
errors_found += 1
print(
f"[ERROR {errors_found}] Line {r['line']}, Column {r['column']}, "
f"Absolute byte offset {r['abs_offset']}"
)
print(f" Bad byte: 0x{r['bad_byte_hex']}")
print(f" Context (hex): {r['context_hex']}")
print(f" Context (preview): {r['context_preview']}")
print()
if errors_found >= limit_val:
print(f"Reached limit of {limit} errors. Stopping.")
return errors_found
total_offset += len(line)
if errors_found == 0:
print("No invalid UTF-8 bytes found. 🎉")
else:
print(f"Found {errors_found} invalid UTF-8 byte(s).")
return errors_found
# -------------------------
# Whole-file conversion
# -------------------------
def detect_encoding_text(path: str) -> Tuple[str, str]:
"""
Use charset-normalizer to detect file encoding.
Return (encoding_name, decoded_text). Falls back to cp1252 if needed.
"""
try:
from charset_normalizer import from_path
except ImportError:
print(
"Please install charset-normalizer: pip install charset-normalizer",
file=sys.stderr,
)
sys.exit(4)
matches = from_path(path)
match = matches.best()
if match is None or match.encoding is None:
# Fallback heuristic for Western single-byte text
with open(path, "rb") as fb:
data = fb.read()
try:
return "cp1252", data.decode("cp1252", errors="strict")
except UnicodeDecodeError:
print("Unable to detect encoding reliably.", file=sys.stderr)
sys.exit(5)
return match.encoding, str(match)
def convert_to_utf8(src_path: str, out_path: str, src_encoding: str = None) -> str:
"""
Convert an entire file to UTF-8 (re-decoding everything).
If src_encoding is provided, use it; else auto-detect.
Returns the encoding actually used.
"""
if src_encoding:
with open(src_path, "rb") as fb:
data = fb.read()
try:
text = data.decode(src_encoding, errors="strict")
except LookupError:
print(f"Unknown encoding: {src_encoding}", file=sys.stderr)
sys.exit(6)
except UnicodeDecodeError as e:
print(f"Decoding failed with {src_encoding}: {e}", file=sys.stderr)
sys.exit(7)
used = src_encoding
else:
used, text = detect_encoding_text(src_path)
with open(out_path, "w", encoding="utf-8", newline="") as fw:
fw.write(text)
return used
def verify_utf8_file(path: str) -> Tuple[bool, str]:
try:
with open(path, "rb") as fb:
fb.read().decode("utf-8", errors="strict")
return True, ""
except UnicodeDecodeError as e:
return False, str(e)
# -------------------------
# Targeted single-byte fixer
# -------------------------
def iter_lines_with_offsets(b: bytes):
"""
Yield (line_bytes, line_start_abs_offset). Preserves LF/CRLF/CR in bytes.
"""
start = 0
for i, byte in enumerate(b):
if byte == 0x0A: # LF
yield b[start : i + 1], start
start = i + 1
if start < len(b):
yield b[start:], start
def detect_probable_fallbacks() -> List[str]:
# Good defaults for Western/Portuguese text
return ["cp1252", "iso-8859-1", "iso-8859-15"]
def repair_mixed_utf8_line(line: bytes, base_offset: int, fallback_chain: List[str]):
"""
Strictly validate UTF-8 and fix *only* the exact offending byte when an error occurs.
This avoids touching adjacent valid UTF-8 (prevents mojibake like 'Ã©').
"""
out_fragments: List[str] = []
fixes = []
pos = 0
n = len(line)
while pos < n:
dec = codecs.getincrementaldecoder("utf-8")("strict")
try:
s = dec.decode(line[pos:], final=True)
out_fragments.append(s)
break
except UnicodeDecodeError as e:
# Append the valid prefix before the error
if e.start > 0:
out_fragments.append(
line[pos : pos + e.start].decode("utf-8", errors="strict")
)
bad_index = pos + e.start # absolute index in 'line'
bad_slice = line[bad_index : bad_index + 1] # FIX EXACTLY ONE BYTE
# Decode that single byte using the first working fallback
decoded = None
used_enc = None
for enc in fallback_chain:
try:
decoded = bad_slice.decode(enc, errors="strict")
used_enc = enc
break
except Exception:
continue
if decoded is None:
# latin-1 always succeeds (byte->same code point)
decoded = bad_slice.decode("latin-1")
used_enc = "latin-1 (fallback)"
out_fragments.append(decoded)
# Log the fix
col_1based = bad_index + 1 # byte-based column
fixes.append(
{
"line_base_offset": base_offset,
"line": None, # caller fills line number
"column": col_1based,
"abs_offset": base_offset + bad_index,
"bad_bytes_hex": bad_slice.hex(),
"used_encoding": used_enc,
"replacement_preview": decoded,
}
)
# Advance exactly one byte past the offending byte and continue
pos = bad_index + 1
return "".join(out_fragments), fixes
def targeted_fix_to_utf8(
src_path: str,
out_path: str,
fallback_chain: List[str],
dry_run: bool,
max_fixes: int,
):
with open(src_path, "rb") as fb:
data = fb.read()
total_fixes = 0
repaired_lines: List[str] = []
line_no = 0
max_val = max_fixes if max_fixes != 0 else float("inf")
for line_bytes, base_offset in iter_lines_with_offsets(data):
line_no += 1
# Fast path: keep lines that are already valid UTF-8
try:
repaired_lines.append(line_bytes.decode("utf-8", errors="strict"))
continue
except UnicodeDecodeError:
pass
fixed_text, fixes = repair_mixed_utf8_line(
line_bytes, base_offset, fallback_chain=fallback_chain
)
for f in fixes:
f["line"] = line_no
repaired_lines.append(fixed_text)
# Log fixes
for f in fixes:
total_fixes += 1
print(
f"[FIX {total_fixes}] Line {f['line']}, Column {f['column']}, Abs offset {f['abs_offset']}"
)
print(f" Bad bytes: 0x{f['bad_bytes_hex']}")
print(f" Used encoding: {f['used_encoding']}")
preview = f["replacement_preview"].replace("\r", "\\r").replace("\n", "\\n")
if len(preview) > 40:
preview = preview[:40] + "…"
print(f" Replacement preview: {preview}")
print()
if total_fixes >= max_val:
print(f"Reached max fixes limit ({max_fixes}). Stopping scan.")
break
if total_fixes >= max_val:
break
if dry_run:
print(f"Dry run complete. Detected {total_fixes} fix(es). No file written.")
return total_fixes
# Join and verify result can be encoded to UTF-8
repaired_text = "".join(repaired_lines)
try:
repaired_text.encode("utf-8", errors="strict")
except UnicodeEncodeError as e:
print(f"Internal error: repaired text not valid UTF-8: {e}", file=sys.stderr)
sys.exit(3)
with open(out_path, "w", encoding="utf-8", newline="") as fw:
fw.write(repaired_text)
print(f"Fixed file written to: {out_path}")
print(f"Total fixes applied: {total_fixes}")
return total_fixes
# -------------------------
# CLI
# -------------------------
def main():
ap = argparse.ArgumentParser(
description=(
"Scan for invalid UTF-8; optionally convert whole file or fix only invalid bytes.\n\n"
"By default, --convert and --fix **edit the input file in place** and create a backup "
"named '<input>.bak' before writing. If you pass --output, the original file is left "
"unchanged and no backup is created. Use --dry-run to preview fixes without writing."
),
formatter_class=argparse.RawTextHelpFormatter,
)
ap.add_argument("path", help="Path to the CSV/text file")
ap.add_argument(
"--context",
type=int,
default=20,
help="Bytes of context to show around errors (default: 20)",
)
ap.add_argument(
"--limit",
type=int,
default=100,
help="Max errors to report during scan (0 = unlimited)",
)
ap.add_argument(
"--skip-scan", action="store_true", help="Skip initial scan for speed"
)
# Whole-file convert
ap.add_argument(
"--convert",
action="store_true",
help="Convert entire file to UTF-8 using auto/forced encoding "
"(in-place by default; creates '<input>.bak').",
)
ap.add_argument(
"--encoding",
help="Force source encoding for --convert or first fallback for --fix",
)
ap.add_argument(
"--output",
help="Write to this path instead of in-place (no .bak is created in that case)",
)
# Targeted fix
ap.add_argument(
"--fix",
action="store_true",
help="Fix only invalid byte(s) via fallback encodings "
"(in-place by default; creates '<input>.bak').",
)
ap.add_argument(
"--fallbacks",
help="Comma-separated fallback encodings (default: cp1252,iso-8859-1,iso-8859-15)",
)
ap.add_argument(
"--dry-run",
action="store_true",
help="(fix) Print fixes but do not write or create a .bak",
)
ap.add_argument(
"--max-fixes",
type=int,
default=0,
help="(fix) Stop after N fixes (0 = unlimited)",
)
args = ap.parse_args()
path = args.path
if not os.path.isfile(path):
print(f"File not found: {path}", file=sys.stderr)
sys.exit(2)
# Optional scan first
if not args.skip_scan:
scan_file_for_utf8_errors(path, context=args.context, limit=args.limit)
# Mode selection guards
if args.convert and args.fix:
print("Choose either --convert or --fix (not both).", file=sys.stderr)
sys.exit(9)
if not args.convert and not args.fix and args.skip_scan:
print("No action selected (use --convert or --fix).")
return
if not args.convert and not args.fix:
# User only wanted a scan
return
# Determine output path and backup behavior
# In-place by default: create '<input>.bak' before overwriting.
if args.output:
out_path = args.output
in_place = False
else:
out_path = path
in_place = True
# CONVERT mode
if args.convert:
print("\n[CONVERT MODE] Converting file to UTF-8...")
if in_place:
# Create backup before overwriting original
backup_path = path + ".bak"
shutil.copy2(path, backup_path)
print(f"Backup created: {backup_path}")
used = convert_to_utf8(path, out_path, src_encoding=args.encoding)
print(f"Source encoding used: {used}")
print(f"Saved UTF-8 file as: {out_path}")
ok, err = verify_utf8_file(out_path)
if ok:
print("Verification: output is valid UTF-8 ✅")
else:
print(f"Verification failed: {err}")
sys.exit(8)
return
# FIX mode (targeted, single-byte)
if args.fix:
print("\n[FIX MODE] Fixing only invalid bytes to UTF-8...")
if args.dry_run:
# Dry-run: never write or create backup
out_path_effective = os.devnull
in_place_effective = False
else:
out_path_effective = out_path
in_place_effective = in_place
# Build fallback chain (if --encoding provided, try it first)
if args.fallbacks:
fallback_chain = [e.strip() for e in args.fallbacks.split(",") if e.strip()]
else:
fallback_chain = detect_probable_fallbacks()
if args.encoding and args.encoding not in fallback_chain:
fallback_chain = [args.encoding] + fallback_chain
if in_place_effective:
# Create backup before overwriting original (only when actually writing)
backup_path = path + ".bak"
shutil.copy2(path, backup_path)
print(f"Backup created: {backup_path}")
fix_count = targeted_fix_to_utf8(
path,
out_path_effective,
fallback_chain=fallback_chain,
dry_run=args.dry_run,
max_fixes=args.max_fixes,
)
if not args.dry_run:
ok, err = verify_utf8_file(out_path_effective)
if ok:
print("Verification: output is valid UTF-8 ✅")
print(f"Fix mode completed — {fix_count} byte(s) corrected.")
else:
print(f"Verification failed: {err}")
sys.exit(8)
return
if __name__ == "__main__":
main()
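The scanning loop above can be exercised in isolation. This sketch reproduces the strict incremental-decoder technique used by `scan_line_for_utf8_errors` (byte positions are 0-based internally and reported as 1-based columns):

```python
import codecs

def find_bad_bytes(line: bytes):
    """Return (column, bad_byte_hex) pairs for each invalid UTF-8 byte,
    using the same strict incremental decoder as the scanner above."""
    pos, results = 0, []
    while pos < len(line):
        dec = codecs.getincrementaldecoder("utf-8")("strict")
        try:
            dec.decode(line[pos:], final=True)
            break
        except UnicodeDecodeError as e:
            bad = pos + e.start
            results.append((bad + 1, line[bad:bad + 1].hex()))
            pos = bad + 1  # step past the offending byte and keep scanning
    return results

# 0xE9 is 'é' in cp1252 but an incomplete sequence in UTF-8
print(find_bad_bytes(b"caf\xe9 ok"))  # [(4, 'e9')]
```

This is why the fixer's fallback chain starts with cp1252: a lone 0xE9 decodes there to the intended character.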


@@ -1,6 +1,5 @@
 #!/usr/bin/env python
-import logging
 import os
 import csv
@@ -9,60 +8,66 @@ def _main():
     input_csv_file_path = "base_reverse_dns.csv"
     base_reverse_dns_map_file_path = "base_reverse_dns_map.csv"
     known_unknown_list_file_path = "known_unknown_base_reverse_dns.txt"
+    psl_overrides_file_path = "psl_overrides.txt"
     output_csv_file_path = "unknown_base_reverse_dns.csv"
     csv_headers = ["source_name", "message_count"]
+    known_unknown_domains = []
+    psl_overrides = []
+    known_domains = []
     output_rows = []
-    logging.basicConfig()
-    logger = logging.getLogger(__name__)
-    logger.setLevel(logging.INFO)
+    def load_list(file_path, list_var):
+        if not os.path.exists(file_path):
+            print(f"Error: {file_path} does not exist")
+        print(f"Loading {file_path}")
+        with open(file_path) as f:
+            for line in f.readlines():
+                domain = line.lower().strip()
+                if domain in list_var:
+                    print(f"Error: {domain} is in {file_path} multiple times")
+                    exit(1)
+                elif domain != "":
+                    list_var.append(domain)
-    for p in [
-        input_csv_file_path,
-        base_reverse_dns_map_file_path,
-        known_unknown_list_file_path,
-    ]:
-        if not os.path.exists(p):
-            logger.error(f"{p} does not exist")
-            exit(1)
-    logger.info(f"Loading {known_unknown_list_file_path}")
-    known_unknown_domains = []
-    with open(known_unknown_list_file_path) as f:
-        for line in f.readlines():
-            domain = line.lower().strip()
-            if domain in known_unknown_domains:
-                logger.warning(
-                    f"{domain} is in {known_unknown_list_file_path} multiple times"
-                )
-            else:
-                known_unknown_domains.append(domain)
-    logger.info(f"Loading {base_reverse_dns_map_file_path}")
-    known_domains = []
+    load_list(known_unknown_list_file_path, known_unknown_domains)
+    load_list(psl_overrides_file_path, psl_overrides)
+    if not os.path.exists(base_reverse_dns_map_file_path):
+        print(f"Error: {base_reverse_dns_map_file_path} does not exist")
+    print(f"Loading {base_reverse_dns_map_file_path}")
     with open(base_reverse_dns_map_file_path) as f:
         for row in csv.DictReader(f):
             domain = row["base_reverse_dns"].lower().strip()
             if domain in known_domains:
-                logger.warning(
-                    f"{domain} is in {base_reverse_dns_map_file_path} multiple times"
-                )
+                print(
+                    f"Error: {domain} is in {base_reverse_dns_map_file_path} multiple times"
+                )
+                exit()
             else:
                 known_domains.append(domain)
             if domain in known_unknown_domains and known_domains:
-                pass
-                logger.warning(
-                    f"{domain} is in {known_unknown_list_file_path} and {base_reverse_dns_map_file_path}"
-                )
+                print(
+                    f"Error:{domain} is in {known_unknown_list_file_path} and \
+{base_reverse_dns_map_file_path}"
+                )
+                exit(1)
-    logger.info(f"Checking domains against {base_reverse_dns_map_file_path}")
+    if not os.path.exists(input_csv_file_path):
+        print(f"Error: {base_reverse_dns_map_file_path} does not exist")
+        exit(1)
     with open(input_csv_file_path) as f:
         for row in csv.DictReader(f):
             domain = row["source_name"].lower().strip()
+            if domain == "":
+                continue
+            for psl_domain in psl_overrides:
+                if domain.endswith(psl_domain):
+                    domain = psl_domain.strip(".").strip("-")
+                    break
             if domain not in known_domains and domain not in known_unknown_domains:
-                logger.info(f"New unknown domain found: {domain}")
+                print(f"New unknown domain found: {domain}")
                 output_rows.append(row)
-    logger.info(f"Writing {output_csv_file_path}")
+    print(f"Writing {output_csv_file_path}")
     with open(output_csv_file_path, "w") as f:
         writer = csv.DictWriter(f, fieldnames=csv_headers)
         writer.writeheader()
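The new PSL-override handling can be sketched in isolation; `collapse_domain` is a hypothetical helper name that mirrors the loop added in the diff above:

```python
def collapse_domain(domain: str, psl_overrides: list[str]) -> str:
    # Map any hostname under an override suffix to the bare suffix,
    # stripping the leading "." or "-" used in psl_overrides.txt entries
    for psl_domain in psl_overrides:
        if domain.endswith(psl_domain):
            return psl_domain.strip(".").strip("-")
    return domain

overrides = [".amazonaws.com", "-tataidc.co.in"]
print(collapse_domain("ec2-1-2-3-4.compute.amazonaws.com", overrides))  # amazonaws.com
print(collapse_domain("example.org", overrides))  # example.org
```

This keeps dynamically named hosts under a shared provider suffix from flooding the unknown-domain report as distinct entries.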


@@ -1,125 +1,601 @@
200.in-addr.arpa
1jli.site
26.107
444qcuhilla.com
4xr1.com
9services.com
a7e.ru
a94434500-blog.com
aams8.jp
abv-10.top
acemail.co.in
activaicon.com
adcritic.net
adlucrumnewsletter.com
admin.corpivensa.gob.ve
advantageiq.com
advrider.ro
aerospacevitro.us.com
agenturserver.de
aghories.com
ai270.net
albagroup-eg.com
alchemy.net
alohabeachcamp.net
alsiscad.com
aluminumpipetubing.com
americanstorageca.com
amplusserver.info
anchorfundhub.com
anglishment.com
anteldata.net.uy
antis.edu
antonaoll.com
anviklass.org
anwrgrp.lat
aosau.net
arandomserver.com
aransk.ru
ardcs.cn
armninl.met
as29550.net
asahachimaru.com
aserv.co.za
asmecam.it
ateky.net.br
aurelienvos.com
automatech.lat
avistaadvantage.com
b8sales.com
bahjs.com
baliaura.com
banaras.co
bearandbullmarketnews.com
bestinvestingtime.com
bhjui.com
biocorp.com
bisno1.co.jp
biosophy.net
bitter-echo.com
bizhostingservices.com
blguss.com
bluenet.ch
bluhosting.com
bnasg.com
bodiax.pp.ua
bost-law.com
brainity.com
brazalnde.net
brellatransplc.shop
brnonet.cz
broadwaycover.com
brushinglegal.de
brw.net
btes.tv
budgeteasehub.com
buoytoys.com
buyjapanese.jp
c53dw7m24rj.com
cahtelrandom.org
casadelmarsamara.com
cashflowmasterypro.com
cavabeen.com
cbti.net
centralmalaysia.com
chauffeurplan.co.uk
checkpox.fun
chegouseuvlache.org
chinaxingyu.xyz
christus.mx
churchills.market
ci-xyz.fit
cisumrecords.com
ckaik.cn
clcktoact.com
cli-eurosignal.cz
cloud-admin.it
cloud-edm.com
cloudflare-email.org
cloudhosting.rs
cloudlogin.co
cloudplatformpro.com
cnode.io
cntcloud.com
code-it.net
codefriend.top
colombiaceropapel.org
commerceinsurance.com
comsharempc.com
conexiona.com
coolblaze.com
coowo.com
corpemail.net
cp2-myorderbox.com
cps.com.ar
crnagora.net
cross-d-bar-troutranch.com
ctla.co.kr
cumbalikonakhotel.com
currencyexconverter.com
daakbabu.com
daikinmae.com
dairyvalley.com.my
dastans.ru
datahost36.de
ddii.network
deep-sek.shop
deetownsounds.com
descarca-counter-strike.net
detrot.xyz
dettlaffinc.com
dextoolse.net
digestivedaily.com
digi.net.my
dinofelis.cn
diwkyncbi.top
dkginternet.com
dnexpress.info
dns-oid.com
dnsindia.net
domainserver.ne.jp
domconfig.com
doorsrv.com
dreampox.fun
dreamtechmedia.com
ds.network
dss-group.net
dvj.theworkpc.com
dwlcka.com
dynamic-wiretel.in
dyntcorp.com
easternkingspei.com
economiceagles.com
egosimail.com
eliotporterphotos.us
emailgids.net
emailperegrine.com
entendercopilot.com
entretothom.net
epaycontrol.com
epicinvestmentsreview.co
epicinvestmentsreview.com
epik.com
epsilon-group.com
erestaff.com
euro-trade-gmbh.com
example.com
exposervers.com-new
extendcp.co.uk
eyecandyhosting.xyz
fastwebnet.it
fd9ing7wfn.com
feipnghardware.com
fetscorp.shop
fewo-usedom.net
fin-crime.com
financeaimpoint.com
financeupward.com
firmflat.com
flex-video.bnr.la
flourishfusionlife.com
formicidaehunt.net
fosterheap.com
fredi.shop
frontiernet.net
ftifb7tk3c.com
gamersprotectionvpn.online
gendns.com
getgreencardsfast.com
getthatroi.com
gibbshosting.com
gigidea.net
giize.com
ginous.eu.com
gis.net
gist-th.com
globalglennpartners.com
goldsboroughplace.com
gophermedia.com
gqlists.us.com
gratzl.de
greatestworldnews.com
greennutritioncare.com
gsbb.com
gumbolimbo.net
h-serv.co.uk
haedefpartners.com
halcyon-aboveboard.com
hanzubon.org
healthfuljourneyjoy.com
hgnbroken.us.com
highwey-diesel.com
hirofactory.com
hjd.asso.fr
hongchenggco.pro
hongkongtaxi.co
hopsinthehanger.com
hosted-by-worldstream.net
hostelsucre.com
hosting1337.com
hostinghane.com
hostinglotus.cloud
hostingmichigan.com
hostiran.name
hostmnl.com
hostname.localhost
hostnetwork.com
hosts.net.nz
hostserv.eu
hostwhitelabel.com
hpms1.jp
hunariojmk.net
hunriokinmuim.net
hypericine.com
i-mecca.net
iaasdns.com
iam.net.ma
iconmarketingguy.com
idcfcloud.net
idealconcept.live
igmohji.com
igppevents.org.uk
ihglobaldns.com
ilmessicano.com
imjtmn.cn
immenzaces.com
in-addr-arpa
in-addr.arpa
indsalelimited.com
indulgent-holistic.com
industechint.org
inshaaegypt.com
intal.uz
interfarma.kz
intocpanel.com
ip-147-135-108.us
ip-178-33-109.eu
ip-ptr.tech
iswhatpercent.com
itsidc.com
itwebs.com
iuon.net
ivol.co
jalanet.co.id
jimishare.com
jlccptt.net.cn
jlenterprises.co.uk
jmontalto.com
joyomokei.com
jumanra.org
justlongshirts.com
kahlaa.com
kaw.theworkpc.com
kbronet.com.tw
kdnursing.org
kielnet.net
kihy.theworkpc.com
kingschurchwirral.org
kitchenaildbd.com
klaomi.shop
knkconsult.net
kohshikai.com
krhfund.org
krillaglass.com
lancorhomes.com
landpedia.org
lanzatuseo.es
layerdns.cloud
learninglinked.com
legenditds.com
levertechcentre.com
lhost.no
lideri.net.br
lighthouse-media.com
lightpath.net
limogesporcelainboxes.com
lindsaywalt.net
linuxsunucum.com
listertermoformadoa.com
llsend.com
local.net
lohkal.com
londionrtim.net
lonestarmm.net
longmarquis.com
longwoodmgmt.com
lse.kz
lunvoy.com
luxarpro.ru
lwl-puehringer.at
lynx.net.lb
lyse.net
m-sender.com.ua
maggiolicloud.it
magnetmail.net
magnumgo.uz
maia11.com
mail-fire.com
mailsentinel.net
mailset.cn
malardino.net
managed-vps.net
manhattanbulletpoint.com
manpowerservices.com
marketmysterycode.com
marketwizardspro.com
masterclassjournal.com
matroguel.cam
maximpactipo.com
mechanicalwalk.store
mediavobis.com
meqlobal.com
mgts.by
migrans.net
miixta.com
milleniumsrv.com
mindworksunlimited.com
mirth-gale.com
misorpresa.com
mitomobile.com
mitsubachi-kibako.net
mjinn.com
mkegs.shop
mobius.fr
model-ac.ink
moderntradingnews.com
monnaiegroup.com
monopolizeright.com
moonjaws.com
morningnewscatcher.com
motion4ever.net
mschosting.com
msdp1.com
mspnet.pro
mts-nn.ru
multifamilydesign.com
mxserver.ro
mxthunder.net
my-ihor.ru
mycloudmailbox.com
myfriendforum.com
myrewards.net
mysagestore.com
mysecurewebserver.com
myshanet.net
myvps.jp
mywedsite.net
mywic.eu
name.tools
nanshenqfurniture.com
nask.pl
navertise.net
ncbb.kz
ncport.ru
ncsdi.ws
nebdig.com
neovet-base.ru
netbri.com
netcentertelecom.net.br
neti.ee
netkl.org
newinvestingguide.com
newwallstreetcode.com
ngvcv.cn
nic.name
nidix.net
nieuwedagnetwerk.net
nlscanme.com
nmeuh.cn
noisndametal.com
nucleusemail.com
nutriboostlife.com
nwo.giize.com
nwwhalewatchers.org
ny.adsl
nyt1.com
offerslatedeals.com
office365.us
ogicom.net
olivettilexikon.co.uk
omegabrasil.inf.br
onnet21.com
onumubunumu.com
oppt-ac.fit
orbitel.net.co
orfsurface.com
orientalspot.com
outsidences.com
ovaltinalization.co
overta.ru
ox28vgrurc.com
pamulang.net
panaltyspot.space
panolacountysheriffms.com
passionatesmiles.com
paulinelam.com
pdi-corp.com
peloquinbeck.com
perimetercenter.net
permanentscreen.com
permasteellisagroup.com
perumkijhyu.net
pesnia.com.ua
ph8ltwdi12o.com
pharmada.com.de
phdns3.es
pigelixval1.com
pipefittingsindia.com
planethoster.net
playamedia.io
plesk.page
pmnhost.net
pokiloandhu.net
pokupki5.ru
polandi.net
popiup.com
ports.net
posolstvostilya.com
potia.net
prima.com.ar
prima.net.ar
profsol.co.uk
prohealthmotion.com
promooffermarket.site
proudserver.com
proxado.com
psnm.ru
pvcwindowsprices.live
qontenciplc.autos
quakeclick.com
quasarstate.store
quatthonggiotico.com
qxyxab44njd.com
radianthealthrenaissance.com
rapidns.com
raxa.host
reberte.com
reethvikintl.com
regruhosting.ru
reliablepanel.com
rgb365.eu
riddlecamera.net
riddletrends.com
roccopugliese.com
runnin-rebels.com
rupar.puglia.it
rwdhosting.ca
s500host.com
sageevents.co.ke
sahacker-2020.com
samsales.site
sante-lorraine.fr
saransk.ru
satirogluet.com
securednshost.com
scioncontacts.com
sdcc.my
seaspraymta3.net
secorp.mx
securen.net
securerelay.in
securev.net
seductiveeyes.com
seizethedayconsulting.com
serroplast.shop
server290.com
server342.com
server3559.cc
servershost.biz
sfek.kz
sgnetway.net
shopfox.ca
silvestrejaguar.sbs
silvestreonca.sbs
simplediagnostics.org
siriuscloud.jp
sisglobalresearch.com
sixpacklink.net
sjestyle.com
smallvillages.com
smartape-vps.com
solusoftware.com
sourcedns.com
southcoastwebhosting12.com
specialtvvs.com
spiritualtechnologies.io
sprout.org
srv.cat
stableserver.net
statlerfa.co.uk
stock-smtp.top
stockepictigers.com
stockexchangejournal.com
subterranean-concave.com
suksangroup.com
swissbluetopaz.com
switer.shop
sysop4.com
system.eu.com
szhongbing.com
t-jon.com
tacaindo.net
tacom.tj
tankertelz.co
tataidc.com
teamveiw.com
tecnoxia.net
tel-xyz.fit
tenkids.net
terminavalley.com
thaicloudsolutions.com
thaikinghost.com
thaimonster.com
thegermainetruth.net
thehandmaderose.com
thepushcase.com
ticdns.com
tigo.bo
toledofibra.net.br
topdns.com
totaal.net
totalplay.net
tqh.ro
traderlearningcenter.com
tradeukraine.site
traveleza.com
trwww.com
tsuzakij.com
tullostrucking.com
turbinetrends.com
twincitiesdistinctivehomes.com
tylerfordonline.com
uiyum.com
ultragate.com
uneedacollie.com
unified.services
unite.services
urawasl.com
us.servername.us
vagebond.net
varvia.de
vbcploo.com
vdc.vn
vendimetry.com
vibrantwellnesscorp.com
virtualine.org
visit.docotor
viviotech.us
vlflgl.com
volganet.ru
vrns.net
vulterdi.edu
vvondertex.com
wallstreetsgossip.com
wamego.net
wanekoohost.com
wealthexpertisepro.com
web-login.eu
weblinkinternational.com
webnox.io
websale.net
welllivinghive.com
westparkcom.com
wetransfer-eu.com
wheelch.me
whoflew.com
whpservers.com
wisdomhard.com
wisewealthcircle.com
wisvis.com
wodeniowa.com
wordpresshosting.xyz
wsiph2.com
xnt.mx
xodiax.com
xpnuf.cn
xsfati.us.com
xspmail.jp
yourciviccompass.com
yourinvestworkbook.com
yoursitesecure.net
zerowebhosting.net
zmml.uk
znlc.jp
ztomy.com


@@ -0,0 +1,23 @@
-applefibernet.com
-c3.net.pl
-celsiainternet.com
-clientes-izzi.mx
-clientes-zap-izzi.mx
-imnet.com.br
-mcnbd.com
-smile.com.bd
-tataidc.co.in
-veloxfiber.com.br
-wconect.com.br
.amazonaws.com
.cloudaccess.net
.ddnsgeek.com
.fastvps-server.com
.in-addr-arpa
.in-addr.arpa
.kasserver.com
.kinghost.net
.linode.com
.linodeusercontent.com
.na4u.ru
.sakura.ne.jp


@@ -0,0 +1,184 @@
#!/usr/bin/env python3
from __future__ import annotations
import os
import csv
from pathlib import Path
from typing import Mapping, Iterable, Optional, Collection, Union, List, Dict
class CSVValidationError(Exception):
def __init__(self, errors: list[str]):
super().__init__("\n".join(errors))
self.errors = errors
def sort_csv(
filepath: Union[str, Path],
field: str,
*,
sort_field_value_must_be_unique: bool = True,
strip_whitespace: bool = True,
fields_to_lowercase: Optional[Iterable[str]] = None,
case_insensitive_sort: bool = False,
required_fields: Optional[Iterable[str]] = None,
allowed_values: Optional[Mapping[str, Collection[str]]] = None,
) -> None:
"""
Read a CSV, optionally normalize rows (strip whitespace, lowercase certain fields),
validate field values, and write the sorted CSV back to the same path.
- filepath: Path to the CSV to sort.
- field: The field name to sort by.
- fields_to_lowercase: Permanently lowercases these field(s) in the data.
    - strip_whitespace: Remove whitespace at the beginning and end of field values.
- case_insensitive_sort: Ignore case when sorting without changing values.
- required_fields: A list of fields that must have data in all rows.
- allowed_values: A mapping of allowed values for fields.
"""
path = Path(filepath)
required_fields = set(required_fields or [])
lower_set = set(fields_to_lowercase or [])
allowed_sets = {k: set(v) for k, v in (allowed_values or {}).items()}
if sort_field_value_must_be_unique:
seen_sort_field_values = []
with path.open("r", newline="") as infile:
reader = csv.DictReader(infile)
fieldnames = reader.fieldnames or []
if field not in fieldnames:
raise CSVValidationError([f"Missing sort column: {field!r}"])
missing_headers = required_fields - set(fieldnames)
if missing_headers:
raise CSVValidationError(
[f"Missing required header(s): {sorted(missing_headers)}"]
)
rows = list(reader)
def normalize_row(row: Dict[str, str]) -> None:
if strip_whitespace:
for k, v in row.items():
if isinstance(v, str):
row[k] = v.strip()
for fld in lower_set:
if fld in row and isinstance(row[fld], str):
row[fld] = row[fld].lower()
def validate_row(
row: Dict[str, str], sort_field: str, line_no: int, errors: list[str]
) -> None:
if sort_field_value_must_be_unique:
if row[sort_field] in seen_sort_field_values:
errors.append(f"Line {line_no}: Duplicate row for '{row[sort_field]}'")
else:
seen_sort_field_values.append(row[sort_field])
for rf in required_fields:
val = row.get(rf)
if val is None or val == "":
errors.append(
f"Line {line_no}: Missing value for required field '{rf}'"
)
        for allowed_field, allowed in allowed_sets.items():
            if allowed_field in row:
                val = row[allowed_field]
                if val not in allowed:
                    errors.append(
                        f"Line {line_no}: '{val}' is not an allowed value for '{allowed_field}' "
                        f"(allowed: {sorted(allowed)})"
                    )
errors: list[str] = []
for idx, row in enumerate(rows, start=2): # header is line 1
normalize_row(row)
validate_row(row, field, idx, errors)
if errors:
raise CSVValidationError(errors)
def sort_key(r: Dict[str, str]):
v = r.get(field, "")
if isinstance(v, str) and case_insensitive_sort:
return v.casefold()
return v
rows.sort(key=sort_key)
with open(filepath, "w", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
def sort_list_file(
filepath: Union[str, Path],
*,
lowercase: bool = True,
strip: bool = True,
deduplicate: bool = True,
remove_blank_lines: bool = True,
ending_newline: bool = True,
newline: Optional[str] = "\n",
):
"""Read a list from a file, sort it, optionally strip and deduplicate the values,
then write that list back to the file.
    - filepath: The path to the file.
    - lowercase: Lowercase all values prior to sorting.
    - remove_blank_lines: Remove any blank lines.
- ending_newline: End the file with a newline, even if remove_blank_lines is true.
- newline: The newline character to use.
"""
with open(filepath, mode="r", newline=newline) as infile:
lines = infile.readlines()
for i in range(len(lines)):
if lowercase:
lines[i] = lines[i].lower()
if strip:
lines[i] = lines[i].strip()
if deduplicate:
lines = list(set(lines))
if remove_blank_lines:
while "" in lines:
lines.remove("")
lines = sorted(lines)
if ending_newline:
if lines[-1] != "":
lines.append("")
with open(filepath, mode="w", newline=newline) as outfile:
outfile.write("\n".join(lines))
def _main():
    map_file = "base_reverse_dns_map.csv"
    map_key = "base_reverse_dns"
    list_files = ["known_unknown_base_reverse_dns.txt", "psl_overrides.txt"]
    types_file = "base_reverse_dns_types.txt"
    for list_file in list_files:
        if not os.path.exists(list_file):
            print(f"Error: {list_file} does not exist")
            exit(1)
        sort_list_file(list_file)
    if not os.path.exists(types_file):
        print(f"Error: {types_file} does not exist")
        exit(1)
    sort_list_file(types_file, lowercase=False)
    # Read the allowed types after the existence check, stripping newlines
    # so that blank entries are actually removed
    with open(types_file) as f:
        types = [line.strip() for line in f]
    while "" in types:
        types.remove("")
    map_allowed_values = {"Type": types}
    if not os.path.exists(map_file):
        print(f"Error: {map_file} does not exist")
        exit(1)
    try:
        sort_csv(map_file, map_key, allowed_values=map_allowed_values)
    except CSVValidationError as e:
        print(f"{map_file} did not validate: {e}")


if __name__ == "__main__":
    _main()
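The list-file normalization performed by `sort_list_file` can be sketched standalone on a plain string, with the same default steps: lowercase, strip, deduplicate, drop blanks, sort, and end with a newline. The helper name below is hypothetical, not part of the script above.

```python
def normalize_list_text(text: str) -> str:
    """Lowercase, strip, deduplicate, drop blanks, sort, end with a newline."""
    lines = sorted({line.strip().lower() for line in text.splitlines() if line.strip()})
    return "\n".join(lines) + "\n"


print(normalize_list_text("Beta\nalpha\n\nALPHA\n"))  # "alpha\nbeta\n"
```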

View File

@@ -1,6 +1,10 @@
 # -*- coding: utf-8 -*-
+from __future__ import annotations
 import json
+from typing import Any
 import boto3
 from parsedmarc.log import logger
@@ -8,16 +12,16 @@ from parsedmarc.utils import human_timestamp_to_datetime
 class S3Client(object):
-    """A client for a Amazon S3"""
+    """A client for interacting with Amazon S3"""

     def __init__(
         self,
-        bucket_name,
-        bucket_path,
-        region_name,
-        endpoint_url,
-        access_key_id,
-        secret_access_key,
+        bucket_name: str,
+        bucket_path: str,
+        region_name: str,
+        endpoint_url: str,
+        access_key_id: str,
+        secret_access_key: str,
     ):
         """
         Initializes the S3Client
@@ -47,18 +51,18 @@ class S3Client(object):
             aws_access_key_id=access_key_id,
             aws_secret_access_key=secret_access_key,
         )
-        self.bucket = self.s3.Bucket(self.bucket_name)
+        self.bucket = self.s3.Bucket(self.bucket_name)  # type: ignore

-    def save_aggregate_report_to_s3(self, report):
+    def save_aggregate_report_to_s3(self, report: dict[str, Any]):
         self.save_report_to_s3(report, "aggregate")

-    def save_forensic_report_to_s3(self, report):
+    def save_forensic_report_to_s3(self, report: dict[str, Any]):
         self.save_report_to_s3(report, "forensic")

-    def save_smtp_tls_report_to_s3(self, report):
+    def save_smtp_tls_report_to_s3(self, report: dict[str, Any]):
         self.save_report_to_s3(report, "smtp_tls")

-    def save_report_to_s3(self, report, report_type):
+    def save_report_to_s3(self, report: dict[str, Any], report_type: str):
         if report_type == "smtp_tls":
             report_date = report["begin_date"]
             report_id = report["report_id"]

View File

@@ -1,9 +1,14 @@
-from urllib.parse import urlparse
-import socket
-import json
+# -*- coding: utf-8 -*-
+
+from __future__ import annotations
+
+import json
+import socket
+from typing import Any, Union
+from urllib.parse import urlparse

-import urllib3
 import requests
+import urllib3

 from parsedmarc.constants import USER_AGENT
 from parsedmarc.log import logger
@@ -23,7 +28,13 @@ class HECClient(object):
 # http://docs.splunk.com/Documentation/Splunk/latest/RESTREF/RESTinput#services.2Fcollector

     def __init__(
-        self, url, access_token, index, source="parsedmarc", verify=True, timeout=60
+        self,
+        url: str,
+        access_token: str,
+        index: str,
+        source: str = "parsedmarc",
+        verify=True,
+        timeout=60,
     ):
         """
         Initializes the HECClient
@@ -37,9 +48,9 @@ class HECClient(object):
             timeout (float): Number of seconds to wait for the server to send
                 data before giving up
         """
-        url = urlparse(url)
+        parsed_url = urlparse(url)
         self.url = "{0}://{1}/services/collector/event/1.0".format(
-            url.scheme, url.netloc
+            parsed_url.scheme, parsed_url.netloc
         )
         self.access_token = access_token.lstrip("Splunk ")
         self.index = index
@@ -48,14 +59,19 @@ class HECClient(object):
         self.session = requests.Session()
         self.timeout = timeout
         self.session.verify = verify
-        self._common_data = dict(host=self.host, source=self.source, index=self.index)
+        self._common_data: dict[str, Union[str, int, float, dict]] = dict(
+            host=self.host, source=self.source, index=self.index
+        )
         self.session.headers = {
             "User-Agent": USER_AGENT,
             "Authorization": "Splunk {0}".format(self.access_token),
         }

-    def save_aggregate_reports_to_splunk(self, aggregate_reports):
+    def save_aggregate_reports_to_splunk(
+        self,
+        aggregate_reports: Union[list[dict[str, Any]], dict[str, Any]],
+    ):
         """
         Saves aggregate DMARC reports to Splunk
@@ -75,9 +91,12 @@ class HECClient(object):
         json_str = ""
         for report in aggregate_reports:
             for record in report["records"]:
-                new_report = dict()
+                new_report: dict[str, Union[str, int, float, dict]] = dict()
                 for metadata in report["report_metadata"]:
                     new_report[metadata] = report["report_metadata"][metadata]
+                new_report["interval_begin"] = record["interval_begin"]
+                new_report["interval_end"] = record["interval_end"]
+                new_report["normalized_timespan"] = record["normalized_timespan"]
                 new_report["published_policy"] = report["policy_published"]
                 new_report["source_ip_address"] = record["source"]["ip_address"]
                 new_report["source_country"] = record["source"]["country"]
@@ -98,7 +117,9 @@ class HECClient(object):
                 new_report["spf_results"] = record["auth_results"]["spf"]
                 data["sourcetype"] = "dmarc:aggregate"
-                timestamp = human_timestamp_to_unix_timestamp(new_report["begin_date"])
+                timestamp = human_timestamp_to_unix_timestamp(
+                    new_report["interval_begin"]
+                )
                 data["time"] = timestamp
                 data["event"] = new_report.copy()
                 json_str += "{0}\n".format(json.dumps(data))
@@ -113,7 +134,10 @@ class HECClient(object):
         if response["code"] != 0:
             raise SplunkError(response["text"])

-    def save_forensic_reports_to_splunk(self, forensic_reports):
+    def save_forensic_reports_to_splunk(
+        self,
+        forensic_reports: Union[list[dict[str, Any]], dict[str, Any]],
+    ):
         """
         Saves forensic DMARC reports to Splunk
@@ -147,7 +171,9 @@ class HECClient(object):
         if response["code"] != 0:
             raise SplunkError(response["text"])

-    def save_smtp_tls_reports_to_splunk(self, reports):
+    def save_smtp_tls_reports_to_splunk(
+        self, reports: Union[list[dict[str, Any]], dict[str, Any]]
+    ):
         """
         Saves aggregate DMARC reports to Splunk
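The aggregate-report loop above batches one HEC event per DMARC record as newline-delimited JSON, which is the shape the Splunk collector endpoint accepts. A minimal standalone sketch of that payload construction (hypothetical host/index values and record contents, not parsedmarc's API):

```python
import json

# Hypothetical metadata shared by every event in the batch
common = {"host": "dmarc-parser", "source": "parsedmarc", "index": "main"}
records = [{"source_ip_address": "192.0.2.1"}, {"source_ip_address": "192.0.2.2"}]

json_str = ""
for record in records:
    data = dict(common)
    data["sourcetype"] = "dmarc:aggregate"
    data["event"] = record
    # One JSON object per line; the batch is POSTed as a single body
    json_str += "{0}\n".format(json.dumps(data))

# Each line parses back as an independent event
events = [json.loads(line) for line in json_str.splitlines()]
print(len(events))  # 2
```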

View File

@@ -1,8 +1,15 @@
 # -*- coding: utf-8 -*-

+from __future__ import annotations
+
+import json
 import logging
 import logging.handlers
-import json
+import socket
+import ssl
+import time
+from typing import Any, Optional

 from parsedmarc import (
     parsed_aggregate_reports_to_csv_rows,
@@ -14,31 +21,161 @@ from parsedmarc import (
 class SyslogClient(object):
     """A client for Syslog"""

-    def __init__(self, server_name, server_port):
+    def __init__(
+        self,
+        server_name: str,
+        server_port: int,
+        protocol: str = "udp",
+        cafile_path: Optional[str] = None,
+        certfile_path: Optional[str] = None,
+        keyfile_path: Optional[str] = None,
+        timeout: float = 5.0,
+        retry_attempts: int = 3,
+        retry_delay: int = 5,
+    ):
         """
         Initializes the SyslogClient

         Args:
             server_name (str): The Syslog server
-            server_port (int): The Syslog UDP port
+            server_port (int): The Syslog port
+            protocol (str): The protocol to use: "udp", "tcp", or "tls" (Default: "udp")
+            cafile_path (str): Path to CA certificate file for TLS server verification (Optional)
+            certfile_path (str): Path to client certificate file for TLS authentication (Optional)
+            keyfile_path (str): Path to client private key file for TLS authentication (Optional)
+            timeout (float): Connection timeout in seconds for TCP/TLS (Default: 5.0)
+            retry_attempts (int): Number of retry attempts for failed connections (Default: 3)
+            retry_delay (int): Delay in seconds between retry attempts (Default: 5)
         """
         self.server_name = server_name
         self.server_port = server_port
+        self.protocol = protocol.lower()
+        self.timeout = timeout
+        self.retry_attempts = retry_attempts
+        self.retry_delay = retry_delay
         self.logger = logging.getLogger("parsedmarc_syslog")
         self.logger.setLevel(logging.INFO)
-        log_handler = logging.handlers.SysLogHandler(address=(server_name, server_port))
+        # Create the appropriate syslog handler based on protocol
+        log_handler = self._create_syslog_handler(
+            server_name,
+            server_port,
+            self.protocol,
+            cafile_path,
+            certfile_path,
+            keyfile_path,
+            timeout,
+            retry_attempts,
+            retry_delay,
+        )
         self.logger.addHandler(log_handler)

-    def save_aggregate_report_to_syslog(self, aggregate_reports):
+    def _create_syslog_handler(
+        self,
+        server_name: str,
+        server_port: int,
+        protocol: str,
+        cafile_path: Optional[str],
+        certfile_path: Optional[str],
+        keyfile_path: Optional[str],
+        timeout: float,
+        retry_attempts: int,
+        retry_delay: int,
+    ) -> logging.handlers.SysLogHandler:
+        """
+        Creates a SysLogHandler with the specified protocol and TLS settings
+        """
+        if protocol == "udp":
+            # UDP protocol (default, backward compatible)
+            return logging.handlers.SysLogHandler(
+                address=(server_name, server_port),
+                socktype=socket.SOCK_DGRAM,
+            )
+        elif protocol in ["tcp", "tls"]:
+            # TCP or TLS protocol with retry logic
+            for attempt in range(1, retry_attempts + 1):
+                try:
+                    if protocol == "tcp":
+                        # TCP without TLS
+                        handler = logging.handlers.SysLogHandler(
+                            address=(server_name, server_port),
+                            socktype=socket.SOCK_STREAM,
+                        )
+                        # Set timeout on the socket
+                        if hasattr(handler, "socket") and handler.socket:
+                            handler.socket.settimeout(timeout)
+                        return handler
+                    else:
+                        # TLS protocol
+                        # Create SSL context with secure defaults
+                        ssl_context = ssl.create_default_context()
+                        # Explicitly set minimum TLS version to 1.2 for security
+                        ssl_context.minimum_version = ssl.TLSVersion.TLSv1_2
+                        # Configure server certificate verification
+                        if cafile_path:
+                            ssl_context.load_verify_locations(cafile=cafile_path)
+                        # Configure client certificate authentication
+                        if certfile_path and keyfile_path:
+                            ssl_context.load_cert_chain(
+                                certfile=certfile_path,
+                                keyfile=keyfile_path,
+                            )
+                        elif certfile_path or keyfile_path:
+                            # Warn if only one of the two required parameters is provided
+                            self.logger.warning(
+                                "Both certfile_path and keyfile_path are required for "
+                                "client certificate authentication. Client authentication "
+                                "will not be used."
+                            )
+                        # Create TCP handler first
+                        handler = logging.handlers.SysLogHandler(
+                            address=(server_name, server_port),
+                            socktype=socket.SOCK_STREAM,
+                        )
+                        # Wrap socket with TLS
+                        if hasattr(handler, "socket") and handler.socket:
+                            handler.socket = ssl_context.wrap_socket(
+                                handler.socket,
+                                server_hostname=server_name,
+                            )
+                            handler.socket.settimeout(timeout)
+                        return handler
+                except Exception as e:
+                    if attempt < retry_attempts:
+                        self.logger.warning(
+                            f"Syslog connection attempt {attempt}/{retry_attempts} failed: {e}. "
+                            f"Retrying in {retry_delay} seconds..."
+                        )
+                        time.sleep(retry_delay)
+                    else:
+                        self.logger.error(
+                            f"Syslog connection failed after {retry_attempts} attempts: {e}"
+                        )
+                        raise
+        else:
+            raise ValueError(
+                f"Invalid protocol '{protocol}'. Must be 'udp', 'tcp', or 'tls'."
+            )
+
+    def save_aggregate_report_to_syslog(self, aggregate_reports: list[dict[str, Any]]):
         rows = parsed_aggregate_reports_to_csv_rows(aggregate_reports)
         for row in rows:
             self.logger.info(json.dumps(row))

-    def save_forensic_report_to_syslog(self, forensic_reports):
+    def save_forensic_report_to_syslog(self, forensic_reports: list[dict[str, Any]]):
         rows = parsed_forensic_reports_to_csv_rows(forensic_reports)
         for row in rows:
             self.logger.info(json.dumps(row))

-    def save_smtp_tls_report_to_syslog(self, smtp_tls_reports):
+    def save_smtp_tls_report_to_syslog(self, smtp_tls_reports: list[dict[str, Any]]):
         rows = parsed_smtp_tls_reports_to_csv_rows(smtp_tls_reports)
         for row in rows:
             self.logger.info(json.dumps(row))
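The TLS branch above configures its `ssl.SSLContext` before wrapping the TCP socket. A minimal standalone sketch of that context setup (no connection is made; certificate paths omitted):

```python
import ssl

# Secure defaults: certificate verification and hostname checking are enabled
ssl_context = ssl.create_default_context()
# Pin the floor to TLS 1.2, matching the handler above
ssl_context.minimum_version = ssl.TLSVersion.TLSv1_2

print(ssl_context.verify_mode == ssl.CERT_REQUIRED)  # True
print(ssl_context.check_hostname)  # True
```

`wrap_socket(sock, server_hostname=...)` would then perform the handshake against a real server, which is why the client retries with a delay when the connection fails.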

parsedmarc/types.py (new file, 220 lines)
View File

@@ -0,0 +1,220 @@
from __future__ import annotations

from typing import Any, Dict, List, Literal, Optional, TypedDict, Union

# NOTE: This module is intentionally Python 3.10 compatible.
# - No PEP 604 unions (A | B)
# - No typing.NotRequired / Required (3.11+) to avoid an extra dependency.
#   For optional keys, use total=False TypedDicts.

ReportType = Literal["aggregate", "forensic", "smtp_tls"]


class AggregateReportMetadata(TypedDict):
    org_name: str
    org_email: str
    org_extra_contact_info: Optional[str]
    report_id: str
    begin_date: str
    end_date: str
    timespan_requires_normalization: bool
    original_timespan_seconds: int
    errors: List[str]


class AggregatePolicyPublished(TypedDict):
    domain: str
    adkim: str
    aspf: str
    p: str
    sp: str
    pct: str
    fo: str


class IPSourceInfo(TypedDict):
    ip_address: str
    country: Optional[str]
    reverse_dns: Optional[str]
    base_domain: Optional[str]
    name: Optional[str]
    type: Optional[str]


class AggregateAlignment(TypedDict):
    spf: bool
    dkim: bool
    dmarc: bool


class AggregateIdentifiers(TypedDict):
    header_from: str
    envelope_from: Optional[str]
    envelope_to: Optional[str]


class AggregatePolicyOverrideReason(TypedDict):
    type: Optional[str]
    comment: Optional[str]


class AggregateAuthResultDKIM(TypedDict):
    domain: str
    result: str
    selector: str


class AggregateAuthResultSPF(TypedDict):
    domain: str
    result: str
    scope: str


class AggregateAuthResults(TypedDict):
    dkim: List[AggregateAuthResultDKIM]
    spf: List[AggregateAuthResultSPF]


class AggregatePolicyEvaluated(TypedDict):
    disposition: str
    dkim: str
    spf: str
    policy_override_reasons: List[AggregatePolicyOverrideReason]


class AggregateRecord(TypedDict):
    interval_begin: str
    interval_end: str
    source: IPSourceInfo
    count: int
    alignment: AggregateAlignment
    policy_evaluated: AggregatePolicyEvaluated
    disposition: str
    identifiers: AggregateIdentifiers
    auth_results: AggregateAuthResults


class AggregateReport(TypedDict):
    xml_schema: str
    report_metadata: AggregateReportMetadata
    policy_published: AggregatePolicyPublished
    records: List[AggregateRecord]


class EmailAddress(TypedDict):
    display_name: Optional[str]
    address: str
    local: Optional[str]
    domain: Optional[str]


class EmailAttachment(TypedDict, total=False):
    filename: Optional[str]
    mail_content_type: Optional[str]
    sha256: Optional[str]


ParsedEmail = TypedDict(
    "ParsedEmail",
    {
        # This is a lightly-specified version of mailsuite/mailparser JSON.
        # It focuses on the fields parsedmarc uses in forensic handling.
        "headers": Dict[str, Any],
        "subject": Optional[str],
        "filename_safe_subject": Optional[str],
        "date": Optional[str],
        "from": EmailAddress,
        "to": List[EmailAddress],
        "cc": List[EmailAddress],
        "bcc": List[EmailAddress],
        "attachments": List[EmailAttachment],
        "body": Optional[str],
        "has_defects": bool,
        "defects": Any,
        "defects_categories": Any,
    },
    total=False,
)


class ForensicReport(TypedDict):
    feedback_type: Optional[str]
    user_agent: Optional[str]
    version: Optional[str]
    original_envelope_id: Optional[str]
    original_mail_from: Optional[str]
    original_rcpt_to: Optional[str]
    arrival_date: str
    arrival_date_utc: str
    authentication_results: Optional[str]
    delivery_result: Optional[str]
    auth_failure: List[str]
    authentication_mechanisms: List[str]
    dkim_domain: Optional[str]
    reported_domain: str
    sample_headers_only: bool
    source: IPSourceInfo
    sample: str
    parsed_sample: ParsedEmail


class SMTPTLSFailureDetails(TypedDict):
    result_type: str
    failed_session_count: int


class SMTPTLSFailureDetailsOptional(SMTPTLSFailureDetails, total=False):
    sending_mta_ip: str
    receiving_ip: str
    receiving_mx_hostname: str
    receiving_mx_helo: str
    additional_info_uri: str
    failure_reason_code: str
    ip_address: str


class SMTPTLSPolicySummary(TypedDict):
    policy_domain: str
    policy_type: str
    successful_session_count: int
    failed_session_count: int


class SMTPTLSPolicy(SMTPTLSPolicySummary, total=False):
    policy_strings: List[str]
    mx_host_patterns: List[str]
    failure_details: List[SMTPTLSFailureDetailsOptional]


class SMTPTLSReport(TypedDict):
    organization_name: str
    begin_date: str
    end_date: str
    contact_info: Union[str, List[str]]
    report_id: str
    policies: List[SMTPTLSPolicy]


class AggregateParsedReport(TypedDict):
    report_type: Literal["aggregate"]
    report: AggregateReport


class ForensicParsedReport(TypedDict):
    report_type: Literal["forensic"]
    report: ForensicReport


class SMTPTLSParsedReport(TypedDict):
    report_type: Literal["smtp_tls"]
    report: SMTPTLSReport


ParsedReport = Union[AggregateParsedReport, ForensicParsedReport, SMTPTLSParsedReport]


class ParsingResults(TypedDict):
    aggregate_reports: List[AggregateReport]
    forensic_reports: List[ForensicReport]
    smtp_tls_reports: List[SMTPTLSReport]
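TypedDicts like the ones above are plain dicts at runtime; the annotations only constrain static checkers. A minimal sketch with a locally redeclared `AggregateAlignment` (not an import from parsedmarc.types):

```python
from typing import TypedDict


class AggregateAlignment(TypedDict):
    spf: bool
    dkim: bool
    dmarc: bool


alignment: AggregateAlignment = {"spf": True, "dkim": False, "dmarc": True}
# At runtime it is just a dict, so normal dict operations apply
print(isinstance(alignment, dict))  # True
```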

View File

@@ -1,22 +1,26 @@
+# -*- coding: utf-8 -*-
 """Utility functions that might be useful for other projects"""
-import logging
-import os
-from datetime import datetime
-from datetime import timezone
-from datetime import timedelta
-from collections import OrderedDict
-import tempfile
-import subprocess
-import shutil
-import mailparser
-import json
-import hashlib
+
+from __future__ import annotations
+
 import base64
-import mailbox
-import re
 import csv
+import hashlib
 import io
+import json
+import logging
+import mailbox
+import os
+import re
+import shutil
+import subprocess
+import tempfile
+from datetime import datetime, timedelta, timezone
+from typing import Optional, TypedDict, Union, cast
+
+import mailparser
+from expiringdict import ExpiringDict
 try:
     from importlib.resources import files
@@ -25,25 +29,31 @@ except ImportError:
     from importlib.resources import files

-from dateutil.parser import parse as parse_date
-import dns.reversename
-import dns.resolver
 import dns.exception
+import dns.resolver
+import dns.reversename
 import geoip2.database
 import geoip2.errors
 import publicsuffixlist
 import requests
+from dateutil.parser import parse as parse_date
+
 import parsedmarc.resources.dbip
 import parsedmarc.resources.maps
 from parsedmarc.constants import USER_AGENT
 from parsedmarc.log import logger

 parenthesis_regex = re.compile(r"\s*\(.*\)\s*")

 null_file = open(os.devnull, "w")
 mailparser_logger = logging.getLogger("mailparser")
 mailparser_logger.setLevel(logging.CRITICAL)

+psl = publicsuffixlist.PublicSuffixList()
+psl_overrides_path = str(files(parsedmarc.resources.maps).joinpath("psl_overrides.txt"))
+with open(psl_overrides_path) as f:
+    psl_overrides = [line.rstrip() for line in f.readlines()]
+while "" in psl_overrides:
+    psl_overrides.remove("")
 class EmailParserError(RuntimeError):
@@ -54,31 +64,49 @@ class DownloadError(RuntimeError):
     """Raised when an error occurs when downloading a file"""


+class ReverseDNSService(TypedDict):
+    name: str
+    type: Optional[str]
+
+
+ReverseDNSMap = dict[str, ReverseDNSService]
+
+
+class IPAddressInfo(TypedDict):
+    ip_address: str
+    reverse_dns: Optional[str]
+    country: Optional[str]
+    base_domain: Optional[str]
+    name: Optional[str]
+    type: Optional[str]
+
+
-def decode_base64(data):
+def decode_base64(data: str) -> bytes:
     """
     Decodes a base64 string, with padding being optional

     Args:
-        data: A base64 encoded string
+        data (str): A base64 encoded string

     Returns:
         bytes: The decoded bytes
     """
-    data = bytes(data, encoding="ascii")
-    missing_padding = len(data) % 4
+    data_bytes = bytes(data, encoding="ascii")
+    missing_padding = len(data_bytes) % 4
     if missing_padding != 0:
-        data += b"=" * (4 - missing_padding)
-    return base64.b64decode(data)
+        data_bytes += b"=" * (4 - missing_padding)
+    return base64.b64decode(data_bytes)
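The padding-optional decode above can be sketched standalone; the helper name here is hypothetical, to avoid shadowing the function in the diff:

```python
import base64


def decode_base64_padded(data: str) -> bytes:
    """Decode base64, tolerating missing "=" padding."""
    raw = data.encode("ascii")
    missing = len(raw) % 4
    if missing:
        # base64 input length must be a multiple of 4; pad it out
        raw += b"=" * (4 - missing)
    return base64.b64decode(raw)


print(decode_base64_padded("aGVsbG8"))  # b'hello' (unpadded input)
```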
-def get_base_domain(domain):
+def get_base_domain(domain: str) -> Optional[str]:
     """
     Gets the base domain name for the given domain

     .. note::
         Results are based on a list of public domain suffixes at
-        https://publicsuffix.org/list/public_suffix_list.dat.
+        https://publicsuffix.org/list/public_suffix_list.dat and overrides included in
+        parsedmarc.resources.maps.psl_overrides.txt

     Args:
         domain (str): A domain or subdomain
@@ -87,11 +115,22 @@ def get_base_domain(domain):
         str: The base domain of the given domain
     """

-    psl = publicsuffixlist.PublicSuffixList()
-    return psl.privatesuffix(domain)
+    domain = domain.lower()
+    publicsuffix = psl.privatesuffix(domain)
+    for override in psl_overrides:
+        if domain.endswith(override):
+            return override.strip(".").strip("-")
+    return publicsuffix
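The override pass above is a plain suffix match that wins over the public-suffix answer. A standalone sketch with a hypothetical override list (not the real psl_overrides.txt contents, and without the publicsuffixlist dependency):

```python
from typing import Optional

# Hypothetical override entries; the real list ships in psl_overrides.txt
PSL_OVERRIDES = ["outbound.protection.example", "mail.example-host.example"]


def apply_overrides(domain: str, base_domain: Optional[str]) -> Optional[str]:
    # An override suffix match takes precedence over the PSL-derived base domain
    domain = domain.lower()
    for override in PSL_OVERRIDES:
        if domain.endswith(override):
            return override.strip(".")
    return base_domain


print(apply_overrides("smtp01.outbound.protection.example", "protection.example"))
# outbound.protection.example
```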
-def query_dns(domain, record_type, cache=None, nameservers=None, timeout=2.0):
+def query_dns(
+    domain: str,
+    record_type: str,
+    *,
+    cache: Optional[ExpiringDict] = None,
+    nameservers: Optional[list[str]] = None,
+    timeout: float = 2.0,
+) -> list[str]:
     """
     Queries DNS
@@ -110,9 +149,9 @@ def query_dns(domain, record_type, cache=None, nameservers=None, timeout=2.0):
     record_type = record_type.upper()
     cache_key = "{0}_{1}".format(domain, record_type)
     if cache:
-        records = cache.get(cache_key, None)
-        if records:
-            return records
+        cached_records = cache.get(cache_key, None)
+        if isinstance(cached_records, list):
+            return cast(list[str], cached_records)

     resolver = dns.resolver.Resolver()
     timeout = float(timeout)
@@ -126,33 +165,25 @@ def query_dns(domain, record_type, cache=None, nameservers=None, timeout=2.0):
         resolver.nameservers = nameservers
     resolver.timeout = timeout
     resolver.lifetime = timeout
-    if record_type == "TXT":
-        resource_records = list(
-            map(
-                lambda r: r.strings,
-                resolver.resolve(domain, record_type, lifetime=timeout),
-            )
-        )
-        _resource_record = [
-            resource_record[0][:0].join(resource_record)
-            for resource_record in resource_records
-            if resource_record
-        ]
-        records = [r.decode() for r in _resource_record]
-    else:
-        records = list(
-            map(
-                lambda r: r.to_text().replace('"', "").rstrip("."),
-                resolver.resolve(domain, record_type, lifetime=timeout),
-            )
-        )
+    records = list(
+        map(
+            lambda r: r.to_text().replace('"', "").rstrip("."),
+            resolver.resolve(domain, record_type, lifetime=timeout),
+        )
+    )
     if cache:
         cache[cache_key] = records
     return records
-def get_reverse_dns(ip_address, cache=None, nameservers=None, timeout=2.0):
+def get_reverse_dns(
+    ip_address,
+    *,
+    cache: Optional[ExpiringDict] = None,
+    nameservers: Optional[list[str]] = None,
+    timeout: float = 2.0,
+) -> Optional[str]:
     """
     Resolves an IP address to a hostname using a reverse DNS query
@@ -170,7 +201,7 @@ def get_reverse_dns(ip_address, cache=None, nameservers=None, timeout=2.0):
     try:
         address = dns.reversename.from_address(ip_address)
         hostname = query_dns(
-            address, "PTR", cache=cache, nameservers=nameservers, timeout=timeout
+            str(address), "PTR", cache=cache, nameservers=nameservers, timeout=timeout
         )[0]

     except dns.exception.DNSException as e:
@@ -180,7 +211,7 @@ def get_reverse_dns(ip_address, cache=None, nameservers=None, timeout=2.0):
     return hostname
-def timestamp_to_datetime(timestamp):
+def timestamp_to_datetime(timestamp: int) -> datetime:
     """
     Converts a UNIX/DMARC timestamp to a Python ``datetime`` object
@@ -193,7 +224,7 @@ def timestamp_to_datetime(timestamp):
     return datetime.fromtimestamp(int(timestamp))
-def timestamp_to_human(timestamp):
+def timestamp_to_human(timestamp: int) -> str:
     """
     Converts a UNIX/DMARC timestamp to a human-readable string
@@ -206,7 +237,9 @@ def timestamp_to_human(timestamp):
     return timestamp_to_datetime(timestamp).strftime("%Y-%m-%d %H:%M:%S")
-def human_timestamp_to_datetime(human_timestamp, to_utc=False):
+def human_timestamp_to_datetime(
+    human_timestamp: str, *, to_utc: bool = False
+) -> datetime:
     """
     Converts a human-readable timestamp into a Python ``datetime`` object
@@ -225,7 +258,7 @@ def human_timestamp_to_datetime(human_timestamp, to_utc=False):
     return dt.astimezone(timezone.utc) if to_utc else dt
-def human_timestamp_to_unix_timestamp(human_timestamp):
+def human_timestamp_to_unix_timestamp(human_timestamp: str) -> int:
     """
     Converts a human-readable timestamp into a UNIX timestamp
@@ -236,10 +269,12 @@ def human_timestamp_to_unix_timestamp(human_timestamp):
         float: The converted timestamp
     """
     human_timestamp = human_timestamp.replace("T", " ")
-    return human_timestamp_to_datetime(human_timestamp).timestamp()
+    return int(human_timestamp_to_datetime(human_timestamp).timestamp())
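The conversion above can be sketched with the stdlib alone (a fixed-format `strptime` stands in for dateutil's flexible parser); the naive datetime is interpreted in the local timezone, so the round-trip below holds regardless of the machine's TZ:

```python
from datetime import datetime


def human_timestamp_to_unix(human_timestamp: str) -> int:
    # Accept the "T" separator used by ISO-style timestamps
    human_timestamp = human_timestamp.replace("T", " ")
    dt = datetime.strptime(human_timestamp, "%Y-%m-%d %H:%M:%S")
    return int(dt.timestamp())


ts = human_timestamp_to_unix("2026-02-20T14:36:37")
print(datetime.fromtimestamp(ts))  # 2026-02-20 14:36:37
```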
-def get_ip_address_country(ip_address, db_path=None):
+def get_ip_address_country(
+    ip_address: str, *, db_path: Optional[str] = None
+) -> Optional[str]:
     """
     Returns the ISO code for the country associated
     with the given IPv4 or IPv6 address
@@ -266,7 +301,7 @@ def get_ip_address_country(ip_address, db_path=None):
     ]

     if db_path is not None:
-        if os.path.isfile(db_path) is False:
+        if not os.path.isfile(db_path):
             db_path = None
             logger.warning(
                 f"No file exists at {db_path}. Falling back to an "
@@ -303,12 +338,13 @@ def get_ip_address_country(ip_address, db_path=None):
 def get_service_from_reverse_dns_base_domain(
     base_domain,
-    always_use_local_file=False,
-    local_file_path=None,
-    url=None,
-    offline=False,
-    reverse_dns_map=None,
-):
+    *,
+    always_use_local_file: bool = False,
+    local_file_path: Optional[str] = None,
+    url: Optional[str] = None,
+    offline: bool = False,
+    reverse_dns_map: Optional[ReverseDNSMap] = None,
+) -> ReverseDNSService:
     """
     Returns the service name of a given base domain name from reverse DNS.
@@ -325,12 +361,6 @@ def get_service_from_reverse_dns_base_domain(
         the supplied reverse_dns_base_domain and the type will be None
     """

-    def load_csv(_csv_file):
-        reader = csv.DictReader(_csv_file)
-        for row in reader:
-            key = row["base_reverse_dns"].lower().strip()
-            reverse_dns_map[key] = dict(name=row["name"], type=row["type"])
-
     base_domain = base_domain.lower().strip()
     if url is None:
         url = (
@@ -338,11 +368,24 @@ def get_service_from_reverse_dns_base_domain(
             "/parsedmarc/master/parsedmarc/"
             "resources/maps/base_reverse_dns_map.csv"
         )
+    reverse_dns_map_value: ReverseDNSMap
     if reverse_dns_map is None:
-        reverse_dns_map = dict()
+        reverse_dns_map_value = {}
+    else:
+        reverse_dns_map_value = reverse_dns_map
+
+    def load_csv(_csv_file):
+        reader = csv.DictReader(_csv_file)
+        for row in reader:
+            key = row["base_reverse_dns"].lower().strip()
+            reverse_dns_map_value[key] = {
+                "name": row["name"],
+                "type": row["type"],
+            }
+
     csv_file = io.StringIO()
-    if not (offline or always_use_local_file) and len(reverse_dns_map) == 0:
+    if not (offline or always_use_local_file) and len(reverse_dns_map_value) == 0:
         try:
             logger.debug(f"Trying to fetch reverse DNS map from {url}...")
             headers = {"User-Agent": USER_AGENT}
@@ -359,7 +402,7 @@ def get_service_from_reverse_dns_base_domain(
             logging.debug("Response body:")
             logger.debug(csv_file.read())

-    if len(reverse_dns_map) == 0:
+    if len(reverse_dns_map_value) == 0:
         logger.info("Loading included reverse DNS map...")
         path = str(
             files(parsedmarc.resources.maps).joinpath("base_reverse_dns_map.csv")
@@ -368,26 +411,28 @@ def get_service_from_reverse_dns_base_domain(
         path = local_file_path
     with open(path) as csv_file:
         load_csv(csv_file)
+    service: ReverseDNSService
     try:
-        service = reverse_dns_map[base_domain]
+        service = reverse_dns_map_value[base_domain]
     except KeyError:
-        service = dict(name=base_domain, type=None)
+        service = {"name": base_domain, "type": None}
     return service
 def get_ip_address_info(
     ip_address,
-    ip_db_path=None,
-    reverse_dns_map_path=None,
-    always_use_local_files=False,
-    reverse_dns_map_url=None,
-    cache=None,
-    reverse_dns_map=None,
-    offline=False,
-    nameservers=None,
-    timeout=2.0,
-):
+    *,
+    ip_db_path: Optional[str] = None,
+    reverse_dns_map_path: Optional[str] = None,
+    always_use_local_files: bool = False,
+    reverse_dns_map_url: Optional[str] = None,
+    cache: Optional[ExpiringDict] = None,
+    reverse_dns_map: Optional[ReverseDNSMap] = None,
+    offline: bool = False,
+    nameservers: Optional[list[str]] = None,
+    timeout: float = 2.0,
+) -> IPAddressInfo:
     """
     Returns reverse DNS and country information for the given IP address
@@ -405,17 +450,27 @@ def get_ip_address_info(
timeout (float): Sets the DNS timeout in seconds timeout (float): Sets the DNS timeout in seconds
Returns: Returns:
OrderedDict: ``ip_address``, ``reverse_dns`` dict: ``ip_address``, ``reverse_dns``, ``country``
""" """
ip_address = ip_address.lower() ip_address = ip_address.lower()
if cache is not None: if cache is not None:
info = cache.get(ip_address, None) cached_info = cache.get(ip_address, None)
if info: if (
cached_info
and isinstance(cached_info, dict)
and "ip_address" in cached_info
):
logger.debug(f"IP address {ip_address} was found in cache") logger.debug(f"IP address {ip_address} was found in cache")
return info return cast(IPAddressInfo, cached_info)
info = OrderedDict() info: IPAddressInfo = {
info["ip_address"] = ip_address "ip_address": ip_address,
"reverse_dns": None,
"country": None,
"base_domain": None,
"name": None,
"type": None,
}
if offline: if offline:
reverse_dns = None reverse_dns = None
else: else:
@@ -425,9 +480,6 @@ def get_ip_address_info(
country = get_ip_address_country(ip_address, db_path=ip_db_path) country = get_ip_address_country(ip_address, db_path=ip_db_path)
info["country"] = country info["country"] = country
info["reverse_dns"] = reverse_dns info["reverse_dns"] = reverse_dns
info["base_domain"] = None
info["name"] = None
info["type"] = None
if reverse_dns is not None: if reverse_dns is not None:
base_domain = get_base_domain(reverse_dns) base_domain = get_base_domain(reverse_dns)
if base_domain is not None: if base_domain is not None:
@@ -452,7 +504,7 @@ def get_ip_address_info(
return info return info
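The bare `*` added to the signature above makes every option after `ip_address` keyword-only, so existing positional callers fail loudly instead of silently mis-binding arguments. A minimal sketch of the pattern (the function here is a toy stand-in, not the real `get_ip_address_info`):

```python
# Toy illustration of the keyword-only pattern from the diff above:
# everything after the bare "*" must be passed by name.
def get_info(ip_address: str, *, offline: bool = False,
             timeout: float = 2.0) -> dict:
    return {
        "ip_address": ip_address.lower(),
        "offline": offline,
        "timeout": timeout,
    }


get_info("192.0.2.1", offline=True)   # OK: option passed by name
try:
    get_info("192.0.2.1", True)       # positional option -> TypeError
except TypeError as error:
    print(error)
```

This is why such a signature change usually accompanies a major or minor version bump: any caller passing `ip_db_path` and friends positionally must be updated.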
-def parse_email_address(original_address):
+def parse_email_address(original_address: str) -> dict[str, Optional[str]]:
     if original_address[0] == "":
         display_name = None
     else:
@@ -465,17 +517,15 @@ def parse_email_address(original_address):
     local = address_parts[0].lower()
     domain = address_parts[-1].lower()
-    return OrderedDict(
-        [
-            ("display_name", display_name),
-            ("address", address),
-            ("local", local),
-            ("domain", domain),
-        ]
-    )
+    return {
+        "display_name": display_name,
+        "address": address,
+        "local": local,
+        "domain": domain,
+    }


-def get_filename_safe_string(string):
+def get_filename_safe_string(string: str) -> str:
     """
     Converts a string to a string that is safe for a filename
@@ -497,7 +547,7 @@ def get_filename_safe_string(string):
     return string


-def is_mbox(path):
+def is_mbox(path: str) -> bool:
     """
     Checks if the given content is an MBOX mailbox file
@@ -518,7 +568,7 @@ def is_mbox(path):
     return _is_mbox


-def is_outlook_msg(content):
+def is_outlook_msg(content) -> bool:
     """
     Checks if the given content is an Outlook msg OLE/MSG file
@@ -533,7 +583,7 @@ def is_outlook_msg(content):
     )


-def convert_outlook_msg(msg_bytes):
+def convert_outlook_msg(msg_bytes: bytes) -> bytes:
     """
     Uses the ``msgconvert`` Perl utility to convert an Outlook MSG file to
     standard RFC 822 format
@@ -542,7 +592,7 @@ def convert_outlook_msg(msg_bytes):
         msg_bytes (bytes): the content of the .msg file

     Returns:
-        A RFC 822 string
+        An RFC 822 bytes payload
     """
     if not is_outlook_msg(msg_bytes):
         raise ValueError("The supplied bytes are not an Outlook MSG file")
@@ -569,7 +619,9 @@ def convert_outlook_msg(msg_bytes):
     return rfc822


-def parse_email(data, strip_attachment_payloads=False):
+def parse_email(
+    data: Union[bytes, str], *, strip_attachment_payloads: bool = False
+) -> dict:
     """
     A simplified email parser
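`parse_email_address` in the hunks above now returns a plain dict instead of an `OrderedDict`. A rough standalone equivalent built on the stdlib (only the output field names are taken from the diff; the use of `email.utils.parseaddr` and the helper name are assumptions for illustration):

```python
from email.utils import parseaddr


def split_address(original: str) -> dict:
    # parseaddr splits "Display Name <user@host>" into its two parts;
    # an empty display name is normalized to None, as in the diff.
    display_name, address = parseaddr(original)
    local, _, domain = address.rpartition("@")
    return {
        "display_name": display_name or None,
        "address": address,
        "local": local.lower(),
        "domain": domain.lower(),
    }


print(split_address("Jane Doe <Jane@Example.COM>"))
```

Since Python 3.7 plain dicts preserve insertion order, so dropping `OrderedDict` changes nothing for serialization; it just simplifies the code and typing.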

View File

@@ -1,3 +1,9 @@
 # -*- coding: utf-8 -*-

+from __future__ import annotations
+
+from typing import Any, Optional, Union
+
 import requests

 from parsedmarc import logger
@@ -7,7 +13,13 @@ from parsedmarc.constants import USER_AGENT

 class WebhookClient(object):
     """A client for webhooks"""

-    def __init__(self, aggregate_url, forensic_url, smtp_tls_url, timeout=60):
+    def __init__(
+        self,
+        aggregate_url: str,
+        forensic_url: str,
+        smtp_tls_url: str,
+        timeout: Optional[int] = 60,
+    ):
         """
         Initializes the WebhookClient

         Args:
@@ -26,25 +38,27 @@ class WebhookClient(object):
             "Content-Type": "application/json",
         }

-    def save_forensic_report_to_webhook(self, report):
+    def save_forensic_report_to_webhook(self, report: str):
         try:
             self._send_to_webhook(self.forensic_url, report)
         except Exception as error_:
             logger.error("Webhook Error: {0}".format(error_.__str__()))

-    def save_smtp_tls_report_to_webhook(self, report):
+    def save_smtp_tls_report_to_webhook(self, report: str):
         try:
             self._send_to_webhook(self.smtp_tls_url, report)
         except Exception as error_:
             logger.error("Webhook Error: {0}".format(error_.__str__()))

-    def save_aggregate_report_to_webhook(self, report):
+    def save_aggregate_report_to_webhook(self, report: str):
         try:
             self._send_to_webhook(self.aggregate_url, report)
         except Exception as error_:
             logger.error("Webhook Error: {0}".format(error_.__str__()))

-    def _send_to_webhook(self, webhook_url, payload):
+    def _send_to_webhook(
+        self, webhook_url: str, payload: Union[bytes, str, dict[str, Any]]
+    ):
         try:
             self.session.post(webhook_url, data=payload, timeout=self.timeout)
         except Exception as error_:
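The webhook client above posts each report with a fixed `Content-Type: application/json` header. A small sketch of the payload preparation step only (the network call and session handling are deliberately elided; the helper name is hypothetical, not part of parsedmarc):

```python
import json


def prepare_webhook_request(payload) -> tuple[dict, str]:
    # The client sends JSON; dict/list payloads are serialized first,
    # while strings are assumed to already be JSON text.
    headers = {"Content-Type": "application/json"}
    if isinstance(payload, (dict, list)):
        body = json.dumps(payload)
    else:
        body = payload
    return headers, body


headers, body = prepare_webhook_request({"report_id": "abc123"})
print(headers["Content-Type"])
print(body)
```

In the real client the headers live on a `requests.Session`, so every `session.post(...)` call inherits them without repeating the dict.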

View File

@@ -2,6 +2,7 @@
 requires = [
     "hatchling>=1.27.0",
 ]
+requires_python = ">=3.10,<3.14"
 build-backend = "hatchling.build"

 [project]
@@ -28,6 +29,7 @@ classifiers = [
     "Operating System :: OS Independent",
     "Programming Language :: Python :: 3"
 ]
+requires-python = ">=3.10"
 dependencies = [
     "azure-identity>=1.8.0",
     "azure-monitor-ingestion>=1.0.0",
@@ -46,7 +48,7 @@ dependencies = [
     "imapclient>=2.1.0",
     "kafka-python-ng>=2.2.2",
     "lxml>=4.4.0",
-    "mailsuite>=1.9.18",
+    "mailsuite>=1.11.2",
     "msgraph-core==0.2.2",
     "opensearch-py>=2.4.2,<=3.0.0",
     "publicsuffixlist>=0.10.0",
@@ -55,6 +57,7 @@ dependencies = [
     "tqdm>=4.31.1",
     "urllib3>=1.25.7",
     "xmltodict>=0.12.0",
+    "PyYAML>=6.0.3"
 ]

 [project.optional-dependencies]
@@ -82,3 +85,14 @@ path = "parsedmarc/constants.py"
 include = [
     "/parsedmarc",
 ]
+
+[tool.hatch.build]
+exclude = [
+    "base_reverse_dns.csv",
+    "find_bad_utf8.py",
+    "find_unknown_base_reverse_dns.py",
+    "unknown_base_reverse_dns.csv",
+    "sortmaps.py",
+    "README.md",
+    "*.bak"
+]
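The new `requires-python = ">=3.10"` constraint is what pip checks at install time when Python 3.9 support is dropped. A runtime guard with the equivalent check can be sketched without any third-party library (the `supported` helper is illustrative, not part of parsedmarc):

```python
import sys

# Minimum version matching the requires-python constraint added above.
MINIMUM = (3, 10)


def supported(version_info=sys.version_info) -> bool:
    # Compare only (major, minor); micro releases don't affect support.
    return tuple(version_info[:2]) >= MINIMUM


print(supported((3, 9, 18)), supported((3, 12, 0)))  # False True
```

pip reads the metadata constraint before installing, so a guard like this is only useful for scripts run directly from a checkout.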

View File

@@ -1,25 +0,0 @@
-#!/usr/bin/env python3
-
-import os
-import glob
-import csv
-
-maps_dir = os.path.join("parsedmarc", "resources", "maps")
-csv_files = glob.glob(os.path.join(maps_dir, "*.csv"))
-
-
-def sort_csv(filepath, column=0):
-    with open(filepath, mode="r", newline="") as infile:
-        reader = csv.reader(infile)
-        header = next(reader)
-        sorted_rows = sorted(reader, key=lambda row: row[column])
-
-    with open(filepath, mode="w", newline="\n") as outfile:
-        writer = csv.writer(outfile)
-        writer.writerow(header)
-        writer.writerows(sorted_rows)
-
-
-for csv_file in csv_files:
-    sort_csv(csv_file)
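The deleted `sortmaps.py` sorted each resource CSV by its first column while keeping the header row in place. The same logic, reworked to operate on in-memory text so it can be exercised without touching the repository's map files (a sketch, not the removed script verbatim):

```python
import csv
import io


def sort_csv_text(text: str, column: int = 0) -> str:
    # Read the header first, then sort the remaining rows by one column,
    # mirroring the removed sort_csv() helper.
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = sorted(reader, key=lambda row: row[column])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


print(sort_csv_text("domain,name\nzoho.com,Zoho\ngoogle.com,Google\n"))
```

Keeping the maps sorted makes diffs of `base_reverse_dns_map.csv` stable, which is presumably why the helper existed before being folded into other tooling.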

View File

@@ -0,0 +1,107 @@
<form version="1.1" theme="dark">
<label>SMTP TLS Reporting</label>
<fieldset submitButton="false" autoRun="true">
<input type="time" token="time">
<label></label>
<default>
<earliest>-7d@h</earliest>
<latest>now</latest>
</default>
</input>
<input type="text" token="organization_name" searchWhenChanged="true">
<label>Organization name</label>
<default>*</default>
<initialValue>*</initialValue>
</input>
<input type="text" token="policy_domain">
<label>Policy domain</label>
<default>*</default>
<initialValue>*</initialValue>
</input>
<input type="dropdown" token="policy_type" searchWhenChanged="true">
<label>Policy type</label>
<choice value="*">Any</choice>
<choice value="tlsa">tlsa</choice>
<choice value="sts">sts</choice>
<choice value="no-policy-found">no-policy-found</choice>
<default>*</default>
<initialValue>*</initialValue>
</input>
</fieldset>
<row>
<panel>
<title>Reporting organizations</title>
<table>
<search>
<query>index=email sourcetype=smtp:tls organization_name=$organization_name$ policies{}.policy_domain=$policy_domain$
| rename policies{}.policy_domain as policy_domain
| rename policies{}.policy_type as policy_type
| rename policies{}.failed_session_count as failed_sessions
| rename policies{}.failure_details{}.failed_session_count as failed_sessions
| rename policies{}.successful_session_count as successful_sessions
| rename policies{}.failure_details{}.sending_mta_ip as sending_mta_ip
| rename policies{}.failure_details{}.receiving_ip as receiving_ip
| rename policies{}.failure_details{}.receiving_mx_hostname as receiving_mx_hostname
| rename policies{}.failure_details{}.result_type as failure_type
| fillnull value=0 failed_sessions
| stats sum(failed_sessions) as failed_sessions sum(successful_sessions) as successful_sessions by organization_name
| sort -successful_sessions 0</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
<panel>
<title>Domains</title>
<table>
<search>
<query>index=email sourcetype=smtp:tls organization_name=$organization_name$ policies{}.policy_domain=$policy_domain$
| rename policies{}.policy_domain as policy_domain
| rename policies{}.policy_type as policy_type
| rename policies{}.failed_session_count as failed_sessions
| rename policies{}.failure_details{}.failed_session_count as failed_sessions
| rename policies{}.successful_session_count as successful_sessions
| rename policies{}.failure_details{}.sending_mta_ip as sending_mta_ip
| rename policies{}.failure_details{}.receiving_ip as receiving_ip
| rename policies{}.failure_details{}.receiving_mx_hostname as receiving_mx_hostname
| rename policies{}.failure_details{}.result_type as failure_type
| fillnull value=0 failed_sessions
| stats sum(failed_sessions) as failed_sessions sum(successful_sessions) as successful_sessions by policy_domain
| sort -successful_sessions 0</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Failure details</title>
<table>
<search>
<query>index=email sourcetype=smtp:tls organization_name=$organization_name$ policies{}.policy_domain=$policy_domain$ policies{}.failure_details{}.result_type=*
| rename policies{}.policy_domain as policy_domain
| rename policies{}.policy_type as policy_type
| rename policies{}.failed_session_count as failed_sessions
| rename policies{}.failure_details{}.failed_session_count as failed_sessions
| rename policies{}.successful_session_count as successful_sessions
| rename policies{}.failure_details{}.sending_mta_ip as sending_mta_ip
| rename policies{}.failure_details{}.receiving_ip as receiving_ip
| rename policies{}.failure_details{}.receiving_mx_hostname as receiving_mx_hostname
| fillnull value=0 failed_sessions
| rename policies{}.failure_details{}.result_type as failure_type
| table _time organization_name policy_domain policy_type failed_sessions successful_sessions sending_mta_ip receiving_ip receiving_mx_hostname failure_type
| sort by -_time 0</query>
<earliest>$time.earliest$</earliest>
<latest>$time.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
</form>
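The Splunk panels above all reduce to a `stats sum(failed_sessions) ... sum(successful_sessions) by <field>` over SMTP TLS report records. The same aggregation sketched in Python (the sample records are invented; the session-count field names follow the dashboard's renames and RFC 8460):

```python
from collections import defaultdict

# Invented sample records shaped like flattened SMTP TLS report policies.
records = [
    {"organization_name": "Google", "successful_session_count": 10,
     "failed_session_count": 1},
    {"organization_name": "Google", "successful_session_count": 5,
     "failed_session_count": 0},
    {"organization_name": "Microsoft", "successful_session_count": 7,
     "failed_session_count": 2},
]

# Equivalent of: stats sum(failed_sessions) sum(successful_sessions)
#                by organization_name | sort -successful_sessions
totals = defaultdict(lambda: {"ok": 0, "failed": 0})
for rec in records:
    org = totals[rec["organization_name"]]
    org["ok"] += rec.get("successful_session_count", 0)
    org["failed"] += rec.get("failed_session_count", 0)

for name, t in sorted(totals.items(), key=lambda kv: -kv[1]["ok"]):
    print(name, t["ok"], t["failed"])
```

The `fillnull value=0 failed_sessions` step in the SPL corresponds to the `.get(..., 0)` defaults here: reports without failure details simply contribute zero failed sessions.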

tests.py Normal file → Executable file
View File

@@ -1,3 +1,6 @@
+#!/usr/bin/env python3
 # -*- coding: utf-8 -*-
+
 from __future__ import absolute_import, print_function, unicode_literals

 import os
@@ -9,6 +12,9 @@ from lxml import etree
 import parsedmarc
 import parsedmarc.utils

+# Detect if running in GitHub Actions to skip DNS lookups
+OFFLINE_MODE = os.environ.get("GITHUB_ACTIONS", "false").lower() == "true"
+

 def minify_xml(xml_string):
     parser = etree.XMLParser(remove_blank_text=True)
@@ -43,11 +49,12 @@ class Test(unittest.TestCase):

     def testExtractReportXMLComparator(self):
         """Test XML comparator function"""
-        print()
-        xmlnice = open("samples/extract_report/nice-input.xml").read()
-        print(xmlnice)
-        xmlchanged = minify_xml(open("samples/extract_report/changed-input.xml").read())
-        print(xmlchanged)
+        xmlnice_file = open("samples/extract_report/nice-input.xml")
+        xmlnice = xmlnice_file.read()
+        xmlnice_file.close()
+        xmlchanged_file = open("samples/extract_report/changed-input.xml")
+        xmlchanged = minify_xml(xmlchanged_file.read())
+        xmlchanged_file.close()
         self.assertTrue(compare_xml(xmlnice, xmlnice))
         self.assertTrue(compare_xml(xmlchanged, xmlchanged))
         self.assertFalse(compare_xml(xmlnice, xmlchanged))
@@ -62,7 +69,9 @@ class Test(unittest.TestCase):
             data = f.read()
         print("Testing {0}: ".format(file), end="")
         xmlout = parsedmarc.extract_report(data)
-        xmlin = open("samples/extract_report/nice-input.xml").read()
+        xmlin_file = open("samples/extract_report/nice-input.xml")
+        xmlin = xmlin_file.read()
+        xmlin_file.close()
         self.assertTrue(compare_xml(xmlout, xmlin))
         print("Passed!")
@@ -71,8 +80,10 @@ class Test(unittest.TestCase):
         print()
         file = "samples/extract_report/nice-input.xml"
         print("Testing {0}: ".format(file), end="")
-        xmlout = parsedmarc.extract_report(file)
-        xmlin = open("samples/extract_report/nice-input.xml").read()
+        xmlout = parsedmarc.extract_report_from_file_path(file)
+        xmlin_file = open("samples/extract_report/nice-input.xml")
+        xmlin = xmlin_file.read()
+        xmlin_file.close()
         self.assertTrue(compare_xml(xmlout, xmlin))
         print("Passed!")
@@ -82,7 +93,9 @@ class Test(unittest.TestCase):
         file = "samples/extract_report/nice-input.xml.gz"
         print("Testing {0}: ".format(file), end="")
         xmlout = parsedmarc.extract_report_from_file_path(file)
-        xmlin = open("samples/extract_report/nice-input.xml").read()
+        xmlin_file = open("samples/extract_report/nice-input.xml")
+        xmlin = xmlin_file.read()
+        xmlin_file.close()
         self.assertTrue(compare_xml(xmlout, xmlin))
         print("Passed!")
@@ -92,12 +105,13 @@ class Test(unittest.TestCase):
         file = "samples/extract_report/nice-input.xml.zip"
         print("Testing {0}: ".format(file), end="")
         xmlout = parsedmarc.extract_report_from_file_path(file)
-        print(xmlout)
-        xmlin = minify_xml(open("samples/extract_report/nice-input.xml").read())
-        print(xmlin)
+        xmlin_file = open("samples/extract_report/nice-input.xml")
+        xmlin = minify_xml(xmlin_file.read())
+        xmlin_file.close()
         self.assertTrue(compare_xml(xmlout, xmlin))
-        xmlin = minify_xml(open("samples/extract_report/changed-input.xml").read())
-        print(xmlin)
+        xmlin_file = open("samples/extract_report/changed-input.xml")
+        xmlin = xmlin_file.read()
+        xmlin_file.close()
         self.assertFalse(compare_xml(xmlout, xmlin))
         print("Passed!")
@@ -110,7 +124,7 @@ class Test(unittest.TestCase):
                 continue
             print("Testing {0}: ".format(sample_path), end="")
             parsed_report = parsedmarc.parse_report_file(
-                sample_path, always_use_local_files=True
+                sample_path, always_use_local_files=True, offline=OFFLINE_MODE
             )["report"]
             parsedmarc.parsed_aggregate_reports_to_csv(parsed_report)
             print("Passed!")
@@ -118,7 +132,7 @@ class Test(unittest.TestCase):

     def testEmptySample(self):
         """Test empty/unparsable report"""
         with self.assertRaises(parsedmarc.ParserError):
-            parsedmarc.parse_report_file("samples/empty.xml")
+            parsedmarc.parse_report_file("samples/empty.xml", offline=OFFLINE_MODE)

     def testForensicSamples(self):
         """Test sample forensic/ruf/failure DMARC reports"""
@@ -128,8 +142,12 @@ class Test(unittest.TestCase):
             print("Testing {0}: ".format(sample_path), end="")
             with open(sample_path) as sample_file:
                 sample_content = sample_file.read()
-            parsed_report = parsedmarc.parse_report_email(sample_content)["report"]
-            parsed_report = parsedmarc.parse_report_file(sample_path)["report"]
+            parsed_report = parsedmarc.parse_report_email(
+                sample_content, offline=OFFLINE_MODE
+            )["report"]
+            parsed_report = parsedmarc.parse_report_file(
+                sample_path, offline=OFFLINE_MODE
+            )["report"]
             parsedmarc.parsed_forensic_reports_to_csv(parsed_report)
             print("Passed!")
@@ -141,7 +159,9 @@ class Test(unittest.TestCase):
             if os.path.isdir(sample_path):
                 continue
             print("Testing {0}: ".format(sample_path), end="")
-            parsed_report = parsedmarc.parse_report_file(sample_path)["report"]
+            parsed_report = parsedmarc.parse_report_file(
+                sample_path, offline=OFFLINE_MODE
+            )["report"]
             parsedmarc.parsed_smtp_tls_reports_to_csv(parsed_report)
             print("Passed!")