Compare commits

...

86 Commits

Author SHA1 Message Date
copilot-swe-agent[bot]
2ce864fa81 Make dashboard queries backward compatible to show data from both forensic and failure indexes
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2026-02-20 21:38:45 +00:00
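
As an illustration of the idea behind this commit (not the dashboard change itself), a query can target both the legacy and renamed indexes with a multi-index pattern; the sketch below uses elasticsearch-dsl and a placeholder field/value.

```python
# Hedged sketch: build a search that matches both the legacy and the renamed
# failure indexes so older documents stay visible after the rename.
from elasticsearch_dsl import Search

s = Search(index=["dmarc_forensic*", "dmarc_failure*"]).query(
    "match", reported_domain="example.com"  # placeholder field and value
)
print(s.to_dict())
```
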
copilot-swe-agent[bot]
423e0611c5 Fix Splunk sourcetype to use colon separator (dmarc:failure) matching original convention
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2026-02-20 21:05:08 +00:00
copilot-swe-agent[bot]
195fdaf7b2 Add DMARCbis field validation, preserve pass disposition, add comprehensive tests
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2026-02-20 21:03:23 +00:00
copilot-swe-agent[bot]
447f452735 Rename forensic→failure in cli.py, docs, dashboards; add DMARCbis fields to ES/OS output
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2026-02-20 21:01:16 +00:00
copilot-swe-agent[bot]
148f4c87a9 Rename "forensic" to "failure" in docs and dashboard configs
Update documentation files (output.md, usage.md, kibana.md, splunk.md,
elasticsearch.md, index.md, example.ini) and dashboard configurations
(Grafana JSON, Kibana ndjson, Splunk XML) to use "failure" terminology
instead of "forensic", consistent with the codebase rename.

- CLI args: --forensic-* → --failure-*
- Config keys: save_forensic → save_failure, forensic_topic → failure_topic, etc.
- Index names: dmarc_forensic → dmarc_failure
- Splunk dashboard: renamed file from dmarc_forensic_dashboard.xml to dmarc_failure_dashboard.xml
- Backward-compat note preserved: "formerly known as forensic reports"

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-02-20 20:57:18 +00:00
copilot-swe-agent[bot]
89fdbd82b0 Rename forensic references to failure in cli.py
- Rename all forensic_* variables to failure_*
- Update CLI argument names (--forensic-* to --failure-*)
- Update default filenames (forensic.json/csv to failure.json/csv)
- Update function calls to match renamed output module functions
- Update index names (dmarc_forensic to dmarc_failure)
- Update report type strings and dict keys
- Add backward-compatible config key reading (accept both old and new names)
- Update help text and log messages

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-02-20 20:53:20 +00:00
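
A minimal sketch of how reading a config key under both its old and new names could look with `configparser`; the helper name is illustrative, not parsedmarc's actual code.

```python
# Hedged sketch: read a boolean option by its new key, falling back to the
# legacy key so older config files keep working.
from configparser import ConfigParser, SectionProxy


def get_bool_compat(section: SectionProxy, new_key: str, old_key: str,
                    default: bool = False) -> bool:
    if new_key in section:
        return section.getboolean(new_key)
    if old_key in section:
        return section.getboolean(old_key)  # deprecated name still honored
    return default


config = ConfigParser()
config.read_string("[general]\nsave_forensic = True\n")  # legacy key only
print(get_bool_compat(config["general"], "save_failure", "save_forensic"))  # True
```
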
copilot-swe-agent[bot]
f019096e5f Rename forensic to failure in output/integration modules
Rename all 'forensic' references to 'failure' in the output modules:
- elastic.py, opensearch.py, splunk.py, kafkaclient.py, syslog.py,
  gelf.py, webhook.py, loganalytics.py, s3.py

Changes include:
- Function/method names: save_forensic_* → save_failure_*
- Variable/parameter names: forensic_* → failure_*
- Class names: _ForensicReportDoc → _FailureReportDoc,
  _ForensicSampleDoc → _FailureSampleDoc
- Index/topic/sourcetype names: dmarc_forensic → dmarc_failure
- Log messages and docstrings updated
- Import statements updated to use new names from core module
- Backward-compatible aliases added at end of each file
- DMARCbis aggregate fields added to elastic.py and opensearch.py:
  np (Keyword), testing (Keyword), discovery_method (Keyword),
  generator (Text)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-02-20 20:47:24 +00:00
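
A hedged illustration of declaring the new DMARCbis fields with elasticsearch-dsl style mappings; the class name and index name below are assumptions, not the project's actual document classes.

```python
# Hedged sketch: DMARCbis aggregate fields declared as Keyword/Text mappings.
from elasticsearch_dsl import Document, Keyword, Text


class AggregateReportDoc(Document):
    np = Keyword()                # DMARCbis policy for non-existent subdomains
    testing = Keyword()           # DMARCbis testing flag
    discovery_method = Keyword()  # how the policy record was discovered
    generator = Text()            # free-text report generator string

    class Index:
        name = "dmarc_aggregate"  # placeholder index name
```
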
copilot-swe-agent[bot]
6660be2c8c Align DMARCbis fields with actual XSD schema: testing, discovery_method, generator, human_result; handle namespaced XML
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2026-02-20 20:40:37 +00:00
copilot-swe-agent[bot]
e09b8506fa Add DMARCbis fields (np, psd, t) to aggregate reports and rename forensic→failure in core parsing
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2026-02-20 20:35:46 +00:00
copilot-swe-agent[bot]
c6413a4a4c Rename forensic references to failure with backward-compatible aliases
- Rename parse_forensic_report -> parse_failure_report
- Rename parsed_forensic_reports_to_csv_rows -> parsed_failure_reports_to_csv_rows
- Rename parsed_forensic_reports_to_csv -> parsed_failure_reports_to_csv
- Update all internal variable names (forensic_report -> failure_report, etc.)
- Change report_type from 'forensic' to 'failure'
- Use FailureReport type instead of ForensicReport
- Use InvalidFailureReport instead of InvalidForensicReport in function bodies
- Update all docstrings and log messages
- Add backward-compatible aliases at end of file

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-02-20 20:33:52 +00:00
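
A minimal sketch of the backward-compatible alias pattern described above; the names and placeholder body are illustrative only.

```python
# Hedged sketch: after renaming a public function, keep the old name bound to
# the new one so existing imports keep working.
def parse_failure_report(source: bytes) -> dict:
    """Parse a DMARC failure (ruf) report."""
    return {"report_type": "failure"}


# Backward-compatible alias kept at the end of the module.
parse_forensic_report = parse_failure_report
```
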
copilot-swe-agent[bot]
81fba4b8d2 Initial plan 2026-02-20 20:21:44 +00:00
Sean Whalen
33eb2aaf62 9.1.0
## Enhancements

- Add TCP and TLS support for syslog output. (#656)
- Skip DNS lookups in GitHub Actions to prevent DNS timeouts during tests. (#657)
- Remove microseconds from DMARC aggregate report time ranges before parsing them.
2026-02-20 14:36:37 -05:00
Sean Whalen
1387fb4899 9.0.11
- Remove microseconds from DMARC aggregate report time ranges before parsing them.
2026-02-20 14:27:51 -05:00
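
A minimal sketch of stripping fractional seconds from a report time range value before parsing it; the epoch-style input format is an assumption.

```python
# Hedged sketch: drop fractional seconds before parsing a time range value.
from datetime import datetime, timezone


def strip_fractional_seconds(value: str) -> str:
    return value.split(".")[0]


begin = strip_fractional_seconds("1335556800.123")
print(datetime.fromtimestamp(int(begin), tz=timezone.utc))
```
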
Copilot
4d97bd25aa Skip DNS lookups in GitHub Actions to prevent test timeouts (#657)
* Add offline mode for tests in GitHub Actions to skip DNS lookups

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2026-02-18 18:19:28 -05:00
Copilot
17a612df0c Add TCP and TLS transport support to syslog module (#656)
- Updated parsedmarc/syslog.py to support UDP, TCP, and TLS protocols
- Added protocol parameter with UDP as default for backward compatibility
- Implemented TLS support with CA verification and client certificate auth
- Added retry logic for TCP/TLS connections with configurable attempts and delays
- Updated parsedmarc/cli.py with new config file parsing
- Updated documentation with examples for TCP and TLS configurations

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

* Remove CLI arguments for syslog options, keep config-file only

Per user request, removed command-line argument options for syslog parameters.
All new syslog options (protocol, TLS settings, timeout, retry) are now only
available via configuration file, consistent with other similar options.

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

* Fix code review issues: remove trailing whitespace and add cert validation

- Removed trailing whitespace from syslog.py and usage.md
- Added warning when only one of certfile_path/keyfile_path is provided
- Improved error handling for incomplete TLS client certificate configuration

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

* Set minimum TLS version to 1.2 for enhanced security

Explicitly configured ssl_context.minimum_version = TLSVersion.TLSv1_2
to ensure only secure TLS versions are used for syslog connections.

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2026-02-18 18:12:59 -05:00
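
A hedged sketch of what a TLS syslog connection with a TLS 1.2 floor can look like in Python; the host, port, and certificate paths are placeholders, and this is not parsedmarc's actual syslog module.

```python
# Hedged sketch: TLS-wrapped TCP syslog connection with a TLS 1.2 minimum.
import socket
import ssl

context = ssl.create_default_context(cafile="/path/to/ca-cert.pem")
context.minimum_version = ssl.TLSVersion.TLSv1_2
# Optional mutual TLS: present a client certificate if both files are set.
# context.load_cert_chain("/path/to/client-cert.pem", "/path/to/client-key.pem")

with socket.create_connection(("syslog.example.com", 6514), timeout=10.0) as raw:
    with context.wrap_socket(raw, server_hostname="syslog.example.com") as tls:
        tls.sendall(b"<14>parsedmarc: example aggregate report summary\n")
```
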
Blackmoon
221bc332ef Fixed a typo in policies.successful_session_count (#654) 2026-02-09 13:57:11 -05:00
Sean Whalen
a2a75f7a81 Fix timestamp parsing in aggregate report by removing fractional seconds 2026-01-21 13:08:48 -05:00
Anael Mobilia
50fcb51577 Update supported Python versions in docs + readme (#652)
* Update README.md

* Update index.md

* Update python-tests.yml
2026-01-19 14:40:01 -05:00
Sean Whalen
dd9ef90773 9.0.10
- Support Python 3.14+
2026-01-17 14:09:18 -05:00
Sean Whalen
0e3a4b0f06 9.0.9
Validate that a string is base64-encoded before trying to base64 decode it. (PRs #648 and #649)
2026-01-08 13:29:23 -05:00
maraspr
343b53ef18 remove newlines before b64decode (#649) 2026-01-08 12:24:20 -05:00
maraspr
792079a3e8 Validate that string is base64 (#648) 2026-01-08 10:15:27 -05:00
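
A minimal sketch, in the spirit of the two PRs above, of removing newlines and validating base64 input before decoding it.

```python
# Hedged sketch: strip newlines, then confirm the payload really is base64
# before decoding, returning None instead of raising on invalid input.
import base64
import binascii
from typing import Optional


def try_b64decode(data: str) -> Optional[bytes]:
    cleaned = data.replace("\r", "").replace("\n", "")
    try:
        return base64.b64decode(cleaned, validate=True)
    except (binascii.Error, ValueError):
        return None


print(try_b64decode("aGVsbG8="))    # b'hello'
print(try_b64decode("not base64"))  # None
```
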
Sean Whalen
1f3a1fc843 Better typing 2025-12-29 17:14:54 -05:00
Sean Whalen
34fa0c145d 9.0.8
- Fix logging configuration not propagating to child parser processes (#646).
- Update `mailsuite` dependency to `>=1.11.1` to solve issues with iCloud IMAP (#493).
2025-12-29 17:07:38 -05:00
Copilot
6719a06388 Fix logging configuration not propagating to child parser processes (#646)
* Initial plan

* Fix logging configuration propagation to child parser processes

- Add _configure_logging() helper function to set up logging in child processes
- Modified cli_parse() to accept log_level and log_file parameters
- Pass current logging configuration from parent to child processes
- Logging warnings/errors from child processes now properly display

Fixes issue where logging handlers in parent process were not inherited by
child processes created via multiprocessing.Process(). Child processes now
configure their own logging with the same settings as the parent.

Tested with sample files and confirmed warnings from DNS exceptions in child
processes are now visible.

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

* Address code review feedback on logging configuration

- Use exact type check (type(h) is logging.StreamHandler) instead of isinstance
  to avoid confusion with FileHandler subclass
- Catch specific exceptions (IOError, OSError, PermissionError) instead of
  bare Exception when creating FileHandler
- Kept logging.ERROR as default to maintain consistency with existing behavior

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2025-12-29 15:07:22 -05:00
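
A minimal sketch of configuring logging inside a multiprocessing child, since handlers created in the parent are not inherited by spawned processes; the helper below is illustrative, not the project's `_configure_logging`.

```python
# Hedged sketch: a child process sets up its own logging with settings passed
# from the parent.
import logging
import multiprocessing
from typing import Optional


def configure_logging(log_level: int, log_file: Optional[str] = None) -> None:
    handler: logging.Handler
    if log_file:
        handler = logging.FileHandler(log_file)
    else:
        handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    root = logging.getLogger()
    root.setLevel(log_level)
    root.addHandler(handler)


def child(log_level: int, log_file: Optional[str]) -> None:
    configure_logging(log_level, log_file)
    logging.warning("this warning is now visible from the child process")


if __name__ == "__main__":
    p = multiprocessing.Process(target=child, args=(logging.WARNING, None))
    p.start()
    p.join()
```
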
Sean Whalen
eafa435868 Code cleanup 2025-12-29 14:32:05 -05:00
Sean Whalen
5d772c3b36 Bump version to 9.0.7 and update changelog with IMAP since option fix 2025-12-29 14:23:50 -05:00
Copilot
72cabbef23 Fix IMAP SEARCH SINCE date format to RFC 3501 DD-Mon-YYYY (#645)
* Initial plan

* Fix IMAP since option date format to use RFC 3501 compliant DD-Mon-YYYY format

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2025-12-29 14:18:48 -05:00
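
A minimal sketch of converting a relative `since` value such as `2d` into the RFC 3501 `DD-Mon-YYYY` form expected by IMAP `SEARCH SINCE`; the helper is illustrative only.

```python
# Hedged sketch: convert a relative "since" option (5m|3h|2d|1w) into the
# RFC 3501 DD-Mon-YYYY date string used by IMAP SEARCH SINCE.
from datetime import datetime, timedelta

UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def since_to_imap_date(since: str) -> str:
    amount, unit = int(since[:-1]), since[-1]
    cutoff = datetime.now() - timedelta(**{UNITS[unit]: amount})
    return cutoff.strftime("%d-%b-%Y")  # e.g. 18-Feb-2026


print(since_to_imap_date("2d"))
```
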
Sean Whalen
3d74cd6ac0 Update CHANGELOG with issue reference for email read status
Added a reference to issue #625 regarding email read status.
2025-12-29 12:10:19 -05:00
Tomáš Kováčik
d1ac59a016 fix #641 (#642)
* fix smtptls and forensic reports for GELF

* add policy_domain, policy_type and failed_session_count to record row

* Remove unused import of json in gelf.py

---------

Co-authored-by: Sean Whalen <44679+seanthegeek@users.noreply.github.com>
2025-12-29 12:05:07 -05:00
Anael Mobilia
7fdd53008f Update README.md (#644) 2025-12-29 10:36:21 -05:00
Sean Whalen
35331d4b84 Add parsedmarc.types module to API reference documentation 2025-12-25 17:24:45 -05:00
Sean Whalen
de9edd3590 Add note about email read status in Microsoft 365 to changelog 2025-12-25 17:16:39 -05:00
Sean Whalen
abf4bdba13 Add type annotations for SMTP TLS and forensic report structures 2025-12-25 16:39:33 -05:00
Sean Whalen
7b842740f5 Change file permissions for tests.py to make it executable 2025-12-25 16:02:33 -05:00
Sean Whalen
ebe3ccf40a Update changelog for version 9.0.6 and set version in constants.py 2025-12-25 16:01:25 -05:00
Sean Whalen
808285658f Refactor function parameters to use non-Optional types where applicable 2025-12-25 16:01:12 -05:00
Sean Whalen
bc1dae29bd Update mailsuite dependency version to 1.11.0 2025-12-25 15:32:27 -05:00
Sean Whalen
4b904444e5 Refactor and improve parsing and extraction functions
- Updated `extract_report` to handle various input types more robustly, removing unnecessary complexity and improving error handling.
- Simplified the handling of file-like objects and added checks for binary mode.
- Enhanced the `parse_report_email` function to streamline input processing and improve type handling.
- Introduced TypedDicts for better type safety in `utils.py`, specifically for reverse DNS and IP address information.
- Refined the configuration loading in `cli.py` to ensure boolean values are consistently cast to `bool`.
- Improved overall code readability and maintainability by restructuring and clarifying logic in several functions.
2025-12-25 15:30:20 -05:00
Sean Whalen
3608bce344 Remove unused import of Union and cast from cli.py 2025-12-24 16:53:22 -05:00
Sean Whalen
fe809c4c3f Add type ignore comments for Pyright in elastic.py and opensearch.py 2025-12-24 16:49:42 -05:00
Sean Whalen
a76c2f9621 More code cleanup 2025-12-24 16:36:59 -05:00
Sean Whalen
bb8f4002bf Use literal dicts instead of ordered dicts and other code cleanup 2025-12-24 15:04:10 -05:00
Sean Whalen
b5773c6b4a Fix etree import so type checkers don't complain 2025-12-24 14:37:38 -05:00
Sean Whalen
b99bd67225 Fix get_base_domain() typing 2025-12-24 14:32:05 -05:00
Sean Whalen
af9ad568ec Specify Python version requirements in pyproject.toml 2025-12-17 16:18:24 -05:00
Sean Whalen
748164d177 Fix #638 2025-12-17 16:09:26 -05:00
Sean Whalen
487e5e1149 Format on build 2025-12-12 15:56:52 -05:00
Sean Whalen
73010cf964 Use ruff for code formatting 2025-12-12 15:44:46 -05:00
Sean Whalen
a4a5475aa8 Fix another typo before releasing 9.0.5 2025-12-08 15:29:48 -05:00
Sean Whalen
dab78880df Actual 9.0.5 release
Fix typo
2025-12-08 15:26:58 -05:00
Sean Whalen
fb54e3b742 9.0.5
- Fix report type detection bug introduced in `9.0.4` (yanked).
2025-12-08 15:22:02 -05:00
Sean Whalen
6799f10364 9.0.4
Fixes

- Fix saving reports to OpenSearch ([#637](https://github.com/domainaware/parsedmarc/issues/637))
- Fix parsing certain DMARC failure/forensic reports
- Some fixes to type hints (incomplete, but published as-is due to the above bugs)
2025-12-08 13:26:59 -05:00
Sean Whalen
445c9565a4 Update bug link in docs 2025-12-06 15:05:19 -05:00
Sean Whalen
4b786846ae Remove Python 3.14 from testing
Until cpython bug https://github.com/python/cpython/issues/142307 is fixed
2025-12-05 11:05:29 -05:00
Sean Whalen
23ae563cd8 Update Python version support details in documentation 2025-12-05 10:48:04 -05:00
Sean Whalen
cdd000e675 9.0.3
- Set `requires-python` to `>=3.9, <3.14` to avoid [this bug](https://github.com/python/cpython/issues/142307)
2025-12-05 10:43:28 -05:00
Sean Whalen
7d58abc67b Add shebang and encoding declaration to tests.py 2025-12-04 10:21:53 -05:00
Sean Whalen
a18ae439de Fix typo in RHEL version support description in documentation 2025-12-04 10:18:15 -05:00
Sean Whalen
d7061330a8 Use None for blank fields in the Top 1000 Message Sources by Name DMARC Summary dashboard widget 2025-12-03 09:22:33 -05:00
Sean Whalen
9d5654b8ec Fix bugs with the Top 1000 Message Sources by Name DMARC Summary dashboard widget 2025-12-03 09:14:52 -05:00
Sean Whalen
a0e0070dd0 Bump version to 9.0.2 2025-12-02 20:12:58 -05:00
Sean Whalen
cf3b7f2c29 ## 9.0.2
## Improvements

- Type hinting is now used properly across the entire library. (#445)

## Fixes

- Decompress report files as needed when passed via the CLI.
- Fixed incomplete removal of the ability for `parsedmarc.utils.extract_report` to accept a file path directly in `8.15.0`.

## Breaking changes

This version of the library requires consumers to pass certain arguments as keyword-only. Internally, the API uses a bare `*` in the function signature. This is standard per [PEP 3102](https://peps.python.org/pep-3102/) and is documented in the Python Language Reference.
2025-12-02 19:41:14 -05:00
Sean Whalen
d312522ab7 Enhance type hints and argument formatting in multiple files for improved clarity and consistency 2025-12-02 17:06:57 -05:00
Sean Whalen
888d717476 Enhance type hints and argument formatting in utils.py for improved clarity and consistency 2025-12-02 16:21:30 -05:00
Sean Whalen
1127f65fbb Enhance type hints and argument formatting in webhook.py for improved clarity and consistency 2025-12-02 15:52:31 -05:00
Sean Whalen
d017dfcddf Enhance type hints and argument formatting across multiple files for improved clarity and consistency 2025-12-02 15:17:37 -05:00
Sean Whalen
5fae99aacc Enhance type hints for improved clarity and consistency in __init__.py, elastic.py, and opensearch.py 2025-12-02 14:14:06 -05:00
Sean Whalen
ba57368ac3 Refactor argument formatting and type hints in elastic.py for consistency 2025-12-02 13:13:25 -05:00
Sean Whalen
dc6ee5de98 Add type hints to methods in opensearch.py for improved clarity and type checking 2025-12-02 13:11:59 -05:00
Sean Whalen
158d63d205 Complete annotations on elastic.py 2025-12-02 12:59:03 -05:00
Oscar Mattsson
f1933b906c Fix 404 link to maxmind docs (#635) 2025-12-02 09:26:01 -05:00
Anael Mobilia
4b98d795ff Define minimal Python version on pyproject (#634) 2025-12-01 20:22:49 -05:00
Sean Whalen
b1356f7dfc 9.0.1
- Allow multiple `records` for the same aggregate DMARC report in Elasticsearch and Opensearch (fixes issue in 9.0.0)
- Fix typos
2025-12-01 18:57:23 -05:00
Sean Whalen
1969196e1a Switch CHANGELOG headers 2025-12-01 18:01:54 -05:00
Sean Whalen
553f15f6a9 Code formatting 2025-12-01 17:24:10 -05:00
Sean Whalen
1fc9f638e2 9.0.0 (#629)
* Normalize report volumes when a report timespan exceeds 24 hours
2025-12-01 17:06:58 -05:00
Sean Whalen
48bff504b4 Fix build script to properly publish docs 2025-12-01 11:08:21 -05:00
Sean Whalen
681b7cbf85 Formatting 2025-12-01 10:56:08 -05:00
Sean Whalen
0922d6e83a Add supported Python versions to the documentation index 2025-12-01 10:24:19 -05:00
Sean Whalen
baf3f95fb1 Update README with clarification on Python 3.6 support 2025-12-01 10:20:56 -05:00
Anael Mobilia
a51f945305 Clearly define supported Python versions policy (#633)
* Clearly define supported Python versions.

Support policy based on author's comment on https://github.com/domainaware/parsedmarc/pull/458#issuecomment-2002516299 #458

* Compile Python 3.6 manually, since ubuntu-latest now runs Ubuntu 24.04 (which does not include Python 3.6) and Ubuntu 20.04 runners are no longer available
https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json

* Use latest versions of GH Actions

* Silence some technical GH Actions steps

* Elasticsearch / opensearch: use supported versions + align used versions

* Delete .github/workflows/python-tests-3.6.yml

Drop Python 3.6 test

* Update Python 3.6 support status in README

---------

Co-authored-by: Sean Whalen <44679+seanthegeek@users.noreply.github.com>
2025-12-01 10:02:47 -05:00
Sean Whalen
55dbf8e3db Add sources by name table to the Kibana DMARC Summary dashboard
This matches the table in the Splunk DMARC Aggregate reports dashboard
2025-11-30 19:44:14 -05:00
Anael Mobilia
00267c9847 Codestyle cleanup (#631)
* Fix typos

* Copyright - Update date

* Codestyle xxx is False -> not xxx

* Ensure "_find_label_id_for_label" always returns str

* PEP-8 : apiKey -> api_key + backward compatibility for config files

* Duplicate variable initialization

* Fix format
2025-11-30 19:13:57 -05:00
Anael Mobilia
51356175e1 Get option on the type described on documentation (#632) 2025-11-30 19:00:04 -05:00
Anael Mobilia
3be10d30dd Fix warnings in docker-compose.yml (#630)
* Fix level=warning msg="...\parsedmarc\docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion"

* Fix "Unquoted port mapping not recommended"
2025-11-30 18:59:01 -05:00
48 changed files with 3430 additions and 1801 deletions

View File

@@ -24,11 +24,11 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v5
- name: Docker meta
id: meta
uses: docker/metadata-action@v3
uses: docker/metadata-action@v5
with:
images: |
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
@@ -40,16 +40,14 @@ jobs:
type=semver,pattern={{major}}.{{minor}}
- name: Log in to the Container registry
# https://github.com/docker/login-action/releases/tag/v2.0.0
uses: docker/login-action@49ed152c8eca782a232dede0303416e8f356c37b
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push Docker image
# https://github.com/docker/build-push-action/releases/tag/v3.0.0
uses: docker/build-push-action@e551b19e49efd4e98792db7592c17c09b89db8d8
uses: docker/build-push-action@v6
with:
context: .
push: ${{ github.event_name == 'release' }}

View File

@@ -15,7 +15,7 @@ jobs:
services:
elasticsearch:
image: elasticsearch:8.18.2
image: elasticsearch:8.19.7
env:
discovery.type: single-node
cluster.name: parsedmarc-cluster
@@ -30,18 +30,18 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v5
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python-version }}
- name: Install system dependencies
run: |
sudo apt-get update
sudo apt-get install -y libemail-outlook-message-perl
sudo apt-get -q update
sudo apt-get -qy install libemail-outlook-message-perl
- name: Install Python dependencies
run: |
python -m pip install --upgrade pip
@@ -65,6 +65,6 @@ jobs:
run: |
hatch build
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v4
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}

.vscode/launch.json vendored
View File

@@ -19,20 +19,11 @@
"console": "integratedTerminal"
},
{
"name": "sample.eml",
"name": "sample",
"type": "debugpy",
"request": "launch",
"module": "parsedmarc.cli",
"args": ["samples/private/sample.eml"]
},
{
"name": "find_sus_domains.py",
"type": "debugpy",
"request": "launch",
"program": "find_sus_domains.py",
"args": ["-i", "unknown_domains.txt", "-o", "sus_domains.csv"],
"cwd": "${workspaceFolder}/parsedmarc/resources/maps",
"console": "integratedTerminal"
"args": ["samples/private/sample"]
},
{
"name": "sortlists.py",

.vscode/settings.json vendored
View File

@@ -1,144 +1,166 @@
{
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.formatOnSave": true,
// Let Ruff handle lint fixes + import sorting on save
"editor.codeActionsOnSave": {
"source.fixAll.ruff": "explicit",
"source.organizeImports.ruff": "explicit"
}
},
"markdownlint.config": {
"MD024": false
},
"cSpell.words": [
"adkim",
"akamaiedge",
"amsmath",
"andrewmcgilvray",
"arcname",
"aspf",
"autoclass",
"automodule",
"backported",
"bellsouth",
"boto",
"brakhane",
"Brightmail",
"CEST",
"CHACHA",
"checkdmarc",
"Codecov",
"confnew",
"dateparser",
"dateutil",
"Davmail",
"DBIP",
"dearmor",
"deflist",
"devel",
"DMARC",
"Dmarcian",
"dnspython",
"dollarmath",
"dpkg",
"exampleuser",
"expiringdict",
"fieldlist",
"genindex",
"geoip",
"geoipupdate",
"Geolite",
"geolocation",
"githubpages",
"Grafana",
"hostnames",
"htpasswd",
"httpasswd",
"httplib",
"IMAP",
"imapclient",
"infile",
"Interaktive",
"IPDB",
"journalctl",
"keepalive",
"keyout",
"keyrings",
"Leeman",
"libemail",
"linkify",
"LISTSERV",
"lxml",
"mailparser",
"mailrelay",
"mailsuite",
"maxdepth",
"maxmind",
"mbox",
"mfrom",
"michaeldavie",
"mikesiegel",
"mitigations",
"MMDB",
"modindex",
"msgconvert",
"msgraph",
"MSSP",
"Munge",
"ndjson",
"newkey",
"Nhcm",
"nojekyll",
"nondigest",
"nosecureimap",
"nosniff",
"nwettbewerb",
"opensearch",
"parsedmarc",
"passsword",
"Postorius",
"premade",
"procs",
"publicsuffix",
"publicsuffixlist",
"publixsuffix",
"pygelf",
"pypy",
"pytest",
"quickstart",
"Reindex",
"replyto",
"reversename",
"Rollup",
"Rpdm",
"SAMEORIGIN",
"sdist",
"Servernameone",
"setuptools",
"smartquotes",
"SMTPTLS",
"sortlists",
"sortmaps",
"sourcetype",
"STARTTLS",
"tasklist",
"timespan",
"tlsa",
"tlsrpt",
"toctree",
"TQDDM",
"tqdm",
"truststore",
"Übersicht",
"uids",
"unparasable",
"uper",
"urllib",
"Valimail",
"venv",
"Vhcw",
"viewcode",
"virtualenv",
"WBITS",
"webmail",
"Wettbewerber",
"Whalen",
"whitespaces",
"xennn",
"xmltodict",
"xpack",
"zscholl"
"adkim",
"akamaiedge",
"amsmath",
"andrewmcgilvray",
"arcname",
"aspf",
"autoclass",
"automodule",
"backported",
"bellsouth",
"boto",
"brakhane",
"Brightmail",
"CEST",
"CHACHA",
"checkdmarc",
"Codecov",
"confnew",
"dateparser",
"dateutil",
"Davmail",
"DBIP",
"dearmor",
"deflist",
"devel",
"DMARC",
"Dmarcian",
"dnspython",
"dollarmath",
"dpkg",
"exampleuser",
"expiringdict",
"fieldlist",
"GELF",
"genindex",
"geoip",
"geoipupdate",
"Geolite",
"geolocation",
"githubpages",
"Grafana",
"hostnames",
"htpasswd",
"httpasswd",
"httplib",
"ifhost",
"IMAP",
"imapclient",
"infile",
"Interaktive",
"IPDB",
"journalctl",
"kafkaclient",
"keepalive",
"keyout",
"keyrings",
"Leeman",
"libemail",
"linkify",
"LISTSERV",
"loganalytics",
"lxml",
"mailparser",
"mailrelay",
"mailsuite",
"maxdepth",
"MAXHEADERS",
"maxmind",
"mbox",
"mfrom",
"mhdw",
"michaeldavie",
"mikesiegel",
"Mimecast",
"mitigations",
"MMDB",
"modindex",
"msgconvert",
"msgraph",
"MSSP",
"multiprocess",
"Munge",
"ndjson",
"newkey",
"Nhcm",
"nojekyll",
"nondigest",
"nosecureimap",
"nosniff",
"nwettbewerb",
"opensearch",
"opensearchpy",
"parsedmarc",
"passsword",
"pbar",
"Postorius",
"premade",
"privatesuffix",
"procs",
"publicsuffix",
"publicsuffixlist",
"publixsuffix",
"pygelf",
"pypy",
"pytest",
"quickstart",
"Reindex",
"replyto",
"reversename",
"Rollup",
"Rpdm",
"SAMEORIGIN",
"sdist",
"Servernameone",
"setuptools",
"smartquotes",
"SMTPTLS",
"sortlists",
"sortmaps",
"sourcetype",
"STARTTLS",
"tasklist",
"timespan",
"tlsa",
"tlsrpt",
"toctree",
"TQDDM",
"tqdm",
"truststore",
"Übersicht",
"uids",
"Uncategorized",
"unparasable",
"uper",
"urllib",
"Valimail",
"venv",
"Vhcw",
"viewcode",
"virtualenv",
"WBITS",
"webmail",
"Wettbewerber",
"Whalen",
"whitespaces",
"xennn",
"xmltodict",
"xpack",
"zscholl"
],
}

File diff suppressed because it is too large

View File

@@ -23,11 +23,10 @@ ProofPoint Email Fraud Defense, and Valimail.
## Help Wanted
This project is maintained by one developer. Please consider
reviewing the open
[issues](https://github.com/domainaware/parsedmarc/issues) to see how
you can contribute code, documentation, or user support. Assistance on
the pinned issues would be particularly helpful.
This project is maintained by one developer. Please consider reviewing the open
[issues](https://github.com/domainaware/parsedmarc/issues) to see how you can
contribute code, documentation, or user support. Assistance on the pinned
issues would be particularly helpful.
Thanks to all
[contributors](https://github.com/domainaware/parsedmarc/graphs/contributors)!
@@ -42,6 +41,24 @@ Thanks to all
- Consistent data structures
- Simple JSON and/or CSV output
- Optionally email the results
- Optionally send the results to Elasticsearch, Opensearch, and/or Splunk, for use
with premade dashboards
- Optionally send the results to Elasticsearch, Opensearch, and/or Splunk, for
use with premade dashboards
- Optionally send reports to Apache Kafka
## Python Compatibility
This project supports the following Python versions, which are either actively maintained or are the default versions
for RHEL or Debian.
| Version | Supported | Reason |
|---------|-----------|------------------------------------------------------------|
| < 3.6 | ❌ | End of Life (EOL) |
| 3.6 | ❌ | Used in RHEL 8, but not supported by project dependencies |
| 3.7 | ❌ | End of Life (EOL) |
| 3.8 | ❌ | End of Life (EOL) |
| 3.9 | ✅ | Supported until August 2026 (Debian 11); May 2032 (RHEL 9) |
| 3.10 | ✅ | Actively maintained |
| 3.11 | ✅ | Actively maintained; supported until June 2028 (Debian 12) |
| 3.12 | ✅ | Actively maintained; supported until May 2035 (RHEL 10) |
| 3.13 | ✅ | Actively maintained; supported until June 2030 (Debian 13) |
| 3.14 | ✅ | Actively maintained |

View File

@@ -9,12 +9,11 @@ fi
. venv/bin/activate
pip install .[build]
ruff format .
ruff check .
cd docs
make clean
make html
touch build/html/.nojekyll
if [ -d "./../parsedmarc-docs" ]; then
if [ -d "../../parsedmarc-docs" ]; then
cp -rf build/html/* ../../parsedmarc-docs/
fi
cd ..

ci.ini
View File

@@ -3,6 +3,7 @@ save_aggregate = True
save_forensic = True
save_smtp_tls = True
debug = True
offline = True
[elasticsearch]
hosts = http://localhost:9200

View File

@@ -1,8 +1,6 @@
version: '3.7'
services:
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:8.3.1
image: docker.elastic.co/elasticsearch/elasticsearch:8.19.7
environment:
- network.host=127.0.0.1
- http.host=0.0.0.0
@@ -14,7 +12,7 @@ services:
- xpack.security.enabled=false
- xpack.license.self_generated.type=basic
ports:
- 127.0.0.1:9200:9200
- "127.0.0.1:9200:9200"
ulimits:
memlock:
soft: -1
@@ -30,7 +28,7 @@ services:
retries: 24
opensearch:
image: opensearchproject/opensearch:2.18.0
image: opensearchproject/opensearch:2
environment:
- network.host=127.0.0.1
- http.host=0.0.0.0
@@ -41,7 +39,7 @@ services:
- bootstrap.memory_lock=true
- OPENSEARCH_INITIAL_ADMIN_PASSWORD=${OPENSEARCH_INITIAL_ADMIN_PASSWORD}
ports:
- 127.0.0.1:9201:9200
- "127.0.0.1:9201:9200"
ulimits:
memlock:
soft: -1

View File

@@ -28,6 +28,13 @@
:members:
```
## parsedmarc.types
```{eval-rst}
.. automodule:: parsedmarc.types
:members:
```
## parsedmarc.utils
```{eval-rst}

View File

@@ -20,7 +20,7 @@ from parsedmarc import __version__
# -- Project information -----------------------------------------------------
project = "parsedmarc"
copyright = "2018 - 2023, Sean Whalen and contributors"
copyright = "2018 - 2025, Sean Whalen and contributors"
author = "Sean Whalen and contributors"
# The version info for the project you're documenting, acts as replacement for

View File

@@ -214,7 +214,7 @@ Kibana index patterns with versions that match the upgraded indexes:
1. Login in to Kibana, and click on Management
2. Under Kibana, click on Saved Objects
3. Check the checkboxes for the `dmarc_aggregate` and `dmarc_forensic`
3. Check the checkboxes for the `dmarc_aggregate` and `dmarc_failure`
index patterns
4. Click Delete
5. Click Delete on the confirmation message

View File

@@ -2,7 +2,7 @@
[general]
save_aggregate = True
save_forensic = True
save_failure = True
[imap]
host = imap.example.com

View File

@@ -34,7 +34,7 @@ and Valimail.
## Features
- Parses draft and 1.0 standard aggregate/rua DMARC reports
- Parses forensic/failure/ruf DMARC reports
- Parses failure/ruf DMARC reports
- Parses reports from SMTP TLS Reporting
- Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
- Transparently handles gzip or zip compressed reports
@@ -45,6 +45,24 @@ and Valimail.
with premade dashboards
- Optionally send reports to Apache Kafka
## Python Compatibility
This project supports the following Python versions, which are either actively maintained or are the default versions
for RHEL or Debian.
| Version | Supported | Reason |
|---------|-----------|------------------------------------------------------------|
| < 3.6 | ❌ | End of Life (EOL) |
| 3.6 | ❌ | Used in RHEL 8, but not supported by project dependencies |
| 3.7 | ❌ | End of Life (EOL) |
| 3.8 | ❌ | End of Life (EOL) |
| 3.9 | ✅ | Supported until August 2026 (Debian 11); May 2032 (RHEL 9) |
| 3.10 | ✅ | Actively maintained |
| 3.11 | ✅ | Actively maintained; supported until June 2028 (Debian 12) |
| 3.12 | ✅ | Actively maintained; supported until May 2035 (RHEL 10) |
| 3.13 | ✅ | Actively maintained; supported until June 2030 (Debian 13) |
| 3.14 | ✅ | Actively maintained |
```{toctree}
:caption: 'Contents'
:maxdepth: 2

View File

@@ -199,7 +199,7 @@ sudo apt-get install libemail-outlook-message-perl
[geoipupdate releases page on github]: https://github.com/maxmind/geoipupdate/releases
[ip to country lite database]: https://db-ip.com/db/download/ip-to-country-lite
[license keys]: https://www.maxmind.com/en/accounts/current/license-key
[maxmind geoipupdate page]: https://dev.maxmind.com/geoip/geoipupdate/
[maxmind geoipupdate page]: https://dev.maxmind.com/geoip/updating-databases/
[maxmind geolite2 country database]: https://dev.maxmind.com/geoip/geolite2-free-geolocation-data
[registering for a free geolite2 account]: https://www.maxmind.com/en/geolite2/signup
[to comply with various privacy regulations]: https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/

View File

@@ -74,14 +74,14 @@ the DMARC Summary dashboard. To view failures only, use the pie chart.
Any other filters work the same way. You can also add your own custom temporary
filters by clicking on Add Filter at the upper right of the page.
## DMARC Forensic Samples
## DMARC Failure Samples
The DMARC Forensic Samples dashboard contains information on DMARC forensic
reports (also known as failure reports or ruf reports). These reports contain
The DMARC Failure Samples dashboard contains information on DMARC failure
reports (also known as ruf reports). These reports contain
samples of emails that have failed to pass DMARC.
:::{note}
Most recipients do not send forensic/failure/ruf reports at all to avoid
Most recipients do not send failure/ruf reports at all to avoid
privacy leaks. Some recipients (notably Chinese webmail services) will only
supply the headers of sample emails. Very few provide the entire email.
:::

View File

@@ -23,6 +23,8 @@ of the report schema.
"report_id": "9391651994964116463",
"begin_date": "2012-04-27 20:00:00",
"end_date": "2012-04-28 19:59:59",
"timespan_requires_normalization": false,
"original_timespan_seconds": 86399,
"errors": []
},
"policy_published": {
@@ -39,8 +41,10 @@ of the report schema.
"source": {
"ip_address": "72.150.241.94",
"country": "US",
"reverse_dns": "adsl-72-150-241-94.shv.bellsouth.net",
"base_domain": "bellsouth.net"
"reverse_dns": null,
"base_domain": null,
"name": null,
"type": null
},
"count": 2,
"alignment": {
@@ -74,7 +78,10 @@ of the report schema.
"result": "pass"
}
]
}
},
"normalized_timespan": false,
"interval_begin": "2012-04-28 00:00:00",
"interval_end": "2012-04-28 23:59:59"
}
]
}
@@ -83,16 +90,18 @@ of the report schema.
### CSV aggregate report
```text
xml_schema,org_name,org_email,org_extra_contact_info,report_id,begin_date,end_date,errors,domain,adkim,aspf,p,sp,pct,fo,source_ip_address,source_country,source_reverse_dns,source_base_domain,count,spf_aligned,dkim_aligned,dmarc_aligned,disposition,policy_override_reasons,policy_override_comments,envelope_from,header_from,envelope_to,dkim_domains,dkim_selectors,dkim_results,spf_domains,spf_scopes,spf_results
draft,acme.com,noreply-dmarc-support@acme.com,http://acme.com/dmarc/support,9391651994964116463,2012-04-27 20:00:00,2012-04-28 19:59:59,,example.com,r,r,none,none,100,0,72.150.241.94,US,adsl-72-150-241-94.shv.bellsouth.net,bellsouth.net,2,True,False,True,none,,,example.com,example.com,,example.com,none,fail,example.com,mfrom,pass
xml_schema,org_name,org_email,org_extra_contact_info,report_id,begin_date,end_date,normalized_timespan,errors,domain,adkim,aspf,p,sp,pct,fo,source_ip_address,source_country,source_reverse_dns,source_base_domain,source_name,source_type,count,spf_aligned,dkim_aligned,dmarc_aligned,disposition,policy_override_reasons,policy_override_comments,envelope_from,header_from,envelope_to,dkim_domains,dkim_selectors,dkim_results,spf_domains,spf_scopes,spf_results
draft,acme.com,noreply-dmarc-support@acme.com,http://acme.com/dmarc/support,9391651994964116463,2012-04-28 00:00:00,2012-04-28 23:59:59,False,,example.com,r,r,none,none,100,0,72.150.241.94,US,,,,,2,True,False,True,none,,,example.com,example.com,,example.com,none,fail,example.com,mfrom,pass
draft,acme.com,noreply-dmarc-support@acme.com,http://acme.com/dmarc/support,9391651994964116463,2012-04-28 00:00:00,2012-04-28 23:59:59,False,,example.com,r,r,none,none,100,0,72.150.241.94,US,,,,,2,True,False,True,none,,,example.com,example.com,,example.com,none,fail,example.com,mfrom,pass
```
## Sample forensic report output
## Sample failure report output
Thanks to GitHub user [xennn](https://github.com/xennn) for the anonymized
[forensic report email sample](<https://github.com/domainaware/parsedmarc/raw/master/samples/forensic/DMARC%20Failure%20Report%20for%20domain.de%20(mail-from%3Dsharepoint%40domain.de%2C%20ip%3D10.10.10.10).eml>).
[failure report email sample](<https://github.com/domainaware/parsedmarc/raw/master/samples/forensic/DMARC%20Failure%20Report%20for%20domain.de%20(mail-from%3Dsharepoint%40domain.de%2C%20ip%3D10.10.10.10).eml>).
### JSON forensic report
### JSON failure report
```json
{
@@ -181,7 +190,7 @@ Thanks to GitHub user [xennn](https://github.com/xennn) for the anonymized
}
```
### CSV forensic report
### CSV failure report
```text
feedback_type,user_agent,version,original_envelope_id,original_mail_from,original_rcpt_to,arrival_date,arrival_date_utc,subject,message_id,authentication_results,dkim_domain,source_ip_address,source_country,source_reverse_dns,source_base_domain,delivery_result,auth_failure,reported_domain,authentication_mechanisms,sample_headers_only

View File

@@ -1,10 +1,10 @@
# Splunk
Starting in version 4.3.0 `parsedmarc` supports sending aggregate and/or
forensic DMARC data to a Splunk [HTTP Event collector (HEC)].
failure DMARC data to a Splunk [HTTP Event collector (HEC)].
The project repository contains [XML files] for premade Splunk
dashboards for aggregate and forensic DMARC reports.
dashboards for aggregate and failure DMARC reports.
Copy and paste the contents of each file into a separate Splunk
dashboard XML editor.

View File

@@ -4,47 +4,50 @@
```text
usage: parsedmarc [-h] [-c CONFIG_FILE] [--strip-attachment-payloads] [-o OUTPUT]
[--aggregate-json-filename AGGREGATE_JSON_FILENAME]
[--forensic-json-filename FORENSIC_JSON_FILENAME]
[--aggregate-csv-filename AGGREGATE_CSV_FILENAME]
[--forensic-csv-filename FORENSIC_CSV_FILENAME]
[-n NAMESERVERS [NAMESERVERS ...]] [-t DNS_TIMEOUT] [--offline]
[-s] [--verbose] [--debug] [--log-file LOG_FILE] [-v]
[file_path ...]
[--aggregate-json-filename AGGREGATE_JSON_FILENAME] [--failure-json-filename FAILURE_JSON_FILENAME]
[--smtp-tls-json-filename SMTP_TLS_JSON_FILENAME] [--aggregate-csv-filename AGGREGATE_CSV_FILENAME]
[--failure-csv-filename FAILURE_CSV_FILENAME] [--smtp-tls-csv-filename SMTP_TLS_CSV_FILENAME]
[-n NAMESERVERS [NAMESERVERS ...]] [-t DNS_TIMEOUT] [--offline] [-s] [-w] [--verbose] [--debug]
[--log-file LOG_FILE] [--no-prettify-json] [-v]
[file_path ...]
Parses DMARC reports
Parses DMARC reports
positional arguments:
file_path one or more paths to aggregate or forensic report
files, emails, or mbox files'
positional arguments:
file_path one or more paths to aggregate or failure report files, emails, or mbox files'
optional arguments:
-h, --help show this help message and exit
-c CONFIG_FILE, --config-file CONFIG_FILE
a path to a configuration file (--silent implied)
--strip-attachment-payloads
remove attachment payloads from forensic report output
-o OUTPUT, --output OUTPUT
write output files to the given directory
--aggregate-json-filename AGGREGATE_JSON_FILENAME
filename for the aggregate JSON output file
--forensic-json-filename FORENSIC_JSON_FILENAME
filename for the forensic JSON output file
--aggregate-csv-filename AGGREGATE_CSV_FILENAME
filename for the aggregate CSV output file
--forensic-csv-filename FORENSIC_CSV_FILENAME
filename for the forensic CSV output file
-n NAMESERVERS [NAMESERVERS ...], --nameservers NAMESERVERS [NAMESERVERS ...]
nameservers to query
-t DNS_TIMEOUT, --dns_timeout DNS_TIMEOUT
number of seconds to wait for an answer from DNS
(default: 2.0)
--offline do not make online queries for geolocation or DNS
-s, --silent only print errors and warnings
--verbose more verbose output
--debug print debugging information
--log-file LOG_FILE output logging to a file
-v, --version show program's version number and exit
options:
-h, --help show this help message and exit
-c CONFIG_FILE, --config-file CONFIG_FILE
a path to a configuration file (--silent implied)
--strip-attachment-payloads
remove attachment payloads from failure report output
-o OUTPUT, --output OUTPUT
write output files to the given directory
--aggregate-json-filename AGGREGATE_JSON_FILENAME
filename for the aggregate JSON output file
--failure-json-filename FAILURE_JSON_FILENAME
filename for the failure JSON output file
--smtp-tls-json-filename SMTP_TLS_JSON_FILENAME
filename for the SMTP TLS JSON output file
--aggregate-csv-filename AGGREGATE_CSV_FILENAME
filename for the aggregate CSV output file
--failure-csv-filename FAILURE_CSV_FILENAME
filename for the failure CSV output file
--smtp-tls-csv-filename SMTP_TLS_CSV_FILENAME
filename for the SMTP TLS CSV output file
-n NAMESERVERS [NAMESERVERS ...], --nameservers NAMESERVERS [NAMESERVERS ...]
nameservers to query
-t DNS_TIMEOUT, --dns_timeout DNS_TIMEOUT
number of seconds to wait for an answer from DNS (default: 2.0)
--offline do not make online queries for geolocation or DNS
-s, --silent only print errors
-w, --warnings print warnings in addition to errors
--verbose more verbose output
--debug print debugging information
--log-file LOG_FILE output logging to a file
--no-prettify-json output JSON in a single line without indentation
-v, --version show program's version number and exit
```
:::{note}
@@ -67,7 +70,7 @@ For example
[general]
save_aggregate = True
save_forensic = True
save_failure = True
[imap]
host = imap.example.com
@@ -106,7 +109,7 @@ mode = tcp
[webhook]
aggregate_url = https://aggregate_url.example.com
forensic_url = https://forensic_url.example.com
failure_url = https://failure_url.example.com
smtp_tls_url = https://smtp_tls_url.example.com
timeout = 60
```
@@ -116,7 +119,7 @@ The full set of configuration options are:
- `general`
- `save_aggregate` - bool: Save aggregate report data to
Elasticsearch, Splunk and/or S3
- `save_forensic` - bool: Save forensic report data to
- `save_failure` - bool: Save failure report data to
Elasticsearch, Splunk and/or S3
- `save_smtp_tls` - bool: Save SMTP-STS report data to
Elasticsearch, Splunk and/or S3
@@ -127,7 +130,7 @@ The full set of configuration options are:
- `output` - str: Directory to place JSON and CSV files in. This is required if you set either of the JSON output file options.
- `aggregate_json_filename` - str: filename for the aggregate
JSON output file
- `forensic_json_filename` - str: filename for the forensic
- `failure_json_filename` - str: filename for the failure
JSON output file
- `ip_db_path` - str: An optional custom path to a MMDB file
from MaxMind or DBIP
@@ -168,8 +171,8 @@ The full set of configuration options are:
- `check_timeout` - int: Number of seconds to wait for a IMAP
IDLE response or the number of seconds until the next
mail check (Default: `30`)
- `since` - str: Search for messages since certain time. (Examples: `5m|3h|2d|1w`)
Acceptable units - {"m":"minutes", "h":"hours", "d":"days", "w":"weeks"}).
- `since` - str: Search for messages since certain time. (Examples: `5m|3h|2d|1w`)
Acceptable units - {"m":"minutes", "h":"hours", "d":"days", "w":"weeks"}.
Defaults to `1d` if incorrect value is provided.
- `imap`
- `host` - str: The IMAP server hostname or IP address
@@ -237,7 +240,7 @@ The full set of configuration options are:
group and use that as the group id.
```powershell
New-ApplicationAccessPolicy -AccessRight RestrictAccess
New-ApplicationAccessPolicy -AccessRight RestrictAccess
-AppId "<CLIENT_ID>" -PolicyScopeGroupId "<MAILBOX>"
-Description "Restrict access to dmarc reports mailbox."
```
@@ -254,7 +257,7 @@ The full set of configuration options are:
:::
- `user` - str: Basic auth username
- `password` - str: Basic auth password
- `apiKey` - str: API key
- `api_key` - str: API key
- `ssl` - bool: Use an encrypted SSL/TLS connection
(Default: `True`)
- `timeout` - float: Timeout in seconds (Default: 60)
@@ -277,7 +280,7 @@ The full set of configuration options are:
:::
- `user` - str: Basic auth username
- `password` - str: Basic auth password
- `apiKey` - str: API key
- `api_key` - str: API key
- `ssl` - bool: Use an encrypted SSL/TLS connection
(Default: `True`)
- `timeout` - float: Timeout in seconds (Default: 60)
@@ -303,7 +306,7 @@ The full set of configuration options are:
- `skip_certificate_verification` - bool: Skip certificate
verification (not recommended)
- `aggregate_topic` - str: The Kafka topic for aggregate reports
- `forensic_topic` - str: The Kafka topic for forensic reports
- `failure_topic` - str: The Kafka topic for failure reports
- `smtp`
- `host` - str: The SMTP hostname
- `port` - int: The SMTP port (Default: `25`)
@@ -333,13 +336,65 @@ The full set of configuration options are:
- `secret_access_key` - str: The secret access key (Optional)
- `syslog`
- `server` - str: The Syslog server name or IP address
- `port` - int: The UDP port to use (Default: `514`)
- `port` - int: The port to use (Default: `514`)
- `protocol` - str: The protocol to use: `udp`, `tcp`, or `tls` (Default: `udp`)
- `cafile_path` - str: Path to CA certificate file for TLS server verification (Optional)
- `certfile_path` - str: Path to client certificate file for TLS authentication (Optional)
- `keyfile_path` - str: Path to client private key file for TLS authentication (Optional)
- `timeout` - float: Connection timeout in seconds for TCP/TLS (Default: `5.0`)
- `retry_attempts` - int: Number of retry attempts for failed connections (Default: `3`)
- `retry_delay` - int: Delay in seconds between retry attempts (Default: `5`)
**Example UDP configuration (default):**
```ini
[syslog]
server = syslog.example.com
port = 514
```
**Example TCP configuration:**
```ini
[syslog]
server = syslog.example.com
port = 6514
protocol = tcp
timeout = 10.0
retry_attempts = 5
```
**Example TLS configuration with server verification:**
```ini
[syslog]
server = syslog.example.com
port = 6514
protocol = tls
cafile_path = /path/to/ca-cert.pem
timeout = 10.0
```
**Example TLS configuration with mutual authentication:**
```ini
[syslog]
server = syslog.example.com
port = 6514
protocol = tls
cafile_path = /path/to/ca-cert.pem
certfile_path = /path/to/client-cert.pem
keyfile_path = /path/to/client-key.pem
timeout = 10.0
retry_attempts = 3
retry_delay = 5
```
- `gmail_api`
- `credentials_file` - str: Path to file containing the
credentials, None to disable (Default: `None`)
- `token_file` - str: Path to save the token file
(Default: `.token`)
:::{note}
credentials_file and token_file can be obtained with [quickstart](https://developers.google.com/gmail/api/quickstart/python). Please change the scope to `https://www.googleapis.com/auth/gmail.modify`.
:::
@@ -359,7 +414,7 @@ The full set of configuration options are:
- `dce` - str: The Data Collection Endpoint (DCE). Example: `https://{DCE-NAME}.{REGION}.ingest.monitor.azure.com`.
- `dcr_immutable_id` - str: The immutable ID of the Data Collection Rule (DCR)
- `dcr_aggregate_stream` - str: The stream name for aggregate reports in the DCR
- `dcr_forensic_stream` - str: The stream name for the forensic reports in the DCR
- `dcr_failure_stream` - str: The stream name for the failure reports in the DCR
- `dcr_smtp_tls_stream` - str: The stream name for the SMTP TLS reports in the DCR
:::{note}
@@ -376,7 +431,7 @@ The full set of configuration options are:
- `webhook` - Post the individual reports to a webhook url with the report as the JSON body
- `aggregate_url` - str: URL of the webhook which should receive the aggregate reports
- `forensic_url` - str: URL of the webhook which should receive the forensic reports
- `failure_url` - str: URL of the webhook which should receive the failure reports
- `smtp_tls_url` - str: URL of the webhook which should receive the smtp_tls reports
- `timeout` - int: Interval in which the webhook call should timeout
@@ -391,26 +446,26 @@ blocks DNS requests to outside resolvers.
:::
:::{note}
`save_aggregate` and `save_forensic` are separate options
because you may not want to save forensic reports
(also known as failure reports) to your Elasticsearch instance,
`save_aggregate` and `save_failure` are separate options
because you may not want to save failure reports
(formerly known as forensic reports) to your Elasticsearch instance,
particularly if you are in a highly-regulated industry that
handles sensitive data, such as healthcare or finance. If your
legitimate outgoing email fails DMARC, it is possible
that email may appear later in a forensic report.
that email may appear later in a failure report.
Forensic reports contain the original headers of an email that
Failure reports contain the original headers of an email that
failed a DMARC check, and sometimes may also include the
full message body, depending on the policy of the reporting
organization.
Most reporting organizations do not send forensic reports of any
Most reporting organizations do not send failure reports of any
kind for privacy reasons. While aggregate DMARC reports are sent
at least daily, it is normal to receive very few forensic reports.
at least daily, it is normal to receive very few failure reports.
An alternative approach is to still collect forensic/failure/ruf
An alternative approach is to still collect failure/ruf
reports in your DMARC inbox, but run `parsedmarc` with
```save_forensic = True``` manually on a separate IMAP folder (using
```save_failure = True``` manually on a separate IMAP folder (using
the ```reports_folder``` option), after you have manually moved
known samples you want to save to that folder
(e.g. malicious samples and non-sensitive legitimate samples).
@@ -439,7 +494,7 @@ Update the limit to 2k per example:
PUT _cluster/settings
{
"persistent" : {
"cluster.max_shards_per_node" : 2000
"cluster.max_shards_per_node" : 2000
}
}
```

View File

@@ -83,7 +83,7 @@
"id": 28,
"panels": [
{
"content": "# DMARC Summary\r\nAs the name suggests, this dashboard is the best place to start reviewing your aggregate DMARC data.\r\n\r\nAcross the top of the dashboard, three pie charts display the percentage of alignment pass/fail for SPF, DKIM, and DMARC. Clicking on any chart segment will filter for that value.\r\n\r\n***Note***\r\nMessages should not be considered malicious just because they failed to pass DMARC; especially if you have just started collecting data. It may be a legitimate service that needs SPF and DKIM configured correctly.\r\n\r\nStart by filtering the results to only show failed DKIM alignment. While DMARC passes if a message passes SPF or DKIM alignment, only DKIM alignment remains valid when a message is forwarded without changing the from address, which is often caused by a mailbox forwarding rule. This is because DKIM signatures are part of the message headers, whereas SPF relies on SMTP session headers.\r\n\r\nUnderneath the pie charts. you can see graphs of DMARC passage and message disposition over time.\r\n\r\nUnder the graphs you will find the most useful data tables on the dashboard. On the left, there is a list of organizations that are sending you DMARC reports. In the center, there is a list of sending servers grouped by the base domain in their reverse DNS. On the right, there is a list of email from domains, sorted by message volume.\r\n\r\nBy hovering your mouse over a data table value and using the magnifying glass icons, you can filter on or filter out different values. Start by looking at the Message Sources by Reverse DNS table. Find a sender that you recognize, such as an email marketing service, hover over it, and click on the plus (+) magnifying glass icon, to add a filter that only shows results for that sender. Now, look at the Message From Header table to the right. That shows you the domains that a sender is sending as, which might tell you which brand/business is using a particular service. With that information, you can contact them and have them set up DKIM.\r\n\r\n***Note***\r\nIf you have a lot of B2C customers, you may see a high volume of emails as your domains coming from consumer email services, such as Google/Gmail and Yahoo! This occurs when customers have mailbox rules in place that forward emails from an old account to a new account, which is why DKIM authentication is so important, as mentioned earlier. Similar patterns may be observed with businesses who send from reverse DNS addressees of parent, subsidiary, and outdated brands.\r\n\r\n***Note***\r\nYou can add your own custom temporary filters by clicking on Add Filter at the upper right of the page.\r\n\r\n# DMARC Forensic Samples\r\nThe DMARC Forensic Samples section contains information on DMARC forensic reports (also known as failure reports or ruf reports). These reports contain samples of emails that have failed to pass DMARC.\r\n\r\n***Note***\r\nMost recipients do not send forensic/failure/ruf reports at all to avoid privacy leaks. Some recipients (notably Chinese webmail services) will only supply the headers of sample emails. 
Very few provide the entire email.\r\n\r\n# DMARC Alignment Guide\r\nDMARC ensures that SPF and DKIM authentication mechanisms actually authenticate against the same domain that the end user sees.\r\n\r\nA message passes a DMARC check by passing DKIM or SPF, **as long as the related indicators are also in alignment.**\r\n\r\n| \t| DKIM \t| SPF \t|\r\n|-----------\t|--------------------------------------------------------------------------------------------------------------------------------------------------\t|----------------------------------------------------------------------------------------------------------------\t|\r\n| **Passing** \t| The signature in the DKIM header is validated using a public key that is published as a DNS record of the domain name specified in the signature \t| The mail server's IP address is listed in the SPF record of the domain in the SMTP envelope's mail from header \t|\r\n| **Alignment** \t| The signing domain aligns with the domain in the message's from header \t| The domain in the SMTP envelope's mail from header aligns with the domain in the message's from header \t|\r\n\r\n\r\n# Further Reading\r\n[Demystifying DMARC: A guide to preventing email spoofing](https://seanthegeek.net/459/demystifying-dmarc/amp/)\r\n\r\n[DMARC Manual](https://menainfosec.com/wp-content/uploads/2017/12/DMARC_Service_Manual.pdf)\r\n\r\n[What is “External Destination Verification”?](https://dmarcian.com/what-is-external-destination-verification/)",
"content": "# DMARC Summary\r\nAs the name suggests, this dashboard is the best place to start reviewing your aggregate DMARC data.\r\n\r\nAcross the top of the dashboard, three pie charts display the percentage of alignment pass/fail for SPF, DKIM, and DMARC. Clicking on any chart segment will filter for that value.\r\n\r\n***Note***\r\nMessages should not be considered malicious just because they failed to pass DMARC; especially if you have just started collecting data. It may be a legitimate service that needs SPF and DKIM configured correctly.\r\n\r\nStart by filtering the results to only show failed DKIM alignment. While DMARC passes if a message passes SPF or DKIM alignment, only DKIM alignment remains valid when a message is forwarded without changing the from address, which is often caused by a mailbox forwarding rule. This is because DKIM signatures are part of the message headers, whereas SPF relies on SMTP session headers.\r\n\r\nUnderneath the pie charts. you can see graphs of DMARC passage and message disposition over time.\r\n\r\nUnder the graphs you will find the most useful data tables on the dashboard. On the left, there is a list of organizations that are sending you DMARC reports. In the center, there is a list of sending servers grouped by the base domain in their reverse DNS. On the right, there is a list of email from domains, sorted by message volume.\r\n\r\nBy hovering your mouse over a data table value and using the magnifying glass icons, you can filter on or filter out different values. Start by looking at the Message Sources by Reverse DNS table. Find a sender that you recognize, such as an email marketing service, hover over it, and click on the plus (+) magnifying glass icon, to add a filter that only shows results for that sender. Now, look at the Message From Header table to the right. That shows you the domains that a sender is sending as, which might tell you which brand/business is using a particular service. With that information, you can contact them and have them set up DKIM.\r\n\r\n***Note***\r\nIf you have a lot of B2C customers, you may see a high volume of emails as your domains coming from consumer email services, such as Google/Gmail and Yahoo! This occurs when customers have mailbox rules in place that forward emails from an old account to a new account, which is why DKIM authentication is so important, as mentioned earlier. Similar patterns may be observed with businesses who send from reverse DNS addressees of parent, subsidiary, and outdated brands.\r\n\r\n***Note***\r\nYou can add your own custom temporary filters by clicking on Add Filter at the upper right of the page.\r\n\r\n# DMARC Failure Samples\r\nThe DMARC Failure Samples section contains information on DMARC failure reports (also known as ruf reports). These reports contain samples of emails that have failed to pass DMARC.\r\n\r\n***Note***\r\nMost recipients do not send failure/ruf reports at all to avoid privacy leaks. Some recipients (notably Chinese webmail services) will only supply the headers of sample emails. 
Very few provide the entire email.\r\n\r\n# DMARC Alignment Guide\r\nDMARC ensures that SPF and DKIM authentication mechanisms actually authenticate against the same domain that the end user sees.\r\n\r\nA message passes a DMARC check by passing DKIM or SPF, **as long as the related indicators are also in alignment.**\r\n\r\n| \t| DKIM \t| SPF \t|\r\n|-----------\t|--------------------------------------------------------------------------------------------------------------------------------------------------\t|----------------------------------------------------------------------------------------------------------------\t|\r\n| **Passing** \t| The signature in the DKIM header is validated using a public key that is published as a DNS record of the domain name specified in the signature \t| The mail server's IP address is listed in the SPF record of the domain in the SMTP envelope's mail from header \t|\r\n| **Alignment** \t| The signing domain aligns with the domain in the message's from header \t| The domain in the SMTP envelope's mail from header aligns with the domain in the message's from header \t|\r\n\r\n\r\n# Further Reading\r\n[Demystifying DMARC: A guide to preventing email spoofing](https://seanthegeek.net/459/demystifying-dmarc/amp/)\r\n\r\n[DMARC Manual](https://menainfosec.com/wp-content/uploads/2017/12/DMARC_Service_Manual.pdf)\r\n\r\n[What is “External Destination Verification”?](https://dmarcian.com/what-is-external-destination-verification/)",
"datasource": null,
"fieldConfig": {
"defaults": {
@@ -101,7 +101,7 @@
"links": [],
"mode": "markdown",
"options": {
"content": "# DMARC Summary\r\nAs the name suggests, this dashboard is the best place to start reviewing your aggregate DMARC data.\r\n\r\nAcross the top of the dashboard, three pie charts display the percentage of alignment pass/fail for SPF, DKIM, and DMARC. Clicking on any chart segment will filter for that value.\r\n\r\n***Note***\r\nMessages should not be considered malicious just because they failed to pass DMARC; especially if you have just started collecting data. It may be a legitimate service that needs SPF and DKIM configured correctly.\r\n\r\nStart by filtering the results to only show failed DKIM alignment. While DMARC passes if a message passes SPF or DKIM alignment, only DKIM alignment remains valid when a message is forwarded without changing the from address, which is often caused by a mailbox forwarding rule. This is because DKIM signatures are part of the message headers, whereas SPF relies on SMTP session headers.\r\n\r\nUnderneath the pie charts. you can see graphs of DMARC passage and message disposition over time.\r\n\r\nUnder the graphs you will find the most useful data tables on the dashboard. On the left, there is a list of organizations that are sending you DMARC reports. In the center, there is a list of sending servers grouped by the base domain in their reverse DNS. On the right, there is a list of email from domains, sorted by message volume.\r\n\r\nBy hovering your mouse over a data table value and using the magnifying glass icons, you can filter on or filter out different values. Start by looking at the Message Sources by Reverse DNS table. Find a sender that you recognize, such as an email marketing service, hover over it, and click on the plus (+) magnifying glass icon, to add a filter that only shows results for that sender. Now, look at the Message From Header table to the right. That shows you the domains that a sender is sending as, which might tell you which brand/business is using a particular service. With that information, you can contact them and have them set up DKIM.\r\n\r\n***Note***\r\nIf you have a lot of B2C customers, you may see a high volume of emails as your domains coming from consumer email services, such as Google/Gmail and Yahoo! This occurs when customers have mailbox rules in place that forward emails from an old account to a new account, which is why DKIM authentication is so important, as mentioned earlier. Similar patterns may be observed with businesses who send from reverse DNS addressees of parent, subsidiary, and outdated brands.\r\n\r\n***Note***\r\nYou can add your own custom temporary filters by clicking on Add Filter at the upper right of the page.\r\n\r\n# DMARC Forensic Samples\r\nThe DMARC Forensic Samples section contains information on DMARC forensic reports (also known as failure reports or ruf reports). These reports contain samples of emails that have failed to pass DMARC.\r\n\r\n***Note***\r\nMost recipients do not send forensic/failure/ruf reports at all to avoid privacy leaks. Some recipients (notably Chinese webmail services) will only supply the headers of sample emails. 
Very few provide the entire email.\r\n\r\n# DMARC Alignment Guide\r\nDMARC ensures that SPF and DKIM authentication mechanisms actually authenticate against the same domain that the end user sees.\r\n\r\nA message passes a DMARC check by passing DKIM or SPF, **as long as the related indicators are also in alignment.**\r\n\r\n| \t| DKIM \t| SPF \t|\r\n|-----------\t|--------------------------------------------------------------------------------------------------------------------------------------------------\t|----------------------------------------------------------------------------------------------------------------\t|\r\n| **Passing** \t| The signature in the DKIM header is validated using a public key that is published as a DNS record of the domain name specified in the signature \t| The mail server's IP address is listed in the SPF record of the domain in the SMTP envelope's mail from header \t|\r\n| **Alignment** \t| The signing domain aligns with the domain in the message's from header \t| The domain in the SMTP envelope's mail from header aligns with the domain in the message's from header \t|\r\n\r\n\r\n# Further Reading\r\n[Demystifying DMARC: A guide to preventing email spoofing](https://seanthegeek.net/459/demystifying-dmarc/amp/)\r\n\r\n[DMARC Manual](https://menainfosec.com/wp-content/uploads/2017/12/DMARC_Service_Manual.pdf)\r\n\r\n[What is “External Destination Verification”?](https://dmarcian.com/what-is-external-destination-verification/)",
"content": "# DMARC Summary\r\nAs the name suggests, this dashboard is the best place to start reviewing your aggregate DMARC data.\r\n\r\nAcross the top of the dashboard, three pie charts display the percentage of alignment pass/fail for SPF, DKIM, and DMARC. Clicking on any chart segment will filter for that value.\r\n\r\n***Note***\r\nMessages should not be considered malicious just because they failed to pass DMARC; especially if you have just started collecting data. It may be a legitimate service that needs SPF and DKIM configured correctly.\r\n\r\nStart by filtering the results to only show failed DKIM alignment. While DMARC passes if a message passes SPF or DKIM alignment, only DKIM alignment remains valid when a message is forwarded without changing the from address, which is often caused by a mailbox forwarding rule. This is because DKIM signatures are part of the message headers, whereas SPF relies on SMTP session headers.\r\n\r\nUnderneath the pie charts. you can see graphs of DMARC passage and message disposition over time.\r\n\r\nUnder the graphs you will find the most useful data tables on the dashboard. On the left, there is a list of organizations that are sending you DMARC reports. In the center, there is a list of sending servers grouped by the base domain in their reverse DNS. On the right, there is a list of email from domains, sorted by message volume.\r\n\r\nBy hovering your mouse over a data table value and using the magnifying glass icons, you can filter on or filter out different values. Start by looking at the Message Sources by Reverse DNS table. Find a sender that you recognize, such as an email marketing service, hover over it, and click on the plus (+) magnifying glass icon, to add a filter that only shows results for that sender. Now, look at the Message From Header table to the right. That shows you the domains that a sender is sending as, which might tell you which brand/business is using a particular service. With that information, you can contact them and have them set up DKIM.\r\n\r\n***Note***\r\nIf you have a lot of B2C customers, you may see a high volume of emails as your domains coming from consumer email services, such as Google/Gmail and Yahoo! This occurs when customers have mailbox rules in place that forward emails from an old account to a new account, which is why DKIM authentication is so important, as mentioned earlier. Similar patterns may be observed with businesses who send from reverse DNS addressees of parent, subsidiary, and outdated brands.\r\n\r\n***Note***\r\nYou can add your own custom temporary filters by clicking on Add Filter at the upper right of the page.\r\n\r\n# DMARC Failure Samples\r\nThe DMARC Failure Samples section contains information on DMARC failure reports (also known as ruf reports). These reports contain samples of emails that have failed to pass DMARC.\r\n\r\n***Note***\r\nMost recipients do not send failure/ruf reports at all to avoid privacy leaks. Some recipients (notably Chinese webmail services) will only supply the headers of sample emails. 
Very few provide the entire email.\r\n\r\n# DMARC Alignment Guide\r\nDMARC ensures that SPF and DKIM authentication mechanisms actually authenticate against the same domain that the end user sees.\r\n\r\nA message passes a DMARC check by passing DKIM or SPF, **as long as the related indicators are also in alignment.**\r\n\r\n| \t| DKIM \t| SPF \t|\r\n|-----------\t|--------------------------------------------------------------------------------------------------------------------------------------------------\t|----------------------------------------------------------------------------------------------------------------\t|\r\n| **Passing** \t| The signature in the DKIM header is validated using a public key that is published as a DNS record of the domain name specified in the signature \t| The mail server's IP address is listed in the SPF record of the domain in the SMTP envelope's mail from header \t|\r\n| **Alignment** \t| The signing domain aligns with the domain in the message's from header \t| The domain in the SMTP envelope's mail from header aligns with the domain in the message's from header \t|\r\n\r\n\r\n# Further Reading\r\n[Demystifying DMARC: A guide to preventing email spoofing](https://seanthegeek.net/459/demystifying-dmarc/amp/)\r\n\r\n[DMARC Manual](https://menainfosec.com/wp-content/uploads/2017/12/DMARC_Service_Manual.pdf)\r\n\r\n[What is “External Destination Verification”?](https://dmarcian.com/what-is-external-destination-verification/)",
"mode": "markdown"
},
"pluginVersion": "7.1.0",

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -1,2 +1,3 @@
__version__ = "8.19.1"
__version__ = "9.1.0"
USER_AGENT = f"parsedmarc/{__version__}"


@@ -1,27 +1,30 @@
# -*- coding: utf-8 -*-
from collections import OrderedDict
from __future__ import annotations
from elasticsearch_dsl.search import Q
from typing import Any, Optional, Union
from elasticsearch.helpers import reindex
from elasticsearch_dsl import (
connections,
Object,
Boolean,
Date,
Document,
Index,
Nested,
InnerDoc,
Integer,
Text,
Boolean,
Ip,
Date,
Keyword,
Nested,
Object,
Search,
Text,
connections,
)
from elasticsearch.helpers import reindex
from elasticsearch_dsl.search import Q
from parsedmarc import InvalidFailureReport
from parsedmarc.log import logger
from parsedmarc.utils import human_timestamp_to_datetime
from parsedmarc import InvalidForensicReport
class ElasticsearchError(Exception):
@@ -41,18 +44,23 @@ class _PublishedPolicy(InnerDoc):
sp = Text()
pct = Integer()
fo = Text()
np = Keyword()
testing = Keyword()
discovery_method = Keyword()
class _DKIMResult(InnerDoc):
domain = Text()
selector = Text()
result = Text()
human_result = Text()
class _SPFResult(InnerDoc):
domain = Text()
scope = Text()
results = Text()
human_result = Text()
class _AggregateReportDoc(Document):
@@ -67,6 +75,8 @@ class _AggregateReportDoc(Document):
date_range = Date()
date_begin = Date()
date_end = Date()
normalized_timespan = Boolean()
original_timespan_seconds = Integer()
errors = Text()
published_policy = Object(_PublishedPolicy)
source_ip_address = Ip()
@@ -86,19 +96,37 @@ class _AggregateReportDoc(Document):
envelope_to = Text()
dkim_results = Nested(_DKIMResult)
spf_results = Nested(_SPFResult)
np = Keyword()
testing = Keyword()
discovery_method = Keyword()
generator = Text()
def add_policy_override(self, type_, comment):
self.policy_overrides.append(_PolicyOverride(type=type_, comment=comment))
def add_policy_override(self, type_: str, comment: str):
self.policy_overrides.append(_PolicyOverride(type=type_, comment=comment)) # pyright: ignore[reportCallIssue]
def add_dkim_result(self, domain, selector, result):
def add_dkim_result(
self, domain: str, selector: str, result: str,
human_result: Optional[str] = None,
):
self.dkim_results.append(
_DKIMResult(domain=domain, selector=selector, result=result)
)
_DKIMResult(
domain=domain, selector=selector, result=result,
human_result=human_result,
)
) # pyright: ignore[reportCallIssue]
def add_spf_result(self, domain, scope, result):
self.spf_results.append(_SPFResult(domain=domain, scope=scope, result=result))
def add_spf_result(
self, domain: str, scope: str, result: str,
human_result: Optional[str] = None,
):
self.spf_results.append(
_SPFResult(
domain=domain, scope=scope, result=result,
human_result=human_result,
)
) # pyright: ignore[reportCallIssue]
def save(self, **kwargs):
def save(self, **kwargs): # pyright: ignore[reportIncompatibleMethodOverride]
self.passed_dmarc = False
self.passed_dmarc = self.spf_aligned or self.dkim_aligned
@@ -116,7 +144,7 @@ class _EmailAttachmentDoc(Document):
sha256 = Text()
class _ForensicSampleDoc(InnerDoc):
class _FailureSampleDoc(InnerDoc):
raw = Text()
headers = Object()
headers_only = Boolean()
@@ -131,31 +159,31 @@ class _ForensicSampleDoc(InnerDoc):
body = Text()
attachments = Nested(_EmailAttachmentDoc)
def add_to(self, display_name, address):
self.to.append(_EmailAddressDoc(display_name=display_name, address=address))
def add_to(self, display_name: str, address: str):
self.to.append(_EmailAddressDoc(display_name=display_name, address=address)) # pyright: ignore[reportCallIssue]
def add_reply_to(self, display_name, address):
def add_reply_to(self, display_name: str, address: str):
self.reply_to.append(
_EmailAddressDoc(display_name=display_name, address=address)
)
) # pyright: ignore[reportCallIssue]
def add_cc(self, display_name, address):
self.cc.append(_EmailAddressDoc(display_name=display_name, address=address))
def add_cc(self, display_name: str, address: str):
self.cc.append(_EmailAddressDoc(display_name=display_name, address=address)) # pyright: ignore[reportCallIssue]
def add_bcc(self, display_name, address):
self.bcc.append(_EmailAddressDoc(display_name=display_name, address=address))
def add_bcc(self, display_name: str, address: str):
self.bcc.append(_EmailAddressDoc(display_name=display_name, address=address)) # pyright: ignore[reportCallIssue]
def add_attachment(self, filename, content_type, sha256):
def add_attachment(self, filename: str, content_type: str, sha256: str):
self.attachments.append(
_EmailAttachmentDoc(
filename=filename, content_type=content_type, sha256=sha256
)
)
) # pyright: ignore[reportCallIssue]
class _ForensicReportDoc(Document):
class _FailureReportDoc(Document):
class Index:
name = "dmarc_forensic"
name = "dmarc_failure"
feedback_type = Text()
user_agent = Text()
@@ -173,7 +201,7 @@ class _ForensicReportDoc(Document):
source_auth_failures = Text()
dkim_domain = Text()
original_rcpt_to = Text()
sample = Object(_ForensicSampleDoc)
sample = Object(_FailureSampleDoc)
class _SMTPTLSFailureDetailsDoc(InnerDoc):
@@ -197,15 +225,15 @@ class _SMTPTLSPolicyDoc(InnerDoc):
def add_failure_details(
self,
result_type,
ip_address,
receiving_ip,
receiving_mx_helo,
failed_session_count,
sending_mta_ip=None,
receiving_mx_hostname=None,
additional_information_uri=None,
failure_reason_code=None,
result_type: Optional[str] = None,
ip_address: Optional[str] = None,
receiving_ip: Optional[str] = None,
receiving_mx_helo: Optional[str] = None,
failed_session_count: Optional[int] = None,
sending_mta_ip: Optional[str] = None,
receiving_mx_hostname: Optional[str] = None,
additional_information_uri: Optional[str] = None,
failure_reason_code: Union[str, int, None] = None,
):
_details = _SMTPTLSFailureDetailsDoc(
result_type=result_type,
@@ -218,7 +246,7 @@ class _SMTPTLSPolicyDoc(InnerDoc):
additional_information=additional_information_uri,
failure_reason_code=failure_reason_code,
)
self.failure_details.append(_details)
self.failure_details.append(_details) # pyright: ignore[reportCallIssue]
class _SMTPTLSReportDoc(Document):
@@ -235,13 +263,14 @@ class _SMTPTLSReportDoc(Document):
def add_policy(
self,
policy_type,
policy_domain,
successful_session_count,
failed_session_count,
policy_string=None,
mx_host_patterns=None,
failure_details=None,
policy_type: str,
policy_domain: str,
successful_session_count: int,
failed_session_count: int,
*,
policy_string: Optional[str] = None,
mx_host_patterns: Optional[list[str]] = None,
failure_details: Optional[str] = None,
):
self.policies.append(
policy_type=policy_type,
@@ -251,7 +280,7 @@ class _SMTPTLSReportDoc(Document):
policy_string=policy_string,
mx_host_patterns=mx_host_patterns,
failure_details=failure_details,
)
) # pyright: ignore[reportCallIssue]
class AlreadySaved(ValueError):
@@ -259,24 +288,25 @@ class AlreadySaved(ValueError):
def set_hosts(
hosts,
use_ssl=False,
ssl_cert_path=None,
username=None,
password=None,
apiKey=None,
timeout=60.0,
hosts: Union[str, list[str]],
*,
use_ssl: bool = False,
ssl_cert_path: Optional[str] = None,
username: Optional[str] = None,
password: Optional[str] = None,
api_key: Optional[str] = None,
timeout: float = 60.0,
):
"""
Sets the Elasticsearch hosts to use
Args:
hosts (str): A single hostname or URL, or list of hostnames or URLs
use_ssl (bool): Use a HTTPS connection to the server
hosts (str | list[str]): A single hostname or URL, or list of hostnames or URLs
use_ssl (bool): Use an HTTPS connection to the server
ssl_cert_path (str): Path to the certificate chain
username (str): The username to use for authentication
password (str): The password to use for authentication
apiKey (str): The Base64 encoded API key to use for authentication
api_key (str): The Base64 encoded API key to use for authentication
timeout (float): Timeout in seconds
"""
if not isinstance(hosts, list):
@@ -289,14 +319,14 @@ def set_hosts(
conn_params["ca_certs"] = ssl_cert_path
else:
conn_params["verify_certs"] = False
if username:
if username and password:
conn_params["http_auth"] = username + ":" + password
if apiKey:
conn_params["api_key"] = apiKey
if api_key:
conn_params["api_key"] = api_key
connections.create_connection(**conn_params)
def create_indexes(names, settings=None):
def create_indexes(names: list[str], settings: Optional[dict[str, Any]] = None):
"""
Create Elasticsearch indexes
@@ -319,19 +349,22 @@ def create_indexes(names, settings=None):
raise ElasticsearchError("Elasticsearch error: {0}".format(e.__str__()))
def migrate_indexes(aggregate_indexes=None, forensic_indexes=None):
def migrate_indexes(
aggregate_indexes: Optional[list[str]] = None,
failure_indexes: Optional[list[str]] = None,
):
"""
Updates index mappings
Args:
aggregate_indexes (list): A list of aggregate index names
forensic_indexes (list): A list of forensic index names
failure_indexes (list): A list of failure index names
"""
version = 2
if aggregate_indexes is None:
aggregate_indexes = []
if forensic_indexes is None:
forensic_indexes = []
if failure_indexes is None:
failure_indexes = []
for aggregate_index_name in aggregate_indexes:
if not Index(aggregate_index_name).exists():
continue
@@ -358,26 +391,26 @@ def migrate_indexes(aggregate_indexes=None, forensic_indexes=None):
}
Index(new_index_name).create()
Index(new_index_name).put_mapping(doc_type=doc, body=body)
reindex(connections.get_connection(), aggregate_index_name, new_index_name)
reindex(connections.get_connection(), aggregate_index_name, new_index_name) # pyright: ignore[reportArgumentType]
Index(aggregate_index_name).delete()
for forensic_index in forensic_indexes:
for failure_index in failure_indexes:
pass
def save_aggregate_report_to_elasticsearch(
aggregate_report,
index_suffix=None,
index_prefix=None,
monthly_indexes=False,
number_of_shards=1,
number_of_replicas=0,
aggregate_report: dict[str, Any],
index_suffix: Optional[str] = None,
index_prefix: Optional[str] = None,
monthly_indexes: Optional[bool] = False,
number_of_shards: int = 1,
number_of_replicas: int = 0,
):
"""
Saves a parsed DMARC aggregate report to Elasticsearch
Args:
aggregate_report (OrderedDict): A parsed forensic report
aggregate_report (dict): A parsed aggregate report
index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily indexes
@@ -395,21 +428,17 @@ def save_aggregate_report_to_elasticsearch(
domain = aggregate_report["policy_published"]["domain"]
begin_date = human_timestamp_to_datetime(metadata["begin_date"], to_utc=True)
end_date = human_timestamp_to_datetime(metadata["end_date"], to_utc=True)
begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
if monthly_indexes:
index_date = begin_date.strftime("%Y-%m")
else:
index_date = begin_date.strftime("%Y-%m-%d")
aggregate_report["begin_date"] = begin_date
aggregate_report["end_date"] = end_date
date_range = [aggregate_report["begin_date"], aggregate_report["end_date"]]
org_name_query = Q(dict(match_phrase=dict(org_name=org_name)))
report_id_query = Q(dict(match_phrase=dict(report_id=report_id)))
domain_query = Q(dict(match_phrase={"published_policy.domain": domain}))
begin_date_query = Q(dict(match=dict(date_begin=begin_date)))
end_date_query = Q(dict(match=dict(date_end=end_date)))
org_name_query = Q(dict(match_phrase=dict(org_name=org_name))) # pyright: ignore[reportArgumentType]
report_id_query = Q(dict(match_phrase=dict(report_id=report_id))) # pyright: ignore[reportArgumentType]
domain_query = Q(dict(match_phrase={"published_policy.domain": domain})) # pyright: ignore[reportArgumentType]
begin_date_query = Q(dict(match=dict(date_begin=begin_date))) # pyright: ignore[reportArgumentType]
end_date_query = Q(dict(match=dict(date_end=end_date))) # pyright: ignore[reportArgumentType]
if index_suffix is not None:
search_index = "dmarc_aggregate_{0}*".format(index_suffix)
@@ -421,6 +450,8 @@ def save_aggregate_report_to_elasticsearch(
query = org_name_query & report_id_query & domain_query
query = query & begin_date_query & end_date_query
search.query = query
begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
try:
existing = search.execute()
@@ -447,9 +478,25 @@ def save_aggregate_report_to_elasticsearch(
sp=aggregate_report["policy_published"]["sp"],
pct=aggregate_report["policy_published"]["pct"],
fo=aggregate_report["policy_published"]["fo"],
np=aggregate_report["policy_published"].get("np"),
testing=aggregate_report["policy_published"].get("testing"),
discovery_method=aggregate_report["policy_published"].get(
"discovery_method"
),
)
for record in aggregate_report["records"]:
begin_date = human_timestamp_to_datetime(record["interval_begin"], to_utc=True)
end_date = human_timestamp_to_datetime(record["interval_end"], to_utc=True)
normalized_timespan = record["normalized_timespan"]
if monthly_indexes:
index_date = begin_date.strftime("%Y-%m")
else:
index_date = begin_date.strftime("%Y-%m-%d")
aggregate_report["begin_date"] = begin_date
aggregate_report["end_date"] = end_date
date_range = [aggregate_report["begin_date"], aggregate_report["end_date"]]
agg_doc = _AggregateReportDoc(
xml_schema=aggregate_report["xml_schema"],
org_name=metadata["org_name"],
@@ -457,8 +504,9 @@ def save_aggregate_report_to_elasticsearch(
org_extra_contact_info=metadata["org_extra_contact_info"],
report_id=metadata["report_id"],
date_range=date_range,
date_begin=aggregate_report["begin_date"],
date_end=aggregate_report["end_date"],
date_begin=begin_date,
date_end=end_date,
normalized_timespan=normalized_timespan,
errors=metadata["errors"],
published_policy=published_policy,
source_ip_address=record["source"]["ip_address"],
@@ -476,6 +524,12 @@ def save_aggregate_report_to_elasticsearch(
header_from=record["identifiers"]["header_from"],
envelope_from=record["identifiers"]["envelope_from"],
envelope_to=record["identifiers"]["envelope_to"],
np=aggregate_report["policy_published"].get("np"),
testing=aggregate_report["policy_published"].get("testing"),
discovery_method=aggregate_report["policy_published"].get(
"discovery_method"
),
generator=metadata.get("generator"),
)
for override in record["policy_evaluated"]["policy_override_reasons"]:
@@ -488,6 +542,7 @@ def save_aggregate_report_to_elasticsearch(
domain=dkim_result["domain"],
selector=dkim_result["selector"],
result=dkim_result["result"],
human_result=dkim_result.get("human_result"),
)
for spf_result in record["auth_results"]["spf"]:
@@ -495,6 +550,7 @@ def save_aggregate_report_to_elasticsearch(
domain=spf_result["domain"],
scope=spf_result["scope"],
result=spf_result["result"],
human_result=spf_result.get("human_result"),
)
index = "dmarc_aggregate"
@@ -508,7 +564,7 @@ def save_aggregate_report_to_elasticsearch(
number_of_shards=number_of_shards, number_of_replicas=number_of_replicas
)
create_indexes([index], index_settings)
agg_doc.meta.index = index
agg_doc.meta.index = index # pyright: ignore[reportOptionalMemberAccess, reportAttributeAccessIssue]
try:
agg_doc.save()
@@ -516,19 +572,19 @@ def save_aggregate_report_to_elasticsearch(
raise ElasticsearchError("Elasticsearch error: {0}".format(e.__str__()))
def save_forensic_report_to_elasticsearch(
forensic_report,
index_suffix=None,
index_prefix=None,
monthly_indexes=False,
number_of_shards=1,
number_of_replicas=0,
def save_failure_report_to_elasticsearch(
failure_report: dict[str, Any],
index_suffix: Optional[str] = None,
index_prefix: Optional[str] = None,
monthly_indexes: Optional[bool] = False,
number_of_shards: int = 1,
number_of_replicas: int = 0,
):
"""
Saves a parsed DMARC forensic report to Elasticsearch
Saves a parsed DMARC failure report to Elasticsearch
Args:
forensic_report (OrderedDict): A parsed forensic report
failure_report (dict): A parsed failure report
index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily
@@ -541,28 +597,33 @@ def save_forensic_report_to_elasticsearch(
AlreadySaved
"""
logger.info("Saving forensic report to Elasticsearch")
forensic_report = forensic_report.copy()
logger.info("Saving failure report to Elasticsearch")
failure_report = failure_report.copy()
sample_date = None
if forensic_report["parsed_sample"]["date"] is not None:
sample_date = forensic_report["parsed_sample"]["date"]
if failure_report["parsed_sample"]["date"] is not None:
sample_date = failure_report["parsed_sample"]["date"]
sample_date = human_timestamp_to_datetime(sample_date)
original_headers = forensic_report["parsed_sample"]["headers"]
headers = OrderedDict()
original_headers = failure_report["parsed_sample"]["headers"]
headers: dict[str, Any] = {}
for original_header in original_headers:
headers[original_header.lower()] = original_headers[original_header]
arrival_date = human_timestamp_to_datetime(forensic_report["arrival_date_utc"])
arrival_date = human_timestamp_to_datetime(failure_report["arrival_date_utc"])
arrival_date_epoch_milliseconds = int(arrival_date.timestamp() * 1000)
if index_suffix is not None:
search_index = "dmarc_forensic_{0}*".format(index_suffix)
search_index = "dmarc_failure_{0}*,dmarc_forensic_{0}*".format(
index_suffix
)
else:
search_index = "dmarc_forensic*"
search_index = "dmarc_failure*,dmarc_forensic*"
if index_prefix is not None:
search_index = "{0}{1}".format(index_prefix, search_index)
search_index = ",".join(
"{0}{1}".format(index_prefix, part)
for part in search_index.split(",")
)
search = Search(index=search_index)
q = Q(dict(match=dict(arrival_date=arrival_date_epoch_milliseconds)))
q = Q(dict(match=dict(arrival_date=arrival_date_epoch_milliseconds))) # pyright: ignore[reportArgumentType]
from_ = None
to_ = None
@@ -577,7 +638,7 @@ def save_forensic_report_to_elasticsearch(
from_ = dict()
from_["sample.headers.from"] = headers["from"]
from_query = Q(dict(match_phrase=from_))
from_query = Q(dict(match_phrase=from_)) # pyright: ignore[reportArgumentType]
q = q & from_query
if "to" in headers:
# We convert the TO header from a string list to a flat string.
@@ -589,76 +650,76 @@ def save_forensic_report_to_elasticsearch(
to_ = dict()
to_["sample.headers.to"] = headers["to"]
to_query = Q(dict(match_phrase=to_))
to_query = Q(dict(match_phrase=to_)) # pyright: ignore[reportArgumentType]
q = q & to_query
if "subject" in headers:
subject = headers["subject"]
subject_query = {"match_phrase": {"sample.headers.subject": subject}}
q = q & Q(subject_query)
q = q & Q(subject_query) # pyright: ignore[reportArgumentType]
search.query = q
existing = search.execute()
if len(existing) > 0:
raise AlreadySaved(
"A forensic sample to {0} from {1} "
"A failure sample to {0} from {1} "
"with a subject of {2} and arrival date of {3} "
"already exists in "
"Elasticsearch".format(
to_, from_, subject, forensic_report["arrival_date_utc"]
to_, from_, subject, failure_report["arrival_date_utc"]
)
)
parsed_sample = forensic_report["parsed_sample"]
sample = _ForensicSampleDoc(
raw=forensic_report["sample"],
parsed_sample = failure_report["parsed_sample"]
sample = _FailureSampleDoc(
raw=failure_report["sample"],
headers=headers,
headers_only=forensic_report["sample_headers_only"],
headers_only=failure_report["sample_headers_only"],
date=sample_date,
subject=forensic_report["parsed_sample"]["subject"],
subject=failure_report["parsed_sample"]["subject"],
filename_safe_subject=parsed_sample["filename_safe_subject"],
body=forensic_report["parsed_sample"]["body"],
body=failure_report["parsed_sample"]["body"],
)
for address in forensic_report["parsed_sample"]["to"]:
for address in failure_report["parsed_sample"]["to"]:
sample.add_to(display_name=address["display_name"], address=address["address"])
for address in forensic_report["parsed_sample"]["reply_to"]:
for address in failure_report["parsed_sample"]["reply_to"]:
sample.add_reply_to(
display_name=address["display_name"], address=address["address"]
)
for address in forensic_report["parsed_sample"]["cc"]:
for address in failure_report["parsed_sample"]["cc"]:
sample.add_cc(display_name=address["display_name"], address=address["address"])
for address in forensic_report["parsed_sample"]["bcc"]:
for address in failure_report["parsed_sample"]["bcc"]:
sample.add_bcc(display_name=address["display_name"], address=address["address"])
for attachment in forensic_report["parsed_sample"]["attachments"]:
for attachment in failure_report["parsed_sample"]["attachments"]:
sample.add_attachment(
filename=attachment["filename"],
content_type=attachment["mail_content_type"],
sha256=attachment["sha256"],
)
try:
forensic_doc = _ForensicReportDoc(
feedback_type=forensic_report["feedback_type"],
user_agent=forensic_report["user_agent"],
version=forensic_report["version"],
original_mail_from=forensic_report["original_mail_from"],
failure_doc = _FailureReportDoc(
feedback_type=failure_report["feedback_type"],
user_agent=failure_report["user_agent"],
version=failure_report["version"],
original_mail_from=failure_report["original_mail_from"],
arrival_date=arrival_date_epoch_milliseconds,
domain=forensic_report["reported_domain"],
original_envelope_id=forensic_report["original_envelope_id"],
authentication_results=forensic_report["authentication_results"],
delivery_results=forensic_report["delivery_result"],
source_ip_address=forensic_report["source"]["ip_address"],
source_country=forensic_report["source"]["country"],
source_reverse_dns=forensic_report["source"]["reverse_dns"],
source_base_domain=forensic_report["source"]["base_domain"],
authentication_mechanisms=forensic_report["authentication_mechanisms"],
auth_failure=forensic_report["auth_failure"],
dkim_domain=forensic_report["dkim_domain"],
original_rcpt_to=forensic_report["original_rcpt_to"],
domain=failure_report["reported_domain"],
original_envelope_id=failure_report["original_envelope_id"],
authentication_results=failure_report["authentication_results"],
delivery_results=failure_report["delivery_result"],
source_ip_address=failure_report["source"]["ip_address"],
source_country=failure_report["source"]["country"],
source_reverse_dns=failure_report["source"]["reverse_dns"],
source_base_domain=failure_report["source"]["base_domain"],
authentication_mechanisms=failure_report["authentication_mechanisms"],
auth_failure=failure_report["auth_failure"],
dkim_domain=failure_report["dkim_domain"],
original_rcpt_to=failure_report["original_rcpt_to"],
sample=sample,
)
index = "dmarc_forensic"
index = "dmarc_failure"
if index_suffix:
index = "{0}_{1}".format(index, index_suffix)
if index_prefix:
@@ -672,30 +733,30 @@ def save_forensic_report_to_elasticsearch(
number_of_shards=number_of_shards, number_of_replicas=number_of_replicas
)
create_indexes([index], index_settings)
forensic_doc.meta.index = index
failure_doc.meta.index = index # pyright: ignore[reportAttributeAccessIssue, reportOptionalMemberAccess]
try:
forensic_doc.save()
failure_doc.save()
except Exception as e:
raise ElasticsearchError("Elasticsearch error: {0}".format(e.__str__()))
except KeyError as e:
raise InvalidForensicReport(
"Forensic report missing required field: {0}".format(e.__str__())
raise InvalidFailureReport(
"Failure report missing required field: {0}".format(e.__str__())
)
def save_smtp_tls_report_to_elasticsearch(
report,
index_suffix=None,
index_prefix=None,
monthly_indexes=False,
number_of_shards=1,
number_of_replicas=0,
report: dict[str, Any],
index_suffix: Optional[str] = None,
index_prefix: Optional[str] = None,
monthly_indexes: bool = False,
number_of_shards: int = 1,
number_of_replicas: int = 0,
):
"""
Saves a parsed SMTP TLS report to Elasticsearch
Args:
report (OrderedDict): A parsed SMTP TLS report
report (dict): A parsed SMTP TLS report
index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily indexes
@@ -719,10 +780,10 @@ def save_smtp_tls_report_to_elasticsearch(
report["begin_date"] = begin_date
report["end_date"] = end_date
org_name_query = Q(dict(match_phrase=dict(org_name=org_name)))
report_id_query = Q(dict(match_phrase=dict(report_id=report_id)))
begin_date_query = Q(dict(match=dict(date_begin=begin_date)))
end_date_query = Q(dict(match=dict(date_end=end_date)))
org_name_query = Q(dict(match_phrase=dict(org_name=org_name))) # pyright: ignore[reportArgumentType]
report_id_query = Q(dict(match_phrase=dict(report_id=report_id))) # pyright: ignore[reportArgumentType]
begin_date_query = Q(dict(match=dict(date_begin=begin_date))) # pyright: ignore[reportArgumentType]
end_date_query = Q(dict(match=dict(date_end=end_date))) # pyright: ignore[reportArgumentType]
if index_suffix is not None:
search_index = "smtp_tls_{0}*".format(index_suffix)
@@ -781,7 +842,7 @@ def save_smtp_tls_report_to_elasticsearch(
policy_doc = _SMTPTLSPolicyDoc(
policy_domain=policy["policy_domain"],
policy_type=policy["policy_type"],
succesful_session_count=policy["successful_session_count"],
successful_session_count=policy["successful_session_count"],
failed_session_count=policy["failed_session_count"],
policy_string=policy_strings,
mx_host_patterns=mx_host_patterns,
@@ -823,12 +884,18 @@ def save_smtp_tls_report_to_elasticsearch(
additional_information_uri=additional_information_uri,
failure_reason_code=failure_reason_code,
)
smtp_tls_doc.policies.append(policy_doc)
smtp_tls_doc.policies.append(policy_doc) # pyright: ignore[reportCallIssue]
create_indexes([index], index_settings)
smtp_tls_doc.meta.index = index
smtp_tls_doc.meta.index = index # pyright: ignore[reportOptionalMemberAccess, reportAttributeAccessIssue]
try:
smtp_tls_doc.save()
except Exception as e:
raise ElasticsearchError("Elasticsearch error: {0}".format(e.__str__()))
# Backward-compatible aliases
_ForensicSampleDoc = _FailureSampleDoc
_ForensicReportDoc = _FailureReportDoc
save_forensic_report_to_elasticsearch = save_failure_report_to_elasticsearch
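
A hedged usage sketch of the renamed Elasticsearch API shown above. The parsedmarc.elastic module path and the report variable are assumptions; the keyword-only connection options, the dual dmarc_failure*/dmarc_forensic* duplicate search, and the backward-compatible alias are taken from this diff.

```python
# A minimal sketch, assuming the module is importable as parsedmarc.elastic
# and that failure_report comes from a prior parsedmarc run.
from parsedmarc import elastic

elastic.set_hosts(
    "https://elasticsearch.example.com:9200",
    use_ssl=True,
    username="parsedmarc",
    password="example-password",
    timeout=60.0,
)

failure_report: dict = {}  # placeholder for a parsed failure report dict
if failure_report:
    # The new name writes to dmarc_failure_* indexes; duplicate detection also
    # searches legacy dmarc_forensic_* indexes, as shown in the diff above.
    elastic.save_failure_report_to_elasticsearch(
        failure_report, index_suffix="example", monthly_indexes=True
    )
    # Old call sites keep working through the module-level alias.
    elastic.save_forensic_report_to_elasticsearch(failure_report)
```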


@@ -1,17 +1,19 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
import logging
import logging.handlers
import json
import threading
from typing import Any
from pygelf import GelfTcpHandler, GelfTlsHandler, GelfUdpHandler
from parsedmarc import (
parsed_aggregate_reports_to_csv_rows,
parsed_forensic_reports_to_csv_rows,
parsed_failure_reports_to_csv_rows,
parsed_smtp_tls_reports_to_csv_rows,
)
from pygelf import GelfTcpHandler, GelfUdpHandler, GelfTlsHandler
log_context_data = threading.local()
@@ -48,7 +50,7 @@ class GelfClient(object):
)
self.logger.addHandler(self.handler)
def save_aggregate_report_to_gelf(self, aggregate_reports):
def save_aggregate_report_to_gelf(self, aggregate_reports: list[dict[str, Any]]):
rows = parsed_aggregate_reports_to_csv_rows(aggregate_reports)
for row in rows:
log_context_data.parsedmarc = row
@@ -56,12 +58,18 @@ class GelfClient(object):
log_context_data.parsedmarc = None
def save_forensic_report_to_gelf(self, forensic_reports):
rows = parsed_forensic_reports_to_csv_rows(forensic_reports)
def save_failure_report_to_gelf(self, failure_reports: list[dict[str, Any]]):
rows = parsed_failure_reports_to_csv_rows(failure_reports)
for row in rows:
self.logger.info(json.dumps(row))
log_context_data.parsedmarc = row
self.logger.info("parsedmarc failure report")
def save_smtp_tls_report_to_gelf(self, smtp_tls_reports):
def save_smtp_tls_report_to_gelf(self, smtp_tls_reports: dict[str, Any]):
rows = parsed_smtp_tls_reports_to_csv_rows(smtp_tls_reports)
for row in rows:
self.logger.info(json.dumps(row))
log_context_data.parsedmarc = row
self.logger.info("parsedmarc smtptls report")
# Backward-compatible aliases
GelfClient.save_forensic_report_to_gelf = GelfClient.save_failure_report_to_gelf
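
A usage sketch for the renamed GELF method; the GelfClient constructor arguments are assumptions (not shown in this diff) and the report list is a placeholder.

```python
# A minimal sketch; the constructor arguments and a reachable GELF input are
# assumed, not taken from this diff.
from parsedmarc.gelf import GelfClient

gelf_client = GelfClient(host="graylog.example.com", port=12201, mode="tcp")

failure_reports: list[dict] = []  # placeholder for parsed failure report dicts
gelf_client.save_failure_report_to_gelf(failure_reports)

# The backward-compatible alias keeps old integrations working.
gelf_client.save_forensic_report_to_gelf(failure_reports)
```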


@@ -1,15 +1,17 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
import json
from ssl import create_default_context
from ssl import SSLContext, create_default_context
from typing import Any, Optional, Union
from kafka import KafkaProducer
from kafka.errors import NoBrokersAvailable, UnknownTopicOrPartitionError
from collections import OrderedDict
from parsedmarc.utils import human_timestamp_to_datetime
from parsedmarc import __version__
from parsedmarc.log import logger
from parsedmarc.utils import human_timestamp_to_datetime
class KafkaError(RuntimeError):
@@ -18,7 +20,13 @@ class KafkaError(RuntimeError):
class KafkaClient(object):
def __init__(
self, kafka_hosts, ssl=False, username=None, password=None, ssl_context=None
self,
kafka_hosts: list[str],
*,
ssl: Optional[bool] = False,
username: Optional[str] = None,
password: Optional[str] = None,
ssl_context: Optional[SSLContext] = None,
):
"""
Initializes the Kafka client
@@ -28,7 +36,7 @@ class KafkaClient(object):
ssl (bool): Use a SSL/TLS connection
username (str): An optional username
password (str): An optional password
ssl_context: SSL context options
ssl_context (SSLContext): SSL context options
Notes:
``use_ssl=True`` is implied when a username or password are
@@ -38,7 +46,7 @@ class KafkaClient(object):
``$ConnectionString``, and the password is the
Azure Event Hub connection string.
"""
config = dict(
config: dict[str, Any] = dict(
value_serializer=lambda v: json.dumps(v).encode("utf-8"),
bootstrap_servers=kafka_hosts,
client_id="parsedmarc-{0}".format(__version__),
@@ -55,7 +63,7 @@ class KafkaClient(object):
raise KafkaError("No Kafka brokers available")
@staticmethod
def strip_metadata(report):
def strip_metadata(report: dict[str, Any]):
"""
Duplicates org_name, org_email and report_id into JSON root
and removes report_metadata key to bring it more inline
@@ -69,7 +77,7 @@ class KafkaClient(object):
return report
@staticmethod
def generate_daterange(report):
def generate_date_range(report: dict[str, Any]):
"""
Creates a date_range timestamp with format YYYY-MM-DD-T-HH:MM:SS
based on begin and end dates for easier parsing in Kibana.
@@ -86,7 +94,11 @@ class KafkaClient(object):
logger.debug("date_range is {}".format(date_range))
return date_range
def save_aggregate_reports_to_kafka(self, aggregate_reports, aggregate_topic):
def save_aggregate_reports_to_kafka(
self,
aggregate_reports: Union[dict[str, Any], list[dict[str, Any]]],
aggregate_topic: str,
):
"""
Saves aggregate DMARC reports to Kafka
@@ -96,16 +108,14 @@ class KafkaClient(object):
aggregate_topic (str): The name of the Kafka topic
"""
if isinstance(aggregate_reports, dict) or isinstance(
aggregate_reports, OrderedDict
):
if isinstance(aggregate_reports, dict):
aggregate_reports = [aggregate_reports]
if len(aggregate_reports) < 1:
return
for report in aggregate_reports:
report["date_range"] = self.generate_daterange(report)
report["date_range"] = self.generate_date_range(report)
report = self.strip_metadata(report)
for slice in report["records"]:
@@ -129,27 +139,31 @@ class KafkaClient(object):
except Exception as e:
raise KafkaError("Kafka error: {0}".format(e.__str__()))
def save_forensic_reports_to_kafka(self, forensic_reports, forensic_topic):
def save_failure_reports_to_kafka(
self,
failure_reports: Union[dict[str, Any], list[dict[str, Any]]],
failure_topic: str,
):
"""
Saves forensic DMARC reports to Kafka, sends individual
Saves failure DMARC reports to Kafka, sends individual
records (slices) since Kafka requires messages to be <= 1MB
by default.
Args:
forensic_reports (list): A list of forensic report dicts
failure_reports (list): A list of failure report dicts
to save to Kafka
forensic_topic (str): The name of the Kafka topic
failure_topic (str): The name of the Kafka topic
"""
if isinstance(forensic_reports, dict):
forensic_reports = [forensic_reports]
if isinstance(failure_reports, dict):
failure_reports = [failure_reports]
if len(forensic_reports) < 1:
if len(failure_reports) < 1:
return
try:
logger.debug("Saving forensic reports to Kafka")
self.producer.send(forensic_topic, forensic_reports)
logger.debug("Saving failure reports to Kafka")
self.producer.send(failure_topic, failure_reports)
except UnknownTopicOrPartitionError:
raise KafkaError("Kafka error: Unknown topic or partition on broker")
except Exception as e:
@@ -159,14 +173,18 @@ class KafkaClient(object):
except Exception as e:
raise KafkaError("Kafka error: {0}".format(e.__str__()))
def save_smtp_tls_reports_to_kafka(self, smtp_tls_reports, smtp_tls_topic):
def save_smtp_tls_reports_to_kafka(
self,
smtp_tls_reports: Union[list[dict[str, Any]], dict[str, Any]],
smtp_tls_topic: str,
):
"""
Saves SMTP TLS reports to Kafka, sends individual
records (slices) since Kafka requires messages to be <= 1MB
by default.
Args:
smtp_tls_reports (list): A list of forensic report dicts
smtp_tls_reports (list): A list of SMTP TLS report dicts
to save to Kafka
smtp_tls_topic (str): The name of the Kafka topic
@@ -178,7 +196,7 @@ class KafkaClient(object):
return
try:
logger.debug("Saving forensic reports to Kafka")
logger.debug("Saving SMTP TLS reports to Kafka")
self.producer.send(smtp_tls_topic, smtp_tls_reports)
except UnknownTopicOrPartitionError:
raise KafkaError("Kafka error: Unknown topic or partition on broker")
@@ -188,3 +206,7 @@ class KafkaClient(object):
self.producer.flush()
except Exception as e:
raise KafkaError("Kafka error: {0}".format(e.__str__()))
# Backward-compatible aliases
KafkaClient.save_forensic_reports_to_kafka = KafkaClient.save_failure_reports_to_kafka
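
A usage sketch of the renamed Kafka method, following the keyword-only constructor shown above; the broker addresses, credentials, and topic name are placeholders, and reachable brokers are assumed.

```python
# A minimal sketch, assuming the module is importable as parsedmarc.kafkaclient.
from parsedmarc.kafkaclient import KafkaClient

client = KafkaClient(
    ["kafka-1.example.com:9092", "kafka-2.example.com:9092"],
    ssl=True,
    username="parsedmarc",
    password="example-password",
)

failure_reports: list[dict] = []  # placeholder for parsed failure report dicts
client.save_failure_reports_to_kafka(failure_reports, "dmarc_failure")

# Old call sites keep working through the class-level alias.
client.save_forensic_reports_to_kafka(failure_reports, "dmarc_failure")
```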


@@ -1,9 +1,15 @@
# -*- coding: utf-8 -*-
from parsedmarc.log import logger
from __future__ import annotations
from typing import Any
from azure.core.exceptions import HttpResponseError
from azure.identity import ClientSecretCredential
from azure.monitor.ingestion import LogsIngestionClient
from parsedmarc.log import logger
class LogAnalyticsException(Exception):
"""Raised when an Elasticsearch error occurs"""
@@ -32,9 +38,9 @@ class LogAnalyticsConfig:
The Stream name where
the Aggregate DMARC reports
need to be pushed.
dcr_forensic_stream (str):
dcr_failure_stream (str):
The Stream name where
the Forensic DMARC reports
the Failure DMARC reports
need to be pushed.
dcr_smtp_tls_stream (str):
The Stream name where
@@ -50,7 +56,7 @@ class LogAnalyticsConfig:
dce: str,
dcr_immutable_id: str,
dcr_aggregate_stream: str,
dcr_forensic_stream: str,
dcr_failure_stream: str,
dcr_smtp_tls_stream: str,
):
self.client_id = client_id
@@ -59,7 +65,7 @@ class LogAnalyticsConfig:
self.dce = dce
self.dcr_immutable_id = dcr_immutable_id
self.dcr_aggregate_stream = dcr_aggregate_stream
self.dcr_forensic_stream = dcr_forensic_stream
self.dcr_failure_stream = dcr_failure_stream
self.dcr_smtp_tls_stream = dcr_smtp_tls_stream
@@ -78,7 +84,7 @@ class LogAnalyticsClient(object):
dce: str,
dcr_immutable_id: str,
dcr_aggregate_stream: str,
dcr_forensic_stream: str,
dcr_failure_stream: str,
dcr_smtp_tls_stream: str,
):
self.conf = LogAnalyticsConfig(
@@ -88,7 +94,7 @@ class LogAnalyticsClient(object):
dce=dce,
dcr_immutable_id=dcr_immutable_id,
dcr_aggregate_stream=dcr_aggregate_stream,
dcr_forensic_stream=dcr_forensic_stream,
dcr_failure_stream=dcr_failure_stream,
dcr_smtp_tls_stream=dcr_smtp_tls_stream,
)
if (
@@ -102,7 +108,12 @@ class LogAnalyticsClient(object):
"Invalid configuration. " + "One or more required settings are missing."
)
def publish_json(self, results, logs_client: LogsIngestionClient, dcr_stream: str):
def publish_json(
self,
results,
logs_client: LogsIngestionClient,
dcr_stream: str,
):
"""
Background function to publish given
DMARC report to specific Data Collection Rule.
@@ -121,7 +132,11 @@ class LogAnalyticsClient(object):
raise LogAnalyticsException("Upload failed: {error}".format(error=e))
def publish_results(
self, results, save_aggregate: bool, save_forensic: bool, save_smtp_tls: bool
self,
results: dict[str, Any],
save_aggregate: bool,
save_failure: bool,
save_smtp_tls: bool,
):
"""
Function to publish DMARC and/or SMTP TLS reports to Log Analytics
@@ -131,13 +146,13 @@ class LogAnalyticsClient(object):
Args:
results (list):
The DMARC reports (Aggregate & Forensic)
The DMARC reports (Aggregate & Failure)
save_aggregate (bool):
Whether Aggregate reports can be saved into Log Analytics
save_forensic (bool):
Whether Forensic reports can be saved into Log Analytics
save_failure (bool):
Whether Failure reports can be saved into Log Analytics
save_smtp_tls (bool):
Whether Forensic reports can be saved into Log Analytics
Whether SMTP TLS reports can be saved into Log Analytics
"""
conf = self.conf
credential = ClientSecretCredential(
@@ -158,16 +173,16 @@ class LogAnalyticsClient(object):
)
logger.info("Successfully pushed aggregate reports.")
if (
results["forensic_reports"]
and conf.dcr_forensic_stream
and len(results["forensic_reports"]) > 0
and save_forensic
results["failure_reports"]
and conf.dcr_failure_stream
and len(results["failure_reports"]) > 0
and save_failure
):
logger.info("Publishing forensic reports.")
logger.info("Publishing failure reports.")
self.publish_json(
results["forensic_reports"], logs_client, conf.dcr_forensic_stream
results["failure_reports"], logs_client, conf.dcr_failure_stream
)
logger.info("Successfully pushed forensic reports.")
logger.info("Successfully pushed failure reports.")
if (
results["smtp_tls_reports"]
and conf.dcr_smtp_tls_stream
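
A configuration sketch for the renamed dcr_failure_stream setting. All Azure identifiers and stream names here are placeholders; client_id, client_secret, and tenant_id are assumed to be accepted alongside the parameters visible in this diff, and the results keys follow the field names used above.

```python
# A minimal sketch with placeholder Azure identifiers; nothing is uploaded
# while the report lists are empty.
from parsedmarc.loganalytics import LogAnalyticsClient

la_client = LogAnalyticsClient(
    client_id="00000000-0000-0000-0000-000000000000",
    client_secret="example-secret",
    tenant_id="00000000-0000-0000-0000-000000000000",
    dce="https://example-dce.eastus-1.ingest.monitor.azure.com",
    dcr_immutable_id="dcr-00000000000000000000000000000000",
    dcr_aggregate_stream="Custom-DMARC-Aggregate",
    dcr_failure_stream="Custom-DMARC-Failure",
    dcr_smtp_tls_stream="Custom-SMTP-TLS",
)

results = {"aggregate_reports": [], "failure_reports": [], "smtp_tls_reports": []}
la_client.publish_results(
    results, save_aggregate=True, save_failure=True, save_smtp_tls=True
)
```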


@@ -1,3 +1,7 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
from base64 import urlsafe_b64decode
from functools import lru_cache
from pathlib import Path
@@ -112,14 +116,14 @@ class GmailConnection(MailboxConnection):
else:
return [id for id in self._fetch_all_message_ids(reports_label_id)]
def fetch_message(self, message_id):
def fetch_message(self, message_id) -> str:
msg = (
self.service.users()
.messages()
.get(userId="me", id=message_id, format="raw")
.execute()
)
return urlsafe_b64decode(msg["raw"])
return urlsafe_b64decode(msg["raw"]).decode(errors="replace")
def delete_message(self, message_id: str):
self.service.users().messages().delete(userId="me", id=message_id)
@@ -152,3 +156,4 @@ class GmailConnection(MailboxConnection):
for label in labels:
if label_name == label["id"] or label_name == label["name"]:
return label["id"]
return ""


@@ -1,8 +1,12 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
from enum import Enum
from functools import lru_cache
from pathlib import Path
from time import sleep
from typing import List, Optional
from typing import Any, List, Optional, Union
from azure.identity import (
UsernamePasswordCredential,
@@ -24,7 +28,7 @@ class AuthMethod(Enum):
def _get_cache_args(token_path: Path, allow_unencrypted_storage):
cache_args = {
cache_args: dict[str, Any] = {
"cache_persistence_options": TokenCachePersistenceOptions(
name="parsedmarc", allow_unencrypted_storage=allow_unencrypted_storage
)
@@ -147,9 +151,9 @@ class MSGraphConnection(MailboxConnection):
else:
logger.warning(f"Unknown response {resp.status_code} {resp.json()}")
def fetch_messages(self, folder_name: str, **kwargs) -> List[str]:
def fetch_messages(self, reports_folder: str, **kwargs) -> List[str]:
"""Returns a list of message UIDs in the specified folder"""
folder_id = self._find_folder_id_from_folder_path(folder_name)
folder_id = self._find_folder_id_from_folder_path(reports_folder)
url = f"/users/{self.mailbox_name}/mailFolders/{folder_id}/messages"
since = kwargs.get("since")
if not since:
@@ -162,7 +166,7 @@ class MSGraphConnection(MailboxConnection):
def _get_all_messages(self, url, batch_size, since):
messages: list
params = {"$select": "id"}
params: dict[str, Union[str, int]] = {"$select": "id"}
if since:
params["$filter"] = f"receivedDateTime ge {since}"
if batch_size and batch_size > 0:


@@ -1,3 +1,9 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
from typing import cast
from time import sleep
from imapclient.exceptions import IMAPClientError
@@ -11,14 +17,14 @@ from parsedmarc.mail.mailbox_connection import MailboxConnection
class IMAPConnection(MailboxConnection):
def __init__(
self,
host=None,
user=None,
password=None,
port=None,
ssl=True,
verify=True,
timeout=30,
max_retries=4,
host: str,
user: str,
password: str,
port: int = 993,
ssl: bool = True,
verify: bool = True,
timeout: int = 30,
max_retries: int = 4,
):
self._username = user
self._password = password
@@ -40,18 +46,18 @@ class IMAPConnection(MailboxConnection):
def fetch_messages(self, reports_folder: str, **kwargs):
self._client.select_folder(reports_folder)
since = kwargs.get("since")
if since:
return self._client.search(["SINCE", since])
if since is not None:
return self._client.search(f"SINCE {since}")
else:
return self._client.search()
def fetch_message(self, message_id):
return self._client.fetch_message(message_id, parse=False)
def fetch_message(self, message_id: int):
return cast(str, self._client.fetch_message(message_id, parse=False))
def delete_message(self, message_id: str):
def delete_message(self, message_id: int):
self._client.delete_messages([message_id])
def move_message(self, message_id: str, folder_name: str):
def move_message(self, message_id: int, folder_name: str):
self._client.move_messages([message_id], folder_name)
def keepalive(self):
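
A usage sketch of the stricter IMAPConnection signature above. The import path, server, and credentials are placeholders (a reachable IMAP server is assumed), and the SINCE value uses IMAP's DD-Mon-YYYY date format now that it is passed as a single search string.

```python
# A minimal sketch; the import path and connection details are assumptions.
from parsedmarc.mail import IMAPConnection

connection = IMAPConnection(
    host="imap.example.com",
    user="dmarc@example.com",
    password="example-password",
    port=993,
    ssl=True,
    verify=True,
    timeout=30,
)

# fetch_messages now builds a single "SINCE <date>" search string
# instead of the old ["SINCE", since] criteria list.
uids = connection.fetch_messages("DMARC", since="20-Feb-2026")
for uid in uids:
    raw_message = connection.fetch_message(uid)
```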


@@ -1,5 +1,8 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
from abc import ABC
from typing import List
class MailboxConnection(ABC):
@@ -10,16 +13,16 @@ class MailboxConnection(ABC):
def create_folder(self, folder_name: str):
raise NotImplementedError
def fetch_messages(self, reports_folder: str, **kwargs) -> List[str]:
def fetch_messages(self, reports_folder: str, **kwargs):
raise NotImplementedError
def fetch_message(self, message_id) -> str:
raise NotImplementedError
def delete_message(self, message_id: str):
def delete_message(self, message_id):
raise NotImplementedError
def move_message(self, message_id: str, folder_name: str):
def move_message(self, message_id, folder_name: str):
raise NotImplementedError
def keepalive(self):
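
The abstract base class above relaxes the message_id type hints so that IMAP (integer UIDs) and Maildir (string keys) implementations both fit. A hedged sketch of a custom connection implementing the interface, using an in-memory store purely for illustration; it is not part of parsedmarc.

```python
# A toy in-memory connection used only to illustrate the interface.
from parsedmarc.mail.mailbox_connection import MailboxConnection


class InMemoryConnection(MailboxConnection):
    def __init__(self):
        self._messages: dict[str, str] = {}
        self._folders: dict[str, list[str]] = {"Inbox": []}

    def create_folder(self, folder_name: str):
        self._folders.setdefault(folder_name, [])

    def fetch_messages(self, reports_folder: str, **kwargs):
        return list(self._folders.get(reports_folder, []))

    def fetch_message(self, message_id) -> str:
        return self._messages[message_id]

    def delete_message(self, message_id):
        self._messages.pop(message_id, None)
        for ids in self._folders.values():
            if message_id in ids:
                ids.remove(message_id)

    def move_message(self, message_id, folder_name: str):
        for ids in self._folders.values():
            if message_id in ids:
                ids.remove(message_id)
        self._folders.setdefault(folder_name, []).append(message_id)

    def keepalive(self):
        pass
```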


@@ -1,16 +1,21 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
import mailbox
import os
from time import sleep
from typing import Dict
from parsedmarc.log import logger
from parsedmarc.mail.mailbox_connection import MailboxConnection
import mailbox
import os
class MaildirConnection(MailboxConnection):
def __init__(
self,
maildir_path=None,
maildir_create=False,
maildir_path: str,
maildir_create: bool = False,
):
self._maildir_path = maildir_path
self._maildir_create = maildir_create
@@ -27,27 +32,31 @@ class MaildirConnection(MailboxConnection):
)
raise Exception(ex)
self._client = mailbox.Maildir(maildir_path, create=maildir_create)
self._subfolder_client = {}
self._subfolder_client: Dict[str, mailbox.Maildir] = {}
def create_folder(self, folder_name: str):
self._subfolder_client[folder_name] = self._client.add_folder(folder_name)
self._client.add_folder(folder_name)
def fetch_messages(self, reports_folder: str, **kwargs):
return self._client.keys()
def fetch_message(self, message_id):
return self._client.get(message_id).as_string()
def fetch_message(self, message_id: str) -> str:
msg = self._client.get(message_id)
if msg is not None:
msg = msg.as_string()
if msg is not None:
return msg
return ""
def delete_message(self, message_id: str):
self._client.remove(message_id)
def move_message(self, message_id: str, folder_name: str):
message_data = self._client.get(message_id)
if folder_name not in self._subfolder_client.keys():
self._subfolder_client = mailbox.Maildir(
os.join(self.maildir_path, folder_name), create=self.maildir_create
)
if message_data is None:
return
if folder_name not in self._subfolder_client:
self._subfolder_client[folder_name] = self._client.add_folder(folder_name)
self._subfolder_client[folder_name].add(message_data)
self._client.remove(message_id)
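
The corrected move_message above caches one mailbox.Maildir handle per subfolder via add_folder() instead of the old os.join()-based path handling, and skips messages that have disappeared. A standalone sketch of that flow using only the standard library; the paths and message content are placeholders.

```python
# A standalone sketch of the corrected move flow.
import mailbox
import tempfile

maildir_path = tempfile.mkdtemp()
client = mailbox.Maildir(maildir_path, create=True)
subfolders: dict[str, mailbox.Maildir] = {}

key = client.add(b"Subject: test\r\n\r\nbody")
message = client.get(key)
if message is not None:  # the message may have been removed concurrently
    if "Aggregate" not in subfolders:
        subfolders["Aggregate"] = client.add_folder("Aggregate")
    subfolders["Aggregate"].add(message)
    client.remove(key)
```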


@@ -1,27 +1,30 @@
# -*- coding: utf-8 -*-
from collections import OrderedDict
from __future__ import annotations
from typing import Any, Optional, Union
from opensearchpy import (
Q,
connections,
Object,
Boolean,
Date,
Document,
Index,
Nested,
InnerDoc,
Integer,
Text,
Boolean,
Ip,
Date,
Keyword,
Nested,
Object,
Q,
Search,
Text,
connections,
)
from opensearchpy.helpers import reindex
from parsedmarc import InvalidFailureReport
from parsedmarc.log import logger
from parsedmarc.utils import human_timestamp_to_datetime
from parsedmarc import InvalidForensicReport
class OpenSearchError(Exception):
@@ -41,18 +44,23 @@ class _PublishedPolicy(InnerDoc):
sp = Text()
pct = Integer()
fo = Text()
np = Keyword()
testing = Keyword()
discovery_method = Keyword()
class _DKIMResult(InnerDoc):
domain = Text()
selector = Text()
result = Text()
human_result = Text()
class _SPFResult(InnerDoc):
domain = Text()
scope = Text()
results = Text()
human_result = Text()
class _AggregateReportDoc(Document):
@@ -67,6 +75,8 @@ class _AggregateReportDoc(Document):
date_range = Date()
date_begin = Date()
date_end = Date()
normalized_timespan = Boolean()
original_timespan_seconds = Integer()
errors = Text()
published_policy = Object(_PublishedPolicy)
source_ip_address = Ip()
@@ -86,19 +96,37 @@ class _AggregateReportDoc(Document):
envelope_to = Text()
dkim_results = Nested(_DKIMResult)
spf_results = Nested(_SPFResult)
np = Keyword()
testing = Keyword()
discovery_method = Keyword()
generator = Text()
def add_policy_override(self, type_, comment):
def add_policy_override(self, type_: str, comment: str):
self.policy_overrides.append(_PolicyOverride(type=type_, comment=comment))
def add_dkim_result(self, domain, selector, result):
def add_dkim_result(
self, domain: str, selector: str, result: str,
human_result: Optional[str] = None,
):
self.dkim_results.append(
_DKIMResult(domain=domain, selector=selector, result=result)
_DKIMResult(
domain=domain, selector=selector, result=result,
human_result=human_result,
)
)
def add_spf_result(self, domain, scope, result):
self.spf_results.append(_SPFResult(domain=domain, scope=scope, result=result))
def add_spf_result(
self, domain: str, scope: str, result: str,
human_result: Optional[str] = None,
):
self.spf_results.append(
_SPFResult(
domain=domain, scope=scope, result=result,
human_result=human_result,
)
)
def save(self, **kwargs):
def save(self, **kwargs): # pyright: ignore[reportIncompatibleMethodOverride]
self.passed_dmarc = False
self.passed_dmarc = self.spf_aligned or self.dkim_aligned
@@ -116,7 +144,7 @@ class _EmailAttachmentDoc(Document):
sha256 = Text()
class _ForensicSampleDoc(InnerDoc):
class _FailureSampleDoc(InnerDoc):
raw = Text()
headers = Object()
headers_only = Boolean()
@@ -131,21 +159,21 @@ class _ForensicSampleDoc(InnerDoc):
body = Text()
attachments = Nested(_EmailAttachmentDoc)
def add_to(self, display_name, address):
def add_to(self, display_name: str, address: str):
self.to.append(_EmailAddressDoc(display_name=display_name, address=address))
def add_reply_to(self, display_name, address):
def add_reply_to(self, display_name: str, address: str):
self.reply_to.append(
_EmailAddressDoc(display_name=display_name, address=address)
)
def add_cc(self, display_name, address):
def add_cc(self, display_name: str, address: str):
self.cc.append(_EmailAddressDoc(display_name=display_name, address=address))
def add_bcc(self, display_name, address):
def add_bcc(self, display_name: str, address: str):
self.bcc.append(_EmailAddressDoc(display_name=display_name, address=address))
def add_attachment(self, filename, content_type, sha256):
def add_attachment(self, filename: str, content_type: str, sha256: str):
self.attachments.append(
_EmailAttachmentDoc(
filename=filename, content_type=content_type, sha256=sha256
@@ -153,9 +181,9 @@ class _ForensicSampleDoc(InnerDoc):
)
class _ForensicReportDoc(Document):
class _FailureReportDoc(Document):
class Index:
name = "dmarc_forensic"
name = "dmarc_failure"
feedback_type = Text()
user_agent = Text()
@@ -173,7 +201,7 @@ class _ForensicReportDoc(Document):
source_auth_failures = Text()
dkim_domain = Text()
original_rcpt_to = Text()
sample = Object(_ForensicSampleDoc)
sample = Object(_FailureSampleDoc)
class _SMTPTLSFailureDetailsDoc(InnerDoc):
@@ -197,15 +225,15 @@ class _SMTPTLSPolicyDoc(InnerDoc):
def add_failure_details(
self,
result_type,
ip_address,
receiving_ip,
receiving_mx_helo,
failed_session_count,
sending_mta_ip=None,
receiving_mx_hostname=None,
additional_information_uri=None,
failure_reason_code=None,
result_type: Optional[str] = None,
ip_address: Optional[str] = None,
receiving_ip: Optional[str] = None,
receiving_mx_helo: Optional[str] = None,
failed_session_count: Optional[int] = None,
sending_mta_ip: Optional[str] = None,
receiving_mx_hostname: Optional[str] = None,
additional_information_uri: Optional[str] = None,
failure_reason_code: Union[str, int, None] = None,
):
_details = _SMTPTLSFailureDetailsDoc(
result_type=result_type,
@@ -235,13 +263,14 @@ class _SMTPTLSReportDoc(Document):
def add_policy(
self,
policy_type,
policy_domain,
successful_session_count,
failed_session_count,
policy_string=None,
mx_host_patterns=None,
failure_details=None,
policy_type: str,
policy_domain: str,
successful_session_count: int,
failed_session_count: int,
*,
policy_string: Optional[str] = None,
mx_host_patterns: Optional[list[str]] = None,
failure_details: Optional[str] = None,
):
self.policies.append(
policy_type=policy_type,
@@ -259,24 +288,25 @@ class AlreadySaved(ValueError):
def set_hosts(
hosts,
use_ssl=False,
ssl_cert_path=None,
username=None,
password=None,
apiKey=None,
timeout=60.0,
hosts: Union[str, list[str]],
*,
use_ssl: Optional[bool] = False,
ssl_cert_path: Optional[str] = None,
username: Optional[str] = None,
password: Optional[str] = None,
api_key: Optional[str] = None,
timeout: Optional[float] = 60.0,
):
"""
Sets the OpenSearch hosts to use
Args:
hosts (str|list): A hostname or URL, or list of hostnames or URLs
hosts (str|list[str]): A single hostname or URL, or list of hostnames or URLs
use_ssl (bool): Use an HTTPS connection to the server
ssl_cert_path (str): Path to the certificate chain
username (str): The username to use for authentication
password (str): The password to use for authentication
apiKey (str): The Base64 encoded API key to use for authentication
api_key (str): The Base64 encoded API key to use for authentication
timeout (float): Timeout in seconds
"""
if not isinstance(hosts, list):
@@ -289,14 +319,14 @@ def set_hosts(
conn_params["ca_certs"] = ssl_cert_path
else:
conn_params["verify_certs"] = False
if username:
if username and password:
conn_params["http_auth"] = username + ":" + password
if apiKey:
conn_params["api_key"] = apiKey
if api_key:
conn_params["api_key"] = api_key
connections.create_connection(**conn_params)
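A minimal sketch of calling the updated set_hosts signature, assuming the module is imported as parsedmarc.opensearch; the endpoint, certificate path, and key below are placeholders (note that api_key replaces the old apiKey parameter and the options are now keyword-only):

from parsedmarc import opensearch

opensearch.set_hosts(
    "https://opensearch.example.com:9200",
    use_ssl=True,
    ssl_cert_path="/etc/ssl/certs/ca-bundle.crt",
    api_key="base64-encoded-key",
    timeout=30.0,
)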
def create_indexes(names, settings=None):
def create_indexes(names: list[str], settings: Optional[dict[str, Any]] = None):
"""
Create OpenSearch indexes
@@ -319,19 +349,22 @@ def create_indexes(names, settings=None):
raise OpenSearchError("OpenSearch error: {0}".format(e.__str__()))
def migrate_indexes(aggregate_indexes=None, forensic_indexes=None):
def migrate_indexes(
aggregate_indexes: Optional[list[str]] = None,
failure_indexes: Optional[list[str]] = None,
):
"""
Updates index mappings
Args:
aggregate_indexes (list): A list of aggregate index names
forensic_indexes (list): A list of forensic index names
failure_indexes (list): A list of failure index names
"""
version = 2
if aggregate_indexes is None:
aggregate_indexes = []
if forensic_indexes is None:
forensic_indexes = []
if failure_indexes is None:
failure_indexes = []
for aggregate_index_name in aggregate_indexes:
if not Index(aggregate_index_name).exists():
continue
@@ -361,23 +394,23 @@ def migrate_indexes(aggregate_indexes=None, forensic_indexes=None):
reindex(connections.get_connection(), aggregate_index_name, new_index_name)
Index(aggregate_index_name).delete()
for forensic_index in forensic_indexes:
for failure_index in failure_indexes:
pass
def save_aggregate_report_to_opensearch(
aggregate_report,
index_suffix=None,
index_prefix=None,
monthly_indexes=False,
number_of_shards=1,
number_of_replicas=0,
aggregate_report: dict[str, Any],
index_suffix: Optional[str] = None,
index_prefix: Optional[str] = None,
monthly_indexes: bool = False,
number_of_shards: int = 1,
number_of_replicas: int = 0,
):
"""
Saves a parsed DMARC aggregate report to OpenSearch
Args:
aggregate_report (OrderedDict): A parsed forensic report
aggregate_report (dict): A parsed aggregate report
index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily indexes
@@ -395,15 +428,11 @@ def save_aggregate_report_to_opensearch(
domain = aggregate_report["policy_published"]["domain"]
begin_date = human_timestamp_to_datetime(metadata["begin_date"], to_utc=True)
end_date = human_timestamp_to_datetime(metadata["end_date"], to_utc=True)
begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
if monthly_indexes:
index_date = begin_date.strftime("%Y-%m")
else:
index_date = begin_date.strftime("%Y-%m-%d")
aggregate_report["begin_date"] = begin_date
aggregate_report["end_date"] = end_date
date_range = [aggregate_report["begin_date"], aggregate_report["end_date"]]
org_name_query = Q(dict(match_phrase=dict(org_name=org_name)))
report_id_query = Q(dict(match_phrase=dict(report_id=report_id)))
@@ -421,6 +450,8 @@ def save_aggregate_report_to_opensearch(
query = org_name_query & report_id_query & domain_query
query = query & begin_date_query & end_date_query
search.query = query
begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
try:
existing = search.execute()
@@ -447,9 +478,25 @@ def save_aggregate_report_to_opensearch(
sp=aggregate_report["policy_published"]["sp"],
pct=aggregate_report["policy_published"]["pct"],
fo=aggregate_report["policy_published"]["fo"],
np=aggregate_report["policy_published"].get("np"),
testing=aggregate_report["policy_published"].get("testing"),
discovery_method=aggregate_report["policy_published"].get(
"discovery_method"
),
)
for record in aggregate_report["records"]:
begin_date = human_timestamp_to_datetime(record["interval_begin"], to_utc=True)
end_date = human_timestamp_to_datetime(record["interval_end"], to_utc=True)
normalized_timespan = record["normalized_timespan"]
if monthly_indexes:
index_date = begin_date.strftime("%Y-%m")
else:
index_date = begin_date.strftime("%Y-%m-%d")
aggregate_report["begin_date"] = begin_date
aggregate_report["end_date"] = end_date
date_range = [aggregate_report["begin_date"], aggregate_report["end_date"]]
agg_doc = _AggregateReportDoc(
xml_schema=aggregate_report["xml_schema"],
org_name=metadata["org_name"],
@@ -457,8 +504,9 @@ def save_aggregate_report_to_opensearch(
org_extra_contact_info=metadata["org_extra_contact_info"],
report_id=metadata["report_id"],
date_range=date_range,
date_begin=aggregate_report["begin_date"],
date_end=aggregate_report["end_date"],
date_begin=begin_date,
date_end=end_date,
normalized_timespan=normalized_timespan,
errors=metadata["errors"],
published_policy=published_policy,
source_ip_address=record["source"]["ip_address"],
@@ -476,6 +524,12 @@ def save_aggregate_report_to_opensearch(
header_from=record["identifiers"]["header_from"],
envelope_from=record["identifiers"]["envelope_from"],
envelope_to=record["identifiers"]["envelope_to"],
np=aggregate_report["policy_published"].get("np"),
testing=aggregate_report["policy_published"].get("testing"),
discovery_method=aggregate_report["policy_published"].get(
"discovery_method"
),
generator=metadata.get("generator"),
)
for override in record["policy_evaluated"]["policy_override_reasons"]:
@@ -488,6 +542,7 @@ def save_aggregate_report_to_opensearch(
domain=dkim_result["domain"],
selector=dkim_result["selector"],
result=dkim_result["result"],
human_result=dkim_result.get("human_result"),
)
for spf_result in record["auth_results"]["spf"]:
@@ -495,6 +550,7 @@ def save_aggregate_report_to_opensearch(
domain=spf_result["domain"],
scope=spf_result["scope"],
result=spf_result["result"],
human_result=spf_result.get("human_result"),
)
index = "dmarc_aggregate"
@@ -516,19 +572,19 @@ def save_aggregate_report_to_opensearch(
raise OpenSearchError("OpenSearch error: {0}".format(e.__str__()))
def save_forensic_report_to_opensearch(
forensic_report,
index_suffix=None,
index_prefix=None,
monthly_indexes=False,
number_of_shards=1,
number_of_replicas=0,
def save_failure_report_to_opensearch(
failure_report: dict[str, Any],
index_suffix: Optional[str] = None,
index_prefix: Optional[str] = None,
monthly_indexes: bool = False,
number_of_shards: int = 1,
number_of_replicas: int = 0,
):
"""
Saves a parsed DMARC forensic report to OpenSearch
Saves a parsed DMARC failure report to OpenSearch
Args:
forensic_report (OrderedDict): A parsed forensic report
failure_report (dict): A parsed failure report
index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily
@@ -541,26 +597,31 @@ def save_forensic_report_to_opensearch(
AlreadySaved
"""
logger.info("Saving forensic report to OpenSearch")
forensic_report = forensic_report.copy()
logger.info("Saving failure report to OpenSearch")
failure_report = failure_report.copy()
sample_date = None
if forensic_report["parsed_sample"]["date"] is not None:
sample_date = forensic_report["parsed_sample"]["date"]
if failure_report["parsed_sample"]["date"] is not None:
sample_date = failure_report["parsed_sample"]["date"]
sample_date = human_timestamp_to_datetime(sample_date)
original_headers = forensic_report["parsed_sample"]["headers"]
headers = OrderedDict()
original_headers = failure_report["parsed_sample"]["headers"]
headers: dict[str, Any] = {}
for original_header in original_headers:
headers[original_header.lower()] = original_headers[original_header]
arrival_date = human_timestamp_to_datetime(forensic_report["arrival_date_utc"])
arrival_date = human_timestamp_to_datetime(failure_report["arrival_date_utc"])
arrival_date_epoch_milliseconds = int(arrival_date.timestamp() * 1000)
if index_suffix is not None:
search_index = "dmarc_forensic_{0}*".format(index_suffix)
search_index = "dmarc_failure_{0}*,dmarc_forensic_{0}*".format(
index_suffix
)
else:
search_index = "dmarc_forensic*"
search_index = "dmarc_failure*,dmarc_forensic*"
if index_prefix is not None:
search_index = "{0}{1}".format(index_prefix, search_index)
search_index = ",".join(
"{0}{1}".format(index_prefix, part)
for part in search_index.split(",")
)
search = Search(index=search_index)
q = Q(dict(match=dict(arrival_date=arrival_date_epoch_milliseconds)))
@@ -601,64 +662,64 @@ def save_forensic_report_to_opensearch(
if len(existing) > 0:
raise AlreadySaved(
"A forensic sample to {0} from {1} "
"A failure sample to {0} from {1} "
"with a subject of {2} and arrival date of {3} "
"already exists in "
"OpenSearch".format(
to_, from_, subject, forensic_report["arrival_date_utc"]
to_, from_, subject, failure_report["arrival_date_utc"]
)
)
parsed_sample = forensic_report["parsed_sample"]
sample = _ForensicSampleDoc(
raw=forensic_report["sample"],
parsed_sample = failure_report["parsed_sample"]
sample = _FailureSampleDoc(
raw=failure_report["sample"],
headers=headers,
headers_only=forensic_report["sample_headers_only"],
headers_only=failure_report["sample_headers_only"],
date=sample_date,
subject=forensic_report["parsed_sample"]["subject"],
subject=failure_report["parsed_sample"]["subject"],
filename_safe_subject=parsed_sample["filename_safe_subject"],
body=forensic_report["parsed_sample"]["body"],
body=failure_report["parsed_sample"]["body"],
)
for address in forensic_report["parsed_sample"]["to"]:
for address in failure_report["parsed_sample"]["to"]:
sample.add_to(display_name=address["display_name"], address=address["address"])
for address in forensic_report["parsed_sample"]["reply_to"]:
for address in failure_report["parsed_sample"]["reply_to"]:
sample.add_reply_to(
display_name=address["display_name"], address=address["address"]
)
for address in forensic_report["parsed_sample"]["cc"]:
for address in failure_report["parsed_sample"]["cc"]:
sample.add_cc(display_name=address["display_name"], address=address["address"])
for address in forensic_report["parsed_sample"]["bcc"]:
for address in failure_report["parsed_sample"]["bcc"]:
sample.add_bcc(display_name=address["display_name"], address=address["address"])
for attachment in forensic_report["parsed_sample"]["attachments"]:
for attachment in failure_report["parsed_sample"]["attachments"]:
sample.add_attachment(
filename=attachment["filename"],
content_type=attachment["mail_content_type"],
sha256=attachment["sha256"],
)
try:
forensic_doc = _ForensicReportDoc(
feedback_type=forensic_report["feedback_type"],
user_agent=forensic_report["user_agent"],
version=forensic_report["version"],
original_mail_from=forensic_report["original_mail_from"],
failure_doc = _FailureReportDoc(
feedback_type=failure_report["feedback_type"],
user_agent=failure_report["user_agent"],
version=failure_report["version"],
original_mail_from=failure_report["original_mail_from"],
arrival_date=arrival_date_epoch_milliseconds,
domain=forensic_report["reported_domain"],
original_envelope_id=forensic_report["original_envelope_id"],
authentication_results=forensic_report["authentication_results"],
delivery_results=forensic_report["delivery_result"],
source_ip_address=forensic_report["source"]["ip_address"],
source_country=forensic_report["source"]["country"],
source_reverse_dns=forensic_report["source"]["reverse_dns"],
source_base_domain=forensic_report["source"]["base_domain"],
authentication_mechanisms=forensic_report["authentication_mechanisms"],
auth_failure=forensic_report["auth_failure"],
dkim_domain=forensic_report["dkim_domain"],
original_rcpt_to=forensic_report["original_rcpt_to"],
domain=failure_report["reported_domain"],
original_envelope_id=failure_report["original_envelope_id"],
authentication_results=failure_report["authentication_results"],
delivery_results=failure_report["delivery_result"],
source_ip_address=failure_report["source"]["ip_address"],
source_country=failure_report["source"]["country"],
source_reverse_dns=failure_report["source"]["reverse_dns"],
source_base_domain=failure_report["source"]["base_domain"],
authentication_mechanisms=failure_report["authentication_mechanisms"],
auth_failure=failure_report["auth_failure"],
dkim_domain=failure_report["dkim_domain"],
original_rcpt_to=failure_report["original_rcpt_to"],
sample=sample,
)
index = "dmarc_forensic"
index = "dmarc_failure"
if index_suffix:
index = "{0}_{1}".format(index, index_suffix)
if index_prefix:
@@ -672,30 +733,30 @@ def save_forensic_report_to_opensearch(
number_of_shards=number_of_shards, number_of_replicas=number_of_replicas
)
create_indexes([index], index_settings)
forensic_doc.meta.index = index
failure_doc.meta.index = index
try:
forensic_doc.save()
failure_doc.save()
except Exception as e:
raise OpenSearchError("OpenSearch error: {0}".format(e.__str__()))
except KeyError as e:
raise InvalidForensicReport(
"Forensic report missing required field: {0}".format(e.__str__())
raise InvalidFailureReport(
"Failure report missing required field: {0}".format(e.__str__())
)
def save_smtp_tls_report_to_opensearch(
report,
index_suffix=None,
index_prefix=None,
monthly_indexes=False,
number_of_shards=1,
number_of_replicas=0,
report: dict[str, Any],
index_suffix: Optional[str] = None,
index_prefix: Optional[str] = None,
monthly_indexes: bool = False,
number_of_shards: int = 1,
number_of_replicas: int = 0,
):
"""
Saves a parsed SMTP TLS report to OpenSearch
Args:
report (OrderedDict): A parsed SMTP TLS report
report (dict): A parsed SMTP TLS report
index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily indexes
@@ -705,7 +766,7 @@ def save_smtp_tls_report_to_opensearch(
Raises:
AlreadySaved
"""
logger.info("Saving aggregate report to OpenSearch")
logger.info("Saving SMTP TLS report to OpenSearch")
org_name = report["organization_name"]
report_id = report["report_id"]
begin_date = human_timestamp_to_datetime(report["begin_date"], to_utc=True)
@@ -781,7 +842,7 @@ def save_smtp_tls_report_to_opensearch(
policy_doc = _SMTPTLSPolicyDoc(
policy_domain=policy["policy_domain"],
policy_type=policy["policy_type"],
succesful_session_count=policy["successful_session_count"],
successful_session_count=policy["successful_session_count"],
failed_session_count=policy["failed_session_count"],
policy_string=policy_strings,
mx_host_patterns=mx_host_patterns,
@@ -832,3 +893,9 @@ def save_smtp_tls_report_to_opensearch(
smtp_tls_doc.save()
except Exception as e:
raise OpenSearchError("OpenSearch error: {0}".format(e.__str__()))
# Backward-compatible aliases
_ForensicSampleDoc = _FailureSampleDoc
_ForensicReportDoc = _FailureReportDoc
save_forensic_report_to_opensearch = save_failure_report_to_opensearch
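Because the aliases are plain attribute assignments, code that still imports the old names resolves to the same objects as the new ones; a quick check, assuming the module imports as parsedmarc.opensearch:

from parsedmarc import opensearch

assert opensearch.save_forensic_report_to_opensearch is opensearch.save_failure_report_to_opensearch
assert opensearch._ForensicReportDoc is opensearch._FailureReportDoc
assert opensearch._ForensicSampleDoc is opensearch._FailureSampleDoc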

View File

@@ -132,6 +132,7 @@ asu-vei.ru,ASU-VEI,Industrial
atextelecom.com.br,ATEX Telecom,ISP
atmailcloud.com,atmail,Email Provider
ats.ca,ATS Healthcare,Healthcare
att.net,AT&T,ISP
atw.ne.jp,ATW,Web Host
au-net.ne.jp,KDDI,ISP
au.com,au,ISP
@@ -242,6 +243,7 @@ carandainet.com.br,CN Internet,ISP
cardhealth.com,Cardinal Health,Healthcare
cardinal.com,Cardinal Health,Healthcare
cardinalhealth.com,Cardinal Health,Healthcare
cardinalscriptnet.com,Cardinal Health,Healthcare
carecentrix.com,CareCentrix,Healthcare
carleton.edu,Carlton College,Education
carrierzone.com,carrierzone,Email Security
@@ -697,6 +699,7 @@ hdsupply-email.com,HD Supply,Retail
healthall.com,UC Health,Healthcare
healthcaresupplypros.com,Healthcare Supply Pros,Healthcare
healthproductsforyou.com,Health Products For You,Healthcare
healthtouch.com,Cardinal Health,Healthcare
helloserver6.com,1st Source Web,Marketing
helpforcb.com,InterServer,Web Host
helpscout.net,Help Scout,SaaS
@@ -753,6 +756,8 @@ hostwindsdns.com,Hostwinds,Web Host
hotnet.net.il,Hot Net Internet Services,ISP
hp.com,HP,Technology
hringdu.is,Hringdu,ISP
hslda.net,Home School Legal Defense Association (HSLDA),Education
hslda.org,Home School Legal Defense Association (HSLDA),Education
hspherefilter.com,"DynamicNet, Inc. (DNI)",Web Host
htc.net,HTC,ISP
htmlservices.it,HTMLServices.it,MSP
@@ -763,6 +768,7 @@ hughston.com,Hughston Clinic,Healthcare
hvvc.us,Hivelocity,Web Host
i2ts.ne.jp,i2ts,Web Host
i4i.com,i4i,Technology
ibindley.com,Cardinal Health,Healthcare
ice.co.cr,Grupo ICE,Industrial
icehosting.nl,IceHosting,Web Host
icewarpcloud.in,IceWrap,Email Provider
@@ -832,6 +838,7 @@ ip-5-196-151.eu,OVH,Web Host
ip-51-161-36.net,OVH,Web Host
ip-51-195-53.eu,OVH,Web Host
ip-51-254-53.eu,OVH,Web Host
ip-51-38-67.eu,OVH,Web Host
ip-51-77-42.eu,OVH,Web Host
ip-51-83-140.eu,OVH,Web Host
ip-51-89-240.eu,OVH,Web Host
@@ -1217,6 +1224,7 @@ nettoday.co.th,Net Today,Web Host
netventure.pl,Netventure,MSP
netvigator.com,HKT,ISP
netvision.net.il,013 Netvision,ISP
network-tech.com,Network Technologies International (NTI),SaaS
network.kz,network.kz,ISP
network80.com,Network80,Web Host
neubox.net,Neubox,Web Host

View File

@@ -13,8 +13,6 @@ def _main():
csv_headers = ["source_name", "message_count"]
output_rows = []
known_unknown_domains = []
psl_overrides = []
known_domains = []

View File

@@ -1,6 +1,10 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
import json
from typing import Any
import boto3
from parsedmarc.log import logger
@@ -8,16 +12,16 @@ from parsedmarc.utils import human_timestamp_to_datetime
class S3Client(object):
"""A client for a Amazon S3"""
"""A client for interacting with Amazon S3"""
def __init__(
self,
bucket_name,
bucket_path,
region_name,
endpoint_url,
access_key_id,
secret_access_key,
bucket_name: str,
bucket_path: str,
region_name: str,
endpoint_url: str,
access_key_id: str,
secret_access_key: str,
):
"""
Initializes the S3Client
@@ -47,18 +51,18 @@ class S3Client(object):
aws_access_key_id=access_key_id,
aws_secret_access_key=secret_access_key,
)
self.bucket = self.s3.Bucket(self.bucket_name)
self.bucket = self.s3.Bucket(self.bucket_name) # type: ignore
def save_aggregate_report_to_s3(self, report):
def save_aggregate_report_to_s3(self, report: dict[str, Any]):
self.save_report_to_s3(report, "aggregate")
def save_forensic_report_to_s3(self, report):
self.save_report_to_s3(report, "forensic")
def save_failure_report_to_s3(self, report: dict[str, Any]):
self.save_report_to_s3(report, "failure")
def save_smtp_tls_report_to_s3(self, report):
def save_smtp_tls_report_to_s3(self, report: dict[str, Any]):
self.save_report_to_s3(report, "smtp_tls")
def save_report_to_s3(self, report, report_type):
def save_report_to_s3(self, report: dict[str, Any], report_type: str):
if report_type == "smtp_tls":
report_date = report["begin_date"]
report_id = report["report_id"]
@@ -89,3 +93,7 @@ class S3Client(object):
self.bucket.put_object(
Body=json.dumps(report), Key=object_path, Metadata=object_metadata
)
# Backward-compatible aliases
S3Client.save_forensic_report_to_s3 = S3Client.save_failure_report_to_s3
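A minimal usage sketch of the renamed S3 method, assuming the class imports as parsedmarc.s3.S3Client; the bucket, endpoint, and credentials are placeholders, and failure_report stands in for a parsed failure report dict:

from parsedmarc.s3 import S3Client

client = S3Client(
    bucket_name="dmarc-reports",
    bucket_path="parsedmarc",
    region_name="us-east-1",
    endpoint_url="https://s3.us-east-1.amazonaws.com",
    access_key_id="EXAMPLE_KEY_ID",
    secret_access_key="EXAMPLE_SECRET",
)

client.save_failure_report_to_s3(failure_report)   # new name
client.save_forensic_report_to_s3(failure_report)  # legacy alias still works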

View File

@@ -1,9 +1,14 @@
from urllib.parse import urlparse
import socket
import json
# -*- coding: utf-8 -*-
from __future__ import annotations
import json
import socket
from typing import Any, Union
from urllib.parse import urlparse
import urllib3
import requests
import urllib3
from parsedmarc.constants import USER_AGENT
from parsedmarc.log import logger
@@ -23,7 +28,13 @@ class HECClient(object):
# http://docs.splunk.com/Documentation/Splunk/latest/RESTREF/RESTinput#services.2Fcollector
def __init__(
self, url, access_token, index, source="parsedmarc", verify=True, timeout=60
self,
url: str,
access_token: str,
index: str,
source: str = "parsedmarc",
verify=True,
timeout=60,
):
"""
Initializes the HECClient
@@ -37,9 +48,9 @@ class HECClient(object):
timeout (float): Number of seconds to wait for the server to send
data before giving up
"""
url = urlparse(url)
parsed_url = urlparse(url)
self.url = "{0}://{1}/services/collector/event/1.0".format(
url.scheme, url.netloc
parsed_url.scheme, parsed_url.netloc
)
self.access_token = access_token.lstrip("Splunk ")
self.index = index
@@ -48,14 +59,19 @@ class HECClient(object):
self.session = requests.Session()
self.timeout = timeout
self.session.verify = verify
self._common_data = dict(host=self.host, source=self.source, index=self.index)
self._common_data: dict[str, Union[str, int, float, dict]] = dict(
host=self.host, source=self.source, index=self.index
)
self.session.headers = {
"User-Agent": USER_AGENT,
"Authorization": "Splunk {0}".format(self.access_token),
}
def save_aggregate_reports_to_splunk(self, aggregate_reports):
def save_aggregate_reports_to_splunk(
self,
aggregate_reports: Union[list[dict[str, Any]], dict[str, Any]],
):
"""
Saves aggregate DMARC reports to Splunk
@@ -75,9 +91,12 @@ class HECClient(object):
json_str = ""
for report in aggregate_reports:
for record in report["records"]:
new_report = dict()
new_report: dict[str, Union[str, int, float, dict]] = dict()
for metadata in report["report_metadata"]:
new_report[metadata] = report["report_metadata"][metadata]
new_report["interval_begin"] = record["interval_begin"]
new_report["interval_end"] = record["interval_end"]
new_report["normalized_timespan"] = record["normalized_timespan"]
new_report["published_policy"] = report["policy_published"]
new_report["source_ip_address"] = record["source"]["ip_address"]
new_report["source_country"] = record["source"]["country"]
@@ -98,7 +117,9 @@ class HECClient(object):
new_report["spf_results"] = record["auth_results"]["spf"]
data["sourcetype"] = "dmarc:aggregate"
timestamp = human_timestamp_to_unix_timestamp(new_report["begin_date"])
timestamp = human_timestamp_to_unix_timestamp(
new_report["interval_begin"]
)
data["time"] = timestamp
data["event"] = new_report.copy()
json_str += "{0}\n".format(json.dumps(data))
@@ -113,25 +134,28 @@ class HECClient(object):
if response["code"] != 0:
raise SplunkError(response["text"])
def save_forensic_reports_to_splunk(self, forensic_reports):
def save_failure_reports_to_splunk(
self,
failure_reports: Union[list[dict[str, Any]], dict[str, Any]],
):
"""
Saves forensic DMARC reports to Splunk
Saves failure DMARC reports to Splunk
Args:
forensic_reports (list): A list of forensic report dictionaries
failure_reports (list): A list of failure report dictionaries
to save in Splunk
"""
logger.debug("Saving forensic reports to Splunk")
if isinstance(forensic_reports, dict):
forensic_reports = [forensic_reports]
logger.debug("Saving failure reports to Splunk")
if isinstance(failure_reports, dict):
failure_reports = [failure_reports]
if len(forensic_reports) < 1:
if len(failure_reports) < 1:
return
json_str = ""
for report in forensic_reports:
for report in failure_reports:
data = self._common_data.copy()
data["sourcetype"] = "dmarc:forensic"
data["sourcetype"] = "dmarc:failure"
timestamp = human_timestamp_to_unix_timestamp(report["arrival_date_utc"])
data["time"] = timestamp
data["event"] = report.copy()
@@ -147,7 +171,9 @@ class HECClient(object):
if response["code"] != 0:
raise SplunkError(response["text"])
def save_smtp_tls_reports_to_splunk(self, reports):
def save_smtp_tls_reports_to_splunk(
self, reports: Union[list[dict[str, Any]], dict[str, Any]]
):
"""
Saves SMTP TLS reports to Splunk
@@ -181,3 +207,7 @@ class HECClient(object):
raise SplunkError(e.__str__())
if response["code"] != 0:
raise SplunkError(response["text"])
# Backward-compatible aliases
HECClient.save_forensic_reports_to_splunk = HECClient.save_failure_reports_to_splunk
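A minimal usage sketch of the renamed Splunk HEC method, assuming the class imports as parsedmarc.splunk.HECClient; the URL and token are placeholders, and failure_reports stands in for a list of parsed failure report dicts (events are indexed with the new dmarc:failure sourcetype):

from parsedmarc.splunk import HECClient

hec = HECClient(
    "https://splunk.example.com:8088",
    "00000000-0000-0000-0000-000000000000",
    index="email",
    source="parsedmarc",
)

hec.save_failure_reports_to_splunk(failure_reports)   # new name
hec.save_forensic_reports_to_splunk(failure_reports)  # legacy alias still works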

View File

@@ -1,12 +1,19 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
import json
import logging
import logging.handlers
import json
import socket
import ssl
import time
from typing import Any, Optional
from parsedmarc import (
parsed_aggregate_reports_to_csv_rows,
parsed_forensic_reports_to_csv_rows,
parsed_failure_reports_to_csv_rows,
parsed_smtp_tls_reports_to_csv_rows,
)
@@ -14,31 +21,165 @@ from parsedmarc import (
class SyslogClient(object):
"""A client for Syslog"""
def __init__(self, server_name, server_port):
def __init__(
self,
server_name: str,
server_port: int,
protocol: str = "udp",
cafile_path: Optional[str] = None,
certfile_path: Optional[str] = None,
keyfile_path: Optional[str] = None,
timeout: float = 5.0,
retry_attempts: int = 3,
retry_delay: int = 5,
):
"""
Initializes the SyslogClient
Args:
server_name (str): The Syslog server
server_port (int): The Syslog UDP port
server_port (int): The Syslog port
protocol (str): The protocol to use: "udp", "tcp", or "tls" (Default: "udp")
cafile_path (str): Path to CA certificate file for TLS server verification (Optional)
certfile_path (str): Path to client certificate file for TLS authentication (Optional)
keyfile_path (str): Path to client private key file for TLS authentication (Optional)
timeout (float): Connection timeout in seconds for TCP/TLS (Default: 5.0)
retry_attempts (int): Number of retry attempts for failed connections (Default: 3)
retry_delay (int): Delay in seconds between retry attempts (Default: 5)
"""
self.server_name = server_name
self.server_port = server_port
self.protocol = protocol.lower()
self.timeout = timeout
self.retry_attempts = retry_attempts
self.retry_delay = retry_delay
self.logger = logging.getLogger("parsedmarc_syslog")
self.logger.setLevel(logging.INFO)
log_handler = logging.handlers.SysLogHandler(address=(server_name, server_port))
# Create the appropriate syslog handler based on protocol
log_handler = self._create_syslog_handler(
server_name,
server_port,
self.protocol,
cafile_path,
certfile_path,
keyfile_path,
timeout,
retry_attempts,
retry_delay,
)
self.logger.addHandler(log_handler)
def save_aggregate_report_to_syslog(self, aggregate_reports):
def _create_syslog_handler(
self,
server_name: str,
server_port: int,
protocol: str,
cafile_path: Optional[str],
certfile_path: Optional[str],
keyfile_path: Optional[str],
timeout: float,
retry_attempts: int,
retry_delay: int,
) -> logging.handlers.SysLogHandler:
"""
Creates a SysLogHandler with the specified protocol and TLS settings
"""
if protocol == "udp":
# UDP protocol (default, backward compatible)
return logging.handlers.SysLogHandler(
address=(server_name, server_port),
socktype=socket.SOCK_DGRAM,
)
elif protocol in ["tcp", "tls"]:
# TCP or TLS protocol with retry logic
for attempt in range(1, retry_attempts + 1):
try:
if protocol == "tcp":
# TCP without TLS
handler = logging.handlers.SysLogHandler(
address=(server_name, server_port),
socktype=socket.SOCK_STREAM,
)
# Set timeout on the socket
if hasattr(handler, "socket") and handler.socket:
handler.socket.settimeout(timeout)
return handler
else:
# TLS protocol
# Create SSL context with secure defaults
ssl_context = ssl.create_default_context()
# Explicitly set minimum TLS version to 1.2 for security
ssl_context.minimum_version = ssl.TLSVersion.TLSv1_2
# Configure server certificate verification
if cafile_path:
ssl_context.load_verify_locations(cafile=cafile_path)
# Configure client certificate authentication
if certfile_path and keyfile_path:
ssl_context.load_cert_chain(
certfile=certfile_path,
keyfile=keyfile_path,
)
elif certfile_path or keyfile_path:
# Warn if only one of the two required parameters is provided
self.logger.warning(
"Both certfile_path and keyfile_path are required for "
"client certificate authentication. Client authentication "
"will not be used."
)
# Create TCP handler first
handler = logging.handlers.SysLogHandler(
address=(server_name, server_port),
socktype=socket.SOCK_STREAM,
)
# Wrap socket with TLS
if hasattr(handler, "socket") and handler.socket:
handler.socket = ssl_context.wrap_socket(
handler.socket,
server_hostname=server_name,
)
handler.socket.settimeout(timeout)
return handler
except Exception as e:
if attempt < retry_attempts:
self.logger.warning(
f"Syslog connection attempt {attempt}/{retry_attempts} failed: {e}. "
f"Retrying in {retry_delay} seconds..."
)
time.sleep(retry_delay)
else:
self.logger.error(
f"Syslog connection failed after {retry_attempts} attempts: {e}"
)
raise
else:
raise ValueError(
f"Invalid protocol '{protocol}'. Must be 'udp', 'tcp', or 'tls'."
)
def save_aggregate_report_to_syslog(self, aggregate_reports: list[dict[str, Any]]):
rows = parsed_aggregate_reports_to_csv_rows(aggregate_reports)
for row in rows:
self.logger.info(json.dumps(row))
def save_forensic_report_to_syslog(self, forensic_reports):
rows = parsed_forensic_reports_to_csv_rows(forensic_reports)
def save_failure_report_to_syslog(self, failure_reports: list[dict[str, Any]]):
rows = parsed_failure_reports_to_csv_rows(failure_reports)
for row in rows:
self.logger.info(json.dumps(row))
def save_smtp_tls_report_to_syslog(self, smtp_tls_reports):
def save_smtp_tls_report_to_syslog(self, smtp_tls_reports: list[dict[str, Any]]):
rows = parsed_smtp_tls_reports_to_csv_rows(smtp_tls_reports)
for row in rows:
self.logger.info(json.dumps(row))
# Backward-compatible aliases
SyslogClient.save_forensic_report_to_syslog = SyslogClient.save_failure_report_to_syslog
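A minimal usage sketch of the expanded SyslogClient constructor, assuming the class imports as parsedmarc.syslog.SyslogClient; the server name and CA path are placeholders, and aggregate_reports / failure_reports stand in for parsed report lists:

from parsedmarc.syslog import SyslogClient

client = SyslogClient(
    "syslog.example.com",
    6514,
    protocol="tls",                              # "udp" (default), "tcp", or "tls"
    cafile_path="/etc/ssl/certs/ca-bundle.crt",  # verify the server certificate
    timeout=10.0,
    retry_attempts=3,
    retry_delay=5,
)

client.save_aggregate_report_to_syslog(aggregate_reports)
client.save_failure_report_to_syslog(failure_reports)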

parsedmarc/types.py Normal file
View File

@@ -0,0 +1,234 @@
from __future__ import annotations
from typing import Any, Dict, List, Literal, Optional, TypedDict, Union
# NOTE: This module is intentionally Python 3.9 compatible.
# - No PEP 604 unions (A | B)
# - No typing.NotRequired / Required (3.11+) to avoid an extra dependency.
# For optional keys, use total=False TypedDicts.
ReportType = Literal["aggregate", "failure", "smtp_tls"]
class AggregateReportMetadata(TypedDict):
org_name: str
org_email: str
org_extra_contact_info: Optional[str]
report_id: str
begin_date: str
end_date: str
timespan_requires_normalization: bool
original_timespan_seconds: int
errors: List[str]
generator: Optional[str]
class AggregatePolicyPublished(TypedDict):
domain: str
adkim: str
aspf: str
p: str
sp: str
pct: str
fo: str
np: Optional[str]
testing: Optional[str]
discovery_method: Optional[str]
class IPSourceInfo(TypedDict):
ip_address: str
country: Optional[str]
reverse_dns: Optional[str]
base_domain: Optional[str]
name: Optional[str]
type: Optional[str]
class AggregateAlignment(TypedDict):
spf: bool
dkim: bool
dmarc: bool
class AggregateIdentifiers(TypedDict):
header_from: str
envelope_from: Optional[str]
envelope_to: Optional[str]
class AggregatePolicyOverrideReason(TypedDict):
type: Optional[str]
comment: Optional[str]
class AggregateAuthResultDKIM(TypedDict):
domain: str
result: str
selector: str
human_result: Optional[str]
class AggregateAuthResultSPF(TypedDict):
domain: str
result: str
scope: str
human_result: Optional[str]
class AggregateAuthResults(TypedDict):
dkim: List[AggregateAuthResultDKIM]
spf: List[AggregateAuthResultSPF]
class AggregatePolicyEvaluated(TypedDict):
disposition: str
dkim: str
spf: str
policy_override_reasons: List[AggregatePolicyOverrideReason]
class AggregateRecord(TypedDict):
interval_begin: str
interval_end: str
source: IPSourceInfo
count: int
alignment: AggregateAlignment
policy_evaluated: AggregatePolicyEvaluated
disposition: str
identifiers: AggregateIdentifiers
auth_results: AggregateAuthResults
class AggregateReport(TypedDict):
xml_schema: str
report_metadata: AggregateReportMetadata
policy_published: AggregatePolicyPublished
records: List[AggregateRecord]
class EmailAddress(TypedDict):
display_name: Optional[str]
address: str
local: Optional[str]
domain: Optional[str]
class EmailAttachment(TypedDict, total=False):
filename: Optional[str]
mail_content_type: Optional[str]
sha256: Optional[str]
ParsedEmail = TypedDict(
"ParsedEmail",
{
# This is a lightly-specified version of mailsuite/mailparser JSON.
# It focuses on the fields parsedmarc uses in failure report handling.
"headers": Dict[str, Any],
"subject": Optional[str],
"filename_safe_subject": Optional[str],
"date": Optional[str],
"from": EmailAddress,
"to": List[EmailAddress],
"cc": List[EmailAddress],
"bcc": List[EmailAddress],
"attachments": List[EmailAttachment],
"body": Optional[str],
"has_defects": bool,
"defects": Any,
"defects_categories": Any,
},
total=False,
)
class FailureReport(TypedDict):
feedback_type: Optional[str]
user_agent: Optional[str]
version: Optional[str]
original_envelope_id: Optional[str]
original_mail_from: Optional[str]
original_rcpt_to: Optional[str]
arrival_date: str
arrival_date_utc: str
authentication_results: Optional[str]
delivery_result: Optional[str]
auth_failure: List[str]
authentication_mechanisms: List[str]
dkim_domain: Optional[str]
reported_domain: str
sample_headers_only: bool
source: IPSourceInfo
sample: str
parsed_sample: ParsedEmail
# Backward-compatible alias
ForensicReport = FailureReport
class SMTPTLSFailureDetails(TypedDict):
result_type: str
failed_session_count: int
class SMTPTLSFailureDetailsOptional(SMTPTLSFailureDetails, total=False):
sending_mta_ip: str
receiving_ip: str
receiving_mx_hostname: str
receiving_mx_helo: str
additional_info_uri: str
failure_reason_code: str
ip_address: str
class SMTPTLSPolicySummary(TypedDict):
policy_domain: str
policy_type: str
successful_session_count: int
failed_session_count: int
class SMTPTLSPolicy(SMTPTLSPolicySummary, total=False):
policy_strings: List[str]
mx_host_patterns: List[str]
failure_details: List[SMTPTLSFailureDetailsOptional]
class SMTPTLSReport(TypedDict):
organization_name: str
begin_date: str
end_date: str
contact_info: Union[str, List[str]]
report_id: str
policies: List[SMTPTLSPolicy]
class AggregateParsedReport(TypedDict):
report_type: Literal["aggregate"]
report: AggregateReport
class FailureParsedReport(TypedDict):
report_type: Literal["failure"]
report: FailureReport
# Backward-compatible alias
ForensicParsedReport = FailureParsedReport
class SMTPTLSParsedReport(TypedDict):
report_type: Literal["smtp_tls"]
report: SMTPTLSReport
ParsedReport = Union[AggregateParsedReport, FailureParsedReport, SMTPTLSParsedReport]
class ParsingResults(TypedDict):
aggregate_reports: List[AggregateReport]
failure_reports: List[FailureReport]
smtp_tls_reports: List[SMTPTLSReport]
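A small sketch of consuming the ParsedReport union; the describe function is illustrative only, not part of the module. Because report_type is a Literal discriminator, type checkers can narrow each branch to the matching TypedDict:

from parsedmarc.types import ParsedReport

def describe(parsed: ParsedReport) -> str:
    if parsed["report_type"] == "aggregate":
        return parsed["report"]["report_metadata"]["org_name"]
    if parsed["report_type"] == "failure":
        return parsed["report"]["reported_domain"]
    return parsed["report"]["organization_name"]  # smtp_tls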

View File

@@ -1,22 +1,26 @@
# -*- coding: utf-8 -*-
"""Utility functions that might be useful for other projects"""
import logging
import os
from datetime import datetime
from datetime import timezone
from datetime import timedelta
from collections import OrderedDict
import tempfile
import subprocess
import shutil
import mailparser
import json
import hashlib
from __future__ import annotations
import base64
import mailbox
import re
import csv
import hashlib
import io
import json
import logging
import mailbox
import os
import re
import shutil
import subprocess
import tempfile
from datetime import datetime, timedelta, timezone
from typing import Optional, TypedDict, Union, cast
import mailparser
from expiringdict import ExpiringDict
try:
from importlib.resources import files
@@ -25,19 +29,19 @@ except ImportError:
from importlib.resources import files
from dateutil.parser import parse as parse_date
import dns.reversename
import dns.resolver
import dns.exception
import dns.resolver
import dns.reversename
import geoip2.database
import geoip2.errors
import publicsuffixlist
import requests
from dateutil.parser import parse as parse_date
from parsedmarc.log import logger
import parsedmarc.resources.dbip
import parsedmarc.resources.maps
from parsedmarc.constants import USER_AGENT
from parsedmarc.log import logger
parenthesis_regex = re.compile(r"\s*\(.*\)\s*")
@@ -60,25 +64,42 @@ class DownloadError(RuntimeError):
"""Raised when an error occurs when downloading a file"""
def decode_base64(data):
class ReverseDNSService(TypedDict):
name: str
type: Optional[str]
ReverseDNSMap = dict[str, ReverseDNSService]
class IPAddressInfo(TypedDict):
ip_address: str
reverse_dns: Optional[str]
country: Optional[str]
base_domain: Optional[str]
name: Optional[str]
type: Optional[str]
def decode_base64(data: str) -> bytes:
"""
Decodes a base64 string, with padding being optional
Args:
data: A base64 encoded string
data (str): A base64 encoded string
Returns:
bytes: The decoded bytes
"""
data = bytes(data, encoding="ascii")
missing_padding = len(data) % 4
data_bytes = bytes(data, encoding="ascii")
missing_padding = len(data_bytes) % 4
if missing_padding != 0:
data += b"=" * (4 - missing_padding)
return base64.b64decode(data)
data_bytes += b"=" * (4 - missing_padding)
return base64.b64decode(data_bytes)
def get_base_domain(domain):
def get_base_domain(domain: str) -> Optional[str]:
"""
Gets the base domain name for the given domain
@@ -102,7 +123,14 @@ def get_base_domain(domain):
return publicsuffix
def query_dns(domain, record_type, cache=None, nameservers=None, timeout=2.0):
def query_dns(
domain: str,
record_type: str,
*,
cache: Optional[ExpiringDict] = None,
nameservers: Optional[list[str]] = None,
timeout: float = 2.0,
) -> list[str]:
"""
Queries DNS
@@ -121,9 +149,9 @@ def query_dns(domain, record_type, cache=None, nameservers=None, timeout=2.0):
record_type = record_type.upper()
cache_key = "{0}_{1}".format(domain, record_type)
if cache:
records = cache.get(cache_key, None)
if records:
return records
cached_records = cache.get(cache_key, None)
if isinstance(cached_records, list):
return cast(list[str], cached_records)
resolver = dns.resolver.Resolver()
timeout = float(timeout)
@@ -137,33 +165,25 @@ def query_dns(domain, record_type, cache=None, nameservers=None, timeout=2.0):
resolver.nameservers = nameservers
resolver.timeout = timeout
resolver.lifetime = timeout
if record_type == "TXT":
resource_records = list(
map(
lambda r: r.strings,
resolver.resolve(domain, record_type, lifetime=timeout),
)
)
_resource_record = [
resource_record[0][:0].join(resource_record)
for resource_record in resource_records
if resource_record
]
records = [r.decode() for r in _resource_record]
else:
records = list(
map(
lambda r: r.to_text().replace('"', "").rstrip("."),
resolver.resolve(domain, record_type, lifetime=timeout),
)
records = list(
map(
lambda r: r.to_text().replace('"', "").rstrip("."),
resolver.resolve(domain, record_type, lifetime=timeout),
)
)
if cache:
cache[cache_key] = records
return records
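A minimal sketch of calling query_dns with its now keyword-only options, assuming it imports from parsedmarc.utils; the nameserver addresses are examples:

from expiringdict import ExpiringDict
from parsedmarc.utils import query_dns

dns_cache = ExpiringDict(max_len=10000, max_age_seconds=1800)
txt_records = query_dns(
    "example.com",
    "TXT",
    cache=dns_cache,
    nameservers=["1.1.1.1", "8.8.8.8"],
    timeout=2.0,
)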
def get_reverse_dns(ip_address, cache=None, nameservers=None, timeout=2.0):
def get_reverse_dns(
ip_address,
*,
cache: Optional[ExpiringDict] = None,
nameservers: Optional[list[str]] = None,
timeout: float = 2.0,
) -> Optional[str]:
"""
Resolves an IP address to a hostname using a reverse DNS query
@@ -181,7 +201,7 @@ def get_reverse_dns(ip_address, cache=None, nameservers=None, timeout=2.0):
try:
address = dns.reversename.from_address(ip_address)
hostname = query_dns(
address, "PTR", cache=cache, nameservers=nameservers, timeout=timeout
str(address), "PTR", cache=cache, nameservers=nameservers, timeout=timeout
)[0]
except dns.exception.DNSException as e:
@@ -191,7 +211,7 @@ def get_reverse_dns(ip_address, cache=None, nameservers=None, timeout=2.0):
return hostname
def timestamp_to_datetime(timestamp):
def timestamp_to_datetime(timestamp: int) -> datetime:
"""
Converts a UNIX/DMARC timestamp to a Python ``datetime`` object
@@ -204,7 +224,7 @@ def timestamp_to_datetime(timestamp):
return datetime.fromtimestamp(int(timestamp))
def timestamp_to_human(timestamp):
def timestamp_to_human(timestamp: int) -> str:
"""
Converts a UNIX/DMARC timestamp to a human-readable string
@@ -217,7 +237,9 @@ def timestamp_to_human(timestamp):
return timestamp_to_datetime(timestamp).strftime("%Y-%m-%d %H:%M:%S")
def human_timestamp_to_datetime(human_timestamp, to_utc=False):
def human_timestamp_to_datetime(
human_timestamp: str, *, to_utc: bool = False
) -> datetime:
"""
Converts a human-readable timestamp into a Python ``datetime`` object
@@ -236,7 +258,7 @@ def human_timestamp_to_datetime(human_timestamp, to_utc=False):
return dt.astimezone(timezone.utc) if to_utc else dt
def human_timestamp_to_unix_timestamp(human_timestamp):
def human_timestamp_to_unix_timestamp(human_timestamp: str) -> int:
"""
Converts a human-readable timestamp into a UNIX timestamp
@@ -247,10 +269,12 @@ def human_timestamp_to_unix_timestamp(human_timestamp):
int: The converted UNIX timestamp
"""
human_timestamp = human_timestamp.replace("T", " ")
return human_timestamp_to_datetime(human_timestamp).timestamp()
return int(human_timestamp_to_datetime(human_timestamp).timestamp())
def get_ip_address_country(ip_address, db_path=None):
def get_ip_address_country(
ip_address: str, *, db_path: Optional[str] = None
) -> Optional[str]:
"""
Returns the ISO code for the country associated
with the given IPv4 or IPv6 address
@@ -277,7 +301,7 @@ def get_ip_address_country(ip_address, db_path=None):
]
if db_path is not None:
if os.path.isfile(db_path) is False:
if not os.path.isfile(db_path):
db_path = None
logger.warning(
f"No file exists at {db_path}. Falling back to an "
@@ -314,12 +338,13 @@ def get_ip_address_country(ip_address, db_path=None):
def get_service_from_reverse_dns_base_domain(
base_domain,
always_use_local_file=False,
local_file_path=None,
url=None,
offline=False,
reverse_dns_map=None,
):
*,
always_use_local_file: bool = False,
local_file_path: Optional[str] = None,
url: Optional[str] = None,
offline: bool = False,
reverse_dns_map: Optional[ReverseDNSMap] = None,
) -> ReverseDNSService:
"""
Returns the service name of a given base domain name from reverse DNS.
@@ -336,12 +361,6 @@ def get_service_from_reverse_dns_base_domain(
the supplied reverse_dns_base_domain and the type will be None
"""
def load_csv(_csv_file):
reader = csv.DictReader(_csv_file)
for row in reader:
key = row["base_reverse_dns"].lower().strip()
reverse_dns_map[key] = dict(name=row["name"], type=row["type"])
base_domain = base_domain.lower().strip()
if url is None:
url = (
@@ -349,11 +368,24 @@ def get_service_from_reverse_dns_base_domain(
"/parsedmarc/master/parsedmarc/"
"resources/maps/base_reverse_dns_map.csv"
)
reverse_dns_map_value: ReverseDNSMap
if reverse_dns_map is None:
reverse_dns_map = dict()
reverse_dns_map_value = {}
else:
reverse_dns_map_value = reverse_dns_map
def load_csv(_csv_file):
reader = csv.DictReader(_csv_file)
for row in reader:
key = row["base_reverse_dns"].lower().strip()
reverse_dns_map_value[key] = {
"name": row["name"],
"type": row["type"],
}
csv_file = io.StringIO()
if not (offline or always_use_local_file) and len(reverse_dns_map) == 0:
if not (offline or always_use_local_file) and len(reverse_dns_map_value) == 0:
try:
logger.debug(f"Trying to fetch reverse DNS map from {url}...")
headers = {"User-Agent": USER_AGENT}
@@ -370,7 +402,7 @@ def get_service_from_reverse_dns_base_domain(
logging.debug("Response body:")
logger.debug(csv_file.read())
if len(reverse_dns_map) == 0:
if len(reverse_dns_map_value) == 0:
logger.info("Loading included reverse DNS map...")
path = str(
files(parsedmarc.resources.maps).joinpath("base_reverse_dns_map.csv")
@@ -379,26 +411,28 @@ def get_service_from_reverse_dns_base_domain(
path = local_file_path
with open(path) as csv_file:
load_csv(csv_file)
service: ReverseDNSService
try:
service = reverse_dns_map[base_domain]
service = reverse_dns_map_value[base_domain]
except KeyError:
service = dict(name=base_domain, type=None)
service = {"name": base_domain, "type": None}
return service
def get_ip_address_info(
ip_address,
ip_db_path=None,
reverse_dns_map_path=None,
always_use_local_files=False,
reverse_dns_map_url=None,
cache=None,
reverse_dns_map=None,
offline=False,
nameservers=None,
timeout=2.0,
):
*,
ip_db_path: Optional[str] = None,
reverse_dns_map_path: Optional[str] = None,
always_use_local_files: bool = False,
reverse_dns_map_url: Optional[str] = None,
cache: Optional[ExpiringDict] = None,
reverse_dns_map: Optional[ReverseDNSMap] = None,
offline: bool = False,
nameservers: Optional[list[str]] = None,
timeout: float = 2.0,
) -> IPAddressInfo:
"""
Returns reverse DNS and country information for the given IP address
@@ -416,17 +450,27 @@ def get_ip_address_info(
timeout (float): Sets the DNS timeout in seconds
Returns:
OrderedDict: ``ip_address``, ``reverse_dns``
dict: ``ip_address``, ``reverse_dns``, ``country``
"""
ip_address = ip_address.lower()
if cache is not None:
info = cache.get(ip_address, None)
if info:
cached_info = cache.get(ip_address, None)
if (
cached_info
and isinstance(cached_info, dict)
and "ip_address" in cached_info
):
logger.debug(f"IP address {ip_address} was found in cache")
return info
info = OrderedDict()
info["ip_address"] = ip_address
return cast(IPAddressInfo, cached_info)
info: IPAddressInfo = {
"ip_address": ip_address,
"reverse_dns": None,
"country": None,
"base_domain": None,
"name": None,
"type": None,
}
if offline:
reverse_dns = None
else:
@@ -436,9 +480,6 @@ def get_ip_address_info(
country = get_ip_address_country(ip_address, db_path=ip_db_path)
info["country"] = country
info["reverse_dns"] = reverse_dns
info["base_domain"] = None
info["name"] = None
info["type"] = None
if reverse_dns is not None:
base_domain = get_base_domain(reverse_dns)
if base_domain is not None:
@@ -463,7 +504,7 @@ def get_ip_address_info(
return info
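A minimal sketch of the keyword-only get_ip_address_info call and the fully populated IPAddressInfo dict it now returns, assuming it imports from parsedmarc.utils; the IP address is a documentation example:

from parsedmarc.utils import get_ip_address_info

info = get_ip_address_info(
    "203.0.113.10",
    offline=False,
    nameservers=["1.1.1.1"],
    timeout=2.0,
)
print(info["reverse_dns"], info["country"], info["name"], info["type"])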
def parse_email_address(original_address):
def parse_email_address(original_address: str) -> dict[str, Optional[str]]:
if original_address[0] == "":
display_name = None
else:
@@ -476,17 +517,15 @@ def parse_email_address(original_address):
local = address_parts[0].lower()
domain = address_parts[-1].lower()
return OrderedDict(
[
("display_name", display_name),
("address", address),
("local", local),
("domain", domain),
]
)
return {
"display_name": display_name,
"address": address,
"local": local,
"domain": domain,
}
def get_filename_safe_string(string):
def get_filename_safe_string(string: str) -> str:
"""
Converts a string to a string that is safe for a filename
@@ -508,7 +547,7 @@ def get_filename_safe_string(string):
return string
def is_mbox(path):
def is_mbox(path: str) -> bool:
"""
Checks if the given content is an MBOX mailbox file
@@ -529,7 +568,7 @@ def is_mbox(path):
return _is_mbox
def is_outlook_msg(content):
def is_outlook_msg(content) -> bool:
"""
Checks if the given content is an Outlook msg OLE/MSG file
@@ -544,7 +583,7 @@ def is_outlook_msg(content):
)
def convert_outlook_msg(msg_bytes):
def convert_outlook_msg(msg_bytes: bytes) -> bytes:
"""
Uses the ``msgconvert`` Perl utility to convert an Outlook MS file to
standard RFC 822 format
@@ -553,7 +592,7 @@ def convert_outlook_msg(msg_bytes):
msg_bytes (bytes): the content of the .msg file
Returns:
A RFC 822 string
An RFC 822 message as bytes
"""
if not is_outlook_msg(msg_bytes):
raise ValueError("The supplied bytes are not an Outlook MSG file")
@@ -580,7 +619,9 @@ def convert_outlook_msg(msg_bytes):
return rfc822
def parse_email(data, strip_attachment_payloads=False):
def parse_email(
data: Union[bytes, str], *, strip_attachment_payloads: bool = False
) -> dict:
"""
A simplified email parser

View File

@@ -1,3 +1,9 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
from typing import Any, Optional, Union
import requests
from parsedmarc import logger
@@ -7,17 +13,23 @@ from parsedmarc.constants import USER_AGENT
class WebhookClient(object):
"""A client for webhooks"""
def __init__(self, aggregate_url, forensic_url, smtp_tls_url, timeout=60):
def __init__(
self,
aggregate_url: str,
failure_url: str,
smtp_tls_url: str,
timeout: Optional[int] = 60,
):
"""
Initializes the WebhookClient
Args:
aggregate_url (str): The aggregate report webhook url
forensic_url (str): The forensic report webhook url
failure_url (str): The failure report webhook url
smtp_tls_url (str): The smtp_tls report webhook url
timeout (int): The timeout to use when calling the webhooks
"""
self.aggregate_url = aggregate_url
self.forensic_url = forensic_url
self.failure_url = failure_url
self.smtp_tls_url = smtp_tls_url
self.timeout = timeout
self.session = requests.Session()
@@ -26,26 +38,32 @@ class WebhookClient(object):
"Content-Type": "application/json",
}
def save_forensic_report_to_webhook(self, report):
def save_failure_report_to_webhook(self, report: str):
try:
self._send_to_webhook(self.forensic_url, report)
self._send_to_webhook(self.failure_url, report)
except Exception as error_:
logger.error("Webhook Error: {0}".format(error_.__str__()))
def save_smtp_tls_report_to_webhook(self, report):
def save_smtp_tls_report_to_webhook(self, report: str):
try:
self._send_to_webhook(self.smtp_tls_url, report)
except Exception as error_:
logger.error("Webhook Error: {0}".format(error_.__str__()))
def save_aggregate_report_to_webhook(self, report):
def save_aggregate_report_to_webhook(self, report: str):
try:
self._send_to_webhook(self.aggregate_url, report)
except Exception as error_:
logger.error("Webhook Error: {0}".format(error_.__str__()))
def _send_to_webhook(self, webhook_url, payload):
def _send_to_webhook(
self, webhook_url: str, payload: Union[bytes, str, dict[str, Any]]
):
try:
self.session.post(webhook_url, data=payload, timeout=self.timeout)
except Exception as error_:
logger.error("Webhook Error: {0}".format(error_.__str__()))
# Backward-compatible aliases
WebhookClient.save_forensic_report_to_webhook = WebhookClient.save_failure_report_to_webhook
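A minimal usage sketch of the renamed webhook parameter and method, assuming the class imports as parsedmarc.webhook.WebhookClient; the URLs are placeholders and failure_report_json stands in for the JSON payload of a parsed failure report:

from parsedmarc.webhook import WebhookClient

client = WebhookClient(
    aggregate_url="https://hooks.example.com/dmarc/aggregate",
    failure_url="https://hooks.example.com/dmarc/failure",  # formerly forensic_url
    smtp_tls_url="https://hooks.example.com/dmarc/smtp-tls",
    timeout=30,
)

client.save_failure_report_to_webhook(failure_report_json)   # new name
client.save_forensic_report_to_webhook(failure_report_json)  # legacy alias still works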

View File

@@ -2,6 +2,7 @@
requires = [
"hatchling>=1.27.0",
]
requires_python = ">=3.9,<3.14"
build-backend = "hatchling.build"
[project]
@@ -28,6 +29,7 @@ classifiers = [
"Operating System :: OS Independent",
"Programming Language :: Python :: 3"
]
requires-python = ">=3.9"
dependencies = [
"azure-identity>=1.8.0",
"azure-monitor-ingestion>=1.0.0",
@@ -46,7 +48,7 @@ dependencies = [
"imapclient>=2.1.0",
"kafka-python-ng>=2.2.2",
"lxml>=4.4.0",
"mailsuite>=1.9.18",
"mailsuite>=1.11.2",
"msgraph-core==0.2.2",
"opensearch-py>=2.4.2,<=3.0.0",
"publicsuffixlist>=0.10.0",
@@ -86,11 +88,11 @@ include = [
[tool.hatch.build]
exclude = [
"base_reverse_dns.csv",
"find_bad_utf8.py",
"find_unknown_base_reverse_dns.py",
"unknown_base_reverse_dns.csv",
"sortmaps.py",
"README.md",
"*.bak"
"base_reverse_dns.csv",
"find_bad_utf8.py",
"find_unknown_base_reverse_dns.py",
"unknown_base_reverse_dns.csv",
"sortmaps.py",
"README.md",
"*.bak"
]

View File

@@ -0,0 +1,48 @@
<feedback xmlns="urn:ietf:params:xml:ns:dmarc-2.0">
<version>1.0</version>
<report_metadata>
<org_name>Sample Reporter</org_name>
<email>report_sender@example-reporter.com</email>
<extra_contact_info>...</extra_contact_info>
<report_id>3v98abbp8ya9n3va8yr8oa3ya</report_id>
<date_range>
<begin>302832000</begin>
<end>302918399</end>
</date_range>
<generator>Example DMARC Aggregate Reporter v1.2</generator>
</report_metadata>
<policy_published>
<domain>example.com</domain>
<p>quarantine</p>
<sp>none</sp>
<np>none</np>
<testing>n</testing>
<discovery_method>treewalk</discovery_method>
</policy_published>
<record>
<row>
<source_ip>192.0.2.123</source_ip>
<count>123</count>
<policy_evaluated>
<disposition>pass</disposition>
<dkim>pass</dkim>
<spf>fail</spf>
</policy_evaluated>
</row>
<identifiers>
<envelope_from>example.com</envelope_from>
<header_from>example.com</header_from>
</identifiers>
<auth_results>
<dkim>
<domain>example.com</domain>
<result>pass</result>
<selector>abc123</selector>
</dkim>
<spf>
<domain>example.com</domain>
<result>fail</result>
</spf>
</auth_results>
</record>
</feedback>
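
As the tests further down exercise, this draft sample parses through the regular aggregate pipeline; a minimal sketch reading the new DMARCbis fields (the path, keys, and expected values are taken from the test suite below):

import parsedmarc

result = parsedmarc.parse_report_file(
    "samples/aggregate/dmarcbis-draft-sample.xml",
    always_use_local_files=True,
    offline=True,  # skip DNS enrichment
)
report = result["report"]
policy = report["policy_published"]

# New metadata and policy_published fields from the DMARCbis schema
print(report["report_metadata"]["generator"])  # Example DMARC Aggregate Reporter v1.2
print(policy["np"], policy["testing"], policy["discovery_method"])  # none n treewalk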


@@ -0,0 +1,77 @@
<?xml version="1.0"?>
<feedback>
<version>2.0</version>
<report_metadata>
<org_name>example.net</org_name>
<email>postmaster@example.net</email>
<report_id>dmarcbis-test-report-001</report_id>
<date_range>
<begin>1700000000</begin>
<end>1700086399</end>
</date_range>
</report_metadata>
<policy_published>
<domain>example.com</domain>
<adkim>s</adkim>
<aspf>s</aspf>
<p>reject</p>
<sp>quarantine</sp>
<np>reject</np>
<testing>y</testing>
<discovery_method>treewalk</discovery_method>
<fo>1</fo>
</policy_published>
<record>
<row>
<source_ip>198.51.100.1</source_ip>
<count>5</count>
<policy_evaluated>
<disposition>none</disposition>
<dkim>pass</dkim>
<spf>pass</spf>
</policy_evaluated>
</row>
<identifiers>
<envelope_from>example.com</envelope_from>
<header_from>example.com</header_from>
</identifiers>
<auth_results>
<dkim>
<domain>example.com</domain>
<selector>selector1</selector>
<result>pass</result>
</dkim>
<spf>
<domain>example.com</domain>
<scope>mfrom</scope>
<result>pass</result>
</spf>
</auth_results>
</record>
<record>
<row>
<source_ip>203.0.113.10</source_ip>
<count>2</count>
<policy_evaluated>
<disposition>reject</disposition>
<dkim>fail</dkim>
<spf>fail</spf>
<reason>
<type>other</type>
<comment>sender not authorized</comment>
</reason>
</policy_evaluated>
</row>
<identifiers>
<envelope_from>spoofed.example.com</envelope_from>
<header_from>example.com</header_from>
</identifiers>
<auth_results>
<spf>
<domain>spoofed.example.com</domain>
<scope>mfrom</scope>
<result>fail</result>
</spf>
</auth_results>
</record>
</feedback>
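
This second sample adds a failing record and explicit testing/discovery_method values; a minimal sketch (reusing the same parse_report_file call as the tests below) that walks the parsed records and exports CSV, whose header row now also carries np, testing, and discovery_method:

import parsedmarc

result = parsedmarc.parse_report_file(
    "samples/aggregate/"
    "dmarcbis-example.net!example.com!1700000000!1700086399.xml",
    always_use_local_files=True,
    offline=True,
)
report = result["report"]

for record in report["records"]:
    evaluated = record["policy_evaluated"]
    print(
        record["source"]["ip_address"],
        record["count"],
        evaluated["disposition"],
        evaluated["dkim"],
        evaluated["spf"],
    )

csv_text = parsedmarc.parsed_aggregate_reports_to_csv(report)
print(csv_text.split("\n")[0])  # header row includes np, testing, discovery_method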


@@ -60,10 +60,10 @@ Create Dashboards
9. Click Save
10. Click Dashboards
11. Click Create New Dashboard
12. Use a descriptive title, such as "Forensic DMARC Data"
12. Use a descriptive title, such as "Failure DMARC Data"
13. Click Create Dashboard
14. Click on the Source button
15. Paste the content of ``dmarc_forensic_dashboard.xml`` into the source editor
15. Paste the content of ``dmarc_failure_dashboard.xml`` into the source editor
16. If the index storing the DMARC data is not named email, replace index="email" accordingly
17. Click Save


@@ -1,8 +1,8 @@
<form theme="dark" version="1.1">
<label>Forensic DMARC Data</label>
<label>Failure DMARC Data</label>
<search id="base_search">
<query>
index="email" sourcetype="dmarc:forensic" parsed_sample.headers.From=$header_from$ parsed_sample.headers.To=$header_to$ parsed_sample.headers.Subject=$header_subject$ source.ip_address=$source_ip_address$ source.reverse_dns=$source_reverse_dns$ source.country=$source_country$
index="email" (sourcetype="dmarc:failure" OR sourcetype="dmarc:forensic") parsed_sample.headers.From=$header_from$ parsed_sample.headers.To=$header_to$ parsed_sample.headers.Subject=$header_subject$ source.ip_address=$source_ip_address$ source.reverse_dns=$source_reverse_dns$ source.country=$source_country$
| table *
</query>
<earliest>$time_range.earliest$</earliest>
@@ -43,7 +43,7 @@
</fieldset>
<row>
<panel>
<title>Forensic samples</title>
<title>Failure samples</title>
<table>
<search base="base_search">
<query>| table arrival_date_utc authentication_results parsed_sample.headers.From,parsed_sample.headers.To,parsed_sample.headers.Subject | sort -arrival_date_utc</query>
@@ -59,7 +59,7 @@
</row>
<row>
<panel>
<title>Forensic samples by country</title>
<title>Failure samples by country</title>
<map>
<search base="base_search">
<query>| iplocation source.ip_address| stats count by Country | geom geo_countries featureIdField="Country"</query>
@@ -72,7 +72,7 @@
</row>
<row>
<panel>
<title>Forensic samples by IP address</title>
<title>Failure samples by IP address</title>
<table>
<search base="base_search">
<query>| iplocation source.ip_address | stats count by source.ip_address,source.reverse_dns | sort -count</query>
@@ -85,7 +85,7 @@
</table>
</panel>
<panel>
<title>Forensic samples by country ISO code</title>
<title>Failure samples by country ISO code</title>
<table>
<search base="base_search">
<query>| stats count by source.country | sort - count</query>

176 tests.py Normal file → Executable file

@@ -1,3 +1,6 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function, unicode_literals
import os
@@ -9,6 +12,9 @@ from lxml import etree
import parsedmarc
import parsedmarc.utils
# Detect if running in GitHub Actions to skip DNS lookups
OFFLINE_MODE = os.environ.get("GITHUB_ACTIONS", "false").lower() == "true"
def minify_xml(xml_string):
parser = etree.XMLParser(remove_blank_text=True)
@@ -74,7 +80,7 @@ class Test(unittest.TestCase):
print()
file = "samples/extract_report/nice-input.xml"
print("Testing {0}: ".format(file), end="")
xmlout = parsedmarc.extract_report(file)
xmlout = parsedmarc.extract_report_from_file_path(file)
xmlin_file = open("samples/extract_report/nice-input.xml")
xmlin = xmlin_file.read()
xmlin_file.close()
@@ -118,7 +124,7 @@ class Test(unittest.TestCase):
continue
print("Testing {0}: ".format(sample_path), end="")
parsed_report = parsedmarc.parse_report_file(
sample_path, always_use_local_files=True
sample_path, always_use_local_files=True, offline=OFFLINE_MODE
)["report"]
parsedmarc.parsed_aggregate_reports_to_csv(parsed_report)
print("Passed!")
@@ -126,21 +132,173 @@ class Test(unittest.TestCase):
def testEmptySample(self):
"""Test empty/unparasable report"""
with self.assertRaises(parsedmarc.ParserError):
parsedmarc.parse_report_file("samples/empty.xml")
parsedmarc.parse_report_file("samples/empty.xml", offline=OFFLINE_MODE)
def testForensicSamples(self):
"""Test sample forensic/ruf/failure DMARC reports"""
"""Test sample failure/ruf DMARC reports"""
print()
sample_paths = glob("samples/forensic/*.eml")
for sample_path in sample_paths:
print("Testing {0}: ".format(sample_path), end="")
with open(sample_path) as sample_file:
sample_content = sample_file.read()
parsed_report = parsedmarc.parse_report_email(sample_content)["report"]
parsed_report = parsedmarc.parse_report_file(sample_path)["report"]
parsedmarc.parsed_forensic_reports_to_csv(parsed_report)
parsed_report = parsedmarc.parse_report_email(
sample_content, offline=OFFLINE_MODE
)["report"]
parsed_report = parsedmarc.parse_report_file(
sample_path, offline=OFFLINE_MODE
)["report"]
parsedmarc.parsed_failure_reports_to_csv(parsed_report)
print("Passed!")
def testFailureReportBackwardCompat(self):
"""Test that old forensic function aliases still work"""
self.assertIs(
parsedmarc.parse_forensic_report,
parsedmarc.parse_failure_report,
)
self.assertIs(
parsedmarc.parsed_forensic_reports_to_csv,
parsedmarc.parsed_failure_reports_to_csv,
)
self.assertIs(
parsedmarc.parsed_forensic_reports_to_csv_rows,
parsedmarc.parsed_failure_reports_to_csv_rows,
)
self.assertIs(
parsedmarc.InvalidForensicReport,
parsedmarc.InvalidFailureReport,
)
def testDMARCbisDraftSample(self):
"""Test parsing the sample report from the DMARCbis aggregate draft"""
print()
sample_path = (
"samples/aggregate/dmarcbis-draft-sample.xml"
)
print("Testing {0}: ".format(sample_path), end="")
result = parsedmarc.parse_report_file(
sample_path, always_use_local_files=True, offline=True
)
report = result["report"]
# Verify report_type
self.assertEqual(result["report_type"], "aggregate")
# Verify xml_schema
self.assertEqual(report["xml_schema"], "1.0")
# Verify report_metadata
metadata = report["report_metadata"]
self.assertEqual(metadata["org_name"], "Sample Reporter")
self.assertEqual(
metadata["org_email"], "report_sender@example-reporter.com"
)
self.assertEqual(
metadata["org_extra_contact_info"], "..."
)
self.assertEqual(
metadata["report_id"], "3v98abbp8ya9n3va8yr8oa3ya"
)
self.assertEqual(
metadata["generator"],
"Example DMARC Aggregate Reporter v1.2",
)
# Verify DMARCbis policy_published fields
pp = report["policy_published"]
self.assertEqual(pp["domain"], "example.com")
self.assertEqual(pp["p"], "quarantine")
self.assertEqual(pp["sp"], "none")
self.assertEqual(pp["np"], "none")
self.assertEqual(pp["testing"], "n")
self.assertEqual(pp["discovery_method"], "treewalk")
# adkim/aspf/pct/fo default when not in XML
self.assertEqual(pp["adkim"], "r")
self.assertEqual(pp["aspf"], "r")
self.assertEqual(pp["pct"], "100")
self.assertEqual(pp["fo"], "0")
# Verify record
self.assertEqual(len(report["records"]), 1)
rec = report["records"][0]
self.assertEqual(rec["source"]["ip_address"], "192.0.2.123")
self.assertEqual(rec["count"], 123)
self.assertEqual(
rec["policy_evaluated"]["disposition"], "pass"
)
self.assertEqual(rec["policy_evaluated"]["dkim"], "pass")
self.assertEqual(rec["policy_evaluated"]["spf"], "fail")
# Verify DKIM auth result with human_result
self.assertEqual(len(rec["auth_results"]["dkim"]), 1)
dkim = rec["auth_results"]["dkim"][0]
self.assertEqual(dkim["domain"], "example.com")
self.assertEqual(dkim["selector"], "abc123")
self.assertEqual(dkim["result"], "pass")
self.assertIsNone(dkim["human_result"])
# Verify SPF auth result with human_result
self.assertEqual(len(rec["auth_results"]["spf"]), 1)
spf = rec["auth_results"]["spf"][0]
self.assertEqual(spf["domain"], "example.com")
self.assertEqual(spf["result"], "fail")
self.assertIsNone(spf["human_result"])
# Verify CSV output includes new fields
csv = parsedmarc.parsed_aggregate_reports_to_csv(report)
header = csv.split("\n")[0]
self.assertIn("np", header.split(","))
self.assertIn("testing", header.split(","))
self.assertIn("discovery_method", header.split(","))
print("Passed!")
def testDMARCbisFieldsWithRFC7489(self):
"""Test that RFC 7489 reports have None for DMARCbis-only fields"""
print()
sample_path = (
"samples/aggregate/"
"example.net!example.com!1529366400!1529452799.xml"
)
print("Testing {0}: ".format(sample_path), end="")
result = parsedmarc.parse_report_file(
sample_path, always_use_local_files=True, offline=True
)
report = result["report"]
pp = report["policy_published"]
# RFC 7489 fields present
self.assertEqual(pp["pct"], "100")
self.assertEqual(pp["fo"], "0")
# DMARCbis fields absent (None)
self.assertIsNone(pp["np"])
self.assertIsNone(pp["testing"])
self.assertIsNone(pp["discovery_method"])
# generator absent (None)
self.assertIsNone(report["report_metadata"]["generator"])
print("Passed!")
def testDMARCbisWithExplicitFields(self):
"""Test DMARCbis report with explicit testing and discovery_method"""
print()
sample_path = (
"samples/aggregate/"
"dmarcbis-example.net!example.com!1700000000!1700086399.xml"
)
print("Testing {0}: ".format(sample_path), end="")
result = parsedmarc.parse_report_file(
sample_path, always_use_local_files=True, offline=True
)
report = result["report"]
pp = report["policy_published"]
self.assertEqual(pp["np"], "reject")
self.assertEqual(pp["testing"], "y")
self.assertEqual(pp["discovery_method"], "treewalk")
print("Passed!")
def testSmtpTlsSamples(self):
"""Test sample SMTP TLS reports"""
print()
@@ -149,7 +307,9 @@ class Test(unittest.TestCase):
if os.path.isdir(sample_path):
continue
print("Testing {0}: ".format(sample_path), end="")
parsed_report = parsedmarc.parse_report_file(sample_path)["report"]
parsed_report = parsedmarc.parse_report_file(
sample_path, offline=OFFLINE_MODE
)["report"]
parsedmarc.parsed_smtp_tls_reports_to_csv(parsed_report)
print("Passed!")