Compare commits

...

39 Commits

Author SHA1 Message Date
Blackmoon
221bc332ef Fixed a typo in policies.successful_session_count (#654) 2026-02-09 13:57:11 -05:00
Sean Whalen
a2a75f7a81 Fix timestamp parsing in aggregate report by removing fractional seconds 2026-01-21 13:08:48 -05:00
Anael Mobilia
50fcb51577 Update supported Python versions in docs + readme (#652)
* Update README.md

* Update index.md

* Update python-tests.yml
2026-01-19 14:40:01 -05:00
Sean Whalen
dd9ef90773 9.0.10
- Support Python 3.14+
2026-01-17 14:09:18 -05:00
Sean Whalen
0e3a4b0f06 9.0.9
Validate that a string is base64-encoded before trying to base64 decode it. (PRs #648 and #649)
2026-01-08 13:29:23 -05:00
maraspr
343b53ef18 remove newlines before b64decode (#649) 2026-01-08 12:24:20 -05:00
maraspr
792079a3e8 Validate that string is base64 (#648) 2026-01-08 10:15:27 -05:00
Sean Whalen
1f3a1fc843 Better typing 2025-12-29 17:14:54 -05:00
Sean Whalen
34fa0c145d 9.0.8
- Fix logging configuration not propagating to child parser processes (#646).
- Update `mailsuite` dependency to `?=1.11.1` to solve issues with iCloud IMAP (#493).
2025-12-29 17:07:38 -05:00
Copilot
6719a06388 Fix logging configuration not propagating to child parser processes (#646)
* Initial plan

* Fix logging configuration propagation to child parser processes

- Add _configure_logging() helper function to set up logging in child processes
- Modified cli_parse() to accept log_level and log_file parameters
- Pass current logging configuration from parent to child processes
- Logging warnings/errors from child processes now properly display

Fixes issue where logging handlers in parent process were not inherited by
child processes created via multiprocessing.Process(). Child processes now
configure their own logging with the same settings as the parent.

Tested with sample files and confirmed warnings from DNS exceptions in child
processes are now visible.

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

* Address code review feedback on logging configuration

- Use exact type check (type(h) is logging.StreamHandler) instead of isinstance
  to avoid confusion with FileHandler subclass
- Catch specific exceptions (IOError, OSError, PermissionError) instead of
  bare Exception when creating FileHandler
- Kept logging.ERROR as default to maintain consistency with existing behavior

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2025-12-29 15:07:22 -05:00
Sean Whalen
eafa435868 Code cleanup 2025-12-29 14:32:05 -05:00
Sean Whalen
5d772c3b36 Bump version to 9.0.7 and update changelog with IMAP since option fix 2025-12-29 14:23:50 -05:00
Copilot
72cabbef23 Fix IMAP SEARCH SINCE date format to RFC 3501 DD-Mon-YYYY (#645)
* Initial plan

* Fix IMAP since option date format to use RFC 3501 compliant DD-Mon-YYYY format

Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>
2025-12-29 14:18:48 -05:00
Sean Whalen
3d74cd6ac0 Update CHANGELOG with issue reference for email read status
Added a reference to issue #625 regarding email read status.
2025-12-29 12:10:19 -05:00
Tomáš Kováčik
d1ac59a016 fix #641 (#642)
* fix smtptls and forensic reports for GELF

* add policy_domain, policy_type and failed_session_count to record row

* Remove unused import of json in gelf.py

---------

Co-authored-by: Sean Whalen <44679+seanthegeek@users.noreply.github.com>
2025-12-29 12:05:07 -05:00
Anael Mobilia
7fdd53008f Update README.md (#644) 2025-12-29 10:36:21 -05:00
Sean Whalen
35331d4b84 Add parsedmarc.types module to API reference documentation 2025-12-25 17:24:45 -05:00
Sean Whalen
de9edd3590 Add note about email read status in Microsoft 365 to changelog 2025-12-25 17:16:39 -05:00
Sean Whalen
abf4bdba13 Add type annotations for SMTP TLS and forensic report structures 2025-12-25 16:39:33 -05:00
Sean Whalen
7b842740f5 Change file permissions for tests.py to make it executable 2025-12-25 16:02:33 -05:00
Sean Whalen
ebe3ccf40a Update changelog for version 9.0.6 and set version in constants.py 2025-12-25 16:01:25 -05:00
Sean Whalen
808285658f Refactor function parameters to use non-Optional types where applicable 2025-12-25 16:01:12 -05:00
Sean Whalen
bc1dae29bd Update mailsuite dependency version to 1.11.0 2025-12-25 15:32:27 -05:00
Sean Whalen
4b904444e5 Refactor and improve parsing and extraction functions
- Updated `extract_report` to handle various input types more robustly, removing unnecessary complexity and improving error handling.
- Simplified the handling of file-like objects and added checks for binary mode.
- Enhanced the `parse_report_email` function to streamline input processing and improve type handling.
- Introduced TypedDicts for better type safety in `utils.py`, specifically for reverse DNS and IP address information.
- Refined the configuration loading in `cli.py` to ensure boolean values are consistently cast to `bool`.
- Improved overall code readability and maintainability by restructuring and clarifying logic in several functions.
2025-12-25 15:30:20 -05:00
Sean Whalen
3608bce344 Remove unused import of Union and cast from cli.py 2025-12-24 16:53:22 -05:00
Sean Whalen
fe809c4c3f Add type ignore comments for Pyright in elastic.py and opensearch.py 2025-12-24 16:49:42 -05:00
Sean Whalen
a76c2f9621 More code cleanup 2025-12-24 16:36:59 -05:00
Sean Whalen
bb8f4002bf Use literal dicts instead of ordered dicts and other code cleanup 2025-12-24 15:04:10 -05:00
Sean Whalen
b5773c6b4a Fix etree import to type checkers don't complain 2025-12-24 14:37:38 -05:00
Sean Whalen
b99bd67225 Fix get_base_domain() typing 2025-12-24 14:32:05 -05:00
Sean Whalen
af9ad568ec Specify Python version requirements in pyproject.toml 2025-12-17 16:18:24 -05:00
Sean Whalen
748164d177 Fix #638 2025-12-17 16:09:26 -05:00
Sean Whalen
487e5e1149 Format on build 2025-12-12 15:56:52 -05:00
Sean Whalen
73010cf964 Use ruff for code formatting 2025-12-12 15:44:46 -05:00
Sean Whalen
a4a5475aa8 Fix another typo before releasing 9.0.5 2025-12-08 15:29:48 -05:00
Sean Whalen
dab78880df Actual 9.0.5 release
Fix typo
2025-12-08 15:26:58 -05:00
Sean Whalen
fb54e3b742 9.0.5
- Fix report type detection bug introduced in `9.0.4` (yanked).
2025-12-08 15:22:02 -05:00
Sean Whalen
6799f10364 9.0.4
Fixes

- Fix saving reports to OpenSearch ([#637](https://github.com/domainaware/parsedmarc/issues/637))
- Fix parsing certain DMARC failure/forensic reports
- Some fixes to type hints (incomplete, but published as-is due to the above bugs)
2025-12-08 13:26:59 -05:00
Sean Whalen
445c9565a4 Update bug link in docs 2025-12-06 15:05:19 -05:00
29 changed files with 1405 additions and 864 deletions

View File

@@ -30,7 +30,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]
steps:
- uses: actions/checkout@v5

302
.vscode/settings.json vendored
View File

@@ -1,150 +1,166 @@
{
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.formatOnSave": true,
// Let Ruff handle lint fixes + import sorting on save
"editor.codeActionsOnSave": {
"source.fixAll.ruff": "explicit",
"source.organizeImports.ruff": "explicit"
}
},
"markdownlint.config": {
"MD024": false
},
"cSpell.words": [
"adkim",
"akamaiedge",
"amsmath",
"andrewmcgilvray",
"arcname",
"aspf",
"autoclass",
"automodule",
"backported",
"bellsouth",
"boto",
"brakhane",
"Brightmail",
"CEST",
"CHACHA",
"checkdmarc",
"Codecov",
"confnew",
"dateparser",
"dateutil",
"Davmail",
"DBIP",
"dearmor",
"deflist",
"devel",
"DMARC",
"Dmarcian",
"dnspython",
"dollarmath",
"dpkg",
"exampleuser",
"expiringdict",
"fieldlist",
"GELF",
"genindex",
"geoip",
"geoipupdate",
"Geolite",
"geolocation",
"githubpages",
"Grafana",
"hostnames",
"htpasswd",
"httpasswd",
"httplib",
"IMAP",
"imapclient",
"infile",
"Interaktive",
"IPDB",
"journalctl",
"keepalive",
"keyout",
"keyrings",
"Leeman",
"libemail",
"linkify",
"LISTSERV",
"lxml",
"mailparser",
"mailrelay",
"mailsuite",
"maxdepth",
"MAXHEADERS",
"maxmind",
"mbox",
"mfrom",
"michaeldavie",
"mikesiegel",
"Mimecast",
"mitigations",
"MMDB",
"modindex",
"msgconvert",
"msgraph",
"MSSP",
"multiprocess",
"Munge",
"ndjson",
"newkey",
"Nhcm",
"nojekyll",
"nondigest",
"nosecureimap",
"nosniff",
"nwettbewerb",
"opensearch",
"opensearchpy",
"parsedmarc",
"passsword",
"Postorius",
"premade",
"procs",
"publicsuffix",
"publicsuffixlist",
"publixsuffix",
"pygelf",
"pypy",
"pytest",
"quickstart",
"Reindex",
"replyto",
"reversename",
"Rollup",
"Rpdm",
"SAMEORIGIN",
"sdist",
"Servernameone",
"setuptools",
"smartquotes",
"SMTPTLS",
"sortlists",
"sortmaps",
"sourcetype",
"STARTTLS",
"tasklist",
"timespan",
"tlsa",
"tlsrpt",
"toctree",
"TQDDM",
"tqdm",
"truststore",
"Übersicht",
"uids",
"Uncategorized",
"unparasable",
"uper",
"urllib",
"Valimail",
"venv",
"Vhcw",
"viewcode",
"virtualenv",
"WBITS",
"webmail",
"Wettbewerber",
"Whalen",
"whitespaces",
"xennn",
"xmltodict",
"xpack",
"zscholl"
"adkim",
"akamaiedge",
"amsmath",
"andrewmcgilvray",
"arcname",
"aspf",
"autoclass",
"automodule",
"backported",
"bellsouth",
"boto",
"brakhane",
"Brightmail",
"CEST",
"CHACHA",
"checkdmarc",
"Codecov",
"confnew",
"dateparser",
"dateutil",
"Davmail",
"DBIP",
"dearmor",
"deflist",
"devel",
"DMARC",
"Dmarcian",
"dnspython",
"dollarmath",
"dpkg",
"exampleuser",
"expiringdict",
"fieldlist",
"GELF",
"genindex",
"geoip",
"geoipupdate",
"Geolite",
"geolocation",
"githubpages",
"Grafana",
"hostnames",
"htpasswd",
"httpasswd",
"httplib",
"ifhost",
"IMAP",
"imapclient",
"infile",
"Interaktive",
"IPDB",
"journalctl",
"kafkaclient",
"keepalive",
"keyout",
"keyrings",
"Leeman",
"libemail",
"linkify",
"LISTSERV",
"loganalytics",
"lxml",
"mailparser",
"mailrelay",
"mailsuite",
"maxdepth",
"MAXHEADERS",
"maxmind",
"mbox",
"mfrom",
"mhdw",
"michaeldavie",
"mikesiegel",
"Mimecast",
"mitigations",
"MMDB",
"modindex",
"msgconvert",
"msgraph",
"MSSP",
"multiprocess",
"Munge",
"ndjson",
"newkey",
"Nhcm",
"nojekyll",
"nondigest",
"nosecureimap",
"nosniff",
"nwettbewerb",
"opensearch",
"opensearchpy",
"parsedmarc",
"passsword",
"pbar",
"Postorius",
"premade",
"privatesuffix",
"procs",
"publicsuffix",
"publicsuffixlist",
"publixsuffix",
"pygelf",
"pypy",
"pytest",
"quickstart",
"Reindex",
"replyto",
"reversename",
"Rollup",
"Rpdm",
"SAMEORIGIN",
"sdist",
"Servernameone",
"setuptools",
"smartquotes",
"SMTPTLS",
"sortlists",
"sortmaps",
"sourcetype",
"STARTTLS",
"tasklist",
"timespan",
"tlsa",
"tlsrpt",
"toctree",
"TQDDM",
"tqdm",
"truststore",
"Übersicht",
"uids",
"Uncategorized",
"unparasable",
"uper",
"urllib",
"Valimail",
"venv",
"Vhcw",
"viewcode",
"virtualenv",
"WBITS",
"webmail",
"Wettbewerber",
"Whalen",
"whitespaces",
"xennn",
"xmltodict",
"xpack",
"zscholl"
],
}

View File

@@ -1,5 +1,60 @@
# Changelog
## 9.0.10
- Support Python 3.14+
## 9.0.9
### Fixes
- Validate that a string is base64-encoded before trying to base64 decode it. (PRs #648 and #649)
## 9.0.8
### Fixes
- Fix logging configuration not propagating to child parser processes (#646).
- Update `mailsuite` dependency to `?=1.11.1` to solve issues with iCloud IMAP (#493).
## 9.0.7
## Fixes
- Fix IMAP `since` option (#PR 645 closes issues #581 and #643).
## 9.0.6
### Fixes
- Fix #638.
- Fix/clarify report extraction and parsing behavior for multiple input types (bytes, base64 strings, and file-like objects).
- Fix type mismatches that could cause runtime issues in SMTP emailing and CLI option handling.
### Improvements
- Improve type hints across the library (Pylance/Pyright friendliness) and reduce false-positive linter errors.
- Emails in Microsoft 365 are now marked read as they are read. This provides constancy with other mailbox types, and gives you a indication of when emails are being read as they are processed in batches. (Close #625)
### Compatibility / Dependencies
- Set Python requirement to `>=3.9,<3.14`.
- Bump `mailsuite` requirement to `>=1.11.0`.
## 9.0.5
## Fixes
- Fix report type detection introduced in `9.0.4`.
## 9.0.4 (Yanked)
### Fixes
- Fix saving reports to OpenSearch ([#637](https://github.com/domainaware/parsedmarc/issues/637))
- Fix parsing certain DMARC failure/forensic reports
- Some fixes to type hints (incomplete, but published as-is due to the above bugs)
## 9.0.3
### Fixes

View File

@@ -61,4 +61,4 @@ for RHEL or Debian.
| 3.11 | ✅ | Actively maintained; supported until June 2028 (Debian 12) |
| 3.12 | ✅ | Actively maintained; supported until May 2035 (RHEL 10) |
| 3.13 | ✅ | Actively maintained; supported until June 2030 (Debian 13) |
| 3.14 | | Not currently supported due to [this Python bug](https://github.com/python/cpython/issues/142307)|
| 3.14 | | Actively maintained |

View File

@@ -9,7 +9,6 @@ fi
. venv/bin/activate
pip install .[build]
ruff format .
ruff check .
cd docs
make clean
make html

View File

@@ -28,6 +28,13 @@
:members:
```
## parsedmarc.types
```{eval-rst}
.. automodule:: parsedmarc.types
:members:
```
## parsedmarc.utils
```{eval-rst}

View File

@@ -61,7 +61,7 @@ for RHEL or Debian.
| 3.11 | ✅ | Actively maintained; supported until June 2028 (Debian 12) |
| 3.12 | ✅ | Actively maintained; supported until May 2035 (RHEL 10) |
| 3.13 | ✅ | Actively maintained; supported until June 2030 (Debian 13) |
| 3.14 | | Not currently supported due to [this Python bug](https://github.com/python/cpython/issues/142307)|
| 3.14 | | Actively maintained |
```{toctree}
:caption: 'Contents'

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large Load Diff

View File

@@ -3,54 +3,55 @@
"""A CLI for parsing DMARC reports"""
from argparse import Namespace, ArgumentParser
import http.client
import json
import logging
import os
import sys
from argparse import ArgumentParser, Namespace
from configparser import ConfigParser
from glob import glob
import logging
import math
import yaml
from collections import OrderedDict
import json
from ssl import CERT_NONE, create_default_context
from multiprocessing import Pipe, Process
import sys
import http.client
from ssl import CERT_NONE, create_default_context
import yaml
from tqdm import tqdm
from parsedmarc import (
get_dmarc_reports_from_mailbox,
watch_inbox,
parse_report_file,
get_dmarc_reports_from_mbox,
elastic,
opensearch,
kafkaclient,
splunk,
save_output,
email_results,
SEEN_AGGREGATE_REPORT_IDS,
InvalidDMARCReport,
ParserError,
__version__,
InvalidDMARCReport,
s3,
syslog,
loganalytics,
elastic,
email_results,
gelf,
get_dmarc_reports_from_mailbox,
get_dmarc_reports_from_mbox,
kafkaclient,
loganalytics,
opensearch,
parse_report_file,
s3,
save_output,
splunk,
syslog,
watch_inbox,
webhook,
)
from parsedmarc.log import logger
from parsedmarc.mail import (
IMAPConnection,
MSGraphConnection,
GmailConnection,
IMAPConnection,
MaildirConnection,
MSGraphConnection,
)
from parsedmarc.mail.graph import AuthMethod
from parsedmarc.types import ParsingResults
from parsedmarc.utils import get_base_domain, get_reverse_dns, is_mbox
from parsedmarc.log import logger
from parsedmarc.utils import is_mbox, get_reverse_dns, get_base_domain
from parsedmarc import SEEN_AGGREGATE_REPORT_IDS
http.client._MAXHEADERS = 200 # pylint:disable=protected-access
# Increase the max header limit for very large emails. `_MAXHEADERS` is a
# private stdlib attribute and may not exist in type stubs.
setattr(http.client, "_MAXHEADERS", 200)
formatter = logging.Formatter(
fmt="%(levelname)8s:%(filename)s:%(lineno)d:%(message)s",
@@ -67,6 +68,48 @@ def _str_to_list(s):
return list(map(lambda i: i.lstrip(), _list))
def _configure_logging(log_level, log_file=None):
"""
Configure logging for the current process.
This is needed for child processes to properly log messages.
Args:
log_level: The logging level (e.g., logging.DEBUG, logging.WARNING)
log_file: Optional path to log file
"""
# Get the logger
from parsedmarc.log import logger
# Set the log level
logger.setLevel(log_level)
# Add StreamHandler with formatter if not already present
# Check if we already have a StreamHandler to avoid duplicates
# Use exact type check to distinguish from FileHandler subclass
has_stream_handler = any(type(h) is logging.StreamHandler for h in logger.handlers)
if not has_stream_handler:
formatter = logging.Formatter(
fmt="%(levelname)8s:%(filename)s:%(lineno)d:%(message)s",
datefmt="%Y-%m-%d:%H:%M:%S",
)
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger.addHandler(handler)
# Add FileHandler if log_file is specified
if log_file:
try:
fh = logging.FileHandler(log_file, "a")
formatter = logging.Formatter(
"%(asctime)s - %(levelname)s - [%(filename)s:%(lineno)d] - %(message)s"
)
fh.setFormatter(formatter)
logger.addHandler(fh)
except (IOError, OSError, PermissionError) as error:
logger.warning("Unable to write to log file: {}".format(error))
def cli_parse(
file_path,
sa,
@@ -79,8 +122,29 @@ def cli_parse(
reverse_dns_map_url,
normalize_timespan_threshold_hours,
conn,
log_level=logging.ERROR,
log_file=None,
):
"""Separated this function for multiprocessing"""
"""Separated this function for multiprocessing
Args:
file_path: Path to the report file
sa: Strip attachment payloads flag
nameservers: List of nameservers
dns_timeout: DNS timeout
ip_db_path: Path to IP database
offline: Offline mode flag
always_use_local_files: Always use local files flag
reverse_dns_map_path: Path to reverse DNS map
reverse_dns_map_url: URL to reverse DNS map
normalize_timespan_threshold_hours: Timespan threshold
conn: Pipe connection for IPC
log_level: Logging level for this process
log_file: Optional path to log file
"""
# Configure logging in this child process
_configure_logging(log_level, log_file)
try:
file_results = parse_report_file(
file_path,
@@ -105,6 +169,7 @@ def _main():
"""Called when the module is executed"""
def get_index_prefix(report):
domain = None
if index_prefix_domain_map is None:
return None
if "policy_published" in report:
@@ -138,7 +203,7 @@ def _main():
print(output_str)
if opts.output:
save_output(
results,
reports_,
output_directory=opts.output,
aggregate_json_filename=opts.aggregate_json_filename,
forensic_json_filename=opts.forensic_json_filename,
@@ -677,7 +742,7 @@ def _main():
if "general" in config.sections():
general_config = config["general"]
if "silent" in general_config:
opts.silent = general_config.getboolean("silent")
opts.silent = bool(general_config.getboolean("silent"))
if "normalize_timespan_threshold_hours" in general_config:
opts.normalize_timespan_threshold_hours = general_config.getfloat(
"normalize_timespan_threshold_hours"
@@ -686,10 +751,10 @@ def _main():
with open(general_config["index_prefix_domain_map"]) as f:
index_prefix_domain_map = yaml.safe_load(f)
if "offline" in general_config:
opts.offline = general_config.getboolean("offline")
opts.offline = bool(general_config.getboolean("offline"))
if "strip_attachment_payloads" in general_config:
opts.strip_attachment_payloads = general_config.getboolean(
"strip_attachment_payloads"
opts.strip_attachment_payloads = bool(
general_config.getboolean("strip_attachment_payloads")
)
if "output" in general_config:
opts.output = general_config["output"]
@@ -707,6 +772,8 @@ def _main():
opts.smtp_tls_csv_filename = general_config["smtp_tls_csv_filename"]
if "dns_timeout" in general_config:
opts.dns_timeout = general_config.getfloat("dns_timeout")
if opts.dns_timeout is None:
opts.dns_timeout = 2
if "dns_test_address" in general_config:
opts.dns_test_address = general_config["dns_test_address"]
if "nameservers" in general_config:
@@ -729,19 +796,19 @@ def _main():
)
exit(-1)
if "save_aggregate" in general_config:
opts.save_aggregate = general_config.getboolean("save_aggregate")
opts.save_aggregate = bool(general_config.getboolean("save_aggregate"))
if "save_forensic" in general_config:
opts.save_forensic = general_config.getboolean("save_forensic")
opts.save_forensic = bool(general_config.getboolean("save_forensic"))
if "save_smtp_tls" in general_config:
opts.save_smtp_tls = general_config.getboolean("save_smtp_tls")
opts.save_smtp_tls = bool(general_config.getboolean("save_smtp_tls"))
if "debug" in general_config:
opts.debug = general_config.getboolean("debug")
opts.debug = bool(general_config.getboolean("debug"))
if "verbose" in general_config:
opts.verbose = general_config.getboolean("verbose")
opts.verbose = bool(general_config.getboolean("verbose"))
if "silent" in general_config:
opts.silent = general_config.getboolean("silent")
opts.silent = bool(general_config.getboolean("silent"))
if "warnings" in general_config:
opts.warnings = general_config.getboolean("warnings")
opts.warnings = bool(general_config.getboolean("warnings"))
if "log_file" in general_config:
opts.log_file = general_config["log_file"]
if "n_procs" in general_config:
@@ -751,15 +818,15 @@ def _main():
else:
opts.ip_db_path = None
if "always_use_local_files" in general_config:
opts.always_use_local_files = general_config.getboolean(
"always_use_local_files"
opts.always_use_local_files = bool(
general_config.getboolean("always_use_local_files")
)
if "reverse_dns_map_path" in general_config:
opts.reverse_dns_map_path = general_config["reverse_dns_path"]
if "reverse_dns_map_url" in general_config:
opts.reverse_dns_map_url = general_config["reverse_dns_url"]
if "prettify_json" in general_config:
opts.prettify_json = general_config.getboolean("prettify_json")
opts.prettify_json = bool(general_config.getboolean("prettify_json"))
if "mailbox" in config.sections():
mailbox_config = config["mailbox"]
@@ -770,11 +837,11 @@ def _main():
if "archive_folder" in mailbox_config:
opts.mailbox_archive_folder = mailbox_config["archive_folder"]
if "watch" in mailbox_config:
opts.mailbox_watch = mailbox_config.getboolean("watch")
opts.mailbox_watch = bool(mailbox_config.getboolean("watch"))
if "delete" in mailbox_config:
opts.mailbox_delete = mailbox_config.getboolean("delete")
opts.mailbox_delete = bool(mailbox_config.getboolean("delete"))
if "test" in mailbox_config:
opts.mailbox_test = mailbox_config.getboolean("test")
opts.mailbox_test = bool(mailbox_config.getboolean("test"))
if "batch_size" in mailbox_config:
opts.mailbox_batch_size = mailbox_config.getint("batch_size")
if "check_timeout" in mailbox_config:
@@ -798,14 +865,14 @@ def _main():
if "port" in imap_config:
opts.imap_port = imap_config.getint("port")
if "timeout" in imap_config:
opts.imap_timeout = imap_config.getfloat("timeout")
opts.imap_timeout = imap_config.getint("timeout")
if "max_retries" in imap_config:
opts.imap_max_retries = imap_config.getint("max_retries")
if "ssl" in imap_config:
opts.imap_ssl = imap_config.getboolean("ssl")
opts.imap_ssl = bool(imap_config.getboolean("ssl"))
if "skip_certificate_verification" in imap_config:
opts.imap_skip_certificate_verification = imap_config.getboolean(
"skip_certificate_verification"
opts.imap_skip_certificate_verification = bool(
imap_config.getboolean("skip_certificate_verification")
)
if "user" in imap_config:
opts.imap_user = imap_config["user"]
@@ -834,7 +901,7 @@ def _main():
"section instead."
)
if "watch" in imap_config:
opts.mailbox_watch = imap_config.getboolean("watch")
opts.mailbox_watch = bool(imap_config.getboolean("watch"))
logger.warning(
"Use of the watch option in the imap "
"configuration section has been deprecated. "
@@ -849,7 +916,7 @@ def _main():
"section instead."
)
if "test" in imap_config:
opts.mailbox_test = imap_config.getboolean("test")
opts.mailbox_test = bool(imap_config.getboolean("test"))
logger.warning(
"Use of the test option in the imap "
"configuration section has been deprecated. "
@@ -943,8 +1010,8 @@ def _main():
opts.graph_url = graph_config["graph_url"]
if "allow_unencrypted_storage" in graph_config:
opts.graph_allow_unencrypted_storage = graph_config.getboolean(
"allow_unencrypted_storage"
opts.graph_allow_unencrypted_storage = bool(
graph_config.getboolean("allow_unencrypted_storage")
)
if "elasticsearch" in config:
@@ -972,10 +1039,10 @@ def _main():
if "index_prefix" in elasticsearch_config:
opts.elasticsearch_index_prefix = elasticsearch_config["index_prefix"]
if "monthly_indexes" in elasticsearch_config:
monthly = elasticsearch_config.getboolean("monthly_indexes")
monthly = bool(elasticsearch_config.getboolean("monthly_indexes"))
opts.elasticsearch_monthly_indexes = monthly
if "ssl" in elasticsearch_config:
opts.elasticsearch_ssl = elasticsearch_config.getboolean("ssl")
opts.elasticsearch_ssl = bool(elasticsearch_config.getboolean("ssl"))
if "cert_path" in elasticsearch_config:
opts.elasticsearch_ssl_cert_path = elasticsearch_config["cert_path"]
if "user" in elasticsearch_config:
@@ -1012,10 +1079,10 @@ def _main():
if "index_prefix" in opensearch_config:
opts.opensearch_index_prefix = opensearch_config["index_prefix"]
if "monthly_indexes" in opensearch_config:
monthly = opensearch_config.getboolean("monthly_indexes")
monthly = bool(opensearch_config.getboolean("monthly_indexes"))
opts.opensearch_monthly_indexes = monthly
if "ssl" in opensearch_config:
opts.opensearch_ssl = opensearch_config.getboolean("ssl")
opts.opensearch_ssl = bool(opensearch_config.getboolean("ssl"))
if "cert_path" in opensearch_config:
opts.opensearch_ssl_cert_path = opensearch_config["cert_path"]
if "user" in opensearch_config:
@@ -1069,9 +1136,11 @@ def _main():
if "password" in kafka_config:
opts.kafka_password = kafka_config["password"]
if "ssl" in kafka_config:
opts.kafka_ssl = kafka_config.getboolean("ssl")
opts.kafka_ssl = bool(kafka_config.getboolean("ssl"))
if "skip_certificate_verification" in kafka_config:
kafka_verify = kafka_config.getboolean("skip_certificate_verification")
kafka_verify = bool(
kafka_config.getboolean("skip_certificate_verification")
)
opts.kafka_skip_certificate_verification = kafka_verify
if "aggregate_topic" in kafka_config:
opts.kafka_aggregate_topic = kafka_config["aggregate_topic"]
@@ -1103,9 +1172,11 @@ def _main():
if "port" in smtp_config:
opts.smtp_port = smtp_config.getint("port")
if "ssl" in smtp_config:
opts.smtp_ssl = smtp_config.getboolean("ssl")
opts.smtp_ssl = bool(smtp_config.getboolean("ssl"))
if "skip_certificate_verification" in smtp_config:
smtp_verify = smtp_config.getboolean("skip_certificate_verification")
smtp_verify = bool(
smtp_config.getboolean("skip_certificate_verification")
)
opts.smtp_skip_certificate_verification = smtp_verify
if "user" in smtp_config:
opts.smtp_user = smtp_config["user"]
@@ -1173,11 +1244,11 @@ def _main():
gmail_api_config = config["gmail_api"]
opts.gmail_api_credentials_file = gmail_api_config.get("credentials_file")
opts.gmail_api_token_file = gmail_api_config.get("token_file", ".token")
opts.gmail_api_include_spam_trash = gmail_api_config.getboolean(
"include_spam_trash", False
opts.gmail_api_include_spam_trash = bool(
gmail_api_config.getboolean("include_spam_trash", False)
)
opts.gmail_api_paginate_messages = gmail_api_config.getboolean(
"paginate_messages", True
opts.gmail_api_paginate_messages = bool(
gmail_api_config.getboolean("paginate_messages", True)
)
opts.gmail_api_scopes = gmail_api_config.get(
"scopes", default_gmail_api_scope
@@ -1191,7 +1262,9 @@ def _main():
if "maildir" in config.sections():
maildir_api_config = config["maildir"]
opts.maildir_path = maildir_api_config.get("maildir_path")
opts.maildir_create = maildir_api_config.get("maildir_create")
opts.maildir_create = bool(
maildir_api_config.getboolean("maildir_create", fallback=False)
)
if "log_analytics" in config.sections():
log_analytics_config = config["log_analytics"]
@@ -1286,6 +1359,11 @@ def _main():
es_aggregate_index = "{0}{1}".format(prefix, es_aggregate_index)
es_forensic_index = "{0}{1}".format(prefix, es_forensic_index)
es_smtp_tls_index = "{0}{1}".format(prefix, es_smtp_tls_index)
elastic_timeout_value = (
float(opts.elasticsearch_timeout)
if opts.elasticsearch_timeout is not None
else 60.0
)
elastic.set_hosts(
opts.elasticsearch_hosts,
use_ssl=opts.elasticsearch_ssl,
@@ -1293,7 +1371,7 @@ def _main():
username=opts.elasticsearch_username,
password=opts.elasticsearch_password,
api_key=opts.elasticsearch_api_key,
timeout=opts.elasticsearch_timeout,
timeout=elastic_timeout_value,
)
elastic.migrate_indexes(
aggregate_indexes=[es_aggregate_index],
@@ -1318,6 +1396,11 @@ def _main():
os_aggregate_index = "{0}{1}".format(prefix, os_aggregate_index)
os_forensic_index = "{0}{1}".format(prefix, os_forensic_index)
os_smtp_tls_index = "{0}{1}".format(prefix, os_smtp_tls_index)
opensearch_timeout_value = (
float(opts.opensearch_timeout)
if opts.opensearch_timeout is not None
else 60.0
)
opensearch.set_hosts(
opts.opensearch_hosts,
use_ssl=opts.opensearch_ssl,
@@ -1325,7 +1408,7 @@ def _main():
username=opts.opensearch_username,
password=opts.opensearch_password,
api_key=opts.opensearch_api_key,
timeout=opts.opensearch_timeout,
timeout=opensearch_timeout_value,
)
opensearch.migrate_indexes(
aggregate_indexes=[os_aggregate_index],
@@ -1434,16 +1517,23 @@ def _main():
results = []
pbar = None
if sys.stdout.isatty():
pbar = tqdm(total=len(file_paths))
for batch_index in range(math.ceil(len(file_paths) / opts.n_procs)):
n_procs = int(opts.n_procs or 1)
if n_procs < 1:
n_procs = 1
# Capture the current log level to pass to child processes
current_log_level = logger.level
current_log_file = opts.log_file
for batch_index in range((len(file_paths) + n_procs - 1) // n_procs):
processes = []
connections = []
for proc_index in range(
opts.n_procs * batch_index, opts.n_procs * (batch_index + 1)
):
for proc_index in range(n_procs * batch_index, n_procs * (batch_index + 1)):
if proc_index >= len(file_paths):
break
@@ -1464,6 +1554,8 @@ def _main():
opts.reverse_dns_map_url,
opts.normalize_timespan_threshold_hours,
child_conn,
current_log_level,
current_log_file,
),
)
processes.append(process)
@@ -1476,12 +1568,15 @@ def _main():
for proc in processes:
proc.join()
if sys.stdout.isatty():
if pbar is not None:
counter += 1
pbar.update(counter - pbar.n)
pbar.update(1)
if pbar is not None:
pbar.close()
for result in results:
if type(result[0]) is ParserError:
if isinstance(result[0], ParserError) or result[0] is None:
logger.error("Failed to parse {0} - {1}".format(result[1], result[0]))
else:
if result[0]["report_type"] == "aggregate":
@@ -1502,6 +1597,11 @@ def _main():
smtp_tls_reports.append(result[0]["report"])
for mbox_path in mbox_paths:
normalize_timespan_threshold_hours_value = (
float(opts.normalize_timespan_threshold_hours)
if opts.normalize_timespan_threshold_hours is not None
else 24.0
)
strip = opts.strip_attachment_payloads
reports = get_dmarc_reports_from_mbox(
mbox_path,
@@ -1513,13 +1613,17 @@ def _main():
reverse_dns_map_path=opts.reverse_dns_map_path,
reverse_dns_map_url=opts.reverse_dns_map_url,
offline=opts.offline,
normalize_timespan_threshold_hours=opts.normalize_timespan_threshold_hours,
normalize_timespan_threshold_hours=normalize_timespan_threshold_hours_value,
)
aggregate_reports += reports["aggregate_reports"]
forensic_reports += reports["forensic_reports"]
smtp_tls_reports += reports["smtp_tls_reports"]
mailbox_connection = None
mailbox_batch_size_value = 10
mailbox_check_timeout_value = 30
normalize_timespan_threshold_hours_value = 24.0
if opts.imap_host:
try:
if opts.imap_user is None or opts.imap_password is None:
@@ -1535,13 +1639,20 @@ def _main():
if not opts.imap_ssl:
ssl = False
imap_timeout = (
int(opts.imap_timeout) if opts.imap_timeout is not None else 30
)
imap_max_retries = (
int(opts.imap_max_retries) if opts.imap_max_retries is not None else 4
)
imap_port_value = int(opts.imap_port) if opts.imap_port is not None else 993
mailbox_connection = IMAPConnection(
host=opts.imap_host,
port=opts.imap_port,
port=imap_port_value,
ssl=ssl,
verify=verify,
timeout=opts.imap_timeout,
max_retries=opts.imap_max_retries,
timeout=imap_timeout,
max_retries=imap_max_retries,
user=opts.imap_user,
password=opts.imap_password,
)
@@ -1562,7 +1673,7 @@ def _main():
username=opts.graph_user,
password=opts.graph_password,
token_file=opts.graph_token_file,
allow_unencrypted_storage=opts.graph_allow_unencrypted_storage,
allow_unencrypted_storage=bool(opts.graph_allow_unencrypted_storage),
graph_url=opts.graph_url,
)
@@ -1607,11 +1718,24 @@ def _main():
exit(1)
if mailbox_connection:
mailbox_batch_size_value = (
int(opts.mailbox_batch_size) if opts.mailbox_batch_size is not None else 10
)
mailbox_check_timeout_value = (
int(opts.mailbox_check_timeout)
if opts.mailbox_check_timeout is not None
else 30
)
normalize_timespan_threshold_hours_value = (
float(opts.normalize_timespan_threshold_hours)
if opts.normalize_timespan_threshold_hours is not None
else 24.0
)
try:
reports = get_dmarc_reports_from_mailbox(
connection=mailbox_connection,
delete=opts.mailbox_delete,
batch_size=opts.mailbox_batch_size,
batch_size=mailbox_batch_size_value,
reports_folder=opts.mailbox_reports_folder,
archive_folder=opts.mailbox_archive_folder,
ip_db_path=opts.ip_db_path,
@@ -1623,7 +1747,7 @@ def _main():
test=opts.mailbox_test,
strip_attachment_payloads=opts.strip_attachment_payloads,
since=opts.mailbox_since,
normalize_timespan_threshold_hours=opts.normalize_timespan_threshold_hours,
normalize_timespan_threshold_hours=normalize_timespan_threshold_hours_value,
)
aggregate_reports += reports["aggregate_reports"]
@@ -1634,27 +1758,31 @@ def _main():
logger.exception("Mailbox Error")
exit(1)
results = OrderedDict(
[
("aggregate_reports", aggregate_reports),
("forensic_reports", forensic_reports),
("smtp_tls_reports", smtp_tls_reports),
]
)
parsing_results: ParsingResults = {
"aggregate_reports": aggregate_reports,
"forensic_reports": forensic_reports,
"smtp_tls_reports": smtp_tls_reports,
}
process_reports(results)
process_reports(parsing_results)
if opts.smtp_host:
try:
verify = True
if opts.smtp_skip_certificate_verification:
verify = False
smtp_port_value = int(opts.smtp_port) if opts.smtp_port is not None else 25
smtp_to_value = (
list(opts.smtp_to)
if isinstance(opts.smtp_to, list)
else _str_to_list(str(opts.smtp_to))
)
email_results(
results,
parsing_results,
opts.smtp_host,
opts.smtp_from,
opts.smtp_to,
port=opts.smtp_port,
smtp_to_value,
port=smtp_port_value,
verify=verify,
username=opts.smtp_user,
password=opts.smtp_password,
@@ -1676,17 +1804,17 @@ def _main():
archive_folder=opts.mailbox_archive_folder,
delete=opts.mailbox_delete,
test=opts.mailbox_test,
check_timeout=opts.mailbox_check_timeout,
check_timeout=mailbox_check_timeout_value,
nameservers=opts.nameservers,
dns_timeout=opts.dns_timeout,
strip_attachment_payloads=opts.strip_attachment_payloads,
batch_size=opts.mailbox_batch_size,
batch_size=mailbox_batch_size_value,
ip_db_path=opts.ip_db_path,
always_use_local_files=opts.always_use_local_files,
reverse_dns_map_path=opts.reverse_dns_map_path,
reverse_dns_map_url=opts.reverse_dns_map_url,
offline=opts.offline,
normalize_timespan_threshold_hours=opts.normalize_timespan_threshold_hours,
normalize_timespan_threshold_hours=normalize_timespan_threshold_hours_value,
)
except FileExistsError as error:
logger.error("{0}".format(error.__str__()))

View File

@@ -1,2 +1,3 @@
__version__ = "9.0.3"
__version__ = "9.0.10"
USER_AGENT = f"parsedmarc/{__version__}"

View File

@@ -2,30 +2,28 @@
from __future__ import annotations
from typing import Optional, Union, Any
from typing import Any, Optional, Union
from collections import OrderedDict
from elasticsearch_dsl.search import Q
from elasticsearch.helpers import reindex
from elasticsearch_dsl import (
connections,
Object,
Boolean,
Date,
Document,
Index,
Nested,
InnerDoc,
Integer,
Text,
Boolean,
Ip,
Date,
Nested,
Object,
Search,
Text,
connections,
)
from elasticsearch.helpers import reindex
from elasticsearch_dsl.search import Q
from parsedmarc import InvalidForensicReport
from parsedmarc.log import logger
from parsedmarc.utils import human_timestamp_to_datetime
from parsedmarc import InvalidForensicReport
class ElasticsearchError(Exception):
@@ -94,17 +92,17 @@ class _AggregateReportDoc(Document):
spf_results = Nested(_SPFResult)
def add_policy_override(self, type_: str, comment: str):
self.policy_overrides.append(_PolicyOverride(type=type_, comment=comment))
self.policy_overrides.append(_PolicyOverride(type=type_, comment=comment)) # pyright: ignore[reportCallIssue]
def add_dkim_result(self, domain: str, selector: str, result: _DKIMResult):
self.dkim_results.append(
_DKIMResult(domain=domain, selector=selector, result=result)
)
) # pyright: ignore[reportCallIssue]
def add_spf_result(self, domain: str, scope: str, result: _SPFResult):
self.spf_results.append(_SPFResult(domain=domain, scope=scope, result=result))
self.spf_results.append(_SPFResult(domain=domain, scope=scope, result=result)) # pyright: ignore[reportCallIssue]
def save(self, **kwargs):
def save(self, **kwargs): # pyright: ignore[reportIncompatibleMethodOverride]
self.passed_dmarc = False
self.passed_dmarc = self.spf_aligned or self.dkim_aligned
@@ -138,25 +136,25 @@ class _ForensicSampleDoc(InnerDoc):
attachments = Nested(_EmailAttachmentDoc)
def add_to(self, display_name: str, address: str):
self.to.append(_EmailAddressDoc(display_name=display_name, address=address))
self.to.append(_EmailAddressDoc(display_name=display_name, address=address)) # pyright: ignore[reportCallIssue]
def add_reply_to(self, display_name: str, address: str):
self.reply_to.append(
_EmailAddressDoc(display_name=display_name, address=address)
)
) # pyright: ignore[reportCallIssue]
def add_cc(self, display_name: str, address: str):
self.cc.append(_EmailAddressDoc(display_name=display_name, address=address))
self.cc.append(_EmailAddressDoc(display_name=display_name, address=address)) # pyright: ignore[reportCallIssue]
def add_bcc(self, display_name: str, address: str):
self.bcc.append(_EmailAddressDoc(display_name=display_name, address=address))
self.bcc.append(_EmailAddressDoc(display_name=display_name, address=address)) # pyright: ignore[reportCallIssue]
def add_attachment(self, filename: str, content_type: str, sha256: str):
self.attachments.append(
_EmailAttachmentDoc(
filename=filename, content_type=content_type, sha256=sha256
)
)
) # pyright: ignore[reportCallIssue]
class _ForensicReportDoc(Document):
@@ -203,11 +201,11 @@ class _SMTPTLSPolicyDoc(InnerDoc):
def add_failure_details(
self,
result_type: str,
ip_address: str,
receiving_ip: str,
receiving_mx_helo: str,
failed_session_count: int,
result_type: Optional[str] = None,
ip_address: Optional[str] = None,
receiving_ip: Optional[str] = None,
receiving_mx_helo: Optional[str] = None,
failed_session_count: Optional[int] = None,
sending_mta_ip: Optional[str] = None,
receiving_mx_hostname: Optional[str] = None,
additional_information_uri: Optional[str] = None,
@@ -224,7 +222,7 @@ class _SMTPTLSPolicyDoc(InnerDoc):
additional_information=additional_information_uri,
failure_reason_code=failure_reason_code,
)
self.failure_details.append(_details)
self.failure_details.append(_details) # pyright: ignore[reportCallIssue]
class _SMTPTLSReportDoc(Document):
@@ -258,7 +256,7 @@ class _SMTPTLSReportDoc(Document):
policy_string=policy_string,
mx_host_patterns=mx_host_patterns,
failure_details=failure_details,
)
) # pyright: ignore[reportCallIssue]
class AlreadySaved(ValueError):
@@ -268,12 +266,12 @@ class AlreadySaved(ValueError):
def set_hosts(
hosts: Union[str, list[str]],
*,
use_ssl: Optional[bool] = False,
use_ssl: bool = False,
ssl_cert_path: Optional[str] = None,
username: Optional[str] = None,
password: Optional[str] = None,
api_key: Optional[str] = None,
timeout: Optional[float] = 60.0,
timeout: float = 60.0,
):
"""
Sets the Elasticsearch hosts to use
@@ -297,7 +295,7 @@ def set_hosts(
conn_params["ca_certs"] = ssl_cert_path
else:
conn_params["verify_certs"] = False
if username:
if username and password:
conn_params["http_auth"] = username + ":" + password
if api_key:
conn_params["api_key"] = api_key
@@ -369,7 +367,7 @@ def migrate_indexes(
}
Index(new_index_name).create()
Index(new_index_name).put_mapping(doc_type=doc, body=body)
reindex(connections.get_connection(), aggregate_index_name, new_index_name)
reindex(connections.get_connection(), aggregate_index_name, new_index_name) # pyright: ignore[reportArgumentType]
Index(aggregate_index_name).delete()
for forensic_index in forensic_indexes:
@@ -377,18 +375,18 @@ def migrate_indexes(
def save_aggregate_report_to_elasticsearch(
aggregate_report: OrderedDict[str, Any],
aggregate_report: dict[str, Any],
index_suffix: Optional[str] = None,
index_prefix: Optional[str] = None,
monthly_indexes: Optional[bool] = False,
number_of_shards: Optional[int] = 1,
number_of_replicas: Optional[int] = 0,
number_of_shards: int = 1,
number_of_replicas: int = 0,
):
"""
Saves a parsed DMARC aggregate report to Elasticsearch
Args:
aggregate_report (OrderedDict): A parsed forensic report
aggregate_report (dict): A parsed forensic report
index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily indexes
@@ -412,11 +410,11 @@ def save_aggregate_report_to_elasticsearch(
else:
index_date = begin_date.strftime("%Y-%m-%d")
org_name_query = Q(dict(match_phrase=dict(org_name=org_name)))
report_id_query = Q(dict(match_phrase=dict(report_id=report_id)))
domain_query = Q(dict(match_phrase={"published_policy.domain": domain}))
begin_date_query = Q(dict(match=dict(date_begin=begin_date)))
end_date_query = Q(dict(match=dict(date_end=end_date)))
org_name_query = Q(dict(match_phrase=dict(org_name=org_name))) # type: ignore
report_id_query = Q(dict(match_phrase=dict(report_id=report_id))) # pyright: ignore[reportArgumentType]
domain_query = Q(dict(match_phrase={"published_policy.domain": domain})) # pyright: ignore[reportArgumentType]
begin_date_query = Q(dict(match=dict(date_begin=begin_date))) # pyright: ignore[reportArgumentType]
end_date_query = Q(dict(match=dict(date_end=end_date))) # pyright: ignore[reportArgumentType]
if index_suffix is not None:
search_index = "dmarc_aggregate_{0}*".format(index_suffix)
@@ -428,13 +426,12 @@ def save_aggregate_report_to_elasticsearch(
query = org_name_query & report_id_query & domain_query
query = query & begin_date_query & end_date_query
search.query = query
begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
try:
existing = search.execute()
except Exception as error_:
begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
raise ElasticsearchError(
"Elasticsearch's search for existing report \
error: {}".format(error_.__str__())
@@ -530,7 +527,7 @@ def save_aggregate_report_to_elasticsearch(
number_of_shards=number_of_shards, number_of_replicas=number_of_replicas
)
create_indexes([index], index_settings)
agg_doc.meta.index = index
agg_doc.meta.index = index # pyright: ignore[reportOptionalMemberAccess, reportAttributeAccessIssue]
try:
agg_doc.save()
@@ -539,8 +536,8 @@ def save_aggregate_report_to_elasticsearch(
def save_forensic_report_to_elasticsearch(
forensic_report: OrderedDict[str, Any],
index_suffix: Optional[any] = None,
forensic_report: dict[str, Any],
index_suffix: Optional[Any] = None,
index_prefix: Optional[str] = None,
monthly_indexes: Optional[bool] = False,
number_of_shards: int = 1,
@@ -550,7 +547,7 @@ def save_forensic_report_to_elasticsearch(
Saves a parsed DMARC forensic report to Elasticsearch
Args:
forensic_report (OrderedDict): A parsed forensic report
forensic_report (dict): A parsed forensic report
index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily
@@ -570,7 +567,7 @@ def save_forensic_report_to_elasticsearch(
sample_date = forensic_report["parsed_sample"]["date"]
sample_date = human_timestamp_to_datetime(sample_date)
original_headers = forensic_report["parsed_sample"]["headers"]
headers = OrderedDict()
headers: dict[str, Any] = {}
for original_header in original_headers:
headers[original_header.lower()] = original_headers[original_header]
@@ -584,7 +581,7 @@ def save_forensic_report_to_elasticsearch(
if index_prefix is not None:
search_index = "{0}{1}".format(index_prefix, search_index)
search = Search(index=search_index)
q = Q(dict(match=dict(arrival_date=arrival_date_epoch_milliseconds)))
q = Q(dict(match=dict(arrival_date=arrival_date_epoch_milliseconds))) # pyright: ignore[reportArgumentType]
from_ = None
to_ = None
@@ -599,7 +596,7 @@ def save_forensic_report_to_elasticsearch(
from_ = dict()
from_["sample.headers.from"] = headers["from"]
from_query = Q(dict(match_phrase=from_))
from_query = Q(dict(match_phrase=from_)) # pyright: ignore[reportArgumentType]
q = q & from_query
if "to" in headers:
# We convert the TO header from a string list to a flat string.
@@ -611,12 +608,12 @@ def save_forensic_report_to_elasticsearch(
to_ = dict()
to_["sample.headers.to"] = headers["to"]
to_query = Q(dict(match_phrase=to_))
to_query = Q(dict(match_phrase=to_)) # pyright: ignore[reportArgumentType]
q = q & to_query
if "subject" in headers:
subject = headers["subject"]
subject_query = {"match_phrase": {"sample.headers.subject": subject}}
q = q & Q(subject_query)
q = q & Q(subject_query) # pyright: ignore[reportArgumentType]
search.query = q
existing = search.execute()
@@ -694,7 +691,7 @@ def save_forensic_report_to_elasticsearch(
number_of_shards=number_of_shards, number_of_replicas=number_of_replicas
)
create_indexes([index], index_settings)
forensic_doc.meta.index = index
forensic_doc.meta.index = index # pyright: ignore[reportAttributeAccessIssue, reportOptionalMemberAccess]
try:
forensic_doc.save()
except Exception as e:
@@ -706,18 +703,18 @@ def save_forensic_report_to_elasticsearch(
def save_smtp_tls_report_to_elasticsearch(
report: OrderedDict[str, Any],
index_suffix: str = None,
index_prefix: str = None,
monthly_indexes: Optional[bool] = False,
number_of_shards: Optional[int] = 1,
number_of_replicas: Optional[int] = 0,
report: dict[str, Any],
index_suffix: Optional[str] = None,
index_prefix: Optional[str] = None,
monthly_indexes: bool = False,
number_of_shards: int = 1,
number_of_replicas: int = 0,
):
"""
Saves a parsed SMTP TLS report to Elasticsearch
Args:
report (OrderedDict): A parsed SMTP TLS report
report (dict): A parsed SMTP TLS report
index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily indexes
@@ -741,10 +738,10 @@ def save_smtp_tls_report_to_elasticsearch(
report["begin_date"] = begin_date
report["end_date"] = end_date
org_name_query = Q(dict(match_phrase=dict(org_name=org_name)))
report_id_query = Q(dict(match_phrase=dict(report_id=report_id)))
begin_date_query = Q(dict(match=dict(date_begin=begin_date)))
end_date_query = Q(dict(match=dict(date_end=end_date)))
org_name_query = Q(dict(match_phrase=dict(org_name=org_name))) # pyright: ignore[reportArgumentType]
report_id_query = Q(dict(match_phrase=dict(report_id=report_id))) # pyright: ignore[reportArgumentType]
begin_date_query = Q(dict(match=dict(date_begin=begin_date))) # pyright: ignore[reportArgumentType]
end_date_query = Q(dict(match=dict(date_end=end_date))) # pyright: ignore[reportArgumentType]
if index_suffix is not None:
search_index = "smtp_tls_{0}*".format(index_suffix)
@@ -845,10 +842,10 @@ def save_smtp_tls_report_to_elasticsearch(
additional_information_uri=additional_information_uri,
failure_reason_code=failure_reason_code,
)
smtp_tls_doc.policies.append(policy_doc)
smtp_tls_doc.policies.append(policy_doc) # pyright: ignore[reportCallIssue]
create_indexes([index], index_settings)
smtp_tls_doc.meta.index = index
smtp_tls_doc.meta.index = index # pyright: ignore[reportOptionalMemberAccess, reportAttributeAccessIssue]
try:
smtp_tls_doc.save()

View File

@@ -2,21 +2,18 @@
from __future__ import annotations
from typing import Any
import logging
import logging.handlers
import json
import threading
from collections import OrderedDict
from typing import Any
from pygelf import GelfTcpHandler, GelfTlsHandler, GelfUdpHandler
from parsedmarc import (
parsed_aggregate_reports_to_csv_rows,
parsed_forensic_reports_to_csv_rows,
parsed_smtp_tls_reports_to_csv_rows,
)
from pygelf import GelfTcpHandler, GelfUdpHandler, GelfTlsHandler
log_context_data = threading.local()
@@ -53,7 +50,7 @@ class GelfClient(object):
)
self.logger.addHandler(self.handler)
def save_aggregate_report_to_gelf(self, aggregate_reports: OrderedDict[str, Any]):
def save_aggregate_report_to_gelf(self, aggregate_reports: list[dict[str, Any]]):
rows = parsed_aggregate_reports_to_csv_rows(aggregate_reports)
for row in rows:
log_context_data.parsedmarc = row
@@ -61,12 +58,14 @@ class GelfClient(object):
log_context_data.parsedmarc = None
def save_forensic_report_to_gelf(self, forensic_reports: OrderedDict[str, Any]):
def save_forensic_report_to_gelf(self, forensic_reports: list[dict[str, Any]]):
rows = parsed_forensic_reports_to_csv_rows(forensic_reports)
for row in rows:
self.logger.info(json.dumps(row))
log_context_data.parsedmarc = row
self.logger.info("parsedmarc forensic report")
def save_smtp_tls_report_to_gelf(self, smtp_tls_reports: OrderedDict[str, Any]):
def save_smtp_tls_report_to_gelf(self, smtp_tls_reports: dict[str, Any]):
rows = parsed_smtp_tls_reports_to_csv_rows(smtp_tls_reports)
for row in rows:
self.logger.info(json.dumps(row))
log_context_data.parsedmarc = row
self.logger.info("parsedmarc smtptls report")

View File

@@ -2,19 +2,16 @@
from __future__ import annotations
from typing import Any, Optional
from ssl import SSLContext
import json
from ssl import create_default_context
from ssl import SSLContext, create_default_context
from typing import Any, Optional, Union
from kafka import KafkaProducer
from kafka.errors import NoBrokersAvailable, UnknownTopicOrPartitionError
from collections import OrderedDict
from parsedmarc.utils import human_timestamp_to_datetime
from parsedmarc import __version__
from parsedmarc.log import logger
from parsedmarc.utils import human_timestamp_to_datetime
class KafkaError(RuntimeError):
@@ -49,7 +46,7 @@ class KafkaClient(object):
``$ConnectionString``, and the password is the
Azure Event Hub connection string.
"""
config = dict(
config: dict[str, Any] = dict(
value_serializer=lambda v: json.dumps(v).encode("utf-8"),
bootstrap_servers=kafka_hosts,
client_id="parsedmarc-{0}".format(__version__),
@@ -66,7 +63,7 @@ class KafkaClient(object):
raise KafkaError("No Kafka brokers available")
@staticmethod
def strip_metadata(report: OrderedDict[str, Any]):
def strip_metadata(report: dict[str, Any]):
"""
Duplicates org_name, org_email and report_id into JSON root
and removes report_metadata key to bring it more inline
@@ -80,7 +77,7 @@ class KafkaClient(object):
return report
@staticmethod
def generate_date_range(report: OrderedDict[str, Any]):
def generate_date_range(report: dict[str, Any]):
"""
Creates a date_range timestamp with format YYYY-MM-DD-T-HH:MM:SS
based on begin and end dates for easier parsing in Kibana.
@@ -98,7 +95,9 @@ class KafkaClient(object):
return date_range
def save_aggregate_reports_to_kafka(
self, aggregate_reports: list[OrderedDict][str, Any], aggregate_topic: str
self,
aggregate_reports: Union[dict[str, Any], list[dict[str, Any]]],
aggregate_topic: str,
):
"""
Saves aggregate DMARC reports to Kafka
@@ -109,9 +108,7 @@ class KafkaClient(object):
aggregate_topic (str): The name of the Kafka topic
"""
if isinstance(aggregate_reports, dict) or isinstance(
aggregate_reports, OrderedDict
):
if isinstance(aggregate_reports, dict):
aggregate_reports = [aggregate_reports]
if len(aggregate_reports) < 1:
@@ -143,7 +140,9 @@ class KafkaClient(object):
raise KafkaError("Kafka error: {0}".format(e.__str__()))
def save_forensic_reports_to_kafka(
self, forensic_reports: OrderedDict[str, Any], forensic_topic: str
self,
forensic_reports: Union[dict[str, Any], list[dict[str, Any]]],
forensic_topic: str,
):
"""
Saves forensic DMARC reports to Kafka, sends individual
@@ -175,7 +174,9 @@ class KafkaClient(object):
raise KafkaError("Kafka error: {0}".format(e.__str__()))
def save_smtp_tls_reports_to_kafka(
self, smtp_tls_reports: list[OrderedDict[str, Any]], smtp_tls_topic: str
self,
smtp_tls_reports: Union[list[dict[str, Any]], dict[str, Any]],
smtp_tls_topic: str,
):
"""
Saves SMTP TLS reports to Kafka, sends individual

View File

@@ -3,13 +3,13 @@
from __future__ import annotations
from typing import Any
from collections import OrderedDict
from parsedmarc.log import logger
from azure.core.exceptions import HttpResponseError
from azure.identity import ClientSecretCredential
from azure.monitor.ingestion import LogsIngestionClient
from parsedmarc.log import logger
class LogAnalyticsException(Exception):
"""Raised when an Elasticsearch error occurs"""
@@ -110,7 +110,7 @@ class LogAnalyticsClient(object):
def publish_json(
self,
results: OrderedDict[str, OrderedDict[str, Any]],
results,
logs_client: LogsIngestionClient,
dcr_stream: str,
):
@@ -133,7 +133,7 @@ class LogAnalyticsClient(object):
def publish_results(
self,
results: OrderedDict[str, OrderedDict[str, Any]],
results: dict[str, Any],
save_aggregate: bool,
save_forensic: bool,
save_smtp_tls: bool,

View File

@@ -116,14 +116,14 @@ class GmailConnection(MailboxConnection):
else:
return [id for id in self._fetch_all_message_ids(reports_label_id)]
def fetch_message(self, message_id):
def fetch_message(self, message_id) -> str:
msg = (
self.service.users()
.messages()
.get(userId="me", id=message_id, format="raw")
.execute()
)
return urlsafe_b64decode(msg["raw"])
return urlsafe_b64decode(msg["raw"]).decode(errors="replace")
def delete_message(self, message_id: str):
self.service.users().messages().delete(userId="me", id=message_id)

View File

@@ -6,7 +6,7 @@ from enum import Enum
from functools import lru_cache
from pathlib import Path
from time import sleep
from typing import List, Optional
from typing import Any, List, Optional, Union
from azure.identity import (
UsernamePasswordCredential,
@@ -28,7 +28,7 @@ class AuthMethod(Enum):
def _get_cache_args(token_path: Path, allow_unencrypted_storage):
cache_args = {
cache_args: dict[str, Any] = {
"cache_persistence_options": TokenCachePersistenceOptions(
name="parsedmarc", allow_unencrypted_storage=allow_unencrypted_storage
)
@@ -151,9 +151,9 @@ class MSGraphConnection(MailboxConnection):
else:
logger.warning(f"Unknown response {resp.status_code} {resp.json()}")
def fetch_messages(self, folder_name: str, **kwargs) -> List[str]:
def fetch_messages(self, reports_folder: str, **kwargs) -> List[str]:
"""Returns a list of message UIDs in the specified folder"""
folder_id = self._find_folder_id_from_folder_path(folder_name)
folder_id = self._find_folder_id_from_folder_path(reports_folder)
url = f"/users/{self.mailbox_name}/mailFolders/{folder_id}/messages"
since = kwargs.get("since")
if not since:
@@ -166,7 +166,7 @@ class MSGraphConnection(MailboxConnection):
def _get_all_messages(self, url, batch_size, since):
messages: list
params = {"$select": "id"}
params: dict[str, Union[str, int]] = {"$select": "id"}
if since:
params["$filter"] = f"receivedDateTime ge {since}"
if batch_size and batch_size > 0:

View File

@@ -2,7 +2,7 @@
from __future__ import annotations
from typing import Optional
from typing import cast
from time import sleep
@@ -17,15 +17,14 @@ from parsedmarc.mail.mailbox_connection import MailboxConnection
class IMAPConnection(MailboxConnection):
def __init__(
self,
host: Optional[str] = None,
*,
user: Optional[str] = None,
password: Optional[str] = None,
port: Optional[str] = None,
ssl: Optional[bool] = True,
verify: Optional[bool] = True,
timeout: Optional[int] = 30,
max_retries: Optional[int] = 4,
host: str,
user: str,
password: str,
port: int = 993,
ssl: bool = True,
verify: bool = True,
timeout: int = 30,
max_retries: int = 4,
):
self._username = user
self._password = password
@@ -47,13 +46,13 @@ class IMAPConnection(MailboxConnection):
def fetch_messages(self, reports_folder: str, **kwargs):
self._client.select_folder(reports_folder)
since = kwargs.get("since")
if since:
return self._client.search(["SINCE", since])
if since is not None:
return self._client.search(f"SINCE {since}")
else:
return self._client.search()
def fetch_message(self, message_id: int):
return self._client.fetch_message(message_id, parse=False)
return cast(str, self._client.fetch_message(message_id, parse=False))
def delete_message(self, message_id: int):
self._client.delete_messages([message_id])

View File

@@ -13,16 +13,16 @@ class MailboxConnection(ABC):
def create_folder(self, folder_name: str):
raise NotImplementedError
def fetch_messages(self, reports_folder: str, **kwargs) -> list[str]:
def fetch_messages(self, reports_folder: str, **kwargs):
raise NotImplementedError
def fetch_message(self, message_id) -> str:
raise NotImplementedError
def delete_message(self, message_id: str):
def delete_message(self, message_id):
raise NotImplementedError
def move_message(self, message_id: str, folder_name: str):
def move_message(self, message_id, folder_name: str):
raise NotImplementedError
def keepalive(self):

View File

@@ -2,21 +2,20 @@
from __future__ import annotations
from typing import Optional
import mailbox
import os
from time import sleep
from typing import Dict
from parsedmarc.log import logger
from parsedmarc.mail.mailbox_connection import MailboxConnection
import mailbox
import os
class MaildirConnection(MailboxConnection):
def __init__(
self,
maildir_path: Optional[bool] = None,
maildir_create: Optional[bool] = False,
maildir_path: str,
maildir_create: bool = False,
):
self._maildir_path = maildir_path
self._maildir_create = maildir_create
@@ -33,27 +32,31 @@ class MaildirConnection(MailboxConnection):
)
raise Exception(ex)
self._client = mailbox.Maildir(maildir_path, create=maildir_create)
self._subfolder_client = {}
self._subfolder_client: Dict[str, mailbox.Maildir] = {}
def create_folder(self, folder_name: str):
self._subfolder_client[folder_name] = self._client.add_folder(folder_name)
self._client.add_folder(folder_name)
def fetch_messages(self, reports_folder: str, **kwargs):
return self._client.keys()
def fetch_message(self, message_id: str):
return self._client.get(message_id).as_string()
def fetch_message(self, message_id: str) -> str:
msg = self._client.get(message_id)
if msg is not None:
msg = msg.as_string()
if msg is not None:
return msg
return ""
def delete_message(self, message_id: str):
self._client.remove(message_id)
def move_message(self, message_id: str, folder_name: str):
message_data = self._client.get(message_id)
if folder_name not in self._subfolder_client.keys():
self._subfolder_client = mailbox.Maildir(
os.join(self.maildir_path, folder_name), create=self.maildir_create
)
if message_data is None:
return
if folder_name not in self._subfolder_client:
self._subfolder_client[folder_name] = self._client.add_folder(folder_name)
self._subfolder_client[folder_name].add(message_data)
self._client.remove(message_id)

View File

@@ -2,30 +2,28 @@
from __future__ import annotations
from typing import Optional, Union, Any
from collections import OrderedDict
from typing import Any, Optional, Union
from opensearchpy import (
Q,
connections,
Object,
Boolean,
Date,
Document,
Index,
Nested,
InnerDoc,
Integer,
Text,
Boolean,
Ip,
Date,
Nested,
Object,
Q,
Search,
Text,
connections,
)
from opensearchpy.helpers import reindex
from parsedmarc import InvalidForensicReport
from parsedmarc.log import logger
from parsedmarc.utils import human_timestamp_to_datetime
from parsedmarc import InvalidForensicReport
class OpenSearchError(Exception):
@@ -104,7 +102,7 @@ class _AggregateReportDoc(Document):
def add_spf_result(self, domain: str, scope: str, result: _SPFResult):
self.spf_results.append(_SPFResult(domain=domain, scope=scope, result=result))
def save(self, **kwargs):
def save(self, **kwargs): # pyright: ignore[reportIncompatibleMethodOverride]
self.passed_dmarc = False
self.passed_dmarc = self.spf_aligned or self.dkim_aligned
@@ -203,11 +201,11 @@ class _SMTPTLSPolicyDoc(InnerDoc):
def add_failure_details(
self,
result_type: str,
ip_address: str,
receiving_ip: str,
receiving_mx_helo: str,
failed_session_count: int,
result_type: Optional[str] = None,
ip_address: Optional[str] = None,
receiving_ip: Optional[str] = None,
receiving_mx_helo: Optional[str] = None,
failed_session_count: Optional[int] = None,
sending_mta_ip: Optional[str] = None,
receiving_mx_hostname: Optional[str] = None,
additional_information_uri: Optional[str] = None,
@@ -297,7 +295,7 @@ def set_hosts(
conn_params["ca_certs"] = ssl_cert_path
else:
conn_params["verify_certs"] = False
if username:
if username and password:
conn_params["http_auth"] = username + ":" + password
if api_key:
conn_params["api_key"] = api_key
@@ -376,19 +374,19 @@ def migrate_indexes(
pass
def save_aggregate_report_to_elasticsearch(
aggregate_report: OrderedDict[str, Any],
def save_aggregate_report_to_opensearch(
aggregate_report: dict[str, Any],
index_suffix: Optional[str] = None,
index_prefix: Optional[str] = None,
monthly_indexes: Optional[bool] = False,
number_of_shards: Optional[int] = 1,
number_of_replicas: Optional[int] = 0,
monthly_indexes: bool = False,
number_of_shards: int = 1,
number_of_replicas: int = 0,
):
"""
Saves a parsed DMARC aggregate report to OpenSearch
Args:
aggregate_report (OrderedDict): A parsed forensic report
aggregate_report (dict): A parsed forensic report
index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily indexes
@@ -428,13 +426,12 @@ def save_aggregate_report_to_elasticsearch(
query = org_name_query & report_id_query & domain_query
query = query & begin_date_query & end_date_query
search.query = query
begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
try:
existing = search.execute()
except Exception as error_:
begin_date_human = begin_date.strftime("%Y-%m-%d %H:%M:%SZ")
end_date_human = end_date.strftime("%Y-%m-%d %H:%M:%SZ")
raise OpenSearchError(
"OpenSearch's search for existing report \
error: {}".format(error_.__str__())
@@ -538,11 +535,11 @@ def save_aggregate_report_to_elasticsearch(
raise OpenSearchError("OpenSearch error: {0}".format(e.__str__()))
def save_forensic_report_to_elasticsearch(
forensic_report: OrderedDict[str, Any],
index_suffix: Optional[any] = None,
def save_forensic_report_to_opensearch(
forensic_report: dict[str, Any],
index_suffix: Optional[str] = None,
index_prefix: Optional[str] = None,
monthly_indexes: Optional[bool] = False,
monthly_indexes: bool = False,
number_of_shards: int = 1,
number_of_replicas: int = 0,
):
@@ -550,7 +547,7 @@ def save_forensic_report_to_elasticsearch(
Saves a parsed DMARC forensic report to OpenSearch
Args:
forensic_report (OrderedDict): A parsed forensic report
forensic_report (dict): A parsed forensic report
index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily
@@ -570,7 +567,7 @@ def save_forensic_report_to_elasticsearch(
sample_date = forensic_report["parsed_sample"]["date"]
sample_date = human_timestamp_to_datetime(sample_date)
original_headers = forensic_report["parsed_sample"]["headers"]
headers = OrderedDict()
headers: dict[str, Any] = {}
for original_header in original_headers:
headers[original_header.lower()] = original_headers[original_header]
@@ -705,19 +702,19 @@ def save_forensic_report_to_elasticsearch(
)
def save_smtp_tls_report_to_elasticsearch(
report: OrderedDict[str, Any],
index_suffix: str = None,
index_prefix: str = None,
monthly_indexes: Optional[bool] = False,
number_of_shards: Optional[int] = 1,
number_of_replicas: Optional[int] = 0,
def save_smtp_tls_report_to_opensearch(
report: dict[str, Any],
index_suffix: Optional[str] = None,
index_prefix: Optional[str] = None,
monthly_indexes: bool = False,
number_of_shards: int = 1,
number_of_replicas: int = 0,
):
"""
Saves a parsed SMTP TLS report to OpenSearch
Args:
report (OrderedDict): A parsed SMTP TLS report
report (dict): A parsed SMTP TLS report
index_suffix (str): The suffix of the name of the index to save to
index_prefix (str): The prefix of the name of the index to save to
monthly_indexes (bool): Use monthly indexes instead of daily indexes

View File

@@ -2,13 +2,11 @@
from __future__ import annotations
import json
from typing import Any
import json
import boto3
from collections import OrderedDict
from parsedmarc.log import logger
from parsedmarc.utils import human_timestamp_to_datetime
@@ -53,18 +51,18 @@ class S3Client(object):
aws_access_key_id=access_key_id,
aws_secret_access_key=secret_access_key,
)
self.bucket = self.s3.Bucket(self.bucket_name)
self.bucket = self.s3.Bucket(self.bucket_name) # type: ignore
def save_aggregate_report_to_s3(self, report: OrderedDict[str, Any]):
def save_aggregate_report_to_s3(self, report: dict[str, Any]):
self.save_report_to_s3(report, "aggregate")
def save_forensic_report_to_s3(self, report: OrderedDict[str, Any]):
def save_forensic_report_to_s3(self, report: dict[str, Any]):
self.save_report_to_s3(report, "forensic")
def save_smtp_tls_report_to_s3(self, report: OrderedDict[str, Any]):
def save_smtp_tls_report_to_s3(self, report: dict[str, Any]):
self.save_report_to_s3(report, "smtp_tls")
def save_report_to_s3(self, report: OrderedDict[str, Any], report_type: str):
def save_report_to_s3(self, report: dict[str, Any], report_type: str):
if report_type == "smtp_tls":
report_date = report["begin_date"]
report_id = report["report_id"]

View File

@@ -2,16 +2,13 @@
from __future__ import annotations
from typing import Any
from collections import OrderedDict
from urllib.parse import urlparse
import socket
import json
import socket
from typing import Any, Union
from urllib.parse import urlparse
import urllib3
import requests
import urllib3
from parsedmarc.constants import USER_AGENT
from parsedmarc.log import logger
@@ -35,7 +32,7 @@ class HECClient(object):
url: str,
access_token: str,
index: str,
source: bool = "parsedmarc",
source: str = "parsedmarc",
verify=True,
timeout=60,
):
@@ -51,9 +48,9 @@ class HECClient(object):
timeout (float): Number of seconds to wait for the server to send
data before giving up
"""
url = urlparse(url)
parsed_url = urlparse(url)
self.url = "{0}://{1}/services/collector/event/1.0".format(
url.scheme, url.netloc
parsed_url.scheme, parsed_url.netloc
)
self.access_token = access_token.lstrip("Splunk ")
self.index = index
@@ -62,7 +59,9 @@ class HECClient(object):
self.session = requests.Session()
self.timeout = timeout
self.session.verify = verify
self._common_data = dict(host=self.host, source=self.source, index=self.index)
self._common_data: dict[str, Union[str, int, float, dict]] = dict(
host=self.host, source=self.source, index=self.index
)
self.session.headers = {
"User-Agent": USER_AGENT,
@@ -70,7 +69,8 @@ class HECClient(object):
}
def save_aggregate_reports_to_splunk(
self, aggregate_reports: list[OrderedDict[str, Any]]
self,
aggregate_reports: Union[list[dict[str, Any]], dict[str, Any]],
):
"""
Saves aggregate DMARC reports to Splunk
@@ -91,7 +91,7 @@ class HECClient(object):
json_str = ""
for report in aggregate_reports:
for record in report["records"]:
new_report = dict()
new_report: dict[str, Union[str, int, float, dict]] = dict()
for metadata in report["report_metadata"]:
new_report[metadata] = report["report_metadata"][metadata]
new_report["interval_begin"] = record["interval_begin"]
@@ -135,7 +135,8 @@ class HECClient(object):
raise SplunkError(response["text"])
def save_forensic_reports_to_splunk(
self, forensic_reports: list[OrderedDict[str, Any]]
self,
forensic_reports: Union[list[dict[str, Any]], dict[str, Any]],
):
"""
Saves forensic DMARC reports to Splunk
@@ -170,7 +171,9 @@ class HECClient(object):
if response["code"] != 0:
raise SplunkError(response["text"])
def save_smtp_tls_reports_to_splunk(self, reports: OrderedDict[str, Any]):
def save_smtp_tls_reports_to_splunk(
self, reports: Union[list[dict[str, Any]], dict[str, Any]]
):
"""
Saves aggregate DMARC reports to Splunk

View File

@@ -3,15 +3,11 @@
from __future__ import annotations
import json
import logging
import logging.handlers
from typing import Any
from collections import OrderedDict
import json
from parsedmarc import (
parsed_aggregate_reports_to_csv_rows,
parsed_forensic_reports_to_csv_rows,
@@ -36,23 +32,17 @@ class SyslogClient(object):
log_handler = logging.handlers.SysLogHandler(address=(server_name, server_port))
self.logger.addHandler(log_handler)
def save_aggregate_report_to_syslog(
self, aggregate_reports: list[OrderedDict[str, Any]]
):
def save_aggregate_report_to_syslog(self, aggregate_reports: list[dict[str, Any]]):
rows = parsed_aggregate_reports_to_csv_rows(aggregate_reports)
for row in rows:
self.logger.info(json.dumps(row))
def save_forensic_report_to_syslog(
self, forensic_reports: list[OrderedDict[str, Any]]
):
def save_forensic_report_to_syslog(self, forensic_reports: list[dict[str, Any]]):
rows = parsed_forensic_reports_to_csv_rows(forensic_reports)
for row in rows:
self.logger.info(json.dumps(row))
def save_smtp_tls_report_to_syslog(
self, smtp_tls_reports: list[OrderedDict[str, Any]]
):
def save_smtp_tls_report_to_syslog(self, smtp_tls_reports: list[dict[str, Any]]):
rows = parsed_smtp_tls_reports_to_csv_rows(smtp_tls_reports)
for row in rows:
self.logger.info(json.dumps(row))

220
parsedmarc/types.py Normal file
View File

@@ -0,0 +1,220 @@
from __future__ import annotations
from typing import Any, Dict, List, Literal, Optional, TypedDict, Union
# NOTE: This module is intentionally Python 3.9 compatible.
# - No PEP 604 unions (A | B)
# - No typing.NotRequired / Required (3.11+) to avoid an extra dependency.
# For optional keys, use total=False TypedDicts.
ReportType = Literal["aggregate", "forensic", "smtp_tls"]
class AggregateReportMetadata(TypedDict):
org_name: str
org_email: str
org_extra_contact_info: Optional[str]
report_id: str
begin_date: str
end_date: str
timespan_requires_normalization: bool
original_timespan_seconds: int
errors: List[str]
class AggregatePolicyPublished(TypedDict):
domain: str
adkim: str
aspf: str
p: str
sp: str
pct: str
fo: str
class IPSourceInfo(TypedDict):
ip_address: str
country: Optional[str]
reverse_dns: Optional[str]
base_domain: Optional[str]
name: Optional[str]
type: Optional[str]
class AggregateAlignment(TypedDict):
spf: bool
dkim: bool
dmarc: bool
class AggregateIdentifiers(TypedDict):
header_from: str
envelope_from: Optional[str]
envelope_to: Optional[str]
class AggregatePolicyOverrideReason(TypedDict):
type: Optional[str]
comment: Optional[str]
class AggregateAuthResultDKIM(TypedDict):
domain: str
result: str
selector: str
class AggregateAuthResultSPF(TypedDict):
domain: str
result: str
scope: str
class AggregateAuthResults(TypedDict):
dkim: List[AggregateAuthResultDKIM]
spf: List[AggregateAuthResultSPF]
class AggregatePolicyEvaluated(TypedDict):
disposition: str
dkim: str
spf: str
policy_override_reasons: List[AggregatePolicyOverrideReason]
class AggregateRecord(TypedDict):
interval_begin: str
interval_end: str
source: IPSourceInfo
count: int
alignment: AggregateAlignment
policy_evaluated: AggregatePolicyEvaluated
disposition: str
identifiers: AggregateIdentifiers
auth_results: AggregateAuthResults
class AggregateReport(TypedDict):
xml_schema: str
report_metadata: AggregateReportMetadata
policy_published: AggregatePolicyPublished
records: List[AggregateRecord]
class EmailAddress(TypedDict):
display_name: Optional[str]
address: str
local: Optional[str]
domain: Optional[str]
class EmailAttachment(TypedDict, total=False):
filename: Optional[str]
mail_content_type: Optional[str]
sha256: Optional[str]
ParsedEmail = TypedDict(
"ParsedEmail",
{
# This is a lightly-specified version of mailsuite/mailparser JSON.
# It focuses on the fields parsedmarc uses in forensic handling.
"headers": Dict[str, Any],
"subject": Optional[str],
"filename_safe_subject": Optional[str],
"date": Optional[str],
"from": EmailAddress,
"to": List[EmailAddress],
"cc": List[EmailAddress],
"bcc": List[EmailAddress],
"attachments": List[EmailAttachment],
"body": Optional[str],
"has_defects": bool,
"defects": Any,
"defects_categories": Any,
},
total=False,
)
class ForensicReport(TypedDict):
feedback_type: Optional[str]
user_agent: Optional[str]
version: Optional[str]
original_envelope_id: Optional[str]
original_mail_from: Optional[str]
original_rcpt_to: Optional[str]
arrival_date: str
arrival_date_utc: str
authentication_results: Optional[str]
delivery_result: Optional[str]
auth_failure: List[str]
authentication_mechanisms: List[str]
dkim_domain: Optional[str]
reported_domain: str
sample_headers_only: bool
source: IPSourceInfo
sample: str
parsed_sample: ParsedEmail
class SMTPTLSFailureDetails(TypedDict):
result_type: str
failed_session_count: int
class SMTPTLSFailureDetailsOptional(SMTPTLSFailureDetails, total=False):
sending_mta_ip: str
receiving_ip: str
receiving_mx_hostname: str
receiving_mx_helo: str
additional_info_uri: str
failure_reason_code: str
ip_address: str
class SMTPTLSPolicySummary(TypedDict):
policy_domain: str
policy_type: str
successful_session_count: int
failed_session_count: int
class SMTPTLSPolicy(SMTPTLSPolicySummary, total=False):
policy_strings: List[str]
mx_host_patterns: List[str]
failure_details: List[SMTPTLSFailureDetailsOptional]
class SMTPTLSReport(TypedDict):
organization_name: str
begin_date: str
end_date: str
contact_info: Union[str, List[str]]
report_id: str
policies: List[SMTPTLSPolicy]
class AggregateParsedReport(TypedDict):
report_type: Literal["aggregate"]
report: AggregateReport
class ForensicParsedReport(TypedDict):
report_type: Literal["forensic"]
report: ForensicReport
class SMTPTLSParsedReport(TypedDict):
report_type: Literal["smtp_tls"]
report: SMTPTLSReport
ParsedReport = Union[AggregateParsedReport, ForensicParsedReport, SMTPTLSParsedReport]
class ParsingResults(TypedDict):
aggregate_reports: List[AggregateReport]
forensic_reports: List[ForensicReport]
smtp_tls_reports: List[SMTPTLSReport]

View File

@@ -4,26 +4,23 @@
from __future__ import annotations
from typing import Optional, Union
import logging
import os
from datetime import datetime
from datetime import timezone
from datetime import timedelta
from collections import OrderedDict
from expiringdict import ExpiringDict
import tempfile
import subprocess
import shutil
import mailparser
import json
import hashlib
import base64
import mailbox
import re
import csv
import hashlib
import io
import json
import logging
import mailbox
import os
import re
import shutil
import subprocess
import tempfile
from datetime import datetime, timedelta, timezone
from typing import Optional, TypedDict, Union, cast
import mailparser
from expiringdict import ExpiringDict
try:
from importlib.resources import files
@@ -32,19 +29,19 @@ except ImportError:
from importlib.resources import files
from dateutil.parser import parse as parse_date
import dns.reversename
import dns.resolver
import dns.exception
import dns.resolver
import dns.reversename
import geoip2.database
import geoip2.errors
import publicsuffixlist
import requests
from dateutil.parser import parse as parse_date
from parsedmarc.log import logger
import parsedmarc.resources.dbip
import parsedmarc.resources.maps
from parsedmarc.constants import USER_AGENT
from parsedmarc.log import logger
parenthesis_regex = re.compile(r"\s*\(.*\)\s*")
@@ -67,6 +64,23 @@ class DownloadError(RuntimeError):
"""Raised when an error occurs when downloading a file"""
class ReverseDNSService(TypedDict):
name: str
type: Optional[str]
ReverseDNSMap = dict[str, ReverseDNSService]
class IPAddressInfo(TypedDict):
ip_address: str
reverse_dns: Optional[str]
country: Optional[str]
base_domain: Optional[str]
name: Optional[str]
type: Optional[str]
def decode_base64(data: str) -> bytes:
"""
Decodes a base64 string, with padding being optional
@@ -78,14 +92,14 @@ def decode_base64(data: str) -> bytes:
bytes: The decoded bytes
"""
data = bytes(data, encoding="ascii")
missing_padding = len(data) % 4
data_bytes = bytes(data, encoding="ascii")
missing_padding = len(data_bytes) % 4
if missing_padding != 0:
data += b"=" * (4 - missing_padding)
return base64.b64decode(data)
data_bytes += b"=" * (4 - missing_padding)
return base64.b64decode(data_bytes)
def get_base_domain(domain: str) -> str:
def get_base_domain(domain: str) -> Optional[str]:
"""
Gets the base domain name for the given domain
@@ -114,8 +128,8 @@ def query_dns(
record_type: str,
*,
cache: Optional[ExpiringDict] = None,
nameservers: list[str] = None,
timeout: int = 2.0,
nameservers: Optional[list[str]] = None,
timeout: float = 2.0,
) -> list[str]:
"""
Queries DNS
@@ -135,9 +149,9 @@ def query_dns(
record_type = record_type.upper()
cache_key = "{0}_{1}".format(domain, record_type)
if cache:
records = cache.get(cache_key, None)
if records:
return records
cached_records = cache.get(cache_key, None)
if isinstance(cached_records, list):
return cast(list[str], cached_records)
resolver = dns.resolver.Resolver()
timeout = float(timeout)
@@ -151,26 +165,12 @@ def query_dns(
resolver.nameservers = nameservers
resolver.timeout = timeout
resolver.lifetime = timeout
if record_type == "TXT":
resource_records = list(
map(
lambda r: r.strings,
resolver.resolve(domain, record_type, lifetime=timeout),
)
)
_resource_record = [
resource_record[0][:0].join(resource_record)
for resource_record in resource_records
if resource_record
]
records = [r.decode() for r in _resource_record]
else:
records = list(
map(
lambda r: r.to_text().replace('"', "").rstrip("."),
resolver.resolve(domain, record_type, lifetime=timeout),
)
records = list(
map(
lambda r: r.to_text().replace('"', "").rstrip("."),
resolver.resolve(domain, record_type, lifetime=timeout),
)
)
if cache:
cache[cache_key] = records
@@ -181,9 +181,9 @@ def get_reverse_dns(
ip_address,
*,
cache: Optional[ExpiringDict] = None,
nameservers: list[str] = None,
timeout: int = 2.0,
) -> str:
nameservers: Optional[list[str]] = None,
timeout: float = 2.0,
) -> Optional[str]:
"""
Resolves an IP address to a hostname using a reverse DNS query
@@ -201,7 +201,7 @@ def get_reverse_dns(
try:
address = dns.reversename.from_address(ip_address)
hostname = query_dns(
address, "PTR", cache=cache, nameservers=nameservers, timeout=timeout
str(address), "PTR", cache=cache, nameservers=nameservers, timeout=timeout
)[0]
except dns.exception.DNSException as e:
@@ -238,7 +238,7 @@ def timestamp_to_human(timestamp: int) -> str:
def human_timestamp_to_datetime(
human_timestamp: str, *, to_utc: Optional[bool] = False
human_timestamp: str, *, to_utc: bool = False
) -> datetime:
"""
Converts a human-readable timestamp into a Python ``datetime`` object
@@ -269,10 +269,12 @@ def human_timestamp_to_unix_timestamp(human_timestamp: str) -> int:
float: The converted timestamp
"""
human_timestamp = human_timestamp.replace("T", " ")
return human_timestamp_to_datetime(human_timestamp).timestamp()
return int(human_timestamp_to_datetime(human_timestamp).timestamp())
def get_ip_address_country(ip_address: str, *, db_path: Optional[str] = None) -> str:
def get_ip_address_country(
ip_address: str, *, db_path: Optional[str] = None
) -> Optional[str]:
"""
Returns the ISO code for the country associated
with the given IPv4 or IPv6 address
@@ -337,12 +339,12 @@ def get_ip_address_country(ip_address: str, *, db_path: Optional[str] = None) ->
def get_service_from_reverse_dns_base_domain(
base_domain,
*,
always_use_local_file: Optional[bool] = False,
local_file_path: Optional[bool] = None,
url: Optional[bool] = None,
offline: Optional[bool] = False,
reverse_dns_map: Optional[bool] = None,
) -> str:
always_use_local_file: bool = False,
local_file_path: Optional[str] = None,
url: Optional[str] = None,
offline: bool = False,
reverse_dns_map: Optional[ReverseDNSMap] = None,
) -> ReverseDNSService:
"""
Returns the service name of a given base domain name from reverse DNS.
@@ -359,12 +361,6 @@ def get_service_from_reverse_dns_base_domain(
the supplied reverse_dns_base_domain and the type will be None
"""
def load_csv(_csv_file):
reader = csv.DictReader(_csv_file)
for row in reader:
key = row["base_reverse_dns"].lower().strip()
reverse_dns_map[key] = dict(name=row["name"], type=row["type"])
base_domain = base_domain.lower().strip()
if url is None:
url = (
@@ -372,11 +368,24 @@ def get_service_from_reverse_dns_base_domain(
"/parsedmarc/master/parsedmarc/"
"resources/maps/base_reverse_dns_map.csv"
)
reverse_dns_map_value: ReverseDNSMap
if reverse_dns_map is None:
reverse_dns_map = dict()
reverse_dns_map_value = {}
else:
reverse_dns_map_value = reverse_dns_map
def load_csv(_csv_file):
reader = csv.DictReader(_csv_file)
for row in reader:
key = row["base_reverse_dns"].lower().strip()
reverse_dns_map_value[key] = {
"name": row["name"],
"type": row["type"],
}
csv_file = io.StringIO()
if not (offline or always_use_local_file) and len(reverse_dns_map) == 0:
if not (offline or always_use_local_file) and len(reverse_dns_map_value) == 0:
try:
logger.debug(f"Trying to fetch reverse DNS map from {url}...")
headers = {"User-Agent": USER_AGENT}
@@ -393,7 +402,7 @@ def get_service_from_reverse_dns_base_domain(
logging.debug("Response body:")
logger.debug(csv_file.read())
if len(reverse_dns_map) == 0:
if len(reverse_dns_map_value) == 0:
logger.info("Loading included reverse DNS map...")
path = str(
files(parsedmarc.resources.maps).joinpath("base_reverse_dns_map.csv")
@@ -402,10 +411,11 @@ def get_service_from_reverse_dns_base_domain(
path = local_file_path
with open(path) as csv_file:
load_csv(csv_file)
service: ReverseDNSService
try:
service = reverse_dns_map[base_domain]
service = reverse_dns_map_value[base_domain]
except KeyError:
service = dict(name=base_domain, type=None)
service = {"name": base_domain, "type": None}
return service
@@ -415,14 +425,14 @@ def get_ip_address_info(
*,
ip_db_path: Optional[str] = None,
reverse_dns_map_path: Optional[str] = None,
always_use_local_files: Optional[bool] = False,
reverse_dns_map_url: Optional[bool] = None,
always_use_local_files: bool = False,
reverse_dns_map_url: Optional[str] = None,
cache: Optional[ExpiringDict] = None,
reverse_dns_map: Optional[bool] = None,
offline: Optional[bool] = False,
reverse_dns_map: Optional[ReverseDNSMap] = None,
offline: bool = False,
nameservers: Optional[list[str]] = None,
timeout: Optional[float] = 2.0,
) -> OrderedDict[str, str]:
timeout: float = 2.0,
) -> IPAddressInfo:
"""
Returns reverse DNS and country information for the given IP address
@@ -440,17 +450,27 @@ def get_ip_address_info(
timeout (float): Sets the DNS timeout in seconds
Returns:
OrderedDict: ``ip_address``, ``reverse_dns``, ``country``
dict: ``ip_address``, ``reverse_dns``, ``country``
"""
ip_address = ip_address.lower()
if cache is not None:
info = cache.get(ip_address, None)
if info:
cached_info = cache.get(ip_address, None)
if (
cached_info
and isinstance(cached_info, dict)
and "ip_address" in cached_info
):
logger.debug(f"IP address {ip_address} was found in cache")
return info
info = OrderedDict()
info["ip_address"] = ip_address
return cast(IPAddressInfo, cached_info)
info: IPAddressInfo = {
"ip_address": ip_address,
"reverse_dns": None,
"country": None,
"base_domain": None,
"name": None,
"type": None,
}
if offline:
reverse_dns = None
else:
@@ -460,9 +480,6 @@ def get_ip_address_info(
country = get_ip_address_country(ip_address, db_path=ip_db_path)
info["country"] = country
info["reverse_dns"] = reverse_dns
info["base_domain"] = None
info["name"] = None
info["type"] = None
if reverse_dns is not None:
base_domain = get_base_domain(reverse_dns)
if base_domain is not None:
@@ -487,7 +504,7 @@ def get_ip_address_info(
return info
def parse_email_address(original_address: str) -> OrderedDict[str, str]:
def parse_email_address(original_address: str) -> dict[str, Optional[str]]:
if original_address[0] == "":
display_name = None
else:
@@ -500,14 +517,12 @@ def parse_email_address(original_address: str) -> OrderedDict[str, str]:
local = address_parts[0].lower()
domain = address_parts[-1].lower()
return OrderedDict(
[
("display_name", display_name),
("address", address),
("local", local),
("domain", domain),
]
)
return {
"display_name": display_name,
"address": address,
"local": local,
"domain": domain,
}
def get_filename_safe_string(string: str) -> str:
@@ -568,7 +583,7 @@ def is_outlook_msg(content) -> bool:
)
def convert_outlook_msg(msg_bytes: bytes) -> str:
def convert_outlook_msg(msg_bytes: bytes) -> bytes:
"""
Uses the ``msgconvert`` Perl utility to convert an Outlook MS file to
standard RFC 822 format
@@ -577,7 +592,7 @@ def convert_outlook_msg(msg_bytes: bytes) -> str:
msg_bytes (bytes): the content of the .msg file
Returns:
A RFC 822 string
A RFC 822 bytes payload
"""
if not is_outlook_msg(msg_bytes):
raise ValueError("The supplied bytes are not an Outlook MSG file")
@@ -605,8 +620,8 @@ def convert_outlook_msg(msg_bytes: bytes) -> str:
def parse_email(
data: Union[bytes, str], *, strip_attachment_payloads: Optional[bool] = False
):
data: Union[bytes, str], *, strip_attachment_payloads: bool = False
) -> dict:
"""
A simplified email parser

View File

@@ -4,8 +4,6 @@ from __future__ import annotations
from typing import Any, Optional, Union
from collections import OrderedDict
import requests
from parsedmarc import logger
@@ -40,19 +38,19 @@ class WebhookClient(object):
"Content-Type": "application/json",
}
def save_forensic_report_to_webhook(self, report: OrderedDict[str, Any]):
def save_forensic_report_to_webhook(self, report: str):
try:
self._send_to_webhook(self.forensic_url, report)
except Exception as error_:
logger.error("Webhook Error: {0}".format(error_.__str__()))
def save_smtp_tls_report_to_webhook(self, report: OrderedDict[str, Any]):
def save_smtp_tls_report_to_webhook(self, report: str):
try:
self._send_to_webhook(self.smtp_tls_url, report)
except Exception as error_:
logger.error("Webhook Error: {0}".format(error_.__str__()))
def save_aggregate_report_to_webhook(self, report: OrderedDict[str, Any]):
def save_aggregate_report_to_webhook(self, report: str):
try:
self._send_to_webhook(self.aggregate_url, report)
except Exception as error_:

View File

@@ -2,6 +2,7 @@
requires = [
"hatchling>=1.27.0",
]
requires_python = ">=3.9,<3.14"
build-backend = "hatchling.build"
[project]
@@ -28,7 +29,7 @@ classifiers = [
"Operating System :: OS Independent",
"Programming Language :: Python :: 3"
]
requires-python = ">=3.9, <3.14"
requires-python = ">=3.9"
dependencies = [
"azure-identity>=1.8.0",
"azure-monitor-ingestion>=1.0.0",
@@ -47,7 +48,7 @@ dependencies = [
"imapclient>=2.1.0",
"kafka-python-ng>=2.2.2",
"lxml>=4.4.0",
"mailsuite>=1.9.18",
"mailsuite>=1.11.2",
"msgraph-core==0.2.2",
"opensearch-py>=2.4.2,<=3.0.0",
"publicsuffixlist>=0.10.0",

0
tests.py Normal file → Executable file
View File