mirror of
https://github.com/domainaware/parsedmarc.git
synced 2026-03-07 07:11:24 +00:00
Compare commits
3 Commits
copilot/fi
...
copilot/cr
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
3dbf21f072 | ||
|
|
2d2e2bc261 | ||
|
|
f830418381 |
2
.github/workflows/python-tests.yml
vendored
2
.github/workflows/python-tests.yml
vendored
@@ -30,7 +30,7 @@ jobs:
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
|
||||
python-version: ["3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v5
|
||||
|
||||
64
AGENTS.md
64
AGENTS.md
@@ -1,64 +0,0 @@
|
||||
# AGENTS.md
|
||||
|
||||
This file provides guidance to AI agents when working with code in this repository.
|
||||
|
||||
## Project Overview
|
||||
|
||||
parsedmarc is a Python module and CLI utility for parsing DMARC aggregate (RUA), forensic (RUF), and SMTP TLS reports. It reads reports from IMAP, Microsoft Graph, Gmail API, Maildir, mbox files, or direct file paths, and outputs to JSON/CSV, Elasticsearch, OpenSearch, Splunk, Kafka, S3, Azure Log Analytics, syslog, or webhooks.
|
||||
|
||||
## Common Commands
|
||||
|
||||
```bash
|
||||
# Install with dev/build dependencies
|
||||
pip install .[build]
|
||||
|
||||
# Run all tests with coverage
|
||||
pytest --cov --cov-report=xml tests.py
|
||||
|
||||
# Run a single test
|
||||
pytest tests.py::Test::testAggregateSamples
|
||||
|
||||
# Lint and format
|
||||
ruff check .
|
||||
ruff format .
|
||||
|
||||
# Test CLI with sample reports
|
||||
parsedmarc --debug -c ci.ini samples/aggregate/*
|
||||
parsedmarc --debug -c ci.ini samples/forensic/*
|
||||
|
||||
# Build docs
|
||||
cd docs && make html
|
||||
|
||||
# Build distribution
|
||||
hatch build
|
||||
```
|
||||
|
||||
To skip DNS lookups during testing, set `GITHUB_ACTIONS=true`.
|
||||
|
||||
## Architecture
|
||||
|
||||
**Data flow:** Input sources → CLI (`cli.py:_main`) → Parse (`__init__.py`) → Enrich (DNS/GeoIP via `utils.py`) → Output integrations
|
||||
|
||||
### Key modules
|
||||
|
||||
- `parsedmarc/__init__.py` — Core parsing logic. Main functions: `parse_report_file()`, `parse_report_email()`, `parse_aggregate_report_xml()`, `parse_forensic_report()`, `parse_smtp_tls_report_json()`, `get_dmarc_reports_from_mailbox()`, `watch_inbox()`
|
||||
- `parsedmarc/cli.py` — CLI entry point (`_main`), config file parsing, output orchestration
|
||||
- `parsedmarc/types.py` — TypedDict definitions for all report types (`AggregateReport`, `ForensicReport`, `SMTPTLSReport`, `ParsingResults`)
|
||||
- `parsedmarc/utils.py` — IP/DNS/GeoIP enrichment, base64 decoding, compression handling
|
||||
- `parsedmarc/mail/` — Polymorphic mail connections: `IMAPConnection`, `GmailConnection`, `MSGraphConnection`, `MaildirConnection`
|
||||
- `parsedmarc/{elastic,opensearch,splunk,kafkaclient,loganalytics,syslog,s3,webhook,gelf}.py` — Output integrations
|
||||
|
||||
### Report type system
|
||||
|
||||
`ReportType = Literal["aggregate", "forensic", "smtp_tls"]`. Exception hierarchy: `ParserError` → `InvalidDMARCReport` → `InvalidAggregateReport`/`InvalidForensicReport`, and `InvalidSMTPTLSReport`.
|
||||
|
||||
### Caching
|
||||
|
||||
IP address info cached for 4 hours, seen aggregate report IDs cached for 1 hour (via `ExpiringDict`).
|
||||
|
||||
## Code Style
|
||||
|
||||
- Ruff for formatting and linting (configured in `.vscode/settings.json`)
|
||||
- TypedDict for structured data, type hints throughout
|
||||
- Python ≥3.10 required
|
||||
- Tests are in a single `tests.py` file using unittest; sample reports live in `samples/`
|
||||
18
CHANGELOG.md
18
CHANGELOG.md
@@ -1,23 +1,5 @@
|
||||
# Changelog
|
||||
|
||||
## 9.1.1
|
||||
|
||||
### Fixes
|
||||
|
||||
- Fix the use of Elasticsearch and OpenSearch API keys (PR #660 fixes issue #653)
|
||||
|
||||
### Changes
|
||||
|
||||
- Drop support for Python 3.9 (PR #661)
|
||||
|
||||
## 9.1.0
|
||||
|
||||
## Enhancements
|
||||
|
||||
- Add TCP and TLS support for syslog output. (#656)
|
||||
- Skip DNS lookups in GitHub Actions to prevent DNS timeouts during tests timeouts. (#657)
|
||||
- Remove microseconds from DMARC aggregate report time ranges before parsing them.
|
||||
|
||||
## 9.0.10
|
||||
|
||||
- Support Python 3.14+
|
||||
|
||||
@@ -56,9 +56,9 @@ for RHEL or Debian.
|
||||
| 3.6 | ❌ | Used in RHEL 8, but not supported by project dependencies |
|
||||
| 3.7 | ❌ | End of Life (EOL) |
|
||||
| 3.8 | ❌ | End of Life (EOL) |
|
||||
| 3.9 | ❌ | Used in Debian 11 and RHEL 9, but not supported by project dependencies |
|
||||
| 3.9 | ✅ | Supported until August 2026 (Debian 11); May 2032 (RHEL 9) |
|
||||
| 3.10 | ✅ | Actively maintained |
|
||||
| 3.11 | ✅ | Actively maintained; supported until June 2028 (Debian 12) |
|
||||
| 3.12 | ✅ | Actively maintained; supported until May 2035 (RHEL 10) |
|
||||
| 3.13 | ✅ | Actively maintained; supported until June 2030 (Debian 13) |
|
||||
| 3.14 | ✅ | Supported (requires `imapclient>=3.1.0`) |
|
||||
| 3.14 | ✅ | Actively maintained |
|
||||
|
||||
@@ -56,12 +56,12 @@ for RHEL or Debian.
|
||||
| 3.6 | ❌ | Used in RHEL 8, but not supported by project dependencies |
|
||||
| 3.7 | ❌ | End of Life (EOL) |
|
||||
| 3.8 | ❌ | End of Life (EOL) |
|
||||
| 3.9 | ❌ | Used in Debian 11 and RHEL 9, but not supported by project dependencies |
|
||||
| 3.9 | ✅ | Supported until August 2026 (Debian 11); May 2032 (RHEL 9) |
|
||||
| 3.10 | ✅ | Actively maintained |
|
||||
| 3.11 | ✅ | Actively maintained; supported until June 2028 (Debian 12) |
|
||||
| 3.12 | ✅ | Actively maintained; supported until May 2035 (RHEL 10) |
|
||||
| 3.13 | ✅ | Actively maintained; supported until June 2030 (Debian 13) |
|
||||
| 3.14 | ✅ | Supported (requires `imapclient>=3.1.0`) |
|
||||
| 3.14 | ✅ | Actively maintained |
|
||||
|
||||
```{toctree}
|
||||
:caption: 'Contents'
|
||||
|
||||
@@ -162,10 +162,10 @@ sudo -u parsedmarc virtualenv /opt/parsedmarc/venv
|
||||
```
|
||||
|
||||
CentOS/RHEL 8 systems use Python 3.6 by default, so on those systems
|
||||
explicitly tell `virtualenv` to use `python3.10` instead
|
||||
explicitly tell `virtualenv` to use `python3.9` instead
|
||||
|
||||
```bash
|
||||
sudo -u parsedmarc virtualenv -p python3.10 /opt/parsedmarc/venv
|
||||
sudo -u parsedmarc virtualenv -p python3.9 /opt/parsedmarc/venv
|
||||
```
|
||||
|
||||
Activate the virtualenv
|
||||
|
||||
132
google_secops_parser/README.md
Normal file
132
google_secops_parser/README.md
Normal file
@@ -0,0 +1,132 @@
|
||||
# Google SecOps Parser for parsedmarc
|
||||
|
||||
A [Google Security Operations (Chronicle)](https://cloud.google.com/security/products/security-operations) custom parser for ingesting [parsedmarc](https://domainaware.github.io/parsedmarc/) syslog events into the Unified Data Model (UDM).
|
||||
|
||||
## Overview
|
||||
|
||||
parsedmarc sends DMARC aggregate reports, forensic reports, and SMTP TLS reports as JSON-formatted syslog messages. This parser transforms those JSON events into Google SecOps UDM events for threat detection and investigation.
|
||||
|
||||
### Supported Report Types
|
||||
|
||||
| Report Type | UDM Event Type | Description |
|
||||
|---|---|---|
|
||||
| DMARC Aggregate | `EMAIL_TRANSACTION` | Aggregate DMARC authentication results from reporting organizations |
|
||||
| DMARC Forensic | `EMAIL_TRANSACTION` | Individual email authentication failure reports |
|
||||
| SMTP TLS | `GENERIC_EVENT` | SMTP TLS session success/failure reports (RFC 8460) |
|
||||
|
||||
## UDM Field Mappings
|
||||
|
||||
### DMARC Aggregate Reports
|
||||
|
||||
| parsedmarc Field | UDM Field | Notes |
|
||||
|---|---|---|
|
||||
| `source_ip_address` | `principal.ip` | IP address of the email source |
|
||||
| `source_reverse_dns` | `principal.hostname` | Reverse DNS of source |
|
||||
| `source_country` | `principal.location.country_or_region` | GeoIP country of source |
|
||||
| `header_from` | `network.email.from` | From header domain |
|
||||
| `envelope_from` | `network.email.mail_from` | Envelope sender |
|
||||
| `envelope_to` | `network.email.to` | Envelope recipient |
|
||||
| `domain` | `target.hostname` | Domain the report is about |
|
||||
| `report_id` | `metadata.product_log_id` | Report identifier |
|
||||
| `disposition` | `security_result.action` | `none`→`ALLOW`, `quarantine`→`QUARANTINE`, `reject`→`BLOCK` |
|
||||
| `dmarc_aligned` | `additional.fields` | Whether DMARC passed |
|
||||
| `spf_aligned` | `additional.fields` | Whether SPF was aligned |
|
||||
| `dkim_aligned` | `additional.fields` | Whether DKIM was aligned |
|
||||
| `org_name` | `additional.fields` | Reporting organization name |
|
||||
| `count` | `additional.fields` | Number of messages |
|
||||
| `p`, `sp`, `pct` | `additional.fields` | DMARC policy settings |
|
||||
| `dkim_domains`, `dkim_results` | `additional.fields` | DKIM authentication details |
|
||||
| `spf_domains`, `spf_results` | `additional.fields` | SPF authentication details |
|
||||
|
||||
### DMARC Forensic Reports
|
||||
|
||||
| parsedmarc Field | UDM Field | Notes |
|
||||
|---|---|---|
|
||||
| `source_ip_address` | `principal.ip` | IP address of the email source |
|
||||
| `source_reverse_dns` | `principal.hostname` | Reverse DNS of source |
|
||||
| `source_country` | `principal.location.country_or_region` | GeoIP country of source |
|
||||
| `original_mail_from` | `network.email.from` | Original sender |
|
||||
| `original_rcpt_to` | `network.email.to` | Original recipient |
|
||||
| `subject` | `network.email.subject` | Email subject |
|
||||
| `reported_domain` | `target.hostname` | Reported domain |
|
||||
| `message_id` | `metadata.product_log_id` | Email message ID |
|
||||
| `arrival_date_utc` | `metadata.event_timestamp` | Arrival timestamp (UTC) |
|
||||
| `auth_failure` | `security_result.description` | Type of authentication failure |
|
||||
| `feedback_type` | `additional.fields` | Feedback report type |
|
||||
| `authentication_results` | `additional.fields` | Full authentication results string |
|
||||
| `delivery_result` | `additional.fields` | Email delivery outcome |
|
||||
|
||||
### SMTP TLS Reports
|
||||
|
||||
| parsedmarc Field | UDM Field | Notes |
|
||||
|---|---|---|
|
||||
| `sending_mta_ip` | `principal.ip` | Sending MTA IP address |
|
||||
| `receiving_ip` | `target.ip` | Receiving MTA IP address |
|
||||
| `receiving_mx_hostname` | `target.hostname` | Receiving MX hostname |
|
||||
| `report_id` | `metadata.product_log_id` | Report identifier |
|
||||
| `organization_name` | `additional.fields` | Reporting organization |
|
||||
| `policy_domain` | `additional.fields` | Policy domain |
|
||||
| `policy_type` | `additional.fields` | TLS policy type |
|
||||
| `successful_session_count` | `additional.fields` | Count of successful TLS sessions |
|
||||
| `failed_session_count` | `additional.fields` | Count of failed TLS sessions |
|
||||
| `result_type` | `additional.fields` | Failure result type |
|
||||
| `failure_reason_code` | `additional.fields` | Failure reason code |
|
||||
|
||||
## Installation
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- A Google Security Operations (Chronicle) tenant
|
||||
- parsedmarc configured to send syslog output (see [parsedmarc documentation](https://domainaware.github.io/parsedmarc/))
|
||||
|
||||
### Steps
|
||||
|
||||
1. **Configure parsedmarc syslog output** in your `parsedmarc.ini`:
|
||||
|
||||
```ini
|
||||
[syslog]
|
||||
server = your-chronicle-forwarder.example.com
|
||||
port = 514
|
||||
```
|
||||
|
||||
2. **Create the log source** in Google SecOps:
|
||||
- Navigate to **Settings** → **Feeds** → **Add New**
|
||||
- Select **Syslog** as the source type
|
||||
- Configure to listen for parsedmarc syslog messages
|
||||
|
||||
3. **Upload the custom parser**:
|
||||
- Navigate to **Settings** → **Parsers**
|
||||
- Click **Create Custom Parser**
|
||||
- Set the **Log Type** to match your feed configuration
|
||||
- Paste the contents of `parsedmarc.conf`
|
||||
- Click **Submit**
|
||||
|
||||
4. **Validate** the parser using the Chronicle parser validation tool with sample parsedmarc JSON events.
|
||||
|
||||
## Sample Log Events
|
||||
|
||||
### Aggregate Report
|
||||
|
||||
```json
|
||||
{"xml_schema": "1.0", "org_name": "Example Inc", "org_email": "noreply@example.net", "report_id": "abc123", "begin_date": "2024-01-01 00:00:00", "end_date": "2024-01-01 23:59:59", "domain": "example.com", "adkim": "r", "aspf": "r", "p": "reject", "sp": "reject", "pct": "100", "fo": "0", "source_ip_address": "203.0.113.1", "source_country": "United States", "source_reverse_dns": "mail.example.org", "source_base_domain": "example.org", "count": 42, "spf_aligned": true, "dkim_aligned": true, "dmarc_aligned": true, "disposition": "none", "header_from": "example.com", "envelope_from": "example.com", "envelope_to": null, "dkim_domains": "example.com", "dkim_selectors": "selector1", "dkim_results": "pass", "spf_domains": "example.com", "spf_scopes": "mfrom", "spf_results": "pass"}
|
||||
```
|
||||
|
||||
### Forensic Report
|
||||
|
||||
```json
|
||||
{"feedback_type": "auth-failure", "user_agent": "Lua/1.0", "version": "1.0", "original_mail_from": "sender@example.com", "original_rcpt_to": "recipient@example.org", "arrival_date": "Mon, 01 Jan 2024 12:00:00 +0000", "arrival_date_utc": "2024-01-01 12:00:00", "source_ip_address": "198.51.100.1", "source_country": "Germany", "source_reverse_dns": "mail.example.com", "source_base_domain": "example.com", "subject": "Test Email", "message_id": "<abc@example.com>", "authentication_results": "dmarc=fail (p=reject; dis=reject) header.from=example.com", "dkim_domain": "example.com", "delivery_result": "reject", "auth_failure": "dmarc", "reported_domain": "example.com", "authentication_mechanisms": "dmarc"}
|
||||
```
|
||||
|
||||
### SMTP TLS Report
|
||||
|
||||
```json
|
||||
{"organization_name": "Example Inc", "begin_date": "2024-01-01 00:00:00", "end_date": "2024-01-01 23:59:59", "report_id": "tls-123", "policy_domain": "example.com", "policy_type": "sts", "policy_strings": "version: STSv1; mode: enforce", "mx_host_patterns": "*.mail.example.com", "successful_session_count": 1000, "failed_session_count": 5, "result_type": "certificate-expired", "sending_mta_ip": "203.0.113.10", "receiving_ip": "198.51.100.20", "receiving_mx_hostname": "mx.example.com", "receiving_mx_helo": "mx.example.com", "failure_reason_code": "X509_V_ERR_CERT_HAS_EXPIRED"}
|
||||
```
|
||||
|
||||
## UDM Reference
|
||||
|
||||
For the complete list of UDM fields, see the [Google SecOps UDM field list](https://cloud.google.com/chronicle/docs/reference/udm-field-list).
|
||||
|
||||
## License
|
||||
|
||||
This parser is part of the [parsedmarc](https://github.com/domainaware/parsedmarc) project and is distributed under the same license.
|
||||
1052
google_secops_parser/parsedmarc.conf
Normal file
1052
google_secops_parser/parsedmarc.conf
Normal file
File diff suppressed because it is too large
Load Diff
@@ -1058,10 +1058,10 @@ def _main():
|
||||
opts.elasticsearch_password = elasticsearch_config["password"]
|
||||
# Until 8.20
|
||||
if "apiKey" in elasticsearch_config:
|
||||
opts.elasticsearch_api_key = elasticsearch_config["apiKey"]
|
||||
opts.elasticsearch_apiKey = elasticsearch_config["apiKey"]
|
||||
# Since 8.20
|
||||
if "api_key" in elasticsearch_config:
|
||||
opts.elasticsearch_api_key = elasticsearch_config["api_key"]
|
||||
opts.elasticsearch_apiKey = elasticsearch_config["api_key"]
|
||||
|
||||
if "opensearch" in config:
|
||||
opensearch_config = config["opensearch"]
|
||||
@@ -1098,10 +1098,10 @@ def _main():
|
||||
opts.opensearch_password = opensearch_config["password"]
|
||||
# Until 8.20
|
||||
if "apiKey" in opensearch_config:
|
||||
opts.opensearch_api_key = opensearch_config["apiKey"]
|
||||
opts.opensearch_apiKey = opensearch_config["apiKey"]
|
||||
# Since 8.20
|
||||
if "api_key" in opensearch_config:
|
||||
opts.opensearch_api_key = opensearch_config["api_key"]
|
||||
opts.opensearch_apiKey = opensearch_config["api_key"]
|
||||
|
||||
if "splunk_hec" in config.sections():
|
||||
hec_config = config["splunk_hec"]
|
||||
@@ -1470,12 +1470,8 @@ def _main():
|
||||
certfile_path=opts.syslog_certfile_path,
|
||||
keyfile_path=opts.syslog_keyfile_path,
|
||||
timeout=opts.syslog_timeout if opts.syslog_timeout is not None else 5.0,
|
||||
retry_attempts=opts.syslog_retry_attempts
|
||||
if opts.syslog_retry_attempts is not None
|
||||
else 3,
|
||||
retry_delay=opts.syslog_retry_delay
|
||||
if opts.syslog_retry_delay is not None
|
||||
else 5,
|
||||
retry_attempts=opts.syslog_retry_attempts if opts.syslog_retry_attempts is not None else 3,
|
||||
retry_delay=opts.syslog_retry_delay if opts.syslog_retry_delay is not None else 5,
|
||||
)
|
||||
except Exception as error_:
|
||||
logger.error("Syslog Error: {0}".format(error_.__str__()))
|
||||
|
||||
@@ -1,3 +1,3 @@
|
||||
__version__ = "9.1.1"
|
||||
__version__ = "9.0.10"
|
||||
|
||||
USER_AGENT = f"parsedmarc/{__version__}"
|
||||
|
||||
@@ -413,8 +413,8 @@ def save_aggregate_report_to_elasticsearch(
|
||||
org_name_query = Q(dict(match_phrase=dict(org_name=org_name))) # type: ignore
|
||||
report_id_query = Q(dict(match_phrase=dict(report_id=report_id))) # pyright: ignore[reportArgumentType]
|
||||
domain_query = Q(dict(match_phrase={"published_policy.domain": domain})) # pyright: ignore[reportArgumentType]
|
||||
begin_date_query = Q(dict(range=dict(date_begin=dict(gte=begin_date)))) # pyright: ignore[reportArgumentType]
|
||||
end_date_query = Q(dict(range=dict(date_end=dict(lte=end_date)))) # pyright: ignore[reportArgumentType]
|
||||
begin_date_query = Q(dict(match=dict(date_begin=begin_date))) # pyright: ignore[reportArgumentType]
|
||||
end_date_query = Q(dict(match=dict(date_end=end_date))) # pyright: ignore[reportArgumentType]
|
||||
|
||||
if index_suffix is not None:
|
||||
search_index = "dmarc_aggregate_{0}*".format(index_suffix)
|
||||
|
||||
@@ -413,8 +413,8 @@ def save_aggregate_report_to_opensearch(
|
||||
org_name_query = Q(dict(match_phrase=dict(org_name=org_name)))
|
||||
report_id_query = Q(dict(match_phrase=dict(report_id=report_id)))
|
||||
domain_query = Q(dict(match_phrase={"published_policy.domain": domain}))
|
||||
begin_date_query = Q(dict(range=dict(date_begin=dict(gte=begin_date))))
|
||||
end_date_query = Q(dict(range=dict(date_end=dict(lte=end_date))))
|
||||
begin_date_query = Q(dict(match=dict(date_begin=begin_date)))
|
||||
end_date_query = Q(dict(match=dict(date_end=end_date)))
|
||||
|
||||
if index_suffix is not None:
|
||||
search_index = "dmarc_aggregate_{0}*".format(index_suffix)
|
||||
|
||||
@@ -2,7 +2,7 @@ from __future__ import annotations
|
||||
|
||||
from typing import Any, Dict, List, Literal, Optional, TypedDict, Union
|
||||
|
||||
# NOTE: This module is intentionally Python 3.10 compatible.
|
||||
# NOTE: This module is intentionally Python 3.9 compatible.
|
||||
# - No PEP 604 unions (A | B)
|
||||
# - No typing.NotRequired / Required (3.11+) to avoid an extra dependency.
|
||||
# For optional keys, use total=False TypedDicts.
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
requires = [
|
||||
"hatchling>=1.27.0",
|
||||
]
|
||||
requires_python = ">=3.10,<3.15"
|
||||
requires_python = ">=3.9,<3.14"
|
||||
build-backend = "hatchling.build"
|
||||
|
||||
[project]
|
||||
@@ -29,7 +29,7 @@ classifiers = [
|
||||
"Operating System :: OS Independent",
|
||||
"Programming Language :: Python :: 3"
|
||||
]
|
||||
requires-python = ">=3.10"
|
||||
requires-python = ">=3.9"
|
||||
dependencies = [
|
||||
"azure-identity>=1.8.0",
|
||||
"azure-monitor-ingestion>=1.0.0",
|
||||
@@ -45,7 +45,7 @@ dependencies = [
|
||||
"google-auth-httplib2>=0.1.0",
|
||||
"google-auth-oauthlib>=0.4.6",
|
||||
"google-auth>=2.3.3",
|
||||
"imapclient>=3.1.0",
|
||||
"imapclient>=2.1.0",
|
||||
"kafka-python-ng>=2.2.2",
|
||||
"lxml>=4.4.0",
|
||||
"mailsuite>=1.11.2",
|
||||
|
||||
Reference in New Issue
Block a user