3.5 KiB
AGENTS.md
This file provides guidance to AI agents when working with code in this repository.
Project Overview
parsedmarc is a Python module and CLI utility for parsing DMARC aggregate (RUA), forensic (RUF), and SMTP TLS reports. It reads reports from IMAP, Microsoft Graph, Gmail API, Maildir, mbox files, or direct file paths, and outputs to JSON/CSV, Elasticsearch, OpenSearch, Splunk, Kafka, S3, Azure Log Analytics, syslog, or webhooks.
Common Commands
# Install with dev/build dependencies
pip install .[build]
# Run all tests with coverage
pytest --cov --cov-report=xml tests.py
# Run a single test
pytest tests.py::Test::testAggregateSamples
# Lint and format
ruff check .
ruff format .
# Test CLI with sample reports
parsedmarc --debug -c ci.ini samples/aggregate/*
parsedmarc --debug -c ci.ini samples/forensic/*
# Build docs
cd docs && make html
# Build distribution
hatch build
To skip DNS lookups during testing, set GITHUB_ACTIONS=true.
Architecture
Data flow: Input sources → CLI (cli.py:_main) → Parse (__init__.py) → Enrich (DNS/GeoIP via utils.py) → Output integrations
Key modules
parsedmarc/__init__.py— Core parsing logic. Main functions:parse_report_file(),parse_report_email(),parse_aggregate_report_xml(),parse_forensic_report(),parse_smtp_tls_report_json(),get_dmarc_reports_from_mailbox(),watch_inbox()parsedmarc/cli.py— CLI entry point (_main), config file parsing (_load_config+_parse_config), output orchestration. Supports configuration via INI files,PARSEDMARC_{SECTION}_{KEY}environment variables, or both (env vars override file values).parsedmarc/types.py— TypedDict definitions for all report types (AggregateReport,ForensicReport,SMTPTLSReport,ParsingResults)parsedmarc/utils.py— IP/DNS/GeoIP enrichment, base64 decoding, compression handlingparsedmarc/mail/— Polymorphic mail connections:IMAPConnection,GmailConnection,MSGraphConnection,MaildirConnectionparsedmarc/{elastic,opensearch,splunk,kafkaclient,loganalytics,syslog,s3,webhook,gelf}.py— Output integrations
Report type system
ReportType = Literal["aggregate", "forensic", "smtp_tls"]. Exception hierarchy: ParserError → InvalidDMARCReport → InvalidAggregateReport/InvalidForensicReport, and InvalidSMTPTLSReport.
Configuration
Config priority: CLI args > env vars > config file > defaults. Env var naming: PARSEDMARC_{SECTION}_{KEY} (e.g. PARSEDMARC_IMAP_PASSWORD). Section names with underscores use longest-prefix matching (PARSEDMARC_SPLUNK_HEC_TOKEN → [splunk_hec] token). Some INI keys have short aliases for env var friendliness (e.g. [maildir] create for maildir_create). File path values are expanded via os.path.expanduser/os.path.expandvars. Config can be loaded purely from env vars with no file (PARSEDMARC_CONFIG_FILE sets the file path).
Caching
IP address info cached for 4 hours, seen aggregate report IDs cached for 1 hour (via ExpiringDict).
Code Style
- Ruff for formatting and linting (configured in
.vscode/settings.json) - TypedDict for structured data, type hints throughout
- Python ≥3.10 required
- Tests are in a single
tests.pyfile using unittest; sample reports live insamples/ - File path config values must be wrapped with
_expand_path()incli.py - Maildir UID checks are intentionally relaxed (warn, don't crash) for Docker compatibility
- Token file writes must create parent directories before opening for write