Commit Graph

4 Commits

Author SHA1 Message Date
Sean Whalen 46e694502d Detect aggregate reports by "xml_schema" instead of "domain"
xml_schema is aggregate-only (failure/SMTP TLS rows don't carry it) and a
distinctive, non-generic field name, addressing the concern that "domain"
could be confused with other logs. parsedmarc defaults xml_schema to "draft"
when the report omits <version> (parsedmarc/__init__.py:832), so it survives a
missing version element -- unlike a field with no default.

It is also a native JSON string straight out of the json{} filter, so unlike
dmarc_aligned it needs no convert step to be testable, keeping detection
independent of the type-conversion in step 1b. xml_schema is added to the
pre-json init block (required for any if-tested field); domain stays
initialized since it is still mapped to target.hostname.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 10:36:49 -04:00
Sean Whalen 2d9a2a2a8f Fix JSON type handling and pre-json field init in SecOps parser
Two CBN behaviors, confirmed against Google's own "How to parse JSON data"
guide (statedump shows JSON true/199 retaining boolean/integer type) and the
published Corelight production parser:

1. The json{} filter preserves the original JSON type, so parsedmarc's boolean
   *_aligned / testing / normalized_timespan and numeric count / *_session_count
   / source_asn would never match string comparisons. Add a mutate{convert} step
   turning them into strings before any == "true"/"false" test or %{...} use.

2. CBN raises _failed_parsing_ when an `if [field]` references a field absent
   from the log, and most detection/mapping fields are absent in 2 of the 3
   report shapes (or null within one). Initialize every conditionally-checked
   field to "" before the json{} filter.

Without these, DMARC-fail records would not be categorized AUTH_VIOLATION and
aggregate/TLS reports could fail parsing outright. README caveat and PR
validation steps updated accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 10:22:02 -04:00
Sean Whalen 784e3050bd Detect aggregate reports by "domain" instead of "adkim"
adkim is the published policy's DKIM alignment mode (defaulted to "r" by
parsedmarc), an obscure thing to key detection on. Switch the aggregate
detector to "domain" -- the reported From-domain, a required element present
and non-empty in every aggregate record (2388/2388 sample rows) and unique to
aggregate (failure uses reported_domain, SMTP TLS uses policy_domain).
header_from is unsuitable: it can be empty when a record carries no
identifiers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 09:42:28 -04:00
Sean Whalen ca27428713 Add Google SecOps (Chronicle) UDM parser for syslog output
A SecOps-side custom parser (CBN) that maps parsedmarc's [syslog] JSON
events to the Unified Data Model. No library changes: parsedmarc already
emits structured JSON, so the DMARC->UDM mapping lives in the parser and a
downstream UDM schema change is a parser edit, not a parsedmarc release.

Covers all three report types:
- aggregate -> EMAIL_TRANSACTION
- failure   -> EMAIL_TRANSACTION
- smtp_tls  -> GENERIC_EVENT (noun from policy_domain, present on every row)

Built strictly against the official UDM and parser-syntax docs (cited
inline). Sets metadata.event_timestamp from the report window via date{},
maps disposition / auth-failure to security_result with valid action and
category enums (AUTH_VIOLATION on DMARC fail), uses real network.email
field names, and strips syslog framing before JSON parsing. Ships real
sample events generated from the project's sample reports for validation.

Not yet validated against a live SecOps tenant; caveats are documented in
the README.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 09:24:20 -04:00