parsedmarc

mirror of https://github.com/domainaware/parsedmarc.git synced 2026-06-07 03:09:44 +00:00

Author	SHA1	Message	Date
Sean Whalen	46e694502d	Detect aggregate reports by "xml_schema" instead of "domain". xml_schema is aggregate-only (failure/SMTP TLS rows don't carry it) and a distinctive, non-generic field name, addressing the concern that "domain" could be confused with other logs. parsedmarc defaults xml_schema to "draft" when the report omits <version> (parsedmarc/__init__.py:832), so it survives a missing version element -- unlike a field with no default. It is also a native JSON string straight out of the json{} filter, so unlike dmarc_aligned it needs no convert step to be testable, keeping detection independent of the type-conversion in step 1b. xml_schema is added to the pre-json init block (required for any if-tested field); domain stays initialized since it is still mapped to target.hostname. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 10:36:49 -04:00
Sean Whalen	2d9a2a2a8f	Fix JSON type handling and pre-json field init in SecOps parser. Two CBN behaviors, confirmed against Google's own "How to parse JSON data" guide (statedump shows JSON true/199 retaining boolean/integer type) and the published Corelight production parser: (1) The json{} filter preserves the original JSON type, so parsedmarc's boolean _aligned / testing / normalized_timespan and numeric count / _session_count / source_asn would never match string comparisons. Add a mutate{convert} step turning them into strings before any == "true"/"false" test or %{...} use. (2) CBN raises _failed_parsing_ when an `if [field]` references a field absent from the log, and most detection/mapping fields are absent in 2 of the 3 report shapes (or null within one). Initialize every conditionally-checked field to "" before the json{} filter. Without these, DMARC-fail records would not be categorized AUTH_VIOLATION and aggregate/TLS reports could fail parsing outright. README caveat and PR validation steps updated accordingly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 10:22:02 -04:00
Sean Whalen	784e3050bd	Detect aggregate reports by "domain" instead of "adkim". adkim is the published policy's DKIM alignment mode (defaulted to "r" by parsedmarc), an obscure thing to key detection on. Switch the aggregate detector to "domain" -- the reported From-domain, a required element present and non-empty in every aggregate record (2388/2388 sample rows) and unique to aggregate (failure uses reported_domain, SMTP TLS uses policy_domain). header_from is unsuitable: it can be empty when a record carries no identifiers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 09:42:28 -04:00
Sean Whalen	ca27428713	Add Google SecOps (Chronicle) UDM parser for syslog output. A SecOps-side custom parser (CBN) that maps parsedmarc's [syslog] JSON events to the Unified Data Model. No library changes: parsedmarc already emits structured JSON, so the DMARC->UDM mapping lives in the parser and a downstream UDM schema change is a parser edit, not a parsedmarc release. Covers all three report types: aggregate -> EMAIL_TRANSACTION; failure -> EMAIL_TRANSACTION; smtp_tls -> GENERIC_EVENT (noun from policy_domain, present on every row). Built strictly against the official UDM and parser-syntax docs (cited inline). Sets metadata.event_timestamp from the report window via date{}, maps disposition / auth-failure to security_result with valid action and category enums (AUTH_VIOLATION on DMARC fail), uses real network.email field names, and strips syslog framing before JSON parsing. Ships real sample events generated from the project's sample reports for validation. Not yet validated against a live SecOps tenant; caveats are documented in the README. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 09:24:20 -04:00
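
The two parser behaviors described in 2d9a2a2a8f and 46e694502d can be modeled outside SecOps as a rough Python sketch. This is not the CBN parser itself and does not use any parsedmarc API. The function names, the `CHECKED_FIELDS` tuple, the handling of null values, and the use of reported_domain / policy_domain as the failure and SMTP TLS detectors are assumptions made for illustration; only the field names and the rules (pre-initialize tested fields to "", stringify booleans and numbers, detect aggregate by a non-empty xml_schema) come from the commit messages above.

```python
import json

# Hypothetical list of fields that the parser tests with `if [field]`.
# Each is pre-initialized to "" so that a test on an absent field cannot fail.
CHECKED_FIELDS = (
    "xml_schema",
    "domain",
    "reported_domain",
    "policy_domain",
    "dmarc_aligned",
)


def normalize(event: dict) -> dict:
    """Pre-initialize tested fields, then turn top-level JSON scalars into strings."""
    out = {name: "" for name in CHECKED_FIELDS}
    for key, value in event.items():
        if value is None:
            # Assumption: a JSON null is treated like an absent field.
            out[key] = ""
        elif isinstance(value, bool):
            # bool is checked before int because bool is a subclass of int.
            out[key] = "true" if value else "false"
        elif isinstance(value, (int, float)):
            out[key] = str(value)
        else:
            out[key] = value
    return out


def report_type(event: dict) -> str:
    """Classify a normalized event by the field unique to each report shape."""
    if event["xml_schema"] != "":
        return "aggregate"
    if event["reported_domain"] != "":
        return "failure"
    if event["policy_domain"] != "":
        return "smtp_tls"
    return "unknown"


# Illustrative events only; real parsedmarc output carries many more fields.
aggregate = normalize(json.loads(
    '{"xml_schema": "draft", "domain": "example.com", '
    '"dmarc_aligned": false, "count": 2}'
))
smtp_tls = normalize(json.loads(
    '{"policy_domain": "example.com", "successful_session_count": 5}'
))

print(report_type(aggregate), aggregate["dmarc_aligned"], aggregate["count"])
print(report_type(smtp_tls), repr(smtp_tls["xml_schema"]))
```

Because xml_schema is already a string in the JSON, `report_type` would classify an aggregate event correctly even without the conversion step, which is the independence the 46e694502d message points out. Tests against dmarc_aligned, by contrast, only match "true"/"false" after `normalize` has run.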

4 Commits