# Google SecOps (Chronicle) parser for parsedmarc

A [Google Security Operations](https://cloud.google.com/security/products/security-operations)
custom parser (configuration-based normalizer / CBN) that maps the JSON events
parsedmarc emits through its built-in `[syslog]` output to the Unified Data
Model (UDM).

This is a **SecOps-side parser only**: it requires no changes to parsedmarc.
parsedmarc already ships structured JSON over syslog; the DMARC→UDM mapping
lives here so that a downstream UDM schema change is a parser edit rather than a
parsedmarc release.

> **New to SecOps parsers?** SecOps ingests a log source by running a *parser*
> that turns each raw log line into a [Unified Data Model](https://cloud.google.com/chronicle/docs/event-processing/udm-overview)
> (UDM) event. These parsers are written in a Logstash-style configuration
> language Google calls a **configuration-based normalizer (CBN)**; the
> `parsedmarc.conf` in this directory is one. You attach it to a custom *log
> type*, and SecOps then runs it on every parsedmarc syslog line. Already fluent
> in CBN? Skip to [Installation](#installation).

## Status

> [!IMPORTANT]
> This parser was written strictly against the official Google documentation
> linked at the bottom of this file, but it has **not yet been validated against
> a live SecOps tenant**. Before using it in production, paste it into the SecOps
> parser-validation tool and confirm each sample event below parses and that the
> assertions in [Caveats](#caveats) hold. Please report fixes back to the
> [parsedmarc](https://github.com/domainaware/parsedmarc) project.

## Supported report types

parsedmarc emits three flat JSON shapes (one object per syslog line). The parser
detects them by a field unique to each and maps them as follows:

| parsedmarc report | Detected by | UDM `metadata.event_type` |
|---|---|---|
| DMARC aggregate | `xml_schema` | `EMAIL_TRANSACTION` |
| DMARC failure | `feedback_type` | `EMAIL_TRANSACTION` |
| SMTP TLS (RFC 8460) | `policy_type` | `GENERIC_EVENT` |

`EMAIL_TRANSACTION` and `GENERIC_EVENT` are both valid `metadata.event_type`
values. Note that **`GENERIC_EVENT` events only appear in raw-log and UDM
search**, not in the curated SecOps views. That is the documented behaviour for
generic events, and it is why SMTP TLS reports surface differently from the two
DMARC types.

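
For readers who think more easily in Python than in CBN, the detection table in this section can be sketched as below. This is an illustration only, not the parser's logic: the function name `classify_event`, the order of the checks, and the treatment of an empty marker field as absent are all assumptions of this sketch.

```python
from typing import Optional

# Marker field -> (report kind, UDM metadata.event_type), following the
# detection table. The check order is an assumption of this sketch.
DETECTORS = (
    ("xml_schema", "aggregate", "EMAIL_TRANSACTION"),
    ("feedback_type", "failure", "EMAIL_TRANSACTION"),
    ("policy_type", "smtp_tls", "GENERIC_EVENT"),
)


def classify_event(event: dict) -> Optional[tuple]:
    """Return (report kind, event type) for a decoded parsedmarc event."""
    for field, kind, event_type in DETECTORS:
        # A missing, null, or empty marker counts as "not this report type".
        if event.get(field) not in (None, ""):
            return kind, event_type
    return None
```

An event carrying none of the three marker fields returns `None` here; how `parsedmarc.conf` treats that case is not covered by this sketch.
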
## Caveats

1. **Unvalidated**: see [Status](#status).
2. **JSON types**: Chronicle's `json{}` filter **preserves the original JSON
   type**, so parsedmarc's booleans and numbers are handled differently:
   - **Booleans** (`dmarc_aligned` / `spf_aligned` / `dkim_aligned` / `testing`
     / `normalized_timespan`) are converted to strings so
     `[dmarc_aligned] == "false"` works, and stored as `string_value` (Google's
     content-hub parsers never use `bool_value`).
   - **Numbers** (`count` / `*_session_count` / `source_asn`) are stored as
     `number_value` (built as a string, `convert`-ed to `uinteger`, then
     renamed) so SecOps can range-query and sort them (parsedmarc's "store
     numbers as numbers" rule).

   Every `if`-tested field is initialized to `""` *before* `json` and guarded
   with `!= ""`. CBN raises `_failed_parsing_` on a conditional that references
   an absent field, and it treats an initialized-but-empty field as present, so
   a bare `if [field]` would also be true for an empty value. A DMARC-fail
   record (`dmarc_aligned=false`) should yield
   `security_result.category = AUTH_VIOLATION`, which is worth confirming in
   the validation tool.
3. **Aggregate count**: a DMARC aggregate record summarizes `count` messages
   from one source IP, not a single message. Each record becomes one
   `EMAIL_TRANSACTION` with `count` carried in `additional.fields`. There is no
   first-class per-message expansion (fanning out `count` copies would
   misrepresent the data).
4. **Address format**: aggregate reports only carry the From *domain*, so
   `network.email.from` holds a bare domain for aggregate events but a full
   address for failure events. UDM email-address fields are expected to be
   `local-mailbox@domain`; downstream consumers should account for the
   aggregate-domain case.

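
The type handling in caveat 2 can be pictured with a small Python sketch. It is not a translation of `parsedmarc.conf`: the function name `additional_fields` is hypothetical, and the returned dict with `string_value` / `number_value` keys is a simplified stand-in for UDM `additional.fields`, not its real structure.

```python
BOOLEAN_FIELDS = (
    "dmarc_aligned", "spf_aligned", "dkim_aligned", "testing",
    "normalized_timespan",
)
NUMBER_FIELDS = (
    "count", "source_asn", "successful_session_count", "failed_session_count",
)


def additional_fields(event: dict) -> dict:
    """Build a simplified stand-in for additional.fields from one event."""
    fields = {}
    for name in BOOLEAN_FIELDS:
        # Default to "" so an absent field and an empty one look the same.
        value = event.get(name, "")
        if isinstance(value, bool):
            # JSON true/false become the strings "true"/"false".
            fields[name] = {"string_value": "true" if value else "false"}
    for name in NUMBER_FIELDS:
        value = event.get(name, "")
        # bool is a subclass of int in Python, so exclude it explicitly;
        # only non-negative integers are kept, mirroring an unsigned type.
        if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
            fields[name] = {"number_value": value}
    return fields
```

Fields that arrive as `""` (for example `testing` or `source_asn` in the aggregate samples) are skipped rather than emitted as empty labels, which is the behaviour the `!= ""` guards are meant to give.
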
## UDM field mappings

All UDM field names below are from the
[UDM field list](https://cloud.google.com/chronicle/docs/reference/udm-field-list)
and [SecurityResult reference](https://cloud.google.com/chronicle/docs/reference/rest/v1alpha/SecurityResult).

### DMARC aggregate → `EMAIL_TRANSACTION`

| parsedmarc field | UDM field |
|---|---|
| `begin_date` | `metadata.event_timestamp` (via `date{}`) |
| `report_id` | `metadata.product_log_id` |
| `source_ip_address` | `principal.ip` |
| `source_reverse_dns` | `principal.hostname` |
| `source_country` | `principal.location.country_or_region` |
| `domain` | `target.hostname` |
| `header_from` | `network.email.from` (domain; see caveat 4) |
| `disposition` | `security_result.action` (`none`→`ALLOW`, `quarantine`→`QUARANTINE`, `reject`→`BLOCK`) |
| `dmarc_aligned=false` | `security_result.category = AUTH_VIOLATION` |
| `org_name`, `org_email`, `count`, `p`, `sp`, `np`, `pct`, `fo`, `adkim`, `aspf`, `testing`, `discovery_method`, `normalized_timespan`, `*_aligned`, `dkim_*`, `spf_*`, `policy_override_*`, `source_base_domain`, `source_name`, `source_type`, `source_asn`, `source_as_name`, `source_as_domain`, `envelope_from`, `envelope_to` | `additional.fields` |

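
The two `security_result` rows of the aggregate table can be expressed as a short Python sketch. The function name `aggregate_security_result` and the flat dict it returns are assumptions made for illustration; the real mapping is done in CBN.

```python
# disposition -> security_result.action, as listed in the aggregate table.
DISPOSITION_TO_ACTION = {
    "none": "ALLOW",
    "quarantine": "QUARANTINE",
    "reject": "BLOCK",
}


def aggregate_security_result(event: dict) -> dict:
    """Derive a simplified security_result for one aggregate record."""
    result = {}
    action = DISPOSITION_TO_ACTION.get(event.get("disposition") or "")
    if action:
        result["action"] = action
    # Only an explicit JSON false marks the record as an auth violation.
    if event.get("dmarc_aligned") is False:
        result["category"] = "AUTH_VIOLATION"
    return result
```

Note that `disposition` and `dmarc_aligned` are independent: a record with `p=none` can fail alignment and still map to `ALLOW`.
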
### DMARC failure → `EMAIL_TRANSACTION`

| parsedmarc field | UDM field |
|---|---|
| `arrival_date_utc` | `metadata.event_timestamp` (via `date{}`) |
| `message_id` | `metadata.product_log_id`, `network.email.mail_id` |
| `source_ip_address` | `principal.ip` |
| `source_reverse_dns` | `principal.hostname` |
| `source_country` | `principal.location.country_or_region` |
| `reported_domain` | `target.hostname` |
| `original_mail_from` | `network.email.from` |
| `original_rcpt_to` | `network.email.to` |
| `subject` | `network.email.subject` |
| `auth_failure` | `security_result.category = AUTH_VIOLATION` + description |
| `delivery_result` | `security_result.action` (`reject`→`BLOCK`, `quarantine`→`QUARANTINE`, `delivered`→`ALLOW`) |
| `feedback_type`, `authentication_results`, `authentication_mechanisms`, `user_agent`, `dkim_domain`, `arrival_date` | `additional.fields` |

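
As a rough Python illustration of the `auth_failure` and `delivery_result` rows of the failure table: the function name `failure_security_result` is hypothetical, and using the raw `auth_failure` value as the description is an assumption, since the table only says "+ description".

```python
# delivery_result -> security_result.action, as listed in the failure table.
DELIVERY_RESULT_TO_ACTION = {
    "reject": "BLOCK",
    "quarantine": "QUARANTINE",
    "delivered": "ALLOW",
}


def failure_security_result(event: dict) -> dict:
    """Derive a simplified security_result for one failure report."""
    result = {}
    auth_failure = event.get("auth_failure") or ""
    if auth_failure:
        result["category"] = "AUTH_VIOLATION"
        # Assumption: the description simply carries the auth_failure value.
        result["description"] = auth_failure
    # Values outside the table (and JSON null) leave the action unset.
    action = DELIVERY_RESULT_TO_ACTION.get(event.get("delivery_result") or "")
    if action:
        result["action"] = action
    return result
```

A `delivery_result` that is not one of the three listed values, such as `policy`, produces no action in this sketch.
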
### SMTP TLS → `GENERIC_EVENT`

| parsedmarc field | UDM field |
|---|---|
| `begin_date` | `metadata.event_timestamp` (ISO 8601, via `date{}`) |
| `report_id` | `metadata.product_log_id` |
| `policy_domain` | `target.hostname` (always present → the noun) |
| `receiving_ip` | `target.ip` (failure rows only) |
| `sending_mta_ip` | `principal.ip` (failure rows only) |
| `result_type` | `security_result` (`action=FAIL`, `category=POLICY_VIOLATION`) |
| `organization_name`, `policy_type`, `policy_strings`, `mx_host_patterns`, `successful_session_count`, `failed_session_count`, `failure_reason_code`, `receiving_mx_hostname`, `receiving_mx_helo`, `additional_info_uri` | `additional.fields` |

> parsedmarc emits SMTP TLS reports as separate rows: one **success** row per
> policy (counts, no MTA IPs) and one **failure** row per failure detail (which
> may also lack MTA IPs, e.g. `sts-policy-fetch-error`). The noun therefore comes
> from `policy_domain`, which is present on every row.

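
The success-row / failure-row split can be sketched in Python as follows. The function name `smtp_tls_udm`, the nested-dict layout, and the use of a non-empty `result_type` as the failure test are assumptions of this sketch rather than a description of the CBN code.

```python
def smtp_tls_udm(event: dict) -> dict:
    """Map one SMTP TLS row to a simplified, dict-shaped UDM event."""
    udm = {"metadata": {"event_type": "GENERIC_EVENT"}}
    if event.get("policy_domain"):
        # The policy domain is the noun for both row kinds.
        udm["target"] = {"hostname": event["policy_domain"]}
    if event.get("result_type"):
        # Failure-detail row: attach the security result.
        udm["security_result"] = {
            "action": "FAIL",
            "category": "POLICY_VIOLATION",
        }
        # MTA IPs are optional even on failure rows.
        if event.get("sending_mta_ip"):
            udm["principal"] = {"ip": event["sending_mta_ip"]}
        if event.get("receiving_ip"):
            udm.setdefault("target", {})["ip"] = event["receiving_ip"]
    return udm
```

A success row therefore yields only `metadata` and `target.hostname`, while a failure row adds `security_result` and whichever IPs it carries.
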
## Installation

### 1. Configure parsedmarc syslog output

```ini
[syslog]
server = your-collector.example.com
port = 514
```

parsedmarc writes each report record as a single-line JSON message.

### 2. Collect the syslog stream into SecOps

Syslog is ingested by a **collector**, not a Feed. Run the
[Bindplane agent](https://cloud.google.com/chronicle/docs/install/install-forwarder)
(Google's recommended on-premises collector; the legacy Chronicle forwarder is
end-of-life) with a **Syslog** collector pointed at the port above, and assign it
a custom log type (for example `PARSEDMARC`).

### 3. Install this parser for that log type

Associate `parsedmarc.conf` with the custom log type via the SecOps parser
management UI or API (see
[Manage parsers](https://cloud.google.com/chronicle/docs/event-processing/manage-parser-updates)).
Validate against the sample events below before activating.

## Sample events for validation

These are **real** single-line outputs from parsedmarc's `[syslog]` serializers
(generated from the project's sample reports). Use them in the parser-validation
tool. A live syslog line will also carry a `<PRI>` prefix; the parser strips any
leading framing before the first `{`.

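
When checking captured syslog lines outside SecOps, the same "drop everything before the first `{`" idea can be mimicked in Python. This is a minimal sketch, not the parser's implementation; the helper name `decode_syslog_line` and the syslog header in the usage note are made up for illustration.

```python
import json


def decode_syslog_line(line: str) -> dict:
    """Drop any framing before the first '{' and decode the JSON payload."""
    start = line.find("{")
    if start == -1:
        raise ValueError("no JSON object found in syslog line")
    return json.loads(line[start:])
```

For example, `decode_syslog_line('<14>Oct  1 17:07:12 host parsedmarc: {"count": 1}')` returns `{"count": 1}`. The approach relies on the framing itself never containing a `{`.
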
### DMARC Aggregate: fail (`dmarc_aligned=false`)

```json
{"xml_schema": "draft", "org_name": "accurateplastics.com", "org_email": "administrator@accurateplastics.com", "org_extra_contact_info": "", "report_id": "example.com:1538463741", "begin_date": "2018-10-01 17:07:12", "end_date": "2018-10-01 17:07:12", "normalized_timespan": false, "errors": "", "domain": "example.com", "adkim": "r", "aspf": "r", "p": "none", "sp": "reject", "np": "", "pct": "100", "fo": "", "testing": "", "discovery_method": "", "source_ip_address": "12.20.127.122", "source_country": "US", "source_reverse_dns": "", "source_base_domain": "", "source_name": "AT&T", "source_type": "ISP", "source_asn": 7018, "source_as_name": "AT&T Enterprises, LLC", "source_as_domain": "att.com", "count": 1, "spf_aligned": false, "dkim_aligned": false, "dmarc_aligned": false, "disposition": "none", "policy_override_reasons": "", "policy_override_comments": "", "envelope_from": "", "header_from": "example.com", "envelope_to": "", "dkim_domains": "", "dkim_selectors": "", "dkim_results": "", "spf_domains": "", "spf_scopes": "", "spf_results": ""}
```

### DMARC Aggregate: pass (`dmarc_aligned=true`)

```json
{"xml_schema": "1.0", "org_name": "example.org", "org_email": "noreply-dmarc-support@example.org", "org_extra_contact_info": "https://support.example.org/dmarc", "report_id": "20240125141224705995", "begin_date": "2024-01-25 05:12:24", "end_date": "2024-01-25 12:28:53", "normalized_timespan": false, "errors": "", "domain": "example.com", "adkim": "r", "aspf": "r", "p": "quarantine", "sp": "quarantine", "np": "", "pct": "100", "fo": "1", "testing": "", "discovery_method": "", "source_ip_address": "198.51.100.123", "source_country": "", "source_reverse_dns": "", "source_base_domain": "", "source_name": "", "source_type": "", "source_asn": "", "source_as_name": "", "source_as_domain": "", "count": 2, "spf_aligned": false, "dkim_aligned": true, "dmarc_aligned": true, "disposition": "none", "policy_override_reasons": "none", "policy_override_comments": "none", "envelope_from": "example.edu", "header_from": "example.com", "envelope_to": "example.net", "dkim_domains": "example.com", "dkim_selectors": "example", "dkim_results": "pass", "spf_domains": "example.edu", "spf_scopes": "mfrom", "spf_results": "pass"}
```

### DMARC Failure report

```json
{"feedback_type": "auth-failure", "user_agent": "Lua/1.0", "version": "1.0", "original_mail_from": "sharepoint@domain.de", "original_rcpt_to": "peter.pan@domain.de", "arrival_date": "Mon, 01 Oct 2018 11:20:27 +0200", "message_id": "<38.E7.30937.BD6E1BB5@ mailrelay.de>", "authentication_results": "dmarc=fail (p=none, dis=none) header.from=domain.de", "delivery_result": "policy", "auth_failure": "dmarc", "reported_domain": "domain.de", "arrival_date_utc": "2018-10-01 09:20:27", "authentication_mechanisms": "", "original_envelope_id": null, "dkim_domain": null, "sample_headers_only": false, "source_ip_address": "10.10.10.10", "source_reverse_dns": null, "source_base_domain": null, "source_name": null, "source_type": null, "source_asn": null, "source_as_name": null, "source_as_domain": null, "source_country": null, "subject": "Subject"}
```

### SMTP TLS: success row (counts only)

```json
{"organization_name": "Synametrics Technologies, Inc.", "begin_date": "2025-12-07T19:00:00Z", "end_date": "2025-12-08T18:59:59Z", "report_id": "1765256572301+dmarc-reports.dengage.com", "policy_strings": "version: STSv1|mode: enforce|mx: mta1.inboxsys.net|mx: mta2.inboxsys.net|max_age: 86400", "policy_domain": "dmarc-reports.dengage.com", "policy_type": "sts", "successful_session_count": 2, "failed_session_count": 0}
```

### SMTP TLS: failure-detail row

```json
{"organization_name": "Mail.ru", "begin_date": "2024-02-22T00:00:00Z", "end_date": "2024-02-23T00:00:00Z", "report_id": "b28254de-7b2e-be36-bb5c-4c3b92da8b25@mail.ru", "result_type": "sts-policy-fetch-error", "failed_session_count": 1, "failure_reason_code": "bad https response code: 404"}
```

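
The samples above carry timestamps in two shapes: `YYYY-MM-DD HH:MM:SS` in the DMARC events and ISO 8601 with a trailing `Z` in the SMTP TLS rows. A small Python helper along these lines can confirm that a test event's timestamp is well-formed before it goes into the validation tool. The name `parse_timestamp` is hypothetical, and treating the space-separated form as UTC is an assumption of this sketch (only `arrival_date_utc` says so by name).

```python
from datetime import datetime, timezone

# The two timestamp shapes seen in the sample events.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%SZ")


def parse_timestamp(value: str) -> datetime:
    """Parse either sample timestamp shape into an aware datetime."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Assumption: both shapes are interpreted as UTC.
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError(f"unrecognized timestamp: {value!r}")
```

The RFC 2822 style `arrival_date` in the failure sample is deliberately not handled here; it is carried in `additional.fields` rather than used as the event timestamp.
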
## Official references

- [Overview of the UDM](https://cloud.google.com/chronicle/docs/event-processing/udm-overview)
- [UDM field list](https://cloud.google.com/chronicle/docs/reference/udm-field-list)
- [SecurityResult reference](https://cloud.google.com/chronicle/docs/reference/rest/v1alpha/SecurityResult)
- [Overview of log parsing](https://cloud.google.com/chronicle/docs/event-processing/parsing-overview)
- [Parser syntax reference](https://cloud.google.com/chronicle/docs/reference/parser-syntax)
- [Tips and troubleshooting when writing parsers](https://cloud.google.com/chronicle/docs/event-processing/parser-tips-troubleshooting): intermediate fields are discarded unless mapped to `@output`; one parser is active per log type.
- [Manage prebuilt and custom parsers](https://cloud.google.com/chronicle/docs/event-processing/manage-parser-updates)
- [Chronicle content-hub](https://github.com/chronicle/content-hub): Google's official repository of third-party SecOps parsers. Its CBN parsers (e.g. [`CLOUDFLARE_PAGESHIELD`](https://github.com/chronicle/content-hub/tree/main/content/parsers/third_party/community/CLOUDFLARE_PAGESHIELD)) use the same conventions this one does: initialize fields before `json`, `convert` JSON types to strings, map to `event.idm.read_only_udm.*`, and finalize with `@output`.
- [UDM search](https://cloud.google.com/chronicle/docs/investigation/udm-search): `GENERIC_EVENT` events only surface in raw-log / UDM search, not curated views.
- [Install the Bindplane agent (collector)](https://cloud.google.com/chronicle/docs/install/install-forwarder)
- [Feed management](https://cloud.google.com/chronicle/docs/administration/feed-management-overview)

## Additional sources and tooling

Community resources (not official Google documentation) that informed this parser's JSON handling and are useful when validating it:

- [Parsing 101: Best Practices & Tips](https://medium.com/@thatsiemguy/parsing-101-best-practices-tips-c2e8b7ce9db8) (Chris Martin / @thatsiemguy): basis for initializing every `if`-tested field before the `json` filter to avoid `_failed_parsing_`.
- [Corelight parser for SecOps](https://github.com/corelight/CorelightForSecOps): a large production CBN parser that demonstrates the "convert JSON booleans/numbers to strings" idiom this parser relies on (the `json` filter preserves the original JSON type).
- [chronicle/cbn-tool](https://github.com/chronicle/cbn-tool): CLI for the CBN parser APIs (submit and validate a parser).

## License

Distributed under the same license as [parsedmarc](https://github.com/domainaware/parsedmarc).