parsedmarc

mirror of https://github.com/domainaware/parsedmarc.git synced 2026-05-26 21:55:25 +00:00

Author	SHA1	Message	Date
Sean Whalen	180fc581fe	fix: OSD Global-tenant import + dropped report files with glob metacharacters; validate dev stack on OpenSearch 3.x with PostgreSQL (#781 ) * fix: import OpenSearch dashboards into the real Global tenant dashboard-dev-bootstrap.sh sent `securitytenant: global_tenant`. The OpenSearch security plugin reads that header as a tenant name, and `global_tenant` is a sample custom tenant from the security demo config -- not the shared Global tenant, whose token is the literal `global`. The import therefore landed in a separate `global_tenant` tenant (its own `.kibana_<hash>_globaltenant_1` index) and the dashboards were invisible to anyone viewing the Global tenant in OpenSearch Dashboards. Verified against the live dev cluster: `_find` under `securitytenant: global` returned 26 objects and `.kibana_1` (the Global tenant index the UI reads) went from 2 to 67 docs after re-importing with the fix. An empty/omitted header read 0 from Global -- it falls back to the user's configured default tenant -- so `global` is the only reliable token. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: don't drop report files whose names contain glob metacharacters The CLI expanded every file argument with glob(), which treats [, ], *, and ? as pattern syntax. A literal path like "[Netease DMARC Failure Report] Rent Reminder.eml" -- the bracketed shape many providers use for emailed failure reports -- was read as a character class, matched nothing, and was dropped before reaching the parser, with no error. File arguments that exist on disk are now taken literally; only non-existent paths are globbed, so shell-style wildcards still expand. 
Also adds "postgresql" to _KNOWN_SECTIONS so PARSEDMARC_POSTGRESQL_* env vars (and their _FILE Docker-secret variants) resolve like every other backend -- the PostgreSQL backend is new in 10.0.0, so this completes the unreleased feature rather than fixing a released regression, and is documented under the PostgreSQL enhancement, not Bug fixes. Regression tests added for both. Verified end-to-end: all four samples/failure/*.eml now index (the bracketed Netease report included). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * dev: validate dashboards on OpenSearch 3.x and add PostgreSQL to the dev stack The dev stack ran OpenSearch Dashboards 3.x against OpenSearch 2.x, an unsupported cross-major pairing. Bump opensearch to :3 (validated on 3.6.0: OSD import into the Global tenant and all dashboards work). Add a postgresql service plus bootstrap wiring so the new PostgreSQL backend is exercised alongside the others: wait for PG, seed it via PARSEDMARC_POSTGRESQL_* env vars on the same parsedmarc run, wipe it on RESEED, create a Grafana grafana-postgresql-datasource (uid dmarc-pg), and import dashboards/grafana/Grafana-DMARC_Reports-PostgreSQL.json. PG seeding is gated on psycopg being importable: parsedmarc aborts the whole run (exit 1, nothing written to any backend) when a configured output backend can't initialize, so wiring in PG without the optional extra would silently zero ES/OS/Splunk too. When psycopg is absent the script warns and skips PG, leaving the other backends seeded. Also fix the Grafana admin password env: the container was given GRAFANA_PASSWORD, which Grafana ignores -- it reads GF_SECURITY_ADMIN_PASSWORD. Defaults to admin to match the script. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: list PostgreSQL on the premade-dashboards features bullet PostgreSQL ships a premade Grafana dashboard (dashboards/grafana/Grafana-DMARC_Reports-PostgreSQL.json), so it belongs on the "for use with premade dashboards" bullet alongside Elasticsearch, OpenSearch, and Splunk rather than on the plain-output-destinations line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: clear stale org_email mapping conflict in the OpenSearch dashboards The aggregate index pattern in dashboards/opensearch/opensearch_dashboards.ndjson shipped a cached field-list snapshot where org_email was a text/object conflict, plus leftover org_email.#text and org_email.#text.keyword subfields. Those came from a cluster that had indexed a langAttrString email dict ({"#text": ..., "@lang": ...}) before the parser unwrapped it. org_email is mapped as Text() and parse_aggregate_report_xml now unwraps a dict email to a plain string, so current data is consistently text -- a clean cluster's _field_caps reports no conflict. Cleared the frozen conflict and the two artifact subfields, leaving org_email (text) and org_email.keyword, matching the live mapping. Verified: re-importing the corrected ndjson yields an index pattern with org_email as a plain text field and zero conflicts; only the aggregate index-pattern line changed, all other saved objects byte-identical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * dev: seed the RFC 9990 (dmarc-2.0) aggregate samples samples/aggregate/rfc9990-sample.xml and rfc9990-example.net!...xml were not in the bootstrap's SAMPLE_FILES, so the dev stack only ever indexed RFC 7489 reports and the new DMARCbis fields (np, testing, discovery_method, generator, xml_namespace) never appeared in the OpenSearch/Kibana indices or were available to the dashboards. 
Added both samples (one declares the urn:ietf:params:xml:ns:dmarc-2.0 namespace, the other is namespaceless RFC 9990-shaped, covering both detection paths). Verified the seeded data now carries np/testing/ discovery_method/generator and xml_namespace=urn:ietf:params:xml:ns:dmarc-2.0; OpenSearch Dashboards surfaces them on an index-pattern field-list refresh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * dev: auto-resolve (or create) a venv for the seed and ensure psycopg The seed previously required parsedmarc to be pre-installed and only warned-and-skipped PostgreSQL when psycopg was missing. Resolve the seed environment by precedence instead: 1. explicit PARSEDMARC_BIN -> used as-is, nothing installed 2. active $VIRTUAL_ENV 3. existing repo venv/ or .venv/ 4. otherwise create $REPO_ROOT/venv For cases 2-4, run `pip install -e .[postgresql]` only when the CLI or psycopg is missing, so the dev stack can populate Postgres out of the box without a manual install step. The explicit-PARSEDMARC_BIN path is left untouched (and the psycopg seed guard still warns/skips if that env lacks the extra). Verified: a RESEED run resolves the active venv, seeds ES/OS/Splunk/PG including the RFC 9990 fields, with no output-client errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 15:42:41 -04:00
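The glob-metacharacter fix described in commit 180fc581fe can be illustrated with a minimal sketch. This is not the code from the commit: the function name `expand_file_args` is hypothetical, and the sketch only assumes the behavior the message states (a path that exists on disk is taken literally, and only a non-existent path is expanded as a glob pattern).

```python
import glob
import os
import tempfile


def expand_file_args(paths):
    """Resolve CLI file arguments to a sorted, de-duplicated list of paths.

    Hypothetical helper, not parsedmarc's actual implementation: a path
    that exists on disk is kept as-is, and only a non-existent path is
    treated as a glob pattern.
    """
    resolved = set()
    for path in paths:
        if os.path.exists(path):
            # Literal file: brackets and other metacharacters are not interpreted.
            resolved.add(path)
        else:
            # Not a real path, so let shell-style wildcards expand.
            resolved.update(glob.glob(path))
    return sorted(resolved)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bracketed = os.path.join(
            tmp, "[Netease DMARC Failure Report] Rent Reminder.eml"
        )
        plain = os.path.join(tmp, "report.eml")
        for name in (bracketed, plain):
            open(name, "w").close()

        # glob() alone reads the bracketed prefix as a character class.
        print("glob only:", glob.glob(bracketed))
        # The existence check keeps the literal file.
        print("literal:", expand_file_args([bracketed]))
        # A wildcard that names no real file is still expanded.
        print("wildcard:", expand_file_args([os.path.join(tmp, "*.eml")]))
```

An alternative would be to pass literal paths through `glob.escape()` before globbing; the existence check shown here matches the wording of the commit message more directly.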
Sean Whalen	caac8e68f0	docs: note DMARC RFC support in the features list (#778 ) * docs: note DMARC RFC support in the features list The features list only mentioned "draft and 1.0" aggregate reports. Spell out the standards parsedmarc parses: RFC 7489 (legacy DMARC) and the final DMARC standard RFC 9989 with RFC 9990 aggregate reports, RFC 6591 and RFC 9991 failure reports, and RFC 8460 SMTP TLS reports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: align Python compatibility table pipes (MD060) The emoji cells were padded for display width, leaving the source pipes misaligned by character count and tripping markdownlint MD060. Re-pad so every row's pipes line up by codepoint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: list all optional output destinations; fix table emoji alignment Expand the features list to cover every output sink: Elasticsearch, OpenSearch, Splunk, and PostgreSQL (premade dashboards), plus Kafka, Amazon S3, Azure Log Analytics (Microsoft Sentinel), Graylog (GELF), syslog, and HTTP webhooks. Also re-pad the Python compatibility table using display width (the status emoji render two columns wide), which is what markdownlint MD060 measures — the previous codepoint-based padding still tripped the rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: separate PostgreSQL from the premade-dashboards clause PostgreSQL is a storage target without bundled premade dashboards, so it shouldn't sit inside the "for use with premade dashboards" phrase next to Elasticsearch/OpenSearch/Splunk. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: move PostgreSQL to the non-dashboard outputs line Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: use compact markdown tables Switch the markdown tables (Python compatibility, env-var section mapping) to compact single-space format. 
It reads cleanly in a text editor and sidesteps the column-alignment churn that emoji/variable-width content caused with padded tables (markdownlint MD060). The reStructuredText grid table in dmarc.md is left as-is — it relies on multi-line cells markdown can't express. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 13:41:16 -04:00
Copilot	ae1e5adb66	Add RFC 9989/9990/9991 (final DMARC) report support; rename forensic→failure project-wide (#659 ) * Add DMARCbis report support; rename forensic→failure project-wide Rebased on top of master @ `2cda5bf` (9.9.0), which added the ASN source attribution work (#712, #713, #714, #715). Individual Copilot iteration commits squashed into this single commit — the per-commit history on the feature branch was iterative (add tests, fix lint, move field, revert, etc.) and not worth preserving; GitHub squash- merges PRs anyway. New fields from the DMARCbis XSD, plumbed through types, parsing, CSV output, and the Elasticsearch / OpenSearch mappings: - ``np`` — non-existent subdomain policy (``none`` / ``quarantine`` / ``reject``) - ``testing`` — testing mode flag (``n`` / ``y``), replaces RFC 7489 ``pct`` - ``discovery_method`` — policy discovery method (``psl`` / ``treewalk``) - ``generator`` — report generator software identifier (metadata) - ``human_result`` — optional descriptive text on DKIM / SPF results RFC 7489 reports parse with ``None`` for DMARCbis-only fields. Forensic reports have been renamed to failure reports throughout the project to reflect the proper naming since RFC 7489. - Core: ``types.py``, ``__init__.py`` — ``ForensicReport`` → ``FailureReport``, ``parse_forensic_report`` → ``parse_failure_report``, report type ``"failure"``. - Output modules: ``elastic.py``, ``opensearch.py``, ``splunk.py``, ``kafkaclient.py``, ``syslog.py``, ``gelf.py``, ``webhook.py``, ``loganalytics.py``, ``s3.py``. - CLI: ``cli.py`` — args, config keys, index names (``dmarc_failure``). - Docs + dashboards: all markdown, Grafana JSON, Kibana NDJSON, Splunk XML. 
Backward compatibility preserved: old function / type names remain as aliases (``parse_forensic_report = parse_failure_report``, ``ForensicReport = FailureReport``, etc.), CLI accepts both the old (``save_forensic``, ``forensic_topic``) and new (``save_failure``, ``failure_topic``) config keys, and updated dashboards query both old and new index / sourcetype names so data from before and after the rename appears together. Merge conflicts resolved in ``parsedmarc/constants.py`` (took bis's 10.0.0 bump), ``parsedmarc/__init__.py`` (combined bis's "failure" wording with master's IPinfo MMDB mention), ``parsedmarc/elastic.py`` and ``parsedmarc/opensearch.py`` (kept master's ``source_asn`` / ``source_asn_name`` / ``source_asn_domain`` on the failure doc path while renaming ``forensic_report`` → ``failure_report``), and ``CHANGELOG.md`` (10.0.0 entry now sits above the 9.9.0 entry). All 324 tests pass; ``ruff check`` / ``ruff format --check`` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Apply post-RFC review fixes: RFC 9990 detection, langAttrString, CFWS-aware RUF parsing Aligns the implementation with the final RFCs (9989/9990/9991) instead of inferring DMARCbis support from the version element or the namespace alone. Aggregate parsing (RFC 9990): - _text() helper unwraps langAttrString values (extra_contact_info, error, comment, human_result, generator) — when reporters include the lang attribute, xmltodict yields {"#text": ..., "@lang": ...} dicts instead of strings; the parser now stores the text payload in both shapes. - New xml_namespace field on AggregateReport records the declared XML namespace (urn:ietf:params:xml:ns:dmarc-2.0 for RFC 9990 reports). - RFC 9990 detection accepts namespaceless reports that follow the RFC 9990 shape (presence of np / testing / discovery_method / generator), so reporters that don't declare the namespace still receive RFC 9990- aware validation. 
- Warnings: missing DKIM <selector> (REQUIRED in RFC 9990); legacy forwarded / sampled_out policy-override types (removed by RFC 9990); unknown policy-override types per the RFC 9990 enumeration. - xml_namespace added to Elasticsearch and OpenSearch document mappings. Failure parsing (RFC 9991): - Identity-Alignment and Auth-Failure are split on commas with CFWS whitespace stripped per the RFC 9991 ABNF; previously "dkim, spf" yielded ["dkim", " spf"] with a leading space on the second token. - Warnings logged when either REQUIRED field is missing. Terminology: every reference to "DMARCbis" in code, tests, sample filenames, AGENTS.md, and CHANGELOG.md is replaced with the appropriate RFC number (9989 for the policy spec, 9990 for aggregate reports, 9991 for failure reports). Sample contents are unchanged. Docs: corrects the prior claim that fo was dropped from RFC 9990 (only pct was), reframes testing as a new field (not a pct replacement, since RFC 9989 Appendix A.6 removed pct with no per-message substitute), and documents the policy_override_reason enum changes (added policy_test_mode; removed forwarded / sampled_out). Tests: 8 new tests covering xml_namespace capture, RFC 9990 detection from field shape, missing-DKIM-selector warning, legacy-override-type warning, langAttrString unwrapping across all four affected elements, and CFWS-aware Identity-Alignment / Auth-Failure parsing plus their missing-field warnings. 276 tests total, all passing; ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Sean Whalen <44679+seanthegeek@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 18:51:08 -04:00
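Two of the parsing fixes described in commit ae1e5adb66 can be sketched briefly: unwrapping langAttrString values and splitting comma-separated header fields without keeping surrounding whitespace. The names `unwrap_text` and `split_comma_list` are hypothetical stand-ins (the commit mentions a `_text()` helper, whose actual body is not shown here), and the sketch strips plain whitespace only; it does not handle parenthesized comments, which full CFWS also permits.

```python
def unwrap_text(value):
    """Return the text payload of a value that may carry a lang attribute.

    Hypothetical sketch: when an XML element has a lang attribute,
    xmltodict yields a dict such as {"#text": ..., "@lang": ...}
    instead of a plain string. Both shapes map to the text payload.
    """
    if isinstance(value, dict):
        return value.get("#text")
    return value


def split_comma_list(value):
    """Split a comma-separated header value and strip surrounding whitespace.

    Hypothetical sketch for fields such as Identity-Alignment and
    Auth-Failure. Empty tokens (for example from a trailing comma)
    are dropped.
    """
    tokens = []
    for part in value.split(","):
        token = part.strip()
        if token:
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    # Dict shape produced when the reporter includes a lang attribute.
    print(unwrap_text({"#text": "admin@example.com", "@lang": "en"}))
    # Plain string shape passes through unchanged.
    print(unwrap_text("admin@example.com"))
    # A naive split would leave a leading space on the second token.
    print("dkim, spf".split(","))
    print(split_comma_list("dkim, spf"))
```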
Sean Whalen	69eee9f1dc	Update sponsorship section in README and documentation	2026-04-04 22:14:38 -04:00
Sean Whalen	d6ec35d66f	Fix typo in sponsorship note heading in documentation	2026-04-04 21:52:14 -04:00
Sean Whalen	2d931ab4f1	Add sponsor link	2026-04-04 21:51:07 -04:00
Kili	e98fdfa96b	Fix Python 3.14 support metadata and require imapclient 3.1.0 (#662 )	2026-03-04 12:36:15 -05:00
Copilot	2e3ee25ec9	Drop Python 3.9 support (#661 ) * Initial plan * Drop Python 3.9 support: update CI matrix, pyproject.toml, docs, and README Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com> * Update Python 3.9 version table entry to note Debian 11/RHEL 9 usage Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: seanthegeek <44679+seanthegeek@users.noreply.github.com>	2026-03-03 11:34:35 -05:00
Anael Mobilia	50fcb51577	Update supported Python versions in docs + readme (#652 ) * Update README.md * Update index.md * Update python-tests.yml	2026-01-19 14:40:01 -05:00
Sean Whalen	445c9565a4	Update bug link in docs	2025-12-06 15:05:19 -05:00
Sean Whalen	23ae563cd8	Update Python version support details in documentation	2025-12-05 10:48:04 -05:00
Sean Whalen	a18ae439de	Fix typo in RHEL version support description in documentation	2025-12-04 10:18:15 -05:00
Sean Whalen	0922d6e83a	Add supported Python versions to the documentation index	2025-12-01 10:24:19 -05:00
Sean Whalen	865c249437	Update features list	2025-08-24 13:39:50 -04:00
Szasza Palmer	995bdbcd97	adding OpenSearch support, fixing minor typos, and code styling (#481 ) * adding OpenSearch support, fixing minor typos and code styling * documentation update	2024-03-04 10:06:26 -05:00
Sean Whalen	21d6f92fd4	Add PyPI download stats badge	2023-10-13 10:01:48 -04:00
Sean Whalen	cd475255c5	Documentation cleanup	2023-05-03 16:44:15 -04:00
Ben Companjen	2b35b785c6	Split and Organise documentation files (#404 ) * Set global TOC collapse to false * Split documentation I tried to split the index.md file into logical parts, not changing the contents. I did add a space and change one HTTP URL to HTTPS. --------- Co-authored-by: Sean Whalen <44679+seanthegeek@users.noreply.github.com>	2023-05-03 16:11:58 -04:00
rubeste	a7280988eb	Implemented Azure Log Analytics ingestion via Data Collection Rules (#394 ) * Implemented Azure Log Analytics ingestion via Data Collection Rules * Update loganalytics.py * Update cli.py * Update pyproject.toml * Fixed config bug Fixed a bug that causes the program to fail if you do not configure a Data stream. * Fixed code format	2023-05-03 15:54:25 -04:00
Anael Mobilia	02e856a9bf	From Elasticsearch 8.7, xpack security isn't on by default but is required (#395 ) ``` org.elasticsearch.ElasticsearchSecurityException: invalid configuration for xpack.security.transport.ssl - [xpack.security.transport.ssl.enabled] is not set, but the following settings have been configured in elasticsearch.yml : [xpack.security.transport.ssl.keystore.secure_password,xpack.security.transport.ssl.truststore.secure_password] ```	2023-05-03 15:39:46 -04:00
Anael Mobilia	8b8c8c15fe	Fix markdown (#384 )	2023-01-16 14:43:36 -05:00
aroldxd	fcc64ed85a	add option to allow unencrypted fallback for token cache (#375 )	2022-12-23 18:21:22 -05:00
Anael Mobilia	4217a076de	Doc - Add info on how to update max shards (#368 ) Add information on how to fix "Elasticsearch error: RequestError(400, 'validation_exception', 'Validation Failed: 1: this action would add [1] shards, but this cluster currently has [1000]/[1000] maximum normal shards open;"	2022-12-23 18:15:11 -05:00
nmourad	0a0e4beb27	Update documentation default value for ES replica setting (#376 ) Change made in 7.1.0 "Set Elasticsearch shard replication to 0 (PR #274)" Documentation was not updated Co-authored-by: n.mourad <n.mourad@criteo.com>	2022-12-23 18:14:41 -05:00
Anael Mobilia	bcf242b0ab	Fix typo (#364 )	2022-12-23 18:13:10 -05:00
Anael Mobilia	1380eed2b8	Doc - Update install documentation to Elasticsearch/Kibana 8 (#363 ) * Update elasticsearch/kibana instructions [From elastisearch notes](https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#heap-size-settings) : ``` By default, Elasticsearch automatically sets the JVM heap size based on a node’s roles and total memory. We recommend the default sizing for most production environments. ``` * Update nginx conf to TLSv1.3 and IPv6 * Replace nginx proxy by native https server Kibana now provide https web server, remove the nginx proxy part and directly use kibana * Fix typo * Add infos how to login to kibana * Add interface details	2022-12-23 18:12:39 -05:00
Anael Mobilia	69c2c6bdb6	Add details on virtualenv / package installation (#361 )	2022-12-23 18:10:35 -05:00
Anael Mobilia	7c349fe97e	Add contrib component requirement on Debian (#360 )	2022-12-23 18:09:52 -05:00
Sean Whalen	4a607420a7	Fix list formatting in docs	2022-09-10 15:16:02 -04:00
Sean Whalen	12e591255c	Fix tests	2022-09-10 14:32:43 -04:00
Sean Whalen	6540577ad5	Convert docs to markdown	2022-09-10 12:53:47 -04:00

31 Commits