mirror of https://github.com/domainaware/parsedmarc.git synced 2026-07-05 16:25:09 +00:00


Sean Whalen 06d277686d classify_unknown_domains.py: enforce concept-parity across ~60 languages (#765)

Multilingual detectors previously held English at full breadth (e.g. Healthcare =
hospital + clinic + pharmacy + healthcare + pharmaceutical industry + nursing
home + medical center) while many non-English sections covered the same concept
set with only one or two transliterated words. This left every language other
than English under-detecting against pages that used the operator's natural
compound terms.

Reworked every detector so each language now expresses the same English concept
set in idiomatic compounds — never inventing calques where no natural form
exists. Added ~32 new languages (Macedonian, Belarusian, Azerbaijani, Armenian,
Georgian, Kazakh, Uzbek, Mongolian, Khmer, Burmese, Lao, Nepali, Sinhala,
Amharic, Yoruba, Hausa, Igbo, Zulu, Pashto, Kurdish, Tajik, Kyrgyz, Maltese,
Luxembourgish, Haitian Creole, Frisian, Yiddish, Faroese, Tatar, Javanese,
Sundanese, Cebuano) on top of the existing pool, again applied per-concept
rather than as token presence.
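Each detector is described as a regex alternation wrapped in an outer `\b…\b`, built per concept rather than from loose tokens. A minimal sketch of that shape, assuming a hypothetical `concept -> language -> terms` dictionary (the real layout of `classify_unknown_domains.py` is not shown here) and only a few English, German, French and Spanish terms for illustration:

```python
import re

# Hypothetical layout: concept -> language code -> idiomatic terms.
# Only a few illustrative languages are shown; the real detectors cover ~60.
HEALTHCARE: dict[str, dict[str, list[str]]] = {
    "hospital": {"en": ["hospital"], "de": ["krankenhaus", "klinikum"], "fr": ["hôpital"], "es": ["hospital"]},
    "pharmacy": {"en": ["pharmacy"], "de": ["apotheke"], "fr": ["pharmacie"], "es": ["farmacia"]},
    "clinic": {"en": ["clinic"], "de": ["klinik"], "fr": ["clinique"], "es": ["clínica"]},
}


def build_detector(concepts: dict[str, dict[str, list[str]]]) -> re.Pattern[str]:
    """Flatten every term of every concept into one case-insensitive alternation."""
    # A set removes words shared between languages (e.g. English/Spanish "hospital").
    terms = {term for by_lang in concepts.values() for words in by_lang.values() for term in words}
    # Longest first, so multi-word compounds are preferred when a shorter term
    # would also match at the same position.
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


detector = build_detector(HEALTHCARE)
print(bool(detector.search("Städtisches Klinikum Nord")))  # True
print(bool(detector.search("Pharmacie du Centre")))        # True
print(bool(detector.search("Car parts and tyres")))        # False
```

Keeping terms grouped by concept, rather than as one flat word list, is what makes a per-language coverage comparison possible at all.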

Also added British / American spelling pairs where they diverge (`tire`/`tyre`,
`defense`/`defence`, `center`/`centre`, etc.) and a handful of new English
concepts that had been implicit (`tire shop`, `car parts`, `oil exploration`,
`olympic committee`, ...) — each with its multilingual equivalents in the same
edit.
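One way to keep British and American spellings in step is to expand each English keyword from a pair table before the alternation is built. A hypothetical sketch (`US_TO_UK` and `spelling_variants` are illustrative names, not taken from the module), using only the pairs named above:

```python
from itertools import product

# Pairs named in the commit message; a real table would be longer.
US_TO_UK = {"tire": "tyre", "defense": "defence", "center": "centre"}


def spelling_variants(phrase: str) -> list[str]:
    """Return every American/British spelling combination of a keyword phrase."""
    options = [(w, US_TO_UK[w]) if w in US_TO_UK else (w,) for w in phrase.split()]
    return [" ".join(combo) for combo in product(*options)]


print(spelling_variants("tire shop"))       # ['tire shop', 'tyre shop']
print(spelling_variants("medical center"))  # ['medical center', 'medical centre']
```

This is only one possible approach; listing both forms by hand works just as well and keeps the keyword lists greppable.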

AGENTS.md: codified the rule under "Maintaining the reverse DNS maps" so future
edits are bound by it: every language section must cover the same concept set
the English section covers, with idiomatic compounds rather than calques, skip
rather than invent when no natural form exists, and any new English keyword
must be added in parallel across the existing language set.
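The rule above is codified as written guidance; an automated parity check is a hypothetical addition. A minimal sketch, assuming per-language sections stored as `language -> concept -> terms` (not necessarily how the module stores them), where an explicit `None` records a deliberate "no natural form" skip:

```python
from typing import Optional

# Hypothetical layout: language -> concept -> terms. An explicit None marks a
# deliberate skip ("no natural form exists"), so it is not reported as a gap.
Section = dict[str, Optional[list[str]]]


def parity_gaps(sections: dict[str, Section], reference: str = "en") -> dict[str, set[str]]:
    """Return, per language, the reference-language concepts it neither covers nor skips."""
    required = set(sections[reference])
    gaps: dict[str, set[str]] = {}
    for lang, concepts in sections.items():
        # Covered means at least one term, or an explicit None skip; [] is a gap.
        missing = {c for c in required if c not in concepts or concepts[c] == []}
        if missing:
            gaps[lang] = missing
    return gaps


SECTIONS: dict[str, Section] = {
    "en": {"hospital": ["hospital"], "pharmacy": ["pharmacy"], "nursing home": ["nursing home"]},
    "de": {"hospital": ["krankenhaus"], "pharmacy": ["apotheke"], "nursing home": ["pflegeheim"]},
    "fr": {"hospital": ["hôpital"], "pharmacy": ["pharmacie"]},
}

print(parity_gaps(SECTIONS))  # {'fr': {'nursing home'}}
```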

Final shape: 11,777 alternations / 175,556 chars across 45 detectors. Ruff
check + format clean. Module compiles.

Known limitation (pre-existing, unchanged): Python's `re` does not treat
Unicode Mn / Mc combining marks as word characters, so Brahmic-script words
ending in vowel signs / virama won't match the outer `\b…\b`. Affects
pre-existing and new entries equally; fixable later by switching to the
`regex` module.
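The limitation can be reproduced with the standard library alone. In this sketch the Hindi word and both helper functions are illustrative, not taken from the module; `matches_with_lookarounds` is a hypothetical `re`-only workaround, not the `regex`-module fix the commit proposes:

```python
import re
import unicodedata

# "नमस्ते" (Hindi, "namaste") ends in U+0947 DEVANAGARI VOWEL SIGN E,
# a combining mark of category Mn.
WORD = "नमस्ते"
TEXT = f"say {WORD} please"


def matches_with_word_boundaries(word: str, text: str) -> bool:
    """Wrap the word in outer word boundaries, as the commit message describes."""
    return re.search(rf"\b{re.escape(word)}\b", text) is not None


def matches_with_lookarounds(word: str, text: str) -> bool:
    """Hypothetical workaround: only reject a neighbouring word character,
    instead of requiring a word/non-word transition at each edge."""
    return re.search(rf"(?<!\w){re.escape(word)}(?!\w)", text) is not None


print(unicodedata.category(WORD[-1]))            # Mn
print("\u0947".isalnum())                        # False, so re's \w rejects it
print(matches_with_word_boundaries(WORD, TEXT))  # False
print(matches_with_lookarounds(WORD, TEXT))      # True
```

The lookaround variant still accepts a match that starts right after a stray combining mark, since a mark is not `\w` either; switching to `regex`, as the commit suggests, addresses the cause rather than the symptom.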

Co-authored-by: Sean Whalen <seanthegeek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-07 19:01:15 -04:00

.claude

SIGHUP-based configuration reload for watch mode (#697)

2026-03-21 16:14:48 -04:00

.github

Align Kibana dashboards with OpenSearch Dashboards source-of-truth (#737)

2026-04-27 01:30:48 -04:00

.vscode

Update dashboard documentation

2026-05-03 12:36:06 -04:00

dashboards

Fix splunk SMTP TLS dashboard: add additional renames for failure details and adjust stats query

2026-05-03 19:58:29 -04:00

docs

docs: update installation instructions for IPinfo Lite and MaxMind GeoLite2 databases

2026-05-04 18:52:18 -04:00

parsedmarc

classify_unknown_domains.py: enforce concept-parity across ~60 languages (#765)

2026-05-07 19:01:15 -04:00

samples

Add example google SMTP-TLS report email

2024-09-04 20:03:51 -04:00

.dockerignore

Add Dockerfile & build/push task (#316)

2022-05-05 21:06:38 -04:00

.gitattributes

Add additional samples and ensure git does not touch CRLF (#456)

2024-01-02 16:29:06 -05:00

.gitignore

9.7.0 (#709)

2026-04-19 21:20:41 -04:00

AGENTS.md

classify_unknown_domains.py: enforce concept-parity across ~60 languages (#765)

2026-05-07 19:01:15 -04:00

build.sh

Format on build

2025-12-12 15:56:52 -05:00

CHANGELOG.md

Bump mailsuite to >=2.0.2 for 9.11.1 release (#743)

2026-04-30 11:59:11 -04:00

ci.ini

Skip DNS lookups in GitHub Actions to prevent test timeouts (#657)

2026-02-18 18:19:28 -05:00

CLAUDE.md

Add AGENTS.md for AI agent guidance and link from CLAUDE.md

2026-03-03 21:00:55 -05:00

codecov.yml

Tune Codecov statuses for small PRs (#678)

2026-03-09 17:43:34 -04:00

CONTRIBUTING.md

Add contributing guide (#685)

2026-03-09 18:16:47 -04:00

dashboard-dev-bootstrap.sh

Align Kibana dashboards with OpenSearch Dashboards source-of-truth (#737)

2026-04-27 01:30:48 -04:00

docker-compose.dashboard-dev.yml

9.4.0

2026-03-23 17:08:26 -04:00

docker-compose.yml

Update OpenSearch healthcheck to use HTTPS and include authentication

2026-03-16 17:53:37 -04:00

Dockerfile

Updated default python docker base image to 3.13-slim (#618)

2025-10-29 22:34:06 -04:00

LICENSE

First commit

2018-02-05 20:23:07 -05:00

publish-docs.sh

Add publish-docs.sh

2022-10-04 18:45:57 -04:00

pyproject.toml

Bump mailsuite to >=2.0.2 for 9.11.1 release (#743)

2026-04-30 11:59:11 -04:00

README.md

Update sponsorship section in README and documentation

2026-04-04 22:14:38 -04:00

SECURITY.md

Add security policy (#688)

2026-03-09 18:24:16 -04:00

tests.py

Offload mailbox layer to mailsuite>=2.0.0 (#741)

2026-04-28 00:58:36 -04:00

README.md

parsedmarc

parsedmarc is a Python module and CLI utility for parsing DMARC reports. When used with Elasticsearch and Kibana (or Splunk), it works as a self-hosted open-source alternative to commercial DMARC report processing services such as Agari Brand Protection, Dmarcian, OnDMARC, ProofPoint Email Fraud Defense, and Valimail.

Note

Domain-based Message Authentication, Reporting, and Conformance (DMARC) is an email authentication protocol.

Features

Parses draft and 1.0 standard aggregate/rua DMARC reports
Parses forensic/failure/ruf DMARC reports
Parses reports from SMTP TLS Reporting
Can parse reports from an inbox over IMAP, Microsoft Graph, or Gmail API
Transparently handles gzip or zip compressed reports
Consistent data structures
Simple JSON and/or CSV output
Optionally email the results
Optionally send the results to Elasticsearch, OpenSearch, and/or Splunk, for use with premade dashboards
Optionally send reports to Apache Kafka
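The README does not show how compressed reports are detected. A minimal standard-library sketch of the general technique (sniffing the gzip and zip magic bytes), with `decompress_report` as a hypothetical name rather than parsedmarc's API:

```python
import gzip
import io
import zipfile

GZIP_MAGIC = b"\x1f\x8b"
ZIP_MAGIC = b"PK\x03\x04"  # Local file header that starts a zip archive


def decompress_report(data: bytes) -> bytes:
    """Return the raw report bytes, decompressing gzip or zip payloads if needed."""
    if data.startswith(GZIP_MAGIC):
        return gzip.decompress(data)
    if data.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Assumption: the archive holds a single report file, so read the first member.
            return archive.read(archive.namelist()[0])
    return data  # Already uncompressed (e.g. plain XML or JSON)


if __name__ == "__main__":
    xml = b"<feedback></feedback>"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("report.xml", xml)
    assert decompress_report(buffer.getvalue()) == xml
    assert decompress_report(gzip.compress(xml)) == xml
    assert decompress_report(xml) == xml
```

Checking magic bytes rather than file extensions avoids trusting attachment names, which senders do not always set consistently.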

Python Compatibility

The following table shows which Python versions this project supports. It covers versions that are either actively maintained or are the default versions for RHEL or Debian.

Version	Supported	Reason
< 3.6	❌	End of Life (EOL)
3.6	❌	Used in RHEL 8, but not supported by project dependencies
3.7	❌	End of Life (EOL)
3.8	❌	End of Life (EOL)
3.9	❌	Used in Debian 11 and RHEL 9, but not supported by project dependencies
3.10	✅	Actively maintained
3.11	✅	Actively maintained; supported until June 2028 (Debian 12)
3.12	✅	Actively maintained; supported until May 2035 (RHEL 10)
3.13	✅	Actively maintained; supported until June 2030 (Debian 13)
3.14	✅	Supported (requires `imapclient>=3.1.0`)
