Merge branch 'master' into copilot/support-dmarcbis-reports

This commit is contained in:
Sean Whalen
2026-03-09 17:49:02 -04:00
committed by GitHub
7 changed files with 546 additions and 48 deletions
+34
View File
@@ -146,6 +146,9 @@ The full set of configuration options are:
- `dns_timeout` - float: DNS timeout period
- `debug` - bool: Print debugging messages
- `silent` - bool: Only print errors (Default: `True`)
- `fail_on_output_error` - bool: Exit with a non-zero status code if
any configured output destination fails while saving/publishing
reports (Default: `False`)
- `log_file` - str: Write log messages to a file at this path
- `n_procs` - int: Number of process to run in parallel when
parsing in CLI mode (Default: `1`)
@@ -281,6 +284,10 @@ The full set of configuration options are:
- `user` - str: Basic auth username
- `password` - str: Basic auth password
- `api_key` - str: API key
- `auth_type` - str: Authentication type: `basic` (default) or `awssigv4` (the key `authentication_type` is accepted as an alias for this option)
- `aws_region` - str: AWS region for SigV4 authentication
(required when `auth_type = awssigv4`)
- `aws_service` - str: AWS service for SigV4 signing (Default: `es`)
- `ssl` - bool: Use an encrypted SSL/TLS connection
(Default: `True`)
- `timeout` - float: Timeout in seconds (Default: 60)
@@ -511,6 +518,33 @@ PUT _cluster/settings
Increasing this value increases resource usage.
:::
## Performance tuning
For large mailbox imports or backfills, parsedmarc can consume a noticeable amount
of memory, especially when it runs on the same host as Elasticsearch or
OpenSearch. The following settings can reduce peak memory usage and make long
imports more predictable:
- Reduce `mailbox.batch_size` to smaller values such as `100-500` instead of
processing a very large message set at once. Smaller batches trade throughput
for lower peak memory use and less sink pressure.
- Keep `n_procs` low for mailbox-heavy runs. In practice, `1-2` workers is often
a safer starting point for large backfills than aggressive parallelism.
- Use `mailbox.since` to process reports in smaller time windows such as `1d`,
`7d`, or another interval that fits the backlog. This makes it easier to catch
up incrementally instead of loading an entire mailbox history in one run.
- Set `strip_attachment_payloads = True` when forensic reports contain large
attachments and you do not need to retain the raw payloads in the parsed
output.
- Prefer running parsedmarc separately from Elasticsearch or OpenSearch, or
reserve enough RAM for both services if they must share a host.
- For very large imports, prefer incremental supervised runs, such as a
scheduler or systemd service, over infrequent massive backfills.
These are operational tuning recommendations rather than hard requirements, but
they are often enough to avoid memory pressure and reduce failures during
high-volume mailbox processing.
## Multi-tenant support
Starting in `8.19.0`, ParseDMARC provides multi-tenant support by placing data into separate OpenSearch or Elasticsearch index prefixes. To set this up, create a YAML file that is formatted where each key is a tenant name, and the value is a list of domains related to that tenant, not including subdomains, like this: