Merge branch 'master' into copilot/support-dmarcbis-reports

2026-06-11 21:19:44 +00:00 · 2026-03-09 17:49:02 -04:00
parent fce8e2247b 326e630f50
commit e77ceb30c7
7 changed files with 546 additions and 48 deletions
@@ -146,6 +146,9 @@ The full set of configuration options are:
  - `dns_timeout` - float: DNS timeout period
  - `debug` - bool: Print debugging messages
  - `silent` - bool: Only print errors (Default: `True`)
+  - `fail_on_output_error` - bool: Exit with a non-zero status code if
+      any configured output destination fails while saving/publishing
+      reports (Default: `False`)
  - `log_file` - str: Write log messages to a file at this path
  - `n_procs` - int: Number of processes to run in parallel when
      parsing in CLI mode (Default: `1`)
@@ -281,6 +284,10 @@ The full set of configuration options are:
  - `user` - str: Basic auth username
  - `password` - str: Basic auth password
  - `api_key` - str: API key
+  - `auth_type` - str: Authentication type: `basic` (default) or `awssigv4` (the key `authentication_type` is accepted as an alias for this option)
+  - `aws_region` - str: AWS region for SigV4 authentication
+    (required when `auth_type = awssigv4`)
+  - `aws_service` - str: AWS service for SigV4 signing (Default: `es`)
  - `ssl` - bool: Use an encrypted SSL/TLS connection
    (Default: `True`)
  - `timeout` - float: Timeout in seconds (Default: 60)
@@ -511,6 +518,33 @@ PUT _cluster/settings
 Increasing this value increases resource usage.
 :::

+## Performance tuning
+
+For large mailbox imports or backfills, parsedmarc can consume a noticeable amount
+of memory, especially when it runs on the same host as Elasticsearch or
+OpenSearch. The following settings can reduce peak memory usage and make long
+imports more predictable:
+
+- Reduce `mailbox.batch_size` to smaller values such as `100-500` instead of
+  processing a very large message set at once. Smaller batches trade throughput
+  for lower peak memory use and less pressure on output destinations.
+- Keep `n_procs` low for mailbox-heavy runs. In practice, `1-2` workers are often
+  a safer starting point for large backfills than aggressive parallelism.
+- Use `mailbox.since` to process reports in smaller time windows such as `1d`,
+  `7d`, or another interval that fits the backlog. This makes it easier to catch
+  up incrementally instead of loading an entire mailbox history in one run.
+- Set `strip_attachment_payloads = True` when forensic reports contain large
+  attachments and you do not need to retain the raw payloads in the parsed
+  output.
+- Prefer running parsedmarc separately from Elasticsearch or OpenSearch, or
+  reserve enough RAM for both services if they must share a host.
+- For very large imports, prefer frequent incremental runs driven by a
+  scheduler or a supervised systemd service over infrequent massive backfills.
+
+These are operational tuning recommendations rather than hard requirements, but
+they are often enough to avoid memory pressure and reduce failures during
+high-volume mailbox processing.
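+
+As a minimal sketch of the recommendations above, a conservative starting
+configuration for a large backfill might look like the following. The
+`[general]` section name is an assumption (`[mailbox]` follows from the dotted
+option names), and the values are illustrative picks from the suggested ranges,
+not tested defaults:
+
+```ini
+[general]
+# Assumed section name; keep parallelism low for mailbox-heavy runs
+n_procs = 1
+# Drop raw attachment payloads from parsed forensic reports
+strip_attachment_payloads = True
+
+[mailbox]
+# Smaller batches lower peak memory at the cost of throughput
+batch_size = 200
+# Limit this run to a recent time window instead of the full mailbox history
+since = 7d
+```
+
+If memory use stays stable, `batch_size` and the `since` window can be widened
+gradually on later runs.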
+
 ## Multi-tenant support

 Starting in `8.19.0`, parsedmarc provides multi-tenant support by placing data into separate OpenSearch or Elasticsearch index prefixes. To set this up, create a YAML file in which each key is a tenant name and each value is a list of that tenant's domains (not including subdomains), like this: