Add optional PostgreSQL storage backend (#667)

Adds a PostgreSQL output backend as a lighter-weight alternative to
Elasticsearch/OpenSearch, configured via a [postgresql] section
(host/port/user/password/database or a libpq connection_string). Tables
are created automatically on first run; a Grafana dashboard is included.

- psycopg is an optional extra (pip install parsedmarc[postgresql]); the
  import is guarded so `import parsedmarc` works without it, and
  PostgreSQLClient raises a clear install hint when constructed without
  the driver. Binary wheels aren't available for every platform.
- Schema captures the RFC 9990 / DMARCbis aggregate fields: np, testing,
  discovery_method, generator, xml_namespace, and per-result human_result
  on the DKIM/SPF auth-result tables.
- forensic -> failure naming throughout (table dmarc_failure_report,
  save_failure_report_to_postgresql, dashboard, docs) to match #659.
- Failure-report de-duplication mirrors the Elasticsearch backend exactly:
  arrival date + From + To + Subject (NULL-safe via IS NOT DISTINCT FROM;
  semantic JSONB equality). Aggregate and SMTP-TLS use ON CONFLICT.
- PostgreSQLClient.close() for clean CLI shutdown; comment documents why
  the two timestamp helpers must stay distinct (report dates are local,
  record/SMTP-TLS dates are UTC).
- CLI: config parse raises ConfigurationError on missing
  host/connection_string; wired into _init_output_clients + save loops.
- Tests in tests/test_postgres.py (helpers, mocked-DB save assertions,
  create_tables, connect/error wrapping, dedup, real-sample round trip)
  and tests/test_cli.py (config parse + end-to-end save wiring incl.
  AlreadySaved/PostgreSQLError handling). postgres.py at 99% line
  coverage; only _main's output-client-init retry path is left.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fabio Scaccabarozzi
2026-05-21 14:17:49 +01:00
committed by GitHub
parent 0a703172de
commit 327fcff2b9
8 changed files with 3900 additions and 0 deletions
@@ -367,6 +367,52 @@ The full set of configuration options are:
`%` characters must be escaped with another `%` character,
so use `%%` wherever a `%` character is used.
:::
- `postgresql`
- `host` - str: The PostgreSQL server hostname or IP address.
Required unless `connection_string` is provided.
- `port` - int: The PostgreSQL server port (Default: `5432`)
- `user` - str: The database user name (Optional)
- `password` - str: The database user password (Optional)
- `database` - str: The database name (Optional)
- `connection_string` - str: A full libpq connection string or URI
(e.g. `postgresql://user:pass@host/dbname`). When provided, it takes
precedence and the individual parameters above are ignored.
The PostgreSQL backend is an optional extra. Install it with
`pip install parsedmarc[postgresql]`, which pulls in `psycopg`. It is
not a mandatory dependency because prebuilt binary wheels are not
available for every platform.
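
A minimal sketch of how an optional driver can be guarded so that the package still imports without it, and the missing dependency is only reported when a client is constructed. `load_optional` and `ClientSketch` are hypothetical names used for illustration; this is not parsedmarc's actual implementation.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Evaluated at import time; never raises, so the package stays importable.
_psycopg = load_optional("psycopg")


class ClientSketch:
    """Hypothetical client that needs the driver only when constructed."""

    def __init__(self, conninfo: str, driver: Optional[ModuleType] = _psycopg):
        if driver is None:
            # Fail late, with an actionable install hint.
            raise RuntimeError(
                "PostgreSQL support requires psycopg; "
                "install it with: pip install parsedmarc[postgresql]"
            )
        self.conninfo = conninfo
        self.driver = driver
```

Deferring the error to construction time means users who never configure `[postgresql]` are unaffected by the missing driver.
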
Tables are created automatically on first run using
`CREATE TABLE IF NOT EXISTS`, so a fresh installation needs no manual
schema setup.
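
The idempotent behaviour of `CREATE TABLE IF NOT EXISTS` can be illustrated with a short sketch. To keep it runnable without a database server, it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL. The column list is a made-up simplification and `create_tables` here is a hypothetical helper, not the backend's real schema or code.

```python
import sqlite3

# Hypothetical, heavily simplified DDL; the real schema is larger.
DDL = """
CREATE TABLE IF NOT EXISTS dmarc_failure_report (
    id INTEGER PRIMARY KEY,
    arrival_date TEXT,
    subject TEXT
)
"""


def create_tables(conn: sqlite3.Connection) -> None:
    """Create the table if it is missing; do nothing if it already exists."""
    conn.execute(DDL)
    conn.commit()


conn = sqlite3.connect(":memory:")
create_tables(conn)  # first run: table is created
conn.execute(
    "INSERT INTO dmarc_failure_report (arrival_date, subject) VALUES (?, ?)",
    ("2026-05-21", "example"),
)
create_tables(conn)  # later run: no error, existing rows are untouched
row_count = conn.execute("SELECT COUNT(*) FROM dmarc_failure_report").fetchone()[0]
```

Note that `IF NOT EXISTS` only skips creation; it does not alter a table that already exists with a different definition, so it covers first-time setup rather than schema changes.
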
**Example configuration:**
```ini
[postgresql]
host = localhost
port = 5432
user = parsedmarc
password = secret
database = parsedmarc
```
Or using a DSN/URI:
```ini
[postgresql]
connection_string = postgresql://parsedmarc:secret@localhost/parsedmarc
```
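
The precedence between the two configuration styles can be sketched as follows. `build_conninfo` is a hypothetical helper written for illustration, and `ValueError` stands in for whatever configuration error the CLI actually raises. The sketch assumes the `database` option maps to the libpq keyword `dbname`.

```python
from typing import Mapping


def _quote(value: str) -> str:
    """Quote a value for a libpq keyword/value connection string."""
    if value and not any(ch.isspace() or ch in "'\\" for ch in value):
        return value
    # Empty values and values with spaces, quotes, or backslashes are
    # wrapped in single quotes, with \ and ' escaped by a backslash.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_conninfo(section: Mapping[str, str]) -> str:
    """Return a connection string from a [postgresql]-style mapping."""
    connection_string = section.get("connection_string")
    if connection_string:
        # A full DSN/URI wins; the individual options are ignored.
        return connection_string
    host = section.get("host")
    if not host:
        raise ValueError("[postgresql] needs either host or connection_string")
    params = {"host": host, "port": section.get("port", "5432")}
    # Optional settings are only included when present and non-empty.
    for option, keyword in (
        ("user", "user"),
        ("password", "password"),
        ("database", "dbname"),
    ):
        if section.get(option):
            params[keyword] = section[option]
    return " ".join(f"{key}={_quote(str(value))}" for key, value in params.items())
```

For the first example configuration above, this sketch would produce `host=localhost port=5432 user=parsedmarc password=secret dbname=parsedmarc`.
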
Saving parsed data to PostgreSQL is controlled by the `[general]`
options `save_aggregate`, `save_failure`, and `save_smtp_tls`
(`save_forensic` is still accepted as a deprecated alias for
`save_failure`). Set each flag to `True` for the report type it covers
(aggregate DMARC, failure DMARC, and SMTP TLS reports, respectively);
otherwise no data of that type is written to PostgreSQL, even if this
section is configured.
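
As a rough illustration of the gating described above, the sketch below reads the `[general]` flags with the standard-library `configparser` and reports which report types would be saved. `enabled_report_types` is a hypothetical helper. Treating the alias as "either `save_failure` or `save_forensic` enables failure reports" is an assumption about how the alias is resolved.

```python
import configparser
from typing import Set


def enabled_report_types(config_text: str) -> Set[str]:
    """Return the report types whose save flag is true in [general]."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)

    def flag(name: str) -> bool:
        # Missing section or option counts as False.
        return parser.getboolean("general", name, fallback=False)

    enabled = set()
    if flag("save_aggregate"):
        enabled.add("aggregate")
    # Assumption: the deprecated alias enables the same report type.
    if flag("save_failure") or flag("save_forensic"):
        enabled.add("failure")
    if flag("save_smtp_tls"):
        enabled.add("smtp_tls")
    return enabled
```

With only a `[postgresql]` section and no save flags, the function returns an empty set, which corresponds to the case where the backend is configured but nothing is written.
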
- `s3`
- `bucket` - str: The S3 bucket name
- `path` - str: The path to upload reports to (Default: `/`)