Steps to reproduce
On a machine cloud:
relate postgresql charm to otelcollector (2/stable rev 256)
relate postgresql to maas-region:db (3.7/candidate 462)
Expected behavior
PostgresqlHighRollbackRate does not fire.
Or, if the rollback rate is indeed high (>2%), PostgresqlHighRollbackRate fires and keeps firing continuously.
Actual behavior
The PostgresqlHighRollbackRate alert repeatedly fires and resolves at regular intervals.
Versions
Operating system: 24.04
Juju CLI: 3.6.23
Juju agent: 3.6.23
Charm revision: 16/stable rev 1089
Additional context
The alert is defined as:
- alert: PostgresqlHighRollbackRate
  expr: 'sum by (namespace,datname,instance,datid) ((rate(pg_stat_database_xact_rollback{datname!~"template.*|postgres",datid!="0"}[3m])) / ((rate(pg_stat_database_xact_rollback{datname!~"template.*|postgres",datid!="0"}[3m])) + (rate(pg_stat_database_xact_commit{datname!~"template.*|postgres",datid!="0"}[3m])))) > 0.02'
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: PostgreSQL instance {{ $labels.instance }} has a high rollback rate instance.
    description: |
      The ratio of transactions being aborted compared to committed is > 2 %.
      This is probably happening due to unoptimized configurations related to commit delay, connections, memory, and WAL files.
      LABELS = {{ $labels }}
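Please note the for: 0m setting, which looks odd given that the expression aggregates the metrics with a rate function over a 3m interval. With no pending period, the alert fires on the first evaluation above 2% and resolves on the first evaluation below it. I suspect this is why Alertmanager reports the alert as flapping so often.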
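For illustration only, here is a sketch of the same rule with a non-zero pending period. The 5m value and the group name are assumptions of mine, not something taken from the charm, and I have not verified that this fully stops the flapping. The expr is copied unchanged from the definition above.

```yaml
groups:
  - name: postgresql-rollback-sketch   # hypothetical group name, for illustration
    rules:
      - alert: PostgresqlHighRollbackRate
        expr: 'sum by (namespace,datname,instance,datid) ((rate(pg_stat_database_xact_rollback{datname!~"template.*|postgres",datid!="0"}[3m])) / ((rate(pg_stat_database_xact_rollback{datname!~"template.*|postgres",datid!="0"}[3m])) + (rate(pg_stat_database_xact_commit{datname!~"template.*|postgres",datid!="0"}[3m])))) > 0.02'
        # Assumed value: the condition must hold for 5 consecutive minutes
        # before the alert moves from pending to firing.
        for: 5m
        labels:
          severity: warning
```

With a non-zero for, a short spike above 2% would only put the alert in the pending state instead of firing it immediately. Once the alert is firing, though, it would still resolve on the first evaluation where the ratio drops below the threshold.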