Transient alert PostgresqlHighRollbackRate #1809

Description

@ggouzi

Steps to reproduce

On a machine cloud:
relate postgresql charm to otelcolector (2/stable rev 256)
relate postgresql to maas-region:db (3.7/candidate 462)

Expected behavior

PostgresqlHighRollbackRate does not fire.
Or, if the rollback rate is indeed high (> 2%), PostgresqlHighRollbackRate fires and stays firing continuously.

Actual behavior

The PostgresqlHighRollbackRate alert repeatedly fires and resolves at regular intervals.

Versions

Operating system: 24.04
Juju CLI: 3.6.23
Juju agent: 3.6.23
Charm revision: 16/stable rev 1089

Additional context

The alert is defined as:

    - alert: PostgresqlHighRollbackRate
      expr: 'sum by (namespace,datname,instance,datid) ((rate(pg_stat_database_xact_rollback{datname!~"template.*|postgres",datid!="0"}[3m])) / ((rate(pg_stat_database_xact_rollback{datname!~"template.*|postgres",datid!="0"}[3m])) + (rate(pg_stat_database_xact_commit{datname!~"template.*|postgres",datid!="0"}[3m])))) > 0.02'
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: PostgreSQL instance {{ $labels.instance }} has a high rollback rate instance.
        description: |
          The ratio of transactions being aborted compared to committed is > 2 %.
          This is probably happening due to unoptimized configurations related to commit delay, connections, memory, and WAL files.
          LABELS = {{ $labels }}

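Please note the for: 0m, which is odd given that the metrics are aggregated with a rate function over a 3m interval. With no pending period, the alert fires as soon as the ratio crosses 2% and resolves as soon as it drops back below. I suspect this is why alertmanager reports the alert flapping so many times.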
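
A possible mitigation, as an untested sketch: keep the expression as it is and add a non-zero pending period, so the ratio has to stay above 2% for a while before the alert fires. The 5m value below is my assumption for illustration, not something taken from the charm, and would need tuning against real traffic.

```yaml
- alert: PostgresqlHighRollbackRate
  # expr is copied unchanged from the current rule.
  expr: 'sum by (namespace,datname,instance,datid) ((rate(pg_stat_database_xact_rollback{datname!~"template.*|postgres",datid!="0"}[3m])) / ((rate(pg_stat_database_xact_rollback{datname!~"template.*|postgres",datid!="0"}[3m])) + (rate(pg_stat_database_xact_commit{datname!~"template.*|postgres",datid!="0"}[3m])))) > 0.02'
  # Assumption: 5m is an illustrative pending period, not a tested value.
  # The condition must hold on every evaluation for this long before firing.
  for: 5m
  labels:
    severity: warning
```

With a non-zero for:, a brief spike above 2% would only put the alert in the pending state. It would move to firing only if the condition held for the full period, which should cut down the fire/resolve churn.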
