From c1a292d9daaaceb282a4553c35c974f9146091b8 Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 09:24:21 -0400
Subject: [PATCH 01/14] docs: add postgres-ha-cluster.rst skeleton

---
 .../scale/postgres-ha-cluster.rst             | 43 +++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100644 source/administration-guide/scale/postgres-ha-cluster.rst

diff --git a/source/administration-guide/scale/postgres-ha-cluster.rst b/source/administration-guide/scale/postgres-ha-cluster.rst
new file mode 100644
index 00000000000..168e6502236
--- /dev/null
+++ b/source/administration-guide/scale/postgres-ha-cluster.rst
@@ -0,0 +1,43 @@
+PostgreSQL high availability cluster
+=====================================
+
+:nosearch:
+
+This guide describes how to deploy a high availability PostgreSQL cluster for
+Mattermost using `repmgr <https://repmgr.org/>`__ for replication management
+and automatic failover, `HAProxy <https://www.haproxy.org/>`__ for connection
+routing, and `Keepalived <https://keepalived.org/>`__ for Virtual IP (VIP)
+management.
+
+This is infrastructure-level HA that operates independently of your Mattermost
+edition. It is compatible with any self-hosted Mattermost deployment.
+
+.. note::
+
+   This guide has been validated on: **Ubuntu 24.04 LTS**, **PostgreSQL 17**,
+   **repmgr 5.5**, **HAProxy 2.8**, **Keepalived**.
+
+Architecture overview
+---------------------
+
+[stub]
+
+Before you begin
+----------------
+
+[stub]
+
+Setup guide
+-----------
+
+[stub]
+
+Day-2 operations
+----------------
+
+[stub]
+
+Troubleshooting
+---------------
+
+[stub]

From c11a8fce4279c8f1a0a1baa659fa25de5882cc99 Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 09:25:29 -0400
Subject: [PATCH 02/14] =?UTF-8?q?docs:=20postgres-ha-cluster=20=E2=80=94?=
 =?UTF-8?q?=20architecture=20overview=20section?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../scale/postgres-ha-cluster.rst             | 69 ++++++++++++++++++-
 1 file changed, 68 insertions(+), 1 deletion(-)

diff --git a/source/administration-guide/scale/postgres-ha-cluster.rst b/source/administration-guide/scale/postgres-ha-cluster.rst
index 168e6502236..19c19a93189 100644
--- a/source/administration-guide/scale/postgres-ha-cluster.rst
+++ b/source/administration-guide/scale/postgres-ha-cluster.rst
@@ -20,7 +20,74 @@ edition. It is compatible with any self-hosted Mattermost deployment.
 Architecture overview
 ---------------------
 
-[stub]
+A PostgreSQL HA cluster for Mattermost consists of three nodes running in
+parallel. Each node runs the full stack: PostgreSQL, repmgr daemon (repmgrd),
+HAProxy, Keepalived, and a health-check service. A Virtual IP (VIP) floats
+across nodes and always points to the current primary.
+
+.. code-block:: text
+
+                         VIP: <CLUSTER_VIP>
+                                │
+                ┌───────────────┼───────────────┐
+                │               │               │
+         ┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
+         │     pg1     │ │     pg2     │ │     pg3     │
+         │             │ │             │ │             │
+         │  HAProxy    │ │  HAProxy    │ │  HAProxy    │
+         │  Keepalived │ │  Keepalived │ │  Keepalived │
+         │  pgchk.py   │ │  pgchk.py   │ │  pgchk.py   │
+         │  repmgrd    │ │  repmgrd    │ │  repmgrd    │
+         ├─────────────┤ ├─────────────┤ ├─────────────┤
+         │ PostgreSQL  │ │ PostgreSQL  │ │ PostgreSQL  │
+         │   PRIMARY   │ │   STANDBY   │ │   STANDBY   │
+         └─────────────┘ └─────────────┘ └─────────────┘
+
+**Components:**
+
+.. list-table::
+   :widths: 20 10 70
+   :header-rows: 1
+
+   * - Component
+     - Version
+     - Role
+   * - PostgreSQL
+     - 17
+     - Primary database engine. Streaming replication with replication slots.
+   * - repmgr / repmgrd
+     - 5.5
+     - Replication manager. Monitors cluster health and automatically promotes
+       a standby when the primary fails.
+   * - HAProxy
+     - 2.8
+     - TCP load balancer. Routes write traffic to the primary and read traffic
+       to standbys via two ports.
+   * - Keepalived
+     - —
+     - Manages the VIP using VRRP. Moves the VIP to the new primary after
+       failover.
+   * - pgchk.py
+     - —
+     - HTTP health-check endpoint (port 8008). HAProxy queries this to
+       determine which node is the current primary.
+
+**HAProxy ports:**
+
+.. list-table::
+   :widths: 15 85
+   :header-rows: 1
+
+   * - Port
+     - Purpose
+   * - 5000
+     - Write traffic — routes to the current primary only
+   * - 5001
+     - Read traffic — load-balanced across all standbys
+
+**Sizing:** This architecture is appropriate for Mattermost deployments up to
+approximately 2,000 concurrent users. For larger deployments, see
+:doc:`Scaling for Enterprise </administration-guide/scale/scaling-for-enterprise>`.
 
 Before you begin
 ----------------

From cbb742ad1122a281df54343ba9200c9b36deb135 Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 09:26:42 -0400
Subject: [PATCH 03/14] =?UTF-8?q?docs:=20postgres-ha-cluster=20=E2=80=94?=
 =?UTF-8?q?=20before=20you=20begin=20/=20decision=20guidance=20section?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../scale/postgres-ha-cluster.rst             | 109 +++++++++++++++++-
 1 file changed, 108 insertions(+), 1 deletion(-)

diff --git a/source/administration-guide/scale/postgres-ha-cluster.rst b/source/administration-guide/scale/postgres-ha-cluster.rst
index 19c19a93189..a004691f1e2 100644
--- a/source/administration-guide/scale/postgres-ha-cluster.rst
+++ b/source/administration-guide/scale/postgres-ha-cluster.rst
@@ -92,7 +92,114 @@ approximately 2,000 concurrent users. For larger deployments, see
 Before you begin
 ----------------
 
-[stub]
+Is this the right architecture for you?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. list-table::
+   :widths: 30 35 35
+   :header-rows: 1
+
+   * - Scenario
+     - Recommendation
+     - Why
+   * - Cloud-hosted on AWS/GCP/Azure
+     - Use a managed PostgreSQL service with multi-zone HA (for example, Amazon RDS Multi-AZ or Google Cloud SQL HA)
+     - Managed failover, no infrastructure to operate
+   * - On-premises or private cloud, single site
+     - **This guide** — single-DC HA cluster
+     - Automatic failover within the datacenter, no cloud dependency
+   * - On-premises, two or more sites, DR required
+     - Single-DC HA (this guide) + Multi-DC DR guide (coming soon)
+     - Active/warm-standby across datacenters
+
+Requirements
+~~~~~~~~~~~~
+
+**Hardware (per node — minimum):**
+
+- Operating system: Ubuntu 24.04 LTS
+- CPU: 2 cores
+- RAM: 4 GB
+- Disk: 50 GB
+
+**You need 3 nodes** and one spare IP address on the same subnet for the VIP.
+
+**Network — ports that must be open between all three nodes:**
+
+.. list-table::
+   :widths: 15 85
+   :header-rows: 1
+
+   * - Port
+     - Purpose
+   * - 22
+     - SSH (administration)
+   * - 5432
+     - PostgreSQL (replication, repmgr)
+   * - 8008
+     - pgchk.py health check (HAProxy → database nodes)
+   * - IP protocol 112
+     - VRRP for Keepalived VIP election between nodes. Allow the protocol itself; it is not a TCP or UDP port.
+
+**Ports that Mattermost application servers must reach:**
+
+.. list-table::
+   :widths: 15 85
+   :header-rows: 1
+
+   * - Port
+     - Purpose
+   * - 5000
+     - Write connections (primary)
+   * - 5001
+     - Read connections (standbys)
+
+**Software:** The following packages will be installed during setup. No
+pre-installation is required.
+
+- ``postgresql-17``
+- ``postgresql-17-repmgr``
+- ``haproxy``
+- ``keepalived``
+- ``python3`` (for pgchk.py)
+
+Node planning worksheet
+~~~~~~~~~~~~~~~~~~~~~~~
+
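+Complete this before starting. The commands and configuration files in this
+guide use the placeholders in the last column. Substitute your values wherever
+they appear.
+
+.. list-table::
+   :widths: 10 15 25 30 20
+   :header-rows: 1
+
+   * - Node
+     - Hostname
+     - IP address
+     - Initial role
+     - Placeholder
+   * - 1
+     - pg1
+     - _______________
+     - Primary
+     - ``<PG1_IP>``
+   * - 2
+     - pg2
+     - _______________
+     - Standby
+     - ``<PG2_IP>``
+   * - 3
+     - pg3
+     - _______________
+     - Standby
+     - ``<PG3_IP>``
+   * - VIP
+     - —
+     - _______________
+     - Floating (always points to primary)
+     - ``<CLUSTER_VIP>``
+
+**Subnet:** ``_______________`` (network address, e.g. ``10.0.1.0``; used as
+``<SUBNET>``. The ``pg_hba.conf`` and Keepalived examples assume a ``/24`` prefix.)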
+
+Time estimate
+~~~~~~~~~~~~~
+
+Allow **2–3 hours** for a first-time setup on pre-provisioned servers.
 
 Setup guide
 -----------

From 3caf951d7c1fefd4c0ce98b3373fc93bc6365698 Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 09:27:37 -0400
Subject: [PATCH 04/14] =?UTF-8?q?docs:=20postgres-ha-cluster=20=E2=80=94?=
 =?UTF-8?q?=20setup=20Phase=201=20(base=20install)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 .../scale/postgres-ha-cluster.rst             | 59 ++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/source/administration-guide/scale/postgres-ha-cluster.rst b/source/administration-guide/scale/postgres-ha-cluster.rst
index a004691f1e2..ffdc8d9252d 100644
--- a/source/administration-guide/scale/postgres-ha-cluster.rst
+++ b/source/administration-guide/scale/postgres-ha-cluster.rst
@@ -204,7 +204,64 @@ Allow **2–3 hours** for a first-time setup on pre-provisioned servers.
 Setup guide
 -----------
 
-[stub]
+.. note::
+
+   Throughout this guide, replace the placeholders ``<PG1_IP>``, ``<PG2_IP>``,
+   ``<PG3_IP>``, ``<CLUSTER_VIP>``, and ``<SUBNET>`` with the values you
+   recorded in the node planning worksheet above.
+
+.. warning::
+
+   Complete each phase in order. The checkpoint at the end of each phase must
+   pass before you proceed.
+
+Phase 1: Base installation (all nodes)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Run all steps in Phase 1 on **pg1, pg2, and pg3**.
+
+**Step 1.1 — Configure /etc/hosts**
+
+On each node, append to ``/etc/hosts``:
+
+.. code-block:: text
+
+   <PG1_IP>  pg1
+   <PG2_IP>  pg2
+   <PG3_IP>  pg3
+
+Verify hostname resolution on each node:
+
+.. code-block:: bash
+
+   ping -c 1 pg1 && ping -c 1 pg2 && ping -c 1 pg3
+
+Expected: 3 successful pings.
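+
+If you rerun this step, appending again duplicates the entries. The sketch
+below is a minimal idempotent alternative that skips any exact IP and hostname
+pair already present. ``ensure_host_entry`` is a hypothetical helper, not part
+of any package. It takes the file path as an argument, so you can try it on a
+scratch copy before you touch ``/etc/hosts``.
+
+.. code-block:: bash
+
+   # Hypothetical helper: append "IP  HOSTNAME" to a hosts file unless a line
+   # already starts with IP and lists HOSTNAME among its names.
+   ensure_host_entry() {
+     local file="$1" ip="$2" name="$3"
+     if ! awk -v ip="$ip" -v name="$name" \
+         '$1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
+          END { exit !found }' "$file"; then
+       printf '%s  %s\n' "$ip" "$name" >> "$file"
+     fi
+   }
+
+   # Usage, as root on each node (substitute your worksheet values):
+   #   ensure_host_entry /etc/hosts <PG1_IP> pg1
+   #   ensure_host_entry /etc/hosts <PG2_IP> pg2
+   #   ensure_host_entry /etc/hosts <PG3_IP> pg3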
+
+**Step 1.2 — Install PostgreSQL 17 and repmgr 5.5**
+
+.. code-block:: bash
+
+   sudo apt update
+   sudo apt install -y curl ca-certificates
+   sudo install -d /usr/share/postgresql-common/pgdg
+   sudo curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc \
+       --fail https://www.postgresql.org/media/keys/ACCC4CF8.asc
+   sudo sh -c 'echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] \
+       https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" \
+       > /etc/apt/sources.list.d/pgdg.list'
+   sudo apt update
+   sudo apt install -y postgresql-17 postgresql-17-repmgr
+
+✅ **Phase 1 checkpoint** — run on every node:
+
+.. code-block:: bash
+
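+   # Check the cluster unit: postgresql.service is only a wrapper and
+   # reports "active (exited)" even when the cluster is running
+   sudo systemctl status postgresql@17-main | grep "active (running)"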
+   /usr/lib/postgresql/17/bin/repmgr --version
+
+**Pass:** PostgreSQL shows ``active (running)``; repmgr prints ``repmgr 5.5.x``.
+
+**Fail:** If PostgreSQL did not start, check ``journalctl -u postgresql`` for errors.
 
 Day-2 operations
 ----------------

From ec40b00d403c2364ff71dd2401ed55f6448fc7e9 Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 09:28:26 -0400
Subject: [PATCH 05/14] =?UTF-8?q?docs:=20postgres-ha-cluster=20=E2=80=94?=
 =?UTF-8?q?=20setup=20Phase=202=20(PostgreSQL=20config)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../scale/postgres-ha-cluster.rst             | 58 +++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/source/administration-guide/scale/postgres-ha-cluster.rst b/source/administration-guide/scale/postgres-ha-cluster.rst
index ffdc8d9252d..a921c1c2c7c 100644
--- a/source/administration-guide/scale/postgres-ha-cluster.rst
+++ b/source/administration-guide/scale/postgres-ha-cluster.rst
@@ -263,6 +263,64 @@ Expected: 3 successful pings.
 
 **Fail:** If PostgreSQL did not start, check ``journalctl -u postgresql`` for errors.
 
+Phase 2: PostgreSQL configuration (all nodes)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Run all steps in Phase 2 on **pg1, pg2, and pg3**.
+
+**Step 2.1 — Configure postgresql.conf**
+
+Append to ``/etc/postgresql/17/main/postgresql.conf``:
+
+.. code-block:: ini
+
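+   # Replication settings
+   listen_addresses = '*'               # accept connections from the other nodes and HAProxy
+   max_wal_senders = 10
+   max_replication_slots = 10
+   wal_level = replica
+   hot_standby = on                     # allow read-only queries on standbys
+   archive_mode = on
+   archive_command = '/bin/true'        # placeholder: reports success without archiving WAL
+   shared_preload_libraries = 'repmgr'  # required by repmgrd; takes effect on restart
+   wal_log_hints = on                   # required by pg_rewind (node rejoin --force-rewind)
+   wal_keep_size = 1024                 # MB of WAL kept for standbys that fall behind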
+
+**Step 2.2 — Configure pg_hba.conf**
+
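+Add the following lines to ``/etc/postgresql/17/main/pg_hba.conf`` **above**
+the default ``host`` entries. PostgreSQL uses the first line that matches a
+connection, and the Ubuntu default ``127.0.0.1/32`` rules (``scram-sha-256``)
+would otherwise match local connections before the ``trust`` lines below: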
+
+.. code-block:: text
+
+   # repmgr access
+   host    repmgr      repmgr      <SUBNET>/24     trust
+   host    repmgr      repmgr      127.0.0.1/32    trust
+   # Replication connections
+   host    replication repmgr      <SUBNET>/24     trust
+   host    replication repmgr      127.0.0.1/32    trust
+
+.. note::
+
+   For production, replace ``trust`` with ``scram-sha-256`` and configure
+   ``.pgpass`` files on each node.
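+
+With ``scram-sha-256``, repmgr and the replication connections need to read
+the password from somewhere other than an interactive prompt. The sketch below
+is a minimal ``.pgpass`` writer for the ``postgres`` OS user. It assumes the
+``repmgr`` role uses the same password on every node. ``write_pgpass`` is a
+hypothetical helper. Each line uses the standard
+``hostname:port:database:username:password`` format. The ``replication``
+database entry covers physical replication connections. libpq ignores the file
+if group or others can read it. Escape any ``:`` or ``\`` in the password with
+a backslash.
+
+.. code-block:: bash
+
+   # Hypothetical helper: write a .pgpass for the repmgr role covering every
+   # node, for both the repmgr database and physical replication connections.
+   # Assumption: the same password on all nodes.
+   write_pgpass() {
+     local file="$1" password="$2"; shift 2   # remaining args: node IPs or hostnames
+     local host
+     : > "$file"
+     chmod 0600 "$file"   # libpq ignores the file if group/world have access
+     for host in "$@"; do
+       printf '%s:5432:repmgr:repmgr:%s\n' "$host" "$password" >> "$file"
+       printf '%s:5432:replication:repmgr:%s\n' "$host" "$password" >> "$file"
+     done
+   }
+
+   # Usage on each node, as the postgres OS user (home: /var/lib/postgresql):
+   #   write_pgpass /var/lib/postgresql/.pgpass '<REPMGR_PASSWORD>' <PG1_IP> <PG2_IP> <PG3_IP>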
+
+**Step 2.3 — Restart PostgreSQL**
+
+.. code-block:: bash
+
+   sudo systemctl restart postgresql
+
+✅ **Phase 2 checkpoint** — run on every node:
+
+.. code-block:: bash
+
+   sudo -u postgres psql -c "SHOW wal_level;"
+   sudo -u postgres psql -c "SHOW shared_preload_libraries;"
+
+**Pass:** ``wal_level`` is ``replica``; ``shared_preload_libraries`` contains ``repmgr``.
+
+**Fail:** If PostgreSQL did not restart, check ``journalctl -u postgresql``.
+
 Day-2 operations
 ----------------
 

From 755c7710510fd7b48a3111ddb28a1284fce10ef8 Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 09:29:30 -0400
Subject: [PATCH 06/14] =?UTF-8?q?docs:=20postgres-ha-cluster=20=E2=80=94?=
 =?UTF-8?q?=20setup=20Phase=203=20(repmgr=20+=20standby=20clone)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../scale/postgres-ha-cluster.rst             | 104 ++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/source/administration-guide/scale/postgres-ha-cluster.rst b/source/administration-guide/scale/postgres-ha-cluster.rst
index a921c1c2c7c..1128544ce9c 100644
--- a/source/administration-guide/scale/postgres-ha-cluster.rst
+++ b/source/administration-guide/scale/postgres-ha-cluster.rst
@@ -321,6 +321,110 @@ Append to ``/etc/postgresql/17/main/pg_hba.conf``:
 
 **Fail:** If PostgreSQL did not restart, check ``journalctl -u postgresql``.
 
+Phase 3: repmgr configuration and cluster initialisation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Step 3.1 — Create repmgr user and database (pg1 only)**
+
+.. code-block:: bash
+
+   sudo -u postgres createuser --superuser repmgr
+   sudo -u postgres createdb --owner=repmgr repmgr
+   sudo -u postgres psql -c "ALTER USER repmgr SET search_path TO repmgr, public;"
+
+**Step 3.2 — Create /etc/repmgr.conf (all nodes)**
+
+Create ``/etc/repmgr.conf`` on each node. Adjust ``node_id``, ``node_name``,
+and ``host`` for each node:
+
+**pg1:**
+
+.. code-block:: ini
+
+   node_id=1
+   node_name='pg1'
+   conninfo='host=<PG1_IP> user=repmgr dbname=repmgr connect_timeout=2'
+   data_directory='/var/lib/postgresql/17/main'
+   use_replication_slots=yes
+   monitoring_history=yes
+   log_level=INFO
+   pg_bindir='/usr/lib/postgresql/17/bin'
+   service_start_command='sudo /usr/bin/pg_ctlcluster 17 main start'
+   service_stop_command='sudo /usr/bin/pg_ctlcluster 17 main stop'
+   service_restart_command='sudo /usr/bin/pg_ctlcluster 17 main restart'
+   service_reload_command='sudo /usr/bin/pg_ctlcluster 17 main reload'
+   service_promote_command='sudo /usr/bin/pg_ctlcluster 17 main promote'
+   failover=automatic
+   promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
+   follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
+   reconnect_attempts=3
+   reconnect_interval=5
+   monitor_interval_secs=2
+
+**pg2:** Same as above with ``node_id=2``, ``node_name='pg2'``, ``host=<PG2_IP>``.
+
+**pg3:** Same as above with ``node_id=3``, ``node_name='pg3'``, ``host=<PG3_IP>``.
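+
+The three files differ only in ``node_id``, ``node_name``, and the ``host`` in
+``conninfo``, so you can generate them instead of editing each one by hand.
+The sketch below is minimal. ``repmgr_conf`` is a hypothetical helper that
+prints the pg1 file above with those three values substituted.
+
+.. code-block:: bash
+
+   # Hypothetical helper: print /etc/repmgr.conf for one node.
+   # Arguments: node ID, node name, node IP.
+   repmgr_conf() {
+     local id="$1" name="$2" ip="$3"
+     cat <<EOF
+   node_id=${id}
+   node_name='${name}'
+   conninfo='host=${ip} user=repmgr dbname=repmgr connect_timeout=2'
+   data_directory='/var/lib/postgresql/17/main'
+   use_replication_slots=yes
+   monitoring_history=yes
+   log_level=INFO
+   pg_bindir='/usr/lib/postgresql/17/bin'
+   service_start_command='sudo /usr/bin/pg_ctlcluster 17 main start'
+   service_stop_command='sudo /usr/bin/pg_ctlcluster 17 main stop'
+   service_restart_command='sudo /usr/bin/pg_ctlcluster 17 main restart'
+   service_reload_command='sudo /usr/bin/pg_ctlcluster 17 main reload'
+   service_promote_command='sudo /usr/bin/pg_ctlcluster 17 main promote'
+   failover=automatic
+   promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
+   follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
+   reconnect_attempts=3
+   reconnect_interval=5
+   monitor_interval_secs=2
+   EOF
+   }
+
+   # Usage on pg2 (substitute the IP from your worksheet):
+   #   repmgr_conf 2 pg2 <PG2_IP> | sudo tee /etc/repmgr.conf > /dev/null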
+
+**Step 3.3 — Register primary (pg1 only)**
+
+.. code-block:: bash
+
+   sudo -u postgres repmgr -f /etc/repmgr.conf primary register
+
+**Step 3.4 — Clone standbys (pg2 and pg3)**
+
+Run on **pg2**, then **pg3**:
+
+.. code-block:: bash
+
+   sudo systemctl stop postgresql
+   # --force overwrites the data directory created by the package install
+   sudo -u postgres repmgr -h <PG1_IP> -U repmgr -d repmgr \
+       -f /etc/repmgr.conf standby clone --force
+   sudo systemctl start postgresql
+   sudo -u postgres repmgr -f /etc/repmgr.conf standby register
+
+**Step 3.5 — Start repmgrd (all nodes)**
+
+Create ``/etc/systemd/system/repmgrd.service``:
+
+.. code-block:: ini
+
+   [Unit]
+   Description=repmgr daemon
+   After=postgresql.service
+   Requires=postgresql.service
+
+   [Service]
+   User=postgres
+   ExecStart=/usr/lib/postgresql/17/bin/repmgrd -f /etc/repmgr.conf --no-daemonize
+   Restart=on-failure
+
+   [Install]
+   WantedBy=multi-user.target
+
+.. code-block:: bash
+
+   sudo systemctl daemon-reload
+   sudo systemctl enable repmgrd
+   sudo systemctl start repmgrd
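+
+repmgrd runs as the ``postgres`` user. The ``service_*_command`` entries in
+``/etc/repmgr.conf`` call ``pg_ctlcluster`` through ``sudo``, so promotion and
+rejoin fail non-interactively unless ``postgres`` can run that command without
+a password. The sketch below is a minimal sudoers drop-in. It assumes you want
+to allow only ``/usr/bin/pg_ctlcluster``. Both helper names are hypothetical.
+Apply it on every node before you rely on automatic failover.
+
+.. code-block:: bash
+
+   # Hypothetical helper: print a sudoers rule that lets USER (default postgres)
+   # run pg_ctlcluster as root without a password.
+   sudoers_rule() {
+     local user="${1:-postgres}"
+     printf '%s ALL=(root) NOPASSWD: /usr/bin/pg_ctlcluster\n' "$user"
+   }
+
+   # Hypothetical installer: validate with visudo before installing, because a
+   # syntax error under /etc/sudoers.d can break sudo entirely.
+   install_sudoers_rule() {
+     local tmp
+     tmp=$(mktemp)
+     sudoers_rule > "$tmp"
+     visudo -cf "$tmp" && install -m 0440 "$tmp" /etc/sudoers.d/repmgr
+     rm -f "$tmp"
+   }
+
+   # Usage, as root on every node:
+   #   install_sudoers_rule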
+
+✅ **Phase 3 checkpoint** — run on any node:
+
+.. code-block:: bash
+
+   sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
+
+**Pass:** Output shows all three nodes — pg1 as ``* running`` (primary), pg2 and
+pg3 as ``running`` (standby). On pg1, the following query returns 2 rows:
+
+.. code-block:: bash
+
+   sudo -u postgres psql -c "SELECT client_addr, state FROM pg_stat_replication;"
+
+**Fail:** A standby shown as ``- failed``, ``? unreachable``, or ``! running`` means replication did not establish.
+Check ``journalctl -u postgresql`` on the failed standby. Common cause: firewall
+blocking port 5432 between nodes.
+
 Day-2 operations
 ----------------
 

From 30a6955f99979c537b0e9f6540f0745356f3bc49 Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 09:30:45 -0400
Subject: [PATCH 07/14] =?UTF-8?q?docs:=20postgres-ha-cluster=20=E2=80=94?=
 =?UTF-8?q?=20setup=20Phase=204=20(HAProxy,=20pgchk,=20Keepalived)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../scale/postgres-ha-cluster.rst             | 136 ++++++++++++++++++
 1 file changed, 136 insertions(+)

diff --git a/source/administration-guide/scale/postgres-ha-cluster.rst b/source/administration-guide/scale/postgres-ha-cluster.rst
index 1128544ce9c..8b54647fea8 100644
--- a/source/administration-guide/scale/postgres-ha-cluster.rst
+++ b/source/administration-guide/scale/postgres-ha-cluster.rst
@@ -425,6 +425,142 @@ pg3 as ``running`` (standby). On pg1, the following query returns 2 rows:
 Check ``journalctl -u postgresql`` on the failed standby. Common cause: firewall
 blocking port 5432 between nodes.
 
+Phase 4: HAProxy, health check, and VIP (all nodes)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Run all steps in Phase 4 on **pg1, pg2, and pg3**.
+
+**Step 4.1 — Install HAProxy**
+
+.. code-block:: bash
+
+   sudo apt install -y haproxy
+
+**Step 4.2 — Configure HAProxy**
+
+Replace ``/etc/haproxy/haproxy.cfg``:
+
+.. code-block:: text
+
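+   global
+       log /dev/log local0
+       maxconn 4000
+
+   defaults
+       log global
+       mode tcp
+       timeout connect 5s
+       # Long client/server timeouts so idle pooled connections from Mattermost
+       # are not cut off after 30 seconds of inactivity
+       timeout client 30m
+       timeout server 30m
+
+   frontend pg_write
+       bind *:5000
+       default_backend pg_primary
+
+   frontend pg_read
+       bind *:5001
+       default_backend pg_replicas
+
+   backend pg_primary
+       # HTTP check against pgchk.py: only the node answering 200 (the primary)
+       # is marked up, wherever repmgr has promoted it
+       option httpchk GET /
+       http-check expect status 200
+       default-server inter 2s fall 3 rise 2 on-marked-down shutdown-sessions
+       server pg1 <PG1_IP>:5432 check port 8008
+       server pg2 <PG2_IP>:5432 check port 8008
+       server pg3 <PG3_IP>:5432 check port 8008
+
+   backend pg_replicas
+       balance roundrobin
+       # TCP check on 5432: any node whose PostgreSQL accepts connections can
+       # serve reads (after a failover this can include the new primary);
+       # pg1 is used only if both standbys are down
+       default-server inter 2s fall 3 rise 2
+       server pg2 <PG2_IP>:5432 check
+       server pg3 <PG3_IP>:5432 check
+       server pg1 <PG1_IP>:5432 check backup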
+
+**Step 4.3 — Deploy pgchk.py**
+
+``pgchk.py`` is a lightweight HTTP server that returns ``200 OK`` when the local
+node is the primary and ``503`` otherwise. HAProxy queries port 8008 on each
+node to determine where to route connections.
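+
+You can approximate the same decision from the shell, which helps when a
+node's health-check answer looks wrong. The sketch below is not pgchk.py. It
+assumes you run it as root on a database node, where the ``postgres`` user can
+connect over the local socket. It maps ``pg_is_in_recovery()`` to the same
+200/503 convention described above. The function names are hypothetical.
+
+.. code-block:: bash
+
+   # Map pg_is_in_recovery() output to the status pgchk.py is described as
+   # returning: 200 for the primary, 503 for anything else.
+   status_for_recovery() {
+     case "$1" in
+       f) echo 200 ;;   # not in recovery: this node is the primary
+       *) echo 503 ;;   # standby, or PostgreSQL did not answer
+     esac
+   }
+
+   # Query the local instance and print the status it should report.
+   local_pg_status() {
+     local in_recovery
+     in_recovery=$(sudo -u postgres psql -tAc "SELECT pg_is_in_recovery();" 2>/dev/null)
+     status_for_recovery "$in_recovery"
+   }
+
+   # Usage, once pgchk.py is running: compare the two answers on the same node.
+   #   local_pg_status
+   #   curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8008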
+
+Copy ``pgchk.py`` from the
+`ha-postgres-reprmgr-haproxy repository <https://github.com/sadohert/ha-postgres-reprmgr-haproxy>`__
+to ``/usr/local/bin/pgchk.py`` on each node and make it executable:
+
+.. code-block:: bash
+
+   sudo chmod +x /usr/local/bin/pgchk.py
+
+Create ``/etc/systemd/system/pgchk.service``:
+
+.. code-block:: ini
+
+   [Unit]
+   Description=PostgreSQL Health Check for HAProxy
+   After=postgresql.service
+
+   [Service]
+   ExecStart=/usr/bin/python3 /usr/local/bin/pgchk.py --port 8008
+   Restart=always
+
+   [Install]
+   WantedBy=multi-user.target
+
+.. code-block:: bash
+
+   sudo systemctl daemon-reload
+   sudo systemctl enable pgchk
+   sudo systemctl start pgchk
+   sudo systemctl enable haproxy
+   sudo systemctl start haproxy
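+
+Before you configure the VIP, confirm that exactly one node reports itself as
+primary on port 8008. The sketch below is minimal. ``sweep_health`` and
+``expect_single_primary`` are hypothetical helpers. A status of ``000`` means
+the node did not answer over HTTP within two seconds.
+
+.. code-block:: bash
+
+   # Hypothetical helper: query pgchk.py on each node; prints "IP CODE" lines.
+   sweep_health() {
+     local ip
+     for ip in "$@"; do
+       printf '%s %s\n' "$ip" \
+         "$(curl -s -o /dev/null -m 2 -w '%{http_code}' "http://${ip}:8008")"
+     done
+   }
+
+   # Reads "IP CODE" lines on stdin; succeeds only if exactly one code is 200.
+   expect_single_primary() {
+     local count
+     count=$(awk '$2 == 200' | wc -l)
+     [ "$count" -eq 1 ]
+   }
+
+   # Usage:
+   #   sweep_health <PG1_IP> <PG2_IP> <PG3_IP> | tee /dev/stderr | expect_single_primary \
+   #     && echo "OK: one primary" || echo "PROBLEM: expected exactly one HTTP 200"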
+
+**Step 4.4 — Install and configure Keepalived**
+
+.. code-block:: bash
+
+   sudo apt install -y keepalived
+
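+Create ``/etc/keepalived/keepalived.conf``. On each node, set ``priority``
+(pg1 ``101``, pg2 ``100``, pg3 ``99``). Set ``interface`` to the node's network
+interface; Ubuntu 24.04 often names it ``ens*`` or ``enp*`` rather than
+``eth0``, so check with ``ip -br link``. Set ``virtual_ipaddress`` to your VIP.
+The ``vrrp_script`` block queries pgchk.py on the local node. A node whose check
+fails enters the FAULT state and releases the VIP, so the VIP stays on the node
+that pgchk.py reports as primary:
+
+.. code-block:: text
+
+   vrrp_script chk_pg_primary {
+       # Succeeds only when pgchk.py on this node returns HTTP 200 (primary)
+       script "/usr/bin/curl -sf -o /dev/null http://127.0.0.1:8008"
+       interval 2
+       fall 2
+       rise 2
+   }
+
+   vrrp_instance VI_1 {
+       state BACKUP
+       interface eth0
+       virtual_router_id 51
+       priority 101
+       advert_int 1
+       nopreempt
+       virtual_ipaddress {
+           <CLUSTER_VIP>/24
+       }
+       track_script {
+           chk_pg_primary
+       }
+   }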
+
+.. code-block:: bash
+
+   sudo systemctl enable keepalived
+   sudo systemctl start keepalived
+
+✅ **Phase 4 checkpoint** — run on any node:
+
+.. code-block:: bash
+
+   # Run this line on pg1: the VIP is listed only on the node that holds it
+   ip addr show | grep <CLUSTER_VIP>
+
+   # Port 5000 should connect to primary
+   psql -h <CLUSTER_VIP> -p 5000 -U repmgr -d repmgr \
+       -c "SELECT inet_server_addr(), pg_is_in_recovery();"
+
+   # Port 5001 should connect to a standby
+   psql -h <CLUSTER_VIP> -p 5001 -U repmgr -d repmgr \
+       -c "SELECT inet_server_addr(), pg_is_in_recovery();"
+
+**Pass:** VIP visible on pg1. Port 5000 returns ``pg_is_in_recovery = f`` (primary).
+Port 5001 returns ``pg_is_in_recovery = t`` (standby).
+
+**Fail:** If the VIP is not on pg1, check ``journalctl -u keepalived``. If HAProxy
+is not routing correctly, check ``journalctl -u haproxy`` and verify pgchk.py
+is responding: ``curl http://<PG1_IP>:8008`` should return HTTP 200.
+
 Day-2 operations
 ----------------
 

From 63688bbaf4c03b7246a760e9f09375256becd9f6 Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 09:31:40 -0400
Subject: [PATCH 08/14] =?UTF-8?q?docs:=20postgres-ha-cluster=20=E2=80=94?=
 =?UTF-8?q?=20setup=20Phase=205=20(end-to-end=20validation)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../scale/postgres-ha-cluster.rst             | 64 +++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/source/administration-guide/scale/postgres-ha-cluster.rst b/source/administration-guide/scale/postgres-ha-cluster.rst
index 8b54647fea8..dcbeac209d1 100644
--- a/source/administration-guide/scale/postgres-ha-cluster.rst
+++ b/source/administration-guide/scale/postgres-ha-cluster.rst
@@ -561,6 +561,70 @@ Port 5001 returns ``pg_is_in_recovery = t`` (standby).
 is not routing correctly, check ``journalctl -u haproxy`` and verify pgchk.py
 is responding: ``curl http://<PG1_IP>:8008`` should return HTTP 200.
 
+Phase 5: End-to-end validation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Run this phase after all four previous phases pass on all nodes. This confirms
+the cluster behaves correctly under failure before you connect Mattermost.
+
+**Step 5.1 — Confirm healthy starting state**
+
+.. code-block:: bash
+
+   sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
+
+**Pass:** pg1 is ``* running`` (primary); pg2 and pg3 are ``running`` (standby).
+
+**Step 5.2 — Simulate primary failure**
+
+On **pg1**:
+
+.. code-block:: bash
+
+   sudo systemctl stop postgresql
+
+Wait 30 seconds, then on **pg2** or **pg3**:
+
+.. code-block:: bash
+
+   sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
+
+**Pass:** One of pg2 or pg3 is now ``* running`` (primary). pg1 shows as
+``- failed`` (unreachable, which is expected).
+
+**Step 5.3 — Verify HAProxy and VIP followed the new primary**
+
+.. code-block:: bash
+
+   psql -h <CLUSTER_VIP> -p 5000 -U repmgr -d repmgr \
+       -c "SELECT inet_server_addr(), pg_is_in_recovery();"
+
+**Pass:** Returns the IP of the newly promoted node with ``pg_is_in_recovery = f``.
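+
+Instead of the fixed 30-second wait in Step 5.2, you can poll the write port
+until it answers from a node that is not in recovery. The sketch below is
+minimal. ``wait_for_writable`` is a hypothetical helper. The 2-second poll
+interval and 60-second default timeout are arbitrary choices.
+
+.. code-block:: bash
+
+   # Hypothetical helper: poll HOST:PORT until PostgreSQL there reports it is
+   # not in recovery (a primary), or give up after TIMEOUT seconds.
+   wait_for_writable() {
+     local host="$1" port="${2:-5000}" timeout="${3:-60}" waited=0 result
+     while [ "$waited" -lt "$timeout" ]; do
+       result=$(psql -h "$host" -p "$port" -U repmgr -d repmgr -tAc \
+         "SELECT pg_is_in_recovery();" 2>/dev/null)
+       if [ "$result" = "f" ]; then
+         echo "writable after ${waited}s"
+         return 0
+       fi
+       sleep 2
+       waited=$((waited + 2))
+     done
+     echo "still not writable after ${timeout}s" >&2
+     return 1
+   }
+
+   # Usage, from any node after stopping PostgreSQL on pg1:
+   #   wait_for_writable <CLUSTER_VIP> 5000 60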
+
+**Step 5.4 — Recover the old primary as a standby**
+
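+On **pg1**, leave PostgreSQL stopped. ``repmgr node rejoin`` refuses to run
+against a running instance, and starting pg1 first would briefly give the
+cluster two primaries. Point ``-d`` at the newly promoted node. ``node rejoin``
+rewinds pg1 with ``pg_rewind`` if needed and starts it as a standby. On Ubuntu,
+``postgresql.conf`` and ``pg_hba.conf`` live in ``/etc/postgresql/17/main``,
+outside the data directory, so ``pg_rewind`` does not overwrite them.
+
+.. code-block:: bash
+
+   sudo -u postgres repmgr -f /etc/repmgr.conf node rejoin \
+       -d 'host=<NEW_PRIMARY_IP> user=repmgr dbname=repmgr' --force-rewind
+   # repmgrd stopped together with PostgreSQL in Step 5.2; start it again
+   sudo systemctl start repmgrd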
+
+Then on any node:
+
+.. code-block:: bash
+
+   sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
+
+**Pass:** All three nodes show ``running``; pg1 is now a standby.
+
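+.. note::
+
+   Your cluster is ready for Mattermost. If you kept ``trust`` authentication
+   from Phase 2, switch to ``scram-sha-256`` before going to production.
+   Point Mattermost's primary datasource (``SqlSettings.DataSource``) at the
+   VIP on port 5000. Optionally, add the VIP on port 5001 to
+   ``SqlSettings.DataSourceReplicas`` in ``config.json`` to send reads to the
+   standbys.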
+
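+Mattermost reads its primary connection from ``SqlSettings.DataSource`` and
+its read replicas from ``SqlSettings.DataSourceReplicas``. The sketch below is
+minimal and prints both settings for this cluster. It makes several
+assumptions:
+
+- You have already created a database named ``mattermost`` and a role named
+  ``mmuser`` on the primary.
+- You have allowed that role in ``pg_hba.conf`` from ``<SUBNET>/24``, because
+  HAProxy connects to PostgreSQL from the cluster nodes.
+- The password is already URL-encoded.
+
+``sql_settings`` is a hypothetical helper.
+
+.. code-block:: bash
+
+   # Hypothetical helper: print a SqlSettings fragment for config.json that
+   # sends writes to HAProxy port 5000 and reads to port 5001 on the VIP.
+   # Assumptions: database "mattermost", role "mmuser", URL-encoded password.
+   # sslmode=disable follows Mattermost's sample DSN; use require or
+   # verify-full if PostgreSQL has TLS configured.
+   sql_settings() {
+     local vip="$1" password="$2"
+     local base="postgres://mmuser:${password}@${vip}"
+     local opts="mattermost?sslmode=disable&connect_timeout=10"
+     cat <<EOF
+   "SqlSettings": {
+     "DriverName": "postgres",
+     "DataSource": "${base}:5000/${opts}",
+     "DataSourceReplicas": ["${base}:5001/${opts}"]
+   }
+   EOF
+   }
+
+   # Usage: print the fragment, then merge it into config.json.
+   #   sql_settings <CLUSTER_VIP> '<MMUSER_PASSWORD>'
+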
 Day-2 operations
 ----------------
 

From 738ceefbb01ca55e606d14cb1c525bc6d4637a08 Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 09:32:34 -0400
Subject: [PATCH 09/14] =?UTF-8?q?docs:=20postgres-ha-cluster=20=E2=80=94?=
 =?UTF-8?q?=20day-2=20operations=20section?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../scale/postgres-ha-cluster.rst             | 72 ++++++++++++++++++-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/source/administration-guide/scale/postgres-ha-cluster.rst b/source/administration-guide/scale/postgres-ha-cluster.rst
index dcbeac209d1..0072fd6fe11 100644
--- a/source/administration-guide/scale/postgres-ha-cluster.rst
+++ b/source/administration-guide/scale/postgres-ha-cluster.rst
@@ -628,7 +628,77 @@ Then on any node:
 Day-2 operations
 ----------------
 
-[stub]
+Check cluster status
+~~~~~~~~~~~~~~~~~~~~
+
+Run on any node:
+
+.. code-block:: bash
+
+   sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
+
+Expected healthy output shows one ``* running`` primary and two ``running`` standbys.
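+
+For scripted monitoring, you can count the running primaries in the
+``cluster show`` table. The sketch below is minimal. It assumes the default
+pipe-separated layout, with Status as the fourth column.
+``count_running_primaries`` is a hypothetical helper.
+
+.. code-block:: bash
+
+   # Hypothetical helper: read `repmgr cluster show` output on stdin and print
+   # how many rows have the Status "* running" (a running primary).
+   count_running_primaries() {
+     awk -F'|' '
+       { status = $4; gsub(/^[ \t]+|[ \t]+$/, "", status) }
+       status == "* running" { n++ }
+       END { print n + 0 }
+     '
+   }
+
+   # Usage: warn unless exactly one primary is running.
+   #   n=$(sudo -u postgres repmgr -f /etc/repmgr.conf cluster show 2>/dev/null | count_running_primaries)
+   #   [ "$n" -eq 1 ] || echo "WARNING: $n running primaries"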
+
+Check replication lag
+~~~~~~~~~~~~~~~~~~~~~
+
+Run on the primary:
+
+.. code-block:: bash
+
+   sudo -u postgres psql -c "
+   SELECT client_addr, application_name, state,
+          pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS lag
+   FROM pg_stat_replication;"
+
+Normal lag is under 1 MB during steady state. Lag growing continuously
+indicates a replication problem — check network connectivity and standby
+PostgreSQL logs.
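+
+To alert on lag rather than reading it by eye, query the raw byte count and
+compare it to a threshold. The sketch below is minimal. The default threshold
+of 1048576 bytes (1 MB) mirrors the steady-state figure above. The helper
+names are hypothetical.
+
+.. code-block:: bash
+
+   # Hypothetical helper: succeed if a lag value in bytes is above THRESHOLD.
+   lag_exceeds() {
+     local bytes="$1" threshold="${2:-1048576}"   # default: 1 MB
+     [ -z "$bytes" ] && return 0                  # NULL replay_lsn: treat unknown lag as a problem
+     [ "${bytes%.*}" -gt "$threshold" ]           # drop any fractional part first
+   }
+
+   # Print one "standby_name lag_bytes" line per connected standby (run on the primary).
+   replication_lag_bytes() {
+     sudo -u postgres psql -tA -F ' ' -c \
+       "SELECT application_name, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)
+          FROM pg_stat_replication;"
+   }
+
+   # Usage, on the primary:
+   #   replication_lag_bytes | while read -r name bytes; do
+   #     lag_exceeds "$bytes" && echo "WARNING: $name is ${bytes:-unknown} bytes behind"
+   #   done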
+
+Controlled switchover (planned maintenance)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
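+To move the primary role to a standby with zero data loss:
+
+.. code-block:: bash
+
+   # Run on the TARGET standby (e.g. pg2). Check prerequisites first:
+   sudo -u postgres repmgr -f /etc/repmgr.conf standby switchover --dry-run
+   sudo -u postgres repmgr -f /etc/repmgr.conf standby switchover --siblings-follow
+
+``standby switchover`` uses SSH to shut down the current primary, so the
+``postgres`` user needs passwordless SSH access between the nodes.
+``--siblings-follow`` makes the remaining standby follow the new primary.
+repmgr demotes the old primary and promotes this node. The VIP and HAProxy
+follow automatically.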
+
+Add a standby node
+~~~~~~~~~~~~~~~~~~
+
+1. Provision a new server and complete Phases 1–2 of the setup guide.
+2. Create ``/etc/repmgr.conf`` with the next available ``node_id``.
+3. On the new node:
+
+   .. code-block:: bash
+
+      sudo systemctl stop postgresql
+      sudo -u postgres repmgr -h <PRIMARY_IP> -U repmgr -d repmgr \
+          -f /etc/repmgr.conf standby clone --force
+      sudo systemctl start postgresql
+      sudo -u postgres repmgr -f /etc/repmgr.conf standby register
+
+4. Add the new node to ``/etc/haproxy/haproxy.cfg`` on all existing nodes and
+   reload HAProxy: ``sudo systemctl reload haproxy``.
+
+Rejoin a failed node
+~~~~~~~~~~~~~~~~~~~~
+
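+A standby that was down only briefly normally resumes streaming on its own
+once PostgreSQL starts. Use ``node rejoin`` when it does not, or when the node
+has diverged from the current primary. ``node rejoin`` runs only against a
+stopped instance and needs a connection string for the current primary:
+
+.. code-block:: bash
+
+   sudo systemctl stop postgresql
+   sudo -u postgres repmgr -f /etc/repmgr.conf node rejoin \
+       -d 'host=<PRIMARY_IP> user=repmgr dbname=repmgr' --force-rewind
+   sudo systemctl start repmgrd
+
+Use the same procedure to return a former primary to the cluster after
+automatic failover has promoted a new one. The old primary rejoins as a
+standby.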
 
 Troubleshooting
 ---------------

From 9a95ced6ae461aec94ae2478785f2721a3776b9c Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 09:33:36 -0400
Subject: [PATCH 10/14] =?UTF-8?q?docs:=20postgres-ha-cluster=20=E2=80=94?=
 =?UTF-8?q?=20troubleshooting=20section?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../scale/postgres-ha-cluster.rst             | 90 ++++++++++++++++++-
 1 file changed, 89 insertions(+), 1 deletion(-)

diff --git a/source/administration-guide/scale/postgres-ha-cluster.rst b/source/administration-guide/scale/postgres-ha-cluster.rst
index 0072fd6fe11..12abd582389 100644
--- a/source/administration-guide/scale/postgres-ha-cluster.rst
+++ b/source/administration-guide/scale/postgres-ha-cluster.rst
@@ -703,4 +703,92 @@ standby.
 Troubleshooting
 ---------------
 
-[stub]
+repmgrd is not starting
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Symptom:** ``systemctl status repmgrd`` shows ``failed`` or ``activating``.
+
+**Likely cause:** PostgreSQL has not fully started yet, or the repmgr database
+is not accessible.
+
+**Resolution:**
+
+.. code-block:: bash
+
+   # Verify PostgreSQL is running first
+   sudo systemctl status postgresql
+
+   # Check repmgrd logs
+   journalctl -u repmgrd -n 50
+
+   # Test repmgr connection manually
+   sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
+
+Standby not replicating
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Symptom:** ``repmgr cluster show`` shows a standby as ``! running``, or
+``pg_stat_replication`` on the primary shows fewer than expected rows.
+
+**Likely cause:** Network connectivity issue on port 5432, or ``pg_hba.conf``
+not permitting the replication connection.
+
+**Resolution:**
+
+.. code-block:: bash
+
+   # From the standby, test connectivity to the primary
+   pg_isready -h <PRIMARY_IP> -p 5432 -U repmgr
+
+   # Check PostgreSQL logs on the standby
+   sudo -u postgres tail -50 /var/log/postgresql/postgresql-17-main.log
+
+VIP not moving after failover
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Symptom:** After a primary failure and successful repmgr promotion, the VIP
+remains on the failed node or does not appear on the new primary.
+
+**Likely cause:** Keepalived is not running, or VRRP traffic is blocked by a
+firewall.
+
+**Resolution:**
+
+.. code-block:: bash
+
+   sudo systemctl status keepalived
+   journalctl -u keepalived -n 50
+
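+   # Verify VRRP traffic is not blocked: check cloud security groups or
+   # firewall rules for IP protocol 112 (VRRP). Advertisements from the VIP
+   # holder should appear on the other nodes once per second (advert_int 1):
+   sudo tcpdump -ni <INTERFACE> 'ip proto 112'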
+
+HAProxy routing to wrong node
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Symptom:** Connections on port 5000 land on a standby (writes fail), or
+port 5001 routes to the primary.
+
+**Likely cause:** pgchk.py is not running or returning incorrect status.
+
+**Resolution:**
+
+.. code-block:: bash
+
+   # Check health check response on each node
+   curl -v http://<NODE_IP>:8008
+
+   # Primary should return HTTP 200; standbys should return HTTP 503
+   sudo systemctl status pgchk
+   journalctl -u pgchk -n 30
+
+Split-brain prevention
+~~~~~~~~~~~~~~~~~~~~~~
+
+repmgr's ``failover=automatic`` setting, combined with ``reconnect_attempts=3``
+and ``reconnect_interval=5``, makes repmgrd spend roughly 15 seconds (3 × 5 s)
+retrying the primary before it promotes a standby. This reduces the chance of a
+promotion during a transient network blip.
+
+If you suspect a split-brain scenario (two nodes both believing they are
+primary), **do not write to either node**. Check cluster status from a
+third node. Then fence the unintended primary by running
+``sudo -u postgres repmgr -f /etc/repmgr.conf node service --action=stop``
+on that node before recovering.
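+
+To see how many nodes currently report themselves as primary, query
+``pg_is_in_recovery()`` on each one. The following minimal sketch assumes the
+``repmgr`` user can connect to every node, for example through ``.pgpass``.
+Pass the node addresses on the command line. The result is only this
+observer's view: a node it cannot reach may still believe it is primary.
+
+.. code-block:: python
+
+   #!/usr/bin/env python3
+   """Flag nodes that report pg_is_in_recovery() = false (sketch)."""
+   import subprocess
+   import sys
+
+
+   def recovery_state(host, timeout=10):
+       """Return 't' (standby), 'f' (primary), or None if the query fails."""
+       conninfo = f"host={host} user=repmgr dbname=repmgr connect_timeout=5"
+       try:
+           result = subprocess.run(
+               ["psql", conninfo, "-At", "-c", "SELECT pg_is_in_recovery();"],
+               capture_output=True, text=True, timeout=timeout)
+       except (OSError, subprocess.TimeoutExpired):
+           return None
+       return result.stdout.strip() if result.returncode == 0 else None
+
+
+   def find_primaries(states):
+       """Names of nodes whose pg_is_in_recovery() returned false."""
+       return sorted(name for name, state in states.items() if state == "f")
+
+
+   def check(hosts):
+       """Print each node's reported role and return the apparent primaries."""
+       states = {host: recovery_state(host) for host in hosts}
+       for host, state in states.items():
+           label = {"t": "standby", "f": "primary"}.get(state, "unreachable")
+           print(f"{host}: {label}")
+       primaries = find_primaries(states)
+       if len(primaries) > 1:
+           print("POSSIBLE SPLIT-BRAIN, multiple primaries:", ", ".join(primaries))
+       return primaries
+
+
+   if __name__ == "__main__":
+       check(sys.argv[1:])
+
+More than one ``primary`` line means you should stop writes and fence the
+unintended primary as described above.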

From aa66119d84b46aba6027c53566b1c4017e3e7886 Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 09:34:18 -0400
Subject: [PATCH 11/14] docs: add postgres-ha-cluster to scaling-for-enterprise
 toctree

---
 .../scale/scaling-for-enterprise.rst                  | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/source/administration-guide/scale/scaling-for-enterprise.rst b/source/administration-guide/scale/scaling-for-enterprise.rst
index 18779aee9e2..5e4c0521dde 100644
--- a/source/administration-guide/scale/scaling-for-enterprise.rst
+++ b/source/administration-guide/scale/scaling-for-enterprise.rst
@@ -29,6 +29,14 @@ High availability
 
 A :doc:`high availability cluster-based deployment </administration-guide/scale/high-availability-cluster-based-deployment>` enables a Mattermost system to maintain service during outages and hardware failures through the use of redundant infrastructure.
 
+PostgreSQL high availability cluster
+--------------------------------------
+
+For self-hosted deployments on bare metal or virtual machines, a
+:doc:`PostgreSQL HA cluster </administration-guide/scale/postgres-ha-cluster>`
+provides automatic database failover using repmgr, HAProxy, and Keepalived,
+without requiring a managed database service.
+
 Redis
 -----
 
@@ -45,7 +53,8 @@ Available reference architectures
     Deployment architecture at scale </administration-guide/scale/server-architecture>
     Backing storage benchmarks </administration-guide/scale/backing-storage-benchmarks>
     Enterprise search </administration-guide/scale/enterprise-search>
-    High availability </administration-guide/scale/high-availability-cluster-based-deployment> 
+    High availability </administration-guide/scale/high-availability-cluster-based-deployment>
+    PostgreSQL HA cluster </administration-guide/scale/postgres-ha-cluster>
     Redis </administration-guide/scale/redis>
     Scale up to 200 users </administration-guide/scale/scale-to-200-users>
     Scale up to 2000 users </administration-guide/scale/scale-to-2000-users>

From 1e7d0606246ba725bc4e1c68643ce34ec8ba714b Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 09:34:24 -0400
Subject: [PATCH 12/14] docs: cross-link to postgres-ha-cluster from HA cluster
 deployment page

---
 .../scale/high-availability-cluster-based-deployment.rst      | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/source/administration-guide/scale/high-availability-cluster-based-deployment.rst b/source/administration-guide/scale/high-availability-cluster-based-deployment.rst
index cfeb5b263c1..e0a32dd95f0 100644
--- a/source/administration-guide/scale/high-availability-cluster-based-deployment.rst
+++ b/source/administration-guide/scale/high-availability-cluster-based-deployment.rst
@@ -30,7 +30,9 @@ Moreover, search replicas are also supported to handle search queries.
 Deployment guide
 ----------------
 
-Set up and maintain a high availability cluster-based deployment on your Mattermost servers. This document doesn't cover the configuration of databases in terms of disaster recovery, however, you can refer to the `frequently asked questions (FAQ)`_ section for our recommendations.
+Set up and maintain a high availability cluster-based deployment on your Mattermost servers. This document doesn't cover the configuration of databases in terms of
+disaster recovery. For self-hosted deployments requiring database-level HA,
+see :doc:`PostgreSQL high availability cluster </administration-guide/scale/postgres-ha-cluster>`.
 
 To ensure your instance and configuration are compatible with a high availability cluster-based deployment, please review the `configuration and compatibility`_ section.
 

From 7a6ae8cc294ede8ea0558c6aad5903c1a6020242 Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 09:51:28 -0400
Subject: [PATCH 13/14] docs: address CodeRabbit review feedback

- pg_hba.conf: default to scram-sha-256; trust moved to lab-only warning
- pgchk.py: embed full script inline instead of linking to external repo
- Keepalived: add ip link step to identify interface before hardcoding
- repmgr createuser: add note explaining why superuser is required
- high-availability-cluster-based-deployment.rst: clarify HA vs DR scope

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 ...-availability-cluster-based-deployment.rst |   4 +-
 .../scale/postgres-ha-cluster.rst             | 134 ++++++++++++++++--
 2 files changed, 123 insertions(+), 15 deletions(-)

diff --git a/source/administration-guide/scale/high-availability-cluster-based-deployment.rst b/source/administration-guide/scale/high-availability-cluster-based-deployment.rst
index e0a32dd95f0..6ca31b96911 100644
--- a/source/administration-guide/scale/high-availability-cluster-based-deployment.rst
+++ b/source/administration-guide/scale/high-availability-cluster-based-deployment.rst
@@ -30,8 +30,8 @@ Moreover, search replicas are also supported to handle search queries.
 Deployment guide
 ----------------
 
-Set up and maintain a high availability cluster-based deployment on your Mattermost servers. This document doesn't cover the configuration of databases in terms of
-disaster recovery. For self-hosted deployments requiring database-level HA,
+Set up and maintain a high availability cluster-based deployment on your Mattermost servers. Database-level HA and disaster recovery design are out of scope for this document.
+For self-hosted deployments requiring database-level HA,
 see :doc:`PostgreSQL high availability cluster </administration-guide/scale/postgres-ha-cluster>`.
 
 To ensure your instance and configuration are compatible with a high availability cluster-based deployment, please review the `configuration and compatibility`_ section.
diff --git a/source/administration-guide/scale/postgres-ha-cluster.rst b/source/administration-guide/scale/postgres-ha-cluster.rst
index 12abd582389..f215cf668ce 100644
--- a/source/administration-guide/scale/postgres-ha-cluster.rst
+++ b/source/administration-guide/scale/postgres-ha-cluster.rst
@@ -293,16 +293,26 @@ Append to ``/etc/postgresql/17/main/pg_hba.conf``:
 .. code-block:: text
 
    # repmgr access
-   host    repmgr      repmgr      <SUBNET>/24     trust
-   host    repmgr      repmgr      127.0.0.1/32    trust
+   host    repmgr      repmgr      <SUBNET>/24     scram-sha-256
+   host    repmgr      repmgr      127.0.0.1/32    scram-sha-256
    # Replication connections
-   host    replication repmgr      <SUBNET>/24     trust
-   host    replication repmgr      127.0.0.1/32    trust
+   host    replication repmgr      <SUBNET>/24     scram-sha-256
+   host    replication repmgr      127.0.0.1/32    scram-sha-256
 
-.. note::
+Create a ``.pgpass`` file on each node so repmgr can authenticate without
+an interactive password prompt:
+
+.. code-block:: bash
 
-   For production, replace ``trust`` with ``scram-sha-256`` and configure
-   ``.pgpass`` files on each node.
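+   # "*" in the database field also matches replication connections
+   echo '*:*:*:repmgr:<YOUR_REPMGR_PASSWORD>' | sudo -u postgres tee -a /var/lib/postgresql/.pgpass > /dev/null
+   sudo -u postgres chmod 600 /var/lib/postgresql/.pgpass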
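+
+libpq uses the first ``.pgpass`` line whose host, port, database, and user
+fields all match the connection, where ``*`` matches anything. The PostgreSQL
+documentation notes that a database field of ``replication`` matches
+streaming replication connections. A line that names only the ``repmgr``
+database therefore does not cover them.
+
+The following sketch is a simplified model of that lookup, useful for
+checking an entry before relying on it. It ignores backslash escapes and the
+special handling of ``localhost``.
+
+.. code-block:: python
+
+   """Simplified model of libpq's .pgpass lookup (sketch, not libpq itself)."""
+
+
+   def parse_pgpass(text):
+       """Return (host, port, database, user, password) tuples, skipping comments."""
+       entries = []
+       for line in text.splitlines():
+           if not line.strip() or line.lstrip().startswith("#"):
+               continue
+           fields = line.split(":", 4)  # Simplification: no \: escape handling
+           if len(fields) == 5:
+               entries.append(tuple(fields))
+       return entries
+
+
+   def lookup_password(entries, host, port, database, user):
+       """Return the password from the first matching entry, else None."""
+       wanted = (host, str(port), database, user)
+       for *keys, password in entries:
+           if all(key in ("*", value) for key, value in zip(keys, wanted)):
+               return password
+       return None
+
+
+   entries = parse_pgpass("*:*:repmgr:repmgr:secret\n")
+   print(lookup_password(entries, "10.0.0.11", 5432, "repmgr", "repmgr"))       # secret
+   print(lookup_password(entries, "10.0.0.11", 5432, "replication", "repmgr"))  # None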
+
+.. warning::
+
+   **Lab and testing only:** If you want to skip password authentication for
+   initial setup, you can temporarily use ``trust`` instead of
+   ``scram-sha-256``. Do not use ``trust`` in production — it allows
+   passwordless connections from any host on the subnet.
 
 **Step 2.3 — Restart PostgreSQL**
 
@@ -326,11 +336,21 @@ Phase 3: repmgr configuration and cluster initialisation
 
 **Step 3.1 — Create repmgr user and database (pg1 only)**
 
+.. note::
+
+   repmgr requires superuser privileges for certain cluster operations,
+   including creating the ``repmgr`` extension and running ``pg_rewind`` (used
+   to resync a failed primary as a standby). If your security policy prohibits
+   superuser accounts, refer to the `repmgr documentation on permissions
+   <https://repmgr.org/docs/repmgr.html#CONFIGURATION-PERMISSIONS>`__ for the
+   minimum required grants.
+
 .. code-block:: bash
 
    sudo -u postgres createuser --superuser repmgr
    sudo -u postgres createdb --owner=repmgr repmgr
    sudo -u postgres psql -c "ALTER USER repmgr SET search_path TO repmgr, public;"
+   sudo -u postgres psql -c "ALTER USER repmgr PASSWORD '<YOUR_REPMGR_PASSWORD>';"
 
 **Step 3.2 — Create /etc/repmgr.conf (all nodes)**
 
@@ -480,9 +500,87 @@ Replace ``/etc/haproxy/haproxy.cfg``:
 node is the primary and ``503`` otherwise. HAProxy queries port 8008 on each
 node to determine where to route connections.
 
-Copy ``pgchk.py`` from the
-`ha-postgres-reprmgr-haproxy repository <https://github.com/sadohert/ha-postgres-reprmgr-haproxy>`__
-to ``/usr/local/bin/pgchk.py`` on each node and make it executable:
+On each node, create ``/usr/local/bin/pgchk.py`` with the following content:
+
+.. code-block:: python
+
+   #!/usr/bin/env python3
+   """HTTP health check reporting whether local PostgreSQL is primary or standby.
+
+   HAProxy polls this service: /master (or /) returns 200 only on the primary
+   and /replica returns 200 only on a standby; otherwise they return 503.
+   """
+   import argparse
+   import subprocess
+   from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
+
+   DEFAULT_PORT = 8008
+   PG_USER = "postgres"
+   PG_DB = "postgres"
+   PG_PORT = "5432"
+
+
+   def is_in_recovery():
+       """Return True on a standby, False on the primary, None if unreachable."""
+       cmd = ["psql", "-U", PG_USER, "-d", PG_DB, "-p", PG_PORT,
+              "-t", "-A", "-c", "SELECT pg_is_in_recovery();"]
+       try:
+           result = subprocess.run(cmd, capture_output=True, text=True, timeout=5)
+       except (OSError, subprocess.TimeoutExpired):
+           return None
+       if result.returncode != 0:
+           return None
+       return {"t": True, "f": False}.get(result.stdout.strip())
+
+
+   class PostgresHealthCheckHandler(BaseHTTPRequestHandler):
+       def _respond(self, code, body):
+           self.send_response(code)
+           self.end_headers()
+           try:
+               self.wfile.write(body)
+           except (BrokenPipeError, ConnectionResetError):
+               pass  # HAProxy may close the connection before reading the body
+
+       def do_GET(self):
+           in_recovery = is_in_recovery()
+           if in_recovery is None:
+               self._respond(503, b"PostgreSQL Unreachable\n")
+           elif self.path in ("/", "/master"):
+               if in_recovery:
+                   self._respond(503, b"Service Unavailable - Not Primary\n")
+               else:
+                   self._respond(200, b"OK - Primary\n")
+           elif self.path == "/replica":
+               if in_recovery:
+                   self._respond(200, b"OK - Replica\n")
+               else:
+                   self._respond(503, b"Service Unavailable - Not Replica\n")
+           else:
+               self._respond(404, b"Not Found\n")
+
+       # HAProxy's "option httpchk" sends OPTIONS unless another method is
+       # configured, so answer OPTIONS the same way as GET.
+       do_OPTIONS = do_GET
+
+       def log_message(self, format, *args):
+           pass  # Suppress per-request access logging
+
+
+   def run(port=DEFAULT_PORT):
+       # Threaded server so a slow psql call cannot block other health checks
+       httpd = ThreadingHTTPServer(("", port), PostgresHealthCheckHandler)
+       print(f"Starting PostgreSQL Health Check on port {port}...", flush=True)
+       try:
+           httpd.serve_forever()
+       except KeyboardInterrupt:
+           pass
+       finally:
+           httpd.server_close()
+
+
+   if __name__ == "__main__":
+       parser = argparse.ArgumentParser(description="PostgreSQL Health Check for HAProxy")
+       parser.add_argument("--port", type=int, default=DEFAULT_PORT)
+       args = parser.parse_args()
+       run(port=args.port)
+
+Make the script executable:
 
 .. code-block:: bash
 
@@ -517,14 +615,24 @@ Create ``/etc/systemd/system/pgchk.service``:
 
    sudo apt install -y keepalived
 
-Create ``/etc/keepalived/keepalived.conf``. Set the ``priority`` field: pg1 gets
-``101``, pg2 gets ``100``, pg3 gets ``99``. Set ``virtual_ipaddress`` to your VIP:
+First, identify the name of the network interface that carries your node's IP address:
+
+.. code-block:: bash
+
+   ip -o -4 addr show | awk '{print $2, $4}'
+
+Note the interface name (e.g. ``ens3``, ``enp3s0``, ``eth0``). You will use it in
+the next step.
+
+Create ``/etc/keepalived/keepalived.conf``. Replace ``<INTERFACE>`` with the
+interface name from the previous step. Set ``priority``: pg1 gets ``101``,
+pg2 gets ``100``, pg3 gets ``99``. Set ``virtual_ipaddress`` to your VIP:
 
 .. code-block:: text
 
    vrrp_instance VI_1 {
        state BACKUP
-       interface eth0
+       interface <INTERFACE>
        virtual_router_id 51
        priority 101
        advert_int 1

From 1fb0f5e0dddf749a537ae9ec6bb79d36f74696bf Mon Sep 17 00:00:00 2001
From: Stu Doherty <stu.doherty@gmail.com>
Date: Mon, 27 Apr 2026 10:55:39 -0400
Subject: [PATCH 14/14] chore: trigger CodeRabbit re-review