diff --git a/source/administration-guide/scale/high-availability-cluster-based-deployment.rst b/source/administration-guide/scale/high-availability-cluster-based-deployment.rst
index 169b6be00fd..214506eaf06 100644
--- a/source/administration-guide/scale/high-availability-cluster-based-deployment.rst
+++ b/source/administration-guide/scale/high-availability-cluster-based-deployment.rst
@@ -30,7 +30,9 @@ Moreover, search replicas are also supported to handle search queries.
Deployment guide
----------------
-Set up and maintain a high availability cluster-based deployment on your Mattermost servers. This document doesn't cover the configuration of databases in terms of disaster recovery, however, you can refer to the `frequently asked questions (FAQ)`_ section for our recommendations.
+Set up and maintain a high availability cluster-based deployment on your Mattermost servers. Database-level HA and disaster recovery design are out of scope for this document.
+For self-hosted deployments requiring database-level HA,
+see :doc:`PostgreSQL high availability cluster `.
To ensure your instance and configuration are compatible with a high availability cluster-based deployment, please review the `configuration and compatibility`_ section.
diff --git a/source/administration-guide/scale/postgres-ha-cluster.rst b/source/administration-guide/scale/postgres-ha-cluster.rst
new file mode 100644
index 00000000000..f215cf668ce
--- /dev/null
+++ b/source/administration-guide/scale/postgres-ha-cluster.rst
@@ -0,0 +1,902 @@
+PostgreSQL high availability cluster
+=====================================
+
+:nosearch:
+
+This guide describes how to deploy a high availability PostgreSQL cluster for
+Mattermost using `repmgr `__ for replication management
+and automatic failover, `HAProxy `__ for connection
+routing, and `Keepalived `__ for Virtual IP (VIP)
+management.
+
+This is infrastructure-level HA that operates independently of your Mattermost
+edition. It is compatible with any self-hosted Mattermost deployment.
+
+.. note::
+
+ This guide has been validated on: **Ubuntu 24.04 LTS**, **PostgreSQL 17**,
+ **repmgr 5.5**, **HAProxy 2.8**, **Keepalived**.
+
+Architecture overview
+---------------------
+
+A PostgreSQL HA cluster for Mattermost consists of three nodes running in
+parallel. Each node runs the full stack: PostgreSQL, repmgr daemon (repmgrd),
+HAProxy, Keepalived, and a health-check service. A Virtual IP (VIP) floats
+across nodes and always points to the current primary.
+
+.. code-block:: text
+
+ VIP:
+ │
+ ┌───────────────┼───────────────┐
+ │ │ │
+ ┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
+ │ pg1 │ │ pg2 │ │ pg3 │
+ │ │ │ │ │ │
+ │ HAProxy │ │ HAProxy │ │ HAProxy │
+ │ Keepalived │ │ Keepalived │ │ Keepalived │
+ │ pgchk.py │ │ pgchk.py │ │ pgchk.py │
+ │ repmgrd │ │ repmgrd │ │ repmgrd │
+ ├─────────────┤ ├─────────────┤ ├─────────────┤
+ │ PostgreSQL │ │ PostgreSQL │ │ PostgreSQL │
+ │ PRIMARY │ │ STANDBY │ │ STANDBY │
+ └─────────────┘ └─────────────┘ └─────────────┘
+
+**Components:**
+
+.. list-table::
+ :widths: 20 10 70
+ :header-rows: 1
+
+ * - Component
+ - Version
+ - Role
+ * - PostgreSQL
+ - 17
+ - Primary database engine. Streaming replication with replication slots.
+ * - repmgr / repmgrd
+ - 5.5
+ - Replication manager. Monitors cluster health and automatically promotes
+ a standby when the primary fails.
+ * - HAProxy
+ - 2.8
+ - TCP load balancer. Routes write traffic to the primary and read traffic
+ to standbys via two ports.
+ * - Keepalived
+ - —
+ - Manages the VIP using VRRP. Moves the VIP to the new primary after
+ failover.
+ * - pgchk.py
+ - —
+ - HTTP health-check endpoint (port 8008). HAProxy queries this to
+ determine which node is the current primary.
+
+**HAProxy ports:**
+
+.. list-table::
+ :widths: 15 85
+ :header-rows: 1
+
+ * - Port
+ - Purpose
+ * - 5000
+ - Write traffic — routes to the current primary only
+ * - 5001
+ - Read traffic — load-balanced across all standbys
+
+**Sizing:** This architecture is appropriate for Mattermost deployments up to
+approximately 2,000 concurrent users. For larger deployments, see
+:doc:`Scaling for Enterprise `.
+
+Before you begin
+----------------
+
+Is this the right architecture for you?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. list-table::
+ :widths: 30 35 35
+ :header-rows: 1
+
+ * - Scenario
+ - Recommendation
+ - Why
+ * - Cloud-hosted on AWS/GCP/Azure
+ - Use managed RDS/Cloud SQL with Multi-AZ
+ - Managed failover, no infrastructure to operate
+ * - On-premises or private cloud, single site
+ - **This guide** — single-DC HA cluster
+ - Automatic failover within the datacenter, no cloud dependency
+ * - On-premises, two or more sites, DR required
+ - Single-DC HA (this guide) + Multi-DC DR guide (coming soon)
+ - Active/warm-standby across datacenters
+
+Requirements
+~~~~~~~~~~~~
+
+**Hardware (per node — minimum):**
+
+- Operating system: Ubuntu 24.04 LTS
+- CPU: 2 cores
+- RAM: 4 GB
+- Disk: 50 GB
+
+**You need 3 nodes** and one spare IP address on the same subnet for the VIP.
+
+**Network — ports that must be open between all three nodes:**
+
+.. list-table::
+ :widths: 15 85
+ :header-rows: 1
+
+ * - Port
+ - Purpose
+ * - 22
+ - SSH (administration)
+ * - 5432
+ - PostgreSQL (replication, repmgr)
+ * - 8008
+ - pgchk.py health check (HAProxy → database nodes)
+ * - VRRP (112)
+ - Keepalived VIP election between nodes
+
+**Ports that Mattermost application servers must reach:**
+
+.. list-table::
+ :widths: 15 85
+ :header-rows: 1
+
+ * - Port
+ - Purpose
+ * - 5000
+ - Write connections (primary)
+ * - 5001
+ - Read connections (standbys)
+
+**Software:** The following packages will be installed during setup. No
+pre-installation is required.
+
+- ``postgresql-17``
+- ``postgresql-17-repmgr``
+- ``haproxy``
+- ``keepalived``
+- ``python3`` (for pgchk.py)
+
+Node planning worksheet
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Complete this before starting. You will substitute these values throughout
+the guide.
+
+.. list-table::
+ :widths: 15 25 25 35
+ :header-rows: 1
+
+ * - Node
+ - Hostname
+ - IP address
+ - Initial role
+ * - 1
+ - pg1
+ - _______________
+ - Primary
+ * - 2
+ - pg2
+ - _______________
+ - Standby
+ * - 3
+ - pg3
+ - _______________
+ - Standby
+ * - VIP
+ - —
+ - _______________
+ - Floating (always points to primary)
+
+**Subnet:** ``_______________`` (e.g. ``10.0.1.0``)
+
+Time estimate
+~~~~~~~~~~~~~
+
+Allow **2–3 hours** for a first-time setup on pre-provisioned servers.
+
+Setup guide
+-----------
+
+.. note::
+
+ Throughout this guide, substitute the IP addresses and subnet you recorded
+ in the node planning worksheet above.
+
+.. warning::
+
+ Complete each phase in order. The checkpoint at the end of each phase must
+ pass before you proceed.
+
+Phase 1: Base installation (all nodes)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Run all steps in Phase 1 on **pg1, pg2, and pg3**.
+
+**Step 1.1 — Configure /etc/hosts**
+
+On each node, append to ``/etc/hosts``:
+
+.. code-block:: text
+
+ pg1
+ pg2
+ pg3
+
+Verify hostname resolution on each node:
+
+.. code-block:: bash
+
+ ping -c 1 pg1 && ping -c 1 pg2 && ping -c 1 pg3
+
+Expected: 3 successful pings.
+
+**Step 1.2 — Install PostgreSQL 17 and repmgr 5.5**
+
+.. code-block:: bash
+
+ sudo apt update
+ sudo apt install -y curl ca-certificates
+ sudo install -d /usr/share/postgresql-common/pgdg
+ sudo curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc \
+ --fail https://www.postgresql.org/media/keys/ACCC4CF8.asc
+ sudo sh -c 'echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] \
+ https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" \
+ > /etc/apt/sources.list.d/pgdg.list'
+ sudo apt update
+ sudo apt install -y postgresql-17 postgresql-17-repmgr
+
+✅ **Phase 1 checkpoint** — run on every node:
+
+.. code-block:: bash
+
+ sudo systemctl status postgresql | grep "active (running)"
+ /usr/lib/postgresql/17/bin/repmgr --version
+
+**Pass:** PostgreSQL shows ``active (running)``; repmgr prints ``repmgr 5.5.x``.
+
+**Fail:** If PostgreSQL did not start, check ``journalctl -u postgresql`` for errors.
+
+Phase 2: PostgreSQL configuration (all nodes)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Run all steps in Phase 2 on **pg1, pg2, and pg3**.
+
+**Step 2.1 — Configure postgresql.conf**
+
+Append to ``/etc/postgresql/17/main/postgresql.conf``:
+
+.. code-block:: ini
+
+ # Replication settings
+ listen_addresses = '*'
+ max_wal_senders = 10
+ max_replication_slots = 10
+ wal_level = replica
+ hot_standby = on
+ archive_mode = on
+ archive_command = '/bin/true'
+ shared_preload_libraries = 'repmgr'
+ wal_log_hints = on
+ wal_keep_size = 1024
+
+**Step 2.2 — Configure pg_hba.conf**
+
+Append to ``/etc/postgresql/17/main/pg_hba.conf``:
+
+.. code-block:: text
+
+ # repmgr access
+ host repmgr repmgr /24 scram-sha-256
+ host repmgr repmgr 127.0.0.1/32 scram-sha-256
+ # Replication connections
+ host replication repmgr /24 scram-sha-256
+ host replication repmgr 127.0.0.1/32 scram-sha-256
+
+Create a ``.pgpass`` file on each node so repmgr can authenticate without
+an interactive password prompt:
+
+.. code-block:: bash
+
+ echo "*:*:repmgr:repmgr:" >> ~/.pgpass
+ chmod 600 ~/.pgpass
+
+.. warning::
+
+ **Lab and testing only:** If you want to skip password authentication for
+ initial setup, you can temporarily use ``trust`` instead of
+ ``scram-sha-256``. Do not use ``trust`` in production — it allows
+ passwordless connections from any host on the subnet.
+
+**Step 2.3 — Restart PostgreSQL**
+
+.. code-block:: bash
+
+ sudo systemctl restart postgresql
+
+✅ **Phase 2 checkpoint** — run on every node:
+
+.. code-block:: bash
+
+ sudo -u postgres psql -c "SHOW wal_level;"
+ sudo -u postgres psql -c "SHOW shared_preload_libraries;"
+
+**Pass:** ``wal_level`` is ``replica``; ``shared_preload_libraries`` contains ``repmgr``.
+
+**Fail:** If PostgreSQL did not restart, check ``journalctl -u postgresql``.
+
+Phase 3: repmgr configuration and cluster initialisation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Step 3.1 — Create repmgr user and database (pg1 only)**
+
+.. note::
+
+ repmgr requires superuser privileges to perform certain cluster operations
+ including ``pg_rewind`` (used to resync a failed primary as a standby) and
+ event notifications. If your security policy prohibits superuser accounts,
+ refer to the `repmgr documentation on permissions
+ `__ for the
+ minimum required grants.
+
+.. code-block:: bash
+
+ sudo -u postgres createuser --superuser repmgr
+ sudo -u postgres createdb --owner=repmgr repmgr
+ sudo -u postgres psql -c "ALTER USER repmgr SET search_path TO repmgr, public;"
+ sudo -u postgres psql -c "ALTER USER repmgr PASSWORD '';"
+
+**Step 3.2 — Create /etc/repmgr.conf (all nodes)**
+
+Create ``/etc/repmgr.conf`` on each node. Adjust ``node_id``, ``node_name``,
+and ``host`` for each node:
+
+**pg1:**
+
+.. code-block:: ini
+
+ node_id=1
+ node_name='pg1'
+ conninfo='host= user=repmgr dbname=repmgr connect_timeout=2'
+ data_directory='/var/lib/postgresql/17/main'
+ use_replication_slots=yes
+ monitoring_history=yes
+ log_level=INFO
+ pg_bindir='/usr/lib/postgresql/17/bin'
+ service_start_command='sudo /usr/bin/pg_ctlcluster 17 main start'
+ service_stop_command='sudo /usr/bin/pg_ctlcluster 17 main stop'
+ service_restart_command='sudo /usr/bin/pg_ctlcluster 17 main restart'
+ service_reload_command='sudo /usr/bin/pg_ctlcluster 17 main reload'
+ service_promote_command='sudo /usr/bin/pg_ctlcluster 17 main promote'
+ failover=automatic
+ promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
+ follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
+ reconnect_attempts=3
+ reconnect_interval=5
+ monitor_interval_secs=2
+
+**pg2:** Same as above with ``node_id=2``, ``node_name='pg2'``, ``host=``.
+
+**pg3:** Same as above with ``node_id=3``, ``node_name='pg3'``, ``host=``.
+
+**Step 3.3 — Register primary (pg1 only)**
+
+.. code-block:: bash
+
+ sudo -u postgres repmgr -f /etc/repmgr.conf primary register
+
+**Step 3.4 — Clone standbys (pg2 and pg3)**
+
+Run on **pg2**, then **pg3**:
+
+.. code-block:: bash
+
+ sudo systemctl stop postgresql
+ sudo -u postgres repmgr -h -U repmgr -d repmgr \
+ -f /etc/repmgr.conf standby clone --delete-existing-pgdata
+ sudo systemctl start postgresql
+ sudo -u postgres repmgr -f /etc/repmgr.conf standby register
+
+**Step 3.5 — Start repmgrd (all nodes)**
+
+Create ``/etc/systemd/system/repmgrd.service``:
+
+.. code-block:: ini
+
+ [Unit]
+ Description=repmgr daemon
+ After=postgresql.service
+ Requires=postgresql.service
+
+ [Service]
+ User=postgres
+ ExecStart=/usr/lib/postgresql/17/bin/repmgrd -f /etc/repmgr.conf --no-daemonize
+ Restart=on-failure
+
+ [Install]
+ WantedBy=multi-user.target
+
+.. code-block:: bash
+
+ sudo systemctl daemon-reload
+ sudo systemctl enable repmgrd
+ sudo systemctl start repmgrd
+
+✅ **Phase 3 checkpoint** — run on any node:
+
+.. code-block:: bash
+
+ sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
+
+**Pass:** Output shows all three nodes — pg1 as ``* running`` (primary), pg2 and
+pg3 as ``running`` (standby). On pg1, the following query returns 2 rows:
+
+.. code-block:: bash
+
+ sudo -u postgres psql -c "SELECT client_addr, state FROM pg_stat_replication;"
+
+**Fail:** A standby showing ``! running`` means replication did not establish.
+Check ``journalctl -u postgresql`` on the failed standby. Common cause: firewall
+blocking port 5432 between nodes.
+
+Phase 4: HAProxy, health check, and VIP (all nodes)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Run all steps in Phase 4 on **pg1, pg2, and pg3**.
+
+**Step 4.1 — Install HAProxy**
+
+.. code-block:: bash
+
+ sudo apt install -y haproxy
+
+**Step 4.2 — Configure HAProxy**
+
+Replace ``/etc/haproxy/haproxy.cfg``:
+
+.. code-block:: text
+
+ global
+ log /dev/log local0
+ maxconn 4000
+
+ defaults
+ log global
+ mode tcp
+ timeout connect 5s
+ timeout client 30s
+ timeout server 30s
+
+ frontend pg_write
+ bind *:5000
+ default_backend pg_primary
+
+ frontend pg_read
+ bind *:5001
+ default_backend pg_replicas
+
+ backend pg_primary
+ option tcp-check
+ server pg1 :5432 check port 8008
+ server pg2 :5432 check port 8008 backup
+ server pg3 :5432 check port 8008 backup
+
+ backend pg_replicas
+ balance roundrobin
+ option tcp-check
+ server pg2 :5432 check port 8008
+ server pg3 :5432 check port 8008
+ server pg1 :5432 check port 8008 backup
+
+**Step 4.3 — Deploy pgchk.py**
+
+``pgchk.py`` is a lightweight HTTP server that returns ``200 OK`` when the local
+node is the primary and ``503`` otherwise. HAProxy queries port 8008 on each
+node to determine where to route connections.
+
+On each node, create ``/usr/local/bin/pgchk.py`` with the following content:
+
+.. code-block:: python
+
+ #!/usr/bin/env python3
+ import subprocess
+ from http.server import BaseHTTPRequestHandler, HTTPServer
+ import argparse
+
+ DEFAULT_PORT = 8008
+ PG_USER = "postgres"
+ PG_DB = "postgres"
+ PG_PORT = "5432"
+
+ class PostgresHealthCheckHandler(BaseHTTPRequestHandler):
+ def safe_write(self, data):
+ try:
+ self.wfile.write(data)
+ except (BrokenPipeError, ConnectionResetError):
+ pass
+
+ def check_postgres_status(self):
+ try:
+ cmd = ["psql", "-U", PG_USER, "-d", PG_DB, "-p", PG_PORT,
+ "-t", "-c", "SELECT pg_is_in_recovery();"]
+ result = subprocess.run(cmd, capture_output=True, text=True, timeout=5)
+ if result.returncode != 0:
+ return None
+ output = result.stdout.strip()
+ if output == 't':
+ return True # Standby
+ elif output == 'f':
+ return False # Primary
+ return None
+ except Exception:
+ return None
+
+ def do_GET(self):
+ status = self.check_postgres_status()
+ if status is None:
+ self.send_response(503)
+ self.end_headers()
+ self.safe_write(b"PostgreSQL Unreachable\n")
+ return
+ if self.path in ('/', '/master'):
+ if not status:
+ self.send_response(200); self.end_headers()
+ self.safe_write(b"OK - Primary\n")
+ else:
+ self.send_response(503); self.end_headers()
+ self.safe_write(b"Service Unavailable - Not Primary\n")
+ elif self.path == '/replica':
+ if status:
+ self.send_response(200); self.end_headers()
+ self.safe_write(b"OK - Replica\n")
+ else:
+ self.send_response(503); self.end_headers()
+ self.safe_write(b"Service Unavailable - Not Replica\n")
+ else:
+ self.send_response(404); self.end_headers()
+ self.safe_write(b"Not Found\n")
+
+ def log_message(self, format, *args):
+ pass
+
+ def run(port=DEFAULT_PORT):
+ httpd = HTTPServer(('', port), PostgresHealthCheckHandler)
+ print(f"Starting PostgreSQL Health Check on port {port}...")
+ try:
+ httpd.serve_forever()
+ except KeyboardInterrupt:
+ pass
+ httpd.server_close()
+
+ if __name__ == '__main__':
+ parser = argparse.ArgumentParser(description='PostgreSQL Health Check for HAProxy')
+ parser.add_argument('--port', type=int, default=DEFAULT_PORT)
+ args = parser.parse_args()
+ run(port=args.port)
+
+Make the script executable:
+
+.. code-block:: bash
+
+ sudo chmod +x /usr/local/bin/pgchk.py
+
+Create ``/etc/systemd/system/pgchk.service``:
+
+.. code-block:: ini
+
+ [Unit]
+ Description=PostgreSQL Health Check for HAProxy
+ After=postgresql.service
+
+ [Service]
+ ExecStart=/usr/bin/python3 /usr/local/bin/pgchk.py --port 8008
+ Restart=always
+
+ [Install]
+ WantedBy=multi-user.target
+
+.. code-block:: bash
+
+ sudo systemctl daemon-reload
+ sudo systemctl enable pgchk
+ sudo systemctl start pgchk
+ sudo systemctl enable haproxy
+ sudo systemctl start haproxy
+
+**Step 4.4 — Install and configure Keepalived**
+
+.. code-block:: bash
+
+ sudo apt install -y keepalived
+
+First, identify the name of the network interface that carries your node's IP address:
+
+.. code-block:: bash
+
+ ip -o link show | awk '{print $2, $9}' | grep UP
+
+Note the interface name (e.g. ``ens3``, ``enp3s0``, ``eth0``). You will use it in
+the next step.
+
+Create ``/etc/keepalived/keepalived.conf``. Replace ```` with the
+interface name from the previous step. Set ``priority``: pg1 gets ``101``,
+pg2 gets ``100``, pg3 gets ``99``. Set ``virtual_ipaddress`` to your VIP:
+
+.. code-block:: text
+
+ vrrp_instance VI_1 {
+ state BACKUP
+ interface
+ virtual_router_id 51
+ priority 101
+ advert_int 1
+ nopreempt
+ virtual_ipaddress {
+ /24
+ }
+ }
+
+.. code-block:: bash
+
+ sudo systemctl enable keepalived
+ sudo systemctl start keepalived
+
+✅ **Phase 4 checkpoint** — run on any node:
+
+.. code-block:: bash
+
+ # VIP should be active on the primary node (pg1)
+ ip addr show | grep
+
+ # Port 5000 should connect to primary
+ psql -h -p 5000 -U repmgr -d repmgr \
+ -c "SELECT inet_server_addr(), pg_is_in_recovery();"
+
+ # Port 5001 should connect to a standby
+ psql -h -p 5001 -U repmgr -d repmgr \
+ -c "SELECT inet_server_addr(), pg_is_in_recovery();"
+
+**Pass:** VIP visible on pg1. Port 5000 returns ``pg_is_in_recovery = f`` (primary).
+Port 5001 returns ``pg_is_in_recovery = t`` (standby).
+
+**Fail:** If the VIP is not on pg1, check ``journalctl -u keepalived``. If HAProxy
+is not routing correctly, check ``journalctl -u haproxy`` and verify pgchk.py
+is responding: ``curl http://:8008`` should return HTTP 200.
+
+Phase 5: End-to-end validation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Run this phase after all four previous phases pass on all nodes. This confirms
+the cluster behaves correctly under failure before you connect Mattermost.
+
+**Step 5.1 — Confirm healthy starting state**
+
+.. code-block:: bash
+
+ sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
+
+**Pass:** pg1 is ``* running`` (primary); pg2 and pg3 are ``running`` (standby).
+
+**Step 5.2 — Simulate primary failure**
+
+On **pg1**:
+
+.. code-block:: bash
+
+ sudo systemctl stop postgresql
+
+Wait 30 seconds, then on **pg2** or **pg3**:
+
+.. code-block:: bash
+
+ sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
+
+**Pass:** One of pg2 or pg3 is now ``* running`` (primary). pg1 shows as ``! running``
+(unreachable — expected).
+
+**Step 5.3 — Verify HAProxy and VIP followed the new primary**
+
+.. code-block:: bash
+
+ psql -h -p 5000 -U repmgr -d repmgr \
+ -c "SELECT inet_server_addr(), pg_is_in_recovery();"
+
+**Pass:** Returns the IP of the newly promoted node with ``pg_is_in_recovery = f``.
+
+**Step 5.4 — Recover the old primary as a standby**
+
+On **pg1**:
+
+.. code-block:: bash
+
+ sudo systemctl start postgresql
+ sudo -u postgres repmgr -f /etc/repmgr.conf node rejoin \
+ --force-rewind --config-files=postgresql.conf,pg_hba.conf
+
+Then on any node:
+
+.. code-block:: bash
+
+ sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
+
+**Pass:** All three nodes show ``running``; pg1 is now a standby.
+
+.. note::
+
+ Your cluster is ready for production. Connect Mattermost using the VIP
+ address and port 5000 as the primary datasource. Optionally configure
+ port 5001 as a read replica in ``config.json``.
+
+Day-2 operations
+----------------
+
+Check cluster status
+~~~~~~~~~~~~~~~~~~~~
+
+Run on any node:
+
+.. code-block:: bash
+
+ sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
+
+Expected healthy output shows one ``* running`` primary and two ``running`` standbys.
+
+Check replication lag
+~~~~~~~~~~~~~~~~~~~~~
+
+Run on the primary:
+
+.. code-block:: bash
+
+ sudo -u postgres psql -c "
+ SELECT client_addr, application_name, state,
+ pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS lag
+ FROM pg_stat_replication;"
+
+Normal lag is under 1 MB during steady state. Lag growing continuously
+indicates a replication problem — check network connectivity and standby
+PostgreSQL logs.
+
+Controlled switchover (planned maintenance)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To move the primary role to a standby with zero data loss:
+
+.. code-block:: bash
+
+ # Run on the TARGET standby (e.g. pg2)
+ sudo -u postgres repmgr -f /etc/repmgr.conf standby switchover
+
+repmgr will demote the old primary and promote this node. The VIP and HAProxy
+will follow automatically.
+
+Add a standby node
+~~~~~~~~~~~~~~~~~~
+
+1. Provision a new server and complete Phases 1–2 of the setup guide.
+2. Create ``/etc/repmgr.conf`` with the next available ``node_id``.
+3. On the new node:
+
+ .. code-block:: bash
+
+ sudo systemctl stop postgresql
+ sudo -u postgres repmgr -h -U repmgr -d repmgr \
+ -f /etc/repmgr.conf standby clone --delete-existing-pgdata
+ sudo systemctl start postgresql
+ sudo -u postgres repmgr -f /etc/repmgr.conf standby register
+
+4. Add the new node to ``/etc/haproxy/haproxy.cfg`` on all existing nodes and
+ reload HAProxy: ``sudo systemctl reload haproxy``.
+
+Rejoin a failed node
+~~~~~~~~~~~~~~~~~~~~
+
+After recovering a failed standby:
+
+.. code-block:: bash
+
+ sudo -u postgres repmgr -f /etc/repmgr.conf node rejoin \
+ --force-rewind --config-files=postgresql.conf,pg_hba.conf
+
+After rejoining a failed primary (after automatic failover has already promoted
+a new primary), run the same command on the old primary to re-register it as a
+standby.
+
+Troubleshooting
+---------------
+
+repmgrd is not starting
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Symptom:** ``systemctl status repmgrd`` shows ``failed`` or ``activating``.
+
+**Likely cause:** PostgreSQL has not fully started yet, or the repmgr database
+is not accessible.
+
+**Resolution:**
+
+.. code-block:: bash
+
+ # Verify PostgreSQL is running first
+ sudo systemctl status postgresql
+
+ # Check repmgrd logs
+ journalctl -u repmgrd -n 50
+
+ # Test repmgr connection manually
+ sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
+
+Standby not replicating
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Symptom:** ``repmgr cluster show`` shows a standby as ``! running``, or
+``pg_stat_replication`` on the primary shows fewer than expected rows.
+
+**Likely cause:** Network connectivity issue on port 5432, or ``pg_hba.conf``
+not permitting the replication connection.
+
+**Resolution:**
+
+.. code-block:: bash
+
+ # From the standby, test connectivity to the primary
+ pg_isready -h -p 5432 -U repmgr
+
+ # Check PostgreSQL logs on the standby
+ sudo -u postgres tail -50 /var/log/postgresql/postgresql-17-main.log
+
+VIP not moving after failover
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Symptom:** After a primary failure and successful repmgr promotion, the VIP
+remains on the failed node or does not appear on the new primary.
+
+**Likely cause:** Keepalived is not running, or VRRP traffic is blocked by a
+firewall.
+
+**Resolution:**
+
+.. code-block:: bash
+
+ sudo systemctl status keepalived
+ journalctl -u keepalived -n 50
+
+ # Verify VRRP traffic is not blocked — check cloud security groups or
+ # iptables rules for protocol 112 (VRRP)
+
+HAProxy routing to wrong node
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Symptom:** Connections on port 5000 land on a standby (writes fail), or
+port 5001 routes to the primary.
+
+**Likely cause:** pgchk.py is not running or returning incorrect status.
+
+**Resolution:**
+
+.. code-block:: bash
+
+ # Check health check response on each node
+ curl -v http://:8008
+
+ # Primary should return HTTP 200; standbys should return HTTP 503
+ sudo systemctl status pgchk
+ journalctl -u pgchk -n 30
+
+Split-brain prevention
+~~~~~~~~~~~~~~~~~~~~~~
+
+repmgr's ``failover=automatic`` setting and ``reconnect_attempts=3`` with
+``reconnect_interval=5`` provide a brief delay before promoting a standby.
+This prevents promotion during transient network blips.
+
+If you suspect a split-brain scenario (two nodes both believing they are
+primary), **do not write to either node**. Check cluster status from a
+third node and use ``repmgr node service --action=stop`` to fence the
+unintended primary before recovering.
diff --git a/source/administration-guide/scale/scaling-for-enterprise.rst b/source/administration-guide/scale/scaling-for-enterprise.rst
index 18779aee9e2..5e4c0521dde 100644
--- a/source/administration-guide/scale/scaling-for-enterprise.rst
+++ b/source/administration-guide/scale/scaling-for-enterprise.rst
@@ -29,6 +29,14 @@ High availability
A :doc:`high availability cluster-based deployment ` enables a Mattermost system to maintain service during outages and hardware failures through the use of redundant infrastructure.
+PostgreSQL high availability cluster
+--------------------------------------
+
+For self-hosted deployments on bare-metal or VMs, a
+:doc:`PostgreSQL HA cluster `
+provides automatic database failover using repmgr, HAProxy, and Keepalived —
+without requiring a managed database service.
+
Redis
-----
@@ -45,7 +53,8 @@ Available reference architectures
Deployment architecture at scale
Backing storage benchmarks
Enterprise search
- High availability
+ High availability
+ PostgreSQL HA cluster
Redis
Scale up to 200 users
Scale up to 2000 users