Fix two bugs in archive_mode=shared on standby by x4m · Pull Request #84 · pg-sharding/cpg

x4m · 2026-04-21T12:03:59Z

1. Checkpoint on standby deletes WAL with .ready status. XLogArchiveCheckDone() treated archive_mode=shared like archive_mode=on during recovery, returning true unconditionally and allowing checkpoint to remove WAL segments that the primary had not yet archived. Fix: exclude shared mode from the early-return path, same as "always".
2. Walsender never sends archival status reports after archiving is restored. WalSndArchivalReport() calls pgstat_fetch_stat_archiver() whose result is cached per-session (PGSTAT_FETCH_CONSISTENCY_CACHE by default). The walsender has no transaction boundaries that would clear the cache, so last_archived_wal remained "" forever, and strcmp() suppressed all reports. Fix: call pgstat_clear_snapshot() before fetching archiver stats.
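The first fix can be illustrated with a minimal sketch of the decision that lets a checkpoint skip the archive status check. The enum and function names below are hypothetical stand-ins, not the actual server code, and the sketch models only the early-return path described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the server's archive_mode values. */
typedef enum
{
    ARCHIVE_OFF,
    ARCHIVE_ON,
    ARCHIVE_ALWAYS,
    ARCHIVE_SHARED
} ArchiveModeSketch;

/*
 * Returns true when a segment may be removed without consulting its
 * .ready/.done status file.
 */
bool
may_skip_archive_status_check(ArchiveModeSketch mode, bool in_recovery)
{
    assert(mode <= ARCHIVE_SHARED);

    if (mode == ARCHIVE_OFF)
        return true;            /* archiving disabled, nothing to wait for */

    /*
     * During recovery only "on" takes the early return. "always" and
     * "shared" must fall through to the status-file check.
     */
    if (in_recovery && mode != ARCHIVE_ALWAYS && mode != ARCHIVE_SHARED)
        return true;

    return false;
}
```

With this shape, a standby in shared mode keeps a segment until its status file says it is done, which is the behavior the fix describes.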

Add TAP tests in 051_archive_shared_checkpoint.pl that reproduce both bugs, and extend 050_archive_shared.pl with checkpoint/restore scenarios.
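The stale-cache behavior behind the second bug can be modeled in a few lines. This is a toy model under stated assumptions: the buffers and the functions `archiver_reports`, `fetch_last_archived` and `clear_snapshot` are hypothetical and only mimic a per-session snapshot that is filled on first fetch and reused until explicitly cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define WAL_NAME_BUF 64

/* "Shared" value that a separate archiver process would update. */
static char shared_last_archived[WAL_NAME_BUF] = "";

/* Per-session snapshot, filled on first fetch and then reused. */
static char snapshot[WAL_NAME_BUF];
static bool snapshot_valid = false;

/* Hypothetical: the archiver publishes the last archived segment. */
void
archiver_reports(const char *wal)
{
    assert(wal != NULL);
    snprintf(shared_last_archived, sizeof(shared_last_archived), "%s", wal);
}

/* Returns the cached value; copies the shared value only once. */
const char *
fetch_last_archived(void)
{
    if (!snapshot_valid)
    {
        memcpy(snapshot, shared_last_archived, sizeof(snapshot));
        snapshot_valid = true;
    }
    return snapshot;
}

/* Drops the snapshot so the next fetch sees fresh data. */
void
clear_snapshot(void)
{
    snapshot_valid = false;
}
```

A long-lived session that never clears the snapshot keeps seeing the empty string it read first, which matches the symptom described above; clearing before each fetch makes the new value visible.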

Currently we allow cancelling the wait for synchronous commit. Some drivers cancel a query after a timeout. If the application then retries an idempotent query, it gets confirmation that the data was written. This can lead to split-brain in HA scenarios. To prevent this, we add the synchronous_commit_cancelation setting, which disallows cancellation of the synchronous replication wait. Version for PostgreSQL 16.
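The effect of the setting can be sketched as a tiny simulation of the wait loop. The function name, the tick-based timing and the `cancelation_allowed` parameter are assumptions for illustration; how the parameter maps onto the real synchronous_commit_cancelation values is not shown here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simulates a backend waiting for a standby acknowledgement that arrives
 * at ack_tick, with a cancel request arriving at cancel_tick (use a
 * negative value for "no cancel").
 *
 * Returns true if the client is released only after the acknowledgement,
 * false if the wait was abandoned early.
 */
bool
client_released_only_after_ack(int cancel_tick, int ack_tick,
                               bool cancelation_allowed)
{
    assert(ack_tick >= 0);

    for (int tick = 0; tick < ack_tick; tick++)
    {
        /* A cancel ends the wait only when the setting permits it. */
        if (tick == cancel_tick && cancelation_allowed)
            return false;
    }
    return true;
}
```

When cancellation is disallowed, the early-release branch is never taken, so a retried idempotent query cannot observe a commit that has not been acknowledged.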

Introduces 3 functions:

extern bool mdb_admin_allow_bypass_owner_checks(Oid userId, Oid ownerId);
extern void check_mdb_admin_is_member_of_role(Oid member, Oid role);
extern bool mdb_admin_is_member_of_role(Oid member, Oid role);

They check mdb_admin membership and the correctness of role-to-role ownership transfer.

Our mdb_admin ACL model is the following:
* Any user and/or role can be granted mdb_admin
* An mdb_admin member can transfer ownership of relations, namespaces and functions to other roles, if the target role is none of: superuser, pg_read_server_files, pg_write_server_files, pg_execute_server_program.
* Allow mdb_admin to create LEAKPROOF functions
* mdb_admin sets session replication role
* [MDB-16648 + MDB-17910]: Allow mdb_admin to kill specific superuser queries

MDB-27288: allow mdb_admin to kill autovac + tests
mdb-27228: fix expected output
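The ownership-transfer rule in the ACL model above can be sketched as a pure predicate. The `RoleSketch` struct and the function names are hypothetical; the real functions take Oids and consult the catalogs, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of a role's dangerous attributes. */
typedef struct
{
    bool is_superuser;
    bool has_pg_read_server_files;
    bool has_pg_write_server_files;
    bool has_pg_execute_server_program;
} RoleSketch;

/* A target is acceptable only if it holds none of the listed powers. */
bool
transfer_target_is_safe(const RoleSketch *target)
{
    assert(target != NULL);

    return !(target->is_superuser ||
             target->has_pg_read_server_files ||
             target->has_pg_write_server_files ||
             target->has_pg_execute_server_program);
}

/* An mdb_admin member may transfer ownership only to a safe target. */
bool
mdb_admin_may_transfer_ownership(bool actor_is_mdb_admin,
                                 const RoleSketch *target)
{
    return actor_is_mdb_admin && transfer_target_is_safe(target);
}
```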

MDB replication role regression tests
The patch allows a user with the mdb_replication role to use the pg_create_logical_replication_slot, pg_replication_slot_advance and pg_drop_replication_slot functions to manage logical replication slots. Also, users with mdb_admin (which is a member of pg_create_subscription) can create subscriptions. Slot names starting with MDB.* are forbidden.
Add run as owner tap tests
More test cases in mdb_102

This commit introduces a new MDB internal role, mdb_superuser.

The role is capable of:
* GRANT/REVOKE any set of privileges to/from any object in the database.
* Has the power of pg_database_owner in any database, including: DROP any object in the database (except system catalog and stuff)

The role is NOT capable of:
* Create database, role, extension or alter other roles with such privileges.
* Transfer ownership to /pass has_priv of roles: PG_READ_ALL_DATA PG_WRITE_ALL_DATA PG_EXECUTE_SERVER_PROGRAM PG_READ_SERVER_FILES PG_WRITE_SERVER_FILES

Fix configure.ac USE_MDBLOCALES option handling
Apply autoreconf stuff
Set missing ok parameter to true while acquiring mdb_superuser oid. In regress tests, nobody creates the mdb_superuser role, so missing ok is fine
Fix spelling
Applied suggestion
Allow mdb_superuser to have power of pg_database_owner
Allow mdb_superuser to alter objects and grant ACL to objects owned by pg_database_owner. Also, during ACL checks, allow mdb_superuser to use the power of the pg_database_owner role to pass the check
regression test fixes

… non-superuser
It is well known that some PostgreSQL-related security issues (CVEs) were related to COPY FROM/TO PROGRAM exploits for privilege escalation or other unwanted behaviour or consequences. Thus, proper usage of this feature is needed. For now, simply forbid this.
Add mdb copy test

Do not use mdb_locales and mdb_newlocale when configured without them. Added an ifdef code path to build without the mdb-locales feature

Add mdb locales patch, restore COPY from/to files, enable regress. Squashed commit of the following:

commit 9f8ea4a5f42e0fd6077061bafbb428e88499896f
Author: reshke kirill <reshke@double.cloud>
Date: Wed Feb 22 09:27:53 2023 +0000

Add mdb locales patch, restore COPY from/to files, enable regress.

This commit does several things.
* Re-enables COPY from/to FILE functionality, because it is used by pg_regress
* Enables pg_regress tests in the deb build itself.
* Adds mdb locales functions and checks that the package is built with mdb-locales support

Add configure ac target to define USE_MDBLOCALES properly
Refactor optional setlocale, fix minor issues

Accept new startup param _pq_.service_auth_role for service auth under any user that is neither a superuser nor holds a dangerous system role privilege
Add tap-test for mdb service role auth 👍👌😉
Fix tests after rebase
contrib tests 💅💅💅 now work
MDB-23247: debug output for testing purposes lowered to DEBUG5 elog level
Skip caching routines for pre-startup logic (SCRAM -service auth role)

Update mdb-patched.md Update mdb-pacthes.md

Merge in MDB/postgres-dev from MDB-24221 to MDB_15_STABLE Squashed commit of the following: commit 16d1cd273560f2f9df6b4e7e1b21f8a8667e6c29 Author: Dmitry S. Fedorov <fedusia@yandex-team.ru> Date: Thu Jul 13 11:14:19 2023 +0300 MDB-24221: added support for jammy commit 12f75e39ee084e7ff54297729d7f4162c80e5b22 Author: Dmitry S. Fedorov <fedusia@yandex-team.ru> Date: Thu Jul 13 11:14:19 2023 +0300 MDB-24221: added support for jammy Add docker entrypoint and util script Add prepare-build script enable tests

Fixes for mdb build to compile: disable gss Build fix

Add test for drop database MDB behaviour

We learned from field incidents that 128Kb is not enough. This time we are going to try increasing this value.

There is no need to log the entire query, because it may be large and take lots of space on disk. Parameter max_log_size sets the maximum length of a logged query. Everything beyond that length is truncated. Value 0 disables the parameter.
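A minimal sketch of the truncation rule, assuming the limit is counted in bytes and ignoring multibyte character boundaries (both are assumptions, not statements about the patch). The function name `copy_query_for_log` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copies the part of `query` that would be logged into `out`.
 * max_log_size == 0 means "no limit". Returns the number of bytes copied,
 * not counting the terminating NUL.
 */
size_t
copy_query_for_log(const char *query, size_t max_log_size,
                   char *out, size_t out_size)
{
    size_t len;

    assert(query != NULL && out != NULL);

    if (out_size == 0)
        return 0;

    len = strlen(query);
    if (max_log_size > 0 && len > max_log_size)
        len = max_log_size;     /* everything beyond the limit is dropped */
    if (len > out_size - 1)
        len = out_size - 1;     /* never overflow the caller's buffer */

    memcpy(out, query, len);
    out[len] = '\0';
    return len;
}
```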

Merge in MDB/postgres-dev from MDB-31374-bump-llvm-18-pg16_4 to MDB_16_4_no_force Squashed commit of the following: commit fac6ac32dda9ca581ed17134cf67db5755144386 Author: Andrey Lyarskiy <aslyarskiy@yandex-team.ru> Date: Tue Oct 29 12:59:04 2024 +0300 bump llvm to 18

Add a new parameter to the VACUUM command, FORCE, meaning VACUUM should terminate all backends that prevent its execution by holding a conflicting lock

remove bogus progress reporting

Also truncate query to be logged in simple query Remove unused functions in buf_internals.h

Do not run mdb locales tests in OS Add llvm dir tmp

* check-world in workflows
* Run both check & check-world

In the case of a large PGSS_TEXT_FILE, the work time of the qtext_load_file function will be quite long, and the query to the pg_stat_statements table will not be cancellable, as there is no CHECK_FOR_INTERRUPTS() in the function. Also, the amount of bytes read can reach 1 GB, which leads to a slow read system call that does not allow cancellation of the query. Testing the speed of sequential read using fio with different block sizes shows that there is no significant difference between 16 MB blocks and 1 GB blocks. Therefore, this patch changes the maximum read value from 1 GB to 16 MB and adds an INTERRUPTS_PENDING_CONDITION() check in the read loop of qtext_load_file to make it cancellable. For now, only statement execution is cancellable (fail_on_interrupt is true only for calls from pg_stat_statements_internal)

Signed-off-by: rkhapov <r.khapov@ya.ru>
Reviewed-by: reshke <reshke@double.cloud>
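The chunked, cancellable read loop described above might look roughly like the following standalone sketch. It uses stdio instead of the server's file API, and the `interrupt_pending` flag and `load_file_chunked` function are hypothetical stand-ins for the server's interrupt machinery and for qtext_load_file.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Upper bound for a single read, mirroring the 16 MB value in the text. */
#define MAX_READ_CHUNK ((size_t) 16 * 1024 * 1024)

/* Hypothetical flag that a signal handler would set on query cancel. */
volatile sig_atomic_t interrupt_pending = 0;

/*
 * Reads `size` bytes from fp in pieces of at most `chunk` bytes, checking
 * the interrupt flag before each piece. Returns a malloc'd NUL-terminated
 * buffer, or NULL on error, short file, or interrupt.
 */
char *
load_file_chunked(FILE *fp, size_t size, size_t chunk)
{
    char   *buf;
    size_t  done = 0;

    assert(fp != NULL && chunk > 0 && chunk <= MAX_READ_CHUNK);

    buf = malloc(size + 1);
    if (buf == NULL)
        return NULL;

    while (done < size)
    {
        size_t want = size - done;
        size_t got;

        if (want > chunk)
            want = chunk;

        /* Bail out between chunks so the caller can cancel promptly. */
        if (interrupt_pending)
        {
            free(buf);
            return NULL;
        }

        got = fread(buf + done, 1, want, fp);
        if (got == 0)           /* EOF or read error before `size` bytes */
        {
            free(buf);
            return NULL;
        }
        done += got;
    }

    buf[size] = '\0';
    return buf;
}
```

The smaller the chunk, the sooner a pending cancel is noticed; the commit message argues that 16 MB costs little in throughput compared with 1 GB.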

…g with "MDB" instead of exactly matching

Various parts of pg_dump consult the --schema-only and --data-only options to determine whether to run a section of code. While this is simple enough for two mutually-exclusive options, it will become progressively more complicated as more options are added. In anticipation of that, this commit introduces new internal flags called dumpSchema and dumpData, which are derivatives of --schema-only and --data-only. This commit also removes the schemaOnly and dataOnly members from the dump/restore options structs to prevent their use elsewhere. Note that this change neither adds new user-facing command-line options nor changes the existing --schema-only and --data-only options. Author: Corey Huinker Reviewed-by: Jeff Davis Discussion: https://postgr.es/m/CADkLM%3DcQgghMJOS8EcAVBwRO4s1dUVtxGZv5gLPfZkQ1nL1gzA%40mail.gmail.com
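The derivation of the two internal flags can be sketched as below. The struct and function are illustrative only and are not the pg_dump source; the assertion reflects that the two command-line options are mutually exclusive, as the message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative container for the two derived flags. */
typedef struct
{
    bool dumpSchema;
    bool dumpData;
} DumpFlagsSketch;

/* Each flag is the negation of the opposite "only" option. */
DumpFlagsSketch
derive_dump_flags(bool schema_only, bool data_only)
{
    DumpFlagsSketch flags;

    assert(!(schema_only && data_only));    /* mutually exclusive options */

    flags.dumpSchema = !data_only;
    flags.dumpData = !schema_only;
    return flags;
}
```

Code that previously tested `!dataOnly` can then test `dumpSchema` directly, which stays readable as further options are added.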

Introduce a new archive_mode setting "shared" to prevent WAL history loss during standby promotion in HA streaming replication setups.

In shared mode, the primary proactively sends archival status updates to standbys via the replication protocol. The standby creates .ready files for received WAL segments but defers marking them as .done until the primary confirms archival. This prevents WAL from being recycled before it's safely archived, addressing a critical gap in PITR continuity during failover.

Key implementation details:
- Primary periodically sends last archived WAL segment via new PqReplMsg_ArchiveStatusReport ('a') message
- Standby marks all segments <= reported segment as .done using alphanumeric comparison on segment part (timeline-safe)
- Archiver skips during recovery in shared mode, activates on promotion
- Cascading replication: each standby coordinates with immediate upstream
- Startup check rejects archive_mode=on during recovery

This "push" design (primary sends status) is more efficient than "pull" (standby queries per-segment), avoiding directory scans and stat() calls.

Based on Heikki Linnakangas's 2014 design and Greenplum's production implementation, modernized for PostgreSQL 19.

Includes TAP tests covering basic synchronization, promotion, cascading replication, and multiple standbys scenarios.
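The "alphanumeric comparison on segment part" can be sketched as follows, assuming the standard 24-character WAL file name whose first 8 hex digits are the timeline ID. The function name is hypothetical, and the sketch ignores the ancestor-timeline handling mentioned elsewhere in this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TLI_HEX_LEN   8     /* leading timeline digits */
#define WAL_NAME_LEN  24    /* timeline + 16 segment digits */

/*
 * Returns true when `segment` is at or before `reported`, comparing only
 * the part of the name after the timeline prefix. Fixed-width uppercase
 * hex makes string order match numeric order.
 */
bool
segment_covered_by_report(const char *segment, const char *reported)
{
    assert(segment != NULL && reported != NULL);

    if (strlen(segment) != WAL_NAME_LEN || strlen(reported) != WAL_NAME_LEN)
        return false;

    return strcmp(segment + TLI_HEX_LEN, reported + TLI_HEX_LEN) <= 0;
}
```

Skipping the timeline prefix is what makes the comparison usable across a timeline switch: a segment on timeline 1 can still be recognized as covered by a report that names timeline 2.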

When standby receives archive status report, check if .ready files belong to ancestor timelines before the switch point and mark them as .done if already archived by primary.

When archive status reports arrive sequentially on the same timeline, directly generate expected WAL filenames and mark them as archived instead of scanning the entire archive_status directory. This optimization reduces overhead in the common case where the primary continuously archives segments.

Directory scan is still used when:
- Timeline changes (to handle ancestor timelines)
- First report received
- Non-sequential reports

XLogArchiveForceDone() handles all cases internally (checking if .done exists, if .ready exists, or creating .done if neither exists), so no pre-check is needed.
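Generating the expected file names for a sequential range can be sketched like this. It assumes the default 16 MB segment size (256 segments per 4 GB "xlogid") and the usual three-field hex file name layout; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 16 MB segments, so 4 GB / 16 MB = 256 segments per xlogid. */
#define SEGMENTS_PER_XLOGID 256u
#define WAL_NAME_BUF 25     /* 24 hex digits plus NUL */

/* Formats a WAL file name from a timeline ID and a segment number. */
void
wal_file_name(char *out, size_t out_size, uint32_t tli, uint64_t segno)
{
    assert(out != NULL && out_size >= WAL_NAME_BUF);

    snprintf(out, out_size, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / SEGMENTS_PER_XLOGID),
             (unsigned int) (segno % SEGMENTS_PER_XLOGID));
}

/*
 * Fills `names` with the segments after last_reported up to and including
 * new_reported, all on the same timeline. Returns how many were written.
 */
int
expected_segments(uint32_t tli, uint64_t last_reported, uint64_t new_reported,
                  char names[][WAL_NAME_BUF], int max_names)
{
    int n = 0;

    for (uint64_t seg = last_reported + 1;
         seg <= new_reported && n < max_names; seg++)
        wal_file_name(names[n++], WAL_NAME_BUF, tli, seg);

    return n;
}
```

Each generated name can then be handed to the routine that marks a segment as done, with no directory scan involved.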

XXX do we also need DROP OWNED to work this way? Probably not. Author: Alvaro Herrera Discussion: https://postgr.es/m/CALdSSPhjONb+EftRD=J1pqajkB+pjT0=tbMJs16C6q9+xT8NNg@mail.gmail.com

This commit adds timeout that is expected to be used as a prevention of long-running queries. Any session within the transaction will be terminated after spanning longer than this timeout. However, this timeout is not applied to prepared transactions. Only transactions with user connections are affected. Don't deal with transaction timeout in PostgresMain(). Instead, release transaction timeout activated by StartTransaction() in CommitTransaction()/AbortTransaction()/PrepareTransaction(). Deal with both enabling and disabling transaction timeout in assign_transaction_timeout(). Also, remove potentially flaky timeouts-long isolation test, which has no guarantees to pass on slow/busy machines. Discussion: https://postgr.es/m/20240215230856.pc6k57tqxt7fhldm%40awork3.anarazel.de Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com Author: Andrey Borodin <amborodin@acm.org> Author: Japin Li <japinli@hotmail.com> Author: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: bt23nguyent <bt23nguyent@oss.nttdata.com> Reviewed-by: Yuhang Qiu <iamqyh@gmail.com> Backport-by: rkhapov <r.khapov@ya.ru> Reviewed-by: reshke <reshke@double.cloud> ===== Cherry-pick source: 51efe38 bf82f43

x4m and others added 30 commits April 17, 2026 08:23

Extend multixact SLRU

2a564ca

Use fadvise to prefetch WAL in xlogrecovery

834af5b

provide [mdb -postgresql] restict grant roles in YC[MDB-16990]

9522146

MDB-16955 : disallow to kill repl mon in cloud

5b52b5a

Demonstrate and fix lock of all SQL queries by pg_stat_statements

42cfeb3

Add mdb changelog

a1aaf2a

Untrust all contrib

a0d76c7

Add debian for MDB 16 branch

9f99609

Indeed remove stop version from prerm

426adcc

Git apply debian patches[16]

1cbe8b6

Jammy-fixes for yc checker patch to compile

a1ef3bd

Restrict DROP DATABASE to superuser only

009f1fa

[MDB-28474] Increate readaheadchunk for XlogPageReader()

9afe831

parameter max_log_size to truncate logs

3758584

truncate query to be logged in simple query

4660aa3

Never check for superuser in walsender

4cf0e42

Support FORCE option in analyze command

789af35

Use fadvise in walsender

feee557

GUCify NUM_BUFFER_PARTITIONS

2dbd07e

Change storage class for mdb_replication utilities

01caaac

Add CI

f8523bb

reshke and others added 24 commits April 17, 2026 08:23

MDB-32132: fix grantor selection for mdb_superuser (#5)

11c1c06

Do not use schema public in mdb_superuser regression tests (#8)

52bc809

Rework docker image logic

6026436

Fix CI

3451cbf

check-world in worklows in PG16 (#17)

4dfe562

Introduce mdb_read_all_data/mdb_write_all_data

d1c0f62

Add check for mdb_service_auth role

353f194

Fix CI && fast CI circuit (#24)

a06e7ac

Allow usage on schema for mdb_read_all_data (#23)

2c1ab98

Use mirror apt repo (#39)

3d458c9

Refactor regress Dockerfile

313083b

Update Dockerfile: try

396d433

MDB-40410: Allow to kill backends which have application_name startin…

6fb2c2b

…g with "MDB" instead of exactly matching

Add TAP test for MDB-kill feature

ff830e7

v6 of bt_page_items pretty-print

6617f48

Mark ancestor timeline WAL segments as archived

c1f1e55

Fuse shared archive with ycmdb.shared_archive

0a4decc

[PATCH] REASSIGN OWNED: ignore subscriptions in other databases

7201355

reshke force-pushed the MDB_16_STABLE branch 2 times, most recently from 7edaea5 to c7f9a1e Compare April 29, 2026 09:41

reshke force-pushed the MDB_16_STABLE branch 2 times, most recently from 95c1852 to 3b9643a Compare May 11, 2026 19:47

reshke force-pushed the MDB_16_STABLE branch from 3b9643a to dcac9df Compare May 12, 2026 12:11

x4m wants to merge 54 commits into MDB_16_STABLE from fix_sa_16

9 participants
