We're faced with vital need to have a proper working free tool for comparing two PostgreSQL databases and preparing the delta as SQL script.
We have multiple PostgreSQL database repositories and need to have a properly working deployment pipelines for our databases.
FROM- this database we will use a target database, we want to apply final patch here.TO- this database we will use as an etalon schema, and prepare delta script that can be applied on FROM database to make schema equals to schema of databaseTO.
cargo fmt --all -- --check
cargo clippy --all-targets --all-features
cargo test
cargo buildcargo run [options]
pgc [options]We have two opportunities to run:
- use configuration file;
- use command line arguments.
Configuration file can help us to put different functions in a chain. Command line arguments can be used to execute just one function in one time.
--command {dump|compare|clear} - the command name, dump - to create a dump file, compare - to compare two dumps, clear - to generate a script that drops all objects found in the database.
--server {server name} - to specify server name for a command, without it tool will use localhost as a host for command.
--port {port number} - to specify port number for a command, without it tool will use 5432 as a host for command.
--user {user name} - to specify user name for a command, without it tool will use information from pgpass file.
--password {user password} - to specify user password for a command, without it tool will use information from pgpass file.
--database {databasename} - to specify database name for the command, without it tool will use postgres as a database name.
--scheme {schemaname} - to specify concrete scheme for the command, without it all schemas will be used for command.
--output {filename} - to specify output file name for the command, without it the tool will use data.out as a file name for output file.
--from {filename} - to specify dump file of the FROM databadse for the comparer, default value for this property dump.from.
--to {filename} - to specify dump file of the TO databadse for the comparer, default value for this property is dump.to.
--config {filename} - this argument avoid usage of any other arguments, and tell comparer to use command chain from this file. Default: pgc.conf.
--use_ssl - specify this argument to use SSL for PostgreSQL connection.
--use-drop - specify this argument if you want to use DROPs in output script, otherwise no DROPs will be used.
--use-single-transaction - use this flag to wrap resulting diff file within explicit begin; and commit; statements (i.e. single transaction).
--use-comments {true|false} - set to false to strip SQL comments from the generated script; set to true (default) to include comments.
--grants-mode {ignore|addonly|full} - controls how grants (privileges) are handled during comparison. ignore (default) skips grants entirely; addonly adds grants that exist in TO but not in FROM; full makes grants identical by adding missing and revoking extra.
--output-for-production {true|false} - set to true to generate a migration script that is convenient to run against a live production database (default false — output is unchanged). See Production-friendly output.
--max-connections {number} - maximum number of connections in the PostgreSQL connection pool. Default: 16. Used by all concurrent introspection queries; table metadata is pulled schema-wide in one query per resource kind (columns, indexes, constraints, triggers, policies, partition info, definitions) so connection count mostly matters for the sibling queries (extensions, sequences, routines, views, etc.) running in parallel.
--use-cascade - add CASCADE to every DROP statement in the clear script. Warning: CASCADE can silently drop dependent objects that live outside the selected schema(s) (e.g., foreign keys or views in other schemas referencing the dropped objects). Use only when you are certain no cross-schema dependencies should survive. Without this flag the generated drops rely on the explicit dependency ordering and will fail cleanly if unresolved dependencies exist.
pgc --command dump --server {host} --database {database} --scheme {scheme} --output {file}As a result if this command we will have a dump file with all needed information to compile delta between two databases. This file should be used for the FROM or TO sides in compare command.
pgc --command compare --from {from_dump} --to {to_dump} --output {file} --use-dropThis command comparing two dumps and produce SQL script for the FROM database to be equal to TO database after applying it.
If we add --use-drop argument comparer will add drop scripts for all items that non exists in target database, otherwise drop scripts will be ignored.
By default, comparer ignore drops.
pgc --command compare --from {from_dump} --to {to_dump} --output {file} --use-single-transaction --output-for-productionBy default the delta script favours brevity and is meant to be applied to an idle database. With --output-for-production true (config key OUTPUT_FOR_PRODUCTION=true) the comparer instead emits a script that minimises locking on a live database:
- Indexes are built concurrently —
CREATE INDEXbecomesCREATE INDEX CONCURRENTLY(theUNIQUEkeyword is preserved), so building an index does not block writes. - Partitioned tables are handled correctly —
CONCURRENTLYis not allowed directly on a partitioned table, so for a partitioned parent the comparer emitsCREATE INDEX ... ON ONLY {parent}(in the transaction), thenCREATE INDEX CONCURRENTLYon each partition followed byALTER INDEX ... ATTACH PARTITION(after the transaction). When the partition layout cannot be expanded safely (no known partitions, or multi-level/sub-partitioned children) it falls back to a single non-concurrentCREATE INDEXon the parent and explains why in a comment. - Indexes are dropped concurrently —
DROP INDEXbecomesDROP INDEX CONCURRENTLY(kept non-concurrent for indexes on partitioned tables, where concurrent drop is illegal). - Foreign keys are validated separately — a new foreign key is added
NOT VALIDinside the transaction (a fast, metadata-only operation) and a matchingVALIDATE CONSTRAINT(the long, scan-heavy step) is emitted afterwards so it does not hold the lock for the whole migration. - DDL is emitted defensively so the migration is re-runnable — every statement PostgreSQL supports a guard for is generated idempotently, so a migration that was applied partially (e.g. a post-commit step failed) can be replayed without errors.
CREATE [UNLOGGED] TABLE,CREATE [UNLOGGED] SEQUENCE,CREATE MATERIALIZED VIEWandCREATE [UNIQUE] INDEX [CONCURRENTLY]gainIF NOT EXISTS;CREATE VIEWbecomesCREATE OR REPLACE VIEW;ALTER TABLE ... ADD COLUMNgainsIF NOT EXISTS; andALTER TABLE ... DROP COLUMN/DROP CONSTRAINTgainIF EXISTS.CREATE TYPEandALTER TABLE ... ADD CONSTRAINTare left unguarded because PostgreSQL has no idempotency clause for them. The rewrite is SQL-aware (it skips string literals, quoted identifiers, comments — including commented-out drops — and dollar-quoted function bodies), so only the structural keyword of each statement is touched.
Because CREATE/DROP INDEX CONCURRENTLY, VALIDATE CONSTRAINT and ALTER INDEX ... ATTACH PARTITION cannot run inside a transaction block, every such statement is moved to a clearly marked Production post-commit section emitted after the commit;. The rest of the migration still runs inside the single transaction when --use-single-transaction is set. Each post-commit statement runs in its own implicit transaction; most are safe to re-run thanks to idempotency guards, but ALTER INDEX ... ATTACH PARTITION has no built-in guard, so already-successful ATTACH steps may need to be skipped when replaying the section.
--output-for-production defaults to false; when off the output is byte-for-byte identical to previous behaviour.
pgc --command clear --server {host} --database {database} --scheme {scheme} --output {file} --use-single-transaction --use-commentsThis command connects to the specified database, discovers all objects in the given schema(s) and produces a SQL script that drops every found object in dependency-safe order:
- Views (topologically sorted by
table_relation; tie-break: materialized before regular, then alphabetical) - Foreign key constraints
- Tables
- Routines (functions, procedures, aggregates)
- Sequences
- Types (enums, composites, domains)
- Extensions
- Schemas
The resulting script can be applied on another database that shares the same schema to fully remove all objects originating from it.
Optional flags:
--use-single-transactionwraps the script inBEGIN/COMMIT.--use-comments(defaulttrue) adds explanatory comments before each drop statement.--use-cascadeappendsCASCADEto everyDROPstatement so that dependent objects outside the inspected schema(s) are removed automatically. Use with caution — this can drop objects you did not intend to remove (see--use-cascadedescription above).
The configuration file is pretty simple key-value configuration file. All rows starts from # will be interpreted as a comments. Also will be ignored all empty or whitespace rows.
All configuration file should based on the following one:
#
# PostgreSQL Comparer Configuration
#
# FROM
FROM_HOST=localhost
FROM_DATABASE=service
FROM_SCHEME=service
FROM_SSL=true
FROM_DUMP=from.dump
# TO
TO_HOST=localhost
TO_DATABASE=service
TO_SCHEME=service
TO_SSL=false
TO_DUMP=to.dump
# OUTPUT
OUTPUT=delta.sql
# ADDITIONAL PROPERTIES
USE_DROP=true
USE_SINGLE_TRANSACTION=true
USE_COMMENTS=false
GRANTS_MODE=ignore
MAX_CONNECTIONS=16
OUTPUT_FOR_PRODUCTION=false
OUTPUT_FOR_PRODUCTION (default false) is the configuration-file equivalent of the --output-for-production flag described in Production-friendly output.
MAX_CONNECTIONS caps the connection pool the tool opens per dump. Independent introspection queries (extensions, sequences, routines, views, types, tables, etc.) are fired concurrently via tokio::try_join!, and the table-level data (columns, indexes, constraints, triggers, policies, partition info, definitions) is pulled with schema-wide queries — one query per sub-resource, independent of table count. So the connection count mostly bounds how many of the ~18 concurrent sibling queries run without queueing.
- Server budget — never exceed half of the server's
max_connections(check withSHOW max_connections;). The PostgreSQL default is 100. - Two pools — when dumping both
FROMandTOagainst the same server the combined usage is2 × MAX_CONNECTIONS; halve your budget per pool. - Memory overhead — each PostgreSQL connection costs roughly 5–10 MB of RAM on the server.
- Diminishing returns — beyond ~24 connections there is nothing left to parallelize; the dump runs at most ~18 queries concurrently.
| Schema size | Suggested MAX_CONNECTIONS |
|---|---|
| Local or low-latency server | 16 (default) |
| High-latency / remote | 20 – 24 |
| Extremely small schema | 8 – 12 (still fine) |
The default of 16 is comfortable for most workloads. Raising it above ~24 rarely helps: the tool only has a small fixed set of sibling query batches to run.