add guardrail for multiple version repairs being blocked #20
clohfink wants to merge 1 commit into jaydeepkumar1984:auto_repair_v2_on_4_1 from
    // When false, it behaves the same as normal streaming.
    public volatile boolean cdc_on_repair_enabled = true;

    public boolean mixed_version_repairs_enabled = true;
I think generally what I'd want is:
- mixed major version: not allowed
- mixed minor version: allowed
Could we change this to be more oriented around major versions, or do we also have a use case for considering any version difference?
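To make the major/minor distinction concrete, here is a minimal sketch of how a set of node versions could be classified. `MixedVersionCheck`, `Mix`, and the plain `major.minor.patch` string parsing are hypothetical and are not part of this patch or of Cassandra's API. The sketch also assumes that only the first number counts as the major version, which may not match how the project defines a major release.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of the patch. */
final class MixedVersionCheck
{
    enum Mix { NONE, MINOR, MAJOR }

    /** Returns the leading number of a "major.minor.patch" version string. */
    static int major(String version)
    {
        return Integer.parseInt(version.split("\\.")[0]);
    }

    /** Classifies the release versions reported by the nodes of a cluster. */
    static Mix classify(List<String> versions)
    {
        if (versions.isEmpty())
            return Mix.NONE;
        String first = versions.get(0);
        Mix result = Mix.NONE;
        for (String v : versions)
        {
            // Any difference in the leading number is a mixed major version.
            if (major(v) != major(first))
                return Mix.MAJOR;
            // Same major but a different string: remember it as a minor mix.
            if (!v.equals(first))
                result = Mix.MINOR;
        }
        return result;
    }

    /** Repairs are allowed unless nodes differ in major version. */
    static boolean repairAllowed(List<String> versions)
    {
        return classify(versions) != Mix.MAJOR;
    }
}
```

With this shape, `repairAllowed` on `4.1.3` and `4.1.5` returns true, while `4.1.3` and `5.0.0` returns false.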
Ah, I see. On the trunk patch:
Mixed mode repairs and streaming adds many unknowns and additional performance impacts during upgrades.
I can go either way. I'd actually like it for minor-version repairs too, but I can be convinced to limit it to major versions only.
Also wondering if we should apply it to autorepairs only? Currently it blocks both manual repairs and autorepairs.
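One way the "autorepairs only" option could look is sketched below. `RepairGuard`, `RepairKind`, and `Scope` are hypothetical names invented for illustration; the patch as written has a single boolean and blocks both kinds of repair.

```java
/** Hypothetical sketch of scoping the guardrail; not part of the patch. */
final class RepairGuard
{
    enum RepairKind { MANUAL, AUTO }

    enum Scope { ALL_REPAIRS, AUTO_REPAIRS_ONLY }

    /**
     * Decides whether a repair should be rejected.
     *
     * @param mixedVersionRepairsEnabled value of the guardrail flag
     * @param clusterIsMixed whether nodes currently run different versions
     * @param scope which kinds of repair the guardrail applies to
     * @param kind the kind of repair being started
     */
    static boolean isBlocked(boolean mixedVersionRepairsEnabled,
                             boolean clusterIsMixed,
                             Scope scope,
                             RepairKind kind)
    {
        // Nothing to block when the flag allows it or versions agree.
        if (mixedVersionRepairsEnabled || !clusterIsMixed)
            return false;
        return scope == Scope.ALL_REPAIRS || kind == RepairKind.AUTO;
    }
}
```

`Scope.ALL_REPAIRS` corresponds to the current behavior described above; `Scope.AUTO_REPAIRS_ONLY` would let an operator still run a manual repair during an upgrade.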
    /**
     * Guardrail disabling repairs when there are mixed versions
     */
    public static final DisableFlag mixedRepairsEnabled =
Curious why using DisableFlag in 4.1 and EnableFlag on trunk?
That's a change in guardrails: in 4.1 it was DisableFlag; on trunk they changed it to EnableFlag, and DisableFlag doesn't exist anymore.
## Summary
This diff:
- Implements the read flow to the view table when the option is enabled
- Adds `ViewRowComparison.compare` to ViewUtils; the method takes the
base table row and the view table row and returns a comparison result
This flow is for debugging purposes only and is not used in making the
rebuild decision.
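For readers unfamiliar with the comparison step, a heavily simplified sketch of the idea follows. It is not the code in this diff: `ViewRowComparisonSketch` is a hypothetical class, rows are modeled as plain `Map<String, String>` values with `null` meaning "row not found", timestamps and TTLs are ignored, and only four of the status names that appear in the test-plan log are covered. Treating "both rows absent" as `IDENTICAL` is also an assumption.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical, simplified illustration; not the method added in this diff. */
final class ViewRowComparisonSketch
{
    enum Status { IDENTICAL, MISMATCH, MISSING, STALE_BASE_ABSENT }

    /**
     * Compares a base-table row with the corresponding view row.
     * Only columns present in the view row are checked, since a view
     * may select a subset of the base table's columns.
     */
    static Status compare(Map<String, String> baseRow, Map<String, String> viewRow)
    {
        if (baseRow == null)
            return viewRow == null ? Status.IDENTICAL : Status.STALE_BASE_ABSENT;
        if (viewRow == null)
            return Status.MISSING;
        for (Map.Entry<String, String> e : viewRow.entrySet())
        {
            // A differing value for any selected column is a mismatch.
            if (!Objects.equals(baseRow.get(e.getKey()), e.getValue()))
                return Status.MISMATCH;
        }
        return Status.IDENTICAL;
    }
}
```

The real comparison also has to account for write timestamps, TTLs, and filtered columns, which is where statuses such as `STALE_VALUE_CHANGED` and `CONSISTENT_FILTERED_NONPK_COLUMN` in the log come from.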
TODO:
- Metrics on successfully applied mutations
- Metrics on the different comparison result types when view table read
is enabled
## Test Plan
Unit tests added.
### Tested locally with ccm:
Schema:
```
CREATE KEYSPACE stresscql WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '3'} AND durable_writes = true;
CREATE TABLE stresscql.blogposts (
domain text,
published_date timeuuid,
author text,
body text,
title text,
url text,
PRIMARY KEY (domain, published_date)
) WITH CLUSTERING ORDER BY (published_date DESC)
AND additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.1
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = 'A table to hold blog posts'
AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4', 'sstable_size_in_mb': '960'}
AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND memtable = 'default'
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND extensions = {}
AND gc_grace_seconds = 10800
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'NONE'
AND speculative_retry = '99p'
AND strict_mv_consistency = true
AND auto_repair = {'bootstrap_enabled': 'true', 'full_enabled': 'true', 'incremental_enabled': 'true', 'paxos_cleanup_enabled': 'true', 'preview_repaired_enabled': 'true', 'priority': '0'};
CREATE MATERIALIZED VIEW stresscql.blogposts_mv AS
SELECT published_date, domain, author
FROM stresscql.blogposts
WHERE domain IS NOT NULL AND published_date IS NOT NULL AND author IS NOT NULL
PRIMARY KEY (published_date, domain)
WITH CLUSTERING ORDER BY (domain ASC)
AND additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND memtable = 'default'
AND crc_check_chance = 1.0
AND extensions = {}
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'NONE'
AND speculative_retry = '99p'
AND auto_repair = {'bootstrap_enabled': 'true', 'full_enabled': 'true', 'incremental_enabled': 'true', 'paxos_cleanup_enabled': 'true', 'preview_repaired_enabled': 'true', 'priority': '0'};
CREATE MATERIALIZED VIEW stresscql.blogposts_mv2 AS
SELECT author, published_date, domain, body
FROM stresscql.blogposts
WHERE domain IS NOT NULL AND published_date IS NOT NULL AND author IS NOT NULL
PRIMARY KEY (author, published_date, domain)
WITH CLUSTERING ORDER BY (published_date DESC, domain ASC)
AND additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND memtable = 'default'
AND crc_check_chance = 1.0
AND extensions = {}
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99p'
AND auto_repair = {'bootstrap_enabled': 'true', 'full_enabled': 'true', 'incremental_enabled': 'true', 'paxos_cleanup_enabled': 'true', 'preview_repaired_enabled': 'true', 'priority': '0'};
```
Data inserted:
```
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('news.com', dbda4b90-dc7b-11f0-86e8-7d562eb05c96, 'Eve', 'Post 5 body', 'Title 5', 'http://news.com/2');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('news.com', daa8a960-dc7b-11f0-86e8-7d562eb05c96, 'Dana', 'Post 4 body', 'Title 4', 'http://news.com/1');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('dev.org', dbdfc9d0-dc7b-11f0-86e8-7d562eb05c96, 'Jack', 'Post 10 body', 'Title 10', 'http://dev.org/2');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('dev.org', dbde9150-dc7b-11f0-86e8-7d562eb05c96, 'Ivy', 'Post 9 body', 'Title 9', 'http://dev.org/1');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('tech.io', dbdda6f0-dc7b-11f0-86e8-7d562eb05c96, 'Henry', 'Post 8 body', 'Title 8', 'http://tech.io/3');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('tech.io', dbdce3a0-dc7b-11f0-86e8-7d562eb05c96, 'Grace', 'Post 7 body', 'Title 7', 'http://tech.io/2');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('tech.io', dbdbf940-dc7b-11f0-86e8-7d562eb05c96, 'Frank', 'Post 6 body', 'Title 6', 'http://tech.io/1');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('blog.com', daa749d0-dc7b-11f0-86e8-7d562eb05c96, 'Charlie', 'Post 3 body', 'Title 3', 'http://blog.com/3');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('blog.com', d975a7a0-dc7b-11f0-86e8-7d562eb05c96, 'Bob', 'Post 2 body', 'Title 2', 'http://blog.com/2');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('blog.com', d8442c80-dc7b-11f0-86e8-7d562eb05c96, 'Alice', 'Post 1 body', 'Title 1', 'http://blog.com/1');
```
Dropped the MV updates for the following mutations:
```
UPDATE stresscql.blogposts SET author='author_updated' WHERE domain='news.com' AND published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96;
UPDATE stresscql.blogposts SET body='body_updated' WHERE domain='news.com' AND published_date=daa8a960-dc7b-11f0-86e8-7d562eb05c96;
DELETE FROM stresscql.blogposts WHERE domain='blog.com' AND published_date=daa749d0-dc7b-11f0-86e8-7d562eb05c96;
INSERT INTO stresscql.blogposts (domain, published_date, author, body) VALUES ('addition1', 6b5a70b0-dc7c-11f0-86e8-7d562eb05c96, 'addition_author1', 'addition_body1');
INSERT INTO stresscql.blogposts (domain, published_date, author, body) VALUES ('addition2', 6b5a70b0-dc7c-11f0-86e8-7d562eb05c95, 'addition_author2', 'addition_body2');
UPDATE stresscql.blogposts USING TTL 60 SET author='author_updated' WHERE domain='addition2' AND published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c95;
```
Run the following to trigger repair:
```
delete from stresscql.blogposts_mv where domain='news.com' AND published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv where domain='blog.com' AND published_date=daa749d0-dc7b-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv where domain='addition1' AND published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv where domain='news.com' AND published_date=daa8a960-dc7b-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv where domain='addition2' AND published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c95;
delete from stresscql.blogposts_mv2 where author='Eve' AND domain='news.com' AND published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv2 where author='Charlie' AND domain='blog.com' AND published_date=daa749d0-dc7b-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv2 where author='addition_author1' AND domain='addition1' AND published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv2 where author='Dana' AND domain='news.com' AND published_date=daa8a960-dc7b-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv2 where author='author_updated' AND domain='addition2' AND published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c95;
delete from stresscql.blogposts_mv2 where author='author_updated' AND domain='news.com' AND published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96;
```
The following can be found in the log:
```
INFO [Native-Transport-Requests-1] 2026-01-14 11:12:35,292 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv pk=(published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96, domain=news.com) status=MISMATCH author: {val='author_updated'->'Eve' ts=1768417593372000->1768417544229000}
INFO [Native-Transport-Requests-3] 2026-01-14 11:12:35,298 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv pk=(published_date=daa749d0-dc7b-11f0-86e8-7d562eb05c96, domain=blog.com) status=STALE_BASE_ABSENT base row is tombstone but view row exists
INFO [Native-Transport-Requests-3] 2026-01-14 11:12:35,302 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv pk=(published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c96, domain=addition1) status=MISSING view row not found
INFO [Native-Transport-Requests-1] 2026-01-14 11:12:35,306 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv pk=(published_date=daa8a960-dc7b-11f0-86e8-7d562eb05c96, domain=news.com) status=IDENTICAL
INFO [Native-Transport-Requests-3] 2026-01-14 11:12:35,312 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv pk=(published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c95, domain=addition2) status=MISSING view row not found
INFO [Native-Transport-Requests-2] 2026-01-14 11:12:35,317 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv2 pk=(author=Eve, published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96, domain=news.com) status=STALE_VALUE_CHANGED NonPkCol=author stale record, base=author_updated (ts=1768417593372000, ttl=0, delTime=2147483647), view=Eve (rowTs=1768417544229000, rowTtl=0, rowExpTime=2147483647)
INFO [Native-Transport-Requests-2] 2026-01-14 11:12:35,321 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv2 pk=(author=Charlie, published_date=daa749d0-dc7b-11f0-86e8-7d562eb05c96, domain=blog.com) status=STALE_BASE_ABSENT base row is tombstone but view row exists
INFO [Native-Transport-Requests-1] 2026-01-14 11:12:35,325 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv2 pk=(author=addition_author1, published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c96, domain=addition1) status=MISSING view row not found
INFO [Native-Transport-Requests-3] 2026-01-14 11:12:35,329 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv2 pk=(author=Dana, published_date=daa8a960-dc7b-11f0-86e8-7d562eb05c96, domain=news.com) status=MISMATCH body: {val='body_updated'->'Post 4 body' ts=1768417593379000->1768417544296000}
INFO [Native-Transport-Requests-1] 2026-01-14 11:12:35,333 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv2 pk=(author=author_updated, published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c95, domain=addition2) status=CONSISTENT_FILTERED_NONPK_COLUMN NonPKCol=author is dead
INFO [Native-Transport-Requests-1] 2026-01-14 11:12:35,760 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv2 pk=(author=author_updated, published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96, domain=news.com) status=MISSING view row not found
```
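Until the comparison-result metrics from the TODO exist, a log like the one above can be tallied by status with a small helper. This is a sketch only: `RebuildLogTally` is a hypothetical class, and the regular expression assumes the `rebuildMVKey ... status=<NAME>` line shape shown in the log, which may change.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for tallying rebuildMVKey log lines by status. */
final class RebuildLogTally
{
    // Captures the status name that follows "status=" on a rebuildMVKey line.
    private static final Pattern STATUS =
        Pattern.compile("rebuildMVKey .*?status=(\\w+)");

    /** Returns a sorted map from status name to number of matching lines. */
    static Map<String, Integer> countByStatus(List<String> lines)
    {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines)
        {
            Matcher m = STATUS.matcher(line);
            // Lines without a rebuildMVKey status are skipped.
            if (m.find())
                counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }
}
```

Feeding it the lines of the system log (for example via `java.nio.file.Files.readAllLines`) gives a quick per-status count for a test run.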
## Jira Issues
T3-CAAS-699
---------
Co-authored-by: Yuqi Yan <yukei0509@gmail.com>