
add guardrail for multiple version repairs being blocked #20

Open
clohfink wants to merge 1 commit into jaydeepkumar1984:auto_repair_v2_on_4_1 from clohfink:c20048-multi-version-repair-guardrail

clohfink commented Jan 7, 2025

Thanks for sending a pull request! Here are some tips if you're new here:

  • Ensure you have added or run the appropriate tests for your PR.
  • Be sure to keep the PR description updated to reflect all changes.
  • Write your PR title to summarize what this PR proposes.
  • If possible, provide a concise example to reproduce the issue for a faster review.
  • Read our contributor guidelines
  • If you're making a documentation change, see our guide to documentation contribution

Commit messages should use the following format:

<One sentence description, usually Jira title or CHANGES.txt summary>

<Optional lengthier description (context on patch)>

patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####

Co-authored-by: Name1 <email1>
Co-authored-by: Name2 <email2>

The Cassandra Jira

// When false, it behaves the same as normal streaming.
public volatile boolean cdc_on_repair_enabled = true;

public boolean mixed_version_repairs_enabled = true;

I think generally what I'd want is:

  • mixed major version: not allowed
  • mixed minor version: allowed

Could we change this to be more oriented around major versions, or do we also have a use case for considering any version difference?
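One way to express the major/minor distinction above is a check over the versions reported by the nodes taking part in a repair. This is a hypothetical sketch (the class and method names are not from the PR) that assumes plain `major.minor.patch` version strings. It treats only the first component as the major version; if 4.0 versus 4.1 should also count as a major difference, compare the first two components instead.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical helper: decides whether nodes in a repair span different versions.
final class MixedVersionCheck
{
    /** Extracts the leading (major) component of a version such as "4.1.3" or "5.0-beta1". */
    static int majorOf(String version)
    {
        int dot = version.indexOf('.');
        return Integer.parseInt(dot < 0 ? version : version.substring(0, dot));
    }

    /** True when the nodes span more than one major version; minor/patch skew is ignored. */
    static boolean hasMixedMajorVersions(Collection<String> versions)
    {
        return versions.stream().map(MixedVersionCheck::majorOf).distinct().count() > 1;
    }

    /** Stricter alternative: true when any two nodes report different versions at all. */
    static boolean hasAnyVersionDifference(Collection<String> versions)
    {
        return versions.stream().distinct().count() > 1;
    }

    public static void main(String[] args)
    {
        List<String> minorSkew = List.of("4.1.3", "4.1.5");
        List<String> majorSkew = List.of("4.1.5", "5.0.2");
        System.out.println(hasMixedMajorVersions(minorSkew));   // false
        System.out.println(hasAnyVersionDifference(minorSkew)); // true
        System.out.println(hasMixedMajorVersions(majorSkew));   // true
    }
}
```

The two predicates correspond to the two options in the question: block only across major versions, or block on any version difference.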

tolbertam commented Jan 7, 2025

Ah, I see this on the trunk patch:

> Mixed mode repairs and streaming adds many unknowns and additional performance impacts during upgrades.

Author

I can go either way. I know I'd actually like it for minor version differences too, but I can be convinced to restrict it to major versions only.

Also wondering whether we should make it apply to auto-repairs only? Currently it blocks both manual repairs and auto-repairs.
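To make the auto-repair-only question concrete, here is a minimal sketch of how the block decision could be scoped by repair origin. `RepairOrigin`, `MixedVersionRepairGuard` and the `autoRepairOnly` option are hypothetical names, not part of the PR, and mixed-version detection is passed in as a boolean rather than computed.

```java
// Hypothetical: where a repair request came from.
enum RepairOrigin { MANUAL, AUTO }

// Hypothetical guard combining the proposed flag with an optional auto-repair-only scope.
final class MixedVersionRepairGuard
{
    private final boolean mixedVersionRepairsEnabled; // mirrors mixed_version_repairs_enabled
    private final boolean autoRepairOnly;              // hypothetical scoping option

    MixedVersionRepairGuard(boolean mixedVersionRepairsEnabled, boolean autoRepairOnly)
    {
        this.mixedVersionRepairsEnabled = mixedVersionRepairsEnabled;
        this.autoRepairOnly = autoRepairOnly;
    }

    /** Returns true if a repair from the given origin should be rejected. */
    boolean shouldBlock(RepairOrigin origin, boolean clusterHasMixedVersions)
    {
        // Mixed-version repairs allowed, or no version skew: nothing to block.
        if (mixedVersionRepairsEnabled || !clusterHasMixedVersions)
            return false;
        // When scoped to auto-repair, manual repairs are still let through.
        return !autoRepairOnly || origin == RepairOrigin.AUTO;
    }
}
```

With `autoRepairOnly = true`, an operator could still run a deliberate manual repair during an upgrade while the scheduler stays paused.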

/**
* Guardrail disabling repairs when there are mixed versions
*/
public static final DisableFlag mixedRepairsEnabled =

Curious why this uses `DisableFlag` in 4.1 but `EnableFlag` on trunk?

Author

That's a change in the guardrails framework: in 4.1 it was `DisableFlag`, while on trunk it was changed to `EnableFlag`, and `DisableFlag` doesn't exist anymore.
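The practical difference between the two guardrail styles is the polarity of the predicate. The stand-ins below (`DisableStyleFlag`, `EnableStyleFlag` and `MixedRepairFlags` are hypothetical, not Cassandra's classes) assume the 4.1 flag is built from an "is this disabled?" check and the trunk flag from an "is this enabled?" check, so porting the guardrail means negating the config lookup, not the config value.

```java
import java.util.function.BooleanSupplier;

// Simplified stand-ins. Cassandra's real flags also take a name and client state
// and fail the operation when the feature is off; that part is omitted here.
final class DisableStyleFlag
{
    private final BooleanSupplier disabled; // 4.1 style: "is the feature disabled?"
    DisableStyleFlag(BooleanSupplier disabled) { this.disabled = disabled; }
    boolean allows() { return !disabled.getAsBoolean(); }
}

final class EnableStyleFlag
{
    private final BooleanSupplier enabled; // trunk style: "is the feature enabled?"
    EnableStyleFlag(BooleanSupplier enabled) { this.enabled = enabled; }
    boolean allows() { return enabled.getAsBoolean(); }
}

final class MixedRepairFlags
{
    // Stand-in for the mixed_version_repairs_enabled config field.
    static volatile boolean mixedVersionRepairsEnabled = true;

    // The same config field drives both styles; only the predicate's polarity flips.
    static final DisableStyleFlag LEGACY = new DisableStyleFlag(() -> !mixedVersionRepairsEnabled);
    static final EnableStyleFlag TRUNK = new EnableStyleFlag(() -> mixedVersionRepairsEnabled);
}
```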

jaydeepkumar1984 pushed a commit that referenced this pull request Jan 28, 2026
## Summary
This diff:

- Implements the read flow to the view table when the option is enabled
- Adds `ViewRowComparison.compare` to ViewUtils; the method takes the base table row and the view table row and returns a comparison result

This flow is for debugging purposes only and won't be used in making the
rebuild decision.
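As a rough illustration of a base/view row comparison, here is a simplified model in which a row is a map from column name to a value plus write timestamp, and `null` stands for an absent or deleted row. It is not the code in ViewUtils: `ViewRowCompareSketch` is hypothetical, it covers only four of the statuses that appear in the `rebuildMVKey` log lines of the test plan (IDENTICAL, MISMATCH, MISSING, STALE_BASE_ABSENT), and it compares values only, whereas the logged output also reports timestamps.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical simplified model of comparing a base-table row with its view row.
final class ViewRowCompareSketch
{
    enum Status { IDENTICAL, MISMATCH, MISSING, STALE_BASE_ABSENT }

    /** A cell value with its write timestamp (timestamps are carried but not compared). */
    static final class Cell
    {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    /** Compares the columns selected by the view; a null row means absent or deleted. */
    static Status compare(Map<String, Cell> baseRow, Map<String, Cell> viewRow, Iterable<String> viewColumns)
    {
        if (baseRow == null)
            // Assumption: both rows absent is treated as consistent.
            return viewRow == null ? Status.IDENTICAL : Status.STALE_BASE_ABSENT;
        if (viewRow == null)
            return Status.MISSING;
        for (String column : viewColumns)
        {
            Cell base = baseRow.get(column);
            Cell view = viewRow.get(column);
            String baseValue = base == null ? null : base.value;
            String viewValue = view == null ? null : view.value;
            if (!Objects.equals(baseValue, viewValue))
                return Status.MISMATCH;
        }
        return Status.IDENTICAL;
    }
}
```

Passing only the view's selected columns is what lets an update to an unselected base column (such as `body` for `blogposts_mv`) still compare as IDENTICAL.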

TODO:

- Metrics on successfully applied mutations
- Metrics on the different comparison result types when view table reads are enabled

## Test Plan
Unit tests added.

### Tested locally with ccm:

Schema:

```
CREATE KEYSPACE stresscql WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '3'}  AND durable_writes = true;

CREATE TABLE stresscql.blogposts (
    domain text,
    published_date timeuuid,
    author text,
    body text,
    title text,
    url text,
    PRIMARY KEY (domain, published_date)
) WITH CLUSTERING ORDER BY (published_date DESC)
    AND additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.1
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = 'A table to hold blog posts'
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4', 'sstable_size_in_mb': '960'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND memtable = 'default'
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 10800
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'NONE'
    AND speculative_retry = '99p'
    AND strict_mv_consistency = true
    AND auto_repair = {'bootstrap_enabled': 'true', 'full_enabled': 'true', 'incremental_enabled': 'true', 'paxos_cleanup_enabled': 'true', 'preview_repaired_enabled': 'true', 'priority': '0'};

CREATE MATERIALIZED VIEW stresscql.blogposts_mv AS
    SELECT published_date, domain, author
    FROM stresscql.blogposts
    WHERE domain IS NOT NULL AND published_date IS NOT NULL AND author IS NOT NULL
    PRIMARY KEY (published_date, domain)
 WITH CLUSTERING ORDER BY (domain ASC)
    AND additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND memtable = 'default'
    AND crc_check_chance = 1.0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'NONE'
    AND speculative_retry = '99p'
    AND auto_repair = {'bootstrap_enabled': 'true', 'full_enabled': 'true', 'incremental_enabled': 'true', 'paxos_cleanup_enabled': 'true', 'preview_repaired_enabled': 'true', 'priority': '0'};

CREATE MATERIALIZED VIEW stresscql.blogposts_mv2 AS
    SELECT author, published_date, domain, body
    FROM stresscql.blogposts
    WHERE domain IS NOT NULL AND published_date IS NOT NULL AND author IS NOT NULL
    PRIMARY KEY (author, published_date, domain)
 WITH CLUSTERING ORDER BY (published_date DESC, domain ASC)
    AND additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND memtable = 'default'
    AND crc_check_chance = 1.0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p'
    AND auto_repair = {'bootstrap_enabled': 'true', 'full_enabled': 'true', 'incremental_enabled': 'true', 'paxos_cleanup_enabled': 'true', 'preview_repaired_enabled': 'true', 'priority': '0'};

```

Data inserted:

```
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('news.com', dbda4b90-dc7b-11f0-86e8-7d562eb05c96, 'Eve', 'Post 5 body', 'Title 5', 'http://news.com/2');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('news.com', daa8a960-dc7b-11f0-86e8-7d562eb05c96, 'Dana', 'Post 4 body', 'Title 4', 'http://news.com/1');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('dev.org', dbdfc9d0-dc7b-11f0-86e8-7d562eb05c96, 'Jack', 'Post 10 body', 'Title 10', 'http://dev.org/2');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('dev.org', dbde9150-dc7b-11f0-86e8-7d562eb05c96, 'Ivy', 'Post 9 body', 'Title 9', 'http://dev.org/1');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('tech.io', dbdda6f0-dc7b-11f0-86e8-7d562eb05c96, 'Henry', 'Post 8 body', 'Title 8', 'http://tech.io/3');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('tech.io', dbdce3a0-dc7b-11f0-86e8-7d562eb05c96, 'Grace', 'Post 7 body', 'Title 7', 'http://tech.io/2');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('tech.io', dbdbf940-dc7b-11f0-86e8-7d562eb05c96, 'Frank', 'Post 6 body', 'Title 6', 'http://tech.io/1');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('blog.com', daa749d0-dc7b-11f0-86e8-7d562eb05c96, 'Charlie', 'Post 3 body', 'Title 3', 'http://blog.com/3');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('blog.com', d975a7a0-dc7b-11f0-86e8-7d562eb05c96, 'Bob', 'Post 2 body', 'Title 2', 'http://blog.com/2');
INSERT INTO stresscql.blogposts (domain, published_date, author, body, title, url)
VALUES ('blog.com', d8442c80-dc7b-11f0-86e8-7d562eb05c96, 'Alice', 'Post 1 body', 'Title 1', 'http://blog.com/1');
```

Dropped the MV updates for the following base table mutations:

```
UPDATE stresscql.blogposts SET author='author_updated' WHERE domain='news.com' AND published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96;
UPDATE stresscql.blogposts SET body='body_updated' WHERE domain='news.com' AND published_date=daa8a960-dc7b-11f0-86e8-7d562eb05c96;
DELETE FROM stresscql.blogposts WHERE domain='blog.com' AND published_date=daa749d0-dc7b-11f0-86e8-7d562eb05c96;
INSERT INTO stresscql.blogposts (domain, published_date, author, body) VALUES ('addition1', 6b5a70b0-dc7c-11f0-86e8-7d562eb05c96, 'addition_author1', 'addition_body1');
INSERT INTO stresscql.blogposts (domain, published_date, author, body) VALUES ('addition2', 6b5a70b0-dc7c-11f0-86e8-7d562eb05c95, 'addition_author2', 'addition_body2');
UPDATE stresscql.blogposts USING TTL 60 SET author='author_updated' WHERE domain='addition2' AND published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c95;
```

Run the following to trigger repair:

```
delete from stresscql.blogposts_mv where domain='news.com' AND published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv where domain='blog.com' AND published_date=daa749d0-dc7b-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv where domain='addition1' AND published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv where domain='news.com' AND published_date=daa8a960-dc7b-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv where domain='addition2' AND published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c95;
delete from stresscql.blogposts_mv2 where author='Eve' AND domain='news.com' AND published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv2 where author='Charlie' AND domain='blog.com' AND published_date=daa749d0-dc7b-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv2 where author='addition_author1' AND domain='addition1' AND published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv2 where author='Dana' AND domain='news.com' AND published_date=daa8a960-dc7b-11f0-86e8-7d562eb05c96;
delete from stresscql.blogposts_mv2 where author='author_updated' AND domain='addition2' AND published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c95;
delete from stresscql.blogposts_mv2 where author='author_updated' AND domain='news.com' AND published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96;
```

The following appears in the log:

```
INFO  [Native-Transport-Requests-1] 2026-01-14 11:12:35,292 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv pk=(published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96, domain=news.com) status=MISMATCH author: {val='author_updated'->'Eve' ts=1768417593372000->1768417544229000}
INFO  [Native-Transport-Requests-3] 2026-01-14 11:12:35,298 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv pk=(published_date=daa749d0-dc7b-11f0-86e8-7d562eb05c96, domain=blog.com) status=STALE_BASE_ABSENT base row is tombstone but view row exists
INFO  [Native-Transport-Requests-3] 2026-01-14 11:12:35,302 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv pk=(published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c96, domain=addition1) status=MISSING view row not found
INFO  [Native-Transport-Requests-1] 2026-01-14 11:12:35,306 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv pk=(published_date=daa8a960-dc7b-11f0-86e8-7d562eb05c96, domain=news.com) status=IDENTICAL 
INFO  [Native-Transport-Requests-3] 2026-01-14 11:12:35,312 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv pk=(published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c95, domain=addition2) status=MISSING view row not found
INFO  [Native-Transport-Requests-2] 2026-01-14 11:12:35,317 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv2 pk=(author=Eve, published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96, domain=news.com) status=STALE_VALUE_CHANGED NonPkCol=author stale record, base=author_updated (ts=1768417593372000, ttl=0, delTime=2147483647), view=Eve (rowTs=1768417544229000, rowTtl=0, rowExpTime=2147483647)
INFO  [Native-Transport-Requests-2] 2026-01-14 11:12:35,321 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv2 pk=(author=Charlie, published_date=daa749d0-dc7b-11f0-86e8-7d562eb05c96, domain=blog.com) status=STALE_BASE_ABSENT base row is tombstone but view row exists
INFO  [Native-Transport-Requests-1] 2026-01-14 11:12:35,325 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv2 pk=(author=addition_author1, published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c96, domain=addition1) status=MISSING view row not found
INFO  [Native-Transport-Requests-3] 2026-01-14 11:12:35,329 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv2 pk=(author=Dana, published_date=daa8a960-dc7b-11f0-86e8-7d562eb05c96, domain=news.com) status=MISMATCH body: {val='body_updated'->'Post 4 body' ts=1768417593379000->1768417544296000}
INFO  [Native-Transport-Requests-1] 2026-01-14 11:12:35,333 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv2 pk=(author=author_updated, published_date=6b5a70b0-dc7c-11f0-86e8-7d562eb05c95, domain=addition2) status=CONSISTENT_FILTERED_NONPK_COLUMN NonPKCol=author is dead
INFO  [Native-Transport-Requests-1] 2026-01-14 11:12:35,760 ModificationStatement.java:605 - rebuildMVKey view=blogposts_mv2 pk=(author=author_updated, published_date=dbda4b90-dc7b-11f0-86e8-7d562eb05c96, domain=news.com) status=MISSING view row not found
```
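Until the comparison-result metrics in the TODO exist, a rough per-status count can be pulled from log lines in the format shown above. `RebuildLogTally` is a hypothetical helper that assumes the `rebuildMVKey view=... pk=(...) status=...` layout stays as printed here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts rebuildMVKey comparison statuses per view from log lines.
final class RebuildLogTally
{
    // Group 1 = view name, group 2 = status token right after "status=".
    private static final Pattern LINE =
        Pattern.compile("rebuildMVKey view=(\\S+) pk=\\(.*?\\) status=(\\S+)");

    /** Returns view -> (status -> count); lines that do not match are ignored. */
    static Map<String, Map<String, Integer>> tally(List<String> lines)
    {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String line : lines)
        {
            Matcher m = LINE.matcher(line);
            if (!m.find())
                continue;
            counts.computeIfAbsent(m.group(1), v -> new TreeMap<>())
                  .merge(m.group(2), 1, Integer::sum);
        }
        return counts;
    }
}
```

Run over the eleven log lines above, this would report, for `blogposts_mv`, 2 MISSING and 1 each of MISMATCH, STALE_BASE_ABSENT and IDENTICAL; for `blogposts_mv2`, 2 MISSING and 1 each of STALE_VALUE_CHANGED, STALE_BASE_ABSENT, MISMATCH and CONSISTENT_FILTERED_NONPK_COLUMN.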

## Jira Issues
T3-CAAS-699

---------

Co-authored-by: Yuqi Yan <yukei0509@gmail.com>