Skip to content

CASSANDRA-21174: Prevent log-incompatible upgrades#4614

Open
aparna0522 wants to merge 1 commit intoapache:trunkfrom
aparna0522:anaik-163631557
Open

CASSANDRA-21174: Prevent log-incompatible upgrades#4614
aparna0522 wants to merge 1 commit intoapache:trunkfrom
aparna0522:anaik-163631557

Conversation

@aparna0522
Copy link
Contributor

Thanks for sending a pull request! Here are some tips if you're new here:

  • Ensure you have added or run the appropriate tests for your PR.
  • Be sure to keep the PR description updated to reflect all changes.
  • Write your PR title to summarize what this PR proposes.
  • If possible, provide a concise example to reproduce the issue for a faster review.
  • Read our contributor guidelines
  • If you're making a documentation change, see our guide to documentation contribution

Commit messages should follow the following format:

<One sentence description, usually Jira title or CHANGES.txt summary>

<Optional lengthier description (context on patch)>

patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####

Co-authored-by: Name1 <email1>
Co-authored-by: Name2 <email2>

The Cassandra Jira

@aparna0522 aparna0522 marked this pull request as ready for review February 13, 2026 20:47
// This protects against restarting a node with an older binary.
Version clusterVersion = prev.directory.commonSerializationVersion;
Version newNodeVersion = nodeVersion.serializationVersion();
if (newNodeVersion.isBefore(clusterVersion))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn’t it better to have this check in the NodeVersion class to avoid duplicate code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I considered extracting this validation into NodeVersion but I think I might not go with it for the following few reasons:

  1. Separation of concerns: This check compares nodeVersion.serializationVersion() against prev.directory.commonSerializationVersion. The cluster's common version is external context that NodeVersion shouldn't own since it's a cluster-level property derived from the directory, not aa single node's version property.
  2. Context-specific semantics: The checks serve different purposes in each transformation. Register includes a guard that skips validation for the first node in a new cluster (empty directory), whereas Startup performs the check unconditionally since a restarting node always has an existing cluster to compare against.
  3. Avoiding inappropriate dependencies: NodeVersion is like a data class responsible for holding version information and providing serialization. Adding this validation would require NodeVersion to accept ClusterMetadata or Directory as a parameter and return transformation-specific types like Rejected. This would couple a low-level data model to higher-level coordination abstractions. It could also inadveertently cause circular dependency risks, since Directory already contains Map<NodeId, NodeVersion>.

If you have any other thoughts I am happy to discuss.

if (newNodeVersion.isBefore(clusterVersion))
{
return new Rejected(INVALID,
String.format("Cannot register node: this node's metadata serialization version %s " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to recover from this situation? Would it be possible to include the recovery strategy in this message or a log message?

Copy link
Contributor Author

@aparna0522 aparna0522 Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recovery strategy here is to upgrade the joining node to a cassandra version that supports the required metadata serialization version, so usually you can think of it as
Node Cassandra version >= minimum serialization version.
I felt this was reasonably implied by the error message, but I'm open to adding extra level details or hints if you feel it would helpful. What level of detail do you have in mind?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants