CASSANDRA-21174: Prevent log-incompatible upgrades#4614
CASSANDRA-21174: Prevent log-incompatible upgrades#4614aparna0522 wants to merge 1 commit intoapache:trunkfrom
Conversation
| // This protects against restarting a node with an older binary. | ||
| Version clusterVersion = prev.directory.commonSerializationVersion; | ||
| Version newNodeVersion = nodeVersion.serializationVersion(); | ||
| if (newNodeVersion.isBefore(clusterVersion)) |
There was a problem hiding this comment.
Isn’t it better to have this check in the NodeVersion class to avoid duplicate code?
There was a problem hiding this comment.
Thanks for the review. I considered extracting this validation into NodeVersion but I think I might not go with it for the following few reasons:
- Separation of concerns: This check compares nodeVersion.serializationVersion() against prev.directory.commonSerializationVersion. The cluster's common version is external context that NodeVersion shouldn't own since it's a cluster-level property derived from the directory, not aa single node's version property.
- Context-specific semantics: The checks serve different purposes in each transformation. Register includes a guard that skips validation for the first node in a new cluster (empty directory), whereas Startup performs the check unconditionally since a restarting node always has an existing cluster to compare against.
- Avoiding inappropriate dependencies: NodeVersion is like a data class responsible for holding version information and providing serialization. Adding this validation would require NodeVersion to accept ClusterMetadata or Directory as a parameter and return transformation-specific types like Rejected. This would couple a low-level data model to higher-level coordination abstractions. It could also inadveertently cause circular dependency risks, since Directory already contains Map<NodeId, NodeVersion>.
If you have any other thoughts I am happy to discuss.
| if (newNodeVersion.isBefore(clusterVersion)) | ||
| { | ||
| return new Rejected(INVALID, | ||
| String.format("Cannot register node: this node's metadata serialization version %s " + |
There was a problem hiding this comment.
How to recover from this situation? Would it be possible to include the recovery strategy in this message or a log message?
There was a problem hiding this comment.
The recovery strategy here is to upgrade the joining node to a cassandra version that supports the required metadata serialization version, so usually you can think of it as
Node Cassandra version >= minimum serialization version.
I felt this was reasonably implied by the error message, but I'm open to adding extra level details or hints if you feel it would helpful. What level of detail do you have in mind?
Thanks for sending a pull request! Here are some tips if you're new here:
Commit messages should follow the following format:
The Cassandra Jira