fix: handle malformed configuration payloads without panics#2
Draft
anyasabo wants to merge 4 commits into
Draft
Conversation
Configuration decode failures in append and install-snapshot RPC paths should be handled as recoverable errors, not panics. Add non-panicking decode plumbing and targeted regression tests for malformed configuration payloads. Co-authored-by: Cursor <cursoragent@cursor.com>
…then tests - Fix remaining panicking DecodeConfiguration call in fsm.go applySingle (ConfigurationStore path) by using error-returning decodeConfiguration. - Add doc comment to unexported decodeConfiguration helper. - Strengthen AppendEntries malformed-config test: verify log-store contains the persisted entry, lastLog was not advanced, and configuration state was not mutated. - Strengthen InstallSnapshot malformed-config test: verify configuration state was not mutated by failed decode. Co-authored-by: Cursor <cursoragent@cursor.com>
Strengthen append/snapshot malformed configuration tests to assert configuration state immutability, and add runFSM coverage proving malformed configuration entries are skipped without stopping subsequent valid applies. Co-authored-by: Cursor <cursoragent@cursor.com>
Add a regression test proving NewRaft returns a decode error (instead of panicking) when replaying a persisted malformed LogConfiguration entry during startup. Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
PR #465 changed the legacy v1
decodePeerspath to return errors instead of panicking on malformed input. The v2+DecodeConfigurationcall sites still panicked on malformed bytes in key runtime paths.For a core distributed-systems library, panicking on malformed persisted or remote payloads is an availability risk. These paths should reject bad payloads and keep the process alive.
What changed
decodeConfiguration([]byte) (Configuration, error)as an error-returning counterpart toDecodeConfiguration.appendEntriesconfig-entry handling (processConfigurationLogEntry,LogConfigurationcase) to return errors instead of panicking.installSnapshot(SnapshotVersion > 0) to return RPC errors on malformed configuration payloads instead of panicking.runFSM/applySingle(ConfigurationStorepath) to log decode failures and skip malformedLogConfigurationentries instead of panicking.Behavior and invariants
AppendEntries (
LogConfiguration)AppendEntriesResponse.Success=false) instead of panicking.lastLogis not advanced on decode failure,InstallSnapshot (
SnapshotVersion > 0)InstallSnapshotResponse.Success=false) instead of panicking.FSM apply (
ConfigurationStore)LogConfigurationentries are logged and skipped (no panic).Startup replay (
NewRaftlog scan)LogConfigurationentry now returns a decode error instead of panicking while scanning configuration entries during startup.Ordering note
For
LogConfiguration, decode now happens before configuration-state mutation inprocessConfigurationLogEntry. This prevents partial state promotion when decode fails.The deprecated peer-log branch (
LogAddPeerDeprecated/LogRemovePeerDeprecated) is unchanged in this PR.Scope / follow-up
This PR intentionally focuses on panic-to-error behavior in decode paths.
In the append path where
StoreLogssucceeds and later config processing fails, behavior is now:lastLogcache may remain stale on that return path.Keeping
lastLogcache in sync in that failure path is tracked as a separate follow-up in PR #6.Why this is safe
DecodeConfigurationpanic semantics are preserved for callers that rely on panic-on-error behavior.Test plan
go test -run "TestRaft_(NewRaftMalformedConfigurationEntryInLogReturnsError|RunFSMMalformedConfigurationSkipsEntryAndContinues|AppendEntriesMalformedConfigurationReturnsError|InstallSnapshotMalformedConfigurationReturnsError)" -count=1 .go test -run "TestRaft_.*Configuration.*" -count=1 .