Skip to content

Conversation

@krishvishal
Copy link
Contributor

@krishvishal krishvishal commented Feb 4, 2026

The snapshot system has three layers:

Snapshotable per-state-machine trait. Each StreamsInner, UsersInner, ConsumerGroupsInner implements to_snapshot() / from_snapshot() to convert between in-memory state.

SnapshotContributor visitor trait using the same recursive variadic tuple pattern as StateMachine::update. Walks (Users, (Streams, (ConsumerGroups, ()))) at compile time, collecting each state machine's serialized section into a Vec<SnapshotSection>.

MetadataSnapshot a top-level interface for the generic snapshot type on IggyMetadata. Defines the full lifecycle: create (snapshot from mux) → encode (to bytes) → decode (from bytes) → restore (back to mux).

@codecov
Copy link

codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 53.78422% with 287 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.46%. Comparing base (0aa1f38) to head (3bd27da).

Files with missing lines Patch % Lines
core/metadata/src/stm/stream.rs 13.28% 109 Missing and 2 partials ⚠️
core/metadata/src/stm/consumer_group.rs 39.21% 57 Missing and 5 partials ⚠️
core/metadata/src/stm/user.rs 62.89% 53 Missing and 6 partials ⚠️
core/metadata/src/stats/mod.rs 0.00% 24 Missing ⚠️
core/metadata/src/stm/snapshot.rs 80.80% 18 Missing and 1 partial ⚠️
core/metadata/src/impls/metadata.rs 82.05% 5 Missing and 2 partials ⚠️
core/metadata/src/stm/mux.rs 91.52% 0 Missing and 5 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2675      +/-   ##
============================================
- Coverage     68.62%   68.46%   -0.17%     
  Complexity      637      637              
============================================
  Files           734      735       +1     
  Lines         59660    60281     +621     
  Branches      56073    56694     +621     
============================================
+ Hits          40943    41269     +326     
- Misses        16717    16988     +271     
- Partials       2000     2024      +24     
Flag Coverage Δ
java 51.41% <ø> (ø)
rust 69.62% <53.78%> (-0.20%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
core/metadata/src/stm/mod.rs 50.00% <100.00%> (+8.20%) ⬆️
core/metadata/src/stm/mux.rs 92.30% <91.52%> (-2.43%) ⬇️
core/metadata/src/impls/metadata.rs 11.94% <82.05%> (+11.94%) ⬆️
core/metadata/src/stm/snapshot.rs 80.80% <80.80%> (ø)
core/metadata/src/stats/mod.rs 0.00% <0.00%> (ø)
core/metadata/src/stm/user.rs 35.21% <62.89%> (+35.21%) ⬆️
core/metadata/src/stm/consumer_group.rs 17.62% <39.21%> (+17.62%) ⬆️
core/metadata/src/stm/stream.rs 4.43% <13.28%> (+4.43%) ⬆️

... and 9 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

/// # Arguments
/// * `mux` - The multiplexing state machine containing all sub-state machines
/// * `commit_number` - The VSR commit number this snapshot corresponds to
fn create<T>(mux: &MuxStateMachine<T>, commit_number: u64) -> Result<Self, Self::Error>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We impl StateMachine for MuxStateMachine, therefor I think we could accept as first argument stm: &mut T

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// This is the interface that `MetadataHandle::Snapshot` must satisfy.
/// It provides methods for creating, encoding, decoding, and restoring snapshots.
#[allow(unused)]
pub trait MetadataSnapshot: Sized {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets rename this to Snapshot trait, if there already exists one (that I created as an stub), remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

where
T: StateMachine + SnapshotContributor;

/// Get the VSR commit number this snapshot corresponds to.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could use an sequence_number instead of commit number, the idea is that the sequence number would be monotonically growing on each snapshot, independent of the commit number from consensus.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

/// Consumer group member snapshot representation for serialization.
#[derive(Debug, Clone, Serialize, Deserialize)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need those intermediate structs, we can have the State structs impl Serialize and Deserialize directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't serialize/deserialize directly because some members of these structs can't be used with serde directly. Thats why I had to use intermediate structs.

}
}

/// Recursive case for variadic tuple pattern: (Head, Tail)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I am not sure if this is the simplest way to go around it.

I imagine the flow of abstraction to be identical to one of the StateMachine trait,
e.g we would have an Snapshotable trait, that would be implemented for everything that impls StateMachine trait,
since StateMachine trait is implemented both for MuxStateMachine aswell as the variadic!() tuple of state machines, the API would be called this way:

fn snapshot_metadata(&self, ....) {
    let mut snapshot = MetadataSnapshot::default(); <-- This would impl the `Snapshot` trait.
    
    // Here is the magic, `fill_snapshot` impl for `MuxStateMachine` would proxy to the `variadic!()` tuple impl,
    // impl Snapshotable for variadic!(St, ... Rest)
    //    where St: StateMachine + FillSnapshot, 
    //          Rest: Snapshotable,
    // {
    //    fn fill_snapshot<S: Snapshot>(&self, snapshot: &mut S) {
    //        self.0.do_fill_snapshot(snapshot);
    //        self.1.fill_snapshot(snapshot);
    //    } 
    
    //    fn restore_snapshot(...) {
    //        // ... Here similar code for restoring.
    //    }
    // }
    // You can choose whatever trait name for the `FillSnapshot` trait, smth that covers filling/recovering
    // You can have associated types on the `Snapshot` trait that would be used to represent the currently filled snapshot 
    // (for example a binary blob), and use those as arguments into the `fill_snapshot` and `restore_snapshot` methods.
    // and the snapshot from which we would restore State.
    self.mux.fill_snapshot(&mut snapshot);
}

The idea is to avoid using those `const SECTION:NAME: &'str`, rather rely on the type checking to perform the walk for us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could alternatively look into extending the StateMachine trait, by adding extra bounds on it

trait StateMachine + Snapshotable + ...
{
        
}

and then have single impl block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added type based recursion and snapshotting. Removed string based version.

pub topic_name_index: Vec<((String, String), Vec<usize>)>,
}

impl Snapshotable for ConsumerGroupsInner {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impl those for ConsumerGroup and Streams and Users, rather for their inners. You can create an metehod on the LeftRight, that exposes the read handle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// Contains metadata about the snapshot and a collection of sections,
/// where each section corresponds to one state machine's serialized state.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SnapshotEnvelope {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a valid approach, to store an type-erased bytes inside of the snapshot, but alternative approach (the one that redpanda did choose), is for the snapshot to store an intermediate representation of all the Snapshotable states and serialize those. So rather than serializing individual states, they serialize the snapshot instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think about it a lot and not sure what is the trade off space between those two approaches (I guess you need more allocations with their approach, as you have to copy the data first and then serialize), where with our approach, we just serialize the state and append it to the snapshot envelope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed SnapshotEnvelope. Now we store each snapshot struct and serialize it in one shot.

@hubcio hubcio changed the title feat(metadata): impl Snapshot interface for Mux state machine. feat(metadata): impl Snapshot interface for Mux state machine Feb 9, 2026
@krishvishal krishvishal force-pushed the snapshot-interface branch 2 times, most recently from f195436 to 9591421 Compare February 9, 2026 13:18
@krishvishal
Copy link
Contributor Author

Fixed merge conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants