From d8683e440a98f6708b7e521dc3839dad94108200 Mon Sep 17 00:00:00 2001
From: Urjit Patel <105218041+Uzziee@users.noreply.github.com>
Date: Tue, 18 Nov 2025 12:18:40 +0530
Subject: [PATCH 01/17] hot reload feature proposal

Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com>
Signed-off-by: Urjit Patel
---
 proposals/012-hot-reload-feature.md | 905 ++++++++++++++++++++++++++++
 1 file changed, 905 insertions(+)
 create mode 100644 proposals/012-hot-reload-feature.md

diff --git a/proposals/012-hot-reload-feature.md b/proposals/012-hot-reload-feature.md
new file mode 100644
index 0000000..ca760fe
--- /dev/null
+++ b/proposals/012-hot-reload-feature.md
@@ -0,0 +1,905 @@
+# Hot Reload Feature
+
+As of today, any change to virtual cluster configs (addition/removal/modification) requires a full restart of the Kroxylicious application. This proposal adds a dynamic reload feature that enables operators to modify virtual cluster configurations (add/remove/modify clusters) while **maintaining service availability for unaffected clusters**, without a full application restart. This feature will transform Kroxylicious from a **"restart-to-configure"** system into a **"live-reconfiguration"** system.
+
+This proposal is structured as a multi-part implementation to ensure clear separation of concerns and manageable development phases.
+
+- Part 1: Configuration Change Detection - This part focuses on monitoring configuration files, parsing changes, and comparing old vs new configurations to identify exactly which virtual clusters need to be restarted. It provides a clean interface that returns structured change operations (additions, removals, modifications) without actually performing any restart operations.
+
+- Part 2: Graceful Virtual Cluster Restart - This part handles the actual restart operations, including graceful connection draining, in-flight message completion, and rollback mechanisms.
It takes the change decisions from Part 1 and executes them safely while ensuring minimal service disruption.
+
+# Part 1: Configuration Change Detection Framework
+
+With this framework, Kroxylicious will detect config file changes (using a standard file-watcher service) and, using various detector interfaces, determine which virtual clusters were added, removed, or modified. The list of affected clusters is then passed to Part 2 of this feature, where the clusters are gracefully restarted (or rolled back to the previous stable state if any failure occurs).
+
+POC PR - https://github.com/kroxylicious/kroxylicious/pull/2901
+
+## Core Classes & Structure
+
+1. **ConfigWatcherService** - File system monitoring and configuration loading orchestrator
+    - Monitors configuration file changes using Java NIO WatchService
+    - Parses YAML configuration files using the existing ConfigParser when changes are detected
+    - Provides graceful shutdown of executor services and watch resources
+    - Triggers configuration change callbacks asynchronously to initiate the hot-reload process
+    - Handles file parsing errors and continues monitoring
+    - `KafkaProxy.java` - As this is the entry point to the proxy app, this class will configure the callback that is triggered when there is a config change. This class will also be responsible for setting up the ConfigurationChangeHandler and ConfigWatcherService
+
+```
+public final class KafkaProxy implements AutoCloseable {
+    .....
+    public KafkaProxy(PluginFactoryRegistry pfr, Configuration config, Features features, Path configFilePath) {
+        .....
+        // Initialize configuration change handler with direct list of detectors
+        this.configurationChangeHandler = new ConfigurationChangeHandler(
+                List.of(
+                        new VirtualClusterChangeDetector(),
+                        new FilterChangeDetector()),
+                virtualClusterManager);
+    }
+
+    ...
+
+    public CompletableFuture<Void> startConfigurationWatcher(Path configFilePath) {
+        .....
+        this.configWatcherService = new ConfigWatcherService(
+                configFilePath,
+                this::handleConfigurationChange,
+                Duration.ofMillis(500) // 500ms debounce delay
+        );
+
+        return configWatcherService.start();
+    }
+    ...
+
+    public CompletableFuture<Void> stopConfigurationWatcher() {
+        if (configWatcherService == null) {
+            return CompletableFuture.completedFuture(null);
+        }
+        return configWatcherService.stop().thenRun(() -> {
+            configWatcherService = null;
+        });
+    }
+
+    private void handleConfigurationChange(Configuration newConfig) {
+        try {
+            Configuration newValidatedConfig = validate(newConfig, features);
+            Configuration oldConfig = this.config;
+
+            // Create models once to avoid excessive logging during change detection
+            List<VirtualClusterModel> oldModels = oldConfig.virtualClusterModel(pfr);
+            List<VirtualClusterModel> newModels = newValidatedConfig.virtualClusterModel(pfr);
+            ConfigurationChangeContext changeContext = new ConfigurationChangeContext(
+                    oldConfig, newValidatedConfig, oldModels, newModels);
+
+            // Delegate to the configuration change handler
+            configurationChangeHandler.handleConfigurationChange(changeContext)
+                    .thenRun(() -> {
+                        // Update the stored configuration after successful hot-reload
+                        this.config = newValidatedConfig;
+                        // Synchronize the virtualClusterModels with the new configuration to ensure consistency
+                        this.virtualClusterModels = newModels;
+                        LOGGER.info("Configuration and virtual cluster models successfully updated");
+                    });
+        }
+        catch (Exception e) {
+            LOGGER.error("Failed to validate or process configuration change", e);
+        }
+    }
+}
+```
+```
+public class ConfigWatcherService {
+    ...
+    // debounceDelay matches the third argument passed by KafkaProxy above
+    public ConfigWatcherService(Path configFilePath,
+                                Consumer<Configuration> onConfigurationChanged,
+                                Duration debounceDelay) {
+
+    }
+    ...
+    public CompletableFuture<Void> start() {}
+    public CompletableFuture<Void> stop() {}
+
+    // There will be more methods which will schedule a FileWatcher on the config path
+    // and trigger handleConfigurationChange() whenever there is a valid change
+
+    private void handleConfigurationChange() {
+        ...
+        onConfigurationChanged.accept(newConfiguration);
+        ...
+    }
+
+}
+```
+
+2. **ConfigurationChangeHandler** - Orchestrates the entire configuration change process from detection to execution with rollback capability.
+    - This handler accepts a list of detector interfaces, which run and identify which virtual clusters are affected.
+    - Once the list of clusters that need to be added/removed/restarted is known, this class calls the `VirtualClusterManager` methods to perform the additions/deletions/restarts (`VirtualClusterManager` is discussed in Part 2).
+    - This class also creates an instance of `ConfigurationChangeRollbackTracker`, which tracks the operations being applied. If any operation fails, the operations already performed can be reversed to restore the previous stable state.
+```
+public class ConfigurationChangeHandler {
+
+    public ConfigurationChangeHandler(List<ChangeDetector> changeDetectors,
+                                      VirtualClusterManager virtualClusterManager) {
+        this.changeDetectors = List.copyOf(changeDetectors);
+        this.virtualClusterManager = virtualClusterManager;
+        ...
+
+    /**
+     * Main entry point for handling configuration changes.
+     */
+    public CompletableFuture<Void> handleConfigurationChange(
+            ConfigurationChangeContext changeContext) {
+
+        // 1. Detect changes using all registered detectors
+        ChangeResult changes = detectChanges(changeContext);
+
+        if (!changes.hasChanges()) {
+            LOGGER.info("No changes detected - hot-reload not needed");
+            return CompletableFuture.completedFuture(null);
+        }
+
+        // 2. 
Process changes with rollback tracking
+        ConfigurationChangeRollbackTracker rollbackTracker = new ConfigurationChangeRollbackTracker();
+
+        return processConfigurationChanges(changes, changeContext, rollbackTracker)
+                .thenRun(() -> {
+                    LOGGER.info("Configuration hot-reload completed successfully - {} operations processed",
+                            changes.getTotalOperations());
+                })
+                .whenComplete((result, throwable) -> {
+                    if (throwable != null) {
+                        LOGGER.error("Configuration change failed - initiating rollback", throwable);
+                        performRollback(rollbackTracker);
+                    }
+                });
+    }
+
+    /**
+     * Coordinates multiple change detectors and aggregates their results.
+     */
+    private ChangeResult detectChanges(ConfigurationChangeContext context) {
+        Set<String> allClustersToRemove = new LinkedHashSet<>();
+        Set<String> allClustersToAdd = new LinkedHashSet<>();
+        Set<String> allClustersToModify = new LinkedHashSet<>();
+
+        changeDetectors.forEach(detector -> {
+            try {
+                ChangeResult detectorResult = detector.detectChanges(context);
+                allClustersToRemove.addAll(detectorResult.clustersToRemove());
+                allClustersToAdd.addAll(detectorResult.clustersToAdd());
+                allClustersToModify.addAll(detectorResult.clustersToModify());
+            }
+            catch (Exception e) {
+                LOGGER.error("Error in change detector '{}': {}", detector.getName(), e.getMessage(), e);
+                // Continue with other detectors even if one fails
+            }
+        });
+
+        return new ChangeResult(
+                new ArrayList<>(allClustersToRemove),
+                new ArrayList<>(allClustersToAdd),
+                new ArrayList<>(allClustersToModify));
+    }
+
+    /**
+     * Processes configuration changes in the correct order: Remove -> Modify -> Add
+     */
+    private CompletableFuture<Void> processConfigurationChanges(
+            ChangeResult changes,
+            ConfigurationChangeContext context,
+            ConfigurationChangeRollbackTracker rollbackTracker) {
+
+        // Sequential processing using stream.reduce() with CompletableFuture chaining
+        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
+
+        // All the below operations will happen by calling methods of 
`VirtualClusterManager`
+        // 1. Remove clusters first (to free up ports/resources)
+        // 2. Restart modified existing clusters
+        // 3. Add new clusters last
+
+        return chain;
+    }
+
+    /**
+     * Performs rollback of all successful operations in reverse order on failure
+     */
+    private CompletableFuture<Void> performRollback(ConfigurationChangeRollbackTracker tracker) {
+        // Rollback in reverse order: Added -> Modified -> Removed
+        // Remove clusters that were successfully added
+        // Restore clusters that were modified (revert to old configuration)
+        // Re-add clusters that were removed
+    }
+
+
+}
+```
+
+3. **ConfigurationChangeRollbackTracker** - This class maintains a record of all cluster operations (removals, modifications, additions) so they can be reversed if the overall configuration change fails.
+```
+public class ConfigurationChangeRollbackTracker {
+
+    /**
+     * Tracks a cluster removal operation.
+     */
+    public void trackRemoval(String clusterName, VirtualClusterModel removedModel) {}
+
+    /**
+     * Tracks a cluster modification operation.
+     */
+    public void trackModification(String clusterName, VirtualClusterModel originalModel, VirtualClusterModel newModel) {}
+
+    /**
+     * Tracks a cluster addition operation.
+     */
+    public void trackAddition(String clusterName, VirtualClusterModel addedModel) {}
+}
+```
+
+4. **ChangeDetector (Interface)** - Strategy pattern interface for different types of change detection.
+    - Currently there is only one implementation, `VirtualClusterChangeDetector`, which detects changes in virtual cluster models. In the future we can add a detector for filter changes, such as `FilterChangeDetector`.
+    - Provides a consistent API for comparing old vs new configurations via detectChanges()
+    - Returns structured ChangeResult objects with specific operations needed
+```
+public interface ChangeDetector {
+    /**
+     * Name of this change detector for logging and debugging.
+     */
+    String getName();
+
+    /**
+     * Detect configuration changes and return structured change information.
+     * @param context The configuration context containing old and new configurations
+     * @return ChangeResult containing categorized cluster operations
+     */
+    ChangeResult detectChanges(ConfigurationChangeContext context);
+}
+```
+```
+public class VirtualClusterChangeDetector implements ChangeDetector {
+
+    private static final Logger LOGGER = LoggerFactory.getLogger(VirtualClusterChangeDetector.class);
+
+    @Override
+    public String getName() {
+        return "VirtualClusterChangeDetector";
+    }
+
+    @Override
+    public ChangeResult detectChanges(ConfigurationChangeContext context) {
+        // Check for modified clusters using equals() comparison
+        List<String> modifiedClusters = findModifiedClusters(context);
+
+        // Check for new clusters (exist in new but not old)
+        List<String> newClusters = findNewClusters(context);
+
+        // Check for removed clusters (exist in old but not new)
+        List<String> removedClusters = findRemovedClusters(context);
+
+        return new ChangeResult(removedClusters, newClusters, modifiedClusters);
+    }
+    ...
+}
+```
+
+5. **ConfigurationChangeContext (Record)** - Immutable data container providing context for change detection.
+    - Holds the old and new Configuration objects, and the virtual cluster models built from them, for comparison
+```
+public record ConfigurationChangeContext(
+        Configuration oldConfig,
+        Configuration newConfig,
+        List<VirtualClusterModel> oldModels,
+        List<VirtualClusterModel> newModels
+) {}
+```
+
+6. **ChangeResult (Record)** - Contains lists of cluster names for each operation type (remove/add/modify)
+```
+public record ChangeResult(
+        List<String> clustersToRemove,
+        List<String> clustersToAdd,
+        List<String> clustersToModify
+) {}
+```
+
+## Flow diagram
+
+Image
+
+
+
+# Part 2: Graceful Virtual Cluster Restart
+
+Part 2 of the hot-reload implementation focuses on gracefully restarting virtual clusters.
This component receives structured change operations from Part 1 and executes them in a carefully orchestrated sequence: **connection draining → resource deregistration → new resource registration → connection restoration.**
+
+The design emphasizes minimal service disruption by ensuring that all in-flight Kafka requests complete before connections are closed (or until a timeout is hit).
+
+## Core Classes & Structure
+
+1. **VirtualClusterManager**
+    - **What it does** - Acts as the high-level orchestrator for all virtual cluster lifecycle operations during hot-reload. `ConfigurationChangeHandler` calls the `VirtualClusterManager` to restart/add/remove clusters when there is a config change
+    - **Key Responsibilities:**
+        - **Cluster Addition**: Takes a new `VirtualClusterModel` and brings it online by registering it using `EndpointRegistry`
+        - **Cluster Removal**: Safely takes down an existing cluster by first draining all connections gracefully, then deregistering it using `EndpointRegistry`
+        - **Cluster Restart**: Performs a complete cluster reconfiguration by removing the old version and adding the new version with updated settings
+        - **Rollback Integration**: Automatically tracks all successful operations so they can be undone if later operations fail
+```
+public class VirtualClusterManager {
+
+    ...
+    public VirtualClusterManager(EndpointRegistry endpointRegistry,
+                                 ConnectionDrainManager connectionDrainManager) {
+        this.endpointRegistry = endpointRegistry;
+        this.connectionDrainManager = connectionDrainManager;
+    }
+
+    /**
+     * Gracefully removes a virtual cluster by draining connections and deregistering endpoints.
+     */
+    public CompletableFuture<Void> removeVirtualCluster(String clusterName,
+                                                        List<VirtualClusterModel> oldModels,
+                                                        ConfigurationChangeRollbackTracker rollbackTracker) {
+        // 1. Find cluster model to remove
+        VirtualClusterModel clusterToRemove = findClusterModel(oldModels, clusterName);
+
+        // 2. 
Drain connections gracefully (30s timeout)
+        return connectionDrainManager.gracefullyDrainConnections(clusterName, Duration.ofSeconds(30))
+                .thenCompose(v -> {
+                    // 3. Deregister all gateways from endpoint registry
+                    var deregistrationFutures = clusterToRemove.gateways().values().stream()
+                            .map(gateway -> endpointRegistry.deregisterVirtualCluster(gateway))
+                            .toArray(CompletableFuture[]::new);
+
+                    return CompletableFuture.allOf(deregistrationFutures);
+                })
+                .thenRun(() -> {
+                    // 4. Track removal for potential rollback
+                    rollbackTracker.trackRemoval(clusterName, clusterToRemove);
+                    LOGGER.info("Successfully removed virtual cluster '{}'", clusterName);
+                });
+    }
+
+    /**
+     * Restarts a virtual cluster with new configuration (remove + add).
+     */
+    public CompletableFuture<Void> restartVirtualCluster(String clusterName,
+                                                         List<VirtualClusterModel> oldModels,
+                                                         List<VirtualClusterModel> newModels,
+                                                         ConfigurationChangeRollbackTracker rollbackTracker) {
+        VirtualClusterModel oldModel = findClusterModel(oldModels, clusterName);
+        VirtualClusterModel newModel = findClusterModel(newModels, clusterName);
+
+        // Step 1: Remove existing cluster (drain + deregister)
+        return removeVirtualCluster(clusterName, oldModels, rollbackTracker)
+                .thenCompose(v -> {
+                    // Step 2: Add new cluster with updated configuration
+                    return addVirtualCluster(clusterName, List.of(newModel), rollbackTracker);
+                })
+                .thenRun(() -> {
+                    // Step 3: Track modification and stop draining
+                    rollbackTracker.trackModification(clusterName, oldModel, newModel);
+                    connectionDrainManager.stopDraining(clusterName);
+                    LOGGER.info("Successfully restarted virtual cluster '{}' with new configuration", clusterName);
+                });
+    }
+
+    /**
+     * Adds a new virtual cluster by registering endpoints and enabling connections.
+     */
+    public CompletableFuture<Void> addVirtualCluster(String clusterName,
+                                                     List<VirtualClusterModel> newModels,
+                                                     ConfigurationChangeRollbackTracker rollbackTracker) {
+        VirtualClusterModel newModel = findClusterModel(newModels, clusterName);
+
+        return registerVirtualCluster(newModel)
+                .thenRun(() -> {
+                    // Stop draining to allow new connections
+                    connectionDrainManager.stopDraining(clusterName);
+                    rollbackTracker.trackAddition(clusterName, newModel);
+                    LOGGER.info("Successfully added new virtual cluster '{}'", clusterName);
+                });
+    }
+
+    /**
+     * Registers all gateways for a virtual cluster with the endpoint registry.
+     */
+    private CompletableFuture<Void> registerVirtualCluster(VirtualClusterModel model) {
+        LOGGER.info("Registering virtual cluster '{}' with {} gateways",
+                model.getClusterName(), model.gateways().size());
+
+        var registrationFutures = model.gateways().values().stream()
+                .map(gateway -> endpointRegistry.registerVirtualCluster(gateway))
+                .toArray(CompletableFuture[]::new);
+
+        return CompletableFuture.allOf(registrationFutures)
+                .thenRun(() -> LOGGER.info("Successfully registered virtual cluster '{}' with all gateways",
+                        model.getClusterName()));
+    }
+}
+```
+
+2. **ConnectionDrainManager**
+    - **What it does** - Implements the graceful connection draining strategy during cluster restarts. This is what makes hot-reload "graceful" - it ensures that client requests in progress are completed rather than dropped.
+    - **Key Responsibilities:**
+        - **Draining Mode Control**: Starts/stops "draining mode" where new connections are rejected but existing ones continue
+        - **Backpressure Strategy**: Applies backpressure by disabling `autoRead` on downstream channels while keeping upstream channels active. This way no new client messages are accepted, while the upstream channel stays open so that existing in-flight requests are delivered to Kafka and their responses are delivered back to the client.
+        - **In-Flight Monitoring**: Continuously monitors pending Kafka requests and waits for them to complete before closing connections. This is done using the `InFlightMessageTracker` class.
+    - **Explanation of the draining strategy**
+        - **Phase 1: Initiate Draining Mode** - Set the cluster to "draining mode", in which any new connection attempts are rejected. Then proceed to gracefully close the existing connections.
+        ```
+        public CompletableFuture<Void> gracefullyDrainConnections(String clusterName, Duration totalTimeout) {
+            // 1. Get current connection and message state
+            int totalConnections = connectionTracker.getTotalConnectionCount(clusterName);
+            int totalInFlight = inFlightTracker.getTotalPendingRequestCount(clusterName);
+
+            LOGGER.info("Starting graceful drain for cluster '{}' with {} connections and {} in-flight requests ({}s timeout)",
+                    clusterName, totalConnections, totalInFlight, totalTimeout.getSeconds());
+
+            // 2. Enter draining mode - reject new connections
+            return startDraining(clusterName)
+                    .thenCompose(v -> {
+                        if (totalConnections == 0) {
+                            // Fast path: no connections to drain
+                            return CompletableFuture.completedFuture(null);
+                        } else {
+                            // Proceed with connection closure
+                            return gracefullyCloseConnections(clusterName, totalTimeout);
+                        }
+                    });
+        }
+
+        public CompletableFuture<Void> startDraining(String clusterName) {
+            drainingClusters.put(clusterName, new AtomicBoolean(true));
+            return CompletableFuture.completedFuture(null);
+        }
+        ```
+        - **Phase 2: Apply Backpressure Strategy** - We set `autoRead = false` only on the downstream channel, so that no new client messages are read. The `ConnectionTracker` class tracks which downstream/upstream channels are active for a given cluster name.
+            - **Downstream (Client→Proxy)** - `autoRead = false` - Prevents clients from sending NEW requests while allowing existing requests to complete
+            - **Upstream (Proxy→Kafka)** - `autoRead = true` - Allows Kafka responses to flow back to complete pending requests.
In-flight request count decreases naturally as responses arrive
+        ```
+        public CompletableFuture<Void> gracefullyCloseConnections(String clusterName, Duration timeout) {
+            // 1. Get separate channel collections
+            Set<Channel> downstreamChannels = connectionTracker.getDownstreamActiveChannels(clusterName);
+            Set<Channel> upstreamChannels = connectionTracker.getUpstreamActiveChannels(clusterName);
+
+            // 2. Apply different strategies to different channel types
+            var allCloseFutures = new ArrayList<CompletableFuture<Void>>();
+
+            // Add downstream channel close futures
+            downstreamChannels.stream()
+                    .map(this::disableAutoReadOnDownstreamChannel)
+                    .map(channel -> gracefullyCloseChannel(channel, clusterName, timeout, "DOWNSTREAM"))
+                    .forEach(allCloseFutures::add);
+
+            // Add upstream channel close futures
+            upstreamChannels.stream()
+                    .map(channel -> gracefullyCloseChannel(channel, clusterName, timeout, "UPSTREAM"))
+                    .forEach(allCloseFutures::add);
+
+            return CompletableFuture.allOf(allCloseFutures.toArray(new CompletableFuture[0]));
+        }
+
+        private Channel disableAutoReadOnDownstreamChannel(Channel downstreamChannel) {
+            try {
+                if (downstreamChannel.isActive()) {
+                    // Get the KafkaProxyFrontendHandler from the channel pipeline
+                    KafkaProxyFrontendHandler frontendHandler = downstreamChannel.pipeline().get(KafkaProxyFrontendHandler.class);
+                    if (frontendHandler != null) {
+                        frontendHandler.applyBackpressure();
+                        LOGGER.debug("Applied backpressure via frontend handler for channel: L:/{}, R:/{}",
+                                downstreamChannel.localAddress(), downstreamChannel.remoteAddress());
+                    }
+                    else {
+                        LOGGER.debug("Manually applying backpressure for channel: L:/{}, R:/{}",
+                                downstreamChannel.localAddress(), downstreamChannel.remoteAddress());
+                        // Fallback to manual method if handler not found
+                        downstreamChannel.config().setAutoRead(false);
+                    }
+                }
+            }
+            catch (Exception e) {
+                LOGGER.warn("Failed to disable autoRead for downstream channel L:/{}, R:/{} - continuing with drain",
+                        downstreamChannel.localAddress(), 
downstreamChannel.remoteAddress(), e);
+            }
+            return downstreamChannel;
+        }
+        ```
+
+        - **Phase 3: Monitor In-Flight Message Completion and close channel** - While draining, check the in-flight count every 100ms and wait for it to reach zero naturally. If the count does not reach zero (for example, because the underlying Kafka cluster has gone down), force-close the channel once the timeout expires to prevent indefinite hangs. As soon as the in-flight count reaches zero (or the timeout expires), close the channel immediately.
+        ```
+        // Parameter order matches the call sites in gracefullyCloseConnections
+        private CompletableFuture<Void> gracefullyCloseChannel(Channel channel, String clusterName,
+                                                               Duration timeout, String channelType) {
+            CompletableFuture<Void> future = new CompletableFuture<>();
+            long startTime = System.currentTimeMillis();
+            long timeoutMillis = timeout.toMillis();
+
+            // Schedule timeout
+            ScheduledFuture<?> timeoutTask = scheduler.schedule(() -> {
+                if (!future.isDone()) {
+                    LOGGER.warn("Graceful shutdown timeout exceeded for {} channel L:/{}, R:/{} in cluster '{}' - forcing immediate closure",
+                            channelType, channel.localAddress(), channel.remoteAddress(), clusterName);
+                    closeChannelImmediately(channel, future);
+                }
+            }, timeoutMillis, TimeUnit.MILLISECONDS);
+
+            // Schedule periodic checks for in-flight messages
+            ScheduledFuture<?> checkTask = scheduler.scheduleAtFixedRate(() -> {
+                try {
+                    if (future.isDone()) {
+                        return;
+                    }
+
+                    int pendingRequests = inFlightTracker.getPendingRequestCount(clusterName, channel);
+                    long elapsed = System.currentTimeMillis() - startTime;
+
+                    if (pendingRequests == 0) {
+                        LOGGER.info("In-flight messages cleared for {} channel L:/{}, R:/{} in cluster '{}' - proceeding with connection closure ({}ms elapsed)",
+                                channelType, channel.localAddress(), channel.remoteAddress(), clusterName, elapsed);
+                        closeChannelImmediately(channel, future);
+                    }
+                    else {
+                        // Just wait for existing in-flight messages to complete naturally
+                        // Do NOT call channel.read() as it would trigger processing of new messages
+                        int totalPending = 
inFlightTracker.getTotalPendingRequestCount(clusterName);
+                        LOGGER.debug("Waiting for {} channel L:/{}, R:/{} in cluster '{}' to drain: {} pending requests (cluster total: {}, {}ms elapsed)",
+                                channelType, channel.localAddress(), channel.remoteAddress(), clusterName, pendingRequests, totalPending, elapsed);
+                    }
+                }
+                catch (Exception e) {
+                    LOGGER.error("Unexpected error during graceful shutdown monitoring for channel L:/{}, R:/{} in cluster '{}'",
+                            channel.localAddress(), channel.remoteAddress(), clusterName, e);
+                    future.completeExceptionally(e);
+                }
+            }, 50, 100, TimeUnit.MILLISECONDS); // First check after 50ms, then every 100ms for faster response
+
+            // Cancel scheduled tasks when future completes and log final result
+            future.whenComplete((result, throwable) -> {
+                timeoutTask.cancel(false);
+                checkTask.cancel(false);
+
+                if (throwable == null) {
+                    LOGGER.info("Successfully completed graceful shutdown of {} channel L:/{}, R:/{} in cluster '{}'",
+                            channelType, channel.localAddress(), channel.remoteAddress(), clusterName);
+                }
+                else {
+                    LOGGER.error("Graceful shutdown failed for {} channel L:/{}, R:/{} in cluster '{}': {}",
+                            channelType, channel.localAddress(), channel.remoteAddress(), clusterName, throwable.getMessage());
+                }
+            });
+
+            return future;
+        }
+
+        private void closeChannelImmediately(Channel channel, CompletableFuture<Void> future) {
+            if (future.isDone()) {
+                return;
+            }
+
+            channel.close().addListener(channelFuture -> {
+                if (channelFuture.isSuccess()) {
+                    future.complete(null);
+                }
+                else {
+                    future.completeExceptionally(channelFuture.cause());
+                }
+            });
+        }
+        ```
+    - **How will drain mode reject new client connections?** - We will add a check in the `KafkaProxyFrontendHandler#channelActive` method that rejects new connections if the cluster is in drain mode.
+        ```
+        public class KafkaProxyFrontendHandler
+                extends ChannelInboundHandlerAdapter
+                implements NetFilter.NetFilterContext {
+            ....
+            @Override
+            public void channelActive(ChannelHandlerContext ctx) throws Exception {
+                this.clientCtx = ctx;
+
+                // Check if we should accept this connection (not draining)
+                String clusterName = virtualClusterModel.getClusterName();
+                if (connectionDrainManager != null && !connectionDrainManager.shouldAcceptConnection(clusterName)) {
+                    LOGGER.info("Rejecting new connection for draining cluster '{}'", clusterName);
+                    ctx.close();
+                    return;
+                }
+
+                this.proxyChannelStateMachine.onClientActive(this);
+                super.channelActive(this.clientCtx);
+            }
+            ....
+        }
+        ```
+
+3. **ConnectionTracker**
+    - **What it does** - Maintains real-time inventory of all active network connections per virtual cluster. You can't gracefully drain connections if you don't know what connections exist - this class provides that visibility.
+    - **Key Responsibilities:**
+        - **Bidirectional Tracking**: Separately tracks downstream connections (client→proxy) and upstream connections (proxy→Kafka)
+        - **Channel Management**: Maintains collections of active `Channel` objects for bulk operations like graceful closure
+        - **Lifecycle Integration**: Integrates with `ProxyChannelStateMachine` to automatically track connection establishment and closure
+        - **Cleanup Logic**: Automatically removes references to closed channels and cleans up empty cluster entries
+```
+public class ConnectionTracker {
+
+    // Downstream connections (client → proxy)
+    private final Map<String, AtomicInteger> downstreamConnections = new ConcurrentHashMap<>();
+    private final Map<String, Set<Channel>> downstreamChannelsByCluster = new ConcurrentHashMap<>();
+
+    // Upstream connections (proxy → target Kafka cluster)
+    private final Map<String, AtomicInteger> upstreamConnections = new ConcurrentHashMap<>();
+    private final Map<String, Set<Channel>> upstreamChannelsByCluster = new ConcurrentHashMap<>();
+
+    public void onDownstreamConnectionEstablished(String clusterName, Channel channel) {
+        downstreamConnections.computeIfAbsent(clusterName, k -> new AtomicInteger(0)).incrementAndGet();
downstreamChannelsByCluster.computeIfAbsent(clusterName, k -> ConcurrentHashMap.newKeySet()).add(channel);
+    }
+
+    public void onDownstreamConnectionClosed(String clusterName, Channel channel) {
+        onConnectionClosed(clusterName, channel, downstreamConnections, downstreamChannelsByCluster);
+    }
+
+
+    /**
+     * Called by ConnectionDrainManager
+     */
+    public Set<Channel> getDownstreamActiveChannels(String clusterName) {
+        Set<Channel> channels = downstreamChannelsByCluster.get(clusterName);
+        return channels != null ? Set.copyOf(channels) : Set.of();
+    }
+
+    // === UPSTREAM CONNECTION TRACKING ===
+    public void onUpstreamConnectionEstablished(String clusterName, Channel channel) {
+        upstreamConnections.computeIfAbsent(clusterName, k -> new AtomicInteger(0)).incrementAndGet();
+        upstreamChannelsByCluster.computeIfAbsent(clusterName, k -> ConcurrentHashMap.newKeySet()).add(channel);
+    }
+
+    public void onUpstreamConnectionClosed(String clusterName, Channel channel) {
+        onConnectionClosed(clusterName, channel, upstreamConnections, upstreamChannelsByCluster);
+    }
+
+    /**
+     * Called by ConnectionDrainManager
+     */
+    public Set<Channel> getUpstreamActiveChannels(String clusterName) {
+        Set<Channel> channels = upstreamChannelsByCluster.get(clusterName);
+        return channels != null ? Set.copyOf(channels) : Set.of();
+    }
+
+    /**
+     * Called by ConnectionDrainManager
+     */
+    public int getTotalConnectionCount(String clusterName) {
+        return getDownstreamActiveConnectionCount(clusterName) + getUpstreamActiveConnectionCount(clusterName);
+    }
+
+    /**
+     * Common method to remove a connection and clean up empty entries.
+     * This method decrements the connection counter and removes the channel from the set,
+     * cleaning up empty entries to prevent memory leaks.
+     */
+    private void onConnectionClosed(String clusterName, Channel channel,
+                                    Map<String, AtomicInteger> connectionCounters,
+                                    Map<String, Set<Channel>> channelsByCluster) {
+        // Decrement counter and remove if zero or negative
+        AtomicInteger counter = connectionCounters.get(clusterName);
+        if (counter != null) {
+            // Use the value returned by the decrement rather than re-reading the counter
+            if (counter.decrementAndGet() <= 0) {
+                connectionCounters.remove(clusterName);
+            }
+        }
+
+        // Remove channel from set and remove empty sets
+        Set<Channel> channels = channelsByCluster.get(clusterName);
+        if (channels != null) {
+            channels.remove(channel);
+            if (channels.isEmpty()) {
+                channelsByCluster.remove(clusterName);
+            }
+        }
+    }
+}
+```
+
+4. **InFlightMessageTracker**
+    - **What it does** - Tracks **pending Kafka requests** to ensure no messages are lost during connection closure. This enables the "wait for completion" strategy - connections are only closed after all pending requests have received responses.
+    - **Key Responsibilities:**
+        - **Request Tracking**: Increments counters when Kafka requests are sent upstream in `ProxyChannelStateMachine`
+        - **Response Tracking**: Decrements counters when Kafka responses are received in `ProxyChannelStateMachine`
+        - **Channel Cleanup**: Handles cleanup when channels close unexpectedly, adjusting counts appropriately
+```
+public class InFlightMessageTracker {
+
+    // Map from cluster name to channel to pending request count
+    private final Map<String, Map<Channel, AtomicInteger>> pendingRequests = new ConcurrentHashMap<>();
+
+    // Map from cluster name to total pending requests for quick lookup
+    private final Map<String, AtomicInteger> totalPendingByCluster = new ConcurrentHashMap<>();
+
+    /**
+     * Records that a request has been sent to the upstream cluster.
+     *
+     * @param clusterName The name of the virtual cluster.
+     * @param channel The channel handling the request.
+ */ + public void onRequestSent(String clusterName, Channel channel) { + pendingRequests.computeIfAbsent(clusterName, k -> new ConcurrentHashMap<>()) + .computeIfAbsent(channel, k -> new AtomicInteger(0)) + .incrementAndGet(); + + totalPendingByCluster.computeIfAbsent(clusterName, k -> new AtomicInteger(0)) + .incrementAndGet(); + } + + /** + * Records that a response has been received from the upstream cluster. + * + * @param clusterName The name of the virtual cluster. + * @param channel The channel handling the response. + */ + public void onResponseReceived(String clusterName, Channel channel) { + Map<Channel, AtomicInteger> clusterRequests = pendingRequests.get(clusterName); + if (clusterRequests != null) { + AtomicInteger channelCounter = clusterRequests.get(channel); + if (channelCounter != null) { + int remaining = channelCounter.decrementAndGet(); + if (remaining <= 0) { + clusterRequests.remove(channel); + if (clusterRequests.isEmpty()) { + pendingRequests.remove(clusterName); + } + } + + AtomicInteger totalCounter = totalPendingByCluster.get(clusterName); + if (totalCounter != null) { + int totalRemaining = totalCounter.decrementAndGet(); + if (totalRemaining <= 0) { + totalPendingByCluster.remove(clusterName); + } + } + } + } + } + + /** + * Records that a channel has been closed, clearing all pending requests for that channel. + * + * @param clusterName The name of the virtual cluster. + * @param channel The channel that was closed. 
+ */ + public void onChannelClosed(String clusterName, Channel channel) { + Map<Channel, AtomicInteger> clusterRequests = pendingRequests.get(clusterName); + if (clusterRequests != null) { + AtomicInteger channelCounter = clusterRequests.remove(channel); + if (channelCounter != null) { + int pendingCount = channelCounter.get(); + if (pendingCount > 0) { + // Subtract from total + AtomicInteger totalCounter = totalPendingByCluster.get(clusterName); + if (totalCounter != null) { + int newTotal = totalCounter.addAndGet(-pendingCount); + if (newTotal <= 0) { + totalPendingByCluster.remove(clusterName); + } + } + } + } + + if (clusterRequests.isEmpty()) { + pendingRequests.remove(clusterName); + } + } + } + + /** + * Gets the number of pending requests for a specific channel in a virtual cluster. + * + * @param clusterName The name of the virtual cluster. + * @param channel The channel. + * @return The number of pending requests. + */ + public int getPendingRequestCount(String clusterName, Channel channel) { + Map<Channel, AtomicInteger> clusterRequests = pendingRequests.get(clusterName); + if (clusterRequests != null) { + AtomicInteger counter = clusterRequests.get(channel); + return counter != null ? Math.max(0, counter.get()) : 0; + } + return 0; + } + + /** + * Gets the total number of pending requests for a virtual cluster across all channels. + * + * @param clusterName The name of the virtual cluster. + * @return The total number of pending requests. + */ + public int getTotalPendingRequestCount(String clusterName) { + AtomicInteger counter = totalPendingByCluster.get(clusterName); + return counter != null ? Math.max(0, counter.get()) : 0; + } +} +``` + +5. 
**Changes in ProxyChannelStateMachine** - We need to enhance the existing state machine for + - **Connection Lifecycle**: Automatically notifies ConnectionTracker when connections are established/closed + - **In-flight Message Tracking**: Automatically notifies InFlightMessageTracker when requests/responses flow through + +Example code changes for existing ProxyChannelStateMachine methods +``` +void messageFromServer(Object msg) { + // Track responses received from upstream Kafka (completing in-flight requests) + if (inFlightTracker != null && msg instanceof ResponseFrame && backendHandler != null) { + inFlightTracker.onResponseReceived(clusterName, backendHandler.serverCtx().channel()); + } + + .... + + // Track responses being sent to client on downstream channel + if (inFlightTracker != null && msg instanceof ResponseFrame) { + inFlightTracker.onResponseReceived(clusterName, frontendHandler.clientCtx().channel()); + } +} + +void messageFromClient(Object msg) { + // Track requests being sent upstream (creating in-flight messages) + if (inFlightTracker != null && msg instanceof RequestFrame && backendHandler != null) { + inFlightTracker.onRequestSent(clusterName, backendHandler.serverCtx().channel()); + } + + .... +} + +void onClientRequest(SaslDecodePredicate dp, + Object msg) { + .... + + // Track requests received from client on downstream channel + if (inFlightTracker != null && msg instanceof RequestFrame) { + inFlightTracker.onRequestSent(clusterName, frontendHandler.clientCtx().channel()); + } + + .... +} + +void onServerInactive() { + // Track upstream connection closure + if (connectionTracker != null && backendHandler != null) { + connectionTracker.onUpstreamConnectionClosed(clusterName, backendHandler.serverCtx().channel()); + } + // Clear any pending in-flight messages for this upstream channel + if (inFlightTracker != null && backendHandler != null) { + inFlightTracker.onChannelClosed(clusterName, backendHandler.serverCtx().channel()); + } + + .... 
+} + +void onClientInactive() { + // Track downstream connection closure + if (connectionTracker != null && frontendHandler != null) { + connectionTracker.onDownstreamConnectionClosed(clusterName, frontendHandler.clientCtx().channel()); + } + // Clear any pending in-flight messages for this downstream channel + if (inFlightTracker != null && frontendHandler != null) { + inFlightTracker.onChannelClosed(clusterName, frontendHandler.clientCtx().channel()); + } + + .... +} + +private void toClientActive(ProxyChannelState.ClientActive clientActive, + KafkaProxyFrontendHandler frontendHandler) { + .... + // Track downstream connection establishment + if (connectionTracker != null) { + connectionTracker.onDownstreamConnectionEstablished(clusterName, frontendHandler.clientCtx().channel()); + } +} + +private void toForwarding(Forwarding forwarding) { + .... + // Track upstream connection establishment + if (connectionTracker != null && backendHandler != null) { + connectionTracker.onUpstreamConnectionEstablished(clusterName, backendHandler.serverCtx().channel()); + } +} +``` + +# Challenges/Open questions +- If loading the new cluster configs fails for any reason, the code automatically rolls back to the previous state. However, this leaves the app in a state where the config file content no longer matches the running cluster config. +- If the rollback itself fails (for some unforeseen reason), the only way for the operator to find out is via the logs. In such cases, a full app restart might be required. +- If multiple gateway nodes are running and the reload fails on a few of them, we may have to introduce some sort of status coordinator rather than relying on all instances behaving the same. 
From 7afdec659010e572db046ba840170216b13d8f11 Mon Sep 17 00:00:00 2001 From: Urjit Patel <105218041+Uzziee@users.noreply.github.com> Date: Thu, 29 Jan 2026 09:45:49 +0530 Subject: [PATCH 02/17] replace file watcher with HTTP based approach Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com> Signed-off-by: Urjit Patel --- proposals/012-hot-reload-feature.md | 1810 +++++++++++++++++++-------- 1 file changed, 1296 insertions(+), 514 deletions(-) diff --git a/proposals/012-hot-reload-feature.md b/proposals/012-hot-reload-feature.md index ca760fe..588714c 100644 --- a/proposals/012-hot-reload-feature.md +++ b/proposals/012-hot-reload-feature.md @@ -1,166 +1,438 @@ -# Hot Reload Feature +# Hot Reload Feature - HTTP-Based Approach -As of today, any changes to virtual cluster configs (addition/removal/modification) require a full restart of kroxylicious app. This proposal is to add a dynamic reload feature, which will enable operators to modify virtual cluster configurations (add/remove/modify clusters) while **maintaining service availability for unaffected clusters** without the need for full application restarts. This feature will transform Kroxylicious from a **"restart-to-configure"** system to a **"live-reconfiguration"** system +As of today, any changes to virtual cluster configs (addition/removal/modification) require a full restart of Kroxylicious app. +This proposal describes the dynamic reload feature, which enables operators to modify virtual cluster configurations (add/remove/modify clusters) while **maintaining service availability for unaffected clusters** without the need for full application restarts. +This feature transforms Kroxylicious from a **"restart-to-configure"** system to a **"live-reconfiguration"** system. -This proposal is structured as a multi-part implementation to ensure clear separation of concerns and manageable development phases. 
+## HTTP-Based vs File Watcher Approach -- Part 1: Configuration Change Detection - This part focuses on monitoring configuration files, parsing changes, and comparing old vs new configurations to identify exactly which virtual clusters need to be restarted. It provides a clean interface that returns structured change operations (additions, removals, modifications) without actually performing any restart operations. +The original proposal used a file-based watcher mechanism to detect configuration changes. This has been replaced with an **HTTP-based trigger mechanism** for the following reasons: -- Part 2: Graceful Virtual Cluster Restart - This part handles the actual restart operations, including graceful connection draining, in-flight message completion, and rollback mechanisms. It takes the change decisions from Part 1 and executes them safely while ensuring minimal service disruption. +| Aspect | File Watcher | HTTP Endpoint | +|--------|--------------|---------------| +| **Trigger** | Automatic on file change | Explicit HTTP POST request | +| **Control** | Passive monitoring | Active, operator-controlled | +| **Configuration Delivery** | Read from filesystem | Sent in request body | +| **Response** | Asynchronous (via logs) | Synchronous HTTP response | +| **Validation** | After file is saved | Before applying changes | +| **Rollback** | Manual file restore | Automatic on failure | -# Part 1: Configuration Change Detection Framework +This proposal is structured as a multi-part implementation to ensure clear separation of concerns: -With this framework, kroxylicious will be able to detect config file changes (using standard fileWatcher service) and using various detector interfaces, it will figure out which virtual clusters are added/removed or modified. 
The list of affected clusters will be then passed on to the Part 2 of this feature, where the clusters would be gracefully restarted (or rollbacked to previous stable state in case of any failures ) +- **Part 1: HTTP-Based Configuration Reload Endpoint** - This part focuses on the HTTP endpoint that receives new configurations, validates them, and triggers the hot-reload process. It provides synchronous feedback via HTTP response with detailed reload results. -POC PR - https://github.com/kroxylicious/kroxylicious/pull/2901 +- **Part 2: Graceful Virtual Cluster Restart** - This part handles the actual restart operations, including graceful connection draining, in-flight message completion, and rollback mechanisms. It takes the change decisions from Part 1 and executes them safely while ensuring minimal service disruption. -## Core Classes & Structure +**POC PR** - https://github.com/kroxylicious/kroxylicious/pull/3176 +--- + +# Part 1: HTTP-Based Configuration Reload Endpoint + +With this framework, operators can trigger configuration reloads by sending an HTTP POST request to `/admin/config/reload` with the new YAML configuration in the request body. The endpoint validates the configuration, detects changes, and orchestrates the reload process with full rollback support. + +## Endpoint Configuration -1. **ConfigWatcherService** - File system monitoring and configuration loading orchestrator - - Monitors configuration file changes using Java NIO WatchService - - Parses YAML configuration files using the existing ConfigParser when changes are detected - - Provides graceful shutdown of executor services and watch resources - - Triggers configuration change callbacks asynchronously to initiate the hot-reload process - - Handles file parsing errors and continues monitoring - - `KafkaProxy.java` - As this is the entry point to the proxy app, this class will configure the callback which needs to be triggered when there is a config change. 
This class will also be responsible for setting up the ConfigurationChangeHandler and ConfigWatcherService +To enable the reload endpoint, add the following to your kroxylicious configuration: +```yaml +management: + endpoints: + prometheus: {} + configReload: + enabled: true + timeout: 60s # Optional, defaults to 60s ``` -public final class KafkaProxy implements AutoCloseable { - ..... - public KafkaProxy(PluginFactoryRegistry pfr, Configuration config, Features features, Path configFilePath) { - ..... - // Initialize configuration change handler with direct list of detectors - this.configurationChangeHandler = new ConfigurationChangeHandler( - List.of( - new VirtualClusterChangeDetector(), - new FilterChangeDetector()), - virtualClusterManager); - } - ... +## HTTP API - public CompletableFuture startConfigurationWatcher(Path configFilePath) { - ..... - this.configWatcherService = new ConfigWatcherService( - configFilePath, - this::handleConfigurationChange, - Duration.ofMillis(500) // 500ms debounce delay - ); +**Endpoint:** `POST /admin/config/reload` - return configWatcherService.start(); - } - ... 
+**Request:** +- **Method:** POST +- **Content-Type:** `application/yaml`, `text/yaml`, or `application/x-yaml` +- **Body:** Complete YAML configuration - public CompletableFuture stopConfigurationWatcher() { - if (configWatcherService == null) { - return CompletableFuture.completedFuture(null); - } - return configWatcherService.stop().thenRun(() -> { - configWatcherService = null; - }); +**Response:** +- **Content-Type:** `application/json` +- **Status:** 200 OK (success), 400 Bad Request (validation error), 409 Conflict (concurrent reload), 500 Internal Server Error (failure) + +**Example Response (Success):** +```json +{ + "success": true, + "message": "Configuration reloaded successfully", + "clustersModified": 1, + "clustersAdded": 0, + "clustersRemoved": 0, + "timestamp": "2024-01-15T10:30:00Z" +} +``` + +**Example Response (Failure):** +```json +{ + "success": false, + "message": "Configuration validation failed: invalid bootstrap servers", + "clustersModified": 0, + "clustersAdded": 0, + "clustersRemoved": 0, + "timestamp": "2024-01-15T10:30:00Z" +} +``` + +> **WARNING:** This endpoint has NO authentication and is INSECURE by design. Use network policies or firewalls to restrict access. + +## Core Classes & Structure + +### 1. ConfigurationReloadEndpoint + +HTTP POST endpoint handler for triggering configuration reload at `/admin/config/reload`. 
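From the caller's side, a reload is a single POST carrying the complete YAML. The sketch below builds such a request with the JDK's `java.net.http` client and maps the status codes listed under HTTP API to short outcome labels. The class name `ReloadClientSketch`, its method names, the 90-second client timeout, and any management address passed in (for example `http://localhost:9190`) are assumptions for illustration and are not defined by this proposal.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: only the path, HTTP method, content type and status codes
// are taken from the proposal; everything else is an illustrative assumption.
final class ReloadClientSketch {

    static final String RELOAD_PATH = "/admin/config/reload";

    /** Builds the POST request that carries the complete YAML configuration. */
    static HttpRequest buildReloadRequest(URI managementBase, String yaml) {
        return HttpRequest.newBuilder(managementBase.resolve(RELOAD_PATH))
                .header("Content-Type", "application/yaml")
                // Assumed client timeout, chosen to exceed the 60s server-side default.
                .timeout(Duration.ofSeconds(90))
                .POST(HttpRequest.BodyPublishers.ofString(yaml))
                .build();
    }

    /** Maps the documented status codes to a short outcome label. */
    static String describe(int status) {
        switch (status) {
            case 200:
                return "reloaded";
            case 400:
                return "validation error";
            case 409:
                return "reload already in progress";
            case 500:
                return "reload failed";
            default:
                return "unexpected status " + status;
        }
    }

    /** Sends the request and returns the outcome label; needs a running proxy. */
    static String reload(URI managementBase, String yaml) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildReloadRequest(managementBase, yaml),
                HttpResponse.BodyHandlers.ofString());
        return describe(response.statusCode());
    }
}
```

Because a 409 means another reload is still running, a caller would typically retry after a delay rather than treat it as a hard failure.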
+ +- **What it does**: Receives HTTP POST requests containing YAML configuration and initiates the reload process +- **Key Responsibilities:** + - Extracts the YAML configuration from the HTTP request body + - Delegates request processing to `ReloadRequestProcessor` + - Formats successful responses using `ResponseFormatter` + - Handles different exception types and returns appropriate HTTP status codes: + - `400 Bad Request` for validation errors (invalid YAML, wrong content-type) + - `409 Conflict` for concurrent reload attempts + - `500 Internal Server Error` for reload failures + - Provides structured JSON response with reload results + +```java +public class ConfigurationReloadEndpoint implements Function { + + public static final String PATH = "/admin/config/reload"; + + private final ReloadRequestProcessor requestProcessor; + private final ResponseFormatter responseFormatter; + + public ConfigurationReloadEndpoint( + ReloadRequestProcessor requestProcessor, + ResponseFormatter responseFormatter) { + this.requestProcessor = Objects.requireNonNull(requestProcessor); + this.responseFormatter = Objects.requireNonNull(responseFormatter); } - private void handleConfigurationChange(Configuration newConfig) { + @Override + public HttpResponse apply(HttpRequest request) { try { - Configuration newValidatedConfig = validate(newConfig, features); - Configuration oldConfig = this.config; - - // Create models once to avoid excessive logging during change detection - List oldModels = oldConfig.virtualClusterModel(pfr); - List newModels = newValidatedConfig.virtualClusterModel(pfr); - ConfigurationChangeContext changeContext = new ConfigurationChangeContext( - oldConfig, newValidatedConfig, oldModels, newModels); - - // Delegate to the configuration change handler - configurationChangeHandler.handleConfigurationChange(changeContext) - .thenRun(() -> { - // Update the stored configuration after successful hot-reload - this.config = newValidatedConfig; - // Synchronize the 
virtualClusterModels with the new configuration to ensure consistency - this.virtualClusterModels = newModels; - LOGGER.info("Configuration and virtual cluster models successfully updated"); - }); + // Create context from request + ReloadRequestContext context = ReloadRequestContext.from(request); + + // Process request through handler chain + ReloadResponse response = requestProcessor.process(context); + + // Format and return response + return responseFormatter.format(response, request); } - catch (Exception e) { - LOGGER.error("Failed to validate or process configuration change", e); + catch (ValidationException e) { + return createErrorResponse(request, HttpResponseStatus.BAD_REQUEST, e.getMessage()); + } + catch (ConcurrentReloadException e) { + return createErrorResponse(request, HttpResponseStatus.CONFLICT, e.getMessage()); + } + catch (ReloadException e) { + return createErrorResponse(request, HttpResponseStatus.INTERNAL_SERVER_ERROR, e.getMessage()); + } + } +} +``` + +### 2. ReloadRequestProcessor + +Processes reload requests using the **Chain of Responsibility** pattern. Each handler performs a specific task and passes the context to the next handler. 
+ +- **What it does**: Orchestrates the request processing pipeline by chaining multiple handlers that validate, parse, and execute the reload +- **Key Responsibilities:** + - Builds the handler chain in the correct order: validation → parsing → execution + - Passes an immutable `ReloadRequestContext` through each handler + - Each handler can enrich the context (e.g., add parsed Configuration) or throw exceptions + - Returns the final `ReloadResponse` from the context after all handlers complete + - Enforces maximum content length (10MB) to prevent memory exhaustion + +```java +public class ReloadRequestProcessor { + + private static final int MAX_CONTENT_LENGTH = 10 * 1024 * 1024; // 10MB + + private final List handlers; + + public ReloadRequestProcessor( + ConfigParser parser, + ConfigurationReloadOrchestrator orchestrator, + long timeoutSeconds) { + this.handlers = List.of( + new ContentTypeValidationHandler(), // 1. Validates Content-Type header + new ContentLengthValidationHandler(MAX_CONTENT_LENGTH), // 2. Validates body size + new ConfigurationParsingHandler(parser), // 3. Parses YAML to Configuration + new ConfigurationReloadHandler(orchestrator, timeoutSeconds)); // 4. 
Executes reload + } + + public ReloadResponse process(ReloadRequestContext context) throws ReloadException { + ReloadRequestContext currentContext = context; + + for (ReloadRequestHandler handler : handlers) { + currentContext = handler.handle(currentContext); } + + return currentContext.getResponse(); } } ``` + +**Handler Chain:** + +``` +┌─────────────────────────────────┐ +│ ContentTypeValidationHandler │ Validates Content-Type: application/yaml +└───────────────┬─────────────────┘ + │ + ▼ +┌─────────────────────────────────┐ +│ ContentLengthValidationHandler │ Validates body size <= 10MB +└───────────────┬─────────────────┘ + │ + ▼ +┌─────────────────────────────────┐ +│ ConfigurationParsingHandler │ Parses YAML → Configuration object +└───────────────┬─────────────────┘ + │ + ▼ +┌─────────────────────────────────┐ +│ ConfigurationReloadHandler │ Executes reload via orchestrator +└─────────────────────────────────┘ ``` -public class ConfigWatcherService { - ... - public ConfigWatcherService(Path configFilePath, - Consumer onConfigurationChanged) { - + +### 3. ReloadRequestContext + +Immutable context object passed through the request processing chain. Uses the builder pattern for creating modified contexts. 
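The `Builder` behind the "with" methods is not shown in this proposal. One way it could work is a copy constructor that seeds the builder from an existing context, so that each "with" call produces a fresh instance and leaves the original untouched. `ContextSketch` below is a simplified, hypothetical stand-in that uses plain strings in place of the real `HttpRequest`, `Configuration` and `ReloadResponse` types.

```java
// Hypothetical, simplified stand-in for ReloadRequestContext; field types are placeholders.
final class ContextSketch {

    private final String requestBody;
    private final String parsedConfiguration; // stands in for Configuration

    private ContextSketch(Builder b) {
        this.requestBody = b.requestBody;
        this.parsedConfiguration = b.parsedConfiguration;
    }

    String requestBody() {
        return requestBody;
    }

    String parsedConfiguration() {
        return parsedConfiguration;
    }

    /** Returns a new context; this instance is never modified. */
    ContextSketch withParsedConfiguration(String config) {
        return new Builder(this).withParsedConfiguration(config).build();
    }

    static final class Builder {
        private String requestBody;
        private String parsedConfiguration;

        Builder() {
        }

        /** Copy constructor: start from the values of an existing context. */
        Builder(ContextSketch source) {
            this.requestBody = source.requestBody;
            this.parsedConfiguration = source.parsedConfiguration;
        }

        Builder withRequestBody(String body) {
            this.requestBody = body;
            return this;
        }

        Builder withParsedConfiguration(String config) {
            this.parsedConfiguration = config;
            return this;
        }

        ContextSketch build() {
            return new ContextSketch(this);
        }
    }
}
```

With this shape a handler can return an enriched context without affecting the instance held by its caller, which is what lets the chain pass contexts along safely.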
+ +- **What it does**: Carries request data and processing results through the handler chain without mutation +- **Key Responsibilities:** + - Holds the original `HttpRequest` and extracted request body + - Stores the parsed `Configuration` after parsing handler completes + - Stores the final `ReloadResponse` after reload handler completes + - Provides immutable "with" methods that return new context instances with updated fields + - Uses `Builder` pattern for clean construction and modification + +```java +public class ReloadRequestContext { + + private final HttpRequest httpRequest; + private final String requestBody; + private final Configuration parsedConfiguration; + private final ReloadResponse response; + + public static ReloadRequestContext from(HttpRequest request) { + String body = null; + if (request instanceof FullHttpRequest fullRequest) { + ByteBuf content = fullRequest.content(); + if (content.readableBytes() > 0) { + body = content.toString(StandardCharsets.UTF_8); + } + } + + return new Builder() + .withHttpRequest(request) + .withRequestBody(body) + .build(); } - ... - public CompletableFuture start() {} - public CompletableFuture stop() {} - //There will be more methods which will schedule a FileWatcher on the configpath - // and trigger the handleConfigurationChange() whenever there is a valid change + // Immutable "with" methods return new context instances + public ReloadRequestContext withParsedConfiguration(Configuration config) { + return new Builder(this).withParsedConfiguration(config).build(); + } - private void handleConfigurationChange() { - ... - onConfigurationChanged.accept(newConfiguration); - ... - } - + public ReloadRequestContext withResponse(ReloadResponse response) { + return new Builder(this).withResponse(response).build(); + } } ``` -2. **ConfigurationChangeHandler** - Orchestrates the entire configuration change process from detection to execution with rollback capability. 
- - This handler accepts a list of detector interfaces which run and identify which virtual clusters are affected. - - Once we get to know the list of clusters that need to be added/removed/restarted, this class will call the VirtualClusterManager methods to perform addition/deletion/restarts (This class will be discussed in part 2) - - This class also creates an instance of `ConfigurationChangeRollbackTracker`, which tracks what operations are being applied. So in case of any failures, the operations performed can be reversed to previous stable state. +### 4. ConfigurationReloadOrchestrator + +Orchestrates configuration reload operations with **concurrency control**, **validation**, and **state tracking**. Uses `ReentrantLock` to prevent concurrent reloads. + +- **What it does**: Acts as the main coordinator for the entire reload workflow, from validation through execution to state management +- **Key Responsibilities:** + - **Concurrency Control**: Uses `ReentrantLock.tryLock()` to prevent concurrent reloads and returns `ConcurrentReloadException` if a reload is already in progress + - **Configuration Validation**: Validates the new configuration using the `Features` framework before applying + - **FilterChainFactory Management**: Creates a new `FilterChainFactory` with updated filter definitions and performs atomic swap on success + - **Rollback on Failure**: If reload fails, closes the new factory and keeps the old factory active + - **State Tracking**: Maintains reload state (IDLE/IN_PROGRESS) via `ReloadStateManager` + - **Disk Persistence**: Persists successful configuration to disk by replacing the existing config file with the new one. 
A backup of the old config is also taken (.bak extension) + +```java +public class ConfigurationReloadOrchestrator { + + private final ConfigurationChangeHandler configurationChangeHandler; + private final PluginFactoryRegistry pluginFactoryRegistry; + private final Features features; + private final ReloadStateManager stateManager; + private final ReentrantLock reloadLock; + + private Configuration currentConfiguration; + private final @Nullable Path configFilePath; + + // Shared mutable reference to FilterChainFactory - enables atomic swaps during hot reload + private final AtomicReference filterChainFactoryRef; + + public ConfigurationReloadOrchestrator( + Configuration initialConfiguration, + ConfigurationChangeHandler configurationChangeHandler, + PluginFactoryRegistry pluginFactoryRegistry, + Features features, + @Nullable Path configFilePath, + AtomicReference filterChainFactoryRef) { + this.currentConfiguration = Objects.requireNonNull(initialConfiguration); + this.configurationChangeHandler = Objects.requireNonNull(configurationChangeHandler); + this.pluginFactoryRegistry = Objects.requireNonNull(pluginFactoryRegistry); + this.features = Objects.requireNonNull(features); + this.filterChainFactoryRef = Objects.requireNonNull(filterChainFactoryRef); + this.configFilePath = configFilePath; + this.stateManager = new ReloadStateManager(); + this.reloadLock = new ReentrantLock(); + } + + /** + * Reload configuration with concurrency control. + * This method implements the Template Method pattern - it defines the reload algorithm + * skeleton with fixed steps. + */ + public CompletableFuture reload(Configuration newConfig) { + // 1. Check if reload already in progress + if (!reloadLock.tryLock()) { + return CompletableFuture.failedFuture( + new ConcurrentReloadException("A reload operation is already in progress")); + } + + Instant startTime = Instant.now(); + + try { + // 2. Mark reload as started + stateManager.startReload(); + + // 3. 
Validate configuration + Configuration validatedConfig = validateConfiguration(newConfig); + + // 4. Execute reload + return executeReload(validatedConfig, startTime) + .whenComplete((result, error) -> { + if (error != null) { + stateManager.recordFailure(error); + } + else { + stateManager.recordSuccess(result); + this.currentConfiguration = validatedConfig; + persistConfigurationToDisk(validatedConfig); + } + }); + } + finally { + // NOTE: runs when reload() returns, which may be before the returned future completes, + // so the lock only guards the synchronous part of the reload + reloadLock.unlock(); + } + } + + /** + * Execute the configuration reload by creating a new FilterChainFactory, + * building a change context, and delegating to ConfigurationChangeHandler. + */ + private CompletableFuture executeReload(Configuration newConfig, Instant startTime) { + // 1. Create new FilterChainFactory with updated filter definitions + FilterChainFactory newFactory = new FilterChainFactory(pluginFactoryRegistry, newConfig.filterDefinitions()); + + // 2. Get old factory for rollback capability + FilterChainFactory oldFactory = filterChainFactoryRef.get(); + + // 3. Build change context with both old and new factories + List oldModels = currentConfiguration.virtualClusterModel(pluginFactoryRegistry); + List newModels = newConfig.virtualClusterModel(pluginFactoryRegistry); + + ConfigurationChangeContext changeContext = new ConfigurationChangeContext( + currentConfiguration, newConfig, + oldModels, newModels, + oldFactory, newFactory); + + // 4. 
Execute configuration changes + return configurationChangeHandler.handleConfigurationChange(changeContext) + .thenApply(v -> { + // SUCCESS: Atomically swap to new factory + filterChainFactoryRef.set(newFactory); + if (oldFactory != null) { + oldFactory.close(); + } + return buildReloadResult(changeContext, startTime); + }) + .exceptionally(error -> { + // FAILURE: Rollback - close new factory, keep old factory + newFactory.close(); + throw new CompletionException("Configuration reload failed", error); + }); + } +} ``` + +### 5. 
ConfigurationChangeHandler + +Orchestrates the entire configuration change process from detection to execution with rollback capability. + +- **What it does**: Coordinates multiple change detectors, aggregates their results, and executes cluster operations in the correct order +- **Key Responsibilities:** + - **Detector Coordination**: Accepts a list of `ChangeDetector` implementations and runs all of them to identify changes + - **Result Aggregation**: Uses `LinkedHashSet` to merge results from all detectors, removing duplicates while maintaining order + - **Ordered Execution**: Processes changes in the correct order: Remove → Modify → Add (to free up ports/resources first) + - **Rollback Tracking**: Creates a `ConfigurationChangeRollbackTracker` to track all successful operations for potential rollback + - **Rollback on Failure**: If any operation fails, initiates rollback of all previously successful operations in reverse order + +```java public class ConfigurationChangeHandler { - + + private final List changeDetectors; + private final VirtualClusterManager virtualClusterManager; + public ConfigurationChangeHandler(List changeDetectors, VirtualClusterManager virtualClusterManager) { this.changeDetectors = List.copyOf(changeDetectors); this.virtualClusterManager = virtualClusterManager; - ... + } /** * Main entry point for handling configuration changes. */ - public CompletableFuture handleConfigurationChange( - ConfigurationChangeContext changeContext) { - - // 1. Detect changes using all registered detectors - ChangeResult changes = detectChanges(changeContext); - - if (!changes.hasChanges()) { - LOGGER.info("No changes detected - hot-reload not needed"); - return CompletableFuture.completedFuture(null); - } - - // 2. 
Process changes with rollback tracking - ConfigurationChangeRollbackTracker rollbackTracker = new ConfigurationChangeRollbackTracker(); - - return processConfigurationChanges(changes, changeContext, rollbackTracker) - .thenRun(() -> { - LOGGER.info("Configuration hot-reload completed successfully - {} operations processed", - changes.getTotalOperations()); - }) - .whenComplete((result, throwable) -> { - if (throwable != null) { - LOGGER.error("Configuration change failed - initiating rollback", throwable); - performRollback(rollbackTracker); - } - }); - } - - /** + public CompletableFuture handleConfigurationChange(ConfigurationChangeContext changeContext) { + // 1. Detect changes using all registered detectors + ChangeResult changes = detectChanges(changeContext); + + if (!changes.hasChanges()) { + LOGGER.info("No changes detected - hot-reload not needed"); + return CompletableFuture.completedFuture(null); + } + + // 2. Process changes with rollback tracking + ConfigurationChangeRollbackTracker rollbackTracker = new ConfigurationChangeRollbackTracker(); + + return processConfigurationChanges(changes, changeContext, rollbackTracker) + .whenComplete((result, throwable) -> { + if (throwable != null) { + LOGGER.error("Configuration change failed - initiating rollback", throwable); + performRollback(rollbackTracker); + } + else { + LOGGER.info("Configuration hot-reload completed successfully - {} operations processed", + changes.getTotalOperations()); + } + }); + } + + /** * Coordinates multiple change detectors and aggregates their results. 
*/ - private ChangeResult detectChanges(ConfigurationChangeContext context) { + private ChangeResult detectChanges(ConfigurationChangeContext context) { Set allClustersToRemove = new LinkedHashSet<>(); Set allClustersToAdd = new LinkedHashSet<>(); Set allClustersToModify = new LinkedHashSet<>(); - + changeDetectors.forEach(detector -> { try { ChangeResult detectorResult = detector.detectChanges(context); @@ -170,81 +442,52 @@ public class ConfigurationChangeHandler { } catch (Exception e) { LOGGER.error("Error in change detector '{}': {}", detector.getName(), e.getMessage(), e); - // Continue with other detectors even if one fails } }); - + return new ChangeResult( new ArrayList<>(allClustersToRemove), new ArrayList<>(allClustersToAdd), new ArrayList<>(allClustersToModify)); } - /** - * Processes configuration changes in the correct order: Remove -> Modify -> Add + /** + * Processes configuration changes in the correct order: Remove → Modify → Add */ private CompletableFuture processConfigurationChanges( - ChangeResult changes, + ChangeResult changes, ConfigurationChangeContext context, ConfigurationChangeRollbackTracker rollbackTracker) { - - // Sequential processing using stream.reduce() with CompletableFuture chaining + CompletableFuture chain = CompletableFuture.completedFuture(null); - // All the below operations will happen by calling methods of `VirtualClusterManager` // 1. Remove clusters first (to free up ports/resources) // 2. Restart modified existing clusters // 3. Add new clusters last - + return chain; } - - /** - * Performs rollback of all successful operations in reverse order in failure - */ - private CompletableFuture performRollback(ConfigurationChangeRollbackTracker tracker) { - // Rollback in reverse order: Added -> Modified -> Removed - // Remove clusters that were successfully added - // Restore clusters that were modified (revert to old configuration) - // Re-add clusters that were removed - } - - } ``` -3. 
**ConfigurationChangeRollbackTracker** - This class maintains a record of all cluster operations (removals, modifications, additions) so they can be reversed if the overall configuration change fails. -``` -public class ConfigurationChangeRollbackTracker { - - /** - * Tracks a cluster removal operation. - */ - public void trackRemoval(String clusterName, VirtualClusterModel removedModel) {} +### 6. ChangeDetector Interface - /** - * Tracks a cluster modification operation. - */ - public void trackModification(String clusterName, VirtualClusterModel originalModel, VirtualClusterModel newModel) {} +Strategy pattern interface for different types of change detection. - /** - * Tracks a cluster addition operation. - */ - public void trackAddition(String clusterName, VirtualClusterModel addedModel) {} -} -``` +- **What it does**: Defines a contract for components that detect specific types of configuration changes +- **Key Responsibilities:** + - Provides a consistent API for comparing old vs new configurations via `detectChanges()` + - Returns structured `ChangeResult` objects with specific operations needed (add/remove/modify) + - Enables extensibility - new detectors can be added without modifying existing code + - Currently has two implementations: `VirtualClusterChangeDetector` and `FilterChangeDetector` -4. **ChangeDetector (Interface)** - Strategy pattern interface for different types of change detection. - - Currently we have only have one implementation - `VirtualClusterChangeDetector` which will detect changes in virtual cluster models, in future we can add a detector for filter changes aka `FilterChangeDetector` - - Provides a consistent API for comparing old vs new configurations via detectChanges() - - Returns structured ChangeResult objects with specific operations needed -``` +```java public interface ChangeDetector { /** * Name of this change detector for logging and debugging. 
      */
     String getName();
-    
+
     /**
      * Detect configuration changes and return structured change information.
      * @param context The configuration context containing old and new configurations
@@ -253,123 +496,552 @@ public interface ChangeDetector {
     ChangeResult detectChanges(ConfigurationChangeContext context);
 }
 ```
-```
+
+### 7. VirtualClusterChangeDetector
+
+Identifies virtual clusters needing restart due to model changes (new, removed, modified).
+
+- **What it does**: Compares old and new `VirtualClusterModel` collections to detect cluster-level changes
+- **Key Responsibilities:**
+  - **New Cluster Detection**: Finds clusters that exist in new configuration but not in old (additions)
+  - **Removed Cluster Detection**: Finds clusters that exist in old configuration but not in new (deletions)
+  - **Modified Cluster Detection**: Finds clusters that exist in both but have different `VirtualClusterModel` (using `equals()` comparison)
+  - Uses cluster name as the unique identifier for comparison
+
+```java
 public class VirtualClusterChangeDetector implements ChangeDetector {
-    
-    private static final Logger LOGGER = LoggerFactory.getLogger(VirtualClusterChangeDetector.class);
-    
+
     @Override
     public String getName() {
         return "VirtualClusterChangeDetector";
     }
-    
+
     @Override
     public ChangeResult detectChanges(ConfigurationChangeContext context) {
         // Check for modified clusters using equals() comparison
         List<String> modifiedClusters = findModifiedClusters(context);
-        
+
         // Check for new clusters (exist in new but not old)
         List<String> newClusters = findNewClusters(context);
-        
+
         // Check for removed clusters (exist in old but not new)
         List<String> removedClusters = findRemovedClusters(context);
-        
+
         return new ChangeResult(removedClusters, newClusters, modifiedClusters);
     }
+
+    private List<String> findModifiedClusters(ConfigurationChangeContext context) {
+        Map<String, VirtualClusterModel> oldModelMap = context.oldModels().stream()
+                .collect(Collectors.toMap(VirtualClusterModel::getClusterName, model -> model));
+
+        return context.newModels().stream()
+                .filter(newModel -> {
+                    VirtualClusterModel oldModel = oldModelMap.get(newModel.getClusterName());
+                    return oldModel != null && !oldModel.equals(newModel);
+                })
+                .map(VirtualClusterModel::getClusterName)
+                .collect(Collectors.toList());
+    }
+
+    private List<String> findNewClusters(ConfigurationChangeContext context) {
+        Set<String> oldClusterNames = context.oldModels().stream()
+                .map(VirtualClusterModel::getClusterName)
+                .collect(Collectors.toSet());
+
+        return context.newModels().stream()
+                .map(VirtualClusterModel::getClusterName)
+                .filter(name -> !oldClusterNames.contains(name))
+                .collect(Collectors.toList());
+    }
+
+    private List<String> findRemovedClusters(ConfigurationChangeContext context) {
+        Set<String> newClusterNames = context.newModels().stream()
+                .map(VirtualClusterModel::getClusterName)
+                .collect(Collectors.toSet());
+
+        return context.oldModels().stream()
+                .map(VirtualClusterModel::getClusterName)
+                .filter(name -> !newClusterNames.contains(name))
+                .collect(Collectors.toList());
+    }
+}
 ```
-5. **ConfigurationChangeContext (Record)** - Immutable data container providing context for change detection.
-    - Holds old and new Configuration objects for comparison
+### 8. FilterChangeDetector
+
+Identifies clusters needing restart due to filter configuration changes.
+
+- **What it does**: Detects changes in filter definitions and identifies which virtual clusters are affected
+- **Key Responsibilities:**
+  - **Filter Definition Changes**: Compares `NamedFilterDefinition` objects to find filters where type or config changed (additions/removals are handled by Configuration validation)
+  - **Default Filters Changes**: Detects changes to the `defaultFilters` list (order matters for filter chain execution)
+  - Returns only `clustersToModify` - filter changes don't cause cluster additions/removals
+- **Cluster Impact Rules**: A cluster is impacted if:
+  - It uses a filter definition that was modified (either from explicit filters or defaults), OR
+  - It doesn't specify `cluster.filters()` AND the `defaultFilters` list changed
+
+```java
+public class FilterChangeDetector implements ChangeDetector {
+
+    @Override
+    public String getName() {
+        return "FilterChangeDetector";
+    }
+
+    @Override
+    public ChangeResult detectChanges(ConfigurationChangeContext context) {
+        // Detect filter definition changes
+        Set<String> modifiedFilterNames = findModifiedFilterDefinitions(context);
+
+        // Detect default filters changes (order matters for filter chain execution)
+        boolean defaultFiltersChanged = hasDefaultFiltersChanged(context);
+
+        // Find impacted clusters
+        List<String> clustersToModify = findImpactedClusters(modifiedFilterNames, defaultFiltersChanged, context);
+
+        return new ChangeResult(List.of(), List.of(), clustersToModify);
+    }
+
+    /**
+     * Find filter definitions that have been modified.
+     * A filter is considered modified if the type or config changed.
+     * Note: Filter additions/removals are not tracked here as they're handled by Configuration validation.
+     */
+    private Set<String> findModifiedFilterDefinitions(ConfigurationChangeContext context) {
+        Map<String, NamedFilterDefinition> oldDefs = buildFilterDefMap(context.oldConfig());
+        Map<String, NamedFilterDefinition> newDefs = buildFilterDefMap(context.newConfig());
+
+        Set<String> modifiedFilterNames = new HashSet<>();
+
+        // Check each new definition to see if it differs from the old one
+        for (Map.Entry<String, NamedFilterDefinition> entry : newDefs.entrySet()) {
+            String filterName = entry.getKey();
+            NamedFilterDefinition newDef = entry.getValue();
+            NamedFilterDefinition oldDef = oldDefs.get(filterName);
+
+            // Filter exists in both configs - check if it changed
+            if (oldDef != null && !oldDef.equals(newDef)) {
+                modifiedFilterNames.add(filterName);
+            }
+        }
+
+        return modifiedFilterNames;
+    }
+
+    /**
+     * Check if the default filters list has changed.
+     * Order matters because filter chain execution is sequential.
+     */
+    private boolean hasDefaultFiltersChanged(ConfigurationChangeContext context) {
+        List<String> oldDefaults = context.oldConfig().defaultFilters();
+        List<String> newDefaults = context.newConfig().defaultFilters();
+        // Use Objects.equals for null-safe comparison - checks both content AND order
+        return !Objects.equals(oldDefaults, newDefaults);
+    }
+
+    /**
+     * Find virtual clusters that are impacted by filter changes.
+     * Uses a simple single-pass approach: iterate through each cluster and check if it's
+     * affected by any filter change. Prioritizes code clarity over optimization.
+     */
+    private List<String> findImpactedClusters(
+            Set<String> modifiedFilterNames,
+            boolean defaultFiltersChanged,
+            ConfigurationChangeContext context) {
+
+        // Early return if nothing changed
+        if (modifiedFilterNames.isEmpty() && !defaultFiltersChanged) {
+            return List.of();
+        }
+
+        List<String> impactedClusters = new ArrayList<>();
+
+        // Simple approach: check each cluster's resolved filters
+        for (VirtualClusterModel cluster : context.newModels()) {
+            String clusterName = cluster.getClusterName();
+
+            // Get this cluster's resolved filters (either explicit or from defaults)
+            List<String> clusterFilterNames = cluster.getFilters()
+                    .stream()
+                    .map(NamedFilterDefinition::name)
+                    .toList();
+
+            // Check if cluster uses any modified filter OR uses defaults and defaults changed
+            boolean usesModifiedFilter = clusterFilterNames.stream()
+                    .anyMatch(modifiedFilterNames::contains);
+
+            boolean usesChangedDefaults = defaultFiltersChanged &&
+                    clusterUsesDefaults(cluster, context.newConfig());
+
+            if (usesModifiedFilter || usesChangedDefaults) {
+                impactedClusters.add(clusterName);
+            }
+        }
+
+        return impactedClusters;
+    }
+
+    /**
+     * Check if a cluster uses default filters.
+     * A cluster uses defaults if it doesn't specify its own filters list.
+     */
+    private boolean clusterUsesDefaults(VirtualClusterModel cluster, Configuration config) {
+        VirtualCluster vc = config.virtualClusters().stream()
+                .filter(v -> v.name().equals(cluster.getClusterName()))
+                .findFirst()
+                .orElse(null);
+
+        // Cluster uses defaults if it doesn't specify its own filters
+        return vc != null && vc.filters() == null;
+    }
+}
 ```
+
+### 9. Supporting Records
+
+**ConfigurationChangeContext** - Immutable context for change detection.
+- **What it does**: Provides a single object containing all the data needed for change detection, including both old and new configurations and their pre-computed models
+- **Key fields**: `oldConfig`, `newConfig`, `oldModels`, `newModels`, `oldFilterChainFactory`, `newFilterChainFactory`
+- **Why FilterChainFactory is included**: Enables filter-related change detectors to reference the factories for comparison
+
+```java
 public record ConfigurationChangeContext(
-    Configuration oldConfig,
-    Configuration newConfig,
-    List<VirtualClusterModel> oldModels,
-    List<VirtualClusterModel> newModels
-) {}
+        Configuration oldConfig,
+        Configuration newConfig,
+        List<VirtualClusterModel> oldModels,
+        List<VirtualClusterModel> newModels,
+        @Nullable FilterChainFactory oldFilterChainFactory,
+        @Nullable FilterChainFactory newFilterChainFactory) {}
 ```
 
-6. **ChangeResult (Record)** - Contains lists of cluster names for each operation type (remove/add/modify)
-```
+**ChangeResult** - Result of change detection.
+- **What it does**: Contains categorized lists of cluster names for each operation type needed
+- **Key fields**: `clustersToRemove`, `clustersToAdd`, `clustersToModify`
+- **Utility methods**: `hasChanges()` to check if any changes detected, `getTotalOperations()` to get total count
+
+```java
 public record ChangeResult(
-    List<String> clustersToRemove,
-    List<String> clustersToAdd,
-    List<String> clustersToModify
-) {}
+        List<String> clustersToRemove,
+        List<String> clustersToAdd,
+        List<String> clustersToModify) {
+
+    public boolean hasChanges() {
+        return !clustersToRemove.isEmpty() || !clustersToAdd.isEmpty() || !clustersToModify.isEmpty();
+    }
+
+    public int getTotalOperations() {
+        return clustersToRemove.size() + clustersToAdd.size() + clustersToModify.size();
+    }
+}
 ```
 
-## Flow diagram
+**ReloadResponse** - HTTP response payload.
+- **What it does**: Serializable record that represents the JSON response sent back to HTTP clients
+- **Key fields**: `success`, `message`, `clustersModified`, `clustersAdded`, `clustersRemoved`, `timestamp`
+- **Factory methods**: `from(ReloadResult)` to convert internal result, `error(message)` for error responses
+
+```java
+public record ReloadResponse(
+        boolean success,
+        String message,
+        int clustersModified,
+        int clustersAdded,
+        int clustersRemoved,
+        String timestamp) {
+
+    public static ReloadResponse from(ReloadResult result) {
+        return new ReloadResponse(
+                result.isSuccess(),
+                result.getMessage(),
+                result.getClustersModified(),
+                result.getClustersAdded(),
+                result.getClustersRemoved(),
+                result.getTimestamp().toString());
+    }
+
+    public static ReloadResponse error(String message) {
+        return new ReloadResponse(false, message, 0, 0, 0, Instant.now().toString());
+    }
+}
+```
+
+**ReloadStateManager** - Tracks reload state and history.
+- **What it does**: Maintains the current reload state and a history of recent reload operations for observability
+- **Key responsibilities**: Tracks `IDLE`/`IN_PROGRESS` state, records success/failure with `ReloadResult`, maintains bounded history (max 10 entries)
+- **Thread safety**: Uses `AtomicReference` for state and `synchronized` blocks for history access
+
+```java
+public class ReloadStateManager {
+
+    private static final int MAX_HISTORY_SIZE = 10;
+
+    private final AtomicReference<ReloadState> currentState;
+    private final Deque<ReloadResult> reloadHistory;
+
+    public enum ReloadState {
+        IDLE,
+        IN_PROGRESS
+    }
+
+    public void startReload() {
+        currentState.set(ReloadState.IN_PROGRESS);
+    }
+
+    public void recordSuccess(ReloadResult result) {
+        currentState.set(ReloadState.IDLE);
+        addToHistory(result);
+    }
+
+    public void recordFailure(Throwable error) {
+        currentState.set(ReloadState.IDLE);
+        addToHistory(ReloadResult.failure(error.getMessage()));
+    }
+
+    public ReloadState getCurrentState() {
+        return currentState.get();
+    }
+
+    public Optional<ReloadResult> getLastResult() {
+        synchronized (reloadHistory) {
+            return reloadHistory.isEmpty() ? Optional.empty() : Optional.of(reloadHistory.peekLast());
+        }
+    }
+}
+```
+
+## Integration with KafkaProxy
+
+The `KafkaProxy` class initializes the reload orchestrator and passes it to the management endpoint.
+
+- **What it does**: `KafkaProxy` is the entry point to the proxy app and is responsible for setting up all hot-reload components
+- **Key Responsibilities:**
+  - Creates `ConnectionTracker` and `InFlightMessageTracker` for connection management
+  - Creates `ConnectionDrainManager` and `VirtualClusterManager` for cluster lifecycle operations
+  - Creates `ConfigurationChangeHandler` with list of change detectors (`VirtualClusterChangeDetector`, `FilterChangeDetector`)
+  - Creates `AtomicReference<FilterChainFactory>` that is shared between `KafkaProxyInitializer` and `ConfigurationReloadOrchestrator` for atomic factory swaps
+  - Creates `ConfigurationReloadOrchestrator` and passes it to `ManagementInitializer` for HTTP endpoint registration
+- **Why AtomicReference is used**: Both the initializers (which create filter chains for new connections) and the orchestrator (which swaps factories on reload) need access to the current factory. Using `AtomicReference` enables atomic, thread-safe swaps.
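The shared-reference swap described in the last bullet above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the proposal's implementation: `FactorySwapSketch`, `StubFactory`, and `swap` are hypothetical names standing in for `FilterChainFactory` and the orchestrator's reload step. Only `java.util.concurrent.atomic.AtomicReference` is real JDK API.

```java
import java.util.concurrent.atomic.AtomicReference;

public class FactorySwapSketch {

    /** Hypothetical stand-in for FilterChainFactory; it only records whether it was closed. */
    static final class StubFactory implements AutoCloseable {
        final String version;
        volatile boolean closed;

        StubFactory(String version) {
            this.version = version;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /**
     * Publishes newFactory through the shared reference and closes the factory it replaced.
     * getAndSet is one atomic step, so a concurrent reader sees either the old or the new
     * factory, never an intermediate state.
     */
    static StubFactory swap(AtomicReference<StubFactory> ref, StubFactory newFactory) {
        StubFactory old = ref.getAndSet(newFactory);
        if (old != null) {
            old.close();
        }
        return old;
    }

    public static void main(String[] args) {
        AtomicReference<StubFactory> ref = new AtomicReference<>(new StubFactory("v1"));

        // A connection initializer would call ref.get() each time it builds a filter chain.
        StubFactory before = ref.get();

        // The reload path swaps in the factory built from the new configuration.
        swap(ref, new StubFactory("v2"));

        // prints "v1 closed=true, current=v2"
        System.out.println(before.version + " closed=" + before.closed + ", current=" + ref.get().version);
    }
}
```

The sketch closes the old factory right after the swap, which mirrors the order shown in the flow diagram's ON SUCCESS box. Whether connections created before the swap still hold the old factory at that point is a question for the real implementation and is not modelled here.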
+
+```java
+public final class KafkaProxy implements AutoCloseable {
+
+    // Shared mutable reference to FilterChainFactory - enables atomic swaps during hot reload
+    private AtomicReference<FilterChainFactory> filterChainFactoryRef;
+
+    private final ConfigurationChangeHandler configurationChangeHandler;
+    private final @Nullable ConfigurationReloadOrchestrator reloadOrchestrator;
+
+    public KafkaProxy(PluginFactoryRegistry pfr, Configuration config, Features features, @Nullable Path configFilePath) {
+        // Initialize connection management components
+        this.connectionDrainManager = new ConnectionDrainManager(connectionTracker, inFlightTracker);
+        this.virtualClusterManager = new VirtualClusterManager(endpointRegistry, connectionDrainManager);
+
+        // Initialize configuration change handler with detectors
+        this.configurationChangeHandler = new ConfigurationChangeHandler(
+                List.of(
+                        new VirtualClusterChangeDetector(),
+                        new FilterChangeDetector()),
+                virtualClusterManager);
+
+        // Create AtomicReference for FilterChainFactory
+        this.filterChainFactoryRef = new AtomicReference<>();
+
+        // Initialize reload orchestrator for HTTP endpoint
+        this.reloadOrchestrator = new ConfigurationReloadOrchestrator(
+                config,
+                configurationChangeHandler,
+                pfr,
+                features,
+                configFilePath,
+                filterChainFactoryRef);
+    }
+
+    public CompletableFuture startup() {
+        // Create initial FilterChainFactory and store in shared atomic reference
+        FilterChainFactory initialFactory = new FilterChainFactory(pfr, config.filterDefinitions());
+        this.filterChainFactoryRef.set(initialFactory);
+
+        // Pass atomic reference to initializers for dynamic factory swaps
+        var tlsServerBootstrap = buildServerBootstrap(proxyEventGroup,
+                new KafkaProxyInitializer(filterChainFactoryRef, ...));
+        var plainServerBootstrap = buildServerBootstrap(proxyEventGroup,
+                new KafkaProxyInitializer(filterChainFactoryRef, ...));
+
+        // Start management listener with reload orchestrator
+        var managementFuture =
maybeStartManagementListener(managementEventGroup, meterRegistries, reloadOrchestrator); + + // ... + } +} +``` -Image +## Flow Diagram +``` +┌───────────────────────────────────────────────────────────────────────────┐ +│ HTTP POST /admin/config/reload │ +│ Content-Type: application/yaml │ +│ Body: │ +└───────────────────────────────────┬───────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────────────────┐ +│ ConfigurationReloadEndpoint │ +│ - Creates ReloadRequestContext │ +│ - Delegates to ReloadRequestProcessor │ +└───────────────────────────────────┬───────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────────────────┐ +│ ReloadRequestProcessor │ +│ - Chain of Responsibility pattern │ +│ │ +│ ┌─────────────────┐ ┌──────────────────┐ ┌────────────────────────┐ │ +│ │ ContentType │ → │ ContentLength │ → │ ConfigurationParsing │ │ +│ │ Validation │ │ Validation │ │ Handler │ │ +│ └─────────────────┘ └──────────────────┘ └────────────┬───────────┘ │ +│ │ │ +│ ▼ │ +│ ┌────────────────────────────┐ │ +│ │ ConfigurationReload │ │ +│ │ Handler │ │ +│ └────────────┬───────────────┘ │ +└────────────────────────────────────────────────────────┼──────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────────────────┐ +│ ConfigurationReloadOrchestrator │ +│ - Concurrency control (ReentrantLock) │ +│ - Configuration validation │ +│ - Creates new FilterChainFactory │ +│ - Builds ConfigurationChangeContext │ +└───────────────────────────────────┬───────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────────────────┐ +│ ConfigurationChangeHandler │ +│ - Coordinates change detectors │ +│ - Aggregates change results │ +│ │ +│ ┌──────────────────────────┐ ┌────────────────────────┐ │ +│ │ VirtualClusterChange │ │ FilterChangeDetector │ │ +│ │ Detector │ │ │ │ +│ │ - New clusters 
│ │ - Modified filters │ │ +│ │ - Removed clusters │ │ - Default filters │ │ +│ │ - Modified clusters │ │ - Impacted clusters │ │ +│ └──────────────┬───────────┘ └───────────┬────────────┘ │ +│ │ │ │ +│ └─────────────┬──────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌──────────────────────┐ │ +│ │ ChangeResult │ │ +│ │ - clustersToRemove │ │ +│ │ - clustersToAdd │ │ +│ │ - clustersToModify │ │ +│ └──────────┬───────────┘ │ +└────────────────────────────────┼──────────────────────────────────────────┘ + │ + ▼ + ┌────────────────────────┐ + │ VirtualClusterManager │ ────► Part 2: Graceful Restart + │ - Remove clusters │ + │ - Add clusters │ + │ - Restart clusters │ + └────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────────────────┐ +│ ON SUCCESS │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ filterChainFactoryRef.set(newFactory) ◄── Atomic swap! │ │ +│ │ oldFactory.close() │ │ +│ │ currentConfiguration = newConfig │ │ +│ │ persistConfigurationToDisk(newConfig) │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +└───────────────────────────────────┬───────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────────────────┐ +│ HTTP 200 OK │ +│ Content-Type: application/json │ +│ {"success": true, "clustersModified": 1, ...} │ +└───────────────────────────────────────────────────────────────────────────┘ +``` +--- # Part 2: Graceful Virtual Cluster Restart -Part 2 of the hot-reload implementation focuses on gracefully restarting of virtual clusters. This component receives structured change operations from Part 1 and executes them in a carefully orchestrated sequence: **connection draining → resource deregistration → new resource registration → connection restoration.** +Part 2 of the hot-reload implementation focuses on gracefully restarting virtual clusters. 
This component receives structured change operations from Part 1 and executes them in a carefully orchestrated sequence: **connection draining → resource deregistration → new resource registration → connection restoration.** The design emphasizes minimal service disruption by ensuring all in-flight Kafka requests complete before closing connections (or when a timeout is hit). ## Core Classes & Structure -1. **VirtualClusterManager** - - **What it does** - Acts as the high-level orchestrator for all virtual cluster lifecycle operations during hot-reload. `ConfigurationChangeHandler` calls the `VirtualClusterManager` to restart/add/remove clusters when there is a config change - - **Key Responsibilities:** - - **Cluster Addition**: Takes a new `VirtualClusterModel` and brings it online by registering it using `EndpointRegistry` - - **Cluster Removal**: Safely takes down an existing cluster by first draining all connections gracefully, then deregistering it using EndpointRegistry - - **Cluster Restart**: Performs a complete cluster reconfiguration by removing the old version and adding the new version with updated settings - - **Rollback Integration**: Automatically tracks all successful operations so they can be undone if later operations fail -``` +### 1. VirtualClusterManager + +Acts as the high-level orchestrator for all virtual cluster lifecycle operations during hot-reload. `ConfigurationChangeHandler` calls the `VirtualClusterManager` to restart/add/remove clusters when there is a config change. 
+ +- **What it does**: Manages the complete lifecycle of virtual clusters including addition, removal, and restart operations +- **Key Responsibilities:** + - **Cluster Addition**: Takes a new `VirtualClusterModel` and brings it online by registering all gateways with `EndpointRegistry` + - **Cluster Removal**: Safely takes down an existing cluster by first draining all connections gracefully via `ConnectionDrainManager`, then deregistering from `EndpointRegistry` + - **Cluster Restart**: Performs a complete cluster reconfiguration by orchestrating remove → add sequence with updated settings + - **Rollback Integration**: Automatically tracks all successful operations via `ConfigurationChangeRollbackTracker` so they can be undone if later operations fail + +```java public class VirtualClusterManager { - - ... + + private final EndpointRegistry endpointRegistry; + private final ConnectionDrainManager connectionDrainManager; + public VirtualClusterManager(EndpointRegistry endpointRegistry, ConnectionDrainManager connectionDrainManager) { this.endpointRegistry = endpointRegistry; this.connectionDrainManager = connectionDrainManager; } - + /** * Gracefully removes a virtual cluster by draining connections and deregistering endpoints. */ public CompletableFuture removeVirtualCluster(String clusterName, List oldModels, ConfigurationChangeRollbackTracker rollbackTracker) { - // 1. Find cluster model to remove VirtualClusterModel clusterToRemove = findClusterModel(oldModels, clusterName); - - // 2. Drain connections gracefully (30s timeout) + + // 1. Drain connections gracefully (30s timeout) return connectionDrainManager.gracefullyDrainConnections(clusterName, Duration.ofSeconds(30)) .thenCompose(v -> { - // 3. Deregister all gateways from endpoint registry + // 2. 
Deregister all gateways from endpoint registry var deregistrationFutures = clusterToRemove.gateways().values().stream() .map(gateway -> endpointRegistry.deregisterVirtualCluster(gateway)) .toArray(CompletableFuture[]::new); - + return CompletableFuture.allOf(deregistrationFutures); }) .thenRun(() -> { - // 4. Track removal for potential rollback + // 3. Track removal for potential rollback rollbackTracker.trackRemoval(clusterName, clusterToRemove); LOGGER.info("Successfully removed virtual cluster '{}'", clusterName); }); } - + /** * Restarts a virtual cluster with new configuration (remove + add). */ public CompletableFuture restartVirtualCluster(String clusterName, + VirtualClusterModel newModel, List oldModels, - List newModels, ConfigurationChangeRollbackTracker rollbackTracker) { VirtualClusterModel oldModel = findClusterModel(oldModels, clusterName); - VirtualClusterModel newModel = findClusterModel(newModels, clusterName); - + // Step 1: Remove existing cluster (drain + deregister) return removeVirtualCluster(clusterName, oldModels, rollbackTracker) .thenCompose(v -> { // Step 2: Add new cluster with updated configuration - return addVirtualCluster(clusterName, List.of(newModel), rollbackTracker); + return addVirtualCluster(newModel, rollbackTracker); }) .thenRun(() -> { // Step 3: Track modification and stop draining @@ -378,15 +1050,14 @@ public class VirtualClusterManager { LOGGER.info("Successfully restarted virtual cluster '{}' with new configuration", clusterName); }); } - + /** * Adds a new virtual cluster by registering endpoints and enabling connections. 
*/ - public CompletableFuture addVirtualCluster(String clusterName, - List newModels, + public CompletableFuture addVirtualCluster(VirtualClusterModel newModel, ConfigurationChangeRollbackTracker rollbackTracker) { - VirtualClusterModel newModel = findClusterModel(newModels, clusterName); - + String clusterName = newModel.getClusterName(); + return registerVirtualCluster(newModel) .thenRun(() -> { // Stop draining to allow new connections @@ -395,224 +1066,205 @@ public class VirtualClusterManager { LOGGER.info("Successfully added new virtual cluster '{}'", clusterName); }); } - + /** * Registers all gateways for a virtual cluster with the endpoint registry. */ private CompletableFuture registerVirtualCluster(VirtualClusterModel model) { - LOGGER.info("Registering virtual cluster '{}' with {} gateways", - model.getClusterName(), model.gateways().size()); - var registrationFutures = model.gateways().values().stream() .map(gateway -> endpointRegistry.registerVirtualCluster(gateway)) .toArray(CompletableFuture[]::new); - - return CompletableFuture.allOf(registrationFutures) - .thenRun(() -> LOGGER.info("Successfully registered virtual cluster '{}' with all gateways", - model.getClusterName())); + + return CompletableFuture.allOf(registrationFutures); } } ``` -2. **ConnectionDrainManager** - - **What it does** - Implements the graceful connection draining strategy during cluster restarts. This is what makes hot-reload "graceful" - it ensures that client requests in progress are completed rather than dropped. - - **Key Responsibilities:** - - **Draining Mode Control**: Starts/stops "draining mode" where new connections are rejected but existing ones continue - - **Backpressure Strategy**: Applies intelligent backpressure by disabling the channel`autoRead` on downstream channels while keeping upstream channels active. 
This is done so that any “new” client messages are rejected, while the upstream channel is kept open so that the existing inflight requests are delivered to kafka and their response are successfully delivered back to the client. - - **In-Flight Monitoring**: Continuously monitors pending Kafka requests and waits for them to complete before closing connections. This is done using `InFlightMessageTracker` class. - - **Explanation of the draining strategy** - - **Phase 1: Initiate Draining Mode** - Set cluster to "draining mode" in which any new connection attempts will be rejected. Then we proceed to gracefully closing the connection. - ``` - public CompletableFuture gracefullyDrainConnections(String clusterName, Duration totalTimeout) { - // 1. Get current connection and message state - int totalConnections = connectionTracker.getTotalConnectionCount(clusterName); - int totalInFlight = inFlightTracker.getTotalPendingRequestCount(clusterName); - - LOGGER.info("Starting graceful drain for cluster '{}' with {} connections and {} in-flight requests ({}s timeout)", - clusterName, totalConnections, totalInFlight, totalTimeout.getSeconds()); - - // 2. Enter draining mode - reject new connections - return startDraining(clusterName) - .thenCompose(v -> { - if (totalConnections == 0) { - // Fast path: no connections to drain - return CompletableFuture.completedFuture(null); - } else { - // Proceed with connection closure - return gracefullyCloseConnections(clusterName, totalTimeout); - } - }); - } - - public CompletableFuture startDraining(String clusterName) { - drainingClusters.put(clusterName, new AtomicBoolean(true)); - return CompletableFuture.completedFuture(null); - } - ``` - - **Phase 2: Apply Backpressure Strategy** - we set `autoRead = false` only on the downstream channel to reject any new client messages. `ConnectionTracker` class tracks which downstream/upstream channels are active for a given cluster name. 
- - **Downstream (Client→Proxy)** - `autoRead = false` - Prevents clients from sending NEW requests while allowing existing requests to complete - - **Upstream (Proxy→Kafka)** - `autoRead = true` - Allows Kafka responses to flow back to complete pending requests. In-flight request count decreases naturally as responses arrive - ``` - public CompletableFuture gracefullyCloseConnections(String clusterName, Duration timeout) { - // 1. Get separate channel collections - Set downstreamChannels = connectionTracker.getDownstreamActiveChannels(clusterName); - Set upstreamChannels = connectionTracker.getUpstreamActiveChannels(clusterName); - - // 2. Apply different strategies to different channel types - var allCloseFutures = new ArrayList>(); - - // Add downstream channel close futures - downstreamChannels.stream() - .map(this::disableAutoReadOnDownstreamChannel) - .map(channel -> gracefullyCloseChannel(channel, clusterName, timeout, "DOWNSTREAM")) - .forEach(allCloseFutures::add); - - // Add upstream channel close futures - upstreamChannels.stream() - .map(channel -> gracefullyCloseChannel(channel, clusterName, timeout, "UPSTREAM")) - .forEach(allCloseFutures::add); - - return CompletableFuture.allOf(allCloseFutures.toArray(new CompletableFuture[0])); - } - - private Channel disableAutoReadOnDownstreamChannel(Channel downstreamChannel) { - try { - if (downstreamChannel.isActive()) { - // Get the KafkaProxyFrontendHandler from the channel pipeline - KafkaProxyFrontendHandler frontendHandler = downstreamChannel.pipeline().get(KafkaProxyFrontendHandler.class); - if (frontendHandler != null) { - frontendHandler.applyBackpressure(); - LOGGER.debug("Applied backpressure via frontend handler for channel: L:/{}, R:/{}", - downstreamChannel.localAddress(), downstreamChannel.remoteAddress()); - } - else { - LOGGER.debug("Manually applying backpressure for channel: L:/{}, R:/{}", - downstreamChannel.localAddress(), downstreamChannel.remoteAddress()); - // Fallback to manual method 
if handler not found - downstreamChannel.config().setAutoRead(false); - } - } +### 2. ConnectionDrainManager + +Implements the graceful connection draining strategy during cluster restarts. This is what makes hot-reload "graceful" - it ensures that client requests in progress are completed rather than dropped. + +- **What it does**: Manages the graceful shutdown of connections during cluster restart, ensuring no in-flight messages are lost +- **Key Responsibilities:** + - **Draining Mode Control**: Maintains a map of draining clusters; when a cluster enters drain mode, `shouldAcceptConnection()` returns false to reject new connections + - **Backpressure Strategy**: Sets `autoRead = false` only on downstream (client→proxy) channels to prevent new requests, while keeping upstream (proxy→Kafka) channels reading to allow responses to complete naturally + - **In-Flight Monitoring**: Uses a scheduled executor to periodically check `InFlightMessageTracker` (every 100ms) and closes channels when pending requests reach zero + - **Timeout Handling**: If in-flight count doesn't reach zero within the timeout (default 30s), force-closes the channel to prevent indefinite hangs + - **Resource Cleanup**: Implements `AutoCloseable` to properly shut down the scheduler on proxy shutdown +- **Explanation of the Draining Strategy:** + - **Phase 1**: Enter draining mode → new connection attempts are rejected + - **Phase 2**: Apply backpressure → downstream `autoRead=false`, upstream `autoRead=true` + - **Phase 3**: Monitor in-flight messages → wait for count to reach zero or timeout, then close channel + +```java +public class ConnectionDrainManager implements AutoCloseable { + + private final ConnectionTracker connectionTracker; + private final InFlightMessageTracker inFlightTracker; + private final Map drainingClusters = new ConcurrentHashMap<>(); + private final ScheduledExecutorService scheduler; + + public ConnectionDrainManager(ConnectionTracker connectionTracker, + 
InFlightMessageTracker inFlightTracker) { + this.connectionTracker = connectionTracker; + this.inFlightTracker = inFlightTracker; + this.scheduler = new ScheduledThreadPoolExecutor(2, r -> { + Thread t = new Thread(r, "connection-drain-manager"); + t.setDaemon(true); + return t; + }); + } + + /** + * Determines if a new connection should be accepted for the specified virtual cluster. + */ + public boolean shouldAcceptConnection(String clusterName) { + return !isDraining(clusterName); + } + + /** + * Performs a complete graceful drain operation by stopping new connections + * and immediately closing existing connections after in-flight messages complete. + */ + public CompletableFuture<Void> gracefullyDrainConnections(String clusterName, Duration totalTimeout) { + int totalActiveConnections = connectionTracker.getTotalConnectionCount(clusterName); + int totalInFlightRequests = inFlightTracker.getTotalPendingRequestCount(clusterName); + + LOGGER.info("Starting graceful drain for cluster '{}' with {} connections and {} in-flight requests", + clusterName, totalActiveConnections, totalInFlightRequests); + + return startDraining(clusterName) + .thenCompose(v -> { + if (totalActiveConnections == 0) { + return CompletableFuture.completedFuture(null); } - catch (Exception e) { - LOGGER.warn("Failed to disable autoRead for downstream channel L:/{}, R:/{} - continuing with drain", - downstreamChannel.localAddress(), downstreamChannel.remoteAddress(), e); + else { + return gracefullyCloseConnections(clusterName, totalTimeout); } - return downstreamChannel; + }); + } + + /** + * Starts draining - new connections will be rejected. + */ + public CompletableFuture<Void> startDraining(String clusterName) { + drainingClusters.put(clusterName, new AtomicBoolean(true)); + return CompletableFuture.completedFuture(null); + } + + /** + * Gracefully closes all active connections for the specified virtual cluster.
+ * Strategy: Disable autoRead on downstream channels to prevent new requests, + * but keep upstream channels reading to allow responses to complete naturally. + */ + public CompletableFuture<Void> gracefullyCloseConnections(String clusterName, Duration timeout) { + Set<Channel> downstreamChannels = connectionTracker.getDownstreamActiveChannels(clusterName); + Set<Channel> upstreamChannels = connectionTracker.getUpstreamActiveChannels(clusterName); + + var allCloseFutures = new ArrayList<CompletableFuture<Void>>(); + + // STRATEGY: + // - Downstream (autoRead=false): Prevents new client requests from being processed + // - Upstream (autoRead=true): Allows Kafka responses to be processed normally + + // Add downstream channel close futures + downstreamChannels.stream() + .map(this::disableAutoReadOnDownstreamChannel) + .map(channel -> gracefullyCloseChannel(channel, clusterName, timeout, "DOWNSTREAM")) + .forEach(allCloseFutures::add); + + // Add upstream channel close futures + upstreamChannels.stream() + .map(channel -> gracefullyCloseChannel(channel, clusterName, timeout, "UPSTREAM")) + .forEach(allCloseFutures::add); + + return CompletableFuture.allOf(allCloseFutures.toArray(new CompletableFuture[0])); + } + + private Channel disableAutoReadOnDownstreamChannel(Channel downstreamChannel) { + try { + if (downstreamChannel.isActive()) { + KafkaProxyFrontendHandler frontendHandler = + downstreamChannel.pipeline().get(KafkaProxyFrontendHandler.class); + if (frontendHandler != null) { + frontendHandler.applyBackpressure(); } - ``` - - - **Phase 3: Monitor In-Flight Message Completion and close channel** - Monitor in-flight count every 100ms for draining while waiting for in-flight count to reach zero naturally. If for some reason, the in-flight count does not reach zero (hangs, could be due to underlying kafka going down), force close after timeout to prevent indefinite hangs. Once in-flight count reaches zero (or after the timeout), close the channel immediately.
- ``` - private CompletableFuture gracefullyCloseChannel(Channel channel, String clusterName, - String channelType, Duration timeout) { - CompletableFuture future = new CompletableFuture<>(); - long startTime = System.currentTimeMillis(); - - // Schedule timeout - ScheduledFuture timeoutTask = scheduler.schedule(() -> { - if (!future.isDone()) { - LOGGER.warn("Graceful shutdown timeout exceeded for {} channel L:/{}, R:/{} in cluster '{}' - forcing immediate closure", - channelType, channel.localAddress(), channel.remoteAddress(), clusterName); - closeChannelImmediately(channel, future); - } - }, timeoutMillis, TimeUnit.MILLISECONDS); - - // Schedule periodic checks for in-flight messages - ScheduledFuture checkTask = scheduler.scheduleAtFixedRate(() -> { - try { - if (future.isDone()) { - return; - } - - int pendingRequests = inFlightTracker.getPendingRequestCount(clusterName, channel); - long elapsed = System.currentTimeMillis() - startTime; - - if (pendingRequests == 0) { - LOGGER.info("In-flight messages cleared for {} channel L:/{}, R:/{} in cluster '{}' - proceeding with connection closure ({}ms elapsed)", - channelType, channel.localAddress(), channel.remoteAddress(), clusterName, elapsed); - closeChannelImmediately(channel, future); - } - else { - // Just wait for existing in-flight messages to complete naturally - // Do NOT call channel.read() as it would trigger processing of new messages - int totalPending = inFlightTracker.getTotalPendingRequestCount(clusterName); - LOGGER.debug("Waiting for {} channel L:/{}, R:/{} in cluster '{}' to drain: {} pending requests (cluster total: {}, {}ms elapsed)", - channelType, channel.localAddress(), channel.remoteAddress(), clusterName, pendingRequests, totalPending, elapsed); - } - } - catch (Exception e) { - LOGGER.error("Unexpected error during graceful shutdown monitoring for channel L:/{}, R:/{} in cluster '{}'", - channel.localAddress(), channel.remoteAddress(), clusterName, e); - future.completeExceptionally(e); 
- } - }, 50, 100, TimeUnit.MILLISECONDS); // Check every 100ms for faster response - - // Cancel scheduled tasks when future completes and log final result - future.whenComplete((result, throwable) -> { - timeoutTask.cancel(false); - checkTask.cancel(false); - - if (throwable == null) { - LOGGER.info("Successfully completed graceful shutdown of {} channel L:/{}, R:/{} in cluster '{}'", - channelType, channel.localAddress(), channel.remoteAddress(), clusterName); - } - else { - LOGGER.error("Graceful shutdown failed for {} channel L:/{}, R:/{} in cluster '{}': {}", - channelType, channel.localAddress(), channel.remoteAddress(), clusterName, throwable.getMessage()); - } - }); - - return future; - } - - private void closeChannelImmediately(Channel channel, CompletableFuture future) { - if (future.isDone()) { - return; - } - - channel.close().addListener(channelFuture -> { - if (channelFuture.isSuccess()) { - future.complete(null); - } - else { - future.completeExceptionally(channelFuture.cause()); - } - }); - } - ``` - - **How will drain mode reject new client connections ?** - For this, we will put a check in KafkaProxyFrontendHandler#channelActive method to reject new connections, if the particular cluster is in drain mode. - ``` - public class KafkaProxyFrontendHandler - extends ChannelInboundHandlerAdapter - implements NetFilter.NetFilterContext { - .... - @Override - public void channelActive(ChannelHandlerContext ctx) throws Exception { - this.clientCtx = ctx; - - // Check if we should accept this connection (not draining) - String clusterName = virtualClusterModel.getClusterName(); - if (connectionDrainManager != null && !connectionDrainManager.shouldAcceptConnection(clusterName)) { - LOGGER.info("Rejecting new connection for draining cluster '{}'", clusterName); - ctx.close(); - return; + else { + downstreamChannel.config().setAutoRead(false); } - - this.proxyChannelStateMachine.onClientActive(this); - super.channelActive(this.clientCtx); } - .... 
} - ``` + catch (Exception e) { + LOGGER.warn("Failed to disable autoRead for downstream channel - continuing", e); + } + return downstreamChannel; + } + + /** + * Gracefully closes a single channel. + * Monitors in-flight count every 100ms, closes when zero or on timeout. + */ + private CompletableFuture<Void> gracefullyCloseChannel(Channel channel, String clusterName, + Duration timeout, String channelType) { + CompletableFuture<Void> future = new CompletableFuture<>(); + long timeoutMillis = timeout.toMillis(); + long startTime = System.currentTimeMillis(); + + // Schedule timeout + ScheduledFuture<?> timeoutTask = scheduler.schedule(() -> { + if (!future.isDone()) { + LOGGER.warn("Graceful shutdown timeout - forcing closure"); + closeChannelImmediately(channel, future); + } + }, timeoutMillis, TimeUnit.MILLISECONDS); + + // Schedule periodic checks for in-flight messages + ScheduledFuture<?> checkTask = scheduler.scheduleAtFixedRate(() -> { + if (future.isDone()) return; + + int pendingRequests = inFlightTracker.getPendingRequestCount(clusterName, channel); + if (pendingRequests == 0) { + closeChannelImmediately(channel, future); + } + }, 50, 100, TimeUnit.MILLISECONDS); + + // Cancel tasks when done + future.whenComplete((result, throwable) -> { + timeoutTask.cancel(false); + checkTask.cancel(false); + }); + + return future; + } + + private void closeChannelImmediately(Channel channel, CompletableFuture<Void> future) { + if (future.isDone()) return; -3. **ConnectionTracker** - - **What it does** - Maintains real-time inventory of all active network connections per virtual cluster. You can't gracefully drain connections if you don't know what connections exist - this class provides that visibility.
- - **Key Responsibilities:** - - **Bidirectional Tracking**: Separately tracks downstream connections (client→proxy) and upstream connections (proxy→Kafka) - - **Channel Management**: Maintains collections of active `Channel` objects for bulk operations like graceful closure - - **Lifecycle Integration**: Integrates with `ProxyChannelStateMachine` to automatically track connection establishment and closure - - **Cleanup Logic**: Automatically removes references to closed channels and cleans up empty cluster entries + channel.close().addListener(channelFuture -> { + if (channelFuture.isSuccess()) { + future.complete(null); + } + else { + future.completeExceptionally(channelFuture.cause()); + } + }); + } +} ``` + +### 3. ConnectionTracker + +Maintains real-time inventory of all active network connections per virtual cluster. + +- **What it does**: Provides real-time visibility into all active connections (downstream and upstream) for each virtual cluster +- **Key Responsibilities:** + - **Bidirectional Tracking**: Separately tracks downstream connections (client→proxy) and upstream connections (proxy→Kafka) using `ConcurrentHashMap` + - **Channel Management**: Maintains collections of active `Channel` objects for bulk operations like graceful closure + - **Lifecycle Integration**: Integrates with `ProxyChannelStateMachine` to automatically track connection establishment and closure events + - **Cleanup Logic**: Automatically removes references to closed channels and cleans up empty cluster entries to prevent memory leaks + - **Thread Safety**: Uses `ConcurrentHashMap` and `AtomicInteger` for thread-safe operations from multiple Netty event loops + +```java public class ConnectionTracker { // Downstream connections (client → proxy) @@ -632,49 +1284,20 @@ public class ConnectionTracker { onConnectionClosed(clusterName, channel, downstreamConnections, downstreamChannelsByCluster); } - - /** - Called by ConnectionDrainManager - */ public Set 
getDownstreamActiveChannels(String clusterName) { Set<Channel> channels = downstreamChannelsByCluster.get(clusterName); return channels != null ? Set.copyOf(channels) : Set.of(); } - // === UPSTREAM CONNECTION TRACKING === - public void onUpstreamConnectionEstablished(String clusterName, Channel channel) { - upstreamConnections.computeIfAbsent(clusterName, k -> new AtomicInteger(0)).incrementAndGet(); - upstreamChannelsByCluster.computeIfAbsent(clusterName, k -> ConcurrentHashMap.newKeySet()).add(channel); - } - - public void onUpstreamConnectionClosed(String clusterName, Channel channel) { - onConnectionClosed(clusterName, channel, upstreamConnections, upstreamChannelsByCluster); - } + // Similar methods for upstream connections... - /** - Called by ConnectionDrainManager - */ - public Set<Channel> getUpstreamActiveChannels(String clusterName) { - Set<Channel> channels = upstreamChannelsByCluster.get(clusterName); - return channels != null ? Set.copyOf(channels) : Set.of(); - } - - /** - Called by ConnectionDrainManager - */ public int getTotalConnectionCount(String clusterName) { return getDownstreamActiveConnectionCount(clusterName) + getUpstreamActiveConnectionCount(clusterName); } - /** - * Common method to remove a connection and clean up empty entries. - * This method decrements the connection counter and removes the channel from the set, - * cleaning up empty entries to prevent memory leaks. - */ private void onConnectionClosed(String clusterName, Channel channel, Map<String, AtomicInteger> connectionCounters, Map<String, Set<Channel>> channelsByCluster) { - // Decrement counter and remove if zero or negative AtomicInteger counter = connectionCounters.get(clusterName); if (counter != null) { counter.decrementAndGet(); @@ -683,7 +1306,6 @@ public class ConnectionTracker { } } - // Remove channel from set and remove empty sets Set<Channel> channels = channelsByCluster.get(clusterName); if (channels != null) { channels.remove(channel); @@ -695,13 +1317,20 @@ public class ConnectionTracker { } ``` -4.
**InFlightMessageTracker** - - **What it does** - Tracks **pending Kafka requests** to ensure no messages are lost during connection closure. This enables the "wait for completion" strategy - connections are only closed after all pending requests have received responses. - - **Key Responsibilities:** - - **Request Tracking**: Increments counters when Kafka requests are sent upstream in `ProxyChannelStateMachine` - - **Response Tracking**: Decrements counters when Kafka responses are received in `ProxyChannelStateMachine` - - **Channel Cleanup**: Handles cleanup when channels close unexpectedly, adjusting counts appropriately -``` +### 4. InFlightMessageTracker + +Tracks pending Kafka requests to ensure no messages are lost during connection closure. + +- **What it does**: Maintains counters of pending Kafka requests per channel and cluster to enable "wait for completion" strategy during graceful shutdown +- **Key Responsibilities:** + - **Request Tracking**: Increments counters when Kafka requests are sent upstream (called from `ProxyChannelStateMachine.messageFromClient()`) + - **Response Tracking**: Decrements counters when Kafka responses are received (called from `ProxyChannelStateMachine.messageFromServer()`) + - **Per-Channel Counts**: Maintains a two-level map: `cluster name → channel → pending count` for granular tracking + - **Cluster Totals**: Maintains a separate map for quick cluster-wide total lookup without iterating all channels + - **Channel Cleanup**: When a channel closes unexpectedly, adjusts counts appropriately to prevent stuck counters + - **Thread Safety**: Uses `ConcurrentHashMap` and `AtomicInteger` for thread-safe concurrent access + +```java public class InFlightMessageTracker { // Map from cluster name to channel to pending request count @@ -712,9 +1341,6 @@ public class InFlightMessageTracker { /** * Records that a request has been sent to the upstream cluster. - * - * @param clusterName The name of the virtual cluster. 
- * @param channel The channel handling the request. */ public void onRequestSent(String clusterName, Channel channel) { pendingRequests.computeIfAbsent(clusterName, k -> new ConcurrentHashMap<>()) @@ -727,9 +1353,6 @@ public class InFlightMessageTracker { /** * Records that a response has been received from the upstream cluster. - * - * @param clusterName The name of the virtual cluster. - * @param channel The channel handling the response. */ public void onResponseReceived(String clusterName, Channel channel) { Map<Channel, AtomicInteger> clusterRequests = pendingRequests.get(clusterName); @@ -756,10 +1379,7 @@ public class InFlightMessageTracker { } /** - * Records that a channel has been closed, clearing all pending requests for that channel. - * - * @param clusterName The name of the virtual cluster. - * @param channel The channel that was closed. + * Records that a channel has been closed, clearing all pending requests. */ public void onChannelClosed(String clusterName, Channel channel) { Map<Channel, AtomicInteger> clusterRequests = pendingRequests.get(clusterName); @@ -768,7 +1388,6 @@ public class InFlightMessageTracker { if (channelCounter != null) { int pendingCount = channelCounter.get(); if (pendingCount > 0) { - // Subtract from total AtomicInteger totalCounter = totalPendingByCluster.get(clusterName); if (totalCounter != null) { int newTotal = totalCounter.addAndGet(-pendingCount); @@ -785,13 +1404,6 @@ public class InFlightMessageTracker { } } - /** - * Gets the number of pending requests for a specific channel in a virtual cluster. - * - * @param clusterName The name of the virtual cluster. - * @param channel The channel. - * @return The number of pending requests. - */ public int getPendingRequestCount(String clusterName, Channel channel) { Map<Channel, AtomicInteger> clusterRequests = pendingRequests.get(clusterName); if (clusterRequests != null) { @@ -801,12 +1413,6 @@ public class InFlightMessageTracker { return 0; } - /** - * Gets the total number of pending requests for a virtual cluster across all channels.
- * - * @param clusterName The name of the virtual cluster. - * @return The total number of pending requests. - */ public int getTotalPendingRequestCount(String clusterName) { AtomicInteger counter = totalPendingByCluster.get(clusterName); return counter != null ? Math.max(0, counter.get()) : 0; @@ -814,45 +1420,77 @@ public class InFlightMessageTracker { } ``` -5. **Changes in ProxyChannelStateMachine** - We need to enhance the existing state machine for - - **Connection Lifecycle**: Automatically notifies ConnectionTracker when connections are established/closed - - **In-flight Message Tracking**: Automatically notifies InFlightMessageTracker when requests/responses flow through +### 5. ConfigurationChangeRollbackTracker -Example code changes for existing ProxyChannelStateMachine methods -``` -void messageFromServer(Object msg) { - // Track responses received from upstream Kafka (completing in-flight requests) - if (inFlightTracker != null && msg instanceof ResponseFrame && backendHandler != null) { - inFlightTracker.onResponseReceived(clusterName, backendHandler.serverCtx().channel()); - } +Maintains a record of all cluster operations so they can be reversed if the overall configuration change fails. + +- **What it does**: Records all successful cluster operations during a configuration change so they can be undone if a later operation fails +- **Key Responsibilities:** + - **Removal Tracking**: Stores the cluster name and original `VirtualClusterModel` for each removed cluster, enabling re-addition on rollback + - **Modification Tracking**: Stores both the original and new `VirtualClusterModel` for each modified cluster, enabling revert to original state + - **Addition Tracking**: Stores the cluster name and `VirtualClusterModel` for each added cluster, enabling removal on rollback + - **Rollback Order**: Provides ordered lists to enable rollback in reverse order: Added → Modified → Removed + +```java +public class ConfigurationChangeRollbackTracker { - .... 
+ private final List<String> removedClusters = new ArrayList<>(); + private final List<String> modifiedClusters = new ArrayList<>(); + private final List<String> addedClusters = new ArrayList<>(); - // Track responses being sent to client on downstream channel - if (inFlightTracker != null && msg instanceof ResponseFrame) { - inFlightTracker.onResponseReceived(clusterName, frontendHandler.clientCtx().channel()); + private final Map<String, VirtualClusterModel> removedModels = new HashMap<>(); + private final Map<String, VirtualClusterModel> originalModels = new HashMap<>(); + private final Map<String, VirtualClusterModel> modifiedModels = new HashMap<>(); + private final Map<String, VirtualClusterModel> addedModels = new HashMap<>(); + + public void trackRemoval(String clusterName, VirtualClusterModel removedModel) { + removedClusters.add(clusterName); + removedModels.put(clusterName, removedModel); } -} -void messageFromClient(Object msg) { - // Track requests being sent upstream (creating in-flight messages) - if (inFlightTracker != null && msg instanceof RequestFrame && backendHandler != null) { - inFlightTracker.onRequestSent(clusterName, backendHandler.serverCtx().channel()); + public void trackModification(String clusterName, VirtualClusterModel originalModel, + VirtualClusterModel newModel) { + modifiedClusters.add(clusterName); + originalModels.put(clusterName, originalModel); + modifiedModels.put(clusterName, newModel); + } + + public void trackAddition(String clusterName, VirtualClusterModel addedModel) { + addedClusters.add(clusterName); + addedModels.put(clusterName, addedModel); } - .... + // Getter methods for rollback operations... } +``` + +### 6. Integration with ProxyChannelStateMachine -void onClientRequest(SaslDecodePredicate dp, - Object msg) { - .... +The existing `ProxyChannelStateMachine` is enhanced to integrate with connection tracking and in-flight message tracking.
- // Track requests received from client on downstream channel - if (inFlightTracker != null && msg instanceof RequestFrame) { - inFlightTracker.onRequestSent(clusterName, frontendHandler.clientCtx().channel()); +- **What it does**: Adds hooks into the existing state machine to notify `ConnectionTracker` and `InFlightMessageTracker` of connection and message lifecycle events +- **Key Responsibilities:** + - **Connection Lifecycle**: Automatically notifies `ConnectionTracker` when connections are established (`toClientActive`, `toForwarding`) and closed (`onServerInactive`, `onClientInactive`) + - **In-flight Message Tracking**: Automatically notifies `InFlightMessageTracker` when requests are sent upstream (`messageFromClient`) and responses received (`messageFromServer`) + - **Cleanup on Close**: Ensures `InFlightMessageTracker.onChannelClosed()` is called when channels close to clear any pending counts + +```java +// Example integration points in ProxyChannelStateMachine + +void messageFromServer(Object msg) { + // Track responses received from upstream Kafka + if (inFlightTracker != null && msg instanceof ResponseFrame && backendHandler != null) { + inFlightTracker.onResponseReceived(clusterName, backendHandler.serverCtx().channel()); } + // ... existing logic +} - .... +void messageFromClient(Object msg) { + // Track requests being sent upstream + if (inFlightTracker != null && msg instanceof RequestFrame && backendHandler != null) { + inFlightTracker.onRequestSent(clusterName, backendHandler.serverCtx().channel()); + } + // ... 
existing logic } void onServerInactive() { @@ -860,12 +1498,11 @@ void onServerInactive() { if (connectionTracker != null && backendHandler != null) { connectionTracker.onUpstreamConnectionClosed(clusterName, backendHandler.serverCtx().channel()); } - // Clear any pending in-flight messages for this upstream channel + // Clear any pending in-flight messages if (inFlightTracker != null && backendHandler != null) { inFlightTracker.onChannelClosed(clusterName, backendHandler.serverCtx().channel()); } - - .... + // ... existing logic } void onClientInactive() { @@ -873,33 +1510,178 @@ void onClientInactive() { if (connectionTracker != null && frontendHandler != null) { connectionTracker.onDownstreamConnectionClosed(clusterName, frontendHandler.clientCtx().channel()); } - // Clear any pending in-flight messages for this downstream channel if (inFlightTracker != null && frontendHandler != null) { inFlightTracker.onChannelClosed(clusterName, frontendHandler.clientCtx().channel()); } - - .... + // ... existing logic } private void toClientActive(ProxyChannelState.ClientActive clientActive, KafkaProxyFrontendHandler frontendHandler) { - .... // Track downstream connection establishment if (connectionTracker != null) { connectionTracker.onDownstreamConnectionEstablished(clusterName, frontendHandler.clientCtx().channel()); } + // ... existing logic } private void toForwarding(Forwarding forwarding) { - .... // Track upstream connection establishment if (connectionTracker != null && backendHandler != null) { connectionTracker.onUpstreamConnectionEstablished(clusterName, backendHandler.serverCtx().channel()); } + // ... existing logic } ``` -# Challenges/Open questions -- If for some reason, loading of the new cluster configs fails, the code will automatically rollback to the previous state. However this will put the app in such a state that the current config file content does not match with the actual running cluster config. 
-- What if the rollback fails (for some unforeseen reason), the only way for the operator to know this is via Logs. In such cases, a full app restart might be required. -- If there are multiple gateway nodes running, and if there is failure in few nodes, we may have to introduce some sort of status co-ordinator rather than relying that all instances will behave the same. +### 7. Rejecting New Connections During Drain + +The `KafkaProxyFrontendHandler` checks if a cluster is draining before accepting new connections. + +- **What it does**: Adds a guard check in `channelActive()` to reject new connections when a cluster is being drained +- **How it works**: Calls `connectionDrainManager.shouldAcceptConnection(clusterName)` before allowing the connection to proceed. If the cluster is draining, immediately closes the channel with a log message. +- **Why this is needed**: Without this check, new connections could be established while we're trying to drain existing connections, making the drain process take longer or never complete. 
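As a rough illustration of the guard described above, the drain-mode bookkeeping can be reduced to a small thread-safe registry. This is a minimal sketch, not the proposal's `ConnectionDrainManager`: the class name `DrainRegistrySketch` is hypothetical, and `stopDraining` is modelled on the call named in the restart flow diagram rather than on code shown in this section.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: tracks which virtual clusters are currently draining. */
public class DrainRegistrySketch {

    // Thread-safe set of cluster names that are in drain mode.
    private final Set<String> draining = ConcurrentHashMap.newKeySet();

    /** Enters drain mode; returns false if the cluster was already draining. */
    public boolean startDraining(String clusterName) {
        return draining.add(clusterName);
    }

    /** Leaves drain mode; returns false if the cluster was not draining. */
    public boolean stopDraining(String clusterName) {
        return draining.remove(clusterName);
    }

    /** New connections are accepted only for clusters that are not draining. */
    public boolean shouldAcceptConnection(String clusterName) {
        return !draining.contains(clusterName);
    }

    public static void main(String[] args) {
        DrainRegistrySketch registry = new DrainRegistrySketch();
        registry.startDraining("demo-cluster");
        System.out.println(registry.shouldAcceptConnection("demo-cluster"));  // false
        System.out.println(registry.shouldAcceptConnection("other-cluster")); // true
        registry.stopDraining("demo-cluster");
        System.out.println(registry.shouldAcceptConnection("demo-cluster"));  // true
    }
}
```

A concurrent set keeps `shouldAcceptConnection` a cheap lookup, which matters if the check runs on a Netty event loop for every new connection.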
+ +```java +public class KafkaProxyFrontendHandler + extends ChannelInboundHandlerAdapter + implements NetFilter.NetFilterContext { + + @Override + public void channelActive(ChannelHandlerContext ctx) throws Exception { + this.clientCtx = ctx; + + // Check if we should accept this connection (not draining) + String clusterName = virtualClusterModel.getClusterName(); + if (connectionDrainManager != null && !connectionDrainManager.shouldAcceptConnection(clusterName)) { + LOGGER.info("Rejecting new connection for draining cluster '{}'", clusterName); + ctx.close(); + return; + } + + this.proxyChannelStateMachine.onClientActive(this); + super.channelActive(this.clientCtx); + } +} +``` + +## Graceful Restart Flow Diagram + +``` +┌───────────────────────────────────────────────────────────────────────────┐ +│ PHASE 1: INITIATE DRAINING │ +│ │ +│ VirtualClusterManager.restartVirtualCluster() │ +│ │ │ +│ ▼ │ +│ ConnectionDrainManager.startDraining(clusterName) │ +│ - drainingClusters.put(clusterName, true) │ +│ - New connections will be REJECTED │ +└───────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────────────────┐ +│ PHASE 2: APPLY BACKPRESSURE │ +│ │ +│ ConnectionDrainManager.gracefullyCloseConnections() │ +│ │ +│ For each DOWNSTREAM channel (client → proxy): │ +│ - channel.config().setAutoRead(false) │ +│ - Stops receiving NEW client requests │ +│ │ +│ For each UPSTREAM channel (proxy → Kafka): │ +│ - autoRead remains TRUE │ +│ - Continues receiving Kafka responses │ +│ │ +│ Result: In-flight count decreases naturally as responses arrive │ +└───────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────────────────┐ +│ PHASE 3: MONITOR & CLOSE CHANNELS │ +│ │ +│ For each channel: │ +│ ┌─────────────────────────────────────────────────────────────────────┐ │ +│ │ 
scheduler.scheduleAtFixedRate(() -> { │ │ +│ │ pendingRequests = inFlightTracker.getPendingRequestCount() │ │ +│ │ │ │ +│ │ if (pendingRequests == 0) { │ │ +│ │ channel.close() ◄── Safe to close! │ │ +│ │ } │ │ +│ │ }, 50ms, 100ms) │ │ +│ │ │ │ +│ │ scheduler.schedule(() -> { │ │ +│ │ if (!done) channel.close() ◄── Force close on timeout │ │ +│ │ }, 30 seconds) │ │ +│ └─────────────────────────────────────────────────────────────────────┘ │ +└───────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────────────────┐ +│ PHASE 4: DEREGISTER & REGISTER │ +│ │ +│ endpointRegistry.deregisterVirtualCluster(oldGateway) │ +│ - Unbinds network ports │ +│ │ +│ endpointRegistry.registerVirtualCluster(newGateway) │ +│ - Binds network ports with new configuration │ +└───────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────────────────┐ +│ PHASE 5: STOP DRAINING │ +│ │ +│ ConnectionDrainManager.stopDraining(clusterName) │ +│ - drainingClusters.remove(clusterName) │ +│ - New connections now ACCEPTED │ +│ - Cluster is fully operational with new configuration │ +└───────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +# Example Usage + +## Triggering a Reload with curl + +```bash +curl -X POST http://localhost:9190/admin/config/reload \ + -H "Content-Type: application/yaml" \ + --data-binary @new-config.yaml +``` + +**Response:** +```json +{ + "success": true, + "message": "Configuration reloaded successfully", + "clustersModified": 1, + "clustersAdded": 1, + "clustersRemoved": 0, + "timestamp": "2024-01-15T10:30:00.123456Z" +} +``` + +## Example Configuration with Reload Endpoint + +```yaml +management: + bindAddress: 0.0.0.0 + port: 9190 + endpoints: + prometheus: {} + configReload: + enabled: true + timeout: 60s + +virtualClusters: +- name: "demo-cluster" + 
targetCluster: + bootstrapServers: "broker:9092" + gateways: + - name: "default-gateway" + portIdentifiesNode: + bootstrapAddress: "localhost:9092" +``` + +--- From df3aabf1861cc4620f2052efcccec9ef09706de2 Mon Sep 17 00:00:00 2001 From: Sam Barker Date: Wed, 18 Feb 2026 16:36:11 +1300 Subject: [PATCH 03/17] Restructure proposal to reflect discussion consensus MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Rewrite the hot reload proposal to focus on architectural decisions rather than implementation detail. The PR discussion has established consensus on several key points that the document didn't reflect: - Reframe around applyConfiguration(Configuration) as the core API, decoupled from trigger mechanisms (HTTP, file watcher, operator) - Remove all Java class implementations and handler chains — these belong in the code PR where they're reviewable in context - Call out remove+add with brief per-cluster downtime as deliberate - Call out all-or-nothing rollback as the initial default, consistent with startup behaviour - Move ReloadOptions to deployment-level static configuration rather than per-call parameters - Identify plugin resource tracking as a known gap with pointer to separate proposal - Flag open questions (config granularity, failure behaviour options, drain timeout configurability) - Defer trigger mechanism design as explicit future work Assisted-by: Claude claude-opus-4-6 Signed-off-by: Sam Barker Signed-off-by: Urjit Patel --- proposals/012-hot-reload-feature.md | 1736 ++------------------------- 1 file changed, 109 insertions(+), 1627 deletions(-) diff --git a/proposals/012-hot-reload-feature.md b/proposals/012-hot-reload-feature.md index 588714c..e3cca70 100644 --- a/proposals/012-hot-reload-feature.md +++ b/proposals/012-hot-reload-feature.md @@ -1,1687 +1,169 @@ -# Hot Reload Feature - HTTP-Based Approach +# Changing Active Proxy Configuration -As of today, any changes to virtual cluster configs 
(addition/removal/modification) require a full restart of Kroxylicious app. -This proposal describes the dynamic reload feature, which enables operators to modify virtual cluster configurations (add/remove/modify clusters) while **maintaining service availability for unaffected clusters** without the need for full application restarts. -This feature transforms Kroxylicious from a **"restart-to-configure"** system to a **"live-reconfiguration"** system. +This proposal describes a mechanism for applying configuration changes to a running Kroxylicious proxy without a full restart. +The proxy exposes a core `applyConfiguration(Configuration)` operation that accepts a complete configuration, detects what changed, and converges the running state to match — restarting only the affected virtual clusters while leaving unaffected clusters available. -## HTTP-Based vs File Watcher Approach +## Current situation -The original proposal used a file-based watcher mechanism to detect configuration changes. This has been replaced with an **HTTP-based trigger mechanism** for the following reasons: +Any change to Kroxylicious configuration — adding, removing, or modifying a virtual cluster, changing a filter definition, or updating default filters — requires a full restart of the proxy process. +This means all virtual clusters are torn down and rebuilt, dropping every client connection even if only one cluster was modified. 
-| Aspect | File Watcher | HTTP Endpoint | -|--------|--------------|---------------| -| **Trigger** | Automatic on file change | Explicit HTTP POST request | -| **Control** | Passive monitoring | Active, operator-controlled | -| **Configuration Delivery** | Read from filesystem | Sent in request body | -| **Response** | Asynchronous (via logs) | Synchronous HTTP response | -| **Validation** | After file is saved | Before applying changes | -| **Rollback** | Manual file restore | Automatic on failure | +In Kubernetes deployments, a configuration change means a pod restart. +In standalone deployments, it means stopping and restarting the process. +Both cause a service interruption that is disproportionate to the scope of the change. -This proposal is structured as a multi-part implementation to ensure clear separation of concerns: +## Motivation -- **Part 1: HTTP-Based Configuration Reload Endpoint** - This part focuses on the HTTP endpoint that receives new configurations, validates them, and triggers the hot-reload process. It provides synchronous feedback via HTTP response with detailed reload results. +Operators need to be able to modify proxy configuration in place. +Common scenarios include: -- **Part 2: Graceful Virtual Cluster Restart** - This part handles the actual restart operations, including graceful connection draining, in-flight message completion, and rollback mechanisms. It takes the change decisions from Part 1 and executes them safely while ensuring minimal service disruption. +- **Adding or removing virtual clusters** as tenants are onboarded or offboarded. +- **Updating filter configuration** (e.g. changing encryption keys, adjusting rate limits, modifying ACL rules). +- **Rotating TLS certificates or credentials** that filters reference. 
-**POC PR** - https://github.com/kroxylicious/kroxylicious/pull/3176 ---- +The proxy should apply these changes with minimal disruption: only the virtual clusters affected by the change should experience downtime. +Unaffected clusters should continue serving traffic without interruption. -# Part 1: HTTP-Based Configuration Reload Endpoint +## Proposal -With this framework, operators can trigger configuration reloads by sending an HTTP POST request to `/admin/config/reload` with the new YAML configuration in the request body. The endpoint validates the configuration, detects changes, and orchestrates the reload process with full rollback support. +### Core API: `applyConfiguration()` -## Endpoint Configuration - -To enable the reload endpoint, add the following to your kroxylicious configuration: - -```yaml -management: - endpoints: - prometheus: {} - configReload: - enabled: true - timeout: 60s # Optional, defaults to 60s -``` - -## HTTP API - -**Endpoint:** `POST /admin/config/reload` - -**Request:** -- **Method:** POST -- **Content-Type:** `application/yaml`, `text/yaml`, or `application/x-yaml` -- **Body:** Complete YAML configuration - -**Response:** -- **Content-Type:** `application/json` -- **Status:** 200 OK (success), 400 Bad Request (validation error), 409 Conflict (concurrent reload), 500 Internal Server Error (failure) - -**Example Response (Success):** -```json -{ - "success": true, - "message": "Configuration reloaded successfully", - "clustersModified": 1, - "clustersAdded": 0, - "clustersRemoved": 0, - "timestamp": "2024-01-15T10:30:00Z" -} -``` - -**Example Response (Failure):** -```json -{ - "success": false, - "message": "Configuration validation failed: invalid bootstrap servers", - "clustersModified": 0, - "clustersAdded": 0, - "clustersRemoved": 0, - "timestamp": "2024-01-15T10:30:00Z" -} -``` - -> **WARNING:** This endpoint has NO authentication and is INSECURE by design. Use network policies or firewalls to restrict access. 
- -## Core Classes & Structure - -### 1. ConfigurationReloadEndpoint - -HTTP POST endpoint handler for triggering configuration reload at `/admin/config/reload`. - -- **What it does**: Receives HTTP POST requests containing YAML configuration and initiates the reload process -- **Key Responsibilities:** - - Extracts the YAML configuration from the HTTP request body - - Delegates request processing to `ReloadRequestProcessor` - - Formats successful responses using `ResponseFormatter` - - Handles different exception types and returns appropriate HTTP status codes: - - `400 Bad Request` for validation errors (invalid YAML, wrong content-type) - - `409 Conflict` for concurrent reload attempts - - `500 Internal Server Error` for reload failures - - Provides structured JSON response with reload results - -```java -public class ConfigurationReloadEndpoint implements Function { - - public static final String PATH = "/admin/config/reload"; - - private final ReloadRequestProcessor requestProcessor; - private final ResponseFormatter responseFormatter; - - public ConfigurationReloadEndpoint( - ReloadRequestProcessor requestProcessor, - ResponseFormatter responseFormatter) { - this.requestProcessor = Objects.requireNonNull(requestProcessor); - this.responseFormatter = Objects.requireNonNull(responseFormatter); - } - - @Override - public HttpResponse apply(HttpRequest request) { - try { - // Create context from request - ReloadRequestContext context = ReloadRequestContext.from(request); - - // Process request through handler chain - ReloadResponse response = requestProcessor.process(context); - - // Format and return response - return responseFormatter.format(response, request); - } - catch (ValidationException e) { - return createErrorResponse(request, HttpResponseStatus.BAD_REQUEST, e.getMessage()); - } - catch (ConcurrentReloadException e) { - return createErrorResponse(request, HttpResponseStatus.CONFLICT, e.getMessage()); - } - catch (ReloadException e) { - return 
createErrorResponse(request, HttpResponseStatus.INTERNAL_SERVER_ERROR, e.getMessage()); - } - } -} -``` - -### 2. ReloadRequestProcessor - -Processes reload requests using the **Chain of Responsibility** pattern. Each handler performs a specific task and passes the context to the next handler. - -- **What it does**: Orchestrates the request processing pipeline by chaining multiple handlers that validate, parse, and execute the reload -- **Key Responsibilities:** - - Builds the handler chain in the correct order: validation → parsing → execution - - Passes an immutable `ReloadRequestContext` through each handler - - Each handler can enrich the context (e.g., add parsed Configuration) or throw exceptions - - Returns the final `ReloadResponse` from the context after all handlers complete - - Enforces maximum content length (10MB) to prevent memory exhaustion - -```java -public class ReloadRequestProcessor { - - private static final int MAX_CONTENT_LENGTH = 10 * 1024 * 1024; // 10MB - - private final List handlers; - - public ReloadRequestProcessor( - ConfigParser parser, - ConfigurationReloadOrchestrator orchestrator, - long timeoutSeconds) { - this.handlers = List.of( - new ContentTypeValidationHandler(), // 1. Validates Content-Type header - new ContentLengthValidationHandler(MAX_CONTENT_LENGTH), // 2. Validates body size - new ConfigurationParsingHandler(parser), // 3. Parses YAML to Configuration - new ConfigurationReloadHandler(orchestrator, timeoutSeconds)); // 4. 
Executes reload - } - - public ReloadResponse process(ReloadRequestContext context) throws ReloadException { - ReloadRequestContext currentContext = context; - - for (ReloadRequestHandler handler : handlers) { - currentContext = handler.handle(currentContext); - } - - return currentContext.getResponse(); - } -} -``` - -**Handler Chain:** +The central operation is: ``` -┌─────────────────────────────────┐ -│ ContentTypeValidationHandler │ Validates Content-Type: application/yaml -└───────────────┬─────────────────┘ - │ - ▼ -┌─────────────────────────────────┐ -│ ContentLengthValidationHandler │ Validates body size <= 10MB -└───────────────┬─────────────────┘ - │ - ▼ -┌─────────────────────────────────┐ -│ ConfigurationParsingHandler │ Parses YAML → Configuration object -└───────────────┬─────────────────┘ - │ - ▼ -┌─────────────────────────────────┐ -│ ConfigurationReloadHandler │ Executes reload via orchestrator -└─────────────────────────────────┘ -``` - -### 3. ReloadRequestContext - -Immutable context object passed through the request processing chain. Uses the builder pattern for creating modified contexts. 
- -- **What it does**: Carries request data and processing results through the handler chain without mutation -- **Key Responsibilities:** - - Holds the original `HttpRequest` and extracted request body - - Stores the parsed `Configuration` after parsing handler completes - - Stores the final `ReloadResponse` after reload handler completes - - Provides immutable "with" methods that return new context instances with updated fields - - Uses `Builder` pattern for clean construction and modification - -```java -public class ReloadRequestContext { - - private final HttpRequest httpRequest; - private final String requestBody; - private final Configuration parsedConfiguration; - private final ReloadResponse response; - - public static ReloadRequestContext from(HttpRequest request) { - String body = null; - if (request instanceof FullHttpRequest fullRequest) { - ByteBuf content = fullRequest.content(); - if (content.readableBytes() > 0) { - body = content.toString(StandardCharsets.UTF_8); - } - } - - return new Builder() - .withHttpRequest(request) - .withRequestBody(body) - .build(); - } - - // Immutable "with" methods return new context instances - public ReloadRequestContext withParsedConfiguration(Configuration config) { - return new Builder(this).withParsedConfiguration(config).build(); - } - - public ReloadRequestContext withResponse(ReloadResponse response) { - return new Builder(this).withResponse(response).build(); - } -} +applyConfiguration(Configuration) ``` -### 4. ConfigurationReloadOrchestrator - -Orchestrates configuration reload operations with **concurrency control**, **validation**, and **state tracking**. Uses `ReentrantLock` to prevent concurrent reloads. 
- -- **What it does**: Acts as the main coordinator for the entire reload workflow, from validation through execution to state management -- **Key Responsibilities:** - - **Concurrency Control**: Uses `ReentrantLock.tryLock()` to prevent concurrent reloads and returns `ConcurrentReloadException` if a reload is already in progress - - **Configuration Validation**: Validates the new configuration using the `Features` framework before applying - - **FilterChainFactory Management**: Creates a new `FilterChainFactory` with updated filter definitions and performs atomic swap on success - - **Rollback on Failure**: If reload fails, closes the new factory and keeps the old factory active - - **State Tracking**: Maintains reload state (IDLE/IN_PROGRESS) via `ReloadStateManager` - - **Disk Persistence**: Persists successful configuration to disk by replacing the existing config file with the new one. A backup of the old config is also taken (.bak extension) +The caller provides a complete `Configuration` object. +The proxy compares it against the currently running configuration, determines what changed, and applies the changes. -```java -public class ConfigurationReloadOrchestrator { +This is a **state-of-the-world** approach: the caller provides the desired end state, and the proxy is responsible for computing and executing the diff. +This is the right starting point — it is simple to reason about and avoids the complexity of delta-based or partial-update APIs. +More granular approaches (deltas, targeted snapshots) are worth exploring later, but the initial API should leave room for them without committing to them now. 
- private final ConfigurationChangeHandler configurationChangeHandler; - private final PluginFactoryRegistry pluginFactoryRegistry; - private final Features features; - private final ReloadStateManager stateManager; - private final ReentrantLock reloadLock; +**Trigger mechanisms are explicitly out of scope for this proposal.** +The `applyConfiguration()` operation is the internal interface that any trigger plugs into. +How the new configuration arrives — whether via an HTTP endpoint, a file watcher detecting a changed ConfigMap, or a Kubernetes operator callback — is a separate concern. +Deferring this keeps the proposal focused and avoids blocking on unresolved questions about trigger design (see [Trigger mechanisms](#trigger-mechanisms-future-work) below). - private Configuration currentConfiguration; - private final @Nullable Path configFilePath; +**Failure behaviour is deployment-level static configuration, not a per-call parameter.** +Whether the proxy rolls back, terminates, or continues on failure will vary between deployments (a multi-tenant ingress has different requirements than a sidecar), but it should not vary between invocations within the same deployment. +These decisions belong in the proxy's static configuration, keeping the `applyConfiguration()` signature simple. 
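
The state-of-the-world comparison described above can be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: `ClusterDiff` and its fields are hypothetical names, and the map values simply stand in for whatever model the proxy compares via `equals()` (for example a virtual cluster model keyed by cluster name).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only: this type is not part of Kroxylicious.
final class ClusterDiff {
    final List<String> removed = new ArrayList<>();
    final List<String> modified = new ArrayList<>();
    final List<String> added = new ArrayList<>();

    /**
     * Compares two complete snapshots keyed by virtual cluster name.
     * A cluster is "modified" when it exists in both snapshots but its
     * model is no longer equal; unchanged clusters appear in no list.
     */
    static <M> ClusterDiff between(Map<String, M> running, Map<String, M> desired) {
        ClusterDiff diff = new ClusterDiff();
        for (Map.Entry<String, M> entry : running.entrySet()) {
            String name = entry.getKey();
            if (!desired.containsKey(name)) {
                diff.removed.add(name);
            }
            else if (!Objects.equals(entry.getValue(), desired.get(name))) {
                diff.modified.add(name);
            }
        }
        for (String name : desired.keySet()) {
            if (!running.containsKey(name)) {
                diff.added.add(name);
            }
        }
        return diff;
    }

    /** True when the desired snapshot matches the running one, so nothing needs restarting. */
    boolean isEmpty() {
        return removed.isEmpty() && modified.isEmpty() && added.isEmpty();
    }
}
```

Because the caller always supplies the full desired state, calling `ClusterDiff.between(running, desired)` with an identical snapshot yields an empty diff, which makes repeated applies of the same configuration harmless in this sketch.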
- // Shared mutable reference to FilterChainFactory - enables atomic swaps during hot reload - private final AtomicReference filterChainFactoryRef; +### Configuration change detection - public ConfigurationReloadOrchestrator( - Configuration initialConfiguration, - ConfigurationChangeHandler configurationChangeHandler, - PluginFactoryRegistry pluginFactoryRegistry, - Features features, - @Nullable Path configFilePath, - AtomicReference filterChainFactoryRef) { - this.currentConfiguration = Objects.requireNonNull(initialConfiguration); - this.configurationChangeHandler = Objects.requireNonNull(configurationChangeHandler); - this.pluginFactoryRegistry = Objects.requireNonNull(pluginFactoryRegistry); - this.features = Objects.requireNonNull(features); - this.filterChainFactoryRef = Objects.requireNonNull(filterChainFactoryRef); - this.configFilePath = configFilePath; - this.stateManager = new ReloadStateManager(); - this.reloadLock = new ReentrantLock(); - } +When `applyConfiguration()` is called, the proxy compares the new configuration against the running state to determine which virtual clusters need to be restarted. +A cluster requires a restart if any of the following changed: - /** - * Reload configuration with concurrency control. - * This method implements the Template Method pattern - it defines the reload algorithm - * skeleton with fixed steps. - */ - public CompletableFuture reload(Configuration newConfig) { - // 1. Check if reload already in progress - if (!reloadLock.tryLock()) { - return CompletableFuture.failedFuture( - new ConcurrentReloadException("A reload operation is already in progress")); - } +- **The virtual cluster model itself** — bootstrap address, TLS settings, gateway configuration, or any other property that contributes to `VirtualClusterModel.equals()`. +- **A filter definition used by the cluster** — if the type or configuration of a `NamedFilterDefinition` referenced by the cluster changed (compared via `equals()`). 
+- **The default filters list** — if the cluster relies on default filters and the default filters list changed (order matters, since filter chain execution is sequential). - Instant startTime = Instant.now(); +Clusters where none of these changed are left untouched — they continue serving traffic throughout the apply operation. - try { - // 2. Mark reload as started - stateManager.startReload(); +### Cluster modification: remove + add - // 3. Validate configuration - Configuration validatedConfig = validateConfiguration(newConfig); +A modified virtual cluster is restarted by tearing it down and rebuilding it with the new configuration. +This is a **remove then add** operation: - // 4. Execute reload - return executeReload(validatedConfig, startTime) - .whenComplete((result, error) -> { - if (error != null) { - stateManager.recordFailure(error); - } - else { - stateManager.recordSuccess(result); - this.currentConfiguration = validatedConfig; - persistConfigurationToDisk(validatedConfig); - } - }); - } - finally { - reloadLock.unlock(); - } - } +1. Gracefully drain existing connections (see below). +2. Deregister the cluster's gateways from the endpoint registry (unbind ports). +3. Register the cluster's new gateways with the endpoint registry (bind ports with new configuration). +4. Accept new connections. - /** - * Execute the configuration reload by creating a new FilterChainFactory, - * building a change context, and delegating to ConfigurationChangeHandler. - */ - private CompletableFuture executeReload(Configuration newConfig, Instant startTime) { - // 1. Create new FilterChainFactory with updated filter definitions - FilterChainFactory newFactory = new FilterChainFactory(pluginFactoryRegistry, newConfig.filterDefinitions()); +This means a modified cluster experiences a brief period of unavailability while its ports are unbound and rebound. +Clients connected to the cluster will be disconnected during the drain phase. - // 2. 
Get old factory for rollback capability - FilterChainFactory oldFactory = filterChainFactoryRef.get(); +This is a deliberate design choice. +More surgical approaches — such as swapping the filter chain on existing connections without dropping them, or performing a rolling handoff — would reduce disruption, but they add significant complexity (connection migration, state transfer between filter chain instances, partial rollback of in-flight connections). +The remove+add approach is the right starting point: it is straightforward, predictable, and consistent with how the proxy handles startup failures today. +More surgical alternatives are worth exploring as future work once the foundation is solid. - // 3. Build change context with both old and new factories - List oldModels = currentConfiguration.virtualClusterModel(pluginFactoryRegistry); - List newModels = newConfig.virtualClusterModel(pluginFactoryRegistry); +Changes are processed in the order: **remove → modify → add**. +Removing clusters first frees up ports and resources that new or modified clusters may need. - ConfigurationChangeContext changeContext = new ConfigurationChangeContext( - currentConfiguration, newConfig, - oldModels, newModels, - oldFactory, newFactory); +### Graceful connection draining - // 4. Execute configuration changes - return configurationChangeHandler.handleConfigurationChange(changeContext) - .thenApply(v -> { - // SUCCESS: Atomically swap to new factory - filterChainFactoryRef.set(newFactory); - if (oldFactory != null) { - oldFactory.close(); - } - return buildReloadResult(changeContext, startTime); - }) - .exceptionally(error -> { - // FAILURE: Rollback - close new factory, keep old factory - newFactory.close(); - throw new CompletionException("Configuration reload failed", error); - }); - } -} -``` - -### 5. ConfigurationChangeHandler - -Orchestrates the entire configuration change process from detection to execution with rollback capability. 
- -- **What it does**: Coordinates multiple change detectors, aggregates their results, and executes cluster operations in the correct order -- **Key Responsibilities:** - - **Detector Coordination**: Accepts a list of `ChangeDetector` implementations and runs all of them to identify changes - - **Result Aggregation**: Uses `LinkedHashSet` to merge results from all detectors, removing duplicates while maintaining order - - **Ordered Execution**: Processes changes in the correct order: Remove → Modify → Add (to free up ports/resources first) - - **Rollback Tracking**: Creates a `ConfigurationChangeRollbackTracker` to track all successful operations for potential rollback - - **Rollback on Failure**: If any operation fails, initiates rollback of all previously successful operations in reverse order - -```java -public class ConfigurationChangeHandler { - - private final List changeDetectors; - private final VirtualClusterManager virtualClusterManager; - - public ConfigurationChangeHandler(List changeDetectors, - VirtualClusterManager virtualClusterManager) { - this.changeDetectors = List.copyOf(changeDetectors); - this.virtualClusterManager = virtualClusterManager; - } - - /** - * Main entry point for handling configuration changes. - */ - public CompletableFuture handleConfigurationChange(ConfigurationChangeContext changeContext) { - // 1. Detect changes using all registered detectors - ChangeResult changes = detectChanges(changeContext); +Before tearing down a modified or removed cluster, the proxy drains its connections gracefully rather than dropping them abruptly. +The drain process has three phases: - if (!changes.hasChanges()) { - LOGGER.info("No changes detected - hot-reload not needed"); - return CompletableFuture.completedFuture(null); - } +1. **Reject new connections.** The cluster is marked as draining. Any new client connection attempt to the cluster is immediately refused. Unaffected clusters continue accepting connections normally. - // 2. 
Process changes with rollback tracking - ConfigurationChangeRollbackTracker rollbackTracker = new ConfigurationChangeRollbackTracker(); +2. **Apply backpressure to existing connections.** On each downstream (client → proxy) channel, reading is disabled (`autoRead = false`) so no new requests are accepted from the client. Upstream (proxy → Kafka) channels continue reading so that responses to already-forwarded requests can flow back to clients. - return processConfigurationChanges(changes, changeContext, rollbackTracker) - .whenComplete((result, throwable) -> { - if (throwable != null) { - LOGGER.error("Configuration change failed - initiating rollback", throwable); - performRollback(rollbackTracker); - } - else { - LOGGER.info("Configuration hot-reload completed successfully - {} operations processed", - changes.getTotalOperations()); - } - }); - } +3. **Wait for in-flight requests to complete, then close.** Each connection is monitored for in-flight Kafka requests. Once the pending request count for a connection reaches zero, the connection is closed. If the count does not reach zero within the drain timeout, the connection is force-closed. The drain timeout should be configurable — long-running consumer rebalances or slow produces with `acks=all` can legitimately exceed a short default. - /** - * Coordinates multiple change detectors and aggregates their results. - */ - private ChangeResult detectChanges(ConfigurationChangeContext context) { - Set allClustersToRemove = new LinkedHashSet<>(); - Set allClustersToAdd = new LinkedHashSet<>(); - Set allClustersToModify = new LinkedHashSet<>(); +This approach ensures that in-progress Kafka operations complete where possible, while bounding the time the proxy waits before proceeding with the restart. 
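
The per-connection part of the drain (phases 2 and 3 above) could look something like the sketch below. It is deliberately Netty-free and makes several assumptions: `DrainingConnection` and `CloseReason` are hypothetical names, the `acceptingReads` flag stands in for disabling `autoRead` on the downstream channel, and a plain counter stands in for the proxy's in-flight request tracking.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a single client connection; not a Kroxylicious or Netty type.
final class DrainingConnection {
    enum CloseReason { DRAINED, FORCED }

    private int pending;                   // requests forwarded upstream, awaiting a response
    private boolean acceptingReads = true; // stands in for autoRead on the downstream channel

    /** Returns false once draining has started: no new requests are read from the client. */
    synchronized boolean requestForwarded() {
        if (!acceptingReads) {
            return false;
        }
        pending++;
        return true;
    }

    /** Responses keep flowing during the drain; each one reduces the pending count. */
    synchronized void responseReceived() {
        if (pending > 0) {
            pending--;
        }
        notifyAll();
    }

    /** Stops reading, then waits until pending reaches zero or the timeout elapses. */
    synchronized CloseReason drain(Duration timeout) {
        acceptingReads = false;
        long deadline = System.nanoTime() + timeout.toNanos();
        try {
            while (pending > 0) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break; // drain timeout reached with requests still in flight
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treat interruption like a timeout
        }
        return pending == 0 ? CloseReason.DRAINED : CloseReason.FORCED;
    }
}
```

A real implementation would run on the event loop rather than blocking a thread; the blocking wait here only keeps the example short. The point it illustrates is that the timeout bounds the wait, while a connection with nothing in flight closes immediately.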
- changeDetectors.forEach(detector -> { - try { - ChangeResult detectorResult = detector.detectChanges(context); - allClustersToRemove.addAll(detectorResult.clustersToRemove()); - allClustersToAdd.addAll(detectorResult.clustersToAdd()); - allClustersToModify.addAll(detectorResult.clustersToModify()); - } - catch (Exception e) { - LOGGER.error("Error in change detector '{}': {}", detector.getName(), e.getMessage(), e); - } - }); +### Failure behaviour and rollback - return new ChangeResult( - new ArrayList<>(allClustersToRemove), - new ArrayList<>(allClustersToAdd), - new ArrayList<>(allClustersToModify)); - } +The initial default is **all-or-nothing rollback**: if any cluster operation fails during apply (e.g. a port conflict when rebinding, a TLS error, a plugin initialisation failure), all previously successful operations in that apply are rolled back in reverse order. +Added clusters are removed, modified clusters are reverted to their original configuration, and removed clusters are re-added. - /** - * Processes configuration changes in the correct order: Remove → Modify → Add - */ - private CompletableFuture processConfigurationChanges( - ChangeResult changes, - ConfigurationChangeContext context, - ConfigurationChangeRollbackTracker rollbackTracker) { +This is consistent with startup behaviour, where a failure in any virtual cluster fails the entire proxy. +It produces a predictable outcome: after a failed apply, the proxy is in its previous known-good state, and the operator can investigate and retry. - CompletableFuture chain = CompletableFuture.completedFuture(null); +Other deployment models may need different behaviour: - // 1. Remove clusters first (to free up ports/resources) - // 2. Restart modified existing clusters - // 3. Add new clusters last +- A **multi-tenant ingress** deployment might prefer to continue running with partial success rather than rolling back all changes because one tenant's cluster failed. 
+- A **Kubernetes sidecar** deployment might prefer to terminate the process and let the supervisor restart it. - return chain; - } -} -``` - -### 6. ChangeDetector Interface - -Strategy pattern interface for different types of change detection. - -- **What it does**: Defines a contract for components that detect specific types of configuration changes -- **Key Responsibilities:** - - Provides a consistent API for comparing old vs new configurations via `detectChanges()` - - Returns structured `ChangeResult` objects with specific operations needed (add/remove/modify) - - Enables extensibility - new detectors can be added without modifying existing code - - Currently has two implementations: `VirtualClusterChangeDetector` and `FilterChangeDetector` +These alternatives are deployment-level configuration choices (as discussed above) and do not need to be resolved for the initial implementation. +All-or-nothing rollback is the safe default that covers the common case. -```java -public interface ChangeDetector { - /** - * Name of this change detector for logging and debugging. - */ - String getName(); +### Plugin resource tracking (known gap) - /** - * Detect configuration changes and return structured change information. - * @param context The configuration context containing old and new configurations - * @return ChangeResult containing categorized cluster operations - */ - ChangeResult detectChanges(ConfigurationChangeContext context); -} -``` - -### 7. VirtualClusterChangeDetector +The change detection described above can identify when a filter's YAML configuration changes (via `equals()` on the configuration model), but it cannot detect when external resources that a plugin reads during initialisation have changed. +For example, a password file, TLS keystore, or ACL rules file may have changed on disk even though the plugin's configuration (which only references the file path) is identical. 
-Identifies virtual clusters needing restart due to model changes (new, removed, modified). +These reads typically happen deep in nested plugin call stacks (e.g. `RecordEncryption` → `KmsService` → `CredentialProvider` → `FilePassword`), so the runtime has no visibility into what was read or whether it has changed. -- **What it does**: Compares old and new `VirtualClusterModel` collections to detect cluster-level changes -- **Key Responsibilities:** - - **New Cluster Detection**: Finds clusters that exist in new configuration but not in old (additions) - - **Removed Cluster Detection**: Finds clusters that exist in old configuration but not in new (deletions) - - **Modified Cluster Detection**: Finds clusters that exist in both but have different `VirtualClusterModel` (using `equals()` comparison) - - Uses cluster name as the unique identifier for comparison +Without addressing this gap, an apply operation would miss these changes entirely — the plugin configuration hasn't changed, so no restart is triggered, even though the plugin's actual behaviour would differ if it were reinitialised. -```java -public class VirtualClusterChangeDetector implements ChangeDetector { +An approach is being explored where plugins read external resources through the runtime (rather than doing direct file I/O), allowing the runtime to track what was read and hash the content for subsequent change detection. +This makes dependency tracking automatic rather than relying on plugin authors to opt in. +The detailed design for this mechanism will be covered in a separate proposal to keep this one focused on the core apply operation. 
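
To make the idea of reading through the runtime more concrete, here is one possible shape for it. This is only a sketch under stated assumptions and not the design of the separate proposal: `ResourceTracker`, its `loader` function, and the choice of SHA-256 are all hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only: not an existing Kroxylicious API.
final class ResourceTracker {
    private final Function<String, byte[]> loader;             // e.g. resource name -> file bytes
    private final Map<String, String> hashes = new HashMap<>(); // resource name -> content hash

    ResourceTracker(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    /** Plugins call this instead of doing direct file I/O; the hash is recorded as a side effect. */
    byte[] read(String resource) {
        byte[] content = loader.apply(resource);
        hashes.put(resource, sha256(content));
        return content;
    }

    /** Re-reads every tracked resource and reports whether any content differs from what was recorded. */
    boolean anyChanged() {
        for (Map.Entry<String, String> entry : hashes.entrySet()) {
            if (!entry.getValue().equals(sha256(loader.apply(entry.getKey())))) {
                return true;
            }
        }
        return false;
    }

    private static String sha256(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            return Base64.getEncoder().encodeToString(digest);
        }
        catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }
}
```

With something like this in place, change detection could treat a cluster as modified when `anyChanged()` returns true for the resources its plugins read, even though the plugin's YAML configuration is identical.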
- @Override - public String getName() { - return "VirtualClusterChangeDetector"; - } +## Open questions - @Override - public ChangeResult detectChanges(ConfigurationChangeContext context) { - // Check for modified clusters using equals() comparison - List modifiedClusters = findModifiedClusters(context); +- **Configuration granularity**: The initial design uses state-of-the-world snapshots. Is there a use case that requires delta-based operations or more targeted snapshots in the near term, or is this purely future work? +- **Failure behaviour options beyond all-or-nothing**: What specific deployment models need partial-success or terminate-on-failure semantics, and what configuration surface do they need? +- **Drain timeout default and configurability**: What is a reasonable default drain timeout? How should it be configured — globally, per-cluster, or both? - // Check for new clusters (exist in new but not old) - List newClusters = findNewClusters(context); +## Trigger mechanisms (future work) - // Check for removed clusters (exist in old but not new) - List removedClusters = findRemovedClusters(context); +The `applyConfiguration()` operation is trigger-agnostic. +The following trigger mechanisms have been discussed but are explicitly deferred: - return new ChangeResult(removedClusters, newClusters, modifiedClusters); - } +- **HTTP endpoint**: An HTTP POST endpoint (e.g. `/admin/config/reload`) that accepts a new configuration and calls `applyConfiguration()`. Provides synchronous feedback. Questions remain around security (authentication, binding to localhost vs. network interfaces), whether the endpoint receives the configuration inline or reads it from a file path, and content-type handling. +- **File watcher**: A filesystem watcher that detects changes to the configuration file and triggers `applyConfiguration()`. Interacts with Kubernetes ConfigMap mount semantics. 
Questions remain around debouncing, atomic file replacement, and read-only filesystem constraints. +- **Operator integration**: A Kubernetes operator that reconciles a CRD and calls `applyConfiguration()` via the proxy's API. The operator owns the desired state; the proxy does not persist configuration to disk. - private List findModifiedClusters(ConfigurationChangeContext context) { - Map oldModelMap = context.oldModels().stream() - .collect(Collectors.toMap(VirtualClusterModel::getClusterName, model -> model)); +Each of these can be designed and implemented independently once the core `applyConfiguration()` mechanism is in place. - return context.newModels().stream() - .filter(newModel -> { - VirtualClusterModel oldModel = oldModelMap.get(newModel.getClusterName()); - return oldModel != null && !oldModel.equals(newModel); - }) - .map(VirtualClusterModel::getClusterName) - .collect(Collectors.toList()); - } +## Affected/not affected projects - private List findNewClusters(ConfigurationChangeContext context) { - Set oldClusterNames = context.oldModels().stream() - .map(VirtualClusterModel::getClusterName) - .collect(Collectors.toSet()); +**Affected:** - return context.newModels().stream() - .map(VirtualClusterModel::getClusterName) - .filter(name -> !oldClusterNames.contains(name)) - .collect(Collectors.toList()); - } +- **kroxylicious** (core proxy) — The `applyConfiguration()` operation, change detection, cluster lifecycle management, and connection draining all live here. +- **kroxylicious-junit5-extension** — Test infrastructure may need to support applying configuration changes to a running proxy in integration tests. 
- private List findRemovedClusters(ConfigurationChangeContext context) { - Set newClusterNames = context.newModels().stream() - .map(VirtualClusterModel::getClusterName) - .collect(Collectors.toSet()); - - return context.oldModels().stream() - .map(VirtualClusterModel::getClusterName) - .filter(name -> !newClusterNames.contains(name)) - .collect(Collectors.toList()); - } -} -``` +**Not affected:** -### 8. FilterChangeDetector +- **kroxylicious-operator** — The operator will eventually be a trigger mechanism, but the core apply operation does not depend on it. +- **Filter/plugin implementations** — Existing filters do not need to change. The plugin resource tracking gap (above) may eventually require filters to change how they read external resources, but that is a separate proposal. -Identifies clusters needing restart due to filter configuration changes. +## Compatibility -- **What it does**: Detects changes in filter definitions and identifies which virtual clusters are affected -- **Key Responsibilities:** - - **Filter Definition Changes**: Compares `NamedFilterDefinition` objects to find filters where type or config changed (additions/removals are handled by Configuration validation) - - **Default Filters Changes**: Detects changes to the `defaultFilters` list (order matters for filter chain execution) - - Returns only `clustersToModify` - filter changes don't cause cluster additions/removals -- **Cluster Impact Rules**: A cluster is impacted if: - - It uses a filter definition that was modified (either from explicit filters or defaults), OR - - It doesn't specify `cluster.filters()` AND the `defaultFilters` list changed +- The `applyConfiguration()` operation is additive — it does not change existing startup behaviour. +- Virtual cluster configuration semantics are unchanged; the proposal only adds the ability to apply changes at runtime. +- Filter definitions and their configuration are unchanged. +- No changes to the on-disk configuration file format. 
-```java -public class FilterChangeDetector implements ChangeDetector { - - @Override - public String getName() { - return "FilterChangeDetector"; - } - - @Override - public ChangeResult detectChanges(ConfigurationChangeContext context) { - // Detect filter definition changes - Set modifiedFilterNames = findModifiedFilterDefinitions(context); - - // Detect default filters changes (order matters for filter chain execution) - boolean defaultFiltersChanged = hasDefaultFiltersChanged(context); - - // Find impacted clusters - List clustersToModify = findImpactedClusters(modifiedFilterNames, defaultFiltersChanged, context); - - return new ChangeResult(List.of(), List.of(), clustersToModify); - } - - /** - * Find filter definitions that have been modified. - * A filter is considered modified if the type or config changed. - * Note: Filter additions/removals are not tracked here as they're handled by Configuration validation. - */ - private Set findModifiedFilterDefinitions(ConfigurationChangeContext context) { - Map oldDefs = buildFilterDefMap(context.oldConfig()); - Map newDefs = buildFilterDefMap(context.newConfig()); - - Set modifiedFilterNames = new HashSet<>(); - - // Check each new definition to see if it differs from the old one - for (Map.Entry entry : newDefs.entrySet()) { - String filterName = entry.getKey(); - NamedFilterDefinition newDef = entry.getValue(); - NamedFilterDefinition oldDef = oldDefs.get(filterName); - - // Filter exists in both configs - check if it changed - if (oldDef != null && !oldDef.equals(newDef)) { - modifiedFilterNames.add(filterName); - } - } - - return modifiedFilterNames; - } - - /** - * Check if the default filters list has changed. - * Order matters because filter chain execution is sequential. 
- */ - private boolean hasDefaultFiltersChanged(ConfigurationChangeContext context) { - List oldDefaults = context.oldConfig().defaultFilters(); - List newDefaults = context.newConfig().defaultFilters(); - // Use Objects.equals for null-safe comparison - checks both content AND order - return !Objects.equals(oldDefaults, newDefaults); - } - - /** - * Find virtual clusters that are impacted by filter changes. - * Uses a simple single-pass approach: iterate through each cluster and check if it's - * affected by any filter change. Prioritizes code clarity over optimization. - */ - private List findImpactedClusters( - Set modifiedFilterNames, - boolean defaultFiltersChanged, - ConfigurationChangeContext context) { - - // Early return if nothing changed - if (modifiedFilterNames.isEmpty() && !defaultFiltersChanged) { - return List.of(); - } - - List impactedClusters = new ArrayList<>(); - - // Simple approach: check each cluster's resolved filters - for (VirtualClusterModel cluster : context.newModels()) { - String clusterName = cluster.getClusterName(); - - // Get this cluster's resolved filters (either explicit or from defaults) - List clusterFilterNames = cluster.getFilters() - .stream() - .map(NamedFilterDefinition::name) - .toList(); - - // Check if cluster uses any modified filter OR uses defaults and defaults changed - boolean usesModifiedFilter = clusterFilterNames.stream() - .anyMatch(modifiedFilterNames::contains); - - boolean usesChangedDefaults = defaultFiltersChanged && - clusterUsesDefaults(cluster, context.newConfig()); - - if (usesModifiedFilter || usesChangedDefaults) { - impactedClusters.add(clusterName); - } - } - - return impactedClusters; - } - - /** - * Check if a cluster uses default filters. - * A cluster uses defaults if it doesn't specify its own filters list. 
- */ - private boolean clusterUsesDefaults(VirtualClusterModel cluster, Configuration config) { - VirtualCluster vc = config.virtualClusters().stream() - .filter(v -> v.name().equals(cluster.getClusterName())) - .findFirst() - .orElse(null); - - // Cluster uses defaults if it doesn't specify its own filters - return vc != null && vc.filters() == null; - } -} -``` - -### 9. Supporting Records - -**ConfigurationChangeContext** - Immutable context for change detection. -- **What it does**: Provides a single object containing all the data needed for change detection, including both old and new configurations and their pre-computed models -- **Key fields**: `oldConfig`, `newConfig`, `oldModels`, `newModels`, `oldFilterChainFactory`, `newFilterChainFactory` -- **Why FilterChainFactory is included**: Enables filter-related change detectors to reference the factories for comparison - -```java -public record ConfigurationChangeContext( - Configuration oldConfig, - Configuration newConfig, - List oldModels, - List newModels, - @Nullable FilterChainFactory oldFilterChainFactory, - @Nullable FilterChainFactory newFilterChainFactory) {} -``` - -**ChangeResult** - Result of change detection. -- **What it does**: Contains categorized lists of cluster names for each operation type needed -- **Key fields**: `clustersToRemove`, `clustersToAdd`, `clustersToModify` -- **Utility methods**: `hasChanges()` to check if any changes detected, `getTotalOperations()` to get total count - -```java -public record ChangeResult( - List clustersToRemove, - List clustersToAdd, - List clustersToModify) { - - public boolean hasChanges() { - return !clustersToRemove.isEmpty() || !clustersToAdd.isEmpty() || !clustersToModify.isEmpty(); - } - - public int getTotalOperations() { - return clustersToRemove.size() + clustersToAdd.size() + clustersToModify.size(); - } -} -``` - -**ReloadResponse** - HTTP response payload. 
-- **What it does**: Serializable record that represents the JSON response sent back to HTTP clients -- **Key fields**: `success`, `message`, `clustersModified`, `clustersAdded`, `clustersRemoved`, `timestamp` -- **Factory methods**: `from(ReloadResult)` to convert internal result, `error(message)` for error responses - -```java -public record ReloadResponse( - boolean success, - String message, - int clustersModified, - int clustersAdded, - int clustersRemoved, - String timestamp) { - - public static ReloadResponse from(ReloadResult result) { - return new ReloadResponse( - result.isSuccess(), - result.getMessage(), - result.getClustersModified(), - result.getClustersAdded(), - result.getClustersRemoved(), - result.getTimestamp().toString()); - } - - public static ReloadResponse error(String message) { - return new ReloadResponse(false, message, 0, 0, 0, Instant.now().toString()); - } -} -``` - -**ReloadStateManager** - Tracks reload state and history. -- **What it does**: Maintains the current reload state and a history of recent reload operations for observability -- **Key responsibilities**: Tracks `IDLE`/`IN_PROGRESS` state, records success/failure with `ReloadResult`, maintains bounded history (max 10 entries) -- **Thread safety**: Uses `AtomicReference` for state and `synchronized` blocks for history access - -```java -public class ReloadStateManager { - - private static final int MAX_HISTORY_SIZE = 10; - - private final AtomicReference currentState; - private final Deque reloadHistory; - - public enum ReloadState { - IDLE, - IN_PROGRESS - } - - public void startReload() { - currentState.set(ReloadState.IN_PROGRESS); - } - - public void recordSuccess(ReloadResult result) { - currentState.set(ReloadState.IDLE); - addToHistory(result); - } - - public void recordFailure(Throwable error) { - currentState.set(ReloadState.IDLE); - addToHistory(ReloadResult.failure(error.getMessage())); - } - - public ReloadState getCurrentState() { - return currentState.get(); - } 
- - public Optional getLastResult() { - synchronized (reloadHistory) { - return reloadHistory.isEmpty() ? Optional.empty() : Optional.of(reloadHistory.peekLast()); - } - } -} -``` - -## Integration with KafkaProxy - -The `KafkaProxy` class initializes the reload orchestrator and passes it to the management endpoint. - -- **What it does**: `KafkaProxy` is the entry point to the proxy app and is responsible for setting up all hot-reload components -- **Key Responsibilities:** - - Creates `ConnectionTracker` and `InFlightMessageTracker` for connection management - - Creates `ConnectionDrainManager` and `VirtualClusterManager` for cluster lifecycle operations - - Creates `ConfigurationChangeHandler` with list of change detectors (`VirtualClusterChangeDetector`, `FilterChangeDetector`) - - Creates `AtomicReference` that is shared between `KafkaProxyInitializer` and `ConfigurationReloadOrchestrator` for atomic factory swaps - - Creates `ConfigurationReloadOrchestrator` and passes it to `ManagementInitializer` for HTTP endpoint registration -- **Why AtomicReference is used**: Both the initializers (which create filter chains for new connections) and the orchestrator (which swaps factories on reload) need access to the current factory. Using `AtomicReference` enables atomic, thread-safe swaps. 
- -```java -public final class KafkaProxy implements AutoCloseable { - - // Shared mutable reference to FilterChainFactory - enables atomic swaps during hot reload - private AtomicReference filterChainFactoryRef; - - private final ConfigurationChangeHandler configurationChangeHandler; - private final @Nullable ConfigurationReloadOrchestrator reloadOrchestrator; - - public KafkaProxy(PluginFactoryRegistry pfr, Configuration config, Features features, @Nullable Path configFilePath) { - // Initialize connection management components - this.connectionDrainManager = new ConnectionDrainManager(connectionTracker, inFlightTracker); - this.virtualClusterManager = new VirtualClusterManager(endpointRegistry, connectionDrainManager); - - // Initialize configuration change handler with detectors - this.configurationChangeHandler = new ConfigurationChangeHandler( - List.of( - new VirtualClusterChangeDetector(), - new FilterChangeDetector()), - virtualClusterManager); - - // Create AtomicReference for FilterChainFactory - this.filterChainFactoryRef = new AtomicReference<>(); - - // Initialize reload orchestrator for HTTP endpoint - this.reloadOrchestrator = new ConfigurationReloadOrchestrator( - config, - configurationChangeHandler, - pfr, - features, - configFilePath, - filterChainFactoryRef); - } - - public CompletableFuture startup() { - // Create initial FilterChainFactory and store in shared atomic reference - FilterChainFactory initialFactory = new FilterChainFactory(pfr, config.filterDefinitions()); - this.filterChainFactoryRef.set(initialFactory); - - // Pass atomic reference to initializers for dynamic factory swaps - var tlsServerBootstrap = buildServerBootstrap(proxyEventGroup, - new KafkaProxyInitializer(filterChainFactoryRef, ...)); - var plainServerBootstrap = buildServerBootstrap(proxyEventGroup, - new KafkaProxyInitializer(filterChainFactoryRef, ...)); - - // Start management listener with reload orchestrator - var managementFuture = 
maybeStartManagementListener(managementEventGroup, meterRegistries, reloadOrchestrator); - - // ... - } -} -``` - -## Flow Diagram - -``` -┌───────────────────────────────────────────────────────────────────────────┐ -│ HTTP POST /admin/config/reload │ -│ Content-Type: application/yaml │ -│ Body: │ -└───────────────────────────────────┬───────────────────────────────────────┘ - │ - ▼ -┌───────────────────────────────────────────────────────────────────────────┐ -│ ConfigurationReloadEndpoint │ -│ - Creates ReloadRequestContext │ -│ - Delegates to ReloadRequestProcessor │ -└───────────────────────────────────┬───────────────────────────────────────┘ - │ - ▼ -┌───────────────────────────────────────────────────────────────────────────┐ -│ ReloadRequestProcessor │ -│ - Chain of Responsibility pattern │ -│ │ -│ ┌─────────────────┐ ┌──────────────────┐ ┌────────────────────────┐ │ -│ │ ContentType │ → │ ContentLength │ → │ ConfigurationParsing │ │ -│ │ Validation │ │ Validation │ │ Handler │ │ -│ └─────────────────┘ └──────────────────┘ └────────────┬───────────┘ │ -│ │ │ -│ ▼ │ -│ ┌────────────────────────────┐ │ -│ │ ConfigurationReload │ │ -│ │ Handler │ │ -│ └────────────┬───────────────┘ │ -└────────────────────────────────────────────────────────┼──────────────────┘ - │ - ▼ -┌───────────────────────────────────────────────────────────────────────────┐ -│ ConfigurationReloadOrchestrator │ -│ - Concurrency control (ReentrantLock) │ -│ - Configuration validation │ -│ - Creates new FilterChainFactory │ -│ - Builds ConfigurationChangeContext │ -└───────────────────────────────────┬───────────────────────────────────────┘ - │ - ▼ -┌───────────────────────────────────────────────────────────────────────────┐ -│ ConfigurationChangeHandler │ -│ - Coordinates change detectors │ -│ - Aggregates change results │ -│ │ -│ ┌──────────────────────────┐ ┌────────────────────────┐ │ -│ │ VirtualClusterChange │ │ FilterChangeDetector │ │ -│ │ Detector │ │ │ │ -│ │ - New clusters │ │ 
- Modified filters │ │ -│ │ - Removed clusters │ │ - Default filters │ │ -│ │ - Modified clusters │ │ - Impacted clusters │ │ -│ └──────────────┬───────────┘ └───────────┬────────────┘ │ -│ │ │ │ -│ └─────────────┬──────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌──────────────────────┐ │ -│ │ ChangeResult │ │ -│ │ - clustersToRemove │ │ -│ │ - clustersToAdd │ │ -│ │ - clustersToModify │ │ -│ └──────────┬───────────┘ │ -└────────────────────────────────┼──────────────────────────────────────────┘ - │ - ▼ - ┌────────────────────────┐ - │ VirtualClusterManager │ ────► Part 2: Graceful Restart - │ - Remove clusters │ - │ - Add clusters │ - │ - Restart clusters │ - └────────────────────────┘ - │ - ▼ -┌───────────────────────────────────────────────────────────────────────────┐ -│ ON SUCCESS │ -│ │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ filterChainFactoryRef.set(newFactory) ◄── Atomic swap! │ │ -│ │ oldFactory.close() │ │ -│ │ currentConfiguration = newConfig │ │ -│ │ persistConfigurationToDisk(newConfig) │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -└───────────────────────────────────┬───────────────────────────────────────┘ - │ - ▼ -┌───────────────────────────────────────────────────────────────────────────┐ -│ HTTP 200 OK │ -│ Content-Type: application/json │ -│ {"success": true, "clustersModified": 1, ...} │ -└───────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -# Part 2: Graceful Virtual Cluster Restart - -Part 2 of the hot-reload implementation focuses on gracefully restarting virtual clusters. 
This component receives structured change operations from Part 1 and executes them in a carefully orchestrated sequence: **connection draining → resource deregistration → new resource registration → connection restoration.** - -The design emphasizes minimal service disruption by ensuring all in-flight Kafka requests complete before closing connections (or when a timeout is hit). - -## Core Classes & Structure - -### 1. VirtualClusterManager - -Acts as the high-level orchestrator for all virtual cluster lifecycle operations during hot-reload. `ConfigurationChangeHandler` calls the `VirtualClusterManager` to restart/add/remove clusters when there is a config change. - -- **What it does**: Manages the complete lifecycle of virtual clusters including addition, removal, and restart operations -- **Key Responsibilities:** - - **Cluster Addition**: Takes a new `VirtualClusterModel` and brings it online by registering all gateways with `EndpointRegistry` - - **Cluster Removal**: Safely takes down an existing cluster by first draining all connections gracefully via `ConnectionDrainManager`, then deregistering from `EndpointRegistry` - - **Cluster Restart**: Performs a complete cluster reconfiguration by orchestrating remove → add sequence with updated settings - - **Rollback Integration**: Automatically tracks all successful operations via `ConfigurationChangeRollbackTracker` so they can be undone if later operations fail - -```java -public class VirtualClusterManager { - - private final EndpointRegistry endpointRegistry; - private final ConnectionDrainManager connectionDrainManager; - - public VirtualClusterManager(EndpointRegistry endpointRegistry, - ConnectionDrainManager connectionDrainManager) { - this.endpointRegistry = endpointRegistry; - this.connectionDrainManager = connectionDrainManager; - } - - /** - * Gracefully removes a virtual cluster by draining connections and deregistering endpoints. 
- */ - public CompletableFuture removeVirtualCluster(String clusterName, - List oldModels, - ConfigurationChangeRollbackTracker rollbackTracker) { - VirtualClusterModel clusterToRemove = findClusterModel(oldModels, clusterName); - - // 1. Drain connections gracefully (30s timeout) - return connectionDrainManager.gracefullyDrainConnections(clusterName, Duration.ofSeconds(30)) - .thenCompose(v -> { - // 2. Deregister all gateways from endpoint registry - var deregistrationFutures = clusterToRemove.gateways().values().stream() - .map(gateway -> endpointRegistry.deregisterVirtualCluster(gateway)) - .toArray(CompletableFuture[]::new); - - return CompletableFuture.allOf(deregistrationFutures); - }) - .thenRun(() -> { - // 3. Track removal for potential rollback - rollbackTracker.trackRemoval(clusterName, clusterToRemove); - LOGGER.info("Successfully removed virtual cluster '{}'", clusterName); - }); - } - - /** - * Restarts a virtual cluster with new configuration (remove + add). - */ - public CompletableFuture restartVirtualCluster(String clusterName, - VirtualClusterModel newModel, - List oldModels, - ConfigurationChangeRollbackTracker rollbackTracker) { - VirtualClusterModel oldModel = findClusterModel(oldModels, clusterName); - - // Step 1: Remove existing cluster (drain + deregister) - return removeVirtualCluster(clusterName, oldModels, rollbackTracker) - .thenCompose(v -> { - // Step 2: Add new cluster with updated configuration - return addVirtualCluster(newModel, rollbackTracker); - }) - .thenRun(() -> { - // Step 3: Track modification and stop draining - rollbackTracker.trackModification(clusterName, oldModel, newModel); - connectionDrainManager.stopDraining(clusterName); - LOGGER.info("Successfully restarted virtual cluster '{}' with new configuration", clusterName); - }); - } - - /** - * Adds a new virtual cluster by registering endpoints and enabling connections. 
- */ - public CompletableFuture addVirtualCluster(VirtualClusterModel newModel, - ConfigurationChangeRollbackTracker rollbackTracker) { - String clusterName = newModel.getClusterName(); - - return registerVirtualCluster(newModel) - .thenRun(() -> { - // Stop draining to allow new connections - connectionDrainManager.stopDraining(clusterName); - rollbackTracker.trackAddition(clusterName, newModel); - LOGGER.info("Successfully added new virtual cluster '{}'", clusterName); - }); - } - - /** - * Registers all gateways for a virtual cluster with the endpoint registry. - */ - private CompletableFuture registerVirtualCluster(VirtualClusterModel model) { - var registrationFutures = model.gateways().values().stream() - .map(gateway -> endpointRegistry.registerVirtualCluster(gateway)) - .toArray(CompletableFuture[]::new); - - return CompletableFuture.allOf(registrationFutures); - } -} -``` - -### 2. ConnectionDrainManager - -Implements the graceful connection draining strategy during cluster restarts. This is what makes hot-reload "graceful" - it ensures that client requests in progress are completed rather than dropped. 
- -- **What it does**: Manages the graceful shutdown of connections during cluster restart, ensuring no in-flight messages are lost -- **Key Responsibilities:** - - **Draining Mode Control**: Maintains a map of draining clusters; when a cluster enters drain mode, `shouldAcceptConnection()` returns false to reject new connections - - **Backpressure Strategy**: Sets `autoRead = false` only on downstream (client→proxy) channels to prevent new requests, while keeping upstream (proxy→Kafka) channels reading to allow responses to complete naturally - - **In-Flight Monitoring**: Uses a scheduled executor to periodically check `InFlightMessageTracker` (every 100ms) and closes channels when pending requests reach zero - - **Timeout Handling**: If in-flight count doesn't reach zero within the timeout (default 30s), force-closes the channel to prevent indefinite hangs - - **Resource Cleanup**: Implements `AutoCloseable` to properly shut down the scheduler on proxy shutdown -- **Explanation of the Draining Strategy:** - - **Phase 1**: Enter draining mode → new connection attempts are rejected - - **Phase 2**: Apply backpressure → downstream `autoRead=false`, upstream `autoRead=true` - - **Phase 3**: Monitor in-flight messages → wait for count to reach zero or timeout, then close channel - -```java -public class ConnectionDrainManager implements AutoCloseable { - - private final ConnectionTracker connectionTracker; - private final InFlightMessageTracker inFlightTracker; - private final Map drainingClusters = new ConcurrentHashMap<>(); - private final ScheduledExecutorService scheduler; - - public ConnectionDrainManager(ConnectionTracker connectionTracker, - InFlightMessageTracker inFlightTracker) { - this.connectionTracker = connectionTracker; - this.inFlightTracker = inFlightTracker; - this.scheduler = new ScheduledThreadPoolExecutor(2, r -> { - Thread t = new Thread(r, "connection-drain-manager"); - t.setDaemon(true); - return t; - }); - } - - /** - * Determines if a new 
connection should be accepted for the specified virtual cluster. - */ - public boolean shouldAcceptConnection(String clusterName) { - return !isDraining(clusterName); - } - - /** - * Performs a complete graceful drain operation by stopping new connections - * and immediately closing existing connections after in-flight messages complete. - */ - public CompletableFuture gracefullyDrainConnections(String clusterName, Duration totalTimeout) { - int totalActiveConnections = connectionTracker.getTotalConnectionCount(clusterName); - int totalInFlightRequests = inFlightTracker.getTotalPendingRequestCount(clusterName); - - LOGGER.info("Starting graceful drain for cluster '{}' with {} connections and {} in-flight requests", - clusterName, totalActiveConnections, totalInFlightRequests); - - return startDraining(clusterName) - .thenCompose(v -> { - if (totalActiveConnections == 0) { - return CompletableFuture.completedFuture(null); - } - else { - return gracefullyCloseConnections(clusterName, totalTimeout); - } - }); - } - - /** - * Starts draining - new connections will be rejected. - */ - public CompletableFuture startDraining(String clusterName) { - drainingClusters.put(clusterName, new AtomicBoolean(true)); - return CompletableFuture.completedFuture(null); - } - - /** - * Gracefully closes all active connections for the specified virtual cluster. - * Strategy: Disable autoRead on downstream channels to prevent new requests, - * but keep upstream channels reading to allow responses to complete naturally. 
- */ - public CompletableFuture gracefullyCloseConnections(String clusterName, Duration timeout) { - Set downstreamChannels = connectionTracker.getDownstreamActiveChannels(clusterName); - Set upstreamChannels = connectionTracker.getUpstreamActiveChannels(clusterName); - - var allCloseFutures = new ArrayList>(); - - // STRATEGY: - // - Downstream (autoRead=false): Prevents new client requests from being processed - // - Upstream (autoRead=true): Allows Kafka responses to be processed normally - - // Add downstream channel close futures - downstreamChannels.stream() - .map(this::disableAutoReadOnDownstreamChannel) - .map(channel -> gracefullyCloseChannel(channel, clusterName, timeout, "DOWNSTREAM")) - .forEach(allCloseFutures::add); - - // Add upstream channel close futures - upstreamChannels.stream() - .map(channel -> gracefullyCloseChannel(channel, clusterName, timeout, "UPSTREAM")) - .forEach(allCloseFutures::add); - - return CompletableFuture.allOf(allCloseFutures.toArray(new CompletableFuture[0])); - } - - private Channel disableAutoReadOnDownstreamChannel(Channel downstreamChannel) { - try { - if (downstreamChannel.isActive()) { - KafkaProxyFrontendHandler frontendHandler = - downstreamChannel.pipeline().get(KafkaProxyFrontendHandler.class); - if (frontendHandler != null) { - frontendHandler.applyBackpressure(); - } - else { - downstreamChannel.config().setAutoRead(false); - } - } - } - catch (Exception e) { - LOGGER.warn("Failed to disable autoRead for downstream channel - continuing", e); - } - return downstreamChannel; - } - - /** - * Gracefully closes a single channel. - * Monitors in-flight count every 100ms, closes when zero or on timeout. 
- */ - private CompletableFuture gracefullyCloseChannel(Channel channel, String clusterName, - Duration timeout, String channelType) { - CompletableFuture future = new CompletableFuture<>(); - long timeoutMillis = timeout.toMillis(); - long startTime = System.currentTimeMillis(); - - // Schedule timeout - ScheduledFuture timeoutTask = scheduler.schedule(() -> { - if (!future.isDone()) { - LOGGER.warn("Graceful shutdown timeout - forcing closure"); - closeChannelImmediately(channel, future); - } - }, timeoutMillis, TimeUnit.MILLISECONDS); - - // Schedule periodic checks for in-flight messages - ScheduledFuture checkTask = scheduler.scheduleAtFixedRate(() -> { - if (future.isDone()) return; - - int pendingRequests = inFlightTracker.getPendingRequestCount(clusterName, channel); - if (pendingRequests == 0) { - closeChannelImmediately(channel, future); - } - }, 50, 100, TimeUnit.MILLISECONDS); - - // Cancel tasks when done - future.whenComplete((result, throwable) -> { - timeoutTask.cancel(false); - checkTask.cancel(false); - }); - - return future; - } - - private void closeChannelImmediately(Channel channel, CompletableFuture future) { - if (future.isDone()) return; - - channel.close().addListener(channelFuture -> { - if (channelFuture.isSuccess()) { - future.complete(null); - } - else { - future.completeExceptionally(channelFuture.cause()); - } - }); - } -} -``` - -### 3. ConnectionTracker - -Maintains real-time inventory of all active network connections per virtual cluster. 
- -- **What it does**: Provides real-time visibility into all active connections (downstream and upstream) for each virtual cluster -- **Key Responsibilities:** - - **Bidirectional Tracking**: Separately tracks downstream connections (client→proxy) and upstream connections (proxy→Kafka) using `ConcurrentHashMap` - - **Channel Management**: Maintains collections of active `Channel` objects for bulk operations like graceful closure - - **Lifecycle Integration**: Integrates with `ProxyChannelStateMachine` to automatically track connection establishment and closure events - - **Cleanup Logic**: Automatically removes references to closed channels and cleans up empty cluster entries to prevent memory leaks - - **Thread Safety**: Uses `ConcurrentHashMap` and `AtomicInteger` for thread-safe operations from multiple Netty event loops - -```java -public class ConnectionTracker { - - // Downstream connections (client → proxy) - private final Map downstreamConnections = new ConcurrentHashMap<>(); - private final Map> downstreamChannelsByCluster = new ConcurrentHashMap<>(); - - // Upstream connections (proxy → target Kafka cluster) - private final Map upstreamConnections = new ConcurrentHashMap<>(); - private final Map> upstreamChannelsByCluster = new ConcurrentHashMap<>(); - - public void onDownstreamConnectionEstablished(String clusterName, Channel channel) { - downstreamConnections.computeIfAbsent(clusterName, k -> new AtomicInteger(0)).incrementAndGet(); - downstreamChannelsByCluster.computeIfAbsent(clusterName, k -> ConcurrentHashMap.newKeySet()).add(channel); - } - - public void onDownstreamConnectionClosed(String clusterName, Channel channel) { - onConnectionClosed(clusterName, channel, downstreamConnections, downstreamChannelsByCluster); - } - - public Set getDownstreamActiveChannels(String clusterName) { - Set channels = downstreamChannelsByCluster.get(clusterName); - return channels != null ? 
Set.copyOf(channels) : Set.of(); - } - - // Similar methods for upstream connections... - - public int getTotalConnectionCount(String clusterName) { - return getDownstreamActiveConnectionCount(clusterName) + getUpstreamActiveConnectionCount(clusterName); - } - - private void onConnectionClosed(String clusterName, Channel channel, - Map connectionCounters, - Map> channelsByCluster) { - AtomicInteger counter = connectionCounters.get(clusterName); - if (counter != null) { - counter.decrementAndGet(); - if (counter.get() <= 0) { - connectionCounters.remove(clusterName); - } - } - - Set channels = channelsByCluster.get(clusterName); - if (channels != null) { - channels.remove(channel); - if (channels.isEmpty()) { - channelsByCluster.remove(clusterName); - } - } - } -} -``` - -### 4. InFlightMessageTracker - -Tracks pending Kafka requests to ensure no messages are lost during connection closure. - -- **What it does**: Maintains counters of pending Kafka requests per channel and cluster to enable "wait for completion" strategy during graceful shutdown -- **Key Responsibilities:** - - **Request Tracking**: Increments counters when Kafka requests are sent upstream (called from `ProxyChannelStateMachine.messageFromClient()`) - - **Response Tracking**: Decrements counters when Kafka responses are received (called from `ProxyChannelStateMachine.messageFromServer()`) - - **Per-Channel Counts**: Maintains a two-level map: `cluster name → channel → pending count` for granular tracking - - **Cluster Totals**: Maintains a separate map for quick cluster-wide total lookup without iterating all channels - - **Channel Cleanup**: When a channel closes unexpectedly, adjusts counts appropriately to prevent stuck counters - - **Thread Safety**: Uses `ConcurrentHashMap` and `AtomicInteger` for thread-safe concurrent access - -```java -public class InFlightMessageTracker { - - // Map from cluster name to channel to pending request count - private final Map> pendingRequests = new 
ConcurrentHashMap<>(); - - // Map from cluster name to total pending requests for quick lookup - private final Map totalPendingByCluster = new ConcurrentHashMap<>(); - - /** - * Records that a request has been sent to the upstream cluster. - */ - public void onRequestSent(String clusterName, Channel channel) { - pendingRequests.computeIfAbsent(clusterName, k -> new ConcurrentHashMap<>()) - .computeIfAbsent(channel, k -> new AtomicInteger(0)) - .incrementAndGet(); - - totalPendingByCluster.computeIfAbsent(clusterName, k -> new AtomicInteger(0)) - .incrementAndGet(); - } - - /** - * Records that a response has been received from the upstream cluster. - */ - public void onResponseReceived(String clusterName, Channel channel) { - Map clusterRequests = pendingRequests.get(clusterName); - if (clusterRequests != null) { - AtomicInteger channelCounter = clusterRequests.get(channel); - if (channelCounter != null) { - int remaining = channelCounter.decrementAndGet(); - if (remaining <= 0) { - clusterRequests.remove(channel); - if (clusterRequests.isEmpty()) { - pendingRequests.remove(clusterName); - } - } - - AtomicInteger totalCounter = totalPendingByCluster.get(clusterName); - if (totalCounter != null) { - int totalRemaining = totalCounter.decrementAndGet(); - if (totalRemaining <= 0) { - totalPendingByCluster.remove(clusterName); - } - } - } - } - } - - /** - * Records that a channel has been closed, clearing all pending requests. 
- */ - public void onChannelClosed(String clusterName, Channel channel) { - Map clusterRequests = pendingRequests.get(clusterName); - if (clusterRequests != null) { - AtomicInteger channelCounter = clusterRequests.remove(channel); - if (channelCounter != null) { - int pendingCount = channelCounter.get(); - if (pendingCount > 0) { - AtomicInteger totalCounter = totalPendingByCluster.get(clusterName); - if (totalCounter != null) { - int newTotal = totalCounter.addAndGet(-pendingCount); - if (newTotal <= 0) { - totalPendingByCluster.remove(clusterName); - } - } - } - } - - if (clusterRequests.isEmpty()) { - pendingRequests.remove(clusterName); - } - } - } - - public int getPendingRequestCount(String clusterName, Channel channel) { - Map clusterRequests = pendingRequests.get(clusterName); - if (clusterRequests != null) { - AtomicInteger counter = clusterRequests.get(channel); - return counter != null ? Math.max(0, counter.get()) : 0; - } - return 0; - } - - public int getTotalPendingRequestCount(String clusterName) { - AtomicInteger counter = totalPendingByCluster.get(clusterName); - return counter != null ? Math.max(0, counter.get()) : 0; - } -} -``` - -### 5. ConfigurationChangeRollbackTracker - -Maintains a record of all cluster operations so they can be reversed if the overall configuration change fails. 
- -- **What it does**: Records all successful cluster operations during a configuration change so they can be undone if a later operation fails -- **Key Responsibilities:** - - **Removal Tracking**: Stores the cluster name and original `VirtualClusterModel` for each removed cluster, enabling re-addition on rollback - - **Modification Tracking**: Stores both the original and new `VirtualClusterModel` for each modified cluster, enabling revert to original state - - **Addition Tracking**: Stores the cluster name and `VirtualClusterModel` for each added cluster, enabling removal on rollback - - **Rollback Order**: Provides ordered lists to enable rollback in reverse order: Added → Modified → Removed - -```java -public class ConfigurationChangeRollbackTracker { - - private final List<String> removedClusters = new ArrayList<>(); - private final List<String> modifiedClusters = new ArrayList<>(); - private final List<String> addedClusters = new ArrayList<>(); - - private final Map<String, VirtualClusterModel> removedModels = new HashMap<>(); - private final Map<String, VirtualClusterModel> originalModels = new HashMap<>(); - private final Map<String, VirtualClusterModel> modifiedModels = new HashMap<>(); - private final Map<String, VirtualClusterModel> addedModels = new HashMap<>(); - - public void trackRemoval(String clusterName, VirtualClusterModel removedModel) { - removedClusters.add(clusterName); - removedModels.put(clusterName, removedModel); - } - - public void trackModification(String clusterName, VirtualClusterModel originalModel, - VirtualClusterModel newModel) { - modifiedClusters.add(clusterName); - originalModels.put(clusterName, originalModel); - modifiedModels.put(clusterName, newModel); - } - - public void trackAddition(String clusterName, VirtualClusterModel addedModel) { - addedClusters.add(clusterName); - addedModels.put(clusterName, addedModel); - } - - // Getter methods for rollback operations... -} -``` - -### 6. Integration with ProxyChannelStateMachine - -The existing `ProxyChannelStateMachine` is enhanced to integrate with connection tracking and in-flight message tracking.
- -- **What it does**: Adds hooks into the existing state machine to notify `ConnectionTracker` and `InFlightMessageTracker` of connection and message lifecycle events -- **Key Responsibilities:** - - **Connection Lifecycle**: Automatically notifies `ConnectionTracker` when connections are established (`toClientActive`, `toForwarding`) and closed (`onServerInactive`, `onClientInactive`) - - **In-flight Message Tracking**: Automatically notifies `InFlightMessageTracker` when requests are sent upstream (`messageFromClient`) and responses received (`messageFromServer`) - - **Cleanup on Close**: Ensures `InFlightMessageTracker.onChannelClosed()` is called when channels close to clear any pending counts - -```java -// Example integration points in ProxyChannelStateMachine - -void messageFromServer(Object msg) { - // Track responses received from upstream Kafka - if (inFlightTracker != null && msg instanceof ResponseFrame && backendHandler != null) { - inFlightTracker.onResponseReceived(clusterName, backendHandler.serverCtx().channel()); - } - // ... existing logic -} - -void messageFromClient(Object msg) { - // Track requests being sent upstream - if (inFlightTracker != null && msg instanceof RequestFrame && backendHandler != null) { - inFlightTracker.onRequestSent(clusterName, backendHandler.serverCtx().channel()); - } - // ... existing logic -} - -void onServerInactive() { - // Track upstream connection closure - if (connectionTracker != null && backendHandler != null) { - connectionTracker.onUpstreamConnectionClosed(clusterName, backendHandler.serverCtx().channel()); - } - // Clear any pending in-flight messages - if (inFlightTracker != null && backendHandler != null) { - inFlightTracker.onChannelClosed(clusterName, backendHandler.serverCtx().channel()); - } - // ... 
existing logic -} - -void onClientInactive() { - // Track downstream connection closure - if (connectionTracker != null && frontendHandler != null) { - connectionTracker.onDownstreamConnectionClosed(clusterName, frontendHandler.clientCtx().channel()); - } - if (inFlightTracker != null && frontendHandler != null) { - inFlightTracker.onChannelClosed(clusterName, frontendHandler.clientCtx().channel()); - } - // ... existing logic -} - -private void toClientActive(ProxyChannelState.ClientActive clientActive, - KafkaProxyFrontendHandler frontendHandler) { - // Track downstream connection establishment - if (connectionTracker != null) { - connectionTracker.onDownstreamConnectionEstablished(clusterName, frontendHandler.clientCtx().channel()); - } - // ... existing logic -} - -private void toForwarding(Forwarding forwarding) { - // Track upstream connection establishment - if (connectionTracker != null && backendHandler != null) { - connectionTracker.onUpstreamConnectionEstablished(clusterName, backendHandler.serverCtx().channel()); - } - // ... existing logic -} -``` - -### 7. Rejecting New Connections During Drain - -The `KafkaProxyFrontendHandler` checks if a cluster is draining before accepting new connections. - -- **What it does**: Adds a guard check in `channelActive()` to reject new connections when a cluster is being drained -- **How it works**: Calls `connectionDrainManager.shouldAcceptConnection(clusterName)` before allowing the connection to proceed. If the cluster is draining, immediately closes the channel with a log message. -- **Why this is needed**: Without this check, new connections could be established while we're trying to drain existing connections, making the drain process take longer or never complete. 
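The guard depends on a per-cluster draining flag that one thread sets and many connection-handling threads read. The sketch below shows one way that flag could be kept, assuming the `drainingClusters` map named in the flow diagram is a concurrent map keyed by cluster name. The class name `DrainFlagsSketch` is hypothetical, and the real `ConnectionDrainManager` also closes connections, which is left out here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the flag-keeping part of ConnectionDrainManager.
final class DrainFlagsSketch {

    // Present key = cluster is draining. Concurrent because connection
    // threads read it while the reload thread writes it.
    private final Map<String, Boolean> drainingClusters = new ConcurrentHashMap<>();

    // Start of a restart or removal: new connections are rejected from now on.
    void startDraining(String clusterName) {
        drainingClusters.put(clusterName, Boolean.TRUE);
    }

    // End of a restart: the cluster accepts connections again.
    void stopDraining(String clusterName) {
        drainingClusters.remove(clusterName);
    }

    // Guard consulted before a new client connection is allowed to proceed.
    boolean shouldAcceptConnection(String clusterName) {
        return !drainingClusters.containsKey(clusterName);
    }
}
```

Because the flag is per cluster, draining one virtual cluster leaves the guard open for every other cluster.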
- -```java -public class KafkaProxyFrontendHandler - extends ChannelInboundHandlerAdapter - implements NetFilter.NetFilterContext { - - @Override - public void channelActive(ChannelHandlerContext ctx) throws Exception { - this.clientCtx = ctx; - - // Check if we should accept this connection (not draining) - String clusterName = virtualClusterModel.getClusterName(); - if (connectionDrainManager != null && !connectionDrainManager.shouldAcceptConnection(clusterName)) { - LOGGER.info("Rejecting new connection for draining cluster '{}'", clusterName); - ctx.close(); - return; - } - - this.proxyChannelStateMachine.onClientActive(this); - super.channelActive(this.clientCtx); - } -} -``` - -## Graceful Restart Flow Diagram - -``` -┌───────────────────────────────────────────────────────────────────────────┐ -│ PHASE 1: INITIATE DRAINING │ -│ │ -│ VirtualClusterManager.restartVirtualCluster() │ -│ │ │ -│ ▼ │ -│ ConnectionDrainManager.startDraining(clusterName) │ -│ - drainingClusters.put(clusterName, true) │ -│ - New connections will be REJECTED │ -└───────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌───────────────────────────────────────────────────────────────────────────┐ -│ PHASE 2: APPLY BACKPRESSURE │ -│ │ -│ ConnectionDrainManager.gracefullyCloseConnections() │ -│ │ -│ For each DOWNSTREAM channel (client → proxy): │ -│ - channel.config().setAutoRead(false) │ -│ - Stops receiving NEW client requests │ -│ │ -│ For each UPSTREAM channel (proxy → Kafka): │ -│ - autoRead remains TRUE │ -│ - Continues receiving Kafka responses │ -│ │ -│ Result: In-flight count decreases naturally as responses arrive │ -└───────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌───────────────────────────────────────────────────────────────────────────┐ -│ PHASE 3: MONITOR & CLOSE CHANNELS │ -│ │ -│ For each channel: │ -│ ┌─────────────────────────────────────────────────────────────────────┐ │ -│ │ 
scheduler.scheduleAtFixedRate(() -> { │ │ -│ │ pendingRequests = inFlightTracker.getPendingRequestCount() │ │ -│ │ │ │ -│ │ if (pendingRequests == 0) { │ │ -│ │ channel.close() ◄── Safe to close! │ │ -│ │ } │ │ -│ │ }, 50ms, 100ms) │ │ -│ │ │ │ -│ │ scheduler.schedule(() -> { │ │ -│ │ if (!done) channel.close() ◄── Force close on timeout │ │ -│ │ }, 30 seconds) │ │ -│ └─────────────────────────────────────────────────────────────────────┘ │ -└───────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌───────────────────────────────────────────────────────────────────────────┐ -│ PHASE 4: DEREGISTER & REGISTER │ -│ │ -│ endpointRegistry.deregisterVirtualCluster(oldGateway) │ -│ - Unbinds network ports │ -│ │ -│ endpointRegistry.registerVirtualCluster(newGateway) │ -│ - Binds network ports with new configuration │ -└───────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌───────────────────────────────────────────────────────────────────────────┐ -│ PHASE 5: STOP DRAINING │ -│ │ -│ ConnectionDrainManager.stopDraining(clusterName) │ -│ - drainingClusters.remove(clusterName) │ -│ - New connections now ACCEPTED │ -│ - Cluster is fully operational with new configuration │ -└───────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -# Example Usage - -## Triggering a Reload with curl - -```bash -curl -X POST http://localhost:9190/admin/config/reload \ - -H "Content-Type: application/yaml" \ - --data-binary @new-config.yaml -``` - -**Response:** -```json -{ - "success": true, - "message": "Configuration reloaded successfully", - "clustersModified": 1, - "clustersAdded": 1, - "clustersRemoved": 0, - "timestamp": "2024-01-15T10:30:00.123456Z" -} -``` - -## Example Configuration with Reload Endpoint - -```yaml -management: - bindAddress: 0.0.0.0 - port: 9190 - endpoints: - prometheus: {} - configReload: - enabled: true - timeout: 60s - -virtualClusters: -- name: "demo-cluster" - 
targetCluster: - bootstrapServers: "broker:9092" - gateways: - - name: "default-gateway" - portIdentifiesNode: - bootstrapAddress: "localhost:9092" -``` +## Rejected alternatives ---- +- **File watcher as the primary trigger**: The original proposal used filesystem watching to detect configuration changes. This was set aside in favour of decoupling the trigger from the apply operation, since the trigger mechanism has unresolved design questions (security, delivery method, Kubernetes integration) that should not block the core capability. +- **`ReloadOptions` as a per-call parameter**: An approach where each call to `reload()` could specify failure behaviour (rollback/terminate/continue) and whether to persist to disk. Rejected because these decisions vary by deployment, not by invocation — they belong in static configuration. +- **`ConfigurationReconciler` naming**: Considered to describe the "compare desired vs current and converge" pattern, but rejected because Kubernetes reconcilers already exist in the Kroxylicious codebase and overloading the term would cause confusion. +- **Plan/apply split on the public interface**: Considered exposing separate `plan()` and `apply()` methods to enable dry-run validation. Decided this is an internal concern — the trigger just needs `applyConfiguration()`. A validate/dry-run capability can be added later without changing the interface. +- **Inline configuration via HTTP POST body**: Discussed having the HTTP endpoint accept the full YAML configuration in the request body. An alternative view is that configuration should always live in files (for source control, auditability, consistent state) and the HTTP endpoint should just trigger reading from a specified file path. This question is deferred along with the HTTP trigger design. 
From 3f6d4dc16cc8ed9449c81a10db10b787d4eed87e Mon Sep 17 00:00:00 2001 From: Sam Barker Date: Wed, 18 Feb 2026 17:07:40 +1300 Subject: [PATCH 04/17] Address review feedback on restructured proposal - Fix summary to read as proposed behaviour, not existing - Use "administrators" instead of "operators" for humans to avoid confusion with the Kubernetes operator process - Fix filter config examples (KMS endpoint, key selection pattern) - Clarify failure behaviour is consistent across trigger mechanisms - Note thundering herd as a known trade-off of remove+add - Fix "original proposal" to "earlier iterations" Assisted-by: Claude claude-opus-4-6 Signed-off-by: Sam Barker Signed-off-by: Urjit Patel --- proposals/012-hot-reload-feature.md | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/proposals/012-hot-reload-feature.md b/proposals/012-hot-reload-feature.md index e3cca70..1b6c1dd 100644 --- a/proposals/012-hot-reload-feature.md +++ b/proposals/012-hot-reload-feature.md @@ -1,7 +1,7 @@ # Changing Active Proxy Configuration -This proposal describes a mechanism for applying configuration changes to a running Kroxylicious proxy without a full restart. -The proxy exposes a core `applyConfiguration(Configuration)` operation that accepts a complete configuration, detects what changed, and converges the running state to match — restarting only the affected virtual clusters while leaving unaffected clusters available. +This proposal introduces a mechanism for applying configuration changes to a running Kroxylicious proxy without a full restart. +It proposes a core `applyConfiguration(Configuration)` operation that would accept a complete configuration, detect what changed, and converge the running state to match — restarting only the affected virtual clusters while leaving unaffected clusters available. 
## Current situation @@ -14,11 +14,11 @@ Both cause a service interruption that is disproportionate to the scope of the c ## Motivation -Operators need to be able to modify proxy configuration in place. +Administrators need to be able to modify proxy configuration in place. Common scenarios include: - **Adding or removing virtual clusters** as tenants are onboarded or offboarded. -- **Updating filter configuration** (e.g. changing encryption keys, adjusting rate limits, modifying ACL rules). +- **Updating filter configuration** (e.g. updating a KMS endpoint, changing a key selection pattern, modifying ACL rules). - **Rotating TLS certificates or credentials** that filters reference. The proxy should apply these changes with minimal disruption: only the virtual clusters affected by the change should experience downtime. @@ -46,8 +46,9 @@ The `applyConfiguration()` operation is the internal interface that any trigger How the new configuration arrives — whether via an HTTP endpoint, a file watcher detecting a changed ConfigMap, or a Kubernetes operator callback — is a separate concern. Deferring this keeps the proposal focused and avoids blocking on unresolved questions about trigger design (see [Trigger mechanisms](#trigger-mechanisms-future-work) below). -**Failure behaviour is deployment-level static configuration, not a per-call parameter.** -Whether the proxy rolls back, terminates, or continues on failure will vary between deployments (a multi-tenant ingress has different requirements than a sidecar), but it should not vary between invocations within the same deployment. +**Failure behaviour is deployment-level static configuration.** +Whether the proxy rolls back, terminates, or continues on failure will vary between deployments (a multi-tenant ingress has different requirements than a sidecar), but it should not vary between invocations or between trigger mechanisms within the same deployment. 
+A configuration apply triggered by an HTTP endpoint should behave identically to one triggered by a file watcher or an operator callback. These decisions belong in the proxy's static configuration, keeping the `applyConfiguration()` signature simple. ### Configuration change detection @@ -77,6 +78,7 @@ Clients connected to the cluster will be disconnected during the drain phase. This is a deliberate design choice. More surgical approaches — such as swapping the filter chain on existing connections without dropping them, or performing a rolling handoff — would reduce disruption, but they add significant complexity (connection migration, state transfer between filter chain instances, partial rollback of in-flight connections). The remove+add approach is the right starting point: it is straightforward, predictable, and consistent with how the proxy handles startup failures today. +The remove+add approach also creates a thundering herd when all disconnected clients reconnect simultaneously after the cluster comes back up; mitigation strategies (e.g. staggered connection acceptance) are future work. More surgical alternatives are worth exploring as future work once the foundation is solid. Changes are processed in the order: **remove → modify → add**. @@ -101,7 +103,7 @@ The initial default is **all-or-nothing rollback**: if any cluster operation fai Added clusters are removed, modified clusters are reverted to their original configuration, and removed clusters are re-added. This is consistent with startup behaviour, where a failure in any virtual cluster fails the entire proxy. -It produces a predictable outcome: after a failed apply, the proxy is in its previous known-good state, and the operator can investigate and retry. +It produces a predictable outcome: after a failed apply, the proxy is in its previous known-good state, and the administrator can investigate and retry. 
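All-or-nothing rollback can be pictured as an undo log: each operation that succeeds is remembered, and on the first failure the remembered operations are undone, most recent first. The sketch below illustrates only that pattern. `UndoLogSketch`, `Step` and `applyAll` are hypothetical names, and the real cluster operations are reduced to plain `Runnable`s.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of all-or-nothing apply; not the proxy's API.
final class UndoLogSketch {

    // One cluster operation together with the action that reverses it.
    static final class Step {
        final Runnable apply;
        final Runnable undo;

        Step(Runnable apply, Runnable undo) {
            this.apply = apply;
            this.undo = undo;
        }
    }

    // Runs the steps in order. If one throws, every step that already
    // succeeded is undone in reverse order and false is returned.
    static boolean applyAll(List<Step> steps) {
        Deque<Step> applied = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.apply.run();
                applied.push(step); // head insertion, so iteration is newest first
            }
            catch (RuntimeException e) {
                for (Step done : applied) {
                    done.undo.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

If the steps are submitted in the order remove, modify, add, a failure while adding undoes the modification first and the removal last.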
Other deployment models may need different behaviour: @@ -162,7 +164,7 @@ Each of these can be designed and implemented independently once the core `apply ## Rejected alternatives -- **File watcher as the primary trigger**: The original proposal used filesystem watching to detect configuration changes. This was set aside in favour of decoupling the trigger from the apply operation, since the trigger mechanism has unresolved design questions (security, delivery method, Kubernetes integration) that should not block the core capability. +- **File watcher as the primary trigger**: Earlier iterations of this proposal used filesystem watching to detect configuration changes. This was set aside in favour of decoupling the trigger from the apply operation, since the trigger mechanism has unresolved design questions (security, delivery method, Kubernetes integration) that should not block the core capability. - **`ReloadOptions` as a per-call parameter**: An approach where each call to `reload()` could specify failure behaviour (rollback/terminate/continue) and whether to persist to disk. Rejected because these decisions vary by deployment, not by invocation — they belong in static configuration. - **`ConfigurationReconciler` naming**: Considered to describe the "compare desired vs current and converge" pattern, but rejected because Kubernetes reconcilers already exist in the Kroxylicious codebase and overloading the term would cause confusion. - **Plan/apply split on the public interface**: Considered exposing separate `plan()` and `apply()` methods to enable dry-run validation. Decided this is an internal concern — the trigger just needs `applyConfiguration()`. A validate/dry-run capability can be added later without changing the interface. 
From 2ec40f5d7bdc1c90a2266e23dbc6590671dd42f7 Mon Sep 17 00:00:00 2001 From: Urjit Patel <105218041+Uzziee@users.noreply.github.com> Date: Mon, 13 Apr 2026 23:10:44 +0530 Subject: [PATCH 05/17] updated hot reload proposal as per the new virtual lifecycle state proposal This proposal introduces a mechanism for applying configuration changes to a running Kroxylicious proxy without a full restart. It defines a core `applyConfiguration(Configuration)` operation that detects changes and restarts only affected virtual clusters. Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com> Signed-off-by: Urjit Patel --- proposals/012-hot-reload-feature.md | 234 ++++++++++++++++++---------- 1 file changed, 148 insertions(+), 86 deletions(-) diff --git a/proposals/012-hot-reload-feature.md b/proposals/012-hot-reload-feature.md index 1b6c1dd..0052f48 100644 --- a/proposals/012-hot-reload-feature.md +++ b/proposals/012-hot-reload-feature.md @@ -1,28 +1,26 @@ # Changing Active Proxy Configuration -This proposal introduces a mechanism for applying configuration changes to a running Kroxylicious proxy without a full restart. -It proposes a core `applyConfiguration(Configuration)` operation that would accept a complete configuration, detect what changed, and converge the running state to match — restarting only the affected virtual clusters while leaving unaffected clusters available. +**Builds on:** [Proposal 016 — Virtual Cluster Lifecycle](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) + +This proposal introduces a mechanism for applying configuration changes to a running Kroxylicious proxy without a full restart. It defines a core `applyConfiguration(Configuration)` operation that accepts a complete configuration, detects what changed, and converges the running state to match — restarting only the affected virtual clusters while leaving unaffected clusters available. 
+ +This proposal extends the virtual cluster lifecycle model (Proposal 016) with reload operations. Where Proposal 016 defines the per-VC state machine and the `VirtualClusterManager` that owns it, this proposal defines the change detection pipeline, the reload orchestration layer, and the connection draining infrastructure that drive lifecycle transitions during a configuration apply. ## Current situation -Any change to Kroxylicious configuration — adding, removing, or modifying a virtual cluster, changing a filter definition, or updating default filters — requires a full restart of the proxy process. -This means all virtual clusters are torn down and rebuilt, dropping every client connection even if only one cluster was modified. +Any change to Kroxylicious configuration — adding, removing, or modifying a virtual cluster, changing a filter definition, or updating default filters — requires a full restart of the proxy process. This means all virtual clusters are torn down and rebuilt, dropping every client connection even if only one cluster was modified. -In Kubernetes deployments, a configuration change means a pod restart. -In standalone deployments, it means stopping and restarting the process. -Both cause a service interruption that is disproportionate to the scope of the change. +In Kubernetes deployments, a configuration change means a pod restart. In standalone deployments, it means stopping and restarting the process. Both cause a service interruption that is disproportionate to the scope of the change. ## Motivation -Administrators need to be able to modify proxy configuration in place. -Common scenarios include: +Administrators need to be able to modify proxy configuration in place. Common scenarios include: - **Adding or removing virtual clusters** as tenants are onboarded or offboarded. - **Updating filter configuration** (e.g. updating a KMS endpoint, changing a key selection pattern, modifying ACL rules). 
- **Rotating TLS certificates or credentials** that filters reference. -The proxy should apply these changes with minimal disruption: only the virtual clusters affected by the change should experience downtime. -Unaffected clusters should continue serving traffic without interruption. +The proxy should apply these changes with minimal disruption: only the virtual clusters affected by the change should experience downtime. Unaffected clusters should continue serving traffic without interruption. ## Proposal @@ -30,112 +28,186 @@ Unaffected clusters should continue serving traffic without interruption. The central operation is: -``` -applyConfiguration(Configuration) +```java +public CompletableFuture applyConfiguration(Configuration newConfig) ``` -The caller provides a complete `Configuration` object. -The proxy compares it against the currently running configuration, determines what changed, and applies the changes. +The caller provides a complete `Configuration` object. The proxy compares it against the currently running configuration, determines what changed, and applies the changes. The method returns a `CompletableFuture` that completes with a structured result on success or exceptionally on failure. -This is a **state-of-the-world** approach: the caller provides the desired end state, and the proxy is responsible for computing and executing the diff. -This is the right starting point — it is simple to reason about and avoids the complexity of delta-based or partial-update APIs. -More granular approaches (deltas, targeted snapshots) are worth exploring later, but the initial API should leave room for them without committing to them now. +In this approach: the caller provides the desired end state, and the proxy is responsible for computing and executing the diff. This is the right starting point — it is simple to reason about and avoids the complexity of delta-based or partial-update APIs. 
More granular approaches (deltas, targeted snapshots) are worth exploring later, but the initial API should leave room for them without committing to them now. -**Trigger mechanisms are explicitly out of scope for this proposal.** -The `applyConfiguration()` operation is the internal interface that any trigger plugs into. -How the new configuration arrives — whether via an HTTP endpoint, a file watcher detecting a changed ConfigMap, or a Kubernetes operator callback — is a separate concern. -Deferring this keeps the proposal focused and avoids blocking on unresolved questions about trigger design (see [Trigger mechanisms](#trigger-mechanisms-future-work) below). +**Trigger mechanisms are explicitly out of scope for this proposal.** The `applyConfiguration()` operation is the internal interface that any trigger plugs into. How the new configuration arrives — whether via an HTTP endpoint, a file watcher detecting a changed ConfigMap, or a Kubernetes operator callback — is a separate concern. Deferring this keeps the proposal focused and avoids blocking on unresolved questions about trigger design (see [Trigger mechanisms](#trigger-mechanisms-future-work) below). -**Failure behaviour is deployment-level static configuration.** -Whether the proxy rolls back, terminates, or continues on failure will vary between deployments (a multi-tenant ingress has different requirements than a sidecar), but it should not vary between invocations or between trigger mechanisms within the same deployment. -A configuration apply triggered by an HTTP endpoint should behave identically to one triggered by a file watcher or an operator callback. -These decisions belong in the proxy's static configuration, keeping the `applyConfiguration()` signature simple. +**Failure behaviour is deployment-level static configuration.** Whether the proxy rolls back or terminates on failure is controlled by `ReloadOptions`, defined once in the proxy's static YAML configuration file. 
This ensures consistent, operator-controlled behaviour regardless of trigger mechanism. A configuration apply triggered by an HTTP endpoint should behave identically to one triggered by a file watcher or an operator callback. These decisions belong in the proxy's static configuration, keeping the `applyConfiguration()` signature simple. + +```yaml +reloadOptions: + onFailure: ROLLBACK # ROLLBACK | TERMINATE + persistConfigToDisk: true # true | false +``` + +| Field | Values | Description | +|-------|--------|--------------------------------------------------------------------------------------------------------| +| `onFailure` | `ROLLBACK` | **Default.** Undo all changes, restore proxy to previous stable state. | +| | `TERMINATE` | Shut down the entire proxy. Let external supervision (K8s, systemd) restart. | +| `persistConfigToDisk` | `true` | **Default.** Write the new configuration to the config file (with `.bak` backup of the replaced file). | +| | `false` | Don't persist. The reload is in-memory only. | ### Configuration change detection -When `applyConfiguration()` is called, the proxy compares the new configuration against the running state to determine which virtual clusters need to be restarted. -A cluster requires a restart if any of the following changed: +When `applyConfiguration()` is called, the proxy compares the new configuration against the running state to determine which virtual clusters need to be restarted. Change detection is implemented as a pipeline of `ChangeDetector` implementations, each responsible for one category of change: + +- **`VirtualClusterChangeDetector`** — identifies clusters that were added, removed, or modified by comparing `VirtualClusterModel` instances via `equals()`. A cluster requires a restart if any property that contributes to `VirtualClusterModel.equals()` changed (bootstrap address, TLS settings, gateway configuration, etc.). 
+- **`FilterChangeDetector`** — identifies clusters affected by filter configuration changes. A cluster requires a restart if a `NamedFilterDefinition` it references changed (type or configuration, compared via `equals()`), or if the `defaultFilters` list changed (order matters, since filter chain execution is sequential) and the cluster relies on default filters. -- **The virtual cluster model itself** — bootstrap address, TLS settings, gateway configuration, or any other property that contributes to `VirtualClusterModel.equals()`. -- **A filter definition used by the cluster** — if the type or configuration of a `NamedFilterDefinition` referenced by the cluster changed (compared via `equals()`). -- **The default filters list** — if the cluster relies on default filters and the default filters list changed (order matters, since filter chain execution is sequential). +Detectors return a `ChangeResult(clustersToRemove, clustersToAdd, clustersToModify)`. Results from all detectors are aggregated via `LinkedHashSet` to maintain order while deduplicating cluster names that appear in multiple detector results. Clusters where none of these changed are left untouched — they continue serving traffic throughout the apply operation. -### Cluster modification: remove + add +### Cluster modification via lifecycle transitions + +A modified virtual cluster is restarted by driving it through the lifecycle states defined in Proposal 016. Proposal 016 defines the per-VC state machine (`VirtualClusterLifecycleState`) and the `VirtualClusterLifecycleManager` that enforces valid transitions. This proposal adds the reload operations that drive those transitions. + +The three reload operations map to lifecycle transitions as follows: -A modified virtual cluster is restarted by tearing it down and rebuilding it with the new configuration. 
-This is a **remove then add** operation: +**Restart (modify):** `SERVING → DRAINING → [drain connections] → [deregister gateways] → INITIALIZING → [register gateways] → SERVING` -1. Gracefully drain existing connections (see below). -2. Deregister the cluster's gateways from the endpoint registry (unbind ports). -3. Register the cluster's new gateways with the endpoint registry (bind ports with new configuration). -4. Accept new connections. +A modified cluster is torn down and rebuilt with the new configuration. During restart, the lifecycle state cycles through Draining and back to Initializing without ever reaching the terminal Stopped state. This means the `onVirtualClusterStopped` callback (defined in Proposal 016) does not fire during restart — reload is an internal VCM operation that stays within the lifecycle state machine. -This means a modified cluster experiences a brief period of unavailability while its ports are unbound and rebound. -Clients connected to the cluster will be disconnected during the drain phase. +**Remove:** `SERVING → DRAINING → [drain connections] → [deregister gateways] → STOPPED` -This is a deliberate design choice. -More surgical approaches — such as swapping the filter chain on existing connections without dropping them, or performing a rolling handoff — would reduce disruption, but they add significant complexity (connection migration, state transfer between filter chain instances, partial rollback of in-flight connections). -The remove+add approach is the right starting point: it is straightforward, predictable, and consistent with how the proxy handles startup failures today. -The remove+add approach also creates a thundering herd when all disconnected clients reconnect simultaneously after the cluster comes back up; mitigation strategies (e.g. staggered connection acceptance) are future work. -More surgical alternatives are worth exploring as future work once the foundation is solid. 
+A removed cluster is permanently torn down. It reaches the terminal Stopped state, and the `onVirtualClusterStopped` callback fires. If the proxy's `onVirtualClusterStopped` failure policy is `serve: none` (the default from Proposal 016), the proxy shuts down. If the policy is `serve: successful`, the proxy continues with the remaining healthy virtual clusters. -Changes are processed in the order: **remove → modify → add**. -Removing clusters first frees up ports and resources that new or modified clusters may need. +**Add:** `[create lifecycle manager in INITIALIZING] → [register gateways] → SERVING` + +A new cluster starts in the Initializing state with a fresh `VirtualClusterLifecycleManager`, registers its gateways with the `EndpointRegistry`, and transitions to Serving. + +Changes are processed in the order: **remove → modify → add**. Removing clusters first frees up ports and resources that new or modified clusters may need. + +This means a modified cluster experiences a brief period of unavailability while its ports are unbound and rebound. Clients connected to the cluster will be disconnected during the drain phase. This is a deliberate design choice. More surgical approaches — such as swapping the filter chain on existing connections without dropping them, or performing a rolling handoff — would reduce disruption, but they add significant complexity. The remove+add approach is the right starting point: it is straightforward, predictable, and consistent with how the proxy handles startup failures today. The remove+add approach also creates a thundering herd when all disconnected clients reconnect simultaneously after the cluster comes back up; mitigation strategies (e.g. staggered connection acceptance) are future work. ### Graceful connection draining -Before tearing down a modified or removed cluster, the proxy drains its connections gracefully rather than dropping them abruptly. 
-The drain process has three phases: +Before tearing down a modified or removed cluster, the proxy drives its lifecycle from **Serving to Draining** (via `VirtualClusterLifecycleManager.startDraining()`). The detailed mechanics of connection draining — rejecting new connections, applying backpressure, waiting for in-flight requests, and force-closing after timeout — are defined as part of the `Draining` lifecycle state in Proposal 016 and its implementation. This proposal does not redefine that behaviour; it relies on the lifecycle state machine to handle drain semantics. + +Once all connections are drained (or the drain timeout expires), the lifecycle transitions out of Draining: +- For **restart**, gateways are deregistered and re-registered, the lifecycle transitions through Initializing to Serving. +- For **remove**, the lifecycle manager transitions from Draining to Stopped via `drainComplete()`, and the `onVirtualClusterStopped` callback fires. + +### VirtualClusterManager integration + +[Proposal-016](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) defines `VirtualClusterManager` as the owner of the VC configuration tree and lifecycle state. It holds the `VirtualClusterModel` list, the per-VC `VirtualClusterLifecycleManager` instances, and the `onVirtualClusterStopped` callback. + +For hot reload, the `VirtualClusterManager` is extended with reload operations that combine lifecycle transitions with infrastructure actions: + +- **`removeVirtualCluster(clusterName, rollbackTracker)`** — drives `SERVING → DRAINING → [drain via ConnectionDrainManager] → [deregister via EndpointRegistry] → STOPPED`. Fires `onVirtualClusterStopped` callback. Tracks the removal for potential rollback. +- **`restartVirtualCluster(clusterName, newModel, rollbackTracker)`** — drives `SERVING → DRAINING → [drain] → [deregister] → INITIALIZING → [register] → SERVING`. Never reaches Stopped; callback does not fire. 
Tracks the modification for potential rollback. +- **`addVirtualCluster(newModel, rollbackTracker)`** — creates a new lifecycle manager in Initializing, drives `[register via EndpointRegistry] → INITIALIZING → SERVING`. Tracks the addition for potential rollback. + +The `VirtualClusterManager` gains two dependencies for reload operations: +- `EndpointRegistry` — for gateway registration and deregistration (port binding/unbinding) +- `ConnectionDrainManager` — for graceful connection draining during the Draining state -1. **Reject new connections.** The cluster is marked as draining. Any new client connection attempt to the cluster is immediately refused. Unaffected clusters continue accepting connections normally. +The `ConfigurationChangeHandler` calls these `VirtualClusterManager` methods based on the `ChangeResult` from the detection pipeline. -2. **Apply backpressure to existing connections.** On each downstream (client → proxy) channel, reading is disabled (`autoRead = false`) so no new requests are accepted from the client. Upstream (proxy → Kafka) channels continue reading so that responses to already-forwarded requests can flow back to clients. +### FilterChainFactory hot-swap -3. **Wait for in-flight requests to complete, then close.** Each connection is monitored for in-flight Kafka requests. Once the pending request count for a connection reaches zero, the connection is closed. If the count does not reach zero within the drain timeout, the connection is force-closed. The drain timeout should be configurable — long-running consumer rebalances or slow produces with `acks=all` can legitimately exceed a short default. +Filter configuration changes require replacing the `FilterChainFactory` that creates filter chains for new connections. The existing architecture creates `FilterChainFactory` once at startup and passes it as a `final` reference to `KafkaProxyInitializer`. 
-This approach ensures that in-progress Kafka operations complete where possible, while bounding the time the proxy waits before proceeding with the restart. +To support hot reload, `KafkaProxy` holds an `AtomicReference<FilterChainFactory>` shared between: +- `KafkaProxyInitializer` — reads via `.get()` on each new connection, always getting the current factory +- `ConfigurationReloadOrchestrator` — swaps via `.set()` after successful reload + +On success: the orchestrator atomically swaps to the new factory and closes the old one. On failure with `ROLLBACK`: the new factory is closed and the reference remains unchanged. This ensures a clean transition with no race conditions between connection setup and factory replacement. ### Failure behaviour and rollback -The initial default is **all-or-nothing rollback**: if any cluster operation fails during apply (e.g. a port conflict when rebinding, a TLS error, a plugin initialisation failure), all previously successful operations in that apply are rolled back in reverse order. -Added clusters are removed, modified clusters are reverted to their original configuration, and removed clusters are re-added. +The `onFailure` option in `ReloadOptions` controls what happens when a cluster operation fails during apply: + +**`ROLLBACK` (default):** All previously successful operations in that apply are rolled back in reverse order via the `ConfigurationChangeRollbackTracker`. Added clusters are removed, modified clusters are reverted to their original configuration, and removed clusters are re-added. The `FilterChainFactory` reference remains unchanged (old factory stays active). After a failed apply, the proxy is in its previous known-good state. + +Rollback drives the same lifecycle transitions in reverse: a successfully added cluster is removed (driven to Stopped), a successfully modified cluster is restarted with the original model, and a successfully removed cluster is re-added (registered and driven to Serving).
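Review note: the reverse-order rollback described for `ROLLBACK` can be pictured as an undo stack. The following is a minimal sketch only, assuming each successful operation registers a compensating action; `RollbackTrackerSketch` and its methods are hypothetical names and are not the `ConfigurationChangeRollbackTracker` API from the proposal or the POC.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of reverse-order rollback; not the proposal's actual tracker.
class RollbackTrackerSketch {
    // Most recently tracked undo action sits at the head of the deque.
    private final Deque<Runnable> undoActions = new ArrayDeque<>();

    /** Record the compensating action for an operation that has just succeeded. */
    void track(Runnable undo) {
        undoActions.push(undo);
    }

    /** Run the compensating actions, newest first, until none remain. */
    void rollback() {
        while (!undoActions.isEmpty()) {
            undoActions.pop().run();
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        RollbackTrackerSketch tracker = new RollbackTrackerSketch();
        // Operations are applied in the order remove, modify, add.
        tracker.track(() -> log.add("re-add vc-a"));                      // undo of removing vc-a
        tracker.track(() -> log.add("restart vc-b with original model")); // undo of modifying vc-b
        tracker.track(() -> log.add("remove vc-c"));                      // undo of adding vc-c
        tracker.rollback();
        System.out.println(log); // [remove vc-c, restart vc-b with original model, re-add vc-a]
    }
}
```

A stack keeps the undo order tied to the apply order, so ports freed by undoing an add are available again before a removed cluster is re-added.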
-This is consistent with startup behaviour, where a failure in any virtual cluster fails the entire proxy. -It produces a predictable outcome: after a failed apply, the proxy is in its previous known-good state, and the administrator can investigate and retry. +**`TERMINATE`:** No rollback is attempted. `KafkaProxy.applyConfiguration()` calls `shutdown()`, which drives all VCs through the standard shutdown path (`transitionAllToDraining()` → `transitionAllToStopped()`), firing `onVirtualClusterStopped` for each. The process supervisor (Kubernetes, systemd) is expected to restart the proxy. -Other deployment models may need different behaviour: +For removed clusters that reach Stopped during apply, the `onVirtualClusterStopped` callback fires per the Proposal 016 contract. The callback's failure policy (`serve: none` or `serve: successful`) applies independently of the `onFailure` reload option — these are orthogonal policies. The reload `onFailure` controls whether the orchestration layer rolls back; the `onVirtualClusterStopped` policy controls whether the proxy-level owner shuts down when any VC reaches Stopped. -- A **multi-tenant ingress** deployment might prefer to continue running with partial success rather than rolling back all changes because one tenant's cluster failed. -- A **Kubernetes sidecar** deployment might prefer to terminate the process and let the supervisor restart it. +### Orchestration pipeline -These alternatives are deployment-level configuration choices (as discussed above) and do not need to be resolved for the initial implementation. -All-or-nothing rollback is the safe default that covers the common case. 
+The complete `applyConfiguration()` pipeline flows through these layers:
+
+```
+KafkaProxy.applyConfiguration(newConfig)
+  │
+  ├── Reads ReloadOptions from bootstrap config
+  ├── Guards: proxy must be running, orchestrator must be initialized
+  │
+  ▼
+ConfigurationReloadOrchestrator.reload(newConfig, reloadOptions)
+  │
+  ├── Acquires reloadLock (prevents concurrent reloads)
+  ├── Validates new configuration via Features framework
+  ├── Creates new FilterChainFactory with updated filter definitions
+  ├── Builds ConfigurationChangeContext (old/new config, models, factories)
+  │
+  ▼
+ConfigurationChangeHandler.handleConfigurationChange(context, onFailure)
+  │
+  ├── Aggregates ChangeDetector results:
+  │     VirtualClusterChangeDetector → added/removed/modified VCs
+  │     FilterChangeDetector → VCs affected by filter changes
+  │
+  ├── Creates ConfigurationChangeRollbackTracker
+  ├── Processes changes in order: Remove → Modify → Add
+  │
+  ▼
+VirtualClusterManager (for each affected VC)
+  │
+  ├── removeVirtualCluster: SERVING → DRAINING → drain → deregister → STOPPED
+  ├── restartVirtualCluster: SERVING → DRAINING → drain → deregister → INITIALIZING → register → SERVING
+  ├── addVirtualCluster: INITIALIZING → register → SERVING
+  │
+  ▼
+On success: swap FilterChainFactory, update current config, optionally persist to disk
+On failure: apply onFailure policy (ROLLBACK / TERMINATE)
+```
+
+### Concurrency control
+
+Only one reload operation can execute at a time. The `ConfigurationReloadOrchestrator` uses a `ReentrantLock` to prevent concurrent `applyConfiguration()` calls. A second call while a reload is in progress fails immediately with a `ConcurrentReloadException` rather than queuing.
 ### Plugin resource tracking (known gap)
-The change detection described above can identify when a filter's YAML configuration changes (via `equals()` on the configuration model), but it cannot detect when external resources that a plugin reads during initialisation have changed.
-For example, a password file, TLS keystore, or ACL rules file may have changed on disk even though the plugin's configuration (which only references the file path) is identical. +The change detection described above can identify when a filter's YAML configuration changes (via `equals()` on the configuration model), but it cannot detect when external resources that a plugin reads during initialisation have changed. For example, a password file, TLS keystore, or ACL rules file may have changed on disk even though the plugin's configuration (which only references the file path) is identical. These reads typically happen deep in nested plugin call stacks (e.g. `RecordEncryption` → `KmsService` → `CredentialProvider` → `FilePassword`), so the runtime has no visibility into what was read or whether it has changed. Without addressing this gap, an apply operation would miss these changes entirely — the plugin configuration hasn't changed, so no restart is triggered, even though the plugin's actual behaviour would differ if it were reinitialised. -An approach is being explored where plugins read external resources through the runtime (rather than doing direct file I/O), allowing the runtime to track what was read and hash the content for subsequent change detection. -This makes dependency tracking automatic rather than relying on plugin authors to opt in. -The detailed design for this mechanism will be covered in a separate proposal to keep this one focused on the core apply operation. +An approach is being explored where plugins read external resources through the runtime (rather than doing direct file I/O), allowing the runtime to track what was read and hash the content for subsequent change detection. This makes dependency tracking automatic rather than relying on plugin authors to opt in. The detailed design for this mechanism will be covered in a separate proposal to keep this one focused on the core apply operation. 
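Review note: the "read through the runtime and hash the content" idea can be illustrated with a small sketch. It assumes the runtime offers plugins a reader that records a SHA-256 digest per path. `TrackedResourceReader`, `read` and `anyChanged` are hypothetical names; the real mechanism is deferred to a separate proposal and may look quite different.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: the runtime reads files on behalf of plugins and remembers a content hash.
class TrackedResourceReader {
    private final Map<Path, byte[]> digests = new HashMap<>();

    /** Read a resource for a plugin and record the digest of what was read. */
    byte[] read(Path path) throws IOException {
        byte[] content = Files.readAllBytes(path);
        digests.put(path, sha256(content));
        return content;
    }

    /** Re-hash every tracked resource and report whether any content differs from what was read. */
    boolean anyChanged() throws IOException {
        for (Map.Entry<Path, byte[]> entry : digests.entrySet()) {
            byte[] current = sha256(Files.readAllBytes(entry.getKey()));
            if (!Arrays.equals(current, entry.getValue())) {
                return true;
            }
        }
        return false;
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path secret = Files.createTempFile("password", ".txt");
        Files.writeString(secret, "old-password");

        TrackedResourceReader reader = new TrackedResourceReader();
        reader.read(secret);
        System.out.println(reader.anyChanged()); // false: file untouched since it was read

        Files.writeString(secret, "new-password");
        System.out.println(reader.anyChanged()); // true: same path, different content

        Files.delete(secret);
    }
}
```

With something of this shape, a detector could flag a cluster for restart even when the filter's YAML configuration compares equal, which is the gap described above.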
+ +### Metrics and observability (future work) + +The initial implementation does not include reload-specific metrics or observability endpoints. +However, the following metrics are identified as valuable for future work and should be introduced once the core reload mechanism is stable: + +- **`kroxylicious_reload_total`** — counter of `applyConfiguration()` invocations, labelled by outcome (`success`, `rollback`, `terminate`). Enables alerting on reload failures and tracking reload frequency. +- **`kroxylicious_reload_duration_seconds`** — histogram of end-to-end reload duration. Helps operators understand whether reload is meeting SLA expectations and identify slow operations. +- **`kroxylicious_reload_clusters_affected_total`** — counter of per-VC operations during reload, labelled by operation (`add`, `remove`, `modify`) and outcome (`success`, `failure`, `rolledback`). Provides granularity beyond the aggregate reload result. +- **`kroxylicious_drain_duration_seconds`** — histogram of per-VC connection drain duration. Helps tune the `drainTimeout` configuration and detect VCs with long-lived connections. +- **`kroxylicious_drain_connections_force_closed_total`** — counter of connections force-closed after drain timeout. A high rate indicates the drain timeout is too aggressive for the workload. + +The per-VC lifecycle state metrics (`kroxylicious_virtual_cluster_state`, `kroxylicious_virtual_cluster_state_duration_seconds`, `kroxylicious_virtual_cluster_transitions_total`) defined in Proposal 016 complement these reload-specific metrics and should be implemented alongside them. + +A management endpoint exposing the last reload result (timestamp, outcome, affected clusters, duration) would also be valuable for on-demand inspection by operators. The design of this endpoint is deferred along with the metrics implementation. ## Open questions -- **Configuration granularity**: The initial design uses state-of-the-world snapshots. 
Is there a use case that requires delta-based operations or more targeted snapshots in the near term, or is this purely future work? -- **Failure behaviour options beyond all-or-nothing**: What specific deployment models need partial-success or terminate-on-failure semantics, and what configuration surface do they need? -- **Drain timeout default and configurability**: What is a reasonable default drain timeout? How should it be configured — globally, per-cluster, or both? +- **Rollback and `onVirtualClusterStopped` interaction**: If a remove operation succeeds (VC reaches Stopped, callback fires) but a subsequent add operation fails and triggers rollback (re-adding the removed VC), the callback will have already fired for the Stopped transition. Should the rollback suppress or compensate for the callback? With `serve: none`, the callback may trigger proxy shutdown before rollback has a chance to restore the VC. ## Trigger mechanisms (future work) -The `applyConfiguration()` operation is trigger-agnostic. -The following trigger mechanisms have been discussed but are explicitly deferred: +The `applyConfiguration()` operation is trigger-agnostic. The following trigger mechanisms have been discussed but are explicitly deferred: - **HTTP endpoint**: An HTTP POST endpoint (e.g. `/admin/config/reload`) that accepts a new configuration and calls `applyConfiguration()`. Provides synchronous feedback. Questions remain around security (authentication, binding to localhost vs. network interfaces), whether the endpoint receives the configuration inline or reads it from a file path, and content-type handling. - **File watcher**: A filesystem watcher that detects changes to the configuration file and triggers `applyConfiguration()`. Interacts with Kubernetes ConfigMap mount semantics. Questions remain around debouncing, atomic file replacement, and read-only filesystem constraints. 
@@ -143,29 +215,19 @@ The following trigger mechanisms have been discussed but are explicitly deferred Each of these can be designed and implemented independently once the core `applyConfiguration()` mechanism is in place. -## Affected/not affected projects - -**Affected:** - -- **kroxylicious** (core proxy) — The `applyConfiguration()` operation, change detection, cluster lifecycle management, and connection draining all live here. -- **kroxylicious-junit5-extension** — Test infrastructure may need to support applying configuration changes to a running proxy in integration tests. - -**Not affected:** - -- **kroxylicious-operator** — The operator will eventually be a trigger mechanism, but the core apply operation does not depend on it. -- **Filter/plugin implementations** — Existing filters do not need to change. The plugin resource tracking gap (above) may eventually require filters to change how they read external resources, but that is a separate proposal. - ## Compatibility - The `applyConfiguration()` operation is additive — it does not change existing startup behaviour. - Virtual cluster configuration semantics are unchanged; the proposal only adds the ability to apply changes at runtime. - Filter definitions and their configuration are unchanged. -- No changes to the on-disk configuration file format. +- No changes to the on-disk configuration file format beyond the optional `reloadOptions` block. +- The lifecycle state model (Proposal 016) is unchanged; this proposal only adds operations that drive transitions through the existing state machine. ## Rejected alternatives - **File watcher as the primary trigger**: Earlier iterations of this proposal used filesystem watching to detect configuration changes. This was set aside in favour of decoupling the trigger from the apply operation, since the trigger mechanism has unresolved design questions (security, delivery method, Kubernetes integration) that should not block the core capability. 
-- **`ReloadOptions` as a per-call parameter**: An approach where each call to `reload()` could specify failure behaviour (rollback/terminate/continue) and whether to persist to disk. Rejected because these decisions vary by deployment, not by invocation — they belong in static configuration. +- **`ReloadOptions` as a per-call parameter**: An approach where each call to `applyConfiguration()` could specify failure behaviour (rollback/terminate). Rejected because these decisions vary by deployment, not by invocation — they belong in static configuration. - **`ConfigurationReconciler` naming**: Considered to describe the "compare desired vs current and converge" pattern, but rejected because Kubernetes reconcilers already exist in the Kroxylicious codebase and overloading the term would cause confusion. - **Plan/apply split on the public interface**: Considered exposing separate `plan()` and `apply()` methods to enable dry-run validation. Decided this is an internal concern — the trigger just needs `applyConfiguration()`. A validate/dry-run capability can be added later without changing the interface. - **Inline configuration via HTTP POST body**: Discussed having the HTTP endpoint accept the full YAML configuration in the request body. An alternative view is that configuration should always live in files (for source control, auditability, consistent state) and the HTTP endpoint should just trigger reading from a specified file path. This question is deferred along with the HTTP trigger design. +- **Separate VirtualClusterManager for reload**: The original hot-reload design had a `VirtualClusterManager` that was purely an operation orchestrator (with `EndpointRegistry` and `ConnectionDrainManager` dependencies). 
Rather than maintaining two classes with the same name, the reload operations merge into the [Proposal 016](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) `VirtualClusterManager`, which already owns the VC model list and lifecycle managers. The merged class gains `EndpointRegistry` and `ConnectionDrainManager` dependencies and the `removeVirtualCluster`/`restartVirtualCluster`/`addVirtualCluster` methods. From 02f44e79632a1c06704ff8a24e1f5387822f3ea3 Mon Sep 17 00:00:00 2001 From: Urjit Patel <105218041+Uzziee@users.noreply.github.com> Date: Tue, 21 Apr 2026 18:27:28 +0530 Subject: [PATCH 06/17] Refine hot reload proposal with new policies and layers Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com> Signed-off-by: Urjit Patel --- proposals/012-hot-reload-feature.md | 214 ++++++++++++++++++++++------ 1 file changed, 171 insertions(+), 43 deletions(-) diff --git a/proposals/012-hot-reload-feature.md b/proposals/012-hot-reload-feature.md index 0052f48..b32c4a3 100644 --- a/proposals/012-hot-reload-feature.md +++ b/proposals/012-hot-reload-feature.md @@ -4,7 +4,7 @@ This proposal introduces a mechanism for applying configuration changes to a running Kroxylicious proxy without a full restart. It defines a core `applyConfiguration(Configuration)` operation that accepts a complete configuration, detects what changed, and converges the running state to match — restarting only the affected virtual clusters while leaving unaffected clusters available. -This proposal extends the virtual cluster lifecycle model (Proposal 016) with reload operations. Where Proposal 016 defines the per-VC state machine and the `VirtualClusterManager` that owns it, this proposal defines the change detection pipeline, the reload orchestration layer, and the connection draining infrastructure that drive lifecycle transitions during a configuration apply. 
+This proposal extends the virtual cluster lifecycle model (Proposal 016) with reload operations, an edge-based failure policy, and a configuration change orchestration layer. Where Proposal 016 defines the per-VC state machine and the `VirtualClusterManager` that owns it, this proposal defines the change detection pipeline, the reload orchestration, and the two policy layers (terminal failure and configuration failure) that govern how the proxy responds to problems during reload. ## Current situation @@ -22,6 +22,8 @@ Administrators need to be able to modify proxy configuration in place. Common sc The proxy should apply these changes with minimal disruption: only the virtual clusters affected by the change should experience downtime. Unaffected clusters should continue serving traffic without interruption. +This proposal also delivers the startup behaviour change that Proposal 016 made possible. With `serve: successful` and reload, an operator can let healthy VCs serve traffic while fixing a broken VC's config and re-applying — making per-VC independence useful rather than theoretical. + ## Proposal ### Core API: `applyConfiguration()` @@ -38,20 +40,51 @@ In this approach: the caller provides the desired end state, and the proxy is re **Trigger mechanisms are explicitly out of scope for this proposal.** The `applyConfiguration()` operation is the internal interface that any trigger plugs into. How the new configuration arrives — whether via an HTTP endpoint, a file watcher detecting a changed ConfigMap, or a Kubernetes operator callback — is a separate concern. Deferring this keeps the proposal focused and avoids blocking on unresolved questions about trigger design (see [Trigger mechanisms](#trigger-mechanisms-future-work) below). -**Failure behaviour is deployment-level static configuration.** Whether the proxy rolls back or terminates on failure is controlled by `ReloadOptions`, defined once in the proxy's static YAML configuration file. 
This ensures consistent, operator-controlled behaviour regardless of trigger mechanism. A configuration apply triggered by an HTTP endpoint should behave identically to one triggered by a file watcher or an operator callback. These decisions belong in the proxy's static configuration, keeping the `applyConfiguration()` signature simple. +### Two lifecycle policy layers + +Proposal 016's lifecycle model has two distinct policy points, each on a different edge of the state graph. This proposal refines the failure policy from a node-based hook (firing when a VC *arrives at* `Stopped`) to an edge-based hook (firing based on *which transition* brought it there). + +**Layer 1 — Per-VC recovery (`onVirtualClusterFailed`):** +When a VC hits the `initializing → failed` edge, what happens to that VC? Retry? How many times? Proposal 016 explicitly deferred this to the reload proposal (*"recovery policies defined by a future reload proposal under onVirtualClusterFailed"*). It's a per-VC concern — VC-B getting 3 retry attempts has nothing to do with VC-A. In the initial implementation, retries are hardcoded to 0 — a failed VC immediately transitions to `stopped`. The seam exists in the code but is not exposed in configuration until retry with backoff is needed. + +**Layer 2 — Terminal failure (`onVirtualClusterTerminalFailure`):** +When a VC traverses the `failed → stopped` edge — meaning it is truly unrecoverable — what's the blast radius? `serve: none` (proxy shuts down) or `serve: successful` (remaining VCs continue). This fires only on the `failed → stopped` edge, not on `draining → stopped` (intentional removal) or `initializing → stopped` (shutdown during startup). The lifecycle state model is unchanged — same states, same transitions. The intelligence is in the `VirtualClusterManager` knowing which transitions are policy-triggering, not in the state itself. 
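Review note: the edge-based rule in Layer 2 can be stated as a predicate over the transition rather than the destination state. This is a minimal sketch under that reading. The `State` and `Serve` enums and the method names are hypothetical stand-ins, not the `VirtualClusterLifecycleState` or `VirtualClusterManager` types from Proposal 016.

```java
// Hypothetical illustration of an edge-based terminal failure policy.
class EdgePolicySketch {
    enum State { INITIALIZING, SERVING, DRAINING, FAILED, STOPPED }

    enum Serve { NONE, SUCCESSFUL }

    /** Only the failed -> stopped edge is policy-triggering; other routes into STOPPED are not. */
    static boolean isTerminalFailure(State from, State to) {
        return from == State.FAILED && to == State.STOPPED;
    }

    /** True when the whole proxy should shut down as a result of this transition. */
    static boolean shouldShutDownProxy(State from, State to, Serve policy) {
        return isTerminalFailure(from, to) && policy == Serve.NONE;
    }

    public static void main(String[] args) {
        // Unrecoverable VC with serve: none, so the proxy shuts down.
        System.out.println(shouldShutDownProxy(State.FAILED, State.STOPPED, Serve.NONE));       // true
        // Intentional removal arrives at STOPPED via draining, so no policy fires.
        System.out.println(shouldShutDownProxy(State.DRAINING, State.STOPPED, Serve.NONE));     // false
        // Unrecoverable VC with serve: successful, so the remaining VCs keep serving.
        System.out.println(shouldShutDownProxy(State.FAILED, State.STOPPED, Serve.SUCCESSFUL)); // false
    }
}
```

Keying the decision on the `(from, to)` pair leaves the state set unchanged, which matches the claim that the lifecycle model itself is not modified.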
+
+### Configuration model
+
+Failure behaviour is deployment-level static configuration, split into two independent policy dimensions. A configuration apply triggered by an HTTP endpoint should behave identically to one triggered by a file watcher or an operator callback. These decisions belong in the proxy's static configuration, keeping the `applyConfiguration()` signature simple.
 ```yaml
-reloadOptions:
-  onFailure: ROLLBACK # ROLLBACK | TERMINATE
-  persistConfigToDisk: true # true | false
+proxy:
+  # Lifecycle: fires on the failed → stopped edge only
+  # Applies at startup AND reload
+  onVirtualClusterTerminalFailure:
+    serve: none # none | successful
+
+  # Reload-specific settings
+  configurationReload:
+    onFailure:
+      rollback: true # true | false
+    persistToDisk: true # true | false
 ```
-| Field | Values | Description |
-|-------|--------|--------------------------------------------------------------------------------------------------------|
-| `onFailure` | `ROLLBACK` | **Default.** Undo all changes, restore proxy to previous stable state. |
-| | `TERMINATE` | Shut down the entire proxy. Let external supervision (K8s, systemd) restart. |
-| `persistConfigToDisk` | `true` | **Default.** Write the new configuration to the config file (with `.bak` backup of the replaced file). |
-| | `false` | Don't persist. The reload is in-memory only. |
+| Block | Field | Values | Description |
+|-------|-------|--------|-------------|
+| `onVirtualClusterTerminalFailure` | `serve` | `none` | **Default.** Any unrecoverable VC shuts down the proxy. |
+| | | `successful` | Remaining healthy VCs continue serving. Failed VC is reported. |
+| `configurationReload` | `onFailure.rollback` | `true` | **Default.** Atomic reload — revert all VCs to prior config on failure. |
+| | | `false` | Best-effort — keep what succeeded, let failed VCs die. |
+| | `persistToDisk` | `true` | **Default.** Write the new configuration to the config file (with `.bak` backup of the replaced file). |
+| | | `false` | Don't persist. The reload is in-memory only. |
+
+These two dimensions are independently meaningful, producing four distinct behaviours:
+
+| `rollback` | `serve` | Behaviour |
+|-----------|---------|-----------|
+| `true` | `none` | Atomic reload. Revert on failure. If revert itself fails, proxy dies. |
+| `true` | `successful` | Atomic reload. Revert on failure. If revert itself fails, surviving VCs continue. |
+| `false` | `none` | Best-effort. Any unrecoverable VC kills the proxy. |
+| `false` | `successful` | Best-effort. Failed VCs die, rest continue. (e.g. TLS enforcement case.) |
 ### Configuration change detection
@@ -72,11 +105,11 @@ The three reload operations map to lifecycle transitions as follows:
 **Restart (modify):** `SERVING → DRAINING → [drain connections] → [deregister gateways] → INITIALIZING → [register gateways] → SERVING`
-A modified cluster is torn down and rebuilt with the new configuration. During restart, the lifecycle state cycles through Draining and back to Initializing without ever reaching the terminal Stopped state. This means the `onVirtualClusterStopped` callback (defined in Proposal 016) does not fire during restart — reload is an internal VCM operation that stays within the lifecycle state machine.
+A modified cluster is torn down and rebuilt with the new configuration. During restart, the lifecycle state cycles through Draining and back to Initializing without ever reaching the terminal Stopped state. This means the `onVirtualClusterTerminalFailure` callback does not fire during restart — reload is an internal VCM operation that stays within the lifecycle state machine.
 **Remove:** `SERVING → DRAINING → [drain connections] → [deregister gateways] → STOPPED`
-A removed cluster is permanently torn down. It reaches the terminal Stopped state, and the `onVirtualClusterStopped` callback fires.
If the proxy's `onVirtualClusterStopped` failure policy is `serve: none` (the default from Proposal 016), the proxy shuts down. If the policy is `serve: successful`, the proxy continues with the remaining healthy virtual clusters. +A removed cluster is permanently torn down. It reaches the terminal Stopped state via `draining → stopped`. This is an intentional removal, not a failure — the `onVirtualClusterTerminalFailure` callback does **not** fire (it only fires on the `failed → stopped` edge). **Add:** `[create lifecycle manager in INITIALIZING] → [register gateways] → SERVING` @@ -92,17 +125,17 @@ Before tearing down a modified or removed cluster, the proxy drives its lifecycl Once all connections are drained (or the drain timeout expires), the lifecycle transitions out of Draining: - For **restart**, gateways are deregistered and re-registered, the lifecycle transitions through Initializing to Serving. -- For **remove**, the lifecycle manager transitions from Draining to Stopped via `drainComplete()`, and the `onVirtualClusterStopped` callback fires. +- For **remove**, the lifecycle manager transitions from Draining to Stopped via `drainComplete()`. This is the `draining → stopped` edge — the terminal failure callback does **not** fire. ### VirtualClusterManager integration -[Proposal-016](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) defines `VirtualClusterManager` as the owner of the VC configuration tree and lifecycle state. It holds the `VirtualClusterModel` list, the per-VC `VirtualClusterLifecycleManager` instances, and the `onVirtualClusterStopped` callback. +[Proposal 016](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) defines `VirtualClusterManager` as the owner of the VC configuration tree and lifecycle state. It holds the `VirtualClusterModel` list, the per-VC `VirtualClusterLifecycleManager` instances, and the `onVirtualClusterTerminalFailure` callback. 
For hot reload, the `VirtualClusterManager` is extended with reload operations that combine lifecycle transitions with infrastructure actions: -- **`removeVirtualCluster(clusterName, rollbackTracker)`** — drives `SERVING → DRAINING → [drain via ConnectionDrainManager] → [deregister via EndpointRegistry] → STOPPED`. Fires `onVirtualClusterStopped` callback. Tracks the removal for potential rollback. -- **`restartVirtualCluster(clusterName, newModel, rollbackTracker)`** — drives `SERVING → DRAINING → [drain] → [deregister] → INITIALIZING → [register] → SERVING`. Never reaches Stopped; callback does not fire. Tracks the modification for potential rollback. -- **`addVirtualCluster(newModel, rollbackTracker)`** — creates a new lifecycle manager in Initializing, drives `[register via EndpointRegistry] → INITIALIZING → SERVING`. Tracks the addition for potential rollback. +- **`removeVirtualCluster(clusterName, rollbackTracker)`** — drives `SERVING → DRAINING → [drain via ConnectionDrainManager] → [deregister via EndpointRegistry] → STOPPED`. This is the `draining → stopped` edge — callback does **not** fire. Tracks the removal for potential rollback. +- **`restartVirtualCluster(clusterName, newModel, rollbackTracker)`** — drives `SERVING → DRAINING → [drain] → [deregister] → INITIALIZING → [register] → SERVING`. Never reaches Stopped; callback does not fire. If re-initialization fails: `INITIALIZING → FAILED → STOPPED` via the `failed → stopped` edge — callback **fires**. Tracks the modification for potential rollback. +- **`addVirtualCluster(newModel, rollbackTracker)`** — creates a new lifecycle manager in Initializing, drives `[register via EndpointRegistry] → INITIALIZING → SERVING`. If initialization fails: `INITIALIZING → FAILED → STOPPED` via the `failed → stopped` edge — callback **fires**. Tracks the addition for potential rollback. 
The `VirtualClusterManager` gains two dependencies for reload operations: - `EndpointRegistry` — for gateway registration and deregistration (port binding/unbinding) @@ -118,19 +151,31 @@ To support hot reload, `KafkaProxy` holds an `AtomicReference Date: Fri, 24 Apr 2026 10:57:54 +0530 Subject: [PATCH 07/17] Fix wording as per comments Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com> Signed-off-by: Urjit Patel --- proposals/012-hot-reload-feature.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/proposals/012-hot-reload-feature.md b/proposals/012-hot-reload-feature.md index b32c4a3..c24ee32 100644 --- a/proposals/012-hot-reload-feature.md +++ b/proposals/012-hot-reload-feature.md @@ -93,7 +93,7 @@ When `applyConfiguration()` is called, the proxy compares the new configuration - **`VirtualClusterChangeDetector`** — identifies clusters that were added, removed, or modified by comparing `VirtualClusterModel` instances via `equals()`. A cluster requires a restart if any property that contributes to `VirtualClusterModel.equals()` changed (bootstrap address, TLS settings, gateway configuration, etc.). - **`FilterChangeDetector`** — identifies clusters affected by filter configuration changes. A cluster requires a restart if a `NamedFilterDefinition` it references changed (type or configuration, compared via `equals()`), or if the `defaultFilters` list changed (order matters, since filter chain execution is sequential) and the cluster relies on default filters. -Detectors return a `ChangeResult(clustersToRemove, clustersToAdd, clustersToModify)`. Results from all detectors are aggregated via `LinkedHashSet` to maintain order while deduplicating cluster names that appear in multiple detector results. +Detectors return a `ChangeResult(clustersToRemove, clustersToAdd, clustersToModify)`. Results from all detectors are aggregated and then passed onto `VirtualClusterManager` to perform relevant operations. 
Clusters where none of these changed are left untouched — they continue serving traffic throughout the apply operation. @@ -101,9 +101,9 @@ Clusters where none of these changed are left untouched — they continue servin A modified virtual cluster is restarted by driving it through the lifecycle states defined in Proposal 016. Proposal 016 defines the per-VC state machine (`VirtualClusterLifecycleState`) and the `VirtualClusterLifecycleManager` that enforces valid transitions. This proposal adds the reload operations that drive those transitions. -The three reload operations map to lifecycle transitions as follows: +The three change operations map to lifecycle transitions as follows: -**Restart (modify):** `SERVING → DRAINING → [drain connections] → [deregister gateways] → INITIALIZING → [register gateways] → SERVING` +**Modify (Restart VC):** `SERVING → DRAINING → [drain connections] → [deregister gateways] → INITIALIZING → [register gateways] → SERVING` A modified cluster is torn down and rebuilt with the new configuration. During restart, the lifecycle state cycles through Draining and back to Initializing without ever reaching the terminal Stopped state. This means the `onVirtualClusterTerminalFailure` callback does not fire during restart — reload is an internal VCM operation that stays within the lifecycle state machine. 
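Illustrative note on the lifecycle text in the patch above: the edge-based callback rule can be sketched as a small state machine. This is a minimal sketch under stated assumptions, not the Proposal 016 implementation. `VcState`, `VcLifecycleSketch`, and the transition table are hypothetical names, and only the transitions named in the proposal text are modelled.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for VirtualClusterLifecycleState.
enum VcState { INITIALIZING, SERVING, DRAINING, FAILED, STOPPED }

// Hypothetical stand-in for VirtualClusterLifecycleManager.
final class VcLifecycleSketch {
    // Assumed transition table: only the edges mentioned in the proposal text.
    private static final Map<VcState, Set<VcState>> ALLOWED = new EnumMap<>(VcState.class);
    static {
        ALLOWED.put(VcState.INITIALIZING, EnumSet.of(VcState.SERVING, VcState.FAILED));
        ALLOWED.put(VcState.SERVING, EnumSet.of(VcState.DRAINING));
        // DRAINING goes back to INITIALIZING on restart, or to STOPPED on removal.
        ALLOWED.put(VcState.DRAINING, EnumSet.of(VcState.INITIALIZING, VcState.STOPPED));
        ALLOWED.put(VcState.FAILED, EnumSet.of(VcState.STOPPED));
        ALLOWED.put(VcState.STOPPED, EnumSet.noneOf(VcState.class));
    }

    private VcState state = VcState.INITIALIZING;
    private final Runnable onTerminalFailure;

    VcLifecycleSketch(Runnable onTerminalFailure) {
        this.onTerminalFailure = onTerminalFailure;
    }

    VcState state() {
        return state;
    }

    void transitionTo(VcState next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not a valid transition");
        }
        VcState previous = state;
        state = next;
        // Edge-based policy: the callback depends on the edge taken, not on the
        // state reached. draining -> stopped (intentional removal) stays silent.
        if (previous == VcState.FAILED && next == VcState.STOPPED) {
            onTerminalFailure.run();
        }
    }
}
```

In this sketch a removal (`SERVING`, `DRAINING`, `STOPPED`) and a failed initialization (`INITIALIZING`, `FAILED`, `STOPPED`) end in the same state, but only the second one runs the callback.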
From 8bfbc36bcb85d48597edd706b68e8003fa49ed4a Mon Sep 17 00:00:00 2001 From: Urjit Patel <105218041+Uzziee@users.noreply.github.com> Date: Fri, 24 Apr 2026 11:09:40 +0530 Subject: [PATCH 08/17] refactor Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com> Signed-off-by: Urjit Patel --- proposals/012-hot-reload-feature.md | 1 - 1 file changed, 1 deletion(-) diff --git a/proposals/012-hot-reload-feature.md b/proposals/012-hot-reload-feature.md index c24ee32..ab6cfe7 100644 --- a/proposals/012-hot-reload-feature.md +++ b/proposals/012-hot-reload-feature.md @@ -18,7 +18,6 @@ Administrators need to be able to modify proxy configuration in place. Common sc - **Adding or removing virtual clusters** as tenants are onboarded or offboarded. - **Updating filter configuration** (e.g. updating a KMS endpoint, changing a key selection pattern, modifying ACL rules). -- **Rotating TLS certificates or credentials** that filters reference. The proxy should apply these changes with minimal disruption: only the virtual clusters affected by the change should experience downtime. Unaffected clusters should continue serving traffic without interruption. 
From 9a2b1325f758852a748006db2ee4e694dad76855 Mon Sep 17 00:00:00 2001 From: Urjit Patel <105218041+Uzziee@users.noreply.github.com> Date: Mon, 27 Apr 2026 18:40:45 +0530 Subject: [PATCH 09/17] Added javadoc to public applyConfiguration API Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com> Signed-off-by: Urjit Patel --- proposals/012-hot-reload-feature.md | 90 ++++++++++++++++++++++------- 1 file changed, 70 insertions(+), 20 deletions(-) diff --git a/proposals/012-hot-reload-feature.md b/proposals/012-hot-reload-feature.md index ab6cfe7..aeef004 100644 --- a/proposals/012-hot-reload-feature.md +++ b/proposals/012-hot-reload-feature.md @@ -2,7 +2,7 @@ **Builds on:** [Proposal 016 — Virtual Cluster Lifecycle](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) -This proposal introduces a mechanism for applying configuration changes to a running Kroxylicious proxy without a full restart. It defines a core `applyConfiguration(Configuration)` operation that accepts a complete configuration, detects what changed, and converges the running state to match — restarting only the affected virtual clusters while leaving unaffected clusters available. +This proposal introduces a mechanism for applying configuration changes to a running Kroxylicious proxy without a full restart. It defines a core `KafkaProxy.applyConfiguration(Configuration)` operation that accepts a complete configuration, detects what changed, and converges the running state to match — restarting only the affected virtual clusters while leaving unaffected clusters available. This proposal extends the virtual cluster lifecycle model (Proposal 016) with reload operations, an edge-based failure policy, and a configuration change orchestration layer. 
Where Proposal 016 defines the per-VC state machine and the `VirtualClusterManager` that owns it, this proposal defines the change detection pipeline, the reload orchestration, and the two policy layers (terminal failure and configuration failure) that govern how the proxy responds to problems during reload. @@ -25,19 +25,68 @@ This proposal also delivers the startup behaviour change that Proposal 016 made ## Proposal -### Core API: `applyConfiguration()` +### Core API: `KafkaProxy.applyConfiguration()` The central operation is: ```java -public CompletableFuture applyConfiguration(Configuration newConfig) +class KafkaProxy { + // ... add the following method + + /** + * Apply the given configuration to this running proxy, restarting only the + * virtual clusters whose effective configuration differs from the current + * running state. Unaffected clusters continue serving traffic throughout + * the apply. + * + *

<h3>Validation contract</h3> + * <p>Static validation (schema conformance, required fields, field-value + * ranges, internal consistency) is the embedder's responsibility and is + * expected to have been performed on {@code newConfig} before this method + * is called. + * + * <p>Validation which depends on runtime state (like port conflicts) + * will be done during applyConfiguration() and reported through the ReloadResult + * + * <h3>Error reporting</h3> + * <p>This method throws synchronously only for programmer errors: + * <ul> + * <li>{@link NullPointerException} if {@code newConfig} is {@code null};</li> + * <li>{@link IllegalStateException} if the proxy has not been started or + * has been shut down.</li> + * </ul> + * <p>
All other failures — validation failures (runtime exceptions), lifecycle transition + * failures during drain or re-init, partial reloads — surface via + * exceptional completion of the returned future. + * + * @param newConfig the desired end-state configuration; must be non-null + * and statically valid + * @return a future that completes with a {@link ReloadResult} listing which + * clusters ended up unchanged, added, restarted, removed, or failed + * @throws NullPointerException if {@code newConfig} is {@code null} + * @throws IllegalStateException if the proxy is not in the running state + */ + public CompletableFuture<ReloadResult> applyConfiguration(Configuration newConfig); +} ``` The caller provides a complete `Configuration` object. The proxy compares it against the currently running configuration, determines what changed, and applies the changes. The method returns a `CompletableFuture` that completes with a structured result on success or exceptionally on failure. -In this approach: the caller provides the desired end state, and the proxy is responsible for computing and executing the diff. This is the right starting point — it is simple to reason about and avoids the complexity of delta-based or partial-update APIs. More granular approaches (deltas, targeted snapshots) are worth exploring later, but the initial API should leave room for them without committing to them now. +`ReloadResult` reports which clusters ended up in each terminal outcome of the apply: -**Trigger mechanisms are explicitly out of scope for this proposal.** The `applyConfiguration()` operation is the internal interface that any trigger plugs into. How the new configuration arrives — whether via an HTTP endpoint, a file watcher detecting a changed ConfigMap, or a Kubernetes operator callback — is a separate concern. Deferring this keeps the proposal focused and avoids blocking on unresolved questions about trigger design (see [Trigger mechanisms](#trigger-mechanisms-future-work) below).
+```java +public record ReloadResult( + Set clustersUnchanged, + Set clustersAdded, + Set clustersRestarted, + Set clustersRemoved, + Set failedClusters +) {} +``` + +In this approach: the caller provides the desired end state, and the proxy is responsible for computing and executing the diff. This is the right starting point — it is simple to reason about and avoids the complexity of delta-based or partial-update APIs. More granular approaches (deltas, targeted snapshots) may be worth exploring later, but the initial API should leave room for them without committing to them now. + +**Trigger mechanisms are explicitly out of scope for this proposal.** The `KafkaProxy.applyConfiguration()` operation is the internal interface that any trigger plugs into. How the new configuration arrives — whether via an HTTP endpoint, a file watcher detecting a changed ConfigMap, or a Kubernetes operator callback — is a separate concern. Deferring this keeps the proposal focused and avoids blocking on unresolved questions about trigger design (see [Trigger mechanisms](#trigger-mechanisms-future-work) below). ### Two lifecycle policy layers @@ -51,7 +100,8 @@ When a VC traverses the `failed → stopped` edge — meaning it is truly unreco ### Configuration model -Failure behaviour is deployment-level static configuration, split into two independent policy dimensions. A configuration apply triggered by an HTTP endpoint should behave identically to one triggered by a file watcher or an operator callback. These decisions belong in the proxy's static configuration, keeping the `applyConfiguration()` signature simple. +Failure behaviour is deployment-level static configuration, split into two independent policy dimensions. A configuration apply triggered by an HTTP endpoint should behave identically to one triggered by a file watcher or an operator callback. These decisions belong in the proxy's static configuration, keeping the ` +KafkaProxy.applyConfiguration()` signature simple. 
```yaml proxy: @@ -87,7 +137,7 @@ These two dimensions are independently meaningful, producing four distinct behav ### Configuration change detection -When `applyConfiguration()` is called, the proxy compares the new configuration against the running state to determine which virtual clusters need to be restarted. Change detection is implemented as a pipeline of `ChangeDetector` implementations, each responsible for one category of change: +When `KafkaProxy.applyConfiguration()` is called, the proxy compares the new configuration against the running state to determine which virtual clusters need to be restarted. Change detection is implemented as a pipeline of `ChangeDetector` implementations, each responsible for one category of change: - **`VirtualClusterChangeDetector`** — identifies clusters that were added, removed, or modified by comparing `VirtualClusterModel` instances via `equals()`. A cluster requires a restart if any property that contributes to `VirtualClusterModel.equals()` changed (bootstrap address, TLS settings, gateway configuration, etc.). - **`FilterChangeDetector`** — identifies clusters affected by filter configuration changes. A cluster requires a restart if a `NamedFilterDefinition` it references changed (type or configuration, compared via `equals()`), or if the `defaultFilters` list changed (order matters, since filter chain execution is sequential) and the cluster relies on default filters. @@ -154,7 +204,7 @@ On success: the orchestrator atomically swaps to the new factory and closes the ### Failure behaviour and rollback -When a VC operation fails during `applyConfiguration()`, two independent policies govern the response: +When a VC operation fails during `KafkaProxy.applyConfiguration()`, two independent policies govern the response: **Orchestration policy (`configurationReload.onFailure.rollback`):** @@ -178,7 +228,7 @@ These policies compose independently. 
`rollback` determines whether the orchestr ### Orchestration pipeline -The complete `applyConfiguration()` pipeline flows through these layers: +The complete `KafkaProxy.applyConfiguration()` pipeline flows through these layers: ``` KafkaProxy.applyConfiguration(newConfig) @@ -218,7 +268,7 @@ On failure: apply rollback policy, then terminal failure policy if VC is unrecov ### Concurrency control -Only one reload operation can execute at a time. The `ConfigurationReloadOrchestrator` uses a `ReentrantLock` to prevent concurrent `applyConfiguration()` calls. A second call while a reload is in progress fails immediately with a `ConcurrentReloadException` rather than queuing. +Only one reload operation can execute at a time. The `ConfigurationReloadOrchestrator` uses a `ReentrantLock` to prevent concurrent `KafkaProxy.applyConfiguration()` calls. A second call while a reload is in progress fails immediately with a `ConcurrentReloadException` rather than queuing. ### Worked examples @@ -295,7 +345,7 @@ Terminal failure (serve policy): The following metrics are part of the reload implementation: -- **`kroxylicious_reload_total`** — counter of `applyConfiguration()` invocations, labelled by outcome (`success`, `rollback`, `failure`). Enables alerting on reload failures and tracking reload frequency. +- **`kroxylicious_reload_total`** — counter of `KafkaProxy.applyConfiguration()` invocations, labelled by outcome (`success`, `rollback`, `failure`). Enables alerting on reload failures and tracking reload frequency. - **`kroxylicious_reload_duration_seconds`** — histogram of end-to-end reload duration. Helps operators understand whether reload is meeting SLA expectations and identify slow operations. - **`kroxylicious_reload_clusters_affected_total`** — counter of per-VC operations during reload, labelled by operation (`add`, `remove`, `modify`) and outcome (`success`, `failure`, `rolledback`). Provides granularity beyond the aggregate reload result. 
- **`kroxylicious_drain_duration_seconds`** — histogram of per-VC connection drain duration. Helps tune the `drainTimeout` configuration and detect VCs with long-lived connections. @@ -318,19 +368,19 @@ An approach is being explored where plugins read external resources through the ## Trigger mechanisms (future work) -The `applyConfiguration()` operation is trigger-agnostic. The following trigger mechanisms have been discussed but are explicitly deferred: +The `KafkaProxy.applyConfiguration()` operation is trigger-agnostic. The following trigger mechanisms have been discussed but are explicitly deferred: -- **HTTP endpoint**: An HTTP POST endpoint (e.g. `/admin/config/reload`) that accepts a new configuration and calls `applyConfiguration()`. Provides synchronous feedback. Questions remain around security (authentication, binding to localhost vs. network interfaces), whether the endpoint receives the configuration inline or reads it from a file path, and content-type handling. -- **File watcher**: A filesystem watcher that detects changes to the configuration file and triggers `applyConfiguration()`. Interacts with Kubernetes ConfigMap mount semantics. Questions remain around debouncing, atomic file replacement, and read-only filesystem constraints. -- **Operator integration**: A Kubernetes operator that reconciles a CRD and calls `applyConfiguration()` via the proxy's API. The operator owns the desired state; the proxy does not persist configuration to disk. +- **HTTP endpoint**: An HTTP POST endpoint (e.g. `/admin/config/reload`) that accepts a new configuration and calls `KafkaProxy.applyConfiguration()`. Provides synchronous feedback. Questions remain around security (authentication, binding to localhost vs. network interfaces), whether the endpoint receives the configuration inline or reads it from a file path, and content-type handling. 
+- **File watcher**: A filesystem watcher that detects changes to the configuration file and triggers `KafkaProxy.applyConfiguration()`. Interacts with Kubernetes ConfigMap mount semantics. Questions remain around debouncing, atomic file replacement, and read-only filesystem constraints. +- **Operator integration**: A Kubernetes operator that reconciles a CRD and calls `KafkaProxy.applyConfiguration()` via the proxy's API. The operator owns the desired state; the proxy does not persist configuration to disk. -Each of these can be designed and implemented independently once the core `applyConfiguration()` mechanism is in place. +Each of these can be designed and implemented independently once the core `KafkaProxy.applyConfiguration()` mechanism is in place. ## Affected/not affected projects **Affected:** -- **kroxylicious-runtime** (core proxy) — The `applyConfiguration()` operation, change detection pipeline (`ChangeDetector`, `VirtualClusterChangeDetector`, `FilterChangeDetector`), reload orchestration (`ConfigurationReloadOrchestrator`, `ConfigurationChangeHandler`, `ConfigurationChangeRollbackTracker`), connection draining infrastructure, and lifecycle-integrated reload operations on `VirtualClusterManager` all live here. This builds on the lifecycle state model (`VirtualClusterLifecycleState`, `VirtualClusterLifecycleManager`, `VirtualClusterManager`) introduced by Proposal 016. +- **kroxylicious-runtime** (core proxy) — The `KafkaProxy.applyConfiguration()` operation, change detection pipeline (`ChangeDetector`, `VirtualClusterChangeDetector`, `FilterChangeDetector`), reload orchestration (`ConfigurationReloadOrchestrator`, `ConfigurationChangeHandler`, `ConfigurationChangeRollbackTracker`), connection draining infrastructure, and lifecycle-integrated reload operations on `VirtualClusterManager` all live here. 
This builds on the lifecycle state model (`VirtualClusterLifecycleState`, `VirtualClusterLifecycleManager`, `VirtualClusterManager`) introduced by Proposal 016. - **kroxylicious-junit5-extension** — Test infrastructure may need to support applying configuration changes to a running proxy in integration tests. **Not affected:** @@ -340,7 +390,7 @@ Each of these can be designed and implemented independently once the core `apply ## Compatibility -- The `applyConfiguration()` operation is additive — it does not change existing startup behaviour. +- The `KafkaProxy.applyConfiguration()` operation is additive — it does not change existing startup behaviour. - The default configuration (`onVirtualClusterTerminalFailure.serve: none`, `configurationReload.onFailure.rollback: true`) matches current behaviour — the proxy shuts down if any VC fails at startup. - Virtual cluster configuration semantics are unchanged; the proposal only adds the ability to apply changes at runtime. - Filter definitions and their configuration are unchanged. @@ -352,9 +402,9 @@ Each of these can be designed and implemented independently once the core `apply - **File watcher as the primary trigger**: Earlier iterations of this proposal used filesystem watching to detect configuration changes. This was set aside in favour of decoupling the trigger from the apply operation, since the trigger mechanism has unresolved design questions (security, delivery method, Kubernetes integration) that should not block the core capability. - **Node-based failure policy (`onVirtualClusterStopped`)**: The original Proposal 016 design fired the serve policy when a VC *arrived at* the `Stopped` state. This conflates intentional removal (a success) with terminal failure (a problem). Replaced with edge-based policy that fires only on the `failed → stopped` transition. 
- **Single `onFailure: ROLLBACK | TERMINATE` knob**: The original reload design conflated two independent dimensions — orchestration rollback and terminal failure policy — into a single configuration option. Decomposed into `configurationReload.onFailure.rollback` (orchestration) and `onVirtualClusterTerminalFailure.serve` (lifecycle), which compose independently and reveal two additional meaningful behaviour combinations. -- **`ReloadOptions` as a per-call parameter**: An approach where each call to `applyConfiguration()` could specify failure behaviour. Rejected because these decisions vary by deployment, not by invocation — they belong in static configuration. +- **`ReloadOptions` as a per-call parameter**: An approach where each call to `KafkaProxy.applyConfiguration()` could specify failure behaviour. Rejected because these decisions vary by deployment, not by invocation — they belong in static configuration. - **`ConfigurationReconciler` naming**: Considered to describe the "compare desired vs current and converge" pattern, but rejected because Kubernetes reconcilers already exist in the Kroxylicious codebase and overloading the term would cause confusion. -- **Plan/apply split on the public interface**: Considered exposing separate `plan()` and `apply()` methods to enable dry-run validation. Decided this is an internal concern — the trigger just needs `applyConfiguration()`. A validate/dry-run capability can be added later without changing the interface. +- **Plan/apply split on the public interface**: Considered exposing separate `plan()` and `apply()` methods to enable dry-run validation. Decided this is an internal concern — the trigger just needs `KafkaProxy.applyConfiguration()`. A validate/dry-run capability can be added later without changing the interface. - **Inline configuration via HTTP POST body**: Discussed having the HTTP endpoint accept the full YAML configuration in the request body. 
An alternative view is that configuration should always live in files (for source control, auditability, consistent state) and the HTTP endpoint should just trigger reading from a specified file path. This question is deferred along with the HTTP trigger design. - **Separate VirtualClusterManager for reload**: The original hot-reload design had a `VirtualClusterManager` that was purely an operation orchestrator (with `EndpointRegistry` and `ConnectionDrainManager` dependencies). Rather than maintaining two classes with the same name, the reload operations merge into the [Proposal 016](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) `VirtualClusterManager`, which already owns the VC model list and lifecycle managers. The merged class gains `EndpointRegistry` and `ConnectionDrainManager` dependencies and the `removeVirtualCluster`/`restartVirtualCluster`/`addVirtualCluster` methods. - **Two terminal states (`Stopped` and `TerminallyFailed`)**: Considered adding a separate terminal state for unrecoverable failures. Rejected because the distinction is about the transition edge, not the terminal state — a stopped cluster is permanently done regardless of why. The edge-based policy hook achieves the same goal without adding state machine complexity. 
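Illustrative note on the patch above: the "compare, then converge" step and the fail-fast concurrency rule can be sketched together. This is a minimal sketch under stated assumptions, not the proposed implementation. `ChangeSketch`, `ReloadSketch`, and `ConcurrentReloadSketchException` are hypothetical stand-ins for `ChangeResult`, `ConfigurationReloadOrchestrator`, and `ConcurrentReloadException`. Virtual cluster models are reduced to plain strings compared via `equals()`, and the "apply" step only swaps the stored map and completes synchronously.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the proposal's ChangeResult.
record ChangeSketch(Set<String> toRemove, Set<String> toAdd, Set<String> toModify) { }

// Hypothetical stand-in for the proposal's ConcurrentReloadException.
final class ConcurrentReloadSketchException extends RuntimeException {
    ConcurrentReloadSketchException() {
        super("a reload is already in progress");
    }
}

final class ReloadSketch {
    private final ReentrantLock reloadLock = new ReentrantLock();
    // Cluster name -> model. A String stands in for VirtualClusterModel here.
    private Map<String, String> running;

    ReloadSketch(Map<String, String> initial) {
        this.running = Map.copyOf(initial);
    }

    // Classify every cluster name as removed, added, or modified (by equals()).
    static ChangeSketch diff(Map<String, String> current, Map<String, String> desired) {
        Set<String> remove = new LinkedHashSet<>();
        Set<String> add = new LinkedHashSet<>();
        Set<String> modify = new LinkedHashSet<>();
        current.forEach((name, model) -> {
            if (!desired.containsKey(name)) {
                remove.add(name);
            } else if (!model.equals(desired.get(name))) {
                modify.add(name);
            }
        });
        desired.keySet().forEach(name -> {
            if (!current.containsKey(name)) {
                add.add(name);
            }
        });
        return new ChangeSketch(remove, add, modify);
    }

    CompletableFuture<ChangeSketch> apply(Map<String, String> desired) {
        // tryLock() returns immediately, so a call from another thread while a
        // reload is running fails fast instead of queuing behind it.
        if (!reloadLock.tryLock()) {
            return CompletableFuture.failedFuture(new ConcurrentReloadSketchException());
        }
        try {
            ChangeSketch changes = diff(running, desired);
            running = Map.copyOf(desired); // placeholder for the real remove/restart/add work
            return CompletableFuture.completedFuture(changes);
        } finally {
            reloadLock.unlock();
        }
    }
}
```

Clusters that appear in none of the three sets are the untouched ones. A real orchestrator would hold the lock across asynchronous drain and restart work, which this synchronous sketch does not attempt to model.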
From 354b0c4fb040096cdf48aa8788bcfb40fc42083f Mon Sep 17 00:00:00 2001 From: Urjit Patel Date: Tue, 28 Apr 2026 19:17:03 +0530 Subject: [PATCH 10/17] Rename proposal to use PR number Signed-off-by: Urjit Patel --- .../{012-hot-reload-feature.md => 083-hot-reload-feature.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename proposals/{012-hot-reload-feature.md => 083-hot-reload-feature.md} (99%) diff --git a/proposals/012-hot-reload-feature.md b/proposals/083-hot-reload-feature.md similarity index 99% rename from proposals/012-hot-reload-feature.md rename to proposals/083-hot-reload-feature.md index aeef004..a9873ba 100644 --- a/proposals/012-hot-reload-feature.md +++ b/proposals/083-hot-reload-feature.md @@ -1,4 +1,4 @@ -# Changing Active Proxy Configuration +# 83 - Changing Active Proxy Configuration **Builds on:** [Proposal 016 — Virtual Cluster Lifecycle](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) From 39d3638664d50aeec8eac7ed8481b330e0fecc18 Mon Sep 17 00:00:00 2001 From: Urjit Patel Date: Tue, 12 May 2026 13:48:00 +0530 Subject: [PATCH 11/17] Updated design doc as per latest discussion Signed-off-by: Urjit Patel --- proposals/083-hot-reload-feature.md | 352 ++++++++++++++-------------- 1 file changed, 178 insertions(+), 174 deletions(-) diff --git a/proposals/083-hot-reload-feature.md b/proposals/083-hot-reload-feature.md index a9873ba..b7a1fa4 100644 --- a/proposals/083-hot-reload-feature.md +++ b/proposals/083-hot-reload-feature.md @@ -1,4 +1,4 @@ -# 83 - Changing Active Proxy Configuration +# Changing Active Proxy Configuration **Builds on:** [Proposal 016 — Virtual Cluster Lifecycle](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) @@ -21,7 +21,7 @@ Administrators need to be able to modify proxy configuration in place. 
Common sc The proxy should apply these changes with minimal disruption: only the virtual clusters affected by the change should experience downtime. Unaffected clusters should continue serving traffic without interruption. -This proposal also delivers the startup behaviour change that Proposal 016 made possible. With `serve: successful` and reload, an operator can let healthy VCs serve traffic while fixing a broken VC's config and re-applying — making per-VC independence useful rather than theoretical. +This proposal also delivers the runtime capability that Proposal 016 made possible: with `applyConfiguration()` available, an operator (via whatever trigger they're using) can let healthy VCs continue serving while fixing a broken VC's config and re-applying — making per-VC independence useful rather than theoretical. ## Proposal @@ -39,14 +39,23 @@ class KafkaProxy { * running state. Unaffected clusters continue serving traffic throughout * the apply. * + *

<h3>Scope</h3> + * <p>This method applies only the virtual-cluster sections of the configuration + * and the named filter definitions that those virtual clusters reference. + * Other configuration sections (management, metrics, admin, etc.) are out of + * scope and are not reconciled by this operation; changes to those sections + * still require a proxy restart. + * * <h3>Validation contract</h3> * <p>Static validation (schema conformance, required fields, field-value - * ranges, internal consistency) is the embedder's responsibility and is + * ranges, internal consistency) is the caller's responsibility and is * expected to have been performed on {@code newConfig} before this method * is called. * - * <p>Validation which depends on runtime state (like port conflicts) - * will be done during applyConfiguration() and reported through the ReloadResult + * <p>Validation which depends on runtime state (port conflicts, plugin + * instantiation, TLS material readability) is performed during + * {@code applyConfiguration()} and reported via the returned future's + * {@link ConfigurationResult}. * * <h3>Error reporting</h3> * <p>This method throws synchronously only for programmer errors: @@ -55,85 +64,126 @@ class KafkaProxy { * <li>{@link IllegalStateException} if the proxy has not been started or * has been shut down.</li> * </ul> - * <p>All other failures — validation failures (runtime exceptions), lifecycle transition - * failures during drain or re-init, partial reloads — surface via - * exceptional completion of the returned future. + * + * <p>All other failures surface through the returned future: + * <ul> + * <li>Catastrophic failure — the apply could not be evaluated (e.g. + * internal proxy bug, unexpected I/O failure inside the orchestrator). + * The future completes exceptionally.</li> + * <li>Per-component failure — the apply was evaluated and one or + * more components (virtual clusters or referenced filters) failed to + * converge. The future completes normally with a {@code ConfigurationResult} + * whose {@code errors()} collection is non-empty.</li> + * </ul> + * + * <p>
Failure-handling policy (whether to shut down on partial failure, attempt + * a rollback, alert, or retry) is the caller's responsibility, expressed via + * the standard {@link java.util.concurrent.CompletableFuture#whenComplete} + * pattern. The proxy itself takes no policy action based on {@code errors()}; + * it only reports. * * @param newConfig the desired end-state configuration; must be non-null * and statically valid - * @return a future that completes with a {@link ReloadResult} listing which - * clusters ended up unchanged, added, restarted, removed, or failed + * @return a future that completes with a {@link ConfigurationResult} describing + * any per-component failures encountered while applying the + * configuration * @throws NullPointerException if {@code newConfig} is {@code null} * @throws IllegalStateException if the proxy is not in the running state */ - public CompletableFuture<ReloadResult> applyConfiguration(Configuration newConfig); + public CompletableFuture<ConfigurationResult> applyConfiguration(Configuration newConfig); } ``` -The caller provides a complete `Configuration` object. The proxy compares it against the currently running configuration, determines what changed, and applies the changes. The method returns a `CompletableFuture` that completes with a structured result on success or exceptionally on failure. +The caller provides a complete `Configuration` object. The proxy compares the in-scope sections (virtual clusters + referenced named filters) against the currently running configuration, determines what changed, and applies the changes. The method returns a `CompletableFuture` that completes with the apply outcome or exceptionally on catastrophic failure. -`ReloadResult` reports which clusters ended up in each terminal outcome of the apply: +`ConfigurationResult` reports any per-component failures encountered during the apply. The interface is deliberately minimal: it only enumerates *what failed* and *why*, and leaves any reaction to the caller.
```java -public record ReloadResult( - Set clustersUnchanged, - Set clustersAdded, - Set clustersRestarted, - Set clustersRemoved, - Set failedClusters -) {} -``` +public interface ConfigurationResult { + /** + * Returns the per-component failures encountered while applying the configuration. + * One entry per failed component (e.g. one virtual cluster, one referenced filter). + * Empty when the apply succeeded with no failed components. + *

    + * The returned collection is immutable; iteration order is unspecified. + */ + Collection errors(); -In this approach: the caller provides the desired end state, and the proxy is responsible for computing and executing the diff. This is the right starting point — it is simple to reason about and avoids the complexity of delta-based or partial-update APIs. More granular approaches (deltas, targeted snapshots) may be worth exploring later, but the initial API should leave room for them without committing to them now. + /** Convenience predicate equivalent to {@code !errors().isEmpty()}. */ + default boolean hasErrors() { return !errors().isEmpty(); } +} -**Trigger mechanisms are explicitly out of scope for this proposal.** The `KafkaProxy.applyConfiguration()` operation is the internal interface that any trigger plugs into. How the new configuration arrives — whether via an HTTP endpoint, a file watcher detecting a changed ConfigMap, or a Kubernetes operator callback — is a separate concern. Deferring this keeps the proposal focused and avoids blocking on unresolved questions about trigger design (see [Trigger mechanisms](#trigger-mechanisms-future-work) below). +public record ConfigurationError(String humanReadableIdentifier, Throwable cause) { } +``` -### Two lifecycle policy layers +`ConfigurationError.humanReadableIdentifier` is a best-effort string that identifies which component failed (a virtual cluster name, a filter reference, etc.). The string's format is implementation-defined and intended for human consumption — operators reading logs, alerts, or admin endpoints. Programmatic consumers should rely on `cause` (the underlying exception) for typed failure detection rather than parsing the identifier. -Proposal 016's lifecycle model has two distinct policy points, each on a different edge of the state graph. 
This proposal refines the failure policy from a node-based hook (firing when a VC *arrives at* `Stopped`) to an edge-based hook (firing based on *which transition* brought it there). +In this approach the caller provides the desired end state, the proxy computes and executes the diff, and the proxy reports per-component outcomes — but takes no action on those outcomes. The intentional minimalism preserves freedom of manoeuvre for the broader proxy-configuration rework tracked separately; richer APIs (categorised outcomes, structured identifiers, lifecycle event streams) can be added without breaking this contract. -**Layer 1 — Per-VC recovery (`onVirtualClusterFailed`):** -When a VC hits the `initializing → failed` edge, what happens to that VC? Retry? How many times? Proposal 016 explicitly deferred this to the reload proposal (*"recovery policies defined by a future reload proposal under onVirtualClusterFailed"*). It's a per-VC concern — VC-B getting 3 retry attempts has nothing to do with VC-A. In the initial implementation, retries are hardcoded to 0 — a failed VC immediately transitions to `stopped`. The seam exists in the code but is not exposed in configuration until retry with backoff is needed. +#### Caller-side failure handling -**Layer 2 — Terminal failure (`onVirtualClusterTerminalFailure`):** -When a VC traverses the `failed → stopped` edge — meaning it is truly unrecoverable — what's the blast radius? `serve: none` (proxy shuts down) or `serve: successful` (remaining VCs continue). This fires only on the `failed → stopped` edge, not on `draining → stopped` (intentional removal) or `initializing → stopped` (shutdown during startup). The lifecycle state model is unchanged — same states, same transitions. The intelligence is in the `VirtualClusterManager` knowing which transitions are policy-triggering, not in the state itself. +Because the proxy does not act on `errors()`, callers express their failure policy via `whenComplete` on the returned future. 
Three illustrative patterns: -### Configuration model +**Shut down on any failure**: -Failure behaviour is deployment-level static configuration, split into two independent policy dimensions. A configuration apply triggered by an HTTP endpoint should behave identically to one triggered by a file watcher or an operator callback. These decisions belong in the proxy's static configuration, keeping the ` -KafkaProxy.applyConfiguration()` signature simple. +```java +proxy.applyConfiguration(newConfig) + .whenComplete((result, ex) -> { + if (ex != null) { + LOGGER.atError().setCause(ex).log("Configuration apply failed catastrophically"); + proxy.shutdown(); + return; + } + for (var error : result.errors()) { + LOGGER.atError() + .setCause(error.cause()) + .addKeyValue("component", error.humanReadableIdentifier()) + .log("Configuration apply failed for component"); + } + if (result.hasErrors()) { + proxy.shutdown(); + } + }); +``` -```yaml -proxy: - # Lifecycle: fires on the failed → stopped edge only - # Applies at startup AND reload - onVirtualClusterTerminalFailure: - serve: none # none | successful +The future completes with the aggregate result *before* any action is taken, so logging happens before shutdown — no ordering problem. - # Reload-specific settings - configurationReload: - onFailure: - rollback: true # true | false - persistToDisk: true # true | false +**Best-effort apply, keep what worked**: + +```java +proxy.applyConfiguration(newConfig) + .whenComplete((result, ex) -> { + if (ex != null) { + alerter.send("catastrophic-apply-failure", ex); + return; + } + for (var error : result.errors()) { + alerter.send("component-apply-failure", error); + } + // Surviving components continue serving; no proxy-level action taken. + }); ``` -| Block | Field | Values | Description | -|-------|-------|--------|-------------| -| `onVirtualClusterTerminalFailure` | `serve` | `none` | **Default.** Any unrecoverable VC shuts down the proxy. 
| -| | | `successful` | Remaining healthy VCs continue serving. Failed VC is reported. | -| `configurationReload` | `onFailure.rollback` | `true` | **Default.** Atomic reload — revert all VCs to prior config on failure. | -| | | `false` | Best-effort — keep what succeeded, let failed VCs die. | -| | `persistToDisk` | `true` | **Default.** Write the new configuration to the config file (with `.bak` backup of the replaced file). | -| | | `false` | Don't persist. The reload is in-memory only. | +**Rollback on failure** (a sophisticated trigger): + +```java +proxy.applyConfiguration(newConfig) + .whenComplete((result, ex) -> { + if (result != null && result.hasErrors()) { + proxy.applyConfiguration(oldConfig) + .whenComplete((rollbackResult, rollbackEx) -> { + if (rollbackEx != null || rollbackResult.hasErrors()) { + // rollback itself failed — last-resort policy + proxy.shutdown(); + } + }); + } + }); +``` -These two dimensions are independently meaningful, producing four distinct behaviours: +Each of these is a trigger-side concern. The proxy does not need to know which policy is in use, and adding a new policy in the future does not require any proxy changes. -| `rollback` | `serve` | Behaviour | -|-----------|---------|-----------| -| `true` | `none` | Atomic reload. Revert on failure. If revert itself fails, proxy dies. | -| `true` | `successful` | Atomic reload. Revert on failure. If revert itself fails, surviving VCs continue. | -| `false` | `none` | Best-effort. Any unrecoverable VC kills the proxy. | -| `false` | `successful` | Best-effort. Failed VCs die, rest continue. (e.g. TLS enforcement case.) | +**Trigger mechanisms are explicitly out of scope for this proposal.** The `KafkaProxy.applyConfiguration()` operation is the internal interface that any trigger plugs into. How the new configuration arrives — whether via an HTTP endpoint, a file watcher detecting a changed ConfigMap, or a Kubernetes operator callback — is a separate concern. 
Deferring this keeps the proposal focused and avoids blocking on unresolved questions about trigger design (see [Trigger mechanisms](#trigger-mechanisms-future-work) below). ### Configuration change detection @@ -148,17 +198,19 @@ Clusters where none of these changed are left untouched — they continue servin ### Cluster modification via lifecycle transitions -A modified virtual cluster is restarted by driving it through the lifecycle states defined in Proposal 016. Proposal 016 defines the per-VC state machine (`VirtualClusterLifecycleState`) and the `VirtualClusterLifecycleManager` that enforces valid transitions. This proposal adds the reload operations that drive those transitions. +A modified virtual cluster is restarted by driving it through the lifecycle states defined in Proposal 016. Proposal 016 defines the per-VC state machine (`VirtualClusterLifecycleState`) and the `VirtualClusterRegistry` that enforces valid transitions. This proposal adds the reload operations that drive those transitions. The three change operations map to lifecycle transitions as follows: **Modify (Restart VC):** `SERVING → DRAINING → [drain connections] → [deregister gateways] → INITIALIZING → [register gateways] → SERVING` -A modified cluster is torn down and rebuilt with the new configuration. During restart, the lifecycle state cycles through Draining and back to Initializing without ever reaching the terminal Stopped state. This means the `onVirtualClusterTerminalFailure` callback does not fire during restart — reload is an internal VCM operation that stays within the lifecycle state machine. +A modified cluster is torn down and rebuilt with the new configuration. During restart, the lifecycle state cycles through Draining and back to Initializing without ever reaching the terminal Stopped state. **Remove:** `SERVING → DRAINING → [drain connections] → [deregister gateways] → STOPPED` -A removed cluster is permanently torn down. 
It reaches the terminal Stopped state via `draining → stopped`. This is an intentional removal, not a failure — the `onVirtualClusterTerminalFailure` callback does **not** fire (it only fires on the `failed → stopped` edge). +A removed cluster is permanently torn down. It reaches the terminal Stopped state via `draining → stopped`. This is an intentional removal, not a failure; it is not reported through `ConfigurationResult.errors()`. + +If a *modify* or *add* operation fails — i.e. a VC traverses the `initializing → failed → stopped` path during the apply — the failure is reported as a `ConfigurationError` entry in the result returned to the caller. The caller decides whether to retry, rollback, alert, or shut down via the `whenComplete` patterns shown above. **Add:** `[create lifecycle manager in INITIALIZING] → [register gateways] → SERVING` @@ -170,27 +222,12 @@ This means a modified cluster experiences a brief period of unavailability while ### Graceful connection draining -Before tearing down a modified or removed cluster, the proxy drives its lifecycle from **Serving to Draining** (via `VirtualClusterLifecycleManager.startDraining()`). The detailed mechanics of connection draining — rejecting new connections, applying backpressure, waiting for in-flight requests, and force-closing after timeout — are defined as part of the `Draining` lifecycle state in Proposal 016 and its implementation. This proposal does not redefine that behaviour; it relies on the lifecycle state machine to handle drain semantics. +Before tearing down a modified or removed cluster, the proxy drives its lifecycle from **Serving to Draining**. The detailed mechanics of connection draining — rejecting new connections, applying backpressure, waiting for in-flight requests, and force-closing after timeout — are defined as part of the `Draining` lifecycle state in Proposal 016 and its implementation. 
This proposal does not redefine that behaviour; it relies on the lifecycle state machine to handle drain semantics. Once all connections are drained (or the drain timeout expires), the lifecycle transitions out of Draining: - For **restart**, gateways are deregistered and re-registered, the lifecycle transitions through Initializing to Serving. - For **remove**, the lifecycle manager transitions from Draining to Stopped via `drainComplete()`. This is the `draining → stopped` edge — the terminal failure callback does **not** fire. -### VirtualClusterManager integration - -[Proposal 016](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) defines `VirtualClusterManager` as the owner of the VC configuration tree and lifecycle state. It holds the `VirtualClusterModel` list, the per-VC `VirtualClusterLifecycleManager` instances, and the `onVirtualClusterTerminalFailure` callback. - -For hot reload, the `VirtualClusterManager` is extended with reload operations that combine lifecycle transitions with infrastructure actions: - -- **`removeVirtualCluster(clusterName, rollbackTracker)`** — drives `SERVING → DRAINING → [drain via ConnectionDrainManager] → [deregister via EndpointRegistry] → STOPPED`. This is the `draining → stopped` edge — callback does **not** fire. Tracks the removal for potential rollback. -- **`restartVirtualCluster(clusterName, newModel, rollbackTracker)`** — drives `SERVING → DRAINING → [drain] → [deregister] → INITIALIZING → [register] → SERVING`. Never reaches Stopped; callback does not fire. If re-initialization fails: `INITIALIZING → FAILED → STOPPED` via the `failed → stopped` edge — callback **fires**. Tracks the modification for potential rollback. -- **`addVirtualCluster(newModel, rollbackTracker)`** — creates a new lifecycle manager in Initializing, drives `[register via EndpointRegistry] → INITIALIZING → SERVING`. 
If initialization fails: `INITIALIZING → FAILED → STOPPED` via the `failed → stopped` edge — callback **fires**. Tracks the addition for potential rollback. - -The `VirtualClusterManager` gains two dependencies for reload operations: -- `EndpointRegistry` — for gateway registration and deregistration (port binding/unbinding) -- `ConnectionDrainManager` — for graceful connection draining during the Draining state - -The `ConfigurationChangeHandler` calls these `VirtualClusterManager` methods based on the `ChangeResult` from the detection pipeline. ### FilterChainFactory hot-swap @@ -200,31 +237,21 @@ To support hot reload, `KafkaProxy` holds an `AtomicReference Date: Tue, 12 May 2026 21:43:07 +0530 Subject: [PATCH 12/17] Added concurrency section, PR comments Signed-off-by: Urjit Patel --- proposals/083-hot-reload-feature.md | 218 +++++++++++++++------------- 1 file changed, 118 insertions(+), 100 deletions(-) diff --git a/proposals/083-hot-reload-feature.md b/proposals/083-hot-reload-feature.md index b7a1fa4..88a7a44 100644 --- a/proposals/083-hot-reload-feature.md +++ b/proposals/083-hot-reload-feature.md @@ -1,4 +1,4 @@ -# Changing Active Proxy Configuration +# 83 - Changing Active Proxy Configuration **Builds on:** [Proposal 016 — Virtual Cluster Lifecycle](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) @@ -31,66 +31,70 @@ The central operation is: ```java class KafkaProxy { - // ... add the following method - - /** - * Apply the given configuration to this running proxy, restarting only the - * virtual clusters whose effective configuration differs from the current - * running state. Unaffected clusters continue serving traffic throughout - * the apply. - * - *

Scope
- *
- * This method applies only the virtual-cluster sections of the configuration
- * and the named filter definitions that those virtual clusters reference.
- * Other configuration sections (management, metrics, admin, etc.) are out of
- * scope and are not reconciled by this operation; changes to those sections
- * still require a proxy restart.
- *
- * Validation contract
- *
- * Static validation (schema conformance, required fields, field-value
- * ranges, internal consistency) is the caller's responsibility and is
- * expected to have been performed on {@code newConfig} before this method
- * is called.
- *
- * Validation which depends on runtime state (port conflicts, plugin
- * instantiation, TLS material readability) is performed during
- * {@code applyConfiguration()} and reported via the returned future's
- * {@link ConfigurationResult}.
- *
- * Error reporting
- *
- * This method throws synchronously only for programmer errors:
- * • {@link NullPointerException} if {@code newConfig} is {@code null};
- * • {@link IllegalStateException} if the proxy has not been started or
- *   has been shut down.
- *
- * All other failures surface through the returned future:
- * • Catastrophic failure: the apply could not be evaluated (e.g.
- *   internal proxy bug, unexpected I/O failure inside the orchestrator).
- *   The future completes exceptionally.
- * • Per-component failure: the apply was evaluated and one or
- *   more components (virtual clusters or referenced filters) failed to
- *   converge. The future completes normally with a {@code ConfigurationResult}
- *   whose {@code errors()} collection is non-empty.
- *
- *

    Failure-handling policy (whether to shut down on partial failure, attempt - * a rollback, alert, or retry) is the caller's responsibility, expressed via - * the standard {@link java.util.concurrent.CompletableFuture#whenComplete} - * pattern. The proxy itself takes no policy action based on {@code errors()}; - * it only reports. - * - * @param newConfig the desired end-state configuration; must be non-null - * and statically valid - * @return a future that completes with a {@link ConfigurationResult} describing - * any per-component failures encountered while applying the - * configuration - * @throws NullPointerException if {@code newConfig} is {@code null} - * @throws IllegalStateException if the proxy is not in the running state - */ - public CompletableFuture applyConfiguration(Configuration newConfig); + // ... add the following method + + /** + * Apply the given configuration to this running proxy, restarting only the + * virtual clusters whose effective configuration differs from the current + * running state. Unaffected clusters continue serving traffic throughout + * the apply. + * + *

Scope
+ *
+ * This method handles replacement configurations on an already-running
+ * proxy. The initial configuration continues to be supplied via the {@link KafkaProxy}
+ * constructor at proxy startup; that path is unchanged by this proposal.
+ *
+ * Within that scope, this method applies only the virtual-cluster sections of
+ * the configuration and the named filter definitions that those virtual clusters
+ * reference. Other configuration sections (management, metrics, admin, etc.) are
+ * out of scope and are not reconciled by this operation; changes to those sections
+ * still require a proxy restart.
+ *
+ * Validation contract
+ *
+ * Static validation (schema conformance, required fields, field-value
+ * ranges, internal consistency) is the caller's responsibility and is
+ * expected to have been performed on {@code newConfig} before this method
+ * is called.
+ *
+ * Validation which depends on runtime state (port conflicts, plugin
+ * instantiation, TLS material readability) is performed during
+ * {@code applyConfiguration()} and reported via the returned future's
+ * {@link ConfigurationResult}.
+ *
+ * Error reporting
+ *
+ * This method throws synchronously only for programmer errors:
+ * • {@link NullPointerException} if {@code newConfig} is {@code null};
+ * • {@link IllegalStateException} if the proxy has not been started or
+ *   has been shut down.
+ *
+ * All other failures surface through the returned future:
+ * • Catastrophic failure: the apply could not be evaluated (e.g.
+ *   internal proxy bug, unexpected I/O failure inside the orchestrator).
+ *   The future completes exceptionally.
+ * • Per-component failure: the apply was evaluated and one or
+ *   more components (virtual clusters or referenced filters) failed to
+ *   converge. The future completes normally with a {@code ConfigurationResult}
+ *   whose {@code errors()} collection is non-empty.
+ *
+ *

    Failure-handling policy (whether to shut down on partial failure, attempt + * a rollback, alert, or retry) is the caller's responsibility, expressed via + * the standard {@link java.util.concurrent.CompletableFuture#whenComplete} + * pattern. The proxy itself takes no policy action based on {@code errors()}; + * it only reports. + * + * @param newConfig the desired end-state configuration; must be non-null + * and statically valid + * @return a future that completes with a {@link ConfigurationResult} describing + * any per-component failures encountered while applying the + * configuration + * @throws NullPointerException if {@code newConfig} is {@code null} + * @throws IllegalStateException if the proxy is not in the running state + */ + public CompletableFuture applyConfiguration(Configuration newConfig); } ``` @@ -100,17 +104,17 @@ The caller provides a complete `Configuration` object. The proxy compares the in ```java public interface ConfigurationResult { - /** - * Returns the per-component failures encountered while applying the configuration. - * One entry per failed component (e.g. one virtual cluster, one referenced filter). - * Empty when the apply succeeded with no failed components. - *

- * The returned collection is immutable; iteration order is unspecified.
- */
- Collection<ConfigurationError> errors();
-
- /** Convenience predicate equivalent to {@code !errors().isEmpty()}. */
- default boolean hasErrors() { return !errors().isEmpty(); }
+ /**
+ * Returns the per-component failures encountered while applying the configuration.
+ * One entry per failed component (e.g. one virtual cluster, one referenced filter).
+ * Empty when the apply succeeded with no failed components.
+ *

    + * The returned collection is immutable; iteration order is unspecified. + */ + Collection errors(); + + /** Convenience predicate equivalent to {@code !errors().isEmpty()}. */ + default boolean hasErrors() { return !errors().isEmpty(); } } public record ConfigurationError(String humanReadableIdentifier, Throwable cause) { } @@ -129,21 +133,21 @@ Because the proxy does not act on `errors()`, callers express their failure poli ```java proxy.applyConfiguration(newConfig) .whenComplete((result, ex) -> { - if (ex != null) { - LOGGER.atError().setCause(ex).log("Configuration apply failed catastrophically"); + if (ex != null) { + LOGGER.atError().setCause(ex).log("Configuration apply failed catastrophically"); proxy.shutdown(); return; - } - for (var error : result.errors()) { - LOGGER.atError() + } + for (var error : result.errors()) { + LOGGER.atError() .setCause(error.cause()) - .addKeyValue("component", error.humanReadableIdentifier()) - .log("Configuration apply failed for component"); + .addKeyValue("component", error.humanReadableIdentifier()) + .log("Configuration apply failed for component"); } - if (result.hasErrors()) { - proxy.shutdown(); + if (result.hasErrors()) { + proxy.shutdown(); } - }); + }); ``` The future completes with the aggregate result *before* any action is taken, so logging happens before shutdown — no ordering problem. @@ -153,15 +157,15 @@ The future completes with the aggregate result *before* any action is taken, so ```java proxy.applyConfiguration(newConfig) .whenComplete((result, ex) -> { - if (ex != null) { - alerter.send("catastrophic-apply-failure", ex); + if (ex != null) { + alerter.send("catastrophic-apply-failure", ex); return; + } + for (var error : result.errors()) { + alerter.send("component-apply-failure", error); } - for (var error : result.errors()) { - alerter.send("component-apply-failure", error); - } - // Surviving components continue serving; no proxy-level action taken. 
- }); + // Surviving components continue serving; no proxy-level action taken. + }); ``` **Rollback on failure** (a sophisticated trigger): @@ -169,16 +173,16 @@ proxy.applyConfiguration(newConfig) ```java proxy.applyConfiguration(newConfig) .whenComplete((result, ex) -> { - if (result != null && result.hasErrors()) { - proxy.applyConfiguration(oldConfig) + if (result != null && result.hasErrors()) { + proxy.applyConfiguration(oldConfig) .whenComplete((rollbackResult, rollbackEx) -> { - if (rollbackEx != null || rollbackResult.hasErrors()) { - // rollback itself failed — last-resort policy - proxy.shutdown(); + if (rollbackEx != null || rollbackResult.hasErrors()) { + // rollback itself failed — last-resort policy + proxy.shutdown(); } - }); - } - }); + }); + } + }); ``` Each of these is a trigger-side concern. The proxy does not need to know which policy is in use, and adding a new policy in the future does not require any proxy changes. @@ -298,6 +302,20 @@ On per-VC failures: keep successfully-applied changes in effect, complete On catastrophic error: complete future exceptionally ``` +### Concurrency control + +Only one apply operation can execute at a time. The orchestrator uses a `ReentrantLock` to prevent overlap. A second `KafkaProxy.applyConfiguration()` call that arrives while an apply is in progress is **not queued**: it completes exceptionally with a `ConcurrentApplyException`. The trigger is expected to retry, typically with the most recent desired configuration. + +Rejecting (rather than queuing) the second call is deliberate. Queuing would raise three questions the proposal does not want to answer: + +- **Staleness**: a queued apply may carry configuration that is already obsolete by the time it executes (the trigger's source of truth has moved on). Triggers cannot easily detect this. +- **Bounded depth**: an unbounded queue is a memory hazard; a bounded queue forces a "what to do when full?" 
policy, which is the very thing we're trying to avoid pushing into the proxy. +- **Coalescing semantics**: should an arriving call replace the oldest queued call, the most recent one, or neither? Each choice is defensible and trigger-specific. + +By rejecting fast, the proxy forces these decisions onto the trigger, where they can be made with full knowledge of the trigger's source of truth. A trigger that wants "always apply the most recent config" can implement it cleanly via the rejection-and-retry loop (debounce, then call `applyConfiguration` with the latest config); a trigger that needs different semantics can implement those too without the proxy having pre-committed to a policy. + +`ConcurrentApplyException` is the exceptional-completion cause for this scenario (i.e. accessed via `future.exceptionally(...)` or the `ex` parameter of `whenComplete`, not thrown synchronously from `applyConfiguration` itself — that distinction matches the rest of the error-reporting contract). + ### Worked examples All examples assume: VC-A and VC-B serving; `applyConfiguration(newConfig)` modifies both. Differences below are in what the caller's `whenComplete` does on failure. @@ -357,9 +375,9 @@ The distinction between "per-VC failure" (future completes normally, `errors()` The following metrics are part of the reload implementation: -- **`kroxylicious_apply_total`** — counter of `KafkaProxy.applyConfiguration()` invocations, labelled by outcome (`success` = future completes with empty `errors()`; `partial_failure` = future completes with non-empty `errors()`; `catastrophic` = future completes exceptionally). Enables alerting on apply failures and tracking apply frequency. -- **`kroxylicious_apply_duration_seconds`** — histogram of end-to-end apply duration. Helps operators understand whether apply is meeting SLA expectations and identify slow operations. 
-- **`kroxylicious_apply_clusters_affected_total`** — counter of per-VC operations during apply, labelled by operation (`add`, `remove`, `modify`) and outcome (`success`, `failure`). Provides granularity beyond the aggregate result. +- **`kroxylicious_apply_config_total`** — counter of `KafkaProxy.applyConfiguration()` invocations, labelled by outcome (`success` = future completes with empty `errors()`; `partial_failure` = future completes with non-empty `errors()`; `catastrophic` = future completes exceptionally). Enables alerting on apply failures and tracking apply frequency. +- **`kroxylicious_apply_config_duration_seconds`** — histogram of end-to-end apply duration. Helps operators understand whether apply is meeting SLA expectations and identify slow operations. +- **`kroxylicious_apply_config_clusters_affected_total`** — counter of per-VC operations during apply, labelled by operation (`add`, `remove`, `modify`) and outcome (`success`, `failure`). Provides granularity beyond the aggregate result. - **`kroxylicious_drain_duration_seconds`** — histogram of per-VC connection drain duration. Helps tune the `drainTimeout` configuration and detect VCs with long-lived connections. - **`kroxylicious_drain_connections_force_closed_total`** — counter of connections force-closed after drain timeout. A high rate indicates the drain timeout is too aggressive for the workload. 
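The reject-fast rule from the Concurrency control section added in this patch can be sketched as follows. This is a minimal sketch under stated assumptions, not the proposal's implementation: `SingleApplyGuard` is a hypothetical name, the nested `ConcurrentApplyException` is a stand-in for the exception the proposal names, and an `AtomicBoolean` is swapped in for the `ReentrantLock` the proposal mentions. The swap is deliberate: a flag can be cleared from whichever thread completes the future, whereas a `ReentrantLock` must be unlocked by the thread that acquired it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical helper: admits one apply at a time and rejects overlapping calls.
final class SingleApplyGuard {

    // Stand-in for the proposal's ConcurrentApplyException (assumed shape).
    static final class ConcurrentApplyException extends RuntimeException {
        ConcurrentApplyException() {
            super("a configuration apply is already in progress");
        }
    }

    private final AtomicBoolean inProgress = new AtomicBoolean(false);

    // Runs the apply if none is in flight; otherwise returns a future that has
    // already completed exceptionally. Nothing is queued and nothing is thrown
    // synchronously for the overlap case.
    <T> CompletableFuture<T> run(Supplier<CompletableFuture<T>> apply) {
        if (!inProgress.compareAndSet(false, true)) {
            return CompletableFuture.failedFuture(new ConcurrentApplyException());
        }
        CompletableFuture<T> started;
        try {
            started = apply.get();
        } catch (RuntimeException e) {
            // The apply could not even be started: release the flag and report via the future.
            inProgress.set(false);
            return CompletableFuture.failedFuture(e);
        }
        // Release the flag on normal or exceptional completion, on whichever thread completes it.
        return started.whenComplete((result, ex) -> inProgress.set(false));
    }
}
```

A trigger that wants "always apply the most recent configuration" would treat the rejected future as a signal to wait and call again with its latest desired state, which keeps the staleness and coalescing decisions on the trigger side as the proposal intends.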
From 87b69b8bd39c6a62a3160eca759d8f5cbe0b6aea Mon Sep 17 00:00:00 2001 From: Urjit Patel Date: Wed, 13 May 2026 10:28:13 +0530 Subject: [PATCH 13/17] Updated Compatibility section Signed-off-by: Urjit Patel --- proposals/083-hot-reload-feature.md | 73 +++++++++++++++-------------- 1 file changed, 37 insertions(+), 36 deletions(-) diff --git a/proposals/083-hot-reload-feature.md b/proposals/083-hot-reload-feature.md index 88a7a44..c23405f 100644 --- a/proposals/083-hot-reload-feature.md +++ b/proposals/083-hot-reload-feature.md @@ -104,17 +104,17 @@ The caller provides a complete `Configuration` object. The proxy compares the in ```java public interface ConfigurationResult { - /** - * Returns the per-component failures encountered while applying the configuration. - * One entry per failed component (e.g. one virtual cluster, one referenced filter). - * Empty when the apply succeeded with no failed components. - *

- * The returned collection is immutable; iteration order is unspecified.
- */
- Collection<ConfigurationError> errors();
-
- /** Convenience predicate equivalent to {@code !errors().isEmpty()}. */
- default boolean hasErrors() { return !errors().isEmpty(); }
+ /**
+ * Returns the per-component failures encountered while applying the configuration.
+ * One entry per failed component (e.g. one virtual cluster, one referenced filter).
+ * Empty when the apply succeeded with no failed components.
+ *

    + * The returned collection is immutable; iteration order is unspecified. + */ + Collection errors(); + + /** Convenience predicate equivalent to {@code !errors().isEmpty()}. */ + default boolean hasErrors() { return !errors().isEmpty(); } } public record ConfigurationError(String humanReadableIdentifier, Throwable cause) { } @@ -133,21 +133,21 @@ Because the proxy does not act on `errors()`, callers express their failure poli ```java proxy.applyConfiguration(newConfig) .whenComplete((result, ex) -> { - if (ex != null) { - LOGGER.atError().setCause(ex).log("Configuration apply failed catastrophically"); + if (ex != null) { + LOGGER.atError().setCause(ex).log("Configuration apply failed catastrophically"); proxy.shutdown(); return; - } - for (var error : result.errors()) { - LOGGER.atError() + } + for (var error : result.errors()) { + LOGGER.atError() .setCause(error.cause()) - .addKeyValue("component", error.humanReadableIdentifier()) - .log("Configuration apply failed for component"); + .addKeyValue("component", error.humanReadableIdentifier()) + .log("Configuration apply failed for component"); } - if (result.hasErrors()) { - proxy.shutdown(); + if (result.hasErrors()) { + proxy.shutdown(); } - }); + }); ``` The future completes with the aggregate result *before* any action is taken, so logging happens before shutdown — no ordering problem. @@ -157,15 +157,15 @@ The future completes with the aggregate result *before* any action is taken, so ```java proxy.applyConfiguration(newConfig) .whenComplete((result, ex) -> { - if (ex != null) { - alerter.send("catastrophic-apply-failure", ex); + if (ex != null) { + alerter.send("catastrophic-apply-failure", ex); return; - } - for (var error : result.errors()) { - alerter.send("component-apply-failure", error); } - // Surviving components continue serving; no proxy-level action taken. 
- }); + for (var error : result.errors()) { + alerter.send("component-apply-failure", error); + } + // Surviving components continue serving; no proxy-level action taken. + }); ``` **Rollback on failure** (a sophisticated trigger): @@ -173,16 +173,16 @@ proxy.applyConfiguration(newConfig) ```java proxy.applyConfiguration(newConfig) .whenComplete((result, ex) -> { - if (result != null && result.hasErrors()) { - proxy.applyConfiguration(oldConfig) + if (result != null && result.hasErrors()) { + proxy.applyConfiguration(oldConfig) .whenComplete((rollbackResult, rollbackEx) -> { - if (rollbackEx != null || rollbackResult.hasErrors()) { - // rollback itself failed — last-resort policy - proxy.shutdown(); + if (rollbackEx != null || rollbackResult.hasErrors()) { + // rollback itself failed — last-resort policy + proxy.shutdown(); } - }); - } - }); + }); + } + }); ``` Each of these is a trigger-side concern. The proxy does not need to know which policy is in use, and adding a new policy in the future does not require any proxy changes. @@ -416,6 +416,7 @@ Each of these can be designed and implemented independently once the core `Kafka - Filter definitions and their configuration are unchanged. - The on-disk configuration file format is unchanged. - The lifecycle state model (Proposal 016) is unchanged; this proposal only adds operations that drive transitions through the existing state machine. +- **Proposal 016 compatibility:** This proposal supersedes the *Virtual Cluster Failure Policy* section of Proposal 016. The `onVirtualClusterStopped.serve` configuration described there is not implemented; failure-handling policy is instead expressed by the caller via `whenComplete` on the `CompletableFuture` returned by `applyConfiguration()`. 
## Rejected alternatives From 01563db85182d0f67822ad343da5ced559aaabe7 Mon Sep 17 00:00:00 2001 From: Urjit Patel Date: Wed, 13 May 2026 11:42:15 +0530 Subject: [PATCH 14/17] Update document to reflect latest PR comments Signed-off-by: Urjit Patel --- proposals/083-hot-reload-feature.md | 105 ++++++++++++++++++---------- 1 file changed, 70 insertions(+), 35 deletions(-) diff --git a/proposals/083-hot-reload-feature.md b/proposals/083-hot-reload-feature.md index c23405f..ef70506 100644 --- a/proposals/083-hot-reload-feature.md +++ b/proposals/083-hot-reload-feature.md @@ -4,7 +4,7 @@ This proposal introduces a mechanism for applying configuration changes to a running Kroxylicious proxy without a full restart. It defines a core `KafkaProxy.applyConfiguration(Configuration)` operation that accepts a complete configuration, detects what changed, and converges the running state to match — restarting only the affected virtual clusters while leaving unaffected clusters available. -This proposal extends the virtual cluster lifecycle model (Proposal 016) with reload operations, an edge-based failure policy, and a configuration change orchestration layer. Where Proposal 016 defines the per-VC state machine and the `VirtualClusterManager` that owns it, this proposal defines the change detection pipeline, the reload orchestration, and the two policy layers (terminal failure and configuration failure) that govern how the proxy responds to problems during reload. +This proposal extends the virtual cluster lifecycle model (Proposal 016) with reload operations, an edge-based failure policy, and a configuration change orchestration layer. Where Proposal 016 defines the per-VC state machine and the `VirtualClusterRegistry` that owns it, this proposal defines the change detection pipeline, the reload orchestration, and the two policy layers (terminal failure and configuration failure) that govern how the proxy responds to problems during reload. 
## Current situation
@@ -41,14 +41,23 @@ class KafkaProxy {
 *
 * <h3>Scope</h3>
 * <p>This method handles replacement configurations on an already-running
- * proxy. The initial configuration continues to be supplied via the {@link KafkaProxy}
- * constructor at proxy startup; that path is unchanged by this proposal.
+ * proxy. The initial configuration is supplied via the {@link KafkaProxy}
+ * constructor at proxy startup.
 *
 * <p>Within that scope, this method applies only the virtual-cluster sections of
 * the configuration and the named filter definitions that those virtual clusters
 * reference. Other configuration sections (management, metrics, admin, etc.) are
- * out of scope and are not reconciled by this operation; changes to those sections
- * still require a proxy restart.
+ * out of scope.
+ *
+ * <p>If {@code newConfig} differs from the running configuration in any out-of-scope
+ * section, the apply is rejected as a pre-flight check before any virtual-cluster
+ * change is attempted: the returned future completes exceptionally with an
+ * {@link OutOfScopeChangeException} naming the differing section(s) and the proxy's
+ * running state is unchanged. Changes to those sections still require a proxy restart.
+ * Rejecting (rather than silently ignoring) out-of-scope diffs preserves freedom of
+ * manoeuvre: if a future iteration supports hot-reload of additional sections, the
+ * exception simply stops being thrown for those sections — silent-ignore would by
+ * contrast become a breaking semantic change without an API-version bump.
 *
 * <h3>Validation contract</h3>
 * <p>Static validation (schema conformance, required fields, field-value
@@ -71,8 +80,13 @@ class KafkaProxy {
 *
 * <p>All other failures surface through the returned future:
 * <ul>
- * <li>Catastrophic failure — the apply could not be evaluated (e.g.
- * internal proxy bug, unexpected I/O failure inside the orchestrator).
+ * <li>Input rejection — the submitted configuration is not acceptable
+ * to this method (e.g. a change to an out-of-scope section). Detected as a
+ * pre-flight check before any state change is attempted; no virtual cluster
+ * is touched. The future completes exceptionally with a specific exception
+ * type identifying the rejection reason (e.g. {@link OutOfScopeChangeException}).</li>
+ * <li>Catastrophic failure — the apply began but could not be completed
+ * (e.g. internal proxy bug, unexpected I/O failure inside the orchestrator).
 * The future completes exceptionally.</li>
 * <li>Per-component failure — the apply was evaluated and one or
 * more components (virtual clusters or referenced filters) failed to
@@ -185,6 +199,8 @@ proxy.applyConfiguration(newConfig) }); ``` +The caller supplies `oldConfig` from its own state — the proxy does not currently expose a getter for its running configuration. This is sufficient because triggers that perform rollback typically already have their own source-of-truth for the previous desired state (a Kubernetes ConfigMap revision, an HTTP-endpoint request history, a previously-loaded file). If a future use case requires the proxy to be queryable for its current state, an accessor can be added without changing the `applyConfiguration` contract. + Each of these is a trigger-side concern. The proxy does not need to know which policy is in use, and adding a new policy in the future does not require any proxy changes. **Trigger mechanisms are explicitly out of scope for this proposal.** The `KafkaProxy.applyConfiguration()` operation is the internal interface that any trigger plugs into. How the new configuration arrives — whether via an HTTP endpoint, a file watcher detecting a changed ConfigMap, or a Kubernetes operator callback — is a separate concern. Deferring this keeps the proposal focused and avoids blocking on unresolved questions about trigger design (see [Trigger mechanisms](#trigger-mechanisms-future-work) below). @@ -196,33 +212,36 @@ When `KafkaProxy.applyConfiguration()` is called, the proxy compares the new con - **`VirtualClusterChangeDetector`** — identifies clusters that were added, removed, or modified by comparing `VirtualClusterModel` instances via `equals()`. A cluster requires a restart if any property that contributes to `VirtualClusterModel.equals()` changed (bootstrap address, TLS settings, gateway configuration, etc.). - **`FilterChangeDetector`** — identifies clusters affected by filter configuration changes.
A cluster requires a restart if a `NamedFilterDefinition` it references changed (type or configuration, compared via `equals()`), or if the `defaultFilters` list changed (order matters, since filter chain execution is sequential) and the cluster relies on default filters. -Detectors return a `ChangeResult(clustersToRemove, clustersToAdd, clustersToModify)`. Results from all detectors are aggregated and then passed onto `VirtualClusterManager` to perform relevant operations. +Detectors return a `ChangeResult(clustersToRemove, clustersToAdd, clustersToModify)`. Results from all detectors are aggregated and then passed onto `VirtualClusterRegistry` to perform relevant operations. Clusters where none of these changed are left untouched — they continue serving traffic throughout the apply operation. ### Cluster modification via lifecycle transitions -A modified virtual cluster is restarted by driving it through the lifecycle states defined in Proposal 016. Proposal 016 defines the per-VC state machine (`VirtualClusterLifecycleState`) and the `VirtualClusterRegistry` that enforces valid transitions. This proposal adds the reload operations that drive those transitions. - -The three change operations map to lifecycle transitions as follows: +The reload operations are exposed as **methods on `VirtualClusterRegistry`** (Proposal 016). Each method drives the per-VC state machine (`VirtualClusterLifecycleState`) through one of three transition sequences. This proposal adds these methods; Proposal 016's existing state machine and transition guards are unchanged. -**Modify (Restart VC):** `SERVING → DRAINING → [drain connections] → [deregister gateways] → INITIALIZING → [register gateways] → SERVING` +**`removeVirtualCluster(clusterName)`** -A modified cluster is torn down and rebuilt with the new configuration. During restart, the lifecycle state cycles through Draining and back to Initializing without ever reaching the terminal Stopped state. 
+- **Lifecycle path:** `SERVING → DRAINING → [drain connections] → [deregister gateways] → STOPPED` +- **Behaviour:** The cluster is permanently torn down. It reaches the terminal Stopped state via the `draining → stopped` edge. +- **Failure reporting:** This is an intentional removal, not a failure; it is not reported through `ConfigurationResult.errors()`. -**Remove:** `SERVING → DRAINING → [drain connections] → [deregister gateways] → STOPPED` +**`replaceVirtualCluster(clusterName, newModel)`** -A removed cluster is permanently torn down. It reaches the terminal Stopped state via `draining → stopped`. This is an intentional removal, not a failure; it is not reported through `ConfigurationResult.errors()`. +- **Intent:** Apply a new configuration to an existing cluster. The method is named by intent rather than implementation. +- **Current implementation:** Equivalent to `removeVirtualCluster(clusterName)` followed by `addVirtualCluster(newModel)` — the cluster cycles through `SERVING → DRAINING → [drain connections] → [deregister gateways] → INITIALIZING → [register gateways] → SERVING`. Clients connected to the cluster are disconnected during the drain phase. +- **Caveats of the current implementation:** The drain + re-init approach means a replaced cluster experiences a brief period of unavailability while its ports are unbound and rebound. It also creates a thundering herd when all disconnected clients reconnect simultaneously after the cluster comes back up; mitigation strategies (e.g. staggered connection acceptance) are future work. These are properties of the *current implementation*, not of the `replaceVirtualCluster` contract — a future implementation may eliminate them without breaking callers. +- **Failure reporting:** If the invocation fails — i.e. the VC traverses the `initializing → failed → stopped` path during the apply — the failure is reported as a `ConfigurationError` entry in the result returned to the caller. 
The caller decides whether to retry, rollback, alert, or shut down via the `whenComplete` patterns shown above. -If a *modify* or *add* operation fails — i.e. a VC traverses the `initializing → failed → stopped` path during the apply — the failure is reported as a `ConfigurationError` entry in the result returned to the caller. The caller decides whether to retry, rollback, alert, or shut down via the `whenComplete` patterns shown above. +**`addVirtualCluster(newModel)`** -**Add:** `[create lifecycle manager in INITIALIZING] → [register gateways] → SERVING` +- **Lifecycle path:** `[create lifecycle manager in INITIALIZING] → [register gateways] → SERVING` +- **Behaviour:** A new cluster starts in the Initializing state with a fresh `VirtualClusterLifecycle`, registers its gateways with the `EndpointRegistry`, and transitions to Serving. +- **Failure reporting:** If the invocation fails — i.e. the VC traverses the `initializing → failed → stopped` path during the apply — the failure is reported as a `ConfigurationError` entry in the result returned to the caller. The caller decides whether to retry, rollback, alert, or shut down via the `whenComplete` patterns shown above. -A new cluster starts in the Initializing state with a fresh `VirtualClusterLifecycleManager`, registers its gateways with the `EndpointRegistry`, and transitions to Serving. +**Processing order** -Changes are processed in the order: **remove → modify → add**. Removing clusters first frees up ports and resources that new or modified clusters may need. - -This means a modified cluster experiences a brief period of unavailability while its ports are unbound and rebound. Clients connected to the cluster will be disconnected during the drain phase. This is a deliberate design choice. More surgical approaches — such as swapping the filter chain on existing connections without dropping them, or performing a rolling handoff — would reduce disruption, but they add significant complexity. 
The remove+add approach is the right starting point: it is straightforward, predictable, and consistent with how the proxy handles startup failures today. The remove+add approach also creates a thundering herd when all disconnected clients reconnect simultaneously after the cluster comes back up; mitigation strategies (e.g. staggered connection acceptance) are future work. +Changes are processed in the order: **remove → replace → add**. Removing clusters first frees up ports and resources that new or replacement clusters may need. ### Graceful connection draining @@ -233,6 +252,22 @@ Once all connections are drained (or the drain timeout expires), the lifecycle t - For **remove**, the lifecycle manager transitions from Draining to Stopped via `drainComplete()`. This is the `draining → stopped` edge — the terminal failure callback does **not** fire. +### ConfigurationReloadOrchestrator + +`ConfigurationReloadOrchestrator` is an internal class of `kroxylicious-runtime`. It is not part of the public API; embedders interact with the proxy only via `KafkaProxy.applyConfiguration()`. `KafkaProxy` constructs and holds the orchestrator privately, and embedders never obtain a reference to it. + +The orchestrator owns the apply pipeline end-to-end. Its responsibilities are: + +- **Concurrency control** — serialises overlapping apply calls (see [Concurrency control](#concurrency-control) below). +- **Pre-flight validation** — runs static-validation on `newConfig` before any state-changing work, and rejects out-of-scope changes (see the Javadoc on `KafkaProxy.applyConfiguration` and the [Orchestration pipeline](#orchestration-pipeline) section below). +- **Change detection** — calls the `VirtualClusterChangeDetector` and `FilterChangeDetector` (described in [Configuration change detection](#configuration-change-detection) above) directly. 
The detectors are internal collaborators of the orchestrator; the orchestrator does *not* receive a pre-computed `ChangeResult` from `KafkaProxy`. +- **Per-VC change execution** — drives the `VirtualClusterRegistry` (Proposal 016) to apply the detected changes in `removeVirtualCluster → replaceVirtualCluster → addVirtualCluster` order. Each method invocation runs the corresponding per-VC lifecycle transitions described in [Cluster modification via lifecycle transitions](#cluster-modification-via-lifecycle-transitions) above. +- **`FilterChainFactory` hot-swap** — atomically swaps the `FilterChainFactory` reference held by `KafkaProxy` on success (see [FilterChainFactory hot-swap](#filterchainfactory-hot-swap) below). +- **Result construction** — accumulates per-component outcomes into a `ConfigurationResult` and completes the future returned to the caller. + +The orchestrator does **not** own connection-level mechanics, per-VC lifecycle state, or endpoint registration — those remain with the per-VC `VirtualClusterLifecycle`, the `VirtualClusterRegistry` (both Proposal 016), and the `EndpointRegistry` (existing infrastructure) respectively. The orchestrator coordinates these collaborators; it does not duplicate their responsibilities. + + ### FilterChainFactory hot-swap Filter configuration changes require replacing the `FilterChainFactory` that creates filter chains for new connections. The existing architecture creates `FilterChainFactory` once at startup and passes it as a `final` reference to `KafkaProxyInitializer`. 
@@ -265,6 +300,9 @@ The complete `KafkaProxy.applyConfiguration()` pipeline flows through these laye KafkaProxy.applyConfiguration(newConfig) │ ├── Guards: proxy must be running, orchestrator must be initialized + ├── Pre-flight: reject if newConfig differs from current config in any + │ out-of-scope section (future completes exceptionally with + │ OutOfScopeChangeException; no further evaluation) │ ▼ ConfigurationReloadOrchestrator.reload(newConfig) @@ -272,26 +310,23 @@ ConfigurationReloadOrchestrator.reload(newConfig) ├── Acquires reloadLock (prevents concurrent reloads) ├── Validates new configuration via Features framework ├── Creates new FilterChainFactory with updated filter definitions - ├── Builds ConfigurationChangeContext (old/new config, models, factories) - │ - ▼ -ConfigurationChangeHandler.handleConfigurationChange(context) │ - ├── Aggregates ChangeDetector results: + ├── Aggregates ChangeDetector results (called directly by orchestrator): │ VirtualClusterChangeDetector → added/removed/modified VCs │ FilterChangeDetector → VCs affected by filter changes │ - ├── Processes changes in order: Remove → Modify → Add - │ For each change: invoke the corresponding VirtualClusterManager - │ method. Accumulate successes; collect failures as ConfigurationError - │ entries. + ├── Processes changes in order: remove → replace → add + │ For each change: invoke the corresponding VirtualClusterRegistry + │ method (removeVirtualCluster / replaceVirtualCluster / addVirtualCluster). + │ Accumulate successes; collect failures as ConfigurationError entries. 
│ ▼ -VirtualClusterManager (for each affected VC) +VirtualClusterRegistry (for each affected VC) │ - ├── removeVirtualCluster: SERVING → DRAINING → drain → deregister → STOPPED - ├── restartVirtualCluster: SERVING → DRAINING → drain → deregister → INITIALIZING → register → SERVING - ├── addVirtualCluster: INITIALIZING → register → SERVING + ├── removeVirtualCluster: SERVING → DRAINING → drain → deregister → STOPPED + ├── replaceVirtualCluster: SERVING → DRAINING → drain → deregister → INITIALIZING → register → SERVING + │ (current impl; intent is "apply newModel" — future impl may be more surgical) + ├── addVirtualCluster: INITIALIZING → register → SERVING │ ▼ On success (no errors): swap FilterChainFactory, update current config, complete @@ -429,5 +464,5 @@ Each of these can be designed and implemented independently once the core `Kafka - **`ConfigurationReconciler` naming**: Considered to describe the "compare desired vs current and converge" pattern, but rejected because Kubernetes reconcilers already exist in the Kroxylicious codebase and overloading the term would cause confusion. - **Plan/apply split on the public interface**: Considered exposing separate `plan()` and `apply()` methods to enable dry-run validation. Decided this is an internal concern — the trigger just needs `KafkaProxy.applyConfiguration()`. A validate/dry-run capability can be added later without changing the interface. - **Inline configuration via HTTP POST body**: Discussed having the HTTP endpoint accept the full YAML configuration in the request body. An alternative view is that configuration should always live in files (for source control, auditability, consistent state) and the HTTP endpoint should just trigger reading from a specified file path. This question is deferred along with the HTTP trigger design. 
-- **Separate VirtualClusterManager for reload**: The original hot-reload design had a `VirtualClusterManager` that was purely an operation orchestrator (with `EndpointRegistry` and `ConnectionDrainManager` dependencies). Rather than maintaining two classes with the same name, the reload operations merge into the [Proposal 016](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) `VirtualClusterManager`, which already owns the VC model list and lifecycle managers. The merged class gains `EndpointRegistry` and `ConnectionDrainManager` dependencies and the `removeVirtualCluster`/`restartVirtualCluster`/`addVirtualCluster` methods. +- **Separate VirtualClusterManager for reload**: The original hot-reload design had a `VirtualClusterManager` that was purely an operation orchestrator (with `EndpointRegistry` and `ConnectionDrainManager` dependencies). Rather than maintaining two classes with the same name, the reload operations merge into the [Proposal 016](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) class (originally also called `VirtualClusterManager`, renamed to `VirtualClusterRegistry` in kroxylicious PR #3888), which already owns the VC model list and lifecycle managers. The merged class gains `EndpointRegistry` and `ConnectionDrainManager` dependencies and the `removeVirtualCluster`/`replaceVirtualCluster`/`addVirtualCluster` methods. - **Two terminal states (`Stopped` and `TerminallyFailed`)**: Considered adding a separate terminal state for unrecoverable failures. Rejected because the distinction is about the transition edge, not the terminal state — a stopped cluster is permanently done regardless of why. The edge-based policy hook achieves the same goal without adding state machine complexity. 
From 6ee245c6cdd35cac3f175adb590a560d2d29860d Mon Sep 17 00:00:00 2001 From: Urjit Patel Date: Wed, 13 May 2026 11:54:38 +0530 Subject: [PATCH 15/17] handled ConcurrentApplyException in trigger examples Signed-off-by: Urjit Patel --- proposals/083-hot-reload-feature.md | 36 ++++++++++++++++++++++------- 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/proposals/083-hot-reload-feature.md b/proposals/083-hot-reload-feature.md index ef70506..3faf659 100644 --- a/proposals/083-hot-reload-feature.md +++ b/proposals/083-hot-reload-feature.md @@ -147,6 +147,11 @@ Because the proxy does not act on `errors()`, callers express their failure poli ```java proxy.applyConfiguration(newConfig) .whenComplete((result, ex) -> { + if (ex instanceof ConcurrentApplyException) { + // Another apply is in flight; the proxy is healthy. Don't shut down — retry later. + LOGGER.atWarn().log("apply rejected: another apply in progress; will retry"); + return; + } if (ex != null) { LOGGER.atError().setCause(ex).log("Configuration apply failed catastrophically"); proxy.shutdown(); @@ -171,6 +176,10 @@ The future completes with the aggregate result *before* any action is taken, so ```java proxy.applyConfiguration(newConfig) .whenComplete((result, ex) -> { + if (ex instanceof ConcurrentApplyException) { + // Another apply is in flight; not a real failure — trigger can retry later. + return; + } if (ex != null) { alerter.send("catastrophic-apply-failure", ex); return; @@ -187,16 +196,20 @@ proxy.applyConfiguration(newConfig) ```java proxy.applyConfiguration(newConfig) .whenComplete((result, ex) -> { - if (result != null && result.hasErrors()) { - proxy.applyConfiguration(oldConfig) + if (ex instanceof ConcurrentApplyException) { + // Another apply is in flight; not a real failure — trigger can retry later. 
+ return; + } + if (result != null && result.hasErrors()) { + proxy.applyConfiguration(oldConfig) .whenComplete((rollbackResult, rollbackEx) -> { - if (rollbackEx != null || rollbackResult.hasErrors()) { - // rollback itself failed — last-resort policy - proxy.shutdown(); + if (rollbackEx != null || rollbackResult.hasErrors()) { + // rollback itself failed — last-resort policy + proxy.shutdown(); } - }); - } - }); + }); + } + }); ``` The caller supplies `oldConfig` from its own state — the proxy does not currently expose a getter for its running configuration. This is sufficient because triggers that perform rollback typically already have their own source-of-truth for the previous desired state (a Kubernetes ConfigMap revision, an HTTP-endpoint request history, a previously-loaded file). If a future use case requires the proxy to be queryable for its current state, an accessor can be added without changing the `applyConfiguration` contract. @@ -351,6 +364,13 @@ By rejecting fast, the proxy forces these decisions onto the trigger, where they `ConcurrentApplyException` is the exceptional-completion cause for this scenario (i.e. accessed via `future.exceptionally(...)` or the `ex` parameter of `whenComplete`, not thrown synchronously from `applyConfiguration` itself — that distinction matches the rest of the error-reporting contract). +**Implication for callers' error-handling patterns.** Because `ConcurrentApplyException` indicates the apply was *not attempted* — another apply was in flight, and the proxy's state may reflect *that* apply's changes — callers must distinguish it from per-component failure or catastrophic failure when reacting to the future's exceptional completion. 
In particular: + +- A trigger that performs rollback on apply failure (the [Rollback on failure](#core-api-kafkaproxyapplyconfiguration) pattern shown above) must *not* roll back on `ConcurrentApplyException`: doing so would replay this trigger's `oldConfig` over the *other* trigger's just-applied changes, undoing them. +- A trigger that shuts down on apply failure (the [Shut down on any failure](#core-api-kafkaproxyapplyconfiguration) pattern shown above) must *not* shut down on `ConcurrentApplyException`: the proxy is healthy and the other trigger is mid-apply. + +The recommended discrimination is `ex instanceof ConcurrentApplyException` — respond with retry (or no-op) rather than the destructive policy. The same discipline applies to any other "rejected before attempt" exception (e.g. `OutOfScopeChangeException`): those also indicate the proxy did not change state, and the trigger's response should reflect that. + ### Worked examples All examples assume: VC-A and VC-B serving; `applyConfiguration(newConfig)` modifies both. Differences below are in what the caller's `whenComplete` does on failure. From ce836796d3c65d2bf5c2651a7922a6125f381404 Mon Sep 17 00:00:00 2001 From: Urjit Patel Date: Wed, 13 May 2026 11:57:46 +0530 Subject: [PATCH 16/17] Fix indentation Signed-off-by: Urjit Patel --- proposals/083-hot-reload-feature.md | 180 ++++++++++++++-------------- 1 file changed, 90 insertions(+), 90 deletions(-) diff --git a/proposals/083-hot-reload-feature.md b/proposals/083-hot-reload-feature.md index 3faf659..5402a6e 100644 --- a/proposals/083-hot-reload-feature.md +++ b/proposals/083-hot-reload-feature.md @@ -31,84 +31,84 @@ The central operation is: ```java class KafkaProxy { - // ... add the following method - - /** - * Apply the given configuration to this running proxy, restarting only the - * virtual clusters whose effective configuration differs from the current - * running state. Unaffected clusters continue serving traffic throughout - * the apply. 
- *
- * <h3>Scope</h3>
- * <p>This method handles replacement configurations on an already-running
- * proxy. The initial configuration is supplied via the {@link KafkaProxy}
- * constructor at proxy startup.
- *
- * <p>Within that scope, this method applies only the virtual-cluster sections of
- * the configuration and the named filter definitions that those virtual clusters
- * reference. Other configuration sections (management, metrics, admin, etc.) are
- * out of scope.
- *
- * <p>If {@code newConfig} differs from the running configuration in any out-of-scope
- * section, the apply is rejected as a pre-flight check before any virtual-cluster
- * change is attempted: the returned future completes exceptionally with an
- * {@link OutOfScopeChangeException} naming the differing section(s) and the proxy's
- * running state is unchanged. Changes to those sections still require a proxy restart.
- * Rejecting (rather than silently ignoring) out-of-scope diffs preserves freedom of
- * manoeuvre: if a future iteration supports hot-reload of additional sections, the
- * exception simply stops being thrown for those sections — silent-ignore would by
- * contrast become a breaking semantic change without an API-version bump.
- *
- * <h3>Validation contract</h3>
- * <p>Static validation (schema conformance, required fields, field-value
- * ranges, internal consistency) is the caller's responsibility and is
- * expected to have been performed on {@code newConfig} before this method
- * is called.
- *
- * <p>Validation which depends on runtime state (port conflicts, plugin
- * instantiation, TLS material readability) is performed during
- * {@code applyConfiguration()} and reported via the returned future's
- * {@link ConfigurationResult}.
- *
- * <h3>Error reporting</h3>
- * <p>This method throws synchronously only for programmer errors:
- * <ul>
- * <li>{@link NullPointerException} if {@code newConfig} is {@code null};</li>
- * <li>{@link IllegalStateException} if the proxy has not been started or
- * has been shut down.</li>
- * </ul>
- *
- * <p>All other failures surface through the returned future:
- * <ul>
- * <li>Input rejection — the submitted configuration is not acceptable
- * to this method (e.g. a change to an out-of-scope section). Detected as a
- * pre-flight check before any state change is attempted; no virtual cluster
- * is touched. The future completes exceptionally with a specific exception
- * type identifying the rejection reason (e.g. {@link OutOfScopeChangeException}).</li>
- * <li>Catastrophic failure — the apply began but could not be completed
- * (e.g. internal proxy bug, unexpected I/O failure inside the orchestrator).
- * The future completes exceptionally.</li>
- * <li>Per-component failure — the apply was evaluated and one or
- * more components (virtual clusters or referenced filters) failed to
- * converge. The future completes normally with a {@code ConfigurationResult}
- * whose {@code errors()} collection is non-empty.</li>
- * </ul>
- *
- * <p>Failure-handling policy (whether to shut down on partial failure, attempt
- * a rollback, alert, or retry) is the caller's responsibility, expressed via
- * the standard {@link java.util.concurrent.CompletableFuture#whenComplete}
- * pattern. The proxy itself takes no policy action based on {@code errors()};
- * it only reports.
- *
- * @param newConfig the desired end-state configuration; must be non-null
- * and statically valid
- * @return a future that completes with a {@link ConfigurationResult} describing
- * any per-component failures encountered while applying the
- * configuration
- * @throws NullPointerException if {@code newConfig} is {@code null}
- * @throws IllegalStateException if the proxy is not in the running state
- */
- public CompletableFuture<ConfigurationResult> applyConfiguration(Configuration newConfig);
+ // ... add the following method
+
+ /**
+ * Apply the given configuration to this running proxy, restarting only the
+ * virtual clusters whose effective configuration differs from the current
+ * running state. Unaffected clusters continue serving traffic throughout
+ * the apply.
+ *
+ * <h3>Scope</h3>
+ * <p>This method handles replacement configurations on an already-running
+ * proxy. The initial configuration is supplied via the {@link KafkaProxy}
+ * constructor at proxy startup.
+ *
+ * <p>Within that scope, this method applies only the virtual-cluster sections of
+ * the configuration and the named filter definitions that those virtual clusters
+ * reference. Other configuration sections (management, metrics, admin, etc.) are
+ * out of scope.
+ *
+ * <p>If {@code newConfig} differs from the running configuration in any out-of-scope
+ * section, the apply is rejected as a pre-flight check before any virtual-cluster
+ * change is attempted: the returned future completes exceptionally with an
+ * {@link OutOfScopeChangeException} naming the differing section(s) and the proxy's
+ * running state is unchanged. Changes to those sections still require a proxy restart.
+ * Rejecting (rather than silently ignoring) out-of-scope diffs preserves freedom of
+ * manoeuvre: if a future iteration supports hot-reload of additional sections, the
+ * exception simply stops being thrown for those sections — silent-ignore would by
+ * contrast become a breaking semantic change without an API-version bump.
+ *
+ * <h3>Validation contract</h3>
+ * <p>Static validation (schema conformance, required fields, field-value
+ * ranges, internal consistency) is the caller's responsibility and is
+ * expected to have been performed on {@code newConfig} before this method
+ * is called.
+ *
+ * <p>Validation which depends on runtime state (port conflicts, plugin
+ * instantiation, TLS material readability) is performed during
+ * {@code applyConfiguration()} and reported via the returned future's
+ * {@link ConfigurationResult}.
+ *
+ * <h3>Error reporting</h3>
+ * <p>This method throws synchronously only for programmer errors:
+ * <ul>
+ * <li>{@link NullPointerException} if {@code newConfig} is {@code null};</li>
+ * <li>{@link IllegalStateException} if the proxy has not been started or
+ * has been shut down.</li>
      • + *
      + * + *

      All other failures surface through the returned future: + *

        + *
      • Input rejection — the submitted configuration is not acceptable + * to this method (e.g. a change to an out-of-scope section). Detected as a + * pre-flight check before any state change is attempted; no virtual cluster + * is touched. The future completes exceptionally with a specific exception + * type identifying the rejection reason (e.g. {@link OutOfScopeChangeException}).
      • + *
      • Catastrophic failure — the apply began but could not be completed + * (e.g. internal proxy bug, unexpected I/O failure inside the orchestrator). + * The future completes exceptionally.
      • + *
      • Per-component failure — the apply was evaluated and one or + * more components (virtual clusters or referenced filters) failed to + * converge. The future completes normally with a {@code ConfigurationResult} + * whose {@code errors()} collection is non-empty.
      • + *
      + * + *

      Failure-handling policy (whether to shut down on partial failure, attempt + * a rollback, alert, or retry) is the caller's responsibility, expressed via + * the standard {@link java.util.concurrent.CompletableFuture#whenComplete} + * pattern. The proxy itself takes no policy action based on {@code errors()}; + * it only reports. + * + * @param newConfig the desired end-state configuration; must be non-null + * and statically valid + * @return a future that completes with a {@link ConfigurationResult} describing + * any per-component failures encountered while applying the + * configuration + * @throws NullPointerException if {@code newConfig} is {@code null} + * @throws IllegalStateException if the proxy is not in the running state + */ + public CompletableFuture<ConfigurationResult> applyConfiguration(Configuration newConfig); } ``` @@ -196,20 +196,20 @@ proxy.applyConfiguration(newConfig) ```java proxy.applyConfiguration(newConfig) .whenComplete((result, ex) -> { - if (ex instanceof ConcurrentApplyException) { - // Another apply is in flight; not a real failure — trigger can retry later. - return; - } - if (result != null && result.hasErrors()) { - proxy.applyConfiguration(oldConfig) + if (ex instanceof ConcurrentApplyException) { + // Another apply is in flight; not a real failure — trigger can retry later. + return; + } + if (result != null && result.hasErrors()) { + proxy.applyConfiguration(oldConfig) .whenComplete((rollbackResult, rollbackEx) -> { - if (rollbackEx != null || rollbackResult.hasErrors()) { - // rollback itself failed — last-resort policy - proxy.shutdown(); + if (rollbackEx != null || rollbackResult.hasErrors()) { + // rollback itself failed — last-resort policy + proxy.shutdown(); } - }); - } - }); + }); + } + }); ``` The caller supplies `oldConfig` from its own state — the proxy does not currently expose a getter for its running configuration. 
This is sufficient because triggers that perform rollback typically already have their own source-of-truth for the previous desired state (a Kubernetes ConfigMap revision, an HTTP-endpoint request history, a previously-loaded file). If a future use case requires the proxy to be queryable for its current state, an accessor can be added without changing the `applyConfiguration` contract. From dce62c491aec01d2e83360f9b014fd0a40338293 Mon Sep 17 00:00:00 2001 From: Urjit Patel Date: Wed, 13 May 2026 12:21:26 +0530 Subject: [PATCH 17/17] Rename applyConfiguration to reconfigure and rename associated classes Signed-off-by: Urjit Patel --- proposals/083-hot-reload-feature.md | 193 ++++++++++++++-------------- 1 file changed, 96 insertions(+), 97 deletions(-) diff --git a/proposals/083-hot-reload-feature.md b/proposals/083-hot-reload-feature.md index 5402a6e..46c96e9 100644 --- a/proposals/083-hot-reload-feature.md +++ b/proposals/083-hot-reload-feature.md @@ -2,7 +2,7 @@ **Builds on:** [Proposal 016 — Virtual Cluster Lifecycle](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) -This proposal introduces a mechanism for applying configuration changes to a running Kroxylicious proxy without a full restart. It defines a core `KafkaProxy.applyConfiguration(Configuration)` operation that accepts a complete configuration, detects what changed, and converges the running state to match — restarting only the affected virtual clusters while leaving unaffected clusters available. +This proposal introduces a mechanism for applying configuration changes to a running Kroxylicious proxy without a full restart. It defines a core `KafkaProxy.reconfigure(Configuration)` operation that accepts a complete configuration, detects what changed, and converges the running state to match — restarting only the affected virtual clusters while leaving unaffected clusters available. 
This proposal extends the virtual cluster lifecycle model (Proposal 016) with reload operations, an edge-based failure policy, and a configuration change orchestration layer. Where Proposal 016 defines the per-VC state machine and the `VirtualClusterRegistry` that owns it, this proposal defines the change detection pipeline, the reload orchestration, and the two policy layers (terminal failure and configuration failure) that govern how the proxy responds to problems during reload. @@ -21,11 +21,11 @@ Administrators need to be able to modify proxy configuration in place. Common sc The proxy should apply these changes with minimal disruption: only the virtual clusters affected by the change should experience downtime. Unaffected clusters should continue serving traffic without interruption. -This proposal also delivers the runtime capability that Proposal 016 made possible: with `applyConfiguration()` available, an operator (via whatever trigger they're using) can let healthy VCs continue serving while fixing a broken VC's config and re-applying — making per-VC independence useful rather than theoretical. +This proposal also delivers the runtime capability that Proposal 016 made possible: with `reconfigure()` available, an operator (via whatever trigger they're using) can let healthy VCs continue serving while fixing a broken VC's config and re-applying — making per-VC independence useful rather than theoretical. ## Proposal -### Core API: `KafkaProxy.applyConfiguration()` +### Core API: `KafkaProxy.reconfigure()` The central operation is: @@ -34,10 +34,10 @@ class KafkaProxy { // ... add the following method /** - * Apply the given configuration to this running proxy, restarting only the - * virtual clusters whose effective configuration differs from the current + * Reconfigure this running proxy with the given configuration, restarting only + * the virtual clusters whose effective configuration differs from the current * running state. 
Unaffected clusters continue serving traffic throughout - * the apply. + * the reconfiguration. * *

      Scope

      *

      This method handles replacement configurations on an already-running @@ -50,7 +50,7 @@ class KafkaProxy { * out of scope. * *

      If {@code newConfig} differs from the running configuration in any out-of-scope - * section, the apply is rejected as a pre-flight check before any virtual-cluster + * section, the reconfiguration is rejected as a pre-flight check before any virtual-cluster * change is attempted: the returned future completes exceptionally with an * {@link OutOfScopeChangeException} naming the differing section(s) and the proxy's * running state is unchanged. Changes to those sections still require a proxy restart. @@ -67,8 +67,8 @@ class KafkaProxy { * *

      Validation which depends on runtime state (port conflicts, plugin * instantiation, TLS material readability) is performed during - * {@code applyConfiguration()} and reported via the returned future's - * {@link ConfigurationResult}. + * {@code reconfigure()} and reported via the returned future's + * {@link ReconfigureResult}. * *

      Error reporting

      *

      This method throws synchronously only for programmer errors: @@ -85,12 +85,12 @@ class KafkaProxy { * pre-flight check before any state change is attempted; no virtual cluster * is touched. The future completes exceptionally with a specific exception * type identifying the rejection reason (e.g. {@link OutOfScopeChangeException}).

    • - *
    • Catastrophic failure — the apply began but could not be completed + *
    • Catastrophic failure — the reconfiguration began but could not be completed * (e.g. internal proxy bug, unexpected I/O failure inside the orchestrator). * The future completes exceptionally.
    • - *
    • Per-component failure — the apply was evaluated and one or + *
    • Per-component failure — the reconfiguration was evaluated and one or * more components (virtual clusters or referenced filters) failed to - * converge. The future completes normally with a {@code ConfigurationResult} + * converge. The future completes normally with a {@code ReconfigureResult} * whose {@code errors()} collection is non-empty.
    • *
    * @@ -102,39 +102,38 @@ class KafkaProxy { * * @param newConfig the desired end-state configuration; must be non-null * and statically valid - * @return a future that completes with a {@link ConfigurationResult} describing - * any per-component failures encountered while applying the - * configuration + * @return a future that completes with a {@link ReconfigureResult} describing + * any per-component failures encountered while reconfiguring * @throws NullPointerException if {@code newConfig} is {@code null} * @throws IllegalStateException if the proxy is not in the running state */ - public CompletableFuture<ConfigurationResult> applyConfiguration(Configuration newConfig); + public CompletableFuture<ReconfigureResult> reconfigure(Configuration newConfig); } ``` -The caller provides a complete `Configuration` object. The proxy compares the in-scope sections (virtual clusters + referenced named filters) against the currently running configuration, determines what changed, and applies the changes. The method returns a `CompletableFuture` that completes with the apply outcome or exceptionally on catastrophic failure. +The caller provides a complete `Configuration` object. The proxy compares the in-scope sections (virtual clusters + referenced named filters) against the currently running configuration, determines what changed, and applies the changes. The method returns a `CompletableFuture` that completes with the reconfiguration outcome or exceptionally on catastrophic failure. -`ConfigurationResult` reports any per-component failures encountered during the apply. The interface is deliberately minimal: it only enumerates *what failed* and *why*, and leaves any reaction to the caller. +`ReconfigureResult` reports any per-component failures encountered during the reconfiguration. The interface is deliberately minimal: it only enumerates *what failed* and *why*, and leaves any reaction to the caller. 
```java -public interface ConfigurationResult { +public interface ReconfigureResult { /** - * Returns the per-component failures encountered while applying the configuration. + * Returns the per-component failures encountered while reconfiguring. * One entry per failed component (e.g. one virtual cluster, one referenced filter). - * Empty when the apply succeeded with no failed components. + * Empty when the reconfiguration succeeded with no failed components. *

    * The returned collection is immutable; iteration order is unspecified. */ - Collection<ConfigurationError> errors(); + Collection<ReconfigureError> errors(); /** Convenience predicate equivalent to {@code !errors().isEmpty()}. */ default boolean hasErrors() { return !errors().isEmpty(); } } -public record ConfigurationError(String humanReadableIdentifier, Throwable cause) { } +public record ReconfigureError(String humanReadableIdentifier, Throwable cause) { } ``` -`ConfigurationError.humanReadableIdentifier` is a best-effort string that identifies which component failed (a virtual cluster name, a filter reference, etc.). The string's format is implementation-defined and intended for human consumption — operators reading logs, alerts, or admin endpoints. Programmatic consumers should rely on `cause` (the underlying exception) for typed failure detection rather than parsing the identifier. +`ReconfigureError.humanReadableIdentifier` is a best-effort string that identifies which component failed (a virtual cluster name, a filter reference, etc.). The string's format is implementation-defined and intended for human consumption — operators reading logs, alerts, or admin endpoints. Programmatic consumers should rely on `cause` (the underlying exception) for typed failure detection rather than parsing the identifier. In this approach the caller provides the desired end state, the proxy computes and executes the diff, and the proxy reports per-component outcomes — but takes no action on those outcomes. The intentional minimalism preserves freedom of manoeuvre for the broader proxy-configuration rework tracked separately; richer APIs (categorised outcomes, structured identifiers, lifecycle event streams) can be added without breaking this contract. 
@@ -145,15 +144,15 @@ Because the proxy does not act on `errors()`, callers express their failure poli **Shut down on any failure**: ```java -proxy.applyConfiguration(newConfig) +proxy.reconfigure(newConfig) .whenComplete((result, ex) -> { - if (ex instanceof ConcurrentApplyException) { - // Another apply is in flight; the proxy is healthy. Don't shut down — retry later. - LOGGER.atWarn().log("apply rejected: another apply in progress; will retry"); + if (ex instanceof ConcurrentReconfigureException) { + // Another reconfigure is in flight; the proxy is healthy. Don't shut down — retry later. + LOGGER.atWarn().log("reconfigure rejected: another reconfigure in progress; will retry"); return; } if (ex != null) { - LOGGER.atError().setCause(ex).log("Configuration apply failed catastrophically"); + LOGGER.atError().setCause(ex).log("Reconfigure failed catastrophically"); proxy.shutdown(); return; } @@ -161,7 +160,7 @@ proxy.applyConfiguration(newConfig) LOGGER.atError() .setCause(error.cause()) .addKeyValue("component", error.humanReadableIdentifier()) - .log("Configuration apply failed for component"); + .log("Reconfigure failed for component"); } if (result.hasErrors()) { proxy.shutdown(); @@ -171,21 +170,21 @@ proxy.applyConfiguration(newConfig) The future completes with the aggregate result *before* any action is taken, so logging happens before shutdown — no ordering problem. -**Best-effort apply, keep what worked**: +**Best-effort reconfigure, keep what worked**: ```java -proxy.applyConfiguration(newConfig) +proxy.reconfigure(newConfig) .whenComplete((result, ex) -> { - if (ex instanceof ConcurrentApplyException) { - // Another apply is in flight; not a real failure — trigger can retry later. + if (ex instanceof ConcurrentReconfigureException) { + // Another reconfigure is in flight; not a real failure — trigger can retry later. 
return; } if (ex != null) { - alerter.send("catastrophic-apply-failure", ex); + alerter.send("catastrophic-reconfigure-failure", ex); return; } for (var error : result.errors()) { - alerter.send("component-apply-failure", error); + alerter.send("component-reconfigure-failure", error); } // Surviving components continue serving; no proxy-level action taken. }); @@ -194,14 +193,14 @@ proxy.applyConfiguration(newConfig) **Rollback on failure** (a sophisticated trigger): ```java -proxy.applyConfiguration(newConfig) +proxy.reconfigure(newConfig) .whenComplete((result, ex) -> { - if (ex instanceof ConcurrentApplyException) { - // Another apply is in flight; not a real failure — trigger can retry later. + if (ex instanceof ConcurrentReconfigureException) { + // Another reconfigure is in flight; not a real failure — trigger can retry later. return; } if (result != null && result.hasErrors()) { - proxy.applyConfiguration(oldConfig) + proxy.reconfigure(oldConfig) .whenComplete((rollbackResult, rollbackEx) -> { if (rollbackEx != null || rollbackResult.hasErrors()) { // rollback itself failed — last-resort policy @@ -212,22 +211,22 @@ proxy.applyConfiguration(newConfig) }); ``` -The caller supplies `oldConfig` from its own state — the proxy does not currently expose a getter for its running configuration. This is sufficient because triggers that perform rollback typically already have their own source-of-truth for the previous desired state (a Kubernetes ConfigMap revision, an HTTP-endpoint request history, a previously-loaded file). If a future use case requires the proxy to be queryable for its current state, an accessor can be added without changing the `applyConfiguration` contract. +The caller supplies `oldConfig` from its own state — the proxy does not currently expose a getter for its running configuration. 
This is sufficient because triggers that perform rollback typically already have their own source-of-truth for the previous desired state (a Kubernetes ConfigMap revision, an HTTP-endpoint request history, a previously-loaded file). If a future use case requires the proxy to be queryable for its current state, an accessor can be added without changing the `reconfigure` contract. Each of these is a trigger-side concern. The proxy does not need to know which policy is in use, and adding a new policy in the future does not require any proxy changes. -**Trigger mechanisms are explicitly out of scope for this proposal.** The `KafkaProxy.applyConfiguration()` operation is the internal interface that any trigger plugs into. How the new configuration arrives — whether via an HTTP endpoint, a file watcher detecting a changed ConfigMap, or a Kubernetes operator callback — is a separate concern. Deferring this keeps the proposal focused and avoids blocking on unresolved questions about trigger design (see [Trigger mechanisms](#trigger-mechanisms-future-work) below). +**Trigger mechanisms are explicitly out of scope for this proposal.** The `KafkaProxy.reconfigure()` operation is the internal interface that any trigger plugs into. How the new configuration arrives — whether via an HTTP endpoint, a file watcher detecting a changed ConfigMap, or a Kubernetes operator callback — is a separate concern. Deferring this keeps the proposal focused and avoids blocking on unresolved questions about trigger design (see [Trigger mechanisms](#trigger-mechanisms-future-work) below). ### Configuration change detection -When `KafkaProxy.applyConfiguration()` is called, the proxy compares the new configuration against the running state to determine which virtual clusters need to be restarted. 
Change detection is implemented as a pipeline of `ChangeDetector` implementations, each responsible for one category of change: +When `KafkaProxy.reconfigure()` is called, the proxy compares the new configuration against the running state to determine which virtual clusters need to be restarted. Change detection is implemented as a pipeline of `ChangeDetector` implementations, each responsible for one category of change: - **`VirtualClusterChangeDetector`** — identifies clusters that were added, removed, or modified by comparing `VirtualClusterModel` instances via `equals()`. A cluster requires a restart if any property that contributes to `VirtualClusterModel.equals()` changed (bootstrap address, TLS settings, gateway configuration, etc.). - **`FilterChangeDetector`** — identifies clusters affected by filter configuration changes. A cluster requires a restart if a `NamedFilterDefinition` it references changed (type or configuration, compared via `equals()`), or if the `defaultFilters` list changed (order matters, since filter chain execution is sequential) and the cluster relies on default filters. Detectors return a `ChangeResult(clustersToRemove, clustersToAdd, clustersToModify)`. Results from all detectors are aggregated and then passed onto `VirtualClusterRegistry` to perform relevant operations. -Clusters where none of these changed are left untouched — they continue serving traffic throughout the apply operation. +Clusters where none of these changed are left untouched — they continue serving traffic throughout the reconfiguration operation. ### Cluster modification via lifecycle transitions @@ -237,20 +236,20 @@ The reload operations are exposed as **methods on `VirtualClusterRegistry`** (Pr - **Lifecycle path:** `SERVING → DRAINING → [drain connections] → [deregister gateways] → STOPPED` - **Behaviour:** The cluster is permanently torn down. It reaches the terminal Stopped state via the `draining → stopped` edge. 
-- **Failure reporting:** This is an intentional removal, not a failure; it is not reported through `ConfigurationResult.errors()`. +- **Failure reporting:** This is an intentional removal, not a failure; it is not reported through `ReconfigureResult.errors()`. **`replaceVirtualCluster(clusterName, newModel)`** - **Intent:** Apply a new configuration to an existing cluster. The method is named by intent rather than implementation. - **Current implementation:** Equivalent to `removeVirtualCluster(clusterName)` followed by `addVirtualCluster(newModel)` — the cluster cycles through `SERVING → DRAINING → [drain connections] → [deregister gateways] → INITIALIZING → [register gateways] → SERVING`. Clients connected to the cluster are disconnected during the drain phase. - **Caveats of the current implementation:** The drain + re-init approach means a replaced cluster experiences a brief period of unavailability while its ports are unbound and rebound. It also creates a thundering herd when all disconnected clients reconnect simultaneously after the cluster comes back up; mitigation strategies (e.g. staggered connection acceptance) are future work. These are properties of the *current implementation*, not of the `replaceVirtualCluster` contract — a future implementation may eliminate them without breaking callers. -- **Failure reporting:** If the invocation fails — i.e. the VC traverses the `initializing → failed → stopped` path during the apply — the failure is reported as a `ConfigurationError` entry in the result returned to the caller. The caller decides whether to retry, rollback, alert, or shut down via the `whenComplete` patterns shown above. +- **Failure reporting:** If the invocation fails — i.e. the VC traverses the `initializing → failed → stopped` path during the reconfiguration — the failure is reported as a `ReconfigureError` entry in the result returned to the caller. 
The caller decides whether to retry, rollback, alert, or shut down via the `whenComplete` patterns shown above. **`addVirtualCluster(newModel)`** - **Lifecycle path:** `[create lifecycle manager in INITIALIZING] → [register gateways] → SERVING` - **Behaviour:** A new cluster starts in the Initializing state with a fresh `VirtualClusterLifecycle`, registers its gateways with the `EndpointRegistry`, and transitions to Serving. -- **Failure reporting:** If the invocation fails — i.e. the VC traverses the `initializing → failed → stopped` path during the apply — the failure is reported as a `ConfigurationError` entry in the result returned to the caller. The caller decides whether to retry, rollback, alert, or shut down via the `whenComplete` patterns shown above. +- **Failure reporting:** If the invocation fails — i.e. the VC traverses the `initializing → failed → stopped` path during the reconfiguration — the failure is reported as a `ReconfigureError` entry in the result returned to the caller. The caller decides whether to retry, rollback, alert, or shut down via the `whenComplete` patterns shown above. **Processing order** @@ -267,16 +266,16 @@ Once all connections are drained (or the drain timeout expires), the lifecycle t ### ConfigurationReloadOrchestrator -`ConfigurationReloadOrchestrator` is an internal class of `kroxylicious-runtime`. It is not part of the public API; embedders interact with the proxy only via `KafkaProxy.applyConfiguration()`. `KafkaProxy` constructs and holds the orchestrator privately, and embedders never obtain a reference to it. +`ConfigurationReloadOrchestrator` is an internal class of `kroxylicious-runtime`. It is not part of the public API; embedders interact with the proxy only via `KafkaProxy.reconfigure()`. `KafkaProxy` constructs and holds the orchestrator privately, and embedders never obtain a reference to it. -The orchestrator owns the apply pipeline end-to-end. 
Its responsibilities are: +The orchestrator owns the reconfiguration pipeline end-to-end. Its responsibilities are: -- **Concurrency control** — serialises overlapping apply calls (see [Concurrency control](#concurrency-control) below). -- **Pre-flight validation** — runs static-validation on `newConfig` before any state-changing work, and rejects out-of-scope changes (see the Javadoc on `KafkaProxy.applyConfiguration` and the [Orchestration pipeline](#orchestration-pipeline) section below). +- **Concurrency control** — serialises overlapping reconfigure calls (see [Concurrency control](#concurrency-control) below). +- **Pre-flight validation** — runs static-validation on `newConfig` before any state-changing work, and rejects out-of-scope changes (see the Javadoc on `KafkaProxy.reconfigure` and the [Orchestration pipeline](#orchestration-pipeline) section below). - **Change detection** — calls the `VirtualClusterChangeDetector` and `FilterChangeDetector` (described in [Configuration change detection](#configuration-change-detection) above) directly. The detectors are internal collaborators of the orchestrator; the orchestrator does *not* receive a pre-computed `ChangeResult` from `KafkaProxy`. - **Per-VC change execution** — drives the `VirtualClusterRegistry` (Proposal 016) to apply the detected changes in `removeVirtualCluster → replaceVirtualCluster → addVirtualCluster` order. Each method invocation runs the corresponding per-VC lifecycle transitions described in [Cluster modification via lifecycle transitions](#cluster-modification-via-lifecycle-transitions) above. - **`FilterChainFactory` hot-swap** — atomically swaps the `FilterChainFactory` reference held by `KafkaProxy` on success (see [FilterChainFactory hot-swap](#filterchainfactory-hot-swap) below). -- **Result construction** — accumulates per-component outcomes into a `ConfigurationResult` and completes the future returned to the caller. 
+- **Result construction** — accumulates per-component outcomes into a `ReconfigureResult` and completes the future returned to the caller. The orchestrator does **not** own connection-level mechanics, per-VC lifecycle state, or endpoint registration — those remain with the per-VC `VirtualClusterLifecycle`, the `VirtualClusterRegistry` (both Proposal 016), and the `EndpointRegistry` (existing infrastructure) respectively. The orchestrator coordinates these collaborators; it does not duplicate their responsibilities. @@ -289,28 +288,28 @@ To support hot reload, `KafkaProxy` holds an `AtomicReference` returned by `applyConfiguration()`. +- **Proposal 016 compatibility:** This proposal supersedes the *Virtual Cluster Failure Policy* section of Proposal 016. The `onVirtualClusterStopped.serve` configuration described there is not implemented; failure-handling policy is instead expressed by the caller via `whenComplete` on the `CompletableFuture` returned by `reconfigure()`. ## Rejected alternatives -- **File watcher as the primary trigger**: Earlier iterations of this proposal used filesystem watching to detect configuration changes. This was set aside in favour of decoupling the trigger from the apply operation, since the trigger mechanism has unresolved design questions (security, delivery method, Kubernetes integration) that should not block the core capability. -- **Node-based failure policy (`onVirtualClusterStopped`)**: The original Proposal 016 design fired the serve policy when a VC *arrived at* the `Stopped` state. This conflates intentional removal (a success) with terminal failure (a problem). The current design avoids this entirely by surfacing failures through `ConfigurationResult.errors()` and letting the caller decide blast radius. +- **File watcher as the primary trigger**: Earlier iterations of this proposal used filesystem watching to detect configuration changes. 
This was set aside in favour of decoupling the trigger from the reconfiguration operation, since the trigger mechanism has unresolved design questions (security, delivery method, Kubernetes integration) that should not block the core capability. +- **Node-based failure policy (`onVirtualClusterStopped`)**: The original Proposal 016 design fired the serve policy when a VC *arrived at* the `Stopped` state. This conflates intentional removal (a success) with terminal failure (a problem). The current design avoids this entirely by surfacing failures through `ReconfigureResult.errors()` and letting the caller decide blast radius. - **`onVirtualClusterTerminalFailure` / `configurationReload` YAML config blocks**: An earlier iteration proposed YAML-level deployment policy for "what to do when a VC fails" and "should reload roll back atomically." Rejected in favour of caller-side policy via `whenComplete`. The proxy reports outcomes; the caller chooses the response. This shrinks the proxy's API surface and lets different deployment scenarios (CLI, operator, sophisticated trigger) express different policies without proxy changes. -- **`ReloadOptions` as a per-call parameter**: An approach where each call to `KafkaProxy.applyConfiguration()` could specify failure behaviour. Rejected because the caller can express any policy on the future returned by the method — a per-call parameter would not add expressive power and would commit the proxy to a specific failure-policy taxonomy. +- **`ReloadOptions` as a per-call parameter**: An approach where each call to `KafkaProxy.reconfigure()` could specify failure behaviour. Rejected because the caller can express any policy on the future returned by the method — a per-call parameter would not add expressive power and would commit the proxy to a specific failure-policy taxonomy. - **`VirtualClusterLifecycleObserver` injected at construction**: An earlier iteration proposed a push-based observer notified of every lifecycle transition. 
Rejected for this proposal in favour of the simpler `whenComplete` model, which covers the failure-handling use case without introducing a new extension point. A general-purpose lifecycle event stream (useful for control-plane status push) may be revisited in a follow-up proposal.
-- **Structured `ConfigurationError` (typed by failure layer)**: An earlier iteration proposed `ConfigurationError(String virtualClusterName, @Nullable String filterName, ReloadPhase phase, Throwable cause)`. The current minimal form (one identifier + cause) is intentional — committing to a phase enum or a structured-fields shape now would constrain the broader proxy-config rework. Programmatic consumers should use `cause` for typed handling; the identifier is for human consumption.
+- **Structured `ReconfigureError` (typed by failure layer)**: An earlier iteration proposed `ReconfigureError(String virtualClusterName, @Nullable String filterName, ReloadPhase phase, Throwable cause)`. The current minimal form (one identifier + cause) is intentional — committing to a phase enum or a structured-fields shape now would constrain the broader proxy-config rework. Programmatic consumers should use `cause` for typed handling; the identifier is for human consumption.
 - **`ConfigurationReconciler` naming**: Considered to describe the "compare desired vs current and converge" pattern, but rejected because Kubernetes reconcilers already exist in the Kroxylicious codebase and overloading the term would cause confusion.
-- **Plan/apply split on the public interface**: Considered exposing separate `plan()` and `apply()` methods to enable dry-run validation. Decided this is an internal concern — the trigger just needs `KafkaProxy.applyConfiguration()`. A validate/dry-run capability can be added later without changing the interface.
+- **Plan/apply split on the public interface**: Considered exposing separate `plan()` and `apply()` methods to enable dry-run validation.
Decided this is an internal concern — the trigger just needs `KafkaProxy.reconfigure()`. A validate/dry-run capability can be added later without changing the interface.
 - **Inline configuration via HTTP POST body**: Discussed having the HTTP endpoint accept the full YAML configuration in the request body. An alternative view is that configuration should always live in files (for source control, auditability, consistent state) and the HTTP endpoint should just trigger reading from a specified file path. This question is deferred along with the HTTP trigger design.
 - **Separate VirtualClusterManager for reload**: The original hot-reload design had a `VirtualClusterManager` that was purely an operation orchestrator (with `EndpointRegistry` and `ConnectionDrainManager` dependencies). Rather than maintaining two classes with the same name, the reload operations merge into the [Proposal 016](https://github.com/kroxylicious/design/blob/main/proposals/016-virtual-cluster-lifecycle.md) class (originally also called `VirtualClusterManager`, renamed to `VirtualClusterRegistry` in kroxylicious PR #3888), which already owns the VC model list and lifecycle managers. The merged class gains `EndpointRegistry` and `ConnectionDrainManager` dependencies and the `removeVirtualCluster`/`replaceVirtualCluster`/`addVirtualCluster` methods.
 - **Two terminal states (`Stopped` and `TerminallyFailed`)**: Considered adding a separate terminal state for unrecoverable failures. Rejected because the distinction is about the transition edge, not the terminal state — a stopped cluster is permanently done regardless of why. The edge-based policy hook achieves the same goal without adding state machine complexity.
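
Several of the rejections above rest on the claim that a caller can express its own failure policy with `whenComplete` on the future returned by `reconfigure()`. The following is a minimal sketch of what that might look like, not the proposal's implementation. The `ReconfigureResult` and `ReconfigureError` records here are hypothetical stand-ins shaped after the "one identifier + cause" description, and the `Action` enum, `decide` and `applyPolicy` names are invented purely for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ReconfigurePolicySketch {

    // Hypothetical stand-in for the proposal's error type: one identifier plus a cause.
    record ReconfigureError(String identifier, Throwable cause) {}

    // Hypothetical stand-in for the result type; errors() mirrors the accessor named in the text.
    record ReconfigureResult(List<ReconfigureError> errors) {}

    // Illustrative caller-side responses; not part of the proposal.
    enum Action { NONE, LOG_AND_CONTINUE, SHUT_DOWN }

    // One possible policy: shut down if the operation itself failed exceptionally,
    // keep serving (and log) if only some components reported errors.
    static Action decide(ReconfigureResult result, Throwable failure) {
        if (failure != null) {
            return Action.SHUT_DOWN;
        }
        return result.errors().isEmpty() ? Action.NONE : Action.LOG_AND_CONTINUE;
    }

    // Attaches the policy to a future such as the one reconfigure() is described as returning.
    static CompletableFuture<Action> applyPolicy(CompletableFuture<ReconfigureResult> future) {
        CompletableFuture<Action> decision = new CompletableFuture<>();
        future.whenComplete((result, failure) -> decision.complete(decide(result, failure)));
        return decision;
    }

    public static void main(String[] args) {
        // Simulate a reconfiguration in which one virtual cluster failed.
        ReconfigureResult partial = new ReconfigureResult(
                List.of(new ReconfigureError("vc-a", new IllegalStateException("port conflict"))));
        Action action = applyPolicy(CompletableFuture.completedFuture(partial)).join();
        System.out.println("Caller decided: " + action);
    }
}
```

A different deployment (for example an operator that prefers to exit on any error) would swap only the body of `decide`; nothing on the proxy side would need to change, which is the argument made against `ReloadOptions` and the YAML policy blocks.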