Add BlockedThreadChecker for gRPC direct executor #260
Open
mattisonchao wants to merge 6 commits into main
- Map all 12 Oxia gRPC status codes with retriability flags
- Extract codes from grpc-status-details-bin trailer with description-based fallback for plain gRPC errors
- Update ShardManager to use OxiaStatus for namespace-not-found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the separate fromDescription() method and inline the description-based check directly in fromError(). The fallback is needed because Go gRPC only sends grpc-status-details-bin when WithDetails() is used.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- WriteBatch and ReadBatch retry on retriable errors using CompletableFuture.exceptionallyCompose() with exponential backoff
- Retry is non-blocking: uses CompletableFuture.delayedExecutor() instead of Thread.sleep() to avoid blocking the batcher thread
- Convert ReadBatch from StreamObserver to CompletableFuture-based to support retry composition
- Bounded by requestTimeout deadline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
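The retry scheme in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `RetrySketch`, `withRetry`, and all parameter names are hypothetical, and the retriability check is passed in as a predicate because the actual `OxiaStatus` API is not shown here. It requires Java 12 or later for `exceptionallyCompose()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of the Oxia client.
final class RetrySketch {
    private RetrySketch() {}

    /**
     * Runs {@code call} and retries it on retriable failures with exponential
     * backoff, never sleeping on the calling thread. Retries stop once
     * {@code deadlineNanos} (a System.nanoTime() value) has passed.
     */
    static <T> CompletableFuture<T> withRetry(
            Supplier<CompletableFuture<T>> call,
            Predicate<Throwable> retriable,
            long backoffMs,
            long maxBackoffMs,
            long deadlineNanos) {
        return call.get().exceptionallyCompose(err -> {
            // Dependent stages wrap failures in CompletionException; test the cause.
            Throwable cause =
                    (err instanceof CompletionException && err.getCause() != null)
                            ? err.getCause()
                            : err;
            long remainingMs =
                    TimeUnit.NANOSECONDS.toMillis(deadlineNanos - System.nanoTime());
            if (!retriable.test(cause) || remainingMs <= 0) {
                return CompletableFuture.failedFuture(err);
            }
            // Schedule the next attempt on a delayed executor instead of Thread.sleep().
            long delayMs = Math.min(backoffMs, remainingMs);
            long nextBackoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            Executor delayed =
                    CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS);
            CompletableFuture<CompletableFuture<T>> next = CompletableFuture.supplyAsync(
                    () -> withRetry(call, retriable, nextBackoffMs, maxBackoffMs, deadlineNanos),
                    delayed);
            // Flatten the nested future produced by the delayed attempt.
            return next.thenCompose(f -> f);
        });
    }
}
```

In this sketch the thread that observes the failure only schedules the next attempt and returns immediately, which is the property the commit message calls non-blocking.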
…hreads

Similar to Vert.x's "Don't block me" feature, this monitors the internal direct executor thread used by gRPC/Netty and logs a warning with stack trace when a callback blocks longer than the threshold (default 500ms).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Controlled by -Doxia.client.blockedThreadChecker.enabled=true
- Configurable interval via -Doxia.client.blockedThreadChecker.intervalMs
- Configurable threshold via -Doxia.client.blockedThreadChecker.warnThresholdMs
- Replace Timer with ScheduledExecutorService for robustness
- Re-warn every 5s for long-running blocks instead of warn-once
- Disabled by default: zero overhead in production unless opted in

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
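As a rough illustration of how these three system properties might be read, here is a minimal sketch. The property names come from the commit message; the class name `BlockedThreadCheckerConfigSketch` is hypothetical, and the 100 ms interval default is an assumption (only the 500 ms threshold default is stated in this PR).

```java
// Hypothetical holder for illustration only; not the PR's actual class.
final class BlockedThreadCheckerConfigSketch {
    static final String PREFIX = "oxia.client.blockedThreadChecker.";

    final boolean enabled;
    final long intervalMs;
    final long warnThresholdMs;

    private BlockedThreadCheckerConfigSketch(
            boolean enabled, long intervalMs, long warnThresholdMs) {
        this.enabled = enabled;
        this.intervalMs = intervalMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    static BlockedThreadCheckerConfigSketch fromSystemProperties() {
        // Boolean.getBoolean returns false when the property is unset,
        // which gives the "disabled by default" behaviour.
        boolean enabled = Boolean.getBoolean(PREFIX + "enabled");
        // ASSUMPTION: 100 ms is a placeholder default; the PR does not state one.
        long intervalMs = Long.getLong(PREFIX + "intervalMs", 100L);
        // 500 ms is the default threshold mentioned in the PR.
        long warnThresholdMs = Long.getLong(PREFIX + "warnThresholdMs", 500L);
        return new BlockedThreadCheckerConfigSketch(enabled, intervalMs, warnThresholdMs);
    }
}
```

With this shape, a caller would only create the checker and its scheduler when `enabled` is true, so nothing runs unless the flag is set.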
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Adds `BlockedThreadChecker`, which monitors the gRPC direct executor thread and logs warnings (with stack traces) when a callback blocks longer than 500ms
- `CheckedDirectExecutor`: runs tasks on the calling thread (no thread switch), but wraps execution with start/end timing tracked via a `ConcurrentHashMap`
- A `Timer` thread periodically checks all tracked threads and logs blocked ones
- Wired through `OxiaStubManager` → `OxiaStub` → gRPC channel builder; closed on client shutdown
- If no `BlockedThreadChecker` is provided (e.g. in tests), falls back to plain `.directExecutor()`

Test plan
- `ConcurrentHashMap` tracking and `AtomicLong` warn-once logic
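
The tracking mechanism described in the Summary can be sketched as below. This is a simplified illustration under stated assumptions, not the PR's implementation: `CheckedDirectExecutorSketch` and `findBlocked` are hypothetical names, the periodic scheduling and warn-once bookkeeping are left out, and reports are returned as strings instead of being logged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a direct executor that records when each task started.
final class CheckedDirectExecutorSketch implements Executor {
    // Thread -> System.nanoTime() at which its current task began.
    private final Map<Thread, Long> running = new ConcurrentHashMap<>();

    @Override
    public void execute(Runnable task) {
        Thread current = Thread.currentThread();
        // Only the outermost task on a thread records (and later clears) the start time,
        // so nested execute() calls do not reset the clock.
        boolean outermost = running.putIfAbsent(current, System.nanoTime()) == null;
        try {
            task.run(); // runs on the calling thread: no thread switch
        } finally {
            if (outermost) {
                running.remove(current);
            }
        }
    }

    /** Returns one report, including a stack trace, per thread busy longer than the threshold. */
    List<String> findBlocked(long thresholdNanos) {
        long now = System.nanoTime();
        List<String> reports = new ArrayList<>();
        running.forEach((thread, startedAt) -> {
            long elapsedNanos = now - startedAt;
            if (elapsedNanos > thresholdNanos) {
                StringBuilder sb = new StringBuilder()
                        .append("Thread ").append(thread.getName())
                        .append(" has been blocked for ")
                        .append(TimeUnit.NANOSECONDS.toMillis(elapsedNanos))
                        .append(" ms");
                for (StackTraceElement frame : thread.getStackTrace()) {
                    sb.append("\n\tat ").append(frame);
                }
                reports.add(sb.toString());
            }
        });
        return reports;
    }
}
```

A periodic checker (a `Timer` in the Summary, a `ScheduledExecutorService` per the later commit) would call something like `findBlocked` at each interval and log the reports. The cost on the hot path in this sketch is one map insert and one remove per callback.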