
Add BlockedThreadChecker for gRPC direct executor #260

Open
mattisonchao wants to merge 6 commits into main from feat/blocked-thread-checker

mattisonchao commented Mar 9, 2026

Summary

  • Adds a Vert.x-style BlockedThreadChecker that monitors the gRPC direct executor thread and logs warnings (with stack traces) when a callback blocks longer than 500ms
  • Introduces CheckedDirectExecutor — runs tasks on the calling thread (no thread switch), but wraps execution with start/end timing tracked via a ConcurrentHashMap
  • A background daemon Timer thread periodically checks all tracked threads and logs blocked ones
  • Wired through OxiaStubManager → OxiaStub → gRPC channel builder; closed on client shutdown
  • When no BlockedThreadChecker is provided (e.g. in tests), falls back to plain .directExecutor()
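
The mechanism described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class name `BlockedThreadCheckerSketch` and the methods `checkedDirectExecutor()` and `checkOnce()` are hypothetical, and it uses a `ScheduledExecutorService` for the background check (as a later commit in this PR does) rather than a `Timer`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a blocked-thread checker; names are illustrative only. */
final class BlockedThreadCheckerSketch implements AutoCloseable {
    // Thread currently running a callback -> System.nanoTime() when it started.
    private final Map<Thread, Long> startNanos = new ConcurrentHashMap<>();
    private final long warnThresholdNanos;
    private final ScheduledExecutorService scheduler;

    BlockedThreadCheckerSketch(long intervalMs, long warnThresholdMs) {
        this.warnThresholdNanos = TimeUnit.MILLISECONDS.toNanos(warnThresholdMs);
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "blocked-thread-checker");
            t.setDaemon(true); // must not keep the JVM alive
            return t;
        });
        scheduler.scheduleAtFixedRate(this::checkOnce, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    /** Logs every tracked thread over the threshold; returns how many were found. */
    int checkOnce() {
        long now = System.nanoTime();
        int blocked = 0;
        for (Map.Entry<Thread, Long> entry : startNanos.entrySet()) {
            long elapsed = now - entry.getValue();
            if (elapsed >= warnThresholdNanos) {
                blocked++;
                StringBuilder trace = new StringBuilder();
                for (StackTraceElement el : entry.getKey().getStackTrace()) {
                    trace.append("\n\tat ").append(el);
                }
                System.err.printf("Thread %s has been blocked for %d ms%s%n",
                        entry.getKey().getName(), TimeUnit.NANOSECONDS.toMillis(elapsed), trace);
            }
        }
        return blocked;
    }

    /** Runs each task on the calling thread, recording start and end around it. */
    Executor checkedDirectExecutor() {
        return task -> {
            Thread current = Thread.currentThread();
            Long outer = startNanos.put(current, System.nanoTime());
            try {
                task.run();
            } finally {
                // Restore the outer start time if this was a nested execute() call.
                if (outer == null) {
                    startNanos.remove(current);
                } else {
                    startNanos.put(current, outer);
                }
            }
        };
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Because the task runs on the caller's thread, the only cost added to the hot path in this sketch is one map write before the task and one after it; the stack-trace capture happens on the checker thread.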

Test plan

  • Verify compilation passes
  • Verify existing tests still pass (no behavioral change — same direct execution model)
  • Manual test: add a blocking callback and verify warning is logged with stack trace
  • Review thread-safety of ConcurrentHashMap tracking and AtomicLong warn-once logic

mattisonchao and others added 4 commits March 9, 2026 11:44
- Map all 12 Oxia gRPC status codes with retriability flags
- Extract codes from grpc-status-details-bin trailer with
  description-based fallback for plain gRPC errors
- Update ShardManager to use OxiaStatus for namespace-not-found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the separate fromDescription() method and inline the
description-based check directly in fromError(). The fallback is
needed because Go gRPC only sends grpc-status-details-bin when
WithDetails() is used.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- WriteBatch and ReadBatch retry on retriable errors using
  CompletableFuture.exceptionallyCompose() with exponential backoff
- Retry is non-blocking: uses CompletableFuture.delayedExecutor()
  instead of Thread.sleep() to avoid blocking the batcher thread
- Convert ReadBatch from StreamObserver to CompletableFuture-based
  to support retry composition
- Bounded by requestTimeout deadline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
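
The non-blocking retry described in this commit message could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: `RetrySketch`, `withRetry`, and all parameter names are hypothetical, and the retriability check is passed in as a predicate instead of using OxiaStatus. It requires Java 12 or later for `exceptionallyCompose`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical sketch of deadline-bounded, non-blocking retry with exponential backoff. */
final class RetrySketch {
    static <T> CompletableFuture<T> withRetry(
            Supplier<CompletableFuture<T>> attempt, // starts one attempt
            Predicate<Throwable> retriable,         // assumed stand-in for a status-code check
            long backoffMs,
            long maxBackoffMs,
            long deadlineNanos) {                   // absolute deadline on the System.nanoTime() clock
        return attempt.get().exceptionallyCompose(err -> {
            Throwable cause = (err instanceof CompletionException && err.getCause() != null)
                    ? err.getCause() : err;
            long remainingNanos = deadlineNanos - System.nanoTime();
            if (!retriable.test(cause) || remainingNanos <= 0) {
                return CompletableFuture.<T>failedFuture(cause);
            }
            // Never wait past the deadline.
            long delayMs = Math.min(backoffMs, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
            // delayedExecutor schedules the next attempt later; no thread sleeps meanwhile.
            Executor delayed = CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS);
            long nextBackoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            return CompletableFuture
                    .supplyAsync(() -> withRetry(attempt, retriable, nextBackoffMs, maxBackoffMs, deadlineNanos),
                            delayed)
                    .thenCompose(f -> f);
        });
    }
}
```

A caller would compute the deadline once, for example `System.nanoTime() + requestTimeout.toNanos()`, and pass the same value through every retry so the total time stays bounded by the request timeout.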
…hreads

Similar to Vert.x's "Don't block me" feature, this monitors the internal
direct executor thread used by gRPC/Netty and logs a warning with stack
trace when a callback blocks longer than the threshold (default 500ms).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mattisonchao and others added 2 commits March 9, 2026 13:17
- Controlled by -Doxia.client.blockedThreadChecker.enabled=true
- Configurable interval via -Doxia.client.blockedThreadChecker.intervalMs
- Configurable threshold via -Doxia.client.blockedThreadChecker.warnThresholdMs
- Replace Timer with ScheduledExecutorService for robustness
- Re-warn every 5s for long-running blocks instead of warn-once
- Disabled by default: zero overhead in production unless opted in

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
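
As an illustration of the opt-in configuration listed in this commit message, the three system properties could be read as in the sketch below. The property names come from the commit message; the class `CheckerConfigSketch`, its fields, and the 1000 ms default interval are assumptions made for this example (only the 500 ms default threshold is stated in this PR).

```java
/** Hypothetical holder for the checker's system-property configuration. */
final class CheckerConfigSketch {
    static final String PREFIX = "oxia.client.blockedThreadChecker.";

    final boolean enabled;
    final long intervalMs;
    final long warnThresholdMs;

    CheckerConfigSketch(boolean enabled, long intervalMs, long warnThresholdMs) {
        this.enabled = enabled;
        this.intervalMs = intervalMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    static CheckerConfigSketch fromSystemProperties() {
        return new CheckerConfigSketch(
                // Boolean.getBoolean is false when the property is unset: disabled by default.
                Boolean.getBoolean(PREFIX + "enabled"),
                // Assumed default; the PR does not state the default interval.
                Long.getLong(PREFIX + "intervalMs", 1000L),
                // 500 ms is the default threshold named in the PR description.
                Long.getLong(PREFIX + "warnThresholdMs", 500L));
    }
}
```

With this shape, a client started with `-Doxia.client.blockedThreadChecker.enabled=true -Doxia.client.blockedThreadChecker.warnThresholdMs=250` would opt in with a 250 ms threshold, while a client started without any of the flags would keep the plain direct executor.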
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
