Fix/server cpu exhaustion #353

Merged
0pcom merged 4 commits into skycoin:develop from 0pcom:fix/server-cpu-exhaustion
Mar 26, 2026

0pcom (Collaborator) commented Mar 26, 2026

No description provided.

0pcom added 4 commits March 26, 2026 13:10
- Enforce maxSessions limit: reject new TCP connections when at capacity
  instead of accepting and logging a debug message
- Add per-session concurrent stream limit (2048) using a semaphore to
  prevent a single session (e.g. setup-node) from spawning unbounded
  goroutines that starve the CPU
- Add backoff delay (50ms) on non-fatal stream accept errors to prevent
  tight CPU spin loops when persistent errors occur
- Streams that exceed the concurrency limit are immediately closed
  rather than queued, providing backpressure to the client
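
The per-session stream limit and the accept backoff described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `streamLimiter`, `stream`, `session`, `errSessionClosed`, and `serveSession` are hypothetical names, and only the 2048 limit and the 50ms delay are taken from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Values taken from the commit message.
const (
	maxStreamsPerSession = 2048
	acceptBackoff        = 50 * time.Millisecond
)

// streamLimiter is a hypothetical counting semaphore backed by a buffered channel.
type streamLimiter struct{ slots chan struct{} }

func newStreamLimiter(n int) *streamLimiter {
	return &streamLimiter{slots: make(chan struct{}, n)}
}

// tryAcquire takes a slot without blocking; false means the limit is reached.
func (l *streamLimiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release returns a slot previously taken by tryAcquire.
func (l *streamLimiter) release() { <-l.slots }

// stream and session are hypothetical stand-ins for the real server types.
type stream interface{ Close() error }

type session interface{ AcceptStream() (stream, error) }

// errSessionClosed is an assumed sentinel for a fatal accept error.
var errSessionClosed = errors.New("session closed")

// serveSession accepts streams until the session closes. Non-fatal errors
// trigger a short sleep instead of a tight loop, and streams over the limit
// are closed immediately rather than queued.
func serveSession(s session, lim *streamLimiter, handle func(stream)) {
	for {
		st, err := s.AcceptStream()
		if errors.Is(err, errSessionClosed) {
			return
		}
		if err != nil {
			time.Sleep(acceptBackoff)
			continue
		}
		if !lim.tryAcquire() {
			_ = st.Close() // backpressure: reject instead of queueing
			continue
		}
		go func() {
			defer lim.release()
			handle(st)
		}()
	}
}

func main() {
	// A tiny limit makes the rejection visible; the PR uses maxStreamsPerSession.
	lim := newStreamLimiter(2)
	fmt.Println(lim.tryAcquire(), lim.tryAcquire(), lim.tryAcquire()) // true true false
	lim.release()
	fmt.Println(lim.tryAcquire()) // true
	_ = maxStreamsPerSession
}
```

The non-blocking `select` with a `default` branch is what makes an over-limit stream fail fast; a blocking send would queue the accept loop behind busy handlers instead.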

Revert the maxSessions rejection: maxSessions only controls discovery
advertisement, not connection acceptance. Services and visors connect to
all servers regardless of advertised load, so rejecting sessions would
break connectivity.

- Add read deadline (HandshakeTimeout) on initial stream request read
  so slow or malicious clients cannot hold goroutines and semaphore
  slots indefinitely. Deadline is cleared before the long-lived
  bidirectional copy loop.
- Remove stale TODO comment in server accept loop
- Fix indentation from previous revert

Run the pprof HTTP server on a dedicated OS thread via
runtime.LockOSThread() and bump GOMAXPROCS by 1 to reserve a thread
for it. This ensures the kernel scheduler gives pprof CPU time even
when the Go runtime is saturated with thousands of stream-handling
goroutines, which is exactly when pprof is needed most to diagnose
the problem.

0pcom merged commit b57639c into skycoin:develop on Mar 26, 2026
3 checks passed