
feat: add Celeris framework benchmarks#189

Open
FumingPower3925 wants to merge 17 commits into main from feat/celeris-benchmarks

Conversation

@FumingPower3925
Contributor

Summary

  • Adds servers/celeris/ package wrapping the Celeris HTTP engine (v0.3.0) for benchmarking
  • 27 benchmark server variants: 3 engines (iouring, epoll, adaptive) × 3 objectives (latency, throughput, balanced) × 3 protocols (h1, h2, hybrid)
  • New celeris benchmark mode (-mode celeris), also included in the all mode
  • Dashboard classification updated with "celeris" category
  • Data-driven dispatch via ParseServerType() — no giant switch statements

Server naming convention

celeris-{engine}-{objective}-{protocol}

Examples: celeris-iouring-throughput-h1, celeris-adaptive-balanced-hybrid
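The naming convention above is what makes data-driven dispatch possible: split the name once and look up the variant, instead of switching on 27 literals. A minimal sketch of that idea (the struct and function names here are illustrative, not the project's actual `ParseServerType` API):

```go
package main

import (
	"fmt"
	"strings"
)

// ServerType is a hypothetical decomposition of a benchmark server name.
type ServerType struct {
	Framework string // e.g. "celeris"
	Engine    string // iouring | epoll | adaptive
	Objective string // latency | throughput | balanced
	Protocol  string // h1 | h2 | hybrid
}

// parseServerType splits "celeris-{engine}-{objective}-{protocol}" into its
// parts, enabling table-driven dispatch instead of a 27-arm switch.
func parseServerType(name string) (ServerType, error) {
	parts := strings.Split(name, "-")
	if len(parts) != 4 {
		return ServerType{}, fmt.Errorf("unexpected server name %q", name)
	}
	return ServerType{parts[0], parts[1], parts[2], parts[3]}, nil
}

func main() {
	st, err := parseServerType("celeris-iouring-throughput-h1")
	fmt.Println(st, err)
}
```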

Engines

| Engine   | Description                     |
|----------|---------------------------------|
| iouring  | io_uring (Linux 5.10+)          |
| epoll    | epoll (Linux 2.6+)              |
| adaptive | Smart runtime engine selection  |

Objectives

| Objective  | Description                             |
|------------|-----------------------------------------|
| latency    | Optimized for low latency               |
| throughput | Optimized for maximum RPS               |
| balanced   | Balanced between latency and throughput |

The std engine (net/http wrapper) is available for local dev but excluded from benchmark runs.

Closes #188

Test plan

  • go build ./cmd/server/ and go build ./cmd/bench/ pass (macOS, non-Linux stub path)
  • go test ./internal/dashboard/ passes with new classify/framework test cases
  • golangci-lint run clean on all changed packages
  • Linux cross-compile: GOOS=linux go build ./cmd/server/
  • End-to-end: run celeris-std-throughput-h1 locally to validate handler responses

…protocol combinations

Adds servers/celeris/ package wrapping the Celeris HTTP engine (v0.3.0)
for benchmarking. Supports 27 server variants: 3 engines (iouring, epoll,
adaptive) × 3 objectives (latency, throughput, balanced) × 3 protocols
(h1, h2, hybrid). The std engine is available for local dev but excluded
from benchmark runs.

Server naming: celeris-{engine}-{objective}-{protocol}

Ref: #188
…rd, and CI

- cmd/server: add celeris dispatch files (Linux + stub) with prefix-based
  routing via ParseServerType()
- cmd/bench: add 27-entry celerisServers list and "celeris" benchmark mode
- internal/dashboard: add "celeris" category classification, update
  parseFramework to strip protocol suffix (works for all naming depths)
- .github/workflows/benchmark.yml: update jq classify_category and
  parse_framework for celeris server names

Ref: #188
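The "works for all naming depths" note can be sketched as: strip a trailing protocol token if present, then take the leading segment as the framework. This is an assumption about how parseFramework behaves, with an illustrative suffix set:

```go
package main

import (
	"fmt"
	"strings"
)

// protocolSuffixes lists the protocol tokens that may terminate a server
// name; the real dashboard code may recognize a different set.
var protocolSuffixes = map[string]bool{"h1": true, "h2": true, "hybrid": true}

// parseFramework returns the leading framework token after stripping a
// trailing protocol suffix, so both a deep name like
// "celeris-iouring-throughput-h1" and a shallow one like "stdlib-h1"
// resolve to their framework.
func parseFramework(name string) string {
	parts := strings.Split(name, "-")
	if n := len(parts); n > 1 && protocolSuffixes[parts[n-1]] {
		parts = parts[:n-1]
	}
	return parts[0]
}

func main() {
	fmt.Println(parseFramework("celeris-iouring-throughput-h1"))
}
```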
@FumingPower3925 FumingPower3925 added this to the v0.5.0 milestone Mar 8, 2026
@FumingPower3925 FumingPower3925 self-assigned this Mar 8, 2026
@FumingPower3925 FumingPower3925 added the bench-fast Trigger fast benchmark (small instances, quick test) label Mar 8, 2026
The io_uring/epoll event loops call the handler synchronously on the
event loop thread. Per-request json.Marshal allocations cause GC pressure
that stalls the locked OS thread, leading to hangs under benchmark load.

Changes:
- Pre-compute JSON body at construction time (like theoretical servers)
- Pre-compute header slices and static response bodies as package vars
- Add defer s.Cancel() to clean up stream context after each request
- Inline all WriteResponse calls to eliminate method overhead
@FumingPower3925 FumingPower3925 added bench-fast Trigger fast benchmark (small instances, quick test) and removed bench-fast Trigger fast benchmark (small instances, quick test) labels Mar 9, 2026
v0.3.1 fixes a bug in the io_uring engine's send queue / buffer
management that caused hangs when handling responses under sustained
benchmark load. Validated with 48/48 tests passing across all
engine/objective/protocol combinations.
@FumingPower3925 FumingPower3925 added bench-fast Trigger fast benchmark (small instances, quick test) and removed bench-fast Trigger fast benchmark (small instances, quick test) labels Mar 9, 2026
Two issues caused CI to appear stuck for hours when a server stops
responding (e.g., Celeris io_uring engine deadlock under sustained load):

1. Run() had no safeguard timeout — workers used the parent context
   which is never cancelled by the benchmark duration timer. If the
   HTTP client timeout fails to fire (e.g., server keeps TCP alive
   but stops processing), wg.Wait() blocks indefinitely while the
   heartbeat goroutine keeps the C2 from detecting the issue.

   Fix: Create a scoped context (Duration + 60s) for workers and
   cancel it when the benchmark period ends, ensuring all in-flight
   HTTP requests are cancelled promptly.

2. No failed server detection — if a server produced 0 RPS (clearly
   broken), the runner still attempted all 5 benchmark types, wasting
   up to 50 minutes per server in retry loops.

   Fix: Skip remaining benchmark types if RPS < 10 (server unhealthy)
   or after 2 consecutive failures.
…loop deadlock)

v0.3.2 fixes the three compounding issues that caused the io_uring
engine to deadlock under sustained high-concurrency load:
- sync.Pool buffers (Data/OutboundBuffer) now returned after H1 handler
- sendQueue capped to prevent unbounded growth under back-pressure
- Reduced allocation pressure in the hot path
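The buffer-return fix follows the standard sync.Pool discipline: get, use, reset, put, with the put deferred so it also runs on panic. A minimal sketch, assuming an illustrative buffer type and size (the real Data/OutboundBuffer types differ):

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool models returning event-loop buffers to a sync.Pool after the H1
// handler runs, so sustained load doesn't grow the heap unboundedly.
var bufPool = sync.Pool{
	New: func() any { b := make([]byte, 0, 4096); return &b },
}

// handleRequest runs the handler with a pooled buffer and guarantees the
// buffer is returned afterward, even if the handler panics.
func handleRequest(process func([]byte) int) int {
	bp := bufPool.Get().(*[]byte)
	defer func() {
		*bp = (*bp)[:0] // reset length, keep capacity
		bufPool.Put(bp) // return the buffer once the handler is done
	}()
	return process(*bp)
}

func main() {
	fmt.Println(handleRequest(func(b []byte) int { return cap(b) }))
}
```

A pointer to the slice is pooled rather than the slice itself to avoid an allocation on every Put, the usual idiom for byte-slice pools.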
@FumingPower3925 FumingPower3925 added bench-fast Trigger fast benchmark (small instances, quick test) and removed bench-fast Trigger fast benchmark (small instances, quick test) labels Mar 9, 2026
With 50 servers × 5 benchmark types = 250 total benchmarks (up from
115 before celeris), the previous timeouts were too tight:

- BenchmarkTimeout: 3h → 5h (metal mode needs ~150 min + retries)
- Cleanup watchdog: 2h → 6h (was killing runs before BenchmarkTimeout)
- Workflow job timeout: 4h → 7h
- Workflow MAX_WAIT: 4h → 6h
@FumingPower3925 FumingPower3925 added bench-fast Trigger fast benchmark (small instances, quick test) and removed bench-fast Trigger fast benchmark (small instances, quick test) labels Mar 9, 2026
When spot capacity is unavailable, the orchestrator wastes 10 minutes
per AZ waiting for workers that will never register. Instance boot +
binary download + registration takes 2-3 minutes in practice, so 4
minutes is plenty. This cuts worst-case AZ cycling from 60+ minutes
to ~24 minutes before on-demand fallback.
…egister

When waitForWorkers times out, check which role (server/client) actually
registered. If one succeeded but the other didn't get spot capacity,
immediately deploy the missing role as on-demand in the same AZ instead
of failing the entire architecture and cycling to the next AZ.
When a spot instance is terminated mid-benchmark, instead of bubbling the
error up to runArchitectureWithRetry (which restarts from scratch in a new
AZ), recover in-place:

1. Detect interruption faster by monitoring BOTH server and client
   heartbeats (previously only checked client, so server-only
   interruptions took ~15 min to detect)
2. Delete the terminated spot stacks and clear stale registrations
3. Deploy both roles as on-demand in the same AZ (parallel)
4. New client resumes from checkpoint via completed_benchmarks in the
   assignment API — no work is repeated
5. If in-place recovery fails, fall through to the existing AZ-cycling
   retry in runArchitectureWithRetry
@FumingPower3925 FumingPower3925 removed the bench-fast Trigger fast benchmark (small instances, quick test) label Mar 9, 2026
@FumingPower3925 FumingPower3925 added the bench-fast Trigger fast benchmark (small instances, quick test) label Mar 9, 2026
@FumingPower3925 FumingPower3925 added bench-fast Trigger fast benchmark (small instances, quick test) bench-med Trigger medium benchmark (medium instances) and removed bench-fast Trigger fast benchmark (small instances, quick test) labels Mar 9, 2026
…r count

When the benchmark duration expires, in-flight HTTP requests are cancelled
via context, which was incorrectly counted as errors (1-32 per benchmark).
Now only counts actual server errors during the benchmark window.
@FumingPower3925 FumingPower3925 added bench-fast Trigger fast benchmark (small instances, quick test) and removed bench-fast Trigger fast benchmark (small instances, quick test) labels Mar 10, 2026

Without this, the last periodic heartbeat could show stale progress
(e.g. 243/245) if the final benchmarks completed between ticks,
making it look like benchmarks were missed.
