+
For part 6, let's run some benchmarks.
+
+
These numbers are not "scientific", just ballpark estimates.
+
+
What is going to be benchmarked
+
+
+ - io_uring read+write with IVTS reactor inline continuations (RunAsynchronousContinuation = false)
+ - io_uring read+write without IVTS reactor inline continuations (threadpool) (RunAsynchronousContinuation = true)
+ - io_uring read + libc send write without IVTS reactor inline continuations (threadpool) (RunAsynchronousContinuation = true)
+ - epoll read+write with IVTS reactor inline continuations
+ - epoll read+write without IVTS reactor inline continuations
+ - System.Net.Sockets (Kestrel stock) - epoll threadpool
+
+
+
Tests
+
+
(No pipelining)
+
+
+ - Synchronous lightweight plaintext "OK" response.
+ - Asynchronous workload:
_ = await Task.Run(static () => JsonSerializer.Serialize("Hello World!"));
+
+
+
The purpose of the async workload is to force the continuation onto the threadpool, not to model a heavy async workload.
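The two workloads can be sketched as a pair of handlers. This is a minimal illustration, not the benchmark's actual code: the `Workloads` class, the method names, and the "OK" body returned by the async handler are assumptions; only the `Task.Run` line comes from the test description above.

```csharp
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical handler pair used only to illustrate the two test workloads.
static class Workloads
{
    // Synchronous workload: nothing is awaited, so the response is produced
    // on whichever thread invoked the handler (the reactor thread when
    // continuations run inline).
    public static string Sync() => "OK";

    // Asynchronous workload: awaiting Task.Run hands the work to the thread
    // pool, so the code after the await normally resumes on a pool thread
    // rather than on the caller's thread.
    public static async Task<string> Async()
    {
        _ = await Task.Run(static () => JsonSerializer.Serialize("Hello World!"));
        return "OK"; // response body is an assumption
    }
}
```

The serialized value is discarded on purpose: the work itself is trivial, and the only effect that matters for the benchmark is the thread hop.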
+
+
Hardware
+
+
i9 14900k
+ 64GB DDR5 6400MHz
+ Linux Kernel 6.17.0-22-generic
+
+
Tests are done through localhost loopback (no NIC influence)
+ MTU 1500
+
+
Load generators
+
+
HTTP/1.1, no TLS
+
+
+ wrk (epoll)
+ gcannon (io_uring)
+
+
+
io_uring read+write with IVTS reactor inline continuations
+
+
This is the exact model explored throughout the series. It is expected to deliver high performance on the synchronous test.
+
+
Reactor count: 12
+
+
Sync workload
+
+
+
+
Async workload (very unstable)
+
+
+
+
io_uring read+write without IVTS reactor inline continuations
+
+
The same model explored throughout the series, but with RunAsynchronousContinuation set to true on both IVTS, so handler continuations run on the threadpool instead of the reactor thread. Expected to deliver similar results on both tests.
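The effect of that flag can be shown in isolation. The sketch below assumes the series' RunAsynchronousContinuation setting corresponds to `ManualResetValueTaskSourceCore<T>.RunContinuationsAsynchronously` from the BCL; that mapping is an assumption, and the `Signal` and `Demo` types are hypothetical stand-ins, not the series' reactor code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

// Minimal IValueTaskSource<int> backed by ManualResetValueTaskSourceCore<int>.
sealed class Signal : IValueTaskSource<int>
{
    private ManualResetValueTaskSourceCore<int> _core; // mutable struct, must not be readonly

    public Signal(bool runContinuationsAsynchronously)
        => _core.RunContinuationsAsynchronously = runContinuationsAsynchronously;

    public ValueTask<int> WaitAsync() => new ValueTask<int>(this, _core.Version);
    public void Complete(int value) => _core.SetResult(value);

    public int GetResult(short token) => _core.GetResult(token);
    public ValueTaskSourceStatus GetStatus(short token) => _core.GetStatus(token);
    public void OnCompleted(Action<object?> continuation, object? state, short token,
                            ValueTaskSourceOnCompletedFlags flags)
        => _core.OnCompleted(continuation, state, token, flags);
}

static class Demo
{
    // Returns true when the code after the await ran on the thread that
    // completed the source (the stand-in "reactor" thread).
    public static bool ContinuationRanInline(bool runContinuationsAsynchronously)
    {
        var signal = new Signal(runContinuationsAsynchronously);
        int continuationThread = -1;
        int completerThread = -2;
        using var done = new ManualResetEventSlim();

        async Task Handler()
        {
            await signal.WaitAsync().ConfigureAwait(false);
            continuationThread = Environment.CurrentManagedThreadId;
            done.Set();
        }

        _ = Handler(); // suspends at the await and registers the continuation

        var reactor = new Thread(() =>
        {
            completerThread = Environment.CurrentManagedThreadId;
            signal.Complete(42); // inline mode: the continuation runs right here
            done.Wait();         // stay alive until the continuation has run
        });
        reactor.Start();
        reactor.Join();
        return continuationThread == completerThread;
    }
}
```

Calling `Demo.ContinuationRanInline(false)` should report that the handler resumed on the completing thread, while passing `true` should report a thread pool hop. That hop is the cross-thread handoff the inline models avoid.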
+
+
Reactor count: 12
+
+
Sync workload
+
+
+
+
Async workload
+
+
+
+
io_uring read + libc send write without IVTS reactor inline continuations
+
+
The same model explored throughout the series, but with RunAsynchronousContinuation set to true on both IVTS and a write branch that bypasses io_uring and calls libc's send instead. Expected to deliver similar results on both tests. This is a hybrid approach and should be the middle ground between the first two models.
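A write branch of this kind could look roughly like the following P/Invoke sketch. It is Linux-only and built on assumptions: the `HybridWrite` class and `SendAll` helper are hypothetical, the "libc" library name and the `MSG_NOSIGNAL` value are the usual Linux ones, and the loop assumes a blocking socket (a non-blocking one would also need to handle EAGAIN).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: writes a response with libc's send instead of
// submitting an io_uring send operation.
static class HybridWrite
{
    private const int MSG_NOSIGNAL = 0x4000; // Linux value: report EPIPE instead of raising SIGPIPE

    // ssize_t send(int sockfd, const void *buf, size_t len, int flags);
    [DllImport("libc", SetLastError = true)]
    private static extern nint send(int sockfd, ref byte buf, nuint len, int flags);

    // Sends the whole buffer, retrying after partial writes.
    public static void SendAll(int fd, ReadOnlySpan<byte> data)
    {
        while (!data.IsEmpty)
        {
            // Passing the span by ref pins it for the duration of the call.
            nint sent = send(fd, ref MemoryMarshal.GetReference(data),
                             (nuint)data.Length, MSG_NOSIGNAL);
            if (sent < 0)
                throw new InvalidOperationException(
                    $"send failed, errno {Marshal.GetLastPInvokeError()}");
            data = data.Slice((int)sent);
        }
    }
}
```

The trade-off in this sketch is one direct syscall per write from whichever thread runs the handler, in exchange for not routing the write back through the reactor's submission queue.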
+
+
Reactor count: 12
+
+
Sync workload
+
+
+
+
Async workload
+
+
+
+
epoll read+write with IVTS reactor inline continuations
+
+
Pure epoll approach with the same reactor threading architecture. Handler continuations run inline for both IVTS.
+
+
Reactor count: 12
+
+
Sync workload
+
+
+
+
Async workload
+
+
+
+
epoll read+write without IVTS reactor inline continuations
+
+
Pure epoll approach with the same reactor threading architecture. Handler continuations run on the threadpool for both IVTS.
+
+
Reactor count: 6
+
+
Sync workload
+
+
+
+
Async workload
+
+
+
+
System.Net.Sockets (Kestrel stock) - epoll threadpool
+
+
Kestrel's stock network I/O with some tuning.
+
+
Sync workload
+
+
+
+
Async workload
+
+
+
+
Comparison at a glance
+
+
wrk and gcannon req/s and avg latency for every model, side by side.
+
+
+
+
CPU usage: the inline-IVTS cases (io_uring r+w IVTS, epoll r+w IVTS) cap at around 1200% max, while every other model averages ~1600%.
+
+
* Async run flagged as very unstable in the original write-up.
+
+
Conclusion
+
+
The numbers line up with part 5's rant. On a fully synchronous benchmark, io_uring with reactor inline continuations comes out ahead because there are no cross-thread handoffs.
+
+
Force the continuation on the threadpool (async workload) and that lead evaporates. The hybrid approach reclaims most of it and is a serious contender for further tests with Kestrel integration.
+
+
A quick note on the load generators, which produced quite interesting results: gcannon seems a lot more stable on latency values, while wrk is all over the place.
+
+
It is important to highlight that the reactor inline (sync) models consume on average 20% less CPU, as they are bound to the 12 reactor threads. Solutions that allow threadpool continuations, on the other hand, will use as much CPU as is available. For example, epoll r+w IVTS inline can actually yield 3.9M rps if we increase the reactor count to 16, surpassing System.Net.Sockets performance for the same CPU usage.
+
+
The epoll r+w threadpool result is very surprising: I was expecting its performance to match System.Net.Sockets. This will be quite interesting for part 7.
+
+
In part 7, some of these models will be integrated into Kestrel/ASP.NET for a direct benchmark comparison.
+
+