Fastest Python WebSocket client on plain TCP: wins or ties 12/12 benchmark cells against picows, aiohttp, websockets, and websocket-client across three server architectures. Sync and async APIs. Pure-Rust internals (PyO3 + rustls TLS), ~9 KB per idle connection.
| Your situation | Pick | Why |
|---|---|---|
| Plain Python script / CLI tool (no asyncio) | ws-rs sync | picows is async-only; 1.3–7× over websocket-client (size-dependent) |
| asyncio app, sequential RR ≤ 8 KB | ws-rs sync via to_thread | +13–28% over picows; sync bypasses asyncio overhead |
| asyncio app, pipelined / streaming | ws-rs async | mean RPS +14–21% over picows on best path |
| asyncio app, ≥ 1 MB payloads | ws-rs async ≈ picows | tied within 5%; server-side ceiling |
| RR latency-sensitive, payload ≥ 16 KB | picows | 3–5 µs lower per-message RTT (Cython cdef vs PyO3) |
| Cross-network (real server, real RTT) | Either | network RT dominates; library choice < 1% of total latency |
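The "ws-rs sync via to_thread" row can be sketched as follows. This is a minimal illustration, not code from the project: `blocking_round_trip` and `round_trip` are hypothetical helper names, and the `connect` callable is passed in as a parameter so the sketch does not assume anything about the client beyond the `connect` / `send` / `recv` calls shown in this README.

```python
import asyncio


def blocking_round_trip(connect, url, message):
    """Send one message and wait for the reply using a blocking client."""
    # `connect` is any callable returning a context manager with send()/recv().
    with connect(url) as ws:
        ws.send(message)
        return ws.recv()


async def round_trip(connect, url, message):
    """Run the blocking round trip in a worker thread.

    asyncio.to_thread keeps the event loop free to run other tasks
    while the worker thread waits on the socket.
    """
    return await asyncio.to_thread(blocking_round_trip, connect, url, message)
```

With the real client this would presumably be called as `await round_trip(connect, "ws://localhost:8765", "Hello")`, where `connect` is imported from `websocket_rs.sync.client`. Opening a new connection per call is only for brevity; a long-lived connection owned by one worker thread would avoid repeated handshakes.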
vs picows
- ws-rs wins: pipelined throughput +14–21% mean (best path); TLS small-payload sync mode +18–37%; native sync API (picows is async-only); idle-connection memory 9 KB vs 50 KB (5–6× lighter).
- picows wins: single-RT RR latency by 3–5 µs at 16 KB+ payloads (Cython direct CPython API vs PyO3 method dispatch); at 256 B–4 KB the two are essentially tied.
Neutral Rust echo server (tokio-tungstenite), pinned cores, uvloop, 1 s discarded pre-pass per cell:
| Payload | ws-rs sync | ws-rs async | picows | aiohttp | websockets | websocket-client |
|---|---|---|---|---|---|---|
| 256 B | 15.0k | 12.8k | 13.5k | 12.0k | 9.9k | 11.7k |
| 8 KB | 15.1k | 13.4k | 13.2k | 11.8k | 9.6k | 10.1k |
| 100 KB | 10.8k | 10.3k | 10.5k | 9.4k | 8.1k | 4.5k |
| 1 MB | 4.0k | 4.2k | 4.2k | 3.4k | 3.0k | 509 |
ws-rs wins or ties 12/12 cells across three server architectures (tokio-tungstenite, fastwebsockets, picows-server).
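For readers who want to reproduce a single cell, the "discarded pre-pass" idea can be sketched like this. It is an illustrative loop, not the project's benchmark script: `measure_rps`, its parameters, and the 5 s measurement window are assumptions, and core pinning and uvloop setup are left out.

```python
import time


def measure_rps(round_trip, warmup_s=1.0, measure_s=5.0, clock=time.perf_counter):
    """Return request-response round trips per second for `round_trip`.

    `round_trip` is any zero-argument callable that performs one
    send + recv. Results from the warm-up pass are thrown away.
    """
    # Pre-pass: run for `warmup_s` seconds and discard the results.
    deadline = clock() + warmup_s
    while clock() < deadline:
        round_trip()

    # Measured pass: count completed round trips inside the window.
    count = 0
    start = clock()
    deadline = start + measure_s
    while clock() < deadline:
        round_trip()
        count += 1
    elapsed = clock() - start
    return count / elapsed
```

A call might look like `measure_rps(lambda: (ws.send(payload), ws.recv()))` with an already-open sync connection `ws`.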
| Payload | ws-rs sync | ws-rs async | picows | aiohttp | websockets | websocket-client |
|---|---|---|---|---|---|---|
| 256 B | 13.2k | 9.3k | 9.6k | 9.3k | 8.0k | 9.7k |
| 8 KB | 11.9k | 8.9k | 8.8k | 8.3k | 7.2k | 8.5k |
| 100 KB | 7.1k | 5.9k | 6.0k | 5.8k | 5.3k | 3.8k |
| 1 MB | 1.4k | 1.4k | 1.5k | 1.4k | 1.4k | 461 |
Sync wins TLS 256 B–100 KB by 30–60%. At 1 MB picows leads by ~7%; ws-rs ties websockets/aiohttp. (1 MB is the realistic upper bound: Cloudflare's per-frame hard cap, Azure SignalR's default.)
~8.9 KB per idle connection (was ~130 KB before lazy-alloc landed in v0.7.0). 10 K idle connections: ~90 MB vs ~1.3 GB. Notification fan-out, IoT backends, long-poll receivers all benefit.
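The totals above follow from simple multiplication. A quick check, assuming decimal units (1 MB = 1000 KB) and using a hypothetical helper name:

```python
def idle_memory_mb(connections: int, kb_per_connection: float) -> float:
    """Total idle-connection memory in MB, assuming 1 MB = 1000 KB."""
    return connections * kb_per_connection / 1000


for label, kb in (("with lazy allocation", 8.9), ("before lazy allocation", 130.0)):
    # 10,000 idle connections at the per-connection figures quoted above.
    print(f"{label}: {idle_memory_mb(10_000, kb):,.0f} MB")
```

This gives 89 MB and 1,300 MB, which round to the ~90 MB and ~1.3 GB figures quoted.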
Full benchmarks (all 3 servers, TCP & TLS, latency distributions, compression) | Optimization Research
- TLS backend: native-tls → rustls (pure-Rust, no OpenSSL). Eliminates the OpenSSL global-state conflict that crashed picows when loaded in the same process. Cross-platform wheel builds simplified.
- asyncio.BufferedProtocol + ring-buffer recv: 64 KB pipelined +15% mean vs picows.
- Lazy buffer allocation: per-connection memory 130 KB → 8.9 KB (−93%).
- on_message callback API for users who want to bypass the per-frame Future overhead.
- Scheme-dispatched protocol class: ws:// uses BufferedProtocol; wss:// uses plain Protocol (asyncio's SSLProtocol interacts poorly with BufferedProtocol's small TLS-record callbacks).
Full details in CHANGELOG.md.
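To illustrate the scheme-dispatch idea in plain Python: the sketch below picks an `asyncio.BufferedProtocol` subclass for `ws://` and an `asyncio.Protocol` subclass for `wss://`. It is not the library's implementation (which lives in Rust); the class names, the 64 KB buffer size, and `protocol_class_for` are all hypothetical.

```python
import asyncio
from urllib.parse import urlsplit


class BufferedReceiver(asyncio.BufferedProtocol):
    """The transport writes straight into a preallocated buffer."""

    def __init__(self, size: int = 64 * 1024):
        self._buf = bytearray(size)
        self.received = bytearray()

    def get_buffer(self, sizehint: int) -> memoryview:
        # Hand the transport our buffer; no per-read bytes object is created.
        return memoryview(self._buf)

    def buffer_updated(self, nbytes: int) -> None:
        # The first `nbytes` bytes of the buffer now hold fresh data.
        self.received += self._buf[:nbytes]


class PlainReceiver(asyncio.Protocol):
    """Classic protocol: the transport delivers each chunk as bytes."""

    def __init__(self):
        self.received = bytearray()

    def data_received(self, data: bytes) -> None:
        self.received += data


def protocol_class_for(url: str):
    """Choose the protocol class from the URL scheme."""
    scheme = urlsplit(url).scheme
    if scheme == "ws":
        return BufferedReceiver
    if scheme == "wss":
        return PlainReceiver
    raise ValueError(f"unsupported scheme: {scheme!r}")
```

The returned class can be passed as the protocol factory to `loop.create_connection`. A real receiver would parse WebSocket frames out of the buffer instead of appending to `received`.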
# From PyPI (recommended)
pip install websocket-rs
# Using uv
uv pip install websocket-rs
# From source
pip install git+https://github.com/coseto6125/websocket-rs.git
# Sync API
from websocket_rs.sync.client import connect
with connect("ws://localhost:8765") as ws:
    ws.send("Hello")
    response = ws.recv()
    print(response)
# Async API
import asyncio
from websocket_rs.async_client import connect
async def main():
    async with await connect("ws://localhost:8765") as ws:
        await ws.send("Hello")
        response = await ws.recv()
        print(response)
asyncio.run(main())

| Method | Description | Example |
|---|---|---|
| `connect(url)` | Create and connect WebSocket | `ws = connect("ws://localhost:8765")` |
| `send(message)` | Send message (str or bytes) | `ws.send("Hello")` |
| `recv()` | Receive message | `msg = ws.recv()` |
| `close()` | Close connection | `ws.close()` |
# Sync
from websocket_rs.sync.client import ClientConnection
ws = ClientConnection(url, connect_timeout=10.0, receive_timeout=10.0, tcp_nodelay=True)
# Async
from websocket_rs.async_client import ClientConnection
ws = ClientConnection(url, headers={"Key": "val"}, proxy="socks5://host:port",
                      connect_timeout=10.0, receive_timeout=10.0, tcp_nodelay=True)
# Specify version (example for Linux x86_64, Python 3.12+)
uv pip install https://github.com/coseto6125/websocket-rs/releases/download/v0.6.0/websocket_rs-0.6.0-cp312-abi3-manylinux_2_34_x86_64.whl
Requirements:
- Python 3.12+
- Rust 1.80+ (rustup.rs)
git clone https://github.com/coseto6125/websocket-rs.git
cd websocket-rs
pip install maturin
maturin develop --release
[project]
dependencies = [
    "websocket-rs @ git+https://github.com/coseto6125/websocket-rs.git@main",
]
# Run API compatibility tests
python tests/test_compatibility.py
# Run picows-parity RPS benchmark (5 clients × 3 server architectures × 4 sizes, ~10 min)
python tests/benchmark_picows_parity.py
# Run latency-distribution benchmark (RR + pipelined, mean/p50/p99)
python tests/benchmark_three_servers.py
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Setup development environment
make install # Creates venv and installs dependencies
# Build and test
make dev # Development build
make test # Run tests
make bench # Run benchmarks
# Or manually with uv
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
maturin develop --release
# Install development dependencies
pip install maturin pytest websockets
# Development mode (fast iteration)
maturin develop
# Release mode (best performance)
maturin develop --release
# Watch mode (auto-recompile)
maturin develop --release --watch
- Zero-cost abstractions: Rust's async/await compiles to efficient state machines
- Tokio runtime: Work-stealing scheduler optimized for I/O-bound tasks
- No GIL: True parallelism for concurrent operations
- Memory safety: No segfaults, data races, or memory leaks
Sync Client: pure tungstenite (blocking I/O), no async runtime overhead:
Python → GIL release → Rust tungstenite → socket I/O → result
Async Client: actor pattern with tokio-tungstenite:
Python → mpsc channel → tokio background task → socket I/O
  ↳ ReadyFuture fast path when data is buffered
The async bridge adds ~36 µs per operation (channel + call_soon_threadsafe).
This is negligible for large payloads or cross-network usage, but visible on localhost small messages.
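The shape of that bridge can be illustrated in pure Python, with a `queue.Queue` and a worker thread standing in for the Rust mpsc channel and the tokio background task. This is a conceptual sketch only, not the library's code: `ThreadBridge` and its methods are hypothetical names, and error handling (handler exceptions, cancelled futures) is omitted.

```python
import asyncio
import queue
import threading


class ThreadBridge:
    """Forward requests to a worker thread; resolve futures on the loop."""

    def __init__(self, loop, handler):
        self._loop = loop
        self._handler = handler          # stands in for the socket I/O
        self._requests = queue.Queue()   # stands in for the mpsc channel
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:             # shutdown sentinel
                break
            payload, future = item
            result = self._handler(payload)
            # asyncio futures are not thread-safe, so the result is
            # handed back by scheduling set_result on the loop thread.
            self._loop.call_soon_threadsafe(future.set_result, result)

    def submit(self, payload):
        """Queue one request and return a future for its result."""
        future = self._loop.create_future()
        self._requests.put((payload, future))
        return future

    def close(self):
        self._requests.put(None)
        self._thread.join()


async def demo():
    bridge = ThreadBridge(asyncio.get_running_loop(), lambda p: p.upper())
    try:
        return await bridge.submit("hello")
    finally:
        bridge.close()
```

Each operation crosses the queue once and the event loop's thread-safe callback path once, which is where a fixed per-operation cost of the kind described above comes from.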
# Check Rust version
rustc --version # Requires >= 1.80
# Clean and rebuild
cargo clean
maturin develop --release
# Verbose mode
maturin develop --release -v
- TimeoutError: Increase the `connect_timeout` parameter
- Module not found: Run `maturin develop` first
- Connection refused: Check that the server is running
- Performance not as expected: Make sure you are using a `--release` build
Contributions welcome! Please ensure:
- All tests pass
- API compatibility is maintained
- Performance benchmarks included
- Documentation updated
MIT License - See LICENSE
- PyO3 - Rust Python bindings
- Tokio - Async runtime
- tokio-tungstenite - WebSocket implementation
- websockets - Python WebSocket library
- picows - High-performance Python WebSocket client. Special thanks to @tarasko for issue #11, which prompted the cross-validation benchmark methodology (multiple servers, controlled CPU pinning, statistically meaningful iteration counts) now used throughout tests/benchmark_*.py. The result table you see above wouldn't exist without that nudge.