By default, Loom spawns one worker thread per available CPU core (including logical cores with hyperthreading). You can customize this:
```zig
const pool = try ThreadPool.init(allocator, .{
    .num_threads = 4, // Fixed thread count
});
```

Or leave it as `null` to use all available cores:

```zig
const pool = try ThreadPool.init(allocator, .{
    .num_threads = null, // Default: CPU count
});
```

Loom uses work stealing based on the Chase-Lev deque algorithm. Here's how it works:
- When you call `join(a, b)` from worker thread W, W places task `b` into its local deque
- W immediately starts executing `a`
- Other idle threads can "steal" `b` from W's deque (FIFO order)
- If no one steals `b`, W executes it after completing `a`
This approach ensures:
- Cache locality: Threads prefer their own recent work (LIFO)
- Load balancing: Idle threads steal from busy ones
- Low overhead: ~5-10ns for local operations, ~20-50ns for steals
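To make the owner/thief split concrete, here is a minimal, mutex-based sketch of the deque semantics. The type and method names are illustrative, not Loom's internals, and the real Chase-Lev deque is lock-free (atomics instead of a mutex); the point is only the LIFO-bottom/FIFO-top discipline:

```zig
const std = @import("std");

/// Illustrative work-stealing deque: the owning worker pushes and
/// pops at the bottom (LIFO), idle thieves steal from the top (FIFO).
fn WorkDeque(comptime T: type) type {
    return struct {
        const Self = @This();

        items: std.ArrayList(T),
        mutex: std.Thread.Mutex = .{},

        fn init(allocator: std.mem.Allocator) Self {
            return .{ .items = std.ArrayList(T).init(allocator) };
        }

        /// Owner only: push a freshly forked task at the bottom.
        fn pushBottom(self: *Self, task: T) !void {
            self.mutex.lock();
            defer self.mutex.unlock();
            try self.items.append(task);
        }

        /// Owner only: pop the most recently pushed task (LIFO),
        /// the one most likely to still be hot in cache.
        fn popBottom(self: *Self) ?T {
            self.mutex.lock();
            defer self.mutex.unlock();
            return self.items.popOrNull();
        }

        /// Any thief: steal the oldest task (FIFO), the one least
        /// likely to overlap with the owner's working set.
        fn steal(self: *Self) ?T {
            self.mutex.lock();
            defer self.mutex.unlock();
            if (self.items.items.len == 0) return null;
            return self.items.orderedRemove(0);
        }
    };
}
```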
Zig doesn't have Rust's borrow checker, so thread safety is your responsibility. Common patterns:
Read-only data — Pass by pointer, no synchronization needed:

```zig
const ctx = Context{ .threshold = 100.0, .lookup_table = &table };
loom.par_iter(data)
    .withContext(&ctx)
    .for_each(process);
```

Counters — Use `std.atomic.Value`:
```zig
// Declared at container (file) scope: a nested fn cannot capture
// function-local variables in Zig.
var count = std.atomic.Value(usize).init(0);

loom.par_iter(data).for_each(struct {
    fn process(item: *Item) void {
        if (item.matches()) {
            _ = count.fetchAdd(1, .monotonic);
        }
    }
}.process);
```

Complex mutations — Use `std.Thread.Mutex`:
```zig
// Like `count` above, `mutex` and `results` must be visible to the
// nested fn, e.g. declared at container scope.
var mutex = std.Thread.Mutex{};
var results = std.ArrayList(Result).init(allocator);

loom.par_iter(data).for_each(struct {
    fn process(item: *Item) void {
        const result = compute(item);
        mutex.lock();
        defer mutex.unlock();
        results.append(result) catch {};
    }
}.process);
```

Best performance — Thread-local accumulation with final merge (use `reduce`):
```zig
const sum = loom.par_iter(data).reduce(Reducer(i64).sum());
```

No. Unlike Rust's Rayon, Loom cannot guarantee data-race freedom at compile time because Zig lacks a borrow checker.
However, Loom's API encourages race-free patterns:
- `map()`, `filter()`, and `reduce()` operate on independent elements
- `reduce()` uses thread-local accumulators merged at the end
- Context is passed as a pointer, making shared state explicit
Safe usage is the programmer's responsibility. Follow the patterns above and you'll be fine.
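A useful rule of thumb: mutating only the element a task was handed is always race-free, because no two tasks ever receive the same element; anything shared needs one of the synchronization patterns above. A sketch (the `Item` field names are illustrative):

```zig
// Race-free without any locking: each invocation mutates only the
// single element it was handed, so writes never overlap.
loom.par_iter(data).for_each(struct {
    fn normalize(item: *Item) void {
        item.value /= item.scale; // touches only this element
    }
}.normalize);
```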
Parallel iteration adds overhead. Avoid it when:
- Small data sets — Below ~1000 elements, sequential is faster
- Trivial operations — If the per-element work is tiny (e.g., simple addition), overhead dominates
- I/O-bound work — Disk/network operations don't benefit from CPU parallelism
- Already parallelized — Don't nest `par_iter` inside `par_iter` (see the sketch below)
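For the nesting case, parallelize the outer level only and keep the inner loop sequential; a hedged sketch with illustrative `rows`/`Row`/`processCell` names:

```zig
// Anti-pattern: a nested par_iter spawns tasks from inside tasks,
// adding scheduling overhead without adding usable parallelism.
//
// Better: parallelize the outer loop; run the inner loop
// sequentially so each task carries enough work.
loom.par_iter(rows).for_each(struct {
    fn process(row: *Row) void {
        for (row.cells) |*cell| processCell(cell);
    }
}.process);
```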
Use `.withMinChunk()` to control the sequential threshold:

```zig
loom.par_iter(data)
    .withMinChunk(1000) // Don't parallelize below 1000 elements
    .for_each(process);
```

- Reduce threads — Test with 1 thread to isolate concurrency bugs:

  ```zig
  const pool = try ThreadPool.init(allocator, .{ .num_threads = 1 });
  ```

- Add logging — Use `std.debug.print` with the thread ID:

  ```zig
  std.debug.print("Thread {d}: processing item {d}\n", .{
      std.Thread.getCurrentId(),
      item.id,
  });
  ```

- Use sanitizers — Build with thread sanitizer (when Zig supports it)
- Simplify — Replace `par_iter` with a regular `for` loop to verify sequential correctness first (see the test sketch below)
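The first and last tips combine naturally into a regression test that checks the parallel result against a plain sequential loop. A sketch, assuming `std`, `loom`, and `Reducer` are imported and called as in the earlier examples:

```zig
test "parallel sum matches sequential sum" {
    const data = [_]i64{ 3, 1, 4, 1, 5, 9, 2, 6 };

    // Sequential reference: a plain for loop.
    var expected: i64 = 0;
    for (data) |x| expected += x;

    // Parallel version under test.
    const actual = loom.par_iter(data[0..]).reduce(Reducer(i64).sum());

    try std.testing.expectEqual(expected, actual);
}
```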
`join(a, b)` — Binary fork-join for exactly 2 tasks:

```zig
const left, const right = loom.join(
    computeLeft, .{data[0..mid]},
    computeRight, .{data[mid..]},
);
```

`scope` — Dynamic spawning for a variable number of tasks:
```zig
loom.scope(struct {
    fn body(s: *loom.Scope) void {
        for (items) |item| {
            s.spawn(process, .{item});
        }
    }
}.body);
// All spawned tasks complete before the scope exits
```

Use `join` for divide-and-conquer algorithms; use `scope` for task parallelism.
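To make the divide-and-conquer guidance concrete, here is a sketch of a recursive parallel sum built on `join`; the cutoff value and function name are illustrative:

```zig
// Recursive parallel sum: split until the slice is small, then fall
// back to a sequential loop so task overhead stays amortized.
fn parallelSum(data: []const i64) i64 {
    const cutoff = 1024; // illustrative threshold, tune per workload
    if (data.len <= cutoff) {
        var total: i64 = 0;
        for (data) |x| total += x;
        return total;
    }
    const mid = data.len / 2;
    const left, const right = loom.join(
        parallelSum, .{data[0..mid]},
        parallelSum, .{data[mid..]},
    );
    return left + right;
}
```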
Not directly. Loom is designed for CPU-bound parallelism using OS threads, not async I/O. For I/O-bound workloads, consider Zig's std.event or other async solutions.
However, you can use Loom for the CPU-intensive parts of your application and async for I/O.
| Feature | Loom (Zig) | Rayon (Rust) |
|---|---|---|
| Work stealing | Yes | Yes |
| Parallel iterators | Yes | Yes |
| Fork-join | Yes | Yes |
| Data-race freedom | No (programmer responsibility) | Yes (compiler enforced) |
| Async integration | No | Limited |
| Zero dependencies | Yes | Yes |
Loom brings Rayon's ergonomics to Zig, with the trade-off that safety is not compiler-enforced.
Documentation:
- README.md — Quick start and API reference
- LOOM_IMPLEMENTATION.md — Architecture deep dive
Samples:
| Sample | Description |
|---|---|
| loom_api_showcase.zig | Comprehensive API examples |
| basic_join.zig | Binary fork-join basics |
| scope_spawn.zig | Structured concurrency with scopes |
| parallel_sum.zig | Parallel reduction |
| parallel_map.zig | Parallel transformation |
| parallel_find.zig | Parallel search |
| parallel_sort.zig | Parallel sorting |
| map_collect.zig | Map and collect results |
| vec_collect.zig | Vector collection |
| chunks_pipeline.zig | Chunk-based processing |
| quicksort.zig | Divide-and-conquer quicksort |
| mergesort.zig | Divide-and-conquer mergesort |
| fibonacci.zig | Recursive parallelism |
| factorial.zig | Parallel factorial |
| matrix_multiply.zig | Matrix operations |
| monte_carlo_pi.zig | Monte Carlo simulation |
| prime_sieve.zig | Prime number sieve |
| pythagoras.zig | Pythagorean triples |
| game_of_life.zig | Conway's Game of Life |
| nbody_sim.zig | N-body simulation |
| tree_traverse.zig | Parallel tree traversal |
| tsp.zig | Traveling salesman problem |
| word_count.zig | Parallel word counting |
| csv_transform_mmap_with_loom.zig | CSV processing (memory-mapped) |
| csv_transform_in_mem_with_loom.zig | CSV processing (in-memory) |
| image_processing.zig | Image processing |
| image_transform.zig | Image transformation |
| custom_pool_stats.zig | Thread pool statistics |
| join_microbench.zig | Join performance benchmark |
| context_api_bench.zig | Context API benchmark |
| noop_bench.zig | No-op overhead benchmark |