An idiomatic Rust port of the FastFlow parallel programming framework. Built from the ground up using Rust patterns — no FFI, and no `unsafe` in user-facing code.
paraflow provides high-level algorithmic skeleton abstractions for structured parallel programming on shared-memory multi-core systems.
- Pipeline — linear chain of processing stages, each running in its own thread
- Farm — master-worker parallelism with configurable scheduling
- All-to-All — shuffle communication between two sets of workers
- Combine — fuse two nodes into one thread for zero-overhead composition
- Macro Data Flow — DAG-based task scheduling with automatic RAW/WAW dependency resolution
- Type-safe builders — compile-time checking that adjacent stages have matching types
- Closure-based nodes — `FnNode`, `FnSource`, `FnSink` for quick prototyping
- GPU acceleration — optional `paraflow-gpu` crate for compute shaders via wgpu (Vulkan/Metal/DX12)
- Lock-free queues — custom SPSC and MPMC queues (`paraflow-queue`) for minimal-overhead inter-stage communication; 2.5x faster than crossbeam for pipelines
- No `unsafe` in user code — enforced with `#![deny(unsafe_code)]` across all crates except `paraflow-queue` (which uses `unsafe` internally for lock-free data structures)
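To ground the Farm idea before diving into paraflow's API, here is a minimal master-worker sketch in plain `std` Rust (no paraflow types): a shared task channel feeds several worker threads, which pull work on demand and send results back. Worker count and the squaring workload are illustrative choices, not anything from paraflow.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Minimal master-worker ("farm") sketch using std channels only;
// paraflow's Farm adds scheduling policies, ordering, and lock-free queues.
fn main() {
    let (task_tx, task_rx) = mpsc::channel::<u64>();
    // Share one receiver among workers so each task goes to exactly one of them.
    let task_rx = Arc::new(Mutex::new(task_rx));
    let (res_tx, res_rx) = mpsc::channel::<u64>();

    // Spawn 4 workers that each pull tasks and send back results.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let rx = Arc::clone(&task_rx);
            let tx = res_tx.clone();
            thread::spawn(move || loop {
                // Take the lock only long enough to receive one task.
                let task = { rx.lock().unwrap().recv() };
                match task {
                    Ok(x) => tx.send(x * x).unwrap(),
                    Err(_) => break, // channel closed: no more tasks
                }
            })
        })
        .collect();
    drop(res_tx); // keep only the workers' clones alive

    // Master: emit tasks, then close the channel to signal end-of-stream.
    for i in 1..=8 {
        task_tx.send(i).unwrap();
    }
    drop(task_tx);

    // Collect until every worker has exited and dropped its sender.
    let mut results: Vec<u64> = res_rx.iter().collect();
    for w in workers {
        w.join().unwrap();
    }
    results.sort();
    println!("{results:?}"); // squares of 1..=8, order restored by sorting
}
```

Because workers pull from a shared channel, faster workers naturally take more tasks — the same effect paraflow's on-demand scheduling policy provides.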
A minimal three-stage pipeline:

```rust
use paraflow::pipeline::Pipeline;
use paraflow_core::message::NodeResult;
use paraflow_core::node::{FnNode, FnSink, FnSource};

fn main() {
    // Source: emits 1..=10, then ends the stream.
    let source = FnSource::new({
        let mut i = 0u64;
        move || {
            if i < 10 { i += 1; Some(i) } else { None }
        }
    });

    // Stage: doubles each value.
    let doubler = FnNode::new(|x: u64| NodeResult::Continue(x * 2));

    // Sink: prints each result.
    let printer = FnSink::new(|x: u64| println!("Result: {x}"));

    let result = Pipeline::builder()
        .add_source(source)
        .then(doubler)
        .then_sink(printer)
        .run_and_wait();

    match result {
        Ok(()) => println!("Pipeline complete!"),
        Err(e) => eprintln!("Pipeline failed: {e}"),
    }
}
```

Workspace layout:

```text
paraflow/
├── crates/
│   ├── paraflow-queue/    # Lock-free SPSC & MPMC bounded queues
│   ├── paraflow-core/     # Traits, message types, channels, thread runtime
│   ├── paraflow/          # Skeletons: Pipeline, Farm, A2A, Combine
│   ├── paraflow-patterns/ # High-level: ParallelFor, Map, D&C, Stencil, MDF, etc.
│   └── paraflow-gpu/      # GPU acceleration via wgpu (compute shaders)
└── examples/
```
Nodes are the basic units of computation. Each node runs in its own thread and communicates via channels.
| Trait | Purpose |
|---|---|
| `Node<In, Out>` | Transforms input to output |
| `SourceNode<Out>` | Generates data (first pipeline stage) |
| `SinkNode<In>` | Consumes data (last pipeline stage) |
| `MultiInputNode` | Receives from multiple channels |
| `MultiOutputNode` | Sends to multiple channels |
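As a rough sketch of the division of labor, the first three traits might look like the following. This is a hypothetical simplification, not paraflow's actual definitions — the real traits carry the `Message`/`NodeResult` machinery described below — and the `Counter`/`Doubler` types are invented for illustration.

```rust
// Simplified, hypothetical shapes of the node traits.
trait SourceNode {
    type Out;
    /// Produce the next item, or None when the stream ends.
    fn next(&mut self) -> Option<Self::Out>;
}

trait Node {
    type In;
    type Out;
    fn process(&mut self, input: Self::In) -> Self::Out;
}

trait SinkNode {
    type In;
    fn consume(&mut self, input: Self::In);
}

// Example source: emits 1, 2, 3 and then ends.
struct Counter { i: u64 }
impl SourceNode for Counter {
    type Out = u64;
    fn next(&mut self) -> Option<u64> {
        if self.i < 3 { self.i += 1; Some(self.i) } else { None }
    }
}

// Example stage: doubles each value.
struct Doubler;
impl Node for Doubler {
    type In = u64;
    type Out = u64;
    fn process(&mut self, x: u64) -> u64 { x * 2 }
}

fn main() {
    let mut src = Counter { i: 0 };
    let mut stage = Doubler;
    // Drive the pipeline manually: source -> stage -> print.
    while let Some(x) = src.next() {
        println!("{}", stage.process(x)); // prints 2, 4, 6
    }
}
```

In paraflow each of these roles runs in its own thread with channels in between; here the loop drives them on one thread purely to show the data flow.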
Messages flow between nodes as `Message<T>`:

```rust
enum Message<T> { Data(T), Eos, Eosw }
```

`NodeResult` controls what a node emits:

```rust
enum NodeResult<T> { Continue(T), GoOn, Stop, Multi(Vec<T>) }
```

Fuse two nodes into one to eliminate a channel and a thread:

```rust
use paraflow::combine::Combine;

let fused = Combine::new(
    FnNode::new(|x: u64| NodeResult::Continue(x * 2)),
    FnNode::new(|x: u64| NodeResult::Continue(x + 1)),
);
// fused implements Node<In = u64, Out = u64>, runs in a single thread
```

| C++ FastFlow | paraflow |
|---|---|
| `void*` + sentinel pointers (`FF_EOS`, `FF_GO_ON`) | `Message<T>` and `NodeResult<T>` enums |
| Class inheritance (`ff_node`, `ff_minode`) | Traits with associated types |
| Hand-rolled lock-free SPSC queues | Custom lock-free SPSC & MPMC queues (`paraflow-queue`) with crossbeam fallback for `Select` |
| `ff_comb` merges via internal wiring | `Combine<N1, N2>` implements `Node` directly |
| Runtime type casts | Compile-time type checking via generic builders |
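The last row can be illustrated with a self-contained sketch in plain Rust. The `Chain` type below is hypothetical (paraflow's real builder API differs), but it shows the same idea: `then` only accepts a stage whose input type matches the previous stage's output type, so a mismatch is a compile error rather than a runtime cast failure.

```rust
// Hypothetical sketch of compile-time-checked stage chaining.
struct Chain<I, O> {
    run: Box<dyn Fn(I) -> O>,
}

impl<I: 'static, O: 'static> Chain<I, O> {
    fn new(f: impl Fn(I) -> O + 'static) -> Self {
        Chain { run: Box::new(f) }
    }

    // Compiles only when the next stage consumes this chain's output type O.
    fn then<N: 'static>(self, f: impl Fn(O) -> N + 'static) -> Chain<I, N> {
        let prev = self.run;
        Chain { run: Box::new(move |x| f(prev(x))) }
    }
}

fn main() {
    let pipeline = Chain::new(|x: u64| x * 2)
        .then(|x| x + 1)
        .then(|x| format!("Result: {x}"));
    println!("{}", (pipeline.run)(5)); // 5*2 = 10, +1 = 11 -> "Result: 11"
    // A stage like `.then(|s: String| s.len())` after a u64 stage
    // would fail to compile here, mirroring paraflow's typed builders.
}
```

FastFlow's `void*`-based stages defer this check to runtime casts; encoding the stage types in the builder's generics moves the error to compile time at zero runtime cost.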
A step-by-step tutorial covers all core patterns with side-by-side FastFlow C++ and paraflow Rust comparisons. Each section has a runnable example:
```shell
cargo run --example tutorial_1_pipeline        # 3-stage pipeline
cargo run --example tutorial_2_farm            # Farm with workers
cargo run --example tutorial_3_farm_scheduling # On-demand scheduling
cargo run --example tutorial_4_ordered_farm    # Order-preserving farm
cargo run --example tutorial_5_parallel_for    # Parallel loops
cargo run --example tutorial_6_parallel_reduce # Parallel reduction
cargo run --example tutorial_7_divide_conquer  # Divide & conquer (merge sort)
cargo run --example tutorial_8_combine         # Node fusion
cargo run --example tutorial_9_feedback        # Iterative feedback loop
```

Development commands:

```shell
# Run tests
cargo test --workspace

# Run the example
cargo run --example simple_pipeline -p paraflow

# Lint
cargo clippy --workspace -- -D warnings
```

- Phase 1 — Foundation: core traits, pipeline, combine
- Phase 2 — Farm skeleton with scheduling policies
- Phase 3 — All-to-All and skeleton composition/nesting
- Phase 4 — High-level patterns (ParallelFor, ParallelReduce, Map, Divide & Conquer, TaskPool, PoolEvolution, Stencil, GraphSearch, NodeSelector)
- Phase 5 — Benchmarks and optimization (criterion micro-benchmarks, NodeContext stack optimization, lock-free SPSC/MPMC queues)
- Phase 6 — Macro Data Flow (DAG-based task scheduling with automatic dependency resolution)
- Phase 7 — GPU acceleration via wgpu (GpuNode trait, batch processing, GPU stencil/parallel_for/parallel_reduce)
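To make the Macro Data Flow idea concrete, here is a tiny illustration of RAW (read-after-write) dependency resolution in plain Rust. It is not paraflow's API — `Task`, `run`, and the variable names are invented for this sketch — but it shows the core mechanic: a task becomes runnable only once every variable it reads has been written.

```rust
use std::collections::HashMap;

// Hypothetical mini-MDF: each task reads some variables and writes one.
struct Task {
    name: &'static str,
    reads: Vec<&'static str>,
    writes: &'static str,
    op: fn(&HashMap<&'static str, i64>) -> i64,
}

fn run(tasks: Vec<Task>, mut vars: HashMap<&'static str, i64>) -> HashMap<&'static str, i64> {
    let mut pending = tasks;
    // Repeatedly fire any task whose inputs (RAW dependencies) are available.
    while !pending.is_empty() {
        let i = pending
            .iter()
            .position(|t| t.reads.iter().all(|r| vars.contains_key(r)))
            .expect("cyclic dependencies");
        let t = pending.remove(i);
        let v = (t.op)(&vars);
        println!("{} -> {} = {}", t.name, t.writes, v);
        vars.insert(t.writes, v);
    }
    vars
}

fn main() {
    // "add" is listed first but must wait until both of its inputs exist.
    let tasks = vec![
        Task { name: "add",   reads: vec!["a", "b"], writes: "c", op: |v| v["a"] + v["b"] },
        Task { name: "set_a", reads: vec![],         writes: "a", op: |_| 2 },
        Task { name: "set_b", reads: vec![],         writes: "b", op: |_| 3 },
    ];
    let out = run(tasks, HashMap::new());
    assert_eq!(out["c"], 5);
}
```

A full MDF runtime would additionally serialize WAW (write-after-write) conflicts on the same variable and dispatch ready tasks to a worker pool in parallel; both are omitted here for brevity.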
License: MIT