A native Rust implementation of the SLIDE paper family (Sub-LInear Deep learning Engine), with extensions for LLM inference, transformer sparsity prediction, and private two-party computation.
AGPL-3.0 with additional terms. See LICENSE for details. Commercial use is restricted to the Quilibrium mainnet. Automated reproduction (including LLM-assisted "clean room" reimplementation) for commercial substitutes is expressly prohibited.
Klearu is organized as a Cargo workspace with 10 crates:
| Crate | Description |
|---|---|
| klearu-core | Foundation: LSH hash families, sparse tensors, SLIDE network training |
| klearu-accel | SIMD vectorization (AVX2/NEON/scalar), BF16 quantization, cache-aligned memory |
| klearu-mongoose | Learnable hash functions, adaptive rebuild scheduling with drift detection |
| klearu-bolt | LSH hyperparameter autotuning, sparse inference optimizations |
| klearu-dejavu | Deja Vu transformer sparsity prediction (attention heads + MLP neurons) |
| klearu-llm | LLaMA-compatible LLM inference with optional sparsity |
| klearu | Facade crate with feature-gated re-exports |
| klearu-dpf | Distributed Point Functions (AES-based BGI construction) and DCF |
| klearu-mpc | 2PC building blocks: Q16.16/Q32.32 fixed-point, Beaver triples, additive sharing |
| klearu-private | Private LLM inference via 2PC with Ferret OT and Ristretto255 OPRF |
The core crates build standalone. The klearu-private crate depends on the ferret crate from the Quilibrium monorepo via a relative path. To build the full workspace including private inference, clone both repositories as siblings:
```
your-workspace/
  klearu/       # this repository
  monorepo/     # git clone https://github.com/quilibriumnetwork/monorepo
```
Then build from inside klearu/:
```sh
# Full workspace (requires monorepo sibling for klearu-private)
cargo build --release

# With specific features via the facade crate
cargo build --release -p klearu --features full

# LLM inference only (no monorepo needed)
cargo build --release -p klearu-llm

# LLM with sparse inference (no monorepo needed)
cargo build --release -p klearu-llm --features sparse

# Private inference (requires monorepo sibling)
cargo build --release -p klearu-private

# Run the test suite
cargo test --workspace
```

The foundation crate provides LSH-based sub-linear training and inference.
Hash families (HashFamily trait): SimHash, WtaHash, DwtaHash, MinHash, SparseRandomProjection
LSH index (LshIndexTrait): query(), query_union(), query_with_counts() — with FIFO or reservoir-sampled buckets
Network: Full SLIDE training loop with configurable layers, optimizers, and sampling strategies.
```rust
use klearu_core::config::*;
use klearu_core::network::Network;

let config = SlideConfig {
    network: NetworkConfig {
        layers: vec![
            LayerConfig::hidden(784, 1024),
            LayerConfig::output(1024, 10),
        ],
        optimizer: OptimizerType::Adam,
        learning_rate: 0.001,
        batch_size: 128,
        num_threads: 4,
    },
    seed: 42,
    hogwild: true,
};

let mut network = Network::new(config);
```

| Parameter | Default | Description |
|---|---|---|
| num_tables (L) | 50 | Number of LSH hash tables |
| num_hashes (K) | 6 | Hash bits per table |
| bucket_capacity | 128 | Max neurons per bucket |
| bucket_type | FIFO | FIFO or Reservoir sampling |
| hash_function | SimHash | SimHash, WtaHash, DwtaHash, MinHash, SRP |
| rebuild_interval_base | 100 | Steps between LSH rebuilds |
| rebuild_decay | 0.1 | Exponential decay for rebuild interval |
| optimizer | Adam | Adam or SGD |
| activation | ReLU | ReLU, Sigmoid, Tanh, Softmax |
| sampling | Vanilla | Vanilla, TopK, Threshold |
| hogwild | false | Lock-free parallel training |
Platform-adaptive SIMD (AVX2 on x86, NEON on ARM, scalar fallback) for dot products and scatter-add. BF16 quantization with two modes: full BF16 or BF16-storage/FP32-gradient. ContiguousWeightStore provides cache-line-aligned (64-byte) weight layouts.
Trainable hash functions that adapt to data distribution, plus an AdaptiveScheduler that monitors hash-bucket drift via EMA and triggers rebuilds only when needed.
| Parameter | Default | Description |
|---|---|---|
| min_interval | — | Minimum steps between rebuild checks |
| max_interval | — | Forced rebuild interval |
| sample_fraction | — | Fraction of neurons to sample for drift |
| drift_threshold | — | Drift level that triggers a rebuild |
| ema_alpha | 0.3 | Exponential moving average smoothing |
Automatic LSH hyperparameter search over K and L to hit a target recall while minimizing query cost.
```rust
use klearu_bolt::autotune::LshAutotuner;

let tuner = LshAutotuner::new(0.9) // target 90% recall
    .with_k_range(4, 16)
    .with_l_range(10, 200)
    .with_num_samples(100)
    .with_speedup_ratio(0.1);

let result = tuner.autotune(&neurons, &queries, 42);
// result.best_k, result.best_l, result.recall, result.query_cost
```

Implementation of the Deja Vu paper: lightweight MLP predictors that identify which attention heads and FFN neurons are important for each token, enabling sparse transformer inference.
A LLaMA-compatible inference engine supporting GQA, RoPE, RMSNorm, and SwiGLU. Works with any HuggingFace-format model that uses the LLaMA architecture.
| Parameter | Default | Description |
|---|---|---|
| temperature | 0.7 | Sampling temperature (0.0 = greedy) |
| top_k | 40 | Top-k filtering (0 = disabled) |
| top_p | 0.9 | Nucleus sampling (1.0 = disabled) |
| repetition_penalty | 1.1 | Penalize repeated tokens (1.0 = disabled) |
| max_new_tokens | 512 | Maximum tokens to generate |
| template | auto | Chat template (auto, zephyr, chatml, llama2, llama3, mistral, raw) |
| Parameter | Default | Description |
|---|---|---|
| head_sparsity | 0.5 | Fraction of attention heads to keep |
| neuron_sparsity | 0.5 | Fraction of FFN neurons to keep |
AES-based DPF using the BGI construction, plus DCF (Distributed Comparison Functions) via prefix decomposition into DPFs. Used as a building block for the MPC protocols.
Fixed-point arithmetic in Q16.16 (u32 shares) and Q32.32 (u64 shares), additive secret sharing, Beaver triple multiplication, polynomial SiLU approximation, and reveal-and-correct RMSNorm. Provides a Transport trait for abstracting communication.
End-to-end private inference combining Ferret COT (Correlated Oblivious Transfer), Ristretto255 OPRF, and the MPC building blocks. Two security levels:
| Level | Communication | Privacy | Speed |
|---|---|---|---|
| Lower | ~4.6 KB/token | Server learns nothing; client embedding revealed then plaintext forward | Fast |
| High | ~2 MB/token, ~34K triples | Only norms, queries, and gate values revealed | Slower |
Klearu works with any HuggingFace LLaMA-architecture model in safetensors format. SmolLM models are a good starting point for testing:
```sh
# Install the HuggingFace CLI if you don't have it
pip install huggingface-hub

# Download SmolLM-135M-Instruct (~270 MB)
huggingface-cli download HuggingFaceTB/SmolLM-135M-Instruct \
  --local-dir SmolLM-135M-Instruct

# Or a larger model — SmolLM-360M-Instruct (~720 MB)
huggingface-cli download HuggingFaceTB/SmolLM-360M-Instruct \
  --local-dir SmolLM-360M-Instruct

# Or SmolLM-1.7B-Instruct (~3.4 GB)
huggingface-cli download HuggingFaceTB/SmolLM-1.7B-Instruct \
  --local-dir SmolLM-1.7B-Instruct
```

The model directory should contain at minimum:

- config.json — HuggingFace model configuration
- tokenizer.json — Tokenizer
- *.safetensors — Model weights
```sh
# Basic chat (auto-detects chat template)
cargo run --release --bin chat -- ./SmolLM-135M-Instruct

# With custom sampling parameters
cargo run --release --bin chat -- ./SmolLM-135M-Instruct \
  --temp 0.8 --top-k 50 --top-p 0.95 --max-tokens 256

# With a system prompt
cargo run --release --bin chat -- ./SmolLM-135M-Instruct \
  --system "You are a helpful coding assistant."

# Force a specific chat template
cargo run --release --bin chat -- ./SmolLM-135M-Instruct \
  --template chatml
```

The chat binary starts an interactive loop — type your message and press Enter. Use Ctrl-D to quit.
First calibrate sparsity predictors, then run with --sparse:
```sh
# Train predictors (requires sparse feature)
cargo run --release --features sparse --bin calibrate -- ./SmolLM-135M-Instruct \
  --samples 16 --epochs 100

# Chat with sparse inference
cargo run --release --features sparse --bin chat -- ./SmolLM-135M-Instruct \
  --sparse --head-sparsity 0.5 --neuron-sparsity 0.5
```

Validate that a model loads and runs correctly:

```sh
cargo run --release --bin diagnose -- ./SmolLM-135M-Instruct
```

This checks config parsing, weight loading, tokenizer functionality, forward pass sanity, and greedy generation.
Run inference where the server holds the model weights and the client's input tokens remain private:
```sh
# Terminal 1 — start the server
cargo run --release --bin private-server -- ./SmolLM-135M-Instruct \
  --port 9000 --security lower

# Terminal 2 — connect the client
cargo run --release --bin private-client -- ./SmolLM-135M-Instruct \
  --host localhost:9000 --security lower
```

For development and testing, add --dummy-triples to both sides to skip Ferret OT setup. For real security, omit this flag to use actual oblivious transfer.
The facade crate (klearu) provides feature-gated access to all functionality:
| Feature | Enables |
|---|---|
| simd | SIMD-accelerated dot products and scatter-add |
| bf16 | BF16 quantization |
| mongoose | Learnable hashing and adaptive scheduling |
| bolt | LSH autotuning |
| deja-vu | Transformer sparsity prediction |
| llm | LLM inference engine |
| full | All of the above |
The sparse feature on klearu-llm enables Deja Vu sparse inference and the calibrate binary.
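A downstream Cargo.toml might enable a subset of these features like the following sketch — the path is a placeholder for wherever the workspace is checked out:

```toml
[dependencies]
klearu = { path = "../klearu/klearu", features = ["llm", "simd"] }
```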