Get benchmarking in 5 minutes on multi-core NUMA systems.
- Python 3.7+
- `numactl` (for NUMA pinning)
- At least one llama.cpp build with a `llama-bench` binary
- Multi-core CPU with NUMA (AMD Threadripper, EPYC, Intel Xeon, etc.)
```bash
cd /var/www/hft-cpu-test
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Start with the minimal template:

```bash
cp configs/minimal.yaml configs/mytest.yaml
```

Edit `configs/mytest.yaml`:
- Set the model path:

  ```yaml
  model:
    path: /data/models/Qwen2.5-14B-Q4_K_M.gguf
    name: Qwen2.5-14B-Q4_K_M
  ```
- Set the build path:

  ```yaml
  builds:
    - name: my-build
      path: /opt/llama.cpp/build/bin/llama-bench
      provider: OpenBLAS  # or BLIS-OpenMP, MKL, none
      env:
        OPENBLAS_NUM_THREADS: "1"
        OMP_NUM_THREADS: "1"

  builds_select:
    - my-build
  ```
- Configure NUMA pinning (adjust for your CPU):

  First, check your CPU topology:

  ```bash
  lscpu --parse=CPU,Core,Node
  numactl --hardware
  ```

  Then update the `physcpubind` values with your physical core IDs (not SMT/HT siblings):

  ```yaml
  pinning:
    presets:
      all-cores:
        numactl: "-N 0,1 -m 0,1 --physcpubind=<your_physical_core_ids>"
  ```
Check that your system is ready:

```bash
./scripts/check_system.sh
```

Ensure the CPU governor is set to "performance":

```bash
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```

Disable NUMA balancing for reproducible results:

```bash
echo 0 | sudo tee /proc/sys/kernel/numa_balancing
```

Test your config without executing benchmarks:

```bash
./run_bench.sh configs/mytest.yaml --dry-run
```

This shows the test matrix that will be executed.
```bash
./run_bench.sh configs/mytest.yaml
```

Results:

- Runs all test combinations (2-3 reps each)
- Takes ~10-30 minutes depending on matrix size
- Outputs to `reports/<timestamp>-exploratory/`
```bash
cat reports/latest/summary.md
```

Look for:
- Top performers per metric (pp512, tg128, mixed)
- Consistency across repetitions
- Any obvious outliers or anomalies
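To eyeball consistency across repetitions, a small script can flag configurations whose run-to-run spread is large. This is a sketch with inline sample data; the harness's actual result format under `reports/` may differ, so adapt the loading step:

```python
import statistics

# Hypothetical rows: (config_name, metric, tokens_per_second), one per repetition.
# In practice, load these from the files under reports/<timestamp>-exploratory/.
rows = [
    ("my-build/all-cores", "pp512", 144.9),
    ("my-build/all-cores", "pp512", 145.8),
    ("my-build/all-cores", "pp512", 145.2),
    ("my-build/all-cores", "tg128", 47.1),
    ("my-build/all-cores", "tg128", 48.4),
    ("my-build/all-cores", "tg128", 39.9),  # suspicious outlier
]

def flag_inconsistent(rows, cv_threshold=0.05):
    """Return (config, metric) pairs whose coefficient of variation
    (stdev / mean across repetitions) exceeds the threshold."""
    groups = {}
    for config, metric, value in rows:
        groups.setdefault((config, metric), []).append(value)
    flagged = []
    for key, values in groups.items():
        mean = statistics.mean(values)
        cv = statistics.stdev(values) / mean if len(values) > 1 else 0.0
        if cv > cv_threshold:
            flagged.append(key)
    return flagged

print(flag_inconsistent(rows))  # only tg128 is flagged here
```

Anything flagged this way is worth re-running before you trust it in the exploratory ranking.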
After the exploratory run identifies winners, create a deep config to test parameter variations:
```bash
# Copy starter template
cp configs/example-deep.yaml configs/mytest-deep.yaml

# Edit to use your winning builds and add parameter sweeps
nano configs/mytest-deep.yaml
```

Deep testing provides:
- Parameter sweeps: KV cache types, MLA variants, batch sizes
- Combination testing to find optimal settings
- The same repetitions (3), but many more parameter variations
- Production-ready recommendations with optimal parameters
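The size of a sweep matrix is simply the product of the per-parameter value counts, so it grows fast. A quick sketch with hypothetical sweep values (take the real ones from your deep config):

```python
import itertools

# Hypothetical sweep values for illustration only; your deep config
# defines the actual KV cache types and batch/ubatch sizes.
kv_types = ["f16", "q8_0"]
batch_sizes = [128, 256, 512]
ubatch_sizes = [64, 96]
repetitions = 3

# Every combination of the swept parameters is one test case.
matrix = list(itertools.product(kv_types, batch_sizes, ubatch_sizes))
total_runs = len(matrix) * repetitions
print(f"{len(matrix)} combinations x {repetitions} reps = {total_runs} runs")
for kv, b, ub in matrix[:3]:
    print(f"  kv={kv} b={b} ub={ub}")
```

This is why the `--dry-run` step matters: it shows the full matrix before you commit the machine time.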
From `reports/latest/summary.md`, identify the winning configuration:

```
Winner: blis-omp-znver1 / all-16-cores / b=256,ub=96 / kv=q8_0 / mla3-fa
- pp512: 145.3 t/s ±2.1
- tg128: 47.8 t/s ±0.8
- mixed: 52.1 t/s ±1.2
```
```bash
sudo apt-get install numactl   # Debian/Ubuntu
sudo yum install numactl       # RHEL/CentOS
```

The llama-bench output format may have changed. Check:

```bash
/path/to/llama-bench --help
/path/to/llama-bench -m model.gguf -p 10 -n 0   # Quick test
```

Check:
```bash
# CPU governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# Should be "performance"

# NUMA balancing
cat /proc/sys/kernel/numa_balancing
# Should be "0"

# Verify the binary is optimized
file /path/to/llama-bench
ldd /path/to/llama-bench | grep -E 'blas|omp'
```

- Read `docs/CONFIG_SCHEMA.md` for the full YAML reference
- Read `docs/PROVENANCE.md` to understand the captured data
- Customize scenarios for your workload
- Use `scripts/setup_builds.sh` to create multiple builds
```bash
# 1. One-time setup
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# 2. System prep
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 0 | sudo tee /proc/sys/kernel/numa_balancing

# 3. Create config
cp configs/minimal.yaml configs/prod-test.yaml
nano configs/prod-test.yaml  # Edit model/build paths and CPU bindings

# 4. Verify setup
./scripts/check_system.sh
./run_bench.sh configs/prod-test.yaml --dry-run

# 5. Exploratory (fast sweep)
./run_bench.sh configs/prod-test.yaml
# ~15-20 minutes

# 6. Review winners
cat reports/latest/summary.md
cat reports/latest/promote.yaml

# 7. Deep test winners
./run_bench.sh reports/latest/promote.yaml
# ~30-40 minutes

# 8. Final decision
cat reports/latest/summary.md
# Apply the winning config to production!
```

Every CPU is different. Here's how to configure pinning correctly:
```bash
lscpu --parse=CPU,Core,Node
```

Look for the pattern:
- Physical cores: Unique "Core" numbers (use these!)
- SMT/HT siblings: Duplicate "Core" numbers (skip these)
AMD Threadripper/EPYC (sequential):

- Physical cores: 0, 1, 2, 3, ... N
- SMT siblings: N+1, N+2, ... 2N
- Use: `--physcpubind=0,1,2,3,...,N`

Some Intel Xeon (even-numbered):

- Physical cores: 0, 2, 4, 6, ... 2N
- SMT siblings: 1, 3, 5, 7, ... 2N+1
- Use: `--physcpubind=0,2,4,6,...,2N`
Replace the `--physcpubind` values in your YAML with your physical core IDs.
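If you'd rather not read the `lscpu` table by hand, a short script can derive the list. This sketch assumes the first CPU listed for each physical core is the one to pin (true for both numbering schemes above, since SMT siblings repeat an already-seen Core number later in the output); verify the result against `numactl --hardware` on your machine:

```python
def physical_core_ids(lscpu_parse_output):
    """From `lscpu --parse=CPU,Core,Node` output, keep the first CPU
    listed for each physical core; later duplicates of a Core number
    are SMT/HT siblings and are skipped."""
    seen_cores = set()
    physical = []
    for line in lscpu_parse_output.splitlines():
        if not line or line.startswith("#"):
            continue  # skip lscpu's comment/header lines
        cpu, core, node = (int(x) for x in line.split(",")[:3])
        if core not in seen_cores:
            seen_cores.add(core)
            physical.append(cpu)
    return physical

# Example: a 4-core/8-thread AMD-style layout (siblings numbered after
# the physical cores).
sample = """# CPU,Core,Node
0,0,0
1,1,0
2,2,0
3,3,0
4,0,0
5,1,0
6,2,0
7,3,0"""
ids = physical_core_ids(sample)
print("--physcpubind=" + ",".join(str(i) for i in ids))
```

Paste the printed value into the `pinning` preset in your YAML.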