A high-performance UCI chess engine built for Apple Silicon, combining CPU-optimized NNUE alpha-beta search with Metal GPU transformer inference in a parallel hybrid architecture.
MetalFish runs three search modes selectable at runtime via UCI options. The hybrid mode—its primary strength—runs alpha-beta and MCTS in true parallel, using Apple's unified memory for zero-copy communication between CPU and GPU workloads.
| Engine | Description | UCI Option |
|---|---|---|
| Alpha-Beta | Classical PVS with CPU NNUE (~7M NPS) | Default |
| MCTS | Transformer-backed tree search via Metal/MPSGraph | UseMCTS true |
| Hybrid | Parallel MCTS + AB with dynamic move arbitration | UseHybridSearch true |

Engine scores (out of 24):

| Engine | Score |
|---|---|
| MetalFish-Hybrid | 22/24 |
| MetalFish-AB | 22/24 |
| Stockfish (reference) | 20/24 |
| Lc0 (same weights) | 19/24 |
| MetalFish-MCTS | 19/24 |
Iterative-deepening PVS with a full pruning and reduction suite:
- Aspiration windows, null move pruning, futility, razoring
- Late Move Reductions/Pruning, singular extensions
- History heuristics (butterfly, capture, continuation, pawn)
- Static Exchange Evaluation for capture ordering
- Transposition table with cluster-based replacement
- Syzygy tablebase probing
- Dual-network CPU NNUE with NEON SIMD
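
To illustrate the aspiration-window item in the list above, here is a minimal sketch, not MetalFish's implementation. The `search` callback is a hypothetical stand-in for an alpha-beta search that clamps its result to the window, and both `INF` and the starting `delta` of 25 are arbitrary illustrative values.

```python
INF = 32000  # illustrative score bound, not an engine constant


def aspiration_search(search, prev_score, depth, delta=25):
    """Re-search with a widening window until the score falls inside it.

    `search(depth, alpha, beta)` is a hypothetical callback standing in for
    an alpha-beta search that clamps its result to [alpha, beta].
    """
    alpha = max(prev_score - delta, -INF)
    beta = min(prev_score + delta, INF)
    while True:
        score = search(depth, alpha, beta)
        if score <= alpha and alpha > -INF:
            alpha = max(alpha - delta, -INF)  # fail low: widen downward
        elif score >= beta and beta < INF:
            beta = min(beta + delta, INF)     # fail high: widen upward
        else:
            return score
        delta *= 2  # grow the step so repeated failures converge quickly


def make_stub(true_score, calls):
    """Toy search that knows the true score and clamps it to the window."""
    def search(depth, alpha, beta):
        calls.append((alpha, beta))
        return max(alpha, min(beta, true_score))
    return search


calls = []
result = aspiration_search(make_stub(140, calls), prev_score=0, depth=10)
```

The narrow initial window lets most searches cut off early when the previous iteration's score is a good guess. Only the side that failed is widened, so a fail-high does not discard the lower bound.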
GPU-accelerated Monte Carlo Tree Search matching Lc0's search behavior:
- PUCT selection with logarithmic exploration growth
- KLD (Kullback-Leibler Divergence) stopper for search stability
- Smart pruning with cooperative stopper hints
- Multiple time managers (smooth, legacy, alphazero, simple)
- First Play Urgency reduction
- Moves Left Head utility
- Arena-allocated nodes (128-byte aligned)
- Batched Metal/MPSGraph transformer inference
- O(1) policy lookup via pre-built index table
- Solid tree optimization for cache locality
- Tree reuse with visit baseline tracking
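
The PUCT and First Play Urgency items above can be sketched as follows. This follows the commonly described Lc0-style formula and is an assumption about the general shape, not MetalFish's code. The constants (`init`, `base`, `factor`, `fpu_reduction`) are illustrative placeholders, and the `Child` class is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float             # policy probability P from the network
    visits: int = 0          # N(child)
    value_sum: float = 0.0   # sum of backed-up values, from the parent's view

    def q(self, fpu):
        # Unvisited children fall back to the first-play-urgency value.
        return self.value_sum / self.visits if self.visits else fpu


def cpuct(parent_visits, init=1.75, base=38739.0, factor=3.9):
    """Exploration constant that grows logarithmically with parent visits."""
    return init + factor * math.log((parent_visits + base) / base)


def select_child(children, parent_visits, parent_q, fpu_reduction=0.3):
    """Return the index of the child maximizing Q + U."""
    fpu = parent_q - fpu_reduction  # FPU reduction: pessimistic default for new moves
    c = cpuct(parent_visits)
    sqrt_n = math.sqrt(max(parent_visits, 1))

    def score(child):
        u = c * child.prior * sqrt_n / (1 + child.visits)
        return child.q(fpu) + u

    return max(range(len(children)), key=lambda i: score(children[i]))


children = [Child(0.6, visits=10, value_sum=5.0), Child(0.3), Child(0.1)]
chosen = select_child(children, parent_visits=10, parent_q=0.5)
```

In this toy position the heavily visited first child has a small exploration bonus, so selection moves to the unvisited child with the next-highest prior.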
Runs MCTS and Alpha-Beta simultaneously with intelligent move arbitration:
- CPU (AB) and GPU (MCTS) at full throughput in parallel
- Multi-tier confidence system (reliable/strong/overwhelming) for move selection
- MCTS root rejection: overrides AB when transformer strongly disagrees
- AB root order hints: MCTS policy guides AB's move ordering
- Position classifier tunes decision weights by position type
- Visit-share-based confidence (current-search visits only, excludes tree reuse)
- First-class `ponderhit` support: converts the ponder search in place, preserving all work
- Lock-free atomic communication via shared state
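
As a rough sketch of how visit-share confidence and tiered arbitration could fit together: the thresholds, the function names, and the override rule below are all hypothetical, chosen only to illustrate the idea. The real decision logic also involves the position classifier and root order hints, which this sketch omits.

```python
# Hypothetical tier thresholds on the MCTS best move's visit share.
TIERS = [(0.90, "overwhelming"), (0.75, "strong"), (0.60, "reliable")]


def visit_share(move_visits, move_baseline, root_new_visits):
    """Share of the current search's root visits spent on one move.

    `move_baseline` is the visit count the move inherited from tree reuse;
    subtracting it keeps stale visits from inflating confidence.
    """
    new_visits = move_visits - move_baseline
    return new_visits / root_new_visits if root_new_visits > 0 else 0.0


def confidence_tier(share):
    """Map a visit share to a tier name, or None below the lowest tier."""
    for threshold, name in TIERS:
        if share >= threshold:
            return name
    return None


def arbitrate(ab_move, mcts_move, mcts_share):
    """Hypothetical rule: MCTS overrides AB only at the top tier."""
    if mcts_move != ab_move and confidence_tier(mcts_share) == "overwhelming":
        return mcts_move
    return ab_move


share = visit_share(move_visits=1200, move_baseline=400, root_new_visits=1000)
tier = confidence_tier(share)
```

Subtracting the baseline matters after tree reuse: a move may carry many visits from the previous search without the current search having examined it much.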
- Dual-network architecture (big: 1024, small: 128 hidden)
- HalfKAv2_hm feature set with incremental accumulator updates
- 8 layer stacks with PSQT buckets
- NEON dot product SIMD on Apple Silicon
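
The incremental accumulator update mentioned above can be illustrated with a toy example. This is a sketch under simplifying assumptions: feature indices are plain integers rather than real HalfKAv2_hm indices, the weight table is a small made-up integer matrix, and the hidden size is 4 instead of 1024 or 128.

```python
HIDDEN = 4        # toy size; the real networks use far wider layers
NUM_FEATURES = 16  # toy feature count

# Made-up deterministic integer weights: WEIGHTS[feature][hidden_unit].
WEIGHTS = [[(f * 7 + i * 3) % 11 - 5 for i in range(HIDDEN)]
           for f in range(NUM_FEATURES)]
BIAS = [1, -2, 3, 0]


def refresh(active):
    """Full recompute: bias plus the weight column of every active feature."""
    acc = list(BIAS)
    for f in active:
        for i in range(HIDDEN):
            acc[i] += WEIGHTS[f][i]
    return acc


def update(acc, removed, added):
    """Incremental update: touch only the features that changed."""
    out = list(acc)
    for f in removed:
        for i in range(HIDDEN):
            out[i] -= WEIGHTS[f][i]
    for f in added:
        for i in range(HIDDEN):
            out[i] += WEIGHTS[f][i]
    return out


before = {2, 5, 9, 14}
after = {2, 5, 11, 14}  # one piece "moved": feature 9 off, feature 11 on
incremental = update(refresh(before), removed=[9], added=[11])
```

A quiet move changes only a handful of features, so the incremental path does a few column additions instead of summing every active feature again.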
- 112-plane input encoding (8 history positions + auxiliary)
- Multi-head attention encoder with FFN
- Attention-based policy head (1858 moves)
- WDL value head + moves-left head
- Supports `.pb`/`.pb.gz` weights (float32, float16, bfloat16)
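
The 112-plane figure can be checked with simple arithmetic. The breakdown below assumes the Lc0-style layout of 13 planes per history position (6 own piece types, 6 opponent piece types, 1 repetition flag) plus 8 auxiliary planes. The source states only "8 history positions + auxiliary", so the per-position split is an assumption.

```python
HISTORY_POSITIONS = 8
PLANES_PER_POSITION = 13  # assumption: 6 own + 6 opponent + 1 repetition
AUX_PLANES = 8            # assumption: remaining planes are auxiliary
BOARD_SQUARES = 64

TOTAL_PLANES = HISTORY_POSITIONS * PLANES_PER_POSITION + AUX_PLANES


def history_plane(history_idx, plane_in_position):
    """Flat plane index for a plane belonging to one history position."""
    if not 0 <= history_idx < HISTORY_POSITIONS:
        raise ValueError("history_idx out of range")
    if not 0 <= plane_in_position < PLANES_PER_POSITION:
        raise ValueError("plane_in_position out of range")
    return history_idx * PLANES_PER_POSITION + plane_in_position


def aux_plane(aux_idx):
    """Flat plane index for an auxiliary plane (placed after all history)."""
    if not 0 <= aux_idx < AUX_PLANES:
        raise ValueError("aux_idx out of range")
    return HISTORY_POSITIONS * PLANES_PER_POSITION + aux_idx


input_values = TOTAL_PLANES * BOARD_SQUARES  # size of one flattened input
```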
| Optimization | Detail |
|---|---|
| FP16 weights | Half the memory traffic of FP32 weights during transformer inference |
| Unified memory | Zero-copy CPU/GPU sharing, no transfer overhead |
| Buffer pooling | os_unfair_lock-guarded I/O pool, no per-inference allocation |
| MPSGraph | Apple's graph API for transformer encoder/attention/FFN |
| vDSP softmax | Accelerate framework SIMD for policy computation |
| 128-byte alignment | Node structures match Apple Silicon cache lines |
| Sub-batch parallelism | Large batches split across parallel command buffers |
| NEON dot product | armv8.2-a+dotprod for NNUE feature transforms |
| ARM yield | __builtin_arm_yield() in spin-wait loops |
| CPU-only NNUE | GPU reserved exclusively for transformer inference |
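
The 128-byte alignment row pairs with the arena-allocated nodes described earlier. The sketch below shows the offset arithmetic for a bump allocator. The `Arena` class is hypothetical and works on offsets into a Python `bytearray`, so it illustrates the rounding only, not real memory alignment.

```python
CACHE_LINE = 128  # alignment from the table above


def align_up(offset, alignment=CACHE_LINE):
    """Round offset up to the next multiple of a power-of-two alignment."""
    return (offset + alignment - 1) & ~(alignment - 1)


class Arena:
    """Hypothetical bump allocator handing out cache-line-aligned offsets."""

    def __init__(self, capacity):
        self.buffer = bytearray(capacity)
        self.offset = 0

    def alloc(self, size):
        start = align_up(self.offset)
        end = start + size
        if end > len(self.buffer):
            raise MemoryError("arena exhausted")
        self.offset = end
        return start  # offset of the new block within the buffer


arena = Arena(1024)
first = arena.alloc(100)
second = arena.alloc(100)
third = arena.alloc(200)
```

Starting each node on its own cache line means two threads updating neighboring nodes do not contend for the same line.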
MetalFish includes a production Lichess bot (tools/lichess_bot.py) with:
- Opening book from Lichess masters database (instant moves)
- Pondering (thinks on opponent's time)
- Elo-based opponent seeking with widening range
- Time control rotation (rapid → blitz → bullet)
- Engine crash recovery and automatic restart
- Syzygy endgame tablebases
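
The seeking behavior in the list above can be sketched as follows. All numbers here are hypothetical: the source does not give the starting range, the widening step, the cap, or the blitz and bullet time controls. Only `"10+5"` appears in the source, in the example command.

```python
from itertools import cycle


def elo_range(own_rating, attempt, start=100, step=50, cap=400):
    """Rating window for the Nth unanswered seek; widens until capped.

    `start`, `step` and `cap` are illustrative values, not the bot's.
    """
    half_width = min(start + step * attempt, cap)
    return own_rating - half_width, own_rating + half_width


# Rapid -> blitz -> bullet, then back to rapid. "3+2" and "1+0" are
# placeholder time controls chosen for illustration.
time_controls = cycle(["10+5", "3+2", "1+0"])

first_four = [next(time_controls) for _ in range(4)]
ranges = [elo_range(2000, attempt) for attempt in range(3)]
```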
python3 tools/lichess_bot.py --seek --tc "10+5" --no-casual --elo-seek

- macOS 13.0+, Xcode Command Line Tools
- CMake 3.20+, Protobuf 3.0+
- Apple Silicon (M1/M2/M3/M4) recommended
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(sysctl -n hw.ncpu)

networks/nn-c288c895ea92.nnue # NNUE (auto-loaded)
networks/nn-37f18f62d772.nnue # NNUE small (auto-loaded)
networks/BT4-1024x15x32h-swa-6147500.pb # Transformer (set via NNWeights)

./build/metalfish

# Alpha-Beta (default)
go movetime 5000
# MCTS
setoption name UseMCTS value true
setoption name NNWeights value networks/BT4-1024x15x32h-swa-6147500.pb
go movetime 5000
# Hybrid (strongest)
setoption name UseHybridSearch value true
setoption name NNWeights value networks/BT4-1024x15x32h-swa-6147500.pb
setoption name Threads value 11
go movetime 5000
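
The same session can be scripted. Below is a minimal sketch that drives the engine over stdin/stdout using standard UCI commands (`uci`, `isready`, `position`, `go`, `quit`). The helper names are hypothetical, error handling and timeouts are omitted, and `best_move` needs a built engine binary to run.

```python
import subprocess


def hybrid_setup(weights_path, threads):
    """UCI commands that enable hybrid mode, mirroring the session above."""
    return [
        "setoption name UseHybridSearch value true",
        f"setoption name NNWeights value {weights_path}",
        f"setoption name Threads value {threads}",
    ]


def parse_bestmove(line):
    """Extract the move from a 'bestmove <move> [ponder <move>]' line."""
    parts = line.split()
    if len(parts) >= 2 and parts[0] == "bestmove":
        return parts[1]
    return None


def best_move(engine_path, setup_commands, movetime_ms):
    """Start the engine, search the start position, return the best move."""
    proc = subprocess.Popen([engine_path], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)

    def send(command):
        proc.stdin.write(command + "\n")
        proc.stdin.flush()

    def wait_for(token):
        for line in proc.stdout:
            if line.strip() == token:
                return

    send("uci")
    wait_for("uciok")
    for command in setup_commands:
        send(command)
    send("isready")
    wait_for("readyok")
    send("position startpos")
    send(f"go movetime {movetime_ms}")
    move = None
    for line in proc.stdout:
        move = parse_bestmove(line)
        if move is not None:
            break
    send("quit")
    proc.wait()
    return move


commands = hybrid_setup("networks/BT4-1024x15x32h-swa-6147500.pb", 11)
```

With a built engine, calling `best_move("./build/metalfish", commands, 5000)` would run the hybrid search from the start position for five seconds.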
| Option | Default | Description |
|---|---|---|
| `Threads` | 1 | Search threads (Hybrid splits between AB and MCTS) |
| `Hash` | 16 | Transposition table (MB) |
| `UseMCTS` | false | Pure MCTS mode |
| `UseHybridSearch` | false | Hybrid MCTS+AB mode |
| `NNWeights` | | Transformer network path |
| `SyzygyPath` | | Endgame tablebase directory |
| `Ponder` | false | Think on opponent's time |
| `HybridMCTSThreads` | 0 | MCTS threads (0 = auto) |
| `HybridABThreads` | 0 | AB threads (0 = auto) |
| `HybridTrace` | false | Log decision diagnostics |

./build/metalfish_tests # Unit tests
python3 tests/testing.py # UCI protocol + perft
python3 tests/test_ponder_stress.py --smoke # Fast ponder lifecycle smoke test
python3 tests/test_ponder_stress.py # Full ponder lifecycle stress
python3 tests/paper_benchmarks.py --tactical --movetime 5000 # BK suite

src/
core/ Bitboard, position, move generation
eval/ CPU NNUE evaluation, Metal backend
nn/ Transformer network, MPSGraph inference
search/ Alpha-Beta search engine
mcts/ MCTS search engine
hybrid/ Hybrid MCTS+AB engine
uci/ UCI protocol layer
syzygy/ Endgame tablebase probing
tests/ Unit tests, benchmarks, stress tests
tools/ Lichess bot, tournament scripts, trace analysis
networks/ Weight files
GNU General Public License v3.0. See LICENSE.
Nripesh Niketan