NripeshN/MetalFish

MetalFish

A high-performance UCI chess engine built for Apple Silicon, combining CPU-optimized NNUE alpha-beta search with Metal GPU transformer inference in a parallel hybrid architecture.

Overview

MetalFish has three search modes, selectable at runtime via UCI options. Hybrid mode, its primary strength, runs alpha-beta and MCTS in parallel and uses Apple's unified memory for zero-copy communication between the CPU and GPU workloads.

Engine      Description                                         UCI Option
Alpha-Beta  Classical PVS with CPU NNUE (~7M NPS)               Default
MCTS        Transformer-backed tree search via Metal/MPSGraph   UseMCTS true
Hybrid      Parallel MCTS + AB with dynamic move arbitration    UseHybridSearch true

Benchmark Results (BK Tactical Suite, 5s/position)

Engine                 Score
MetalFish-Hybrid       22/24
MetalFish-AB           22/24
Stockfish (reference)  20/24
Lc0 (same weights)     19/24
MetalFish-MCTS         19/24

Search Engines

Alpha-Beta (search/)

Iterative-deepening PVS with full pruning and reduction suite:

  • Aspiration windows, null move pruning, futility, razoring
  • Late Move Reductions/Pruning, singular extensions
  • History heuristics (butterfly, capture, continuation, pawn)
  • Static Exchange Evaluation for capture ordering
  • Transposition table with cluster-based replacement
  • Syzygy tablebase probing
  • Dual-network CPU NNUE with NEON SIMD

MCTS (mcts/)

GPU-accelerated Monte Carlo Tree Search matching Lc0's search behavior:

  • PUCT selection with logarithmic exploration growth
  • KLD (Kullback-Leibler Divergence) stopper for search stability
  • Smart pruning with cooperative stopper hints
  • Multiple time managers (smooth, legacy, alphazero, simple)
  • First Play Urgency reduction
  • Moves Left Head utility
  • Arena-allocated nodes (128-byte aligned)
  • Batched Metal/MPSGraph transformer inference
  • O(1) policy lookup via pre-built index table
  • Solid tree optimization for cache locality
  • Tree reuse with visit baseline tracking
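
PUCT selection with logarithmic exploration growth can be sketched as follows. This assumes the commonly used Lc0-style formula, where the exploration constant grows with the logarithm of the parent visit count. The constants, the dictionary layout, and the function names are illustrative assumptions, not values or APIs taken from MetalFish, and First Play Urgency is left out (unvisited children simply carry `q = 0`).

```python
import math

def cpuct(parent_visits, init=1.745, base=38739.0, factor=3.894):
    """Exploration constant that grows logarithmically with parent visits.
    The default numbers are illustrative assumptions."""
    return init + factor * math.log((parent_visits + base) / base)

def puct_select(children, parent_visits):
    """Return the index of the child maximising Q + U.

    children: list of dicts with keys
      "p": policy prior, "n": visit count, "q": mean value.
    """
    c = cpuct(parent_visits)
    sqrt_parent = math.sqrt(max(parent_visits, 1))

    def score(child):
        # U term: prior-weighted exploration bonus that shrinks with visits.
        u = c * child["p"] * sqrt_parent / (1 + child["n"])
        return child["q"] + u

    return max(range(len(children)), key=lambda i: score(children[i]))

children = [
    {"p": 0.6, "n": 10, "q": 0.1},
    {"p": 0.3, "n": 0, "q": 0.0},
    {"p": 0.1, "n": 0, "q": 0.0},
]
choice = puct_select(children, parent_visits=10)
```

In this example the well-explored first child loses to the unvisited second child, because the exploration bonus of an unvisited move is not yet divided down by its visit count.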

Hybrid (hybrid/)

Runs MCTS and Alpha-Beta simultaneously with intelligent move arbitration:

  • CPU (AB) and GPU (MCTS) at full throughput in parallel
  • Multi-tier confidence system (reliable/strong/overwhelming) for move selection
  • MCTS root rejection: overrides AB when transformer strongly disagrees
  • AB root order hints: MCTS policy guides AB's move ordering
  • Position classifier tunes decision weights by position type
  • Visit-share-based confidence (current-search visits only, excludes tree reuse)
  • First-class ponderhit support: converts the ponder search in place, preserving all work
  • Lock-free atomic communication via shared state
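
The visit-share confidence described above might look roughly like the sketch below. The tier names come from the list. The thresholds, the `(total, baseline)` data layout, and the function names are hypothetical and chosen only to show how visits carried over by tree reuse can be excluded.

```python
# Hypothetical thresholds; MetalFish's actual cut-offs are not stated here.
TIERS = [("overwhelming", 0.90), ("strong", 0.75), ("reliable", 0.60)]

def visit_share(root_children, move):
    """Share of current-search visits that went to `move`.

    root_children maps move -> (total_visits, baseline_visits), where
    baseline_visits are visits inherited from tree reuse.
    """
    fresh = {m: max(total - base, 0) for m, (total, base) in root_children.items()}
    all_fresh = sum(fresh.values())
    return fresh[move] / all_fresh if all_fresh else 0.0

def confidence_tier(share):
    """Map a visit share to the highest tier whose threshold it meets."""
    for name, threshold in TIERS:
        if share >= threshold:
            return name
    return "none"

root = {"e2e4": (1000, 400), "d2d4": (300, 100)}
share = visit_share(root, "e2e4")   # 600 fresh visits out of 800
tier = confidence_tier(share)
```

Subtracting the baseline keeps a move that was popular in an earlier search from looking confident before the current search has confirmed it.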

Neural Networks

NNUE (eval/nnue/)

  • Dual-network architecture (big: 1024, small: 128 hidden)
  • HalfKAv2_hm feature set with incremental accumulator updates
  • 8 layer stacks with PSQT buckets
  • NEON dot product SIMD on Apple Silicon

Transformer (nn/)

  • 112-plane input encoding (8 history positions + auxiliary)
  • Multi-head attention encoder with FFN
  • Attention-based policy head (1858 moves)
  • WDL value head + moves-left head
  • Supports .pb / .pb.gz weights (float32, float16, bfloat16)
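
The 112-plane figure is consistent with the classical Lc0-style layout, sketched below as simple arithmetic. The per-position split (6 own-piece planes, 6 opponent-piece planes, 1 repetition plane) and the 8 auxiliary planes are an assumption based on that layout. The README itself states only "8 history positions + auxiliary", and the helper names are hypothetical.

```python
HISTORY_POSITIONS = 8
PLANES_PER_POSITION = 13   # assumed: 6 own pieces + 6 opponent pieces + 1 repetition
AUX_PLANES = 8             # assumed: castling rights, side to move, rule-50, constants

HISTORY_PLANES = HISTORY_POSITIONS * PLANES_PER_POSITION
TOTAL_PLANES = HISTORY_PLANES + AUX_PLANES

def plane_index(history_step, piece_plane):
    """Index of a piece/repetition plane for a given history step (0 = current)."""
    assert 0 <= history_step < HISTORY_POSITIONS
    assert 0 <= piece_plane < PLANES_PER_POSITION
    return history_step * PLANES_PER_POSITION + piece_plane

def aux_index(k):
    """Index of the k-th auxiliary plane, placed after all history planes."""
    assert 0 <= k < AUX_PLANES
    return HISTORY_PLANES + k
```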

Apple Silicon Optimizations

Optimization           Detail
FP16 weights           Half the bytes of FP32, so about 2x effective memory bandwidth for transformer inference
Unified memory         Zero-copy CPU/GPU sharing, no transfer overhead
Buffer pooling         os_unfair_lock-guarded I/O pool, no per-inference allocation
MPSGraph               Apple's graph API for transformer encoder/attention/FFN
vDSP softmax           Accelerate framework SIMD for policy computation
128-byte alignment     Node structures match Apple Silicon cache lines
Sub-batch parallelism  Large batches split across parallel command buffers
NEON dot product       armv8.2-a+dotprod for NNUE feature transforms
ARM yield              __builtin_arm_yield() in spin-wait loops
CPU-only NNUE          GPU reserved exclusively for transformer inference
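
For readers unfamiliar with the policy softmax step, the computation itself is small. The scalar Python sketch below shows a numerically stable softmax restricted to legal moves. It only illustrates the math. The engine is described as using vDSP for this, and the function name and the dictionary return type are assumptions.

```python
import math

def masked_softmax(logits, legal):
    """Softmax over the logits at the `legal` indices only.

    Subtracting the maximum first keeps exp() from overflowing.
    `legal` must be non-empty.
    """
    peak = max(logits[i] for i in legal)
    exps = {i: math.exp(logits[i] - peak) for i in legal}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Index 2 is treated as an illegal move and gets no probability mass.
policy = masked_softmax([1000.0, 1000.0, -5.0], legal=[0, 1])
```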

Lichess Bot

MetalFish includes a production Lichess bot (tools/lichess_bot.py) with:

  • Opening book from Lichess masters database (instant moves)
  • Pondering (thinks on opponent's time)
  • Elo-based opponent seeking with widening range
  • Time control rotation (rapid → blitz → bullet)
  • Engine crash recovery and automatic restart
  • Syzygy endgame tablebases

python3 tools/lichess_bot.py --seek --tc "10+5" --no-casual --elo-seek
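
Opponent seeking with a widening Elo range could work along the lines of the sketch below: each unanswered seek widens the acceptable rating band until a cap is reached. This is a guess at the general shape only. The function name, the step sizes, and the cap are hypothetical and are not taken from tools/lichess_bot.py.

```python
def seek_range(own_elo, attempt, start=50, step=50, cap=400):
    """Rating band for the given seek attempt (0-based).

    The half-width starts at `start`, grows by `step` per failed
    attempt, and never exceeds `cap`. All defaults are illustrative.
    """
    width = min(start + step * attempt, cap)
    return own_elo - width, own_elo + width

first_try = seek_range(2000, attempt=0)
later_try = seek_range(2000, attempt=10)
```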

Building

Requirements

  • macOS 13.0+, Xcode Command Line Tools
  • CMake 3.20+, Protobuf 3.0+
  • Apple Silicon (M1/M2/M3/M4) recommended

Build

mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(sysctl -n hw.ncpu)

Network Files

networks/nn-c288c895ea92.nnue       # NNUE (auto-loaded)
networks/nn-37f18f62d772.nnue       # NNUE small (auto-loaded)
networks/BT4-1024x15x32h-swa-6147500.pb  # Transformer (set via NNWeights)

Usage

./build/metalfish

Engine Modes

# Alpha-Beta (default)
go movetime 5000

# MCTS
setoption name UseMCTS value true
setoption name NNWeights value networks/BT4-1024x15x32h-swa-6147500.pb
go movetime 5000

# Hybrid (strongest)
setoption name UseHybridSearch value true
setoption name NNWeights value networks/BT4-1024x15x32h-swa-6147500.pb
setoption name Threads value 11
go movetime 5000
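
When driving the engine from a script, the command sequences above can be generated rather than typed. The helper below is a minimal sketch: the option names (UseMCTS, UseHybridSearch, NNWeights, Threads) come from this README, while the function name, its parameters, and the mode keys are hypothetical.

```python
def mode_commands(mode, weights=None, threads=None, movetime_ms=5000):
    """Build the UCI command lines for one search in the given mode.

    mode: "ab", "mcts", or "hybrid" (keys chosen for this sketch).
    weights: transformer network path, required for mcts and hybrid.
    """
    commands = []
    if mode == "mcts":
        commands.append("setoption name UseMCTS value true")
    elif mode == "hybrid":
        commands.append("setoption name UseHybridSearch value true")
    elif mode != "ab":
        raise ValueError(f"unknown mode: {mode}")

    if mode != "ab":
        if weights is None:
            raise ValueError("mcts and hybrid modes need a weights path")
        commands.append(f"setoption name NNWeights value {weights}")
    if threads is not None:
        commands.append(f"setoption name Threads value {threads}")

    commands.append(f"go movetime {movetime_ms}")
    return commands

hybrid = mode_commands(
    "hybrid",
    weights="networks/BT4-1024x15x32h-swa-6147500.pb",
    threads=11,
)
```

Each returned line can be written, followed by a newline, to the standard input of a running ./build/metalfish process.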

Key UCI Options

Option             Default  Description
Threads            1        Search threads (Hybrid splits between AB and MCTS)
Hash               16       Transposition table (MB)
UseMCTS            false    Pure MCTS mode
UseHybridSearch    false    Hybrid MCTS+AB mode
NNWeights          -        Transformer network path
SyzygyPath         -        Endgame tablebase directory
Ponder             false    Think on opponent's time
HybridMCTSThreads  0        MCTS threads (0 = auto)
HybridABThreads    0        AB threads (0 = auto)
HybridTrace        false    Log decision diagnostics

Testing

./build/metalfish_tests              # Unit tests
python3 tests/testing.py             # UCI protocol + perft
python3 tests/test_ponder_stress.py --smoke  # Fast ponder lifecycle smoke test
python3 tests/test_ponder_stress.py  # Full ponder lifecycle stress
python3 tests/paper_benchmarks.py --tactical --movetime 5000  # BK suite

Project Structure

src/
  core/       Bitboard, position, move generation
  eval/       CPU NNUE evaluation, Metal backend
  nn/         Transformer network, MPSGraph inference
  search/     Alpha-Beta search engine
  mcts/       MCTS search engine
  hybrid/     Hybrid MCTS+AB engine
  uci/        UCI protocol layer
  syzygy/     Endgame tablebase probing
tests/        Unit tests, benchmarks, stress tests
tools/        Lichess bot, tournament scripts, trace analysis
networks/     Weight files

License

GNU General Public License v3.0. See LICENSE.

Author

Nripesh Niketan
