Domain-agnostic reinforcement learning agent for Game Boy games using curiosity-driven exploration.
NeuralQuest is a research project that implements a pure curiosity-driven RL agent capable of learning to play Game Boy games without any game-specific knowledge, hardcoded behaviors, or external rewards. The agent uses Random Network Distillation (RND) for intrinsic motivation and archive-based exploration for systematic state discovery.
- Domain-Agnostic Learning: No hardcoded game logic, waypoints, or ROM-specific features
- Pure Curiosity: All learning driven by intrinsic motivation (RND) with optional terminal rewards
- Archive-Based Exploration: Systematic discovery and revisiting of novel states
- Minimal Dependencies: NumPy-only neural networks, no PyTorch/TensorFlow required
- Reproducible: Fully deterministic execution with configurable seeds
- Environment: PyBoy Game Boy emulator wrapper with RAM-based observations
- Policy: Actor-Critic (A2C) with Generalized Advantage Estimation (GAE)
- Curiosity: Random Network Distillation for intrinsic rewards
- Exploration: SimHash-based state archive with frontier sampling
- Networks: Pure NumPy implementation with custom backpropagation
- 9-Action Discrete Control: noop, up, down, left, right, A, B, start, select
- RAM Observations: Frame-stacked emulator RAM (no pixel processing)
- Archive System: 64-bit SimHash for state discretization and frontier-based resets
- Persistent Learning: Save/load networks and archive for continued exploration
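The policy side pairs A2C with GAE, as listed above. A minimal NumPy sketch of the advantage computation over one rollout (the `gamma` value matches the baseline config; `lam` and the function name are illustrative, not the project's actual API):

```python
import numpy as np

def gae_advantages(rewards, values, dones, gamma=0.995, lam=0.95):
    """Generalized Advantage Estimation over a rollout.

    rewards, dones: arrays of length T; values: length T+1 (last entry is
    the bootstrap value of the final state).
    """
    T = len(rewards)
    adv = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]          # zero out bootstrap at episode ends
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        adv[t] = gae
    returns = adv + values[:-1]               # targets for the value head
    return adv, returns
```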
# Clone repository
git clone https://github.com/your-org/neuralquest.git
cd neuralquest
# Install dependencies
pip install -r requirements.txt
# Or install in development mode
pip install -e .

# Train agent on a Game Boy ROM (you must legally own the ROM)
python -m redai.cli.run_train path/to/your/game.gb --config configs/baseline.toml
# Resume training from checkpoint
python -m redai.cli.run_train path/to/your/game.gb --resume checkpoints/epoch_001000
# Evaluate trained agent
python -m redai.cli.eval path/to/your/game.gb --checkpoint checkpoints/epoch_001000 --episodes 50
# Quick smoke test (5 minutes)
python -m redai.cli.run_train path/to/your/game.gb --smoke-test

# Vectorized training (recommended) - 10 parallel PyBoy instances
python train_vector.py --config configs/pokemon_red_vector_exploration_fixed.toml --n-envs 10
# Legacy single-environment training
python scripts/train_pokemon.py --mode standard
# Fast training for quick results
python scripts/train_pokemon.py --mode fast
# Maximum exploration to discover the entire game world
python scripts/train_pokemon.py --mode exploration
# Evaluate and watch the agent play
python scripts/eval_pokemon.py --mode standard --render --episodes 5
# Analyze exploration patterns and frontier cells
python scripts/eval_pokemon.py --mode standard --analyze-archive --frontier-analysis

NeuralQuest now supports vectorized training with multiple parallel PyBoy instances for 10x faster learning:
# Full vectorized training with progress monitoring
python train_vector.py --config configs/pokemon_red_vector_exploration_fixed.toml \
--n-envs 10 --visual-env -1 --monitor-progress
# Quick vectorized test
python train_vector.py --config configs/pokemon_red_vector_exploration_fixed.toml \
--n-envs 4 --epochs 10
# Resume vectorized training
python train_vector.py --config configs/pokemon_red_vector_exploration_fixed.toml \
    --resume pokemon_vector_checkpoints/epoch_050 --n-envs 10

Vectorized Features:
- 10x Performance: 600+ total FPS across parallel environments
- Progress Monitoring: Auto-capture screenshots from all environments
- Stability: Fixed RND collapse with robust intrinsic reward system
- Enhanced Debugging: Frontier sampling decision logging and archive verification
Customize training via TOML configuration files:
[env]
frame_skip = 4
sticky_p = 0.1
seed = 1337
[algo]
gamma = 0.995
lr_policy = 3e-4
batch_horizon = 2048
[rnd]
beta = 0.2
lr = 1e-3
reward_clip = 5.0
[archive]
capacity = 20000
p_frontier = 0.25
novel_lru = 5000

Override any parameter from the command line:
python -m redai.cli.run_train game.gb --override rnd.beta=0.3 --override algo.lr_policy=1e-3

Monitor training progress via CSV logs:
- Exploration: Unique cells discovered per hour, archive growth
- Performance: Episode length distribution, policy entropy
- Learning: Policy/value losses, gradient norms, intrinsic rewards
- Vectorized: Per-environment statistics, total throughput (600+ FPS)
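The dotted `--override` syntax shown above (e.g. `rnd.beta=0.3`) amounts to walking the parsed TOML dict key by key and assigning the leaf value. A minimal sketch, assuming the config is a plain nested dict; `apply_override` is a hypothetical helper, not the project's actual implementation (and it coerces all numbers to float for brevity):

```python
def apply_override(cfg: dict, spec: str) -> None:
    """Apply one dotted override like 'rnd.beta=0.3' to a nested config dict."""
    path, _, raw = spec.partition("=")
    keys = path.split(".")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})   # descend, creating tables as needed
    try:
        value = float(raw)                # numeric values (ints become floats here)
    except ValueError:
        value = raw                       # leave non-numeric values as strings
    node[keys[-1]] = value

cfg = {"rnd": {"beta": 0.2}, "algo": {"lr_policy": 3e-4}}
apply_override(cfg, "rnd.beta=0.3")
apply_override(cfg, "algo.lr_policy=1e-3")
```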
Screenshot Capture:
- Automatic screenshots every 10 seconds from all environments
- Saved to progress_screenshots/env_XX/latest.png
- Visual confirmation of agent progress and exploration
Web Dashboard:
- Real-time training metrics visualization
- Progress screenshots from all environments
- Archive growth and exploration statistics
# View real-time progress with web dashboard
python -m http.server 8000
# Open http://localhost:8000/progress_viewer.html

The agent demonstrates domain-agnostic learning through:
- Increasing exploration rate: More unique states discovered over time
- Improving survival: Longer episodes before game over
- Exploration diversity: Discovery of different game areas and states
- Persistent improvement: Performance maintained after checkpoint resume
- Ablation validation: Removing RND/archive collapses exploration
Run comprehensive test suite:
# All tests
python -m pytest redai/tests/
# Specific test categories
python -m pytest redai/tests/test_gradcheck.py # Neural network gradients
python -m pytest redai/tests/test_archive.py # Archive and exploration
python -m pytest redai/tests/test_rnd.py # Random Network Distillation
# Manual gradient checking
python -m redai.tests.test_gradcheck
# Integration smoke test
python -m redai.cli.run_train path/to/rom.gb --smoke-test

The repository is production-ready with:
NeuralQuest/
├── redai/                    # Core package
│   ├── algo/                 # A2C algorithm
│   ├── envs/                 # PyBoy environment + vectorization
│   ├── explore/              # Archive & hashing
│   ├── nets/                 # NumPy networks + RND fixes
│   ├── tracking/             # Empty tracking module
│   ├── tests/                # Test suite
│   └── train/                # Training infrastructure + vectorized trainer
├── configs/                  # TOML configurations
├── scripts/                  # Training & evaluation scripts
├── roms/                     # Game ROMs (gitignored)
├── train_vector.py           # Vectorized training entry point (NEW)
├── progress_viewer.html      # Real-time monitoring dashboard (NEW)
├── pokemon_vector_logs/      # Vectorized training logs (NEW, gitignored)
├── progress_screenshots/     # Training progress images (NEW, gitignored)
├── requirements.txt          # Dependencies
├── setup.py                  # Package installation
└── .gitignore                # Comprehensive exclusions
The .gitignore excludes:
- Training artifacts: checkpoints, logs, save files
- ROMs: Game Boy files (*.gb, *.gbc, *.gba)
- Python cache: __pycache__/, *.pyc, build artifacts
- Development: debug files, local configs, temp files
- System: OS-specific files (Windows, macOS, Linux)
- Data: Large model files, datasets, media files
# Initialize repository
git init
git add .
git commit -m "Initial NeuralQuest implementation

Domain-agnostic RL agent for Game Boy games
RND curiosity + archive-based exploration
Pure NumPy neural networks
Comprehensive test suite"
# Add ROM files locally (not tracked)
cp your_game.gb roms/pokemon_red.gb
# Train with version control
git checkout -b experiment/pokemon-training
python scripts/train_pokemon.py --mode standard
git add configs/ -A
git commit -m "Add Pokemon Red training configuration"

- ROM Management: ROMs are gitignored; distribute them separately
- Checkpoint Storage: Training artifacts excluded from VCS
- Configuration: Use TOML files for reproducible experiments
- Dependencies: Minimal requirements (NumPy, PyBoy, tomli)
- Testing: Run full test suite before deployment
- Fixed random target network and trainable predictor
- Prediction error serves as intrinsic reward for novel states
- Reward normalization via exponential moving averages
- Stability Fixes: L2 regularization, predictor reset mechanism, fallback exploration bonus
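Putting the pieces above together, a minimal NumPy sketch of the RND loop: a frozen random target, a trainable predictor updated on each observation, and prediction error normalized by EMA statistics before clipping. The linear networks, EMA constants, and class name are illustrative simplifications, not the project's actual implementation:

```python
import numpy as np

class RNDSketch:
    """Minimal RND: fixed random target, trainable predictor, EMA-normalized error."""

    def __init__(self, obs_dim, feat_dim=64, lr=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W_target = rng.normal(0, 1 / np.sqrt(obs_dim), (obs_dim, feat_dim))  # frozen
        self.W_pred = rng.normal(0, 1 / np.sqrt(obs_dim), (obs_dim, feat_dim))    # trained
        self.lr = lr
        self.err_mean, self.err_var = 0.0, 1.0   # EMA running statistics

    def intrinsic_reward(self, obs, clip=5.0, ema=0.99):
        target = obs @ self.W_target             # fixed random features
        pred = obs @ self.W_pred
        err = pred - target
        mse = float(np.mean(err ** 2))           # high for novel states, shrinks with training
        # One SGD step on the predictor: gradient of the MSE w.r.t. W_pred
        grad = obs[:, None] * err[None, :] * (2.0 / err.size)
        self.W_pred -= self.lr * grad
        # Normalize by EMA mean/variance, then clip (cf. reward_clip in the [rnd] config)
        self.err_mean = ema * self.err_mean + (1 - ema) * mse
        self.err_var = ema * self.err_var + (1 - ema) * (mse - self.err_mean) ** 2
        r = (mse - self.err_mean) / (np.sqrt(self.err_var) + 1e-8)
        return float(np.clip(r, -clip, clip))
```

Revisiting the same state drives its prediction error toward zero, so intrinsic reward concentrates on states the predictor has not yet fit.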
- SimHash (random projection + sign) for state discretization
- Frontier scoring: α/(1+visits) + γ·age + ζ·depth
- LRU novelty detection with Hamming distance thresholds
- Automatic capacity management with intelligent eviction
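A compact sketch of the hashing and scoring above: SimHash reduces a RAM observation to a 64-bit cell ID via a fixed random projection and sign bits, and the frontier score prioritizes rarely visited, old, deep cells for resets. The projection size and the α/γ/ζ defaults here are illustrative, not the project's tuned values:

```python
import numpy as np

rng = np.random.default_rng(1337)
PROJ = rng.normal(size=(128, 64))   # fixed random projection: obs_dim x 64 bits

def simhash64(obs: np.ndarray) -> int:
    """64-bit SimHash: project the observation, keep only the signs."""
    bits = (obs @ PROJ) > 0
    return sum(int(b) << i for i, b in enumerate(bits))

def frontier_score(visits: int, age: float, depth: int,
                   alpha: float = 1.0, gamma: float = 0.01, zeta: float = 0.1) -> float:
    """Frontier score alpha/(1+visits) + gamma*age + zeta*depth."""
    return alpha / (1 + visits) + gamma * age + zeta * depth
```

Because nearby observations tend to fall on the same side of most projection hyperplanes, similar states collide into the same cell, which is what makes the archive a coarse map of visited game states.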
- NumPy-Only Networks: Custom MLP with Adam optimizer and gradient clipping
- Deterministic Execution: Reproducible results across platforms when enabled
- Efficient Archive: Compressed savestates with hash-based collision detection
- Performance Optimized: >30k environment frames/minute target
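A minimal sketch of one such update, combining global-norm gradient clipping with an Adam step in plain NumPy; the function name and clip threshold are illustrative, not the project's actual optimizer API:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=3e-4, b1=0.9, b2=0.999, eps=1e-8, clip=0.5):
    """One Adam update with global-norm gradient clipping (t is the 1-based step count)."""
    norm = np.linalg.norm(grad)
    if norm > clip:
        grad = grad * (clip / norm)       # rescale so the gradient norm is at most `clip`
    m = b1 * m + (1 - b1) * grad          # first-moment EMA
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment EMA
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```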
- Throughput: ≥30,000 environment frames/minute (headless mode)
- Exploration: >100 unique cells/hour sustained discovery rate
- Memory: Archive capacity up to 20,000 states with compression
- Determinism: Identical results from identical seeds and configurations
- Throughput: 600+ total FPS across 10 parallel environments
- Per-Environment: ~60 FPS per PyBoy instance
- Exploration: >1000 unique cells/hour with parallel discovery
- Memory: Archive capacity up to 10M states with intelligent scaling
- Stability: No crashes during extended training (500+ epochs)
- ROM Ownership: Users must legally own Game Boy ROM files
- Research Use: Single-player research applications only
- No Distribution: No ROM files or ROM-specific data included
- Privacy: Logs contain only emulator states and learned parameters
redai/
├── envs/       # PyBoy environment wrapper + vectorization
├── nets/       # NumPy neural networks (MLP, RND) + stability fixes
├── algo/       # A2C algorithm implementation
├── explore/    # Archive system and hashing
├── tracking/   # Empty tracking module
├── train/      # Training loop + vectorized trainer
├── cli/        # Command-line interface
└── tests/      # Comprehensive test suite
- Install development dependencies: pip install -e .[dev]
- Run tests: pytest
- Format code: black redai/
- Type checking: mypy redai/
- Linting: flake8 redai/
- v1.0: RAM-only observations, A2C+RND, archive exploration ✅
- v1.1: Vectorized training, RND stability fixes ✅ (current)
- v1.2: Optional PyTorch backend for acceleration
- v2.0: Pixel observations with convolutional networks
- v3.0: Hierarchical exploration with option-critic
- v4.0: Model-based planning with learned dynamics
- RND Paper: Random Network Distillation
- Go-Explore: Archive-based exploration
- PyBoy: Game Boy emulator
- GAE Paper: Generalized Advantage Estimation
MIT License - see LICENSE file for details.
Built with PyBoy emulator and inspired by curiosity-driven exploration research.