We welcome contributions.
Want to evaluate a new coding agent? The framework is designed for this.
- Implement the `Agent` interface from `src/slop_code/agent_runner/agent.py`
- Define lifecycle methods: `setup()`, `run()`, `reset()`, `cleanup()`
- Create an agent configuration YAML in `configs/agents/`
- Add setup documentation in `docs/agents/`
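
As a rough illustration of the four lifecycle methods listed above, here is a minimal sketch. The abstract `Agent` class below is a hypothetical stand-in, not the real interface from `src/slop_code/agent_runner/agent.py`; the method parameters and return types are assumptions, so check the actual base class before copying anything.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path


class Agent(ABC):
    """Hypothetical stand-in for the real interface in agent.py."""

    @abstractmethod
    def setup(self, workspace: Path) -> None: ...

    @abstractmethod
    def run(self, task: str) -> str: ...

    @abstractmethod
    def reset(self) -> None: ...

    @abstractmethod
    def cleanup(self) -> None: ...


class EchoAgent(Agent):
    """Toy agent that records the tasks it receives."""

    def __init__(self) -> None:
        self.workspace: Path | None = None
        self.history: list[str] = []

    def setup(self, workspace: Path) -> None:
        # Acquire whatever the agent needs before its first run.
        self.workspace = workspace

    def run(self, task: str) -> str:
        if self.workspace is None:
            raise RuntimeError("setup() must be called before run()")
        self.history.append(task)
        return f"echo: {task}"

    def reset(self) -> None:
        # Drop per-run state but keep the workspace for the next run.
        self.history.clear()

    def cleanup(self) -> None:
        # Release everything acquired in setup().
        self.workspace = None
        self.history.clear()
```

In this sketch `reset()` clears state between runs while `cleanup()` tears the agent down completely; whether the framework draws the line in the same place is an assumption.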
Contribute new evaluation problems to expand the benchmark. The process has three steps: design your problem, implement it, then validate and submit.
Start with a full problem concept, then break it into checkpoints. Good problems test whether agents can write flexible, maintainable code that handles progressive requirements.
- Problem Design Philosophy - What makes a good problem, checkpoint patterns, common pitfalls
- Example: Designing a Calculator Problem - See the design thinking process in action
Turn your design into code with test cases, loader, and verifier.
- Step-by-Step Tutorial - Create your first problem (30 min hands-on)
- Problem Structure Reference - Directory layout and file roles
- Test Case Authoring - Writing effective test cases
- Creating Loaders & Verifiers - Implementation patterns
Ensure your problem works and submit a PR.
- Validation Checklist - Pre-submission checklist with PR guidance
- Troubleshooting - Common issues and solutions
| Component | Description |
|---|---|
| `config.yaml` | Problem configuration with inline checkpoint definitions |
| `checkpoint_N.md` | Specification for each checkpoint |
| `tests/test_checkpoint_N.py` | Pytest tests for each checkpoint |
| `tests/conftest.py` | Shared pytest fixtures |
| `tests/data/checkpoint_N/` | Test case data (core, hidden, errors) |
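
To make the relationship between the test files and the test data more concrete, here is a small sketch of how a checkpoint test might read its cases. Everything specific in it is an assumption for illustration: the one-subdirectory-per-group layout, the JSON file format, the `input`/`expected` keys, and the `load_cases` helper are all hypothetical, and the real problems may organize their data differently.

```python
from __future__ import annotations

import json
from pathlib import Path

# Assumed group names, taken from the "core, hidden, errors" description.
GROUPS = ("core", "hidden", "errors")


def load_cases(data_dir: Path) -> dict[str, list[dict]]:
    """Read every *.json case under data_dir/<group>/, sorted by file name."""
    cases: dict[str, list[dict]] = {}
    for group in GROUPS:
        group_dir = data_dir / group
        # A group without a directory simply contributes no cases.
        files = sorted(group_dir.glob("*.json")) if group_dir.is_dir() else []
        cases[group] = [json.loads(f.read_text()) for f in files]
    return cases


def test_core_cases_have_expected_output(tmp_path: Path) -> None:
    """Pytest-style check; tmp_path is pytest's built-in temp-dir fixture."""
    core = tmp_path / "core"
    core.mkdir()
    case = {"input": "1 + 1", "expected": "2"}  # hypothetical case schema
    (core / "add.json").write_text(json.dumps(case))

    cases = load_cases(tmp_path)

    assert cases["core"] == [case]
    assert cases["hidden"] == []
    assert cases["errors"] == []
```

A helper like `load_cases` would be a natural candidate for a shared fixture in `tests/conftest.py`, so each `test_checkpoint_N.py` can reuse it.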

    uv sync

    uv run pytest -q                          # Run all tests
    uv run pytest tests/path/to/test_file.py  # Run specific test

    uv run ruff check .  # Lint
    uv run isort .       # Format imports

- Line length: 80 characters
- Use `from __future__ import annotations` in all modules
- Use pathlib (`Path`) instead of `os.path`
- Type all function signatures
- Use Pydantic models for configuration
- Use `structlog.get_logger(__name__)` for logging

Run `uv run slop-code --help`, or look at the documentation.