A framework for computer-use software engineering agents that interact with computers through visual perception and basic actions.
Programming with Pixels (PwP) is a framework for evaluating and developing Software Engineering (SWE) agents that interact with computers as humans do: through visual perception and basic actions such as typing and clicking.
Our motivating hypothesis is that general-purpose SWE agents require a shift to computer-use agents, which interact with any IDE through screenshots and primitive actions rather than through specialized tool APIs.
- Visual Interface Interaction: Interact with any visual interface through screenshots and input commands
- Docker Containerization: Secure, reproducible environments for running applications
- Benchmark Suite: Extensive set of programming tasks for evaluating agents
- Agent Framework: Built-in support for different agent architectures
- VNC Support: Remote viewing of graphical environments
- CLI: Command-line tools for working with environments
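
The observe-then-act loop behind these features can be sketched in a few lines. This is a toy illustration under stated assumptions, not PwP's implementation: `ToyScreenEnv`, `Click`, `TypeText`, `scripted_agent`, and `run_episode` are hypothetical names, and the observation is a plain dictionary standing in for a real screenshot.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Click:
    """Primitive action: click at a screen coordinate."""
    x: int
    y: int


@dataclass
class TypeText:
    """Primitive action: type a string into whatever has focus."""
    text: str


Action = Union[Click, TypeText]


class ToyScreenEnv:
    """Hypothetical stand-in for a GUI environment (not the PwP API).

    The "screen" holds a single text field occupying a fixed rectangle.
    """

    FIELD = (10, 10, 200, 40)  # left, top, right, bottom of the text field

    def __init__(self) -> None:
        self.focused = False
        self.buffer = ""

    def render(self) -> dict:
        """Return an observation; a real environment would return pixels."""
        return {"field": self.FIELD, "focused": self.focused, "text": self.buffer}

    def step(self, action: Action) -> dict:
        """Apply one primitive action and return the new observation."""
        if isinstance(action, Click):
            left, top, right, bottom = self.FIELD
            # A click focuses the field only if it lands inside the rectangle.
            self.focused = left <= action.x <= right and top <= action.y <= bottom
        elif isinstance(action, TypeText) and self.focused:
            # Keystrokes are dropped unless the field has focus.
            self.buffer += action.text
        return self.render()


def scripted_agent(observation: dict, goal: str) -> Action:
    """Choose the next action from the observation alone."""
    if not observation["focused"]:
        left, top, right, bottom = observation["field"]
        return Click((left + right) // 2, (top + bottom) // 2)
    # Type whatever part of the goal text is still missing.
    return TypeText(goal[len(observation["text"]):])


def run_episode(goal: str, max_steps: int = 5) -> str:
    """Alternate observing and acting until the field contains the goal text."""
    env = ToyScreenEnv()
    obs = env.render()
    for _ in range(max_steps):
        if obs["text"] == goal:
            break
        obs = env.step(scripted_agent(obs, goal))
    return obs["text"]


if __name__ == "__main__":
    print(run_episode("print('hi')"))
```

The agent never calls an editor-specific API: it only reads the observation and emits clicks and keystrokes, which is why the same loop could in principle drive any interface.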
pip install programming-with-pixels

from pwp import PwP
from pwp import PwPBench
# Create a basic environment
env = PwP(image_name='pwp_env')
# Take a screenshot
observation = env.render()
observation.save('screenshot.png')
# Execute a command
result = env.step("echo 'Hello, World!'")
print(result['output'])
# Try a benchmark task
bench = PwPBench('humaneval')
dataset = bench.get_dataset()
task_env = bench.get_env(dataset[0])

This project is licensed under the MIT License.