This document provides detailed information about the PwP package structure, architecture, and usage patterns.
PwP is built around a core Docker-based environment system that allows AI agents to interact with visual interfaces through screenshots and input commands. The system follows a modular design with the following key components:
pwp/
├── env/ # Environment module for Docker-based visual interfaces
├── bench/ # Benchmark module for evaluating agents
├── agents/ # Agent implementations
├── utils/ # Utility functions and helpers
├── tools/ # Tools for agent interaction
├── functions/ # Function implementations for tools
├── prompts/ # Prompt templates for agents
└── docker/ # Docker configuration files for environments
The core environment module provides the PwP class, which manages Docker containers for visual interface interaction.
- Docker container management (creation, starting, stopping)
- Screenshot capture and rendering
- Command execution within the container
- File manipulation within the container
- VNC support for remote viewing
- Checkpointing system for environment state preservation
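Because each environment owns a Docker container, it helps to guarantee cleanup even when a command fails partway through. The sketch below wraps the `stop()` and `remove()` calls from this document's usage example in a context manager. `managed_env` is a hypothetical helper, not part of PwP, and it assumes only that the environment object exposes those two methods.

```python
from contextlib import contextmanager


@contextmanager
def managed_env(make_env):
    """Yield an environment and always stop and remove its container.

    `make_env` is any zero-argument callable returning an object with
    `stop()` and `remove()` methods, e.g. `lambda: PwP(image_name='pwp_env')`.
    `managed_env` itself is a hypothetical helper, not part of PwP.
    """
    env = make_env()
    try:
        yield env
    finally:
        # Mirror the manual cleanup order: stop the container, then remove it.
        try:
            env.stop()
        finally:
            # Attempt removal even if stopping raised.
            env.remove()
```

With this in place, a session could be written as `with managed_env(lambda: PwP(image_name='pwp_env')) as env: ...`, and the container is released whether or not the body raises.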
from pwp.env import PwP
# Create a basic environment
env = PwP(image_name='pwp_env')
# Execute a command
result = env.step("ls -la")
print(result['output'])
# Take a screenshot
screenshot = env.render()
screenshot.save('current_state.png')
# Get DOM structure with bounding boxes
annotated_img, dom_data = env.get_som_image(screenshot)
annotated_img.save('annotated_screenshot.png')
# Get currently visible file in editor
file_view = env.get_file_view()
print(f"Current file: {file_view['filePath']}")
print(f"Cursor position: {file_view['cursorPosition']}")
# Create a checkpoint
env.add_checkpoint("my_checkpoint")
# Restore from checkpoint
env.restore_checkpoint("my_checkpoint")
# Clean up
env.stop()
env.remove()
The benchmark module provides the PwPBench class, which manages benchmark tasks and evaluation.
PwP supports a wide range of benchmark tasks:
- humaneval: Python coding problems
- swebench: Software engineering benchmark
- swebench-java: Java software engineering benchmark
- dsbench: Data science benchmark
- chartmimic: Chart recreation tasks
- intercode: Interactive coding in bash, SQL, CTF
- design2code: Converting design mockups to code
- canitedit: Code editing tasks
- resq: Reasoning about SQL queries
- minictx: Minimal context understanding
- bird: BI reporting dashboard tasks
- vscode: VSCode-specific tasks
- nocode: No-code tool interaction
- swebench-mm: Multimodal software engineering
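To score an agent on one of these tasks, the per-task calls can be wrapped in a loop. The following is a minimal sketch that relies only on the `get_dataset`, `get_env`, and `get_reward` methods used in this document's PwPBench usage example. It assumes the dataset is iterable and that rewards are numeric. The `evaluate` function and its `solve` callback are hypothetical names introduced for illustration.

```python
def evaluate(bench, solve, limit=None):
    """Run `solve(env, task)` on each task and return (rewards, mean_reward).

    `bench` needs get_dataset(), get_env(task) and get_reward(env, task);
    environments need stop() and remove(). `solve` is a hypothetical
    callback that drives the environment, for example an agent loop.
    """
    tasks = list(bench.get_dataset())
    if limit is not None:
        tasks = tasks[:limit]

    rewards = []
    for task in tasks:
        env = bench.get_env(task)
        try:
            solve(env, task)
            # Assumption: the reward can be converted to a float.
            rewards.append(float(bench.get_reward(env, task)))
        finally:
            # Release the container even if the task raised.
            env.stop()
            env.remove()

    mean = sum(rewards) / len(rewards) if rewards else 0.0
    return rewards, mean
```

A call such as `evaluate(PwPBench('humaneval'), solve, limit=5)` would then run the first five tasks; keeping `limit` small is useful while debugging, since every task starts its own container.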
from pwp.bench import PwPBench
# Create a benchmark instance
bench = PwPBench('humaneval')
# Get the dataset
dataset = bench.get_dataset()
print(f"Loaded {len(dataset)} tasks")
# Create an environment for a specific task
env = bench.get_env(dataset[0])
# Evaluate a solution
reward = bench.get_reward(env, dataset[0])
print(f"Task reward: {reward}")
The agents module provides implementations of different agent architectures for interacting with visual interfaces.
- AssistedAgent: Agent that receives assistance from a human
- ComputerUseAgent: Agent that interacts directly with the computer
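Whatever the architecture, an agent in this setting alternates between looking at the screen and issuing a command. The sketch below shows that loop using only the `render()` and `step()` methods from the environment usage example. It does not reproduce the actual AssistedAgent or ComputerUseAgent code; `run_episode` and the `policy` callable are hypothetical names.

```python
def run_episode(env, policy, max_steps=10):
    """Alternate between observing the screen and sending one command.

    `policy(screenshot, history)` is a hypothetical callable that returns
    the next command string, or None when it considers the task finished.
    """
    history = []
    for _ in range(max_steps):
        screenshot = env.render()          # observe the current screen
        command = policy(screenshot, history)
        if command is None:                # policy signals completion
            break
        result = env.step(command)         # act inside the container
        history.append((command, result))
    return history
```

The `max_steps` cap keeps a policy that never signals completion from running indefinitely, and the returned history gives the (command, result) pairs for later inspection.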
The utilities module provides helper functions for various tasks:
- Image processing and manipulation
- DOM element parsing and visualization
- LLM utilities for embedding and encoding
- Caching utilities
from pwp.utils.utils import draw_bounding_boxes
# Draw bounding boxes on an image based on DOM data
annotated_img = draw_bounding_boxes(
    dom_data_csv,
    screenshot,
    viewport_size={'height': 1080, 'width': 1920},
    caption_icons=True
)
The tools module provides implementations of tools that agents can use to interact with environments.
- Computer interaction tools (mouse, keyboard)
- File system tools (read, write, search)
- UI analysis tools (element identification)
- DOM manipulation tools
The functions module provides implementations of functions that are called by tools.
The prompts module provides templates for different agent types.
The docker module contains the environment Dockerfile, other configuration files, and setup scripts for creating Docker environments.
You can create custom environments by extending the base Docker image:
FROM pwp_env
# Install additional packages
RUN apt-get update && apt-get install -y \
    your-package \
    && rm -rf /var/lib/apt/lists/*
# Add application files
COPY your-app /home/devuser/your-app
# Set up application
RUN cd /home/devuser/your-app && \
    npm install
To add a new benchmark task, please follow the detailed instructions in our Contributing Guidelines.
In brief:
- Create a directory in pwp_bench/ with your task name
- Add a data.jsonl or data.json file with task examples
- Create a setup_files directory with setup.py and eval.py
- Add your task to task_configs in pwp.bench.benchmark
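As an illustration of the data file step, the sketch below writes and reads a `data.jsonl` file with one JSON object per line. The record fields (`task_id`, `instruction`) and the `my_task` directory name are placeholders; the real schema expected by PwP is defined in the Contributing Guidelines and may differ. A temporary directory stands in for the repository root so the example runs anywhere.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path, records):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical task records: PwP's real schema may use different fields.
examples = [
    {"task_id": "my_task/0", "instruction": "Create hello.py that prints hi"},
    {"task_id": "my_task/1", "instruction": "Rename main.py to app.py"},
]

# A temporary directory stands in for the repository root.
root = Path(tempfile.mkdtemp())
task_dir = root / "pwp_bench" / "my_task"   # "my_task" is a placeholder name
task_dir.mkdir(parents=True, exist_ok=True)

data_file = task_dir / "data.jsonl"
write_jsonl(data_file, examples)
loaded = read_jsonl(data_file)
```

Reading the file back right after writing it is a cheap way to confirm that every line parses before the task is registered.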
To create a new agent:
- Create a new file in pwp.agents
- Implement the agent interface
- Add any necessary prompts to pwp.prompts
- Add any custom tools to pwp.tools