EnvBeats

This is an example repo demonstrating integration between OpenEnv and AgentBeats.

Kickoff: "Assess agent at url=..."
  │
  v
┌────────────────┐   (1) init / reset()  ┌───────────┐
│                │──────────────────────>│           │
│  Assessor A2A  │  (return StepResult   │  OpenEnv  │
│                │       from reset)     │           │
└────────────────┘                       └───────────┘
  │          │                                 ^
  │          │                                 │ exposes
  │          │                                 v
  │          │  (2) Create New MCP server ┌─────────┐        ┌───────────────┐
  │          └───────────────────────────>│         │───────>│               │
  │             & connect to gateway      │ New MCP │        │ MCP-X Gateway │
  │                                       │         │<───────│               │
  │                                       └─────────┘        └───────────────┘
  │                                      (done or                   ^
  │                                       timeout)                  │ (5)
  │                                                                 │ step()
  │                                                                 │ state()
  │                                                                 │
  │  (3) Send task instructions                                     │
  │      (include reset() StepResult)    ┌────────────────┐         │
  └─────────────────────────────────────>│                │─────────┘
                                         │  Assessee A2A  │
                      (4) "Ok will do"   │                │
                        <────────────────│                │
                                         └────────────────┘

How to run

Start MCP-X (on port 9000 by default):

cd mcp-x
uv run python mcp_x.py

Start the assessor agent (on port 9999 by default):

cd eb_assessor
uv run python main.py

Start the assessee agent (on port 9990 by default):

Three examples are provided. You can run any one of them, or all of them, to see how different assessees interact with the assessor and the environment.

eb_assessee_gym: a gym-style agent implementation using MCP tools to interact with the environment

cd eb_assessee_gym
uv run python main.py
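
For orientation, a gym-style loop over MCP tools might look like the sketch below. It is not the code in eb_assessee_gym: the `call_tool` callable, the result keys (`observation`, `reward`, `done`), the placeholder policy, and the in-memory counter environment are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: (tool name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def choose_action(observation: Any) -> int:
    """Placeholder policy: always add 1, regardless of the observation."""
    return 1


def run_episode(call_tool: ToolCaller, first_observation: Any, max_steps: int = 10) -> float:
    """Call the `step` tool until the episode ends; return the total reward."""
    observation = first_observation
    total_reward = 0.0
    for _ in range(max_steps):
        result = call_tool("step", {"action": choose_action(observation)})
        total_reward += result["reward"]
        observation = result["observation"]
        if result["done"]:
            break
    return total_reward


def make_fake_tools(target: int) -> ToolCaller:
    """In-memory stand-in for the MCP tools: a counter that ends at `target`."""
    state = {"count": 0}

    def call_tool(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        if name == "step":
            state["count"] += arguments["action"]
            done = state["count"] >= target
            return {"observation": state["count"], "reward": 1.0 if done else 0.0, "done": done}
        if name == "state":
            return dict(state)
        raise ValueError(f"unknown tool: {name}")

    return call_tool
```

In a real assessee, `call_tool` would be backed by an MCP client connected to the gateway, and the first observation would come from the task instruction message.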

eb_assessee_pure_mcp: a slightly modified, LLM-driven A2A agent taken directly from Google's repo, written without any LLM framework

# remember to configure the `.env` file first so that it includes the key
cd eb_assessee_pure_mcp/a2a-mcp-without-framework
uv run --env-file ../.env python -m src.no_llm_framework.server.__main__ --port 9990

eb_assessee_human: a human-in-the-loop example where the human can trigger MCP calls manually

# remember to run `npx @modelcontextprotocol/inspector` first to install the inspector for MCP debugging
cd eb_assessee_human
uv run python main.py

Kick off the evaluation:

cd eb_kickoff
uv run python main.py
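
Because the kickoff depends on the other services already running, it can help to confirm that their ports accept connections first. The helper below is a small sketch, not part of the repo: the port numbers are the defaults listed above, while the host `127.0.0.1` and the function names are assumptions.

```python
import socket

# Default ports from the steps above; the host is an assumption.
DEFAULT_SERVICES = {"mcp-x": 9000, "eb_assessor": 9999, "assessee": 9990}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_services(services=DEFAULT_SERVICES, host: str = "127.0.0.1"):
    """Return the names of services whose port does not accept connections."""
    return [name for name, port in services.items() if not is_listening(port, host)]
```

Calling `missing_services()` returns an empty list when all three ports accept connections, and otherwise the names of the services that still need to be started.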

Workflow

The environment's initial message (the result of reset()) is passed to the assessee in the task instruction message.
The assessee agent accesses the OpenEnv interface through MCP tools (state(), step()).
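
The two points above can be sketched from the assessor's side as follows. This is a minimal illustration under stated assumptions, not the repo's implementation: `CounterEnv` is a toy environment, the `StepResult` dataclass here is a stand-in with assumed fields, and `build_task_message` and `expose_tools` are hypothetical helper names. No real OpenEnv, A2A, or MCP API is used.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class StepResult:
    """Stand-in for the environment's step result; the fields are assumed."""
    observation: int
    reward: float
    done: bool


class CounterEnv:
    """Toy environment (hypothetical): reach `target` by adding to a counter."""

    def __init__(self, target: int = 2):
        self.target = target
        self.count = 0

    def reset(self) -> StepResult:
        self.count = 0
        return StepResult(observation=0, reward=0.0, done=False)

    def step(self, action: int) -> StepResult:
        self.count += action
        done = self.count >= self.target
        return StepResult(self.count, 1.0 if done else 0.0, done)

    def state(self) -> dict:
        return {"count": self.count, "target": self.target}


def build_task_message(initial: StepResult) -> str:
    """Embed the reset() result in the instruction text sent to the assessee."""
    return (
        "Complete the task using the `step` and `state` tools.\n"
        "Initial result from reset(): " + json.dumps(asdict(initial))
    )


def expose_tools(env: CounterEnv) -> dict:
    """Expose only step() and state(); in this sketch reset() stays with the assessor."""
    return {
        "step": lambda action: asdict(env.step(action)),
        "state": env.state,
    }
```

The assessor calls `reset()` once, sends `build_task_message(...)` to the assessee, and registers the callables returned by `expose_tools(...)` as tools, so the assessee only ever sees `step` and `state`.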

Future work

This approach could be made generic for any OpenEnv environment (without Python type enforcement).
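
One way such a generic layer might look is sketched below: it wraps the named methods of any object as untyped tools that take keyword arguments and return plain data. The names `generic_tools` and `to_plain` are hypothetical, and the sketch assumes results are either dataclass instances or already JSON-friendly values.

```python
from dataclasses import asdict, is_dataclass
from typing import Any, Callable, Dict


def to_plain(value: Any) -> Any:
    """Convert dataclass instances to dicts; pass other values through unchanged."""
    if is_dataclass(value) and not isinstance(value, type):
        return asdict(value)
    return value


def generic_tools(env: Any, names=("step", "state")) -> Dict[str, Callable[..., Any]]:
    """Wrap the named methods of any object as untyped, keyword-argument tools."""
    tools: Dict[str, Callable[..., Any]] = {}
    for name in names:
        method = getattr(env, name)  # raises AttributeError if the env lacks it

        # Bind `method` as a default so each tool keeps its own method.
        def tool(_method=method, **arguments):
            return to_plain(_method(**arguments))

        tools[name] = tool
    return tools
```

Since arguments arrive as plain keyword values and results leave as plain dicts, the wrapper does not need to know the environment's action or observation classes. The trade-off is that malformed arguments are only caught when the underlying method raises.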
