How to run AgentFlow Benchmark with a locally trained Flow-GRPO checkpoint

Hi,

I trained a model using the Flow-GRPO training pipeline following the README setup:

- Base model: `Qwen/Qwen2.5-7B-Instruct` (served via vLLM)
- `MODEL_ENGINE` in `config.yaml`: `["trainable", "gpt-4o-mini", "gpt-4o-mini", "gpt-4o-mini"]` (the trainable engine is the local vLLM-served model, and the three non-trainable engines use `gpt-4o-mini`)

Now I’m trying to evaluate the trained checkpoint using the “AgentFlow Benchmark” section in the README. My understanding is:

1. Serve the model checkpoint with vLLM
2. Run the benchmark `run.sh` script

**Question:** Is that the correct evaluation flow?

If so, I’m unsure how to use my **local training checkpoint** with the provided benchmark scripts. The example scripts seem to be written for the published model on Hugging Face (`AgentFlow-7B/agentflow-planner-7b`), and I don’t know what to change in:

- `serve_vllm.sh`
- `run.sh`

…so they load my checkpoint saved under:
`checkpoints/AgentFlow_pro/AgentFlow_pro/global_step_*`
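
To be concrete about which directory I mean, here is a minimal sketch of how I would pick the most recent `global_step_*` folder. The helper name `latest_checkpoint` is hypothetical (it is not part of the repo), and it assumes the suffix after `global_step_` is an integer step count. I don't know whether the resulting directory can be passed to vLLM as-is or needs converting first, which is part of my question.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(root: str) -> Optional[Path]:
    """Return the global_step_* directory with the highest step number.

    Hypothetical helper for illustration; assumes directory names look
    like global_step_<integer>. Returns None if nothing matches.
    """
    pattern = re.compile(r"global_step_(\d+)$")
    candidates = []
    for path in Path(root).glob("global_step_*"):
        match = pattern.search(path.name)
        if path.is_dir() and match:
            # Compare numerically so that step 10 sorts after step 9.
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


if __name__ == "__main__":
    # Path taken from my training run; prints None if it does not exist.
    print(latest_checkpoint("checkpoints/AgentFlow_pro/AgentFlow_pro"))
```

The path printed here is what I would like `serve_vllm.sh` and `run.sh` to use in place of the Hugging Face model name.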

Could you please clarify what edits are needed and the recommended way to point the benchmark scripts to a locally trained checkpoint?

Thanks!
