Hi,
I trained a model using the Flow-GRPO training pipeline following the README setup:
- Base model: Qwen/Qwen2.5-7B-Instruct (served via vLLM)
- MODEL_ENGINE in config.yaml: ["trainable", "gpt-4o-mini", "gpt-4o-mini", "gpt-4o-mini"] (i.e., I used gpt-4o-mini for the non-trainable model engines, while the trainable engine is the local vLLM-served model)
Now I’m trying to evaluate the trained checkpoint using the “AgentFlow Benchmark” section in the README. My understanding is:
- Serve the model checkpoint with vLLM
- Run the benchmark run.sh script
Question: Is that the correct evaluation flow?
If so, I’m confused about how to use my local training checkpoint with the provided benchmark scripts. The example scripts seem to be written for the published model on Hugging Face (AgentFlow-7B/agentflow-planner-7b). I’m not sure what to change in:
- serve_vllm.sh
- run.sh
…so they load my checkpoint saved under:
checkpoints/AgentFlow_pro/AgentFlow_pro/global_step_*
Could you please clarify what edits are needed and the recommended way to point the benchmark scripts to a locally trained checkpoint?
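In case it helps, this is roughly how I resolve the newest checkpoint directory before passing it to the scripts. It is only a sketch: `latest_checkpoint` is my own helper (not part of AgentFlow), and it assumes the `global_step_*` suffix is an integer training step. I don't know whether that directory can be served by vLLM as-is or needs converting first, which is part of what I'm asking.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(root: str) -> Optional[Path]:
    """Return the global_step_<N> directory with the largest N, or None.

    Hypothetical helper: assumes checkpoints are saved as sibling
    directories named global_step_<integer> under `root`.
    """
    best: Optional[Path] = None
    best_step = -1
    for path in Path(root).glob("global_step_*"):
        match = re.fullmatch(r"global_step_(\d+)", path.name)
        # Skip files and names whose suffix is not a plain integer.
        if not path.is_dir() or match is None:
            continue
        step = int(match.group(1))
        if step > best_step:
            best, best_step = path, step
    return best


if __name__ == "__main__":
    # Path taken from my training run; adjust to your own layout.
    print(latest_checkpoint("checkpoints/AgentFlow_pro/AgentFlow_pro"))
```

Comparing the step numbers as integers matters here, since a plain string sort would put global_step_9 after global_step_10.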
Thanks!