[recipe] feat: Revamp single-controller demo with agentic multi-turn rollout and add CI by vermouth1992 · Pull Request #63 · Ascend/TransferQueue

vermouth1992 · 2026-03-28T06:05:57Z

Summary

Rewrite single_controller_demo.py to showcase a realistic agentic RLHF training loop: replaces the
previous hardcoded tensor inputs and AsyncvLLMServer with a multi-turn AgentLoop that interleaves
LLM generation with simulated tool calls, an OpenAI-style MessageDataset with interleaved image
support, a proper DataLoader-based training loop, and an explicit compute_reward step that writes
advantages back through TQ.
Adopt the new KV APIs introduced in [feat,CI] Improve KV API usability and KVBatchMeta interactions #57 throughout the recipe — uses kv_batch_get_by_meta,
the return value of kv_batch_put (cumulative KVBatchMeta), and removes all manual
kv_meta.fields.append(...) calls.
Add structured configuration via @dataclass classes (TrainerConfig, AgentLoopConfig,
MessageDatasetConfig) and argparse CLI, replacing the flat OmegaConf.create dict. Trainer config
and TQ config are now cleanly separated.
Add recipe-check.yml CI workflow that runs the demo end-to-end on every push/PR with reduced
parameters (--num-samples 8 --global-batch-size 4 --rollout-agent-num-workers 1) to keep CI fast.

Key changes

Area	Before	After
Rollout	`AsyncvLLMServer` (Ray actor, single-turn)	`AgentLoop` (multi-turn with tool calls, per-sample async)
Data	Hardcoded `[[1,2],[3,4],...]` tensors	`MessageDataset` + `DataLoader` with random multi-modal messages
Reward	Simulated inline (`time.sleep`)	`compute_reward()` producing per-token advantages via TQ
KV API	`kv_batch_get(keys=..., fields=...)` + manual field tracking	`kv_batch_get_by_meta(meta=...)` + `kv_batch_put` return value
Config	Flat `OmegaConf` dict	Typed `@dataclass` hierarchy + `argparse` CLI
CI	None	`recipe-check.yml` workflow

Test plan

Run the recipe locally: python recipe/simple_use_case/single_controller_demo.py --num-samples 8 --global-batch-size 4
Verify the new recipe-check.yml workflow passes in CI
Confirm existing tests (pytest tests) still pass

Signed-off-by: Chi Zhang <czhangseu@gmail.com>

ascend-robot · 2026-03-28T06:06:09Z

CLA Signature Pass

vermouth1992, thanks for your pull request. All authors of the commits have signed the CLA. 👍

vermouth1992 added 2 commits March 28, 2026 13:55

update

4f46008

Signed-off-by: Chi Zhang <czhangseu@gmail.com>

add recipe to ci

1dd11e2

Signed-off-by: Chi Zhang <czhangseu@gmail.com>

ascend-robot added the ascend-cla/yes label Mar 28, 2026

0oshowero0 approved these changes Mar 28, 2026

View reviewed changes

0oshowero0 merged commit a367879 into Ascend:main Mar 28, 2026
7 checks passed

vermouth1992 deleted the chi/dev/dry-run-v1 branch March 28, 2026 06:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[recipe] feat: Revamp single-controller demo with agentic multi-turn rollout and add CI#63

[recipe] feat: Revamp single-controller demo with agentic multi-turn rollout and add CI#63
0oshowero0 merged 2 commits intoAscend:mainfrom
vermouth1992:chi/dev/dry-run-v1

vermouth1992 commented Mar 28, 2026

Uh oh!

ascend-robot commented Mar 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

vermouth1992 commented Mar 28, 2026

Summary

Key changes

Test plan

Uh oh!

ascend-robot commented Mar 28, 2026

CLA Signature Pass

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants