Skip to content

[recipe] feat: Revamp single-controller demo with agentic multi-turn rollout and add CI#63

Merged
0oshowero0 merged 2 commits intoAscend:mainfrom
vermouth1992:chi/dev/dry-run-v1
Mar 28, 2026
Merged

[recipe] feat: Revamp single-controller demo with agentic multi-turn rollout and add CI#63
0oshowero0 merged 2 commits intoAscend:mainfrom
vermouth1992:chi/dev/dry-run-v1

Conversation

@vermouth1992
Copy link
Copy Markdown
Collaborator

Summary

  • Rewrite single_controller_demo.py to showcase a realistic agentic RLHF training loop: replaces the
    previous hardcoded tensor inputs and AsyncvLLMServer with a multi-turn AgentLoop that interleaves
    LLM generation with simulated tool calls, an OpenAI-style MessageDataset with interleaved image
    support, a proper DataLoader-based training loop, and an explicit compute_reward step that writes
    advantages back through TQ.
  • Adopt the new KV APIs introduced in [feat,CI] Improve KV API usability and KVBatchMeta interactions #57 throughout the recipe — uses kv_batch_get_by_meta,
    the return value of kv_batch_put (cumulative KVBatchMeta), and removes all manual
    kv_meta.fields.append(...) calls.
  • Add structured configuration via @dataclass classes (TrainerConfig, AgentLoopConfig,
    MessageDatasetConfig) and argparse CLI, replacing the flat OmegaConf.create dict. Trainer config
    and TQ config are now cleanly separated.
  • Add recipe-check.yml CI workflow that runs the demo end-to-end on every push/PR with reduced
    parameters (--num-samples 8 --global-batch-size 4 --rollout-agent-num-workers 1) to keep CI fast.

Key changes

Area Before After
Rollout AsyncvLLMServer (Ray actor, single-turn) AgentLoop (multi-turn with tool calls, per-sample async)
Data Hardcoded [[1,2],[3,4],...] tensors MessageDataset + DataLoader with random multi-modal messages
Reward Simulated inline (time.sleep) compute_reward() producing per-token advantages via TQ
KV API kv_batch_get(keys=..., fields=...) + manual field tracking kv_batch_get_by_meta(meta=...) + kv_batch_put return value
Config Flat OmegaConf dict Typed @dataclass hierarchy + argparse CLI
CI None recipe-check.yml workflow

Test plan

  • Run the recipe locally: python recipe/simple_use_case/single_controller_demo.py --num-samples 8 --global-batch-size 4
  • Verify the new recipe-check.yml workflow passes in CI
  • Confirm existing tests (pytest tests) still pass

Signed-off-by: Chi Zhang <czhangseu@gmail.com>
Signed-off-by: Chi Zhang <czhangseu@gmail.com>
@ascend-robot
Copy link
Copy Markdown

CLA Signature Pass

vermouth1992, thanks for your pull request. All authors of the commits have signed the CLA. 👍

@0oshowero0 0oshowero0 merged commit a367879 into Ascend:main Mar 28, 2026
7 checks passed
@vermouth1992 vermouth1992 deleted the chi/dev/dry-run-v1 branch March 28, 2026 06:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants