DAG-based orchestrator: replace builder claim loop with centralized story assignment #2

@markgar

Summary

Replace the current decentralized builder claim loop (optimistic locking via git push) with a centralized DAG executor model where the orchestrator assigns stories to builders and launches them per-milestone.

Current Model

  • Builders are long-running processes that loop: claim → plan → build → merge → repeat
  • Claiming uses optimistic locking: mark [N] in BACKLOG.md, commit, push — if push fails (race), pull and retry
  • Fixed number of builders run for the entire session, even when the dependency graph is narrow
  • Builders sit idle waiting when their dependencies aren't met
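For reference, the claim race can be reproduced without git. This is a minimal sketch, not the code in builder.py: `FakeRemote` and `claim` are hypothetical names, and an integer revision stands in for the remote branch head, so a push from a stale base is rejected the way a non-fast-forward push would be.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FakeRemote:
    """In-memory stand-in for the shared git remote (hypothetical)."""
    revision: int = 0
    claimed: Dict[int, str] = field(default_factory=dict)

    def pull(self) -> Tuple[int, Dict[int, str]]:
        # Return a snapshot of the remote state, like a fresh local checkout.
        return self.revision, dict(self.claimed)

    def push(self, base_revision: int, story: int, builder: str) -> bool:
        # Reject the push if someone else pushed since our last pull.
        if base_revision != self.revision:
            return False
        self.claimed[story] = builder
        self.revision += 1
        return True


def claim(remote: FakeRemote, builder: str, backlog: List[int],
          snapshot: Optional[Tuple[int, Dict[int, str]]] = None):
    """Claim the first unclaimed story; return (story or None, retry count)."""
    retries = 0
    revision, claimed = remote.pull() if snapshot is None else snapshot
    while True:
        story = next((s for s in backlog if s not in claimed), None)
        if story is None:
            return None, retries
        if remote.push(revision, story, builder):
            return story, retries
        # Lost the race: pull the latest state and try again.
        retries += 1
        revision, claimed = remote.pull()
```

A builder that starts from a stale snapshot loses the race for the first story, pulls, and claims the next one after one retry. That retry is the noise the proposal removes.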

Proposed Model

The orchestrator becomes a DAG executor (similar to Airflow, make -j, or CI pipeline runners). The backlog dependency graph is the DAG, stories are tasks, builders are workers.

while stories remain or builders are running:
    eligible = unstarted stories whose dependencies are all done
    for story in eligible (while running builders < max_concurrency):
        plan_milestone(story)
        launch_builder(story)  # fresh copilot session
    finished = wait_for_any_builder_to_finish()
    merge_completed_branch(finished)
    mark_story_done(finished)
    # loop: completion may have unblocked more stories
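A runnable sketch of that loop, using only the standard library. Everything here is an assumption for illustration: `run_dag`, the `deps` mapping (story to its set of prerequisites), and the `build` callable, which stands in for plan + launch + merge. A thread pool plays the role of the builder processes.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_dag(deps, build, max_concurrency=2):
    """Run build(story) for every story, respecting deps; return completion order."""
    done, order = set(), []
    pending = set(deps)
    running = {}  # future -> story
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        while pending or running:
            # Eligible = not started yet and every prerequisite is done.
            eligible = sorted(s for s in pending if deps[s] <= done)
            free_slots = max_concurrency - len(running)
            for story in eligible[:free_slots]:
                pending.remove(story)
                running[pool.submit(build, story)] = story
            if not running:
                # Nothing running and nothing launchable: cycle or missing story.
                raise ValueError(f"unsatisfiable dependencies: {sorted(pending)}")
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in finished:
                story = running.pop(future)
                future.result()  # re-raise any builder error
                done.add(story)  # stand-in for merge + mark done
                order.append(story)
    return order


if __name__ == "__main__":
    backlog = {
        "scaffold": set(),
        "api": {"scaffold"},
        "ui": {"scaffold"},
        "e2e": {"api", "ui"},
    }
    print(run_dag(backlog, build=lambda story: None, max_concurrency=2))
```

Eligibility is recomputed from the full set of unstarted stories on every pass, so a story that was eligible but did not get a slot is picked up later instead of being dropped.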

Key changes

  1. Centralized assignment — orchestrator decides which builder gets which story. No claiming races, no retry loops.
  2. Reactive scaling — launch a builder when a story becomes eligible, not a fixed pool upfront. Early bottleneck stories (scaffolding) get 1 builder; wide parallel bands get many.
  3. Stateless builders — each builder is a fresh Copilot session per milestone. Clean context, no drift, no accumulated state.
  4. Per-milestone model selection — orchestrator could assign different models to different stories (opus for complex, haiku for simple).
  5. Orchestrator-managed merge — orchestrator handles branch merging after builder completion, simplifying builder code.
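Per-milestone model selection (item 4) could start as a plain function in the orchestrator. This is only a sketch: the `Story` fields, the `estimated_tasks` complexity signal, and the threshold are hypothetical; only the model names come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    number: int
    title: str
    estimated_tasks: int  # hypothetical complexity signal from the planner


def pick_model(story: Story, threshold: int = 5) -> str:
    """Route complex stories to the larger model, simple ones to the smaller."""
    return "opus" if story.estimated_tasks >= threshold else "haiku"
```

Because each builder is a fresh session, the choice can be made at launch time with no builder-side state to update.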

Advantages

  • No git push races or retry noise
  • Smarter scheduling (critical path awareness, load balancing)
  • Never wastes builder slots on idle waiting
  • Builder failures are isolated — retry that milestone without killing the whole loop
  • Natural place for per-story model selection
  • max_concurrency is a resource limit, not a fixed allocation
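One way to get critical path awareness: rank each story by the length of the longest chain of stories that depends on it, and launch the highest-ranked eligible story first. A minimal sketch, assuming an acyclic `deps` mapping (story to prerequisites) in which every prerequisite is also a key; `critical_path_lengths` is a hypothetical helper name.

```python
from functools import lru_cache


def critical_path_lengths(deps):
    """Map each story to the number of stories on its longest downstream chain."""
    dependents = {story: set() for story in deps}
    for story, prereqs in deps.items():
        for prereq in prereqs:
            dependents[prereq].add(story)

    @lru_cache(maxsize=None)
    def depth(story):
        # A leaf story counts as 1; otherwise 1 + the deepest dependent.
        return 1 + max((depth(d) for d in dependents[story]), default=0)

    return {story: depth(story) for story in deps}


def prioritize(eligible, lengths):
    """Order eligible stories so the longest downstream chain starts first."""
    return sorted(eligible, key=lambda s: (-lengths[s], s))
```

With a single free slot, this picks the story that unblocks the longest remaining chain rather than whichever happens to come first in the backlog.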

Complications to address

  • Merge coordination: who merges, when, what if two milestones touch the same files?
  • Reviewer/tester/validator timing: they currently react to branch activity and milestone signals — triggers may need rethinking
  • Planner currently runs inside builder's claim loop — pull it into the orchestrator
  • Need a "builder finished" signal from each short-lived builder invocation
  • Significant restructuring of builder.py and orchestrator.py
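For the merge question, the orchestrator could at least detect overlap before merging. The sketch below assumes it can get the set of files each milestone branch touched (for example from `git diff --name-only`); `conflicting_pairs` and the branch names are hypothetical.

```python
from itertools import combinations


def conflicting_pairs(touched):
    """touched maps branch -> set of file paths; return pairs that share files."""
    overlaps = {}
    for a, b in combinations(sorted(touched), 2):
        shared = touched[a] & touched[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps
```

Branches with no overlap can be merged in any order. Overlapping pairs could be merged one at a time, with the later branch rebased or retried. Note that this only catches file-level overlap, not semantic conflicts.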

Migration path

The migration could be incremental:

  1. First: move story assignment from builders to orchestrator (eliminate claiming)
  2. Then: make builders single-milestone invocations instead of loops
  3. Then: add reactive scaling based on dependency graph
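For step 2, the simplest "builder finished" signal is the exit code of the builder process. A sketch, assuming the builder can be started as a command that takes the story number as its last argument; `run_builder` and that calling convention are assumptions, not the current builder.py interface.

```python
import subprocess
import sys


def run_builder(story: int, command: list) -> bool:
    """Run one single-milestone builder to completion; True means exit code 0."""
    completed = subprocess.run([*command, str(story)], check=False)
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-in builder: a tiny Python process that succeeds only for story 3.
    fake_builder = [
        sys.executable, "-c",
        "import sys; sys.exit(0 if sys.argv[1] == '3' else 1)",
    ]
    print(run_builder(3, fake_builder), run_builder(4, fake_builder))
```

A non-zero exit marks only that milestone as failed, so the orchestrator can retry it without affecting other running builders.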

Migrated from markgar/multi-agent-dev#3

Labels: enhancement