Merged
20 changes: 20 additions & 0 deletions .claude/skills/map-skill-eval/SKILL.md
@@ -141,6 +141,26 @@ mapify skill-eval view map-plan
mapify skill-eval view map-plan --result .map/eval-runs/map-plan/20260601T120000-optimize.json --open
```

## Optimizing the whole skill (BODY/logic), not just the description

`mapify skill-eval optimize` tunes only the trigger **`description:`** (does the skill fire on the
right prompt?). To improve a skill's **body/logic** by OUTCOME quality (does it do its job well once
it runs?), do NOT start from scratch — there is a worked, reusable flow and harness:

- **Flow (start here):** `docs/whole-skill-optimization-flow.md` — measure outcome quality on golden
fixtures with a hybrid metric (deterministic gates + a trace-cited LLM judge), then human-edit the
body and re-measure (Approach B). Includes the fixture recipe, the measure→edit loop, and gotchas.
- **Working log + findings:** `docs/whole-skill-optimization-notes.md`.
- **Harness:** `tests/skills_eval/whole_skill/spike_runner.py` (`--degrade {body,actor,monitor}`),
fixtures under `tests/skills_eval/fixtures/whole_skill/`.

**Key finding (don't re-derive):** for thin-orchestration skills (e.g. `map-task`), prose scope/
correctness discipline — in the SKILL.md body OR the shared agent prompts — is **low-leverage**
(ablations showed body-good == body-bad). The real levers are the **`affected_files` contract** and
the **mechanical validators** (`validate_mutation_boundary` + test-gate + the MONITOR warn→feedback
gates). Prose optimization pays off where behavior is genuinely prose-governed: the final **report
format** and the **trigger description** (this skill). Spend effort accordingly.

## Related Commands

- `/map-plan` — plan and decompose tasks.
57 changes: 45 additions & 12 deletions .claude/skills/map-task/SKILL.md
@@ -126,15 +126,10 @@ Route to the appropriate executor based on `$PHASE`. All phases from `/map-effic
- **ACTOR (2.3)** — Implement the subtask
- **MONITOR (2.4)** — Required validation before the subtask can complete.

Single-subtask execution must keep using the shared branch workspace artifacts rather than creating task-local side files:



- `code-review-00N.md`
- `qa-001.md`
- `pr-draft.md`

When Monitor runs during `/map-task`, append to the next `code-review-00N.md` so targeted subtask execution stays aligned with the full workflow artifact model.
Single-subtask execution must keep using the shared branch workspace artifacts in `.map/<branch>/`
(e.g. `code-review-00N.md`, `qa-001.md`, `pr-draft.md`) rather than creating task-local side files.
When Monitor runs during `/map-task`, append to the next `code-review-00N.md` so targeted subtask
execution stays aligned with the full workflow artifact model.

For each step:
1. Get next step from orchestrator
@@ -147,7 +142,15 @@ For each step:
- Run `python3 .map/scripts/map_orchestrator.py monitor_failed --feedback "<feedback>"` and retry Actor with feedback (max 5 iterations).
- If the result says `retry_isolation=clean_retry_required`, run `python3 .map/scripts/map_step_runner.py validate_retry_quarantine` and make the next Actor attempt use `.map/<branch>/retry_quarantine.json` as clean-room context instead of rehydrating the rejected approach.

## Step 4: Completion and Progress Report
**Termination (do not loop or fake-complete):** if the 5 Actor iterations are exhausted without Monitor `valid: true`, OR the subtask cannot be satisfied within its declared scope (it would require an out-of-scope file, a dependency change, or a contract not in the blueprint), then STOP. Do NOT mark the subtask complete and do NOT expand scope to force a pass. Emit the **BLOCKED** outcome report (Step 4) stating the reason and the exact contract change needed.
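
The termination rule above can be sketched as a bounded loop. This is a minimal illustration, not the orchestrator's actual code: `run_subtask`, `MonitorResult`, the `out_of_scope` flag, and the two callables are hypothetical names introduced here. Only the budget of 5 Actor iterations and the COMPLETE/BLOCKED outcomes come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MAX_ACTOR_ITERATIONS = 5  # retry budget stated in the step list


@dataclass
class MonitorResult:
    """Hypothetical stand-in for a Monitor verdict."""
    valid: bool
    feedback: str = ""
    out_of_scope: bool = False  # assumed flag: the fix needs something outside the contract


def run_subtask(
    actor: Callable[[Optional[str]], None],
    monitor: Callable[[], MonitorResult],
) -> Tuple[str, str]:
    """Return (status, reason), where status is "COMPLETE" or "BLOCKED"."""
    feedback: Optional[str] = None
    for attempt in range(1, MAX_ACTOR_ITERATIONS + 1):
        actor(feedback)          # Actor retries with the previous Monitor feedback
        result = monitor()
        if result.valid:
            return "COMPLETE", f"Monitor approved on attempt {attempt}"
        if result.out_of_scope:
            # Cannot be satisfied in scope: stop now instead of expanding scope.
            return "BLOCKED", result.feedback
        feedback = result.feedback
    # Budget exhausted without `valid: true`: never fake-complete.
    return "BLOCKED", f"{MAX_ACTOR_ITERATIONS} Actor iterations exhausted: {feedback}"
```

Both exits from the failure path return BLOCKED, so a caller written this way has no route to mark the subtask complete without a passing Monitor result.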

## Step 4: Outcome Report

Every `/map-task` run ends with **exactly one** outcome report — **COMPLETE** or **BLOCKED** —
carrying these required fields: `Subtask`, `Status`, `Files Modified`, `Validation` (test/Monitor
result), and (BLOCKED only) `Blocker` + `Needed`. Never end a run without one of these reports.

### Complete Outcome

When `get_next_step` returns `is_complete: true`:

@@ -220,6 +223,32 @@ ALL SUBTASKS COMPLETE (${TOTAL}/${TOTAL})
Run /map-check for final verification, or /map-learn to extract patterns.
```

### Blocked Outcome

When the subtask cannot complete within its declared scope (retries exhausted, an out-of-scope
change would be required, or a dependency/contract conflict): do NOT update the plan status to
`complete`. Report the blocker and stop for a contract update:

```text
═══════════════════════════════════════════════════
SUBTASK BLOCKED
═══════════════════════════════════════════════════
Subtask: ${SUBTASK_ID}
Title: <title>
Status: BLOCKED
Files Modified: <list, or "none">
Validation: <Monitor/test result that could not be satisfied>

Blocker: <why it cannot complete in scope — e.g. requires editing <file> not in
this subtask's affected_files, or a dependency change not in the contract>
Needed: <the exact contract change to unblock — e.g. add <file> to ST-XXX
affected_files, or split into a new subtask>
═══════════════════════════════════════════════════
```

Then stop. Suggest `/map-plan` (to amend the decomposition) or ask the user for a contract decision —
do not silently expand scope or mark the subtask complete.
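
The required-field rule for outcome reports (Step 4) could be enforced mechanically rather than by prose alone. The sketch below is an assumption-laden illustration: `OutcomeReport` and `render` are hypothetical names, and the rendered layout is simplified (it omits `Title` and the exact rule width of the template above). Only the field names and the BLOCKED-only `Blocker` + `Needed` requirement come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

RULE = "═" * 51  # decorative rule; width chosen for illustration only


@dataclass
class OutcomeReport:
    """Hypothetical container for the required outcome-report fields."""
    subtask: str
    status: str                  # "COMPLETE" or "BLOCKED"
    files_modified: List[str]
    validation: str              # test/Monitor result
    blocker: Optional[str] = None
    needed: Optional[str] = None

    def __post_init__(self) -> None:
        if self.status not in ("COMPLETE", "BLOCKED"):
            raise ValueError(f"unknown status: {self.status!r}")
        if self.status == "BLOCKED" and not (self.blocker and self.needed):
            raise ValueError("BLOCKED reports require both Blocker and Needed")
        if self.status == "COMPLETE" and (self.blocker or self.needed):
            raise ValueError("Blocker/Needed belong only to BLOCKED reports")

    def render(self) -> str:
        files = ", ".join(self.files_modified) or "none"
        lines = [
            RULE,
            f"SUBTASK {self.status}",
            RULE,
            f"Subtask: {self.subtask}",
            f"Status: {self.status}",
            f"Files Modified: {files}",
            f"Validation: {self.validation}",
        ]
        if self.status == "BLOCKED":
            lines += ["", f"Blocker: {self.blocker}", f"Needed: {self.needed}"]
        lines.append(RULE)
        return "\n".join(lines)
```

For example, `OutcomeReport("ST-003", "BLOCKED", [], "Monitor valid: false", blocker="requires a file outside affected_files", needed="add the file to ST-003 affected_files").render()` yields a report with every required field present, while omitting `blocker` or `needed` raises `ValueError` before anything is printed.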

---

## Error Handling
@@ -261,9 +290,13 @@ Proceed anyway? (The Actor will work with whatever state exists.)
## Examples

```
/map-task <typical args>
/map-task ST-003 # execute subtask ST-003 from the existing plan
```

If a persisted TDD contract exists for the subtask (`test_contract_ST-003.md` +
`test_handoff_ST-003.json`), `/map-task ST-003` automatically resumes at ACTOR against those tests.

## Troubleshooting

- **Issue:** Workflow doesn't behave as expected. **Fix:** Re-read the section above titled 'What this command CANNOT do' (if present) and ensure prerequisites are met. Run `/map-resume` to recover from interruptions.
- **Issue:** Workflow doesn't behave as expected. **Fix:** Confirm the **Prerequisites** (a plan must exist) and re-read the **Mutation Boundary Constraints** and **When Not To Expand Scope** sections above. Run `/map-resume` to recover from an interrupted run.
- **Issue:** The subtask can't pass validation within its allowed files. **Fix:** Don't expand scope — emit the **BLOCKED** outcome report (Step 4) and amend the contract via `/map-plan`.
65 changes: 65 additions & 0 deletions .map/scripts/map_orchestrator.py
@@ -336,6 +336,15 @@ class StepState:
contract_ready_subtasks: dict[str, dict] = field(default_factory=dict)
clean_retry_count: int = 0
contaminated_retry_count: int = 0
# Subtask IDs already nudged once for a (non-strict) scope warning. The
# warn->actor-feedback gate (validate_step 2.4) fires at most ONCE per
# subtask, so a persistent false positive (affected_files drift) cannot
# burn the retry budget — after the single nudge the gate passes.
scope_feedback_subtasks: list[str] = field(default_factory=list)
# Subtask IDs already nudged once for a false-progress warning (MONITOR
# approved but the subtask changed NOTHING despite declaring affected_files).
# Same once-per-subtask bound as scope_feedback_subtasks.
progress_feedback_subtasks: list[str] = field(default_factory=list)
retry_isolation_status: dict[str, str] = field(default_factory=dict)
retry_quarantine_paths: dict[str, str] = field(default_factory=dict)
completed_at: Optional[str] = None
@@ -403,6 +412,8 @@ def to_dict(self) -> dict:
"contract_ready_subtasks": self.contract_ready_subtasks,
"clean_retry_count": self.clean_retry_count,
"contaminated_retry_count": self.contaminated_retry_count,
"scope_feedback_subtasks": self.scope_feedback_subtasks,
"progress_feedback_subtasks": self.progress_feedback_subtasks,
"retry_isolation_status": self.retry_isolation_status,
"retry_quarantine_paths": self.retry_quarantine_paths,
"completed_at": self.completed_at,
@@ -441,6 +452,8 @@ def from_dict(cls, data: dict) -> "StepState":
contract_ready_subtasks=data.get("contract_ready_subtasks", {}),
clean_retry_count=data.get("clean_retry_count", 0),
contaminated_retry_count=data.get("contaminated_retry_count", 0),
scope_feedback_subtasks=data.get("scope_feedback_subtasks", []),
progress_feedback_subtasks=data.get("progress_feedback_subtasks", []),
retry_isolation_status=data.get("retry_isolation_status", {}),
retry_quarantine_paths=data.get("retry_quarantine_paths", {}),
completed_at=data.get("completed_at"),
@@ -1158,6 +1171,58 @@ def validate_step(
f"Unexpected files: {scope_report.get('unexpected', [])}"
),
}
# warn->actor-feedback: a non-strict scope leak does NOT hard-fail
# the subtask, but the FIRST time it is seen we route it back to
# the Actor as feedback so it self-corrects (revert the
# out-of-scope edits, or escalate for a contract update). Bounded
# to once per subtask (scope_feedback_subtasks guard) so a
# persistent false positive (affected_files drift) cannot burn the
# retry budget — after the single nudge the gate passes.
if (
scope_status == "warning"
and state.current_subtask_id not in state.scope_feedback_subtasks
):
state.scope_feedback_subtasks.append(state.current_subtask_id)
state.save(state_file)
unexpected = scope_report.get("unexpected", [])
hint = scope_report.get("diagnostic_hint", "")
return {
"valid": False,
"message": (
"Scope warning (mutation-boundary): these files are "
f"outside {state.current_subtask_id}'s affected_files: "
f"{unexpected}. Revert the out-of-scope changes; OR, if "
"they are genuinely required, STOP and report a blocker "
"for a contract update — do not silently keep them. "
+ (f"({hint})" if hint else "")
).strip(),
}
# false-progress (correctness): MONITOR is approving, but the
# subtask changed NOTHING despite declaring affected_files. Same
# warn->actor-feedback trick (once per subtask via
# progress_feedback_subtasks): nudge the Actor to implement the
# change or report a blocker, rather than silently closing a
# subtask that did nothing.
if (
scope_status != "error"
and scope_report.get("expected")
and not scope_report.get("actual")
and state.current_subtask_id not in state.progress_feedback_subtasks
):
state.progress_feedback_subtasks.append(state.current_subtask_id)
state.save(state_file)
return {
"valid": False,
"message": (
"False-progress (mutation-boundary): MONITOR is closing "
f"{state.current_subtask_id} but NO files changed, though "
"its contract declares affected_files="
f"{scope_report.get('expected')}. Implement the change "
"with Edit/Write; OR if it is already satisfied or not "
"needed, STOP and report a blocker for a contract update "
"— do not close a subtask that did nothing."
),
}
except ImportError:
pass
# CHOOSE_MODE is auto-skipped; execution_mode is always "batch"
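
The once-per-subtask bound added in `validate_step` can be exercised in isolation. The following is a reduced sketch under stated assumptions, not the real orchestrator: `MiniState` and `feedback_gate` are hypothetical names, persistence (`state.save`) is left out, and the messages are shortened. The guard-list names, the `scope_report` keys, and the `.get(..., [])` defaults mirror the diff.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class MiniState:
    """Hypothetical cut-down StepState holding only the two new guard lists."""
    current_subtask_id: str
    scope_feedback_subtasks: List[str] = field(default_factory=list)
    progress_feedback_subtasks: List[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "MiniState":
        # .get(..., []) keeps state files written before this change loadable.
        return cls(
            current_subtask_id=data["current_subtask_id"],
            scope_feedback_subtasks=data.get("scope_feedback_subtasks", []),
            progress_feedback_subtasks=data.get("progress_feedback_subtasks", []),
        )


def feedback_gate(state: MiniState, scope_status: str, scope_report: dict) -> Optional[dict]:
    """Return a valid=False nudge the first time a warning is seen, else None."""
    sid = state.current_subtask_id
    # Scope leak: nudge once, then let the gate pass on later attempts.
    if scope_status == "warning" and sid not in state.scope_feedback_subtasks:
        state.scope_feedback_subtasks.append(sid)
        unexpected = scope_report.get("unexpected", [])
        return {"valid": False, "message": f"Scope warning: {unexpected}"}
    # False progress: files were declared but nothing changed.
    if (
        scope_status != "error"
        and scope_report.get("expected")
        and not scope_report.get("actual")
        and sid not in state.progress_feedback_subtasks
    ):
        state.progress_feedback_subtasks.append(sid)
        return {"valid": False, "message": "False-progress: no files changed"}
    return None


state = MiniState(current_subtask_id="ST-003")
report = {"expected": ["a.py"], "actual": ["a.py", "b.py"], "unexpected": ["b.py"]}
first = feedback_gate(state, "warning", report)   # nudges the Actor
second = feedback_gate(state, "warning", report)  # same warning now passes
```

Because each guard list is checked before its nudge is issued, a persistent false positive costs at most one retry per subtask per gate, which is the property the diff's comments describe.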