azalio · Jun 5, 2026
diff --git a/.claude/skills/map-skill-eval/SKILL.md b/.claude/skills/map-skill-eval/SKILL.md
@@ -141,6 +141,26 @@ mapify skill-eval view map-plan
 mapify skill-eval view map-plan --result .map/eval-runs/map-plan/20260601T120000-optimize.json --open
 ```
 
+## Optimizing the whole skill (BODY/logic), not just the description
+
+`mapify skill-eval optimize` tunes only the trigger **`description:`** (does the skill fire on the
+right prompt?). To improve a skill's **body/logic** by OUTCOME quality (does it do its job well once
+it runs?), do NOT start from scratch — there is a worked, reusable flow and harness:
+
+- **Flow (start here):** `docs/whole-skill-optimization-flow.md` — measure outcome quality on golden
+  fixtures with a hybrid metric (deterministic gates + a trace-cited LLM judge), then human-edit the
+  body and re-measure (Approach B). Includes the fixture recipe, the measure→edit loop, and gotchas.
+- **Working log + findings:** `docs/whole-skill-optimization-notes.md`.
+- **Harness:** `tests/skills_eval/whole_skill/spike_runner.py` (`--degrade {body,actor,monitor}`),
+  fixtures under `tests/skills_eval/fixtures/whole_skill/`.
+
+**Key finding (don't re-derive):** for thin-orchestration skills (e.g. `map-task`), prose
+scope/correctness discipline (in the SKILL.md body OR the shared agent prompts) is **low-leverage**
+(ablations showed body-good == body-bad). The real levers are the **`affected_files` contract** and
+the **mechanical validators** (`validate_mutation_boundary` + test-gate + the MONITOR warn→feedback
+gates). Prose optimization pays off where behavior is genuinely prose-governed: the final **report
+format** and the **trigger description** (what this skill tunes). Spend effort accordingly.
+
 ## Related Commands
 
 - `/map-plan` — plan and decompose tasks.

diff --git a/.claude/skills/map-task/SKILL.md b/.claude/skills/map-task/SKILL.md
@@ -126,15 +126,10 @@ Route to the appropriate executor based on `$PHASE`. All phases from `/map-effic
 - **ACTOR (2.3)** — Implement the subtask
 - **MONITOR (2.4)** — Required validation before the subtask can complete.
 
-Single-subtask execution must keep using the shared branch workspace artifacts rather than creating task-local side files:
-
-
-
-- `code-review-00N.md`
-- `qa-001.md`
-- `pr-draft.md`
-
-When Monitor runs during `/map-task`, append to the next `code-review-00N.md` so targeted subtask execution stays aligned with the full workflow artifact model.
+Single-subtask execution must keep using the shared branch workspace artifacts in `.map/<branch>/`
+(e.g. `code-review-00N.md`, `qa-001.md`, `pr-draft.md`) rather than creating task-local side files.
+When Monitor runs during `/map-task`, append to the next `code-review-00N.md` so targeted subtask
+execution stays aligned with the full workflow artifact model.
 
 For each step:
 1. Get next step from orchestrator
@@ -147,7 +142,15 @@ For each step:
 - Run `python3 .map/scripts/map_orchestrator.py monitor_failed --feedback "<feedback>"` and retry Actor with feedback (max 5 iterations).
 - If the result says `retry_isolation=clean_retry_required`, run `python3 .map/scripts/map_step_runner.py validate_retry_quarantine` and make the next Actor attempt use `.map/<branch>/retry_quarantine.json` as clean-room context instead of rehydrating the rejected approach.
 
-## Step 4: Completion and Progress Report
+**Termination (do not loop or fake-complete):** if the 5 Actor iterations are exhausted without Monitor `valid: true`, OR the subtask cannot be satisfied within its declared scope (it would require an out-of-scope file, a dependency change, or a contract not in the blueprint), then STOP. Do NOT mark the subtask complete and do NOT expand scope to force a pass. Emit the **BLOCKED** outcome report (Step 4) stating the reason and the exact contract change needed.
+
+## Step 4: Outcome Report
+
+Every `/map-task` run ends with **exactly one** outcome report — **COMPLETE** or **BLOCKED** —
+carrying these required fields: `Subtask`, `Status`, `Files Modified`, `Validation` (test/Monitor
+result), and (BLOCKED only) `Blocker` + `Needed`. Never end a run without one of these reports.
+
+### Complete Outcome
 
 When `get_next_step` returns `is_complete: true`:
 
@@ -220,6 +223,32 @@ ALL SUBTASKS COMPLETE (${TOTAL}/${TOTAL})
 Run /map-check for final verification, or /map-learn to extract patterns.
 ```
 
+### Blocked Outcome
+
+When the subtask cannot complete within its declared scope (retries exhausted, an out-of-scope
+change would be required, or a dependency/contract conflict): do NOT update the plan status to
+`complete`. Report the blocker and stop for a contract update:
+
+```text
+═══════════════════════════════════════════════════
+SUBTASK BLOCKED
+═══════════════════════════════════════════════════
+Subtask: ${SUBTASK_ID}
+Title: <title>
+Status: BLOCKED
+Files Modified: <list, or "none">
+Validation: <Monitor/test result that could not be satisfied>
+
+Blocker: <why it cannot complete in scope — e.g. requires editing <file> not in
+         this subtask's affected_files, or a dependency change not in the contract>
+Needed:  <the exact contract change to unblock — e.g. add <file> to ST-XXX
+         affected_files, or split into a new subtask>
+═══════════════════════════════════════════════════
+```
+
+Then stop. Suggest `/map-plan` (to amend the decomposition) or ask the user for a contract decision —
+do not silently expand scope or mark the subtask complete.
+
 ---
 
 ## Error Handling
@@ -261,9 +290,13 @@ Proceed anyway? (The Actor will work with whatever state exists.)
 ## Examples
 
 ```
-/map-task <typical args>
+/map-task ST-003          # execute subtask ST-003 from the existing plan
 ```
 
+If a persisted TDD contract exists for the subtask (`test_contract_ST-003.md` +
+`test_handoff_ST-003.json`), `/map-task ST-003` automatically resumes at ACTOR against those tests.
+
 ## Troubleshooting
 
-- **Issue:** Workflow doesn't behave as expected. **Fix:** Re-read the section above titled 'What this command CANNOT do' (if present) and ensure prerequisites are met. Run `/map-resume` to recover from interruptions.
+- **Issue:** Workflow doesn't behave as expected. **Fix:** Confirm the **Prerequisites** (a plan must exist) and re-read the **Mutation Boundary Constraints** and **When Not To Expand Scope** sections above. Run `/map-resume` to recover from an interrupted run.
+- **Issue:** The subtask can't pass validation within its allowed files. **Fix:** Don't expand scope — emit the **BLOCKED** outcome report (Step 4) and amend the contract via `/map-plan`.
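
Note on the outcome-report contract added to `map-task/SKILL.md` above: the required fields (`Subtask`, `Status`, `Files Modified`, `Validation`, plus `Blocker` and `Needed` for BLOCKED) are simple enough to check mechanically. The following is a minimal sketch of such a check, not code from this change; `missing_fields` and the dict-shaped report are hypothetical names chosen for illustration.

```python
# Hypothetical checker for the outcome-report fields named in Step 4.
# The field names come from the SKILL.md text; everything else is assumed.
REQUIRED = ("Subtask", "Status", "Files Modified", "Validation")
BLOCKED_ONLY = ("Blocker", "Needed")
VALID_STATUSES = ("COMPLETE", "BLOCKED")


def missing_fields(report: dict) -> list:
    """Return the required fields that are absent or empty in the report."""
    status = report.get("Status", "")
    wanted = REQUIRED + (BLOCKED_ONLY if status == "BLOCKED" else ())
    missing = [name for name in wanted if not report.get(name)]
    # An unknown status counts as a missing Status field.
    if status not in VALID_STATUSES and "Status" not in missing:
        missing.append("Status")
    return missing


blocked = {
    "Subtask": "ST-003",
    "Status": "BLOCKED",
    "Files Modified": "none",
    "Validation": "Monitor valid: false after 5 iterations",
    "Blocker": "requires editing a file outside affected_files",
}
print(missing_fields(blocked))  # the BLOCKED report above lacks "Needed"
```

A check like this would sit alongside the mechanical validators rather than replace the prose template, since the report text itself is still written by the agent.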
diff --git a/.map/scripts/map_orchestrator.py b/.map/scripts/map_orchestrator.py
@@ -336,6 +336,15 @@ class StepState:
     contract_ready_subtasks: dict[str, dict] = field(default_factory=dict)
     clean_retry_count: int = 0
     contaminated_retry_count: int = 0
+    # Subtask IDs already nudged once for a (non-strict) scope warning. The
+    # warn->actor-feedback gate (validate_step 2.4) fires at most ONCE per
+    # subtask, so a persistent false positive (affected_files drift) cannot
+    # burn the retry budget — after the single nudge the gate passes.
+    scope_feedback_subtasks: list[str] = field(default_factory=list)
+    # Subtask IDs already nudged once for a false-progress warning (MONITOR
+    # approved but the subtask changed NOTHING despite declaring affected_files).
+    # Same once-per-subtask bound as scope_feedback_subtasks.
+    progress_feedback_subtasks: list[str] = field(default_factory=list)
     retry_isolation_status: dict[str, str] = field(default_factory=dict)
     retry_quarantine_paths: dict[str, str] = field(default_factory=dict)
     completed_at: Optional[str] = None
@@ -403,6 +412,8 @@ def to_dict(self) -> dict:
             "contract_ready_subtasks": self.contract_ready_subtasks,
             "clean_retry_count": self.clean_retry_count,
             "contaminated_retry_count": self.contaminated_retry_count,
+            "scope_feedback_subtasks": self.scope_feedback_subtasks,
+            "progress_feedback_subtasks": self.progress_feedback_subtasks,
             "retry_isolation_status": self.retry_isolation_status,
             "retry_quarantine_paths": self.retry_quarantine_paths,
             "completed_at": self.completed_at,
@@ -441,6 +452,8 @@ def from_dict(cls, data: dict) -> "StepState":
             contract_ready_subtasks=data.get("contract_ready_subtasks", {}),
             clean_retry_count=data.get("clean_retry_count", 0),
             contaminated_retry_count=data.get("contaminated_retry_count", 0),
+            scope_feedback_subtasks=data.get("scope_feedback_subtasks", []),
+            progress_feedback_subtasks=data.get("progress_feedback_subtasks", []),
             retry_isolation_status=data.get("retry_isolation_status", {}),
             retry_quarantine_paths=data.get("retry_quarantine_paths", {}),
             completed_at=data.get("completed_at"),
@@ -1158,6 +1171,58 @@ def validate_step(
                             f"Unexpected files: {scope_report.get('unexpected', [])}"
                         ),
                     }
+                # warn->actor-feedback: a non-strict scope leak does NOT hard-fail
+                # the subtask, but the FIRST time it is seen we route it back to
+                # the Actor as feedback so it self-corrects (revert the
+                # out-of-scope edits, or escalate for a contract update). Bounded
+                # to once per subtask (scope_feedback_subtasks guard) so a
+                # persistent false positive (affected_files drift) cannot burn the
+                # retry budget — after the single nudge the gate passes.
+                if (
+                    scope_status == "warning"
+                    and state.current_subtask_id not in state.scope_feedback_subtasks
+                ):
+                    state.scope_feedback_subtasks.append(state.current_subtask_id)
+                    state.save(state_file)
+                    unexpected = scope_report.get("unexpected", [])
+                    hint = scope_report.get("diagnostic_hint", "")
+                    return {
+                        "valid": False,
+                        "message": (
+                            "Scope warning (mutation-boundary): these files are "
+                            f"outside {state.current_subtask_id}'s affected_files: "
+                            f"{unexpected}. Revert the out-of-scope changes; OR, if "
+                            "they are genuinely required, STOP and report a blocker "
+                            "for a contract update — do not silently keep them. "
+                            + (f"({hint})" if hint else "")
+                        ).strip(),
+                    }
+                # false-progress (correctness): MONITOR is approving, but the
+                # subtask changed NOTHING despite declaring affected_files. Same
+                # warn->actor-feedback trick (once per subtask via
+                # progress_feedback_subtasks): nudge the Actor to implement the
+                # change or report a blocker, rather than silently closing a
+                # subtask that did nothing.
+                if (
+                    scope_status != "error"
+                    and scope_report.get("expected")
+                    and not scope_report.get("actual")
+                    and state.current_subtask_id not in state.progress_feedback_subtasks
+                ):
+                    state.progress_feedback_subtasks.append(state.current_subtask_id)
+                    state.save(state_file)
+                    return {
+                        "valid": False,
+                        "message": (
+                            "False-progress (mutation-boundary): MONITOR is closing "
+                            f"{state.current_subtask_id} but NO files changed, though "
+                            "its contract declares affected_files="
+                            f"{scope_report.get('expected')}. Implement the change "
+                            "with Edit/Write; OR if it is already satisfied or not "
+                            "needed, STOP and report a blocker for a contract update "
+                            "— do not close a subtask that did nothing."
+                        ),
+                    }
             except ImportError:
                 pass
     # CHOOSE_MODE is auto-skipped; execution_mode is always "batch"
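
Note on the two gates added to `validate_step` above: both follow the same once-per-subtask pattern, where the first occurrence is returned to the Actor as feedback and later occurrences pass. The sketch below models that pattern in isolation, assuming a simplified state object. `GateState`, `nudge_once`, and `monitor_gate` are hypothetical names; the real code also persists state with `state.save(state_file)` and handles the strict-mode error path, both omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class GateState:
    """Hypothetical stand-in for StepState with only the fields the gates use."""
    current_subtask_id: str
    scope_feedback_subtasks: list = field(default_factory=list)
    progress_feedback_subtasks: list = field(default_factory=list)


def nudge_once(seen: list, subtask_id: str) -> bool:
    """Return True only the first time subtask_id is seen, and record it."""
    if subtask_id in seen:
        return False
    seen.append(subtask_id)
    return True


def monitor_gate(state: GateState, scope_status: str, scope_report: dict) -> dict:
    """Simplified model of the scope-warning and false-progress gates."""
    sid = state.current_subtask_id
    # Gate 1: non-strict scope leak, fed back to the Actor at most once.
    if scope_status == "warning" and nudge_once(state.scope_feedback_subtasks, sid):
        unexpected = scope_report.get("unexpected", [])
        return {"valid": False, "message": f"Scope warning: {unexpected}"}
    # Gate 2: files were declared but nothing changed, also at most once.
    if (
        scope_status != "error"
        and scope_report.get("expected")
        and not scope_report.get("actual")
        and nudge_once(state.progress_feedback_subtasks, sid)
    ):
        return {"valid": False, "message": "False-progress: no files changed"}
    return {"valid": True, "message": "ok"}


state = GateState(current_subtask_id="ST-003")
leak = {"expected": ["a.py"], "actual": ["a.py", "b.py"], "unexpected": ["b.py"]}
first = monitor_gate(state, "warning", leak)   # nudged: valid is False
second = monitor_gate(state, "warning", leak)  # same warning again: passes
```

Because the guard list is keyed by subtask ID, a persistent false positive costs one retry at most, which matches the intent stated in the `StepState` comments.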