[task] Add CLARIFY protocol path eval

### Checks

- [x] I searched open and closed issues and pull requests and did not find a duplicate.

### Area

Tests

### Goal

The only existing eval (`test_eval_harness_single_hop`) exercises the `DONE` path. There is no eval verifying a real model correctly identifies a genuinely ambiguous query and emits `CLARIFY` rather than hallucinating forward with `REMEMBER` or `DONE`.

Run `anchor.run` with queries that are genuinely underspecified — no entity names, no clear intent — and assert:

- `result.stop_reason == "ask"`
- `result.kind == "ask"`
- `result.content` contains a question directed at the user
- The model does not emit `DONE` or `REMEMBER` for these inputs

Use 2-3 distinct underspecified queries to reduce flakiness risk from any single prompt.
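The assertions above could be collected into one helper so every query is checked the same way. This is a minimal sketch, not the project's API: `assert_clarify` is a hypothetical name, and treating a `?` in `result.content` as "a question directed at the user" is an assumed heuristic. It also assumes that `stop_reason` and `kind` both being `"ask"` is enough to rule out `DONE` and `REMEMBER`.

```python
def assert_clarify(result) -> None:
    """Check that a result took the CLARIFY path.

    `result` is any object exposing `stop_reason`, `kind`, and `content`,
    the three fields named in this issue.
    """
    assert result.stop_reason == "ask", f"unexpected stop_reason: {result.stop_reason!r}"
    assert result.kind == "ask", f"unexpected kind: {result.kind!r}"
    # Assumed heuristic: a clarifying question contains a question mark.
    assert "?" in result.content, f"content is not a question: {result.content!r}"
```

A stricter content check (for example, a second model call that judges whether the text is a clarifying question) may be more robust, at the cost of another non-deterministic step.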

**The model may currently fail this test. If it does, the prompt needs fine-tuning, which is out of scope for this ticket.**

### Definition of done


- New file `tests/evals/test_eval_clarify.py`
- Uses the shared `ai_fn`, `light_ai_fn`, and `embed_fn` fixtures from `conftest.py`
- Marked `@pytest.mark.eval`
- Passes `uv run pytest -m eval tests/evals` locally with Ollama running
- Each case runs at least 3 times, given the non-deterministic nature of the CLARIFY path

[task] Add CLARIFY protocol path eval #39
