add static consistency analyzer to core.contracts


### Description

`src/agentanvil/core/contracts.py` ships `AgentContract`, `Policy`, `Task`, `Constraints` (and enums). Loading a YAML contract is possible, but nothing validates it *semantically*. A contract can today declare contradictory policies, tasks referencing an unavailable oracle, or constraints that are trivially impossible to meet — and the loader will accept all of them.

The static consistency analyzer ( of the ) is the deliverable that closes this gap. It is referenced in the  as a defining feature of AgentAnvil vs competitors (DeepEval, Promptfoo) because it shifts contract errors from runtime to load time.

Three concrete checks:

**1. Policy contradictions.** `forbid tool X` combined with `require tool X` is fatal. Detection is either by explicit rule-id pairs or by structural overlap of regex patterns against enum tool names.

**2. Oracle mismatches.** Task declares `oracle: human` but the corpus configuration has no human annotator. Task declares `oracle: functional` but `success_criteria` are natural-language sentences with no code-level assertion. These are fatal.

**3. Impossible constraints.** `constraints.max_latency_ms: 100` with a task requiring 3 sequential LLM calls (budget computed conservatively via `LLMBackend.cost_estimate`). These are warnings unless the budget is unambiguously exceeded.

### Proposal

**1. A typed diagnostic object:**

```python
# src/agentanvil/core/contracts_analyzer.py (new)
from enum import Enum
from pydantic import BaseModel


class DiagnosticLevel(str, Enum):
    FATAL = "fatal"
    WARNING = "warning"
    INFO = "info"


class Diagnostic(BaseModel):
    level: DiagnosticLevel
    check: str       # e.g. "policy_contradiction", "oracle_mismatch"
    message: str
    refs: list[str]  # IDs of the offending policies / tasks / constraints


class ContractDiagnostics(BaseModel):
    diagnostics: list[Diagnostic]

    @property
    def fatal(self) -> list[Diagnostic]:
        return [d for d in self.diagnostics if d.level == DiagnosticLevel.FATAL]

    @property
    def has_fatal(self) -> bool:
        return bool(self.fatal)
```

**2. Core analyzer function:**

```python
def analyze(contract: AgentContract, *, backend_cost_fn=None) -> ContractDiagnostics:
    diagnostics: list[Diagnostic] = []
    diagnostics += _check_policy_contradictions(contract)    # cons_policy
    diagnostics += _check_oracle_availability(contract)      # cons_oracle
    diagnostics += _check_constraint_feasibility(contract, backend_cost_fn)  # cons_budget
    diagnostics += _check_cross_level_references(contract)   # cons_cross
    diagnostics += _check_multiagent_coherence(contract)     # used once 0.3.0 lands
    diagnostics += _check_a2a_coherence(contract)            # used once 0.3.0 lands
    return ContractDiagnostics(diagnostics=diagnostics)
```

Each `_check_*` function implements one clause of the consistency relation `⊨` defined formally in the planning notes. The analyzer is the **operational instantiation** of the formal semantics; the two must stay in sync and the paper cites the code.

- `_check_policy_contradictions` — `cons_policy`: forbids *require X* + *forbid X* on overlapping scope.
- `_check_oracle_availability` — `cons_oracle`: realisability of declared oracle given corpus + ensemble configuration.
- `_check_constraint_feasibility` — `cons_budget`: per-level budget estimate ≤ declared constraint.
- `_check_cross_level_references` — `cons_cross`: every L2 / L3 reference to L1 entities resolves (e.g. trajectory assertion citing tool declared in L1 schema).
- `_check_multiagent_coherence`, `_check_a2a_coherence` — extensions of the same relation to multi-agent and A2A clauses (filled in by 0.3.0 #032 and #036).

Each `_check_*` function is testable in isolation with a fixture pair (passing contract, failing contract). The test suite doubles as the machine-checked proof that the implementation honours the formal semantics.

**3. Wire into CLI:**

```python
# src/agentanvil/cli/main.py (updated validate)
@app.command()
def validate(contract: Path):
    c = AgentContract.from_yaml(contract.read_text())
    diagnostics = analyze(c)
    for d in diagnostics.diagnostics:
        typer.echo(f"[{d.level.value}] {d.check}: {d.message} (refs={d.refs})")
    if diagnostics.has_fatal:
        raise typer.Exit(1)
```

### Scope

- `src/agentanvil/core/contracts_analyzer.py` — new.
- `src/agentanvil/cli/main.py` — extend `validate` to call the analyzer and exit non-zero on fatal.
- `tests/core/test_contracts_analyzer.py` — 12–20 test cases covering each check, pass + fail.
- `docs/contracts.md` — new, "writing contracts" reference; shows how to read diagnostics.

### Regression tests

- `test_analyze_empty_contract_has_no_diagnostics`
- `test_analyze_detects_policy_contradiction_forbid_vs_require_tool`
- `test_analyze_detects_oracle_mismatch_human_without_annotator_config`
- `test_analyze_detects_oracle_mismatch_functional_without_code_criteria`
- `test_analyze_detects_impossible_latency_constraint`
- `test_analyze_warns_on_near_limit_budget_constraint`
- `test_analyze_passes_on_well_formed_fixture_contract`
- `test_cli_validate_exits_nonzero_on_fatal`
- `test_cli_validate_prints_all_diagnostics`

### Notes

- This issue is the **operational instantiation of the formal contract semantics** published in the framework. The relation `C ⊨ cons_policy ∧ cons_oracle ∧ cons_budget ∧ cons_cross` is what the analyzer decides; the paper cites the module.
- Multi-agent and A2A checks (`_check_multiagent_coherence`, `_check_a2a_coherence`) stub-return `[]` in 0.2.0 and are filled in by 0.3.0 #032 and #036.
- Budget-feasibility check uses `LLMBackend.cost_estimate` from #002; if no backend is provided, `_check_constraint_feasibility` returns `[]` (non-blocking).
- `_check_cross_level_references` is new scope in this extension: validates that L2 / L3 references to L1 entities (tools, schema fields) resolve. Dangling references are fatal.
- Depends on: #002 (backend for cost estimate).
- Enables: honest contract authoring before 0.2.0 runs. the framework §Formal Framework cites the module as the decidability witness.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add static consistency analyzer to core.contracts #4

Description

Proposal

Scope

Regression tests

Notes

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

add static consistency analyzer to core.contracts #4

Description

Description

Proposal

Scope

Regression tests

Notes

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions