Skip to content

add static consistency analyzer to core.contracts #4

@cchinchilla-dev

Description

@cchinchilla-dev

Description

src/agentanvil/core/contracts.py ships AgentContract, Policy, Task, Constraints (and enums). Loading a YAML contract is possible, but nothing validates it semantically. A contract can today declare contradictory policies, tasks referencing an unavailable oracle, or constraints that are trivially impossible to meet — and the loader will accept all of them.

The static consistency analyzer ( of the ) is the deliverable that closes this gap. It is referenced in the as a defining feature of AgentAnvil vs competitors (DeepEval, Promptfoo) because it shifts contract errors from runtime to load time.

Three concrete checks:

1. Policy contradictions. forbid tool X combined with require tool X is fatal. Detection is either by explicit rule-id pairs or by structural overlap of regex patterns against enum tool names.

2. Oracle mismatches. Task declares oracle: human but the corpus configuration has no human annotator. Task declares oracle: functional but success_criteria are natural-language sentences with no code-level assertion. These are fatal.

3. Impossible constraints. constraints.max_latency_ms: 100 with a task requiring 3 sequential LLM calls (budget computed conservatively via LLMBackend.cost_estimate). These are warnings unless the budget is unambiguously exceeded.

Proposal

1. A typed diagnostic object:

# src/agentanvil/core/contracts_analyzer.py (new)
from enum import Enum
from pydantic import BaseModel


class DiagnosticLevel(str, Enum):
    FATAL = "fatal"
    WARNING = "warning"
    INFO = "info"


class Diagnostic(BaseModel):
    level: DiagnosticLevel
    check: str       # e.g. "policy_contradiction", "oracle_mismatch"
    message: str
    refs: list[str]  # IDs of the offending policies / tasks / constraints


class ContractDiagnostics(BaseModel):
    diagnostics: list[Diagnostic]

    @property
    def fatal(self) -> list[Diagnostic]:
        return [d for d in self.diagnostics if d.level == DiagnosticLevel.FATAL]

    @property
    def has_fatal(self) -> bool:
        return bool(self.fatal)

2. Core analyzer function:

def analyze(contract: AgentContract, *, backend_cost_fn=None) -> ContractDiagnostics:
    diagnostics: list[Diagnostic] = []
    diagnostics += _check_policy_contradictions(contract)    # cons_policy
    diagnostics += _check_oracle_availability(contract)      # cons_oracle
    diagnostics += _check_constraint_feasibility(contract, backend_cost_fn)  # cons_budget
    diagnostics += _check_cross_level_references(contract)   # cons_cross
    diagnostics += _check_multiagent_coherence(contract)     # used once 0.3.0 lands
    diagnostics += _check_a2a_coherence(contract)            # used once 0.3.0 lands
    return ContractDiagnostics(diagnostics=diagnostics)

Each _check_* function implements one clause of the consistency relation defined formally in the planning notes. The analyzer is the operational instantiation of the formal semantics; the two must stay in sync and the paper cites the code.

  • _check_policy_contradictionscons_policy: forbids require X + forbid X on overlapping scope.
  • _check_oracle_availabilitycons_oracle: realisability of declared oracle given corpus + ensemble configuration.
  • _check_constraint_feasibilitycons_budget: per-level budget estimate ≤ declared constraint.
  • _check_cross_level_referencescons_cross: every L2 / L3 reference to L1 entities resolves (e.g. trajectory assertion citing tool declared in L1 schema).
  • _check_multiagent_coherence, _check_a2a_coherence — extensions of the same relation to multi-agent and A2A clauses (filled in by 0.3.0 #032 and #036).

Each _check_* function is testable in isolation with a fixture pair (passing contract, failing contract). The test suite doubles as the machine-checked proof that the implementation honours the formal semantics.

3. Wire into CLI:

# src/agentanvil/cli/main.py (updated validate)
@app.command()
def validate(contract: Path):
    c = AgentContract.from_yaml(contract.read_text())
    diagnostics = analyze(c)
    for d in diagnostics.diagnostics:
        typer.echo(f"[{d.level.value}] {d.check}: {d.message} (refs={d.refs})")
    if diagnostics.has_fatal:
        raise typer.Exit(1)

Scope

  • src/agentanvil/core/contracts_analyzer.py — new.
  • src/agentanvil/cli/main.py — extend validate to call the analyzer and exit non-zero on fatal.
  • tests/core/test_contracts_analyzer.py — 12–20 test cases covering each check, pass + fail.
  • docs/contracts.md — new, "writing contracts" reference; shows how to read diagnostics.

Regression tests

  • test_analyze_empty_contract_has_no_diagnostics
  • test_analyze_detects_policy_contradiction_forbid_vs_require_tool
  • test_analyze_detects_oracle_mismatch_human_without_annotator_config
  • test_analyze_detects_oracle_mismatch_functional_without_code_criteria
  • test_analyze_detects_impossible_latency_constraint
  • test_analyze_warns_on_near_limit_budget_constraint
  • test_analyze_passes_on_well_formed_fixture_contract
  • test_cli_validate_exits_nonzero_on_fatal
  • test_cli_validate_prints_all_diagnostics

Notes

  • This issue is the operational instantiation of the formal contract semantics published in the framework. The relation C ⊨ cons_policy ∧ cons_oracle ∧ cons_budget ∧ cons_cross is what the analyzer decides; the paper cites the module.
  • Multi-agent and A2A checks (_check_multiagent_coherence, _check_a2a_coherence) stub-return [] in 0.2.0 and are filled in by 0.3.0 #032 and #036.
  • Budget-feasibility check uses LLMBackend.cost_estimate from chore(meta): harden version-linearity script and ci pin #2; if no backend is provided, _check_constraint_feasibility returns [] (non-blocking).
  • _check_cross_level_references is new scope in this extension: validates that L2 / L3 references to L1 entities (tools, schema fields) resolve. Dangling references are fatal.
  • Depends on: chore(meta): harden version-linearity script and ci pin #2 (backend for cost estimate).
  • Enables: honest contract authoring before 0.2.0 runs. the framework §Formal Framework cites the module as the decidability witness.

Metadata

Metadata

Assignees

No one assigned

    Labels

    contractsContract layer (YAML, Pydantic, static consistency)coreenhancementNew feature or request

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions