Problem
Section 3.7.1 defines AgentTrustProfile with fields derived exclusively from AgentRequest terminal states via AuditRecord events:
totalRequests, successRate, calibrationError, calibrationDrift, lastUpdated
This covers request-level calibration (declared confidence vs. actual success rate) but has no field for diagnostic quality — whether the agent's observations and diagnoses were factually correct, as judged by SREs.
The missing field blocks a concrete trust signal: empirical diagnostic accuracy from human ground-truth review, which is a better signal than self-reported confidence scores for governing autonomous actions.
Proposal
Add diagnosticAccuracy to the AgentTrustProfile schema in Section 3.7.1:
| Field | Type | Description |
|---|---|---|
| diagnosticAccuracy | float [0.0, 1.0] | EMA of SRE verdict correctness over reviewed AgentDiagnostic records. 1.0 = all reviewed diagnostics correct, 0.0 = all incorrect. null if no verdicts yet. |
Update semantics (new subsection in 3.7.2)
The control plane MUST update diagnosticAccuracy whenever an AgentDiagnostic status verdict is set by an authorized reviewer:
diagnosticAccuracy_new = α × verdict_score + (1 − α) × diagnosticAccuracy_prev
Here verdict_score is 1.0 for correct, 0.5 for partial, and 0.0 for incorrect, and α is the same configurable EMA decay factor used for successRate.
The update MUST be recorded as an AuditRecord event (new event type: agent.trustprofile.diagnostic.updated).
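A minimal sketch of the update rule above, in Python. The names `update_diagnostic_accuracy`, `VERDICT_SCORES`, and `build_audit_event`, and the shape of the audit event dict, are hypothetical and not taken from the spec. The proposal does not say what happens on the first verdict, when the previous value is null; seeding the EMA with the first verdict score is an assumption made here.

```python
from datetime import datetime, timezone
from typing import Optional

# Verdict scores as defined in the update semantics above.
VERDICT_SCORES = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0}

# Event type named in the proposal.
EVENT_TYPE = "agent.trustprofile.diagnostic.updated"


def update_diagnostic_accuracy(prev: Optional[float], verdict: str, alpha: float) -> float:
    """Return the new diagnosticAccuracy after one reviewed verdict."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    score = VERDICT_SCORES[verdict]  # raises KeyError for an unknown verdict
    if prev is None:
        # Assumption: with no prior verdicts, the first score seeds the EMA.
        return score
    return alpha * score + (1.0 - alpha) * prev


def build_audit_event(agent_id: str, prev: Optional[float], new: float, verdict: str) -> dict:
    """Build a hypothetical AuditRecord payload for the update."""
    return {
        "type": EVENT_TYPE,
        "agentId": agent_id,
        "verdict": verdict,
        "previousValue": prev,
        "newValue": new,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

With α = 0.2, the verdict sequence correct, partial, incorrect takes the value from null to 1.0, then to roughly 0.9, then to roughly 0.72, which shows how quickly a single incorrect verdict pulls the score down at that setting.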
Policy access
Expose agent.diagnosticAccuracy in the CEL evaluation context alongside the existing agent.calibrationError, enabling policies like:
# Allow autonomous remediation only when diagnostic accuracy is high, or when no verdicts exist yet
agent.diagnosticAccuracy == null || agent.diagnosticAccuracy > 0.90
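To make the null handling in this expression concrete, here is a rough Python equivalent (it is not CEL, and `allows_autonomous_remediation` and its `threshold` parameter are hypothetical names). As written, the expression admits an agent that has no reviewed diagnostics yet; a stricter policy would drop the null clause.

```python
from typing import Optional


def allows_autonomous_remediation(
    diagnostic_accuracy: Optional[float], threshold: float = 0.90
) -> bool:
    """Mirror of the example policy: pass on null, else require > threshold."""
    return diagnostic_accuracy is None or diagnostic_accuracy > threshold
```

Note that the comparison is strict, so an agent sitting exactly at 0.90 is denied.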
Spec sections affected
- Section 3.7.1: Add diagnosticAccuracy field to AgentTrustProfile table
- Section 3.7.2: Add update semantics for diagnosticAccuracy
- Section 3.7.3: Expose agent.diagnosticAccuracy in CEL evaluation context
- Section 9.6: Add diagnosticAccuracy to AgentTrustProfile JSON schema
- Section 10 (Audit Events table): Add agent.trustprofile.diagnostic.updated event
Context
The K8s reference binding (ravisantoshgudimetla/aip-k8s) implements the verdict collection mechanism via AgentDiagnosticStatus.verdict (correct/incorrect/partial) on the AgentDiagnostic CRD status subresource. This spec issue is a prerequisite for the trust profile aggregation step of that work.
Related: agent-control-plane/aip-k8s#51