
feat: add structured error taxonomy and prometheus metrics (AURA-RM-008 + RM-001)#80

Closed
brandon-shelton-mezmo wants to merge 3 commits into main from feature/AURA-RM-008-001-error-taxonomy-metrics

@brandon-shelton-mezmo

Summary

  • AURA-RM-008 (Error Taxonomy): ErrorCategory enum (13 categories), AuraError sanitization layer, and a taxonomy code field in API error responses. Client-facing messages are generic; internal details are logged server-side only.
  • AURA-RM-001 (Prometheus Metrics): /metrics endpoint with 6 metric families — request duration histogram, token counters, tool duration histogram, error counters, in-flight gauge, MCP connection gauge. Includes AURA_METRICS_ENABLED kill switch and cardinality guards.
  • Grafana dashboard: Starter dashboard JSON with 12 panels at examples/grafana/aura-dashboard.json
  • Development process: Updated to 8 phases (added Documentation phase before Release)
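A minimal Rust sketch of how a kill switch like AURA_METRICS_ENABLED could be parsed. The accepted falsy values and the default (enabled when the variable is unset) are assumptions, not taken from the PR; `metrics_enabled_from` is a hypothetical helper split out so the parsing can be tested without touching the process environment.

```rust
use std::env;

/// Hypothetical parser for the AURA_METRICS_ENABLED kill switch.
/// Assumption: metrics default to enabled, and only an explicit
/// falsy value ("0", "false", "no", "off", case-insensitive) disables them.
fn metrics_enabled_from(value: Option<&str>) -> bool {
    match value.map(|v| v.trim().to_ascii_lowercase()) {
        None => true,
        Some(v) => !matches!(v.as_str(), "0" | "false" | "no" | "off"),
    }
}

/// Reads the real environment variable and applies the rule above.
fn metrics_enabled() -> bool {
    metrics_enabled_from(env::var("AURA_METRICS_ENABLED").ok().as_deref())
}

fn main() {
    println!("metrics enabled: {}", metrics_enabled());
}
```

Defaulting to enabled keeps existing configs working unchanged, which matches the backward-compatibility goal; an operator only has to set the variable to turn the endpoint off.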

Verified

  • cargo fmt --check clean
  • cargo clippy --all-targets --all-features -- -D warnings clean
  • cargo test --workspace --lib: 299 unit tests pass
  • 9 integration tests pass (integration-metrics feature flag)
  • Live tested with Bedrock LLM + Mock MCP server + Prometheus + Grafana
  • Token counters match API response usage exactly across 100+ requests
  • In-flight gauge showed 30 during a concurrent burst and 0 after it
  • Error counters fire for validation errors (400s) and client disconnects
  • MCP tool duration recorded for mock_tool and list_files
  • MCP connection gauge shows connected servers
  • /health and /metrics accessible during graceful shutdown
  • All 16 acceptance criteria verified
  • All quality gates checked
  • No secrets in diff
  • Backward compatible: existing configs load without changes
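The in-flight result above (30 during the burst, 0 after) is the behavior an RAII guard gives: the gauge is incremented when a request starts and decremented in `Drop`, so early returns and futures dropped on client disconnect still bring it back to zero. A std-only sketch follows; `InFlightGauge`, `InFlightGuard`, and `track` are hypothetical names, and the PR may well use a metrics crate rather than a raw atomic.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

/// Stand-in for the aura_http_requests_in_flight gauge (hypothetical type).
#[derive(Default)]
struct InFlightGauge {
    count: AtomicI64,
}

impl InFlightGauge {
    fn get(&self) -> i64 {
        self.count.load(Ordering::SeqCst)
    }
}

/// Decrements the gauge when dropped, whatever path the request takes.
struct InFlightGuard {
    gauge: Arc<InFlightGauge>,
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        self.gauge.count.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Marks one request as in flight until the returned guard is dropped.
fn track(gauge: &Arc<InFlightGauge>) -> InFlightGuard {
    gauge.count.fetch_add(1, Ordering::SeqCst);
    InFlightGuard { gauge: Arc::clone(gauge) }
}

fn main() {
    const BURST: usize = 30;
    let gauge = Arc::new(InFlightGauge::default());
    // BURST workers plus the main thread meet at the barrier twice.
    let barrier = Arc::new(Barrier::new(BURST + 1));
    let handles: Vec<_> = (0..BURST)
        .map(|_| {
            let gauge = Arc::clone(&gauge);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                let _guard = track(&gauge);
                barrier.wait(); // every simulated request is now in flight
                barrier.wait(); // stay in flight until main has sampled the gauge
            })
        })
        .collect();

    barrier.wait();
    println!("during burst: {}", gauge.get()); // during burst: 30
    barrier.wait();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("after burst: {}", gauge.get()); // after burst: 0
}
```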

Specs

  • .specify/specs/AURA-RM-008/: product, architecture, and quality specs plus review records (4 rounds → clean)
  • .specify/specs/AURA-RM-001/: product, architecture, and quality specs plus review records (3 rounds → clean)

Ref: LOG-00000

Establish spec-kit directory structure with constitution, spec
templates (product, architecture, quality), and 11 roadmap items
(AURA-RM-001 through AURA-RM-011) covering production readiness
gaps from metrics to incident response. Define the 7-phase
development lifecycle with multi-agent review gates.

Ref: LOG-00000
Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>

AURA-RM-008: Structured Error Taxonomy
- ErrorCategory enum with 13 runtime error categories
- AuraError struct separating internal messages (logs) from
  sanitized client-facing messages (API responses)
- ErrorDetail::classified() and validation() constructors
- From impls for DetectedToolError and StreamTermination
- All 6 error construction sites updated with taxonomy codes
- Existing error_type values frozen (Article I compliance)
- 13 unit tests
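A rough sketch of the internal/client split described above. Only four illustrative categories are shown (the real enum has 13, which are not listed here), and the variant names, code strings, client messages, and JSON shape are all hypothetical. The point is that `Display` carries the internal detail for server-side logs, while `client_body` emits only the taxonomy code and a generic message.

```rust
use std::fmt;

/// Hypothetical subset of the taxonomy; the PR's enum has 13 categories.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorCategory {
    Validation,
    UpstreamTimeout,
    ToolExecution,
    Internal,
}

impl ErrorCategory {
    /// Stable taxonomy code placed in the API error response.
    fn code(self) -> &'static str {
        match self {
            ErrorCategory::Validation => "validation_error",
            ErrorCategory::UpstreamTimeout => "upstream_timeout",
            ErrorCategory::ToolExecution => "tool_execution_error",
            ErrorCategory::Internal => "internal_error",
        }
    }

    /// Generic, client-safe message; never includes request-specific detail.
    fn client_message(self) -> &'static str {
        match self {
            ErrorCategory::Validation => "The request was invalid.",
            ErrorCategory::UpstreamTimeout => "An upstream service timed out.",
            ErrorCategory::ToolExecution => "A tool call failed.",
            ErrorCategory::Internal => "An internal error occurred.",
        }
    }
}

/// Keeps the detailed message for logs and exposes only the sanitized one.
struct AuraError {
    category: ErrorCategory,
    internal: String,
}

impl AuraError {
    fn new(category: ErrorCategory, internal: impl Into<String>) -> Self {
        Self { category, internal: internal.into() }
    }

    /// JSON body a handler might return (hypothetical shape).
    fn client_body(&self) -> String {
        format!(
            r#"{{"error":{{"code":"{}","message":"{}"}}}}"#,
            self.category.code(),
            self.category.client_message()
        )
    }
}

/// Display is the internal view, intended for server-side logging only.
impl fmt::Display for AuraError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}] {}", self.category.code(), self.internal)
    }
}

fn main() {
    let err = AuraError::new(
        ErrorCategory::ToolExecution,
        "mcp server 'files' returned: ENOENT /srv/private/config",
    );
    eprintln!("log: {err}");
    // {"error":{"code":"tool_execution_error","message":"A tool call failed."}}
    println!("{}", err.client_body());
}
```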

AURA-RM-001: Prometheus Metrics Endpoint
- /metrics endpoint with Prometheus text exposition format
- aura_http_request_duration_seconds histogram (25ms-300s)
- aura_llm_tokens_total counter (by type/provider/agent)
- aura_mcp_tool_duration_seconds histogram (10ms-60s)
- aura_errors_total counter (by taxonomy category)
- aura_http_requests_in_flight gauge
- aura_mcp_server_connected gauge
- AURA_METRICS_ENABLED kill switch
- /health and /metrics exempt from shutdown_guard
- Cardinality guards (100 tool cap, 64-char name limit)
- 9 integration tests, 2 unit tests
- Starter Grafana dashboard (examples/grafana/)
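One way the cardinality guards could work, as a std-only sketch: tool names are truncated to 64 characters, the first 100 distinct names are kept as label values, and any further names collapse into a single overflow label. The `__other__` label and the `ToolLabelGuard` type are assumptions for illustration; the PR does not say what happens to names beyond the cap.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

const MAX_TOOL_LABELS: usize = 100;
const MAX_LABEL_CHARS: usize = 64;
/// Hypothetical label for tool names seen after the cap is reached.
const OVERFLOW_LABEL: &str = "__other__";

/// Bounds the set of `tool` label values a histogram can ever see.
struct ToolLabelGuard {
    seen: Mutex<HashSet<String>>,
}

impl ToolLabelGuard {
    fn new() -> Self {
        Self { seen: Mutex::new(HashSet::new()) }
    }

    /// Returns the label value to record for `tool_name`.
    fn label_for(&self, tool_name: &str) -> String {
        // Truncate on char boundaries so multi-byte names stay valid UTF-8.
        let name: String = tool_name.chars().take(MAX_LABEL_CHARS).collect();
        let mut seen = self.seen.lock().unwrap();
        if seen.contains(&name) {
            return name;
        }
        if seen.len() < MAX_TOOL_LABELS {
            seen.insert(name.clone());
            name
        } else {
            OVERFLOW_LABEL.to_string()
        }
    }
}

fn main() {
    let guard = ToolLabelGuard::new();
    println!("{}", guard.label_for("list_files")); // list_files
    let long = "x".repeat(80);
    println!("{}", guard.label_for(&long).len()); // 64
    for i in 0..200 {
        guard.label_for(&format!("tool_{i}"));
    }
    println!("{}", guard.label_for("late_tool")); // __other__
}
```

Names admitted before the cap keep their own label, so dashboards for known tools stay stable even if a misbehaving MCP server advertises hundreds of tools.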

Ref: LOG-00000
Signed-off-by: brandon.shelton <brandon.shelton@mezmo.com>
@brandon-shelton-mezmo force-pushed the feature/AURA-RM-008-001-error-taxonomy-metrics branch from 8acd031 to 6b846c2 on April 28, 2026 19:51
The metrics tests sent `"model": "Test Assistant"` (the agent name), but
model lookup uses `alias` when one is set. The test config has
`alias = "test-assistant"`, which caused a 404 on chat requests.

Signed-off-by: brandon-shelton-mezmo <brandon.shelton@mezmo.com>
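The lookup rule from the fix above, sketched in Rust with a hypothetical `AgentConfig` shape and hypothetical `model_id`/`find_agent` helpers: the model identifier is the alias when one is set and the agent name otherwise, so a request for "Test Assistant" misses once `alias = "test-assistant"` is configured.

```rust
/// Hypothetical shape of an agent entry in the config.
struct AgentConfig {
    name: String,
    alias: Option<String>,
}

/// Identifier clients must send as `"model"`: alias if set, else name.
fn model_id(agent: &AgentConfig) -> &str {
    agent.alias.as_deref().unwrap_or(&agent.name)
}

/// Finds the agent whose model identifier matches the request.
fn find_agent<'a>(agents: &'a [AgentConfig], requested: &str) -> Option<&'a AgentConfig> {
    agents.iter().find(|a| model_id(a) == requested)
}

fn main() {
    let agents = vec![AgentConfig {
        name: "Test Assistant".into(),
        alias: Some("test-assistant".into()),
    }];
    // The display name no longer matches once an alias is set (the 404 case).
    assert!(find_agent(&agents, "Test Assistant").is_none());
    assert!(find_agent(&agents, "test-assistant").is_some());
    println!("alias lookup ok");
}
```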
@brandon-shelton-mezmo changed the base branch from feature/spec-driven-roadmap to main on April 29, 2026 19:08
@brandon-shelton-mezmo requested a review from a team on April 29, 2026 19:08
@brandon-shelton-mezmo
Author

Superseded by PR #117 which includes both AURA-RM-008 and AURA-RM-002 changes.

@github-actions (bot) locked and limited conversation to collaborators on May 12, 2026