Releases: shepard-system/shepard-obs-stack
Releases · shepard-system/shepard-obs-stack
v1.3.1
v1.3.0 — Comparative dashboard
What's New
Comparative Dashboard (PR #3)
- 9th dashboard — side-by-side comparison of Claude Code, Gemini CLI, and Codex
- 5 rows, 14 panels: overview piecharts, token bargauges, cache efficiency, tool calls timeline, error counts, top repos, quality stats
- Consistent provider color coding: Claude (purple), Gemini (green), Codex (amber)
$git_repotemplate variable for filtering hook metrics
/obs-compare Skill
- New
/obs-compareslash command for CLI-based provider comparison - Sessions, cost, tokens, cache efficiency, tool calls, errors — all in one report
Full Changelog: v1.2.0...v1.3.0
v1.2.0 — Skills, per-turn context breakdown
What's New
Claude Code Skills (PR #1)
- 6 slash-command skills for querying the stack from conversation:
/obs-status,/obs-cost,/obs-sessions,/obs-tools,/obs-alerts,/obs-query scripts/obs-api.sh— centralized API client with env var overrides for auth/TLS
Per-Turn Context Breakdown (PR #2)
- Session parser enriched with
context.*attributes: character counts and token estimates for tool output, user prompts, compact summaries - Per-turn spans (
claude.turn) gated bySHEPARD_DETAILED_TRACES=1— per-turn token breakdown with input/output/cache tokens and tool count - Hook metrics:
shepherd_context_chars_total,shepherd_context_compaction_pre_tokens_total - Context Breakdown row in Claude Deep Dive dashboard (piecharts + stat panels)
- 10 new parser tests
Full Changelog: v1.1.0...v1.2.0
v1.1.0 — Rust accelerator, test suite, hardening, PromQL fixes
What's New
Rust Accelerator (opt-in)
shepard-hookbinary replaces bash+jq for metric emission and session parsing- 3-step resolution: project-local → PATH → bash fallback
- Install:
./scripts/install-accelerator.sh(downloads tohooks/bin/, gitignored)
Test Suite (113 tests)
- 4 suites: shell syntax, config validation, hooks, session parsers
- CI workflow with promtool rule validation + shellcheck
- Alert regression tests (rule counts + expression guards)
Hardening
--max-time 5on all fire-and-forget curls (prevents zombie processes)- All ports bound to
127.0.0.1(not exposed to LAN) - Loki ruler WAL moved from
/tmpto/loki/ruler-wal(survives OS cleanup) - Detached HEAD fallback to short SHA in git-context.sh
- Dashboard
noValue: "–"+ descriptions on all 8 dashboards
PromQL Fixes (critical)
- Fixed per-session metric aggregation: Native OTel metrics (Claude, Gemini) have unique
session_idlabels per session.increase()returned 0 for session counts and overestimated cost by up to 43%. - Session counts now use
count(max_over_time())to count distinct series - Cost/tokens/commits stat panels use
sum(max_over_time())for accurate totals - Added
round()on all integer counter panels (87 panels across 7 dashboards) - Time-series and hook metric queries unchanged (work correctly with
increase())
Alerting
- 16 alert rules across 3 tiers (infrastructure, pipeline, services)
- Inhibit chains: collector down → suppresses downstream alerts
- LokiDown uses dedicated scrape job (not OTel Collector exporter)
Full Changelog
v1.0.0 — The Eye
Initial public release. Docker-based observability for AI coding assistants.
Highlights
- 6 services: OTel Collector, Prometheus, Loki, Tempo, Grafana, Alertmanager
- 3 CLIs supported: Claude Code, Codex, Gemini CLI — hooks + native OpenTelemetry
- 8 Grafana dashboards: cost, tools, operations, quality, per-provider deep dives, session timeline
- 15 alert rules in 3 tiers (infrastructure, pipeline, business logic)
- One command to start:
./scripts/init.sh
What's included
Hooks provide git context and labeled tool/event counters. Native OTel provides tokens, cost, logs, and traces.
| CLI | Hooks | Native OTel |
|---|---|---|
| Claude Code | PreToolUse, PostToolUse, SessionStart, Stop |
metrics + logs |
| Codex CLI | agent-turn-complete |
logs |
| Gemini CLI | AfterTool, AfterAgent, AfterModel, SessionEnd |
metrics + logs + traces |
Security: PreToolUse guard blocks access to sensitive files (.env, credentials, keys). Sensitive file access counter + alert rule.
Session Timeline: synthetic OTLP traces parsed from all 3 CLI session logs — tool call waterfall, MCP timing, sub-agents.
See CHANGELOG.md for the full list of features.