From 8631711dd7fcd325197f6714415ef643ebdd7e4c Mon Sep 17 00:00:00 2001
From: luiseiman
Date: Wed, 3 Jun 2026 13:46:47 -0300
Subject: [PATCH 1/4] =?UTF-8?q?feat(v4):=20audit=20two-dimension=20model?= =?UTF-8?q?=20=E2=80=94=20Native=20Health=20+=20dotforge=20Adoption?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reorient the audit from "do you have dotforge machinery?" to "do you use
native Claude Code well?", per the native-first scope decision.

- Dimension A — Native Health (score, 0-10): 5 obligatory + 10 recommended.
  New native-usage items: auto-memory hygiene (MEMORY.md as index, not dump),
  permission cascade (settings.local.json), attribution (vs deprecated
  includeCoAuthoredBy). Same obl*0.7 + rec*0.3 formula, security cap 6.0.
- Dimension B — dotforge Adoption (forge_adoption, 0-5): behaviors, workflows,
  override loop, domain rules, sync recency. INFORMATIONAL — never affects
  score. Native-first projects (B=0, A=10) are now a desirable outcome, not
  penalized.

Migrate all checklist consumers to keep them aligned:
- audit/score.sh (CI engine): rewrite items 6-15 + add dimension B + dual output
- audit_all.py (12-project re-auditor): same, writes forge_adoption to registry
- .github/workflows/audit.yml: use native_health_items, fix stale /7 divisor
- docs (README, usage-guide, guia-uso): two-dimension tables + formula

Fixes three pre-existing inconsistencies: scoring summed items 6-15 while the
checklist had 6-17; registry comment showed wrong 3.0/8 divisor; v4 items
16-17 were outside the score formula.

Adds ADR docs/v4/SCOPE-DECISION.md and rule domain/native-vs-dotforge-boundary.md.
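The scoring rule described in this message can be sketched in a few lines of Python. This is an illustration only, not the code in `audit/score.sh` or `scripts/audit_all.py`. The function names, the `security_item_missing` flag and the rounding to one decimal are assumptions made for the example, tier adjustments are ignored, and the adoption labels are the ones listed in `audit/checklist.md`.

```python
# Hypothetical sketch of the two-dimension audit arithmetic.
# Names and rounding are assumptions, not the project's actual API.

SECURITY_CAP = 6.0  # "security cap 6.0" from the commit message


def native_health(obligatory, recommended, security_item_missing=False):
    """Dimension A: 5 obligatory items (0-2 each) + 10 recommended (0-1 each)."""
    if len(obligatory) != 5 or any(not 0 <= s <= 2 for s in obligatory):
        raise ValueError("expected 5 obligatory scores in the range 0-2")
    if len(recommended) != 10 or any(not 0 <= s <= 1 for s in recommended):
        raise ValueError("expected 10 recommended scores in the range 0-1")
    obl = sum(obligatory)   # 0-10
    rec = sum(recommended)  # 0-10
    # Same as obl*0.7 + rec*0.3, written with integers to avoid float noise.
    score = (7 * obl + 3 * rec) / 10
    if security_item_missing:
        # Assumption: the caller decides when a security-critical item is missing.
        score = min(score, SECURITY_CAP)
    return round(score, 1)


def forge_adoption(items):
    """Dimension B: 5 informational items (0-1 each). Never feeds Dimension A."""
    if len(items) != 5 or any(s not in (0, 1) for s in items):
        raise ValueError("expected 5 adoption scores, each 0 or 1")
    total = sum(items)
    if total == 0:
        label = "None"
    elif total <= 2:
        label = "Partial"
    elif total <= 4:
        label = "Most"
    else:
        label = "Full"
    return total, label
```

Under these assumptions a native-first project with every Dimension A item satisfied and no dotforge machinery gets `native_health([2] * 5, [1] * 10) == 10.0` and `forge_adoption([0] * 5) == (0, "None")`. The two results are computed independently, which is the point of the split.
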
Co-Authored-By: Claude Opus 4.8
---
 .../domain/native-vs-dotforge-boundary.md     |  48 +++
 .github/workflows/audit.yml                   |  24 +-
 CLAUDE.md                                     |   2 +-
 README.md                                     |   4 +-
 audit/checklist.md                            | 125 ++++---
 audit/score.sh                                | 338 +++++++++++-------
 audit/scoring.md                              |  48 ++-
 docs/guia-uso.md                              |  45 ++-
 docs/usage-guide.md                           |  44 ++-
 docs/v4/SCOPE-DECISION.md                     |  69 ++++
 registry/projects.yml                         |  13 +-
 scripts/audit_all.py                          | 223 +++++++-----
 skills/audit-project/SKILL.md                 | 100 +++---
 13 files changed, 721 insertions(+), 362 deletions(-)
 create mode 100644 .claude/rules/domain/native-vs-dotforge-boundary.md
 create mode 100644 docs/v4/SCOPE-DECISION.md

diff --git a/.claude/rules/domain/native-vs-dotforge-boundary.md b/.claude/rules/domain/native-vs-dotforge-boundary.md
new file mode 100644
index 0000000..093d8c1
--- /dev/null
+++ b/.claude/rules/domain/native-vs-dotforge-boundary.md
@@ -0,0 +1,48 @@
+---
+globs: docs/v4/*.md, behaviors/*, stacks/*, skills/*, .claude/rules/domain/*.md
+description: Native-first boundary — what dotforge keeps vs cedes to native Claude Code, and the method for deciding
+domain: dotforge-meta
+last_verified: 2026-06-03
+---
+
+# Native vs dotforge Boundary
+
+## Governing principle
+
+If Claude Code resolves it natively, ADOPT the native solution. dotforge only owns
+what natively has no equivalent. Scope shrinks as Claude Code grows — that is correct.
+
+## Method (mandatory before any scope decision)
+
+Verify the CURRENT native state against official docs (code.claude.com/docs,
+anthropics/claude-code) BEFORE classifying. A scope call on stale assumptions is wrong
+by default — this is why `/forge watch` (keeping domain rules current) is the INPUT that
+makes boundary decisions correct, not meta-work. Never cede a capability without
+confirming the native feature actually covers the real case.
+
+## Classification (verified 2026-06-03)
+
+KEEP — no native equivalent:
+- **Domain rules** — curated encyclopedia of CC internals + business domain. Core asset.
+- **Cross-project propagation with merge** — `forge:section` markers + `/forge sync`.
+  Native symlinks/global CLAUDE.md only COPY, never merge per-project customization.
+- **Behaviors v3** — 5-level escalation + session state + auditable override.
+  `hookify` is a binary warn/block wrapper; no native behavior-governance spec exists.
+  KEEP pending validation that production projects actually consult `overrides.log`.
+- **Registry + audit cross-project** — REORIENT: audit must measure good use of NATIVE
+  features (auto-memory active, sandbox set, deny rules present, /init run), not presence
+  of dotforge machinery.
+
+CEDE — native covers it:
+- Individual learning capture → native auto-memory (per-project, `~/.claude/projects/

/memory/`) +- Identical shared rules (no merge needed) → native symlinks in `.claude/rules/` + global CLAUDE.md +- Workflows / orchestration → `/workflows`, Agent Teams, `/deep-research` (already ceded v4) +- Base CLAUDE.md generation → `/init` +- One-shot code review → `/code-review --comment/--fix` +- Model routing as a system → `/effort` (keep only as documentation) + +## Anti-pattern + +Building atop native internals that may change (compiled hooks depend on the hook API +shape). Every breaking upstream change forces a re-tune. Keep the native-dependent +surface minimal and validate delta demand before expanding it. diff --git a/.github/workflows/audit.yml b/.github/workflows/audit.yml index e79940c..2e2c784 100644 --- a/.github/workflows/audit.yml +++ b/.github/workflows/audit.yml @@ -43,29 +43,45 @@ jobs: with: script: | const result = JSON.parse(process.env.AUDIT_RESULT); - const score = result.score; + const score = result.native_health ?? result.score; const level = result.level; + const adoption = result.forge_adoption ?? 0; + const adoptionLabel = result.adoption_label ?? 'None'; const cap = result.security_cap ? '\n> ⚠️ **Security cap applied** — settings.json or block-destructive missing.' : ''; const emoji = score >= 9 ? '🟢' : score >= 7 ? '🟡' : score >= 5 ? '🟠' : '🔴'; - const itemRows = Object.entries(result.items).map(([key, v]) => { + const itemRows = Object.entries(result.native_health_items).map(([key, v]) => { const num = key.split('_')[0]; const name = key.split('_').slice(1).join('_'); const icon = v.score === 0 ? '❌' : v.score === 1 ? '⚠️' : '✅'; return `| ${num} | ${name} | ${icon} ${v.score} | ${v.note} |`; }).join('\n'); + const adoptionRows = Object.entries(result.adoption_items || {}).map(([key, v]) => { + const num = key.split('_')[0]; + const name = key.split('_').slice(1).join('_'); + const icon = v.score === 0 ? 
'—' : '✅'; + return `| ${num} | ${name} | ${icon} ${v.score} | ${v.note} |`; + }).join('\n'); + const body = [ - `## ${emoji} dotforge Audit Score: **${score}/10** (${level})${cap}`, + `## ${emoji} dotforge Native Health: **${score}/10** (${level})${cap}`, + `dotforge Adoption: **${adoption}/5** (${adoptionLabel}) — _informational, does not affect Native Health_`, '', - `| Obligatorio | ${result.score_obligatorio}/10 | Recomendado | ${result.score_recomendado}/7 |`, + `| Obligatorio | ${result.score_obligatorio}/10 | Recomendado | ${result.score_recomendado}/10 |`, '|---|---|---|---|', '', + '### Dimension A — Native Health', '| # | Item | Score | Note |', '|---|---|---|---|', itemRows, '', + '### Dimension B — dotforge Adoption (informational)', + '| # | Item | Score | Note |', + '|---|---|---|---|', + adoptionRows, + '', '_Score computed by `audit/score.sh` — mechanical checks only. Run `/forge audit` for full semantic evaluation._', ].join('\n'); diff --git a/CLAUDE.md b/CLAUDE.md index 5976bf4..019b547 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -73,7 +73,7 @@ Seven subagent definitions in `agents/`: researcher (read-only exploration), arc ### Audit System -`audit/checklist.md` defines 15 items (5 obligatory scored 0-2, 10 recommended scored 0-1). `audit/scoring.md` normalizes to a 10-point scale. Security-critical items (settings.json, block-destructive hook) cap the score at 6.0 if missing. Registry in `registry/projects.yml` tracks scores across managed projects. +Two-dimension model (v4.x). **Dimension A — Native Health** (`score`, 0-10): 5 obligatory items (0-2) + 10 recommended (0-1), normalized as `obligatory*0.7 + recommended*0.3`. Security-critical items (settings.json, block-destructive hook) cap it at 6.0 if missing. Measures good use of native Claude Code (auto-memory as index, permission cascade, attribution, sandbox, deny rules). **Dimension B — dotforge Adoption** (`forge_adoption`, 0-5): behaviors/workflows/override-loop/domain-rules/sync-recency. 
**Informational — does NOT affect Native Health.** A native-first project scoring B=0 with A=10 is a desirable outcome (see `.claude/rules/domain/native-vs-dotforge-boundary.md`). `audit/checklist.md` + `audit/scoring.md` are the source of truth; registry in `registry/projects.yml` tracks both across managed projects. ### Integrations diff --git a/README.md b/README.md index 9e5799a..47f03a3 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ For people and teams managing more than one Claude Code project. - **`scripts/process-override-log.sh`** — bash script that processes `.forge/audit/overrides.log` and auto-creates `practices/inbox/auto-override-*.md` for behaviors overridden ≥3 times in 30 days. Idempotent. 10/10 tests green. Cost: 0 LLM calls, pure bash. - **`session-start-process-overrides.sh`** wired in `SessionStart` (template + self-hosting) — auto-captures frequent overrides as practices on every session start. - **`scripts/migrate-v3-to-v4.sh`** — safe migration script with mandatory `--dry-run`, atomic backup, `--rollback`. See [`docs/v4/MIGRATION-V3-TO-V4.md`](docs/v4/MIGRATION-V3-TO-V4.md). -- **Audit checklist items 16-17** — workflow availability + override loop active. Score impact: v3-perfect projects report ~9.5/10 v4 until migrated. +- **Audit two-dimension model** — **Native Health** (0-10: native Claude Code usage + security) + **dotforge Adoption** (0-5: informational, does not affect the score). Behaviors / workflows / override-loop moved to the non-penalizing Adoption dimension — native-first projects no longer lose points for skipping dotforge machinery. New Native-Health items: auto-memory hygiene, permission cascade, attribution. - **`domain/workflow-economics.md`** (new domain rule) — documents v4 PoC cost-quality findings. Decision matrix: when workflow vs skill. Token economy principles. 
**TL;DR: workflows are 4-25x more expensive than bash skills for recurring work — use only as on-demand escalation, not as default refactor.** - **`workflows/watch.js`** ships as REFERENCE implementation, NOT promoted to `/forge watch` default. The bash skill remains the production tool. @@ -213,7 +213,7 @@ dotforge/ ├── mcp/ # MCP server templates (github, postgres, supabase, redis, slack) ├── behaviors/ # v3 declarative policies (index.yaml + one dir per behavior) ├── scripts/ # v3 runtime, compiler, and /forge behavior CLI -├── audit/ # Checklist (15 items) + scoring normalized to 10 +├── audit/ # Native Health (15 items, 0-10) + dotforge Adoption (5 items, informational) ├── practices/ # Pipeline: inbox → evaluating → active → deprecated ├── global/ # Global ~/.claude/ management (CLAUDE.md, settings, sync.sh) ├── registry/ # Project tracking with scores and history diff --git a/audit/checklist.md b/audit/checklist.md index a7558eb..ccda677 100644 --- a/audit/checklist.md +++ b/audit/checklist.md @@ -1,5 +1,14 @@ # Checklist de Auditoría dotforge +El audit tiene **dos dimensiones independientes**: + +- **A — Salud Nativa** (score 0-10): ¿el proyecto usa bien Claude Code nativo + seguridad? Es el score que importa para cualquier proyecto, use o no la maquinaria dotforge. +- **B — Adopción dotforge** (informativo 0-5): ¿cuánto adoptó la gobernanza dotforge? **NO penaliza** la Salud Nativa. Un proyecto native-first puro saca 0/5 acá sin perder un punto en A. + +--- + +# Dimensión A — Salud Nativa (score 0-10) + ## Obligatorio (cada item: 0-2 puntos, total máximo: 10) ### 1. CLAUDE.md (0-2) @@ -7,7 +16,7 @@ - 1: Existe pero <20 líneas útiles O falta alguna sección clave - 2: Completo — incluye **todas** estas secciones: stack/tecnologías, arquitectura/estructura, comandos build/test exactos, convenciones -**Verificación:** No contar líneas vacías ni comentarios. 
Buscar presencia explícita de: nombre del stack, al menos 1 comando build/test, estructura de directorios o descripción de arquitectura. +**Verificación:** No contar líneas vacías ni comentarios. Buscar presencia explícita de: nombre del stack, al menos 1 comando build/test, estructura de directorios o descripción de arquitectura. `/init` nativo genera la base; score 2 exige que esté completo. ### 2. .claude/settings.json (0-2) - 0: No existe @@ -33,72 +42,94 @@ ## Recomendado (cada item: 0-1 punto, total máximo: 10) -### 6. CLAUDE_ERRORS.md -- 0: No existe -- 1: Existe con formato para registrar errores (tabla con columna Type: syntax|logic|integration|config|security) +### 6. .gitignore protege secrets +- 0: No hay .gitignore o no protege .env/secrets +- 1: .gitignore incluye .env, *.key, *.pem, credentials + +### 7. Prompt injection scan +- 0: Rules or CLAUDE.md contain suspicious patterns (prompt injection risk) +- 1: No suspicious patterns detected + +**Verification:** Scan `.claude/rules/`, `CLAUDE.md`, and any `*.md` in `.claude/` for patterns: `ignore previous`, `system:`, ``, ``, ``, encoded payloads (base64 inline blocks), `IGNORE ALL`, `disregard`, `override instructions`. If any match → score 0 with explicit warning. + +### 8. Auto mode safety (0-1) +- 0: Auto mode enabled without deny list covering .env, *.key, *.pem, *credentials* +- 1: Auto mode enabled WITH complete deny list OR auto mode not enabled + +**Verification:** Check if `permissions.defaultMode` is `"auto"` in settings.json. If yes, verify deny list covers secrets. If not enabled (default), automatic pass. + +### 9. OS-level sandboxing (0-1) +- 0: Project handles secrets (env vars, credentials, API keys, cloud configs) with no `sandbox.enabled` in settings.json +- 1: `sandbox.enabled: true` with at least `network.allowedDomains` OR `filesystem.denyRead` covering sensitive paths — OR project demonstrably handles no secrets (automatic pass) -### 7. 
Hook de lint automático +**Verification:** Parse `settings.json` for `sandbox.enabled`. If true, verify at least one filesystem or network restriction. If false, scan for secret indicators (`.env*`, `credentials*`, `*.key`, `*.pem`, cloud CLIs). Projects without secrets auto-pass. Not applicable on Windows native (WSL2 only). See `.claude/rules/domain/sandboxing.md`. + +### 10. Hook de lint automático - 0: No hay lint post-write - 1: Hook de lint configurado para el stack del proyecto Y es ejecutable (`chmod +x`) -### 8. Comandos custom (.claude/commands/) +### 11. Auto-memory bien usado (0-1) +- 0: No hay memoria de proyecto, O MEMORY.md es un dump (>200 líneas o >25KB — se trunca, contenido invisible) +- 1: Memoria nativa bien estructurada: `MEMORY.md` es un índice conciso de punteros (<200 líneas Y <25KB), con archivos de memoria enlazados. Si el proyecto rastrea errores, `CLAUDE_ERRORS.md` existe con formato de tabla (columna Type: syntax|logic|integration|config|security) + +**Verificación:** Contar líneas y bytes de `MEMORY.md`. Penalizar el anti-patrón de volcar contenido en el índice (regla nativa: solo las primeras 200 líneas / 25KB se inyectan por sesión). Ver `.claude/rules/domain/context-window-optimization.md`. + +### 12. Permission cascade (0-1) +- 0: Overrides locales mezclados en `settings.json` versionado (rutas absolutas de máquina, allows ad-hoc) que ensucian el commit +- 1: `settings.local.json` usado para overrides per-máquina/per-usuario, O el proyecto no necesita overrides locales (auto-pass) + +**Verificación:** Si hay rutas de máquina o permisos ad-hoc en `.claude/settings.json` versionado que deberían estar en `settings.local.json` → 0. Cascade nativo: Managed > Local > Project > User. Ver `.claude/rules/domain/permission-model.md`. + +### 13. 
Attribution configurado (0-1) +- 0: Usa el `includeCoAuthoredBy` deprecado, O trailers de commit/PR inconsistentes con la intención del proyecto +- 1: `attribution.commit` / `attribution.pr` configurado en settings.json, O el co-author por defecto es aceptable para el proyecto (auto-pass) + +**Verificación:** Buscar `includeCoAuthoredBy` (deprecado → recomendar migrar a `attribution.*`). Auto-pass si el default alcanza. Para GitHub/GitLab/Bitbucket self-hosted, verificar `prUrlTemplate`. Ver `.claude/rules/_common.md` § Git. + +### 14. Comandos custom (.claude/commands/) - 0: No hay comandos custom - 1: Al menos 1 comando custom relevante al proyecto -### 9. Memory del proyecto -- 0: No hay archivos de memoria -- 1: Existe memoria con contexto útil del proyecto - -### 10. Agentes de orquestación +### 15. Agentes de orquestación - 0: No hay .claude/agents/ ni regla agents.md - 1: Agentes instalados + regla de orquestación activa en .claude/rules/ -### 11. .gitignore protege secrets -- 0: No hay .gitignore o no protege .env/secrets -- 1: .gitignore incluye .env, *.key, *.pem, credentials - -### 12. Prompt injection scan -- 0: Rules or CLAUDE.md contain suspicious patterns (prompt injection risk) -- 1: No suspicious patterns detected +**Tier adjustments (dimensión A):** +- `simple` (<5K LOC, 1 stack, sin CI): items 14-15 con score 0 no penalizan (N/A) +- `complex` (>50K LOC, 3+ stacks, monorepo): items 14-15 semi-obligatorios (cada uno 0-2 en vez de 0-1) -**Verification:** Scan `.claude/rules/`, `CLAUDE.md`, and any `*.md` in `.claude/` for patterns: `ignore previous`, `system:`, ``, ``, ``, encoded payloads (base64 inline blocks), `IGNORE ALL`, `disregard`, `override instructions`. If any match → score 0 with explicit warning. +--- -### 13. 
Auto mode safety (0-1) -- 0: Auto mode enabled without deny list covering .env, *.key, *.pem, *credentials* -- 1: Auto mode enabled WITH complete deny list OR auto mode not enabled +# Dimensión B — Adopción dotforge (informativo, 0-5) -**Verification:** Check if `permissions.defaultMode` is set to `"auto"` in settings.json. If yes, verify deny list covers secrets. If auto mode is not enabled (default), automatic pass. +**No afecta el score de Salud Nativa.** Mide cuánto adoptó el proyecto la maquinaria de gobernanza dotforge. Reportar como `Adopción: N/5` con label (0=None, 1-2=Partial, 3-4=Most, 5=Full). Sirve para decidir propagación, no para juzgar calidad. -### 14. Behaviors coverage (v3) (0-1) -- 0: No v3 behaviors enforced — declaration in `behaviors/index.yaml` alone DOES NOT count -- 1: At least one v3 behavior compiled to a runtime hook under `.claude/hooks/generated/` AND referenced in `settings.json` so the harness actually loads it +### B1. Behaviors v3 compilados y wired +- 0: Sin behaviors enforced — declaración en `behaviors/index.yaml` sola NO cuenta +- 1: Al menos un behavior compilado a `.claude/hooks/generated/*__pretooluse__*.sh` Y referenciado en `settings.json` -**Verification:** Score reflects ENFORCEMENT, not intent. Required evidence: -1. `.claude/hooks/generated/*__pretooluse__*.sh` (or matching event suffix) exists for at least one behavior — proof the YAML compiled -2. `settings.json` references the generated hook path (auto-injected by `/forge behavior on` or merged from a `*.settings.json` snippet) +**Verificación:** `ls .claude/hooks/generated 2>/dev/null` y `grep generated .claude/settings.json`. -A project with `behaviors/index.yaml` declaring `enabled: true` for several behaviors but no compiled hooks scores **0**. Compilation without the settings.json reference also scores 0 — the harness does not auto-load generated hooks. To diagnose: `ls .claude/hooks/generated 2>/dev/null` and `grep generated .claude/settings.json`. 
A project that has not opted into the v3 behavior governance layer scores 0 — this does not apply the security cap. +### B2. Workflow availability (v4) +- 0: No hay `workflows/` o está vacío +- 1: `workflows/` con al menos un `.js` que contiene `export const meta` -### 15. OS-level sandboxing (0-1) -- 0: Project handles secrets (env vars, credentials, API keys, cloud configs) with no `sandbox.enabled` in settings.json -- 1: `sandbox.enabled: true` with at least `network.allowedDomains` OR `filesystem.denyRead` covering the project's sensitive paths — OR project demonstrably handles no secrets (automatic pass) +**Verificación:** `grep -q "export const meta" workflows/*.js`. Señal de gobernanza, no de calidad — los bash skills siguen siendo el workhorse. Ver `docs/v4/SPEC.md`. -**Verification:** Parse `settings.json` for `sandbox.enabled`. If true, verify at least one filesystem or network restriction is configured. If false, scan project for indicators of secret handling: presence of `.env*`, `credentials*`, `*.key`, `*.pem`, or references to cloud CLIs (`gcloud`, `aws`, `kubectl`) in scripts. Projects without secrets auto-pass. Not applicable on Windows native (WSL2 only). See `.claude/rules/domain/sandboxing.md`. +### B3. Override capture loop activo (v4) +- 0: `.forge/audit/overrides.log` no rastreado O `session-start-process-overrides.sh` no wired +- 1: Ambos presentes: log existe Y el hook está en `.claude/settings.json` SessionStart -### 16. Workflow availability (v4, 0-1) -- 0: No `workflows/` directory OR directory empty -- 1: `workflows/` directory exists with at least one `.js` file containing an `export const meta` block +**Verificación:** `test -f .forge/audit/overrides.log && grep -q "session-start-process-overrides.sh" .claude/settings.json`. Solo significativo si hay behaviors activos. Ver `scripts/process-override-log.sh`. 
-**Verification:** `ls workflows/*.js 2>/dev/null` returns at least one file; `grep -q "export const meta" workflows/*.js` confirms valid workflow shape. Score is deliberately low (1 point) — workflow presence is a governance signal, not a quality measure. Bash skills remain the workhorse. See `docs/v4/SPEC.md`. +### B4. Domain rules +- 0: No hay `.claude/rules/domain/` +- 1: Al menos un domain rule presente y fresco (`last_verified` <90 días) -### 17. Override capture loop active (v4, 0-1) -- 0: `.forge/audit/overrides.log` not tracked OR `process-override-log.sh` not wired in SessionStart -- 1: Both present: log file exists AND the hook is referenced in `.claude/settings.json` SessionStart hooks +**Verificación:** Contar archivos en `.claude/rules/domain/`. Reportar cuántos están stale (>90 días). Si hay lógica de negocio pero no domain rules, sugerir `/forge domain extract`. -**Verification:** -```bash -test -f .forge/audit/overrides.log && \ - grep -q "session-start-process-overrides.sh" .claude/settings.json -``` +### B5. Sync recency +- 0: `dotforge_version` del proyecto desfasado respecto a `VERSION` por ≥1 minor, o desconocido +- 1: Proyecto sincronizado a la versión actual de dotforge (`dotforge_version` == `VERSION`) -Projects that have not opted into v3 behavior governance (no `behaviors/` directory) auto-pass. The override loop is meaningful only when behaviors are active and may generate soft_block overrides. See `scripts/process-override-log.sh` and `docs/v4/SPEC.md`. +**Verificación:** Comparar `dotforge_version` del registry con `$DOTFORGE_DIR/VERSION`. diff --git a/audit/score.sh b/audit/score.sh index b5dfaad..02dba41 100755 --- a/audit/score.sh +++ b/audit/score.sh @@ -4,14 +4,18 @@ # # Usage: ./audit/score.sh [PROJECT_DIR] [--json] [--threshold N] # -# Computes the 15-item checklist mechanically without Claude. 
+# Two-dimension model (v4.x — see audit/checklist.md + audit/scoring.md): +# Dimension A — Native Health: 5 obligatory (0-2) + 10 recommended (0-1). +# score = obl*0.7 + rec*0.3, security cap 6.0. The CI gate. +# Dimension B — dotforge Adoption: 5 items (0-1). Informational, 0-5. +# Does NOT affect Native Health. # Semantic checks (CLAUDE.md quality, rule content) are approximated with heuristics. # Score is indicative — /forge audit provides authoritative semantic evaluation. # # Exit codes: # 0 — audit complete # 1 — PROJECT_DIR not found -# 2 — threshold set and score < threshold (CI gate) +# 2 — threshold set and native_health < threshold (CI gate) set -uo pipefail @@ -36,13 +40,15 @@ fi cd "$PROJECT_DIR" -# --- Score variables (s1..s15) and notes (n1..n15) --- +# --- Dimension A: scores (s1..s15) and notes (n1..n15) --- s1=0; n1=""; s2=0; n2=""; s3=0; n3=""; s4=0; n4=""; s5=0; n5="" s6=0; n6=""; s7=0; n7=""; s8=0; n8=""; s9=0; n9=""; s10=0; n10="" s11=0; n11=""; s12=0; n12=""; s13=0; n13=""; s14=0; n14=""; s15=0; n15="" +# --- Dimension B: scores (b1..b5) and notes (m1..m5) --- +b1=0; m1=""; b2=0; m2=""; b3=0; m3=""; b4=0; m4=""; b5=0; m5="" # ───────────────────────────────────────────────────────────────────────────── -# OBLIGATORIO (each 0-2) +# DIMENSION A — OBLIGATORIO (each 0-2) # ───────────────────────────────────────────────────────────────────────────── # 1. CLAUDE.md @@ -128,75 +134,24 @@ else fi # ───────────────────────────────────────────────────────────────────────────── -# RECOMENDADO (each 0-1) +# DIMENSION A — RECOMENDADO (each 0-1) — native Claude Code usage # ───────────────────────────────────────────────────────────────────────────── -# 6. CLAUDE_ERRORS.md -if [[ ! -f "CLAUDE_ERRORS.md" ]]; then - s6=0; n6="CLAUDE_ERRORS.md not found" -elif grep -qE '\| *Type *\||\| *Tipo *\|' "CLAUDE_ERRORS.md"; then - s6=1; n6="Present with Type column" -else - s6=1; n6="Present but missing Type column" -fi - -# 7. 
Lint hook (any lint hook: lint-on-save, lint-python, lint-ts, lint-swift, etc.) -LINT_FOUND="" -for lf in .claude/hooks/lint-*.sh; do - [[ -f "$lf" ]] && LINT_FOUND="$lf" && break -done -if [[ -n "$LINT_FOUND" && -x "$LINT_FOUND" ]]; then s7=1; n7="$(basename "$LINT_FOUND") present and executable" -elif [[ -n "$LINT_FOUND" ]]; then s7=1; n7="$(basename "$LINT_FOUND") present but not executable" -else s7=0; n7="No lint hook found (lint-*.sh)" -fi - -# 8. Custom commands -CMD_DIR=".claude/commands" -if [[ -d "$CMD_DIR" ]] && [[ -n "$(ls "$CMD_DIR"/*.md 2>/dev/null)" ]]; then - CC=$(ls "$CMD_DIR"/*.md 2>/dev/null | wc -l | tr -d ' ') - s8=1; n8="${CC} custom command(s)" -else - s8=0; n8=".claude/commands/ absent or empty" -fi - -# 9. Project memory (agent-memory with real content, or MEMORY.md) -MEM_FILES=$(find .claude/agent-memory -name "*.md" -not -name ".gitkeep" 2>/dev/null | wc -l | tr -d ' ') -if [[ "$MEM_FILES" -gt 0 ]]; then - s9=1; n9="agent-memory/ with ${MEM_FILES} file(s)" -elif [[ -d ".claude/agent-memory" ]]; then - s9=1; n9="agent-memory/ initialized (no content yet)" -elif [[ -f ".claude/MEMORY.md" ]]; then - s9=1; n9="MEMORY.md present" -else - s9=0; n9="No project memory found" -fi - -# 10. Agents + orchestration -HA2=0; HR2=0 -[[ -d ".claude/agents" ]] && [[ -n "$(ls .claude/agents/*.md 2>/dev/null)" ]] && HA2=1 -[[ -f ".claude/rules/agents.md" ]] && HR2=1 -if [[ $HA2 -eq 1 && $HR2 -eq 1 ]]; then - AC=$(ls .claude/agents/*.md 2>/dev/null | wc -l | tr -d ' ') - s10=1; n10="${AC} agents + agents.md rule" -elif [[ $HA2 -eq 1 || $HR2 -eq 1 ]]; then s10=1; n10="Partial (agents:${HA2} rule:${HR2})" -else s10=0; n10="No agents or orchestration rule" -fi - -# 11. .gitignore +# 6. .gitignore protects secrets if [[ ! 
-f ".gitignore" ]]; then - s11=0; n11=".gitignore not found" + s6=0; n6=".gitignore not found" else GE=$(grep -cE '^\.env$|^\.env\b' .gitignore 2>/dev/null) GK=$(grep -c '\.key' .gitignore 2>/dev/null) GP=$(grep -c '\.pem' .gitignore 2>/dev/null) GR=$(grep -cE '(credentials|secret)' .gitignore 2>/dev/null) GC=$((GE + GK + GP + GR)) - if [[ $GC -ge 2 ]]; then s11=1; n11="Covers secrets (${GC}/4 patterns)" - else s11=0; n11="Weak secret protection (${GC}/4 patterns)" + if [[ $GC -ge 2 ]]; then s6=1; n6="Covers secrets (${GC}/4 patterns)" + else s6=0; n6="Weak secret protection (${GC}/4 patterns)" fi fi -# 12. Prompt injection scan +# 7. Prompt injection scan SCAN_FOUND="" SCAN_COUNT=0 for f in CLAUDE.md .claude/rules/*.md .claude/*.md; do @@ -208,55 +163,30 @@ for f in CLAUDE.md .claude/rules/*.md .claude/*.md; do [[ -n "$MATCH" ]] && SCAN_FOUND="${SCAN_FOUND} ${f}" done if [[ -n "$SCAN_FOUND" ]]; then - s12=0; n12="⚠ Suspicious patterns in:${SCAN_FOUND}" + s7=0; n7="⚠ Suspicious patterns in:${SCAN_FOUND}" else - s12=1; n12="Clean (${SCAN_COUNT} files scanned)" + s7=1; n7="Clean (${SCAN_COUNT} files scanned)" fi -# 13. Auto mode safety +# 8. Auto mode safety if [[ ! -f "$SETTINGS" ]]; then - s13=1; n13="settings.json not found — auto mode not enabled (pass)" + s8=1; n8="settings.json not found — auto mode not enabled (pass)" elif ! grep -q '"defaultMode"' "$SETTINGS" 2>/dev/null; then - s13=1; n13="defaultMode not set — auto mode not enabled (pass)" + s8=1; n8="defaultMode not set — auto mode not enabled (pass)" elif ! 
grep -q '"auto"' "$SETTINGS" 2>/dev/null; then - s13=1; n13="defaultMode present but not auto (pass)" + s8=1; n8="defaultMode present but not auto (pass)" else - # Auto mode is enabled — check deny list covers secrets HE=$(grep -c '\.env' "$SETTINGS" 2>/dev/null) HK=$(grep -c '\.key' "$SETTINGS" 2>/dev/null) HP=$(grep -c '\.pem' "$SETTINGS" 2>/dev/null) HR=$(grep -c 'credentials' "$SETTINGS" 2>/dev/null) DC=$((HE + HK + HP + HR)) - if [[ $DC -ge 3 ]]; then s13=1; n13="Auto mode enabled WITH deny list covering secrets (${DC}/4)" - else s13=0; n13="Auto mode enabled WITHOUT complete deny list (.env:${HE} .key:${HK} .pem:${HP} credentials:${HR})" + if [[ $DC -ge 3 ]]; then s8=1; n8="Auto mode enabled WITH deny list covering secrets (${DC}/4)" + else s8=0; n8="Auto mode enabled WITHOUT complete deny list (.env:${HE} .key:${HK} .pem:${HP} credentials:${HR})" fi fi -# 14. Behaviors coverage (v3 — behavior governance) -# Pass if project has at least one compiled behavior hook OR a behaviors/index.yaml -# with at least one enabled behavior. -s14=0; n14="No v3 behaviors detected" -if [[ -f "behaviors/index.yaml" ]]; then - BH_ENABLED=$(python3 -c " -import yaml, sys -try: - d = yaml.safe_load(open('behaviors/index.yaml')) or {} - n = sum(1 for b in (d.get('behaviors') or []) if b.get('enabled', True)) - print(n) -except Exception: - print(0) -" 2>/dev/null) - if [[ "${BH_ENABLED:-0}" -gt 0 ]]; then - s14=1; n14="${BH_ENABLED} behaviors enabled in behaviors/index.yaml" - fi -elif ls .claude/hooks/generated/*__pretooluse__*.sh >/dev/null 2>&1; then - BH_COUNT=$(ls .claude/hooks/generated/*__pretooluse__*.sh 2>/dev/null | wc -l | tr -d ' ') - s14=1; n14="${BH_COUNT} compiled behavior hooks in .claude/hooks/generated/" -elif [[ -f "$SETTINGS" ]] && grep -qE '(behaviors|__pretooluse__)' "$SETTINGS" 2>/dev/null; then - s14=1; n14="behavior hook references present in settings.json" -fi - -# 15. OS-level sandboxing +# 9. 
OS-level sandboxing SANDBOX_STATE="off" if [[ -f "$SETTINGS" ]]; then SANDBOX_STATE=$(python3 -c " @@ -277,7 +207,6 @@ except Exception: print('off') " 2>/dev/null) fi - HANDLES_SECRETS=0 SECRET_REASON="" if ls .env .env.* 2>/dev/null | grep -vE '\.(example|sample|template)$' >/dev/null 2>&1; then @@ -287,37 +216,151 @@ elif find . -maxdepth 3 -type f \( -name '*.key' -o -name '*.pem' -o -name 'cred elif grep -rqE '(gcloud|aws configure|kubectl apply|firebase login|openai|anthropic|supabase)' --include='*.sh' --include='*.md' --include='*.env' --include='*.yaml' . 2>/dev/null; then HANDLES_SECRETS=1; SECRET_REASON="cloud/API refs in scripts or docs" fi - case "$SANDBOX_STATE" in - on_restricted) - s15=1; n15="sandbox.enabled with filesystem/network restrictions" ;; - on_permissive) - s15=0; n15="sandbox.enabled but no filesystem/network restrictions configured" ;; + on_restricted) s9=1; n9="sandbox.enabled with filesystem/network restrictions" ;; + on_permissive) s9=0; n9="sandbox.enabled but no filesystem/network restrictions configured" ;; off) - if [[ $HANDLES_SECRETS -eq 0 ]]; then - s15=1; n15="No secrets detected — sandboxing not required (auto-pass)" - else - s15=0; n15="Project handles secrets (${SECRET_REASON}) but sandbox.enabled is not true" - fi - ;; + if [[ $HANDLES_SECRETS -eq 0 ]]; then s9=1; n9="No secrets detected — sandboxing not required (auto-pass)" + else s9=0; n9="Project handles secrets (${SECRET_REASON}) but sandbox.enabled is not true" + fi ;; esac +# 10. Lint hook (lint-on-save, lint-python, lint-ts, lint-swift, etc.) +LINT_FOUND="" +for lf in .claude/hooks/lint-*.sh; do + [[ -f "$lf" ]] && LINT_FOUND="$lf" && break +done +if [[ -n "$LINT_FOUND" && -x "$LINT_FOUND" ]]; then s10=1; n10="$(basename "$LINT_FOUND") present and executable" +elif [[ -n "$LINT_FOUND" ]]; then s10=1; n10="$(basename "$LINT_FOUND") present but not executable" +else s10=0; n10="No lint hook found (lint-*.sh)" +fi + +# 11. 
Auto-memory well used (MEMORY.md index hygiene + error log) +ERRLOG=0 +if [[ -f "CLAUDE_ERRORS.md" ]]; then + if grep -qE '\| *Type *\||\| *Tipo *\|' "CLAUDE_ERRORS.md"; then ERRLOG=2; else ERRLOG=1; fi +fi +MEM_PRESENT=0; MEM_DUMP=0; MEM_LINES=0 +for mf in ".claude/MEMORY.md" "MEMORY.md"; do + if [[ -f "$mf" ]]; then + MEM_PRESENT=1 + MEM_LINES=$(wc -l < "$mf" | tr -d ' ') + MEM_BYTES=$(wc -c < "$mf" | tr -d ' ') + if [[ ${MEM_LINES:-0} -gt 200 || ${MEM_BYTES:-0} -gt 25600 ]]; then MEM_DUMP=1; fi + break + fi +done +AGMEM=$(find .claude/agent-memory -name "*.md" -not -name ".gitkeep" 2>/dev/null | wc -l | tr -d ' ') +if [[ $MEM_DUMP -eq 1 ]]; then + s11=0; n11="MEMORY.md is a dump (${MEM_LINES} lines / >25KB) — only first 200 lines/25KB injected" +elif [[ $ERRLOG -ge 1 || ${AGMEM:-0} -gt 0 || $MEM_PRESENT -eq 1 ]]; then + s11=1; n11="Memory present (error-log:${ERRLOG} agent-mem:${AGMEM} memory.md-index:${MEM_PRESENT})" +else + s11=0; n11="No project memory or error log found" +fi + +# 12. Permission cascade (machine-local overrides in settings.local.json) +if [[ -f ".claude/settings.local.json" ]]; then + s12=1; n12="settings.local.json used for local overrides" +elif [[ -f "$SETTINGS" ]] && grep -qE '/Users/|/home/[a-z]' "$SETTINGS" 2>/dev/null; then + s12=0; n12="Machine paths in versioned settings.json — move to settings.local.json" +else + s12=1; n12="No local overrides needed (auto-pass)" +fi + +# 13. Attribution configured (attribution.* not deprecated includeCoAuthoredBy) +if [[ -f "$SETTINGS" ]] && grep -q 'includeCoAuthoredBy' "$SETTINGS" 2>/dev/null; then + s13=0; n13="Uses deprecated includeCoAuthoredBy — migrate to attribution.commit/pr" +elif [[ -f "$SETTINGS" ]] && grep -q '"attribution"' "$SETTINGS" 2>/dev/null; then + s13=1; n13="attribution.* configured" +else + s13=1; n13="Default co-author acceptable (auto-pass)" +fi + +# 14. 
Custom commands +CMD_DIR=".claude/commands" +if [[ -d "$CMD_DIR" ]] && [[ -n "$(ls "$CMD_DIR"/*.md 2>/dev/null)" ]]; then + CC=$(ls "$CMD_DIR"/*.md 2>/dev/null | wc -l | tr -d ' ') + s14=1; n14="${CC} custom command(s)" +else + s14=0; n14=".claude/commands/ absent or empty" +fi + +# 15. Agents + orchestration +HA2=0; HR2=0 +[[ -d ".claude/agents" ]] && [[ -n "$(ls .claude/agents/*.md 2>/dev/null)" ]] && HA2=1 +[[ -f ".claude/rules/agents.md" ]] && HR2=1 +if [[ $HA2 -eq 1 && $HR2 -eq 1 ]]; then + AC=$(ls .claude/agents/*.md 2>/dev/null | wc -l | tr -d ' ') + s15=1; n15="${AC} agents + agents.md rule" +elif [[ $HA2 -eq 1 || $HR2 -eq 1 ]]; then s15=1; n15="Partial (agents:${HA2} rule:${HR2})" +else s15=0; n15="No agents or orchestration rule" +fi + # ───────────────────────────────────────────────────────────────────────────── -# Calculate score +# DIMENSION B — dotforge Adoption (each 0-1, informational) +# ───────────────────────────────────────────────────────────────────────────── + +# B1. v3 behaviors compiled AND wired in settings.json +if ls .claude/hooks/generated/*__pretooluse__*.sh >/dev/null 2>&1 \ + && [[ -f "$SETTINGS" ]] && grep -qE '(generated|__pretooluse__)' "$SETTINGS" 2>/dev/null; then + BH_COUNT=$(ls .claude/hooks/generated/*__pretooluse__*.sh 2>/dev/null | wc -l | tr -d ' ') + b1=1; m1="${BH_COUNT} compiled behavior hook(s) wired in settings.json" +elif [[ -f "behaviors/index.yaml" ]]; then + b1=0; m1="behaviors declared but not compiled+wired (declaration alone does not count)" +else + b1=0; m1="No v3 behaviors" +fi + +# B2. Workflow availability (v4) +if ls workflows/*.js >/dev/null 2>&1 && grep -lq "export const meta" workflows/*.js 2>/dev/null; then + WF=$(grep -l "export const meta" workflows/*.js 2>/dev/null | wc -l | tr -d ' ') + b2=1; m2="${WF} workflow(s) with export const meta" +else + b2=0; m2="No workflows/ with valid meta block" +fi + +# B3. 
Override capture loop active (v4) +if [[ -f ".forge/audit/overrides.log" ]] && [[ -f "$SETTINGS" ]] \ + && grep -q "session-start-process-overrides.sh" "$SETTINGS" 2>/dev/null; then + b3=1; m3="overrides.log present and hook wired in SessionStart" +else + b3=0; m3="Override loop not wired (log + SessionStart hook required)" +fi + +# B4. Domain rules present +DOM=$(ls .claude/rules/domain/*.md 2>/dev/null | wc -l | tr -d ' ') +if [[ "${DOM:-0}" -gt 0 ]]; then + b4=1; m4="${DOM} domain rule(s) (freshness checked semantically by /forge audit)" +else + b4=0; m4="No domain rules in .claude/rules/domain/" +fi + +# B5. Sync recency — not mechanically determinable standalone (needs registry) +b5=0; m5="Sync recency indeterminate standalone — resolved by /forge audit via registry" + +# ───────────────────────────────────────────────────────────────────────────── +# Calculate scores # ───────────────────────────────────────────────────────────────────────────── SCORE_OBL=$((s1 + s2 + s3 + s4 + s5)) SCORE_REC=$((s6 + s7 + s8 + s9 + s10 + s11 + s12 + s13 + s14 + s15)) - -SCORE_TOTAL=$(awk "BEGIN { printf \"%.2f\", ${SCORE_OBL} * 0.7 + ${SCORE_REC} * (3.0 / 10) }") +NATIVE_HEALTH=$(awk "BEGIN { printf \"%.2f\", ${SCORE_OBL} * 0.7 + ${SCORE_REC} * (3.0 / 10) }") SECURITY_CAP=false if [[ $s2 -eq 0 || $s4 -eq 0 ]]; then SECURITY_CAP=true - SCORE_TOTAL=$(awk "BEGIN { v=${SCORE_TOTAL}; printf \"%.2f\", (v > 6.0 ? 6.0 : v) }") + NATIVE_HEALTH=$(awk "BEGIN { v=${NATIVE_HEALTH}; printf \"%.2f\", (v > 6.0 ? 
6.0 : v) }") +fi + +FORGE_ADOPTION=$((b1 + b2 + b3 + b4 + b5)) +if [[ $FORGE_ADOPTION -eq 0 ]]; then ADOPTION_LABEL="None" +elif [[ $FORGE_ADOPTION -le 2 ]]; then ADOPTION_LABEL="Partial" +elif [[ $FORGE_ADOPTION -le 4 ]]; then ADOPTION_LABEL="Most" +else ADOPTION_LABEL="Full" fi LEVEL=$(awk "BEGIN { - s = ${SCORE_TOTAL} + s = ${NATIVE_HEALTH} if (s >= 9) print \"Excelente\" else if (s >= 7) print \"Bueno\" else if (s >= 5) print \"Aceptable\" @@ -338,27 +381,37 @@ if $OUTPUT_JSON; then python3 - <&2 - echo "FAIL: score ${SCORE_TOTAL} is below threshold ${THRESHOLD}" >&2 + echo "FAIL: native_health ${NATIVE_HEALTH} is below threshold ${THRESHOLD}" >&2 exit 2 fi fi diff --git a/audit/scoring.md b/audit/scoring.md index 3aa2cc8..ca8c740 100644 --- a/audit/scoring.md +++ b/audit/scoring.md @@ -1,39 +1,61 @@ # Scoring de Auditoría -## Cálculo +El audit produce **dos números independientes**: `native_health` (0-10, el score principal) y `forge_adoption` (0-5, informativo). + +## Dimensión A — Salud Nativa (score principal) ``` -score_obligatorio = sum(items 1-5) # máximo 10 -score_recomendado = sum(items 6-15) # máximo 10 -score_total = score_obligatorio * 0.7 + score_recomendado * (3.0 / 10) # max = 7.0 + 3.0 = 10.0 -score_normalizado = min(score_total, 10) +native_health_obligatorio = sum(items 1-5) # máximo 10 +native_health_recomendado = sum(items 6-15) # máximo 10 +native_health = native_health_obligatorio * 0.7 + native_health_recomendado * 0.3 # max = 7.0 + 3.0 = 10.0 +native_health = min(native_health, 10) ``` **Efecto:** obligatorios perfectos sin recomendados = 7.0 (Bueno). Cada recomendado aporta 0.3 — para llegar a 9+ se necesitan al menos 7 recomendados. 
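Reviewer note: the scoring rule described above can be checked with a small sketch. This is an illustration of the formula as stated in this patch (obligatory × 0.7 + recommended × 0.3, with the 6.0 security cap, plus the adoption labels), not the code in `audit/score.sh` or `scripts/audit_all.py`. The function and parameter names are hypothetical.

```python
# Illustrative sketch only. Names are hypothetical; the arithmetic follows the
# formula documented in this patch, not the actual score.sh implementation.

def native_health(obligatory: list[int], recommended: list[int]) -> float:
    """Compute Dimension A.

    obligatory:  items 1-5, each scored 0-2 (max sum 10)
    recommended: items 6-15, each scored 0-1 (max sum 10)
    """
    if len(obligatory) != 5 or len(recommended) != 10:
        raise ValueError("expected 5 obligatory and 10 recommended items")
    score = sum(obligatory) * 0.7 + sum(recommended) * 0.3
    # Security cap: item 2 (settings.json) or item 4 (block-destructive hook)
    # scoring 0 limits the result to 6.0.
    if obligatory[1] == 0 or obligatory[3] == 0:
        score = min(score, 6.0)
    return round(min(score, 10.0), 2)


def adoption_label(forge_adoption: int) -> str:
    """Map Dimension B (0-5) to its informational label."""
    if forge_adoption == 0:
        return "None"
    if forge_adoption <= 2:
        return "Partial"
    if forge_adoption <= 4:
        return "Most"
    return "Full"


if __name__ == "__main__":
    # Perfect obligatory items and 7 of 10 recommended items.
    print(native_health([2] * 5, [1] * 7 + [0] * 3))
    # Same project with item 2 at 0: the security cap applies.
    print(native_health([2, 0, 2, 2, 2], [1] * 10))
    # Dimension B is reported separately and never changes native_health.
    print(adoption_label(0))
```

Under these assumptions, six recommended items on top of perfect obligatory items stay below 9, which is why the text says at least seven are needed to reach "Excelente".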
-## Cap por seguridad crítica +### Cap por seguridad crítica -Si alguno de estos items es **0**, el score total tiene un cap máximo de **6.0**: +Si alguno de estos items es **0**, `native_health` tiene un cap máximo de **6.0**: - Item 2 (settings.json) — sin permisos configurados - Item 4 (hook block-destructive) — sin protección contra comandos destructivos **Razón:** Un proyecto sin seguridad básica no puede ser "Excelente" independientemente de cuántos recomendados tenga. -## Interpretación +### Interpretación -| Score | Nivel | Significado | +| native_health | Nivel | Significado | |-------|-------|-------------| -| 9-10 | Excelente | Configuración completa y madura. Solo ajustes menores. | +| 9-10 | Excelente | Configuración nativa completa y madura. Solo ajustes menores. | | 7-8.9 | Bueno | Sólido pero faltan algunos recomendados. | | 5-6.9 | Aceptable | Funcional pero con gaps importantes. Necesita sync. | | 3-4.9 | Deficiente | Faltan obligatorios. Necesita bootstrap parcial. | | 0-2.9 | Crítico | Casi sin configuración. Necesita bootstrap completo. | -## Prioridad de corrección +## Dimensión B — Adopción dotforge (informativo) + +``` +forge_adoption = sum(items B1-B5) # 0-5 +``` + +**No entra en `native_health` ni lo modifica.** Es un indicador de cuánta gobernanza dotforge adoptó el proyecto. + +| forge_adoption | Label | Lectura | +|----|-------|---------| +| 0 | None | Native-first puro. Válido y sin penalización. | +| 1-2 | Partial | Adopción parcial de la maquinaria. | +| 3-4 | Most | Adopción amplia. | +| 5 | Full | Gobernanza dotforge completa. | + +Un `forge_adoption: 0` con `native_health: 10` es un resultado **excelente y deseable** bajo el principio native-first (ver `.claude/rules/domain/native-vs-dotforge-boundary.md`). No recomendar adoptar maquinaria dotforge solo para subir B. + +## Prioridad de corrección (dimensión A primero) 1. Hook block-destructive (seguridad) 2. settings.json con deny list (seguridad) 3. 
CLAUDE.md (contexto para Claude) 4. Rules con globs (calidad de output) -5. Lint hook (calidad de código) -6. El resto +5. Auto-memory bien usado (MEMORY.md como índice) +6. Lint hook (calidad de código) +7. El resto de la dimensión A + +La dimensión B solo se aborda cuando el proyecto decide explícitamente adoptar gobernanza dotforge — nunca para "subir el número". diff --git a/docs/guia-uso.md b/docs/guia-uso.md index 470f704..02b984b 100644 --- a/docs/guia-uso.md +++ b/docs/guia-uso.md @@ -206,7 +206,7 @@ Principio fundamental: **merge, no overwrite**. Nunca sobrescribe sin confirmaci #### `/forge audit` — verificar estado -Score 0-10 normalizado contra un checklist de 15 items. +Dos dimensiones: **Salud Nativa** (score 0-10, checklist de 15 items) + **Adopción dotforge** (0-5, informativo, no afecta el score). ### Dashboard multi-proyecto @@ -340,7 +340,12 @@ Cada stack aporta: ## 7. Sistema de auditoría -### Checklist (15 items) +### Dos dimensiones + +- **A — Salud Nativa** (0-10): buen uso de Claude Code nativo + seguridad. El score principal. +- **B — Adopción dotforge** (0-5): cuánta gobernanza dotforge adoptó el proyecto. **Informativo — no afecta la Salud Nativa.** Un proyecto native-first con B=0 y A=10 es un resultado deseable. 
+ +### Dimensión A — Salud Nativa (15 items) #### Obligatorios (0-2 puntos cada uno, peso 70%) @@ -352,29 +357,41 @@ Cada stack aporta: | 4 | **Hook block-destructive** | No existe | Existe pero mal configurado | Existe + ejecutable + wired en settings.json | | 5 | **Comandos build/test** | No documentados | En README pero no en CLAUDE.md | Documentados en CLAUDE.md con comandos exactos | -#### Recomendados (0-1 punto cada uno, peso 30%) +#### Recomendados (0-1 punto cada uno, peso 30%) — uso nativo de Claude Code + +| # | Item | Criterio | +|---|------|----------| +| 6 | .gitignore | Protege .env, *.key, *.pem, credentials | +| 7 | Prompt injection scan | Sin patrones sospechosos en rules/CLAUDE.md | +| 8 | Auto-mode safety | Allow rules usan comandos específicos, no patrones de intérprete | +| 9 | OS-level sandboxing | `sandbox.enabled` con restricciones de filesystem/network, o proyecto sin manejo de secretos (auto-pass) | +| 10 | Hook de lint | Configurado para el stack + ejecutable | +| 11 | **Auto-memory bien usado** | `MEMORY.md` es índice conciso (<200 líneas Y <25KB), no dump; `CLAUDE_ERRORS.md` con columna Type si rastrea errores | +| 12 | **Permission cascade** | Overrides locales en `settings.local.json`, no en el `settings.json` versionado (auto-pass si no hace falta) | +| 13 | **Attribution configurado** | `attribution.commit`/`attribution.pr` (no el deprecado `includeCoAuthoredBy`); auto-pass si el default alcanza | +| 14 | Comandos custom | Al menos 1 comando relevante | +| 15 | Agentes | Instalados + regla de orquestación activa | + +### Dimensión B — Adopción dotforge (5 items, informativo) | # | Item | Criterio | |---|------|----------| -| 6 | CLAUDE_ERRORS.md | Existe con formato de tabla y tipos válidos | -| 7 | Hook de lint | Configurado para el stack + ejecutable | -| 8 | Comandos custom | Al menos 1 comando relevante | -| 9 | Memory del proyecto | Existe con contexto útil | -| 10 | Agentes | Instalados + regla de orquestación activa | -| 11 
| .gitignore | Protege .env, *.key, *.pem, credentials | -| 12 | Prompt injection scan | Sin patrones sospechosos en rules/CLAUDE.md | -| 13 | Auto-mode safety | Allow rules usan comandos específicos, no patrones de intérprete | -| 14 | Behaviors coverage (v3) | Al menos 1 behavior habilitado en `behaviors/index.yaml` o hook compilado en `.claude/hooks/generated/` | -| 15 | OS-level sandboxing | `sandbox.enabled` con restricciones de filesystem/network, o proyecto sin manejo de secretos (auto-pass) | +| B1 | Behaviors v3 compilados | Hook compilado en `.claude/hooks/generated/` Y wired en settings.json | +| B2 | Workflow availability | `workflows/` con al menos un `.js` con `export const meta` | +| B3 | Override capture loop | `.forge/audit/overrides.log` + `session-start-process-overrides.sh` wired en SessionStart | +| B4 | Domain rules | Al menos un rule en `.claude/rules/domain/` (frescura evaluada semánticamente) | +| B5 | Sync recency | `dotforge_version` del proyecto == `VERSION` actual | ### Fórmula de scoring ``` -score = obligatorio × 0.7 + recomendado × (3.0 / 10) +native_health = obligatorio × 0.7 + recomendado × 0.3 # 0-10, score principal +forge_adoption = sum(B1..B5) # 0-5, informativo ``` - Obligatorios perfectos sin recomendados = **7.0** (Bueno) - Cada recomendado aporta 0.3 — para llegar a 9+ necesitás al menos 7 recomendados +- `forge_adoption` nunca afecta `native_health` ### Cap de seguridad diff --git a/docs/usage-guide.md b/docs/usage-guide.md index 471985f..197a658 100644 --- a/docs/usage-guide.md +++ b/docs/usage-guide.md @@ -391,7 +391,13 @@ Each stack provides: ## 7. Audit system -### Checklist (15 items) +### Two dimensions + +The audit produces two independent numbers: +- **A — Native Health** (0-10): good use of native Claude Code + security. The primary score. +- **B — dotforge Adoption** (0-5): how much dotforge governance the project adopted. 
**Informational — does not affect Native Health.** A native-first project scoring B=0 with A=10 is a desirable outcome. + +### Dimension A — Native Health (15 items) #### Required (0-2 points each, 70% weight) @@ -403,29 +409,41 @@ Each stack provides: | 4 | **Hook block-destructive** | Does not exist | Exists but misconfigured | Exists + executable + wired in settings.json | | 5 | **Build/test commands** | Not documented | In README but not in CLAUDE.md | Documented in CLAUDE.md with exact commands | -#### Recommended (0-1 point each, 30% weight) +#### Recommended (0-1 point each, 30% weight) — native Claude Code usage + +| # | Item | Criteria | +|---|------|----------| +| 6 | .gitignore | Protects .env, *.key, *.pem, credentials | +| 7 | Prompt injection scan | No suspicious patterns in rules/CLAUDE.md | +| 8 | Auto-mode safety | Allow rules use specific tool commands, not interpreter patterns | +| 9 | OS-level sandboxing | `sandbox.enabled` with filesystem/network restrictions, or project handles no secrets (auto-pass) | +| 10 | Lint hook | Configured for the stack + executable | +| 11 | **Auto-memory well used** | `MEMORY.md` is a concise index (<200 lines AND <25KB), not a dump; `CLAUDE_ERRORS.md` with Type column if errors tracked | +| 12 | **Permission cascade** | Machine-local overrides in `settings.local.json`, not in versioned `settings.json` (auto-pass if none needed) | +| 13 | **Attribution configured** | `attribution.commit`/`attribution.pr` set (not deprecated `includeCoAuthoredBy`); auto-pass if default acceptable | +| 14 | Custom commands | At least 1 relevant command | +| 15 | Agents | Installed + active orchestration rule | + +### Dimension B — dotforge Adoption (5 items, informational) | # | Item | Criteria | |---|------|----------| -| 6 | CLAUDE_ERRORS.md | Exists with table format and valid types | -| 7 | Lint hook | Configured for the stack + executable | -| 8 | Custom commands | At least 1 relevant command | -| 9 | Project memory | Exists 
with useful context | -| 10 | Agents | Installed + active orchestration rule | -| 11 | .gitignore | Protects .env, *.key, *.pem, credentials | -| 12 | Prompt injection scan | No suspicious patterns in rules/CLAUDE.md | -| 13 | Auto-mode safety | Allow rules use specific tool commands, not interpreter patterns | -| 14 | Behaviors coverage (v3) | At least 1 behavior enabled in `behaviors/index.yaml` or compiled hook under `.claude/hooks/generated/` | -| 15 | OS-level sandboxing | `sandbox.enabled` with filesystem/network restrictions, or project handles no secrets (auto-pass) | +| B1 | v3 behaviors compiled | Compiled hook under `.claude/hooks/generated/` AND wired in settings.json | +| B2 | Workflow availability | `workflows/` with at least one `.js` containing `export const meta` | +| B3 | Override capture loop | `.forge/audit/overrides.log` + `session-start-process-overrides.sh` wired in SessionStart | +| B4 | Domain rules | At least one rule in `.claude/rules/domain/` (freshness checked semantically) | +| B5 | Sync recency | Project `dotforge_version` == current `VERSION` | ### Scoring formula ``` -score = required x 0.7 + recommended x (3.0 / 10) +native_health = required x 0.7 + recommended x 0.3 # 0-10, the primary score +forge_adoption = sum(B1..B5) # 0-5, informational ``` - Perfect required items without recommended = **7.0** (Good) - Each recommended item contributes 0.3 — to reach 9+ you need at least 7 recommended items +- `forge_adoption` never affects `native_health` ### Security cap diff --git a/docs/v4/SCOPE-DECISION.md b/docs/v4/SCOPE-DECISION.md new file mode 100644 index 0000000..088922b --- /dev/null +++ b/docs/v4/SCOPE-DECISION.md @@ -0,0 +1,69 @@ +# ADR v4 — Frontera native-first: qué mantiene dotforge y qué cede + +**Fecha:** 2026-06-03 +**Estado:** Aceptada +**Decisión asociada:** complementa `docs/v3/DECISIONS.md`; criterio permanente en `.claude/rules/domain/native-vs-dotforge-boundary.md` + +## Contexto + +En los últimos 6 meses Claude 
Code absorbió nativamente buena parte de la superficie +que dotforge construyó como capa externa: hooks maduros, `hookify`, auto-memory, +`/workflows`, Agent Teams, `/code-review`, `/init`, sandboxing, permission cascade. +El changelog reciente de dotforge es mayormente *reactivo* (sincronizar con +v2.1.144–v2.1.161). Riesgo: gastar energía corriendo detrás de features nativas en vez +de crear valor no-replicable. + +## Principio adoptado + +**Lo nativo gana por defecto. dotforge solo existe donde lo nativo no llega.** +El scope se reduce a medida que Claude Code crece, y eso es correcto. + +## Método (obligatorio) + +Toda decisión de frontera exige verificar el estado nativo ACTUAL contra docs oficiales +ANTES de clasificar. Aplicar el principio sobre supuestos viejos produce decisiones +erróneas: en la primera pasada se recomendó "recortar v3 fuerte" apoyándose en un +`COMPETITIVE.md` con semanas de antigüedad; la verificación fresca mostró que el delta +de v3 NO está cubierto nativamente. Por eso `/forge watch` deja de ser meta-trabajo: +es el insumo que hace correctas las decisiones de scope. + +## Evidencia verificada (2026-06-03) + +- **hookify** ([github.com/anthropics/claude-code/plugins/hookify](https://github.com/anthropics/claude-code/tree/main/plugins/hookify)): + plugin oficial, wrapper de conveniencia sobre hooks, acciones binarias `warn`/`block`. + Sin escalación de 5 niveles, sin state compartido, sin override auditable. Anthropic + no shipeó capa de behavior governance. → delta de v3 **no cubierto**. +- **auto-memory** ([code.claude.com/docs/en/memory](https://code.claude.com/docs/en/memory)): + estrictamente per-proyecto. NO propaga cross-project (issues #36561, #39195 abiertas + sin timeline). Para *instrucciones* compartidas sí hay nativo (global CLAUDE.md + + symlinks en `.claude/rules/`), pero no para *learnings* ni para propagación con merge. + +## Decisión + +**MANTENER** (sin equivalente nativo): +1. Domain rules — activo principal. +2. 
Propagación cross-project con merge/customización preservada (`forge:section` + sync). +3. Behaviors v3 (escalación + state + override audit) — sujeto a validar uso real del + `overrides.log` en proyectos de producción. +4. Registry + audit cross-project — **reorientado** a auditar buen uso de lo nativo. + +**CEDER** (nativo lo resuelve): +- Captura individual de learnings → auto-memory. +- Reglas idénticas compartidas (sin merge) → symlinks + global CLAUDE.md. +- Workflows / orquestación → `/workflows`, Agent Teams (ya cedido en v4). +- CLAUDE.md base → `/init`. +- One-shot code review → `/code-review`. +- Model routing como sistema → `/effort` (queda como documentación). + +## Consecuencias + +- Es un downsizing estratégico, no una expansión. El compilador de behaviors queda + candidato a retiro si el delta del override log no se sostiene por demanda real. +- No se retira código sin verificación empírica previa de que el nativo cubre el caso. +- El audit reorientado ("¿usás bien lo nativo?") es la pieza de mayor ROI nuevo y no + envejece con cada release de Claude Code. + +## Validación pendiente + +- ¿Los proyectos de producción (trading, banca NBCH) consultan `overrides.log`? + Decide el futuro del compilador de behaviors v3. diff --git a/registry/projects.yml b/registry/projects.yml index a130f76..051ef3b 100644 --- a/registry/projects.yml +++ b/registry/projects.yml @@ -10,9 +10,13 @@ # committed example shows the schema; `projects.local.yml` holds real state. 
# # Schema: name, path, stacks, last_audit, last_sync, dotforge_version, score, -# history, metrics_summary, ultracode_tier, notes -# Scoring v2.3.0: score = obligatorio*0.7 + recomendado*(3.0/8) (max 10.0) -# history: array of {date, score, version} — appended, never overwritten +# forge_adoption, history, metrics_summary, ultracode_tier, notes +# Scoring v4.x (two dimensions, see audit/scoring.md): +# score = native_health = obligatorio*0.7 + recomendado*0.3 (max 10.0) +# items 1-5 obligatorios (0-2), items 6-15 recomendados (0-1) +# security cap 6.0 if item 2 or 4 == 0 +# forge_adoption = sum(items B1-B5), 0-5, INFORMATIONAL — does not affect score +# history: array of {date, score, adoption, version} — appended, never overwritten # metrics_summary: aggregated from ~/.claude/metrics/{slug}/ via /forge insights # # ultracode_tier (v3.12.0+): drives default posture for Ultracode mode + @@ -39,10 +43,11 @@ projects: last_audit: 2026-04-14 dotforge_version: 3.0.4 score: 9.7 + forge_adoption: 5 history: - {date: 2026-04-08, score: 10.0, version: 2.9.1} - {date: 2026-04-14, score: 9.7, version: 3.0.4} - notes: "Reference config. v3 behaviors compiled to generated/. Sandbox pending (item 15)." + notes: "Reference config. native_health 9.7, adoption Full (5/5). Sandbox pending (item 9)." - name: cds-dashboard path: /Users/luiseiman/Documents/jira nbch/cds-dashboard diff --git a/scripts/audit_all.py b/scripts/audit_all.py index b4ccb07..1bc1d31 100755 --- a/scripts/audit_all.py +++ b/scripts/audit_all.py @@ -2,7 +2,9 @@ """Audit all projects listed in registry/projects.local.yml against audit/checklist.md. Deterministic, script-based alternative to running the /audit-project skill 12 times. -Walks each project path, scores the 15 checklist items, and updates the registry. +Two-dimension model (v4.x): + - Native Health (score, 0-10): 5 obligatory + 10 recommended native-usage items. + - dotforge Adoption (forge_adoption, 0-5): informational, does not affect score. 
Usage: python3 scripts/audit_all.py [--dry-run] """ @@ -25,9 +27,6 @@ VERSION_FILE = DOTFORGE / "VERSION" TODAY = date.today().isoformat() -# Prompt-injection patterns — tuned to avoid false positives on CLI placeholders. -# Standalone in docs (e.g. `/compact `) is NOT flagged. -# We require either a matching close-tag OR a hijack phrase. INJECTION_PHRASES = [ r"ignore previous", r"IGNORE ALL", @@ -76,11 +75,12 @@ def scan_injection(texts: list[str]) -> tuple[bool, str]: return False, "" -def audit(proj_path: Path, name: str) -> dict: - r = {"name": name, "path": str(proj_path), "items": {}, "notes": []} +def audit(proj_path: Path, name: str, version: str, prev_version) -> dict: + r = {"name": name, "path": str(proj_path), "items": {}, "adoption": {}, "notes": []} claude_md = proj_path / "CLAUDE.md" claude_dir = proj_path / ".claude" settings_json = claude_dir / "settings.json" + settings_local = claude_dir / "settings.local.json" hooks_dir = claude_dir / "hooks" rules_dir = claude_dir / "rules" commands_dir = claude_dir / "commands" @@ -89,6 +89,8 @@ def audit(proj_path: Path, name: str) -> dict: manifest = claude_dir / ".forge-manifest.json" gitignore = proj_path / ".gitignore" + # ── DIMENSION A — obligatory (0-2) ── + # Item 1: CLAUDE.md if not claude_md.exists(): r["items"]["1_claude_md"] = 0 @@ -146,8 +148,7 @@ def audit(proj_path: Path, name: str) -> dict: has_push = "--force" in content wired = False if s: - hooks = s.get("hooks", {}) - wired = "block-destructive" in json.dumps(hooks) + wired = "block-destructive" in json.dumps(s.get("hooks", {})) if executable and wired and has_rm and has_drop and has_push: r["items"]["4_block_destructive"] = 2 else: @@ -171,56 +172,19 @@ def audit(proj_path: Path, name: str) -> dict: else: r["items"]["5_build_test"] = 0 - # Item 6: CLAUDE_ERRORS.md (accept English or Spanish headers) - if errors_md.exists(): - t = read_text(errors_md) - has_type = bool(re.search(r"\b(Type|Tipo)\b", t)) and bool(re.search( - 
r"\b(syntax|logic|integration|config|security)\b", t, re.I)) - r["items"]["6_errors_md"] = 1 if has_type else 0 - else: - r["items"]["6_errors_md"] = 0 - - # Item 7: lint hook - lint_hooks = list(hooks_dir.glob("*lint*.sh")) if hooks_dir.exists() else [] - r["items"]["7_lint_hook"] = 1 if any(test_x(h) for h in lint_hooks) else 0 - - # Item 8: custom commands - cmd_files = list(commands_dir.glob("*.md")) if commands_dir.exists() else [] - r["items"]["8_commands"] = 1 if cmd_files else 0 - - # Item 9: project memory - mem_candidates = [ - claude_dir / "MEMORY.md", - claude_dir / "agent-memory", - proj_path / "MEMORY.md", - ] - has_mem = False - for m in mem_candidates: - if m.is_dir() and any(m.iterdir()): - has_mem = True - break - if m.is_file() and m.stat().st_size > 100: - has_mem = True - break - r["items"]["9_memory"] = 1 if has_mem else 0 - - # Item 10: agents - agent_files = list(agents_dir.glob("*.md")) if agents_dir.exists() else [] - agents_rule = (rules_dir / "agents.md") if rules_dir.exists() else None - has_agents = bool(agent_files) and agents_rule and agents_rule.exists() - r["items"]["10_agents"] = 1 if has_agents else 0 + # ── DIMENSION A — recommended (0-1): native Claude Code usage ── - # Item 11: .gitignore + # Item 6: .gitignore if gitignore.exists(): g = read_text(gitignore) has_env = bool(re.search(r"^\.env", g, re.M)) has_keys = bool(re.search(r"\*\.key|\*\.pem", g)) has_creds = bool(re.search(r"credentials", g, re.I)) - r["items"]["11_gitignore"] = 1 if (has_env and (has_keys or has_creds)) else 0 + r["items"]["6_gitignore"] = 1 if (has_env and (has_keys or has_creds)) else 0 else: - r["items"]["11_gitignore"] = 0 + r["items"]["6_gitignore"] = 0 - # Item 12: prompt-injection scan + # Item 7: prompt-injection scan scan_paths = [] if rules_dir.exists(): scan_paths.extend(rules_dir.glob("**/*.md")) @@ -230,9 +194,9 @@ def audit(proj_path: Path, name: str) -> dict: found, reason = scan_injection(texts) if found: r["notes"].append(f"injection: 
{reason}") - r["items"]["12_injection"] = 0 if found else 1 + r["items"]["7_injection"] = 0 if found else 1 - # Item 13: auto-mode safety + # Item 8: auto-mode safety if s: mode = s.get("permissions", {}).get("defaultMode", "") if mode == "auto": @@ -240,20 +204,13 @@ def audit(proj_path: Path, name: str) -> dict: denies_secrets = sum( 1 for d in deny if re.search(r"\.env|\*\.key|\*\.pem|credentials", str(d), re.I) ) >= 3 - r["items"]["13_auto_safe"] = 1 if denies_secrets else 0 + r["items"]["8_auto_safe"] = 1 if denies_secrets else 0 else: - r["items"]["13_auto_safe"] = 1 # auto mode not enabled — auto-pass + r["items"]["8_auto_safe"] = 1 # auto mode not enabled — auto-pass else: - r["items"]["13_auto_safe"] = 0 - - # Item 14: v3 behaviors - gen_dir = hooks_dir / "generated" - gen_hooks = list(gen_dir.glob("*__pretooluse__*.sh")) if gen_dir.exists() else [] - beh_idx = proj_path / "behaviors/index.yaml" - has_beh = bool(gen_hooks) or beh_idx.exists() - r["items"]["14_behaviors"] = 1 if has_beh else 0 + r["items"]["8_auto_safe"] = 1 # no settings — auto mode not enabled - # Item 15: sandbox / env-scrub auto-pass + # Item 9: sandbox / env-scrub auto-pass sandbox_on = False env_scrub = False if s: @@ -265,9 +222,9 @@ def audit(proj_path: Path, name: str) -> dict: fs = s.get("sandbox", {}).get("filesystem", {}) net = s.get("sandbox", {}).get("network", {}) has_restriction = bool(fs.get("denyRead") or fs.get("allowWrite") or net.get("allowedDomains")) - r["items"]["15_sandbox"] = 1 if has_restriction else 0 + r["items"]["9_sandbox"] = 1 if has_restriction else 0 elif env_scrub: - r["items"]["15_sandbox"] = 1 # env-scrub is acceptable defense-in-depth + r["items"]["9_sandbox"] = 1 # env-scrub is acceptable defense-in-depth else: has_secrets = False for pat in (".env", ".env.local", "credentials.json", "key.pem"): @@ -280,24 +237,118 @@ def audit(proj_path: Path, name: str) -> dict: if re.search(r"\b(gcloud|aws|kubectl|terraform)\b", t): shell_refs = True break - 
r["items"]["15_sandbox"] = 0 if (has_secrets or shell_refs) else 1 - - # Score calculation - mand = sum(r["items"][f"{i}_{k}"] for i, k in [ - (1, "claude_md"), (2, "settings"), (3, "rules"), - (4, "block_destructive"), (5, "build_test") - ]) - rec = sum(r["items"][f"{i}_{k}"] for i, k in [ - (6, "errors_md"), (7, "lint_hook"), (8, "commands"), (9, "memory"), - (10, "agents"), (11, "gitignore"), (12, "injection"), - (13, "auto_safe"), (14, "behaviors"), (15, "sandbox") - ]) + r["items"]["9_sandbox"] = 0 if (has_secrets or shell_refs) else 1 + + # Item 10: lint hook + lint_hooks = list(hooks_dir.glob("*lint*.sh")) if hooks_dir.exists() else [] + r["items"]["10_lint_hook"] = 1 if any(test_x(h) for h in lint_hooks) else 0 + + # Item 11: auto-memory well used (index hygiene + error log) + errlog = 0 + if errors_md.exists(): + t = read_text(errors_md) + if re.search(r"\b(Type|Tipo)\b", t) and re.search( + r"\b(syntax|logic|integration|config|security)\b", t, re.I): + errlog = 2 + else: + errlog = 1 + mem_present = False + mem_dump = False + for mf in (claude_dir / "MEMORY.md", proj_path / "MEMORY.md"): + if mf.is_file(): + mem_present = True + txt = read_text(mf) + lines = txt.count("\n") + 1 + if lines > 200 or mf.stat().st_size > 25600: + mem_dump = True + break + agmem_dir = claude_dir / "agent-memory" + agmem = bool(agmem_dir.is_dir() and any( + f.name != ".gitkeep" for f in agmem_dir.glob("**/*.md"))) + if mem_dump: + r["items"]["11_auto_memory"] = 0 + r["notes"].append("MEMORY.md is a dump (>200 lines/25KB)") + elif errlog >= 1 or agmem or mem_present: + r["items"]["11_auto_memory"] = 1 + else: + r["items"]["11_auto_memory"] = 0 + + # Item 12: permission cascade (machine-local overrides in settings.local.json) + if settings_local.exists(): + r["items"]["12_permission_cascade"] = 1 + elif s and re.search(r"/Users/|/home/[a-z]", json.dumps(s)): + r["items"]["12_permission_cascade"] = 0 + r["notes"].append("machine paths in versioned settings.json") + else: + 
r["items"]["12_permission_cascade"] = 1 # no local overrides needed + + # Item 13: attribution configured (not deprecated includeCoAuthoredBy) + if s and "includeCoAuthoredBy" in json.dumps(s): + r["items"]["13_attribution"] = 0 + r["notes"].append("deprecated includeCoAuthoredBy") + else: + r["items"]["13_attribution"] = 1 # attribution.* set, or default acceptable + + # Item 14: custom commands + cmd_files = list(commands_dir.glob("*.md")) if commands_dir.exists() else [] + r["items"]["14_commands"] = 1 if cmd_files else 0 + + # Item 15: agents + agent_files = list(agents_dir.glob("*.md")) if agents_dir.exists() else [] + agents_rule = (rules_dir / "agents.md") if rules_dir.exists() else None + has_agents = bool(agent_files) and agents_rule and agents_rule.exists() + r["items"]["15_agents"] = 1 if has_agents else 0 + + # ── DIMENSION B — dotforge Adoption (0-1 each, informational) ── + + # B1: behaviors compiled AND wired + gen_dir = hooks_dir / "generated" + gen_hooks = list(gen_dir.glob("*__pretooluse__*.sh")) if gen_dir.exists() else [] + wired_beh = bool(s and re.search(r"generated|__pretooluse__", json.dumps(s.get("hooks", {})))) + r["adoption"]["B1_behaviors"] = 1 if (gen_hooks and wired_beh) else 0 + + # B2: workflow availability + wf_dir = proj_path / "workflows" + wf_files = list(wf_dir.glob("*.js")) if wf_dir.exists() else [] + r["adoption"]["B2_workflows"] = 1 if any( + "export const meta" in read_text(f, 5000) for f in wf_files) else 0 + + # B3: override capture loop + override_log = proj_path / ".forge/audit/overrides.log" + wired_override = bool(s and "session-start-process-overrides.sh" in json.dumps(s)) + r["adoption"]["B3_override_loop"] = 1 if (override_log.exists() and wired_override) else 0 + + # B4: domain rules + domain_dir = rules_dir / "domain" + domain_rules = list(domain_dir.glob("*.md")) if domain_dir.exists() else [] + r["adoption"]["B4_domain_rules"] = 1 if domain_rules else 0 + + # B5: sync recency — project version matches current 
dotforge VERSION + r["adoption"]["B5_sync_recency"] = 1 if (prev_version and str(prev_version) == version) else 0 + + # ── Score calculation ── + mand = sum(r["items"][k] for k in ( + "1_claude_md", "2_settings", "3_rules", "4_block_destructive", "5_build_test")) + rec = sum(r["items"][k] for k in ( + "6_gitignore", "7_injection", "8_auto_safe", "9_sandbox", "10_lint_hook", + "11_auto_memory", "12_permission_cascade", "13_attribution", "14_commands", "15_agents")) total = mand * 0.7 + rec * 0.3 if r["items"]["2_settings"] == 0 or r["items"]["4_block_destructive"] == 0: total = min(total, 6.0) + adoption = sum(r["adoption"].values()) + if adoption == 0: + label = "None" + elif adoption <= 2: + label = "Partial" + elif adoption <= 4: + label = "Most" + else: + label = "Full" r["mand"] = mand r["rec"] = rec - r["score"] = round(total, 2) + r["score"] = round(total, 2) # native_health (registry-compatible key) + r["forge_adoption"] = adoption + r["adoption_label"] = label r["manifest_present"] = manifest.exists() return r @@ -319,28 +370,28 @@ def main() -> int: if not p.exists(): print(f"SKIP: {proj['name']} — path not found: {p}") continue - r = audit(p, proj["name"]) + r = audit(p, proj["name"], version, proj.get("dotforge_version")) r["prev_score"] = proj.get("score") r["prev_version"] = proj.get("dotforge_version") results.append(r) - print(f"\n{'Project':<20} {'Mand':>6} {'Rec':>6} {'Prev':>5} {'New':>5} {'Δ':>6} {'Baseline':>10} {'Notes'}") - print("─" * 100) + print(f"\n{'Project':<20} {'Mand':>6} {'Rec':>6} {'Prev':>5} {'Health':>6} {'Δ':>6} {'Adopt':>7} {'Notes'}") + print("─" * 104) for r in results: prev = r.get("prev_score") or 0 delta = r["score"] - prev delta_s = f"{delta:+.2f}" if prev else " new" - baseline = "manifest" if r["manifest_present"] else "none" + adopt = f"{r['forge_adoption']}/5 {r['adoption_label'][:4]}" notes = ", ".join(r["notes"][:2]) if r["notes"] else "" print( f"{r['name']:<20} {r['mand']:>3}/10 {r['rec']:>3}/10 " - f"{prev:>5.1f} 
{r['score']:>5.2f} {delta_s:>6} {baseline:>10} {notes[:40]}" + f"{prev:>5.1f} {r['score']:>6.2f} {delta_s:>6} {adopt:>7} {notes[:34]}" ) avg = sum(r["score"] for r in results) / len(results) perfect = sum(1 for r in results if r["score"] >= 9.0) need_attn = sum(1 for r in results if r["score"] < 9.0) - print(f"\n{len(results)} projects | avg {avg:.2f} | {perfect} perfect (≥9) | {need_attn} need attention") + print(f"\n{len(results)} projects | avg health {avg:.2f} | {perfect} perfect (≥9) | {need_attn} need attention") print() for r in results: @@ -358,10 +409,12 @@ def main() -> int: for r in results: if proj["name"] == r["name"]: hist = proj.setdefault("history", []) - hist.append({"date": TODAY, "score": r["score"], "version": version}) + hist.append({"date": TODAY, "score": r["score"], + "adoption": r["forge_adoption"], "version": version}) proj["history"] = hist[-8:] proj["last_audit"] = TODAY proj["score"] = r["score"] + proj["forge_adoption"] = r["forge_adoption"] proj["dotforge_version"] = version break diff --git a/skills/audit-project/SKILL.md b/skills/audit-project/SKILL.md index cd93079..0f27e41 100644 --- a/skills/audit-project/SKILL.md +++ b/skills/audit-project/SKILL.md @@ -64,40 +64,47 @@ For each checklist item, verify existence **and quality**: - Is it referenced in `.claude/settings.json` under hooks? 5. **Build/test commands** — Are they in CLAUDE.md? Do they match the detected stack? -### Recommended (0-10 bonus points) -6. **CLAUDE_ERRORS.md** — Does it exist with table format with Type column? -7. **Hook lint** — Does it exist? Is it executable? (verify `chmod +x`) -8. **Custom commands** — Are there files in `.claude/commands/`? -9. **Memory** — Are there project memory files? -10. **Agents** — Is there `.claude/agents/` + `agents.md` rule in rules? -11. **.gitignore** — Does it protect .env, *.key, *.pem, credentials? -12. **Prompt injection scan** — Are rules/CLAUDE.md free of suspicious patterns? -13. 
**Auto mode safety** — If `permissions.defaultMode: "auto"` in settings.json, is the deny list complete? (auto-pass if not auto) -14. **v3 behaviors compiled** — Are there `.claude/hooks/generated/*.sh` AND referenced in settings.json? (Project that hasn't opted into v3 governance scores 0; non-applicable cap does NOT apply) -15. **OS-level sandboxing** — `sandbox.enabled: true` with at least one restriction OR project demonstrably handles no secrets (auto-pass) -16. **Workflow availability (v4)** — `workflows/` directory exists with at least one `.js` file containing `export const meta` block. Auto-pass if project has not opted into v4 (no `workflows/` reference in `audit/scoring.md` for v3.x projects). -17. **Override capture loop active (v4)** — `.forge/audit/overrides.log` exists AND `session-start-process-overrides.sh` wired in `.claude/settings.json` SessionStart. Auto-pass if project hasn't installed v3 behaviors (no `behaviors/` dir). - -**Tier adjustments:** -- `simple`: items 8-10 score 0 don't penalize (treated as N/A) -- `complex`: items 8-10 become semi-obligatory (each 0-2 instead of 0-1) - -**v4 transition note:** Items 16-17 are documented in `audit/checklist.md` but enforcement varies by dotforge version: -- v3.x: items 16-17 auto-pass (informational only) -- v4.0+: items 16-17 contribute to score per normal rules - -To detect target enforcement: check `$DOTFORGE_DIR/VERSION` — if major < 4, treat items 16-17 as informational. - -## Step 4: Calculate score +### Dimension A — Native Health, Recommended (0-10 bonus points) +6. **.gitignore** — Does it protect .env, *.key, *.pem, credentials? +7. **Prompt injection scan** — Are rules/CLAUDE.md free of suspicious patterns? +8. **Auto mode safety** — If `permissions.defaultMode: "auto"`, is the deny list complete? (auto-pass if not auto) +9. **OS-level sandboxing** — `sandbox.enabled: true` with at least one restriction OR project demonstrably handles no secrets (auto-pass) +10. 
**Hook lint** — Does it exist? Is it executable? (verify `chmod +x`) +11. **Auto-memory well used (NEW)** — Is `MEMORY.md` a concise index (<200 lines AND <25KB), not a content dump? If errors are tracked, `CLAUDE_ERRORS.md` exists with table format (Type column). Penalize dumping content into the index — only first 200 lines / 25KB are injected per session. +12. **Permission cascade (NEW)** — Are machine-local overrides kept in `settings.local.json` rather than polluting versioned `settings.json`? Auto-pass if no local overrides needed. +13. **Attribution configured (NEW)** — `attribution.commit`/`attribution.pr` set (not the deprecated `includeCoAuthoredBy`)? Auto-pass if the default co-author is acceptable. For self-hosted forges, check `prUrlTemplate`. +14. **Custom commands** — Are there files in `.claude/commands/`? +15. **Agents** — Is there `.claude/agents/` + `agents.md` rule in rules? + +**Tier adjustments (dimension A):** +- `simple`: items 14-15 score 0 don't penalize (treated as N/A) +- `complex`: items 14-15 become semi-obligatory (each 0-2 instead of 0-1) + +### Dimension B — dotforge Adoption (informational, 0-5, does NOT affect native_health) +- **B1. v3 behaviors compiled** — `.claude/hooks/generated/*.sh` exist AND referenced in settings.json? +- **B2. Workflow availability (v4)** — `workflows/` with at least one `.js` containing `export const meta`? +- **B3. Override capture loop (v4)** — `.forge/audit/overrides.log` exists AND `session-start-process-overrides.sh` wired in SessionStart? +- **B4. Domain rules** — at least one rule in `.claude/rules/domain/` with `last_verified` <90 days? Report stale count. +- **B5. Sync recency** — project `dotforge_version` == `$DOTFORGE_DIR/VERSION`? + +A project scoring B=0 (native-first) is a valid, non-penalized outcome. Never recommend adopting dotforge machinery just to raise B. + +## Step 4: Calculate scores (two dimensions) Use weights from `$DOTFORGE_DIR/audit/scoring.md`: -1. 
`score_obligatory = sum(items 1-5)` — maximum 10 -2. `score_recommended = sum(items 6-15)` — maximum 10 (v3) or `sum(items 6-17)` — maximum 12 (v4) -3. `score_total = score_obligatory * 0.7 + score_recommended * (3.0 / max_recommended)` — max 7.0 + 3.0 = 10.0 + +**Dimension A — Native Health (the primary score):** +1. `native_health_obligatory = sum(items 1-5)` — maximum 10 +2. `native_health_recommended = sum(items 6-15)` — maximum 10 +3. `native_health = native_health_obligatory * 0.7 + native_health_recommended * 0.3` — max 10.0 4. Apply tier adjustments before calculating (see Step 1b) -5. `score_normalized = min(score_total, 10)` +5. `native_health = min(native_health, 10)` + +**Security cap:** If item 2 (settings.json) or item 4 (block-destructive) is 0, `native_health` max = 6.0. -**Security cap:** If item 2 (settings.json) or item 4 (block-destructive) is 0, maximum score = 6.0. +**Dimension B — dotforge Adoption (informational):** +6. `forge_adoption = sum(items B1-B5)` — 0 to 5. Does NOT enter native_health. +7. Label: 0=None, 1-2=Partial, 3-4=Most, 5=Full. 
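Reviewer sketch (not part of the patch): the Step 4 arithmetic above can be written out as a small Python function. The function names `native_health` and `adoption_label` and the dict-based inputs are assumptions made for illustration; tier adjustments from Step 1b are deliberately left out, so this covers only the `standard` tier.

```python
def native_health(items: dict) -> float:
    """Dimension A score from checklist items 1-15 (hypothetical helper).

    `items` maps the item number to the points awarded:
    items 1-5 are obligatory (0-2 each), items 6-15 recommended (0-1 each).
    """
    obligatory = sum(items.get(i, 0) for i in range(1, 6))    # max 10
    recommended = sum(items.get(i, 0) for i in range(6, 16))  # max 10
    total = min(obligatory * 0.7 + recommended * 0.3, 10.0)
    # Security cap: missing settings.json (item 2) or block-destructive
    # hook (item 4) limits the score to 6.0.
    if items.get(2, 0) == 0 or items.get(4, 0) == 0:
        total = min(total, 6.0)
    return round(total, 2)


def adoption_label(b_items: dict) -> tuple:
    """Dimension B: sum of B1-B5 (0 or 1 each) plus its label."""
    adoption = sum(b_items.values())
    if adoption == 0:
        label = "None"
    elif adoption <= 2:
        label = "Partial"
    elif adoption <= 4:
        label = "Most"
    else:
        label = "Full"
    return adoption, label


if __name__ == "__main__":
    full = {i: 2 for i in range(1, 6)} | {i: 1 for i in range(6, 16)}
    native_first = {"B1": 0, "B2": 0, "B3": 0, "B4": 0, "B5": 0}
    print(native_health(full), adoption_label(native_first))
```

Because `adoption_label` never feeds into `native_health`, a native-first project (B=0) can still reach the maximum Dimension A score, which is the behavior the two-dimension model is meant to guarantee.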
## Step 5: Generate report @@ -108,7 +115,10 @@ Date: {{YYYY-MM-DD}} Detected stack: {{stacks}} Tier: {{simple|standard|complex}} dotforge version: {{version from last bootstrap/sync if detectable}} -Score: {{X.X}}/10 {{level}} +Native Health: {{X.X}}/10 {{level}} +dotforge Adoption: {{N}}/5 {{None|Partial|Most|Full}} (informational — does not affect Native Health) + +═ DIMENSION A — NATIVE HEALTH ═ ── OBLIGATORY ── {{✅|⚠️|❌}} CLAUDE.md ({{0-2}}) — {{detail: which sections exist/missing}} @@ -118,18 +128,23 @@ Score: {{X.X}}/10 {{level}} {{✅|⚠️|❌}} Build/test commands ({{0-2}}) — {{detail: which ones and whether they match the stack}} ── RECOMMENDED ── -{{✅|⚠️}} CLAUDE_ERRORS.md — {{detail}} -{{✅|⚠️}} Hook lint — {{detail: executable yes/no}} -{{✅|⚠️}} Custom commands — {{detail: N commands}} -{{✅|⚠️}} Memory — {{detail}} -{{✅|⚠️}} Agents — {{detail}} {{✅|⚠️}} .gitignore — {{detail}} {{✅|⚠️}} Prompt injection scan — {{detail}} {{✅|⚠️}} Auto mode safety — {{detail: auto mode active/inactive, deny list complete/incomplete}} -{{✅|⚠️}} v3 behaviors compiled — {{detail: N generated hooks, settings reference yes/no}} {{✅|⚠️}} OS sandboxing — {{detail: enabled/disabled, secret indicators yes/no}} -{{✅|⚠️|—}} v4 workflow availability — {{detail: N .js workflows OR "n/a v3 project"}} -{{✅|⚠️|—}} v4 override loop active — {{detail: hook wired yes/no, log exists yes/no, OR "n/a no v3 behaviors"}} +{{✅|⚠️}} Hook lint — {{detail: executable yes/no}} +{{✅|⚠️}} Auto-memory well used — {{detail: MEMORY.md lines/KB, index vs dump, CLAUDE_ERRORS yes/no}} +{{✅|⚠️}} Permission cascade — {{detail: settings.local.json used / no local overrides}} +{{✅|⚠️}} Attribution configured — {{detail: attribution.* set / deprecated includeCoAuthoredBy / default ok}} +{{✅|⚠️}} Custom commands — {{detail: N commands}} +{{✅|⚠️}} Agents — {{detail}} + +═ DIMENSION B — DOTFORGE ADOPTION ═ (informational) +{{✅|—}} B1 v3 behaviors compiled — {{detail: N generated hooks, settings reference yes/no}} 
+{{✅|—}} B2 v4 workflow availability — {{detail: N .js workflows OR "none"}} +{{✅|—}} B3 v4 override loop active — {{detail: hook wired yes/no, log exists yes/no}} +{{✅|—}} B4 domain rules — {{detail: N rules, M stale >90d}} +{{✅|—}} B5 sync recency — {{detail: project version vs current VERSION}} ── DOMAIN KNOWLEDGE ── Role defined: {{✓ if ## Role exists in CLAUDE.md with content | ✗ otherwise}} @@ -178,9 +193,12 @@ This closes the Audit → Learning synergy: detected gaps feed back into the pra ## Step 8: Update registry If `$DOTFORGE_DIR/registry/projects.yml` exists, update the project entry: -- `score:` with the calculated score +- `score:` with `native_health` (the primary score — preserves trend continuity with prior audits) +- `forge_adoption:` with the dimension-B value (0-5) - `last_audit:` with the current date - `dotforge_version:` with the VERSION version if the project was bootstrapped - `last_sync:` preserve the existing value (do not modify here) - `notes:` brief summary of the audit -- `history:` append a new entry `{date: YYYY-MM-DD, score: X.X, version: }`. Never overwrite previous entries — this enables score trending over time. +- `history:` append a new entry `{date: YYYY-MM-DD, score: X.X, adoption: N, version: }`. Never overwrite previous entries — this enables trending over time. + +**Transition note:** the two-dimension model (v4.x) changes how scores compose vs the single-score model. Native-first projects (no behaviors/workflows) will show HIGHER `native_health` than their old single score because dimension-B items no longer penalize them. Expect a one-time step in the history trend at the first two-dimension audit; this is by design, not a regression. 
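Reviewer sketch (not part of the patch): the Step 8 registry update can be illustrated with a small Python helper. The name `record_audit` and its parameters are hypothetical; the field names come from Step 8, and the `keep=8` trimming mirrors the `hist[-8:]` slice in `scripts/audit_all.py` rather than anything Step 8 itself prescribes.

```python
from datetime import date
from typing import Optional


def record_audit(entry: dict, native_health: float, forge_adoption: int,
                 version: str, today: Optional[str] = None,
                 keep: int = 8) -> dict:
    """Update one registry project entry in place (hypothetical helper).

    `score` keeps holding native_health so the trend stays comparable with
    earlier single-score audits; `forge_adoption` is stored alongside it.
    `last_sync` and `notes` are intentionally left untouched here.
    """
    today = today or date.today().isoformat()
    history = entry.setdefault("history", [])
    history.append({"date": today, "score": native_health,
                    "adoption": forge_adoption, "version": version})
    entry["history"] = history[-keep:]  # assumption: same cap as audit_all.py
    entry["score"] = native_health
    entry["forge_adoption"] = forge_adoption
    entry["last_audit"] = today
    entry["dotforge_version"] = version
    return entry


if __name__ == "__main__":
    project = {"name": "demo", "score": 7.3, "last_sync": "2026-05-01"}
    record_audit(project, 9.1, 0, "4.0.0", today="2026-06-03")
    print(project["score"], project["forge_adoption"], project["history"])
```

In this hypothetical entry the jump from 7.3 to 9.1 with `forge_adoption` at 0 is the kind of one-time step the transition note describes: the project did not change, only the way the score is composed.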
From 01af60a63e7bafd18d96d097f6de5ef88440a673 Mon Sep 17 00:00:00 2001 From: luiseiman Date: Wed, 3 Jun 2026 13:52:11 -0300 Subject: [PATCH 2/4] fix(ci): grant audit workflow pull-requests/issues write for PR comment MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The audit job computed the score correctly but failed at the comment step with 403 "Resource not accessible by integration" — the workflow declared no permissions and the repo default GITHUB_TOKEN is read-only. Co-Authored-By: Claude Opus 4.8 --- .github/workflows/audit.yml | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/.github/workflows/audit.yml b/.github/workflows/audit.yml index 2e2c784..9dc7b72 100644 --- a/.github/workflows/audit.yml +++ b/.github/workflows/audit.yml @@ -1,5 +1,10 @@ name: dotforge audit score +permissions: + contents: read + pull-requests: write + issues: write + on: pull_request: branches: [main] From d705cc65067ab412edbc4eb41541e0d4e562d353 Mon Sep 17 00:00:00 2001 From: luiseiman Date: Wed, 3 Jun 2026 13:55:49 -0300 Subject: [PATCH 3/4] fix: bump plugin.json to 4.0.0 to match VERSION The v4.0.0 release bumped VERSION but left .claude-plugin/plugin.json at 3.0.4, failing the ci.yml version-consistency check on every PR. 
Co-Authored-By: Claude Opus 4.8 --- .claude-plugin/plugin.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json index 29bf5ad..954c805 100644 --- a/.claude-plugin/plugin.json +++ b/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "dotforge", - "version": "3.0.4", + "version": "4.0.0", "description": "Behavior governance for Claude Code — declarative runtime policies on tool calls (search-first, no-destructive-git, verify-before-done, …) compiled to PreToolUse hooks, plus configuration governance: 18 skills, 7 agents, 16 stacks, audit scoring, practices pipeline.", "author": { "name": "Luis Eiman", From 43f2b97c4f738a6eb3188ebea6326f66760cfcac Mon Sep 17 00:00:00 2001 From: luiseiman Date: Wed, 3 Jun 2026 13:59:13 -0300 Subject: [PATCH 4/4] fix(test): self-baseline test_on_off instead of assuming search-first default search-first ships enabled=false since v3.6.1 (flag-consume false positives), but test_on_off.sh still asserted an initial enabled=true, failing on every PR. The on/off cycle test now sets its own baseline via the CLI, decoupling it from the shipped default of any single behavior. Co-Authored-By: Claude Opus 4.8 --- scripts/forge-behavior/tests/test_on_off.sh | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/scripts/forge-behavior/tests/test_on_off.sh b/scripts/forge-behavior/tests/test_on_off.sh index 5150d2c..74f5cea 100755 --- a/scripts/forge-behavior/tests/test_on_off.sh +++ b/scripts/forge-behavior/tests/test_on_off.sh @@ -8,9 +8,12 @@ cli_test_init # ---------- Project scope ---------- -# Verify initial state +# Establish a known baseline. search-first ships enabled=false since v3.6.1 +# (flag-consume false positives), so this on/off cycle test sets its own +# starting state instead of assuming the shipped default. 
+bash "$CLI" on search-first --project >/dev/null || { printf 'FAIL: baseline on project\n' >&2; exit 1; } initial=$(yaml_enabled_by_id "${FORGE_BEHAVIORS_DIR}/index.yaml" "search-first") -assert_eq "true" "$initial" "initial enabled in index.yaml" || exit 1 +assert_eq "true" "$initial" "baseline enabled in index.yaml" || exit 1 # Disable at project scope bash "$CLI" off search-first --project >/dev/null || { printf 'FAIL: off project\n' >&2; exit 1; }