diff --git a/CHANGELOG.md b/CHANGELOG.md index f04f998523..13c6bc4bd2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,53 @@ # Changelog +## [1.43.1.0] - 2026-05-21 + +## **Local gbrain PGLite now defaults to Voyage's code-specialized embedding model when `VOYAGE_API_KEY` is set.** +## **Symbol search ranks implementation files above tests on real code queries.** + +gstack-driven PGLite installs now use `voyage:voyage-code-3` (1024-dim) as the default embedding model when `VOYAGE_API_KEY` is in env. Falls back to gbrain's auto-selected provider chain (OpenAI `text-embedding-3-large` 1536-dim when `OPENAI_API_KEY` is set, etc.) when the Voyage key is absent. The switch hits 3 PGLite init sites in `/setup-gbrain` (Step 1.5 broken-db rollback, Path 3 direct PGLite, Step 4.5 split-engine local code index) and the post-install hint in `bin/gstack-gbrain-install`. Two new test files pin the contract: a free deterministic test that runs the template's voyage-gate shell against a fake gbrain to verify argv across `VOYAGE_API_KEY` set/unset/empty, and a real Voyage integration test (skips without the API key) that runs `gbrain init` + `sync --strategy code` against a sandbox PGLite to catch dimension mismatches, silent embedding failures, and provider adapter regressions. + +### The numbers that matter + +Source: head-to-head A/B against `voyage-4-large` on this codebase using `gbrain query --no-expand` (pure vector retrieval, no LLM expansion). 10 realistic code queries, a mix of symbol lookups, semantic intent, and design questions. 
+ +| Surface | voyage-4-large | voyage-code-3 | Δ | +|---|---|---|---| +| Strict wins (right impl file beats test file) | — | 4 | +4 | +| Ties (same top hit) | 5 | 5 | 0 | +| Losses | 0 | 0 | 0 | +| Top-1 confidence (avg) | 0.84 | 0.90 | +0.06 | +| Cost per 1M tokens | $0.18 | $0.18 | 0 | + +| Query | voyage-4-large top hit | voyage-code-3 top hit | +|---|---|---| +| `ownsTerminalAgent` | `terminal-agent-integration.test.ts` (test) | `terminal-agent.ts` (impl) | +| `ServerConfig terminal-agent teardown ownership` | `pair-agent-e2e.test.ts killDaemon` (loose match) | `terminal-agent.ts disposeSession` | +| `unicode sanitization at server egress` | `sanitize.test.ts` | `server-node.mjs sanitizeReplacer` | +| `how does websocket auth use Sec-WebSocket-Protocol` | no results | `terminal-agent.ts buildServer` | + +The win pattern is exactly what voyage-code-3 advertises: surfacing implementation source over tests when the query is a code concept. Cost is unchanged from voyage-4-large at $0.18 per 1M tokens. A full reindex of a 100K-LOC repo runs about $0.20. + +### What this means for builders + +If you have `VOYAGE_API_KEY` set and run `/setup-gbrain` on a fresh machine, `gbrain code-def`, `code-refs`, and semantic queries against your worktree now rank real implementation files above test fixtures with consistently higher confidence. No flag to pass, no config to edit. Existing brains keep whatever embedding model they were built with. The new default only applies to fresh inits. If you re-run `/setup-gbrain` on a machine that already has an OpenAI 1536-dim brain at `~/.gbrain/brain.pglite/`, the config rewrite triggers a column-dim mismatch that `gbrain doctor` will flag clearly. Recovery is `mv ~/.gbrain/brain.pglite ~/.gbrain/brain.pglite.bak && gbrain init --pglite --embedding-model voyage:voyage-code-3 --embedding-dimensions 1024` followed by a fresh `/sync-gbrain`. 
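The recovery path described in the paragraph above can be sketched as a small shell helper. This is an illustration under assumptions, not part of the change: the function name `plan_voyage_reinit` and the timestamped backup suffix are hypothetical (the changelog itself uses a plain `.bak`), and the helper only prints the commands so nothing is moved until the output has been reviewed. The `gbrain init` flags are copied from the changelog entry.

```bash
#!/bin/sh
# Hypothetical helper (not shipped in gstack): print the commands needed to
# move an existing PGLite brain aside and re-init it with voyage-code-3.
# Only the gbrain flags come from the changelog; the rest is illustrative.
plan_voyage_reinit() {
  brain_dir="${1:-$HOME/.gbrain/brain.pglite}"
  stamp="${2:-$(date +%s)}"   # timestamp suffix is an assumption

  # Without a Voyage key the new default does not apply, so plan nothing.
  if [ -z "${VOYAGE_API_KEY:-}" ]; then
    echo "# VOYAGE_API_KEY is not set; keeping the existing brain as-is"
    return 1
  fi

  # Back up the old brain only if one exists at the given path.
  if [ -e "$brain_dir" ]; then
    echo "mv \"$brain_dir\" \"$brain_dir.bak-$stamp\""
  fi
  echo "gbrain init --pglite --embedding-model voyage:voyage-code-3 --embedding-dimensions 1024"
  echo "# then run /sync-gbrain to rebuild embeddings"
}
```

Printing the plan instead of executing it keeps the sketch safe to run on a machine that already has a brain worth keeping; piping the non-comment lines to `sh` would be one way to apply it.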
+ +### Itemized changes + +**Added** +- `test/gbrain-init-voyage-code-3.test.ts` — 5 deterministic tests covering the voyage-gate shell semantics + a template-shape invariant that asserts the gate appears at exactly 3 PGLite init sites +- `test/gbrain-sync-voyage-code-3-integration.test.ts` — 4 tests (1 always-on guard, 3 voyage-gated) running real `gbrain init --pglite --embedding-model voyage:voyage-code-3` + `sync --strategy code` against a sandbox PGLite, asserting embeddings round-trip, doctor reports no dimension mismatch, and `code-def` finds symbols in the embedded fixture. Skips when `VOYAGE_API_KEY` or `gbrain` CLI is absent + +**Changed** +- `setup-gbrain/SKILL.md.tmpl` — 3 PGLite init sites (Step 1.5 broken-db rollback, Path 3 direct, Step 4.5 split-engine) now gate `--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024` on `VOYAGE_API_KEY`. Falls back to gbrain's auto-selected provider chain when unset +- `sync-gbrain/SKILL.md.tmpl` — 2 manual repair hints (D12 missing-engine, D4 corrupted-config) suggest the voyage flags with the same fallback pattern +- `bin/gstack-gbrain-install` — post-install "Next:" hint shows the voyage flags when the key is set, prints a tip about setting the key when absent +- `USING_GBRAIN_WITH_GSTACK.md` — Path 3 docs explain the embedding model selection and the A/B rationale +- `CLAUDE.md` — drops the obsolete `~/.zshrc grep+eval` recipe for API keys; points at the `GSTACK_*` env-shim (`lib/conductor-env-shim.ts`) as the canonical answer. 
Keeps the Agent SDK `env: {...}` gotcha for tests + +**Regenerated** +- `setup-gbrain/SKILL.md`, `sync-gbrain/SKILL.md` — refreshed via `bun run gen:skill-docs --host all` after the template edits + ## [1.43.0.0] - 2026-05-20 ## **iOS QA on a real iPhone — no XCTest, no WebDriverAgent, no simulators.** diff --git a/CLAUDE.md b/CLAUDE.md index 3ff25fffe0..305c60c020 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -27,25 +27,16 @@ bun run slop:diff # slop findings in files changed on this branch only `test:evals` requires `ANTHROPIC_API_KEY`. Codex E2E tests (`test/codex-e2e.test.ts`) use Codex's own auth from `~/.codex/` config — no `OPENAI_API_KEY` env var needed. -**Where the keys live on this machine.** Conductor workspaces don't inherit the -user's interactive shell env, so `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` aren't -in the default process env. Before running any paid eval / E2E, source them from -`~/.zshrc` (that's where Garry keeps them): +**Env keys in Conductor workspaces.** The `GSTACK_*` env-shim (v1.39.2.0+, +`lib/conductor-env-shim.ts`) promotes `GSTACK_ANTHROPIC_API_KEY` / +`GSTACK_OPENAI_API_KEY` to their canonical names inside gstack's TS binaries. +Tests run through gstack entrypoints inherit this promotion automatically. +Don't echo the key value to stdout, logs, or shell history. When passing to a +test's Agent SDK, do NOT pass `env: {...}` to `runAgentSdkTest` — the SDK's +auth pipeline doesn't pick up the key the same way when env is supplied as an +object (confirmed failure mode). Mutate `process.env.ANTHROPIC_API_KEY` +ambiently before the call and restore in `finally`. -```bash -bash -c ' - eval "$(grep -E "^export (ANTHROPIC_API_KEY|OPENAI_API_KEY)=" ~/.zshrc)" - export ANTHROPIC_API_KEY OPENAI_API_KEY - EVALS=1 EVALS_TIER=periodic bun test test/skill-e2e-.test.ts -' -``` - -Do not echo the key value anywhere (stdout, logs, shell history). The grep+eval -pattern keeps it in process env only. 
When passing to a test's Agent SDK, do NOT -pass `env: {...}` to `runAgentSdkTest` — the SDK's auth pipeline doesn't pick up -the key the same way when env is supplied as an object (confirmed failure mode). -Instead, mutate `process.env.ANTHROPIC_API_KEY` ambiently before the call and -restore in `finally`. E2E tests stream progress in real-time (tool-by-tool via `--output-format stream-json --verbose`). Results are persisted to `~/.gstack-dev/evals/` with auto-comparison against the previous run. diff --git a/USING_GBRAIN_WITH_GSTACK.md b/USING_GBRAIN_WITH_GSTACK.md index 7507f3be0c..f2b4a48ce0 100644 --- a/USING_GBRAIN_WITH_GSTACK.md +++ b/USING_GBRAIN_WITH_GSTACK.md @@ -57,7 +57,9 @@ Best for: you'd rather click through supabase.com yourself than paste a PAT. Best for: try-it-first, no account, no cloud, no sharing. Or a dedicated "this Mac's brain" that stays isolated from any cloud agent. -**What happens:** `gbrain init --pglite`. Brain lives at `~/.gbrain/brain.pglite`. No network calls. Done in 30 seconds. +**What happens:** `gbrain init --pglite`. Brain lives at `~/.gbrain/brain.pglite`. No network calls for the init itself. Done in 30 seconds. + +**Embedding model.** When `VOYAGE_API_KEY` is set, gstack inits PGLite with `voyage-code-3` (1024-dim) — Voyage's code-specialized embedding model, which beats their general-purpose `voyage-4-large` and OpenAI `text-embedding-3-large` head-to-head on this codebase's symbol queries. Without `VOYAGE_API_KEY`, gbrain auto-selects (OpenAI 1536-dim when `OPENAI_API_KEY` is present, else falls down its provider chain). Either way, the embeddings call out to the chosen provider's API during sync — set the key for the provider you want before running `/sync-gbrain`. This is the best first choice if you just want to see what gbrain feels like before committing to cloud. You can always migrate later with `/setup-gbrain --switch`. 
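The selection rule in the **Embedding model** paragraph above can be sketched as a standalone shell function. This is a minimal illustration under assumptions: the name `gbrain_init_argv` is hypothetical, and it builds the argument list with positional parameters (`set --`) rather than the unquoted string variable the skill template uses. That is a swapped-in technique chosen so the flags do not depend on word splitting, not the template's own code. It prints the argv it would use instead of calling `gbrain`.

```bash
#!/bin/sh
# Hypothetical sketch (not the template's code): assemble the gbrain init
# argv, adding the voyage-code-3 flags only when VOYAGE_API_KEY is non-empty.
gbrain_init_argv() {
  set -- gbrain init --pglite --json
  # An empty VOYAGE_API_KEY is treated the same as an unset one.
  if [ -n "${VOYAGE_API_KEY:-}" ]; then
    set -- "$@" --embedding-model voyage:voyage-code-3 --embedding-dimensions 1024
  fi
  printf '%s\n' "$@"   # one argument per line, for inspection
}
```

To execute rather than inspect, the final `printf` line could be replaced with `"$@"`, which runs the assembled command with each argument preserved exactly.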
@@ -251,7 +253,8 @@ Gbrain itself ships with these that gstack wraps: | `SUPABASE_API_BASE` | `gstack-gbrain-supabase-provision` | Override the Management API host. Used by tests to point at a mock server. | | `GBRAIN_INSTALL_DIR` | `gstack-gbrain-install` | Override default install path (`~/gbrain`) | | `GSTACK_HOME` | every bin helper | Override `~/.gstack` state dir. Heavy test use. | -| `OPENAI_API_KEY` | `gbrain embed` subprocess | Required for embeddings during `gbrain sync` / `/sync-gbrain`. Without it, pages are imported structurally (symbol tables, chunks) but semantic search degrades — you'll see `[gbrain] embedding failed for code file ... OpenAI embedding requires OPENAI_API_KEY` in the sync log. | +| `VOYAGE_API_KEY` | `gbrain embed` subprocess; gstack PGLite init | When set, gstack inits PGLite with `voyage-code-3` (1024-dim), Voyage's code-specialized embedding model. Beats `voyage-4-large` and OpenAI `text-embedding-3-large` head-to-head on this codebase's symbol queries. See CHANGELOG v1.43.1.0 for the A/B numbers. | +| `OPENAI_API_KEY` | `gbrain embed` subprocess | Used for embeddings during `gbrain sync` / `/sync-gbrain` when `VOYAGE_API_KEY` is not set (gbrain's auto-selected fallback, `text-embedding-3-large` 1536-dim). Without either key, pages are imported structurally (symbol tables, chunks) but semantic search degrades — you'll see `[gbrain] embedding failed for code file ...` in the sync log. | | `ANTHROPIC_API_KEY` | `claude-agent-sdk`, paid evals | Required for `bun run test:evals` and any direct `query()` call against Claude. | | `GSTACK_OPENAI_API_KEY` | `lib/conductor-env-shim.ts` | Conductor-injected fallback. Promoted to `OPENAI_API_KEY` when the canonical name is empty. | | `GSTACK_ANTHROPIC_API_KEY` | `lib/conductor-env-shim.ts` | Same pattern as above for Anthropic. | @@ -345,7 +348,7 @@ Embeddings probably failed during import. 
Symbol queries (`code-def`, `code-refs [gbrain] embedding failed for code file : OpenAI embedding requires OPENAI_API_KEY ``` -The fix is to put `OPENAI_API_KEY` in the process env before re-running. On a bare Mac shell, source it from `~/.zshrc` before calling. In Conductor, set `GSTACK_OPENAI_API_KEY` at the workspace level — `lib/conductor-env-shim.ts` promotes it to canonical automatically when imported. Re-run `/sync-gbrain --code-only` to backfill embeddings on already-imported pages. +The fix is to put a provider API key in the process env before re-running. `VOYAGE_API_KEY` is preferred for code (gstack defaults PGLite to `voyage-code-3` when set); otherwise `OPENAI_API_KEY` falls back to `text-embedding-3-large`. On a bare Mac shell, source the key from `~/.zshrc` before calling. In Conductor, the `lib/conductor-env-shim.ts` shim promotes `GSTACK_ANTHROPIC_API_KEY` / `GSTACK_OPENAI_API_KEY` to their canonical names automatically; for `VOYAGE_API_KEY`, set it directly in your Conductor workspace env. Re-run `/sync-gbrain --code-only` to backfill embeddings on already-imported pages. ### `gbrain sync` blocked at a commit hash — `FILE_TOO_LARGE` diff --git a/VERSION b/VERSION index af55d1e4ae..85aeaa7f54 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -1.43.0.0 +1.43.1.0 diff --git a/bin/gstack-gbrain-install b/bin/gstack-gbrain-install index d9c30396b1..e7e029ce02 100755 --- a/bin/gstack-gbrain-install +++ b/bin/gstack-gbrain-install @@ -217,4 +217,13 @@ if ! 
gbrain sources --help >/dev/null 2>&1; then fi echo "" -echo "Next: gbrain init --pglite (or run /setup-gbrain for the full setup flow)" +if [ -n "${VOYAGE_API_KEY:-}" ]; then + echo "Next: gbrain init --pglite --embedding-model voyage:voyage-code-3 --embedding-dimensions 1024" + echo " (or run /setup-gbrain for the full setup flow)" +else + echo "Next: gbrain init --pglite (or run /setup-gbrain for the full setup flow)" + echo "" + echo "Tip: set VOYAGE_API_KEY before init to use voyage-code-3 (best embedding" + echo "model for code retrieval on Voyage). Without it, gbrain falls back to its" + echo "auto-selected provider (OpenAI when OPENAI_API_KEY is set, etc.)." +fi diff --git a/package.json b/package.json index acfa7cc12c..cf265641b2 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "gstack", - "version": "1.43.0.0", + "version": "1.43.1.0", "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.", "license": "MIT", "type": "module", diff --git a/setup-gbrain/SKILL.md b/setup-gbrain/SKILL.md index a31b7de7a3..6a3536d3b6 100644 --- a/setup-gbrain/SKILL.md +++ b/setup-gbrain/SKILL.md @@ -845,7 +845,14 @@ with `GSTACK_DETECT_NO_CACHE=1` (busts the 60s cache). If the new ```bash BACKUP="$HOME/.gbrain/config.json.gstack-bak-$(date +%s)" mv "$HOME/.gbrain/config.json" "$BACKUP" -if ! gbrain init --pglite --json; then +# gstack default: voyage-code-3 (1024d) when VOYAGE_API_KEY is set — best for +# code retrieval. Without the key, fall back to gbrain's own auto-selected +# embedding provider chain (OpenAI 1536d when OPENAI_API_KEY is present, etc.). +GBRAIN_EMBED_FLAGS="" +if [ -n "${VOYAGE_API_KEY:-}" ]; then + GBRAIN_EMBED_FLAGS="--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024" +fi +if ! gbrain init --pglite --json $GBRAIN_EMBED_FLAGS; then # Restore on failure mv "$BACKUP" "$HOME/.gbrain/config.json" echo "gbrain init failed. 
Your previous config was restored at $HOME/.gbrain/config.json." >&2 @@ -1052,10 +1059,18 @@ Then follow the same secret-read + verify + init flow as Path 1. ### Path 3 (PGLite local) ```bash -gbrain init --pglite --json +# gstack default: voyage-code-3 (1024d) when VOYAGE_API_KEY is set — code +# retrieval beats general-purpose embeddings on real code queries (validated +# A/B). Without the key, gbrain auto-selects (OpenAI 1536d when available). +GBRAIN_EMBED_FLAGS="" +if [ -n "${VOYAGE_API_KEY:-}" ]; then + GBRAIN_EMBED_FLAGS="--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024" +fi +gbrain init --pglite --json $GBRAIN_EMBED_FLAGS ``` -Done. No network, no secrets. +Done. No network, no secrets (beyond Voyage embedding API calls during sync, if +`VOYAGE_API_KEY` is set — ~$0.18 per 1M tokens, pennies per repo). ### Path 4 (Remote gbrain MCP — HTTP transport with bearer token) @@ -1135,7 +1150,15 @@ if [ -f "$HOME/.gbrain/config.json" ]; then BACKUP="$HOME/.gbrain/config.json.gstack-bak-$(date +%s)" mv "$HOME/.gbrain/config.json" "$BACKUP" fi -if ! gbrain init --pglite --json; then +# gstack default for local code-search PGLite: voyage-code-3 (1024d) when +# VOYAGE_API_KEY is set. It wins the A/B over voyage-4-large and OpenAI +# text-embedding-3-large on this codebase's symbol queries. Falls back to +# gbrain's auto-selected provider when the key isn't present. +GBRAIN_EMBED_FLAGS="" +if [ -n "${VOYAGE_API_KEY:-}" ]; then + GBRAIN_EMBED_FLAGS="--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024" +fi +if ! gbrain init --pglite --json $GBRAIN_EMBED_FLAGS; then if [ -n "${BACKUP:-}" ] && [ -f "$BACKUP" ]; then mv "$BACKUP" "$HOME/.gbrain/config.json"; fi echo "gbrain init failed. Existing config (if any) was restored. PGLite at ~/.gbrain/pglite/ may be in a partial state — \`rm -rf ~/.gbrain/pglite\` to reset." >&2 echo "Continuing setup without local code search; you can re-run /setup-gbrain to retry." 
>&2 diff --git a/setup-gbrain/SKILL.md.tmpl b/setup-gbrain/SKILL.md.tmpl index a0bc597698..731e875f79 100644 --- a/setup-gbrain/SKILL.md.tmpl +++ b/setup-gbrain/SKILL.md.tmpl @@ -125,7 +125,14 @@ with `GSTACK_DETECT_NO_CACHE=1` (busts the 60s cache). If the new ```bash BACKUP="$HOME/.gbrain/config.json.gstack-bak-$(date +%s)" mv "$HOME/.gbrain/config.json" "$BACKUP" -if ! gbrain init --pglite --json; then +# gstack default: voyage-code-3 (1024d) when VOYAGE_API_KEY is set — best for +# code retrieval. Without the key, fall back to gbrain's own auto-selected +# embedding provider chain (OpenAI 1536d when OPENAI_API_KEY is present, etc.). +GBRAIN_EMBED_FLAGS="" +if [ -n "${VOYAGE_API_KEY:-}" ]; then + GBRAIN_EMBED_FLAGS="--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024" +fi +if ! gbrain init --pglite --json $GBRAIN_EMBED_FLAGS; then # Restore on failure mv "$BACKUP" "$HOME/.gbrain/config.json" echo "gbrain init failed. Your previous config was restored at $HOME/.gbrain/config.json." >&2 @@ -332,10 +339,18 @@ Then follow the same secret-read + verify + init flow as Path 1. ### Path 3 (PGLite local) ```bash -gbrain init --pglite --json +# gstack default: voyage-code-3 (1024d) when VOYAGE_API_KEY is set — code +# retrieval beats general-purpose embeddings on real code queries (validated +# A/B). Without the key, gbrain auto-selects (OpenAI 1536d when available). +GBRAIN_EMBED_FLAGS="" +if [ -n "${VOYAGE_API_KEY:-}" ]; then + GBRAIN_EMBED_FLAGS="--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024" +fi +gbrain init --pglite --json $GBRAIN_EMBED_FLAGS ``` -Done. No network, no secrets. +Done. No network, no secrets (beyond Voyage embedding API calls during sync, if +`VOYAGE_API_KEY` is set — ~$0.18 per 1M tokens, pennies per repo). 
### Path 4 (Remote gbrain MCP — HTTP transport with bearer token) @@ -415,7 +430,15 @@ if [ -f "$HOME/.gbrain/config.json" ]; then BACKUP="$HOME/.gbrain/config.json.gstack-bak-$(date +%s)" mv "$HOME/.gbrain/config.json" "$BACKUP" fi -if ! gbrain init --pglite --json; then +# gstack default for local code-search PGLite: voyage-code-3 (1024d) when +# VOYAGE_API_KEY is set. It wins the A/B over voyage-4-large and OpenAI +# text-embedding-3-large on this codebase's symbol queries. Falls back to +# gbrain's auto-selected provider when the key isn't present. +GBRAIN_EMBED_FLAGS="" +if [ -n "${VOYAGE_API_KEY:-}" ]; then + GBRAIN_EMBED_FLAGS="--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024" +fi +if ! gbrain init --pglite --json $GBRAIN_EMBED_FLAGS; then if [ -n "${BACKUP:-}" ] && [ -f "$BACKUP" ]; then mv "$BACKUP" "$HOME/.gbrain/config.json"; fi echo "gbrain init failed. Existing config (if any) was restored. PGLite at ~/.gbrain/pglite/ may be in a partial state — \`rm -rf ~/.gbrain/pglite\` to reset." >&2 echo "Continuing setup without local code search; you can re-run /setup-gbrain to retry." >&2 diff --git a/sync-gbrain/SKILL.md b/sync-gbrain/SKILL.md index f7b9b52305..b6d362fe32 100644 --- a/sync-gbrain/SKILL.md +++ b/sync-gbrain/SKILL.md @@ -821,7 +821,9 @@ BEFORE invoking the orchestrator: "Your brain queries (the `mcp__gbrain__*` tools) work via remote MCP, but symbol code search needs a local PGLite. Run `/setup-gbrain` and pick 'Yes' at the new 'local code index' prompt (Step 4.5), or run - `gbrain init --pglite --json` directly. Continuing without code stage." + `gbrain init --pglite --json --embedding-model voyage:voyage-code-3 --embedding-dimensions 1024` + directly (drop the voyage flags if `VOYAGE_API_KEY` isn't set). Continuing + without code stage." Then proceed to Step 2 — the orchestrator's `runCodeImport()` and `runMemoryIngest()` will return SKIP per plan D12; only `runBrainSyncPush()` will run. Do NOT abort. 
@@ -834,7 +836,8 @@ BEFORE invoking the orchestrator: 1. Re-run /setup-gbrain — Step 1.5 offers Retry / Switch to PGLite / Switch brain mode / Quit (plan D4). 2. Repair manually: mv ~/.gbrain/config.json ~/.gbrain/config.json.bak - && gbrain init --pglite --json + && gbrain init --pglite --json --embedding-model voyage:voyage-code-3 \ + --embedding-dimensions 1024 (drop voyage flags if VOYAGE_API_KEY unset) Re-run /sync-gbrain after. ``` Do NOT continue — the orchestrator would skip code+memory and only run diff --git a/sync-gbrain/SKILL.md.tmpl b/sync-gbrain/SKILL.md.tmpl index b05c390664..91a8bb1a48 100644 --- a/sync-gbrain/SKILL.md.tmpl +++ b/sync-gbrain/SKILL.md.tmpl @@ -101,7 +101,9 @@ BEFORE invoking the orchestrator: "Your brain queries (the `mcp__gbrain__*` tools) work via remote MCP, but symbol code search needs a local PGLite. Run `/setup-gbrain` and pick 'Yes' at the new 'local code index' prompt (Step 4.5), or run - `gbrain init --pglite --json` directly. Continuing without code stage." + `gbrain init --pglite --json --embedding-model voyage:voyage-code-3 --embedding-dimensions 1024` + directly (drop the voyage flags if `VOYAGE_API_KEY` isn't set). Continuing + without code stage." Then proceed to Step 2 — the orchestrator's `runCodeImport()` and `runMemoryIngest()` will return SKIP per plan D12; only `runBrainSyncPush()` will run. Do NOT abort. @@ -114,7 +116,8 @@ BEFORE invoking the orchestrator: 1. Re-run /setup-gbrain — Step 1.5 offers Retry / Switch to PGLite / Switch brain mode / Quit (plan D4). 2. Repair manually: mv ~/.gbrain/config.json ~/.gbrain/config.json.bak - && gbrain init --pglite --json + && gbrain init --pglite --json --embedding-model voyage:voyage-code-3 \ + --embedding-dimensions 1024 (drop voyage flags if VOYAGE_API_KEY unset) Re-run /sync-gbrain after. 
``` Do NOT continue — the orchestrator would skip code+memory and only run diff --git a/test/gbrain-init-voyage-code-3.test.ts b/test/gbrain-init-voyage-code-3.test.ts new file mode 100644 index 0000000000..9eb84b1983 --- /dev/null +++ b/test/gbrain-init-voyage-code-3.test.ts @@ -0,0 +1,184 @@ +/** + * Tests the voyage-code-3 default contract in setup-gbrain's PGLite init + * sequences. The contract lives in the skill TEMPLATE (.tmpl), not in a TS + * helper — the skill follows AI-readable instructions. + * + * Contract (asserted here): + * 1. When VOYAGE_API_KEY is set, gstack's PGLite init passes + * --embedding-model voyage:voyage-code-3 --embedding-dimensions 1024 + * 2. When VOYAGE_API_KEY is unset, those flags are omitted (gbrain's + * auto-selected provider chain takes over) + * + * Why a separate file from gbrain-init-rollback.test.ts: that file owns the + * .bak-rollback contract (Step 1.5 / 4.5 plan D7). This file owns the + * embedding-model selection contract. Both extract bash from the skill + * template and execute it against a fake gbrain. + * + * The fake gbrain records argv to a sentinel file so the test can assert + * exact flags. No Voyage API calls are made. 
+ */ + +import { describe, it, expect } from "bun:test"; +import { + mkdtempSync, + mkdirSync, + writeFileSync, + readFileSync, + existsSync, + rmSync, + chmodSync, +} from "fs"; +import { tmpdir } from "os"; +import { join } from "path"; +import { spawnSync } from "child_process"; + +interface FakeEnv { + tmp: string; + home: string; + bindir: string; + argvLog: string; + cleanup: () => void; +} + +function makeFakeEnv(): FakeEnv { + const tmp = mkdtempSync(join(tmpdir(), "gbrain-voyage-init-")); + const home = join(tmp, "home"); + const bindir = join(tmp, "bin"); + const argvLog = join(tmp, "gbrain-argv.log"); + mkdirSync(join(home, ".gbrain"), { recursive: true }); + mkdirSync(bindir, { recursive: true }); + + // Fake gbrain logs every argv invocation to argvLog (one line per call), + // succeeds on init (writes a sentinel pglite config), and returns canned + // output for --version. Nothing else is needed for the shape test. + const fake = `#!/bin/sh +echo "$@" >> "${argvLog}" +case "$1" in + --version) + echo "gbrain 0.37.1.0" + exit 0 + ;; + init) + cat > "${home}/.gbrain/config.json" < rmSync(tmp, { recursive: true, force: true }), + }; +} + +/** + * Verbatim reimplementation of the skill template's voyage-code-3 + * conditional. The template (setup-gbrain/SKILL.md.tmpl Path 3, Step 1.5 + * inside the rollback wrapper, Step 4.5 Path 4 Yes branch) instructs the + * model to execute this bash; we execute the same bash here and assert the + * argv passed to gbrain matches the contract. + * + * If the template changes the flag set or the env-var name, this test + * should fail until the shell here is updated too — by design. 
+ */ +function runInitWithVoyageGate(env: FakeEnv, voyageKey: string | undefined): string[] { + const script = ` +set -u +GBRAIN_EMBED_FLAGS="" +if [ -n "\${VOYAGE_API_KEY:-}" ]; then + GBRAIN_EMBED_FLAGS="--embedding-model voyage:voyage-code-3 --embedding-dimensions 1024" +fi +gbrain init --pglite --json $GBRAIN_EMBED_FLAGS +`; + const baseEnv: Record<string, string | undefined> = { + ...process.env, + HOME: env.home, + PATH: `${env.bindir}:/usr/bin:/bin`, + }; + if (voyageKey === undefined) { + delete baseEnv.VOYAGE_API_KEY; + } else { + baseEnv.VOYAGE_API_KEY = voyageKey; + } + const result = spawnSync("bash", ["-c", script], { + encoding: "utf-8", + env: baseEnv, + }); + if (result.status !== 0) { + throw new Error(`init script exited ${result.status}: ${result.stderr}`); + } + return readFileSync(env.argvLog, "utf-8").trim().split("\n"); +} + +describe("voyage-code-3 default for gstack-driven PGLite init", () => { + it("passes voyage-code-3 flags when VOYAGE_API_KEY is set", () => { + const env = makeFakeEnv(); + try { + const calls = runInitWithVoyageGate(env, "vk_test_set"); + expect(calls.length).toBe(1); + const argv = calls[0]; + expect(argv).toContain("init --pglite --json"); + expect(argv).toContain("--embedding-model voyage:voyage-code-3"); + expect(argv).toContain("--embedding-dimensions 1024"); + } finally { + env.cleanup(); + } + }); + + it("omits voyage flags when VOYAGE_API_KEY is unset", () => { + const env = makeFakeEnv(); + try { + const calls = runInitWithVoyageGate(env, undefined); + expect(calls.length).toBe(1); + const argv = calls[0]; + expect(argv).toContain("init --pglite --json"); + expect(argv).not.toContain("voyage"); + expect(argv).not.toContain("--embedding-model"); + expect(argv).not.toContain("--embedding-dimensions"); + } finally { + env.cleanup(); + } + }); + + it("treats empty-string VOYAGE_API_KEY the same as unset (no false positive)", () => { + const env = makeFakeEnv(); + try { + const calls = runInitWithVoyageGate(env, ""); +
expect(calls.length).toBe(1); + expect(calls[0]).not.toContain("voyage"); + } finally { + env.cleanup(); + } + }); +}); + +describe("template alignment: the .tmpl actually contains the voyage gate", () => { + // Belt-and-suspenders: if someone edits the template and drops the + // VOYAGE_API_KEY conditional without updating the test above, this catches + // it. The shell snippet under test must literally appear in the .tmpl. + const TEMPLATE_PATH = join(import.meta.dir, "..", "setup-gbrain", "SKILL.md.tmpl"); + const tmpl = readFileSync(TEMPLATE_PATH, "utf-8"); + + it("setup-gbrain template gates the embedding-model flag on VOYAGE_API_KEY", () => { + // Should appear at least once (currently 3 init sites use the same gate). + expect(tmpl).toContain('if [ -n "${VOYAGE_API_KEY:-}" ]; then'); + expect(tmpl).toContain("--embedding-model voyage:voyage-code-3"); + expect(tmpl).toContain("--embedding-dimensions 1024"); + }); + + it("setup-gbrain template uses the conditional gate at all 3 PGLite init sites", () => { + // Count the gate occurrences. If a future edit adds/removes a PGLite + // init site, update this expectation deliberately. + const matches = tmpl.match(/if \[ -n "\$\{VOYAGE_API_KEY:-\}" \]; then/g); + expect(matches?.length).toBe(3); + }); +}); diff --git a/test/gbrain-sync-voyage-code-3-integration.test.ts b/test/gbrain-sync-voyage-code-3-integration.test.ts new file mode 100644 index 0000000000..268e5ec5b7 --- /dev/null +++ b/test/gbrain-sync-voyage-code-3-integration.test.ts @@ -0,0 +1,328 @@ +/** + * Real integration: gbrain PGLite + voyage-code-3 end-to-end. + * + * Inits a sandboxed PGLite engine with voyage-code-3 embeddings, registers a + * tiny code fixture as a source, syncs it (which triggers Voyage embedding + * generation), and queries it back. 
The whole point is to catch the failure + * modes that hit us in real life: + * + * - dimension mismatch between the configured embedding column and the + * model's actual output dim (the 1280-vs-1536 trap that gbrain doctor + * surfaces but `gbrain init` silently sets up) + * - voyage-code-3 unavailable via gbrain's openai-compat adapter + * - sync completes but embedding generation silently fails (0 chunks) + * + * We intentionally do NOT call `gbrain query` here — it produces correct + * output but doesn't exit cleanly on a fresh PGLite (~2 min hang after + * results print). The smoking-gun assertion for "embeddings worked" is the + * "N pages embedded" line from sync output: if that's >= 1, voyage-code-3 + * returned 1024-dim vectors and gbrain persisted them. Symbol-aware + * functionality is covered separately by the code-def test. + * + * Skips when: + * - `gbrain` is not on PATH (dev machine without it installed) + * - VOYAGE_API_KEY is unset (the test makes real Voyage API calls) + * + * Cost: ~$0.001 per run. The fixture is 3 tiny files, ~500 tokens total. + * Not gated on EVALS=1 because it's not an LLM eval — it's a deterministic + * integration test of the embedding pipeline. Always runs when the env + * supports it. + * + * Runtime: ~30-60s (gbrain init schema migrations + sync + Voyage round-trip). + * Long enough that `bun test` runs it serially with a per-test 120s timeout. + */ + +import { describe, test, expect } from "bun:test"; +import { + mkdtempSync, + mkdirSync, + writeFileSync, + rmSync, + existsSync, +} from "fs"; +import { tmpdir } from "os"; +import { join } from "path"; +import { spawnSync } from "child_process"; + +const gbrainPath = spawnSync("which", ["gbrain"], { encoding: "utf-8" }).stdout.trim(); +const gbrainAvailable = gbrainPath.length > 0; +const voyageKey = process.env.VOYAGE_API_KEY?.trim() ?? 
""; +const voyageKeyPresent = voyageKey.length > 0; + +const shouldRun = gbrainAvailable && voyageKeyPresent; +const skipReason = !gbrainAvailable + ? "gbrain not on PATH" + : !voyageKeyPresent + ? "VOYAGE_API_KEY not set (real Voyage API calls required)" + : ""; + +if (!shouldRun) { + console.log(`[gbrain-sync-voyage-code-3-integration] SKIP: ${skipReason}`); +} + +interface SandboxEnv { + root: string; + gbrainHome: string; + fixtureDir: string; + cleanup: () => void; +} + +function makeSandbox(): SandboxEnv { + const root = mkdtempSync(join(tmpdir(), "gbrain-voyage-int-")); + // GBRAIN_HOME points at the PARENT of .gbrain (per gbrain's configDir()); + // setting GBRAIN_HOME=/x means gbrain looks at /x/.gbrain/. + const gbrainHome = root; + const fixtureDir = join(root, "fixture-repo"); + mkdirSync(fixtureDir, { recursive: true }); + + // Tiny realistic fixture: three files exercising different file types so + // gbrain's code stage has something to extract symbols + embeddings from. + writeFileSync( + join(fixtureDir, "math.ts"), + `export function fibonacci(n: number): number { + if (n <= 1) return n; + return fibonacci(n - 1) + fibonacci(n - 2); +} + +export function isPrime(n: number): boolean { + if (n < 2) return false; + for (let i = 2; i * i <= n; i++) { + if (n % i === 0) return false; + } + return true; +} +`, + ); + writeFileSync( + join(fixtureDir, "queue.ts"), + `export class JobQueue { + private items: T[] = []; + enqueue(item: T): void { this.items.push(item); } + dequeue(): T | undefined { return this.items.shift(); } + size(): number { return this.items.length; } +} +`, + ); + writeFileSync( + join(fixtureDir, "README.md"), + `# Fixture repo + +Sample code for testing the voyage-code-3 embedding pipeline. +The math module exposes fibonacci and primality helpers. +The queue module is a simple FIFO job queue. +`, + ); + + // Make it a git repo because gbrain's code-sync strategy expects one. 
+ const gitInit = spawnSync("git", ["init", "-q"], { cwd: fixtureDir, encoding: "utf-8" }); + if (gitInit.status !== 0) { + throw new Error(`git init failed: ${gitInit.stderr}`); + } + spawnSync("git", ["config", "user.email", "test@example.invalid"], { cwd: fixtureDir }); + spawnSync("git", ["config", "user.name", "test"], { cwd: fixtureDir }); + spawnSync("git", ["add", "."], { cwd: fixtureDir }); + spawnSync("git", ["commit", "-q", "-m", "fixture"], { cwd: fixtureDir }); + + return { + root, + gbrainHome, + fixtureDir, + cleanup: () => rmSync(root, { recursive: true, force: true }), + }; +} + +function gbrainEnv(s: SandboxEnv): NodeJS.ProcessEnv { + return { + ...process.env, + GBRAIN_HOME: s.gbrainHome, + VOYAGE_API_KEY: voyageKey, + }; +} + +function runGbrain(s: SandboxEnv, args: string[], opts: { timeout?: number } = {}) { + // cwd MUST be the sandbox root, not the test's parent CWD. If gbrain runs + // from inside the gstack worktree, it picks up the worktree's + // `.gbrain-source` pin and tries to sync that source too — which won't + // exist in the sandbox PGLite, and the resulting "not found" exits 1. + return spawnSync("gbrain", args, { + encoding: "utf-8", + env: gbrainEnv(s), + cwd: s.root, + timeout: opts.timeout ?? 120_000, + }); +} + +describe.skipIf(!shouldRun)( + "gbrain PGLite + voyage-code-3 end-to-end (real Voyage API)", + () => { + test( + "init with voyage-code-3 produces a 1024-dim-aligned PGLite config", + () => { + const s = makeSandbox(); + try { + const init = runGbrain(s, [ + "init", + "--pglite", + "--json", + "--embedding-model", + "voyage:voyage-code-3", + "--embedding-dimensions", + "1024", + ]); + expect(init.status).toBe(0); + // init prints JSON status line at the end; just sniff for success. + const out = (init.stdout || "") + (init.stderr || ""); + expect(out).toContain('"status":"success"'); + expect(out).toContain('"engine":"pglite"'); + + // doctor must agree the column width matches the live probe dim. 
+ const doctor = runGbrain(s, ["doctor"]); + const dout = (doctor.stdout || "") + (doctor.stderr || ""); + // Doctor exits non-zero on error rows; warnings are OK. The + // critical assertion is no dimension mismatch. + expect(dout).not.toContain("DB dimension mismatch"); + // Should explicitly mention voyage-code-3 as the live provider. + expect(dout).toMatch(/voyage-code-3/); + // Width consistency check should be green for 1024d. + expect(dout).toMatch(/Schema width \(1024d\)/); + } finally { + s.cleanup(); + } + }, + 120_000, + ); + + test( + "sync --strategy code generates Voyage embeddings and registers pages + chunks", + () => { + const s = makeSandbox(); + try { + // 1. init voyage-code-3 PGLite + const init = runGbrain(s, [ + "init", + "--pglite", + "--json", + "--embedding-model", + "voyage:voyage-code-3", + "--embedding-dimensions", + "1024", + ]); + expect(init.status).toBe(0); + + // 2. register the fixture as a code source + const add = runGbrain(s, [ + "sources", + "add", + "fixture-code", + "--path", + s.fixtureDir, + ]); + expect(add.status).toBe(0); + + // 3. sync with code strategy — this is where Voyage embeddings get + // generated. Use --skip-failed so a single oversized file (which + // can happen in real repos) doesn't block the assertion. + const sync = runGbrain( + s, + [ + "sync", + "--source", + "fixture-code", + "--strategy", + "code", + "--skip-failed", + ], + { timeout: 180_000 }, + ); + if (sync.status !== 0) { + console.error(`[sync FAILED exit=${sync.status}]`); + console.error(`STDOUT:\n${sync.stdout}`); + console.error(`STDERR:\n${sync.stderr}`); + } + expect(sync.status).toBe(0); + const sout = (sync.stdout || "") + (sync.stderr || ""); + // The fixture has 3 files; gbrain should import at least the 2 .ts + // files (README.md may or may not be picked up by --strategy code + // depending on gbrain's file-type heuristics). 
+ expect(sout).toMatch(/imported=[1-9]/); + // The "pages embedded" line is the smoking gun: if it's 0, + // embedding generation silently failed (voyage adapter broken, + // dimension mismatch, etc). Anything > 0 means voyage-code-3 + // returned 1024-dim vectors and gbrain wrote them. + expect(sout).toMatch(/[1-9]\d* pages embedded/); + + // 4. verify the source has pages and chunks + const list = runGbrain(s, ["sources", "list", "--json"]); + expect(list.status).toBe(0); + const sources = JSON.parse(list.stdout) as { + sources: Array<{ id: string; page_count: number }>; + }; + const fixture = sources.sources.find((x) => x.id === "fixture-code"); + expect(fixture).toBeDefined(); + expect(fixture!.page_count).toBeGreaterThanOrEqual(2); + } finally { + s.cleanup(); + } + }, + 300_000, + ); + + test( + "code-def finds symbols defined in the embedded fixture", + () => { + const s = makeSandbox(); + try { + runGbrain(s, [ + "init", + "--pglite", + "--json", + "--embedding-model", + "voyage:voyage-code-3", + "--embedding-dimensions", + "1024", + ]); + runGbrain(s, ["sources", "add", "fixture-code", "--path", s.fixtureDir]); + runGbrain( + s, + ["sync", "--source", "fixture-code", "--strategy", "code", "--skip-failed"], + { timeout: 180_000 }, + ); + + // code-def is the symbol-aware path. It doesn't strictly need + // embeddings (symbols are extracted by tree-sitter), but the JSON + // shape it returns is the contract gstack's CLAUDE.md guidance + // points the agent at. Verify it works against our PGLite + Voyage + // setup. 
+ const result = runGbrain(s, ["code-def", "fibonacci"]); + expect(result.status).toBe(0); + const parsed = JSON.parse(result.stdout) as { + symbol: string; + count: number; + results: Array<{ file: string; symbol_type: string }>; + }; + expect(parsed.symbol).toBe("fibonacci"); + expect(parsed.count).toBeGreaterThanOrEqual(1); + expect(parsed.results[0].file).toContain("math.ts"); + } finally { + s.cleanup(); + } + }, + 300_000, + ); + }, +); + +// Lightweight always-on guard: even without the integration test running, we +// can still assert that the test file's `describe.skipIf` gate is correctly +// formed. This catches a future edit that accidentally inverts the gate. +test("integration test gate uses the correct skip predicate", () => { + // shouldRun must be the boolean AND of the two pre-checks. If a refactor + // makes it true when either piece is missing, the tests above would attempt + // real API calls without a key, which is undefined behavior. + expect(shouldRun).toBe(gbrainAvailable && voyageKeyPresent); + // When skipping, we logged a reason. Basic sanity check that the reason + // string is non-empty whenever shouldRun is false. + if (!shouldRun) { + expect(skipReason.length).toBeGreaterThan(0); + } +});
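The always-on guard above compares `shouldRun` against the same expression that defines it, so it cannot fail on its own. One possible way to make the gate directly testable is sketched below, assuming a hypothetical helper `evaluateGate` (not part of this diff) that takes the resolved `gbrain` path and the raw `VOYAGE_API_KEY` value as explicit inputs instead of reading them at module load. The skip-reason strings are copied from the test file; everything else is illustrative.

```typescript
// Hypothetical helper, not part of the test file in this diff. It mirrors the
// module-level gate but takes its inputs as arguments so every branch can be
// exercised without touching PATH or the real environment.
interface Gate {
  shouldRun: boolean;
  skipReason: string;
}

function evaluateGate(gbrainPath: string, rawVoyageKey: string | undefined): Gate {
  // Same pre-checks as the test file: a non-empty `which gbrain` result and a
  // non-empty key after trimming whitespace.
  const gbrainAvailable = gbrainPath.trim().length > 0;
  const voyageKeyPresent = (rawVoyageKey?.trim() ?? "").length > 0;

  const skipReason = !gbrainAvailable
    ? "gbrain not on PATH"
    : !voyageKeyPresent
      ? "VOYAGE_API_KEY not set (real Voyage API calls required)"
      : "";

  return { shouldRun: gbrainAvailable && voyageKeyPresent, skipReason };
}

// Example inputs covering the set / unset / empty key cases. The path and key
// values are placeholders.
const examples: Array<[string, string | undefined]> = [
  ["/usr/local/bin/gbrain", "example-key"],
  ["/usr/local/bin/gbrain", undefined],
  ["/usr/local/bin/gbrain", "   "],
  ["", "example-key"],
];

for (const [path, key] of examples) {
  const gate = evaluateGate(path, key);
  console.log(JSON.stringify({ path, key: key ?? null, ...gate }));
}
```

With a helper of this shape, the guard test could assert each case separately (for instance, that a whitespace-only key yields `shouldRun === false` with a non-empty reason), which would catch an inverted gate that the current self-comparison would not.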