diff --git a/CLAUDE.md b/CLAUDE.md
index f637dd6a..5d01bd54 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -81,6 +81,8 @@ Defined in `src/types.ts`. Both extractors and resolvers must use these exact st
 - `instructions-template.ts` no longer holds an instructions body — it exports only the `<!-- CODEGRAPH_START -->`/`<!-- CODEGRAPH_END -->` markers. The installer **stopped writing** a `## CodeGraph` block into each agent's instructions file (`CLAUDE.md` / `~/.codex/AGENTS.md` / `~/.config/opencode/AGENTS.md` / `~/.gemini/GEMINI.md` / `.cursor/rules/codegraph.mdc` / Kiro steering doc) because it duplicated the MCP `initialize` instructions verbatim (issue #529). Each target's `install` (self-heal on upgrade) and `uninstall` use the markers to **strip** a block a previous install left behind. `server-instructions.ts` is the single source of truth for agent-facing guidance.
 - All installer changes need matching coverage in `__tests__/installer-targets.test.ts` — there are ~47 parameterized contract tests covering install idempotency, sibling preservation, uninstall reverses install, byte-equal re-runs returning `unchanged`, and partial-state recovery for Codex.
 
+To add a new language, follow the cookbook at [`docs/ADDING-A-LANGUAGE.md`](docs/ADDING-A-LANGUAGE.md).
+
 ### Cursor MCP working-directory quirk
 
 Cursor launches MCP subprocesses with the wrong cwd and doesn't pass `rootUri` in `initialize`. The installer injects `--path` into Cursor's MCP args — absolute path for local installs, `${workspaceFolder}` for global installs. If you touch Cursor wiring, preserve this.
@@ -263,3 +265,4 @@ publish actions on shared state. Write the files, hand the user the commands.
   - The **last main commit** — `git log --first-parent main -1 --format='%ai %h %s'`. A comment after the last release but before a fix on main may already be addressed there but unreleased.
   - The **current branch's tip** — your own unmerged work obviously can't be what the comment is reacting to.
   Always disambiguate "released," "merged-but-unreleased," and "in-progress" before agreeing that a user-reported problem is unfixed (or that a fix is incomplete). A user saying "your fix only covers X" about a recent PR is usually pointing at the *released* shortcomings — your in-flight branch may already address them but they have no way to know that.
+- For contributor-facing guidance (PR workflow, commit conventions), see `CONTRIBUTING.md`.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 00000000..07f86a71
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,248 @@
+# Contributing to CodeGraph
+
+Thanks for your interest in contributing! This guide covers everything you need to
+get started.
+
+## Table of Contents
+
+- [Code of Conduct](#code-of-conduct)
+- [Getting Started](#getting-started)
+- [Development Setup](#development-setup)
+- [Project Architecture](#project-architecture)
+- [Making Changes](#making-changes)
+- [Adding a New Language](#adding-a-new-language)
+- [Testing](#testing)
+- [Commit Messages](#commit-messages)
+- [Pull Requests](#pull-requests)
+- [Reporting Issues](#reporting-issues)
+
+---
+
+## Code of Conduct
+
+Be respectful and constructive. We're all here to build something useful.
+
+## Getting Started
+
+### Prerequisites
+
+- **Node.js** >= 20.0.0, < 25.0.0 (Node 25.x has a V8 WASM JIT bug — see
+  [#81](https://github.com/colbymchenry/codegraph/issues/81))
+- **npm** (ships with Node)
+- **Git**
+
+### Fork and Clone
+
+```bash
+# Fork via GitHub UI, then:
+git clone https://github.com/<your-username>/codegraph.git
+cd codegraph
+git remote add upstream https://github.com/colbymchenry/codegraph.git
+```
+
+### Install and Verify
+
+```bash
+npm install
+npm run build
+npm test
+```
+
+If all tests pass, you're ready to go.
+
+## Development Setup
+
+### Useful Commands
+
+| Command | What it does |
+|---|---|
+| `npm run build` | Compile TypeScript + copy assets into `dist/` |
+| `npm run dev` | Watch mode — rebuilds on file change |
+| `npm run clean` | Remove `dist/` |
+| `npm test` | Run the full test suite (vitest) |
+| `npm run test:watch` | Run tests in watch mode |
+| `npm run cli` | Build then run the local CLI binary |
+
+### Running a Single Test
+
+```bash
+npx vitest run __tests__/extraction.test.ts -t "Groovy"
+```
+
+### Project Structure
+
+```
+src/
+├── index.ts              # Public API (CodeGraph class)
+├── types.ts              # Core type definitions (NodeKind, EdgeKind, Language)
+├── db/                   # SQLite database layer (schema, queries)
+├── extraction/           # Tree-sitter parsing and per-language extractors
+│   ├── languages/        # One file per language (python.ts, go.ts, ...)
+│   ├── tree-sitter.ts    # Core extraction engine
+│   └── wasm/             # Vendored grammar .wasm files
+├── resolution/           # Cross-file reference resolution
+│   └── frameworks/       # Framework-specific resolvers (Express, Django, ...)
+├── graph/                # Graph traversal (BFS/DFS, impact radius)
+├── context/              # Context building for AI consumption
+├── mcp/                  # MCP server implementation
+├── installer/            # Multi-agent installer (Claude, Cursor, Codex, opencode, Gemini, Kiro)
+├── search/               # FTS5 query parsing
+├── sync/                 # File watcher and git hooks
+└── bin/                  # CLI entry point
+__tests__/                # Tests mirror the module they cover
+docs/                     # Design docs, benchmarks, cookbooks
+```
+
+## Making Changes
+
+### Branching
+
+Always branch off `upstream/main`:
+
+```bash
+git checkout -b feat/my-feature upstream/main
+```
+
+Use descriptive branch names: `feat/`, `fix/`, `docs/`, `refactor/`.
+
+### What to Edit
+
+- **Types** (`src/types.ts`): `NodeKind`, `EdgeKind`, and `Language` are
+  runtime-iterable `const` arrays. Changing them affects the entire pipeline.
+- **Extractors** (`src/extraction/languages/`): Each language has its own file
+  exporting a `LanguageExtractor` config. See
+  [Adding a New Language](#adding-a-new-language).
+- **MCP tools** (`src/mcp/`): Changes to tool behavior require updating the
+  agent-facing guidance in `server-instructions.ts`, which is the single source
+  of truth. `instructions-template.ts` now exports only the block markers, and
+  the installer no longer writes a `## CodeGraph` block into instructions files.
+- **Installer** (`src/installer/`): Adding a new agent target is one new file in
+  `targets/` + one entry in `registry.ts`. All changes need test coverage in
+  `__tests__/installer-targets.test.ts`.
+
+### Build Verification
+
+Before committing, always run:
+
+```bash
+npm run build && npm test
+```
+
+TypeScript strict mode is fully enabled — the compiler catches a lot. The build
+also copies `.wasm` files and `schema.sql` into `dist/`; if you add new assets,
+make sure `copy-assets` picks them up.
+
+## Adding a New Language
+
+There's a dedicated cookbook for this — see
+[`docs/ADDING-A-LANGUAGE.md`](docs/ADDING-A-LANGUAGE.md). It walks through:
+
+1. Sourcing a tree-sitter `.wasm` grammar
+2. Probing the AST before writing code
+3. Registering the language (one new file + two registry lines)
+4. Writing the extractor (two patterns: `LanguageExtractor` config vs custom class)
+5. Testing and PR checklist
+
+## Testing
+
+### Philosophy
+
+Tests use **real files** and **real SQLite** — there is no DB mocking. Each test
+creates a temp directory with `fs.mkdtempSync` and cleans up in `afterAll`/`afterEach`.
+
+### Running Tests
+
+```bash
+npm test                          # full suite
+npx vitest run __tests__/extraction.test.ts   # single file
+npx vitest run -t "TypeScript"    # filter by test name
+```
+
+### Writing Tests
+
+- Place tests in `__tests__/` mirroring the module they cover
+- Use `extractFromSource(filename, code)` for extraction unit tests
+- Use `it.runIf(process.platform === 'win32')(...)` for platform-gated tests
+- Clean up temp dirs in `afterEach`/`afterAll`
+- Don't skip or mock the database — integration tests must hit real SQLite
+
+### Evaluation Tests
+
+The `__tests__/evaluation/` directory has retrieval quality benchmarks. Run
+with:
+
+```bash
+npm run eval    # builds first, then runs the evaluation runner
+```
+
+These are not part of the standard test suite and are run separately.
+
+## Commit Messages
+
+We follow [Conventional Commits](https://www.conventionalcommits.org/):
+
+```
+type(scope): description
+```
+
+**Types:** `feat`, `fix`, `docs`, `chore`, `refactor`, `test`, `perf`
+
+**Common scopes:** `extraction`, `resolution`, `mcp`, `cli`, `installer`,
+`watcher`, `npm`, `release`
+
+**Examples:**
+
+```
+feat(extraction): add Groovy language support
+fix(resolution): stream node-kind scans in synthesis to fix OOM
+docs(readme): link to the website & docs site
+chore: update vitest to v2.1.9
+```
+
+- Use the imperative mood ("add" not "added")
+- Keep the subject line under 72 characters
+- Reference issues with `Closes #123` or `Relates to #123` in the body
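+
+As an illustration only, the subject-line rules above can be checked mechanically.
+The sketch below is not a tool in this repo: `checkSubject` and its regex are
+assumptions that encode the listed types, an optional lowercase scope, and the
+72-character limit.
+
+```ts
+const TYPES = ['feat', 'fix', 'docs', 'chore', 'refactor', 'test', 'perf'] as const;
+
+// type, optional (scope), colon, space, then a non-empty description.
+const SUBJECT_PATTERN = new RegExp(`^(${TYPES.join('|')})(\\([a-z0-9-]+\\))?: \\S.*$`);
+
+// Hypothetical checker: returns a list of problems, empty when the subject is fine.
+function checkSubject(subject: string): string[] {
+  const problems: string[] = [];
+  if (!SUBJECT_PATTERN.test(subject)) {
+    problems.push('expected "type(scope): description"');
+  }
+  if (subject.length >= 72) {
+    problems.push('subject must be under 72 characters');
+  }
+  return problems;
+}
+```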
+
+## Pull Requests
+
+### Before Opening
+
+1. Rebase on the latest `upstream/main`
+2. Run `npm run build && npm test` — everything must pass
+3. Run `npx tsc --noEmit` — no type errors
+4. Keep changes focused — one concern per PR
+
+### PR Description
+
+Include:
+
+- **What** changed and **why**
+- Test plan (what you ran, what passed)
+- Any known limitations or follow-up work
+- For language additions: grammar source, version, license, and sha256 if
+  vendored (see [docs/ADDING-A-LANGUAGE.md](docs/ADDING-A-LANGUAGE.md) §8)
+
+### Review Process
+
+- Maintainers may request changes — please respond to feedback
+- Keep PRs up to date with `upstream/main` via rebase (not merge commits)
+- Squash commits if the history is messy
+
+## Reporting Issues
+
+When filing a bug report, please include:
+
+- CodeGraph version (`codegraph --version` or `npx @colbymchenry/codegraph --version`)
+- Node.js version (`node --version`)
+- Operating system and version
+- Steps to reproduce
+- Expected vs actual behavior
+- Relevant log output or error messages
+
+For feature requests, describe the use case and why existing functionality
+doesn't cover it.
+
+## License
+
+By contributing, you agree that your contributions will be licensed under the
+[MIT License](LICENSE).
diff --git a/README.md b/README.md
index 1a9800ee..405b7bc0 100644
--- a/README.md
+++ b/README.md
@@ -628,6 +628,8 @@ is written):
 | Lua | `.lua` | Full support (functions, methods with receivers, local variables, `require` imports, call edges) |
 | Luau | `.luau` | Full support (everything in Lua, plus `type`/`export type` aliases, typed signatures, and Roblox instance-path `require`) |
 
+Want to add another language? See [`docs/ADDING-A-LANGUAGE.md`](docs/ADDING-A-LANGUAGE.md). It walks through sourcing a tree-sitter grammar, probing the AST, choosing between the `LanguageExtractor` config and custom extractor class patterns, and the worked examples in the existing extractors.
+
 ## Troubleshooting
 
 **"CodeGraph not initialized"** — Run `codegraph init` in your project directory first.
@@ -653,6 +655,12 @@ is written):
  </picture>
 </a>
 
+## Contributing
+
+Contributions are welcome! See [`CONTRIBUTING.md`](CONTRIBUTING.md) for development setup, testing conventions, and PR guidelines.
+
+Want to add a new language? See [`docs/ADDING-A-LANGUAGE.md`](docs/ADDING-A-LANGUAGE.md) — it walks through sourcing a tree-sitter grammar, probing the AST, choosing an extractor pattern, and the worked examples in the existing extractors.
+
 ## License
 
 MIT
diff --git a/docs/ADDING-A-LANGUAGE.md b/docs/ADDING-A-LANGUAGE.md
new file mode 100644
index 00000000..89119a95
--- /dev/null
+++ b/docs/ADDING-A-LANGUAGE.md
@@ -0,0 +1,503 @@
+# Adding a Language
+
+This is a cookbook for adding a new language to CodeGraph. It assumes you have a
+working dev setup (`npm install` and `npm test` pass).
+
+There are two patterns. **Pick the one that matches the language you're adding.**
+
+| Language shape | Pattern | Examples |
+|---|---|---|
+| Procedural / OO with named functions, classes, methods | **`LanguageExtractor` config** | `python.ts`, `ruby.ts`, `r.ts` |
+| Declarative / template / configuration / no named functions | **Custom extractor class** | `hcl-extractor.ts`, `liquid-extractor.ts`, `sql-extractor.ts` |
+
+The two patterns share the same setup steps (1–4) and only diverge at the extractor
+itself (step 5).
+
+---
+
+## 1. Source a tree-sitter wasm grammar
+
+CodeGraph parses everything via [`web-tree-sitter`](https://www.npmjs.com/package/web-tree-sitter),
+so the grammar has to be available as a `.wasm` file. Three options, in order of
+preference:
+
+### 1a. Already in `tree-sitter-wasms`
+
+The [`tree-sitter-wasms`](https://www.npmjs.com/package/tree-sitter-wasms) npm package
+ships pre-built wasms for 30+ common languages. Check `node_modules/tree-sitter-wasms/out/`
+after a fresh install:
+
+```bash
+ls node_modules/tree-sitter-wasms/out/ | grep <lang>
+```
+
+If your grammar is there, you're done with this step — just reference the filename.
+
+### 1b. A pre-built `.wasm` released somewhere else
+
+Many grammars publish wasms in their GitHub releases (e.g. r-lib/tree-sitter-r) or
+in a separate npm package (e.g. `@tree-sitter-grammars/tree-sitter-hcl` ships
+`tree-sitter-hcl.wasm` directly in the tarball).
+
+```bash
+# GitHub release
+curl -sL -o src/extraction/wasm/tree-sitter-foo.wasm \
+  https://github.com/.../releases/download/vX.Y.Z/tree-sitter-foo.wasm
+
+# Inside an npm tarball
+mkdir -p /tmp/foo && cd /tmp/foo
+curl -sL https://registry.npmjs.org/tree-sitter-foo/-/tree-sitter-foo-X.Y.Z.tgz | tar xz
+cp package/tree-sitter-foo.wasm <repo>/src/extraction/wasm/
+```
+
+Verify the sha256 against the upstream release manifest before committing.
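+
+One way to compute the digest for that comparison is Node's built-in `node:crypto`
+module. This is a minimal sketch, not part of the repo: the helper names
+`sha256Hex` and `sha256OfFile` and the example path are assumptions.
+
+```ts
+import { createHash } from 'node:crypto';
+import { readFileSync } from 'node:fs';
+
+// Hypothetical helper: hex-encoded SHA-256 of a byte buffer.
+function sha256Hex(bytes: Uint8Array): string {
+  return createHash('sha256').update(bytes).digest('hex');
+}
+
+// Hypothetical helper: digest of a file on disk, such as a vendored grammar.
+function sha256OfFile(path: string): string {
+  return sha256Hex(readFileSync(path));
+}
+
+// Usage (the path is an example):
+// console.log(sha256OfFile('src/extraction/wasm/tree-sitter-foo.wasm'));
+```
+
+Compare the printed value with the checksum published upstream, and record it in
+the PR description.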
+
+### 1c. Build from source
+
+If only the C source is published (e.g. DerekStride/tree-sitter-sql), build the wasm
+locally with `tree-sitter-cli`. Recent versions ship their own wasi-sdk and don't need
+Docker or local emcc:
+
+```bash
+mkdir /tmp/foo && cd /tmp/foo
+curl -sL https://github.com/.../releases/download/vX.Y.Z/tree-sitter-foo.tar.gz | tar xz
+npx --yes tree-sitter-cli@latest build --wasm
+cp tree-sitter-foo.wasm <repo>/src/extraction/wasm/
+```
+
+### Where the wasm lives
+
+- Grammars from the `tree-sitter-wasms` package are loaded directly from there at runtime.
+- Other grammars must be **vendored** under `src/extraction/wasm/` so they ship in the
+  npm package. The build's `copy-assets` script copies every `.wasm` from that
+  directory into `dist/extraction/wasm/`.
+
+**License check.** Tree-sitter grammars are usually MIT or Apache-2.0 — confirm before
+committing the wasm and note the source/version in the file's header comment so the
+provenance is recoverable later.
+
+---
+
+## 2. Probe the AST
+
+Don't guess at node types. Parse a representative sample and dump the tree:
+
+```js
+// scratch/probe.mjs
+import { Parser, Language } from 'web-tree-sitter';
+await Parser.init();
+const lang = await Language.load('./src/extraction/wasm/tree-sitter-foo.wasm');
+const parser = new Parser();
+parser.setLanguage(lang);
+
+const sample = `
+// realistic code here — cover every construct you plan to extract
+`;
+
+const tree = parser.parse(sample);
+function dump(n, d = 0, max = 4) {
+  if (d > max) return;
+  const text = n.text.length > 60 ? n.text.slice(0, 60).replace(/\n/g, '\\n') + '...' : n.text.replace(/\n/g, '\\n');
+  console.log(`${'  '.repeat(d)}${n.type}  "${text}"`);
+  for (let i = 0; i < n.namedChildCount; i++) dump(n.namedChild(i), d + 1, max);
+}
+dump(tree.rootNode);
+```
+
+```bash
+node scratch/probe.mjs
+```
+
+Cover every construct you plan to extract: function definitions, classes, methods,
+imports, assignments, calls, references. Watch for surprises:
+
+- Some grammars wrap names in extra layers (`identifier > simple_identifier`)
+- Field names (`childForFieldName`) often differ from what the docs imply
+- Operator nodes can be named, unnamed, or both — call `child(i)` vs `namedChild(i)`
+  and inspect
+
+Save the probe output before you start coding — you'll refer to it constantly.
+
+---
+
+## 3. Register the language
+
+Adding a language is **one new file plus two registry lines**. The per-language
+registry (`src/extraction/languages/`) is the single source of truth — extension
+maps, include globs, grammar config, and the EXTRACTORS lookup are all derived
+from it.
+
+**Step 3a — Create `src/extraction/languages/foo.ts`** with a `LanguageDef`:
+
+```ts
+import type { LanguageDef } from './types';
+import type { LanguageExtractor } from '../tree-sitter-types';
+
+// Path A languages (procedural / OO — Python, Ruby, R) define a
+// LanguageExtractor here and reference it from the def below.
+export const fooExtractor: LanguageExtractor = {
+  functionTypes: ['function_definition'],
+  classTypes: ['class_definition'],
+  // ... see Section 5a for the full shape
+};
+
+export const FOO_DEF: LanguageDef = {
+  name: 'foo',
+  displayName: 'Foo',
+  extensions: ['.foo'],
+  includeGlobs: ['**/*.foo'],
+  grammar: {
+    wasmFile: 'tree-sitter-foo.wasm',
+    vendored: true,            // omit if the wasm lives in `tree-sitter-wasms`
+    extractor: fooExtractor,
+  },
+  // For Path B languages (HCL / SQL / Liquid — non-OO), set
+  // customExtractor instead of (or in addition to) `extractor`:
+  // customExtractor: (filePath, source) => new FooExtractor(filePath, source).extract(),
+};
+```
+
+**Step 3b — Register in `src/extraction/languages/registry.ts`** (2 lines):
+
+```ts
+import { FOO_DEF } from './foo';   // alphabetical
+// ...
+const ALL_DEFS: readonly LanguageDef[] = [
+  // ... existing definitions, alphabetical
+  FOO_DEF,
+  // ...
+];
+```
+
+**Step 3c — Add `'foo'` to the `Language` union in `src/types.ts`** (1 line):
+
+```ts
+export type Language =
+  | 'typescript'
+  // ... existing languages
+  | 'foo'                  // ← add here
+  | 'unknown';
+```
+
+That's it. `DEFAULT_CONFIG.include`, `EXTENSION_MAP`, the `EXTRACTORS` lookup,
+and `getLanguageDisplayName()` are all derived from the registry — no parallel
+lists to keep in sync.
+
+Besides `registry.ts`, the `Language` union update is the only edit to a shared
+source file. Languages registered only via the registry (without a `Language`
+union entry) also work at runtime; the union mostly serves TypeScript narrowing
+in language-specific resolution code.
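+
+To illustrate what "derived from the registry" means, here is a minimal sketch of
+building an extension lookup from a list of definitions. `MiniLanguageDef` and
+`buildExtensionMap` are simplified stand-ins invented for this example, not the
+actual CodeGraph implementation.
+
+```ts
+// Simplified stand-in for LanguageDef: only the fields used below.
+interface MiniLanguageDef {
+  name: string;
+  extensions: string[];
+}
+
+// Hypothetical helper: derive an extension -> language-name lookup from the defs.
+function buildExtensionMap(defs: readonly MiniLanguageDef[]): Map<string, string> {
+  const map = new Map<string, string>();
+  for (const def of defs) {
+    for (const ext of def.extensions) {
+      // Fail loudly if two definitions claim the same extension.
+      if (map.has(ext)) {
+        throw new Error(`extension ${ext} claimed by both ${map.get(ext)} and ${def.name}`);
+      }
+      map.set(ext, def.name);
+    }
+  }
+  return map;
+}
+
+const exampleDefs: MiniLanguageDef[] = [
+  { name: 'foo', extensions: ['.foo'] },
+  { name: 'python', extensions: ['.py', '.pyi'] },
+];
+const extensionMap = buildExtensionMap(exampleDefs);
+```
+
+Because the lookup is computed from the definitions, adding a definition is
+enough to make its extensions resolvable.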
+
+> **Why per-file?** Two PRs adding two different languages used to collide on
+> the same `EXTRACTORS` map, the same `EXTENSION_MAP`, the same `Language`
+> union, and the same `WASM_GRAMMAR_FILES` table. With per-file `LanguageDef`s,
+> two language PRs only conflict if their alphabetical positions in `registry.ts`
+> happen to land on the same line — almost never. See `src/extraction/languages/`
+> for ~20 worked examples.
+
+**`CLAUDE.md`** — append the language to the "Supported Languages" line so the
+LLM-readable architecture doc stays in sync.
+
+---
+
+## 4. Type-check before writing the extractor
+
+Run `npx tsc --noEmit` now. If it's not clean, the wiring is wrong — fix that
+before adding extraction logic, otherwise type errors will pile up.
+
+---
+
+## 5a. Path A — Plug into `LanguageExtractor`
+
+Use this when the language has named function/class/method declarations (Python, Ruby,
+Java, R, etc.). Create `src/extraction/languages/<lang>.ts`:
+
+```ts
+import type { LanguageExtractor } from '../tree-sitter-types';
+
+export const fooExtractor: LanguageExtractor = {
+  // Map AST node types → graph kinds. Empty array = "this kind doesn't
+  // exist in this language."
+  functionTypes: ['function_definition'],
+  classTypes: ['class_definition'],
+  methodTypes: ['function_definition'],   // often the same node, dispatched by context
+  interfaceTypes: [],
+  structTypes: [],
+  enumTypes: [],
+  typeAliasTypes: [],
+  importTypes: ['import_statement'],
+  callTypes: ['call'],
+  variableTypes: ['assignment'],
+
+  // Field names tree-sitter exposes for extractors to read.
+  nameField: 'name',
+  bodyField: 'body',
+  paramsField: 'parameters',
+  returnField: 'return_type',
+
+  // Optional hooks — implement what you need:
+  getSignature: (node, source) => { /* ... */ },
+  isExported: (node, source) => { /* ... */ },
+  isAsync: (node) => { /* ... */ },
+
+  // Escape hatch: take over a specific node type entirely. Return true to
+  // tell the core "I handled this, skip default dispatch."
+  visitNode: (node, ctx) => {
+    // R uses this to handle `name <- function() {}` because tree-sitter's
+    // function_definition has no name field — the name is on the LHS of
+    // the enclosing assignment.
+    return false;
+  },
+};
+```
+
+Reference it from your `LanguageDef` (Section 3a):
+
+```ts
+// in src/extraction/languages/foo.ts
+export const FOO_DEF: LanguageDef = {
+  name: 'foo',
+  // ...
+  grammar: { wasmFile: 'tree-sitter-foo.wasm', vendored: true, extractor: fooExtractor },
+};
+```
+
+The core (`TreeSitterExtractor` in `src/extraction/tree-sitter.ts`) does the rest:
+walks the AST, dispatches based on your `*Types` arrays, calls your hooks, manages
+the scope stack, and emits nodes/edges.
+
+**Worked example: R** (`src/extraction/languages/r.ts`). R's `function_definition`
+has no name (it's anonymous), so `functionTypes` is empty and the `visitNode` hook
+intercepts `binary_operator` assignments and emits the function manually via
+`ctx.createNode('function', name, ...)`.
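+
+The name-recovery part of such a hook can be sketched as below. This is not the
+code in `r.ts`: `AstNode` is a minimal stand-in for the tree-sitter node
+interface, `fakeNode` exists only so the sketch runs without a grammar, and the
+`lhs` / `rhs` field names are assumptions to confirm against your probe output.
+
+```ts
+// Minimal stand-in for the parts of a tree-sitter node used here.
+interface AstNode {
+  type: string;
+  text: string;
+  childForFieldName(name: string): AstNode | null;
+}
+
+// Hypothetical helper: if `node` is an assignment whose right-hand side is a
+// function definition, return the assigned name; otherwise return null.
+// The 'lhs' / 'rhs' field names are assumptions, not taken from r.ts.
+function functionNameFromAssignment(node: AstNode): string | null {
+  if (node.type !== 'binary_operator') return null;
+  const lhs = node.childForFieldName('lhs');
+  const rhs = node.childForFieldName('rhs');
+  if (!lhs || !rhs) return null;
+  if (lhs.type !== 'identifier' || rhs.type !== 'function_definition') return null;
+  return lhs.text;
+}
+
+// Tiny fake-node factory so the sketch runs without a grammar loaded.
+function fakeNode(type: string, text: string, fields: Record<string, AstNode> = {}): AstNode {
+  return { type, text, childForFieldName: (name) => fields[name] ?? null };
+}
+
+const assignment = fakeNode('binary_operator', 'add <- function(a, b) a + b', {
+  lhs: fakeNode('identifier', 'add'),
+  rhs: fakeNode('function_definition', 'function(a, b) a + b'),
+});
+```
+
+A real hook would also check the operator token and then emit the node through
+the context object instead of returning the name.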
+
+## 5b. Path B — Custom extractor class
+
+Use this when the language is declarative (HCL, SQL, dbt) or has a fundamentally
+different shape than functions/classes/methods (Liquid templates, Pascal `.dfm` form
+files). Create `src/extraction/<lang>-extractor.ts`:
+
+```ts
+import { Node, Edge, ExtractionResult, ExtractionError, UnresolvedReference } from '../types';
+import { generateNodeId, getNodeText } from './tree-sitter-helpers';
+import { getParser } from './grammars';
+
+export class FooExtractor {
+  private filePath: string;
+  private source: string;
+  private nodes: Node[] = [];
+  private edges: Edge[] = [];
+  private unresolvedReferences: UnresolvedReference[] = [];
+  private errors: ExtractionError[] = [];
+
+  constructor(filePath: string, source: string) {
+    this.filePath = filePath;
+    this.source = source;
+  }
+
+  extract(): ExtractionResult {
+    const startTime = Date.now();
+    const parser = getParser('foo');
+    if (!parser) {
+      this.errors.push({ message: 'foo grammar not loaded', severity: 'error', code: 'grammar_unavailable' });
+      return this.result(startTime);
+    }
+    const tree = parser.parse(this.source);
+    if (!tree) { /* ... */ return this.result(startTime); }
+
+    try {
+      const fileNodeId = this.createFileNode();
+      // Walk the AST, emit nodes via this.nodes.push and this.edges.push
+      // Emit references via this.unresolvedReferences.push so the resolver
+      // pass can match them across files.
+      // ...
+      return this.result(startTime);
+    } finally {
+      tree.delete();   // ← important: tree-sitter trees are backed by WASM memory and must be freed explicitly
+    }
+  }
+
+  private result(startTime: number): ExtractionResult {
+    return {
+      nodes: this.nodes,
+      edges: this.edges,
+      unresolvedReferences: this.unresolvedReferences,
+      errors: this.errors,
+      durationMs: Date.now() - startTime,
+    };
+  }
+}
+```
+
+Wire the dispatch via `customExtractor` in your `LanguageDef` (Section 3a):
+
+```ts
+// in src/extraction/languages/foo.ts
+import { FooExtractor } from '../foo-extractor';
+import type { LanguageDef } from './types';
+
+export const FOO_DEF: LanguageDef = {
+  name: 'foo',
+  displayName: 'Foo',
+  extensions: ['.foo'],
+  includeGlobs: ['**/*.foo'],
+  // For languages that need a tree-sitter parser AND a custom extractor
+  // (HCL, SQL): set both `grammar` and `customExtractor`. The grammar
+  // entry only registers the wasm so the parser is available; the
+  // customExtractor takes the dispatch.
+  grammar: { wasmFile: 'tree-sitter-foo.wasm', vendored: true, extractor: { /* skeleton */ } },
+  customExtractor: (filePath, source) => new FooExtractor(filePath, source).extract(),
+};
+```
+
+The dispatch in `src/extraction/tree-sitter.ts` reads `customExtractor` off
+the language def — no per-language `if` branches to maintain.
+
+**Worked examples:**
+
+- `src/extraction/hcl-extractor.ts` — Terraform / HCL. Block-based DDL. Each
+  top-level block becomes a node whose qualified name matches the Terraform
+  reference form (`var.X`, `local.X`, `module.X`, `aws_s3_bucket.foo`) so the
+  resolver can match references across files automatically.
+- `src/extraction/sql-extractor.ts` — SQL DDL. CREATE TABLE / VIEW / FUNCTION /
+  TRIGGER / TYPE / SCHEMA → graph nodes; foreign keys, view source tables,
+  trigger target tables and executed functions → edges.
+- `src/extraction/liquid-extractor.ts` — Shopify Liquid templates. Regex-based
+  (no tree-sitter) since the template grammar isn't useful for code intelligence.
+
+---
+
+## 6. Pick `NodeKind` and `EdgeKind` values
+
+`NodeKind` and `EdgeKind` are fixed unions in `src/types.ts`. Map your language's
+constructs onto the closest existing kind rather than introducing new ones —
+adding a new kind is a cross-cutting change that touches search, resolution, and
+context-building code.
+
+Common mappings used by recent extractors:
+
+| Language construct | NodeKind |
+|---|---|
+| Function / procedure / standalone routine | `function` |
+| Method on a class | `method` |
+| Class / type / table / declarative resource | `class` |
+| Trait / mixin | `trait` |
+| Interface / protocol | `interface` |
+| Module / package / file-level scope / Terraform module | `module` |
+| Namespace / schema / SQL schema / Terraform provider | `namespace` |
+| Variable / Terraform variable | `variable` |
+| Constant / Terraform local / R top-level binding | `constant` |
+| Type alias / SQL composite type | `type_alias` |
+| Enum (any) | `enum` |
+| Import / library / source / require | `import` |
+| Output / re-export / Terraform output | `export` |
+
+Edges are usually one of:
+
+| Edge | When |
+|---|---|
+| `contains` | Parent contains child (file → block, class → method) |
+| `calls` | Function/method invokes another |
+| `imports` | File pulls in another module/file |
+| `references` | Generic mention of another symbol (FK, lookup, attribute access) |
+| `extends` / `implements` | Inheritance relationships |
+
+Emit references through `unresolvedReferences` (with `referenceName` set to a
+qualified name that matches what you put on the target node's `qualifiedName`) —
+the resolver pass matches them across files using the `name-matcher` and
+`import-resolver` modules.
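+
+The contract is that the string in `referenceName` equals the target's
+`qualifiedName`. A minimal sketch of that match follows. The `Mini*` types, the
+`fromNodeId` field, and `resolveByQualifiedName` are stand-ins invented for this
+example; the real types and resolver modules carry more fields and logic.
+
+```ts
+// Simplified stand-ins; the real types in src/types.ts have more fields.
+interface MiniNode { id: string; qualifiedName: string; }
+interface MiniReference { fromNodeId: string; referenceName: string; }
+interface MiniEdge { source: string; target: string; kind: 'references'; }
+
+// Hypothetical resolver: exact match on qualified name only.
+function resolveByQualifiedName(nodes: MiniNode[], refs: MiniReference[]): MiniEdge[] {
+  const byName = new Map<string, MiniNode>();
+  for (const node of nodes) byName.set(node.qualifiedName, node);
+
+  const edges: MiniEdge[] = [];
+  for (const ref of refs) {
+    const target = byName.get(ref.referenceName);
+    // References with no matching node stay unresolved and emit no edge.
+    if (target) edges.push({ source: ref.fromNodeId, target: target.id, kind: 'references' });
+  }
+  return edges;
+}
+
+const resolved = resolveByQualifiedName(
+  [{ id: 'n1', qualifiedName: 'var.region' }],
+  [
+    { fromNodeId: 'n2', referenceName: 'var.region' },
+    { fromNodeId: 'n2', referenceName: 'var.missing' },
+  ],
+);
+```
+
+If the two strings differ in any way (prefix, casing, separator), the reference
+simply never resolves, so it is worth asserting on resolved edges in tests.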
+
+---
+
+## 7. Tests
+
+Tests live in `__tests__/extraction.test.ts`, grouped by language with a
+`describe('<Language> Extraction', ...)` block. Use `extractFromSource` directly
+for unit-style tests:
+
+```ts
+import { detectLanguage, extractFromSource } from '../src/extraction'; // assumes detectLanguage is exported here as well
+
+describe('Foo Extraction', () => {
+  describe('Language detection', () => {
+    it('should detect Foo files', () => {
+      expect(detectLanguage('main.foo')).toBe('foo');
+    });
+  });
+
+  describe('Function extraction', () => {
+    it('should extract a top-level function', () => {
+      const code = `function add(a, b) a + b`;
+      const result = extractFromSource('main.foo', code);
+      const fn = result.nodes.find((n) => n.kind === 'function' && n.name === 'add');
+      expect(fn).toBeDefined();
+    });
+  });
+});
+```
+
+Cover the AST shapes you saw in the probe, especially the surprising ones. Pay
+particular attention to:
+
+- The smallest possible valid program (`expect(...).toBeDefined()` for the file node)
+- Each node-kind mapping (one test per emitted kind)
+- Reference forms (call edges, FK / cross-file references, imports)
+- Anything you intentionally skipped (anonymous lambdas, dynamic imports, etc.)
+  with a negative assertion so the omission is documented
+
+Run the suite serialized to avoid the file-watcher tests' parallel flakiness:
+
+```bash
+npx vitest run --no-file-parallelism
+```
+
+End-to-end smoke test from a fresh fixture before opening the PR:
+
+```bash
+SMOKE=$(mktemp -d) && cat > "$SMOKE/main.foo" <<'EOF'
+... realistic input ...
+EOF
+cd "$SMOKE" && git init -q
+node <repo>/dist/bin/codegraph.js init "$SMOKE"
+node <repo>/dist/bin/codegraph.js index "$SMOKE"
+node <repo>/dist/bin/codegraph.js status "$SMOKE"
+cd "$SMOKE" && node <repo>/dist/bin/codegraph.js query "<symbol>"
+```
+
+The `status` call should report your file under "Files by Language", and `query`
+should turn up the symbols you expect at the right line numbers.
+
+---
+
+## 8. Open the PR
+
+Include in the PR description:
+
+- The grammar source + version + license + sha256 (if vendored)
+- A small worked example showing what gets extracted
+- The full test plan (`npm test`, `tsc`, `npm run build`, CLI smoke)
+- Any known limitations (constructs not supported, AST quirks, things the grammar
+  itself can't parse)
+
+Don't claim support for constructs the grammar can't actually parse — this happens
+more often than you'd expect (e.g. `tree-sitter-sql` errors out on `CREATE
+PROCEDURE` because procedure-body syntax varies sharply across dialects). Say what
+works, say what doesn't, and let reviewers decide.
+
+---
+
+## Reference: existing extractors as templates
+
+Read these in source order if your language is similar to one of them:
+
+- **Procedural / OO:** `src/extraction/languages/python.ts` (small, easy to read),
+  `ruby.ts` (with bare-call detection), `kotlin.ts` (extension functions),
+  `r.ts` (no `def` keyword — uses `visitNode` hook for assignments)
+- **Declarative / config:** `src/extraction/hcl-extractor.ts` (Terraform reference
+  graph), `sql-extractor.ts` (DDL with FK / view source extraction)
+- **Embedded / template:** `src/extraction/svelte-extractor.ts` (delegates to JS
+  for `<script>` blocks), `liquid-extractor.ts` (regex-based, no tree-sitter)
+- **Form / non-tree-sitter:** `src/extraction/dfm-extractor.ts` (Delphi `.dfm`
+  files; line-based regex parser cross-linked with Pascal symbols)
+
+When in doubt, copy the extractor closest in shape to yours and modify from there.