This report summarizes whether CodeWell is ready for public release.
For the next-phase architecture plan beyond this release snapshot, see docs/ARCHITECTURE_V2.md.
CodeWell is ready for local maintainer testing, but not ready for a public launch yet.
The core package, CLI, MCP entry points, fixture evaluations, self-evaluations, and package smoke checks pass the current release gate. The first open-source evaluation is complete and now passes all committed checks.
Last verified locally on May 17, 2026 with:
python scripts/check_release.py
Result: passed.
This was re-verified after fixing the ruff and mypy regressions that had temporarily broken the gate earlier on May 17, 2026.
The gate covers:
- CodeWell self-evaluation tasks
- natural-language retrieval evaluation tasks
- acceptance evaluation tasks
- fixture project evaluation
- TypeScript fixture evaluation
- TypeScript barrel evaluation
- TypeScript extended evaluation
- TypeScript arrow evaluation
- TypeScript routes evaluation
- TypeScript command evaluation
- TypeScript test evaluation
- TypeScript reexport evaluation
- package build and installed CLI smoke test
- unit tests
- ruff linting
- mypy type checking
- pip dependency consistency
- Git diff whitespace checks
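Each gate item above is a separate pass/fail step, and the gate as a whole passes only if every step passes. The sketch below shows that pattern. It assumes each check is a command whose exit status signals the result. The Check and run_gate names and the example commands are hypothetical illustrations, not the contents of scripts/check_release.py.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Check:
    """One gate step: a display name plus the command that runs it (hypothetical)."""

    name: str
    argv: list[str]


def run_gate(checks: list[Check]) -> list[str]:
    """Run every check in order and return the names of the checks that failed."""
    failed: list[str] = []
    for check in checks:
        try:
            # A check passes when its process exits with status 0; output is captured.
            code = subprocess.run(check.argv, capture_output=True, text=True).returncode
        except OSError:
            # Treat a missing tool (for example, git not on PATH) as a failed check.
            code = -1
        ok = code == 0
        print(f"[{'ok' if ok else 'FAILED'}] {check.name}")
        if not ok:
            failed.append(check.name)
    return failed


# Hypothetical check list; the real gate's commands and flags may differ.
EXAMPLE_CHECKS = [
    Check("unit tests", [sys.executable, "-m", "pytest", "-q"]),
    Check("ruff linting", [sys.executable, "-m", "ruff", "check", "."]),
    Check("mypy type checking", [sys.executable, "-m", "mypy", "."]),
    Check("pip dependency consistency", [sys.executable, "-m", "pip", "check"]),
    Check("Git diff whitespace checks", ["git", "diff", "--check"]),
]


def main() -> int:
    """Run the example checks and return a process exit code."""
    failures = run_gate(EXAMPLE_CHECKS)
    print("Result: passed." if not failures else f"Result: failed ({', '.join(failures)}).")
    return 1 if failures else 0
```

The runner executes every check rather than stopping at the first failure, so one run reports all regressions together. The ruff and mypy breakages noted above are an example of failures that would appear side by side. A script entry point would typically end with raise SystemExit(main()) under an if __name__ == "__main__": guard.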
Implemented:
- local workspace indexing for Python, TypeScript, and JavaScript source files
- incremental re-indexing for changed and deleted files
- SQLite schema migrations, FTS5 search, status, and overview commands
- symbol extraction for Python files using the standard-library ast parser
- initial symbol, import, and direct-call extraction for TypeScript and JavaScript files
- direct call-edge tracing
- JSON context packs with budget-aware selected files, symbols, imports, call edges, traces, and revision memory
- context-pack graph expansion for related support files through imports, caller files, and callee definitions
- static test, command, and route relationship expansion for Python context packs
- relative-import graph expansion, barrel and re-export support, and common test, command, and route expansion for TypeScript and JavaScript context packs
- multi-hop TS/JS graph expansion through alias imports, export * chains, and importer-side route-entry relationships
- task-level TS/JS evaluation coverage for route-entry expansion, command-entry expansion, test-entry expansion, and named re-export chains
- local GitHub-backed TS/JS real-project evaluation coverage for:
  - Express + Prisma route/controller/service/repository/schema/test flows
  - React + Vite auth route/form/api/test/protected-route flows
  - NestJS clean-architecture controller/use-case/factory/CRM/e2e flows
  - NestJS boilerplate auth/service/guard/interceptor/bootstrap flows
  - AdonisJS controller-validator/auth-middleware/kernel-registration flows
- actionable CLI guidance for missing databases, unsupported URLs, and empty searches
- copy-paste installed-user first-run examples for verifying the core loop locally
- stable JSON examples for CLI trace/context output and MCP search/context/revision-memory tool results
- local read-only UI for repository status, lexical search, revision-memory search, ingest history, detached workspace health, repair queue state, repair-audit summaries, provenance, and raw/derived boundary inspection
- detached-library initialization with codewell init-library
- detached-library indexing with --library-root so raw source trees can remain unchanged
- managed detached-library intake for external ZIP archives, source folders, bare code files, papers, and loose documents
- read-only protection for managed imported code, with best-effort Windows ACL hardening
- optional auto-indexing immediately after intake import
- detached-library repair planning, audit inspection, and admin/ops status views
- revision-memory applicability scoring with local match signals and warnings for weak or stale fixes
- public GitHub repository archive ingest with local caching and provenance metadata
- token-authenticated GitHub ingest for private repositories without storing credentials in provenance metadata
- intake-imported papers/documents surfaced as lightweight relevant references in context packs and MCP
- revision memory for failures, fixes, verification state, and applicability notes
- CLI commands for indexing, search, trace, context, status, overview, ingest history, detached library initialization, repair planning, repair-audit inspection, revision recording/search, UI serving, and MCP serving
- MCP tools over the same local engine
- fixture, self, natural-language, acceptance, package, and multi-project evaluation harnesses
- GitHub Actions workflow for the full release gate
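The standard-library ast module can handle both Python symbol extraction and direct call-edge tracing from the list above. The sketch below takes one module's source as a string. It records classes, functions, and methods with their line numbers, plus (caller, callee) pairs for calls written as a plain name or an attribute. The extract helper and its output shape are hypothetical, not CodeWell's indexer or its SQLite schema.

```python
import ast

Symbol = tuple[str, str, int]  # (kind, qualified name, line number)
Edge = tuple[str, str]         # (caller qualified name, callee name as written)


def extract(source: str) -> tuple[list[Symbol], set[Edge]]:
    """Collect definitions and direct call edges from one Python module."""
    symbols: list[Symbol] = []
    edges: set[Edge] = set()

    def visit(node: ast.AST, scope: list[str]) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                kind = "class" if isinstance(child, ast.ClassDef) else "function"
                symbols.append((kind, ".".join([*scope, child.name]), child.lineno))
                # Everything nested in this definition (including decorators)
                # is attributed to its scope.
                visit(child, [*scope, child.name])
                continue
            if isinstance(child, ast.Call) and scope:
                caller = ".".join(scope)
                if isinstance(child.func, ast.Name):         # foo(...)
                    edges.add((caller, child.func.id))
                elif isinstance(child.func, ast.Attribute):  # obj.foo(...)
                    edges.add((caller, child.func.attr))
                # Any other callee expression, e.g. getattr(...)(), is skipped.
            visit(child, scope)

    visit(ast.parse(source), [])
    return symbols, edges


SAMPLE = '''
def load(path):
    return parse(read(path))

class Service:
    def run(self, name):
        self.log("start")
        return getattr(self, name)()
'''

symbols, edges = extract(SAMPLE)
print(symbols)
# [('function', 'load', 2), ('class', 'Service', 5), ('function', 'Service.run', 6)]
print(sorted(edges))
# [('Service.run', 'getattr'), ('Service.run', 'log'), ('load', 'parse'), ('load', 'read')]
```

In the sample, the sketch never recovers the runtime target of getattr(self, name)(); the only recorded edge points at getattr itself. This is the static, direct-only behavior that this report lists as a limitation for dynamic dispatch. Attribute calls are recorded by method name only (self.log becomes log), so resolving the receiver's class would need extra analysis.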
Not implemented:
- optional embeddings or reranking
- richer local UI views for files, symbol details, failures, and graph relationships
- richer route, command, test, import, caller, and callee relationships beyond the current direct graph, especially outside Python
Known limitations:
- Retrieval quality is now checked against fixtures, CodeWell itself, three open-source Python libraries, multiple bundled TypeScript task fixtures, and five local GitHub-backed TS/JS repositories, but the maintained TS/JS real-project set is still small.
- The latest Claude Code A/B automation is operational, but some recent TS/JS agent-eval runs are not formal evidence because older source trees under eval-repos/github/ were found to be polluted. Clean baselines must come from eval-repos/source-readonly/.
- The next V2 repeated-run check is temporarily blocked by the external Claude runtime environment: local tests pass, but outbound HTTPS connectivity for the configured evaluation runtime is currently failing.
- Context pack explanations cover selected files, but they are still heuristic and should be improved as richer graph relationships are added.
- Intake-imported paper/document retrieval is now lightweight and useful, but PDF-specific paper extraction is still shallow; title/abstract/keyword extraction is not implemented yet.
- Call edges are static and direct; dynamic dispatch, framework routes, command handlers, and test relationships are not modeled generally, and TS/JS parsing is still intentionally shallow beyond common declaration and import forms.
- Public GitHub ingest downloads archive snapshots and is not optimized for very large repositories.
Before publishing the repository as a public project, decide whether the current pre-alpha scope is acceptable for the launch audience.
The maintained real-project evidence currently includes:
- PyPI-backed open-source evaluation for Click, Requests, and Rich with 9 total tasks and 45 of 45 checks passing
- local GitHub-backed evaluation for express-typescript-prisma-postgresql and bulletproof-react (apps/react-vite), plus nestjs-clean-architecture and nestjs-boilerplate, plus adonisjs-starter-kits (api-monorepo/apps/backend), with 20 total tasks and 98 of 98 checks passing
Agent A/B evidence note:
- recent Claude Code pair runs are useful for pipeline validation, but any run copied from the old eval-repos/github/ source tree should not be treated as formal evidence until the clean baseline is rebuilt under eval-repos/source-readonly/
- a current internal small-sample summary now lives in docs/AGENT_EVAL_SUMMARY.md; treat it as directional product evidence, not final public benchmark evidence
- the current V1-facing evaluation interpretation is summarized in docs/V1_EVALUATION_NOTE.md and should be used for product-scope wording rather than any single run or partial subset
- the current freeze-candidate baseline scope is summarized in docs/V1_FREEZE_CANDIDATE.md and should be used when deciding how to snapshot V1 separately from the public launch decision
Highest priority:
- broaden retrieval evaluation coverage before public launch
- add another non-NestJS, framework-heavy TS backend target beyond the current maintained set
- compare the current five GitHub-backed TS/JS targets against one additional non-trivial framework project before making broader release claims
Immediate continuation tasks:
- Restore a working outbound Claude evaluation runtime path so the pending V2 repeated-run check can complete.
- Re-run the current three-task strong_multi_goal V2 slice after the latest default-output tightening change.
- Re-check graph precision after raising TS/JS graph-expansion depth and candidate limits.
- Rebuild and verify clean eval-repos/source-readonly baselines before recording more formal Claude Code A/B evidence.
- Add another TS backend project outside the current NestJS-heavy set, ideally with plugin, hook, command, job, or service/repository conventions that differ from the current Express, NestJS, and AdonisJS samples.
- Add mixed named re-export plus export * task coverage where real-project evidence justifies it.
Important implementation note:
The current TS/JS direction is intentionally conservative. Continue preferring cheap, explicit static relationships over broad framework-specific inference or required LLM assistance.
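The sketch below illustrates what "cheap, explicit static relationships" can mean for TS/JS. One regular expression records relative imports, named re-exports, and export * edges, and a second step follows export * chains across an in-memory file map. This is a minimal sketch under stated assumptions, not CodeWell's TS/JS extractor:

- The function names, the .ts/.tsx/index.ts resolution order, and the sample file map are hypothetical.
- It ignores tsconfig path aliases, require(), and dynamic import().
- It can be fooled by import-like text inside block comments or template strings.

```python
import posixpath
import re
from collections import deque
from typing import Optional

# Static "import ... from" / "export ... from" with a quoted specifier. The clause
# may span lines but may not run into another import/export keyword.
STATIC_FROM = re.compile(
    r"""^[ \t]*(?P<kw>import|export)\b
        (?P<clause>(?:(?!\b(?:import|export)\b)[^'";])*?)
        \bfrom\s*['"](?P<spec>[^'"]+)['"]""",
    re.MULTILINE | re.VERBOSE,
)


def relationships(source: str) -> list[tuple[str, str]]:
    """Return (kind, specifier) pairs for relative imports and re-exports."""
    found = []
    for m in STATIC_FROM.finditer(source):
        spec = m.group("spec")
        if not spec.startswith(("./", "../")):
            continue  # package imports such as "react" are not followed here
        if m.group("kw") == "import":
            kind = "import"
        elif m.group("clause").strip() == "*":
            kind = "export-star"
        else:
            kind = "re-export"
        found.append((kind, spec))
    return found


def resolve(importer: str, spec: str, files: dict[str, str]) -> Optional[str]:
    """Map a relative specifier to a known file, trying a few assumed suffixes."""
    base = posixpath.normpath(posixpath.join(posixpath.dirname(importer), spec))
    for candidate in (base, base + ".ts", base + ".tsx", base + "/index.ts"):
        if candidate in files:
            return candidate
    return None


def export_star_closure(entry: str, files: dict[str, str]) -> list[str]:
    """Follow export * chains breadth-first from entry, skipping files already seen."""
    seen = [entry]
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for kind, spec in relationships(files[current]):
            if kind != "export-star":
                continue
            target = resolve(current, spec, files)
            if target is not None and target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


files = {
    "src/index.ts": 'export * from "./models";\nimport { log } from "./util/log";\nimport React from "react";\n',
    "src/models/index.ts": "export * from './user';\nexport { Order } from './order';\n",
    "src/models/user.ts": "export interface User { id: string }\n",
    "src/models/order.ts": "export class Order {}\n",
    "src/util/log.ts": "export function log() {}\n",
}

print(relationships(files["src/index.ts"]))
# [('export-star', './models'), ('import', './util/log')]
print(export_star_closure("src/index.ts", files))
# ['src/index.ts', 'src/models/index.ts', 'src/models/user.ts']
```

A regex pass like this is cheap and predictable but brittle. A parser-backed approach, such as the TypeScript compiler API, would be more exact at a higher cost, which is the trade-off the note above describes.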
Public release can wait until the remaining P1 developer-experience and retrieval-quality items are either implemented or explicitly accepted as pre-alpha limitations.