CodeWell Release Readiness

This report summarizes whether CodeWell is ready for public release.

For the next-phase architecture plan beyond this release snapshot, see docs/ARCHITECTURE_V2.md.

Decision

CodeWell is ready for local maintainer testing, but not ready for a public launch yet.

The core package, CLI, MCP entry points, fixture evaluations, self-evaluations, and package smoke checks pass the current release gate. The first open-source evaluation is complete and now passes all committed checks.

Last Local Gate

Last verified locally on May 17, 2026 with:

python scripts/check_release.py

Result: passed.

The gate was re-verified after fixing ruff and mypy regressions that had temporarily broken it earlier the same day.

The gate covers:

  • CodeWell self-evaluation tasks
  • natural-language retrieval evaluation tasks
  • acceptance evaluation tasks
  • fixture project evaluation
  • TypeScript fixture evaluation
  • TypeScript barrel evaluation
  • TypeScript extended evaluation
  • TypeScript arrow evaluation
  • TypeScript routes evaluation
  • TypeScript command evaluation
  • TypeScript test evaluation
  • TypeScript reexport evaluation
  • package build and installed CLI smoke test
  • unit tests
  • ruff linting
  • mypy type checking
  • pip dependency consistency
  • Git diff whitespace checks
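
The gate can be pictured as a list of commands that must all exit with status 0. The sketch below is illustrative only and does not reproduce scripts/check_release.py: the `STEPS` list, the `run_command` and `run_gate` helpers, and the exact command lines are assumptions. It covers only the last four checks in the list above and omits the evaluation, packaging, and unit-test steps.

```python
import subprocess
import sys
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]

# Hypothetical subset of the gate. The real script may use different
# commands, arguments, and ordering.
STEPS: list[tuple[str, list[str]]] = [
    ("ruff linting", ["ruff", "check", "."]),
    ("mypy type checking", ["mypy", "."]),
    ("pip dependency consistency", [sys.executable, "-m", "pip", "check"]),
    ("Git diff whitespace checks", ["git", "diff", "--check"]),
]


def run_command(command: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(command), check=False).returncode


def run_gate(
    steps: Sequence[tuple[str, Sequence[str]]],
    runner: Runner = run_command,
) -> list[str]:
    """Run every step and return the names of the steps that failed."""
    failed = []
    for name, command in steps:
        code = runner(command)
        print(f"{'ok' if code == 0 else 'FAILED'}: {name}")
        if code != 0:
            failed.append(name)
    return failed
```

A caller could finish with `sys.exit(1 if run_gate(STEPS) else 0)`. Running every step instead of stopping at the first failure is a design choice in this sketch: it reports all broken checks in one pass, at the cost of a longer run.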

Current Feature Coverage

Implemented:

  • local workspace indexing for Python, TypeScript, and JavaScript source files
  • incremental re-indexing for changed and deleted files
  • SQLite schema migrations, FTS5 search, status, and overview commands
  • symbol extraction for Python files using the standard-library ast parser
  • initial symbol, import, and direct-call extraction for TypeScript and JavaScript files
  • direct call-edge tracing
  • JSON context packs with budget-aware selected files, symbols, imports, call edges, traces, and revision memory
  • context-pack graph expansion for related support files through imports, caller files, and callee definitions
  • static test, command, and route relationship expansion for Python context packs
  • relative-import graph expansion, barrel and re-export support, and common test, command, and route expansion for TypeScript and JavaScript context packs
  • multi-hop TS/JS graph expansion through alias imports, export * chains, and importer-side route-entry relationships
  • task-level TS/JS evaluation coverage for route-entry expansion, command-entry expansion, test-entry expansion, and named re-export chains
  • local GitHub-backed TS/JS real-project evaluation coverage for:
    • Express + Prisma route/controller/service/repository/schema/test flows
    • React + Vite auth route/form/api/test/protected-route flows
    • NestJS clean-architecture controller/use-case/factory/CRM/e2e flows
    • NestJS boilerplate auth/service/guard/interceptor/bootstrap flows
    • AdonisJS controller-validator/auth-middleware/kernel-registration flows
  • actionable CLI guidance for missing databases, unsupported URLs, and empty searches
  • copy-paste installed-user first-run examples for verifying the core loop locally
  • stable JSON examples for CLI trace/context output and MCP search/context/revision-memory tool results
  • local read-only UI for repository status, lexical search, revision-memory search, ingest history, detached workspace health, repair queue state, repair-audit summaries, provenance, and raw/derived boundary inspection
  • detached-library initialization with codewell init-library
  • detached-library indexing with --library-root so raw source trees can remain unchanged
  • managed detached-library intake for external ZIP archives, source folders, bare code files, papers, and loose documents
  • read-only protection for managed imported code, with best-effort Windows ACL hardening
  • optional auto-indexing immediately after intake import
  • detached-library repair planning, audit inspection, and admin/ops status views
  • revision-memory applicability scoring with local match signals and warnings for weak or stale fixes
  • public GitHub repository archive ingest with local caching and provenance metadata
  • token-authenticated GitHub ingest for private repositories without storing credentials in provenance metadata
  • intake-imported papers/documents surfaced as lightweight relevant references in context packs and MCP
  • revision memory for failures, fixes, verification state, and applicability notes
  • CLI commands for indexing, search, trace, context, status, overview, ingest history, detached library initialization, repair planning, repair-audit inspection, revision recording/search, UI serving, and MCP serving
  • MCP tools over the same local engine
  • fixture, self, natural-language, acceptance, package, and multi-project evaluation harnesses
  • GitHub Actions workflow for the full release gate
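
As an illustration of the Python symbol extraction listed above, the following minimal sketch walks a module with the standard-library ast parser and records class and function definitions. It is not CodeWell's extractor: the `extract_symbols` name and the `(kind, qualified_name, line)` tuple shape are assumptions made for this example.

```python
import ast


def extract_symbols(source: str) -> list[tuple[str, str, int]]:
    """Return (kind, qualified_name, line) for each class and function definition."""
    symbols: list[tuple[str, str, int]] = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "function"
            elif isinstance(child, ast.ClassDef):
                kind = "class"
            else:
                # Keep descending so definitions nested in if/try blocks are found.
                visit(child, prefix)
                continue
            name = f"{prefix}{child.name}"
            symbols.append((kind, name, child.lineno))
            # Definitions nested in this one get a dotted, qualified name.
            visit(child, f"{name}.")

    visit(ast.parse(source), "")
    return symbols


SAMPLE = '''
class Indexer:
    def run(self):
        pass

async def main():
    pass
'''

for kind, name, line in extract_symbols(SAMPLE):
    print(f"{line}: {kind} {name}")
```

Because ast only parses the source and never imports or executes it, this kind of extraction stays safe to run on untrusted code.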

Not implemented:

  • optional embeddings or reranking
  • richer local UI views for files, symbol details, failures, and graph relationships
  • richer route, command, test, import, caller, and callee relationships beyond the current direct graph, especially outside Python

Known Limitations

  • Retrieval quality is now checked against fixtures, CodeWell itself, three open-source Python libraries, multiple bundled TypeScript task fixtures, and five local GitHub-backed TS/JS repositories, but the maintained TS/JS real-project set is still small.
  • The latest Claude Code A/B automation is operational, but some recent TS/JS agent-eval runs are not formal evidence because older source trees under eval-repos/github/ were found to be polluted. Clean baselines must come from eval-repos/source-readonly/.
  • The next V2 repeated-run check is temporarily blocked by the external Claude runtime environment: local tests pass, but outbound HTTPS connectivity for the configured evaluation runtime is currently failing.
  • Context pack explanations cover selected files, but they are still heuristic and should be improved as richer graph relationships are added.
  • Retrieval of intake-imported papers and documents works at a lightweight level, but PDF-specific paper extraction is still shallow; title, abstract, and keyword extraction is not implemented yet.
  • Call edges are static and direct; dynamic dispatch, framework routes, command handlers, and test relationships are not modeled generally, and TS/JS parsing is still intentionally shallow beyond common declaration and import forms.
  • Public GitHub ingest downloads archive snapshots and is not optimized for very large repositories.
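
The call-edge limitation above can be seen in a small example. The sketch below is a hypothetical stand-in, not CodeWell's tracer: `direct_call_edges` records only calls made through a plain name inside top-level functions, so a call routed through a lookup table produces no edge.

```python
import ast


def direct_call_edges(source: str) -> set[tuple[str, str]]:
    """Return (caller, callee) pairs for calls made by plain name in top-level functions."""
    edges: set[tuple[str, str]] = set()
    for func in ast.parse(source).body:
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            # Only `name(...)` is treated as a direct call; `table[key](...)`
            # and `obj.method(...)` are skipped in this sketch.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                edges.add((func.name, node.func.id))
    return edges


SAMPLE = '''
def save(record):
    return record

def handle(record):
    return save(record)

def dispatch(handlers, name, record):
    return handlers[name](record)
'''

print(direct_call_edges(SAMPLE))
```

Only the `handle` to `save` edge is found. The call inside `dispatch` depends on the runtime contents of `handlers`, which is the kind of relationship a static, direct graph cannot recover without extra modeling.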

Required Evidence Before Public Launch

Before publishing the repository as a public project, decide whether the current pre-alpha scope is acceptable for the launch audience.

The current maintained real-project evidence now includes:

  • PyPI-backed open-source evaluation for Click, Requests, and Rich with 9 total tasks and 45 of 45 checks passing
  • local GitHub-backed evaluation for five TS/JS repositories, with 20 total tasks and 98 of 98 checks passing: express-typescript-prisma-postgresql, bulletproof-react (apps/react-vite), nestjs-clean-architecture, nestjs-boilerplate, and adonisjs-starter-kits (api-monorepo/apps/backend)

Agent A/B evidence note:

  • recent Claude Code pair runs are useful for pipeline validation, but any run copied from the old eval-repos/github/ source tree should not be treated as formal evidence until the clean baseline is rebuilt under eval-repos/source-readonly/
  • a current internal small-sample summary now lives in docs/AGENT_EVAL_SUMMARY.md; treat it as directional product evidence, not final public benchmark evidence
  • the current V1-facing evaluation interpretation is summarized in docs/V1_EVALUATION_NOTE.md and should be used for product-scope wording rather than any single run or partial subset
  • the current freeze-candidate baseline scope is summarized in docs/V1_FREEZE_CANDIDATE.md and should be used when deciding how to snapshot V1 separately from the public launch decision

Next Work

Highest priority:

  • broaden retrieval evaluation coverage before public launch
  • add another non-Nest framework-heavy TS backend target beyond the current maintained set
  • compare the current five GitHub-backed TS/JS targets against one additional non-trivial framework project before making broader release claims

Immediate continuation tasks:

  1. Restore a working outbound Claude evaluation runtime path so the pending V2 repeated-run check can complete.
  2. Re-run the current three-task strong_multi_goal V2 slice after the latest default-output tightening change.
  3. Re-check graph precision after raising TS/JS graph-expansion depth and candidate limits.
  4. Rebuild and verify clean eval-repos/source-readonly baselines before recording more formal Claude Code A/B evidence.
  5. Add another TS backend project outside the current NestJS-heavy set, ideally with plugin, hook, command, job, or service/repository conventions that differ from the current Express, NestJS, and AdonisJS samples.
  6. Add mixed named re-export plus export * task coverage where real-project evidence justifies it.
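
To make the last item concrete, the sketch below models what a mixed named re-export plus export * task would exercise: finding the module that actually defines a name exported from a barrel file. It is a simplified Python model under stated assumptions, not CodeWell's resolver. Modules are keyed by their import specifier, path resolution is skipped, and the regular expressions cover only a few common export forms. The `resolve_export` helper and the sample module contents are hypothetical.

```python
import re
from typing import Optional

NAMED_REEXPORT = re.compile(r"export\s*\{([^}]*)\}\s*from\s*['\"]([^'\"]+)['\"]")
STAR_REEXPORT = re.compile(r"export\s*\*\s*from\s*['\"]([^'\"]+)['\"]")
LOCAL_EXPORT = re.compile(
    r"export\s+(?:async\s+)?(?:function|const|let|var|class)\s+([A-Za-z_$][\w$]*)"
)


def resolve_export(
    modules: dict[str, str],
    module: str,
    name: str,
    seen: frozenset = frozenset(),
) -> Optional[str]:
    """Return the module key that defines `name`, following re-exports, or None."""
    if module in seen or module not in modules:
        return None  # unknown module, or a re-export cycle
    seen = seen | {module}
    text = modules[module]
    if name in LOCAL_EXPORT.findall(text):
        return module
    # Named re-exports: `export { a, b as c } from './x'`.
    for names, target in NAMED_REEXPORT.findall(text):
        for item in names.split(","):
            parts = item.split(" as ")
            original, exported = parts[0].strip(), parts[-1].strip()
            if exported == name:
                return resolve_export(modules, target, original, seen)
    # Star re-exports: `export * from './x'`.
    for target in STAR_REEXPORT.findall(text):
        found = resolve_export(modules, target, name, seen)
        if found is not None:
            return found
    return None


MODULES = {
    "./index": "export * from './services'\nexport { login as signIn } from './auth'\n",
    "./services": "export { createUser } from './user-service'\n",
    "./user-service": "export function createUser() {}\n",
    "./auth": "export async function login() {}\n",
}

print(resolve_export(MODULES, "./index", "createUser"))
print(resolve_export(MODULES, "./index", "signIn"))
```

Here `createUser` is reached through an export * hop followed by a named re-export, and `signIn` through an aliased named re-export. The `seen` set stops the walk if two barrels re-export each other.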

Important implementation note:

The current TS/JS direction is intentionally conservative. Continue preferring cheap, explicit static relationships over broad framework-specific inference or required LLM assistance.

Public release can wait until the remaining P1 developer-experience and retrieval-quality items are either implemented or explicitly accepted as pre-alpha limitations.