Skip to content

Codex/claude cli provider#180

Open
Yukoval-Dakia wants to merge 35 commits into
alibaba:mainfrom
Yukoval-Dakia:codex/claude-cli-provider
Open

Codex/claude cli provider#180
Yukoval-Dakia wants to merge 35 commits into
alibaba:mainfrom
Yukoval-Dakia:codex/claude-cli-provider

Conversation

@Yukoval-Dakia

@Yukoval-Dakia Yukoval-Dakia commented Jun 20, 2026

Copy link
Copy Markdown

Description

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactoring (no functional changes)
  • Documentation update
  • CI / Build / Tooling

How Has This Been Tested?

  • make test passes locally
  • Manual testing (describe below)

Checklist

  • My code follows the project's coding style (go fmt, go vet)
  • I have performed a self-review of my code
  • I have added tests that prove my fix is effective or my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the documentation accordingly (if applicable)
  • I have signed the CLA

Related Issues

Yukoval-Dakia and others added 30 commits June 6, 2026 13:21
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ci: add PR auto-review workflow (self-hosted + codex)
* feat(llm): add codex provider via official Codex CLI

Add a 'codex' protocol that delegates LLM calls to the locally
installed Codex CLI, using its own ChatGPT/API-key authentication
instead of requiring an HTTP endpoint and token.

- New CodexClient with two runtimes: 'exec' (codex exec, default)
  and 'app_server' (persistent codex app-server over JSON-RPC)
- Resolver: OCR_LLM_PROTOCOL=codex env var and llm.protocol /
  llm.codex_runtime config keys; URL/token no longer required for
  the codex protocol
- Runs codex with --sandbox read-only and --ephemeral
- Model falls back to the Codex CLI's own global config when unset
- Docs: README sections for subscription-based setup

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(llm): address review findings in codex provider

- app-server: bound startup/initialize with caller context + 30s cap;
  readLoop now fails pending requests and completions when the process
  exits, instead of hanging until external timeouts
- app-server: route notifications by threadId, drain stale events before
  each turn, and drop oldest (never newest) on buffer overflow so
  final-answer/turn-completed events survive
- app-server: best-effort turn/interrupt when the caller cancels
- client: detect dead app-server processes and restart on next call
  instead of reusing a stale pipe
- provider: reject empty tool_calls (requires explicit task_done) and
  non-object tool arguments at the provider boundary
- resolver/config: validate llm.protocol enum and honor explicit
  openai/anthropic values over the legacy use_anthropic toggle;
  validate llm.codex_runtime

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(llm): address round-2 review findings

- resolver: explicit OCR_LLM_PROTOCOL env now takes precedence over the
  config file (deliberate per-invocation override, e.g. CI)
- resolver: validate codex_runtime values and normalize aliases to
  canonical 'app_server'; typos now error instead of silently selecting
  the exec runtime
- client: StreamCompletion rejects tool requests instead of silently
  dropping tool-call responses

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(llm): address round-3 review findings

- resolver: explicit non-codex OCR_LLM_PROTOCOL without a full endpoint
  now fails fast instead of silently resolving the config file's
  (different) protocol
- resolver: codex_runtime carried inside extra_body is validated and
  normalized like the dedicated setting; non-string values rejected
- provider: tool-calls schema gains minItems:1 so schema-valid output
  cannot produce the empty array the parser rejects

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(llm): address round-4 review findings

- app-server: turn slot is a context-aware semaphore; waiters no longer
  block past their deadline behind a slow turn
- app-server: interrupt the turn when turn/start is canceled in flight
- app-server: readLoop reaps the process via cmd.Wait (no zombies),
  including the initialization-failure path
- app-server: turn/completed now requires positive threadId correlation
  (anonymous completion can no longer finish a turn with stale text);
  snake_case thread_id accepted
- app-server: malformed output schema surfaces an error instead of being
  silently dropped
- resolver: whitespace-only codex_runtime treated as unset before the
  extra_body fallback
- exec runtime: pin -c approval_policy=never so interactive user configs
  (e.g. on-request) cannot stall the non-interactive review loop

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(llm): address round-5 review findings (protocol verified live)

Verified against a live 'codex app-server' session (codex-cli 0.134.0):
turn/completed arrives as {threadId, turn:{id,...}} with threadId at the
top level, and agent messages are item type 'agentMessage' (camelCase) —
the existing type check is correct and stays unchanged.

- app-server: capture result.turn.id from turn/start and include turnId
  in turn/interrupt requests
- app-server: answer server-initiated JSON-RPC requests (id+method) with
  method-not-found instead of dropping them as unmatched responses,
  so the server is never left blocked waiting on us
- provider: retry once on schema-valid-but-empty tool_calls before
  failing the completion (the agent treats completion errors as fatal)
- provider: fence conversation/diff/tool-result content in UNTRUSTED
  markers with an explicit do-not-follow-instructions note, and place
  control instructions before the fenced data

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(llm): address round-6 review findings

- resolver: config-file llm.protocol now has the same fail-fast contract
  as OCR_LLM_PROTOCOL — an explicit non-codex protocol without a full
  endpoint errors instead of silently falling through to Claude env /
  shell rc providers
- app-server: turn/completed travels on a dedicated channel that is
  never subject to overflow dropping; remaining buffered items are
  drained at completion so a final answer cannot be missed
- provider: untrusted-content fence markers now carry a per-request
  random token, so review data cannot forge a closing marker and smuggle
  instructions outside the fence

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(llm): address round-7 review findings

- app-server: route server-initiated requests by raw JSON-RPC id so
  string ids also receive the method-not-found rejection; responses to
  ids we never issued are dropped explicitly
- app-server: item/completed now requires positive thread correlation
  like turn/completed (live traces show items always carry threadId);
  anonymous stragglers from a canceled turn can no longer be recorded
  as the current turn's answer

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(llm): address round-8 review findings

- provider: fence-token entropy failure now fails the completion cleanly
  (toolPrompt returns an error) instead of panicking the process
- app-server: on cancellation the active-turn slot is handed off to the
  interrupt goroutine and released only after the interrupt settles, so
  a new turn cannot overlap one that is still being stopped
- app-server: agentMessage items (the final answer) join turn/completed
  on the dedicated never-dropped channel, so a verbose turn can no
  longer evict the final answer while keeping the completion signal

The two anonymous-correlation findings were inaccurate: with
'a.threadID != "" && id != a.threadID', an empty id is rejected, and
TestCodexAppServerAccumulator{RejectsAnonymousItems,IgnoresOtherThreads}
pin that behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(llm): address round-9 review findings

- app-server: the completions channel enqueue is now also non-blocking
  with drop-oldest, so a noisy canceled turn can never stall the sole
  stdout reader and wedge response delivery (incl. turn/interrupt acks);
  only the latest final answer and completion signal matter
- app-server: on a scanner error the child is killed before cmd.Wait,
  otherwise a process blocked writing to an unread stdout would make
  Wait block forever and done would never close

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LLM endpoint now uses the OpenAI-compatible opencode-go gateway:
URL/model from repo variables, token from OCR_LLM_TOKEN secret.
Concurrency stays serialized for runner load, not Codex auth.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Was an uncommitted WIP edit; committed onto sync-upstream work branch so it
is preserved through the upstream merge. (OCR_LLM_PROTOCOL openai->anthropic)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#	.github/workflows/ocr-review.yml
#	README.md
#	README.zh-CN.md
#	cmd/opencodereview/config_cmd.go
#	cmd/opencodereview/config_cmd_test.go
#	internal/llm/resolver.go
#	internal/llm/resolver_test.go
- LlmComment gains Severity (blocker|major|minor|nit) + Confidence (0-1)
- code_comment tool schema requires both; ParseComments reads them
- output filtered to severity>=major && confidence>=0.7 by default;
  tune via OCR_MIN_SEVERITY / OCR_MIN_CONFIDENCE / OCR_DISABLE_SEVERITY_FILTER;
  drop count reported to stderr (never silently truncated)
- prompt: report only high-confidence actionable issues, no nits

Targets the low-precision/high-noise problem; complements the existing
REVIEW_FILTER_TASK fact-check pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per-file deterministic pre-pass that finds where a changed file's symbols are
referenced across the repo (go/parser for Go, embedded Node TS helper for TS;
no CGO) and injects a capped impact summary into the review context via a
generalized ContextProvider hook (reused later by cross-PR learnings).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
6 TDD tasks: reviewctx ContextProvider framework, impact analyzer types +
changed-line extraction, Go analyzer (go/parser), TS analyzer (embedded Node
helper, no CGO), CrossRefProvider (grep+confirm+capped summary), agent wiring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… comment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…po dir

- FileReviewInput gains a Ref field (reviewed git ref; "" = working tree)
- agent.renderExtraContext computes and threads the ref via tool.ParseReviewMode/RefValue
- CrossRefProvider.findRefs uses git grep <ref> when Ref is set, strips the
  "<ref>:<path>" prefix from matches, and reads file bodies via git show
- New readCandidate helper: os.ReadFile for working-tree, git show for ref mode
- tsAnalyzer carries repoDir so Node runs from the repo dir (fixing typescript
  require resolution); nodeHasTypeScript is now a method for the same reason
- Early total>=maxRefs break added at the top of the per-symbol loop
- New regression test TestCrossRefProviderRefMode proves ref-mode greps HEAD
  not the mutated working tree

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Follow-ups from the cross-ref impact review:
- M1: render the <cross_reference_impact> wrapper only when there is content,
  so non-Go/TS files (most files) no longer leave empty tags in the prompt.
  Template now carries the bare {{extra_context}} token.
- M3: cap probed changed-symbols at 25 per file (each costs one git-grep);
  a huge refactor file can't spawn hundreds of greps. Truncation is noted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per-review pipeline: workflow collects OCR inline comments' resolve/reply
state (GraphQL) -> OCR distills per-comment Learnings, embeds via BigModel,
persists to a local JSON-lines store -> LearningsProvider retrieves top-k by
cosine and injects past feedback into the prompt via the existing
ContextProvider hook. Split into two shippable phases (collect+store, then
retrieve+inject).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
5 tasks: Learning types + JSON-lines store (dedupe/soft-cap); BigModel
embedder; idempotent feedback.json ingest; env config + best-effort wiring
into review_cmd; workflow github-script GraphQL collector. No prompt
injection (Phase 2). TDD, no CGO, fully best-effort.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Final whole-branch review flagged the internal/learn import inserted out of
alphabetical order, which would fail a CI gofmt gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Yukoval-Dakia and others added 5 commits June 19, 2026 11:07
Recent PRs produced 0 comments. Local repro (real MAIN_TASK prompt + diff +
thinking 32k) showed the model analyzes thoroughly and explores via file_read
— it does not hide findings — but the 'only high-confidence / prefer
precision over coverage / do not report nits' prompt plus a major/0.7 filter
made it judge high-quality refactor PRs as 'looks correct' and report nothing.
Rebalance: prompt now asks for actionable issues incl. substantive minors;
severity filter default major->minor, confidence 0.7->0.5 (still drops pure
nits / low-confidence). Env overrides unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a github-script collect-feedback step that runs before OCR: it queries
GraphQL for the resolve/reply state of OCR's prior inline comments on the PR,
derives a verdict per comment (resolved->accepted, disagree-reply->rejected,
long-unresolved->rejected weak, else skip), and writes feedback.json. The
review step now sets OCR_LEARNINGS=on and OCR_LEARNINGS_FEEDBACK so the binary
ingests it (best-effort). Closes the missing producer half of Phase 1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ingest feedback.json into the learnings store without running a review.
This is the binary half of decoupling collection from the review trigger:
a lightweight "PR closed" workflow job can capture final thread verdicts
(resolved->accepted, unresolved->rejected) at merge time — the reliable
capture point that the review-time collector misses when a developer pushes
a fix, resolves the thread, then merges without re-triggering a review.

Reuses runLearningsIngest (token via resolved endpoint, store path via repo
remote); --feedback overrides OCR_LEARNINGS_FEEDBACK, --repo sets the dir.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ration

Close the cross-PR learnings loop so OCR stops re-flagging findings a human
already dismissed across review rounds.

- Phase 2 retrieval: cosine TopRejected/BestRejected over the store (learn).
- reflagSuppressor: output-stage pass that drops comments matching a prior
  rejected learning (cosine>=OCR_REFLAG_THRESHOLD, path-gated, fail-open).
- Decouple collection from review: shared collect-feedback.js module + new
  ocr-learn-ingest.yml triggered on PR close / thread resolve, so manual
  verdicts after review are captured (fixes pre-review-only timing flaw).
- Outdated threads => accepted (code changed = addressed); human disagreement
  still outranks it.
- ocr learn calibrate: offline pairwise-cosine stats to pick the threshold.
- Fix pre-existing post-review bugs: preserve failed inline comment bodies in
  the summary; safe code fences (safeFence) so backticks can't break markdown.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 OpenCodeReview found 44 issue(s) in this PR.

  • ✅ 44 posted as inline comment(s)
  • 📝 0 posted as summary

Commands:
review, r Start a code review
rules Inspect and debug review rules
learn Collect prior-review feedback into the learnings store

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor consistency issue: all other subcommands (review, rules, config, llm) have a corresponding "Use ... for more information" hint at the bottom of the usage text, but learn is missing one. Consider adding:

Use "ocr learn -h" for more information about learn.

Comment on lines +73 to +76
// Ingest-only entry point: setupLearnings performs the ingest as a side
// effect; the returned suppressor is irrelevant here and discarded.
_ = setupLearnings(context.Background(), resolved, ep.Token, gitRunner)
return nil

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Silent failure in dedicated ingest command. setupLearnings is designed as a best-effort helper for the review command — it prints warnings to stderr and returns a disabled suppressor on any error, so the review proceeds unaffected. However, for the standalone learn ingest command, the ingest IS the primary operation. Discarding its return value means:

  1. If there's no LLM token, the ingest silently skips with only a stderr message, and the command exits successfully (exit code 0).
  2. If the store can't be opened or written, same silent success.
  3. If embedding fails for all items, same silent success.

In a CI/CD pipeline (the stated use case), the job will appear green even though no learnings were stored. This defeats the purpose of a dedicated ingest entry point.

Consider either:

  • Extracting the core ingest logic into a separate function that returns errors, and calling it directly here instead of reusing setupLearnings.
  • Or at minimum, checking whether cfg.FeedbackPath is set and the ingest actually ran before returning nil.

Comment on lines +110 to +111
Protocol string `json:"protocol,omitempty"` // anthropic, openai, or codex
CodexRuntime string `json:"codex_runtime,omitempty"` // exec or app_server

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LlmConfig struct is missing the ClaudeRuntime field that exists in the resolver's llmFileConfig. The resolver (internal/llm/resolver.go) supports claude as a protocol and reads claude_runtime from the config file, but users cannot set it via ocr config set llm.claude_runtime ... because:

  1. There is no ClaudeRuntime field in this struct.
  2. There is no case "llm.claude_runtime" handler in setConfigValue.
  3. The Protocol field validation rejects "claude" (only allows anthropic, openai, codex).
  4. The comment on Protocol says // anthropic, openai, or codex but should also include claude.

This means the claude protocol is unreachable through the config CLI, creating an inconsistency with the resolver.

Suggestion:

Suggested change
Protocol string `json:"protocol,omitempty"` // anthropic, openai, or codex
CodexRuntime string `json:"codex_runtime,omitempty"` // exec or app_server
Protocol string `json:"protocol,omitempty"` // anthropic, openai, codex, or claude
CodexRuntime string `json:"codex_runtime,omitempty"` // exec or app_server
ClaudeRuntime string `json:"claude_runtime,omitempty"` // exec or app_server

case "exec", "app_server", "app-server", "appserver":
cfg.Llm.CodexRuntime = v
default:
return fmt.Errorf("invalid llm.codex_runtime value %q: must be 'exec' or 'app_server'", value)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message says the value must be 'exec' or 'app_server', but the switch statement actually accepts four variants: exec, app_server, app-server, and appserver. The error message should reflect all accepted values, or at minimum mention that hyphenated/concatenated forms are also accepted. This avoids confusing users who try app-server and get an error message telling them it's invalid, even though it would work.

Suggestion:

Suggested change
return fmt.Errorf("invalid llm.codex_runtime value %q: must be 'exec' or 'app_server'", value)
return fmt.Errorf("invalid llm.codex_runtime value %q: must be 'exec', 'app_server', 'app-server', or 'appserver'", value)

Comment on lines +53 to +63
if v := strings.TrimSpace(os.Getenv("OCR_MIN_SEVERITY")); v != "" {
if r := severityRank(v); r > 0 {
f.minSeverity = r
f.minSeverityLabel = strings.ToLower(v)
}
}
if v := strings.TrimSpace(os.Getenv("OCR_MIN_CONFIDENCE")); v != "" {
if c, err := strconv.ParseFloat(v, 64); err == nil && c >= 0 && c <= 1 {
f.minConfidence = c
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid environment variable values are silently ignored. If a user sets OCR_MIN_SEVERITY=mnior (typo) or OCR_MIN_CONFIDENCE=abc, the filter falls back to defaults without any warning. Given that the rest of this codebase consistently warns on stderr for configuration issues (e.g., setupLearnings in review_cmd.go logs warnings like [ocr] learnings: ...), this function should follow the same pattern and emit a warning when an env var is set but cannot be parsed. In CI environments, silent fallback to defaults can make users believe their filter is configured when it isn't.

Suggestion:

Suggested change
if v := strings.TrimSpace(os.Getenv("OCR_MIN_SEVERITY")); v != "" {
if r := severityRank(v); r > 0 {
f.minSeverity = r
f.minSeverityLabel = strings.ToLower(v)
}
}
if v := strings.TrimSpace(os.Getenv("OCR_MIN_CONFIDENCE")); v != "" {
if c, err := strconv.ParseFloat(v, 64); err == nil && c >= 0 && c <= 1 {
f.minConfidence = c
}
}
if v := strings.TrimSpace(os.Getenv("OCR_MIN_SEVERITY")); v != "" {
if r := severityRank(v); r > 0 {
f.minSeverity = r
f.minSeverityLabel = strings.ToLower(v)
} else {
fmt.Fprintf(os.Stderr, "[ocr] warning: OCR_MIN_SEVERITY=%q is not a valid severity (blocker|major|minor|nit); using default %q\n", v, f.minSeverityLabel)
}
}
if v := strings.TrimSpace(os.Getenv("OCR_MIN_CONFIDENCE")); v != "" {
if c, err := strconv.ParseFloat(v, 64); err == nil && c >= 0 && c <= 1 {
f.minConfidence = c
} else {
fmt.Fprintf(os.Stderr, "[ocr] warning: OCR_MIN_CONFIDENCE=%q is invalid (must be 0.0-1.0); using default %.2f\n", v, f.minConfidence)
}
}

Comment thread internal/llm/resolver.go
Comment on lines +70 to +72
if strings.TrimSpace(os.Getenv(envOCRLLMProtocol)) != "" {
strategies[0], strategies[1] = strategies[1], strategies[0]
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When OCR_LLM_PROTOCOL is set to openai or anthropic (without URL/token/model env vars), the strategy swap promotes tryOCREnv ahead of tryOCRConfig. Since tryOCREnv returns a hard error (not ok=false) in this fail-fast path, the loop at line 76-78 returns immediately — preventing tryOCRConfig from ever being tried.

This means a user who sets OCR_LLM_PROTOCOL=openai to explicitly select the protocol, while having valid openai-compatible endpoint details in their config file, will get an unhelpful error about missing env vars instead of successfully resolving from the config file.

Consider either:

  1. Only swapping strategies when the protocol is codex or claude (which are self-contained), or
  2. Having tryOCREnv return (ep, false, nil) instead of an error for openai/anthropic when URL/token/model are missing, so the loop continues to the next strategy.

Comment thread internal/llm/resolver.go
}

return ResolvedEndpoint{}, fmt.Errorf("no valid LLM endpoint configured; one of OCR_LLM_URL/OCR_LLM_TOKEN/OCR_LLM_MODEL, ~/.opencodereview/config.json, or ANTHROPIC_BASE_URL/ANTHROPIC_AUTH_TOKEN/ANTHROPIC_MODEL must be set")
return ResolvedEndpoint{}, fmt.Errorf("no valid LLM endpoint configured; set OCR_LLM_URL/OCR_LLM_TOKEN/OCR_LLM_MODEL, ~/.opencodereview/config.json, ANTHROPIC_BASE_URL/ANTHROPIC_AUTH_TOKEN/ANTHROPIC_MODEL, or OCR_LLM_PROTOCOL=codex/claude")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message only mentions codex/claude as valid values for OCR_LLM_PROTOCOL, but normalizeProtocol also accepts anthropic and openai. Users seeing this error might not realize those are also valid protocol options.

Suggestion:

Suggested change
return ResolvedEndpoint{}, fmt.Errorf("no valid LLM endpoint configured; set OCR_LLM_URL/OCR_LLM_TOKEN/OCR_LLM_MODEL, ~/.opencodereview/config.json, ANTHROPIC_BASE_URL/ANTHROPIC_AUTH_TOKEN/ANTHROPIC_MODEL, or OCR_LLM_PROTOCOL=codex/claude")
return ResolvedEndpoint{}, fmt.Errorf("no valid LLM endpoint configured; set OCR_LLM_URL/OCR_LLM_TOKEN/OCR_LLM_MODEL, ~/.opencodereview/config.json, ANTHROPIC_BASE_URL/ANTHROPIC_AUTH_TOKEN/ANTHROPIC_MODEL, or OCR_LLM_PROTOCOL=anthropic|openai|codex|claude")

Comment on lines +48 to +54
resp, err := c.toolCompletionByRuntime(ctx, req)
if err != nil && errors.Is(err, errEmptyCodexToolCalls) && ctx.Err() == nil {
// A single transient empty response must not fail the whole file
// review (the agent treats completion errors as fatal); retry once
// before surfacing the error.
resp, err = c.toolCompletionByRuntime(ctx, req)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry on errEmptyCodexToolCalls re-invokes toolCompletionByRuntime with the same original ctx. Both the exec path (runCodex) and app-server path (runCodexAppServer) create their own context.WithTimeout(ctx, c.cfg.Timeout) internally, so each attempt gets the full timeout budget. In the worst case, the total wall-clock time can reach 2 × Timeout (e.g., 20 minutes with the default 10-minute timeout).

Consider deriving the retry's context from the first attempt's remaining deadline, or using a shared parent timeout that encompasses both attempts, to bound the total time.

Comment on lines +247 to +249
func (c *CodexClient) buildExecArgs(schemaPath, outputPath string) []string {
return c.buildExecArgsForModel(c.cfg.Model, schemaPath, outputPath)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildExecArgs appears to be unused — all callers use buildExecArgsForModel directly. Consider removing this method to reduce dead code.

Comment on lines +475 to +477
func toolCallChatResponse(model string, call ToolCall) *ChatResponse {
return toolCallsChatResponse(model, []ToolCall{call})
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toolCallChatResponse (singular) appears to be unused — only toolCallsChatResponse (plural) is called. Consider removing this function to reduce dead code.

Commands:
review, r Start a code review
rules Inspect and debug review rules
learn Collect prior-review feedback into the learnings store

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor consistency issue: all other subcommands (review, rules, config, llm) have a corresponding "Use ... for more information" hint at the bottom of the usage text, but learn is missing one. Consider adding:

Use "ocr learn -h" for more information about learn.

Comment on lines +73 to +76
// Ingest-only entry point: setupLearnings performs the ingest as a side
// effect; the returned suppressor is irrelevant here and discarded.
_ = setupLearnings(context.Background(), resolved, ep.Token, gitRunner)
return nil

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Silent failure in dedicated ingest command. setupLearnings is designed as a best-effort helper for the review command — it prints warnings to stderr and returns a disabled suppressor on any error, so the review proceeds unaffected. However, for the standalone learn ingest command, the ingest IS the primary operation. Discarding its return value means:

  1. If there's no LLM token, the ingest silently skips with only a stderr message, and the command exits successfully (exit code 0).
  2. If the store can't be opened or written, same silent success.
  3. If embedding fails for all items, same silent success.

In a CI/CD pipeline (the stated use case), the job will appear green even though no learnings were stored. This defeats the purpose of a dedicated ingest entry point.

Consider either:

  • Extracting the core ingest logic into a separate function that returns errors, and calling it directly here instead of reusing setupLearnings.
  • Or at minimum, checking whether cfg.FeedbackPath is set and the ingest actually ran before returning nil.

Comment on lines +110 to +111
Protocol string `json:"protocol,omitempty"` // anthropic, openai, or codex
CodexRuntime string `json:"codex_runtime,omitempty"` // exec or app_server

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LlmConfig struct is missing the ClaudeRuntime field that exists in the resolver's llmFileConfig. The resolver (internal/llm/resolver.go) supports claude as a protocol and reads claude_runtime from the config file, but users cannot set it via ocr config set llm.claude_runtime ... because:

  1. There is no ClaudeRuntime field in this struct.
  2. There is no case "llm.claude_runtime" handler in setConfigValue.
  3. The Protocol field validation rejects "claude" (only allows anthropic, openai, codex).
  4. The comment on Protocol says // anthropic, openai, or codex but should also include claude.

This means the claude protocol is unreachable through the config CLI, creating an inconsistency with the resolver.

Suggestion:

Suggested change
Protocol string `json:"protocol,omitempty"` // anthropic, openai, or codex
CodexRuntime string `json:"codex_runtime,omitempty"` // exec or app_server
Protocol string `json:"protocol,omitempty"` // anthropic, openai, codex, or claude
CodexRuntime string `json:"codex_runtime,omitempty"` // exec or app_server
ClaudeRuntime string `json:"claude_runtime,omitempty"` // exec or app_server

case "exec", "app_server", "app-server", "appserver":
cfg.Llm.CodexRuntime = v
default:
return fmt.Errorf("invalid llm.codex_runtime value %q: must be 'exec' or 'app_server'", value)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message says the value must be 'exec' or 'app_server', but the switch statement actually accepts four variants: exec, app_server, app-server, and appserver. The error message should reflect all accepted values, or at minimum mention that hyphenated/concatenated forms are also accepted. This avoids confusing users who try app-server and get an error message telling them it's invalid, even though it would work.

Suggestion:

Suggested change
return fmt.Errorf("invalid llm.codex_runtime value %q: must be 'exec' or 'app_server'", value)
return fmt.Errorf("invalid llm.codex_runtime value %q: must be 'exec', 'app_server', 'app-server', or 'appserver'", value)

Comment on lines +53 to +63
if v := strings.TrimSpace(os.Getenv("OCR_MIN_SEVERITY")); v != "" {
if r := severityRank(v); r > 0 {
f.minSeverity = r
f.minSeverityLabel = strings.ToLower(v)
}
}
if v := strings.TrimSpace(os.Getenv("OCR_MIN_CONFIDENCE")); v != "" {
if c, err := strconv.ParseFloat(v, 64); err == nil && c >= 0 && c <= 1 {
f.minConfidence = c
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid environment variable values are silently ignored. If a user sets OCR_MIN_SEVERITY=mnior (typo) or OCR_MIN_CONFIDENCE=abc, the filter falls back to defaults without any warning. Given that the rest of this codebase consistently warns on stderr for configuration issues (e.g., setupLearnings in review_cmd.go logs warnings like [ocr] learnings: ...), this function should follow the same pattern and emit a warning when an env var is set but cannot be parsed. In CI environments, silent fallback to defaults can make users believe their filter is configured when it isn't.

Suggestion:

Suggested change
if v := strings.TrimSpace(os.Getenv("OCR_MIN_SEVERITY")); v != "" {
if r := severityRank(v); r > 0 {
f.minSeverity = r
f.minSeverityLabel = strings.ToLower(v)
}
}
if v := strings.TrimSpace(os.Getenv("OCR_MIN_CONFIDENCE")); v != "" {
if c, err := strconv.ParseFloat(v, 64); err == nil && c >= 0 && c <= 1 {
f.minConfidence = c
}
}
if v := strings.TrimSpace(os.Getenv("OCR_MIN_SEVERITY")); v != "" {
if r := severityRank(v); r > 0 {
f.minSeverity = r
f.minSeverityLabel = strings.ToLower(v)
} else {
fmt.Fprintf(os.Stderr, "[ocr] warning: OCR_MIN_SEVERITY=%q is not a valid severity (blocker|major|minor|nit); using default %q\n", v, f.minSeverityLabel)
}
}
if v := strings.TrimSpace(os.Getenv("OCR_MIN_CONFIDENCE")); v != "" {
if c, err := strconv.ParseFloat(v, 64); err == nil && c >= 0 && c <= 1 {
f.minConfidence = c
} else {
fmt.Fprintf(os.Stderr, "[ocr] warning: OCR_MIN_CONFIDENCE=%q is invalid (must be 0.0-1.0); using default %.2f\n", v, f.minConfidence)
}
}

Comment on lines 112 to +114
ocr review \
--from "origin/${BASE_REF}" \
--to "${HEAD_SHA}" \
--to "origin/${HEAD_REF}" \

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Fork PR refs will fail to resolve. The --to "origin/${HEAD_REF}" argument assumes the head branch exists on origin, but for fork PRs, the head branch only exists on the fork's remote — not on origin. The previous version of this workflow explicitly fetched the PR ref via git fetch origin pull/${PR_NUMBER}/head, which was removed.

This affects both pull_request events from forks (where head.ref is the fork's branch name) and issue_comment events on fork PRs (where head_ref comes from the API).

Consider restoring a fetch step, e.g.:

- name: Fetch PR head ref
  run: git fetch origin pull/${{ github.event.pull_request.number || steps.pr-context.outputs.pr_number }}/head:pr-head

And then use pr-head as the --to argument instead of origin/${HEAD_REF}.

Comment on lines +85 to +86
const { collectFeedback } = require(
`${process.env.GITHUB_WORKSPACE}/scripts/github-actions/collect-feedback.js`);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security concern: Executing code from the PR branch with access to secrets. This step runs on pull_request events (not pull_request_target), which means the checkout is of the PR's merge commit. An attacker could modify scripts/github-actions/collect-feedback.js in their PR to execute arbitrary code with access to secrets.GITHUB_TOKEN (passed via github-token) and potentially other secrets available in the job environment.

The old workflow used pull_request_target specifically to avoid this class of vulnerability. Consider either:

  1. Switching back to pull_request_target (and accepting the trade-offs), or
  2. Pinning the script to a trusted ref (e.g., the default branch) rather than loading it from the checked-out workspace, e.g. by fetching the file content via the GitHub API from the base branch.

Comment on lines +26 to +28
concurrency:
group: ocr-review
cancel-in-progress: false

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential issue: Global serialization bottleneck. The concurrency group ocr-review is shared across all PRs in the repository, and cancel-in-progress: false means runs queue indefinitely. For repositories with multiple active PRs, reviews will be serialized and delayed significantly.

The previous per-PR concurrency group (${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}) allowed concurrent reviews across different PRs while still preventing duplicate runs on the same PR. If the intent is to protect a single self-hosted runner, consider using a matrix-level concurrency or a runner-group approach instead of serializing all reviews globally.

Comment on lines +63 to +70
for _, c := range comments {
vec, err := r.emb.Embed(ctx, c.Content)
if err != nil {
fmt.Fprintf(os.Stderr, "[ocr] reflag: embed failed for %s:%d (kept): %v\n", c.Path, c.StartLine, err)
kept = append(kept, c)
continue
}
best, ok := r.store.BestRejected(vec)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance: Sequential embedding API calls per comment. Each comment triggers a separate Embed network call (30s timeout configured in BigModelEmbedder). For a review with many comments, this adds significant latency (N × ~200-500ms typical embedding API latency). Consider either:

  1. Adding a BatchEmbed(ctx, texts []string) method to the Embedder interface to amortize round-trips, or
  2. Using bounded concurrency (e.g., errgroup with a semaphore) to parallelize embedding calls.

Additionally, BestRejected performs a linear scan over all rejected entries for each comment, making the overall complexity O(N×M) where N=comments and M=rejected learnings. This is fine for small stores but worth noting as the store grows.

Comment on lines +82 to +85
func pathMatches(stored, current string) bool {
stored = strings.TrimSpace(stored)
return stored == "" || stored == strings.TrimSpace(current)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Robustness: Path comparison may miss equivalent paths. Simple string equality after trimming whitespace won't match paths that differ in normalization (e.g., ./src/foo.go vs src/foo.go, or dir/../dir/foo.go vs dir/foo.go). Consider using filepath.Clean on both sides before comparison to avoid missed suppressions due to path formatting differences:

func pathMatches(stored, current string) bool {
    stored = strings.TrimSpace(stored)
    if stored == "" {
        return true
    }
    return filepath.Clean(stored) == filepath.Clean(strings.TrimSpace(current))
}

Comment on lines +64 to +68
ast.Inspect(f, func(n ast.Node) bool {
id, ok := n.(*ast.Ident)
if !ok || id.Name != name {
return true
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential false positives from name-only matching: The References method matches any *ast.Ident whose Name equals the target, without checking whether the identifier actually refers to the same symbol. This means local variables, function parameters, struct field names in composite literals, or other unrelated identifiers that happen to share the same name will be incorrectly reported as references.

For example, if the changed symbol is Config (a package-level type), and another file has func foo(Config string) { ... }, the parameter Config would be falsely reported as a reference.

Since this is used for impact analysis to guide code review, false positives could mislead reviewers. Consider at minimum documenting this limitation in the method or interface comment, or exploring lightweight heuristics (e.g., skipping identifiers that are themselves declarations via id.Obj != nil && id.Obj.Pos() == id.Pos()) to reduce noise.

Comment on lines +53 to +54
default:
newLine++ // context line

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Lines starting with \ (e.g., \ No newline at end of file) are a standard part of unified diffs and should not increment the new-file line counter. Currently they fall into the default case and cause newLine++, which shifts all subsequent line numbers by 1. This can lead to incorrect changed-line detection and wrong symbols being flagged in impact analysis.

Add an explicit case to skip these metadata lines:

case strings.HasPrefix(line, "\\"):
    // diff metadata (e.g., "\ No newline at end of file"); skip

Note: The test suite does not cover this edge case either.

Suggestion:

Suggested change
default:
newLine++ // context line
case strings.HasPrefix(line, "\\"):
// diff metadata (e.g., "\ No newline at end of file"); skip
default:
newLine++ // context line

Comment on lines +50 to +54
cmd.Stdin = strings.NewReader(string(in))
out, err := cmd.Output()
if err != nil {
return resp, err
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing stderr capture makes node failures opaque. When cmd.Output() fails (e.g., node exits non-zero because TypeScript is not installed, or the script has a syntax error), the returned *exec.ExitError does not include stderr by default. The caller receives only an exit status like exit status 1 with no diagnostic from the Node process. This will be especially confusing when TypeScript is missing from the repo's node_modules.

Capture stderr and include it in the error message so failures are actionable:

Suggestion:

Suggested change
cmd.Stdin = strings.NewReader(string(in))
out, err := cmd.Output()
if err != nil {
return resp, err
}
cmd.Stdin = strings.NewReader(string(in))
var stderr bytes.Buffer
cmd.Stderr = &stderr
out, err := cmd.Output()
if err != nil {
return resp, fmt.Errorf("%w: %s", err, strings.TrimSpace(stderr.String()))
}

Comment on lines +70 to +76
func (a tsAnalyzer) nodeHasTypeScript() bool {
cmd := exec.Command("node", "-e", "require.resolve('typescript')")
if a.repoDir != "" {
cmd.Dir = a.repoDir
}
return cmd.Run() == nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodeHasTypeScript is defined but never called. This method exists to check whether TypeScript is available, but no caller invokes it before runTSHelper. When TypeScript is not installed in the target repo, runTSHelper will fail with an unhelpful error from node (especially given the missing stderr capture above). Either integrate this check into runTSHelper / the analyzer selection logic, or remove the dead code to avoid confusion.

Comment on lines +58 to +70
func percentile(sorted []float32, p float64) float32 {
if len(sorted) == 0 {
return 0
}
idx := int(p * float64(len(sorted)-1))
if idx < 0 {
idx = 0
}
if idx >= len(sorted) {
idx = len(sorted) - 1
}
return sorted[idx]
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nearest-rank percentile method produces degenerate results for small sample sizes. For example, with 3 rejected learnings (3 pairs), both P90 and P95 map to index int(0.90 * 2) = 1 and int(0.95 * 2) = 1 respectively — returning the same value as the median. With 2 rejected learnings (1 pair), all percentiles return the single value.

While the Suggested threshold is clamped to mitigate this, the reported P90/P95 stats themselves can be misleading to a user trying to understand the distribution. Consider either:

  1. Using linear interpolation between adjacent ranks (e.g., the "exclusive" or "inclusive" percentile method), or
  2. Documenting that these statistics are only meaningful above a minimum pair count (e.g., ≥10 pairs) and returning ok=false below that threshold.

Comment on lines +18 to +26
const kindFor = n => {
if (ts.isFunctionDeclaration(n)) return 'function';
if (ts.isMethodDeclaration(n)) return 'method';
if (ts.isClassDeclaration(n)) return 'class';
if (ts.isInterfaceDeclaration(n)) return 'interface';
if (ts.isTypeAliasDeclaration(n)) return 'type';
if (ts.isEnumDeclaration(n)) return 'enum';
return null;
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The kindFor function does not handle VariableDeclaration nodes (const/let/var). The Symbol.Kind contract in analyzer.go explicitly lists const as an expected kind, and the Go analyzer (go_analyzer.go) captures const and var declarations. This means that if a changed line contains only variable declarations (e.g., export const FOO = ...), no symbols will be reported, causing the impact analyzer to miss those changes entirely.

Add handling for variable declarations, e.g.:

if (ts.isVariableDeclaration(n)) return 'const'; // or distinguish via parent flags

Suggestion:

Suggested change
const kindFor = n => {
if (ts.isFunctionDeclaration(n)) return 'function';
if (ts.isMethodDeclaration(n)) return 'method';
if (ts.isClassDeclaration(n)) return 'class';
if (ts.isInterfaceDeclaration(n)) return 'interface';
if (ts.isTypeAliasDeclaration(n)) return 'type';
if (ts.isEnumDeclaration(n)) return 'enum';
return null;
};
const kindFor = n => {
if (ts.isFunctionDeclaration(n)) return 'function';
if (ts.isMethodDeclaration(n)) return 'method';
if (ts.isClassDeclaration(n)) return 'class';
if (ts.isInterfaceDeclaration(n)) return 'interface';
if (ts.isTypeAliasDeclaration(n)) return 'type';
if (ts.isEnumDeclaration(n)) return 'enum';
if (ts.isVariableDeclaration(n)) {
const p = n.parent;
if (p && ts.isVariableDeclarationList(p) && (p.flags & ts.NodeFlags.Const)) return 'const';
return 'var';
}
return null;
};

const refs = [];
const seen = new Set();
const visit = n => {
if (ts.isIdentifier(n) && n.text === req.name) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential accuracy issue: References are matched purely by identifier text (n.text === req.name) without semantic symbol resolution. This means identically-named but unrelated identifiers (e.g., a local variable shadowing an import, different class members with the same name) will be incorrectly reported as references. While the upstream git grep -w pre-filter reduces noise, this can still produce false positives that mislead reviewers.

Consider using ts.createProgram + TypeChecker.getSymbolAtLocation() for proper symbol-level resolution when precision matters. If this is intentionally a best-effort heuristic, adding a comment documenting this limitation would help future maintainers.

}
tools := buildToolRegistry(collector, fileReader)

suppressor := setupLearnings(context.Background(), repoDir, ep.Token, gitRunner)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Codex/Claude protocol users can never use the learnings feature.

When the resolved protocol is codex or claude, the resolver returns a ResolvedEndpoint with an empty Token field (these protocols use their own authentication mechanisms). Since setupLearnings receives ep.Token and immediately returns a disabled suppressor when it's empty (line 325-327), the learnings/feedback loop is permanently unavailable for codex/claude users.

Consider either:

  1. Reading the embedding API token from a dedicated env var (e.g., OCR_EMBED_TOKEN) independent of the LLM endpoint token, or
  2. Falling back to a separate token source when ep.Token is empty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants