Codex/claude cli provider#180
Conversation
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ci: add PR auto-review workflow (self-hosted + codex)
* feat(llm): add codex provider via official Codex CLI
Add a 'codex' protocol that delegates LLM calls to the locally
installed Codex CLI, using its own ChatGPT/API-key authentication
instead of requiring an HTTP endpoint and token.
- New CodexClient with two runtimes: 'exec' (codex exec, default)
and 'app_server' (persistent codex app-server over JSON-RPC)
- Resolver: OCR_LLM_PROTOCOL=codex env var and llm.protocol /
llm.codex_runtime config keys; URL/token no longer required for
the codex protocol
- Runs codex with --sandbox read-only and --ephemeral
- Model falls back to the Codex CLI's own global config when unset
- Docs: README sections for subscription-based setup
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(llm): address review findings in codex provider
- app-server: bound startup/initialize with caller context + 30s cap;
readLoop now fails pending requests and completions when the process
exits, instead of hanging until external timeouts
- app-server: route notifications by threadId, drain stale events before
each turn, and drop oldest (never newest) on buffer overflow so
final-answer/turn-completed events survive
- app-server: best-effort turn/interrupt when the caller cancels
- client: detect dead app-server processes and restart on next call
instead of reusing a stale pipe
- provider: reject empty tool_calls (requires explicit task_done) and
non-object tool arguments at the provider boundary
- resolver/config: validate llm.protocol enum and honor explicit
openai/anthropic values over the legacy use_anthropic toggle;
validate llm.codex_runtime
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(llm): address round-2 review findings
- resolver: explicit OCR_LLM_PROTOCOL env now takes precedence over the
config file (deliberate per-invocation override, e.g. CI)
- resolver: validate codex_runtime values and normalize aliases to
canonical 'app_server'; typos now error instead of silently selecting
the exec runtime
- client: StreamCompletion rejects tool requests instead of silently
dropping tool-call responses
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(llm): address round-3 review findings
- resolver: explicit non-codex OCR_LLM_PROTOCOL without a full endpoint
now fails fast instead of silently resolving the config file's
(different) protocol
- resolver: codex_runtime carried inside extra_body is validated and
normalized like the dedicated setting; non-string values rejected
- provider: tool-calls schema gains minItems:1 so schema-valid output
cannot produce the empty array the parser rejects
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(llm): address round-4 review findings
- app-server: turn slot is a context-aware semaphore; waiters no longer
block past their deadline behind a slow turn
- app-server: interrupt the turn when turn/start is canceled in flight
- app-server: readLoop reaps the process via cmd.Wait (no zombies),
including the initialization-failure path
- app-server: turn/completed now requires positive threadId correlation
(anonymous completion can no longer finish a turn with stale text);
snake_case thread_id accepted
- app-server: malformed output schema surfaces an error instead of being
silently dropped
- resolver: whitespace-only codex_runtime treated as unset before the
extra_body fallback
- exec runtime: pin -c approval_policy=never so interactive user configs
(e.g. on-request) cannot stall the non-interactive review loop
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(llm): address round-5 review findings (protocol verified live)
Verified against a live 'codex app-server' session (codex-cli 0.134.0):
turn/completed arrives as {threadId, turn:{id,...}} with threadId at the
top level, and agent messages are item type 'agentMessage' (camelCase) —
the existing type check is correct and stays unchanged.
- app-server: capture result.turn.id from turn/start and include turnId
in turn/interrupt requests
- app-server: answer server-initiated JSON-RPC requests (id+method) with
method-not-found instead of dropping them as unmatched responses,
so the server is never left blocked waiting on us
- provider: retry once on schema-valid-but-empty tool_calls before
failing the completion (the agent treats completion errors as fatal)
- provider: fence conversation/diff/tool-result content in UNTRUSTED
markers with an explicit do-not-follow-instructions note, and place
control instructions before the fenced data
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(llm): address round-6 review findings
- resolver: config-file llm.protocol now has the same fail-fast contract
as OCR_LLM_PROTOCOL — an explicit non-codex protocol without a full
endpoint errors instead of silently falling through to Claude env /
shell rc providers
- app-server: turn/completed travels on a dedicated channel that is
never subject to overflow dropping; remaining buffered items are
drained at completion so a final answer cannot be missed
- provider: untrusted-content fence markers now carry a per-request
random token, so review data cannot forge a closing marker and smuggle
instructions outside the fence
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(llm): address round-7 review findings
- app-server: route server-initiated requests by raw JSON-RPC id so
string ids also receive the method-not-found rejection; responses to
ids we never issued are dropped explicitly
- app-server: item/completed now requires positive thread correlation
like turn/completed (live traces show items always carry threadId);
anonymous stragglers from a canceled turn can no longer be recorded
as the current turn's answer
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(llm): address round-8 review findings
- provider: fence-token entropy failure now fails the completion cleanly
(toolPrompt returns an error) instead of panicking the process
- app-server: on cancellation the active-turn slot is handed off to the
interrupt goroutine and released only after the interrupt settles, so
a new turn cannot overlap one that is still being stopped
- app-server: agentMessage items (the final answer) join turn/completed
on the dedicated never-dropped channel, so a verbose turn can no
longer evict the final answer while keeping the completion signal
The two anonymous-correlation findings were inaccurate: with
'a.threadID != "" && id != a.threadID', an empty id is rejected, and
TestCodexAppServerAccumulator{RejectsAnonymousItems,IgnoresOtherThreads}
pin that behavior.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(llm): address round-9 review findings
- app-server: the completions channel enqueue is now also non-blocking
with drop-oldest, so a noisy canceled turn can never stall the sole
stdout reader and wedge response delivery (incl. turn/interrupt acks);
only the latest final answer and completion signal matter
- app-server: on a scanner error the child is killed before cmd.Wait,
otherwise a process blocked writing to an unread stdout would make
Wait block forever and done would never close
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LLM endpoint now uses the OpenAI-compatible opencode-go gateway: URL/model from repo variables, token from OCR_LLM_TOKEN secret. Concurrency stays serialized for runner load, not Codex auth. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Was an uncommitted WIP edit; committed onto sync-upstream work branch so it is preserved through the upstream merge. (OCR_LLM_PROTOCOL openai->anthropic) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # .github/workflows/ocr-review.yml # README.md # README.zh-CN.md # cmd/opencodereview/config_cmd.go # cmd/opencodereview/config_cmd_test.go # internal/llm/resolver.go # internal/llm/resolver_test.go
- LlmComment gains Severity (blocker|major|minor|nit) + Confidence (0-1) - code_comment tool schema requires both; ParseComments reads them - output filtered to severity>=major && confidence>=0.7 by default; tune via OCR_MIN_SEVERITY / OCR_MIN_CONFIDENCE / OCR_DISABLE_SEVERITY_FILTER; drop count reported to stderr (never silently truncated) - prompt: report only high-confidence actionable issues, no nits Targets the low-precision/high-noise problem; complements the existing REVIEW_FILTER_TASK fact-check pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per-file deterministic pre-pass that finds where a changed file's symbols are referenced across the repo (go/parser for Go, embedded Node TS helper for TS; no CGO) and injects a capped impact summary into the review context via a generalized ContextProvider hook (reused later by cross-PR learnings). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
6 TDD tasks: reviewctx ContextProvider framework, impact analyzer types + changed-line extraction, Go analyzer (go/parser), TS analyzer (embedded Node helper, no CGO), CrossRefProvider (grep+confirm+capped summary), agent wiring. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… comment Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…po dir - FileReviewInput gains a Ref field (reviewed git ref; "" = working tree) - agent.renderExtraContext computes and threads the ref via tool.ParseReviewMode/RefValue - CrossRefProvider.findRefs uses git grep <ref> when Ref is set, strips the "<ref>:<path>" prefix from matches, and reads file bodies via git show - New readCandidate helper: os.ReadFile for working-tree, git show for ref mode - tsAnalyzer carries repoDir so Node runs from the repo dir (fixing typescript require resolution); nodeHasTypeScript is now a method for the same reason - Early total>=maxRefs break added at the top of the per-symbol loop - New regression test TestCrossRefProviderRefMode proves ref-mode greps HEAD not the mutated working tree Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Follow-ups from the cross-ref impact review:
- M1: render the <cross_reference_impact> wrapper only when there is content,
so non-Go/TS files (most files) no longer leave empty tags in the prompt.
Template now carries the bare {{extra_context}} token.
- M3: cap probed changed-symbols at 25 per file (each costs one git-grep);
a huge refactor file can't spawn hundreds of greps. Truncation is noted.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per-review pipeline: workflow collects OCR inline comments' resolve/reply state (GraphQL) -> OCR distills per-comment Learnings, embeds via BigModel, persists to a local JSON-lines store -> LearningsProvider retrieves top-k by cosine and injects past feedback into the prompt via the existing ContextProvider hook. Split into two shippable phases (collect+store, then retrieve+inject). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
5 tasks: Learning types + JSON-lines store (dedupe/soft-cap); BigModel embedder; idempotent feedback.json ingest; env config + best-effort wiring into review_cmd; workflow github-script GraphQL collector. No prompt injection (Phase 2). TDD, no CGO, fully best-effort. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Final whole-branch review flagged the internal/learn import inserted out of alphabetical order, which would fail a CI gofmt gate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d, check read error
Recent PRs produced 0 comments. Local repro (real MAIN_TASK prompt + diff + thinking 32k) showed the model analyzes thoroughly and explores via file_read — it does not hide findings — but the 'only high-confidence / prefer precision over coverage / do not report nits' prompt plus a major/0.7 filter made it judge high-quality refactor PRs as 'looks correct' and report nothing. Rebalance: prompt now asks for actionable issues incl. substantive minors; severity filter default major->minor, confidence 0.7->0.5 (still drops pure nits / low-confidence). Env overrides unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a github-script collect-feedback step that runs before OCR: it queries GraphQL for the resolve/reply state of OCR's prior inline comments on the PR, derives a verdict per comment (resolved->accepted, disagree-reply->rejected, long-unresolved->rejected weak, else skip), and writes feedback.json. The review step now sets OCR_LEARNINGS=on and OCR_LEARNINGS_FEEDBACK so the binary ingests it (best-effort). Closes the missing producer half of Phase 1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ingest feedback.json into the learnings store without running a review. This is the binary half of decoupling collection from the review trigger: a lightweight "PR closed" workflow job can capture final thread verdicts (resolved->accepted, unresolved->rejected) at merge time — the reliable capture point that the review-time collector misses when a developer pushes a fix, resolves the thread, then merges without re-triggering a review. Reuses runLearningsIngest (token via resolved endpoint, store path via repo remote); --feedback overrides OCR_LEARNINGS_FEEDBACK, --repo sets the dir. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ration Close the cross-PR learnings loop so OCR stops re-flagging findings a human already dismissed across review rounds. - Phase 2 retrieval: cosine TopRejected/BestRejected over the store (learn). - reflagSuppressor: output-stage pass that drops comments matching a prior rejected learning (cosine>=OCR_REFLAG_THRESHOLD, path-gated, fail-open). - Decouple collection from review: shared collect-feedback.js module + new ocr-learn-ingest.yml triggered on PR close / thread resolve, so manual verdicts after review are captured (fixes pre-review-only timing flaw). - Outdated threads => accepted (code changed = addressed); human disagreement still outranks it. - ocr learn calibrate: offline pairwise-cosine stats to pick the threshold. - Fix pre-existing post-review bugs: preserve failed inline comment bodies in the summary; safe code fences (safeFence) so backticks can't break markdown. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
|
| Commands: | ||
| review, r Start a code review | ||
| rules Inspect and debug review rules | ||
| learn Collect prior-review feedback into the learnings store |
There was a problem hiding this comment.
Minor consistency issue: all other subcommands (review, rules, config, llm) have a corresponding "Use ... for more information" hint at the bottom of the usage text, but learn is missing one. Consider adding:
Use "ocr learn -h" for more information about learn.
| // Ingest-only entry point: setupLearnings performs the ingest as a side | ||
| // effect; the returned suppressor is irrelevant here and discarded. | ||
| _ = setupLearnings(context.Background(), resolved, ep.Token, gitRunner) | ||
| return nil |
There was a problem hiding this comment.
Bug: Silent failure in dedicated ingest command. setupLearnings is designed as a best-effort helper for the review command — it prints warnings to stderr and returns a disabled suppressor on any error, so the review proceeds unaffected. However, for the standalone learn ingest command, the ingest IS the primary operation. Discarding its return value means:
- If there's no LLM token, the ingest silently skips with only a stderr message, and the command exits successfully (exit code 0).
- If the store can't be opened or written, same silent success.
- If embedding fails for all items, same silent success.
In a CI/CD pipeline (the stated use case), the job will appear green even though no learnings were stored. This defeats the purpose of a dedicated ingest entry point.
Consider either:
- Extracting the core ingest logic into a separate function that returns errors, and calling it directly here instead of reusing
setupLearnings. - Or at minimum, checking whether
cfg.FeedbackPathis set and the ingest actually ran before returning nil.
| Protocol string `json:"protocol,omitempty"` // anthropic, openai, or codex | ||
| CodexRuntime string `json:"codex_runtime,omitempty"` // exec or app_server |
There was a problem hiding this comment.
The LlmConfig struct is missing the ClaudeRuntime field that exists in the resolver's llmFileConfig. The resolver (internal/llm/resolver.go) supports claude as a protocol and reads claude_runtime from the config file, but users cannot set it via ocr config set llm.claude_runtime ... because:
- There is no
ClaudeRuntimefield in this struct. - There is no
case "llm.claude_runtime"handler insetConfigValue. - The
Protocolfield validation rejects"claude"(only allowsanthropic,openai,codex). - The comment on
Protocolsays// anthropic, openai, or codexbut should also includeclaude.
This means the claude protocol is unreachable through the config CLI, creating an inconsistency with the resolver.
Suggestion:
| Protocol string `json:"protocol,omitempty"` // anthropic, openai, or codex | |
| CodexRuntime string `json:"codex_runtime,omitempty"` // exec or app_server | |
| Protocol string `json:"protocol,omitempty"` // anthropic, openai, codex, or claude | |
| CodexRuntime string `json:"codex_runtime,omitempty"` // exec or app_server | |
| ClaudeRuntime string `json:"claude_runtime,omitempty"` // exec or app_server |
| case "exec", "app_server", "app-server", "appserver": | ||
| cfg.Llm.CodexRuntime = v | ||
| default: | ||
| return fmt.Errorf("invalid llm.codex_runtime value %q: must be 'exec' or 'app_server'", value) |
There was a problem hiding this comment.
The error message says the value must be 'exec' or 'app_server', but the switch statement actually accepts four variants: exec, app_server, app-server, and appserver. The error message should reflect all accepted values, or at minimum mention that hyphenated/concatenated forms are also accepted. This avoids confusing users who try app-server and get an error message telling them it's invalid, even though it would work.
Suggestion:
| return fmt.Errorf("invalid llm.codex_runtime value %q: must be 'exec' or 'app_server'", value) | |
| return fmt.Errorf("invalid llm.codex_runtime value %q: must be 'exec', 'app_server', 'app-server', or 'appserver'", value) |
| if v := strings.TrimSpace(os.Getenv("OCR_MIN_SEVERITY")); v != "" { | ||
| if r := severityRank(v); r > 0 { | ||
| f.minSeverity = r | ||
| f.minSeverityLabel = strings.ToLower(v) | ||
| } | ||
| } | ||
| if v := strings.TrimSpace(os.Getenv("OCR_MIN_CONFIDENCE")); v != "" { | ||
| if c, err := strconv.ParseFloat(v, 64); err == nil && c >= 0 && c <= 1 { | ||
| f.minConfidence = c | ||
| } | ||
| } |
There was a problem hiding this comment.
Invalid environment variable values are silently ignored. If a user sets OCR_MIN_SEVERITY=mnior (typo) or OCR_MIN_CONFIDENCE=abc, the filter falls back to defaults without any warning. Given that the rest of this codebase consistently warns on stderr for configuration issues (e.g., setupLearnings in review_cmd.go logs warnings like [ocr] learnings: ...), this function should follow the same pattern and emit a warning when an env var is set but cannot be parsed. In CI environments, silent fallback to defaults can make users believe their filter is configured when it isn't.
Suggestion:
| if v := strings.TrimSpace(os.Getenv("OCR_MIN_SEVERITY")); v != "" { | |
| if r := severityRank(v); r > 0 { | |
| f.minSeverity = r | |
| f.minSeverityLabel = strings.ToLower(v) | |
| } | |
| } | |
| if v := strings.TrimSpace(os.Getenv("OCR_MIN_CONFIDENCE")); v != "" { | |
| if c, err := strconv.ParseFloat(v, 64); err == nil && c >= 0 && c <= 1 { | |
| f.minConfidence = c | |
| } | |
| } | |
| if v := strings.TrimSpace(os.Getenv("OCR_MIN_SEVERITY")); v != "" { | |
| if r := severityRank(v); r > 0 { | |
| f.minSeverity = r | |
| f.minSeverityLabel = strings.ToLower(v) | |
| } else { | |
| fmt.Fprintf(os.Stderr, "[ocr] warning: OCR_MIN_SEVERITY=%q is not a valid severity (blocker|major|minor|nit); using default %q\n", v, f.minSeverityLabel) | |
| } | |
| } | |
| if v := strings.TrimSpace(os.Getenv("OCR_MIN_CONFIDENCE")); v != "" { | |
| if c, err := strconv.ParseFloat(v, 64); err == nil && c >= 0 && c <= 1 { | |
| f.minConfidence = c | |
| } else { | |
| fmt.Fprintf(os.Stderr, "[ocr] warning: OCR_MIN_CONFIDENCE=%q is invalid (must be 0.0-1.0); using default %.2f\n", v, f.minConfidence) | |
| } | |
| } |
| if strings.TrimSpace(os.Getenv(envOCRLLMProtocol)) != "" { | ||
| strategies[0], strategies[1] = strategies[1], strategies[0] | ||
| } |
There was a problem hiding this comment.
When OCR_LLM_PROTOCOL is set to openai or anthropic (without URL/token/model env vars), the strategy swap promotes tryOCREnv ahead of tryOCRConfig. Since tryOCREnv returns a hard error (not ok=false) in this fail-fast path, the loop at line 76-78 returns immediately — preventing tryOCRConfig from ever being tried.
This means a user who sets OCR_LLM_PROTOCOL=openai to explicitly select the protocol, while having valid openai-compatible endpoint details in their config file, will get an unhelpful error about missing env vars instead of successfully resolving from the config file.
Consider either:
- Only swapping strategies when the protocol is
codexorclaude(which are self-contained), or - Having
tryOCREnvreturn(ep, false, nil)instead of an error foropenai/anthropicwhen URL/token/model are missing, so the loop continues to the next strategy.
| } | ||
|
|
||
| return ResolvedEndpoint{}, fmt.Errorf("no valid LLM endpoint configured; one of OCR_LLM_URL/OCR_LLM_TOKEN/OCR_LLM_MODEL, ~/.opencodereview/config.json, or ANTHROPIC_BASE_URL/ANTHROPIC_AUTH_TOKEN/ANTHROPIC_MODEL must be set") | ||
| return ResolvedEndpoint{}, fmt.Errorf("no valid LLM endpoint configured; set OCR_LLM_URL/OCR_LLM_TOKEN/OCR_LLM_MODEL, ~/.opencodereview/config.json, ANTHROPIC_BASE_URL/ANTHROPIC_AUTH_TOKEN/ANTHROPIC_MODEL, or OCR_LLM_PROTOCOL=codex/claude") |
There was a problem hiding this comment.
The error message only mentions codex/claude as valid values for OCR_LLM_PROTOCOL, but normalizeProtocol also accepts anthropic and openai. Users seeing this error might not realize those are also valid protocol options.
Suggestion:
| return ResolvedEndpoint{}, fmt.Errorf("no valid LLM endpoint configured; set OCR_LLM_URL/OCR_LLM_TOKEN/OCR_LLM_MODEL, ~/.opencodereview/config.json, ANTHROPIC_BASE_URL/ANTHROPIC_AUTH_TOKEN/ANTHROPIC_MODEL, or OCR_LLM_PROTOCOL=codex/claude") | |
| return ResolvedEndpoint{}, fmt.Errorf("no valid LLM endpoint configured; set OCR_LLM_URL/OCR_LLM_TOKEN/OCR_LLM_MODEL, ~/.opencodereview/config.json, ANTHROPIC_BASE_URL/ANTHROPIC_AUTH_TOKEN/ANTHROPIC_MODEL, or OCR_LLM_PROTOCOL=anthropic|openai|codex|claude") |
| resp, err := c.toolCompletionByRuntime(ctx, req) | ||
| if err != nil && errors.Is(err, errEmptyCodexToolCalls) && ctx.Err() == nil { | ||
| // A single transient empty response must not fail the whole file | ||
| // review (the agent treats completion errors as fatal); retry once | ||
| // before surfacing the error. | ||
| resp, err = c.toolCompletionByRuntime(ctx, req) | ||
| } |
There was a problem hiding this comment.
The retry on errEmptyCodexToolCalls re-invokes toolCompletionByRuntime with the same original ctx. Both the exec path (runCodex) and app-server path (runCodexAppServer) create their own context.WithTimeout(ctx, c.cfg.Timeout) internally, so each attempt gets the full timeout budget. In the worst case, the total wall-clock time can reach 2 × Timeout (e.g., 20 minutes with the default 10-minute timeout).
Consider deriving the retry's context from the first attempt's remaining deadline, or using a shared parent timeout that encompasses both attempts, to bound the total time.
| func (c *CodexClient) buildExecArgs(schemaPath, outputPath string) []string { | ||
| return c.buildExecArgsForModel(c.cfg.Model, schemaPath, outputPath) | ||
| } |
There was a problem hiding this comment.
buildExecArgs appears to be unused — all callers use buildExecArgsForModel directly. Consider removing this method to reduce dead code.
| func toolCallChatResponse(model string, call ToolCall) *ChatResponse { | ||
| return toolCallsChatResponse(model, []ToolCall{call}) | ||
| } |
There was a problem hiding this comment.
toolCallChatResponse (singular) appears to be unused — only toolCallsChatResponse (plural) is called. Consider removing this function to reduce dead code.
| Commands: | ||
| review, r Start a code review | ||
| rules Inspect and debug review rules | ||
| learn Collect prior-review feedback into the learnings store |
There was a problem hiding this comment.
Minor consistency issue: all other subcommands (review, rules, config, llm) have a corresponding "Use ... for more information" hint at the bottom of the usage text, but learn is missing one. Consider adding:
Use "ocr learn -h" for more information about learn.
| // Ingest-only entry point: setupLearnings performs the ingest as a side | ||
| // effect; the returned suppressor is irrelevant here and discarded. | ||
| _ = setupLearnings(context.Background(), resolved, ep.Token, gitRunner) | ||
| return nil |
There was a problem hiding this comment.
Bug: Silent failure in dedicated ingest command. setupLearnings is designed as a best-effort helper for the review command — it prints warnings to stderr and returns a disabled suppressor on any error, so the review proceeds unaffected. However, for the standalone learn ingest command, the ingest IS the primary operation. Discarding its return value means:
- If there's no LLM token, the ingest silently skips with only a stderr message, and the command exits successfully (exit code 0).
- If the store can't be opened or written, same silent success.
- If embedding fails for all items, same silent success.
In a CI/CD pipeline (the stated use case), the job will appear green even though no learnings were stored. This defeats the purpose of a dedicated ingest entry point.
Consider either:
- Extracting the core ingest logic into a separate function that returns errors, and calling it directly here instead of reusing
setupLearnings. - Or at minimum, checking whether
cfg.FeedbackPathis set and the ingest actually ran before returning nil.
| Protocol string `json:"protocol,omitempty"` // anthropic, openai, or codex | ||
| CodexRuntime string `json:"codex_runtime,omitempty"` // exec or app_server |
There was a problem hiding this comment.
The LlmConfig struct is missing the ClaudeRuntime field that exists in the resolver's llmFileConfig. The resolver (internal/llm/resolver.go) supports claude as a protocol and reads claude_runtime from the config file, but users cannot set it via ocr config set llm.claude_runtime ... because:
- There is no
ClaudeRuntimefield in this struct. - There is no
case "llm.claude_runtime"handler insetConfigValue. - The
Protocolfield validation rejects"claude"(only allowsanthropic,openai,codex). - The comment on
Protocolsays// anthropic, openai, or codexbut should also includeclaude.
This means the claude protocol is unreachable through the config CLI, creating an inconsistency with the resolver.
Suggestion:
| Protocol string `json:"protocol,omitempty"` // anthropic, openai, or codex | |
| CodexRuntime string `json:"codex_runtime,omitempty"` // exec or app_server | |
| Protocol string `json:"protocol,omitempty"` // anthropic, openai, codex, or claude | |
| CodexRuntime string `json:"codex_runtime,omitempty"` // exec or app_server | |
| ClaudeRuntime string `json:"claude_runtime,omitempty"` // exec or app_server |
| case "exec", "app_server", "app-server", "appserver": | ||
| cfg.Llm.CodexRuntime = v | ||
| default: | ||
| return fmt.Errorf("invalid llm.codex_runtime value %q: must be 'exec' or 'app_server'", value) |
There was a problem hiding this comment.
The error message says the value must be 'exec' or 'app_server', but the switch statement actually accepts four variants: exec, app_server, app-server, and appserver. The error message should reflect all accepted values, or at minimum mention that hyphenated/concatenated forms are also accepted. This avoids confusing users who try app-server and get an error message telling them it's invalid, even though it would work.
Suggestion:
| return fmt.Errorf("invalid llm.codex_runtime value %q: must be 'exec' or 'app_server'", value) | |
| return fmt.Errorf("invalid llm.codex_runtime value %q: must be 'exec', 'app_server', 'app-server', or 'appserver'", value) |
| if v := strings.TrimSpace(os.Getenv("OCR_MIN_SEVERITY")); v != "" { | ||
| if r := severityRank(v); r > 0 { | ||
| f.minSeverity = r | ||
| f.minSeverityLabel = strings.ToLower(v) | ||
| } | ||
| } | ||
| if v := strings.TrimSpace(os.Getenv("OCR_MIN_CONFIDENCE")); v != "" { | ||
| if c, err := strconv.ParseFloat(v, 64); err == nil && c >= 0 && c <= 1 { | ||
| f.minConfidence = c | ||
| } | ||
| } |
There was a problem hiding this comment.
Invalid environment variable values are silently ignored. If a user sets OCR_MIN_SEVERITY=mnior (typo) or OCR_MIN_CONFIDENCE=abc, the filter falls back to defaults without any warning. Given that the rest of this codebase consistently warns on stderr for configuration issues (e.g., setupLearnings in review_cmd.go logs warnings like [ocr] learnings: ...), this function should follow the same pattern and emit a warning when an env var is set but cannot be parsed. In CI environments, silent fallback to defaults can make users believe their filter is configured when it isn't.
Suggestion:
| if v := strings.TrimSpace(os.Getenv("OCR_MIN_SEVERITY")); v != "" { | |
| if r := severityRank(v); r > 0 { | |
| f.minSeverity = r | |
| f.minSeverityLabel = strings.ToLower(v) | |
| } | |
| } | |
| if v := strings.TrimSpace(os.Getenv("OCR_MIN_CONFIDENCE")); v != "" { | |
| if c, err := strconv.ParseFloat(v, 64); err == nil && c >= 0 && c <= 1 { | |
| f.minConfidence = c | |
| } | |
| } | |
| if v := strings.TrimSpace(os.Getenv("OCR_MIN_SEVERITY")); v != "" { | |
| if r := severityRank(v); r > 0 { | |
| f.minSeverity = r | |
| f.minSeverityLabel = strings.ToLower(v) | |
| } else { | |
| fmt.Fprintf(os.Stderr, "[ocr] warning: OCR_MIN_SEVERITY=%q is not a valid severity (blocker|major|minor|nit); using default %q\n", v, f.minSeverityLabel) | |
| } | |
| } | |
| if v := strings.TrimSpace(os.Getenv("OCR_MIN_CONFIDENCE")); v != "" { | |
| if c, err := strconv.ParseFloat(v, 64); err == nil && c >= 0 && c <= 1 { | |
| f.minConfidence = c | |
| } else { | |
| fmt.Fprintf(os.Stderr, "[ocr] warning: OCR_MIN_CONFIDENCE=%q is invalid (must be 0.0-1.0); using default %.2f\n", v, f.minConfidence) | |
| } | |
| } |
| ocr review \ | ||
| --from "origin/${BASE_REF}" \ | ||
| --to "${HEAD_SHA}" \ | ||
| --to "origin/${HEAD_REF}" \ |
There was a problem hiding this comment.
Bug: Fork PR refs will fail to resolve. The --to "origin/${HEAD_REF}" argument assumes the head branch exists on origin, but for fork PRs, the head branch only exists on the fork's remote — not on origin. The previous version of this workflow explicitly fetched the PR ref via git fetch origin pull/${PR_NUMBER}/head, which was removed.
This affects both pull_request events from forks (where head.ref is the fork's branch name) and issue_comment events on fork PRs (where head_ref comes from the API).
Consider restoring a fetch step, e.g.:
- name: Fetch PR head ref
run: git fetch origin pull/${{ github.event.pull_request.number || steps.pr-context.outputs.pr_number }}/head:pr-headAnd then use pr-head as the --to argument instead of origin/${HEAD_REF}.
| const { collectFeedback } = require( | ||
| `${process.env.GITHUB_WORKSPACE}/scripts/github-actions/collect-feedback.js`); |
There was a problem hiding this comment.
Security concern: Executing code from the PR branch with access to secrets. This step runs on pull_request events (not pull_request_target), which means the checkout is of the PR's merge commit. An attacker could modify scripts/github-actions/collect-feedback.js in their PR to execute arbitrary code with access to secrets.GITHUB_TOKEN (passed via github-token) and potentially other secrets available in the job environment.
The old workflow used pull_request_target specifically to avoid this class of vulnerability. Consider either:
- Switching back to
pull_request_target(and accepting the trade-offs), or - Pinning the script to a trusted ref (e.g., the default branch) rather than loading it from the checked-out workspace, e.g. by fetching the file content via the GitHub API from the base branch.
| concurrency: | ||
| group: ocr-review | ||
| cancel-in-progress: false |
There was a problem hiding this comment.
Potential issue: Global serialization bottleneck. The concurrency group ocr-review is shared across all PRs in the repository, and cancel-in-progress: false means runs queue indefinitely. For repositories with multiple active PRs, reviews will be serialized and delayed significantly.
The previous per-PR concurrency group (${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}) allowed concurrent reviews across different PRs while still preventing duplicate runs on the same PR. If the intent is to protect a single self-hosted runner, consider using a matrix-level concurrency or a runner-group approach instead of serializing all reviews globally.
| for _, c := range comments { | ||
| vec, err := r.emb.Embed(ctx, c.Content) | ||
| if err != nil { | ||
| fmt.Fprintf(os.Stderr, "[ocr] reflag: embed failed for %s:%d (kept): %v\n", c.Path, c.StartLine, err) | ||
| kept = append(kept, c) | ||
| continue | ||
| } | ||
| best, ok := r.store.BestRejected(vec) |
There was a problem hiding this comment.
Performance: Sequential embedding API calls per comment. Each comment triggers a separate Embed network call (30s timeout configured in BigModelEmbedder). For a review with many comments, this adds significant latency (N × ~200-500ms typical embedding API latency). Consider either:
- Adding a
BatchEmbed(ctx, texts []string)method to theEmbedderinterface to amortize round-trips, or - Using bounded concurrency (e.g.,
errgroupwith a semaphore) to parallelize embedding calls.
Additionally, BestRejected performs a linear scan over all rejected entries for each comment, making the overall complexity O(N×M) where N=comments and M=rejected learnings. This is fine for small stores but worth noting as the store grows.
| func pathMatches(stored, current string) bool { | ||
| stored = strings.TrimSpace(stored) | ||
| return stored == "" || stored == strings.TrimSpace(current) | ||
| } |
There was a problem hiding this comment.
Robustness: Path comparison may miss equivalent paths. Simple string equality after trimming whitespace won't match paths that differ in normalization (e.g., ./src/foo.go vs src/foo.go, or dir/../dir/foo.go vs dir/foo.go). Consider using filepath.Clean on both sides before comparison to avoid missed suppressions due to path formatting differences:
func pathMatches(stored, current string) bool {
stored = strings.TrimSpace(stored)
if stored == "" {
return true
}
return filepath.Clean(stored) == filepath.Clean(strings.TrimSpace(current))
}| ast.Inspect(f, func(n ast.Node) bool { | ||
| id, ok := n.(*ast.Ident) | ||
| if !ok || id.Name != name { | ||
| return true | ||
| } |
There was a problem hiding this comment.
Potential false positives from name-only matching: The References method matches any *ast.Ident whose Name equals the target, without checking whether the identifier actually refers to the same symbol. This means local variables, function parameters, struct field names in composite literals, or other unrelated identifiers that happen to share the same name will be incorrectly reported as references.
For example, if the changed symbol is Config (a package-level type), and another file has func foo(Config string) { ... }, the parameter Config would be falsely reported as a reference.
Since this is used for impact analysis to guide code review, false positives could mislead reviewers. Consider at minimum documenting this limitation in the method or interface comment, or exploring lightweight heuristics (e.g., skipping identifiers that are themselves declarations via id.Obj != nil && id.Obj.Pos() == id.Pos()) to reduce noise.
| default: | ||
| newLine++ // context line |
There was a problem hiding this comment.
Bug: Lines starting with \ (e.g., \ No newline at end of file) are a standard part of unified diffs and should not increment the new-file line counter. Currently they fall into the default case and cause newLine++, which shifts all subsequent line numbers by 1. This can lead to incorrect changed-line detection and wrong symbols being flagged in impact analysis.
Add an explicit case to skip these metadata lines:
case strings.HasPrefix(line, "\\"):
// diff metadata (e.g., "\ No newline at end of file"); skipNote: The test suite does not cover this edge case either.
Suggestion:
| default: | |
| newLine++ // context line | |
| case strings.HasPrefix(line, "\\"): | |
| // diff metadata (e.g., "\ No newline at end of file"); skip | |
| default: | |
| newLine++ // context line |
| cmd.Stdin = strings.NewReader(string(in)) | ||
| out, err := cmd.Output() | ||
| if err != nil { | ||
| return resp, err | ||
| } |
There was a problem hiding this comment.
Missing stderr capture makes node failures opaque. When cmd.Output() fails (e.g., node exits non-zero because TypeScript is not installed, or the script has a syntax error), the returned *exec.ExitError does not include stderr by default. The caller receives only an exit status like exit status 1 with no diagnostic from the Node process. This will be especially confusing when TypeScript is missing from the repo's node_modules.
Capture stderr and include it in the error message so failures are actionable:
Suggestion:
| cmd.Stdin = strings.NewReader(string(in)) | |
| out, err := cmd.Output() | |
| if err != nil { | |
| return resp, err | |
| } | |
| cmd.Stdin = strings.NewReader(string(in)) | |
| var stderr bytes.Buffer | |
| cmd.Stderr = &stderr | |
| out, err := cmd.Output() | |
| if err != nil { | |
| return resp, fmt.Errorf("%w: %s", err, strings.TrimSpace(stderr.String())) | |
| } |
| func (a tsAnalyzer) nodeHasTypeScript() bool { | ||
| cmd := exec.Command("node", "-e", "require.resolve('typescript')") | ||
| if a.repoDir != "" { | ||
| cmd.Dir = a.repoDir | ||
| } | ||
| return cmd.Run() == nil | ||
| } |
There was a problem hiding this comment.
nodeHasTypeScript is defined but never called. This method exists to check whether TypeScript is available, but no caller invokes it before runTSHelper. When TypeScript is not installed in the target repo, runTSHelper will fail with an unhelpful error from node (especially given the missing stderr capture above). Either integrate this check into runTSHelper / the analyzer selection logic, or remove the dead code to avoid confusion.
| func percentile(sorted []float32, p float64) float32 { | ||
| if len(sorted) == 0 { | ||
| return 0 | ||
| } | ||
| idx := int(p * float64(len(sorted)-1)) | ||
| if idx < 0 { | ||
| idx = 0 | ||
| } | ||
| if idx >= len(sorted) { | ||
| idx = len(sorted) - 1 | ||
| } | ||
| return sorted[idx] | ||
| } |
There was a problem hiding this comment.
The nearest-rank percentile method produces degenerate results for small sample sizes. For example, with 3 rejected learnings (3 pairs), both P90 and P95 map to index int(0.90 * 2) = 1 and int(0.95 * 2) = 1 respectively — returning the same value as the median. With 2 rejected learnings (1 pair), all percentiles return the single value.
While the Suggested threshold is clamped to mitigate this, the reported P90/P95 stats themselves can be misleading to a user trying to understand the distribution. Consider either:
- Using linear interpolation between adjacent ranks (e.g., the "exclusive" or "inclusive" percentile method), or
- Documenting that these statistics are only meaningful above a minimum pair count (e.g., ≥10 pairs) and returning
ok=falsebelow that threshold.
| const kindFor = n => { | ||
| if (ts.isFunctionDeclaration(n)) return 'function'; | ||
| if (ts.isMethodDeclaration(n)) return 'method'; | ||
| if (ts.isClassDeclaration(n)) return 'class'; | ||
| if (ts.isInterfaceDeclaration(n)) return 'interface'; | ||
| if (ts.isTypeAliasDeclaration(n)) return 'type'; | ||
| if (ts.isEnumDeclaration(n)) return 'enum'; | ||
| return null; | ||
| }; |
There was a problem hiding this comment.
Bug: The kindFor function does not handle VariableDeclaration nodes (const/let/var). The Symbol.Kind contract in analyzer.go explicitly lists const as an expected kind, and the Go analyzer (go_analyzer.go) captures const and var declarations. This means that if a changed line contains only variable declarations (e.g., export const FOO = ...), no symbols will be reported, causing the impact analyzer to miss those changes entirely.
Add handling for variable declarations, e.g.:
if (ts.isVariableDeclaration(n)) return 'const'; // or distinguish via parent flagsSuggestion:
| const kindFor = n => { | |
| if (ts.isFunctionDeclaration(n)) return 'function'; | |
| if (ts.isMethodDeclaration(n)) return 'method'; | |
| if (ts.isClassDeclaration(n)) return 'class'; | |
| if (ts.isInterfaceDeclaration(n)) return 'interface'; | |
| if (ts.isTypeAliasDeclaration(n)) return 'type'; | |
| if (ts.isEnumDeclaration(n)) return 'enum'; | |
| return null; | |
| }; | |
| const kindFor = n => { | |
| if (ts.isFunctionDeclaration(n)) return 'function'; | |
| if (ts.isMethodDeclaration(n)) return 'method'; | |
| if (ts.isClassDeclaration(n)) return 'class'; | |
| if (ts.isInterfaceDeclaration(n)) return 'interface'; | |
| if (ts.isTypeAliasDeclaration(n)) return 'type'; | |
| if (ts.isEnumDeclaration(n)) return 'enum'; | |
| if (ts.isVariableDeclaration(n)) { | |
| const p = n.parent; | |
| if (p && ts.isVariableDeclarationList(p) && (p.flags & ts.NodeFlags.Const)) return 'const'; | |
| return 'var'; | |
| } | |
| return null; | |
| }; |
| const refs = []; | ||
| const seen = new Set(); | ||
| const visit = n => { | ||
| if (ts.isIdentifier(n) && n.text === req.name) { |
There was a problem hiding this comment.
Potential accuracy issue: References are matched purely by identifier text (n.text === req.name) without semantic symbol resolution. This means identically-named but unrelated identifiers (e.g., a local variable shadowing an import, different class members with the same name) will be incorrectly reported as references. While the upstream git grep -w pre-filter reduces noise, this can still produce false positives that mislead reviewers.
Consider using ts.createProgram + TypeChecker.getSymbolAtLocation() for proper symbol-level resolution when precision matters. If this is intentionally a best-effort heuristic, adding a comment documenting this limitation would help future maintainers.
| } | ||
| tools := buildToolRegistry(collector, fileReader) | ||
|
|
||
| suppressor := setupLearnings(context.Background(), repoDir, ep.Token, gitRunner) |
There was a problem hiding this comment.
Bug: Codex/Claude protocol users can never use the learnings feature.
When the resolved protocol is codex or claude, the resolver returns a ResolvedEndpoint with an empty Token field (these protocols use their own authentication mechanisms). Since setupLearnings receives ep.Token and immediately returns a disabled suppressor when it's empty (line 325-327), the learnings/feedback loop is permanently unavailable for codex/claude users.
Consider either:
- Reading the embedding API token from a dedicated env var (e.g.,
OCR_EMBED_TOKEN) independent of the LLM endpoint token, or - Falling back to a separate token source when
ep.Tokenis empty.
Description
Type of Change
How Has This Been Tested?
make testpasses locallyChecklist
go fmt,go vet)Related Issues