fix(perl): eliminate false-positive Perl CALLS edges (builtins, framework method calls, config strings)#477
Conversation
The Perl branch of extract_scripting_callee blindly returned the text of child(0) of every call node. In config-heavy Perl (.cgi/.pl with embedded log4perl-style config), tree-sitter-perl misparses dotted config tokens (e.g. "log4perl.appender.File.utf8") into call-shaped nodes, and that dotted string was emitted as a callee_name, later matched by the generic short-name resolver to unrelated project subs. Now the Perl branch pulls the real name token (method/function field, else child(0)) and validates it as a bare Perl sub/method identifier via perl_is_identifier_callee: must start with a letter or '_' and contain only [A-Za-z0-9_:] (allowing the '::' package separator). Any '.', whitespace, quote, or '/' disqualifies it and NULL is returned so no CALLS edge forms. Gated to CBM_LANG_PERL; other languages are untouched. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Shane McCarron <shane.mccarron@corvexconnect.com>
Perl builtins (push/shift/keys/sprintf/...) carry no language or call-kind awareness through the generic name-matcher in cbm_registry_resolve. When a project defines a sub whose name collides with a builtin, an invocation of the builtin was wired to that sub by same-module / suffix matching - a false-positive CALLS edge. Adds cbm_perl_is_builtin (curated, sorted bsearch set of 94 perlfunc core builtins) and applies it in both call-resolution passes (sequential resolve_single_call and parallel resolve_file_calls), gated on the file language == CBM_LANG_PERL and only AFTER LSP resolution has declined, so a genuine LSP-resolved call is never suppressed. The file language is threaded into both resolvers via a new trailing CBMLanguage parameter; every other language reaches cbm_registry_resolve unchanged (byte-identical behavior). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Shane McCarron <shane.mccarron@corvexconnect.com>
…alls A Perl method call ($obj->m / Class->m) carries no receiver type at the structural tier, so the generic short-name matcher in cbm_registry_resolve would wire $dbh->commit / $cgi->param / $logger->log to any project sub sharing the bare method name - the dominant source of false-positive CALLS edges in CPAN/framework-heavy Perl. Resolving such a call correctly is the LSP's job, not the bare-name matcher's. Adds a CBMCall.is_method flag (zero-init false, so all other languages and existing call sites are unaffected). method_call_expression is added to the Perl call node set and handle_calls sets is_method=true only for that node type when the file language is Perl. Both call-resolution passes then skip generic resolution for Perl method calls (combined with the builtin guard from the prior commit). Genuine intra-project function calls (non-method, non-builtin) still resolve as before. LSP-resolved method calls are unaffected because the guard runs only after LSP resolution declines. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Shane McCarron <shane.mccarron@corvexconnect.com>
…trings)
Hermetic tests for the three Perl call-graph noise fixes:
test_extraction.c (extraction tier):
- config string is never emitted as a callee; genuine call still extracted
- builtin calls (push/keys) extracted but never flagged is_method
- arrow/method calls ($self->commit / $dbh->commit) set is_method=true,
while the genuine function call (helper) does not
- a JS method call never sets is_method (flag is Perl-only — other
languages unaffected)
test_registry.c (resolver tier):
- cbm_perl_is_builtin recognizes core builtins (incl. first/last of the
sorted set) and rejects project subs, case variants, empty, and NULL
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Shane McCarron <shane.mccarron@corvexconnect.com>
Round 1 (Claude panel + DO DeepSeek) findings: - Rework the Perl noise guard so it suppresses only WEAK generic matches (suffix_match/unique_name) and KEEPS high-confidence same_module/import_map. The prior guard ran before cbm_registry_resolve and dropped genuine same-file calls to builtin-named subs (e.g. a project sub log/index/open called as a bare function) that pre-PR resolved via same_module. Extracted the decision into pure, unit-tested cbm_perl_suppress_generic_match() shared by the sequential (pass_calls.c) and parallel (pass_parallel.c) resolvers; corrected the inaccurate comments (Perl has no LSP/textual stage before the guard). - Tighten perl_is_identifier_callee to require '::' pairs (reject a lone ':', ':::', or trailing '::'). - Add resolver-contract tests covering weak-match suppression, same_module/ import_map retention, genuine-call survival, non-Perl no-op, and NULL strategy. Verified on a real Perl monorepo: .cgi builtin/CPAN/config-string noise stays eliminated while same_module edges to builtin-named subs are recovered. Signed-off-by: Shane McCarron <shane.mccarron@corvexconnect.com>
QA Round 1Reviewers: Claude Code (claude-opus-4-8) parallel review panel (3 lenses) + DigitalOcean DeepSeek ( Contract Verification (issue #476)
Findings[claude] Finding 1 — Test gap (major, fixed). The suppression branch and edge-survival path had no resolver-level test (extraction-level + a builtin-set unit test only); the test comments promised end-to-end coverage that didn't exist. Fixed: extracted the suppression decision into pure [claude] Finding 2 — Builtin guard over-suppressed genuine same-file calls (regression, fixed). Perl has no LSP resolver, so the guard ran before [claude|do:deepseek-v4-pro] Finding 3 — [do:deepseek-v4-pro] Finding 4 — locale-dependent [do:deepseek-v4-pro] "is_method never set in extraction" (reported critical) — REFUTED (false positive). The DO reviewer chunked the diff by commit and reviewed the config-string commit in isolation. Result3 confirmed findings fixed (1 major + 1 regression + 1 minor); 1 advisory deferred; 1 reported-critical refuted. Fixes committed as SAST: GitHub code scanning is not enabled on this repo — security delta skipped (non-blocking). QA performed by Claude Code (claude-opus-4-8) parallel panel + do:deepseek-v4-pro |
Round 2 (Claude panel) caught a regression introduced by the round-1 refactor: cbm_perl_suppress_generic_match whitelisted only the exact strategies "same_module" and "import_map", but resolve_import_map can also return "import_map_suffix" (confidence 0.85 — a genuine import-based resolution, not a weak short-name guess). A '::'-qualified Perl builtin/method call resolved via the import-suffix fallback was therefore dropped, contradicting the helper's documented contract and partially missing acceptance criterion (d). Add import_map_suffix to the kept (high-confidence) set so only the weak short-name strategies (suffix_match / unique_name) are suppressed; update the doc comment and add a unit-test case asserting import_map_suffix is retained. Deferred as advisory (non-blocking, noted on the PR): a hypothetical leading-'::' (main:: shorthand) under-extraction in perl_is_identifier_callee, and a colon-edge-case coverage gap (logic correct by inspection). Signed-off-by: Shane McCarron <shane.mccarron@corvexconnect.com>
QA Round 2Reviewer: Claude Code (claude-opus-4-8) parallel review panel (3 lenses). The DigitalOcean DeepSeek second opinion timed out this round (~7 min, no output) — recorded as a non-blocking second-opinion failure; round 1 captured a DO opinion. Contract Verification (issue #476)
Findings[claude] Finding 1 — round-1 helper whitelist omitted [claude] Finding 2 — leading- [claude] Finding 3 — colon-edge-case test gap (minor, advisory — deferred). [claude] Finding 4 — pre-existing unrelated test failure. Result1 confirmed regression fixed ( QA performed by Claude Code (claude-opus-4-8) parallel panel (DO DeepSeek second opinion timed out — non-blocking) |
QA Round 3 (confirming) — CLEAN ✅Reviewer: Claude Code (claude-opus-4-8) parallel review panel (3 lenses). All three lenses returned empty findings. (The DigitalOcean DeepSeek second opinion did not complete within the window again this round — recorded as a non-blocking second-opinion failure; a DO opinion was captured in round 1.) Contract Verification (issue #476) — all pass
Verification highlights
ResultNo new or remaining confirmed defects. The two prior advisory items (hypothetical leading- 3 QA rounds complete; round 3 clean. Marking ready for review. QA performed by Claude Code (claude-opus-4-8) parallel panel |
Summary
Perl files are extracted (call sites emitted), and any call the textual resolver can't place falls back to a generic short-name matcher with no language or call-kind awareness. It wires Perl builtins, framework method calls, and mis-parsed config strings to unrelated project subs that merely share a name — polluting the Perl call graph with false-positive
CALLSedges.This fixes the three sources, all gated on
CBM_LANG_PERLso the generic resolver andCBMCall(shared by all 10 languages) stay byte-identical for non-Perl.What changed
fix(perl): stop extracting config strings as call targets—extract_scripting_callee(Perl branch) now extracts the real method/function name token and rejects non-identifier callees (containing., quotes, whitespace, …), so dotted config strings/literals (e.g.log4perl.appender.File.utf8) never become call targets.fix(resolver): don't match Perl builtins to project subs— adds a curated Perl builtin set (src/pipeline/registry.c); when an unresolved Perl call's name is a builtin, the generic edge is suppressed. Real same-file subs are already resolved by earlier stages before the generic fallback, so this only drops spurious builtin matches.fix(resolver): suppress generic short-name matching for Perl method calls— addsis_methodtoCBMCall(default false → no-op for other languages), set during Perl extraction for arrow/method calls, threaded intoresolve_single_call/pass_parallel’s resolver. Perl method calls with an unknown receiver no longer generic-match to free subs (precise method resolution is the LSP's job).tests/test_extraction.candtests/test_registry.ccovering builtins, config-string rejection, method-call suppression, a genuine-call-still-resolves case, and a cross-language no-op check.Validation
Re-indexing a large real Perl monorepo (~1,200 modules + 352
.cgiendpoints) with the fix:.cgisuffix_matchedges.cgibuiltin / CPAN-method / config-string noiseCALLSedgesThis is precision via noise removal — fewer, more-correct edges. Genuine intra-project resolution survives.
scripts/build.sh— clean (-Werror).scripts/test.sh— green except the unrelated pre-existingcli_hook_gate_script_no_predictable_tmp_issue384; cross-language breadth check[CALLS-BREADTH] 53 langs: 0 FAILURESconfirms all other languages still resolve.clang-format— clean on changed files.Closes #476
🤖 Generated with Claude Code