Label only requested genes in scatter -g, not co-binned neighbors (gh#458) #1078

Merged: etal merged 1 commit into master from fix-458-gene-label-filter on May 24, 2026

@etal (Owner) commented May 24, 2026

Summary

cnvkit scatter -c chr17:37850000-37890000 -g ERBB2 also labeled MIR4728 — a gene the user never requested — contradicting the documented behavior (doc/plots.rst: "Any other genes in the plotted region will not be shown unless also specified with -g").

Root cause

CNVkit stores gene annotation inside each bin's gene column rather than as a separate track, so one bin label can pack several comma-joined genes (e.g. "ERBB2,MIR4728"). In cnvlib/plots.py::gene_coords_by_name, bins were selected correctly by the requested name, but the region's label was then reconstructed from the union of all gene names appearing in those bins — leaking co-binned neighbors.
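
The leak can be reproduced without CNVkit. The sketch below is a simplified stand-in, not the actual body of gene_coords_by_name: `label_from_bins_old` is a hypothetical helper, and it assumes bin labels are plain comma-joined strings held in a list. It mimics only the label-reconstruction step described above.

```python
def label_from_bins_old(bin_labels, requested):
    """Hypothetical stand-in for the pre-fix label reconstruction."""
    # Bin selection (already correct): keep bins whose comma-joined
    # label contains the requested gene as a whole name.
    selected = [lbl for lbl in bin_labels if requested in lbl.split(",")]
    # Pre-fix behavior: union of *every* name found in the selected bins.
    uniq_names = set()
    for oname in set(selected):
        uniq_names.update(oname.split(","))
    return ",".join(sorted(uniq_names))


# Toy bins; "PGAP3" is included only as an unrelated, unselected bin.
bins = ["ERBB2,MIR4728", "ERBB2", "PGAP3"]
print(label_from_bins_old(bins, "ERBB2"))  # ERBB2,MIR4728
```

Only ERBB2 was requested, yet MIR4728 rides along because it shares a bin label with ERBB2.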

Fix

Restrict the reconstructed label to the requested names:

uniq_names.update(g for g in oname.split(",") if g in names)

When several genes are requested and happen to share bins, all requested names still appear. Because every selected bin necessarily contains the requested name, no region can become unlabeled.
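
A minimal sketch of the effect of that filter, again on toy data rather than CNVkit objects. `label_from_bins` is a hypothetical helper, and it collapses everything into a single label, whereas the real function builds labels per region; only the `if g in names` filter is taken from the change itself.

```python
def label_from_bins(bin_labels, names):
    """Hypothetical stand-in for the post-fix label reconstruction."""
    names = set(names)
    uniq_names = set()
    for oname in set(bin_labels):
        genes = oname.split(",")
        # Bin selection: skip bins holding none of the requested names.
        if names.isdisjoint(genes):
            continue
        # The fix: keep only the names the caller actually asked for.
        uniq_names.update(g for g in genes if g in names)
    return ",".join(sorted(uniq_names))


bins = ["ERBB2,MIR4728", "ERBB2", "PGAP3"]
print(label_from_bins(bins, ["ERBB2"]))             # ERBB2
print(label_from_bins(bins, ["ERBB2", "MIR4728"]))  # ERBB2,MIR4728
```

Each selected bin contains at least one requested name by construction, so the filtered set is never empty for a selected bin.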

Tests

  • New CNATests::test_gene_coords_by_name (in test/test_cnvlib.py): asserts -g ERBB2 against a bin labeled "ERBB2,MIR4728" yields only ERBB2, and that requesting both names surfaces both. The test fails against the old code ('ERBB2,MIR4728' != 'ERBB2') and passes with the fix.
  • Full test_cnvlib.py (30) and test_commands.py (73) pass; mypy and ruff clean.
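
For readers who want to see the shape of such a regression test, here is a hedged sketch using `unittest`. It is not the test added in test/test_cnvlib.py: `region_label` and its `restrict` switch are hypothetical, and the bins are a plain list instead of a CNVkit array.

```python
import unittest


def region_label(bin_labels, names, restrict=True):
    """Hypothetical helper: join gene names from bins matching `names`.

    restrict=False mimics the old union behavior; restrict=True the fix.
    """
    names = set(names)
    uniq_names = set()
    for oname in bin_labels:
        genes = oname.split(",")
        if names.isdisjoint(genes):
            continue  # this bin holds none of the requested genes
        uniq_names.update(g for g in genes if g in names or not restrict)
    return ",".join(sorted(uniq_names))


class GeneLabelSketch(unittest.TestCase):
    BINS = ["ERBB2,MIR4728"]

    def test_single_request_excludes_neighbor(self):
        self.assertEqual(region_label(self.BINS, ["ERBB2"]), "ERBB2")

    def test_both_requested_both_shown(self):
        self.assertEqual(
            region_label(self.BINS, ["ERBB2", "MIR4728"]), "ERBB2,MIR4728"
        )

    def test_old_behavior_leaks_neighbor(self):
        self.assertEqual(
            region_label(self.BINS, ["ERBB2"], restrict=False), "ERBB2,MIR4728"
        )
```

Saved to a file, the cases can be run with `python -m unittest <file>`; the third case documents the old behavior that the first case guards against.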

Notes

  • Plotting-label fix only — no change to .cnr/.cns/.cnn/SEG/VCF output, so no impact on downstream clinical pipelines.
  • The code now matches the existing docs, so no doc change is needed.

Closes #458.

🤖 Generated with Claude Code

@codecov (Bot) commented May 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.81%. Comparing base (86bf4da) to head (c9d5845).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1078   +/-   ##
=======================================
  Coverage   67.81%   67.81%           
=======================================
  Files          74       74           
  Lines        7686     7686           
  Branches     1366     1366           
=======================================
  Hits         5212     5212           
  Misses       2034     2034           
  Partials      440      440           
Flag Coverage Δ
unittests 67.81% <100.00%> (ø)

…#458)

CNVkit stores gene annotation inside each bin's `gene` column rather than
separately, so a single bin label can pack several genes (e.g.
"ERBB2,MIR4728"). gene_coords_by_name() matched bins by the requested
name correctly, but then reconstructed each region's label from the union
of *all* gene names found in those bins -- surfacing co-binned neighbors
the user never asked for. Hence `scatter -c chr17:... -g ERBB2` also
labeled MIR4728, contradicting the documented behavior in doc/plots.rst
("Any other genes ... will not be shown unless also specified with -g").

Restrict the reconstructed label to the requested names. When multiple
genes are requested and share bins, all requested names still appear.

Plotting-label fix only; no change to .cnr/.cns/.cnn/SEG/VCF output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@etal etal force-pushed the fix-458-gene-label-filter branch from f47b2d3 to c9d5845 Compare May 24, 2026 16:36
@etal etal merged commit a2bb8f0 into master May 24, 2026
13 checks passed
@etal etal deleted the fix-458-gene-label-filter branch May 24, 2026 16:55

Development

Successfully merging this pull request may close these issues.

scatter -c -g in amplicon mode labels more than the specified gene
