perf(render_points): drop the AnnData hack + fix the categorical cliff by timtreis · Pull Request #730 · scverse/spatialdata-plot

timtreis · 2026-06-21T11:21:37Z

render_points built a full per-point AnnData on every call (even with no color), incurring AnnData's O(n) index-uniqueness check and dtype cast on both backends before drawing, so datashader never paid off. This PR removes that hack and the high-cardinality-categorical legend cliff.

10M transcripts: ~3× faster in general (no-color 11→3.5s, continuous 9→2.6s). Xenium color-by-gene (2M points × 3085 genes): ~20× faster (16.9→0.9s). Output is unchanged within visual-test tolerance.

Changes

- Drop the AnnData hack: coords from the points frame, color from the existing get_values merge, legend from ColorSpec.
- Skip the per-entry categorical legend past scanpy's 102-color palette limit (unreadable + O(categories²) to build).
- Datashader: single-color categoricals (e.g. scanpy's grey fallback) render via the cheap count path (byte-identical); matplotlib passes a scalar color= for uniform color instead of a per-point array.

Behavior notes: legend skipped >102 categories (warning); single-color categorical datashader renders as a count (byte-identical). Two single-color baselines shifted (sub-pixel antialiasing) and were regenerated.

render_points built a full AnnData over every point (X=xy, obs=coords) just to reuse the legacy color machinery — incurring AnnData's O(n) index-uniqueness check + dtype cast on every call, regardless of backend or color. The modern ColorSpec/resolve_color pipeline already carries coords (points df), color (get_values merge + color_spec), and the legend (color_spec), so the AnnData is vestigial. Remove it: feed matplotlib coords from points["x"/"y"], let the existing get_values merge supply table obs/var colors, and keep the original table in sdata_filt so resolve_color still reads user uns palettes. Also drop the now-dead `adata` parameter threaded through _add_legend_and_colorbar / _decorate_axs / _render_centroids_as_points (none read it). 10M-transcript render: ~3x faster on both backends (no-color 11.2s->3.5s mpl, 8.6s->3.2s ds; continuous 9.2s->2.6s mpl, 8.2s->2.6s ds).

The per-point color vector alpha-strip used np.unique(return_inverse=True), which sorts millions of strings (argsort dominated the datashader render: ~1s at 10M). pd.factorize dedups in O(n) via hashing with no sort and produces a byte-identical per-point result. Modest win for the no-color/categorical paths.
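
A minimal sketch of the swap described above, assuming the colour vector holds "#rrggbbaa" hex strings with no missing values. The function names and sample data are hypothetical and are not taken from the PR; only np.unique(return_inverse=True) and pd.factorize are the calls the commit message names.

```python
import numpy as np
import pandas as pd


def strip_alpha_unique(colors):
    """Sort-based dedup: np.unique sorts the values before deduplicating."""
    uniques, inverse = np.unique(colors, return_inverse=True)
    stripped = np.array([c[:7] for c in uniques])  # "#rrggbbaa" -> "#rrggbb"
    return stripped[inverse]


def strip_alpha_factorize(colors):
    """Hash-based dedup: pd.factorize keeps first-seen order and does not sort."""
    # Assumption: no missing values, so every code is >= 0.
    codes, uniques = pd.factorize(colors)
    stripped = np.array([c[:7] for c in uniques])
    return stripped[codes]


# Hypothetical per-point colour vector
colors = np.array(["#ff0000ff", "#00ff0080", "#ff0000ff", "#0000ffff"])
via_unique = strip_alpha_unique(colors)
via_factorize = strip_alpha_factorize(colors)
```

The two helpers order their unique values differently (sorted vs. first-seen), but the per-point result after indexing back is the same, which is the property the commit relies on.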

… limit Coloring by a high-cardinality categorical (e.g. Xenium points by gene, ~3000 genes) spent ~10s building the legend: scanpy's _add_categorical_legend adds one autoscaling artist per category, so matplotlib re-autoscales O(categories^2) (sticky_edges called ~categories^2 times). Past len(default_102)=102 categories scanpy already colors every point uniform grey, so a per-entry legend carries no information anyway. Skip it with a warning above that limit (tied to scanpy's palette so the two stay in sync). 2M points x 3085 genes: 16.9s -> 6.5s.
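
The skip can be pictured as a small guard in front of the legend builder. This is a sketch under stated assumptions: the function name, the constant, and the warning text are hypothetical, and the limit is hard-coded to 102 here instead of being read from scanpy's palette as the commit describes.

```python
import warnings

# Assumption: mirrors the 102-colour palette size named in the commit message.
LEGEND_CATEGORY_LIMIT = 102


def should_draw_categorical_legend(n_categories, limit=LEGEND_CATEGORY_LIMIT):
    """Return False (and warn) when a per-entry legend would be too large."""
    if n_categories > limit:
        warnings.warn(
            f"Skipping the per-entry legend: {n_categories} categories "
            f"exceed the limit of {limit}.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```

A caller would check the guard once per render and only then build the one-artist-per-category legend, so the quadratic autoscaling cost is never paid above the limit.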

When every point resolves to the same colour — notably past scanpy's 102-colour palette, where all categories become uniform grey — datashader's per-category ds.by aggregate + composite is pure waste: the output is byte-identical to a plain single-colour count render. Detect the uniform colour vector and route to the cheap count path. 2M points x 3085 genes: 6.0s -> 0.86s (~7x), byte-identical output, no spurious colorbar; low-cardinality categoricals are unaffected.
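
The reasoning can be illustrated with a plain NumPy analogue. This is not datashader and makes no claim about its internals: it only shows that summing per-category 2D counts reproduces the single overall count, so splitting by category adds work without adding information when every category gets the same colour. All data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_categories = 1000, 5
x, y = rng.random(n_points), rng.random(n_points)
category = rng.integers(0, n_categories, size=n_points)

grid = dict(bins=32, range=[[0, 1], [0, 1]])

# Cheap path: one count aggregate over all points.
total, _, _ = np.histogram2d(x, y, **grid)

# Expensive path: one aggregate per category, then combine.
per_category = sum(
    np.histogram2d(x[category == k], y[category == k], **grid)[0]
    for k in range(n_categories)
)
```

With a single colour, the combined per-category grid and the plain count grid feed the same shading step, which is why routing to the count path can leave the output unchanged.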

When every marker resolves to the same colour (no color / single colour / collapsed grey), _scatter_points handed ax.scatter a per-point colour array, forcing matplotlib's per-point colour-mapping machinery — the dominant cost at scale. Detect a uniform fixed-width-string colour vector (cheap vectorised compare) and pass a scalar color= instead. Visually identical (sub-tolerance edge antialiasing); numeric/continuous vectors keep the c=/cmap/norm path. 10M no-color matplotlib render: ~3.5s -> ~2.4s. Mirrors the datashader single-colour collapse.
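
A sketch of the uniform-colour detection and the resulting keyword choice, assuming a 1-D NumPy colour vector. Both function names are hypothetical stand-ins, and the returned dictionaries only mirror the `color=` vs. `c=`/`cmap`/`norm` split described above; nothing is drawn here.

```python
import numpy as np


def color_vector_is_uniform(color_vector):
    """Hypothetical check: True only for a 1-D string vector with one value."""
    arr = np.asarray(color_vector)
    if arr.ndim != 1 or arr.size == 0:
        return False
    if arr.dtype.kind in ("U", "S"):
        # Fixed-width strings: a single vectorised compare against element 0.
        return bool((arr == arr[0]).all())
    # Numeric/continuous vectors stay on the c=/cmap/norm path.
    return False


def scatter_color_kwargs(color_vector, cmap=None, norm=None):
    """Pick scalar color= for uniform colour, per-point c= otherwise."""
    if color_vector_is_uniform(color_vector):
        return {"color": str(np.asarray(color_vector)[0])}
    return {"c": color_vector, "cmap": cmap, "norm": norm}


uniform_kwargs = scatter_color_kwargs(np.array(["#d3d3d3"] * 5))
continuous_kwargs = scatter_color_kwargs(np.array([0.1, 0.2, 0.3]))
```

The resulting dictionary could then be unpacked into a single scatter call, e.g. `ax.scatter(x, y, **kwargs)`, so only one call site is needed for both cases.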

The uniform-colour scalar `color=` path produces a sub-pixel edge-antialiasing difference vs the previous per-point `c=` array (markers identical in position, size, and colour). Two single-colour matplotlib stacking baselines exceeded TOL=15 on their few large markers; regenerated from CI. Diff is edge-only (verified), not a rendering change.

codecov-commenter · 2026-06-21T11:35:28Z

Codecov Report

❌ Patch coverage is 78.94737% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.34%. Comparing base (078afb1) to head (7ccbc11).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/spatialdata_plot/pl/_datashader.py | 81.81% | 1 Missing and 1 partial ⚠️ |
| src/spatialdata_plot/pl/utils.py | 50.00% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #730      +/-   ##
==========================================
- Coverage   79.38%   79.34%   -0.04%     
==========================================
  Files          17       17              
  Lines        4604     4600       -4     
  Branches     1031     1030       -1     
==========================================
- Hits         3655     3650       -5     
- Misses        599      600       +1     
  Partials      350      350

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/spatialdata_plot/pl/render.py | `89.56% <100.00%> (+0.02%)` | ⬆️ |
| src/spatialdata_plot/pl/_datashader.py | `88.06% <81.81%> (+0.02%)` | ⬆️ |
| src/spatialdata_plot/pl/utils.py | `78.86% <50.00%> (-0.20%)` | ⬇️ |


Review cleanups for the render_points perf work:
- Unify the "is the colour vector uniform?" check: matplotlib's _scatter_points now reuses _color_vector_is_uniform instead of an inline copy, and the helper gains a fixed-width-string fast path (vectorised compare) so the datashader collapse no longer pays a full nunique hash on every categorical render.
- Skip the per-point alpha-strip when col_for_color is None (no-colour / collapsed single-colour): _ds_shade_categorical already strips color_vector[0] there, so the O(n) factorize + N-array rebuild was wasted (~720MB at 20M).
- Collapse the two near-identical ax.scatter() calls in _scatter_points into one with conditional colour kwargs.
Behaviour-preserving: collapse output still byte-identical, low-cardinality categoricals unaffected, 151 non-visual tests pass.

…alettes The skipped-legend warning claimed points are "uniform grey" past the limit, but that only holds for scanpy's default palette — a custom cmap/palette gives distinct colors for >102 categories (verified: cmap='viridis' + 150 cats → 150 distinct colors). Reword to the palette-agnostic, true reasons (a per-entry legend that large is unreadable and O(categories^2) slow to build). The skip itself is unchanged and defensible regardless of palette.

timtreis added 6 commits June 21, 2026 13:20

timtreis changed the title ~~perf(render_points): drop the AnnData hack + fix the categorical cliff (~3x general, ~20x Xenium-by-gene)~~ perf(render_points): drop the AnnData hack + fix the categorical cliff Jun 21, 2026

timtreis added 2 commits June 21, 2026 13:46

timtreis merged commit d3fef03 into main Jun 21, 2026
7 of 8 checks passed

timtreis added the release-changed label Jun 21, 2026

timtreis merged 8 commits into main from perf/render-points-no-anndata
