perf: hoist tile/clamp out of highp sampler inner loop #177

Open
bhark wants to merge 1 commit into linebender:main from bhark:perf/highp-sampler-hoist

@bhark bhark commented May 23, 2026

This PR is part of a series linked to #174.

What

Hoists the tile() / clamp / y*stride computations out of the inner loop of the bicubic and bilinear samplers in highp. The dead sample() helper is removed. The two sampler functions are collapsed into one, since they are nearly identical after this change.

Why

tile() is opaque to LLVM, so it is never CSE'd (common subexpression elimination) out of the inner loop. Hoisting lowers the per-stage cost: the inner body collapses to one i32 add + bitcast + gather + load + mad.
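To illustrate the general shape of the change, here is a minimal scalar sketch. It is not the code from this PR: `tile_clamp`, `sample_naive`, and `sample_hoisted` are hypothetical names, the real samplers operate on SIMD lanes, and `#[inline(never)]` is only an assumed way to mimic a call the optimizer cannot see through.

```rust
/// Hypothetical stand-in for a tiling helper: clamps a coordinate to [0, limit - 1].
/// Marked `inline(never)` to mimic a call the optimizer treats as opaque.
#[inline(never)]
fn tile_clamp(v: i32, limit: i32) -> i32 {
    v.clamp(0, limit - 1)
}

/// Naive form: tiling and `y * stride` are recomputed for every one of the N*N taps.
fn sample_naive<const N: usize>(
    pixels: &[f32], w: i32, h: i32, x0: i32, y0: i32,
    wx: &[f32; N], wy: &[f32; N],
) -> f32 {
    let mut acc = 0.0f32;
    for j in 0..N {
        for i in 0..N {
            let x = tile_clamp(x0 + i as i32, w);
            let y = tile_clamp(y0 + j as i32, h);
            acc += pixels[(y * w + x) as usize] * wx[i] * wy[j];
        }
    }
    acc
}

/// Hoisted form: column indices and row offsets are computed N times each,
/// up front, so the inner loop is left with an add, a load, and the multiplies.
fn sample_hoisted<const N: usize>(
    pixels: &[f32], w: i32, h: i32, x0: i32, y0: i32,
    wx: &[f32; N], wy: &[f32; N],
) -> f32 {
    // Column indices depend only on `i`.
    let mut xs = [0usize; N];
    for i in 0..N {
        xs[i] = tile_clamp(x0 + i as i32, w) as usize;
    }
    // Row offsets (tiled y times stride) depend only on `j`.
    let mut rows = [0usize; N];
    for j in 0..N {
        rows[j] = (tile_clamp(y0 + j as i32, h) * w) as usize;
    }
    let mut acc = 0.0f32;
    for j in 0..N {
        for i in 0..N {
            // Same operand order as the naive form, so rounding is unchanged.
            acc += pixels[rows[j] + xs[i]] * wx[i] * wy[j];
        }
    }
    acc
}
```

In this sketch a single const-generic function covers both tap counts (N = 2 for bilinear, N = 4 for bicubic), and the number of `tile_clamp` calls per sample drops from 2·N² to 2·N. Because the floating-point operations and their order are untouched, both forms return the same bits.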

Results

| Benchmark | Speedup |
| --- | --- |
| Geomean (56 benches) | 1.01x |
| patterns::hq (bicubic, sampler N=4) | 1.78x |
| patterns::lq (bilinear, sampler N=2) | 1.32x |
| patterns::plain | 1.00x |
| gradients::*, fill::*, blend::* | 1.00x |

No regressions. Benched on an i5-13400F with -Ctarget-cpu=haswell.

Notes

  • The primitives (tile, ulp_sub, to_u32x8_bitcast) are unchanged. This is just a scheduling change.
  • Should be bit-identical to the original sample() path.
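One way to check a bit-identity claim like this is to compare outputs by raw bit pattern instead of by value. The helper below is a hypothetical sketch, not part of this PR. It assumes both paths write their results into `f32` buffers.

```rust
/// Hypothetical helper: true only if both buffers have the same length and
/// every element has the same raw bit pattern.
/// Unlike `==` on f32, this treats identical NaNs as equal and 0.0 / -0.0 as different.
fn bit_identical(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```

Running the old and new paths over the same inputs and asserting `bit_identical` on the results catches differences that a value comparison would miss, such as a flipped sign on zero.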

@RazrFalcon
Collaborator

Nice! There were no const generics in Rust when this code was originally written.
