Implement faster getWorldPos #52

Open
Ameobea wants to merge 1 commit into N8python:master from Ameobea:faster-getworldpos

Ameobea commented May 22, 2026

  • All conventional perspective projection matrices (orthographic excluded) contain several zero entries, so a fully generic mat4 x vec4 multiplication wastes work on them. N8AO uses the getWorldPos function in several places, so it gets called multiple times per pixel per frame and the cost of these matrix multiplies adds up to significant overhead. It's actually a hotspot in several N8AO shaders, which I've verified directly: https://i.ameo.link/dqf.png
  • Replacing the generic mat4 x vec4 with a hand-rolled set of FMAs on pairs of non-zero entries only cuts the total FLOPs significantly while producing identical results.
  • For reference, see: https://www.derschmale.com/2014/03/19/reconstructing-positions-from-the-depth-buffer-pt-2-perspective-and-orthographic-general-case/
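
To illustrate the idea in the bullets above, here is a minimal CPU-side sketch in Python rather than shader code. It is not the code from this PR: it assumes a standard OpenGL-style perspective matrix (the layout three.js cameras produce), only covers the NDC-to-view-space step, and the function names `frustum_inverse`, `unproject_generic`, and `unproject_sparse` are made up for the example.

```python
def frustum_inverse(left, right, bottom, top, near, far):
    """Inverse of an OpenGL-style perspective projection, as a row-major 4x4 list.

    Assumption: the forward matrix has the usual layout
        [a 0 b 0]
        [0 c d 0]
        [0 0 e f]
        [0 0 -1 0]
    so its inverse has only seven non-zero entries.
    """
    a = 2.0 * near / (right - left)
    b = (right + left) / (right - left)
    c = 2.0 * near / (top - bottom)
    d = (top + bottom) / (top - bottom)
    e = -(far + near) / (far - near)
    f = -2.0 * far * near / (far - near)
    return [
        [1.0 / a, 0.0, 0.0, b / a],
        [0.0, 1.0 / c, 0.0, d / c],
        [0.0, 0.0, 0.0, -1.0],
        [0.0, 0.0, 1.0 / f, e / f],
    ]


def unproject_generic(inv, ndc):
    """Full mat4 x vec4 multiply (16 multiplies), then perspective divide."""
    x, y, z = ndc
    v = (x, y, z, 1.0)
    out = [sum(inv[r][k] * v[k] for k in range(4)) for r in range(4)]
    return tuple(out[i] / out[3] for i in range(3))


def unproject_sparse(inv, ndc):
    """Same result, touching only the entries that can be non-zero."""
    x, y, z = ndc
    vx = inv[0][0] * x + inv[0][3]   # one multiply-add
    vy = inv[1][1] * y + inv[1][3]   # one multiply-add
    vz = inv[2][3]                   # constant row, no multiply needed
    vw = inv[3][2] * z + inv[3][3]   # one multiply-add
    return (vx / vw, vy / vw, vz / vw)
```

Under these assumptions the sparse path needs three multiply-adds before the perspective divide instead of a full 4x4 product. A world-space version would still need the inverse view transform applied afterwards, which this sketch leaves out.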

The only situation where I could see this change causing problems is for users with fully custom camera matrices, for example if they're implementing some kind of unusual distortion or mirroring effect.
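
One way to reason about that caveat is to check whether a given inverse projection matrix actually has the zero pattern a sparse path relies on. The sketch below is purely illustrative and not part of this PR: `fits_sparse_path` is a hypothetical helper, and the set of used entries assumes a standard OpenGL-style perspective matrix stored as a row-major 4x4 list.

```python
# (row, col) entries a sparse perspective unprojection would read.
# Assumption: standard OpenGL-style perspective layout; every other
# entry of the inverse is expected to be zero.
USED_ENTRIES = {(0, 0), (0, 3), (1, 1), (1, 3), (2, 3), (3, 2), (3, 3)}


def fits_sparse_path(inv, eps=1e-12):
    """Return True if every entry outside USED_ENTRIES is (near) zero."""
    return all(
        abs(inv[r][c]) <= eps
        for r in range(4)
        for c in range(4)
        if (r, c) not in USED_ENTRIES
    )
```

With this pattern, an orthographic inverse fails the check because its [2][2] entry is non-zero, and so would a sheared or otherwise customized matrix, so such cameras would need the generic multiply.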
