Use intrinsics for Mask::{to,from}_array #189
a32f2cf to d2ab646
This version should work except that I believe it requires
d2ab646 to 62c9c97
This significantly simplifies codegen and should improve mask perf. Co-authored-by: Jacob Lifshay <programmerjake@gmail.com>
62c9c97 to d6bc68b
programmerjake
left a comment
lgtm, once simd_cast is fixed
    // false: 0b_0000_0000
    // Thus, an array of bools is also a valid array of bytes: [u8; N]
    // This would be hypothetically valid as an "in-place" transmute,
    // but these are "dependently-sized" types, so copy elision it is!
transmute_copy doesn't check if the input and output types have the same size.
yup, but we know they're the same size since the input type is [bool; LANES] and the output type is [u8; LANES] and Rust guarantees that bool is a single byte.
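The size argument above can be sketched in stable Rust. This is only an illustration, not the code from this PR: `bools_to_bytes` is a hypothetical helper name, and the compile-time assertion is an extra guard added here to make the "bool is a single byte" assumption explicit.

```rust
use std::mem;

// Rust guarantees that `bool` occupies exactly one byte; this assertion
// restates that guarantee at compile time.
const _: () = assert!(mem::size_of::<bool>() == 1);

/// Hypothetical helper (not part of the PR): reinterpret `[bool; N]` as `[u8; N]`.
fn bools_to_bytes<const N: usize>(bools: [bool; N]) -> [u8; N] {
    // SAFETY: `transmute_copy` does not check sizes itself, but `[bool; N]`
    // and `[u8; N]` have the same size and alignment, and every `bool`
    // bit pattern (0 or 1) is a valid `u8`.
    unsafe { mem::transmute_copy(&bools) }
}
```

A plain `mem::transmute` is rejected here because the compiler cannot see that the two generic array types have equal size, which is why the copying variant is used.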
    let bools: Simd<i8, LANES> =
        intrinsics::simd_ne(Simd::from_array(bytes), Simd::splat(0u8));
Why is this necessary? Can't this be just Simd::from_array(bytes)?
Edit: is it because this needs -1 for true instead of 1?
Exactly. It could also be simd_neg, come to think of it.
simd_ne is best because LLVM knows the result is supposed to be a mask; with neg, LLVM doesn't know that, so it will likely optimize more poorly.
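To make the -1 versus 1 point concrete, here is a scalar model of the two options in stable Rust. It is a sketch only: `ne_zero_lanes` and `neg_lanes` are hypothetical names that imitate, lane by lane, what a "not equal to zero" comparison and a negation would produce. They do not call the real intrinsics and say nothing about the generated code.

```rust
/// Hypothetical scalar model of a lane-wise `!= 0` comparison:
/// every lane becomes -1 (all bits set) for true or 0 for false.
fn ne_zero_lanes<const N: usize>(bytes: [u8; N]) -> [i8; N] {
    bytes.map(|b| if b != 0 { -1 } else { 0 })
}

/// Hypothetical scalar model of lane-wise negation: 1 becomes -1 and
/// 0 stays 0, but other inputs do not become valid mask values.
fn neg_lanes<const N: usize>(bytes: [u8; N]) -> [i8; N] {
    bytes.map(|b| (b as i8).wrapping_neg())
}
```

For bytes that came from bools (only 0 or 1), both models give the same lanes. Only the comparison form yields 0 or -1 for every possible input, which matches the point that its result is recognizably a mask.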
Now that rust-lang/rust#92425 is merged, that should allow this to land. Just have to wait for nightly to roll over.
This significantly simplifies codegen and should improve mask perf.
Thanks to @programmerjake for the example for from_array! This fixes #184.