Disable aligned allocation in non linux/macos #82

Merged: ajz34 merged 1 commit into RESTGroup:master from ajz34:state-ub-aligned-alloc on Jun 7, 2026
ajz34 (Member) commented Jun 7, 2026

This PR disables aligned allocation on platforms other than Linux and macOS (in practice, that means Windows).

Current aligned allocation follows https://users.rust-lang.org/t/how-can-i-allocate-aligned-memory-in-rust/33293.

Aligned allocation may be UB, depending on the compiler. On Linux/macOS, which usually use the LLVM backend, the aligned allocation seems to be freed correctly. On Windows, something goes wrong between the aligned pointer and the pointer that is actually freed, which leads to corruption. I am not sure of the exact reason, but it causes memory-related problems.
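For reference, here is a minimal sketch of an aligned allocation that stays within the documented contract of `std::alloc`. It is not the code in this PR, and `AlignedBuf` is a hypothetical name. The rule it illustrates is that `dealloc` must receive the same `Layout` (size and alignment) that was passed to the allocating call. Handing aligned memory to something that frees it with a different alignment breaks that rule. Whether that is the cause of the Windows problem described above is not established here.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};

/// Hypothetical helper (not part of this PR): owns `len` zero-initialized
/// f64 values whose start address is aligned to 64 bytes.
struct AlignedBuf {
    ptr: *mut f64,
    len: usize,
    // Stored so that `drop` frees with exactly the layout used to allocate.
    layout: Layout,
}

impl AlignedBuf {
    fn new(len: usize) -> Self {
        assert!(len > 0, "this sketch does not handle zero-sized buffers");
        // Layout of `[f64; len]`, with the alignment raised to 64 bytes.
        let layout = Layout::array::<f64>(len)
            .and_then(|l| l.align_to(64))
            .expect("layout overflow");
        // SAFETY: `layout` has non-zero size because `len > 0`.
        let ptr = unsafe { alloc_zeroed(layout) } as *mut f64;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        AlignedBuf { ptr, len, layout }
    }

    fn as_slice(&self) -> &[f64] {
        // SAFETY: `ptr` points to `len` initialized (zeroed) f64 values.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }

    fn as_mut_slice(&mut self) -> &mut [f64] {
        // SAFETY: same as above, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.len) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `alloc_zeroed(self.layout)` and is freed once,
        // with the same layout (including the 64-byte alignment).
        unsafe { dealloc(self.ptr as *mut u8, self.layout) };
    }
}
```

Keeping the `Layout` next to the pointer is the design choice that matters in this sketch: the buffer never passes through a container that would free it with the alignment of plain `f64`.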

This is related to a problem in rest_libcint: https://gitee.com/restgroup/rest_libcint/pulls/6, which was found and reported by @jeanwsr.

ajz34 (Member, Author) commented Jun 7, 2026

The following discussion is the motivation for introducing aligned allocation.
It is not documented in the code. I will use this PR as a scratch pad, and these comments may be documented properly somewhere in the future.

Why did I try aligned allocation, and why is it actually not that useful?

Aligned allocation increases (though apparently does not guarantee) the chance that the compiler emits vmovapd, the x86 instruction for moving aligned packed double-precision values, instead of vmovupd (the unaligned variant).

However, in practice:

  • Rust does not take any extra alignment of a *const f64 into account. The standard way is to declare a new type such as FpSimd with #[repr(align(64))]. But then you can no longer use the usual *const f64; you need to cast to *const FpSimd<f64> so the compiler knows the pointer is aligned, and you must (unsafely) make sure the cast is valid.
  • This differs from GTO evaluation (tabulated AO), where the grid is quite large and SIMD can be enabled or hinted (by passing -C target-cpu=native to rustc to activate LLVM SIMD optimization). For electronic integrals, we need to copy the integral block of each shell into a whole tensor, and the *mut f64 pointer to an integral block inside the tensor is usually not aligned to 64 bytes. So even if the whole tensor is allocated with alignment, no operations are actually aligned, and vmovapd is not useful for this task anyway.
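Both points can be illustrated with a small sketch. The definition of `FpSimd` below (eight values per 64-byte chunk) and the helpers `flatten` and `as_simd_chunks` are assumptions made for illustration, not code from this repository. The cast to the aligned type is only performed when the address really is 64-byte aligned, which is exactly the check that a block starting at an arbitrary offset inside a tensor tends to fail.

```rust
use std::mem::align_of;

/// Assumed definition for illustration: one 64-byte aligned chunk of eight values.
#[repr(C, align(64))]
#[derive(Clone, Copy)]
struct FpSimd<T>([T; 8]);

/// View aligned chunks as a plain f64 slice (always valid: alignment only decreases).
fn flatten(chunks: &[FpSimd<f64>]) -> &[f64] {
    // SAFETY: FpSimd<f64> is exactly eight contiguous f64 values, no padding.
    unsafe { std::slice::from_raw_parts(chunks.as_ptr() as *const f64, chunks.len() * 8) }
}

/// View a plain f64 slice as aligned chunks, but only if the cast is valid.
fn as_simd_chunks(data: &[f64]) -> Option<&[FpSimd<f64>]> {
    let addr = data.as_ptr() as usize;
    if addr % align_of::<FpSimd<f64>>() != 0 || data.len() % 8 != 0 {
        // Misaligned start or partial chunk: the compiler must not be told
        // that this pointer is 64-byte aligned.
        return None;
    }
    // SAFETY: the address is 64-byte aligned and the length is a whole number
    // of eight-element chunks, so every FpSimd<f64> is in bounds and aligned.
    Some(unsafe {
        std::slice::from_raw_parts(data.as_ptr() as *const FpSimd<f64>, data.len() / 8)
    })
}
```

With an aligned buffer `flat`, `as_simd_chunks(&flat[8..16])` succeeds because the sub-slice starts 64 bytes past the base, while `as_simd_chunks(&flat[3..11])` returns `None` because it starts 24 bytes past the base. An integral block copied into the middle of a tensor is usually in the second situation.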

The above comment is mostly about electronic integrals and GTO-type orbital computation, but a similar discussion may also apply here. I think most users who want to use tensors in scientific programming are not very concerned with how data is aligned. Worrying about alignment when writing code is a real burden on the programmer, since using alignment correctly usually requires intrinsics or assembly-level optimization; this is too technical and is not what a scientist should have to think about.

ajz34 merged commit 93fd109 into RESTGroup:master Jun 7, 2026
8 checks passed