Fix seq_lock cacheline padding and CACHELINE_BYTES on aarch64 (#16)#18
Conversation
Two coupled layout defects (issue #16): 1. seq_lock<STORAGE>'s padding[] was sized as CACHELINE_BYTES - sizeof(seq) - sizeof(size), assuming seq (4B) and size (8B) pack back-to-back. On 64-bit targets the compiler inserts 4 bytes between them, so the header is 16B not 12B and entry no longer started on a cacheline boundary (offset 36/40 on aarch64, 68/72 on x86_64) — defeating the seqlock's false-sharing isolation. Fix: replace padding[] with `alignas(CACHELINE_BYTES) STORAGE entry`, which places entry at offsetof == CACHELINE_BYTES on every ABI regardless of STORAGE alignment, plus a static_assert guarding against over-aligned STORAGE. 2. smp.h hardcoded CACHELINE_BYTES = 32 for the whole `__arm__ || __aarch64__` branch. armv7 (i.MX/Cortex-A9) is 32, but every aarch64 core uses 64-byte lines — verified on the target (Orin/Cortex-A78AE) via /sys/.../cache/index*/coherency_line_size = 64. Split the branch so aarch64 gets 64. Without this, even the corrected padding only isolated to a 32-byte boundary on Jetson, so seq/size and entry still shared a real cacheline. With entry now always one cacheline in, the ipc_read tool's storage_align rounding (derived from the format string) is redundant, so entry_offset is computed directly from offsetof(seq_lock<char>, entry). Add tests/ipc/seq_lock_layout.cpp asserting offsetof(entry) == CACHELINE_BYTES across storage alignment classes; it fails to compile on the buggy layout. Verified across armv7/aarch64/x86_64 via multi-arch containers and cross-compiled for i.MX and Jetson. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes seq_lock’s shared-memory layout so the payload (entry) reliably starts on a cacheline boundary across 32-bit ARM, aarch64, and x86_64, preventing false sharing and correcting external tooling offset calculations (per issue #16).
Changes:
- Correct
CACHELINE_BYTESon aarch64 to 64 while keeping armv7 at 32. - Replace manual header padding in
seq_lockwithalignas(CACHELINE_BYTES) STORAGE entryplus a guard against over-aligned storage. - Simplify
tools/ipc_readto useoffsetof(seq_lock<char>, entry)directly, and add a new layout regression test.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
tyndall/ipc/smp.h |
Splits ARM/aarch64 handling so aarch64 uses 64-byte cachelines. |
tyndall/ipc/seq_lock.h |
Makes entry cacheline-aligned via alignas, removing brittle manual padding. |
tools/ipc_read.cpp |
Computes entry_offset via offsetof now that entry offset is fixed. |
tests/ipc/seq_lock_layout.cpp |
Adds compile-time and runtime checks to prevent layout regressions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const size_t off = | ||
| reinterpret_cast<char *>(&s.entry) - reinterpret_cast<char *>(&s); |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
On-device end-to-end verification (Orin / aarch64)Deployed this branch's binaries to an isolated 8-byte-aligned
Both payloads read back exactly as written — no phantom leading This complements the static/compile-time verification (multi-arch container runs + i.MX/Jetson cross-compiles in the issue thread): the layout is now correct on real hardware as well. |
|
Still need to test on i.MX |
Subtracting two char* derived from distinct subobjects (the seq_lock object vs its entry member) is not well-defined; cast to uintptr_t and subtract the addresses instead. Addresses PR #18 review feedback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes #16.
Summary
Two coupled shared-memory layout defects in
seq_lock:Padding miscalculation.
padding[CACHELINE_BYTES - sizeof(seq) - sizeof(size)]assumedseq(4B) andsize(8B) pack back-to-back. On 64-bit targets the compiler inserts 4 bytes between them, so the header is 16B not 12B andentryno longer started on a cacheline boundary (offset 36/40 on aarch64, 68/72 on x86_64) — defeating the seqlock's false-sharing isolation and causing theipc_readtool to read garbage.Wrong
CACHELINE_BYTESon aarch64.smp.hhardcoded32for the whole__arm__ || __aarch64__branch. That's right for armv7 (i.MX/Cortex-A9) but every aarch64 core uses 64-byte lines — verified on the target (Orin / Cortex-A78AE) via/sys/devices/system/cpu/cpu0/cache/index*/coherency_line_size = 64. Without this, even the corrected padding only isolated to a 32-byte boundary on Jetson, soseq/sizeandentrystill shared a real cacheline.Changes
tyndall/ipc/seq_lock.h— replacepadding[]withalignas(CACHELINE_BYTES) STORAGE entry;(entry always atoffsetof == CACHELINE_BYTES, independent of STORAGE alignment) + astatic_assertguarding over-aligned STORAGE.tyndall/ipc/smp.h— split the ARM branch so aarch64 getsCACHELINE_BYTES = 64, armv7 keeps32.tools/ipc_read.cpp— drop the now-redundantformat_char_align/storage_alignrounding; computeentry_offset = offsetof(seq_lock<char>, entry)directly.tests/ipc/seq_lock_layout.cpp(new) —static_assert+ runtime check thatoffsetof(entry) == CACHELINE_BYTESfor storage types across alignment classes. Fails to compile on the buggy layout, so the bug can't regress.Verification
Multi-arch containers (QEMU), post-fix
entryoffset now equals the real cacheline on every ABI:CACHELINE_BYTESvec3/mixedthat previously broke).tests(incl. the new layout test —static_assertspass on the real aarch64 toolchain) andtyndall_tool_ipc_readbuild cleanly.See issue #16 for the full investigation, the multi-arch reproduction, and the on-device cache-line confirmation.
🤖 Generated with Claude Code