Skip to content

Fix seq_lock cacheline padding and CACHELINE_BYTES on aarch64 (#16)#18

Merged
jp-pino merged 4 commits into
mainfrom
jp-pino/fix-seqlock-cacheline-padding
Jun 29, 2026
Merged

Fix seq_lock cacheline padding and CACHELINE_BYTES on aarch64 (#16)#18
jp-pino merged 4 commits into
mainfrom
jp-pino/fix-seqlock-cacheline-padding

Conversation

@jp-pino

@jp-pino jp-pino commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Fixes #16.

Summary

Two coupled shared-memory layout defects in seq_lock:

  1. Padding miscalculation. padding[CACHELINE_BYTES - sizeof(seq) - sizeof(size)] assumed seq (4B) and size (8B) pack back-to-back. On 64-bit targets the compiler inserts 4 bytes between them, so the header is 16B not 12B and entry no longer started on a cacheline boundary (offset 36/40 on aarch64, 68/72 on x86_64) — defeating the seqlock's false-sharing isolation and causing the ipc_read tool to read garbage.

  2. Wrong CACHELINE_BYTES on aarch64. smp.h hardcoded 32 for the whole __arm__ || __aarch64__ branch. That's right for armv7 (i.MX/Cortex-A9) but every aarch64 core uses 64-byte lines — verified on the target (Orin / Cortex-A78AE) via /sys/devices/system/cpu/cpu0/cache/index*/coherency_line_size = 64. Without this, even the corrected padding only isolated to a 32-byte boundary on Jetson, so seq/size and entry still shared a real cacheline.

Changes

  • tyndall/ipc/seq_lock.h — replace padding[] with alignas(CACHELINE_BYTES) STORAGE entry; (entry always at offsetof == CACHELINE_BYTES, independent of STORAGE alignment) + a static_assert guarding over-aligned STORAGE.
  • tyndall/ipc/smp.h — split the ARM branch so aarch64 gets CACHELINE_BYTES = 64, armv7 keeps 32.
  • tools/ipc_read.cpp — drop the now-redundant format_char_align/storage_align rounding; compute entry_offset = offsetof(seq_lock<char>, entry) directly.
  • tests/ipc/seq_lock_layout.cpp (new) — static_assert + runtime check that offsetof(entry) == CACHELINE_BYTES for storage types across alignment classes. Fails to compile on the buggy layout, so the bug can't regress.

Verification

Multi-arch containers (QEMU), post-fix entry offset now equals the real cacheline on every ABI:

ABI before after CACHELINE_BYTES
armv7 (i.MX) 32 ✅ 32 32
aarch64 (Orin) 36/40 ❌ 64 64
x86_64 68/72 ❌ 64 ✅ 64
  • Layout test runs green on all three arches (incl. 8-byte-aligned vec3/mixed that previously broke).
  • Cross-compiled for i.MX and Jetson via the Yocto SDK: tests (incl. the new layout test — static_asserts pass on the real aarch64 toolchain) and tyndall_tool_ipc_read build cleanly.

See issue #16 for the full investigation, the multi-arch reproduction, and the on-device cache-line confirmation.

🤖 Generated with Claude Code

Two coupled layout defects (issue #16):

1. seq_lock<STORAGE>'s padding[] was sized as
   CACHELINE_BYTES - sizeof(seq) - sizeof(size), assuming seq (4B) and
   size (8B) pack back-to-back. On 64-bit targets the compiler inserts
   4 bytes between them, so the header is 16B not 12B and entry no
   longer started on a cacheline boundary (offset 36/40 on aarch64,
   68/72 on x86_64) — defeating the seqlock's false-sharing isolation.

   Fix: replace padding[] with `alignas(CACHELINE_BYTES) STORAGE entry`,
   which places entry at offsetof == CACHELINE_BYTES on every ABI
   regardless of STORAGE alignment, plus a static_assert guarding
   against over-aligned STORAGE.

2. smp.h hardcoded CACHELINE_BYTES = 32 for the whole `__arm__ ||
   __aarch64__` branch. armv7 (i.MX/Cortex-A9) is 32, but every aarch64
   core uses 64-byte lines — verified on the target (Orin/Cortex-A78AE)
   via /sys/.../cache/index*/coherency_line_size = 64. Split the branch
   so aarch64 gets 64. Without this, even the corrected padding only
   isolated to a 32-byte boundary on Jetson, so seq/size and entry still
   shared a real cacheline.

With entry now always one cacheline in, the ipc_read tool's
storage_align rounding (derived from the format string) is redundant, so
entry_offset is computed directly from offsetof(seq_lock<char>, entry).

Add tests/ipc/seq_lock_layout.cpp asserting offsetof(entry) ==
CACHELINE_BYTES across storage alignment classes; it fails to compile on
the buggy layout. Verified across armv7/aarch64/x86_64 via multi-arch
containers and cross-compiled for i.MX and Jetson.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jp-pino jp-pino self-assigned this Jun 26, 2026
@jp-pino jp-pino added the bug Something isn't working label Jun 26, 2026
@jp-pino jp-pino added this to the Blunux v5.1 milestone Jun 26, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes seq_lock’s shared-memory layout so the payload (entry) reliably starts on a cacheline boundary across 32-bit ARM, aarch64, and x86_64, preventing false sharing and correcting external tooling offset calculations (per issue #16).

Changes:

  • Correct CACHELINE_BYTES on aarch64 to 64 while keeping armv7 at 32.
  • Replace manual header padding in seq_lock with alignas(CACHELINE_BYTES) STORAGE entry plus a guard against over-aligned storage.
  • Simplify tools/ipc_read to use offsetof(seq_lock<char>, entry) directly, and add a new layout regression test.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
tyndall/ipc/smp.h Splits ARM/aarch64 handling so aarch64 uses 64-byte cachelines.
tyndall/ipc/seq_lock.h Makes entry cacheline-aligned via alignas, removing brittle manual padding.
tools/ipc_read.cpp Computes entry_offset via offsetof now that entry offset is fixed.
tests/ipc/seq_lock_layout.cpp Adds compile-time and runtime checks to prevent layout regressions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tyndall/ipc/smp.h Outdated
Comment thread tools/ipc_read.cpp
Comment thread tests/ipc/seq_lock_layout.cpp
Comment thread tests/ipc/seq_lock_layout.cpp Outdated
Comment on lines +55 to +56
const size_t off =
reinterpret_cast<char *>(&s.entry) - reinterpret_cast<char *>(&s);
jp-pino and others added 2 commits June 26, 2026 20:16
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@jp-pino

jp-pino commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

On-device end-to-end verification (Orin / aarch64)

Deployed this branch's binaries to an isolated /tmp dir on the target (Orin, Cortex-A78AE, coherency_line_size = 64) with a private LD_LIBRARY_PATH — no system install, so other running services were untouched. Published two payloads to unique channels and read them back with ipc_read.

8-byte-aligned vec3 (3 doubles) — the case that previously printed 0,0,0:

$ tyndall_ex_ipc_vec3_writer &        # writes {1.5, 2.5, 3.5} to /test/seqlock_vec3
$ tyndall_tool_ipc_read <shm> ddd
0(d): 1.500000
1(d): 2.500000
2(d): 3.500000

my_struct {unsigned char i; float j} round-trip:

$ tyndall_tool_ipc_read <shm> hf
0(h): 195
1(f): 963.000000        # consistent: writer increments both; 963 mod 256 = 195

Both payloads read back exactly as written — no phantom leading 0.0, no field misalignment. Test channels and the temp dir were cleaned up afterward.

This complements the static/compile-time verification (multi-arch container runs + i.MX/Jetson cross-compiles in the issue thread): the layout is now correct on real hardware as well.

@jp-pino

jp-pino commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Still need to test on i.MX

Subtracting two char* derived from distinct subobjects (the seq_lock
object vs its entry member) is not well-defined; cast to uintptr_t and
subtract the addresses instead. Addresses PR #18 review feedback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jp-pino jp-pino merged commit c83dab3 into main Jun 29, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

seq_lock header padding miscalculation on 64-bit targets

3 participants