Fix seq_lock cacheline padding and CACHELINE_BYTES on aarch64 (#16) by jp-pino · Pull Request #18 · BluEye-Robotics/tyndall

jp-pino · 2026-06-26T18:09:58Z

Fixes #16.

Summary

Two coupled shared-memory layout defects in seq_lock:

Padding miscalculation. padding[CACHELINE_BYTES - sizeof(seq) - sizeof(size)] assumed seq (4B) and size (8B) pack back-to-back. On 64-bit targets the compiler inserts 4 bytes between them, so the header is 16B not 12B and entry no longer started on a cacheline boundary (offset 36/40 on aarch64, 68/72 on x86_64) — defeating the seqlock's false-sharing isolation and causing the ipc_read tool to read garbage.
Wrong CACHELINE_BYTES on aarch64. smp.h hardcoded 32 for the whole __arm__ || __aarch64__ branch. That's right for armv7 (i.MX/Cortex-A9) but every aarch64 core uses 64-byte lines — verified on the target (Orin / Cortex-A78AE) via /sys/devices/system/cpu/cpu0/cache/index*/coherency_line_size = 64. Without this, even the corrected padding only isolated to a 32-byte boundary on Jetson, so seq/size and entry still shared a real cacheline.

Changes

tyndall/ipc/seq_lock.h — replace padding[] with alignas(CACHELINE_BYTES) STORAGE entry; (entry always at offsetof == CACHELINE_BYTES, independent of STORAGE alignment) + a static_assert guarding over-aligned STORAGE.
tyndall/ipc/smp.h — split the ARM branch so aarch64 gets CACHELINE_BYTES = 64, armv7 keeps 32.
tools/ipc_read.cpp — drop the now-redundant format_char_align/storage_align rounding; compute entry_offset = offsetof(seq_lock<char>, entry) directly.
tests/ipc/seq_lock_layout.cpp (new) — static_assert + runtime check that offsetof(entry) == CACHELINE_BYTES for storage types across alignment classes. Fails to compile on the buggy layout, so the bug can't regress.

Verification

Multi-arch containers (QEMU), post-fix entry offset now equals the real cacheline on every ABI:

ABI	before	after	`CACHELINE_BYTES`
armv7 (i.MX)	32 ✅	32	32
aarch64 (Orin)	36/40 ❌	64 ✅	64
x86_64	68/72 ❌	64 ✅	64

Layout test runs green on all three arches (incl. 8-byte-aligned vec3/mixed that previously broke).
Cross-compiled for i.MX and Jetson via the Yocto SDK: tests (incl. the new layout test — static_asserts pass on the real aarch64 toolchain) and tyndall_tool_ipc_read build cleanly.

See issue #16 for the full investigation, the multi-arch reproduction, and the on-device cache-line confirmation.

🤖 Generated with Claude Code

Two coupled layout defects (issue #16): 1. seq_lock<STORAGE>'s padding[] was sized as CACHELINE_BYTES - sizeof(seq) - sizeof(size), assuming seq (4B) and size (8B) pack back-to-back. On 64-bit targets the compiler inserts 4 bytes between them, so the header is 16B not 12B and entry no longer started on a cacheline boundary (offset 36/40 on aarch64, 68/72 on x86_64) — defeating the seqlock's false-sharing isolation. Fix: replace padding[] with `alignas(CACHELINE_BYTES) STORAGE entry`, which places entry at offsetof == CACHELINE_BYTES on every ABI regardless of STORAGE alignment, plus a static_assert guarding against over-aligned STORAGE. 2. smp.h hardcoded CACHELINE_BYTES = 32 for the whole `__arm__ || __aarch64__` branch. armv7 (i.MX/Cortex-A9) is 32, but every aarch64 core uses 64-byte lines — verified on the target (Orin/Cortex-A78AE) via /sys/.../cache/index*/coherency_line_size = 64. Split the branch so aarch64 gets 64. Without this, even the corrected padding only isolated to a 32-byte boundary on Jetson, so seq/size and entry still shared a real cacheline. With entry now always one cacheline in, the ipc_read tool's storage_align rounding (derived from the format string) is redundant, so entry_offset is computed directly from offsetof(seq_lock<char>, entry). Add tests/ipc/seq_lock_layout.cpp asserting offsetof(entry) == CACHELINE_BYTES across storage alignment classes; it fails to compile on the buggy layout. Verified across armv7/aarch64/x86_64 via multi-arch containers and cross-compiled for i.MX and Jetson. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR fixes seq_lock’s shared-memory layout so the payload (entry) reliably starts on a cacheline boundary across 32-bit ARM, aarch64, and x86_64, preventing false sharing and correcting external tooling offset calculations (per issue #16).

Changes:

Correct CACHELINE_BYTES on aarch64 to 64 while keeping armv7 at 32.
Replace manual header padding in seq_lock with alignas(CACHELINE_BYTES) STORAGE entry plus a guard against over-aligned storage.
Simplify tools/ipc_read to use offsetof(seq_lock<char>, entry) directly, and add a new layout regression test.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
`tyndall/ipc/smp.h`	Splits ARM/aarch64 handling so aarch64 uses 64-byte cachelines.
`tyndall/ipc/seq_lock.h`	Makes `entry` cacheline-aligned via `alignas`, removing brittle manual padding.
`tools/ipc_read.cpp`	Computes `entry_offset` via `offsetof` now that `entry` offset is fixed.
`tests/ipc/seq_lock_layout.cpp`	Adds compile-time and runtime checks to prevent layout regressions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+  const size_t off =
+      reinterpret_cast<char *>(&s.entry) - reinterpret_cast<char *>(&s);


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

jp-pino · 2026-06-26T18:19:40Z

On-device end-to-end verification (Orin / aarch64)

Deployed this branch's binaries to an isolated /tmp dir on the target (Orin, Cortex-A78AE, coherency_line_size = 64) with a private LD_LIBRARY_PATH — no system install, so other running services were untouched. Published two payloads to unique channels and read them back with ipc_read.

8-byte-aligned vec3 (3 doubles) — the case that previously printed 0,0,0:

$ tyndall_ex_ipc_vec3_writer &        # writes {1.5, 2.5, 3.5} to /test/seqlock_vec3
$ tyndall_tool_ipc_read <shm> ddd
0(d): 1.500000
1(d): 2.500000
2(d): 3.500000

my_struct {unsigned char i; float j} round-trip:

$ tyndall_tool_ipc_read <shm> hf
0(h): 195
1(f): 963.000000        # consistent: writer increments both; 963 mod 256 = 195

Both payloads read back exactly as written — no phantom leading 0.0, no field misalignment. Test channels and the temp dir were cleaned up afterward.

This complements the static/compile-time verification (multi-arch container runs + i.MX/Jetson cross-compiles in the issue thread): the layout is now correct on real hardware as well.

jp-pino · 2026-06-26T18:20:11Z

Still need to test on i.MX

Subtracting two char* derived from distinct subobjects (the seq_lock object vs its entry member) is not well-defined; cast to uintptr_t and subtract the addresses instead. Addresses PR #18 review feedback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jp-pino requested review from Copilot, follesoe and johannesschrimpf June 26, 2026 18:12

jp-pino self-assigned this Jun 26, 2026

jp-pino added the bug Something isn't working label Jun 26, 2026

jp-pino added this to the Blunux v5.1 milestone Jun 26, 2026

Copilot started reviewing on behalf of jp-pino June 26, 2026 18:12 View session

Copilot AI reviewed Jun 26, 2026

View reviewed changes

Comment thread tyndall/ipc/smp.h Outdated

Comment thread tools/ipc_read.cpp

Comment thread tests/ipc/seq_lock_layout.cpp

Comment thread tests/ipc/seq_lock_layout.cpp Outdated

Comment on lines +55 to +56

const size_t off =

reinterpret_cast<char *>(&s.entry) - reinterpret_cast<char *>(&s);

jp-pino and others added 2 commits June 26, 2026 20:16

Potential fix for pull request finding

13ed54c

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

5dd43bc

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

johannesschrimpf approved these changes Jun 29, 2026

View reviewed changes

jp-pino merged commit c83dab3 into main Jun 29, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix seq_lock cacheline padding and CACHELINE_BYTES on aarch64 (#16)#18

Fix seq_lock cacheline padding and CACHELINE_BYTES on aarch64 (#16)#18
jp-pino merged 4 commits into
mainfrom
jp-pino/fix-seqlock-cacheline-padding

jp-pino commented Jun 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jp-pino commented Jun 26, 2026

Uh oh!

jp-pino commented Jun 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		const size_t off =
		reinterpret_cast<char >(&s.entry) - reinterpret_cast<char >(&s);

Uh oh!

Conversation

jp-pino commented Jun 26, 2026

Summary

Changes

Verification

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jp-pino commented Jun 26, 2026

On-device end-to-end verification (Orin / aarch64)

Uh oh!

jp-pino commented Jun 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants