Summary
The newly merged lru_lock_nmi selftest (commit 6e1e4a9d60ed, Jun 7 2026) fails deterministically on aarch64 and intermittently on x86_64, with drain_then_verify_capacity returning -EIO. The test's post-stress verification runs before the rqspinlock recovery path has had time to reclaim nodes left in the pending_free state.
Failure Details
- Test / Component: lru_lock_nmi (common_lru, no_common_lru, percpu_lru subtests)
- Frequency: Every aarch64 CI run since Jun 7; occasional x86_64 (percpu_lru subtest). Observed in 5+ independent PRs.
- Failure mode: drain_then_verify_capacity returns -EIO; after NMI stress, drain, and refill, some inserted keys are missing because of unexpected LRU eviction
- Affected architectures: aarch64 (deterministic), x86_64 (intermittent, percpu_lru subtest)
- CI runs observed:
Root Cause Analysis
Commit 89edbdfc5d03 ("bpf: Fix NMI/tracepoint re-entry deadlock on lru locks") converts LRU locks to raw_res_spin_lock (rqspinlock) and adds recovery paths for lock acquisition failure. When a lock cannot be acquired (NMI re-entry), nodes are marked pending_free=1 and either:
- Published to a lockless free_llist for later pickup
- Left on the pending/inactive list with pending_free=1 for lazy reclamation
The lazy reclamation happens during:
- __local_list_flush() (called from bpf_lru_list_pop_free_to_local())
- __bpf_lru_list_shrink_inactive() (called during LRU shrink)
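To make the two recovery outcomes concrete, here is a minimal userspace model in C. It is a sketch under stated assumptions, not the kernel code. struct lru_node, strand_node() and reclaim_pending() are hypothetical names. The lockless free list is modeled as a C11-atomics push stack that stands in for the kernel's llist. A single reclaim_pending() pass stands in for both __local_list_flush() and __bpf_lru_list_shrink_inactive(). The point it illustrates is that a stranded node keeps occupying its slot until some later reclamation pass runs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct bpf_lru_node. */
struct lru_node {
	struct lru_node *next_free; /* link in the lockless free list */
	bool pending_free;          /* set when the lock could not be taken */
	bool in_use;                /* node still occupies a map slot */
};

/* Lock-free push stack, standing in for the kernel's llist. */
static _Atomic(struct lru_node *) free_llist = NULL;

/* Recovery path after a failed lock: mark the node, optionally publish it. */
static void strand_node(struct lru_node *n, bool publish)
{
	n->pending_free = true;
	if (!publish)
		return; /* left in place on the pending/inactive list */

	struct lru_node *head = atomic_load(&free_llist);
	do {
		n->next_free = head;
	} while (!atomic_compare_exchange_weak(&free_llist, &head, n));
}

/* Lazy reclamation pass; returns how many stranded nodes it freed. */
static int reclaim_pending(struct lru_node *nodes, size_t nr)
{
	int reclaimed = 0;

	/* Take everything published to the lockless list in one exchange. */
	struct lru_node *n = atomic_exchange(&free_llist, NULL);
	while (n) {
		struct lru_node *next = n->next_free;

		n->next_free = NULL;
		if (n->pending_free) {
			n->pending_free = false;
			n->in_use = false;
			reclaimed++;
		}
		n = next;
	}

	/* Sweep nodes that were left in place with pending_free set. */
	for (size_t i = 0; i < nr; i++) {
		if (nodes[i].pending_free) {
			nodes[i].pending_free = false;
			nodes[i].in_use = false;
			reclaimed++;
		}
	}
	return reclaimed;
}
```

Until reclaim_pending() runs, both kinds of stranded node still count as in use, which is the window the test's verification falls into.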
The test (tools/testing/selftests/bpf/prog_tests/lru_lock_nmi.c:228) calls drain_then_verify_capacity() immediately after destroying perf event links and joining hammer threads. This function:
- Deletes all keys (drains the map)
- Refills with exactly MAP_ENTRIES (64) unique keys spread across CPUs
- Verifies all keys are retrievable
If pending_free nodes haven't been reclaimed yet, they still consume map capacity. The refill phase inserts 64 keys into a map that internally has fewer than 64 free slots (some occupied by pending_free nodes), triggering LRU eviction. Evicted keys then fail the lookup verification.
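The capacity arithmetic can be checked with a toy model. This is a hypothetical simplification, not the BPF LRU implementation. struct toy_lru, toy_insert(), toy_lookup() and toy_drain_then_verify() are illustrative names. Eviction is modeled as strict oldest-first over a single list, whereas the real map also has per-CPU and common-list behavior. The drain is assumed to leave stranded slots unusable. With 3 stranded slots, only 61 of the 64 refilled keys survive, so verification returns -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAP_ENTRIES 64

/* Hypothetical toy LRU: resident keys kept oldest first. */
struct toy_lru {
	int keys[MAP_ENTRIES];
	int nr;       /* number of resident keys */
	int stranded; /* slots still held by pending_free nodes */
};

static void toy_insert(struct toy_lru *m, int key)
{
	int usable = MAP_ENTRIES - m->stranded;

	assert(usable > 0 && m->nr <= usable);
	if (m->nr == usable) {
		/* No free slot: evict the oldest key (index 0). */
		for (int i = 1; i < m->nr; i++)
			m->keys[i - 1] = m->keys[i];
		m->nr--;
	}
	m->keys[m->nr++] = key;
}

static bool toy_lookup(const struct toy_lru *m, int key)
{
	for (int i = 0; i < m->nr; i++)
		if (m->keys[i] == key)
			return true;
	return false;
}

/* Same drain, refill, verify shape as drain_then_verify_capacity(). */
static int toy_drain_then_verify(struct toy_lru *m)
{
	m->nr = 0; /* drain: stranded slots are NOT freed by this */
	for (int k = 0; k < MAP_ENTRIES; k++)
		toy_insert(m, k); /* refill with 64 unique keys */
	for (int k = 0; k < MAP_ENTRIES; k++)
		if (!toy_lookup(m, k))
			return -EIO; /* an early key was evicted */
	return 0;
}
```

With no stranded slots the refill fits exactly. Each stranded slot costs one of the earliest refilled keys.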
On aarch64, perf events are delivered as IRQ-level exceptions rather than true NMIs. These timing characteristics, which differ slightly from x86 NMIs, open lock-contention windows more often, making the failure deterministic rather than intermittent.
Proposed Fix
See 0001-selftests-bpf-Fix-lru_lock_nmi-post-stress-verificat.patch:
- Add a 10 ms sleep after destroying the perf links and joining the threads, so that in-flight interrupt handlers can complete.
- Retry verification up to 3 times. Each drain+refill cycle triggers internal LRU list maintenance that reclaims pending_free nodes, and after 2-3 cycles all stranded nodes are recovered.
This approach does not depend on the 10 ms delay being long enough. If some nodes are still stranded after the sleep, the retries use the kernel's own reclamation machinery to recover them before the test asserts.
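The shape of the proposed fix can be sketched as follows. The patch body is not reproduced in this report, so this is a hypothetical outline. settle_then_verify(), verify_fn, struct fake_map and fake_verify() are illustrative names. The stand-in verifier assumes that each failed pass reclaims one stranded node. The 10 ms delay and the 3-attempt limit come from the description above.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#include <assert.h>
#include <errno.h>
#include <time.h>

#define VERIFY_ATTEMPTS 3         /* "up to 3 times" per the proposed fix */
#define SETTLE_NS (10 * 1000000L) /* 10 ms settle delay */

typedef int (*verify_fn)(void *ctx);

/* Hypothetical helper; the real patch may be structured differently. */
static int settle_then_verify(verify_fn verify, void *ctx)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = SETTLE_NS };
	int err = -EIO;

	/* Give in-flight interrupt handlers time to finish. */
	nanosleep(&ts, NULL);

	for (int attempt = 0; attempt < VERIFY_ATTEMPTS; attempt++) {
		/* In the real test, each drain+refill also drives LRU maintenance. */
		err = verify(ctx);
		if (!err)
			break;
	}
	return err; /* last error if every attempt failed */
}

/* Stand-in verifier: fails while nodes are stranded, reclaims one per pass. */
struct fake_map {
	int stranded;
	int calls;
};

static int fake_verify(void *ctx)
{
	struct fake_map *m = ctx;

	m->calls++;
	if (m->stranded > 0) {
		m->stranded--; /* this pass's maintenance frees one node */
		return -EIO;
	}
	return 0;
}
```

Because the helper still returns the last error once the attempts run out, a capacity leak that persists across all three cycles keeps failing the test instead of being masked.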
Impact
Every BPF CI run on aarch64 (test_progs, test_progs_no_alu32, test_progs_cpuv4) fails due to this test. This blocks meaningful CI signal for all submitted PRs on aarch64. On x86_64, the percpu_lru subtest fails intermittently, causing additional noise.
References