[bpf-ci-bot] lru_lock_nmi test fails deterministically on aarch64 due to pending_free node reclamation race #488

@kernel-patches-review-bot

Summary

The newly merged lru_lock_nmi selftest (commit 6e1e4a9d60ed, Jun 7 2026) fails deterministically on aarch64 and intermittently on x86_64: drain_then_verify_capacity returns -EIO. The test's post-stress verification runs before the rqspinlock recovery path has had time to reclaim nodes left in the pending_free state.

Failure Details

Root Cause Analysis

Commit 89edbdfc5d03 ("bpf: Fix NMI/tracepoint re-entry deadlock on lru locks") converts the LRU locks to raw_res_spin_lock (rqspinlock) and adds recovery paths for lock acquisition failure. When a lock cannot be acquired (for example, on NMI re-entry), the affected nodes are marked pending_free=1 and are either:

  • Published to a lockless free_llist for later pickup
  • Left on the pending/inactive list with pending_free=1 for lazy reclamation

The lazy reclamation happens during:

  • __local_list_flush() (called from bpf_lru_list_pop_free_to_local())
  • __bpf_lru_list_shrink_inactive() (called during LRU shrink)
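
The deferred-free behaviour described above can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions: `toy_node`, `toy_list`, `toy_free_node()` and `toy_flush()` are hypothetical names invented for illustration, the `locked` flag merely stands in for a failed lock acquisition, and none of this is the kernel's actual data structure or code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NODES 8

/* Hypothetical node: not the kernel's struct bpf_lru_node. */
struct toy_node {
	bool in_use;       /* node currently counts against capacity */
	bool pending_free; /* free was requested but could not complete */
};

struct toy_list {
	bool locked;       /* stands in for "lock acquisition failed" */
	struct toy_node nodes[TOY_NODES];
};

/* Try to free a node. Returns true if freed now, false if deferred. */
static bool toy_free_node(struct toy_list *l, size_t idx)
{
	assert(idx < TOY_NODES);
	if (l->locked) {
		/* Cannot touch the list: mark the node for lazy reclamation. */
		l->nodes[idx].pending_free = true;
		return false;
	}
	l->nodes[idx].in_use = false;
	return true;
}

/* Lazy reclamation pass. Returns the number of nodes reclaimed. */
static size_t toy_flush(struct toy_list *l)
{
	size_t reclaimed = 0;

	for (size_t i = 0; i < TOY_NODES; i++) {
		if (l->nodes[i].pending_free) {
			l->nodes[i].pending_free = false;
			l->nodes[i].in_use = false;
			reclaimed++;
		}
	}
	return reclaimed;
}

/* Number of nodes that are actually available for reuse. */
static size_t toy_free_count(const struct toy_list *l)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_NODES; i++)
		if (!l->nodes[i].in_use)
			n++;
	return n;
}
```

In this model, a node freed while `locked` is set stays unavailable until some later `toy_flush()` call runs, which mirrors why stranded nodes keep consuming capacity until one of the reclamation sites listed above is reached.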

The test (tools/testing/selftests/bpf/prog_tests/lru_lock_nmi.c:228) calls drain_then_verify_capacity() immediately after destroying perf event links and joining hammer threads. This function:

  1. Deletes all keys (drains the map)
  2. Refills with exactly MAP_ENTRIES (64) unique keys spread across CPUs
  3. Verifies all keys are retrievable

If pending_free nodes haven't been reclaimed yet, they still consume map capacity. The refill phase inserts 64 keys into a map that internally has fewer than 64 free slots (some occupied by pending_free nodes), triggering LRU eviction. Evicted keys then fail the lookup verification.
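
The capacity arithmetic can be reproduced with a toy model. The sketch below assumes a single flat array of MAP_ENTRIES slots with strict oldest-first eviction, which is much simpler than the real per-CPU LRU lists; `toy_lru`, `toy_insert()`, `toy_lookup()` and `toy_missing_after_refill()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAP_ENTRIES 64

/* Each slot is free, holds a live key, or is stranded (pending_free). */
enum slot_state { SLOT_FREE, SLOT_LIVE, SLOT_STRANDED };

struct toy_lru {
	enum slot_state state[MAP_ENTRIES];
	int key[MAP_ENTRIES];
	unsigned long age[MAP_ENTRIES]; /* smaller value means older entry */
	unsigned long clock;
};

static void toy_init(struct toy_lru *m, int stranded)
{
	assert(stranded >= 0 && stranded <= MAP_ENTRIES);
	m->clock = 0;
	for (int i = 0; i < MAP_ENTRIES; i++) {
		m->state[i] = i < stranded ? SLOT_STRANDED : SLOT_FREE;
		m->key[i] = -1;
		m->age[i] = 0;
	}
}

/* Insert a key: use a free slot if one exists, else evict the oldest. */
static void toy_insert(struct toy_lru *m, int key)
{
	int victim = -1;

	for (int i = 0; i < MAP_ENTRIES; i++) {
		if (m->state[i] == SLOT_STRANDED)
			continue; /* stranded slots cannot be reused */
		if (m->state[i] == SLOT_FREE) {
			victim = i;
			break;
		}
		if (victim < 0 || m->age[i] < m->age[victim])
			victim = i;
	}
	if (victim < 0)
		return; /* every slot is stranded: the insert is dropped */
	m->state[victim] = SLOT_LIVE;
	m->key[victim] = key;
	m->age[victim] = ++m->clock;
}

static bool toy_lookup(const struct toy_lru *m, int key)
{
	for (int i = 0; i < MAP_ENTRIES; i++)
		if (m->state[i] == SLOT_LIVE && m->key[i] == key)
			return true;
	return false;
}

/* Refill with MAP_ENTRIES unique keys and count failed lookups. */
static int toy_missing_after_refill(int stranded)
{
	struct toy_lru m;
	int missing = 0;

	toy_init(&m, stranded);
	for (int k = 0; k < MAP_ENTRIES; k++)
		toy_insert(&m, k);
	for (int k = 0; k < MAP_ENTRIES; k++)
		if (!toy_lookup(&m, k))
			missing++;
	return missing;
}
```

Under these assumptions, each stranded slot costs exactly one key: with no stranded slots every lookup succeeds, while with five stranded slots the last five inserts evict the first five keys and five lookups fail.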

On aarch64, perf events are delivered as IRQ-level exceptions rather than true NMIs. The resulting timing differs slightly from x86 NMI delivery and produces more frequent lock contention windows, which makes the failure deterministic rather than intermittent.

Proposed Fix

See 0001-selftests-bpf-Fix-lru_lock_nmi-post-stress-verificat.patch:

  1. Add a 10ms sleep after destroying perf links and joining threads. This allows in-flight interrupt handlers to complete.
  2. Retry verification up to 3 times. Each drain+refill cycle triggers internal LRU list maintenance that reclaims pending_free nodes. After 2-3 cycles all stranded nodes are recovered.
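
The shape of the two steps above can be sketched in plain C as follows. This is not the contents of the patch: `verify_with_retries()` and its callback type are hypothetical, and `fake_drain_then_verify()` is a stand-in that simply assumes each cycle reclaims a fixed number of stranded nodes, so that the retry behaviour can be exercised outside the kernel selftest.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success, negative errno on failure. */
typedef int (*verify_fn)(void *ctx);

/* Sleep once to let in-flight handlers finish, then retry verification. */
static int verify_with_retries(verify_fn verify, void *ctx,
			       int max_attempts, long settle_ms)
{
	struct timespec ts = {
		.tv_sec = settle_ms / 1000,
		.tv_nsec = (settle_ms % 1000) * 1000000L,
	};
	int err = -EINVAL;

	assert(verify != NULL && max_attempts > 0 && settle_ms >= 0);
	nanosleep(&ts, NULL);

	for (int attempt = 0; attempt < max_attempts; attempt++) {
		err = verify(ctx);
		if (!err)
			break; /* all keys were retrievable */
	}
	return err; /* last error if every attempt failed */
}

/* Stand-in for the real check; the reclaim rate is an assumption. */
struct fake_state {
	int stranded;          /* nodes still in pending_free state */
	int reclaim_per_cycle; /* nodes recovered by one drain+refill */
};

static int fake_drain_then_verify(void *ctx)
{
	struct fake_state *s = ctx;
	int had = s->stranded;

	/* Each cycle recovers some stranded nodes as a side effect. */
	s->stranded = had > s->reclaim_per_cycle ?
		      had - s->reclaim_per_cycle : 0;
	return had ? -EIO : 0;
}
```

With the stand-in, five stranded nodes reclaimed three per cycle pass on the third attempt, while ten stranded nodes at the same rate still return -EIO after three attempts. The retry bound therefore has to be chosen against the worst-case number of stranded nodes.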

The retry loop is what makes this approach robust: rather than relying on the sleep being long enough, it uses the kernel's own reclamation machinery to recover the nodes before asserting.

Impact

Every BPF CI run on aarch64 (test_progs, test_progs_no_alu32, test_progs_cpuv4) fails due to this test. This blocks meaningful CI signal for all submitted PRs on aarch64. On x86_64, the percpu_lru subtest fails intermittently, causing additional noise.

References
