mm: Unconditional per-VMA locks and cleanups by kernel-patches-daemon-bpf[bot] · Pull Request #12438 · kernel-patches/bpf

kernel-patches-daemon-bpf · 2026-06-10T23:15:37Z

Pull request for series with
subject: mm: Unconditional per-VMA locks and cleanups
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1109563

kernel-patches-daemon-bpf · 2026-06-10T23:15:39Z

Upstream branch: 2e8ad1f
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1109563
version: 2

The per-VMA locks have been around for several years. They've had some bugs worked out of them and have seen quite wide use. However, they are still only available when architectures explicitly enable them. Remove the conditional compilation around the per-VMA locks, making them available on all architectures and configs. The approach up to now seemed to be to add ARCH_SUPPORTS_PER_VMA_LOCK when the architecture started using per-VMA locks in the fault handler. But, contrary to the naming, the Kconfig option does not really indicate whether the architecture supports per-VMA locks or not. It is more of a marker for whether the architecture is likely to benefit from per-VMA locks. To me, the most important thing side-effect of universal availability is letting per-VMA locks be used in SMP=n configs. This lets us use per-VMA locking in all x86 code without fallbacks. Overall, this just generally makes the kernel simpler. Just look at the diffstat. It also opens the door to users that want to use the per-VMA locks in common code. Doing *that* brings additional simplifications. The downside of this is adding some fields to vm_area_struct and mm_struct. There are likely ways to optimize this, especially for things like SMP=n configs. For now, do the simplest thing: use the same implementation everywhere. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: linux-mm@kvack.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Arve Hjønnevåg <arve@android.com> Cc: Todd Kjos <tkjos@android.com> Cc: Christian Brauner <christian@brauner.io> Cc: Carlos Llamas <cmllamas@google.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Ahern <dsahern@kernel.org> Cc: netdev@vger.kernel.org

tl;dr: lock_vma_under_rcu() is already a trylock. No need to do both it and mmap_read_trylock(). Long Version: == Background == Historically, binder used an mmap_read_trylock() in its shrinker code. This ensures that reclaim is not blocked on an mmap_lock. Commit 95bc2d4 ("binder: use per-vma lock in page reclaiming") added support for the per-VMA lock, but left mmap_read_trylock() as a fallback. This was presumably because the per-VMA locking can fail for several reasons and most (all?) lock_vma_under_rcu() callers have a fallback to mmap_read_trylock(). == Problem == The fallback is not worth the complexity here. lock_vma_under_rcu() is essentially already a non-blocking trylock. The main reason it fails is also the reason mmap_read_trylock() fails: something is holding mmap_write_lock(). The only remedy for a collision with mmap_write_lock() is to wait, which this code can not do. So the "fallback" after lock_vma_under_rcu() failure is not really a fallback: it is really likely to just be retrying in vain. That retry in an of itself isn't horrible. But it adds complexity. == Solution == Now that per-VMA locks are universally available, lock_vma_under_rcu() will not persistently fail. Rely on it alone and simplify the code. Full disclosure: I originally tried to do this with lock_vma_under_rcu_wait(), but it did not fit well with the mmap_lock trylock semantics. Claude caught this in a review and suggested the approach in this path. It seemed sane to me. So, Suggesed-by: Claude, I guess. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: linux-mm@kvack.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Arve Hjønnevåg <arve@android.com> Cc: Todd Kjos <tkjos@android.com> Cc: Christian Brauner <christian@brauner.io> Cc: Carlos Llamas <cmllamas@google.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Ahern <dsahern@kernel.org> Cc: netdev@vger.kernel.org

== Background == There are basically two parallel ways to look up a VMA: the traditional way, which is protected by mmap_lock, and the RCU-based per-VMA lock way which is based on RCU and refcounts. == Problem == The mmap_lock one is more straightforward to use but it has a big disadvantage in that it can not be mixed with page faults since those can take mmap_lock for read, which can deadlock when mixed with page faults. For example: mmap_read_lock(mm); // Another thread does mmap_write_lock(). // New mmap_lock readers are blocked. vma = vma_lookup(mm, address); // This deadlocks on mmap_read_lock() if it faults: copy_from_user(address); mmap_read_unlock(mm); The RCU one can be mixed with faults, but it is not available in all configs, so all RCU users need to be able to fall back to the traditional way. == Solution == Add a variant of the RCU-based lookup that waits for writers. This is basically the same as the existing RCU-based lookup, but it also takes mmap_lock for read and waits for writers to finish before returning the VMA. This has some advantages: 1. Callers do not need to have a fallback path for when they collide with writers. 2. It can be used in contexts where page faults can happen because it can take the mmap_lock for read but never *holds* it. 3. Its fast path does not require taking mmap_lock for read. Basically, when applied correctly, this approach results in faster *and* simpler code. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: linux-mm@kvack.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Arve Hjønnevåg <arve@android.com> Cc: Todd Kjos <tkjos@android.com> Cc: Christian Brauner <christian@brauner.io> Cc: Carlos Llamas <cmllamas@google.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Ahern <dsahern@kernel.org> Cc: netdev@vger.kernel.org

Previously, the per-VMA locking could fail in the face of writers which necessitate a fallback to mmap_lock. The new vma_start_read_unlocked() will wait for writers instead of failing. Use the new helper. Wait for writers. Remove the fallback to mmap_lock. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: linux-mm@kvack.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Arve Hjønnevåg <arve@android.com> Cc: Todd Kjos <tkjos@android.com> Cc: Christian Brauner <christian@brauner.io> Cc: Carlos Llamas <cmllamas@google.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Ahern <dsahern@kernel.org> Cc: netdev@vger.kernel.org

Previously, the per-VMA locking could fail in the face of writers which necessitates a fallback to mmap_lock. The new lock_vma_under_rcu_wait() will wait for writers instead of failing. Use the new helper. Wait for writers. Remove the fallback to mmap_lock. This really is a nice cleanup. It removes the need to pass the lock state back and forth to find_tcp_vma(). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: linux-mm@kvack.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Arve Hjønnevåg <arve@android.com> Cc: Todd Kjos <tkjos@android.com> Cc: Christian Brauner <christian@brauner.io> Cc: Carlos Llamas <cmllamas@google.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Ahern <dsahern@kernel.org> Cc: netdev@vger.kernel.org

kernel-patches-daemon-bpf · 2026-06-10T23:25:27Z

Upstream branch: 30dee2c
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1109563
version: 2

kernel-patches-review-bot · 2026-06-10T23:31:37Z

commit b687b1a163de1cfd4e91b9f8f9537d2782de82a0
Author: Dave Hansen <dave.hansen@linux.intel.com>

mm: Add RCU-based VMA lookup helper that waits for writers

This adds vma_start_read_unlocked(), a VMA lookup that uses the RCU-based
per-VMA lock on its fast path and falls back to taking mmap_lock for read
to wait out writers on its slow path. The intent is to let callers avoid
maintaining their own mmap_lock fallback path.

> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index 68489dbef0c4..924dd3e865fc 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -257,6 +257,9 @@ static inline bool vma_start_read_locked(struct vm_area_struct *vma)
>  	return vma_start_read_locked_nested(vma, 0);
>  }
>
> +struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
> +					       unsigned long address);
> +

> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index e20d01e8d38f..441d2beb325d 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -338,6 +338,33 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> +struct vm_area_struct *vma_start_read_unlocked(struct mm_struct *mm,
> +					       unsigned long address)
> +{
> +	struct vm_area_struct *vma;
> +
> +	/* Fast path: return stable VMA covering 'address': */
> +	vma = lock_vma_under_rcu(mm, address);
> +	if (vma)
> +		return vma;
> +
> +	/* Slow path: preclude VMA writers by getting mmap read lock. */
> +	guard(rwsem_read)(&mm->mmap_lock);
> +	if (!vma_start_read_locked(vma))
> +		return NULL;
> +
> +	return vma;
> +}

Can the slow path here dereference a NULL vma?

The slow path is only reached when lock_vma_under_rcu() returned NULL, since
the 'if (vma) return vma;' branch was not taken. So 'vma' is NULL at the point
vma_start_read_locked(vma) is called.

vma_start_read_locked() forwards to vma_start_read_locked_nested(vma, 0), which
starts with mmap_assert_locked(vma->vm_mm) and then operates on vma->vm_refcnt,
both of which dereference 'vma':

  static inline bool vma_start_read_locked(struct vm_area_struct *vma)
  {
  	return vma_start_read_locked_nested(vma, 0);
  }

This looks reachable on normal runtime paths: lock_vma_under_rcu() returns NULL
when no VMA covers the address, on a writer collision, or when the address
falls outside the locked VMA's range.

Separately, should the slow path perform its own lookup before locking? The
comment says the function should "Find the VMA covering 'address' and lock it
for reading," but the slow path never calls vma_lookup()/find_vma() and instead
reuses the stale 'vma' local. Would something like the following be the intent?

  vma = vma_lookup(mm, address);
  if (!vma)
  	return NULL;
  if (!vma_start_read_locked(vma))
  	return NULL;
  return vma;

Callers are wired up later in this series ('tcp: Remove mmap_lock fallback
path' and 'binder: Remove mmap_lock fallback'), so the path appears reachable
once those land, and no later fix for vma_start_read_unlocked() shows up in
between.

AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: mm: Add RCU-based VMA lookup helper that waits for writers
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27312528837

kernel-patches-review-bot · 2026-06-10T23:33:05Z

commit 3e3467afd8b4d369ea7e4532b91f259416b5bc9c
Author: Dave Hansen <dave.hansen@linux.intel.com>

binder: Make shrinker rely solely on per-VMA lock

This drops the mmap_read_trylock() fallback in binder_alloc_free_page()
and relies only on lock_vma_under_rcu(), removing the mm_locked tracking
and the associated cleanup branches.

> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index e4488ad86a65..9d8239b2640e 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c

[ ... ]

> @@ -1203,11 +1197,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
>  err_invalid_vma:
>  	mutex_unlock(&alloc->mutex);
>  err_get_alloc_mutex_failed:
> -	if (mm_locked)
> -		mmap_read_unlock(mm);
> -	else
> +	if (vma)
>  		vma_end_read(vma);
> -err_mmap_read_lock_failed:
> +err_vma_lock_failed:
>  	mmput_async(mm);
>  err_mmget:
>  	return LRU_SKIP;

The old err_mmap_read_lock_failed: label was reached by a goto from the
mmap_read_trylock() fallback that this patch removes. After renaming it to
err_vma_lock_failed:, is there any remaining goto that targets it?

Grepping the file, err_vma_lock_failed appears only at the label definition
and nowhere as a goto target, so the label is now reached only by fall-through
from err_get_alloc_mutex_failed:.

A label reached purely by fall-through is not considered "used", so wouldn't
-Wunused-label (enabled via -Wall) warn here, and turn into a build failure
under the default CONFIG_WERROR=y?

Would dropping the err_vma_lock_failed: label and letting mmput_async(mm) fall
through avoid that?

AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: binder: Make shrinker rely solely on per-VMA lock
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27312528837

kernel-patches-daemon-bpf · 2026-06-10T23:37:18Z

Forwarding comment 4675691780 via email
In-Reply-To: 20260610230415.C0521C88@davehans-spike.ostc.intel.com
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/20260610230415.C0521C88@davehans-spike.ostc.intel.com/

kernel-patches-daemon-bpf · 2026-06-10T23:37:20Z

Forwarding comment 4675700241 via email
In-Reply-To: 20260610230413.D68967BC@davehans-spike.ostc.intel.com
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/20260610230413.D68967BC@davehans-spike.ostc.intel.com/

kernel-patches-review-bot · 2026-06-10T23:37:22Z

commit c4e197dd47640d9ed0b93394ce249569f60dce76
Author: Dave Hansen <dave.hansen@linux.intel.com>

mm: Make per-VMA locks available universally

The per-VMA locks have been around for several years. They've had some
bugs worked out of them and have seen quite wide use. However, they
are still only available when architectures explicitly enable them.
Remove the conditional compilation around the per-VMA locks, making
them available on all architectures and configs.

> diff --git a/mm/Kconfig b/mm/Kconfig
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -1394,19 +1394,6 @@
>  config LRU_GEN_WALKS_MMU
>  	def_bool y
>  	depends on LRU_GEN && ARCH_HAS_HW_PTE_YOUNG
> -# }
> -
> -config ARCH_SUPPORTS_PER_VMA_LOCK
> -       def_bool n
> -
> -config PER_VMA_LOCK
> -	def_bool y
> -	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
> -	help
> -	  Allow per-vma locking during page fault handling.
> -
> -	  This feature allows locking each virtual memory area separately when
> -	  handling page faults instead of taking mmap_lock.
>
>  config LOCK_MM_AND_FIND_VMA
>  	bool

Removing the CONFIG_PER_VMA_LOCK config symbol makes it permanently
undefined, but kernel/bpf/stackmap.c (not modified in this commit) still
keys its locking protocol on it. This creates a lock imbalance in the BPF
stack build-id collection path.

In stack_map_lock_vma(), lock_vma_under_rcu() now always executes the real
implementation (not a no-op stub). When it succeeds, it returns a VMA
holding a per-VMA read lock, with mmap_lock NOT held:

kernel/bpf/stackmap.c:stack_map_lock_vma() {
    ...
    vma = lock_vma_under_rcu(mm, ip);
    if (vma) {
        lock->vma = vma;
        return vma;
    }
    ...
}

But stack_map_unlock_vma() still has the #ifdef CONFIG_PER_VMA_LOCK around
its cleanup:

kernel/bpf/stackmap.c:stack_map_unlock_vma() {
#ifdef CONFIG_PER_VMA_LOCK
    vma_end_read(lock->vma);
#else
    mmap_read_unlock(lock->mm);
#endif
    lock->vma = NULL;
}

With CONFIG_PER_VMA_LOCK now undefined, the #else branch executes
mmap_read_unlock(lock->mm) on an mmap_lock rwsem that was never acquired.

This corrupts the rwsem state, and also leaks the per-VMA read lock (never
released via vma_end_read), permanently blocking VMA detach/free. Should
kernel/bpf/stackmap.c be updated to remove these #ifdef guards as well?

[ ... ]

> diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> --- a/rust/kernel/mm.rs
> +++ b/rust/kernel/mm.rs
> @@ -174,7 +174,6 @@
>      /// When per-vma locks are disabled, this always returns `None`.
>      #[inline]
>      pub fn lock_vma_under_rcu(&self, vma_addr: usize) -> Option<VmaReadGuard<'_>> {
> -        #[cfg(CONFIG_PER_VMA_LOCK)]
>          {
>              // SAFETY: Calling `bindings::lock_vma_under_rcu` is always okay given an mm where
>              // `mm_users` is non-zero.
> @@ -188,12 +187,6 @@
>                  });
>              }
>          }
> -
> -        // Silence warnings about unused variables.
> -        #[cfg(not(CONFIG_PER_VMA_LOCK))]
> -        let _ = vma_addr;
> -
> -        None
>      }

Removing the trailing None here breaks compilation. The function signature
is pub fn lock_vma_under_rcu(&self, vma_addr: usize) ->
Option<VmaReadGuard<'_>>. The return Some(...) only executes when
!vma.is_null(). On the vma.is_null() path (lookup miss or lock contention)
the if has no else, so the inner block evaluates to (), and the function's
tail expression is (). This is a type mismatch (rustc E0308: expected
Option<VmaReadGuard<'_>>, found ()).

Shouldn't the trailing None be kept to serve as the return value for the
null/contention path?

AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: mm: Make per-VMA locks available universally
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27312528837

kernel-patches-daemon-bpf · 2026-06-10T23:48:33Z

Forwarding comment 4675724775 via email
In-Reply-To: 20260610230411.617DD5E7@davehans-spike.ostc.intel.com
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/20260610230411.617DD5E7@davehans-spike.ostc.intel.com/

kernel-patches-daemon-bpf Bot added new bpf-next V2 labels Jun 10, 2026

kernel-patches-daemon-bpf Bot force-pushed the bpf-next_base branch from d9ed865 to 1b36123 Compare June 10, 2026 23:23

hansendc added 5 commits June 10, 2026 16:25

kernel-patches-daemon-bpf Bot force-pushed the series/1109563=>bpf-next branch from f69c59b to ebe6901 Compare June 10, 2026 23:25

kernel-patches-review-bot Bot added the ai-review label Jun 10, 2026

kernel-patches-daemon-bpf Bot removed the ai-review label Jun 10, 2026

kernel-patches-review-bot Bot added the ai-review label Jun 10, 2026

kernel-patches-daemon-bpf Bot removed the ai-review label Jun 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mm: Unconditional per-VMA locks and cleanups#12438

mm: Unconditional per-VMA locks and cleanups#12438
kernel-patches-daemon-bpf[bot] wants to merge 5 commits into
bpf-next_basefrom
series/1109563=>bpf-next

kernel-patches-daemon-bpf Bot commented Jun 10, 2026

Uh oh!

kernel-patches-daemon-bpf Bot commented Jun 10, 2026

Uh oh!

kernel-patches-daemon-bpf Bot commented Jun 10, 2026

Uh oh!

kernel-patches-review-bot Bot commented Jun 10, 2026

Uh oh!

kernel-patches-review-bot Bot commented Jun 10, 2026

Uh oh!

kernel-patches-daemon-bpf Bot commented Jun 10, 2026

Uh oh!

kernel-patches-daemon-bpf Bot commented Jun 10, 2026

Uh oh!

kernel-patches-review-bot Bot commented Jun 10, 2026

Uh oh!

kernel-patches-daemon-bpf Bot commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kernel-patches-daemon-bpf Bot commented Jun 10, 2026

Uh oh!

kernel-patches-daemon-bpf Bot commented Jun 10, 2026

Uh oh!

kernel-patches-daemon-bpf Bot commented Jun 10, 2026

Uh oh!

kernel-patches-review-bot Bot commented Jun 10, 2026

Uh oh!

kernel-patches-review-bot Bot commented Jun 10, 2026

Uh oh!

kernel-patches-daemon-bpf Bot commented Jun 10, 2026

Uh oh!

kernel-patches-daemon-bpf Bot commented Jun 10, 2026

Uh oh!

kernel-patches-review-bot Bot commented Jun 10, 2026

Uh oh!

kernel-patches-daemon-bpf Bot commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant