Skip to content

bpf, sockmap: take the socket lock in udp_bpf_recvmsg()#12435

Open
kernel-patches-daemon-bpf[bot] wants to merge 1 commit into
bpf_basefrom
series/1109482=>bpf
Open

bpf, sockmap: take the socket lock in udp_bpf_recvmsg()#12435
kernel-patches-daemon-bpf[bot] wants to merge 1 commit into
bpf_basefrom
series/1109482=>bpf

Conversation

@kernel-patches-daemon-bpf

Copy link
Copy Markdown

Pull request for series with
subject: bpf, sockmap: take the socket lock in udp_bpf_recvmsg()
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1109482

udp_bpf_recvmsg() drains the psock ingress_msg queue through
sk_msg_recvmsg() without holding the socket lock. __sk_msg_recvmsg()
walks the head sk_msg's scatterlist and updates each entry in place
(sge->length -= copy, sge->offset += copy), drops the page once an entry
is consumed, and frees the sk_msg when it is drained. ingress_lock only
serializes the queue list operations, not this per-entry consume;
copy_page_to_iter() can fault, so the loop cannot hold a spinlock.

Two threads calling recvmsg() on the same UDP socket then consume the
same sk_msg at once:

  CPU0                                CPU1

  sk_msg_recvmsg                      sk_msg_recvmsg
    copy = sge->length                 copy = sge->length
    copy_page_to_iter(copy)            copy_page_to_iter(copy)
    sge->length -= copy
      /* sge->length == 0 */
                                       sge->length -= copy
                                         /* underflows to ~0U */

The next pass loads the underflowed length into a signed int, the value
is negative so the copied + copy > len clamp is skipped, and
copy_page_to_iter() is handed ~2^64 as its size_t bytes, which trips
page_copy_sane(). The same race can put_page() one page twice and free
the sk_msg while the other reader still walks it.

  WARNING: CPU: 0 PID: 5484 at lib/iov_iter.c:356 copy_page_to_iter
  WARNING: CPU: 1 PID: 5485 at lib/iov_iter.c:356 copy_page_to_iter
  CPU: 1 UID: 0 PID: 5485 Comm: syz.6.1204 Not tainted 7.1.0-rc7 #1
  RIP: 0010:copy_page_to_iter+0xa9/0x270
  Call Trace:
   <TASK>
   __sk_msg_recvmsg+0x384/0x9a0
   udp_bpf_recvmsg+0x1a0/0x6f0
   sock_recvmsg+0x124/0x150
   __sys_recvfrom+0x1c2/0x2b0
   __x64_sys_recvfrom+0x7b/0x90
   do_syscall_64+0x115/0x6d0
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   </TASK>

tcp_bpf_recvmsg_parser() already serializes this with lock_sock() around
__sk_msg_recvmsg(); udp_bpf_recvmsg() does not. Take the socket lock
around the sk_msg_recvmsg() consume so a given ingress sk_msg is drained
by one reader at a time. The wait path is left unlocked, as
udp_msg_wait_data() sleeps without dropping the socket lock.

Fixes: 1f5be6b ("udp: Implement udp_bpf_recvmsg() for sockmap")
Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Upstream branch: e7ae89a
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1109482
version: 1

@kernel-patches-review-bot

Copy link
Copy Markdown
commit 81f2f0434c54de61c1977e2408840a9ad479daa6
Author: Sechang Lim <rhkrqnwk98@gmail.com>

bpf, sockmap: take the socket lock in udp_bpf_recvmsg()

This wraps the sk_msg_recvmsg() consume in udp_bpf_recvmsg() with
lock_sock()/release_sock() so two recvmsg() callers can no longer race on
the same ingress sk_msg, while leaving the sleeping wait path unlocked.

> udp_bpf_recvmsg() drains the psock ingress_msg queue through
> sk_msg_recvmsg() without holding the socket lock. __sk_msg_recvmsg()
> walks the head sk_msg's scatterlist and updates each entry in place
> (sge->length -= copy, sge->offset += copy), drops the page once an entry
> is consumed, and frees the sk_msg when it is drained. ingress_lock only
> serializes the queue list operations, not this per-entry consume;
> copy_page_to_iter() can fault, so the loop cannot hold a spinlock.
>
> Two threads calling recvmsg() on the same UDP socket then consume the
> same sk_msg at once:
>
>   CPU0                                CPU1
>
>   sk_msg_recvmsg                      sk_msg_recvmsg
>     copy = sge->length                 copy = sge->length
>     copy_page_to_iter(copy)            copy_page_to_iter(copy)
>     sge->length -= copy
>       /* sge->length == 0 */
>                                        sge->length -= copy
>                                          /* underflows to ~0U */
>
> The next pass loads the underflowed length into a signed int, the value
> is negative so the copied + copy > len clamp is skipped, and
> copy_page_to_iter() is handed ~2^64 as its size_t bytes, which trips
> page_copy_sane(). The same race can put_page() one page twice and free
> the sk_msg while the other reader still walks it.
>
> tcp_bpf_recvmsg_parser() already serializes this with lock_sock() around
> __sk_msg_recvmsg(); udp_bpf_recvmsg() does not. Take the socket lock
> around the sk_msg_recvmsg() consume so a given ingress sk_msg is drained
> by one reader at a time. The wait path is left unlocked, as
> udp_msg_wait_data() sleeps without dropping the socket lock.
>
> Fixes: 1f5be6b3b063 ("udp: Implement udp_bpf_recvmsg() for sockmap")

Is the Fixes: tag pointing at the right commit?

Commit 1f5be6b3b063 ("udp: Implement udp_bpf_recvmsg() for sockmap")
introduced udp_bpf_recvmsg() with a lock_sock(sk)/release_sock(sk) pair
wrapping the whole function, including the sk_msg_recvmsg() consume, so the
race described here was not present at that point.

The unlocked consume appears to have been introduced later by
9f2470fbc4cb ("skmsg: Improve udp_bpf_recvmsg() accuracy"), whose diff
removes both lock_sock(sk) and release_sock(sk) from udp_bpf_recvmsg().

Should the tag instead be:

  Fixes: 9f2470fbc4cb ("skmsg: Improve udp_bpf_recvmsg() accuracy")



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf, sockmap: take the socket lock in udp_bpf_recvmsg()
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27296268347

@kernel-patches-daemon-bpf

Copy link
Copy Markdown
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant