[PW_SID:1068755] Direct Map Removal Support for guest_memfd#1639
linux-riscv-bot wants to merge 16 commits into workflow__riscv__fixes
This avoids excessive folio->page->address conversions when adding helpers on top of set_direct_map_valid_noflush() in the next patch.
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Let's provide folio_{zap,restore}_direct_map helpers as preparation for
supporting removal of the direct map for guest_memfd folios.
In folio_zap_direct_map(), flush the TLB to make sure the data is no longer
accessible.
The new helpers need to be accessible to KVM on architectures that
support guest_memfd (x86 and arm64).
Direct map removal gives guest_memfd the same protection that
memfd_secret does, such as hardening against Spectre-like attacks
through in-kernel gadgets.
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
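The zap/restore pairing can be illustrated with a userspace analogy. This is not the kernel implementation: here an anonymous page stands in for a folio's direct map entry, and mprotect(2) stands in for editing kernel page tables. The names model_zap_direct_map() and model_restore_direct_map() are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Userspace analogy only: "zapping" revokes all access to the page
 * (PROT_NONE), "restoring" makes it readable and writable again.
 * The kernel helpers change the direct map and flush the TLB; here
 * mprotect() performs the equivalent maintenance for this process.
 */
static int model_zap_direct_map(void *page, size_t len)
{
	return mprotect(page, len, PROT_NONE);
}

static int model_restore_direct_map(void *page, size_t len)
{
	return mprotect(page, len, PROT_READ | PROT_WRITE);
}
```

The contents survive a zap/restore cycle; only accessibility changes, which mirrors the intent that restoring the direct map makes the same data reachable again.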
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
This drops an optimization in gup_fast_folio_allowed() where secretmem_mapping() was only called if CONFIG_SECRETMEM=y. secretmem has been enabled by default since commit b758fe6 ("mm/secretmem: make it on by default"), so in most cases the secretmem check was no longer elided anyway.
This is in preparation for generalizing the handling of mappings where direct map entries of folios are set to not present. Currently, the mappings that match this description are secretmem mappings (memfd_secret()). Later, some guest_memfd configurations will also fall into this category.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Move the check for pinning closer to where the result is used. No functional changes.
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Add AS_NO_DIRECT_MAP for mappings where direct map entries of folios are set to not present. Currently, the mappings that match this description are secretmem mappings (memfd_secret()). Later, some guest_memfd configurations will also fall into this category.
Reject this new type of mapping in all locations that currently reject secretmem mappings, on the assumption that wherever secretmem mappings are rejected, it is precisely because of an inability to deal with folios without direct map entries. Then make memfd_secret() set AS_NO_DIRECT_MAP on its address_space, which allows dropping its special vma_is_secretmem()/secretmem_mapping() checks.
Use a new flag instead of overloading AS_INACCESSIBLE (which guest_memfd already sets) because not all guest_memfd mappings will end up being direct map removed (e.g. in pKVM setups, parts of guest_memfd that can be mapped to userspace should also be GUP-able, and generally have no restrictions on who can access them).
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
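The idea of replacing per-feature checks with one address_space flag can be sketched as a simplified model. All names below (model_address_space, MODEL_AS_NO_DIRECT_MAP, model_gup_allowed(), and so on) are hypothetical stand-ins, and the bit positions are illustrative rather than the kernel's actual enum values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; positions do not match the kernel's enum. */
enum model_mapping_flags {
	MODEL_AS_INACCESSIBLE = 0,
	MODEL_AS_NO_DIRECT_MAP = 1,
};

/* Simplified stand-in for struct address_space. */
struct model_address_space {
	unsigned long flags;
};

static void model_mapping_set_no_direct_map(struct model_address_space *m)
{
	m->flags |= 1UL << MODEL_AS_NO_DIRECT_MAP;
}

static bool model_mapping_no_direct_map(const struct model_address_space *m)
{
	return (m->flags & (1UL << MODEL_AS_NO_DIRECT_MAP)) != 0;
}

/*
 * One generic check instead of a secretmem-specific one: any mapping
 * whose folios lack direct map entries is refused, regardless of
 * whether it came from memfd_secret() or guest_memfd.
 */
static bool model_gup_allowed(const struct model_address_space *m)
{
	return !model_mapping_no_direct_map(m);
}
```

In this model, a guest_memfd mapping that only carries the "inaccessible" bit stays pinnable, which reflects why the commit adds a separate flag rather than reusing AS_INACCESSIBLE.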
Add a no-op stub for kvm_arch_gmem_invalidate() if CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE=n. This allows defining kvm_gmem_free_folio() without ifdef-ery, which makes it cleaner to use guest_memfd's free_folio callback for code unrelated to arch invalidation.
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
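The stub pattern can be sketched as follows, assuming kvm_pfn_t is a 64-bit integer and that the hook takes a start/end PFN range. model_gmem_free_folio() and the direct_map_restores counter are hypothetical placeholders for the callback and for the non-arch work it would do.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: kvm_pfn_t modeled as a plain 64-bit integer. */
typedef uint64_t kvm_pfn_t;

/* CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE is intentionally left undefined,
 * as on architectures without an arch-specific invalidation hook. */
#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
#else
static inline void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) { }
#endif

/* Hypothetical counter standing in for non-arch work, such as
 * restoring direct map entries when a folio is freed. */
static int direct_map_restores;

static void model_gmem_free_folio(kvm_pfn_t pfn, unsigned int nr_pages)
{
	/* No #ifdef at the call site: the stub compiles away when unused. */
	kvm_arch_gmem_invalidate(pfn, pfn + nr_pages);
	direct_map_restores++;
}
```

Keeping the conditional in the header means the free_folio path has a single unconditional body that other features can extend.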
x86 supports GUEST_MEMFD_FLAG_NO_DIRECT_MAP whenever direct map modifications are possible (which is always the case).
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Support for GUEST_MEMFD_FLAG_NO_DIRECT_MAP on arm64 depends on 1) direct map manipulations at 4k granularity being possible, and 2) FEAT_S2FWB.
1) is met whenever the direct map is set up at 4k granularity at boot time (e.g. not with huge/gigantic pages). Due to ARM's break-before-make (BBM) semantics, breaking huge mappings in the direct map into 4k mappings is not possible: BBM would require temporarily invalidating the entire huge mapping, even if only a 4k subrange should be zapped, which would likely crash the kernel. In practice, rodata_full currently defaults to true, which forces a 4k direct map.
2) is required to allow KVM to elide cache coherency operations when installing stage 2 page tables. Those operations require the direct map entry for the newly mapped memory to be present, which it will not be, as guest_memfd will have removed the direct map entries in kvm_gmem_get_pfn().
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
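The two arm64 preconditions combine as a simple conjunction. The sketch below is a hypothetical model: struct model_arm64_caps and model_supports_no_direct_map() are invented names, and real code would derive these booleans from the boot-time direct map setup and the CPU feature registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the two preconditions from the commit message. */
struct model_arm64_caps {
	bool direct_map_4k;  /* direct map built with 4k pages, e.g. rodata_full=true */
	bool feat_s2fwb;     /* FEAT_S2FWB: stage 2 forced write-back */
};

static bool model_supports_no_direct_map(const struct model_arm64_caps *c)
{
	/* Huge direct map blocks cannot be split safely under break-before-make. */
	if (!c->direct_map_4k)
		return false;
	/* Without S2FWB, KVM would need cache maintenance via the removed direct map. */
	return c->feat_s2fwb;
}
```

Either missing capability is enough to disable the flag, which matches the "depends on 1) and 2)" wording.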
Add the GUEST_MEMFD_FLAG_NO_DIRECT_MAP flag for the KVM_CREATE_GUEST_MEMFD() ioctl. When set, guest_memfd folios will be removed from the direct map after preparation, with direct map entries only restored when the folios are freed.
To ensure these folios do not end up in places where the kernel cannot deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct address_space if GUEST_MEMFD_FLAG_NO_DIRECT_MAP is requested.
Note that this flag causes removal of direct map entries for all guest_memfd folios, independent of whether they are "shared" or "private" (although current guest_memfd only supports either all folios in the "shared" state, or all folios in the "private" state if GUEST_MEMFD_FLAG_MMAP is not set). The use case for also removing direct map entries of the shared parts of guest_memfd is a special type of non-CoCo VM where host userspace is trusted to have access to all of guest memory, but where Spectre-style transient execution attacks through the host kernel's direct map should still be mitigated. In this setup, KVM retains access to guest memory via userspace mappings of guest_memfd, which are reflected back into KVM's memslots via userspace_addr. This is needed for things like MMIO emulation on x86_64 to work.
Direct map entries are zapped right before guest or userspace mappings of gmem folios are set up, e.g. in kvm_gmem_fault_user_mapping() or kvm_gmem_get_pfn() [called from the KVM MMU code]. The only place where a gmem folio can be allocated without being mapped anywhere is kvm_gmem_populate(), where handling potential failures of direct map removal is not possible: by the time direct map removal is attempted, the folio is already marked as prepared, so retrying kvm_gmem_populate() would just result in -EEXIST without fixing up the direct map state. Such folios are instead removed from the direct map upon kvm_gmem_get_pfn(), i.e. when they are later mapped into the guest.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
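The direct map lifecycle of a single folio in a GUEST_MEMFD_FLAG_NO_DIRECT_MAP file can be sketched as a small state model. This is a hypothetical simplification: struct model_folio and the model_* functions only mirror the order of events described in the commit message, not the kernel's actual code paths or error handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-folio state for a NO_DIRECT_MAP guest_memfd. */
struct model_folio {
	bool prepared;
	bool direct_mapped;
};

/* Newly allocated folios start out present in the direct map. */
static void model_alloc(struct model_folio *f)
{
	f->prepared = false;
	f->direct_mapped = true;
}

/* Like kvm_gmem_populate(): prepares the folio but leaves the direct
 * map untouched, because a failed zap here could not be retried. */
static void model_populate(struct model_folio *f)
{
	f->prepared = true;
}

/* Like kvm_gmem_get_pfn() / kvm_gmem_fault_user_mapping(): zap right
 * before the folio is mapped; repeated calls are harmless. */
static void model_map(struct model_folio *f)
{
	f->prepared = true;
	f->direct_mapped = false;
}

/* Like the free_folio callback: restore the direct map entry. */
static void model_free(struct model_folio *f)
{
	f->direct_mapped = true;
}
```

A populated folio therefore stays in the direct map until its first guest or userspace mapping, and every folio is back in the direct map by the time it is freed.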
If guest memory is backed by a VMA that does not allow GUP (e.g. a userspace mapping of guest_memfd when the fd was allocated with GUEST_MEMFD_FLAG_NO_DIRECT_MAP), then loading the test ELF binary directly into it via read(2) may not work. To nevertheless support loading binaries in such cases, do the read(2) syscall into a bounce buffer, and then memcpy from the bounce buffer into guest memory.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
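A bounce-buffered read can be sketched in plain userspace C. read_via_bounce() is a hypothetical helper, not the selftests' actual function; it assumes the destination is ordinary CPU-writable memory, so the only thing the kernel ever writes into during read(2) is the bounce buffer.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly `len` bytes from `fd` into `dst` through a stack bounce
 * buffer. read(2) only ever targets `bounce`, so the kernel never needs
 * to pin `dst`; `dst` is filled with plain memcpy() stores instead.
 * Returns 0 on success, -1 on error or premature end of file.
 */
static int read_via_bounce(int fd, void *dst, size_t len)
{
	char bounce[4096];
	char *out = dst;

	while (len > 0) {
		size_t chunk = len < sizeof(bounce) ? len : sizeof(bounce);
		ssize_t n = read(fd, bounce, chunk);

		if (n < 0 && errno == EINTR)
			continue;          /* interrupted: retry */
		if (n <= 0)
			return -1;         /* error or unexpected EOF */
		memcpy(out, bounce, (size_t)n);
		out += n;
		len -= (size_t)n;
	}
	return 0;
}
```

The cost is one extra copy per chunk, which is negligible for loading a small test binary.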
…d != -1
Have vm_mem_add() always set KVM_MEM_GUEST_MEMFD in the memslot flags if a guest_memfd is passed in as an argument. This eliminates the possibility of a guest_memfd instance being passed to vm_mem_add() but then ignored because the flags argument does not also specify KVM_MEM_GUEST_MEMFD.
This makes it easy to support more scenarios in which vm_mem_add() is not passed a guest_memfd instance but is expected to allocate one. Currently, this only happens if guest_memfd == -1 but flags & KVM_MEM_GUEST_MEMFD != 0. Later, however, vm_mem_add() will gain support for loading the test code itself into guest_memfd (via GUEST_MEMFD_FLAG_MMAP) if requested via a special vm_mem_backing_src_type, at which point having to keep the src_type and flags in sync would become cumbersome.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
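The flag derivation can be sketched as a tiny pure function. model_memslot_flags() is a hypothetical helper, not the selftests' code; the KVM_MEM_GUEST_MEMFD value is redefined locally (as bit 2, which is my understanding of the UAPI value) so the sketch builds without kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Local fallback so the sketch compiles without <linux/kvm.h>. */
#ifndef KVM_MEM_GUEST_MEMFD
#define KVM_MEM_GUEST_MEMFD (1UL << 2)
#endif

/*
 * Hypothetical helper: a supplied guest_memfd always implies the
 * KVM_MEM_GUEST_MEMFD memslot flag, so callers cannot pass an fd and
 * forget the flag. With guest_memfd == -1, the caller's flags are kept
 * as-is (a set flag then means "allocate a guest_memfd for me").
 */
static uint32_t model_memslot_flags(uint32_t flags, int guest_memfd)
{
	if (guest_memfd != -1)
		flags |= KVM_MEM_GUEST_MEMFD;
	return flags;
}
```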
Allow selftests to configure their memslots such that userspace_addr is set to a MAP_SHARED mapping of the guest_memfd that's associated with the memslot. This setup is the configuration for non-CoCo VMs, where all guest memory is backed by a guest_memfd whose folios are all marked shared, but KVM is still able to access guest memory to provide functionality such as MMIO emulation on x86.
Add backing types for normal guest_memfd, as well as direct map removed guest_memfd.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
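How the two new backing types might translate into guest_memfd creation flags can be sketched as below. The enum names and the MODEL_GMEM_FLAG_* bit values are placeholders invented for illustration; the real backing-type names and flag values come from the selftests and the kernel UAPI headers.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define MODEL_GMEM_FLAG_MMAP          (1ULL << 0)
#define MODEL_GMEM_FLAG_NO_DIRECT_MAP (1ULL << 1)

/* Hypothetical backing-source names, not the selftests' actual enum. */
enum model_backing_src {
	MODEL_SRC_ANONYMOUS,
	MODEL_SRC_GUEST_MEMFD,
	MODEL_SRC_GUEST_MEMFD_NO_DIRECT_MAP,
};

/*
 * guest_memfd flags needed so userspace_addr can be a MAP_SHARED
 * mapping of the guest_memfd itself; 0 means no guest_memfd is needed.
 */
static uint64_t model_gmem_flags(enum model_backing_src src)
{
	switch (src) {
	case MODEL_SRC_GUEST_MEMFD:
		return MODEL_GMEM_FLAG_MMAP;
	case MODEL_SRC_GUEST_MEMFD_NO_DIRECT_MAP:
		return MODEL_GMEM_FLAG_MMAP | MODEL_GMEM_FLAG_NO_DIRECT_MAP;
	default:
		return 0;
	}
}
```

Both guest_memfd variants need the mmap-capable flag, since KVM reaches guest memory through the userspace mapping recorded in userspace_addr.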
…tests
Extend the mem conversion selftests to cover the scenario in which the guest can fault in and write gmem-backed guest memory even if its direct map entries have been removed. Also cover the new flag in the guest_memfd_test.c tests.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Use one of the padding fields in struct vm_shape to carry an enum vm_mem_backing_src_type value, making it possible to override the default of VM_MEM_SRC_ANONYMOUS in __vm_create(). Overriding this default will allow tests to create VMs where the test code is backed by mmap'd guest_memfd instead of anonymous memory.
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
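Reusing a padding byte can be sketched with a stand-in struct. struct model_vm_shape is an approximation: apart from type and mode, the field names are assumptions, and it assumes the anonymous backing type is enum value 0, so a zero-initialized shape keeps the old default.

```c
#include <assert.h>
#include <stdint.h>

/* Approximation of struct vm_shape with one padding byte repurposed. */
struct model_vm_shape {
	uint32_t type;
	uint8_t  mode;
	uint8_t  src_type;   /* formerly padding; carries the backing source type */
	uint16_t pad1;
};

/* Repurposing padding keeps the struct exactly the size of a u64. */
_Static_assert(sizeof(struct model_vm_shape) == sizeof(uint64_t),
	       "vm_shape must stay 8 bytes");

/* Assumption: the anonymous backing type is enum value 0. */
enum { MODEL_VM_MEM_SRC_ANONYMOUS = 0 };

/* Hypothetical default shape: anonymous memory unless a test overrides it. */
static struct model_vm_shape model_default_shape(void)
{
	struct model_vm_shape s = { .type = 0, .mode = 0,
				    .src_type = MODEL_VM_MEM_SRC_ANONYMOUS };
	return s;
}
```

Because the size does not change, existing code that passes vm_shape by value or compares it as a 64-bit quantity is unaffected.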
Add a selftest that loads itself into guest_memfd (via GUEST_MEMFD_FLAG_MMAP) and triggers an MMIO exit when executed. This exercises x86 MMIO emulation code inside KVM for guest_memfd-backed memslots where the guest_memfd folios are direct map removed. In particular, it validates that x86 MMIO emulation code (guest page table walks + instruction fetch) correctly accesses gmem through the VMA that's been reflected into the memslot's userspace_addr field (instead of trying to do direct map accesses).
Signed-off-by: Patrick Roy <patrick.roy@linux.dev>
Signed-off-by: Nikita Kalyazin <kalyazin@amazon.com>
Signed-off-by: Linux RISC-V bot <linux.riscv.bot@gmail.com>
Patch 1: "[v11,01/16] set_memory: set_direct_map_* to take address"
Patch 2: "[v11,02/16] set_memory: add folio_{zap,restore}_direct_map helpers"
Patch 14: "[v11,14/16] KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in existing selftests"
Patch 15: "[v11,15/16] KVM: selftests: stuff vm_mem_backing_src_type into vm_shape"
Patch 16: "[v11,16/16] KVM: selftests: Test guest execution from direct map removed gmem"
PR for series 1068755 applied to workflow__riscv__fixes
Name: Direct Map Removal Support for guest_memfd
URL: https://patchwork.kernel.org/project/linux-riscv/list/?series=1068755
Version: 11