[RFC,00/19] QEMU gmem implemention

Message ID 20230731162201.271114-1-xiaoyao.li@intel.com

Message

Xiaoyao Li July 31, 2023, 4:21 p.m. UTC
This is the first RFC version of enabling KVM gmem[1] as the backend for
private memory of KVM_X86_PROTECTED_VM.

It adds support for creating a VM of the specific type KVM_X86_PROTECTED_VM,
and introduces a 'private' property for memory backends. When the VM type
is KVM_X86_PROTECTED_VM and the memory backend has 'private' enabled, as
below, QEMU calls the KVM gmem ioctl to allocate private memory for the
backend.

    $qemu -object memory-backend-ram,id=mem0,size=1G,private=on \
          -machine q35,kvm-type=sw-protected-vm,memory-backend=mem0 \
	  ...

Unfortunately, with this patch series OVMF fails to boot at a very early
stage due to a triple fault, because KVM doesn't support emulating string
I/O to private memory. We leave this as an open issue to be discussed.

The following design questions are open for discussion:

1. How should the VM type be determined?

   a. As in this series, specify the VM type via the machine property
      'kvm-type'.
   b. Check the memory backends: if any backend has the 'private'
      property set, the VM type is set to KVM_X86_PROTECTED_VM.

2. Is the 'private' property needed if we choose 1.b as the design?

   With 1.b, QEMU can decide on its own whether a memory region needs to
   be private (i.e. allocate a gmem fd for it) or not.

3. What is KVM_X86_SW_PROTECTED_VM going to look like? What is its
   purpose, and what are the requirements on it? I think these are
   questions for the KVM folks rather than the QEMU folks.
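The two options in question 1 differ only in where the VM type comes from. A minimal C sketch of the decision logic, assuming hypothetical enum values and a simplified struct backend_cfg in place of QEMU's real HostMemoryBackend and KVM's real VM-type constants:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: these are NOT the real KVM constants. */
enum vm_type { VM_TYPE_DEFAULT, VM_TYPE_SW_PROTECTED };

/* Hypothetical summary of one -object memory-backend-* option. */
struct backend_cfg {
    const char *id;
    bool private_on;           /* models the proposed private=on property */
};

/* Option 1.a: the VM type comes from an explicit machine property. */
enum vm_type vm_type_from_property(const char *kvm_type)
{
    if (kvm_type != NULL && strcmp(kvm_type, "sw-protected-vm") == 0) {
        return VM_TYPE_SW_PROTECTED;
    }
    return VM_TYPE_DEFAULT;
}

/* Option 1.b: the VM type is inferred; one private backend is enough. */
enum vm_type vm_type_from_backends(const struct backend_cfg *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (b[i].private_on) {
            return VM_TYPE_SW_PROTECTED;
        }
    }
    return VM_TYPE_DEFAULT;
}

/* Either way, a gmem fd is requested only for a private backend of a
 * protected VM, as the cover letter describes. */
bool backend_wants_gmem(enum vm_type type, const struct backend_cfg *b)
{
    assert(b != NULL);
    return type == VM_TYPE_SW_PROTECTED && b->private_on;
}
```

Note that in this sketch of 1.b, a single private=on backend changes the type of the whole VM.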

Any other ideas, open issues, or questions are welcome.


Besides, the TDX QEMU implementation, which can be found at [2], is based
on this series to provide private gmem for TD private memory. It works with
the corresponding KVM [3] to boot TDX guests.

[1] https://lore.kernel.org/all/20230718234512.1690985-1-seanjc@google.com/
[2] https://github.com/intel/qemu-tdx/tree/tdx-upstream-wip
[3] https://github.com/intel/tdx/tree/kvm-upstream-2023.07.27-v6.5-rc2-workaround

Chao Peng (4):
  RAMBlock: Support KVM gmemory
  kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
  physmem: Add ram_block_convert_range
  kvm: handle KVM_EXIT_MEMORY_FAULT

Isaku Yamahata (4):
  HostMem: Add private property to indicate to use kvm gmem
  trace/kvm: Add trace for page convertion between shared and private
  pci-host/q35: Move PAM initialization above SMRAM initialization
  q35: Introduce smm_ranges property for q35-pci-host

Xiaoyao Li (11):
  trace/kvm: Split address space and slot id in
    trace_kvm_set_user_memory()
  *** HACK *** linux-headers: Update headers to pull in gmem APIs
  memory: Introduce memory_region_can_be_private()
  i386/pc: Drop pc_machine_kvm_type()
  target/i386: Implement mc->kvm_type() to get VM type
  i386/kvm: Create gmem fd for KVM_X86_SW_PROTECTED_VM
  kvm: Introduce support for memory_attributes
  kvm/memory: Introduce the infrastructure to set the default
    shared/private value
  i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM
  physmem: replace function name with __func__ in
    ram_block_discard_range()
  i386: Disable SMM mode for X86_SW_PROTECTED_VM

 accel/kvm/kvm-all.c         | 166 +++++++++++++++++++++++++++++++++---
 accel/kvm/trace-events      |   4 +-
 backends/hostmem.c          |  18 ++++
 hw/i386/pc.c                |   5 --
 hw/i386/pc_q35.c            |   3 +-
 hw/i386/x86.c               |  27 ++++++
 hw/pci-host/q35.c           |  61 ++++++++-----
 include/exec/cpu-common.h   |   2 +
 include/exec/memory.h       |  24 ++++++
 include/exec/ramblock.h     |   1 +
 include/hw/i386/pc.h        |   4 +-
 include/hw/i386/x86.h       |   4 +
 include/hw/pci-host/q35.h   |   1 +
 include/sysemu/hostmem.h    |   2 +-
 include/sysemu/kvm.h        |   3 +
 include/sysemu/kvm_int.h    |   2 +
 linux-headers/asm-x86/kvm.h |   3 +
 linux-headers/linux/kvm.h   |  50 +++++++++++
 qapi/qom.json               |   4 +
 softmmu/memory.c            |  27 ++++++
 softmmu/physmem.c           |  97 ++++++++++++++-------
 target/i386/kvm/kvm.c       |  84 ++++++++++++++++++
 target/i386/kvm/kvm_i386.h  |   1 +
 23 files changed, 517 insertions(+), 76 deletions(-)

Comments

Daniel P. Berrangé July 31, 2023, 4:51 p.m. UTC | #1
On Mon, Jul 31, 2023 at 12:21:42PM -0400, Xiaoyao Li wrote:
> This is the first RFC version of enabling KVM gmem[1] as the backend for
> private memory of KVM_X86_PROTECTED_VM.
> 
> It adds support for creating a VM of the specific type KVM_X86_PROTECTED_VM,
> and introduces a 'private' property for memory backends. When the VM type
> is KVM_X86_PROTECTED_VM and the memory backend has 'private' enabled, as
> below, QEMU calls the KVM gmem ioctl to allocate private memory for the
> backend.
> 
>     $qemu -object memory-backend-ram,id=mem0,size=1G,private=on \
>           -machine q35,kvm-type=sw-protected-vm,memory-backend=mem0 \
> 	  ...
> 
> Unfortunately, with this patch series OVMF fails to boot at a very early
> stage due to a triple fault, because KVM doesn't support emulating string
> I/O to private memory. We leave this as an open issue to be discussed.
>
> The following design questions are open for discussion:
>
> 1. How should the VM type be determined?
>
>    a. As in this series, specify the VM type via the machine property
>       'kvm-type'.
>    b. Check the memory backends: if any backend has the 'private'
>       property set, the VM type is set to KVM_X86_PROTECTED_VM.
>
> 2. Is the 'private' property needed if we choose 1.b as the design?
>
>    With 1.b, QEMU can decide on its own whether a memory region needs to
>    be private (i.e. allocate a gmem fd for it) or not.
>
> 3. What is KVM_X86_SW_PROTECTED_VM going to look like? What is its
>    purpose, and what are the requirements on it? I think these are
>    questions for the KVM folks rather than the QEMU folks.
> 
> Any other ideas, open issues, or questions are welcome.
> 
> 
> Besides, the TDX QEMU implementation, which can be found at [2], is based
> on this series to provide private gmem for TD private memory. It works with
> the corresponding KVM [3] to boot TDX guests.

We already have a general purpose configuration mechanism for
confidential guests.  The -machine argument has a property
confidential-guest-support=$OBJECT-ID, for pointing to an
object that implements the TYPE_CONFIDENTIAL_GUEST_SUPPORT
interface in QEMU. This is implemented with SEV, PPC PEF
mode, and s390 protvirt.

I would expect TDX to follow this same design ie

    qemu-system-x86_64 \
      -object tdx-guest,id=tdx0,..... \
      -machine q35,confidential-guest-support=tdx0 \
      ...

and not require inventing the new 'kvm-type' attribute at least.
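With this design the VM type follows from which object confidential-guest-support points to, rather than from a separate machine property. A rough C model of that lookup, assuming a hypothetical user_object table and enum; the type names "tdx-guest" and "sw-protected" are the ones used as examples in this thread, not an existing QEMU API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical VM types; not the real KVM constants. */
enum vm_type { VM_TYPE_DEFAULT, VM_TYPE_SW_PROTECTED, VM_TYPE_TDX, VM_TYPE_INVALID };

/* Hypothetical record of one user-created -object: type name and id. */
struct user_object {
    const char *type;
    const char *id;
};

static const struct user_object *find_object(const struct user_object *objs,
                                             size_t n, const char *id)
{
    assert(objs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(objs[i].id, id) == 0) {
            return &objs[i];
        }
    }
    return NULL;
}

/* The VM type is derived from the object named by
 * confidential-guest-support=<id>; no kvm-type property is consulted. */
enum vm_type vm_type_for_machine(const struct user_object *objs, size_t n,
                                 const char *cgs_id)
{
    if (cgs_id == NULL) {
        return VM_TYPE_DEFAULT;        /* ordinary, non-confidential guest */
    }
    const struct user_object *obj = find_object(objs, n, cgs_id);
    if (obj == NULL) {
        return VM_TYPE_INVALID;        /* dangling reference: reject */
    }
    if (strcmp(obj->type, "tdx-guest") == 0) {
        return VM_TYPE_TDX;
    }
    if (strcmp(obj->type, "sw-protected") == 0) {
        return VM_TYPE_SW_PROTECTED;
    }
    return VM_TYPE_INVALID;            /* object type not handled here */
}
```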

For the memory backend though, I'm not so sure - possibly that
might be something that still wants an extra property to identify
the type of memory to allocate, since we use memory-backend-ram
for a variety of use cases.  Or it could be an entirely new object
type such as "memory-backend-gmem"


With regards,
Daniel
Isaku Yamahata July 31, 2023, 5:10 p.m. UTC | #2
On Mon, Jul 31, 2023 at 12:21:42PM -0400,
Xiaoyao Li <xiaoyao.li@intel.com> wrote:

> This is the first RFC version of enabling KVM gmem[1] as the backend for
> private memory of KVM_X86_PROTECTED_VM.
> 
> It adds support for creating a VM of the specific type KVM_X86_PROTECTED_VM,
> and introduces a 'private' property for memory backends. When the VM type
> is KVM_X86_PROTECTED_VM and the memory backend has 'private' enabled, as
> below, QEMU calls the KVM gmem ioctl to allocate private memory for the
> backend.
> 
>     $qemu -object memory-backend-ram,id=mem0,size=1G,private=on \
>           -machine q35,kvm-type=sw-protected-vm,memory-backend=mem0 \
> 	  ...
> 
> Unfortunately, with this patch series OVMF fails to boot at a very early
> stage due to a triple fault, because KVM doesn't support emulating string
> I/O to private memory. We leave this as an open issue to be discussed.
>
> The following design questions are open for discussion:
>
> 1. How should the VM type be determined?
>
>    a. As in this series, specify the VM type via the machine property
>       'kvm-type'.
>    b. Check the memory backends: if any backend has the 'private'
>       property set, the VM type is set to KVM_X86_PROTECTED_VM.

Hi Xiaoyao. Because QEMU already has confidential guest support, we should
utilize it. Say,
qemu  \
  -object sw-protected,id=swp0,<more options for KVM_X86_SW_PROTECTED_VM> \
  -machine confidential-guest-support=swp0



> 2. Is the 'private' property needed if we choose 1.b as the design?
>
>    With 1.b, QEMU can decide on its own whether a memory region needs to
>    be private (i.e. allocate a gmem fd for it) or not.


The memory region property (how to create the KVM memory slot) should be
independent of the underlying VM type. Some VM types (e.g. TDX) may require
a KVM private memory slot; others may not. Leave the decision to the
VM-type backend, which can use a QEMU memory listener.
Xiaoyao Li Aug. 1, 2023, 1:45 a.m. UTC | #3
On 8/1/2023 12:51 AM, Daniel P. Berrangé wrote:
> We already have a general purpose configuration mechanism for
> confidential guests.  The -machine argument has a property
> confidential-guest-support=$OBJECT-ID, for pointing to an
> object that implements the TYPE_CONFIDENTIAL_GUEST_SUPPORT
> interface in QEMU. This is implemented with SEV, PPC PEF
> mode, and s390 protvirt.
> 
> I would expect TDX to follow this same design ie
> 
>      qemu-system-x86_64 \
>        -object tdx-guest,id=tdx0,..... \
>        -machine q35,confidential-guest-support=tdx0 \
>        ...
> 
> and not require inventing the new 'kvm-type' attribute at least.

Yes.

TDX is initialized exactly as above.

This RFC series introduces 'kvm-type' for KVM_X86_SW_PROTECTED_VM.
It's my fault that I forgot to list the option of introducing a
sw_protected_vm object with the CONFIDENTIAL_GUEST_SUPPORT interface.
Thanks to Isaku for raising it:
https://lore.kernel.org/qemu-devel/20230731171041.GB1807130@ls.amr.corp.intel.com/

We can specify KVM_X86_SW_PROTECTED_VM this way:

qemu  \
   -object sw-protected,id=swp0,... \
   -machine confidential-guest-support=swp0 \
   ...

> For the memory backend though, I'm not so sure - possibly that
> might be something that still wants an extra property to identify
> the type of memory to allocate, since we use memory-backend-ram
> for a variety of use cases.  Or it could be an entirely new object
> type such as "memory-backend-gmem"

What I want to discuss is whether to provide an interface that allows
users to configure which memory is/can be private. Alternatively, QEMU
can do it internally: if users want a confidential guest, QEMU allocates
private gmem for normal RAM automatically.
Xiaoyao Li Aug. 1, 2023, 1:55 a.m. UTC | #4
On 8/1/2023 1:10 AM, Isaku Yamahata wrote:
> On Mon, Jul 31, 2023 at 12:21:42PM -0400,
> Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
>> This is the first RFC version of enabling KVM gmem[1] as the backend for
>> private memory of KVM_X86_PROTECTED_VM.
>>
>> It adds support for creating a VM of the specific type KVM_X86_PROTECTED_VM,
>> and introduces a 'private' property for memory backends. When the VM type
>> is KVM_X86_PROTECTED_VM and the memory backend has 'private' enabled, as
>> below, QEMU calls the KVM gmem ioctl to allocate private memory for the
>> backend.
>>
>>      $qemu -object memory-backend-ram,id=mem0,size=1G,private=on \
>>            -machine q35,kvm-type=sw-protected-vm,memory-backend=mem0 \
>> 	  ...
>>
>> Unfortunately, with this patch series OVMF fails to boot at a very early
>> stage due to a triple fault, because KVM doesn't support emulating string
>> I/O to private memory. We leave this as an open issue to be discussed.
>>
>> The following design questions are open for discussion:
>>
>> 1. How should the VM type be determined?
>>
>>     a. As in this series, specify the VM type via the machine property
>>        'kvm-type'.
>>     b. Check the memory backends: if any backend has the 'private'
>>        property set, the VM type is set to KVM_X86_PROTECTED_VM.
> 
> Hi Xiaoyao. Because QEMU already has confidential guest support, we should
> utilize it. Say,
> qemu  \
>    -object sw-protected,id=swp0,<more options for KVM_X86_SW_PROTECTED_VM> \
>    -machine confidential-guest-support=swp0

Thanks for pointing out this option. I thought of it but forgot to list
it as an option.

It seems better, and I'll go in this direction if no one has a different opinion.

> 
>> 2. Is the 'private' property needed if we choose 1.b as the design?
>>
>>     With 1.b, QEMU can decide on its own whether a memory region needs to
>>     be private (i.e. allocate a gmem fd for it) or not.
> 
> 
> The memory region property (how to create the KVM memory slot) should be
> independent of the underlying VM type. Some VM types (e.g. TDX) may require
> a KVM private memory slot; others may not. Leave the decision to the
> VM-type backend, which can use a QEMU memory listener.

As I replied to Daniel, the topic is whether the 'private' property is
needed. Is it essential to let users decide which memory can be private?
It seems fine for QEMU to make the decision based on the VM type.
Isaku Yamahata Aug. 14, 2023, 9:45 p.m. UTC | #5
On Thu, Aug 10, 2023 at 10:58:09AM -0500,
Michael Roth via <qemu-devel@nongnu.org> wrote:

> On Tue, Aug 01, 2023 at 09:45:41AM +0800, Xiaoyao Li wrote:
> > On 8/1/2023 12:51 AM, Daniel P. Berrangé wrote:
> > > We already have a general purpose configuration mechanism for
> > > confidential guests.  The -machine argument has a property
> > > confidential-guest-support=$OBJECT-ID, for pointing to an
> > > object that implements the TYPE_CONFIDENTIAL_GUEST_SUPPORT
> > > interface in QEMU. This is implemented with SEV, PPC PEF
> > > mode, and s390 protvirt.
> > > 
> > > I would expect TDX to follow this same design ie
> > > 
> > >      qemu-system-x86_64 \
> > >        -object tdx-guest,id=tdx0,..... \
> > >        -machine q35,confidential-guest-support=tdx0 \
> > >        ...
> > > 
> > > and not require inventing the new 'kvm-type' attribute at least.
> > 
> > Yes.
> >
> > TDX is initialized exactly as above.
> >
> > This RFC series introduces 'kvm-type' for KVM_X86_SW_PROTECTED_VM. It's
> > my fault that I forgot to list the option of introducing a sw_protected_vm
> > object with the CONFIDENTIAL_GUEST_SUPPORT interface.
> > Thanks to Isaku for raising it: https://lore.kernel.org/qemu-devel/20230731171041.GB1807130@ls.amr.corp.intel.com/
> >
> > We can specify KVM_X86_SW_PROTECTED_VM this way:
> > 
> > qemu  \
> >   -object sw-protected,id=swp0,... \
> >   -machine confidential-guest-support=swp0 \
> >   ...
> > 
> > > For the memory backend though, I'm not so sure - possibly that
> > > might be something that still wants an extra property to identify
> > > the type of memory to allocate, since we use memory-backend-ram
> > > for a variety of use cases.  Or it could be an entirely new object
> > > type such as "memory-backend-gmem"
> > 
> > What I want to discuss is whether to provide an interface that allows users
> > to configure which memory is/can be private. Alternatively, QEMU can do it
> > internally: if users want a confidential guest, QEMU allocates private gmem
> > for normal RAM automatically.
> 
> I think handling it automatically simplifies things a good deal on the
> QEMU side. I think it's still worthwhile to allow:
> 
>  -object memory-backend-memfd-private,...
> 
> because it provides a nice mechanism to set up a pair of shared/private
> memfd's to enable hole-punching via fallocate() to avoid doubling memory
> allocations for shared/private. It's also a nice place to control
> potentially-configurable things like:
> 
>  - whether or not to enable discard/hole-punching
>  - if discard is enabled, whether or not to register the range via the
>    RamDiscardManager interface so that VFIO/IOMMU mappings get updated
>    when doing PCI passthrough. SNP relies on this for PCI passthrough
>    when discard is enabled, otherwise DMA occurs to stale mappings of
>    discarded bounce-buffer pages:
> 
>      https://github.com/AMDESE/qemu/blob/snp-latest/backends/hostmem-memfd-private.c#L449
> 
> But for other memory ranges, it doesn't do a lot of good to rely on
> users to control those via -object memory-backend-memfd-private, since
> QEMU will set up some regions internally, like the UEFI ROM.
> 
> It also isn't ideal for QEMU itself to internally control what
> should/shouldn't be set up with a backing guest_memfd, because some
> guest kernels do weird stuff, like scan for ROM regions in areas that
> guest kernels might have mapped as encrypted in the guest page table. You
> can consider them to be guest bugs, but even current SNP-capable
> kernels exhibit this behavior and if the guest wants to do dumb stuff
> QEMU should let it.
> 
> But for these latter 2 cases, it doesn't make sense to attempt to do
> any sort of discarding of backing pages since it doesn't make sense to
> discard ROM pages.
> 
> So I think it makes sense to just set up the gmemfd automatically across
> the board internally, and keep memory-backend-memfd-private around
> purely as a way to control/configure discardable memory.
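The hole-punching idea can be illustrated with plain Linux calls: when a range converts to private, the shared copy is freed with fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE), and vice versa, so the same range is not backed twice. A minimal sketch, assuming page-granular conversions; struct mem_pair and the mem_pair_* helpers are hypothetical, and a second ordinary memfd stands in for the private guest_memfd, which in practice is created through KVM rather than memfd_create():

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>        /* fallocate(), FALLOC_FL_* with _GNU_SOURCE */
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>     /* memfd_create() */
#include <sys/stat.h>     /* fstat() */
#include <unistd.h>

/* Hypothetical shared/private pair; private_fd is only a stand-in. */
struct mem_pair {
    int shared_fd;
    int private_fd;
    uint64_t size;
};

void mem_pair_destroy(struct mem_pair *p)
{
    if (p->shared_fd >= 0) {
        close(p->shared_fd);
    }
    if (p->private_fd >= 0) {
        close(p->private_fd);
    }
    p->shared_fd = p->private_fd = -1;
}

int mem_pair_init(struct mem_pair *p, uint64_t size)
{
    p->size = size;
    p->shared_fd = memfd_create("shared", MFD_CLOEXEC);
    p->private_fd = memfd_create("private-standin", MFD_CLOEXEC);
    if (p->shared_fd < 0 || p->private_fd < 0 ||
        ftruncate(p->shared_fd, (off_t)size) < 0 ||
        ftruncate(p->private_fd, (off_t)size) < 0) {
        mem_pair_destroy(p);
        return -1;
    }
    return 0;
}

/* Bytes actually backed by pages (st_blocks is in 512-byte units). */
long long backed_bytes(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? (long long)st.st_blocks * 512 : -1;
}

/* Convert a page-aligned range. The copy that is no longer in use is
 * released with a hole punch; PUNCH_HOLE must be combined with KEEP_SIZE,
 * so the file length stays the same. */
int mem_pair_convert(struct mem_pair *p, uint64_t offset, uint64_t len,
                     bool to_private)
{
    uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);

    if (len == 0 || offset % page != 0 || len % page != 0 ||
        offset > p->size || len > p->size - offset) {
        errno = EINVAL;
        return -1;
    }
    int stale_fd = to_private ? p->shared_fd : p->private_fd;
    return fallocate(stale_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     (off_t)offset, (off_t)len);
}
```

Requiring page alignment is a choice made for this sketch; a hole punch on tmpfs also accepts unaligned ranges and simply zeroes the partial pages.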


I'm looking at the repo and commit
31a7c7e36684 ("*hostmem-memfd-private: Initial discard manager support").

Do we have to implement RAM_DISCARD_MANAGER in memory-backend-memfd-private?
Can't we implement it in host_mem? The interface callbacks can check
"if (!private) return". Then we can support any host-mem backend.
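A rough model of this suggestion: the discard bookkeeping lives in the common backend, and each callback returns early for non-private memory, so every backend type gets it for free. This is a plain-C sketch with hypothetical names and a fixed-size page array; it does not reproduce QEMU's actual RamDiscardManager interface or its signatures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PAGES 64            /* fixed-size state for the sketch */

/* Hypothetical, simplified host memory backend shared by every backend
 * type (ram, file, memfd, ...). */
struct host_backend {
    bool private_mem;                  /* models the 'private' property */
    size_t npages;
    bool populated[SKETCH_MAX_PAGES];  /* shared copy of the page is backed */
    int notifications;                 /* times a listener (e.g. VFIO) was told */
};

static bool range_ok(const struct host_backend *b, size_t first, size_t count)
{
    return b->npages <= SKETCH_MAX_PAGES && first <= b->npages &&
           count <= b->npages - first;
}

/* Discard callback: a no-op unless the backend is private. */
void backend_discard(struct host_backend *b, size_t first, size_t count)
{
    if (!b->private_mem) {
        return;
    }
    assert(range_ok(b, first, count));
    for (size_t i = first; i < first + count; i++) {
        b->populated[i] = false;
    }
    b->notifications++;
}

/* Populate callback: same early return for non-private memory. */
void backend_populate(struct host_backend *b, size_t first, size_t count)
{
    if (!b->private_mem) {
        return;
    }
    assert(range_ok(b, first, count));
    for (size_t i = first; i < first + count; i++) {
        b->populated[i] = true;
    }
    b->notifications++;
}

/* Non-private memory is never discarded, so it always reads as populated. */
bool backend_is_populated(const struct host_backend *b, size_t first,
                          size_t count)
{
    if (!b->private_mem) {
        return true;
    }
    assert(range_ok(b, first, count));
    for (size_t i = first; i < first + count; i++) {
        if (!b->populated[i]) {
            return false;
        }
    }
    return true;
}
```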