Message ID | 20220512031803.3315890-12-xiaoyao.li@intel.com |
---|---|
State | New |
Series | TDX QEMU support |
> +int tdx_pre_create_vcpu(CPUState *cpu)
> +{
> +    MachineState *ms = MACHINE(qdev_get_machine());
> +    X86CPU *x86cpu = X86_CPU(cpu);
> +    CPUX86State *env = &x86cpu->env;
> +    struct kvm_tdx_init_vm init_vm;
> +    int r = 0;
> +
> +    qemu_mutex_lock(&tdx_guest->lock);
> +    if (tdx_guest->initialized) {
> +        goto out;
> +    }
> +
> +    memset(&init_vm, 0, sizeof(init_vm));
> +    init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
> +
> +    init_vm.attributes = tdx_guest->attributes;
> +    init_vm.max_vcpus = ms->smp.cpus;
> +
> +    r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
> +    if (r < 0) {
> +        error_report("KVM_TDX_INIT_VM failed %s", strerror(-r));
> +        goto out;
> +    }
> +
> +    tdx_guest->initialized = true;
> +
> +out:
> +    qemu_mutex_unlock(&tdx_guest->lock);
> +    return r;
> +}

Hmm, hooking *vm* initialization into *vcpu* creation looks wrong to me.

take care,
  Gerd
On 5/23/2022 5:20 PM, Gerd Hoffmann wrote:
>> +int tdx_pre_create_vcpu(CPUState *cpu)
>> +{
>> +    MachineState *ms = MACHINE(qdev_get_machine());
>> +    X86CPU *x86cpu = X86_CPU(cpu);
>> +    CPUX86State *env = &x86cpu->env;
>> +    struct kvm_tdx_init_vm init_vm;
>> +    int r = 0;
>> +
>> +    qemu_mutex_lock(&tdx_guest->lock);
>> +    if (tdx_guest->initialized) {
>> +        goto out;
>> +    }
>> +
>> +    memset(&init_vm, 0, sizeof(init_vm));
>> +    init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
>> +
>> +    init_vm.attributes = tdx_guest->attributes;
>> +    init_vm.max_vcpus = ms->smp.cpus;
>> +
>> +    r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
>> +    if (r < 0) {
>> +        error_report("KVM_TDX_INIT_VM failed %s", strerror(-r));
>> +        goto out;
>> +    }
>> +
>> +    tdx_guest->initialized = true;
>> +
>> +out:
>> +    qemu_mutex_unlock(&tdx_guest->lock);
>> +    return r;
>> +}
>
> Hmm, hooking *vm* initialization into *vcpu* creation looks wrong to me.

That's because TDX has to do VM-scope (feature) initialization before
creating any vcpu. This is new to KVM and QEMU; so far every feature has
been vcpu-scope and configured per-vcpu.

To minimize the change to QEMU, we want to use @cpu and @cpu->env to grab
the configuration info. That's why it is done this way.

Do you have a better idea?

> take care,
>   Gerd
>
Hi,

> > Hmm, hooking *vm* initialization into *vcpu* creation looks wrong to me.
>
> That's because TDX has to do VM-scope (feature) initialization before
> creating any vcpu. This is new to KVM and QEMU; so far every feature has
> been vcpu-scope and configured per-vcpu.
>
> To minimize the change to QEMU, we want to use @cpu and @cpu->env to grab
> the configuration info. That's why it is done this way.
>
> Do you have a better idea?

Maybe it's a bit more work to add VM-scope initialization support to
qemu.  But I expect that approach will work better long-term.  You need
this mutex and the 'initialized' variable in your code to make sure it
runs only once, because the way you hook it in is not ideal ...

[ disclaimer: I'm not that familiar with the kvm interface in qemu ]

take care,
  Gerd
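For concreteness, here is a minimal sketch of what such a dedicated VM-scope hook could look like. The names kvm_arch_pre_create_vm() and tdx_init_vm_scope() are hypothetical, not part of this series; the point is only that the hook would run exactly once from accel setup, so no mutex or 'initialized' flag would be needed. The open question, discussed below, is whether the "-cpu" derived configuration is already available at that point.

```c
/*
 * Hypothetical sketch only: kvm_arch_pre_create_vm() and tdx_init_vm_scope()
 * are illustrative names, not part of the posted series.  The hook would be
 * called exactly once from kvm_init(), right after KVM_CREATE_VM succeeds
 * and before any vCPU is created, so no mutex/'initialized' flag is needed.
 */

/* target/i386/kvm/kvm.c */
int kvm_arch_pre_create_vm(MachineState *ms, KVMState *s)
{
    if (is_tdx_vm()) {
        return tdx_init_vm_scope(ms);   /* would issue KVM_TDX_INIT_VM once */
    }
    return 0;
}
```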
On 5/24/2022 2:57 PM, Gerd Hoffmann wrote:
>   Hi,
>
>>> Hmm, hooking *vm* initialization into *vcpu* creation looks wrong to me.
>>
>> That's because TDX has to do VM-scope (feature) initialization before
>> creating any vcpu. This is new to KVM and QEMU; so far every feature has
>> been vcpu-scope and configured per-vcpu.
>>
>> To minimize the change to QEMU, we want to use @cpu and @cpu->env to grab
>> the configuration info. That's why it is done this way.
>>
>> Do you have a better idea?
>
> Maybe it's a bit more work to add VM-scope initialization support to
> qemu.

Just introducing VM-scope initialization to QEMU would be easy. What
matters is what needs to be done inside that VM-scope initialization.

For TDX, we need to settle the features that are configured for the TD.
Typically, the features are attributes of the cpu object, parsed from the
"-cpu" option and stored in the cpu object.

I cannot think of a clean solution for it, other than

1) implement the same attributes on the machine object as on the cpu
   object, or

2) create a CPU object when initializing the machine object, collect all
   the info from "-cpu", and drop it in the end; but then why not do it
   when creating the 1st vcpu, like this patch does.

That's what I can come up with. Let's see if anyone has a better idea.

> But I expect that approach will work better long-term.  You need
> this mutex and the 'initialized' variable in your code to make sure it
> runs only once, because the way you hook it in is not ideal ...
>
> [ disclaimer: I'm not that familiar with the kvm interface in qemu ]
>
> take care,
>   Gerd
>
On Wed, Jun 01, 2022 at 03:20:46PM +0800, Xiaoyao Li wrote:
> On 5/24/2022 2:57 PM, Gerd Hoffmann wrote:
> >   Hi,
> > Maybe it's a bit more work to add VM-scope initialization support to
> > qemu.
>
> Just introducing VM-scope initialization to QEMU would be easy. What
> matters is what needs to be done inside that VM-scope initialization.
>
> For TDX, we need to settle the features that are configured for the TD.
> Typically, the features are attributes of the cpu object, parsed from the
> "-cpu" option and stored in the cpu object.

> 2) create a CPU object when initializing the machine object, collect all
>    the info from "-cpu", and drop it in the end; but then why not do it
>    when creating the 1st vcpu, like this patch does.

Can you do the VM-scope tdx initialization late enough that cpu objects
are already created at that point, so you can collect the info you need
without a dummy cpu?

I guess it would help the discussion if you could outline the 'big
picture' for tdx initialization.  What does kvm accel setup look like
without TDX, and what additional actions are needed for TDX?  What
ordering requirements and other constraints exist?

take care,
  Gerd
On 6/1/2022 3:54 PM, Gerd Hoffmann wrote:
> On Wed, Jun 01, 2022 at 03:20:46PM +0800, Xiaoyao Li wrote:
>> On 5/24/2022 2:57 PM, Gerd Hoffmann wrote:
>>>   Hi,
>>> Maybe it's a bit more work to add VM-scope initialization support to
>>> qemu.
>>
>> Just introducing VM-scope initialization to QEMU would be easy. What
>> matters is what needs to be done inside that VM-scope initialization.
>>
>> For TDX, we need to settle the features that are configured for the TD.
>> Typically, the features are attributes of the cpu object, parsed from the
>> "-cpu" option and stored in the cpu object.
>
>> 2) create a CPU object when initializing the machine object, collect all
>>    the info from "-cpu", and drop it in the end; but then why not do it
>>    when creating the 1st vcpu, like this patch does.
>
> Can you do the VM-scope tdx initialization late enough that cpu objects
> are already created at that point, so you can collect the info you need
> without a dummy cpu?

A new CPU object is created while creating each vcpu, so we have to use
the mutex and flag to ensure the VM-scope initialization is executed only
once. And it's weird to hook the VM-scope initialization into the middle
of the vcpu creation phase just to satisfy "late enough", so we choose to
do it right before calling the KVM API that initializes the vcpu.

> I guess it would help the discussion if you could outline the 'big
> picture' for tdx initialization.  What does kvm accel setup look like
> without TDX, and what additional actions are needed for TDX?  What
> ordering requirements and other constraints exist?

To boot a TDX VM, several changes/additional steps are required in the
flow:

1. specify the vm type KVM_X86_TDX_VM when creating the VM with
   IOCTL(KVM_CREATE_VM);
   - when initializing the KVM accel

2. initialize the VM-scope configuration before creating any VCPU;

3. initialize the VCPU-scope configuration;
   - done inside the machine_init_done notifier;

4. initialize the virtual firmware in guest private memory before any
   vcpu runs;
   - done inside the machine_init_done notifier;

5. finalize the TD's measurement;
   - done inside the machine_init_done notifier;

And we are discussing where to do step 2).

We can see from the code of tdx_pre_create_vcpu() that it needs the cpuid
entries[] and the attributes as input to KVM:

  - the cpuid entries[] are set up by kvm_x86_arch_cpuid(), mainly based
    on 'CPUX86State *env';
  - attributes.pks is retrieved from env->features[], and
    attributes.pmu is retrieved from x86cpu->enable_pmu.

To keep the VM-scope data consistent with the VCPU data, we choose a
point late enough that all the info/configuration from the VCPU has
settled down, i.e. just before calling the KVM API that does the
VCPU-scope configuration.

> take care,
>   Gerd
>
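To make the ordering outlined in the mail above easier to follow, here is a compressed sketch in C. KVM_TDX_INIT_VM, tdx_vm_ioctl() and struct kvm_tdx_init_vm appear in this patch; the remaining KVM_TDX_* commands, tdx_vcpu_ioctl() and struct kvm_tdx_init_mem_region are assumptions taken from the rest of the series and may differ in the final interface.

```c
/*
 * Ordering sketch only, not working code; error handling omitted.
 * KVM_TDX_INIT_VM / tdx_vm_ioctl() match this patch; the other KVM_TDX_*
 * commands and tdx_vcpu_ioctl() are assumptions based on the series.
 */
static void tdx_boot_order_sketch(void)
{
    struct kvm_tdx_init_vm init_vm = {};
    struct kvm_tdx_init_mem_region region = {};
    CPUState *cpu;

    /* 1. KVM_CREATE_VM is issued with vm type KVM_X86_TDX_VM
     *    (during KVM accel initialization). */

    /* 2. VM-scope init: canonical CPUID config, attributes, max_vcpus.
     *    Must happen before any vCPU is created in KVM. */
    tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);

    /* 3. vCPU-scope init, once each vCPU exists
     *    (machine_init_done notifier in this series). */
    CPU_FOREACH(cpu) {
        tdx_vcpu_ioctl(cpu, KVM_TDX_INIT_VCPU, 0, NULL);
    }

    /* 4. copy the virtual firmware into guest private memory. */
    tdx_vm_ioctl(KVM_TDX_INIT_MEM_REGION, 0, &region);

    /* 5. finalize the TD measurement; vCPUs may run afterwards. */
    tdx_vm_ioctl(KVM_TDX_FINALIZE_VM, 0, NULL);
}
```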
Hi,

> > I guess it would help the discussion if you could outline the 'big
> > picture' for tdx initialization.  What does kvm accel setup look like
> > without TDX, and what additional actions are needed for TDX?  What
> > ordering requirements and other constraints exist?
>
> To boot a TDX VM, several changes/additional steps are required in the
> flow:
>
> 1. specify the vm type KVM_X86_TDX_VM when creating the VM with
>    IOCTL(KVM_CREATE_VM);
>    - when initializing the KVM accel
>
> 2. initialize the VM-scope configuration before creating any VCPU;
>
> 3. initialize the VCPU-scope configuration;
>    - done inside the machine_init_done notifier;
>
> 4. initialize the virtual firmware in guest private memory before any
>    vcpu runs;
>    - done inside the machine_init_done notifier;
>
> 5. finalize the TD's measurement;
>    - done inside the machine_init_done notifier;
>
> And we are discussing where to do step 2).
>
> We can see from the code of tdx_pre_create_vcpu() that it needs the cpuid
> entries[] and the attributes as input to KVM:
>
>   - the cpuid entries[] are set up by kvm_x86_arch_cpuid(), mainly based
>     on 'CPUX86State *env';
>   - attributes.pks is retrieved from env->features[], and
>     attributes.pmu is retrieved from x86cpu->enable_pmu.
>
> To keep the VM-scope data consistent with the VCPU data, we choose a
> point late enough that all the info/configuration from the VCPU has
> settled down, i.e. just before calling the KVM API that does the
> VCPU-scope configuration.

So essentially tdx defines (some) vcpu properties at vm scope?  Given
that all vcpus are typically identical (and maybe tdx even enforces
this), this makes sense.

A comment in the source code explaining this would be good.

thanks,
  Gerd
On 6/7/2022 7:16 PM, Gerd Hoffmann wrote:
>   Hi,
>
>>> I guess it would help the discussion if you could outline the 'big
>>> picture' for tdx initialization.  What does kvm accel setup look like
>>> without TDX, and what additional actions are needed for TDX?  What
>>> ordering requirements and other constraints exist?
>>
>> To boot a TDX VM, several changes/additional steps are required in the
>> flow:
>>
>> 1. specify the vm type KVM_X86_TDX_VM when creating the VM with
>>    IOCTL(KVM_CREATE_VM);
>>    - when initializing the KVM accel
>>
>> 2. initialize the VM-scope configuration before creating any VCPU;
>>
>> 3. initialize the VCPU-scope configuration;
>>    - done inside the machine_init_done notifier;
>>
>> 4. initialize the virtual firmware in guest private memory before any
>>    vcpu runs;
>>    - done inside the machine_init_done notifier;
>>
>> 5. finalize the TD's measurement;
>>    - done inside the machine_init_done notifier;
>>
>> And we are discussing where to do step 2).
>>
>> We can see from the code of tdx_pre_create_vcpu() that it needs the cpuid
>> entries[] and the attributes as input to KVM:
>>
>>   - the cpuid entries[] are set up by kvm_x86_arch_cpuid(), mainly based
>>     on 'CPUX86State *env';
>>   - attributes.pks is retrieved from env->features[], and
>>     attributes.pmu is retrieved from x86cpu->enable_pmu.
>>
>> To keep the VM-scope data consistent with the VCPU data, we choose a
>> point late enough that all the info/configuration from the VCPU has
>> settled down, i.e. just before calling the KVM API that does the
>> VCPU-scope configuration.
>
> So essentially tdx defines (some) vcpu properties at vm scope?

Not TDX, but QEMU. Most of the CPU features are configured via the "-cpu"
option, not the "-machine" option.

> Given
> that all vcpus are typically identical (and maybe tdx even enforces
> this), this makes sense.
>
> A comment in the source code explaining this would be good.
>
> thanks,
>   Gerd
>
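As a concrete follow-up to Gerd's request, one possible wording for a comment above tdx_pre_create_vcpu() in target/i386/kvm/tdx.c. The wording is an editorial suggestion, not part of the posted patch:

```c
/*
 * Suggested comment for tdx_pre_create_vcpu() -- wording is a suggestion,
 * not part of the posted patch.
 *
 * KVM_TDX_INIT_VM takes VM-scope state (the canonical CPUID configuration
 * and the TD attributes) that in QEMU is configured via "-cpu" and thus
 * lives in the vCPU objects.  All vCPUs of a TD share this configuration,
 * so it is collected from the first vCPU that reaches this point; the
 * tdx_guest->lock mutex and the 'initialized' flag ensure KVM_TDX_INIT_VM
 * is issued exactly once, before any vCPU is created in KVM.
 */
```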
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index e6fa9d23207a..88468878d181 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -470,10 +470,17 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
     trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
+    /*
+     * tdx_pre_create_vcpu() may call cpu_x86_cpuid(). It in turn may call
+     * kvm_vm_ioctl(). Set cpu->kvm_state in advance to avoid NULL pointer
+     * dereference.
+     */
+    cpu->kvm_state = s;
     ret = kvm_arch_pre_create_vcpu(cpu);
     if (ret < 0) {
         error_setg_errno(errp, -ret,
                          "kvm_init_vcpu: kvm_arch_pre_create_vcpu() failed");
+        cpu->kvm_state = NULL;
         goto err;
     }
 
@@ -481,11 +488,11 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
     if (ret < 0) {
         error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
                          kvm_arch_vcpu_id(cpu));
+        cpu->kvm_state = NULL;
         goto err;
     }
 
     cpu->kvm_fd = ret;
-    cpu->kvm_state = s;
     cpu->vcpu_dirty = true;
     cpu->dirty_pages = 0;
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5be151e6499b..f2d7c3cf59ac 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2175,6 +2175,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
     return r;
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu)
+{
+    if (is_tdx_vm())
+        return tdx_pre_create_vcpu(cpu);
+
+    return 0;
+}
+
 int kvm_arch_destroy_vcpu(CPUState *cs)
 {
     X86CPU *cpu = X86_CPU(cs);
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index 1df24735201e..2871de9d7b56 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -7,3 +7,8 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
 {
     return -EINVAL;
 }
+
+int tdx_pre_create_vcpu(CPUState *cpu)
+{
+    return -EINVAL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 6e3b15ba8a4a..3472b59c2dbb 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -18,6 +18,7 @@
 #include "sysemu/kvm.h"
 #include "hw/i386/x86.h"
 
+#include "kvm_i386.h"
 #include "tdx.h"
 
 #define TDX_SUPPORTED_KVM_FEATURES ((1ULL << KVM_FEATURE_NOP_IO_DELAY) | \
@@ -165,6 +166,38 @@ void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
     }
 }
 
+int tdx_pre_create_vcpu(CPUState *cpu)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    struct kvm_tdx_init_vm init_vm;
+    int r = 0;
+
+    qemu_mutex_lock(&tdx_guest->lock);
+    if (tdx_guest->initialized) {
+        goto out;
+    }
+
+    memset(&init_vm, 0, sizeof(init_vm));
+    init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
+
+    init_vm.attributes = tdx_guest->attributes;
+    init_vm.max_vcpus = ms->smp.cpus;
+
+    r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
+    if (r < 0) {
+        error_report("KVM_TDX_INIT_VM failed %s", strerror(-r));
+        goto out;
+    }
+
+    tdx_guest->initialized = true;
+
+out:
+    qemu_mutex_unlock(&tdx_guest->lock);
+    return r;
+}
+
 /* tdx guest */
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
                                    tdx_guest,
@@ -177,6 +210,8 @@ static void tdx_guest_init(Object *obj)
 {
     TdxGuest *tdx = TDX_GUEST(obj);
 
+    qemu_mutex_init(&tdx->lock);
+
     tdx->attributes = 0;
 }
 
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 06599b65b827..46a24ee8c7cc 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -17,6 +17,9 @@ typedef struct TdxGuestClass {
 typedef struct TdxGuest {
     ConfidentialGuestSupport parent_obj;
 
+    QemuMutex lock;
+
+    bool initialized;
     uint64_t attributes;    /* TD attributes */
 } TdxGuest;
 
@@ -29,5 +32,6 @@ bool is_tdx_vm(void);
 int tdx_kvm_init(MachineState *ms, Error **errp);
 void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
                              uint32_t *ret);
+int tdx_pre_create_vcpu(CPUState *cpu);
 
 #endif /* QEMU_I386_TDX_H */
Invoke KVM_TDX_INIT_VM in kvm_arch_pre_create_vcpu(), as KVM_TDX_INIT_VM
configures global TD state, e.g. the canonical CPUID config, and must be
executed prior to creating vCPUs.

Use kvm_x86_arch_cpuid() to set up the CPUID settings for the TDX VM, and
tie x86cpu->enable_pmu to the TD's attributes.

Note, this doesn't address the fact that QEMU may change the CPUID
configuration when creating vCPUs, i.e. it punts on refactoring QEMU to
provide a stable CPUID config prior to kvm_arch_init().

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c        |  9 ++++++++-
 target/i386/kvm/kvm.c      |  8 ++++++++
 target/i386/kvm/tdx-stub.c |  5 +++++
 target/i386/kvm/tdx.c      | 35 +++++++++++++++++++++++++++++++++++
 target/i386/kvm/tdx.h      |  4 ++++
 5 files changed, 60 insertions(+), 1 deletion(-)