diff mbox series

[1/1] target/riscv: enable floating point unit

Message ID 20240916181633.366449-1-heinrich.schuchardt@canonical.com
State New
Headers show
Series [1/1] target/riscv: enable floating point unit | expand

Commit Message

Heinrich Schuchardt Sept. 16, 2024, 6:16 p.m. UTC
OpenSBI enables the floating point in mstatus. For consistency QEMU/KVM
should do the same.

Without this patch EDK II with TLS enabled crashes when hitting the first
floating point instruction while running QEMU with --accel kvm and runs
fine with --accel tcg.

Additionally to this patch EDK II should be changed to make no assumptions
about the state of the floating point unit.

Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
---
 target/riscv/cpu.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Andrew Jones Sept. 17, 2024, 12:13 p.m. UTC | #1
On Mon, Sep 16, 2024 at 08:16:33PM GMT, Heinrich Schuchardt wrote:
> OpenSBI enables the floating point in mstatus. For consistency QEMU/KVM
> should do the same.
> 
> Without this patch EDK II with TLS enabled crashes when hitting the first
> floating point instruction while running QEMU with --accel kvm and runs
> fine with --accel tcg.
> 
> Additionally to this patch EDK II should be changed to make no assumptions
> about the state of the floating point unit.
> 
> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
> ---
>  target/riscv/cpu.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 4bda754b01..c32e2721d4 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -923,6 +923,13 @@ static void riscv_cpu_reset_hold(Object *obj, ResetType type)
>      if (mcc->parent_phases.hold) {
>          mcc->parent_phases.hold(obj, type);
>      }
> +    if (riscv_has_ext(env, RVF) || riscv_has_ext(env, RVD)) {
> +        env->mstatus = set_field(env->mstatus, MSTATUS_FS, env->misa_mxl);
> +        for (int regnr = 0; regnr < 32; ++regnr) {
> +            env->fpr[regnr] = 0;
> +        }
> +        riscv_csrrw(env, CSR_FCSR, NULL, 0, -1);
> +    }

If this is only fixing KVM, then I think it belongs in
kvm_riscv_reset_vcpu(). But, I feel like we're working around an issue
with KVM synchronization with this, as well as with the "clear CSR values"
part of commit 8633951530cc ("target/riscv: Clear CSR values at reset and
sync MPSTATE with host"). KVM knows how to reset VCPUs. It does so on
VCPU creation and for any secondaries started with SBI HSM start. KVM's
reset would set sstatus.FS to 1 ("Initial") and zero out all the fp
registers and fcsr. So it seems like we're either synchronizing prior to
KVM resetting the boot VCPU, not synchronizing at all, or KVM isn't doing
the reset of the boot VCPU.

Thanks,
drew
Heinrich Schuchardt Sept. 17, 2024, 1:28 p.m. UTC | #2
On 17.09.24 14:13, Andrew Jones wrote:
> On Mon, Sep 16, 2024 at 08:16:33PM GMT, Heinrich Schuchardt wrote:
>> OpenSBI enables the floating point in mstatus. For consistency QEMU/KVM
>> should do the same.
>>
>> Without this patch EDK II with TLS enabled crashes when hitting the first
>> floating point instruction while running QEMU with --accel kvm and runs
>> fine with --accel tcg.
>>
>> Additionally to this patch EDK II should be changed to make no assumptions
>> about the state of the floating point unit.
>>
>> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
>> ---
>>   target/riscv/cpu.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
>> index 4bda754b01..c32e2721d4 100644
>> --- a/target/riscv/cpu.c
>> +++ b/target/riscv/cpu.c
>> @@ -923,6 +923,13 @@ static void riscv_cpu_reset_hold(Object *obj, ResetType type)
>>       if (mcc->parent_phases.hold) {
>>           mcc->parent_phases.hold(obj, type);
>>       }
>> +    if (riscv_has_ext(env, RVF) || riscv_has_ext(env, RVD)) {
>> +        env->mstatus = set_field(env->mstatus, MSTATUS_FS, env->misa_mxl);
>> +        for (int regnr = 0; regnr < 32; ++regnr) {
>> +            env->fpr[regnr] = 0;
>> +        }
>> +        riscv_csrrw(env, CSR_FCSR, NULL, 0, -1);
>> +    }
> 
> If this is only fixing KVM, then I think it belongs in
> kvm_riscv_reset_vcpu(). But, I feel like we're working around an issue
> with KVM synchronization with this, as well as with the "clear CSR values"
> part of commit 8633951530cc ("target/riscv: Clear CSR values at reset and
> sync MPSTATE with host"). KVM knows how to reset VCPUs. It does so on
> VCPU creation and for any secondaries started with SBI HSM start. KVM's
> reset would set sstatus.FS to 1 ("Initial") and zero out all the fp
> registers and fcsr. So it seems like we're either synchronizing prior to
> KVM resetting the boot VCPU, not synchronizing at all, or KVM isn't doing
> the reset of the boot VCPU.
> 
> Thanks,
> drew

Hello Drew,

Thanks for reviewing.

Concerning the question whether kvm_riscv_reset_vcpu() would be a better 
place for the change:

Is there any specification prescribing what the state of the FS bits 
should be when entering M-mode and when entering S-mode?

Patch 8633951530cc seems not to touch the status register in QEMU's 
kvm_riscv_reset_vcpu(). So it is not obvious that this patch could have 
caused the problem.

Looking at the call sequences in Linux gives some ideas where to debug:

kvm_arch_vcpu_create calls kvm_riscv_reset_vcpu which calls
kvm_riscv_vcpu_fp_reset.

riscv_vcpu_set_isa_ext_single and kvm_riscv_vcpu_set_reg_config
only call kvm_riscv_vcpu_fp_reset if !vcpu->arch.ran_atleast_once.

kvm_riscv_vcpu_fp_reset sets FS bits to "initial"
if CONFIG_FPU=y and extension F or D is available.

It seems that in KVM only the creation of a vcpu will set the FS bits 
but rebooting will not.

Best regards

Heinrich
Andrew Jones Sept. 17, 2024, 2:49 p.m. UTC | #3
On Tue, Sep 17, 2024 at 03:28:42PM GMT, Heinrich Schuchardt wrote:
> On 17.09.24 14:13, Andrew Jones wrote:
> > On Mon, Sep 16, 2024 at 08:16:33PM GMT, Heinrich Schuchardt wrote:
> > > OpenSBI enables the floating point in mstatus. For consistency QEMU/KVM
> > > should do the same.
> > > 
> > > Without this patch EDK II with TLS enabled crashes when hitting the first
> > > floating point instruction while running QEMU with --accel kvm and runs
> > > fine with --accel tcg.
> > > 
> > > Additionally to this patch EDK II should be changed to make no assumptions
> > > about the state of the floating point unit.
> > > 
> > > Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
> > > ---
> > >   target/riscv/cpu.c | 7 +++++++
> > >   1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > > index 4bda754b01..c32e2721d4 100644
> > > --- a/target/riscv/cpu.c
> > > +++ b/target/riscv/cpu.c
> > > @@ -923,6 +923,13 @@ static void riscv_cpu_reset_hold(Object *obj, ResetType type)
> > >       if (mcc->parent_phases.hold) {
> > >           mcc->parent_phases.hold(obj, type);
> > >       }
> > > +    if (riscv_has_ext(env, RVF) || riscv_has_ext(env, RVD)) {
> > > +        env->mstatus = set_field(env->mstatus, MSTATUS_FS, env->misa_mxl);
> > > +        for (int regnr = 0; regnr < 32; ++regnr) {
> > > +            env->fpr[regnr] = 0;
> > > +        }
> > > +        riscv_csrrw(env, CSR_FCSR, NULL, 0, -1);
> > > +    }
> > 
> > If this is only fixing KVM, then I think it belongs in
> > kvm_riscv_reset_vcpu(). But, I feel like we're working around an issue
> > with KVM synchronization with this, as well as with the "clear CSR values"
> > part of commit 8633951530cc ("target/riscv: Clear CSR values at reset and
> > sync MPSTATE with host"). KVM knows how to reset VCPUs. It does so on
> > VCPU creation and for any secondaries started with SBI HSM start. KVM's
> > reset would set sstatus.FS to 1 ("Initial") and zero out all the fp
> > registers and fcsr. So it seems like we're either synchronizing prior to
> > KVM resetting the boot VCPU, not synchronizing at all, or KVM isn't doing
> > the reset of the boot VCPU.
> > 
> > Thanks,
> > drew
> 
> Hello Drew,
> 
> Thanks for reviewing.
> 
> Concerning the question whether kvm_riscv_reset_vcpu() would be a better
> place for the change:
> 
> Is there any specification prescribing what the state of the FS bits should
> be when entering M-mode and when entering S-mode?

I didn't see anything in the spec, so I think 0 (or 1 when all fp
registers are also reset) is reasonable for an implementation to
choose.

> 
> Patch 8633951530cc seems not to touch the status register in QEMU's
> kvm_riscv_reset_vcpu(). So it is not obvious that this patch could have
> caused the problem.

I don't think 8633951530cc caused this problem. It was solving its own
problem in the same way, which is to add some more reset for the VCPU.
I think both it and this patch are working around a problem with KVM or
with a problem synchronizing with KVM. If that's the case, and we fix
KVM or the synchronization with KVM, then I would revert the reset parts
of 8633951530cc too.

> 
> Looking at the call sequences in Linux gives some ideas where to debug:
> 
> kvm_arch_vcpu_create calls kvm_riscv_reset_vcpu which calls
> kvm_riscv_vcpu_fp_reset.
> 
> riscv_vcpu_set_isa_ext_single and kvm_riscv_vcpu_set_reg_config
> only call kvm_riscv_vcpu_fp_reset if !vcpu->arch.ran_atleast_once.
> 
> kvm_riscv_vcpu_fp_reset sets FS bits to "initial"
> if CONFIG_FPU=y and extension F or D is available.
> 
> It seems that in KVM only the creation of a vcpu will set the FS bits but
> rebooting will not.

If KVM never resets the boot VCPU on reboot, then maybe it should or needs
QEMU to inform it to do so. I'd rather just one of the two (KVM or QEMU)
decide what needs to be reset and to which values, rather than both having
their own ideas. For example, with this patch, the boot hart will have its
sstatus.FS set to 3, but, iiuc, all secondaries will be brought up
with their sstatus.FS set to 1.

Thanks,
drew
Heinrich Schuchardt Sept. 17, 2024, 4:45 p.m. UTC | #4
On 17.09.24 16:49, Andrew Jones wrote:
> On Tue, Sep 17, 2024 at 03:28:42PM GMT, Heinrich Schuchardt wrote:
>> On 17.09.24 14:13, Andrew Jones wrote:
>>> On Mon, Sep 16, 2024 at 08:16:33PM GMT, Heinrich Schuchardt wrote:
>>>> OpenSBI enables the floating point in mstatus. For consistency QEMU/KVM
>>>> should do the same.
>>>>
>>>> Without this patch EDK II with TLS enabled crashes when hitting the first
>>>> floating point instruction while running QEMU with --accel kvm and runs
>>>> fine with --accel tcg.
>>>>
>>>> Additionally to this patch EDK II should be changed to make no assumptions
>>>> about the state of the floating point unit.
>>>>
>>>> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
>>>> ---
>>>>    target/riscv/cpu.c | 7 +++++++
>>>>    1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
>>>> index 4bda754b01..c32e2721d4 100644
>>>> --- a/target/riscv/cpu.c
>>>> +++ b/target/riscv/cpu.c
>>>> @@ -923,6 +923,13 @@ static void riscv_cpu_reset_hold(Object *obj, ResetType type)
>>>>        if (mcc->parent_phases.hold) {
>>>>            mcc->parent_phases.hold(obj, type);
>>>>        }
>>>> +    if (riscv_has_ext(env, RVF) || riscv_has_ext(env, RVD)) {
>>>> +        env->mstatus = set_field(env->mstatus, MSTATUS_FS, env->misa_mxl);
>>>> +        for (int regnr = 0; regnr < 32; ++regnr) {
>>>> +            env->fpr[regnr] = 0;
>>>> +        }
>>>> +        riscv_csrrw(env, CSR_FCSR, NULL, 0, -1);
>>>> +    }
>>>
>>> If this is only fixing KVM, then I think it belongs in
>>> kvm_riscv_reset_vcpu(). But, I feel like we're working around an issue
>>> with KVM synchronization with this, as well as with the "clear CSR values"
>>> part of commit 8633951530cc ("target/riscv: Clear CSR values at reset and
>>> sync MPSTATE with host"). KVM knows how to reset VCPUs. It does so on
>>> VCPU creation and for any secondaries started with SBI HSM start. KVM's
>>> reset would set sstatus.FS to 1 ("Initial") and zero out all the fp
>>> registers and fcsr. So it seems like we're either synchronizing prior to
>>> KVM resetting the boot VCPU, not synchronizing at all, or KVM isn't doing
>>> the reset of the boot VCPU.
>>>
>>> Thanks,
>>> drew
>>
>> Hello Drew,
>>
>> Thanks for reviewing.
>>
>> Concerning the question whether kvm_riscv_reset_vcpu() would be a better
>> place for the change:
>>
>> Is there any specification prescribing what the state of the FS bits should
>> be when entering M-mode and when entering S-mode?
> 
> I didn't see anything in the spec, so I think 0 (or 1 when all fp
> registers are also reset) is reasonable for an implementation to
> choose.
> 
>>
>> Patch 8633951530cc seems not to touch the status register in QEMU's
>> kvm_riscv_reset_vcpu(). So it is not obvious that this patch could have
>> caused the problem.
> 
> I don't think 8633951530cc caused this problem. It was solving its own
> problem in the same way, which is to add some more reset for the VCPU.
> I think both it and this patch are working around a problem with KVM or
> with a problem synchronizing with KVM. If that's the case, and we fix
> KVM or the synchronization with KVM, then I would revert the reset parts
> of 8633951530cc too.
> 
>>
>> Looking at the call sequences in Linux gives some ideas where to debug:
>>
>> kvm_arch_vcpu_create calls kvm_riscv_reset_vcpu which calls
>> kvm_riscv_vcpu_fp_reset.
>>
>> riscv_vcpu_set_isa_ext_single and kvm_riscv_vcpu_set_reg_config
>> only call kvm_riscv_vcpu_fp_reset if !vcpu->arch.ran_atleast_once.
>>
>> kvm_riscv_vcpu_fp_reset sets FS bits to "initial"
>> if CONFIG_FPU=y and extension F or D is available.
>>
>> It seems that in KVM only the creation of a vcpu will set the FS bits but
>> rebooting will not.
> 
> If KVM never resets the boot VCPU on reboot, then maybe it should or needs
> QEMU to inform it to do so. I'd rather just one of the two (KVM or QEMU)
> decide what needs to be reset and to which values, rather than both having
> their own ideas. For example, with this patch, the boot hart will have its
> sstatus.FS set to 3, but, iiuc, all secondaries will be brought up
> with their sstatus.FS set to 1.
> 
> Thanks,
> drew

Hello Drew,

I added some debug messages.

Without smp: Linux' kvm_riscv_vcpu_fp_reset() is called before QEMU's 
kvm_riscv_reset_vcpu() and is never called on reboot.

qemu-system-riscv64 -M virt -accel kvm -nographic -kernel 
payload_workaround.bin
[  920.805102] kvm_arch_vcpu_create: Entry
[  920.805608] kvm_riscv_vcpu_fp_reset: At entry FS=0
[  920.805961] kvm_riscv_vcpu_fp_reset: At exit FS=8192
[  920.806289] kvm_arch_vcpu_create: Exit
[  920.810554] kvm_arch_vcpu_create: Entry
[  920.810959] kvm_riscv_vcpu_fp_reset: At entry FS=0
[  920.811334] kvm_riscv_vcpu_fp_reset: At exit FS=8192
[  920.811696] kvm_arch_vcpu_create: Exit
[  920.816772] kvm_arch_vcpu_create: Entry
[  920.817095] kvm_riscv_vcpu_fp_reset: At entry FS=0
[  920.817411] kvm_riscv_vcpu_fp_reset: At exit FS=8192
[  920.817975] kvm_arch_vcpu_create: Exit
[  920.818395] kvm_riscv_vcpu_set_reg_config:
[  920.818696] kvm_riscv_vcpu_set_reg_config:
[  920.818975] kvm_riscv_vcpu_set_reg_config:
QEMU riscv_cpu_reset_hold: Entry
QEMU kvm_riscv_reset_vcpu: Entry
QEMU kvm_riscv_reset_vcpu: Exit
QEMU riscv_cpu_reset_hold: Exit
QEMU riscv_cpu_reset_hold: Entry
QEMU kvm_riscv_reset_vcpu: Entry
QEMU kvm_riscv_reset_vcpu: Exit
QEMU riscv_cpu_reset_hold: Exit
[  920.946333] kvm_arch_vcpu_ioctl_run: run->ext_reason 0
[  920.947031] kvm_arch_vcpu_ioctl_run: run->ext_reason 0
[  920.947700] kvm_riscv_check_vcpu_requests: Entry
[  920.948482] kvm_riscv_check_vcpu_requests: Entry

Test payload
============

[  920.950012] kvm_arch_vcpu_ioctl_run: run->ext_reason 35

[  920.950666] kvm_riscv_check_vcpu_requests: Entry
Rebooting

[  920.951478] kvm_arch_vcpu_ioctl_run: run->ext_reason 35
[  920.952051] kvm_riscv_check_vcpu_requests: Entry
QEMU riscv_cpu_reset_hold: Entry
QEMU kvm_riscv_reset_vcpu: Entry
QEMU kvm_riscv_reset_vcpu: Exit
QEMU riscv_cpu_reset_hold: Exit
[  920.962404] kvm_arch_vcpu_ioctl_run: run->ext_reason 24
[  920.962969] kvm_arch_vcpu_ioctl_run: run->ext_reason 24
[  920.963496] kvm_riscv_check_vcpu_requests: Entry

Test payload
============


With -smp 2 this seems to hold true per CPU. So essentially the effect 
of vm_riscv_vcpu_fp_reset() is always ignored both on the primary and 
the secondary harts.

$ qemu-system-riscv64 -M virt -accel kvm -smp 2 -nographic -kernel 
payload_workaround.bin
[  202.573659] kvm_arch_vcpu_create: Entry
[  202.574024] kvm_riscv_vcpu_fp_reset: At entry FS=0
[  202.574328] kvm_riscv_vcpu_fp_reset: At exit FS=8192
[  202.574626] kvm_arch_vcpu_create: Exit
[  202.580626] kvm_arch_vcpu_create: Entry
[  202.581070] kvm_riscv_vcpu_fp_reset: At entry FS=0
[  202.581599] kvm_riscv_vcpu_fp_reset: At exit FS=8192
[  202.582040] kvm_arch_vcpu_create: Exit
[  202.587356] kvm_arch_vcpu_create: Entry
[  202.587894] kvm_riscv_vcpu_fp_reset: At entry FS=0
[  202.588376] kvm_riscv_vcpu_fp_reset: At exit FS=8192
[  202.589188] kvm_arch_vcpu_create: Exit
[  202.589650] kvm_riscv_vcpu_set_reg_config:
[  202.590014] kvm_riscv_vcpu_set_reg_config:
[  202.590340] kvm_riscv_vcpu_set_reg_config:
[  202.595220] kvm_arch_vcpu_create: Entry
[  202.595604] kvm_riscv_vcpu_fp_reset: At entry FS=0
[  202.595939] kvm_riscv_vcpu_fp_reset: At exit FS=8192
[  202.596278] kvm_arch_vcpu_create: Exit
QEMU riscv_cpu_reset_hold: Entry
QEMU kvm_riscv_reset_vcpu: Entry
QEMU kvm_riscv_reset_vcpu: Exit
QEMU riscv_cpu_reset_hold: Exit
[  202.602093] kvm_arch_vcpu_create: Entry
[  202.602426] kvm_riscv_vcpu_fp_reset: At entry FS=0
[  202.602777] kvm_riscv_vcpu_fp_reset: At exit FS=8192
[  202.603110] kvm_arch_vcpu_create: Exit
[  202.607898] kvm_arch_vcpu_create: Entry
[  202.608306] kvm_riscv_vcpu_fp_reset: At entry FS=0
[  202.608989] kvm_riscv_vcpu_fp_reset: At exit FS=8192
[  202.609416] kvm_arch_vcpu_create: Exit
[  202.609939] kvm_riscv_vcpu_set_reg_config:
[  202.610312] kvm_riscv_vcpu_set_reg_config:
[  202.610666] kvm_riscv_vcpu_set_reg_config:
QEMU riscv_cpu_reset_hold: Entry
QEMU kvm_riscv_reset_vcpu: Entry
QEMU kvm_riscv_reset_vcpu: Exit
QEMU riscv_cpu_reset_hold: Exit
QEMU riscv_cpu_reset_hold: Entry
QEMU kvm_riscv_reset_vcpu: Entry
QEMU kvm_riscv_reset_vcpu: Exit
QEMU riscv_cpu_reset_hold: Exit
QEMU riscv_cpu_reset_hold: Entry
QEMU kvm_riscv_reset_vcpu: Entry
QEMU kvm_riscv_reset_vcpu: Exit
QEMU riscv_cpu_reset_hold: Exit
[  202.749911] kvm_arch_vcpu_ioctl_run: run->ext_reason 0
[  202.750370] kvm_arch_vcpu_ioctl_run: run->ext_reason 0
[  202.750799] kvm_arch_vcpu_ioctl_run: run->ext_reason 0
[  202.750819] kvm_arch_vcpu_ioctl_run: run->ext_reason 0
[  202.751574] kvm_riscv_check_vcpu_requests: Entry
[  202.751617] kvm_riscv_check_vcpu_requests: Entry
[  202.752737] kvm_riscv_check_vcpu_requests: Entry

Test payload
============

[  202.753678] kvm_arch_vcpu_ioctl_run: run->ext_reason 35
fcvt.d.w fa5,a5
[  202.754145] kvm_riscv_check_vcpu_requests: Entry
Rebooting

[  202.754655] kvm_arch_vcpu_ioctl_run: run->ext_reason 35
[  202.755030] kvm_riscv_check_vcpu_requests: Entry
QEMU riscv_cpu_reset_hold: Entry
QEMU kvm_riscv_reset_vcpu: Entry
QEMU kvm_riscv_reset_vcpu: Exit
QEMU riscv_cpu_reset_hold: Exit
QEMU riscv_cpu_reset_hold: Entry
QEMU kvm_riscv_reset_vcpu: Entry
QEMU kvm_riscv_reset_vcpu: Exit
QEMU riscv_cpu_reset_hold: Exit
[  202.770352] kvm_arch_vcpu_ioctl_run: run->ext_reason 24
[  202.770915] kvm_arch_vcpu_ioctl_run: run->ext_reason 10
[  202.770951] kvm_arch_vcpu_ioctl_run: run->ext_reason 24
[  202.771802] kvm_arch_vcpu_ioctl_run: run->ext_reason 10
[  202.772272] kvm_riscv_check_vcpu_requests: Entry
[  202.772888] kvm_riscv_check_vcpu_requests: Entry

Test payload
============


When thinking about the migration of virtual machines shouldn't QEMU be 
in control of the initial state of vcpus instead of KVM?

CCing the RISC-V KVM maintainers.

Best regards

Heinrich
Andrew Jones Sept. 18, 2024, 6:05 a.m. UTC | #5
On Tue, Sep 17, 2024 at 06:45:21PM GMT, Heinrich Schuchardt wrote:
...
> When thinking about the migration of virtual machines shouldn't QEMU be in
> control of the initial state of vcpus instead of KVM?
>

Thinking about this more, I'm inclined to agree. Initial state and reset
state should be traits of the VMM (potentially influenced by the user)
rather than KVM.

Thanks,
drew
Peter Maydell Sept. 18, 2024, 11:10 a.m. UTC | #6
On Wed, 18 Sept 2024 at 07:06, Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Tue, Sep 17, 2024 at 06:45:21PM GMT, Heinrich Schuchardt wrote:
> ...
> > When thinking about the migration of virtual machines shouldn't QEMU be in
> > control of the initial state of vcpus instead of KVM?
> >
>
> Thinking about this more, I'm inclined to agree. Initial state and reset
> state should be traits of the VMM (potentially influenced by the user)
> rather than KVM.

Mmm. IIRC the way this works on Arm at least is that at some point
post-reset and before running the VM we do a QEMU->kernel state
sync, which means that whatever the kernel does with the CPU state
doesn't matter, only what QEMU's idea of reset is. Looking at the
source I think the way this happens is that kvm_cpu_synchronize_post_reset()
arranges to do a kvm_arch_put_registers(). (For Arm we have to do
some fiddling around to make sure our CPU state is in the right
place for that put_registers to DTRT, which is what kvm_arm_reset_vcpu()
is doing, but that's a consequence of the way we chose to handle
migration and in particular migration of system registers rather than
something necessarily every architecture wants to be doing.)

This also works for reset of the vCPU on a guest-reboot. We don't
tell KVM to reset the vCPU, we just set up the vCPU state on the
QEMU side and then do a QEMU->kernel state sync of it.

-- PMM
Heinrich Schuchardt Sept. 18, 2024, 1:06 p.m. UTC | #7
On 18.09.24 13:10, Peter Maydell wrote:
> On Wed, 18 Sept 2024 at 07:06, Andrew Jones <ajones@ventanamicro.com> wrote:
>>
>> On Tue, Sep 17, 2024 at 06:45:21PM GMT, Heinrich Schuchardt wrote:
>> ...
>>> When thinking about the migration of virtual machines shouldn't QEMU be in
>>> control of the initial state of vcpus instead of KVM?
>>>
>>
>> Thinking about this more, I'm inclined to agree. Initial state and reset
>> state should be traits of the VMM (potentially influenced by the user)
>> rather than KVM.
> 
> Mmm. IIRC the way this works on Arm at least is that at some point
> post-reset and before running the VM we do a QEMU->kernel state
> sync, which means that whatever the kernel does with the CPU state
> doesn't matter, only what QEMU's idea of reset is. Looking at the
> source I think the way this happens is that kvm_cpu_synchronize_post_reset()
> arranges to do a kvm_arch_put_registers(). (For Arm we have to do
> some fiddling around to make sure our CPU state is in the right
> place for that put_registers to DTRT, which is what kvm_arm_reset_vcpu()
> is doing, but that's a consequence of the way we chose to handle
> migration and in particular migration of system registers rather than
> something necessarily every architecture wants to be doing.)
> 
> This also works for reset of the vCPU on a guest-reboot. We don't
> tell KVM to reset the vCPU, we just set up the vCPU state on the
> QEMU side and then do a QEMU->kernel state sync of it.
> 
> -- PMM

Thanks Peter for looking into this.

QEMU's cpu_synchronize_all_post_init() and 
do_kvm_cpu_synchronize_post_reset() both end up in 
kvm_arch_put_registers() and that is long after Linux 
kvm_arch_vcpu_create() has been setting some FPU state. See the output 
below.

kvm_arch_put_registers() copies the CSRs by calling 
kvm_riscv_put_regs_csr(). Here we can find:

     KVM_RISCV_SET_CSR(cs, env, sstatus, env->mstatus);

This call enables or disables the FPU according to the value of 
env->mstatus.

So we need to set the desired state of the floating point unit in QEMU. 
And this is what the current patch does both for TCG and KVM.

Best regards

Heinrich


$ qemu-system-riscv64 -M virt -accel kvm -nographic -kernel payload.bin
QEMU qemu_init: Entry
QEMU qmp_x_exit_preconfig: Entry
[ 3503.369249] kvm_arch_vcpu_create: Entry
[ 3503.369669] kvm_riscv_vcpu_fp_reset: At entry FS=0
[ 3503.369966] kvm_riscv_vcpu_fp_reset: At exit FS=8192
[ 3503.370256] kvm_arch_vcpu_create: Exit
[ 3503.378620] kvm_arch_vcpu_create: Entry
[ 3503.379123] kvm_riscv_vcpu_fp_reset: At entry FS=0
[ 3503.379610] kvm_riscv_vcpu_fp_reset: At exit FS=8192
[ 3503.380111] kvm_arch_vcpu_create: Exit
[ 3503.394837] kvm_arch_vcpu_create: Entry
[ 3503.395238] kvm_riscv_vcpu_fp_reset: At entry FS=0
[ 3503.395585] kvm_riscv_vcpu_fp_reset: At exit FS=8192
[ 3503.395947] kvm_arch_vcpu_create: Exit
[ 3503.397023] kvm_riscv_vcpu_set_reg_config:
[ 3503.398066] kvm_riscv_vcpu_set_reg_config:
[ 3503.398430] kvm_riscv_vcpu_set_reg_config:
QEMU riscv_cpu_reset_hold: Entry
QEMU kvm_riscv_reset_vcpu: Entry
QEMU kvm_riscv_reset_vcpu: Exit
QEMU riscv_cpu_reset_hold: Exit
QEMU qemu_machine_creation_done: Entry
QEMU qdev_machine_creation_done: Entry
QEMU cpu_synchronize_all_post_init: Entry
QEMU cpu_synchronize_post_init: Entry
QEMU kvm_cpu_synchronize_post_init: Entry
QEMU do_kvm_cpu_synchronize_post_init: Entry
QEMU kvm_arch_put_registers: Entry
QEMU kvm_riscv_put_regs_csr: Entry
QEMU kvm_riscv_put_regs_csr: Exit
QEMU kvm_arch_put_registers: Exit
QEMU do_kvm_cpu_synchronize_post_init: Exit
QEMU kvm_cpu_synchronize_post_init: Exit
QEMU cpu_synchronize_post_init: Exit
QEMU cpu_synchronize_all_post_init: Exit
QEMU qemu_system_reset: Entry
QEMU kvm_arch_get_registers: Entry
QEMU riscv_cpu_reset_hold: Entry
QEMU kvm_riscv_reset_vcpu: Entry
QEMU kvm_riscv_reset_vcpu: Exit
QEMU riscv_cpu_reset_hold: Exit
QEMU cpu_synchronize_all_post_reset: Entry
QEMU cpu_synchronize_post_reset: Entry
QEMU do_kvm_cpu_synchronize_post_reset: Entry
QEMU kvm_arch_put_registers: Entry
QEMU kvm_riscv_put_regs_csr: Entry
QEMU kvm_riscv_put_regs_csr: Exit
QEMU kvm_riscv_sync_mpstate_to_kvm: Entry
QEMU kvm_riscv_sync_mpstate_to_kvm: Exit
QEMU kvm_arch_put_registers: Exit
QEMU do_kvm_cpu_synchronize_post_reset: Exit
QEMU cpu_synchronize_post_reset: Exit
QEMU cpu_synchronize_all_post_reset: Exit
QEMU qemu_system_reset: Exit
QEMU qdev_machine_creation_done: Exit
QEMU qmp_x_exit_preconfig: Exit
QEMU qemu_init: Exit
QEMU kvm_cpu_exec: Entry
[ 3503.566493] kvm_arch_vcpu_ioctl_run: run->ext_reason 0
QEMU kvm_cpu_exec: Exit
QEMU kvm_cpu_exec: Entry
[ 3503.568338] kvm_arch_vcpu_ioctl_run: run->ext_reason 0
[ 3503.568740] kvm_riscv_check_vcpu_requests: Entry
[ 3503.569534] kvm_riscv_check_vcpu_requests: Entry

Test payload
============
Peter Maydell Sept. 18, 2024, 1:12 p.m. UTC | #8
On Wed, 18 Sept 2024 at 14:06, Heinrich Schuchardt
<heinrich.schuchardt@canonical.com> wrote:
> Thanks Peter for looking into this.
>
> QEMU's cpu_synchronize_all_post_init() and
> do_kvm_cpu_synchronize_post_reset() both end up in
> kvm_arch_put_registers() and that is long after Linux
> kvm_arch_vcpu_create() has been setting some FPU state. See the output
> below.
>
> kvm_arch_put_registers() copies the CSRs by calling
> kvm_riscv_put_regs_csr(). Here we can find:
>
>      KVM_RISCV_SET_CSR(cs, env, sstatus, env->mstatus);
>
> This call enables or disables the FPU according to the value of
> env->mstatus.
>
> So we need to set the desired state of the floating point unit in QEMU.
> And this is what the current patch does both for TCG and KVM.

If it does this for both TCG and KVM then I don't understand
this bit from the commit message:

# Without this patch EDK II with TLS enabled crashes when hitting the first
# floating point instruction while running QEMU with --accel kvm and runs
# fine with --accel tcg.

Shouldn't this guest crash the same way with both KVM and TCG without
this patch, because the FPU state is the same for both?

-- PMM
Heinrich Schuchardt Sept. 18, 2024, 1:49 p.m. UTC | #9
On 18.09.24 15:12, Peter Maydell wrote:
> On Wed, 18 Sept 2024 at 14:06, Heinrich Schuchardt
> <heinrich.schuchardt@canonical.com> wrote:
>> Thanks Peter for looking into this.
>>
>> QEMU's cpu_synchronize_all_post_init() and
>> do_kvm_cpu_synchronize_post_reset() both end up in
>> kvm_arch_put_registers() and that is long after Linux
>> kvm_arch_vcpu_create() has been setting some FPU state. See the output
>> below.
>>
>> kvm_arch_put_registers() copies the CSRs by calling
>> kvm_riscv_put_regs_csr(). Here we can find:
>>
>>       KVM_RISCV_SET_CSR(cs, env, sstatus, env->mstatus);
>>
>> This call enables or disables the FPU according to the value of
>> env->mstatus.
>>
>> So we need to set the desired state of the floating point unit in QEMU.
>> And this is what the current patch does both for TCG and KVM.
> 
> If it does this for both TCG and KVM then I don't understand
> this bit from the commit message:
> 
> # Without this patch EDK II with TLS enabled crashes when hitting the first
> # floating point instruction while running QEMU with --accel kvm and runs
> # fine with --accel tcg.
> 
> Shouldn't this guest crash the same way with both KVM and TCG without
> this patch, because the FPU state is the same for both?
> 
> -- PMM

By default `qemu-system-riscv64 --accel tcg` runs OpenSBI as firmware 
which enables the FPU.

If you would choose a different SBI implementation which does not enable 
the FPU you could experience the same crash.

Best regards

Heinrich
Andrew Jones Sept. 18, 2024, 1:56 p.m. UTC | #10
On Wed, Sep 18, 2024 at 03:49:39PM GMT, Heinrich Schuchardt wrote:
> On 18.09.24 15:12, Peter Maydell wrote:
> > On Wed, 18 Sept 2024 at 14:06, Heinrich Schuchardt
> > <heinrich.schuchardt@canonical.com> wrote:
> > > Thanks Peter for looking into this.
> > > 
> > > QEMU's cpu_synchronize_all_post_init() and
> > > do_kvm_cpu_synchronize_post_reset() both end up in
> > > kvm_arch_put_registers() and that is long after Linux
> > > kvm_arch_vcpu_create() has been setting some FPU state. See the output
> > > below.
> > > 
> > > kvm_arch_put_registers() copies the CSRs by calling
> > > kvm_riscv_put_regs_csr(). Here we can find:
> > > 
> > >       KVM_RISCV_SET_CSR(cs, env, sstatus, env->mstatus);
> > > 
> > > This call enables or disables the FPU according to the value of
> > > env->mstatus.
> > > 
> > > So we need to set the desired state of the floating point unit in QEMU.
> > > And this is what the current patch does both for TCG and KVM.
> > 
> > If it does this for both TCG and KVM then I don't understand
> > this bit from the commit message:
> > 
> > # Without this patch EDK II with TLS enabled crashes when hitting the first
> > # floating point instruction while running QEMU with --accel kvm and runs
> > # fine with --accel tcg.
> > 
> > Shouldn't this guest crash the same way with both KVM and TCG without
> > this patch, because the FPU state is the same for both?
> > 
> > -- PMM
> 
> By default `qemu-system-riscv64 --accel tcg` runs OpenSBI as firmware which
> enables the FPU.
> 
> If you would choose a different SBI implementation which does not enable the
> FPU you could experience the same crash.
> 

Thanks Heinrich, I had also forgotten that distinction. So the last
question is whether or not we want to reset mstatus.FS to 1 instead of 3,
as is done in this patch.

Thanks,
drew
Richard Henderson Sept. 18, 2024, 2:38 p.m. UTC | #11
On 9/16/24 20:16, Heinrich Schuchardt wrote:
> +        env->mstatus = set_field(env->mstatus, MSTATUS_FS, env->misa_mxl);

Using mxl is definitely wrong here.


r~
Peter Maydell Sept. 18, 2024, 3:32 p.m. UTC | #12
On Wed, 18 Sept 2024 at 14:49, Heinrich Schuchardt
<heinrich.schuchardt@canonical.com> wrote:
>
> On 18.09.24 15:12, Peter Maydell wrote:
> > On Wed, 18 Sept 2024 at 14:06, Heinrich Schuchardt
> > <heinrich.schuchardt@canonical.com> wrote:
> >> Thanks Peter for looking into this.
> >>
> >> QEMU's cpu_synchronize_all_post_init() and
> >> do_kvm_cpu_synchronize_post_reset() both end up in
> >> kvm_arch_put_registers() and that is long after Linux
> >> kvm_arch_vcpu_create() has been setting some FPU state. See the output
> >> below.
> >>
> >> kvm_arch_put_registers() copies the CSRs by calling
> >> kvm_riscv_put_regs_csr(). Here we can find:
> >>
> >>       KVM_RISCV_SET_CSR(cs, env, sstatus, env->mstatus);
> >>
> >> This call enables or disables the FPU according to the value of
> >> env->mstatus.
> >>
> >> So we need to set the desired state of the floating point unit in QEMU.
> >> And this is what the current patch does both for TCG and KVM.
> >
> > If it does this for both TCG and KVM then I don't understand
> > this bit from the commit message:
> >
> > # Without this patch EDK II with TLS enabled crashes when hitting the first
> > # floating point instruction while running QEMU with --accel kvm and runs
> > # fine with --accel tcg.
> >
> > Shouldn't this guest crash the same way with both KVM and TCG without
> > this patch, because the FPU state is the same for both?

> By default `qemu-system-riscv64 --accel tcg` runs OpenSBI as firmware
> which enables the FPU.
>
> If you would choose a different SBI implementation which does not enable
> the FPU you could experience the same crash.

Ah, so KVM vs TCG is a red herring and it's actually "some guest
firmware doesn't enable the FPU itself, and if you run that then it will
fall over, whether you do it in KVM or TCG" ? That makes more sense.

I don't have an opinion on whether you want to do that or not,
not knowing what the riscv architecture mandates. (On Arm this
would be fairly clearly "the guest software is broken and
should be fixed", but that's because the Arm architecture
says you can't assume the FPU is enabled from reset.)

I do think the commit message could use clarification to
explain this.

thanks
-- PMM
Heinrich Schuchardt Sept. 18, 2024, 3:49 p.m. UTC | #13
On 18.09.24 15:56, Andrew Jones wrote:
> Thanks Heinrich, I had also forgotten that distinction. So the last
> question is whether or not we want to reset mstatus.FS to 1 instead of 3,
> as is done in this patch.
> 
> Thanks,
> drew


If the FD flag set to 3 (Dirty) indicates that "the corresponding state 
has potentially been modified since the last context save". Upon context 
switches the value of the floating point registers has to be saved.

The value 1 (Initial) implies that saving the registers on a context 
change is not needed.

3 (Dirty) is the safest choice. This is why OpenSBI is also using it.

For reference:

3.1.6.6. Extension Context Status in mstatus Register
The RISC-V Instruction Set Manual: Volume II - Privileged Architecture
Version 2024-04-11

Best regards

Heinrich
Heinrich Schuchardt Sept. 18, 2024, 4:27 p.m. UTC | #14
On 18.09.24 17:32, Peter Maydell wrote:
> On Wed, 18 Sept 2024 at 14:49, Heinrich Schuchardt
> <heinrich.schuchardt@canonical.com> wrote:
>>
>> On 18.09.24 15:12, Peter Maydell wrote:
>>> On Wed, 18 Sept 2024 at 14:06, Heinrich Schuchardt
>>> <heinrich.schuchardt@canonical.com> wrote:
>>>> Thanks Peter for looking into this.
>>>>
>>>> QEMU's cpu_synchronize_all_post_init() and
>>>> do_kvm_cpu_synchronize_post_reset() both end up in
>>>> kvm_arch_put_registers() and that is long after Linux
>>>> kvm_arch_vcpu_create() has been setting some FPU state. See the output
>>>> below.
>>>>
>>>> kvm_arch_put_registers() copies the CSRs by calling
>>>> kvm_riscv_put_regs_csr(). Here we can find:
>>>>
>>>>        KVM_RISCV_SET_CSR(cs, env, sstatus, env->mstatus);
>>>>
>>>> This call enables or disables the FPU according to the value of
>>>> env->mstatus.
>>>>
>>>> So we need to set the desired state of the floating point unit in QEMU.
>>>> And this is what the current patch does both for TCG and KVM.
>>>
>>> If it does this for both TCG and KVM then I don't understand
>>> this bit from the commit message:
>>>
>>> # Without this patch EDK II with TLS enabled crashes when hitting the first
>>> # floating point instruction while running QEMU with --accel kvm and runs
>>> # fine with --accel tcg.
>>>
>>> Shouldn't this guest crash the same way with both KVM and TCG without
>>> this patch, because the FPU state is the same for both?
> 
>> By default `qemu-system-riscv64 --accel tcg` runs OpenSBI as firmware
>> which enables the FPU.
>>
>> If you would choose a different SBI implementation which does not enable
>> the FPU you could experience the same crash.
> 
> Ah, so KVM vs TCG is a red herring and it's actually "some guest
> firmware doesn't enable the FPU itself, and if you run that then it will
> fall over, whether you do it in KVM or TCG" ? That makes more sense.
> 
> I don't have an opinion on whether you want to do that or not,
> not knowing what the riscv architecture mandates. (On Arm this
> would be fairly clearly "the guest software is broken and
> should be fixed", but that's because the Arm architecture
> says you can't assume the FPU is enabled from reset.)
> 
> I do think the commit message could use clarification to
> explain this.
> 
> thanks
> -- PMM

I have not found a specification defining what the status of the FPU 
should be when M-Mode is stared and when moving from M-Mode to S-Mode.

OpenSBI (which is the dominating M-Mode firmware and invoked by default 
in TCG mode) enables the FPU before jumping to S-Mode. KVM should to the 
same for consistency.

Best regards

Heinrich
Alistair Francis Oct. 8, 2024, 12:45 a.m. UTC | #15
On Thu, Sep 19, 2024 at 1:34 AM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Wed, 18 Sept 2024 at 14:49, Heinrich Schuchardt
> <heinrich.schuchardt@canonical.com> wrote:
> >
> > On 18.09.24 15:12, Peter Maydell wrote:
> > > On Wed, 18 Sept 2024 at 14:06, Heinrich Schuchardt
> > > <heinrich.schuchardt@canonical.com> wrote:
> > >> Thanks Peter for looking into this.
> > >>
> > >> QEMU's cpu_synchronize_all_post_init() and
> > >> do_kvm_cpu_synchronize_post_reset() both end up in
> > >> kvm_arch_put_registers() and that is long after Linux
> > >> kvm_arch_vcpu_create() has been setting some FPU state. See the output
> > >> below.
> > >>
> > >> kvm_arch_put_registers() copies the CSRs by calling
> > >> kvm_riscv_put_regs_csr(). Here we can find:
> > >>
> > >>       KVM_RISCV_SET_CSR(cs, env, sstatus, env->mstatus);
> > >>
> > >> This call enables or disables the FPU according to the value of
> > >> env->mstatus.
> > >>
> > >> So we need to set the desired state of the floating point unit in QEMU.
> > >> And this is what the current patch does both for TCG and KVM.
> > >
> > > If it does this for both TCG and KVM then I don't understand
> > > this bit from the commit message:
> > >
> > > # Without this patch EDK II with TLS enabled crashes when hitting the first
> > > # floating point instruction while running QEMU with --accel kvm and runs
> > > # fine with --accel tcg.
> > >
> > > Shouldn't this guest crash the same way with both KVM and TCG without
> > > this patch, because the FPU state is the same for both?
>
> > By default `qemu-system-riscv64 --accel tcg` runs OpenSBI as firmware
> > which enables the FPU.
> >
> > If you would choose a different SBI implementation which does not enable
> > the FPU you could experience the same crash.
>
> Ah, so KVM vs TCG is a red herring and it's actually "some guest
> firmware doesn't enable the FPU itself, and if you run that then it will
> fall over, whether you do it in KVM or TCG" ? That makes more sense.
>
> I don't have an opinion on whether you want to do that or not,
> not knowing what the riscv architecture mandates. (On Arm this
> would be fairly clearly "the guest software is broken and
> should be fixed", but that's because the Arm architecture
> says you can't assume the FPU is enabled from reset.)

RISC-V is the same. Section "3.4 Reset" states that:

"All other hart state is UNSPECIFIED." (the paragraph doesn't mention
the FS state).

So it's unspecified what the value is on reset. Guest software
shouldn't assume anything about it and it does seem like a guest
software bug.

In saying that, we are allowed to set it then as the spec doesn't say
it should be 0. So setting it to 0x01 (Initial) doesn't seem like a
bad idea, as the name kind of implies that it should be 0x01 on reset

Alistair

>
> I do think the commit message could use clarification to
> explain this.
>
> thanks
> -- PMM
>
diff mbox series

Patch

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 4bda754b01..c32e2721d4 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -923,6 +923,13 @@  static void riscv_cpu_reset_hold(Object *obj, ResetType type)
     if (mcc->parent_phases.hold) {
         mcc->parent_phases.hold(obj, type);
     }
+    if (riscv_has_ext(env, RVF) || riscv_has_ext(env, RVD)) {
+        env->mstatus = set_field(env->mstatus, MSTATUS_FS, env->misa_mxl);
+        for (int regnr = 0; regnr < 32; ++regnr) {
+            env->fpr[regnr] = 0;
+        }
+        riscv_csrrw(env, CSR_FCSR, NULL, 0, -1);
+    }
 #ifndef CONFIG_USER_ONLY
     env->misa_mxl = mcc->misa_mxl_max;
     env->priv = PRV_M;