Message ID | 20240910175809.2135596-8-david@redhat.com |
---|---|
State | New |
Series | s390x: virtio-mem support |
On 10/09/2024 19.58, David Hildenbrand wrote:
> A guest OS that supports memory hotplug / memory devices must during
> boot be aware of the maximum possible physical memory address that it might
> have to handle at a later stage during its runtime.
>
> For example, the maximum possible memory address might be required to
> prepare the kernel virtual address space accordingly (e.g., select page
> table hierarchy depth).
>
> On s390x there is currently no such mechanism that is compatible with
> paravirtualized memory devices, because the whole SCLP interface was
> designed around the idea of "storage increments" and "standby memory".
> Paravirtualized memory devices we want to support, such as virtio-mem, have
> no intersection with any of that, but could co-exist with them in the
> future if ever needed.
>
> In particular, a guest OS must never detect and use device memory
> without the help of a proper device driver. Device memory must not be
> exposed in any firmware-provided memory map (SCLP or diag260 on s390x).
> For this reason, these memory devices will be placed in memory *above*
> the "maximum storage increment" exposed via SCLP.
>
> Let's provide a new diag500 subcode to query the memory limit determined in
> s390_memory_init().
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/s390x/s390-hypercall.c | 3 +++
>  hw/s390x/s390-hypercall.h | 1 +
>  2 files changed, 4 insertions(+)
>
> diff --git a/hw/s390x/s390-hypercall.c b/hw/s390x/s390-hypercall.c
> index f09e8a1d81..ac48fc0961 100644
> --- a/hw/s390x/s390-hypercall.c
> +++ b/hw/s390x/s390-hypercall.c
> @@ -68,6 +68,9 @@ int handle_diag_500(CPUS390XState *env)
>      case DIAG500_VIRTIO_CCW_NOTIFY:
>          env->regs[2] = handle_virtio_ccw_notify(env->regs[2], env->regs[3]);
>          return 0;
> +    case DIAG500_STORAGE_LIMIT:
> +        env->regs[2] = s390_get_memory_limit() - 1;
> +        return 0;
>      default:
>          return -EINVAL;
>      }
> diff --git a/hw/s390x/s390-hypercall.h b/hw/s390x/s390-hypercall.h
> index b7ac29f444..f0ca62bcbb 100644
> --- a/hw/s390x/s390-hypercall.h
> +++ b/hw/s390x/s390-hypercall.h
> @@ -18,6 +18,7 @@
>  #define DIAG500_VIRTIO_RESET 1 /* legacy */
>  #define DIAG500_VIRTIO_SET_STATUS 2 /* legacy */
>  #define DIAG500_VIRTIO_CCW_NOTIFY 3 /* KVM_S390_VIRTIO_CCW_NOTIFY */
> +#define DIAG500_STORAGE_LIMIT 4
>
>  int handle_diag_500(CPUS390XState *env);

Reviewed-by: Thomas Huth <thuth@redhat.com>

Sounds very reasonable to me - but it would be good to get an
Ack/Reviewed-by from IBM folks here (in case they prefer a different
interface)... hope they'll join the discussion!

 Thomas

On 9/12/24 10:19 AM, Thomas Huth wrote:
> On 10/09/2024 19.58, David Hildenbrand wrote:
[...]
>> diff --git a/hw/s390x/s390-hypercall.h b/hw/s390x/s390-hypercall.h
>> index b7ac29f444..f0ca62bcbb 100644
>> --- a/hw/s390x/s390-hypercall.h
>> +++ b/hw/s390x/s390-hypercall.h
>> @@ -18,6 +18,7 @@
>>  #define DIAG500_VIRTIO_RESET 1 /* legacy */
>>  #define DIAG500_VIRTIO_SET_STATUS 2 /* legacy */
>>  #define DIAG500_VIRTIO_CCW_NOTIFY 3 /* KVM_S390_VIRTIO_CCW_NOTIFY */
>> +#define DIAG500_STORAGE_LIMIT 4
>>
>>  int handle_diag_500(CPUS390XState *env);
>
> Reviewed-by: Thomas Huth <thuth@redhat.com>
>
> Sounds very reasonable to me - but it would be good to get an
> Ack/Reviewed-by from IBM folks here (in case they prefer a different
> interface)... hope they'll join the discussion!

I publicized the series on the internal channels yesterday.
We're aware that we need to provide review.

On Thu, 12 Sep 2024 10:19:00 +0200
Thomas Huth <thuth@redhat.com> wrote:

> > diff --git a/hw/s390x/s390-hypercall.h b/hw/s390x/s390-hypercall.h
> > index b7ac29f444..f0ca62bcbb 100644
> > --- a/hw/s390x/s390-hypercall.h
> > +++ b/hw/s390x/s390-hypercall.h
> > @@ -18,6 +18,7 @@
> >  #define DIAG500_VIRTIO_RESET 1 /* legacy */
> >  #define DIAG500_VIRTIO_SET_STATUS 2 /* legacy */
> >  #define DIAG500_VIRTIO_CCW_NOTIFY 3 /* KVM_S390_VIRTIO_CCW_NOTIFY */
> > +#define DIAG500_STORAGE_LIMIT 4
> >
> >  int handle_diag_500(CPUS390XState *env);
>
> Reviewed-by: Thomas Huth <thuth@redhat.com>
>
> Sounds very reasonable to me - but it would be good to get an
> Ack/Reviewed-by from IBM folks here (in case they prefer a different
> interface)... hope they'll join the discussion!
>
>  Thomas

According to Documentation/virt/kvm/s390/s390-diag.rst in the
Linux source tree, DIAG 500 is for KVM virtio functions.

Based on the commit message, this storage-limit DIAG is rather loosely
tied to virtio, if at all, so from that perspective DIAG may not be a
perfect fit. OTOH I don't see a better fit either. I would prefer to
have Christian's opinion on this. I have no strong opinion myself.

If we decide to go with a DIAG, I would like to see
Documentation/virt/kvm/s390/s390-diag.rst
updated accordingly.

Also, if we decide to go with DIAG 500, maybe leaving some space
between the codes more closely tied to virtio and
the ones less closely tied to virtio (for the unlikely case
that we end up wanting another DIAG 500 subcode for virtio)
might make sense. I.e., we could make DIAG500_STORAGE_LIMIT
8 or 16 instead of 4. Again, nothing important, just an idea.

Regards,
Halil

On 27.09.24 20:05, Halil Pasic wrote:
> On Thu, 12 Sep 2024 10:19:00 +0200
> Thomas Huth <thuth@redhat.com> wrote:
>
>>> diff --git a/hw/s390x/s390-hypercall.h b/hw/s390x/s390-hypercall.h
>>> index b7ac29f444..f0ca62bcbb 100644
>>> --- a/hw/s390x/s390-hypercall.h
>>> +++ b/hw/s390x/s390-hypercall.h
>>> @@ -18,6 +18,7 @@
>>>  #define DIAG500_VIRTIO_RESET 1 /* legacy */
>>>  #define DIAG500_VIRTIO_SET_STATUS 2 /* legacy */
>>>  #define DIAG500_VIRTIO_CCW_NOTIFY 3 /* KVM_S390_VIRTIO_CCW_NOTIFY */
>>> +#define DIAG500_STORAGE_LIMIT 4
>>>
>>>  int handle_diag_500(CPUS390XState *env);
>>
>> Reviewed-by: Thomas Huth <thuth@redhat.com>
>>
>> Sounds very reasonable to me - but it would be good to get an
>> Ack/Reviewed-by from IBM folks here (in case they prefer a different
>> interface)... hope they'll join the discussion!
>>
>>  Thomas
>
> According to Documentation/virt/kvm/s390/s390-diag.rst in the
> Linux source tree, DIAG 500 is for KVM virtio functions.

I assume you skimmed the QEMU patches, including the one that suggests
changing (or rather extending) that.

>
> Based on the commit message, this storage-limit DIAG is rather loosely
> tied to virtio, if at all, so from that perspective DIAG may not be a
> perfect fit. OTOH I don't see a better fit either. I would prefer to
> have Christian's opinion on this. I have no strong opinion myself.
>
> If we decide to go with a DIAG, I would like to see
> Documentation/virt/kvm/s390/s390-diag.rst
> updated accordingly.

I'll document wherever people fancy. I'll even document in multiple
locations :)

>
> Also, if we decide to go with DIAG 500, maybe leaving some space
> between the codes more closely tied to virtio and
> the ones less closely tied to virtio (for the unlikely case
> that we end up wanting another DIAG 500 subcode for virtio)
> might make sense. I.e., we could make DIAG500_STORAGE_LIMIT
> 8 or 16 instead of 4. Again, nothing important, just an idea.

I don't see a reason to do that, but in the end I don't care as long as
people let me know what to do instead.

Thanks for taking a look!

On 27.09.24 at 20:05, Halil Pasic wrote:
> On Thu, 12 Sep 2024 10:19:00 +0200
> Thomas Huth <thuth@redhat.com> wrote:
>
>>> diff --git a/hw/s390x/s390-hypercall.h b/hw/s390x/s390-hypercall.h
>>> index b7ac29f444..f0ca62bcbb 100644
>>> --- a/hw/s390x/s390-hypercall.h
>>> +++ b/hw/s390x/s390-hypercall.h
>>> @@ -18,6 +18,7 @@
>>>  #define DIAG500_VIRTIO_RESET 1 /* legacy */
>>>  #define DIAG500_VIRTIO_SET_STATUS 2 /* legacy */
>>>  #define DIAG500_VIRTIO_CCW_NOTIFY 3 /* KVM_S390_VIRTIO_CCW_NOTIFY */
>>> +#define DIAG500_STORAGE_LIMIT 4
>>>
>>>  int handle_diag_500(CPUS390XState *env);
>>
>> Reviewed-by: Thomas Huth <thuth@redhat.com>
>>
>> Sounds very reasonable to me - but it would be good to get an
>> Ack/Reviewed-by from IBM folks here (in case they prefer a different
>> interface)... hope they'll join the discussion!
>>
>>  Thomas
>
> According to Documentation/virt/kvm/s390/s390-diag.rst in the
> Linux source tree, DIAG 500 is for KVM virtio functions.
>
> Based on the commit message, this storage-limit DIAG is rather loosely
> tied to virtio, if at all, so from that perspective DIAG may not be a
> perfect fit. OTOH I don't see a better fit either. I would prefer to
> have Christian's opinion on this. I have no strong opinion myself.

Some remarks:
500 with a new subcode would work, it is marked as KVM hypervisor call
in our docs. 501 was used in the past for software breakpoints.
So we could use 502 as the next free one (This is reserved for KVM).
We do have kvm_stat counters for 500, not sure if people debugging virtio
will care.
The only important question for me is, what code is generated by gcc for
the switch statement and will any variant slow down the virtio doorbell.
diag.c has

        if (!vcpu->kvm->arch.css_support ||
            (vcpu->run->s.regs.gprs[1] != KVM_S390_VIRTIO_CCW_NOTIFY))
                return -EOPNOTSUPP;

So 500+4 should probably not cause any harm apart from branch prediction
going wrong the first 2 or 3 notifies.

502 will make kvm_s390_handle_diag larger.

So I tend to go with 500+4.

On Mon, 30 Sep 2024 13:11:31 +0200
Christian Borntraeger <borntraeger@linux.ibm.com> wrote:

> We do have kvm_stat counters for 500, not sure if people debugging virtio
> will care.

Could end up being confusing, as currently we can assume each and every
DIAG 500 is a virtio doorbell. But I don't think the chance of this
causing real headache is big.

> The only important question for me is, what code is generated by gcc for
> the switch statement and will any variant slow down the virtio doorbell.
> diag.c has
>         if (!vcpu->kvm->arch.css_support ||
>             (vcpu->run->s.regs.gprs[1] != KVM_S390_VIRTIO_CCW_NOTIFY))
>                 return -EOPNOTSUPP;
>
> So 500+4 should probably not cause any harm apart from branch prediction
> going wrong the first 2 or 3 notifies.
>
> 502 will make kvm_s390_handle_diag larger.

What do you mean by this last paragraph?

I suppose we are talking about

int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
{
        int code = kvm_s390_get_base_disp_rs(vcpu, NULL) & 0xffff;

        if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
                return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);

        trace_kvm_s390_handle_diag(vcpu, code);
        switch (code) {
        case 0x10:
                return diag_release_pages(vcpu);
        case 0x44:
                return __diag_time_slice_end(vcpu);
        case 0x9c:
                return __diag_time_slice_end_directed(vcpu);
        case 0x258:
                return __diag_page_ref_service(vcpu);
        case 0x308:
                return __diag_ipl_functions(vcpu);
        case 0x500:
                return __diag_virtio_hypercall(vcpu);
        default:
                vcpu->stat.instruction_diagnose_other++;
                return -EOPNOTSUPP;
        }
}

and my understanding is that the default branch of the switch
statement would be already suitable for DIAG 502 as it is today
for DIAG 502. So I'm quite confused by your statement that
502 will make kvm_s390_handle_diag larger (as the only meaning
of larger I can think of is more code). Can you please clarify?

Regards,
Halil

On 30.09.24 13:11, Christian Borntraeger wrote:
> On 27.09.24 at 20:05, Halil Pasic wrote:
>> On Thu, 12 Sep 2024 10:19:00 +0200
>> Thomas Huth <thuth@redhat.com> wrote:
>>
>>>> diff --git a/hw/s390x/s390-hypercall.h b/hw/s390x/s390-hypercall.h
>>>> index b7ac29f444..f0ca62bcbb 100644
>>>> --- a/hw/s390x/s390-hypercall.h
>>>> +++ b/hw/s390x/s390-hypercall.h
>>>> @@ -18,6 +18,7 @@
>>>>  #define DIAG500_VIRTIO_RESET 1 /* legacy */
>>>>  #define DIAG500_VIRTIO_SET_STATUS 2 /* legacy */
>>>>  #define DIAG500_VIRTIO_CCW_NOTIFY 3 /* KVM_S390_VIRTIO_CCW_NOTIFY */
>>>> +#define DIAG500_STORAGE_LIMIT 4
>>>>
>>>>  int handle_diag_500(CPUS390XState *env);
>>>
>>> Reviewed-by: Thomas Huth <thuth@redhat.com>
>>>
>>> Sounds very reasonable to me - but it would be good to get an
>>> Ack/Reviewed-by from IBM folks here (in case they prefer a different
>>> interface)... hope they'll join the discussion!
>>>
>>>  Thomas
>>
>> According to Documentation/virt/kvm/s390/s390-diag.rst in the
>> Linux source tree, DIAG 500 is for KVM virtio functions.
>>
>> Based on the commit message, this storage-limit DIAG is rather loosely
>> tied to virtio, if at all, so from that perspective DIAG may not be a
>> perfect fit. OTOH I don't see a better fit either. I would prefer to
>> have Christian's opinion on this. I have no strong opinion myself.
>
> Some remarks:
> 500 with a new subcode would work, it is marked as KVM hypervisor call
> in our docs. 501 was used in the past for software breakpoints.

Right, we use it in the absence of KVM_CAP_S390_USER_INSTR0.

> So we could use 502 as the next free one (This is reserved for KVM).
> We do have kvm_stat counters for 500, not sure if people debugging virtio
> will care.

It would be one additional trigger during system boot, so likely not
really an issue. We could always add new stats for selected subcodes
(i.e., KVM_S390_VIRTIO_CCW_NOTIFY).

> The only important question for me is, what code is generated by gcc for
> the switch statement and will any variant slow down the virtio doorbell.
> diag.c has
>         if (!vcpu->kvm->arch.css_support ||
>             (vcpu->run->s.regs.gprs[1] != KVM_S390_VIRTIO_CCW_NOTIFY))
>                 return -EOPNOTSUPP;
>
> So 500+4 should probably not cause any harm apart from branch prediction
> going wrong the first 2 or 3 notifies.

Right, it's very unlikely to be noticeable at all.

Thanks for the feedback!

On 30.09.24 at 14:57, Halil Pasic wrote:
> On Mon, 30 Sep 2024 13:11:31 +0200
> Christian Borntraeger <borntraeger@linux.ibm.com> wrote:
>
>> We do have kvm_stat counters for 500, not sure if people debugging virtio
>> will care.
>
> Could end up being confusing, as currently we can assume each and every
> DIAG 500 is a virtio doorbell. But I don't think the chance of this
> causing real headache is big.
>
>> The only important question for me is, what code is generated by gcc for
>> the switch statement and will any variant slow down the virtio doorbell.
>> diag.c has
>>         if (!vcpu->kvm->arch.css_support ||
>>             (vcpu->run->s.regs.gprs[1] != KVM_S390_VIRTIO_CCW_NOTIFY))
>>                 return -EOPNOTSUPP;
>>
>> So 500+4 should probably not cause any harm apart from branch prediction
>> going wrong the first 2 or 3 notifies.
>>
>> 502 will make kvm_s390_handle_diag larger.
>
> What do you mean by this last paragraph?
>
> I suppose we are talking about
> int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
> {
>         int code = kvm_s390_get_base_disp_rs(vcpu, NULL) & 0xffff;
>
>         if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
>                 return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>
>         trace_kvm_s390_handle_diag(vcpu, code);
>         switch (code) {
>         case 0x10:
>                 return diag_release_pages(vcpu);
>         case 0x44:
>                 return __diag_time_slice_end(vcpu);
>         case 0x9c:
>                 return __diag_time_slice_end_directed(vcpu);
>         case 0x258:
>                 return __diag_page_ref_service(vcpu);
>         case 0x308:
>                 return __diag_ipl_functions(vcpu);
>         case 0x500:
>                 return __diag_virtio_hypercall(vcpu);
>         default:
>                 vcpu->stat.instruction_diagnose_other++;
>                 return -EOPNOTSUPP;
>         }
> }
>
> and my understanding is that the default branch of the switch
> statement would be already suitable for DIAG 502 as it is today
> for DIAG 502. So I'm quite confused by your statement that
> 502 will make kvm_s390_handle_diag larger (as the only meaning
> of larger I can think of is more code). Can you please clarify?

gcc has logic for switch statements that decides between a branch table and
a chained compare+jump. I think due to Spectre gcc now avoids indirect
branches as much as possible, but still a larger switch statement might
kick the decision from inline compare/jump to a branch table.

I am not worried in this particular case; this was more or less a
"what could go wrong".

On Tue, 1 Oct 2024 11:15:02 +0200
Christian Borntraeger <borntraeger@linux.ibm.com> wrote:

[..]
> >> So 500+4 should probably not cause any harm apart from branch prediction
> >> going wrong the first 2 or 3 notifies.
> >>
> >> 502 will make kvm_s390_handle_diag larger.
> >
> > What do you mean by this last paragraph?
[..]
> gcc has logic for switch statements that decides between a branch table and
> a chained compare+jump. I think due to Spectre gcc now avoids indirect
> branches as much as possible, but still a larger switch statement might
> kick the decision from inline compare/jump to a branch table.
>
> I am not worried in this particular case; this was more or less a
> "what could go wrong".

Hm, you did state that "502 will make kvm_s390_handle_diag larger". I
suppose now we agree that 502 would not make kvm_s390_handle_diag larger.
Right?

I understood that you prefer 500+4 over 502 because the latter would
make kvm_s390_handle_diag larger. Now that we have, I hope, clarified
that 502 would not make the switch larger, do you still prefer 500+4?

BTW your insights are much appreciated!

Regards,
Halil

On 01.10.24 at 15:31, Halil Pasic wrote:
> On Tue, 1 Oct 2024 11:15:02 +0200
> Christian Borntraeger <borntraeger@linux.ibm.com> wrote:
> [..]
>>>> So 500+4 should probably not cause any harm apart from branch prediction
>>>> going wrong the first 2 or 3 notifies.
>>>>
>>>> 502 will make kvm_s390_handle_diag larger.
>>>
>>> What do you mean by this last paragraph?
> [..]
>
>> gcc has logic for switch statements that decides between a branch table and
>> a chained compare+jump. I think due to Spectre gcc now avoids indirect
>> branches as much as possible, but still a larger switch statement might
>> kick the decision from inline compare/jump to a branch table.
>>
>> I am not worried in this particular case; this was more or less a
>> "what could go wrong".
>
> Hm, you did state that "502 will make kvm_s390_handle_diag larger". I
> suppose now we agree that 502 would not make kvm_s390_handle_diag larger.
> Right?
>
> I understood that you prefer 500+4 over 502 because the latter would
> make kvm_s390_handle_diag larger. Now that we have, I hope, clarified
> that 502 would not make the switch larger, do you still prefer 500+4?
>
> BTW your insights are much appreciated!

OK, you mean that diag502 is not handled in the kernel but instead via
default. Yes, you are right. So it should not matter, I guess.

diff --git a/hw/s390x/s390-hypercall.c b/hw/s390x/s390-hypercall.c
index f09e8a1d81..ac48fc0961 100644
--- a/hw/s390x/s390-hypercall.c
+++ b/hw/s390x/s390-hypercall.c
@@ -68,6 +68,9 @@ int handle_diag_500(CPUS390XState *env)
     case DIAG500_VIRTIO_CCW_NOTIFY:
         env->regs[2] = handle_virtio_ccw_notify(env->regs[2], env->regs[3]);
         return 0;
+    case DIAG500_STORAGE_LIMIT:
+        env->regs[2] = s390_get_memory_limit() - 1;
+        return 0;
     default:
         return -EINVAL;
     }
diff --git a/hw/s390x/s390-hypercall.h b/hw/s390x/s390-hypercall.h
index b7ac29f444..f0ca62bcbb 100644
--- a/hw/s390x/s390-hypercall.h
+++ b/hw/s390x/s390-hypercall.h
@@ -18,6 +18,7 @@
 #define DIAG500_VIRTIO_RESET 1 /* legacy */
 #define DIAG500_VIRTIO_SET_STATUS 2 /* legacy */
 #define DIAG500_VIRTIO_CCW_NOTIFY 3 /* KVM_S390_VIRTIO_CCW_NOTIFY */
+#define DIAG500_STORAGE_LIMIT 4
 
 int handle_diag_500(CPUS390XState *env);

A guest OS that supports memory hotplug / memory devices must during
boot be aware of the maximum possible physical memory address that it might
have to handle at a later stage during its runtime.

For example, the maximum possible memory address might be required to
prepare the kernel virtual address space accordingly (e.g., select page
table hierarchy depth).

On s390x there is currently no such mechanism that is compatible with
paravirtualized memory devices, because the whole SCLP interface was
designed around the idea of "storage increments" and "standby memory".
Paravirtualized memory devices we want to support, such as virtio-mem, have
no intersection with any of that, but could co-exist with them in the
future if ever needed.

In particular, a guest OS must never detect and use device memory
without the help of a proper device driver. Device memory must not be
exposed in any firmware-provided memory map (SCLP or diag260 on s390x).
For this reason, these memory devices will be placed in memory *above*
the "maximum storage increment" exposed via SCLP.

Let's provide a new diag500 subcode to query the memory limit determined in
s390_memory_init().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/s390x/s390-hypercall.c | 3 +++
 hw/s390x/s390-hypercall.h | 1 +
 2 files changed, 4 insertions(+)
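
The commit message names "select page table hierarchy depth" as one use of the maximum address. As a rough illustration, and not something taken from this series, the function below maps a maximum address to the number of s390x translation-table levels that can cover it with 4 KiB pages: a segment table reaches 2^31 bytes, a region-third table 2^42, a region-second table 2^53, and a region-first table the full 64-bit space. `levels_for_max_addr()` is a hypothetical helper; an actual guest kernel may apply its own policy (for example a minimum depth).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of translation-table levels (page table included) needed to
 * map every address in 0..max_addr:
 *   2 levels (segment table)       cover 2^31 bytes
 *   3 levels (region-third table)  cover 2^42 bytes
 *   4 levels (region-second table) cover 2^53 bytes
 *   5 levels (region-first table)  cover 2^64 bytes
 */
static int levels_for_max_addr(uint64_t max_addr)
{
    /* Address bits covered by 2, 3 and 4 levels respectively. */
    static const unsigned covered_bits[] = { 31, 42, 53 };
    int levels = 2;

    for (size_t i = 0; i < sizeof(covered_bits) / sizeof(covered_bits[0]); i++) {
        if (max_addr < (UINT64_C(1) << covered_bits[i])) {
            break;
        }
        levels++;
    }
    /* Sanity check: the result is always one of the four valid depths. */
    assert(levels >= 2 && levels <= 5);
    return levels;
}
```

The argument is the highest address, which is the form the new subcode returns, so the value from register 2 could be passed in directly; a limit of exactly 4 TiB (maximum address 2^42 - 1) still fits in three levels.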