Message ID | 20231024094351.50464-1-quic_acaggian@quicinc.com |
---|---|
State | New |
Headers | show |
Series | [RFC] mem: Fix mem region size when is UINT64_MAX | expand |
On Tue, 24 Oct 2023 at 10:45, Antonio Caggiano <quic_acaggian@quicinc.com> wrote: > > This looks like a bug. When the size is `UINT64_MAX`, it is reset to > (Int128)`1 << 64` which actually is `UINT64_MAX + 1`. > > Then, an assert is triggered when the size is converted back to uin64_t > by using the int128_get64() function, as the new value happens to be > different than the previous one. > > Signed-off-by: Antonio Caggiano <quic_acaggian@quicinc.com> > --- > system/memory.c | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/system/memory.c b/system/memory.c > index a800fbc9e5..d41fc6af88 100644 > --- a/system/memory.c > +++ b/system/memory.c > @@ -1193,9 +1193,6 @@ static void memory_region_do_init(MemoryRegion *mr, > uint64_t size) > { > mr->size = int128_make64(size); > - if (size == UINT64_MAX) { > - mr->size = int128_2_64(); > - } No, this is intentional. In these memory region creation APIs that take a uint64_t size parameter, size == UINT64_MAX is a special case that means "actually the full 64 bit address space" (and there is no way to ask for an MR to have a size that is truly UINT64_MAX bytes). When we create the MR, the size is stored in the MemoryRegion struct as its true size, because we have an Int128 field there. What assertion (with backtrace) is being hit? The issue is probably at that point, not here. thanks -- PMM
Hi Peter, Thanks for the quick response. On 24/10/2023 12:28, Peter Maydell wrote: > On Tue, 24 Oct 2023 at 10:45, Antonio Caggiano > <quic_acaggian@quicinc.com> wrote: >> >> This looks like a bug. When the size is `UINT64_MAX`, it is reset to >> (Int128)`1 << 64` which actually is `UINT64_MAX + 1`. >> >> Then, an assert is triggered when the size is converted back to uin64_t >> by using the int128_get64() function, as the new value happens to be >> different than the previous one. >> >> Signed-off-by: Antonio Caggiano <quic_acaggian@quicinc.com> >> --- >> system/memory.c | 3 --- >> 1 file changed, 3 deletions(-) >> >> diff --git a/system/memory.c b/system/memory.c >> index a800fbc9e5..d41fc6af88 100644 >> --- a/system/memory.c >> +++ b/system/memory.c >> @@ -1193,9 +1193,6 @@ static void memory_region_do_init(MemoryRegion *mr, >> uint64_t size) >> { >> mr->size = int128_make64(size); >> - if (size == UINT64_MAX) { >> - mr->size = int128_2_64(); >> - } > > No, this is intentional. In these memory region creation APIs > that take a uint64_t size parameter, size == UINT64_MAX is a > special case that means "actually the full 64 bit address space" > (and there is no way to ask for an MR to have a size that is > truly UINT64_MAX bytes). When we create the MR, the size is > stored in the MemoryRegion struct as its true size, because > we have an Int128 field there. > > What assertion (with backtrace) is being hit? The issue is > probably at that point, not here. Here you can. I'm basically creating a system_memory of size UINT64_MAX, and the assert is being hit when the memory is registered to KVM. #5 0x0000fffff6fc4040 in __GI___assert_fail (assertion=0xffffe111d9c8 "r == a", file=0xffffe111d960 "qemu/include/qemu/int128.h", line=34, function=0xffffe111f348 <__PRETTY_FUNCTION__.46> "int128_get64") at ./assert/assert.c:101 #6 0x0000ffffe0c8cf6c in int128_get64 (a=18446744073709551616) at qemu/include/qemu/int128.h:34 #7 0x0000ffffe0c92cec in kvm_region_commit (listener=0xffffd83e92e0) at qemu/accel/kvm/kvm-all.c:1503 #8 0x0000ffffe0bd495c in memory_region_transaction_commit () at qemu/softmmu/memory.c:1109 #9 0x0000ffffe0bd8a90 in memory_region_update_container_subregions (subregion=0xaaaaabb6abf0) at qemu/softmmu/memory.c:2606 #10 0x0000ffffe0bd8b3c in memory_region_add_subregion_common (mr=0xaaaaabb6ae10, offset=0, subregion=0xaaaaabb6abf0) at qemu/softmmu/memory.c:2621 #11 0x0000ffffe0bd8b74 in memory_region_add_subregion (mr=0xaaaaabb6ae10, offset=0, subregion=0xaaaaabb6abf0) at qemu/softmmu/memory.c:2629 #12 0x0000ffffe05d5508 in gpex_host_realize (dev=0xaaaaabb69910, errp=0xffffdd4ce1f0) at qemu/hw/pci-host/gpex.c:132 > > thanks > -- PMM Cheers, Antonio
On Tue, 24 Oct 2023 at 11:49, Antonio Caggiano <quic_acaggian@quicinc.com> wrote: > > Hi Peter, > > Thanks for the quick response. > > On 24/10/2023 12:28, Peter Maydell wrote: > > On Tue, 24 Oct 2023 at 10:45, Antonio Caggiano > > <quic_acaggian@quicinc.com> wrote: > >> > >> This looks like a bug. When the size is `UINT64_MAX`, it is reset to > >> (Int128)`1 << 64` which actually is `UINT64_MAX + 1`. > >> > >> Then, an assert is triggered when the size is converted back to uin64_t > >> by using the int128_get64() function, as the new value happens to be > >> different than the previous one. > >> > >> Signed-off-by: Antonio Caggiano <quic_acaggian@quicinc.com> > >> --- > >> system/memory.c | 3 --- > >> 1 file changed, 3 deletions(-) > >> > >> diff --git a/system/memory.c b/system/memory.c > >> index a800fbc9e5..d41fc6af88 100644 > >> --- a/system/memory.c > >> +++ b/system/memory.c > >> @@ -1193,9 +1193,6 @@ static void memory_region_do_init(MemoryRegion *mr, > >> uint64_t size) > >> { > >> mr->size = int128_make64(size); > >> - if (size == UINT64_MAX) { > >> - mr->size = int128_2_64(); > >> - } > > > > No, this is intentional. In these memory region creation APIs > > that take a uint64_t size parameter, size == UINT64_MAX is a > > special case that means "actually the full 64 bit address space" > > (and there is no way to ask for an MR to have a size that is > > truly UINT64_MAX bytes). When we create the MR, the size is > > stored in the MemoryRegion struct as its true size, because > > we have an Int128 field there. > > > > What assertion (with backtrace) is being hit? The issue is > > probably at that point, not here. > > Here you can. I'm basically creating a system_memory of size UINT64_MAX, > and the assert is being hit when the memory is registered to KVM. (I've cc'd Paolo to check my understanding of how KVM works.) > #5 0x0000fffff6fc4040 in __GI___assert_fail (assertion=0xffffe111d9c8 > "r == a", file=0xffffe111d960 "qemu/include/qemu/int128.h", line=34, > function=0xffffe111f348 <__PRETTY_FUNCTION__.46> "int128_get64") at > ./assert/assert.c:101 > #6 0x0000ffffe0c8cf6c in int128_get64 (a=18446744073709551616) at > qemu/include/qemu/int128.h:34 > #7 0x0000ffffe0c92cec in kvm_region_commit (listener=0xffffd83e92e0) at > qemu/accel/kvm/kvm-all.c:1503 > #8 0x0000ffffe0bd495c in memory_region_transaction_commit () at > qemu/softmmu/memory.c:1109 > #9 0x0000ffffe0bd8a90 in memory_region_update_container_subregions > (subregion=0xaaaaabb6abf0) at qemu/softmmu/memory.c:2606 > #10 0x0000ffffe0bd8b3c in memory_region_add_subregion_common > (mr=0xaaaaabb6ae10, offset=0, subregion=0xaaaaabb6abf0) at > qemu/softmmu/memory.c:2621 > #11 0x0000ffffe0bd8b74 in memory_region_add_subregion > (mr=0xaaaaabb6ae10, offset=0, subregion=0xaaaaabb6abf0) at > qemu/softmmu/memory.c:2629 > #12 0x0000ffffe05d5508 in gpex_host_realize (dev=0xaaaaabb69910, > errp=0xffffdd4ce1f0) at qemu/hw/pci-host/gpex.c:132 Thanks. It looks like KVM basically doesn't expect to be working with a memory region that large -- the KVM kernel APIs for registering memslots etc take a uint64_t size, so there's no way to say "all of the 64 bit address space". Probably the best available improvement would be if we added an assert at the point when the memory region initially gets set up with KVM to say "this is too big". Given that we don't run into this upstream, this leaves me suspecting that the underlying problem is that a memory region this big shouldn't be being registered to KVM in the first place. Certainly the gpex PCI controller works fine on the arm64 virt board under KVM, so maybe your board code is doing something odd when it wires it up? thanks -- PMM
On 24/10/2023 14:00, Peter Maydell wrote: > On Tue, 24 Oct 2023 at 11:49, Antonio Caggiano > <quic_acaggian@quicinc.com> wrote: >> >> Hi Peter, >> >> Thanks for the quick response. >> >> On 24/10/2023 12:28, Peter Maydell wrote: >>> On Tue, 24 Oct 2023 at 10:45, Antonio Caggiano >>> <quic_acaggian@quicinc.com> wrote: >>>> >>>> This looks like a bug. When the size is `UINT64_MAX`, it is reset to >>>> (Int128)`1 << 64` which actually is `UINT64_MAX + 1`. >>>> >>>> Then, an assert is triggered when the size is converted back to uin64_t >>>> by using the int128_get64() function, as the new value happens to be >>>> different than the previous one. >>>> >>>> Signed-off-by: Antonio Caggiano <quic_acaggian@quicinc.com> >>>> --- >>>> system/memory.c | 3 --- >>>> 1 file changed, 3 deletions(-) >>>> >>>> diff --git a/system/memory.c b/system/memory.c >>>> index a800fbc9e5..d41fc6af88 100644 >>>> --- a/system/memory.c >>>> +++ b/system/memory.c >>>> @@ -1193,9 +1193,6 @@ static void memory_region_do_init(MemoryRegion *mr, >>>> uint64_t size) >>>> { >>>> mr->size = int128_make64(size); >>>> - if (size == UINT64_MAX) { >>>> - mr->size = int128_2_64(); >>>> - } >>> >>> No, this is intentional. In these memory region creation APIs >>> that take a uint64_t size parameter, size == UINT64_MAX is a >>> special case that means "actually the full 64 bit address space" >>> (and there is no way to ask for an MR to have a size that is >>> truly UINT64_MAX bytes). When we create the MR, the size is >>> stored in the MemoryRegion struct as its true size, because >>> we have an Int128 field there. >>> >>> What assertion (with backtrace) is being hit? The issue is >>> probably at that point, not here. >> >> Here you can. I'm basically creating a system_memory of size UINT64_MAX, >> and the assert is being hit when the memory is registered to KVM. > > (I've cc'd Paolo to check my understanding of how KVM works.) > >> #5 0x0000fffff6fc4040 in __GI___assert_fail (assertion=0xffffe111d9c8 >> "r == a", file=0xffffe111d960 "qemu/include/qemu/int128.h", line=34, >> function=0xffffe111f348 <__PRETTY_FUNCTION__.46> "int128_get64") at >> ./assert/assert.c:101 >> #6 0x0000ffffe0c8cf6c in int128_get64 (a=18446744073709551616) at >> qemu/include/qemu/int128.h:34 >> #7 0x0000ffffe0c92cec in kvm_region_commit (listener=0xffffd83e92e0) at >> qemu/accel/kvm/kvm-all.c:1503 >> #8 0x0000ffffe0bd495c in memory_region_transaction_commit () at >> qemu/softmmu/memory.c:1109 >> #9 0x0000ffffe0bd8a90 in memory_region_update_container_subregions >> (subregion=0xaaaaabb6abf0) at qemu/softmmu/memory.c:2606 >> #10 0x0000ffffe0bd8b3c in memory_region_add_subregion_common >> (mr=0xaaaaabb6ae10, offset=0, subregion=0xaaaaabb6abf0) at >> qemu/softmmu/memory.c:2621 >> #11 0x0000ffffe0bd8b74 in memory_region_add_subregion >> (mr=0xaaaaabb6ae10, offset=0, subregion=0xaaaaabb6abf0) at >> qemu/softmmu/memory.c:2629 >> #12 0x0000ffffe05d5508 in gpex_host_realize (dev=0xaaaaabb69910, >> errp=0xffffdd4ce1f0) at qemu/hw/pci-host/gpex.c:132 > > Thanks. It looks like KVM basically doesn't expect to be working > with a memory region that large -- the KVM kernel APIs for > registering memslots etc take a uint64_t size, so there's no way > to say "all of the 64 bit address space". Probably the best > available improvement would be if we added an assert at the point > when the memory region initially gets set up with KVM to say > "this is too big". > > Given that we don't run into this upstream, this leaves me > suspecting that the underlying problem is that a memory > region this big shouldn't be being registered to KVM in the > first place. Certainly the gpex PCI controller works fine > on the arm64 virt board under KVM, so maybe your board code > is doing something odd when it wires it up? > I think so, we use a MMIO system_memory, effectively using memory_region_init_io() instead of memory_region_init(). This is for registering our callbacks with a MemoryRegionOps. So, for a MMIO memory type, UINT64_MAX means "real" size rather than "all of the 64 bit address space"? > thanks > -- PMM Cheers, Antonio
On Tue, 24 Oct 2023 at 13:31, Antonio Caggiano <quic_acaggian@quicinc.com> wrote: > > > > On 24/10/2023 14:00, Peter Maydell wrote: > > On Tue, 24 Oct 2023 at 11:49, Antonio Caggiano > > <quic_acaggian@quicinc.com> wrote: > > Given that we don't run into this upstream, this leaves me > > suspecting that the underlying problem is that a memory > > region this big shouldn't be being registered to KVM in the > > first place. Certainly the gpex PCI controller works fine > > on the arm64 virt board under KVM, so maybe your board code > > is doing something odd when it wires it up? > I think so, we use a MMIO system_memory, effectively using > memory_region_init_io() instead of memory_region_init(). This is for > registering our callbacks with a MemoryRegionOps. > > So, for a MMIO memory type, UINT64_MAX means "real" size rather than > "all of the 64 bit address space"? For a memory region, in the creation APIs that take a uint64_t size argument: * size 0 is (I think) not valid * 1..UINT64_MAX-1 mean "size is that many bytes" * UINT64_MAX means "size is 2^64 bytes" * there is no way to ask for (2^64)-1 bytes I am confused about why you say your system_memory is an MMIO region with callbacks, though. The system_memory MR is expected to be a pure "container" region, which has no callbacks and simply is a place where we add other subregions (which might be RAM, or IO, or whatever). thanks -- PMM
+cc Mark which has a better understanding of our use case. On 24/10/2023 15:21, Peter Maydell wrote: > On Tue, 24 Oct 2023 at 13:31, Antonio Caggiano > <quic_acaggian@quicinc.com> wrote: >> >> >> >> On 24/10/2023 14:00, Peter Maydell wrote: >>> On Tue, 24 Oct 2023 at 11:49, Antonio Caggiano >>> <quic_acaggian@quicinc.com> wrote: >>> Given that we don't run into this upstream, this leaves me >>> suspecting that the underlying problem is that a memory >>> region this big shouldn't be being registered to KVM in the >>> first place. Certainly the gpex PCI controller works fine >>> on the arm64 virt board under KVM, so maybe your board code >>> is doing something odd when it wires it up? > >> I think so, we use a MMIO system_memory, effectively using >> memory_region_init_io() instead of memory_region_init(). This is for >> registering our callbacks with a MemoryRegionOps. >> >> So, for a MMIO memory type, UINT64_MAX means "real" size rather than >> "all of the 64 bit address space"? > > For a memory region, in the creation APIs that take a uint64_t > size argument: > * size 0 is (I think) not valid > * 1..UINT64_MAX-1 mean "size is that many bytes" > * UINT64_MAX means "size is 2^64 bytes" > * there is no way to ask for (2^64)-1 bytes > > I am confused about why you say your system_memory is an > MMIO region with callbacks, though. The system_memory MR > is expected to be a pure "container" region, which has > no callbacks and simply is a place where we add other > subregions (which might be RAM, or IO, or whatever). > We use an MMIO system memory for our ops, which capture memory accesses and handle some of them, even though no memory region is actually backing the addresses. Does that means we are violating QEMU's expectations by modifing that from "container" to "MMIO"? This issue is just one consequence of such violation and there might be more potentially? > thanks > -- PMM Cheers, Antonio
On Tue, 24 Oct 2023 at 14:38, Antonio Caggiano <quic_acaggian@quicinc.com> wrote: > > +cc Mark which has a better understanding of our use case. > > On 24/10/2023 15:21, Peter Maydell wrote: > > On Tue, 24 Oct 2023 at 13:31, Antonio Caggiano > > <quic_acaggian@quicinc.com> wrote: > >> > >> > >> > >> On 24/10/2023 14:00, Peter Maydell wrote: > >>> On Tue, 24 Oct 2023 at 11:49, Antonio Caggiano > >>> <quic_acaggian@quicinc.com> wrote: > >>> Given that we don't run into this upstream, this leaves me > >>> suspecting that the underlying problem is that a memory > >>> region this big shouldn't be being registered to KVM in the > >>> first place. Certainly the gpex PCI controller works fine > >>> on the arm64 virt board under KVM, so maybe your board code > >>> is doing something odd when it wires it up? > > > >> I think so, we use a MMIO system_memory, effectively using > >> memory_region_init_io() instead of memory_region_init(). This is for > >> registering our callbacks with a MemoryRegionOps. > >> > >> So, for a MMIO memory type, UINT64_MAX means "real" size rather than > >> "all of the 64 bit address space"? > > > > For a memory region, in the creation APIs that take a uint64_t > > size argument: > > * size 0 is (I think) not valid > > * 1..UINT64_MAX-1 mean "size is that many bytes" > > * UINT64_MAX means "size is 2^64 bytes" > > * there is no way to ask for (2^64)-1 bytes > > > > I am confused about why you say your system_memory is an > > MMIO region with callbacks, though. The system_memory MR > > is expected to be a pure "container" region, which has > > no callbacks and simply is a place where we add other > > subregions (which might be RAM, or IO, or whatever). > > > > We use an MMIO system memory for our ops, which capture memory accesses > and handle some of them, even though no memory region is actually > backing the addresses. > > Does that means we are violating QEMU's expectations by modifing that > from "container" to "MMIO"? This issue is just one consequence of such > violation and there might be more potentially? I think that's one of those things which in theory might be supposed to work but which in practice might run into trouble, because nobody's ever done it before. In particular it seems like it's not currently compatible with KVM... I can think of a few things it might be interesting to try as workarounds: * leave the system_memory as a container, and fill it with two MMIO MemoryRegions at very-low-priority that split the 2^64 space in half, so that neither of them runs into the "too big" issue * restrict the size to the guest cpu physical address space size (again, avoiding the "too big" problem) In an ideal world the code in kvm_region_commit() could be made to handle 2^64-sized MRs, at least for MMIO MRs (where the eventual behaviour in kvm_set_phys_ram() is "nothing to do"). 2^64-sized memory-backed MRs would need handling in the kernel, but they're not possible since you can't get that much host memory to start with :-) -- PMM
diff --git a/system/memory.c b/system/memory.c index a800fbc9e5..d41fc6af88 100644 --- a/system/memory.c +++ b/system/memory.c @@ -1193,9 +1193,6 @@ static void memory_region_do_init(MemoryRegion *mr, uint64_t size) { mr->size = int128_make64(size); - if (size == UINT64_MAX) { - mr->size = int128_2_64(); - } mr->name = g_strdup(name); mr->owner = owner; mr->dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
This looks like a bug. When the size is `UINT64_MAX`, it is reset to (Int128)`1 << 64` which actually is `UINT64_MAX + 1`. Then, an assert is triggered when the size is converted back to uin64_t by using the int128_get64() function, as the new value happens to be different than the previous one. Signed-off-by: Antonio Caggiano <quic_acaggian@quicinc.com> --- system/memory.c | 3 --- 1 file changed, 3 deletions(-)