diff mbox series

[RFC] mem: Fix mem region size when is UINT64_MAX

Message ID 20231024094351.50464-1-quic_acaggian@quicinc.com
State New
Headers show
Series [RFC] mem: Fix mem region size when is UINT64_MAX | expand

Commit Message

Antonio Caggiano Oct. 24, 2023, 9:43 a.m. UTC
This looks like a bug. When the size is `UINT64_MAX`, it is reset to
(Int128)`1 << 64` which actually is `UINT64_MAX + 1`.

Then, an assert is triggered when the size is converted back to uin64_t
by using the int128_get64() function, as the new value happens to be
different than the previous one.

Signed-off-by: Antonio Caggiano <quic_acaggian@quicinc.com>
---
 system/memory.c | 3 ---
 1 file changed, 3 deletions(-)

Comments

Peter Maydell Oct. 24, 2023, 10:28 a.m. UTC | #1
On Tue, 24 Oct 2023 at 10:45, Antonio Caggiano
<quic_acaggian@quicinc.com> wrote:
>
> This looks like a bug. When the size is `UINT64_MAX`, it is reset to
> (Int128)`1 << 64` which actually is `UINT64_MAX + 1`.
>
> Then, an assert is triggered when the size is converted back to uin64_t
> by using the int128_get64() function, as the new value happens to be
> different than the previous one.
>
> Signed-off-by: Antonio Caggiano <quic_acaggian@quicinc.com>
> ---
>  system/memory.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/system/memory.c b/system/memory.c
> index a800fbc9e5..d41fc6af88 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -1193,9 +1193,6 @@ static void memory_region_do_init(MemoryRegion *mr,
>                                    uint64_t size)
>  {
>      mr->size = int128_make64(size);
> -    if (size == UINT64_MAX) {
> -        mr->size = int128_2_64();
> -    }

No, this is intentional. In these memory region creation APIs
that take a uint64_t size parameter, size == UINT64_MAX is a
special case that means "actually the full 64 bit address space"
(and there is no way to ask for an MR to have a size that is
truly UINT64_MAX bytes). When we create the MR, the size is
stored in the MemoryRegion struct as its true size, because
we have an Int128 field there.

What assertion (with backtrace) is being hit? The issue is
probably at that point, not here.

thanks
-- PMM
Antonio Caggiano Oct. 24, 2023, 10:49 a.m. UTC | #2
Hi Peter,

Thanks for the quick response.

On 24/10/2023 12:28, Peter Maydell wrote:
> On Tue, 24 Oct 2023 at 10:45, Antonio Caggiano
> <quic_acaggian@quicinc.com> wrote:
>>
>> This looks like a bug. When the size is `UINT64_MAX`, it is reset to
>> (Int128)`1 << 64` which actually is `UINT64_MAX + 1`.
>>
>> Then, an assert is triggered when the size is converted back to uin64_t
>> by using the int128_get64() function, as the new value happens to be
>> different than the previous one.
>>
>> Signed-off-by: Antonio Caggiano <quic_acaggian@quicinc.com>
>> ---
>>   system/memory.c | 3 ---
>>   1 file changed, 3 deletions(-)
>>
>> diff --git a/system/memory.c b/system/memory.c
>> index a800fbc9e5..d41fc6af88 100644
>> --- a/system/memory.c
>> +++ b/system/memory.c
>> @@ -1193,9 +1193,6 @@ static void memory_region_do_init(MemoryRegion *mr,
>>                                     uint64_t size)
>>   {
>>       mr->size = int128_make64(size);
>> -    if (size == UINT64_MAX) {
>> -        mr->size = int128_2_64();
>> -    }
> 
> No, this is intentional. In these memory region creation APIs
> that take a uint64_t size parameter, size == UINT64_MAX is a
> special case that means "actually the full 64 bit address space"
> (and there is no way to ask for an MR to have a size that is
> truly UINT64_MAX bytes). When we create the MR, the size is
> stored in the MemoryRegion struct as its true size, because
> we have an Int128 field there.
> 
> What assertion (with backtrace) is being hit? The issue is
> probably at that point, not here.

Here you can. I'm basically creating a system_memory of size UINT64_MAX, 
and the assert is being hit when the memory is registered to KVM.

#5  0x0000fffff6fc4040 in __GI___assert_fail (assertion=0xffffe111d9c8 
"r == a", file=0xffffe111d960 "qemu/include/qemu/int128.h", line=34, 
function=0xffffe111f348 <__PRETTY_FUNCTION__.46> "int128_get64") at 
./assert/assert.c:101
#6  0x0000ffffe0c8cf6c in int128_get64 (a=18446744073709551616) at 
qemu/include/qemu/int128.h:34
#7  0x0000ffffe0c92cec in kvm_region_commit (listener=0xffffd83e92e0) at 
qemu/accel/kvm/kvm-all.c:1503
#8  0x0000ffffe0bd495c in memory_region_transaction_commit () at 
qemu/softmmu/memory.c:1109
#9  0x0000ffffe0bd8a90 in memory_region_update_container_subregions 
(subregion=0xaaaaabb6abf0) at qemu/softmmu/memory.c:2606
#10 0x0000ffffe0bd8b3c in memory_region_add_subregion_common 
(mr=0xaaaaabb6ae10, offset=0, subregion=0xaaaaabb6abf0) at 
qemu/softmmu/memory.c:2621
#11 0x0000ffffe0bd8b74 in memory_region_add_subregion 
(mr=0xaaaaabb6ae10, offset=0, subregion=0xaaaaabb6abf0) at 
qemu/softmmu/memory.c:2629
#12 0x0000ffffe05d5508 in gpex_host_realize (dev=0xaaaaabb69910, 
errp=0xffffdd4ce1f0) at qemu/hw/pci-host/gpex.c:132

> 
> thanks
> -- PMM


Cheers,
Antonio
Peter Maydell Oct. 24, 2023, noon UTC | #3
On Tue, 24 Oct 2023 at 11:49, Antonio Caggiano
<quic_acaggian@quicinc.com> wrote:
>
> Hi Peter,
>
> Thanks for the quick response.
>
> On 24/10/2023 12:28, Peter Maydell wrote:
> > On Tue, 24 Oct 2023 at 10:45, Antonio Caggiano
> > <quic_acaggian@quicinc.com> wrote:
> >>
> >> This looks like a bug. When the size is `UINT64_MAX`, it is reset to
> >> (Int128)`1 << 64` which actually is `UINT64_MAX + 1`.
> >>
> >> Then, an assert is triggered when the size is converted back to uin64_t
> >> by using the int128_get64() function, as the new value happens to be
> >> different than the previous one.
> >>
> >> Signed-off-by: Antonio Caggiano <quic_acaggian@quicinc.com>
> >> ---
> >>   system/memory.c | 3 ---
> >>   1 file changed, 3 deletions(-)
> >>
> >> diff --git a/system/memory.c b/system/memory.c
> >> index a800fbc9e5..d41fc6af88 100644
> >> --- a/system/memory.c
> >> +++ b/system/memory.c
> >> @@ -1193,9 +1193,6 @@ static void memory_region_do_init(MemoryRegion *mr,
> >>                                     uint64_t size)
> >>   {
> >>       mr->size = int128_make64(size);
> >> -    if (size == UINT64_MAX) {
> >> -        mr->size = int128_2_64();
> >> -    }
> >
> > No, this is intentional. In these memory region creation APIs
> > that take a uint64_t size parameter, size == UINT64_MAX is a
> > special case that means "actually the full 64 bit address space"
> > (and there is no way to ask for an MR to have a size that is
> > truly UINT64_MAX bytes). When we create the MR, the size is
> > stored in the MemoryRegion struct as its true size, because
> > we have an Int128 field there.
> >
> > What assertion (with backtrace) is being hit? The issue is
> > probably at that point, not here.
>
> Here you can. I'm basically creating a system_memory of size UINT64_MAX,
> and the assert is being hit when the memory is registered to KVM.

(I've cc'd Paolo to check my understanding of how KVM works.)

> #5  0x0000fffff6fc4040 in __GI___assert_fail (assertion=0xffffe111d9c8
> "r == a", file=0xffffe111d960 "qemu/include/qemu/int128.h", line=34,
> function=0xffffe111f348 <__PRETTY_FUNCTION__.46> "int128_get64") at
> ./assert/assert.c:101
> #6  0x0000ffffe0c8cf6c in int128_get64 (a=18446744073709551616) at
> qemu/include/qemu/int128.h:34
> #7  0x0000ffffe0c92cec in kvm_region_commit (listener=0xffffd83e92e0) at
> qemu/accel/kvm/kvm-all.c:1503
> #8  0x0000ffffe0bd495c in memory_region_transaction_commit () at
> qemu/softmmu/memory.c:1109
> #9  0x0000ffffe0bd8a90 in memory_region_update_container_subregions
> (subregion=0xaaaaabb6abf0) at qemu/softmmu/memory.c:2606
> #10 0x0000ffffe0bd8b3c in memory_region_add_subregion_common
> (mr=0xaaaaabb6ae10, offset=0, subregion=0xaaaaabb6abf0) at
> qemu/softmmu/memory.c:2621
> #11 0x0000ffffe0bd8b74 in memory_region_add_subregion
> (mr=0xaaaaabb6ae10, offset=0, subregion=0xaaaaabb6abf0) at
> qemu/softmmu/memory.c:2629
> #12 0x0000ffffe05d5508 in gpex_host_realize (dev=0xaaaaabb69910,
> errp=0xffffdd4ce1f0) at qemu/hw/pci-host/gpex.c:132

Thanks. It looks like KVM basically doesn't expect to be working
with a memory region that large -- the KVM kernel APIs for
registering memslots etc take a uint64_t size, so there's no way
to say "all of the 64 bit address space". Probably the best
available improvement would be if we added an assert at the point
when the memory region initially gets set up with KVM to say
"this is too big".

Given that we don't run into this upstream, this leaves me
suspecting that the underlying problem is that a memory
region this big shouldn't be being registered to KVM in the
first place. Certainly the gpex PCI controller works fine
on the arm64 virt board under KVM, so maybe your board code
is doing something odd when it wires it up?

thanks
-- PMM
Antonio Caggiano Oct. 24, 2023, 12:31 p.m. UTC | #4
On 24/10/2023 14:00, Peter Maydell wrote:
> On Tue, 24 Oct 2023 at 11:49, Antonio Caggiano
> <quic_acaggian@quicinc.com> wrote:
>>
>> Hi Peter,
>>
>> Thanks for the quick response.
>>
>> On 24/10/2023 12:28, Peter Maydell wrote:
>>> On Tue, 24 Oct 2023 at 10:45, Antonio Caggiano
>>> <quic_acaggian@quicinc.com> wrote:
>>>>
>>>> This looks like a bug. When the size is `UINT64_MAX`, it is reset to
>>>> (Int128)`1 << 64` which actually is `UINT64_MAX + 1`.
>>>>
>>>> Then, an assert is triggered when the size is converted back to uin64_t
>>>> by using the int128_get64() function, as the new value happens to be
>>>> different than the previous one.
>>>>
>>>> Signed-off-by: Antonio Caggiano <quic_acaggian@quicinc.com>
>>>> ---
>>>>    system/memory.c | 3 ---
>>>>    1 file changed, 3 deletions(-)
>>>>
>>>> diff --git a/system/memory.c b/system/memory.c
>>>> index a800fbc9e5..d41fc6af88 100644
>>>> --- a/system/memory.c
>>>> +++ b/system/memory.c
>>>> @@ -1193,9 +1193,6 @@ static void memory_region_do_init(MemoryRegion *mr,
>>>>                                      uint64_t size)
>>>>    {
>>>>        mr->size = int128_make64(size);
>>>> -    if (size == UINT64_MAX) {
>>>> -        mr->size = int128_2_64();
>>>> -    }
>>>
>>> No, this is intentional. In these memory region creation APIs
>>> that take a uint64_t size parameter, size == UINT64_MAX is a
>>> special case that means "actually the full 64 bit address space"
>>> (and there is no way to ask for an MR to have a size that is
>>> truly UINT64_MAX bytes). When we create the MR, the size is
>>> stored in the MemoryRegion struct as its true size, because
>>> we have an Int128 field there.
>>>
>>> What assertion (with backtrace) is being hit? The issue is
>>> probably at that point, not here.
>>
>> Here you can. I'm basically creating a system_memory of size UINT64_MAX,
>> and the assert is being hit when the memory is registered to KVM.
> 
> (I've cc'd Paolo to check my understanding of how KVM works.)
> 
>> #5  0x0000fffff6fc4040 in __GI___assert_fail (assertion=0xffffe111d9c8
>> "r == a", file=0xffffe111d960 "qemu/include/qemu/int128.h", line=34,
>> function=0xffffe111f348 <__PRETTY_FUNCTION__.46> "int128_get64") at
>> ./assert/assert.c:101
>> #6  0x0000ffffe0c8cf6c in int128_get64 (a=18446744073709551616) at
>> qemu/include/qemu/int128.h:34
>> #7  0x0000ffffe0c92cec in kvm_region_commit (listener=0xffffd83e92e0) at
>> qemu/accel/kvm/kvm-all.c:1503
>> #8  0x0000ffffe0bd495c in memory_region_transaction_commit () at
>> qemu/softmmu/memory.c:1109
>> #9  0x0000ffffe0bd8a90 in memory_region_update_container_subregions
>> (subregion=0xaaaaabb6abf0) at qemu/softmmu/memory.c:2606
>> #10 0x0000ffffe0bd8b3c in memory_region_add_subregion_common
>> (mr=0xaaaaabb6ae10, offset=0, subregion=0xaaaaabb6abf0) at
>> qemu/softmmu/memory.c:2621
>> #11 0x0000ffffe0bd8b74 in memory_region_add_subregion
>> (mr=0xaaaaabb6ae10, offset=0, subregion=0xaaaaabb6abf0) at
>> qemu/softmmu/memory.c:2629
>> #12 0x0000ffffe05d5508 in gpex_host_realize (dev=0xaaaaabb69910,
>> errp=0xffffdd4ce1f0) at qemu/hw/pci-host/gpex.c:132
> 
> Thanks. It looks like KVM basically doesn't expect to be working
> with a memory region that large -- the KVM kernel APIs for
> registering memslots etc take a uint64_t size, so there's no way
> to say "all of the 64 bit address space". Probably the best
> available improvement would be if we added an assert at the point
> when the memory region initially gets set up with KVM to say
> "this is too big".
> 
> Given that we don't run into this upstream, this leaves me
> suspecting that the underlying problem is that a memory
> region this big shouldn't be being registered to KVM in the
> first place. Certainly the gpex PCI controller works fine
> on the arm64 virt board under KVM, so maybe your board code
> is doing something odd when it wires it up?
> 

I think so, we use a MMIO system_memory, effectively using 
memory_region_init_io() instead of memory_region_init(). This is for 
registering our callbacks with a MemoryRegionOps.

So, for a MMIO memory type, UINT64_MAX means "real" size rather than 
"all of the 64 bit address space"?

> thanks
> -- PMM

Cheers,
Antonio
Peter Maydell Oct. 24, 2023, 1:21 p.m. UTC | #5
On Tue, 24 Oct 2023 at 13:31, Antonio Caggiano
<quic_acaggian@quicinc.com> wrote:
>
>
>
> On 24/10/2023 14:00, Peter Maydell wrote:
> > On Tue, 24 Oct 2023 at 11:49, Antonio Caggiano
> > <quic_acaggian@quicinc.com> wrote:
> > Given that we don't run into this upstream, this leaves me
> > suspecting that the underlying problem is that a memory
> > region this big shouldn't be being registered to KVM in the
> > first place. Certainly the gpex PCI controller works fine
> > on the arm64 virt board under KVM, so maybe your board code
> > is doing something odd when it wires it up?

> I think so, we use a MMIO system_memory, effectively using
> memory_region_init_io() instead of memory_region_init(). This is for
> registering our callbacks with a MemoryRegionOps.
>
> So, for a MMIO memory type, UINT64_MAX means "real" size rather than
> "all of the 64 bit address space"?

For a memory region, in the creation APIs that take a uint64_t
size argument:
 * size 0 is (I think) not valid
 * 1..UINT64_MAX-1 mean "size is that many bytes"
 * UINT64_MAX means "size is 2^64 bytes"
 * there is no way to ask for (2^64)-1 bytes

I am confused about why you say your system_memory is an
MMIO region with callbacks, though. The system_memory MR
is expected to be a pure "container" region, which has
no callbacks and simply is a place where we add other
subregions (which might be RAM, or IO, or whatever).

thanks
-- PMM
Antonio Caggiano Oct. 24, 2023, 1:38 p.m. UTC | #6
+cc Mark which has a better understanding of our use case.

On 24/10/2023 15:21, Peter Maydell wrote:
> On Tue, 24 Oct 2023 at 13:31, Antonio Caggiano
> <quic_acaggian@quicinc.com> wrote:
>>
>>
>>
>> On 24/10/2023 14:00, Peter Maydell wrote:
>>> On Tue, 24 Oct 2023 at 11:49, Antonio Caggiano
>>> <quic_acaggian@quicinc.com> wrote:
>>> Given that we don't run into this upstream, this leaves me
>>> suspecting that the underlying problem is that a memory
>>> region this big shouldn't be being registered to KVM in the
>>> first place. Certainly the gpex PCI controller works fine
>>> on the arm64 virt board under KVM, so maybe your board code
>>> is doing something odd when it wires it up?
> 
>> I think so, we use a MMIO system_memory, effectively using
>> memory_region_init_io() instead of memory_region_init(). This is for
>> registering our callbacks with a MemoryRegionOps.
>>
>> So, for a MMIO memory type, UINT64_MAX means "real" size rather than
>> "all of the 64 bit address space"?
> 
> For a memory region, in the creation APIs that take a uint64_t
> size argument:
>   * size 0 is (I think) not valid
>   * 1..UINT64_MAX-1 mean "size is that many bytes"
>   * UINT64_MAX means "size is 2^64 bytes"
>   * there is no way to ask for (2^64)-1 bytes
> 
> I am confused about why you say your system_memory is an
> MMIO region with callbacks, though. The system_memory MR
> is expected to be a pure "container" region, which has
> no callbacks and simply is a place where we add other
> subregions (which might be RAM, or IO, or whatever).
> 

We use an MMIO system memory for our ops, which capture memory accesses 
and handle some of them, even though no memory region is actually 
backing the addresses.

Does that means we are violating QEMU's expectations by modifing that 
from "container" to "MMIO"? This issue is just one consequence of such 
violation and there might be more potentially?

> thanks
> -- PMM

Cheers,
Antonio
Peter Maydell Oct. 24, 2023, 2:17 p.m. UTC | #7
On Tue, 24 Oct 2023 at 14:38, Antonio Caggiano
<quic_acaggian@quicinc.com> wrote:
>
> +cc Mark which has a better understanding of our use case.
>
> On 24/10/2023 15:21, Peter Maydell wrote:
> > On Tue, 24 Oct 2023 at 13:31, Antonio Caggiano
> > <quic_acaggian@quicinc.com> wrote:
> >>
> >>
> >>
> >> On 24/10/2023 14:00, Peter Maydell wrote:
> >>> On Tue, 24 Oct 2023 at 11:49, Antonio Caggiano
> >>> <quic_acaggian@quicinc.com> wrote:
> >>> Given that we don't run into this upstream, this leaves me
> >>> suspecting that the underlying problem is that a memory
> >>> region this big shouldn't be being registered to KVM in the
> >>> first place. Certainly the gpex PCI controller works fine
> >>> on the arm64 virt board under KVM, so maybe your board code
> >>> is doing something odd when it wires it up?
> >
> >> I think so, we use a MMIO system_memory, effectively using
> >> memory_region_init_io() instead of memory_region_init(). This is for
> >> registering our callbacks with a MemoryRegionOps.
> >>
> >> So, for a MMIO memory type, UINT64_MAX means "real" size rather than
> >> "all of the 64 bit address space"?
> >
> > For a memory region, in the creation APIs that take a uint64_t
> > size argument:
> >   * size 0 is (I think) not valid
> >   * 1..UINT64_MAX-1 mean "size is that many bytes"
> >   * UINT64_MAX means "size is 2^64 bytes"
> >   * there is no way to ask for (2^64)-1 bytes
> >
> > I am confused about why you say your system_memory is an
> > MMIO region with callbacks, though. The system_memory MR
> > is expected to be a pure "container" region, which has
> > no callbacks and simply is a place where we add other
> > subregions (which might be RAM, or IO, or whatever).
> >
>
> We use an MMIO system memory for our ops, which capture memory accesses
> and handle some of them, even though no memory region is actually
> backing the addresses.
>
> Does that means we are violating QEMU's expectations by modifing that
> from "container" to "MMIO"? This issue is just one consequence of such
> violation and there might be more potentially?

I think that's one of those things which in theory might be supposed
to work but which in practice might run into trouble, because nobody's
ever done it before. In particular it seems like it's not currently
compatible with KVM...

I can think of a few things it might be interesting to try as
workarounds:
 * leave the system_memory as a container, and fill it with
   two MMIO MemoryRegions at very-low-priority that split the
   2^64 space in half, so that neither of them runs into the
   "too big" issue
 * restrict the size to the guest cpu physical address space
   size (again, avoiding the "too big" problem)

In an ideal world the code in kvm_region_commit() could be
made to handle 2^64-sized MRs, at least for MMIO MRs (where
the eventual behaviour in kvm_set_phys_ram() is "nothing to
do"). 2^64-sized memory-backed MRs would need handling in
the kernel, but they're not possible since you can't get
that much host memory to start with :-)

-- PMM
diff mbox series

Patch

diff --git a/system/memory.c b/system/memory.c
index a800fbc9e5..d41fc6af88 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1193,9 +1193,6 @@  static void memory_region_do_init(MemoryRegion *mr,
                                   uint64_t size)
 {
     mr->size = int128_make64(size);
-    if (size == UINT64_MAX) {
-        mr->size = int128_2_64();
-    }
     mr->name = g_strdup(name);
     mr->owner = owner;
     mr->dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);