Message ID | 20230922041619.3909-1-anisinha@redhat.com |
---|---|
State | New |
Headers | show |
Series | [v4] hw/i386/pc: improve physical address space bound check for 32-bit x86 systems | expand |
On 22.09.23 06:16, Ani Sinha wrote: > 32-bit x86 systems do not have a reserved memory for hole64. On those 32-bit > systems without PSE36 or PAE CPU features, hotplugging memory devices are not > supported by QEMU as QEMU always places hotplugged memory above 4 GiB boundary > which is beyond the physical address space of the processor. Linux guests also > does not support memory hotplug on those systems. Please see Linux > kernel commit b59d02ed08690 ("mm/memory_hotplug: disable the functionality > for 32b") for more details. > > Therefore, the maximum limit of the guest physical address in the absence of > additional memory devices effectively coincides with the end of > "above 4G memory space" region for 32-bit x86 without PAE/PSE36. When users > configure additional memory devices, after properly accounting for the > additional device memory region to find the maximum value of the guest > physical address, the address will be outside the range of the processor's > physical address space. > > This change adds improvements to take above into consideration. > > For example, previously this was allowed: > > $ ./qemu-system-x86_64 -cpu pentium -m size=10G > > With this change now it is no longer allowed: > > $ ./qemu-system-x86_64 -cpu pentium -m size=10G > qemu-system-x86_64: Address space limit 0xffffffff < 0x2bfffffff phys-bits too low (32) > > However, the following are allowed since on both cases physical address > space of the processor is 36 bits: > > $ ./qemu-system-x86_64 -cpu pentium2 -m size=10G > $ ./qemu-system-x86_64 -cpu pentium,pse36=on -m size=10G > > For 32-bit, without PAE/PSE36, hotplugging additional memory is no longer allowed. > > $ ./qemu-system-i386 -m size=1G,maxmem=3G,slots=2 > qemu-system-i386: Address space limit 0xffffffff < 0x1ffffffff phys-bits too low (32) > $ ./qemu-system-i386 -machine q35 -m size=1G,maxmem=3G,slots=2 > qemu-system-i386: Address space limit 0xffffffff < 0x1ffffffff phys-bits too low (32) > > A new compatibility flag is introduced to make sure pc_max_used_gpa() keeps > returning the old value for machines 8.1 and older. > Therefore, the above is still allowed for older machine types in order to support > compatibility. Hence, the following still works: > > $ ./qemu-system-i386 -machine pc-i440fx-8.1 -m size=1G,maxmem=3G,slots=2 > $ ./qemu-system-i386 -machine pc-q35-8.1 -m size=1G,maxmem=3G,slots=2 > > Further, following is also allowed as with PSE36, the processor has 36-bit > address space: > > $ ./qemu-system-i386 -cpu 486,pse36=on -m size=1G,maxmem=3G,slots=2 > > After calling CPUID with EAX=0x80000001, all AMD64 compliant processors > have the longmode-capable-bit turned on in the extended feature flags (bit 29) > in EDX. The absence of CPUID longmode can be used to differentiate between > 32-bit and 64-bit processors and is the recommended approach. QEMU takes this > approach elsewhere (for example, please see x86_cpu_realizefn()), With > this change, pc_max_used_gpa() also uses the same method to detect 32-bit > processors. > > Unit tests are modified to not run 32-bit x86 tests that use memory hotplug. > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Ani Sinha <anisinha@redhat.com> > --- LGTM, thanks Reviewed-by: David Hildenbrand <david@redhat.com>
On 22/9/23 06:16, Ani Sinha wrote: > 32-bit x86 systems do not have a reserved memory for hole64. On those 32-bit > systems without PSE36 or PAE CPU features, hotplugging memory devices are not > supported by QEMU as QEMU always places hotplugged memory above 4 GiB boundary > which is beyond the physical address space of the processor. Linux guests also > does not support memory hotplug on those systems. Please see Linux > kernel commit b59d02ed08690 ("mm/memory_hotplug: disable the functionality > for 32b") for more details. > > Therefore, the maximum limit of the guest physical address in the absence of > additional memory devices effectively coincides with the end of > "above 4G memory space" region for 32-bit x86 without PAE/PSE36. When users > configure additional memory devices, after properly accounting for the > additional device memory region to find the maximum value of the guest > physical address, the address will be outside the range of the processor's > physical address space. > > This change adds improvements to take above into consideration. > > For example, previously this was allowed: > > $ ./qemu-system-x86_64 -cpu pentium -m size=10G > > With this change now it is no longer allowed: > > $ ./qemu-system-x86_64 -cpu pentium -m size=10G > qemu-system-x86_64: Address space limit 0xffffffff < 0x2bfffffff phys-bits too low (32) > > However, the following are allowed since on both cases physical address > space of the processor is 36 bits: > > $ ./qemu-system-x86_64 -cpu pentium2 -m size=10G > $ ./qemu-system-x86_64 -cpu pentium,pse36=on -m size=10G > > For 32-bit, without PAE/PSE36, hotplugging additional memory is no longer allowed. > > $ ./qemu-system-i386 -m size=1G,maxmem=3G,slots=2 > qemu-system-i386: Address space limit 0xffffffff < 0x1ffffffff phys-bits too low (32) > $ ./qemu-system-i386 -machine q35 -m size=1G,maxmem=3G,slots=2 > qemu-system-i386: Address space limit 0xffffffff < 0x1ffffffff phys-bits too low (32) > > A new compatibility flag is introduced to make sure pc_max_used_gpa() keeps > returning the old value for machines 8.1 and older. > Therefore, the above is still allowed for older machine types in order to support > compatibility. Hence, the following still works: > > $ ./qemu-system-i386 -machine pc-i440fx-8.1 -m size=1G,maxmem=3G,slots=2 > $ ./qemu-system-i386 -machine pc-q35-8.1 -m size=1G,maxmem=3G,slots=2 > > Further, following is also allowed as with PSE36, the processor has 36-bit > address space: > > $ ./qemu-system-i386 -cpu 486,pse36=on -m size=1G,maxmem=3G,slots=2 > > After calling CPUID with EAX=0x80000001, all AMD64 compliant processors > have the longmode-capable-bit turned on in the extended feature flags (bit 29) > in EDX. The absence of CPUID longmode can be used to differentiate between > 32-bit and 64-bit processors and is the recommended approach. QEMU takes this > approach elsewhere (for example, please see x86_cpu_realizefn()), With > this change, pc_max_used_gpa() also uses the same method to detect 32-bit > processors. > > Unit tests are modified to not run 32-bit x86 tests that use memory hotplug. > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Ani Sinha <anisinha@redhat.com> > --- > hw/i386/pc.c | 31 ++++++++++++++++++++++++++++--- > hw/i386/pc_piix.c | 4 ++++ > hw/i386/pc_q35.c | 2 ++ > include/hw/i386/pc.h | 6 ++++++ > tests/qtest/bios-tables-test.c | 26 ++++++++++++++++++-------- > tests/qtest/numa-test.c | 7 ++++++- > 6 files changed, 64 insertions(+), 12 deletions(-) > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index 54838c0c41..2a689cf0bd 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -907,12 +907,37 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms) > static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size) > { > X86CPU *cpu = X86_CPU(first_cpu); > + PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); > + MachineState *ms = MACHINE(pcms); > + uint64_t devmem_start = 0; > + ram_addr_t devmem_size = 0; > > - /* 32-bit systems don't have hole64 thus return max CPU address */ > - if (cpu->phys_bits <= 32) { > - return ((hwaddr)1 << cpu->phys_bits) - 1; > + /* > + * 32-bit systems don't have hole64 but they might have a region for > + * memory devices. Even if additional hotplugged memory devices might > + * not be usable by most guest OSes, we need to still consider them for > + * calculating the highest possible GPA so that we can properly report > + * if someone configures them on a CPU that cannot possibly address them. > + */ > + if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) { > + /* 32-bit systems */ > + if (!pcmc->broken_32bit_mem_addr_check) { Nitpicking, code is simplified if you invert this condition check. > + if (pcmc->has_reserved_memory && > + (ms->ram_size < ms->maxram_size)) { > + pc_get_device_memory_range(pcms, &devmem_start, > + &devmem_size); > + devmem_start += devmem_size; > + return devmem_start - 1; > + } else { > + return pc_above_4g_end(pcms) - 1; > + } > + } else { > + /* old value for compatibility reasons */ > + return ((hwaddr)1 << cpu->phys_bits) - 1; Since you change this line, can we convert to MAKE_64BIT_MASK(0, cpu->phys_bits) ? > + } > } > > + /* 64-bit systems */ > return pc_pci_hole64_start() + pci_hole64_size - 1; > }
> On 22-Sep-2023, at 4:12 PM, Philippe Mathieu-Daudé <philmd@linaro.org> wrote: > > On 22/9/23 06:16, Ani Sinha wrote: >> 32-bit x86 systems do not have a reserved memory for hole64. On those 32-bit >> systems without PSE36 or PAE CPU features, hotplugging memory devices are not >> supported by QEMU as QEMU always places hotplugged memory above 4 GiB boundary >> which is beyond the physical address space of the processor. Linux guests also >> does not support memory hotplug on those systems. Please see Linux >> kernel commit b59d02ed08690 ("mm/memory_hotplug: disable the functionality >> for 32b") for more details. >> Therefore, the maximum limit of the guest physical address in the absence of >> additional memory devices effectively coincides with the end of >> "above 4G memory space" region for 32-bit x86 without PAE/PSE36. When users >> configure additional memory devices, after properly accounting for the >> additional device memory region to find the maximum value of the guest >> physical address, the address will be outside the range of the processor's >> physical address space. >> This change adds improvements to take above into consideration. >> For example, previously this was allowed: >> $ ./qemu-system-x86_64 -cpu pentium -m size=10G >> With this change now it is no longer allowed: >> $ ./qemu-system-x86_64 -cpu pentium -m size=10G >> qemu-system-x86_64: Address space limit 0xffffffff < 0x2bfffffff phys-bits too low (32) >> However, the following are allowed since on both cases physical address >> space of the processor is 36 bits: >> $ ./qemu-system-x86_64 -cpu pentium2 -m size=10G >> $ ./qemu-system-x86_64 -cpu pentium,pse36=on -m size=10G >> For 32-bit, without PAE/PSE36, hotplugging additional memory is no longer allowed. >> $ ./qemu-system-i386 -m size=1G,maxmem=3G,slots=2 >> qemu-system-i386: Address space limit 0xffffffff < 0x1ffffffff phys-bits too low (32) >> $ ./qemu-system-i386 -machine q35 -m size=1G,maxmem=3G,slots=2 >> qemu-system-i386: Address space limit 0xffffffff < 0x1ffffffff phys-bits too low (32) >> A new compatibility flag is introduced to make sure pc_max_used_gpa() keeps >> returning the old value for machines 8.1 and older. >> Therefore, the above is still allowed for older machine types in order to support >> compatibility. Hence, the following still works: >> $ ./qemu-system-i386 -machine pc-i440fx-8.1 -m size=1G,maxmem=3G,slots=2 >> $ ./qemu-system-i386 -machine pc-q35-8.1 -m size=1G,maxmem=3G,slots=2 >> Further, following is also allowed as with PSE36, the processor has 36-bit >> address space: >> $ ./qemu-system-i386 -cpu 486,pse36=on -m size=1G,maxmem=3G,slots=2 >> After calling CPUID with EAX=0x80000001, all AMD64 compliant processors >> have the longmode-capable-bit turned on in the extended feature flags (bit 29) >> in EDX. The absence of CPUID longmode can be used to differentiate between >> 32-bit and 64-bit processors and is the recommended approach. QEMU takes this >> approach elsewhere (for example, please see x86_cpu_realizefn()), With >> this change, pc_max_used_gpa() also uses the same method to detect 32-bit >> processors. >> Unit tests are modified to not run 32-bit x86 tests that use memory hotplug. >> Suggested-by: David Hildenbrand <david@redhat.com> >> Signed-off-by: Ani Sinha <anisinha@redhat.com> >> --- >> hw/i386/pc.c | 31 ++++++++++++++++++++++++++++--- >> hw/i386/pc_piix.c | 4 ++++ >> hw/i386/pc_q35.c | 2 ++ >> include/hw/i386/pc.h | 6 ++++++ >> tests/qtest/bios-tables-test.c | 26 ++++++++++++++++++-------- >> tests/qtest/numa-test.c | 7 ++++++- >> 6 files changed, 64 insertions(+), 12 deletions(-) > > >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c >> index 54838c0c41..2a689cf0bd 100644 >> --- a/hw/i386/pc.c >> +++ b/hw/i386/pc.c >> @@ -907,12 +907,37 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms) >> static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size) >> { >> X86CPU *cpu = X86_CPU(first_cpu); >> + PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); >> + MachineState *ms = MACHINE(pcms); >> + uint64_t devmem_start = 0; >> + ram_addr_t devmem_size = 0; >> - /* 32-bit systems don't have hole64 thus return max CPU address */ >> - if (cpu->phys_bits <= 32) { >> - return ((hwaddr)1 << cpu->phys_bits) - 1; >> + /* >> + * 32-bit systems don't have hole64 but they might have a region for >> + * memory devices. Even if additional hotplugged memory devices might >> + * not be usable by most guest OSes, we need to still consider them for >> + * calculating the highest possible GPA so that we can properly report >> + * if someone configures them on a CPU that cannot possibly address them. >> + */ >> + if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) { >> + /* 32-bit systems */ >> + if (!pcmc->broken_32bit_mem_addr_check) { > > Nitpicking, code is simplified if you invert this condition check. Maybe it reads better, but simplified? .. I am not so sure. > >> + if (pcmc->has_reserved_memory && >> + (ms->ram_size < ms->maxram_size)) { >> + pc_get_device_memory_range(pcms, &devmem_start, >> + &devmem_size); >> + devmem_start += devmem_size; >> + return devmem_start - 1; >> + } else { >> + return pc_above_4g_end(pcms) - 1; >> + } >> + } else { >> + /* old value for compatibility reasons */ >> + return ((hwaddr)1 << cpu->phys_bits) - 1; > > Since you change this line, can we convert to > MAKE_64BIT_MASK(0, cpu->phys_bits) ? Doesn’t the existing code reads better? Assuming that the macro does exactly the same thing, one has to still look up the definition. And (((~0ULL) >> (64 - (length))) << (shift)) Is such a brain twister :-) > >> + } >> } >> + /* 64-bit systems */ >> return pc_pci_hole64_start() + pci_hole64_size - 1; >> }
On 22/9/23 14:00, Ani Sinha wrote: > > >> On 22-Sep-2023, at 4:12 PM, Philippe Mathieu-Daudé <philmd@linaro.org> wrote: >> >> On 22/9/23 06:16, Ani Sinha wrote: >>> 32-bit x86 systems do not have a reserved memory for hole64. On those 32-bit >>> systems without PSE36 or PAE CPU features, hotplugging memory devices are not >>> supported by QEMU as QEMU always places hotplugged memory above 4 GiB boundary >>> which is beyond the physical address space of the processor. Linux guests also >>> does not support memory hotplug on those systems. Please see Linux >>> kernel commit b59d02ed08690 ("mm/memory_hotplug: disable the functionality >>> for 32b") for more details. >>> Therefore, the maximum limit of the guest physical address in the absence of >>> additional memory devices effectively coincides with the end of >>> "above 4G memory space" region for 32-bit x86 without PAE/PSE36. When users >>> configure additional memory devices, after properly accounting for the >>> additional device memory region to find the maximum value of the guest >>> physical address, the address will be outside the range of the processor's >>> physical address space. >>> This change adds improvements to take above into consideration. >>> For example, previously this was allowed: >>> $ ./qemu-system-x86_64 -cpu pentium -m size=10G >>> With this change now it is no longer allowed: >>> $ ./qemu-system-x86_64 -cpu pentium -m size=10G >>> qemu-system-x86_64: Address space limit 0xffffffff < 0x2bfffffff phys-bits too low (32) >>> However, the following are allowed since on both cases physical address >>> space of the processor is 36 bits: >>> $ ./qemu-system-x86_64 -cpu pentium2 -m size=10G >>> $ ./qemu-system-x86_64 -cpu pentium,pse36=on -m size=10G >>> For 32-bit, without PAE/PSE36, hotplugging additional memory is no longer allowed. >>> $ ./qemu-system-i386 -m size=1G,maxmem=3G,slots=2 >>> qemu-system-i386: Address space limit 0xffffffff < 0x1ffffffff phys-bits too low (32) >>> $ ./qemu-system-i386 -machine q35 -m size=1G,maxmem=3G,slots=2 >>> qemu-system-i386: Address space limit 0xffffffff < 0x1ffffffff phys-bits too low (32) >>> A new compatibility flag is introduced to make sure pc_max_used_gpa() keeps >>> returning the old value for machines 8.1 and older. >>> Therefore, the above is still allowed for older machine types in order to support >>> compatibility. Hence, the following still works: >>> $ ./qemu-system-i386 -machine pc-i440fx-8.1 -m size=1G,maxmem=3G,slots=2 >>> $ ./qemu-system-i386 -machine pc-q35-8.1 -m size=1G,maxmem=3G,slots=2 >>> Further, following is also allowed as with PSE36, the processor has 36-bit >>> address space: >>> $ ./qemu-system-i386 -cpu 486,pse36=on -m size=1G,maxmem=3G,slots=2 >>> After calling CPUID with EAX=0x80000001, all AMD64 compliant processors >>> have the longmode-capable-bit turned on in the extended feature flags (bit 29) >>> in EDX. The absence of CPUID longmode can be used to differentiate between >>> 32-bit and 64-bit processors and is the recommended approach. QEMU takes this >>> approach elsewhere (for example, please see x86_cpu_realizefn()), With >>> this change, pc_max_used_gpa() also uses the same method to detect 32-bit >>> processors. >>> Unit tests are modified to not run 32-bit x86 tests that use memory hotplug. >>> Suggested-by: David Hildenbrand <david@redhat.com> >>> Signed-off-by: Ani Sinha <anisinha@redhat.com> >>> --- >>> hw/i386/pc.c | 31 ++++++++++++++++++++++++++++--- >>> hw/i386/pc_piix.c | 4 ++++ >>> hw/i386/pc_q35.c | 2 ++ >>> include/hw/i386/pc.h | 6 ++++++ >>> tests/qtest/bios-tables-test.c | 26 ++++++++++++++++++-------- >>> tests/qtest/numa-test.c | 7 ++++++- >>> 6 files changed, 64 insertions(+), 12 deletions(-) >> >> >>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c >>> index 54838c0c41..2a689cf0bd 100644 >>> --- a/hw/i386/pc.c >>> +++ b/hw/i386/pc.c >>> @@ -907,12 +907,37 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms) >>> static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size) >>> { >>> X86CPU *cpu = X86_CPU(first_cpu); >>> + PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); >>> + MachineState *ms = MACHINE(pcms); >>> + uint64_t devmem_start = 0; >>> + ram_addr_t devmem_size = 0; >>> - /* 32-bit systems don't have hole64 thus return max CPU address */ >>> - if (cpu->phys_bits <= 32) { >>> - return ((hwaddr)1 << cpu->phys_bits) - 1; >>> + /* >>> + * 32-bit systems don't have hole64 but they might have a region for >>> + * memory devices. Even if additional hotplugged memory devices might >>> + * not be usable by most guest OSes, we need to still consider them for >>> + * calculating the highest possible GPA so that we can properly report >>> + * if someone configures them on a CPU that cannot possibly address them. >>> + */ >>> + if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) { >>> + /* 32-bit systems */ >>> + if (!pcmc->broken_32bit_mem_addr_check) { >> >> Nitpicking, code is simplified if you invert this condition check. > > Maybe it reads better, but simplified? .. I am not so sure. As: -- >8 -- @@ -907,12 +907,35 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms) static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size) { X86CPU *cpu = X86_CPU(first_cpu); + PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); + MachineState *ms = MACHINE(pcms); - /* 32-bit systems don't have hole64 thus return max CPU address */ - if (cpu->phys_bits <= 32) { - return ((hwaddr)1 << cpu->phys_bits) - 1; + /* + * 32-bit systems don't have hole64 but they might have a region for + * memory devices. Even if additional hotplugged memory devices might + * not be usable by most guest OSes, we need to still consider them for + * calculating the highest possible GPA so that we can properly report + * if someone configures them on a CPU that cannot possibly address them. + */ + if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) { + /* 32-bit systems */ + if (pcmc->broken_32bit_mem_addr_check) { + /* old value for compatibility reasons */ + return ((hwaddr)1 << cpu->phys_bits) - 1; + } + if (pcmc->has_reserved_memory && (ms->ram_size < ms->maxram_size)) { + uint64_t devmem_start = 0; + ram_addr_t devmem_size = 0; + + pc_get_device_memory_range(pcms, &devmem_start, &devmem_size); + devmem_start += devmem_size; + + return devmem_start - 1; + } + return pc_above_4g_end(pcms) - 1; } + /* 64-bit systems */ return pc_pci_hole64_start() + pci_hole64_size - 1; } --- But then even simpler (to review): -- >8 -- @@ -907,13 +907,35 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms) static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size) { X86CPU *cpu = X86_CPU(first_cpu); + PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); + MachineState *ms = MACHINE(pcms); - /* 32-bit systems don't have hole64 thus return max CPU address */ - if (cpu->phys_bits <= 32) { - return ((hwaddr)1 << cpu->phys_bits) - 1; + if (cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) { + /* 64-bit systems */ + return pc_pci_hole64_start() + pci_hole64_size - 1; } - return pc_pci_hole64_start() + pci_hole64_size - 1; + /* + * 32-bit systems don't have hole64 but they might have a region for + * memory devices. Even if additional hotplugged memory devices might + * not be usable by most guest OSes, we need to still consider them for + * calculating the highest possible GPA so that we can properly report + * if someone configures them on a CPU that cannot possibly address them. + */ + if (pcmc->broken_32bit_mem_addr_check) { + /* old value for compatibility reasons */ + return ((hwaddr)1 << cpu->phys_bits) - 1; + } + if (pcmc->has_reserved_memory && (ms->ram_size < ms->maxram_size)) { + uint64_t devmem_start = 0; + ram_addr_t devmem_size = 0; + + pc_get_device_memory_range(pcms, &devmem_start, &devmem_size); + devmem_start += devmem_size; + + return devmem_start - 1; + } + return pc_above_4g_end(pcms) - 1; } --- Personally I'd do it in 2 steps, first invert the if statement: -- >8 -- @@ -908,12 +908,13 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size) { X86CPU *cpu = X86_CPU(first_cpu); - /* 32-bit systems don't have hole64 thus return max CPU address */ - if (cpu->phys_bits <= 32) { - return ((hwaddr)1 << cpu->phys_bits) - 1; + if (cpu->phys_bits > 32) { + /* 64-bit systems */ + return pc_pci_hole64_start() + pci_hole64_size - 1; } - return pc_pci_hole64_start() + pci_hole64_size - 1; + /* 32-bit systems don't have hole64 thus return max CPU address */ + return ((hwaddr)1 << cpu->phys_bits) - 1; } --- Then your patch as: -- >8 -- @@ -907,14 +907,35 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms) static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size) { X86CPU *cpu = X86_CPU(first_cpu); + PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); + MachineState *ms = MACHINE(pcms); - if (cpu->phys_bits > 32) { + if (cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) { /* 64-bit systems */ return pc_pci_hole64_start() + pci_hole64_size - 1; } - /* 32-bit systems don't have hole64 thus return max CPU address */ - return ((hwaddr)1 << cpu->phys_bits) - 1; + /* + * 32-bit systems don't have hole64 but they might have a region for + * memory devices. Even if additional hotplugged memory devices might + * not be usable by most guest OSes, we need to still consider them for + * calculating the highest possible GPA so that we can properly report + * if someone configures them on a CPU that cannot possibly address them. + */ + if (pcmc->broken_32bit_mem_addr_check) { + /* old value for compatibility reasons */ + return ((hwaddr)1 << cpu->phys_bits) - 1; + } + if (pcmc->has_reserved_memory && (ms->ram_size < ms->maxram_size)) { + uint64_t devmem_start = 0; + ram_addr_t devmem_size = 0; + + pc_get_device_memory_range(pcms, &devmem_start, &devmem_size); + devmem_start += devmem_size; + + return devmem_start - 1; + } + return pc_above_4g_end(pcms) - 1; } --- > >> >>> + if (pcmc->has_reserved_memory && >>> + (ms->ram_size < ms->maxram_size)) { >>> + pc_get_device_memory_range(pcms, &devmem_start, >>> + &devmem_size); >>> + devmem_start += devmem_size; >>> + return devmem_start - 1; >>> + } else { >>> + return pc_above_4g_end(pcms) - 1; >>> + } >>> + } else { >>> + /* old value for compatibility reasons */ >>> + return ((hwaddr)1 << cpu->phys_bits) - 1; Left shifting by N and removing 1 makes a mask of N bits, but I had to do the logical operation mentally to figure it out, >> >> Since you change this line, can we convert to >> MAKE_64BIT_MASK(0, cpu->phys_bits) ? while here this macro says "make a mask" without having to look at the definition. Anyhow as I said, "nitpicking...". > > Doesn’t the existing code reads better? Assuming that the macro does exactly the same thing, one has to still look up the definition. And > > (((~0ULL) >> (64 - (length))) << (shift)) > > Is such a brain twister :-) > >> >>> + } >>> } >>> + /* 64-bit systems */ >>> return pc_pci_hole64_start() + pci_hole64_size - 1; >>> } >
> On 22-Sep-2023, at 6:26 PM, Philippe Mathieu-Daudé <philmd@linaro.org> wrote: > > On 22/9/23 14:00, Ani Sinha wrote: >>> On 22-Sep-2023, at 4:12 PM, Philippe Mathieu-Daudé <philmd@linaro.org> wrote: >>> >>> On 22/9/23 06:16, Ani Sinha wrote: >>>> 32-bit x86 systems do not have a reserved memory for hole64. On those 32-bit >>>> systems without PSE36 or PAE CPU features, hotplugging memory devices are not >>>> supported by QEMU as QEMU always places hotplugged memory above 4 GiB boundary >>>> which is beyond the physical address space of the processor. Linux guests also >>>> does not support memory hotplug on those systems. Please see Linux >>>> kernel commit b59d02ed08690 ("mm/memory_hotplug: disable the functionality >>>> for 32b") for more details. >>>> Therefore, the maximum limit of the guest physical address in the absence of >>>> additional memory devices effectively coincides with the end of >>>> "above 4G memory space" region for 32-bit x86 without PAE/PSE36. When users >>>> configure additional memory devices, after properly accounting for the >>>> additional device memory region to find the maximum value of the guest >>>> physical address, the address will be outside the range of the processor's >>>> physical address space. >>>> This change adds improvements to take above into consideration. >>>> For example, previously this was allowed: >>>> $ ./qemu-system-x86_64 -cpu pentium -m size=10G >>>> With this change now it is no longer allowed: >>>> $ ./qemu-system-x86_64 -cpu pentium -m size=10G >>>> qemu-system-x86_64: Address space limit 0xffffffff < 0x2bfffffff phys-bits too low (32) >>>> However, the following are allowed since on both cases physical address >>>> space of the processor is 36 bits: >>>> $ ./qemu-system-x86_64 -cpu pentium2 -m size=10G >>>> $ ./qemu-system-x86_64 -cpu pentium,pse36=on -m size=10G >>>> For 32-bit, without PAE/PSE36, hotplugging additional memory is no longer allowed. >>>> $ ./qemu-system-i386 -m size=1G,maxmem=3G,slots=2 >>>> qemu-system-i386: Address space limit 0xffffffff < 0x1ffffffff phys-bits too low (32) >>>> $ ./qemu-system-i386 -machine q35 -m size=1G,maxmem=3G,slots=2 >>>> qemu-system-i386: Address space limit 0xffffffff < 0x1ffffffff phys-bits too low (32) >>>> A new compatibility flag is introduced to make sure pc_max_used_gpa() keeps >>>> returning the old value for machines 8.1 and older. >>>> Therefore, the above is still allowed for older machine types in order to support >>>> compatibility. Hence, the following still works: >>>> $ ./qemu-system-i386 -machine pc-i440fx-8.1 -m size=1G,maxmem=3G,slots=2 >>>> $ ./qemu-system-i386 -machine pc-q35-8.1 -m size=1G,maxmem=3G,slots=2 >>>> Further, following is also allowed as with PSE36, the processor has 36-bit >>>> address space: >>>> $ ./qemu-system-i386 -cpu 486,pse36=on -m size=1G,maxmem=3G,slots=2 >>>> After calling CPUID with EAX=0x80000001, all AMD64 compliant processors >>>> have the longmode-capable-bit turned on in the extended feature flags (bit 29) >>>> in EDX. The absence of CPUID longmode can be used to differentiate between >>>> 32-bit and 64-bit processors and is the recommended approach. QEMU takes this >>>> approach elsewhere (for example, please see x86_cpu_realizefn()), With >>>> this change, pc_max_used_gpa() also uses the same method to detect 32-bit >>>> processors. >>>> Unit tests are modified to not run 32-bit x86 tests that use memory hotplug. >>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>> Signed-off-by: Ani Sinha <anisinha@redhat.com> >>>> --- >>>> hw/i386/pc.c | 31 ++++++++++++++++++++++++++++--- >>>> hw/i386/pc_piix.c | 4 ++++ >>>> hw/i386/pc_q35.c | 2 ++ >>>> include/hw/i386/pc.h | 6 ++++++ >>>> tests/qtest/bios-tables-test.c | 26 ++++++++++++++++++-------- >>>> tests/qtest/numa-test.c | 7 ++++++- >>>> 6 files changed, 64 insertions(+), 12 deletions(-) >>> >>> >>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c >>>> index 54838c0c41..2a689cf0bd 100644 >>>> --- a/hw/i386/pc.c >>>> +++ b/hw/i386/pc.c >>>> @@ -907,12 +907,37 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms) >>>> static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size) >>>> { >>>> X86CPU *cpu = X86_CPU(first_cpu); >>>> + PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); >>>> + MachineState *ms = MACHINE(pcms); >>>> + uint64_t devmem_start = 0; >>>> + ram_addr_t devmem_size = 0; >>>> - /* 32-bit systems don't have hole64 thus return max CPU address */ >>>> - if (cpu->phys_bits <= 32) { >>>> - return ((hwaddr)1 << cpu->phys_bits) - 1; >>>> + /* >>>> + * 32-bit systems don't have hole64 but they might have a region for >>>> + * memory devices. Even if additional hotplugged memory devices might >>>> + * not be usable by most guest OSes, we need to still consider them for >>>> + * calculating the highest possible GPA so that we can properly report >>>> + * if someone configures them on a CPU that cannot possibly address them. >>>> + */ >>>> + if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) { >>>> + /* 32-bit systems */ >>>> + if (!pcmc->broken_32bit_mem_addr_check) { >>> >>> Nitpicking, code is simplified if you invert this condition check. >> Maybe it reads better, but simplified? .. I am not so sure. > > As: > > -- >8 -- > @@ -907,12 +907,35 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms) > static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size) > { > X86CPU *cpu = X86_CPU(first_cpu); > + PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); > + MachineState *ms = MACHINE(pcms); > > - /* 32-bit systems don't have hole64 thus return max CPU address */ > - if (cpu->phys_bits <= 32) { > - return ((hwaddr)1 << cpu->phys_bits) - 1; > + /* > + * 32-bit systems don't have hole64 but they might have a region for > + * memory devices. Even if additional hotplugged memory devices might > + * not be usable by most guest OSes, we need to still consider them for > + * calculating the highest possible GPA so that we can properly report > + * if someone configures them on a CPU that cannot possibly address them. > + */ > + if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) { > + /* 32-bit systems */ > + if (pcmc->broken_32bit_mem_addr_check) { > + /* old value for compatibility reasons */ > + return ((hwaddr)1 << cpu->phys_bits) - 1; > + } > + if (pcmc->has_reserved_memory && (ms->ram_size < ms->maxram_size)) { > + uint64_t devmem_start = 0; > + ram_addr_t devmem_size = 0; > + > + pc_get_device_memory_range(pcms, &devmem_start, &devmem_size); > + devmem_start += devmem_size; > + > + return devmem_start - 1; > + } > + return pc_above_4g_end(pcms) - 1; > } > > + /* 64-bit systems */ > return pc_pci_hole64_start() + pci_hole64_size - 1; > } > --- > > But then even simpler (to review): > > -- >8 -- > @@ -907,13 +907,35 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms) > static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size) > { > X86CPU *cpu = X86_CPU(first_cpu); > + PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); > + MachineState *ms = MACHINE(pcms); > > - /* 32-bit systems don't have hole64 thus return max CPU address */ > - if (cpu->phys_bits <= 32) { > - return ((hwaddr)1 << cpu->phys_bits) - 1; > + if (cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) { > + /* 64-bit systems */ > + return pc_pci_hole64_start() + pci_hole64_size - 1; > } > > - return pc_pci_hole64_start() + pci_hole64_size - 1; > + /* > + * 32-bit systems don't have hole64 but they might have a region for > + * memory devices. Even if additional hotplugged memory devices might > + * not be usable by most guest OSes, we need to still consider them for > + * calculating the highest possible GPA so that we can properly report > + * if someone configures them on a CPU that cannot possibly address them. > + */ > + if (pcmc->broken_32bit_mem_addr_check) { > + /* old value for compatibility reasons */ > + return ((hwaddr)1 << cpu->phys_bits) - 1; > + } > + if (pcmc->has_reserved_memory && (ms->ram_size < ms->maxram_size)) { > + uint64_t devmem_start = 0; > + ram_addr_t devmem_size = 0; > + > + pc_get_device_memory_range(pcms, &devmem_start, &devmem_size); > + devmem_start += devmem_size; > + > + return devmem_start - 1; > + } > + return pc_above_4g_end(pcms) - 1; > } > --- Yeah I started reworking it before your email and realized it does make the function much cleaner. > > Personally I'd do it in 2 steps, first invert the if statement: > > -- >8 -- > @@ -908,12 +908,13 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size) > { > X86CPU *cpu = X86_CPU(first_cpu); > > - /* 32-bit systems don't have hole64 thus return max CPU address */ > - if (cpu->phys_bits <= 32) { > - return ((hwaddr)1 << cpu->phys_bits) - 1; > + if (cpu->phys_bits > 32) { > + /* 64-bit systems */ > + return pc_pci_hole64_start() + pci_hole64_size - 1; > } > > - return pc_pci_hole64_start() + pci_hole64_size - 1; > + /* 32-bit systems don't have hole64 thus return max CPU address */ > + return ((hwaddr)1 << cpu->phys_bits) - 1; > } > --- > > Then your patch as: > > -- >8 -- > @@ -907,14 +907,35 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms) > static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size) > { > X86CPU *cpu = X86_CPU(first_cpu); > + PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); > + MachineState *ms = MACHINE(pcms); > > - if (cpu->phys_bits > 32) { > + if (cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) { > /* 64-bit systems */ > return pc_pci_hole64_start() + pci_hole64_size - 1; > } > > - /* 32-bit systems don't have hole64 thus return max CPU address */ > - return ((hwaddr)1 << cpu->phys_bits) - 1; > + /* > + * 32-bit systems don't have hole64 but they might have a region for > + * memory devices. Even if additional hotplugged memory devices might > + * not be usable by most guest OSes, we need to still consider them for > + * calculating the highest possible GPA so that we can properly report > + * if someone configures them on a CPU that cannot possibly address them. > + */ > + if (pcmc->broken_32bit_mem_addr_check) { > + /* old value for compatibility reasons */ > + return ((hwaddr)1 << cpu->phys_bits) - 1; > + } > + if (pcmc->has_reserved_memory && (ms->ram_size < ms->maxram_size)) { > + uint64_t devmem_start = 0; > + ram_addr_t devmem_size = 0; > + > + pc_get_device_memory_range(pcms, &devmem_start, &devmem_size); > + devmem_start += devmem_size; > + > + return devmem_start - 1; > + } > + return pc_above_4g_end(pcms) - 1; > } > --- > >>> >>>> + if (pcmc->has_reserved_memory && >>>> + (ms->ram_size < ms->maxram_size)) { >>>> + pc_get_device_memory_range(pcms, &devmem_start, >>>> + &devmem_size); >>>> + devmem_start += devmem_size; >>>> + return devmem_start - 1; >>>> + } else { >>>> + return pc_above_4g_end(pcms) - 1; >>>> + } >>>> + } else { >>>> + /* old value for compatibility reasons */ >>>> + return ((hwaddr)1 << cpu->phys_bits) - 1; > > Left shifting by N and removing 1 makes a mask of N bits, > but I had to do the logical operation mentally to figure it out, > >>> >>> Since you change this line, can we convert to >>> MAKE_64BIT_MASK(0, cpu->phys_bits) ? > > while here this macro says "make a mask" without having to look > at the definition. I will leave this part as is for now. > > Anyhow as I said, "nitpicking...". > >> Doesn’t the existing code reads better? Assuming that the macro does exactly the same thing, one has to still look up the definition. And >> (((~0ULL) >> (64 - (length))) << (shift)) >> Is such a brain twister :-) >>> >>>> + } >>>> } >>>> + /* 64-bit systems */ >>>> return pc_pci_hole64_start() + pci_hole64_size - 1; >>>> }
diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 54838c0c41..2a689cf0bd 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -907,12 +907,37 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms) static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size) { X86CPU *cpu = X86_CPU(first_cpu); + PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); + MachineState *ms = MACHINE(pcms); + uint64_t devmem_start = 0; + ram_addr_t devmem_size = 0; - /* 32-bit systems don't have hole64 thus return max CPU address */ - if (cpu->phys_bits <= 32) { - return ((hwaddr)1 << cpu->phys_bits) - 1; + /* + * 32-bit systems don't have hole64 but they might have a region for + * memory devices. Even if additional hotplugged memory devices might + * not be usable by most guest OSes, we need to still consider them for + * calculating the highest possible GPA so that we can properly report + * if someone configures them on a CPU that cannot possibly address them. + */ + if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) { + /* 32-bit systems */ + if (!pcmc->broken_32bit_mem_addr_check) { + if (pcmc->has_reserved_memory && + (ms->ram_size < ms->maxram_size)) { + pc_get_device_memory_range(pcms, &devmem_start, + &devmem_size); + devmem_start += devmem_size; + return devmem_start - 1; + } else { + return pc_above_4g_end(pcms) - 1; + } + } else { + /* old value for compatibility reasons */ + return ((hwaddr)1 << cpu->phys_bits) - 1; + } } + /* 64-bit systems */ return pc_pci_hole64_start() + pci_hole64_size - 1; } diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 8321f36f97..71003759bb 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -517,9 +517,13 @@ DEFINE_I440FX_MACHINE(v8_2, "pc-i440fx-8.2", NULL, static void pc_i440fx_8_1_machine_options(MachineClass *m) { + PCMachineClass *pcmc = PC_MACHINE_CLASS(m); + pc_i440fx_8_2_machine_options(m); m->alias = NULL; m->is_default = false; + pcmc->broken_32bit_mem_addr_check = true; + compat_props_add(m->compat_props, hw_compat_8_1, hw_compat_8_1_len); compat_props_add(m->compat_props, pc_compat_8_1, pc_compat_8_1_len); } diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index 2dd1158b70..a7386f2ca2 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -394,8 +394,10 @@ DEFINE_Q35_MACHINE(v8_2, "pc-q35-8.2", NULL, static void pc_q35_8_1_machine_options(MachineClass *m) { + PCMachineClass *pcmc = PC_MACHINE_CLASS(m); pc_q35_8_2_machine_options(m); m->alias = NULL; + pcmc->broken_32bit_mem_addr_check = true; compat_props_add(m->compat_props, hw_compat_8_1, hw_compat_8_1_len); compat_props_add(m->compat_props, pc_compat_8_1, pc_compat_8_1_len); } diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 0fabece236..bec38cb92c 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -129,6 +129,12 @@ struct PCMachineClass { /* resizable acpi blob compat */ bool resizable_acpi_blob; + + /* + * whether the machine type implements broken 32-bit address space bound + * check for memory. + */ + bool broken_32bit_mem_addr_check; }; #define TYPE_PC_MACHINE "generic-pc-machine" diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c index d1b80149f2..f8e03dfd46 100644 --- a/tests/qtest/bios-tables-test.c +++ b/tests/qtest/bios-tables-test.c @@ -2080,7 +2080,6 @@ int main(int argc, char *argv[]) test_acpi_piix4_no_acpi_pci_hotplug); qtest_add_func("acpi/piix4/ipmi", test_acpi_piix4_tcg_ipmi); qtest_add_func("acpi/piix4/cpuhp", test_acpi_piix4_tcg_cphp); - qtest_add_func("acpi/piix4/memhp", test_acpi_piix4_tcg_memhp); qtest_add_func("acpi/piix4/numamem", test_acpi_piix4_tcg_numamem); qtest_add_func("acpi/piix4/nosmm", test_acpi_piix4_tcg_nosmm); qtest_add_func("acpi/piix4/smm-compat", @@ -2088,9 +2087,15 @@ int main(int argc, char *argv[]) qtest_add_func("acpi/piix4/smm-compat-nosmm", test_acpi_piix4_tcg_smm_compat_nosmm); qtest_add_func("acpi/piix4/nohpet", test_acpi_piix4_tcg_nohpet); - qtest_add_func("acpi/piix4/dimmpxm", test_acpi_piix4_tcg_dimm_pxm); - qtest_add_func("acpi/piix4/acpihmat", - test_acpi_piix4_tcg_acpi_hmat); + + /* i386 does not support memory hotplug */ + if (strcmp(arch, "i386")) { + qtest_add_func("acpi/piix4/memhp", test_acpi_piix4_tcg_memhp); + qtest_add_func("acpi/piix4/dimmpxm", + test_acpi_piix4_tcg_dimm_pxm); + qtest_add_func("acpi/piix4/acpihmat", + test_acpi_piix4_tcg_acpi_hmat); + } #ifdef CONFIG_POSIX qtest_add_func("acpi/piix4/acpierst", test_acpi_piix4_acpi_erst); #endif @@ -2108,11 +2113,9 @@ int main(int argc, char *argv[]) test_acpi_q35_tcg_no_acpi_hotplug); qtest_add_func("acpi/q35/multif-bridge", test_acpi_q35_multif_bridge); - qtest_add_func("acpi/q35/mmio64", test_acpi_q35_tcg_mmio64); qtest_add_func("acpi/q35/ipmi", test_acpi_q35_tcg_ipmi); qtest_add_func("acpi/q35/smbus/ipmi", test_acpi_q35_tcg_smbus_ipmi); qtest_add_func("acpi/q35/cpuhp", test_acpi_q35_tcg_cphp); - qtest_add_func("acpi/q35/memhp", test_acpi_q35_tcg_memhp); qtest_add_func("acpi/q35/numamem", test_acpi_q35_tcg_numamem); qtest_add_func("acpi/q35/nosmm", test_acpi_q35_tcg_nosmm); qtest_add_func("acpi/q35/smm-compat", @@ -2120,10 +2123,17 @@ int main(int argc, char *argv[]) qtest_add_func("acpi/q35/smm-compat-nosmm", test_acpi_q35_tcg_smm_compat_nosmm); qtest_add_func("acpi/q35/nohpet", test_acpi_q35_tcg_nohpet); - qtest_add_func("acpi/q35/dimmpxm", test_acpi_q35_tcg_dimm_pxm); - qtest_add_func("acpi/q35/acpihmat", test_acpi_q35_tcg_acpi_hmat); qtest_add_func("acpi/q35/acpihmat-noinitiator", test_acpi_q35_tcg_acpi_hmat_noinitiator); + + /* i386 does not support memory hotplug */ + if (strcmp(arch, "i386")) { + qtest_add_func("acpi/q35/memhp", test_acpi_q35_tcg_memhp); + qtest_add_func("acpi/q35/dimmpxm", test_acpi_q35_tcg_dimm_pxm); + qtest_add_func("acpi/q35/acpihmat", + test_acpi_q35_tcg_acpi_hmat); + qtest_add_func("acpi/q35/mmio64", test_acpi_q35_tcg_mmio64); + } #ifdef CONFIG_POSIX qtest_add_func("acpi/q35/acpierst", test_acpi_q35_acpi_erst); #endif diff --git a/tests/qtest/numa-test.c b/tests/qtest/numa-test.c index c5eb13f349..4f4404a4b1 100644 --- a/tests/qtest/numa-test.c +++ b/tests/qtest/numa-test.c @@ -568,7 +568,7 @@ int main(int argc, char **argv) qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial); qtest_add_data_func("/numa/qmp/cpus/query-cpus", args, test_query_cpus); - if (!strcmp(arch, "i386") || !strcmp(arch, "x86_64")) { + if (!strcmp(arch, "x86_64")) { qtest_add_data_func("/numa/pc/cpu/explicit", args, pc_numa_cpu); qtest_add_data_func("/numa/pc/dynamic/cpu", args, pc_dynamic_cpu_cfg); qtest_add_data_func("/numa/pc/hmat/build", args, pc_hmat_build_cfg); @@ -576,6 +576,11 @@ int main(int argc, char **argv) qtest_add_data_func("/numa/pc/hmat/erange", args, pc_hmat_erange_cfg); } + if (!strcmp(arch, "i386")) { + qtest_add_data_func("/numa/pc/cpu/explicit", args, pc_numa_cpu); + qtest_add_data_func("/numa/pc/dynamic/cpu", args, pc_dynamic_cpu_cfg); + } + if (!strcmp(arch, "ppc64")) { qtest_add_data_func("/numa/spapr/cpu/explicit", args, spapr_numa_cpu); }
32-bit x86 systems do not have a reserved memory for hole64. On those 32-bit systems without PSE36 or PAE CPU features, hotplugging memory devices are not supported by QEMU as QEMU always places hotplugged memory above 4 GiB boundary which is beyond the physical address space of the processor. Linux guests also does not support memory hotplug on those systems. Please see Linux kernel commit b59d02ed08690 ("mm/memory_hotplug: disable the functionality for 32b") for more details. Therefore, the maximum limit of the guest physical address in the absence of additional memory devices effectively coincides with the end of "above 4G memory space" region for 32-bit x86 without PAE/PSE36. When users configure additional memory devices, after properly accounting for the additional device memory region to find the maximum value of the guest physical address, the address will be outside the range of the processor's physical address space. This change adds improvements to take above into consideration. For example, previously this was allowed: $ ./qemu-system-x86_64 -cpu pentium -m size=10G With this change now it is no longer allowed: $ ./qemu-system-x86_64 -cpu pentium -m size=10G qemu-system-x86_64: Address space limit 0xffffffff < 0x2bfffffff phys-bits too low (32) However, the following are allowed since on both cases physical address space of the processor is 36 bits: $ ./qemu-system-x86_64 -cpu pentium2 -m size=10G $ ./qemu-system-x86_64 -cpu pentium,pse36=on -m size=10G For 32-bit, without PAE/PSE36, hotplugging additional memory is no longer allowed. $ ./qemu-system-i386 -m size=1G,maxmem=3G,slots=2 qemu-system-i386: Address space limit 0xffffffff < 0x1ffffffff phys-bits too low (32) $ ./qemu-system-i386 -machine q35 -m size=1G,maxmem=3G,slots=2 qemu-system-i386: Address space limit 0xffffffff < 0x1ffffffff phys-bits too low (32) A new compatibility flag is introduced to make sure pc_max_used_gpa() keeps returning the old value for machines 8.1 and older. Therefore, the above is still allowed for older machine types in order to support compatibility. Hence, the following still works: $ ./qemu-system-i386 -machine pc-i440fx-8.1 -m size=1G,maxmem=3G,slots=2 $ ./qemu-system-i386 -machine pc-q35-8.1 -m size=1G,maxmem=3G,slots=2 Further, following is also allowed as with PSE36, the processor has 36-bit address space: $ ./qemu-system-i386 -cpu 486,pse36=on -m size=1G,maxmem=3G,slots=2 After calling CPUID with EAX=0x80000001, all AMD64 compliant processors have the longmode-capable-bit turned on in the extended feature flags (bit 29) in EDX. The absence of CPUID longmode can be used to differentiate between 32-bit and 64-bit processors and is the recommended approach. QEMU takes this approach elsewhere (for example, please see x86_cpu_realizefn()), With this change, pc_max_used_gpa() also uses the same method to detect 32-bit processors. Unit tests are modified to not run 32-bit x86 tests that use memory hotplug. Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ani Sinha <anisinha@redhat.com> --- hw/i386/pc.c | 31 ++++++++++++++++++++++++++++--- hw/i386/pc_piix.c | 4 ++++ hw/i386/pc_q35.c | 2 ++ include/hw/i386/pc.h | 6 ++++++ tests/qtest/bios-tables-test.c | 26 ++++++++++++++++++-------- tests/qtest/numa-test.c | 7 ++++++- 6 files changed, 64 insertions(+), 12 deletions(-) changelog: v4: address comments from v3. Fix a bug where compat knob was absent from q35 machines. Commit message adjustment. v3: still accounting for additional memory device region above 4G. unit tests fixed (not running for 32-bit where mem hotplug is used). v2: removed memory hotplug region from max_gpa. added compat knobs.