Message ID | 20240711074822.3384344-1-imammedo@redhat.com |
---|---|
State | New |
Headers | show |
Series | smbios: make memory device size configurable per Machine | expand |
Hi Igor, On 11/7/24 09:48, Igor Mammedov wrote: > Currently SMBIOS maximum memory device chunk is capped at 16Gb, > which is fine for the most cases (QEMU uses it to describe initial > RAM (type 17 SMBIOS table entries)). > However when starting guest with terabytes of RAM this leads to > too many memory device structures, which eventually upsets linux > kernel as it reserves only 64K for these entries and when that > border is crossed out it runs out of reserved memory. > > Instead of partitioning initial RAM on 16Gb chunks, use maximum > possible chunk size that SMBIOS spec allows[1]. Which lets > encode RAM in Mb units in uint32_t-1 field (upto 2047Tb). > As result initial RAM will generate only one type 17 structure > until host/guest reach ability to use more RAM in the future. > > Compat changes: > We can't unconditionally change chunk size as it will break > QEMU<->guest ABI (and migration). Thus introduce a new machine class > field that would let older versioned machines to use 16Gb chunks > while new machine type could use maximum possible chunk size. > > While it might seem to be risky to rise max entry size this much > (much beyond of what current physical RAM modules support), > I'd not expect it causing much issues, modulo uncovering bugs > in software running within guest. And those should be fixed > on guest side to handle SMBIOS spec properly, especially if > guest is expected to support so huge RAM configs. > In worst case, QEMU can reduce chunk size later if we would > care enough about introducing a workaround for some 'unfixable' > guest OS, either by fixing up the next machine type or > giving users a CLI option to customize it. > > 1) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size > > PS: > * tested on 8Tb host with RHEL6 guest, which seems to parse > type 17 SMBIOS table entries correctly (according to 'dmidecode'). > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > --- > include/hw/boards.h | 4 ++++ > hw/arm/virt.c | 1 + > hw/core/machine.c | 1 + > hw/i386/pc_piix.c | 1 + > hw/i386/pc_q35.c | 1 + > hw/smbios/smbios.c | 11 ++++++----- > 6 files changed, 14 insertions(+), 5 deletions(-) > > diff --git a/include/hw/boards.h b/include/hw/boards.h > index ef6f18f2c1..48ff6d8b93 100644 > --- a/include/hw/boards.h > +++ b/include/hw/boards.h > @@ -237,6 +237,9 @@ typedef struct { > * purposes only. > * Applies only to default memory backend, i.e., explicit memory backend > * wasn't used. > + * @smbios_memory_device_size: > + * Default size of memory device, > + * SMBIOS 3.1.0 "7.18 Memory Device (Type 17)" > */ > struct MachineClass { > /*< private >*/ > @@ -304,6 +307,7 @@ struct MachineClass { > const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); > int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx); > ram_addr_t (*fixup_ram_size)(ram_addr_t size); > + uint64_t smbios_memory_device_size; Quick notes since I'm on holidays (not meant to block this patch): - How will evolve this machine class property in the context of a heterogeneous machine (i.e. x86_64 cores and 1 riscv32 one)? - Should this become a SmbiosProviderInterface later? > }; > > /** > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > index b0c68d66a3..719e83e6a1 100644 > --- a/hw/arm/virt.c > +++ b/hw/arm/virt.c > @@ -3308,6 +3308,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 1) > static void virt_machine_9_0_options(MachineClass *mc) > { > virt_machine_9_1_options(mc); > + mc->smbios_memory_device_size = 16 * GiB; > compat_props_add(mc->compat_props, hw_compat_9_0, hw_compat_9_0_len); > } > DEFINE_VIRT_MACHINE(9, 0) [...]
On Thu, 11 Jul 2024 10:19:27 +0200 Philippe Mathieu-Daudé <philmd@linaro.org> wrote: > Hi Igor, > > On 11/7/24 09:48, Igor Mammedov wrote: > > Currently SMBIOS maximum memory device chunk is capped at 16Gb, > > which is fine for the most cases (QEMU uses it to describe initial > > RAM (type 17 SMBIOS table entries)). > > However when starting guest with terabytes of RAM this leads to > > too many memory device structures, which eventually upsets linux > > kernel as it reserves only 64K for these entries and when that > > border is crossed out it runs out of reserved memory. > > > > Instead of partitioning initial RAM on 16Gb chunks, use maximum > > possible chunk size that SMBIOS spec allows[1]. Which lets > > encode RAM in Mb units in uint32_t-1 field (upto 2047Tb). > > As result initial RAM will generate only one type 17 structure > > until host/guest reach ability to use more RAM in the future. > > > > Compat changes: > > We can't unconditionally change chunk size as it will break > > QEMU<->guest ABI (and migration). Thus introduce a new machine class > > field that would let older versioned machines to use 16Gb chunks > > while new machine type could use maximum possible chunk size. > > > > While it might seem to be risky to rise max entry size this much > > (much beyond of what current physical RAM modules support), > > I'd not expect it causing much issues, modulo uncovering bugs > > in software running within guest. And those should be fixed > > on guest side to handle SMBIOS spec properly, especially if > > guest is expected to support so huge RAM configs. > > In worst case, QEMU can reduce chunk size later if we would > > care enough about introducing a workaround for some 'unfixable' > > guest OS, either by fixing up the next machine type or > > giving users a CLI option to customize it. > > > > 1) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size > > > > PS: > > * tested on 8Tb host with RHEL6 guest, which seems to parse > > type 17 SMBIOS table entries correctly (according to 'dmidecode'). > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > --- > > include/hw/boards.h | 4 ++++ > > hw/arm/virt.c | 1 + > > hw/core/machine.c | 1 + > > hw/i386/pc_piix.c | 1 + > > hw/i386/pc_q35.c | 1 + > > hw/smbios/smbios.c | 11 ++++++----- > > 6 files changed, 14 insertions(+), 5 deletions(-) > > > > diff --git a/include/hw/boards.h b/include/hw/boards.h > > index ef6f18f2c1..48ff6d8b93 100644 > > --- a/include/hw/boards.h > > +++ b/include/hw/boards.h > > @@ -237,6 +237,9 @@ typedef struct { > > * purposes only. > > * Applies only to default memory backend, i.e., explicit memory backend > > * wasn't used. > > + * @smbios_memory_device_size: > > + * Default size of memory device, > > + * SMBIOS 3.1.0 "7.18 Memory Device (Type 17)" > > */ > > struct MachineClass { > > /*< private >*/ > > @@ -304,6 +307,7 @@ struct MachineClass { > > const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); > > int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx); > > ram_addr_t (*fixup_ram_size)(ram_addr_t size); > > + uint64_t smbios_memory_device_size; > > Quick notes since I'm on holidays (not meant to block this patch): > > - How will evolve this machine class property in the context of > a heterogeneous machine (i.e. x86_64 cores and 1 riscv32 one)? I'm not aware of a SMBIOS spec (3.x) that cares about that heterogeneous setup yet. Are there anything in that area exists yet? > - Should this become a SmbiosProviderInterface later? if/when SMBIOS does get there (heterogeneous machines), introducing an interface might make a sense. > > > }; > > > > /** > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > index b0c68d66a3..719e83e6a1 100644 > > --- a/hw/arm/virt.c > > +++ b/hw/arm/virt.c > > @@ -3308,6 +3308,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 1) > > static void virt_machine_9_0_options(MachineClass *mc) > > { > > virt_machine_9_1_options(mc); > > + mc->smbios_memory_device_size = 16 * GiB; > > compat_props_add(mc->compat_props, hw_compat_9_0, hw_compat_9_0_len); > > } > > DEFINE_VIRT_MACHINE(9, 0) > > [...] >
On Thu, Jul 11, 2024 at 09:48:22AM +0200, Igor Mammedov wrote: > Currently SMBIOS maximum memory device chunk is capped at 16Gb, > which is fine for the most cases (QEMU uses it to describe initial > RAM (type 17 SMBIOS table entries)). > However when starting guest with terabytes of RAM this leads to > too many memory device structures, which eventually upsets linux > kernel as it reserves only 64K for these entries and when that > border is crossed out it runs out of reserved memory. > > Instead of partitioning initial RAM on 16Gb chunks, use maximum > possible chunk size that SMBIOS spec allows[1]. Which lets > encode RAM in Mb units in uint32_t-1 field (upto 2047Tb). > As result initial RAM will generate only one type 17 structure > until host/guest reach ability to use more RAM in the future. > > Compat changes: > We can't unconditionally change chunk size as it will break > QEMU<->guest ABI (and migration). Thus introduce a new machine class > field that would let older versioned machines to use 16Gb chunks > while new machine type could use maximum possible chunk size. > > While it might seem to be risky to rise max entry size this much > (much beyond of what current physical RAM modules support), > I'd not expect it causing much issues, modulo uncovering bugs > in software running within guest. And those should be fixed > on guest side to handle SMBIOS spec properly, especially if > guest is expected to support so huge RAM configs. > In worst case, QEMU can reduce chunk size later if we would > care enough about introducing a workaround for some 'unfixable' > guest OS, either by fixing up the next machine type or > giving users a CLI option to customize it. I was wondering what real hardware does, since the best way to avoid guest OS surprises is to align with real world behaviour. IIUC, there is usually one Type 17 structure per physical DIMM. Most QEMU configs don't express DIMMs as a concept so in that case, we can presume 1 virtual DIMM, and thus having one type 17 structure is a match for physical hw practices. What about when the QEMU config has used nvdimm, pc-dimm, or virtio-mem devices though ? It feels like the best practice would be to have a type 17 structure for each instance of one of those devices. > > 1) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size > > PS: > * tested on 8Tb host with RHEL6 guest, which seems to parse > type 17 SMBIOS table entries correctly (according to 'dmidecode'). > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > --- > include/hw/boards.h | 4 ++++ > hw/arm/virt.c | 1 + > hw/core/machine.c | 1 + > hw/i386/pc_piix.c | 1 + > hw/i386/pc_q35.c | 1 + > hw/smbios/smbios.c | 11 ++++++----- > 6 files changed, 14 insertions(+), 5 deletions(-) > > diff --git a/include/hw/boards.h b/include/hw/boards.h > index ef6f18f2c1..48ff6d8b93 100644 > --- a/include/hw/boards.h > +++ b/include/hw/boards.h > @@ -237,6 +237,9 @@ typedef struct { > * purposes only. > * Applies only to default memory backend, i.e., explicit memory backend > * wasn't used. > + * @smbios_memory_device_size: > + * Default size of memory device, > + * SMBIOS 3.1.0 "7.18 Memory Device (Type 17)" > */ > struct MachineClass { > /*< private >*/ > @@ -304,6 +307,7 @@ struct MachineClass { > const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); > int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx); > ram_addr_t (*fixup_ram_size)(ram_addr_t size); > + uint64_t smbios_memory_device_size; > }; > > /** > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > index b0c68d66a3..719e83e6a1 100644 > --- a/hw/arm/virt.c > +++ b/hw/arm/virt.c > @@ -3308,6 +3308,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 1) > static void virt_machine_9_0_options(MachineClass *mc) > { > virt_machine_9_1_options(mc); > + mc->smbios_memory_device_size = 16 * GiB; > compat_props_add(mc->compat_props, hw_compat_9_0, hw_compat_9_0_len); > } > DEFINE_VIRT_MACHINE(9, 0) > diff --git a/hw/core/machine.c b/hw/core/machine.c > index bc38cad7f2..3cfdaec65d 100644 > --- a/hw/core/machine.c > +++ b/hw/core/machine.c > @@ -1004,6 +1004,7 @@ static void machine_class_init(ObjectClass *oc, void *data) > /* Default 128 MB as guest ram size */ > mc->default_ram_size = 128 * MiB; > mc->rom_file_has_mr = true; > + mc->smbios_memory_device_size = 2047 * TiB; > > /* numa node memory size aligned on 8MB by default. > * On Linux, each node's border has to be 8MB aligned > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c > index 9445b07b4f..d9e69243b4 100644 > --- a/hw/i386/pc_piix.c > +++ b/hw/i386/pc_piix.c > @@ -495,6 +495,7 @@ static void pc_i440fx_machine_9_0_options(MachineClass *m) > pc_i440fx_machine_9_1_options(m); > m->alias = NULL; > m->is_default = false; > + m->smbios_memory_device_size = 16 * GiB; > > compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len); > compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len); > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c > index 71d3c6d122..9d108b194e 100644 > --- a/hw/i386/pc_q35.c > +++ b/hw/i386/pc_q35.c > @@ -374,6 +374,7 @@ static void pc_q35_machine_9_0_options(MachineClass *m) > PCMachineClass *pcmc = PC_MACHINE_CLASS(m); > pc_q35_machine_9_1_options(m); > m->alias = NULL; > + m->smbios_memory_device_size = 16 * GiB; > compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len); > compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len); > pcmc->isa_bios_alias = false; > diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c > index 3b7703489d..a394514264 100644 > --- a/hw/smbios/smbios.c > +++ b/hw/smbios/smbios.c > @@ -1093,6 +1093,7 @@ static bool smbios_get_tables_ep(MachineState *ms, > Error **errp) > { > unsigned i, dimm_cnt, offset; > + MachineClass *mc = MACHINE_GET_CLASS(ms); > ERRP_GUARD(); > > assert(ep_type == SMBIOS_ENTRY_POINT_TYPE_32 || > @@ -1123,12 +1124,12 @@ static bool smbios_get_tables_ep(MachineState *ms, > smbios_build_type_9_table(errp); > smbios_build_type_11_table(); > > -#define MAX_DIMM_SZ (16 * GiB) > -#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? MAX_DIMM_SZ \ > - : ((current_machine->ram_size - 1) % MAX_DIMM_SZ) + 1) > +#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? mc->smbios_memory_device_size \ > + : ((current_machine->ram_size - 1) % mc->smbios_memory_device_size) + 1) > > - dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, MAX_DIMM_SZ) / > - MAX_DIMM_SZ; > + dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, > + mc->smbios_memory_device_size) / > + mc->smbios_memory_device_size; > > /* > * The offset determines if we need to keep additional space between > -- > 2.43.0 > > With regards, Daniel
On Thu, 11 Jul 2024 09:43:46 +0100 Daniel P. Berrangé <berrange@redhat.com> wrote: > On Thu, Jul 11, 2024 at 09:48:22AM +0200, Igor Mammedov wrote: > > Currently SMBIOS maximum memory device chunk is capped at 16Gb, > > which is fine for the most cases (QEMU uses it to describe initial > > RAM (type 17 SMBIOS table entries)). > > However when starting guest with terabytes of RAM this leads to > > too many memory device structures, which eventually upsets linux > > kernel as it reserves only 64K for these entries and when that > > border is crossed out it runs out of reserved memory. > > > > Instead of partitioning initial RAM on 16Gb chunks, use maximum > > possible chunk size that SMBIOS spec allows[1]. Which lets > > encode RAM in Mb units in uint32_t-1 field (upto 2047Tb). > > As result initial RAM will generate only one type 17 structure > > until host/guest reach ability to use more RAM in the future. > > > > Compat changes: > > We can't unconditionally change chunk size as it will break > > QEMU<->guest ABI (and migration). Thus introduce a new machine class > > field that would let older versioned machines to use 16Gb chunks > > while new machine type could use maximum possible chunk size. > > > > While it might seem to be risky to rise max entry size this much > > (much beyond of what current physical RAM modules support), > > I'd not expect it causing much issues, modulo uncovering bugs > > in software running within guest. And those should be fixed > > on guest side to handle SMBIOS spec properly, especially if > > guest is expected to support so huge RAM configs. > > In worst case, QEMU can reduce chunk size later if we would > > care enough about introducing a workaround for some 'unfixable' > > guest OS, either by fixing up the next machine type or > > giving users a CLI option to customize it. > > I was wondering what real hardware does, since the best way to > avoid guest OS surprises is to align with real world behaviour. > IIUC, there is usually one Type 17 structure per physical > DIMM. > > Most QEMU configs don't express DIMMs as a concept so in that > case, we can presume 1 virtual DIMM, and thus having one type > 17 structure is a match for physical hw practices. > What about when the QEMU config has used nvdimm, pc-dimm, > or virtio-mem devices though ? It feels like the best practice > would be to have a type 17 structure for each instance of one > of those devices. QEMU doesn't expose any memory beside initial one in SMBIOS. So from guest introspection pov when using only SMBIOS, those do not exists. On tangent: I think exposing those with hotplug in place makes it messy especially with migration in mind (we would need to move smbios tables creation to reset time and enumerate all supported memory devices at that time to get somewhat reliable picture, which would reflect machine config _only_ at boot time). Also it would help to model initial RAM as DIMM(s) device to avoid faking RAM entry, and do it consistently with DIMM devices. (but yeah, nobody asked for anything like that so far). > > 1) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size > > > > PS: > > * tested on 8Tb host with RHEL6 guest, which seems to parse > > type 17 SMBIOS table entries correctly (according to 'dmidecode'). > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > --- > > include/hw/boards.h | 4 ++++ > > hw/arm/virt.c | 1 + > > hw/core/machine.c | 1 + > > hw/i386/pc_piix.c | 1 + > > hw/i386/pc_q35.c | 1 + > > hw/smbios/smbios.c | 11 ++++++----- > > 6 files changed, 14 insertions(+), 5 deletions(-) > > > > diff --git a/include/hw/boards.h b/include/hw/boards.h > > index ef6f18f2c1..48ff6d8b93 100644 > > --- a/include/hw/boards.h > > +++ b/include/hw/boards.h > > @@ -237,6 +237,9 @@ typedef struct { > > * purposes only. > > * Applies only to default memory backend, i.e., explicit memory backend > > * wasn't used. > > + * @smbios_memory_device_size: > > + * Default size of memory device, > > + * SMBIOS 3.1.0 "7.18 Memory Device (Type 17)" > > */ > > struct MachineClass { > > /*< private >*/ > > @@ -304,6 +307,7 @@ struct MachineClass { > > const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); > > int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx); > > ram_addr_t (*fixup_ram_size)(ram_addr_t size); > > + uint64_t smbios_memory_device_size; > > }; > > > > /** > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > index b0c68d66a3..719e83e6a1 100644 > > --- a/hw/arm/virt.c > > +++ b/hw/arm/virt.c > > @@ -3308,6 +3308,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 1) > > static void virt_machine_9_0_options(MachineClass *mc) > > { > > virt_machine_9_1_options(mc); > > + mc->smbios_memory_device_size = 16 * GiB; > > compat_props_add(mc->compat_props, hw_compat_9_0, hw_compat_9_0_len); > > } > > DEFINE_VIRT_MACHINE(9, 0) > > diff --git a/hw/core/machine.c b/hw/core/machine.c > > index bc38cad7f2..3cfdaec65d 100644 > > --- a/hw/core/machine.c > > +++ b/hw/core/machine.c > > @@ -1004,6 +1004,7 @@ static void machine_class_init(ObjectClass *oc, void *data) > > /* Default 128 MB as guest ram size */ > > mc->default_ram_size = 128 * MiB; > > mc->rom_file_has_mr = true; > > + mc->smbios_memory_device_size = 2047 * TiB; > > > > /* numa node memory size aligned on 8MB by default. > > * On Linux, each node's border has to be 8MB aligned > > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c > > index 9445b07b4f..d9e69243b4 100644 > > --- a/hw/i386/pc_piix.c > > +++ b/hw/i386/pc_piix.c > > @@ -495,6 +495,7 @@ static void pc_i440fx_machine_9_0_options(MachineClass *m) > > pc_i440fx_machine_9_1_options(m); > > m->alias = NULL; > > m->is_default = false; > > + m->smbios_memory_device_size = 16 * GiB; > > > > compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len); > > compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len); > > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c > > index 71d3c6d122..9d108b194e 100644 > > --- a/hw/i386/pc_q35.c > > +++ b/hw/i386/pc_q35.c > > @@ -374,6 +374,7 @@ static void pc_q35_machine_9_0_options(MachineClass *m) > > PCMachineClass *pcmc = PC_MACHINE_CLASS(m); > > pc_q35_machine_9_1_options(m); > > m->alias = NULL; > > + m->smbios_memory_device_size = 16 * GiB; > > compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len); > > compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len); > > pcmc->isa_bios_alias = false; > > diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c > > index 3b7703489d..a394514264 100644 > > --- a/hw/smbios/smbios.c > > +++ b/hw/smbios/smbios.c > > @@ -1093,6 +1093,7 @@ static bool smbios_get_tables_ep(MachineState *ms, > > Error **errp) > > { > > unsigned i, dimm_cnt, offset; > > + MachineClass *mc = MACHINE_GET_CLASS(ms); > > ERRP_GUARD(); > > > > assert(ep_type == SMBIOS_ENTRY_POINT_TYPE_32 || > > @@ -1123,12 +1124,12 @@ static bool smbios_get_tables_ep(MachineState *ms, > > smbios_build_type_9_table(errp); > > smbios_build_type_11_table(); > > > > -#define MAX_DIMM_SZ (16 * GiB) > > -#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? MAX_DIMM_SZ \ > > - : ((current_machine->ram_size - 1) % MAX_DIMM_SZ) + 1) > > +#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? mc->smbios_memory_device_size \ > > + : ((current_machine->ram_size - 1) % mc->smbios_memory_device_size) + 1) > > > > - dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, MAX_DIMM_SZ) / > > - MAX_DIMM_SZ; > > + dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, > > + mc->smbios_memory_device_size) / > > + mc->smbios_memory_device_size; > > > > /* > > * The offset determines if we need to keep additional space between > > -- > > 2.43.0 > > > > > > With regards, > Daniel
On Thu, Jul 11, 2024 at 09:48:22AM +0200, Igor Mammedov wrote: > Currently SMBIOS maximum memory device chunk is capped at 16Gb, > which is fine for the most cases (QEMU uses it to describe initial > RAM (type 17 SMBIOS table entries)). > However when starting guest with terabytes of RAM this leads to > too many memory device structures, which eventually upsets linux > kernel as it reserves only 64K for these entries and when that > border is crossed out it runs out of reserved memory. > > Instead of partitioning initial RAM on 16Gb chunks, use maximum > possible chunk size that SMBIOS spec allows[1]. Which lets > encode RAM in Mb units in uint32_t-1 field (upto 2047Tb). > As result initial RAM will generate only one type 17 structure > until host/guest reach ability to use more RAM in the future. > > Compat changes: > We can't unconditionally change chunk size as it will break > QEMU<->guest ABI (and migration). Thus introduce a new machine class > field that would let older versioned machines to use 16Gb chunks > while new machine type could use maximum possible chunk size. > > While it might seem to be risky to rise max entry size this much > (much beyond of what current physical RAM modules support), > I'd not expect it causing much issues, modulo uncovering bugs > in software running within guest. And those should be fixed > on guest side to handle SMBIOS spec properly, especially if > guest is expected to support so huge RAM configs. > In worst case, QEMU can reduce chunk size later if we would > care enough about introducing a workaround for some 'unfixable' > guest OS, either by fixing up the next machine type or > giving users a CLI option to customize it. > > 1) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size > > PS: > * tested on 8Tb host with RHEL6 guest, which seems to parse > type 17 SMBIOS table entries correctly (according to 'dmidecode'). > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > --- > include/hw/boards.h | 4 ++++ > hw/arm/virt.c | 1 + > hw/core/machine.c | 1 + > hw/i386/pc_piix.c | 1 + > hw/i386/pc_q35.c | 1 + > hw/smbios/smbios.c | 11 ++++++----- > 6 files changed, 14 insertions(+), 5 deletions(-) > > diff --git a/include/hw/boards.h b/include/hw/boards.h > index ef6f18f2c1..48ff6d8b93 100644 > --- a/include/hw/boards.h > +++ b/include/hw/boards.h > @@ -237,6 +237,9 @@ typedef struct { > * purposes only. > * Applies only to default memory backend, i.e., explicit memory backend > * wasn't used. > + * @smbios_memory_device_size: > + * Default size of memory device, > + * SMBIOS 3.1.0 "7.18 Memory Device (Type 17)" Maybe it would be better to just make this a boolean, and put the spec related logic in smbios.c ? WDYT? > */ > struct MachineClass { > /*< private >*/ > @@ -304,6 +307,7 @@ struct MachineClass { > const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); > int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx); > ram_addr_t (*fixup_ram_size)(ram_addr_t size); > + uint64_t smbios_memory_device_size; > }; > > /** > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > index b0c68d66a3..719e83e6a1 100644 > --- a/hw/arm/virt.c > +++ b/hw/arm/virt.c > @@ -3308,6 +3308,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 1) > static void virt_machine_9_0_options(MachineClass *mc) > { > virt_machine_9_1_options(mc); > + mc->smbios_memory_device_size = 16 * GiB; > compat_props_add(mc->compat_props, hw_compat_9_0, hw_compat_9_0_len); > } > DEFINE_VIRT_MACHINE(9, 0) > diff --git a/hw/core/machine.c b/hw/core/machine.c > index bc38cad7f2..3cfdaec65d 100644 > --- a/hw/core/machine.c > +++ b/hw/core/machine.c > @@ -1004,6 +1004,7 @@ static void machine_class_init(ObjectClass *oc, void *data) > /* Default 128 MB as guest ram size */ > mc->default_ram_size = 128 * MiB; > mc->rom_file_has_mr = true; > + mc->smbios_memory_device_size = 2047 * TiB; > > /* numa node memory size aligned on 8MB by default. > * On Linux, each node's border has to be 8MB aligned All these values really should be documented. And I feel > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c > index 9445b07b4f..d9e69243b4 100644 > --- a/hw/i386/pc_piix.c > +++ b/hw/i386/pc_piix.c > @@ -495,6 +495,7 @@ static void pc_i440fx_machine_9_0_options(MachineClass *m) > pc_i440fx_machine_9_1_options(m); > m->alias = NULL; > m->is_default = false; > + m->smbios_memory_device_size = 16 * GiB; > > compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len); > compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len); > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c > index 71d3c6d122..9d108b194e 100644 > --- a/hw/i386/pc_q35.c > +++ b/hw/i386/pc_q35.c > @@ -374,6 +374,7 @@ static void pc_q35_machine_9_0_options(MachineClass *m) > PCMachineClass *pcmc = PC_MACHINE_CLASS(m); > pc_q35_machine_9_1_options(m); > m->alias = NULL; > + m->smbios_memory_device_size = 16 * GiB; > compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len); > compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len); > pcmc->isa_bios_alias = false; > diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c > index 3b7703489d..a394514264 100644 > --- a/hw/smbios/smbios.c > +++ b/hw/smbios/smbios.c > @@ -1093,6 +1093,7 @@ static bool smbios_get_tables_ep(MachineState *ms, > Error **errp) > { > unsigned i, dimm_cnt, offset; > + MachineClass *mc = MACHINE_GET_CLASS(ms); > ERRP_GUARD(); > > assert(ep_type == SMBIOS_ENTRY_POINT_TYPE_32 || > @@ -1123,12 +1124,12 @@ static bool smbios_get_tables_ep(MachineState *ms, > smbios_build_type_9_table(errp); > smbios_build_type_11_table(); > > -#define MAX_DIMM_SZ (16 * GiB) > -#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? MAX_DIMM_SZ \ > - : ((current_machine->ram_size - 1) % MAX_DIMM_SZ) + 1) > +#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? mc->smbios_memory_device_size \ > + : ((current_machine->ram_size - 1) % mc->smbios_memory_device_size) + 1) > > - dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, MAX_DIMM_SZ) / > - MAX_DIMM_SZ; > + dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, > + mc->smbios_memory_device_size) / > + mc->smbios_memory_device_size; > > /* > * The offset determines if we need to keep additional space between > -- > 2.43.0
On Thu, 11 Jul 2024 07:13:27 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Thu, Jul 11, 2024 at 09:48:22AM +0200, Igor Mammedov wrote: > > Currently SMBIOS maximum memory device chunk is capped at 16Gb, > > which is fine for the most cases (QEMU uses it to describe initial > > RAM (type 17 SMBIOS table entries)). > > However when starting guest with terabytes of RAM this leads to > > too many memory device structures, which eventually upsets linux > > kernel as it reserves only 64K for these entries and when that > > border is crossed out it runs out of reserved memory. > > > > Instead of partitioning initial RAM on 16Gb chunks, use maximum > > possible chunk size that SMBIOS spec allows[1]. Which lets > > encode RAM in Mb units in uint32_t-1 field (upto 2047Tb). > > As result initial RAM will generate only one type 17 structure > > until host/guest reach ability to use more RAM in the future. > > > > Compat changes: > > We can't unconditionally change chunk size as it will break > > QEMU<->guest ABI (and migration). Thus introduce a new machine class > > field that would let older versioned machines to use 16Gb chunks > > while new machine type could use maximum possible chunk size. > > > > While it might seem to be risky to rise max entry size this much > > (much beyond of what current physical RAM modules support), > > I'd not expect it causing much issues, modulo uncovering bugs > > in software running within guest. And those should be fixed > > on guest side to handle SMBIOS spec properly, especially if > > guest is expected to support so huge RAM configs. > > In worst case, QEMU can reduce chunk size later if we would > > care enough about introducing a workaround for some 'unfixable' > > guest OS, either by fixing up the next machine type or > > giving users a CLI option to customize it. > > > > 1) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size > > > > PS: > > * tested on 8Tb host with RHEL6 guest, which seems to parse > > type 17 SMBIOS table entries correctly (according to 'dmidecode'). > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > --- > > include/hw/boards.h | 4 ++++ > > hw/arm/virt.c | 1 + > > hw/core/machine.c | 1 + > > hw/i386/pc_piix.c | 1 + > > hw/i386/pc_q35.c | 1 + > > hw/smbios/smbios.c | 11 ++++++----- > > 6 files changed, 14 insertions(+), 5 deletions(-) > > > > diff --git a/include/hw/boards.h b/include/hw/boards.h > > index ef6f18f2c1..48ff6d8b93 100644 > > --- a/include/hw/boards.h > > +++ b/include/hw/boards.h > > @@ -237,6 +237,9 @@ typedef struct { > > * purposes only. > > * Applies only to default memory backend, i.e., explicit memory backend > > * wasn't used. > > + * @smbios_memory_device_size: > > + * Default size of memory device, > > + * SMBIOS 3.1.0 "7.18 Memory Device (Type 17)" > > Maybe it would be better to just make this a boolean, > and put the spec related logic in smbios.c ? > WDYT? Using bool here, seems awkward to me, i.e. not clear semantics and compat handling would be complicated as well. And if we have to expose it someday to users, it would be logical to make it machine property. Given it's used not only by x86, having it as value here lets each machine to customize if necessary using well established pattern (incl. compat machinery) > > */ > > struct MachineClass { > > /*< private >*/ > > @@ -304,6 +307,7 @@ struct MachineClass { > > const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); > > int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx); > > ram_addr_t (*fixup_ram_size)(ram_addr_t size); > > + uint64_t smbios_memory_device_size; > > }; > > > > /** > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > index b0c68d66a3..719e83e6a1 100644 > > --- a/hw/arm/virt.c > > +++ b/hw/arm/virt.c > > @@ -3308,6 +3308,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 1) > > static void virt_machine_9_0_options(MachineClass *mc) > > { > > virt_machine_9_1_options(mc); > > + mc->smbios_memory_device_size = 16 * GiB; > > compat_props_add(mc->compat_props, hw_compat_9_0, hw_compat_9_0_len); > > } > > DEFINE_VIRT_MACHINE(9, 0) > > diff --git a/hw/core/machine.c b/hw/core/machine.c > > index bc38cad7f2..3cfdaec65d 100644 > > --- a/hw/core/machine.c > > +++ b/hw/core/machine.c > > @@ -1004,6 +1004,7 @@ static void machine_class_init(ObjectClass *oc, void *data) > > /* Default 128 MB as guest ram size */ > > mc->default_ram_size = 128 * MiB; > > mc->rom_file_has_mr = true; > > + mc->smbios_memory_device_size = 2047 * TiB; > > > > /* numa node memory size aligned on 8MB by default. > > * On Linux, each node's border has to be 8MB aligned > > > > All these values really should be documented. It's in commit message, but right I'll document value here on respin so it would be easier for reader to see where it comes from. > And I feel ??? > > > > > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c > > index 9445b07b4f..d9e69243b4 100644 > > --- a/hw/i386/pc_piix.c > > +++ b/hw/i386/pc_piix.c > > @@ -495,6 +495,7 @@ static void pc_i440fx_machine_9_0_options(MachineClass *m) > > pc_i440fx_machine_9_1_options(m); > > m->alias = NULL; > > m->is_default = false; > > + m->smbios_memory_device_size = 16 * GiB; > > > > compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len); > > compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len); > > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c > > index 71d3c6d122..9d108b194e 100644 > > --- a/hw/i386/pc_q35.c > > +++ b/hw/i386/pc_q35.c > > @@ -374,6 +374,7 @@ static void pc_q35_machine_9_0_options(MachineClass *m) > > PCMachineClass *pcmc = PC_MACHINE_CLASS(m); > > pc_q35_machine_9_1_options(m); > > m->alias = NULL; > > + m->smbios_memory_device_size = 16 * GiB; > > compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len); > > compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len); > > pcmc->isa_bios_alias = false; > > diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c > > index 3b7703489d..a394514264 100644 > > --- a/hw/smbios/smbios.c > > +++ b/hw/smbios/smbios.c > > @@ -1093,6 +1093,7 @@ static bool smbios_get_tables_ep(MachineState *ms, > > Error **errp) > > { > > unsigned i, dimm_cnt, offset; > > + MachineClass *mc = MACHINE_GET_CLASS(ms); > > ERRP_GUARD(); > > > > assert(ep_type == SMBIOS_ENTRY_POINT_TYPE_32 || > > @@ -1123,12 +1124,12 @@ static bool smbios_get_tables_ep(MachineState *ms, > > smbios_build_type_9_table(errp); > > smbios_build_type_11_table(); > > > > -#define MAX_DIMM_SZ (16 * GiB) > > -#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? MAX_DIMM_SZ \ > > - : ((current_machine->ram_size - 1) % MAX_DIMM_SZ) + 1) > > +#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? mc->smbios_memory_device_size \ > > + : ((current_machine->ram_size - 1) % mc->smbios_memory_device_size) + 1) > > > > - dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, MAX_DIMM_SZ) / > > - MAX_DIMM_SZ; > > + dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, > > + mc->smbios_memory_device_size) / > > + mc->smbios_memory_device_size; > > > > /* > > * The offset determines if we need to keep additional space between > > -- > > 2.43.0 >
On Thu, Jul 11, 2024 at 03:05:11PM +0200, Igor Mammedov wrote: > On Thu, 11 Jul 2024 07:13:27 -0400 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Thu, Jul 11, 2024 at 09:48:22AM +0200, Igor Mammedov wrote: > > > Currently SMBIOS maximum memory device chunk is capped at 16Gb, > > > which is fine for the most cases (QEMU uses it to describe initial > > > RAM (type 17 SMBIOS table entries)). > > > However when starting guest with terabytes of RAM this leads to > > > too many memory device structures, which eventually upsets linux > > > kernel as it reserves only 64K for these entries and when that > > > border is crossed out it runs out of reserved memory. > > > > > > Instead of partitioning initial RAM on 16Gb chunks, use maximum > > > possible chunk size that SMBIOS spec allows[1]. Which lets > > > encode RAM in Mb units in uint32_t-1 field (upto 2047Tb). > > > As result initial RAM will generate only one type 17 structure > > > until host/guest reach ability to use more RAM in the future. > > > > > > Compat changes: > > > We can't unconditionally change chunk size as it will break > > > QEMU<->guest ABI (and migration). Thus introduce a new machine class > > > field that would let older versioned machines to use 16Gb chunks > > > while new machine type could use maximum possible chunk size. > > > > > > While it might seem to be risky to rise max entry size this much > > > (much beyond of what current physical RAM modules support), > > > I'd not expect it causing much issues, modulo uncovering bugs > > > in software running within guest. And those should be fixed > > > on guest side to handle SMBIOS spec properly, especially if > > > guest is expected to support so huge RAM configs. > > > In worst case, QEMU can reduce chunk size later if we would > > > care enough about introducing a workaround for some 'unfixable' > > > guest OS, either by fixing up the next machine type or > > > giving users a CLI option to customize it. > > > > > > 1) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size > > > > > > PS: > > > * tested on 8Tb host with RHEL6 guest, which seems to parse > > > type 17 SMBIOS table entries correctly (according to 'dmidecode'). > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > > --- > > > include/hw/boards.h | 4 ++++ > > > hw/arm/virt.c | 1 + > > > hw/core/machine.c | 1 + > > > hw/i386/pc_piix.c | 1 + > > > hw/i386/pc_q35.c | 1 + > > > hw/smbios/smbios.c | 11 ++++++----- > > > 6 files changed, 14 insertions(+), 5 deletions(-) > > > > > > diff --git a/include/hw/boards.h b/include/hw/boards.h > > > index ef6f18f2c1..48ff6d8b93 100644 > > > --- a/include/hw/boards.h > > > +++ b/include/hw/boards.h > > > @@ -237,6 +237,9 @@ typedef struct { > > > * purposes only. > > > * Applies only to default memory backend, i.e., explicit memory backend > > > * wasn't used. > > > + * @smbios_memory_device_size: > > > + * Default size of memory device, > > > + * SMBIOS 3.1.0 "7.18 Memory Device (Type 17)" > > > > Maybe it would be better to just make this a boolean, > > and put the spec related logic in smbios.c ? > > WDYT? > > Using bool here, seems awkward to me, > i.e. not clear semantics and compat handling would be > complicated as well. > > And if we have to expose it someday to users, > it would be logical to make it machine property. > Given it's used not only by x86, having it as value > here lets each machine to customize if necessary > using well established pattern (incl. compat machinery) > > > > > */ > > > struct MachineClass { > > > /*< private >*/ > > > @@ -304,6 +307,7 @@ struct MachineClass { > > > const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); > > > int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx); > > > ram_addr_t (*fixup_ram_size)(ram_addr_t size); > > > + uint64_t smbios_memory_device_size; > > > }; > > > > > > /** > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > > index b0c68d66a3..719e83e6a1 100644 > > > --- a/hw/arm/virt.c > > > +++ b/hw/arm/virt.c > > > @@ -3308,6 +3308,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 1) > > > static void virt_machine_9_0_options(MachineClass *mc) > > > { > > > virt_machine_9_1_options(mc); > > > + mc->smbios_memory_device_size = 16 * GiB; > > > compat_props_add(mc->compat_props, hw_compat_9_0, hw_compat_9_0_len); > > > } > > > DEFINE_VIRT_MACHINE(9, 0) > > > diff --git a/hw/core/machine.c b/hw/core/machine.c > > > index bc38cad7f2..3cfdaec65d 100644 > > > --- a/hw/core/machine.c > > > +++ b/hw/core/machine.c > > > @@ -1004,6 +1004,7 @@ static void machine_class_init(ObjectClass *oc, void *data) > > > /* Default 128 MB as guest ram size */ > > > mc->default_ram_size = 128 * MiB; > > > mc->rom_file_has_mr = true; > > > + mc->smbios_memory_device_size = 2047 * TiB; > > > > > > /* numa node memory size aligned on 8MB by default. > > > * On Linux, each node's border has to be 8MB aligned > > > > > > > > All these values really should be documented. > It's in commit message, but right I'll document value here > on respin so it would be easier for reader to see where it > comes from. > > > And I feel > ??? sorry. what I said above. can we find some place where it is not awkward to quote spec? > > > > > > > > > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c > > > index 9445b07b4f..d9e69243b4 100644 > > > --- a/hw/i386/pc_piix.c > > > +++ b/hw/i386/pc_piix.c > > > @@ -495,6 +495,7 @@ static void pc_i440fx_machine_9_0_options(MachineClass *m) > > > pc_i440fx_machine_9_1_options(m); > > > m->alias = NULL; > > > m->is_default = false; > > > + m->smbios_memory_device_size = 16 * GiB; > > > > > > compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len); > > > compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len); > > > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c > > > index 71d3c6d122..9d108b194e 100644 > > > --- a/hw/i386/pc_q35.c > > > +++ b/hw/i386/pc_q35.c > > > @@ -374,6 +374,7 @@ static void pc_q35_machine_9_0_options(MachineClass *m) > > > PCMachineClass *pcmc = PC_MACHINE_CLASS(m); > > > pc_q35_machine_9_1_options(m); > > > m->alias = NULL; > > > + m->smbios_memory_device_size = 16 * GiB; > > > compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len); > > > compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len); > > > pcmc->isa_bios_alias = false; > > > diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c > > > index 3b7703489d..a394514264 100644 > > > --- a/hw/smbios/smbios.c > > > +++ b/hw/smbios/smbios.c > > > @@ -1093,6 +1093,7 @@ static bool smbios_get_tables_ep(MachineState *ms, > > > Error **errp) > > > { > > > unsigned i, dimm_cnt, offset; > > > + MachineClass *mc = MACHINE_GET_CLASS(ms); > > > ERRP_GUARD(); > > > > > > assert(ep_type == SMBIOS_ENTRY_POINT_TYPE_32 || > > > @@ -1123,12 +1124,12 @@ static bool smbios_get_tables_ep(MachineState *ms, > > > smbios_build_type_9_table(errp); > > > smbios_build_type_11_table(); > > > > > > -#define MAX_DIMM_SZ (16 * GiB) > > > -#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? MAX_DIMM_SZ \ > > > - : ((current_machine->ram_size - 1) % MAX_DIMM_SZ) + 1) > > > +#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? mc->smbios_memory_device_size \ > > > + : ((current_machine->ram_size - 1) % mc->smbios_memory_device_size) + 1) > > > > > > - dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, MAX_DIMM_SZ) / > > > - MAX_DIMM_SZ; > > > + dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, > > > + mc->smbios_memory_device_size) / > > > + mc->smbios_memory_device_size; > > > > > > /* > > > * The offset determines if we need to keep additional space between > > > -- > > > 2.43.0 > >
diff --git a/include/hw/boards.h b/include/hw/boards.h index ef6f18f2c1..48ff6d8b93 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -237,6 +237,9 @@ typedef struct { * purposes only. * Applies only to default memory backend, i.e., explicit memory backend * wasn't used. + * @smbios_memory_device_size: + * Default size of memory device, + * SMBIOS 3.1.0 "7.18 Memory Device (Type 17)" */ struct MachineClass { /*< private >*/ @@ -304,6 +307,7 @@ struct MachineClass { const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx); ram_addr_t (*fixup_ram_size)(ram_addr_t size); + uint64_t smbios_memory_device_size; }; /** diff --git a/hw/arm/virt.c b/hw/arm/virt.c index b0c68d66a3..719e83e6a1 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -3308,6 +3308,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 1) static void virt_machine_9_0_options(MachineClass *mc) { virt_machine_9_1_options(mc); + mc->smbios_memory_device_size = 16 * GiB; compat_props_add(mc->compat_props, hw_compat_9_0, hw_compat_9_0_len); } DEFINE_VIRT_MACHINE(9, 0) diff --git a/hw/core/machine.c b/hw/core/machine.c index bc38cad7f2..3cfdaec65d 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -1004,6 +1004,7 @@ static void machine_class_init(ObjectClass *oc, void *data) /* Default 128 MB as guest ram size */ mc->default_ram_size = 128 * MiB; mc->rom_file_has_mr = true; + mc->smbios_memory_device_size = 2047 * TiB; /* numa node memory size aligned on 8MB by default. * On Linux, each node's border has to be 8MB aligned diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 9445b07b4f..d9e69243b4 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -495,6 +495,7 @@ static void pc_i440fx_machine_9_0_options(MachineClass *m) pc_i440fx_machine_9_1_options(m); m->alias = NULL; m->is_default = false; + m->smbios_memory_device_size = 16 * GiB; compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len); compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len); diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index 71d3c6d122..9d108b194e 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -374,6 +374,7 @@ static void pc_q35_machine_9_0_options(MachineClass *m) PCMachineClass *pcmc = PC_MACHINE_CLASS(m); pc_q35_machine_9_1_options(m); m->alias = NULL; + m->smbios_memory_device_size = 16 * GiB; compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len); compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len); pcmc->isa_bios_alias = false; diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c index 3b7703489d..a394514264 100644 --- a/hw/smbios/smbios.c +++ b/hw/smbios/smbios.c @@ -1093,6 +1093,7 @@ static bool smbios_get_tables_ep(MachineState *ms, Error **errp) { unsigned i, dimm_cnt, offset; + MachineClass *mc = MACHINE_GET_CLASS(ms); ERRP_GUARD(); assert(ep_type == SMBIOS_ENTRY_POINT_TYPE_32 || @@ -1123,12 +1124,12 @@ static bool smbios_get_tables_ep(MachineState *ms, smbios_build_type_9_table(errp); smbios_build_type_11_table(); -#define MAX_DIMM_SZ (16 * GiB) -#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? MAX_DIMM_SZ \ - : ((current_machine->ram_size - 1) % MAX_DIMM_SZ) + 1) +#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? mc->smbios_memory_device_size \ + : ((current_machine->ram_size - 1) % mc->smbios_memory_device_size) + 1) - dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, MAX_DIMM_SZ) / - MAX_DIMM_SZ; + dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, + mc->smbios_memory_device_size) / + mc->smbios_memory_device_size; /* * The offset determines if we need to keep additional space between
Currently SMBIOS maximum memory device chunk is capped at 16Gb, which is fine for the most cases (QEMU uses it to describe initial RAM (type 17 SMBIOS table entries)). However when starting guest with terabytes of RAM this leads to too many memory device structures, which eventually upsets linux kernel as it reserves only 64K for these entries and when that border is crossed out it runs out of reserved memory. Instead of partitioning initial RAM on 16Gb chunks, use maximum possible chunk size that SMBIOS spec allows[1]. Which lets encode RAM in Mb units in uint32_t-1 field (upto 2047Tb). As result initial RAM will generate only one type 17 structure until host/guest reach ability to use more RAM in the future. Compat changes: We can't unconditionally change chunk size as it will break QEMU<->guest ABI (and migration). Thus introduce a new machine class field that would let older versioned machines to use 16Gb chunks while new machine type could use maximum possible chunk size. While it might seem to be risky to rise max entry size this much (much beyond of what current physical RAM modules support), I'd not expect it causing much issues, modulo uncovering bugs in software running within guest. And those should be fixed on guest side to handle SMBIOS spec properly, especially if guest is expected to support so huge RAM configs. In worst case, QEMU can reduce chunk size later if we would care enough about introducing a workaround for some 'unfixable' guest OS, either by fixing up the next machine type or giving users a CLI option to customize it. 1) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size PS: * tested on 8Tb host with RHEL6 guest, which seems to parse type 17 SMBIOS table entries correctly (according to 'dmidecode'). Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- include/hw/boards.h | 4 ++++ hw/arm/virt.c | 1 + hw/core/machine.c | 1 + hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + hw/smbios/smbios.c | 11 ++++++----- 6 files changed, 14 insertions(+), 5 deletions(-)