Message ID | 20220804230411.17720-1-Jason@zx2c4.com |
---|---|
State | New |
Series | [v3] hw/i386: place setup_data at fixed place in memory |
On 8/5/22 01:04, Jason A. Donenfeld wrote:
> +    /* Nothing else uses this part of the hardware mapped region */
> +    setup_data_base = 0xfffff - 0x1000;

Isn't this where the BIOS lives? I don't think this works.

Does it work to place setup_data at the end of the cmdline file instead
of having it at the end of the kernel file? This way the first item
will be at 0x20000 + cmdline_size.

Paolo
On Fri, 5 Aug 2022 at 10:10, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 8/5/22 01:04, Jason A. Donenfeld wrote:
> > +    /* Nothing else uses this part of the hardware mapped region */
> > +    setup_data_base = 0xfffff - 0x1000;
>
> Isn't this where the BIOS lives? I don't think this works.
>
> Does it work to place setup_data at the end of the cmdline file instead
> of having it at the end of the kernel file? This way the first item
> will be at 0x20000 + cmdline_size.
>

Does QEMU always allocate the command line statically like that?
AFAIK, OVMF never accesses that memory to read the command line, it
uses fw_cfg to copy it into a buffer it allocates itself. And I guess
that implies that this region could be clobbered by OVMF unless it is
told to preserve it.
Hi Paolo,

On Fri, Aug 05, 2022 at 10:10:02AM +0200, Paolo Bonzini wrote:
> On 8/5/22 01:04, Jason A. Donenfeld wrote:
> > +    /* Nothing else uses this part of the hardware mapped region */
> > +    setup_data_base = 0xfffff - 0x1000;
>
> Isn't this where the BIOS lives? I don't think this works.

That's the segment dedicated to ROM and hardware mapped addresses. So
that's a place to put ROM material. No actual software will use it.

Jason
On 08/05/22 14:47, Jason A. Donenfeld wrote:
> Hi Paolo,
>
> On Fri, Aug 05, 2022 at 10:10:02AM +0200, Paolo Bonzini wrote:
>> On 8/5/22 01:04, Jason A. Donenfeld wrote:
>>> +    /* Nothing else uses this part of the hardware mapped region */
>>> +    setup_data_base = 0xfffff - 0x1000;
>>
>> Isn't this where the BIOS lives? I don't think this works.
>
> That's the segment dedicated to ROM and hardware mapped addresses. So
> that's a place to put ROM material. No actual software will use it.

... accordingly (I think), when the guest tries to read it, it will see
the ROM MemoryRegion that QEMU places there, not RAM contents.

"info mtree" QEMU monitor command output (excerpt), while OVMF is in the
Boot Device Selection phase (well, I left it waiting in the Setup TUI):

address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-000000007fffffff (prio 0, ram): alias ram-below-4g @pc.ram 0000000000000000-000000007fffffff
    0000000000000000-ffffffffffffffff (prio -1, i/o): pci
      00000000000a0000-00000000000affff (prio 2, ram): alias vga.chain4 @vga.vram 0000000000000000-000000000000ffff
      00000000000a0000-00000000000bffff (prio 1, i/o): vga-lowmem
      00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
      00000000000e0000-00000000000fffff (prio 1, rom): isa-bios

flat view ("info mtree -f"):

FlatView #1
 AS "memory", root: system
 AS "cpu-memory-0", root: system
 AS "cpu-memory-1", root: system
 AS "cpu-memory-2", root: system
 AS "cpu-memory-3", root: system
 AS "mch", root: bus master container
 AS "ICH9-LPC", root: bus master container
 AS "ich9-ahci", root: bus master container
 AS "ICH9-SMB", root: bus master container
 AS "pcie-root-port", root: bus master container
 AS "pcie-root-port", root: bus master container
 AS "pcie-root-port", root: bus master container
 AS "pcie-root-port", root: bus master container
 AS "pcie-root-port", root: bus master container
 AS "qemu-xhci", root: bus master container
 AS "virtio-scsi-pci", root: bus master container
 AS "virtio-serial-pci", root: bus master container
 AS "virtio-net-pci", root: bus master container
 AS "VGA", root: bus master container
 AS "virtio-balloon-pci", root: bus master container
 AS "virtio-rng-pci", root: bus master container
 Root memory region: system
  0000000000000000-000000000002ffff (prio 0, ram): pc.ram KVM
  0000000000030000-000000000004ffff (prio 1, i/o): smbase-blackhole
  0000000000050000-000000000009ffff (prio 0, ram): pc.ram @0000000000050000 KVM
  00000000000a0000-00000000000affff (prio 1, ram): vga.vram KVM
  00000000000b0000-00000000000bffff (prio 1, i/o): vga-lowmem @0000000000010000
  00000000000c0000-00000000000c3fff (prio 0, rom): pc.ram @00000000000c0000 KVM
  00000000000c4000-00000000000dffff (prio 1, rom): pc.rom @0000000000004000 KVM
  00000000000e0000-00000000000fffff (prio 1, rom): isa-bios KVM

Laszlo
On 8/5/22 13:08, Ard Biesheuvel wrote:
>>
>> Does it work to place setup_data at the end of the cmdline file instead
>> of having it at the end of the kernel file? This way the first item
>> will be at 0x20000 + cmdline_size.
>>
> Does QEMU always allocate the command line statically like that?
> AFAIK, OVMF never accesses that memory to read the command line, it
> uses fw_cfg to copy it into a buffer it allocates itself. And I guess
> that implies that this region could be clobbered by OVMF unless it is
> told to preserve it.

No it's not. :( It also goes to gBS->AllocatePages in the end.

At this point it seems to me that without extra changes the whole
setup_data concept is dead on arrival for OVMF. In principle there's no
reason why the individual setup_data items couldn't include interior
pointers, meaning that the setup_data _has_ to be at the address
provided in fw_cfg by QEMU.

One way to "fix" it would be for OVMF to overwrite the pointer to the
head of the list, so that the kernel ignores the setup data provided by
QEMU. Another way would be to put it in the command line fw_cfg blob and
teach OVMF to use a fixed address for the command line. Both are ugly,
and both are also broken for new QEMU / old OVMF.

In any case, I don't think this should be fixed so close to the release.
We have two possibilities:

1) if we believe "build setup_data in QEMU" is a feasible design that
only needs more yak shaving, we can keep the code in, but disabled by
default, and sort it out in 7.2.

2) if we go for an alternative design, it needs to be reverted. For
example the randomness could be in _another_ fw_cfg file, and the
linuxboot DMA can patch it in the setup_data.

With (2) the OVMF breakage would be limited to -dtb, which more or less
nobody cares about, and we can just look the other way.

Paolo
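To make the "has to be at the address provided" point concrete: a blob whose `next` links are absolute is only valid at one load address. Below is a minimal sketch, not something proposed in this thread as-is, of the opposite arrangement, where links are kept as offsets inside the blob and are turned into absolute addresses by whoever finally decides where the blob lives. The names `sd_relocate` and `SD_END`, and the offset encoding itself, are hypothetical; only the 16-byte node header with a leading 64-bit little-endian `next` field follows the existing setup_data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SD_HDR_LEN 16u          /* u64 next + u32 type + u32 len */
#define SD_END     UINT64_MAX   /* hypothetical end marker while links are offsets */

/* Read a 64-bit little-endian value without alignment assumptions. */
static uint64_t rd64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* Write a 64-bit little-endian value without alignment assumptions. */
static void wr64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/*
 * Rewrite offset-based links into absolute addresses once load_addr is known.
 * head_off is the offset of the first node, or SD_END for an empty list.
 * Returns the value that would go into the boot header at offset 0x250
 * (0 for an empty list).
 */
static uint64_t sd_relocate(uint8_t *blob, size_t blob_len,
                            uint64_t head_off, uint64_t load_addr)
{
    uint64_t off = head_off;

    if (off == SD_END) {
        return 0;
    }
    while (off != SD_END) {
        /* Each node header must lie fully inside the blob. */
        assert(off <= blob_len && blob_len - off >= SD_HDR_LEN);
        uint64_t next_off = rd64(blob + off);
        wr64(blob + off, next_off == SD_END ? 0 : load_addr + next_off);
        off = next_off;
    }
    return load_addr + head_off;
}
```

With something like this, the file handed out over fw_cfg carries no addresses at all, and the party that knows the final location (an option ROM, or firmware) does the fixup. It does not address the objection that a pure-EFI boot path would then need to know about setup_data.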
On Fri, 5 Aug 2022 at 19:29, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 8/5/22 13:08, Ard Biesheuvel wrote:
> >>
> >> Does it work to place setup_data at the end of the cmdline file instead
> >> of having it at the end of the kernel file? This way the first item
> >> will be at 0x20000 + cmdline_size.
> >>
> > Does QEMU always allocate the command line statically like that?
> > AFAIK, OVMF never accesses that memory to read the command line, it
> > uses fw_cfg to copy it into a buffer it allocates itself. And I guess
> > that implies that this region could be clobbered by OVMF unless it is
> > told to preserve it.
>
> No it's not. :( It also goes to gBS->AllocatePages in the end.
>
> At this point it seems to me that without extra changes the whole
> setup_data concept is dead on arrival for OVMF. In principle there's no
> reason why the individual setup_data items couldn't include interior
> pointers, meaning that the setup_data _has_ to be at the address
> provided in fw_cfg by QEMU.
>

AIUI, the setup_data nodes are appended at the end, so they are not
covered by the setup_data fw_cfg file but the kernel one.

> One way to "fix" it would be for OVMF to overwrite the pointer to the
> head of the list, so that the kernel ignores the setup data provided by
> QEMU. Another way would be to put it in the command line fw_cfg blob and
> teach OVMF to use a fixed address for the command line. Both are ugly,
> and both are also broken for new QEMU / old OVMF.
>

This is the 'pure EFI' boot path in OVMF, which means that the
firmware does not rely on definitions of struct bootparams or struct
setup_header at all. Introducing that dependency just for this is
something I'd really prefer to avoid.

> In any case, I don't think this should be fixed so close to the release.
> We have two possibilities:
>
> 1) if we believe "build setup_data in QEMU" is a feasible design that
> only needs more yak shaving, we can keep the code in, but disabled by
> default, and sort it out in 7.2.
>

As I argued before, conflating the 'file' representation with the
'memory' representation like this is fundamentally flawed. fw_cfg
happily DMA's those files anywhere you like, so their contents should
not be position dependent like this.

So Jason's fix gets us halfway there, although we now pass information
to the kernel that is not covered by signatures or measurements, where
the setup_data pointer itself is. This means you can replace a single
SETUP_RNG_SEED node in memory with a whole set of SETUP_xxx nodes that
might be rigged to manipulate the boot in a way that measured boot
won't detect.

This is perhaps a bit of a stretch, and arguably only a problem if
secure or measured boot are enabled to begin with, in which case we
could impose additional policy on the use of setup_data. But still ...

> 2) if we go for an alternative design, it needs to be reverted. For
> example the randomness could be in _another_ fw_cfg file, and the
> linuxboot DMA can patch it in the setup_data.
>
>
> With (2) the OVMF breakage would be limited to -dtb, which more or less
> nobody cares about, and we can just look the other way.
>
> Paolo
On Fri, Aug 05, 2022 at 07:29:29PM +0200, Paolo Bonzini wrote:
> On 8/5/22 13:08, Ard Biesheuvel wrote:
> > >
> > > Does it work to place setup_data at the end of the cmdline file instead
> > > of having it at the end of the kernel file? This way the first item
> > > will be at 0x20000 + cmdline_size.
> > >
> > Does QEMU always allocate the command line statically like that?
> > AFAIK, OVMF never accesses that memory to read the command line, it
> > uses fw_cfg to copy it into a buffer it allocates itself. And I guess
> > that implies that this region could be clobbered by OVMF unless it is
> > told to preserve it.
>
> No it's not. :( It also goes to gBS->AllocatePages in the end.
>
> At this point it seems to me that without extra changes the whole setup_data
> concept is dead on arrival for OVMF. In principle there's no reason why the
> individual setup_data items couldn't include interior pointers, meaning that
> the setup_data _has_ to be at the address provided in fw_cfg by QEMU.
>
> One way to "fix" it would be for OVMF to overwrite the pointer to the head
> of the list, so that the kernel ignores the setup data provided by QEMU.
> Another way would be to put it in the command line fw_cfg blob and teach
> OVMF to use a fixed address for the command line. Both are ugly, and both
> are also broken for new QEMU / old OVMF.
>
> In any case, I don't think this should be fixed so close to the release. We
> have two possibilities:
>
> 1) if we believe "build setup_data in QEMU" is a feasible design that only
> needs more yak shaving, we can keep the code in, but disabled by default,
> and sort it out in 7.2.
>
> 2) if we go for an alternative design, it needs to be reverted. For example
> the randomness could be in _another_ fw_cfg file, and the linuxboot DMA can
> patch it in the setup_data.
>
>
> With (2) the OVMF breakage would be limited to -dtb, which more or less
> nobody cares about, and we can just look the other way.
>
> Paolo

So IIUC you retract your
"pc: add property for Linux setup_data random number seed"
patch then? It's neither of the two options above.
Hey Paolo,

On Fri, Aug 05, 2022 at 02:47:27PM +0200, Jason A. Donenfeld wrote:
> Hi Paolo,
>
> On Fri, Aug 05, 2022 at 10:10:02AM +0200, Paolo Bonzini wrote:
> > On 8/5/22 01:04, Jason A. Donenfeld wrote:
> > > +    /* Nothing else uses this part of the hardware mapped region */
> > > +    setup_data_base = 0xfffff - 0x1000;
> >
> > Isn't this where the BIOS lives? I don't think this works.
>
> That's the segment dedicated to ROM and hardware mapped addresses. So
> that's a place to put ROM material. No actual software will use it.
>
> Jason

Unless I've misread the thread, I don't think there are any remaining
objections, right? Can we try merging this and seeing if it fixes the
issue for good?

Jason
On Tue, Aug 09, 2022 at 02:17:23PM +0200, Jason A. Donenfeld wrote:
> Hey Paolo,
>
> On Fri, Aug 05, 2022 at 02:47:27PM +0200, Jason A. Donenfeld wrote:
> > Hi Paolo,
> >
> > On Fri, Aug 05, 2022 at 10:10:02AM +0200, Paolo Bonzini wrote:
> > > On 8/5/22 01:04, Jason A. Donenfeld wrote:
> > > > +    /* Nothing else uses this part of the hardware mapped region */
> > > > +    setup_data_base = 0xfffff - 0x1000;
> > >
> > > Isn't this where the BIOS lives? I don't think this works.
> >
> > That's the segment dedicated to ROM and hardware mapped addresses. So
> > that's a place to put ROM material. No actual software will use it.
> >
> > Jason
>
> Unless I've misread the thread, I don't think there are any remaining
> objections, right? Can we try merging this and seeing if it fixes the
> issue for good?
>
> Jason

Laszlo commented here:
https://lore.kernel.org/r/fa0601e4-acf5-0ce8-9277-4d90d046b53e%40redhat.com
On Tue, Aug 09, 2022 at 10:07:44AM -0400, Michael S. Tsirkin wrote:
> On Tue, Aug 09, 2022 at 02:17:23PM +0200, Jason A. Donenfeld wrote:
> > Hey Paolo,
> >
> > On Fri, Aug 05, 2022 at 02:47:27PM +0200, Jason A. Donenfeld wrote:
> > > Hi Paolo,
> > >
> > > On Fri, Aug 05, 2022 at 10:10:02AM +0200, Paolo Bonzini wrote:
> > > > On 8/5/22 01:04, Jason A. Donenfeld wrote:
> > > > > +    /* Nothing else uses this part of the hardware mapped region */
> > > > > +    setup_data_base = 0xfffff - 0x1000;
> > > >
> > > > Isn't this where the BIOS lives? I don't think this works.
> > >
> > > That's the segment dedicated to ROM and hardware mapped addresses. So
> > > that's a place to put ROM material. No actual software will use it.
> > >
> > > Jason
> >
> > Unless I've misread the thread, I don't think there are any remaining
> > objections, right? Can we try merging this and seeing if it fixes the
> > issue for good?
> >
> > Jason
>
> Laszlo commented here:
> https://lore.kernel.org/r/fa0601e4-acf5-0ce8-9277-4d90d046b53e%40redhat.com

Today is the 7.1.0 rc2 date, which ideally leaves only one rc remaining
before the GA release.

The discussion still taking place in this thread does not fill me with
confidence that we're going to have a *well tested* solution before GA.
Even if we agree on a patch, are we really going to have confidence in
it being reliable if we've only got a week of testing?

IMHO we're at the point where we should just disable the RNG feature for
7.1.0, and give ourselves time to come up with a solution in 7.2.0 that
can be properly tested without the time pressure of release deadlines.

With regards,
Daniel
On 8/9/22 11:17, Michael S. Tsirkin wrote:
>> 1) if we believe "build setup_data in QEMU" is a feasible design that only
>> needs more yak shaving, we can keep the code in, but disabled by default,
>> and sort it out in 7.2.
>>
>> 2) if we go for an alternative design, it needs to be reverted. For example
>> the randomness could be in _another_ fw_cfg file, and the linuxboot DMA can
>> patch it in the setup_data.
>>
>> With (2) the OVMF breakage would be limited to -dtb, which more or less
>> nobody cares about, and we can just look the other way.
>
> So IIUC you retract your pc: add property for Linux setup_data random
> number seed then? It's neither of the two options above.

That one would be a base for (1). Another choice (3) is to put a pointer
to the first setup_data in a new fw_cfg entry, and let the option ROMs
place it in the header.

In any case, as Laszlo said this [PATCH v3] does not work because
0xf0000 is mapped as ROM (and if it worked, it would have the same
problem as the first 640K).

Paolo
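The ROM objection can be checked mechanically against the flat view Laszlo posted earlier in the thread. The sketch below transcribes the sub-1 MiB ranges from that output into a table and looks up the address the patch picks. The helper names `flat_lookup` and `range_is_ram` are made up for illustration; nothing here models how QEMU itself resolves overlapping regions or what `rom_add_blob` does with the range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct flat_range {
    uint64_t start, end;   /* inclusive bounds */
    const char *kind;      /* "ram", "rom" or "i/o", as printed by info mtree -f */
    const char *name;
};

/* Transcribed from the "info mtree -f" excerpt in this thread (first 1 MiB). */
static const struct flat_range flat_view[] = {
    { 0x00000, 0x2ffff, "ram", "pc.ram" },
    { 0x30000, 0x4ffff, "i/o", "smbase-blackhole" },
    { 0x50000, 0x9ffff, "ram", "pc.ram" },
    { 0xa0000, 0xaffff, "ram", "vga.vram" },
    { 0xb0000, 0xbffff, "i/o", "vga-lowmem" },
    { 0xc0000, 0xc3fff, "rom", "pc.ram" },
    { 0xc4000, 0xdffff, "rom", "pc.rom" },
    { 0xe0000, 0xfffff, "rom", "isa-bios" },
};

/* Return the range containing addr, or NULL if the table does not cover it. */
static const struct flat_range *flat_lookup(uint64_t addr)
{
    for (size_t i = 0; i < sizeof(flat_view) / sizeof(flat_view[0]); i++) {
        if (addr >= flat_view[i].start && addr <= flat_view[i].end) {
            return &flat_view[i];
        }
    }
    return NULL;
}

/* True if [start, start + len) lies entirely inside one "ram" range. */
static int range_is_ram(uint64_t start, uint64_t len)
{
    const struct flat_range *r = flat_lookup(start);

    assert(len > 0);
    return r != NULL && strcmp(r->kind, "ram") == 0 && start + len - 1 <= r->end;
}
```

Looking up `0xfffff - 0x1000` (that is, 0xfefff) lands in the `isa-bios` ROM range, so `range_is_ram(0xfffff - 0x1000, 0x1000)` is false for this particular guest configuration. Other machine types or firmware may lay out the region differently.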
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8..3affef3277 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -773,10 +773,10 @@ void x86_load_linux(X86MachineState *x86ms,
     bool linuxboot_dma_enabled = X86_MACHINE_GET_CLASS(x86ms)->fwcfg_dma_enabled;
     uint16_t protocol;
     int setup_size, kernel_size, cmdline_size;
-    int dtb_size, setup_data_offset;
+    int dtb_size, setup_data_item_len, setup_data_total_len = 0;
     uint32_t initrd_max;
-    uint8_t header[8192], *setup, *kernel;
-    hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, first_setup_data = 0;
+    uint8_t header[8192], *setup, *kernel, *setup_datas = NULL;
+    hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0, first_setup_data = 0, setup_data_base;
     FILE *f;
     char *vmode;
     MachineState *machine = MACHINE(x86ms);
@@ -899,6 +899,8 @@ void x86_load_linux(X86MachineState *x86ms,
         cmdline_addr = 0x20000;
         prot_addr = 0x100000;
     }
+    /* Nothing else uses this part of the hardware mapped region */
+    setup_data_base = 0xfffff - 0x1000;
 
     /* highest address for loading the initrd */
     if (protocol >= 0x20c &&
@@ -1062,34 +1064,35 @@ void x86_load_linux(X86MachineState *x86ms,
             exit(1);
         }
 
-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-        kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
-        kernel = g_realloc(kernel, kernel_size);
-
-
-        setup_data = (struct setup_data *)(kernel + setup_data_offset);
+        setup_data_item_len = sizeof(struct setup_data) + dtb_size;
+        setup_datas = g_realloc(setup_datas, setup_data_total_len + setup_data_item_len);
+        setup_data = (struct setup_data *)(setup_datas + setup_data_total_len);
         setup_data->next = cpu_to_le64(first_setup_data);
-        first_setup_data = prot_addr + setup_data_offset;
+        first_setup_data = setup_data_base + setup_data_total_len;
+        setup_data_total_len += setup_data_item_len;
         setup_data->type = cpu_to_le32(SETUP_DTB);
         setup_data->len = cpu_to_le32(dtb_size);
-
         load_image_size(dtb_filename, setup_data->data, dtb_size);
     }
 
     if (!legacy_no_rng_seed) {
-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-        kernel_size = setup_data_offset + sizeof(struct setup_data) + RNG_SEED_LENGTH;
-        kernel = g_realloc(kernel, kernel_size);
-        setup_data = (struct setup_data *)(kernel + setup_data_offset);
+        setup_data_item_len = sizeof(struct setup_data) + RNG_SEED_LENGTH;
+        setup_datas = g_realloc(setup_datas, setup_data_total_len + setup_data_item_len);
+        setup_data = (struct setup_data *)(setup_datas + setup_data_total_len);
         setup_data->next = cpu_to_le64(first_setup_data);
-        first_setup_data = prot_addr + setup_data_offset;
+        first_setup_data = setup_data_base + setup_data_total_len;
+        setup_data_total_len += setup_data_item_len;
         setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
         setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
         qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
     }
 
-    /* Offset 0x250 is a pointer to the first setup_data link. */
-    stq_p(header + 0x250, first_setup_data);
+    if (first_setup_data && !sev_enabled()) {
+        /* Offset 0x250 is a pointer to the first setup_data link. */
+        stq_p(header + 0x250, first_setup_data);
+        rom_add_blob("setup_data", setup_datas, setup_data_total_len, setup_data_total_len,
+                     setup_data_base, NULL, NULL, NULL, NULL, false);
+    }
 
     /*
      * If we're starting an encrypted VM, it will be OVMF based, which uses the
The boot parameter header refers to setup_data at an absolute address,
and each setup_data refers to the next setup_data at an absolute address
too. Currently QEMU simply puts the setup_datas right after the kernel
image, and since the kernel image is loaded at prot_addr -- a fixed
address knowable to QEMU a priori -- the setup_data absolute address
winds up being just `prot_addr + a_fixed_offset_into_kernel_image`.

This mostly works fine, so long as the kernel image really is loaded at
prot_addr. However, OVMF doesn't load the kernel at prot_addr, and
generally EFI doesn't give a good way of predicting where it's going to
load the kernel. So when it loads it at some address != prot_addr, the
absolute addresses in setup_data now point somewhere bogus, causing
crashes when the EFI stub tries to follow the next link.

Fix this by placing setup_data at some fixed place in memory, not as
part of the kernel image, and then pointing the setup_data absolute
address to that fixed place in memory. This way, even if OVMF or other
chains relocate the kernel image, the boot parameter still points to the
correct absolute address. For this, a part of the hardware mapped area
that isn't used by anything else is used.

Fixes: 3cbeb52467 ("hw/i386: add device tree support")
Reported-by: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: linux-efi@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 hw/i386/x86.c | 39 +++++++++++++++++++++------------------
 1 file changed, 21 insertions(+), 18 deletions(-)
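The chaining described in the commit message can be sketched outside of QEMU. This is a standalone approximation, not the patch's code: `struct sd_blob`, `sd_append` and `sd_count` are hypothetical names, fields are written byte-wise in little-endian order rather than through `cpu_to_le64()`, and the type values 2 and 9 are assumed to match `SETUP_DTB` and `SETUP_RNG_SEED` in Linux's bootparam.h. As in the patch, each new node is linked in front of the previous head and nodes are packed without alignment padding.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SD_HDR_LEN 16u   /* u64 next + u32 type + u32 len, then len bytes of data */

enum { SD_TYPE_DTB = 2, SD_TYPE_RNG_SEED = 9 };   /* assumed bootparam.h values */

/* Hypothetical accumulator for a setup_data blob placed at a fixed address. */
struct sd_blob {
    uint8_t *buf;        /* host-side copy of the blob */
    size_t total_len;    /* bytes used so far */
    uint64_t base;       /* guest-physical address the blob will live at */
    uint64_t head;       /* address of the first node, 0 if the list is empty */
};

/* Store the low nbytes of v at p in little-endian order. */
static void put_le(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Load an nbytes-wide little-endian value from p. */
static uint64_t get_le(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* Append one node; its next link points at the previous head. */
static void sd_append(struct sd_blob *b, uint32_t type,
                      const void *payload, uint32_t len)
{
    size_t item_len = SD_HDR_LEN + len;
    uint8_t *node;

    b->buf = realloc(b->buf, b->total_len + item_len);
    assert(b->buf != NULL);
    node = b->buf + b->total_len;

    put_le(node + 0, b->head, 8);    /* next: absolute guest address */
    put_le(node + 8, type, 4);
    put_le(node + 12, len, 4);
    memcpy(node + SD_HDR_LEN, payload, len);

    b->head = b->base + b->total_len;
    b->total_len += item_len;
}

/* Follow the links as a guest would, assuming the blob really sits at base. */
static int sd_count(const struct sd_blob *b)
{
    int n = 0;

    for (uint64_t addr = b->head; addr != 0; n++) {
        assert(addr >= b->base &&
               addr - b->base + SD_HDR_LEN <= b->total_len);
        addr = get_le(b->buf + (addr - b->base), 8);
    }
    return n;
}
```

The key property is that every stored address is `base + offset`, so the chain is only walkable if the blob ends up at `base` in guest memory. That is exactly what breaks when the blob is appended to a kernel image that firmware loads somewhere other than `prot_addr`, and it is also why the choice of `base` matters so much in the review above.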