Message ID | 20220719115300.104095-1-Jason@zx2c4.com |
---|---|
State | New |
Headers | show |
Series | [resend,v3] hw/i386: pass RNG seed via setup_data entry | expand |
Hi Paolo, On Tue, Jul 19, 2022 at 01:53:00PM +0200, Jason A. Donenfeld wrote: > Tiny machines optimized for fast boot time generally don't use EFI, > which means a random seed has to be supplied some other way. For this > purpose, Linux (≥5.20) supports passing a seed in the setup_data table > with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and > specialized bootloaders. The linked commit shows the upstream kernel > implementation. Having received your message in the other thread hinting, "I think there are some issues with migration compatibility of setup_data and they snowball a bit, so I'll reply there," and being a bit eager to get this moving, I thought I'd preempt that discussion by trying to guess what you have in mind and replying to it. Speculative email execution... The SETUP_RNG_SEED parameter is used only during boot, and Linux takes pains to zero out its content after using. If a VM is migrated or copied, the RNG state is also migrated, just as is the case before SETUP_RNG_SEED. For that reason, Linux also has a "vmgenid" driver, which QEMU supports via `-device vmgenid,guid=auto`, which is an ACPI mechanism for telling the RNG to reseed under various migration circumstances. But this is merely complementary to SETUP_RNG_SEED, which is intended as a very simple mechanism for passing a seed at the earliest moment in boot, akin to DT's "rng-seed" node. Hopefully this answers what I think you were going to ask, and sorry if it's a total non-sequitur. Regards, Jason
On 7/20/22 15:03, Jason A. Donenfeld wrote: > Hi Paolo, > > On Tue, Jul 19, 2022 at 01:53:00PM +0200, Jason A. Donenfeld wrote: >> Tiny machines optimized for fast boot time generally don't use EFI, >> which means a random seed has to be supplied some other way. For this >> purpose, Linux (≥5.20) supports passing a seed in the setup_data table >> with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and >> specialized bootloaders. The linked commit shows the upstream kernel >> implementation. > > Having received your message in the other thread hinting, "I think > there are some issues with migration compatibility of setup_data and > they snowball a bit, so I'll reply there," and being a bit eager to get > this moving, I thought I'd preempt that discussion by trying to guess > what you have in mind and replying to it. Speculative email execution... > > The SETUP_RNG_SEED parameter is used only during boot, and Linux takes > pains to zero out its content after using. If a VM is migrated or > copied, the RNG state is also migrated, just as is the case before > SETUP_RNG_SEED. For that reason, Linux also has a "vmgenid" driver, > which QEMU supports via `-device vmgenid,guid=auto`, which is an ACPI > mechanism for telling the RNG to reseed under various migration > circumstances. But this is merely complementary to SETUP_RNG_SEED, which > is intended as a very simple mechanism for passing a seed at the > earliest moment in boot, akin to DT's "rng-seed" node. > > Hopefully this answers what I think you were going to ask, and sorry if > it's a total non-sequitur. It's not entirely what I was thinking about but it is interesting anyway so thanks for writing that. Sorry it took some time to reply but these live migration issues are subtle and I wanted to be sure of what is and isn't a problem. The issue with live migration is that the setup data changes from before to after the patches. This means that a live migration exactly _in the middle_ of reading the Linux boot data could fail badly. For example, you could migrate in the middle of reading the DTB, and it would be shifted by the ~50 bytes of the setup_data and seed. The size would also not match so, even though probably things would mostly work if you place the seed last, that's not really optimal either. Now I understand that this would be exceedingly unlikely or downright impossible, but the main issue is that, roughly speaking, we try to guarantee unchanged guest ABI on every execution with the appropriate command line options. So the solution is to add a machine property (e.g. "-M linuxboot-seed=...") that allows enabling/disabling it, and then making it default to off with machine types like pc-i440fx-7.0 that never had the seed. In turn this means that the "shape" of the linked list changes a bit and it's better to abstract the creation of the setup_data struct in a separate function. And then you've got to move the various local variables of x86_load_linux into a struct for sharing. As I said, it snowballs a bit, but I should be sending out patches later today. As an aside, QEMU tends to only include code after Linux supports it, but it's in your rng tree so the timing is right; I'll queue the FDT ones for 7.1 since it's nice to have feature parity across all FDT boards. Paolo
Hi Paolo, Thanks for your review. On Thu, Jul 21, 2022 at 11:19:40AM +0200, Paolo Bonzini wrote: > The issue with live migration is that the setup data changes from before > to after the patches. This means that a live migration exactly _in the > middle_ of reading the Linux boot data could fail badly. For example, > you could migrate in the middle of reading the DTB, and it would be > shifted by the ~50 bytes of the setup_data and seed. The size would > also not match so, even though probably things would mostly work if you > place the seed last, that's not really optimal either. This doesn't really make sense to me, as I don't think the machine can even be migrated during x86_load_linux(), and a migration will skip this whole step anyway since this is mutable memory that a live kernel does mutate. However, what I'll do is reverse the order of these, so that the DTB is added first, and I'll only set up the links in the right order so that there's no potential race. I'll send a v+1 doing this shortly. I would really very much prefer *not* adding a useless knob for this feature, especially not one that's off by default. The idea is to finally fix randomness for VMs globally in a non-invasive way, and fixing the [implausible] race mentioned above seems like it'll do the trick. > variables of x86_load_linux into a struct for sharing. As I said, it > snowballs a bit, but I should be sending out patches later today. I'll send a patch, as mentioned above. > As an aside, QEMU tends to only include code after Linux supports it, > but it's in your rng tree so the timing is right This one is actually in "tip", which is the x86 tree, so it'll certainly be in 5.20. Jason
Hey again, On Thu, Jul 21, 2022 at 11:47:29AM +0200, Jason A. Donenfeld wrote: > Hi Paolo, > > Thanks for your review. > > On Thu, Jul 21, 2022 at 11:19:40AM +0200, Paolo Bonzini wrote: > > The issue with live migration is that the setup data changes from before > > to after the patches. This means that a live migration exactly _in the > > middle_ of reading the Linux boot data could fail badly. For example, > > you could migrate in the middle of reading the DTB, and it would be > > shifted by the ~50 bytes of the setup_data and seed. The size would > > also not match so, even though probably things would mostly work if you > > place the seed last, that's not really optimal either. > > This doesn't really make sense to me, as I don't think the machine can > even be migrated during x86_load_linux(), and a migration will skip this > whole step anyway since this is mutable memory that a live kernel does > mutate. > > However, what I'll do is reverse the order of these, so that the DTB is > added first, and I'll only set up the links in the right order so that > there's no potential race. I'll send a v+1 doing this shortly. As I implement the "race-free" version, I notice that this is even more of a non-issue, seeing as even without this patch, the DTB is loaded after the length is written. What you're talking about is just not real. I'll still send a v+1 changing the order, because that seems better anyway, but the race thing seems pretty imaginary... Jason
diff --git a/hw/i386/x86.c b/hw/i386/x86.c index 6003b4b2df..0724759eec 100644 --- a/hw/i386/x86.c +++ b/hw/i386/x86.c @@ -26,6 +26,7 @@ #include "qemu/cutils.h" #include "qemu/units.h" #include "qemu/datadir.h" +#include "qemu/guest-random.h" #include "qapi/error.h" #include "qapi/qmp/qerror.h" #include "qapi/qapi-visit-common.h" @@ -1045,6 +1046,16 @@ void x86_load_linux(X86MachineState *x86ms, } fclose(f); + setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16); + kernel_size = setup_data_offset + sizeof(struct setup_data) + 32; + kernel = g_realloc(kernel, kernel_size); + stq_p(header + 0x250, prot_addr + setup_data_offset); + setup_data = (struct setup_data *)(kernel + setup_data_offset); + setup_data->next = 0; + setup_data->type = cpu_to_le32(SETUP_RNG_SEED); + setup_data->len = cpu_to_le32(32); + qemu_guest_getrandom_nofail(setup_data->data, 32); + /* append dtb to kernel */ if (dtb_filename) { if (protocol < 0x209) { @@ -1059,13 +1070,11 @@ void x86_load_linux(X86MachineState *x86ms, exit(1); } - setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16); - kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size; + kernel_size += sizeof(struct setup_data) + dtb_size; kernel = g_realloc(kernel, kernel_size); - stq_p(header + 0x250, prot_addr + setup_data_offset); - - setup_data = (struct setup_data *)(kernel + setup_data_offset); + setup_data->next = prot_addr + setup_data_offset + sizeof(*setup_data) + setup_data->len; + ++setup_data; setup_data->next = 0; setup_data->type = cpu_to_le32(SETUP_DTB); setup_data->len = cpu_to_le32(dtb_size); diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h index 072e2ed546..b2aaad10e5 100644 --- a/include/standard-headers/asm-x86/bootparam.h +++ b/include/standard-headers/asm-x86/bootparam.h @@ -10,6 +10,7 @@ #define SETUP_EFI 4 #define SETUP_APPLE_PROPERTIES 5 #define SETUP_JAILHOUSE 6 +#define SETUP_RNG_SEED 9 #define SETUP_INDIRECT (1<<31)
Tiny machines optimized for fast boot time generally don't use EFI, which means a random seed has to be supplied some other way. For this purpose, Linux (≥5.20) supports passing a seed in the setup_data table with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and specialized bootloaders. The linked commit shows the upstream kernel implementation. Link: https://git.kernel.org/tip/tip/c/68b8e9713c8 Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Eduardo Habkost <eduardo@habkost.net> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Philippe Mathieu-Daudé <f4bug@amsat.org> Cc: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> --- I'm resending this because apparently I didn't CC the right group of recipients before. hw/i386/x86.c | 19 ++++++++++++++----- include/standard-headers/asm-x86/bootparam.h | 1 + 2 files changed, 15 insertions(+), 5 deletions(-)