diff mbox series

[resend,v3] hw/i386: pass RNG seed via setup_data entry

Message ID 20220719115300.104095-1-Jason@zx2c4.com
State New
Headers show
Series [resend,v3] hw/i386: pass RNG seed via setup_data entry | expand

Commit Message

Jason A. Donenfeld July 19, 2022, 11:53 a.m. UTC
Tiny machines optimized for fast boot time generally don't use EFI,
which means a random seed has to be supplied some other way. For this
purpose, Linux (≥5.20) supports passing a seed in the setup_data table
with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and
specialized bootloaders. The linked commit shows the upstream kernel
implementation.

Link: https://git.kernel.org/tip/tip/c/68b8e9713c8
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
I'm resending this because apparently I didn't CC the right group of
recipients before.

 hw/i386/x86.c                                | 19 ++++++++++++++-----
 include/standard-headers/asm-x86/bootparam.h |  1 +
 2 files changed, 15 insertions(+), 5 deletions(-)

Comments

Jason A. Donenfeld July 20, 2022, 1:03 p.m. UTC | #1
Hi Paolo,

On Tue, Jul 19, 2022 at 01:53:00PM +0200, Jason A. Donenfeld wrote:
> Tiny machines optimized for fast boot time generally don't use EFI,
> which means a random seed has to be supplied some other way. For this
> purpose, Linux (≥5.20) supports passing a seed in the setup_data table
> with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and
> specialized bootloaders. The linked commit shows the upstream kernel
> implementation.

Having received your message in the other thread hinting, "I think
there are some issues with migration compatibility of setup_data and
they snowball a bit, so I'll reply there," and being a bit eager to get
this moving, I thought I'd preempt that discussion by trying to guess
what you have in mind and replying to it. Speculative email execution...

The SETUP_RNG_SEED parameter is used only during boot, and Linux takes
pains to zero out its content after using. If a VM is migrated or
copied, the RNG state is also migrated, just as is the case before
SETUP_RNG_SEED. For that reason, Linux also has a "vmgenid" driver,
which QEMU supports via `-device vmgenid,guid=auto`, which is an ACPI
mechanism for telling the RNG to reseed under various migration
circumstances. But this is merely complementary to SETUP_RNG_SEED, which
is intended as a very simple mechanism for passing a seed at the
earliest moment in boot, akin to DT's "rng-seed" node.

Hopefully this answers what I think you were going to ask, and sorry if
it's a total non-sequitur.

Regards,
Jason
Paolo Bonzini July 21, 2022, 9:19 a.m. UTC | #2
On 7/20/22 15:03, Jason A. Donenfeld wrote:
> Hi Paolo,
> 
> On Tue, Jul 19, 2022 at 01:53:00PM +0200, Jason A. Donenfeld wrote:
>> Tiny machines optimized for fast boot time generally don't use EFI,
>> which means a random seed has to be supplied some other way. For this
>> purpose, Linux (≥5.20) supports passing a seed in the setup_data table
>> with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and
>> specialized bootloaders. The linked commit shows the upstream kernel
>> implementation.
> 
> Having received your message in the other thread hinting, "I think
> there are some issues with migration compatibility of setup_data and
> they snowball a bit, so I'll reply there," and being a bit eager to get
> this moving, I thought I'd preempt that discussion by trying to guess
> what you have in mind and replying to it. Speculative email execution...
> 
> The SETUP_RNG_SEED parameter is used only during boot, and Linux takes
> pains to zero out its content after using. If a VM is migrated or
> copied, the RNG state is also migrated, just as is the case before
> SETUP_RNG_SEED. For that reason, Linux also has a "vmgenid" driver,
> which QEMU supports via `-device vmgenid,guid=auto`, which is an ACPI
> mechanism for telling the RNG to reseed under various migration
> circumstances. But this is merely complementary to SETUP_RNG_SEED, which
> is intended as a very simple mechanism for passing a seed at the
> earliest moment in boot, akin to DT's "rng-seed" node.
> 
> Hopefully this answers what I think you were going to ask, and sorry if
> it's a total non-sequitur.

It's not entirely what I was thinking about but it is interesting anyway 
so thanks for writing that.  Sorry it took some time to reply but these 
live migration issues are subtle and I wanted to be sure of what is and 
isn't a problem.

The issue with live migration is that the setup data changes from before 
to after the patches.  This means that a live migration exactly _in the 
middle_ of reading the Linux boot data could fail badly.  For example, 
you could migrate in the middle of reading the DTB, and it would be 
shifted by the ~50 bytes of the setup_data and seed.  The size would 
also not match so, even though probably things would mostly work if you 
place the seed last, that's not really optimal either.

Now I understand that this would be exceedingly unlikely or downright 
impossible, but the main issue is that, roughly speaking, we try to 
guarantee unchanged guest ABI on every execution with the appropriate 
command line options.  So the solution is to add a machine property 
(e.g. "-M linuxboot-seed=...") that allows enabling/disabling it, and 
then making it default to off with machine types like pc-i440fx-7.0 that 
never had the seed.

In turn this means that the "shape" of the linked list changes a bit and 
it's better to abstract the creation of the setup_data struct in a 
separate function.  And then you've got to move the various local 
variables of x86_load_linux into a struct for sharing.  As I said, it 
snowballs a bit, but I should be sending out patches later today.

As an aside, QEMU tends to only include code after Linux supports it, 
but it's in your rng tree so the timing is right; I'll queue the FDT 
ones for 7.1 since it's nice to have feature parity across all FDT boards.

Paolo
Jason A. Donenfeld July 21, 2022, 9:47 a.m. UTC | #3
Hi Paolo,

Thanks for your review.

On Thu, Jul 21, 2022 at 11:19:40AM +0200, Paolo Bonzini wrote:
> The issue with live migration is that the setup data changes from before 
> to after the patches.  This means that a live migration exactly _in the 
> middle_ of reading the Linux boot data could fail badly.  For example, 
> you could migrate in the middle of reading the DTB, and it would be 
> shifted by the ~50 bytes of the setup_data and seed.  The size would 
> also not match so, even though probably things would mostly work if you 
> place the seed last, that's not really optimal either.

This doesn't really make sense to me, as I don't think the machine can
even be migrated during x86_load_linux(), and a migration will skip this
whole step anyway since this is mutable memory that a live kernel does
mutate.

However, what I'll do is reverse the order of these, so that the DTB is
added first, and I'll only set up the links in the right order so that
there's no potential race. I'll send a v+1 doing this shortly.

I would really very much prefer *not* adding a useless knob for this
feature, especially not one that's off by default. The idea is to
finally fix randomness for VMs globally in a non-invasive way, and
fixing the [implausible] race mentioned above seems like it'll do the
trick.

> variables of x86_load_linux into a struct for sharing.  As I said, it 
> snowballs a bit, but I should be sending out patches later today.

I'll send a patch, as mentioned above.

> As an aside, QEMU tends to only include code after Linux supports it, 
> but it's in your rng tree so the timing is right

This one is actually in "tip", which is the x86 tree, so it'll certainly
be in 5.20.

Jason
Jason A. Donenfeld July 21, 2022, 9:56 a.m. UTC | #4
Hey again,

On Thu, Jul 21, 2022 at 11:47:29AM +0200, Jason A. Donenfeld wrote:
> Hi Paolo,
> 
> Thanks for your review.
> 
> On Thu, Jul 21, 2022 at 11:19:40AM +0200, Paolo Bonzini wrote:
> > The issue with live migration is that the setup data changes from before 
> > to after the patches.  This means that a live migration exactly _in the 
> > middle_ of reading the Linux boot data could fail badly.  For example, 
> > you could migrate in the middle of reading the DTB, and it would be 
> > shifted by the ~50 bytes of the setup_data and seed.  The size would 
> > also not match so, even though probably things would mostly work if you 
> > place the seed last, that's not really optimal either.
> 
> This doesn't really make sense to me, as I don't think the machine can
> even be migrated during x86_load_linux(), and a migration will skip this
> whole step anyway since this is mutable memory that a live kernel does
> mutate.
> 
> However, what I'll do is reverse the order of these, so that the DTB is
> added first, and I'll only set up the links in the right order so that
> there's no potential race. I'll send a v+1 doing this shortly.

As I implement the "race-free" version, I notice that this is even more
of a non-issue, seeing as even without this patch, the DTB is loaded
after the length is written. What you're talking about is just not real.
I'll still send a v+1 changing the order, because that seems better
anyway, but the race thing seems pretty imaginary...

Jason
diff mbox series

Patch

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 6003b4b2df..0724759eec 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -26,6 +26,7 @@ 
 #include "qemu/cutils.h"
 #include "qemu/units.h"
 #include "qemu/datadir.h"
+#include "qemu/guest-random.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/qapi-visit-common.h"
@@ -1045,6 +1046,16 @@  void x86_load_linux(X86MachineState *x86ms,
     }
     fclose(f);
 
+    setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
+    kernel_size = setup_data_offset + sizeof(struct setup_data) + 32;
+    kernel = g_realloc(kernel, kernel_size);
+    stq_p(header + 0x250, prot_addr + setup_data_offset);
+    setup_data = (struct setup_data *)(kernel + setup_data_offset);
+    setup_data->next = 0;
+    setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
+    setup_data->len = cpu_to_le32(32);
+    qemu_guest_getrandom_nofail(setup_data->data, 32);
+
     /* append dtb to kernel */
     if (dtb_filename) {
         if (protocol < 0x209) {
@@ -1059,13 +1070,11 @@  void x86_load_linux(X86MachineState *x86ms,
             exit(1);
         }
 
-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-        kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
+        kernel_size += sizeof(struct setup_data) + dtb_size;
         kernel = g_realloc(kernel, kernel_size);
 
-        stq_p(header + 0x250, prot_addr + setup_data_offset);
-
-        setup_data = (struct setup_data *)(kernel + setup_data_offset);
+        setup_data->next = prot_addr + setup_data_offset + sizeof(*setup_data) + setup_data->len;
+        ++setup_data;
         setup_data->next = 0;
         setup_data->type = cpu_to_le32(SETUP_DTB);
         setup_data->len = cpu_to_le32(dtb_size);
diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h
index 072e2ed546..b2aaad10e5 100644
--- a/include/standard-headers/asm-x86/bootparam.h
+++ b/include/standard-headers/asm-x86/bootparam.h
@@ -10,6 +10,7 @@ 
 #define SETUP_EFI			4
 #define SETUP_APPLE_PROPERTIES		5
 #define SETUP_JAILHOUSE			6
+#define SETUP_RNG_SEED			9
 
 #define SETUP_INDIRECT			(1<<31)