[1/6] device-tree: add re-randomization helper function

Message ID 20220929232339.372813-1-Jason@zx2c4.com
State New

Commit Message

Jason A. Donenfeld Sept. 29, 2022, 11:23 p.m. UTC
When the system reboots, the rng-seed that the FDT has should be
re-randomized, so that the new boot gets a new seed. Several
architectures require this functionality, so export a function for
injecting a new seed into the given FDT.

Cc: Alistair Francis <alistair.francis@wdc.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 include/sysemu/device_tree.h |  9 +++++++++
 softmmu/device_tree.c        | 21 +++++++++++++++++++++
 2 files changed, 30 insertions(+)

Comments

Bin Meng Sept. 30, 2022, 8:44 a.m. UTC | #1
On Fri, Sep 30, 2022 at 7:24 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> When the system reboots, the rng-seed that the FDT has should be
> re-randomized, so that the new boot gets a new seed. Several
> architectures require this functionality, so export a function for
> injecting a new seed into the given FDT.
>
> Cc: Alistair Francis <alistair.francis@wdc.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  include/sysemu/device_tree.h |  9 +++++++++
>  softmmu/device_tree.c        | 21 +++++++++++++++++++++
>  2 files changed, 30 insertions(+)
>

Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Peter Maydell Oct. 6, 2022, 1:16 p.m. UTC | #2
On Fri, 30 Sept 2022 at 00:23, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> When the system reboots, the rng-seed that the FDT has should be
> re-randomized, so that the new boot gets a new seed. Several
> architectures require this functionality, so export a function for
> injecting a new seed into the given FDT.
>
> Cc: Alistair Francis <alistair.francis@wdc.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Hi; I've applied this series to target-arm.next (seems the easiest way
to take it into the tree). I'm not super happy about the need to
use qemu_register_reset(), but (as we discussed on irc) the amount
of refactoring of the Rom blob code to do it some other way would
be disproportionate, and this is no worse than some of the other
implicit reset-order requirements we have already. (I may come back
some day and see if there's a refactoring I like if I need to do
some reset cleanup in future.)

PS: if you could remember to send cover letters for multipatch
patchsets, that helps our automated tooling. (I think this is
why the series didn't show up in patchew, for instance.)

thanks
-- PMM
Jason A. Donenfeld Oct. 6, 2022, 1:17 p.m. UTC | #3
Hi Peter,

On Thu, Oct 6, 2022 at 7:16 AM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Fri, 30 Sept 2022 at 00:23, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > When the system reboots, the rng-seed that the FDT has should be
> > re-randomized, so that the new boot gets a new seed. Several
> > architectures require this functionality, so export a function for
> > injecting a new seed into the given FDT.
> >
> > Cc: Alistair Francis <alistair.francis@wdc.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
>
> Hi; I've applied this series to target-arm.next (seems the easiest way
> to take it into the tree).

Thanks for taking it.

> PS: if you could remember to send cover letters for multipatch
> patchsets, that helps our automated tooling. (I think this is
> why the series didn't show up in patchew, for instance.)

Good call, will do.

Jason
Peter Maydell Oct. 10, 2022, 10:54 a.m. UTC | #4
On Thu, 6 Oct 2022 at 14:16, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Fri, 30 Sept 2022 at 00:23, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > When the system reboots, the rng-seed that the FDT has should be
> > re-randomized, so that the new boot gets a new seed. Several
> > architectures require this functionality, so export a function for
> > injecting a new seed into the given FDT.
> >
> > Cc: Alistair Francis <alistair.francis@wdc.com>
> > Cc: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
>
> Hi; I've applied this series to target-arm.next (seems the easiest way
> to take it into the tree).

Unfortunately it turns out that this breaks the reverse-debugging
test that is part of 'make check-avocado'.

Running all of 'check-avocado' takes a long time, so here's how
to run the specific test:

      make -C your-build-tree check-venv   # Only for the first time
      your-build-tree/tests/venv/bin/avocado run your-build-tree/tests/avocado/boot_linux.py

Probably more convenient though is to run the equivalent commands
by hand:

wget -O /tmp/vmlinuz https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/29/Everything/aarch64/os/images/pxeboot/vmlinuz
./build/x86/qemu-img create -f qcow2 /tmp/disk.qcow2 128M
./build/x86/qemu-system-aarch64 -display none -machine virt -serial stdio \
    -cpu cortex-a53 -icount shift=7,rr=record,rrfile=/tmp/qemu.rr,rrsnapshot=init \
    -net none -drive file=/tmp/disk.qcow2 -kernel /tmp/vmlinuz
# this will boot the kernel to the no-root-fs panic; hit ctrl-C when it gets there
./build/x86/qemu-system-aarch64 -display none -machine virt -serial stdio \
    -cpu cortex-a53 -icount shift=7,rr=replay,rrfile=/tmp/qemu.rr,rrsnapshot=init \
    -net none -drive file=/tmp/disk.qcow2 -kernel /tmp/vmlinuz
# same command line, but 'replay' rather than 'record'; QEMU will exit with an error:
qemu-system-aarch64: Missing random event in the replay log

Without these patches the replay step will replay the recorded execution
up to the guest panic.

The error is essentially the record-and-replay subsystem saying "the
replay just asked for a random number at a point when the recording
did not ask for one, and so there's no 'this is what the number was'
info in the record".

I have had a quick look, and I think the reason for this is that
load_snapshot() ("reset the VM state to the snapshot state stored in the
disk image or migration stream") does a system reset. The replay
process involves a lot of "load state from a snapshot and play
forwards from there" operations. It doesn't expect that load_snapshot()
would result in something reading random data, but now that we are
calling qemu_guest_getrandom() in a reset hook, that happens.
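
The failure mode described above can be sketched with a small, self-contained C model. This is not QEMU code: every `toy_` name is hypothetical, the log is reduced to an ordered array of event kinds, and the snapshot is assumed to sit at the start of the log. It only illustrates the rule that a replayed random read succeeds only if the recording made a random read at the same point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name below is hypothetical, not a QEMU API. */
enum toy_event { TOY_EV_INSN, TOY_EV_RANDOM };

struct toy_log {
    enum toy_event kind[8];  /* kind of each recorded event, in order */
    uint32_t value[8];       /* payload for TOY_EV_RANDOM entries */
    size_t len;              /* number of recorded events */
    size_t pos;              /* replay cursor */
};

/* Record side: append one event to the log. */
static void toy_record(struct toy_log *log, enum toy_event kind, uint32_t value)
{
    assert(log->len < 8);
    log->kind[log->len] = kind;
    log->value[log->len] = value;
    log->len++;
}

/*
 * Replay side: a random number can only be returned if the next
 * recorded event is a random event. Otherwise the replay has asked
 * for randomness at a point where the recording did not.
 */
static bool toy_replay_random(struct toy_log *log, uint32_t *out)
{
    if (log->pos >= log->len || log->kind[log->pos] != TOY_EV_RANDOM) {
        return false; /* models "Missing random event in the replay log" */
    }
    *out = log->value[log->pos++];
    return true;
}

/* A reset hook that re-seeds, standing in for the FDT rng-seed hook. */
static bool toy_reset_hook(struct toy_log *log, uint32_t *seed)
{
    return toy_replay_random(log, seed);
}

/* Loading a snapshot implies a system reset, which runs the reset hook. */
static bool toy_load_snapshot(struct toy_log *log, uint32_t *seed)
{
    log->pos = 0; /* assumption: the snapshot sits at the start of the log */
    return toy_reset_hook(log, seed);
}
```

In this model, a log holding only `TOY_EV_INSN` entries makes `toy_load_snapshot()` return false, because the reset hook asks for a random value the recording never stored at that position.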

I'm not sure exactly what the best approach here is, so I've cc'd
the migration and replay submaintainers. For the moment I'm dropping
this patchset from target-arm.next.

thanks
-- PMM
Peter Maydell Oct. 10, 2022, 10:58 a.m. UTC | #5
On Mon, 10 Oct 2022 at 11:54, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Thu, 6 Oct 2022 at 14:16, Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > On Fri, 30 Sept 2022 at 00:23, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > >
> > > When the system reboots, the rng-seed that the FDT has should be
> > > re-randomized, so that the new boot gets a new seed. Several
> > > architectures require this functionality, so export a function for
> > > injecting a new seed into the given FDT.
> > >
> > > Cc: Alistair Francis <alistair.francis@wdc.com>
> > > Cc: David Gibson <david@gibson.dropbear.id.au>
> > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> >
> > Hi; I've applied this series to target-arm.next (seems the easiest way
> > to take it into the tree).
>
> Unfortunately it turns out that this breaks the reverse-debugging
> test that is part of 'make check-avocado'.
>
> Running all of 'check-avocado' takes a long time, so here's how
> to run the specific test:
>
>       make -C your-build-tree check-venv   # Only for the first time
>       your-build-tree/tests/venv/bin/avocado run your-build-tree/tests/avocado/boot_linux.py

derp, wrong test name, should be

 your-build-tree/tests/venv/bin/avocado run your-build-tree/tests/avocado/reverse_debugging.py

-- PMM
Jason A. Donenfeld Oct. 10, 2022, 3:20 p.m. UTC | #6
On Mon, Oct 10, 2022 at 11:54:50AM +0100, Peter Maydell wrote:
> The error is essentially the record-and-replay subsystem saying "the
> replay just asked for a random number at a point when the recording
> did not ask for one, and so there's no 'this is what the number was'
> info in the record".
> 
> I have had a quick look, and I think the reason for this is that
> load_snapshot() ("reset the VM state to the snapshot state stored in the
> disk image or migration stream") does a system reset. The replay
> process involves a lot of "load state from a snapshot and play
> forwards from there" operations. It doesn't expect that load_snapshot()
> would result in something reading random data, but now that we are
> calling qemu_guest_getrandom() in a reset hook, that happens.

Hmm... so this seems like a bug in the replay code then? Shouldn't that
reset handler get hit during both passes, so the entry should be in
each?

Jason
Peter Maydell Oct. 10, 2022, 3:32 p.m. UTC | #7
On Mon, 10 Oct 2022 at 16:21, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> On Mon, Oct 10, 2022 at 11:54:50AM +0100, Peter Maydell wrote:
> > The error is essentially the record-and-replay subsystem saying "the
> > replay just asked for a random number at a point when the recording
> > did not ask for one, and so there's no 'this is what the number was'
> > info in the record".
> >
> > I have had a quick look, and I think the reason for this is that
> > load_snapshot() ("reset the VM state to the snapshot state stored in the
> > disk image or migration stream") does a system reset. The replay
> > process involves a lot of "load state from a snapshot and play
> > forwards from there" operations. It doesn't expect that load_snapshot()
> > would result in something reading random data, but now that we are
> > calling qemu_guest_getrandom() in a reset hook, that happens.
>
> Hmm... so this seems like a bug in the replay code then? Shouldn't that
> reset handler get hit during both passes, so the entry should be in
> each?

No, because record is just
"reset the system, record all the way to the end stop",
but replay is
"set the system to the point we want to start at by using
load_snapshot, play from there", and depending on the actions
you do in the debugger like reverse-continue we might repeatedly
do "reload that snapshot (implying a system reset) and play from there"
multiple times.

thanks
-- PMM
Jason A. Donenfeld Oct. 10, 2022, 3:50 p.m. UTC | #8
On Mon, Oct 10, 2022 at 04:32:45PM +0100, Peter Maydell wrote:
> On Mon, 10 Oct 2022 at 16:21, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > On Mon, Oct 10, 2022 at 11:54:50AM +0100, Peter Maydell wrote:
> > > The error is essentially the record-and-replay subsystem saying "the
> > > replay just asked for a random number at a point when the recording
> > > did not ask for one, and so there's no 'this is what the number was'
> > > info in the record".
> > >
> > > I have had a quick look, and I think the reason for this is that
> > > load_snapshot() ("reset the VM state to the snapshot state stored in the
> > > disk image or migration stream") does a system reset. The replay
> > > process involves a lot of "load state from a snapshot and play
> > > forwards from there" operations. It doesn't expect that load_snapshot()
> > > would result in something reading random data, but now that we are
> > > calling qemu_guest_getrandom() in a reset hook, that happens.
> >
> > Hmm... so this seems like a bug in the replay code then? Shouldn't that
> > reset handler get hit during both passes, so the entry should be in
> > each?
> 
> No, because record is just
> "reset the system, record all the way to the end stop",
> but replay is
> "set the system to the point we want to start at by using
> load_snapshot, play from there", and depending on the actions
> you do in the debugger like reverse-continue we might repeatedly
> do "reload that snapshot (implying a system reset) and play from there"
> multiple times.

Hmm. I started typing, "I really have no idea how to fix that except for
hacky ways" but then by the time I got to the end of that sentence, I
had an idea. Still maybe ugly and hacky, but maybe something akin to the
diff below?

Either way, as you mentioned in your initial email, it sounds like this
might need some involvement from the replay people. What's the best way
for us to work together on this? You mentioned you removed it from your
fixes branch, but do you think you could post it in another branch and
link to it, so that the replay maintainers have something tangible to
play with?

Jason

diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index 7ec0882b50..73e2c1ae54 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -46,6 +46,7 @@ typedef enum ReplayCheckpoint ReplayCheckpoint;
 typedef struct ReplayNetState ReplayNetState;

 extern ReplayMode replay_mode;
+extern bool replay_loading;

 /* Name of the initial VM snapshot */
 extern char *replay_snapshot;
diff --git a/migration/savevm.c b/migration/savevm.c
index 48e85c052c..97199a2506 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3037,6 +3037,8 @@ bool load_snapshot(const char *name, const char *vmstate,
         return false;
     }

+    replay_loading = true;
+
     /*
      * Flush the record/replay queue. Now the VM state is going
      * to change. Therefore we don't need to preserve its consistency
@@ -3071,6 +3073,7 @@ bool load_snapshot(const char *name, const char *vmstate,
     aio_context_release(aio_context);

     bdrv_drain_all_end();
+    replay_loading = false;

     if (ret < 0) {
         error_setg(errp, "Error %d while loading VM state", ret);
@@ -3081,6 +3084,7 @@ bool load_snapshot(const char *name, const char *vmstate,

 err_drain:
     bdrv_drain_all_end();
+    replay_loading = false;
     return false;
 }

diff --git a/replay/replay.c b/replay/replay.c
index 9a0dc1cf44..16e16e274b 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -26,6 +26,7 @@
 /* Size of replay log header */
 #define HEADER_SIZE                 (sizeof(uint32_t) + sizeof(uint64_t))

+bool replay_loading;
 ReplayMode replay_mode = REPLAY_MODE_NONE;
 char *replay_snapshot;

diff --git a/stubs/replay.c b/stubs/replay.c
index 9d5b4be339..b9d296a203 100644
--- a/stubs/replay.c
+++ b/stubs/replay.c
@@ -2,6 +2,7 @@
 #include "sysemu/replay.h"

 ReplayMode replay_mode;
+bool replay_loading;

 void replay_finish(void)
 {
diff --git a/util/guest-random.c b/util/guest-random.c
index 23643f86cc..7f847533d1 100644
--- a/util/guest-random.c
+++ b/util/guest-random.c
@@ -46,7 +46,7 @@ static int glib_random_bytes(void *buf, size_t len)
 int qemu_guest_getrandom(void *buf, size_t len, Error **errp)
 {
     int ret;
-    if (replay_mode == REPLAY_MODE_PLAY) {
+    if (replay_mode == REPLAY_MODE_PLAY && !replay_loading) {
         return replay_read_random(buf, len);
     }
     if (unlikely(deterministic)) {
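
The effect of the guard in the diff above can be reduced to a small decision function. The following is a self-contained sketch rather than QEMU code: `toy_replaying` stands in for `replay_mode == REPLAY_MODE_PLAY`, `toy_loading` stands in for the proposed `replay_loading` flag, and all other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical stand-ins for the globals in the diff. */
static bool toy_replaying; /* models replay_mode == REPLAY_MODE_PLAY */
static bool toy_loading;   /* models the proposed replay_loading flag */

enum toy_source { TOY_FROM_LOG, TOY_FROM_HOST };

/* Mirrors the patched condition: use the log only outside snapshot loads. */
static enum toy_source toy_pick_source(void)
{
    if (toy_replaying && !toy_loading) {
        return TOY_FROM_LOG;
    }
    return TOY_FROM_HOST;
}

/* Models a snapshot load: the flag is set only while reset hooks run. */
static enum toy_source toy_load_snapshot_source(void)
{
    enum toy_source src;

    assert(!toy_loading);    /* this sketch does not model nested loads */
    toy_loading = true;
    src = toy_pick_source(); /* a reset hook would ask for randomness here */
    toy_loading = false;
    return src;
}
```

With `toy_replaying` set, an ordinary request is served from the log, while a request made during a snapshot load bypasses it. Whether bypassing the log at that point is acceptable for record/replay is the question the thread leaves to the replay maintainers.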
Pavel Dovgalyuk Oct. 11, 2022, 6:46 a.m. UTC | #9
On 10.10.2022 18:32, Peter Maydell wrote:
> On Mon, 10 Oct 2022 at 16:21, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>
>> On Mon, Oct 10, 2022 at 11:54:50AM +0100, Peter Maydell wrote:
>>> The error is essentially the record-and-replay subsystem saying "the
>>> replay just asked for a random number at a point when the recording
>>> did not ask for one, and so there's no 'this is what the number was'
>>> info in the record".
>>>
>>> I have had a quick look, and I think the reason for this is that
>>> load_snapshot() ("reset the VM state to the snapshot state stored in the
>>> disk image or migration stream") does a system reset. The replay
>>> process involves a lot of "load state from a snapshot and play
>>> forwards from there" operations. It doesn't expect that load_snapshot()
>>> would result in something reading random data, but now that we are
>>> calling qemu_guest_getrandom() in a reset hook, that happens.
>>
>> Hmm... so this seems like a bug in the replay code then? Shouldn't that
>> reset handler get hit during both passes, so the entry should be in
>> each?
> 
> No, because record is just
> "reset the system, record all the way to the end stop",
> but replay is
> "set the system to the point we want to start at by using
> load_snapshot, play from there", and depending on the actions
> you do in the debugger like reverse-continue we might repeatedly
> do "reload that snapshot (implying a system reset) and play from there"
> multiple times.

The idea of the patches is FDT randomization during reset, right?
But reset is used not only for a real reboot, but also for restoring
snapshots.
In the latter case it is more like "just clear the hw registers to simplify
the initialization".
Therefore no other virtual hardware has tried to read external data at that
point. And random numbers are external to the machine; they come from the
outer world.

It means that this is a completely new reset case, and a new solution should
be found for it.

Pavel Dovgalyuk
Jason A. Donenfeld Oct. 11, 2022, 8:06 p.m. UTC | #10
On Tue, Oct 11, 2022 at 09:46:01AM +0300, Pavel Dovgalyuk wrote:
> On 10.10.2022 18:32, Peter Maydell wrote:
> > On Mon, 10 Oct 2022 at 16:21, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >>
> >> On Mon, Oct 10, 2022 at 11:54:50AM +0100, Peter Maydell wrote:
> >>> The error is essentially the record-and-replay subsystem saying "the
> >>> replay just asked for a random number at a point when the recording
> >>> did not ask for one, and so there's no 'this is what the number was'
> >>> info in the record".
> >>>
> >>> I have had a quick look, and I think the reason for this is that
> >>> load_snapshot() ("reset the VM state to the snapshot state stored in the
> >>> disk image or migration stream") does a system reset. The replay
> >>> process involves a lot of "load state from a snapshot and play
> >>> forwards from there" operations. It doesn't expect that load_snapshot()
> >>> would result in something reading random data, but now that we are
> >>> calling qemu_guest_getrandom() in a reset hook, that happens.
> >>
> >> Hmm... so this seems like a bug in the replay code then? Shouldn't that
> >> reset handler get hit during both passes, so the entry should be in
> >> each?
> > 
> > No, because record is just
> > "reset the system, record all the way to the end stop",
> > but replay is
> > "set the system to the point we want to start at by using
> > load_snapshot, play from there", and depending on the actions
> > you do in the debugger like reverse-continue we might repeatedly
> > do "reload that snapshot (implying a system reset) and play from there"
> > multiple times.
> 
> The idea of the patches is FDT randomization during reset, right?
> But reset is used not only for a real reboot, but also for restoring
> snapshots.
> In the latter case it is more like "just clear the hw registers to simplify
> the initialization".
> Therefore no other virtual hardware has tried to read external data at that
> point. And random numbers are external to the machine; they come from the
> outer world.
>
> It means that this is a completely new reset case, and a new solution should
> be found for it.

Do you have any proposals for that?

Jason
Jason A. Donenfeld Oct. 11, 2022, 8:40 p.m. UTC | #11
On Tue, Oct 11, 2022 at 2:06 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> On Tue, Oct 11, 2022 at 09:46:01AM +0300, Pavel Dovgalyuk wrote:
> > On 10.10.2022 18:32, Peter Maydell wrote:
> > > On Mon, 10 Oct 2022 at 16:21, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > >>
> > >> On Mon, Oct 10, 2022 at 11:54:50AM +0100, Peter Maydell wrote:
> > >>> The error is essentially the record-and-replay subsystem saying "the
> > >>> replay just asked for a random number at a point when the recording
> > >>> did not ask for one, and so there's no 'this is what the number was'
> > >>> info in the record".
> > >>>
> > >>> I have had a quick look, and I think the reason for this is that
> > >>> load_snapshot() ("reset the VM state to the snapshot state stored in the
> > >>> disk image or migration stream") does a system reset. The replay
> > >>> process involves a lot of "load state from a snapshot and play
> > >>> forwards from there" operations. It doesn't expect that load_snapshot()
> > >>> would result in something reading random data, but now that we are
> > >>> calling qemu_guest_getrandom() in a reset hook, that happens.
> > >>
> > >> Hmm... so this seems like a bug in the replay code then? Shouldn't that
> > >> reset handler get hit during both passes, so the entry should be in
> > >> each?
> > >
> > > No, because record is just
> > > "reset the system, record all the way to the end stop",
> > > but replay is
> > > "set the system to the point we want to start at by using
> > > load_snapshot, play from there", and depending on the actions
> > > you do in the debugger like reverse-continue we might repeatedly
> > > do "reload that snapshot (implying a system reset) and play from there"
> > > multiple times.
> >
> > The idea of the patches is FDT randomization during reset, right?
> > But reset is used not only for a real reboot, but also for restoring
> > snapshots.
> > In the latter case it is more like "just clear the hw registers to simplify
> > the initialization".
> > Therefore no other virtual hardware has tried to read external data at that
> > point. And random numbers are external to the machine; they come from the
> > outer world.
> >
> > It means that this is a completely new reset case, and a new solution should
> > be found for it.
>
> Do you have any proposals for that?

Okay I've actually read your message like 6 times now and think I may
have come up with something. Initial testing indicates it works well.
I'll send a new series shortly.

Jason

Patch

diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index ef060a9759..d552f324b6 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -196,6 +196,15 @@  int qemu_fdt_setprop_sized_cells_from_array(void *fdt,
                                                 qdt_tmp);                 \
     })
 
+
+/**
+ * qemu_fdt_randomize_seeds:
+ * @fdt: device tree blob
+ *
+ * Re-randomize all "rng-seed" properties with new seeds.
+ */
+void qemu_fdt_randomize_seeds(void *fdt);
+
 #define FDT_PCI_RANGE_RELOCATABLE          0x80000000
 #define FDT_PCI_RANGE_PREFETCHABLE         0x40000000
 #define FDT_PCI_RANGE_ALIASED              0x20000000
diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index 6ca3fad285..d986c7b7b3 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -22,6 +22,7 @@ 
 #include "qemu/option.h"
 #include "qemu/bswap.h"
 #include "qemu/cutils.h"
+#include "qemu/guest-random.h"
 #include "sysemu/device_tree.h"
 #include "hw/loader.h"
 #include "hw/boards.h"
@@ -643,3 +644,23 @@  out:
     g_free(propcells);
     return ret;
 }
+
+void qemu_fdt_randomize_seeds(void *fdt)
+{
+    int noffset, poffset, len;
+    const char *name;
+    uint8_t *data;
+
+    for (noffset = fdt_next_node(fdt, 0, NULL);
+         noffset >= 0;
+         noffset = fdt_next_node(fdt, noffset, NULL)) {
+        for (poffset = fdt_first_property_offset(fdt, noffset);
+             poffset >= 0;
+             poffset = fdt_next_property_offset(fdt, poffset)) {
+            data = (uint8_t *)fdt_getprop_by_offset(fdt, poffset, &name, &len);
+            if (!data || strcmp(name, "rng-seed"))
+                continue;
+            qemu_guest_getrandom_nofail(data, len);
+        }
+    }
+}
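
The traversal in `qemu_fdt_randomize_seeds()` can be illustrated without libfdt or QEMU headers. The sketch below is a simplified model under stated assumptions: the device tree is flattened into a hypothetical array of `struct toy_prop`, and `toy_getrandom()` uses `rand()` purely as a stand-in for `qemu_guest_getrandom_nofail()` (it is not cryptographically secure). What it preserves from the patch is the filter on the property name and the in-place overwrite that keeps each property's existing length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flattened stand-in for FDT properties; this is not libfdt. */
struct toy_prop {
    const char *name; /* property name, e.g. "rng-seed" */
    uint8_t *data;    /* property payload, may be NULL */
    size_t len;       /* payload length in bytes */
};

/* Stand-in for qemu_guest_getrandom_nofail(); rand() is NOT secure. */
static void toy_getrandom(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (uint8_t)(rand() & 0xff);
    }
}

/*
 * Overwrite the payload of every "rng-seed" property in place, keeping
 * its length. Returns the number of properties that were re-randomized.
 */
static size_t toy_randomize_seeds(struct toy_prop *props, size_t n)
{
    size_t touched = 0;

    assert(props != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!props[i].data || strcmp(props[i].name, "rng-seed") != 0) {
            continue; /* same filter as the patch: skip non-seed properties */
        }
        toy_getrandom(props[i].data, props[i].len);
        touched++;
    }
    return touched;
}
```

Because the payload is overwritten in place and the length is unchanged, the surrounding blob layout does not need to be rebuilt, which is what makes the helper cheap enough to call from a reset path.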