diff mbox series

[v2,4/4] virtio-mem: Support "x-ignore-shared" migration

Message ID 20230706075612.67404-5-david@redhat.com
State New
Headers show
Series virtio-mem: Support "x-ignore-shared" migration | expand

Commit Message

David Hildenbrand July 6, 2023, 7:56 a.m. UTC
To achieve desired "x-ignore-shared" functionality, we should not
discard all RAM when realizing the device and not mess with
preallocation/postcopy when loading device state. In essence, we should
not touch RAM content.

As "x-ignore-shared" gets set after realizing the device, we cannot
rely on that. Let's simply skip discarding of RAM on incoming migration.
Note that virtio_mem_post_load() will call
virtio_mem_restore_unplugged() -- unless "x-ignore-shared" is set. So
once migration finished we'll have a consistent state.

The initial system reset will also not discard any RAM, because
virtio_mem_unplug_all() will not call virtio_mem_unplug_all() when no
memory is plugged (which is the case before loading the device state).

Note that something like VM templating -- see commit b17fbbe55cba
("migration: allow private destination ram with x-ignore-shared") -- is
currently incompatible with virtio-mem and ram_block_discard_range() will
warn in case a private file mapping is supplied by virtio-mem.

For VM templating with virtio-mem, it makes more sense to either
(a) Create the template without the virtio-mem device and hotplug a
    virtio-mem device to the new VM instances using proper own memory
    backend.
(b) Use a virtio-mem device that doesn't provide any memory in the
    template (requested-size=0) and use private anonymous memory.

Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/virtio/virtio-mem.c | 47 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 38 insertions(+), 9 deletions(-)

Comments

Juan Quintela July 6, 2023, 11:06 a.m. UTC | #1
David Hildenbrand <david@redhat.com> wrote:
> To achieve desired "x-ignore-shared" functionality, we should not
> discard all RAM when realizing the device and not mess with
> preallocation/postcopy when loading device state. In essence, we should
> not touch RAM content.
>
> As "x-ignore-shared" gets set after realizing the device, we cannot
> rely on that. Let's simply skip discarding of RAM on incoming migration.
> Note that virtio_mem_post_load() will call
> virtio_mem_restore_unplugged() -- unless "x-ignore-shared" is set. So
> once migration finished we'll have a consistent state.
>
> The initial system reset will also not discard any RAM, because
> virtio_mem_unplug_all() will not call virtio_mem_unplug_all() when no
> memory is plugged (which is the case before loading the device state).
>
> Note that something like VM templating -- see commit b17fbbe55cba
> ("migration: allow private destination ram with x-ignore-shared")

And here I am, I reviewed the patch, and 4 years later I don't remember
anything about it O:-)

> -- is
> currently incompatible with virtio-mem and ram_block_discard_range() will
> warn in case a private file mapping is supplied by virtio-mem.

If it is incompatible, only a warning is not enough.

>
> For VM templating with virtio-mem, it makes more sense to either
> (a) Create the template without the virtio-mem device and hotplug a
>     virtio-mem device to the new VM instances using proper own memory
>     backend.
> (b) Use a virtio-mem device that doesn't provide any memory in the
>     template (requested-size=0) and use private anonymous memory.
>
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/virtio/virtio-mem.c | 47 ++++++++++++++++++++++++++++++++++--------
>  1 file changed, 38 insertions(+), 9 deletions(-)
>
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index a922c21380..3f41e00e74 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -18,6 +18,7 @@
>  #include "sysemu/numa.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/reset.h"
> +#include "sysemu/runstate.h"
>  #include "hw/virtio/virtio.h"
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-mem.h"
> @@ -901,11 +902,23 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> -    ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
> -    if (ret) {
> -        error_setg_errno(errp, -ret, "Unexpected error discarding RAM");
> -        ram_block_coordinated_discard_require(false);
> -        return;
> +    /*
> +     * We don't know at this point whether shared RAM is migrated using
> +     * QEMU or migrated using the file content. "x-ignore-shared" will be
> +     * configured after realizing the device. So in case we have an
> +     * incoming migration, simply always skip the discard step.
> +     *
> +     * Otherwise, make sure that we start with a clean slate: either the
> +     * memory backend might get reused or the shared file might still have
> +     * memory allocated.
> +     */
> +    if (!runstate_check(RUN_STATE_INMIGRATE)) {
> +        ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
> +        if (ret) {
> +            error_setg_errno(errp, -ret, "Unexpected error discarding RAM");
> +            ram_block_coordinated_discard_require(false);
> +            return;
> +        }
>      }

Makes sense.

>  
>      virtio_mem_resize_usable_region(vmem, vmem->requested_size, true);
> @@ -977,10 +990,6 @@ static int virtio_mem_post_load(void *opaque, int version_id)
>      RamDiscardListener *rdl;
>      int ret;
>  
> -    if (vmem->prealloc && !vmem->early_migration) {
> -        warn_report("Proper preallocation with migration requires a newer QEMU machine");
> -    }
> -
>      /*
>       * We started out with all memory discarded and our memory region is mapped
>       * into an address space. Replay, now that we updated the bitmap.
> @@ -993,6 +1002,18 @@ static int virtio_mem_post_load(void *opaque, int version_id)
>          }
>      }
>  
> +    /*
> +     * If shared RAM is migrated using the file content and not using QEMU,
> +     * don't mess with preallocation and postcopy.
> +     */
> +    if (migrate_ram_is_ignored(vmem->memdev->mr.ram_block)) {
> +        return 0;
> +    }
> +
> +    if (vmem->prealloc && !vmem->early_migration) {
> +        warn_report("Proper preallocation with migration requires a newer QEMU machine");
> +    }
> +

Could you explain why you are putting the check after calling
virtio_mem_notify_populate_cb()?

What is it expected to for file memory backed RAM?  I got lost when I
saw that it just calls:

static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg)
{
    RamDiscardListener *rdl = arg;

    return rdl->notify_populate(rdl, s);
}


I end in vfio, and got completely confused about what is going on there.


>      if (migration_in_incoming_postcopy()) {
>          return 0;
>      }
> @@ -1025,6 +1046,14 @@ static int virtio_mem_post_load_early(void *opaque, int version_id)
>          return 0;
>      }
>  
> +    /*
> +     * If shared RAM is migrated using the file content and not using QEMU,
> +     * don't mess with preallocation and postcopy.
> +     */
> +    if (migrate_ram_is_ignored(rb)) {
> +        return 0;
> +    }
> +
>      /*
>       * We restored the bitmap and verified that the basic properties
>       * match on source and destination, so we can go ahead and preallocate

OK.

Thanks, Juan.
David Hildenbrand July 6, 2023, 11:27 a.m. UTC | #2
On 06.07.23 13:06, Juan Quintela wrote:
> David Hildenbrand <david@redhat.com> wrote:
>> To achieve desired "x-ignore-shared" functionality, we should not
>> discard all RAM when realizing the device and not mess with
>> preallocation/postcopy when loading device state. In essence, we should
>> not touch RAM content.
>>
>> As "x-ignore-shared" gets set after realizing the device, we cannot
>> rely on that. Let's simply skip discarding of RAM on incoming migration.
>> Note that virtio_mem_post_load() will call
>> virtio_mem_restore_unplugged() -- unless "x-ignore-shared" is set. So
>> once migration finished we'll have a consistent state.
>>
>> The initial system reset will also not discard any RAM, because
>> virtio_mem_unplug_all() will not call virtio_mem_unplug_all() when no
>> memory is plugged (which is the case before loading the device state).
>>
>> Note that something like VM templating -- see commit b17fbbe55cba
>> ("migration: allow private destination ram with x-ignore-shared")
> 
> And here I am, I reviewed the patch, and 4 years later I don't remember
> anything about it O:-)

:)

[...]

>> +    /*
>> +     * If shared RAM is migrated using the file content and not using QEMU,
>> +     * don't mess with preallocation and postcopy.
>> +     */
>> +    if (migrate_ram_is_ignored(vmem->memdev->mr.ram_block)) {
>> +        return 0;
>> +    }
>> +
>> +    if (vmem->prealloc && !vmem->early_migration) {
>> +        warn_report("Proper preallocation with migration requires a newer QEMU machine");
>> +    }
>> +
> 
> Could you explain why you are putting the check after calling
> virtio_mem_notify_populate_cb()?
> 
> What is it expected to for file memory backed RAM?  I got lost when I
> saw that it just calls:
> 
> static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg)
> {
>      RamDiscardListener *rdl = arg;
> 
>      return rdl->notify_populate(rdl, s);
> }
> 
> 
> I end in vfio, and got completely confused about what is going on there.


:)


Once we reached virtio_mem_post_load(), we restored the bitmap that 
contains the state of all device blocks (plugged vs. unplugged).

Whenever we modify the bitmap (plug / unplug), we have to notify 
(RamDiscardManager) listeners, such that they are aware of the state 
change and can perform according action.

For example, vfio will go ahead and register the newly plugged blocks 
with the kernel (DMA map it into the vfio), where the kernel will end up 
long-term pinning these pages. Effectively, we only end up DMA-mapping 
plugged memory blocks, so only these get pinned by the kernel (and we 
can actually release the memory of unplugged blocks).


So here (virtio_mem_post_load()), we just restored the bitmap from the 
migration stream and effectively went from 0 plugged blocks (bitmap 
empty) before migration to "maybe some plugged blocks in the bitmap".

So we go over the bitmap and tell the world (vfio) to go ahead and 
DMA-map these blocks that are suddenly plugged.


And that part is independent of the actual RAM migration / 
x-ignore-shared, sow have to do it unconditional.



Thanks for the thorough review!
Juan Quintela July 6, 2023, 11:59 a.m. UTC | #3
David Hildenbrand <david@redhat.com> wrote:
> To achieve desired "x-ignore-shared" functionality, we should not
> discard all RAM when realizing the device and not mess with
> preallocation/postcopy when loading device state. In essence, we should
> not touch RAM content.
>
> As "x-ignore-shared" gets set after realizing the device, we cannot
> rely on that. Let's simply skip discarding of RAM on incoming migration.
> Note that virtio_mem_post_load() will call
> virtio_mem_restore_unplugged() -- unless "x-ignore-shared" is set. So
> once migration finished we'll have a consistent state.
>
> The initial system reset will also not discard any RAM, because
> virtio_mem_unplug_all() will not call virtio_mem_unplug_all() when no
> memory is plugged (which is the case before loading the device state).
>
> Note that something like VM templating -- see commit b17fbbe55cba
> ("migration: allow private destination ram with x-ignore-shared") -- is
> currently incompatible with virtio-mem and ram_block_discard_range() will
> warn in case a private file mapping is supplied by virtio-mem.
>
> For VM templating with virtio-mem, it makes more sense to either
> (a) Create the template without the virtio-mem device and hotplug a
>     virtio-mem device to the new VM instances using proper own memory
>     backend.
> (b) Use a virtio-mem device that doesn't provide any memory in the
>     template (requested-size=0) and use private anonymous memory.
>
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

After very nice explanation.

Reviewed-by: Juan Quintela <quintela@redhat.com>
diff mbox series

Patch

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index a922c21380..3f41e00e74 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -18,6 +18,7 @@ 
 #include "sysemu/numa.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/reset.h"
+#include "sysemu/runstate.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-mem.h"
@@ -901,11 +902,23 @@  static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
-    if (ret) {
-        error_setg_errno(errp, -ret, "Unexpected error discarding RAM");
-        ram_block_coordinated_discard_require(false);
-        return;
+    /*
+     * We don't know at this point whether shared RAM is migrated using
+     * QEMU or migrated using the file content. "x-ignore-shared" will be
+     * configured after realizing the device. So in case we have an
+     * incoming migration, simply always skip the discard step.
+     *
+     * Otherwise, make sure that we start with a clean slate: either the
+     * memory backend might get reused or the shared file might still have
+     * memory allocated.
+     */
+    if (!runstate_check(RUN_STATE_INMIGRATE)) {
+        ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
+        if (ret) {
+            error_setg_errno(errp, -ret, "Unexpected error discarding RAM");
+            ram_block_coordinated_discard_require(false);
+            return;
+        }
     }
 
     virtio_mem_resize_usable_region(vmem, vmem->requested_size, true);
@@ -977,10 +990,6 @@  static int virtio_mem_post_load(void *opaque, int version_id)
     RamDiscardListener *rdl;
     int ret;
 
-    if (vmem->prealloc && !vmem->early_migration) {
-        warn_report("Proper preallocation with migration requires a newer QEMU machine");
-    }
-
     /*
      * We started out with all memory discarded and our memory region is mapped
      * into an address space. Replay, now that we updated the bitmap.
@@ -993,6 +1002,18 @@  static int virtio_mem_post_load(void *opaque, int version_id)
         }
     }
 
+    /*
+     * If shared RAM is migrated using the file content and not using QEMU,
+     * don't mess with preallocation and postcopy.
+     */
+    if (migrate_ram_is_ignored(vmem->memdev->mr.ram_block)) {
+        return 0;
+    }
+
+    if (vmem->prealloc && !vmem->early_migration) {
+        warn_report("Proper preallocation with migration requires a newer QEMU machine");
+    }
+
     if (migration_in_incoming_postcopy()) {
         return 0;
     }
@@ -1025,6 +1046,14 @@  static int virtio_mem_post_load_early(void *opaque, int version_id)
         return 0;
     }
 
+    /*
+     * If shared RAM is migrated using the file content and not using QEMU,
+     * don't mess with preallocation and postcopy.
+     */
+    if (migrate_ram_is_ignored(rb)) {
+        return 0;
+    }
+
     /*
      * We restored the bitmap and verified that the basic properties
      * match on source and destination, so we can go ahead and preallocate