From patchwork Tue Jan 25 13:57:33 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 1584089
From: David Hildenbrand
To: qemu-devel@nongnu.org
Subject: [PATCH v2 1/2] virtio-mem: Fail if a memory backend with "prealloc=on" is specified
Date: Tue, 25 Jan 2022 14:57:33 +0100
Message-Id: <20220125135734.134928-2-david@redhat.com>
In-Reply-To: <20220125135734.134928-1-david@redhat.com>
References: <20220125135734.134928-1-david@redhat.com>
Cc: Michal Privoznik, Juan Quintela, "Michael S . Tsirkin", "Dr . David Alan Gilbert", David Hildenbrand

"prealloc=on" for the memory backend does not work as expected, as
virtio-mem will simply discard all preallocated memory immediately again.
In the best case, it's an expensive NOP. In the worst case, it's an
unexpected allocation error.

Instead, "prealloc=on" should be specified for the virtio-mem device only,
such that virtio-mem will try preallocating memory before plugging memory
dynamically to the guest. Fail if such a memory backend is provided.

Tested-by: Michal Privoznik
Signed-off-by: David Hildenbrand
---
 hw/virtio/virtio-mem.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index f55dcf61f2..b7bad6ef96 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -773,6 +773,12 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
         error_setg(errp, "'%s' property specifies an unsupported memdev",
                    VIRTIO_MEM_MEMDEV_PROP);
         return;
+    } else if (vmem->memdev->prealloc) {
+        error_setg(errp, "'%s' property specifies a memdev with preallocation"
+                   " enabled: %s. Instead, specify 'prealloc=on' for the"
+                   " virtio-mem device.", VIRTIO_MEM_MEMDEV_PROP,
+                   object_get_canonical_path_component(OBJECT(vmem->memdev)));
+        return;
     }
 
     if ((nb_numa_nodes && vmem->node >= nb_numa_nodes) ||

From patchwork Tue Jan 25 13:57:34 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 1584141
From: David Hildenbrand
To: qemu-devel@nongnu.org
Subject: [PATCH v2 2/2] virtio-mem: Handle preallocation with migration
Date: Tue, 25 Jan 2022 14:57:34 +0100
Message-Id: <20220125135734.134928-3-david@redhat.com>
In-Reply-To: <20220125135734.134928-1-david@redhat.com>
References: <20220125135734.134928-1-david@redhat.com>
Cc: Michal Privoznik, Juan Quintela, "Michael S . Tsirkin", "Dr . David Alan Gilbert", David Hildenbrand

During precopy we usually write all plugged areas and essentially
allocate them. However, there are two corner cases:

1) Migrating the zeropage

When the zeropage gets migrated, we first check if the destination range is
already zero and avoid performing a write in that case:
ram_handle_compressed(). If the memory backend, like anonymous RAM or most
filesystems, populates the shared zeropage when reading a (file) hole, we
don't preallocate backend memory. In that case, we have to explicitly
trigger the allocation to allocate actual backend memory.

2) Excluding memory ranges during migration

For example, virtio-balloon free page hinting will exclude some pages from
getting migrated. In that case, we don't allocate memory for plugged ranges
when migrating.

So trigger allocation of all plugged ranges when restoring the device state
and fail gracefully if allocation fails.

Handling postcopy is a bit more tricky, as postcopy and preallocation are
problematic in general. To at least mimic what ordinary preallocation does,
temporarily try allocating the requested amount of memory and fail postcopy
in case the requested size on source and destination doesn't match. This
way, we at least checked that there isn't a fundamental configuration issue
and that we were able to preallocate the required amount of memory at least
once, instead of failing unrecoverably during postcopy later. However, just
like ordinary preallocation with postcopy, it's racy.

Tested-by: Michal Privoznik
Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: David Hildenbrand
---
 hw/virtio/virtio-mem.c         | 136 +++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-mem.h |   6 ++
 2 files changed, 142 insertions(+)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index b7bad6ef96..226081fb63 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -27,6 +27,7 @@
 #include "qapi/visitor.h"
 #include "exec/ram_addr.h"
 #include "migration/misc.h"
+#include "migration/postcopy-ram.h"
 #include "hw/boards.h"
 #include "hw/qdev-properties.h"
 #include CONFIG_DEVICES
@@ -203,6 +204,30 @@ static int virtio_mem_for_each_unplugged_range(const VirtIOMEM *vmem, void *arg,
     return ret;
 }
 
+static int virtio_mem_for_each_plugged_range(const VirtIOMEM *vmem, void *arg,
+                                             virtio_mem_range_cb cb)
+{
+    unsigned long first_bit, last_bit;
+    uint64_t offset, size;
+    int ret = 0;
+
+    first_bit = find_first_bit(vmem->bitmap, vmem->bitmap_size);
+    while (first_bit < vmem->bitmap_size) {
+        offset = first_bit * vmem->block_size;
+        last_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size,
+                                      first_bit + 1) - 1;
+        size = (last_bit - first_bit + 1) * vmem->block_size;
+
+        ret = cb(vmem, arg, offset, size);
+        if (ret) {
+            break;
+        }
+        first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size,
+                                  last_bit + 2);
+    }
+    return ret;
+}
+
 /*
  * Adjust the memory section to cover the intersection with the given range.
  *
@@ -828,6 +853,7 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
     if (!vmem->block_size) {
         vmem->block_size = virtio_mem_default_block_size(rb);
     }
+    vmem->initial_requested_size = vmem->requested_size;
 
     if (vmem->block_size < page_size) {
         error_setg(errp, "'%s' property has to be at least the page size (0x%"
@@ -888,6 +914,7 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
      */
     memory_region_set_ram_discard_manager(&vmem->memdev->mr,
                                           RAM_DISCARD_MANAGER(vmem));
+    postcopy_add_notifier(&vmem->postcopy_notifier);
 }
 
 static void virtio_mem_device_unrealize(DeviceState *dev)
@@ -895,6 +922,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOMEM *vmem = VIRTIO_MEM(dev);
 
+    postcopy_remove_notifier(&vmem->postcopy_notifier);
     /*
      * The unplug handler unmapped the memory region, it cannot be
      * found via an address space anymore. Unset ourselves.
@@ -924,12 +952,119 @@ static int virtio_mem_restore_unplugged(VirtIOMEM *vmem)
                                                virtio_mem_discard_range_cb);
 }
 
+static int virtio_mem_prealloc_range(const VirtIOMEM *vmem, uint64_t offset,
+                                     uint64_t size)
+{
+    void *area = memory_region_get_ram_ptr(&vmem->memdev->mr) + offset;
+    int fd = memory_region_get_fd(&vmem->memdev->mr);
+    Error *local_err = NULL;
+
+    os_mem_prealloc(fd, area, size, 1, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -ENOMEM;
+    }
+    return 0;
+}
+
+static int virtio_mem_prealloc_range_cb(const VirtIOMEM *vmem, void *arg,
+                                        uint64_t offset, uint64_t size)
+{
+    return virtio_mem_prealloc_range(vmem, offset, size);
+}
+
+static int virtio_mem_restore_prealloc(VirtIOMEM *vmem)
+{
+    /*
+     * Make sure any preallocated memory is really preallocated. Migration
+     * might have skipped some pages or optimized for the zeropage.
+     */
+    return virtio_mem_for_each_plugged_range(vmem, NULL,
+                                             virtio_mem_prealloc_range_cb);
+}
+
+static int virtio_mem_postcopy_notify(NotifierWithReturn *notifier,
+                                      void *opaque)
+{
+    struct PostcopyNotifyData *pnd = opaque;
+    VirtIOMEM *vmem = container_of(notifier, VirtIOMEM, postcopy_notifier);
+    RAMBlock *rb = vmem->memdev->mr.ram_block;
+    int ret;
+
+    if (pnd->reason != POSTCOPY_NOTIFY_INBOUND_ADVISE || !vmem->prealloc ||
+        !vmem->initial_requested_size) {
+        return 0;
+    }
+    assert(!vmem->size);
+
+    /*
+     * When creating the device we discard all memory and we don't know
+     * which blocks the source has plugged (and should be preallocated) until we
+     * restore the device state. However, we cannot allocate when restoring the
+     * device state either if postcopy is already active.
+     *
+     * If we reach this point, postcopy is possible and we have preallocation
+     * enabled.
+     *
+     * Temporarily allocate the requested size to see if there is a fundamental
+     * configuration issue that would make postcopy fail because the memory
+     * backend is out of memory. While this increases reliability,
+     * prealloc+postcopy cannot be fully reliable: see the comment in
+     * virtio_mem_post_load().
+     */
+    ret = virtio_mem_prealloc_range(vmem, 0, vmem->initial_requested_size);
+    if (ram_block_discard_range(rb, 0, vmem->initial_requested_size)) {
+        ret = ret ? ret : -EINVAL;
+        return ret;
+    }
+    return ret;
+}
+
 static int virtio_mem_post_load(void *opaque, int version_id)
 {
     VirtIOMEM *vmem = VIRTIO_MEM(opaque);
     RamDiscardListener *rdl;
     int ret;
 
+    if (vmem->prealloc) {
+        if (migration_in_incoming_postcopy()) {
+            /*
+             * Prealloc with postcopy cannot possibly work fully reliably in
+             * general: preallocation has to populate all memory immediately and
+             * fail gracefully before the guest started running on the
+             * destination while postcopy wants to discard memory and populate
+             * on demand after the guest started running on the destination.
+             *
+             * For ordinary memory backends, "prealloc=on" is essentially
+             * overridden by postcopy, which will simply discard preallocated
+             * pages and might fail later when running out of backend memory
+             * when trying to place a page: the earlier preallocation only makes
+             * it less likely to fail, but nothing (not even huge page
+             * reservation) will guarantee that postcopy will find a free page
+             * to place once the guest is running on the destination.
+             *
+             * We temporarily allocate "requested-size" during
+             * POSTCOPY_NOTIFY_INBOUND_ADVISE, before migrating any memory. This
+             * resembles what is done with ordinary memory backends.
+             *
+             * We need to have a matching requested size on source and
+             * destination so that we actually temporarily allocated the right
+             * amount of memory. As requested-size changed when restoring the
+             * state, check against the initial value.
+             */
+            if (vmem->requested_size != vmem->initial_requested_size) {
+                error_report("postcopy with 'prealloc=on' needs matching"
+                             " 'requested-size' on source and destination");
+                return -EINVAL;
+            }
+        } else {
+            ret = virtio_mem_restore_prealloc(vmem);
+            if (ret) {
+                return ret;
+            }
+        }
+    }
+
     /*
      * We started out with all memory discarded and our memory region is mapped
      * into an address space. Replay, now that we updated the bitmap.
@@ -1198,6 +1333,7 @@ static void virtio_mem_instance_init(Object *obj)
 
     notifier_list_init(&vmem->size_change_notifiers);
     QLIST_INIT(&vmem->rdl_list);
+    vmem->postcopy_notifier.notify = virtio_mem_postcopy_notify;
 
     object_property_add(obj, VIRTIO_MEM_SIZE_PROP, "size", virtio_mem_get_size,
                         NULL, NULL, NULL);
diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h
index 7745cfc1a3..45395152d2 100644
--- a/include/hw/virtio/virtio-mem.h
+++ b/include/hw/virtio/virtio-mem.h
@@ -61,6 +61,9 @@ struct VirtIOMEM {
     /* requested size */
     uint64_t requested_size;
 
+    /* initial requested size on startup */
+    uint64_t initial_requested_size;
+
     /* block size and alignment */
     uint64_t block_size;
 
@@ -77,6 +80,9 @@ struct VirtIOMEM {
     /* notifiers to notify when "size" changes */
     NotifierList size_change_notifiers;
 
+    /* notifier for postcopy events */
+    NotifierWithReturn postcopy_notifier;
+
     /* listeners to notify on plug/unplug activity. */
     QLIST_HEAD(, RamDiscardListener) rdl_list;
 };
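
Note on the plugged-range walk in patch 2/2: virtio_mem_for_each_plugged_range() turns each run of set bits in the block bitmap into a single (offset, size) callback, so preallocation is issued once per contiguous plugged range rather than once per block. The stand-alone C sketch below illustrates that walk outside of QEMU. It is only an illustration under assumptions: bit_is_set(), next_bit_with_value(), for_each_plugged_range(), struct range_log and log_range() are hypothetical names made up for this example, and the simple linear scan stands in for QEMU's find_next_bit() and find_next_zero_bit() helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for a bitmap test helper; not QEMU code. */
static int bit_is_set(const unsigned long *map, size_t bit)
{
    return (int)((map[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL);
}

/* Return the first bit >= start whose value equals want, or nbits if none. */
static size_t next_bit_with_value(const unsigned long *map, size_t nbits,
                                  size_t start, int want)
{
    size_t bit;

    for (bit = start; bit < nbits; bit++) {
        if (bit_is_set(map, bit) == want) {
            return bit;
        }
    }
    return nbits;
}

typedef int (*range_cb)(void *arg, uint64_t offset, uint64_t size);

/*
 * Call cb once per run of consecutive set bits, translated into a byte
 * offset and size. Stops early and returns the callback's value if it is
 * non-zero, mirroring the error propagation in the patch.
 */
static int for_each_plugged_range(const unsigned long *map, size_t nbits,
                                  uint64_t block_size, void *arg, range_cb cb)
{
    size_t first = next_bit_with_value(map, nbits, 0, 1);
    int ret = 0;

    while (first < nbits) {
        /* One past the last set bit of the current run. */
        size_t end = next_bit_with_value(map, nbits, first + 1, 0);

        ret = cb(arg, first * block_size, (end - first) * block_size);
        if (ret) {
            break;
        }
        /* Bit "end" is clear (or out of range), so resume right after it. */
        first = next_bit_with_value(map, nbits, end + 1, 1);
    }
    return ret;
}

/* Small recorder used to observe which ranges were reported. */
struct range_log {
    uint64_t offset[8];
    uint64_t size[8];
    size_t count;
};

static int log_range(void *arg, uint64_t offset, uint64_t size)
{
    struct range_log *log = arg;

    if (log->count == 8) {
        return -1; /* recorder full: abort the walk */
    }
    log->offset[log->count] = offset;
    log->size[log->count] = size;
    log->count++;
    return 0;
}
```

In this sketch, a caller passes the bitmap, the number of valid bits, the block size, and a callback; for an 8-block bitmap with blocks 0-1 and 3-5 plugged, log_range() would be invoked twice, once per contiguous run. In the patch itself the callback role is played by virtio_mem_prealloc_range_cb().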