From patchwork Wed Nov 1 13:16:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mattias Nissler X-Patchwork-Id: 1857956 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.a=rsa-sha256 header.s=20230601 header.b=lLZoQxvp; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SL6xN4NG7z1yQp for ; Thu, 2 Nov 2023 00:17:27 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qyB5F-0000Ge-V3; Wed, 01 Nov 2023 09:16:34 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qyB5E-0000GK-5b for qemu-devel@nongnu.org; Wed, 01 Nov 2023 09:16:32 -0400 Received: from mail-oa1-x31.google.com ([2001:4860:4864:20::31]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qyB5B-0003cx-Sk for qemu-devel@nongnu.org; Wed, 01 Nov 2023 09:16:31 -0400 Received: by mail-oa1-x31.google.com with SMTP id 586e51a60fabf-1e9a757e04eso450428fac.0 for ; Wed, 01 Nov 2023 06:16:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1698844587; x=1699449387; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Pd11A/p5zWjpALjqwGU8mFzb6ifOSU8eUKVZKWCWVYw=; b=lLZoQxvpvktCUZNX91GigEt3lk1ne6Vof2+6RUWTkVgGML91YCZylGUm3RCej+xPdU r8RJWVq8z7I2VPUAajmnTGOults9Kyg9vVX/iJ0rdHcH4k1SNc6y7v/Q+PjIHyQh6Npn 5yDytd0VbjVv9irvPAtfO9MUBrKhbFimxCQZokn7QQS+0tZXQa1eNsvbxmgpGoNKoX3B weKZTooRDfNzAJ97+X7FGsctaOrQ/Kf+oxdfpbdvDQFAi/ZNJOt7w8TvsoPm7y7Q3B4N hwDSF28XThTEoWsze2vXVCGEwlF22Y/sgJhaqLwMPFapZ8DZ7Jvad0ajBAzV2ru26Ast 3vGw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698844587; x=1699449387; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Pd11A/p5zWjpALjqwGU8mFzb6ifOSU8eUKVZKWCWVYw=; b=py3p1qn0Mx3yRkwX2HHvhn1Q8O/xE0BTqM9dO8SmESr8YKzhKdO6lnj6wCXs6tezyC YSD+uVniOFTBiVi4Wj7xOC0jFE+U2xaafGpJ2HTCBDsoks3FjkRDpdZiEmw8fpg6Dk9h 9mIeYIW5So3l7+tpNJgh++GxadW7eit8MOG6qANoOL+Pil1Cazo7Pfdd0a7jAOKz61N9 DyBQzJhrNQh5igATP92fHrg4zz1luFooZhoEFITqajiNE2OMKZgH357almUjN+E4yFK7 TtGdvFu3/I+gOowuYilimOPUF6qASlS8y8kAVtmzplxeanRag3NT67d8KbwXLqaPiDSb MTug== X-Gm-Message-State: AOJu0YxKh2HiMjucvr7hu4b8IvChEmWX7iEjzlBEqJFIusmEH2XU0J6J S4sfef7oZLg3K5/wjTBFKsgfvQ== X-Google-Smtp-Source: AGHT+IGR/bvyBF+LN948JGHufsQ1De7H443ZX6HE5V1jYR5HRL5LOfwdUYZU0cp+lOLDbAtx2ljVzw== X-Received: by 2002:a05:6871:5312:b0:1ef:b0b6:7e14 with SMTP id hx18-20020a056871531200b001efb0b67e14mr3379619oac.10.1698844587597; Wed, 01 Nov 2023 06:16:27 -0700 (PDT) Received: from mnissler.ba.rivosinc.com ([64.71.180.162]) by smtp.gmail.com with ESMTPSA id y6-20020a056870e3c600b001d4d8efa7f9sm238148oad.4.2023.11.01.06.16.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Nov 2023 06:16:27 -0700 (PDT) From: Mattias Nissler To: stefanha@redhat.com, jag.raman@oracle.com, qemu-devel@nongnu.org, peterx@redhat.com Cc: john.levon@nutanix.com, David Hildenbrand , Marcel Apfelbaum , Paolo Bonzini , "Michael S. Tsirkin" , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Elena Ufimtseva , Richard Henderson , Mattias Nissler Subject: [PATCH v6 1/5] softmmu: Per-AddressSpace bounce buffering Date: Wed, 1 Nov 2023 06:16:07 -0700 Message-Id: <20231101131611.775299-2-mnissler@rivosinc.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231101131611.775299-1-mnissler@rivosinc.com> References: <20231101131611.775299-1-mnissler@rivosinc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2001:4860:4864:20::31; envelope-from=mnissler@rivosinc.com; helo=mail-oa1-x31.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Instead of using a single global bounce buffer, give each AddressSpace its own bounce buffer. The MapClient callback mechanism moves to AddressSpace accordingly. This is in preparation for generalizing bounce buffer handling further to allow multiple bounce buffers, with a total allocation limit configured per AddressSpace. Signed-off-by: Mattias Nissler Reviewed-by: Peter Xu --- include/exec/cpu-common.h | 2 - include/exec/memory.h | 45 ++++++++++++++++- system/dma-helpers.c | 4 +- system/memory.c | 7 +++ system/physmem.c | 101 ++++++++++++++++---------------------- 5 files changed, 93 insertions(+), 66 deletions(-) diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h index 30c376a4de..0d31856898 100644 --- a/include/exec/cpu-common.h +++ b/include/exec/cpu-common.h @@ -160,8 +160,6 @@ void *cpu_physical_memory_map(hwaddr addr, bool is_write); void cpu_physical_memory_unmap(void *buffer, hwaddr len, bool is_write, hwaddr access_len); -void cpu_register_map_client(QEMUBH *bh); -void cpu_unregister_map_client(QEMUBH *bh); bool cpu_physical_memory_is_io(hwaddr phys_addr); diff --git a/include/exec/memory.h b/include/exec/memory.h index 9087d02769..fd2cbee65b 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -1086,6 +1086,19 @@ struct MemoryListener { QTAILQ_ENTRY(MemoryListener) link_as; }; +typedef struct AddressSpaceMapClient { + QEMUBH *bh; + QLIST_ENTRY(AddressSpaceMapClient) link; +} AddressSpaceMapClient; + +typedef struct { + MemoryRegion *mr; + void *buffer; + hwaddr addr; + hwaddr len; + bool in_use; +} BounceBuffer; + /** * struct AddressSpace: describes a mapping of addresses to #MemoryRegion objects */ @@ -1103,6 +1116,12 @@ struct AddressSpace { struct MemoryRegionIoeventfd *ioeventfds; QTAILQ_HEAD(, MemoryListener) listeners; QTAILQ_ENTRY(AddressSpace) address_spaces_link; + + /* Bounce buffer to use for this address space. */ + BounceBuffer bounce; + /* List of callbacks to invoke when buffers free up */ + QemuMutex map_client_list_lock; + QLIST_HEAD(, AddressSpaceMapClient) map_client_list; }; typedef struct AddressSpaceDispatch AddressSpaceDispatch; @@ -2874,8 +2893,8 @@ bool address_space_access_valid(AddressSpace *as, hwaddr addr, hwaddr len, * May return %NULL and set *@plen to zero(0), if resources needed to perform * the mapping are exhausted. * Use only for reads OR writes - not for read-modify-write operations. - * Use cpu_register_map_client() to know when retrying the map operation is - * likely to succeed. + * Use address_space_register_map_client() to know when retrying the map + * operation is likely to succeed. * * @as: #AddressSpace to be accessed * @addr: address within that address space @@ -2900,6 +2919,28 @@ void *address_space_map(AddressSpace *as, hwaddr addr, void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len, bool is_write, hwaddr access_len); +/* + * address_space_register_map_client: Register a callback to invoke when + * resources for address_space_map() are available again. + * + * address_space_map may fail when there are not enough resources available, + * such as when bounce buffer memory would exceed the limit. The callback can + * be used to retry the address_space_map operation. Note that the callback + * gets automatically removed after firing. + * + * @as: #AddressSpace to be accessed + * @bh: callback to invoke when address_space_map() retry is appropriate + */ +void address_space_register_map_client(AddressSpace *as, QEMUBH *bh); + +/* + * address_space_unregister_map_client: Unregister a callback that has + * previously been registered and not fired yet. + * + * @as: #AddressSpace to be accessed + * @bh: callback to unregister + */ +void address_space_unregister_map_client(AddressSpace *as, QEMUBH *bh); /* Internal functions, part of the implementation of address_space_read. */ MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr, diff --git a/system/dma-helpers.c b/system/dma-helpers.c index 36211acc7e..611ea04ffb 100644 --- a/system/dma-helpers.c +++ b/system/dma-helpers.c @@ -167,7 +167,7 @@ static void dma_blk_cb(void *opaque, int ret) if (dbs->iov.size == 0) { trace_dma_map_wait(dbs); dbs->bh = aio_bh_new(ctx, reschedule_dma, dbs); - cpu_register_map_client(dbs->bh); + address_space_register_map_client(dbs->sg->as, dbs->bh); goto out; } @@ -197,7 +197,7 @@ static void dma_aio_cancel(BlockAIOCB *acb) } if (dbs->bh) { - cpu_unregister_map_client(dbs->bh); + address_space_unregister_map_client(dbs->sg->as, dbs->bh); qemu_bh_delete(dbs->bh); dbs->bh = NULL; } diff --git a/system/memory.c b/system/memory.c index 4928f2525d..a63bbc79a7 100644 --- a/system/memory.c +++ b/system/memory.c @@ -3132,6 +3132,9 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name) as->ioeventfds = NULL; QTAILQ_INIT(&as->listeners); QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link); + as->bounce.in_use = false; + qemu_mutex_init(&as->map_client_list_lock); + QLIST_INIT(&as->map_client_list); as->name = g_strdup(name ? name : "anonymous"); address_space_update_topology(as); address_space_update_ioeventfds(as); @@ -3139,6 +3142,10 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name) static void do_address_space_destroy(AddressSpace *as) { + assert(!qatomic_read(&as->bounce.in_use)); + assert(QLIST_EMPTY(&as->map_client_list)); + qemu_mutex_destroy(&as->map_client_list_lock); + assert(QTAILQ_EMPTY(&as->listeners)); flatview_unref(as->current_map); diff --git a/system/physmem.c b/system/physmem.c index fc2b0fee01..1b7ff90113 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -2952,55 +2952,37 @@ void cpu_flush_icache_range(hwaddr start, hwaddr len) NULL, len, FLUSH_CACHE); } -typedef struct { - MemoryRegion *mr; - void *buffer; - hwaddr addr; - hwaddr len; - bool in_use; -} BounceBuffer; - -static BounceBuffer bounce; - -typedef struct MapClient { - QEMUBH *bh; - QLIST_ENTRY(MapClient) link; -} MapClient; - -QemuMutex map_client_list_lock; -static QLIST_HEAD(, MapClient) map_client_list - = QLIST_HEAD_INITIALIZER(map_client_list); - -static void cpu_unregister_map_client_do(MapClient *client) +static void +address_space_unregister_map_client_do(AddressSpaceMapClient *client) { QLIST_REMOVE(client, link); g_free(client); } -static void cpu_notify_map_clients_locked(void) +static void address_space_notify_map_clients_locked(AddressSpace *as) { - MapClient *client; + AddressSpaceMapClient *client; - while (!QLIST_EMPTY(&map_client_list)) { - client = QLIST_FIRST(&map_client_list); + while (!QLIST_EMPTY(&as->map_client_list)) { + client = QLIST_FIRST(&as->map_client_list); qemu_bh_schedule(client->bh); - cpu_unregister_map_client_do(client); + address_space_unregister_map_client_do(client); } } -void cpu_register_map_client(QEMUBH *bh) +void address_space_register_map_client(AddressSpace *as, QEMUBH *bh) { - MapClient *client = g_malloc(sizeof(*client)); + AddressSpaceMapClient *client = g_malloc(sizeof(*client)); - qemu_mutex_lock(&map_client_list_lock); + qemu_mutex_lock(&as->map_client_list_lock); client->bh = bh; - QLIST_INSERT_HEAD(&map_client_list, client, link); + QLIST_INSERT_HEAD(&as->map_client_list, client, link); /* Write map_client_list before reading in_use. */ smp_mb(); - if (!qatomic_read(&bounce.in_use)) { - cpu_notify_map_clients_locked(); + if (!qatomic_read(&as->bounce.in_use)) { + address_space_notify_map_clients_locked(as); } - qemu_mutex_unlock(&map_client_list_lock); + qemu_mutex_unlock(&as->map_client_list_lock); } void cpu_exec_init_all(void) @@ -3016,28 +2998,27 @@ void cpu_exec_init_all(void) finalize_target_page_bits(); io_mem_init(); memory_map_init(); - qemu_mutex_init(&map_client_list_lock); } -void cpu_unregister_map_client(QEMUBH *bh) +void address_space_unregister_map_client(AddressSpace *as, QEMUBH *bh) { - MapClient *client; + AddressSpaceMapClient *client; - qemu_mutex_lock(&map_client_list_lock); - QLIST_FOREACH(client, &map_client_list, link) { + qemu_mutex_lock(&as->map_client_list_lock); + QLIST_FOREACH(client, &as->map_client_list, link) { if (client->bh == bh) { - cpu_unregister_map_client_do(client); + address_space_unregister_map_client_do(client); break; } } - qemu_mutex_unlock(&map_client_list_lock); + qemu_mutex_unlock(&as->map_client_list_lock); } -static void cpu_notify_map_clients(void) +static void address_space_notify_map_clients(AddressSpace *as) { - qemu_mutex_lock(&map_client_list_lock); - cpu_notify_map_clients_locked(); - qemu_mutex_unlock(&map_client_list_lock); + qemu_mutex_lock(&as->map_client_list_lock); + address_space_notify_map_clients_locked(as); + qemu_mutex_unlock(&as->map_client_list_lock); } static bool flatview_access_valid(FlatView *fv, hwaddr addr, hwaddr len, @@ -3104,8 +3085,8 @@ flatview_extend_translation(FlatView *fv, hwaddr addr, * May map a subset of the requested range, given by and returned in *plen. * May return NULL if resources needed to perform the mapping are exhausted. * Use only for reads OR writes - not for read-modify-write operations. - * Use cpu_register_map_client() to know when retrying the map operation is - * likely to succeed. + * Use address_space_register_map_client() to know when retrying the map + * operation is likely to succeed. */ void *address_space_map(AddressSpace *as, hwaddr addr, @@ -3128,25 +3109,25 @@ void *address_space_map(AddressSpace *as, mr = flatview_translate(fv, addr, &xlat, &l, is_write, attrs); if (!memory_access_is_direct(mr, is_write)) { - if (qatomic_xchg(&bounce.in_use, true)) { + if (qatomic_xchg(&as->bounce.in_use, true)) { *plen = 0; return NULL; } /* Avoid unbounded allocations */ l = MIN(l, TARGET_PAGE_SIZE); - bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l); - bounce.addr = addr; - bounce.len = l; + as->bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l); + as->bounce.addr = addr; + as->bounce.len = l; memory_region_ref(mr); - bounce.mr = mr; + as->bounce.mr = mr; if (!is_write) { flatview_read(fv, addr, MEMTXATTRS_UNSPECIFIED, - bounce.buffer, l); + as->bounce.buffer, l); } *plen = l; - return bounce.buffer; + return as->bounce.buffer; } @@ -3164,7 +3145,7 @@ void *address_space_map(AddressSpace *as, void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len, bool is_write, hwaddr access_len) { - if (buffer != bounce.buffer) { + if (buffer != as->bounce.buffer) { MemoryRegion *mr; ram_addr_t addr1; @@ -3180,15 +3161,15 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len, return; } if (is_write) { - address_space_write(as, bounce.addr, MEMTXATTRS_UNSPECIFIED, - bounce.buffer, access_len); + address_space_write(as, as->bounce.addr, MEMTXATTRS_UNSPECIFIED, + as->bounce.buffer, access_len); } - qemu_vfree(bounce.buffer); - bounce.buffer = NULL; - memory_region_unref(bounce.mr); + qemu_vfree(as->bounce.buffer); + as->bounce.buffer = NULL; + memory_region_unref(as->bounce.mr); /* Clear in_use before reading map_client_list. */ - qatomic_set_mb(&bounce.in_use, false); - cpu_notify_map_clients(); + qatomic_set_mb(&as->bounce.in_use, false); + address_space_notify_map_clients(as); } void *cpu_physical_memory_map(hwaddr addr, From patchwork Wed Nov 1 13:16:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mattias Nissler X-Patchwork-Id: 1857958 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.a=rsa-sha256 header.s=20230601 header.b=bKT+Qihi; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SL6xt1qW0z1yQ4 for ; Thu, 2 Nov 2023 00:17:54 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qyB5O-0000IN-1j; Wed, 01 Nov 2023 09:16:42 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qyB5E-0000GM-CT for qemu-devel@nongnu.org; Wed, 01 Nov 2023 09:16:32 -0400 Received: from mail-ot1-x32c.google.com ([2607:f8b0:4864:20::32c]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qyB5C-0003d6-9g for qemu-devel@nongnu.org; Wed, 01 Nov 2023 09:16:32 -0400 Received: by mail-ot1-x32c.google.com with SMTP id 46e09a7af769-6cf65093780so4355355a34.0 for ; Wed, 01 Nov 2023 06:16:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1698844589; x=1699449389; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=yIlnbUW/pnWJVQzqafSI25w9klMdFKzbnYY7uKCgM0k=; b=bKT+QihiPxyqlpxVm4xgfhNG51KT0JI2UTQTECyvkw39d6qypjVkR5gQyVP75pmNKI mgM/k2JHZTwUXwOO6r0IVo54meBVqjSX4sG7oWDwXVSRYuxaQhAKdJrPi6bI20CMyDZC 6mYLdfWx6pGh08LS+w8YJ1O2syGZV1/kBHOPUPo2UW/Tg3kZZD8qH/MrzcHR0FqHv4Ee 6y3L67rPZqVhv5LrAgmtD+ArJgfqp5Xq5ucBwHIQaHt7yGbDw60rf30CELaDlH1kz7Tf QZWa2VnY6SkYsjoxt5v8Ow3s1b+udrRAmDMLx9G8tP0bXEgyqzD77faQUroJKk736nCa t/gw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698844589; x=1699449389; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=yIlnbUW/pnWJVQzqafSI25w9klMdFKzbnYY7uKCgM0k=; b=rjRWd8qapqSUaGTblRNmbrEWv4C08JjJ7UT1sPr1/hJ6SluR0LHKNBxOpVVsEhw75m OTM0Eg8kqmJyQoixyQ36AzzGuWVP8LZZcokfIjHynIa1NE4HZ6MVwZwqlokSoVpJvw4i OzjLHbQSQF8qxbBjcutmUiuVWReQsBvu3P4hgJgtLVb/Ao3q5Ebjj0vWrS5NQDt2xsv4 bnUL4DckeNxFsN+uazy6H5pqLOVKUnPdunL8//i5t7nvCkh0oj0tOkCcY+peJ+d5YwHP o0Q55tp9O50zBC3Par1jvwDdSU8YOo7mgaQWGWaIPy7gm+5poIRZ+6UT1oDkjZ8m7k17 /n0Q== X-Gm-Message-State: AOJu0Yx1DVI8qjtL0mNQvrW96ZhA4HxnGLTGuu2CN2j8X/X11UAZVbuM EfBxv5Zxzc3Vaewxyl0hQNEdbsHXwk8tNOBBHQuqkq2c X-Google-Smtp-Source: AGHT+IHa2olssnDyXCinJeRQV9w2+iR3xvMjgoNLMBtjUvb7kkYMUIxmAqlgGMP0HzUjsYkYv0TFPg== X-Received: by 2002:a05:6870:1045:b0:1bb:a227:7008 with SMTP id 5-20020a056870104500b001bba2277008mr15611209oaj.3.1698844588805; Wed, 01 Nov 2023 06:16:28 -0700 (PDT) Received: from mnissler.ba.rivosinc.com ([64.71.180.162]) by smtp.gmail.com with ESMTPSA id y6-20020a056870e3c600b001d4d8efa7f9sm238148oad.4.2023.11.01.06.16.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Nov 2023 06:16:28 -0700 (PDT) From: Mattias Nissler To: stefanha@redhat.com, jag.raman@oracle.com, qemu-devel@nongnu.org, peterx@redhat.com Cc: john.levon@nutanix.com, David Hildenbrand , Marcel Apfelbaum , Paolo Bonzini , "Michael S. Tsirkin" , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Elena Ufimtseva , Richard Henderson , Mattias Nissler Subject: [PATCH v6 2/5] softmmu: Support concurrent bounce buffers Date: Wed, 1 Nov 2023 06:16:08 -0700 Message-Id: <20231101131611.775299-3-mnissler@rivosinc.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231101131611.775299-1-mnissler@rivosinc.com> References: <20231101131611.775299-1-mnissler@rivosinc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::32c; envelope-from=mnissler@rivosinc.com; helo=mail-ot1-x32c.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org When DMA memory can't be directly accessed, as is the case when running the device model in a separate process without shareable DMA file descriptors, bounce buffering is used. It is not uncommon for device models to request mapping of several DMA regions at the same time. Examples include: * net devices, e.g. when transmitting a packet that is split across several TX descriptors (observed with igb) * USB host controllers, when handling a packet with multiple data TRBs (observed with xhci) Previously, qemu only provided a single bounce buffer per AddressSpace and would fail DMA map requests while the buffer was already in use. In turn, this would cause DMA failures that ultimately manifest as hardware errors from the guest perspective. This change allocates DMA bounce buffers dynamically instead of supporting only a single buffer. Thus, multiple DMA mappings work correctly also when RAM can't be mmap()-ed. The total bounce buffer allocation size is limited individually for each AddressSpace. The default limit is 4096 bytes, matching the previous maximum buffer size. A new x-max-bounce-buffer-size parameter is provided to configure the limit for PCI devices. Signed-off-by: Mattias Nissler Reviewed-by: Peter Xu --- hw/pci/pci.c | 8 ++++ include/exec/memory.h | 14 +++---- include/hw/pci/pci_device.h | 3 ++ system/memory.c | 5 ++- system/physmem.c | 80 +++++++++++++++++++++++++------------ 5 files changed, 74 insertions(+), 36 deletions(-) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 7d09e1a39d..206826fcf2 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -85,6 +85,8 @@ static Property pci_props[] = { QEMU_PCIE_ERR_UNC_MASK_BITNR, true), DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present, QEMU_PCIE_ARI_NEXTFN_1_BITNR, false), + DEFINE_PROP_SIZE("x-max-bounce-buffer-size", PCIDevice, + max_bounce_buffer_size, DEFAULT_MAX_BOUNCE_BUFFER_SIZE), DEFINE_PROP_END_OF_LIST() }; @@ -1201,6 +1203,8 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, "bus master container", UINT64_MAX); address_space_init(&pci_dev->bus_master_as, &pci_dev->bus_master_container_region, pci_dev->name); + pci_dev->bus_master_as.max_bounce_buffer_size = + pci_dev->max_bounce_buffer_size; if (phase_check(PHASE_MACHINE_READY)) { pci_init_bus_master(pci_dev); @@ -2657,6 +2661,10 @@ static void pci_device_class_init(ObjectClass *klass, void *data) k->unrealize = pci_qdev_unrealize; k->bus_type = TYPE_PCI_BUS; device_class_set_props(k, pci_props); + object_class_property_set_description( + klass, "x-max-bounce-buffer-size", + "Maximum buffer size allocated for bounce buffers used for mapped " + "access to indirect DMA memory"); } static void pci_device_class_base_init(ObjectClass *klass, void *data) diff --git a/include/exec/memory.h b/include/exec/memory.h index fd2cbee65b..4f99bb4e6c 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -1091,13 +1091,7 @@ typedef struct AddressSpaceMapClient { QLIST_ENTRY(AddressSpaceMapClient) link; } AddressSpaceMapClient; -typedef struct { - MemoryRegion *mr; - void *buffer; - hwaddr addr; - hwaddr len; - bool in_use; -} BounceBuffer; +#define DEFAULT_MAX_BOUNCE_BUFFER_SIZE (4096) /** * struct AddressSpace: describes a mapping of addresses to #MemoryRegion objects @@ -1117,8 +1111,10 @@ struct AddressSpace { QTAILQ_HEAD(, MemoryListener) listeners; QTAILQ_ENTRY(AddressSpace) address_spaces_link; - /* Bounce buffer to use for this address space. */ - BounceBuffer bounce; + /* Maximum DMA bounce buffer size used for indirect memory map requests */ + uint64_t max_bounce_buffer_size; + /* Total size of bounce buffers currently allocated, atomically accessed */ + uint64_t bounce_buffer_size; /* List of callbacks to invoke when buffers free up */ QemuMutex map_client_list_lock; QLIST_HEAD(, AddressSpaceMapClient) map_client_list; diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h index d3dd0f64b2..f4027c5379 100644 --- a/include/hw/pci/pci_device.h +++ b/include/hw/pci/pci_device.h @@ -160,6 +160,9 @@ struct PCIDevice { /* ID of standby device in net_failover pair */ char *failover_pair_id; uint32_t acpi_index; + + /* Maximum DMA bounce buffer size used for indirect memory map requests */ + uint64_t max_bounce_buffer_size; }; static inline int pci_intx(PCIDevice *pci_dev) diff --git a/system/memory.c b/system/memory.c index a63bbc79a7..0b90ad8b07 100644 --- a/system/memory.c +++ b/system/memory.c @@ -3132,7 +3132,8 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name) as->ioeventfds = NULL; QTAILQ_INIT(&as->listeners); QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link); - as->bounce.in_use = false; + as->max_bounce_buffer_size = DEFAULT_MAX_BOUNCE_BUFFER_SIZE; + as->bounce_buffer_size = 0; qemu_mutex_init(&as->map_client_list_lock); QLIST_INIT(&as->map_client_list); as->name = g_strdup(name ? name : "anonymous"); @@ -3142,7 +3143,7 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name) static void do_address_space_destroy(AddressSpace *as) { - assert(!qatomic_read(&as->bounce.in_use)); + assert(qatomic_read(&as->bounce_buffer_size) == 0); assert(QLIST_EMPTY(&as->map_client_list)); qemu_mutex_destroy(&as->map_client_list_lock); diff --git a/system/physmem.c b/system/physmem.c index 1b7ff90113..1b631c08bb 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -2952,6 +2952,20 @@ void cpu_flush_icache_range(hwaddr start, hwaddr len) NULL, len, FLUSH_CACHE); } +/* + * A magic value stored in the first 8 bytes of the bounce buffer struct. Used + * to detect illegal pointers passed to address_space_unmap. + */ +#define BOUNCE_BUFFER_MAGIC 0xb4017ceb4ffe12ed + +typedef struct { + uint64_t magic; + MemoryRegion *mr; + hwaddr addr; + size_t len; + uint8_t buffer[]; +} BounceBuffer; + static void address_space_unregister_map_client_do(AddressSpaceMapClient *client) { @@ -2977,9 +2991,9 @@ void address_space_register_map_client(AddressSpace *as, QEMUBH *bh) qemu_mutex_lock(&as->map_client_list_lock); client->bh = bh; QLIST_INSERT_HEAD(&as->map_client_list, client, link); - /* Write map_client_list before reading in_use. */ + /* Write map_client_list before reading bounce_buffer_size. */ smp_mb(); - if (!qatomic_read(&as->bounce.in_use)) { + if (qatomic_read(&as->bounce_buffer_size) < as->max_bounce_buffer_size) { address_space_notify_map_clients_locked(as); } qemu_mutex_unlock(&as->map_client_list_lock); @@ -3109,28 +3123,38 @@ void *address_space_map(AddressSpace *as, mr = flatview_translate(fv, addr, &xlat, &l, is_write, attrs); if (!memory_access_is_direct(mr, is_write)) { - if (qatomic_xchg(&as->bounce.in_use, true)) { + size_t size = qatomic_add_fetch(&as->bounce_buffer_size, l); + if (size > as->max_bounce_buffer_size) { + /* + * Note that the overshot might be larger than l if threads are + * racing and bump bounce_buffer_size at the same time. + */ + size_t excess = MIN(size - as->max_bounce_buffer_size, l); + l -= excess; + qatomic_sub(&as->bounce_buffer_size, excess); + } + + if (l == 0) { *plen = 0; return NULL; } - /* Avoid unbounded allocations */ - l = MIN(l, TARGET_PAGE_SIZE); - as->bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l); - as->bounce.addr = addr; - as->bounce.len = l; + BounceBuffer *bounce = g_malloc0(l + sizeof(BounceBuffer)); + bounce->magic = BOUNCE_BUFFER_MAGIC; memory_region_ref(mr); - as->bounce.mr = mr; + bounce->mr = mr; + bounce->addr = addr; + bounce->len = l; + if (!is_write) { flatview_read(fv, addr, MEMTXATTRS_UNSPECIFIED, - as->bounce.buffer, l); + bounce->buffer, l); } *plen = l; - return as->bounce.buffer; + return bounce->buffer; } - memory_region_ref(mr); *plen = flatview_extend_translation(fv, addr, len, mr, xlat, l, is_write, attrs); @@ -3145,12 +3169,11 @@ void *address_space_map(AddressSpace *as, void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len, bool is_write, hwaddr access_len) { - if (buffer != as->bounce.buffer) { - MemoryRegion *mr; - ram_addr_t addr1; + MemoryRegion *mr; + ram_addr_t addr1; - mr = memory_region_from_host(buffer, &addr1); - assert(mr != NULL); + mr = memory_region_from_host(buffer, &addr1); + if (mr != NULL) { if (is_write) { invalidate_and_set_dirty(mr, addr1, access_len); } @@ -3160,15 +3183,22 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len, memory_region_unref(mr); return; } + + + BounceBuffer *bounce = container_of(buffer, BounceBuffer, buffer); + assert(bounce->magic == BOUNCE_BUFFER_MAGIC); + if (is_write) { - address_space_write(as, as->bounce.addr, MEMTXATTRS_UNSPECIFIED, - as->bounce.buffer, access_len); - } - qemu_vfree(as->bounce.buffer); - as->bounce.buffer = NULL; - memory_region_unref(as->bounce.mr); - /* Clear in_use before reading map_client_list. */ - qatomic_set_mb(&as->bounce.in_use, false); + address_space_write(as, bounce->addr, MEMTXATTRS_UNSPECIFIED, + bounce->buffer, access_len); + } + + qatomic_sub(&as->bounce_buffer_size, bounce->len); + bounce->magic = ~BOUNCE_BUFFER_MAGIC; + memory_region_unref(bounce->mr); + g_free(bounce); + /* Write bounce_buffer_size before reading map_client_list. */ + smp_mb(); address_space_notify_map_clients(as); } From patchwork Wed Nov 1 13:16:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mattias Nissler X-Patchwork-Id: 1857954 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.a=rsa-sha256 header.s=20230601 header.b=08zeba/S; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SL6xN43bRz1yQn for ; Thu, 2 Nov 2023 00:17:27 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qyB5a-0000KH-10; Wed, 01 Nov 2023 09:16:54 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qyB5K-0000Ht-A5 for qemu-devel@nongnu.org; Wed, 01 Nov 2023 09:16:40 -0400 Received: from mail-ot1-x330.google.com ([2607:f8b0:4864:20::330]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qyB5D-0003dP-1U for qemu-devel@nongnu.org; Wed, 01 Nov 2023 09:16:36 -0400 Received: by mail-ot1-x330.google.com with SMTP id 46e09a7af769-6d31f3e8ca8so172711a34.0 for ; Wed, 01 Nov 2023 06:16:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1698844590; x=1699449390; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ugWmU+RI6KuJRXphjeFR5kNJeU7Hhqdcf4zIt6LNeaY=; b=08zeba/S+dn2LGCOL0okQIUIBVn/eRmkpIiW2oC6m2v941oQ4DQurbYuuk9PC58gbX ohodhw1XhrYYpljrvacp+wwJLCzVjJnOI7og6mLltjxttUXEmVmoVRxj7ORvAPkxehET F+LSITTC83GcOCCiQ2gI2XPYwynFH8U2kSvx8meVwcM+UW66pYu4H45E7N4Ew9eJ7Rps DH2KugxVA9DawsdYJsKhdtsVkbz6GW9KQ4vWkjvUzeYf6lezs9stDUrYChXFH+swc2bw brGqvljnp08IbLfBTiPABWUUzKGUF5zuuIIekPynziTph52mRpeOziwdeAr4okqmOwED NuOQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698844590; x=1699449390; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ugWmU+RI6KuJRXphjeFR5kNJeU7Hhqdcf4zIt6LNeaY=; b=Y/Ww+HQcczx28XKgK+KpvnkfamK66lP2/CXEbRE4XylnjkfonG/38tI5LtnnMv2wRo RqHbPuMC09xOpUqx38yq4iOPbsSKWsmVtOU093imwvxvtTkboG1Fwo5U/ZOC3E5jBqK3 wQsSQSo78DqVKDzIjcZq/AWXtIatmyNGrUrUAdr628eo7BQUYqMMTFDoRKVCeKrCIIDl iTRIklyh5w237CrL5OpwTvCt3yS7CMXc8+LY8Z3QjQS+6fVXJv5HQfZhTLOtJzmMsvx1 tRmR5H3VzMw9eWpYMprh2zNAqZ8udVjRhlEIP9y9hbTAVDjzMaBmdUIJyp2rfkBF57hk k81g== X-Gm-Message-State: AOJu0YxqOwgF5dwVPLaI4P6COHfucOOp0A1oh0aJu5KLt4ci5f18Dq84 JLPqDz1wB+4KBFp1FqBuoVL/5Q== X-Google-Smtp-Source: AGHT+IEP5Q8DivA0tzeg44pVx0YgKFNh0PvTvuOG6RxbCjwy7n0x95B5TNTcAflKx0ZjEr8i8phxyQ== X-Received: by 2002:a05:6870:32cc:b0:1ea:cdcb:5a2b with SMTP id r12-20020a05687032cc00b001eacdcb5a2bmr18687360oac.54.1698844589990; Wed, 01 Nov 2023 06:16:29 -0700 (PDT) Received: from mnissler.ba.rivosinc.com ([64.71.180.162]) by smtp.gmail.com with ESMTPSA id y6-20020a056870e3c600b001d4d8efa7f9sm238148oad.4.2023.11.01.06.16.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Nov 2023 06:16:29 -0700 (PDT) From: Mattias Nissler To: stefanha@redhat.com, jag.raman@oracle.com, qemu-devel@nongnu.org, peterx@redhat.com Cc: john.levon@nutanix.com, David Hildenbrand , Marcel Apfelbaum , Paolo Bonzini , "Michael S. Tsirkin" , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Elena Ufimtseva , Richard Henderson , Mattias Nissler Subject: [PATCH v6 3/5] Update subprojects/libvfio-user Date: Wed, 1 Nov 2023 06:16:09 -0700 Message-Id: <20231101131611.775299-4-mnissler@rivosinc.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231101131611.775299-1-mnissler@rivosinc.com> References: <20231101131611.775299-1-mnissler@rivosinc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::330; envelope-from=mnissler@rivosinc.com; helo=mail-ot1-x330.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, T_SPF_HELO_TEMPERROR=0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Brings in assorted bug fixes. The following are of particular interest with respect to message-based DMA support: * bb308a2 "Fix address calculation for message-based DMA" Corrects a bug in DMA address calculation. * 1569a37 "Pass server->client command over a separate socket pair" Adds support for separate sockets for either command direction, addressing a bug where libvfio-user gets confused if both client and server send commands concurrently. Signed-off-by: Mattias Nissler Reviewed-by: Jagannathan Raman --- subprojects/libvfio-user.wrap | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/subprojects/libvfio-user.wrap b/subprojects/libvfio-user.wrap index 416955ca45..cdf0a7a375 100644 --- a/subprojects/libvfio-user.wrap +++ b/subprojects/libvfio-user.wrap @@ -1,4 +1,4 @@ [wrap-git] url = https://gitlab.com/qemu-project/libvfio-user.git -revision = 0b28d205572c80b568a1003db2c8f37ca333e4d7 +revision = 1569a37a54ecb63bd4008708c76339ccf7d06115 depth = 1 From patchwork Wed Nov 1 13:16:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mattias Nissler X-Patchwork-Id: 1857959 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.a=rsa-sha256 header.s=20230601 header.b=vFH49yQO; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SL6y21yxJz1yQ4 for ; Thu, 2 Nov 2023 00:18:02 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qyB5X-0000JA-P1; Wed, 01 Nov 2023 09:16:51 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qyB5G-0000H5-HV for qemu-devel@nongnu.org; Wed, 01 Nov 2023 09:16:36 -0400 Received: from mail-oa1-x2c.google.com ([2001:4860:4864:20::2c]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qyB5E-0003dU-FX for qemu-devel@nongnu.org; Wed, 01 Nov 2023 09:16:34 -0400 Received: by mail-oa1-x2c.google.com with SMTP id 586e51a60fabf-1e0ee4e777bso4362894fac.3 for ; Wed, 01 Nov 2023 06:16:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1698844591; x=1699449391; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=5j3QQLMZ3x6EOb0j3f63WkMAnSQK6GnCLJFmO2G4vGg=; b=vFH49yQOINmSJ27u/LqOOtrQS7Jl72+df3eIuDa6L2L815sE/AV918ctAMMVLPrk1F TDLM4wesWMbUufY7+IhqzOIq6JlqZD2HTIJo5jhyY+IW2LQTTh5TCKPM8r2zoKmDiFKx ojuNYeNTa2obsS83ykHLeYBPUU7ih0eH15uUlCvpa+C4yPOcqjD0ZleJXqwlqqWH/lEW 63+PaUPsZ4Mj2LjuxZSIfCU2ch+sq96pe1yR61dsyC9pF7oSx8E2vyNMtVPe55+0ngZW TUBafsBTw4I0faL/qO59PPipTh1OUZd3AlIhLTkuvOXes4ymajJPD4hJMhVFIPcyEdhP LF1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698844591; x=1699449391; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5j3QQLMZ3x6EOb0j3f63WkMAnSQK6GnCLJFmO2G4vGg=; b=MdC+RSuwBBcBezV3K1AY1VvH5lU8slXitRcFHbrBiSY+9tDdrZIMy61yXxrpWxUdWC ukJXVtwy7AZyXtnm+tpwyUlP7ntOmecaFwHre+Hbsg/OqHAKYNCSK9vnmgz8FymnJfG0 jgumWifqh9vvlgEKnKoYW/F1nAiTzYqCkPU9taph3hGPkSir67JpYM5w8guS9Iq1tdvK yoqW3+H/R5ZV9+P/41R27c6zegdpd5Py9CLGqoBFHFWJPAVHxkSm+8svoqy3M7NzPmNX DOPiA2ESBMuhbTvS1uR65UBiT/D2EJ0/NA+mnr5hhJxcpThUtAmi6hsUFEU4QNIGpWx/ X6Zw== X-Gm-Message-State: AOJu0Yz2i1n0X55w+JPaBlmun31UftzEb5Xqw117obwvjmQRfa1fwyPK PsrD12LCRxZIPDILTkFudZBlXA== X-Google-Smtp-Source: AGHT+IFaF343+xTfNXSmWc6jLMS5OXiKZ/J6/kXvtTH1tivCwnO5DBnw9mQVPqRTW5nKFduW5HcnDw== X-Received: by 2002:a05:6871:3428:b0:1e9:b860:96df with SMTP id nh40-20020a056871342800b001e9b86096dfmr20549532oac.7.1698844591143; Wed, 01 Nov 2023 06:16:31 -0700 (PDT) Received: from mnissler.ba.rivosinc.com ([64.71.180.162]) by smtp.gmail.com with ESMTPSA id y6-20020a056870e3c600b001d4d8efa7f9sm238148oad.4.2023.11.01.06.16.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Nov 2023 06:16:30 -0700 (PDT) From: Mattias Nissler To: stefanha@redhat.com, jag.raman@oracle.com, qemu-devel@nongnu.org, peterx@redhat.com Cc: john.levon@nutanix.com, David Hildenbrand , Marcel Apfelbaum , Paolo Bonzini , "Michael S. Tsirkin" , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Elena Ufimtseva , Richard Henderson , Mattias Nissler Subject: [PATCH v6 4/5] vfio-user: Message-based DMA support Date: Wed, 1 Nov 2023 06:16:10 -0700 Message-Id: <20231101131611.775299-5-mnissler@rivosinc.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231101131611.775299-1-mnissler@rivosinc.com> References: <20231101131611.775299-1-mnissler@rivosinc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2001:4860:4864:20::2c; envelope-from=mnissler@rivosinc.com; helo=mail-oa1-x2c.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Wire up support for DMA for the case where the vfio-user client does not provide mmap()-able file descriptors, but DMA requests must be performed via the VFIO-user protocol. This installs an indirect memory region, which already works for pci_dma_{read,write}, and pci_dma_map works thanks to the existing DMA bounce buffering support. Note that while simple scenarios work with this patch, there's a known race condition in libvfio-user that will mess up the communication channel. See https://github.com/nutanix/libvfio-user/issues/279 for details as well as a proposed fix. Signed-off-by: Mattias Nissler Reviewed-by: Jagannathan Raman --- hw/remote/trace-events | 2 + hw/remote/vfio-user-obj.c | 100 ++++++++++++++++++++++++++++++++------ 2 files changed, 87 insertions(+), 15 deletions(-) diff --git a/hw/remote/trace-events b/hw/remote/trace-events index 0d1b7d56a5..358a68fb34 100644 --- a/hw/remote/trace-events +++ b/hw/remote/trace-events @@ -9,6 +9,8 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x -> 0x%x" vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x <- 0x%x" vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes" vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64"" +vfu_dma_read(uint64_t gpa, size_t len) "vfu: DMA read 0x%"PRIx64", %zu bytes" +vfu_dma_write(uint64_t gpa, size_t len) "vfu: DMA write 0x%"PRIx64", %zu bytes" vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64"" vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64"" vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64"" diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c index 8b10c32a3c..9f5e385668 100644 --- a/hw/remote/vfio-user-obj.c +++ b/hw/remote/vfio-user-obj.c @@ -300,6 +300,63 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf, return count; } +static MemTxResult vfu_dma_read(void *opaque, hwaddr addr, uint64_t *val, + unsigned size, MemTxAttrs attrs) +{ + MemoryRegion *region = opaque; + vfu_ctx_t *vfu_ctx = VFU_OBJECT(region->owner)->vfu_ctx; + uint8_t buf[sizeof(uint64_t)]; + + trace_vfu_dma_read(region->addr + addr, size); + + g_autofree dma_sg_t *sg = g_malloc0(dma_sg_size()); + vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr); + if (vfu_addr_to_sgl(vfu_ctx, vfu_addr, size, sg, 1, PROT_READ) < 0 || + vfu_sgl_read(vfu_ctx, sg, 1, buf) != 0) { + return MEMTX_ERROR; + } + + *val = ldn_he_p(buf, size); + + return MEMTX_OK; +} + +static MemTxResult vfu_dma_write(void *opaque, hwaddr addr, uint64_t val, + unsigned size, MemTxAttrs attrs) +{ + MemoryRegion *region = opaque; + vfu_ctx_t *vfu_ctx = VFU_OBJECT(region->owner)->vfu_ctx; + uint8_t buf[sizeof(uint64_t)]; + + trace_vfu_dma_write(region->addr + addr, size); + + stn_he_p(buf, size, val); + + g_autofree dma_sg_t *sg = g_malloc0(dma_sg_size()); + vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr); + if (vfu_addr_to_sgl(vfu_ctx, vfu_addr, size, sg, 1, PROT_WRITE) < 0 || + vfu_sgl_write(vfu_ctx, sg, 1, buf) != 0) { + return MEMTX_ERROR; + } + + return MEMTX_OK; +} + +static const MemoryRegionOps vfu_dma_ops = { + .read_with_attrs = vfu_dma_read, + .write_with_attrs = vfu_dma_write, + .endianness = DEVICE_HOST_ENDIAN, + .valid = { + .min_access_size = 1, + .max_access_size = 8, + .unaligned = true, + }, + .impl = { + .min_access_size = 1, + .max_access_size = 8, + }, +}; + static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info) { VfuObject *o = vfu_get_private(vfu_ctx); @@ -308,17 +365,30 @@ static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info) g_autofree char *name = NULL; struct iovec *iov = &info->iova; - if (!info->vaddr) { - return; - } - name = g_strdup_printf("mem-%s-%"PRIx64"", o->device, - (uint64_t)info->vaddr); + (uint64_t)iov->iov_base); subregion = g_new0(MemoryRegion, 1); - memory_region_init_ram_ptr(subregion, NULL, name, - iov->iov_len, info->vaddr); + if (info->vaddr) { + memory_region_init_ram_ptr(subregion, OBJECT(o), name, + iov->iov_len, info->vaddr); + } else { + /* + * Note that I/O regions' MemoryRegionOps handle accesses of at most 8 + * bytes at a time, and larger accesses are broken down. However, + * many/most DMA accesses are larger than 8 bytes and VFIO-user can + * handle large DMA accesses just fine, thus this size restriction + * unnecessarily hurts performance, in particular given that each + * access causes a round trip on the VFIO-user socket. + * + * TODO: Investigate how to plumb larger accesses through memory + * regions, possibly by amending MemoryRegionOps or by creating a new + * memory region type. + */ + memory_region_init_io(subregion, OBJECT(o), &vfu_dma_ops, subregion, + name, iov->iov_len); + } dma_as = pci_device_iommu_address_space(o->pci_dev); @@ -330,20 +400,20 @@ static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info) static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info) { VfuObject *o = vfu_get_private(vfu_ctx); + MemoryRegionSection mr_section; AddressSpace *dma_as = NULL; - MemoryRegion *mr = NULL; - ram_addr_t offset; - mr = memory_region_from_host(info->vaddr, &offset); - if (!mr) { + dma_as = pci_device_iommu_address_space(o->pci_dev); + + mr_section = + memory_region_find(dma_as->root, (hwaddr)info->iova.iov_base, 1); + if (!mr_section.mr) { return; } - dma_as = pci_device_iommu_address_space(o->pci_dev); - - memory_region_del_subregion(dma_as->root, mr); + memory_region_del_subregion(dma_as->root, mr_section.mr); - object_unparent((OBJECT(mr))); + object_unparent((OBJECT(mr_section.mr))); trace_vfu_dma_unregister((uint64_t)info->iova.iov_base); } From patchwork Wed Nov 1 13:16:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mattias Nissler X-Patchwork-Id: 1857957 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.a=rsa-sha256 header.s=20230601 header.b=v0xO01kQ; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SL6xj1sykz1yQ4 for ; Thu, 2 Nov 2023 00:17:45 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qyB5T-0000IU-Q7; Wed, 01 Nov 2023 09:16:47 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qyB5I-0000HP-Ef for qemu-devel@nongnu.org; Wed, 01 Nov 2023 09:16:38 -0400 Received: from mail-oa1-x35.google.com ([2001:4860:4864:20::35]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qyB5F-0003dl-TK for qemu-devel@nongnu.org; Wed, 01 Nov 2023 09:16:35 -0400 Received: by mail-oa1-x35.google.com with SMTP id 586e51a60fabf-1ea82246069so4327607fac.3 for ; Wed, 01 Nov 2023 06:16:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1698844593; x=1699449393; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=lFF+xz1moS0zKHI0k5mkW8skLO8foQZJd4bTPswPJb4=; b=v0xO01kQPc1YETflmiX9dyr/+2eH37vK04DYWpTgcuh/aWc4+zqS9yYVdMpOQgb2wj SjvyFGqi6ViJfq1nGADwwLt2pOuK56YCyEhvg/A3EyyEoE9YeVd8T7iKJaBcceI7VNuH Oh+QlZ2m7tFDrO1pM9wwopXJTSwxMAhYucsxll1Lsl/kmPlp96jXGjowsPSaIb9uN8xy xzwc2RQ4t+kkGcAHY6S2fHh7iIem6y5bxxg3w1rGb5rYymjEX0czTOCNwQCRYSlK31e+ 5/MzdG3ma/LXm4Cf5zsrK6ZksfVoHfOpeP7oiLg2+cG6mgsRbUWB/G/QDOZ3zAjRzFnp V9uQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698844593; x=1699449393; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=lFF+xz1moS0zKHI0k5mkW8skLO8foQZJd4bTPswPJb4=; b=rHno3A10K/bY19XIYxVAL78bObn0O7mVT8irN8V9gg8ZOwtRr3mANQ9SMXZaRVYRXk V27uJLRa0mbVLxQ4jOei/f0nSPwDf2YwzOW0BVhjHh997ZJbDHnFSqfJEDOPSV7gQhTA IsL7A/JGnrcmT3IcCT//2MLdsra1sMhwVMXh6Q9tmkrya3dHNSIk5gK/dfv1WZkEqJ/f jf4bsp1qlu5/w7zGPHanl3rUmpBJiKC0rjVCjMZKsiR1gsPJJaIOAX7ZRN2oY0EKp3gn Ei03XhvAz4777VW6JA2BhfsEucg9Vg99uoOyASqc9OSVJ7tcf5aFn+ddW4au+XKrDeG8 8nag== X-Gm-Message-State: AOJu0YyOs9HQFwvIHB41+N6Th7szu2rN7thMk7B4SFdqxUAq3fquLNYf vG8oS5Ouw+f3K0SYSo332+mT6A== X-Google-Smtp-Source: AGHT+IGAEwmbBU+noDaEflmAuWhFDJ9kbaPWu1NsDUtoclcAcxDU1Z0Wg7ARq94m/WMDhlxJ0aps6w== X-Received: by 2002:a05:6870:4202:b0:1ea:7bd1:c48d with SMTP id u2-20020a056870420200b001ea7bd1c48dmr21140574oac.49.1698844592343; Wed, 01 Nov 2023 06:16:32 -0700 (PDT) Received: from mnissler.ba.rivosinc.com ([64.71.180.162]) by smtp.gmail.com with ESMTPSA id y6-20020a056870e3c600b001d4d8efa7f9sm238148oad.4.2023.11.01.06.16.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Nov 2023 06:16:32 -0700 (PDT) From: Mattias Nissler To: stefanha@redhat.com, jag.raman@oracle.com, qemu-devel@nongnu.org, peterx@redhat.com Cc: john.levon@nutanix.com, David Hildenbrand , Marcel Apfelbaum , Paolo Bonzini , "Michael S. Tsirkin" , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Elena Ufimtseva , Richard Henderson , Mattias Nissler Subject: [PATCH v6 5/5] vfio-user: Fix config space access byte order Date: Wed, 1 Nov 2023 06:16:11 -0700 Message-Id: <20231101131611.775299-6-mnissler@rivosinc.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231101131611.775299-1-mnissler@rivosinc.com> References: <20231101131611.775299-1-mnissler@rivosinc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2001:4860:4864:20::35; envelope-from=mnissler@rivosinc.com; helo=mail-oa1-x35.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org PCI config space is little-endian, so on a big-endian host we need to perform byte swaps for values as they are passed to and received from the generic PCI config space access machinery. Signed-off-by: Mattias Nissler Reviewed-by: Jagannathan Raman --- hw/remote/vfio-user-obj.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c index 9f5e385668..46a2036bd1 100644 --- a/hw/remote/vfio-user-obj.c +++ b/hw/remote/vfio-user-obj.c @@ -281,7 +281,7 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf, while (bytes > 0) { len = (bytes > pci_access_width) ? pci_access_width : bytes; if (is_write) { - memcpy(&val, ptr, len); + val = ldn_le_p(ptr, len); pci_host_config_write_common(o->pci_dev, offset, pci_config_size(o->pci_dev), val, len); @@ -289,7 +289,7 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf, } else { val = pci_host_config_read_common(o->pci_dev, offset, pci_config_size(o->pci_dev), len); - memcpy(ptr, &val, len); + stn_le_p(ptr, len, val); trace_vfu_cfg_read(offset, val); } offset += len;