From patchwork Tue Feb 20 04:19:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 1901194 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=MYqDsGgD; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Tf5mk3yFVz23cb for ; Tue, 20 Feb 2024 15:20:38 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rcHbd-0001eX-Bc; Mon, 19 Feb 2024 23:19:45 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rcHbb-0001eE-J7; Mon, 19 Feb 2024 23:19:43 -0500 Received: from mail-pj1-x1030.google.com ([2607:f8b0:4864:20::1030]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rcHbZ-0006DW-Rq; Mon, 19 Feb 2024 23:19:43 -0500 Received: by mail-pj1-x1030.google.com with SMTP id 98e67ed59e1d1-299ea1f1989so30707a91.0; Mon, 19 Feb 2024 20:19:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708402780; x=1709007580; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=4UtSR/3s4YP7V03Ktlw6YWmKCvZaG6tEV1kVd5CEnc0=; b=MYqDsGgDX4IiYPeiZkH0fZWh2BONAzNPco1XpAzDbRPuOZ7O4HSKPb++Qbsf7k0O31 lCziZBDgNP/A1AvVA6LHwOcYlvG/nnPMgPyfKsGs73ri2YyCunJrKK6mMBP5vjlU7BFp t5EJHFhqPhoafrviTjgyavYD2Isg42OQXJ+gaeVh4rvh05BmP3nYCk9q1o1SJO+2FRvD T9xPhL00F5LVggc3YgR0nRhooC/loJZHPRfWVBAzgv7ajSr/zEB5Y+mk0gIumJoN3aAr 8FpFaGsWWM4Kza+0l7V3W+Cd0E2dKJfewuh4WdgXBsDiC0ZUPWoS31MBzsPykc9vwMRQ y9SQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708402780; x=1709007580; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4UtSR/3s4YP7V03Ktlw6YWmKCvZaG6tEV1kVd5CEnc0=; b=rWt3NCUS4H/ZenOdWGIR2WmXjTkPSFrFkITgBeOe1MlIGk87dv5xIbVj9bF8Qls7Nj 4Ou/cmIuWxUXQuE8sHYFAVvYwGUrLb0wbLRTTiTtuZdv+SomXLEe6SqLNltXWAvo6py1 sUnZcUuA4r0LDJUwxYxzNbIXv93037JvneOoQxZ6vKAgTIA/B1Z0FHIGKYtJQMdKIckl y4B+DggBcaPFRuIvPV0QDyPy47eXvLkEvdW3etp4OLa68WbJNGijX49rwBIhkyBZOQw8 XKPXpziPdUdcrvT8TjVeEXRBS6hM1nw8x4zLl92NEInStpvcrIiIep5m9ISRJ2RXE6ye /gwA== X-Forwarded-Encrypted: i=1; AJvYcCXcyd3i4qzsfBm7eYKVLHKJkFacSAiuy9s/g+I7lM8mhT3f8STmm3LuakafIIj8Q9bGUFk3ZhoU7j0S0qYurPBBcLuYEFk= X-Gm-Message-State: AOJu0YyZkRnMlRG2r3K3gz8EWNkE+corEQyjH0j2Fwjtj4NwjJiEfzD7 TFJ3Iceu2ICKh+seHbcH9F+1S8ruIGH+7qWAw7g1YJIxPK+zjPKtRK8Y/KI/ X-Google-Smtp-Source: AGHT+IFspeXXeXTTa2VqypNi6kxQNzkGsnp7hsL3WOFuSvT1cfroLZONuokkjXiX7bpSKrsLETLffg== X-Received: by 2002:a17:90b:3118:b0:299:1d33:846c with SMTP id gc24-20020a17090b311800b002991d33846cmr9759990pjb.39.1708402774917; Mon, 19 Feb 2024 20:19:34 -0800 (PST) Received: from wheely.local0.net (123-243-155-241.static.tpgi.com.au. [123.243.155.241]) by smtp.gmail.com with ESMTPSA id pl6-20020a17090b268600b00297138f0496sm6232621pjb.31.2024.02.19.20.19.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 19 Feb 2024 20:19:34 -0800 (PST) From: Nicholas Piggin To: qemu-ppc@nongnu.org, Thomas Huth Cc: Nicholas Piggin , qemu-devel@nongnu.org, Richard Henderson Subject: [RFC PATCH 1/3] accel/tcg: Fix TCG TLB coherency race with physical dirty bit clearing Date: Tue, 20 Feb 2024 14:19:20 +1000 Message-ID: <20240220041922.373029-2-npiggin@gmail.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240220041922.373029-1-npiggin@gmail.com> References: <20240220041922.373029-1-npiggin@gmail.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::1030; envelope-from=npiggin@gmail.com; helo=mail-pj1-x1030.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org The TCG TLB keeps a TLB_NOTDIRTY bit that must be kept coherent with the cpu physical dirty memory bitmaps. If any bits are clear, TLB_NOTDIRTY *must* be set in all CPUs so that the nodirty_write() slowpath is guaranteed to be called on a write access. TLB_NOTDIRTY may be set if none of the bits are clear, it just results in a superfluous nodirty_write() call (which should remove the bit). There is a race with clearing physical dirty bits and setting TLB_NOTDIRTY vs setting or creating TLB entries without the TLB_NOTDIRTY bit, that can cause the bit to get lost and notdirty_write() updates to be missed. nodirty_write() 1. cpu_physical_memory_set_dirty_range() 2. if (!cpu_physical_memory_is_clean()) 3. tlb_set_dirty() vs cpu_physical_memory_test_and_clear_dirty() A. dirty = bitmap_test_and_clear_atomic() if (dirty) B. tlb_reset_dirty_range_all() 1 executes, then 2 finds no bits clean, then A clears the dirty bit and runs B to set TLB_NOTDIRTY on the TLB entries, then 3 clears the TLB_NOTDIRTY bits from the TLB entries. This results in the physical dirty bitmap having a clear bit with a TLB entry pointing to it without TLB_NOTDIRTY, which means stores through that TLB are performed without the notdirty_write() call to set dirty bits (or invalidate tb). Fix this by checking the physical dirty bitmap while holding the TLB lock that also covers TLB entry insertion / TLB_NOTDIRTY clearing. Signed-off-by: Nicholas Piggin --- Hi Thomas, This is the other dirty bitmap race I saw, if you are hunting migration bugs... Thanks, Nick include/exec/exec-all.h | 1 - accel/stubs/tcg-stub.c | 4 ---- accel/tcg/cputlb.c | 47 +++++++++++++++++++++++++++++------------ 3 files changed, 33 insertions(+), 19 deletions(-) diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h index ce36bb10d4..ade81b25f0 100644 --- a/include/exec/exec-all.h +++ b/include/exec/exec-all.h @@ -654,7 +654,6 @@ static inline void mmap_unlock(void) {} #define WITH_MMAP_LOCK_GUARD() void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length); -void tlb_set_dirty(CPUState *cpu, vaddr addr); MemoryRegionSection * address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr addr, diff --git a/accel/stubs/tcg-stub.c b/accel/stubs/tcg-stub.c index 8a496a2a6f..dd890d6cf6 100644 --- a/accel/stubs/tcg-stub.c +++ b/accel/stubs/tcg-stub.c @@ -18,10 +18,6 @@ void tb_flush(CPUState *cpu) { } -void tlb_set_dirty(CPUState *cpu, vaddr vaddr) -{ -} - int probe_access_flags(CPUArchState *env, vaddr addr, int size, MMUAccessType access_type, int mmu_idx, bool nonfault, void **phost, uintptr_t retaddr) diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index 047cd2cc0a..078f4ef166 100644 --- a/accel/tcg/cputlb.c +++ b/accel/tcg/cputlb.c @@ -1001,10 +1001,17 @@ static inline void copy_tlb_helper_locked(CPUTLBEntry *d, const CPUTLBEntry *s) *d = *s; } -/* This is a cross vCPU call (i.e. another vCPU resetting the flags of +/* + * This is a cross vCPU call (i.e. another vCPU resetting the flags of * the target vCPU). * We must take tlb_c.lock to avoid racing with another vCPU update. The only * thing actually updated is the target TLB entry ->addr_write flags. + * + * This can race between the physical memory dirty bits becoming all set and + * tlb_set_page_full or tlb_try_set_dirty creating dirty entries: it is + * harmless to set TLB_NOTDIRTY on a !clean physical page, it might just + * cause a notdirty_write() slowpath on the next write which clears out the + * spurious TLB_NOTDIRTY. */ void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length) { @@ -1037,9 +1044,11 @@ static inline void tlb_set_dirty1_locked(CPUTLBEntry *tlb_entry, } } -/* update the TLB corresponding to virtual page vaddr - so that it is no longer dirty */ -void tlb_set_dirty(CPUState *cpu, vaddr addr) +/* + * Update the TLB corresponding to virtual page vaddr if it no longer + * requires the TLB_NOTDIRTY bit. + */ +static void tlb_try_set_dirty(CPUState *cpu, vaddr addr, ram_addr_t ram_addr) { int mmu_idx; @@ -1047,6 +1056,12 @@ void tlb_set_dirty(CPUState *cpu, vaddr addr) addr &= TARGET_PAGE_MASK; qemu_spin_lock(&cpu->neg.tlb.c.lock); + if (cpu_physical_memory_is_clean(ram_addr)) { + /* Must be checked under lock, like tlb_set_page_full */ + qemu_spin_unlock(&cpu->neg.tlb.c.lock); + return; + } + for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) { tlb_set_dirty1_locked(tlb_entry(cpu, mmu_idx, addr), addr); } @@ -1165,6 +1180,19 @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx, addend = 0; } + /* + * Hold the TLB lock for the rest of the function. We could acquire/release + * the lock several times in the function, but it is faster to amortize the + * acquisition cost by acquiring it just once. Note that this leads to + * a longer critical section, but this is not a concern since the TLB lock + * is unlikely to be contended. + * + * The TLB lock must be held over checking cpu_physical_memory_is_clean + * and inserting the entry into the TLB, so clearing phsyical memory + * dirty status can find and set TLB_NOTDIRTY on the entries. + */ + qemu_spin_lock(&tlb->c.lock); + write_flags = read_flags; if (is_ram) { iotlb = memory_region_get_ram_addr(section->mr) + xlat; @@ -1200,15 +1228,6 @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx, index = tlb_index(cpu, mmu_idx, addr_page); te = tlb_entry(cpu, mmu_idx, addr_page); - /* - * Hold the TLB lock for the rest of the function. We could acquire/release - * the lock several times in the function, but it is faster to amortize the - * acquisition cost by acquiring it just once. Note that this leads to - * a longer critical section, but this is not a concern since the TLB lock - * is unlikely to be contended. - */ - qemu_spin_lock(&tlb->c.lock); - /* Note that the tlb is no longer clean. */ tlb->c.dirty |= 1 << mmu_idx; @@ -1409,7 +1428,7 @@ static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size, /* We remove the notdirty callback only if the code has been flushed. */ if (!cpu_physical_memory_is_clean(ram_addr)) { trace_memory_notdirty_set_dirty(mem_vaddr); - tlb_set_dirty(cpu, mem_vaddr); + tlb_try_set_dirty(cpu, mem_vaddr, ram_addr); } } From patchwork Tue Feb 20 04:19:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 1901197 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=jrFyPlYF; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Tf5mv1vPlz23cw for ; Tue, 20 Feb 2024 15:20:47 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rcHbh-0001fh-Ai; Mon, 19 Feb 2024 23:19:49 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rcHbe-0001f8-Fm; Mon, 19 Feb 2024 23:19:46 -0500 Received: from mail-pg1-x529.google.com ([2607:f8b0:4864:20::529]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rcHbb-0006EG-Sg; Mon, 19 Feb 2024 23:19:46 -0500 Received: by mail-pg1-x529.google.com with SMTP id 41be03b00d2f7-5bdbe2de25fso4108933a12.3; Mon, 19 Feb 2024 20:19:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708402782; x=1709007582; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=FFZNdh+B8F0CK+ODcoo3/h5v6oqUz7dYUbqoB5SHAvs=; b=jrFyPlYFEYRPpxxbyadPT7MKfyvqtIsn7JfWB8fEJAASboaGIbH4cLhpa5rJHUt9fl PqNMArTU8i9fd2CwFrv+EMT3sxzl4DGRCRggOU2kTk9BfNbym5UXmi4UYudpb1a1UNNs cypvChah17brIcW5kMRt8ifRqEG+orYZ8JsCYQVjrA6siHOrflcL6xEOz0Xmz4XURvam c/w10bjLTgGecRCyYa6C+G0j4VhDgbI0DtYuQUxix/N3eNVKbMas3z6SrUNbYQVhd6r4 AqylJeSvr1Xj6RORJ+C1hWpmGbUiNxfZC/4sClWzTvBMy4gP6QaaU1DL3+hl3UVAPkqA Nd7A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708402782; x=1709007582; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=FFZNdh+B8F0CK+ODcoo3/h5v6oqUz7dYUbqoB5SHAvs=; b=cmF/yYP1wNySx/LTDqYBbvt8kz5yRNfryXbH9j5p0RdlMxl1Oa6t9z8njLt+THK9UY qJF4xngYCzVzSauMYP5CXlTbvmv0yiKPiyHzFJfurpGFbQGlaQ0Ri0s5h36dTz/Dk+fe wEGir06eoROgH9yXtIvJTIKr2tgPC7ckMm7LkChImPpu74Y4n9MBB4qsWMue4eiEyvsM U6EYN5MRmCvWi9j2ObO34QeB/aKtNZqZnOmbM/uXaiUMxzresA/GPeWFEWxl9lFZWA/A QnxaEhozONiB+HrKuGOS5nzFefnY1p+A/xbR6VtpwNWpj9jOvuY5LW0HmqHMsagdj2xh J2Iw== X-Forwarded-Encrypted: i=1; AJvYcCXsoUDv/wC+AfyRTYYWpPNq6T2auebgSQTRSgDcG+vw+hVX+BtLwhN4Ta8SG5p1oUalXYW91Tu07f3U3rolrd8pEIl4+ec= X-Gm-Message-State: AOJu0Yy9Wivd3U4pVRrh97cPrrZA9Hv4ZNXtUnyfXKu+RDoVg3Lm6Jjw mBBhcn6e9Za+p4co+08/eozQfbzyEwuBUBYZdV0Rqq3pwn0HueaSxV2H3KUR X-Google-Smtp-Source: AGHT+IHSeADdr7oae2dwf7a9wC4VokcmfdTAlMXU7YhbyF38XFOtygJ341wXXOB0MBuYFx4FdJ4TpA== X-Received: by 2002:a17:90a:7d13:b0:299:3550:9fda with SMTP id g19-20020a17090a7d1300b0029935509fdamr9041674pjl.6.1708402777946; Mon, 19 Feb 2024 20:19:37 -0800 (PST) Received: from wheely.local0.net (123-243-155-241.static.tpgi.com.au. [123.243.155.241]) by smtp.gmail.com with ESMTPSA id pl6-20020a17090b268600b00297138f0496sm6232621pjb.31.2024.02.19.20.19.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 19 Feb 2024 20:19:37 -0800 (PST) From: Nicholas Piggin To: qemu-ppc@nongnu.org Cc: Nicholas Piggin , qemu-devel@nongnu.org, Richard Henderson Subject: [RFC PATCH 2/3] tcg: add a ll/sc protection facility Date: Tue, 20 Feb 2024 14:19:21 +1000 Message-ID: <20240220041922.373029-3-npiggin@gmail.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240220041922.373029-1-npiggin@gmail.com> References: <20240220041922.373029-1-npiggin@gmail.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::529; envelope-from=npiggin@gmail.com; helo=mail-pg1-x529.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org This uses the cpu physical memory dirty mechanism to provide a LL/SC protection system, so that a CPU can set llsc protection on a block of memory, and it can check whether any other CPUs have stored to that memory in a way that can be done race-free to perform a store conditional on that check. QEMU TCG implements LL/SC conditional store with an atomic cmpxchg that only checks that the value of memory is the same as the value returned by the LL. This works well in practice, but it does not faithfully emulate LL/SC. Among the issues are: 1. cmpxchg incorrectly succeeds in the case that stores were performed by other CPUs, but the observed value is the same. Also known as the ABA problem. 2. Some implementations (e.g., IBM POWER) provide a protection block (reservation granule) size that is larger than the load. A store from another CPU to anywhere in the granule should cause a loss of protection. A conditional store from this CPU to anywhere in the granule could also succeed. Some IBM firmware relies on this implementation detail. 3. Some (e.g., ppc, arm) implementations of LL/SC work on physical memory. It could succeed if the store is performed at a different virtual address than the load, if they alias to the same physical memory. 4. Some implementations allow the CPU that performed the load to store a different value at that memory location with a simple store, and have the conditional-store succeed. These can cause varying degrees of problems. 2. was observed to cause a real problem. Although most open source code doesn't seem to have a problem. And I think the more restrictie pseudo-LLSC encourages software to be clean :) The ABA problem *almost* has an interesting memory ordering problem vs Linux spinlocks: lock: larx cmp bne busy stcx. bne- lock lwsync The lwsync barrier permits loads to be performed ahead of earleir stores so a load in the critical section can see a result before the stcx. store to set the lock is visible to other CPUs. This works because lwsync does prevent those loads being performed before the earlier larx, so they must be performed with a valid reservation that results in the lock being taken. If another CPU could come between the larx and stcx. and take the lock, perform a critical section, and then release the lock, then if the stcx. succeeds, the lwsync would not prevent loads in the critical section being performed before it. The saving grace here is that a cmpxchg-based stcx. itself has to be implemented with a larx on the host, so it does end up ordering correctly. Maybe that is inherent in any similar kind of issue and so it's not really very interesting, but I thought it showed possible subtle issues (one of our firmware engineers spotted it). --- include/exec/cputlb.h | 7 ++ include/exec/ram_addr.h | 42 ++++++++- include/exec/ramlist.h | 10 +++ include/hw/core/cpu.h | 5 ++ accel/tcg/cputlb.c | 188 ++++++++++++++++++++++++++++++++++++++- hw/core/cpu-common.c | 5 ++ hw/vfio/common.c | 2 +- hw/virtio/vhost.c | 1 + system/memory.c | 3 + system/physmem.c | 7 ++ system/memory_ldst.c.inc | 3 +- 11 files changed, 266 insertions(+), 7 deletions(-) diff --git a/include/exec/cputlb.h b/include/exec/cputlb.h index 6da1462c4f..8c590b34a7 100644 --- a/include/exec/cputlb.h +++ b/include/exec/cputlb.h @@ -27,4 +27,11 @@ void tlb_protect_code(ram_addr_t ram_addr); void tlb_unprotect_code(ram_addr_t ram_addr); #endif + +void cpu_set_llsc_prot(CPUState *cpu, ram_addr_t addr); +bool cpu_resolve_llsc_begin(CPUState *cpu); +void cpu_resolve_llsc_abort(CPUState *cpu); +bool cpu_resolve_llsc_check(CPUState *cpu, ram_addr_t addr); +void cpu_resolve_llsc_success(CPUState *cpu); + #endif diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index dadb2deb11..4776921c49 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -147,7 +147,13 @@ static inline void qemu_ram_block_writeback(RAMBlock *block) } #define DIRTY_CLIENTS_ALL ((1 << DIRTY_MEMORY_NUM) - 1) -#define DIRTY_CLIENTS_NOCODE (DIRTY_CLIENTS_ALL & ~(1 << DIRTY_MEMORY_CODE)) + +/* + * Migration and VGA just want to see the dirty bit set, hence "oneshot". + * As opposed to code and llsc_prot which want to be called on every store. + */ +#define DIRTY_CLIENTS_ONESHOT ((1 << DIRTY_MEMORY_VGA) | \ + (1 << DIRTY_MEMORY_MIGRATION)) static inline bool cpu_physical_memory_get_dirty(ram_addr_t start, ram_addr_t length, @@ -240,7 +246,13 @@ static inline bool cpu_physical_memory_is_clean(ram_addr_t addr) bool code = cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_CODE); bool migration = cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_MIGRATION); - return !(vga && code && migration); + bool llsc_prot = +#ifdef TARGET_HAS_LLSC_PROT + cpu_physical_memory_get_dirty_flag(addr, DIRTY_MEMORY_LLSC_PROT); +#else + false; +#endif + return !(vga && code && migration && llsc_prot); } static inline uint8_t cpu_physical_memory_range_includes_clean(ram_addr_t start, @@ -261,6 +273,12 @@ static inline uint8_t cpu_physical_memory_range_includes_clean(ram_addr_t start, !cpu_physical_memory_all_dirty(start, length, DIRTY_MEMORY_MIGRATION)) { ret |= (1 << DIRTY_MEMORY_MIGRATION); } +#ifdef TARGET_HAS_LLSC_PROT + if (mask & (1 << DIRTY_MEMORY_LLSC_PROT) && + !cpu_physical_memory_all_dirty(start, length, DIRTY_MEMORY_LLSC_PROT)) { + ret |= (1 << DIRTY_MEMORY_LLSC_PROT); + } +#endif return ret; } @@ -322,6 +340,12 @@ static inline void cpu_physical_memory_set_dirty_range(ram_addr_t start, bitmap_set_atomic(blocks[DIRTY_MEMORY_CODE]->blocks[idx], offset, next - page); } +#ifdef TARGET_HAS_LLSC_PROT + if (unlikely(mask & (1 << DIRTY_MEMORY_LLSC_PROT))) { + bitmap_set_atomic(blocks[DIRTY_MEMORY_LLSC_PROT]->blocks[idx], + offset, next - page); + } +#endif page = next; idx++; @@ -355,6 +379,8 @@ uint64_t cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap, unsigned long hpratio = qemu_real_host_page_size() / TARGET_PAGE_SIZE; unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS); + assert(0); + /* start address is aligned at the start of a word? */ if ((((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) && (hpratio == 1)) { @@ -396,6 +422,12 @@ uint64_t cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap, if (tcg_enabled()) { qatomic_or(&blocks[DIRTY_MEMORY_CODE][idx][offset], temp); +#ifdef TARGET_HAS_LLSC_PROT + /* XXX? */ + qatomic_or(&blocks[DIRTY_MEMORY_LLSC_PROT][idx][offset], + temp); +#endif + assert(0); } } @@ -408,7 +440,8 @@ uint64_t cpu_physical_memory_set_dirty_lebitmap(unsigned long *bitmap, xen_hvm_modified_memory(start, pages << TARGET_PAGE_BITS); } else { - uint8_t clients = tcg_enabled() ? DIRTY_CLIENTS_ALL : DIRTY_CLIENTS_NOCODE; + uint8_t clients = tcg_enabled() ? DIRTY_CLIENTS_ALL : + DIRTY_CLIENTS_ONESHOT; if (!global_dirty_tracking) { clients &= ~(1 << DIRTY_MEMORY_MIGRATION); @@ -470,6 +503,9 @@ static inline void cpu_physical_memory_clear_dirty_range(ram_addr_t start, cpu_physical_memory_test_and_clear_dirty(start, length, DIRTY_MEMORY_MIGRATION); cpu_physical_memory_test_and_clear_dirty(start, length, DIRTY_MEMORY_VGA); cpu_physical_memory_test_and_clear_dirty(start, length, DIRTY_MEMORY_CODE); +#ifdef TARGET_HAS_LLSC_PROT + cpu_physical_memory_test_and_clear_dirty(start, length, DIRTY_MEMORY_LLSC_PROT); +#endif } diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h index 2ad2a81acc..e6523bf25c 100644 --- a/include/exec/ramlist.h +++ b/include/exec/ramlist.h @@ -8,10 +8,20 @@ typedef struct RAMBlockNotifier RAMBlockNotifier; +/* XXX: hack, need to work out how to do this */ +#ifndef TARGET_HAS_LLSC_PROT +#define TARGET_HAS_LLSC_PROT +#endif + #define DIRTY_MEMORY_VGA 0 #define DIRTY_MEMORY_CODE 1 #define DIRTY_MEMORY_MIGRATION 2 +#ifdef TARGET_HAS_LLSC_PROT +#define DIRTY_MEMORY_LLSC_PROT 3 +#define DIRTY_MEMORY_NUM 4 /* num of dirty bits */ +#else #define DIRTY_MEMORY_NUM 3 /* num of dirty bits */ +#endif /* The dirty memory bitmap is split into fixed-size blocks to allow growth * under RCU. The bitmap for a block can be accessed as follows: diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h index 4385ce54c9..f1da94771e 100644 --- a/include/hw/core/cpu.h +++ b/include/hw/core/cpu.h @@ -489,6 +489,11 @@ struct CPUState { uint64_t random_seed; sigjmp_buf jmp_env; + int llsc_prot_block_size; + bool llsc_prot_active; + bool llsc_resolving; + hwaddr llsc_prot_address; /* ram_addr_t physical address of reservation */ + QemuMutex work_mutex; QSIMPLEQ_HEAD(, qemu_work_item) work_list; diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index 078f4ef166..d27053cadc 100644 --- a/accel/tcg/cputlb.c +++ b/accel/tcg/cputlb.c @@ -274,11 +274,18 @@ static inline void tlb_n_used_entries_dec(CPUState *cpu, uintptr_t mmu_idx) cpu->neg.tlb.d[mmu_idx].n_used_entries--; } +#ifdef TARGET_HAS_LLSC_PROT +static QemuMutex llsc_prot_mutex; +#endif + void tlb_init(CPUState *cpu) { int64_t now = get_clock_realtime(); int i; +#ifdef TARGET_HAS_LLSC_PROT + qemu_mutex_init(&llsc_prot_mutex); +#endif qemu_spin_init(&cpu->neg.tlb.c.lock); /* All tlbs are initialized flushed. */ @@ -1408,6 +1415,176 @@ static bool victim_tlb_hit(CPUState *cpu, size_t mmu_idx, size_t index, return false; } +#ifdef TARGET_HAS_LLSC_PROT +/* + * Remove a particular address from being watched by LLSC if it is + * no longer actively protected by any CPU. + */ +static void cpu_teardown_llsc_prot(ram_addr_t addr) +{ + CPUState *cpu; + + addr &= TARGET_PAGE_MASK; + + CPU_FOREACH(cpu) { + if ((cpu->llsc_prot_address & TARGET_PAGE_MASK) == addr) { + if (cpu->llsc_prot_active) { + return; + } + cpu->llsc_prot_address = -1ULL; + } + } + cpu_physical_memory_set_dirty_flag(addr, DIRTY_MEMORY_LLSC_PROT); +} + +/* + * Start protection of an address. cpu_resolve_llsc_begin() and + * cpu_resolve_llsc_check() will return true so long as another + * actor has not modified the line. + */ +void cpu_set_llsc_prot(CPUState *cpu, ram_addr_t addr) +{ + ram_addr_t old; + + addr &= ~(cpu->llsc_prot_block_size - 1); + + qemu_mutex_lock(&llsc_prot_mutex); + old = cpu->llsc_prot_address; + if ((addr & TARGET_PAGE_MASK) == (old & TARGET_PAGE_MASK)) { + old = -1ULL; + goto out; + } + cpu_physical_memory_test_and_clear_dirty(addr & TARGET_PAGE_MASK, + TARGET_PAGE_SIZE, + DIRTY_MEMORY_LLSC_PROT); +out: + cpu->llsc_prot_address = addr; + cpu->llsc_prot_active = true; + if (old != -1ULL) { + cpu_teardown_llsc_prot(old); + } + + qemu_mutex_unlock(&llsc_prot_mutex); +} + +/* + * The store for store-conditional must be performed under the llsc_prot_mutex, + * but it must not go ahead if protection has been lost. The point of resolving + * conflicts happens at TLB probe time, so this can be achieved by taking the + * lock here, then checking our protection has not been invalidated and probing + * the TLB, then performing the store. + * + * The TLB probe must be non-faulting (to avoid problems with lock recursion). + * If the non-faulting probe fails then cpu_resolve_llsc_abort() must be called + * (and either perform a full probe and then retry, or perhaps could just fail + * the store-conditional). + * + * The TLB probe while holding the mutex may have to check and invalidate the + * protection on other CPUs and therefore it must hold the lock. If + * llsc_resolving is true then the lock is held and not acquired again. + */ +bool cpu_resolve_llsc_begin(CPUState *cpu) +{ + if (!cpu->llsc_prot_active) { + return false; + } + qemu_mutex_lock(&llsc_prot_mutex); + if (!cpu->llsc_prot_active) { + qemu_mutex_unlock(&llsc_prot_mutex); + return false; + } + + g_assert(!cpu->llsc_resolving); + cpu->llsc_resolving = true; + + return true; +} + +void cpu_resolve_llsc_abort(CPUState *cpu) +{ + cpu->llsc_resolving = false; + qemu_mutex_unlock(&llsc_prot_mutex); +} + +bool cpu_resolve_llsc_check(CPUState *cpu, ram_addr_t addr) +{ + g_assert(cpu->llsc_resolving); + g_assert(cpu->llsc_prot_active); + + addr &= ~(cpu->llsc_prot_block_size - 1); + if (cpu->llsc_prot_address != addr) { + cpu->llsc_prot_active = false; + cpu->llsc_resolving = false; + qemu_mutex_unlock(&llsc_prot_mutex); + return false; + } + + return true; +} + +void cpu_resolve_llsc_success(CPUState *cpu) +{ + ram_addr_t addr; + + addr = cpu->llsc_prot_address; + g_assert(addr != -1ULL); + assert(cpu->llsc_prot_active); + cpu->llsc_prot_active = false; + + /* Leave the llsc_prot_address under active watch, for performance */ +// cpu->llsc_prot_address = -1ULL; +// cpu_teardown_llsc_prot(addr); + g_assert(cpu->llsc_resolving); + cpu->llsc_resolving = false; + + qemu_mutex_unlock(&llsc_prot_mutex); +} + +static void other_cpus_clear_llsc_prot(CPUState *cpu, ram_addr_t addr, + unsigned size) +{ + CPUState *c; + ram_addr_t end; + bool teardown = false; + + end = (addr + size - 1) & ~(cpu->llsc_prot_block_size - 1); + addr &= ~(cpu->llsc_prot_block_size - 1); + + if (!cpu->llsc_resolving) { + qemu_mutex_lock(&llsc_prot_mutex); + } + + CPU_FOREACH(c) { + ram_addr_t a = c->llsc_prot_address; + + if (c == cpu) { + continue; + } + if (a == -1ULL) { + assert(!c->llsc_prot_active); + continue; + } + if (a == addr || a == end) { + if (c->llsc_prot_active) { + c->llsc_prot_active = false; + } else { + teardown = true; + } + } + } + if (teardown) { + cpu_teardown_llsc_prot(addr); + if (end != addr) { + cpu_teardown_llsc_prot(end); + } + } + + if (!cpu->llsc_resolving) { + qemu_mutex_unlock(&llsc_prot_mutex); + } +} +#endif + static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size, CPUTLBEntryFull *full, uintptr_t retaddr) { @@ -1419,11 +1596,18 @@ static void notdirty_write(CPUState *cpu, vaddr mem_vaddr, unsigned size, tb_invalidate_phys_range_fast(ram_addr, size, retaddr); } +#ifdef TARGET_HAS_LLSC_PROT + if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_LLSC_PROT)) { + other_cpus_clear_llsc_prot(cpu, ram_addr, size); + } +#endif + /* * Set both VGA and migration bits for simplicity and to remove - * the notdirty callback faster. + * the notdirty callback faster. Code and llsc_prot don't get set + * because we always want callbacks for them. */ - cpu_physical_memory_set_dirty_range(ram_addr, size, DIRTY_CLIENTS_NOCODE); + cpu_physical_memory_set_dirty_range(ram_addr, size, DIRTY_CLIENTS_ONESHOT); /* We remove the notdirty callback only if the code has been flushed. */ if (!cpu_physical_memory_is_clean(ram_addr)) { diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c index 67db07741d..edfbffb705 100644 --- a/hw/core/cpu-common.c +++ b/hw/core/cpu-common.c @@ -250,6 +250,11 @@ static void cpu_common_initfn(Object *obj) cpu->nr_cores = 1; cpu->nr_threads = 1; cpu->cflags_next_tb = -1; +#ifdef TARGET_HAS_LLSC_PROT + cpu->llsc_prot_active = false; + cpu->llsc_prot_address = -1ULL; + cpu->llsc_resolving = false; +#endif qemu_mutex_init(&cpu->work_mutex); qemu_lockcnt_init(&cpu->in_ioctl_lock); diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 059bfdc07a..f50ff712cf 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1174,7 +1174,7 @@ int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova, if (!bcontainer->dirty_pages_supported && !all_device_dirty_tracking) { cpu_physical_memory_set_dirty_range(ram_addr, size, tcg_enabled() ? DIRTY_CLIENTS_ALL : - DIRTY_CLIENTS_NOCODE); + DIRTY_CLIENTS_ONESHOT); return 0; } diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 2c9ac79468..a57b153505 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -520,6 +520,7 @@ static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section) */ handled_dirty = (1 << DIRTY_MEMORY_MIGRATION) | (1 << DIRTY_MEMORY_CODE); + /* XXX: llsc? */ if (dirty_mask & ~handled_dirty) { trace_vhost_reject_section(mr->name, 1); diff --git a/system/memory.c b/system/memory.c index a229a79988..b2bac2d3b5 100644 --- a/system/memory.c +++ b/system/memory.c @@ -1863,6 +1863,9 @@ uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr) if (tcg_enabled() && rb) { /* TCG only cares about dirty memory logging for RAM, not IOMMU. */ mask |= (1 << DIRTY_MEMORY_CODE); +#ifdef TARGET_HAS_LLSC_PROT + mask |= (1 << DIRTY_MEMORY_LLSC_PROT); +#endif } return mask; } diff --git a/system/physmem.c b/system/physmem.c index dc0d8b16aa..aac227f0d2 100644 --- a/system/physmem.c +++ b/system/physmem.c @@ -2591,6 +2591,13 @@ static void invalidate_and_set_dirty(MemoryRegion *mr, hwaddr addr, dirty_log_mask = cpu_physical_memory_range_includes_clean(addr, length, dirty_log_mask); } +#ifdef TARGET_HAS_LLSC_PROT + if (dirty_log_mask & (1 << DIRTY_MEMORY_LLSC_PROT)) { + assert(tcg_enabled()); + /* XXX should invalidate CPU's llsc_prot protections here? */ + dirty_log_mask &= ~(1 << DIRTY_MEMORY_LLSC_PROT); + } +#endif if (dirty_log_mask & (1 << DIRTY_MEMORY_CODE)) { assert(tcg_enabled()); tb_invalidate_phys_range(addr, addr + length - 1); diff --git a/system/memory_ldst.c.inc b/system/memory_ldst.c.inc index 0e6f3940a9..c9730de37b 100644 --- a/system/memory_ldst.c.inc +++ b/system/memory_ldst.c.inc @@ -286,7 +286,8 @@ void glue(address_space_stl_notdirty, SUFFIX)(ARG1_DECL, stl_p(ptr, val); dirty_log_mask = memory_region_get_dirty_log_mask(mr); - dirty_log_mask &= ~(1 << DIRTY_MEMORY_CODE); +// dirty_log_mask &= ~((1 << DIRTY_MEMORY_CODE) | (1 << DIRTY_MEMORY_LLSC_PROT)); + dirty_log_mask &= DIRTY_CLIENTS_ONESHOT; cpu_physical_memory_set_dirty_range(memory_region_get_ram_addr(mr) + addr, 4, dirty_log_mask); r = MEMTX_OK; From patchwork Tue Feb 20 04:19:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 1901195 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20230601 header.b=FloeIvI4; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Tf5mm57xJz23cb for ; Tue, 20 Feb 2024 15:20:40 +1100 (AEDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rcHbi-0001gF-DZ; Mon, 19 Feb 2024 23:19:50 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rcHbh-0001fz-1i; Mon, 19 Feb 2024 23:19:49 -0500 Received: from mail-pj1-x1033.google.com ([2607:f8b0:4864:20::1033]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rcHbe-0006HU-Tv; Mon, 19 Feb 2024 23:19:48 -0500 Received: by mail-pj1-x1033.google.com with SMTP id 98e67ed59e1d1-299c5a23f40so783712a91.3; Mon, 19 Feb 2024 20:19:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708402785; x=1709007585; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=VI3mZsuAcYBxmb29DQFvybJDPmtONoM1V8I3PUnaIk4=; b=FloeIvI4qVhsNGd6Dg+NW6OeEu8pPvxhNE3gj0bqhLUqnzjhzRHsnsL5Zp5lb2GhC+ G348vICS5h6LuaNE9rsidUgwMeNlbHECufJcI7miStVaP7MyfjTE/YuPedFclc5WwPUg 49MiIlWhvhZPyP59yceNuNQLxfvoHhGiKSGZcY+LYo9/UFn2K5GLKtYfn3gflQTmeK64 jRYMKLdyiISWQzzll777/zIj8HHxvKUUMc2hrgq7QEtUwrwT2KytbIC/86VcUyiP2Eej x7VP2VjwdGzkKWDatoiewKNDii/HNaw6kIqiCDq2GJMo+43/OZJNxJwYhjhAX375Yw6R GMWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708402785; x=1709007585; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VI3mZsuAcYBxmb29DQFvybJDPmtONoM1V8I3PUnaIk4=; b=GZJPnTfRgsf4fhXbr4dCN4haxHvXLgoOOQc4AWx3oDlpYni9//n93Vw96Z4/rh54Dz FPcqnA0mCXdV4NlLYYgUJHL16Hmfo19qtdofK54yBXCB6lM/fDug6LEs0ai7w4sOph97 HtzZ9997hQfJCrCtd2irh7Y/pwLmnFNP1T4AO0UjE2ICL2SHJ2u+khLsmN3I2B8rhajd MHC8MgE5Qfxlwz5DXGoa5/qhWb2Rs9+vaN90xTNc3N9oonWYMs3ofsGlT1sz8Kdnw8A8 zLhKaJ+pOW8iOskLNV+IHU4h7k8dDucFVLcb8zyZaRnGbMRG5ZYW8kYbuv2sONIf1CxZ G5SQ== X-Forwarded-Encrypted: i=1; AJvYcCUCpVA7r58hTdDNch2/qQH9u7tMFtAeWJjTBuzrBorJcSBCIR9IA/bjlRRTUt8e3s4GlvdZ5zqmUr5cMPWqw8zRHaca3dE= X-Gm-Message-State: AOJu0YyUBep99agpg2T9F76DkbhI/axccSIdn2x9Qz4JtLVdg1NgsLbe Wf6e1ME0tED/ZF9UFi6OCrvGRmU8HEqeXLEMN4h8+jY5kOWBmjcwCVHlbEPe X-Google-Smtp-Source: AGHT+IGfgMoTY6FGpctUejyNDl+lPM+ltp5B+xkwCrkpChOXpwQKQM6HMAi72oJnRhFPIPApjLGtvQ== X-Received: by 2002:a17:90a:7883:b0:298:b08e:ec35 with SMTP id x3-20020a17090a788300b00298b08eec35mr9343098pjk.47.1708402780735; Mon, 19 Feb 2024 20:19:40 -0800 (PST) Received: from wheely.local0.net (123-243-155-241.static.tpgi.com.au. [123.243.155.241]) by smtp.gmail.com with ESMTPSA id pl6-20020a17090b268600b00297138f0496sm6232621pjb.31.2024.02.19.20.19.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 19 Feb 2024 20:19:40 -0800 (PST) From: Nicholas Piggin To: qemu-ppc@nongnu.org Cc: Nicholas Piggin , qemu-devel@nongnu.org, Richard Henderson Subject: [RFC PATCH 3/3] target/ppc: Implement reservation protection for larx/stcx Date: Tue, 20 Feb 2024 14:19:22 +1000 Message-ID: <20240220041922.373029-4-npiggin@gmail.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20240220041922.373029-1-npiggin@gmail.com> References: <20240220041922.373029-1-npiggin@gmail.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::1033; envelope-from=npiggin@gmail.com; helo=mail-pj1-x1033.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Use the generic llsc protection feature to implement real reservation protection for larx/stcx. This is more complicated and quite a bit slower than the cmpxchg pseudo-reservation, so it's questionable whether it should be merged or ever made the default. It could possibly be sped up more by tuning heuristics for managing the dirty bitmap -- being a bit more lazy about that could reduce the heavyweight bitmap cleaning and dirtying operations. Not sure if there are other better approaches to this. One thing I have tried to implement the reservation-granule (i.e., stcx. to succeed for an address that differs from the one that established the reservation) is to first fail the stcx. if it detects an address mismatch, but then set a flag to say try the mismatched mode next time. Next larx encountered will load and save memory contents for the entire reservation granule, then stcx. verifies the granule. This works to some degree with better performance on code that doesn't behave this way, but it still misses things, not only ABA but also stores that modify values not subject to the stcx. on the first pass in case that stcx. address *does* match larx. So I consider that even more of a hack and not really suitable for upstream. At the very least, this seems to have flushed out another bug in the cpu memory dirty bitmap code, since it hammers it much harder than typical users and simple test cases can be constructed that make failures obvious (lost updates, lost mutual exclusion, etc). so it hasn't been all for nothing even if it is a bad design :) --- target/ppc/cpu-param.h | 4 ++ target/ppc/helper.h | 2 + target/ppc/cpu_init.c | 4 ++ target/ppc/mem_helper.c | 132 ++++++++++++++++++++++++++++++++++++++++ target/ppc/translate.c | 128 ++++++++------------------------------ 5 files changed, 168 insertions(+), 102 deletions(-) diff --git a/target/ppc/cpu-param.h b/target/ppc/cpu-param.h index 0a0416e0a8..355b4e2fdd 100644 --- a/target/ppc/cpu-param.h +++ b/target/ppc/cpu-param.h @@ -8,6 +8,10 @@ #ifndef PPC_CPU_PARAM_H #define PPC_CPU_PARAM_H +#ifndef TARGET_HAS_LLSC_PROT /* XXX: hack */ +#define TARGET_HAS_LLSC_PROT 1 +#endif + #ifdef TARGET_PPC64 # define TARGET_LONG_BITS 64 /* diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 86f97ee1e7..e996b9f530 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -818,4 +818,6 @@ DEF_HELPER_4(DSCLI, void, env, fprp, fprp, i32) DEF_HELPER_4(DSCLIQ, void, env, fprp, fprp, i32) DEF_HELPER_1(tbegin, void, env) +DEF_HELPER_4(larx, void, env, tl, tl, tl) +DEF_HELPER_4(stcx, void, env, tl, tl, tl) DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env) diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c index 9931372a08..5fc14830ab 100644 --- a/target/ppc/cpu_init.c +++ b/target/ppc/cpu_init.c @@ -6831,6 +6831,10 @@ static void ppc_cpu_realize(DeviceState *dev, Error **errp) goto unrealize; } init_ppc_proc(cpu); + if (tcg_enabled()) { + cs->llsc_prot_block_size = env->dcache_line_size; + printf("Reservation granule size %d\n", cs->llsc_prot_block_size); + } ppc_gdb_init(cs, pcc); qemu_init_vcpu(cs); diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c index ea7e8443a8..fe2c7576c8 100644 --- a/target/ppc/mem_helper.c +++ b/target/ppc/mem_helper.c @@ -24,11 +24,21 @@ #include "exec/helper-proto.h" #include "helper_regs.h" #include "exec/cpu_ldst.h" +#include "exec/cputlb.h" #include "internal.h" #include "qemu/atomic128.h" /* #define DEBUG_OP */ +static inline int DEF_MEMOP(const CPUPPCState *env, int op) +{ + if (FIELD_EX64(env->msr, MSR, LE)) { + return op | MO_LE; + } else { + return op | MO_BE; + } +} + static inline bool needs_byteswap(const CPUPPCState *env) { #if TARGET_BIG_ENDIAN @@ -528,3 +538,125 @@ void helper_tbegin(CPUPPCState *env) env->spr[SPR_TFHAR] = env->nip + 4; env->crf[0] = 0xB; /* 0b1010 = transaction failure */ } + +void helper_larx(CPUPPCState *env, target_ulong addr, target_ulong size, + target_ulong reg) +{ + CPUState *cs = env_cpu(env); + uintptr_t raddr = GETPC(); + int mmu_idx = cpu_mmu_index(cs, false); + uint64_t val; + void *host; + + if (addr & (size - 1)) { + ppc_cpu_do_unaligned_access(cs, addr, MMU_DATA_LOAD, mmu_idx, raddr); + } + + env->access_type = ACCESS_RES; + host = probe_access(env, addr, size, MMU_DATA_LOAD, mmu_idx, raddr); + if (host) { + cpu_set_llsc_prot(cs, qemu_ram_addr_from_host_nofail(host)); + } else { + /* XXX: fault? */ + g_assert_not_reached(); + } + + if (unlikely(size == 16)) { + Int128 val16; + val16 = cpu_ld16_mmu(env, addr, + make_memop_idx(DEF_MEMOP(env, MO_128 | MO_ALIGN), mmu_idx), + raddr); + env->gpr[reg] = int128_gethi(val16); + env->gpr[reg + 1] = int128_getlo(val16); + return; + } + + switch (size) { + case 1: + val = ldub_p(host); + break; + case 2: + val = FIELD_EX64(env->msr, MSR, LE) ? lduw_le_p(host) : lduw_be_p(host); + break; + case 4: + val = FIELD_EX64(env->msr, MSR, LE) ? ldl_le_p(host) : ldl_be_p(host); + break; + case 8: + val = FIELD_EX64(env->msr, MSR, LE) ? ldq_le_p(host) : ldq_be_p(host); + break; + default: + g_assert_not_reached(); + } + env->gpr[reg] = val; +} + +void helper_stcx(CPUPPCState *env, target_ulong addr, target_ulong size, + target_ulong reg) +{ + CPUState *cs = env_cpu(env); + uintptr_t raddr = GETPC(); + int mmu_idx = cpu_mmu_index(cs, false); + uint64_t val; + void *host; + CPUTLBEntryFull *full; + int flags; + + if (addr & (size - 1)) { + ppc_cpu_do_unaligned_access(cs, addr, MMU_DATA_STORE, mmu_idx, raddr); + } + + env->access_type = ACCESS_RES; + +again: + if (!cpu_resolve_llsc_begin(cs)) { + goto fail; + } + + flags = probe_access_full(env, addr, size, MMU_DATA_STORE, + mmu_idx, true, &host, &full, raddr); + if (unlikely(flags & TLB_INVALID_MASK)) { + cpu_resolve_llsc_abort(cs); + host = probe_access(env, addr, size, MMU_DATA_STORE, mmu_idx, raddr); + g_assert(host); + goto again; + } + + if (!cpu_resolve_llsc_check(cs, qemu_ram_addr_from_host_nofail(host))) { + goto fail; + } + + if (unlikely(size == 16)) { + Int128 val16 = int128_make128(env->gpr[reg + 1], env->gpr[reg]); + cpu_st16_mmu(env, addr, val16, + make_memop_idx(DEF_MEMOP(env, MO_128 | MO_ALIGN), mmu_idx), + raddr); + goto success; + } + + val = env->gpr[reg]; + switch (size) { + case 1: + stb_p(host, val); + break; + case 2: + FIELD_EX64(env->msr, MSR, LE) ? stw_le_p(host, val) : stw_be_p(host, val); + break; + case 4: + FIELD_EX64(env->msr, MSR, LE) ? stl_le_p(host, val) : stl_be_p(host, val); + break; + case 8: + FIELD_EX64(env->msr, MSR, LE) ? stq_le_p(host, val) : stq_be_p(host, val); + break; + default: + g_assert_not_reached(); + } + +success: + cpu_resolve_llsc_success(cs); + + env->crf[0] = xer_so | CRF_EQ; /* stcx pass */ + return; + +fail: + env->crf[0] = xer_so; /* stcx fail */ +} diff --git a/target/ppc/translate.c b/target/ppc/translate.c index 049f636927..a2d002eb89 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -3570,31 +3570,27 @@ static void gen_isync(DisasContext *ctx) ctx->base.is_jmp = DISAS_EXIT_UPDATE; } -#define MEMOP_GET_SIZE(x) (1 << ((x) & MO_SIZE)) - -static void gen_load_locked(DisasContext *ctx, MemOp memop) +static void gen_load_locked(DisasContext *ctx, int size) { - TCGv gpr = cpu_gpr[rD(ctx->opcode)]; - TCGv t0 = tcg_temp_new(); + int rd = rD(ctx->opcode); + TCGv EA = tcg_temp_new(); - gen_set_access_type(ctx, ACCESS_RES); - gen_addr_reg_index(ctx, t0); - tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop | MO_ALIGN); - tcg_gen_mov_tl(cpu_reserve, t0); - tcg_gen_movi_tl(cpu_reserve_length, memop_size(memop)); - tcg_gen_mov_tl(cpu_reserve_val, gpr); + gen_addr_reg_index(ctx, EA); + gen_helper_larx(tcg_env, EA, tcg_constant_tl(size), tcg_constant_tl(rd)); } -#define LARX(name, memop) \ +#define LARX(name, size) \ static void gen_##name(DisasContext *ctx) \ { \ - gen_load_locked(ctx, memop); \ + gen_load_locked(ctx, size); \ } /* lwarx */ -LARX(lbarx, DEF_MEMOP(MO_UB)) -LARX(lharx, DEF_MEMOP(MO_UW)) -LARX(lwarx, DEF_MEMOP(MO_UL)) +LARX(lbarx, 1) +LARX(lharx, 2) +LARX(lwarx, 4) + +#define MEMOP_GET_SIZE(x) (1 << ((x) & MO_SIZE)) static void gen_fetch_inc_conditional(DisasContext *ctx, MemOp memop, TCGv EA, TCGCond cond, int addend) @@ -3802,59 +3798,36 @@ static void gen_stdat(DisasContext *ctx) } #endif -static void gen_conditional_store(DisasContext *ctx, MemOp memop) +static void gen_conditional_store(DisasContext *ctx, int size) { - TCGLabel *lfail; - TCGv EA; - TCGv cr0; - TCGv t0; int rs = rS(ctx->opcode); + TCGv EA = tcg_temp_new(); - lfail = gen_new_label(); - EA = tcg_temp_new(); - cr0 = tcg_temp_new(); - t0 = tcg_temp_new(); - - tcg_gen_mov_tl(cr0, cpu_so); - gen_set_access_type(ctx, ACCESS_RES); gen_addr_reg_index(ctx, EA); - tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, lfail); - tcg_gen_brcondi_tl(TCG_COND_NE, cpu_reserve_length, memop_size(memop), lfail); - - tcg_gen_atomic_cmpxchg_tl(t0, cpu_reserve, cpu_reserve_val, - cpu_gpr[rs], ctx->mem_idx, - DEF_MEMOP(memop) | MO_ALIGN); - tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val); - tcg_gen_shli_tl(t0, t0, CRF_EQ_BIT); - tcg_gen_or_tl(cr0, cr0, t0); - - gen_set_label(lfail); - tcg_gen_trunc_tl_i32(cpu_crf[0], cr0); - tcg_gen_movi_tl(cpu_reserve, -1); + gen_helper_stcx(tcg_env, EA, tcg_constant_tl(size), tcg_constant_tl(rs)); } -#define STCX(name, memop) \ +#define STCX(name, size) \ static void gen_##name(DisasContext *ctx) \ { \ - gen_conditional_store(ctx, memop); \ + gen_conditional_store(ctx, size); \ } -STCX(stbcx_, DEF_MEMOP(MO_UB)) -STCX(sthcx_, DEF_MEMOP(MO_UW)) -STCX(stwcx_, DEF_MEMOP(MO_UL)) +STCX(stbcx_, 1) +STCX(sthcx_, 2) +STCX(stwcx_, 4) #if defined(TARGET_PPC64) /* ldarx */ -LARX(ldarx, DEF_MEMOP(MO_UQ)) +LARX(ldarx, 8) /* stdcx. */ -STCX(stdcx_, DEF_MEMOP(MO_UQ)) +STCX(stdcx_, 8) /* lqarx */ static void gen_lqarx(DisasContext *ctx) { int rd = rD(ctx->opcode); - TCGv EA, hi, lo; - TCGv_i128 t16; + TCGv EA; if (unlikely((rd & 1) || (rd == rA(ctx->opcode)) || (rd == rB(ctx->opcode)))) { @@ -3862,74 +3835,25 @@ static void gen_lqarx(DisasContext *ctx) return; } - gen_set_access_type(ctx, ACCESS_RES); EA = tcg_temp_new(); gen_addr_reg_index(ctx, EA); - - /* Note that the low part is always in RD+1, even in LE mode. */ - lo = cpu_gpr[rd + 1]; - hi = cpu_gpr[rd]; - - t16 = tcg_temp_new_i128(); - tcg_gen_qemu_ld_i128(t16, EA, ctx->mem_idx, DEF_MEMOP(MO_128 | MO_ALIGN)); - tcg_gen_extr_i128_i64(lo, hi, t16); - - tcg_gen_mov_tl(cpu_reserve, EA); - tcg_gen_movi_tl(cpu_reserve_length, 16); - tcg_gen_st_tl(hi, tcg_env, offsetof(CPUPPCState, reserve_val)); - tcg_gen_st_tl(lo, tcg_env, offsetof(CPUPPCState, reserve_val2)); + gen_helper_larx(tcg_env, EA, tcg_constant_tl(16), tcg_constant_tl(rd)); } /* stqcx. */ static void gen_stqcx_(DisasContext *ctx) { - TCGLabel *lfail; - TCGv EA, t0, t1; - TCGv cr0; - TCGv_i128 cmp, val; int rs = rS(ctx->opcode); + TCGv EA; if (unlikely(rs & 1)) { gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL); return; } - lfail = gen_new_label(); EA = tcg_temp_new(); - cr0 = tcg_temp_new(); - - tcg_gen_mov_tl(cr0, cpu_so); - gen_set_access_type(ctx, ACCESS_RES); gen_addr_reg_index(ctx, EA); - tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, lfail); - tcg_gen_brcondi_tl(TCG_COND_NE, cpu_reserve_length, 16, lfail); - - cmp = tcg_temp_new_i128(); - val = tcg_temp_new_i128(); - - tcg_gen_concat_i64_i128(cmp, cpu_reserve_val2, cpu_reserve_val); - - /* Note that the low part is always in RS+1, even in LE mode. */ - tcg_gen_concat_i64_i128(val, cpu_gpr[rs + 1], cpu_gpr[rs]); - - tcg_gen_atomic_cmpxchg_i128(val, cpu_reserve, cmp, val, ctx->mem_idx, - DEF_MEMOP(MO_128 | MO_ALIGN)); - - t0 = tcg_temp_new(); - t1 = tcg_temp_new(); - tcg_gen_extr_i128_i64(t1, t0, val); - - tcg_gen_xor_tl(t1, t1, cpu_reserve_val2); - tcg_gen_xor_tl(t0, t0, cpu_reserve_val); - tcg_gen_or_tl(t0, t0, t1); - - tcg_gen_setcondi_tl(TCG_COND_EQ, t0, t0, 0); - tcg_gen_shli_tl(t0, t0, CRF_EQ_BIT); - tcg_gen_or_tl(cr0, cr0, t0); - - gen_set_label(lfail); - tcg_gen_trunc_tl_i32(cpu_crf[0], cr0); - tcg_gen_movi_tl(cpu_reserve, -1); + gen_helper_stcx(tcg_env, EA, tcg_constant_tl(16), tcg_constant_tl(rs)); } #endif /* defined(TARGET_PPC64) */