From patchwork Sun Apr 22 14:54:47 2012
From: "Michael S. Tsirkin"
Date: Sun, 22 Apr 2012 17:54:47 +0300
To: qemu-devel@nongnu.org
Cc: Anthony Liguori, Juan Quintela, Alexey Kardashevskiy, Jason Wang,
    Eric Sunshine, Amit Shah, David Gibson
Subject: [Qemu-devel] [PATCH] virtio: add missing mb() on notification
Message-ID: <20120422145447.GA10299@redhat.com>

During normal operation, the virtio host first writes a used index and
then checks whether it should interrupt the guest by reading the guest
avail flag/used event index values.  The guest does the reverse: it
writes the index/flag, then checks the used ring.

The ordering is important: if the host's avail flag read bypasses its
used index write, we can in effect get this timing:

    host avail flag/used event index read
    guest enable interrupts: this performs avail flag/used event index write
    guest check used ring: ring is empty
    host used index write

This timing results in a lost interrupt: the guest will never be
notified about the used ring update.
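To make the ordering requirement concrete, here is a minimal stand-alone
sketch of the store-buffering pattern involved (illustration only, not
QEMU code: host_side()/guest_side(), used_idx and guest_notify are
invented names, and smp_mb() is approximated with the GCC
__sync_synchronize() builtin):

    /* sketch.c: build with "gcc -O2 -pthread sketch.c" */
    #include <pthread.h>
    #include <stdio.h>

    #define smp_mb() __sync_synchronize()  /* full memory barrier */

    static volatile unsigned used_idx;     /* written by host, read by guest */
    static volatile unsigned guest_notify; /* written by guest, read by host */
    static volatile int interrupted;

    /* Host: publish a used entry, then decide whether to interrupt. */
    static void *host_side(void *arg)
    {
        (void)arg;
        used_idx = 1;       /* used index write */
        smp_mb();           /* the barrier this patch adds: orders the
                               write above against the read below */
        if (guest_notify) { /* avail flag/used event index read */
            interrupted = 1;
        }
        return NULL;
    }

    /* Guest: enable interrupts, then re-check the used ring. */
    static void *guest_side(void *arg)
    {
        (void)arg;
        guest_notify = 1;   /* avail flag/used event index write */
        smp_mb();           /* guest-side barrier */
        if (used_idx == 0) {
            /* Ring looks empty, so the guest goes to sleep.  Without
               the host-side smp_mb(), both checks can pass and the
               guest waits forever for an interrupt that never comes. */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t h, g;
        pthread_create(&h, NULL, host_side, NULL);
        pthread_create(&g, NULL, guest_side, NULL);
        pthread_join(h, NULL);
        pthread_join(g, NULL);
        printf("interrupted=%d\n", interrupted);
        return 0;
    }

With a full barrier on both sides, at least one of the two reads must
observe the other side's write, so either the host raises the interrupt
or the guest sees the non-empty ring.  Remove the host-side barrier and
both reads may return zero, which is exactly the lost interrupt
described above.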
This has actually been observed in the field, when using qemu-kvm such
that the guest vcpu and the qemu I/O thread run on different host cpus,
but it only seems to trigger on some specific hardware, and only with
userspace virtio: vhost has the necessary smp_mb() in place to prevent
the reordering, so the same workload stalls forever waiting for an
interrupt with vhost=off but works fine with vhost=on.

Insert an smp_mb() barrier operation in userspace virtio to ensure the
correct ordering.  Applying this patch fixed the race condition we
observed.

Tested on x86_64.  I checked the code generated by the new macro for
i386 and ppc but didn't run virtio on those hosts.

Signed-off-by: Michael S. Tsirkin
Reviewed-by: Stefan Hajnoczi
---
 hw/virtio.c    |  2 ++
 qemu-barrier.h | 23 ++++++++++++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index f805790..6449746 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -693,6 +693,8 @@ static bool vring_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
     uint16_t old, new;
     bool v;
+    /* We need to expose used array entries before checking used event. */
+    smp_mb();
     /* Always notify when queue is empty (when feature acknowledge) */
     if (((vdev->guest_features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY)) &&
          !vq->inuse && vring_avail_idx(vq) == vq->last_avail_idx)) {
diff --git a/qemu-barrier.h b/qemu-barrier.h
index c11bb2b..f6722a8 100644
--- a/qemu-barrier.h
+++ b/qemu-barrier.h
@@ -4,7 +4,7 @@
 /* Compiler barrier */
 #define barrier()   asm volatile("" ::: "memory")
 
-#if defined(__i386__) || defined(__x86_64__)
+#if defined(__i386__)
 
 /*
  * Because of the strongly ordered x86 storage model, wmb() is a nop
@@ -13,15 +13,31 @@
  * load/stores from C code.
  */
 #define smp_wmb()   barrier()
+/*
+ * We use the GCC builtin if it's available, as that can use
+ * mfence on 32 bit as well, e.g. if built with -march=pentium-m.
+ * However, on i386, there seem to be known bugs as recently as 4.3.
+ */
+#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
+#define smp_mb() __sync_synchronize()
+#else
+#define smp_mb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
+#endif
+
+#elif defined(__x86_64__)
+
+#define smp_wmb()   barrier()
+#define smp_mb() asm volatile("mfence" ::: "memory")
 
 #elif defined(_ARCH_PPC)
 
 /*
- * We use an eieio() for a wmb() on powerpc.  This assumes we don't
+ * We use an eieio() for wmb() and mb() on powerpc.  This assumes we don't
  * need to order cacheable and non-cacheable stores with respect to
  * each other
  */
 #define smp_wmb()   asm volatile("eieio" ::: "memory")
+#define smp_mb()   asm volatile("eieio" ::: "memory")
 
 #else
 
@@ -29,9 +45,10 @@
  * For (host) platforms we don't have explicit barrier definitions
  * for, we use the gcc __sync_synchronize() primitive to generate a
  * full barrier.  This should be safe on all platforms, though it may
- * be overkill.
+ * be overkill for wmb().
  */
 #define smp_wmb()   __sync_synchronize()
+#define smp_mb()   __sync_synchronize()
 
 #endif
 