From: Rusty Russell
To: "Michael S. Tsirkin", Tejun Heo
Cc: Amit Shah, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] virtio_net: fix refill related races
Date: Wed, 21 Dec 2011 10:13:18 +1030
Message-ID: <87hb0udd2h.fsf@rustcorp.com.au>
In-Reply-To: <20111220194518.GB26692@redhat.com>

On Tue, 20 Dec 2011 21:45:19 +0200, "Michael S. Tsirkin" wrote:
> On Tue, Dec 20, 2011 at 11:31:54AM -0800, Tejun Heo wrote:
> > On Tue, Dec 20, 2011 at 09:30:55PM +0200, Michael S. Tsirkin wrote:
> > > Hmm, in that case it looks like a nasty race could get
> > > triggered, with try_fill_recv run on multiple CPUs in parallel,
> > > corrupting the linked list within the vq.
> > >
> > > Using the mutex as my patch did will fix that naturally, as well.
> >
> > Don't know the code but just use nrt wq.  There's even a system one
> > called system_nrt_wq.
> >
> > Thanks.
>
> We can, but we need the mutex for other reasons, anyway.

Well, here's the alternate approach.  What do you think?

Finding two wq issues makes you justifiably cautious, but simply
wrapping everything in a lock feels like giving up.  The APIs are
designed to let us do this without a lock; I was just using them
wrong.

Two patches in one mail: it's gauche, but this is an RFC only
(untested).

Cheers,
Rusty.

virtio_net: set/cancel work on ndo_open/ndo_stop

Michael S. Tsirkin noticed that we could run the refill work after
ndo_close, which can re-enable napi - we don't disable it until
virtnet_remove.  This is clearly wrong, so move the workqueue control
to ndo_open and ndo_stop (aka. virtnet_open and virtnet_close).

One subtle point: virtnet_probe() could simply fail if it couldn't
allocate a receive buffer, but that's less polite in virtnet_open(),
so there we schedule a refill as we do in the normal receive path
when we run out of memory.
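To spell out the race this fixes: refill_work() brackets the refill
with napi_disable()/napi_enable(), so if it fires after ndo_stop has
disabled napi, it happily re-enables it behind our back.  A
stripped-down sketch of the interaction, from memory of the current
driver (details elided, not the exact code):

/* Sketch only: the refill work as it interacts with napi.  The real
 * refill_work() goes through virtnet_napi_enable(). */
static void refill_work(struct work_struct *work)
{
	struct virtnet_info *vi = container_of(work, struct virtnet_info,
					       refill.work);
	bool still_empty;

	napi_disable(&vi->napi);
	still_empty = !try_fill_recv(vi, GFP_KERNEL);
	napi_enable(&vi->napi);		/* runs even after virtnet_close()! */

	if (still_empty)
		schedule_delayed_work(&vi->refill, HZ/2);
}

Hence virtnet_close() must cancel the work (synchronously) before
disabling napi, so nothing re-enables it once the device is down.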
---
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -719,6 +719,10 @@ static int virtnet_open(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 
+	/* Make sure we have some buffers: if oom, use wq. */
+	if (!try_fill_recv(vi, GFP_KERNEL))
+		schedule_delayed_work(&vi->refill, 0);
+
 	virtnet_napi_enable(vi);
 	return 0;
 }
@@ -772,6 +776,8 @@ static int virtnet_close(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 
+	/* Make sure refill_work doesn't re-enable napi! */
+	cancel_delayed_work_sync(&vi->refill);
 	napi_disable(&vi->napi);
 
 	return 0;
@@ -1082,7 +1088,6 @@ static int virtnet_probe(struct virtio_device *vdev)
 
 unregister:
 	unregister_netdev(dev);
-	cancel_delayed_work_sync(&vi->refill);
 free_vqs:
 	vdev->config->del_vqs(vdev);
 free_stats:
@@ -1121,9 +1126,7 @@ static void __devexit virtnet_remove(struct virtio_device *vdev)
 	/* Stop all the virtqueues. */
 	vdev->config->reset(vdev);
 
-
 	unregister_netdev(vi->dev);
-	cancel_delayed_work_sync(&vi->refill);
 
 	/* Free unused buffers in both send and recv, if any. */
 	free_unused_bufs(vi);

virtio_net: use non-reentrant workqueue.

Michael S. Tsirkin also noticed that we could run the refill work on
multiple CPUs: if we kick off a refill on one CPU and then on another,
they would both manipulate the queue at the same time (they use
napi_disable to avoid racing against the receive handler itself).

Tejun points out that this is what the WQ_NON_REENTRANT flag is for,
and that there is a convenient system workqueue we can use.

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -501,7 +501,7 @@ static void refill_work(struct work_struct *work)
 	/* In theory, this can happen: if we don't get any buffers in
 	 * we will *never* try to fill again. */
 	if (still_empty)
-		schedule_delayed_work(&vi->refill, HZ/2);
+		queue_delayed_work(system_nrt_wq, &vi->refill, HZ/2);
 }
 
 static int virtnet_poll(struct napi_struct *napi, int budget)
@@ -520,7 +520,7 @@ again:
 
 	if (vi->num < vi->max / 2) {
 		if (!try_fill_recv(vi, GFP_ATOMIC))
-			schedule_delayed_work(&vi->refill, 0);
+			queue_delayed_work(system_nrt_wq, &vi->refill, 0);
 	}
 
 	/* Out of packets? */
@@ -721,7 +721,7 @@ static int virtnet_open(struct net_device *dev)
 
 	/* Make sure we have some buffers: if oom, use wq. */
 	if (!try_fill_recv(vi, GFP_KERNEL))
-		schedule_delayed_work(&vi->refill, 0);
+		queue_delayed_work(system_nrt_wq, &vi->refill, 0);
 
 	virtnet_napi_enable(vi);
 	return 0;
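For anyone following along: WQ_NON_REENTRANT guarantees that a given
work item executes on at most one CPU at a time system-wide, which is
exactly the property the refill work needs, and system_nrt_wq is a
system-wide workqueue created with that flag, so we don't need our
own.  If we ever did want a dedicated queue, it would look something
like this (sketch only; "virtnet_wq" is a made-up name):

#include <linux/workqueue.h>

static struct workqueue_struct *virtnet_wq;

static int virtnet_wq_init(void)
{
	/* WQ_NON_REENTRANT: each work item runs on at most one CPU at
	 * a time, so refill_work can never race with itself. */
	virtnet_wq = alloc_workqueue("virtnet", WQ_NON_REENTRANT, 0);
	return virtnet_wq ? 0 : -ENOMEM;
}

Queueing is then queue_delayed_work(virtnet_wq, &vi->refill, 0)
rather than schedule_delayed_work(), which always targets the
(reentrant) system workqueue.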