ipv6: tunnel: hang when destroying ipv6 tunnel

> Tetsuo Handa wrote:
> > Maybe you can get more useful information with below untested printk() patch.
> > 
> > diff --git a/net/9p/client.c b/net/9p/client.c
> > index b23a17c..2dd447a 100644
> > --- a/net/9p/client.c
> > +++ b/net/9p/client.c
> > @@ -734,7 +734,9 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
> >  	} else
> >  		sigpending = 0;
> >  
> > +	printk("%u:Calling %pS\n", current->pid, c->trans_mod->request);
> >  	err = c->trans_mod->request(c, req);
> > +	printk("%u:%pS = %d\n", current->pid, c->trans_mod->request, err);
> >  	if (err < 0) {
> >  		if (err != -ERESTARTSYS && err != -EFAULT)
> >  			c->status = Disconnected;
> > @@ -742,8 +744,10 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
> >  	}
> >  again:
> >  	/* Wait for the response */
> > +	printk("%u:req->status = %u\n", current->pid, req->status);
> >  	err = wait_event_interruptible(*req->wq,
> >  				       req->status >= REQ_STATUS_RCVD);
> > +	printk("%u:wait = %d\n", current->pid, err);
> >  
> >  	if ((err == -ERESTARTSYS) && (c->status == Connected)
> >  				  && (type == P9_TFLUSH)) {
> > 

Sasha Levin wrote:
> Heya,
> 
> The output from the printk confirmed that there are several threads
> waiting for RPC to complete, with the last two having and odd 'wait'
> result. This is just before the hang:
> 
> [  809.165663] 19964:Calling p9_virtio_request+0x0/0x200
> [  809.166951] 19964:p9_virtio_request+0x0/0x200 = 0
> [  809.167878] 19964:req->status = 3
> [  809.803535] 19957:Calling p9_virtio_request+0x0/0x200
> [  809.804506] 19957:p9_virtio_request+0x0/0x200 = 0
> [  809.805332] 19957:req->status = 3
> [  809.868591] 19955:Calling p9_virtio_request+0x0/0x200
> [  809.869493] 19955:p9_virtio_request+0x0/0x200 = 0
> [  809.870331] 19955:req->status = 3
> [  811.364554] 19985:Calling p9_virtio_request+0x0/0x200
> [  811.365498] 19985:p9_virtio_request+0x0/0x200 = 0
> [  811.366386] 19985:req->status = 3
> [  811.458600] 19999:wait = -512
> [  811.459171] 19999:Calling p9_virtio_request+0x0/0x200
> [  811.459992] 19999:p9_virtio_request+0x0/0x200 = 0
> [  811.460822] 19999:req->status = 3
> [  811.472175] 19994:wait = -512
> [  811.472943] 19994:Calling p9_virtio_request+0x0/0x200
> [  811.474195] 19994:p9_virtio_request+0x0/0x200 = 0
> [  811.474955] 19994:req->status = 3
> [... Hang 120 sec later here]
> 

Good. -512 is -ERESTARTSYS, and this hang occurs after -ERESTARTSYS is
returned. It indicates that c->trans_mod->request() is interrupted by signal.
Since c->trans_mod->request is pointing at p9_virtio_request, the location
returning that error would be

254 static int
255 p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
256 {
257         int err;
258         int in, out;
259         unsigned long flags;
260         struct virtio_chan *chan = client->trans;
261 
262         p9_debug(P9_DEBUG_TRANS, "9p debug: virtio request\n");
263 
264         req->status = REQ_STATUS_SENT;
265 req_retry:
266         spin_lock_irqsave(&chan->lock, flags);
267 
268         /* Handle out VirtIO ring buffers */
269         out = pack_sg_list(chan->sg, 0,
270                            VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
271 
272         in = pack_sg_list(chan->sg, out,
273                           VIRTQUEUE_NUM, req->rc->sdata, req->rc->capacity);
274 
275         err = virtqueue_add_buf(chan->vq, chan->sg, out, in, req->tc,
276                                 GFP_ATOMIC);
277         if (err < 0) {
278                 if (err == -ENOSPC) {
279                         chan->ring_bufs_avail = 0;
280                         spin_unlock_irqrestore(&chan->lock, flags);
281                         err = wait_event_interruptible(*chan->vc_wq,
282                                                         chan->ring_bufs_avail);

   here.

283                         if (err  == -ERESTARTSYS)
284                                 return err;
285 
286                         p9_debug(P9_DEBUG_TRANS, "Retry virtio request\n");
287                         goto req_retry;
288                 } else {
289                         spin_unlock_irqrestore(&chan->lock, flags);
290                         p9_debug(P9_DEBUG_TRANS,
291                                  "virtio rpc add_buf returned failure\n");
292                         return -EIO;
293                 }
294         }
295         virtqueue_kick(chan->vq);
296         spin_unlock_irqrestore(&chan->lock, flags);
297 
298         p9_debug(P9_DEBUG_TRANS, "virtio request kicked\n");
299         return 0;
300 }

Comparing 3.3.1 and linux-next in my environment, there are several changes.

# diff -ur linux-3.3.1/drivers/virtio/ linux-next/drivers/virtio/ | diffstat
 config.c         |    1
 virtio_balloon.c |   14 ----------
 virtio_pci.c     |   74 +++++--------------------------------------------------
 3 files changed, 8 insertions(+), 81 deletions(-)
# diff -urp linux-3.3.1/fs/9p/ linux-next/fs/9p/ | diffstat
 v9fs.c      |   16 ++++++++--------
 vfs_super.c |    5 ++---
 2 files changed, 10 insertions(+), 11 deletions(-)
# diff -ur linux-3.3.1/net/9p/ linux-next/net/9p/ | diffstat
 client.c |   26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

Most suspicious change is net/9p/client.c because it is changing handling of
ERESTARTSYS case.

Maybe commit a314f274 "net/9p: don't allow Tflush to be interrupted" or nearby.

By the way, have you already tried 3.4-rc1?
In my environment, there is no difference between linux-next and 3.4-rc1.

# diff -ur linux-3.4.0-rc1/net/9p/ linux-next/net/9p/
# diff -ur linux-3.4.0-rc1/drivers/virtio/ linux-next/drivers/virtio/
# diff -ur linux-3.4.0-rc1/fs/9p/ linux-next/fs/9p/
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Message ID	201204052329.EBH52139.FFOLQMOJSVHOFt@I-love.SAKURA.ne.jp
State	Not Applicable, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id C731DB7022 for <patchwork-incoming@ozlabs.org>; Sat, 7 Apr 2012 04:24:57 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753364Ab2DFSYh (ORCPT <rfc822;patchwork-incoming@ozlabs.org>); Fri, 6 Apr 2012 14:24:37 -0400 Received: from www262.sakura.ne.jp ([202.181.97.72]:59204 "EHLO www262.sakura.ne.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752354Ab2DFSYg (ORCPT <rfc822;netdev@vger.kernel.org>); Fri, 6 Apr 2012 14:24:36 -0400 Received: from www262.sakura.ne.jp (ksav01.sakura.ne.jp [210.224.165.38]) by www262.sakura.ne.jp (8.14.3/8.14.3) with ESMTP id q35ET49H053007; Thu, 5 Apr 2012 23:29:04 +0900 (JST) (envelope-from penguin-kernel@I-love.SAKURA.ne.jp) X-Nat-Received: from [202.181.97.72]:65048 [ident-empty] by smtp-proxy.isp with TPROXY id 1333636144.24680 Received: from CLAMP (KD175108057186.ppp-bb.dion.ne.jp [175.108.57.186]) by www262.sakura.ne.jp (8.14.3/8.14.3) with ESMTP id q35ET34G053002; Thu, 5 Apr 2012 23:29:04 +0900 (JST) (envelope-from penguin-kernel@I-love.SAKURA.ne.jp) To: levinsasha928@gmail.com, garlick@llnl.gov, ericvh@gmail.com Cc: oleg@redhat.com, eric.dumazet@gmail.com, davem@davemloft.net, kuznet@ms2.inr.ac.ru, jmorris@namei.org, yoshfuji@linux-ipv6.org, kaber@trash.net, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, davej@redhat.com Subject: Re: ipv6: tunnel: hang when destroying ipv6 tunnel From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> References: <20120331213423.GA21219@redhat.com> <CA+1xoqcAXyh306=2=QKWyM3BE44=MT+nsuRQn5YLOLgbtZqicA@mail.gmail.com> <CA+1xoqe1Wc_uifrKsko6hu+py9Ahz3Q0p1tRxup09jmh2NZ1rw@mail.gmail.com> <201204011221.GAG51016.JQFSFFOtLOMVHO@I-love.SAKURA.ne.jp> <CA+1xoqe+T=iUtUeFdeqmRvMZ7TMCgkLs_1QYv-4YNtn-jCoMCQ@mail.gmail.com> In-Reply-To: <CA+1xoqe+T=iUtUeFdeqmRvMZ7TMCgkLs_1QYv-4YNtn-jCoMCQ@mail.gmail.com> Message-Id: <201204052329.EBH52139.FFOLQMOJSVHOFt@I-love.SAKURA.ne.jp> X-Mailer: Winbiff [Version 2.51 PL2] X-Accept-Language: ja,en,zh Date: Thu, 5 Apr 2012 23:29:04 +0900 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Anti-Virus: Kaspersky Anti-Virus for Linux Mail Server 5.6.44/RELEASE, bases: 05042012 #7627293, status: clean Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: <netdev.vger.kernel.org> X-Mailing-List: netdev@vger.kernel.org

ipv6: tunnel: hang when destroying ipv6 tunnel

Commit Message

Comments

Patch