From patchwork Fri Aug 21 23:34:17 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Dillow
X-Patchwork-Id: 31855
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@bilbo.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from ozlabs.org (ozlabs.org [203.10.76.45])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mx.ozlabs.org", Issuer "CA Cert Signing Authority" (verified OK))
	by bilbo.ozlabs.org (Postfix) with ESMTPS id 07558B6F2B
	for ; Sat, 22 Aug 2009 09:35:03 +1000 (EST)
Received: by ozlabs.org (Postfix)
	id EC064DDD0B; Sat, 22 Aug 2009 09:35:02 +1000 (EST)
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.176.167])
	by ozlabs.org (Postfix) with ESMTP id 7EB50DDD04
	for ; Sat, 22 Aug 2009 09:35:02 +1000 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932890AbZHUXeV (ORCPT ); Fri, 21 Aug 2009 19:34:21 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S932650AbZHUXeT (ORCPT ); Fri, 21 Aug 2009 19:34:19 -0400
Received: from smtp.knology.net ([24.214.63.101]:33142 "EHLO smtp.knology.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754240AbZHUXeS (ORCPT ); Fri, 21 Aug 2009 19:34:18 -0400
Received: (qmail 31464 invoked by uid 0); 21 Aug 2009 23:34:18 -0000
Received: from unknown (HELO shed.thedillows.org) (207.98.218.89)
	by smtp8.knology.net with SMTP; 21 Aug 2009 23:34:18 -0000
Received: from [192.168.1.10] (obelisk.gig.thedillows.org [192.168.1.10])
	by shed.thedillows.org (8.14.3/8.14.3) with ESMTP id n7LNYHKW026263;
	Fri, 21 Aug 2009 19:34:18 -0400
Subject: Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
From: David Dillow
To: "Eric W. Biederman"
Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael =?ISO-8859-1?Q?B=FCker?=, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <1250895567.23419.1.camel@obelisk.thedillows.org>
References: <200903041828.49972.m.bueker@berlin.de>
	<1242001754.4093.12.camel@obelisk.thedillows.org>
	<200905112248.44868.mb@bu3sch.de>
	<200905112310.08534.mb@bu3sch.de>
	<1242077392.3716.15.camel@lap75545.ornl.gov>
	<4A09DC3E.2080807@googlemail.com>
	<1242268709.4979.7.camel@obelisk.thedillows.org>
	<4A0C6504.8000704@googlemail.com>
	<1242328457.32579.12.camel@lap75545.ornl.gov>
	<4A0C7443.1010000@googlemail.com>
	<1243042174.3580.23.camel@obelisk.thedillows.org>
	<1250895567.23419.1.camel@obelisk.thedillows.org>
Date: Fri, 21 Aug 2009 19:34:17 -0400
Message-Id: <1250897657.23419.5.camel@obelisk.thedillows.org>
Mime-Version: 1.0
X-Mailer: Evolution 2.24.5 (2.24.5-2.fc10)
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

On Fri, 2009-08-21 at 18:59 -0400, David Dillow wrote:
> On Fri, 2009-08-21 at 13:57 -0700, Eric W. Biederman wrote:
> > David Dillow writes:
> >
> > I have what at first glance looks like a problem caused by this
> > patch. For the last month since upgrading one of my machines from
> > 2.6.28 to 2.6.30 it has been becoming inaccessible from the
> > network and I have a few:
> >
> > NETDEV WATCHDOG: eth0 (r8169): transmit timed out
> >
> > in my logs and a lot of soft lockups that always have rtl8169_interrupt
> > as the thing that is running. I suspect your patch has introduced
> > a near infinite loop in the interrupt handler and is causing these
> > soft lockups.
> >
> > Any ideas?
>
> I would be surprised, but I suppose it is not out of the realm of
> possibility. Can you send me a full dmesg, please?

Re-looking at the code, I'd guess that some IRQ status line is getting
stuck high, but I don't see why -- we should acknowledge all outstanding
interrupts each time through the loop, whether we care about them or
not.

Could you reproduce the problem with the following patch applied, and
send the full dmesg, please?

---
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index b82780d..46cb05a 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3556,6 +3556,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	void __iomem *ioaddr = tp->mmio_addr;
 	int handled = 0;
 	int status;
+	int count = 0;
 
 	/* loop handling interrupts until we have no new ones or
 	 * we hit a invalid/hotplug case.
@@ -3564,6 +3565,15 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	while (status && status != 0xffff) {
 		handled = 1;
 
+		if (count++ > 100) {
+			printk_once("r8169 screaming irq status %08x "
+				    "mask %08x event %08x napi %08x\n",
+				    status, tp->intr_mask, tp->intr_event,
+				    tp->napi_event);
+			break;
+		}
+
+
 		/* Handle all of the error cases first. These will reset
 		 * the chip, so just exit the loop.
 		 */
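
For anyone who wants to see the guard's behaviour without the hardware,
here is a rough user-space sketch of the same idea. It is not driver
code: struct fake_nic, fake_read_status(), fake_ack(), MAX_IRQ_LOOPS and
bounded_irq_loop() are all made-up names for illustration, and the
"stuck" field is only my guess at the failure mode (a status bit that
comes straight back after being acknowledged).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the chip. 'status' models the interrupt
 * status register; 'stuck' models bits that the hardware re-asserts
 * immediately after they are acknowledged (an assumption, not a
 * confirmed diagnosis). */
struct fake_nic {
	uint16_t status;
	uint16_t stuck;
};

static uint16_t fake_read_status(const struct fake_nic *nic)
{
	return nic->status;
}

/* Write-1-to-clear style acknowledge; stuck bits reappear at once. */
static void fake_ack(struct fake_nic *nic, uint16_t bits)
{
	nic->status = (uint16_t)((nic->status & ~bits) | nic->stuck);
}

#define MAX_IRQ_LOOPS 100	/* same threshold as the debug patch */

/* Acknowledge events until the status reads 0 or 0xffff, or until the
 * loop guard trips. Returns the final value of the pass counter and
 * sets *screaming when the guard fired. */
static int bounded_irq_loop(struct fake_nic *nic, bool *screaming)
{
	uint16_t status;
	int count = 0;

	assert(nic != NULL && screaming != NULL);
	*screaming = false;

	status = fake_read_status(nic);
	while (status && status != 0xffff) {
		if (count++ > MAX_IRQ_LOOPS) {
			*screaming = true;	/* the patch logs once here */
			break;
		}

		fake_ack(nic, status);
		status = fake_read_status(nic);
	}

	return count;
}
```

With a well-behaved device the loop exits after a single pass; with a
stuck bit it gives up once the counter passes the threshold instead of
spinning in the handler, which is the condition the printk_once() in the
patch is meant to catch.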