From patchwork Thu Nov 10 13:48:20 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hisashi T Fujinaka X-Patchwork-Id: 693224 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3tF4Bw0pTdz9t2T for ; Fri, 11 Nov 2016 00:48:40 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933464AbcKJNsX (ORCPT ); Thu, 10 Nov 2016 08:48:23 -0500 Received: from mail.i8u.org ([75.148.87.25]:35296 "EHLO chris.i8u.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932784AbcKJNsV (ORCPT ); Thu, 10 Nov 2016 08:48:21 -0500 Received: by chris.i8u.org (Postfix, from userid 1000) id 296C016CA2F6; Thu, 10 Nov 2016 05:48:20 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by chris.i8u.org (Postfix) with ESMTP id 1751F16CA2F5; Thu, 10 Nov 2016 05:48:20 -0800 (PST) Date: Thu, 10 Nov 2016 05:48:20 -0800 (PST) From: Hisashi T Fujinaka X-X-Sender: htodd@chris.i8u.org To: Corinna Vinschen cc: Alexander Duyck , Netdev , "linux-kernel@vger.kernel.org" , Cao jin , intel-wired-lan , =?ISO-2022-JP?Q?Izumi=2C_Taku=2F=1B$B=40t=1B=28J_=1B$BBs=1B=28J?= Subject: Re: [Intel-wired-lan] [PATCH] igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr In-Reply-To: <20161110093549.GA24351@calimero.vinschen.de> Message-ID: References: <1478588780-24480-1-git-send-email-caoj.fnst@cn.fujitsu.com> <20161108164214.GF31855@calimero.vinschen.de> <20161108183739.GA3744@calimero.vinschen.de> <20161110093549.GA24351@calimero.vinschen.de> User-Agent: Alpine 2.20.17 (NEB 179 2016-10-28) MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Thu, 10 Nov 2016, Corinna Vinschen wrote: > On Nov 8 11:33, Alexander Duyck wrote: ... >> The question I would have is what is reading the device when it is in >> this state. The watchdog and any other functions that would read the >> device should be disabled. >> >> One possibility could be a race between a call to igb_close and the >> igb_suspend function. We have seen some of those pop up recently on >> ixgbe and it looks like igb has the same bug. We should probably be >> using the rtnl_lock to guarantee that netif_device_detach and the call >> to __igb_close are completed before igb_close could possibly be called >> by the network stack. > > Do you have a pointer to the related ixgbe patch, by any chance? ... >> The thing is that a suspended device should not be accessed at all. >> If we are accessing it while it is suspended then that is a bug. If >> you could throw a WARN_ON call in igb_rd32 to capture where this is >> being triggered that might be useful. >> >>> - Otherwise assume it's actually a surprise removal. In theory that >>> should somehow trigger a device removal sequence, kind of like >>> calling igb_remove, no? >> >> Well a read of the MMIO region while suspended is more of a surprise >> read since there shouldn't be anything going on. We need to isolate >> where that read is coming from and fix it. > > That would be ideal, but the problem couldn't be reproduced yet apart > from at a customer's customer site. It's not clear yet if we can access > the machine for further testing. Here's the initial patch for igb I have, but it's on hold awaiting more changes in ixgbe regarding AER. diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 4feca69..ce5add6 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3275,7 +3275,10 @@ static int __igb_close(struct net_device *netdev, bool suspending) int igb_close(struct net_device *netdev) { - return __igb_close(netdev, false); + if (netif_device_present(netdev)) + return __igb_close(netdev, false); + + return 0; } /** @@ -7537,6 +7540,7 @@ static int __igb_shutdown(struct pci_dev *pdev, bool *enable_wake, int retval = 0; #endif + rtnl_lock(); netif_device_detach(netdev); if (netif_running(netdev)) @@ -7545,6 +7549,7 @@ static int __igb_shutdown(struct pci_dev *pdev, bool *enable_wake, igb_ptp_suspend(adapter); igb_clear_interrupt_scheme(adapter); + rtnl_unlock(); #ifdef CONFIG_PM retval = pci_save_state(pdev); @@ -7663,16 +7668,17 @@ static int igb_resume(struct device *dev) wr32(E1000_WUS, ~0); - if (netdev->flags & IFF_UP) { - rtnl_lock(); + rtnl_lock(); + + if (!err && netif_running(netdev)) err = __igb_open(netdev, true); - rtnl_unlock(); - if (err) - return err; - } - netif_device_attach(netdev); - return 0; + if (!err) + netif_device_attach(netdev); + + rtnl_unlock(); + + return err; } static int igb_runtime_idle(struct device *dev)