Message ID | 200906141606.29754.rjw@sisk.pl |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Hi,

On Sun, Jun 14, 2009 at 04:06:29PM +0200, Rafael J. Wysocki wrote:
> On Sunday 14 June 2009, Andreas Mohr wrote:
> > Hi,
> >
> > On Sun, Jun 14, 2009 at 12:28:15AM +0200, Rafael J. Wysocki wrote:
> > > On Saturday 13 June 2009, Andreas Mohr wrote:
> > > > +
> > > > if (wake) {
> > > > return pci_prepare_to_sleep(pdev);
> > >
> > > pci_prepare_to_sleep() is supposed to return 0 for your device. I'll have a
> > > look at it.
> >
> > No, wake is false for my card, thus that's not the branch to
> > investigate.
>
> Ah. The problem is, then, that we try to put the device into D3, which it
> cannot do and error code is correctly returned from pci_set_power_state().
>
> I would use the appended patch in that case and the patch sent previously
> is necessary for the 'wake = true' case.

OK, as said I cannot test this right now, but I'm _damn_ sure it would
work. Thus I'd say your equivalent patch posted a bit later can be
committed already.

But what about the wake = true case?
I'm not affected by this since my card doesn't have any wake capa,
thus it's your call of whether that pci core code part really was broken
and needed fixing.

So, for the patch in your next mail:
Acked-by: Andreas Mohr <andi@lisas.de>


BTW, that patch was (pasted):

 static int __e100_power_off(struct pci_dev *pdev, bool wake)
 {
-	if (wake) {
+	if (wake)
 		return pci_prepare_to_sleep(pdev);
-	} else {
-		pci_wake_from_d3(pdev, false);
-		return pci_set_power_state(pdev, PCI_D3hot);
-	}
+
+	pci_wake_from_d3(pdev, false);
+	pci_set_power_state(pdev, PCI_D3hot);
+
+	return 0;
 }


Couple questions still:
- why do we call pci_wake_from_d3(...false) only!?
  Wouldn't this break WoL after one iteration back and forth,
  due to missing 'true' case?
- why do we call netif_device_detach() _after_ doing hardware shutdown
  of the network controller? I'd guess this can cause huge issues?
  Someone told me he had rtnl lock issues upon S2D with e100
  (very similar to my rtnl issues during aborted .suspend),
  and that might possibly be the reason?

So few lines of code, so many questions...

Thanks,

Andreas Mohr
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sunday 14 June 2009, Andreas Mohr wrote:
> Hi,
>
> On Sun, Jun 14, 2009 at 04:06:29PM +0200, Rafael J. Wysocki wrote:
> > On Sunday 14 June 2009, Andreas Mohr wrote:
> > > Hi,
> > >
> > > On Sun, Jun 14, 2009 at 12:28:15AM +0200, Rafael J. Wysocki wrote:
> > > > On Saturday 13 June 2009, Andreas Mohr wrote:
> > > > > +
> > > > > if (wake) {
> > > > > return pci_prepare_to_sleep(pdev);
> > > >
> > > > pci_prepare_to_sleep() is supposed to return 0 for your device. I'll have a
> > > > look at it.
> > >
> > > No, wake is false for my card, thus that's not the branch to
> > > investigate.
> >
> > Ah. The problem is, then, that we try to put the device into D3, which it
> > cannot do and error code is correctly returned from pci_set_power_state().
> >
> > I would use the appended patch in that case and the patch sent previously
> > is necessary for the 'wake = true' case.
>
> OK, as said I cannot test this right now, but I'm _damn_ sure it would
> work. Thus I'd say your equivalent patch posted a bit later can be
> committed already.
>
> But what about the wake = true case?
> I'm not affected by this since my card doesn't have any wake capa,
> thus it's your call of whether that pci core code part really was broken
> and needed fixing.

I think it needs fixing.

> So, for the patch in your next mail:
> Acked-by: Andreas Mohr <andi@lisas.de>
>
>
> BTW, that patch was (pasted):
>
>  static int __e100_power_off(struct pci_dev *pdev, bool wake)
>  {
> -	if (wake) {
> +	if (wake)
>  		return pci_prepare_to_sleep(pdev);
> -	} else {
> -		pci_wake_from_d3(pdev, false);
> -		return pci_set_power_state(pdev, PCI_D3hot);
> -	}
> +
> +	pci_wake_from_d3(pdev, false);
> +	pci_set_power_state(pdev, PCI_D3hot);
> +
> +	return 0;
>  }
>
>
> Couple questions still:
> - why do we call pci_wake_from_d3(...false) only!?
>   Wouldn't this break WoL after one iteration back and forth,
>   due to missing 'true' case?

The 'true' case is the 'wake = true' one.
> - why do we call netif_device_detach() _after_ doing hardware shutdown
>   of the network controller? I'd guess this can cause huge issues?
>   Someone told me he had rtnl lock issues upon S2D with e100
>   (very similar to my rtnl issues during aborted .suspend),
>   and that might possibly be the reason?

I think you're right, but I'm not a network driver expert.

Perhaps you can change the ordering and see if that fixes the rtnl issue
(since you're able to reproduce it without my patch, that should be easy to
verify).

Best,
Rafael
Hi,

On Sun, Jun 14, 2009 at 07:09:45PM +0200, Rafael J. Wysocki wrote:
> On Sunday 14 June 2009, Andreas Mohr wrote:
> > Couple questions still:
> > - why do we call pci_wake_from_d3(...false) only!?
> >   Wouldn't this break WoL after one iteration back and forth,
> >   due to missing 'true' case?
>
> The 'true' case is the 'wake = true' one.

OK, so it wasn't an explicit pci_wake_from_d3(...true), but the operations
done there are the equivalent of it probably.

> > - why do we call netif_device_detach() _after_ doing hardware shutdown
> >   of the network controller? I'd guess this can cause huge issues?
> >   Someone told me he had rtnl lock issues upon S2D with e100
> >   (very similar to my rtnl issues during aborted .suspend),
> >   and that might possibly be the reason?
>
> I think you're right, but I'm not a network driver expert.
>
> Perhaps you can change the ordering and see if that fixes the rtnl issue
> (since you're able to reproduce it without my patch, that should be easy to
> verify).

I'll test this - later.

Thanks a lot,

Andreas Mohr
Hi,

On Sun, Jun 14, 2009 at 07:09:45PM +0200, Rafael J. Wysocki wrote:
> On Sunday 14 June 2009, Andreas Mohr wrote:
> > - why do we call netif_device_detach() _after_ doing hardware shutdown
> >   of the network controller? I'd guess this can cause huge issues?
> >   Someone told me he had rtnl lock issues upon S2D with e100
> >   (very similar to my rtnl issues during aborted .suspend),
> >   and that might possibly be the reason?
>
> I think you're right, but I'm not a network driver expert.
>
> Perhaps you can change the ordering and see if that fixes the rtnl issue
> (since you're able to reproduce it without my patch, that should be easy to
> verify).

Well, I just moved netif_device_detach() above netif_running() check,
but this didn't fix my network issues in case of a rejecting .suspend
handler: after resume when unloading e100, that hangs, and I get tons of
rtnl timeouts and locked rtnl mutex.

This is most likely because upon e100 unload, a backtrace showed that I
was hanging in e100_down -> msleep (somewhere at the very beginning of
e100_down), which is most definitely the inlined napi_disable() call there:

static inline void napi_disable(struct napi_struct *n)
{
	set_bit(NAPI_STATE_DISABLE, &n->state);
	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
		msleep(1);
	clear_bit(NAPI_STATE_DISABLE, &n->state);
}

IOW the .suspend seems to keep NAPI layer active, yet due to .suspend
failure there's no .resume called, thus card is in an _inoperable_ state
and NAPI cannot be processed any further, thus napi_disable() on driver
unload locks up.

BTW, in include/linux/napi.h, shouldn't napi_disable() make use of
napi_synchronize() instead of C&P?
(simply move napi_synchronize() above napi_disable() and use it there)
Oh wait, there's the CONFIG_SMP complication: napi_synchronize() is
implemented for SMP only, whereas napi_disable() checks the same thing
_always_.
(or is it a BUG that napi_disable() does the same check for non-SMP, too??)

Thanks,

Andreas Mohr
Index: linux-2.6/drivers/net/e100.c
===================================================================
--- linux-2.6.orig/drivers/net/e100.c
+++ linux-2.6/drivers/net/e100.c
@@ -2763,8 +2763,9 @@ static int __e100_power_off(struct pci_d
 		return pci_prepare_to_sleep(pdev);
 	} else {
 		pci_wake_from_d3(pdev, false);
-		return pci_set_power_state(pdev, PCI_D3hot);
+		pci_set_power_state(pdev, PCI_D3hot);
 	}
+	return 0;
 }
 
 #ifdef CONFIG_PM