Message ID | 1374269428-6827-1-git-send-email-richard@nod.at |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
On Fri, 2013-07-19 at 23:30 +0200, Richard Weinberger wrote:
> rhine_reset_task() misses to call netif_stop_queue(),
> this can lead to a crash if work is still scheduled while
> we're resetting the tx queue.
>
> Fixes:
> [ 93.591707] BUG: unable to handle kernel NULL pointer dereference at 0000004c
> [ 93.595514] IP: [<c119d10d>] rhine_napipoll+0x491/0x6e
>
> Signed-off-by: Richard Weinberger <richard@nod.at>
> ---
> drivers/net/ethernet/via/via-rhine.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
> index b75eb9e..57e1b40 100644
> --- a/drivers/net/ethernet/via/via-rhine.c
> +++ b/drivers/net/ethernet/via/via-rhine.c
> @@ -1615,6 +1615,7 @@ static void rhine_reset_task(struct work_struct *work)
> goto out_unlock;
>
> napi_disable(&rp->napi);
> + netif_stop_queue(dev);

This is not really fixing the bug because there is no synchronisation
with the TX scheduler. You can call netif_tx_disable() instead to do
that.

(I also think that it is preferable to use
netif_device_{detach,attach}() to stop the queue during reconfiguration,
as this is independent of TX completions and the watchdog.)

Ben.

> spin_lock_bh(&rp->lock);
>
> /* clear all descriptors */
On 21.07.2013 18:18, Ben Hutchings wrote:
> On Fri, 2013-07-19 at 23:30 +0200, Richard Weinberger wrote:
>> rhine_reset_task() misses to call netif_stop_queue(),
>> this can lead to a crash if work is still scheduled while
>> we're resetting the tx queue.
>>
>> Fixes:
>> [ 93.591707] BUG: unable to handle kernel NULL pointer dereference at 0000004c
>> [ 93.595514] IP: [<c119d10d>] rhine_napipoll+0x491/0x6e
>>
>> Signed-off-by: Richard Weinberger <richard@nod.at>
>> ---
>> drivers/net/ethernet/via/via-rhine.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
>> index b75eb9e..57e1b40 100644
>> --- a/drivers/net/ethernet/via/via-rhine.c
>> +++ b/drivers/net/ethernet/via/via-rhine.c
>> @@ -1615,6 +1615,7 @@ static void rhine_reset_task(struct work_struct *work)
>> goto out_unlock;
>>
>> napi_disable(&rp->napi);
>> + netif_stop_queue(dev);
>
> This is not really fixing the bug because there is no synchronisation
> with the TX scheduler. You can call netif_tx_disable() instead to do
> that.

I guess other drivers suffer from that too.
A quick grep showed that not many drivers are using netif_tx_disable().

> (I also think that it is preferable to use
> netif_device_{detach,attach}() to stop the queue during reconfiguration,
> as this is independent of TX completions and the watchdog.)

So the correct down sequence is napi_disable() -> netif_tx_disable() -> netif_device_detach()?

Thanks,
//richard
On Sun, 2013-07-21 at 18:32 +0200, Richard Weinberger wrote:
> On 21.07.2013 18:18, Ben Hutchings wrote:
> > On Fri, 2013-07-19 at 23:30 +0200, Richard Weinberger wrote:
> >> rhine_reset_task() misses to call netif_stop_queue(),
> >> this can lead to a crash if work is still scheduled while
> >> we're resetting the tx queue.
> >>
> >> Fixes:
> >> [ 93.591707] BUG: unable to handle kernel NULL pointer dereference at 0000004c
> >> [ 93.595514] IP: [<c119d10d>] rhine_napipoll+0x491/0x6e
> >>
> >> Signed-off-by: Richard Weinberger <richard@nod.at>
> >> ---
> >> drivers/net/ethernet/via/via-rhine.c | 1 +
> >> 1 file changed, 1 insertion(+)
> >>
> >> diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
> >> index b75eb9e..57e1b40 100644
> >> --- a/drivers/net/ethernet/via/via-rhine.c
> >> +++ b/drivers/net/ethernet/via/via-rhine.c
> >> @@ -1615,6 +1615,7 @@ static void rhine_reset_task(struct work_struct *work)
> >> goto out_unlock;
> >>
> >> napi_disable(&rp->napi);
> >> + netif_stop_queue(dev);
> >
> > This is not really fixing the bug because there is no synchronisation
> > with the TX scheduler. You can call netif_tx_disable() instead to do
> > that.
>
> I guess other drivers suffer from that too.
> A quick grep showed that not many drivers are using netif_tx_disable().
>
> > (I also think that it is preferable to use
> > netif_device_{detach,attach}() to stop the queue during reconfiguration,
> > as this is independent of TX completions and the watchdog.)

Actually, this is not independent of TX completions - netif_wake_queue()
will still start the TX scheduler while the device is not present, so
you have to avoid that.

> So the correct down sequence is napi_disable() -> netif_tx_disable() -> netif_device_detach()?

No, that's redundant.
You can do:

	napi_disable();
	netif_tx_lock_bh(); /* sync with TX scheduler */
	netif_device_detach();
	netif_tx_unlock_bh();

and then when the queue is ready to use again:

	netif_device_attach();
	napi_enable();

Ben.
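
[Editor's sketch] The ordering Ben suggests can be exercised outside the kernel with a small userspace model. This is only an illustration, not driver code: the functions below are stubs that borrow the names and argument types of the kernel helpers and merely record the order in which they are called. `struct rhine_sketch`, `sketch_quiesce()`, `sketch_resume()`, `record()` and the `enabled`/`present`/`tx_locked` fields are hypothetical names made up for this sketch and do not appear in via-rhine.c.

```c
#include <assert.h>
#include <string.h>

#define MAX_CALLS 16

/* Log of stub calls, in the order they were made. */
static const char *call_log[MAX_CALLS];
static int call_count;

static void record(const char *name)
{
	assert(call_count < MAX_CALLS);
	call_log[call_count++] = name;
}

/* Hypothetical stand-ins for the kernel structures (fields are invented). */
struct napi_struct { int enabled; };
struct net_device { int present; int tx_locked; };

/* Stubs named after the kernel helpers; they only flip a flag and log. */
static void napi_disable(struct napi_struct *n) { n->enabled = 0; record("napi_disable"); }
static void napi_enable(struct napi_struct *n) { n->enabled = 1; record("napi_enable"); }
static void netif_tx_lock_bh(struct net_device *d) { d->tx_locked = 1; record("netif_tx_lock_bh"); }
static void netif_tx_unlock_bh(struct net_device *d) { d->tx_locked = 0; record("netif_tx_unlock_bh"); }
static void netif_device_detach(struct net_device *d) { d->present = 0; record("netif_device_detach"); }
static void netif_device_attach(struct net_device *d) { d->present = 1; record("netif_device_attach"); }

/* Hypothetical container, loosely modelled on a driver's private data. */
struct rhine_sketch {
	struct napi_struct napi;
	struct net_device dev;
};

/* Down sequence from the thread: stop NAPI, then detach under the TX lock. */
void sketch_quiesce(struct rhine_sketch *rp)
{
	napi_disable(&rp->napi);
	netif_tx_lock_bh(&rp->dev);	/* sync with TX scheduler */
	netif_device_detach(&rp->dev);
	netif_tx_unlock_bh(&rp->dev);
}

/* Up sequence from the thread: re-attach the device, then re-enable NAPI. */
void sketch_resume(struct rhine_sketch *rp)
{
	netif_device_attach(&rp->dev);
	napi_enable(&rp->napi);
}
```

In a real driver the reset work (clearing descriptors, reinitialising the rings) would sit between the two halves; the model says nothing about whether that is sufficient for via-rhine, it only makes the proposed call order explicit.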
diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
index b75eb9e..57e1b40 100644
--- a/drivers/net/ethernet/via/via-rhine.c
+++ b/drivers/net/ethernet/via/via-rhine.c
@@ -1615,6 +1615,7 @@ static void rhine_reset_task(struct work_struct *work)
 		goto out_unlock;
 
 	napi_disable(&rp->napi);
+	netif_stop_queue(dev);
 	spin_lock_bh(&rp->lock);
 
 	/* clear all descriptors */
rhine_reset_task() misses to call netif_stop_queue(),
this can lead to a crash if work is still scheduled while
we're resetting the tx queue.

Fixes:
[ 93.591707] BUG: unable to handle kernel NULL pointer dereference at 0000004c
[ 93.595514] IP: [<c119d10d>] rhine_napipoll+0x491/0x6e

Signed-off-by: Richard Weinberger <richard@nod.at>
---
 drivers/net/ethernet/via/via-rhine.c | 1 +
 1 file changed, 1 insertion(+)