Message ID | 1490784106-14489-1-git-send-email-vzakhar@synopsys.com |
---|---|
State | New |
Headers | show |
From: Vlad Zakharov <Vladislav.Zakharov@synopsys.com> Date: Wed, 29 Mar 2017 13:41:46 +0300 > After a new NAPI_STATE_MISSED state was added to NAPI we can get into > this state and in such case we have to reschedule NAPI as some work is > still pending and we have to process it. napi_complete_done() function > returns false if we have to reschedule something (e.g. in case we were > in MISSED state) as current polling have not been completed yet. > > nps_enet driver hasn't been verifying the return value of > napi_complete_done() and has been forcibly enabling interrupts. That is > not correct as we should not enable interrupts before we have processed > all scheduled work. As a result we were getting trapped in interrupt > hanlder chain as we had never been able to disabale ethernet > interrupts again. > > So this patch makes nps_enet_poll() func verify return value of > napi_complete_done() and enable interrupts only in case all scheduled > work has been completed. > > Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com> Applied. Eric, if this is really required now, we have 148 broken drivers still.
On Wed, Mar 29, 2017 at 2:30 PM, David Miller <davem@davemloft.net> wrote: Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com> > > Applied. > > Eric, if this is really required now, we have 148 broken drivers still. Piece of cake :/ If we get more reports like that, we might implement a logic to prevent infinite loops. It is not clear to me what exactly happened to this driver, since testing napi_complete_done() was not mandatory.
Hi Eric, On Wed, 2017-03-29 at 14:41 -0700, Eric Dumazet wrote: > On Wed, Mar 29, 2017 at 2:30 PM, David Miller <davem@davemloft.net> wrote: > Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com> > > > > > > Applied. > > > > Eric, if this is really required now, we have 148 broken drivers still. > > Piece of cake :/ > > If we get more reports like that, we might implement a logic to > prevent infinite loops. > > It is not clear to me what exactly happened to this driver, since > testing napi_complete_done() was not mandatory. I am not sure what is happening with other drivers, but in case of ezchip nps_enet driver after the following commit: 39e6c8208d7b6fb9d2047850fb3327db567b564b if we got into NAPI_STATE_MISSED state the following happened: in nps_enet_poll func we were calling napi_complete_done() (which reset MISSED state but left SCHED state) and after that without any checks were enabling interrupts. Then we obviously were getting into nps_enet_irq_hanlder() if irq was pending (it is very possbile state). If we look inside this function we will see that it disables interrupts only in case napi_sched_prep() returns true. In its turn napi_sched_prep() returns true only in case it changes state from non-SCHED to SCHED. But in our case as SCHED had been already set it set MISSED state and then returned false. So at that point we had already been trapped: after exiting irq hanlder we were getting into nps_enet_irq_hanlder() again and again as we couldn't rescind pending irq signal and disable corresponding irq. -- Best regards, Vlad Zakharov <vzakhar@synopsys.com>
On Thu, Mar 30, 2017 at 2:16 AM, Vlad Zakharov <Vladislav.Zakharov@synopsys.com> wrote: > I am not sure what is happening with other drivers, but in case of ezchip nps_enet driver after the following commit: > 39e6c8208d7b6fb9d2047850fb3327db567b564b > > if we got into NAPI_STATE_MISSED state the following happened: > in nps_enet_poll func we were calling napi_complete_done() (which reset MISSED state but left SCHED state) and after > that without any checks were enabling interrupts. > > Then we obviously were getting into nps_enet_irq_hanlder() if irq was pending (it is very possbile state). If we look > inside this function we will see that it disables interrupts only in case napi_sched_prep() returns true. In its turn > napi_sched_prep() returns true only in case it changes state from non-SCHED to SCHED. But in our case as SCHED had been > already set it set MISSED state and then returned false. So at that point we had already been trapped: after exiting irq > hanlder we were getting into nps_enet_irq_hanlder() again and again as we couldn't rescind pending irq signal and > disable corresponding irq. Hi Vlad Considering the use of napi_schedule_prep() in nps_enet_irq_handler(), it is strange that nps_enet_poll() uses : if (nps_enet_is_tx_pending(priv)) { nps_enet_reg_set(priv, NPS_ENET_REG_BUF_INT_ENABLE, 0); napi_reschedule(napi); // note the return value is ignored... } So nps_enet_poll() was enabling interrupts, then disabling them, which seems very unusual. I need to check all drivers using napi_schedule_prep() and/or napi_reschedule() and make sure they also test napi_comple_done() return value... I count 12 drivers using napi_reschedule() without checking its return value. They either should check its return value, or use conventional napi_schedule() Thanks !
Hi David, all, On Wed, 2017-03-29 at 14:30 -0700, David Miller wrote: > From: Vlad Zakharov <Vladislav.Zakharov@synopsys.com> > Date: Wed, 29 Mar 2017 13:41:46 +0300 > > > > > After a new NAPI_STATE_MISSED state was added to NAPI we can get into > > this state and in such case we have to reschedule NAPI as some work is > > still pending and we have to process it. napi_complete_done() function > > returns false if we have to reschedule something (e.g. in case we were > > in MISSED state) as current polling have not been completed yet. > > > > nps_enet driver hasn't been verifying the return value of > > napi_complete_done() and has been forcibly enabling interrupts. That is > > not correct as we should not enable interrupts before we have processed > > all scheduled work. As a result we were getting trapped in interrupt > > hanlder chain as we had never been able to disabale ethernet > > interrupts again. > > > > So this patch makes nps_enet_poll() func verify return value of > > napi_complete_done() and enable interrupts only in case all scheduled > > work has been completed. > > > > Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com> > > Applied. > > Eric, if this is really required now, we have 148 broken drivers still. Could you please backport this patch to stable tree (starting from 4.10) as it is pretty important and fixes nps_enet driver? It became actual after Eric's commit 39e6c8208d7b (net: solve a NAPI race) which has already been backported to 4.10. Thanks. -- Best regards, Vlad Zakharov <vzakhar@synopsys.com>
diff --git a/drivers/net/ethernet/ezchip/nps_enet.c b/drivers/net/ethernet/ezchip/nps_enet.c index 992ebe9..f819843 100644 --- a/drivers/net/ethernet/ezchip/nps_enet.c +++ b/drivers/net/ethernet/ezchip/nps_enet.c @@ -189,11 +189,9 @@ static int nps_enet_poll(struct napi_struct *napi, int budget) nps_enet_tx_handler(ndev); work_done = nps_enet_rx_handler(ndev); - if (work_done < budget) { + if ((work_done < budget) && napi_complete_done(napi, work_done)) { u32 buf_int_enable_value = 0; - napi_complete_done(napi, work_done); - /* set tx_done and rx_rdy bits */ buf_int_enable_value |= NPS_ENET_ENABLE << RX_RDY_SHIFT; buf_int_enable_value |= NPS_ENET_ENABLE << TX_DONE_SHIFT;
After a new NAPI_STATE_MISSED state was added to NAPI we can get into this state and in such case we have to reschedule NAPI as some work is still pending and we have to process it. napi_complete_done() function returns false if we have to reschedule something (e.g. in case we were in MISSED state) as current polling have not been completed yet. nps_enet driver hasn't been verifying the return value of napi_complete_done() and has been forcibly enabling interrupts. That is not correct as we should not enable interrupts before we have processed all scheduled work. As a result we were getting trapped in interrupt hanlder chain as we had never been able to disabale ethernet interrupts again. So this patch makes nps_enet_poll() func verify return value of napi_complete_done() and enable interrupts only in case all scheduled work has been completed. Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com> --- drivers/net/ethernet/ezchip/nps_enet.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)