Message ID | 20140327125303.GA22117@breakpoint.cc |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
On 3/27/2014 2:53 PM, Sebastian Andrzej Siewior wrote:
> On 2013-10-14 18:11:15 [+0300], Claudiu Manoil wrote:
>>>> BUG: soft lockup - CPU#0 stuck for 23s! [iperf:2847]
>>>> NIP [c0255b6c] find_next_bit+0xb8/0xc4
>>>> LR [c0367ae8] gfar_poll+0xc8/0x1d8
>>> It seems there is a race condition, and this patch only makes it happen
>>> less often?
>>>
>>> return faster means what exactly?
>>>
>>
>> Hi Eric,
>> Because of the outer while loop, gfar_poll may not return due
>> to continuous tx work. The later implementation of gfar_poll
>> allows only one iteration of the Tx queues before returning
>> control to net_rx_action(), that's what I meant with "returns faster".
>
> We talk here about 23secs of cleanup. RX is limited by NAPI and TX is
> limited because it can't be refilled on your UP system.
> Does your box recover from this condition without this patch? Mine does
> not. But I run -RT and stumbled upon something different.
>
> What I observe is that the TX queue is not empty but does not make any
> progress. That means tx_queue->tx_skbuff[tx_queue->skb_dirtytx] is true
> and gfar_clean_tx_ring() cleans up zero packets because it is not yet
> complete.
>
> My problem is that when gfar_start_xmit() is preempted after the
> tx_queue->tx_skbuff[tx_queue->skb_curtx] is set but before the DMA is started
> then the NAPI-poll never completes because it sees a packet which never
> completes because the DMA engine did not start yet and won't.

False, that code section from start_xmit() cannot be preempted, because
it has spin_lock_irqsave()/restore() around it (unless you modified
your code). Will check though if on SMP, for some reason,
clean_tx_ring() enters with 0 skbs to clean.

[...]

> To fix properly with something that works on -RT and mainline I suggest
> to revert this patch and add the following:

This patch cannot be reverted. (why would you?)
This patch fixes the issue from the description. I'm seeing no issues with
P1010 now (on any kind of traffic), and the openwrt/tp-link guys also
confirmed (on the powerpc list) that this patch addresses the issue on
their end.
If you encounter problems with the latest driver code, please submit a
proper issue description indicating the code base you're using and so
on. Also make sure that the problem you're seeing wasn't already fixed
by one of the latest gianfar fixes from net-next:
http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2014-03-28 10:19:07 [+0200], Claudiu Manoil wrote:
> >My problem is that when gfar_start_xmit() is preempted after the
> >tx_queue->tx_skbuff[tx_queue->skb_curtx] is set but before the DMA is started
> >then the NAPI-poll never completes because it sees a packet which never
> >completes because the DMA engine did not start yet and won't.
>
> False, that code section from start_xmit() cannot be preempted, because
> it has spin_lock_irqsave()/restore() around it (unless you modified
> your code). Will check though if on SMP, for some reason,
> clean_tx_ring() enters with 0 skbs to clean.

I said on -RT. On mainline it can't be preempted, as I said. If for
some reason you can't get your packet out (on a slow link, as in your
case) it will return with 0 cleanups.
This has been broken since c233cf4 ("gianfar: Fix tx napi polling")
since you drop the return value.

> [...]
>
> >To fix properly with something that works on -RT and mainline I suggest
> >to revert this patch and add the following:
>
> This patch cannot be reverted. (why would you?)

Because it does not fix a thing; it simply duct-tapes over the issue that a TX
transfer does not clean up a thing while you assume that it did something.
You have a budget reserved for RX cleanup which you do not use up if possible.
You simply do one loop and leave.

> This patch fixes the issue from the description. I'm seeing no issues with
> P1010 now (on any kind of traffic), and the openwrt/tp-link guys also
> confirmed (on the powerpc list) that this patch addresses the issue on
> their end.

Simply because the stall is gone doesn't make it good, as you had no idea why.

> If you encounter problems with the latest driver code, please submit a
> proper issue description indicating the code base you're using and so
> on. Also make sure that the problem you're seeing wasn't already fixed
> by one of the latest gianfar fixes from net-next:
> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git

I pointed out _why_ you saw the stall, and the fix involved not looping
endlessly on TX cleanup of not yet transmitted packets. The removal of the
outer loop was not required. The issue has been present since c233cf4, which
made it into the kernel in v3.10.

Sebastian
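The stall described above can be reproduced in a small user-space model. This is only a sketch under stated assumptions, not the driver code: `model_txq`, `model_clean_tx()` and `model_poll()` are hypothetical names, the RX side is left out entirely, and a round cap stands in for the soft-lockup watchdog. It contrasts a poll loop that sets `has_tx_work` whenever a packet is pending with one that sets it only when the cleanup actually reclaimed something.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a TX queue: 'pending' packets are queued,
 * but only 'completed' of them have been finished by the hardware. */
struct model_txq {
	int pending;
	int completed;
};

/* Model of a TX cleanup that reports how many packets it reclaimed. */
static int model_clean_tx(struct model_txq *q)
{
	int cleaned = q->completed;

	q->pending -= cleaned;
	q->completed = 0;
	return cleaned;
}

/* Run the modelled poll loop for at most max_rounds rounds and return
 * the number of rounds used. use_ret selects whether has_tx_work follows
 * the cleanup's return value or is set unconditionally. */
int model_poll(struct model_txq *q, bool use_ret, int max_rounds)
{
	int rounds = 0;

	while (rounds < max_rounds) {
		bool has_tx_work = false;

		rounds++;
		if (q->pending) {
			int cleaned = model_clean_tx(q);

			has_tx_work = use_ret ? cleaned > 0 : true;
		}
		/* No RX work in this model, so only TX keeps the loop alive. */
		if (!has_tx_work)
			break;
	}
	return rounds;
}
```

For a queue with one pending packet that the hardware has not completed, `model_poll(&q, false, 1000)` uses all 1000 rounds, while `model_poll(&q, true, 1000)` returns after the first round.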
On 3/28/2014 10:34 AM, Sebastian Andrzej Siewior wrote:
> On 2014-03-28 10:19:07 [+0200], Claudiu Manoil wrote:
>>> My problem is that when gfar_start_xmit() is preempted after the
>>> tx_queue->tx_skbuff[tx_queue->skb_curtx] is set but before the DMA is started
>>> then the NAPI-poll never completes because it sees a packet which never
>>> completes because the DMA engine did not start yet and won't.
>>
>> False, that code section from start_xmit() cannot be preempted, because
>> it has spin_lock_irqsave()/restore() around it (unless you modified
>> your code). Will check though if on SMP, for some reason,
>> clean_tx_ring() enters with 0 skbs to clean.
>
> I said on -RT. On mainline it can't be preempted, as I said. If for
> some reason you can't get your packet out (on a slow link, as in your
> case) it will return with 0 cleanups.
> This has been broken since c233cf4 ("gianfar: Fix tx napi polling")
> since you drop the return value.
>
>> [...]
>>
>>> To fix properly with something that works on -RT and mainline I suggest
>>> to revert this patch and add the following:
>>
>> This patch cannot be reverted. (why would you?)
> Because it does not fix a thing; it simply duct-tapes over the issue that a TX
> transfer does not clean up a thing while you assume that it did something.
> You have a budget reserved for RX cleanup which you do not use up if possible.
> You simply do one loop and leave.

Your proposed fix doesn't fix the root cause either, it's just a
workaround that came late. Do you suggest consuming Rx budget for
Tx processing as a better workaround?
Note that the NAPI processing code has been changed in the meantime:
http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git
to address other issues (see aeb12c5ef7cb08d879af22fc0a56cab9e70689ea,
and 71ff9e3df7e1c5d3293af6b595309124e8c97412).
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 1799ff0..19192c4 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -132,7 +132,6 @@ static int gfar_poll(struct napi_struct *napi, int budget);
 static void gfar_netpoll(struct net_device *dev);
 #endif
 int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit);
-static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue);
 static void gfar_process_frame(struct net_device *dev, struct sk_buff *skb,
 			       int amount_pull, struct napi_struct *napi);
 void gfar_halt(struct net_device *dev);
@@ -2473,7 +2472,7 @@ static void gfar_align_skb(struct sk_buff *skb)
 }
 
 /* Interrupt Handler for Transmit complete */
-static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
+static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 {
 	struct net_device *dev = tx_queue->dev;
 	struct netdev_queue *txq;
@@ -2854,10 +2853,14 @@ static int gfar_poll(struct napi_struct *napi, int budget)
 			tx_queue = priv->tx_queue[i];
 			/* run Tx cleanup to completion */
 			if (tx_queue->tx_skbuff[tx_queue->skb_dirtytx]) {
-				gfar_clean_tx_ring(tx_queue);
-				has_tx_work = 1;
+				int ret;
+
+				ret = gfar_clean_tx_ring(tx_queue);
+				if (ret)
+					has_tx_work++;
 			}
 		}
+		work_done += has_tx_work;
 
 		for_each_set_bit(i, &gfargrp->rx_bit_map, priv->num_rx_queues) {
 			/* skip queue if not active */
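
The accounting introduced by the last hunk can be sketched in isolation. The function below is a hypothetical user-space model, not part of the patch: `model_work_done()` and its parameters are illustrative names, and the per-queue cleanup results are passed in as plain integers. It only shows the arithmetic: each TX queue whose cleanup reported progress contributes one unit to `work_done`, on top of the RX packets counted afterwards.

```c
#include <stddef.h>

/* Hypothetical model of the accounting in the hunk above.
 * tx_cleaned[i] is what the TX cleanup returned for queue i;
 * rx_done is the number of RX packets processed afterwards. */
int model_work_done(const int *tx_cleaned, size_t num_txq, int rx_done)
{
	int has_tx_work = 0;
	size_t i;

	for (i = 0; i < num_txq; i++) {
		/* Mirrors "if (ret) has_tx_work++": counts queues, not packets. */
		if (tx_cleaned[i])
			has_tx_work++;
	}

	/* Mirrors "work_done += has_tx_work" followed by the RX accounting. */
	return has_tx_work + rx_done;
}
```

With cleanup results `{3, 0, 7}` and 10 RX packets, the function returns 12: two TX queues made progress, plus ten RX packets. This is the sense in which TX progress is charged against the budget that is otherwise used for RX.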