Message ID | 1415197979-1702-1-git-send-email-karl.beldan@gmail.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Hi Karl,

On 11/05/2014 11:32 AM, Karl Beldan wrote:
> From: Karl Beldan <karl.beldan@rivierawaves.com>
>
> ATM, txq_reclaim will dequeue and free an skb for each tx desc released
> by the hw that has TX_LAST_DESC set. However, in case of TSO, each
> hw desc embedding the last part of a segment has TX_LAST_DESC set,
> losing the one-to-one 'last skb frag'/'TX_LAST_DESC set' correspondance,
> which causes data corruption.
>
> Fix this by checking TX_ENABLE_INTERRUPT instead of TX_LAST_DESC, and
> warn when trying to dequeue from an empty txq (which can be symptomatic
> of releasing skbs prematurely).
>
> Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO')

Although your change makes sense, this isn't fixing the issue for me,
neither did the previous one.

Ian: Can you double check that you have corruption *without* the patch,
and that the patch fixes the issue?
On Wed, Nov 05, 2014 at 11:46:16AM -0300, Ezequiel Garcia wrote:
> Hi Karl,
>
> On 11/05/2014 11:32 AM, Karl Beldan wrote:
> > From: Karl Beldan <karl.beldan@rivierawaves.com>
> >
> > ATM, txq_reclaim will dequeue and free an skb for each tx desc released
> > by the hw that has TX_LAST_DESC set. However, in case of TSO, each
> > hw desc embedding the last part of a segment has TX_LAST_DESC set,
> > losing the one-to-one 'last skb frag'/'TX_LAST_DESC set' correspondance,
> > which causes data corruption.
> >
> > Fix this by checking TX_ENABLE_INTERRUPT instead of TX_LAST_DESC, and
> > warn when trying to dequeue from an empty txq (which can be symptomatic
> > of releasing skbs prematurely).
> >
> > Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO')
>
> Although your change makes sense, this isn't fixing the issue for me,
> neither did the previous one.
>

This change fixes a serious issue.
On my side I can now trigger misc NFS and md5sums errors very easily,
which I haven't detected so far with it applied.
Are you running little endian ? Do you have the tso alignment fix
a63ba13e (I don't expect it to be required but I don't know what SoC you
are using) ? I suppose you are running with all 3 fixes applied.

Karl
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2014-11-05 at 11:46 -0300, Ezequiel Garcia wrote:
> Hi Karl,
>
> On 11/05/2014 11:32 AM, Karl Beldan wrote:
> > From: Karl Beldan <karl.beldan@rivierawaves.com>
> >
> > ATM, txq_reclaim will dequeue and free an skb for each tx desc released
> > by the hw that has TX_LAST_DESC set. However, in case of TSO, each
> > hw desc embedding the last part of a segment has TX_LAST_DESC set,
> > losing the one-to-one 'last skb frag'/'TX_LAST_DESC set' correspondance,
> > which causes data corruption.
> >
> > Fix this by checking TX_ENABLE_INTERRUPT instead of TX_LAST_DESC, and
> > warn when trying to dequeue from an empty txq (which can be symptomatic
> > of releasing skbs prematurely).
> >
> > Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO')
>
> Although your change makes sense, this isn't fixing the issue for me,
> neither did the previous one.
>
> Ian: Can you double check that you have corruption *without* the patch,
> and that the patch fixes the issue?
>

Have you also applied my patch ?
On Wed, Nov 05, 2014 at 04:05:21PM +0100, Karl Beldan wrote:
> On Wed, Nov 05, 2014 at 11:46:16AM -0300, Ezequiel Garcia wrote:
> > Hi Karl,
> >
> > On 11/05/2014 11:32 AM, Karl Beldan wrote:
> > > From: Karl Beldan <karl.beldan@rivierawaves.com>
> > >
> > > ATM, txq_reclaim will dequeue and free an skb for each tx desc released
> > > by the hw that has TX_LAST_DESC set. However, in case of TSO, each
> > > hw desc embedding the last part of a segment has TX_LAST_DESC set,
> > > losing the one-to-one 'last skb frag'/'TX_LAST_DESC set' correspondance,
> > > which causes data corruption.
> > >
> > > Fix this by checking TX_ENABLE_INTERRUPT instead of TX_LAST_DESC, and
> > > warn when trying to dequeue from an empty txq (which can be symptomatic
> > > of releasing skbs prematurely).
> > >
> > > Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO')
> >
> > Although your change makes sense, this isn't fixing the issue for me,
> > neither did the previous one.
> >
> This change fixes a serious issue.
> On my side I can now trigger misc NFS and md5sums errors very easily,
> which I haven't detected so far with it applied.
> Are you running little endian ? Do you have the tso alignment fix
> a63ba13e (I don't expect it to be required but I don't know what SoC you
> are using) ? I suppose you are running with all 3 fixes applied.
>

Also, I haven't checked SMP issues and I only have one core, if you are
using SMP it might be worth looking into that, maybe try running on one
core only (I only have an MV78200).

Karl
On Wed, 2014-11-05 at 07:41 -0800, Eric Dumazet wrote:
> On Wed, 2014-11-05 at 11:46 -0300, Ezequiel Garcia wrote:
> > Hi Karl,
> >
> > On 11/05/2014 11:32 AM, Karl Beldan wrote:
> > > From: Karl Beldan <karl.beldan@rivierawaves.com>
> > >
> > > ATM, txq_reclaim will dequeue and free an skb for each tx desc released
> > > by the hw that has TX_LAST_DESC set. However, in case of TSO, each
> > > hw desc embedding the last part of a segment has TX_LAST_DESC set,
> > > losing the one-to-one 'last skb frag'/'TX_LAST_DESC set' correspondance,
> > > which causes data corruption.
> > >
> > > Fix this by checking TX_ENABLE_INTERRUPT instead of TX_LAST_DESC, and
> > > warn when trying to dequeue from an empty txq (which can be symptomatic
> > > of releasing skbs prematurely).
> > >
> > > Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO')
> >
> > Although your change makes sense, this isn't fixing the issue for me,
> > neither did the previous one.
> >
> > Ian: Can you double check that you have corruption *without* the patch,
> > and that the patch fixes the issue?

Yes, doing md5sum on an NFS mount with 18 files in it I see 8-9
corrupted ones without any patch applied and none with Karl's previous
one from <20141104142020.GA6728@magnum.frso.rivierawaves.com> in place.
This was consistent over repeated invocations of md5sum (mounting and
unmounting around each one). I've just confirmed this again to be sure.

The system is a QNAP TS-41x (armel, Feroceon 88FR131)

> Have you also applied my patch ?

I've only applied that one patch from Karl onto the 3.16.7 kernel which
is currently in Debian. Debian hasn't got any other mv643xx patches
applied.

I'm building now with Karl's latest patch from
<1415197979-1702-1-git-send-email-karl.beldan@gmail.com> instead.

Ian.
From: Karl Beldan <karl.beldan@gmail.com>
Date: Wed, 5 Nov 2014 15:32:59 +0100

> From: Karl Beldan <karl.beldan@rivierawaves.com>
>
> ATM, txq_reclaim will dequeue and free an skb for each tx desc released
> by the hw that has TX_LAST_DESC set. However, in case of TSO, each
> hw desc embedding the last part of a segment has TX_LAST_DESC set,
> losing the one-to-one 'last skb frag'/'TX_LAST_DESC set' correspondance,
> which causes data corruption.
>
> Fix this by checking TX_ENABLE_INTERRUPT instead of TX_LAST_DESC, and
> warn when trying to dequeue from an empty txq (which can be symptomatic
> of releasing skbs prematurely).
>
> Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO')
> Reported-by: Slawomir Gajzner <slawomir.gajzner@gmail.com>
> Reported-by: Julien D'Ascenzio <jdascenzio@yahoo.fr>
> Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>

Applied and queued up for -stable, but it seems there might still be
some bugs to resolve...
diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index b151a94..d44560d 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -1047,7 +1047,6 @@ static int txq_reclaim(struct tx_queue *txq, int budget, int force)
 		int tx_index;
 		struct tx_desc *desc;
 		u32 cmd_sts;
-		struct sk_buff *skb;
 
 		tx_index = txq->tx_used_desc;
 		desc = &txq->tx_desc_area[tx_index];
@@ -1066,19 +1065,22 @@ static int txq_reclaim(struct tx_queue *txq, int budget, int force)
 		reclaimed++;
 		txq->tx_desc_count--;
 
-		skb = NULL;
-		if (cmd_sts & TX_LAST_DESC)
-			skb = __skb_dequeue(&txq->tx_skb);
+		if (!IS_TSO_HEADER(txq, desc->buf_ptr))
+			dma_unmap_single(mp->dev->dev.parent, desc->buf_ptr,
+					 desc->byte_cnt, DMA_TO_DEVICE);
+
+		if (cmd_sts & TX_ENABLE_INTERRUPT) {
+			struct sk_buff *skb = __skb_dequeue(&txq->tx_skb);
+
+			if (!WARN_ON(!skb))
+				dev_kfree_skb(skb);
+		}
 
 		if (cmd_sts & ERROR_SUMMARY) {
 			netdev_info(mp->dev, "tx error\n");
 			mp->dev->stats.tx_errors++;
 		}
 
-		if (!IS_TSO_HEADER(txq, desc->buf_ptr))
-			dma_unmap_single(mp->dev->dev.parent, desc->buf_ptr,
-					 desc->byte_cnt, DMA_TO_DEVICE);
-		dev_kfree_skb(skb);
 	}
 
 	__netif_tx_unlock_bh(nq);
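
The accounting mismatch described in the commit message can be illustrated with a small standalone sketch. It is not driver code. The flag values, the helper names fill_tso_skb() and count_dequeues(), and the layout of one header descriptor plus one data descriptor per segment are all assumptions made for illustration. It also assumes, as the commit message implies, that only the final descriptor of an skb carries TX_ENABLE_INTERRUPT.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values for this sketch, not copied from the driver. */
#define TX_LAST_DESC        (1u << 0)
#define TX_ENABLE_INTERRUPT (1u << 1)

/*
 * Hypothetical helper: fill cmd_sts[] for ONE TSO skb split into nsegs
 * segments. Simplification: each segment uses one header descriptor and
 * one data descriptor. Returns the number of descriptors written.
 */
static size_t fill_tso_skb(uint32_t *cmd_sts, size_t nsegs)
{
	size_t n = 0;

	for (size_t seg = 0; seg < nsegs; seg++) {
		cmd_sts[n++] = 0;                  /* segment header descriptor */
		cmd_sts[n] = TX_LAST_DESC;         /* last desc of this segment */
		if (seg == nsegs - 1)
			cmd_sts[n] |= TX_ENABLE_INTERRUPT; /* last desc of the skb */
		n++;
	}
	return n;
}

/*
 * Hypothetical helper: count how many skbs a reclaim loop would dequeue
 * if it dequeued one skb per descriptor that has `flag` set.
 */
static size_t count_dequeues(const uint32_t *cmd_sts, size_t ndesc,
			     uint32_t flag)
{
	size_t dequeued = 0;

	for (size_t i = 0; i < ndesc; i++)
		if (cmd_sts[i] & flag)
			dequeued++;
	return dequeued;
}
```

Under these assumptions, a single skb split into three segments occupies six descriptors. Keying the dequeue on TX_LAST_DESC gives three dequeues for the one queued skb, so skbs queued behind it would be freed while still in flight. Keying on TX_ENABLE_INTERRUPT gives exactly one.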