Message ID | 20100617211203O.fujita.tomonori@lab.ntt.co.jp |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
FUJITA Tomonori wrote:
> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Date: Thu, 17 Jun 2010 13:06:15 +0900
> Subject: [PATCH] bnx2: fix dma_get_ops compilation breakage
>
> This removes the dma_get_ops() prefetch optimization in bnx2.
>
> bnx2 uses dma_get_ops() to see if dma_sync_single_for_cpu() is a
> noop, and prefetches if it is.
>
> But dma_get_ops() isn't available on all architectures (only the
> architectures that use the dma_map_ops struct have it). Using
> dma_get_ops() in drivers leads to compilation breakage on many
> architectures.
>
> Currently, we don't have a way to see if dma_sync_single_for_cpu() is
> a noop. If it can improve performance notably, we can add a new
> DMA API for it.

This prefetch improves performance noticeably when the driver is
handling incoming 64-byte packets at a sustained rate.

> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

Acked-by: Michael Chan <mchan@broadcom.com>

Thanks.

> ---
>  drivers/net/bnx2.c | 10 +---------
>  1 files changed, 1 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 949d7a9..b3305fc 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -3073,7 +3073,6 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
>  	u16 hw_cons, sw_cons, sw_ring_cons, sw_prod, sw_ring_prod;
>  	struct l2_fhdr *rx_hdr;
>  	int rx_pkt = 0, pg_ring_used = 0;
> -	struct pci_dev *pdev = bp->pdev;
>
>  	hw_cons = bnx2_get_hw_rx_cons(bnapi);
>  	sw_cons = rxr->rx_cons;
> @@ -3086,7 +3085,7 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
>  	while (sw_cons != hw_cons) {
>  		unsigned int len, hdr_len;
>  		u32 status;
> -		struct sw_bd *rx_buf, *next_rx_buf;
> +		struct sw_bd *rx_buf;
>  		struct sk_buff *skb;
>  		dma_addr_t dma_addr;
>  		u16 vtag = 0;
> @@ -3098,13 +3097,6 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
>  		rx_buf = &rxr->rx_buf_ring[sw_ring_cons];
>  		skb = rx_buf->skb;
>  		prefetchw(skb);
> -
> -		if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
> -			next_rx_buf =
> -				&rxr->rx_buf_ring[RX_RING_IDX(NEXT_RX_BD(sw_cons))];
> -			prefetch(next_rx_buf->desc);
> -		}
>  		rx_buf->skb = NULL;
>
>  		dma_addr = dma_unmap_addr(rx_buf, mapping);
> --
> 1.5.6.5

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2010-06-17 at 05:54 -0700, Michael Chan wrote:
> FUJITA Tomonori wrote:
> > Subject: [PATCH] bnx2: fix dma_get_ops compilation breakage
> >
> > Currently, we don't have a way to see if dma_sync_single_for_cpu() is
> > a noop. If it can improve performance notably, we can add a new
> > DMA API for it.
>
> This prefetch improves performance noticeably when the driver is
> handling incoming 64-byte packets at a sustained rate.

So why not do it unconditionally? The worst that can happen is that you
pull in a stale cache line which will get cleaned in the dma_sync, thus
slightly degrading performance on incoherent architectures.

Alternatively, come up with a dma prefetch infrastructure ... all you're
really doing is hinting to the architecture that you'll sync this region
next.

James
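To picture the kind of hint API James is proposing, here is a purely hypothetical sketch in userspace C: `dma_prefetch()` does not exist in the kernel, and `ARCH_DMA_COHERENT` is a made-up stand-in for whatever per-architecture knob would select the behaviour.

```c
/* Hypothetical hint that @addr will be dma_sync'ed and then read
 * shortly.  On a cache-coherent architecture it can be a real
 * prefetch; on an incoherent one it degrades to a no-op, since the
 * sync would invalidate the line anyway.  Sketch only, not kernel
 * code. */
#ifndef ARCH_DMA_COHERENT
#define ARCH_DMA_COHERENT 1		/* assume a coherent arch here */
#endif

static inline void dma_prefetch(const void *addr)
{
#if ARCH_DMA_COHERENT
	__builtin_prefetch(addr);	/* GCC/Clang builtin, pure hint */
#else
	(void)addr;			/* no-op on incoherent archs */
#endif
}

/* A prefetch is only a hint, so issuing it never changes what a
 * later load observes. */
int demo(void)
{
	int x = 42;

	dma_prefetch(&x);
	return x;			/* still 42 */
}
```

The point of routing the hint through one helper is that drivers stop open-coding checks like `get_dma_ops(...)->sync_single_for_cpu`, which is exactly what broke the build here.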
James Bottomley wrote:
> On Thu, 2010-06-17 at 05:54 -0700, Michael Chan wrote:
> > This prefetch improves performance noticeably when the driver is
> > handling incoming 64-byte packets at a sustained rate.
>
> So why not do it unconditionally? The worst that can happen is that you
> pull in a stale cache line which will get cleaned in the dma_sync, thus
> slightly degrading performance on incoherent architectures.

The original patch was an unconditional prefetch. There was some
discussion that it might not be correct if the DMA wasn't sync'ed yet on
some archs. If the consensus is that it is ok to do so, that would be
the simplest solution.

> Alternatively, come up with a dma prefetch infrastructure ... all you're
> really doing is hinting to the architecture that you'll sync this region
> next.
On Thu, 2010-06-17 at 06:30 -0700, Michael Chan wrote:
> James Bottomley wrote:
> > So why not do it unconditionally? The worst that can happen is that you
> > pull in a stale cache line which will get cleaned in the dma_sync, thus
> > slightly degrading performance on incoherent architectures.
>
> The original patch was an unconditional prefetch. There was some
> discussion that it might not be correct if the DMA wasn't sync'ed yet on
> some archs. If the consensus is that it is ok to do so, that would be
> the simplest solution.

It's definitely not "correct" in that it may pull in stale data. But it
should be harmless in that if it does, the subsequent sync will destroy
the cache line (even if it actually pulled in correct data) and prevent
the actual use of the prefetched data being wrong (or indeed being
prefetched at all).

James
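James's argument can be made concrete with a toy single-cache-line model in userspace C (all names here are illustrative, not kernel code): a prefetch may latch stale bytes, but the sync invalidates the line, so the CPU's actual read refills from the now-correct memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model: "memory" is DRAM where device DMA lands, "line" is one
 * cached copy of it.  Sketch of the argument only. */
static unsigned char memory[64];
static unsigned char line[64];
static bool line_valid;

static void cpu_prefetch(void)
{
	/* hint executed early: may latch stale bytes into the cache */
	memcpy(line, memory, sizeof(line));
	line_valid = true;
}

static void dma_sync_for_cpu(const unsigned char *dev, size_t n)
{
	memcpy(memory, dev, n);	/* device data becomes visible */
	line_valid = false;	/* the sync destroys the cache line */
}

static unsigned char cpu_read(size_t i)
{
	if (!line_valid)
		cpu_prefetch();	/* miss: refill from (fresh) memory */
	return line[i];
}

int demo(void)
{
	unsigned char dev[64];

	memset(memory, 0, sizeof(memory));	/* stale contents */
	memset(dev, 0xab, sizeof(dev));		/* what the device wrote */

	cpu_prefetch();				/* latches stale zeroes */
	dma_sync_for_cpu(dev, sizeof(dev));	/* invalidates the line */

	return cpu_read(0);			/* refill: sees 0xab */
}
```

The early prefetch cached zeroes, yet the read after the sync still observes the device's data; the only cost of the stale prefetch is the wasted refill, which is James's "slightly degrading performance" case.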
diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 949d7a9..b3305fc 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -3073,7 +3073,6 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 	u16 hw_cons, sw_cons, sw_ring_cons, sw_prod, sw_ring_prod;
 	struct l2_fhdr *rx_hdr;
 	int rx_pkt = 0, pg_ring_used = 0;
-	struct pci_dev *pdev = bp->pdev;
 
 	hw_cons = bnx2_get_hw_rx_cons(bnapi);
 	sw_cons = rxr->rx_cons;
@@ -3086,7 +3085,7 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 	while (sw_cons != hw_cons) {
 		unsigned int len, hdr_len;
 		u32 status;
-		struct sw_bd *rx_buf, *next_rx_buf;
+		struct sw_bd *rx_buf;
 		struct sk_buff *skb;
 		dma_addr_t dma_addr;
 		u16 vtag = 0;
@@ -3098,13 +3097,6 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 		rx_buf = &rxr->rx_buf_ring[sw_ring_cons];
 		skb = rx_buf->skb;
 		prefetchw(skb);
-
-		if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
-			next_rx_buf =
-				&rxr->rx_buf_ring[RX_RING_IDX(NEXT_RX_BD(sw_cons))];
-			prefetch(next_rx_buf->desc);
-		}
 		rx_buf->skb = NULL;
 
 		dma_addr = dma_unmap_addr(rx_buf, mapping);
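For comparison, if the unconditional-prefetch route discussed in the thread were taken instead, the last hunk would shrink to something like the sketch below (untested, not a submitted patch; the `next_rx_buf` declaration removed above would also have to come back):

```diff
@@ bnx2_rx_int
 		rx_buf = &rxr->rx_buf_ring[sw_ring_cons];
 		skb = rx_buf->skb;
 		prefetchw(skb);
+
+		/* unconditional: at worst we pull in a stale line that
+		 * the later dma_sync will invalidate anyway */
+		next_rx_buf = &rxr->rx_buf_ring[RX_RING_IDX(NEXT_RX_BD(sw_cons))];
+		prefetch(next_rx_buf->desc);
 		rx_buf->skb = NULL;
```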