Message ID | 1360715064-2689-3-git-send-email-paul.gortmaker@windriver.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On 2/13/2013 2:24 AM, Paul Gortmaker wrote: > While looking at some asm dump for an unrelated change, Eric > noticed in the following stats count increment code: > > 50b8: 81 3c 01 f8 lwz r9,504(r28) > 50bc: 81 5c 01 fc lwz r10,508(r28) > 50c0: 31 4a 00 01 addic r10,r10,1 > 50c4: 7d 29 01 94 addze r9,r9 > 50c8: 91 3c 01 f8 stw r9,504(r28) > 50cc: 91 5c 01 fc stw r10,508(r28) > > that a 64 bit counter was used on ppc-32 without sync > and hence the "ethtool -S" output was racy. > > Here we convert all the values to use atomic64_t so that > the output will always be consistent. > At least it seems that this conversion results in fewer asm instructions, as apparently addze and the double lwz/stw are not generated anymore. Hopefully it's faster too :P Reviewed-by: Claudiu Manoil <claudiu.manoil@freescale.com> Thanks, Claudiu -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2013-02-13 at 17:22 +0200, Claudiu Manoil wrote: > At least it seems that this conversion results in fewer asm > instructions, as apparently addze and the double lwz/stw are > not generated anymore. Hopefully it's faster too :P Strange, could you show us these asm instructions ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2/13/2013 6:14 PM, Eric Dumazet wrote: > On Wed, 2013-02-13 at 17:22 +0200, Claudiu Manoil wrote: > >> At least it seems that this conversion results in fewer asm >> instructions, as apparently addze and the double lwz/stw are >> not generated anymore. Hopefully it's faster too :P > > Strange, could you show us these asm instructions ? > > > Ok, I'm looking over gfar_clean_rx_ring's asm code, with and w/o this patch. They are difficult to compare as asm code changed considerably, the initial version having more lines. The first thing I notice is that the initial ver has 13 'addic' instructions, and the new version has 7. Now taking the code around the last 'addic' instruction (from the gfar_clean_rx_ring function): Initial version looks like this: 5024: 4b ff ff ac b 4fd0 <gfar_clean_rx_ring+0x450> 5028: 81 5d 06 30 lwz r10,1584(r29) 502c: 81 7d 06 34 lwz r11,1588(r29) 5030: 31 6b 00 01 addic r11,r11,1 5034: 7d 4a 01 94 addze r10,r10 5038: 91 5d 06 30 stw r10,1584(r29) 503c: 91 7d 06 34 stw r11,1588(r29) 5040: 4b ff fe fc b 4f3c <gfar_clean_rx_ring+0x3bc> New version looks like this: 4ff8: 4b ff fd a8 b 4da0 <gfar_clean_rx_ring+0x1ec> 4ffc: 80 1c 00 a0 lwz r0,160(r28) 5000: 38 60 00 00 li r3,0 5004: 80 a1 00 18 lwz r5,24(r1) 5008: 38 80 00 01 li r4,1 500c: 30 00 00 01 addic r0,r0,1 5010: 90 1c 00 a0 stw r0,160(r28) 5014: 48 00 00 01 bl 5014 <gfar_clean_rx_ring+0x460> 5018: 4b ff ff 4c b 4f64 <gfar_clean_rx_ring+0x3b0> I have the whole function's asm excepts, if needed. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2013-02-13 at 19:47 +0200, Claudiu Manoil wrote: > On 2/13/2013 6:14 PM, Eric Dumazet wrote: > > On Wed, 2013-02-13 at 17:22 +0200, Claudiu Manoil wrote: > > > >> At least it seems that this conversion results in fewer asm > >> instructions, as apparently addze and the double lwz/stw are > >> not generated anymore. Hopefully it's faster too :P > > > > Strange, could you show us these asm instructions ? > > > > > > > > Ok, I'm looking over gfar_clean_rx_ring's asm code, with and w/o this > patch. They are difficult to compare as asm code changed considerably, > the initial version having more lines. > The first thing I notice is that the initial ver has 13 'addic' > instructions, and the new version has 7. > > Now taking the code around the last 'addic' instruction (from the > gfar_clean_rx_ring function): > > Initial version looks like this: > > 5024: 4b ff ff ac b 4fd0 <gfar_clean_rx_ring+0x450> > 5028: 81 5d 06 30 lwz r10,1584(r29) > 502c: 81 7d 06 34 lwz r11,1588(r29) > 5030: 31 6b 00 01 addic r11,r11,1 > 5034: 7d 4a 01 94 addze r10,r10 > 5038: 91 5d 06 30 stw r10,1584(r29) > 503c: 91 7d 06 34 stw r11,1588(r29) > 5040: 4b ff fe fc b 4f3c <gfar_clean_rx_ring+0x3bc> > > New version looks like this: > > 4ff8: 4b ff fd a8 b 4da0 <gfar_clean_rx_ring+0x1ec> > 4ffc: 80 1c 00 a0 lwz r0,160(r28) > 5000: 38 60 00 00 li r3,0 > 5004: 80 a1 00 18 lwz r5,24(r1) > 5008: 38 80 00 01 li r4,1 > 500c: 30 00 00 01 addic r0,r0,1 > 5010: 90 1c 00 a0 stw r0,160(r28) > 5014: 48 00 00 01 bl 5014 <gfar_clean_rx_ring+0x460> > 5018: 4b ff ff 4c b 4f64 <gfar_clean_rx_ring+0x3b0> > > > I have the whole function's asm excepts, if needed. > I guess you are not looking at the right spot. 32bit powerpc probably use the generic atomic64 Your kernel should have an atomic64_add() function (in lib/atomic64.c) and gianfar should call it. This is certainly expensive, but these counters are not in fast path. If they were in fast path, include/linux/u64_stats_sync.h would be a better choice. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c index ab32bd0..c82f677 100644 --- a/drivers/net/ethernet/freescale/gianfar.c +++ b/drivers/net/ethernet/freescale/gianfar.c @@ -2648,7 +2648,7 @@ static inline void count_errors(unsigned short status, struct net_device *dev) if (status & RXBD_TRUNCATED) { stats->rx_length_errors++; - estats->rx_trunc++; + atomic64_inc(&estats->rx_trunc); return; } @@ -2657,20 +2657,20 @@ static inline void count_errors(unsigned short status, struct net_device *dev) stats->rx_length_errors++; if (status & RXBD_LARGE) - estats->rx_large++; + atomic64_inc(&estats->rx_large); else - estats->rx_short++; + atomic64_inc(&estats->rx_short); } if (status & RXBD_NONOCTET) { stats->rx_frame_errors++; - estats->rx_nonoctet++; + atomic64_inc(&estats->rx_nonoctet); } if (status & RXBD_CRCERR) { - estats->rx_crcerr++; + atomic64_inc(&estats->rx_crcerr); stats->rx_crc_errors++; } if (status & RXBD_OVERRUN) { - estats->rx_overrun++; + atomic64_inc(&estats->rx_overrun); stats->rx_crc_errors++; } } @@ -2744,7 +2744,7 @@ static int gfar_process_frame(struct net_device *dev, struct sk_buff *skb, ret = napi_gro_receive(napi, skb); if (GRO_DROP == ret) - priv->extra_stats.kernel_dropped++; + atomic64_inc(&priv->extra_stats.kernel_dropped); return 0; } @@ -2812,7 +2812,7 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit) } else { netif_warn(priv, rx_err, dev, "Missing skb!\n"); rx_queue->stats.rx_dropped++; - priv->extra_stats.rx_skbmissing++; + atomic64_inc(&priv->extra_stats.rx_skbmissing); } } @@ -3245,7 +3245,7 @@ static irqreturn_t gfar_error(int irq, void *grp_id) netif_dbg(priv, tx_err, dev, "TX FIFO underrun, packet dropped\n"); dev->stats.tx_dropped++; - priv->extra_stats.tx_underrun++; + atomic64_inc(&priv->extra_stats.tx_underrun); local_irq_save(flags); lock_tx_qs(priv); @@ -3260,7 +3260,7 @@ static irqreturn_t gfar_error(int irq, void *grp_id) } if (events & IEVENT_BSY) { dev->stats.rx_errors++; - priv->extra_stats.rx_bsy++; + atomic64_inc(&priv->extra_stats.rx_bsy); gfar_receive(irq, grp_id); @@ -3269,19 +3269,19 @@ static irqreturn_t gfar_error(int irq, void *grp_id) } if (events & IEVENT_BABR) { dev->stats.rx_errors++; - priv->extra_stats.rx_babr++; + atomic64_inc(&priv->extra_stats.rx_babr); netif_dbg(priv, rx_err, dev, "babbling RX error\n"); } if (events & IEVENT_EBERR) { - priv->extra_stats.eberr++; + atomic64_inc(&priv->extra_stats.eberr); netif_dbg(priv, rx_err, dev, "bus error\n"); } if (events & IEVENT_RXC) netif_dbg(priv, rx_status, dev, "control frame\n"); if (events & IEVENT_BABT) { - priv->extra_stats.tx_babt++; + atomic64_inc(&priv->extra_stats.tx_babt); netif_dbg(priv, tx_err, dev, "babbling TX error\n"); } return IRQ_HANDLED; diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h index 61b1785..78125f1 100644 --- a/drivers/net/ethernet/freescale/gianfar.h +++ b/drivers/net/ethernet/freescale/gianfar.h @@ -627,24 +627,25 @@ struct rmon_mib }; struct gfar_extra_stats { - u64 kernel_dropped; - u64 rx_large; - u64 rx_short; - u64 rx_nonoctet; - u64 rx_crcerr; - u64 rx_overrun; - u64 rx_bsy; - u64 rx_babr; - u64 rx_trunc; - u64 eberr; - u64 tx_babt; - u64 tx_underrun; - u64 rx_skbmissing; - u64 tx_timeout; + atomic64_t kernel_dropped; + atomic64_t rx_large; + atomic64_t rx_short; + atomic64_t rx_nonoctet; + atomic64_t rx_crcerr; + atomic64_t rx_overrun; + atomic64_t rx_bsy; + atomic64_t rx_babr; + atomic64_t rx_trunc; + atomic64_t eberr; + atomic64_t tx_babt; + atomic64_t tx_underrun; + atomic64_t rx_skbmissing; + atomic64_t tx_timeout; }; #define GFAR_RMON_LEN ((sizeof(struct rmon_mib) - 16)/sizeof(u32)) -#define GFAR_EXTRA_STATS_LEN (sizeof(struct gfar_extra_stats)/sizeof(u64)) +#define GFAR_EXTRA_STATS_LEN \ + (sizeof(struct gfar_extra_stats)/sizeof(atomic64_t)) /* Number of stats exported via ethtool */ #define GFAR_STATS_LEN (GFAR_RMON_LEN + GFAR_EXTRA_STATS_LEN) diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c index 172acb9..75e89ac 100644 --- a/drivers/net/ethernet/freescale/gianfar_ethtool.c +++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c @@ -149,10 +149,10 @@ static void gfar_fill_stats(struct net_device *dev, struct ethtool_stats *dummy, int i; struct gfar_private *priv = netdev_priv(dev); struct gfar __iomem *regs = priv->gfargrp[0].regs; - u64 *extra = (u64 *) & priv->extra_stats; + atomic64_t *extra = (atomic64_t *)&priv->extra_stats; for (i = 0; i < GFAR_EXTRA_STATS_LEN; i++) - buf[i] = extra[i]; + buf[i] = atomic64_read(&extra[i]); if (priv->device_flags & FSL_GIANFAR_DEV_HAS_RMON) { u32 __iomem *rmon = (u32 __iomem *) ®s->rmon;
While looking at some asm dump for an unrelated change, Eric noticed in the following stats count increment code: 50b8: 81 3c 01 f8 lwz r9,504(r28) 50bc: 81 5c 01 fc lwz r10,508(r28) 50c0: 31 4a 00 01 addic r10,r10,1 50c4: 7d 29 01 94 addze r9,r9 50c8: 91 3c 01 f8 stw r9,504(r28) 50cc: 91 5c 01 fc stw r10,508(r28) that a 64 bit counter was used on ppc-32 without sync and hence the "ethtool -S" output was racy. Here we convert all the values to use atomic64_t so that the output will always be consistent. Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> --- drivers/net/ethernet/freescale/gianfar.c | 26 ++++++++++---------- drivers/net/ethernet/freescale/gianfar.h | 31 ++++++++++++------------ drivers/net/ethernet/freescale/gianfar_ethtool.c | 4 +-- 3 files changed, 31 insertions(+), 30 deletions(-)