Message ID | 20140821220424.GB7117@f1.synalogic.ca |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, 2014-08-21 at 15:04 -0700, Benjamin Poirier wrote: > On 2014/08/19 15:00, Michael Chan wrote: > > On Tue, 2014-08-19 at 11:52 -0700, Benjamin Poirier wrote: > > > diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c > > > index 3ac5d23..b11c0fd 100644 > > > --- a/drivers/net/ethernet/broadcom/tg3.c > > > +++ b/drivers/net/ethernet/broadcom/tg3.c > > > @@ -202,7 +202,8 @@ static inline void _tg3_flag_clear(enum TG3_FLAGS flag, unsigned long *bits) > > > #endif > > > > > > /* minimum number of free TX descriptors required to wake up TX process */ > > > -#define TG3_TX_WAKEUP_THRESH(tnapi) ((tnapi)->tx_pending / 4) > > > +#define TG3_TX_WAKEUP_THRESH(tnapi) max_t(u32, (tnapi)->tx_pending / 4, \ > > > + MAX_SKB_FRAGS + 1) > > > > I think we should precompute this and store it in something like > > tp->tx_wake_thresh. > > I've tried this by adding the following patch at the end of the v2 > series but I did not measure a significant latency improvement. Was > there another reason for the change? Just performance. The wake up threshold is checked in the tx fast path in both start_xmit() and tg3_tx(). I would optimize such code for speed as much as possible. In the current code, it was just a right shift operation. Now, with max_t() added, I think I prefer having it pre-computed. The performance difference may not be measurable, but I think the compiled code size may be smaller too. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2014/08/21 15:32, Michael Chan wrote: > On Thu, 2014-08-21 at 15:04 -0700, Benjamin Poirier wrote: > > On 2014/08/19 15:00, Michael Chan wrote: > > > On Tue, 2014-08-19 at 11:52 -0700, Benjamin Poirier wrote: > > > > diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c > > > > index 3ac5d23..b11c0fd 100644 > > > > --- a/drivers/net/ethernet/broadcom/tg3.c > > > > +++ b/drivers/net/ethernet/broadcom/tg3.c > > > > @@ -202,7 +202,8 @@ static inline void _tg3_flag_clear(enum TG3_FLAGS flag, unsigned long *bits) > > > > #endif > > > > > > > > /* minimum number of free TX descriptors required to wake up TX process */ > > > > -#define TG3_TX_WAKEUP_THRESH(tnapi) ((tnapi)->tx_pending / 4) > > > > +#define TG3_TX_WAKEUP_THRESH(tnapi) max_t(u32, (tnapi)->tx_pending / 4, \ > > > > + MAX_SKB_FRAGS + 1) > > > > > > I think we should precompute this and store it in something like > > > tp->tx_wake_thresh. > > > > I've tried this by adding the following patch at the end of the v2 > > series but I did not measure a significant latency improvement. Was > > there another reason for the change? > > Just performance. The wake up threshold is checked in the tx fast path > in both start_xmit() and tg3_tx(). I would optimize such code for speed I don't see what you mean. The code in those two functions that used to invoke TG3_TX_WAKEUP_THRESH is wrapped in unlikely() conditions. You can't tell me that's the fast path ;) It's only checked when the queue is stopped. Moreover, the patches I've sent already add tg3_napi.wakeup_thresh. It is over those patches that I've made the measurements. > as much as possible. In the current code, it was just a right shift > operation. Now, with max_t() added, I think I prefer having it > pre-computed. The performance difference may not be measurable, but I > think the compiled code size may be smaller too. Maybe in certain areas, but not overall: with v2 patches 1-3 text data bss dec hex filename 149495 1247 0 150742 24cd6 drivers/net/ethernet/broadcom/tg3.o with v2 patches 1-3 + tx_wake_thresh_def text data bss dec hex filename 149524 1247 0 150771 24cf3 drivers/net/ethernet/broadcom/tg3.o I really don't see a gain. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2014-08-21 at 16:06 -0700, Benjamin Poirier wrote: > On 2014/08/21 15:32, Michael Chan wrote: > > On Thu, 2014-08-21 at 15:04 -0700, Benjamin Poirier wrote: > > > On 2014/08/19 15:00, Michael Chan wrote: > > > > On Tue, 2014-08-19 at 11:52 -0700, Benjamin Poirier wrote: > > > > > diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c > > > > > index 3ac5d23..b11c0fd 100644 > > > > > --- a/drivers/net/ethernet/broadcom/tg3.c > > > > > +++ b/drivers/net/ethernet/broadcom/tg3.c > > > > > @@ -202,7 +202,8 @@ static inline void _tg3_flag_clear(enum TG3_FLAGS flag, unsigned long *bits) > > > > > #endif > > > > > > > > > > /* minimum number of free TX descriptors required to wake up TX process */ > > > > > -#define TG3_TX_WAKEUP_THRESH(tnapi) ((tnapi)->tx_pending / 4) > > > > > +#define TG3_TX_WAKEUP_THRESH(tnapi) max_t(u32, (tnapi)->tx_pending / 4, \ > > > > > + MAX_SKB_FRAGS + 1) > > > > > > > > I think we should precompute this and store it in something like > > > > tp->tx_wake_thresh. > > > > > > I've tried this by adding the following patch at the end of the v2 > > > series but I did not measure a significant latency improvement. Was > > > there another reason for the change? > > > > Just performance. The wake up threshold is checked in the tx fast path > > in both start_xmit() and tg3_tx(). I would optimize such code for speed > > I don't see what you mean. The code in those two functions that used to > invoke TG3_TX_WAKEUP_THRESH is wrapped in unlikely() conditions. You > can't tell me that's the fast path ;) It's only checked when the queue > is stopped. I missed the unlikely(). So you're right. It's not really in the fast path. > > Moreover, the patches I've sent already add tg3_napi.wakeup_thresh. It > is over those patches that I've made the measurements. Right. But my original comment was over your original patch #1 which was adding max_t() to the macro TG3_TX_WAKE_THRESH without adding wakeup_thresh field. All my comments (performance and smaller code) were based on your original patch #1. Later I did see that your patch 3 converted TG3_TX_WAKEUP_THRESH to a structure field so it's no longer an issue. > > > as much as possible. In the current code, it was just a right shift > > operation. Now, with max_t() added, I think I prefer having it > > pre-computed. The performance difference may not be measurable, but I > > think the compiled code size may be smaller too. > > Maybe in certain areas, but not overall: > > with v2 patches 1-3 > text data bss dec hex filename > 149495 1247 0 150742 24cd6 drivers/net/ethernet/broadcom/tg3.o > with v2 patches 1-3 + tx_wake_thresh_def > text data bss dec hex filename > 149524 1247 0 150771 24cf3 drivers/net/ethernet/broadcom/tg3.o > > I really don't see a gain. > Agreed. Once you have converted the TG3_TX_WAKEUP_THRESH to a structure field, that's sufficient. No need to have multiple fields. Thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index c29f2e3..81e390b 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -6478,10 +6478,11 @@ static void tg3_dump_state(struct tg3 *tp) tnapi->hw_status->idx[0].tx_consumer); netdev_err(tp->dev, - "%d: NAPI info [%08x:%08x:(%04x:%04x:%04x):%04x:(%04x:%04x:%04x:%04x)]\n", + "%d: NAPI info [%08x:%08x:(%04x:%04x:%04x:%04x):%04x:(%04x:%04x:%04x:%04x)]\n", i, tnapi->last_tag, tnapi->last_irq_tag, tnapi->tx_prod, tnapi->tx_cons, tnapi->tx_pending, + tnapi->tx_wake_thresh_cur, tnapi->rx_rcb_ptr, tnapi->prodring.rx_std_prod_idx, tnapi->prodring.rx_std_cons_idx, @@ -6613,10 +6614,10 @@ static void tg3_tx(struct tg3_napi *tnapi) smp_mb(); if (unlikely(netif_tx_queue_stopped(txq) && - (tg3_tx_avail(tnapi) > tnapi->wakeup_thresh))) { + (tg3_tx_avail(tnapi) > tnapi->tx_wake_thresh_cur))) { __netif_tx_lock(txq, smp_processor_id()); if (netif_tx_queue_stopped(txq) && - (tg3_tx_avail(tnapi) > tnapi->wakeup_thresh)) + (tg3_tx_avail(tnapi) > tnapi->tx_wake_thresh_cur)) netif_tx_wake_queue(txq); __netif_tx_unlock(txq); } @@ -7849,8 +7850,8 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi, if (unlikely(tg3_tx_avail(tnapi) <= desc_cnt_est)) { netif_tx_stop_queue(txq); - tnapi->wakeup_thresh = desc_cnt_est; - BUG_ON(tnapi->wakeup_thresh >= tnapi->tx_pending); + tnapi->tx_wake_thresh_cur = desc_cnt_est; + BUG_ON(tnapi->tx_wake_thresh_cur >= tnapi->tx_pending); /* netif_tx_stop_queue() must be done before checking * checking tx index in tg3_tx_avail() below, because in @@ -7858,7 +7859,7 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi, * netif_tx_queue_stopped(). */ smp_mb(); - if (tg3_tx_avail(tnapi) <= tnapi->wakeup_thresh) + if (tg3_tx_avail(tnapi) <= tnapi->tx_wake_thresh_cur) return NETDEV_TX_BUSY; netif_tx_wake_queue(txq); @@ -7938,14 +7939,14 @@ static netdev_tx_t __tg3_start_xmit(struct sk_buff *skb, if (unlikely(budget <= (skb_shinfo(skb)->nr_frags + 1))) { if (!netif_tx_queue_stopped(txq)) { netif_tx_stop_queue(txq); - tnapi->wakeup_thresh = TG3_TX_WAKEUP_THRESH(tnapi); + tnapi->tx_wake_thresh_cur = tnapi->tx_wake_thresh_def; /* This is a hard error, log it. */ netdev_err(dev, "BUG! Tx Ring full when queue awake!\n"); } smp_mb(); - if (tg3_tx_avail(tnapi) <= tnapi->wakeup_thresh) + if (tg3_tx_avail(tnapi) <= tnapi->tx_wake_thresh_cur) return NETDEV_TX_BUSY; netif_tx_wake_queue(txq); @@ -8127,7 +8128,7 @@ static netdev_tx_t __tg3_start_xmit(struct sk_buff *skb, tnapi->tx_prod = entry; if (unlikely(tg3_tx_avail(tnapi) <= stop_thresh)) { netif_tx_stop_queue(txq); - tnapi->wakeup_thresh = TG3_TX_WAKEUP_THRESH(tnapi); + tnapi->tx_wake_thresh_cur = tnapi->tx_wake_thresh_def; /* netif_tx_stop_queue() must be done before checking * checking tx index in tg3_tx_avail() below, because in @@ -8135,7 +8136,7 @@ static netdev_tx_t __tg3_start_xmit(struct sk_buff *skb, * netif_tx_queue_stopped(). */ smp_mb(); - if (tg3_tx_avail(tnapi) > tnapi->wakeup_thresh) + if (tg3_tx_avail(tnapi) > tnapi->tx_wake_thresh_cur) netif_tx_wake_queue(txq); } @@ -12379,8 +12380,11 @@ static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e tp->rx_jumbo_pending = ering->rx_jumbo_pending; dev->gso_max_segs = TG3_TX_SEG_PER_DESC(ering->tx_pending - 1); - for (i = 0; i < tp->irq_max; i++) + for (i = 0; i < tp->irq_max; i++) { tp->napi[i].tx_pending = ering->tx_pending; + tp->napi[i].tx_wake_thresh_def = + TG3_TX_WAKEUP_THRESH(&tp->napi[i]); + } if (netif_running(dev)) { tg3_halt(tp, RESET_KIND_SHUTDOWN, 1); @@ -17820,6 +17824,7 @@ static int tg3_init_one(struct pci_dev *pdev, tnapi->tp = tp; tnapi->tx_pending = TG3_DEF_TX_RING_PENDING; + tnapi->tx_wake_thresh_def = TG3_TX_WAKEUP_THRESH(tnapi); tnapi->int_mbox = intmbx; if (i <= 4) diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h index 6a7e13d..44a21cb 100644 --- a/drivers/net/ethernet/broadcom/tg3.h +++ b/drivers/net/ethernet/broadcom/tg3.h @@ -3004,11 +3004,12 @@ struct tg3_napi { u32 tx_prod ____cacheline_aligned; u32 tx_cons; u32 tx_pending; - u32 last_tx_cons; u32 prodmbox; - u32 wakeup_thresh; + u32 tx_wake_thresh_cur; + u32 tx_wake_thresh_def; struct tg3_tx_buffer_desc *tx_ring; struct tg3_tx_ring_info *tx_buffers; + u32 last_tx_cons; dma_addr_t status_mapping; dma_addr_t rx_rcb_mapping;