Message ID: 4F46404D.10509@mellanox.co.il
State: Rejected, archived
Delegated to: David Miller
From: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date: Thu, 23 Feb 2012 15:34:05 +0200

> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>

This is ridiculous as a default, yes, even for 10Gb.

Do you have any idea how high the latency is going to be for packets
trying to get into the transmit queue if there are already a thousand
other frames in there?
On Thursday, 23 February 2012 at 14:45 -0500, David Miller wrote:
> From: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> Date: Thu, 23 Feb 2012 15:34:05 +0200
>
> > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
>
> This is ridiculous as a default, yes, even for 10Gb.
>
> Do you have any idea how high the latency is going to be for packets
> trying to get into the transmit queue if there are already a thousand
> other frames in there?

Before increasing TX ring sizes, a driver should implement BQL as a
prerequisite.
> > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
>
> This is ridiculous as a default, yes, even for 10Gb.
>
> Do you have any idea how high the latency is going to be for packets
> trying to get into the transmit queue if there are already a thousand
> other frames in there?

On the other hand, with a smaller queue, having 1000 in-flight packets
would mean the queue gets stopped; how is that better?
A bigger TX ring helps deal with bursts of TX packets without the
overhead of stopping and restarting the queue. It also makes sense to
have TX and RX queues of the same size, for example when traffic is
being forwarded from RX to TX.

I did find a number of 10Gb vendors that use 1024 or more as the
default TX queue size.

Thanks,
Yevgeny
From: Yevgeny Petrilin <yevgenyp@mellanox.com>
Date: Fri, 24 Feb 2012 19:35:45 +0000

> On the other hand, with a smaller queue, having 1000 in-flight packets
> would mean the queue gets stopped; how is that better?

It's a thousand times better.

Because if a high-priority packet gets queued up, it won't have to wait
for 1024 packets to hit the wire before it can go out.

You need to support byte queue limits before you jack things up this
high, otherwise high-priority packets are absolutely pointless and
unusable.
On Friday, 24 February 2012 at 19:35 +0000, Yevgeny Petrilin wrote:
> > > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> >
> > This is ridiculous as a default, yes, even for 10Gb.
> >
> > Do you have any idea how high the latency is going to be for packets
> > trying to get into the transmit queue if there are already a thousand
> > other frames in there?
>
> On the other hand, with a smaller queue, having 1000 in-flight packets
> would mean the queue gets stopped; how is that better?

It's better because you can have any kind of Qdisc setup to properly
classify packets, with 100,000 total packets in queues if you wish.

The TX ring is a single FIFO, and that is just horrible, especially
with big packets...

> A bigger TX ring helps deal with bursts of TX packets without the
> overhead of stopping and restarting the queue. It also makes sense to
> have TX and RX queues of the same size, for example when traffic is
> being forwarded from RX to TX.

Really, I doubt people using forwarding setups use default qdiscs.
Instead of bigger TX rings, they need appropriate Qdiscs.

> I did find a number of 10Gb vendors that use 1024 or more as the
> default TX queue size.

That's a shame.
On Fri, 24 Feb 2012, Eric Dumazet wrote:

> On Friday, 24 February 2012 at 19:35 +0000, Yevgeny Petrilin wrote:
> > > > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> > >
> > > This is ridiculous as a default, yes, even for 10Gb.
> > >
> > > Do you have any idea how high the latency is going to be for packets
> > > trying to get into the transmit queue if there are already a thousand
> > > other frames in there?

For a GigE NIC with a typical ring size of 256, the serialization delay
for 256 1500-byte packets is:

	1500*8*256/10^9 = ~3.1 msec

For a 10-GigE NIC with a ring size of 1024, the serialization delay
for 1024 1500-byte packets is:

	1500*8*1024/10^10 = ~1.2 msec

So it's not immediately clear that a ring size of 1024 is unreasonable
for 10-GigE. It probably boils down to whether the default setting
should be biased more toward low-latency applications or high-throughput
bulk data applications. Determining the best happy medium is best decided
by appropriate benchmark testing. Of course, anyone can change the
settings to suit their purpose, so it's really just a question of what's
best for the "usual" case.

> > On the other hand, with a smaller queue, having 1000 in-flight packets
> > would mean the queue gets stopped; how is that better?
>
> It's better because you can have any kind of Qdisc setup to properly
> classify packets, with 100,000 total packets in queues if you wish.

Not everyone wants to deal with the convoluted, arcane, and poorly
documented qdisc machinery, especially with its current limitations at
10-GigE (or faster) line rates.

> The TX ring is a single FIFO, and that is just horrible, especially
> with big packets...
>
> > A bigger TX ring helps deal with bursts of TX packets without the
> > overhead of stopping and restarting the queue. It also makes sense to
> > have TX and RX queues of the same size, for example when traffic is
> > being forwarded from RX to TX.
>
> Really, I doubt people using forwarding setups use default qdiscs.

I don't think it's necessarily that uncommon, such as a simple 10-GigE
firewall setup.

> Instead of bigger TX rings, they need appropriate Qdiscs.
>
> > I did find a number of 10Gb vendors that use 1024 or more as the
> > default TX queue size.
>
> That's a shame.

						-Bill
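The same arithmetic generalizes to other frame sizes and ring depths. A
small, purely illustrative C helper (not taken from any driver; names and
structure are made up) that reproduces the worst-case drain times above:

#include <stdio.h>

/*
 * Worst-case time (in milliseconds) to drain a TX ring of 'ring_size'
 * frames of 'frame_bytes' each over a link of 'link_bps' bits/second.
 */
static double ring_drain_ms(unsigned int ring_size,
			    unsigned int frame_bytes,
			    double link_bps)
{
	return (double)ring_size * frame_bytes * 8 / link_bps * 1000.0;
}

int main(void)
{
	/* GigE, 256-entry ring, 1500-byte frames: ~3.1 ms */
	printf("GigE/256:     %.1f ms\n", ring_drain_ms(256, 1500, 1e9));
	/* 10-GigE, 1024-entry ring, 1500-byte frames: ~1.2 ms */
	printf("10GigE/1024:  %.1f ms\n", ring_drain_ms(1024, 1500, 1e10));
	/* 10-GigE, 1024-entry ring, 64KB TSO frames: ~53.7 ms */
	printf("10GigE/TSO:   %.1f ms\n", ring_drain_ms(1024, 65536, 1e10));
	return 0;
}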
On Saturday, 25 February 2012 at 01:51 -0500, Bill Fink wrote:
> For a GigE NIC with a typical ring size of 256, the serialization delay
> for 256 1500-byte packets is:
>
> 	1500*8*256/10^9 = ~3.1 msec
>
> For a 10-GigE NIC with a ring size of 1024, the serialization delay
> for 1024 1500-byte packets is:
>
> 	1500*8*1024/10^10 = ~1.2 msec
>
> So it's not immediately clear that a ring size of 1024 is unreasonable
> for 10-GigE.

It is clear when you take into account packets of 64 Kbytes (TSO).

With current hardware and the state of Linux software, you no longer need
very big NIC queues, since they bring known drawbacks. That was true in
the past with UP and some timer handlers that could hog the CPU for long
periods of time, and when TSO didn't exist. Hopefully all these CPU hogs
are no longer running in softirq handlers.

If your workload needs more than ~500 slots, then something is wrong
elsewhere and should be fixed. No more workarounds, please.

Now that BQL (Byte Queue Limits) is available, a driver should implement
it first before considering big TX rings. That's a 20-minute change.
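For context, the BQL hooks referred to here are the netdev_tx_sent_queue(),
netdev_tx_completed_queue() and netdev_tx_reset_queue() helpers from
include/linux/netdevice.h. A minimal sketch of where a driver would call
them follows; the my_tx_ring structure and function names are hypothetical
placeholders, not the actual mlx4_en code:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical per-ring state; only the netdev_tx_*_queue() calls are
 * the real BQL API. */
struct my_tx_ring {
	struct netdev_queue *txq;	/* from netdev_get_tx_queue() */
};

/* In ndo_start_xmit(), after posting the descriptor to the hardware: */
static void my_tx_account_sent(struct my_tx_ring *ring, struct sk_buff *skb)
{
	/* Tell BQL how many bytes were handed to the NIC. */
	netdev_tx_sent_queue(ring->txq, skb->len);
}

/* In the TX completion handler, after reclaiming finished descriptors: */
static void my_tx_account_completed(struct my_tx_ring *ring,
				    unsigned int pkts, unsigned int bytes)
{
	/*
	 * BQL uses completion feedback to adaptively limit the number of
	 * in-flight bytes, so the ring only ever holds roughly what the
	 * hardware can drain between completions.
	 */
	netdev_tx_completed_queue(ring->txq, pkts, bytes);
}

/* When the ring is torn down or reset: */
static void my_tx_ring_reset(struct my_tx_ring *ring)
{
	netdev_tx_reset_queue(ring->txq);
}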
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index d60335f..174dc38 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -110,7 +110,7 @@ enum {
 #define MLX4_EN_NUM_TX_RINGS		8
 #define MLX4_EN_NUM_PPP_RINGS		8
 #define MAX_TX_RINGS			(MLX4_EN_NUM_TX_RINGS + MLX4_EN_NUM_PPP_RINGS)
-#define MLX4_EN_DEF_TX_RING_SIZE	512
+#define MLX4_EN_DEF_TX_RING_SIZE	1024
 #define MLX4_EN_DEF_RX_RING_SIZE	1024
 
 /* Target number of packets to coalesce with interrupt moderation */
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)