Message ID: 1327494434-21566-1-git-send-email-ian.campbell@citrix.com
State: Changes Requested, archived
Delegated to: David Miller
On Wednesday 25 January 2012 at 12:27 +0000, Ian Campbell wrote:
> This reduces the minimum overhead required for this allocation such that the
> shinfo can be grown in the following patch without overflowing 2048 bytes for a
> 1500 byte frame.
>
> Reducing this overhead while also growing the shinfo means that sometimes the
> tail end of the data can end up in the same cache line as the beginning of the
> shinfo. Specifically, in the case of 64 byte cache lines on a 64 bit system,
> the first 8 bytes of shinfo can overlap the tail cacheline of the data. In many
> cases the allocation slop means that there is no overlap.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> ---

Hmm... you missed build_skb() and all the places we use
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) (for example in some drivers).

If you want to see the possible performance impact of your changes, see
commit e52fcb2462ac ("bnx2x: uses build_skb() in receive path"), and
expect a drop from 820.000 pps to 720.000 pps.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2012-01-25 at 12:51 +0000, Eric Dumazet wrote:
> On Wednesday 25 January 2012 at 12:27 +0000, Ian Campbell wrote:
> > This reduces the minimum overhead required for this allocation such that the
> > shinfo can be grown in the following patch without overflowing 2048 bytes for a
> > 1500 byte frame.
> >
> > Reducing this overhead while also growing the shinfo means that sometimes the
> > tail end of the data can end up in the same cache line as the beginning of the
> > shinfo. Specifically, in the case of 64 byte cache lines on a 64 bit system,
> > the first 8 bytes of shinfo can overlap the tail cacheline of the data. In many
> > cases the allocation slop means that there is no overlap.
> >
> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > ---
>
> Hmm... you missed build_skb()

Presumably build_skb() should be using SKB_WITH_OVERHEAD instead of open
coding it? (Which I think is how I managed to miss it.)

> and all places we use
> SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
> (for example in some drivers)

I'm not sure how I missed these with my grep. Obviously it is necessary
to catch them all in order to avoid the performance drop you mention
below. Fortunately there are fewer than a dozen instances in half a
dozen drivers to check.

> If you want to see possible performance impact of your changes, see
> commit e52fcb2462ac (bnx2x: uses build_skb() in receive path), and
> expect a drop from 820.000 pps to 720.000 pps

Can you elaborate on the specific benchmark you used there?

Thanks,
Ian
On Wednesday 25 January 2012 at 13:09 +0000, Ian Campbell wrote:
> Can you elaborate on the specific benchmark you used there?
One machine was sending udp frames on my target (using pktgen)
Target was running a mono threaded udp receiver (one socket)
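[Editor's note: the "mono threaded udp receiver (one socket)" described
above can be sketched as a pair of helpers. Everything here (the
function names, the buffer size, the counting scheme) is an assumed
illustration, not Eric's actual tool.]

```c
/* Minimal sketch of a single-threaded UDP sink: one socket, one
 * thread, drain and discard datagrams while counting them. */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket bound to INADDR_ANY:port; returns fd or -1. */
static int udp_sink_create(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Receive and discard up to 'count' datagrams; returns how many arrived. */
static unsigned long udp_drain(int fd, unsigned long count)
{
    char buf[2048];
    unsigned long received = 0;

    while (received < count) {
        if (recv(fd, buf, sizeof(buf), 0) < 0)
            break;
        received++; /* no per-packet processing, just drain and count */
    }
    return received;
}
```

A benchmark run would call udp_sink_create() once, loop in udp_drain()
while the pktgen sender blasts the port, and divide packets received by
elapsed time to get pps.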
Hi Eric,

On Wed, 2012-01-25 at 13:12 +0000, Eric Dumazet wrote:
> On Wednesday 25 January 2012 at 13:09 +0000, Ian Campbell wrote:
>
> > Can you elaborate on the specific benchmark you used there?
>
> One machine was sending udp frames on my target (using pktgen)
>
> Target was running a mono threaded udp receiver (one socket)

I've been playing with pktgen and I'm seeing more like 81,600-81,800 pps
from a UDP transmitter, measuring on the rx side using "bwm-ng -u
packets" and sinking the traffic with "nc -l -u -p 9 > /dev/null". The
numbers are the same with or without this series.

You mentioned numbers in the 820pps region -- is that really kilo-pps
(in which case I'm an order of magnitude down) or actually 820pps (in
which case I'm somehow a couple of orders of magnitude up)?

I'm using a single NIC transmitter, no delay, 1000000 clones of each skb
and I've tried 60 and 1500 byte packets. In the 60 byte case I see more
like 50k pps.

I'm in the process of setting up a receiver with a bnx2 but in the
meantime I feel like I'm making some obvious or fundamental error in my
method...

Any tips greatly appreciated.

Thanks,
Ian.
On Tuesday 31 January 2012 at 14:35 +0000, Ian Campbell wrote:
> Hi Eric,
>
> On Wed, 2012-01-25 at 13:12 +0000, Eric Dumazet wrote:
> > On Wednesday 25 January 2012 at 13:09 +0000, Ian Campbell wrote:
> >
> > > Can you elaborate on the specific benchmark you used there?
> >
> > One machine was sending udp frames on my target (using pktgen)
> >
> > Target was running a mono threaded udp receiver (one socket)
>
> I've been playing with pktgen and I'm seeing more like 81,600-81,800 pps
> from a UDP transmitter, measuring on the rx side using "bwm-ng -u
> packets" and sinking the traffic with "nc -l -u -p 9 > /dev/null". The
> numbers are the same with or without this series.
>
> You mentioned numbers in the 820pps region -- is that really kilo-pps
> (in which case I'm an order of magnitude down) or actually 820pps (in
> which case I'm somehow a couple of orders of magnitude up)?
>
> I'm using a single NIC transmitter, no delay, 1000000 clones of each skb
> and I've tried 60 and 1500 byte packets. In the 60 byte case I see more
> like 50k pps.
>
> I'm in the process of setting up a receiver with a bnx2 but in the
> meantime I feel like I'm making some obvious or fundamental error in my
> method...
>
> Any tips greatly appreciated.

I confirm I reach 820.000 packets per second, on a Gigabit link.

The sender can easily reach line rate (more than 1.000.000 packets per
second).

Check how many packet drops you have on the receiver:

ifconfig eth0

or "ethtool -S eth0"
Ian Campbell <Ian.Campbell@citrix.com> :
[...]
> I'm using a single NIC transmitter, no delay, 1000000 clones of each skb
> and I've tried 60 and 1500 byte packets. In the 60 byte case I see more
> like 50k pps

At this stage 'tools/perf/perf top -G graph' sometimes helps me discover
debug options that I had left enabled.
On Tue, 2012-01-31 at 14:54 +0000, Francois Romieu wrote:
> Ian Campbell <Ian.Campbell@citrix.com> :
> [...]
> > I'm using a single NIC transmitter, no delay, 1000000 clones of each skb
> > and I've tried 60 and 1500 byte packets. In the 60 byte case I see more
> > like 50k pps
>
> At this stage 'tools/perf/perf top -G graph' sometimes helps me discover
> debug options that I had left enabled.

Thanks for the tip, I'll give it a look.

Ian.
On Tue, 2012-01-31 at 14:45 +0000, Eric Dumazet wrote:
> On Tuesday 31 January 2012 at 14:35 +0000, Ian Campbell wrote:
> > Hi Eric,
> >
> > On Wed, 2012-01-25 at 13:12 +0000, Eric Dumazet wrote:
> > > On Wednesday 25 January 2012 at 13:09 +0000, Ian Campbell wrote:
> > >
> > > > Can you elaborate on the specific benchmark you used there?
> > >
> > > One machine was sending udp frames on my target (using pktgen)
> > >
> > > Target was running a mono threaded udp receiver (one socket)
> >
> > I've been playing with pktgen and I'm seeing more like 81,600-81,800 pps
> > from a UDP transmitter, measuring on the rx side using "bwm-ng -u
> > packets" and sinking the traffic with "nc -l -u -p 9 > /dev/null". The
> > numbers are the same with or without this series.
> >
> > You mentioned numbers in the 820pps region -- is that really kilo-pps
> > (in which case I'm an order of magnitude down) or actually 820pps (in
> > which case I'm somehow a couple of orders of magnitude up)?
> >
> > I'm using a single NIC transmitter, no delay, 1000000 clones of each skb
> > and I've tried 60 and 1500 byte packets. In the 60 byte case I see more
> > like 50k pps.
> >
> > I'm in the process of setting up a receiver with a bnx2 but in the
> > meantime I feel like I'm making some obvious or fundamental error in my
> > method...
> >
> > Any tips greatly appreciated.
>
> I confirm I reach 820.000 packets per second, on a Gigabit link.

Heh, this is where $LOCALE steps up and confuses things again (commas vs
periods for thousands separators) ;-). However I know what you mean.

> Sender can easily reach line rate (more than 1.000.000 packets per
> second)

Right, this is where I seem to have failed -- 10,000,000 packets took
more like 2 minutes to send than the expected 10s...

> Check how many packet drops you have on receiver ?
>
> ifconfig eth0
>
> or "ethtool -S eth0"

After a full 10,000,000 packet run of pktgen I see a difference in
rx_packets of +10,001,561 (there is some other traffic). The various
rx_*_error counters do not increase and neither does rx_no_buffer_count
or rx_missed_errors (although both of the latter two are non-zero to
start with). ifconfig tells the same story.

I guess this isn't surprising given the send rate, since it's not really
stressing the receiver all that much.

I'll investigate the sending side. The sender is running a 2.6.32 distro
kernel. Maybe I need to tweak it somewhat.

Thanks,
Ian.
On Wednesday 1 February 2012 at 09:39 +0000, Ian Campbell wrote:
> I'll investigate the sending side. The sender is running a 2.6.32 distro
> kernel. Maybe I need to tweak it somewhat.

That's strange; even a very old kernel is able to sustain Gigabit line
rate with pktgen. Maybe a problem with a switch and ethernet pauses?
On Wed, 2012-02-01 at 10:02 +0000, Eric Dumazet wrote:
> On Wednesday 1 February 2012 at 09:39 +0000, Ian Campbell wrote:
>
> > I'll investigate the sending side. The sender is running a 2.6.32 distro
> > kernel. Maybe I need to tweak it somewhat.
>
> That's strange; even a very old kernel is able to sustain Gigabit line
> rate with pktgen. Maybe a problem with a switch and ethernet pauses?

Thanks, that was something I was going to investigate. These machines are
in our lab so I need to figure out whether they are even directly
attached to the same switch, etc.

Ian.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 50db9b0..f04333b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -41,7 +41,7 @@
 #define SKB_DATA_ALIGN(X)	(((X) + (SMP_CACHE_BYTES - 1)) & \
				 ~(SMP_CACHE_BYTES - 1))
 #define SKB_WITH_OVERHEAD(X)	\
-	((X) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+	((X) - sizeof(struct skb_shared_info))
 #define SKB_MAX_ORDER(X, ORDER) \
	SKB_WITH_OVERHEAD((PAGE_SIZE << (ORDER)) - (X))
 #define SKB_MAX_HEAD(X)		(SKB_MAX_ORDER((X), 0))
@@ -50,7 +50,7 @@
 /* return minimum truesize of one skb containing X bytes of data */
 #define SKB_TRUESIZE(X) ((X) +						\
			  SKB_DATA_ALIGN(sizeof(struct sk_buff)) +	\
-			  SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+			  sizeof(struct skb_shared_info))

 /* A. Checksumming of received packets by device.
  *
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index da0c97f..b6de604 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -189,8 +189,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
	 * Both skb->head and skb_shared_info are cache line aligned.
	 */
-	size = SKB_DATA_ALIGN(size);
-	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	size = SKB_DATA_ALIGN(size + sizeof(struct skb_shared_info));
	data = kmalloc_node_track_caller(size, gfp_mask, node);
	if (!data)
		goto nodata;
This reduces the minimum overhead required for this allocation such that
the shinfo can be grown in the following patch without overflowing 2048
bytes for a 1500 byte frame.

Reducing this overhead while also growing the shinfo means that sometimes
the tail end of the data can end up in the same cache line as the
beginning of the shinfo. Specifically, in the case of 64 byte cache lines
on a 64 bit system, the first 8 bytes of shinfo can overlap the tail
cacheline of the data. In many cases the allocation slop means that there
is no overlap.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/skbuff.h |    4 ++--
 net/core/skbuff.c      |    3 +--
 2 files changed, 3 insertions(+), 4 deletions(-)