Message ID: 1325791164.29084.8.camel@dagon.hellion.org.uk
State: RFC, archived
Delegated to: David Miller
Le jeudi 05 janvier 2012 à 19:19 +0000, Ian Campbell a écrit :
> On Thu, 2012-01-05 at 17:46 +0000, Eric Dumazet wrote:
> > Le jeudi 05 janvier 2012 à 17:13 +0000, Ian Campbell a écrit :
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index da0c97f..b6de604 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -189,8 +189,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
> > >  	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
> > >  	 * Both skb->head and skb_shared_info are cache line aligned.
> > >  	 */
> > > -	size = SKB_DATA_ALIGN(size);
> > > -	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > +	size = SKB_DATA_ALIGN(size + sizeof(struct skb_shared_info));
> > >  	data = kmalloc_node_track_caller(size, gfp_mask, node);
> > >  	if (!data)
> > >  		goto nodata;
> >
> > Unfortunately this means the frequently used part of skb_shared_info
> > is no longer in a single cache line.
> >
> > This will slow down many operations like kfree_skb() of cold skbs (TX
> > completion for example).
>
> I've been toying with the following, which moves the infrequently used
> destructor_arg field to the front. With 32 or 64 byte cache lines this
> is then the only field which ends up on a different cache line. With 128
> byte cache lines it is all already on the same line.
>
> Does that alleviate your concern? If so I can clean it up and include it
> in the series.
>
> Cheers,
> Ian.
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index cbc815c..533c2ea 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -249,6 +249,11 @@ struct ubuf_info {
>   * the end of the header data, ie. at skb->end.
>   */
>  struct skb_shared_info {
> +	/* Intermediate layers must ensure that destructor_arg
> +	 * remains valid until skb destructor */
> +	void *		destructor_arg;
> +
> +	/* Warning: all fields from here until dataref are cleared in __alloc_skb() */
>  	unsigned char	nr_frags;
>  	__u8		tx_flags;
>  	unsigned short	gso_size;
> @@ -264,9 +269,6 @@ struct skb_shared_info {
>  	 */
>  	atomic_t	dataref;
>
> -	/* Intermediate layers must ensure that destructor_arg
> -	 * remains valid until skb destructor */
> -	void *		destructor_arg;
>
>  	/* must be last field, see pskb_expand_head() */
>  	skb_frag_t	frags[MAX_SKB_FRAGS];

Hmmm...

Take a look at kfree_skb() and please list all the skb_shared_info
fields it needs, for a typical skb with nr_frags = 0 or 1.

If they all fit in a single cache line, it's OK.

If not, this adds a cache line miss in a critical path.

Also consider the transmit path, when an skb is queued by cpu X on a
qdisc and dequeued by another cpu Y to be delivered to the device.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2012-01-05 at 20:22 +0000, Eric Dumazet wrote:
> Take a look at kfree_skb() and please list all skb_shared_info needed
> fields, for a typical skb with nr_frags = 0 or 1.
>
> If all fit in a single cache line, its ok.

It doesn't fit in a single cache line today.

Before I started touching any of this stuff (e.g. in v3.1) kfree_skb
would touch 3x32 byte cache lines (with boundaries between
frag_list/hwtstamps and destructor_arg/frags[0]) or 2x64 byte cache
lines (boundary between frag_list/hwtstamps).

After the v2 version of these patches the situation is much the same,
although the boundaries are different and which field is in which line
has shifted a bit. The boundaries are now gso_type/frag_list and
dataref/destructor_arg for 32 byte cache lines, and gso_type/frag_list
for 64 byte cache lines.

With the proposed reordering to put destructor_arg first it actually
shrinks to just two 32 byte cache lines (boundary between hwtstamps and
ip6_frag_id) or a single 64 byte cache line, since the other 32/64
boundary is between destructor_arg and nr_frags.

Even if we don't consider destructor_arg to be rarely used / cold,
things are no worse than they were before (3x32 or 2x64 byte cache
lines touched). So if anything I think these patches improve things
when nr_frags = 0 or 1, or are at least no worse than the current
situation.

> If not, this adds a cache line miss in a critical path.

I think it does not, but please do check my working (see below).

> Also in transmit path, when an skb is queued by cpu X on qdisc, and
> dequeued by another cpu Y to be delivered to device.

You mean the path starting at dev_queue_xmit -> q->enqueue is true ->
__dev_xmit_skb? I didn't follow it properly, but there doesn't seem to
be much use of the shinfo at all in the qdisc area; the only thing I
spotted was the use of the gso_{size,segs} fields in
qdisc_bstats_update, and those are always on the same cache line. Did I
miss something?

Anyway, by similar logic to the kfree_skb case I think the number of
cache lines touched will always be <= the current number.

Ian.

My working:

                                        Used in kfree_skb?

unsigned short nr_frags;                YES (if dataref -> 0 etc)
unsigned short gso_size;
unsigned short gso_segs;
unsigned short gso_type;
__be32 ip6_frag_id;
__u8 tx_flags;                          YES (if dataref -> 0 etc)
struct sk_buff *frag_list;              YES (if dataref -> 0 etc)
struct skb_shared_hwtstamps hwtstamps;
atomic_t dataref;                       YES
void *destructor_arg;                   YES if skb->destructor ||
                                        tx_flags & ZEROCOPY -> I count
                                        this as unused by kfree_skb for
                                        the purposes here
skb_frag_t frags[MAX_SKB_FRAGS];        YES (< nr_frags, if dataref -> 0
                                        etc)

LEGEND:
	*    <- used in kfree_skb
	N    <- offset from start of shinfo
	-N   <- offset from end of shinfo

v3.1:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 byte cache line =-=-=-=-=-=-=
-40                                                             -384
------------------------------- 32 byte cache line ------------------
-8                                                              -352
	struct skb_shared_info {
*0		unsigned short nr_frags;                        -344
 2		unsigned short gso_size;                        -342
		/* Warning: this field is not always filled in (UFO)! */
 4		unsigned short gso_segs;                        -340
 6		unsigned short gso_type;                        -338
 8		__be32 ip6_frag_id;                             -336
*12		__u8 tx_flags;                                  -332
*16		struct sk_buff *frag_list;                      -328
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 byte cache line =-=-=-=-=-=-=
 24		struct skb_shared_hwtstamps hwtstamps;          -320
		/*
		 * Warning: all fields before dataref are cleared in
		 * __alloc_skb()
		 */
*40		atomic_t dataref;                               -304
		/* Intermediate layers must ensure that destructor_arg
		 * remains valid until skb destructor */
 48		void *destructor_arg;                           -296
		/* must be last field, see pskb_expand_head() */
------------------------------- 32 byte cache line ------------------
*56		skb_frag_t frags[MAX_SKB_FRAGS];    -16x18 = -288
					= 9x32 or 4.5x64 or 2.25x128
=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 & 128 byte cache line =-=-=-=-=-=
	};	344 = 10.75x32 or 5.375x64 or 2.6875x128        -0

skb patches v2:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 byte cache line =-=-=-=-=-=-=
-56                                                             -512
------------------------------- 32 byte cache line ------------------
-24                                                             -480
	struct skb_shared_info {
*0		unsigned char nr_frags;                         -456
*1		__u8 tx_flags;                                  -455
 2		unsigned short gso_size;                        -454
		/* Warning: this field is not always filled in (UFO)! */
 4		unsigned short gso_segs;                        -452
 6		unsigned short gso_type;                        -450
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 byte cache line =-=-=-=-=-=-=
*8		struct sk_buff *frag_list;                      -448
 16		struct skb_shared_hwtstamps hwtstamps;          -440
 32		__be32 ip6_frag_id;                             -424
		/*
		 * Warning: all fields before dataref are cleared in
		 * __alloc_skb()
		 */
*36		atomic_t dataref;                               -420
		/* Intermediate layers must ensure that destructor_arg
		 * remains valid until skb destructor */
------------------------------- 32 byte cache line ------------------
 40		void *destructor_arg;                           -416
		/* must be last field, see pskb_expand_head() */
*48		skb_frag_t frags[MAX_SKB_FRAGS];    -24x17 = -408
				= 12.75x32 or 6.375x64 or 3.1875x128
=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 & 128 byte cache line =-=-=-=-=-=
	};	456 = 14.25x32 or 7.125x64 or 3.5625x128        -0

skb patches + put destructor_arg first:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 byte cache line =-=-=-=-=-=-=
-56                                                             -512
------------------------------- 32 byte cache line ------------------
-24                                                             -480
	struct skb_shared_info {
		/* Intermediate layers must ensure that destructor_arg
		 * remains valid until skb destructor */
 0		void *destructor_arg;                           -456
		/* Warning: all fields from here until dataref are
		 * cleared in __alloc_skb() */
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 byte cache line =-=-=-=-=-=-=
*8		unsigned char nr_frags;                         -448
*9		__u8 tx_flags;                                  -447
 10		unsigned short gso_size;                        -446
		/* Warning: this field is not always filled in (UFO)! */
 12		unsigned short gso_segs;                        -444
 14		unsigned short gso_type;                        -442
*16		struct sk_buff *frag_list;                      -440
 24		struct skb_shared_hwtstamps hwtstamps;          -432
------------------------------- 32 byte cache line ------------------
 40		__be32 ip6_frag_id;                             -416
		/*
		 * Warning: all fields before dataref are cleared in
		 * __alloc_skb()
		 */
*44		atomic_t dataref;                               -412
		/* must be last field, see pskb_expand_head() */
*48		skb_frag_t frags[MAX_SKB_FRAGS];    -24x17 = -408
				= 12.75x32 or 6.375x64 or 3.1875x128
=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 & 128 byte cache line =-=-=-=-=-=
	};	456 = 14.25x32 or 7.125x64 or 3.5625x128        -0
Le vendredi 06 janvier 2012 à 11:20 +0000, Ian Campbell a écrit :
> It doesn't fit in a single cache line today.
It really does, thanks to your (net: pack skb_shared_info more
efficiently) previous patch.
I don't understand your numbers; they are very hard to read.
Current net-next :
offsetof(struct skb_shared_info, nr_frags)=0x0
offsetof(struct skb_shared_info, frags[1])=0x40 (0x30 on 32bit arches)
So _all_ fields, including one frag, are included in a single cache line
on most machines (64-bytes cache line), IF struct skb_shared_info is
aligned.
Your patch obviously breaks this on 64bit arches, unless you make sure
sizeof(struct skb_shared_info) is a multiple of the cache line size.
[BTW, it is incidentally the case after your 1/6 patch]
Field reordering is not going to change anything here.
Or maybe I misread your patch ?
At least you claimed in Changelog :
<quote>
Reducing this overhead means that sometimes the tail end of
the data can end up in the same cache line as the beginning of the shinfo but
in many cases the allocation slop means that there is no overlap.
</quote>
On Fri, 2012-01-06 at 12:33 +0000, Eric Dumazet wrote:
> Le vendredi 06 janvier 2012 à 11:20 +0000, Ian Campbell a écrit :
>
> > It doesn't fit in a single cache line today.
>
> It really does, thanks to your (net: pack skb_shared_info more
> efficiently) previous patch.
>
> I dont understand your numbers, very hard to read.
>
> Current net-next :
>
> offsetof(struct skb_shared_info, nr_frags)=0x0
> offsetof(struct skb_shared_info, frags[1])=0x40 (0x30 on 32bit arches)

I see 0x48 here at cset 6386994e03ebbe60338ded3d586308a41e81c0dc:

(gdb) print &((struct skb_shared_info *)0)->nr_frags
$1 = (short unsigned int *) 0x0
(gdb) print &((struct skb_shared_info *)0)->frags[1]
$2 = (skb_frag_t *) 0x48
(gdb) print &((struct skb_shared_info *)0)->frags[0]
$3 = (skb_frag_t *) 0x38

(It's 0x34 and 0x2c on 32 bit.) These numbers match what I posted for
v3.1 (and I imagine earlier, since this stuff doesn't seem to change
very often).

I provided the offsets of each field in struct skb_shared_info; which
one do you think is wrong?

Remember that the end of the shared info is explicitly aligned (to the
end of the allocation == a cache line) while the front just ends up
wherever the size dictates, so you can't measure the alignment of the
fields from the beginning of the struct; you need to measure from the
end.

Ian.

> So _all_ fields, including one frag, are included in a single cache line
> on most machines (64-bytes cache line), IF struct skb_shared_info is
> aligned.
>
> Your patch obviously breaks this on 64bit arches, unless you make sure
> sizeof(struct skb_shared_info) is a multiple of cache line.
>
> [BTW, it is incidentaly the case after your 1/6 patch]
>
> fields reordering is not going to change anything on this.
>
> Or maybe I misread your patch ?
>
> At least you claimed in Changelog :
>
> <quote>
> Reducing this overhead means that sometimes the tail end of
> the data can end up in the same cache line as the beginning of the shinfo but
> in many cases the allocation slop means that there is no overlap.
> </quote>
On Fri, 2012-01-06 at 12:33 +0000, Eric Dumazet wrote:
> Le vendredi 06 janvier 2012 à 11:20 +0000, Ian Campbell a écrit :
>
> > It doesn't fit in a single cache line today.
>
> It really does, thanks to your (net: pack skb_shared_info more
> efficiently) previous patch.
>
> I dont understand your numbers, very hard to read.
>
> Current net-next :
>
> offsetof(struct skb_shared_info, nr_frags)=0x0
> offsetof(struct skb_shared_info, frags[1])=0x40 (0x30 on 32bit arches)
>
> So _all_ fields, including one frag, are included in a single cache line
> on most machines (64-bytes cache line),

BTW, this is also true with my patch + put destructor_arg first in the
struct (at least for all interesting fields, since I chose
destructor_arg specifically because it did not seem interesting for
these purposes -- do you disagree?):

(gdb) print &((struct skb_shared_info *)0)->frags[1]
$1 = (skb_frag_t *) 0x48

but there is a cacheline boundary just before nr_frags:

(gdb) print &((struct skb_shared_info *)0)->nr_frags
$3 = (unsigned char *) 0x8

So the interesting fields total 0x48 - 0x8 = 0x40 bytes, and the
alignment is such that this is a single cache line.

> IF struct skb_shared_info is
> aligned.

Obviously the conditions for the above are a little different, but they
are, AFAIK, met.

Ian.

> Your patch obviously breaks this on 64bit arches, unless you make sure
> sizeof(struct skb_shared_info) is a multiple of cache line.
>
> [BTW, it is incidentaly the case after your 1/6 patch]
>
> fields reordering is not going to change anything on this.
>
> Or maybe I misread your patch ?
>
> At least you claimed in Changelog :
>
> <quote>
> Reducing this overhead means that sometimes the tail end of
> the data can end up in the same cache line as the beginning of the shinfo but
> in many cases the allocation slop means that there is no overlap.
> </quote>
Le vendredi 06 janvier 2012 à 13:20 +0000, Ian Campbell a écrit :
> On Fri, 2012-01-06 at 12:33 +0000, Eric Dumazet wrote:
> > Le vendredi 06 janvier 2012 à 11:20 +0000, Ian Campbell a écrit :
> >
> > > It doesn't fit in a single cache line today.
> >
> > It really does, thanks to your (net: pack skb_shared_info more
> > efficiently) previous patch.
> >
> > I dont understand your numbers, very hard to read.
> >
> > Current net-next :
> >
> > offsetof(struct skb_shared_info, nr_frags)=0x0
> > offsetof(struct skb_shared_info, frags[1])=0x40 (0x30 on 32bit arches)
>
> I see 0x48 here at cset 6386994e03ebbe60338ded3d586308a41e81c0dc:

If you read my mail, I said "current net-next".

Please "git pull" again?

Here I really have your commit 9f42f126154786e6e76df513004800c8c633f020
in.

> (gdb) print &((struct skb_shared_info *)0)->nr_frags
> $1 = (short unsigned int *) 0x0
> (gdb) print &((struct skb_shared_info *)0)->frags[1]
> $2 = (skb_frag_t *) 0x48
> (gdb) print &((struct skb_shared_info *)0)->frags[0]
> $3 = (skb_frag_t *) 0x38
>
> (it's 0x34 and 0x2c on 32 bit) and these numbers match what I posted for
> v3.1 (and I imagine earlier since this stuff doesn't seem to change very
> often).
>
> I provided the offsets of each field in struct skb_shared_info, which
> one do you think is wrong?
>
> Remember that the end shared info is explicitly aligned (to the end of
> the allocation == a cache line) while the front just ends up at wherever
> the size dictates so you can't measure the alignment of the fields from
> the beginning of the struct, you need to measure from the end.

You're mistaken. Really.

Current code makes SURE skb->end starts at a cache line boundary
(assuming kmalloc() returned an aligned block; this might not be true
with SLAB debugging):

	size = SKB_WITH_OVERHEAD(ksize(data));
	...
	skb->end = skb->tail + size;

So there is a possibility of padding (less than 64 bytes) _after_
skb_shared_info and before the end of the allocated area.

After your 9f42f1261547 commit on 64bit, we have no padding anymore,
since sizeof(struct skb_shared_info) = 0x140.
On Fri, 2012-01-06 at 13:37 +0000, Eric Dumazet wrote:
> Le vendredi 06 janvier 2012 à 13:20 +0000, Ian Campbell a écrit :
> > On Fri, 2012-01-06 at 12:33 +0000, Eric Dumazet wrote:
> > > Le vendredi 06 janvier 2012 à 11:20 +0000, Ian Campbell a écrit :
> > >
> > > > It doesn't fit in a single cache line today.
> > >
> > > It really does, thanks to your (net: pack skb_shared_info more
> > > efficiently) previous patch.
> > >
> > > I dont understand your numbers, very hard to read.
> > >
> > > Current net-next :
> > >
> > > offsetof(struct skb_shared_info, nr_frags)=0x0
> > > offsetof(struct skb_shared_info, frags[1])=0x40 (0x30 on 32bit arches)
> >
> > I see 0x48 here at cset 6386994e03ebbe60338ded3d586308a41e81c0dc:
>
> If you read my mail, I said "current net-next"
>
> Please "git pull" again ?

Oh, oops, I thought I'd checked I was up to date.

> Here I really have your commit 9f42f126154786e6e76df513004800c8c633f020
> in.
>
> > (gdb) print &((struct skb_shared_info *)0)->nr_frags
> > $1 = (short unsigned int *) 0x0
> > (gdb) print &((struct skb_shared_info *)0)->frags[1]
> > $2 = (skb_frag_t *) 0x48
> > (gdb) print &((struct skb_shared_info *)0)->frags[0]
> > $3 = (skb_frag_t *) 0x38
> >
> > (it's 0x34 and 0x2c on 32 bit) and these numbers match what I posted for
> > v3.1 (and I imagine earlier since this stuff doesn't seem to change very
> > often).
> >
> > I provided the offsets of each field in struct skb_shared_info, which
> > one do you think is wrong?
> >
> > Remember that the end shared info is explicitly aligned (to the end of
> > the allocation == a cache line) while the front just ends up at wherever
> > the size dictates so you can't measure the alignment of the fields from
> > the beginning of the struct, you need to measure from the end.
>
> You're mistaken. Really.
>
> Current code makes SURE skb->end starts at a cache line boundary
> (assuminf kmalloc() returned an aligned bloc, this might be not true
> with SLAB debugging)
>
> 	size = SKB_WITH_OVERHEAD(ksize(data));
> 	...
> 	skb->end = skb->tail + size;
>
> So there is possibility of padding (less than 64 bytes) _after_
> skb_shared_info and before the end of allocated area.

Oh, I somehow missed the SKB_DATA_ALIGN inside SKB_WITH_OVERHEAD when I
was reasoning about the before case (even though I touched it in my
patch!). Thanks for prodding me sufficiently...

I guess the comment preceding "size =" is therefore slightly misleading,
since shinfo isn't "exactly at the end" but rather "at the end but still
aligned"?

	/* kmalloc(size) might give us more room than requested.
	 * Put skb_shared_info exactly at the end of allocated zone,
	 * to allow max possible filling before reallocation.
	 */

Anyhow, despite this I think that with the patch to move destructor_arg
to the front on top of this series, the hot fields are all in a single
cache line, including the first frag. Is that sufficient to alleviate
your concerns?

Ian.

> After your 9f42f1261547 commit on 64bit, we have no padding anymore
> since sizeof(struct skb_shared_info)=0x140
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index cbc815c..533c2ea 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -249,6 +249,11 @@ struct ubuf_info {
  * the end of the header data, ie. at skb->end.
  */
 struct skb_shared_info {
+	/* Intermediate layers must ensure that destructor_arg
+	 * remains valid until skb destructor */
+	void *		destructor_arg;
+
+	/* Warning: all fields from here until dataref are cleared in __alloc_skb() */
 	unsigned char	nr_frags;
 	__u8		tx_flags;
 	unsigned short	gso_size;
@@ -264,9 +269,6 @@ struct skb_shared_info {
 	 */
 	atomic_t	dataref;
 
-	/* Intermediate layers must ensure that destructor_arg
-	 * remains valid until skb destructor */
-	void *		destructor_arg;
 
 	/* must be last field, see pskb_expand_head() */
 	skb_frag_t	frags[MAX_SKB_FRAGS];
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e942a86..592b8a8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -219,7 +219,10 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 
 	/* make sure we initialize shinfo sequentially */
 	shinfo = skb_shinfo(skb);
-	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
+
+	memset(&shinfo->nr_frags, 0,
+	       offsetof(struct skb_shared_info, dataref)
+	       - offsetof(struct skb_shared_info, nr_frags));
 	atomic_set(&shinfo->dataref, 1);
 	kmemcheck_annotate_variable(shinfo->destructor_arg);