[2/6] net: pad skb data and shinfo as a whole rather than individually

Message ID 1325791164.29084.8.camel@dagon.hellion.org.uk
State RFC, archived
Delegated to: David Miller

Commit Message

Ian Campbell Jan. 5, 2012, 7:19 p.m. UTC
On Thu, 2012-01-05 at 17:46 +0000, Eric Dumazet wrote:
> On Thu, 2012-01-05 at 17:13 +0000, Ian Campbell wrote:
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index da0c97f..b6de604 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -189,8 +189,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
> >  	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
> >  	 * Both skb->head and skb_shared_info are cache line aligned.
> >  	 */
> > -	size = SKB_DATA_ALIGN(size);
> > -	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > +	size = SKB_DATA_ALIGN(size + sizeof(struct skb_shared_info));
> >  	data = kmalloc_node_track_caller(size, gfp_mask, node);
> >  	if (!data)
> >  		goto nodata;
> 
> Unfortunately this means the frequently used part of skb_shared_info no
> longer fits in a single cache line.
> 
> This will slow down many operations, like kfree_skb() of cold skbs (TX
> completion, for example).

I've been toying with the following, which moves the infrequently used
destructor_arg field to the front. With 32- or 64-byte cache lines it is
then the only field which ends up on a different cache line. With 128-byte
cache lines everything is already on the same line.

Does that alleviate your concern? If so I can clean it up and include it
in the series.
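
For reference, here is the padding the quoted hunk can save, as a quick
userspace sketch (64-byte cache lines assumed; the unaligned shinfo size
below is made up purely to show the effect):

#include <stdio.h>

#define SMP_CACHE_BYTES 64
#define SKB_DATA_ALIGN(X) (((X) + (SMP_CACHE_BYTES - 1)) & \
			   ~(SMP_CACHE_BYTES - 1))

int main(void)
{
	unsigned int size = 100;	/* requested data size */
	unsigned int shinfo = 328;	/* hypothetical unaligned shinfo size */

	/* before the patch: each component rounded up separately */
	unsigned int before = SKB_DATA_ALIGN(size) + SKB_DATA_ALIGN(shinfo);
	/* after the patch: one rounding over the combined size */
	unsigned int after = SKB_DATA_ALIGN(size + shinfo);

	printf("before=%u after=%u saved=%u\n", before, after, before - after);
	/* prints: before=512 after=448 saved=64 */
	return 0;
}

If the shinfo size is already a multiple of the cache line, the two
computations agree and nothing is saved.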

Cheers,
Ian.




Comments

Eric Dumazet Jan. 5, 2012, 8:22 p.m. UTC | #1
On Thu, 2012-01-05 at 19:19 +0000, Ian Campbell wrote:
> On Thu, 2012-01-05 at 17:46 +0000, Eric Dumazet wrote:
> > On Thu, 2012-01-05 at 17:13 +0000, Ian Campbell wrote:
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index da0c97f..b6de604 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -189,8 +189,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
> > >  	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
> > >  	 * Both skb->head and skb_shared_info are cache line aligned.
> > >  	 */
> > > -	size = SKB_DATA_ALIGN(size);
> > > -	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > +	size = SKB_DATA_ALIGN(size + sizeof(struct skb_shared_info));
> > >  	data = kmalloc_node_track_caller(size, gfp_mask, node);
> > >  	if (!data)
> > >  		goto nodata;
> > 
> > Unfortunately this means the frequently used part of skb_shared_info no
> > longer fits in a single cache line.
> > 
> > This will slow down many operations, like kfree_skb() of cold skbs (TX
> > completion, for example).
> 
> I've been toying with the following, which moves the infrequently used
> destructor_arg field to the front. With 32- or 64-byte cache lines it is
> then the only field which ends up on a different cache line. With 128-byte
> cache lines everything is already on the same line.
> 
> Does that alleviate your concern? If so I can clean it up and include it
> in the series.
> 
> Cheers,
> Ian.
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index cbc815c..533c2ea 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -249,6 +249,11 @@ struct ubuf_info {
>   * the end of the header data, ie. at skb->end.
>   */
>  struct skb_shared_info {
> +	/* Intermediate layers must ensure that destructor_arg
> +	 * remains valid until skb destructor */
> +	void *		destructor_arg;
> +
> +	/* Warning all fields from here until dataref are cleared in __alloc_skb() */
>  	unsigned char	nr_frags;
>  	__u8		tx_flags;
>  	unsigned short	gso_size;
> @@ -264,9 +269,6 @@ struct skb_shared_info {
>  	 */
>  	atomic_t	dataref;
>  
> -	/* Intermediate layers must ensure that destructor_arg
> -	 * remains valid until skb destructor */
> -	void *		destructor_arg;
>  
>  	/* must be last field, see pskb_expand_head() */
>  	skb_frag_t	frags[MAX_SKB_FRAGS];

Hmmm

Take a look at kfree_skb() and please list all the skb_shared_info
fields it needs, for a typical skb with nr_frags = 0 or 1.

If they all fit in a single cache line, it's OK.

If not, this adds a cache line miss in a critical path.

Also consider the transmit path, where an skb is queued by CPU X on a
qdisc and dequeued by another CPU Y to be delivered to the device.



Ian Campbell Jan. 6, 2012, 11:20 a.m. UTC | #2
On Thu, 2012-01-05 at 20:22 +0000, Eric Dumazet wrote:
> Take a look at kfree_skb() and please list all the skb_shared_info
> fields it needs, for a typical skb with nr_frags = 0 or 1.
> 
> If they all fit in a single cache line, it's OK.

It doesn't fit in a single cache line today.

Before I started touching any of this stuff (e.g. in v3.1) kfree_skb
would touch 3x32-byte cache lines (with boundaries between
frag_list/hwtstamps and destructor_arg/frags[0]) or 2x64-byte cache
lines (boundary between frag_list/hwtstamps).

After the v2 version of these patches the situation is much the same,
although the boundaries are different and which field is in which line
has shifted a bit. The boundaries are now gso_type/frag_list and
dataref/destructor_arg for 32 byte cache lines and gso_type/frag_list
for 64 byte cache lines.

With the proposed reordering to put destructor_arg first it actually
shrinks to just two 32-byte cache lines (boundary between hwtstamps and
ip6_frag_id) or a single 64-byte cache line, since the other 32/64-byte
boundary falls between destructor_arg and nr_frags.

Even if we don't consider destructor_arg to be rarely used / cold, things
are no worse than they were before (3x32- or 2x64-byte cache lines
touched).

So if anything I think these patches improve things when nr_frags = 0
or 1, and they are at least no worse than the current situation.

> If not, this adds a cache line miss in a critical path.

I think it does not, but please do check my working (see below).

> Also consider the transmit path, where an skb is queued by CPU X on a
> qdisc and dequeued by another CPU Y to be delivered to the device.

You mean the path starting at dev_queue_xmit -> (q->enqueue is true) ->
__dev_xmit_skb?

I didn't follow it properly, but there doesn't seem to be much use of
the shinfo at all in the qdisc area; the only thing I spotted was the
use of the gso_{size,segs} fields in qdisc_bstats_update, and those are
always on the same cache line. Did I miss something?

Anyway, by similar logic to the kfree_skb case I think the number of
cache lines touched will always be <= the current number.

Ian.

My working:

						Used in kfree_skb?
unsigned short  nr_frags;			YES (if dataref -> 0 etc)
unsigned short  gso_size;
unsigned short  gso_segs;
unsigned short  gso_type;
__be32          ip6_frag_id;
__u8            tx_flags;			YES (if dataref -> 0 etc)
struct sk_buff  *frag_list;			YES (if dataref -> 0 etc)
struct skb_shared_hwtstamps hwtstamps;
atomic_t        dataref;			YES
void *          destructor_arg;			YES if skb->destructor || tx_flags & ZEROCOPY
                                                -> I count this as unused by
						   kfree_skb for the purposes here

skb_frag_t      frags[MAX_SKB_FRAGS];		YES (<nr_frags, if dataref->0 etc)

LEGEND:

*<-used in kfree_skb
 N<-offset from start of shinfo		    offset from end of shinfo-> -N

v3.1:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 byte cache line =-=-=-=-=
 -40									-384
--------------------------------------------- 32 byte cache line --------------
 -8									-352
struct skb_shared_info {
*0	unsigned short  nr_frags;					-344
 2	unsigned short  gso_size;					-342
        /* Warning: this field is not always filled in (UFO)! */
 4	unsigned short  gso_segs;					-340
 6	unsigned short  gso_type;					-338
 8	__be32          ip6_frag_id;					-336
*12	__u8            tx_flags;					-332
*16	struct sk_buff  *frag_list;					-328
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 byte cache line =-=-=-=-=
 24	struct skb_shared_hwtstamps hwtstamps;				-320

        /*
         * Warning : all fields before dataref are cleared in __alloc_skb()
         */
*40	atomic_t        dataref;					-304

        /* Intermediate layers must ensure that destructor_arg
         * remains valid until skb destructor */
 48	void *          destructor_arg;					-296

        /* must be last field, see pskb_expand_head() */
--------------------------------------------- 32 byte cache line --------------
*56	skb_frag_t      frags[MAX_SKB_FRAGS];				-16x18 
									= -288
									=  9x32
									or 4.5x64
									or 2.25x128
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 & 128 byte cache line =-=
 344	= 10.75x32 or 5.375x64 or 2.6875x128				-0
};

skb patches v2:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 byte cache line =-=-=-=-=
 -56									-512
--------------------------------------------- 32 byte cache line --------------
 -24									-480
struct skb_shared_info {
*0	unsigned char	nr_frags;					-456
*1	__u8		tx_flags;					-455
 2	unsigned short	gso_size;					-454
	/* Warning: this field is not always filled in (UFO)! */
 4	unsigned short	gso_segs;					-452
 6	unsigned short  gso_type;					-450
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 byte cache line =-=-=-=-=
*8	struct sk_buff	*frag_list;					-448
 16	struct skb_shared_hwtstamps hwtstamps;				-440
 32	__be32          ip6_frag_id;					-424

	/*
	 * Warning : all fields before dataref are cleared in __alloc_skb()
	 */
*36	atomic_t	dataref;					-420

	/* Intermediate layers must ensure that destructor_arg
	 * remains valid until skb destructor */
--------------------------------------------- 32 byte cache line --------------
 40	void *		destructor_arg;					-416

	/* must be last field, see pskb_expand_head() */
*48	skb_frag_t	frags[MAX_SKB_FRAGS];				-24*17
									= -408
									=  12.75x32
									or 6.375x64
									or 3.1875x128
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 & 128 byte cache line =-=
 456	= 14.25x32 or 7.125x64 or 3.5625x128				-0
};

skb patches + put destructor_arg first

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 byte cache line =-=-=-=-=
 -56									-512
--------------------------------------------- 32 byte cache line
 -24									-480
struct skb_shared_info {
	/* Intermediate layers must ensure that destructor_arg
	 * remains valid until skb destructor */
 0	void *		destructor_arg;					-456

	/* Warning all fields from here until dataref are cleared in __alloc_skb() */
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 byte cache line =-=-=-=-=
*8	unsigned char	nr_frags;					-448
*9	__u8		tx_flags;					-447
 10	unsigned short	gso_size;					-446
	/* Warning: this field is not always filled in (UFO)! */
 12	unsigned short	gso_segs;					-444
 14	unsigned short  gso_type;					-442
*16	struct sk_buff	*frag_list;					-440
 24	struct skb_shared_hwtstamps hwtstamps;				-432
--------------------------------------------- 32 byte cache line --------------
 40	__be32          ip6_frag_id;					-416

	/*
	 * Warning : all fields before dataref are cleared in __alloc_skb()
	 */
*44	atomic_t	dataref;					-412

	/* must be last field, see pskb_expand_head() */
*48	skb_frag_t	frags[MAX_SKB_FRAGS];				-24*17
									= -408
									=  12.75x32
									or 6.375x64
									or 3.1875x128
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 32 & 64 & 128 byte cache line =-=
}; 456 = 14.25x32 or 7.125x64 or 3.5625x128
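
The same working as a quick standalone C sketch (the offsets and the
456-byte size are copied from the reordered table above; the end-aligned
placement ending at 4096 is just an example):

#include <stdio.h>

int main(void)
{
	unsigned int cache = 64, size = 456;	/* sizeof(shinfo) per the table */
	unsigned int start = 4096 - size;	/* example end-aligned placement */
	const struct { const char *name; unsigned int off; } f[] = {
		{ "destructor_arg", 0 },  { "nr_frags", 8 },
		{ "tx_flags", 9 },        { "frag_list", 16 },
		{ "hwtstamps", 24 },      { "dataref", 44 },
		{ "frags[0]", 48 },       { "frags[1]", 72 },
	};
	unsigned int i;

	/* everything from nr_frags through frags[0] lands on one 64-byte
	 * line; only destructor_arg falls on the line before it */
	for (i = 0; i < sizeof(f) / sizeof(f[0]); i++)
		printf("%-14s cache line %u\n", f[i].name,
		       (start + f[i].off) / cache);
	return 0;
}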



Eric Dumazet Jan. 6, 2012, 12:33 p.m. UTC | #3
On Fri, 2012-01-06 at 11:20 +0000, Ian Campbell wrote:

> It doesn't fit in a single cache line today.

It really does, thanks to your (net: pack skb_shared_info more
efficiently) previous patch.

I don't understand your numbers; they are very hard to read.

Current net-next:

offsetof(struct skb_shared_info, nr_frags)=0x0
offsetof(struct skb_shared_info, frags[1])=0x40   (0x30 on 32bit arches)

So _all_ fields, including one frag, are included in a single cache line
on most machines (64-byte cache lines), IF struct skb_shared_info is
aligned.

Your patch obviously breaks this on 64-bit arches, unless you make sure
sizeof(struct skb_shared_info) is a multiple of the cache line size.

[BTW, it is incidentally the case after your 1/6 patch]

Field reordering is not going to change anything about this.

Or maybe I misread your patch ?

At least you claimed in the changelog:

<quote>
 Reducing this overhead means that sometimes the tail end of
 the data can end up in the same cache line as the beginning of the shinfo but
 in many cases the allocation slop means that there is no overlap.
</quote>



Ian Campbell Jan. 6, 2012, 1:20 p.m. UTC | #4
On Fri, 2012-01-06 at 12:33 +0000, Eric Dumazet wrote:
> On Fri, 2012-01-06 at 11:20 +0000, Ian Campbell wrote:
> 
> > It doesn't fit in a single cache line today.
> 
> It really does, thanks to your (net: pack skb_shared_info more
> efficiently) previous patch.
> 
> I don't understand your numbers; they are very hard to read.
> 
> Current net-next:
> 
> offsetof(struct skb_shared_info, nr_frags)=0x0
> offsetof(struct skb_shared_info, frags[1])=0x40   (0x30 on 32bit arches)

I see 0x48 here at cset 6386994e03ebbe60338ded3d586308a41e81c0dc:
(gdb) print &((struct skb_shared_info *)0)->nr_frags
$1 = (short unsigned int *) 0x0
(gdb) print &((struct skb_shared_info *)0)->frags[1]
$2 = (skb_frag_t *) 0x48
(gdb) print &((struct skb_shared_info *)0)->frags[0]
$3 = (skb_frag_t *) 0x38

(it's 0x34 and 0x2c on 32 bit) and these numbers match what I posted for
v3.1 (and I imagine earlier since this stuff doesn't seem to change very
often).

I provided the offsets of each field in struct skb_shared_info; which
one do you think is wrong?

Remember that the end of the shared info is explicitly aligned (to the
end of the allocation == a cache line) while the front just ends up
wherever the size dictates, so you can't measure the alignment of the
fields from the beginning of the struct; you need to measure from the end.

Ian.

> So _all_ fields, including one frag, are included in a single cache line
> on most machines (64-byte cache lines), IF struct skb_shared_info is
> aligned.
> 
> Your patch obviously breaks this on 64-bit arches, unless you make sure
> sizeof(struct skb_shared_info) is a multiple of the cache line size.
> 
> [BTW, it is incidentally the case after your 1/6 patch]
> 
> Field reordering is not going to change anything about this.
> 
> Or maybe I misread your patch ?
> 
> At least you claimed in the changelog:
> 
> <quote>
>  Reducing this overhead means that sometimes the tail end of
>  the data can end up in the same cache line as the beginning of the shinfo but
>  in many cases the allocation slop means that there is no overlap.
> </quote>
> 
> 
> 


Ian Campbell Jan. 6, 2012, 1:29 p.m. UTC | #5
On Fri, 2012-01-06 at 12:33 +0000, Eric Dumazet wrote:
> On Fri, 2012-01-06 at 11:20 +0000, Ian Campbell wrote:
> 
> > It doesn't fit in a single cache line today.
> 
> It really does, thanks to your (net: pack skb_shared_info more
> efficiently) previous patch.
> 
> I don't understand your numbers; they are very hard to read.
> 
> Current net-next:
> 
> offsetof(struct skb_shared_info, nr_frags)=0x0
> offsetof(struct skb_shared_info, frags[1])=0x40   (0x30 on 32bit arches)
> 
> So _all_ fields, including one frag, are included in a single cache line
> on most machines (64-byte cache lines),

BTW, this is also true with my patch + putting destructor_arg first in
the struct (at least for all the interesting fields, since I chose
destructor_arg specifically because it did not seem interesting for
these purposes -- do you disagree?)

(gdb) print &((struct skb_shared_info *)0)->frags[1]
$1 = (skb_frag_t *) 0x48
but there is a cache line boundary just before nr_frags:
(gdb) print &((struct skb_shared_info *)0)->nr_frags
$3 = (unsigned char *) 0x8

So the interesting fields total 0x48-0x8 = 0x40 bytes and the alignment
is such that this is a single cache line.

>  IF struct skb_shared_info is
> aligned.

Obviously the conditions for the above are a little different but they
are, AFAIK, met.

Ian.

> 
> Your patch obviously breaks this on 64-bit arches, unless you make sure
> sizeof(struct skb_shared_info) is a multiple of the cache line size.
> 
> [BTW, it is incidentally the case after your 1/6 patch]
> 
> Field reordering is not going to change anything about this.
> 
> Or maybe I misread your patch ?
> 
> At least you claimed in the changelog:
> 
> <quote>
>  Reducing this overhead means that sometimes the tail end of
>  the data can end up in the same cache line as the beginning of the shinfo but
>  in many cases the allocation slop means that there is no overlap.
> </quote>
> 
> 
> 


Eric Dumazet Jan. 6, 2012, 1:37 p.m. UTC | #6
On Fri, 2012-01-06 at 13:20 +0000, Ian Campbell wrote:
> On Fri, 2012-01-06 at 12:33 +0000, Eric Dumazet wrote:
> > On Fri, 2012-01-06 at 11:20 +0000, Ian Campbell wrote:
> > 
> > > It doesn't fit in a single cache line today.
> > 
> > It really does, thanks to your (net: pack skb_shared_info more
> > efficiently) previous patch.
> > 
> > I don't understand your numbers; they are very hard to read.
> > 
> > Current net-next:
> > 
> > offsetof(struct skb_shared_info, nr_frags)=0x0
> > offsetof(struct skb_shared_info, frags[1])=0x40   (0x30 on 32bit arches)
> 
> I see 0x48 here at cset 6386994e03ebbe60338ded3d586308a41e81c0dc:

If you read my mail, I said "current net-next".

Please "git pull" again?


Here I really have your commit 9f42f126154786e6e76df513004800c8c633f020
in.



> (gdb) print &((struct skb_shared_info *)0)->nr_frags
> $1 = (short unsigned int *) 0x0
> (gdb) print &((struct skb_shared_info *)0)->frags[1]
> $2 = (skb_frag_t *) 0x48
> (gdb) print &((struct skb_shared_info *)0)->frags[0]
> $3 = (skb_frag_t *) 0x38
> 
> (it's 0x34 and 0x2c on 32 bit) and these numbers match what I posted for
> v3.1 (and I imagine earlier since this stuff doesn't seem to change very
> often).
> 
> I provided the offsets of each field in struct skb_shared_info; which
> one do you think is wrong?
> 
> Remember that the end of the shared info is explicitly aligned (to the
> end of the allocation == a cache line) while the front just ends up
> wherever the size dictates, so you can't measure the alignment of the
> fields from the beginning of the struct; you need to measure from the end.

You're mistaken. Really.

Current code makes SURE skb->end starts at a cache line boundary
(assuming kmalloc() returned an aligned block; this might not be true
with SLAB debugging):

size = SKB_WITH_OVERHEAD(ksize(data));
...
skb->end = skb->tail + size;

So there is a possibility of padding (less than 64 bytes) _after_
skb_shared_info and before the end of the allocated area.

After your 9f42f1261547 commit on 64-bit, we have no padding anymore,
since sizeof(struct skb_shared_info)=0x140.
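
In other words (a quick sketch of the same computation; the 0x140 size is
the one above, and the ksize() return value is just an example):

#include <stdio.h>

#define SMP_CACHE_BYTES 64
#define SKB_DATA_ALIGN(X) (((X) + (SMP_CACHE_BYTES - 1)) & \
			   ~(SMP_CACHE_BYTES - 1))
#define SHINFO_SIZE 0x140	/* sizeof(struct skb_shared_info), see above */

int main(void)
{
	/* kmalloc rounds the request up to a slab size, e.g. 2048 */
	unsigned int ksize_ret = 2048;
	/* SKB_WITH_OVERHEAD: usable data size once an aligned shinfo is
	 * carved off the end of the allocation */
	unsigned int size = ksize_ret - SKB_DATA_ALIGN(SHINFO_SIZE);

	/* skb->end = skb->tail + size; 2048 - 320 = 1728 = 27 * 64, so
	 * shinfo starts on a cache line boundary whenever kmalloc
	 * returned an aligned block */
	printf("shinfo offset %u, mod 64 = %u\n", size, size % SMP_CACHE_BYTES);
	return 0;
}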



Ian Campbell Jan. 6, 2012, 1:49 p.m. UTC | #7
On Fri, 2012-01-06 at 13:37 +0000, Eric Dumazet wrote:
> On Fri, 2012-01-06 at 13:20 +0000, Ian Campbell wrote:
> > On Fri, 2012-01-06 at 12:33 +0000, Eric Dumazet wrote:
> > > On Fri, 2012-01-06 at 11:20 +0000, Ian Campbell wrote:
> > > 
> > > > It doesn't fit in a single cache line today.
> > > 
> > > It really does, thanks to your (net: pack skb_shared_info more
> > > efficiently) previous patch.
> > > 
> > > I don't understand your numbers; they are very hard to read.
> > > 
> > > Current net-next:
> > > 
> > > offsetof(struct skb_shared_info, nr_frags)=0x0
> > > offsetof(struct skb_shared_info, frags[1])=0x40   (0x30 on 32bit arches)
> > 
> > I see 0x48 here at cset 6386994e03ebbe60338ded3d586308a41e81c0dc:
> 
> If you read my mail, I said "current net-next".
> 
> Please "git pull" again?

Oh, oops, I thought I'd checked I was up to date.

> Here I really have your commit 9f42f126154786e6e76df513004800c8c633f020
> in.
> 
> 
> 
> > (gdb) print &((struct skb_shared_info *)0)->nr_frags
> > $1 = (short unsigned int *) 0x0
> > (gdb) print &((struct skb_shared_info *)0)->frags[1]
> > $2 = (skb_frag_t *) 0x48
> > (gdb) print &((struct skb_shared_info *)0)->frags[0]
> > $3 = (skb_frag_t *) 0x38
> > 
> > (it's 0x34 and 0x2c on 32 bit) and these numbers match what I posted for
> > v3.1 (and I imagine earlier since this stuff doesn't seem to change very
> > often).
> > 
> > I provided the offsets of each field in struct skb_shared_info; which
> > one do you think is wrong?
> > 
> > Remember that the end of the shared info is explicitly aligned (to the
> > end of the allocation == a cache line) while the front just ends up
> > wherever the size dictates, so you can't measure the alignment of the
> > fields from the beginning of the struct; you need to measure from the end.
> 
> You're mistaken. Really.
> 
> Current code makes SURE skb->end starts at a cache line boundary
> (assuming kmalloc() returned an aligned block; this might not be true
> with SLAB debugging):
> 
> size = SKB_WITH_OVERHEAD(ksize(data));
> ...
> skb->end = skb->tail + size;
> 
> So there is a possibility of padding (less than 64 bytes) _after_
> skb_shared_info and before the end of the allocated area.

Oh, I somehow missed the SKB_DATA_ALIGN inside SKB_WITH_OVERHEAD when I
was reasoning about the "before" case (even though I touched it in my
patch!); thanks for prodding me sufficiently...

I guess the comment preceding "size =" is therefore slightly misleading,
since shinfo isn't "exactly at the end" but rather "at the end but still
aligned"?
	/* kmalloc(size) might give us more room than requested.
	 * Put skb_shared_info exactly at the end of allocated zone,
	 * to allow max possible filling before reallocation.
	 */

Anyhow, despite this I think that with the patch to move destructor_arg
to the front on top of this series the hot fields are all in a single
cache line, including the first frag. Is that sufficient to alleviate
your concerns?

Ian.

> After your 9f42f1261547 commit on 64-bit, we have no padding anymore,
> since sizeof(struct skb_shared_info)=0x140.
> 
> 
> 



Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index cbc815c..533c2ea 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -249,6 +249,11 @@  struct ubuf_info {
  * the end of the header data, ie. at skb->end.
  */
 struct skb_shared_info {
+	/* Intermediate layers must ensure that destructor_arg
+	 * remains valid until skb destructor */
+	void *		destructor_arg;
+
+	/* Warning all fields from here until dataref are cleared in __alloc_skb() */
 	unsigned char	nr_frags;
 	__u8		tx_flags;
 	unsigned short	gso_size;
@@ -264,9 +269,6 @@  struct skb_shared_info {
 	 */
 	atomic_t	dataref;
 
-	/* Intermediate layers must ensure that destructor_arg
-	 * remains valid until skb destructor */
-	void *		destructor_arg;
 
 	/* must be last field, see pskb_expand_head() */
 	skb_frag_t	frags[MAX_SKB_FRAGS];
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e942a86..592b8a8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -219,7 +219,10 @@  struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 
 	/* make sure we initialize shinfo sequentially */
 	shinfo = skb_shinfo(skb);
-	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
+
+	memset(&shinfo->nr_frags, 0,
+	       offsetof(struct skb_shared_info, dataref)
+	       - offsetof(struct skb_shared_info, nr_frags));
 	atomic_set(&shinfo->dataref, 1);
 	kmemcheck_annotate_variable(shinfo->destructor_arg);
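
The memset hunk above clears only the range from nr_frags up to (but not
including) dataref, leaving the relocated destructor_arg in front
untouched. A minimal standalone sketch of the same offsetof() idiom (mock
struct for illustration, not the kernel one):

#include <stdio.h>
#include <string.h>
#include <stddef.h>

struct mock_shinfo {
	void          *destructor_arg;	/* in front, skipped by the clear */
	unsigned char  nr_frags;	/* first cleared field */
	unsigned short gso_size;
	int            dataref;		/* first field the memset must not touch */
};

int main(void)
{
	struct mock_shinfo s = { (void *)&s, 3, 1500, 99 };

	/* clear [nr_frags, dataref) in one call, as in __alloc_skb() above */
	memset(&s.nr_frags, 0,
	       offsetof(struct mock_shinfo, dataref)
	       - offsetof(struct mock_shinfo, nr_frags));

	printf("destructor_arg=%p nr_frags=%u gso_size=%u dataref=%d\n",
	       s.destructor_arg, (unsigned)s.nr_frags,
	       (unsigned)s.gso_size, s.dataref);
	/* nr_frags and gso_size are now 0; destructor_arg and dataref
	 * keep their values */
	return 0;
}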