[05/10] net: move destructor_arg to the front of sk_buff.

Message ID: 1334067984-7706-5-git-send-email-ian.campbell@citrix.com
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Ian Campbell April 10, 2012, 2:26 p.m. UTC
As of the previous patch we align the end (rather than the start) of the struct
to a cache line. With 32 or 64 byte cache lines and the shinfo size increase
from the next patch, the first 8 bytes of the struct therefore end up on a
different cache line from the rest of it. Make sure those bytes hold something
relatively unimportant, so that hot operations such as kfree_skb do not touch
the extra cache line.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/skbuff.h |   15 ++++++++++-----
 net/core/skbuff.c      |    5 ++++-
 2 files changed, 14 insertions(+), 6 deletions(-)

Comments

Eric Dumazet April 10, 2012, 3:05 p.m. UTC | #1
On Tue, 2012-04-10 at 15:26 +0100, Ian Campbell wrote:
> As of the previous patch we align the end (rather than the start) of the struct
> to a cache line and so, with 32 and 64 byte cache lines and the shinfo size
> increase from the next patch, the first 8 bytes of the struct end up on a
> different cache line to the rest of it so make sure it is something relatively
> unimportant to avoid hitting an extra cache line on hot operations such as
> kfree_skb.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  include/linux/skbuff.h |   15 ++++++++++-----
>  net/core/skbuff.c      |    5 ++++-
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 0ad6a46..f0ae39c 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -265,6 +265,15 @@ struct ubuf_info {
>   * the end of the header data, ie. at skb->end.
>   */
>  struct skb_shared_info {
> +	/* Intermediate layers must ensure that destructor_arg
> +	 * remains valid until skb destructor */
> +	void		*destructor_arg;
> +
> +	/*
> +	 * Warning: all fields from here until dataref are cleared in
> +	 * __alloc_skb()
> +	 *
> +	 */
>  	unsigned char	nr_frags;
>  	__u8		tx_flags;
>  	unsigned short	gso_size;
> @@ -276,14 +285,10 @@ struct skb_shared_info {
>  	__be32          ip6_frag_id;
>  
>  	/*
> -	 * Warning : all fields before dataref are cleared in __alloc_skb()
> +	 * Warning: all fields before dataref are cleared in __alloc_skb()
>  	 */
>  	atomic_t	dataref;
>  
> -	/* Intermediate layers must ensure that destructor_arg
> -	 * remains valid until skb destructor */
> -	void *		destructor_arg;
> -
>  	/* must be last field, see pskb_expand_head() */
>  	skb_frag_t	frags[MAX_SKB_FRAGS];
>  };
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index d4e139e..b8a41d6 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -214,7 +214,10 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  
>  	/* make sure we initialize shinfo sequentially */
>  	shinfo = skb_shinfo(skb);
> -	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
> +
> +	memset(&shinfo->nr_frags, 0,
> +	       offsetof(struct skb_shared_info, dataref)
> +	       - offsetof(struct skb_shared_info, nr_frags));
>  	atomic_set(&shinfo->dataref, 1);
>  	kmemcheck_annotate_variable(shinfo->destructor_arg);
>  

Not sure if we can do the same in build_skb()


Ian Campbell April 10, 2012, 3:19 p.m. UTC | #2
On Tue, 2012-04-10 at 16:05 +0100, Eric Dumazet wrote:
> On Tue, 2012-04-10 at 15:26 +0100, Ian Campbell wrote:
> > [...]
> 
> Not sure if we can do the same in build_skb()

I don't think there's any chance of there being a destructor_arg to
preserve in that case?

Regardless of that, though, I think for consistency it would be worth
pulling the common shinfo init out into a helper and using it in both
places.

I'll make that change.

Ian.
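
For illustration, the common init helper Ian describes might look something
like this (a sketch only: the helper name is hypothetical, and the body simply
combines the lines from the posted patch):

static void skb_shared_info_init(struct skb_shared_info *shinfo)
{
	/* Clear everything from nr_frags up to, but not including,
	 * dataref; destructor_arg, now at the front of the struct,
	 * is deliberately left untouched. */
	memset(&shinfo->nr_frags, 0,
	       offsetof(struct skb_shared_info, dataref) -
	       offsetof(struct skb_shared_info, nr_frags));
	atomic_set(&shinfo->dataref, 1);
	kmemcheck_annotate_variable(shinfo->destructor_arg);
}

Both __alloc_skb() and build_skb() could then share this, keeping the two
init paths from drifting apart.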

Duyck, Alexander H April 10, 2012, 6:33 p.m. UTC | #3
On 04/10/2012 07:26 AM, Ian Campbell wrote:
> [...]
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index d4e139e..b8a41d6 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -214,7 +214,10 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  
>  	/* make sure we initialize shinfo sequentially */
>  	shinfo = skb_shinfo(skb);
> -	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
> +
> +	memset(&shinfo->nr_frags, 0,
> +	       offsetof(struct skb_shared_info, dataref)
> +	       - offsetof(struct skb_shared_info, nr_frags));
>  	atomic_set(&shinfo->dataref, 1);
>  	kmemcheck_annotate_variable(shinfo->destructor_arg);
>  

Have you checked this for 32 bit as well as 64? Based on my math, your next
patch will still mess up the memset on 32 bit, with the structure being split
somewhere just in front of hwtstamps.

Why not just take frags and move it to the start of the structure? Its size is
already variable, since it holds either 16 or 17 entries depending on the
value of PAGE_SIZE, and because you are aligning the end of the structure,
moving frags wouldn't impact the alignment of the values later on. That way
you would be guaranteed that all of the fields to be memset are in the last
64 bytes.

Thanks,

Alex
Eric Dumazet April 10, 2012, 6:41 p.m. UTC | #4
On Tue, 2012-04-10 at 11:33 -0700, Alexander Duyck wrote:

> Have you checked this for 32 bit as well as 64?  Based on my math your
> next patch will still mess up the memset on 32 bit with the structure
> being split somewhere just in front of hwtstamps.
> 
> Why not just take frags and move it to the start of the structure?  It
> is already an unknown value because it can be either 16 or 17 depending
> on the value of PAGE_SIZE, and since you are making changes to frags the
> changes wouldn't impact the alignment of the other values later on since
> you are aligning the end of the structure.  That way you would be
> guaranteed that all of the fields that will be memset would be in the
> last 64 bytes.
> 

With that layout, when a fragmented packet is copied in pskb_expand_head(),
you access two separate zones of memory to copy the shinfo. But that is
supposed to be a slow path.

The problem with this is that the offsets of often-used fields would become
large (no longer < 127, so no longer encodable as a one-byte displacement)
and the code would be bigger on x86.



Duyck, Alexander H April 10, 2012, 7:15 p.m. UTC | #5
On 04/10/2012 11:41 AM, Eric Dumazet wrote:
> On Tue, 2012-04-10 at 11:33 -0700, Alexander Duyck wrote:
>
>> [...]
> Now when a fragmented packet is copied in pskb_expand_head(), you access
> two separate zones of memory to copy the shinfo. But its supposed to be
> slow path.
>
> Problem with this is that the offsets of often used fields will be big
> (instead of being < 127) and code will be bigger on x86.

Actually, now that I think about it, my concerns go much further than the
memset. I'm convinced that this is going to cause a pretty significant
performance regression in multiple drivers, especially on non-x86_64
architectures. What we have right now on most platforms is a skb_shared_info
structure in which everything up to and including frag 0 is in one cache
line. This gives us pretty good performance for igb and ixgbe, since our
common case when jumbo frames are not enabled is to split the head and place
the data in a page.

However, the change being recommended here only resolves the issue for one
specific architecture, and that is what I don't agree with. What we need is a
solution that also works with 64K pages or 32 bit pointers, and I am fairly
certain the current solution does not.

Thanks,

Alex


Ian Campbell April 11, 2012, 7:56 a.m. UTC | #6
On Tue, 2012-04-10 at 19:33 +0100, Alexander Duyck wrote:
> On 04/10/2012 07:26 AM, Ian Campbell wrote:
> > [...]
> 
> Have you checked this for 32 bit as well as 64?  Based on my math your
> next patch will still mess up the memset on 32 bit with the structure
> being split somewhere just in front of hwtstamps.

You mean 32 byte cache lines? If so then yes, there is a split halfway through
the structure in that case, but there is no way all this data could ever fit
in a single 32 byte cache line: not including the frags or destructor_arg,
the region from nr_frags up to and including dataref is 36 bytes on 32 bit
and 40 bytes on 64 bit. I've not changed anything in this respect.

If you meant 64 byte cache lines with 32 bit structure sizes, then by my
calculations everything from destructor_arg (in fact a bit earlier, from 12
bytes before it) up to and including frag[0] is on the same 64 byte cache
line.

I find the easiest way to check is to use gdb and open-code an offsetof
macro:

(gdb) print/d sizeof(struct skb_shared_info) - (unsigned long)&(((struct skb_shared_info *)0)->nr_frags)
$3 = 240
(gdb) print/d sizeof(struct skb_shared_info) - (unsigned long)&(((struct skb_shared_info *)0)->frags[1])
$4 = 192

So given 64 byte cache lines, the interesting area starts 240/64 = 3.75 cache
lines from the (aligned) end and finishes just before 192/64 = 3 cache lines
from the end; nr_frags through frags[0] are therefore on the same cache line.
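
The same check could be captured at compile time. A sketch (not part of the
posted series; the macro name is made up, and BUILD_BUG_ON would need to sit
inside a function):

/* distance from the start of 'field' to the (aligned) end of shinfo */
#define SHINFO_FROM_END(field) \
	(sizeof(struct skb_shared_info) - \
	 offsetof(struct skb_shared_info, field))

/* nr_frags and the last byte of frags[0] must fall within the same
 * 64 byte line, counting back from the aligned end of the struct */
BUILD_BUG_ON(DIV_ROUND_UP(SHINFO_FROM_END(nr_frags), 64) !=
	     DIV_ROUND_UP(SHINFO_FROM_END(frags[1]) + 1, 64));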

Ian.


Ian Campbell April 11, 2012, 8 a.m. UTC | #7
On Tue, 2012-04-10 at 20:15 +0100, Alexander Duyck wrote:
> [...]
> 
> Actually now that I think about it my concerns go much further than the
> memset.  I'm convinced that this is going to cause a pretty significant
> performance regression on multiple drivers, especially on non x86_64
> architecture.  What we have right now on most platforms is a
> skb_shared_info structure in which everything up to and including frag 0
> is all in one cache line.  This gives us pretty good performance for igb
> and ixgbe since that is our common case when jumbo frames are not
> enabled is to split the head and place the data in a page.

With all the changes in this series it is still possible to fit a maximum
standard MTU frame and the shinfo on the same 4K page, while also having the
skb_shared_info up to and including frag[0] aligned to the same 64 byte cache
line.

The only exception is destructor_arg on 64 bit, which is on the preceding
cache line, but that is not a field used in any hot path.

> However the change being recommend here only resolves the issue for one
> specific architecture, and that is what I don't agree with.  What we
> need is a solution that also works for 64K pages or 32 bit pointers and
> I am fairly certain this current solution does not.

I think it does work for 32 bit pointers. What issue do you see with 64K
pages?

Ian.


Eric Dumazet April 11, 2012, 8:20 a.m. UTC | #8
On Tue, 2012-04-10 at 12:15 -0700, Alexander Duyck wrote:

> 
> Actually now that I think about it my concerns go much further than the
> memset.  I'm convinced that this is going to cause a pretty significant
> performance regression on multiple drivers, especially on non x86_64
> architecture.  What we have right now on most platforms is a
> skb_shared_info structure in which everything up to and including frag 0
> is all in one cache line.  This gives us pretty good performance for igb
> and ixgbe since that is our common case when jumbo frames are not
> enabled is to split the head and place the data in a page.

I don't understand this split thing for MTU=1500 frames.

Even using half a page per fragment, each skb needs 2 allocations (for the
sk_buff and skb->head) plus one page allocation/reference:

skb->truesize = ksize(skb->head) + sizeof(*skb) + PAGE_SIZE/2
              = 512 + 256 + 2048 = 2816 bytes

With no split you have 2 allocations (for the sk_buff and skb->head):

skb->truesize = ksize(skb->head) + sizeof(*skb)
              = 2048 + 256 = 2304 bytes

That is less overhead and fewer calls to the page allocator. The split can
only be a win when GRO is on, since aggregation can then use fragments and a
single sk_buff instead of a frag_list.


Duyck, Alexander H April 11, 2012, 4:05 p.m. UTC | #9
On 04/11/2012 01:20 AM, Eric Dumazet wrote:
> On Tue, 2012-04-10 at 12:15 -0700, Alexander Duyck wrote:
>
>> [...]
> I dont understand this split thing for MTU=1500 frames.
>
> Even using half a page per fragment, each skb :
>
> needs 2 allocations for sk_buff and skb->head, plus one page alloc /
> reference.
>
> skb->truesize = ksize(skb->head) + sizeof(*skb) + PAGE_SIZE/2 = 512 +
> 256 + 2048 = 2816 bytes
The number you provide for head is currently only available for 128 byte skb
allocations; anything larger than that will generate a 1K allocation. Also,
after all of these patches the smallest size you can allocate will be 1K for
anything under 504 bytes.

The size advantage is actually greater for smaller frames. In the case of igb
the behaviour is to place anything less than 512 bytes into just the header
and to skip using the page. As such we get a much more ideal allocation for
small packets, since the truesize is only 1280 in that case.

In the case of ixgbe the advantage is mostly about cache misses. ixgbe only
receives data into pages now, so I can prefetch the first two cache lines of
the page into memory while allocating the skb to receive it. As such we
essentially cut the number of cache misses in half versus the old approach,
which had us generating cache misses on the sk_buff during allocation and
then generating more misses once we received the buffer and could fill out
the sk_buff fields. A similar size advantage exists as well, but only for
frames of 256 bytes or smaller.
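
The receive pattern Alex describes looks roughly like this (a simplified
sketch, not the actual ixgbe code; rx_buffer, rx_ring and hdr_size are
illustrative names):

/* warm the first two cache lines of the received page while the
 * skb allocation (with its own cache misses) is still in flight */
prefetch(page_address(rx_buffer->page) + rx_buffer->page_offset);
#if L1_CACHE_BYTES < 128
prefetch(page_address(rx_buffer->page) + rx_buffer->page_offset +
	 L1_CACHE_BYTES);
#endif

skb = netdev_alloc_skb_ip_align(rx_ring->netdev, hdr_size);
/* headers are then pulled into skb->head and the remainder of the
 * page is attached as frag 0 */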

> With non split you have :
>
> 2 allocations for sk_buff and skb->head.
>
> skb->truesize = ksize(skb->head) + sizeof(*skb) = 2048 + 256 = 2304
> bytes
>
> less overhead and less calls to page allocator...
>
> This only can benefit if GRO is on, since aggregation can use fragments
> and a single sk_buff, instead of a frag_list
There is much more than truesize involved here. My main argument is that if
we are going to lay out this modified skb_shared_info so that it is
cache-aligned at nr_frags, we should do it on all architectures, not just
x86_64.

Thanks,

Alex
Duyck, Alexander H April 11, 2012, 4:31 p.m. UTC | #10
On 04/11/2012 01:00 AM, Ian Campbell wrote:
> [...]
> With all the changes in this series it is still possible to fit a
> maximum standard MTU frame and the shinfo on the same 4K page while also
> have the skb_shared_info up to and including frag [0] aligned to the
> same 64 byte cache line. 
>
> The only exception is destructor_arg on 64 bit which is on the preceding
> cache line but that is not a field used in any hot path.
The problem I have is that this is only true on x86_64. Proper work hasn't
been done to guarantee it on any other architecture.

Instead of just setting things up and hoping they come out cache-aligned at
nr_frags, why not take steps to guarantee it? You could do something like
place and size the structure based on:

SKB_DATA_ALIGN(sizeof(struct skb_shared_info) -
               offsetof(struct skb_shared_info, nr_frags)) +
offsetof(struct skb_shared_info, nr_frags)

That way your alignment would still be guaranteed based on the end of the
structure, but anything placed before nr_frags would land at the end of the
previous cache line.
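
In code, the reservation Alex is suggesting might look like this (a sketch;
the macro names are made up):

/* bytes from nr_frags to the end of the struct: the part that must
 * stay within the aligned tail of the allocation */
#define SHINFO_TAIL \
	(sizeof(struct skb_shared_info) - \
	 offsetof(struct skb_shared_info, nr_frags))

/* reserve an aligned tail plus room for the fields before nr_frags,
 * which then spill onto the previous cache line */
#define SHINFO_RESERVE \
	(SKB_DATA_ALIGN(SHINFO_TAIL) + \
	 offsetof(struct skb_shared_info, nr_frags))

With the end of the buffer cache-line aligned, SKB_DATA_ALIGN(SHINFO_TAIL)
puts nr_frags on a cache line boundary regardless of MAX_SKB_FRAGS, pointer
size or PAGE_SIZE.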

>> However the change being recommend here only resolves the issue for one
>> specific architecture, and that is what I don't agree with.  What we
>> need is a solution that also works for 64K pages or 32 bit pointers and
>> I am fairly certain this current solution does not.
> I think it does work for 32 bit pointers. What issue to do you see with
> 64K pages?
>
> Ian.
With 64K pages the MAX_SKB_FRAGS value drops from 17 to 16.  That will
undoubtedly mess up the alignment.

Thanks,

Alex
Ian Campbell April 11, 2012, 5 p.m. UTC | #11
On Wed, 2012-04-11 at 17:31 +0100, Alexander Duyck wrote:
> On 04/11/2012 01:00 AM, Ian Campbell wrote:
> > [...]
> > With all the changes in this series it is still possible to fit a
> > maximum standard MTU frame and the shinfo on the same 4K page while also
> > have the skb_shared_info up to and including frag [0] aligned to the
> > same 64 byte cache line. 
> >
> > The only exception is destructor_arg on 64 bit which is on the preceding
> > cache line but that is not a field used in any hot path.
> The problem I have is that this is only true on x86_64.  Proper work
> hasn't been done to guarantee this on any other architectures.

FWIW I did also explicitly cover i386 (see
<1334130984.12209.195.camel@dagon.hellion.org.uk>)

> I think what I would like to see is instead of just setting things up
> and hoping it comes out cache aligned on nr_frags why not take steps to
> guarantee it?  You could do something like place and size the structure
> based on:
> SKB_DATA_ALIGN(sizeof(skb_shared_info) - offsetof(struct
> skb_shared_info, nr_frags)) + offsetof(struct skb_shared_info, nr_frags)
> 
> That way you would have your alignment still guaranteed based off of the
> end of the structure, but anything placed before nr_frags would be
> placed on the end of the previous cache line.
> 
> >> However the change being recommend here only resolves the issue for one
> >> specific architecture, and that is what I don't agree with.  What we
> >> need is a solution that also works for 64K pages or 32 bit pointers and
> >> I am fairly certain this current solution does not.
> > I think it does work for 32 bit pointers. What issue to do you see with
> > 64K pages?
> >
> > Ian.
> With 64K pages the MAX_SKB_FRAGS value drops from 17 to 16.  That will
> undoubtedly mess up the alignment.

Oh, I see. I need to think about this some more, but your suggestion above is
an interesting one. I'll see what I can do with it.

Ian.


Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0ad6a46..f0ae39c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -265,6 +265,15 @@  struct ubuf_info {
  * the end of the header data, ie. at skb->end.
  */
 struct skb_shared_info {
+	/* Intermediate layers must ensure that destructor_arg
+	 * remains valid until skb destructor */
+	void		*destructor_arg;
+
+	/*
+	 * Warning: all fields from here until dataref are cleared in
+	 * __alloc_skb()
+	 *
+	 */
 	unsigned char	nr_frags;
 	__u8		tx_flags;
 	unsigned short	gso_size;
@@ -276,14 +285,10 @@  struct skb_shared_info {
 	__be32          ip6_frag_id;
 
 	/*
-	 * Warning : all fields before dataref are cleared in __alloc_skb()
+	 * Warning: all fields before dataref are cleared in __alloc_skb()
 	 */
 	atomic_t	dataref;
 
-	/* Intermediate layers must ensure that destructor_arg
-	 * remains valid until skb destructor */
-	void *		destructor_arg;
-
 	/* must be last field, see pskb_expand_head() */
 	skb_frag_t	frags[MAX_SKB_FRAGS];
 };
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d4e139e..b8a41d6 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -214,7 +214,10 @@  struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 
 	/* make sure we initialize shinfo sequentially */
 	shinfo = skb_shinfo(skb);
-	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
+
+	memset(&shinfo->nr_frags, 0,
+	       offsetof(struct skb_shared_info, dataref)
+	       - offsetof(struct skb_shared_info, nr_frags));
 	atomic_set(&shinfo->dataref, 1);
 	kmemcheck_annotate_variable(shinfo->destructor_arg);