From patchwork Tue Dec 6 11:57:14 2011
X-Patchwork-Submitter: Ian Campbell
X-Patchwork-Id: 129637
X-Patchwork-Delegate: davem@davemloft.net
Subject: Re: [PATCH 0/4] skb paged fragment destructors
From: Ian Campbell
To: David Miller
CC: Jesse Brandeburg
Date: Tue, 6 Dec 2011 11:57:14 +0000
In-Reply-To: <1320850895.955.172.camel@zakaz.uk.xensource.com>
References: <1320850895.955.172.camel@zakaz.uk.xensource.com>
Organization: Citrix Systems, Inc.
Message-ID: <1323172634.23681.73.camel@zakaz.uk.xensource.com>
X-Mailing-List: netdev@vger.kernel.org

On Wed, 2011-11-09 at 15:01 +0000, Ian Campbell wrote:
> * split linear data allocation and shinfo allocation into two. I
>   suspect this will have its own performance implications? On the
>   positive side skb_shared_info could come from its own fixed size
>   pool/cache which might have some benefits

I played with this to see how it would look. Illustrative patch below.

I figure that lots of small frames is the interesting workload for a
change such as this but I don't know if iperf is necessarily the best
benchmark for measuring that.

Before changing things I got:

iperf -c qarun -m -t 60 -u -b 10000M -l 64
------------------------------------------------------------
Client connecting to qarun, UDP port 5001
Sending 64 byte datagrams
UDP buffer size:  224 KByte (default)
------------------------------------------------------------
[  3] local 10.80.225.63 port 45857 connected with 10.80.224.22 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-60.0 sec    844 MBytes    118 Mbits/sec
[  3] Sent 13820376 datagrams
[  3] Server Report:
[  3]  0.0-60.0 sec    844 MBytes    118 Mbits/sec   0.005 ms    0/13820375 (0%)
[  3]  0.0-60.0 sec  1 datagrams received out-of-order

whereas with the patch:

# iperf -c qarun -m -t 60 -u -b 10000M -l 64
------------------------------------------------------------
Client connecting to qarun, UDP port 5001
Sending 64 byte datagrams
UDP buffer size:  224 KByte (default)
------------------------------------------------------------
[  3] local 10.80.225.63 port 42504 connected with 10.80.224.22 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-60.0 sec    833 MBytes    116 Mbits/sec
[  3] Sent 13645857 datagrams
[  3] Server Report:
[  3]  0.0-60.0 sec    833 MBytes    116 Mbits/sec   0.005 ms    0/13645856 (0%)
[  3]  0.0-60.0 sec  1 datagrams received out-of-order

With 1200 byte datagrams I get basically identical throughput.

(nb: none of the skb destructor stuff was present in either case)
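To make the shape of the change clear before the full diff (this is only a
sketch of the relevant lines; the real thing is in the patch below):
currently __alloc_skb() tacks the skb_shared_info onto the end of the head
allocation, whereas with the patch it comes out of a dedicated slab cache
and the skb carries a pointer to it:

	/* before: shinfo lives at the tail of the kmalloc()ed head */
	size = SKB_DATA_ALIGN(size);
	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
	data = kmalloc_node_track_caller(size, gfp_mask, node);
	/* ... and skb_shinfo(skb) is computed from skb_end_pointer(skb) */

	/* after: shinfo comes from its own kmem_cache */
	shinfo = kmem_cache_alloc_node(skbuff_shinfo_cache,
				       gfp_mask & ~__GFP_DMA, node);
	data = kmalloc_node_track_caller(SKB_DATA_ALIGN(size), gfp_mask, node);
	skb->shinfo = shinfo;	/* skb_shinfo(skb) now just returns this */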
> * steal a bit in a pointer to indicate destructor pointer vs regular
>   struct page pointer (moving the struct page into the destructor
>   data structure for that case). Stops us sharing a single
>   destructor between multiple pages, but that isn't critical

I also implemented this and, compared to the above, the patch is pretty
fugly, especially in its impact on the SUNRPC patches which actually use
the feature. It needs a bit more cleanup before I can post it for
comparison though. (There is a rough sketch of the idea after the patch
below.)

Ian.

 drivers/net/ethernet/broadcom/bnx2.c        |    5 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h |    4 +-
 drivers/net/ethernet/broadcom/tg3.c         |    3 +-
 include/linux/skbuff.h                      |    9 ++-
 net/core/skbuff.c                           |   67 ++++++++++++++++----------
 5 files changed, 51 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 83d8cef..4e37e05 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -5320,8 +5320,7 @@ bnx2_set_rx_ring_size(struct bnx2 *bp, u32 size)

 	/* 8 for CRC and VLAN */
 	rx_size = bp->dev->mtu + ETH_HLEN + BNX2_RX_OFFSET + 8;

-	rx_space = SKB_DATA_ALIGN(rx_size + BNX2_RX_ALIGN) + NET_SKB_PAD +
-		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	rx_space = SKB_DATA_ALIGN(rx_size + BNX2_RX_ALIGN) + NET_SKB_PAD;

 	bp->rx_copy_thresh = BNX2_RX_COPY_THRESH;
 	bp->rx_pg_ring_size = 0;
@@ -5345,7 +5344,7 @@ bnx2_set_rx_ring_size(struct bnx2 *bp, u32 size)
 	bp->rx_buf_use_size = rx_size;
 	/* hw alignment + build_skb() overhead*/
 	bp->rx_buf_size = SKB_DATA_ALIGN(bp->rx_buf_use_size + BNX2_RX_ALIGN) +
-		NET_SKB_PAD + SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+		NET_SKB_PAD;
 	bp->rx_jumbo_thresh = rx_size - BNX2_RX_OFFSET;
 	bp->rx_ring_size = size;
 	bp->rx_max_ring = bnx2_find_max_ring(size, MAX_RX_RINGS);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 0f7b7a4..b3de327 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1203,9 +1203,7 @@ struct bnx2x {
 	 */
 #define BNX2X_FW_RX_ALIGN_START	(1UL << BNX2X_RX_ALIGN_SHIFT)

-#define BNX2X_FW_RX_ALIGN_END					\
-	max(1UL << BNX2X_RX_ALIGN_SHIFT, 			\
-	    SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+#define BNX2X_FW_RX_ALIGN_END	(1UL << BNX2X_RX_ALIGN_SHIFT)

 #define BNX2X_PXP_DRAM_ALIGN		(BNX2X_RX_ALIGN_SHIFT - 5)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index d9e9c8c..dbea57b 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -5426,8 +5426,7 @@ static int tg3_alloc_rx_data(struct tg3 *tp, struct tg3_rx_prodring_set *tpr,
 	 * Callers depend upon this behavior and assume that
 	 * we leave everything unchanged if we fail.
 	 */
-	skb_size = SKB_DATA_ALIGN(data_size + TG3_RX_OFFSET(tp)) +
-		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	skb_size = SKB_DATA_ALIGN(data_size + TG3_RX_OFFSET(tp));
 	data = kmalloc(skb_size, GFP_ATOMIC);
 	if (!data)
 		return -ENOMEM;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 09b7ea5..48e91a0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -40,8 +40,7 @@

 #define SKB_DATA_ALIGN(X)	(((X) + (SMP_CACHE_BYTES - 1)) & \
 				 ~(SMP_CACHE_BYTES - 1))
-#define SKB_WITH_OVERHEAD(X)	\
-	((X) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+#define SKB_WITH_OVERHEAD(X)	((X))
 #define SKB_MAX_ORDER(X, ORDER) \
 	SKB_WITH_OVERHEAD((PAGE_SIZE << (ORDER)) - (X))
 #define SKB_MAX_HEAD(X)		(SKB_MAX_ORDER((X), 0))
@@ -476,6 +475,7 @@ struct sk_buff {
 	sk_buff_data_t		end;
 	unsigned char		*head,
 				*data;
+	struct skb_shared_info	*shinfo;
 	unsigned int		truesize;
 	atomic_t		users;
 };
@@ -634,7 +634,10 @@ static inline unsigned char *skb_end_pointer(const struct sk_buff *skb)
 #endif

 /* Internal */
-#define skb_shinfo(SKB)	((struct skb_shared_info *)(skb_end_pointer(SKB)))
+static inline struct skb_shared_info *skb_shinfo(const struct sk_buff *skb)
+{
+	return skb->shinfo;
+}

 static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7dc05ec..cfbe8e5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -73,6 +73,7 @@

 static struct kmem_cache *skbuff_head_cache __read_mostly;
 static struct kmem_cache *skbuff_fclone_cache __read_mostly;
+static struct kmem_cache *skbuff_shinfo_cache __read_mostly;

 static void sock_pipe_buf_release(struct pipe_inode_info *pipe,
 				  struct pipe_buffer *buf)
@@ -118,7 +119,7 @@ static const struct pipe_buf_operations sock_pipe_buf_ops = {
 *
 *	Out of line support code for skb_put(). Not user callable.
 */
-static void skb_over_panic(struct sk_buff *skb, int sz, void *here)
+static void skb_over_panic(struct sk_buff*skb, int sz, void *here)
 {
 	printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p "
 			  "data:%p tail:%#lx end:%#lx dev:%s\n",
@@ -184,22 +185,21 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 		goto out;
 	prefetchw(skb);

+	shinfo = kmem_cache_alloc_node(skbuff_shinfo_cache,
+				       gfp_mask & ~__GFP_DMA, node);
+	if (!shinfo)
+		goto noshinfo;
+	prefetchw(shinfo);
+
 	/* We do our best to align skb_shared_info on a separate cache
 	 * line. It usually works because kmalloc(X > SMP_CACHE_BYTES) gives
 	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
 	 * Both skb->head and skb_shared_info are cache line aligned.
 	 */
 	size = SKB_DATA_ALIGN(size);
-	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	data = kmalloc_node_track_caller(size, gfp_mask, node);
 	if (!data)
 		goto nodata;
-	/* kmalloc(size) might give us more room than requested.
-	 * Put skb_shared_info exactly at the end of allocated zone,
-	 * to allow max possible filling before reallocation.
-	 */
-	size = SKB_WITH_OVERHEAD(ksize(data));
-	prefetchw(data + size);

 	/*
 	 * Only clear those fields we need to clear, not those that we will
@@ -210,6 +210,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	/* Account for allocated memory : skb + skb->head */
 	skb->truesize = SKB_TRUESIZE(size);
 	atomic_set(&skb->users, 1);
+	skb->shinfo = shinfo;
 	skb->head = data;
 	skb->data = data;
 	skb_reset_tail_pointer(skb);
@@ -219,7 +220,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 #endif

 	/* make sure we initialize shinfo sequentially */
-	shinfo = skb_shinfo(skb);
+	BUG_ON(shinfo != skb_shinfo(skb));
 	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
 	atomic_set(&shinfo->dataref, 1);
 	kmemcheck_annotate_variable(shinfo->destructor_arg);
@@ -238,6 +239,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 out:
 	return skb;
 nodata:
+	kmem_cache_free(skbuff_shinfo_cache, shinfo);
+noshinfo:
 	kmem_cache_free(cache, skb);
 	skb = NULL;
 	goto out;
@@ -254,8 +257,8 @@ EXPORT_SYMBOL(__alloc_skb);
  * On a failure the return is %NULL, and @data is not freed.
  * Notes :
  *  Before IO, driver allocates only data buffer where NIC put incoming frame
- *  Driver should add room at head (NET_SKB_PAD) and
- *  MUST add room at tail (SKB_DATA_ALIGN(skb_shared_info))
+ *  Driver should add room at head (NET_SKB_PAD) and need not
+ *  add room at tail (SKB_DATA_ALIGN(skb_shared_info))
  *  After IO, driver calls build_skb(), to allocate sk_buff and populate it
  *  before giving packet to stack.
  *  RX rings only contains data buffers, not full skbs.
@@ -270,11 +273,17 @@ struct sk_buff *build_skb(void *data)
 	if (!skb)
 		return NULL;

-	size = ksize(data) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	shinfo = kmem_cache_alloc(skbuff_shinfo_cache, GFP_ATOMIC);
+	if (!shinfo)
+		goto noshinfo;
+	prefetchw(shinfo);
+
+	size = ksize(data);

 	memset(skb, 0, offsetof(struct sk_buff, tail));
 	skb->truesize = SKB_TRUESIZE(size);
 	atomic_set(&skb->users, 1);
+	skb->shinfo = shinfo;
 	skb->head = data;
 	skb->data = data;
 	skb_reset_tail_pointer(skb);
@@ -284,12 +293,17 @@ struct sk_buff *build_skb(void *data)
 #endif

 	/* make sure we initialize shinfo sequentially */
-	shinfo = skb_shinfo(skb);
+	BUG_ON(shinfo != skb_shinfo(skb));
 	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
 	atomic_set(&shinfo->dataref, 1);
 	kmemcheck_annotate_variable(shinfo->destructor_arg);

+out:
 	return skb;
+noshinfo:
+	kmem_cache_free(skbuff_head_cache, skb);
+	skb = NULL;
+	goto out;
 }
 EXPORT_SYMBOL(build_skb);
@@ -378,7 +392,7 @@ static void skb_clone_fraglist(struct sk_buff *skb)
 		skb_get(list);
 }

-static void skb_release_data(struct sk_buff *skb)
+static void skb_release_data(struct sk_buff *skb, int free_shinfo)
 {
 	if (!skb->cloned ||
 	    !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
@@ -404,6 +418,8 @@ static void skb_release_data(struct sk_buff *skb)
 		if (skb_has_frag_list(skb))
 			skb_drop_fraglist(skb);

+		if (free_shinfo)
+			kmem_cache_free(skbuff_shinfo_cache, skb_shinfo(skb));
 		kfree(skb->head);
 	}
 }
@@ -474,7 +490,7 @@ static void skb_release_head_state(struct sk_buff *skb)
 static void skb_release_all(struct sk_buff *skb)
 {
 	skb_release_head_state(skb);
-	skb_release_data(skb);
+	skb_release_data(skb, 1);
 }

 /**
@@ -646,6 +662,7 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
 	C(tail);
 	C(end);
 	C(head);
+	C(shinfo);
 	C(data);
 	C(truesize);
 	atomic_set(&n->users, 1);
@@ -941,18 +958,14 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
 	}

-	if (fastpath &&
-	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
-		memmove(skb->head + size, skb_shinfo(skb),
-			offsetof(struct skb_shared_info,
-				 frags[skb_shinfo(skb)->nr_frags]));
+	if (fastpath && size <= ksize(skb->head)) {
 		memmove(skb->head + nhead, skb->head,
 			skb_tail_pointer(skb) - skb->head);
 		off = nhead;
 		goto adjust_others;
 	}

-	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
+	data = kmalloc(size, gfp_mask);
 	if (!data)
 		goto nodata;
@@ -961,10 +974,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	 */
 	memcpy(data + nhead, skb->head, skb_tail_pointer(skb) - skb->head);

-	memcpy((struct skb_shared_info *)(data + size),
-	       skb_shinfo(skb),
-	       offsetof(struct skb_shared_info, frags[skb_shinfo(skb)->nr_frags]));
-
 	if (fastpath) {
 		kfree(skb->head);
 	} else {
@@ -979,7 +988,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		if (skb_has_frag_list(skb))
 			skb_clone_fraglist(skb);

-		skb_release_data(skb);
+		skb_release_data(skb, 0);
 	}
 	off = (data + nhead) - skb->head;
@@ -2956,6 +2965,12 @@ void __init skb_init(void)
 						0,
 						SLAB_HWCACHE_ALIGN|SLAB_PANIC,
 						NULL);
+
+	skbuff_shinfo_cache = kmem_cache_create("skbuff_shinfo_cache",
+						sizeof(struct skb_shared_info),
+						0,
+						SLAB_HWCACHE_ALIGN|SLAB_PANIC,
+						NULL);
 }

 /**
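For completeness, the bit-stealing variant mentioned above looks roughly
like the following. This is only a sketch with made-up names to show the
shape of it, not the actual (unposted) patch: since both struct page and
the destructor structure are at least word aligned, bit 0 of the frag's
stored pointer is free to say which of the two it is carrying.

	/* Sketch only -- illustrative names, not the real patch. */
	struct skb_frag_destructor {
		atomic_t ref;
		void (*destroy)(struct skb_frag_destructor *d);
		struct page *page;	/* the page moves in here for this case */
	};

	#define SKB_FRAG_IS_DESTRUCTOR	1UL

	static inline void __skb_frag_set_destructor(skb_frag_t *frag,
						     struct skb_frag_destructor *d)
	{
		/* tag the pointer so readers know it is not a struct page */
		frag->page.p = (struct page *)((unsigned long)d |
					       SKB_FRAG_IS_DESTRUCTOR);
	}

	static inline struct page *skb_frag_page(const skb_frag_t *frag)
	{
		unsigned long p = (unsigned long)frag->page.p;

		if (p & SKB_FRAG_IS_DESTRUCTOR)
			return ((struct skb_frag_destructor *)
				(p & ~SKB_FRAG_IS_DESTRUCTOR))->page;
		return (struct page *)p;
	}

The obvious downside is the one noted in the quoted bullet: because the
struct page lives inside the destructor structure, each page needs its own
destructor rather than several frags sharing a single one.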