Question about __alloc_skb() speedup

Message ID	20101203101450.GA9573@Desktop-Junchang
State	RFC, archived
Delegated to:	David Miller

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=date:from:to:cc:subject:message-id:mime-version:content-type
	:content-disposition:user-agent;
	b=Otf+1pwJCO0Q97ARPlN8lxLQNi7a0fvwEIMYBIFMMFAPe1Z9CuCM8KyOss2ttcAXOI
	Lh814RxxE/ka71DqYVQmoInPlGUBvBikYB0HnRdBBmR0BKH/GZa1dQvXJOQ7RaJQx4gr
	30l+JmYPTeetSdZWtKR0BXizkmlwURrKzitWw=
Date: Fri, 3 Dec 2010 18:14:52 +0800
From: Junchang Wang <junchangwang@gmail.com>
To: eric.dumazet@gmail.com
Cc: netdev@vger.kernel.org
Subject: Question about __alloc_skb() speedup
Message-ID: <20101203101450.GA9573@Desktop-Junchang>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: netdev-owner@vger.kernel.org
Precedence: bulk

Commit Message

Junchang Wang Dec. 3, 2010, 10:14 a.m. UTC

Hi Eric,

I'm reading your patch (ec7d2f2cf3a1, "__alloc_skb() speedup"),
in which you prefetch the skb and its shinfo part. I'm
curious why we don't also prefetch skb->data. It seems that would
help the tx path a lot.

I added the following code (a single prefetchw(data) call just before
the "out:" label in __alloc_skb()),

and pktgen on my server (an Intel SR1625 with two 4-core E5530
processors and a single ixgbe-based NIC) goes from 7.6Mpps to
8.4Mpps (64-byte packets), a gain of about 10%.

For the rx path, I ran experiments on both ixgbe and igb with pktgen+kute,
and saw no change in system performance.

Any suggestions and corrections are welcome.

Thanks.

--Junchang
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Dec. 3, 2010, 10:50 a.m. UTC | #1

On Friday, 3 December 2010 at 18:14 +0800, Junchang Wang wrote:
> Hi Eric,
> 
> I'm reading your patch (ec7d2f2cf3a1, "__alloc_skb() speedup"),
> in which you prefetch the skb and its shinfo part. I'm
> curious why we don't also prefetch skb->data. It seems that would
> help the tx path a lot.
> 
> I added the following code
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 104f844..c60a808 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -222,6 +222,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>  
>  		child->fclone = SKB_FCLONE_UNAVAILABLE;
>  	}
> +	prefetchw(data);
> +
>  out:
>  	return skb;
>  nodata:
> 
> and pktgen on my server (an Intel SR1625 with two 4-core E5530
> processors and a single ixgbe-based NIC) goes from 7.6Mpps to
> 8.4Mpps (64-byte packets), a gain of about 10%.
> 
> For the rx path, I ran experiments on both ixgbe and igb with pktgen+kute,
> and saw no change in system performance.
> 
> Any suggestions and corrections are welcome.
> 
> Thanks.

This is because __alloc_skb() is generic:

We don't know whether skb->data is going to be used right afterwards, or
at all.

For example, NIC drivers call __alloc_skb() to refill their RX ring
buffer. There is no gain in prefetching data in this case, since the data
is going to be written by the NIC hardware. The reverse would actually be
needed: ask the local CPU to evict the data from its cache, so that the
device can DMA it faster (fewer bus transactions).

By the way, adding prefetchw() right before the "return skb;" is
probably not very useful. You can certainly try to add the prefetchw()
in pktgen itself, since you know for sure you are going to write the
data.

I don't understand your 10% speedup, because pktgen actually uses
__netdev_alloc_skb(), which calls skb_reserve(skb, NET_SKB_PAD): your
prefetchw() is bringing in a cache line that won't be used at all by pktgen.

I would say 10% sounds highly suspect to me...


Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 104f844..c60a808 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -222,6 +222,8 @@  struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 
 		child->fclone = SKB_FCLONE_UNAVAILABLE;
 	}
+	prefetchw(data);
+
 out:
 	return skb;
 nodata:
