net: reduce number of reference taken on sk_refcnt

From: Eric Dumazet <dada1@cosmosbay.com>

Krzysztof Halasa a écrit :
> Hi,
> 
> Could NAPI or something similar be used for TX in addition to RX?
> Perhaps some driver already does that?
> 
> I'm specifically thinking of IXP4xx Ethernet (and possibly WAN), those
> are 266-667 MHz ARM XScale CPUs and the interrupts handling just
> transmitted sk_buffs are a pain. Could I delay those interrupts
> somehow?

This is a pain yes, I agree. But some improvements can be done.
(even on drivers already using NAPI, especially if same interrupt
is used both for RX and RX queues)

For example, we can avoid the dst_release() cache miss if this
is done in start_xmit(), and not later in TX completion while freeing skb.
I tried various patches in the past but unfortunatly it seems
only safe way to do this is in the driver xmit itself, not in core
network stack. This would need many patches, one for each driver.

Then, if your workload is UDP packets, a recent patch
avoided an expensive wakeup at TX completion time
(check commit bf368e4e70cd4e0f880923c44e95a4273d725ab4

http://git2.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=bf368e4e70cd4e0f880923c44e95a4273d725ab4

Unfortunatly this patch is only for 2.6.30, since it is based
on a new infrastructure from Davide Libenzi)

Then, additional work should be done to reduce cache lines
misses in sock_wfree()

This function currently touches many different cache lines
(one for sk_wmem_alloc, one for testing sk_flags, one
 to decrement sk_refcnt). And on UDP, extra cachelines because
we call sock_def_write_space(). Yes this sucks.

We could avoid touching sk_refcnt in several cases. We
need a reference count only for the whole packets attached to a socket,
granted we use atomic_add_return() and such functions to detect 0
transitions. That way, if a tcp stream always has some packets in flight,
TX completion would not have to decrement sk_refcnt.

Following patch (based on net-next-2.6) to illustrate my point :

[PATCH] net: reduce number of reference taken on sk_refcnt

Current sk_wmem_alloc schema uses a sk_refcnt taken for each packet
in flight. This hurts some workloads at TX completion time, because
sock_wfree() has three cache lines to touch at least.
(one for sk_wmem_alloc, one for testing sk_flags, one
 to decrement sk_refcnt)

We could use only one reference count, taken only when sk_wmem_alloc
is changed from or to ZERO value (ie one reference count for any number
of in-flight packets)

Not all atomic_add() must be changed to atomic_add_return(), if we
know current sk_wmem_alloc is already not null.

This patch reduces by one number of cache lines dirtied in sock_wfree()
and number of atomic operation in some workloads. 

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

diff --cc include/net/tcp.h
index ac37228,87d210b..0000000
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
diff --git a/include/net/sock.h b/include/net/sock.h
index 4bb1ff9..d57e88b 100644

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Message ID	4A044BE7.3070308@cosmosbay.com
State	RFC, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> X-Original-To: patchwork-incoming@bilbo.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from ozlabs.org (ozlabs.org [203.10.76.45]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mx.ozlabs.org", Issuer "CA Cert Signing Authority" (verified OK)) by bilbo.ozlabs.org (Postfix) with ESMTPS id CE841B6F35 for <patchwork-incoming@bilbo.ozlabs.org>; Sat, 9 May 2009 01:13:03 +1000 (EST) Received: by ozlabs.org (Postfix) id BA546DDF9D; Sat, 9 May 2009 01:13:03 +1000 (EST) Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by ozlabs.org (Postfix) with ESMTP id 5A0F4DDF9C for <patchwork-incoming@ozlabs.org>; Sat, 9 May 2009 01:13:03 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760485AbZEHPMv (ORCPT <rfc822;patchwork-incoming@ozlabs.org>); Fri, 8 May 2009 11:12:51 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1757970AbZEHPMv (ORCPT <rfc822;netdev-outgoing>); Fri, 8 May 2009 11:12:51 -0400 Received: from gw1.cosmosbay.com ([212.99.114.194]:40237 "EHLO gw1.cosmosbay.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1761459AbZEHPMt convert rfc822-to-8bit (ORCPT <rfc822;netdev@vger.kernel.org>); Fri, 8 May 2009 11:12:49 -0400 Received: from [127.0.0.1] (localhost [127.0.0.1]) by gw1.cosmosbay.com (8.13.7/8.13.7) with ESMTP id n48FCfmU009388; Fri, 8 May 2009 17:12:41 +0200 Message-ID: <4A044BE7.3070308@cosmosbay.com> Date: Fri, 08 May 2009 17:12:39 +0200 From: Eric Dumazet <dada1@cosmosbay.com> User-Agent: Thunderbird 2.0.0.21 (Windows/20090302) MIME-Version: 1.0 To: Krzysztof Halasa <khc@pm.waw.pl> CC: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net> Subject: [PATCH] net: reduce number of reference taken on sk_refcnt References: <m31vqzwzae.fsf@intrepid.localdomain> In-Reply-To: <m31vqzwzae.fsf@intrepid.localdomain> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [0.0.0.0]); Fri, 08 May 2009 17:12:41 +0200 (CEST) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: <netdev.vger.kernel.org> X-Mailing-List: netdev@vger.kernel.org

net: reduce number of reference taken on sk_refcnt

Commit Message

Comments

Patch