Message ID | 1236761624.2567.442.camel@ymzhang
---|---
State | RFC, archived
Delegated to | David Miller
"Zhang, Yanmin" <yanmin_zhang@linux.intel.com> writes: > I got some comments. Special thanks to Stephen Hemminger for teaching me on > what reorder is and some other comments. Also thank other guys who raised comments. > > v2 has some improvements. > 1) Add new sysfs interface /sys/class/net/ethXXX/rx_queueXXX/processing_cpu. Admin > could use it to configure the binding between RX and cpu number. So it's convenient > for drivers to use the new capability. Seems very inconvenient to have to configure this by hand. How about auto selecting one that shares the same LLC or somesuch? Passing data to anything with the same LLC should be cheap enough. BTW the standard idea to balance processing over multiple CPUs was to use MSI-X to multiple CPUs and just use the hash function on the NIC. Have you considered this for forwarding too? The trick here would be to try to avoid reordering inside streams as far as possible, but since the NIC hash should work on a flow basis that should be ok. -Andi
On Wed, 2009-03-11 at 12:13 +0100, Andi Kleen wrote: > "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> writes: > > > I got some comments. Special thanks to Stephen Hemminger for teaching me on > > what reorder is and some other comments. Also thank other guys who raised comments. > > > > > > v2 has some improvements. > > 1) Add new sysfs interface /sys/class/net/ethXXX/rx_queueXXX/processing_cpu. Admin > > could use it to configure the binding between RX and cpu number. So it's convenient > > for drivers to use the new capability. > > Seems very inconvenient to have to configure this by hand. A little, but not too much, especially when we consider there is interrupt binding. > How about > auto selecting one that shares the same LLC or somesuch? There are 2 kinds of LLC sharing here. 1) RX/TX share the LLC; 2) All RX share the LLC of some cpus and TX share the LLC of other cpus. Item 1) is important, but sometimes item 2) is also important when the sending speed is very high and huge data is in flight, which flushes the cpu cache quickly. It's hard to distinguish the 2 different scenarios automatically. > Passing > data to anything with the same LLC should be cheap enough. Yes, when the data isn't huge. My forwarding test currently reaches about 270M bytes per second on Nehalem, and I expect higher if I could get the latest NICs. > BTW the standard idea to balance processing over multiple CPUs was to > use MSI-X to multiple CPUs. Yes. My method still depends on MSI-X and multi-queue. One difference is that I need fewer than CPU_NUM interrupt vectors, as only some cpus work on packet receiving. > and just use the hash function on the > NIC. Sorry, I don't understand what the hash function of the NIC is. Perhaps NIC hardware has something like a hash function to decide the RX queue number based on SRC/DST? > Have you considered this for forwarding too? Yes. 
Originally, I planned to add a tx_num under the same sysfs directory, so the admin could define that all packets received from a RX queue should be sent out from a specific TX queue. So struct sk_buff->queue_mapping would be a union of 2 sub-members, rx_num and tx_num. But sk_buff->queue_mapping is just a u16, which is a small type. We might use the most-significant bit of sk_buff->queue_mapping as a flag, as rx_num and tx_num wouldn't exist at the same time. > The trick here would > be to try to avoid reordering inside streams as far as possible, It's not meant to solve the reorder issue. The starting point is that a 10G NIC is very fast. We need some cpus dedicated to packet receiving. If they work on other things, the NIC might drop packets quickly. The sysfs interface is just to facilitate NIC drivers. Without the sysfs interface, driver developers would need to implement it with module parameters, which is painful. > but > since the NIC hash should work on flow basis that should be ok. Yes, hardware is good at preventing reorder. My method doesn't change the order in the software layer. Thanks Andi. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
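The u16 encoding Yanmin describes — storing either an rx_num or a tx_num in queue_mapping, distinguished by the most-significant bit — can be sketched in userspace C. The helper names here are hypothetical, not from the actual patch:

```c
#include <stdint.h>

/* Hypothetical encoding of sk_buff->queue_mapping as described above:
 * the top bit of the u16 flags whether the low 15 bits hold a TX or
 * an RX queue number, since the two never coexist in the same skb. */
#define QM_TX_FLAG 0x8000u

static inline uint16_t qm_set_rx(uint16_t rx_num)
{
    return rx_num & 0x7fffu;                 /* flag clear: RX number */
}

static inline uint16_t qm_set_tx(uint16_t tx_num)
{
    return (tx_num & 0x7fffu) | QM_TX_FLAG;  /* flag set: TX number */
}

static inline int qm_is_tx(uint16_t qm)
{
    return (qm & QM_TX_FLAG) != 0;
}

static inline uint16_t qm_num(uint16_t qm)
{
    return qm & 0x7fffu;                     /* strip the flag bit */
}
```

A driver would store qm_set_rx(queue) at receive time, and the forwarding path could later test qm_is_tx() to decide how to interpret the stored value.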
On Thu, 2009-03-12 at 16:16 +0800, Zhang, Yanmin wrote: > On Wed, 2009-03-11 at 12:13 +0100, Andi Kleen wrote: [...] > > and just use the hash function on the > > NIC. > Sorry. I can't understand what the hash function of NIC is. Perhaps NIC hardware has something > like hash function to decide the RX queue number based on SRC/DST? Yes, that's exactly what they do. This feature is sometimes called Receive-Side Scaling (RSS) which is Microsoft's name for it. Microsoft requires Windows drivers performing RSS to provide the hash value to the networking stack, so Linux drivers for the same hardware should be able to do so too. > > Have you considered this for forwarding too? > Yes. originally, I plan to add a tx_num under the same sysfs directory, so admin could > define that all packets received from a RX queue should be sent out from a specific TX queue. The choice of TX queue can be based on the RX hash so that configuration is usually unnecessary. > So struct sk_buff->queue_mapping would be a union of 2 sub-members, rx_num and tx_num. But > sk_buff->queue_mapping is just a u16 which is a small type. We might use the most-significant > bit of sk_buff->queue_mapping as a flag as rx_num and tx_num wouldn't exist at the > same time. > > > The trick here would > > be to try to avoid reordering inside streams as far as possible, > It's not to solve reorder issue. The start point is 10G NIC is very fast. We need some cpu > work on packet receiving dedicately. If they work on other things, NIC might drop packets > quickly. Aggressive power-saving causes far greater latency than context-switching under Linux. I believe most 10G NICs have large RX FIFOs to mitigate this. Ethernet flow control also helps to prevent packet loss. > The sysfs interface is just to facilitate NIC drivers. If there is no the sysfs interface, > driver developers need implement it with parameters which are painful. [...] 
Or through the ethtool API, which already has some multiqueue control operations. Ben.
On Thu, Mar 12, 2009 at 04:16:32PM +0800, Zhang, Yanmin wrote: > > > Seems very inconvenient to have to configure this by hand. > A little, but not too much, especially when we consider there is interrupt binding. Interrupt binding is something popular for benchmarks, but most users don't (and shouldn't need to) care. Having it work well out of the box without special configuration is very important. > > > How about > > auto selecting one that shares the same LLC or somesuch? > There are 2 kinds of LLC sharing here. > 1) RX/TX share the LLC; > 2) All RX share the LLC of some cpus and TX share the LLC of other cpus. > > Item 1) is important, but sometimes item 2) is also important when the sending speed is > very high and huge data is on flight which flushes cpu cache quickly. > It's hard to distinguish the 2 different scenarioes automatically. Why is it hard if you know the CPUs? > > and just use the hash function on the > > NIC. > Sorry. I can't understand what the hash function of NIC is. Perhaps NIC hardware has something > like hash function to decide the RX queue number based on SRC/DST? There's a Microsoft spec for a standard hash function that does this on NICs and all the serious ones support it these days. The hash is normally used to select an MSI-X target based on the input header. I think if that works your manual target shouldn't be necessary. > > The trick here would > > be to try to avoid reordering inside streams as far as possible, > It's not to solve reorder issue. The start point is 10G NIC is very fast. We need some cpu Point was that any solution shouldn't add more reordering. But when an RSS hash is used there is no reordering on a stream basis. -Andi
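The hash Andi refers to is the Toeplitz hash from Microsoft's RSS specification. A minimal userspace sketch of the per-bit computation (the function name and key handling are illustrative, not any driver's actual code):

```c
#include <stddef.h>
#include <stdint.h>

/* The default 40-byte secret key published in Microsoft's RSS spec. */
static const uint8_t rss_default_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz hash as used by RSS: for every set bit of the input,
 * XOR in the 32-bit window of the key aligned with that bit,
 * sliding the window one bit per input bit.
 * `key` must be at least len + 4 bytes long. */
static uint32_t toeplitz_hash(const uint8_t *key,
                              const uint8_t *input, size_t len)
{
    uint32_t hash = 0;
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8)  |  (uint32_t)key[3];

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (input[i] & (1u << bit))
                hash ^= window;
            /* slide the key window right by one bit */
            window = (window << 1) | ((key[i + 4] >> bit) & 1u);
        }
    }
    return hash;
}
```

For the spec's IPv4 2-tuple test vector (source 66.9.149.187, destination 161.142.100.80, concatenated source-first) this should produce 0x323e8fc2; the NIC typically uses the low bits of the result to pick an RX queue or MSI-X vector.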
On Thu, 2009-03-12 at 14:08 +0000, Ben Hutchings wrote: > On Thu, 2009-03-12 at 16:16 +0800, Zhang, Yanmin wrote: > > On Wed, 2009-03-11 at 12:13 +0100, Andi Kleen wrote: > [...] > > > and just use the hash function on the > > > NIC. > > Sorry. I can't understand what the hash function of NIC is. Perhaps NIC hardware has something > > like hash function to decide the RX queue number based on SRC/DST? > > Yes, that's exactly what they do. This feature is sometimes called > Receive-Side Scaling (RSS) which is Microsoft's name for it. Microsoft > requires Windows drivers performing RSS to provide the hash value to the > networking stack, so Linux drivers for the same hardware should be able > to do so too. Oh, I didn't know the background. I need to study the network stack more. Thanks for explaining it. > > > > Have you considered this for forwarding too? > > Yes. originally, I plan to add a tx_num under the same sysfs directory, so admin could > > define that all packets received from a RX queue should be sent out from a specific TX queue. > > The choice of TX queue can be based on the RX hash so that configuration > is usually unnecessary. I agree. I double-checked the latest code in the net-next-2.6 tree, and the function skb_tx_hash is enough. > > > So struct sk_buff->queue_mapping would be a union of 2 sub-members, rx_num and tx_num. But > > sk_buff->queue_mapping is just a u16 which is a small type. We might use the most-significant > > bit of sk_buff->queue_mapping as a flag as rx_num and tx_num wouldn't exist at the > > same time. > > > > > The trick here would > > > be to try to avoid reordering inside streams as far as possible, > > It's not to solve reorder issue. The start point is 10G NIC is very fast. We need some cpu > > work on packet receiving dedicately. If they work on other things, NIC might drop packets > > quickly. > > Aggressive power-saving causes far greater latency than context- > switching under Linux. Yes, when the NIC is mostly free. 
When the NIC is busy, it wouldn't enter power-saving mode. For performance testing we usually turn off all power-saving modes. :) > I believe most 10G NICs have large RX FIFOs to > mitigate against this. Ethernet flow control also helps to prevent > packet loss. I guess the NIC might allocate resources evenly for all queues, at least by default. Considering a packet-sending burst with the same SRC/DST, a specific queue might fill up quickly. I instrumented the driver and kernel to print out packet receiving and forwarding. As the latest IXGBE driver gets a packet and forwards it immediately, I think most packets are dropped by hardware because the cpu doesn't collect packets quickly enough when the specific receiving queue is full. By comparing the sending speed and forwarding speed, we could get the dropping rate easily. My experiment shows the receiving cpu is more than 50% idle and often collects all packets until the specific queue is empty. I think that's because pktgen switches to a new SRC/DST to produce another burst to fill other queues quickly. It's hard to say the cpu is slower than the NIC because they work on different parts of the full receiving/processing procedure. But we need the cpu to collect packets ASAP. > > The sysfs interface is just to facilitate NIC drivers. If there is no the sysfs interface, > > driver developers need implement it with parameters which are painful. > [...] > > Or through the ethtool API, which already has some multiqueue control > operations. That's an alternative way to configure it. Checking the sample driver patch, the change is very small. Thanks for your kind comments. Yanmin
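The skb_tx_hash approach Yanmin refers to maps a flow hash onto the device's TX queues; a simplified userspace sketch of the scaling step (not the kernel code verbatim) looks like this:

```c
#include <stdint.h>

/* Scale a 32-bit flow hash into [0, num_tx_queues): a multiply-shift
 * in the style of the kernel's skb_tx_hash(), which avoids a modulo
 * while keeping the distribution across queues even. */
static uint16_t tx_queue_for_hash(uint32_t hash, uint16_t num_tx_queues)
{
    return (uint16_t)(((uint64_t)hash * num_tx_queues) >> 32);
}
```

All packets of one flow share a hash, so they keep landing on the same TX queue and the queue choice itself cannot reorder a stream.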
On Thu, 2009-03-12 at 15:34 +0100, Andi Kleen wrote: > On Thu, Mar 12, 2009 at 04:16:32PM +0800, Zhang, Yanmin wrote: > > > > > Seems very inconvenient to have to configure this by hand. > > A little, but not too much, especially when we consider there is interrupt binding. > > Interrupt binding is something popular for benchmarks, but most users > don't (and shouldn't need to) care. Having it work well out of the box > without special configuration is very important. Thanks Andi. That's true. Now I understand why David Miller is working on auto TX selection. One thing I want to clarify is that, with the default configuration, the processing path still uses the current automatic selection. That means my method has little impact on the current automatic selection with the default configuration, except a small cache-miss cost. Another exception is that IXGBE prefers getting one packet and sending it out immediately instead of backlogging. Even when turning on the new capability to separate packet receiving and packet processing, TX selection still follows the current automatic selection. The difference is that we use a different cpu. The driver could still record the RX number into the skb, which is used when sending out. > > > > > > How about > > > auto selecting one that shares the same LLC or somesuch? > > There are 2 kinds of LLC sharing here. > > 1) RX/TX share the LLC; > > 2) All RX share the LLC of some cpus and TX share the LLC of other cpus. > > > > Item 1) is important, but sometimes item 2) is also important when the sending speed is > > very high and huge data is on flight which flushes cpu cache quickly. > > It's hard to distinguish the 2 different scenarioes automatically. > > Why is it hard if you know the CPUs? RX binding depends totally on interrupt binding. If the MSI-X interrupt is sent to cpu A, cpu A will collect the packets on the RX queue. By default, interrupts aren't bound. Software knows the LLC sharing of cpu A. 
If cpu A receives the interrupt, it couldn't just throw packets to other cpus which share its LLC, because it doesn't know whether those cpus are collecting packets from other RX queues at the moment. > > > > and just use the hash function on the > > > NIC. > > Sorry. I can't understand what the hash function of NIC is. Perhaps NIC hardware has something > > like hash function to decide the RX queue number based on SRC/DST? > > There's a Microsoft spec for a standard hash function that does this > on NICs and all the serious ones support it these days. The hash > is normally used to select a MSI-X target based on the input header. Thanks for the explanation. The capability defined by the spec is to choose an MSI-X vector and provide a hint when sending a cloned packet out. Does the NIC know how busy the cpu is? I assume not. So the hash is trying to distribute packets into RX queues evenly while also avoiding reorder. We might say irqbalance could balance the workload, so we expect the cpu workload to be even. My testing shows such an even distribution of packets across all cpus isn't good for performance. > > I think if that works your manual target shouldn't be necessary. There are 2 targets with my method: one is the packet-collecting cpu and the other is the packet-processing cpu. As the NIC doesn't know how busy the cpu is, why can't we separate the processing? > > > > The trick here would > > > be to try to avoid reordering inside streams as far as possible, > > It's not to solve reorder issue. The start point is 10G NIC is very fast. We need some cpu > > Point was that any solution shouldn't add more reordering. But when a RSS > hash is used there is no reordering on stream basis. Yes. Thanks again. Yanmin
On Thu, Mar 12, 2009 at 11:43 PM, Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote: > > On Thu, 2009-03-12 at 14:08 +0000, Ben Hutchings wrote: > > On Thu, 2009-03-12 at 16:16 +0800, Zhang, Yanmin wrote: > > > On Wed, 2009-03-11 at 12:13 +0100, Andi Kleen wrote: > > [...] > > > > and just use the hash function on the > > > > NIC. > > > Sorry. I can't understand what the hash function of NIC is. Perhaps NIC hardware has something > > > like hash function to decide the RX queue number based on SRC/DST? > > > > Yes, that's exactly what they do. This feature is sometimes called > > Receive-Side Scaling (RSS) which is Microsoft's name for it. Microsoft > > requires Windows drivers performing RSS to provide the hash value to the > > networking stack, so Linux drivers for the same hardware should be able > > to do so too. > Oh, I didn't know the background. I need study more about network. > Thanks for explain it. > You'll definitely want to look at the hardware provided hash. We've been using a 10G NIC which provides a Toeplitz hash (the one defined by Microsoft) and a software RSS-like capability to move packets from an interrupting CPU to another for processing. The hash could be used to index to a set of CPUs, but we also use the hash as a connection identifier to key into a lookup table to steer packets to the CPU where the application is running, based on the running CPU of the last recvmsg. Using the device provided hash in this manner is a HUGE win, as opposed to taking cache misses to get the 4-tuple from the packet itself to compute a hash. I posted some patches a while back on our work if you're interested. We are also using multiple RX queues of the 10G device in concert, with pretty good results. We have noticed that the interrupt overheads substantially diminish the benefits. In fact, I would say the software packet steering has provided the greater benefit (and it's very useful on our many 1G NICs that don't have multiq!). 
Tom
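Tom's steering scheme — using the device hash as a connection identifier into a table that remembers where the application last ran — can be sketched as follows (names and table size are illustrative, not Google's actual patches):

```c
#include <stdint.h>

#define FLOW_TABLE_SIZE 4096u   /* power of two; illustrative size */

/* flow_cpu[h] remembers the cpu of the last recvmsg() for flows
 * hashing to h. Entries store cpu + 1 so that zero-initialized
 * slots mean "no hint recorded yet". */
static int flow_cpu[FLOW_TABLE_SIZE];

/* Called from the recvmsg path: record where the consumer runs. */
static void flow_record(uint32_t rxhash, int cpu)
{
    flow_cpu[rxhash & (FLOW_TABLE_SIZE - 1)] = cpu + 1;
}

/* Called from the RX path: steer to the consumer's cpu if known,
 * otherwise stay on the interrupting cpu. */
static int flow_steer(uint32_t rxhash, int interrupting_cpu)
{
    int entry = flow_cpu[rxhash & (FLOW_TABLE_SIZE - 1)];
    return entry ? entry - 1 : interrupting_cpu;
}
```

Because the steering key is the hash the NIC already computed, the RX path never touches packet headers to make its decision, which is exactly the cache-miss saving Tom describes.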
From: Tom Herbert <therbert@google.com> Date: Fri, 13 Mar 2009 10:06:56 -0700 > You'll definitely want to look at the hardware provided hash. We've > been using a 10G NIC which provides a Toeplitz hash (the one defined > by Microsoft) and a software RSS-like capability to move packets from > an interrupting CPU to another for processing. The hash could be used > to index to a set of CPUs, but we also use the hash as a connection > identifier to key into a lookup table to steer packets to the CPU > where the application is running based on the running CPU of the last > recvmsg. Using the device provided hash in this manner is a HUGE win, > as opposed to taking cache misses to get 4-tuple from packet itself to > compute a hash. I posted some patches a while back on our work if > you're interested. I never understood this. If you don't let the APIC move the interrupt around, the individual MSI-X interrupts will steer packets to individual specific CPUs and as a result the scheduler will migrate tasks over to those cpus since the wakeup events keep occurring there.
On Fri, Mar 13, 2009 at 11:51 AM, David Miller <davem@davemloft.net> wrote: > > From: Tom Herbert <therbert@google.com> > Date: Fri, 13 Mar 2009 10:06:56 -0700 > > > You'll definitely want to look at the hardware provided hash. We've > > been using a 10G NIC which provides a Toeplitz hash (the one defined > > by Microsoft) and a software RSS-like capability to move packets from > > an interrupting CPU to another for processing. The hash could be used > > to index to a set of CPUs, but we also use the hash as a connection > > identifier to key into a lookup table to steer packets to the CPU > > where the application is running based on the running CPU of the last > > recvmsg. Using the device provided hash in this manner is a HUGE win, > > as opposed to taking cache misses to get 4-tuple from packet itself to > > compute a hash. I posted some patches a while back on our work if > > you're interested. > > I never understood this. > > If you don't let the APIC move the interrupt around, the individual > MSI-X interrupts will steer packets to individual specific CPUS and as > a result the scheduler will migrate tasks over to those cpus since the > wakeup events keep occuring there. We are trying to follow the scheduler's decisions as opposed to leading them. This works on very loaded systems, with applications binding to cpusets, with threads that are receiving on multiple sockets. I suppose it might be compelling if a NIC could steer packets per flow, instead of by a hash...
From: Tom Herbert <therbert@google.com> Date: Fri, 13 Mar 2009 13:58:53 -0700 > We are trying to follow the decisions scheduler as opposed to > leading it. This works on very loaded systems, with applications > binding to cpusets, with threads that are receiving on multiple > sockets. I suppose it might be compelling if a NIC could steer > packets per flow, instead of by a hash... If the hash is good it will distribute the load properly. If the NIC is sophisticated enough (Sun's Neptune chipset is) you can even group interrupt distribution by traffic type and even bind specific ports to interrupt groups. I really detest all of these software hacks that add overhead to solve problems the hardware can solve for us.
> > If the hash is good is will distribute the load properly. > > If the NIC is sophisticated enough (Sun's Neptune chipset is) > you can even group interrupt distribution by traffic type > and even bind specific ports to interrupt groups. > > I really detest all of these software hacks that add overhead > to solve problems the hardware can solve for us. > I appreciate this philosophy, but unfortunately I don't have the luxury of working with a NIC that solves these problems. The reality may be that we're trying to squeeze performance out of crappy hardware to scale on multi-core. Left alone we couldn't get the stack to scale, but with these "destable hacks" we've gotten 3X or so improvement in packets per second across both our dumb 1G and 10G NICs. These gains have translated into tangible application performance gains, so we'll probably continue to have interest in this area of development at least for the foreseeable future.
On Fri, 2009-03-13 at 14:01 -0700, Tom Herbert wrote: > On Fri, Mar 13, 2009 at 11:51 AM, David Miller <davem@davemloft.net> wrote: > > > > From: Tom Herbert <therbert@google.com> > > Date: Fri, 13 Mar 2009 10:06:56 -0700 > > > > > You'll definitely want to look at the hardware provided hash. We've > > > been using a 10G NIC which provides a Toeplitz hash (the one defined > > > by Microsoft) and a software RSS-like capability to move packets from > > > an interrupting CPU to another for processing. The hash could be used > > > to index to a set of CPUs, but we also use the hash as a connection > > > identifier to key into a lookup table to steer packets to the CPU > > > where the application is running based on the running CPU of the last > > > recvmsg. Using the device provided hash in this manner is a HUGE win, > > > as opposed to taking cache misses to get 4-tuple from packet itself to > > > compute a hash. I posted some patches a while back on our work if > > > you're interested. > > > > I never understood this. > > > > If you don't let the APIC move the interrupt around, the individual > > MSI-X interrupts will steer packets to individual specific CPUS and as > > a result the scheduler will migrate tasks over to those cpus since the > > wakeup events keep occuring there. > > We are trying to follow the decisions scheduler as opposed to leading > it. This works on very loaded systems, with applications binding to > cpusets, with threads that are receiving on multiple sockets. I > suppose it might be compelling if a NIC could steer packets per flow, > instead of by a hash... Depending on the NIC, RX queue selection may be done using a large number of bits of the hash value and an indirection table or by matching against specific values in the headers. The SFC4000 supports both of these, though limited to TCP/IPv4 and UDP/IPv4. I think Neptune may be more flexible. 
Of course, both indirection table entries and filter table entries will be limited resources in any NIC, so allocating these wholly automatically is an interesting challenge. Ben.
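The hash-plus-indirection-table RX queue selection Ben describes can be sketched like this (the 128-entry table mirrors common hardware, but the fill policy and names are illustrative):

```c
#include <stdint.h>

#define INDIR_TABLE_SIZE 128u   /* a typical RSS indirection table size */

/* The driver fills this table to spread hash buckets across RX
 * queues; here, a simple round-robin fill over num_queues. */
static uint8_t indir_table[INDIR_TABLE_SIZE];

static void indir_init(unsigned int num_queues)
{
    for (unsigned int i = 0; i < INDIR_TABLE_SIZE; i++)
        indir_table[i] = (uint8_t)(i % num_queues);
}

/* Hardware-style RX queue selection: the low bits of the flow
 * hash index the indirection table. */
static unsigned int rx_queue_for_hash(uint32_t hash)
{
    return indir_table[hash & (INDIR_TABLE_SIZE - 1)];
}
```

Rewriting table entries changes which queue (and hence which cpu) a hash bucket lands on without disturbing per-flow ordering, which is what makes the table a natural knob for the automatic allocation problem Ben raises.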
On Fri, 13 Mar 2009 22:10:59 +0000 Ben Hutchings <bhutchings@solarflare.com> wrote: > On Fri, 2009-03-13 at 14:01 -0700, Tom Herbert wrote: > > On Fri, Mar 13, 2009 at 11:51 AM, David Miller <davem@davemloft.net> wrote: > > > > > > From: Tom Herbert <therbert@google.com> > > > Date: Fri, 13 Mar 2009 10:06:56 -0700 > > > > > > > You'll definitely want to look at the hardware provided hash. We've > > > > been using a 10G NIC which provides a Toeplitz hash (the one defined > > > > by Microsoft) and a software RSS-like capability to move packets from > > > > an interrupting CPU to another for processing. The hash could be used > > > > to index to a set of CPUs, but we also use the hash as a connection > > > > identifier to key into a lookup table to steer packets to the CPU > > > > where the application is running based on the running CPU of the last > > > > recvmsg. Using the device provided hash in this manner is a HUGE win, > > > > as opposed to taking cache misses to get 4-tuple from packet itself to > > > > compute a hash. I posted some patches a while back on our work if > > > > you're interested. > > > > > > I never understood this. > > > > > > If you don't let the APIC move the interrupt around, the individual > > > MSI-X interrupts will steer packets to individual specific CPUS and as > > > a result the scheduler will migrate tasks over to those cpus since the > > > wakeup events keep occuring there. > > > > We are trying to follow the decisions scheduler as opposed to leading > > it. This works on very loaded systems, with applications binding to > > cpusets, with threads that are receiving on multiple sockets. I > > suppose it might be compelling if a NIC could steer packets per flow, > > instead of by a hash... > > Depending on the NIC, RX queue selection may be done using a large > number of bits of the hash value and an indirection table or by matching > against specific values in the headers. 
The SFC4000 supports both of > these, though limited to TCP/IPv4 and UDP/IPv4. I think Neptune may be > more flexible. Of course, both indirection table entries and filter > table entries will be limited resources in any NIC, so allocating these > wholly automatically is an interesting challenge. > > Ben. > The problem is that without hardware support, handing off the packet may take more effort than processing it, especially when a cache line has to bounce to another CPU and when trying to keep up with DoS attacks. It all depends on how much processing is required, and the architecture of the system. The tradeoff would change over time based on processing speed and optimizing the receive/firewall code.
From: Tom Herbert <therbert@google.com> Date: Fri, 13 Mar 2009 14:59:55 -0700 > I appreciate this philosophy, but unfortunately I don't have the > luxury of working with a NIC that solves these problems. The reality > may be that we're trying to squeeze performance out of crappy hardware > to scale on multi-core. Left alone we couldn't get the stack to > scale, but with these "destable hacks" we've gotten 3X or so ^^^^^^^^ Spelling. > improvement in packets per second across both our dumb 1G and 10G > NICs Do these NICs at least support multiqueue?
On Fri, Mar 13, 2009 at 03:19:13PM -0700, David Miller wrote: > > > improvement in packets per second across both our dumb 1G and 10G > > NICs > > Do these NICs at least support multiqueue? I don't think they do. See the last paragraph in Tom's first email. I think we all agree that hacks such as these are only useful for NICs that either don't support mq or if the number of rx queues is too small. The question is how much do we love these NICs :) Cheers,
>> I appreciate this philosophy, but unfortunately I don't have the >> luxury of working with a NIC that solves these problems. The reality >> may be that we're trying to squeeze performance out of crappy hardware >> to scale on multi-core. Left alone we couldn't get the stack to >> scale, but with these "destable hacks" we've gotten 3X or so > ^^^^^^^^ > > Spelling. > >> improvement in packets per second across both our dumb 1G and 10G >> NICs > > Do these NICs at least support multiqueue? > Yes, we are using a 10G NIC that supports multi-queue. The number of RX queues supported is half the number of cores on our platform, so that is going to limit the parallelism. With multi-queue turned on we do see about 4X improvement in pps over just using a single queue; this is about the same improvement we see using a single queue with our software steering techniques (this particular device provides the Toeplitz hash). Enabling HW multi-queue has somewhat higher CPU utilization, though; the extra device interrupt load is not coming for free. We actually use the HW multi-queue in conjunction with our software steering to get maximum pps (about 20% more).
> We are trying to follow the decisions scheduler as opposed to leading it. > This works on very loaded systems, with applications binding to cpusets, One possible solution would then be to just not bind to cpusets and give the scheduler the freedom it needs instead? -Andi
> Yes, we are using a 10G NIC that supports multi-queue. The number of > RX queues supported is half the number of cores on our platform, so > that is going to limit the parallelism. With multi-queue turned on we The standard wisdom is that you don't necessarily need to transmit to each core, but rather to each shared mid-level or last-level cache. Once the data is cache-hot (or cache-near), distributing it further in software is comparatively cheap. So this means you don't necessarily need as many queues as cores, but rather as many as big caches. -Andi
From: Tom Herbert <therbert@google.com> Date: Fri, 13 Mar 2009 17:24:10 -0700 > Enabling HW multi-queue has somewhat higher CPU > utilization though, the extra device interrupt load is not coming for > free. We actually use the HW multi-queue in conjunction with our > software steering to get maximum pps (about 20% more). This is a non-intuitive observation. Using HW multiqueue should be cheaper than doing it in software, right?
On Fri, Mar 13, 2009 at 07:19:51PM -0700, David Miller wrote: > From: Tom Herbert <therbert@google.com> > Date: Fri, 13 Mar 2009 17:24:10 -0700 > > > Enabling HW multi-queue has somewhat higher CPU > > utilization though, the extra device interrupt load is not coming for > > free. We actually use the HW multi-queue in conjunction with our > > software steering to get maximum pps (about 20% more). > > This is a non-intuitive observation. Using HW multiqueue should be > cheaper than doing it in software, right? Shared caches can play games with the numbers, we need to look at this a bit more. Cheers,
On Fri, Mar 13, 2009 at 7:19 PM, David Miller <davem@davemloft.net> wrote: > From: Tom Herbert <therbert@google.com> > Date: Fri, 13 Mar 2009 17:24:10 -0700 > >> Enabling HW multi-queue has somewhat higher CPU >> utilization though, the extra device interrupt load is not coming for >> free. We actually use the HW multi-queue in conjunction with our >> software steering to get maximum pps (about 20% more). > > This is a non-intuitive observation. Using HW multiqueue should be > cheaper than doing it in software, right? > I suppose it may be counter-intuitive, but I am not making a general claim. I would only suggest that these software hacks could be a very good approximation or substitute for hardware functionality. This is a generic way to get more performance out of deficient or lower end NICs.
From: Tom Herbert <therbert@google.com> Date: Sat, 14 Mar 2009 11:15:21 -0700 > I suppose it may be counter-intuitive, but I am not making a general > claim. I would only suggest that these software hacks could be a very > good approximation or substitute for hardware functionality. This is > a generic way to get more performance out of deficient or lower end > NICs. They certainly could. Why don't you post the current version of your patches so we have something concrete to discuss?
On Fri, 2009-03-13 at 10:06 -0700, Tom Herbert wrote: > On Thu, Mar 12, 2009 at 11:43 PM, Zhang, Yanmin > <yanmin_zhang@linux.intel.com> wrote: > > > > On Thu, 2009-03-12 at 14:08 +0000, Ben Hutchings wrote: > > > On Thu, 2009-03-12 at 16:16 +0800, Zhang, Yanmin wrote: > > > > On Wed, 2009-03-11 at 12:13 +0100, Andi Kleen wrote: > > > Yes, that's exactly what they do. This feature is sometimes called > > > Receive-Side Scaling (RSS), which is Microsoft's name for it. Microsoft > > > requires Windows drivers performing RSS to provide the hash value to the > > > networking stack, so Linux drivers for the same hardware should be able > > > to do so too. > > Oh, I didn't know the background. I need to study more about networking. > > Thanks for explaining it. > > > > You'll definitely want to look at the hardware-provided hash. We've > been using a 10G NIC which provides a Toeplitz hash (the one defined > by Microsoft) and a software RSS-like capability to move packets from > an interrupting CPU to another for processing. The hash could be used > to index into a set of CPUs, but we also use the hash as a connection > identifier to key into a lookup table to steer packets to the CPU > where the application is running, based on the running CPU of the last > recvmsg. Your scenario is different from mine. My case is ip_forward, which happens in the kernel, and there is no application participating in the forwarding. I might test application communication on a 10G NIC with my method later.
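For reference, the Toeplitz hash Ben and Tom mention is straightforward to compute in software: every set bit of the input tuple XORs in a sliding 32-bit window of a secret key. A sketch using the well-known 40-byte default key published in Microsoft's RSS specification:

```c
#include <stddef.h>
#include <stdint.h>

/* Toeplitz hash as specified for RSS: for every set bit i of the
 * input, XOR in the 32-bit window of the key starting at bit i. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t keylen,
                              const uint8_t *data, size_t len)
{
    uint32_t hash = 0;
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8)  |  (uint32_t)key[3];
    size_t kbit = 32;  /* index of the next key bit to shift in */

    for (size_t i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {
            if (data[i] & (1u << b))
                hash ^= window;
            /* Slide the key window left by one bit. */
            window <<= 1;
            if (kbit < keylen * 8 &&
                (key[kbit / 8] & (0x80u >> (kbit % 8))))
                window |= 1;
            kbit++;
        }
    }
    return hash;
}

/* Default RSS key from the Microsoft specification. */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};
```

Feeding it the tuple bytes in network order (source address, destination address, then ports for the 4-tuple variant) reproduces the verification values published in the spec, which is how NIC RSS implementations are typically validated.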
On Sat, Mar 14, 2009 at 11:45 AM, David Miller <davem@davemloft.net> wrote: > > From: Tom Herbert <therbert@google.com> > Date: Sat, 14 Mar 2009 11:15:21 -0700 > > > I suppose it may be counter-intuitive, but I am not making a general > > claim. I would only suggest that these software hacks could be a very > > good approximation or substitute for hardware functionality. This is > > a generic way to get more performance out of deficient or lower end > > NICs. > > They certainly could. Why don't you post the current version > of your patches so we have something concrete to discuss? I'll do that.
--- linux-2.6.29-rc7/include/linux/netdevice.h 2009-03-09 15:20:49.000000000 +0800 +++ linux-2.6.29-rc7_backlog/include/linux/netdevice.h 2009-03-11 10:17:08.000000000 +0800 @@ -1119,6 +1119,9 @@ static inline int unregister_gifconf(uns /* * Incoming packets are placed on per-cpu queues so that * no locking is needed. + * To speed up fast network, sometimes place incoming packets + * to other cpu queues. Use input_pkt_alien_queue.lock to + * protect input_pkt_alien_queue. */ struct softnet_data { @@ -1127,6 +1130,7 @@ struct softnet_data struct list_head poll_list; struct sk_buff *completion_queue; + struct sk_buff_head input_pkt_alien_queue; struct napi_struct backlog; }; @@ -1368,6 +1372,8 @@ extern void dev_kfree_skb_irq(struct sk_ extern void dev_kfree_skb_any(struct sk_buff *skb); #define HAVE_NETIF_RX 1 +extern int raise_netif_irq(int cpu, + struct sk_buff_head *skb_queue); extern int netif_rx(struct sk_buff *skb); extern int netif_rx_ni(struct sk_buff *skb); #define HAVE_NETIF_RECEIVE_SKB 1 --- linux-2.6.29-rc7/net/core/dev.c 2009-03-09 15:20:50.000000000 +0800 +++ linux-2.6.29-rc7_backlog/net/core/dev.c 2009-03-11 10:27:57.000000000 +0800 @@ -1997,6 +1997,114 @@ int netif_rx_ni(struct sk_buff *skb) EXPORT_SYMBOL(netif_rx_ni); +static void net_drop_skb(struct sk_buff_head *skb_queue) +{ + struct sk_buff *skb = __skb_dequeue(skb_queue); + + while (skb) { + __get_cpu_var(netdev_rx_stat).dropped++; + kfree_skb(skb); + skb = __skb_dequeue(skb_queue); + } +} + +static int net_backlog_local_merge(struct sk_buff_head *skb_queue) +{ + struct softnet_data *queue; + unsigned long flags; + + queue = &__get_cpu_var(softnet_data); + if (queue->input_pkt_queue.qlen + skb_queue->qlen <= + netdev_max_backlog) { + + local_irq_save(flags); + if (!queue->input_pkt_queue.qlen) + napi_schedule(&queue->backlog); + skb_queue_splice_tail_init(skb_queue, &queue->input_pkt_queue); + local_irq_restore(flags); + + return 0; + } else { + net_drop_skb(skb_queue); + return 1; + } +} + 
+static void net_napi_backlog(void *data) +{ + struct softnet_data *queue = &__get_cpu_var(softnet_data); + + napi_schedule(&queue->backlog); + kfree(data); +} + +static int net_backlog_notify_cpu(int cpu) +{ + struct call_single_data *data; + + data = kmalloc(sizeof(struct call_single_data), GFP_ATOMIC); + if (!data) + return -1; + + data->func = net_napi_backlog; + data->info = data; + data->flags = 0; + __smp_call_function_single(cpu, data); + + return 0; +} + +int raise_netif_irq(int cpu, struct sk_buff_head *skb_queue) +{ + unsigned long flags; + struct softnet_data *queue; + int retval, need_notify=0; + + if (!skb_queue || skb_queue_empty(skb_queue)) + return 0; + + /* + * If cpu is offline, we queue skb back to + * the queue on current cpu. + */ + if ((unsigned)cpu >= nr_cpu_ids || + !cpu_online(cpu) || + cpu == smp_processor_id()) { + net_backlog_local_merge(skb_queue); + return 0; + } + + queue = &per_cpu(softnet_data, cpu); + if (queue->input_pkt_alien_queue.qlen > netdev_max_backlog) + goto failed1; + + spin_lock_irqsave(&queue->input_pkt_alien_queue.lock, flags); + if (skb_queue_empty(&queue->input_pkt_alien_queue)) + need_notify = 1; + skb_queue_splice_tail_init(skb_queue, + &queue->input_pkt_alien_queue); + spin_unlock_irqrestore(&queue->input_pkt_alien_queue.lock, + flags); + + if (need_notify) { + retval = net_backlog_notify_cpu(cpu); + if (unlikely(retval)) + goto failed2; + } + + return 0; + +failed2: + spin_lock_irqsave(&queue->input_pkt_alien_queue.lock, flags); + skb_queue_splice_tail_init(&queue->input_pkt_alien_queue, skb_queue); + spin_unlock_irqrestore(&queue->input_pkt_alien_queue.lock, + flags); +failed1: + net_drop_skb(skb_queue); + + return 1; +} + static void net_tx_action(struct softirq_action *h) { struct softnet_data *sd = &__get_cpu_var(softnet_data); @@ -2336,6 +2444,13 @@ static void flush_backlog(void *arg) struct net_device *dev = arg; struct softnet_data *queue = &__get_cpu_var(softnet_data); struct sk_buff *skb, *tmp; + 
unsigned long flags; + + spin_lock_irqsave(&queue->input_pkt_alien_queue.lock, flags); + skb_queue_splice_tail_init( + &queue->input_pkt_alien_queue, + &queue->input_pkt_queue ); + spin_unlock_irqrestore(&queue->input_pkt_alien_queue.lock, flags); skb_queue_walk_safe(&queue->input_pkt_queue, skb, tmp) if (skb->dev == dev) { @@ -2594,9 +2709,19 @@ static int process_backlog(struct napi_s local_irq_disable(); skb = __skb_dequeue(&queue->input_pkt_queue); if (!skb) { - __napi_complete(napi); - local_irq_enable(); - break; + if (!skb_queue_empty(&queue->input_pkt_alien_queue)) { + spin_lock(&queue->input_pkt_alien_queue.lock); + skb_queue_splice_tail_init( + &queue->input_pkt_alien_queue, + &queue->input_pkt_queue ); + spin_unlock(&queue->input_pkt_alien_queue.lock); + + skb = __skb_dequeue(&queue->input_pkt_queue); + } else { + __napi_complete(napi); + local_irq_enable(); + break; + } } local_irq_enable(); @@ -4985,6 +5110,11 @@ static int dev_cpu_callback(struct notif local_irq_enable(); /* Process offline CPU's input_pkt_queue */ + spin_lock(&oldsd->input_pkt_alien_queue.lock); + skb_queue_splice_tail_init(&oldsd->input_pkt_alien_queue, + &oldsd->input_pkt_queue); + spin_unlock(&oldsd->input_pkt_alien_queue.lock); + while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) netif_rx(skb); @@ -5184,10 +5314,13 @@ static int __init net_dev_init(void) struct softnet_data *queue; queue = &per_cpu(softnet_data, i); + skb_queue_head_init(&queue->input_pkt_queue); queue->completion_queue = NULL; INIT_LIST_HEAD(&queue->poll_list); + skb_queue_head_init(&queue->input_pkt_alien_queue); + queue->backlog.poll = process_backlog; queue->backlog.weight = weight_p; queue->backlog.gro_list = NULL; @@ -5247,6 +5380,7 @@ EXPORT_SYMBOL(netdev_set_master); EXPORT_SYMBOL(netdev_state_change); EXPORT_SYMBOL(netif_receive_skb); EXPORT_SYMBOL(netif_rx); +EXPORT_SYMBOL(raise_netif_irq); EXPORT_SYMBOL(register_gifconf); EXPORT_SYMBOL(register_netdevice); EXPORT_SYMBOL(register_netdevice_notifier);
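The need_notify logic in raise_netif_irq() above is the classic wake-only-on-transition optimization: the remote CPU is sent an IPI only when the alien queue goes from empty to non-empty, so a burst of splices while the target is still draining costs a single notification. A hypothetical userspace analogue (a counter stands in for skbs, another for IPIs):

```c
#include <pthread.h>

/* Userspace sketch of the need_notify pattern in raise_netif_irq():
 * the producer takes the queue lock, and only an empty->non-empty
 * transition pays for a wakeup (an IPI, via net_backlog_notify_cpu(),
 * in the patch). */
struct alien_queue {
    pthread_mutex_t lock;
    int len;            /* stand-in for queued skbs */
    int notifications;  /* stand-in for IPIs sent */
};

static void enqueue(struct alien_queue *q, int nr_skbs)
{
    int need_notify;

    pthread_mutex_lock(&q->lock);
    need_notify = (q->len == 0);   /* was the queue empty? */
    q->len += nr_skbs;             /* splice the batch on */
    pthread_mutex_unlock(&q->lock);

    if (need_notify)
        q->notifications++;        /* the expensive cross-CPU poke */
}

static void drain(struct alien_queue *q)
{
    /* Consumer splices everything away in one locked operation,
     * like process_backlog() does with input_pkt_alien_queue. */
    pthread_mutex_lock(&q->lock);
    q->len = 0;
    pthread_mutex_unlock(&q->lock);
}
```

The same shape appears in many producer/consumer rings: batching the notification is what keeps the per-packet cost of cross-CPU handoff low when traffic is heavy.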