From patchwork Fri Feb 20 18:10:42 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 23510 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by ozlabs.org (Postfix) with ESMTP id 8354FDE0F1 for ; Sat, 21 Feb 2009 05:11:05 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752081AbZBTSK4 (ORCPT ); Fri, 20 Feb 2009 13:10:56 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751765AbZBTSKz (ORCPT ); Fri, 20 Feb 2009 13:10:55 -0500 Received: from gw1.cosmosbay.com ([212.99.114.194]:45978 "EHLO gw1.cosmosbay.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751739AbZBTSKy convert rfc822-to-8bit (ORCPT ); Fri, 20 Feb 2009 13:10:54 -0500 Received: from [127.0.0.1] (localhost [127.0.0.1]) by gw1.cosmosbay.com (8.13.7/8.13.7) with ESMTP id n1KIAfho011763; Fri, 20 Feb 2009 19:10:41 +0100 Message-ID: <499EF222.3060507@cosmosbay.com> Date: Fri, 20 Feb 2009 19:10:42 +0100 From: Eric Dumazet User-Agent: Thunderbird 2.0.0.19 (Windows/20081209) MIME-Version: 1.0 To: Patrick McHardy CC: Stephen Hemminger , David Miller , Rick Jones , netdev@vger.kernel.org, netfilter-devel@vger.kernel.org Subject: [PATCH] iptables: xt_hashlimit fix References: <20090218051906.174295181@vyatta.com> <20090218052747.321329022@vyatta.com> <20090219114719.560999b5@extreme> <499DEF49.3040602@cosmosbay.com> In-Reply-To: <499DEF49.3040602@cosmosbay.com> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [0.0.0.0]); Fri, 20 Feb 2009 19:10:41 +0100 (CET) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Eric Dumazet a écrit : > Stephen Hemminger a écrit : >> The reader/writer lock in ip_tables is acquired in the critical path of >> processing packets and is one of the reasons just loading iptables can cause >> a 20% performance loss. The rwlock serves two functions: >> >> 1) it prevents changes to table state (xt_replace) while table is in use. >> This is now handled by doing rcu on the xt_table. When table is >> replaced, the new table(s) are put in and the old one table(s) are freed >> after RCU period. >> >> 2) it provides synchronization when accesing the counter values. >> This is now handled by swapping in new table_info entries for each cpu >> then summing the old values, and putting the result back onto one >> cpu. On a busy system it may cause sampling to occur at different >> times on each cpu, but no packet/byte counts are lost in the process. >> >> Signed-off-by: Stephen Hemminger > > > Acked-by: Eric Dumazet > > Sucessfully tested on my dual quad core machine too, but iptables only (no ipv6 here) > > BTW, my new "tbench 8" result is 2450 MB/s, (it was 2150 MB/s not so long ago) > > Thanks Stephen, thats very cool stuff, yet another rwlock out of kernel :) Damned this broke xt_hashlimit, version=0 Look file "net/netfilter/xt_hashlimit.c" line 706 /* Ugly hack: For SMP, we only want to use one set */ r->u.master = r; File "include/linux/netfilter/xt_hashlimit.h" struct xt_hashlimit_info { char name [IFNAMSIZ]; /* name */ struct hashlimit_cfg cfg; /* Used internally by the kernel */ struct xt_hashlimit_htable *hinfo; union { void *ptr; struct xt_hashlimit_info *master; } u; }; So, it appears some modules are using pointers to themselves, what a hack :( We probably need an audit of other modules. (net/netfilter/xt_statistic.c, net/netfilter/xt_quota.c, net/netfilter/xt_limit.c ...) Unfortunatly I wont have time to do this in following days, any volunteer ? Thank you [PATCH] netfilter: xt_hashlimit fix Commit 784544739a25c30637397ace5489eeb6e15d7d49 (netfilter: iptables: lock free counters) broke xt_hashlimit netfilter module : This module was storing a pointer inside its xt_hashlimit_info, and this pointer is not relocated when we temporarly switch tables (iptables -L). This hack is not not needed at all (probably a leftover from ancient time), as each cpu should and can access to its own copy. Signed-off-by: Eric Dumazet --- -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c index 2482055..a5b5369 100644 --- a/net/netfilter/xt_hashlimit.c +++ b/net/netfilter/xt_hashlimit.c @@ -565,8 +565,7 @@ hashlimit_init_dst(const struct xt_hashlimit_htable *hinfo, static bool hashlimit_mt_v0(const struct sk_buff *skb, const struct xt_match_param *par) { - const struct xt_hashlimit_info *r = - ((const struct xt_hashlimit_info *)par->matchinfo)->u.master; + const struct xt_hashlimit_info *r = par->matchinfo; struct xt_hashlimit_htable *hinfo = r->hinfo; unsigned long now = jiffies; struct dsthash_ent *dh; @@ -702,8 +701,6 @@ static bool hashlimit_mt_check_v0(const struct xt_mtchk_param *par) } mutex_unlock(&hlimit_mutex); - /* Ugly hack: For SMP, we only want to use one set */ - r->u.master = r; return true; }