[net-next] rps: introduce a new sysctl switch rps_workaround_buggy_driver

Message ID 4F7D7464.7040503@gmail.com
State Rejected, archived
Delegated to: David Miller

Commit Message

Li Yu April 5, 2012, 10:31 a.m. UTC
We encountered a buggy NIC driver (or hardware/firmware) that keeps a
constant non-zero skb->rxhash for a long time, so with RPS enabled
the target CPU also stays the same for a long time.

This patch introduces a sysctl switch to work around this problem:
when the switch is on, the RPS core discards the skb->rxhash
computed by the NIC hardware.

I hope this patch can help others too, thanks.

Signed-off-by: Li Yu <bingtian.ly@taobao.com>
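
The effect described above can be reproduced in a small userspace sketch. It is not kernel code: `pick_cpu`, `soft_hash`, `get_rxhash`, `cpus_used`, and `struct pkt` are hypothetical names chosen for illustration. The scaling in `pick_cpu` is assumed to resemble how RPS turns a hash into an index in its CPU map, and `soft_hash` is only a stand-in for the software flow hash that `__skb_get_rxhash()` would compute.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 64

/* Minimal stand-in for an skb: a flow identifier plus the NIC-provided hash. */
struct pkt {
	uint32_t flow_id;
	uint32_t rxhash;
};

/* Scale a 32-bit hash onto [0, ncpus); assumed to resemble the RPS map lookup. */
static unsigned int pick_cpu(uint32_t hash, unsigned int ncpus)
{
	assert(ncpus > 0);
	return (unsigned int)(((uint64_t)hash * ncpus) >> 32);
}

/* Hypothetical software hash: multiplicative mix of the flow id, never 0. */
static uint32_t soft_hash(uint32_t flow_id)
{
	uint32_t h = flow_id * 2654435761u;

	h ^= h >> 16;
	return h ? h : 1;
}

/* Same shape as the patched skb_get_rxhash(): optionally drop the NIC hash. */
static uint32_t get_rxhash(struct pkt *p, int workaround)
{
	if (workaround)
		p->rxhash = 0;
	if (!p->rxhash)
		p->rxhash = soft_hash(p->flow_id);
	return p->rxhash;
}

/* Count the distinct CPUs chosen when every packet carries the same NIC hash. */
static unsigned int cpus_used(uint32_t nic_hash, unsigned int nflows,
			      unsigned int ncpus, int workaround)
{
	unsigned char seen[MAX_CPUS] = { 0 };
	unsigned int used = 0;

	assert(ncpus > 0 && ncpus <= MAX_CPUS);
	for (uint32_t flow = 1; flow <= nflows; flow++) {
		struct pkt p = { .flow_id = flow, .rxhash = nic_hash };
		unsigned int cpu = pick_cpu(get_rxhash(&p, workaround), ncpus);

		if (!seen[cpu]) {
			seen[cpu] = 1;
			used++;
		}
	}
	return used;
}
```

With a constant non-zero `nic_hash`, `cpus_used(nic_hash, 100, 8, 0)` reports a single CPU regardless of how many flows there are. Passing `workaround = 1` forces the software hash and spreads the flows over more than one CPU.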

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller April 5, 2012, 10:44 a.m. UTC | #1
From: Li Yu <raise.sail@gmail.com>
Date: Thu, 05 Apr 2012 18:31:00 +0800

> 
> We encountered a buggy NIC driver or hardware/firmware, it keeps
> non-zero constant skb->rxhash for long time, so if we enabled RPS,
> the targeted CPU keeps same for long time too.
> 
> This patch introduces a sysctl switch to workaround for such problem,
> if the switch was on, RPS core discards the skb->rxhash that is
> computed by NIC hardware.
> 
> Hope this patch also can help others, thanks.
> 
> Signed-off-by Li Yu <bingtian.ly@taobao.com>

No way, we fix the drivers not add workarounds like this.
Eric Dumazet April 5, 2012, 11:07 a.m. UTC | #2
On Thu, 2012-04-05 at 18:31 +0800, Li Yu wrote:
> We encountered a buggy NIC driver or hardware/firmware, it keeps
> non-zero constant skb->rxhash for long time, so if we enabled RPS,
> the targeted CPU keeps same for long time too.
> 
> This patch introduces a sysctl switch to workaround for such problem,
> if the switch was on, RPS core discards the skb->rxhash that is
> computed by NIC hardware.
> 
> Hope this patch also can help others, thanks.

Really?

To disable the driver's rxhash, you should try:

ethtool -K eth0 rxhash off



Li Yu April 6, 2012, 2:07 a.m. UTC | #3
On 2012-04-05 19:07, Eric Dumazet wrote:
> On Thu, 2012-04-05 at 18:31 +0800, Li Yu wrote:
>> We encountered a buggy NIC driver or hardware/firmware, it keeps
>> non-zero constant skb->rxhash for long time, so if we enabled RPS,
>> the targeted CPU keeps same for long time too.
>>
>> This patch introduces a sysctl switch to workaround for such problem,
>> if the switch was on, RPS core discards the skb->rxhash that is
>> computed by NIC hardware.
>>
>> Hope this patch also can help others, thanks.
>
> Really ?
>
> to disable this driver rxhash, you should try :
>
> ethtool -K eth0 rxhash off
>
>

Great! I did not know about this option before, and had written
a kprobe workaround module. It seems that ethtool has supported
it since at least 2.6.39.

Thank you very much!

Yu

>
>

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 192250b..4c28ce0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -628,9 +628,13 @@ extern unsigned int skb_find_text(struct sk_buff *skb, unsigned int from,
 				    unsigned int to, struct ts_config *config,
 				    struct ts_state *state);

+extern int rps_workaround_buggy_driver;
 extern void __skb_get_rxhash(struct sk_buff *skb);
 static inline __u32 skb_get_rxhash(struct sk_buff *skb)
 {
+	if (unlikely(rps_workaround_buggy_driver))
+		skb->rxhash = 0;
+
 	if (!skb->rxhash)
 		__skb_get_rxhash(skb);

diff --git a/net/core/dev.c b/net/core/dev.c
index 723a406..9d1e728 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -176,6 +176,8 @@ 
 #define PTYPE_HASH_SIZE	(16)
 #define PTYPE_HASH_MASK	(PTYPE_HASH_SIZE - 1)

+int rps_workaround_buggy_driver = 0;
+
 static DEFINE_SPINLOCK(ptype_lock);
 static struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
 static struct list_head ptype_all __read_mostly;	/* Taps */
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 0c28508..065ea7c 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -172,6 +172,13 @@  static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= rps_sock_flow_sysctl
 	},
+	{
+		.procname	= "rps_workaround_buggy_driver",
+		.data		= &rps_workaround_buggy_driver,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 #endif
 #endif /* CONFIG_NET */
 	{