[RFC,1/2] net-next: fix DSA flow_disection

Message ID 20170620080655.7251-1-john@phrozen.org
State RFC, archived
Delegated to: David Miller

Commit Message

John Crispin June 20, 2017, 8:06 a.m. UTC
RPS and probably other kernel features are currently broken on some if not
all DSA devices. The root cause of this that skb_hash will call the
flow_disector. At this point the skb still contains the magic switch header
and the skb->protocol field is not set up to the correct 802.3 value yet.
by the time the tag specific code is called, removing the header and
properly setting the protocol an invalid hash is already set. In the case
of the mt7530 this will result in all flows always having the same hash.

The patch adds 2 new fields to the dsa_switch_ops allowing the
flow_disector to use them in order to be able to create the real hash of
the connection.

Signed-off-by: John Crispin <john@phrozen.org>
---
 include/net/dsa.h         |  6 ++++++
 net/core/flow_dissector.c | 12 ++++++++++++
 2 files changed, 18 insertions(+)

Comments

Sergei Shtylyov June 20, 2017, 10:17 a.m. UTC | #1
Hello!

On 6/20/2017 11:06 AM, John Crispin wrote:

> RPS and probably other kernel features are currently broken on some if not
> all DSA devices. The root cause of this that skb_hash will call the

   "Is" missing between "this" and "that"?

> flow_disector. At this point the skb still contains the magic switch header

   Dissector?

> and the skb->protocol field is not set up to the correct 802.3 value yet.
> by the time the tag specific code is called, removing the header and
> properly setting the protocol an invalid hash is already set. In the case
> of the mt7530 this will result in all flows always having the same hash.
>
> The patch adds 2 new fields to the dsa_switch_ops allowing the
> flow_disector to use them in order to be able to create the real hash of

   Again.

> the connection.
>
> Signed-off-by: John Crispin <john@phrozen.org>
[...]
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index fc5fc4594c90..da45bdf57408 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
[...]
> @@ -440,6 +441,17 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>  			 skb->vlan_proto : skb->protocol;
>  		nhoff = skb_network_offset(skb);
>  		hlen = skb_headlen(skb);
> +
> +		if (unlikely(netdev_uses_dsa(skb->dev))) {
> +			const struct dsa_switch_ops *ops;
> +			u8 *p = (u8 *) data;

     Didn't checkpatch.pl complain about space after (u8 *)?

> +
> +			ops = skb->dev->dsa_ptr->ds[0]->ops;
> +			if (ops->hash_proto_off)
> +				proto = (u16) p[ops->hash_proto_off];

    Again, didn't it?

[...]

MBR, Sergei
Andrew Lunn June 20, 2017, 2:01 p.m. UTC | #2
On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
> RPS and probably other kernel features are currently broken on some if not
> all DSA devices. The root cause of this that skb_hash will call the
> flow_disector.

Hi John

What is the call path when the flow_disector is called? I'm wondering
if we can defer this, and call it later, after the tag code has
removed the header.

	Andrew
Florian Fainelli June 20, 2017, 5:30 p.m. UTC | #3
On 06/20/2017 07:01 AM, Andrew Lunn wrote:
> On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
>> RPS and probably other kernel features are currently broken on some if not
>> all DSA devices. The root cause of this that skb_hash will call the
>> flow_disector.
> 
> Hi John
> 
> What is the call path when the flow_disector is called? I'm wondering
> if we can defer this, and call it later, after the tag code has
> removed the header.

Would not you usually want to configure RPS at the DSA network device
level where the switch tag has already been popped and you are
processing a regular Ethernet frame at that point?
John Crispin June 20, 2017, 5:37 p.m. UTC | #4
On 20/06/17 16:01, Andrew Lunn wrote:
> On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
>> RPS and probably other kernel features are currently broken on some if not
>> all DSA devices. The root cause of this that skb_hash will call the
>> flow_disector.
> Hi John
>
> What is the call path when the flow_disector is called? I'm wondering
> if we can defer this, and call it later, after the tag code has
> removed the header.
>
> 	Andrew

Hi Andrew,

The Ethernet driver receives the frame and passes it down the line. 
Eventually it ends up inside netif_receive_skb_internal() where it gets 
added to the backlog. At this point get_rps_cpu() is called. Inside 
get_rps_cpu() the skb_get_hash() is called which utilizes the 
flow_dissector() ... which is broken for DSA devices. get_rps_cpu() will 
always return the same hash for all flows and the frame is always added 
to the backlog on the same core. Once inside the backlog it will 
traverse through the dsa layer and end up inside the tag driver and be 
passed to the slave device for further processing and keep its bad flow 
hash for its whole life cycle.

In theory we could reset the hash inside the tag driver but ideally the 
whole life cycle of the frame should happen on the same core to avoid 
possible reordering issues. In addition RPS is broken until the frame 
reaches the tag driver. In the case of the mediatek mt7623 we only have 
1 RX IRQ and in the worst case the RPS of the frame while still inside 
ethX will happen on the same core as where we handle IRQs. This will 
increase the IRQ latency and reduce the free cpu time, thus reducing 
maximum throughput. I did test resetting the hash inside the tag driver. 
Calculating the correct hash from the start did yield a huge performance 
difference however, at least on mt7623. We are talking about 30% extra 
max throughput. This might not be such a big problem if the SoC has a 
multi queue ethernet core but on mt7623 it does make a huge difference 
if we can use RPS to delegate all frame processing away from the core 
handling the IRQs.

     John
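
To illustrate why a constant hash pins every frame to one core, here is a minimal userspace sketch of the hash-to-CPU mapping step. It assumes the mapping uses the same arithmetic as the kernel's reciprocal_scale() helper. rps_pick_cpu_index() is a hypothetical name, and the real get_rps_cpu() does more (flow tables, online-CPU checks) that is left out here.

```c
#include <stdint.h>

/*
 * Same arithmetic as the kernel's reciprocal_scale(): maps a 32-bit
 * value onto the range [0, ep_ro) without a division.
 */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/*
 * Hypothetical helper (not a kernel function): pick an index into an
 * RPS CPU map with ncpus entries from the flow hash of a frame.
 */
static uint32_t rps_pick_cpu_index(uint32_t flow_hash, uint32_t ncpus)
{
	return reciprocal_scale(flow_hash, ncpus);
}
```

With this mapping, identical hashes always give the same index, so if every flow hashes to the same value, all frames land in the backlog of a single core.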
John Crispin June 20, 2017, 5:38 p.m. UTC | #5
On 20/06/17 19:30, Florian Fainelli wrote:
> On 06/20/2017 07:01 AM, Andrew Lunn wrote:
>> On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
>>> RPS and probably other kernel features are currently broken on some if not
>>> all DSA devices. The root cause of this that skb_hash will call the
>>> flow_disector.
>> Hi John
>>
>> What is the call path when the flow_disector is called? I'm wondering
>> if we can defer this, and call it later, after the tag code has
>> removed the header.
> Would not you usually want to configure RPS at the DSA network device
> level where the switch tag has already been popped and you are
> processing a regular Ethernet frame at that point?
Hi Florian,

As explained in my mail to Andrew, you really want to be able to set up 
RPS for all devices in the chain to free up the core handling IRQs.

     John
Andrew Lunn June 20, 2017, 9:52 p.m. UTC | #6
> On Tue, Jun 20, 2017 at 07:37:35PM +0200, John Crispin wrote:
> 
> 
> On 20/06/17 16:01, Andrew Lunn wrote:
> >On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
> >>RPS and probably other kernel features are currently broken on some if not
> >>all DSA devices. The root cause of this that skb_hash will call the
> >>flow_disector.
> >Hi John
> >
> >What is the call path when the flow_disector is called? I'm wondering
> >if we can defer this, and call it later, after the tag code has
> >removed the header.
> >
> >	Andrew

Hi John

I follow your logic of doing the hash early

Is there any value in including the DSA header in the hash? That might
allow frames from different ingress ports to be spread over CPUs?

      Andrew
John Crispin June 21, 2017, 4:33 a.m. UTC | #7
On 20/06/17 23:52, Andrew Lunn wrote:
>> On Tue, Jun 20, 2017 at 07:37:35PM +0200, John Crispin wrote:
>>
>>
>> On 20/06/17 16:01, Andrew Lunn wrote:
>>> On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
>>>> RPS and probably other kernel features are currently broken on some if not
>>>> all DSA devices. The root cause of this that skb_hash will call the
>>>> flow_disector.
>>> Hi John
>>>
>>> What is the call path when the flow_disector is called? I'm wondering
>>> if we can defer this, and call it later, after the tag code has
>>> removed the header.
>>>
>>> 	Andrew
> Hi John
>
> I follow your logic of doing the hash early
>
> Is there any value in including the DSA header in the hash? That might
> allow frames from different ingress ports to be spread over CPUs?
>
>        Andrew
Hi Andrew,

Adding the DSA header won't make any difference and would still require a 
patch to the flow dissector.

     John
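
For readers following the patch under discussion, the offset adjustment it makes in __skb_flow_dissect() can be sketched in userspace as below. This is only an illustration under stated assumptions: struct tag_offsets, struct dissect_start, read_be16() and adjust_for_tag() are hypothetical names, the protocol is kept in host byte order for simplicity, and the bounds checks are additions. Note also that the posted hunk casts a single byte to u16, while this sketch assumes the intent is to read the full 16-bit EtherType.

```c
#include <stdint.h>

/* Hypothetical mirror of the two fields the patch adds to dsa_switch_ops. */
struct tag_offsets {
	int hash_nh_off;    /* bytes the switch tag adds before the network header */
	int hash_proto_off; /* offset of the real EtherType, 0 if unchanged */
};

/* Hypothetical bundle of the dissector's starting state. */
struct dissect_start {
	uint16_t proto; /* EtherType, host byte order in this sketch */
	int nhoff;      /* offset of the network header */
	int hlen;       /* bytes available for dissection */
};

/* Read a big-endian 16-bit value from two consecutive bytes. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Skip the switch tag before dissection starts. Returns 0 on success,
 * -1 if the offsets do not fit in the available header length.
 */
static int adjust_for_tag(const uint8_t *data, const struct tag_offsets *off,
			  struct dissect_start *st)
{
	if (off->hash_nh_off < 0 || off->hash_nh_off > st->hlen)
		return -1;

	if (off->hash_proto_off) {
		if (off->hash_proto_off < 0 ||
		    off->hash_proto_off + 2 > st->hlen)
			return -1;
		st->proto = read_be16(data + off->hash_proto_off);
	}

	st->hlen -= off->hash_nh_off;
	st->nhoff += off->hash_nh_off;
	return 0;
}
```

A tag driver would fill in the two offsets once, and the dissector would then start from the real network header instead of the switch tag.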

Patch

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 58969b9a090c..8b0e8eca3c28 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -442,6 +442,12 @@  struct dsa_switch_ops {
 					 int port, struct net_device *br);
 	void	(*crosschip_bridge_leave)(struct dsa_switch *ds, int sw_index,
 					  int port, struct net_device *br);
+
+	/*
+	 * Network header and 802.3 protocol offsets
+	 */
+	int	hash_nh_off;
+	int	hash_proto_off;
 };
 
 struct dsa_switch_driver {
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index fc5fc4594c90..da45bdf57408 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -4,6 +4,7 @@ 
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/if_vlan.h>
+#include <net/dsa.h>
 #include <net/ip.h>
 #include <net/ipv6.h>
 #include <net/gre.h>
@@ -440,6 +441,17 @@  bool __skb_flow_dissect(const struct sk_buff *skb,
 			 skb->vlan_proto : skb->protocol;
 		nhoff = skb_network_offset(skb);
 		hlen = skb_headlen(skb);
+
+		if (unlikely(netdev_uses_dsa(skb->dev))) {
+			const struct dsa_switch_ops *ops;
+			u8 *p = (u8 *) data;
+
+			ops = skb->dev->dsa_ptr->ds[0]->ops;
+			if (ops->hash_proto_off)
+				proto = (u16) p[ops->hash_proto_off];
+			hlen -= ops->hash_nh_off;
+			nhoff += ops->hash_nh_off;
+		}
 	}
 
 	/* It is ensured by skb_flow_dissector_init() that control key will