Message ID | 20170620080655.7251-1-john@phrozen.org |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Hello!

On 6/20/2017 11:06 AM, John Crispin wrote:

> RPS and probably other kernel features are currently broken on some if not
> all DSA devices. The root cause of this that skb_hash will call the

   "Is" missing between "this" and "that"?

> flow_disector. At this point the skb still contains the magic switch header

   Dissector?

> and the skb->protocol field is not set up to the correct 802.3 value yet.
> by the time the tag specific code is called, removing the header and
> properly setting the protocol an invalid hash is already set. In the case
> of the mt7530 this will result in all flows always having the same hash.
>
> The patch adds 2 new fields to the dsa_switch_ops allowing the
> flow_disector to use them in order to be able to create the real hash of

   Again.

> the connection.
>
> Signed-off-by: John Crispin <john@phrozen.org>
[...]
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index fc5fc4594c90..da45bdf57408 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
[...]
> @@ -440,6 +441,17 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>  				skb->vlan_proto : skb->protocol;
>  		nhoff = skb_network_offset(skb);
>  		hlen = skb_headlen(skb);
> +
> +		if (unlikely(netdev_uses_dsa(skb->dev))) {
> +			const struct dsa_switch_ops *ops;
> +			u8 *p = (u8 *) data;

   Didn't checkpatch.pl complain about space after (u8 *)?

> +
> +			ops = skb->dev->dsa_ptr->ds[0]->ops;
> +			if (ops->hash_proto_off)
> +				proto = (u16) p[ops->hash_proto_off];

   Again, didn't it?

[...]

MBR, Sergei
On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
> RPS and probably other kernel features are currently broken on some if not
> all DSA devices. The root cause of this that skb_hash will call the
> flow_disector.

Hi John

What is the call path when the flow_disector is called? I'm wondering
if we can defer this, and call it later, after the tag code has
removed the header.

Andrew
On 06/20/2017 07:01 AM, Andrew Lunn wrote:
> On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
>> RPS and probably other kernel features are currently broken on some if not
>> all DSA devices. The root cause of this that skb_hash will call the
>> flow_disector.
>
> Hi John
>
> What is the call path when the flow_disector is called? I'm wondering
> if we can defer this, and call it later, after the tag code has
> removed the header.

Would not you usually want to configure RPS at the DSA network device
level where the switch tag has already been popped and you are
processing a regular Ethernet frame at that point?
On 20/06/17 16:01, Andrew Lunn wrote:
> On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
>> RPS and probably other kernel features are currently broken on some if not
>> all DSA devices. The root cause of this that skb_hash will call the
>> flow_disector.
> Hi John
>
> What is the call path when the flow_disector is called? I'm wondering
> if we can defer this, and call it later, after the tag code has
> removed the header.
>
> Andrew

Hi Andrew,

The Ethernet driver receives the frame and passes it down the line. Eventually it ends up inside netif_receive_skb_internal(), where it gets added to the backlog. At this point get_rps_cpu() is called. Inside get_rps_cpu(), skb_get_hash() is called, which utilizes the flow_dissector() ... which is broken for DSA devices. get_rps_cpu() will always return the same hash for all flows, and the frame is always added to the backlog on the same core. Once inside the backlog, it will traverse the DSA layer, end up inside the tag driver and be passed to the slave device for further processing, keeping its bad flow hash for its whole life cycle.

In theory we could reset the hash inside the tag driver, but ideally the whole life cycle of the frame should happen on the same core to avoid possible reordering issues. In addition, RPS is broken until the frame reaches the tag driver. In the case of the MediaTek mt7623 we only have 1 RX IRQ, and in the worst case the RPS of the frame while still inside ethX will happen on the same core as where we handle IRQs. This will increase the IRQ latency and reduce the free CPU time, thus reducing maximum throughput.

I did test resetting the hash inside the tag driver. However, calculating the correct hash from the start did yield a huge performance difference, at least on mt7623. We are talking about 30% extra max throughput.

This might not be such a big problem if the SoC has a multi-queue Ethernet core, but on mt7623 it does make a huge difference if we can use RPS to delegate all frame processing away from the core handling the IRQs.

John
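To make the effect described above concrete, here is a minimal userspace sketch, not kernel code. It assumes that RPS picks a CPU by scaling the 32-bit flow hash onto the number of configured CPUs; `scale_hash()` uses the same arithmetic as the kernel's `reciprocal_scale()` helper. `toy_hash()` (FNV-1a) is only a stand-in for the real flow hash, and `pick_cpu()` is a hypothetical name for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Map a 32-bit hash onto [0, n) without a division; same arithmetic as
 * the kernel's reciprocal_scale(). */
static uint32_t scale_hash(uint32_t hash, uint32_t n)
{
	return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* FNV-1a over the key bytes. Assumption: a stand-in only, the kernel
 * derives the flow hash from the dissected flow keys differently. */
static uint32_t toy_hash(const uint8_t *key, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= key[i];
		h *= 16777619u;
	}
	return h;
}

/* Hypothetical helper: choose a CPU index for the flow identified by key. */
static uint32_t pick_cpu(const uint8_t *key, size_t len, uint32_t ncpus)
{
	return scale_hash(toy_hash(key, len), ncpus);
}
```

If the bytes fed to the hash are the same for every frame, which is roughly the situation when the dissector cannot see past the switch tag, `pick_cpu()` returns the same index every time and all frames queue on one core.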
On 20/06/17 19:30, Florian Fainelli wrote:
> On 06/20/2017 07:01 AM, Andrew Lunn wrote:
>> On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
>>> RPS and probably other kernel features are currently broken on some if not
>>> all DSA devices. The root cause of this that skb_hash will call the
>>> flow_disector.
>> Hi John
>>
>> What is the call path when the flow_disector is called? I'm wondering
>> if we can defer this, and call it later, after the tag code has
>> removed the header.
> Would not you usually want to configure RPS at the DSA network device
> level where the switch tag has already been popped and you are
> processing a regular Ethernet frame at that point?

Hi Florian,

As explained in my mail to Andrew, you really want to be able to set up RPS for all devices in the chain to free up the core handling IRQs.

John
> On Tue, Jun 20, 2017 at 07:37:35PM +0200, John Crispin wrote:
>
>
> On 20/06/17 16:01, Andrew Lunn wrote:
> >On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
> >>RPS and probably other kernel features are currently broken on some if not
> >>all DSA devices. The root cause of this that skb_hash will call the
> >>flow_disector.
> >Hi John
> >
> >What is the call path when the flow_disector is called? I'm wondering
> >if we can defer this, and call it later, after the tag code has
> >removed the header.
> >
> > Andrew

Hi John

I follow your logic of doing the hash early

Is there any value in including the DSA header in the hash? That might
allow frames from different ingress ports to be spread over CPUs?

Andrew
On 20/06/17 23:52, Andrew Lunn wrote:
>> On Tue, Jun 20, 2017 at 07:37:35PM +0200, John Crispin wrote:
>>
>>
>> On 20/06/17 16:01, Andrew Lunn wrote:
>>> On Tue, Jun 20, 2017 at 10:06:54AM +0200, John Crispin wrote:
>>>> RPS and probably other kernel features are currently broken on some if not
>>>> all DSA devices. The root cause of this that skb_hash will call the
>>>> flow_disector.
>>> Hi John
>>>
>>> What is the call path when the flow_disector is called? I'm wondering
>>> if we can defer this, and call it later, after the tag code has
>>> removed the header.
>>>
>>> Andrew
> Hi John
>
> I follow your logic of doing the hash early
>
> Is there any value in including the DSA header in the hash? That might
> allow frames from different ingress ports to be spread over CPUs?
>
> Andrew

Hi Andrew,

Adding the DSA header won't make any difference and would still require a patch to the flow dissector.

John
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 58969b9a090c..8b0e8eca3c28 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -442,6 +442,12 @@ struct dsa_switch_ops {
 				      int port, struct net_device *br);
 	void	(*crosschip_bridge_leave)(struct dsa_switch *ds, int sw_index,
 					  int port, struct net_device *br);
+
+	/*
+	 * Network header and 802.3 protocol offsets
+	 */
+	int hash_nh_off;
+	int hash_proto_off;
 };
 
 struct dsa_switch_driver {
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index fc5fc4594c90..da45bdf57408 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -4,6 +4,7 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/if_vlan.h>
+#include <net/dsa.h>
 #include <net/ip.h>
 #include <net/ipv6.h>
 #include <net/gre.h>
@@ -440,6 +441,17 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 				skb->vlan_proto : skb->protocol;
 		nhoff = skb_network_offset(skb);
 		hlen = skb_headlen(skb);
+
+		if (unlikely(netdev_uses_dsa(skb->dev))) {
+			const struct dsa_switch_ops *ops;
+			u8 *p = (u8 *) data;
+
+			ops = skb->dev->dsa_ptr->ds[0]->ops;
+			if (ops->hash_proto_off)
+				proto = (u16) p[ops->hash_proto_off];
+			hlen -= ops->hash_nh_off;
+			nhoff += ops->hash_nh_off;
+		}
 	}
 
 	/* It is ensured by skb_flow_dissector_init() that control key will
RPS and probably other kernel features are currently broken on some if not
all DSA devices. The root cause of this that skb_hash will call the
flow_disector. At this point the skb still contains the magic switch header
and the skb->protocol field is not set up to the correct 802.3 value yet.
by the time the tag specific code is called, removing the header and
properly setting the protocol an invalid hash is already set. In the case
of the mt7530 this will result in all flows always having the same hash.

The patch adds 2 new fields to the dsa_switch_ops allowing the
flow_disector to use them in order to be able to create the real hash of
the connection.

Signed-off-by: John Crispin <john@phrozen.org>
---
 include/net/dsa.h         |  6 ++++++
 net/core/flow_dissector.c | 12 ++++++++++++
 2 files changed, 18 insertions(+)
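As an illustration of what the two offsets are meant to do, the following userspace sketch mirrors the adjustment made in the net/core/flow_dissector.c hunk. It is a sketch under assumptions, not the patch itself: `struct tag_offsets`, `struct dissect_start`, `read_be16()` and `adjust_for_tag()` are hypothetical names, and the protocol is kept in host byte order for simplicity. One observation on the C expression, not raised in the thread: `(u16) p[off]` loads a single byte and widens it, so the sketch reads both bytes of the 16-bit protocol field instead.

```c
#include <stdint.h>

/* Hypothetical mirror of the two fields added to dsa_switch_ops. */
struct tag_offsets {
	int hash_nh_off;	/* extra bytes in front of the network header */
	int hash_proto_off;	/* offset of the real protocol field, 0 = unused */
};

/* Where dissection should start once the tag has been accounted for. */
struct dissect_start {
	uint16_t proto;		/* protocol value, host byte order here */
	int nhoff;		/* network header offset */
	int hlen;		/* remaining header length */
};

/* Read a big-endian 16-bit value one byte at a time. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Apply the tag offsets to the values the dissector started with. */
static struct dissect_start adjust_for_tag(const uint8_t *data, uint16_t proto,
					   int nhoff, int hlen,
					   const struct tag_offsets *t)
{
	struct dissect_start s = { proto, nhoff, hlen };

	if (t->hash_proto_off)
		s.proto = read_be16(data + t->hash_proto_off);
	s.nhoff += t->hash_nh_off;
	s.hlen -= t->hash_nh_off;
	return s;
}
```

The offsets a real tag driver would supply depend on its tag format and are not stated in the thread, so any values passed in when trying this out are made up.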