Message ID | 20190824024251.4542-1-marek.behun@nic.cz |
---|---|
Headers | show |
Series | Multi-CPU DSA support | expand |
On Sat, Aug 24, 2019 at 04:42:47AM +0200, Marek Behún wrote: > Hi, > this is my attempt to solve the multi-CPU port issue for DSA. > > Patch 1 adds code for handling multiple CPU ports in a DSA switch tree. > If more than one CPU port is found in a tree, the code assigns CPU ports > to user/DSA ports in a round robin way. So for the simplest case where > we have one switch with N ports, 2 of them of type CPU connected to eth0 > and eth1, and the other ports labels being lan1, lan2, ..., the code > assigns them to CPU ports this way: > lan1 <-> eth0 > lan2 <-> eth1 > lan3 <-> eth0 > lan4 <-> eth1 > lan5 <-> eth0 Hi Marek That is what i've always argued is a good default. So i'm happy with this. > Patch 2 adds a new operation to the net device operations structure. > Currently we use the iflink property of a net device to report to which > CPU port a given switch port si connected to. The ip link utility from > iproute2 reports this as "lan1@eth0". We add a new net device operation, > ndo_set_iflink, which can be used to set this property. We call this > function from the netlink handlers. That is a new idea. Interesting. I would like to look around and see what else uses this "lan1@eth0" concept. We need to ensure it is not counter intuitive in general, when you consider all possible users. > Patch 3 implements this new ndo_set_iflink operation for DSA slave > device. Thus the userspace can request a change of CPU port of a given > port. So this is all about transmit from the host out the switch. What about receive? How do you tell the switch which CPU interface it should use for a port? Thanks Andrew
Hi Marek, On Sat, 24 Aug 2019 at 05:43, Marek Behún <marek.behun@nic.cz> wrote: > > Hi, > this is my attempt to solve the multi-CPU port issue for DSA. > > Patch 1 adds code for handling multiple CPU ports in a DSA switch tree. > If more than one CPU port is found in a tree, the code assigns CPU ports > to user/DSA ports in a round robin way. So for the simplest case where > we have one switch with N ports, 2 of them of type CPU connected to eth0 > and eth1, and the other ports labels being lan1, lan2, ..., the code > assigns them to CPU ports this way: > lan1 <-> eth0 > lan2 <-> eth1 > lan3 <-> eth0 > lan4 <-> eth1 > lan5 <-> eth0 > ... > > Patch 2 adds a new operation to the net device operations structure. > Currently we use the iflink property of a net device to report to which > CPU port a given switch port si connected to. The ip link utility from > iproute2 reports this as "lan1@eth0". We add a new net device operation, > ndo_set_iflink, which can be used to set this property. We call this > function from the netlink handlers. > > Patch 3 implements this new ndo_set_iflink operation for DSA slave > device. Thus the userspace can request a change of CPU port of a given > port. > > I am also sending patch for iproute2-next, to add support for setting > this iflink value. > > Marek > The topic is interesting. This changeset leaves the reader wanting to see a driver implementation of .port_change_cpu_port. (mostly to understand what your hardware is capable of) Will DSA assume that all CPU ports are equal in terms of tagging protocol abilities? There are switches where one of the CPU ports can do tagging and the other can't. Is the static assignment between slave and CPU ports going to be the only use case? What about link aggregation? Flow steering perhaps? And like Andrew pointed out, how do you handle the receive case? What happens to flooded frames, will the switch send them to both CPU interfaces, and get received twice in Linux? How do you prevent that? > Marek Behún (3): > net: dsa: allow for multiple CPU ports > net: add ndo for setting the iflink property > net: dsa: implement ndo_set_netlink for chaning port's CPU port > > include/linux/netdevice.h | 5 +++ > include/net/dsa.h | 11 ++++- > net/core/dev.c | 15 +++++++ > net/core/rtnetlink.c | 7 ++++ > net/dsa/dsa2.c | 84 +++++++++++++++++++++++++-------------- > net/dsa/slave.c | 35 ++++++++++++++++ > 6 files changed, 126 insertions(+), 31 deletions(-) > > -- > 2.21.0 > Regards, -Vladimir
On Sat, 24 Aug 2019 at 18:40, Vladimir Oltean <olteanv@gmail.com> wrote: > > Hi Marek, > > On Sat, 24 Aug 2019 at 05:43, Marek Behún <marek.behun@nic.cz> wrote: > > > > Hi, > > this is my attempt to solve the multi-CPU port issue for DSA. > > > > Patch 1 adds code for handling multiple CPU ports in a DSA switch tree. > > If more than one CPU port is found in a tree, the code assigns CPU ports > > to user/DSA ports in a round robin way. So for the simplest case where > > we have one switch with N ports, 2 of them of type CPU connected to eth0 > > and eth1, and the other ports labels being lan1, lan2, ..., the code > > assigns them to CPU ports this way: > > lan1 <-> eth0 > > lan2 <-> eth1 > > lan3 <-> eth0 > > lan4 <-> eth1 > > lan5 <-> eth0 > > ... > > > > Patch 2 adds a new operation to the net device operations structure. > > Currently we use the iflink property of a net device to report to which > > CPU port a given switch port si connected to. The ip link utility from > > iproute2 reports this as "lan1@eth0". We add a new net device operation, > > ndo_set_iflink, which can be used to set this property. We call this > > function from the netlink handlers. > > > > Patch 3 implements this new ndo_set_iflink operation for DSA slave > > device. Thus the userspace can request a change of CPU port of a given > > port. > > > > I am also sending patch for iproute2-next, to add support for setting > > this iflink value. > > > > Marek > > > > The topic is interesting. > This changeset leaves the reader wanting to see a driver > implementation of .port_change_cpu_port. (mostly to understand what > your hardware is capable of) > Will DSA assume that all CPU ports are equal in terms of tagging > protocol abilities? There are switches where one of the CPU ports can > do tagging and the other can't. Just to be clear. You can argue that such switches are weird, and that's ok. Just want to understand the general type of hardware for which such a patch is intended. > Is the static assignment between slave and CPU ports going to be the > only use case? What about link aggregation? Flow steering perhaps? > And like Andrew pointed out, how do you handle the receive case? What > happens to flooded frames, will the switch send them to both CPU > interfaces, and get received twice in Linux? How do you prevent that? > > > Marek Behún (3): > > net: dsa: allow for multiple CPU ports > > net: add ndo for setting the iflink property > > net: dsa: implement ndo_set_netlink for chaning port's CPU port > > > > include/linux/netdevice.h | 5 +++ > > include/net/dsa.h | 11 ++++- > > net/core/dev.c | 15 +++++++ > > net/core/rtnetlink.c | 7 ++++ > > net/dsa/dsa2.c | 84 +++++++++++++++++++++++++-------------- > > net/dsa/slave.c | 35 ++++++++++++++++ > > 6 files changed, 126 insertions(+), 31 deletions(-) > > > > -- > > 2.21.0 > > > > Regards, > -Vladimir
> Will DSA assume that all CPU ports are equal in terms of tagging > protocol abilities? There are switches where one of the CPU ports can > do tagging and the other can't. Hi Vladimir Given the current definition of what a CPU port is, we have to assume the port is using tags. Frames have to be directed out a specific egress port, otherwise things like BPDU, PTP will break. You cannot rely on MAC address learning. > Is the static assignment between slave and CPU ports going to be the > only use case? What about link aggregation? Flow steering perhaps? > And like Andrew pointed out, how do you handle the receive case? What > happens to flooded frames, will the switch send them to both CPU > interfaces, and get received twice in Linux? How do you prevent that? I expect bad things will happen if frames are flooded to multiple CPU ports. For this to work, the whole switch design needs to support multiple CPU ports. I doubt this will work on any old switch. Having a host interface connected to a user port of the switch is a completely different uses case, and not what this patchset is about. Andrew
On Sat, 24 Aug 2019 17:24:07 +0200 Andrew Lunn <andrew@lunn.ch> wrote: > So this is all about transmit from the host out the switch. What about > receive? How do you tell the switch which CPU interface it should use > for a port? Andrew, we use the same. The DSA slave implementation of ndo_set_iflink will also tell the switch driver to change the CPU port for that port. Patch 3 also adds operation port_change_cpu_port to the DSA switch operations. This is called from dsa_slave_set_iflink (at least in this first proposal). Marek
On Sat, Aug 24, 2019 at 07:45:46PM +0200, Marek Behun wrote: > On Sat, 24 Aug 2019 17:24:07 +0200 > Andrew Lunn <andrew@lunn.ch> wrote: > > > So this is all about transmit from the host out the switch. What about > > receive? How do you tell the switch which CPU interface it should use > > for a port? > > Andrew, we use the same. The DSA slave implementation of ndo_set_iflink > will also tell the switch driver to change the CPU port for that port. > Patch 3 also adds operation port_change_cpu_port to the DSA switch > operations. This is called from dsa_slave_set_iflink (at least in this > first proposal). Yes, i noticed this later. The cover letter did not include a change to a driver, so it was not clear you had considered receive, which is very much in the hard of the switch driver, not the DSA core. Andrew
On Sat, 24 Aug 2019 18:44:44 +0300 Vladimir Oltean <olteanv@gmail.com> wrote: > Just to be clear. You can argue that such switches are weird, and > that's ok. Just want to understand the general type of hardware for > which such a patch is intended. Vladimir, the general part should solve for devices like Turris 1.x (qca8k) and Turris Omnia (mv88e6xxx). In these devices the switch is connected to CPU via 2 ports, and 5 ports are connected to RJ-45s. I answered Andrew's question about the receive path in previous mail. To your other question I still would have to think about, but the general idea is that for other types of frames the switch driver should only use one CPU port, so that no frame would reach CPU 2 times. I shall send proposed implementation for mv88e6xxx in next version, perhaps this night. Marek
On Sat, 24 Aug 2019 17:56:36 +0200 Andrew Lunn <andrew@lunn.ch> wrote: > I expect bad things will happen if frames are flooded to multiple CPU > ports. For this to work, the whole switch design needs to support > multiple CPU ports. I doubt this will work on any old switch. > > Having a host interface connected to a user port of the switch is a > completely different uses case, and not what this patchset is about. In the next proposal I shall also add a guard to all DSA drivers, that if more than one CPU port is set, the driver will not probe. After that the next patch will try to add multi-CPU support to mv88e6xxx (while removeing the guard for that driver). qca8k should also be possible to do, since we used it in such a way in openwrt. I shall look into that afterwards. Marek
On 8/23/2019 7:42 PM, Marek Behún wrote: > Hi, > this is my attempt to solve the multi-CPU port issue for DSA. > > Patch 1 adds code for handling multiple CPU ports in a DSA switch tree. > If more than one CPU port is found in a tree, the code assigns CPU ports > to user/DSA ports in a round robin way. So for the simplest case where > we have one switch with N ports, 2 of them of type CPU connected to eth0 > and eth1, and the other ports labels being lan1, lan2, ..., the code > assigns them to CPU ports this way: > lan1 <-> eth0 > lan2 <-> eth1 > lan3 <-> eth0 > lan4 <-> eth1 > lan5 <-> eth0 > ... > > Patch 2 adds a new operation to the net device operations structure. > Currently we use the iflink property of a net device to report to which > CPU port a given switch port si connected to. The ip link utility from > iproute2 reports this as "lan1@eth0". We add a new net device operation, > ndo_set_iflink, which can be used to set this property. We call this > function from the netlink handlers. > > Patch 3 implements this new ndo_set_iflink operation for DSA slave > device. Thus the userspace can request a change of CPU port of a given > port. > > I am also sending patch for iproute2-next, to add support for setting > this iflink value. This is going to be a long email, that is broken into several parts, feel free to skip/reply on the parts you would like. - review/comments on your approach here - history of multiple CPU ports within Broadcom switches - specific use case for a device that uses upstream drivers and a Broadcom switch (BCM7278) 1) Your approach is kind of interesting here, not sure if it is the best but it is not outright wrong. In the past, we had been talking about different approaches, some of which seemed too simplistic or too narrow on the use case, and some of which that are up in the air and were not worked on. - John Crispin submitted a patch series for the MTK switch driver a while back that was picked up by Frank Wunderlich more recently. This approach uses a Device Tree based configuration in order to statically assign ports, or groups of ports to a specific DSA master device. This is IMHO wrong because a) DT is not to impose a policy but strictly describe HW, and b) there was no way to change that static assignment at runtime. - Based on that patch series, Andrew, Vivien, Frank and myself discussed two possible options: - allowing the enslaving of DSA master devices in the bridge, so as to provide a hint that specific DSA slave network devices should be "bound"/"linked" to a specific DSA master device. This requires modifications in the bridge layer to avoid undoing what commit 8db0a2ee2c6302a1dcbcdb93cb731dfc6c0cdb5e ("net: bridge: reject DSA-enabled master netdevices as bridge members"). This would also require a bridge to be set-up - enhancing the iproute2 command and backing kernel code in order to allow defining that a DSA slave device may be enslaved into a specific DSA master, similarly to how you currently enslave a device into a bridge, you could "enslave" a DSA slave to a DSA master with something that could look like this: ip link set dev sw0p0 master eth0 # Associate port 0 with eth0 ip link set dev sw0p1 master eth1 # Associate port 1 with eth1 To date, this may be the best way to do what we want here, rather than use the iflink which is a bit re-purposing something that is not exactly meant for that. 2) With Broadcom Ethernet switches there has been historically few designs that allowed the following: - 6 port(s) switches: port 5 was the CPU port, always and supports tagging (EthType + 4bytes Broadcom tag). This is the 5325, 5365 class of switches, they are nearly 20 years old now. - 9 port(s) switches: both port 5 and port 8 could be defined as In-band Managemement Port(s) (IMP) and port 5 is IMP1 and port 8 is IMP0 by default, preference is to use IMP0. Tagging is only supported on those two ports. These are the 5395, 53125 and similar switches. Port 5 is typically meant to be connected to a WAN interface where it might be necessary to run some specific management protocol like 802.1x and such that would require a managed switch to assist with those tasks. Port 5 can do most of what port 8 does, except when it comes to classification and specific remapping rules, that limitation is carried forward with all switches described below. - 9 port(s) switches: port 5, 7 or 8 support tagging, with port 5 and 8 being possible management ports. Tagging is permitted on port 7 to provide in-band information about packets (e.g.: classification ID, QoS, etc.) to an "accelerator" (whatever it is), or a MoCA interface behind port 7. This is the BCM5301X class, the NorthStar Plus (58xxx), and the BCM7445 switches. Tagging can be enabled on RX, or TX, or both. - 9 port(s) switches: all ports support tagging, with RX only, TX only, or both directions supporting tagging. Again, port 8 remains the default and preferred management port. This is the BCM7278 class of switches. What needs to be considered here is that while multiple CPU ports may be defined, if say, both port 8 and port 5 are defined, it is preferable to continue using port 8 as the default management port because it is the most capable, therefore having a switch driver callback that allows us to elect the most suitable CPU/management port might be a good idea. If other switches treat all CPU ports equal, no need to provide that callback. 3) On the BCM7278 system we have 3 external ports wired, 1 internal port to an audio/video streaming accelerator and 2 ports wired to Ethernet MACs, on port 8 and port 5, an ascii diagram looks like this: ------------------- ------------------- |SYSTEMPORT Lite 0| |SYSTEMPORT Lite 1| ------------------- ------------------- | | | | ------------------------------------------------- | Port 8 Port 5 | | | ---------------- | Port 7 |-----| A/V streaming| | | ---------------- | Port 0 Port 1 Port 2 | ------------------------------------------------| GPHY RGMII_1 RGMII_2 GPHY is an integrated Gigabit PHY, RGMII_1 and 2 connect to external MII/RevMII/GMII/RGMII external PHYs. The Device Tree for that system declares both CPU ports by providing a phandle to the respective Ethernet MAC controllers. Now, depending on the kernel version though, you may have different behaviors: 4.9 is our downstream production kernel ATM and on such a system you have: DSA master: eth0 (port 8) DSA slaves: gphy (port 0) rgmii_1 (port 1) rgmii_2 (port 2) asp (port 7) wifi (port 5) Standard interface: eth1 (port 5) And here you should be like: hold on Florian, you have two interfaces that each represent one side of the pipe (wifi, eth1), is not that counter to the very DSA principles? On an upstream kernel though, eth0 is still present, but eth1, because it is connected to port 5 and thus has a lower number, gets chosen as the DSA master, and then the "wifi" interface is not created at all. all of that is expeccted. Now, the 4.9 kernel behavior actually works just fine because eth1 is not a special interface, so no tagging is expected, and "wifi", although it supports DSA tagging, represents another side of the CPU/host network stack, so you never have to inject frames into the switch, because you can use eth1 to do that and let MAC learning do its job to forward to the correct port of the switch. Likewise, for a frame ingressing port 0 for a MAC address that is behind port 5 that works too. The "wifi" interface here acts as a control interface that allows us to have a configuration end-point for port number 5. The typical set-up we have involves two bridge devices: br-lan spans port 0, port 1, port 2, port 7 and port 5 and takes care of putting all of these ports in the same broadcast domain br-wifi spans eth1 and another device, e.g: wlan0 and takes care of bridging LAN to WLAN Now, let's say your use case involves doubling the bandwidth for routing/NAT and you have two CPU ports for that purpose. You could use exactly that same set-up as described, and create a LAN bridge that does not span the switch port to which your second Ethernet MAC is connected to. Leave that WAN port as a standalone DSA device. Although you do need a way to indicate somehow that port X connects to Ethernet MAC Y, which Device Tree already provides. Giving configuration control such that you can arbitrarily assign DSA slaves to a given DSA master is fine, but what problems does it potentially creates?
On Sat, 24 Aug 2019 13:04:04 -0700 Florian Fainelli <f.fainelli@gmail.com> wrote: > 1) Your approach is kind of interesting here, not sure if it is the best > but it is not outright wrong. In the past, we had been talking about > different approaches, some of which seemed too simplistic or too narrow > on the use case, and some of which that are up in the air and were not > worked on. > > - John Crispin submitted a patch series for the MTK switch driver a > while back that was picked up by Frank Wunderlich more recently. This > approach uses a Device Tree based configuration in order to statically > assign ports, or groups of ports to a specific DSA master device. This > is IMHO wrong because a) DT is not to impose a policy but strictly > describe HW, and b) there was no way to change that static assignment at > runtime. > > - Based on that patch series, Andrew, Vivien, Frank and myself discussed > two possible options: > - allowing the enslaving of DSA master devices in the bridge, so as to > provide a hint that specific DSA slave network devices should be > "bound"/"linked" to a specific DSA master device. This requires > modifications in the bridge layer to avoid undoing what commit > 8db0a2ee2c6302a1dcbcdb93cb731dfc6c0cdb5e ("net: bridge: reject > DSA-enabled master netdevices as bridge members"). This would also > require a bridge to be set-up > > - enhancing the iproute2 command and backing kernel code in order to > allow defining that a DSA slave device may be enslaved into a specific > DSA master, similarly to how you currently enslave a device into a > bridge, you could "enslave" a DSA slave to a DSA master with something > that could look like this: > > ip link set dev sw0p0 master eth0 # Associate port 0 with eth0 > ip link set dev sw0p1 master eth1 # Associate port 1 with eth1 > > To date, this may be the best way to do what we want here, rather than > use the iflink which is a bit re-purposing something that is not exactly > meant for that. We cannot use "set master" to set CPU port, since that is used for enslaving interfaces to bridges. There are usecases where these would conflict with each other. The semantics would become complicated and the documentation would became weird to users. We are *already* using the iflink property to report which CPU device is used as CPU destination port for a given switch slave interface. So why to use that for changing this, also? If you think that iflink should not be used for this, and other agree, then we should create a new property, something like dsa-upstream, (eg. ip link set dev sw0p0 dsa-upstream eth0). Using the "master" property is not right, IMO. Marek
On Sat, 24 Aug 2019 23:01:21 +0200 Marek Behun <marek.behun@nic.cz> wrote: > the documentation would became weird to users. ... would become weird ... > > We are *already* using the iflink property to report which CPU device > is used as CPU destination port for a given switch slave interface. So > why to use that for changing this, also? ... why NOT to use that for chaning this also? > > If you think that iflink should not be used for this, and other agree, ... and others agree with you,
On Sat, 24 Aug 2019 17:24:07 +0200 Andrew Lunn <andrew@lunn.ch> wrote: > That is a new idea. Interesting. > > I would like to look around and see what else uses this "lan1@eth0" > concept. We need to ensure it is not counter intuitive in general, > when you consider all possible users. There are not many users of ndo_get_iflink besides DSA slave: ip6_gre, ip6_vti, sit, ip6_tunnel ip_gte, ip_vti, ipmr, ipip, macsec, macvlan, veth ipvlan ipoib and a few other. What these have in common is that all these interfaces are linked somehow to another interfacem, ie. a macvlan interface eth0.1 is linked to it's master interface eth0. All of these are virtual interfaces. Marek
On Sat, 24 Aug 2019 13:04:04 -0700 Florian Fainelli <f.fainelli@gmail.com> wrote: > Now, the 4.9 kernel behavior actually works just fine because eth1 is > not a special interface, so no tagging is expected, and "wifi", although > it supports DSA tagging, represents another side of the CPU/host network > stack, so you never have to inject frames into the switch, because you > can use eth1 to do that and let MAC learning do its job to forward to > the correct port of the switch. Hi Florian, Sorry, I am having trouble understanding what you mean in the paragraph I quoted above (and paragraphs afterwards). eth0 and eth1 are interfaces created by an ethernet driver. wlan0 is an interface created by wireless driver. wifi is a slave interface created by DSA for port 5 on the switch. eth1 is DSA slave or a DSA master connected to port 5? How does DSA handle two interfaces with same reg property? Marek
On 8/25/2019 12:13 AM, Marek Behun wrote: > On Sat, 24 Aug 2019 13:04:04 -0700 > Florian Fainelli <f.fainelli@gmail.com> wrote: > >> Now, the 4.9 kernel behavior actually works just fine because eth1 is >> not a special interface, so no tagging is expected, and "wifi", although >> it supports DSA tagging, represents another side of the CPU/host network >> stack, so you never have to inject frames into the switch, because you >> can use eth1 to do that and let MAC learning do its job to forward to >> the correct port of the switch. > > Hi Florian, > > Sorry, I am having trouble understanding what you mean in the > paragraph I quoted above (and paragraphs afterwards). > > eth0 and eth1 are interfaces created by an ethernet driver. > wlan0 is an interface created by wireless driver. > wifi is a slave interface created by DSA for port 5 on the switch. > eth1 is DSA slave or a DSA master connected to port 5? "wifi" is a DSA slave network interface, and as you understood, eth1 is a standard Ethernet driver (which happens to be the same as eth0, since on that system there are two identical controllers, driver is bcmsysport.c). The relevant part of the DTS for that system looks like this: port@5 { reg = <5>; label = "wifi"; ethernet = <&enet_1>; }; port@8 { reg = <8>; label = "cpu"; ethernet = <&enet_0>; }; So as you can see, the devices are all described correctly, simply the behavior on the Linux change changes whether you have commit 6d4e5c570c2d66c806ecc6bd851fcf881fe8a38e ("net: dsa: get port type at parse time") included or not. With this commit, the criteria for determining what is a DSA/CPU port changed: it used to be based on the label (e.g.: "cpu", "dsa") and then it changed to be based on whether a phandle property is either "ethernet" (port is CPU) or "link" (port is DSA), which is the right thing to do, but it no longer allows us to be specific about which port is going to be elected as the CPU port. Fortunately the switch driver is coded with the assumption that either port 5 or 8 could be used. Note that when we do not have a network device that represents the switch"side (e.g.: the CPU port or the DSA port), we had to jump through many hoops in order to make some information visible to the user, or even overlay specific operations onto the DSA master, that code is under net/dsa/master.c. > > How does DSA handle two interfaces with same reg property? This is not happening, see DTS example above.