Message ID | 20200503221228.10928-3-olteanv@gmail.com |
---|---|
State | Deferred |
Delegated to: | David Miller |
Headers | show |
Series | Cross-chip bridging for disjoint DSA trees | expand |
On 5/3/2020 3:12 PM, Vladimir Oltean wrote: > From: Vladimir Oltean <vladimir.oltean@nxp.com> > > One way of utilizing DSA is by cascading switches which do not all have > compatible taggers. Consider the following real-life topology: > > +---------------------------------------------------------------+ > | LS1028A | > | +------------------------------+ | > | | DSA master for Felix | | > | |(internal ENETC port 2: eno2))| | > | +------------+------------------------------+-------------+ | > | | Felix embedded L2 switch | | > | | | | > | | +--------------+ +--------------+ +--------------+ | | > | | |DSA master for| |DSA master for| |DSA master for| | | > | | | SJA1105 1 | | SJA1105 2 | | SJA1105 3 | | | > | | |(Felix port 1)| |(Felix port 2)| |(Felix port 3)| | | > +--+-+--------------+---+--------------+---+--------------+--+--+ > > +-----------------------+ +-----------------------+ +-----------------------+ > | SJA1105 switch 1 | | SJA1105 switch 2 | | SJA1105 switch 3 | > +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+ > |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| |sw3p0|sw3p1|sw3p2|sw3p3| > +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+ > > The above can be described in the device tree as follows (obviously not > complete): > > mscc_felix { > dsa,member = <0 0>; > ports { > port@4 { > ethernet = <&enetc_port2>; > }; > }; > }; > > sja1105_switch1 { > dsa,member = <1 1>; > ports { > port@4 { > ethernet = <&mscc_felix_port1>; > }; > }; > }; > > sja1105_switch2 { > dsa,member = <2 2>; > ports { > port@4 { > ethernet = <&mscc_felix_port2>; > }; > }; > }; > > sja1105_switch3 { > dsa,member = <3 3>; > ports { > port@4 { > ethernet = <&mscc_felix_port3>; > }; > }; > }; > > Basically we instantiate one DSA switch tree for every hardware switch > in the system, but we still give them globally unique switch IDs (will > come back to that later). Having 3 disjoint switch trees makes the > tagger drivers "just work", because net devices are registered for the > 3 Felix DSA master ports, and they are also DSA slave ports to the ENETC > port. So packets received on the ENETC port are stripped of their > stacked DSA tags one by one. > > Currently, hardware bridging between ports on the same sja1105 chip is > possible, but switching between sja1105 ports on different chips is > handled by the software bridge. This is fine, but we can do better. > > In fact, the dsa_8021q tag used by sja1105 is compatible with cascading. > In other words, a sja1105 switch can correctly parse and route a packet > containing a dsa_8021q tag. So if we could enable hardware bridging on > the Felix DSA master ports, cross-chip bridging could be completely > offloaded. > > Such as system would be used as follows: > > ip link add dev br0 type bridge && ip link set dev br0 up > for port in sw0p0 sw0p1 sw0p2 sw0p3 \ > sw1p0 sw1p1 sw1p2 sw1p3 \ > sw2p0 sw2p1 sw2p2 sw2p3; do > ip link set dev $port master br0 > done > > The above makes switching between ports on the same row be performed in > hardware, and between ports on different rows in software. Now assume > the Felix switch ports are called swp0, swp1, swp2. By running the > following extra commands: > > ip link add dev br1 type bridge && ip link set dev br1 up > for port in swp0 swp1 swp2; do > ip link set dev $port master br1 > done > > the CPU no longer sees packets which traverse sja1105 switch boundaries > and can be forwarded directly by Felix. The br1 bridge would not be used > for any sort of traffic termination. Is there anything that prevents br1 from terminating traffic though (just curious)? > > For this to work, we need to give drivers an opportunity to listen for > bridging events on DSA trees other than their own, and pass that other > tree index as argument. I have made the assumption, for the moment, that > the other existing DSA notifiers don't need to be broadcast to other > trees. That assumption might turn out to be incorrect. But in the > meantime, introduce a dsa_broadcast function, similar in purpose to > dsa_port_notify, which is used only by the bridging notifiers. > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > --- Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Hi Florian, Thank you so much for the review! On Fri, 8 May 2020 at 06:16, Florian Fainelli <f.fainelli@gmail.com> wrote: > > > > On 5/3/2020 3:12 PM, Vladimir Oltean wrote: > > From: Vladimir Oltean <vladimir.oltean@nxp.com> > > > > One way of utilizing DSA is by cascading switches which do not all have > > compatible taggers. Consider the following real-life topology: > > > > +---------------------------------------------------------------+ > > | LS1028A | > > | +------------------------------+ | > > | | DSA master for Felix | | > > | |(internal ENETC port 2: eno2))| | > > | +------------+------------------------------+-------------+ | > > | | Felix embedded L2 switch | | > > | | | | > > | | +--------------+ +--------------+ +--------------+ | | > > | | |DSA master for| |DSA master for| |DSA master for| | | > > | | | SJA1105 1 | | SJA1105 2 | | SJA1105 3 | | | > > | | |(Felix port 1)| |(Felix port 2)| |(Felix port 3)| | | > > +--+-+--------------+---+--------------+---+--------------+--+--+ > > > > +-----------------------+ +-----------------------+ +-----------------------+ > > | SJA1105 switch 1 | | SJA1105 switch 2 | | SJA1105 switch 3 | > > +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+ > > |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| |sw3p0|sw3p1|sw3p2|sw3p3| > > +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+ > > > > The above can be described in the device tree as follows (obviously not > > complete): > > > > mscc_felix { > > dsa,member = <0 0>; > > ports { > > port@4 { > > ethernet = <&enetc_port2>; > > }; > > }; > > }; > > > > sja1105_switch1 { > > dsa,member = <1 1>; > > ports { > > port@4 { > > ethernet = <&mscc_felix_port1>; > > }; > > }; > > }; > > > > sja1105_switch2 { > > dsa,member = <2 2>; > > ports { > > port@4 { > > ethernet = <&mscc_felix_port2>; > > }; > > }; > > }; > > > > sja1105_switch3 { > > dsa,member = <3 3>; > > ports { > > port@4 { > > ethernet = <&mscc_felix_port3>; > > }; > > }; > > }; > > > > Basically we instantiate one DSA switch tree for every hardware switch > > in the system, but we still give them globally unique switch IDs (will > > come back to that later). Having 3 disjoint switch trees makes the > > tagger drivers "just work", because net devices are registered for the > > 3 Felix DSA master ports, and they are also DSA slave ports to the ENETC > > port. So packets received on the ENETC port are stripped of their > > stacked DSA tags one by one. > > > > Currently, hardware bridging between ports on the same sja1105 chip is > > possible, but switching between sja1105 ports on different chips is > > handled by the software bridge. This is fine, but we can do better. > > > > In fact, the dsa_8021q tag used by sja1105 is compatible with cascading. > > In other words, a sja1105 switch can correctly parse and route a packet > > containing a dsa_8021q tag. So if we could enable hardware bridging on > > the Felix DSA master ports, cross-chip bridging could be completely > > offloaded. > > > > Such as system would be used as follows: > > > > ip link add dev br0 type bridge && ip link set dev br0 up > > for port in sw0p0 sw0p1 sw0p2 sw0p3 \ > > sw1p0 sw1p1 sw1p2 sw1p3 \ > > sw2p0 sw2p1 sw2p2 sw2p3; do > > ip link set dev $port master br0 > > done > > > > The above makes switching between ports on the same row be performed in > > hardware, and between ports on different rows in software. Now assume > > the Felix switch ports are called swp0, swp1, swp2. By running the > > following extra commands: > > > > ip link add dev br1 type bridge && ip link set dev br1 up > > for port in swp0 swp1 swp2; do > > ip link set dev $port master br1 > > done > > > > the CPU no longer sees packets which traverse sja1105 switch boundaries > > and can be forwarded directly by Felix. The br1 bridge would not be used > > for any sort of traffic termination. > > Is there anything that prevents br1 from terminating traffic though > (just curious)? > Well, one obvious limitation is the fact that to support termination on br1, the bridge rx_handler would have to steal packets from DSA software RX processing path. We just need the upstream switch to forward packets in hardware between ports that are DSA masters, so the choice was to at least permit that. So given the fact that now we have a dummy rx_handler on br1, it _can_ not terminate any traffic. For the particular hardware layout presented above, the choice was to let the user bridge the Felix ports. Functionally it is optional (sja1105 ports are still bridged both ways), but the data paths are different: - if br1 doesn't exist, then a packet that needs to go from sw1p0 to sw2p0 is bridged in software by br0 (because Felix is not bridged, all of its traffic goes to the CPU, then the rules on br0 kick in, and this reinjects the packet to sw2p0, which calls dev_queue_xmit to Felix port 2, which calls dev_queue_xmit to the one and only ENETC master). - If br1 exists and we want to forward packets along the same route (sw1p0 -> sw2p0), then br0 only defines the forwarding domain to which packets are allowed to go to. There would initially be one duplicate, when Felix floods the first packet to the CPU _and_ to its other port (the DSA master of sw2p0), because the packet sent to the stack will still get software-bridged and re-enqueued just as in the case above. But for further packets, Felix will no longer flood packets to the CPU, but just to the other switch. On that end, the other switch will look at the dsa_8021q tag and decide which ports are allowed to see the packet and which aren't (these are the "crosschip links" that depend on which ports are part of br0). So to answer your question, we never need to terminate traffic on br1 because it only serves as double duty for br0 (accelerating its forwarding path). The alternative would have been to build some sja1105 awareness in Felix of some sorts. The question, of course, is when can the Felix driver automatically decide that its DSA masters can be bridged together? And if we take an "automatic decision" route, is it sane that Felix ports 1 and 2 are forwarding packets autonomously between them, even though there is no Linux bridge that asked for that? On the other hand, we may imagine a few situations where things might look differently. Let's say Felix had 4 ports, but sja1105 switches were hanging off of only 3 of its ports. The 4th interface goes straight to a copper port. If sw1p0 wants to talk to the copper port of Felix, how can we model that, and what are the chances of it working in hardware? Spoiler alert, it won't work purely in hardware, because the copper port would see the unpopped dsa_8021q headers coming from sw1p0. But we can still put sw1p0 and Felix port 4 in the same br0 interface, and packets from sw1p0 would go to the CPU, where a new packet would be forwarded to Felix port 4 without the dsa_8021q tag of sw1p0. So bridging a Felix standalone (not DSA master) interface with a sja1105 interface could work under some circumstances (through software bridging), but that is non-ideal, so as long as the DSA master switch doesn't have any understanding of the DSA headers it's transporting, it's simply easier to not do that :) and design boards where there's a sja1105 switch hanging off of every used Felix port. But what if we build a super-Felix switch in the future, that understands the DSA tags of the switches cascaded beneath it? Let's treat this "super-Felix" in the generic case where it's not a DSA device. Currently DSA only means that it has an Ethernet connection towards the system, so its I/O is performed indirectly. But "super-Felix" can be a pure switchdev device just as well, we need to think about this situation in a generic way. The point is just that "super-Felix" has awareness of the DSA tags of switches beneath it. It can listen for "change upper" events for bridging, and it can detect when its standalone copper port 4 gets added to the same bridge as one such downstream switch that it can understand. So in that case, the "super-Felix" switch can do some magic in the background: it can permit hardware bridging of its copper port 4 with a downstream sja1105 hanging off of its port 0. Based on the topology described in the device tree, packets sent to the sja1105 would contain a DSA tag, and packets sent to the copper port wouldn't. From a user perspective, things would "just work". I know the data flow sja1105 <-> super-Felix copper port 4 that I just described is different than what this patch set is providing. With current Felix, this data flow is not even possible in hardware. But I would like to look forward and imagine, with that super-Felix, if br1 would still be necessary for the simple case where we're bridging sw1p0 with sw2p0. I think it would still be necessary, because there's still no "natural" place for super-Felix to listen on "change upper" events of the DSA net devices below it. That's the dilemma I'm having, but it looks like br1 between masters is still the way to go, and that model won't change regardless of whether the parent switchdev driver is DSA-aware or not. I would like to get some more feedback on this. > > > > For this to work, we need to give drivers an opportunity to listen for > > bridging events on DSA trees other than their own, and pass that other > > tree index as argument. I have made the assumption, for the moment, that > > the other existing DSA notifiers don't need to be broadcast to other > > trees. That assumption might turn out to be incorrect. But in the > > meantime, introduce a dsa_broadcast function, similar in purpose to > > dsa_port_notify, which is used only by the bridging notifiers. > > > > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> > > --- > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> > -- > Florian Thanks, -Vladimir
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index dd8a5666a584..9ecfbe0c4f6c 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -2233,26 +2233,34 @@ static void mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port, mv88e6xxx_reg_unlock(chip); } -static int mv88e6xxx_crosschip_bridge_join(struct dsa_switch *ds, int dev, +static int mv88e6xxx_crosschip_bridge_join(struct dsa_switch *ds, + int tree_index, int sw_index, int port, struct net_device *br) { struct mv88e6xxx_chip *chip = ds->priv; int err; + if (tree_index != ds->dst->index) + return 0; + mv88e6xxx_reg_lock(chip); - err = mv88e6xxx_pvt_map(chip, dev, port); + err = mv88e6xxx_pvt_map(chip, sw_index, port); mv88e6xxx_reg_unlock(chip); return err; } -static void mv88e6xxx_crosschip_bridge_leave(struct dsa_switch *ds, int dev, +static void mv88e6xxx_crosschip_bridge_leave(struct dsa_switch *ds, + int tree_index, int sw_index, int port, struct net_device *br) { struct mv88e6xxx_chip *chip = ds->priv; + if (tree_index != ds->dst->index) + return; + mv88e6xxx_reg_lock(chip); - if (mv88e6xxx_pvt_map(chip, dev, port)) + if (mv88e6xxx_pvt_map(chip, sw_index, port)) dev_err(ds->dev, "failed to remap cross-chip Port VLAN\n"); mv88e6xxx_reg_unlock(chip); } diff --git a/include/net/dsa.h b/include/net/dsa.h index 3c0ae84156c8..bd5561dacb13 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -574,10 +574,12 @@ struct dsa_switch_ops { /* * Cross-chip operations */ - int (*crosschip_bridge_join)(struct dsa_switch *ds, int sw_index, - int port, struct net_device *br); - void (*crosschip_bridge_leave)(struct dsa_switch *ds, int sw_index, - int port, struct net_device *br); + int (*crosschip_bridge_join)(struct dsa_switch *ds, int tree_index, + int sw_index, int port, + struct net_device *br); + void (*crosschip_bridge_leave)(struct dsa_switch *ds, int tree_index, + int sw_index, int port, + struct net_device *br); /* * PTP functionality diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h index 6d9a1ef65fa0..a1a0ae242012 100644 --- a/net/dsa/dsa_priv.h +++ b/net/dsa/dsa_priv.h @@ -35,6 +35,7 @@ struct dsa_notifier_ageing_time_info { /* DSA_NOTIFIER_BRIDGE_* */ struct dsa_notifier_bridge_info { struct net_device *br; + int tree_index; int sw_index; int port; }; diff --git a/net/dsa/port.c b/net/dsa/port.c index a58fdd362574..ebc8d6cbd1d4 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -13,6 +13,23 @@ #include "dsa_priv.h" +static int dsa_broadcast(unsigned long e, void *v) +{ + struct dsa_switch_tree *dst; + int err = 0; + + list_for_each_entry(dst, &dsa_tree_list, list) { + struct raw_notifier_head *nh = &dst->nh; + + err = raw_notifier_call_chain(nh, e, v); + err = notifier_to_errno(err); + if (err) + break; + } + + return err; +} + static int dsa_port_notify(const struct dsa_port *dp, unsigned long e, void *v) { struct raw_notifier_head *nh = &dp->ds->dst->nh; @@ -120,6 +137,7 @@ void dsa_port_disable(struct dsa_port *dp) int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br) { struct dsa_notifier_bridge_info info = { + .tree_index = dp->ds->dst->index, .sw_index = dp->ds->index, .port = dp->index, .br = br, @@ -136,7 +154,7 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br) */ dp->bridge_dev = br; - err = dsa_port_notify(dp, DSA_NOTIFIER_BRIDGE_JOIN, &info); + err = dsa_broadcast(DSA_NOTIFIER_BRIDGE_JOIN, &info); /* The bridging is rolled back on error */ if (err) { @@ -150,6 +168,7 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br) void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br) { struct dsa_notifier_bridge_info info = { + .tree_index = dp->ds->dst->index, .sw_index = dp->ds->index, .port = dp->index, .br = br, @@ -161,7 +180,7 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br) */ dp->bridge_dev = NULL; - err = dsa_port_notify(dp, DSA_NOTIFIER_BRIDGE_LEAVE, &info); + err = dsa_broadcast(DSA_NOTIFIER_BRIDGE_LEAVE, &info); if (err) pr_err("DSA: failed to notify DSA_NOTIFIER_BRIDGE_LEAVE\n"); diff --git a/net/dsa/switch.c b/net/dsa/switch.c index f3c32ff552b3..86c8dc5c32a0 100644 --- a/net/dsa/switch.c +++ b/net/dsa/switch.c @@ -89,11 +89,16 @@ static int dsa_switch_mtu(struct dsa_switch *ds, static int dsa_switch_bridge_join(struct dsa_switch *ds, struct dsa_notifier_bridge_info *info) { - if (ds->index == info->sw_index && ds->ops->port_bridge_join) + struct dsa_switch_tree *dst = ds->dst; + + if (dst->index == info->tree_index && ds->index == info->sw_index && + ds->ops->port_bridge_join) return ds->ops->port_bridge_join(ds, info->port, info->br); - if (ds->index != info->sw_index && ds->ops->crosschip_bridge_join) - return ds->ops->crosschip_bridge_join(ds, info->sw_index, + if ((dst->index != info->tree_index || ds->index != info->sw_index) && + ds->ops->crosschip_bridge_join) + return ds->ops->crosschip_bridge_join(ds, info->tree_index, + info->sw_index, info->port, info->br); return 0; @@ -103,13 +108,17 @@ static int dsa_switch_bridge_leave(struct dsa_switch *ds, struct dsa_notifier_bridge_info *info) { bool unset_vlan_filtering = br_vlan_enabled(info->br); + struct dsa_switch_tree *dst = ds->dst; int err, i; - if (ds->index == info->sw_index && ds->ops->port_bridge_leave) + if (dst->index == info->tree_index && ds->index == info->sw_index && + ds->ops->port_bridge_join) ds->ops->port_bridge_leave(ds, info->port, info->br); - if (ds->index != info->sw_index && ds->ops->crosschip_bridge_leave) - ds->ops->crosschip_bridge_leave(ds, info->sw_index, info->port, + if ((dst->index != info->tree_index || ds->index != info->sw_index) && + ds->ops->crosschip_bridge_join) + ds->ops->crosschip_bridge_leave(ds, info->tree_index, + info->sw_index, info->port, info->br); /* If the bridge was vlan_filtering, the bridge core doesn't trigger an