Message ID | 20190413012822.30931-19-olteanv@gmail.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | NXP SJA1105 DSA driver | expand |
On Sat, Apr 13, 2019 at 04:28:16AM +0300, Vladimir Oltean wrote: > In order to support this, we are creating a make-shift switch tag out of > a VLAN trunk configured on the CPU port. Termination of normal traffic > on switch ports only works when not under a vlan_filtering bridge. > Termination of management (PTP, BPDU) traffic works under all > circumstances because it uses a different tagging mechanism > (incl_srcpt). We are making use of the generic CONFIG_NET_DSA_TAG_8021Q > code and leveraging it from our own CONFIG_NET_DSA_TAG_SJA1105. > > There are two types of traffic: regular and link-local. > The link-local traffic received on the CPU port is trapped from the > switch's regular forwarding decisions because it matched one of the two > DMAC filters for management traffic. > On transmission, the switch requires special massaging for these > link-local frames. Due to a weird implementation of the switching IP, by > default it drops link-local frames that originate on the CPU port. It > needs to be told where to forward them to, through an SPI command > ("management route") that is valid for only a single frame. > So when we're sending link-local traffic, we need to clone skb's from > DSA and send them in our custom xmit worker that also performs SPI access. > > For that purpose, the DSA xmit handler and the xmit worker communicate > through a per-port "skb ring" software structure, with a producer and a > consumer index. At the moment this structure is rather fragile > (ping-flooding to a link-local DMAC would cause most of the frames to > get dropped). I would like to move the management traffic on a separate > netdev queue that I can stop when the skb ring got full and hardware is > busy processing, so that we are not forced to drop traffic. > > Signed-off-by: Vladimir Oltean <olteanv@gmail.com> > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> > --- > Changes in v3: > Made management traffic be receivable on the DSA netdevices even when > switch tagging is disabled, as well as regular traffic be receivable on > the master netdevice in the same scenario. Both are accomplished using > the sja1105_filter() function and some small touch-ups in the .rcv > callback. It seems like you made major changes to this. When you do that, you should drop any reviewed-by tags you have. They are no longer valid because of the major changes. > /* This callback needs to be present */ > @@ -1141,7 +1158,11 @@ static int sja1105_vlan_filtering(struct dsa_switch *ds, int port, bool enabled) > if (rc) > dev_err(ds->dev, "Failed to change VLAN Ethertype\n"); > > - return rc; > + /* Switch port identification based on 802.1Q is only passable possible, not passable. > + * if we are not under a vlan_filtering bridge. So make sure > + * the two configurations are mutually exclusive. > + */ > + return sja1105_setup_8021q_tagging(ds, !enabled); > } > > static void sja1105_vlan_add(struct dsa_switch *ds, int port, > @@ -1233,9 +1254,107 @@ static int sja1105_setup(struct dsa_switch *ds) > */ > ds->vlan_filtering_is_global = true; > > + /* The DSA/switchdev model brings up switch ports in standalone mode by > + * default, and that means vlan_filtering is 0 since they're not under > + * a bridge, so it's safe to set up switch tagging at this time. > + */ > + return sja1105_setup_8021q_tagging(ds, true); > +} > + > +#include "../../../net/dsa/dsa_priv.h" No. Don't use relative includes like this. What do you need from the header? Maybe move it into include/linux/net/dsa.h > +/* Deferred work is unfortunately necessary because setting up the management > + * route cannot be done from atomit context (SPI transfer takes a sleepable > + * lock on the bus) > + */ > +static void sja1105_xmit_work_handler(struct work_struct *work) > +{ > + struct sja1105_port *sp = container_of(work, struct sja1105_port, > + xmit_work); > + struct sja1105_private *priv = sp->dp->ds->priv; > + struct net_device *slave = sp->dp->slave; > + struct net_device *master = dsa_slave_to_master(slave); > + int port = (uintptr_t)(sp - priv->ports); > + struct sk_buff *skb; > + int i, rc; > + > + while ((i = sja1105_skb_ring_get(&sp->xmit_ring, &skb)) >= 0) { > + struct sja1105_mgmt_entry mgmt_route = { 0 }; > + struct ethhdr *hdr; > + int timeout = 10; > + int skb_len; > + > + skb_len = skb->len; > + hdr = eth_hdr(skb); > + > + mgmt_route.macaddr = ether_addr_to_u64(hdr->h_dest); > + mgmt_route.destports = BIT(port); > + mgmt_route.enfport = 1; > + mgmt_route.tsreg = 0; > + mgmt_route.takets = false; > + > + rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, > + port, &mgmt_route, true); > + if (rc < 0) { > + kfree_skb(skb); > + slave->stats.tx_dropped++; > + continue; > + } > + > + /* Transfer skb to the host port. */ > + skb->dev = master; > + dev_queue_xmit(skb); > + > + /* Wait until the switch has processed the frame */ > + do { > + rc = sja1105_dynamic_config_read(priv, BLK_IDX_MGMT_ROUTE, > + port, &mgmt_route); > + if (rc < 0) { > + slave->stats.tx_errors++; > + dev_err(priv->ds->dev, > + "xmit: failed to poll for mgmt route\n"); > + continue; > + } > + > + /* UM10944: The ENFPORT flag of the respective entry is > + * cleared when a match is found. The host can use this > + * flag as an acknowledgment. > + */ > + cpu_relax(); > + } while (mgmt_route.enfport && --timeout); > + > + if (!timeout) { > + dev_err(priv->ds->dev, "xmit timed out\n"); > + slave->stats.tx_errors++; > + continue; > + } > + > + slave->stats.tx_packets++; > + slave->stats.tx_bytes += skb_len; > + } > +} > + > +static int sja1105_port_enable(struct dsa_switch *ds, int port, > + struct phy_device *phydev) > +{ > + struct sja1105_private *priv = ds->priv; > + struct sja1105_port *sp = &priv->ports[port]; > + > + sp->dp = &ds->ports[port]; > + INIT_WORK(&sp->xmit_work, sja1105_xmit_work_handler); > return 0; > } I think i'm missing something here. You have a per port queue of link local frames which need special handling. And you have a per-port work queue. To send such a frame, you need to write some register, send the frame, and then wait until the mgmt_route.enfport is reset. Why are you doing this per port? How do you stop two ports/work queues running at the same time? It seems like one queue, with one work queue would be a better structure. Also, please move all this code into the tagger. Just add exports for sja1105_dynamic_config_write() and sja1105_dynamic_config_read(). > +static void sja1105_port_disable(struct dsa_switch *ds, int port) > +{ > + struct sja1105_private *priv = ds->priv; > + struct sja1105_port *sp = &priv->ports[port]; > + struct sk_buff *skb; > + > + cancel_work_sync(&sp->xmit_work); > + while (sja1105_skb_ring_get(&sp->xmit_ring, &skb) >= 0) > + kfree_skb(skb); > +} > + > diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c > new file mode 100644 > index 000000000000..5c76a06c9093 > --- /dev/null > +++ b/net/dsa/tag_sja1105.c > @@ -0,0 +1,148 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com> > + */ > +#include <linux/etherdevice.h> > +#include <linux/if_vlan.h> > +#include <linux/dsa/sja1105.h> > +#include "../../drivers/net/dsa/sja1105/sja1105.h" Again, no, don't do this. > + > +#include "dsa_priv.h" > + > +/* Similar to is_link_local_ether_addr(hdr->h_dest) but also covers PTP */ > +static inline bool sja1105_is_link_local(const struct sk_buff *skb) > +{ > + const struct ethhdr *hdr = eth_hdr(skb); > + u64 dmac = ether_addr_to_u64(hdr->h_dest); > + > + if ((dmac & SJA1105_LINKLOCAL_FILTER_A_MASK) == > + SJA1105_LINKLOCAL_FILTER_A) > + return true; > + if ((dmac & SJA1105_LINKLOCAL_FILTER_B_MASK) == > + SJA1105_LINKLOCAL_FILTER_B) > + return true; > + return false; > +} > + > +static bool sja1105_filter(const struct sk_buff *skb, struct net_device *dev) > +{ > + if (sja1105_is_link_local(skb)) > + return true; > + if (!dev->dsa_ptr->vlan_filtering) > + return true; > + return false; > +} Please add a comment here about what frames cannot be handled by the tagger. However, i'm not too happy about this design... > + > +static struct sk_buff *sja1105_xmit(struct sk_buff *skb, > + struct net_device *netdev) > +{ > + struct dsa_port *dp = dsa_slave_to_port(netdev); > + struct dsa_switch *ds = dp->ds; > + struct sja1105_private *priv = ds->priv; > + struct sja1105_port *sp = &priv->ports[dp->index]; > + struct sk_buff *clone; > + > + if (likely(!sja1105_is_link_local(skb))) { > + /* Normal traffic path. */ > + u16 tx_vid = dsa_tagging_tx_vid(ds, dp->index); > + u8 pcp = skb->priority; > + > + /* If we are under a vlan_filtering bridge, IP termination on > + * switch ports based on 802.1Q tags is simply too brittle to > + * be passable. So just defer to the dsa_slave_notag_xmit > + * implementation. > + */ > + if (dp->vlan_filtering) > + return skb; > + > + return dsa_8021q_xmit(skb, netdev, ETH_P_EDSA, > + ((pcp << VLAN_PRIO_SHIFT) | tx_vid)); Please don't reuse ETH_P_EDSA. Define an ETH_P_SJA1105. > + } > + > + /* Code path for transmitting management traffic. This does not rely > + * upon switch tagging, but instead SPI-installed management routes. > + */ > + clone = skb_clone(skb, GFP_ATOMIC); > + if (!clone) { > + dev_err(ds->dev, "xmit: failed to clone skb\n"); > + return NULL; > + } > + > + if (sja1105_skb_ring_add(&sp->xmit_ring, clone) < 0) { > + dev_err(ds->dev, "xmit: skb ring full\n"); > + kfree_skb(clone); > + return NULL; > + } > + > + if (sp->xmit_ring.count == SJA1105_SKB_RING_SIZE) > + /* TODO setup a dedicated netdev queue for management traffic > + * so that we can selectively apply backpressure and not be > + * required to stop the entire traffic when the software skb > + * ring is full. This requires hooking the ndo_select_queue > + * from DSA and matching on mac_fltres. > + */ > + dev_err(ds->dev, "xmit: reached maximum skb ring size\n"); This should be rate limited. Andrew > + > + schedule_work(&sp->xmit_work); > + /* Let DSA free its reference to the skb and we will free > + * the clone in the deferred worker > + */ > + return NULL; > +} > + > +static struct sk_buff *sja1105_rcv(struct sk_buff *skb, > + struct net_device *netdev, > + struct packet_type *pt) > +{ > + unsigned int source_port, switch_id; > + struct ethhdr *hdr = eth_hdr(skb); > + struct sk_buff *nskb; > + u16 tpid, vid, tci; > + bool is_tagged; > + > + nskb = dsa_8021q_rcv(skb, netdev, pt, &tpid, &tci); > + is_tagged = (nskb && tpid == ETH_P_EDSA); > + > + skb->priority = (tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT; > + vid = tci & VLAN_VID_MASK; > + > + skb->offload_fwd_mark = 1; > + > + if (likely(!sja1105_is_link_local(skb))) { > + /* Normal traffic path. */ > + source_port = dsa_tagging_rx_source_port(vid); > + switch_id = dsa_tagging_rx_switch_id(vid); > + } else { > + /* Management traffic path. Switch embeds the switch ID and > + * port ID into bytes of the destination MAC, courtesy of > + * the incl_srcpt options. > + */ > + source_port = hdr->h_dest[3]; > + switch_id = hdr->h_dest[4]; > + /* Clear the DMAC bytes that were mangled by the switch */ > + hdr->h_dest[3] = 0; > + hdr->h_dest[4] = 0; > + } > + > + skb->dev = dsa_master_find_slave(netdev, switch_id, source_port); > + if (!skb->dev) { > + netdev_warn(netdev, "Couldn't decode source port\n"); > + return NULL; > + } > + > + /* Delete/overwrite fake VLAN header, DSA expects to not find > + * it there, see dsa_switch_rcv: skb_push(skb, ETH_HLEN). > + */ > + if (is_tagged) > + memmove(skb->data - ETH_HLEN, skb->data - ETH_HLEN - VLAN_HLEN, > + ETH_HLEN - VLAN_HLEN); > + > + return skb; > +} > + > +const struct dsa_device_ops sja1105_netdev_ops = { > + .xmit = sja1105_xmit, > + .rcv = sja1105_rcv, > + .filter = sja1105_filter, > + .overhead = VLAN_HLEN, > +}; > + > -- > 2.17.1 >
On Sat, 13 Apr 2019 at 19:38, Andrew Lunn <andrew@lunn.ch> wrote: > > On Sat, Apr 13, 2019 at 04:28:16AM +0300, Vladimir Oltean wrote: > > In order to support this, we are creating a make-shift switch tag out of > > a VLAN trunk configured on the CPU port. Termination of normal traffic > > on switch ports only works when not under a vlan_filtering bridge. > > Termination of management (PTP, BPDU) traffic works under all > > circumstances because it uses a different tagging mechanism > > (incl_srcpt). We are making use of the generic CONFIG_NET_DSA_TAG_8021Q > > code and leveraging it from our own CONFIG_NET_DSA_TAG_SJA1105. > > > > There are two types of traffic: regular and link-local. > > The link-local traffic received on the CPU port is trapped from the > > switch's regular forwarding decisions because it matched one of the two > > DMAC filters for management traffic. > > On transmission, the switch requires special massaging for these > > link-local frames. Due to a weird implementation of the switching IP, by > > default it drops link-local frames that originate on the CPU port. It > > needs to be told where to forward them to, through an SPI command > > ("management route") that is valid for only a single frame. > > So when we're sending link-local traffic, we need to clone skb's from > > DSA and send them in our custom xmit worker that also performs SPI access. > > > > For that purpose, the DSA xmit handler and the xmit worker communicate > > through a per-port "skb ring" software structure, with a producer and a > > consumer index. At the moment this structure is rather fragile > > (ping-flooding to a link-local DMAC would cause most of the frames to > > get dropped). I would like to move the management traffic on a separate > > netdev queue that I can stop when the skb ring got full and hardware is > > busy processing, so that we are not forced to drop traffic. > > > > Signed-off-by: Vladimir Oltean <olteanv@gmail.com> > > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> > > --- > > Changes in v3: > > Made management traffic be receivable on the DSA netdevices even when > > switch tagging is disabled, as well as regular traffic be receivable on > > the master netdevice in the same scenario. Both are accomplished using > > the sja1105_filter() function and some small touch-ups in the .rcv > > callback. > > It seems like you made major changes to this. When you do that, you > should drop any reviewed-by tags you have. They are no longer valid > because of the major changes. > Ok, noted. > > /* This callback needs to be present */ > > @@ -1141,7 +1158,11 @@ static int sja1105_vlan_filtering(struct dsa_switch *ds, int port, bool enabled) > > if (rc) > > dev_err(ds->dev, "Failed to change VLAN Ethertype\n"); > > > > - return rc; > > + /* Switch port identification based on 802.1Q is only passable > > possible, not passable. > Passable (satisfactory, decent, acceptable) is what I wanted to say. Tagging using VLANs is possible even when the bridge wants to use them, but it's smarter not to go there. But I get your point, maybe I'll rephrase. > > + * if we are not under a vlan_filtering bridge. So make sure > > + * the two configurations are mutually exclusive. > > + */ > > + return sja1105_setup_8021q_tagging(ds, !enabled); > > } > > > > static void sja1105_vlan_add(struct dsa_switch *ds, int port, > > @@ -1233,9 +1254,107 @@ static int sja1105_setup(struct dsa_switch *ds) > > */ > > ds->vlan_filtering_is_global = true; > > > > + /* The DSA/switchdev model brings up switch ports in standalone mode by > > + * default, and that means vlan_filtering is 0 since they're not under > > + * a bridge, so it's safe to set up switch tagging at this time. > > + */ > > + return sja1105_setup_8021q_tagging(ds, true); > > +} > > + > > +#include "../../../net/dsa/dsa_priv.h" > > No. Don't use relative includes like this. > > What do you need from the header? Maybe move it into > include/linux/net/dsa.h > dsa_slave_to_master() > > +/* Deferred work is unfortunately necessary because setting up the management > > + * route cannot be done from atomit context (SPI transfer takes a sleepable > > + * lock on the bus) > > + */ > > +static void sja1105_xmit_work_handler(struct work_struct *work) > > +{ > > + struct sja1105_port *sp = container_of(work, struct sja1105_port, > > + xmit_work); > > + struct sja1105_private *priv = sp->dp->ds->priv; > > + struct net_device *slave = sp->dp->slave; > > + struct net_device *master = dsa_slave_to_master(slave); > > + int port = (uintptr_t)(sp - priv->ports); > > + struct sk_buff *skb; > > + int i, rc; > > + > > + while ((i = sja1105_skb_ring_get(&sp->xmit_ring, &skb)) >= 0) { > > + struct sja1105_mgmt_entry mgmt_route = { 0 }; > > + struct ethhdr *hdr; > > + int timeout = 10; > > + int skb_len; > > + > > + skb_len = skb->len; > > + hdr = eth_hdr(skb); > > + > > + mgmt_route.macaddr = ether_addr_to_u64(hdr->h_dest); > > + mgmt_route.destports = BIT(port); > > + mgmt_route.enfport = 1; > > + mgmt_route.tsreg = 0; > > + mgmt_route.takets = false; > > + > > + rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, > > + port, &mgmt_route, true); > > + if (rc < 0) { > > + kfree_skb(skb); > > + slave->stats.tx_dropped++; > > + continue; > > + } > > + > > + /* Transfer skb to the host port. */ > > + skb->dev = master; > > + dev_queue_xmit(skb); > > + > > + /* Wait until the switch has processed the frame */ > > + do { > > + rc = sja1105_dynamic_config_read(priv, BLK_IDX_MGMT_ROUTE, > > + port, &mgmt_route); > > + if (rc < 0) { > > + slave->stats.tx_errors++; > > + dev_err(priv->ds->dev, > > + "xmit: failed to poll for mgmt route\n"); > > + continue; > > + } > > + > > + /* UM10944: The ENFPORT flag of the respective entry is > > + * cleared when a match is found. The host can use this > > + * flag as an acknowledgment. > > + */ > > + cpu_relax(); > > + } while (mgmt_route.enfport && --timeout); > > + > > + if (!timeout) { > > + dev_err(priv->ds->dev, "xmit timed out\n"); > > + slave->stats.tx_errors++; > > + continue; > > + } > > + > > + slave->stats.tx_packets++; > > + slave->stats.tx_bytes += skb_len; > > + } > > +} > > + > > +static int sja1105_port_enable(struct dsa_switch *ds, int port, > > + struct phy_device *phydev) > > +{ > > + struct sja1105_private *priv = ds->priv; > > + struct sja1105_port *sp = &priv->ports[port]; > > + > > + sp->dp = &ds->ports[port]; > > + INIT_WORK(&sp->xmit_work, sja1105_xmit_work_handler); > > return 0; > > } > > I think i'm missing something here. You have a per port queue of link > local frames which need special handling. And you have a per-port work > queue. To send such a frame, you need to write some register, send the > frame, and then wait until the mgmt_route.enfport is reset. > > Why are you doing this per port? How do you stop two ports/work queues > running at the same time? It seems like one queue, with one work queue > would be a better structure. > See the "port" parameter to this call here: rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, *port*, &mgmt_route, true); The switch IP aptly allocates 4 slots for management routes. And it's a 5-port switch where 1 port is the management port. I think the structure is fine. > Also, please move all this code into the tagger. Just add exports for > sja1105_dynamic_config_write() and sja1105_dynamic_config_read(). > Well, you see, the tagger code is part of the dsa_core object. If I export function symbols from the driver, those still won't be there if I compile the driver as a module. On the other hand, the way I'm doing it, I think the schedule_work() gives me a pretty good separation. > > +static void sja1105_port_disable(struct dsa_switch *ds, int port) > > +{ > > + struct sja1105_private *priv = ds->priv; > > + struct sja1105_port *sp = &priv->ports[port]; > > + struct sk_buff *skb; > > + > > + cancel_work_sync(&sp->xmit_work); > > + while (sja1105_skb_ring_get(&sp->xmit_ring, &skb) >= 0) > > + kfree_skb(skb); > > +} > > + > > diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c > > new file mode 100644 > > index 000000000000..5c76a06c9093 > > --- /dev/null > > +++ b/net/dsa/tag_sja1105.c > > @@ -0,0 +1,148 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +/* Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com> > > + */ > > +#include <linux/etherdevice.h> > > +#include <linux/if_vlan.h> > > +#include <linux/dsa/sja1105.h> > > +#include "../../drivers/net/dsa/sja1105/sja1105.h" > > Again, no, don't do this. > This separation between driver and tagger is fairly arbitrary. I need access to the driver's private structure, in order to get a hold of the private shadow of the dsa_port. Moving the driver private structure to include/linux/dsa/ would pull in quite a number of dependencies. Maybe I could provide declarations for the most of them, but anyway the private structure wouldn't be so private any longer, would it? Otherwise put, would you prefer a dp->priv similar to the already existing ds->priv? struct sja1105_port is much more lightweight to keep in include/linux/dsa/. > > + > > +#include "dsa_priv.h" > > + > > +/* Similar to is_link_local_ether_addr(hdr->h_dest) but also covers PTP */ > > +static inline bool sja1105_is_link_local(const struct sk_buff *skb) > > +{ > > + const struct ethhdr *hdr = eth_hdr(skb); > > + u64 dmac = ether_addr_to_u64(hdr->h_dest); > > + > > + if ((dmac & SJA1105_LINKLOCAL_FILTER_A_MASK) == > > + SJA1105_LINKLOCAL_FILTER_A) > > + return true; > > + if ((dmac & SJA1105_LINKLOCAL_FILTER_B_MASK) == > > + SJA1105_LINKLOCAL_FILTER_B) > > + return true; > > + return false; > > +} > > + > > +static bool sja1105_filter(const struct sk_buff *skb, struct net_device *dev) > > +{ > > + if (sja1105_is_link_local(skb)) > > + return true; > > + if (!dev->dsa_ptr->vlan_filtering) > > + return true; > > + return false; > > +} > > Please add a comment here about what frames cannot be handled by the > tagger. However, i'm not too happy about this design... > Ok, let's put this another way. A switch is primarily a device used to offload the forwarding of traffic based on L2 rules. Additionally there may be some management traffic for stuff like STP that needs to be terminated on the host port of the switch. For that, the hardware's job is to filter and tag management frames on their way to the host port, and the software's job is to process the source port and switch id information in a meaningful way. Now both this particular switch hardware, and DSA, are taking the above definitions to extremes. The switch says: "that's all you want to see? ok, so that's all I'm going to give you". So its native (hardware) tagging protocol is to trap link-local traffic and overwrite two bytes of its destination MAC with the switch ID and the source port. No more, no less. It is an incomplete solution, but it does the job for practical use cases. Now DSA says: "I want these to be fully capable net devices, I want the user to not even realize what's going on under the hood". I don't think that terminating iperf traffic through switch ports is a realistic usage scenario. So in a way discussions about performance and optimizations on DSA hotpath are slightly pointless IMO. Now what my driver says is that it offers a bit of both. It speaks the hardware's tagging protocol so it is capable of management traffic, but it also speaks the DSA paradigm, so in a way pushes the hardware to work in a mode it was never intended to, by repurposing VLANs when the user doesn't request them. So on one hand there is some overlap between the hardware tagging protocol and the VLAN one (in standalone mode and in VLAN-unaware bridged mode, management traffic *could* use VLAN tagging but it doesn't rely on it), and on the other hand the reunion of the two tagging protocols is decent, but still doesn't cover the entire spectrum (when put under a VLAN-aware bridge, you lose the ability to decode general traffic). So you'd better not rely on VLANs to decode the management traffic, because you won't be able to always rely on that, and that is a shame since a bridge with both vlan_filtering 1 and stp_state 1 is a real usage scenario, and the hardware is capable of that combination. But all of that is secondary. Let's forget about VLAN tagging for a second and concentrate on the tagging of management traffic. The limiting factor here is the software architecture of DSA, because in order for me to decode that in the driver/tagger, I'd have to drop everything else coming on the master net device (I explained in 13/24 why). I believe that DSA being all-or-nothing about switch tagging is turning a blind eye to the devices that don't go overboard with features, and give you what's needed in a real-world design but not much else. What would you improve about this design (assuming you're talking about the filter function)? Thanks, -Vladimir > > + > > +static struct sk_buff *sja1105_xmit(struct sk_buff *skb, > > + struct net_device *netdev) > > +{ > > + struct dsa_port *dp = dsa_slave_to_port(netdev); > > + struct dsa_switch *ds = dp->ds; > > + struct sja1105_private *priv = ds->priv; > > + struct sja1105_port *sp = &priv->ports[dp->index]; > > + struct sk_buff *clone; > > + > > + if (likely(!sja1105_is_link_local(skb))) { > > + /* Normal traffic path. */ > > + u16 tx_vid = dsa_tagging_tx_vid(ds, dp->index); > > + u8 pcp = skb->priority; > > + > > + /* If we are under a vlan_filtering bridge, IP termination on > > + * switch ports based on 802.1Q tags is simply too brittle to > > + * be passable. So just defer to the dsa_slave_notag_xmit > > + * implementation. > > + */ > > + if (dp->vlan_filtering) > > + return skb; > > + > > + return dsa_8021q_xmit(skb, netdev, ETH_P_EDSA, > > + ((pcp << VLAN_PRIO_SHIFT) | tx_vid)); > > Please don't reuse ETH_P_EDSA. Define an ETH_P_SJA1105. > > > + } > > + > > + /* Code path for transmitting management traffic. This does not rely > > + * upon switch tagging, but instead SPI-installed management routes. > > + */ > > + clone = skb_clone(skb, GFP_ATOMIC); > > + if (!clone) { > > + dev_err(ds->dev, "xmit: failed to clone skb\n"); > > + return NULL; > > + } > > + > > + if (sja1105_skb_ring_add(&sp->xmit_ring, clone) < 0) { > > + dev_err(ds->dev, "xmit: skb ring full\n"); > > + kfree_skb(clone); > > + return NULL; > > + } > > + > > + if (sp->xmit_ring.count == SJA1105_SKB_RING_SIZE) > > + /* TODO setup a dedicated netdev queue for management traffic > > + * so that we can selectively apply backpressure and not be > > + * required to stop the entire traffic when the software skb > > + * ring is full. This requires hooking the ndo_select_queue > > + * from DSA and matching on mac_fltres. > > + */ > > + dev_err(ds->dev, "xmit: reached maximum skb ring size\n"); > > This should be rate limited. > > Andrew > > > + > > + schedule_work(&sp->xmit_work); > > + /* Let DSA free its reference to the skb and we will free > > + * the clone in the deferred worker > > + */ > > + return NULL; > > +} > > + > > +static struct sk_buff *sja1105_rcv(struct sk_buff *skb, > > + struct net_device *netdev, > > + struct packet_type *pt) > > +{ > > + unsigned int source_port, switch_id; > > + struct ethhdr *hdr = eth_hdr(skb); > > + struct sk_buff *nskb; > > + u16 tpid, vid, tci; > > + bool is_tagged; > > + > > + nskb = dsa_8021q_rcv(skb, netdev, pt, &tpid, &tci); > > + is_tagged = (nskb && tpid == ETH_P_EDSA); > > + > > + skb->priority = (tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT; > > + vid = tci & VLAN_VID_MASK; > > + > > + skb->offload_fwd_mark = 1; > > + > > + if (likely(!sja1105_is_link_local(skb))) { > > + /* Normal traffic path. */ > > + source_port = dsa_tagging_rx_source_port(vid); > > + switch_id = dsa_tagging_rx_switch_id(vid); > > + } else { > > + /* Management traffic path. Switch embeds the switch ID and > > + * port ID into bytes of the destination MAC, courtesy of > > + * the incl_srcpt options. > > + */ > > + source_port = hdr->h_dest[3]; > > + switch_id = hdr->h_dest[4]; > > + /* Clear the DMAC bytes that were mangled by the switch */ > > + hdr->h_dest[3] = 0; > > + hdr->h_dest[4] = 0; > > + } > > + > > + skb->dev = dsa_master_find_slave(netdev, switch_id, source_port); > > + if (!skb->dev) { > > + netdev_warn(netdev, "Couldn't decode source port\n"); > > + return NULL; > > + } > > + > > + /* Delete/overwrite fake VLAN header, DSA expects to not find > > + * it there, see dsa_switch_rcv: skb_push(skb, ETH_HLEN). > > + */ > > + if (is_tagged) > > + memmove(skb->data - ETH_HLEN, skb->data - ETH_HLEN - VLAN_HLEN, > > + ETH_HLEN - VLAN_HLEN); > > + > > + return skb; > > +} > > + > > +const struct dsa_device_ops sja1105_netdev_ops = { > > + .xmit = sja1105_xmit, > > + .rcv = sja1105_rcv, > > + .filter = sja1105_filter, > > + .overhead = VLAN_HLEN, > > +}; > > + > > -- > > 2.17.1 > >
On Sun, 14 Apr 2019 at 00:27, Vladimir Oltean <olteanv@gmail.com> wrote: > > On Sat, 13 Apr 2019 at 19:38, Andrew Lunn <andrew@lunn.ch> wrote: > > > > On Sat, Apr 13, 2019 at 04:28:16AM +0300, Vladimir Oltean wrote: > > > In order to support this, we are creating a make-shift switch tag out of > > > a VLAN trunk configured on the CPU port. Termination of normal traffic > > > on switch ports only works when not under a vlan_filtering bridge. > > > Termination of management (PTP, BPDU) traffic works under all > > > circumstances because it uses a different tagging mechanism > > > (incl_srcpt). We are making use of the generic CONFIG_NET_DSA_TAG_8021Q > > > code and leveraging it from our own CONFIG_NET_DSA_TAG_SJA1105. > > > > > > There are two types of traffic: regular and link-local. > > > The link-local traffic received on the CPU port is trapped from the > > > switch's regular forwarding decisions because it matched one of the two > > > DMAC filters for management traffic. > > > On transmission, the switch requires special massaging for these > > > link-local frames. Due to a weird implementation of the switching IP, by > > > default it drops link-local frames that originate on the CPU port. It > > > needs to be told where to forward them to, through an SPI command > > > ("management route") that is valid for only a single frame. > > > So when we're sending link-local traffic, we need to clone skb's from > > > DSA and send them in our custom xmit worker that also performs SPI access. > > > > > > For that purpose, the DSA xmit handler and the xmit worker communicate > > > through a per-port "skb ring" software structure, with a producer and a > > > consumer index. At the moment this structure is rather fragile > > > (ping-flooding to a link-local DMAC would cause most of the frames to > > > get dropped). I would like to move the management traffic on a separate > > > netdev queue that I can stop when the skb ring got full and hardware is > > > busy processing, so that we are not forced to drop traffic. > > > > > > Signed-off-by: Vladimir Oltean <olteanv@gmail.com> > > > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> > > > --- > > > Changes in v3: > > > Made management traffic be receivable on the DSA netdevices even when > > > switch tagging is disabled, as well as regular traffic be receivable on > > > the master netdevice in the same scenario. Both are accomplished using > > > the sja1105_filter() function and some small touch-ups in the .rcv > > > callback. > > > > It seems like you made major changes to this. When you do that, you > > should drop any reviewed-by tags you have. They are no longer valid > > because of the major changes. > > > > Ok, noted. > > > > /* This callback needs to be present */ > > > @@ -1141,7 +1158,11 @@ static int sja1105_vlan_filtering(struct dsa_switch *ds, int port, bool enabled) > > > if (rc) > > > dev_err(ds->dev, "Failed to change VLAN Ethertype\n"); > > > > > > - return rc; > > > + /* Switch port identification based on 802.1Q is only passable > > > > possible, not passable. > > > > Passable (satisfactory, decent, acceptable) is what I wanted to say. > Tagging using VLANs is possible even when the bridge wants to use > them, but it's smarter not to go there. But I get your point, maybe > I'll rephrase. > > > > + * if we are not under a vlan_filtering bridge. So make sure > > > + * the two configurations are mutually exclusive. > > > + */ > > > + return sja1105_setup_8021q_tagging(ds, !enabled); > > > } > > > > > > static void sja1105_vlan_add(struct dsa_switch *ds, int port, > > > @@ -1233,9 +1254,107 @@ static int sja1105_setup(struct dsa_switch *ds) > > > */ > > > ds->vlan_filtering_is_global = true; > > > > > > + /* The DSA/switchdev model brings up switch ports in standalone mode by > > > + * default, and that means vlan_filtering is 0 since they're not under > > > + * a bridge, so it's safe to set up switch tagging at this time. > > > + */ > > > + return sja1105_setup_8021q_tagging(ds, true); > > > +} > > > + > > > +#include "../../../net/dsa/dsa_priv.h" > > > > No. Don't use relative includes like this. > > > > What do you need from the header? Maybe move it into > > include/linux/net/dsa.h > > > > dsa_slave_to_master() > > > > +/* Deferred work is unfortunately necessary because setting up the management > > > + * route cannot be done from atomit context (SPI transfer takes a sleepable > > > + * lock on the bus) > > > + */ > > > +static void sja1105_xmit_work_handler(struct work_struct *work) > > > +{ > > > + struct sja1105_port *sp = container_of(work, struct sja1105_port, > > > + xmit_work); > > > + struct sja1105_private *priv = sp->dp->ds->priv; > > > + struct net_device *slave = sp->dp->slave; > > > + struct net_device *master = dsa_slave_to_master(slave); > > > + int port = (uintptr_t)(sp - priv->ports); > > > + struct sk_buff *skb; > > > + int i, rc; > > > + > > > + while ((i = sja1105_skb_ring_get(&sp->xmit_ring, &skb)) >= 0) { > > > + struct sja1105_mgmt_entry mgmt_route = { 0 }; > > > + struct ethhdr *hdr; > > > + int timeout = 10; > > > + int skb_len; > > > + > > > + skb_len = skb->len; > > > + hdr = eth_hdr(skb); > > > + > > > + mgmt_route.macaddr = ether_addr_to_u64(hdr->h_dest); > > > + mgmt_route.destports = BIT(port); > > > + mgmt_route.enfport = 1; > > > + mgmt_route.tsreg = 0; > > > + mgmt_route.takets = false; > > > + > > > + rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, > > > + port, &mgmt_route, true); > > > + if (rc < 0) { > > > + kfree_skb(skb); > > > + slave->stats.tx_dropped++; > > > + continue; > > > + } > > > + > > > + /* Transfer skb to the host port. */ > > > + skb->dev = master; > > > + dev_queue_xmit(skb); > > > + > > > + /* Wait until the switch has processed the frame */ > > > + do { > > > + rc = sja1105_dynamic_config_read(priv, BLK_IDX_MGMT_ROUTE, > > > + port, &mgmt_route); > > > + if (rc < 0) { > > > + slave->stats.tx_errors++; > > > + dev_err(priv->ds->dev, > > > + "xmit: failed to poll for mgmt route\n"); > > > + continue; > > > + } > > > + > > > + /* UM10944: The ENFPORT flag of the respective entry is > > > + * cleared when a match is found. The host can use this > > > + * flag as an acknowledgment. > > > + */ > > > + cpu_relax(); > > > + } while (mgmt_route.enfport && --timeout); > > > + > > > + if (!timeout) { > > > + dev_err(priv->ds->dev, "xmit timed out\n"); > > > + slave->stats.tx_errors++; > > > + continue; > > > + } > > > + > > > + slave->stats.tx_packets++; > > > + slave->stats.tx_bytes += skb_len; > > > + } > > > +} > > > + > > > +static int sja1105_port_enable(struct dsa_switch *ds, int port, > > > + struct phy_device *phydev) > > > +{ > > > + struct sja1105_private *priv = ds->priv; > > > + struct sja1105_port *sp = &priv->ports[port]; > > > + > > > + sp->dp = &ds->ports[port]; > > > + INIT_WORK(&sp->xmit_work, sja1105_xmit_work_handler); > > > return 0; > > > } > > > > I think i'm missing something here. You have a per port queue of link > > local frames which need special handling. And you have a per-port work > > queue. To send such a frame, you need to write some register, send the > > frame, and then wait until the mgmt_route.enfport is reset. > > > > Why are you doing this per port? How do you stop two ports/work queues > > running at the same time? It seems like one queue, with one work queue > > would be a better structure. > > > > See the "port" parameter to this call here: > > rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, > *port*, &mgmt_route, true); > > The switch IP aptly allocates 4 slots for management routes. And it's > a 5-port switch where 1 port is the management port. I think the > structure is fine. > "How do stop two work queues": if you're talking about contention on the hardware management route, I responded to that. If you're talking about netif_stop_queue(), I think I'm going to avoid that altogether by not having a finite sized ring structure. While studying the Marvell 88e6060 driver I found out that sk_buff_head exists. Now I'm part of the elite club of wheel reinventors with my struct sja1105_skb_ring :) > > Also, please move all this code into the tagger. Just add exports for > > sja1105_dynamic_config_write() and sja1105_dynamic_config_read(). > > > > Well, you see, the tagger code is part of the dsa_core object. If I > export function symbols from the driver, those still won't be there if > I compile the driver as a module. On the other hand, the way I'm doing > it, I think the schedule_work() gives me a pretty good separation. > > > > +static void sja1105_port_disable(struct dsa_switch *ds, int port) > > > +{ > > > + struct sja1105_private *priv = ds->priv; > > > + struct sja1105_port *sp = &priv->ports[port]; > > > + struct sk_buff *skb; > > > + > > > + cancel_work_sync(&sp->xmit_work); > > > + while (sja1105_skb_ring_get(&sp->xmit_ring, &skb) >= 0) > > > + kfree_skb(skb); > > > +} > > > + > > > diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c > > > new file mode 100644 > > > index 000000000000..5c76a06c9093 > > > --- /dev/null > > > +++ b/net/dsa/tag_sja1105.c > > > @@ -0,0 +1,148 @@ > > > +// SPDX-License-Identifier: GPL-2.0 > > > +/* Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com> > > > + */ > > > +#include <linux/etherdevice.h> > > > +#include <linux/if_vlan.h> > > > +#include <linux/dsa/sja1105.h> > > > +#include "../../drivers/net/dsa/sja1105/sja1105.h" > > > > Again, no, don't do this. > > > > This separation between driver and tagger is fairly arbitrary. > I need access to the driver's private structure, in order to get a > hold of the private shadow of the dsa_port. Moving the driver private > structure to include/linux/dsa/ would pull in quite a number of > dependencies. Maybe I could provide declarations for the most of them, > but anyway the private structure wouldn't be so private any longer, > would it? > Otherwise put, would you prefer a dp->priv similar to the already > existing ds->priv? struct sja1105_port is much more lightweight to > keep in include/linux/dsa/. > > > > + > > > +#include "dsa_priv.h" > > > + > > > +/* Similar to is_link_local_ether_addr(hdr->h_dest) but also covers PTP */ > > > +static inline bool sja1105_is_link_local(const struct sk_buff *skb) > > > +{ > > > + const struct ethhdr *hdr = eth_hdr(skb); > > > + u64 dmac = ether_addr_to_u64(hdr->h_dest); > > > + > > > + if ((dmac & SJA1105_LINKLOCAL_FILTER_A_MASK) == > > > + SJA1105_LINKLOCAL_FILTER_A) > > > + return true; > > > + if ((dmac & SJA1105_LINKLOCAL_FILTER_B_MASK) == > > > + SJA1105_LINKLOCAL_FILTER_B) > > > + return true; > > > + return false; > > > +} > > > + > > > +static bool sja1105_filter(const struct sk_buff *skb, struct net_device *dev) > > > +{ > > > + if (sja1105_is_link_local(skb)) > > > + return true; > > > + if (!dev->dsa_ptr->vlan_filtering) > > > + return true; > > > + return false; > > > +} > > > > Please add a comment here about what frames cannot be handled by the > > tagger. However, i'm not too happy about this design... > > > > Ok, let's put this another way. > A switch is primarily a device used to offload the forwarding of > traffic based on L2 rules. Additionally there may be some management > traffic for stuff like STP that needs to be terminated on the host > port of the switch. For that, the hardware's job is to filter and tag > management frames on their way to the host port, and the software's > job is to process the source port and switch id information in a > meaningful way. > Now both this particular switch hardware, and DSA, are taking the > above definitions to extremes. > The switch says: "that's all you want to see? ok, so that's all I'm > going to give you". So its native (hardware) tagging protocol is to > trap link-local traffic and overwrite two bytes of its destination MAC > with the switch ID and the source port. No more, no less. It is an > incomplete solution, but it does the job for practical use cases. > Now DSA says: "I want these to be fully capable net devices, I want > the user to not even realize what's going on under the hood". I don't > think that terminating iperf traffic through switch ports is a > realistic usage scenario. So in a way discussions about performance > and optimizations on DSA hotpath are slightly pointless IMO. > Now what my driver says is that it offers a bit of both. It speaks the > hardware's tagging protocol so it is capable of management traffic, > but it also speaks the DSA paradigm, so in a way pushes the hardware > to work in a mode it was never intended to, by repurposing VLANs when > the user doesn't request them. So on one hand there is some overlap > between the hardware tagging protocol and the VLAN one (in standalone > mode and in VLAN-unaware bridged mode, management traffic *could* use > VLAN tagging but it doesn't rely on it), and on the other hand the > reunion of the two tagging protocols is decent, but still doesn't > cover the entire spectrum (when put under a VLAN-aware bridge, you > lose the ability to decode general traffic). So you'd better not rely > on VLANs to decode the management traffic, because you won't be able > to always rely on that, and that is a shame since a bridge with both > vlan_filtering 1 and stp_state 1 is a real usage scenario, and the > hardware is capable of that combination. > But all of that is secondary. Let's forget about VLAN tagging for a > second and concentrate on the tagging of management traffic. The > limiting factor here is the software architecture of DSA, because in > order for me to decode that in the driver/tagger, I'd have to drop > everything else coming on the master net device (I explained in 13/24 > why). I believe that DSA being all-or-nothing about switch tagging is > turning a blind eye to the devices that don't go overboard with > features, and give you what's needed in a real-world design but not > much else. > What would you improve about this design (assuming you're talking > about the filter function)? > > Thanks, > -Vladimir > > > > > > > > + > > > +static struct sk_buff *sja1105_xmit(struct sk_buff *skb, > > > + struct net_device *netdev) > > > +{ > > > + struct dsa_port *dp = dsa_slave_to_port(netdev); > > > + struct dsa_switch *ds = dp->ds; > > > + struct sja1105_private *priv = ds->priv; > > > + struct sja1105_port *sp = &priv->ports[dp->index]; > > > + struct sk_buff *clone; > > > + > > > + if (likely(!sja1105_is_link_local(skb))) { > > > + /* Normal traffic path. */ > > > + u16 tx_vid = dsa_tagging_tx_vid(ds, dp->index); > > > + u8 pcp = skb->priority; > > > + > > > + /* If we are under a vlan_filtering bridge, IP termination on > > > + * switch ports based on 802.1Q tags is simply too brittle to > > > + * be passable. So just defer to the dsa_slave_notag_xmit > > > + * implementation. > > > + */ > > > + if (dp->vlan_filtering) > > > + return skb; > > > + > > > + return dsa_8021q_xmit(skb, netdev, ETH_P_EDSA, > > > + ((pcp << VLAN_PRIO_SHIFT) | tx_vid)); > > > > Please don't reuse ETH_P_EDSA. Define an ETH_P_SJA1105. > > > > > + } > > > + > > > + /* Code path for transmitting management traffic. This does not rely > > > + * upon switch tagging, but instead SPI-installed management routes. > > > + */ > > > + clone = skb_clone(skb, GFP_ATOMIC); > > > + if (!clone) { > > > + dev_err(ds->dev, "xmit: failed to clone skb\n"); > > > + return NULL; > > > + } > > > + > > > + if (sja1105_skb_ring_add(&sp->xmit_ring, clone) < 0) { > > > + dev_err(ds->dev, "xmit: skb ring full\n"); > > > + kfree_skb(clone); > > > + return NULL; > > > + } > > > + > > > + if (sp->xmit_ring.count == SJA1105_SKB_RING_SIZE) > > > + /* TODO setup a dedicated netdev queue for management traffic > > > + * so that we can selectively apply backpressure and not be > > > + * required to stop the entire traffic when the software skb > > > + * ring is full. This requires hooking the ndo_select_queue > > > + * from DSA and matching on mac_fltres. > > > + */ > > > + dev_err(ds->dev, "xmit: reached maximum skb ring size\n"); > > > > This should be rate limited. > > > > Andrew > > > > > + > > > + schedule_work(&sp->xmit_work); > > > + /* Let DSA free its reference to the skb and we will free > > > + * the clone in the deferred worker > > > + */ > > > + return NULL; > > > +} > > > + > > > +static struct sk_buff *sja1105_rcv(struct sk_buff *skb, > > > + struct net_device *netdev, > > > + struct packet_type *pt) > > > +{ > > > + unsigned int source_port, switch_id; > > > + struct ethhdr *hdr = eth_hdr(skb); > > > + struct sk_buff *nskb; > > > + u16 tpid, vid, tci; > > > + bool is_tagged; > > > + > > > + nskb = dsa_8021q_rcv(skb, netdev, pt, &tpid, &tci); > > > + is_tagged = (nskb && tpid == ETH_P_EDSA); > > > + > > > + skb->priority = (tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT; > > > + vid = tci & VLAN_VID_MASK; > > > + > > > + skb->offload_fwd_mark = 1; > > > + > > > + if (likely(!sja1105_is_link_local(skb))) { > > > + /* Normal traffic path. */ > > > + source_port = dsa_tagging_rx_source_port(vid); > > > + switch_id = dsa_tagging_rx_switch_id(vid); > > > + } else { > > > + /* Management traffic path. Switch embeds the switch ID and > > > + * port ID into bytes of the destination MAC, courtesy of > > > + * the incl_srcpt options. > > > + */ > > > + source_port = hdr->h_dest[3]; > > > + switch_id = hdr->h_dest[4]; > > > + /* Clear the DMAC bytes that were mangled by the switch */ > > > + hdr->h_dest[3] = 0; > > > + hdr->h_dest[4] = 0; > > > + } > > > + > > > + skb->dev = dsa_master_find_slave(netdev, switch_id, source_port); > > > + if (!skb->dev) { > > > + netdev_warn(netdev, "Couldn't decode source port\n"); > > > + return NULL; > > > + } > > > + > > > + /* Delete/overwrite fake VLAN header, DSA expects to not find > > > + * it there, see dsa_switch_rcv: skb_push(skb, ETH_HLEN). > > > + */ > > > + if (is_tagged) > > > + memmove(skb->data - ETH_HLEN, skb->data - ETH_HLEN - VLAN_HLEN, > > > + ETH_HLEN - VLAN_HLEN); > > > + > > > + return skb; > > > +} > > > + > > > +const struct dsa_device_ops sja1105_netdev_ops = { > > > + .xmit = sja1105_xmit, > > > + .rcv = sja1105_rcv, > > > + .filter = sja1105_filter, > > > + .overhead = VLAN_HLEN, > > > +}; > > > + > > > -- > > > 2.17.1 > > >
On Sun, 14 Apr 2019 at 01:08, Vladimir Oltean <olteanv@gmail.com> wrote: > > On Sun, 14 Apr 2019 at 00:27, Vladimir Oltean <olteanv@gmail.com> wrote: > > > > On Sat, 13 Apr 2019 at 19:38, Andrew Lunn <andrew@lunn.ch> wrote: > > > > > > On Sat, Apr 13, 2019 at 04:28:16AM +0300, Vladimir Oltean wrote: > > > > In order to support this, we are creating a make-shift switch tag out of > > > > a VLAN trunk configured on the CPU port. Termination of normal traffic > > > > on switch ports only works when not under a vlan_filtering bridge. > > > > Termination of management (PTP, BPDU) traffic works under all > > > > circumstances because it uses a different tagging mechanism > > > > (incl_srcpt). We are making use of the generic CONFIG_NET_DSA_TAG_8021Q > > > > code and leveraging it from our own CONFIG_NET_DSA_TAG_SJA1105. > > > > > > > > There are two types of traffic: regular and link-local. > > > > The link-local traffic received on the CPU port is trapped from the > > > > switch's regular forwarding decisions because it matched one of the two > > > > DMAC filters for management traffic. > > > > On transmission, the switch requires special massaging for these > > > > link-local frames. Due to a weird implementation of the switching IP, by > > > > default it drops link-local frames that originate on the CPU port. It > > > > needs to be told where to forward them to, through an SPI command > > > > ("management route") that is valid for only a single frame. > > > > So when we're sending link-local traffic, we need to clone skb's from > > > > DSA and send them in our custom xmit worker that also performs SPI access. > > > > > > > > For that purpose, the DSA xmit handler and the xmit worker communicate > > > > through a per-port "skb ring" software structure, with a producer and a > > > > consumer index. At the moment this structure is rather fragile > > > > (ping-flooding to a link-local DMAC would cause most of the frames to > > > > get dropped). I would like to move the management traffic on a separate > > > > netdev queue that I can stop when the skb ring got full and hardware is > > > > busy processing, so that we are not forced to drop traffic. > > > > > > > > Signed-off-by: Vladimir Oltean <olteanv@gmail.com> > > > > Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> > > > > --- > > > > Changes in v3: > > > > Made management traffic be receivable on the DSA netdevices even when > > > > switch tagging is disabled, as well as regular traffic be receivable on > > > > the master netdevice in the same scenario. Both are accomplished using > > > > the sja1105_filter() function and some small touch-ups in the .rcv > > > > callback. > > > > > > It seems like you made major changes to this. When you do that, you > > > should drop any reviewed-by tags you have. They are no longer valid > > > because of the major changes. > > > > > > > Ok, noted. > > > > > > /* This callback needs to be present */ > > > > @@ -1141,7 +1158,11 @@ static int sja1105_vlan_filtering(struct dsa_switch *ds, int port, bool enabled) > > > > if (rc) > > > > dev_err(ds->dev, "Failed to change VLAN Ethertype\n"); > > > > > > > > - return rc; > > > > + /* Switch port identification based on 802.1Q is only passable > > > > > > possible, not passable. > > > > > > > Passable (satisfactory, decent, acceptable) is what I wanted to say. > > Tagging using VLANs is possible even when the bridge wants to use > > them, but it's smarter not to go there. But I get your point, maybe > > I'll rephrase. > > > > > > + * if we are not under a vlan_filtering bridge. So make sure > > > > + * the two configurations are mutually exclusive. > > > > + */ > > > > + return sja1105_setup_8021q_tagging(ds, !enabled); > > > > } > > > > > > > > static void sja1105_vlan_add(struct dsa_switch *ds, int port, > > > > @@ -1233,9 +1254,107 @@ static int sja1105_setup(struct dsa_switch *ds) > > > > */ > > > > ds->vlan_filtering_is_global = true; > > > > > > > > + /* The DSA/switchdev model brings up switch ports in standalone mode by > > > > + * default, and that means vlan_filtering is 0 since they're not under > > > > + * a bridge, so it's safe to set up switch tagging at this time. > > > > + */ > > > > + return sja1105_setup_8021q_tagging(ds, true); > > > > +} > > > > + > > > > +#include "../../../net/dsa/dsa_priv.h" > > > > > > No. Don't use relative includes like this. > > > > > > What do you need from the header? Maybe move it into > > > include/linux/net/dsa.h > > > > > > > dsa_slave_to_master() > > > > > > +/* Deferred work is unfortunately necessary because setting up the management > > > > + * route cannot be done from atomit context (SPI transfer takes a sleepable > > > > + * lock on the bus) > > > > + */ > > > > +static void sja1105_xmit_work_handler(struct work_struct *work) > > > > +{ > > > > + struct sja1105_port *sp = container_of(work, struct sja1105_port, > > > > + xmit_work); > > > > + struct sja1105_private *priv = sp->dp->ds->priv; > > > > + struct net_device *slave = sp->dp->slave; > > > > + struct net_device *master = dsa_slave_to_master(slave); > > > > + int port = (uintptr_t)(sp - priv->ports); > > > > + struct sk_buff *skb; > > > > + int i, rc; > > > > + > > > > + while ((i = sja1105_skb_ring_get(&sp->xmit_ring, &skb)) >= 0) { > > > > + struct sja1105_mgmt_entry mgmt_route = { 0 }; > > > > + struct ethhdr *hdr; > > > > + int timeout = 10; > > > > + int skb_len; > > > > + > > > > + skb_len = skb->len; > > > > + hdr = eth_hdr(skb); > > > > + > > > > + mgmt_route.macaddr = ether_addr_to_u64(hdr->h_dest); > > > > + mgmt_route.destports = BIT(port); > > > > + mgmt_route.enfport = 1; > > > > + mgmt_route.tsreg = 0; > > > > + mgmt_route.takets = false; > > > > + > > > > + rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, > > > > + port, &mgmt_route, true); > > > > + if (rc < 0) { > > > > + kfree_skb(skb); > > > > + slave->stats.tx_dropped++; > > > > + continue; > > > > + } > > > > + > > > > + /* Transfer skb to the host port. */ > > > > + skb->dev = master; > > > > + dev_queue_xmit(skb); > > > > + > > > > + /* Wait until the switch has processed the frame */ > > > > + do { > > > > + rc = sja1105_dynamic_config_read(priv, BLK_IDX_MGMT_ROUTE, > > > > + port, &mgmt_route); > > > > + if (rc < 0) { > > > > + slave->stats.tx_errors++; > > > > + dev_err(priv->ds->dev, > > > > + "xmit: failed to poll for mgmt route\n"); > > > > + continue; > > > > + } > > > > + > > > > + /* UM10944: The ENFPORT flag of the respective entry is > > > > + * cleared when a match is found. The host can use this > > > > + * flag as an acknowledgment. > > > > + */ > > > > + cpu_relax(); > > > > + } while (mgmt_route.enfport && --timeout); > > > > + > > > > + if (!timeout) { > > > > + dev_err(priv->ds->dev, "xmit timed out\n"); > > > > + slave->stats.tx_errors++; > > > > + continue; > > > > + } > > > > + > > > > + slave->stats.tx_packets++; > > > > + slave->stats.tx_bytes += skb_len; > > > > + } > > > > +} > > > > + > > > > +static int sja1105_port_enable(struct dsa_switch *ds, int port, > > > > + struct phy_device *phydev) > > > > +{ > > > > + struct sja1105_private *priv = ds->priv; > > > > + struct sja1105_port *sp = &priv->ports[port]; > > > > + > > > > + sp->dp = &ds->ports[port]; > > > > + INIT_WORK(&sp->xmit_work, sja1105_xmit_work_handler); > > > > return 0; > > > > } > > > > > > I think i'm missing something here. You have a per port queue of link > > > local frames which need special handling. And you have a per-port work > > > queue. To send such a frame, you need to write some register, send the > > > frame, and then wait until the mgmt_route.enfport is reset. > > > > > > Why are you doing this per port? How do you stop two ports/work queues > > > running at the same time? It seems like one queue, with one work queue > > > would be a better structure. > > > > > > > See the "port" parameter to this call here: > > > > rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, > > *port*, &mgmt_route, true); > > > > The switch IP aptly allocates 4 slots for management routes. And it's > > a 5-port switch where 1 port is the management port. I think the > > structure is fine. > > > > "How do stop two work queues": if you're talking about contention on > the hardware management route, I responded to that. > If you're talking about netif_stop_queue(), I think I'm going to avoid > that altogether by not having a finite sized ring structure. While > studying the Marvell 88e6060 driver I found out that sk_buff_head > exists. Now I'm part of the elite club of wheel reinventors with my > struct sja1105_skb_ring :) > > > > > Also, please move all this code into the tagger. Just add exports for > > > sja1105_dynamic_config_write() and sja1105_dynamic_config_read(). > > > > > > > Well, you see, the tagger code is part of the dsa_core object. If I > > export function symbols from the driver, those still won't be there if > > I compile the driver as a module. On the other hand, the way I'm doing > > it, I think the schedule_work() gives me a pretty good separation. > > > > > > +static void sja1105_port_disable(struct dsa_switch *ds, int port) > > > > +{ > > > > + struct sja1105_private *priv = ds->priv; > > > > + struct sja1105_port *sp = &priv->ports[port]; > > > > + struct sk_buff *skb; > > > > + > > > > + cancel_work_sync(&sp->xmit_work); > > > > + while (sja1105_skb_ring_get(&sp->xmit_ring, &skb) >= 0) > > > > + kfree_skb(skb); > > > > +} > > > > + > > > > diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c > > > > new file mode 100644 > > > > index 000000000000..5c76a06c9093 > > > > --- /dev/null > > > > +++ b/net/dsa/tag_sja1105.c > > > > @@ -0,0 +1,148 @@ > > > > +// SPDX-License-Identifier: GPL-2.0 > > > > +/* Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com> > > > > + */ > > > > +#include <linux/etherdevice.h> > > > > +#include <linux/if_vlan.h> > > > > +#include <linux/dsa/sja1105.h> > > > > +#include "../../drivers/net/dsa/sja1105/sja1105.h" > > > > > > Again, no, don't do this. > > > > > > > This separation between driver and tagger is fairly arbitrary. > > I need access to the driver's private structure, in order to get a > > hold of the private shadow of the dsa_port. Moving the driver private > > structure to include/linux/dsa/ would pull in quite a number of > > dependencies. Maybe I could provide declarations for the most of them, > > but anyway the private structure wouldn't be so private any longer, > > would it? > > Otherwise put, would you prefer a dp->priv similar to the already > > existing ds->priv? struct sja1105_port is much more lightweight to > > keep in include/linux/dsa/. > > > > > > + > > > > +#include "dsa_priv.h" > > > > + > > > > +/* Similar to is_link_local_ether_addr(hdr->h_dest) but also covers PTP */ > > > > +static inline bool sja1105_is_link_local(const struct sk_buff *skb) > > > > +{ > > > > + const struct ethhdr *hdr = eth_hdr(skb); > > > > + u64 dmac = ether_addr_to_u64(hdr->h_dest); > > > > + > > > > + if ((dmac & SJA1105_LINKLOCAL_FILTER_A_MASK) == > > > > + SJA1105_LINKLOCAL_FILTER_A) > > > > + return true; > > > > + if ((dmac & SJA1105_LINKLOCAL_FILTER_B_MASK) == > > > > + SJA1105_LINKLOCAL_FILTER_B) > > > > + return true; > > > > + return false; > > > > +} > > > > + > > > > +static bool sja1105_filter(const struct sk_buff *skb, struct net_device *dev) > > > > +{ > > > > + if (sja1105_is_link_local(skb)) > > > > + return true; > > > > + if (!dev->dsa_ptr->vlan_filtering) > > > > + return true; > > > > + return false; > > > > +} > > > > > > Please add a comment here about what frames cannot be handled by the > > > tagger. However, i'm not too happy about this design... > > > > > > > Ok, let's put this another way. > > A switch is primarily a device used to offload the forwarding of > > traffic based on L2 rules. Additionally there may be some management > > traffic for stuff like STP that needs to be terminated on the host > > port of the switch. For that, the hardware's job is to filter and tag > > management frames on their way to the host port, and the software's > > job is to process the source port and switch id information in a > > meaningful way. > > Now both this particular switch hardware, and DSA, are taking the > > above definitions to extremes. > > The switch says: "that's all you want to see? ok, so that's all I'm > > going to give you". So its native (hardware) tagging protocol is to > > trap link-local traffic and overwrite two bytes of its destination MAC > > with the switch ID and the source port. No more, no less. It is an > > incomplete solution, but it does the job for practical use cases. > > Now DSA says: "I want these to be fully capable net devices, I want > > the user to not even realize what's going on under the hood". I don't > > think that terminating iperf traffic through switch ports is a > > realistic usage scenario. So in a way discussions about performance > > and optimizations on DSA hotpath are slightly pointless IMO. > > Now what my driver says is that it offers a bit of both. It speaks the > > hardware's tagging protocol so it is capable of management traffic, > > but it also speaks the DSA paradigm, so in a way pushes the hardware > > to work in a mode it was never intended to, by repurposing VLANs when > > the user doesn't request them. So on one hand there is some overlap > > between the hardware tagging protocol and the VLAN one (in standalone > > mode and in VLAN-unaware bridged mode, management traffic *could* use > > VLAN tagging but it doesn't rely on it), and on the other hand the > > reunion of the two tagging protocols is decent, but still doesn't > > cover the entire spectrum (when put under a VLAN-aware bridge, you > > lose the ability to decode general traffic). So you'd better not rely > > on VLANs to decode the management traffic, because you won't be able > > to always rely on that, and that is a shame since a bridge with both > > vlan_filtering 1 and stp_state 1 is a real usage scenario, and the > > hardware is capable of that combination. > > But all of that is secondary. Let's forget about VLAN tagging for a > > second and concentrate on the tagging of management traffic. The > > limiting factor here is the software architecture of DSA, because in > > order for me to decode that in the driver/tagger, I'd have to drop > > everything else coming on the master net device (I explained in 13/24 > > why). I believe that DSA being all-or-nothing about switch tagging is > > turning a blind eye to the devices that don't go overboard with > > features, and give you what's needed in a real-world design but not > > much else. > > What would you improve about this design (assuming you're talking > > about the filter function)? > > > > Thanks, > > -Vladimir > > > > > > > > > > > > > > + > > > > +static struct sk_buff *sja1105_xmit(struct sk_buff *skb, > > > > + struct net_device *netdev) > > > > +{ > > > > + struct dsa_port *dp = dsa_slave_to_port(netdev); > > > > + struct dsa_switch *ds = dp->ds; > > > > + struct sja1105_private *priv = ds->priv; > > > > + struct sja1105_port *sp = &priv->ports[dp->index]; > > > > + struct sk_buff *clone; > > > > + > > > > + if (likely(!sja1105_is_link_local(skb))) { > > > > + /* Normal traffic path. */ > > > > + u16 tx_vid = dsa_tagging_tx_vid(ds, dp->index); > > > > + u8 pcp = skb->priority; > > > > + > > > > + /* If we are under a vlan_filtering bridge, IP termination on > > > > + * switch ports based on 802.1Q tags is simply too brittle to > > > > + * be passable. So just defer to the dsa_slave_notag_xmit > > > > + * implementation. > > > > + */ > > > > + if (dp->vlan_filtering) > > > > + return skb; > > > > + > > > > + return dsa_8021q_xmit(skb, netdev, ETH_P_EDSA, > > > > + ((pcp << VLAN_PRIO_SHIFT) | tx_vid)); > > > > > > Please don't reuse ETH_P_EDSA. Define an ETH_P_SJA1105. > > > I'm receiving contradictory advice on this. Why should I define a new ethertype, and if I do, what scope should the definition have (local to the driver and the tagger, local to DSA, UAPI)? > > > > + } > > > > + > > > > + /* Code path for transmitting management traffic. This does not rely > > > > + * upon switch tagging, but instead SPI-installed management routes. > > > > + */ > > > > + clone = skb_clone(skb, GFP_ATOMIC); > > > > + if (!clone) { > > > > + dev_err(ds->dev, "xmit: failed to clone skb\n"); > > > > + return NULL; > > > > + } > > > > + > > > > + if (sja1105_skb_ring_add(&sp->xmit_ring, clone) < 0) { > > > > + dev_err(ds->dev, "xmit: skb ring full\n"); > > > > + kfree_skb(clone); > > > > + return NULL; > > > > + } > > > > + > > > > + if (sp->xmit_ring.count == SJA1105_SKB_RING_SIZE) > > > > + /* TODO setup a dedicated netdev queue for management traffic > > > > + * so that we can selectively apply backpressure and not be > > > > + * required to stop the entire traffic when the software skb > > > > + * ring is full. This requires hooking the ndo_select_queue > > > > + * from DSA and matching on mac_fltres. > > > > + */ > > > > + dev_err(ds->dev, "xmit: reached maximum skb ring size\n"); > > > > > > This should be rate limited. > > > Again, with sk_buff_head it'll probably completely go away. > > > Andrew > > > > > > > + > > > > + schedule_work(&sp->xmit_work); > > > > + /* Let DSA free its reference to the skb and we will free > > > > + * the clone in the deferred worker > > > > + */ > > > > + return NULL; > > > > +} > > > > + > > > > +static struct sk_buff *sja1105_rcv(struct sk_buff *skb, > > > > + struct net_device *netdev, > > > > + struct packet_type *pt) > > > > +{ > > > > + unsigned int source_port, switch_id; > > > > + struct ethhdr *hdr = eth_hdr(skb); > > > > + struct sk_buff *nskb; > > > > + u16 tpid, vid, tci; > > > > + bool is_tagged; > > > > + > > > > + nskb = dsa_8021q_rcv(skb, netdev, pt, &tpid, &tci); > > > > + is_tagged = (nskb && tpid == ETH_P_EDSA); > > > > + > > > > + skb->priority = (tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT; > > > > + vid = tci & VLAN_VID_MASK; > > > > + > > > > + skb->offload_fwd_mark = 1; > > > > + > > > > + if (likely(!sja1105_is_link_local(skb))) { > > > > + /* Normal traffic path. */ > > > > + source_port = dsa_tagging_rx_source_port(vid); > > > > + switch_id = dsa_tagging_rx_switch_id(vid); > > > > + } else { > > > > + /* Management traffic path. Switch embeds the switch ID and > > > > + * port ID into bytes of the destination MAC, courtesy of > > > > + * the incl_srcpt options. > > > > + */ > > > > + source_port = hdr->h_dest[3]; > > > > + switch_id = hdr->h_dest[4]; > > > > + /* Clear the DMAC bytes that were mangled by the switch */ > > > > + hdr->h_dest[3] = 0; > > > > + hdr->h_dest[4] = 0; > > > > + } > > > > + > > > > + skb->dev = dsa_master_find_slave(netdev, switch_id, source_port); > > > > + if (!skb->dev) { > > > > + netdev_warn(netdev, "Couldn't decode source port\n"); > > > > + return NULL; > > > > + } > > > > + > > > > + /* Delete/overwrite fake VLAN header, DSA expects to not find > > > > + * it there, see dsa_switch_rcv: skb_push(skb, ETH_HLEN). > > > > + */ > > > > + if (is_tagged) > > > > + memmove(skb->data - ETH_HLEN, skb->data - ETH_HLEN - VLAN_HLEN, > > > > + ETH_HLEN - VLAN_HLEN); > > > > + > > > > + return skb; > > > > +} > > > > + > > > > +const struct dsa_device_ops sja1105_netdev_ops = { > > > > + .xmit = sja1105_xmit, > > > > + .rcv = sja1105_rcv, > > > > + .filter = sja1105_filter, > > > > + .overhead = VLAN_HLEN, > > > > +}; > > > > + > > > > -- > > > > 2.17.1 > > > >
On Sun, Apr 14, 2019 at 12:27:50AM +0300, Vladimir Oltean wrote: > > > + mgmt_route.macaddr = ether_addr_to_u64(hdr->h_dest); > > > + mgmt_route.destports = BIT(port); > > > + mgmt_route.enfport = 1; > > > + mgmt_route.tsreg = 0; > > > + mgmt_route.takets = false; > > > + > > > + rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, > > > + port, &mgmt_route, true); > rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, > *port*, &mgmt_route, true); > > The switch IP aptly allocates 4 slots for management routes. And it's > a 5-port switch where 1 port is the management port. I think the > structure is fine. So does the hardware look over all the slots and find the first one which has a matching mgmt_route.macaddr destination MAC address? You wait for the enfport to be cleared. I assume a slot with enfport cleared is not active and won't match? So we need to consider if there is a race condition where we have multiple slots with the same destination MAC address, but different destination ports? Say the bridge sends out BPDU to all ports of a bridge in quick succession. These work queues run in any order, and can sleep. Can we get into a situation where we get the two slots setup, and then the frames sent in reverse order? The match then happens backwards, and the frames get sent out the wrong port? Or say the two slots are setup, the two frames are sent in order, but the stack decided to drop the first frame because buffers are full. Can the second frame make it to the switch and match on the first slot and go out the wrong port? > > Also, please move all this code into the tagger. Just add exports for > > sja1105_dynamic_config_write() and sja1105_dynamic_config_read(). > > > > Well, you see, the tagger code is part of the dsa_core object. If I > export function symbols from the driver, those still won't be there if > I compile the driver as a module. On the other hand, the way I'm doing > it, I think the schedule_work() gives me a pretty good separation. That is solvable via Kconfig, don't allow it to be built as a module. Also, DSA has been very successful, we keep getting more switches from different vendors, and hence more taggers. So at some point, we should turn the taggers into modules. I'm not saying that should happen now, but when it does happen, this driver can then become a module. The real reason is, tagger as all about handling frames, where as drivers are all about configuring the switch. The majority of this code is about frames, so it belongs in the tagger. > > > +#include <linux/etherdevice.h> > > > +#include <linux/if_vlan.h> > > > +#include <linux/dsa/sja1105.h> > > > +#include "../../drivers/net/dsa/sja1105/sja1105.h" > > > > Again, no, don't do this. > > > > This separation between driver and tagger is fairly arbitrary. > I need access to the driver's private structure, in order to get a > hold of the private shadow of the dsa_port. Moving the driver private > structure to include/linux/dsa/ would pull in quite a number of > dependencies. Maybe I could provide declarations for the most of them, > but anyway the private structure wouldn't be so private any longer, > would it? > Otherwise put, would you prefer a dp->priv similar to the already > existing ds->priv? struct sja1105_port is much more lightweight to > keep in include/linux/dsa/. Linux simply does not make use of relative paths going between directories like this. That is the key point here. Whatever you need to share between the tagger and the driver has to be put into include/linux/dsa/. Assuming we are just exporting something like sja1105_dynamic_config_write() and _read() > > > + rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, > > > + port, &mgmt_route, true); priv can be replaced with ds, which the tagger has. port is known. BLK_IDX_MGMT_ROUTE is implicit, and all that the tagger needs to pass for mgmt_route is the destination MAC address, which it has. The tagger does need somewhere to keep the queue of frames to be sent and its workqueue. I would probably add a void *tagger_priv to dsa_switch, and two new methods to dsa_device_ops, .probe and .release, to allow it to create and destroy what it needs in tagger_priv. > > > +#include "dsa_priv.h" > > > + > > > +/* Similar to is_link_local_ether_addr(hdr->h_dest) but also covers PTP */ > > > +static inline bool sja1105_is_link_local(const struct sk_buff *skb) > > > +{ > > > + const struct ethhdr *hdr = eth_hdr(skb); > > > + u64 dmac = ether_addr_to_u64(hdr->h_dest); > > > + > > > + if ((dmac & SJA1105_LINKLOCAL_FILTER_A_MASK) == > > > + SJA1105_LINKLOCAL_FILTER_A) > > > + return true; > > > + if ((dmac & SJA1105_LINKLOCAL_FILTER_B_MASK) == > > > + SJA1105_LINKLOCAL_FILTER_B) > > > + return true; > > > + return false; > > > +} > > > + > > > +static bool sja1105_filter(const struct sk_buff *skb, struct net_device *dev) > > > +{ > > > + if (sja1105_is_link_local(skb)) > > > + return true; > > > + if (!dev->dsa_ptr->vlan_filtering) > > > + return true; > > > + return false; > > > +} > > > > Please add a comment here about what frames cannot be handled by the > > tagger. However, i'm not too happy about this design... > > > What would you improve about this design (assuming you're talking > about the filter function)? I want to understand what frames get passed via the master device, and how ultimately they get to where they should be going. Once i understand what sort of frames they are and what is generating/consuming them, maybe we can find a better solution which preserves the DSA concepts. To me, it looks like they are not management frames, at least not BPDU or PTP, since they are link local. If VLAN filtering is off, the VLAN tag tells us which port they came in, so we can strip the tag and pass them to the correct slave. So it looks like real user frames with a VLAN tag are getting passed to the master device. So i then assume you put vlan interfaces on top of the master device, and your application then uses the vlan interfaces? Your application does not care where the frames came from, it is using the switch as a dumb switch. The DSA slaves are unused? Could we enforce that a VLAN can only be assigned to a single port? The tagger could then pass the tagged frame to the correct slave? Is that too restrictive for your use case? Do you need the same VLAN on multiple ports. Andrew
> > > > > + return dsa_8021q_xmit(skb, netdev, ETH_P_EDSA, > > > > > + ((pcp << VLAN_PRIO_SHIFT) | tx_vid)); > > > > > > > > Please don't reuse ETH_P_EDSA. Define an ETH_P_SJA1105. > > > > > > I'm receiving contradictory advice on this. Why should I define a new > ethertype, and if I do, what scope should the definition have (local > to the driver and the tagger, local to DSA, UAPI)? ETH_P_EDSA has a well defined meaning. It is a true global EtherType, and means a Marvell EtherType DSA header follows. You are polluting this meaning of ETH_P_EDSA. Would you put ETH_P_IP or ETH_P_8021AD here? Andrew
On Sun, 14 Apr 2019 at 19:05, Andrew Lunn <andrew@lunn.ch> wrote: > > On Sun, Apr 14, 2019 at 12:27:50AM +0300, Vladimir Oltean wrote: > > > > + mgmt_route.macaddr = ether_addr_to_u64(hdr->h_dest); > > > > + mgmt_route.destports = BIT(port); > > > > + mgmt_route.enfport = 1; > > > > + mgmt_route.tsreg = 0; > > > > + mgmt_route.takets = false; > > > > + > > > > + rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, > > > > + port, &mgmt_route, true); > > > rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, > > *port*, &mgmt_route, true); > > > > The switch IP aptly allocates 4 slots for management routes. And it's > > a 5-port switch where 1 port is the management port. I think the > > structure is fine. > > So does the hardware look over all the slots and find the first one > which has a matching mgmt_route.macaddr destination MAC address? You > wait for the enfport to be cleared. I assume a slot with enfport > cleared is not active and won't match? > > So we need to consider if there is a race condition where we have > multiple slots with the same destination MAC address, but different > destination ports? Say the bridge sends out BPDU to all ports of a > bridge in quick succession. > > These work queues run in any order, and can sleep. Can we get into a > situation where we get the two slots setup, and then the frames sent > in reverse order? The match then happens backwards, and the frames get > sent out the wrong port? > Yes, it looks like the hardware isn't doing me any favors on this one. From UM10944: "If the host provides several management route entries with identical values for the MACADDR, the one at the lowest index is used first." So the 4 hardware management slots serve no purpose unless I'm willing to lock the sja1105_xmit_work_handler with a mutex. And even then there's no reason to use separate slots since the workers are serialized anyway. Weird. > Or say the two slots are setup, the two frames are sent in order, but > the stack decided to drop the first frame because buffers are > full. Can the second frame make it to the switch and match on the > first slot and go out the wrong port? > Yes if waiting for enfport times out there's some cleanup work I'm currently not doing. > > > Also, please move all this code into the tagger. Just add exports for > > > sja1105_dynamic_config_write() and sja1105_dynamic_config_read(). > > > > > > > Well, you see, the tagger code is part of the dsa_core object. If I > > export function symbols from the driver, those still won't be there if > > I compile the driver as a module. On the other hand, the way I'm doing > > it, I think the schedule_work() gives me a pretty good separation. > > That is solvable via Kconfig, don't allow it to be built as a module. > > Also, DSA has been very successful, we keep getting more switches from > different vendors, and hence more taggers. So at some point, we should > turn the taggers into modules. I'm not saying that should happen now, > but when it does happen, this driver can then become a module. > > The real reason is, tagger as all about handling frames, where as > drivers are all about configuring the switch. The majority of this > code is about frames, so it belongs in the tagger. > The xmit worker needs to configure the switch to be able to handle frames. Why doesn't that belong in the driver? Not allowing the driver to be built as module is hardly any cleaner than a schedule_work(). I only need dsa_slave_to_master for the delayed enqueue. And if DSA supported a delayed enqueue method natively I wouldn't need it at all. > > > > +#include <linux/etherdevice.h> > > > > +#include <linux/if_vlan.h> > > > > +#include <linux/dsa/sja1105.h> > > > > +#include "../../drivers/net/dsa/sja1105/sja1105.h" > > > > > > Again, no, don't do this. > > > > > > > This separation between driver and tagger is fairly arbitrary. > > I need access to the driver's private structure, in order to get a > > hold of the private shadow of the dsa_port. Moving the driver private > > structure to include/linux/dsa/ would pull in quite a number of > > dependencies. Maybe I could provide declarations for the most of them, > > but anyway the private structure wouldn't be so private any longer, > > would it? > > Otherwise put, would you prefer a dp->priv similar to the already > > existing ds->priv? struct sja1105_port is much more lightweight to > > keep in include/linux/dsa/. > > Linux simply does not make use of relative paths going between > directories like this. That is the key point here. Whatever you need > to share between the tagger and the driver has to be put into > include/linux/dsa/. > > Assuming we are just exporting something like > sja1105_dynamic_config_write() and _read() > > > > > > + rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, > > > > + port, &mgmt_route, true); > > priv can be replaced with ds, which the tagger has. port is > known. BLK_IDX_MGMT_ROUTE is implicit, and all that the tagger needs > to pass for mgmt_route is the destination MAC address, which it has. > > The tagger does need somewhere to keep the queue of frames to be sent > and its workqueue. I would probably add a void *tagger_priv to > dsa_switch, and two new methods to dsa_device_ops, .probe and > .release, to allow it to create and destroy what it needs in > tagger_priv. > I need to think about this. > > > > +#include "dsa_priv.h" > > > > + > > > > +/* Similar to is_link_local_ether_addr(hdr->h_dest) but also covers PTP */ > > > > +static inline bool sja1105_is_link_local(const struct sk_buff *skb) > > > > +{ > > > > + const struct ethhdr *hdr = eth_hdr(skb); > > > > + u64 dmac = ether_addr_to_u64(hdr->h_dest); > > > > + > > > > + if ((dmac & SJA1105_LINKLOCAL_FILTER_A_MASK) == > > > > + SJA1105_LINKLOCAL_FILTER_A) > > > > + return true; > > > > + if ((dmac & SJA1105_LINKLOCAL_FILTER_B_MASK) == > > > > + SJA1105_LINKLOCAL_FILTER_B) > > > > + return true; > > > > + return false; > > > > +} > > > > + > > > > +static bool sja1105_filter(const struct sk_buff *skb, struct net_device *dev) > > > > +{ > > > > + if (sja1105_is_link_local(skb)) > > > > + return true; > > > > + if (!dev->dsa_ptr->vlan_filtering) > > > > + return true; > > > > + return false; > > > > +} > > > > > > Please add a comment here about what frames cannot be handled by the > > > tagger. However, i'm not too happy about this design... > > > > > > What would you improve about this design (assuming you're talking > > about the filter function)? > > I want to understand what frames get passed via the master device, and > how ultimately they get to where they should be going. > > Once i understand what sort of frames they are and what is > generating/consuming them, maybe we can find a better solution which > preserves the DSA concepts. > > To me, it looks like they are not management frames, at least not BPDU > or PTP, since they are link local. If VLAN filtering is off, the VLAN > tag tells us which port they came in, so we can strip the tag and pass > them to the correct slave. > > So it looks like real user frames with a VLAN tag are getting passed > to the master device. So i then assume you put vlan interfaces on top > of the master device, and your application then uses the vlan > interfaces? Your application does not care where the frames came from, > it is using the switch as a dumb switch. The DSA slaves are unused? > Whatever the application may be, the DSA solution to switches that can't decode all incoming traffic is to drop the rest. In this case it means that the host port is no longer a valid destination for the L2 switching process. > Could we enforce that a VLAN can only be assigned to a single port? > The tagger could then pass the tagged frame to the correct slave? Is > that too restrictive for your use case? Do you need the same VLAN on > multiple ports. > > Andrew No we can't enforce that. The commit message of 07/24 has a pretty lengthy explanation why. Thanks, -Vladimir
On Sun, 14 Apr 2019 at 19:18, Andrew Lunn <andrew@lunn.ch> wrote: > > > > > > > + return dsa_8021q_xmit(skb, netdev, ETH_P_EDSA, > > > > > > + ((pcp << VLAN_PRIO_SHIFT) | tx_vid)); > > > > > > > > > > Please don't reuse ETH_P_EDSA. Define an ETH_P_SJA1105. > > > > > > > > > I'm receiving contradictory advice on this. Why should I define a new > > ethertype, and if I do, what scope should the definition have (local > > to the driver and the tagger, local to DSA, UAPI)? > > ETH_P_EDSA has a well defined meaning. It is a true global EtherType, > and means a Marvell EtherType DSA header follows. > > You are polluting this meaning of ETH_P_EDSA. Would you put ETH_P_IP > or ETH_P_8021AD here? > > Andrew You are putting an equality sign here between things that are not quite equal. The MEDSA EtherType is used for the same purpose as what I'm using it for. The only situation when I can receive ETH_P_EDSA frames is if somebody designed a system with a cascaded SJA1105 and a MV88E6xx. I think that's unlikely but I might be wrong. Don't get me wrong, I could use literally any EtherType and that's exactly why I'm reluctant to define a new one. The only thing is that if I pick an EtherType smaller than 1500 (LLC/SNAP) like ETH_P_XDSA (or even zero works), then I get the hardware incrementing the n_sizeerr counter for each received tagged frame (it doesn't drop it, though). -Vladimir
> Yes, it looks like the hardware isn't doing me any favors on this one. > >From UM10944: "If the host provides several management route entries > with identical values for the MACADDR, the one at the lowest index is > used first." > So the 4 hardware management slots serve no purpose unless I'm willing > to lock the sja1105_xmit_work_handler with a mutex. And even then > there's no reason to use separate slots since the workers are > serialized anyway. Weird. So you can simply this down to just using one queue. Maybe even one queue for all instances of this switch. You can then keep it all as static private structures in the tag driver, maybe not even needing ds->tag_priv. > > Also, DSA has been very successful, we keep getting more switches from > > different vendors, and hence more taggers. So at some point, we should > > turn the taggers into modules. I'm not saying that should happen now, > > but when it does happen, this driver can then become a module. FYI: I started on this already. There might be patches today to allow the tag drivers to be kernel modules. Andrew
On Sun, Apr 14, 2019 at 09:53:42PM +0300, Vladimir Oltean wrote: > On Sun, 14 Apr 2019 at 19:18, Andrew Lunn <andrew@lunn.ch> wrote: > > > > > > > > > + return dsa_8021q_xmit(skb, netdev, ETH_P_EDSA, > > > > > > > + ((pcp << VLAN_PRIO_SHIFT) | tx_vid)); > > > > > > > > > > > > Please don't reuse ETH_P_EDSA. Define an ETH_P_SJA1105. > > > > > > > > > > > > I'm receiving contradictory advice on this. Why should I define a new > > > ethertype, and if I do, what scope should the definition have (local > > > to the driver and the tagger, local to DSA, UAPI)? > > > > ETH_P_EDSA has a well defined meaning. It is a true global EtherType, > > and means a Marvell EtherType DSA header follows. > > > > You are polluting this meaning of ETH_P_EDSA. Would you put ETH_P_IP > > or ETH_P_8021AD here? > > > > Andrew > > You are putting an equality sign here between things that are not quite equal. > The MEDSA EtherType is used for the same purpose as what I'm using it for. I don't think it is. tcpdump will match on this EtherType and decode what follows as an EDSA header, just in the same way it matches a ETH_P_IP and decodes what comes next as an IP packet. I also have wireshark patches, which i never submitted, which do the same. Please run tcpdump on the master interface with your test system and see what it does. Andrew
On Sun, 14 Apr 2019 at 22:13, Andrew Lunn <andrew@lunn.ch> wrote: > > On Sun, Apr 14, 2019 at 09:53:42PM +0300, Vladimir Oltean wrote: > > On Sun, 14 Apr 2019 at 19:18, Andrew Lunn <andrew@lunn.ch> wrote: > > > > > > > > > > > + return dsa_8021q_xmit(skb, netdev, ETH_P_EDSA, > > > > > > > > + ((pcp << VLAN_PRIO_SHIFT) | tx_vid)); > > > > > > > > > > > > > > Please don't reuse ETH_P_EDSA. Define an ETH_P_SJA1105. > > > > > > > > > > > > > > > I'm receiving contradictory advice on this. Why should I define a new > > > > ethertype, and if I do, what scope should the definition have (local > > > > to the driver and the tagger, local to DSA, UAPI)? > > > > > > ETH_P_EDSA has a well defined meaning. It is a true global EtherType, > > > and means a Marvell EtherType DSA header follows. > > > > > > You are polluting this meaning of ETH_P_EDSA. Would you put ETH_P_IP > > > or ETH_P_8021AD here? > > > > > > Andrew > > > > You are putting an equality sign here between things that are not quite equal. > > The MEDSA EtherType is used for the same purpose as what I'm using it for. > > I don't think it is. tcpdump will match on this EtherType and decode > what follows as an EDSA header, just in the same way it matches a > ETH_P_IP and decodes what comes next as an IP packet. I also have > wireshark patches, which i never submitted, which do the same. > > Please run tcpdump on the master interface with your test system and > see what it does. > > Andrew It fails to decode the frames, obviously. But so does any other EtherType. Florian was hinting (https://lwn.net/ml/netdev/b52f4cdf-edcf-0757-1d6e-d4a831ec7943@gmail.com/) at the recent pull requests submitted to tcpdump and libpcap that make it possible to decode based on the string in /sys/class/net/${master}/dsa/tagging. I admit I haven't actually tested or studied those closely yet (there are more important things to focus on ATM), but since my driver returns "8021q" in sysfs and yours returns "edsa", I would presume tcpdump could use that information. Anyway, since you obviously know more on this topic than I do, please make me understand what are the real problems in spoofing the Ethertype as a Marvell one. Thanks, -Vladimir
> It fails to decode the frames, obviously. But so does any other EtherType. > Florian was hinting > (https://lwn.net/ml/netdev/b52f4cdf-edcf-0757-1d6e-d4a831ec7943@gmail.com/) > at the recent pull requests submitted to tcpdump and libpcap that make > it possible to decode based on the string in > /sys/class/net/${master}/dsa/tagging. I admit I haven't actually > tested or studied those closely yet (there are more important things > to focus on ATM), but since my driver returns "8021q" in sysfs and > yours returns "edsa", I would presume tcpdump could use that > information. No it does not. It is a valid EtherType, that is what is used to trigger the decoding, it takes no notice of what is in /sys/class/net/${master}/dsa/tagging, nor the extra meta-data added to the pcap file. There is no need. you can identify it is a Marvell EDSA header from the EtherType. In fact, this tcpdump code for decoding EDSA pre-dated Florians patches by a few years. You only need the code which Florian added when you cannot identify the header directly from the packet. And that is true for most of the tagging protocols. But EDSA you can. > Anyway, since you obviously know more on this topic than I do, > please make me understand what are the real problems in spoofing the > Ethertype as a Marvell one. Despite there being an EDSA EtherType in the frame, what follows is not an ESDA header. It is like having the IPv4 EtherType but what following is not an IP header. Broken. Andrew
On 14/04/2019 20:07, Andrew Lunn wrote: >> It fails to decode the frames, obviously. But so does any other EtherType. > >> Florian was hinting >> (https://lwn.net/ml/netdev/b52f4cdf-edcf-0757-1d6e-d4a831ec7943@gmail.com/) >> at the recent pull requests submitted to tcpdump and libpcap that make >> it possible to decode based on the string in >> /sys/class/net/${master}/dsa/tagging. I admit I haven't actually >> tested or studied those closely yet (there are more important things >> to focus on ATM), but since my driver returns "8021q" in sysfs and >> yours returns "edsa", I would presume tcpdump could use that >> information. > > No it does not. It is a valid EtherType, that is what is used to > trigger the decoding, it takes no notice of what is in > /sys/class/net/${master}/dsa/tagging, nor the extra meta-data added to > the pcap file. There is no need. you can identify it is a Marvell EDSA > header from the EtherType. > > In fact, this tcpdump code for decoding EDSA pre-dated Florians > patches by a few years. > > You only need the code which Florian added when you cannot identify > the header directly from the packet. And that is true for most of the > tagging protocols. But EDSA you can. Correct. > >> Anyway, since you obviously know more on this topic than I do, >> please make me understand what are the real problems in spoofing the >> Ethertype as a Marvell one. > > Despite there being an EDSA EtherType in the frame, what follows is > not an ESDA header. It is like having the IPv4 EtherType but what > following is not an IP header. Broken. I suppose this is a valid point, but in that case any EtherType would do and technically using ETH_P_EDSA is just an one of the many possible choices for configuring the Marvell EDSA EtherType, you just need to pick one that is not going to trick the switching into thinking this is invalid LLC/SNAP. With Vivien's latest tcpdump changes, I don't think we need to rely on ETH_P_EDSA to be present anyway since the kernel tells us (when available).
On 13/04/2019 14:27, Vladimir Oltean wrote: > > Ok, let's put this another way. > A switch is primarily a device used to offload the forwarding of > traffic based on L2 rules. Additionally there may be some management > traffic for stuff like STP that needs to be terminated on the host > port of the switch. For that, the hardware's job is to filter and tag > management frames on their way to the host port, and the software's > job is to process the source port and switch id information in a > meaningful way. > Now both this particular switch hardware, and DSA, are taking the > above definitions to extremes. > The switch says: "that's all you want to see? ok, so that's all I'm > going to give you". So its native (hardware) tagging protocol is to > trap link-local traffic and overwrite two bytes of its destination MAC > with the switch ID and the source port. No more, no less. It is an > incomplete solution, but it does the job for practical use cases. Indeed. > Now DSA says: "I want these to be fully capable net devices, I want > the user to not even realize what's going on under the hood". I don't > think that terminating iperf traffic through switch ports is a > realistic usage scenario. So in a way discussions about performance > and optimizations on DSA hotpath are slightly pointless IMO. Actually it is on Broadcom devices that I directly or indirectly helped to support with bcm_sf2/b53 we have 2Gb/sec capable management ports and we run iperf directly on the host CPUs. Some ports remain standalone (e.g.: WAN) and the others can be bridged together (LAN + WLAN). > Now what my driver says is that it offers a bit of both. It speaks the > hardware's tagging protocol so it is capable of management traffic, > but it also speaks the DSA paradigm, so in a way pushes the hardware > to work in a mode it was never intended to, by repurposing VLANs when > the user doesn't request them. So on one hand there is some overlap > between the hardware tagging protocol and the VLAN one (in standalone > mode and in VLAN-unaware bridged mode, management traffic *could* use > VLAN tagging but it doesn't rely on it), and on the other hand the > reunion of the two tagging protocols is decent, but still doesn't > cover the entire spectrum (when put under a VLAN-aware bridge, you > lose the ability to decode general traffic). So you'd better not rely > on VLANs to decode the management traffic, because you won't be able > to always rely on that, and that is a shame since a bridge with both > vlan_filtering 1 and stp_state 1 is a real usage scenario, and the > hardware is capable of that combination. > But all of that is secondary. Let's forget about VLAN tagging for a > second and concentrate on the tagging of management traffic. The > limiting factor here is the software architecture of DSA, because in > order for me to decode that in the driver/tagger, I'd have to drop > everything else coming on the master net device (I explained in 13/24 > why). I believe that DSA being all-or-nothing about switch tagging is > turning a blind eye to the devices that don't go overboard with > features, and give you what's needed in a real-world design but not > much else. I would word it differently and say that up until now, whatever DSA assumed about switches was something that was supportable and with the sja1105 we are faced with an interesting of limits on both designs. I don't think DSA is unreasonable in assuming that management frame is always tagged with a proprietary switch protocol; because that's what has happened across a wide variety of vendors. The NXP SJA1105 is not unreasonable but it does present some challenges. > What would you improve about this design (assuming you're talking > about the filter function)? Would assigning different MAC addresses to each standalone port help in any way such that you could leverage filtering in HW based on MAC DA?
diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h index 80b20bdd8f9c..b7e745c0bb3a 100644 --- a/drivers/net/dsa/sja1105/sja1105.h +++ b/drivers/net/dsa/sja1105/sja1105.h @@ -5,6 +5,7 @@ #ifndef _SJA1105_H #define _SJA1105_H +#include <linux/dsa/sja1105.h> #include <net/dsa.h> #include "sja1105_static_config.h" @@ -19,6 +20,12 @@ #define SJA1105_NUM_TC 8 #define SJA1105ET_FDB_BIN_SIZE 4 +struct sja1105_port { + struct dsa_port *dp; + struct work_struct xmit_work; + struct sja1105_skb_ring xmit_ring; +}; + /* Keeps the different addresses between E/T and P/Q/R/S */ struct sja1105_regs { u64 device_id; @@ -63,6 +70,7 @@ struct sja1105_private { struct gpio_desc *reset_gpio; struct spi_device *spidev; struct dsa_switch *ds; + struct sja1105_port ports[SJA1105_NUM_PORTS]; }; #include "sja1105_dynamic_config.h" diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c index 9d28436fe466..018044f358fd 100644 --- a/drivers/net/dsa/sja1105/sja1105_main.c +++ b/drivers/net/dsa/sja1105/sja1105_main.c @@ -1112,10 +1112,27 @@ static int sja1105_vlan_apply(struct sja1105_private *priv, int port, u16 vid, return 0; } +static int sja1105_setup_8021q_tagging(struct dsa_switch *ds, bool enabled) +{ + int rc, i; + + for (i = 0; i < SJA1105_NUM_PORTS; i++) { + rc = dsa_port_setup_8021q_tagging(ds, i, enabled); + if (rc < 0) { + dev_err(ds->dev, "Failed to setup VLAN tagging for port %d: %d\n", + i, rc); + return rc; + } + } + dev_info(ds->dev, "%s switch tagging\n", + enabled ? "Enabled" : "Disabled"); + return 0; +} + static enum dsa_tag_protocol sja1105_get_tag_protocol(struct dsa_switch *ds, int port) { - return DSA_TAG_PROTO_NONE; + return DSA_TAG_PROTO_SJA1105; } /* This callback needs to be present */ @@ -1141,7 +1158,11 @@ static int sja1105_vlan_filtering(struct dsa_switch *ds, int port, bool enabled) if (rc) dev_err(ds->dev, "Failed to change VLAN Ethertype\n"); - return rc; + /* Switch port identification based on 802.1Q is only passable + * if we are not under a vlan_filtering bridge. So make sure + * the two configurations are mutually exclusive. + */ + return sja1105_setup_8021q_tagging(ds, !enabled); } static void sja1105_vlan_add(struct dsa_switch *ds, int port, @@ -1233,9 +1254,107 @@ static int sja1105_setup(struct dsa_switch *ds) */ ds->vlan_filtering_is_global = true; + /* The DSA/switchdev model brings up switch ports in standalone mode by + * default, and that means vlan_filtering is 0 since they're not under + * a bridge, so it's safe to set up switch tagging at this time. + */ + return sja1105_setup_8021q_tagging(ds, true); +} + +#include "../../../net/dsa/dsa_priv.h" +/* Deferred work is unfortunately necessary because setting up the management + * route cannot be done from atomit context (SPI transfer takes a sleepable + * lock on the bus) + */ +static void sja1105_xmit_work_handler(struct work_struct *work) +{ + struct sja1105_port *sp = container_of(work, struct sja1105_port, + xmit_work); + struct sja1105_private *priv = sp->dp->ds->priv; + struct net_device *slave = sp->dp->slave; + struct net_device *master = dsa_slave_to_master(slave); + int port = (uintptr_t)(sp - priv->ports); + struct sk_buff *skb; + int i, rc; + + while ((i = sja1105_skb_ring_get(&sp->xmit_ring, &skb)) >= 0) { + struct sja1105_mgmt_entry mgmt_route = { 0 }; + struct ethhdr *hdr; + int timeout = 10; + int skb_len; + + skb_len = skb->len; + hdr = eth_hdr(skb); + + mgmt_route.macaddr = ether_addr_to_u64(hdr->h_dest); + mgmt_route.destports = BIT(port); + mgmt_route.enfport = 1; + mgmt_route.tsreg = 0; + mgmt_route.takets = false; + + rc = sja1105_dynamic_config_write(priv, BLK_IDX_MGMT_ROUTE, + port, &mgmt_route, true); + if (rc < 0) { + kfree_skb(skb); + slave->stats.tx_dropped++; + continue; + } + + /* Transfer skb to the host port. */ + skb->dev = master; + dev_queue_xmit(skb); + + /* Wait until the switch has processed the frame */ + do { + rc = sja1105_dynamic_config_read(priv, BLK_IDX_MGMT_ROUTE, + port, &mgmt_route); + if (rc < 0) { + slave->stats.tx_errors++; + dev_err(priv->ds->dev, + "xmit: failed to poll for mgmt route\n"); + continue; + } + + /* UM10944: The ENFPORT flag of the respective entry is + * cleared when a match is found. The host can use this + * flag as an acknowledgment. + */ + cpu_relax(); + } while (mgmt_route.enfport && --timeout); + + if (!timeout) { + dev_err(priv->ds->dev, "xmit timed out\n"); + slave->stats.tx_errors++; + continue; + } + + slave->stats.tx_packets++; + slave->stats.tx_bytes += skb_len; + } +} + +static int sja1105_port_enable(struct dsa_switch *ds, int port, + struct phy_device *phydev) +{ + struct sja1105_private *priv = ds->priv; + struct sja1105_port *sp = &priv->ports[port]; + + sp->dp = &ds->ports[port]; + INIT_WORK(&sp->xmit_work, sja1105_xmit_work_handler); return 0; } +static void sja1105_port_disable(struct dsa_switch *ds, int port) +{ + struct sja1105_private *priv = ds->priv; + struct sja1105_port *sp = &priv->ports[port]; + struct sk_buff *skb; + + cancel_work_sync(&sp->xmit_work); + while (sja1105_skb_ring_get(&sp->xmit_ring, &skb) >= 0) + kfree_skb(skb); +} + static const struct dsa_switch_ops sja1105_switch_ops = { .get_tag_protocol = sja1105_get_tag_protocol, .setup = sja1105_setup, @@ -1255,6 +1374,8 @@ static const struct dsa_switch_ops sja1105_switch_ops = { .port_mdb_prepare = sja1105_mdb_prepare, .port_mdb_add = sja1105_mdb_add, .port_mdb_del = sja1105_mdb_del, + .port_enable = sja1105_port_enable, + .port_disable = sja1105_port_disable, }; static int sja1105_check_device_id(struct sja1105_private *priv) diff --git a/include/linux/dsa/sja1105.h b/include/linux/dsa/sja1105.h new file mode 100644 index 000000000000..dd40ffa475cc --- /dev/null +++ b/include/linux/dsa/sja1105.h @@ -0,0 +1,52 @@ +/* SPDX-License-Identifier: GPL-2.0 + * Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com> + */ + +/* Included by drivers/net/dsa/sja1105/sja1105.h and net/dsa/tag_sja1105.c */ + +#ifndef _NET_DSA_SJA1105_H +#define _NET_DSA_SJA1105_H + +#include <linux/skbuff.h> +#include <net/dsa.h> + +#define SJA1105_SKB_RING_SIZE 20 + +struct sja1105_skb_ring { + struct sk_buff *skb[SJA1105_SKB_RING_SIZE]; + int count; + int pi; /* Producer index */ + int ci; /* Consumer index */ +}; + +static inline int sja1105_skb_ring_add(struct sja1105_skb_ring *ring, + struct sk_buff *skb) +{ + int index; + + if (ring->count == SJA1105_SKB_RING_SIZE) + return -1; + + index = ring->pi; + ring->skb[index] = skb; + ring->pi = (index + 1) % SJA1105_SKB_RING_SIZE; + ring->count++; + return index; +} + +static inline int sja1105_skb_ring_get(struct sja1105_skb_ring *ring, + struct sk_buff **skb) +{ + int index; + + if (ring->count == 0) + return -1; + + index = ring->ci; + *skb = ring->skb[index]; + ring->ci = (index + 1) % SJA1105_SKB_RING_SIZE; + ring->count--; + return index; +} + +#endif /* _NET_DSA_SJA1105_H */ diff --git a/include/net/dsa.h b/include/net/dsa.h index e46c107507d8..9c7e29c4f22a 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -41,6 +41,7 @@ enum dsa_tag_protocol { DSA_TAG_PROTO_KSZ9893, DSA_TAG_PROTO_LAN9303, DSA_TAG_PROTO_MTK, + DSA_TAG_PROTO_SJA1105, DSA_TAG_PROTO_QCA, DSA_TAG_PROTO_TRAILER, DSA_TAG_LAST, /* MUST BE LAST */ diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig index b2fc07de8bcb..18deab52890f 100644 --- a/net/dsa/Kconfig +++ b/net/dsa/Kconfig @@ -65,6 +65,9 @@ config NET_DSA_TAG_LAN9303 config NET_DSA_TAG_MTK bool +config NET_DSA_TAG_SJA1105 + bool + config NET_DSA_TAG_TRAILER bool diff --git a/net/dsa/Makefile b/net/dsa/Makefile index d7fc3253d497..8c294cdb895a 100644 --- a/net/dsa/Makefile +++ b/net/dsa/Makefile @@ -15,4 +15,5 @@ dsa_core-$(CONFIG_NET_DSA_TAG_KSZ) += tag_ksz.o dsa_core-$(CONFIG_NET_DSA_TAG_LAN9303) += tag_lan9303.o dsa_core-$(CONFIG_NET_DSA_TAG_MTK) += tag_mtk.o dsa_core-$(CONFIG_NET_DSA_TAG_QCA) += tag_qca.o +dsa_core-$(CONFIG_NET_DSA_TAG_SJA1105) += tag_sja1105.o dsa_core-$(CONFIG_NET_DSA_TAG_TRAILER) += tag_trailer.o diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index 36de4f2a3366..7e2542d756e0 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -65,6 +65,9 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = { #ifdef CONFIG_NET_DSA_TAG_MTK [DSA_TAG_PROTO_MTK] = &mtk_netdev_ops, #endif +#ifdef CONFIG_NET_DSA_TAG_SJA1105 + [DSA_TAG_PROTO_SJA1105] = &sja1105_netdev_ops, +#endif #ifdef CONFIG_NET_DSA_TAG_QCA [DSA_TAG_PROTO_QCA] = &qca_netdev_ops, #endif @@ -102,6 +105,9 @@ const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops) #ifdef CONFIG_NET_DSA_TAG_MTK [DSA_TAG_PROTO_MTK] = "mtk", #endif +#ifdef CONFIG_NET_DSA_TAG_SJA1105 + [DSA_TAG_PROTO_SJA1105] = "8021q", +#endif #ifdef CONFIG_NET_DSA_TAG_QCA [DSA_TAG_PROTO_QCA] = "qca", #endif diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h index cc5ec3759952..d58a749c8b5b 100644 --- a/net/dsa/dsa_priv.h +++ b/net/dsa/dsa_priv.h @@ -236,6 +236,9 @@ extern const struct dsa_device_ops lan9303_netdev_ops; /* tag_mtk.c */ extern const struct dsa_device_ops mtk_netdev_ops; +/* tag_sja1105.c */ +extern const struct dsa_device_ops sja1105_netdev_ops; + /* tag_qca.c */ extern const struct dsa_device_ops qca_netdev_ops; diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c new file mode 100644 index 000000000000..5c76a06c9093 --- /dev/null +++ b/net/dsa/tag_sja1105.c @@ -0,0 +1,148 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com> + */ +#include <linux/etherdevice.h> +#include <linux/if_vlan.h> +#include <linux/dsa/sja1105.h> +#include "../../drivers/net/dsa/sja1105/sja1105.h" + +#include "dsa_priv.h" + +/* Similar to is_link_local_ether_addr(hdr->h_dest) but also covers PTP */ +static inline bool sja1105_is_link_local(const struct sk_buff *skb) +{ + const struct ethhdr *hdr = eth_hdr(skb); + u64 dmac = ether_addr_to_u64(hdr->h_dest); + + if ((dmac & SJA1105_LINKLOCAL_FILTER_A_MASK) == + SJA1105_LINKLOCAL_FILTER_A) + return true; + if ((dmac & SJA1105_LINKLOCAL_FILTER_B_MASK) == + SJA1105_LINKLOCAL_FILTER_B) + return true; + return false; +} + +static bool sja1105_filter(const struct sk_buff *skb, struct net_device *dev) +{ + if (sja1105_is_link_local(skb)) + return true; + if (!dev->dsa_ptr->vlan_filtering) + return true; + return false; +} + +static struct sk_buff *sja1105_xmit(struct sk_buff *skb, + struct net_device *netdev) +{ + struct dsa_port *dp = dsa_slave_to_port(netdev); + struct dsa_switch *ds = dp->ds; + struct sja1105_private *priv = ds->priv; + struct sja1105_port *sp = &priv->ports[dp->index]; + struct sk_buff *clone; + + if (likely(!sja1105_is_link_local(skb))) { + /* Normal traffic path. */ + u16 tx_vid = dsa_tagging_tx_vid(ds, dp->index); + u8 pcp = skb->priority; + + /* If we are under a vlan_filtering bridge, IP termination on + * switch ports based on 802.1Q tags is simply too brittle to + * be passable. So just defer to the dsa_slave_notag_xmit + * implementation. + */ + if (dp->vlan_filtering) + return skb; + + return dsa_8021q_xmit(skb, netdev, ETH_P_EDSA, + ((pcp << VLAN_PRIO_SHIFT) | tx_vid)); + } + + /* Code path for transmitting management traffic. This does not rely + * upon switch tagging, but instead SPI-installed management routes. + */ + clone = skb_clone(skb, GFP_ATOMIC); + if (!clone) { + dev_err(ds->dev, "xmit: failed to clone skb\n"); + return NULL; + } + + if (sja1105_skb_ring_add(&sp->xmit_ring, clone) < 0) { + dev_err(ds->dev, "xmit: skb ring full\n"); + kfree_skb(clone); + return NULL; + } + + if (sp->xmit_ring.count == SJA1105_SKB_RING_SIZE) + /* TODO setup a dedicated netdev queue for management traffic + * so that we can selectively apply backpressure and not be + * required to stop the entire traffic when the software skb + * ring is full. This requires hooking the ndo_select_queue + * from DSA and matching on mac_fltres. + */ + dev_err(ds->dev, "xmit: reached maximum skb ring size\n"); + + schedule_work(&sp->xmit_work); + /* Let DSA free its reference to the skb and we will free + * the clone in the deferred worker + */ + return NULL; +} + +static struct sk_buff *sja1105_rcv(struct sk_buff *skb, + struct net_device *netdev, + struct packet_type *pt) +{ + unsigned int source_port, switch_id; + struct ethhdr *hdr = eth_hdr(skb); + struct sk_buff *nskb; + u16 tpid, vid, tci; + bool is_tagged; + + nskb = dsa_8021q_rcv(skb, netdev, pt, &tpid, &tci); + is_tagged = (nskb && tpid == ETH_P_EDSA); + + skb->priority = (tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT; + vid = tci & VLAN_VID_MASK; + + skb->offload_fwd_mark = 1; + + if (likely(!sja1105_is_link_local(skb))) { + /* Normal traffic path. */ + source_port = dsa_tagging_rx_source_port(vid); + switch_id = dsa_tagging_rx_switch_id(vid); + } else { + /* Management traffic path. Switch embeds the switch ID and + * port ID into bytes of the destination MAC, courtesy of + * the incl_srcpt options. + */ + source_port = hdr->h_dest[3]; + switch_id = hdr->h_dest[4]; + /* Clear the DMAC bytes that were mangled by the switch */ + hdr->h_dest[3] = 0; + hdr->h_dest[4] = 0; + } + + skb->dev = dsa_master_find_slave(netdev, switch_id, source_port); + if (!skb->dev) { + netdev_warn(netdev, "Couldn't decode source port\n"); + return NULL; + } + + /* Delete/overwrite fake VLAN header, DSA expects to not find + * it there, see dsa_switch_rcv: skb_push(skb, ETH_HLEN). + */ + if (is_tagged) + memmove(skb->data - ETH_HLEN, skb->data - ETH_HLEN - VLAN_HLEN, + ETH_HLEN - VLAN_HLEN); + + return skb; +} + +const struct dsa_device_ops sja1105_netdev_ops = { + .xmit = sja1105_xmit, + .rcv = sja1105_rcv, + .filter = sja1105_filter, + .overhead = VLAN_HLEN, +}; +