[net-next,v3,0/6] net: lan966x: Add switchdev and vlan support

Message ID	20211209094615.329379-1-horatiu.vultur@microchip.com
Headers	show Return-Path: <devicetree-owner@vger.kernel.org> IronPort-SDR: GA//ScbG5It7MG7JrqmiMeV5lQzi0odZsDDEAUlLpdWRrRckBqTeqB1lu+6Egcm9IUlk3Bumka N4lGR65DeqTfEyWEHrP4dGA1V8844mVVcs1vkqQU2fXonVAhzuHUwhZmqM3rgCnvLfFRcCsERv Dlm1dcqlxDUy2mpRmQtw7+GyfU6k30p8L/Kga/5SDkZmYnVWig/CtxxzPtP/LCHLg/+PNEhtZc NzzlpomkCxFVprwVRNVN7j2AoZOb3HRd5wIPPicijiEAtXv5PYyn8+KN+JGmtP9AsE3n4Y7DSK xLTSwcRXs7oEmsUi3oihG0Yj From: Horatiu Vultur <horatiu.vultur@microchip.com> To: <netdev@vger.kernel.org>, <devicetree@vger.kernel.org>, <linux-kernel@vger.kernel.org> CC: <davem@davemloft.net>, <kuba@kernel.org>, <robh+dt@kernel.org>, <UNGLinuxDriver@microchip.com>, <linux@armlinux.org.uk>, <f.fainelli@gmail.com>, <vivien.didelot@gmail.com>, <vladimir.oltean@nxp.com>, <andrew@lunn.ch>, Horatiu Vultur <horatiu.vultur@microchip.com> Subject: [PATCH net-next v3 0/6] net: lan966x: Add switchdev and vlan support Date: Thu, 9 Dec 2021 10:46:09 +0100 Message-ID: <20211209094615.329379-1-horatiu.vultur@microchip.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain Precedence: bulk
Series	net: lan966x: Add switchdev and vlan support \| expand [net-next,v3,0/6] net: lan966x: Add switchdev and vlan support [net-next,v3,2/6] dt-bindings: net: lan966x: Extend with the analyzer interrupt

Horatiu Vultur Dec. 9, 2021, 9:46 a.m. UTC

This patch series extends lan966x with switchdev and vlan support.
The first patches just adds new registers and extend the MAC table to
handle the interrupts when a new address is learn/forget.
The last 2 patches adds the vlan and the switchdev support.

v2->v3:
- separate the PVID used when the port is in host mode or vlan unaware
- fix issue when the port was leaving the bridge

v1->v2:
- when allocating entries for the mac table use kzalloc instead of
  devm_kzalloc
- also use GFP_KERNEL instead of GFP_ATOMIC, because is never called
  in atomic context
- when deleting an mac table entry, the order of operations was wrong
- if ana irq is enabled make sure it gets disabled when the driver is
  removed

Horatiu Vultur (6):
  net: lan966x: Add registers that are used for switch and vlan
    functionality
  dt-bindings: net: lan966x: Extend with the analyzer interrupt
  net: lan966x: add support for interrupts from analyzer
  net: lan966x: More MAC table functionality
  net: lan966x: Add vlan support
  net: lan966x: Add switchdev support

 .../net/microchip,lan966x-switch.yaml         |   2 +
 .../net/ethernet/microchip/lan966x/Makefile   |   3 +-
 .../ethernet/microchip/lan966x/lan966x_mac.c  | 352 +++++++++++
 .../ethernet/microchip/lan966x/lan966x_main.c |  99 +++-
 .../ethernet/microchip/lan966x/lan966x_main.h |  73 ++-
 .../ethernet/microchip/lan966x/lan966x_regs.h | 129 +++++
 .../microchip/lan966x/lan966x_switchdev.c     | 548 ++++++++++++++++++
 .../ethernet/microchip/lan966x/lan966x_vlan.c | 446 ++++++++++++++
 8 files changed, 1637 insertions(+), 15 deletions(-)
 create mode 100644 drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
 create mode 100644 drivers/net/ethernet/microchip/lan966x/lan966x_vlan.c

Vladimir Oltean Dec. 9, 2021, 11:47 a.m. UTC | #1

On Thu, Dec 09, 2021 at 10:46:12AM +0100, Horatiu Vultur wrote:
> This patch adds support for handling the interrupts generated by the
> analyzer. Currently, only the MAC table generates these interrupts.
> The MAC table will generate an interrupt whenever it learns or forgets
> an entry in the table. It is the SW responsibility figure out which
> entries were added/removed.
> 
> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> ---
>  .../ethernet/microchip/lan966x/lan966x_mac.c  | 244 ++++++++++++++++++
>  .../ethernet/microchip/lan966x/lan966x_main.c |  23 ++
>  .../ethernet/microchip/lan966x/lan966x_main.h |   6 +
>  3 files changed, 273 insertions(+)
> 
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_mac.c b/drivers/net/ethernet/microchip/lan966x/lan966x_mac.c
> index f6878b9f57ef..c01ab01bffbf 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_mac.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_mac.c
> @@ -1,5 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0+
>  
> +#include <net/switchdev.h>
>  #include "lan966x_main.h"
>  
>  #define LAN966X_MAC_COLUMNS		4
> @@ -13,6 +14,23 @@
>  #define MACACCESS_CMD_WRITE		7
>  #define MACACCESS_CMD_SYNC_GET_NEXT	8
>  
> +#define LAN966X_MAC_INVALID_ROW		-1
> +
> +struct lan966x_mac_entry {
> +	struct list_head list;
> +	unsigned char mac[ETH_ALEN] __aligned(2);
> +	u16 vid;
> +	u16 port_index;
> +	int row;
> +};
> +
> +struct lan966x_mac_raw_entry {
> +	u32 mach;
> +	u32 macl;
> +	u32 maca;
> +	bool process;
> +};
> +
>  static int lan966x_mac_get_status(struct lan966x *lan966x)
>  {
>  	return lan_rd(lan966x, ANA_MACACCESS);
> @@ -98,4 +116,230 @@ void lan966x_mac_init(struct lan966x *lan966x)
>  	/* Clear the MAC table */
>  	lan_wr(MACACCESS_CMD_INIT, lan966x, ANA_MACACCESS);
>  	lan966x_mac_wait_for_completion(lan966x);
> +
> +	spin_lock_init(&lan966x->mac_lock);
> +	INIT_LIST_HEAD(&lan966x->mac_entries);
> +}
> +
> +static struct lan966x_mac_entry *lan966x_mac_alloc_entry(const unsigned char *mac,
> +							 u16 vid, u16 port_index)
> +{
> +	struct lan966x_mac_entry *mac_entry;
> +
> +	mac_entry = kzalloc(sizeof(*mac_entry), GFP_KERNEL);
> +	if (!mac_entry)
> +		return NULL;
> +
> +	memcpy(mac_entry->mac, mac, ETH_ALEN);
> +	mac_entry->vid = vid;
> +	mac_entry->port_index = port_index;
> +	mac_entry->row = LAN966X_MAC_INVALID_ROW;
> +	return mac_entry;
> +}
> +
> +static void lan966x_fdb_call_notifiers(enum switchdev_notifier_type type,
> +				       const char *mac, u16 vid,
> +				       struct net_device *dev)
> +{
> +	struct switchdev_notifier_fdb_info info = { 0 };
> +
> +	info.addr = mac;
> +	info.vid = vid;
> +	info.offloaded = true;
> +	call_switchdev_notifiers(type, dev, &info.info, NULL);
> +}
> +
> +void lan966x_mac_purge_entries(struct lan966x *lan966x)
> +{
> +	struct lan966x_mac_entry *mac_entry, *tmp;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&lan966x->mac_lock, flags);

I hope I'm not wrong, but you are only using this spinlock to serialize
access to the list, which isn't accessed from hardirq context anywhere
(the irq is threaded). So spin_lock_irqsave could simply be spin_lock.
Unless...

> +	list_for_each_entry_safe(mac_entry, tmp, &lan966x->mac_entries,
> +				 list) {
> +		lan966x_mac_forget(lan966x, mac_entry->mac, mac_entry->vid,
> +				   ENTRYTYPE_LOCKED);

Does this generate a MAC table interrupt?

> +
> +		list_del(&mac_entry->list);
> +		kfree(mac_entry);
> +	}
> +	spin_unlock_irqrestore(&lan966x->mac_lock, flags);
> +}
> +
> +static void lan966x_mac_notifiers(struct lan966x *lan966x,
> +				  enum switchdev_notifier_type type,
> +				  unsigned char *mac, u32 vid,
> +				  struct net_device *dev)
> +{
> +	rtnl_lock();
> +	lan966x_fdb_call_notifiers(type, mac, vid, dev);
> +	rtnl_unlock();
> +}
> +
> +static void lan966x_mac_process_raw_entry(struct lan966x_mac_raw_entry *raw_entry,
> +					  u8 *mac, u16 *vid, u32 *dest_idx)
> +{
> +	mac[0] = (raw_entry->mach >> 8)  & 0xff;
> +	mac[1] = (raw_entry->mach >> 0)  & 0xff;
> +	mac[2] = (raw_entry->macl >> 24) & 0xff;
> +	mac[3] = (raw_entry->macl >> 16) & 0xff;
> +	mac[4] = (raw_entry->macl >> 8)  & 0xff;
> +	mac[5] = (raw_entry->macl >> 0)  & 0xff;
> +
> +	*vid = (raw_entry->mach >> 16) & 0xfff;
> +	*dest_idx = ANA_MACACCESS_DEST_IDX_GET(raw_entry->maca);
> +}
> +
> +static void lan966x_mac_irq_process(struct lan966x *lan966x, u32 row,
> +				    struct lan966x_mac_raw_entry *raw_entries)
> +{
> +	struct lan966x_mac_entry *mac_entry, *tmp;
> +	char mac[ETH_ALEN] __aligned(2);

unsigned char

> +	unsigned long flags;
> +	u32 dest_idx;
> +	u32 column;
> +	u16 vid;
> +
> +	spin_lock_irqsave(&lan966x->mac_lock, flags);
> +	list_for_each_entry_safe(mac_entry, tmp, &lan966x->mac_entries, list) {
> +		bool found = false;
> +
> +		if (mac_entry->row != row)
> +			continue;

When the MAC table gets large, you could consider keeping separate lists
per row. This way you can avoid traversing a list of elements you're
sure you don't care about.

> +
> +		for (column = 0; column < LAN966X_MAC_COLUMNS; ++column) {
> +			/* All the valid entries are at the start of the row,
> +			 * so when get one invalid entry it can just skip the
> +			 * rest of the columns
> +			 */
> +			if (!ANA_MACACCESS_VALID_GET(raw_entries[column].maca))
> +				break;
> +
> +			lan966x_mac_process_raw_entry(&raw_entries[column],
> +						      mac, &vid, &dest_idx);
> +			WARN_ON(dest_idx > lan966x->num_phys_ports);
> +
> +			/* If the entry in SW is found, then there is nothing
> +			 * to do
> +			 */
> +			if (mac_entry->vid == vid &&
> +			    ether_addr_equal(mac_entry->mac, mac) &&
> +			    mac_entry->port_index == dest_idx) {
> +				raw_entries[column].process = true;
> +				found = true;
> +				break;
> +			}
> +		}
> +
> +		if (!found) {
> +			/* Notify the bridge that the entry doesn't exist
> +			 * anymore in the HW and remmove the entry from the SW

s/remmove/remove/

> +			 * list
> +			 */
> +			lan966x_mac_notifiers(lan966x, SWITCHDEV_FDB_DEL_TO_BRIDGE,
> +					      mac_entry->mac, mac_entry->vid,
> +					      lan966x->ports[mac_entry->port_index]->dev);
> +
> +			list_del(&mac_entry->list);
> +			kfree(mac_entry);
> +		}
> +	}
> +	spin_unlock_irqrestore(&lan966x->mac_lock, flags);
> +
> +	/* Now go to the list of columns and see if any entry was not in the SW
> +	 * list, then that means that the entry is new so it needs to notify the
> +	 * bridge.
> +	 */
> +	for (column = 0; column < LAN966X_MAC_COLUMNS; ++column) {
> +		/* All the valid entries are at the start of the row, so when
> +		 * get one invalid entry it can just skip the rest of the columns
> +		 */
> +		if (!ANA_MACACCESS_VALID_GET(raw_entries[column].maca))
> +			break;
> +
> +		/* If the entry already exists then don't do anything */
> +		if (raw_entries[column].process)

s/process/processed/

> +			continue;
> +
> +		lan966x_mac_process_raw_entry(&raw_entries[column],
> +					      mac, &vid, &dest_idx);
> +		WARN_ON(dest_idx > lan966x->num_phys_ports);
> +
> +		mac_entry = lan966x_mac_alloc_entry(mac, vid, dest_idx);
> +		if (!mac_entry)
> +			return;
> +
> +		mac_entry->row = row;
> +
> +		spin_lock_irqsave(&lan966x->mac_lock, flags);
> +		list_add_tail(&mac_entry->list, &lan966x->mac_entries);
> +		spin_unlock_irqrestore(&lan966x->mac_lock, flags);

spin_lock_irqsave shouldn't be necessary from an irq handler.

> +
> +		lan966x_mac_notifiers(lan966x, SWITCHDEV_FDB_ADD_TO_BRIDGE,
> +				      mac, vid, lan966x->ports[dest_idx]->dev);
> +	}
> +}
> +
> +irqreturn_t lan966x_mac_irq_handler(struct lan966x *lan966x)
> +{
> +	struct lan966x_mac_raw_entry entry[LAN966X_MAC_COLUMNS] = { 0 };
> +	u32 index, column;
> +	bool stop = true;
> +	u32 val;
> +
> +	/* Check if the mac table triggered this, if not just bail out */
> +	if (!(ANA_ANAINTR_INTR_GET(lan_rd(lan966x, ANA_ANAINTR))))
> +		return IRQ_NONE;

The interrupt isn't shared, so if we enter this condition, it means the
analyzer block generated it, just not the MAC table portion of it.
If we return IRQ_NONE there will be an IRQ storm because that condition
will never go away. Could we ack the interrupt and return IRQ_HANDLED?

> +
> +	/* Start the scan from 0, 0 */
> +	lan_wr(ANA_MACTINDX_M_INDEX_SET(0) |
> +	       ANA_MACTINDX_BUCKET_SET(0),
> +	       lan966x, ANA_MACTINDX);
> +
> +	while (1) {
> +		lan_rmw(ANA_MACACCESS_MAC_TABLE_CMD_SET(MACACCESS_CMD_SYNC_GET_NEXT),
> +			ANA_MACACCESS_MAC_TABLE_CMD,
> +			lan966x, ANA_MACACCESS);
> +		lan966x_mac_wait_for_completion(lan966x);
> +
> +		val = lan_rd(lan966x, ANA_MACTINDX);
> +		index = ANA_MACTINDX_M_INDEX_GET(val);
> +		column = ANA_MACTINDX_BUCKET_GET(val);
> +
> +		/* The SYNC-GET-NEXT returns all the entries(4) in a row in
> +		 * which is suffered a change. By change it means that new entry
> +		 * was added or an entry was removed because of ageing.
> +		 * It would return all the columns for that row. And after that
> +		 * it would return the next row The stop conditions of the
> +		 * SYNC-GET-NEXT is when it reaches 'directly' to row 0
> +		 * column 3. So if SYNC-GET-NEXT returns row 0 and column 0
> +		 * then it is required to continue to read more even if it
> +		 * reaches row 0 and column 3.
> +		 */
> +		if (index == 0 && column == 0)
> +			stop = false;
> +
> +		if (column == LAN966X_MAC_COLUMNS - 1 &&
> +		    index == 0 && stop)
> +			break;
> +
> +		entry[column].mach = lan_rd(lan966x, ANA_MACHDATA);
> +		entry[column].macl = lan_rd(lan966x, ANA_MACLDATA);
> +		entry[column].maca = lan_rd(lan966x, ANA_MACACCESS);
> +
> +		/* Once all the columns are read process them */
> +		if (column == LAN966X_MAC_COLUMNS - 1) {
> +			lan966x_mac_irq_process(lan966x, index, entry);
> +			/* A row was processed so it is safe to assume that the
> +			 * next row/column can be the stop condition
> +			 */
> +			stop = true;
> +		}
> +	}
> +
> +	lan_rmw(ANA_ANAINTR_INTR_SET(0),
> +		ANA_ANAINTR_INTR,
> +		lan966x, ANA_ANAINTR);
> +
> +	return IRQ_HANDLED;
>  }
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> index 101c1f005baf..7c6d6293611a 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> @@ -527,6 +527,13 @@ static irqreturn_t lan966x_xtr_irq_handler(int irq, void *args)
>  	return IRQ_HANDLED;
>  }
>  
> +static irqreturn_t lan966x_ana_irq_handler(int irq, void *args)
> +{
> +	struct lan966x *lan966x = args;
> +
> +	return lan966x_mac_irq_handler(lan966x);
> +}
> +
>  static void lan966x_cleanup_ports(struct lan966x *lan966x)
>  {
>  	struct lan966x_port *port;
> @@ -554,6 +561,11 @@ static void lan966x_cleanup_ports(struct lan966x *lan966x)
>  
>  	disable_irq(lan966x->xtr_irq);
>  	lan966x->xtr_irq = -ENXIO;
> +
> +	if (lan966x->ana_irq) {
> +		disable_irq(lan966x->ana_irq);
> +		lan966x->ana_irq = -ENXIO;
> +	}
>  }
>  
>  static int lan966x_probe_port(struct lan966x *lan966x, u32 p,
> @@ -870,6 +882,15 @@ static int lan966x_probe(struct platform_device *pdev)
>  		return -ENODEV;
>  	}
>  
> +	lan966x->ana_irq = platform_get_irq_byname(pdev, "ana");
> +	if (lan966x->ana_irq) {
> +		err = devm_request_threaded_irq(&pdev->dev, lan966x->ana_irq, NULL,
> +						lan966x_ana_irq_handler, IRQF_ONESHOT,
> +						"ana irq", lan966x);
> +		if (err)
> +			return dev_err_probe(&pdev->dev, err, "Unable to use ana irq");
> +	}
> +
>  	/* init switch */
>  	lan966x_init(lan966x);
>  	lan966x_stats_init(lan966x);
> @@ -923,6 +944,8 @@ static int lan966x_remove(struct platform_device *pdev)
>  	destroy_workqueue(lan966x->stats_queue);
>  	mutex_destroy(&lan966x->stats_lock);
>  
> +	lan966x_mac_purge_entries(lan966x);
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> index 7e5a3b6f168d..ba548d65b58a 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> @@ -75,6 +75,9 @@ struct lan966x {
>  
>  	u8 base_mac[ETH_ALEN];
>  
> +	struct list_head mac_entries;
> +	spinlock_t mac_lock; /* lock for mac_entries list */
> +
>  	/* stats */
>  	const struct lan966x_stat_layout *stats_layout;
>  	u32 num_stats;
> @@ -87,6 +90,7 @@ struct lan966x {
>  
>  	/* interrupts */
>  	int xtr_irq;
> +	int ana_irq;
>  };
>  
>  struct lan966x_port_config {
> @@ -141,6 +145,8 @@ int lan966x_mac_forget(struct lan966x *lan966x,
>  int lan966x_mac_cpu_learn(struct lan966x *lan966x, const char *addr, u16 vid);
>  int lan966x_mac_cpu_forget(struct lan966x *lan966x, const char *addr, u16 vid);
>  void lan966x_mac_init(struct lan966x *lan966x);
> +void lan966x_mac_purge_entries(struct lan966x *lan966x);
> +irqreturn_t lan966x_mac_irq_handler(struct lan966x *lan966x);
>  
>  static inline void __iomem *lan_addr(void __iomem *base[],
>  				     int id, int tinst, int tcnt,
> -- 
> 2.33.0
>

Vladimir Oltean Dec. 9, 2021, 1:36 p.m. UTC | #2

On Thu, Dec 09, 2021 at 10:46:15AM +0100, Horatiu Vultur wrote:
> This adds support for switchdev in lan966x.
> It offloads to the HW basic forwarding and vlan filtering. To be able to
> offload this to the HW, it is required to disable promisc mode for ports
> that are part of the bridge.
> 
> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> ---
>  .../net/ethernet/microchip/lan966x/Makefile   |   3 +-
>  .../ethernet/microchip/lan966x/lan966x_main.c |  41 +-
>  .../ethernet/microchip/lan966x/lan966x_main.h |  18 +
>  .../microchip/lan966x/lan966x_switchdev.c     | 548 ++++++++++++++++++
>  .../ethernet/microchip/lan966x/lan966x_vlan.c |  12 +-
>  5 files changed, 610 insertions(+), 12 deletions(-)
>  create mode 100644 drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
> 
> diff --git a/drivers/net/ethernet/microchip/lan966x/Makefile b/drivers/net/ethernet/microchip/lan966x/Makefile
> index f7e6068a91cb..d82e896c2e53 100644
> --- a/drivers/net/ethernet/microchip/lan966x/Makefile
> +++ b/drivers/net/ethernet/microchip/lan966x/Makefile
> @@ -6,4 +6,5 @@
>  obj-$(CONFIG_LAN966X_SWITCH) += lan966x-switch.o
>  
>  lan966x-switch-objs  := lan966x_main.o lan966x_phylink.o lan966x_port.o \
> -			lan966x_mac.o lan966x_ethtool.o lan966x_vlan.o
> +			lan966x_mac.o lan966x_ethtool.o lan966x_switchdev.o \
> +			lan966x_vlan.o
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> index 1b4c7e6b4f85..aee36c1cfa17 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> @@ -306,7 +306,7 @@ static int lan966x_port_xmit(struct sk_buff *skb, struct net_device *dev)
>  	return lan966x_port_ifh_xmit(skb, ifh, dev);
>  }
>  
> -static void lan966x_set_promisc(struct lan966x_port *port, bool enable)
> +void lan966x_set_promisc(struct lan966x_port *port, bool enable)
>  {
>  	struct lan966x *lan966x = port->lan966x;
>  

My documentation of CPU_SRC_COPY_ENA says:

If set, all frames received on this port are
copied to the CPU extraction queue given by
CPUQ_CFG.CPUQ_SRC_COPY.

I think it was established a while ago that this isn't what promiscuous
mode is about? Instead it is about accepting packets on a port
regardless of whether the MAC DA is in their RX filter or not.

Hence the oddity of your change. I understand what it intends to do:
if this is a standalone port you support IFF_UNICAST_FLT, so you drop
frames with unknown MAC DA. But if IFF_PROMISC is set, then why do you
copy all frames to the CPU? Why don't you just put the CPU in the
unknown flooding mask? There's a difference between "force a packet to
get copied to the CPU" and "copy it to the CPU only if it may have
business there". Then this change would be compatible with bridge mode.
You want the bridge to receive unknown traffic too, it may need to
forward it in software.

> @@ -318,14 +318,18 @@ static void lan966x_set_promisc(struct lan966x_port *port, bool enable)
>  static void lan966x_port_change_rx_flags(struct net_device *dev, int flags)
>  {
>  	struct lan966x_port *port = netdev_priv(dev);
> +	bool enable;
>  
>  	if (!(flags & IFF_PROMISC))
>  		return;
>  
> -	if (dev->flags & IFF_PROMISC)
> -		lan966x_set_promisc(port, true);
> -	else
> -		lan966x_set_promisc(port, false);
> +	enable = dev->flags & IFF_PROMISC ? true : false;
> +	port->promisc = enable;
> +
> +	if (port->bridge)
> +		return;
> +
> +	lan966x_set_promisc(port, enable);
>  }
>  
>  static int lan966x_port_change_mtu(struct net_device *dev, int new_mtu)
> @@ -340,7 +344,7 @@ static int lan966x_port_change_mtu(struct net_device *dev, int new_mtu)
>  	return 0;
>  }
>  
> -static int lan966x_mc_unsync(struct net_device *dev, const unsigned char *addr)
> +int lan966x_mc_unsync(struct net_device *dev, const unsigned char *addr)
>  {
>  	struct lan966x_port *port = netdev_priv(dev);
>  	struct lan966x *lan966x = port->lan966x;
> @@ -348,7 +352,7 @@ static int lan966x_mc_unsync(struct net_device *dev, const unsigned char *addr)
>  	return lan966x_mac_forget(lan966x, addr, port->pvid, ENTRYTYPE_LOCKED);
>  }
>  
> -static int lan966x_mc_sync(struct net_device *dev, const unsigned char *addr)
> +int lan966x_mc_sync(struct net_device *dev, const unsigned char *addr)
>  {
>  	struct lan966x_port *port = netdev_priv(dev);
>  	struct lan966x *lan966x = port->lan966x;
> @@ -401,6 +405,11 @@ static const struct net_device_ops lan966x_port_netdev_ops = {
>  	.ndo_vlan_rx_kill_vid		= lan966x_vlan_rx_kill_vid,
>  };
>  
> +bool lan966x_netdevice_check(const struct net_device *dev)
> +{
> +	return dev && (dev->netdev_ops == &lan966x_port_netdev_ops);

Can "dev" ever be NULL?

> +}
> +
>  static int lan966x_port_xtr_status(struct lan966x *lan966x, u8 grp)
>  {
>  	return lan_rd(lan966x, QS_XTR_RD(grp));
> @@ -537,6 +546,11 @@ static irqreturn_t lan966x_xtr_irq_handler(int irq, void *args)
>  
>  		skb->protocol = eth_type_trans(skb, dev);
>  
> +#ifdef CONFIG_NET_SWITCHDEV
> +		if (lan966x->ports[src_port]->bridge)
> +			skb->offload_fwd_mark = 1;
> +#endif
> +
>  		netif_rx_ni(skb);
>  		dev->stats.rx_bytes += len;
>  		dev->stats.rx_packets++;
> @@ -619,13 +633,16 @@ static int lan966x_probe_port(struct lan966x *lan966x, u32 p,
>  
>  	dev->netdev_ops = &lan966x_port_netdev_ops;
>  	dev->ethtool_ops = &lan966x_ethtool_ops;
> +	dev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER |
> +			    NETIF_F_RXFCS;
> +	dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER |
> +			 NETIF_F_HW_VLAN_CTAG_TX |
> +			 NETIF_F_HW_VLAN_STAG_TX;
> +	dev->priv_flags |= IFF_UNICAST_FLT;

Too many changes in one patch. IFF_UNICAST_FLT and the handling of
promiscuous mode have nothing to do with switchdev. Neither VLAN
filtering via dev->features (should have been part of the previous
patch), or RXFCS. It seems that each one of these changes was previously
missed and is now snuck in without the explanation it deserves.

>  	dev->needed_headroom = IFH_LEN * sizeof(u32);
>  
>  	eth_hw_addr_gen(dev, lan966x->base_mac, p + 1);
>  
> -	lan966x_mac_learn(lan966x, PGID_CPU, dev->dev_addr, port->pvid,
> -			  ENTRYTYPE_LOCKED);
> -

Why is this deleted? If the port uses IFF_UNICAST_FLT and isn't
promiscuous, how does it get the unicast traffic it needs?
I may be not realizing something because the changes aren't properly
split.

>  	port->phylink_config.dev = &port->dev->dev;
>  	port->phylink_config.type = PHYLINK_NETDEV;
>  	port->phylink_pcs.poll = true;
> @@ -949,6 +966,8 @@ static int lan966x_probe(struct platform_device *pdev)
>  		lan966x_port_init(lan966x->ports[p]);
>  	}
>  
> +	lan966x_register_notifier_blocks(lan966x);
> +
>  	return 0;
>  
>  cleanup_ports:
> @@ -967,6 +986,8 @@ static int lan966x_remove(struct platform_device *pdev)
>  {
>  	struct lan966x *lan966x = platform_get_drvdata(pdev);
>  
> +	lan966x_unregister_notifier_blocks(lan966x);
> +
>  	lan966x_cleanup_ports(lan966x);
>  
>  	cancel_delayed_work_sync(&lan966x->stats_work);
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> index ec3eccf634b3..4a0988087167 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> @@ -80,6 +80,11 @@ struct lan966x {
>  	struct list_head mac_entries;
>  	spinlock_t mac_lock; /* lock for mac_entries list */
>  
> +	/* Notifiers */
> +	struct notifier_block netdevice_nb;
> +	struct notifier_block switchdev_nb;
> +	struct notifier_block switchdev_blocking_nb;
> +
>  	u16 vlan_mask[VLAN_N_VID];
>  	DECLARE_BITMAP(cpu_vlan_mask, VLAN_N_VID);
>  
> @@ -112,6 +117,10 @@ struct lan966x_port {
>  	struct net_device *dev;
>  	struct lan966x *lan966x;
>  
> +	struct net_device *bridge;
> +	u8 stp_state;
> +	u8 promisc;
> +
>  	u8 chip_port;
>  	u16 pvid;
>  	u16 vid;
> @@ -129,6 +138,14 @@ extern const struct phylink_mac_ops lan966x_phylink_mac_ops;
>  extern const struct phylink_pcs_ops lan966x_phylink_pcs_ops;
>  extern const struct ethtool_ops lan966x_ethtool_ops;
>  
> +int lan966x_mc_unsync(struct net_device *dev, const unsigned char *addr);
> +int lan966x_mc_sync(struct net_device *dev, const unsigned char *addr);
> +
> +bool lan966x_netdevice_check(const struct net_device *dev);
> +
> +int lan966x_register_notifier_blocks(struct lan966x *lan966x);
> +void lan966x_unregister_notifier_blocks(struct lan966x *lan966x);
> +
>  void lan966x_stats_get(struct net_device *dev,
>  		       struct rtnl_link_stats64 *stats);
>  int lan966x_stats_init(struct lan966x *lan966x);
> @@ -139,6 +156,7 @@ void lan966x_port_status_get(struct lan966x_port *port,
>  			     struct phylink_link_state *state);
>  int lan966x_port_pcs_set(struct lan966x_port *port,
>  			 struct lan966x_port_config *config);
> +void lan966x_set_promisc(struct lan966x_port *port, bool enable);
>  void lan966x_port_init(struct lan966x_port *port);
>  
>  int lan966x_mac_learn(struct lan966x *lan966x, int port,
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c b/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
> new file mode 100644
> index 000000000000..ed6ec78d2d9a
> --- /dev/null
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
> @@ -0,0 +1,548 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +#include <linux/if_bridge.h>
> +#include <net/switchdev.h>
> +
> +#include "lan966x_main.h"
> +
> +static struct workqueue_struct *lan966x_owq;
> +
> +struct lan966x_fdb_event_work {
> +	struct work_struct work;
> +	struct switchdev_notifier_fdb_info fdb_info;
> +	struct net_device *dev;
> +	struct lan966x *lan966x;
> +	unsigned long event;
> +};
> +
> +static void lan966x_port_attr_bridge_flags(struct lan966x_port *port,
> +					   struct switchdev_brport_flags flags)
> +{
> +	u32 val = lan_rd(port->lan966x, ANA_PGID(PGID_MC));
> +
> +	val = ANA_PGID_PGID_GET(val);
> +
> +	if (flags.mask & BR_MCAST_FLOOD) {
> +		if (flags.val & BR_MCAST_FLOOD)
> +			val |= BIT(port->chip_port);
> +		else
> +			val &= ~BIT(port->chip_port);
> +	}
> +
> +	lan_rmw(ANA_PGID_PGID_SET(val),
> +		ANA_PGID_PGID,
> +		port->lan966x, ANA_PGID(PGID_MC));
> +}
> +
> +static u32 lan966x_get_fwd_mask(struct lan966x_port *port)
> +{
> +	struct net_device *bridge = port->bridge;
> +	struct lan966x *lan966x = port->lan966x;
> +	u8 ingress_src = port->chip_port;
> +	u32 mask = 0;
> +	int p;
> +
> +	if (port->stp_state != BR_STATE_FORWARDING)
> +		goto skip_forwarding;
> +
> +	for (p = 0; p < lan966x->num_phys_ports; p++) {
> +		port = lan966x->ports[p];
> +
> +		if (!port)
> +			continue;
> +
> +		if (port->stp_state == BR_STATE_FORWARDING &&
> +		    port->bridge == bridge)
> +			mask |= BIT(p);
> +	}
> +
> +skip_forwarding:
> +	mask &= ~BIT(ingress_src);
> +
> +	return mask;
> +}
> +
> +static void lan966x_update_fwd_mask(struct lan966x *lan966x)
> +{
> +	int p;
> +
> +	for (p = 0; p < lan966x->num_phys_ports; p++) {
> +		struct lan966x_port *port = lan966x->ports[p];
> +		unsigned long mask = 0;
> +
> +		if (port->bridge)
> +			mask = lan966x_get_fwd_mask(port);
> +
> +		mask |= BIT(CPU_PORT);
> +
> +		lan_wr(ANA_PGID_PGID_SET(mask),
> +		       lan966x, ANA_PGID(PGID_SRC + p));
> +	}
> +}
> +
> +static void lan966x_attr_stp_state_set(struct lan966x_port *port,
> +				       u8 state)
> +{
> +	struct lan966x *lan966x = port->lan966x;
> +	bool learn_ena = 0;
> +
> +	port->stp_state = state;
> +
> +	if (state == BR_STATE_FORWARDING || state == BR_STATE_LEARNING)
> +		learn_ena = 1;

Please use true/false for bool types.

> +
> +	lan_rmw(ANA_PORT_CFG_LEARN_ENA_SET(learn_ena),
> +		ANA_PORT_CFG_LEARN_ENA,
> +		lan966x, ANA_PORT_CFG(port->chip_port));
> +
> +	lan966x_update_fwd_mask(lan966x);
> +}
> +
> +static void lan966x_port_attr_ageing_set(struct lan966x_port *port,
> +					 unsigned long ageing_clock_t)
> +{
> +	unsigned long ageing_jiffies = clock_t_to_jiffies(ageing_clock_t);
> +	u32 ageing_time = jiffies_to_msecs(ageing_jiffies) / 1000;
> +
> +	lan966x_mac_set_ageing(port->lan966x, ageing_time);
> +}
> +
> +static int lan966x_port_attr_set(struct net_device *dev, const void *ctx,
> +				 const struct switchdev_attr *attr,
> +				 struct netlink_ext_ack *extack)
> +{
> +	struct lan966x_port *port = netdev_priv(dev);
> +
> +	switch (attr->id) {
> +	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
> +		lan966x_port_attr_bridge_flags(port, attr->u.brport_flags);
> +		break;

no SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS? I think it doesn't even work
if you don't handle that?

br_switchdev_set_port_flag():

	attr.id = SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS;
	attr.u.brport_flags.val = flags;
	attr.u.brport_flags.mask = mask;

	/* We run from atomic context here */
	err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
				       &info.info, extack);
	err = notifier_to_errno(err);
	if (err == -EOPNOTSUPP)
		return 0;

Anyway, a big blob of "switchdev support" is hard to follow and review.
If you add bridge port flags you could as well add more comprehensive
support for them, but in a separate change please. Forwarding domain is
one thing, FDB/MDB is another, VLAN is another.

> +	case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
> +		lan966x_attr_stp_state_set(port, attr->u.stp_state);
> +		break;
> +	case SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME:
> +		lan966x_port_attr_ageing_set(port, attr->u.ageing_time);
> +		break;
> +	case SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING:
> +		lan966x_vlan_port_set_vlan_aware(port, attr->u.vlan_filtering);
> +		lan966x_vlan_port_apply(port);
> +		lan966x_vlan_cpu_set_vlan_aware(port);
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	return 0;
> +}
> +
> +static int lan966x_port_bridge_join(struct lan966x_port *port,
> +				    struct net_device *bridge,
> +				    struct netlink_ext_ack *extack)
> +{
> +	struct net_device *dev = port->dev;
> +	int err;
> +
> +	err = switchdev_bridge_port_offload(dev, dev, NULL, NULL, NULL,
> +					    false, extack);
> +	if (err)
> +		return err;
> +
> +	port->bridge = bridge;
> +
> +	/* Port enters in bridge mode therefor don't need to copy to CPU
> +	 * frames for multicast in case the bridge is not requesting them
> +	 */
> +	__dev_mc_unsync(dev, lan966x_mc_unsync);

Why is there a need to unsync the addresses, though? A driver that
supports proper MAC table isolation between standalone ports and
VLAN-unaware bridge ports should use separate pvids for the two modes of
operation (if it doesn't support other isolation mechanisms apart from VLAN,
which it looks like lan966x does not). I see that your driver even does
this in lan966x_vlan_port_get_pvid(). So the non-bridge multicast
addresses could sit there even when the port is in a bridge.

> +
> +	/* make sure that the promisc is disabled when entering under the bridge
> +	 * because we don't want all the frames to come to CPU
> +	 */
> +	lan966x_set_promisc(port, false);

What's the story here? Why don't other switchdev drivers handle promisc
in this way (copy all frames to the CPU)?

> +
> +	return 0;
> +}
> +
> +static void lan966x_port_bridge_leave(struct lan966x_port *port,
> +				      struct net_device *bridge)
> +{
> +	struct lan966x *lan966x = port->lan966x;
> +
> +	switchdev_bridge_port_unoffload(port->dev, NULL, NULL, NULL);

The bridge offers a facility to sync and unsync the host addresses on
joins and leaves. For this to work, the switchdev_bridge_port_unoffload
call should be during the NETDEV_PRECHANGEUPPER notifier event.

> +	port->bridge = NULL;
> +
> +	/* Set the port back to host mode */
> +	lan966x_vlan_port_set_vlan_aware(port, 0);

s/0/false/

> +	lan966x_vlan_port_set_vid(port, HOST_PVID, false, false);
> +	lan966x_vlan_port_apply(port);
> +
> +	lan966x_mac_cpu_learn(lan966x, port->dev->dev_addr, HOST_PVID);
> +
> +	/* Port enters in host more therefore restore mc list */
> +	__dev_mc_sync(port->dev, lan966x_mc_sync, lan966x_mc_unsync);
> +
> +	/* Restore back the promisc as it was before the interfaces was added to
> +	 * the bridge
> +	 */
> +	lan966x_set_promisc(port, port->promisc);
> +}
> +
> +static int lan966x_port_changeupper(struct net_device *dev,
> +				    struct netdev_notifier_changeupper_info *info)
> +{
> +	struct lan966x_port *port = netdev_priv(dev);
> +	struct netlink_ext_ack *extack;
> +	int err = 0;
> +
> +	extack = netdev_notifier_info_to_extack(&info->info);
> +
> +	if (netif_is_bridge_master(info->upper_dev)) {
> +		if (info->linking)
> +			err = lan966x_port_bridge_join(port, info->upper_dev,
> +						       extack);
> +		else
> +			lan966x_port_bridge_leave(port, info->upper_dev);
> +	}
> +
> +	return err;
> +}
> +
> +static int lan966x_port_add_addr(struct net_device *dev, bool up)
> +{
> +	struct lan966x_port *port = netdev_priv(dev);
> +	struct lan966x *lan966x = port->lan966x;
> +	u16 vid;
> +
> +	vid = lan966x_vlan_port_get_pvid(port);
> +
> +	if (up)
> +		lan966x_mac_cpu_learn(lan966x, dev->dev_addr, vid);
> +	else
> +		lan966x_mac_cpu_forget(lan966x, dev->dev_addr, vid);

For which uppers is this intended? The bridge? Because the bridge
notifies you of all the entries it needs, if you also consider the
replayed events and provide non-NULL pointers to
switchdev_bridge_port_offload.

> +
> +	return 0;
> +}
> +
> +static int lan966x_netdevice_port_event(struct net_device *dev,
> +					struct notifier_block *nb,
> +					unsigned long event, void *ptr)
> +{
> +	int err = 0;
> +
> +	if (!lan966x_netdevice_check(dev))
> +		return 0;
> +
> +	switch (event) {
> +	case NETDEV_CHANGEUPPER:
> +		err = lan966x_port_changeupper(dev, ptr);
> +		break;
> +	case NETDEV_PRE_UP:
> +		err = lan966x_port_add_addr(dev, true);
> +		break;
> +	case NETDEV_DOWN:
> +		err = lan966x_port_add_addr(dev, false);
> +		break;
> +	}
> +
> +	return err;
> +}
> +
> +static int lan966x_netdevice_event(struct notifier_block *nb,
> +				   unsigned long event, void *ptr)
> +{
> +	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> +	int ret;
> +
> +	ret = lan966x_netdevice_port_event(dev, nb, event, ptr);
> +
> +	return notifier_from_errno(ret);
> +}
> +
> +static void lan966x_fdb_event_work(struct work_struct *work)
> +{
> +	struct lan966x_fdb_event_work *fdb_work =
> +		container_of(work, struct lan966x_fdb_event_work, work);
> +	struct switchdev_notifier_fdb_info *fdb_info;
> +	struct net_device *dev = fdb_work->dev;
> +	struct lan966x_port *port;
> +	struct lan966x *lan966x;
> +
> +	rtnl_lock();

rtnl_lock() shouldn't be needed.

> +
> +	fdb_info = &fdb_work->fdb_info;
> +	lan966x = fdb_work->lan966x;
> +
> +	if (lan966x_netdevice_check(dev)) {
> +		port = netdev_priv(dev);
> +
> +		switch (fdb_work->event) {
> +		case SWITCHDEV_FDB_ADD_TO_DEVICE:
> +			if (!fdb_info->added_by_user)
> +				break;

If you get notified of a MAC address dynamically learned by the software
bridge on a lan966x port, you will have allocated memory for the work
item, and scheduled it, for nothing. Please try to exit unnecessary work
early.

> +			lan966x_mac_add_entry(lan966x, port, fdb_info->addr,
> +					      fdb_info->vid);
> +			break;
> +		case SWITCHDEV_FDB_DEL_TO_DEVICE:
> +			if (!fdb_info->added_by_user)
> +				break;
> +			lan966x_mac_del_entry(lan966x, fdb_info->addr, fdb_info->vid);
> +			break;
> +		}
> +	} else {
> +		if (!netif_is_bridge_master(dev))
> +			goto out;
> +
> +		/* If the CPU is not part of the vlan then there is no point
> +		 * to copy the frames to the CPU because they will be dropped
> +		 */
> +		if (!lan966x_vlan_cpu_member_vlan_mask(lan966x, fdb_info->vid))
> +			goto out;

It isn't part of the VLAN now, but what about later? I don't see that
you keep these FDB entries anywhere and restore them when a port joins
that VLAN.

> +
> +		/* In case the bridge is called */
> +		switch (fdb_work->event) {
> +		case SWITCHDEV_FDB_ADD_TO_DEVICE:
> +			/* If there is no front port in this vlan, there is no
> +			 * point to copy the frame to CPU because it would be
> +			 * just dropped at later point. So add it only if
> +			 * there is a port
> +			 */
> +			if (!lan966x_vlan_port_any_vlan_mask(lan966x, fdb_info->vid))
> +				break;
> +
> +			lan966x_mac_cpu_learn(lan966x, fdb_info->addr, fdb_info->vid);

Does the lan966x_mac_cpu_learn() operation trigger interrupts, or only
the dynamic learning process? How do you handle migration of an FDB
entry pointing towards the CPU, towards one pointing towards a port?

> +			break;
> +		case SWITCHDEV_FDB_DEL_TO_DEVICE:
> +			/* It is OK to always forget the entry it */

Forget the entry it?

> +			lan966x_mac_cpu_forget(lan966x, fdb_info->addr, fdb_info->vid);
> +			break;
> +		}
> +	}
> +
> +out:
> +	rtnl_unlock();
> +	kfree(fdb_work->fdb_info.addr);
> +	kfree(fdb_work);
> +	dev_put(dev);
> +}
> +
> +static int lan966x_switchdev_event(struct notifier_block *nb,
> +				   unsigned long event, void *ptr)
> +{
> +	struct lan966x *lan966x = container_of(nb, struct lan966x, switchdev_nb);
> +	struct net_device *dev = switchdev_notifier_info_to_dev(ptr);
> +	struct switchdev_notifier_fdb_info *fdb_info;
> +	struct switchdev_notifier_info *info = ptr;
> +	struct lan966x_fdb_event_work *fdb_work;
> +	int err;
> +
> +	switch (event) {
> +	case SWITCHDEV_PORT_ATTR_SET:
> +		err = switchdev_handle_port_attr_set(dev, ptr,
> +						     lan966x_netdevice_check,
> +						     lan966x_port_attr_set);
> +		return notifier_from_errno(err);
> +	case SWITCHDEV_FDB_ADD_TO_DEVICE:
> +		fallthrough;

"fallthrough;" not needed here.

> +	case SWITCHDEV_FDB_DEL_TO_DEVICE:
> +		fdb_work = kzalloc(sizeof(*fdb_work), GFP_ATOMIC);
> +		if (!fdb_work)
> +			return NOTIFY_BAD;
> +
> +		fdb_info = container_of(info,
> +					struct switchdev_notifier_fdb_info,
> +					info);
> +
> +		fdb_work->dev = dev;
> +		fdb_work->lan966x = lan966x;
> +		fdb_work->event = event;
> +		INIT_WORK(&fdb_work->work, lan966x_fdb_event_work);
> +		memcpy(&fdb_work->fdb_info, ptr, sizeof(fdb_work->fdb_info));
> +		fdb_work->fdb_info.addr = kzalloc(ETH_ALEN, GFP_ATOMIC);
> +		if (!fdb_work->fdb_info.addr)
> +			goto err_addr_alloc;
> +
> +		ether_addr_copy((u8 *)fdb_work->fdb_info.addr, fdb_info->addr);
> +		dev_hold(dev);
> +
> +		queue_work(lan966x_owq, &fdb_work->work);
> +		break;
> +	}
> +
> +	return NOTIFY_DONE;
> +err_addr_alloc:
> +	kfree(fdb_work);
> +	return NOTIFY_BAD;
> +}
> +
> +static int lan966x_handle_port_vlan_add(struct net_device *dev,
> +					struct notifier_block *nb,
> +					const struct switchdev_obj_port_vlan *v)
> +{
> +	struct lan966x_port *port;
> +	struct lan966x *lan966x;
> +
> +	/* When adding a port to a vlan, we get a callback for the port but
> +	 * also for the bridge. When get the callback for the bridge just bail
> +	 * out. Then when the bridge is added to the vlan, then we get a
> +	 * callback here but in this case the flags has set:
> +	 * BRIDGE_VLAN_INFO_BRENTRY. In this case it means that the CPU
> +	 * port is added to the vlan, so the broadcast frames and unicast frames
> +	 * with dmac of the bridge should be foward to CPU.
> +	 */
> +	if (netif_is_bridge_master(dev) &&
> +	    !(v->flags & BRIDGE_VLAN_INFO_BRENTRY))
> +		return 0;
> +
> +	lan966x = container_of(nb, struct lan966x, switchdev_blocking_nb);
> +
> +	/* In case the port gets called */
> +	if (!(netif_is_bridge_master(dev))) {
> +		if (!lan966x_netdevice_check(dev))
> +			return -EOPNOTSUPP;
> +
> +		port = netdev_priv(dev);
> +		return lan966x_vlan_port_add_vlan(port, v->vid,
> +						  v->flags & BRIDGE_VLAN_INFO_PVID,
> +						  v->flags & BRIDGE_VLAN_INFO_UNTAGGED);
> +	}
> +
> +	/* In case the bridge gets called */
> +	if (netif_is_bridge_master(dev))
> +		return lan966x_vlan_cpu_add_vlan(lan966x, dev, v->vid);
> +
> +	return 0;
> +}
> +
> +static int lan966x_handle_port_obj_add(struct net_device *dev,
> +				       struct notifier_block *nb,
> +				       struct switchdev_notifier_port_obj_info *info)
> +{
> +	const struct switchdev_obj *obj = info->obj;
> +	int err;
> +
> +	switch (obj->id) {
> +	case SWITCHDEV_OBJ_ID_PORT_VLAN:
> +		err = lan966x_handle_port_vlan_add(dev, nb,
> +						   SWITCHDEV_OBJ_PORT_VLAN(obj));
> +		break;
> +	default:
> +		err = -EOPNOTSUPP;
> +		break;
> +	}
> +
> +	info->handled = true;
> +	return err;
> +}
> +
> +static int lan966x_handle_port_vlan_del(struct net_device *dev,
> +					struct notifier_block *nb,
> +					const struct switchdev_obj_port_vlan *v)
> +{
> +	struct lan966x_port *port;
> +	struct lan966x *lan966x;
> +
> +	lan966x = container_of(nb, struct lan966x, switchdev_blocking_nb);
> +
> +	/* In case the physical port gets called */
> +	if (!netif_is_bridge_master(dev)) {
> +		if (!lan966x_netdevice_check(dev))
> +			return -EOPNOTSUPP;
> +
> +		port = netdev_priv(dev);
> +		return lan966x_vlan_port_del_vlan(port, v->vid);
> +	}
> +
> +	/* In case the bridge gets called */
> +	if (netif_is_bridge_master(dev))
> +		return lan966x_vlan_cpu_del_vlan(lan966x, dev, v->vid);
> +
> +	return 0;
> +}
> +
> +static int lan966x_handle_port_obj_del(struct net_device *dev,
> +				       struct notifier_block *nb,
> +				       struct switchdev_notifier_port_obj_info *info)
> +{
> +	const struct switchdev_obj *obj = info->obj;
> +	int err;
> +
> +	switch (obj->id) {
> +	case SWITCHDEV_OBJ_ID_PORT_VLAN:
> +		err = lan966x_handle_port_vlan_del(dev, nb,
> +						   SWITCHDEV_OBJ_PORT_VLAN(obj));
> +		break;
> +	default:
> +		err = -EOPNOTSUPP;
> +		break;
> +	}
> +
> +	info->handled = true;
> +	return err;
> +}
> +
> +static int lan966x_switchdev_blocking_event(struct notifier_block *nb,
> +					    unsigned long event,
> +					    void *ptr)
> +{
> +	struct net_device *dev = switchdev_notifier_info_to_dev(ptr);
> +	int err;
> +
> +	switch (event) {
> +	case SWITCHDEV_PORT_OBJ_ADD:
> +		err = lan966x_handle_port_obj_add(dev, nb, ptr);
> +		return notifier_from_errno(err);
> +	case SWITCHDEV_PORT_OBJ_DEL:
> +		err = lan966x_handle_port_obj_del(dev, nb, ptr);
> +		return notifier_from_errno(err);
> +	case SWITCHDEV_PORT_ATTR_SET:
> +		err = switchdev_handle_port_attr_set(dev, ptr,
> +						     lan966x_netdevice_check,
> +						     lan966x_port_attr_set);
> +		return notifier_from_errno(err);
> +	}
> +
> +	return NOTIFY_DONE;
> +}
> +
> +int lan966x_register_notifier_blocks(struct lan966x *lan966x)
> +{
> +	int err;
> +
> +	lan966x->netdevice_nb.notifier_call = lan966x_netdevice_event;
> +	err = register_netdevice_notifier(&lan966x->netdevice_nb);
> +	if (err)
> +		return err;
> +
> +	lan966x->switchdev_nb.notifier_call = lan966x_switchdev_event;
> +	err = register_switchdev_notifier(&lan966x->switchdev_nb);
> +	if (err)
> +		goto err_switchdev_nb;
> +
> +	lan966x->switchdev_blocking_nb.notifier_call = lan966x_switchdev_blocking_event;
> +	err = register_switchdev_blocking_notifier(&lan966x->switchdev_blocking_nb);
> +	if (err)
> +		goto err_switchdev_blocking_nb;
> +
> +	lan966x_owq = alloc_ordered_workqueue("lan966x_order", 0);
> +	if (!lan966x_owq) {
> +		err = -ENOMEM;
> +		goto err_switchdev_blocking_nb;
> +	}

These should be singleton objects, otherwise things get problematic if
you have more than one switch device instantiated in the system.

> +
> +	return 0;
> +
> +err_switchdev_blocking_nb:
> +	unregister_switchdev_notifier(&lan966x->switchdev_nb);
> +err_switchdev_nb:
> +	unregister_netdevice_notifier(&lan966x->netdevice_nb);
> +
> +	return err;
> +}
> +
> +void lan966x_unregister_notifier_blocks(struct lan966x *lan966x)
> +{
> +	destroy_workqueue(lan966x_owq);
> +
> +	unregister_switchdev_blocking_notifier(&lan966x->switchdev_blocking_nb);
> +	unregister_switchdev_notifier(&lan966x->switchdev_nb);
> +	unregister_netdevice_notifier(&lan966x->netdevice_nb);
> +}
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_vlan.c b/drivers/net/ethernet/microchip/lan966x/lan966x_vlan.c
> index e47552775d06..26644503b4e6 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_vlan.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_vlan.c
> @@ -155,6 +155,9 @@ static bool lan966x_vlan_cpu_member_cpu_vlan_mask(struct lan966x *lan966x, u16 v
>  
>  u16 lan966x_vlan_port_get_pvid(struct lan966x_port *port)
>  {
> +	if (!port->bridge)
> +		return HOST_PVID;
> +
>  	return port->vlan_aware ? port->pvid : UNAWARE_PVID;
>  }
>  
> @@ -210,6 +213,8 @@ void lan966x_vlan_cpu_set_vlan_aware(struct lan966x_port *port)
>  		 * table for the front port and the CPU
>  		 */
>  		lan966x_mac_cpu_learn(lan966x, port->dev->dev_addr, UNAWARE_PVID);
> +		lan966x_mac_cpu_learn(lan966x, port->bridge->dev_addr,
> +				      UNAWARE_PVID);
>  
>  		lan966x_vlan_port_add_vlan_mask(port, UNAWARE_PVID);
>  		lan966x_vlan_port_apply(port);
> @@ -218,6 +223,8 @@ void lan966x_vlan_cpu_set_vlan_aware(struct lan966x_port *port)
>  		 * to vlan unaware
>  		 */
>  		lan966x_mac_cpu_forget(lan966x, port->dev->dev_addr, UNAWARE_PVID);
> +		lan966x_mac_cpu_forget(lan966x, port->bridge->dev_addr,
> +				       UNAWARE_PVID);
>  
>  		lan966x_vlan_port_del_vlan_mask(port, UNAWARE_PVID);
>  		lan966x_vlan_port_apply(port);
> @@ -293,6 +300,7 @@ int lan966x_vlan_port_add_vlan(struct lan966x_port *port,
>  	 */
>  	if (lan966x_vlan_cpu_member_cpu_vlan_mask(lan966x, vid)) {
>  		lan966x_mac_cpu_learn(lan966x, port->dev->dev_addr, vid);
> +		lan966x_mac_cpu_learn(lan966x, port->bridge->dev_addr, vid);
>  		lan966x_vlan_cpu_add_vlan_mask(lan966x, vid);
>  	}
>  
> @@ -322,8 +330,10 @@ int lan966x_vlan_port_del_vlan(struct lan966x_port *port,
>  	 * that vlan but still keep it in the mask because it may be needed
>  	 * again then another port gets added in tha vlan
>  	 */
> -	if (!lan966x_vlan_port_any_vlan_mask(lan966x, vid))
> +	if (!lan966x_vlan_port_any_vlan_mask(lan966x, vid)) {
> +		lan966x_mac_cpu_forget(lan966x, port->bridge->dev_addr, vid);
>  		lan966x_vlan_cpu_del_vlan_mask(lan966x, vid);
> +	}
>  
>  	return 0;
>  }
> -- 
> 2.33.0
>

Vladimir Oltean Dec. 9, 2021, 1:59 p.m. UTC | #3

On Thu, Dec 09, 2021 at 10:46:14AM +0100, Horatiu Vultur wrote:
> +int lan966x_vlan_port_set_vid(struct lan966x_port *port, u16 vid,
> +			      bool pvid, bool untagged)
> +{
> +	struct lan966x *lan966x = port->lan966x;
> +
> +	/* Egress vlan classification */
> +	if (untagged && port->vid != vid) {
> +		if (port->vid) {
> +			dev_err(lan966x->dev,
> +				"Port already has a native VLAN: %d\n",
> +				port->vid);
> +			return -EBUSY;

Are you interested in supporting the use case from 0da1a1c48911 ("net:
mscc: ocelot: allow a config where all bridge VLANs are egress-untagged")?
Because it would be good if the driver was structured that way from the
get-go instead of patching it later.

> +		}
> +		port->vid = vid;
> +	}
> +
> +	/* Default ingress vlan classification */
> +	if (pvid)
> +		port->pvid = vid;
> +
> +	return 0;
> +}

Horatiu Vultur Dec. 9, 2021, 3:38 p.m. UTC | #4

The 12/09/2021 11:47, Vladimir Oltean wrote:
> 
> On Thu, Dec 09, 2021 at 10:46:12AM +0100, Horatiu Vultur wrote:
> > This patch adds support for handling the interrupts generated by the
> > analyzer. Currently, only the MAC table generates these interrupts.
> > The MAC table will generate an interrupt whenever it learns or forgets
> > an entry in the table. It is the SW responsibility figure out which
> > entries were added/removed.
> >
> > Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> > ---
> >  .../ethernet/microchip/lan966x/lan966x_mac.c  | 244 ++++++++++++++++++
> >  .../ethernet/microchip/lan966x/lan966x_main.c |  23 ++
> >  .../ethernet/microchip/lan966x/lan966x_main.h |   6 +
> >  3 files changed, 273 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_mac.c b/drivers/net/ethernet/microchip/lan966x/lan966x_mac.c
> > index f6878b9f57ef..c01ab01bffbf 100644
> > --- a/drivers/net/ethernet/microchip/lan966x/lan966x_mac.c
> > +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_mac.c
> > @@ -1,5 +1,6 @@
> >  // SPDX-License-Identifier: GPL-2.0+
> >
> > +#include <net/switchdev.h>
> >  #include "lan966x_main.h"
> >
> >  #define LAN966X_MAC_COLUMNS          4
> > @@ -13,6 +14,23 @@
> >  #define MACACCESS_CMD_WRITE          7
> >  #define MACACCESS_CMD_SYNC_GET_NEXT  8
> >
> > +#define LAN966X_MAC_INVALID_ROW              -1
> > +
> > +struct lan966x_mac_entry {
> > +     struct list_head list;
> > +     unsigned char mac[ETH_ALEN] __aligned(2);
> > +     u16 vid;
> > +     u16 port_index;
> > +     int row;
> > +};
> > +
> > +struct lan966x_mac_raw_entry {
> > +     u32 mach;
> > +     u32 macl;
> > +     u32 maca;
> > +     bool process;
> > +};
> > +
> >  static int lan966x_mac_get_status(struct lan966x *lan966x)
> >  {
> >       return lan_rd(lan966x, ANA_MACACCESS);
> > @@ -98,4 +116,230 @@ void lan966x_mac_init(struct lan966x *lan966x)
> >       /* Clear the MAC table */
> >       lan_wr(MACACCESS_CMD_INIT, lan966x, ANA_MACACCESS);
> >       lan966x_mac_wait_for_completion(lan966x);
> > +
> > +     spin_lock_init(&lan966x->mac_lock);
> > +     INIT_LIST_HEAD(&lan966x->mac_entries);
> > +}
> > +
> > +static struct lan966x_mac_entry *lan966x_mac_alloc_entry(const unsigned char *mac,
> > +                                                      u16 vid, u16 port_index)
> > +{
> > +     struct lan966x_mac_entry *mac_entry;
> > +
> > +     mac_entry = kzalloc(sizeof(*mac_entry), GFP_KERNEL);
> > +     if (!mac_entry)
> > +             return NULL;
> > +
> > +     memcpy(mac_entry->mac, mac, ETH_ALEN);
> > +     mac_entry->vid = vid;
> > +     mac_entry->port_index = port_index;
> > +     mac_entry->row = LAN966X_MAC_INVALID_ROW;
> > +     return mac_entry;
> > +}
> > +
> > +static void lan966x_fdb_call_notifiers(enum switchdev_notifier_type type,
> > +                                    const char *mac, u16 vid,
> > +                                    struct net_device *dev)
> > +{
> > +     struct switchdev_notifier_fdb_info info = { 0 };
> > +
> > +     info.addr = mac;
> > +     info.vid = vid;
> > +     info.offloaded = true;
> > +     call_switchdev_notifiers(type, dev, &info.info, NULL);
> > +}
> > +
> > +void lan966x_mac_purge_entries(struct lan966x *lan966x)
> > +{
> > +     struct lan966x_mac_entry *mac_entry, *tmp;
> > +     unsigned long flags;
> > +
> > +     spin_lock_irqsave(&lan966x->mac_lock, flags);
> 
> I hope I'm not wrong, but you are only using this spinlock to serialize
> access to the list, which isn't accessed from hardirq context anywhere
> (the irq is threaded). So spin_lock_irqsave could simply be spin_lock.
> Unless...

It should be OK to use spin_lock. I have chosen spin_lock_irq because
that is the safest way in case it would be extended.

> 
> > +     list_for_each_entry_safe(mac_entry, tmp, &lan966x->mac_entries,
> > +                              list) {
> > +             lan966x_mac_forget(lan966x, mac_entry->mac, mac_entry->vid,
> > +                                ENTRYTYPE_LOCKED);
> 
> Does this generate a MAC table interrupt?

It doesn't.

> 
> > +
> > +             list_del(&mac_entry->list);
> > +             kfree(mac_entry);
> > +     }
> > +     spin_unlock_irqrestore(&lan966x->mac_lock, flags);
> > +}
> > +
> > +static void lan966x_mac_notifiers(struct lan966x *lan966x,
> > +                               enum switchdev_notifier_type type,
> > +                               unsigned char *mac, u32 vid,
> > +                               struct net_device *dev)
> > +{
> > +     rtnl_lock();
> > +     lan966x_fdb_call_notifiers(type, mac, vid, dev);
> > +     rtnl_unlock();
> > +}
> > +
> > +static void lan966x_mac_process_raw_entry(struct lan966x_mac_raw_entry *raw_entry,
> > +                                       u8 *mac, u16 *vid, u32 *dest_idx)
> > +{
> > +     mac[0] = (raw_entry->mach >> 8)  & 0xff;
> > +     mac[1] = (raw_entry->mach >> 0)  & 0xff;
> > +     mac[2] = (raw_entry->macl >> 24) & 0xff;
> > +     mac[3] = (raw_entry->macl >> 16) & 0xff;
> > +     mac[4] = (raw_entry->macl >> 8)  & 0xff;
> > +     mac[5] = (raw_entry->macl >> 0)  & 0xff;
> > +
> > +     *vid = (raw_entry->mach >> 16) & 0xfff;
> > +     *dest_idx = ANA_MACACCESS_DEST_IDX_GET(raw_entry->maca);
> > +}
> > +
> > +static void lan966x_mac_irq_process(struct lan966x *lan966x, u32 row,
> > +                                 struct lan966x_mac_raw_entry *raw_entries)
> > +{
> > +     struct lan966x_mac_entry *mac_entry, *tmp;
> > +     char mac[ETH_ALEN] __aligned(2);
> 
> unsigned char
> 
> > +     unsigned long flags;
> > +     u32 dest_idx;
> > +     u32 column;
> > +     u16 vid;
> > +
> > +     spin_lock_irqsave(&lan966x->mac_lock, flags);
> > +     list_for_each_entry_safe(mac_entry, tmp, &lan966x->mac_entries, list) {
> > +             bool found = false;
> > +
> > +             if (mac_entry->row != row)
> > +                     continue;
> 
> When the MAC table gets large, you could consider keeping separate lists
> per row. This way you can avoid traversing a list of elements you're
> sure you don't care about.

Before I will change anything, I should run some measurements and see
some results.

> 
> > +
> > +             for (column = 0; column < LAN966X_MAC_COLUMNS; ++column) {
> > +                     /* All the valid entries are at the start of the row,
> > +                      * so when get one invalid entry it can just skip the
> > +                      * rest of the columns
> > +                      */
> > +                     if (!ANA_MACACCESS_VALID_GET(raw_entries[column].maca))
> > +                             break;
> > +
> > +                     lan966x_mac_process_raw_entry(&raw_entries[column],
> > +                                                   mac, &vid, &dest_idx);
> > +                     WARN_ON(dest_idx > lan966x->num_phys_ports);
> > +
> > +                     /* If the entry in SW is found, then there is nothing
> > +                      * to do
> > +                      */
> > +                     if (mac_entry->vid == vid &&
> > +                         ether_addr_equal(mac_entry->mac, mac) &&
> > +                         mac_entry->port_index == dest_idx) {
> > +                             raw_entries[column].process = true;
> > +                             found = true;
> > +                             break;
> > +                     }
> > +             }
> > +
> > +             if (!found) {
> > +                     /* Notify the bridge that the entry doesn't exist
> > +                      * anymore in the HW and remmove the entry from the SW
> 
> s/remmove/remove/
> 
> > +                      * list
> > +                      */
> > +                     lan966x_mac_notifiers(lan966x, SWITCHDEV_FDB_DEL_TO_BRIDGE,
> > +                                           mac_entry->mac, mac_entry->vid,
> > +                                           lan966x->ports[mac_entry->port_index]->dev);
> > +
> > +                     list_del(&mac_entry->list);
> > +                     kfree(mac_entry);
> > +             }
> > +     }
> > +     spin_unlock_irqrestore(&lan966x->mac_lock, flags);
> > +
> > +     /* Now go to the list of columns and see if any entry was not in the SW
> > +      * list, then that means that the entry is new so it needs to notify the
> > +      * bridge.
> > +      */
> > +     for (column = 0; column < LAN966X_MAC_COLUMNS; ++column) {
> > +             /* All the valid entries are at the start of the row, so when
> > +              * get one invalid entry it can just skip the rest of the columns
> > +              */
> > +             if (!ANA_MACACCESS_VALID_GET(raw_entries[column].maca))
> > +                     break;
> > +
> > +             /* If the entry already exists then don't do anything */
> > +             if (raw_entries[column].process)
> 
> s/process/processed/
> 
> > +                     continue;
> > +
> > +             lan966x_mac_process_raw_entry(&raw_entries[column],
> > +                                           mac, &vid, &dest_idx);
> > +             WARN_ON(dest_idx > lan966x->num_phys_ports);
> > +
> > +             mac_entry = lan966x_mac_alloc_entry(mac, vid, dest_idx);
> > +             if (!mac_entry)
> > +                     return;
> > +
> > +             mac_entry->row = row;
> > +
> > +             spin_lock_irqsave(&lan966x->mac_lock, flags);
> > +             list_add_tail(&mac_entry->list, &lan966x->mac_entries);
> > +             spin_unlock_irqrestore(&lan966x->mac_lock, flags);
> 
> spin_lock_irqsave shouldn't be necessary from an irq handler.
> 
> > +
> > +             lan966x_mac_notifiers(lan966x, SWITCHDEV_FDB_ADD_TO_BRIDGE,
> > +                                   mac, vid, lan966x->ports[dest_idx]->dev);
> > +     }
> > +}
> > +
> > +irqreturn_t lan966x_mac_irq_handler(struct lan966x *lan966x)
> > +{
> > +     struct lan966x_mac_raw_entry entry[LAN966X_MAC_COLUMNS] = { 0 };
> > +     u32 index, column;
> > +     bool stop = true;
> > +     u32 val;
> > +
> > +     /* Check if the mac table triggered this, if not just bail out */
> > +     if (!(ANA_ANAINTR_INTR_GET(lan_rd(lan966x, ANA_ANAINTR))))
> > +             return IRQ_NONE;
> 
> The interrupt isn't shared, so if we enter this condition, it means the
> analyzer block generated it, just not the MAC table portion of it.
> If we return IRQ_NONE there will be an IRQ storm because that condition
> will never go away. Could we ack the interrupt and return IRQ_HANDLED?

Actually, there is a wrong comment, it doesn't check if the MAC table
triggered it, but it checks if the analyzer generated it. So it can be
removed and then always return IRQ_HANDLER.

> 
> > +
> > +     /* Start the scan from 0, 0 */
> > +     lan_wr(ANA_MACTINDX_M_INDEX_SET(0) |
> > +            ANA_MACTINDX_BUCKET_SET(0),
> > +            lan966x, ANA_MACTINDX);
> > +
> > +     while (1) {
> > +             lan_rmw(ANA_MACACCESS_MAC_TABLE_CMD_SET(MACACCESS_CMD_SYNC_GET_NEXT),
> > +                     ANA_MACACCESS_MAC_TABLE_CMD,
> > +                     lan966x, ANA_MACACCESS);
> > +             lan966x_mac_wait_for_completion(lan966x);
> > +
> > +             val = lan_rd(lan966x, ANA_MACTINDX);
> > +             index = ANA_MACTINDX_M_INDEX_GET(val);
> > +             column = ANA_MACTINDX_BUCKET_GET(val);
> > +
> > +             /* The SYNC-GET-NEXT returns all the entries(4) in a row in
> > +              * which is suffered a change. By change it means that new entry
> > +              * was added or an entry was removed because of ageing.
> > +              * It would return all the columns for that row. And after that
> > +              * it would return the next row The stop conditions of the
> > +              * SYNC-GET-NEXT is when it reaches 'directly' to row 0
> > +              * column 3. So if SYNC-GET-NEXT returns row 0 and column 0
> > +              * then it is required to continue to read more even if it
> > +              * reaches row 0 and column 3.
> > +              */
> > +             if (index == 0 && column == 0)
> > +                     stop = false;
> > +
> > +             if (column == LAN966X_MAC_COLUMNS - 1 &&
> > +                 index == 0 && stop)
> > +                     break;
> > +
> > +             entry[column].mach = lan_rd(lan966x, ANA_MACHDATA);
> > +             entry[column].macl = lan_rd(lan966x, ANA_MACLDATA);
> > +             entry[column].maca = lan_rd(lan966x, ANA_MACACCESS);
> > +
> > +             /* Once all the columns are read process them */
> > +             if (column == LAN966X_MAC_COLUMNS - 1) {
> > +                     lan966x_mac_irq_process(lan966x, index, entry);
> > +                     /* A row was processed so it is safe to assume that the
> > +                      * next row/column can be the stop condition
> > +                      */
> > +                     stop = true;
> > +             }
> > +     }
> > +
> > +     lan_rmw(ANA_ANAINTR_INTR_SET(0),
> > +             ANA_ANAINTR_INTR,
> > +             lan966x, ANA_ANAINTR);
> > +
> > +     return IRQ_HANDLED;
> >  }
> > diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> > index 101c1f005baf..7c6d6293611a 100644
> > --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> > +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> > @@ -527,6 +527,13 @@ static irqreturn_t lan966x_xtr_irq_handler(int irq, void *args)
> >       return IRQ_HANDLED;
> >  }
> >
> > +static irqreturn_t lan966x_ana_irq_handler(int irq, void *args)
> > +{
> > +     struct lan966x *lan966x = args;
> > +
> > +     return lan966x_mac_irq_handler(lan966x);
> > +}
> > +
> >  static void lan966x_cleanup_ports(struct lan966x *lan966x)
> >  {
> >       struct lan966x_port *port;
> > @@ -554,6 +561,11 @@ static void lan966x_cleanup_ports(struct lan966x *lan966x)
> >
> >       disable_irq(lan966x->xtr_irq);
> >       lan966x->xtr_irq = -ENXIO;
> > +
> > +     if (lan966x->ana_irq) {
> > +             disable_irq(lan966x->ana_irq);
> > +             lan966x->ana_irq = -ENXIO;
> > +     }
> >  }
> >
> >  static int lan966x_probe_port(struct lan966x *lan966x, u32 p,
> > @@ -870,6 +882,15 @@ static int lan966x_probe(struct platform_device *pdev)
> >               return -ENODEV;
> >       }
> >
> > +     lan966x->ana_irq = platform_get_irq_byname(pdev, "ana");
> > +     if (lan966x->ana_irq) {
> > +             err = devm_request_threaded_irq(&pdev->dev, lan966x->ana_irq, NULL,
> > +                                             lan966x_ana_irq_handler, IRQF_ONESHOT,
> > +                                             "ana irq", lan966x);
> > +             if (err)
> > +                     return dev_err_probe(&pdev->dev, err, "Unable to use ana irq");
> > +     }
> > +
> >       /* init switch */
> >       lan966x_init(lan966x);
> >       lan966x_stats_init(lan966x);
> > @@ -923,6 +944,8 @@ static int lan966x_remove(struct platform_device *pdev)
> >       destroy_workqueue(lan966x->stats_queue);
> >       mutex_destroy(&lan966x->stats_lock);
> >
> > +     lan966x_mac_purge_entries(lan966x);
> > +
> >       return 0;
> >  }
> >
> > diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> > index 7e5a3b6f168d..ba548d65b58a 100644
> > --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> > +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> > @@ -75,6 +75,9 @@ struct lan966x {
> >
> >       u8 base_mac[ETH_ALEN];
> >
> > +     struct list_head mac_entries;
> > +     spinlock_t mac_lock; /* lock for mac_entries list */
> > +
> >       /* stats */
> >       const struct lan966x_stat_layout *stats_layout;
> >       u32 num_stats;
> > @@ -87,6 +90,7 @@ struct lan966x {
> >
> >       /* interrupts */
> >       int xtr_irq;
> > +     int ana_irq;
> >  };
> >
> >  struct lan966x_port_config {
> > @@ -141,6 +145,8 @@ int lan966x_mac_forget(struct lan966x *lan966x,
> >  int lan966x_mac_cpu_learn(struct lan966x *lan966x, const char *addr, u16 vid);
> >  int lan966x_mac_cpu_forget(struct lan966x *lan966x, const char *addr, u16 vid);
> >  void lan966x_mac_init(struct lan966x *lan966x);
> > +void lan966x_mac_purge_entries(struct lan966x *lan966x);
> > +irqreturn_t lan966x_mac_irq_handler(struct lan966x *lan966x);
> >
> >  static inline void __iomem *lan_addr(void __iomem *base[],
> >                                    int id, int tinst, int tcnt,
> > --
> > 2.33.0
> >

Horatiu Vultur Dec. 9, 2021, 3:47 p.m. UTC | #5

The 12/09/2021 13:59, Vladimir Oltean wrote:
> 
> On Thu, Dec 09, 2021 at 10:46:14AM +0100, Horatiu Vultur wrote:
> > +int lan966x_vlan_port_set_vid(struct lan966x_port *port, u16 vid,
> > +                           bool pvid, bool untagged)
> > +{
> > +     struct lan966x *lan966x = port->lan966x;
> > +
> > +     /* Egress vlan classification */
> > +     if (untagged && port->vid != vid) {
> > +             if (port->vid) {
> > +                     dev_err(lan966x->dev,
> > +                             "Port already has a native VLAN: %d\n",
> > +                             port->vid);
> > +                     return -EBUSY;
> 
> Are you interested in supporting the use case from 0da1a1c48911 ("net:
> mscc: ocelot: allow a config where all bridge VLANs are egress-untagged")?
> Because it would be good if the driver was structured that way from the
> get-go instead of patching it later.

Currently not, but I don't know what will happen in 1 month or 6
months.

> 
> > +             }
> > +             port->vid = vid;
> > +     }
> > +
> > +     /* Default ingress vlan classification */
> > +     if (pvid)
> > +             port->pvid = vid;
> > +
> > +     return 0;
> > +}

Horatiu Vultur Dec. 9, 2021, 4:43 p.m. UTC | #6

The 12/09/2021 13:36, Vladimir Oltean wrote:
> 
> On Thu, Dec 09, 2021 at 10:46:15AM +0100, Horatiu Vultur wrote:
> > This adds support for switchdev in lan966x.
> > It offloads to the HW basic forwarding and vlan filtering. To be able to
> > offload this to the HW, it is required to disable promisc mode for ports
> > that are part of the bridge.
> >
> > Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> > ---
> >  .../net/ethernet/microchip/lan966x/Makefile   |   3 +-
> >  .../ethernet/microchip/lan966x/lan966x_main.c |  41 +-
> >  .../ethernet/microchip/lan966x/lan966x_main.h |  18 +
> >  .../microchip/lan966x/lan966x_switchdev.c     | 548 ++++++++++++++++++
> >  .../ethernet/microchip/lan966x/lan966x_vlan.c |  12 +-
> >  5 files changed, 610 insertions(+), 12 deletions(-)
> >  create mode 100644 drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
> >
> > diff --git a/drivers/net/ethernet/microchip/lan966x/Makefile b/drivers/net/ethernet/microchip/lan966x/Makefile
> > index f7e6068a91cb..d82e896c2e53 100644
> > --- a/drivers/net/ethernet/microchip/lan966x/Makefile
> > +++ b/drivers/net/ethernet/microchip/lan966x/Makefile
> > @@ -6,4 +6,5 @@
> >  obj-$(CONFIG_LAN966X_SWITCH) += lan966x-switch.o
> >
> >  lan966x-switch-objs  := lan966x_main.o lan966x_phylink.o lan966x_port.o \
> > -                     lan966x_mac.o lan966x_ethtool.o lan966x_vlan.o
> > +                     lan966x_mac.o lan966x_ethtool.o lan966x_switchdev.o \
> > +                     lan966x_vlan.o
> > diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> > index 1b4c7e6b4f85..aee36c1cfa17 100644
> > --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> > +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> > @@ -306,7 +306,7 @@ static int lan966x_port_xmit(struct sk_buff *skb, struct net_device *dev)
> >       return lan966x_port_ifh_xmit(skb, ifh, dev);
> >  }
> >
> > -static void lan966x_set_promisc(struct lan966x_port *port, bool enable)
> > +void lan966x_set_promisc(struct lan966x_port *port, bool enable)
> >  {
> >       struct lan966x *lan966x = port->lan966x;
> >
> 
> My documentation of CPU_SRC_COPY_ENA says:
> 
> If set, all frames received on this port are
> copied to the CPU extraction queue given by
> CPUQ_CFG.CPUQ_SRC_COPY.
> 
> I think it was established a while ago that this isn't what promiscuous
> mode is about? Instead it is about accepting packets on a port
> regardless of whether the MAC DA is in their RX filter or not.

Yes, I am aware that this change interprets the things differently and I
am totally OK to drop this promisc if it is needed.

> 
> Hence the oddity of your change. I understand what it intends to do:
> if this is a standalone port you support IFF_UNICAST_FLT, so you drop
> frames with unknown MAC DA. But if IFF_PROMISC is set, then why do you
> copy all frames to the CPU? Why don't you just put the CPU in the
> unknown flooding mask?

Because I don't want the CPU to be in the unknown flooding mask. I want
to send frames to the CPU only if it is required.

> There's a difference between "force a packet to
> get copied to the CPU" and "copy it to the CPU only if it may have
> business there". Then this change would be compatible with bridge mode.
> You want the bridge to receive unknown traffic too, it may need to
> forward it in software.

I don't want to bridge to receive unknown traffic if I know it would
just drop it.

> 
> > @@ -318,14 +318,18 @@ static void lan966x_set_promisc(struct lan966x_port *port, bool enable)
> >  static void lan966x_port_change_rx_flags(struct net_device *dev, int flags)
> >  {
> >       struct lan966x_port *port = netdev_priv(dev);
> > +     bool enable;
> >
> >       if (!(flags & IFF_PROMISC))
> >               return;
> >
> > -     if (dev->flags & IFF_PROMISC)
> > -             lan966x_set_promisc(port, true);
> > -     else
> > -             lan966x_set_promisc(port, false);
> > +     enable = dev->flags & IFF_PROMISC ? true : false;
> > +     port->promisc = enable;
> > +
> > +     if (port->bridge)
> > +             return;
> > +
> > +     lan966x_set_promisc(port, enable);
> >  }
> >
> >  static int lan966x_port_change_mtu(struct net_device *dev, int new_mtu)
> > @@ -340,7 +344,7 @@ static int lan966x_port_change_mtu(struct net_device *dev, int new_mtu)
> >       return 0;
> >  }
> >
> > -static int lan966x_mc_unsync(struct net_device *dev, const unsigned char *addr)
> > +int lan966x_mc_unsync(struct net_device *dev, const unsigned char *addr)
> >  {
> >       struct lan966x_port *port = netdev_priv(dev);
> >       struct lan966x *lan966x = port->lan966x;
> > @@ -348,7 +352,7 @@ static int lan966x_mc_unsync(struct net_device *dev, const unsigned char *addr)
> >       return lan966x_mac_forget(lan966x, addr, port->pvid, ENTRYTYPE_LOCKED);
> >  }
> >
> > -static int lan966x_mc_sync(struct net_device *dev, const unsigned char *addr)
> > +int lan966x_mc_sync(struct net_device *dev, const unsigned char *addr)
> >  {
> >       struct lan966x_port *port = netdev_priv(dev);
> >       struct lan966x *lan966x = port->lan966x;
> > @@ -401,6 +405,11 @@ static const struct net_device_ops lan966x_port_netdev_ops = {
> >       .ndo_vlan_rx_kill_vid           = lan966x_vlan_rx_kill_vid,
> >  };
> >
> > +bool lan966x_netdevice_check(const struct net_device *dev)
> > +{
> > +     return dev && (dev->netdev_ops == &lan966x_port_netdev_ops);
> 
> Can "dev" ever be NULL?

It doesn't look like that. I will remove it.

> 
> > +}
> > +
> >  static int lan966x_port_xtr_status(struct lan966x *lan966x, u8 grp)
> >  {
> >       return lan_rd(lan966x, QS_XTR_RD(grp));
> > @@ -537,6 +546,11 @@ static irqreturn_t lan966x_xtr_irq_handler(int irq, void *args)
> >
> >               skb->protocol = eth_type_trans(skb, dev);
> >
> > +#ifdef CONFIG_NET_SWITCHDEV
> > +             if (lan966x->ports[src_port]->bridge)
> > +                     skb->offload_fwd_mark = 1;
> > +#endif
> > +
> >               netif_rx_ni(skb);
> >               dev->stats.rx_bytes += len;
> >               dev->stats.rx_packets++;
> > @@ -619,13 +633,16 @@ static int lan966x_probe_port(struct lan966x *lan966x, u32 p,
> >
> >       dev->netdev_ops = &lan966x_port_netdev_ops;
> >       dev->ethtool_ops = &lan966x_ethtool_ops;
> > +     dev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER |
> > +                         NETIF_F_RXFCS;
> > +     dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER |
> > +                      NETIF_F_HW_VLAN_CTAG_TX |
> > +                      NETIF_F_HW_VLAN_STAG_TX;
> > +     dev->priv_flags |= IFF_UNICAST_FLT;
> 
> Too many changes in one patch. IFF_UNICAST_FLT and the handling of
> promiscuous mode have nothing to do with switchdev.

Well, in this case (because it is a bug in the promisc) it has. Because
the bridge will increase the promiscuity of the interfaces, so then all
the ports will copy the frames to the CPU. Which breaks the entire
purpose of the bridge.

> Neither VLAN filtering via dev->features (should have been part of the previous
> patch), or RXFCS.

I agree, the VLAN filtering should be part of the previous patch and
RXFCS should be totally separate change.

> It seems that each one of these changes was previously
> missed and is now snuck in without the explanation it deserves.
> 
> >       dev->needed_headroom = IFH_LEN * sizeof(u32);
> >
> >       eth_hw_addr_gen(dev, lan966x->base_mac, p + 1);
> >
> > -     lan966x_mac_learn(lan966x, PGID_CPU, dev->dev_addr, port->pvid,
> > -                       ENTRYTYPE_LOCKED);
> > -
> 
> Why is this deleted? If the port uses IFF_UNICAST_FLT and isn't
> promiscuous, how does it get the unicast traffic it needs?
> I may be not realizing something because the changes aren't properly
> split.

The functionality is moved inside function 'lan966x_port_add_addr'.

> 
> >       port->phylink_config.dev = &port->dev->dev;
> >       port->phylink_config.type = PHYLINK_NETDEV;
> >       port->phylink_pcs.poll = true;
> > @@ -949,6 +966,8 @@ static int lan966x_probe(struct platform_device *pdev)
> >               lan966x_port_init(lan966x->ports[p]);
> >       }
> >
> > +     lan966x_register_notifier_blocks(lan966x);
> > +
> >       return 0;
> >
> >  cleanup_ports:
> > @@ -967,6 +986,8 @@ static int lan966x_remove(struct platform_device *pdev)
> >  {
> >       struct lan966x *lan966x = platform_get_drvdata(pdev);
> >
> > +     lan966x_unregister_notifier_blocks(lan966x);
> > +
> >       lan966x_cleanup_ports(lan966x);
> >
> >       cancel_delayed_work_sync(&lan966x->stats_work);
> > diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> > index ec3eccf634b3..4a0988087167 100644
> > --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> > +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> > @@ -80,6 +80,11 @@ struct lan966x {
> >       struct list_head mac_entries;
> >       spinlock_t mac_lock; /* lock for mac_entries list */
> >
> > +     /* Notifiers */
> > +     struct notifier_block netdevice_nb;
> > +     struct notifier_block switchdev_nb;
> > +     struct notifier_block switchdev_blocking_nb;
> > +
> >       u16 vlan_mask[VLAN_N_VID];
> >       DECLARE_BITMAP(cpu_vlan_mask, VLAN_N_VID);
> >
> > @@ -112,6 +117,10 @@ struct lan966x_port {
> >       struct net_device *dev;
> >       struct lan966x *lan966x;
> >
> > +     struct net_device *bridge;
> > +     u8 stp_state;
> > +     u8 promisc;
> > +
> >       u8 chip_port;
> >       u16 pvid;
> >       u16 vid;
> > @@ -129,6 +138,14 @@ extern const struct phylink_mac_ops lan966x_phylink_mac_ops;
> >  extern const struct phylink_pcs_ops lan966x_phylink_pcs_ops;
> >  extern const struct ethtool_ops lan966x_ethtool_ops;
> >
> > +int lan966x_mc_unsync(struct net_device *dev, const unsigned char *addr);
> > +int lan966x_mc_sync(struct net_device *dev, const unsigned char *addr);
> > +
> > +bool lan966x_netdevice_check(const struct net_device *dev);
> > +
> > +int lan966x_register_notifier_blocks(struct lan966x *lan966x);
> > +void lan966x_unregister_notifier_blocks(struct lan966x *lan966x);
> > +
> >  void lan966x_stats_get(struct net_device *dev,
> >                      struct rtnl_link_stats64 *stats);
> >  int lan966x_stats_init(struct lan966x *lan966x);
> > @@ -139,6 +156,7 @@ void lan966x_port_status_get(struct lan966x_port *port,
> >                            struct phylink_link_state *state);
> >  int lan966x_port_pcs_set(struct lan966x_port *port,
> >                        struct lan966x_port_config *config);
> > +void lan966x_set_promisc(struct lan966x_port *port, bool enable);
> >  void lan966x_port_init(struct lan966x_port *port);
> >
> >  int lan966x_mac_learn(struct lan966x *lan966x, int port,
> > diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c b/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
> > new file mode 100644
> > index 000000000000..ed6ec78d2d9a
> > --- /dev/null
> > +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
> > @@ -0,0 +1,548 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +
> > +#include <linux/if_bridge.h>
> > +#include <net/switchdev.h>
> > +
> > +#include "lan966x_main.h"
> > +
> > +static struct workqueue_struct *lan966x_owq;
> > +
> > +struct lan966x_fdb_event_work {
> > +     struct work_struct work;
> > +     struct switchdev_notifier_fdb_info fdb_info;
> > +     struct net_device *dev;
> > +     struct lan966x *lan966x;
> > +     unsigned long event;
> > +};
> > +
> > +static void lan966x_port_attr_bridge_flags(struct lan966x_port *port,
> > +                                        struct switchdev_brport_flags flags)
> > +{
> > +     u32 val = lan_rd(port->lan966x, ANA_PGID(PGID_MC));
> > +
> > +     val = ANA_PGID_PGID_GET(val);
> > +
> > +     if (flags.mask & BR_MCAST_FLOOD) {
> > +             if (flags.val & BR_MCAST_FLOOD)
> > +                     val |= BIT(port->chip_port);
> > +             else
> > +                     val &= ~BIT(port->chip_port);
> > +     }
> > +
> > +     lan_rmw(ANA_PGID_PGID_SET(val),
> > +             ANA_PGID_PGID,
> > +             port->lan966x, ANA_PGID(PGID_MC));
> > +}
> > +
> > +static u32 lan966x_get_fwd_mask(struct lan966x_port *port)
> > +{
> > +     struct net_device *bridge = port->bridge;
> > +     struct lan966x *lan966x = port->lan966x;
> > +     u8 ingress_src = port->chip_port;
> > +     u32 mask = 0;
> > +     int p;
> > +
> > +     if (port->stp_state != BR_STATE_FORWARDING)
> > +             goto skip_forwarding;
> > +
> > +     for (p = 0; p < lan966x->num_phys_ports; p++) {
> > +             port = lan966x->ports[p];
> > +
> > +             if (!port)
> > +                     continue;
> > +
> > +             if (port->stp_state == BR_STATE_FORWARDING &&
> > +                 port->bridge == bridge)
> > +                     mask |= BIT(p);
> > +     }
> > +
> > +skip_forwarding:
> > +     mask &= ~BIT(ingress_src);
> > +
> > +     return mask;
> > +}
> > +
> > +static void lan966x_update_fwd_mask(struct lan966x *lan966x)
> > +{
> > +     int p;
> > +
> > +     for (p = 0; p < lan966x->num_phys_ports; p++) {
> > +             struct lan966x_port *port = lan966x->ports[p];
> > +             unsigned long mask = 0;
> > +
> > +             if (port->bridge)
> > +                     mask = lan966x_get_fwd_mask(port);
> > +
> > +             mask |= BIT(CPU_PORT);
> > +
> > +             lan_wr(ANA_PGID_PGID_SET(mask),
> > +                    lan966x, ANA_PGID(PGID_SRC + p));
> > +     }
> > +}
> > +
> > +static void lan966x_attr_stp_state_set(struct lan966x_port *port,
> > +                                    u8 state)
> > +{
> > +     struct lan966x *lan966x = port->lan966x;
> > +     bool learn_ena = 0;
> > +
> > +     port->stp_state = state;
> > +
> > +     if (state == BR_STATE_FORWARDING || state == BR_STATE_LEARNING)
> > +             learn_ena = 1;
> 
> Please use true/false for bool types.

Will do that.

> 
> > +
> > +     lan_rmw(ANA_PORT_CFG_LEARN_ENA_SET(learn_ena),
> > +             ANA_PORT_CFG_LEARN_ENA,
> > +             lan966x, ANA_PORT_CFG(port->chip_port));
> > +
> > +     lan966x_update_fwd_mask(lan966x);
> > +}
> > +
> > +static void lan966x_port_attr_ageing_set(struct lan966x_port *port,
> > +                                      unsigned long ageing_clock_t)
> > +{
> > +     unsigned long ageing_jiffies = clock_t_to_jiffies(ageing_clock_t);
> > +     u32 ageing_time = jiffies_to_msecs(ageing_jiffies) / 1000;
> > +
> > +     lan966x_mac_set_ageing(port->lan966x, ageing_time);
> > +}
> > +
> > +static int lan966x_port_attr_set(struct net_device *dev, const void *ctx,
> > +                              const struct switchdev_attr *attr,
> > +                              struct netlink_ext_ack *extack)
> > +{
> > +     struct lan966x_port *port = netdev_priv(dev);
> > +
> > +     switch (attr->id) {
> > +     case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
> > +             lan966x_port_attr_bridge_flags(port, attr->u.brport_flags);
> > +             break;
> 
> no SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS? I think it doesn't even work
> if you don't handle that?
> 
> br_switchdev_set_port_flag():
> 
>         attr.id = SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS;
>         attr.u.brport_flags.val = flags;
>         attr.u.brport_flags.mask = mask;
> 
>         /* We run from atomic context here */
>         err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
>                                        &info.info, extack);
>         err = notifier_to_errno(err);
>         if (err == -EOPNOTSUPP)
>                 return 0;
> 
> Anyway, a big blob of "switchdev support" is hard to follow and review.
> If you add bridge port flags you could as well add more comprehensive
> support for them, but in a separate change please. Forwarding domain is
> one thing, FDB/MDB is another, VLAN is another.

Good catch. I will try to split it as you suggested.

> 
> > +     case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
> > +             lan966x_attr_stp_state_set(port, attr->u.stp_state);
> > +             break;
> > +     case SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME:
> > +             lan966x_port_attr_ageing_set(port, attr->u.ageing_time);
> > +             break;
> > +     case SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING:
> > +             lan966x_vlan_port_set_vlan_aware(port, attr->u.vlan_filtering);
> > +             lan966x_vlan_port_apply(port);
> > +             lan966x_vlan_cpu_set_vlan_aware(port);
> > +             break;
> > +     default:
> > +             return -EOPNOTSUPP;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static int lan966x_port_bridge_join(struct lan966x_port *port,
> > +                                 struct net_device *bridge,
> > +                                 struct netlink_ext_ack *extack)
> > +{
> > +     struct net_device *dev = port->dev;
> > +     int err;
> > +
> > +     err = switchdev_bridge_port_offload(dev, dev, NULL, NULL, NULL,
> > +                                         false, extack);
> > +     if (err)
> > +             return err;
> > +
> > +     port->bridge = bridge;
> > +
> > +     /* Port enters in bridge mode therefor don't need to copy to CPU
> > +      * frames for multicast in case the bridge is not requesting them
> > +      */
> > +     __dev_mc_unsync(dev, lan966x_mc_unsync);
> 
> Why is there a need to unsync the addresses, though? A driver that
> supports proper MAC table isolation between standalone ports and
> VLAN-unaware bridge ports should use separate pvids for the two modes of
> operation (if it doesn't support other isolation mechanisms apart from VLAN,
> which it looks like lan966x does not). I see that your driver even does
> this in lan966x_vlan_port_get_pvid(). So the non-bridge multicast
> addresses could sit there even when the port is in a bridge.

Really good observation.
Initially I had the same PVID for standalone and vlan-unaware bridge
ports. So then I have added all these but now that they use different
PVID I can remove this.

> 
> > +
> > +     /* make sure that the promisc is disabled when entering under the bridge
> > +      * because we don't want all the frames to come to CPU
> > +      */
> > +     lan966x_set_promisc(port, false);
> 
> What's the story here? Why don't other switchdev drivers handle promisc
> in this way (copy all frames to the CPU)?

I can't say about other switchdev drivers, but in our case we really
don't want all the frames to the CPU. The function lan966x_set_promisc
was setting the port to copy all the frames to the CPU, which we don't
want when the port is part of the bridge.

> 
> > +
> > +     return 0;
> > +}
> > +
> > +static void lan966x_port_bridge_leave(struct lan966x_port *port,
> > +                                   struct net_device *bridge)
> > +{
> > +     struct lan966x *lan966x = port->lan966x;
> > +
> > +     switchdev_bridge_port_unoffload(port->dev, NULL, NULL, NULL);
> 
> The bridge offers a facility to sync and unsync the host addresses on
> joins and leaves. For this to work, the switchdev_bridge_port_unoffload
> call should be during the NETDEV_PRECHANGEUPPER notifier event.
> 
> > +     port->bridge = NULL;
> > +
> > +     /* Set the port back to host mode */
> > +     lan966x_vlan_port_set_vlan_aware(port, 0);
> 
> s/0/false/
> 
> > +     lan966x_vlan_port_set_vid(port, HOST_PVID, false, false);
> > +     lan966x_vlan_port_apply(port);
> > +
> > +     lan966x_mac_cpu_learn(lan966x, port->dev->dev_addr, HOST_PVID);
> > +
> > +     /* Port enters in host more therefore restore mc list */
> > +     __dev_mc_sync(port->dev, lan966x_mc_sync, lan966x_mc_unsync);
> > +
> > +     /* Restore back the promisc as it was before the interfaces was added to
> > +      * the bridge
> > +      */
> > +     lan966x_set_promisc(port, port->promisc);
> > +}
> > +
> > +static int lan966x_port_changeupper(struct net_device *dev,
> > +                                 struct netdev_notifier_changeupper_info *info)
> > +{
> > +     struct lan966x_port *port = netdev_priv(dev);
> > +     struct netlink_ext_ack *extack;
> > +     int err = 0;
> > +
> > +     extack = netdev_notifier_info_to_extack(&info->info);
> > +
> > +     if (netif_is_bridge_master(info->upper_dev)) {
> > +             if (info->linking)
> > +                     err = lan966x_port_bridge_join(port, info->upper_dev,
> > +                                                    extack);
> > +             else
> > +                     lan966x_port_bridge_leave(port, info->upper_dev);
> > +     }
> > +
> > +     return err;
> > +}
> > +
> > +static int lan966x_port_add_addr(struct net_device *dev, bool up)
> > +{
> > +     struct lan966x_port *port = netdev_priv(dev);
> > +     struct lan966x *lan966x = port->lan966x;
> > +     u16 vid;
> > +
> > +     vid = lan966x_vlan_port_get_pvid(port);
> > +
> > +     if (up)
> > +             lan966x_mac_cpu_learn(lan966x, dev->dev_addr, vid);
> > +     else
> > +             lan966x_mac_cpu_forget(lan966x, dev->dev_addr, vid);
> 
> For which uppers is this intended? The bridge? Because the bridge
> notifies you of all the entries it needs, if you also consider the
> replayed events and provide non-NULL pointers to
> switchdev_bridge_port_offload.

No, this up just means if the port is up or down. Doesn't have anything
to do with the bridge. This code replace the code from
lan966x_probe_port.

> 
> > +
> > +     return 0;
> > +}
> > +
> > +static int lan966x_netdevice_port_event(struct net_device *dev,
> > +                                     struct notifier_block *nb,
> > +                                     unsigned long event, void *ptr)
> > +{
> > +     int err = 0;
> > +
> > +     if (!lan966x_netdevice_check(dev))
> > +             return 0;
> > +
> > +     switch (event) {
> > +     case NETDEV_CHANGEUPPER:
> > +             err = lan966x_port_changeupper(dev, ptr);
> > +             break;
> > +     case NETDEV_PRE_UP:
> > +             err = lan966x_port_add_addr(dev, true);
> > +             break;
> > +     case NETDEV_DOWN:
> > +             err = lan966x_port_add_addr(dev, false);
> > +             break;
> > +     }
> > +
> > +     return err;
> > +}
> > +
> > +static int lan966x_netdevice_event(struct notifier_block *nb,
> > +                                unsigned long event, void *ptr)
> > +{
> > +     struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> > +     int ret;
> > +
> > +     ret = lan966x_netdevice_port_event(dev, nb, event, ptr);
> > +
> > +     return notifier_from_errno(ret);
> > +}
> > +
> > +static void lan966x_fdb_event_work(struct work_struct *work)
> > +{
> > +     struct lan966x_fdb_event_work *fdb_work =
> > +             container_of(work, struct lan966x_fdb_event_work, work);
> > +     struct switchdev_notifier_fdb_info *fdb_info;
> > +     struct net_device *dev = fdb_work->dev;
> > +     struct lan966x_port *port;
> > +     struct lan966x *lan966x;
> > +
> > +     rtnl_lock();
> 
> rtnl_lock() shouldn't be needed.
> 
> > +
> > +     fdb_info = &fdb_work->fdb_info;
> > +     lan966x = fdb_work->lan966x;
> > +
> > +     if (lan966x_netdevice_check(dev)) {
> > +             port = netdev_priv(dev);
> > +
> > +             switch (fdb_work->event) {
> > +             case SWITCHDEV_FDB_ADD_TO_DEVICE:
> > +                     if (!fdb_info->added_by_user)
> > +                             break;
> 
> If you get notified of a MAC address dynamically learned by the software
> bridge on a lan966x port, you will have allocated memory for the work
> item, and scheduled it, for nothing. Please try to exit unnecessary work
> early.

Yes, I will add an early check.

> 
> > +                     lan966x_mac_add_entry(lan966x, port, fdb_info->addr,
> > +                                           fdb_info->vid);
> > +                     break;
> > +             case SWITCHDEV_FDB_DEL_TO_DEVICE:
> > +                     if (!fdb_info->added_by_user)
> > +                             break;
> > +                     lan966x_mac_del_entry(lan966x, fdb_info->addr, fdb_info->vid);
> > +                     break;
> > +             }
> > +     } else {
> > +             if (!netif_is_bridge_master(dev))
> > +                     goto out;
> > +
> > +             /* If the CPU is not part of the vlan then there is no point
> > +              * to copy the frames to the CPU because they will be dropped
> > +              */
> > +             if (!lan966x_vlan_cpu_member_vlan_mask(lan966x, fdb_info->vid))
> > +                     goto out;
> 
> It isn't part of the VLAN now, but what about later? I don't see that
> you keep these FDB entries anywhere and restore them when a port joins
> that VLAN.

Actually I do that. It is inside lan966x_vlan_port_add_vlan. There I
check if the CPU is part of the VLAN then add that entry.

> 
> > +
> > +             /* In case the bridge is called */
> > +             switch (fdb_work->event) {
> > +             case SWITCHDEV_FDB_ADD_TO_DEVICE:
> > +                     /* If there is no front port in this vlan, there is no
> > +                      * point to copy the frame to CPU because it would be
> > +                      * just dropped at later point. So add it only if
> > +                      * there is a port
> > +                      */
> > +                     if (!lan966x_vlan_port_any_vlan_mask(lan966x, fdb_info->vid))
> > +                             break;
> > +
> > +                     lan966x_mac_cpu_learn(lan966x, fdb_info->addr, fdb_info->vid);
> 
> Does the lan966x_mac_cpu_learn() operation trigger interrupts, or only
> the dynamic learning process?

Only dynamic learning process.

> How do you handle migration of an FDB entry pointing towards the CPU,
> towards one pointing towards a port?

Shouldn't I get 2 calls that the entry is removed from CPU and then
added to a port?

> 
> > +                     break;
> > +             case SWITCHDEV_FDB_DEL_TO_DEVICE:
> > +                     /* It is OK to always forget the entry it */
> 
> Forget the entry it?

It was a typo. I will remove the 'it'.

> 
> > +                     lan966x_mac_cpu_forget(lan966x, fdb_info->addr, fdb_info->vid);
> > +                     break;
> > +             }
> > +     }
> > +
> > +out:
> > +     rtnl_unlock();
> > +     kfree(fdb_work->fdb_info.addr);
> > +     kfree(fdb_work);
> > +     dev_put(dev);
> > +}
> > +
> > +static int lan966x_switchdev_event(struct notifier_block *nb,
> > +                                unsigned long event, void *ptr)
> > +{
> > +     struct lan966x *lan966x = container_of(nb, struct lan966x, switchdev_nb);
> > +     struct net_device *dev = switchdev_notifier_info_to_dev(ptr);
> > +     struct switchdev_notifier_fdb_info *fdb_info;
> > +     struct switchdev_notifier_info *info = ptr;
> > +     struct lan966x_fdb_event_work *fdb_work;
> > +     int err;
> > +
> > +     switch (event) {
> > +     case SWITCHDEV_PORT_ATTR_SET:
> > +             err = switchdev_handle_port_attr_set(dev, ptr,
> > +                                                  lan966x_netdevice_check,
> > +                                                  lan966x_port_attr_set);
> > +             return notifier_from_errno(err);
> > +     case SWITCHDEV_FDB_ADD_TO_DEVICE:
> > +             fallthrough;
> 
> "fallthrough;" not needed here.
> 
> > +     case SWITCHDEV_FDB_DEL_TO_DEVICE:
> > +             fdb_work = kzalloc(sizeof(*fdb_work), GFP_ATOMIC);
> > +             if (!fdb_work)
> > +                     return NOTIFY_BAD;
> > +
> > +             fdb_info = container_of(info,
> > +                                     struct switchdev_notifier_fdb_info,
> > +                                     info);
> > +
> > +             fdb_work->dev = dev;
> > +             fdb_work->lan966x = lan966x;
> > +             fdb_work->event = event;
> > +             INIT_WORK(&fdb_work->work, lan966x_fdb_event_work);
> > +             memcpy(&fdb_work->fdb_info, ptr, sizeof(fdb_work->fdb_info));
> > +             fdb_work->fdb_info.addr = kzalloc(ETH_ALEN, GFP_ATOMIC);
> > +             if (!fdb_work->fdb_info.addr)
> > +                     goto err_addr_alloc;
> > +
> > +             ether_addr_copy((u8 *)fdb_work->fdb_info.addr, fdb_info->addr);
> > +             dev_hold(dev);
> > +
> > +             queue_work(lan966x_owq, &fdb_work->work);
> > +             break;
> > +     }
> > +
> > +     return NOTIFY_DONE;
> > +err_addr_alloc:
> > +     kfree(fdb_work);
> > +     return NOTIFY_BAD;
> > +}
> > +
> > +static int lan966x_handle_port_vlan_add(struct net_device *dev,
> > +                                     struct notifier_block *nb,
> > +                                     const struct switchdev_obj_port_vlan *v)
> > +{
> > +     struct lan966x_port *port;
> > +     struct lan966x *lan966x;
> > +
> > +     /* When adding a port to a vlan, we get a callback for the port but
> > +      * also for the bridge. When get the callback for the bridge just bail
> > +      * out. Then when the bridge is added to the vlan, then we get a
> > +      * callback here but in this case the flags has set:
> > +      * BRIDGE_VLAN_INFO_BRENTRY. In this case it means that the CPU
> > +      * port is added to the vlan, so the broadcast frames and unicast frames
> > +      * with dmac of the bridge should be foward to CPU.
> > +      */
> > +     if (netif_is_bridge_master(dev) &&
> > +         !(v->flags & BRIDGE_VLAN_INFO_BRENTRY))
> > +             return 0;
> > +
> > +     lan966x = container_of(nb, struct lan966x, switchdev_blocking_nb);
> > +
> > +     /* In case the port gets called */
> > +     if (!(netif_is_bridge_master(dev))) {
> > +             if (!lan966x_netdevice_check(dev))
> > +                     return -EOPNOTSUPP;
> > +
> > +             port = netdev_priv(dev);
> > +             return lan966x_vlan_port_add_vlan(port, v->vid,
> > +                                               v->flags & BRIDGE_VLAN_INFO_PVID,
> > +                                               v->flags & BRIDGE_VLAN_INFO_UNTAGGED);
> > +     }
> > +
> > +     /* In case the bridge gets called */
> > +     if (netif_is_bridge_master(dev))
> > +             return lan966x_vlan_cpu_add_vlan(lan966x, dev, v->vid);
> > +
> > +     return 0;
> > +}
> > +
> > +static int lan966x_handle_port_obj_add(struct net_device *dev,
> > +                                    struct notifier_block *nb,
> > +                                    struct switchdev_notifier_port_obj_info *info)
> > +{
> > +     const struct switchdev_obj *obj = info->obj;
> > +     int err;
> > +
> > +     switch (obj->id) {
> > +     case SWITCHDEV_OBJ_ID_PORT_VLAN:
> > +             err = lan966x_handle_port_vlan_add(dev, nb,
> > +                                                SWITCHDEV_OBJ_PORT_VLAN(obj));
> > +             break;
> > +     default:
> > +             err = -EOPNOTSUPP;
> > +             break;
> > +     }
> > +
> > +     info->handled = true;
> > +     return err;
> > +}
> > +
> > +static int lan966x_handle_port_vlan_del(struct net_device *dev,
> > +                                     struct notifier_block *nb,
> > +                                     const struct switchdev_obj_port_vlan *v)
> > +{
> > +     struct lan966x_port *port;
> > +     struct lan966x *lan966x;
> > +
> > +     lan966x = container_of(nb, struct lan966x, switchdev_blocking_nb);
> > +
> > +     /* In case the physical port gets called */
> > +     if (!netif_is_bridge_master(dev)) {
> > +             if (!lan966x_netdevice_check(dev))
> > +                     return -EOPNOTSUPP;
> > +
> > +             port = netdev_priv(dev);
> > +             return lan966x_vlan_port_del_vlan(port, v->vid);
> > +     }
> > +
> > +     /* In case the bridge gets called */
> > +     if (netif_is_bridge_master(dev))
> > +             return lan966x_vlan_cpu_del_vlan(lan966x, dev, v->vid);
> > +
> > +     return 0;
> > +}
> > +
> > +static int lan966x_handle_port_obj_del(struct net_device *dev,
> > +                                    struct notifier_block *nb,
> > +                                    struct switchdev_notifier_port_obj_info *info)
> > +{
> > +     const struct switchdev_obj *obj = info->obj;
> > +     int err;
> > +
> > +     switch (obj->id) {
> > +     case SWITCHDEV_OBJ_ID_PORT_VLAN:
> > +             err = lan966x_handle_port_vlan_del(dev, nb,
> > +                                                SWITCHDEV_OBJ_PORT_VLAN(obj));
> > +             break;
> > +     default:
> > +             err = -EOPNOTSUPP;
> > +             break;
> > +     }
> > +
> > +     info->handled = true;
> > +     return err;
> > +}
> > +
> > +static int lan966x_switchdev_blocking_event(struct notifier_block *nb,
> > +                                         unsigned long event,
> > +                                         void *ptr)
> > +{
> > +     struct net_device *dev = switchdev_notifier_info_to_dev(ptr);
> > +     int err;
> > +
> > +     switch (event) {
> > +     case SWITCHDEV_PORT_OBJ_ADD:
> > +             err = lan966x_handle_port_obj_add(dev, nb, ptr);
> > +             return notifier_from_errno(err);
> > +     case SWITCHDEV_PORT_OBJ_DEL:
> > +             err = lan966x_handle_port_obj_del(dev, nb, ptr);
> > +             return notifier_from_errno(err);
> > +     case SWITCHDEV_PORT_ATTR_SET:
> > +             err = switchdev_handle_port_attr_set(dev, ptr,
> > +                                                  lan966x_netdevice_check,
> > +                                                  lan966x_port_attr_set);
> > +             return notifier_from_errno(err);
> > +     }
> > +
> > +     return NOTIFY_DONE;
> > +}
> > +
> > +int lan966x_register_notifier_blocks(struct lan966x *lan966x)
> > +{
> > +     int err;
> > +
> > +     lan966x->netdevice_nb.notifier_call = lan966x_netdevice_event;
> > +     err = register_netdevice_notifier(&lan966x->netdevice_nb);
> > +     if (err)
> > +             return err;
> > +
> > +     lan966x->switchdev_nb.notifier_call = lan966x_switchdev_event;
> > +     err = register_switchdev_notifier(&lan966x->switchdev_nb);
> > +     if (err)
> > +             goto err_switchdev_nb;
> > +
> > +     lan966x->switchdev_blocking_nb.notifier_call = lan966x_switchdev_blocking_event;
> > +     err = register_switchdev_blocking_notifier(&lan966x->switchdev_blocking_nb);
> > +     if (err)
> > +             goto err_switchdev_blocking_nb;
> > +
> > +     lan966x_owq = alloc_ordered_workqueue("lan966x_order", 0);
> > +     if (!lan966x_owq) {
> > +             err = -ENOMEM;
> > +             goto err_switchdev_blocking_nb;
> > +     }
> 
> These should be singleton objects, otherwise things get problematic if
> you have more than one switch device instantiated in the system.

Yes, I will update this.

> 
> > +
> > +     return 0;
> > +
> > +err_switchdev_blocking_nb:
> > +     unregister_switchdev_notifier(&lan966x->switchdev_nb);
> > +err_switchdev_nb:
> > +     unregister_netdevice_notifier(&lan966x->netdevice_nb);
> > +
> > +     return err;
> > +}
> > +
> > +void lan966x_unregister_notifier_blocks(struct lan966x *lan966x)
> > +{
> > +     destroy_workqueue(lan966x_owq);
> > +
> > +     unregister_switchdev_blocking_notifier(&lan966x->switchdev_blocking_nb);
> > +     unregister_switchdev_notifier(&lan966x->switchdev_nb);
> > +     unregister_netdevice_notifier(&lan966x->netdevice_nb);
> > +}
> > diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_vlan.c b/drivers/net/ethernet/microchip/lan966x/lan966x_vlan.c
> > index e47552775d06..26644503b4e6 100644
> > --- a/drivers/net/ethernet/microchip/lan966x/lan966x_vlan.c
> > +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_vlan.c
> > @@ -155,6 +155,9 @@ static bool lan966x_vlan_cpu_member_cpu_vlan_mask(struct lan966x *lan966x, u16 v
> >
> >  u16 lan966x_vlan_port_get_pvid(struct lan966x_port *port)
> >  {
> > +     if (!port->bridge)
> > +             return HOST_PVID;
> > +
> >       return port->vlan_aware ? port->pvid : UNAWARE_PVID;
> >  }
> >
> > @@ -210,6 +213,8 @@ void lan966x_vlan_cpu_set_vlan_aware(struct lan966x_port *port)
> >                * table for the front port and the CPU
> >                */
> >               lan966x_mac_cpu_learn(lan966x, port->dev->dev_addr, UNAWARE_PVID);
> > +             lan966x_mac_cpu_learn(lan966x, port->bridge->dev_addr,
> > +                                   UNAWARE_PVID);
> >
> >               lan966x_vlan_port_add_vlan_mask(port, UNAWARE_PVID);
> >               lan966x_vlan_port_apply(port);
> > @@ -218,6 +223,8 @@ void lan966x_vlan_cpu_set_vlan_aware(struct lan966x_port *port)
> >                * to vlan unaware
> >                */
> >               lan966x_mac_cpu_forget(lan966x, port->dev->dev_addr, UNAWARE_PVID);
> > +             lan966x_mac_cpu_forget(lan966x, port->bridge->dev_addr,
> > +                                    UNAWARE_PVID);
> >
> >               lan966x_vlan_port_del_vlan_mask(port, UNAWARE_PVID);
> >               lan966x_vlan_port_apply(port);
> > @@ -293,6 +300,7 @@ int lan966x_vlan_port_add_vlan(struct lan966x_port *port,
> >        */
> >       if (lan966x_vlan_cpu_member_cpu_vlan_mask(lan966x, vid)) {
> >               lan966x_mac_cpu_learn(lan966x, port->dev->dev_addr, vid);
> > +             lan966x_mac_cpu_learn(lan966x, port->bridge->dev_addr, vid);
> >               lan966x_vlan_cpu_add_vlan_mask(lan966x, vid);
> >       }
> >
> > @@ -322,8 +330,10 @@ int lan966x_vlan_port_del_vlan(struct lan966x_port *port,
> >        * that vlan but still keep it in the mask because it may be needed
> >        * again then another port gets added in tha vlan
> >        */
> > -     if (!lan966x_vlan_port_any_vlan_mask(lan966x, vid))
> > +     if (!lan966x_vlan_port_any_vlan_mask(lan966x, vid)) {
> > +             lan966x_mac_cpu_forget(lan966x, port->bridge->dev_addr, vid);
> >               lan966x_vlan_cpu_del_vlan_mask(lan966x, vid);
> > +     }
> >
> >       return 0;
> >  }
> > --
> > 2.33.0
> >

Horatiu Vultur Dec. 13, 2021, 10:25 a.m. UTC | #7

The 12/09/2021 17:43, Horatiu Vultur wrote:
> > > +int lan966x_register_notifier_blocks(struct lan966x *lan966x)
> > > +{
> > > +     int err;
> > > +
> > > +     lan966x->netdevice_nb.notifier_call = lan966x_netdevice_event;
> > > +     err = register_netdevice_notifier(&lan966x->netdevice_nb);
> > > +     if (err)
> > > +             return err;
> > > +
> > > +     lan966x->switchdev_nb.notifier_call = lan966x_switchdev_event;
> > > +     err = register_switchdev_notifier(&lan966x->switchdev_nb);
> > > +     if (err)
> > > +             goto err_switchdev_nb;
> > > +
> > > +     lan966x->switchdev_blocking_nb.notifier_call = lan966x_switchdev_blocking_event;
> > > +     err = register_switchdev_blocking_notifier(&lan966x->switchdev_blocking_nb);
> > > +     if (err)
> > > +             goto err_switchdev_blocking_nb;
> > > +
> > > +     lan966x_owq = alloc_ordered_workqueue("lan966x_order", 0);
> > > +     if (!lan966x_owq) {
> > > +             err = -ENOMEM;
> > > +             goto err_switchdev_blocking_nb;
> > > +     }
> > 
> > These should be singleton objects, otherwise things get problematic if
> > you have more than one switch device instantiated in the system.
> 
> Yes, I will update this.

Actually I think they need to be part of lan966x.
Because we want each lan966x instance to be independent of each other.
This is not seen in this version but is more clear in the next version
(v4).

> 
> > 
> 
> -- 
> /Horatiu

Vladimir Oltean Dec. 13, 2021, 11:46 a.m. UTC | #8

On Thu, Dec 09, 2021 at 05:43:11PM +0100, Horatiu Vultur wrote:
> > My documentation of CPU_SRC_COPY_ENA says:
> > 
> > If set, all frames received on this port are
> > copied to the CPU extraction queue given by
> > CPUQ_CFG.CPUQ_SRC_COPY.
> > 
> > I think it was established a while ago that this isn't what promiscuous
> > mode is about? Instead it is about accepting packets on a port
> > regardless of whether the MAC DA is in their RX filter or not.
> 
> Yes, I am aware that this change interprets the things differently and I
> am totally OK to drop this promisc if it is needed.

I think we just need to agree on the observable behavior. Promiscuous
means for an interface to receive packets with unknown destination, and
while in standalone mode you do support that, in bridge mode you're a
bit on the edge: the port accepts them but will deliver them anywhere
except to the CPU. I suppose you could try to make an argument that you
know better than the bridge, and as long as the use cases for that are
restricted enough, maybe it could work for most scenarios. I don't know.

> > Hence the oddity of your change. I understand what it intends to do:
> > if this is a standalone port you support IFF_UNICAST_FLT, so you drop
> > frames with unknown MAC DA. But if IFF_PROMISC is set, then why do you
> > copy all frames to the CPU? Why don't you just put the CPU in the
> > unknown flooding mask?
> 
> Because I don't want the CPU to be in the unknown flooding mask. I want
> to send frames to the CPU only if it is required.

What is the strategy through which this driver accepts things like
pinging the bridge device over IPv6, with the Neighbor Discovery
protocol having the ICMP6 neighbor solicitation messages delivered to
(according to my knowledge) an unregistered IPv6 multicast address?
Whose responsibility is it to notify the driver of that address, and
does the driver copy those packets to the CPU in the right VLAN?

> > How do you handle migration of an FDB entry pointing towards the CPU,
> > towards one pointing towards a port?
> 
> Shouldn't I get 2 calls that the entry is removed from CPU and then
> added to a port?

Ok.

Horatiu Vultur Dec. 13, 2021, 1:37 p.m. UTC | #9

The 12/13/2021 11:46, Vladimir Oltean wrote:
> 
> On Thu, Dec 09, 2021 at 05:43:11PM +0100, Horatiu Vultur wrote:
> > > My documentation of CPU_SRC_COPY_ENA says:
> > >
> > > If set, all frames received on this port are
> > > copied to the CPU extraction queue given by
> > > CPUQ_CFG.CPUQ_SRC_COPY.
> > >
> > > I think it was established a while ago that this isn't what promiscuous
> > > mode is about? Instead it is about accepting packets on a port
> > > regardless of whether the MAC DA is in their RX filter or not.
> >
> > Yes, I am aware that this change interprets the things differently and I
> > am totally OK to drop this promisc if it is needed.
> 
> I think we just need to agree on the observable behavior. Promiscuous
> means for an interface to receive packets with unknown destination, and
> while in standalone mode you do support that, in bridge mode you're a
> bit on the edge: the port accepts them but will deliver them anywhere
> except to the CPU. I suppose you could try to make an argument that you
> know better than the bridge, and as long as the use cases for that are
> restricted enough, maybe it could work for most scenarios. I don't know.

I think this requires some proper explanations of the intended
behaviour for both the standalone and bridge mode. I will drop this
promisc for now, as other drivers are doing it and at a later point
send some patch series with all the explanations.

> 
> > > Hence the oddity of your change. I understand what it intends to do:
> > > if this is a standalone port you support IFF_UNICAST_FLT, so you drop
> > > frames with unknown MAC DA. But if IFF_PROMISC is set, then why do you
> > > copy all frames to the CPU? Why don't you just put the CPU in the
> > > unknown flooding mask?
> >
> > Because I don't want the CPU to be in the unknown flooding mask. I want
> > to send frames to the CPU only if it is required.
> 
> What is the strategy through which this driver accepts things like
> pinging the bridge device over IPv6, with the Neighbor Discovery
> protocol having the ICMP6 neighbor solicitation messages delivered to
> (according to my knowledge) an unregistered IPv6 multicast address?
> Whose responsibility is it to notify the driver of that address, and
> does the driver copy those packets to the CPU in the right VLAN?

I think in that case the CPU should be part of the multicast flooding
mask. I will need to look more on this because I don't know much about
the IPv6.

> 
> > > How do you handle migration of an FDB entry pointing towards the CPU,
> > > towards one pointing towards a port?
> >
> > Shouldn't I get 2 calls that the entry is removed from CPU and then
> > added to a port?
> 
> Ok.

Vladimir Oltean Dec. 13, 2021, 1:43 p.m. UTC | #10

On Mon, Dec 13, 2021 at 11:25:29AM +0100, Horatiu Vultur wrote:
> The 12/09/2021 17:43, Horatiu Vultur wrote:
> > > > +int lan966x_register_notifier_blocks(struct lan966x *lan966x)
> > > > +{
> > > > +     int err;
> > > > +
> > > > +     lan966x->netdevice_nb.notifier_call = lan966x_netdevice_event;
> > > > +     err = register_netdevice_notifier(&lan966x->netdevice_nb);
> > > > +     if (err)
> > > > +             return err;
> > > > +
> > > > +     lan966x->switchdev_nb.notifier_call = lan966x_switchdev_event;
> > > > +     err = register_switchdev_notifier(&lan966x->switchdev_nb);
> > > > +     if (err)
> > > > +             goto err_switchdev_nb;
> > > > +
> > > > +     lan966x->switchdev_blocking_nb.notifier_call = lan966x_switchdev_blocking_event;
> > > > +     err = register_switchdev_blocking_notifier(&lan966x->switchdev_blocking_nb);
> > > > +     if (err)
> > > > +             goto err_switchdev_blocking_nb;
> > > > +
> > > > +     lan966x_owq = alloc_ordered_workqueue("lan966x_order", 0);
> > > > +     if (!lan966x_owq) {
> > > > +             err = -ENOMEM;
> > > > +             goto err_switchdev_blocking_nb;
> > > > +     }
> > > 
> > > These should be singleton objects, otherwise things get problematic if
> > > you have more than one switch device instantiated in the system.
> > 
> > Yes, I will update this.
> 
> Actually I think they need to be part of lan966x.
> Because we want each lan966x instance to be independent of each other.
> This is not seen in this version but is more clear in the next version
> (v4).

They are independent of each other. You deduce the interface on which
the notifier was emitted using switchdev_notifier_info_to_dev() and act
upon it, if lan966x_netdevice_check() is true. The notifier handling
code itself is stateless, all the state is per port / per switch.
If you register one notifier handler per switch, lan966x_netdevice_check()
would return true for each notifier handler instance, and you would
handle each event twice, would you not? This is why I'm saying that the
notifier handlers should be registered as singletons, like other drivers
do.

Horatiu Vultur Dec. 13, 2021, 2:26 p.m. UTC | #11

The 12/13/2021 13:43, Vladimir Oltean wrote:
> 
> On Mon, Dec 13, 2021 at 11:25:29AM +0100, Horatiu Vultur wrote:
> > The 12/09/2021 17:43, Horatiu Vultur wrote:
> > > > > +int lan966x_register_notifier_blocks(struct lan966x *lan966x)
> > > > > +{
> > > > > +     int err;
> > > > > +
> > > > > +     lan966x->netdevice_nb.notifier_call = lan966x_netdevice_event;
> > > > > +     err = register_netdevice_notifier(&lan966x->netdevice_nb);
> > > > > +     if (err)
> > > > > +             return err;
> > > > > +
> > > > > +     lan966x->switchdev_nb.notifier_call = lan966x_switchdev_event;
> > > > > +     err = register_switchdev_notifier(&lan966x->switchdev_nb);
> > > > > +     if (err)
> > > > > +             goto err_switchdev_nb;
> > > > > +
> > > > > +     lan966x->switchdev_blocking_nb.notifier_call = lan966x_switchdev_blocking_event;
> > > > > +     err = register_switchdev_blocking_notifier(&lan966x->switchdev_blocking_nb);
> > > > > +     if (err)
> > > > > +             goto err_switchdev_blocking_nb;
> > > > > +
> > > > > +     lan966x_owq = alloc_ordered_workqueue("lan966x_order", 0);
> > > > > +     if (!lan966x_owq) {
> > > > > +             err = -ENOMEM;
> > > > > +             goto err_switchdev_blocking_nb;
> > > > > +     }
> > > >
> > > > These should be singleton objects, otherwise things get problematic if
> > > > you have more than one switch device instantiated in the system.
> > >
> > > Yes, I will update this.
> >
> > Actually I think they need to be part of lan966x.
> > Because we want each lan966x instance to be independent of each other.
> > This is not seen in this version but is more clear in the next version
> > (v4).
> 
> They are independent of each other. You deduce the interface on which
> the notifier was emitted using switchdev_notifier_info_to_dev() and act
> upon it, if lan966x_netdevice_check() is true. The notifier handling
> code itself is stateless, all the state is per port / per switch.
> If you register one notifier handler per switch, lan966x_netdevice_check()
> would return true for each notifier handler instance, and you would
> handle each event twice, would you not?

That is correct, I will get the event twice which is a problem in the
lan966x. The function lan966x_netdevice_check should be per instance, in
this way each instance can filter the events.
The reason why I am putting the notifier_block inside lan966x is to be
able to get to the instance of lan966x even if I get a event that is not
for lan966x port.

> notifier handlers should be registered as singletons, like other drivers
> do.

It looks like not all the other driver register them as singletone. For
example: prestera, mlx5, sparx5. (I just have done a git grep for
register_switchdev_notifier, I have not looked in details at the
implementation).

Vladimir Oltean Dec. 13, 2021, 2:29 p.m. UTC | #12

On Mon, Dec 13, 2021 at 03:26:56PM +0100, Horatiu Vultur wrote:
> > They are independent of each other. You deduce the interface on which
> > the notifier was emitted using switchdev_notifier_info_to_dev() and act
> > upon it, if lan966x_netdevice_check() is true. The notifier handling
> > code itself is stateless, all the state is per port / per switch.
> > If you register one notifier handler per switch, lan966x_netdevice_check()
> > would return true for each notifier handler instance, and you would
> > handle each event twice, would you not?
> 
> That is correct, I will get the event twice which is a problem in the
> lan966x. The function lan966x_netdevice_check should be per instance, in
> this way each instance can filter the events.
> The reason why I am putting the notifier_block inside lan966x is to be
> able to get to the instance of lan966x even if I get a event that is not
> for lan966x port.

That isn't a problem, every netdevice notifier still sees all events.
DSA intercepts a lot of events which aren't directly emitted for its own
interfaces. You don't gain much by having one more, if anything.

> > notifier handlers should be registered as singletons, like other drivers
> > do.
> 
> It looks like not all the other driver register them as singletone. For
> example: prestera, mlx5, sparx5. (I just have done a git grep for
> register_switchdev_notifier, I have not looked in details at the
> implementation).

Not all driver writers may have realized that it is an issue that needs
to be thought of.

Horatiu Vultur Dec. 13, 2021, 3:28 p.m. UTC | #13

The 12/13/2021 14:29, Vladimir Oltean wrote:
> 
> On Mon, Dec 13, 2021 at 03:26:56PM +0100, Horatiu Vultur wrote:
> > > They are independent of each other. You deduce the interface on which
> > > the notifier was emitted using switchdev_notifier_info_to_dev() and act
> > > upon it, if lan966x_netdevice_check() is true. The notifier handling
> > > code itself is stateless, all the state is per port / per switch.
> > > If you register one notifier handler per switch, lan966x_netdevice_check()
> > > would return true for each notifier handler instance, and you would
> > > handle each event twice, would you not?
> >
> > That is correct, I will get the event twice which is a problem in the
> > lan966x. The function lan966x_netdevice_check should be per instance, in
> > this way each instance can filter the events.
> > The reason why I am putting the notifier_block inside lan966x is to be
> > able to get to the instance of lan966x even if I get a event that is not
> > for lan966x port.
> 
> That isn't a problem, every netdevice notifier still sees all events.

Yes, that is correct.
Sorry maybe I am still confused, but some things are still not right.

So lets say there are two lan966x instances(A and B) and each one has 2
ports(ethA0, ethA1, ethB0, ethB1).
So with the current behaviour, if for example ethA0 is added in vlan
100, then we get two callbacks for each lan966x instance(A and B) because
each of them registered. And because of lan966x_netdevice_check() is true
for ethA0 will do twice the work.
And you propose to have a singleton notifier so we get only 1 callback
and will be fine for this case. But if you add for example the bridge in
vlan 200 then I will never be able to get to the lan966x instance which
is needed in this case.

That is why if the lan966x_netdevice_check would be per instance, then
we can filter like before, we still get call twice but then we filter for
each instance. We get the lan966x instance from notifier_block and then
we can check if the port netdev_ops is the same as the lan966x
netdev_ops.

And in the other case we will still be able to get to the lan966x instance
in case the bridge is added in a vlan.

> DSA intercepts a lot of events which aren't directly emitted for its own
> interfaces. You don't gain much by having one more, if anything.
> 
> > > notifier handlers should be registered as singletons, like other drivers
> > > do.
> >
> > It looks like not all the other driver register them as singletone. For
> > example: prestera, mlx5, sparx5. (I just have done a git grep for
> > register_switchdev_notifier, I have not looked in details at the
> > implementation).
> 
> Not all driver writers may have realized that it is an issue that needs
> to be thought of.

Vladimir Oltean Dec. 13, 2021, 4:25 p.m. UTC | #14

On Mon, Dec 13, 2021 at 04:28:24PM +0100, Horatiu Vultur wrote:
> The 12/13/2021 14:29, Vladimir Oltean wrote:
> > 
> > On Mon, Dec 13, 2021 at 03:26:56PM +0100, Horatiu Vultur wrote:
> > > > They are independent of each other. You deduce the interface on which
> > > > the notifier was emitted using switchdev_notifier_info_to_dev() and act
> > > > upon it, if lan966x_netdevice_check() is true. The notifier handling
> > > > code itself is stateless, all the state is per port / per switch.
> > > > If you register one notifier handler per switch, lan966x_netdevice_check()
> > > > would return true for each notifier handler instance, and you would
> > > > handle each event twice, would you not?
> > >
> > > That is correct, I will get the event twice which is a problem in the
> > > lan966x. The function lan966x_netdevice_check should be per instance, in
> > > this way each instance can filter the events.
> > > The reason why I am putting the notifier_block inside lan966x is to be
> > > able to get to the instance of lan966x even if I get a event that is not
> > > for lan966x port.
> > 
> > That isn't a problem, every netdevice notifier still sees all events.
> 
> Yes, that is correct.
> Sorry maybe I am still confused, but some things are still not right.
> 
> So lets say there are two lan966x instances(A and B) and each one has 2
> ports(ethA0, ethA1, ethB0, ethB1).
> So with the current behaviour, if for example ethA0 is added in vlan
> 100, then we get two callbacks for each lan966x instance(A and B) because
> each of them registered. And because of lan966x_netdevice_check() is true
> for ethA0 will do twice the work.
> And you propose to have a singleton notifier so we get only 1 callback
> and will be fine for this case. But if you add for example the bridge in
> vlan 200 then I will never be able to get to the lan966x instance which
> is needed in this case.

I'm not sure what you mean by "you add the bridge in vlan 200" with
respect to netdevice notifiers. Create an 8021q upper with VID 200 on
top of a bridge (as that would generate a NETDEV_CHANGEUPPER)?
If there's a netdevice event on a bridge, the singleton netdevice event
handler can see if it is a bridge (netif_is_bridge_master), and if it
is, it can crawl through the bridge's lower interfaces using
netdev_for_each_lower_dev to see if there is any lan966x interface
beneath it. If there isn't, nothing to do. Otherwise, you get the
opportunity to do something for each port under that bridge. Maybe I'm
not understanding what you're trying to accomplish that's different?

> That is why if the lan966x_netdevice_check would be per instance, then
> we can filter like before, we still get call twice but then we filter for
> each instance. We get the lan966x instance from notifier_block and then
> we can check if the port netdev_ops is the same as the lan966x
> netdev_ops.
> 
> And in the other case we will still be able to get to the lan966x instance
> in case the bridge is added in a vlan.
> 
> > DSA intercepts a lot of events which aren't directly emitted for its own
> > interfaces. You don't gain much by having one more, if anything.
> > 
> > > > notifier handlers should be registered as singletons, like other drivers
> > > > do.
> > >
> > > It looks like not all the other driver register them as singletone. For
> > > example: prestera, mlx5, sparx5. (I just have done a git grep for
> > > register_switchdev_notifier, I have not looked in details at the
> > > implementation).
> > 
> > Not all driver writers may have realized that it is an issue that needs
> > to be thought of.
> 
> -- 
> /Horatiu

Horatiu Vultur Dec. 13, 2021, 9:24 p.m. UTC | #15

The 12/13/2021 16:25, Vladimir Oltean wrote:
> 
> On Mon, Dec 13, 2021 at 04:28:24PM +0100, Horatiu Vultur wrote:
> > The 12/13/2021 14:29, Vladimir Oltean wrote:
> > >
> > > On Mon, Dec 13, 2021 at 03:26:56PM +0100, Horatiu Vultur wrote:
> > > > > They are independent of each other. You deduce the interface on which
> > > > > the notifier was emitted using switchdev_notifier_info_to_dev() and act
> > > > > upon it, if lan966x_netdevice_check() is true. The notifier handling
> > > > > code itself is stateless, all the state is per port / per switch.
> > > > > If you register one notifier handler per switch, lan966x_netdevice_check()
> > > > > would return true for each notifier handler instance, and you would
> > > > > handle each event twice, would you not?
> > > >
> > > > That is correct, I will get the event twice which is a problem in the
> > > > lan966x. The function lan966x_netdevice_check should be per instance, in
> > > > this way each instance can filter the events.
> > > > The reason why I am putting the notifier_block inside lan966x is to be
> > > > able to get to the instance of lan966x even if I get a event that is not
> > > > for lan966x port.
> > >
> > > That isn't a problem, every netdevice notifier still sees all events.
> >
> > Yes, that is correct.
> > Sorry maybe I am still confused, but some things are still not right.
> >
> > So lets say there are two lan966x instances(A and B) and each one has 2
> > ports(ethA0, ethA1, ethB0, ethB1).
> > So with the current behaviour, if for example ethA0 is added in vlan
> > 100, then we get two callbacks for each lan966x instance(A and B) because
> > each of them registered. And because of lan966x_netdevice_check() is true
> > for ethA0 will do twice the work.
> > And you propose to have a singleton notifier so we get only 1 callback
> > and will be fine for this case. But if you add for example the bridge in
> > vlan 200 then I will never be able to get to the lan966x instance which
> > is needed in this case.
> 
> I'm not sure what you mean by "you add the bridge in vlan 200" with
> respect to netdevice notifiers. Create an 8021q upper with VID 200 on
> top of a bridge (as that would generate a NETDEV_CHANGEUPPER)?

I meant the following:

ip link add name brA type bridge
ip link add name brB type bridge
ip link set dev ethA0 master brA
ip link set dev ethA1 master brA
ip link set dev ethB0 master brB
ip link set dev ethB1 master brB
bridge vlan add dev brA vid 200 self

After the last command both lan966x instances will get a switchdev blocking
event where event is SWITCHDEV_PORT_OBJ_ADD and obj->id is
SWITCHDEV_OBJ_ID_PORT_VLAN. And in this case the
switchdev_notifier_info_to_dev will return brA.

> If there's a netdevice event on a bridge, the singleton netdevice event
> handler can see if it is a bridge (netif_is_bridge_master), and if it
> is, it can crawl through the bridge's lower interfaces using
> netdev_for_each_lower_dev to see if there is any lan966x interface
> beneath it. If there isn't, nothing to do. Otherwise, you get the
> opportunity to do something for each port under that bridge.

If I start to use switchdev_handle_port_obj_add, then as you mention
will get a callback for each interface under the port and then I need to
look in obj->orig_dev to see if it was a bridge or was a port that was
part of the bridge.

If I don't use switchdev_handle_port_obj_add and implement own function
then there is no way to get to the lan966x instance. I will get two
callbacks but then they can be filtered based on the bridge. If I use
switchdev_handle_port_obj_add then if I have 2 ports under the bridge,
both ports will be called which will do the same work anyway.

So I am not sure how much I will benefit of using
switchdev_handle_port_obj_add in this case.

One important observation in the driver is that I am separating these 2
cases:

bridge vlan add dev brA vid 300 self
bridge vlan add dev ethA0 vid 400

> Maybe I'm not understanding what you're trying to accomplish that's different?

The reason is that I want to use brA to control the CPU port. To decide
which frames to be copy to the CPU. Also to copy as few as possible
frames to CPU.

If we still want to go with the approach of using a singleton notifier
block, then we will still have a problem for netdevice notifier block.
We will get the same issue, can't get to lan966x instance in case the
lan966x callback is called for a different device. And we need this for
the following case:

If for example eth0, eth1 are part of a different IP and eth2, eth3 are
part of lan966x. We would like not to be able to put under the same
bridge interfaces that are part of different IPs (more precisely,
lan966x interfaces can be only under a bridge where lan966x interfaces
are part).

For example the following command should fail:
ip link add name br0 type bridge
ip link set dev eth0 master br0
ip link set dev eth2 master br0

Also the this command should fail:
ip link add name br0 type bridge
ip link set dev eth2 master br0
ip link set dev eth0 master br0

But the following should be accepted:
ip link add name br0 type bridge
ip link set dev eth0 master br0
ip link set dev eth1 master br0
ip link add name br1 type bridge
ip link set dev eth2 master br1
ip link set dev eth3 master br1

Maybe I should also make it explictly that is not allow to have more
than one instance of lan966x for now. And once is needed then add
support for it.

> 
> > That is why if the lan966x_netdevice_check would be per instance, then
> > we can filter like before, we still get call twice but then we filter for
> > each instance. We get the lan966x instance from notifier_block and then
> > we can check if the port netdev_ops is the same as the lan966x
> > netdev_ops.
> >
> > And in the other case we will still be able to get to the lan966x instance
> > in case the bridge is added in a vlan.
> >
> > > DSA intercepts a lot of events which aren't directly emitted for its own
> > > interfaces. You don't gain much by having one more, if anything.
> > >
> > > > > notifier handlers should be registered as singletons, like other drivers
> > > > > do.
> > > >
> > > > It looks like not all the other driver register them as singletone. For
> > > > example: prestera, mlx5, sparx5. (I just have done a git grep for
> > > > register_switchdev_notifier, I have not looked in details at the
> > > > implementation).
> > >
> > > Not all driver writers may have realized that it is an issue that needs
> > > to be thought of.
> >
> > --
> > /Horatiu

Vladimir Oltean Dec. 14, 2021, 12:01 a.m. UTC | #16

On Mon, Dec 13, 2021 at 10:24:50PM +0100, Horatiu Vultur wrote:
> The 12/13/2021 16:25, Vladimir Oltean wrote:
> >
> > On Mon, Dec 13, 2021 at 04:28:24PM +0100, Horatiu Vultur wrote:
> > > The 12/13/2021 14:29, Vladimir Oltean wrote:
> > > >
> > > > On Mon, Dec 13, 2021 at 03:26:56PM +0100, Horatiu Vultur wrote:
> > > > > > They are independent of each other. You deduce the interface on which
> > > > > > the notifier was emitted using switchdev_notifier_info_to_dev() and act
> > > > > > upon it, if lan966x_netdevice_check() is true. The notifier handling
> > > > > > code itself is stateless, all the state is per port / per switch.
> > > > > > If you register one notifier handler per switch, lan966x_netdevice_check()
> > > > > > would return true for each notifier handler instance, and you would
> > > > > > handle each event twice, would you not?
> > > > >
> > > > > That is correct, I will get the event twice which is a problem in the
> > > > > lan966x. The function lan966x_netdevice_check should be per instance, in
> > > > > this way each instance can filter the events.
> > > > > The reason why I am putting the notifier_block inside lan966x is to be
> > > > > able to get to the instance of lan966x even if I get a event that is not
> > > > > for lan966x port.
> > > >
> > > > That isn't a problem, every netdevice notifier still sees all events.
> > >
> > > Yes, that is correct.
> > > Sorry maybe I am still confused, but some things are still not right.
> > >
> > > So lets say there are two lan966x instances(A and B) and each one has 2
> > > ports(ethA0, ethA1, ethB0, ethB1).
> > > So with the current behaviour, if for example ethA0 is added in vlan
> > > 100, then we get two callbacks for each lan966x instance(A and B) because
> > > each of them registered. And because of lan966x_netdevice_check() is true
> > > for ethA0 will do twice the work.
> > > And you propose to have a singleton notifier so we get only 1 callback
> > > and will be fine for this case. But if you add for example the bridge in
> > > vlan 200 then I will never be able to get to the lan966x instance which
> > > is needed in this case.
> >
> > I'm not sure what you mean by "you add the bridge in vlan 200" with
> > respect to netdevice notifiers. Create an 8021q upper with VID 200 on
> > top of a bridge (as that would generate a NETDEV_CHANGEUPPER)?
>
> I meant the following:
>
> ip link add name brA type bridge
> ip link add name brB type bridge
> ip link set dev ethA0 master brA
> ip link set dev ethA1 master brA
> ip link set dev ethB0 master brB
> ip link set dev ethB1 master brB
> bridge vlan add dev brA vid 200 self

Ok, so the same as this use case and patch posted by Florian for DSA:
https://lkml.org/lkml/2018/6/24/300
we should be getting back to it some day.

> After the last command both lan966x instances will get a switchdev blocking
> event where event is SWITCHDEV_PORT_OBJ_ADD and obj->id is
> SWITCHDEV_OBJ_ID_PORT_VLAN. And in this case the
> switchdev_notifier_info_to_dev will return brA.

It returns brA anyway. But the point being, your current code submission
is something like this (of course, I had to fish these two functions
from two different patches, because they still aren't properly split):

static int lan966x_vlan_cpu_add_vlan_mask(struct lan966x *lan966x, u16 vid)
{
	lan966x->vlan_mask[vid] |= BIT(CPU_PORT);
	return lan966x_vlan_set_mask(lan966x, vid);
}

static int lan966x_handle_port_vlan_add(struct net_device *dev,
					struct notifier_block *nb,
					const struct switchdev_obj_port_vlan *v)
{
	struct lan966x_port *port;
	struct lan966x *lan966x;

	/* When adding a port to a vlan, we get a callback for the port but
	 * also for the bridge. When get the callback for the bridge just bail
	 * out. Then when the bridge is added to the vlan, then we get a
	 * callback here but in this case the flags has set:
	 * BRIDGE_VLAN_INFO_BRENTRY. In this case it means that the CPU
	 * port is added to the vlan, so the broadcast frames and unicast frames
	 * with dmac of the bridge should be foward to CPU.
	 */
	if (netif_is_bridge_master(dev) &&
	    !(v->flags & BRIDGE_VLAN_INFO_BRENTRY))
		return 0;

	lan966x = container_of(nb, struct lan966x, switchdev_blocking_nb);

	/* In case the port gets called */
	if (!(netif_is_bridge_master(dev))) {
		if (!lan966x_netdevice_check(dev))
			return -EOPNOTSUPP;

		port = netdev_priv(dev);
		return lan966x_vlan_port_add_vlan(port, v->vid,
						  v->flags & BRIDGE_VLAN_INFO_PVID,
						  v->flags & BRIDGE_VLAN_INFO_UNTAGGED);
	}

	/* In case the bridge gets called */
	if (netif_is_bridge_master(dev))
		return lan966x_vlan_cpu_add_vlan(lan966x, dev, v->vid);
		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
		In which way does this function call, exactly, check
		your lan966x's relationship to that bridge?

	return 0;
}

My point being, if you have two veth interfaces in your system just
minding their own business and being put in an unrelated bridge, and
that bridge would be put in VLAN 100 too, my understanding is that your
lan966x driver would sniff that event and add its CPU port to VLAN 100
too.  The reverse is true as well: any removal of a bridge from a VLAN
would also cause your CPU port to stop being in that VLAN, no matter
what interfaces may be in that VLAN. How could I say this... "spooky
action at a distance".

> > If there's a netdevice event on a bridge, the singleton netdevice event
> > handler can see if it is a bridge (netif_is_bridge_master), and if it
> > is, it can crawl through the bridge's lower interfaces using
> > netdev_for_each_lower_dev to see if there is any lan966x interface
> > beneath it. If there isn't, nothing to do. Otherwise, you get the
> > opportunity to do something for each port under that bridge.
>
> If I start to use switchdev_handle_port_obj_add, then as you mention
> will get a callback for each interface under the port and then I need to
> look in obj->orig_dev to see if it was a bridge or was a port that was
> part of the bridge.

Oh yes of course. And right now you don't need that because? You think
you get notifications only of switchdev events emitted by bridges that
you have a port in?

> If I don't use switchdev_handle_port_obj_add and implement own function
> then there is no way to get to the lan966x instance.

The switchdev_handle_port_obj_add() function isn't magic, and it has an
actual public implementation, too. Sure you can get to the lan966x
instance even if you don't use switchdev_handle_port_obj_add() -
although, it is there for people to use it.

> I will get two callbacks but then they can be filtered based on the
> bridge. If I use switchdev_handle_port_obj_add then if I have 2 ports
> under the bridge, both ports will be called which will do the same
> work anyway.

And that's a good thing, if you actually think about how you design
things to actually work. Please consider that you have two distinct
events: you can join a bridge that is in a VLAN, or the bridge can join
that VLAN while you have some ports under it. The invariant is that your
CPU port needs to be in that VLAN only for as long as there is any port
under that bridge. So it is actually beneficial to use the
switchdev_handle_* helper. It tells you how many users of the CPU VLAN
rule there still are. It would be broken to delete it right away, when a
port leaves the bridge. It would also be broken to not delete it after
all ports leave: the bridge may have a longer lifetime than the lan966x
ports beneath them, so there may not be any deletion event that you
should expect.

> So I am not sure how much I will benefit of using
> switchdev_handle_port_obj_add in this case.
>
> One important observation in the driver is that I am separating these 2
> cases:
>
> bridge vlan add dev brA vid 300 self
> bridge vlan add dev ethA0 vid 400

Understood, and that's ok. But I'm not convinced it works, though.

> > Maybe I'm not understanding what you're trying to accomplish that's different?
>
> The reason is that I want to use brA to control the CPU port. To decide
> which frames to be copy to the CPU. Also to copy as few as possible
> frames to CPU.
>
> If we still want to go with the approach of using a singleton notifier
> block, then we will still have a problem for netdevice notifier block.
> We will get the same issue, can't get to lan966x instance in case the
> lan966x callback is called for a different device. And we need this for
> the following case:
>
> If for example eth0, eth1 are part of a different IP and eth2, eth3 are
> part of lan966x. We would like not to be able to put under the same
> bridge interfaces that are part of different IPs (more precisely,
> lan966x interfaces can be only under a bridge where lan966x interfaces
> are part).
>
> For example the following command should fail:
> ip link add name br0 type bridge
> ip link set dev eth0 master br0
> ip link set dev eth2 master br0
>
> Also the this command should fail:
> ip link add name br0 type bridge
> ip link set dev eth2 master br0
> ip link set dev eth0 master br0
>
> But the following should be accepted:
> ip link add name br0 type bridge
> ip link set dev eth0 master br0
> ip link set dev eth1 master br0
> ip link add name br1 type bridge
> ip link set dev eth2 master br1
> ip link set dev eth3 master br1

You can track NETDEV_PRECHANGEUPPER and deny that, and also provide an
extack with a reason. That should work, it's been tested.

> Maybe I should also make it explictly that is not allow to have more
> than one instance of lan966x for now. And once is needed then add
> support for it.

I don't necessarily see the reason for this, but ok. I don't think you
should view things as "support for parallel instances of the driver is
what's complicating the implementation", but rather "catching the events
in all the permutations that they can happen in is what this driver
needs to provide a good user experience".

Horatiu Vultur Dec. 14, 2021, 2:31 p.m. UTC | #17

The 12/14/2021 00:01, Vladimir Oltean wrote:

Sorry for late reply, but I spend some time trying out your suggestions.

> 
> On Mon, Dec 13, 2021 at 10:24:50PM +0100, Horatiu Vultur wrote:
> > The 12/13/2021 16:25, Vladimir Oltean wrote:
> > >
> > > On Mon, Dec 13, 2021 at 04:28:24PM +0100, Horatiu Vultur wrote:
> > > > The 12/13/2021 14:29, Vladimir Oltean wrote:
> > > > >
> > > > > On Mon, Dec 13, 2021 at 03:26:56PM +0100, Horatiu Vultur wrote:
> > > > > > > They are independent of each other. You deduce the interface on which
> > > > > > > the notifier was emitted using switchdev_notifier_info_to_dev() and act
> > > > > > > upon it, if lan966x_netdevice_check() is true. The notifier handling
> > > > > > > code itself is stateless, all the state is per port / per switch.
> > > > > > > If you register one notifier handler per switch, lan966x_netdevice_check()
> > > > > > > would return true for each notifier handler instance, and you would
> > > > > > > handle each event twice, would you not?
> > > > > >
> > > > > > That is correct, I will get the event twice which is a problem in the
> > > > > > lan966x. The function lan966x_netdevice_check should be per instance, in
> > > > > > this way each instance can filter the events.
> > > > > > The reason why I am putting the notifier_block inside lan966x is to be
> > > > > > able to get to the instance of lan966x even if I get a event that is not
> > > > > > for lan966x port.
> > > > >
> > > > > That isn't a problem, every netdevice notifier still sees all events.
> > > >
> > > > Yes, that is correct.
> > > > Sorry maybe I am still confused, but some things are still not right.
> > > >
> > > > So lets say there are two lan966x instances(A and B) and each one has 2
> > > > ports(ethA0, ethA1, ethB0, ethB1).
> > > > So with the current behaviour, if for example ethA0 is added in vlan
> > > > 100, then we get two callbacks for each lan966x instance(A and B) because
> > > > each of them registered. And because of lan966x_netdevice_check() is true
> > > > for ethA0 will do twice the work.
> > > > And you propose to have a singleton notifier so we get only 1 callback
> > > > and will be fine for this case. But if you add for example the bridge in
> > > > vlan 200 then I will never be able to get to the lan966x instance which
> > > > is needed in this case.
> > >
> > > I'm not sure what you mean by "you add the bridge in vlan 200" with
> > > respect to netdevice notifiers. Create an 8021q upper with VID 200 on
> > > top of a bridge (as that would generate a NETDEV_CHANGEUPPER)?
> >
> > I meant the following:
> >
> > ip link add name brA type bridge
> > ip link add name brB type bridge
> > ip link set dev ethA0 master brA
> > ip link set dev ethA1 master brA
> > ip link set dev ethB0 master brB
> > ip link set dev ethB1 master brB
> > bridge vlan add dev brA vid 200 self
> 
> Ok, so the same as this use case and patch posted by Florian for DSA:
> https://lkml.org/lkml/2018/6/24/300
> we should be getting back to it some day.
> 
> > After the last command both lan966x instances will get a switchdev blocking
> > event where event is SWITCHDEV_PORT_OBJ_ADD and obj->id is
> > SWITCHDEV_OBJ_ID_PORT_VLAN. And in this case the
> > switchdev_notifier_info_to_dev will return brA.
> 
> It returns brA anyway. But the point being, your current code submission
> is something like this (of course, I had to fish these two functions
> from two different patches, because they still aren't properly split):
> 
> static int lan966x_vlan_cpu_add_vlan_mask(struct lan966x *lan966x, u16 vid)
> {
>         lan966x->vlan_mask[vid] |= BIT(CPU_PORT);
>         return lan966x_vlan_set_mask(lan966x, vid);
> }
> 
> static int lan966x_handle_port_vlan_add(struct net_device *dev,
>                                         struct notifier_block *nb,
>                                         const struct switchdev_obj_port_vlan *v)
> {
>         struct lan966x_port *port;
>         struct lan966x *lan966x;
> 
>         /* When adding a port to a vlan, we get a callback for the port but
>          * also for the bridge. When get the callback for the bridge just bail
>          * out. Then when the bridge is added to the vlan, then we get a
>          * callback here but in this case the flags has set:
>          * BRIDGE_VLAN_INFO_BRENTRY. In this case it means that the CPU
>          * port is added to the vlan, so the broadcast frames and unicast frames
>          * with dmac of the bridge should be foward to CPU.
>          */
>         if (netif_is_bridge_master(dev) &&
>             !(v->flags & BRIDGE_VLAN_INFO_BRENTRY))
>                 return 0;
> 
>         lan966x = container_of(nb, struct lan966x, switchdev_blocking_nb);
> 
>         /* In case the port gets called */
>         if (!(netif_is_bridge_master(dev))) {
>                 if (!lan966x_netdevice_check(dev))
>                         return -EOPNOTSUPP;
> 
>                 port = netdev_priv(dev);
>                 return lan966x_vlan_port_add_vlan(port, v->vid,
>                                                   v->flags & BRIDGE_VLAN_INFO_PVID,
>                                                   v->flags & BRIDGE_VLAN_INFO_UNTAGGED);
>         }
> 
>         /* In case the bridge gets called */
>         if (netif_is_bridge_master(dev))
>                 return lan966x_vlan_cpu_add_vlan(lan966x, dev, v->vid);
>                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>                 In which way does this function call, exactly, check
>                 your lan966x's relationship to that bridge?
> 
>         return 0;
> }
> 
> My point being, if you have two veth interfaces in your system just
> minding their own business and being put in an unrelated bridge, and
> that bridge would be put in VLAN 100 too, my understanding is that your
> lan966x driver would sniff that event and add its CPU port to VLAN 100
> too.  

That is correct and my initial thought was to add something like this:

if (netif_is_bridge_master(dev) && lan966x->bridge != dev)
	return 0;

> The reverse is true as well: any removal of a bridge from a VLAN
> would also cause your CPU port to stop being in that VLAN, no matter
> what interfaces may be in that VLAN. How could I say this... "spooky
> action at a distance".

But I don't need to do this if I use 'switchdev_handle_port_obj_add'

> 
> > > If there's a netdevice event on a bridge, the singleton netdevice event
> > > handler can see if it is a bridge (netif_is_bridge_master), and if it
> > > is, it can crawl through the bridge's lower interfaces using
> > > netdev_for_each_lower_dev to see if there is any lan966x interface
> > > beneath it. If there isn't, nothing to do. Otherwise, you get the
> > > opportunity to do something for each port under that bridge.
> >
> > If I start to use switchdev_handle_port_obj_add, then as you mention
> > will get a callback for each interface under the port and then I need to
> > look in obj->orig_dev to see if it was a bridge or was a port that was
> > part of the bridge.
> 
> Oh yes of course. And right now you don't need that because? You think
> you get notifications only of switchdev events emitted by bridges that
> you have a port in?

Actually I need it, it was just a comment.

> 
> > If I don't use switchdev_handle_port_obj_add and implement own function
> > then there is no way to get to the lan966x instance.
> 
> The switchdev_handle_port_obj_add() function isn't magic, and it has an
> actual public implementation, too. Sure you can get to the lan966x
> instance even if you don't use switchdev_handle_port_obj_add() -
> although, it is there for people to use it.

Yes, I can use switchdev_handle_port_obj_add and still get access to
bridge device and to lan966x instance.

> 
> > I will get two callbacks but then they can be filtered based on the
> > bridge. If I use switchdev_handle_port_obj_add then if I have 2 ports
> > under the bridge, both ports will be called which will do the same
> > work anyway.
> 
> And that's a good thing, if you actually think about how you design
> things to actually work. Please consider that you have two distinct
> events: you can join a bridge that is in a VLAN, or the bridge can join
> that VLAN while you have some ports under it. The invariant is that your
> CPU port needs to be in that VLAN only for as long as there is any port
> under that bridge. So it is actually beneficial to use the
> switchdev_handle_* helper. It tells you how many users of the CPU VLAN
> rule there still are. It would be broken to delete it right away, when a
> port leaves the bridge. It would also be broken to not delete it after
> all ports leave: the bridge may have a longer lifetime than the lan966x
> ports beneath them, so there may not be any deletion event that you
> should expect.

I have already some logic in the driver regarding this. To see how many
ports are part of a vlan and also in which vlan is the CPU.

> 
> > So I am not sure how much I will benefit of using
> > switchdev_handle_port_obj_add in this case.
> >
> > One important observation in the driver is that I am separating these 2
> > cases:
> >
> > bridge vlan add dev brA vid 300 self
> > bridge vlan add dev ethA0 vid 400
> 
> Understood, and that's ok. But I'm not convinced it works, though.

I think it would work :)

> 
> > > Maybe I'm not understanding what you're trying to accomplish that's different?
> >
> > The reason is that I want to use brA to control the CPU port. To decide
> > which frames to be copy to the CPU. Also to copy as few as possible
> > frames to CPU.
> >
> > If we still want to go with the approach of using a singleton notifier
> > block, then we will still have a problem for netdevice notifier block.
> > We will get the same issue, can't get to lan966x instance in case the
> > lan966x callback is called for a different device. And we need this for
> > the following case:
> >
> > If for example eth0, eth1 are part of a different IP and eth2, eth3 are
> > part of lan966x. We would like not to be able to put under the same
> > bridge interfaces that are part of different IPs (more precisely,
> > lan966x interfaces can be only under a bridge where lan966x interfaces
> > are part).
> >
> > For example the following command should fail:
> > ip link add name br0 type bridge
> > ip link set dev eth0 master br0
> > ip link set dev eth2 master br0
> >
> > Also the this command should fail:
> > ip link add name br0 type bridge
> > ip link set dev eth2 master br0
> > ip link set dev eth0 master br0
> >
> > But the following should be accepted:
> > ip link add name br0 type bridge
> > ip link set dev eth0 master br0
> > ip link set dev eth1 master br0
> > ip link add name br1 type bridge
> > ip link set dev eth2 master br1
> > ip link set dev eth3 master br1
> 
> You can track NETDEV_PRECHANGEUPPER and deny that, and also provide an
> extack with a reason. That should work, it's been tested.

Yes and that is what I was doing. But I had to keep a list of bridges to
see any of the bridges has any foreign interfaces. The problem was that
I kept the list on the lan966x instance, but that can be fixed easily
by making it static.

> 
> > Maybe I should also make it explictly that is not allow to have more
> > than one instance of lan966x for now. And once is needed then add
> > support for it.
> 
> I don't necessarily see the reason for this, but ok. I don't think you
> should view things as "support for parallel instances of the driver is
> what's complicating the implementation", but rather "catching the events
> in all the permutations that they can happen in is what this driver
> needs to provide a good user experience".

So long story short, I agree with your comments, I can make the
notifier_block as singleton objects and start to use
switchdev_handle_port_obj_add and the other variants.
In this way it should also work "parallel instances of the driver".

Vladimir Oltean Dec. 15, 2021, 9:27 a.m. UTC | #18

On Tue, Dec 14, 2021 at 03:31:29PM +0100, Horatiu Vultur wrote:
> So long story short, I agree with your comments, I can make the
> notifier_block as singleton objects and start to use
> switchdev_handle_port_obj_add and the other variants.
> In this way it should also work "parallel instances of the driver".

To be clear, what I'd like to see for v5 is a set of patterns that is
simple, clean and functional enough that it could be copied by other
drivers - I'd be interested in limiting the VLANs on the CPU ports in
DSA too.

[net-next,v3,0/6] net: lan966x: Add switchdev and vlan support

Message

Comments