diff mbox series

[iwl-net] ice: Fix switchdev slow-path in LAG

Message ID 20250102190751.7691-2-marcin.szycik@linux.intel.com
State Under Review
Delegated to: Anthony Nguyen
Headers show
Series [iwl-net] ice: Fix switchdev slow-path in LAG | expand

Commit Message

Marcin Szycik Jan. 2, 2025, 7:07 p.m. UTC
Ever since removing switchdev control VSI and using PF for port
representor Tx/Rx, switchdev slow-path has been working improperly after
failover in SR-IOV LAG. LAG assumes that the first uplink to be added to
the aggregate will own VFs and have switchdev configured. After
failing-over to the other uplink, representors are still configured to
Tx through the uplink they are set up on, which fails because that
uplink is now down.

On failover, update all PRs on primary uplink to use the currently
active uplink for Tx. Call netif_keep_dst(), as the secondary uplink
might not be in switchdev mode. Also make sure to call
ice_eswitch_set_target_vsi() if uplink is in LAG.

On the Rx path, representors are already working properly, because
default Tx from VFs is set to PF owning the eswitch. After failover the
same PF is receiving traffic from VFs, even though link is down.

Fixes: defd52455aee ("ice: do Tx through PF netdev in slow-path")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lag.c  | 27 +++++++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_txrx.c |  4 +++-
 2 files changed, 30 insertions(+), 1 deletion(-)

Comments

Simon Horman Jan. 6, 2025, 2:39 p.m. UTC | #1
On Thu, Jan 02, 2025 at 08:07:52PM +0100, Marcin Szycik wrote:
> Ever since removing switchdev control VSI and using PF for port
> representor Tx/Rx, switchdev slow-path has been working improperly after
> failover in SR-IOV LAG. LAG assumes that the first uplink to be added to
> the aggregate will own VFs and have switchdev configured. After
> failing-over to the other uplink, representors are still configured to
> Tx through the uplink they are set up on, which fails because that
> uplink is now down.
> 
> On failover, update all PRs on primary uplink to use the currently
> active uplink for Tx. Call netif_keep_dst(), as the secondary uplink
> might not be in switchdev mode. Also make sure to call
> ice_eswitch_set_target_vsi() if uplink is in LAG.
> 
> On the Rx path, representors are already working properly, because
> default Tx from VFs is set to PF owning the eswitch. After failover the
> same PF is receiving traffic from VFs, even though link is down.
> 
> Fixes: defd52455aee ("ice: do Tx through PF netdev in slow-path")
> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>
Buvaneswaran, Sujai Jan. 13, 2025, 8:17 a.m. UTC | #2
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Marcin Szycik
> Sent: Friday, January 3, 2025 12:38 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Marcin Szycik <marcin.szycik@linux.intel.com>;
> Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> Subject: [Intel-wired-lan] [PATCH iwl-net] ice: Fix switchdev slow-path in LAG
> 
> Ever since removing switchdev control VSI and using PF for port representor
> Tx/Rx, switchdev slow-path has been working improperly after failover in SR-
> IOV LAG. LAG assumes that the first uplink to be added to the aggregate will
> own VFs and have switchdev configured. After failing-over to the other
> uplink, representors are still configured to Tx through the uplink they are set
> up on, which fails because that uplink is now down.
> 
> On failover, update all PRs on primary uplink to use the currently active
> uplink for Tx. Call netif_keep_dst(), as the secondary uplink might not be in
> switchdev mode. Also make sure to call
> ice_eswitch_set_target_vsi() if uplink is in LAG.
> 
> On the Rx path, representors are already working properly, because default
> Tx from VFs is set to PF owning the eswitch. After failover the same PF is
> receiving traffic from VFs, even though link is down.
> 
> Fixes: defd52455aee ("ice: do Tx through PF netdev in slow-path")
> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_lag.c  | 27 +++++++++++++++++++++++
> drivers/net/ethernet/intel/ice/ice_txrx.c |  4 +++-
>  2 files changed, 30 insertions(+), 1 deletion(-)
> 
Hi,

Observing below call trace while creating VFs in Switchdev mode with this patch in net-queue.

[  +0.000188] ice 0000:b1:00.0: Enabling 1 VFs with 17 vectors and 16 queues per VF
[  +0.000062] list_add corruption. next->prev should be prev (ff1d7c830300c6f0), but was ff282828ff282828. (next=ff1d7c5367d61330).
[  +0.000015] ------------[ cut here ]------------
[  +0.000001] kernel BUG at lib/list_debug.c:29!
[  +0.000007] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  +0.000004] CPU: 81 UID: 0 PID: 2758 Comm: bash Kdump: loaded Not tainted 6.13.0-rc3+ #1
[  +0.000003] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.3.8 08/31/2021
[  +0.000002] RIP: 0010:__list_add_valid_or_report+0x61/0xa0
[  +0.000008] Code: c7 c7 a8 97 b2 8f e8 7e e4 af ff 0f 0b 48 c7 c7 d0 97 b2 8f e8 70 e4 af ff 0f 0b 4c 89 c1 48 c7 c7 f8 97 b2 8f e8 5f e4 af ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 50 98 b2 8f e8 48 e4 af
[  +0.000002] RSP: 0018:ff5ebf3d22093d20 EFLAGS: 00010246
[  +0.000003] RAX: 0000000000000075 RBX: ff1d7c54143a1330 RCX: 0000000000000000
[  +0.000002] RDX: 0000000000000000 RSI: ff1d7c81f06a0bc0 RDI: ff1d7c81f06a0bc0
[  +0.000001] RBP: ff1d7c83030097d8 R08: 0000000000000000 R09: ff5ebf3d22093bd8
[  +0.000002] R10: ff5ebf3d22093bd0 R11: ffffffff901debc8 R12: ff1d7c5367d61330
[  +0.000001] R13: ff1d7c830300c6f0 R14: 0000000000000000 R15: 0000000000000000
[  +0.000002] FS:  00007fea5e4e4740(0000) GS:ff1d7c81f0680000(0000) knlGS:0000000000000000
[  +0.000002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000001] CR2: 0000562ef57c7608 CR3: 000000019037c002 CR4: 0000000000773ef0
[  +0.000002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  +0.000001] PKRU: 55555554
[  +0.000002] Call Trace:
[  +0.000003]  <TASK>
[  +0.000002]  ? die+0x37/0x90
[  +0.000007]  ? do_trap+0xdd/0x100
[  +0.000004]  ? __list_add_valid_or_report+0x61/0xa0
[  +0.000003]  ? do_error_trap+0x65/0x80
[  +0.000002]  ? __list_add_valid_or_report+0x61/0xa0
[  +0.000003]  ? exc_invalid_op+0x52/0x70
[  +0.000005]  ? __list_add_valid_or_report+0x61/0xa0
[  +0.000002]  ? asm_exc_invalid_op+0x1a/0x20
[  +0.000007]  ? __list_add_valid_or_report+0x61/0xa0
[  +0.000005]  ice_mbx_init_vf_info+0x3c/0x60 [ice]
[  +0.000076]  ice_initialize_vf_entry+0x99/0xa0 [ice]

Regards,
Sujai B
Romanowski, Rafal Jan. 14, 2025, 6:32 p.m. UTC | #3
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Buvaneswaran, Sujai
> Sent: Monday, January 13, 2025 9:17 AM
> To: Marcin Szycik <marcin.szycik@linux.intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Michal Swiatkowski
> <michal.swiatkowski@linux.intel.com>
> Subject: Re: [Intel-wired-lan] [PATCH iwl-net] ice: Fix switchdev slow-path in LAG
> 
> > -----Original Message-----
> > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> > Of Marcin Szycik
> > Sent: Friday, January 3, 2025 12:38 AM
> > To: intel-wired-lan@lists.osuosl.org
> > Cc: netdev@vger.kernel.org; Marcin Szycik
> > <marcin.szycik@linux.intel.com>; Michal Swiatkowski
> > <michal.swiatkowski@linux.intel.com>
> > Subject: [Intel-wired-lan] [PATCH iwl-net] ice: Fix switchdev
> > slow-path in LAG
> >
> > Ever since removing switchdev control VSI and using PF for port
> > representor Tx/Rx, switchdev slow-path has been working improperly
> > after failover in SR- IOV LAG. LAG assumes that the first uplink to be
> > added to the aggregate will own VFs and have switchdev configured.
> > After failing-over to the other uplink, representors are still
> > configured to Tx through the uplink they are set up on, which fails because that
> uplink is now down.
> >
> > On failover, update all PRs on primary uplink to use the currently
> > active uplink for Tx. Call netif_keep_dst(), as the secondary uplink
> > might not be in switchdev mode. Also make sure to call
> > ice_eswitch_set_target_vsi() if uplink is in LAG.
> >
> > On the Rx path, representors are already working properly, because
> > default Tx from VFs is set to PF owning the eswitch. After failover
> > the same PF is receiving traffic from VFs, even though link is down.
> >
> > Fixes: defd52455aee ("ice: do Tx through PF netdev in slow-path")
> > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> > Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_lag.c  | 27
> > +++++++++++++++++++++++ drivers/net/ethernet/intel/ice/ice_txrx.c |  4
> > +++-
> >  2 files changed, 30 insertions(+), 1 deletion(-)
> >


Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c
index 1ccb572ce285..22371011c249 100644
--- a/drivers/net/ethernet/intel/ice/ice_lag.c
+++ b/drivers/net/ethernet/intel/ice/ice_lag.c
@@ -1000,6 +1000,28 @@  static void ice_lag_link(struct ice_lag *lag)
 	netdev_info(lag->netdev, "Shared SR-IOV resources in bond are active\n");
 }
 
+/**
+ * ice_lag_config_eswitch - configure eswitch to work with LAG
+ * @lag: lag info struct
+ * @netdev: active network interface device struct
+ *
+ * Updates all port representors in eswitch to use @netdev for Tx.
+ *
+ * Configures the netdev to keep dst metadata (also used in representor Tx).
+ * This is required for an uplink without switchdev mode configured.
+ */
+static void ice_lag_config_eswitch(struct ice_lag *lag,
+				   struct net_device *netdev)
+{
+	struct ice_repr *repr;
+	unsigned long id;
+
+	xa_for_each(&lag->pf->eswitch.reprs, id, repr)
+		repr->dst->u.port_info.lower_dev = netdev;
+
+	netif_keep_dst(netdev);
+}
+
 /**
  * ice_lag_unlink - handle unlink event
  * @lag: LAG info struct
@@ -1021,6 +1043,9 @@  static void ice_lag_unlink(struct ice_lag *lag)
 			ice_lag_move_vf_nodes(lag, act_port, pri_port);
 		lag->primary = false;
 		lag->active_port = ICE_LAG_INVALID_PORT;
+
+		/* Config primary's eswitch back to normal operation. */
+		ice_lag_config_eswitch(lag, lag->netdev);
 	} else {
 		struct ice_lag *primary_lag;
 
@@ -1419,6 +1444,7 @@  static void ice_lag_monitor_active(struct ice_lag *lag, void *ptr)
 				ice_lag_move_vf_nodes(lag, prim_port,
 						      event_port);
 			lag->active_port = event_port;
+			ice_lag_config_eswitch(lag, event_netdev);
 			return;
 		}
 
@@ -1428,6 +1454,7 @@  static void ice_lag_monitor_active(struct ice_lag *lag, void *ptr)
 		/* new active port */
 		ice_lag_move_vf_nodes(lag, lag->active_port, event_port);
 		lag->active_port = event_port;
+		ice_lag_config_eswitch(lag, event_netdev);
 	} else {
 		/* port not set as currently active (e.g. new active port
 		 * has already claimed the nodes and filters
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 5d2d7736fd5f..f1c06c227dc5 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -2368,7 +2368,9 @@  ice_xmit_frame_ring(struct sk_buff *skb, struct ice_tx_ring *tx_ring)
 					ICE_TXD_CTX_QW1_CMD_S);
 
 	ice_tstamp(tx_ring, skb, first, &offload);
-	if (ice_is_switchdev_running(vsi->back) && vsi->type != ICE_VSI_SF)
+	if ((ice_is_switchdev_running(vsi->back) ||
+	     ice_lag_is_switchdev_running(vsi->back)) &&
+	    vsi->type != ICE_VSI_SF)
 		ice_eswitch_set_target_vsi(skb, &offload);
 
 	if (offload.cd_qw1 & ICE_TX_DESC_DTYPE_CTX) {