Message ID | 20250102190751.7691-2-marcin.szycik@linux.intel.com |
---|---|
State | Under Review |
Delegated to: | Anthony Nguyen |
Headers | show |
Series | [iwl-net] ice: Fix switchdev slow-path in LAG | expand |
On Thu, Jan 02, 2025 at 08:07:52PM +0100, Marcin Szycik wrote: > Ever since removing switchdev control VSI and using PF for port > representor Tx/Rx, switchdev slow-path has been working improperly after > failover in SR-IOV LAG. LAG assumes that the first uplink to be added to > the aggregate will own VFs and have switchdev configured. After > failing-over to the other uplink, representors are still configured to > Tx through the uplink they are set up on, which fails because that > uplink is now down. > > On failover, update all PRs on primary uplink to use the currently > active uplink for Tx. Call netif_keep_dst(), as the secondary uplink > might not be in switchdev mode. Also make sure to call > ice_eswitch_set_target_vsi() if uplink is in LAG. > > On the Rx path, representors are already working properly, because > default Tx from VFs is set to PF owning the eswitch. After failover the > same PF is receiving traffic from VFs, even though link is down. > > Fixes: defd52455aee ("ice: do Tx through PF netdev in slow-path") > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org>
> -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of > Marcin Szycik > Sent: Friday, January 3, 2025 12:38 AM > To: intel-wired-lan@lists.osuosl.org > Cc: netdev@vger.kernel.org; Marcin Szycik <marcin.szycik@linux.intel.com>; > Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > Subject: [Intel-wired-lan] [PATCH iwl-net] ice: Fix switchdev slow-path in LAG > > Ever since removing switchdev control VSI and using PF for port representor > Tx/Rx, switchdev slow-path has been working improperly after failover in SR- > IOV LAG. LAG assumes that the first uplink to be added to the aggregate will > own VFs and have switchdev configured. After failing-over to the other > uplink, representors are still configured to Tx through the uplink they are set > up on, which fails because that uplink is now down. > > On failover, update all PRs on primary uplink to use the currently active > uplink for Tx. Call netif_keep_dst(), as the secondary uplink might not be in > switchdev mode. Also make sure to call > ice_eswitch_set_target_vsi() if uplink is in LAG. > > On the Rx path, representors are already working properly, because default > Tx from VFs is set to PF owning the eswitch. After failover the same PF is > receiving traffic from VFs, even though link is down. > > Fixes: defd52455aee ("ice: do Tx through PF netdev in slow-path") > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> > --- > drivers/net/ethernet/intel/ice/ice_lag.c | 27 +++++++++++++++++++++++ > drivers/net/ethernet/intel/ice/ice_txrx.c | 4 +++- > 2 files changed, 30 insertions(+), 1 deletion(-) > Hi, Observing below call trace while creating VFs in Switchdev mode with this patch in net-queue. [ +0.000188] ice 0000:b1:00.0: Enabling 1 VFs with 17 vectors and 16 queues per VF [ +0.000062] list_add corruption. next->prev should be prev (ff1d7c830300c6f0), but was ff282828ff282828. (next=ff1d7c5367d61330). [ +0.000015] ------------[ cut here ]------------ [ +0.000001] kernel BUG at lib/list_debug.c:29! [ +0.000007] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ +0.000004] CPU: 81 UID: 0 PID: 2758 Comm: bash Kdump: loaded Not tainted 6.13.0-rc3+ #1 [ +0.000003] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.3.8 08/31/2021 [ +0.000002] RIP: 0010:__list_add_valid_or_report+0x61/0xa0 [ +0.000008] Code: c7 c7 a8 97 b2 8f e8 7e e4 af ff 0f 0b 48 c7 c7 d0 97 b2 8f e8 70 e4 af ff 0f 0b 4c 89 c1 48 c7 c7 f8 97 b2 8f e8 5f e4 af ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 50 98 b2 8f e8 48 e4 af [ +0.000002] RSP: 0018:ff5ebf3d22093d20 EFLAGS: 00010246 [ +0.000003] RAX: 0000000000000075 RBX: ff1d7c54143a1330 RCX: 0000000000000000 [ +0.000002] RDX: 0000000000000000 RSI: ff1d7c81f06a0bc0 RDI: ff1d7c81f06a0bc0 [ +0.000001] RBP: ff1d7c83030097d8 R08: 0000000000000000 R09: ff5ebf3d22093bd8 [ +0.000002] R10: ff5ebf3d22093bd0 R11: ffffffff901debc8 R12: ff1d7c5367d61330 [ +0.000001] R13: ff1d7c830300c6f0 R14: 0000000000000000 R15: 0000000000000000 [ +0.000002] FS: 00007fea5e4e4740(0000) GS:ff1d7c81f0680000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000001] CR2: 0000562ef57c7608 CR3: 000000019037c002 CR4: 0000000000773ef0 [ +0.000002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ +0.000001] PKRU: 55555554 [ +0.000002] Call Trace: [ +0.000003] <TASK> [ +0.000002] ? die+0x37/0x90 [ +0.000007] ? do_trap+0xdd/0x100 [ +0.000004] ? __list_add_valid_or_report+0x61/0xa0 [ +0.000003] ? do_error_trap+0x65/0x80 [ +0.000002] ? __list_add_valid_or_report+0x61/0xa0 [ +0.000003] ? exc_invalid_op+0x52/0x70 [ +0.000005] ? __list_add_valid_or_report+0x61/0xa0 [ +0.000002] ? asm_exc_invalid_op+0x1a/0x20 [ +0.000007] ? __list_add_valid_or_report+0x61/0xa0 [ +0.000005] ice_mbx_init_vf_info+0x3c/0x60 [ice] [ +0.000076] ice_initialize_vf_entry+0x99/0xa0 [ice] Regards, Sujai B
> -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of > Buvaneswaran, Sujai > Sent: Monday, January 13, 2025 9:17 AM > To: Marcin Szycik <marcin.szycik@linux.intel.com>; intel-wired- > lan@lists.osuosl.org > Cc: netdev@vger.kernel.org; Michal Swiatkowski > <michal.swiatkowski@linux.intel.com> > Subject: Re: [Intel-wired-lan] [PATCH iwl-net] ice: Fix switchdev slow-path in LAG > > > -----Original Message----- > > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf > > Of Marcin Szycik > > Sent: Friday, January 3, 2025 12:38 AM > > To: intel-wired-lan@lists.osuosl.org > > Cc: netdev@vger.kernel.org; Marcin Szycik > > <marcin.szycik@linux.intel.com>; Michal Swiatkowski > > <michal.swiatkowski@linux.intel.com> > > Subject: [Intel-wired-lan] [PATCH iwl-net] ice: Fix switchdev > > slow-path in LAG > > > > Ever since removing switchdev control VSI and using PF for port > > representor Tx/Rx, switchdev slow-path has been working improperly > > after failover in SR- IOV LAG. LAG assumes that the first uplink to be > > added to the aggregate will own VFs and have switchdev configured. > > After failing-over to the other uplink, representors are still > > configured to Tx through the uplink they are set up on, which fails because that > uplink is now down. > > > > On failover, update all PRs on primary uplink to use the currently > > active uplink for Tx. Call netif_keep_dst(), as the secondary uplink > > might not be in switchdev mode. Also make sure to call > > ice_eswitch_set_target_vsi() if uplink is in LAG. > > > > On the Rx path, representors are already working properly, because > > default Tx from VFs is set to PF owning the eswitch. After failover > > the same PF is receiving traffic from VFs, even though link is down. > > > > Fixes: defd52455aee ("ice: do Tx through PF netdev in slow-path") > > Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> > > Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> > > --- > > drivers/net/ethernet/intel/ice/ice_lag.c | 27 > > +++++++++++++++++++++++ drivers/net/ethernet/intel/ice/ice_txrx.c | 4 > > +++- > > 2 files changed, 30 insertions(+), 1 deletion(-) > > Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c index 1ccb572ce285..22371011c249 100644 --- a/drivers/net/ethernet/intel/ice/ice_lag.c +++ b/drivers/net/ethernet/intel/ice/ice_lag.c @@ -1000,6 +1000,28 @@ static void ice_lag_link(struct ice_lag *lag) netdev_info(lag->netdev, "Shared SR-IOV resources in bond are active\n"); } +/** + * ice_lag_config_eswitch - configure eswitch to work with LAG + * @lag: lag info struct + * @netdev: active network interface device struct + * + * Updates all port representors in eswitch to use @netdev for Tx. + * + * Configures the netdev to keep dst metadata (also used in representor Tx). + * This is required for an uplink without switchdev mode configured. + */ +static void ice_lag_config_eswitch(struct ice_lag *lag, + struct net_device *netdev) +{ + struct ice_repr *repr; + unsigned long id; + + xa_for_each(&lag->pf->eswitch.reprs, id, repr) + repr->dst->u.port_info.lower_dev = netdev; + + netif_keep_dst(netdev); +} + /** * ice_lag_unlink - handle unlink event * @lag: LAG info struct @@ -1021,6 +1043,9 @@ static void ice_lag_unlink(struct ice_lag *lag) ice_lag_move_vf_nodes(lag, act_port, pri_port); lag->primary = false; lag->active_port = ICE_LAG_INVALID_PORT; + + /* Config primary's eswitch back to normal operation. */ + ice_lag_config_eswitch(lag, lag->netdev); } else { struct ice_lag *primary_lag; @@ -1419,6 +1444,7 @@ static void ice_lag_monitor_active(struct ice_lag *lag, void *ptr) ice_lag_move_vf_nodes(lag, prim_port, event_port); lag->active_port = event_port; + ice_lag_config_eswitch(lag, event_netdev); return; } @@ -1428,6 +1454,7 @@ static void ice_lag_monitor_active(struct ice_lag *lag, void *ptr) /* new active port */ ice_lag_move_vf_nodes(lag, lag->active_port, event_port); lag->active_port = event_port; + ice_lag_config_eswitch(lag, event_netdev); } else { /* port not set as currently active (e.g. new active port * has already claimed the nodes and filters diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 5d2d7736fd5f..f1c06c227dc5 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -2368,7 +2368,9 @@ ice_xmit_frame_ring(struct sk_buff *skb, struct ice_tx_ring *tx_ring) ICE_TXD_CTX_QW1_CMD_S); ice_tstamp(tx_ring, skb, first, &offload); - if (ice_is_switchdev_running(vsi->back) && vsi->type != ICE_VSI_SF) + if ((ice_is_switchdev_running(vsi->back) || + ice_lag_is_switchdev_running(vsi->back)) && + vsi->type != ICE_VSI_SF) ice_eswitch_set_target_vsi(skb, &offload); if (offload.cd_qw1 & ICE_TX_DESC_DTYPE_CTX) {