[net-next,0/3] cxgb4: add mirror action support for TC-MATCHALL

Message ID cover.1593085107.git.rahul.lakkireddy@chelsio.com

Message

Rahul Lakkireddy June 25, 2020, 11:58 a.m. UTC
This series of patches adds support to mirror all ingress traffic
for TC-MATCHALL ingress offload.

Patch 1 adds support to dynamically create a mirror Virtual Interface
(VI) that accepts all mirror ingress traffic when mirror action is
set in TC-MATCHALL offload.

Patch 2 adds support to allocate mirror Rxqs and set up RSS for the
mirror VI.

Patch 3 adds support to replicate all the main VI configuration to
the mirror VI. This includes replicating MTU, promiscuous mode,
all-multicast mode, and enabled netdev Rx feature offloads.

Thanks,
Rahul

Rahul Lakkireddy (3):
  cxgb4: add mirror action to TC-MATCHALL offload
  cxgb4: add support for mirror Rxqs
  cxgb4: add main VI to mirror VI config replication

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h    |  21 +
 .../ethernet/chelsio/cxgb4/cxgb4_debugfs.c    |  69 ++-
 .../net/ethernet/chelsio/cxgb4/cxgb4_main.c   | 487 ++++++++++++++++--
 .../ethernet/chelsio/cxgb4/cxgb4_tc_flower.c  |  16 +-
 .../ethernet/chelsio/cxgb4/cxgb4_tc_flower.h  |   3 +-
 .../chelsio/cxgb4/cxgb4_tc_matchall.c         |  57 +-
 .../chelsio/cxgb4/cxgb4_tc_matchall.h         |   1 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c    |  16 +
 8 files changed, 626 insertions(+), 44 deletions(-)

Comments

Jakub Kicinski June 25, 2020, 10:55 p.m. UTC | #1
On Thu, 25 Jun 2020 17:28:40 +0530 Rahul Lakkireddy wrote:
> This series of patches adds support to mirror all ingress traffic
> for TC-MATCHALL ingress offload.
> 
> Patch 1 adds support to dynamically create a mirror Virtual Interface
> (VI) that accepts all mirror ingress traffic when mirror action is
> set in TC-MATCHALL offload.
> 
> Patch 2 adds support to allocate mirror Rxqs and set up RSS for the
> mirror VI.
> 
> Patch 3 adds support to replicate all the main VI configuration to
> the mirror VI. This includes replicating MTU, promiscuous mode,
> all-multicast mode, and enabled netdev Rx feature offloads.

Could you say more about this mirror VI? Is this an internal object
within the NIC or something visible to the user?

Also looking at the implementation of redirect:

		case FLOW_ACTION_REDIRECT: {
			struct net_device *out = act->dev;
			struct port_info *pi = netdev_priv(out);

			fs->action = FILTER_SWITCH;
			fs->eport = pi->port_id;
			}

How do you know the output interface is controlled by your driver, and
therefore it's safe to cast netdev_priv() to port_info?
Rahul Lakkireddy June 26, 2020, 10:06 a.m. UTC | #2
On Thursday, June 06/25/20, 2020 at 15:55:10 -0700, Jakub Kicinski wrote:
> On Thu, 25 Jun 2020 17:28:40 +0530 Rahul Lakkireddy wrote:
> > This series of patches adds support to mirror all ingress traffic
> > for TC-MATCHALL ingress offload.
> > 
> > Patch 1 adds support to dynamically create a mirror Virtual Interface
> > (VI) that accepts all mirror ingress traffic when mirror action is
> > set in TC-MATCHALL offload.
> > 
> > Patch 2 adds support to allocate mirror Rxqs and set up RSS for the
> > mirror VI.
> > 
> > Patch 3 adds support to replicate all the main VI configuration to
> > the mirror VI. This includes replicating MTU, promiscuous mode,
> > all-multicast mode, and enabled netdev Rx feature offloads.
> 
> Could you say more about this mirror VI? Is this an internal object
> within the NIC or something visible to the user?
> 

The Virtual Interface (VI) is an internal object managed by firmware
and Multi Port Switch (MPS) module in hardware. Each VI can be
programmed with a unique MAC address in the MPS TCAM. So, 1 physical
port can have multiple VIs, each with their own MAC address. It's
also possible for VIs to share the same MAC address, which would
result in MPS setting the replication mode for that entry in the
TCAM. In this case, the incoming packet would get replicated and
sent to all the VIs sharing the MAC address. When MPS is able to
classify the destination MAC in the incoming packet with an entry
in the MPS TCAM, it forwards the packet to the corresponding VI(s).

In the case of the mirror VI, we program the same MAC as the existing main
VI. This will result in MPS setting the replication mode for that
existing entry in the MPS TCAM. So, the MPS would replicate the
incoming packet and send it to both the main VI and mirror VI.
Note that for the main VI, we also programmed the flow Lookup Engine
(LE) module to switch the packet back out on one of the underlying
ports. So, when this rule hits in the LE, the main VI's packet would
get switched back out in hardware to one of the underlying ports and
will not reach the driver. The mirror VI's packet will not hit any rule
in the LE and will be received by the driver and sent up to the
Linux networking stack.
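
Roughly, in pseudo-code (mirror_vi_alloc() and vi_set_mac() are made-up
names standing in for the firmware mailbox calls, not the driver's real
API):

	/* Program the same MAC for both VIs.  Sharing the MAC puts
	 * the MPS TCAM entry into replication mode, so an ingress
	 * packet matching it is delivered to both the main VI and
	 * the mirror VI.
	 */
	mirror_viid = mirror_vi_alloc(adap, pi->port_id);
	vi_set_mac(adap, pi->viid, mac_addr);
	vi_set_mac(adap, mirror_viid, mac_addr);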


> Also looking at the implementation of redirect:
> 
> 		case FLOW_ACTION_REDIRECT: {
> 			struct net_device *out = act->dev;
> 			struct port_info *pi = netdev_priv(out);
> 
> 			fs->action = FILTER_SWITCH;
> 			fs->eport = pi->port_id;
> 			}
> 
> How do you know the output interface is controlled by your driver, and
> therefore it's safe to cast netdev_priv() to port_info?

We're validating it earlier in cxgb4_validate_flow_actions().
Here's the code snippet. We're saving the netdevice pointer returned
by alloc_etherdev_mq() during PCI probe in cxgb4_main.c::init_one()
and using it to compare the netdevice given by redirect action. If
the redirect action's netdevice doesn't match any of our underlying
ports, then we fail offloading this rule.

		case FLOW_ACTION_REDIRECT: {
			struct adapter *adap = netdev2adap(dev);
			struct net_device *n_dev, *target_dev;
			unsigned int i;
			bool found = false;

			target_dev = act->dev;
			for_each_port(adap, i) {
				n_dev = adap->port[i];
				if (target_dev == n_dev) {
					found = true;
					break;
				}
			}

			/* If interface doesn't belong to our hw, then
			 * the provided output port is not valid
			 */
			if (!found) {
				netdev_err(dev, "%s: Out port invalid\n",
					   __func__);
				return -EINVAL;
			}
			act_redir = true;
			}
			break;


Thanks,
Rahul
Jakub Kicinski June 26, 2020, 4:55 p.m. UTC | #3
On Fri, 26 Jun 2020 15:36:15 +0530 Rahul Lakkireddy wrote:
> On Thursday, June 06/25/20, 2020 at 15:55:10 -0700, Jakub Kicinski wrote:
> > On Thu, 25 Jun 2020 17:28:40 +0530 Rahul Lakkireddy wrote:  
> > > This series of patches adds support to mirror all ingress traffic
> > > for TC-MATCHALL ingress offload.
> > > 
> > > Patch 1 adds support to dynamically create a mirror Virtual Interface
> > > (VI) that accepts all mirror ingress traffic when mirror action is
> > > set in TC-MATCHALL offload.
> > > 
> > > Patch 2 adds support to allocate mirror Rxqs and set up RSS for the
> > > mirror VI.
> > > 
> > > Patch 3 adds support to replicate all the main VI configuration to
> > > the mirror VI. This includes replicating MTU, promiscuous mode,
> > > all-multicast mode, and enabled netdev Rx feature offloads.  
> > 
> > Could you say more about this mirror VI? Is this an internal object
> > within the NIC or something visible to the user?
> >   
> 
> The Virtual Interface (VI) is an internal object managed by firmware
> and Multi Port Switch (MPS) module in hardware. Each VI can be
> programmed with a unique MAC address in the MPS TCAM. So, 1 physical
> port can have multiple VIs, each with their own MAC address. It's
> also possible for VIs to share the same MAC address, which would
> result in MPS setting the replication mode for that entry in the
> TCAM. In this case, the incoming packet would get replicated and
> sent to all the VIs sharing the MAC address. When MPS is able to
> classify the destination MAC in the incoming packet with an entry
> in the MPS TCAM, it forwards the packet to the corresponding VI(s).
> 
> In the case of the mirror VI, we program the same MAC as the existing main
> VI. This will result in MPS setting the replication mode for that
> existing entry in the MPS TCAM. So, the MPS would replicate the
> incoming packet and send it to both the main VI and mirror VI.

So far sounds good.

> Note that for the main VI, we also programmed the flow Lookup Engine
> (LE) module to switch the packet back out on one of the underlying
> ports. So, when this rule hits in the LE, the main VI's packet would
> get switched back out in hardware to one of the underlying ports and
> will not reach the driver. The mirror VI's packet will not hit any rule
> in the LE and will be received by the driver and sent up to the
> Linux networking stack.

This I'm not sure I'm following. Are you saying that if there is another
(flower, not matchall) rule programmed it will be ignored as far 
as the matchall filter is concerned? I assume you ensure the matchall
rule is at higher prio in that case?

> > Also looking at the implementation of redirect:
> > 
> > 		case FLOW_ACTION_REDIRECT: {
> > 			struct net_device *out = act->dev;
> > 			struct port_info *pi = netdev_priv(out);
> > 
> > 			fs->action = FILTER_SWITCH;
> > 			fs->eport = pi->port_id;
> > 			}
> > 
> > How do you know the output interface is controlled by your driver, and
> > therefore it's safe to cast netdev_priv() to port_info?
> 
> We're validating it earlier in cxgb4_validate_flow_actions().
> Here's the code snippet. We're saving the netdevice pointer returned
> by alloc_etherdev_mq() during PCI probe in cxgb4_main.c::init_one()
> and using it to compare the netdevice given by redirect action. If
> the redirect action's netdevice doesn't match any of our underlying
> ports, then we fail offloading this rule.
> 
> 		case FLOW_ACTION_REDIRECT: {
> 			struct adapter *adap = netdev2adap(dev);
> 			struct net_device *n_dev, *target_dev;
> 			unsigned int i;
> 			bool found = false;
> 
> 			target_dev = act->dev;
> 			for_each_port(adap, i) {
> 				n_dev = adap->port[i];
> 				if (target_dev == n_dev) {
> 					found = true;
> 					break;
> 				}
> 			}
> 
> 			/* If interface doesn't belong to our hw, then
> 			 * the provided output port is not valid
> 			 */
> 			if (!found) {
> 				netdev_err(dev, "%s: Out port invalid\n",
> 					   __func__);
> 				return -EINVAL;
> 			}
> 			act_redir = true;
> 			}
> 			break;

Thanks, FWIW this is what netdev_port_same_parent_id() is for.
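
i.e. something like this (untested, and it needs the driver to
implement ndo_get_port_parent_id() first):

		case FLOW_ACTION_REDIRECT:
			/* netdev_port_same_parent_id() compares the
			 * switchdev parent IDs of both netdevs, so any
			 * interface not backed by this adapter is
			 * rejected before netdev_priv() is touched.
			 */
			if (!netdev_port_same_parent_id(dev, act->dev)) {
				netdev_err(dev, "%s: Out port invalid\n",
					   __func__);
				return -EINVAL;
			}
			act_redir = true;
			break;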
Rahul Lakkireddy June 26, 2020, 8:50 p.m. UTC | #4
On Friday, June 06/26/20, 2020 at 09:55:49 -0700, Jakub Kicinski wrote:
> On Fri, 26 Jun 2020 15:36:15 +0530 Rahul Lakkireddy wrote:
> > On Thursday, June 06/25/20, 2020 at 15:55:10 -0700, Jakub Kicinski wrote:
> > > On Thu, 25 Jun 2020 17:28:40 +0530 Rahul Lakkireddy wrote:  
> > > > This series of patches adds support to mirror all ingress traffic
> > > > for TC-MATCHALL ingress offload.
> > > > 
> > > > Patch 1 adds support to dynamically create a mirror Virtual Interface
> > > > (VI) that accepts all mirror ingress traffic when mirror action is
> > > > set in TC-MATCHALL offload.
> > > > 
> > > > Patch 2 adds support to allocate mirror Rxqs and set up RSS for the
> > > > mirror VI.
> > > > 
> > > > Patch 3 adds support to replicate all the main VI configuration to
> > > > the mirror VI. This includes replicating MTU, promiscuous mode,
> > > > all-multicast mode, and enabled netdev Rx feature offloads.  
> > > 
> > > Could you say more about this mirror VI? Is this an internal object
> > > within the NIC or something visible to the user?
> > >   
> > 
> > The Virtual Interface (VI) is an internal object managed by firmware
> > and Multi Port Switch (MPS) module in hardware. Each VI can be
> > programmed with a unique MAC address in the MPS TCAM. So, 1 physical
> > port can have multiple VIs, each with their own MAC address. It's
> > also possible for VIs to share the same MAC address, which would
> > result in MPS setting the replication mode for that entry in the
> > TCAM. In this case, the incoming packet would get replicated and
> > sent to all the VIs sharing the MAC address. When MPS is able to
> > classify the destination MAC in the incoming packet with an entry
> > in the MPS TCAM, it forwards the packet to the corresponding VI(s).
> > 
> > In the case of the mirror VI, we program the same MAC as the existing main
> > VI. This will result in MPS setting the replication mode for that
> > existing entry in the MPS TCAM. So, the MPS would replicate the
> > incoming packet and send it to both the main VI and mirror VI.
> 
> So far sounds good.
> 
> > Note that for the main VI, we also programmed the flow Lookup Engine
> > (LE) module to switch the packet back out on one of the underlying
> > ports. So, when this rule hits in the LE, the main VI's packet would
> > get switched back out in hardware to one of the underlying ports and
> > will not reach the driver. The mirror VI's packet will not hit any rule
> > in the LE and will be received by the driver and sent up to the
> > Linux networking stack.
> 
> This I'm not sure I'm following. Are you saying that if there is another
> (flower, not matchall) rule programmed it will be ignored as far 
> as the matchall filter is concerned? I assume you ensure the matchall
> rule is at higher prio in that case?
> 

The order is still maintained. If there's a higher-priority
flower rule, then that rule will be considered before the
matchall rule.

For example, let's say we have 2 rules like below:

# tc filter add dev enp2s0f4 ingress protocol ip pref 1 \
	flower skip_sw src_ip 10.1.3.3 action drop

# tc filter add dev enp2s0f4 ingress pref 100 \
	matchall skip_sw action mirred egress mirror dev enp2s0f4d1


If we're receiving a packet with src_ip 10.1.3.3, then rule prio 1
will hit first and the lower prio 100 matchall rule will never hit
for that packet. For all other packets, the matchall rule prio 100
will always hit.

I was trying to explain that special care must be taken to make
sure the redirect action of the mirror rule is only performed
for the main VI's packet, so that it gets switched out to enp2s0f4d1 and
_must not_ reach the driver. The same redirect action must not be
performed for the mirror VI's replicated packet and the packet _must_
reach the driver. We're ensuring this by explicitly programming the
main VI's index for the filter entry in hardware. This way the hardware
will redirect out only the main VI's packet to enp2s0f4d1. It will not
perform the redirect on the replicated packet coming from the mirror VI.
If no VI index is programmed, then the hardware will redirect out
both the main and mirror VI packets to enp2s0f4d1 and none of
the replicated packets will reach the driver, which is unexpected.
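
Roughly (untested sketch; the field names are assumptions based on how
fs is used elsewhere in this thread, with pi->vin standing for the main
VI's number):

		/* Qualify the switch rule with the main VI, so only
		 * the main VI's packet matches and is switched out in
		 * hardware; the mirror VI's replicated copy misses
		 * the rule and is delivered to the driver instead.
		 */
		fs->val.pfvf_vld = 1;
		fs->val.pf = adap->pf;
		fs->val.vf = pi->vin;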

I hope I'm making sense... :)

> > > Also looking at the implementation of redirect:
> > > 
> > > 		case FLOW_ACTION_REDIRECT: {
> > > 			struct net_device *out = act->dev;
> > > 			struct port_info *pi = netdev_priv(out);
> > > 
> > > 			fs->action = FILTER_SWITCH;
> > > 			fs->eport = pi->port_id;
> > > 			}
> > > 
> > > How do you know the output interface is controlled by your driver, and
> > > therefore it's safe to cast netdev_priv() to port_info?
> > 
> > We're validating it earlier in cxgb4_validate_flow_actions().
> > Here's the code snippet. We're saving the netdevice pointer returned
> > by alloc_etherdev_mq() during PCI probe in cxgb4_main.c::init_one()
> > and using it to compare the netdevice given by redirect action. If
> > the redirect action's netdevice doesn't match any of our underlying
> > ports, then we fail offloading this rule.
> > 
> > 		case FLOW_ACTION_REDIRECT: {
> > 			struct adapter *adap = netdev2adap(dev);
> > 			struct net_device *n_dev, *target_dev;
> > 			unsigned int i;
> > 			bool found = false;
> > 
> > 			target_dev = act->dev;
> > 			for_each_port(adap, i) {
> > 				n_dev = adap->port[i];
> > 				if (target_dev == n_dev) {
> > 					found = true;
> > 					break;
> > 				}
> > 			}
> > 
> > 			/* If interface doesn't belong to our hw, then
> > 			 * the provided output port is not valid
> > 			 */
> > 			if (!found) {
> > 				netdev_err(dev, "%s: Out port invalid\n",
> > 					   __func__);
> > 				return -EINVAL;
> > 			}
> > 			act_redir = true;
> > 			}
> > 			break;
> 
> Thanks, FWIW this is what netdev_port_same_parent_id() is for.

Looks like I need to add ndo_get_port_parent_id() support in the
driver for this to work. I'll look into adding this support in
follow-up patches.
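
Something like this, perhaps (untested sketch; the real patch may
derive the parent ID differently, e.g. from the adapter's serial
number):

static int cxgb_get_port_parent_id(struct net_device *dev,
				   struct netdev_phys_item_id *ppid)
{
	struct adapter *adap = netdev2adap(dev);

	/* Report the same parent ID for every port on this adapter,
	 * so netdev_port_same_parent_id() can tell whether two
	 * netdevs share the same hardware.
	 */
	ppid->id_len = scnprintf((char *)ppid->id, sizeof(ppid->id),
				 "%s", pci_name(adap->pdev));
	return 0;
}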

Thanks,
Rahul
Jakub Kicinski June 27, 2020, 4:17 a.m. UTC | #5
On Sat, 27 Jun 2020 02:20:12 +0530 Rahul Lakkireddy wrote:
> On Friday, June 06/26/20, 2020 at 09:55:49 -0700, Jakub Kicinski wrote:
> > On Fri, 26 Jun 2020 15:36:15 +0530 Rahul Lakkireddy wrote:  
> > > On Thursday, June 06/25/20, 2020 at 15:55:10 -0700, Jakub Kicinski wrote:  
> > > > On Thu, 25 Jun 2020 17:28:40 +0530 Rahul Lakkireddy wrote:    
> > > > > This series of patches adds support to mirror all ingress traffic
> > > > > for TC-MATCHALL ingress offload.
> > > > > 
> > > > > Patch 1 adds support to dynamically create a mirror Virtual Interface
> > > > > (VI) that accepts all mirror ingress traffic when mirror action is
> > > > > set in TC-MATCHALL offload.
> > > > > 
> > > > > Patch 2 adds support to allocate mirror Rxqs and set up RSS for the
> > > > > mirror VI.
> > > > > 
> > > > > Patch 3 adds support to replicate all the main VI configuration to
> > > > > the mirror VI. This includes replicating MTU, promiscuous mode,
> > > > > all-multicast mode, and enabled netdev Rx feature offloads.    
> > > > 
> > > > Could you say more about this mirror VI? Is this an internal object
> > > > within the NIC or something visible to the user?
> > > >     
> > > 
> > > The Virtual Interface (VI) is an internal object managed by firmware
> > > and Multi Port Switch (MPS) module in hardware. Each VI can be
> > > programmed with a unique MAC address in the MPS TCAM. So, 1 physical
> > > port can have multiple VIs, each with their own MAC address. It's
> > > also possible for VIs to share the same MAC address, which would
> > > result in MPS setting the replication mode for that entry in the
> > > TCAM. In this case, the incoming packet would get replicated and
> > > sent to all the VIs sharing the MAC address. When MPS is able to
> > > classify the destination MAC in the incoming packet with an entry
> > > in the MPS TCAM, it forwards the packet to the corresponding VI(s).
> > > 
> > > In the case of the mirror VI, we program the same MAC as the existing main
> > > VI. This will result in MPS setting the replication mode for that
> > > existing entry in the MPS TCAM. So, the MPS would replicate the
> > > incoming packet and send it to both the main VI and mirror VI.  
> > 
> > So far sounds good.
> >   
> > > Note that for the main VI, we also programmed the flow Lookup Engine
> > > (LE) module to switch the packet back out on one of the underlying
> > > ports. So, when this rule hits in the LE, the main VI's packet would
> > > get switched back out in hardware to one of the underlying ports and
> > > will not reach the driver. The mirror VI's packet will not hit any rule
> > > in the LE and will be received by the driver and sent up to the
> > > Linux networking stack.
> > 
> > This I'm not sure I'm following. Are you saying that if there is another
> > (flower, not matchall) rule programmed it will be ignored as far 
> > as the matchall filter is concerned? I assume you ensure the matchall
> > rule is at higher prio in that case?
> >   
> 
> The order is still maintained. If there's a higher-priority
> flower rule, then that rule will be considered before the
> matchall rule.
> 
> For example, let's say we have 2 rules like below:
> 
> # tc filter add dev enp2s0f4 ingress protocol ip pref 1 \
> 	flower skip_sw src_ip 10.1.3.3 action drop
> 
> # tc filter add dev enp2s0f4 ingress pref 100 \
> 	matchall skip_sw action mirred egress mirror dev enp2s0f4d1
> 
> 
> If we're receiving a packet with src_ip 10.1.3.3, then rule prio 1
> will hit first and the lower prio 100 matchall rule will never hit
> for that packet. For all other packets, the matchall rule prio 100
> will always hit.
> 
> I was trying to explain that special care must be taken to make
> sure the redirect action of the mirror rule is only performed
> for the main VI's packet, so that it gets switched out to enp2s0f4d1 and
> _must not_ reach the driver. The same redirect action must not be
> performed for the mirror VI's replicated packet and the packet _must_
> reach the driver. We're ensuring this by explicitly programming the
> main VI's index for the filter entry in hardware. This way the hardware
> will redirect out only the main VI's packet to enp2s0f4d1. It will not
> perform the redirect on the replicated packet coming from the mirror VI.
> If no VI index is programmed, then the hardware will redirect out
> both the main and mirror VI packets to enp2s0f4d1 and none of
> the replicated packets will reach the driver, which is unexpected.
> 
> I hope I'm making sense... :)

What's the main use case for this feature? It appears that you're
allocating queues in patch 2. At the same time I don't see SWITCHDEV
mode / representors in this driver. So the use case is to redirect a
packet out to another port while still receiving a copy?
Rahul Lakkireddy June 27, 2020, 6:44 a.m. UTC | #6
On Friday, June 06/26/20, 2020 at 21:17:50 -0700, Jakub Kicinski wrote:
> On Sat, 27 Jun 2020 02:20:12 +0530 Rahul Lakkireddy wrote:
> > On Friday, June 06/26/20, 2020 at 09:55:49 -0700, Jakub Kicinski wrote:
> > > On Fri, 26 Jun 2020 15:36:15 +0530 Rahul Lakkireddy wrote:  
> > > > On Thursday, June 06/25/20, 2020 at 15:55:10 -0700, Jakub Kicinski wrote:  
> > > > > On Thu, 25 Jun 2020 17:28:40 +0530 Rahul Lakkireddy wrote:    
> > > > > > This series of patches adds support to mirror all ingress traffic
> > > > > > for TC-MATCHALL ingress offload.
> > > > > > 
> > > > > > Patch 1 adds support to dynamically create a mirror Virtual Interface
> > > > > > (VI) that accepts all mirror ingress traffic when mirror action is
> > > > > > set in TC-MATCHALL offload.
> > > > > > 
> > > > > > Patch 2 adds support to allocate mirror Rxqs and set up RSS for the
> > > > > > mirror VI.
> > > > > > 
> > > > > > Patch 3 adds support to replicate all the main VI configuration to
> > > > > > the mirror VI. This includes replicating MTU, promiscuous mode,
> > > > > > all-multicast mode, and enabled netdev Rx feature offloads.    
> > > > > 
> > > > > Could you say more about this mirror VI? Is this an internal object
> > > > > within the NIC or something visible to the user?
> > > > >     
> > > > 
> > > > The Virtual Interface (VI) is an internal object managed by firmware
> > > > and Multi Port Switch (MPS) module in hardware. Each VI can be
> > > > programmed with a unique MAC address in the MPS TCAM. So, 1 physical
> > > > port can have multiple VIs, each with their own MAC address. It's
> > > > also possible for VIs to share the same MAC address, which would
> > > > result in MPS setting the replication mode for that entry in the
> > > > TCAM. In this case, the incoming packet would get replicated and
> > > > sent to all the VIs sharing the MAC address. When MPS is able to
> > > > classify the destination MAC in the incoming packet with an entry
> > > > in the MPS TCAM, it forwards the packet to the corresponding VI(s).
> > > > 
> > > > In the case of the mirror VI, we program the same MAC as the existing main
> > > > VI. This will result in MPS setting the replication mode for that
> > > > existing entry in the MPS TCAM. So, the MPS would replicate the
> > > > incoming packet and send it to both the main VI and mirror VI.  
> > > 
> > > So far sounds good.
> > >   
> > > > Note that for the main VI, we also programmed the flow Lookup Engine
> > > > (LE) module to switch the packet back out on one of the underlying
> > > > ports. So, when this rule hits in the LE, the main VI's packet would
> > > > get switched back out in hardware to one of the underlying ports and
> > > > will not reach the driver. The mirror VI's packet will not hit any rule
> > > > in the LE and will be received by the driver and sent up to the
> > > > Linux networking stack.
> > > 
> > > This I'm not sure I'm following. Are you saying that if there is another
> > > (flower, not matchall) rule programmed it will be ignored as far 
> > > as the matchall filter is concerned? I assume you ensure the matchall
> > > rule is at higher prio in that case?
> > >   
> > 
> > The order is still maintained. If there's a higher-priority
> > flower rule, then that rule will be considered before the
> > matchall rule.
> > 
> > For example, let's say we have 2 rules like below:
> > 
> > # tc filter add dev enp2s0f4 ingress protocol ip pref 1 \
> > 	flower skip_sw src_ip 10.1.3.3 action drop
> > 
> > # tc filter add dev enp2s0f4 ingress pref 100 \
> > 	matchall skip_sw action mirred egress mirror dev enp2s0f4d1
> > 
> > 
> > If we're receiving a packet with src_ip 10.1.3.3, then rule prio 1
> > will hit first and the lower prio 100 matchall rule will never hit
> > for that packet. For all other packets, the matchall rule prio 100
> > will always hit.
> > 
> > I was trying to explain that special care must be taken to make
> > sure the redirect action of the mirror rule is only performed
> > for the main VI's packet, so that it gets switched out to enp2s0f4d1 and
> > _must not_ reach the driver. The same redirect action must not be
> > performed for the mirror VI's replicated packet and the packet _must_
> > reach the driver. We're ensuring this by explicitly programming the
> > main VI's index for the filter entry in hardware. This way the hardware
> > will redirect out only the main VI's packet to enp2s0f4d1. It will not
> > perform the redirect on the replicated packet coming from the mirror VI.
> > If no VI index is programmed, then the hardware will redirect out
> > both the main and mirror VI packets to enp2s0f4d1 and none of
> > the replicated packets will reach the driver, which is unexpected.
> > 
> > I hope I'm making sense... :)
> 
> What's the main use case for this feature? It appears that you're
> allocating queues in patch 2. At the same time I don't see SWITCHDEV
> mode / representors in this driver. So the use case is to redirect a
> packet out to another port while still receiving a copy?

The main use case is to sniff packets that are being switched out
by hardware. The requirement is that there would be higher priority
flower rules that will accept specific traffic and all the other
traffic will be switched out on one of the underlying ports.
Occasionally, we want to sniff the packets that are being switched
out by replacing the redirect action with a mirror action.
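
For example, reusing the interfaces from earlier in the thread, the
switched and sniffed variants would differ only in the mirred action:

# tc filter add dev enp2s0f4 ingress pref 100 \
	matchall skip_sw action mirred egress redirect dev enp2s0f4d1

# tc filter add dev enp2s0f4 ingress pref 100 \
	matchall skip_sw action mirred egress mirror dev enp2s0f4d1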

The reason for allocating queues is that the VIs are isolated from
each other and can't access each other's queues. So, separate queues
must be allocated for the mirror VI.

Thanks,
Rahul
Jakub Kicinski June 29, 2020, 10:53 p.m. UTC | #7
On Sat, 27 Jun 2020 12:14:07 +0530 Rahul Lakkireddy wrote:
> > What's the main use case for this feature? It appears that you're
> > allocating queues in patch 2. At the same time I don't see SWITCHDEV
> > mode / representors in this driver. So the use case is to redirect a
> > packet out to another port while still receiving a copy?  
> 
> The main use case is to sniff packets that are being switched out
> by hardware. The requirement is that there would be higher priority
> flower rules that will accept specific traffic and all the other
> traffic will be switched out on one of the underlying ports.
> Occasionally, we want to sniff the packets that are being switched
> out by replacing the redirect action with a mirror action.
> 
> The reason for allocating queues is that the VIs are isolated from
> each other and can't access each other's queues. So, separate queues
> must be allocated for the mirror VI.

Sounds reasonable, please fix the use of refcount_t (perhaps just use 
a normal integer since structures are protected), and repost.