Message ID | 1583937238-21511-1-git-send-email-paulb@mellanox.com |
---|---|
Headers | show |
Series | Introduce connection tracking offload | expand |
On Wed, Mar 11, 2020 at 04:33:43PM +0200, Paul Blakey wrote: > Applying this patchset > -------------------------- > > On top of current net-next ("r8169: simplify getting stats by using netdev_stats_to_stats64"), > pull Saeed's ct-offload branch, from git git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git > and fix the following non trivial conflict in fs_core.c as follows: > #define OFFLOADS_MAX_FT 2 > #define OFFLOADS_NUM_PRIOS 2 > #define OFFLOADS_MIN_LEVEL (ANCHOR_MIN_LEVEL + OFFLOADS_NUM_PRIOS) > > Then apply this patchset. I did this and I couldn't get tc offloading (not ct) to work anymore. Then I moved to current net-next (the commit you mentioned above), and got the same thing. What I can tell so far is that perf script | head handler11 4415 [009] 1263.438424: probe:mlx5e_configure_flower__return: (ffffffffc094fa80 <- ffffffff93dc510a) arg1=0xffffffffffffffa1 and that's EOPNOTSUPP. Not sure yet where that is coming from. Btw, it's prooving to be a nice exercise to find out why it is failing to offload. Perhaps some netdev_err_once() is welcomed. Marcelo
On 11/03/2020 21:13, Marcelo Ricardo Leitner wrote: > On Wed, Mar 11, 2020 at 04:33:43PM +0200, Paul Blakey wrote: >> Applying this patchset >> -------------------------- >> >> On top of current net-next ("r8169: simplify getting stats by using netdev_stats_to_stats64"), >> pull Saeed's ct-offload branch, from git git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git >> and fix the following non trivial conflict in fs_core.c as follows: >> #define OFFLOADS_MAX_FT 2 >> #define OFFLOADS_NUM_PRIOS 2 >> #define OFFLOADS_MIN_LEVEL (ANCHOR_MIN_LEVEL + OFFLOADS_NUM_PRIOS) >> >> Then apply this patchset. > I did this and I couldn't get tc offloading (not ct) to work anymore. > Then I moved to current net-next (the commit you mentioned above), and > got the same thing. > > What I can tell so far is that > perf script | head > handler11 4415 [009] 1263.438424: probe:mlx5e_configure_flower__return: (ffffffffc094fa80 <- ffffffff93dc510a) arg1=0xffffffffffffffa1 > > and that's EOPNOTSUPP. Not sure yet where that is coming from. I guess you fail at what this patch "flow_offload: fix allowed types check" is suppose to fix https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=a393daa8993fd7d6c9c33110d5dac08bc0dc2696 That was the last bug that caused what you decribe - all tc rules failed. It should be before the patch I detailed as seen in git log here: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/log/ Can you check you have it? im not sure if compiling just the driver is enough. if it is this, it should have fixed it. if not try skipping flow_action_hw_stats_types_check() calls in mlx5 driver. there is a netlink error msg most of the cases, try skip_sw or verbose to see. > > Btw, it's prooving to be a nice exercise to find out why it is failing > to offload. Perhaps some netdev_err_once() is welcomed. > > Marcelo
On Thu, Mar 12, 2020 at 12:27:37AM +0200, Paul Blakey wrote: > > On 11/03/2020 21:13, Marcelo Ricardo Leitner wrote: > > On Wed, Mar 11, 2020 at 04:33:43PM +0200, Paul Blakey wrote: > >> Applying this patchset > >> -------------------------- > >> > >> On top of current net-next ("r8169: simplify getting stats by using netdev_stats_to_stats64"), > >> pull Saeed's ct-offload branch, from git git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git > >> and fix the following non trivial conflict in fs_core.c as follows: > >> #define OFFLOADS_MAX_FT 2 > >> #define OFFLOADS_NUM_PRIOS 2 > >> #define OFFLOADS_MIN_LEVEL (ANCHOR_MIN_LEVEL + OFFLOADS_NUM_PRIOS) > >> > >> Then apply this patchset. > > I did this and I couldn't get tc offloading (not ct) to work anymore. > > Then I moved to current net-next (the commit you mentioned above), and > > got the same thing. > > > > What I can tell so far is that > > perf script | head > > handler11 4415 [009] 1263.438424: probe:mlx5e_configure_flower__return: (ffffffffc094fa80 <- ffffffff93dc510a) arg1=0xffffffffffffffa1 > > > > and that's EOPNOTSUPP. Not sure yet where that is coming from. > > I guess you fail at what this patch "flow_offload: fix allowed types check" is suppose to fix > > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=a393daa8993fd7d6c9c33110d5dac08bc0dc2696 > > That was the last bug that caused what you decribe - all tc rules failed. > > It should be before the patch I detailed as seen in git log here: > > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/log/ > > > Can you check you have it? im not sure if compiling just the driver is enough. I had this one, in both tests. > > if it is this, it should have fixed it. > > if not try skipping flow_action_hw_stats_types_check() calls in mlx5 driver. Ok. This one was my main suspect now after some extra printks. I could confirm it parse_tc_fdb_actions is returning the error, but not sure why yet. > > there is a netlink error msg most of the cases, try skip_sw or verbose to see. Yeah.. that probably would be easier, thanks. I'm using it under OvS and without skip_sw, got just no logs for it. > > > > > Btw, it's prooving to be a nice exercise to find out why it is failing > > to offload. Perhaps some netdev_err_once() is welcomed. > > > > Marcelo
On Wed, Mar 11, 2020 at 07:44:15PM -0300, Marcelo Ricardo Leitner wrote: > On Thu, Mar 12, 2020 at 12:27:37AM +0200, Paul Blakey wrote: ... > > if not try skipping flow_action_hw_stats_types_check() calls in mlx5 driver. > > Ok. This one was my main suspect now after some extra printks. I could > confirm it parse_tc_fdb_actions is returning the error, but not sure > why yet. Copied the extack in flow_action_mixed_hw_stats_types_check() onto a printk and logged the if parameters: @@ -284,6 +284,8 @@ flow_action_mixed_hw_stats_types_check(const struct flow_action *action, flow_action_for_each(i, action_entry, action) { if (i && action_entry->hw_stats_type != last_hw_stats_type) { + printk("Mixing HW stats types for actions is not supported, %d %d %d\n", + i, action_entry->hw_stats_type, last_hw_stats_type); NL_SET_ERR_MSG_MOD(extack, "Mixing HW stats types for actions is not supported"); [ 188.554636] Mixing HW stats types for actions is not supported, 2 0 3 with iproute-next from today loaded with 'iproute2/net-next v2] tc: m_action: introduce support for hw stats type', I get a dump like: # ./tc -s filter show dev vxlan_sys_4789 ingress filter protocol ip pref 2 flower chain 0 filter protocol ip pref 2 flower chain 0 handle 0x1 dst_mac 6a:66:2d:48:92:c2 src_mac 00:00:00:00:0e:b7 eth_type ipv4 ip_proto udp ip_ttl 64 src_port 100 enc_dst_ip 1.1.1.1 enc_src_ip 1.1.1.2 enc_key_id 0 enc_dst_port 4789 enc_tos 0 ip_flags nofrag not_in_hw action order 1: tunnel_key unset pipe index 4 ref 1 bind 1 installed 2432 sec used 0 sec Action statistics: Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 no_percpu action order 2: pedit action pipe keys 2 index 4 ref 1 bind 1 installed 2432 sec used 0 sec key #0 at ipv4+8: val 3f000000 mask 00ffffff key #1 at udp+0: val 13890000 mask 0000ffff Action statistics: Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 3: csum (iph, udp) action pipe index 4 ref 1 bind 1 installed 2432 sec used 0 sec Action statistics: Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 no_percpu action order 4: mirred (Egress Redirect to device eth0) stolen index 1 ref 1 bind 1 installed 2432 sec used 0 sec Action statistics: Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 9c7d6b3aa84a32498181d98da0af80d2 no_percpu Which is quite interesting as it is not listing any hwstats fields at all. Considering that due to tcf_action_hw_stats_type_get() and hw_stats_type initialization in tcf_action_init_1(), the default is for it to be 3 if not specified (which is then not dumped over netlink by the kernel). I'm wondering how it was 0 for i==0,1 and is 3 for i==2.
On 3/12/2020 2:01 AM, Marcelo Ricardo Leitner wrote: > On Wed, Mar 11, 2020 at 07:44:15PM -0300, Marcelo Ricardo Leitner wrote: >> On Thu, Mar 12, 2020 at 12:27:37AM +0200, Paul Blakey wrote: > ... >>> if not try skipping flow_action_hw_stats_types_check() calls in mlx5 driver. >> Ok. This one was my main suspect now after some extra printks. I could >> confirm it parse_tc_fdb_actions is returning the error, but not sure >> why yet. > Copied the extack in flow_action_mixed_hw_stats_types_check() onto a > printk and logged the if parameters: > > @@ -284,6 +284,8 @@ flow_action_mixed_hw_stats_types_check(const struct flow_action *action, > > flow_action_for_each(i, action_entry, action) { > if (i && action_entry->hw_stats_type != last_hw_stats_type) { > + printk("Mixing HW stats types for actions is not supported, %d %d %d\n", > + i, action_entry->hw_stats_type, last_hw_stats_type); > NL_SET_ERR_MSG_MOD(extack, "Mixing HW stats types for actions is not supported"); > > [ 188.554636] Mixing HW stats types for actions is not supported, 2 0 3 > > with iproute-next from today loaded with > 'iproute2/net-next v2] tc: m_action: introduce support for hw stats > type', I get a dump like: > > # ./tc -s filter show dev vxlan_sys_4789 ingress > filter protocol ip pref 2 flower chain 0 > filter protocol ip pref 2 flower chain 0 handle 0x1 > dst_mac 6a:66:2d:48:92:c2 > src_mac 00:00:00:00:0e:b7 > eth_type ipv4 > ip_proto udp > ip_ttl 64 > src_port 100 > enc_dst_ip 1.1.1.1 > enc_src_ip 1.1.1.2 > enc_key_id 0 > enc_dst_port 4789 > enc_tos 0 > ip_flags nofrag > not_in_hw > action order 1: tunnel_key unset pipe > index 4 ref 1 bind 1 installed 2432 sec used 0 sec > Action statistics: > Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > no_percpu > > action order 2: pedit action pipe keys 2 > index 4 ref 1 bind 1 installed 2432 sec used 0 sec > key #0 at ipv4+8: val 3f000000 mask 00ffffff > key #1 at udp+0: val 13890000 mask 0000ffff > Action statistics: > Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > > action order 3: csum (iph, udp) action pipe > index 4 ref 1 bind 1 installed 2432 sec used 0 sec > Action statistics: > Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > no_percpu > > action order 4: mirred (Egress Redirect to device eth0) stolen > index 1 ref 1 bind 1 installed 2432 sec used 0 sec > Action statistics: > Sent 223744 bytes 4864 pkt (dropped 0, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > cookie 9c7d6b3aa84a32498181d98da0af80d2 > no_percpu > > Which is quite interesting as it is not listing any hwstats fields at all. > Considering that due to tcf_action_hw_stats_type_get() and > hw_stats_type initialization in tcf_action_init_1(), the default is > for it to be 3 if not specified (which is then not dumped over netlink > by the kernel). > > I'm wondering how it was 0 for i==0,1 and is 3 for i==2. There is a current bug with pedit, as it is split to muliple mangle action entries, and the action entry hw_stats field is not init for all of them. Then we fail on checking all hw_stats are the same ...