diff mbox series

[ovs-dev,ovs-dev,v3] dpif-netdev: fix dpif_netdev_flow_put

Message ID 20230731045523.315568-1-hepeng.0320@bytedance.com
State Changes Requested
Delegated to: Ilya Maximets
Headers show
Series [ovs-dev,ovs-dev,v3] dpif-netdev: fix dpif_netdev_flow_put | expand

Checks

Context Check Description
ovsrobot/apply-robot warning apply and check: warning
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Peng He July 31, 2023, 4:55 a.m. UTC
OVS allows overlapping megaflows, as long as the actions of these
megaflows are equal. However, the current implementation of action
modification relies on flow_lookup instead of ufid, this could result
in looking up a wrong megaflow and make the ukeys and megaflows inconsistent

Just like the test case in the patch, at first we have a rule with the
prefix:

10.1.2.0/24

and we will get a megaflow with prefixes 10.1.2.2/24 when a packet with IP
10.1.2.2 is received.

Then suppose we change the rule into 10.1.0.0/16. OVS prefers to keep the
10.1.2.2/24 megaflow and just changes its action instead of extending
the prefix into 10.1.2.2/16.

then suppose we have a 10.1.0.2 packet, since it misses the megaflow,
this time, we will have an overlapping megaflow with the right prefix:
10.1.0.2/16

now we have two megaflows:
10.1.2.2/24
10.1.0.2/16

last, suppose we have changed the ruleset again. The revalidator this
time still decides to change the actions of both megaflows instead of
deleting them.

The dpif_netdev_flow_put will search the megaflow to modify with unmasked
keys, however it might lookup the wrong megaflow as the key 10.1.2.2 matches
both 10.1.2.2/24 and 10.1.0.2/16!

This patch changes the megaflow lookup code in modification path into
relying the ufid to find the correct megaflow instead of key lookup.

Signed-off-by: Peng He <hepeng.0320@bytedance.com>
---
 lib/dpif-netdev.c | 49 ++++++++++++++++++++++++++++++++---------------
 tests/pmd.at      | 48 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+), 15 deletions(-)

Comments

0-day Robot July 31, 2023, 6:59 a.m. UTC | #1
Bleep bloop.  Greetings Peng He, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Peng He <xnhp0320@gmail.com> needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Peng He <hepeng.0320@bytedance.com>
Lines checked: 186, Warnings: 1, Errors: 1


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Peng He July 31, 2023, 12:53 p.m. UTC | #2
just found that I miss the Fixes tag, will add it in the next post.

Peng He <xnhp0320@gmail.com> 于2023年7月31日周一 14:44写道:

> OVS allows overlapping megaflows, as long as the actions of these
> megaflows are equal. However, the current implementation of action
> modification relies on flow_lookup instead of ufid, this could result
> in looking up a wrong megaflow and make the ukeys and megaflows
> inconsistent
>
> Just like the test case in the patch, at first we have a rule with the
> prefix:
>
> 10.1.2.0/24
>
> and we will get a megaflow with prefixes 10.1.2.2/24 when a packet with IP
> 10.1.2.2 is received.
>
> Then suppose we change the rule into 10.1.0.0/16. OVS prefers to keep the
> 10.1.2.2/24 megaflow and just changes its action instead of extending
> the prefix into 10.1.2.2/16.
>
> then suppose we have a 10.1.0.2 packet, since it misses the megaflow,
> this time, we will have an overlapping megaflow with the right prefix:
> 10.1.0.2/16
>
> now we have two megaflows:
> 10.1.2.2/24
> 10.1.0.2/16
>
> last, suppose we have changed the ruleset again. The revalidator this
> time still decides to change the actions of both megaflows instead of
> deleting them.
>
> The dpif_netdev_flow_put will search the megaflow to modify with unmasked
> keys, however it might lookup the wrong megaflow as the key 10.1.2.2
> matches
> both 10.1.2.2/24 and 10.1.0.2/16!
>
> This patch changes the megaflow lookup code in modification path into
> relying the ufid to find the correct megaflow instead of key lookup.
>
> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> ---
>  lib/dpif-netdev.c | 49 ++++++++++++++++++++++++++++++++---------------
>  tests/pmd.at      | 48 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 82 insertions(+), 15 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 70b953ae6..8c88a5dc0 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4191,7 +4191,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>                  const struct dpif_flow_put *put,
>                  struct dpif_flow_stats *stats)
>  {
> -    struct dp_netdev_flow *netdev_flow;
> +    struct dp_netdev_flow *netdev_flow = NULL;
>      int error = 0;
>
>      if (stats) {
> @@ -4199,16 +4199,39 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>      }
>
>      ovs_mutex_lock(&pmd->flow_mutex);
> -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> -    if (!netdev_flow) {
> -        if (put->flags & DPIF_FP_CREATE) {
> -            dp_netdev_flow_add(pmd, match, ufid, put->actions,
> -                               put->actions_len, ODPP_NONE);
> +    if (put->ufid) {
> +        netdev_flow = dp_netdev_pmd_find_flow(pmd, put->ufid,
> +                                              put->key, put->key_len);
> +    } else {
> +        /* Use key instead of the locally generated ufid
> +         * to search netdev_flow.
> +         */
> +        netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> +    }
> +
> +    if (put->flags & DPIF_FP_CREATE) {
> +        if (!netdev_flow) {
> +            /* If ufid is provided, use provided ufid, otherwise
> +             * use locally generated ufid.
> +             */
> +            dp_netdev_flow_add(pmd, match, put->ufid ? put->ufid : ufid,
> +                               put->actions, put->actions_len, ODPP_NONE);
>          } else {
> -            error = ENOENT;
> +            error = EEXIST;
>          }
> -    } else {
> -        if (put->flags & DPIF_FP_MODIFY) {
> +        goto exit;
> +    }
> +
> +    if (put->flags & DPIF_FP_MODIFY) {
> +        if (!netdev_flow) {
> +            error = ENOENT;
> +        } else {
> +            if (!flow_equal(&match->flow, &netdev_flow->flow)) {
> +                /* Overlapping flow. */
> +                error = EINVAL;
> +                goto exit;
> +            }
> +
>              struct dp_netdev_actions *new_actions;
>              struct dp_netdev_actions *old_actions;
>
> @@ -4239,15 +4262,11 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>                   *   counter, and subtracting it before outputting the
> stats */
>                  error = EOPNOTSUPP;
>              }
> -
>              ovsrcu_postpone(dp_netdev_actions_free, old_actions);
> -        } else if (put->flags & DPIF_FP_CREATE) {
> -            error = EEXIST;
> -        } else {
> -            /* Overlapping flow. */
> -            error = EINVAL;
>          }
>      }
> +
> +exit:
>      ovs_mutex_unlock(&pmd->flow_mutex);
>      return error;
>  }
> diff --git a/tests/pmd.at b/tests/pmd.at
> index 48f3d432d..f399294d2 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -1300,3 +1300,51 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log
> | grep "PMD load based sleeps
>
>  OVS_VSWITCHD_STOP
>  AT_CLEANUP
> +
> +AT_SETUP([PMD - revalidator wrongly modify userspace megaflows])
> +
> +OVS_VSWITCHD_START(
> +[add-port br0 p1 \
> +   -- set interface p1 type=dummy-pmd \
> +   -- set bridge br0 datapath-type=dummy \
> +   -- add-port br0 p2 \
> +   -- set interface p2 type=dummy-pmd --
> +])
> +
> +dnl Add one openflow rule and generate a megaflow.
> +AT_CHECK([ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst=
> 10.1.2.0/24,actions=p2'])
> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'], [0],
> [
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never, actions:2
> +])
> +
> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
> +dnl Replace openflow rules, trigger the revalidation.
> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16
> actions=ct(commit)' | dnl
> +ovs-ofctl --bundle replace-flows br0 -])
> +AT_CHECK([ovs-appctl revalidator/wait])
> +
> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
> strip_xout_keep_actions], [0], [
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:never,
> actions:ct(commit)
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s,
> actions:ct(commit)
> +])
> +
> +dnl Hold the prefix 10.1.2.2/24 by another 10s.
> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
> +dnl Send more 10.1.0.2 to make 10.1.0.0/16 tuple preprend 10.1.2.0/24
> tuple in the pvector of subtables.
> +for i in `seq 0 256`;do
> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
> +done
> +
> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=p2' |
> dnl
> +ovs-ofctl --bundle replace-flows br0 -])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
> strip_xout_keep_actions], [0], [
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> +])
> +
> +OVS_VSWITCHD_STOP
> +AT_CLEANUP
> --
> 2.25.1
>
>
Peng He Aug. 7, 2023, 3:13 a.m. UTC | #3
Hi, Ilya,

are there any more comments beside the Fixes tag?




Peng He <xnhp0320@gmail.com> 于2023年7月31日周一 20:53写道:

> just found that I miss the Fixes tag, will add it in the next post.
>
> Peng He <xnhp0320@gmail.com> 于2023年7月31日周一 14:44写道:
>
>> OVS allows overlapping megaflows, as long as the actions of these
>> megaflows are equal. However, the current implementation of action
>> modification relies on flow_lookup instead of ufid, this could result
>> in looking up a wrong megaflow and make the ukeys and megaflows
>> inconsistent
>>
>> Just like the test case in the patch, at first we have a rule with the
>> prefix:
>>
>> 10.1.2.0/24
>>
>> and we will get a megaflow with prefixes 10.1.2.2/24 when a packet with
>> IP
>> 10.1.2.2 is received.
>>
>> Then suppose we change the rule into 10.1.0.0/16. OVS prefers to keep the
>> 10.1.2.2/24 megaflow and just changes its action instead of extending
>> the prefix into 10.1.2.2/16.
>>
>> then suppose we have a 10.1.0.2 packet, since it misses the megaflow,
>> this time, we will have an overlapping megaflow with the right prefix:
>> 10.1.0.2/16
>>
>> now we have two megaflows:
>> 10.1.2.2/24
>> 10.1.0.2/16
>>
>> last, suppose we have changed the ruleset again. The revalidator this
>> time still decides to change the actions of both megaflows instead of
>> deleting them.
>>
>> The dpif_netdev_flow_put will search the megaflow to modify with unmasked
>> keys, however it might lookup the wrong megaflow as the key 10.1.2.2
>> matches
>> both 10.1.2.2/24 and 10.1.0.2/16!
>>
>> This patch changes the megaflow lookup code in modification path into
>> relying the ufid to find the correct megaflow instead of key lookup.
>>
>> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
>> ---
>>  lib/dpif-netdev.c | 49 ++++++++++++++++++++++++++++++++---------------
>>  tests/pmd.at      | 48 ++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 82 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 70b953ae6..8c88a5dc0 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -4191,7 +4191,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>>                  const struct dpif_flow_put *put,
>>                  struct dpif_flow_stats *stats)
>>  {
>> -    struct dp_netdev_flow *netdev_flow;
>> +    struct dp_netdev_flow *netdev_flow = NULL;
>>      int error = 0;
>>
>>      if (stats) {
>> @@ -4199,16 +4199,39 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>>      }
>>
>>      ovs_mutex_lock(&pmd->flow_mutex);
>> -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
>> -    if (!netdev_flow) {
>> -        if (put->flags & DPIF_FP_CREATE) {
>> -            dp_netdev_flow_add(pmd, match, ufid, put->actions,
>> -                               put->actions_len, ODPP_NONE);
>> +    if (put->ufid) {
>> +        netdev_flow = dp_netdev_pmd_find_flow(pmd, put->ufid,
>> +                                              put->key, put->key_len);
>> +    } else {
>> +        /* Use key instead of the locally generated ufid
>> +         * to search netdev_flow.
>> +         */
>> +        netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
>> +    }
>> +
>> +    if (put->flags & DPIF_FP_CREATE) {
>> +        if (!netdev_flow) {
>> +            /* If ufid is provided, use provided ufid, otherwise
>> +             * use locally generated ufid.
>> +             */
>> +            dp_netdev_flow_add(pmd, match, put->ufid ? put->ufid : ufid,
>> +                               put->actions, put->actions_len,
>> ODPP_NONE);
>>          } else {
>> -            error = ENOENT;
>> +            error = EEXIST;
>>          }
>> -    } else {
>> -        if (put->flags & DPIF_FP_MODIFY) {
>> +        goto exit;
>> +    }
>> +
>> +    if (put->flags & DPIF_FP_MODIFY) {
>> +        if (!netdev_flow) {
>> +            error = ENOENT;
>> +        } else {
>> +            if (!flow_equal(&match->flow, &netdev_flow->flow)) {
>> +                /* Overlapping flow. */
>> +                error = EINVAL;
>> +                goto exit;
>> +            }
>> +
>>              struct dp_netdev_actions *new_actions;
>>              struct dp_netdev_actions *old_actions;
>>
>> @@ -4239,15 +4262,11 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>>                   *   counter, and subtracting it before outputting the
>> stats */
>>                  error = EOPNOTSUPP;
>>              }
>> -
>>              ovsrcu_postpone(dp_netdev_actions_free, old_actions);
>> -        } else if (put->flags & DPIF_FP_CREATE) {
>> -            error = EEXIST;
>> -        } else {
>> -            /* Overlapping flow. */
>> -            error = EINVAL;
>>          }
>>      }
>> +
>> +exit:
>>      ovs_mutex_unlock(&pmd->flow_mutex);
>>      return error;
>>  }
>> diff --git a/tests/pmd.at b/tests/pmd.at
>> index 48f3d432d..f399294d2 100644
>> --- a/tests/pmd.at
>> +++ b/tests/pmd.at
>> @@ -1300,3 +1300,51 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log
>> | grep "PMD load based sleeps
>>
>>  OVS_VSWITCHD_STOP
>>  AT_CLEANUP
>> +
>> +AT_SETUP([PMD - revalidator wrongly modify userspace megaflows])
>> +
>> +OVS_VSWITCHD_START(
>> +[add-port br0 p1 \
>> +   -- set interface p1 type=dummy-pmd \
>> +   -- set bridge br0 datapath-type=dummy \
>> +   -- add-port br0 p2 \
>> +   -- set interface p2 type=dummy-pmd --
>> +])
>> +
>> +dnl Add one openflow rule and generate a megaflow.
>> +AT_CHECK([ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst=
>> 10.1.2.0/24,actions=p2'])
>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
>> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
>> +
>> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'],
>> [0], [
>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
>> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never,
>> actions:2
>> +])
>> +
>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
>> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
>> +dnl Replace openflow rules, trigger the revalidation.
>> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16
>> actions=ct(commit)' | dnl
>> +ovs-ofctl --bundle replace-flows br0 -])
>> +AT_CHECK([ovs-appctl revalidator/wait])
>> +
>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
>> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
>> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
>> strip_xout_keep_actions], [0], [
>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
>> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:never,
>> actions:ct(commit)
>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
>> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s,
>> actions:ct(commit)
>> +])
>> +
>> +dnl Hold the prefix 10.1.2.2/24 by another 10s.
>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
>> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
>> +dnl Send more 10.1.0.2 to make 10.1.0.0/16 tuple preprend 10.1.2.0/24
>> tuple in the pvector of subtables.
>> +for i in `seq 0 256`;do
>> +AT_CHECK([ovs-appctl netdev-dummy/receive p1
>> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
>> +done
>> +
>> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=p2' |
>> dnl
>> +ovs-ofctl --bundle replace-flows br0 -])
>> +
>> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
>> strip_xout_keep_actions], [0], [
>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
>> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
>> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
>> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
>> +])
>> +
>> +OVS_VSWITCHD_STOP
>> +AT_CLEANUP
>> --
>> 2.25.1
>>
>>
>
> --
> hepeng
>
Simon Horman Aug. 7, 2023, 12:12 p.m. UTC | #4
On Mon, Aug 07, 2023 at 11:13:09AM +0800, Peng He wrote:
> Hi, Ilya,
> 
> are there any more comments beside the Fixes tag?

FWIIW, the tag aside, this looks good to me.

Acked-by: Simon Horman <horms@ovn.org>
Ilya Maximets Aug. 7, 2023, 11:31 p.m. UTC | #5
On 7/31/23 06:55, Peng He wrote:
> OVS allows overlapping megaflows, as long as the actions of these
> megaflows are equal. However, the current implementation of action
> modification relies on flow_lookup instead of ufid, this could result
> in looking up a wrong megaflow and make the ukeys and megaflows inconsistent
> 
> Just like the test case in the patch, at first we have a rule with the
> prefix:
> 
> 10.1.2.0/24
> 
> and we will get a megaflow with prefixes 10.1.2.2/24 when a packet with IP
> 10.1.2.2 is received.
> 
> Then suppose we change the rule into 10.1.0.0/16. OVS prefers to keep the
> 10.1.2.2/24 megaflow and just changes its action instead of extending
> the prefix into 10.1.2.2/16.
> 
> then suppose we have a 10.1.0.2 packet, since it misses the megaflow,
> this time, we will have an overlapping megaflow with the right prefix:
> 10.1.0.2/16
> 
> now we have two megaflows:
> 10.1.2.2/24
> 10.1.0.2/16
> 
> last, suppose we have changed the ruleset again. The revalidator this
> time still decides to change the actions of both megaflows instead of
> deleting them.
> 
> The dpif_netdev_flow_put will search the megaflow to modify with unmasked
> keys, however it might lookup the wrong megaflow as the key 10.1.2.2 matches
> both 10.1.2.2/24 and 10.1.0.2/16!
> 
> This patch changes the megaflow lookup code in modification path into
> relying the ufid to find the correct megaflow instead of key lookup.
> 
> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> ---

Thanks for the update!  This version looks logically correct to me.
See some small comments below.

Best regards, Ilya Maximets.

>  lib/dpif-netdev.c | 49 ++++++++++++++++++++++++++++++++---------------
>  tests/pmd.at      | 48 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 82 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 70b953ae6..8c88a5dc0 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4191,7 +4191,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>                  const struct dpif_flow_put *put,
>                  struct dpif_flow_stats *stats)
>  {
> -    struct dp_netdev_flow *netdev_flow;
> +    struct dp_netdev_flow *netdev_flow = NULL;
>      int error = 0;
>  
>      if (stats) {
> @@ -4199,16 +4199,39 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>      }
>  
>      ovs_mutex_lock(&pmd->flow_mutex);
> -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> -    if (!netdev_flow) {
> -        if (put->flags & DPIF_FP_CREATE) {
> -            dp_netdev_flow_add(pmd, match, ufid, put->actions,
> -                               put->actions_len, ODPP_NONE);
> +    if (put->ufid) {
> +        netdev_flow = dp_netdev_pmd_find_flow(pmd, put->ufid,
> +                                              put->key, put->key_len);
> +    } else {
> +        /* Use key instead of the locally generated ufid
> +         * to search netdev_flow.
> +         */

Nit: Usually, /* and */ in the comment are on their own lines or
both on the lines with the text, i.e.

 /*
  * Line 1
  * Line 2
  */

Or

 /* Line 1
  * Line 2 */

Rarely the style is mixed.  In the case above, I'd suggest to move
the closing '*/' to the previous line.

> +        netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> +    }
> +
> +    if (put->flags & DPIF_FP_CREATE) {
> +        if (!netdev_flow) {
> +            /* If ufid is provided, use provided ufid, otherwise
> +             * use locally generated ufid.
> +             */
> +            dp_netdev_flow_add(pmd, match, put->ufid ? put->ufid : ufid,

This looks  abit strange, because ufid equals put->ufid, if provided.
I'd remove the comment and just use 'ufid' here without a ternary
operator.

Or maybe a cleaner solution is to remove 'ufid = *put->ufid;' assignment
from dpif_netdev_flow_put() and re-name the argument of flow_put_on_pmd()
function from 'ufid' to 'local_ufid' or something like that.  In this
case we should still remove the comment, but keep the ternary operator.
Might be easier to read this way.  What do you think?

> +                               put->actions, put->actions_len, ODPP_NONE);
>          } else {
> -            error = ENOENT;
> +            error = EEXIST;
>          }
> -    } else {
> -        if (put->flags & DPIF_FP_MODIFY) {
> +        goto exit;
> +    }
> +
> +    if (put->flags & DPIF_FP_MODIFY) {
> +        if (!netdev_flow) {
> +            error = ENOENT;
> +        } else {
> +            if (!flow_equal(&match->flow, &netdev_flow->flow)) {

Should this be (!put->ufid && !flow_equal(&match->flow, &netdev_flow->flow)) ?
i.e. is it necessary to compare flows if found by external uuid?

> +                /* Overlapping flow. */
> +                error = EINVAL;
> +                goto exit;
> +            }
> +
>              struct dp_netdev_actions *new_actions;
>              struct dp_netdev_actions *old_actions;
>  
> @@ -4239,15 +4262,11 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>                   *   counter, and subtracting it before outputting the stats */
>                  error = EOPNOTSUPP;
>              }
> -
>              ovsrcu_postpone(dp_netdev_actions_free, old_actions);
> -        } else if (put->flags & DPIF_FP_CREATE) {
> -            error = EEXIST;
> -        } else {
> -            /* Overlapping flow. */
> -            error = EINVAL;
>          }
>      }
> +
> +exit:
>      ovs_mutex_unlock(&pmd->flow_mutex);
>      return error;
>  }
> diff --git a/tests/pmd.at b/tests/pmd.at
> index 48f3d432d..f399294d2 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -1300,3 +1300,51 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based sleeps
>  
>  OVS_VSWITCHD_STOP
>  AT_CLEANUP
> +
> +AT_SETUP([PMD - revalidator wrongly modify userspace megaflows])
> +
> +OVS_VSWITCHD_START(
> +[add-port br0 p1 \
> +   -- set interface p1 type=dummy-pmd \
> +   -- set bridge br0 datapath-type=dummy \

It's probably better to change order of above two lines.

> +   -- add-port br0 p2 \
> +   -- set interface p2 type=dummy-pmd --

'--' at the end is not needed.

> +])
> +
> +dnl Add one openflow rule and generate a megaflow.
> +AT_CHECK([ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst=10.1.2.0/24,actions=p2'])
> +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'], [0], [
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never, actions:2
> +])

For some reason, this check always fails on FreeBSD:

./pmd.at:1349: ovs-appctl dpctl/dump-flows | sed 's/.*core: [0-9]*//'
--- -	2023-08-07 22:43:09.208141000 +0000
+++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/1088/stdout	2023-08-07 22:43:09.207372000 +0000
@@ -1,3 +1 @@
 
-recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never, actions:2
-
---

The flow dump is empty, presumably the PMD thread didn't fully
process the packet yet at the time of a dump.

> +
> +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
> +dnl Replace openflow rules, trigger the revalidation.
> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=ct(commit)' | dnl
> +ovs-ofctl --bundle replace-flows br0 -])

Indent this line to the level of 'echo' above.

> +AT_CHECK([ovs-appctl revalidator/wait])
> +
> +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | strip_xout_keep_actions], [0], [
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:never, actions:ct(commit)
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:ct(commit)
> +])
> +
> +dnl Hold the prefix 10.1.2.2/24 by another 10s.
> +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
> +dnl Send more 10.1.0.2 to make 10.1.0.0/16 tuple preprend 10.1.2.0/24 tuple in the pvector of subtables.
> +for i in `seq 0 256`;do

A space after ';'.  Also, it's better to use $(...) instead of `...`.

> +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])

Indent this 2 spaces to the right.

> +done
> +
> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=p2' | dnl
> +ovs-ofctl --bundle replace-flows br0 -])

Indentation here again.

Should, probably, wait here for revalidation as well?

> +
> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | strip_xout_keep_actions], [0], [
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> +])
> +
> +OVS_VSWITCHD_STOP
> +AT_CLEANUP
Peng He Aug. 8, 2023, 7:20 a.m. UTC | #6
Ilya Maximets <i.maximets@ovn.org> 于2023年8月8日周二 07:30写道:

> On 7/31/23 06:55, Peng He wrote:
> > OVS allows overlapping megaflows, as long as the actions of these
> > megaflows are equal. However, the current implementation of action
> > modification relies on flow_lookup instead of ufid, this could result
> > in looking up a wrong megaflow and make the ukeys and megaflows
> inconsistent
> >
> > Just like the test case in the patch, at first we have a rule with the
> > prefix:
> >
> > 10.1.2.0/24
> >
> > and we will get a megaflow with prefixes 10.1.2.2/24 when a packet with
> IP
> > 10.1.2.2 is received.
> >
> > Then suppose we change the rule into 10.1.0.0/16. OVS prefers to keep
> the
> > 10.1.2.2/24 megaflow and just changes its action instead of extending
> > the prefix into 10.1.2.2/16.
> >
> > then suppose we have a 10.1.0.2 packet, since it misses the megaflow,
> > this time, we will have an overlapping megaflow with the right prefix:
> > 10.1.0.2/16
> >
> > now we have two megaflows:
> > 10.1.2.2/24
> > 10.1.0.2/16
> >
> > last, suppose we have changed the ruleset again. The revalidator this
> > time still decides to change the actions of both megaflows instead of
> > deleting them.
> >
> > The dpif_netdev_flow_put will search the megaflow to modify with unmasked
> > keys, however it might lookup the wrong megaflow as the key 10.1.2.2
> matches
> > both 10.1.2.2/24 and 10.1.0.2/16!
> >
> > This patch changes the megaflow lookup code in modification path into
> > relying the ufid to find the correct megaflow instead of key lookup.
> >
> > Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> > ---
>
> Thanks for the update!  This version looks logically correct to me.
> See some small comments below.
>
> Best regards, Ilya Maximets.
>
> >  lib/dpif-netdev.c | 49 ++++++++++++++++++++++++++++++++---------------
> >  tests/pmd.at      | 48 ++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 82 insertions(+), 15 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 70b953ae6..8c88a5dc0 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -4191,7 +4191,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> >                  const struct dpif_flow_put *put,
> >                  struct dpif_flow_stats *stats)
> >  {
> > -    struct dp_netdev_flow *netdev_flow;
> > +    struct dp_netdev_flow *netdev_flow = NULL;
> >      int error = 0;
> >
> >      if (stats) {
> > @@ -4199,16 +4199,39 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> >      }
> >
> >      ovs_mutex_lock(&pmd->flow_mutex);
> > -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> > -    if (!netdev_flow) {
> > -        if (put->flags & DPIF_FP_CREATE) {
> > -            dp_netdev_flow_add(pmd, match, ufid, put->actions,
> > -                               put->actions_len, ODPP_NONE);
> > +    if (put->ufid) {
> > +        netdev_flow = dp_netdev_pmd_find_flow(pmd, put->ufid,
> > +                                              put->key, put->key_len);
> > +    } else {
> > +        /* Use key instead of the locally generated ufid
> > +         * to search netdev_flow.
> > +         */
>
> Nit: Usually, /* and */ in the comment are on their own lines or
> both on the lines with the text, i.e.
>
>  /*
>   * Line 1
>   * Line 2
>   */
>
> Or
>
>  /* Line 1
>   * Line 2 */
>
> Rarely the style is mixed.  In the case above, I'd suggest to move
> the closing '*/' to the previous line.
>
> > +        netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> > +    }
> > +
> > +    if (put->flags & DPIF_FP_CREATE) {
> > +        if (!netdev_flow) {
> > +            /* If ufid is provided, use provided ufid, otherwise
> > +             * use locally generated ufid.
> > +             */
> > +            dp_netdev_flow_add(pmd, match, put->ufid ? put->ufid : ufid,
>
> This looks  abit strange, because ufid equals put->ufid, if provided.
> I'd remove the comment and just use 'ufid' here without a ternary
> operator.
>
> Or maybe a cleaner solution is to remove 'ufid = *put->ufid;' assignment
> from dpif_netdev_flow_put() and re-name the argument of flow_put_on_pmd()
> function from 'ufid' to 'local_ufid' or something like that.  In this
> case we should still remove the comment, but keep the ternary operator.
> Might be easier to read this way.  What do you think?
>

I miss ufid = put->ufid here. If so, I guess just use ufid here is fine.



>
> > +                               put->actions, put->actions_len,
> ODPP_NONE);
> >          } else {
> > -            error = ENOENT;
> > +            error = EEXIST;
> >          }
> > -    } else {
> > -        if (put->flags & DPIF_FP_MODIFY) {
> > +        goto exit;
> > +    }
> > +
> > +    if (put->flags & DPIF_FP_MODIFY) {
> > +        if (!netdev_flow) {
> > +            error = ENOENT;
> > +        } else {
> > +            if (!flow_equal(&match->flow, &netdev_flow->flow)) {
>
> Should this be (!put->ufid && !flow_equal(&match->flow,
> &netdev_flow->flow)) ?
> i.e. is it necessary to compare flows if found by external uuid?
>
Yes, no need to check against the flow if ufid is provided.
will fix it in the next version.

Will fix all the issues you mentioned below.


>
> > +                /* Overlapping flow. */
> > +                error = EINVAL;
> > +                goto exit;
> > +            }
> > +
> >              struct dp_netdev_actions *new_actions;
> >              struct dp_netdev_actions *old_actions;
> >
> > @@ -4239,15 +4262,11 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> >                   *   counter, and subtracting it before outputting the
> stats */
> >                  error = EOPNOTSUPP;
> >              }
> > -
> >              ovsrcu_postpone(dp_netdev_actions_free, old_actions);
> > -        } else if (put->flags & DPIF_FP_CREATE) {
> > -            error = EEXIST;
> > -        } else {
> > -            /* Overlapping flow. */
> > -            error = EINVAL;
> >          }
> >      }
> > +
> > +exit:
> >      ovs_mutex_unlock(&pmd->flow_mutex);
> >      return error;
> >  }
> > diff --git a/tests/pmd.at b/tests/pmd.at
> > index 48f3d432d..f399294d2 100644
> > --- a/tests/pmd.at
> > +++ b/tests/pmd.at
> > @@ -1300,3 +1300,51 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM
> ovs-vswitchd.log | grep "PMD load based sleeps
> >
> >  OVS_VSWITCHD_STOP
> >  AT_CLEANUP
> > +
> > +AT_SETUP([PMD - revalidator wrongly modify userspace megaflows])
> > +
> > +OVS_VSWITCHD_START(
> > +[add-port br0 p1 \
> > +   -- set interface p1 type=dummy-pmd \
> > +   -- set bridge br0 datapath-type=dummy \
>
> It's probably better to change order of above two lines.
>
> > +   -- add-port br0 p2 \
> > +   -- set interface p2 type=dummy-pmd --
>
> '--' at the end is not needed.
>
> > +])
> > +
> > +dnl Add one openflow rule and generate a megaflow.
> > +AT_CHECK([ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst=
> 10.1.2.0/24,actions=p2'])
> > +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'],
> [0], [
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never, actions:2
> > +])
>
> For some reason, this check always fails on FreeBSD:
>
> ./pmd.at:1349: ovs-appctl dpctl/dump-flows | sed 's/.*core: [0-9]*//'
> --- -   2023-08-07 22:43:09.208141000 +0000
> +++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/1088/stdout
> 2023-08-07 22:43:09.207372000 +0000
> @@ -1,3 +1 @@
>
> -recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never, actions:2
> -
> ---
>
> The flow dump is empty, presumably the PMD thread didn't fully
> process the packet yet at the time of a dump.
>

So, what's your suggestion? set a large max_idle and sleep like a 1s?


> > +
> > +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
> > +dnl Replace openflow rules, trigger the revalidation.
> > +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16
> actions=ct(commit)' | dnl
> > +ovs-ofctl --bundle replace-flows br0 -])
>
> Indent this line to the level of 'echo' above.
>
> > +AT_CHECK([ovs-appctl revalidator/wait])
> > +
> > +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
> strip_xout_keep_actions], [0], [
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:never,
> actions:ct(commit)
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s,
> actions:ct(commit)
> > +])
> > +
> > +dnl Hold the prefix 10.1.2.2/24 by another 10s.
> > +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
> > +dnl Send more 10.1.0.2 to make 10.1.0.0/16 tuple preprend 10.1.2.0/24
> tuple in the pvector of subtables.
> > +for i in `seq 0 256`;do
>
> A space after ';'.  Also, it's better to use $(...) instead of `...`.
>
> > +AT_CHECK([ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
>
> Indent this 2 spaces to the right.
>
> > +done
> > +
> > +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=p2' |
> dnl
> > +ovs-ofctl --bundle replace-flows br0 -])
>
> Indentation here again.
>
> Should, probably, wait here for revalidation as well?
>
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
> strip_xout_keep_actions], [0], [
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> > +])
> > +
> > +OVS_VSWITCHD_STOP
> > +AT_CLEANUP
>
>
Ilya Maximets Aug. 8, 2023, 12:51 p.m. UTC | #7
On 8/8/23 09:20, Peng He wrote:
> 
> 
> Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> 于2023年8月8日周二 07:30写道:
> 
>     On 7/31/23 06:55, Peng He wrote:
>     > OVS allows overlapping megaflows, as long as the actions of these
>     > megaflows are equal. However, the current implementation of action
>     > modification relies on flow_lookup instead of ufid, this could result
>     > in looking up a wrong megaflow and make the ukeys and megaflows inconsistent
>     >
>     > Just like the test case in the patch, at first we have a rule with the
>     > prefix:
>     >
>     > 10.1.2.0/24 <http://10.1.2.0/24>
>     >
>     > and we will get a megaflow with prefixes 10.1.2.2/24 <http://10.1.2.2/24> when a packet with IP
>     > 10.1.2.2 is received.
>     >
>     > Then suppose we change the rule into 10.1.0.0/16 <http://10.1.0.0/16>. OVS prefers to keep the
>     > 10.1.2.2/24 <http://10.1.2.2/24> megaflow and just changes its action instead of extending
>     > the prefix into 10.1.2.2/16 <http://10.1.2.2/16>.
>     >
>     > then suppose we have a 10.1.0.2 packet, since it misses the megaflow,
>     > this time, we will have an overlapping megaflow with the right prefix:
>     > 10.1.0.2/16 <http://10.1.0.2/16>
>     >
>     > now we have two megaflows:
>     > 10.1.2.2/24 <http://10.1.2.2/24>
>     > 10.1.0.2/16 <http://10.1.0.2/16>
>     >
>     > last, suppose we have changed the ruleset again. The revalidator this
>     > time still decides to change the actions of both megaflows instead of
>     > deleting them.
>     >
>     > The dpif_netdev_flow_put will search the megaflow to modify with unmasked
>     > keys, however it might lookup the wrong megaflow as the key 10.1.2.2 matches
>     > both 10.1.2.2/24 <http://10.1.2.2/24> and 10.1.0.2/16 <http://10.1.0.2/16>!
>     >
>     > This patch changes the megaflow lookup code in modification path into
>     > relying the ufid to find the correct megaflow instead of key lookup.
>     >
>     > Signed-off-by: Peng He <hepeng.0320@bytedance.com <mailto:hepeng.0320@bytedance.com>>
>     > ---
> 
>     Thanks for the update!  This version looks logically correct to me.
>     See some small comments below.
> 
>     Best regards, Ilya Maximets.
> 
>     >  lib/dpif-netdev.c | 49 ++++++++++++++++++++++++++++++++---------------
>     >  tests/pmd.at <http://pmd.at>      | 48 ++++++++++++++++++++++++++++++++++++++++++++++
>     >  2 files changed, 82 insertions(+), 15 deletions(-)
>     >
>     > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>     > index 70b953ae6..8c88a5dc0 100644
>     > --- a/lib/dpif-netdev.c
>     > +++ b/lib/dpif-netdev.c
>     > @@ -4191,7 +4191,7 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>     >                  const struct dpif_flow_put *put,
>     >                  struct dpif_flow_stats *stats)
>     >  {
>     > -    struct dp_netdev_flow *netdev_flow;
>     > +    struct dp_netdev_flow *netdev_flow = NULL;
>     >      int error = 0;
>     > 
>     >      if (stats) {
>     > @@ -4199,16 +4199,39 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>     >      }
>     > 
>     >      ovs_mutex_lock(&pmd->flow_mutex);
>     > -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
>     > -    if (!netdev_flow) {
>     > -        if (put->flags & DPIF_FP_CREATE) {
>     > -            dp_netdev_flow_add(pmd, match, ufid, put->actions,
>     > -                               put->actions_len, ODPP_NONE);
>     > +    if (put->ufid) {
>     > +        netdev_flow = dp_netdev_pmd_find_flow(pmd, put->ufid,
>     > +                                              put->key, put->key_len);
>     > +    } else {
>     > +        /* Use key instead of the locally generated ufid
>     > +         * to search netdev_flow.
>     > +         */
> 
>     Nit: Usually, /* and */ in the comment are on their own lines or
>     both on the lines with the text, i.e.
> 
>      /*
>       * Line 1
>       * Line 2
>       */
> 
>     Or
> 
>      /* Line 1
>       * Line 2 */
> 
>     Rarely the style is mixed.  In the case above, I'd suggest to move
>     the closing '*/' to the previous line.
> 
>     > +        netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
>     > +    }
>     > +
>     > +    if (put->flags & DPIF_FP_CREATE) {
>     > +        if (!netdev_flow) {
>     > +            /* If ufid is provided, use provided ufid, otherwise
>     > +             * use locally generated ufid.
>     > +             */
>     > +            dp_netdev_flow_add(pmd, match, put->ufid ? put->ufid : ufid,
> 
>     This looks  abit strange, because ufid equals put->ufid, if provided.
>     I'd remove the comment and just use 'ufid' here without a ternary
>     operator.
> 
>     Or maybe a cleaner solution is to remove 'ufid = *put->ufid;' assignment
>     from dpif_netdev_flow_put() and re-name the argument of flow_put_on_pmd()
>     function from 'ufid' to 'local_ufid' or something like that.  In this
>     case we should still remove the comment, but keep the ternary operator.
>     Might be easier to read this way.  What do you think?
> 
>  
> I miss ufid = put->ufid here. If so, I guess just use ufid here is fine.
> 
>  
> 
> 
>     > +                               put->actions, put->actions_len, ODPP_NONE);
>     >          } else {
>     > -            error = ENOENT;
>     > +            error = EEXIST;
>     >          }
>     > -    } else {
>     > -        if (put->flags & DPIF_FP_MODIFY) {
>     > +        goto exit;
>     > +    }
>     > +
>     > +    if (put->flags & DPIF_FP_MODIFY) {
>     > +        if (!netdev_flow) {
>     > +            error = ENOENT;
>     > +        } else {
>     > +            if (!flow_equal(&match->flow, &netdev_flow->flow)) {
> 
>     Should this be (!put->ufid && !flow_equal(&match->flow, &netdev_flow->flow)) ?
>     i.e. is it necessary to compare flows if found by external uuid?
> 
> Yes, no need to check against the flow if ufid is provided.
> will fix it in the next version.
> 
> Will fix all the issues you mentioned below.
>  
> 
> 
>     > +                /* Overlapping flow. */
>     > +                error = EINVAL;
>     > +                goto exit;
>     > +            }
>     > +
>     >              struct dp_netdev_actions *new_actions;
>     >              struct dp_netdev_actions *old_actions;
>     > 
>     > @@ -4239,15 +4262,11 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>     >                   *   counter, and subtracting it before outputting the stats */
>     >                  error = EOPNOTSUPP;
>     >              }
>     > -
>     >              ovsrcu_postpone(dp_netdev_actions_free, old_actions);
>     > -        } else if (put->flags & DPIF_FP_CREATE) {
>     > -            error = EEXIST;
>     > -        } else {
>     > -            /* Overlapping flow. */
>     > -            error = EINVAL;
>     >          }
>     >      }
>     > +
>     > +exit:
>     >      ovs_mutex_unlock(&pmd->flow_mutex);
>     >      return error;
>     >  }
>     > diff --git a/tests/pmd.at <http://pmd.at> b/tests/pmd.at <http://pmd.at>
>     > index 48f3d432d..f399294d2 100644
>     > --- a/tests/pmd.at <http://pmd.at>
>     > +++ b/tests/pmd.at <http://pmd.at>
>     > @@ -1300,3 +1300,51 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based sleeps
>     > 
>     >  OVS_VSWITCHD_STOP
>     >  AT_CLEANUP
>     > +
>     > +AT_SETUP([PMD - revalidator wrongly modify userspace megaflows])
>     > +
>     > +OVS_VSWITCHD_START(
>     > +[add-port br0 p1 \
>     > +   -- set interface p1 type=dummy-pmd \
>     > +   -- set bridge br0 datapath-type=dummy \
> 
>     It's probably better to change order of above two lines.
> 
>     > +   -- add-port br0 p2 \
>     > +   -- set interface p2 type=dummy-pmd --
> 
>     '--' at the end is not needed.
> 
>     > +])
>     > +
>     > +dnl Add one openflow rule and generate a megaflow.
>     > +AT_CHECK([ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst=10.1.2.0/24,actions=p2' <http://10.1.2.0/24,actions=p2'>])
>     > +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'], [0], [
>     > +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no <http://10.1.2.2/255.255.255.0,frag=no>), packets:0, bytes:0, used:never, actions:2
>     > +])
> 
>     For some reason, this check always fails on FreeBSD:
> 
>     ./pmd.at:1349 <http://pmd.at:1349>: ovs-appctl dpctl/dump-flows | sed 's/.*core: [0-9]*//'
>     --- -   2023-08-07 22:43:09.208141000 +0000
>     +++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/1088/stdout      2023-08-07 22:43:09.207372000 +0000
>     @@ -1,3 +1 @@
> 
>     -recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no <http://10.1.2.2/255.255.255.0,frag=no>), packets:0, bytes:0, used:never, actions:2
>     -
>     ---
> 
>     The flow dump is empty, presumably the PMD thread didn't fully
>     process the packet yet at the time of a dump.
> 
> 
> So, what's your suggestion? set a large max_idle and sleep like a 1s?

Sleeps are not great.  We should use OVS_WAIT_UNTIL_EQUAL, I guess.

> 
> 
>     > +
>     > +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
>     > +dnl Replace openflow rules, trigger the revalidation.
>     > +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 <http://10.1.0.0/16> actions=ct(commit)' | dnl
>     > +ovs-ofctl --bundle replace-flows br0 -])
> 
>     Indent this line to the level of 'echo' above.
> 
>     > +AT_CHECK([ovs-appctl revalidator/wait])
>     > +
>     > +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
>     > +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | strip_xout_keep_actions], [0], [
>     > +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.0.2/255.255.0.0,frag=no <http://10.1.0.2/255.255.0.0,frag=no>), packets:0, bytes:0, used:never, actions:ct(commit)
>     > +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no <http://10.1.2.2/255.255.255.0,frag=no>), packets:0, bytes:0, used:0.0s, actions:ct(commit)
>     > +])
>     > +
>     > +dnl Hold the prefix 10.1.2.2/24 <http://10.1.2.2/24> by another 10s.
>     > +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
>     > +dnl Send more 10.1.0.2 to make 10.1.0.0/16 <http://10.1.0.0/16> tuple preprend 10.1.2.0/24 <http://10.1.2.0/24> tuple in the pvector of subtables.
>     > +for i in `seq 0 256`;do
> 
>     A space after ';'.  Also, it's better to use $(...) instead of `...`.
> 
>     > +AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
> 
>     Indent this 2 spaces to the right.
> 
>     > +done
>     > +
>     > +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 <http://10.1.0.0/16> actions=p2' | dnl
>     > +ovs-ofctl --bundle replace-flows br0 -])
> 
>     Indentation here again.
> 
>     Should, probably, wait here for revalidation as well?
> 
>     > +
>     > +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | strip_xout_keep_actions], [0], [
>     > +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.0.2/255.255.0.0,frag=no <http://10.1.0.2/255.255.0.0,frag=no>), packets:0, bytes:0, used:0.0s, actions:2
>     > +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no <http://10.1.2.2/255.255.255.0,frag=no>), packets:0, bytes:0, used:0.0s, actions:2
>     > +])
>     > +
>     > +OVS_VSWITCHD_STOP
>     > +AT_CLEANUP
> 
> 
> 
> -- 
> hepeng
diff mbox series

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 70b953ae6..8c88a5dc0 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4191,7 +4191,7 @@  flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
                 const struct dpif_flow_put *put,
                 struct dpif_flow_stats *stats)
 {
-    struct dp_netdev_flow *netdev_flow;
+    struct dp_netdev_flow *netdev_flow = NULL;
     int error = 0;
 
     if (stats) {
@@ -4199,16 +4199,39 @@  flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
     }
 
     ovs_mutex_lock(&pmd->flow_mutex);
-    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
-    if (!netdev_flow) {
-        if (put->flags & DPIF_FP_CREATE) {
-            dp_netdev_flow_add(pmd, match, ufid, put->actions,
-                               put->actions_len, ODPP_NONE);
+    if (put->ufid) {
+        netdev_flow = dp_netdev_pmd_find_flow(pmd, put->ufid,
+                                              put->key, put->key_len);
+    } else {
+        /* Use key instead of the locally generated ufid
+         * to search netdev_flow.
+         */
+        netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
+    }
+
+    if (put->flags & DPIF_FP_CREATE) {
+        if (!netdev_flow) {
+            /* If ufid is provided, use provided ufid, otherwise
+             * use locally generated ufid.
+             */
+            dp_netdev_flow_add(pmd, match, put->ufid ? put->ufid : ufid,
+                               put->actions, put->actions_len, ODPP_NONE);
         } else {
-            error = ENOENT;
+            error = EEXIST;
         }
-    } else {
-        if (put->flags & DPIF_FP_MODIFY) {
+        goto exit;
+    }
+
+    if (put->flags & DPIF_FP_MODIFY) {
+        if (!netdev_flow) {
+            error = ENOENT;
+        } else {
+            if (!flow_equal(&match->flow, &netdev_flow->flow)) {
+                /* Overlapping flow. */
+                error = EINVAL;
+                goto exit;
+            }
+
             struct dp_netdev_actions *new_actions;
             struct dp_netdev_actions *old_actions;
 
@@ -4239,15 +4262,11 @@  flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
                  *   counter, and subtracting it before outputting the stats */
                 error = EOPNOTSUPP;
             }
-
             ovsrcu_postpone(dp_netdev_actions_free, old_actions);
-        } else if (put->flags & DPIF_FP_CREATE) {
-            error = EEXIST;
-        } else {
-            /* Overlapping flow. */
-            error = EINVAL;
         }
     }
+
+exit:
     ovs_mutex_unlock(&pmd->flow_mutex);
     return error;
 }
diff --git a/tests/pmd.at b/tests/pmd.at
index 48f3d432d..f399294d2 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -1300,3 +1300,51 @@  OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based sleeps
 
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([PMD - revalidator wrongly modify userspace megaflows])
+
+OVS_VSWITCHD_START(
+[add-port br0 p1 \
+   -- set interface p1 type=dummy-pmd \
+   -- set bridge br0 datapath-type=dummy \
+   -- add-port br0 p2 \
+   -- set interface p2 type=dummy-pmd --
+])
+
+dnl Add one openflow rule and generate a megaflow.
+AT_CHECK([ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst=10.1.2.0/24,actions=p2'])
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'], [0], [
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never, actions:2
+])
+
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
+dnl Replace openflow rules, trigger the revalidation.
+AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=ct(commit)' | dnl
+ovs-ofctl --bundle replace-flows br0 -])
+AT_CHECK([ovs-appctl revalidator/wait])
+
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
+AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | strip_xout_keep_actions], [0], [
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:never, actions:ct(commit)
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:ct(commit)
+])
+
+dnl Hold the prefix 10.1.2.2/24 by another 10s.
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'])
+dnl Send more 10.1.0.2 to make 10.1.0.0/16 tuple preprend 10.1.2.0/24 tuple in the pvector of subtables.
+for i in `seq 0 256`;do
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'])
+done
+
+AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=p2' | dnl
+ovs-ofctl --bundle replace-flows br0 -])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | strip_xout_keep_actions], [0], [
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP