diff mbox series

[ovs-dev,ovs-dev,v3] dpif-netdev: fix dpif_netdev_flow_put

Message ID 20230615025117.24631-1-hepeng.0320@bytedance.com
State Changes Requested
Headers show
Series [ovs-dev,ovs-dev,v3] dpif-netdev: fix dpif_netdev_flow_put | expand

Checks

Context Check Description
ovsrobot/apply-robot warning apply and check: warning
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Peng He June 15, 2023, 2:51 a.m. UTC
OVS allows overlapping megaflows, as long as the actions of these
megaflows are equal. However, the current implementation of action
modification relies on flow_lookup instead of ufid, this could result
in looking up a wrong megaflow and make the ukeys and megaflows inconsistent

Just like the test case in the patch, at first we have a rule with the
prefix:

10.1.2.0/24

and we will get a megaflow with prefixes 10.1.2.2/24 when a packet with IP
10.1.2.2 is received.

Then suppose we change the rule into 10.1.0.0/16. OVS prefers to keep the
10.1.2.2/24 megaflow and just changes its action instead of extending
the prefix into 10.1.2.2/16.

then suppose we have a 10.1.0.2 packet, since it misses the megaflow,
this time, we will have an overlapping megaflow with the right prefix:
10.1.0.2/16

now we have two megaflows:
10.1.2.2/24
10.1.0.2/16

last, suppose we have changed the ruleset again. The revalidator this
time still decides to change the actions of both megaflows instead of
deleting them.

The dpif_netdev_flow_put will search the megaflow to modify with unmasked
keys, however it might lookup the wrong megaflow as the key 10.1.2.2 matches
both 10.1.2.2/24 and 10.1.0.2/16!

This patch changes the megaflow lookup code in modification path into
relying the ufid to find the correct megaflow instead of key lookup.

Signed-off-by: Peng He <hepeng.0320@bytedance.com>
---
 lib/dpif-netdev.c | 62 ++++++++++++++++++++++++++---------------------
 tests/pmd.at      | 48 ++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+), 28 deletions(-)

Comments

0-day Robot June 15, 2023, 2:59 a.m. UTC | #1
Bleep bloop.  Greetings Peng He, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
ERROR: Author Peng He <xnhp0320@gmail.com> needs to sign off.
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Peng He <hepeng.0320@bytedance.com>
Lines checked: 201, Warnings: 1, Errors: 1


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Eelco Chaudron June 23, 2023, 2:31 p.m. UTC | #2
On 15 Jun 2023, at 4:51, Peng He wrote:

> OVS allows overlapping megaflows, as long as the actions of these
> megaflows are equal. However, the current implementation of action
> modification relies on flow_lookup instead of ufid, this could result
> in looking up a wrong megaflow and make the ukeys and megaflows inconsistent
>
> Just like the test case in the patch, at first we have a rule with the
> prefix:
>
> 10.1.2.0/24
>
> and we will get a megaflow with prefixes 10.1.2.2/24 when a packet with IP
> 10.1.2.2 is received.
>
> Then suppose we change the rule into 10.1.0.0/16. OVS prefers to keep the
> 10.1.2.2/24 megaflow and just changes its action instead of extending
> the prefix into 10.1.2.2/16.
>
> then suppose we have a 10.1.0.2 packet, since it misses the megaflow,
> this time, we will have an overlapping megaflow with the right prefix:
> 10.1.0.2/16
>
> now we have two megaflows:
> 10.1.2.2/24
> 10.1.0.2/16
>
> last, suppose we have changed the ruleset again. The revalidator this
> time still decides to change the actions of both megaflows instead of
> deleting them.
>
> The dpif_netdev_flow_put will search the megaflow to modify with unmasked
> keys, however it might lookup the wrong megaflow as the key 10.1.2.2 matches
> both 10.1.2.2/24 and 10.1.0.2/16!
>
> This patch changes the megaflow lookup code in modification path into
> relying the ufid to find the correct megaflow instead of key lookup.
>
> Signed-off-by: Peng He <hepeng.0320@bytedance.com>

Thanks for the fix, and in general it looks good, however I have one question about not covering all cases. See below.

Cheers,

Eelco


> ---
>  lib/dpif-netdev.c | 62 ++++++++++++++++++++++++++---------------------
>  tests/pmd.at      | 48 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 82 insertions(+), 28 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 70b953ae6..b90ed1542 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4198,36 +4198,43 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>          memset(stats, 0, sizeof *stats);
>      }
>
> -    ovs_mutex_lock(&pmd->flow_mutex);
> -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> -    if (!netdev_flow) {
> -        if (put->flags & DPIF_FP_CREATE) {
> +    if (put->flags & DPIF_FP_CREATE) {

Should this not be:

if (put->flags & DPIF_FP_CREATE && !(put->flags & DPIF_FP_MODIFY))

I know this will break the fix, but I’m wondering what the possible combinations are.
Quickly looking at the code the following ones seem to exist:

DPIF_FP_CREATE	-> Create a flow, if one exists return EEXIST
DPIF_FP_MODIFY  -> Modify a flow, if it does not exist return ENONT
DPIF_FP_CREATE | DPIF_FP_MODIFY	  -> If it exists modify it, if it does not exists create it.

Could the last combination not potentially fail the lookup?

Or are we assuming only standalone DPIF_FP_MODIFY’s are the problem? In theory, it could also be the combination.

> +        ovs_mutex_lock(&pmd->flow_mutex);
> +        netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> +        if (!netdev_flow) {
>              dp_netdev_flow_add(pmd, match, ufid, put->actions,
>                                 put->actions_len, ODPP_NONE);
>          } else {
> -            error = ENOENT;
> +            error = EEXIST;
>          }
> +        ovs_mutex_unlock(&pmd->flow_mutex);
>      } else {
> +        netdev_flow = dp_netdev_pmd_find_flow(pmd, ufid,
> +                                              put->key, put->key_len);
> +
>          if (put->flags & DPIF_FP_MODIFY) {
> -            struct dp_netdev_actions *new_actions;
> -            struct dp_netdev_actions *old_actions;
> +            if (!netdev_flow) {
> +                error = ENOENT;
> +            } else {
> +                struct dp_netdev_actions *new_actions;
> +                struct dp_netdev_actions *old_actions;
>
> -            new_actions = dp_netdev_actions_create(put->actions,
> -                                                   put->actions_len);
> +                new_actions = dp_netdev_actions_create(put->actions,
> +                                                       put->actions_len);
>
> -            old_actions = dp_netdev_flow_get_actions(netdev_flow);
> -            ovsrcu_set(&netdev_flow->actions, new_actions);
> +                old_actions = dp_netdev_flow_get_actions(netdev_flow);
> +                ovsrcu_set(&netdev_flow->actions, new_actions);
>
> -            queue_netdev_flow_put(pmd, netdev_flow, match,
> -                                  put->actions, put->actions_len,
> -                                  DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
> -            log_netdev_flow_change(netdev_flow, match, old_actions,
> -                                   put->actions, put->actions_len);
> +                queue_netdev_flow_put(pmd, netdev_flow, match,
> +                                      put->actions, put->actions_len,
> +                                      DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
> +                log_netdev_flow_change(netdev_flow, match, old_actions,
> +                                       put->actions, put->actions_len);
>
> -            if (stats) {
> -                get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
> -            }
> -            if (put->flags & DPIF_FP_ZERO_STATS) {
> +                if (stats) {
> +                    get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
> +                }
> +                if (put->flags & DPIF_FP_ZERO_STATS) {
>                  /* XXX: The userspace datapath uses thread local statistics
>                   * (for flows), which should be updated only by the owning
>                   * thread.  Since we cannot write on stats memory here,
> @@ -4237,18 +4244,17 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>                   * - Should the need arise, this operation can be implemented
>                   *   by keeping a base value (to be update here) for each
>                   *   counter, and subtracting it before outputting the stats */
> -                error = EOPNOTSUPP;
> +                    error = EOPNOTSUPP;
> +                }
> +                ovsrcu_postpone(dp_netdev_actions_free, old_actions);
>              }
> -
> -            ovsrcu_postpone(dp_netdev_actions_free, old_actions);
> -        } else if (put->flags & DPIF_FP_CREATE) {
> -            error = EEXIST;
>          } else {
> -            /* Overlapping flow. */
> -            error = EINVAL;
> +            if (netdev_flow) {
> +                /* Overlapping flow. */
> +                error = EINVAL;
> +            }
>          }
>      }
> -    ovs_mutex_unlock(&pmd->flow_mutex);
>      return error;
>  }
>
> diff --git a/tests/pmd.at b/tests/pmd.at
> index 48f3d432d..a78e86c54 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -1300,3 +1300,51 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based sleeps
>
>  OVS_VSWITCHD_STOP
>  AT_CLEANUP
> +
> +AT_SETUP([PMD - revalidator wrongly modify userspace megaflows])
> +
> +OVS_VSWITCHD_START(
> +[add-port br0 p1 \
> +   -- set interface p1 type=dummy-pmd \
> +   -- set bridge br0 datapath-type=dummy \
> +   -- add-port br0 p2 \
> +   -- set interface p2 type=dummy-pmd --
> +])
> +
> +dnl Add one openflow rule and generate a megaflow.
> +ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst=10.1.2.0/24,actions=p2'
> +ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
> +
> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'], [0], [
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never, actions:2
> +])
> +
> +ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
> +dnl Replace openflow rules, trigger the revalidation.
> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=ct(commit)' | dnl
> +ovs-ofctl --bundle replace-flows br0 -])
> +ovs-appctl revalidator/wait
> +
> +ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'
> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | strip_xout_keep_actions], [0], [
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:never, actions:ct(commit)
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:ct(commit)
> +])
> +
> +dnl Hold the prefix 10.1.2.2/24 by another 10s.
> +ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
> +dnl Send more 10.1.0.2 to make 10.1.0.0/16 tuple preprend 10.1.2.0/24 tuple in pvector of subtable.
> +for i in `seq 0 256`;do
> +ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'
> +done
> +
> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=p2' | dnl
> +ovs-ofctl --bundle replace-flows br0 -])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | strip_xout_keep_actions], [0], [
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> +])
> +
> +OVS_VSWITCHD_STOP
> +AT_CLEANUP
> -- 
> 2.25.1
Ilya Maximets June 23, 2023, 8:27 p.m. UTC | #3
On 6/15/23 04:51, Peng He wrote:
> OVS allows overlapping megaflows, as long as the actions of these
> megaflows are equal. However, the current implementation of action
> modification relies on flow_lookup instead of ufid, this could result
> in looking up a wrong megaflow and make the ukeys and megaflows inconsistent
> 
> Just like the test case in the patch, at first we have a rule with the
> prefix:
> 
> 10.1.2.0/24
> 
> and we will get a megaflow with prefixes 10.1.2.2/24 when a packet with IP
> 10.1.2.2 is received.
> 
> Then suppose we change the rule into 10.1.0.0/16. OVS prefers to keep the
> 10.1.2.2/24 megaflow and just changes its action instead of extending
> the prefix into 10.1.2.2/16.
> 
> then suppose we have a 10.1.0.2 packet, since it misses the megaflow,
> this time, we will have an overlapping megaflow with the right prefix:
> 10.1.0.2/16
> 
> now we have two megaflows:
> 10.1.2.2/24
> 10.1.0.2/16
> 
> last, suppose we have changed the ruleset again. The revalidator this
> time still decides to change the actions of both megaflows instead of
> deleting them.
> 
> The dpif_netdev_flow_put will search the megaflow to modify with unmasked
> keys, however it might lookup the wrong megaflow as the key 10.1.2.2 matches
> both 10.1.2.2/24 and 10.1.0.2/16!
> 
> This patch changes the megaflow lookup code in modification path into
> relying the ufid to find the correct megaflow instead of key lookup.
> 
> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> ---
>  lib/dpif-netdev.c | 62 ++++++++++++++++++++++++++---------------------
>  tests/pmd.at      | 48 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 82 insertions(+), 28 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 70b953ae6..b90ed1542 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4198,36 +4198,43 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>          memset(stats, 0, sizeof *stats);
>      }
>  
> -    ovs_mutex_lock(&pmd->flow_mutex);
> -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> -    if (!netdev_flow) {
> -        if (put->flags & DPIF_FP_CREATE) {
> +    if (put->flags & DPIF_FP_CREATE) {
> +        ovs_mutex_lock(&pmd->flow_mutex);
> +        netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> +        if (!netdev_flow) {
>              dp_netdev_flow_add(pmd, match, ufid, put->actions,
>                                 put->actions_len, ODPP_NONE);
>          } else {
> -            error = ENOENT;
> +            error = EEXIST;
>          }
> +        ovs_mutex_unlock(&pmd->flow_mutex);
>      } else {
> +        netdev_flow = dp_netdev_pmd_find_flow(pmd, ufid,
> +                                              put->key, put->key_len);
> +
>          if (put->flags & DPIF_FP_MODIFY) {
> -            struct dp_netdev_actions *new_actions;
> -            struct dp_netdev_actions *old_actions;
> +            if (!netdev_flow) {
> +                error = ENOENT;
> +            } else {
> +                struct dp_netdev_actions *new_actions;
> +                struct dp_netdev_actions *old_actions;
>  
> -            new_actions = dp_netdev_actions_create(put->actions,
> -                                                   put->actions_len);
> +                new_actions = dp_netdev_actions_create(put->actions,
> +                                                       put->actions_len);
>  
> -            old_actions = dp_netdev_flow_get_actions(netdev_flow);
> -            ovsrcu_set(&netdev_flow->actions, new_actions);
> +                old_actions = dp_netdev_flow_get_actions(netdev_flow);
> +                ovsrcu_set(&netdev_flow->actions, new_actions);
>  
> -            queue_netdev_flow_put(pmd, netdev_flow, match,
> -                                  put->actions, put->actions_len,
> -                                  DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
> -            log_netdev_flow_change(netdev_flow, match, old_actions,
> -                                   put->actions, put->actions_len);
> +                queue_netdev_flow_put(pmd, netdev_flow, match,
> +                                      put->actions, put->actions_len,
> +                                      DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
> +                log_netdev_flow_change(netdev_flow, match, old_actions,
> +                                       put->actions, put->actions_len);
>  
> -            if (stats) {
> -                get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
> -            }
> -            if (put->flags & DPIF_FP_ZERO_STATS) {
> +                if (stats) {
> +                    get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
> +                }
> +                if (put->flags & DPIF_FP_ZERO_STATS) {
>                  /* XXX: The userspace datapath uses thread local statistics
>                   * (for flows), which should be updated only by the owning
>                   * thread.  Since we cannot write on stats memory here,
> @@ -4237,18 +4244,17 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>                   * - Should the need arise, this operation can be implemented
>                   *   by keeping a base value (to be update here) for each
>                   *   counter, and subtracting it before outputting the stats */
> -                error = EOPNOTSUPP;
> +                    error = EOPNOTSUPP;
> +                }
> +                ovsrcu_postpone(dp_netdev_actions_free, old_actions);
>              }
> -
> -            ovsrcu_postpone(dp_netdev_actions_free, old_actions);
> -        } else if (put->flags & DPIF_FP_CREATE) {
> -            error = EEXIST;
>          } else {
> -            /* Overlapping flow. */
> -            error = EINVAL;
> +            if (netdev_flow) {
> +                /* Overlapping flow. */
> +                error = EINVAL;
> +            }
>          }
>      }
> -    ovs_mutex_unlock(&pmd->flow_mutex);
>      return error;
>  }
>  
> diff --git a/tests/pmd.at b/tests/pmd.at
> index 48f3d432d..a78e86c54 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -1300,3 +1300,51 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based sleeps
>  
>  OVS_VSWITCHD_STOP
>  AT_CLEANUP
> +
> +AT_SETUP([PMD - revalidator wrongly modify userspace megaflows])
> +
> +OVS_VSWITCHD_START(
> +[add-port br0 p1 \
> +   -- set interface p1 type=dummy-pmd \
> +   -- set bridge br0 datapath-type=dummy \
> +   -- add-port br0 p2 \
> +   -- set interface p2 type=dummy-pmd --
> +])
> +
> +dnl Add one openflow rule and generate a megaflow.
> +ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst=10.1.2.0/24,actions=p2'
> +ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'

Hi,
I didnt' review the logic in the patch, but, please, wrap
the calls above and in the test below with the AT_CHECK,
so it's visible in the log what the test is doing.  And
potentially catch issues if there are any.

Thanks.

Best regards, Ilya Maximets.

> +
> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'], [0], [
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never, actions:2
> +])
> +
> +ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
> +dnl Replace openflow rules, trigger the revalidation.
> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=ct(commit)' | dnl
> +ovs-ofctl --bundle replace-flows br0 -])
> +ovs-appctl revalidator/wait
> +
> +ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'
> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | strip_xout_keep_actions], [0], [
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:never, actions:ct(commit)
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:ct(commit)
> +])
> +
> +dnl Hold the prefix 10.1.2.2/24 by another 10s.
> +ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
> +dnl Send more 10.1.0.2 to make 10.1.0.0/16 tuple preprend 10.1.2.0/24 tuple in pvector of subtable.
> +for i in `seq 0 256`;do
> +ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'
> +done
> +
> +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=p2' | dnl
> +ovs-ofctl --bundle replace-flows br0 -])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | strip_xout_keep_actions], [0], [
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> +])
> +
> +OVS_VSWITCHD_STOP
> +AT_CLEANUP
Peng He June 26, 2023, 1:04 p.m. UTC | #4
Ilya Maximets <i.maximets@ovn.org> 于2023年6月24日周六 04:26写道:

> On 6/15/23 04:51, Peng He wrote:
> > OVS allows overlapping megaflows, as long as the actions of these
> > megaflows are equal. However, the current implementation of action
> > modification relies on flow_lookup instead of ufid, this could result
> > in looking up a wrong megaflow and make the ukeys and megaflows
> inconsistent
> >
> > Just like the test case in the patch, at first we have a rule with the
> > prefix:
> >
> > 10.1.2.0/24
> >
> > and we will get a megaflow with prefixes 10.1.2.2/24 when a packet with
> IP
> > 10.1.2.2 is received.
> >
> > Then suppose we change the rule into 10.1.0.0/16. OVS prefers to keep
> the
> > 10.1.2.2/24 megaflow and just changes its action instead of extending
> > the prefix into 10.1.2.2/16.
> >
> > then suppose we have a 10.1.0.2 packet, since it misses the megaflow,
> > this time, we will have an overlapping megaflow with the right prefix:
> > 10.1.0.2/16
> >
> > now we have two megaflows:
> > 10.1.2.2/24
> > 10.1.0.2/16
> >
> > last, suppose we have changed the ruleset again. The revalidator this
> > time still decides to change the actions of both megaflows instead of
> > deleting them.
> >
> > The dpif_netdev_flow_put will search the megaflow to modify with unmasked
> > keys, however it might lookup the wrong megaflow as the key 10.1.2.2
> matches
> > both 10.1.2.2/24 and 10.1.0.2/16!
> >
> > This patch changes the megaflow lookup code in modification path into
> > relying the ufid to find the correct megaflow instead of key lookup.
> >
> > Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> > ---
> >  lib/dpif-netdev.c | 62 ++++++++++++++++++++++++++---------------------
> >  tests/pmd.at      | 48 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 82 insertions(+), 28 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 70b953ae6..b90ed1542 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -4198,36 +4198,43 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> >          memset(stats, 0, sizeof *stats);
> >      }
> >
> > -    ovs_mutex_lock(&pmd->flow_mutex);
> > -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> > -    if (!netdev_flow) {
> > -        if (put->flags & DPIF_FP_CREATE) {
> > +    if (put->flags & DPIF_FP_CREATE) {
> > +        ovs_mutex_lock(&pmd->flow_mutex);
> > +        netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> > +        if (!netdev_flow) {
> >              dp_netdev_flow_add(pmd, match, ufid, put->actions,
> >                                 put->actions_len, ODPP_NONE);
> >          } else {
> > -            error = ENOENT;
> > +            error = EEXIST;
> >          }
> > +        ovs_mutex_unlock(&pmd->flow_mutex);
> >      } else {
> > +        netdev_flow = dp_netdev_pmd_find_flow(pmd, ufid,
> > +                                              put->key, put->key_len);
> > +
> >          if (put->flags & DPIF_FP_MODIFY) {
> > -            struct dp_netdev_actions *new_actions;
> > -            struct dp_netdev_actions *old_actions;
> > +            if (!netdev_flow) {
> > +                error = ENOENT;
> > +            } else {
> > +                struct dp_netdev_actions *new_actions;
> > +                struct dp_netdev_actions *old_actions;
> >
> > -            new_actions = dp_netdev_actions_create(put->actions,
> > -                                                   put->actions_len);
> > +                new_actions = dp_netdev_actions_create(put->actions,
> > +
>  put->actions_len);
> >
> > -            old_actions = dp_netdev_flow_get_actions(netdev_flow);
> > -            ovsrcu_set(&netdev_flow->actions, new_actions);
> > +                old_actions = dp_netdev_flow_get_actions(netdev_flow);
> > +                ovsrcu_set(&netdev_flow->actions, new_actions);
> >
> > -            queue_netdev_flow_put(pmd, netdev_flow, match,
> > -                                  put->actions, put->actions_len,
> > -                                  DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
> > -            log_netdev_flow_change(netdev_flow, match, old_actions,
> > -                                   put->actions, put->actions_len);
> > +                queue_netdev_flow_put(pmd, netdev_flow, match,
> > +                                      put->actions, put->actions_len,
> > +                                      DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
> > +                log_netdev_flow_change(netdev_flow, match, old_actions,
> > +                                       put->actions, put->actions_len);
> >
> > -            if (stats) {
> > -                get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
> > -            }
> > -            if (put->flags & DPIF_FP_ZERO_STATS) {
> > +                if (stats) {
> > +                    get_dpif_flow_status(pmd->dp, netdev_flow, stats,
> NULL);
> > +                }
> > +                if (put->flags & DPIF_FP_ZERO_STATS) {
> >                  /* XXX: The userspace datapath uses thread local
> statistics
> >                   * (for flows), which should be updated only by the
> owning
> >                   * thread.  Since we cannot write on stats memory here,
> > @@ -4237,18 +4244,17 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> >                   * - Should the need arise, this operation can be
> implemented
> >                   *   by keeping a base value (to be update here) for
> each
> >                   *   counter, and subtracting it before outputting the
> stats */
> > -                error = EOPNOTSUPP;
> > +                    error = EOPNOTSUPP;
> > +                }
> > +                ovsrcu_postpone(dp_netdev_actions_free, old_actions);
> >              }
> > -
> > -            ovsrcu_postpone(dp_netdev_actions_free, old_actions);
> > -        } else if (put->flags & DPIF_FP_CREATE) {
> > -            error = EEXIST;
> >          } else {
> > -            /* Overlapping flow. */
> > -            error = EINVAL;
> > +            if (netdev_flow) {
> > +                /* Overlapping flow. */
> > +                error = EINVAL;
> > +            }
> >          }
> >      }
> > -    ovs_mutex_unlock(&pmd->flow_mutex);
> >      return error;
> >  }
> >
> > diff --git a/tests/pmd.at b/tests/pmd.at
> > index 48f3d432d..a78e86c54 100644
> > --- a/tests/pmd.at
> > +++ b/tests/pmd.at
> > @@ -1300,3 +1300,51 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM
> ovs-vswitchd.log | grep "PMD load based sleeps
> >
> >  OVS_VSWITCHD_STOP
> >  AT_CLEANUP
> > +
> > +AT_SETUP([PMD - revalidator wrongly modify userspace megaflows])
> > +
> > +OVS_VSWITCHD_START(
> > +[add-port br0 p1 \
> > +   -- set interface p1 type=dummy-pmd \
> > +   -- set bridge br0 datapath-type=dummy \
> > +   -- add-port br0 p2 \
> > +   -- set interface p2 type=dummy-pmd --
> > +])
> > +
> > +dnl Add one openflow rule and generate a megaflow.
> > +ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst=
> 10.1.2.0/24,actions=p2'
> > +ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
>
> Hi,
> I didnt' review the logic in the patch, but, please, wrap
> the calls above and in the test below with the AT_CHECK,
> so it's visible in the log what the test is doing.  And
> potentially catch issues if there are any.
>
> OK, will do it in the next version.


> Thanks.
>
> Best regards, Ilya Maximets.
>
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'],
> [0], [
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never, actions:2
> > +])
> > +
> > +ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
> > +dnl Replace openflow rules, trigger the revalidation.
> > +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16
> actions=ct(commit)' | dnl
> > +ovs-ofctl --bundle replace-flows br0 -])
> > +ovs-appctl revalidator/wait
> > +
> > +ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
> strip_xout_keep_actions], [0], [
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:never,
> actions:ct(commit)
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s,
> actions:ct(commit)
> > +])
> > +
> > +dnl Hold the prefix 10.1.2.2/24 by another 10s.
> > +ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
> > +dnl Send more 10.1.0.2 to make 10.1.0.0/16 tuple preprend 10.1.2.0/24
> tuple in pvector of subtable.
> > +for i in `seq 0 256`;do
> > +ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'
> > +done
> > +
> > +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=p2' |
> dnl
> > +ovs-ofctl --bundle replace-flows br0 -])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
> strip_xout_keep_actions], [0], [
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> > +])
> > +
> > +OVS_VSWITCHD_STOP
> > +AT_CLEANUP
>
>
Peng He July 1, 2023, 2:40 a.m. UTC | #5
Eelco Chaudron <echaudro@redhat.com> 于2023年6月23日周五 22:31写道:

>
>
> On 15 Jun 2023, at 4:51, Peng He wrote:
>
> > OVS allows overlapping megaflows, as long as the actions of these
> > megaflows are equal. However, the current implementation of action
> > modification relies on flow_lookup instead of ufid, this could result
> > in looking up a wrong megaflow and make the ukeys and megaflows
> inconsistent
> >
> > Just like the test case in the patch, at first we have a rule with the
> > prefix:
> >
> > 10.1.2.0/24
> >
> > and we will get a megaflow with prefixes 10.1.2.2/24 when a packet with
> IP
> > 10.1.2.2 is received.
> >
> > Then suppose we change the rule into 10.1.0.0/16. OVS prefers to keep
> the
> > 10.1.2.2/24 megaflow and just changes its action instead of extending
> > the prefix into 10.1.2.2/16.
> >
> > then suppose we have a 10.1.0.2 packet, since it misses the megaflow,
> > this time, we will have an overlapping megaflow with the right prefix:
> > 10.1.0.2/16
> >
> > now we have two megaflows:
> > 10.1.2.2/24
> > 10.1.0.2/16
> >
> > last, suppose we have changed the ruleset again. The revalidator this
> > time still decides to change the actions of both megaflows instead of
> > deleting them.
> >
> > The dpif_netdev_flow_put will search the megaflow to modify with unmasked
> > keys, however it might lookup the wrong megaflow as the key 10.1.2.2
> matches
> > both 10.1.2.2/24 and 10.1.0.2/16!
> >
> > This patch changes the megaflow lookup code in modification path into
> > relying the ufid to find the correct megaflow instead of key lookup.
> >
> > Signed-off-by: Peng He <hepeng.0320@bytedance.com>
>
> Thanks for the fix, and in general it looks good, however I have one
> question about not covering all cases. See below.
>
> Cheers,
>
> Eelco
>
>
> > ---
> >  lib/dpif-netdev.c | 62 ++++++++++++++++++++++++++---------------------
> >  tests/pmd.at      | 48 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 82 insertions(+), 28 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 70b953ae6..b90ed1542 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -4198,36 +4198,43 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> >          memset(stats, 0, sizeof *stats);
> >      }
> >
> > -    ovs_mutex_lock(&pmd->flow_mutex);
> > -    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> > -    if (!netdev_flow) {
> > -        if (put->flags & DPIF_FP_CREATE) {
> > +    if (put->flags & DPIF_FP_CREATE) {
>
> Should this not be:
>
> if (put->flags & DPIF_FP_CREATE && !(put->flags & DPIF_FP_MODIFY))
>
> I know this will break the fix, but I’m wondering what the possible
> combinations are.
> Quickly looking at the code the following ones seem to exist:
>
> DPIF_FP_CREATE  -> Create a flow, if one exists return EEXIST
> DPIF_FP_MODIFY  -> Modify a flow, if it does not exist return ENONT
> DPIF_FP_CREATE | DPIF_FP_MODIFY   -> If it exists modify it, if it does
> not exists create it.
>
> Could the last combination not potentially fail the lookup?
>
> Or are we assuming only standalone DPIF_FP_MODIFY’s are the problem? In
> theory, it could also be the combination.
>

sorry, I was wrong. Such a combination does exist in
the dpif_probe_feature, and it probes the datapath by trying to put flows:

     error = dpif_flow_put(dpif, DPIF_FP_CREATE | DPIF_FP_MODIFY |
DPIF_FP_PROBE,
                           key->data, key->size, NULL, 0,
                           nl_actions, nl_actions_size,
                           ufid, NON_PMD_CORE_ID, NULL);

so we CANNOT change the code into:

if (put->flags & DPIF_FP_CREATE && !(put->flags & DPIF_FP_MODIFY))

as the issue the patch tries to fix only exist in MODIFY alone path.
If CREATE bit is set, we need to go through the CREATE path no matter if
MODIFY exist or not.



>
> > +        ovs_mutex_lock(&pmd->flow_mutex);
> > +        netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
> > +        if (!netdev_flow) {
> >              dp_netdev_flow_add(pmd, match, ufid, put->actions,
> >                                 put->actions_len, ODPP_NONE);
> >          } else {
> > -            error = ENOENT;
> > +            error = EEXIST;
> >          }
> > +        ovs_mutex_unlock(&pmd->flow_mutex);
> >      } else {
> > +        netdev_flow = dp_netdev_pmd_find_flow(pmd, ufid,
> > +                                              put->key, put->key_len);
> > +
> >          if (put->flags & DPIF_FP_MODIFY) {
> > -            struct dp_netdev_actions *new_actions;
> > -            struct dp_netdev_actions *old_actions;
> > +            if (!netdev_flow) {
> > +                error = ENOENT;
> > +            } else {
> > +                struct dp_netdev_actions *new_actions;
> > +                struct dp_netdev_actions *old_actions;
> >
> > -            new_actions = dp_netdev_actions_create(put->actions,
> > -                                                   put->actions_len);
> > +                new_actions = dp_netdev_actions_create(put->actions,
> > +
>  put->actions_len);
> >
> > -            old_actions = dp_netdev_flow_get_actions(netdev_flow);
> > -            ovsrcu_set(&netdev_flow->actions, new_actions);
> > +                old_actions = dp_netdev_flow_get_actions(netdev_flow);
> > +                ovsrcu_set(&netdev_flow->actions, new_actions);
> >
> > -            queue_netdev_flow_put(pmd, netdev_flow, match,
> > -                                  put->actions, put->actions_len,
> > -                                  DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
> > -            log_netdev_flow_change(netdev_flow, match, old_actions,
> > -                                   put->actions, put->actions_len);
> > +                queue_netdev_flow_put(pmd, netdev_flow, match,
> > +                                      put->actions, put->actions_len,
> > +                                      DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
> > +                log_netdev_flow_change(netdev_flow, match, old_actions,
> > +                                       put->actions, put->actions_len);
> >
> > -            if (stats) {
> > -                get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
> > -            }
> > -            if (put->flags & DPIF_FP_ZERO_STATS) {
> > +                if (stats) {
> > +                    get_dpif_flow_status(pmd->dp, netdev_flow, stats,
> NULL);
> > +                }
> > +                if (put->flags & DPIF_FP_ZERO_STATS) {
> >                  /* XXX: The userspace datapath uses thread local
> statistics
> >                   * (for flows), which should be updated only by the
> owning
> >                   * thread.  Since we cannot write on stats memory here,
> > @@ -4237,18 +4244,17 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> >                   * - Should the need arise, this operation can be
> implemented
> >                   *   by keeping a base value (to be update here) for
> each
> >                   *   counter, and subtracting it before outputting the
> stats */
> > -                error = EOPNOTSUPP;
> > +                    error = EOPNOTSUPP;
> > +                }
> > +                ovsrcu_postpone(dp_netdev_actions_free, old_actions);
> >              }
> > -
> > -            ovsrcu_postpone(dp_netdev_actions_free, old_actions);
> > -        } else if (put->flags & DPIF_FP_CREATE) {
> > -            error = EEXIST;
> >          } else {
> > -            /* Overlapping flow. */
> > -            error = EINVAL;
> > +            if (netdev_flow) {
> > +                /* Overlapping flow. */
> > +                error = EINVAL;
> > +            }
> >          }
> >      }
> > -    ovs_mutex_unlock(&pmd->flow_mutex);
> >      return error;
> >  }
> >
> > diff --git a/tests/pmd.at b/tests/pmd.at
> > index 48f3d432d..a78e86c54 100644
> > --- a/tests/pmd.at
> > +++ b/tests/pmd.at
> > @@ -1300,3 +1300,51 @@ OVS_WAIT_UNTIL([tail -n +$LINENUM
> ovs-vswitchd.log | grep "PMD load based sleeps
> >
> >  OVS_VSWITCHD_STOP
> >  AT_CLEANUP
> > +
> > +AT_SETUP([PMD - revalidator wrongly modify userspace megaflows])
> > +
> > +OVS_VSWITCHD_START(
> > +[add-port br0 p1 \
> > +   -- set interface p1 type=dummy-pmd \
> > +   -- set bridge br0 datapath-type=dummy \
> > +   -- add-port br0 p2 \
> > +   -- set interface p2 type=dummy-pmd --
> > +])
> > +
> > +dnl Add one openflow rule and generate a megaflow.
> > +ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst=
> 10.1.2.0/24,actions=p2'
> > +ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'],
> [0], [
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never, actions:2
> > +])
> > +
> > +ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
> > +dnl Replace openflow rules, trigger the revalidation.
> > +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16
> actions=ct(commit)' | dnl
> > +ovs-ofctl --bundle replace-flows br0 -])
> > +ovs-appctl revalidator/wait
> > +
> > +ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
> strip_xout_keep_actions], [0], [
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:never,
> actions:ct(commit)
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s,
> actions:ct(commit)
> > +])
> > +
> > +dnl Hold the prefix 10.1.2.2/24 by another 10s.
> > +ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
> > +dnl Send more 10.1.0.2 to make 10.1.0.0/16 tuple preprend 10.1.2.0/24
> tuple in pvector of subtable.
> > +for i in `seq 0 256`;do
> > +ovs-appctl netdev-dummy/receive p1
> 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'
> > +done
> > +
> > +AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=p2' |
> dnl
> > +ovs-ofctl --bundle replace-flows br0 -])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' |
> strip_xout_keep_actions], [0], [
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> >
> +recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=
> 10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
> > +])
> > +
> > +OVS_VSWITCHD_STOP
> > +AT_CLEANUP
> > --
> > 2.25.1
>
>
diff mbox series

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 70b953ae6..b90ed1542 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4198,36 +4198,43 @@  flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
         memset(stats, 0, sizeof *stats);
     }
 
-    ovs_mutex_lock(&pmd->flow_mutex);
-    netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
-    if (!netdev_flow) {
-        if (put->flags & DPIF_FP_CREATE) {
+    if (put->flags & DPIF_FP_CREATE) {
+        ovs_mutex_lock(&pmd->flow_mutex);
+        netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
+        if (!netdev_flow) {
             dp_netdev_flow_add(pmd, match, ufid, put->actions,
                                put->actions_len, ODPP_NONE);
         } else {
-            error = ENOENT;
+            error = EEXIST;
         }
+        ovs_mutex_unlock(&pmd->flow_mutex);
     } else {
+        netdev_flow = dp_netdev_pmd_find_flow(pmd, ufid,
+                                              put->key, put->key_len);
+
         if (put->flags & DPIF_FP_MODIFY) {
-            struct dp_netdev_actions *new_actions;
-            struct dp_netdev_actions *old_actions;
+            if (!netdev_flow) {
+                error = ENOENT;
+            } else {
+                struct dp_netdev_actions *new_actions;
+                struct dp_netdev_actions *old_actions;
 
-            new_actions = dp_netdev_actions_create(put->actions,
-                                                   put->actions_len);
+                new_actions = dp_netdev_actions_create(put->actions,
+                                                       put->actions_len);
 
-            old_actions = dp_netdev_flow_get_actions(netdev_flow);
-            ovsrcu_set(&netdev_flow->actions, new_actions);
+                old_actions = dp_netdev_flow_get_actions(netdev_flow);
+                ovsrcu_set(&netdev_flow->actions, new_actions);
 
-            queue_netdev_flow_put(pmd, netdev_flow, match,
-                                  put->actions, put->actions_len,
-                                  DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
-            log_netdev_flow_change(netdev_flow, match, old_actions,
-                                   put->actions, put->actions_len);
+                queue_netdev_flow_put(pmd, netdev_flow, match,
+                                      put->actions, put->actions_len,
+                                      DP_NETDEV_FLOW_OFFLOAD_OP_MOD);
+                log_netdev_flow_change(netdev_flow, match, old_actions,
+                                       put->actions, put->actions_len);
 
-            if (stats) {
-                get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
-            }
-            if (put->flags & DPIF_FP_ZERO_STATS) {
+                if (stats) {
+                    get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
+                }
+                if (put->flags & DPIF_FP_ZERO_STATS) {
                 /* XXX: The userspace datapath uses thread local statistics
                  * (for flows), which should be updated only by the owning
                  * thread.  Since we cannot write on stats memory here,
@@ -4237,18 +4244,17 @@  flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
                  * - Should the need arise, this operation can be implemented
                  *   by keeping a base value (to be update here) for each
                  *   counter, and subtracting it before outputting the stats */
-                error = EOPNOTSUPP;
+                    error = EOPNOTSUPP;
+                }
+                ovsrcu_postpone(dp_netdev_actions_free, old_actions);
             }
-
-            ovsrcu_postpone(dp_netdev_actions_free, old_actions);
-        } else if (put->flags & DPIF_FP_CREATE) {
-            error = EEXIST;
         } else {
-            /* Overlapping flow. */
-            error = EINVAL;
+            if (netdev_flow) {
+                /* Overlapping flow. */
+                error = EINVAL;
+            }
         }
     }
-    ovs_mutex_unlock(&pmd->flow_mutex);
     return error;
 }
 
diff --git a/tests/pmd.at b/tests/pmd.at
index 48f3d432d..a78e86c54 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -1300,3 +1300,51 @@  OVS_WAIT_UNTIL([tail -n +$LINENUM ovs-vswitchd.log | grep "PMD load based sleeps
 
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([PMD - revalidator wrongly modify userspace megaflows])
+
+OVS_VSWITCHD_START(
+[add-port br0 p1 \
+   -- set interface p1 type=dummy-pmd \
+   -- set bridge br0 datapath-type=dummy \
+   -- add-port br0 p2 \
+   -- set interface p2 type=dummy-pmd --
+])
+
+dnl Add one openflow rule and generate a megaflow.
+ovs-ofctl add-flow br0 'table=0,in_port=p1,ip,nw_dst=10.1.2.0/24,actions=p2'
+ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
+
+AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//'], [0], [
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:never, actions:2
+])
+
+ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
+dnl Replace openflow rules, trigger the revalidation.
+AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=ct(commit)' | dnl
+ovs-ofctl --bundle replace-flows br0 -])
+ovs-appctl revalidator/wait
+
+ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'
+AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | strip_xout_keep_actions], [0], [
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:never, actions:ct(commit)
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:ct(commit)
+])
+
+dnl Hold the prefix 10.1.2.2/24 by another 10s.
+ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.2.2),tcp(src=1,dst=2)'
+dnl Send more 10.1.0.2 to make 10.1.0.0/16 tuple preprend 10.1.2.0/24 tuple in pvector of subtable.
+for i in `seq 0 256`;do
+ovs-appctl netdev-dummy/receive p1 'ipv4(src=10.0.0.1,dst=10.1.0.2),tcp(src=1,dst=2)'
+done
+
+AT_CHECK([echo 'table=0,in_port=p1,ip,nw_dst=10.1.0.0/16 actions=p2' | dnl
+ovs-ofctl --bundle replace-flows br0 -])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows | sed 's/.*core: [[0-9]]*//' | strip_xout_keep_actions], [0], [
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.0.2/255.255.0.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
+recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=10.1.2.2/255.255.255.0,frag=no), packets:0, bytes:0, used:0.0s, actions:2
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP