
[ovs-dev,v2,1/1] ofproto-dpif-upcall: Avoid stale ukeys leaks.

Message ID 20240703073322.1246632-1-roid@nvidia.com
State Changes Requested
Delegated to: Eelco Chaudron

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Roi Dayan July 3, 2024, 7:33 a.m. UTC
It is observed in some environments that there are many more ukeys than
actual DP flows. For example:

$ ovs-appctl upcall/show
system@ovs-system:
flows : (current 7) (avg 6) (max 117) (limit 2125)
offloaded flows : 525
dump duration : 1063ms
ufid enabled : true

23: (keys 3612)
24: (keys 3625)
25: (keys 3485)

The revalidator threads are busy revalidating the stale ukeys, leading to
high CPU usage and long dump durations.

This patch adds a check in the sweep phase for such ukeys and moves them
to DELETE so that they can eventually be cleared.

Co-authored-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Roi Dayan <roid@nvidia.com>
---
 ofproto/ofproto-dpif-upcall.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
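The check this patch introduces boils down to a simple predicate over two ukey timestamps. A minimal standalone model, with illustrative names and a simplified struct rather than the actual OVS types:

```c
#include <stdbool.h>

/* Simplified stand-in for struct udpif_key: only the two fields the new
 * check looks at. */
struct model_ukey {
    long long created;     /* Creation time estimate, in ms. */
    long long stats_used;  /* 'used' from the last datapath stats; 0 means
                            * no stats were ever received for this flow. */
};

/* Mirrors the shape of the patch's condition: no stats were ever seen AND
 * the flow is older than the configured max-idle interval. */
static bool
model_ukey_is_stale(const struct model_ukey *ukey, long long now_ms,
                    long long max_idle_ms)
{
    return !ukey->stats_used && ukey->created < now_ms - max_idle_ms;
}
```

Note that, as discussed later in the thread, upcall traffic can keep updating a ukey's stats even when the datapath flow is gone, so a stats-based predicate like this one can miss some stale ukeys.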

Comments

Eelco Chaudron July 3, 2024, 10:16 a.m. UTC | #1
On 3 Jul 2024, at 9:33, Roi Dayan wrote:

> It is observed in some environments that there are much more ukeys than
> actual DP flows. For example:
>
> $ ovs-appctl upcall/show
> system@ovs-system:
> flows : (current 7) (avg 6) (max 117) (limit 2125)
> offloaded flows : 525
> dump duration : 1063ms
> ufid enabled : true
>
> 23: (keys 3612)
> 24: (keys 3625)
> 25: (keys 3485)
>
> The revalidator threads are busy revalidating the stale ukeys leading to
> high CPU and long dump duration.
>
> This patch adds checks in the sweep phase for such ukeys and move them
> to DELETE so that they can be cleared eventually.

Thanks Roi for the patch update, one small issue below. Let’s also discuss a potential testcase a bit more before sending a v3.

Cheers,

Eelco


> Co-authored-by: Han Zhou <hzhou@ovn.org>
> Signed-off-by: Han Zhou <hzhou@ovn.org>
> Signed-off-by: Roi Dayan <roid@nvidia.com>
> ---
>  ofproto/ofproto-dpif-upcall.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 83609ec62b63..08b3c70411aa 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -278,6 +278,7 @@ enum flow_del_reason {
>      FDR_BAD_ODP_FIT,        /* Bad ODP flow fit. */
>      FDR_FLOW_IDLE,          /* Flow idle timeout. */
>      FDR_FLOW_LIMIT,         /* Kill all flows condition reached. */
> +    FDR_FLOW_NO_STATS_IDLE, /* Flow idled out without receiving statistics. */

Guess you missed the part at the beginning on updating the related USDT script:

```
/* Ukey delete reasons used by USDT probes.  Please keep in sync with the
 * definition in utilities/usdt-scripts/flow_reval_monitor.py.  */
enum flow_del_reason {
```

>      FDR_FLOW_WILDCARDED,    /* Flow needs a narrower wildcard mask. */
>      FDR_NO_OFPROTO,         /* Bridge not found. */
>      FDR_PURGE,              /* User requested flow deletion. */
> @@ -2450,7 +2451,14 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
>          log_unexpected_stats_jump(ukey, stats);
>      }
>
> -    if (need_revalidate) {
> +    if (!ukey->stats.used
> +        && ukey->created < udpif->dpif->current_ms - ofproto_max_idle) {
> +        /* If the flow has a 'used' value of 0, meaning no stats were received
> +         * for this flow, and the configured idle time has elapsed, it might
> +         * indicates a stale flow (i.e., a flow without an installed datapath
> +         * rule). In this case, it is safe to mark this ukey for deletion. */
> +        *del_reason = FDR_FLOW_NO_STATS_IDLE;
> +    } else if (need_revalidate) {
>          if (should_revalidate(udpif, ukey, push.n_packets)) {
>              if (!ukey->xcache) {
>                  ukey->xcache = xlate_cache_new();
> -- 
> 2.21.0
Roi Dayan July 3, 2024, 10:28 a.m. UTC | #2
On 03/07/2024 13:16, Eelco Chaudron wrote:
> 
> 
> On 3 Jul 2024, at 9:33, Roi Dayan wrote:
> 
>> It is observed in some environments that there are much more ukeys than
>> actual DP flows. For example:
>>
>> $ ovs-appctl upcall/show
>> system@ovs-system:
>> flows : (current 7) (avg 6) (max 117) (limit 2125)
>> offloaded flows : 525
>> dump duration : 1063ms
>> ufid enabled : true
>>
>> 23: (keys 3612)
>> 24: (keys 3625)
>> 25: (keys 3485)
>>
>> The revalidator threads are busy revalidating the stale ukeys leading to
>> high CPU and long dump duration.
>>
>> This patch adds checks in the sweep phase for such ukeys and move them
>> to DELETE so that they can be cleared eventually.
> 
> Thank Roi for the patch update, one small issue below. Let’s also discuss a bit more a potential testcase before sending a v3.
> 
> Cheers,
> 
> Eelco
> 
> 

I also replied in v0 but am replying here so we can continue the discussion in v2.

I did this to test this case:

- create a bridge with 2 veth ports and configure IPs.
- ping between the ports to create tc rules.
- repeat a few times: clear the tc rules (e.g. tc filter del dev veth1 ingress, and the same on the 2nd port).
- set max-idle to 1 and remove it to cause a flush of the rules.
- create another pair of veth ports; add/del veth4 from the bridge a few times to cause a sweep.
- before the fix: ovs-appctl upcall/show shows leftover ukeys.
- after the fix: upcall/show shows 0 ukeys.


What do you think?
I wonder if there is a cleaner way to do a sweep with purge=false that
would just skip the seq mismatch check and set every ukey to delete.



>> Co-authored-by: Han Zhou <hzhou@ovn.org>
>> Signed-off-by: Han Zhou <hzhou@ovn.org>
>> Signed-off-by: Roi Dayan <roid@nvidia.com>
>> ---
>>  ofproto/ofproto-dpif-upcall.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>> index 83609ec62b63..08b3c70411aa 100644
>> --- a/ofproto/ofproto-dpif-upcall.c
>> +++ b/ofproto/ofproto-dpif-upcall.c
>> @@ -278,6 +278,7 @@ enum flow_del_reason {
>>      FDR_BAD_ODP_FIT,        /* Bad ODP flow fit. */
>>      FDR_FLOW_IDLE,          /* Flow idle timeout. */
>>      FDR_FLOW_LIMIT,         /* Kill all flows condition reached. */
>> +    FDR_FLOW_NO_STATS_IDLE, /* Flow idled out without receiving statistics. */
> 
> Guess you missed the part at the beginning on updating the related USDT script:
> 
> ```
> /* Ukey delete reasons used by USDT probes.  Please keep in sync with the
>  * definition in utilities/usdt-scripts/flow_reval_monitor.py.  */
> enum flow_del_reason {
> ```
> 

right. thanks.

>>      FDR_FLOW_WILDCARDED,    /* Flow needs a narrower wildcard mask. */
>>      FDR_NO_OFPROTO,         /* Bridge not found. */
>>      FDR_PURGE,              /* User requested flow deletion. */
>> @@ -2450,7 +2451,14 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
>>          log_unexpected_stats_jump(ukey, stats);
>>      }
>>
>> -    if (need_revalidate) {
>> +    if (!ukey->stats.used
>> +        && ukey->created < udpif->dpif->current_ms - ofproto_max_idle) {
>> +        /* If the flow has a 'used' value of 0, meaning no stats were received
>> +         * for this flow, and the configured idle time has elapsed, it might
>> +         * indicates a stale flow (i.e., a flow without an installed datapath
>> +         * rule). In this case, it is safe to mark this ukey for deletion. */
>> +        *del_reason = FDR_FLOW_NO_STATS_IDLE;
>> +    } else if (need_revalidate) {
>>          if (should_revalidate(udpif, ukey, push.n_packets)) {
>>              if (!ukey->xcache) {
>>                  ukey->xcache = xlate_cache_new();
>> -- 
>> 2.21.0
>
Ilya Maximets July 9, 2024, 9:38 p.m. UTC | #3
On 7/3/24 12:28, Roi Dayan via dev wrote:
> 
> 
> On 03/07/2024 13:16, Eelco Chaudron wrote:
>>
>>
>> On 3 Jul 2024, at 9:33, Roi Dayan wrote:
>>
>>> It is observed in some environments that there are much more ukeys than
>>> actual DP flows. For example:
>>>
>>> $ ovs-appctl upcall/show
>>> system@ovs-system:
>>> flows : (current 7) (avg 6) (max 117) (limit 2125)
>>> offloaded flows : 525
>>> dump duration : 1063ms
>>> ufid enabled : true
>>>
>>> 23: (keys 3612)
>>> 24: (keys 3625)
>>> 25: (keys 3485)
>>>
>>> The revalidator threads are busy revalidating the stale ukeys leading to
>>> high CPU and long dump duration.
>>>
>>> This patch adds checks in the sweep phase for such ukeys and move them
>>> to DELETE so that they can be cleared eventually.
>>
>> Thank Roi for the patch update, one small issue below. Let’s also discuss a bit more a potential testcase before sending a v3.
>>
>> Cheers,
>>
>> Eelco
>>
>>
> 
> I also replied in v0 but replying here so we can continue the discussion here in v2.
> 
> I did this for testing this case:
> 
> - create bridge with 2 veth ports. configure ips.
> - ping between the ports to have tc rules.
> - repeated few times: clear the tc rules like this: tc filter del dev veth1 ingress and also on the 2nd port.
> - set max-idle to 1 and remove it to cause a flush of the rules.
> - create another set of veth ports. add/del veth4 from the bridge a few times to cause a sweep.
> - before the fix: ovs-appctl upcall/show will show ukeys.
> - after the fix upcall/show will show 0 ukeys.
> 
> 
> what do you think?
> I think if there is a cleaner way to do a sweep with purge=false as just
> will just skip seq mismatch check and just set every ukey to delete.

One problem here is that flow dumps are inherently racy.
We can dump the same flow multiple times as well as not
receive some flows that are actually in the datapath.
So, we can't just remove ukeys if we missed a flow once
or twice.  If we didn't see the flow in the dumps for the
max-idle interval, that might be a better indicator that
something is wrong.

One more problem here however is that ukey statistics can
keep increasing if there is some traffic that matches.
Since the datapath flow doesn't exist, all this traffic
will go to userspace updating the ukey stats, but the
flow will never be installed again.  And we will not be
able to fix that in this patch since the stats will grow.

We need a way to tell that we haven't seen this flow in
the dumps for a long time and stats are not a good indicator
in this case.

Best regards, Ilya Maximets.
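Ilya's suggested heuristic, treating a ukey as stale only if it has not shown up in any flow dump for well over the max-idle interval, can be sketched as a standalone helper (hypothetical names, assuming a last-dumped timestamp that stays 0 until the flow first appears in a dump):

```c
#include <stdbool.h>

/* Returns true if the flow was last seen in a dump (or, if never dumped,
 * created) more than twice the max-idle interval ago.  The 2x factor
 * gives slack for the inherent raciness of flow dumps, which can skip a
 * flow in any single dump round. */
static bool
model_flow_missing_from_dp(long long last_dumped_ms, long long created_ms,
                           long long now_ms, long long max_idle_ms)
{
    long long last_seen = last_dumped_ms ? last_dumped_ms : created_ms;

    return last_seen < now_ms - 2 * max_idle_ms;
}
```

This matches the shape of the patch Eelco posts below, which records ukey->last_dumped in revalidate() and checks it in revalidate_ukey().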
Eelco Chaudron July 10, 2024, 3:35 p.m. UTC | #4
On 9 Jul 2024, at 23:38, Ilya Maximets wrote:

> On 7/3/24 12:28, Roi Dayan via dev wrote:
>>
>>
>> On 03/07/2024 13:16, Eelco Chaudron wrote:
>>>
>>>
>>> On 3 Jul 2024, at 9:33, Roi Dayan wrote:
>>>
>>>> It is observed in some environments that there are much more ukeys than
>>>> actual DP flows. For example:
>>>>
>>>> $ ovs-appctl upcall/show
>>>> system@ovs-system:
>>>> flows : (current 7) (avg 6) (max 117) (limit 2125)
>>>> offloaded flows : 525
>>>> dump duration : 1063ms
>>>> ufid enabled : true
>>>>
>>>> 23: (keys 3612)
>>>> 24: (keys 3625)
>>>> 25: (keys 3485)
>>>>
>>>> The revalidator threads are busy revalidating the stale ukeys leading to
>>>> high CPU and long dump duration.
>>>>
>>>> This patch adds checks in the sweep phase for such ukeys and move them
>>>> to DELETE so that they can be cleared eventually.
>>>
>>> Thank Roi for the patch update, one small issue below. Let’s also discuss a bit more a potential testcase before sending a v3.
>>>
>>> Cheers,
>>>
>>> Eelco
>>>
>>>
>>
>> I also replied in v0 but replying here so we can continue the discussion here in v2.
>>
>> I did this for testing this case:
>>
>> - create bridge with 2 veth ports. configure ips.
>> - ping between the ports to have tc rules.
>> - repeated few times: clear the tc rules like this: tc filter del dev veth1 ingress and also on the 2nd port.
>> - set max-idle to 1 and remove it to cause a flush of the rules.
>> - create another set of veth ports. add/del veth4 from the bridge a few times to cause a sweep.
>> - before the fix: ovs-appctl upcall/show will show ukeys.
>> - after the fix upcall/show will show 0 ukeys.
>>
>>
>> what do you think?
>> I think if there is a cleaner way to do a sweep with purge=false as just
>> will just skip seq mismatch check and just set every ukey to delete.
>
> One problem here is that flow dumps are inherently racy.
> We can dump the same flow multiple times as well as not
> receive some flows that are actually in the datapath.
> So, we can't just remove ukeys if we missed a flow once
> or twice.  If we didn't see the flow in the dumps for the
> max-idle interval, that might be a better indicator that
> something is wrong.
>
> One more problem here however is that ukey statistics can
> keep increasing if there is some traffic that matches.
> Since the datapath flow doesn't exist, all this traffic
> will go to userspace updating the ukey stats, but the
> flow will never be installed again.  And we will not be
> able to fix that in this patch since the stats will grow.

Yes, I noticed this when trying to add a quick testcase for this. The entries were used, so it would never hit the code path.


> We need a way to tell that we haven't seen this flow in
> the dumps for a long time and stats are not a good indicator
> in this case.

Agreed, I made a rough testcase and a patch that does this. Please take a look and use whatever you need for your patch. I can also try to work on this if you want, but it will be another three weeks as I’m about to go on PTO.

Here are the patches (diffs):

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 83609ec62..8b449537d 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -57,6 +57,7 @@ COVERAGE_DEFINE(dumped_inconsistent_flow);
 COVERAGE_DEFINE(dumped_new_flow);
 COVERAGE_DEFINE(handler_duplicate_upcall);
 COVERAGE_DEFINE(revalidate_missed_dp_flow);
+COVERAGE_DEFINE(revalidate_missing_dp_flow);
 COVERAGE_DEFINE(ukey_dp_change);
 COVERAGE_DEFINE(ukey_invalid_stat_reset);
 COVERAGE_DEFINE(ukey_replace_contention);
@@ -278,6 +279,7 @@ enum flow_del_reason {
     FDR_BAD_ODP_FIT,        /* Bad ODP flow fit. */
     FDR_FLOW_IDLE,          /* Flow idle timeout. */
     FDR_FLOW_LIMIT,         /* Kill all flows condition reached. */
+    FDR_FLOW_MISSING_DP,    /* Flow is missing from the datapath. */
     FDR_FLOW_WILDCARDED,    /* Flow needs a narrower wildcard mask. */
     FDR_NO_OFPROTO,         /* Bridge not found. */
     FDR_PURGE,              /* User requested flow deletion. */
@@ -315,6 +317,7 @@ struct udpif_key {
     struct dpif_flow_stats stats OVS_GUARDED; /* Last known stats.*/
     const char *dp_layer OVS_GUARDED;         /* Last known dp_layer. */
     long long int created OVS_GUARDED;        /* Estimate of creation time. */
+    long long int last_dumped OVS_GUARDED;    /* Flow last dump time. */
     uint64_t dump_seq OVS_GUARDED;            /* Tracks udpif->dump_seq. */
     uint64_t reval_seq OVS_GUARDED;           /* Tracks udpif->reval_seq. */
     enum ukey_state state OVS_GUARDED;        /* Tracks ukey lifetime. */
@@ -1819,6 +1822,7 @@ ukey_create__(const struct nlattr *key, size_t key_len,
     ukey->state_thread = ovsthread_id_self();
     ukey->state_where = OVS_SOURCE_LOCATOR;
     ukey->created = ukey->flow_time = time_msec();
+    ukey->last_dumped = 0;
     memset(&ukey->stats, 0, sizeof ukey->stats);
     ukey->stats.used = used;
     ukey->dp_layer = NULL;
@@ -2450,7 +2454,14 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
         log_unexpected_stats_jump(ukey, stats);
     }

-    if (need_revalidate) {
+    if ((ukey->last_dumped ? ukey->last_dumped : ukey->created)
+        < udpif->dpif->current_ms - (2 * ofproto_max_idle)) {
+        /* If the flow was not dumped for at least twice the idle time,
+         * we can assume the datapath flow no longer exists and the ukey
+         * should be deleted. */
+        COVERAGE_INC(revalidate_missing_dp_flow);
+        *del_reason = FDR_FLOW_MISSING_DP;
+    } else if (need_revalidate) {
         if (should_revalidate(udpif, ukey, push.n_packets)) {
             if (!ukey->xcache) {
                 ukey->xcache = xlate_cache_new();
@@ -2884,6 +2895,7 @@ revalidate(struct revalidator *revalidator)
                 continue;
             }

+            ukey->last_dumped = now;
             ukey->offloaded = f->attrs.offloaded;
             if (!ukey->dp_layer
                 || (!dpif_synced_dp_layers(udpif->dpif)
diff --git a/tests/system-offloads-traffic.at b/tests/system-offloads-traffic.at
index 7dbf0e97c..87a64d453 100644
--- a/tests/system-offloads-traffic.at
+++ b/tests/system-offloads-traffic.at
@@ -132,8 +132,8 @@ dnl Wait until the ARP flow has timed out.
 OVS_WAIT_UNTIL([test `ovs-appctl dpctl/dump-flows type=tc,offloaded | \
                   grep "arp" | wc -l` -eq 0])

-dnl Set max-idle to 200 ms, so we are sure the max-idle * 2 has been reached.
-AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=200])
+dnl Set max-idle to 10 ms, so we are sure the max-idle * 2 has been reached.
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=10])

 dnl Add a port to force the revalidation.
 REV_RECONFIGURE=$(ovs-appctl coverage/read-counter rev_reconfigure)
diff --git a/utilities/usdt-scripts/flow_reval_monitor.py b/utilities/usdt-scripts/flow_reval_monitor.py
index 28479a565..1de06c73b 100755
--- a/utilities/usdt-scripts/flow_reval_monitor.py
+++ b/utilities/usdt-scripts/flow_reval_monitor.py
@@ -249,6 +249,7 @@ FdrReasons = IntEnum(
         "FDR_BAD_ODP_FIT",
         "FDR_FLOW_IDLE",
         "FDR_FLOW_LIMIT",
+        "FDR_FLOW_MISSING_DP",
         "FDR_FLOW_WILDCARDED",
         "FDR_NO_OFPROTO",
         "FDR_PURGE",
@@ -265,6 +266,7 @@ FdrReasonStrings = {
     FdrReasons.FDR_BAD_ODP_FIT: "Bad ODP flow fit",
     FdrReasons.FDR_FLOW_IDLE: "Flow idle timeout",
     FdrReasons.FDR_FLOW_LIMIT: "Kill all flows condition reached",
+    FdrReasons.FDR_FLOW_MISSING_DP: "Flow is missing from the datapath",
     FdrReasons.FDR_FLOW_WILDCARDED: "Flow needs a narrower wildcard mask",
     FdrReasons.FDR_NO_OFPROTO: "Bridge not found",
     FdrReasons.FDR_PURGE: "User requested flow deletion",


diff --git a/tests/system-offloads-traffic.at b/tests/system-offloads-traffic.at
index d1da33d96..7dbf0e97c 100644
--- a/tests/system-offloads-traffic.at
+++ b/tests/system-offloads-traffic.at
@@ -93,6 +93,66 @@ AT_CHECK([ovs-appctl upcall/show | grep -E "offloaded flows : [[1-9]]"], [0], [i
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP

+AT_SETUP([offloads - Forced removed datapath entries])
+OVS_TRAFFIC_VSWITCHD_START([], [], [-- set Open_vSwitch . other_config:hw-offload=true])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [ignore])
+
+AT_CHECK([ovs-appctl vlog/set ofproto_dpif_upcall:dbg])
+
+dnl Set idle timeout to 1 second.
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=1000])
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 10 -i 0.1 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+10 packets transmitted, 10 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows type=tc,offloaded | grep "eth_type(0x0800)" | DUMP_CLEAN_SORTED], [0], [dnl
+in_port(2),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, used:0.001s, actions:output
+in_port(3),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, used:0.001s, actions:output
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows type=tc,offloaded | grep -q "arp" ], [0])
+
+dnl Delete IPv4 entries, but keep the ARP ones.
+AT_CHECK([tc filter del dev ovs-p0 ingress protocol ip pref 2])
+AT_CHECK([tc filter del dev ovs-p1 ingress protocol ip pref 2])
+
+dnl Bring the remote ports down to avoid traffic.
+AT_CHECK([ip -n at_ns0 link set p0 down])
+AT_CHECK([ip -n at_ns1 link set p1 down])
+
+dnl Wait until the ARP flow has timed out.
+OVS_WAIT_UNTIL([test `ovs-appctl dpctl/dump-flows type=tc,offloaded | \
+                  grep "arp" | wc -l` -eq 0])
+
+dnl Set max-idle to 200 ms, so we are sure the max-idle * 2 has been reached.
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=200])
+
+dnl Add a port to force the revalidation.
+REV_RECONFIGURE=$(ovs-appctl coverage/read-counter rev_reconfigure)
+AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 type=internal])
+OVS_WAIT_UNTIL([test `ovs-appctl coverage/read-counter rev_reconfigure` \
+                  -gt $REV_RECONFIGURE])
+
+dnl Wait for another full round of revalidation.  After this all ukeys should
+dnl be gone.
+AT_CHECK([ovs-appctl revalidator/wait], [0])
+AT_CHECK([ovs-appctl revalidator/wait], [0])
+
+dnl Make sure no more ukeys exists.
+AT_CHECK([ovs-appctl upcall/show | grep '(keys' | awk '{print $3}' | \
+            grep -qv '0)'], [1])
+
+OVS_TRAFFIC_VSWITCHD_STOP(["/failed to flow_del (No such file or directory)/d"])
+AT_CLEANUP
+
 AT_SETUP([offloads - set ingress_policing_rate and ingress_policing_burst - offloads disabled])
 AT_KEYWORDS([ingress_policing])
 OVS_CHECK_TC_QDISC()
Roi Dayan July 17, 2024, 8:28 a.m. UTC | #5
On 10/07/2024 18:35, Eelco Chaudron wrote:
> 
> 
> On 9 Jul 2024, at 23:38, Ilya Maximets wrote:
> 
>> On 7/3/24 12:28, Roi Dayan via dev wrote:
>>>
>>>
>>> On 03/07/2024 13:16, Eelco Chaudron wrote:
>>>>
>>>>
>>>> On 3 Jul 2024, at 9:33, Roi Dayan wrote:
>>>>
>>>>> It is observed in some environments that there are much more ukeys than
>>>>> actual DP flows. For example:
>>>>>
>>>>> $ ovs-appctl upcall/show
>>>>> system@ovs-system:
>>>>> flows : (current 7) (avg 6) (max 117) (limit 2125)
>>>>> offloaded flows : 525
>>>>> dump duration : 1063ms
>>>>> ufid enabled : true
>>>>>
>>>>> 23: (keys 3612)
>>>>> 24: (keys 3625)
>>>>> 25: (keys 3485)
>>>>>
>>>>> The revalidator threads are busy revalidating the stale ukeys leading to
>>>>> high CPU and long dump duration.
>>>>>
>>>>> This patch adds checks in the sweep phase for such ukeys and move them
>>>>> to DELETE so that they can be cleared eventually.
>>>>
>>>> Thank Roi for the patch update, one small issue below. Let’s also discuss a bit more a potential testcase before sending a v3.
>>>>
>>>> Cheers,
>>>>
>>>> Eelco
>>>>
>>>>
>>>
>>> I also replied in v0 but replying here so we can continue the discussion here in v2.
>>>
>>> I did this for testing this case:
>>>
>>> - create bridge with 2 veth ports. configure ips.
>>> - ping between the ports to have tc rules.
>>> - repeated few times: clear the tc rules like this: tc filter del dev veth1 ingress and also on the 2nd port.
>>> - set max-idle to 1 and remove it to cause a flush of the rules.
>>> - create another set of veth ports. add/del veth4 from the bridge a few times to cause a sweep.
>>> - before the fix: ovs-appctl upcall/show will show ukeys.
>>> - after the fix upcall/show will show 0 ukeys.
>>>
>>>
>>> what do you think?
>>> I think if there is a cleaner way to do a sweep with purge=false as just
>>> will just skip seq mismatch check and just set every ukey to delete.
>>
>> One problem here is that flow dumps are inherently racy.
>> We can dump the same flow multiple times as well as not
>> receive some flows that are actually in the datapath.
>> So, we can't just remove ukeys if we missed a flow once
>> or twice.  If we didn't see the flow in the dumps for the
>> max-idle interval, that might be a better indicator that
>> something is wrong.
>>
>> One more problem here however is that ukey statistics can
>> keep increasing if there is some traffic that matches.
>> Since the datapath flow doesn't exist, all this traffic
>> will go to userspace updating the ukey stats, but the
>> flow will never be installed again.  And we will not be
>> able to fix that in this patch since the stats will grow.
> 
> Yes, I noticed this when trying to add a quick testcase for this. The entries were used, so it would never hit the code path.
> 
> 
>> We need a way to tell that we haven't seen this flow in
>> the dumps for a long time and stats are not a good indicator
>> in this case.
> 
> Agreed, I made a rough testcase and a patch that does this. Please take a look and use whatever you need for your patch. I can also try to work on this if you want, but it will be another three weeks as I’m about to go on PTO.
> 
> Here are the patches (diffs):
> 
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 83609ec62..8b449537d 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -57,6 +57,7 @@ COVERAGE_DEFINE(dumped_inconsistent_flow);
>  COVERAGE_DEFINE(dumped_new_flow);
>  COVERAGE_DEFINE(handler_duplicate_upcall);
>  COVERAGE_DEFINE(revalidate_missed_dp_flow);
> +COVERAGE_DEFINE(revalidate_missing_dp_flow);
>  COVERAGE_DEFINE(ukey_dp_change);
>  COVERAGE_DEFINE(ukey_invalid_stat_reset);
>  COVERAGE_DEFINE(ukey_replace_contention);
> @@ -278,6 +279,7 @@ enum flow_del_reason {
>      FDR_BAD_ODP_FIT,        /* Bad ODP flow fit. */
>      FDR_FLOW_IDLE,          /* Flow idle timeout. */
>      FDR_FLOW_LIMIT,         /* Kill all flows condition reached. */
> +    FDR_FLOW_MISSING_DP,    /* Flow is missing from the datapath. */
>      FDR_FLOW_WILDCARDED,    /* Flow needs a narrower wildcard mask. */
>      FDR_NO_OFPROTO,         /* Bridge not found. */
>      FDR_PURGE,              /* User requested flow deletion. */
> @@ -315,6 +317,7 @@ struct udpif_key {
>      struct dpif_flow_stats stats OVS_GUARDED; /* Last known stats.*/
>      const char *dp_layer OVS_GUARDED;         /* Last known dp_layer. */
>      long long int created OVS_GUARDED;        /* Estimate of creation time. */
> +    long long int last_dumped OVS_GUARDED;    /* Flow last dump time. */
>      uint64_t dump_seq OVS_GUARDED;            /* Tracks udpif->dump_seq. */
>      uint64_t reval_seq OVS_GUARDED;           /* Tracks udpif->reval_seq. */
>      enum ukey_state state OVS_GUARDED;        /* Tracks ukey lifetime. */
> @@ -1819,6 +1822,7 @@ ukey_create__(const struct nlattr *key, size_t key_len,
>      ukey->state_thread = ovsthread_id_self();
>      ukey->state_where = OVS_SOURCE_LOCATOR;
>      ukey->created = ukey->flow_time = time_msec();
> +    ukey->last_dumped = 0;
>      memset(&ukey->stats, 0, sizeof ukey->stats);
>      ukey->stats.used = used;
>      ukey->dp_layer = NULL;
> @@ -2450,7 +2454,14 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
>          log_unexpected_stats_jump(ukey, stats);
>      }
> 
> -    if (need_revalidate) {
> +    if ((ukey->last_dumped ? ukey->last_dumped : ukey->created)
> +        < udpif->dpif->current_ms - (2 * ofproto_max_idle)) {
> +        /* If the flow was not dumped for at least twice the idle time,
> +         * we can assume the datapath flow now longer exists and the ukey
> +         * should be deleted. */
> +        COVERAGE_INC(revalidate_missing_dp_flow);
> +        *del_reason = FDR_FLOW_MISSING_DP;
> +    } else if (need_revalidate) {
>          if (should_revalidate(udpif, ukey, push.n_packets)) {
>              if (!ukey->xcache) {
>                  ukey->xcache = xlate_cache_new();
> @@ -2884,6 +2895,7 @@ revalidate(struct revalidator *revalidator)
>                  continue;
>              }
> 
> +            ukey->last_dumped = now;
>              ukey->offloaded = f->attrs.offloaded;
>              if (!ukey->dp_layer
>                  || (!dpif_synced_dp_layers(udpif->dpif)
> diff --git a/tests/system-offloads-traffic.at b/tests/system-offloads-traffic.at
> index 7dbf0e97c..87a64d453 100644
> --- a/tests/system-offloads-traffic.at
> +++ b/tests/system-offloads-traffic.at
> @@ -132,8 +132,8 @@ dnl Wait until the ARP flow has timed out.
>  OVS_WAIT_UNTIL([test `ovs-appctl dpctl/dump-flows type=tc,offloaded | \
>                    grep "arp" | wc -l` -eq 0])
> 
> -dnl Set max-idle to 200 ms, so we are sure the max-idle * 2 has been reached.
> -AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=200])
> +dnl Set max-idle to 10 ms, so we are sure the max-idle * 2 has been reached.
> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=10])
> 
>  dnl Add a port to force the revalidation.
>  REV_RECONFIGURE=$(ovs-appctl coverage/read-counter rev_reconfigure)
> diff --git a/utilities/usdt-scripts/flow_reval_monitor.py b/utilities/usdt-scripts/flow_reval_monitor.py
> index 28479a565..1de06c73b 100755
> --- a/utilities/usdt-scripts/flow_reval_monitor.py
> +++ b/utilities/usdt-scripts/flow_reval_monitor.py
> @@ -249,6 +249,7 @@ FdrReasons = IntEnum(
>          "FDR_BAD_ODP_FIT",
>          "FDR_FLOW_IDLE",
>          "FDR_FLOW_LIMIT",
> +        "FDR_FLOW_MISSING_DP",
>          "FDR_FLOW_WILDCARDED",
>          "FDR_NO_OFPROTO",
>          "FDR_PURGE",
> @@ -265,6 +266,7 @@ FdrReasonStrings = {
>      FdrReasons.FDR_BAD_ODP_FIT: "Bad ODP flow fit",
>      FdrReasons.FDR_FLOW_IDLE: "Flow idle timeout",
>      FdrReasons.FDR_FLOW_LIMIT: "Kill all flows condition reached",
> +    FdrReasons.FDR_FLOW_MISSING_DP: "Flow is missing from the datapath",
>      FdrReasons.FDR_FLOW_WILDCARDED: "Flow needs a narrower wildcard mask",
>      FdrReasons.FDR_NO_OFPROTO: "Bridge not found",
>      FdrReasons.FDR_PURGE: "User requested flow deletion",
> 
> 
> diff --git a/tests/system-offloads-traffic.at b/tests/system-offloads-traffic.at
> index d1da33d96..7dbf0e97c 100644
> --- a/tests/system-offloads-traffic.at
> +++ b/tests/system-offloads-traffic.at
> @@ -93,6 +93,66 @@ AT_CHECK([ovs-appctl upcall/show | grep -E "offloaded flows : [[1-9]]"], [0], [i
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
> 
> +AT_SETUP([offloads - Forced removed datapath entries])
> +OVS_TRAFFIC_VSWITCHD_START([], [], [-- set Open_vSwitch . other_config:hw-offload=true])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> +AT_CHECK([ovs-appctl dpctl/dump-flows], [0], [ignore])
> +
> +AT_CHECK([ovs-appctl vlog/set ofproto_dpif_upcall:dbg])
> +
> +dnl Set idle timeout to 1 second.
> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=1000])
> +
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 10 -i 0.1 -W 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +10 packets transmitted, 10 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-flows type=tc,offloaded | grep "eth_type(0x0800)" | DUMP_CLEAN_SORTED], [0], [dnl
> +in_port(2),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, used:0.001s, actions:output
> +in_port(3),eth(macs),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:756, used:0.001s, actions:output
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-flows type=tc,offloaded | grep -q "arp" ], [0])
> +
> +dnl Delete IPv4 entries, but keep the ARP ones.
> +AT_CHECK([tc filter del dev ovs-p0 ingress protocol ip pref 2])
> +AT_CHECK([tc filter del dev ovs-p1 ingress protocol ip pref 2])
> +
> +dnl Bring the remote ports down to avoid traffic.
> +AT_CHECK([ip -n at_ns0 link set p0 down])
> +AT_CHECK([ip -n at_ns1 link set p1 down])
> +
> +dnl Wait until the ARP flow has timed out.
> +OVS_WAIT_UNTIL([test `ovs-appctl dpctl/dump-flows type=tc,offloaded | \
> +                  grep "arp" | wc -l` -eq 0])
> +
> +dnl Set max-idle to 200 ms, so we are sure the max-idle * 2 has been reached.
> +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=200])
> +
> +dnl Add a port to force the revalidation.
> +REV_RECONFIGURE=$(ovs-appctl coverage/read-counter rev_reconfigure)
> +AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 type=internal])
> +OVS_WAIT_UNTIL([test `ovs-appctl coverage/read-counter rev_reconfigure` \
> +                  -gt $REV_RECONFIGURE])
> +
> +dnl Wait for another full round of revalidation.  After this all ukeys should
> +dnl be gone.
> +AT_CHECK([ovs-appctl revalidator/wait], [0])
> +AT_CHECK([ovs-appctl revalidator/wait], [0])
> +
> +dnl Make sure no more ukeys exists.
> +AT_CHECK([ovs-appctl upcall/show | grep '(keys' | awk '{print $3}' | \
> +            grep -qv '0)'], [1])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to flow_del (No such file or directory)/d"])
> +AT_CLEANUP
> +
>  AT_SETUP([offloads - set ingress_policing_rate and ingress_policing_burst - offloads disabled])
>  AT_KEYWORDS([ingress_policing])
>  OVS_CHECK_TC_QDISC()
> 
> 
> 
> 


Hi Eelco,

Thanks for the patch, it works fine.
But the test fails because ukeys still exist:

./system-offloads-traffic.at:150: ovs-appctl upcall/show | grep '(keys' | awk '{print $3}' | \
            grep -qv '0)'
./system-offloads-traffic.at:150: exit code was 0, expected 1

If I change the test a bit to do add-port/del-port one more time,
then it works.
I think the seq doesn't change, and doing this multiple times helps.

You already did the entire patch with a test :) Do you want to submit it? It's fine by me.

Thanks,
Roi

+++ b/tests/system-offloads-traffic.at
@@ -138,8 +138,13 @@ AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:max-idle=200])
 dnl Add a port to force the revalidation.
 REV_RECONFIGURE=$(ovs-appctl coverage/read-counter rev_reconfigure)
 AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 type=internal])
+AT_CHECK([ovs-vsctl del-port ovs-p2 ])
 OVS_WAIT_UNTIL([test `ovs-appctl coverage/read-counter rev_reconfigure` \
                   -gt $REV_RECONFIGURE])
+sleep 1
+AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 type=internal])
+AT_CHECK([ovs-vsctl del-port ovs-p2 ])
+sleep 1
 
 dnl Wait for another full round of revalidation.  After this all ukeys should

Patch

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 83609ec62b63..08b3c70411aa 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -278,6 +278,7 @@  enum flow_del_reason {
     FDR_BAD_ODP_FIT,        /* Bad ODP flow fit. */
     FDR_FLOW_IDLE,          /* Flow idle timeout. */
     FDR_FLOW_LIMIT,         /* Kill all flows condition reached. */
+    FDR_FLOW_NO_STATS_IDLE, /* Flow idled out without receiving statistics. */
     FDR_FLOW_WILDCARDED,    /* Flow needs a narrower wildcard mask. */
     FDR_NO_OFPROTO,         /* Bridge not found. */
     FDR_PURGE,              /* User requested flow deletion. */
@@ -2450,7 +2451,14 @@  revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
         log_unexpected_stats_jump(ukey, stats);
     }
 
-    if (need_revalidate) {
+    if (!ukey->stats.used
+        && ukey->created < udpif->dpif->current_ms - ofproto_max_idle) {
+        /* If the flow has a 'used' value of 0, meaning no stats were received
+         * for this flow, and the configured idle time has elapsed, it might
+         * indicate a stale flow (i.e., a flow without an installed datapath
+         * rule). In this case, it is safe to mark this ukey for deletion. */
+        *del_reason = FDR_FLOW_NO_STATS_IDLE;
+    } else if (need_revalidate) {
         if (should_revalidate(udpif, ukey, push.n_packets)) {
             if (!ukey->xcache) {
                 ukey->xcache = xlate_cache_new();