diff mbox series

[ovs-dev] features.c: Always wait on the rconn.

Message ID 20240208204921.932115-1-mmichels@redhat.com
State Accepted
Headers show
Series [ovs-dev] features.c: Always wait on the rconn. | expand

Checks

Context Check Description
ovsrobot/apply-robot warning apply and check: warning
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Mark Michelson Feb. 8, 2024, 8:49 p.m. UTC
When performing feature detection with OVS, we use an rconn connected to
br-int.mgmt to query for features. If the rconn is not connected, then
we would exit early. However, this early exit did not call
rconn_run_wait() or rconn_recv_wait(). Therefore, if the rconn were in
a state such as BACKOFF or CONNECTING, we would not wait on the rconn's
fd for activity.

In most cases, nobody would notice this because ovn-controller would
likely be waking up constantly due to southbound database changes,
handling CLI commands, or other activity. However, in one of Red Hat
QE's tests, they were seeing ovn-controller take 10 seconds between
when it would start up and when OF flows were installed. This was
because the only thing waking ovn-controller up was regularly-scheduled
probe intervals with the southbound database.

To fix this, we call rconn_run_wait() and rconn_recv_wait() even in
the case when the rconn is not connected. This way, if we end up in a
state other than ACTIVE or IDLE, we can handle traffic from the rconn
when it arrives.

Fixes: e9e716ad531e ("controller: Don't artificially limit group and
meter IDs to 16bit.")

Reported-at: https://issues.redhat.com/browse/FDP-357
Signed-off-by: Mark Michelson <mmichels@redhat.com>
---
 lib/features.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Dumitru Ceara Feb. 9, 2024, 9:21 a.m. UTC | #1
Hi Mark,

Thanks for the fix, sorry for introducing the issue in the first place!

On 2/8/24 21:49, Mark Michelson wrote:
> When performing feature detection with OVS, we use an rconn connected to
> br-int.mgmt to query for features. If the rconn is not connected, then
> we would exit early. However, this early exit did not call
> rconn_run_wait() or rconn_recv_wait(). Therefore, if the rconn were in
> a state such as BACKOFF or CONNECTING, we would not wait on the rconn's
> fd for activity.
> 
> In most cases, nobody would notice this because ovn-controller would
> likely be waking up constantly due to southbound database changes,
> handling CLI commands, or other activity. However, in one of Red Hat
> QE's tests, they were seeing ovn-controller take 10 seconds between
> when it would start up and when OF flows were installed. This was
> because the only thing waking ovn-controller up was regularly-scheduled
> probe intervals with the southbound database.
> 
> To fix this, we call rconn_run_wait() and rconn_recv_wait() even in
> the case when the rconn is not connected. This way, if we end up in a
> state other than ACTIVE or IDLE, we can handle traffic from the rconn
> when it arrives.
> 
> Fixes: e9e716ad531e ("controller: Don't artificially limit group and
> meter IDs to 16bit.")

0-day bot flagged this as an issue, the Fixes tag should be on a single
line but that can be fixed when merging the patch.

> 
> Reported-at: https://issues.redhat.com/browse/FDP-357
> Signed-off-by: Mark Michelson <mmichels@redhat.com>
> ---
>  lib/features.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/features.c b/lib/features.c
> index 82bce10e9..b04437235 100644
> --- a/lib/features.c
> +++ b/lib/features.c
> @@ -150,6 +150,8 @@ ovs_feature_get_openflow_cap(const char *br_name)
>  
>      rconn_run(swconn);
>      if (!rconn_is_connected(swconn)) {
> +        rconn_run_wait(swconn);
> +        rconn_recv_wait(swconn);
>          return false;
>      }
>  

Looks good to me, thanks!

Acked-by: Dumitru Ceara <dceara@redhat.com>

Regards,
Dumitru
Mark Michelson Feb. 9, 2024, 4:15 p.m. UTC | #2
On 2/9/24 04:21, Dumitru Ceara wrote:
> Hi Mark,
> 
> Thanks for the fix, sorry for introducing the issue in the first place!
> 
> On 2/8/24 21:49, Mark Michelson wrote:
>> When performing feature detection with OVS, we use an rconn connected to
>> br-int.mgmt to query for features. If the rconn is not connected, then
>> we would exit early. However, this early exit did not call
>> rconn_run_wait() or rconn_recv_wait(). Therefore, if the rconn were in
>> a state such as BACKOFF or CONNECTING, we would not wait on the rconn's
>> fd for activity.
>>
>> In most cases, nobody would notice this because ovn-controller would
>> likely be waking up constantly due to southbound database changes,
>> handling CLI commands, or other activity. However, in one of Red Hat
>> QE's tests, they were seeing ovn-controller take 10 seconds between
>> when it would start up and when OF flows were installed. This was
>> because the only thing waking ovn-controller up was regularly-scheduled
>> probe intervals with the southbound database.
>>
>> To fix this, we call rconn_run_wait() and rconn_recv_wait() even in
>> the case when the rconn is not connected. This way, if we end up in a
>> state other than ACTIVE or IDLE, we can handle traffic from the rconn
>> when it arrives.
>>
>> Fixes: e9e716ad531e ("controller: Don't artificially limit group and
>> meter IDs to 16bit.")
> 
> 0-day bot flagged this as an issue, the Fixes tag should be on a single
> line but that can be fixed when merging the patch.

Whoops, I missed this before pushing. I guess I could force-push a 
change, but I think humans will understand the intent.

> 
>>
>> Reported-at: https://issues.redhat.com/browse/FDP-357
>> Signed-off-by: Mark Michelson <mmichels@redhat.com>
>> ---
>>   lib/features.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/lib/features.c b/lib/features.c
>> index 82bce10e9..b04437235 100644
>> --- a/lib/features.c
>> +++ b/lib/features.c
>> @@ -150,6 +150,8 @@ ovs_feature_get_openflow_cap(const char *br_name)
>>   
>>       rconn_run(swconn);
>>       if (!rconn_is_connected(swconn)) {
>> +        rconn_run_wait(swconn);
>> +        rconn_recv_wait(swconn);
>>           return false;
>>       }
>>   
> 
> Looks good to me, thanks!
> 
> Acked-by: Dumitru Ceara <dceara@redhat.com>
> 
> Regards,
> Dumitru
> 

Thanks Dumitru, I pushed this to main and all branches back to 22.12.
diff mbox series

Patch

diff --git a/lib/features.c b/lib/features.c
index 82bce10e9..b04437235 100644
--- a/lib/features.c
+++ b/lib/features.c
@@ -150,6 +150,8 @@  ovs_feature_get_openflow_cap(const char *br_name)
 
     rconn_run(swconn);
     if (!rconn_is_connected(swconn)) {
+        rconn_run_wait(swconn);
+        rconn_recv_wait(swconn);
         return false;
     }