Message ID | 20240208204921.932115-1-mmichels@redhat.com |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev] features.c: Always wait on the rconn. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | warning | apply and check: warning |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
Hi Mark, Thanks for the fix, sorry for introducing the issue in the first place! On 2/8/24 21:49, Mark Michelson wrote: > When performing feature detection with OVS, we use an rconn connected to > br-int.mgmt to query for features. If the rconn is not connected, then > we would exit early. However, this early exit did not call > rconn_run_wait() or rconn_recv_wait(). Therefore, if the rconn were in > a state such as BACKOFF or CONNECTING, we would not wait on the rconn's > fd for activity. > > In most cases, nobody would notice this because ovn-controller would > likely be waking up constantly due to southbound database changes, > handling CLI commands, or other activity. However, in one of Red Hat > QE's tests, they were seeing ovn-controller take 10 seconds between > when it would start up and when OF flows were installed. This was > because the only thing waking ovn-controller up was regularly-scheduled > probe intervals with the southbound database. > > To fix this, we call rconn_run_wait() and rconn_recv_wait() even in > the case when the rconn is not connected. This way, if we end up in a > state other than ACTIVE or IDLE, we can handle traffic from the rconn > when it arrives. > > Fixes: e9e716ad531e ("controller: Don't artificially limit group and > meter IDs to 16bit.") 0-day bot flagged this as an issue, the Fixes tag should be on a single line but that can be fixed when merging the patch. > > Reported-at: https://issues.redhat.com/browse/FDP-357 > Signed-off-by: Mark Michelson <mmichels@redhat.com> > --- > lib/features.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/lib/features.c b/lib/features.c > index 82bce10e9..b04437235 100644 > --- a/lib/features.c > +++ b/lib/features.c > @@ -150,6 +150,8 @@ ovs_feature_get_openflow_cap(const char *br_name) > > rconn_run(swconn); > if (!rconn_is_connected(swconn)) { > + rconn_run_wait(swconn); > + rconn_recv_wait(swconn); > return false; > } > Looks good to me, thanks! Acked-by: Dumitru Ceara <dceara@redhat.com> Regards, Dumitru
On 2/9/24 04:21, Dumitru Ceara wrote: > Hi Mark, > > Thanks for the fix, sorry for introducing the issue in the first place! > > On 2/8/24 21:49, Mark Michelson wrote: >> When performing feature detection with OVS, we use an rconn connected to >> br-int.mgmt to query for features. If the rconn is not connected, then >> we would exit early. However, this early exit did not call >> rconn_run_wait() or rconn_recv_wait(). Therefore, if the rconn were in >> a state such as BACKOFF or CONNECTING, we would not wait on the rconn's >> fd for activity. >> >> In most cases, nobody would notice this because ovn-controller would >> likely be waking up constantly due to southbound database changes, >> handling CLI commands, or other activity. However, in one of Red Hat >> QE's tests, they were seeing ovn-controller take 10 seconds between >> when it would start up and when OF flows were installed. This was >> because the only thing waking ovn-controller up was regularly-scheduled >> probe intervals with the southbound database. >> >> To fix this, we call rconn_run_wait() and rconn_recv_wait() even in >> the case when the rconn is not connected. This way, if we end up in a >> state other than ACTIVE or IDLE, we can handle traffic from the rconn >> when it arrives. >> >> Fixes: e9e716ad531e ("controller: Don't artificially limit group and >> meter IDs to 16bit.") > > 0-day bot flagged this as an issue, the Fixes tag should be on a single > line but that can be fixed when merging the patch. Whoops, I missed this before pushing. I guess I could force-push a change, but I think humans will understand the intent. > >> >> Reported-at: https://issues.redhat.com/browse/FDP-357 >> Signed-off-by: Mark Michelson <mmichels@redhat.com> >> --- >> lib/features.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/lib/features.c b/lib/features.c >> index 82bce10e9..b04437235 100644 >> --- a/lib/features.c >> +++ b/lib/features.c >> @@ -150,6 +150,8 @@ ovs_feature_get_openflow_cap(const char *br_name) >> >> rconn_run(swconn); >> if (!rconn_is_connected(swconn)) { >> + rconn_run_wait(swconn); >> + rconn_recv_wait(swconn); >> return false; >> } >> > > Looks good to me, thanks! > > Acked-by: Dumitru Ceara <dceara@redhat.com> > > Regards, > Dumitru > Thanks Dumitru, I pushed this to main and all branches back to 22.12.
diff --git a/lib/features.c b/lib/features.c index 82bce10e9..b04437235 100644 --- a/lib/features.c +++ b/lib/features.c @@ -150,6 +150,8 @@ ovs_feature_get_openflow_cap(const char *br_name) rconn_run(swconn); if (!rconn_is_connected(swconn)) { + rconn_run_wait(swconn); + rconn_recv_wait(swconn); return false; }
When performing feature detection with OVS, we use an rconn connected to br-int.mgmt to query for features. If the rconn is not connected, then we would exit early. However, this early exit did not call rconn_run_wait() or rconn_recv_wait(). Therefore, if the rconn were in a state such as BACKOFF or CONNECTING, we would not wait on the rconn's fd for activity. In most cases, nobody would notice this because ovn-controller would likely be waking up constantly due to southbound database changes, handling CLI commands, or other activity. However, in one of Red Hat QE's tests, they were seeing ovn-controller take 10 seconds between when it would start up and when OF flows were installed. This was because the only thing waking ovn-controller up was regularly-scheduled probe intervals with the southbound database. To fix this, we call rconn_run_wait() and rconn_recv_wait() even in the case when the rconn is not connected. This way, if we end up in a state other than ACTIVE or IDLE, we can handle traffic from the rconn when it arrives. Fixes: e9e716ad531e ("controller: Don't artificially limit group and meter IDs to 16bit.") Reported-at: https://issues.redhat.com/browse/FDP-357 Signed-off-by: Mark Michelson <mmichels@redhat.com> --- lib/features.c | 2 ++ 1 file changed, 2 insertions(+)