Message ID | 20231129154521.313487-1-xsimonar@redhat.com |
---|---|
State | Superseded |
Delegated to: | Dumitru Ceara |
Headers | show |
Series | [ovs-dev] controller: fix group_table and meter_table allocation | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
On 11/29/23 16:45, Xavier Simonart wrote: > The group_table and meter_table are initialized in ovn-controller, with n_ids = 0. > Then they are re-initialized in ofctrl, with proper number of n_ids, in state S_CLEAR_FLOWS. > However, nothing prevented to start adding flows (by adding logical ports) using groups > before ofctrl reaches this state. This was causing some wrong flows (missing group in the flow). > > With this patch, as soon as the feature rconn is available, i.e. before adding flows, those table > are properly re-initialized. > > This issue is usually not visible in main and recent branches ci, since > "Wait for new ovn-controllers to connect to Southbound." as this was slowing down the moment when > the test started to add ports. > This was causing the following test to fail: > "ECMP static routes -- ovn-northd -- parallelization=yes -- ovn_monitor_all=yes". > > Fixes: 1d6d953bf883 ("controller: Don't artificially limit group and meter IDs to 16bit.") > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > --- Thanks for the patch, Xavier! And sorry for introducing the issue in the first place. > controller/ovn-controller.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 44605eb4e..98d95b1f3 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -5654,6 +5654,14 @@ main(int argc, char *argv[]) > br_int ? br_int->name : NULL)) { > VLOG_INFO("OVS feature set changed, force recompute."); > engine_set_force_recompute(true); > + if (ovs_feature_set_discovered()) { > + lflow_output_data = > + engine_get_internal_data(&en_lflow_output); > + ovn_extend_table_reinit(&lflow_output_data->group_table, > + ovs_feature_max_select_groups_get()); > + ovn_extend_table_reinit(&lflow_output_data->meter_table, > + ovs_feature_max_meters_get()); I don't think this is correct. What if the supported number of groups/meters actually changed, from X to Y, with X != 0? In that case ovs_feature_support_run() would also return true and we'll reinitialize the "extend_tables" (call ovn_extend_table_reinit()). That currently just recreates the id_pool but doesn't touch the "ovn_extend_table->existing" map. Doesn't that mean that we'd potentially end up in an inconsistent state? Funny thing, in the v3 of the original patch that introduced the problem I was calling ovn_extend_table_clear() from ovn_extend_table_reinit() but we decided to remove it as it was already done by the caller. :) https://patchwork.ozlabs.org/project/ovn/patch/20231026113713.1718954-1-dceara@redhat.com/#3209437 Thinking more about it, I wonder, isn't it enough to change the way this works and just call ovn_extend_table_reinit() here instead of in run_S_CLEAR_FLOWS()? We should also clear the "existing" map inside ovn_extend_table_reinit() if we're changing the size of the id_pool. Wpuldn't that properly cover all scenarios? > + } > } > > if (br_int && ovs_feature_set_discovered()) { Regards, Dumitru
Hi Dumitru Thanks for the review. On Wed, Nov 29, 2023 at 11:23 PM Dumitru Ceara <dceara@redhat.com> wrote: > On 11/29/23 16:45, Xavier Simonart wrote: > > The group_table and meter_table are initialized in ovn-controller, with > n_ids = 0. > > Then they are re-initialized in ofctrl, with proper number of n_ids, in > state S_CLEAR_FLOWS. > > However, nothing prevented to start adding flows (by adding logical > ports) using groups > > before ofctrl reaches this state. This was causing some wrong flows > (missing group in the flow). > > > > With this patch, as soon as the feature rconn is available, i.e. before > adding flows, those table > > are properly re-initialized. > > > > This issue is usually not visible in main and recent branches ci, since > > "Wait for new ovn-controllers to connect to Southbound." as this was > slowing down the moment when > > the test started to add ports. > > This was causing the following test to fail: > > "ECMP static routes -- ovn-northd -- parallelization=yes -- > ovn_monitor_all=yes". > > > > Fixes: 1d6d953bf883 ("controller: Don't artificially limit group and > meter IDs to 16bit.") > > Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > > --- > > Thanks for the patch, Xavier! And sorry for introducing the issue in > the first place. > > > controller/ovn-controller.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > > index 44605eb4e..98d95b1f3 100644 > > --- a/controller/ovn-controller.c > > +++ b/controller/ovn-controller.c > > @@ -5654,6 +5654,14 @@ main(int argc, char *argv[]) > > br_int ? br_int->name : > NULL)) { > > VLOG_INFO("OVS feature set changed, force recompute."); > > engine_set_force_recompute(true); > > + if (ovs_feature_set_discovered()) { > > + lflow_output_data = > > + engine_get_internal_data(&en_lflow_output); > > + > ovn_extend_table_reinit(&lflow_output_data->group_table, > > + > ovs_feature_max_select_groups_get()); > > + > ovn_extend_table_reinit(&lflow_output_data->meter_table, > > + > ovs_feature_max_meters_get()); > > I don't think this is correct. What if the supported number of > groups/meters actually changed, from X to Y, with X != 0? In that case > ovs_feature_support_run() would also return true and we'll reinitialize > the "extend_tables" (call ovn_extend_table_reinit()). That currently > just recreates the id_pool but doesn't touch the > "ovn_extend_table->existing" map. > > Doesn't that mean that we'd potentially end up in an inconsistent state? > > Funny thing, in the v3 of the original patch that introduced the problem > I was calling ovn_extend_table_clear() from ovn_extend_table_reinit() > but we decided to remove it as it was already done by the caller. :) > > > https://patchwork.ozlabs.org/project/ovn/patch/20231026113713.1718954-1-dceara@redhat.com/#3209437 I agree that we should call ovn_extend_table_clear() before (or within) the ovn_extend_table_reinit() As a side note, if the number of group decreased (which seems quite theoretical), some flows might become inconsistent, but as the feature set changed, we are recomputing, so it should be fine. I'll send a v2 > > > Thinking more about it, I wonder, isn't it enough to change the way this > works and just call ovn_extend_table_reinit() here instead of in > run_S_CLEAR_FLOWS()? We should also clear the "existing" map inside > ovn_extend_table_reinit() if we're changing the size of the id_pool. > > Wpuldn't that properly cover all scenarios? > I think so too. > > > + } > > } > > > > if (br_int && ovs_feature_set_discovered()) { > > Regards, > Dumitru > Thanks Xavier
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 44605eb4e..98d95b1f3 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -5654,6 +5654,14 @@ main(int argc, char *argv[]) br_int ? br_int->name : NULL)) { VLOG_INFO("OVS feature set changed, force recompute."); engine_set_force_recompute(true); + if (ovs_feature_set_discovered()) { + lflow_output_data = + engine_get_internal_data(&en_lflow_output); + ovn_extend_table_reinit(&lflow_output_data->group_table, + ovs_feature_max_select_groups_get()); + ovn_extend_table_reinit(&lflow_output_data->meter_table, + ovs_feature_max_meters_get()); + } } if (br_int && ovs_feature_set_discovered()) {
The group_table and meter_table are initialized in ovn-controller, with n_ids = 0. Then they are re-initialized in ofctrl, with proper number of n_ids, in state S_CLEAR_FLOWS. However, nothing prevented to start adding flows (by adding logical ports) using groups before ofctrl reaches this state. This was causing some wrong flows (missing group in the flow). With this patch, as soon as the feature rconn is available, i.e. before adding flows, those table are properly re-initialized. This issue is usually not visible in main and recent branches ci, since "Wait for new ovn-controllers to connect to Southbound." as this was slowing down the moment when the test started to add ports. This was causing the following test to fail: "ECMP static routes -- ovn-northd -- parallelization=yes -- ovn_monitor_all=yes". Fixes: 1d6d953bf883 ("controller: Don't artificially limit group and meter IDs to 16bit.") Signed-off-by: Xavier Simonart <xsimonar@redhat.com> --- controller/ovn-controller.c | 8 ++++++++ 1 file changed, 8 insertions(+)