Message ID | 20210830195707.98529-1-odivlad@gmail.com |
---|---|
State | Accepted |
Delegated to: | Han Zhou |
Headers | show |
Series | [ovs-dev] ic: call ovn_db_run() only when all DBs are connected | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | fail | github build: failed |
On Mon, Aug 30, 2021 at 12:57 PM Vladislav Odintsov <odivlad@gmail.com> wrote: > > Prior to this commit ovn_db_run() function in ovn-ic > was called right after daemon's lock was acquired in SB DB. > > Inside ovn_db_run() there is a search for availability zone > in NB DB. If AZ was not found, ovn_db_run returned. So, it was > an implicit requirement for connected NB DB. > > But in case, where NB DB was connected and ICNB DB was not, > we could run in situation where in ts_run() logical switches > from NB DB were collected and transit switches from ICNB DB > were not due to not synced ICNB DB. This lead to deletion of > transit logical switches from NB DB. > > A normal restart of only one ovn-ic daemon in AZ, could damage > the logical topology: it removed LS (transit), its LSPs. > After ovn-ic is connected to ICNB, it creates logical switch > in NB DB back, but its LSPs are lost and need manual re-addition. > > This commit adds explicit requirement for all DBs to be > connected: > > - NB DB > - SB DB > - ICNB DB > - ICSB DB > > Only when this condition is met, we can safely run ovn-ic sync > logic. > > Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> Thanks for the fix! This is a critical bug, so I applied to all branches. Han > --- > ic/ovn-ic.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c > index 18066a305..fc608af82 100644 > --- a/ic/ovn-ic.c > +++ b/ic/ovn-ic.c > @@ -1786,7 +1786,11 @@ main(int argc, char *argv[]) > state.had_lock = false; > } > > - if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { > + if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl) && > + ovsdb_idl_has_ever_connected(ctx.ovnnb_idl) && > + ovsdb_idl_has_ever_connected(ctx.ovnsb_idl) && > + ovsdb_idl_has_ever_connected(ctx.ovninb_idl) && > + ovsdb_idl_has_ever_connected(ctx.ovnisb_idl)) { > ovn_db_run(&ctx); > } > > -- > 2.30.0 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c index 18066a305..fc608af82 100644 --- a/ic/ovn-ic.c +++ b/ic/ovn-ic.c @@ -1786,7 +1786,11 @@ main(int argc, char *argv[]) state.had_lock = false; } - if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { + if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl) && + ovsdb_idl_has_ever_connected(ctx.ovnnb_idl) && + ovsdb_idl_has_ever_connected(ctx.ovnsb_idl) && + ovsdb_idl_has_ever_connected(ctx.ovninb_idl) && + ovsdb_idl_has_ever_connected(ctx.ovnisb_idl)) { ovn_db_run(&ctx); }
Prior to this commit ovn_db_run() function in ovn-ic was called right after daemon's lock was acquired in SB DB. Inside ovn_db_run() there is a search for availability zone in NB DB. If AZ was not found, ovn_db_run returned. So, it was an implicit requirement for connected NB DB. But in case, where NB DB was connected and ICNB DB was not, we could run in situation where in ts_run() logical switches from NB DB were collected and transit switches from ICNB DB were not due to not synced ICNB DB. This lead to deletion of transit logical switches from NB DB. A normal restart of only one ovn-ic daemon in AZ, could damage the logical topology: it removed LS (transit), its LSPs. After ovn-ic is connected to ICNB, it creates logical switch in NB DB back, but its LSPs are lost and need manual re-addition. This commit adds explicit requirement for all DBs to be connected: - NB DB - SB DB - ICNB DB - ICSB DB Only when this condition is met, we can safely run ovn-ic sync logic. Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> --- ic/ovn-ic.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)