diff mbox series

[ovs-dev] ic: call ovn_db_run() only when all DBs are connected

Message ID 20210830195707.98529-1-odivlad@gmail.com
State Accepted
Delegated to: Han Zhou
Headers show
Series [ovs-dev] ic: call ovn_db_run() only when all DBs are connected | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Vladislav Odintsov Aug. 30, 2021, 7:57 p.m. UTC
Prior to this commit ovn_db_run() function in ovn-ic
was called right after daemon's lock was acquired in SB DB.

Inside ovn_db_run() there is a search for availability zone
in NB DB. If AZ was not found, ovn_db_run returned. So, it was
an implicit requirement for connected NB DB.

But in case, where NB DB was connected and ICNB DB was not,
we could run in situation where in ts_run() logical switches
from NB DB were collected and transit switches from ICNB DB
were not due to not synced ICNB DB. This lead to deletion of
transit logical switches from NB DB.

A normal restart of only one ovn-ic daemon in AZ, could damage
the logical topology: it removed LS (transit), its LSPs.
After ovn-ic is connected to ICNB, it creates logical switch
in NB DB back, but its LSPs are lost and need manual re-addition.

This commit adds explicit requirement for all DBs to be
connected:

- NB DB
- SB DB
- ICNB DB
- ICSB DB

Only when this condition is met, we can safely run ovn-ic sync
logic.

Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
---
 ic/ovn-ic.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Han Zhou Sept. 7, 2021, 6:28 a.m. UTC | #1
On Mon, Aug 30, 2021 at 12:57 PM Vladislav Odintsov <odivlad@gmail.com>
wrote:
>
> Prior to this commit ovn_db_run() function in ovn-ic
> was called right after daemon's lock was acquired in SB DB.
>
> Inside ovn_db_run() there is a search for availability zone
> in NB DB. If AZ was not found, ovn_db_run returned. So, it was
> an implicit requirement for connected NB DB.
>
> But in case, where NB DB was connected and ICNB DB was not,
> we could run in situation where in ts_run() logical switches
> from NB DB were collected and transit switches from ICNB DB
> were not due to not synced ICNB DB. This lead to deletion of
> transit logical switches from NB DB.
>
> A normal restart of only one ovn-ic daemon in AZ, could damage
> the logical topology: it removed LS (transit), its LSPs.
> After ovn-ic is connected to ICNB, it creates logical switch
> in NB DB back, but its LSPs are lost and need manual re-addition.
>
> This commit adds explicit requirement for all DBs to be
> connected:
>
> - NB DB
> - SB DB
> - ICNB DB
> - ICSB DB
>
> Only when this condition is met, we can safely run ovn-ic sync
> logic.
>
> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>

Thanks for the fix! This is a critical bug, so I applied to all branches.

Han

> ---
>  ic/ovn-ic.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
> index 18066a305..fc608af82 100644
> --- a/ic/ovn-ic.c
> +++ b/ic/ovn-ic.c
> @@ -1786,7 +1786,11 @@ main(int argc, char *argv[])
>                  state.had_lock = false;
>              }
>
> -            if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> +            if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl) &&
> +                ovsdb_idl_has_ever_connected(ctx.ovnnb_idl) &&
> +                ovsdb_idl_has_ever_connected(ctx.ovnsb_idl) &&
> +                ovsdb_idl_has_ever_connected(ctx.ovninb_idl) &&
> +                ovsdb_idl_has_ever_connected(ctx.ovnisb_idl)) {
>                  ovn_db_run(&ctx);
>              }
>
> --
> 2.30.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff mbox series

Patch

diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
index 18066a305..fc608af82 100644
--- a/ic/ovn-ic.c
+++ b/ic/ovn-ic.c
@@ -1786,7 +1786,11 @@  main(int argc, char *argv[])
                 state.had_lock = false;
             }
 
-            if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
+            if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl) &&
+                ovsdb_idl_has_ever_connected(ctx.ovnnb_idl) &&
+                ovsdb_idl_has_ever_connected(ctx.ovnsb_idl) &&
+                ovsdb_idl_has_ever_connected(ctx.ovninb_idl) &&
+                ovsdb_idl_has_ever_connected(ctx.ovnisb_idl)) {
                 ovn_db_run(&ctx);
             }