[ovs-dev,v3,04/10] ipsec: libreswan: Try to bring non-active connections up.

Message ID 20241101012321.3346333-5-i.maximets@ovn.org
State Accepted
Commit f11fdde3cb8367b5a4e3c6979d481927f0b12085
Series ipsec: Resiliency to Libreswan failures.

Checks

Context                                Check    Description
ovsrobot/apply-robot                   success  apply and check: success
ovsrobot/github-robot-_Build_and_Test  success  github build: passed

Commit Message

Ilya Maximets Nov. 1, 2024, 1:23 a.m. UTC
Sometimes connections get loaded, but do not become active for some
reason on the first try.  We can try to bring them up manually.
However, if a connection is still not active after that, it's better to
just remove it and try to add it again from scratch, as there must be
some internal issue in libreswan that doesn't allow this connection to
actually become active.

Note: Once the "defunct" connection is removed, the second connection
for the same tunnel will also be removed as "half-loaded".  This ensures
that all the shared SAs will also be cleaned up, so we can truly start
from scratch.
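
For illustration only, the retry-then-remove behaviour described above can
be modelled with a small standalone sketch.  It is not the code from
ovs-monitor-ipsec: the ConnTracker class, its reconcile() method and the
"upped"/"deleted" lists are hypothetical names that stand in for the real
monitor loop and its calls to libreswan, and the handling of desired
connections and of re-adding removed ones is left out.

```python
class ConnTracker:
    """Toy model of the tracking logic; not the real monitor."""

    def __init__(self):
        # Connections that were explicitly brought "up" and are
        # expected to be active on the next pass.
        self.conns_not_active = set()
        # Hypothetical stand-ins for the libreswan commands.
        self.upped = []
        self.deleted = []

    def reconcile(self, loaded, active):
        loaded = set(loaded)
        active = set(active)

        # Untrack connections that became active.
        self.conns_not_active.difference_update(active)

        # Anything still tracked and still loaded did not become
        # active after the explicit "up": treat it as defunct.
        for conn in sorted(self.conns_not_active & loaded):
            self.deleted.append(conn)
            self.conns_not_active.discard(conn)
            loaded.remove(conn)

        # Try to bring up whatever is loaded but not yet active,
        # and remember it so the next pass can check the result.
        for conn in sorted(loaded - active):
            self.conns_not_active.add(conn)
            self.upped.append(conn)

        return loaded


tracker = ConnTracker()
# First pass: "tun-2" is loaded but not active, so it gets an "up".
tracker.reconcile(loaded={"tun-1", "tun-2"}, active={"tun-1"})
# Second pass: "tun-2" is still not active, so it is removed.
tracker.reconcile(loaded={"tun-1", "tun-2"}, active={"tun-1"})
print(tracker.upped, tracker.deleted)
```

In this model a connection gets exactly one explicit "up" attempt before
it is removed; a connection that shows up as active on the following pass
is simply dropped from the tracking set.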

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 ipsec/ovs-monitor-ipsec.in | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Patch

diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in
index 152c30a13..20f6ccb20 100755
--- a/ipsec/ovs-monitor-ipsec.in
+++ b/ipsec/ovs-monitor-ipsec.in
@@ -516,6 +516,7 @@  conn prevent_unencrypted_vxlan
         self.IPSEC_D = "sql:" + libreswan_root_prefix + ipsec_d
         self.IPSEC_CTL = libreswan_root_prefix + ipsec_ctl
         self.conf_file = None
+        self.conns_not_active = set()
         self.last_refresh = time.time()
         self.secrets_file = None
         vlog.dbg("Using: " + self.IPSEC)
@@ -641,6 +642,14 @@  conn prevent_unencrypted_vxlan
             loaded = set(loaded_conns.get(name, dict()).keys())
             active = set(active_conns.get(name, dict()).keys())
 
+            # Untrack connections that became active.
+            self.conns_not_active.difference_update(active)
+            # Remove connections that didn't become active after --start
+            # and another explicit --up.
+            for conn in self.conns_not_active & loaded:
+                self._delete_ipsec_connection(conn, "is defunct")
+                loaded.remove(conn)
+
             # Remove all the loaded or active but not desired connections.
             for conn in loaded | active:
                 if conn not in desired:
@@ -671,6 +680,8 @@  conn prevent_unencrypted_vxlan
                 # so loaded >= active
                 for conn in loaded - active:
                     vlog.info("Bringing up ipsec connection %s" % conn)
+                    # On failure to --up it will be removed from the set.
+                    self.conns_not_active.add(conn)
                     self._start_ipsec_connection(conn, "up")
 
         # Update shunt policy if changed
@@ -804,6 +815,7 @@  conn prevent_unencrypted_vxlan
 
     def _delete_ipsec_connection(self, conn, reason):
         vlog.info("%s %s, removing" % (conn, reason))
+        self.conns_not_active.discard(conn)
         run_command(self.IPSEC_AUTO +
                     ["--ctlsocket", self.IPSEC_CTL,
                      "--config", self.IPSEC_CONF,