Message ID | 3f70ca7bafad296e18ed9579f30fd7044c47fc61.camel@cloudandheat.com
---|---
State | Superseded
Series | [ovs-dev] python-stream: handle SSL error in do_handshake
Context | Check | Description
---|---|---
ovsrobot/apply-robot | success | apply and check: success
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed
ovsrobot/intel-ovs-compilation | fail | test: fail
On Thu, Apr 20, 2023 at 10:14:14AM +0200, Stefan Hoffmann wrote:
> In some cases ovsdb server or relay gets restarted, ovsdb python clients
> may keep the local socket open. Instead of reconnecting a lot of failures
> will be logged.
> This can be reproduced with ssl connections to the server/relay and
> restarting it, so it has the same IP after restart.
>
> This patch catches the Exceptions at do_handshake to recreate the
> connection on the client side.
>
> Tracebacks from the issue:
>
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 107, in run
>     self.idl.run()
>   File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0-py3.9.egg/ovs/db/idl.py", line 433, in run
>     self._session.run()
>   File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0-py3.9.egg/ovs/jsonrpc.py", line 519, in run
>     error = self.stream.connect()
>   File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0-py3.9.egg/ovs/stream.py", line 824, in connect
>     self.socket.do_handshake()
>   File "/usr/local/lib/python3.9/site-packages/eventlet/green/ssl.py", line 312, in do_handshake
>     return self._call_trampolining(
>   File "/usr/local/lib/python3.9/site-packages/eventlet/green/ssl.py", line 158, in _call_trampolining
>     return func(*a, **kw)
>   File "/usr/local/lib/python3.9/ssl.py", line 1310, in do_handshake
>     self._sslobj.do_handshake()
> ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1129)
>
> 2023-04-03 14:06:43.458 1 ERROR ovsdbapp.backend.ovs_idl.connection
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection [-] TLS/SSL connection has been closed (EOF) (_ssl.c:997): ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:997)
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection Traceback (most recent call last):
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection   File "/usr/local/lib/python3.10/dist-packages/ovsdbapp/backend/ovs_idl/connection.py", line 107, in run
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection     self.idl.run()
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection   File "/usr/local/lib/python3.10/dist-packages/ovs/db/idl.py", line 433, in run
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection     self._session.run()
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection   File "/usr/local/lib/python3.10/dist-packages/ovs/jsonrpc.py", line 519, in run
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection     error = self.stream.connect()
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection   File "/usr/local/lib/python3.10/dist-packages/ovs/stream.py", line 824, in connect
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection     self.socket.do_handshake()
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection   File "/usr/lib/python3.10/ssl.py", line 1342, in do_handshake
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection     self._sslobj.do_handshake()
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:997)
> 2023-04-03 14:06:43.513 1 ERROR ovsdbapp.backend.ovs_idl.connection
> 2023-04-03 14:06:43.567 1 ERROR ovsdbapp.backend.ovs_idl.connection [-] TLS/SSL connection has been closed (EOF) (_ssl.c:997): ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:997)
>
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 107, in run
>     self.idl.run()
>   File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0-py3.9.egg/ovs/db/idl.py", line 433, in run
>     self._session.run()
>   File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0-py3.9.egg/ovs/jsonrpc.py", line 519, in run
>     error = self.stream.connect()
>   File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0-py3.9.egg/ovs/stream.py", line 824, in connect
>     self.socket.do_handshake()
>   File "/usr/local/lib/python3.9/site-packages/eventlet/green/ssl.py", line 312, in do_handshake
>     return self._call_trampolining(
>   File "/usr/local/lib/python3.9/site-packages/eventlet/green/ssl.py", line 158, in _call_trampolining
>     return func(*a, **kw)
>   File "/usr/local/lib/python3.9/ssl.py", line 1305, in do_handshake
>     self._check_connected()
>   File "/usr/local/lib/python3.9/ssl.py", line 1089, in _check_connected
>     self.getpeername()
>
> OSError: [Errno 107] Transport endpoint is not connected
>
> Signed-off-by: Stefan Hoffmann <stefan.hoffmann@cloudandheat.com>
> Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz>
> Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz>
> Co-authored-by: Luca Czesla <luca.czesla@mail.schwarz>
> Co-authored-by: Max Lamprecht <max.lamprecht@mail.schwarz>

Hi Stefan,

thanks for your patch.

I do see CI failures, but I think these are false negatives:

* https://patchwork.ozlabs.org/project/openvswitch/patch/3f70ca7bafad296e18ed9579f30fd7044c47fc61.camel@cloudandheat.com/

I'm retrying the GitHub based jobs here.

* https://github.com/horms/ovs/actions/runs/4753793285
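The failure mode in the commit message can be sketched with a minimal illustration. This is not the actual ovs API — `BrokenStream`, `FixedStream`, and `session_run` are hypothetical stand-ins for the interaction between `SSLStream.connect()` and the jsonrpc session loop:

```python
import errno


class BrokenStream:
    """Models the pre-patch behavior: a failed TLS handshake raises
    out of connect() instead of returning an errno."""
    def connect(self):
        raise OSError(errno.ENOTCONN, "Transport endpoint is not connected")


class FixedStream:
    """Models the patched behavior: the handshake error is translated
    into an errno return value."""
    def connect(self):
        return errno.ENOTCONN


def session_run(stream):
    """Simplified stand-in for the session loop: a nonzero errno from
    connect() triggers a clean reconnect; an escaping exception aborts
    the loop and gets logged as a traceback on every retry."""
    try:
        error = stream.connect()
    except OSError:
        return "crashed"      # pre-patch: caller logs tracebacks forever
    if error:
        return "reconnect"    # post-patch: session recreates the stream
    return "connected"


print(session_run(BrokenStream()))  # crashed
print(session_run(FixedStream()))   # reconnect
```

The key point is that the jsonrpc layer already knows how to reconnect when `connect()` reports an error code; it just never sees one when the SSL layer raises instead of returning.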
On Thu, Apr 20, 2023 at 01:35:59PM +0200, Simon Horman wrote:
> On Thu, Apr 20, 2023 at 10:14:14AM +0200, Stefan Hoffmann wrote:
> > In some cases ovsdb server or relay gets restarted, ovsdb python clients
> > may keep the local socket open. Instead of reconnecting a lot of failures
> > will be logged.
> > This can be reproduced with ssl connections to the server/relay and
> > restarting it, so it has the same IP after restart.
> >
> > This patch catches the Exceptions at do_handshake to recreate the
> > connection on the client side.
> >
> > [...]
>
> Hi Stefan,
>
> thanks for your patch.
>
> I do see CI failures, but I think these are false negatives:
>
> * https://patchwork.ozlabs.org/project/openvswitch/patch/3f70ca7bafad296e18ed9579f30fd7044c47fc61.camel@cloudandheat.com/
>
> I'm retrying the GitHub based jobs here.
>
> * https://github.com/horms/ovs/actions/runs/4753793285

The above succeeded, so I'm going to assume the previous failure
is unrelated to this patch.

As for the patch itself, it looks good to me.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
On Thu, Apr 20, 2023 at 2:06 PM Simon Horman <simon.horman@corigine.com> wrote:
> > I do see CI failures, but I think these are false negatives:
> >
> > * https://patchwork.ozlabs.org/project/openvswitch/patch/3f70ca7bafad296e18ed9579f30fd7044c47fc61.camel@cloudandheat.com/
> >
> > I'm retrying the GitHub based jobs here.
> >
> > * https://github.com/horms/ovs/actions/runs/4753793285
>
> The above succeeded, so I'm going to assume the previous failure
> is unrelated to this patch.

The bfd decay test is known to be flaky with ASan.

Ilya had mentioned it initially when enabling ASan:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380467.html
More recently, Eelco mentioned it:
https://mail.openvswitch.org/pipermail/ovs-dev/2022-September/397476.html
I also see it every once in a while in GHA.
On Thu, Apr 20, 2023 at 02:18:33PM +0200, David Marchand wrote:
> On Thu, Apr 20, 2023 at 2:06 PM Simon Horman <simon.horman@corigine.com> wrote:
> > > I do see CI failures, but I think these are false negatives:
> > >
> > > * https://patchwork.ozlabs.org/project/openvswitch/patch/3f70ca7bafad296e18ed9579f30fd7044c47fc61.camel@cloudandheat.com/
> > >
> > > I'm retrying the GitHub based jobs here.
> > >
> > > * https://github.com/horms/ovs/actions/runs/4753793285

[1]

> > The above succeeded, so I'm going to assume the previous failure
> > is unrelated to this patch.
>
> The bfd decay test is known to be flaky with ASan.
>
> Ilya had mentioned it initially when enabling ASan:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380467.html
> More recently, Eelco mentioned it:
> https://mail.openvswitch.org/pipermail/ovs-dev/2022-September/397476.html
> I also see it every once in a while in GHA.

Thanks. I've also observed that test is flaky.

FWIW, [1] passed, so I'm happy with this patch.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
On Thu, 2023-04-20 at 15:50 +0200, Simon Horman wrote:
> On Thu, Apr 20, 2023 at 02:18:33PM +0200, David Marchand wrote:
> > The bfd decay test is known to be flaky with ASan.
> >
> > [...]
>
> Thanks. I've also observed that test is flaky.
>
> FWIW, [1] passed, so I'm happy with this patch.
>
> Reviewed-by: Simon Horman <simon.horman@corigine.com>

Hi Simon,

thanks for your review and reply.

Do I need to take care of the intel-ovs-compilation pipeline? The test
result looks good, but some logs are still written at the end that I
don't know what to do with:
https://mail.openvswitch.org/pipermail/ovs-build/2023-April/029983.html

Otherwise, I would send a v2 of my patch without the stacktraces in the
commit messages.
On Thu, Apr 20, 2023 at 05:31:23PM +0200, Stefan Hoffmann wrote:
> Hi Simon,
>
> thanks for your review and reply.
>
> Do I need to take care of the intel-ovs-compilation pipeline? The test
> result looks good, but some logs are still written at the end that I
> don't know what to do with:
> https://mail.openvswitch.org/pipermail/ovs-build/2023-April/029983.html
>
> Otherwise, I would send a v2 of my patch without the stacktraces in the
> commit messages.

Hi Stefan,

I don't think any action is needed on your part at this time.

And I think (but am not entirely sure) we can ignore the
intel-ovs-compilation error.
diff --git a/python/ovs/stream.py b/python/ovs/stream.py
index ac5b0fd0c..b32341076 100644
--- a/python/ovs/stream.py
+++ b/python/ovs/stream.py
@@ -824,7 +824,8 @@ class SSLStream(Stream):
             self.socket.do_handshake()
         except ssl.SSLWantReadError:
             return errno.EAGAIN
-        except ssl.SSLSyscallError as e:
+        except (ssl.SSLSyscallError, ssl.SSLZeroReturnError,
+                ssl.SSLEOFError, OSError) as e:
             return ovs.socket_util.get_exception_errno(e)
         return 0
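For reference, the patched handshake step can be exercised in isolation. This is a hedged sketch: `get_exception_errno` below is a hypothetical stand-in for `ovs.socket_util.get_exception_errno` (not its real implementation), and `FakeSocket` simulates an `ssl.SSLSocket` whose peer vanished mid-handshake:

```python
import errno
import ssl


def get_exception_errno(e):
    """Hypothetical stand-in for ovs.socket_util.get_exception_errno:
    extract an errno from the exception, with a protocol-error fallback."""
    if isinstance(e, OSError) and e.errno is not None:
        return e.errno
    return errno.EPROTO


def do_tls_handshake(sock):
    """Mirrors the patched exception handling in SSLStream.connect()."""
    try:
        sock.do_handshake()
    except ssl.SSLWantReadError:
        return errno.EAGAIN
    except (ssl.SSLSyscallError, ssl.SSLZeroReturnError,
            ssl.SSLEOFError, OSError) as e:
        # Returning an errno (instead of letting the exception escape)
        # lets the caller tear down the stream and schedule a reconnect.
        return get_exception_errno(e)
    return 0


class FakeSocket:
    """Simulates a socket whose TLS peer closed mid-handshake."""
    def do_handshake(self):
        raise ssl.SSLEOFError("EOF occurred in violation of protocol")


# Pre-patch this would raise; post-patch it returns an errno the
# jsonrpc session can act on.
print(do_tls_handshake(FakeSocket()))
```

Note that `ssl.SSLWantReadError` must be caught first: it subclasses `ssl.SSLError` (and thus `OSError`), so the broadened tuple would otherwise swallow the non-blocking "handshake still in progress" case.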