Message ID | da98669bd765b1d8e66cf4c1ba76cf142b67f472.camel@cloudandheat.com |
---|---|
State | Accepted |
Commit | b456b1a02f629c2438ef2c3f247f35c8712f12c6 |
Headers | show |
Series | [ovs-dev,v3] python-stream: handle SSL error in do_handshake | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
On Mon, 2023-04-24 at 16:42 -0500, Terry Wilson wrote: > > On Fri, Apr 21, 2023 at 3:29 AM Stefan Hoffmann > <stefan.hoffmann@cloudandheat.com> wrote: > > In some cases ovsdb server or relay gets restarted, ovsdb python > > clients > > may keep the local socket open. Instead of reconnecting a lot of > > failures > > will be logged. > > This can be reproduced with ssl connections to the server/relay and > > restarting it, so it has the same IP after restart. > > > > This patch catches the Exceptions at do_handshake to recreate the > > connection on the client side. > > > > > Is this something we can add some tests for? I know we have some > other tests that involve restarts, but I don't think they involve > python clients or SSL. I can try to find a proper place to add such an test. Can you point me to the correct place to add python clients to that test? But I'm not sure, if we run always into that issue at a restart. > > > Reviewed-by: Simon Horman <simon.horman@corigine.com> > > Signed-off-by: Stefan Hoffmann <stefan.hoffmann@cloudandheat.com> > > Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz> > > Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz> > > Co-authored-by: Luca Czesla <luca.czesla@mail.schwarz> > > Co-authored-by: Max Lamprecht <max.lamprecht@mail.schwarz> > > --- > > python/ovs/stream.py | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/python/ovs/stream.py b/python/ovs/stream.py > > index ac5b0fd0c..b32341076 100644 > > --- a/python/ovs/stream.py > > +++ b/python/ovs/stream.py > > @@ -824,7 +824,8 @@ class SSLStream(Stream): > > self.socket.do_handshake() > > except ssl.SSLWantReadError: > > return errno.EAGAIN > > - except ssl.SSLSyscallError as e: > > + except (ssl.SSLSyscallError, ssl.SSLZeroReturnError, > > + ssl.SSLEOFError, OSError) as e: > > return ovs.socket_util.get_exception_errno(e) > > > > return 0
On 4/25/23 15:08, Stefan Hoffmann wrote: > On Mon, 2023-04-24 at 16:42 -0500, Terry Wilson wrote: >> >> On Fri, Apr 21, 2023 at 3:29 AM Stefan Hoffmann >> <stefan.hoffmann@cloudandheat.com> wrote: >>> In some cases ovsdb server or relay gets restarted, ovsdb python >>> clients >>> may keep the local socket open. Instead of reconnecting a lot of >>> failures >>> will be logged. >>> This can be reproduced with ssl connections to the server/relay and >>> restarting it, so it has the same IP after restart. >>> >>> This patch catches the Exceptions at do_handshake to recreate the >>> connection on the client side. >>> >> >> >> Is this something we can add some tests for? I know we have some >> other tests that involve restarts, but I don't think they involve >> python clients or SSL. > > I can try to find a proper place to add such an test. Can you point me > to the correct place to add python clients to that test? > But I'm not sure, if we run always into that issue at a restart. Usually, such tests live somewhere around tests/ovsdb-idl.at. There is a basic CHECK_STREAM_OPEN_BLOCK test there, but it doesn't check SSL connections, maybe it's worth it to extend the test to be able to connect via SSL. This will require changing the tests/test-stream.py test script. On the subject, I'm not sure if the issue can be reliably reproduced, especially because it requires OpenSSL 3.0, AFAICT. I have one question for the code below. >> >>> Reviewed-by: Simon Horman <simon.horman@corigine.com> >>> Signed-off-by: Stefan Hoffmann <stefan.hoffmann@cloudandheat.com> >>> Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz> >>> Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz> >>> Co-authored-by: Luca Czesla <luca.czesla@mail.schwarz> >>> Co-authored-by: Max Lamprecht <max.lamprecht@mail.schwarz> >>> --- >>> python/ovs/stream.py | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/python/ovs/stream.py b/python/ovs/stream.py >>> index ac5b0fd0c..b32341076 100644 >>> --- a/python/ovs/stream.py >>> +++ b/python/ovs/stream.py >>> @@ -824,7 +824,8 @@ class SSLStream(Stream): >>> self.socket.do_handshake() >>> except ssl.SSLWantReadError: >>> return errno.EAGAIN >>> - except ssl.SSLSyscallError as e: >>> + except (ssl.SSLSyscallError, ssl.SSLZeroReturnError, >>> + ssl.SSLEOFError, OSError) as e: Do we actually need to catch the generic OSError here? It looks a bit odd to catch non-ssl exception. Best regards, Ilya Maximets.
On Tue, 2023-04-25 at 18:13 +0200, Ilya Maximets wrote: > On 4/25/23 15:08, Stefan Hoffmann wrote: > > On Mon, 2023-04-24 at 16:42 -0500, Terry Wilson wrote: > > > > > > On Fri, Apr 21, 2023 at 3:29 AM Stefan Hoffmann > > > <stefan.hoffmann@cloudandheat.com> wrote: > > > > In some cases ovsdb server or relay gets restarted, ovsdb python > > > > clients > > > > may keep the local socket open. Instead of reconnecting a lot of > > > > failures > > > > will be logged. > > > > This can be reproduced with ssl connections to the server/relay and > > > > restarting it, so it has the same IP after restart. > > > > > > > > This patch catches the Exceptions at do_handshake to recreate the > > > > connection on the client side. > > > > > > > > > > > > > Is this something we can add some tests for? I know we have some > > > other tests that involve restarts, but I don't think they involve > > > python clients or SSL. > > > > I can try to find a proper place to add such an test. Can you point me > > to the correct place to add python clients to that test? > > But I'm not sure, if we run always into that issue at a restart. > > Usually, such tests live somewhere around tests/ovsdb-idl.at. > There is a basic CHECK_STREAM_OPEN_BLOCK test there, but it > doesn't check SSL connections, maybe it's worth it to extend > the test to be able to connect via SSL. This will require > changing the tests/test-stream.py test script. > > On the subject, I'm not sure if the issue can be reliably > reproduced, especially because it requires OpenSSL 3.0, AFAICT. > > I have one question for the code below. > > > > > > > > Reviewed-by: Simon Horman <simon.horman@corigine.com> > > > > Signed-off-by: Stefan Hoffmann <stefan.hoffmann@cloudandheat.com> > > > > Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz> > > > > Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz> > > > > Co-authored-by: Luca Czesla <luca.czesla@mail.schwarz> > > > > Co-authored-by: Max Lamprecht <max.lamprecht@mail.schwarz> > > > > --- > > > > python/ovs/stream.py | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/python/ovs/stream.py b/python/ovs/stream.py > > > > index ac5b0fd0c..b32341076 100644 > > > > --- a/python/ovs/stream.py > > > > +++ b/python/ovs/stream.py > > > > @@ -824,7 +824,8 @@ class SSLStream(Stream): > > > > self.socket.do_handshake() > > > > except ssl.SSLWantReadError: > > > > return errno.EAGAIN > > > > - except ssl.SSLSyscallError as e: > > > > + except (ssl.SSLSyscallError, ssl.SSLZeroReturnError, > > > > + ssl.SSLEOFError, OSError) as e: > > Do we actually need to catch the generic OSError here? > It looks a bit odd to catch non-ssl exception. We got an "Transport endpoint is not connected" sometimes. It tries getpeername() at a socket that is not connected anymore or something like that. That results in the OSError. We also saw, that the file descriptor was still there but not connected. Here the traceback (I hope you can still read that after line wrap): Traceback (most recent call last): File "/usr/local/lib/python3.9/site- packages/ovsdbapp/backend/ovs_idl/connection.py", line 107, in run self.idl.run() File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0- py3.9.egg/ovs/db/idl.py", line 433, in run self._session.run() File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0- py3.9.egg/ovs/jsonrpc.py", line 519, in run error = self.stream.connect() File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0- py3.9.egg/ovs/stream.py", line 824, in connect self.socket.do_handshake() File "/usr/local/lib/python3.9/site-packages/eventlet/green/ssl.py", line 312, in do_handshake return self._call_trampolining( File "/usr/local/lib/python3.9/site-packages/eventlet/green/ssl.py", line 158, in _call_trampolining return func(*a, **kw) File "/usr/local/lib/python3.9/ssl.py", line 1305, in do_handshake self._check_connected() File "/usr/local/lib/python3.9/ssl.py", line 1089, in _check_connected self.getpeername() OSError: [Errno 107] Transport endpoint is not connected > > Best regards, Ilya Maximets.
On 4/26/23 10:13, Stefan Hoffmann wrote: > On Tue, 2023-04-25 at 18:13 +0200, Ilya Maximets wrote: >> On 4/25/23 15:08, Stefan Hoffmann wrote: >>> On Mon, 2023-04-24 at 16:42 -0500, Terry Wilson wrote: >>>> >>>> On Fri, Apr 21, 2023 at 3:29 AM Stefan Hoffmann >>>> <stefan.hoffmann@cloudandheat.com> wrote: >>>>> In some cases ovsdb server or relay gets restarted, ovsdb python >>>>> clients >>>>> may keep the local socket open. Instead of reconnecting a lot of >>>>> failures >>>>> will be logged. >>>>> This can be reproduced with ssl connections to the server/relay and >>>>> restarting it, so it has the same IP after restart. >>>>> >>>>> This patch catches the Exceptions at do_handshake to recreate the >>>>> connection on the client side. >>>>> >>>> >>>> >>>> Is this something we can add some tests for? I know we have some >>>> other tests that involve restarts, but I don't think they involve >>>> python clients or SSL. >>> >>> I can try to find a proper place to add such an test. Can you point me >>> to the correct place to add python clients to that test? >>> But I'm not sure, if we run always into that issue at a restart. >> >> Usually, such tests live somewhere around tests/ovsdb-idl.at. >> There is a basic CHECK_STREAM_OPEN_BLOCK test there, but it >> doesn't check SSL connections, maybe it's worth it to extend >> the test to be able to connect via SSL. This will require >> changing the tests/test-stream.py test script. >> >> On the subject, I'm not sure if the issue can be reliably >> reproduced, especially because it requires OpenSSL 3.0, AFAICT. >> >> I have one question for the code below. >> >>>> >>>>> Reviewed-by: Simon Horman <simon.horman@corigine.com> >>>>> Signed-off-by: Stefan Hoffmann <stefan.hoffmann@cloudandheat.com> >>>>> Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz> >>>>> Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz> >>>>> Co-authored-by: Luca Czesla <luca.czesla@mail.schwarz> >>>>> Co-authored-by: Max Lamprecht <max.lamprecht@mail.schwarz> >>>>> --- >>>>> python/ovs/stream.py | 3 ++- >>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/python/ovs/stream.py b/python/ovs/stream.py >>>>> index ac5b0fd0c..b32341076 100644 >>>>> --- a/python/ovs/stream.py >>>>> +++ b/python/ovs/stream.py >>>>> @@ -824,7 +824,8 @@ class SSLStream(Stream): >>>>> self.socket.do_handshake() >>>>> except ssl.SSLWantReadError: >>>>> return errno.EAGAIN >>>>> - except ssl.SSLSyscallError as e: >>>>> + except (ssl.SSLSyscallError, ssl.SSLZeroReturnError, >>>>> + ssl.SSLEOFError, OSError) as e: >> >> Do we actually need to catch the generic OSError here? >> It looks a bit odd to catch non-ssl exception. > > We got an "Transport endpoint is not connected" sometimes. > It tries getpeername() at a socket that is not connected anymore or > something like that. That results in the OSError. > We also saw, that the file descriptor was still there but not > connected. > > Here the traceback (I hope you can still read that after line wrap): > Traceback (most recent call last): > File "/usr/local/lib/python3.9/site- > packages/ovsdbapp/backend/ovs_idl/connection.py", line 107, in run > self.idl.run() > File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0- > py3.9.egg/ovs/db/idl.py", line 433, in run > self._session.run() > File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0- > py3.9.egg/ovs/jsonrpc.py", line 519, in run > error = self.stream.connect() > File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0- > py3.9.egg/ovs/stream.py", line 824, in connect > self.socket.do_handshake() > File "/usr/local/lib/python3.9/site-packages/eventlet/green/ssl.py", > line 312, in do_handshake > return self._call_trampolining( > File "/usr/local/lib/python3.9/site-packages/eventlet/green/ssl.py", > line 158, in _call_trampolining > return func(*a, **kw) > File "/usr/local/lib/python3.9/ssl.py", line 1305, in do_handshake > self._check_connected() > File "/usr/local/lib/python3.9/ssl.py", line 1089, in > _check_connected > self.getpeername() > > OSError: [Errno 107] Transport endpoint is not connected OK. Yeah, looking through the ssl.py implementation, it can indeed throw a generic OSError. That's unfortunate, but OK, I guess. Thanks for clarification! >> >> Best regards, Ilya Maximets. >
On 4/26/23 12:57, Ilya Maximets wrote: > On 4/26/23 10:13, Stefan Hoffmann wrote: >> On Tue, 2023-04-25 at 18:13 +0200, Ilya Maximets wrote: >>> On 4/25/23 15:08, Stefan Hoffmann wrote: >>>> On Mon, 2023-04-24 at 16:42 -0500, Terry Wilson wrote: >>>>> >>>>> On Fri, Apr 21, 2023 at 3:29 AM Stefan Hoffmann >>>>> <stefan.hoffmann@cloudandheat.com> wrote: >>>>>> In some cases ovsdb server or relay gets restarted, ovsdb python >>>>>> clients >>>>>> may keep the local socket open. Instead of reconnecting a lot of >>>>>> failures >>>>>> will be logged. >>>>>> This can be reproduced with ssl connections to the server/relay and >>>>>> restarting it, so it has the same IP after restart. >>>>>> >>>>>> This patch catches the Exceptions at do_handshake to recreate the >>>>>> connection on the client side. >>>>>> >>>>> >>>>> >>>>> Is this something we can add some tests for? I know we have some >>>>> other tests that involve restarts, but I don't think they involve >>>>> python clients or SSL. >>>> >>>> I can try to find a proper place to add such an test. Can you point me >>>> to the correct place to add python clients to that test? >>>> But I'm not sure, if we run always into that issue at a restart. >>> >>> Usually, such tests live somewhere around tests/ovsdb-idl.at. >>> There is a basic CHECK_STREAM_OPEN_BLOCK test there, but it >>> doesn't check SSL connections, maybe it's worth it to extend >>> the test to be able to connect via SSL. This will require >>> changing the tests/test-stream.py test script. >>> >>> On the subject, I'm not sure if the issue can be reliably >>> reproduced, especially because it requires OpenSSL 3.0, AFAICT. >>> >>> I have one question for the code below. >>> >>>>> >>>>>> Reviewed-by: Simon Horman <simon.horman@corigine.com> >>>>>> Signed-off-by: Stefan Hoffmann <stefan.hoffmann@cloudandheat.com> >>>>>> Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz> >>>>>> Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz> >>>>>> Co-authored-by: Luca Czesla <luca.czesla@mail.schwarz> >>>>>> Co-authored-by: Max Lamprecht <max.lamprecht@mail.schwarz> >>>>>> --- >>>>>> python/ovs/stream.py | 3 ++- >>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/python/ovs/stream.py b/python/ovs/stream.py >>>>>> index ac5b0fd0c..b32341076 100644 >>>>>> --- a/python/ovs/stream.py >>>>>> +++ b/python/ovs/stream.py >>>>>> @@ -824,7 +824,8 @@ class SSLStream(Stream): >>>>>> self.socket.do_handshake() >>>>>> except ssl.SSLWantReadError: >>>>>> return errno.EAGAIN >>>>>> - except ssl.SSLSyscallError as e: >>>>>> + except (ssl.SSLSyscallError, ssl.SSLZeroReturnError, >>>>>> + ssl.SSLEOFError, OSError) as e: >>> >>> Do we actually need to catch the generic OSError here? >>> It looks a bit odd to catch non-ssl exception. >> >> We got an "Transport endpoint is not connected" sometimes. >> It tries getpeername() at a socket that is not connected anymore or >> something like that. That results in the OSError. >> We also saw, that the file descriptor was still there but not >> connected. >> >> Here the traceback (I hope you can still read that after line wrap): >> Traceback (most recent call last): >> File "/usr/local/lib/python3.9/site- >> packages/ovsdbapp/backend/ovs_idl/connection.py", line 107, in run >> self.idl.run() >> File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0- >> py3.9.egg/ovs/db/idl.py", line 433, in run >> self._session.run() >> File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0- >> py3.9.egg/ovs/jsonrpc.py", line 519, in run >> error = self.stream.connect() >> File "/usr/local/lib/python3.9/site-packages/ovs-3.1.0- >> py3.9.egg/ovs/stream.py", line 824, in connect >> self.socket.do_handshake() >> File "/usr/local/lib/python3.9/site-packages/eventlet/green/ssl.py", >> line 312, in do_handshake >> return self._call_trampolining( >> File "/usr/local/lib/python3.9/site-packages/eventlet/green/ssl.py", >> line 158, in _call_trampolining >> return func(*a, **kw) >> File "/usr/local/lib/python3.9/ssl.py", line 1305, in do_handshake >> self._check_connected() >> File "/usr/local/lib/python3.9/ssl.py", line 1089, in >> _check_connected >> self.getpeername() >> >> OSError: [Errno 107] Transport endpoint is not connected > > OK. Yeah, looking through the ssl.py implementation, it can indeed > throw a generic OSError. That's unfortunate, but OK, I guess. > > Thanks for clarification! Since it's unlikely we will have a test specific for this issue, I applied the change as is for now. Also, backported down to 2.17. It would stll be great to have more test coverage around python and SSL, but that is slightly out of the scope of the current fix. Thanks! Best regards, Ilya Maximets.
diff --git a/python/ovs/stream.py b/python/ovs/stream.py index ac5b0fd0c..b32341076 100644 --- a/python/ovs/stream.py +++ b/python/ovs/stream.py @@ -824,7 +824,8 @@ class SSLStream(Stream): self.socket.do_handshake() except ssl.SSLWantReadError: return errno.EAGAIN - except ssl.SSLSyscallError as e: + except (ssl.SSLSyscallError, ssl.SSLZeroReturnError, + ssl.SSLEOFError, OSError) as e: return ovs.socket_util.get_exception_errno(e) return 0