| Message ID | 1495176212-14446-2-git-send-email-peterx@redhat.com |
|---|---|
| State | New |
On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote:
> We don't really have a return path for the other types yet. Let's check
> this when .get_return_path() is called.
>
> For this, we introduce a new feature bit, and set it up only for socket
> typed IO channels.
>
> This will help detect earlier failure for postcopy, e.g., logically
> speaking postcopy cannot work with "exec:". Before this patch, when we
> try to migrate with "migrate -d exec:cat>out", we'll hang the system.
> With this patch, we'll get:
>
> (qemu) migrate -d exec:cat>out
> Unable to open return-path for postcopy

This is wrong - post-copy migration *can* work with exec: - it just entirely
depends on what command you are running. Your example ran a command which is
unidirectional, but if you ran 'exec:socat ...' you would have a fully
bidirectional channel. Actually the channel is always bi-directional, but
'cat' simply won't ever send data back to QEMU.

If QEMU hangs when the other end doesn't send data back, that actually seems
like a potentially serious bug in migration code. Even if using the normal
'tcp' migration protocol, if the target QEMU server hangs and fails to
send data to QEMU on the return path, the source QEMU must never hang.

Regards,
Daniel
On Fri, May 19, 2017 at 09:25:38AM +0100, Daniel P. Berrange wrote:
> [...]
>
> If QEMU hangs when the other end doesn't send data back, that actually seems
> like a potentially serious bug in migration code. Even if using the normal
> 'tcp' migration protocol, if the target QEMU server hangs and fails to
> send data to QEMU on the return path, the source QEMU must never hang.

BTW, if you want to simplify the code in this area at all, then arguably
we should get rid of the "get_return_path" helper method entirely. We're
not actually opening any new connections - we're just creating a second
QEMUFile that uses the same underlying QIOChannel object. All we would
need is for the QEMUFile to manage separate 'buf' fields for the read &
write directions. Then all the code would be able to just use the single
QEMUFile for read & write, getting rid of this concept of "opening a
return path" which doesn't actually do anything at the underlying data
transport level.

Regards,
Daniel
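[To make the suggestion above concrete, here is a minimal sketch of what a
single bidirectional QEMUFile could look like. The struct name, field names,
and buffer sizes are all hypothetical - this is not QEMU's actual QEMUFile
layout - and it assumes QEMU's "io/channel.h" for the QIOChannel type.]

#include <stddef.h>
#include <stdint.h>
#include "io/channel.h"       /* QIOChannel, from QEMU's tree */

/* Hypothetical QEMUFile usable in both directions: one shared,
 * reference-counted QIOChannel, with independent buffer state per
 * direction, so no separate "return path" object is ever opened. */
typedef struct BiDirQEMUFile {
    QIOChannel *ioc;          /* same underlying transport for both sides */

    /* write direction: bytes queued here until explicitly flushed */
    uint8_t wbuf[32768];
    size_t  wbuf_used;

    /* read direction: bytes buffered from the channel */
    uint8_t rbuf[32768];
    size_t  rbuf_len;         /* valid bytes currently in rbuf */
    size_t  rbuf_pos;         /* next unread byte in rbuf */
} BiDirQEMUFile;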
On Fri, May 19, 2017 at 09:25:38AM +0100, Daniel P. Berrange wrote:
> On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote:
> > [...]
>
> This is wrong - post-copy migration *can* work with exec: - it just entirely
> depends on what command you are running. Your example ran a command which is
> unidirectional, but if you ran 'exec:socat ...' you would have a fully
> bidirectional channel. Actually the channel is always bi-directional, but
> 'cat' simply won't ever send data back to QEMU.

Indeed. I should not block postcopy if the user has set up a TCP tunnel
between the source and destination in some way using exec:. Thanks for
pointing that out.

However, I still think the idea is needed here. Say, we'd better know
whether the transport would be able to respond (though the current
approach of "assuming sockets are the only ones that can reply" is not a
good solution...). Please see below.

> If QEMU hangs when the other end doesn't send data back, that actually seems
> like a potentially serious bug in migration code. Even if using the normal
> 'tcp' migration protocol, if the target QEMU server hangs and fails to
> send data to QEMU on the return path, the source QEMU must never hang.

Firstly, I should not say it's a hang - it's actually by design here,
imho - the migration thread is in the last phase now, waiting for a SHUT
message from the destination (which I think is wise). But from the
behavior, indeed the source VM is not usable during that time, just like
what happens in most postcopy cases on the source side. So we can see
that postcopy "assumes" that the destination side can reply now.

Meanwhile, I see it as reasonable for postcopy to have such an
assumption. After all, postcopy means "start the VM on the destination
before pages are moved over completely", so there must be someone to
reply to the source, no matter what kind of IO channel the replies
travel over.

That's why I think we still need the general idea here: we need to know
whether the destination end is able to reply.

But I still have no good idea (now that I know this patch won't work) of
how we can do this... Any further suggestions would be greatly welcomed.

Thanks,
On Fri, May 19, 2017 at 09:30:10AM +0100, Daniel P. Berrange wrote:
> [...]
>
> BTW, if you want to simplify the code in this area at all, then arguably
> we should get rid of the "get_return_path" helper method entirely. We're
> not actually opening any new connections - we're just creating a second
> QEMUFile that uses the same underlying QIOChannel object.
> [...]

Makes sense. Noted. Thanks,
On Fri, May 19, 2017 at 05:51:43PM +0800, Peter Xu wrote:
> [...]
>
> That's why I think we still need the general idea here: we need to know
> whether the destination end is able to reply.
>
> But I still have no good idea (now that I know this patch won't work) of
> how we can do this... Any further suggestions would be greatly welcomed.

IMHO this is nothing more than a documentation issue for the 'exec'
protocol, i.e. document that you should provide a bi-directional
transport for live migration. A uni-directional transport is arguably
only valid if you're using migrate to save/restore the VM state to a
file.

Regards,
Daniel
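[For illustration, a bi-directional exec: transport of the kind such
documentation would call for might be set up as below. The socat addresses,
port number, and host name are assumptions for the example, not taken from
the thread.]

# destination: accept the migration stream on a TCP port via socat
qemu-system-x86_64 ... -incoming "exec:socat - TCP-LISTEN:4444"

# source: connect to it; socat relays both directions, so the
# return path that postcopy needs can carry data back
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d "exec:socat - TCP:desthost:4444"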
* Daniel P. Berrange (berrange@redhat.com) wrote:
> [...]
>
> This is wrong - post-copy migration *can* work with exec: - it just entirely
> depends on what command you are running. Your example ran a command which is
> unidirectional, but if you ran 'exec:socat ...' you would have a fully
> bidirectional channel. Actually the channel is always bi-directional, but
> 'cat' simply won't ever send data back to QEMU.

The thing is, it didn't use to be able to; prior to your conversion to
channel, postcopy would reject being started with exec: because it
couldn't open a return path, so it was safe.

> If QEMU hangs when the other end doesn't send data back, that actually seems
> like a potentially serious bug in migration code. Even if using the normal
> 'tcp' migration protocol, if the target QEMU server hangs and fails to
> send data to QEMU on the return path, the source QEMU must never hang.

Hmm, we shouldn't get a 'hang' with a postcopy on a link without a
return path; but it does depend on how the exec: behaves on the
destination. If the destination discards data written to it, then I
think the behaviour would be:

 a) Page requests would just get dropped; they'd eventually get
    fulfilled by the background page transmissions, but that could mean
    that a page request would wait for minutes for the page.
 b) The qemu main thread on the destination can be blocked by that, so
    the monitor might not respond until the page request is fulfilled.
 c) I'm not quite sure what would happen to the source return-path
    thread.

The behaviour seems to have changed between 2.9.0 (f26 package) and my
reasonably recent head build. 2.9.0 gives me:

(qemu) migrate_set_speed 1B
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d "exec:cat > out"
RP: Received invalid message 0x0000 length 0x0000
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off
release-ram: off
Migration status: failed
total time: 0 milliseconds

So that's the return path thread trying to read from the exec:, not
getting anything, and failing.

On head-ish it doesn't fail, the source qemu doesn't hang, but the
migration never completes - possibly because it's waiting for the
MIG_RP_MSG_SHUT from the destination.

A migration_cancel also doesn't work for 'exec' because it doesn't
support shutdown() - it just sticks in 'cancelling'. On a socket that
was broken like this the cancel would work, because it issues a
shutdown() which causes the socket to clean up.

Personally I'd rather fix this by still not supporting exec:. Making
shutdown() work on exec (by kill'ing the child process) means at least
cancel would work, but it still wouldn't be pretty for a postcopy, and
it still doesn't help Peter solve this problem, which is a nasty problem
QEMU has had for ages.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Daniel P. Berrange (berrange@redhat.com) wrote:
> [...]
>
> BTW, if you want to simplify the code in this area at all, then arguably
> we should get rid of the "get_return_path" helper method entirely. We're
> not actually opening any new connections - we're just creating a second
> QEMUFile that uses the same underlying QIOChannel object. All we would
> need is for the QEMUFile to manage separate 'buf' fields for the read &
> write directions. Then all the code would be able to just use the single
> QEMUFile for read & write, getting rid of this concept of "opening a
> return path" which doesn't actually do anything at the underlying data
> transport level.

No, I'd rather keep the get_return_path; we should keep each direction
separate.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Fri, May 19, 2017 at 01:51:43PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> [...]
>
> The thing is, it didn't use to be able to; prior to your conversion to
> channel, postcopy would reject being started with exec: because it
> couldn't open a return path, so it was safe.

It's safe but functionally broken, because it is valid to want to use
exec migration with post-copy.

> Hmm, we shouldn't get a 'hang' with a postcopy on a link without a
> return path; but it does depend on how the exec: behaves on the
> destination.
> [...]
>
> The behaviour seems to have changed between 2.9.0 (f26 package) and my
> reasonably recent head build.

That's due to the bug we just fixed, where we mistakenly didn't allow
bi-directional I/O for exec:

commit 062d81f0e968fe1597474735f3ea038065027372
Author: Daniel P. Berrange <berrange@redhat.com>
Date:   Fri Apr 21 12:12:20 2017 +0100

    migration: setup bi-directional I/O channel for exec: protocol

    Historically the migration data channel has only needed to be
    unidirectional. Thus the 'exec:' protocol was requesting an
    I/O channel with O_RDONLY on incoming side, and O_WRONLY on
    the outgoing side.

    This is fine for classic migration, but if you then try to run
    TLS over it, this fails because the TLS handshake requires a
    bi-directional channel.

    Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
    Reviewed-by: Juan Quintela <quintela@redhat.com>
    Signed-off-by: Juan Quintela <quintela@redhat.com>

> A migration_cancel also doesn't work for 'exec' because it doesn't
> support shutdown() - it just sticks in 'cancelling'. On a socket that
> was broken like this the cancel would work, because it issues a
> shutdown() which causes the socket to clean up.

I'm curious why migration_cancel requires shutdown() to work. Why isn't
it sufficient, from the source POV, to just close the socket entirely
straight away?

Regards,
Daniel
* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Fri, May 19, 2017 at 01:51:43PM +0100, Dr. David Alan Gilbert wrote:
> [...]
>
> > A migration_cancel also doesn't work for 'exec' because it doesn't
> > support shutdown() - it just sticks in 'cancelling'. On a socket that
> > was broken like this the cancel would work, because it issues a
> > shutdown() which causes the socket to clean up.
>
> I'm curious why migration_cancel requires shutdown() to work. Why isn't
> it sufficient, from the source POV, to just close the socket entirely
> straight away?

close() closes the fd, so that any other uses of the fd get an error,
and you're at risk of the fd getting reallocated by something else. So
consider: if the main thread calls close(), the migration thread and the
return-path thread both carry on using that fd, which might have been
reallocated to something different. Worse, I think we came to the
conclusion that on some OSes a read()/write() blocked on an fd didn't
get kicked out by a close.

shutdown() is safe, in that it stops any other threads accessing the fd
but doesn't allow its reallocation until the close; we perform the close
only when we've joined all other threads that were using the fd. Any of
the threads that do new calls on the fd get an error and quickly fall
down their error paths.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
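[Dave's sequence can be sketched in plain POSIX calls. The function name and
the two thread handles are hypothetical stand-ins for the migration and
return-path threads, and error handling is omitted.]

#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch of the shutdown-then-join-then-close cancel pattern. */
void cancel_migration(int fd, pthread_t mig_thread, pthread_t rp_thread)
{
    /* shutdown() forces any blocked or future read()/write() on fd to
     * fail, but keeps the descriptor number allocated, so nothing else
     * in the process can be handed the same number by an open(). */
    shutdown(fd, SHUT_RDWR);

    /* Wait until no thread can possibly still be using fd... */
    pthread_join(mig_thread, NULL);
    pthread_join(rp_thread, NULL);

    /* ...and only then release the descriptor number for reuse. */
    close(fd);
}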
On Fri, May 19, 2017 at 02:02:00PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> [...]
>
> close() closes the fd, so that any other uses of the fd get an error,
> and you're at risk of the fd getting reallocated by something else. So
> consider: if the main thread calls close(), the migration thread and the
> return-path thread both carry on using that fd, which might have been
> reallocated to something different. Worse, I think we came to the
> conclusion that on some OSes a read()/write() blocked on an fd didn't
> get kicked out by a close.

I'd be very surprised if close() didn't terminate any other threads doing
read/write, and even more surprised if they handed out the same FD to
another thread while something was still in the read.

> shutdown() is safe, in that it stops any other threads accessing the fd
> but doesn't allow its reallocation until the close; we perform the close
> only when we've joined all other threads that were using the fd. Any of
> the threads that do new calls on the fd get an error and quickly fall
> down their error paths.

Ahh, that's certainly an interesting scenario. That would certainly be
a problem with the migration code when this was originally written.
It had two QEMUFile structs, each with an 'int fd' field, so when you
closed 'fd' on one QEMUFile struct, it wouldn't update the other QEMUFile
used by another thread.

Since we switched over to use QIOChannel, though, I think the thread
scenario you describe should be avoided entirely. When you have multiple
QEMUFile objects, they each have a reference-counted pointer to the same
underlying QIOChannel object instance. So when QEMUFile triggers a call
to qio_channel_close() in one thread, that'll set fd=-1 in the QIOChannel.
Since the other threads have a reference to the same QIOChannel object,
they'll now see this fd == -1 straightaway.

So, IIUC, this should make the need for shutdown() redundant (at least
for the thread race conditions you describe).

Regards,
Daniel
* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Fri, May 19, 2017 at 03:33:12PM +0100, Dr. David Alan Gilbert wrote:
> [...]
>
> I'd be very surprised if close() didn't terminate any other threads doing
> read/write

Prepare to be surprised then - that's the behaviour I found when I tried
it out in 2014. (Also, at the time we found cases where the close()
could hang.)

> and even more surprised if they handed out the same FD to
> another thread while something was still in the read.

Right, I don't think that will happen.

> [...]
>
> Since we switched over to use QIOChannel, though, I think the thread
> scenario you describe should be avoided entirely. When you have multiple
> QEMUFile objects, they each have a reference-counted pointer to the same
> underlying QIOChannel object instance. So when QEMUFile triggers a call
> to qio_channel_close() in one thread, that'll set fd=-1 in the QIOChannel.
> Since the other threads have a reference to the same QIOChannel object,
> they'll now see this fd == -1 straightaway.
>
> So, IIUC, this should make the need for shutdown() redundant (at least
> for the thread race conditions you describe).

That's not thread safe unless you're doing some very careful locking.
Consider:

    T1                  T2
    oldfd = fd          tmp = fd
    fd = -1
    close(oldfd)
    unrelated open()
                        read(tmp, ...)

In practice every use of fd will be a copy into a tmp and then the
syscall; the unrelated open() could happen in another thread. (OK, the
gap between the tmp and the read is tiny, although if we're doing
multiple operations, chances are the compiler will optimise it to the
top of a loop.)

There's no way to make that code safe.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
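[Spelled out as code, the T1/T2 race above looks something like the
following - a hypothetical illustration, not QEMU source.]

#include <unistd.h>

int migration_fd;              /* shared between threads, no locking */

/* T1: the cancel path */
void *cancel_thread(void *arg)
{
    int oldfd = migration_fd;
    migration_fd = -1;
    close(oldfd);              /* the kernel may now reuse this number */
    return NULL;
}

/* T2: the I/O path */
void *io_thread(void *arg)
{
    char buf[4096];
    int tmp = migration_fd;    /* may still be the old value */
    /* ...if any thread open()s here, 'tmp' now names that new file... */
    read(tmp, buf, sizeof(buf));   /* silently reads from the wrong fd */
    return NULL;
}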
On Fri, May 19, 2017 at 03:33:12PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> [...]
>
> That's not thread safe unless you're doing some very careful locking.
> [...]
>
> There's no way to make that code safe.

Urgh, yes, I see what you mean.

Currently the QIOChannelCommand implementation uses a pair of anonymous
pipes for stdin/out to the child process. I wonder if we could switch
that to use socketpair() instead, thus letting us shutdown() on it too.

Though I guess it would be sufficient for qio_channel_shutdown() to
merely kill the child PID, while leaving the FDs open, as then you'd
get EOF and/or EPIPE on the read/writes.

Regards,
Daniel
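[As a sketch, the kill-the-child variant could look roughly like this. The
'pid' field and the simplified hook signature are assumptions, not checked
against QEMU's actual QIOChannelCommand code.]

#include <signal.h>
#include "qapi/error.h"          /* Error, from QEMU's tree */
#include "io/channel-command.h"  /* QIOChannelCommand, from QEMU's tree */

/* Hypothetical shutdown hook for the exec: command channel: terminate
 * the child so any blocked read()/write() on the still-open pipe FDs
 * returns EOF or fails with EPIPE, letting migration cancel cleanly. */
int qio_channel_command_shutdown_sketch(QIOChannelCommand *cioc,
                                        Error **errp)
{
    if (cioc->pid > 0) {
        kill(cioc->pid, SIGTERM);   /* FDs stay open; I/O now unblocks */
    }
    return 0;
}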
* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Fri, May 19, 2017 at 03:33:12PM +0100, Dr. David Alan Gilbert wrote:
> [...]
>
> Currently the QIOChannelCommand implementation uses a pair of anonymous
> pipes for stdin/out to the child process. I wonder if we could switch
> that to use socketpair() instead, thus letting us shutdown() on it too.
>
> Though I guess it would be sufficient for qio_channel_shutdown() to
> merely kill the child PID, while leaving the FDs open, as then you'd
> get EOF and/or EPIPE on the read/writes.

Yes, I guess it's a question of which one is more likely to actually
kill the exec child off; the socketpair is more likely to cause the
source side migration code to cancel cleanly, although a kill -9
should sort out a wayward exec child.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Fri, May 19, 2017 at 07:41:34PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> [...]
>
> Yes, I guess it's a question of which one is more likely to actually
> kill the exec child off; the socketpair is more likely to cause the
> source side migration code to cancel cleanly, although a kill -9
> should sort out a wayward exec child.

I'm also curious whether there's any real-world performance difference
between using a pipe vs. a socketpair to communicate with a child
process.

Regards,
Daniel
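[For reference, the socketpair() variant under discussion could be set up
along these lines - an illustrative POSIX sketch with a hypothetical helper
name, not QEMU's QIOChannelCommand code.]

#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Spawn 'cmd' with one end of an AF_UNIX socket pair as its stdin and
 * stdout; the returned parent-side fd supports shutdown(), just like
 * the TCP migration sockets do. */
int spawn_child_on_socketpair(const char *cmd)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                    /* child: talk on sv[1] */
        close(sv[0]);
        dup2(sv[1], STDIN_FILENO);
        dup2(sv[1], STDOUT_FILENO);
        execlp("sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }
    close(sv[1]);                      /* parent keeps sv[0] */
    return sv[0];                      /* shutdown()-capable end */
}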
diff --git a/include/io/channel.h b/include/io/channel.h
index db9bb02..7876534 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -45,6 +45,7 @@ enum QIOChannelFeature {
     QIO_CHANNEL_FEATURE_FD_PASS,
     QIO_CHANNEL_FEATURE_SHUTDOWN,
     QIO_CHANNEL_FEATURE_LISTEN,
+    QIO_CHANNEL_FEATURE_RETURN_PATH,
 };

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 53386b7..ee81b2d 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -56,6 +56,7 @@ qio_channel_socket_new(void)
     ioc = QIO_CHANNEL(sioc);
     qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
+    qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_RETURN_PATH);

 #ifdef WIN32
     ioc->event = CreateEvent(NULL, FALSE, FALSE, NULL);

diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index 45c13f1..3bd7940 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -23,6 +23,7 @@
  */

 #include "qemu/osdep.h"
+#include "qom/object.h"
 #include "migration/qemu-file.h"
 #include "io/channel-socket.h"
 #include "qemu/iov.h"
@@ -139,6 +140,10 @@ static QEMUFile *channel_get_input_return_path(void *opaque)
 {
     QIOChannel *ioc = QIO_CHANNEL(opaque);

+    if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_RETURN_PATH)) {
+        return NULL;
+    }
+
     return qemu_fopen_channel_output(ioc);
 }

@@ -146,6 +151,10 @@ static QEMUFile *channel_get_output_return_path(void *opaque)
 {
     QIOChannel *ioc = QIO_CHANNEL(opaque);

+    if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_RETURN_PATH)) {
+        return NULL;
+    }
+
     return qemu_fopen_channel_input(ioc);
 }
We don't really have a return path for the other types yet. Let's check
this when .get_return_path() is called.

For this, we introduce a new feature bit, and set it up only for socket
typed IO channels.

This will help detect earlier failure for postcopy, e.g., logically
speaking postcopy cannot work with "exec:". Before this patch, when we
try to migrate with "migrate -d exec:cat>out", we'll hang the system.
With this patch, we'll get:

(qemu) migrate -d exec:cat>out
Unable to open return-path for postcopy

CC: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/io/channel.h          | 1 +
 io/channel-socket.c           | 1 +
 migration/qemu-file-channel.c | 9 +++++++++
 3 files changed, 11 insertions(+)