diff mbox series

[2/9] migration: Fix file migration with fdset

Message ID 20240426142042.14573-3-farosas@suse.de
State New
Headers show
Series migration/mapped-ram: Add direct-io support | expand

Commit Message

Fabiano Rosas April 26, 2024, 2:20 p.m. UTC
When the migration using the "file:" URI was implemented, I don't
think any of us noticed that if you pass in a file name with the
format "/dev/fdset/N", this allows a file descriptor to be passed in
to QEMU and that behaves just like the "fd:" URI. So the "file:"
support has been added without regard for the fdset part and we got
some things wrong.

The first issue is that we should not truncate the migration file if
we're allowing an fd + offset. We need to leave the file contents
untouched.

The second issue is that there's an expectation that QEMU removes the
fd after the migration has finished. That's what the "fd:" code
does. Otherwise a second migration on the same VM could attempt to
provide an fdset with the same name and QEMU would reject it.

We can fix the first issue by detecting when we're using the fdset
vs. the plain file name. This requires storing the fdset_id
somewhere. We can then use this stored fdset_id to do cleanup at the
end and also fix the second issue.

Fixes: 385f510df5 ("migration: file URI offset")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/file.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

Comments

Peter Xu May 3, 2024, 4:23 p.m. UTC | #1
On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> When the migration using the "file:" URI was implemented, I don't
> think any of us noticed that if you pass in a file name with the
> format "/dev/fdset/N", this allows a file descriptor to be passed in
> to QEMU and that behaves just like the "fd:" URI. So the "file:"
> support has been added without regard for the fdset part and we got
> some things wrong.
> 
> The first issue is that we should not truncate the migration file if
> we're allowing an fd + offset. We need to leave the file contents
> untouched.

I'm wondering whether we can use fallocate() instead on the ranges so that
we always don't open() with O_TRUNC.  Before that..  could you remind me
why do we need to truncate in the first place?  I definitely missed
something else here too.

> 
> The second issue is that there's an expectation that QEMU removes the
> fd after the migration has finished. That's what the "fd:" code
> does. Otherwise a second migration on the same VM could attempt to
> provide an fdset with the same name and QEMU would reject it.

Let me check what we do when with "fd:" and when migration completes or
cancels.

IIUC it's qio_channel_file_close() that does the final cleanup work on
e.g. to_dst_file, right?  Then there's qemu_close(), and it has:

    /* Close fd that was dup'd from an fdset */
    fdset_id = monitor_fdset_dup_fd_find(fd);
    if (fdset_id != -1) {
        int ret;

        ret = close(fd);
        if (ret == 0) {
            monitor_fdset_dup_fd_remove(fd);
        }

        return ret;
    }

Shouldn't this done the work already?

Off topic: I think this code is over complicated too, maybe I missed
something, but afaict we don't need monitor_fdset_dup_fd_find at all.. we
simply walk the list and remove stuff..  I attach a patch at the end that I
tried to clean that up, just in case there's early comments.  But we can
ignore that so we don't get side-tracked, and focus on the direct-io
issues.

Thanks,

=======

From 2f6b6d1224486d8ee830a7afe34738a07003b863 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Fri, 3 May 2024 11:27:20 -0400
Subject: [PATCH] monitor: Drop monitor_fdset_dup_fd_add()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This function is not needed, one remove function should already work.
Clean it up.

Here the code doesn't really care about whether we need to keep that dupfd
around if close() failed: when that happens something got very wrong,
keeping the dup_fd around the fdsets may not help that situation so far.

Cc: Dr. David Alan Gilbert <dave@treblig.org>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/monitor/monitor.h |  1 -
 monitor/fds.c             | 27 +++++----------------------
 stubs/fdset.c             |  5 -----
 util/osdep.c              | 15 +--------------
 4 files changed, 6 insertions(+), 42 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 965f5d5450..fd9b3f538c 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -53,7 +53,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
                                 const char *opaque, Error **errp);
 int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
 void monitor_fdset_dup_fd_remove(int dup_fd);
-int64_t monitor_fdset_dup_fd_find(int dup_fd);
 
 void monitor_register_hmp(const char *name, bool info,
                           void (*cmd)(Monitor *mon, const QDict *qdict));
diff --git a/monitor/fds.c b/monitor/fds.c
index d86c2c674c..d5aecfb70e 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -458,7 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 #endif
 }
 
-static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
+void monitor_fdset_dup_fd_remove(int dup_fd)
 {
     MonFdset *mon_fdset;
     MonFdsetFd *mon_fdset_fd_dup;
@@ -467,31 +467,14 @@ static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
     QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
         QLIST_FOREACH(mon_fdset_fd_dup, &mon_fdset->dup_fds, next) {
             if (mon_fdset_fd_dup->fd == dup_fd) {
-                if (remove) {
-                    QLIST_REMOVE(mon_fdset_fd_dup, next);
-                    g_free(mon_fdset_fd_dup);
-                    if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
-                        monitor_fdset_cleanup(mon_fdset);
-                    }
-                    return -1;
-                } else {
-                    return mon_fdset->id;
+                QLIST_REMOVE(mon_fdset_fd_dup, next);
+                g_free(mon_fdset_fd_dup);
+                if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
+                    monitor_fdset_cleanup(mon_fdset);
                 }
             }
         }
     }
-
-    return -1;
-}
-
-int64_t monitor_fdset_dup_fd_find(int dup_fd)
-{
-    return monitor_fdset_dup_fd_find_remove(dup_fd, false);
-}
-
-void monitor_fdset_dup_fd_remove(int dup_fd)
-{
-    monitor_fdset_dup_fd_find_remove(dup_fd, true);
 }
 
 int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp)
diff --git a/stubs/fdset.c b/stubs/fdset.c
index d7c39a28ac..389e368a29 100644
--- a/stubs/fdset.c
+++ b/stubs/fdset.c
@@ -9,11 +9,6 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
     return -1;
 }
 
-int64_t monitor_fdset_dup_fd_find(int dup_fd)
-{
-    return -1;
-}
-
 void monitor_fdset_dup_fd_remove(int dupfd)
 {
 }
diff --git a/util/osdep.c b/util/osdep.c
index e996c4744a..2d9749d060 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -393,21 +393,8 @@ int qemu_open_old(const char *name, int flags, ...)
 
 int qemu_close(int fd)
 {
-    int64_t fdset_id;
-
     /* Close fd that was dup'd from an fdset */
-    fdset_id = monitor_fdset_dup_fd_find(fd);
-    if (fdset_id != -1) {
-        int ret;
-
-        ret = close(fd);
-        if (ret == 0) {
-            monitor_fdset_dup_fd_remove(fd);
-        }
-
-        return ret;
-    }
-
+    monitor_fdset_dup_fd_remove(fd);
     return close(fd);
 }
Fabiano Rosas May 3, 2024, 7:56 p.m. UTC | #2
Peter Xu <peterx@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
>> When the migration using the "file:" URI was implemented, I don't
>> think any of us noticed that if you pass in a file name with the
>> format "/dev/fdset/N", this allows a file descriptor to be passed in
>> to QEMU and that behaves just like the "fd:" URI. So the "file:"
>> support has been added without regard for the fdset part and we got
>> some things wrong.
>> 
>> The first issue is that we should not truncate the migration file if
>> we're allowing an fd + offset. We need to leave the file contents
>> untouched.
>
> I'm wondering whether we can use fallocate() instead on the ranges so that
> we always don't open() with O_TRUNC.  Before that..  could you remind me
> why do we need to truncate in the first place?  I definitely missed
> something else here too.

AFAIK, just to avoid any issues if the file is pre-existing. I don't see
the difference between O_TRUNC and fallocate in this case.

>
>> 
>> The second issue is that there's an expectation that QEMU removes the
>> fd after the migration has finished. That's what the "fd:" code
>> does. Otherwise a second migration on the same VM could attempt to
>> provide an fdset with the same name and QEMU would reject it.
>
> Let me check what we do when with "fd:" and when migration completes or
> cancels.
>
> IIUC it's qio_channel_file_close() that does the final cleanup work on
> e.g. to_dst_file, right?  Then there's qemu_close(), and it has:
>
>     /* Close fd that was dup'd from an fdset */
>     fdset_id = monitor_fdset_dup_fd_find(fd);
>     if (fdset_id != -1) {
>         int ret;
>
>         ret = close(fd);
>         if (ret == 0) {
>             monitor_fdset_dup_fd_remove(fd);
>         }
>
>         return ret;
>     }
>
> Shouldn't this done the work already?

That removes the mon_fdset_fd_dup->fd, we want to remove the
mon_fdset_fd->fd.

>
> Off topic: I think this code is over complicated too, maybe I missed
> something, but afaict we don't need monitor_fdset_dup_fd_find at all.. we
> simply walk the list and remove stuff..  I attach a patch at the end that I
> tried to clean that up, just in case there's early comments.  But we can
> ignore that so we don't get side-tracked, and focus on the direct-io
> issues.

Well, I'm not confident touching this code. This is more than a decade
old, I have no idea what the original motivations were. The possible
interactions with the user via command-line (-add-fd), QMP (add-fd) and
the monitor lifetime make me confused. Not to mention the fdset part
being plumbed into the guts of a widely used qemu_open_internal() that
very misleadingly presents itself as just a wrapper for open().

>
> Thanks,
>
> =======
>
> From 2f6b6d1224486d8ee830a7afe34738a07003b863 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Fri, 3 May 2024 11:27:20 -0400
> Subject: [PATCH] monitor: Drop monitor_fdset_dup_fd_add()
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> This function is not needed, one remove function should already work.
> Clean it up.
>
> Here the code doesn't really care about whether we need to keep that dupfd
> around if close() failed: when that happens something got very wrong,
> keeping the dup_fd around the fdsets may not help that situation so far.
>
> Cc: Dr. David Alan Gilbert <dave@treblig.org>
> Cc: Markus Armbruster <armbru@redhat.com>
> Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/monitor/monitor.h |  1 -
>  monitor/fds.c             | 27 +++++----------------------
>  stubs/fdset.c             |  5 -----
>  util/osdep.c              | 15 +--------------
>  4 files changed, 6 insertions(+), 42 deletions(-)
>
> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> index 965f5d5450..fd9b3f538c 100644
> --- a/include/monitor/monitor.h
> +++ b/include/monitor/monitor.h
> @@ -53,7 +53,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
>                                  const char *opaque, Error **errp);
>  int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
>  void monitor_fdset_dup_fd_remove(int dup_fd);
> -int64_t monitor_fdset_dup_fd_find(int dup_fd);
>  
>  void monitor_register_hmp(const char *name, bool info,
>                            void (*cmd)(Monitor *mon, const QDict *qdict));
> diff --git a/monitor/fds.c b/monitor/fds.c
> index d86c2c674c..d5aecfb70e 100644
> --- a/monitor/fds.c
> +++ b/monitor/fds.c
> @@ -458,7 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>  #endif
>  }
>  
> -static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
> +void monitor_fdset_dup_fd_remove(int dup_fd)
>  {
>      MonFdset *mon_fdset;
>      MonFdsetFd *mon_fdset_fd_dup;
> @@ -467,31 +467,14 @@ static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
>      QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
>          QLIST_FOREACH(mon_fdset_fd_dup, &mon_fdset->dup_fds, next) {
>              if (mon_fdset_fd_dup->fd == dup_fd) {
> -                if (remove) {
> -                    QLIST_REMOVE(mon_fdset_fd_dup, next);
> -                    g_free(mon_fdset_fd_dup);
> -                    if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
> -                        monitor_fdset_cleanup(mon_fdset);
> -                    }
> -                    return -1;
> -                } else {
> -                    return mon_fdset->id;
> +                QLIST_REMOVE(mon_fdset_fd_dup, next);
> +                g_free(mon_fdset_fd_dup);
> +                if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
> +                    monitor_fdset_cleanup(mon_fdset);
>                  }
>              }
>          }
>      }
> -
> -    return -1;
> -}
> -
> -int64_t monitor_fdset_dup_fd_find(int dup_fd)
> -{
> -    return monitor_fdset_dup_fd_find_remove(dup_fd, false);
> -}
> -
> -void monitor_fdset_dup_fd_remove(int dup_fd)
> -{
> -    monitor_fdset_dup_fd_find_remove(dup_fd, true);
>  }
>  
>  int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp)
> diff --git a/stubs/fdset.c b/stubs/fdset.c
> index d7c39a28ac..389e368a29 100644
> --- a/stubs/fdset.c
> +++ b/stubs/fdset.c
> @@ -9,11 +9,6 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>      return -1;
>  }
>  
> -int64_t monitor_fdset_dup_fd_find(int dup_fd)
> -{
> -    return -1;
> -}
> -
>  void monitor_fdset_dup_fd_remove(int dupfd)
>  {
>  }
> diff --git a/util/osdep.c b/util/osdep.c
> index e996c4744a..2d9749d060 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -393,21 +393,8 @@ int qemu_open_old(const char *name, int flags, ...)
>  
>  int qemu_close(int fd)
>  {
> -    int64_t fdset_id;
> -
>      /* Close fd that was dup'd from an fdset */
> -    fdset_id = monitor_fdset_dup_fd_find(fd);
> -    if (fdset_id != -1) {
> -        int ret;
> -
> -        ret = close(fd);
> -        if (ret == 0) {
> -            monitor_fdset_dup_fd_remove(fd);
> -        }
> -
> -        return ret;
> -    }
> -
> +    monitor_fdset_dup_fd_remove(fd);
>      return close(fd);
>  }
>  
> -- 
> 2.44.0
Peter Xu May 3, 2024, 9:04 p.m. UTC | #3
On Fri, May 03, 2024 at 04:56:08PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> >> When the migration using the "file:" URI was implemented, I don't
> >> think any of us noticed that if you pass in a file name with the
> >> format "/dev/fdset/N", this allows a file descriptor to be passed in
> >> to QEMU and that behaves just like the "fd:" URI. So the "file:"
> >> support has been added without regard for the fdset part and we got
> >> some things wrong.
> >> 
> >> The first issue is that we should not truncate the migration file if
> >> we're allowing an fd + offset. We need to leave the file contents
> >> untouched.
> >
> > I'm wondering whether we can use fallocate() instead on the ranges so that
> > we always don't open() with O_TRUNC.  Before that..  could you remind me
> > why do we need to truncate in the first place?  I definitely missed
> > something else here too.
> 
> AFAIK, just to avoid any issues if the file is pre-existing. I don't see
> the difference between O_TRUNC and fallocate in this case.

Then, shall we avoid truncations at all, leaving all the feasibility to
user (also errors prone to make)?

> 
> >
> >> 
> >> The second issue is that there's an expectation that QEMU removes the
> >> fd after the migration has finished. That's what the "fd:" code
> >> does. Otherwise a second migration on the same VM could attempt to
> >> provide an fdset with the same name and QEMU would reject it.
> >
> > Let me check what we do when with "fd:" and when migration completes or
> > cancels.
> >
> > IIUC it's qio_channel_file_close() that does the final cleanup work on
> > e.g. to_dst_file, right?  Then there's qemu_close(), and it has:
> >
> >     /* Close fd that was dup'd from an fdset */
> >     fdset_id = monitor_fdset_dup_fd_find(fd);
> >     if (fdset_id != -1) {
> >         int ret;
> >
> >         ret = close(fd);
> >         if (ret == 0) {
> >             monitor_fdset_dup_fd_remove(fd);
> >         }
> >
> >         return ret;
> >     }
> >
> > Shouldn't this done the work already?
> 
> That removes the mon_fdset_fd_dup->fd, we want to remove the
> mon_fdset_fd->fd.

What I read so far is when we are removing the dup-fds, we'll do one more
thing:

monitor_fdset_dup_fd_find_remove():
                    if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
                        monitor_fdset_cleanup(mon_fdset);
                    }

It means if we removed all the dup-fds correctly, we should also remove the
whole fdset, which includes the ->fds, IIUC.

> 
> >
> > Off topic: I think this code is over complicated too, maybe I missed
> > something, but afaict we don't need monitor_fdset_dup_fd_find at all.. we
> > simply walk the list and remove stuff..  I attach a patch at the end that I
> > tried to clean that up, just in case there's early comments.  But we can
> > ignore that so we don't get side-tracked, and focus on the direct-io
> > issues.
> 
> Well, I'm not confident touching this code. This is more than a decade
> old, I have no idea what the original motivations were. The possible
> interactions with the user via command-line (-add-fd), QMP (add-fd) and
> the monitor lifetime make me confused. Not to mention the fdset part
> being plumbed into the guts of a widely used qemu_open_internal() that
> very misleadingly presents itself as just a wrapper for open().

If to make QEMU long live, we'll probably need to touch it at some
point.. or at least discuss about it and figure things out. We pay tech
debts like this when there's no good comment / docs to refer in this case,
then the earlier, perhaps also the better.. to try taking the stab, imho.

Definitely not a request to clean everything up. :) Let's see whether
others can chim in with better knowledge of the history.

> 
> >
> > Thanks,
> >
> > =======
> >
> > From 2f6b6d1224486d8ee830a7afe34738a07003b863 Mon Sep 17 00:00:00 2001
> > From: Peter Xu <peterx@redhat.com>
> > Date: Fri, 3 May 2024 11:27:20 -0400
> > Subject: [PATCH] monitor: Drop monitor_fdset_dup_fd_add()
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > This function is not needed, one remove function should already work.
> > Clean it up.
> >
> > Here the code doesn't really care about whether we need to keep that dupfd
> > around if close() failed: when that happens something got very wrong,
> > keeping the dup_fd around the fdsets may not help that situation so far.
> >
> > Cc: Dr. David Alan Gilbert <dave@treblig.org>
> > Cc: Markus Armbruster <armbru@redhat.com>
> > Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Daniel P. Berrangé <berrange@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  include/monitor/monitor.h |  1 -
> >  monitor/fds.c             | 27 +++++----------------------
> >  stubs/fdset.c             |  5 -----
> >  util/osdep.c              | 15 +--------------
> >  4 files changed, 6 insertions(+), 42 deletions(-)
> >
> > diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> > index 965f5d5450..fd9b3f538c 100644
> > --- a/include/monitor/monitor.h
> > +++ b/include/monitor/monitor.h
> > @@ -53,7 +53,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
> >                                  const char *opaque, Error **errp);
> >  int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
> >  void monitor_fdset_dup_fd_remove(int dup_fd);
> > -int64_t monitor_fdset_dup_fd_find(int dup_fd);
> >  
> >  void monitor_register_hmp(const char *name, bool info,
> >                            void (*cmd)(Monitor *mon, const QDict *qdict));
> > diff --git a/monitor/fds.c b/monitor/fds.c
> > index d86c2c674c..d5aecfb70e 100644
> > --- a/monitor/fds.c
> > +++ b/monitor/fds.c
> > @@ -458,7 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
> >  #endif
> >  }
> >  
> > -static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
> > +void monitor_fdset_dup_fd_remove(int dup_fd)
> >  {
> >      MonFdset *mon_fdset;
> >      MonFdsetFd *mon_fdset_fd_dup;
> > @@ -467,31 +467,14 @@ static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
> >      QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
> >          QLIST_FOREACH(mon_fdset_fd_dup, &mon_fdset->dup_fds, next) {
> >              if (mon_fdset_fd_dup->fd == dup_fd) {
> > -                if (remove) {
> > -                    QLIST_REMOVE(mon_fdset_fd_dup, next);
> > -                    g_free(mon_fdset_fd_dup);
> > -                    if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
> > -                        monitor_fdset_cleanup(mon_fdset);
> > -                    }
> > -                    return -1;
> > -                } else {
> > -                    return mon_fdset->id;
> > +                QLIST_REMOVE(mon_fdset_fd_dup, next);
> > +                g_free(mon_fdset_fd_dup);
> > +                if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
> > +                    monitor_fdset_cleanup(mon_fdset);
> >                  }
> >              }
> >          }
> >      }
> > -
> > -    return -1;
> > -}
> > -
> > -int64_t monitor_fdset_dup_fd_find(int dup_fd)
> > -{
> > -    return monitor_fdset_dup_fd_find_remove(dup_fd, false);
> > -}
> > -
> > -void monitor_fdset_dup_fd_remove(int dup_fd)
> > -{
> > -    monitor_fdset_dup_fd_find_remove(dup_fd, true);
> >  }
> >  
> >  int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp)
> > diff --git a/stubs/fdset.c b/stubs/fdset.c
> > index d7c39a28ac..389e368a29 100644
> > --- a/stubs/fdset.c
> > +++ b/stubs/fdset.c
> > @@ -9,11 +9,6 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
> >      return -1;
> >  }
> >  
> > -int64_t monitor_fdset_dup_fd_find(int dup_fd)
> > -{
> > -    return -1;
> > -}
> > -
> >  void monitor_fdset_dup_fd_remove(int dupfd)
> >  {
> >  }
> > diff --git a/util/osdep.c b/util/osdep.c
> > index e996c4744a..2d9749d060 100644
> > --- a/util/osdep.c
> > +++ b/util/osdep.c
> > @@ -393,21 +393,8 @@ int qemu_open_old(const char *name, int flags, ...)
> >  
> >  int qemu_close(int fd)
> >  {
> > -    int64_t fdset_id;
> > -
> >      /* Close fd that was dup'd from an fdset */
> > -    fdset_id = monitor_fdset_dup_fd_find(fd);
> > -    if (fdset_id != -1) {
> > -        int ret;
> > -
> > -        ret = close(fd);
> > -        if (ret == 0) {
> > -            monitor_fdset_dup_fd_remove(fd);
> > -        }
> > -
> > -        return ret;
> > -    }
> > -
> > +    monitor_fdset_dup_fd_remove(fd);
> >      return close(fd);
> >  }
> >  
> > -- 
> > 2.44.0
>
Fabiano Rosas May 3, 2024, 9:31 p.m. UTC | #4
Peter Xu <peterx@redhat.com> writes:

> On Fri, May 03, 2024 at 04:56:08PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
>> >> When the migration using the "file:" URI was implemented, I don't
>> >> think any of us noticed that if you pass in a file name with the
>> >> format "/dev/fdset/N", this allows a file descriptor to be passed in
>> >> to QEMU and that behaves just like the "fd:" URI. So the "file:"
>> >> support has been added without regard for the fdset part and we got
>> >> some things wrong.
>> >> 
>> >> The first issue is that we should not truncate the migration file if
>> >> we're allowing an fd + offset. We need to leave the file contents
>> >> untouched.
>> >
>> > I'm wondering whether we can use fallocate() instead on the ranges so that
>> > we always don't open() with O_TRUNC.  Before that..  could you remind me
>> > why do we need to truncate in the first place?  I definitely missed
>> > something else here too.
>> 
>> AFAIK, just to avoid any issues if the file is pre-existing. I don't see
>> the difference between O_TRUNC and fallocate in this case.
>
> Then, shall we avoid truncations at all, leaving all the feasibility to
> user (also errors prone to make)?
>

Is this a big deal? I'd rather close that possible gap and avoid the bug
reports.

>> 
>> >
>> >> 
>> >> The second issue is that there's an expectation that QEMU removes the
>> >> fd after the migration has finished. That's what the "fd:" code
>> >> does. Otherwise a second migration on the same VM could attempt to
>> >> provide an fdset with the same name and QEMU would reject it.
>> >
>> > Let me check what we do when with "fd:" and when migration completes or
>> > cancels.
>> >
>> > IIUC it's qio_channel_file_close() that does the final cleanup work on
>> > e.g. to_dst_file, right?  Then there's qemu_close(), and it has:
>> >
>> >     /* Close fd that was dup'd from an fdset */
>> >     fdset_id = monitor_fdset_dup_fd_find(fd);
>> >     if (fdset_id != -1) {
>> >         int ret;
>> >
>> >         ret = close(fd);
>> >         if (ret == 0) {
>> >             monitor_fdset_dup_fd_remove(fd);
>> >         }
>> >
>> >         return ret;
>> >     }
>> >
>> > Shouldn't this done the work already?
>> 
>> That removes the mon_fdset_fd_dup->fd, we want to remove the
>> mon_fdset_fd->fd.
>
> What I read so far is when we are removing the dup-fds, we'll do one more
> thing:
>
> monitor_fdset_dup_fd_find_remove():
>                     if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
>                         monitor_fdset_cleanup(mon_fdset);
>                     }
>
> It means if we removed all the dup-fds correctly, we should also remove the
> whole fdset, which includes the ->fds, IIUC.
>

Since mon_fdset_fd->removed == false, we hit the runstate_is_running()
problem. I'm not sure, but probably mon_refcount > 0 as well. So the fd
would not be removed.

But I'll retest this on Monday just be sure, it's been a while since I
wrote some parts of this.

>> 
>> >
>> > Off topic: I think this code is over complicated too, maybe I missed
>> > something, but afaict we don't need monitor_fdset_dup_fd_find at all.. we
>> > simply walk the list and remove stuff..  I attach a patch at the end that I
>> > tried to clean that up, just in case there's early comments.  But we can
>> > ignore that so we don't get side-tracked, and focus on the direct-io
>> > issues.
>> 
>> Well, I'm not confident touching this code. This is more than a decade
>> old, I have no idea what the original motivations were. The possible
>> interactions with the user via command-line (-add-fd), QMP (add-fd) and
>> the monitor lifetime make me confused. Not to mention the fdset part
>> being plumbed into the guts of a widely used qemu_open_internal() that
>> very misleadingly presents itself as just a wrapper for open().
>
> If to make QEMU long live, we'll probably need to touch it at some
> point.. or at least discuss about it and figure things out. We pay tech
> debts like this when there's no good comment / docs to refer in this case,
> then the earlier, perhaps also the better.. to try taking the stab, imho.
>
> Definitely not a request to clean everything up. :) Let's see whether
> others can chim in with better knowledge of the history.
>
>> 
>> >
>> > Thanks,
>> >
>> > =======
>> >
>> > From 2f6b6d1224486d8ee830a7afe34738a07003b863 Mon Sep 17 00:00:00 2001
>> > From: Peter Xu <peterx@redhat.com>
>> > Date: Fri, 3 May 2024 11:27:20 -0400
>> > Subject: [PATCH] monitor: Drop monitor_fdset_dup_fd_add()
>> > MIME-Version: 1.0
>> > Content-Type: text/plain; charset=UTF-8
>> > Content-Transfer-Encoding: 8bit
>> >
>> > This function is not needed, one remove function should already work.
>> > Clean it up.
>> >
>> > Here the code doesn't really care about whether we need to keep that dupfd
>> > around if close() failed: when that happens something got very wrong,
>> > keeping the dup_fd around the fdsets may not help that situation so far.
>> >
>> > Cc: Dr. David Alan Gilbert <dave@treblig.org>
>> > Cc: Markus Armbruster <armbru@redhat.com>
>> > Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
>> > Cc: Paolo Bonzini <pbonzini@redhat.com>
>> > Cc: Daniel P. Berrangé <berrange@redhat.com>
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> >  include/monitor/monitor.h |  1 -
>> >  monitor/fds.c             | 27 +++++----------------------
>> >  stubs/fdset.c             |  5 -----
>> >  util/osdep.c              | 15 +--------------
>> >  4 files changed, 6 insertions(+), 42 deletions(-)
>> >
>> > diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
>> > index 965f5d5450..fd9b3f538c 100644
>> > --- a/include/monitor/monitor.h
>> > +++ b/include/monitor/monitor.h
>> > @@ -53,7 +53,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
>> >                                  const char *opaque, Error **errp);
>> >  int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
>> >  void monitor_fdset_dup_fd_remove(int dup_fd);
>> > -int64_t monitor_fdset_dup_fd_find(int dup_fd);
>> >  
>> >  void monitor_register_hmp(const char *name, bool info,
>> >                            void (*cmd)(Monitor *mon, const QDict *qdict));
>> > diff --git a/monitor/fds.c b/monitor/fds.c
>> > index d86c2c674c..d5aecfb70e 100644
>> > --- a/monitor/fds.c
>> > +++ b/monitor/fds.c
>> > @@ -458,7 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>> >  #endif
>> >  }
>> >  
>> > -static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
>> > +void monitor_fdset_dup_fd_remove(int dup_fd)
>> >  {
>> >      MonFdset *mon_fdset;
>> >      MonFdsetFd *mon_fdset_fd_dup;
>> > @@ -467,31 +467,14 @@ static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
>> >      QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
>> >          QLIST_FOREACH(mon_fdset_fd_dup, &mon_fdset->dup_fds, next) {
>> >              if (mon_fdset_fd_dup->fd == dup_fd) {
>> > -                if (remove) {
>> > -                    QLIST_REMOVE(mon_fdset_fd_dup, next);
>> > -                    g_free(mon_fdset_fd_dup);
>> > -                    if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
>> > -                        monitor_fdset_cleanup(mon_fdset);
>> > -                    }
>> > -                    return -1;
>> > -                } else {
>> > -                    return mon_fdset->id;
>> > +                QLIST_REMOVE(mon_fdset_fd_dup, next);
>> > +                g_free(mon_fdset_fd_dup);
>> > +                if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
>> > +                    monitor_fdset_cleanup(mon_fdset);
>> >                  }
>> >              }
>> >          }
>> >      }
>> > -
>> > -    return -1;
>> > -}
>> > -
>> > -int64_t monitor_fdset_dup_fd_find(int dup_fd)
>> > -{
>> > -    return monitor_fdset_dup_fd_find_remove(dup_fd, false);
>> > -}
>> > -
>> > -void monitor_fdset_dup_fd_remove(int dup_fd)
>> > -{
>> > -    monitor_fdset_dup_fd_find_remove(dup_fd, true);
>> >  }
>> >  
>> >  int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp)
>> > diff --git a/stubs/fdset.c b/stubs/fdset.c
>> > index d7c39a28ac..389e368a29 100644
>> > --- a/stubs/fdset.c
>> > +++ b/stubs/fdset.c
>> > @@ -9,11 +9,6 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>> >      return -1;
>> >  }
>> >  
>> > -int64_t monitor_fdset_dup_fd_find(int dup_fd)
>> > -{
>> > -    return -1;
>> > -}
>> > -
>> >  void monitor_fdset_dup_fd_remove(int dupfd)
>> >  {
>> >  }
>> > diff --git a/util/osdep.c b/util/osdep.c
>> > index e996c4744a..2d9749d060 100644
>> > --- a/util/osdep.c
>> > +++ b/util/osdep.c
>> > @@ -393,21 +393,8 @@ int qemu_open_old(const char *name, int flags, ...)
>> >  
>> >  int qemu_close(int fd)
>> >  {
>> > -    int64_t fdset_id;
>> > -
>> >      /* Close fd that was dup'd from an fdset */
>> > -    fdset_id = monitor_fdset_dup_fd_find(fd);
>> > -    if (fdset_id != -1) {
>> > -        int ret;
>> > -
>> > -        ret = close(fd);
>> > -        if (ret == 0) {
>> > -            monitor_fdset_dup_fd_remove(fd);
>> > -        }
>> > -
>> > -        return ret;
>> > -    }
>> > -
>> > +    monitor_fdset_dup_fd_remove(fd);
>> >      return close(fd);
>> >  }
>> >  
>> > -- 
>> > 2.44.0
>>
Peter Xu May 3, 2024, 9:56 p.m. UTC | #5
On Fri, May 03, 2024 at 06:31:06PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, May 03, 2024 at 04:56:08PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> >> >> When the migration using the "file:" URI was implemented, I don't
> >> >> think any of us noticed that if you pass in a file name with the
> >> >> format "/dev/fdset/N", this allows a file descriptor to be passed in
> >> >> to QEMU and that behaves just like the "fd:" URI. So the "file:"
> >> >> support has been added without regard for the fdset part and we got
> >> >> some things wrong.
> >> >> 
> >> >> The first issue is that we should not truncate the migration file if
> >> >> we're allowing an fd + offset. We need to leave the file contents
> >> >> untouched.
> >> >
> >> > I'm wondering whether we can use fallocate() instead on the ranges so that
> >> > we always don't open() with O_TRUNC.  Before that..  could you remind me
> >> > why do we need to truncate in the first place?  I definitely missed
> >> > something else here too.
> >> 
> >> AFAIK, just to avoid any issues if the file is pre-existing. I don't see
> >> the difference between O_TRUNC and fallocate in this case.
> >
> > Then, shall we avoid truncations at all, leaving all the feasibility to
> > user (also errors prone to make)?
> >
> 
> Is this a big deal? I'd rather close that possible gap and avoid the bug
> reports.

No possible of such report if the user uses Libvirt or even more virt
stacks, am I right?  While this is only for whoever uses QEMU directly, and
only if the one forgot to remove a leftover image file?

I'd not worry about those people who use QEMU directly - they aren't the
people we need to care too much about, imho (and I'm definitely one of
them..).  The problem is I feel it an overkill introducing a migration
global var just for this purpose.

No strong opinions, if you feel strongly like so I'm ok with it.  But if
one day if we want to remove FileOutgoingArgs I'll also leave that to you
as a trade-off. :-)

> 
> >> 
> >> >
> >> >> 
> >> >> The second issue is that there's an expectation that QEMU removes the
> >> >> fd after the migration has finished. That's what the "fd:" code
> >> >> does. Otherwise a second migration on the same VM could attempt to
> >> >> provide an fdset with the same name and QEMU would reject it.
> >> >
> >> > Let me check what we do when with "fd:" and when migration completes or
> >> > cancels.
> >> >
> >> > IIUC it's qio_channel_file_close() that does the final cleanup work on
> >> > e.g. to_dst_file, right?  Then there's qemu_close(), and it has:
> >> >
> >> >     /* Close fd that was dup'd from an fdset */
> >> >     fdset_id = monitor_fdset_dup_fd_find(fd);
> >> >     if (fdset_id != -1) {
> >> >         int ret;
> >> >
> >> >         ret = close(fd);
> >> >         if (ret == 0) {
> >> >             monitor_fdset_dup_fd_remove(fd);
> >> >         }
> >> >
> >> >         return ret;
> >> >     }
> >> >
> >> > Shouldn't this done the work already?
> >> 
> >> That removes the mon_fdset_fd_dup->fd, we want to remove the
> >> mon_fdset_fd->fd.
> >
> > What I read so far is when we are removing the dup-fds, we'll do one more
> > thing:
> >
> > monitor_fdset_dup_fd_find_remove():
> >                     if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
> >                         monitor_fdset_cleanup(mon_fdset);
> >                     }
> >
> > It means if we removed all the dup-fds correctly, we should also remove the
> > whole fdset, which includes the ->fds, IIUC.
> >
> 
> Since mon_fdset_fd->removed == false, we hit the runstate_is_running()
> problem. I'm not sure, but probably mon_refcount > 0 as well. So the fd
> would not be removed.
> 
> But I'll retest this on Monday just be sure, it's been a while since I
> wrote some parts of this.

Thanks.  And I hope we can also get some more clues too when you dig out
more out of the whole add-fd API; I hope we don't pile up more complicated
logics on top of a mistery.  I feel like this is the time we figure things
out.
Daniel P. Berrangé May 8, 2024, 8 a.m. UTC | #6
On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> When the migration using the "file:" URI was implemented, I don't
> think any of us noticed that if you pass in a file name with the
> format "/dev/fdset/N", this allows a file descriptor to be passed in
> to QEMU and that behaves just like the "fd:" URI. So the "file:"
> support has been added without regard for the fdset part and we got
> some things wrong.
> 
> The first issue is that we should not truncate the migration file if
> we're allowing an fd + offset. We need to leave the file contents
> untouched.
> 
> The second issue is that there's an expectation that QEMU removes the
> fd after the migration has finished. That's what the "fd:" code
> does. Otherwise a second migration on the same VM could attempt to
> provide an fdset with the same name and QEMU would reject it.
> 
> We can fix the first issue by detecting when we're using the fdset
> vs. the plain file name. This requires storing the fdset_id
> somewhere. We can then use this stored fdset_id to do cleanup at the
> end and also fix the second issue.

The use of /dev/fdset is supposed to be transparent to code in
QEMU, so modifying migration to learn about FD sets to do manual
cleanup is breaking that API facade.

IMHO the transparency of the design points towards the mgmt app
calling 'remove-fd' set after migration has started, in order
that a later migraiton can use the same fdset name.

Ideally the truncation issue needs to be transparent too.

Rather than detecting use of fdset, we can not use O_TRUNC
at all. Instead we can call ftruncate(fd, offset), which
should work in both normal and fdset scenarios.

> 
> Fixes: 385f510df5 ("migration: file URI offset")
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/file.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/file.c b/migration/file.c
> index ab18ba505a..8f30999400 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -10,6 +10,7 @@
>  #include "qemu/cutils.h"
>  #include "qemu/error-report.h"
>  #include "qapi/error.h"
> +#include "qapi/qapi-commands-misc.h"
>  #include "channel.h"
>  #include "file.h"
>  #include "migration.h"
> @@ -23,6 +24,7 @@
>  
>  static struct FileOutgoingArgs {
>      char *fname;
> +    int64_t fdset_id;
>  } outgoing_args;
>  
>  /* Remove the offset option from @filespec and return it in @offsetp. */
> @@ -44,10 +46,39 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
>      return 0;
>  }
>  
> +static void file_remove_fdset(void)
> +{
> +    if (outgoing_args.fdset_id != -1) {
> +        qmp_remove_fd(outgoing_args.fdset_id, false, -1, NULL);
> +        outgoing_args.fdset_id = -1;
> +    }
> +}
> +
> +static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
> +                             Error **errp)
> +{
> +    const char *fdset_id_str;
> +
> +    *fdset_id = -1;
> +
> +    if (!strstart(filename, "/dev/fdset/", &fdset_id_str)) {
> +        return true;
> +    }
> +
> +    *fdset_id = qemu_parse_fd(fdset_id_str);
> +    if (*fdset_id == -1) {
> +        error_setg_errno(errp, EINVAL, "Could not parse fdset %s", fdset_id_str);
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
>  void file_cleanup_outgoing_migration(void)
>  {
>      g_free(outgoing_args.fname);
>      outgoing_args.fname = NULL;
> +    file_remove_fdset();
>  }
>  
>  bool file_send_channel_create(gpointer opaque, Error **errp)
> @@ -81,11 +112,24 @@ void file_start_outgoing_migration(MigrationState *s,
>      g_autofree char *filename = g_strdup(file_args->filename);
>      uint64_t offset = file_args->offset;
>      QIOChannel *ioc;
> +    int flags = O_CREAT | O_WRONLY;
>  
>      trace_migration_file_outgoing(filename);
>  
> -    fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
> -                                     0600, errp);
> +    if (!file_parse_fdset(filename, &outgoing_args.fdset_id, errp)) {
> +        return;
> +    }
> +
> +    /*
> +     * Only truncate if it's QEMU opening the file. If an fd has been
> +     * passed in the file will already contain data written by the
> +     * management layer.
> +     */
> +    if (outgoing_args.fdset_id == -1) {
> +        flags |= O_TRUNC;
> +    }
> +
> +    fioc = qio_channel_file_new_path(filename, flags, 0600, errp);
>      if (!fioc) {
>          return;
>      }
> -- 
> 2.35.3
> 

With regards,
Daniel
Daniel P. Berrangé May 8, 2024, 8:02 a.m. UTC | #7
On Fri, May 03, 2024 at 12:23:51PM -0400, Peter Xu wrote:
> On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> > When the migration using the "file:" URI was implemented, I don't
> > think any of us noticed that if you pass in a file name with the
> > format "/dev/fdset/N", this allows a file descriptor to be passed in
> > to QEMU and that behaves just like the "fd:" URI. So the "file:"
> > support has been added without regard for the fdset part and we got
> > some things wrong.
> > 
> > The first issue is that we should not truncate the migration file if
> > we're allowing an fd + offset. We need to leave the file contents
> > untouched.
> 
> I'm wondering whether we can use fallocate() instead on the ranges so that
> we always don't open() with O_TRUNC.  Before that..  could you remind me
> why do we need to truncate in the first place?  I definitely missed
> something else here too.

You're mixing distinct concepts here. fallocate makes a file region
non-sparse, while O_TRUNC removes all existing allocation, making it
sparse if we write at non-contiguous offsets. I don't think we would
want to call fallocate, since we /want/ a sparse file so that we
don't needlessly store large regions of all-zeros as RAM maps.


With regards,
Daniel
Peter Xu May 8, 2024, 12:49 p.m. UTC | #8
On Wed, May 08, 2024 at 09:02:16AM +0100, Daniel P. Berrangé wrote:
> On Fri, May 03, 2024 at 12:23:51PM -0400, Peter Xu wrote:
> > On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> > > When the migration using the "file:" URI was implemented, I don't
> > > think any of us noticed that if you pass in a file name with the
> > > format "/dev/fdset/N", this allows a file descriptor to be passed in
> > > to QEMU and that behaves just like the "fd:" URI. So the "file:"
> > > support has been added without regard for the fdset part and we got
> > > some things wrong.
> > > 
> > > The first issue is that we should not truncate the migration file if
> > > we're allowing an fd + offset. We need to leave the file contents
> > > untouched.
> > 
> > I'm wondering whether we can use fallocate() instead on the ranges so that
> > we always don't open() with O_TRUNC.  Before that..  could you remind me
> > why do we need to truncate in the first place?  I definitely missed
> > something else here too.
> 
> You're mixing distinct concepts here. fallocate makes a file region
> non-sparse, while O_TRUNC removes all existing allocation, making it
> sparse if we write at non-contiguous offsets. I don't think we would
> want to call fallocate, since we /want/ a sparse file so that we
> don't needlessly store large regions of all-zeros as RAM maps.

I meant fallocate() with FALLOC_FL_PUNCH_HOLE.  But now I think it'll be
good we avoid both.

Thanks,
Fabiano Rosas May 8, 2024, 8:45 p.m. UTC | #9
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
>> When the migration using the "file:" URI was implemented, I don't
>> think any of us noticed that if you pass in a file name with the
>> format "/dev/fdset/N", this allows a file descriptor to be passed in
>> to QEMU and that behaves just like the "fd:" URI. So the "file:"
>> support has been added without regard for the fdset part and we got
>> some things wrong.
>> 
>> The first issue is that we should not truncate the migration file if
>> we're allowing an fd + offset. We need to leave the file contents
>> untouched.
>> 
>> The second issue is that there's an expectation that QEMU removes the
>> fd after the migration has finished. That's what the "fd:" code
>> does. Otherwise a second migration on the same VM could attempt to
>> provide an fdset with the same name and QEMU would reject it.
>> 
>> We can fix the first issue by detecting when we're using the fdset
>> vs. the plain file name. This requires storing the fdset_id
>> somewhere. We can then use this stored fdset_id to do cleanup at the
>> end and also fix the second issue.
>
> The use of /dev/fdset is supposed to be transparent to code in
> QEMU, so modifying migration to learn about FD sets to do manual
> cleanup is breaking that API facade.
>
> IMHO the transparency of the design points towards the mgmt app
> calling 'remove-fd' set after migration has started, in order
> that a later migraiton can use the same fdset name.

I got this slightly wrong, QEMU doesn't reject the creation of the
fdset, it just reuses the old one and adds the new fd to it. That is
somewhat worse because then we'd choose the wrong fd when migrating. But
I guess we could just require the management layer to do proper
management of the fds/fdset.

>
> Ideally the truncation issue needs to be transparent too.
>
> Rather than detecting use of fdset, we can not use O_TRUNC
> at all. Instead we can call ftruncate(fd, offset), which
> should work in both normal and fdset scenarios.
>

Good idea.

>> 
>> Fixes: 385f510df5 ("migration: file URI offset")
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/file.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 46 insertions(+), 2 deletions(-)
>> 
>> diff --git a/migration/file.c b/migration/file.c
>> index ab18ba505a..8f30999400 100644
>> --- a/migration/file.c
>> +++ b/migration/file.c
>> @@ -10,6 +10,7 @@
>>  #include "qemu/cutils.h"
>>  #include "qemu/error-report.h"
>>  #include "qapi/error.h"
>> +#include "qapi/qapi-commands-misc.h"
>>  #include "channel.h"
>>  #include "file.h"
>>  #include "migration.h"
>> @@ -23,6 +24,7 @@
>>  
>>  static struct FileOutgoingArgs {
>>      char *fname;
>> +    int64_t fdset_id;
>>  } outgoing_args;
>>  
>>  /* Remove the offset option from @filespec and return it in @offsetp. */
>> @@ -44,10 +46,39 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
>>      return 0;
>>  }
>>  
>> +static void file_remove_fdset(void)
>> +{
>> +    if (outgoing_args.fdset_id != -1) {
>> +        qmp_remove_fd(outgoing_args.fdset_id, false, -1, NULL);
>> +        outgoing_args.fdset_id = -1;
>> +    }
>> +}
>> +
>> +static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>> +                             Error **errp)
>> +{
>> +    const char *fdset_id_str;
>> +
>> +    *fdset_id = -1;
>> +
>> +    if (!strstart(filename, "/dev/fdset/", &fdset_id_str)) {
>> +        return true;
>> +    }
>> +
>> +    *fdset_id = qemu_parse_fd(fdset_id_str);
>> +    if (*fdset_id == -1) {
>> +        error_setg_errno(errp, EINVAL, "Could not parse fdset %s", fdset_id_str);
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +
>>  void file_cleanup_outgoing_migration(void)
>>  {
>>      g_free(outgoing_args.fname);
>>      outgoing_args.fname = NULL;
>> +    file_remove_fdset();
>>  }
>>  
>>  bool file_send_channel_create(gpointer opaque, Error **errp)
>> @@ -81,11 +112,24 @@ void file_start_outgoing_migration(MigrationState *s,
>>      g_autofree char *filename = g_strdup(file_args->filename);
>>      uint64_t offset = file_args->offset;
>>      QIOChannel *ioc;
>> +    int flags = O_CREAT | O_WRONLY;
>>  
>>      trace_migration_file_outgoing(filename);
>>  
>> -    fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
>> -                                     0600, errp);
>> +    if (!file_parse_fdset(filename, &outgoing_args.fdset_id, errp)) {
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * Only truncate if it's QEMU opening the file. If an fd has been
>> +     * passed in the file will already contain data written by the
>> +     * management layer.
>> +     */
>> +    if (outgoing_args.fdset_id == -1) {
>> +        flags |= O_TRUNC;
>> +    }
>> +
>> +    fioc = qio_channel_file_new_path(filename, flags, 0600, errp);
>>      if (!fioc) {
>>          return;
>>      }
>> -- 
>> 2.35.3
>> 
>
> With regards,
> Daniel
diff mbox series

Patch

diff --git a/migration/file.c b/migration/file.c
index ab18ba505a..8f30999400 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -10,6 +10,7 @@ 
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "qapi/qapi-commands-misc.h"
 #include "channel.h"
 #include "file.h"
 #include "migration.h"
@@ -23,6 +24,7 @@ 
 
 static struct FileOutgoingArgs {
     char *fname;
+    int64_t fdset_id;
 } outgoing_args;
 
 /* Remove the offset option from @filespec and return it in @offsetp. */
@@ -44,10 +46,39 @@  int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
     return 0;
 }
 
+static void file_remove_fdset(void)
+{
+    if (outgoing_args.fdset_id != -1) {
+        qmp_remove_fd(outgoing_args.fdset_id, false, -1, NULL);
+        outgoing_args.fdset_id = -1;
+    }
+}
+
+static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
+                             Error **errp)
+{
+    const char *fdset_id_str;
+
+    *fdset_id = -1;
+
+    if (!strstart(filename, "/dev/fdset/", &fdset_id_str)) {
+        return true;
+    }
+
+    *fdset_id = qemu_parse_fd(fdset_id_str);
+    if (*fdset_id == -1) {
+        error_setg_errno(errp, EINVAL, "Could not parse fdset %s", fdset_id_str);
+        return false;
+    }
+
+    return true;
+}
+
 void file_cleanup_outgoing_migration(void)
 {
     g_free(outgoing_args.fname);
     outgoing_args.fname = NULL;
+    file_remove_fdset();
 }
 
 bool file_send_channel_create(gpointer opaque, Error **errp)
@@ -81,11 +112,24 @@  void file_start_outgoing_migration(MigrationState *s,
     g_autofree char *filename = g_strdup(file_args->filename);
     uint64_t offset = file_args->offset;
     QIOChannel *ioc;
+    int flags = O_CREAT | O_WRONLY;
 
     trace_migration_file_outgoing(filename);
 
-    fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
-                                     0600, errp);
+    if (!file_parse_fdset(filename, &outgoing_args.fdset_id, errp)) {
+        return;
+    }
+
+    /*
+     * Only truncate if it's QEMU opening the file. If an fd has been
+     * passed in the file will already contain data written by the
+     * management layer.
+     */
+    if (outgoing_args.fdset_id == -1) {
+        flags |= O_TRUNC;
+    }
+
+    fioc = qio_channel_file_new_path(filename, flags, 0600, errp);
     if (!fioc) {
         return;
     }