util: retry open() when it gets interrupted by a signal

Message ID 20240731132524.308273-1-philipp.reisner@linbit.com
State New

Commit Message

Philipp Reisner July 31, 2024, 1:25 p.m. UTC
As with many syscalls, open() might be interrupted by a signal.

The log entry we observed is:

qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on,serial=1b990c4d13b74a4e90ea: Could not open '/dev/drbd1003': Interrupted system call

Retry the call until it is no longer interrupted by a signal.
FYI, dd has the same kind of loop around open().
https://github.com/coreutils/coreutils/blob/1ae98dbda7322427e8226356fd110d2553f5fac9/src/dd.c#L1294-L1299

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
---
 util/osdep.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

David Hildenbrand July 31, 2024, 2 p.m. UTC | #1
On 31.07.24 15:25, Philipp Reisner wrote:
> As with many syscalls, open() might be interrupted by a signal.
> 
> The experienced logfile entry is:
> 
> qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on,serial=1b990c4d13b74a4e90ea: Could not open '/dev/drbd1003': Interrupted system call
> 
> Retry it until it is not interrupted by a signal.
> FYI, dd has the same kind of loop around open().
> https://github.com/coreutils/coreutils/blob/1ae98dbda7322427e8226356fd110d2553f5fac9/src/dd.c#L1294-L1299
> 
> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
> ---
>   util/osdep.c | 13 ++++++++-----
>   1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/util/osdep.c b/util/osdep.c
> index 770369831b..a1269d9345 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -294,14 +294,17 @@ bool qemu_has_direct_io(void)
>   static int qemu_open_cloexec(const char *name, int flags, mode_t mode)
>   {
>       int ret;
> +    do  {
>   #ifdef O_CLOEXEC
> -    ret = open(name, flags | O_CLOEXEC, mode);
> +        ret = open(name, flags | O_CLOEXEC, mode);
>   #else
> -    ret = open(name, flags, mode);
> -    if (ret >= 0) {
> -        qemu_set_cloexec(ret);
> -    }
> +        ret = open(name, flags, mode);
> +        if (ret >= 0) {
> +            qemu_set_cloexec(ret);
> +        }
>   #endif
> +    } while (ret == -1 && errno == EINTR);
> +
>       return ret;
>   }
>   

Reviewed-by: David Hildenbrand <david@redhat.com>
Daniel P. Berrangé July 31, 2024, 2:10 p.m. UTC | #2
On Wed, Jul 31, 2024 at 03:25:24PM +0200, Philipp Reisner wrote:
> As with many syscalls, open() might be interrupted by a signal.
> 
> The experienced logfile entry is:
> 
> qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on,serial=1b990c4d13b74a4e90ea: Could not open '/dev/drbd1003': Interrupted system call
> 
> Retry it until it is not interrupted by a signal.

As you say, many syscalls can be interrupted by signals, so
special casing open() isn't really a solution - it's just
addressing one specific instance you happened to see.

If there are certain signals that we don't want to have a
fatal interruption for, it'd be better to set SA_RESTART
with sigaction, which will auto-restart a large set of
syscalls, while allowing other signals to be fatal.
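
To illustrate the SA_RESTART approach mentioned above, here is a minimal sketch of registering a handler with or without that flag. The names install_handler() and noop_handler() are hypothetical and are not QEMU functions; which signals QEMU would want to treat this way is not settled in this thread.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* Illustrative no-op handler; a real handler would do actual work. */
static void noop_handler(int signo)
{
    (void)signo;
}

/*
 * Hypothetical helper (not a QEMU function): install noop_handler for
 * signo. With restart != 0 the handler is registered with SA_RESTART,
 * so restartable syscalls interrupted by this signal are resumed by
 * the kernel instead of failing with EINTR.
 * Returns 0 on success, -1 on error (errno is set by sigaction()).
 */
int install_handler(int signo, int restart)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;

    return sigaction(signo, &sa, NULL);
}
```

Signals whose handlers are installed without the flag keep interrupting blocking calls, so the choice can be made per signal.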

> FYI, dd has the same kind of loop around open().
> https://github.com/coreutils/coreutils/blob/1ae98dbda7322427e8226356fd110d2553f5fac9/src/dd.c#L1294-L1299
> 
> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
> ---
>  util/osdep.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/util/osdep.c b/util/osdep.c
> index 770369831b..a1269d9345 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -294,14 +294,17 @@ bool qemu_has_direct_io(void)
>  static int qemu_open_cloexec(const char *name, int flags, mode_t mode)
>  {
>      int ret;
> +    do  {
>  #ifdef O_CLOEXEC
> -    ret = open(name, flags | O_CLOEXEC, mode);
> +        ret = open(name, flags | O_CLOEXEC, mode);
>  #else
> -    ret = open(name, flags, mode);
> -    if (ret >= 0) {
> -        qemu_set_cloexec(ret);
> -    }
> +        ret = open(name, flags, mode);
> +        if (ret >= 0) {
> +            qemu_set_cloexec(ret);
> +        }
>  #endif
> +    } while (ret == -1 && errno == EINTR);
> +
>      return ret;
>  }
>  
> -- 
> 2.45.2
> 
> 

With regards,
Daniel
Peter Maydell July 31, 2024, 2:32 p.m. UTC | #3
On Wed, 31 Jul 2024 at 15:11, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Wed, Jul 31, 2024 at 03:25:24PM +0200, Philipp Reisner wrote:
> > As with many syscalls, open() might be interrupted by a signal.
> >
> > The experienced logfile entry is:
> >
> > qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on,serial=1b990c4d13b74a4e90ea: Could not open '/dev/drbd1003': Interrupted system call
> >
> > Retry it until it is not interrupted by a signal.
>
> As you say, many syscalls can be interrupted by signals, so
> special casing open() isn't really a solution - it's just
> addressing one specific instance you happened to see.
>
> If there are certain signals that we don't want to have a
> fatal interruption for, it'd be better to set SA_RESTART
> with sigaction, which will auto-restart a large set of
> syscalls, while allowing other signals to be fatal.

This is why we have the RETRY_ON_EINTR() macro, right?

Currently we have some places that call qemu_open_old() inside
RETRY_ON_EINTR -- we should decide whether we want to
handle EINTR inside the qemu_open family of functions,
or make the caller deal with it, and put the macro uses
in the right place consistently.
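
As a sketch of the "handle EINTR inside the open helper" option, the retry can live in one wrapper so that callers never see EINTR. The function open_retry_eintr() below is a hypothetical name used for illustration; it is neither QEMU's RETRY_ON_EINTR() macro nor part of the qemu_open family.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical wrapper (illustration only): call open() again for as
 * long as it fails with EINTR. Any other result, success or failure,
 * is returned to the caller unchanged, with errno left as open() set it.
 */
int open_retry_eintr(const char *name, int flags, mode_t mode)
{
    int fd;

    do {
        fd = open(name, flags, mode);
    } while (fd == -1 && errno == EINTR);

    return fd;
}
```

The alternative is to leave the helper untouched and wrap each call site; either way, the retry should sit in one consistent place.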

I agree that it would be nicer if we could use SA_RESTART,
but presumably there's a reason why we don't. (At any
rate code that's shared with the user-mode emulation
has to be EINTR-resistant, because we can't force the
user-mode guest code to avoid registering signal handlers
that aren't SA_RESTART.)

thanks
-- PMM
Daniel P. Berrangé July 31, 2024, 3:21 p.m. UTC | #4
On Wed, Jul 31, 2024 at 03:32:52PM +0100, Peter Maydell wrote:
> On Wed, 31 Jul 2024 at 15:11, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Wed, Jul 31, 2024 at 03:25:24PM +0200, Philipp Reisner wrote:
> > > As with many syscalls, open() might be interrupted by a signal.
> > >
> > > The experienced logfile entry is:
> > >
> > > qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on,serial=1b990c4d13b74a4e90ea: Could not open '/dev/drbd1003': Interrupted system call

What is the actual signal you are seeing that impacts QEMU
in this way ?

> > > Retry it until it is not interrupted by a signal.
> >
> > As you say, many syscalls can be interrupted by signals, so
> > special casing open() isn't really a solution - it's just
> > addressing one specific instance you happened to see.
> >
> > If there are certain signals that we don't want to have a
> > fatal interruption for, it'd be better to set SA_RESTART
> > with sigaction, which will auto-restart a large set of
> > syscalls, while allowing other signals to be fatal.
> 
> This is why we have the RETRY_ON_EINTR() macro, right?
> 
> Currently we have some places that call qemu_open_old() inside
> RETRY_ON_EINTR -- we should decide whether we want to
> handle EINTR inside the qemu_open family of functions,
> or make the caller deal with it, and put the macro uses
> in the right place consistently.

It is incredibly arbitrary where we use RETRY_ON_EINTR, which I think
points towards it being a sub-optimal solution to the general problem.

> 
> I agree that it would be nicer if we could use SA_RESTART,
> but presumably there's a reason why we don't. (At any
> rate code that's shared with the user-mode emulation
> has to be EINTR-resistant, because we can't force the
> user-mode guest code to avoid registering signal handlers
> that aren't SA_RESTART.)

For user mode emulation, isn't it valid to just propagate the
EINTR back up to the application, since EINTR is a valid errno
they have to be willing to handle unless the app has itself
used SA_RESTART?

With regards,
Daniel
Peter Maydell July 31, 2024, 3:24 p.m. UTC | #5
On Wed, 31 Jul 2024 at 16:21, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Wed, Jul 31, 2024 at 03:32:52PM +0100, Peter Maydell wrote:
> > This is why we have the RETRY_ON_EINTR() macro, right?
> >
> > Currently we have some places that call qemu_open_old() inside
> > RETRY_ON_EINTR -- we should decide whether we want to
> > handle EINTR inside the qemu_open family of functions,
> > or make the caller deal with it, and put the macro uses
> > in the right place consistently.
>
> It is incredibly arbitrary where we use RETRY_ON_EINTR, which I think
> points towards it being a sub-optimal solution to the general problem.

Agreed (and agreed that SA_RESTART is the usual approach to
avoid this mess). Partly I just vaguely recall discussions
about this back when we added/improved the RETRY_ON_EINTR
macro in the first place: maybe there's a reason we have it
still...

> > I agree that it would be nicer if we could use SA_RESTART,
> > but presumably there's a reason why we don't. (At any
> > rate code that's shared with the user-mode emulation
> > has to be EINTR-resistant, because we can't force the
> > user-mode guest code to avoid registering signal handlers
> > that aren't SA_RESTART.)
>
> For user mode emulation, isn't it valid to just propagate the
> EINTR back up to the application, since EINTR is a valid errno
> they have to be willing to handle unless the app has itself
> used SA_RESTART?

Yes, that's what we must do for cases where we are doing some
syscall on behalf of the guest. But for cases where we're
doing a syscall because of something QEMU itself needs to do,
we may need to retry, because we might not be in a position
to be able to back out of what we're doing (or we might not
even be inside the "handle a guest syscall" codepath at all).

-- PMM
Daniel P. Berrangé July 31, 2024, 3:34 p.m. UTC | #6
On Wed, Jul 31, 2024 at 04:24:45PM +0100, Peter Maydell wrote:
> On Wed, 31 Jul 2024 at 16:21, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Wed, Jul 31, 2024 at 03:32:52PM +0100, Peter Maydell wrote:
> > > This is why we have the RETRY_ON_EINTR() macro, right?
> > >
> > > Currently we have some places that call qemu_open_old() inside
> > > RETRY_ON_EINTR -- we should decide whether we want to
> > > handle EINTR inside the qemu_open family of functions,
> > > or make the caller deal with it, and put the macro uses
> > > in the right place consistently.
> >
> > It is incredibly arbitrary where we use RETRY_ON_EINTR, which I think
> > points towards it being a sub-optimal solution to the general problem.
> 
> Agreed (and agreed that SA_RESTART is the usual approach to
> avoid this mess). Partly I just vaguely recall discussions
> about this back when we added/improved the RETRY_ON_EINTR
> macro in the first place: maybe there's a reason we have it
> still...
> 
> > > I agree that it would be nicer if we could use SA_RESTART,
> > > but presumably there's a reason why we don't. (At any
> > > rate code that's shared with the user-mode emulation
> > > has to be EINTR-resistant, because we can't force the
> > > user-mode guest code to avoid registering signal handlers
> > > that aren't SA_RESTART.)
> >
> > For user mode emulation, isn't it valid to just propagate the
> > EINTR back up to the application, since EINTR is a valid errno
> > they have to be willing to handle unless the app has itself
> > used SA_RESTART?
> 
> Yes, that's what we must do for cases where we are doing some
> syscall on behalf of the guest. But for cases where we're
> doing a syscall because of something QEMU itself needs to do,
> we may need to retry, because we might not be in a position
> to be able to back out of what we're doing (or we might not
> even be inside the "handle a guest syscall" codepath at all).

Ah ok, so RETRY_ON_EINTR conceivably makes sense in the linux-user
/ bsd-user code in certain scenarios... but it seems almost every
single use today is in system emulator code!

With regards,
Daniel
Philipp Reisner July 31, 2024, 3:34 p.m. UTC | #7
Hi Daniel,

> > > > The experienced logfile entry is:
> > > >
> > > > qemu-system-x86_64: -device virtio-blk-pci,bus=pci.0,addr=0x7,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on,serial=1b990c4d13b74a4e90ea: Could not open '/dev/drbd1003': Interrupted system call
>
> What is the actual signal you are seeing that impacts QEMU
> in this way ?
>

I do not know at this point. This only reproduces on a customer's
system that we do not have access to. We do not see it in our in-house
lab. QEMU is started through libvirt, which in turn is driven by
Apache CloudStack. The problem affects only about 10%-20% of the VM
start operations.

I will wrap my head around bpftrace and see if I can instruct the
customer to run that on his systems. So, maybe I can answer the
question regarding the signal in a few days. Maybe next week.

The backing device we use (DRBD) does an "auto promote" action in its
open implementation. That involves exchanging some packets with some
peers on the local network. I guess that takes between 1 ms and 10 ms.
So, it exposes a larger time window than other backing block devices,
which probably have a shorter-running open implementation.

So this is why we see it sometimes.
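
The effect of a slow open() can be sketched without DRBD. The example below is an assumption-laden stand-in: it uses a FIFO, whose read-only open() blocks until a writer appears, in place of the DRBD device, and a 50 ms interval timer in place of the still unknown signal. The function name slow_open_errno() and the temporary path are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

/* No-op handler: its only purpose is to interrupt the blocking open(). */
static void on_alarm(int signo)
{
    (void)signo;
}

/*
 * Hypothetical demo: open a FIFO read-only (blocks, no writer exists)
 * while a SIGALRM handler without SA_RESTART is installed.
 * Returns the errno of the failed open(), 0 if open() unexpectedly
 * succeeded, or -1 if the setup failed.
 */
int slow_open_errno(void)
{
    char dir[] = "/tmp/eintr-demo-XXXXXX";
    char path[64];
    struct sigaction sa, old;
    struct itimerval fire = { { 0, 0 }, { 0, 50000 } }; /* one shot, 50 ms */
    struct itimerval off = { { 0, 0 }, { 0, 0 } };
    int fd, err;

    if (mkdtemp(dir) == NULL) {
        return -1;
    }
    snprintf(path, sizeof(path), "%s/fifo", dir);
    if (mkfifo(path, 0600) != 0) {
        rmdir(dir);
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* deliberately no SA_RESTART */
    sigaction(SIGALRM, &sa, &old);
    setitimer(ITIMER_REAL, &fire, NULL);

    fd = open(path, O_RDONLY);       /* blocks until writer or signal */
    err = (fd < 0) ? errno : 0;

    /* Undo the timer and handler, then remove the temporary files. */
    setitimer(ITIMER_REAL, &off, NULL);
    sigaction(SIGALRM, &old, NULL);
    if (fd >= 0) {
        close(fd);
    }
    unlink(path);
    rmdir(dir);
    return err;
}
```

On a typical Linux system the blocked open() is expected to fail with EINTR here. The longer open() stays blocked, the more likely it is that a signal arrives in that window.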

with regards,
 Philipp

Patch

diff --git a/util/osdep.c b/util/osdep.c
index 770369831b..a1269d9345 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -294,14 +294,17 @@  bool qemu_has_direct_io(void)
 static int qemu_open_cloexec(const char *name, int flags, mode_t mode)
 {
     int ret;
+    do {
 #ifdef O_CLOEXEC
-    ret = open(name, flags | O_CLOEXEC, mode);
+        ret = open(name, flags | O_CLOEXEC, mode);
 #else
-    ret = open(name, flags, mode);
-    if (ret >= 0) {
-        qemu_set_cloexec(ret);
-    }
+        ret = open(name, flags, mode);
+        if (ret >= 0) {
+            qemu_set_cloexec(ret);
+        }
 #endif
+    } while (ret == -1 && errno == EINTR);
+
     return ret;
 }