diff mbox series

[v2] block/io_uring: resubmit when result is -EAGAIN

Message ID 20210729091029.65369-1-f.ebner@proxmox.com
State New
Headers show
Series [v2] block/io_uring: resubmit when result is -EAGAIN | expand

Commit Message

Fiona Ebner July 29, 2021, 9:10 a.m. UTC
Linux SCSI can throw spurious -EAGAIN in some corner cases in its
completion path, which will end up being the result in the completed
io_uring request.

Resubmitting such requests should allow block jobs to complete, even
if such spurious errors are encountered.

Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---

Changes from v1:
    * Focus on what's relevant for the patch itself in the commit
      message.
    * Add Stefan's comment.
    * Add Stefano's R-b tag (I hope that's fine, since there was no
      change code-wise).

 block/io_uring.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

Comments

Stefano Garzarella July 29, 2021, 9:54 a.m. UTC | #1
On Thu, Jul 29, 2021 at 11:10:29AM +0200, Fabian Ebner wrote:
>Linux SCSI can throw spurious -EAGAIN in some corner cases in its
>completion path, which will end up being the result in the completed
>io_uring request.
>
>Resubmitting such requests should allow block jobs to complete, even
>if such spurious errors are encountered.
>
>Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
>Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
>---
>
>Changes from v1:
>    * Focus on what's relevant for the patch itself in the commit
>      message.
>    * Add Stefan's comment.
>    * Add Stefano's R-b tag (I hope that's fine, since there was no
>      change code-wise).

Yep, it's fine :-)

Thanks,
Stefano
Stefan Hajnoczi July 29, 2021, 4:15 p.m. UTC | #2
On Thu, Jul 29, 2021 at 11:10:29AM +0200, Fabian Ebner wrote:
> Linux SCSI can throw spurious -EAGAIN in some corner cases in its
> completion path, which will end up being the result in the completed
> io_uring request.
> 
> Resubmitting such requests should allow block jobs to complete, even
> if such spurious errors are encountered.
> 
> Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
> 
> Changes from v1:
>     * Focus on what's relevant for the patch itself in the commit
>       message.
>     * Add Stefan's comment.
>     * Add Stefano's R-b tag (I hope that's fine, since there was no
>       change code-wise).
> 
>  block/io_uring.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan
Kevin Wolf Aug. 2, 2021, 12:40 p.m. UTC | #3
Am 29.07.2021 um 11:10 hat Fabian Ebner geschrieben:
> Linux SCSI can throw spurious -EAGAIN in some corner cases in its
> completion path, which will end up being the result in the completed
> io_uring request.
> 
> Resubmitting such requests should allow block jobs to complete, even
> if such spurious errors are encountered.
> 
> Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
> 
> Changes from v1:
>     * Focus on what's relevant for the patch itself in the commit
>       message.
>     * Add Stefan's comment.
>     * Add Stefano's R-b tag (I hope that's fine, since there was no
>       change code-wise).
> 
>  block/io_uring.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/block/io_uring.c b/block/io_uring.c
> index 00a3ee9fb8..dfa475cc87 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -165,7 +165,21 @@ static void luring_process_completions(LuringState *s)
>          total_bytes = ret + luringcb->total_read;
>  
>          if (ret < 0) {
> -            if (ret == -EINTR) {
> +            /*
> +             * Only writev/readv/fsync requests on regular files or host block
> +             * devices are submitted. Therefore -EAGAIN is not expected but it's
> +             * known to happen sometimes with Linux SCSI. Submit again and hope
> +             * the request completes successfully.
> +             *
> +             * For more information, see:
> +             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
> +             *
> +             * If the code is changed to submit other types of requests in the
> +             * future, then this workaround may need to be extended to deal with
> +             * genuine -EAGAIN results that should not be resubmitted
> +             * immediately.
> +             */
> +            if (ret == -EINTR || ret == -EAGAIN) {
>                  luring_resubmit(s, luringcb);
>                  continue;
>              }

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Question about the preexisting code, though: luring_resubmit() requires
that the caller calls ioq_submit() later so that the request doesn't
just sit in a queue without getting any attention, but actually gets
submitted to the kernel.

In the call chain ioq_submit() -> luring_process_completions() ->
luring_resubmit(), who takes care of that?

Kevin
Stefano Garzarella Aug. 4, 2021, 2:50 p.m. UTC | #4
On Mon, Aug 02, 2021 at 02:40:36PM +0200, Kevin Wolf wrote:
>Am 29.07.2021 um 11:10 hat Fabian Ebner geschrieben:
>> Linux SCSI can throw spurious -EAGAIN in some corner cases in its
>> completion path, which will end up being the result in the completed
>> io_uring request.
>>
>> Resubmitting such requests should allow block jobs to complete, even
>> if such spurious errors are encountered.
>>
>> Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
>> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
>> ---
>>
>> Changes from v1:
>>     * Focus on what's relevant for the patch itself in the commit
>>       message.
>>     * Add Stefan's comment.
>>     * Add Stefano's R-b tag (I hope that's fine, since there was no
>>       change code-wise).
>>
>>  block/io_uring.c | 16 +++++++++++++++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/io_uring.c b/block/io_uring.c
>> index 00a3ee9fb8..dfa475cc87 100644
>> --- a/block/io_uring.c
>> +++ b/block/io_uring.c
>> @@ -165,7 +165,21 @@ static void luring_process_completions(LuringState *s)
>>          total_bytes = ret + luringcb->total_read;
>>
>>          if (ret < 0) {
>> -            if (ret == -EINTR) {
>> +            /*
>> +             * Only writev/readv/fsync requests on regular files or host block
>> +             * devices are submitted. Therefore -EAGAIN is not expected but it's
>> +             * known to happen sometimes with Linux SCSI. Submit again and hope
>> +             * the request completes successfully.
>> +             *
>> +             * For more information, see:
>> +             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
>> +             *
>> +             * If the code is changed to submit other types of requests in the
>> +             * future, then this workaround may need to be extended to deal with
>> +             * genuine -EAGAIN results that should not be resubmitted
>> +             * immediately.
>> +             */
>> +            if (ret == -EINTR || ret == -EAGAIN) {
>>                  luring_resubmit(s, luringcb);
>>                  continue;
>>              }
>
>Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>
>Question about the preexisting code, though: luring_resubmit() requires
>that the caller calls ioq_submit() later so that the request doesn't
>just sit in a queue without getting any attention, but actually gets
>submitted to the kernel.
>
>In the call chain ioq_submit() -> luring_process_completions() ->
>luring_resubmit(), who takes care of that?

Mmm, good point.
There should be the same problem with ioq_submit() -> 
luring_process_completions() -> luring_resubmit_short_read() -> 
luring_resubmit().

Should we schedule a BH for example in luring_resubmit() to make sure 
that ioq_submit() is invoked after a resubmission?

Thanks,
Stefano
Kevin Wolf Aug. 4, 2021, 4:52 p.m. UTC | #5
Am 04.08.2021 um 16:50 hat Stefano Garzarella geschrieben:
> On Mon, Aug 02, 2021 at 02:40:36PM +0200, Kevin Wolf wrote:
> > Am 29.07.2021 um 11:10 hat Fabian Ebner geschrieben:
> > > Linux SCSI can throw spurious -EAGAIN in some corner cases in its
> > > completion path, which will end up being the result in the completed
> > > io_uring request.
> > > 
> > > Resubmitting such requests should allow block jobs to complete, even
> > > if such spurious errors are encountered.
> > > 
> > > Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
> > > Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> > > Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> > > ---
> > > 
> > > Changes from v1:
> > >     * Focus on what's relevant for the patch itself in the commit
> > >       message.
> > >     * Add Stefan's comment.
> > >     * Add Stefano's R-b tag (I hope that's fine, since there was no
> > >       change code-wise).
> > > 
> > >  block/io_uring.c | 16 +++++++++++++++-
> > >  1 file changed, 15 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/block/io_uring.c b/block/io_uring.c
> > > index 00a3ee9fb8..dfa475cc87 100644
> > > --- a/block/io_uring.c
> > > +++ b/block/io_uring.c
> > > @@ -165,7 +165,21 @@ static void luring_process_completions(LuringState *s)
> > >          total_bytes = ret + luringcb->total_read;
> > > 
> > >          if (ret < 0) {
> > > -            if (ret == -EINTR) {
> > > +            /*
> > > +             * Only writev/readv/fsync requests on regular files or host block
> > > +             * devices are submitted. Therefore -EAGAIN is not expected but it's
> > > +             * known to happen sometimes with Linux SCSI. Submit again and hope
> > > +             * the request completes successfully.
> > > +             *
> > > +             * For more information, see:
> > > +             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
> > > +             *
> > > +             * If the code is changed to submit other types of requests in the
> > > +             * future, then this workaround may need to be extended to deal with
> > > +             * genuine -EAGAIN results that should not be resubmitted
> > > +             * immediately.
> > > +             */
> > > +            if (ret == -EINTR || ret == -EAGAIN) {
> > >                  luring_resubmit(s, luringcb);
> > >                  continue;
> > >              }
> > 
> > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> > 
> > Question about the preexisting code, though: luring_resubmit() requires
> > that the caller calls ioq_submit() later so that the request doesn't
> > just sit in a queue without getting any attention, but actually gets
> > submitted to the kernel.
> > 
> > In the call chain ioq_submit() -> luring_process_completions() ->
> > luring_resubmit(), who takes care of that?
> 
> Mmm, good point.
> There should be the same problem with ioq_submit() ->
> luring_process_completions() -> luring_resubmit_short_read() ->
> luring_resubmit().
> 
> Should we schedule a BH for example in luring_resubmit() to make sure that
> ioq_submit() is invoked after a resubmission?

Or just loop in ioq_submit() after calling luring_process_completions()
if new requests were added to the queue?

Kevin
Stefano Garzarella Aug. 5, 2021, 8:31 a.m. UTC | #6
On Wed, Aug 04, 2021 at 06:52:15PM +0200, Kevin Wolf wrote:
>Am 04.08.2021 um 16:50 hat Stefano Garzarella geschrieben:
>> On Mon, Aug 02, 2021 at 02:40:36PM +0200, Kevin Wolf wrote:
>> > Am 29.07.2021 um 11:10 hat Fabian Ebner geschrieben:
>> > > Linux SCSI can throw spurious -EAGAIN in some corner cases in its
>> > > completion path, which will end up being the result in the completed
>> > > io_uring request.
>> > >
>> > > Resubmitting such requests should allow block jobs to complete, even
>> > > if such spurious errors are encountered.
>> > >
>> > > Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
>> > > Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>> > > Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
>> > > ---
>> > >
>> > > Changes from v1:
>> > >     * Focus on what's relevant for the patch itself in the commit
>> > >       message.
>> > >     * Add Stefan's comment.
>> > >     * Add Stefano's R-b tag (I hope that's fine, since there was no
>> > >       change code-wise).
>> > >
>> > >  block/io_uring.c | 16 +++++++++++++++-
>> > >  1 file changed, 15 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/block/io_uring.c b/block/io_uring.c
>> > > index 00a3ee9fb8..dfa475cc87 100644
>> > > --- a/block/io_uring.c
>> > > +++ b/block/io_uring.c
>> > > @@ -165,7 +165,21 @@ static void luring_process_completions(LuringState *s)
>> > >          total_bytes = ret + luringcb->total_read;
>> > >
>> > >          if (ret < 0) {
>> > > -            if (ret == -EINTR) {
>> > > +            /*
>> > > +             * Only writev/readv/fsync requests on regular files or host block
>> > > +             * devices are submitted. Therefore -EAGAIN is not expected but it's
>> > > +             * known to happen sometimes with Linux SCSI. Submit again and hope
>> > > +             * the request completes successfully.
>> > > +             *
>> > > +             * For more information, see:
>> > > +             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
>> > > +             *
>> > > +             * If the code is changed to submit other types of requests in the
>> > > +             * future, then this workaround may need to be extended to deal with
>> > > +             * genuine -EAGAIN results that should not be resubmitted
>> > > +             * immediately.
>> > > +             */
>> > > +            if (ret == -EINTR || ret == -EAGAIN) {
>> > >                  luring_resubmit(s, luringcb);
>> > >                  continue;
>> > >              }
>> >
>> > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>> >
>> > Question about the preexisting code, though: luring_resubmit() requires
>> > that the caller calls ioq_submit() later so that the request doesn't
>> > just sit in a queue without getting any attention, but actually gets
>> > submitted to the kernel.
>> >
>> > In the call chain ioq_submit() -> luring_process_completions() ->
>> > luring_resubmit(), who takes care of that?
>>
>> Mmm, good point.
>> There should be the same problem with ioq_submit() ->
>> luring_process_completions() -> luring_resubmit_short_read() ->
>> luring_resubmit().
>>
>> Should we schedule a BH for example in luring_resubmit() to make sure that
>> ioq_submit() is invoked after a resubmission?
>
>Or just loop in ioq_submit() after calling luring_process_completions()
>if new requests were added to the queue?
>

I was just concerned that we might cycle a bit if a request always 
returns -EAGAIN, while scheduling a task might give room for other 
devices to queue other requests.

But maybe this happens so occasionally that we might not worry about 
it...

Stefano
diff mbox series

Patch

diff --git a/block/io_uring.c b/block/io_uring.c
index 00a3ee9fb8..dfa475cc87 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -165,7 +165,21 @@  static void luring_process_completions(LuringState *s)
         total_bytes = ret + luringcb->total_read;
 
         if (ret < 0) {
-            if (ret == -EINTR) {
+            /*
+             * Only writev/readv/fsync requests on regular files or host block
+             * devices are submitted. Therefore -EAGAIN is not expected but it's
+             * known to happen sometimes with Linux SCSI. Submit again and hope
+             * the request completes successfully.
+             *
+             * For more information, see:
+             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
+             *
+             * If the code is changed to submit other types of requests in the
+             * future, then this workaround may need to be extended to deal with
+             * genuine -EAGAIN results that should not be resubmitted
+             * immediately.
+             */
+            if (ret == -EINTR || ret == -EAGAIN) {
                 luring_resubmit(s, luringcb);
                 continue;
             }