Message ID | 20210729091029.65369-1-f.ebner@proxmox.com |
---|---|
State | New |
Series | [v2] block/io_uring: resubmit when result is -EAGAIN |
On Thu, Jul 29, 2021 at 11:10:29AM +0200, Fabian Ebner wrote:
>Linux SCSI can throw spurious -EAGAIN in some corner cases in its
>completion path, which will end up being the result in the completed
>io_uring request.
>
>Resubmitting such requests should allow block jobs to complete, even
>if such spurious errors are encountered.
>
>Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
>Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
>Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
>---
>
>Changes from v1:
>    * Focus on what's relevant for the patch itself in the commit
>      message.
>    * Add Stefan's comment.
>    * Add Stefano's R-b tag (I hope that's fine, since there was no
>      change code-wise).

Yep, it's fine :-)

Thanks,
Stefano

On Thu, Jul 29, 2021 at 11:10:29AM +0200, Fabian Ebner wrote:
> Linux SCSI can throw spurious -EAGAIN in some corner cases in its
> completion path, which will end up being the result in the completed
> io_uring request.
> [...]
>  block/io_uring.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan

On 29.07.2021 at 11:10, Fabian Ebner wrote:
> [...]
>  block/io_uring.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/block/io_uring.c b/block/io_uring.c
> index 00a3ee9fb8..dfa475cc87 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -165,7 +165,21 @@ static void luring_process_completions(LuringState *s)
>          total_bytes = ret + luringcb->total_read;
>
>          if (ret < 0) {
> -            if (ret == -EINTR) {
> +            /*
> +             * Only writev/readv/fsync requests on regular files or host block
> +             * devices are submitted. Therefore -EAGAIN is not expected but it's
> +             * known to happen sometimes with Linux SCSI. Submit again and hope
> +             * the request completes successfully.
> +             *
> +             * For more information, see:
> +             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
> +             *
> +             * If the code is changed to submit other types of requests in the
> +             * future, then this workaround may need to be extended to deal with
> +             * genuine -EAGAIN results that should not be resubmitted
> +             * immediately.
> +             */
> +            if (ret == -EINTR || ret == -EAGAIN) {
>                  luring_resubmit(s, luringcb);
>                  continue;
>              }

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Question about the preexisting code, though: luring_resubmit() requires
that the caller calls ioq_submit() later so that the request doesn't
just sit in a queue without getting any attention, but actually gets
submitted to the kernel.

In the call chain ioq_submit() -> luring_process_completions() ->
luring_resubmit(), who takes care of that?

Kevin

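To make the question concrete, here is a toy model of the call chain, not QEMU code. `ToyRing`, `toy_submit()`, `toy_process_completions()` and `toy_resubmit()` are hypothetical stand-ins for `LuringState`, `ioq_submit()`, `luring_process_completions()` and `luring_resubmit()`. It assumes that a submit pass hands queued requests to the kernel once and then reaps completions, and that completions arrive immediately, which real io_uring does not guarantee.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for LuringState: counters only, no real io_uring. */
typedef struct {
    int queued;      /* requests waiting in the submission queue */
    int in_kernel;   /* requests handed to the simulated kernel */
    int completed;   /* requests that finished successfully */
    int next_result; /* result the simulated kernel reports next */
} ToyRing;

/* Models luring_resubmit(): re-queues the request, does not submit it. */
static void toy_resubmit(ToyRing *r)
{
    r->queued++;
}

/* Models luring_process_completions() with the patch applied. */
static void toy_process_completions(ToyRing *r)
{
    while (r->in_kernel > 0) {
        int ret = r->next_result;

        r->in_kernel--;
        r->next_result = 0; /* only the first completion fails */
        if (ret == -EINTR || ret == -EAGAIN) {
            toy_resubmit(r);
            continue;
        }
        r->completed++;
    }
}

/* Models one ioq_submit() pass: submit once, then reap completions. */
static void toy_submit(ToyRing *r)
{
    r->in_kernel += r->queued;
    r->queued = 0;
    toy_process_completions(r);
}

/* Returns how many requests are still queued after a single pass. */
static int toy_stranded_after_one_pass(void)
{
    ToyRing r = { .queued = 1, .next_result = -EAGAIN };

    toy_submit(&r);
    return r.queued;
}
```

In this model the re-queued request stays in the queue after the pass, so it only makes progress if something else triggers another submit.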
On Mon, Aug 02, 2021 at 02:40:36PM +0200, Kevin Wolf wrote:
>[...]
>Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>
>Question about the preexisting code, though: luring_resubmit() requires
>that the caller calls ioq_submit() later so that the request doesn't
>just sit in a queue without getting any attention, but actually gets
>submitted to the kernel.
>
>In the call chain ioq_submit() -> luring_process_completions() ->
>luring_resubmit(), who takes care of that?

Mmm, good point.
There should be the same problem with ioq_submit() ->
luring_process_completions() -> luring_resubmit_short_read() ->
luring_resubmit().

Should we schedule a BH (bottom half), for example in luring_resubmit(),
to make sure that ioq_submit() is invoked after a resubmission?

Thanks,
Stefano

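The bottom-half idea can be sketched with a toy model, again not QEMU code. The names are hypothetical, and the bottom half is reduced to a pending flag that a simulated event loop checks; in QEMU this would presumably use `aio_bh_new()` and `qemu_bh_schedule()`, but that mapping is an assumption here. Completions are assumed to arrive immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for LuringState plus a one-bit "bottom half". */
typedef struct {
    int queued;         /* requests waiting in the submission queue */
    int in_kernel;      /* requests handed to the simulated kernel */
    int completed;      /* requests that finished successfully */
    int failures_left;  /* how many more completions report -EAGAIN */
    bool bh_scheduled;  /* toy bottom half: pending flag only */
} ToyRing;

/* Re-queue the request and ask the event loop to submit later. */
static void toy_resubmit(ToyRing *r)
{
    r->queued++;
    r->bh_scheduled = true;
}

/* One submit pass: submit once, then reap completions. */
static void toy_submit(ToyRing *r)
{
    r->in_kernel += r->queued;
    r->queued = 0;
    while (r->in_kernel > 0) {
        r->in_kernel--;
        if (r->failures_left > 0) {
            r->failures_left--;
            toy_resubmit(r); /* simulated ret == -EAGAIN */
            continue;
        }
        r->completed++;
    }
}

/* One event loop iteration: run the bottom half if it is pending. */
static bool toy_event_loop_iteration(ToyRing *r)
{
    if (!r->bh_scheduled) {
        return false;
    }
    r->bh_scheduled = false;
    toy_submit(r);
    return true;
}

/* Returns the number of event loop iterations that ran the bottom half. */
static int toy_run_with_bh(int failures)
{
    ToyRing r = { .queued = 1, .failures_left = failures };
    int iterations = 0;

    toy_submit(&r);
    while (toy_event_loop_iteration(&r)) {
        iterations++;
    }
    assert(r.queued == 0);
    return r.completed == 1 ? iterations : -1;
}
```

Each retry here costs one trip through the event loop, which is what leaves room for other work between attempts.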
On 04.08.2021 at 16:50, Stefano Garzarella wrote:
> On Mon, Aug 02, 2021 at 02:40:36PM +0200, Kevin Wolf wrote:
> > [...]
> > In the call chain ioq_submit() -> luring_process_completions() ->
> > luring_resubmit(), who takes care of that?
>
> Mmm, good point.
> There should be the same problem with ioq_submit() ->
> luring_process_completions() -> luring_resubmit_short_read() ->
> luring_resubmit().
>
> Should we schedule a BH for example in luring_resubmit() to make sure that
> ioq_submit() is invoked after a resubmission?

Or just loop in ioq_submit() after calling luring_process_completions()
if new requests were added to the queue?

Kevin

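A rough sketch of the looping alternative, using a toy model rather than QEMU code. All names are hypothetical, completions are assumed to arrive immediately, and `kernel_submits` exists only to count passes for illustration.

```c
#include <assert.h>

/* Toy stand-in for LuringState: counters only, no real io_uring. */
typedef struct {
    int queued;         /* requests waiting in the submission queue */
    int in_kernel;      /* requests handed to the simulated kernel */
    int completed;      /* requests that finished successfully */
    int failures_left;  /* how many more completions report -EAGAIN */
    int kernel_submits; /* number of simulated submit passes */
} ToyRing;

/* Reap completions; failed requests are re-queued, not submitted. */
static void toy_process_completions(ToyRing *r)
{
    while (r->in_kernel > 0) {
        r->in_kernel--;
        if (r->failures_left > 0) {
            r->failures_left--;
            r->queued++; /* toy luring_resubmit() */
            continue;
        }
        r->completed++;
    }
}

/* Keep submitting while completion processing re-queued requests. */
static void toy_submit(ToyRing *r)
{
    while (r->queued > 0) {
        r->in_kernel += r->queued;
        r->queued = 0;
        r->kernel_submits++;
        toy_process_completions(r);
    }
}

/* Returns how many submit passes one request needed to complete. */
static int toy_submits_needed(int failures)
{
    ToyRing r = { .queued = 1, .failures_left = failures };

    toy_submit(&r);
    assert(r.queued == 0 && r.completed == 1);
    return r.kernel_submits;
}
```

In this model every consecutive failure adds one more pass inside the same `toy_submit()` call, so the caller does not return until the request stops failing.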
On Wed, Aug 04, 2021 at 06:52:15PM +0200, Kevin Wolf wrote:
>On 04.08.2021 at 16:50, Stefano Garzarella wrote:
>> [...]
>> Should we schedule a BH for example in luring_resubmit() to make sure that
>> ioq_submit() is invoked after a resubmission?
>
>Or just loop in ioq_submit() after calling luring_process_completions()
>if new requests were added to the queue?

I was just concerned that we might cycle a bit if a request always
returns -EAGAIN, while scheduling a task might give room for other
devices to queue other requests.

But maybe this happens so rarely that we need not worry about it...

Stefano

diff --git a/block/io_uring.c b/block/io_uring.c
index 00a3ee9fb8..dfa475cc87 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -165,7 +165,21 @@ static void luring_process_completions(LuringState *s)
         total_bytes = ret + luringcb->total_read;
 
         if (ret < 0) {
-            if (ret == -EINTR) {
+            /*
+             * Only writev/readv/fsync requests on regular files or host block
+             * devices are submitted. Therefore -EAGAIN is not expected but it's
+             * known to happen sometimes with Linux SCSI. Submit again and hope
+             * the request completes successfully.
+             *
+             * For more information, see:
+             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
+             *
+             * If the code is changed to submit other types of requests in the
+             * future, then this workaround may need to be extended to deal with
+             * genuine -EAGAIN results that should not be resubmitted
+             * immediately.
+             */
+            if (ret == -EINTR || ret == -EAGAIN) {
                 luring_resubmit(s, luringcb);
                 continue;
             }
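The condition the patch changes can be isolated as a small predicate. This is only an illustration: `cqe_result_is_retryable()` is a hypothetical helper that does not exist in block/io_uring.c, where the check is written inline.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch: returns true for the
 * completion results that the patched code answers with
 * luring_resubmit(). Before the patch only -EINTR qualified.
 */
static bool cqe_result_is_retryable(int ret)
{
    return ret == -EINTR || ret == -EAGAIN;
}
```

Other negative results, such as -EIO, are not retried by this check, and non-negative results are byte counts rather than errors.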