[net-next,v2,2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE

Message ID 20200822020442.2677358-2-luke.w.hsiao@gmail.com
State Superseded
Delegated to: David Miller
Series [net-next,v2,1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock()

Commit Message

Luke Hsiao Aug. 22, 2020, 2:04 a.m. UTC
From: Luke Hsiao <lukehsiao@google.com>

Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
the context of TCP tx zero-copy, this is inefficient since we are only
reading the error queue and not using recvmsg to read POLLIN responses.

This patch was tested by using a simple sending program to call recvmsg
using io_uring with MSG_ERRQUEUE set and verifying with printks that the
POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Luke Hsiao <lukehsiao@google.com>
---
 fs/io_uring.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Jens Axboe Aug. 22, 2020, 2:08 a.m. UTC | #1
On 8/21/20 8:04 PM, Luke Hsiao wrote:
> From: Luke Hsiao <lukehsiao@google.com>
> 
> Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
> the context of TCP tx zero-copy, this is inefficient since we are only
> reading the error queue and not using recvmsg to read POLLIN responses.
> 
> This patch was tested by using a simple sending program to call recvmsg
> using io_uring with MSG_ERRQUEUE set and verifying with printks that the
> POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.

Sorry, one more minor thing to fix up:

> @@ -4932,6 +4934,11 @@ static bool io_arm_poll_handler(struct io_kiocb *req)
>  		mask |= POLLIN | POLLRDNORM;
>  	if (def->pollout)
>  		mask |= POLLOUT | POLLWRNORM;
> +
> +	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
> +	if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
> +		mask &= ~POLLIN;
> +

Don't pass in the sqe here, but use req->sr_msg.msg_flags for this check. This
is actually really important, as you don't want to re-read anything from the
sqe.

I'm actually surprised this one got past Jann :-)

> @@ -6146,7 +6153,7 @@ static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>  	 * doesn't support non-blocking read/write attempts
>  	 */
>  	if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
> -		if (!io_arm_poll_handler(req)) {
> +		if (!io_arm_poll_handler(req, sqe)) {

Also means you can drop this part.

Luke Hsiao Aug. 22, 2020, 2:13 a.m. UTC | #2
Hi Jens,

On Fri, Aug 21, 2020 at 7:09 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 8/21/20 8:04 PM, Luke Hsiao wrote:
> >
> Sorry, one more minor thing to fix up:
>
> > @@ -4932,6 +4934,11 @@ static bool io_arm_poll_handler(struct io_kiocb *req)
> >               mask |= POLLIN | POLLRDNORM;
> >       if (def->pollout)
> >               mask |= POLLOUT | POLLWRNORM;
> > +
> > +     /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
> > +     if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
> > +             mask &= ~POLLIN;
> > +
>
> Don't pass in the sqe here, but use req->sr_msg.msg_flags for this check. This
> is actually really important, as you don't want to re-read anything from the
> sqe.
>
> I'm actually surprised this one got past Jann :-)

Got it, I will make the change and send v3. In Jann's defense, he
reviewed the previous commit, but not this one :). Thanks for your
detailed feedback.

--
Luke

Jens Axboe Aug. 22, 2020, 2:16 a.m. UTC | #3
On 8/21/20 8:13 PM, Luke Hsiao wrote:
> Hi Jens,
> 
> On Fri, Aug 21, 2020 at 7:09 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 8/21/20 8:04 PM, Luke Hsiao wrote:
>>>
>> Sorry, one more minor thing to fix up:
>>
>>> @@ -4932,6 +4934,11 @@ static bool io_arm_poll_handler(struct io_kiocb *req)
>>>               mask |= POLLIN | POLLRDNORM;
>>>       if (def->pollout)
>>>               mask |= POLLOUT | POLLWRNORM;
>>> +
>>> +     /* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
>>> +     if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
>>> +             mask &= ~POLLIN;
>>> +
>>
>> Don't pass in the sqe here, but use req->sr_msg.msg_flags for this check. This
>> is actually really important, as you don't want to re-read anything from the
>> sqe.
>>
>> I'm actually surprised this one got past Jann :-)
> 
> Got it, I will make the change and send v3. In Jann's defense, he
> reviewed the previous commit, but not this one :). Thanks for your
> detailed feedback.

Ah right you are, I guess it was the previous patch that had his
review! Thanks for taking care of this.

Patch

diff --git a/fs/io_uring.c b/fs/io_uring.c
index dc506b75659c..fd5353e31a2c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -79,6 +79,7 @@ 
 #include <linux/splice.h>
 #include <linux/task_work.h>
 #include <linux/pagemap.h>
+#include <linux/socket.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -4902,7 +4903,8 @@  static __poll_t __io_arm_poll_handler(struct io_kiocb *req,
 	return mask;
 }
 
-static bool io_arm_poll_handler(struct io_kiocb *req)
+static bool io_arm_poll_handler(struct io_kiocb *req,
+				const struct io_uring_sqe *sqe)
 {
 	const struct io_op_def *def = &io_op_defs[req->opcode];
 	struct io_ring_ctx *ctx = req->ctx;
@@ -4932,6 +4934,11 @@  static bool io_arm_poll_handler(struct io_kiocb *req)
 		mask |= POLLIN | POLLRDNORM;
 	if (def->pollout)
 		mask |= POLLOUT | POLLWRNORM;
+
+	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
+	if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
+		mask &= ~POLLIN;
+
 	mask |= POLLERR | POLLPRI;
 
 	ipt.pt._qproc = io_async_queue_proc;
@@ -6146,7 +6153,7 @@  static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	 * doesn't support non-blocking read/write attempts
 	 */
 	if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
-		if (!io_arm_poll_handler(req)) {
+		if (!io_arm_poll_handler(req, sqe)) {
 punt:
 			ret = io_prep_work_files(req);
 			if (unlikely(ret))