[net-next,v3,2/2] io_uring: ignore POLLIN for recvmsg on MSG_ERRQUEUE

Message ID 20200822044105.3097613-2-luke.w.hsiao@gmail.com
State Accepted
Delegated to: David Miller
Series [net-next,v3,1/2] io_uring: allow tcp ancillary data for __sys_recvmsg_sock()

Commit Message

Luke Hsiao Aug. 22, 2020, 4:41 a.m. UTC
From: Luke Hsiao <lukehsiao@google.com>

Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
the context of TCP tx zero-copy, this is inefficient since we are only
reading the error queue and not using recvmsg to read POLLIN responses.

This patch was tested by using a simple sending program to call recvmsg
using io_uring with MSG_ERRQUEUE set and verifying with printks that the
POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.
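
For readers unfamiliar with the error queue, the read itself can be sketched in plain userspace C. This is a minimal sketch, not the test program mentioned above: it calls recvmsg(2) directly rather than through io_uring, and the helper name read_errqueue_once and its parameters are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: perform one non-blocking read of a socket's
 * error queue. Zero-copy completion notifications arrive as ancillary
 * data, so the caller supplies a control buffer and no payload iovec.
 * Returns the recvmsg() result; on failure it is -1 with errno set
 * (EAGAIN when the error queue is empty).
 */
static ssize_t read_errqueue_once(int fd, void *ctrl, size_t ctrl_len)
{
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_control = ctrl;
	msg.msg_controllen = ctrl_len;

	/* MSG_ERRQUEUE selects the error queue instead of normal data. */
	return recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);
}
```

A read like this never consumes regular receive data, which is why waking up on POLLIN is wasted work for it.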

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Luke Hsiao <lukehsiao@google.com>
---
 fs/io_uring.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Jens Axboe Aug. 22, 2020, 3:49 p.m. UTC | #1
On 8/21/20 10:41 PM, Luke Hsiao wrote:
> From: Luke Hsiao <lukehsiao@google.com>
> 
> Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
> the context of TCP tx zero-copy, this is inefficient since we are only
> reading the error queue and not using recvmsg to read POLLIN responses.
> 
> This patch was tested by using a simple sending program to call recvmsg
> using io_uring with MSG_ERRQUEUE set and verifying with printks that the
> POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.

Perfect, and it ends up being much simpler and more straightforward too.

David Miller Aug. 24, 2020, 11:16 p.m. UTC | #2
From: Luke Hsiao <luke.w.hsiao@gmail.com>
Date: Fri, 21 Aug 2020 21:41:05 -0700

> From: Luke Hsiao <lukehsiao@google.com>
> 
> Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
> the context of TCP tx zero-copy, this is inefficient since we are only
> reading the error queue and not using recvmsg to read POLLIN responses.
> 
> This patch was tested by using a simple sending program to call recvmsg
> using io_uring with MSG_ERRQUEUE set and verifying with printks that the
> POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.

Again, selftests additions please.

> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Luke Hsiao <lukehsiao@google.com>

Applied.

Patch

diff --git a/fs/io_uring.c b/fs/io_uring.c
index dc506b75659c..1aa2191ea683 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4932,6 +4932,12 @@  static bool io_arm_poll_handler(struct io_kiocb *req)
 		mask |= POLLIN | POLLRDNORM;
 	if (def->pollout)
 		mask |= POLLOUT | POLLWRNORM;
+
+	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
+	if ((req->opcode == IORING_OP_RECVMSG) &&
+	    (req->sr_msg.msg_flags & MSG_ERRQUEUE))
+		mask &= ~POLLIN;
+
 	mask |= POLLERR | POLLPRI;
 
 	ipt.pt._qproc = io_async_queue_proc;
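
The mask computation in the hunk above can be illustrated outside the kernel. The following is a userspace sketch under stated assumptions: struct op_def and arm_poll_mask() are hypothetical stand-ins for the kernel's per-opcode definition and for the relevant part of io_arm_poll_handler(), and the is_recvmsg flag replaces the req->opcode == IORING_OP_RECVMSG check.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* expose POLLRDNORM / POLLWRNORM on glibc */
#endif
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>      /* MSG_ERRQUEUE (Linux-specific) */

/* Hypothetical stand-in for the per-opcode poll flags (def->pollin etc.). */
struct op_def {
	bool pollin;
	bool pollout;
};

/*
 * Hypothetical helper mirroring the order of operations in the hunk:
 * build the mask from the opcode definition, drop POLLIN for an
 * error-queue recvmsg, then always add POLLERR | POLLPRI.
 */
static unsigned int arm_poll_mask(const struct op_def *def, bool is_recvmsg,
				  unsigned int msg_flags)
{
	unsigned int mask = 0;

	if (def->pollin)
		mask |= POLLIN | POLLRDNORM;
	if (def->pollout)
		mask |= POLLOUT | POLLWRNORM;

	/* Same condition as the patch: recvmsg with MSG_ERRQUEUE set. */
	if (is_recvmsg && (msg_flags & MSG_ERRQUEUE))
		mask &= ~POLLIN;

	mask |= POLLERR | POLLPRI;
	return mask;
}
```

Calling arm_poll_mask() with pollin set and MSG_ERRQUEUE in msg_flags yields a mask without POLLIN but with POLLERR and POLLPRI. As in the hunk, only POLLIN is cleared, so POLLRDNORM stays set in this sketch.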