From patchwork Sat Aug 22 02:04:41 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Luke Hsiao
X-Patchwork-Id: 1349574
X-Patchwork-Delegate: davem@davemloft.net
From: Luke Hsiao
To: David Miller
Cc: netdev@vger.kernel.org, Jens Axboe, Jakub Kicinski, Luke Hsiao,
 Soheil Hassas Yeganeh, Arjun Roy, Eric Dumazet, Jann Horn
Subject: [PATCH net-next v2 1/2] io_uring: allow tcp ancillary data for
 __sys_recvmsg_sock()
Date: Fri, 21 Aug 2020 19:04:41 -0700
Message-Id: <20200822020442.2677358-1-luke.w.hsiao@gmail.com>
X-Mailer: git-send-email 2.28.0.297.g1956fa8f8d-goog
In-Reply-To: <9abca73b-de63-f69d-caff-ae3ed24854de@kernel.dk>
References: <9abca73b-de63-f69d-caff-ae3ed24854de@kernel.dk>
X-Mailing-List: netdev@vger.kernel.org

From: Luke Hsiao

For TCP tx zero-copy, the kernel notifies the process of completions by
queuing completion notifications on the socket error queue. This patch
allows reading these notifications via recvmsg to support TCP tx
zero-copy.

Ancillary data was originally disallowed due to privilege escalation
via io_uring's offloading of sendmsg() onto a kernel thread with kernel
credentials (https://crbug.com/project-zero/1975). So, we must ensure
that the socket type is one where the ancillary data types that are
delivered on recvmsg are plain data (no file descriptors or values that
are translated based on the identity of the calling process).

This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE
with tx zero-copy enabled. Before this patch, we received -EINVAL from
this specific code path. After this patch, we could read TCP tx
zero-copy completion notifications from the MSG_ERRQUEUE.
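
For illustration only (not part of the patch): a minimal userspace model
of the check this patch adds to __sys_recvmsg_sock(). The model_* names
are hypothetical stand-ins for struct proto_ops and struct msghdr; only
PROTO_CMSG_DATA_ONLY and the branch structure are taken from the diff.
It sketches the rule "ancillary data is allowed only when the socket's
ops are flagged as delivering plain-data cmsgs".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag value copied from the patch (include/linux/net.h hunk). */
#define PROTO_CMSG_DATA_ONLY 0x0001

/* Hypothetical stand-in for the one field of struct proto_ops used here. */
struct model_proto_ops {
	unsigned int flags;
};

/* Hypothetical stand-in for the control-buffer fields of struct msghdr. */
struct model_msghdr {
	void *msg_control;
	size_t msg_controllen;
};

/*
 * Mirrors the gate in the patched __sys_recvmsg_sock(): returns 0 when
 * the request may proceed and -EINVAL when a control buffer is supplied
 * for a socket whose ops are not marked PROTO_CMSG_DATA_ONLY.
 */
int model_recvmsg_gate(const struct model_proto_ops *ops,
		       const struct model_msghdr *msg)
{
	assert(ops != NULL && msg != NULL);

	if (msg->msg_control || msg->msg_controllen) {
		/* disallow ancillary data reqs unless cmsg is plain data */
		if (!(ops->flags & PROTO_CMSG_DATA_ONLY))
			return -EINVAL;
	}
	return 0;
}
```

In this model, ops with flags set the way the patch sets them for
inet_stream_ops and inet6_stream_ops pass the gate with a control
buffer; ops with flags == 0 still get -EINVAL, as before the patch.
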
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Arjun Roy
Acked-by: Eric Dumazet
Reviewed-by: Jann Horn
Reviewed-by: Jens Axboe
Signed-off-by: Luke Hsiao
---
 include/linux/net.h | 3 +++
 net/ipv4/af_inet.c  | 1 +
 net/ipv6/af_inet6.c | 1 +
 net/socket.c        | 8 +++++---
 4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index d48ff1180879..7657c6432a69 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -41,6 +41,8 @@ struct net;
 #define SOCK_PASSCRED 3
 #define SOCK_PASSSEC 4
 
+#define PROTO_CMSG_DATA_ONLY 0x0001
+
 #ifndef ARCH_HAS_SOCKET_TYPES
 /**
  * enum sock_type - Socket types
@@ -135,6 +137,7 @@ typedef int (*sk_read_actor_t)(read_descriptor_t *, struct sk_buff *,
 
 struct proto_ops {
 	int family;
+	unsigned int flags;
 	struct module *owner;
 	int (*release) (struct socket *sock);
 	int (*bind) (struct socket *sock,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 4307503a6f0b..b7260c8cef2e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1017,6 +1017,7 @@ static int inet_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned lon
 
 const struct proto_ops inet_stream_ops = {
 	.family = PF_INET,
+	.flags = PROTO_CMSG_DATA_ONLY,
 	.owner = THIS_MODULE,
 	.release = inet_release,
 	.bind = inet_bind,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 0306509ab063..d9a14935f402 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -661,6 +661,7 @@ int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 
 const struct proto_ops inet6_stream_ops = {
 	.family = PF_INET6,
+	.flags = PROTO_CMSG_DATA_ONLY,
 	.owner = THIS_MODULE,
 	.release = inet6_release,
 	.bind = inet6_bind,
diff --git a/net/socket.c b/net/socket.c
index dbbe8ea7d395..e84a8e281b4c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2628,9 +2628,11 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
 			struct user_msghdr __user *umsg,
 			struct sockaddr __user *uaddr, unsigned int flags)
 {
-	/* disallow ancillary data requests from this path */
-	if (msg->msg_control || msg->msg_controllen)
-		return -EINVAL;
+	if (msg->msg_control || msg->msg_controllen) {
+		/* disallow ancillary data reqs unless cmsg is plain data */
+		if (!(sock->ops->flags & PROTO_CMSG_DATA_ONLY))
+			return -EINVAL;
+	}
 
 	return ____sys_recvmsg(sock, msg, umsg, uaddr, flags, 0);
 }

From patchwork Sat Aug 22 02:04:42 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Luke Hsiao
X-Patchwork-Id: 1349573
X-Patchwork-Delegate: davem@davemloft.net
From: Luke Hsiao
To: David Miller
Cc: netdev@vger.kernel.org, Jens Axboe, Jakub Kicinski, Luke Hsiao,
 Arjun Roy, Soheil Hassas Yeganeh, Eric Dumazet
Subject: [PATCH net-next v2 2/2] io_uring: ignore POLLIN for recvmsg on
 MSG_ERRQUEUE
Date: Fri, 21 Aug 2020 19:04:42 -0700
Message-Id: <20200822020442.2677358-2-luke.w.hsiao@gmail.com>
X-Mailer: git-send-email 2.28.0.297.g1956fa8f8d-goog
In-Reply-To: <20200822020442.2677358-1-luke.w.hsiao@gmail.com>
References: <9abca73b-de63-f69d-caff-ae3ed24854de@kernel.dk>
 <20200822020442.2677358-1-luke.w.hsiao@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Luke Hsiao

Currently, io_uring's recvmsg subscribes to both POLLERR and POLLIN. In
the context of TCP tx zero-copy, this is inefficient since we are only
reading the error queue and not using recvmsg to read POLLIN responses.

This patch was tested by using a simple sending program to call recvmsg
using io_uring with MSG_ERRQUEUE set and verifying with printks that
the POLLIN is correctly unset when the msg flags are MSG_ERRQUEUE.
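
For illustration only (not part of the patch): a userspace sketch of the
poll mask computation that io_arm_poll_handler() performs after this
change. model_poll_mask() and its boolean parameters are hypothetical
stand-ins for def->pollin, def->pollout and the opcode check; the mask
operations follow the diff, and the POLL* and MSG_ERRQUEUE constants
come from the userspace headers on Linux rather than the kernel's own
definitions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Assumption for this sketch: POLLIN and POLLRDNORM are distinct bits,
 * as they are on Linux, so clearing POLLIN leaves POLLRDNORM set.
 */
static_assert((POLLIN & POLLRDNORM) == 0, "POLLIN and POLLRDNORM overlap");

/*
 * Hypothetical model of the mask built in io_arm_poll_handler().
 * pollin/pollout stand in for def->pollin and def->pollout, and
 * is_recvmsg stands in for req->opcode == IORING_OP_RECVMSG.
 */
unsigned int model_poll_mask(bool pollin, bool pollout, bool is_recvmsg,
			     unsigned int msg_flags)
{
	unsigned int mask = 0;

	if (pollin)
		mask |= POLLIN | POLLRDNORM;
	if (pollout)
		mask |= POLLOUT | POLLWRNORM;

	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
	if (is_recvmsg && (msg_flags & MSG_ERRQUEUE))
		mask &= ~(unsigned int)POLLIN;

	mask |= POLLERR | POLLPRI;
	return mask;
}
```

In this sketch, a recvmsg with MSG_ERRQUEUE ends up waiting on POLLERR
and POLLPRI without POLLIN, while POLLRDNORM stays set because the diff
clears only POLLIN.
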
Signed-off-by: Arjun Roy
Signed-off-by: Soheil Hassas Yeganeh
Acked-by: Eric Dumazet
Reviewed-by: Jens Axboe
Signed-off-by: Luke Hsiao
---
 fs/io_uring.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index dc506b75659c..fd5353e31a2c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -79,6 +79,7 @@
 #include
 #include
 #include
+#include
 
 #define CREATE_TRACE_POINTS
 #include
@@ -4902,7 +4903,8 @@ static __poll_t __io_arm_poll_handler(struct io_kiocb *req,
 	return mask;
 }
 
-static bool io_arm_poll_handler(struct io_kiocb *req)
+static bool io_arm_poll_handler(struct io_kiocb *req,
+				const struct io_uring_sqe *sqe)
 {
 	const struct io_op_def *def = &io_op_defs[req->opcode];
 	struct io_ring_ctx *ctx = req->ctx;
@@ -4932,6 +4934,11 @@ static bool io_arm_poll_handler(struct io_kiocb *req)
 		mask |= POLLIN | POLLRDNORM;
 	if (def->pollout)
 		mask |= POLLOUT | POLLWRNORM;
+
+	/* If reading from MSG_ERRQUEUE using recvmsg, ignore POLLIN */
+	if (req->opcode == IORING_OP_RECVMSG && (sqe->msg_flags & MSG_ERRQUEUE))
+		mask &= ~POLLIN;
+
 	mask |= POLLERR | POLLPRI;
 
 	ipt.pt._qproc = io_async_queue_proc;
@@ -6146,7 +6153,7 @@ static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	 * doesn't support non-blocking read/write attempts
 	 */
 	if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
-		if (!io_arm_poll_handler(req)) {
+		if (!io_arm_poll_handler(req, sqe)) {
 punt:
 			ret = io_prep_work_files(req);
 			if (unlikely(ret))