Message ID | 20200820234954.1784522-2-luke.w.hsiao@gmail.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Series | Support reading msg errq from io_uring |
On 8/20/20 5:49 PM, Luke Hsiao wrote:
> From: Luke Hsiao <lukehsiao@google.com>
>
> For TCP tx zero-copy, the kernel notifies the process of completions by
> queuing completion notifications on the socket error queue. This patch
> allows reading these notifications via recvmsg to support TCP tx
> zero-copy.
>
> Ancillary data was originally disallowed due to privilege escalation
> via io_uring's offloading of sendmsg() onto a kernel thread with kernel
> credentials (https://crbug.com/project-zero/1975). So, we must ensure
> that the socket type is one where the ancillary data types that are
> delivered on recvmsg are plain data (no file descriptors or values that
> are translated based on the identity of the calling process).
>
> This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE
> with tx zero-copy enabled. Before this patch, we received -EINVAL from
> this specific code path. After this patch, we could read TCP tx
> zero-copy completion notifications from the MSG_ERRQUEUE.
>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Jann Horn <jannh@google.com>
> Signed-off-by: Luke Hsiao <lukehsiao@google.com>

Reviewed-by: Jens Axboe <axboe@kernel.dk>
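The commit message says completions arrive on the socket error queue. As a hedged sketch based on the kernel's MSG_ZEROCOPY documentation rather than on this patch, a receiver that has already called recvmsg(fd, &msg, MSG_ERRQUEUE), whether directly or through an io_uring recvmsg request, might decode the notification as below. The helper name parse_zerocopy_completion is hypothetical; struct sock_extended_err, SO_EE_ORIGIN_ZEROCOPY and SO_EE_CODE_ZEROCOPY_COPIED come from <linux/errqueue.h>. Each notification is assumed to cover an inclusive range [ee_info, ee_data] of zero-copy send calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/errqueue.h>

/*
 * Hypothetical helper: scan the control messages returned by
 * recvmsg(..., MSG_ERRQUEUE) for a TCP tx zero-copy completion.
 * On success, stores the inclusive range [*lo, *hi] of completed
 * zero-copy sends and whether the kernel fell back to copying the
 * data, then returns 0. Returns -1 if no such notification is found.
 */
static int parse_zerocopy_completion(struct msghdr *msg, uint32_t *lo,
                                     uint32_t *hi, int *copied)
{
    assert(msg != NULL && lo != NULL && hi != NULL && copied != NULL);

    for (struct cmsghdr *cm = CMSG_FIRSTHDR(msg); cm != NULL;
         cm = CMSG_NXTHDR(msg, cm)) {
        /* IPv4 sockets report on SOL_IP/IP_RECVERR, IPv6 on SOL_IPV6/IPV6_RECVERR. */
        int is_v4 = cm->cmsg_level == SOL_IP && cm->cmsg_type == IP_RECVERR;
        int is_v6 = cm->cmsg_level == SOL_IPV6 && cm->cmsg_type == IPV6_RECVERR;
        if (!is_v4 && !is_v6)
            continue;

        struct sock_extended_err serr;
        /* Skip truncated control messages that cannot hold the struct. */
        if (cm->cmsg_len < CMSG_LEN(sizeof(serr)))
            continue;
        /* Copy out with memcpy instead of casting CMSG_DATA directly. */
        memcpy(&serr, CMSG_DATA(cm), sizeof(serr));

        /* Zero-copy completions carry ee_errno == 0 and this origin. */
        if (serr.ee_errno != 0 || serr.ee_origin != SO_EE_ORIGIN_ZEROCOPY)
            continue;

        *lo = serr.ee_info;
        *hi = serr.ee_data;
        *copied = (serr.ee_code & SO_EE_CODE_ZEROCOPY_COPIED) != 0;
        return 0;
    }
    return -1;
}
```

Before this patch, the io_uring path rejected any request with a control buffer, so an error-queue read like this could only be issued with a plain recvmsg() system call.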
diff --git a/include/linux/net.h b/include/linux/net.h
index d48ff1180879..7657c6432a69 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -41,6 +41,8 @@ struct net;
 #define SOCK_PASSCRED		3
 #define SOCK_PASSSEC		4
 
+#define PROTO_CMSG_DATA_ONLY	0x0001
+
 #ifndef ARCH_HAS_SOCKET_TYPES
 /**
  * enum sock_type - Socket types
@@ -135,6 +137,7 @@ typedef int (*sk_read_actor_t)(read_descriptor_t *, struct sk_buff *,
 
 struct proto_ops {
 	int		family;
+	unsigned int	flags;
 	struct module	*owner;
 	int		(*release)   (struct socket *sock);
 	int		(*bind)	     (struct socket *sock,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 4307503a6f0b..b7260c8cef2e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1017,6 +1017,7 @@ static int inet_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned lon
 
 const struct proto_ops inet_stream_ops = {
 	.family		   = PF_INET,
+	.flags		   = PROTO_CMSG_DATA_ONLY,
 	.owner		   = THIS_MODULE,
 	.release	   = inet_release,
 	.bind		   = inet_bind,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 0306509ab063..d9a14935f402 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -661,6 +661,7 @@ int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 
 const struct proto_ops inet6_stream_ops = {
 	.family		   = PF_INET6,
+	.flags		   = PROTO_CMSG_DATA_ONLY,
 	.owner		   = THIS_MODULE,
 	.release	   = inet6_release,
 	.bind		   = inet6_bind,
diff --git a/net/socket.c b/net/socket.c
index dbbe8ea7d395..e84a8e281b4c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2628,9 +2628,11 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
 			struct user_msghdr __user *umsg,
 			struct sockaddr __user *uaddr, unsigned int flags)
 {
-	/* disallow ancillary data requests from this path */
-	if (msg->msg_control || msg->msg_controllen)
-		return -EINVAL;
+	if (msg->msg_control || msg->msg_controllen) {
+		/* disallow ancillary data reqs unless cmsg is plain data */
+		if (!(sock->ops->flags & PROTO_CMSG_DATA_ONLY))
+			return -EINVAL;
+	}
 
 	return ____sys_recvmsg(sock, msg, umsg, uaddr, flags, 0);
 }
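The gating rule in the net/socket.c hunk can be modelled in user space. The sketch below is hypothetical: hypothetical_proto_ops and check_recvmsg_cmsg are illustrative names that only mirror the new flags field and the PROTO_CMSG_DATA_ONLY bit added by the patch; they are not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Same bit value as the patch's PROTO_CMSG_DATA_ONLY in include/linux/net.h. */
#define PROTO_CMSG_DATA_ONLY 0x0001

/* Hypothetical stand-in for struct proto_ops: only the new flags field. */
struct hypothetical_proto_ops {
    unsigned int flags;
};

/*
 * Hypothetical model of the check in __sys_recvmsg_sock(): a request that
 * supplies a control buffer is rejected with -EINVAL unless the protocol
 * declares that its ancillary data is plain data. Requests without a
 * control buffer are always allowed.
 */
static int check_recvmsg_cmsg(const struct hypothetical_proto_ops *ops,
                              const void *msg_control, size_t msg_controllen)
{
    assert(ops != NULL);
    if (msg_control || msg_controllen) {
        if (!(ops->flags & PROTO_CMSG_DATA_ONLY))
            return -EINVAL;
    }
    return 0;
}
```

Because the flag is opt-in, any proto_ops that does not set it, for example AF_UNIX sockets, which can pass file descriptors via SCM_RIGHTS, keeps the old -EINVAL behaviour on this path; this patch marks only inet_stream_ops and inet6_stream_ops.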