From patchwork Thu Sep 1 22:04:38 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Linus Torvalds X-Patchwork-Id: 665062 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3sQGWw28X4z9vGw for ; Fri, 2 Sep 2016 08:05:00 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b=dLuQ029i; dkim-atps=neutral Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754551AbcIAWEq (ORCPT ); Thu, 1 Sep 2016 18:04:46 -0400 Received: from mail-oi0-f48.google.com ([209.85.218.48]:36490 "EHLO mail-oi0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753359AbcIAWEk (ORCPT ); Thu, 1 Sep 2016 18:04:40 -0400 Received: by mail-oi0-f48.google.com with SMTP id r205so57197084oih.3 for ; Thu, 01 Sep 2016 15:04:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=V8wYxgUq/mB8si4CqpkjG8Jdghbhlvt8+LxLure1DqY=; b=dLuQ029iYVVNJb1LC3E5knaEwh5h9zjF9FzHfEzx6FF513CY80YN0kkWvaMISZACRf zva+1bSNzf2/RF7QfkIrPodE9aexbWducTHzrRNxBWVDOmh7b8fpr4TIW/lEagreIOE1 WWxMCMpIjxaTNQzw0gOXZhxSUXDIwv4+Zq+II4q1kT9jbUu7Uw8w+L7AYrJ3Y52E4KCC 9vXNS2ZRXch+3A2cAbX81weNvuUdbMxaxs626lxlMDocCzyWPDZfw2uJy/OgvaSEHO/I ZnjajI9neiU5tDG4/5FDljRjcLZC+JnXXhOWhQZVJZLp01iOAktuoK70wFMV0QiLbPia V/IA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=V8wYxgUq/mB8si4CqpkjG8Jdghbhlvt8+LxLure1DqY=; b=QV8g7tbRhCV+RN+E4EAeCtvyRxlTx1ZcReo+JCKcHNsjobuDHCy3loGPavo9IC7MV6 wVXuW1CeLvVaIdstSZK0r+DAJB0M1528Lq1ykp+YgECi4lMGeBQs3dgCnNc9K1asb3TN CdsLUveorY0fK5EKIpWru9iyQ1WdIECBLkx1vZ5HV8xs6kV7h/TQSVw6C8P/Lzw0CYFc norNSAuNStWwCXkV145EnunLAtnz183OGmgTRGE0/PFyqFkQLcU+oujSlgnGL8436wl1 DqMCIDrUL/p+nCsvCcISqicNUP5s6CVOGOhe7X2s8pnHL9Wt7n0opKPTW7JlxMb1MASR T2GA== X-Gm-Message-State: AE9vXwNBTqnHBNAVWxEr1aqQeR4vpN69GBSqTylJ/HqKfoMFztIAYJo8e7gqIR9LSrJOa5X5YXENgmGM3qDpxA== X-Received: by 10.202.53.132 with SMTP id c126mr15729097oia.3.1472767479127; Thu, 01 Sep 2016 15:04:39 -0700 (PDT) MIME-Version: 1.0 Received: by 10.182.195.6 with HTTP; Thu, 1 Sep 2016 15:04:38 -0700 (PDT) In-Reply-To: References: <87h99zo4d8.fsf@doppelsaurus.mobileactivedefense.com> <871t13o1n1.fsf@doppelsaurus.mobileactivedefense.com> <6f7d587b-3933-7c02-a804-db1732ced1cf@stressinduktion.org> <20160901204746.GR2356@ZenIV.linux.org.uk> <20160901210142.GS2356@ZenIV.linux.org.uk> From: Linus Torvalds Date: Thu, 1 Sep 2016 15:04:38 -0700 X-Google-Sender-Auth: -RY7Q4KBZeUKKyBdpW-dvMWJYMQ Message-ID: Subject: Re: possible circular locking dependency detected To: Al Viro , CAI Qian Cc: Miklos Szeredi , Rainer Weikusat , Hannes Frederic Sowa , Rainer Weikusat , Eric Sandeen , Network Development Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Thu, Sep 1, 2016 at 2:43 PM, Linus Torvalds wrote: > On Thu, Sep 1, 2016 at 2:01 PM, Al Viro wrote: >> >> Outside as in "all fs activity in bind happens under it". Along with >> assignment to ->u.addr, etc. IOW, make it the outermost lock there. > > Hah, yes. I misunderstood you. > > Yes. In fact that fixes the problem I mentioned, rather than introducing it. So the easiest approach would seem to be to revert commit c845acb324aa ("af_unix: Fix splice-bind deadlock"), and then apply the lock split. Like the attached two patches. This is still *entirely* untested. Rainer? Linus Tested-by: CAI Qian Acked-by: Al Viro From 9a76489d81f6d2b1da22906363d28c398d4f7c5c Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Thu, 1 Sep 2016 14:43:53 -0700 Subject: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock' Right now we use the 'readlock' both for protecting some of the af_unix IO path and for making the bind be single-threaded. The two are independent, but using the same lock makes for a nasty deadlock due to ordering with regards to filesystem locking. The bind locking would want to nest outside the VSF pathname locking, but the IO locking wants to nest inside some of those same locks. We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix splice-bind deadlock") which moved the readlock inside the vfs locks, but that caused problems with overlayfs that will then call back into filesystem routines that take the lock in the wrong order anyway. Splitting the locks means that we can go back to having the bind lock be the outermost lock, and we don't have any deadlocks with lock ordering. Signed-off-by: Linus Torvalds --- include/net/af_unix.h | 2 +- net/unix/af_unix.c | 45 +++++++++++++++++++++++---------------------- 2 files changed, 24 insertions(+), 23 deletions(-) diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 9b4c418bebd8..fd60eccb59a6 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -52,7 +52,7 @@ struct unix_sock { struct sock sk; struct unix_address *addr; struct path path; - struct mutex readlock; + struct mutex iolock, bindlock; struct sock *peer; struct list_head link; atomic_long_t inflight; diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 433ae1bbef97..8309687a56b0 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -661,11 +661,11 @@ static int unix_set_peek_off(struct sock *sk, int val) { struct unix_sock *u = unix_sk(sk); - if (mutex_lock_interruptible(&u->readlock)) + if (mutex_lock_interruptible(&u->iolock)) return -EINTR; sk->sk_peek_off = val; - mutex_unlock(&u->readlock); + mutex_unlock(&u->iolock); return 0; } @@ -779,7 +779,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern) spin_lock_init(&u->lock); atomic_long_set(&u->inflight, 0); INIT_LIST_HEAD(&u->link); - mutex_init(&u->readlock); /* single task reading lock */ + mutex_init(&u->iolock); /* single task reading lock */ + mutex_init(&u->bindlock); /* single task binding lock */ init_waitqueue_head(&u->peer_wait); init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay); unix_insert_socket(unix_sockets_unbound(sk), sk); @@ -848,7 +849,7 @@ static int unix_autobind(struct socket *sock) int err; unsigned int retries = 0; - err = mutex_lock_interruptible(&u->readlock); + err = mutex_lock_interruptible(&u->bindlock); if (err) return err; @@ -895,7 +896,7 @@ retry: spin_unlock(&unix_table_lock); err = 0; -out: mutex_unlock(&u->readlock); +out: mutex_unlock(&u->bindlock); return err; } @@ -1009,7 +1010,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) goto out; addr_len = err; - err = mutex_lock_interruptible(&u->readlock); + err = mutex_lock_interruptible(&u->bindlock); if (err) goto out; @@ -1063,7 +1064,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) out_unlock: spin_unlock(&unix_table_lock); out_up: - mutex_unlock(&u->readlock); + mutex_unlock(&u->bindlock); out: return err; } @@ -1955,17 +1956,17 @@ static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page, if (false) { alloc_skb: unix_state_unlock(other); - mutex_unlock(&unix_sk(other)->readlock); + mutex_unlock(&unix_sk(other)->iolock); newskb = sock_alloc_send_pskb(sk, 0, 0, flags & MSG_DONTWAIT, &err, 0); if (!newskb) goto err; } - /* we must acquire readlock as we modify already present + /* we must acquire iolock as we modify already present * skbs in the sk_receive_queue and mess with skb->len */ - err = mutex_lock_interruptible(&unix_sk(other)->readlock); + err = mutex_lock_interruptible(&unix_sk(other)->iolock); if (err) { err = flags & MSG_DONTWAIT ? -EAGAIN : -ERESTARTSYS; goto err; @@ -2032,7 +2033,7 @@ alloc_skb: } unix_state_unlock(other); - mutex_unlock(&unix_sk(other)->readlock); + mutex_unlock(&unix_sk(other)->iolock); other->sk_data_ready(other); scm_destroy(&scm); @@ -2041,7 +2042,7 @@ alloc_skb: err_state_unlock: unix_state_unlock(other); err_unlock: - mutex_unlock(&unix_sk(other)->readlock); + mutex_unlock(&unix_sk(other)->iolock); err: kfree_skb(newskb); if (send_sigpipe && !(flags & MSG_NOSIGNAL)) @@ -2109,7 +2110,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); do { - mutex_lock(&u->readlock); + mutex_lock(&u->iolock); skip = sk_peek_offset(sk, flags); skb = __skb_try_recv_datagram(sk, flags, &peeked, &skip, &err, @@ -2117,14 +2118,14 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, if (skb) break; - mutex_unlock(&u->readlock); + mutex_unlock(&u->iolock); if (err != -EAGAIN) break; } while (timeo && !__skb_wait_for_more_packets(sk, &err, &timeo, last)); - if (!skb) { /* implies readlock unlocked */ + if (!skb) { /* implies iolock unlocked */ unix_state_lock(sk); /* Signal EOF on disconnected non-blocking SEQPACKET socket. */ if (sk->sk_type == SOCK_SEQPACKET && err == -EAGAIN && @@ -2189,7 +2190,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, out_free: skb_free_datagram(sk, skb); - mutex_unlock(&u->readlock); + mutex_unlock(&u->iolock); out: return err; } @@ -2284,7 +2285,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state) /* Lock the socket to prevent queue disordering * while sleeps in memcpy_tomsg */ - mutex_lock(&u->readlock); + mutex_lock(&u->iolock); if (flags & MSG_PEEK) skip = sk_peek_offset(sk, flags); @@ -2326,7 +2327,7 @@ again: break; } - mutex_unlock(&u->readlock); + mutex_unlock(&u->iolock); timeo = unix_stream_data_wait(sk, timeo, last, last_len); @@ -2337,7 +2338,7 @@ again: goto out; } - mutex_lock(&u->readlock); + mutex_lock(&u->iolock); goto redo; unlock: unix_state_unlock(sk); @@ -2440,7 +2441,7 @@ unlock: } } while (size); - mutex_unlock(&u->readlock); + mutex_unlock(&u->iolock); if (state->msg) scm_recv(sock, state->msg, &scm, flags); else @@ -2481,9 +2482,9 @@ static ssize_t skb_unix_socket_splice(struct sock *sk, int ret; struct unix_sock *u = unix_sk(sk); - mutex_unlock(&u->readlock); + mutex_unlock(&u->iolock); ret = splice_to_pipe(pipe, spd); - mutex_lock(&u->readlock); + mutex_lock(&u->iolock); return ret; } -- 2.10.0.rc0.2.g0a9fa47