Message ID | CA+55aFwntZpaAutp_C=sLBg8ravfYGRoF4TTG1cfE3SjaJEOtw@mail.gmail.com
---|---
State | Superseded, archived |
Delegated to: | David Miller |
----- Original Message -----
> From: "Linus Torvalds" <torvalds@linux-foundation.org>
> To: "Al Viro" <viro@zeniv.linux.org.uk>, "CAI Qian" <caiqian@redhat.com>
> Cc: "Miklos Szeredi" <miklos@szeredi.hu>, "Rainer Weikusat" <rweikusat@cyberadapt.com>, "Hannes Frederic Sowa"
> <hannes@stressinduktion.org>, "Rainer Weikusat" <rweikusat@mobileactivedefense.com>, "Eric Sandeen"
> <esandeen@redhat.com>, "Network Development" <netdev@vger.kernel.org>
> Sent: Thursday, September 1, 2016 6:04:38 PM
> Subject: Re: possible circular locking dependency detected
>
> On Thu, Sep 1, 2016 at 2:43 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Thu, Sep 1, 2016 at 2:01 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >>
> >> Outside as in "all fs activity in bind happens under it". Along with
> >> assignment to ->u.addr, etc. IOW, make it the outermost lock there.
> >
> > Hah, yes. I misunderstood you.
> >
> > Yes. In fact that fixes the problem I mentioned, rather than introducing
> > it.
>
> So the easiest approach would seem to be to revert commit c845acb324aa
> ("af_unix: Fix splice-bind deadlock"), and then apply the lock split.
>
> Like the attached two patches.
>
> This is still *entirely* untested.

Tested-by: CAI Qian <caiqian@redhat.com>

> Rainer?
>
>            Linus
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Thu, Sep 1, 2016 at 2:43 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Thu, Sep 1, 2016 at 2:01 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>>
>>> Outside as in "all fs activity in bind happens under it". Along with
>>> assignment to ->u.addr, etc. IOW, make it the outermost lock there.
>>
>> Hah, yes. I misunderstood you.
>>
>> Yes. In fact that fixes the problem I mentioned, rather than introducing it.
>
> So the easiest approach would seem to be to revert commit c845acb324aa
> ("af_unix: Fix splice-bind deadlock"), and then apply the lock split.
>
> Like the attached two patches.
>
> This is still *entirely* untested.

As far as I can tell, this should work, since I can't currently imagine
why a fs operation would end up binding a unix socket. That said, the
idea of making af_unix.c yet more complicated in order to work around
the irregular behaviour of (as far as I can tell) a single filesystem
(for which kern_path_create doesn't really mean kern_path_create, and
which has to work around that once it gets control) goes against all my
instincts in this area. If filesystems need to do arbitrary stuff when
__sb_start_write is called for 'their' superblock, they should be able
to do so directly.

At present, this is a theoretical concern, as I can't (due to other work
commitments) put any non-cursory work into this before Sunday. There may
also be other reasons why this idea is impractical or even unworkable.
----- Original Message -----
> From: "CAI Qian" <caiqian@redhat.com>
> To: "Linus Torvalds" <torvalds@linux-foundation.org>
> Cc: "Al Viro" <viro@zeniv.linux.org.uk>, "Miklos Szeredi" <miklos@szeredi.hu>, "Rainer Weikusat"
> <rweikusat@cyberadapt.com>, "Hannes Frederic Sowa" <hannes@stressinduktion.org>, "Rainer Weikusat"
> <rweikusat@mobileactivedefense.com>, "Eric Sandeen" <esandeen@redhat.com>, "Network Development"
> <netdev@vger.kernel.org>
> Sent: Friday, September 2, 2016 10:43:20 AM
> Subject: Re: possible circular locking dependency detected
>
> ----- Original Message -----
> > From: "Linus Torvalds" <torvalds@linux-foundation.org>
> > To: "Al Viro" <viro@zeniv.linux.org.uk>, "CAI Qian" <caiqian@redhat.com>
> > Cc: "Miklos Szeredi" <miklos@szeredi.hu>, "Rainer Weikusat"
> > <rweikusat@cyberadapt.com>, "Hannes Frederic Sowa"
> > <hannes@stressinduktion.org>, "Rainer Weikusat"
> > <rweikusat@mobileactivedefense.com>, "Eric Sandeen"
> > <esandeen@redhat.com>, "Network Development" <netdev@vger.kernel.org>
> > Sent: Thursday, September 1, 2016 6:04:38 PM
> > Subject: Re: possible circular locking dependency detected
> >
> > On Thu, Sep 1, 2016 at 2:43 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > On Thu, Sep 1, 2016 at 2:01 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >>
> > >> Outside as in "all fs activity in bind happens under it". Along with
> > >> assignment to ->u.addr, etc. IOW, make it the outermost lock there.
> > >
> > > Hah, yes. I misunderstood you.
> > >
> > > Yes. In fact that fixes the problem I mentioned, rather than introducing
> > > it.
> >
> > So the easiest approach would seem to be to revert commit c845acb324aa
> > ("af_unix: Fix splice-bind deadlock"), and then apply the lock split.
> >
> > Like the attached two patches.
> >
> > This is still *entirely* untested.
> Tested-by: CAI Qian <caiqian@redhat.com>

Actually, I take that back, and now splice seems to start deadlocking
using the reproducer,

https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/splice/splice01.c

[ 1749.956818]
[ 1749.958492] ======================================================
[ 1749.965386] [ INFO: possible circular locking dependency detected ]
[ 1749.972381] 4.8.0-rc4+ #34 Not tainted
[ 1749.976560] -------------------------------------------------------
[ 1749.983554] splice01/35921 is trying to acquire lock:
[ 1749.989188]  (&sb->s_type->i_mutex_key#14){+.+.+.}, at: [<ffffffffa083c1f7>] xfs_file_buffered_aio_write+0x127/0x840 [xfs]
[ 1750.001644]
[ 1750.001644] but task is already holding lock:
[ 1750.008151]  (&pipe->mutex/1){+.+.+.}, at: [<ffffffff8169e7c1>] pipe_lock+0x51/0x60
[ 1750.016753]
[ 1750.016753] which lock already depends on the new lock.
[ 1750.016753]
[ 1750.025880]
[ 1750.025880] the existing dependency chain (in reverse order) is:
[ 1750.034229]
-> #2 (&pipe->mutex/1){+.+.+.}:
[ 1750.039139]        [<ffffffff812af52a>] lock_acquire+0x1fa/0x440
[ 1750.045857]        [<ffffffff8266448d>] mutex_lock_nested+0xdd/0x850
[ 1750.052963]        [<ffffffff8169e7c1>] pipe_lock+0x51/0x60
[ 1750.059190]        [<ffffffff8171ee25>] splice_to_pipe+0x75/0x9e0
[ 1750.066001]        [<ffffffff81723991>] __generic_file_splice_read+0xa71/0xe90
[ 1750.074071]        [<ffffffff81723e71>] generic_file_splice_read+0xc1/0x1f0
[ 1750.081849]        [<ffffffffa0838628>] xfs_file_splice_read+0x368/0x7b0 [xfs]
[ 1750.089940]        [<ffffffff8171fa7e>] do_splice_to+0xee/0x150
[ 1750.096555]        [<ffffffff817262f4>] SyS_splice+0x1144/0x1c10
[ 1750.103269]        [<ffffffff81007b66>] do_syscall_64+0x1a6/0x500
[ 1750.110084]        [<ffffffff8266ea7f>] return_from_SYSCALL_64+0x0/0x7a
[ 1750.117479]
-> #1 (&(&ip->i_iolock)->mr_lock#2){++++++}:
[ 1750.123649]        [<ffffffff812af52a>] lock_acquire+0x1fa/0x440
[ 1750.130362]        [<ffffffff8129b93e>] down_write_nested+0x5e/0xe0
[ 1750.137371]        [<ffffffffa086772e>] xfs_ilock+0x2fe/0x550 [xfs]
[ 1750.144397]        [<ffffffffa083c204>] xfs_file_buffered_aio_write+0x134/0x840 [xfs]
[ 1750.153175]        [<ffffffffa083cb7d>] xfs_file_write_iter+0x26d/0x6d0 [xfs]
[ 1750.161177]        [<ffffffff8168374e>] __vfs_write+0x2be/0x640
[ 1750.167799]        [<ffffffff816876e2>] vfs_write+0x152/0x4b0
[ 1750.174220]        [<ffffffff8168b0df>] SyS_write+0xdf/0x1d0
[ 1750.180547]        [<ffffffff8266e9bc>] entry_SYSCALL_64_fastpath+0x1f/0xbd
[ 1750.188328]
-> #0 (&sb->s_type->i_mutex_key#14){+.+.+.}:
[ 1750.194508]        [<ffffffff812adbc3>] __lock_acquire+0x3043/0x3dd0
[ 1750.201609]        [<ffffffff812af52a>] lock_acquire+0x1fa/0x440
[ 1750.208321]        [<ffffffff82668cda>] down_write+0x5a/0xe0
[ 1750.214645]        [<ffffffffa083c1f7>] xfs_file_buffered_aio_write+0x127/0x840 [xfs]
[ 1750.223421]        [<ffffffffa083cb7d>] xfs_file_write_iter+0x26d/0x6d0 [xfs]
[ 1750.231423]        [<ffffffff816859be>] vfs_iter_write+0x29e/0x550
[ 1750.238330]        [<ffffffff81722729>] iter_file_splice_write+0x529/0xb70
[ 1750.246012]        [<ffffffff817258d4>] SyS_splice+0x724/0x1c10
[ 1750.252627]        [<ffffffff81007b66>] do_syscall_64+0x1a6/0x500
[ 1750.259438]        [<ffffffff8266ea7f>] return_from_SYSCALL_64+0x0/0x7a
[ 1750.266830]
[ 1750.266830] other info that might help us debug this:
[ 1750.266830]
[ 1750.275764] Chain exists of:
  &sb->s_type->i_mutex_key#14 --> &(&ip->i_iolock)->mr_lock#2 --> &pipe->mutex/1

[ 1750.287213]  Possible unsafe locking scenario:
[ 1750.287213]
[ 1750.293817]        CPU0                    CPU1
[ 1750.298871]        ----                    ----
[ 1750.303924]   lock(&pipe->mutex/1);
[ 1750.307845]                                lock(&(&ip->i_iolock)->mr_lock#2);
[ 1750.315836]                                lock(&pipe->mutex/1);
[ 1750.322567]   lock(&sb->s_type->i_mutex_key#14);
[ 1750.327748]
[ 1750.327748]  *** DEADLOCK ***
[ 1750.327748]
[ 1750.334355] 2 locks held by splice01/35921:
[ 1750.339019]  #0:  (sb_writers#8){.+.+.+}, at: [<ffffffff8168f444>] __sb_start_write+0xb4/0xf0
[ 1750.348595]  #1:  (&pipe->mutex/1){+.+.+.}, at: [<ffffffff8169e7c1>] pipe_lock+0x51/0x60
[ 1750.357686]
[ 1750.357686] stack backtrace:
[ 1750.362548] CPU: 50 PID: 35921 Comm: splice01 Not tainted 4.8.0-rc4+ #34
[ 1750.370026] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0044.R00.1501191641 01/19/2015
[ 1750.381382]  0000000000000000 000000003bca9477 ffff88044c4176e0 ffffffff81a3d191
[ 1750.389675]  ffffffff84292880 ffffffff842b9e30 ffff88044c417730 ffffffff812a6aa6
[ 1750.397968]  ffffffff84292880 ffff880414a28cd0 ffff88044c417850 ffff880414a28cd0
[ 1750.406261] Call Trace:
[ 1750.408992]  [<ffffffff81a3d191>] dump_stack+0x85/0xc4
[ 1750.414725]  [<ffffffff812a6aa6>] print_circular_bug+0x356/0x460
[ 1750.421428]  [<ffffffff812adbc3>] __lock_acquire+0x3043/0x3dd0
[ 1750.427942]  [<ffffffff81414fe9>] ? is_ftrace_trampoline+0x99/0xe0
[ 1750.434840]  [<ffffffff812aab80>] ? debug_check_no_locks_freed+0x2c0/0x2c0
[ 1750.442512]  [<ffffffff812a0272>] ? add_lock_to_list.isra.29.constprop.45+0x142/0x1d0
[ 1750.451249]  [<ffffffff812acd9e>] ? __lock_acquire+0x221e/0x3dd0
[ 1750.457952]  [<ffffffff812aa3ce>] ? trace_hardirqs_on_caller+0x3fe/0x580
[ 1750.465430]  [<ffffffff812af52a>] lock_acquire+0x1fa/0x440
[ 1750.471578]  [<ffffffffa083c1f7>] ? xfs_file_buffered_aio_write+0x127/0x840 [xfs]
[ 1750.479929]  [<ffffffff82668cda>] down_write+0x5a/0xe0
[ 1750.485691]  [<ffffffffa083c1f7>] ? xfs_file_buffered_aio_write+0x127/0x840 [xfs]
[ 1750.494070]  [<ffffffffa083c1f7>] xfs_file_buffered_aio_write+0x127/0x840 [xfs]
[ 1750.502226]  [<ffffffff81007b66>] ? do_syscall_64+0x1a6/0x500
[ 1750.508666]  [<ffffffffa083c0d0>] ? xfs_file_dio_aio_write+0xca0/0xca0 [xfs]
[ 1750.516532]  [<ffffffff812a9f72>] ? mark_held_locks+0xd2/0x130
[ 1750.523044]  [<ffffffff812f5887>] ? debug_lockdep_rcu_enabled+0x77/0x90
[ 1750.530417]  [<ffffffff82664873>] ? mutex_lock_nested+0x4c3/0x850
[ 1750.537243]  [<ffffffffa083cb7d>] xfs_file_write_iter+0x26d/0x6d0 [xfs]
[ 1750.544625]  [<ffffffff8169e7c1>] ? pipe_lock+0x51/0x60
[ 1750.550456]  [<ffffffff816859be>] vfs_iter_write+0x29e/0x550
[ 1750.556770]  [<ffffffff81685720>] ? vfs_iter_read+0x540/0x540
[ 1750.563181]  [<ffffffff81722729>] iter_file_splice_write+0x529/0xb70
[ 1750.570271]  [<ffffffff81722200>] ? page_cache_pipe_buf_confirm+0x1f0/0x1f0
[ 1750.578041]  [<ffffffff812f5a33>] ? rcu_read_lock_sched_held+0xa3/0x120
[ 1750.585423]  [<ffffffff812f6055>] ? rcu_sync_lockdep_assert+0x75/0xb0
[ 1750.592609]  [<ffffffff8129bd6c>] ? percpu_down_read+0x5c/0xa0
[ 1750.599118]  [<ffffffff8168f444>] ? __sb_start_write+0xb4/0xf0
[ 1750.605627]  [<ffffffff817258d4>] SyS_splice+0x724/0x1c10
[ 1750.611651]  [<ffffffff812f5a33>] ? rcu_read_lock_sched_held+0xa3/0x120
[ 1750.619033]  [<ffffffff817251b0>] ? compat_SyS_vmsplice+0x1f0/0x1f0
[ 1750.626025]  [<ffffffff81007a12>] ? do_syscall_64+0x52/0x500
[ 1750.632338]  [<ffffffff817251b0>] ? compat_SyS_vmsplice+0x1f0/0x1f0
[ 1750.639330]  [<ffffffff81007b66>] do_syscall_64+0x1a6/0x500
[ 1750.645549]  [<ffffffff8100401a>] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 1750.652737]  [<ffffffff8266ea7f>] entry_SYSCALL64_slow_path+0x25/0x25

> >
> > Rainer?
> >
> >            Linus
> >
On Fri, Sep 02, 2016 at 04:18:04PM +0100, Rainer Weikusat wrote:
> As far as I can tell, this should work as I can't currently imagine why
> a fs operation might end up binding a unix socket despite the idea to
> make af_unix.c yet more complicated in order to work around irregular
> behaviour of (as far as I can tell) a single filesystem (for which
> kern_path_create doesn't really mean kern_path_create

Bullshit.  kern_path_create() *does* mean the same thing in all cases.
Namely, find the parent, lock it and leave the final name component for
the create-type operation.  It sure as hell is not guaranteed to take
*all* locks that are going to be taken in process of mknod/mkdir/etc.
Never had been.

> and it has to work
> around that once it gets control) goes against all instincts I have in
> this area. If filesystems need to do arbitrary stuff when
> __sb_start_write is called for 'their' superblock, they should be able
> to do so directly.
>
> At present, this is a theoretic concern as I can't (due to other work
> committments) put any non-cursory work into this before Sunday. There
> may also be other reasons why this idea is impractical or even
> unworkable.
Al Viro <viro@ZenIV.linux.org.uk> writes:
> On Fri, Sep 02, 2016 at 04:18:04PM +0100, Rainer Weikusat wrote:
>
>> As far as I can tell, this should work as I can't currently imagine
>> why a fs operation might end up binding a unix socket despite the
>> idea to make af_unix.c yet more complicated in order to work around
>> irregular behaviour of (as far as I can tell) a single filesystem
>> (for which kern_path_create doesn't really mean kern_path_create and
>> it has to work around that once it gets control) goes against all
>> instincts I have in this area. If filesystems need to do arbitrary
>> stuff when __sb_start_write is called for 'their' superblock, they
>> should be able to do so directly.
>
> Bullshit. kern_path_create() *does* mean the same thing in all cases.
> Namely, find the parent, lock it and leave the final name component for
> the create-type operation. It sure as hell is not guaranteed to take
> *all* locks that are going to be taken in process of mknod/mkdir/etc.
> Never had been.

This isn't about "all locks", it's about the lock in question. No other
mknod operation (that I'm aware of) calls this with another superblock
than the one already acted upon by kern_path_create. This may be wrong
(if so, feel free to correct it), but it's not "bullshit" (intentional
deception in order to sell something to someone).
----- Original Message -----
> From: "CAI Qian" <caiqian@redhat.com>
> To: "Linus Torvalds" <torvalds@linux-foundation.org>
> Cc: "Al Viro" <viro@zeniv.linux.org.uk>, "Miklos Szeredi" <miklos@szeredi.hu>, "Rainer Weikusat"
> <rweikusat@cyberadapt.com>, "Hannes Frederic Sowa" <hannes@stressinduktion.org>, "Rainer Weikusat"
> <rweikusat@mobileactivedefense.com>, "Eric Sandeen" <esandeen@redhat.com>, "Network Development"
> <netdev@vger.kernel.org>
> Sent: Friday, September 2, 2016 11:51:58 AM
> Subject: Re: possible circular locking dependency detected
>
> [ nested quoted headers snipped ]
>
> > > This is still *entirely* untested.
> > Tested-by: CAI Qian <caiqian@redhat.com>

OK, this tag still stands. The issue below also reproduces without those
patches, so it is most likely a separate problem, introduced recently
(after rc3 or rc4), probably by some xfs update.
       CAI Qian

> Actually, I take that back, and now splice seems to start deadlocking
> using the reproducer,
>
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/splice/splice01.c
>
> [ lockdep splat quoted in full earlier in the thread snipped ]
>
> > > Rainer?
> > >
> > >            Linus
On Fri, Sep 02, 2016 at 05:10:13PM +0100, Rainer Weikusat wrote:
> > Bullshit. kern_path_create() *does* mean the same thing in all cases.
> > Namely, find the parent, lock it and leave the final name component for
> > the create-type operation. It sure as hell is not guaranteed to take
> > *all* locks that are going to be taken in process of mknod/mkdir/etc.
> > Never had been.
>
> This isn't about "all locks", it's about the lock in question. No other
> mknod operation (I'm aware of) calls this with another superblock than
> the one already acted upon by kern_path_create. This may be wrong (if
> so, feel free to correct it) but it's not "bullshit" (intentional
> deception in order to sell something to someone).

Never had been promised.  And it's not just this lock - e.g. ->i_rwsem
is taken on the parent by kern_path_create() and on the parent in the
underlying filesystem by the ecryptfs ->mknod() (as well as the
overlayfs one).  bind/bind deadlock - one bind on a path in ecryptfs,
another on the corresponding path in the raw filesystem behind it
(which can be mounted elsewhere/in another namespace/etc.), with those
paths ending in the matching directories (the last components may be
the same or different - it doesn't matter):

A: lock parent in ecryptfs (via kern_path_create())
B: lock the directory behind it in the underlying fs (ditto)
A: grab ->readlock
B: block on ->readlock
A: call ecryptfs_mknod() and block trying to lock the directory held by B

Deadlock.  And while we are at it, ecryptfs probably ought to claim
transient write access on the underlying filesystem for the duration of
its modifications, similar to overlayfs.

The point is, it had never been promised that you can stick random
locks just outside of ->mknod()/->mkdir()/etc.  The same goes for e.g.
an NFS mount of something exported by localhost; knfsd must lock the
parent directory on the server before creating anything in it.  Suppose
you have /srv/nfs/foo exported and mounted on the same host at
/mnt/bar.  bind to /mnt/bar/barf/a vs. bind to /srv/nfs/foo/barf/b:

A: lock /mnt/bar/barf
B: lock /srv/nfs/foo/barf
A: grab ->readlock
B: block on ->readlock
A: call nfs_mknod(), wait for reply
knfsd: block trying to lock /srv/nfs/foo/barf

It's very much _not_ just overlayfs being pathological - that it
certainly is, but the problem is much wider.  You could try to argue
that kern_path_create() should've known to lock all the relevant
directories in the case of overlayfs and ecryptfs, but it has
absolutely no chance to do that in the case of NFS - the protocol
doesn't allow "lock this directory, one of my next requests will be to
create something in it".  Even leases are only for regular files...
On Fri, Sep 2, 2016 at 8:51 AM, CAI Qian <caiqian@redhat.com> wrote:
>
> Actually, I take that back, and now splice seems to start deadlocking
> using the reproducer,
>
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/splice/splice01.c

This is a different deadlock, though. This is a deadlock due to mixed
lock ordering between the pipe mutex, the XFS "mr_lock", and the inode
mutex.

If I read the lockdep trace correctly, we have:

Normally we have write doing

  inode->i_mutex -> i_iolock.mr_lock

for the regular write path.

But the normal splice "write()" case has

  pipe->mutex -> filesystem write lock (normally i_mutex)

(in iter_file_splice_write() that calls vfs_iter_write() that calls
->write_iter()), and then the XFS splice case has

  mr_lock -> pipe->mutex

in xfs_file_splice_read() calling splice_to_pipe().

So you end up with an A->B->C->A chain.

I think it's new to the new iomap-based buffered write path in 4.8.
Dave, Christoph?

                 Linus

> [ lockdep splat quoted in full earlier in the thread snipped ]
On Fri, Sep 2, 2016 at 10:02 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> It's very much _not_ just overlayfs being pathological - that it certainly is,
> but the problem is much wider.

Al, can you take a look at my two patches, and see if you agree that
they fix it, though?

Of course, we now have *another* splice deadlock. That pipe inode is
nasty, it's very easy to deadlock on it in subtle ways.

               Linus
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Fri, Sep 2, 2016 at 10:02 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> It's very much _not_ just overlayfs being pathological - that it certainly is,
>> but the problem is much wider.
>
> Al, can you take a look at my two patches, and see if you agree that
> they fix it, though?

The original deadlock occurred because some code path locked the
superblock and then tried to acquire the af_unix readlock, while
unix_bind took the same locks in the opposite order (by doing
kern_path_create with the readlock held). If unix_bind no longer shares
a lock with the receive routines, this obviously can't happen anymore.

The other problem situation is one where a lock on something can be
acquired both by kern_path_create and by a mknod operation, with the
readlock taken in between: the thread which acquired that lock via
kern_path_create can be blocked by another thread which took the
readlock first and is about to try to acquire the same lock via mknod.
This can't happen anymore either once the original 'acquire readlock
(now bindlock) first' ordering is restored.
On Fri, Sep 02, 2016 at 10:12:08AM -0700, Linus Torvalds wrote:
> On Fri, Sep 2, 2016 at 10:02 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > It's very much _not_ just overlayfs being pathological - that it certainly is,
> > but the problem is much wider.
>
> Al, can you take a look at my two patches, and see if you agree that
> they fix it, though?

AFAICS, they should. Locking is obviously saner that way and AFAICS the
rest is absolutely straightforward.

Acked-by: Al Viro <viro@zeniv.linux.org.uk>

> Of course, we now have *another* splice deadlock. That pipe inode is
> nasty, it's very easy to deadlock on it in subtle ways.

I'm still digging through iomap.c, but that's better taken to another
branch of this thread...
On Fri, Sep 02, 2016 at 06:40:59PM +0100, Rainer Weikusat wrote:
> The original deadlock occurred because of some code path locking the
> superblock followed by trying to acquire the af_unix readlock while

Not even that - one code path takes ->readlock under pipe lock, while
another takes pipe lock under sb_start_write...
From 9a76489d81f6d2b1da22906363d28c398d4f7c5c Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu, 1 Sep 2016 14:43:53 -0700
Subject: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'

Right now we use the 'readlock' both for protecting some of the af_unix
IO path and for making the bind be single-threaded. The two are
independent, but using the same lock makes for a nasty deadlock due to
ordering with regards to filesystem locking. The bind locking would want
to nest outside the VFS pathname locking, but the IO locking wants to
nest inside some of those same locks.

We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix
splice-bind deadlock") which moved the readlock inside the vfs locks,
but that caused problems with overlayfs that will then call back into
filesystem routines that take the lock in the wrong order anyway.

Splitting the locks means that we can go back to having the bind lock be
the outermost lock, and we don't have any deadlocks with lock ordering.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/net/af_unix.h |  2 +-
 net/unix/af_unix.c    | 45 +++++++++++++++++++++++----------------------
 2 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 9b4c418bebd8..fd60eccb59a6 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -52,7 +52,7 @@ struct unix_sock {
 	struct sock		sk;
 	struct unix_address	*addr;
 	struct path		path;
-	struct mutex		readlock;
+	struct mutex		iolock, bindlock;
 	struct sock		*peer;
 	struct list_head	link;
 	atomic_long_t		inflight;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 433ae1bbef97..8309687a56b0 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -661,11 +661,11 @@ static int unix_set_peek_off(struct sock *sk, int val)
 {
 	struct unix_sock *u = unix_sk(sk);
 
-	if (mutex_lock_interruptible(&u->readlock))
+	if (mutex_lock_interruptible(&u->iolock))
 		return -EINTR;
 
 	sk->sk_peek_off = val;
-	mutex_unlock(&u->readlock);
+	mutex_unlock(&u->iolock);
 
 	return 0;
 }
@@ -779,7 +779,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
 	spin_lock_init(&u->lock);
 	atomic_long_set(&u->inflight, 0);
 	INIT_LIST_HEAD(&u->link);
-	mutex_init(&u->readlock); /* single task reading lock */
+	mutex_init(&u->iolock); /* single task reading lock */
+	mutex_init(&u->bindlock); /* single task binding lock */
 	init_waitqueue_head(&u->peer_wait);
 	init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay);
 	unix_insert_socket(unix_sockets_unbound(sk), sk);
@@ -848,7 +849,7 @@ static int unix_autobind(struct socket *sock)
 	int err;
 	unsigned int retries = 0;
 
-	err = mutex_lock_interruptible(&u->readlock);
+	err = mutex_lock_interruptible(&u->bindlock);
 	if (err)
 		return err;
 
@@ -895,7 +896,7 @@ retry:
 	spin_unlock(&unix_table_lock);
 	err = 0;
 
-out:	mutex_unlock(&u->readlock);
+out:	mutex_unlock(&u->bindlock);
 	return err;
 }
 
@@ -1009,7 +1010,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		goto out;
 	addr_len = err;
 
-	err = mutex_lock_interruptible(&u->readlock);
+	err = mutex_lock_interruptible(&u->bindlock);
 	if (err)
 		goto out;
 
@@ -1063,7 +1064,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 out_unlock:
 	spin_unlock(&unix_table_lock);
 out_up:
-	mutex_unlock(&u->readlock);
+	mutex_unlock(&u->bindlock);
 out:
 	return err;
 }
@@ -1955,17 +1956,17 @@ static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page,
 	if (false) {
 alloc_skb:
 		unix_state_unlock(other);
-		mutex_unlock(&unix_sk(other)->readlock);
+		mutex_unlock(&unix_sk(other)->iolock);
 		newskb = sock_alloc_send_pskb(sk, 0, 0, flags & MSG_DONTWAIT,
 					      &err, 0);
 		if (!newskb)
 			goto err;
 	}
 
-	/* we must acquire readlock as we modify already present
+	/* we must acquire iolock as we modify already present
 	 * skbs in the sk_receive_queue and mess with skb->len
 	 */
-	err = mutex_lock_interruptible(&unix_sk(other)->readlock);
+	err = mutex_lock_interruptible(&unix_sk(other)->iolock);
 	if (err) {
 		err = flags & MSG_DONTWAIT ? -EAGAIN : -ERESTARTSYS;
 		goto err;
@@ -2032,7 +2033,7 @@ alloc_skb:
 	}
 
 	unix_state_unlock(other);
-	mutex_unlock(&unix_sk(other)->readlock);
+	mutex_unlock(&unix_sk(other)->iolock);
 
 	other->sk_data_ready(other);
 	scm_destroy(&scm);
@@ -2041,7 +2042,7 @@ alloc_skb:
 err_state_unlock:
 	unix_state_unlock(other);
 err_unlock:
-	mutex_unlock(&unix_sk(other)->readlock);
+	mutex_unlock(&unix_sk(other)->iolock);
 err:
 	kfree_skb(newskb);
 	if (send_sigpipe && !(flags & MSG_NOSIGNAL))
@@ -2109,7 +2110,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
 	timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
 
 	do {
-		mutex_lock(&u->readlock);
+		mutex_lock(&u->iolock);
 
 		skip = sk_peek_offset(sk, flags);
 		skb = __skb_try_recv_datagram(sk, flags, &peeked, &skip, &err,
@@ -2117,14 +2118,14 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
 		if (skb)
 			break;
 
-		mutex_unlock(&u->readlock);
+		mutex_unlock(&u->iolock);
 
 		if (err != -EAGAIN)
 			break;
 	} while (timeo &&
 		 !__skb_wait_for_more_packets(sk, &err, &timeo, last));
 
-	if (!skb) { /* implies readlock unlocked */
+	if (!skb) { /* implies iolock unlocked */
 		unix_state_lock(sk);
 
 		/* Signal EOF on disconnected non-blocking SEQPACKET socket.
 		 */
 		if (sk->sk_type == SOCK_SEQPACKET && err == -EAGAIN &&
@@ -2189,7 +2190,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
 
 out_free:
 	skb_free_datagram(sk, skb);
-	mutex_unlock(&u->readlock);
+	mutex_unlock(&u->iolock);
 out:
 	return err;
 }
@@ -2284,7 +2285,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state)
 	/* Lock the socket to prevent queue disordering
 	 * while sleeps in memcpy_tomsg
 	 */
-	mutex_lock(&u->readlock);
+	mutex_lock(&u->iolock);
 
 	if (flags & MSG_PEEK)
 		skip = sk_peek_offset(sk, flags);
@@ -2326,7 +2327,7 @@ again:
 			break;
 		}
 
-		mutex_unlock(&u->readlock);
+		mutex_unlock(&u->iolock);
 
 		timeo = unix_stream_data_wait(sk, timeo, last,
 					      last_len);
@@ -2337,7 +2338,7 @@ again:
 			goto out;
 		}
 
-		mutex_lock(&u->readlock);
+		mutex_lock(&u->iolock);
 		goto redo;
 unlock:
 		unix_state_unlock(sk);
@@ -2440,7 +2441,7 @@ unlock:
 		}
 	} while (size);
 
-	mutex_unlock(&u->readlock);
+	mutex_unlock(&u->iolock);
 	if (state->msg)
 		scm_recv(sock, state->msg, &scm, flags);
 	else
@@ -2481,9 +2482,9 @@ static ssize_t skb_unix_socket_splice(struct sock *sk,
 	int ret;
 	struct unix_sock *u = unix_sk(sk);
 
-	mutex_unlock(&u->readlock);
+	mutex_unlock(&u->iolock);
 	ret = splice_to_pipe(pipe, spd);
-	mutex_lock(&u->readlock);
+	mutex_lock(&u->iolock);
 
 	return ret;
 }
-- 
2.10.0.rc0.2.g0a9fa47