Message ID | 4ED68017.5030504@canonical.com |
---|---|
State | New |
Headers | show |
On Wed, Nov 30, 2011 at 01:12:23PM -0600, Chris J Arges wrote: > BugLink: http://bugs.launchpad.net/bugs/769927 > > SRU Justification: > > Impact: > If two processes attempt to cause automounting on the same mountpoint at > the same time, the vfsmount holding the mountpoint will be left with one > too few references on it, causing a BUG when the kernel tries to clean up. > > Fix: > Fix has been cherry picked from 8aef18845266f5c05904c610088f2d1ed58f6be3t. > It has been accepted in 3.0 and is present in ubuntu-oneiric. > It applies cleanly to ubuntu-natty. > > Testcase: > The procedure for replicating this bug is outlined in the patch > description. Essentially mounting an nfs mount with another mount in it. > Running a test program, and unmount the submount and original mount. > > Patch is attached. > From a8d02e97b6a23ac9648a7a550feb0ed7c7109ec2 Mon Sep 17 00:00:00 2001 > From: Al Viro <viro@ZenIV.linux.org.uk> > Date: Thu, 16 Jun 2011 15:10:06 +0100 > Subject: [PATCH] VFS: Fix vfsmount overput on simultaneous automount > > BugLink: http://bugs.launchpad.net/bugs/769927 > > [Kudos to dhowells for tracking that crap down] > > If two processes attempt to cause automounting on the same mountpoint at the > same time, the vfsmount holding the mountpoint will be left with one too few > references on it, causing a BUG when the kernel tries to clean up. > > The problem is that lock_mount() drops the caller's reference to the > mountpoint's vfsmount in the case where it finds something already mounted on > the mountpoint as it transits to the mounted filesystem and replaces path->mnt > with the new mountpoint vfsmount. > > During a pathwalk, however, we don't take a reference on the vfsmount if it is > the same as the one in the nameidata struct, but do_add_mount() doesn't know > this. > > The fix is to make sure we have a ref on the vfsmount of the mountpoint before > calling do_add_mount(). However, if lock_mount() doesn't transit, we're then > left with an extra ref on the mountpoint vfsmount which needs releasing. > We can handle that in follow_managed() by not making assumptions about what > we can and what we cannot get from lookup_mnt() as the current code does. > > The callers of follow_managed() expect that reference to path->mnt will be > grabbed iff path->mnt has been changed. follow_managed() and follow_automount() > keep track of whether such reference has been grabbed and assume that it'll > happen in those and only those cases that'll have us return with changed > path->mnt. That assumption is almost correct - it breaks in case of > racing automounts and in even harder to hit race between following a mountpoint > and a couple of mount --move. The thing is, we don't need to make that > assumption at all - after the end of loop in follow_manage() we can check > if path->mnt has ended up unchanged and do mntput() if needed. > > The BUG can be reproduced with the following test program: > > #include <stdio.h> > #include <sys/types.h> > #include <sys/stat.h> > #include <unistd.h> > #include <sys/wait.h> > int main(int argc, char **argv) > { > int pid, ws; > struct stat buf; > pid = fork(); > stat(argv[1], &buf); > if (pid > 0) wait(&ws); > return 0; > } > > and the following procedure: > > (1) Mount an NFS volume that on the server has something else mounted on a > subdirectory. For instance, I can mount / from my server: > > mount warthog:/ /mnt -t nfs4 -r > > On the server /data has another filesystem mounted on it, so NFS will see > a change in FSID as it walks down the path, and will mark /mnt/data as > being a mountpoint. This will cause the automount code to be triggered. > > !!! Do not look inside the mounted fs at this point !!! > > (2) Run the above program on a file within the submount to generate two > simultaneous automount requests: > > /tmp/forkstat /mnt/data/testfile > > (3) Unmount the automounted submount: > > umount /mnt/data > > (4) Unmount the original mount: > > umount /mnt > > At this point the kernel should throw a BUG with something like the > following: > > BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12] > > Note that the bug appears on the root dentry of the original mount, not the > mountpoint and not the submount because sys_umount() hasn't got to its final > mntput_no_expire() yet, but this isn't so obvious from the call trace: > > [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82 > [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b > [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs] > [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e > [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs] > [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83 > [<ffffffff811629ff>] deactivate_super+0x6f/0x7b > [<ffffffff81186261>] mntput_no_expire+0x18d/0x199 > [<ffffffff811862a8>] mntput+0x3b/0x44 > [<ffffffff81186d87>] release_mounts+0xa2/0xbf > [<ffffffff811876af>] sys_umount+0x47a/0x4ba > [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f > [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b > > as do_umount() is inlined. However, you can see release_mounts() in there. > > Note also that it may be necessary to have multiple CPU cores to be able to > trigger this bug. > > Tested-by: Jeff Layton <jlayton@redhat.com> > Tested-by: Ian Kent <raven@themaw.net> > Signed-off-by: David Howells <dhowells@redhat.com> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> > (cherry picked from commit 8aef18845266f5c05904c610088f2d1ed58f6be3) > > Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> > --- > fs/namei.c | 24 ++++++++++++++++-------- > 1 files changed, 16 insertions(+), 8 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 44890c6..7bf6968 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -923,6 +923,11 @@ static int follow_automount(struct path *path, unsigned flags, > if (!mnt) /* mount collision */ > return 0; > > + if (!*need_mntput) { > + /* lock_mount() may release path->mnt on error */ > + mntget(path->mnt); > + *need_mntput = true; > + } > err = finish_automount(mnt, path); > > switch (err) { > @@ -930,12 +935,9 @@ static int follow_automount(struct path *path, unsigned flags, > /* Someone else made a mount here whilst we were busy */ > return 0; > case 0: > - dput(path->dentry); > - if (*need_mntput) > - mntput(path->mnt); > + path_put(path); > path->mnt = mnt; > path->dentry = dget(mnt->mnt_root); > - *need_mntput = true; > return 0; > default: > return err; > @@ -955,9 +957,10 @@ static int follow_automount(struct path *path, unsigned flags, > */ > static int follow_managed(struct path *path, unsigned flags) > { > + struct vfsmount *mnt = path->mnt; /* held by caller, must be left alone */ > unsigned managed; > bool need_mntput = false; > - int ret; > + int ret = 0; > > /* Given that we're not holding a lock here, we retain the value in a > * local variable for each dentry as we look at it so that we don't see > @@ -973,7 +976,7 @@ static int follow_managed(struct path *path, unsigned flags) > ret = path->dentry->d_op->d_manage(path->dentry, > false, false); > if (ret < 0) > - return ret == -EISDIR ? 0 : ret; > + break; > } > > /* Transit to a mounted filesystem. */ > @@ -999,14 +1002,19 @@ static int follow_managed(struct path *path, unsigned flags) > if (managed & DCACHE_NEED_AUTOMOUNT) { > ret = follow_automount(path, flags, &need_mntput); > if (ret < 0) > - return ret == -EISDIR ? 0 : ret; > + break; > continue; > } > > /* We didn't change the current path point */ > break; > } > - return 0; > + > + if (need_mntput && path->mnt == mnt) > + mntput(path->mnt); > + if (ret == -EISDIR) > + ret = 0; > + return ret; > } > > int follow_down_one(struct path *path) > -- > 1.7.5.4 > > -- > kernel-team mailing list > kernel-team@lists.ubuntu.com > https://lists.ubuntu.com/mailman/listinfo/kernel-team
On 11/30/2011 12:12 PM, Chris J Arges wrote: > BugLink: http://bugs.launchpad.net/bugs/769927 > > SRU Justification: > > Impact: > If two processes attempt to cause automounting on the same mountpoint at > the same time, the vfsmount holding the mountpoint will be left with one > too few references on it, causing a BUG when the kernel tries to clean up. > > Fix: > Fix has been cherry picked from 8aef18845266f5c05904c610088f2d1ed58f6be3t. > It has been accepted in 3.0 and is present in ubuntu-oneiric. > It applies cleanly to ubuntu-natty. > > Testcase: > The procedure for replicating this bug is outlined in the patch > description. Essentially mounting an nfs mount with another mount in it. > Running a test program, and unmount the submount and original mount. > > Patch is attached. >
From a8d02e97b6a23ac9648a7a550feb0ed7c7109ec2 Mon Sep 17 00:00:00 2001 From: Al Viro <viro@ZenIV.linux.org.uk> Date: Thu, 16 Jun 2011 15:10:06 +0100 Subject: [PATCH] VFS: Fix vfsmount overput on simultaneous automount BugLink: http://bugs.launchpad.net/bugs/769927 [Kudos to dhowells for tracking that crap down] If two processes attempt to cause automounting on the same mountpoint at the same time, the vfsmount holding the mountpoint will be left with one too few references on it, causing a BUG when the kernel tries to clean up. The problem is that lock_mount() drops the caller's reference to the mountpoint's vfsmount in the case where it finds something already mounted on the mountpoint as it transits to the mounted filesystem and replaces path->mnt with the new mountpoint vfsmount. During a pathwalk, however, we don't take a reference on the vfsmount if it is the same as the one in the nameidata struct, but do_add_mount() doesn't know this. The fix is to make sure we have a ref on the vfsmount of the mountpoint before calling do_add_mount(). However, if lock_mount() doesn't transit, we're then left with an extra ref on the mountpoint vfsmount which needs releasing. We can handle that in follow_managed() by not making assumptions about what we can and what we cannot get from lookup_mnt() as the current code does. The callers of follow_managed() expect that reference to path->mnt will be grabbed iff path->mnt has been changed. follow_managed() and follow_automount() keep track of whether such reference has been grabbed and assume that it'll happen in those and only those cases that'll have us return with changed path->mnt. That assumption is almost correct - it breaks in case of racing automounts and in even harder to hit race between following a mountpoint and a couple of mount --move. The thing is, we don't need to make that assumption at all - after the end of loop in follow_manage() we can check if path->mnt has ended up unchanged and do mntput() if needed. The BUG can be reproduced with the following test program: #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> #include <sys/wait.h> int main(int argc, char **argv) { int pid, ws; struct stat buf; pid = fork(); stat(argv[1], &buf); if (pid > 0) wait(&ws); return 0; } and the following procedure: (1) Mount an NFS volume that on the server has something else mounted on a subdirectory. For instance, I can mount / from my server: mount warthog:/ /mnt -t nfs4 -r On the server /data has another filesystem mounted on it, so NFS will see a change in FSID as it walks down the path, and will mark /mnt/data as being a mountpoint. This will cause the automount code to be triggered. !!! Do not look inside the mounted fs at this point !!! (2) Run the above program on a file within the submount to generate two simultaneous automount requests: /tmp/forkstat /mnt/data/testfile (3) Unmount the automounted submount: umount /mnt/data (4) Unmount the original mount: umount /mnt At this point the kernel should throw a BUG with something like the following: BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12] Note that the bug appears on the root dentry of the original mount, not the mountpoint and not the submount because sys_umount() hasn't got to its final mntput_no_expire() yet, but this isn't so obvious from the call trace: [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82 [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs] [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs] [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83 [<ffffffff811629ff>] deactivate_super+0x6f/0x7b [<ffffffff81186261>] mntput_no_expire+0x18d/0x199 [<ffffffff811862a8>] mntput+0x3b/0x44 [<ffffffff81186d87>] release_mounts+0xa2/0xbf [<ffffffff811876af>] sys_umount+0x47a/0x4ba [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b as do_umount() is inlined. However, you can see release_mounts() in there. Note also that it may be necessary to have multiple CPU cores to be able to trigger this bug. Tested-by: Jeff Layton <jlayton@redhat.com> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> (cherry picked from commit 8aef18845266f5c05904c610088f2d1ed58f6be3) Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> --- fs/namei.c | 24 ++++++++++++++++-------- 1 files changed, 16 insertions(+), 8 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 44890c6..7bf6968 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -923,6 +923,11 @@ static int follow_automount(struct path *path, unsigned flags, if (!mnt) /* mount collision */ return 0; + if (!*need_mntput) { + /* lock_mount() may release path->mnt on error */ + mntget(path->mnt); + *need_mntput = true; + } err = finish_automount(mnt, path); switch (err) { @@ -930,12 +935,9 @@ static int follow_automount(struct path *path, unsigned flags, /* Someone else made a mount here whilst we were busy */ return 0; case 0: - dput(path->dentry); - if (*need_mntput) - mntput(path->mnt); + path_put(path); path->mnt = mnt; path->dentry = dget(mnt->mnt_root); - *need_mntput = true; return 0; default: return err; @@ -955,9 +957,10 @@ static int follow_automount(struct path *path, unsigned flags, */ static int follow_managed(struct path *path, unsigned flags) { + struct vfsmount *mnt = path->mnt; /* held by caller, must be left alone */ unsigned managed; bool need_mntput = false; - int ret; + int ret = 0; /* Given that we're not holding a lock here, we retain the value in a * local variable for each dentry as we look at it so that we don't see @@ -973,7 +976,7 @@ static int follow_managed(struct path *path, unsigned flags) ret = path->dentry->d_op->d_manage(path->dentry, false, false); if (ret < 0) - return ret == -EISDIR ? 0 : ret; + break; } /* Transit to a mounted filesystem. */ @@ -999,14 +1002,19 @@ static int follow_managed(struct path *path, unsigned flags) if (managed & DCACHE_NEED_AUTOMOUNT) { ret = follow_automount(path, flags, &need_mntput); if (ret < 0) - return ret == -EISDIR ? 0 : ret; + break; continue; } /* We didn't change the current path point */ break; } - return 0; + + if (need_mntput && path->mnt == mnt) + mntput(path->mnt); + if (ret == -EISDIR) + ret = 0; + return ret; } int follow_down_one(struct path *path) -- 1.7.5.4