diff mbox

[RFC] overlayfs,xattr: allow unprivileged users to whiteout

Message ID 20140225173113.GA14257@sergelap
State New
Headers show

Commit Message

Serge E. Hallyn Feb. 25, 2014, 5:31 p.m. UTC
To mark a file which exists in the lower layer as deleted,
it creates a symbolic link to a file called "(overlay-whiteout)"
in the writeable mount, and sets a "trusted.overlay" xattr
on that link.

1. When the create the symbolic link as container root, not
as the global root

2. Allow root in a container to edit "trusted.overlay*"
xattrs.  Generally only global root is allowed to edit
"trusted.*"

With this patch, I'm able to delete files and directories in a
user-namespace-based overlayfs-backed container.  The overlay
writeable layer after "rm ab/ab; rmdir ab; mv xxx yyy;" ends up
looking like:

ls -l .local/share/lxc/u11/delta0/home/ubuntu/
total 0
lrwxrwxrwx 1 150000 150000 18 Feb 13 22:30 ab -> (overlay-whiteout)
lrwxrwxrwx 1 150000 150000 18 Feb 13 22:30 xxx -> (overlay-whiteout)
-rw-rw-r-- 1 151000 151000  0 Feb 13 03:53 yyy

(with 150000 being the mapped container root)

Note - the fs/xattr.c hunk could presumably be dropped if we
switched to using "user.overlay".  This could however cause
problems with pre-existing overlay deltas.  I don't really care
how this is done, but am wondering whether there is any good
reason why making overlay whiteouts should require root on
the host rather than only in the userns.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
---
 fs/overlayfs/dir.c | 9 +++++++--
 fs/xattr.c         | 5 ++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

Comments

Miklos Szeredi Feb. 28, 2014, 2:15 p.m. UTC | #1
On Tue, Feb 25, 2014 at 6:31 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> To mark a file which exists in the lower layer as deleted,
> it creates a symbolic link to a file called "(overlay-whiteout)"
> in the writeable mount, and sets a "trusted.overlay" xattr
> on that link.
>
> 1. When the create the symbolic link as container root, not
> as the global root
>
> 2. Allow root in a container to edit "trusted.overlay*"
> xattrs.  Generally only global root is allowed to edit
> "trusted.*"

Shouldn't overlayfs just skip the permission checks and call
__vfs_setxattr_noperm() instead?

Thanks,
Miklos
Andy Whitcroft Feb. 28, 2014, 2:55 p.m. UTC | #2
On Fri, Feb 28, 2014 at 03:15:14PM +0100, Miklos Szeredi wrote:
> On Tue, Feb 25, 2014 at 6:31 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> > To mark a file which exists in the lower layer as deleted,
> > it creates a symbolic link to a file called "(overlay-whiteout)"
> > in the writeable mount, and sets a "trusted.overlay" xattr
> > on that link.
> >
> > 1. When the create the symbolic link as container root, not
> > as the global root
> >
> > 2. Allow root in a container to edit "trusted.overlay*"
> > xattrs.  Generally only global root is allowed to edit
> > "trusted.*"
> 
> Shouldn't overlayfs just skip the permission checks and call
> __vfs_setxattr_noperm() instead?

It does seem we should be avoiding the permissions here, as we have let
the thing be mounted we have done the permissions checks for that and for
the file access itself already.  This operation is something we definatly
want to represent in the filesystem.

-apw
Serge E. Hallyn Feb. 28, 2014, 4:23 p.m. UTC | #3
Quoting Andy Whitcroft (apw@canonical.com):
> On Fri, Feb 28, 2014 at 03:15:14PM +0100, Miklos Szeredi wrote:
> > On Tue, Feb 25, 2014 at 6:31 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> > > To mark a file which exists in the lower layer as deleted,
> > > it creates a symbolic link to a file called "(overlay-whiteout)"
> > > in the writeable mount, and sets a "trusted.overlay" xattr
> > > on that link.
> > >
> > > 1. When the create the symbolic link as container root, not
> > > as the global root
> > >
> > > 2. Allow root in a container to edit "trusted.overlay*"
> > > xattrs.  Generally only global root is allowed to edit
> > > "trusted.*"
> > 
> > Shouldn't overlayfs just skip the permission checks and call
> > __vfs_setxattr_noperm() instead?
> 
> It does seem we should be avoiding the permissions here, as we have let
> the thing be mounted we have done the permissions checks for that and for
> the file access itself already.  This operation is something we definatly
> want to represent in the filesystem.

D'oh.  Yeah, that looks good.   Andy, should I send a new patch, or
can you make those changes inline?

-serge
Andy Whitcroft March 5, 2014, 5:46 p.m. UTC | #4
When mounting an overlayfs filesystem in a user namespace we encounter
issues with whiteout handling.  Specifically the local namespace root
user is not able to manipulate the "trusted." hierachy to install the
xattr marking the whiteouts.

In the preceeding discussion it was noted that we should really be avoiding
the permissions checks in these cases, as we have already validated that
the operation is permissible.

It was suggested we could use the __vfs_getxattr_noperm interfaces for
this, and I did spend some time prototyping this.  However it requires
us to expose at least two new interfaces to allow setting and getting
these without checks.  Overall it seemed excessive given that we were
already indicating our willingness to take on additional capabilities to
get these operations done.  That lead to a rethink, and the two alternate
patches which follow:

1) overlayfs: switch to the init user namespace for xattr operations:
This first patch solves the issue by dropping back into the init_user_ns
for these operations, effectivly avoiding the namespace issues.

2) overlayfs: use kernel service credentials for copy up and xattr manipulations:
This second patch takes a more radical approach.  As we are willing to
grab all and any permissions we need to make these operations succeed,
it simply grabs full kernel rights via prepare_kernel_cred(NULL).

I personally favour the latter approach as it allows us to remove a number
of essentially arbitrary capability grabs, though it could be argued that
the former patch better documents what is its we actually need.

We have done some basic testing on both and they seem to work as expected.

Comments, thoughts?

-apw

Andy Whitcroft (1):
  UBUNTU: ubuntu: overlayfs20 -- switch to the init user namespace for
    xattr operations

 fs/overlayfs/copy_up.c |  4 ++++
 fs/overlayfs/dir.c     | 23 +++++++++++++++++++----
 fs/overlayfs/readdir.c |  7 +++++++
 fs/overlayfs/super.c   |  5 ++++-
 4 files changed, 34 insertions(+), 5 deletions(-)
diff mbox

Patch

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index a209409..3c4657b 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -12,6 +12,7 @@ 
 #include <linux/xattr.h>
 #include <linux/security.h>
 #include <linux/cred.h>
+#include <linux/sched.h>
 #include "overlayfs.h"
 
 static const char *ovl_whiteout_symlink = "(overlay-whiteout)";
@@ -38,8 +39,12 @@  static int ovl_whiteout(struct dentry *upperdir, struct dentry *dentry)
 	cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
 	cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
 	cap_raise(override_cred->cap_effective, CAP_FOWNER);
-	override_cred->fsuid = GLOBAL_ROOT_UID;
-	override_cred->fsgid = GLOBAL_ROOT_GID;
+	override_cred->fsuid = make_kuid(current_user_ns(), 0);
+	if (!uid_valid(override_cred->fsuid))
+		override_cred->fsuid = GLOBAL_ROOT_UID;
+	override_cred->fsgid = make_kgid(current_user_ns(), 0);
+	if (!gid_valid(override_cred->fsgid))
+		override_cred->fsgid = GLOBAL_ROOT_GID;
 	old_cred = override_creds(override_cred);
 
 	newdentry = lookup_one_len(dentry->d_name.name, upperdir,
diff --git a/fs/xattr.c b/fs/xattr.c
index 3377dff..edd826c 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -52,7 +52,10 @@  xattr_permission(struct inode *inode, const char *name, int mask)
 	 * The trusted.* namespace can only be accessed by privileged users.
 	 */
 	if (!strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN)) {
-		if (!capable(CAP_SYS_ADMIN))
+		if (strncmp(name, "trusted.overlay", 15) == 0) {
+			if (!inode_capable(inode, CAP_SYS_ADMIN))
+				return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
+		} else if (!capable(CAP_SYS_ADMIN))
 			return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
 		return 0;
 	}