Message ID | 20140225173113.GA14257@sergelap |
---|---|
State | New |
Headers | show |
On Tue, Feb 25, 2014 at 6:31 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote: > To mark a file which exists in the lower layer as deleted, > it creates a symbolic link to a file called "(overlay-whiteout)" > in the writeable mount, and sets a "trusted.overlay" xattr > on that link. > > 1. When the create the symbolic link as container root, not > as the global root > > 2. Allow root in a container to edit "trusted.overlay*" > xattrs. Generally only global root is allowed to edit > "trusted.*" Shouldn't overlayfs just skip the permission checks and call __vfs_setxattr_noperm() instead? Thanks, Miklos
On Fri, Feb 28, 2014 at 03:15:14PM +0100, Miklos Szeredi wrote: > On Tue, Feb 25, 2014 at 6:31 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote: > > To mark a file which exists in the lower layer as deleted, > > it creates a symbolic link to a file called "(overlay-whiteout)" > > in the writeable mount, and sets a "trusted.overlay" xattr > > on that link. > > > > 1. When the create the symbolic link as container root, not > > as the global root > > > > 2. Allow root in a container to edit "trusted.overlay*" > > xattrs. Generally only global root is allowed to edit > > "trusted.*" > > Shouldn't overlayfs just skip the permission checks and call > __vfs_setxattr_noperm() instead? It does seem we should be avoiding the permissions here, as we have let the thing be mounted we have done the permissions checks for that and for the file access itself already. This operation is something we definatly want to represent in the filesystem. -apw
Quoting Andy Whitcroft (apw@canonical.com): > On Fri, Feb 28, 2014 at 03:15:14PM +0100, Miklos Szeredi wrote: > > On Tue, Feb 25, 2014 at 6:31 PM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote: > > > To mark a file which exists in the lower layer as deleted, > > > it creates a symbolic link to a file called "(overlay-whiteout)" > > > in the writeable mount, and sets a "trusted.overlay" xattr > > > on that link. > > > > > > 1. When the create the symbolic link as container root, not > > > as the global root > > > > > > 2. Allow root in a container to edit "trusted.overlay*" > > > xattrs. Generally only global root is allowed to edit > > > "trusted.*" > > > > Shouldn't overlayfs just skip the permission checks and call > > __vfs_setxattr_noperm() instead? > > It does seem we should be avoiding the permissions here, as we have let > the thing be mounted we have done the permissions checks for that and for > the file access itself already. This operation is something we definatly > want to represent in the filesystem. D'oh. Yeah, that looks good. Andy, should I send a new patch, or can you make those changes inline? -serge
When mounting an overlayfs filesystem in a user namespace we encounter issues with whiteout handling. Specifically the local namespace root user is not able to manipulate the "trusted." hierachy to install the xattr marking the whiteouts. In the preceeding discussion it was noted that we should really be avoiding the permissions checks in these cases, as we have already validated that the operation is permissible. It was suggested we could use the __vfs_getxattr_noperm interfaces for this, and I did spend some time prototyping this. However it requires us to expose at least two new interfaces to allow setting and getting these without checks. Overall it seemed excessive given that we were already indicating our willingness to take on additional capabilities to get these operations done. That lead to a rethink, and the two alternate patches which follow: 1) overlayfs: switch to the init user namespace for xattr operations: This first patch solves the issue by dropping back into the init_user_ns for these operations, effectivly avoiding the namespace issues. 2) overlayfs: use kernel service credentials for copy up and xattr manipulations: This second patch takes a more radical approach. As we are willing to grab all and any permissions we need to make these operations succeed, it simply grabs full kernel rights via prepare_kernel_cred(NULL). I personally favour the latter approach as it allows us to remove a number of essentially arbitrary capability grabs, though it could be argued that the former patch better documents what is its we actually need. We have done some basic testing on both and they seem to work as expected. Comments, thoughts? -apw Andy Whitcroft (1): UBUNTU: ubuntu: overlayfs20 -- switch to the init user namespace for xattr operations fs/overlayfs/copy_up.c | 4 ++++ fs/overlayfs/dir.c | 23 +++++++++++++++++++---- fs/overlayfs/readdir.c | 7 +++++++ fs/overlayfs/super.c | 5 ++++- 4 files changed, 34 insertions(+), 5 deletions(-)
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c index a209409..3c4657b 100644 --- a/fs/overlayfs/dir.c +++ b/fs/overlayfs/dir.c @@ -12,6 +12,7 @@ #include <linux/xattr.h> #include <linux/security.h> #include <linux/cred.h> +#include <linux/sched.h> #include "overlayfs.h" static const char *ovl_whiteout_symlink = "(overlay-whiteout)"; @@ -38,8 +39,12 @@ static int ovl_whiteout(struct dentry *upperdir, struct dentry *dentry) cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN); cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE); cap_raise(override_cred->cap_effective, CAP_FOWNER); - override_cred->fsuid = GLOBAL_ROOT_UID; - override_cred->fsgid = GLOBAL_ROOT_GID; + override_cred->fsuid = make_kuid(current_user_ns(), 0); + if (!uid_valid(override_cred->fsuid)) + override_cred->fsuid = GLOBAL_ROOT_UID; + override_cred->fsgid = make_kgid(current_user_ns(), 0); + if (!gid_valid(override_cred->fsgid)) + override_cred->fsgid = GLOBAL_ROOT_GID; old_cred = override_creds(override_cred); newdentry = lookup_one_len(dentry->d_name.name, upperdir, diff --git a/fs/xattr.c b/fs/xattr.c index 3377dff..edd826c 100644 --- a/fs/xattr.c +++ b/fs/xattr.c @@ -52,7 +52,10 @@ xattr_permission(struct inode *inode, const char *name, int mask) * The trusted.* namespace can only be accessed by privileged users. */ if (!strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN)) { - if (!capable(CAP_SYS_ADMIN)) + if (strncmp(name, "trusted.overlay", 15) == 0) { + if (!inode_capable(inode, CAP_SYS_ADMIN)) + return (mask & MAY_WRITE) ? -EPERM : -ENODATA; + } else if (!capable(CAP_SYS_ADMIN)) return (mask & MAY_WRITE) ? -EPERM : -ENODATA; return 0; }
To mark a file which exists in the lower layer as deleted, it creates a symbolic link to a file called "(overlay-whiteout)" in the writeable mount, and sets a "trusted.overlay" xattr on that link. 1. When the create the symbolic link as container root, not as the global root 2. Allow root in a container to edit "trusted.overlay*" xattrs. Generally only global root is allowed to edit "trusted.*" With this patch, I'm able to delete files and directories in a user-namespace-based overlayfs-backed container. The overlay writeable layer after "rm ab/ab; rmdir ab; mv xxx yyy;" ends up looking like: ls -l .local/share/lxc/u11/delta0/home/ubuntu/ total 0 lrwxrwxrwx 1 150000 150000 18 Feb 13 22:30 ab -> (overlay-whiteout) lrwxrwxrwx 1 150000 150000 18 Feb 13 22:30 xxx -> (overlay-whiteout) -rw-rw-r-- 1 151000 151000 0 Feb 13 03:53 yyy (with 150000 being the mapped container root) Note - the fs/xattr.c hunk could presumably be dropped if we switched to using "user.overlay". This could however cause problems with pre-existing overlay deltas. I don't really care how this is done, but am wondering whether there is any good reason why making overlay whiteouts should require root on the host rather than only in the userns. Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com> --- fs/overlayfs/dir.c | 9 +++++++-- fs/xattr.c | 5 ++++- 2 files changed, 11 insertions(+), 3 deletions(-)