From patchwork Wed Jan 3 07:26:52 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mahesh Bandewar
X-Patchwork-Id: 854894
X-Patchwork-Delegate: davem@davemloft.net
From: Mahesh Bandewar
To: LKML, James Morris
Cc: Netdev, Kernel-hardening, Linux API, Linux Security, Serge Hallyn,
 Michael Kerrisk, Kees Cook, "Eric W. Biederman", Eric Dumazet,
 David Miller, Mahesh Bandewar
Subject: [PATCHv4 1/2] capability: introduce sysctl for controlled user-ns
 capability whitelist
Date: Tue, 2 Jan 2018 23:26:52 -0800
Message-Id: <20180103072652.161912-1-mahesh@bandewar.net>
X-Mailer: git-send-email 2.15.1.620.gb9897f4670-goog
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Mahesh Bandewar

Add a sysctl variable kernel.controlled_userns_caps_whitelist.

The capability mask is stored in the kernel as kernel_cap_t (an array
of u32). This sysctl takes its input as comma-separated hex u32 words.
For simplicity one might prefer this sysctl to operate on string inputs.
However, the value is not expected to change often during the life of
a kernel boot.
It makes more sense to use the widely available API than to add another
string-manipulation routine just to make this simpler.

The default value of kernel.controlled_userns_caps_whitelist is
CAP_FULL_SET, meaning that no capability is controlled by default; this
maintains compatibility with the existing behavior of user-ns. An
administrator has to modify this sysctl to control any capability.
For example, to control CAP_NET_RAW the mask needs to be changed like
this:

  # sysctl -q kernel.controlled_userns_caps_whitelist
  kernel.controlled_userns_caps_whitelist = 1f,ffffffff
  # sysctl -w kernel.controlled_userns_caps_whitelist=1f,ffffdfff
  kernel.controlled_userns_caps_whitelist = 1f,ffffdfff

For the bit-to-mask conversion, see include/uapi/linux/capability.h
(CAP_NET_RAW is bit 13, so clearing it turns the low word ffffffff into
ffffdfff).

Any capability that is not part of this mask is controlled and is not
allowed to processes in a controlled user-ns. In the example above,
CAP_NET_RAW is not available in controlled user namespaces.

Acked-by: Serge Hallyn
Signed-off-by: Mahesh Bandewar
---
v4: commit message changes.
v3: Added a couple of comments as requested by Serge Hallyn
v2: Rebase
v1: Initial submission

 Documentation/sysctl/kernel.txt | 21 ++++++++++++++++++
 include/linux/capability.h      |  3 +++
 kernel/capability.c             | 47 +++++++++++++++++++++++++++++++++++++++++
 kernel/sysctl.c                 |  5 +++++
 4 files changed, 76 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 694968c7523c..6aa1e087afee 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -25,6 +25,7 @@ show up in /proc/sys/kernel:
 - bootloader_version [ X86 only ]
 - callhome [ S390 only ]
 - cap_last_cap
+- controlled_userns_caps_whitelist
 - core_pattern
 - core_pipe_limit
 - core_uses_pid
@@ -187,6 +188,26 @@ CAP_LAST_CAP from the kernel.
 
 ==============================================================
 
+controlled_userns_caps_whitelist
+
+Capability mask that is whitelisted for "controlled" user namespaces.
+Any capability that is missing from this mask will not be allowed to
+any process that is attached to a controlled-userns. e.g. if CAP_NET_RAW
+is not part of this mask, then processes running inside any controlled
+userns will not be allowed to perform actions that need the CAP_NET_RAW
+capability. However, processes that are attached to a parent user-ns
+hierarchy that is *not* controlled and has CAP_NET_RAW can continue
+performing those actions. User-namespaces are marked "controlled" at
+the time of their creation based on the capabilities of the creator.
+A process that does not have CAP_SYS_ADMIN will create user-namespaces
+that are controlled.
+
+The value is expressed as two comma-separated hex words (u32). This
+sysctl is available in init-ns and users with CAP_SYS_ADMIN in init-ns
+are allowed to make changes.
+
+==============================================================
+
 core_pattern:
 
 core_pattern is used to specify a core dumpfile pattern name.
diff --git a/include/linux/capability.h b/include/linux/capability.h
index f640dcbc880c..7d79a4689625 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -14,6 +14,7 @@
 #define _LINUX_CAPABILITY_H
 
 #include
+#include
 
 #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3
@@ -248,6 +249,8 @@ extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
 
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
+int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
+				 void __user *buff, size_t *lenp, loff_t *ppos);
 
 extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size);
 
diff --git a/kernel/capability.c b/kernel/capability.c
index 1e1c0236f55b..4a859b7d4902 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -29,6 +29,8 @@ EXPORT_SYMBOL(__cap_empty_set);
 
 int file_caps_enabled = 1;
 
+kernel_cap_t controlled_userns_caps_whitelist = CAP_FULL_SET;
+
 static int __init file_caps_disable(char *str)
 {
 	file_caps_enabled = 0;
@@ -507,3 +509,48 @@ bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
 	rcu_read_unlock();
 	return (ret == 0);
 }
+
+/* Controlled-userns capabilities routines */
+#ifdef CONFIG_SYSCTL
+int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
+				 void __user *buff, size_t *lenp, loff_t *ppos)
+{
+	DECLARE_BITMAP(caps_bitmap, CAP_LAST_CAP);
+	struct ctl_table caps_table;
+	char tbuf[NAME_MAX];
+	int ret;
+
+	ret = bitmap_from_u32array(caps_bitmap, CAP_LAST_CAP,
+				   controlled_userns_caps_whitelist.cap,
+				   _KERNEL_CAPABILITY_U32S);
+	if (ret != CAP_LAST_CAP)
+		return -1;
+
+	scnprintf(tbuf, NAME_MAX, "%*pb", CAP_LAST_CAP, caps_bitmap);
+
+	caps_table.data = tbuf;
+	caps_table.maxlen = NAME_MAX;
+	caps_table.mode = table->mode;
+	ret = proc_dostring(&caps_table, write, buff, lenp, ppos);
+	if (ret)
+		return ret;
+	if (write) {
+		kernel_cap_t tmp;
+
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+
+		ret = bitmap_parse_user(buff, *lenp, caps_bitmap, CAP_LAST_CAP);
+		if (ret)
+			return ret;
+
+		ret = bitmap_to_u32array(tmp.cap, _KERNEL_CAPABILITY_U32S,
+					 caps_bitmap, CAP_LAST_CAP);
+		if (ret != CAP_LAST_CAP)
+			return -1;
+
+		controlled_userns_caps_whitelist = tmp;
+	}
+	return 0;
+}
+#endif /* CONFIG_SYSCTL */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 557d46728577..759b6c286806 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1217,6 +1217,11 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &one,
 	},
 #endif
+	{
+		.procname	= "controlled_userns_caps_whitelist",
+		.mode		= 0644,
+		.proc_handler	= proc_douserns_caps_whitelist,
+	},
 	{ }
 };
 

From patchwork Wed Jan 3 07:26:57 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mahesh Bandewar
X-Patchwork-Id: 854893
X-Patchwork-Delegate: davem@davemloft.net
From: Mahesh Bandewar
To: LKML, James Morris
Cc: Netdev, Kernel-hardening, Linux API, Linux Security, Serge Hallyn,
 Michael Kerrisk, Kees Cook, "Eric W. Biederman", Eric Dumazet,
 David Miller, Mahesh Bandewar
Subject: [PATCHv4 2/2] userns: control capabilities of some user namespaces
Date: Tue, 2 Jan 2018 23:26:57 -0800
Message-Id: <20180103072657.161985-1-mahesh@bandewar.net>
X-Mailer: git-send-email 2.15.1.620.gb9897f4670-goog
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Mahesh Bandewar

With this new notion of "controlled" user-namespaces, user-namespaces
are marked as controlled at the time of their creation, while the
capabilities of processes that belong to them are controlled using the
global mask.

Init-user-ns is always uncontrolled. A process that has SYS_ADMIN and
belongs to an uncontrolled user-ns can create another (child)
user-namespace that is uncontrolled. Any other process (one that either
does not have SYS_ADMIN or belongs to a controlled user-ns) can only
create a user-ns that is controlled.

The global capability whitelist (controlled_userns_caps_whitelist) is
used at capability check-time and keeps the semantics unchanged for
processes that belong to an uncontrolled user-ns. Processes that belong
to a controlled user-ns, however, are subjected to different checks:

   (a) if the capability in question is controlled and the process
       belongs to a controlled user-ns, then it is always denied.
   (b) if the capability in question is NOT controlled, then fall back
       to the traditional check.

Acked-by: Serge Hallyn
Signed-off-by: Mahesh Bandewar
---
v4: Rebase
v3: Rebase
v2: Don't recalculate user-ns flags for every setns() call.
v1: Initial submission.
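
Note for reviewers: the check order described in this commit message can
be sketched in userspace C. This is only an illustration under stated
assumptions, not the kernel code: struct cap_mask, cap_in_mask(),
capability_controlled(), denied_early() and run_examples() are
hypothetical names. Only the capability numbers (from
include/uapi/linux/capability.h) and the 1f,ffffdfff mask (from the
patch 1/2 example) are taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in include/uapi/linux/capability.h. */
#define CAP_NET_RAW   13
#define CAP_SYS_ADMIN 21

/* Hypothetical userspace stand-in for kernel_cap_t (two u32 words). */
struct cap_mask {
	uint32_t cap[2];
};

/* True if bit 'cap' is set in the mask. */
static bool cap_in_mask(const struct cap_mask *mask, int cap)
{
	return (mask->cap[cap / 32] >> (cap % 32)) & 1u;
}

/* A capability is "controlled" when it is absent from the whitelist. */
static bool capability_controlled(const struct cap_mask *whitelist, int cap)
{
	return !cap_in_mask(whitelist, cap);
}

/*
 * Models only the early denial: a process in a controlled user-ns asking
 * for a controlled capability is refused before any hierarchy walk.
 * A false return means "fall back to the traditional check", not "granted".
 */
static bool denied_early(bool ns_controlled,
			 const struct cap_mask *whitelist, int cap)
{
	return ns_controlled && capability_controlled(whitelist, cap);
}

/* Walks through the 1f,ffffdfff example (CAP_NET_RAW removed). */
void run_examples(void)
{
	/* cap[0] is the low word; the sysctl prints the high word first. */
	struct cap_mask wl = { .cap = { 0xffffdfffu, 0x1fu } };

	assert(capability_controlled(&wl, CAP_NET_RAW));
	assert(!capability_controlled(&wl, CAP_SYS_ADMIN));
	assert(denied_early(true, &wl, CAP_NET_RAW));    /* controlled ns */
	assert(!denied_early(false, &wl, CAP_NET_RAW));  /* uncontrolled ns */
	assert(!denied_early(true, &wl, CAP_SYS_ADMIN)); /* still whitelisted */
}
```

The sketch deliberately stops at the early denial. Whether the
capability is finally granted still depends on the unchanged hierarchy
walk in cap_capable(), which is not modelled here.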

 include/linux/capability.h     |  4 ++++
 include/linux/user_namespace.h | 25 +++++++++++++++++++++++++
 kernel/capability.c            |  5 +++++
 kernel/user_namespace.c        |  4 ++++
 security/commoncap.c           |  8 ++++++++
 5 files changed, 46 insertions(+)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 7d79a4689625..383f31f066f0 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -251,6 +251,10 @@ extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
 int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
 				 void __user *buff, size_t *lenp, loff_t *ppos);
+/* A controlled capability is one that is missing from the capability mask
+ * controlled_userns_caps_whitelist, which is set via sysctl.
+ */
+bool is_capability_controlled(int cap);
 
 extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size);
 
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index d6b74b91096b..a5c48684b317 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -32,6 +32,7 @@ struct uid_gid_map { /* 64 bytes -- 1 cache line */
 };
 
 #define USERNS_SETGROUPS_ALLOWED 1UL
+#define USERNS_CONTROLLED	 2UL
 
 #define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
 
@@ -112,6 +113,21 @@ static inline void put_user_ns(struct user_namespace *ns)
 		__put_user_ns(ns);
 }
 
+/* A controlled user-ns is one that is created by a process that does not
+ * have CAP_SYS_ADMIN (or is descended from such a user-ns).
+ * For more details please see the sysctl description of
+ * controlled_userns_caps_whitelist.
+ */
+static inline bool is_user_ns_controlled(const struct user_namespace *ns)
+{
+	return ns->flags & USERNS_CONTROLLED;
+}
+
+static inline void mark_user_ns_controlled(struct user_namespace *ns)
+{
+	ns->flags |= USERNS_CONTROLLED;
+}
+
 struct seq_operations;
 extern const struct seq_operations proc_uid_seq_operations;
 extern const struct seq_operations proc_gid_seq_operations;
@@ -170,6 +186,15 @@ static inline struct ns_common *ns_get_owner(struct ns_common *ns)
 {
 	return ERR_PTR(-EPERM);
 }
+
+static inline bool is_user_ns_controlled(const struct user_namespace *ns)
+{
+	return false;
+}
+
+static inline void mark_user_ns_controlled(struct user_namespace *ns)
+{
+}
 #endif
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/capability.c b/kernel/capability.c
index 4a859b7d4902..bffe249922de 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -511,6 +511,11 @@ bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
 }
 
 /* Controlled-userns capabilities routines */
+bool is_capability_controlled(int cap)
+{
+	return !cap_raised(controlled_userns_caps_whitelist, cap);
+}
+
 #ifdef CONFIG_SYSCTL
 int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
 				 void __user *buff, size_t *lenp, loff_t *ppos)
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 246d4d4ce5c7..ca0556d466b6 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -141,6 +141,10 @@ int create_user_ns(struct cred *new)
 		goto fail_keyring;
 
 	set_cred_user_ns(new, ns);
+	if (!ns_capable(parent_ns, CAP_SYS_ADMIN) ||
+	    is_user_ns_controlled(parent_ns))
+		mark_user_ns_controlled(ns);
+
 	return 0;
 fail_keyring:
 #ifdef CONFIG_PERSISTENT_KEYRINGS
diff --git a/security/commoncap.c b/security/commoncap.c
index 4f8e09340956..5454e9c03ee8 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -73,6 +73,14 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
 {
 	struct user_namespace *ns = targ_ns;
 
+	/* If the capability is controlled and the user-ns that the process
+	 * belongs to is 'controlled', return -EPERM; there is no need
+	 * to check the user-ns hierarchy.
+	 */
+	if (is_user_ns_controlled(cred->user_ns) &&
+	    is_capability_controlled(cap))
+		return -EPERM;
+
 	/* See if cred has the capability in the target user namespace
 	 * by examining the target user namespace and all of the target
 	 * user namespace's parents.