From patchwork Wed Sep 18 08:48:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 1163856 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46YDD11LGGz9sNw for ; Wed, 18 Sep 2019 18:48:49 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730585AbfIRIss (ORCPT ); Wed, 18 Sep 2019 04:48:48 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:41698 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729310AbfIRIsq (ORCPT ); Wed, 18 Sep 2019 04:48:46 -0400 Received: from static-dcd-cqq-121001.business.bouyguestelecom.com ([212.194.121.1] helo=localhost.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1iAVde-0004BF-Rm; Wed, 18 Sep 2019 08:48:38 +0000 From: Christian Brauner To: keescook@chromium.org, luto@amacapital.net Cc: jannh@google.com, wad@chromium.org, shuah@kernel.org, ast@kernel.org, daniel@iogearbox.net, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, Christian Brauner , Tycho Andersen , Tyler Hicks Subject: [PATCH 1/4] seccomp: add SECCOMP_RET_USER_NOTIF_ALLOW Date: Wed, 18 Sep 2019 10:48:30 +0200 Message-Id: <20190918084833.9369-2-christian.brauner@ubuntu.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190918084833.9369-1-christian.brauner@ubuntu.com> References: <20190918084833.9369-1-christian.brauner@ubuntu.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This allows the seccomp notifier to continue a syscall. A positive discussion about this feature was triggered by a post to the ksummit-discuss mailing list (cf. [3]) and took place during KSummit (cf. [1]) and again at the containers/checkpoint-restore micro-conference at Linux Plumbers. Recently we landed seccomp support for SECCOMP_RET_USER_NOTIF (cf. [4]) which enables a process (watchee) to retrieve an fd for its seccomp filter. This fd can then be handed to another (usually more privileged) process (watcher). The watcher will then be able to receive seccomp messages about the syscalls having been performed by the watchee. This feature is heavily used in some userspace workloads. For example, it is currently used to intercept mknod() syscalls in user namespaces aka in containers. The mknod() syscall can be easily filtered based on dev_t. This allows us to only intercept a very specific subset of mknod() syscalls. Furthermore, mknod() is not possible in user namespaces toto coelo and so intercepting and denying syscalls that are not in the whitelist on accident is not a big deal. The watchee won't notice a difference. In contrast to mknod(), a lot of other syscall we intercept (e.g. setxattr()) cannot be easily filtered like mknod() because they have pointer arguments. Additionally, some of them might actually succeed in user namespaces (e.g. setxattr() for all "user.*" xattrs). Since we currently cannot tell seccomp to continue from a user notifier we are stuck with performing all of the syscalls in lieu of the container. This is a huge security liability since it is extremely difficult to correctly assume all of the necessary privileges of the calling task such that the syscall can be successfully emulated without escaping other additional security restrictions (think missing CAP_MKNOD for mknod(), or MS_NODEV on a filesystem etc.). This can be solved by telling seccomp to resume the syscall. One thing that came up in the discussion was the problem that another thread could change the memory after userspace has decided to let the syscall continue which is a well known TOCTOU with seccomp which is present in other ways already. The discussion showed that this feature is already very useful for any syscall without pointer arguments. For any accidentally intercepted non-pointer syscall it is safe to continue. For syscalls with pointer arguments there is a race but for any cautious userspace and the main usec cases the race doesn't matter. The notifier is intended to be used in a scenario where a more privileged watcher supervises the syscalls of lesser privileged watchee to allow it to get around kernel-enforced limitations by performing the syscall for it whenever deemed save by the watcher. Hence, if a user tricks the watcher into allowing a syscall they will either get a deny based on kernel-enforced restrictions later or they will have changed the arguments in such a way that they manage to perform a syscall with arguments that they would've been allowed to do anyway. In general, it is good to point out again, that the notifier fd was not intended to allow userspace to implement a security policy but rather to work around kernel security mechanisms in cases where the watcher knows that a given action is safe to perform. /* References */ [1]: https://linuxplumbersconf.org/event/4/contributions/560 [2]: https://linuxplumbersconf.org/event/4/contributions/477 [3]: https://lore.kernel.org/r/20190719093538.dhyopljyr5ns33qx@brauner.io [4]: commit 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace") Signed-off-by: Christian Brauner Cc: Kees Cook Cc: Andy Lutomirski Cc: Will Drewry Cc: Tycho Andersen CC: Tyler Hicks Cc: Jann Horn Reviewed-by: Tycho Andersen --- include/uapi/linux/seccomp.h | 2 ++ kernel/seccomp.c | 24 ++++++++++++++++++++---- 2 files changed, 22 insertions(+), 4 deletions(-) diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index 90734aa5aa36..2c23b9aa6383 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -76,6 +76,8 @@ struct seccomp_notif { struct seccomp_data data; }; +#define SECCOMP_RET_USER_NOTIF_ALLOW 0x00000001 + struct seccomp_notif_resp { __u64 id; __s64 val; diff --git a/kernel/seccomp.c b/kernel/seccomp.c index dba52a7db5e8..cdb90184d6d7 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -75,6 +75,7 @@ struct seccomp_knotif { /* The return values, only valid when in SECCOMP_NOTIFY_REPLIED */ int error; long val; + u32 flags; /* Signals when this has entered SECCOMP_NOTIFY_REPLIED */ struct completion ready; @@ -732,11 +733,12 @@ static u64 seccomp_next_notify_id(struct seccomp_filter *filter) return filter->notif->next_id++; } -static void seccomp_do_user_notification(int this_syscall, +static bool seccomp_do_user_notification(int this_syscall, struct seccomp_filter *match, const struct seccomp_data *sd) { int err; + u32 flags = 0; long ret = 0; struct seccomp_knotif n = {}; @@ -764,6 +766,7 @@ static void seccomp_do_user_notification(int this_syscall, if (err == 0) { ret = n.val; err = n.error; + flags = n.flags; } /* @@ -780,8 +783,14 @@ static void seccomp_do_user_notification(int this_syscall, list_del(&n.list); out: mutex_unlock(&match->notify_lock); + + /* perform syscall */ + if (flags & SECCOMP_RET_USER_NOTIF_ALLOW) + return false; + syscall_set_return_value(current, task_pt_regs(current), err, ret); + return true; } static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, @@ -867,8 +876,10 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, return 0; case SECCOMP_RET_USER_NOTIF: - seccomp_do_user_notification(this_syscall, match, sd); - goto skip; + if (seccomp_do_user_notification(this_syscall, match, sd)) + goto skip; + + return 0; case SECCOMP_RET_LOG: seccomp_log(this_syscall, 0, action, true); @@ -1087,7 +1098,11 @@ static long seccomp_notify_send(struct seccomp_filter *filter, if (copy_from_user(&resp, buf, sizeof(resp))) return -EFAULT; - if (resp.flags) + if (resp.flags & ~SECCOMP_RET_USER_NOTIF_ALLOW) + return -EINVAL; + + if ((resp.flags & SECCOMP_RET_USER_NOTIF_ALLOW) && + (resp.error || resp.val)) return -EINVAL; ret = mutex_lock_interruptible(&filter->notify_lock); @@ -1116,6 +1131,7 @@ static long seccomp_notify_send(struct seccomp_filter *filter, knotif->state = SECCOMP_NOTIFY_REPLIED; knotif->error = resp.error; knotif->val = resp.val; + knotif->flags = resp.flags; complete(&knotif->ready); out: mutex_unlock(&filter->notify_lock); From patchwork Wed Sep 18 08:48:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 1163855 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46YDD056RJz9sNk for ; Wed, 18 Sep 2019 18:48:48 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730569AbfIRIsr (ORCPT ); Wed, 18 Sep 2019 04:48:47 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:41699 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730502AbfIRIsq (ORCPT ); Wed, 18 Sep 2019 04:48:46 -0400 Received: from static-dcd-cqq-121001.business.bouyguestelecom.com ([212.194.121.1] helo=localhost.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1iAVdf-0004BF-BN; Wed, 18 Sep 2019 08:48:39 +0000 From: Christian Brauner To: keescook@chromium.org, luto@amacapital.net Cc: jannh@google.com, wad@chromium.org, shuah@kernel.org, ast@kernel.org, daniel@iogearbox.net, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, Christian Brauner , Tycho Andersen , Tyler Hicks , stable@vger.kernel.org Subject: [PATCH 2/4] seccomp: add two missing ptrace ifdefines Date: Wed, 18 Sep 2019 10:48:31 +0200 Message-Id: <20190918084833.9369-3-christian.brauner@ubuntu.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190918084833.9369-1-christian.brauner@ubuntu.com> References: <20190918084833.9369-1-christian.brauner@ubuntu.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add tw missing ptrace ifdefines to avoid compilation errors on systems that do not provide PTRACE_EVENTMSG_SYSCALL_ENTRY or PTRACE_EVENTMSG_SYSCALL_EXIT or: gcc -Wl,-no-as-needed -Wall seccomp_bpf.c -lpthread -o seccomp_bpf In file included from seccomp_bpf.c:52:0: seccomp_bpf.c: In function ‘tracer_ptrace’: seccomp_bpf.c:1792:20: error: ‘PTRACE_EVENTMSG_SYSCALL_ENTRY’ undeclared (first use in this function); did you mean ‘PTRACE_EVENT_CLONE’? EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY ^ ../kselftest_harness.h:608:13: note: in definition of macro ‘__EXPECT’ __typeof__(_expected) __exp = (_expected); \ ^~~~~~~~~ seccomp_bpf.c:1792:2: note: in expansion of macro ‘EXPECT_EQ’ EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY ^~~~~~~~~ seccomp_bpf.c:1792:20: note: each undeclared identifier is reported only once for each function it appears in EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY ^ ../kselftest_harness.h:608:13: note: in definition of macro ‘__EXPECT’ __typeof__(_expected) __exp = (_expected); \ ^~~~~~~~~ seccomp_bpf.c:1792:2: note: in expansion of macro ‘EXPECT_EQ’ EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY ^~~~~~~~~ seccomp_bpf.c:1793:6: error: ‘PTRACE_EVENTMSG_SYSCALL_EXIT’ undeclared (first use in this function); did you mean ‘PTRACE_EVENTMSG_SYSCALL_ENTRY’? : PTRACE_EVENTMSG_SYSCALL_EXIT, msg); ^ ../kselftest_harness.h:608:13: note: in definition of macro ‘__EXPECT’ __typeof__(_expected) __exp = (_expected); \ ^~~~~~~~~ seccomp_bpf.c:1792:2: note: in expansion of macro ‘EXPECT_EQ’ EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY ^~~~~~~~~ Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace") Signed-off-by: Christian Brauner Cc: Kees Cook Cc: Andy Lutomirski Cc: Will Drewry Cc: Shuah Khan Cc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Martin KaFai Lau Cc: Song Liu Cc: Yonghong Song Cc: Tycho Andersen CC: Tyler Hicks Cc: Jann Horn Cc: stable@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: netdev@vger.kernel.org Cc: bpf@vger.kernel.org Reviewed-by: Tyler Hicks --- tools/testing/selftests/seccomp/seccomp_bpf.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 6ef7f16c4cf5..ee52eab01800 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -155,6 +155,14 @@ struct seccomp_data { #ifndef PTRACE_SECCOMP_GET_METADATA #define PTRACE_SECCOMP_GET_METADATA 0x420d +#ifndef PTRACE_EVENTMSG_SYSCALL_ENTRY +#define PTRACE_EVENTMSG_SYSCALL_ENTRY 1 +#endif + +#ifndef PTRACE_EVENTMSG_SYSCALL_EXIT +#define PTRACE_EVENTMSG_SYSCALL_EXIT 2 +#endif + struct seccomp_metadata { __u64 filter_off; /* Input: which filter */ __u64 flags; /* Output: filter's flags */ From patchwork Wed Sep 18 08:48:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 1163859 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46YDDT32Bqz9s4Y for ; Wed, 18 Sep 2019 18:49:13 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730686AbfIRItJ (ORCPT ); Wed, 18 Sep 2019 04:49:09 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:41705 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730506AbfIRIsr (ORCPT ); Wed, 18 Sep 2019 04:48:47 -0400 Received: from static-dcd-cqq-121001.business.bouyguestelecom.com ([212.194.121.1] helo=localhost.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1iAVdf-0004BF-SC; Wed, 18 Sep 2019 08:48:39 +0000 From: Christian Brauner To: keescook@chromium.org, luto@amacapital.net Cc: jannh@google.com, wad@chromium.org, shuah@kernel.org, ast@kernel.org, daniel@iogearbox.net, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, Christian Brauner , Tycho Andersen , Tyler Hicks , stable@vger.kernel.org Subject: [PATCH 3/4] seccomp: avoid overflow in implicit constant conversion Date: Wed, 18 Sep 2019 10:48:32 +0200 Message-Id: <20190918084833.9369-4-christian.brauner@ubuntu.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190918084833.9369-1-christian.brauner@ubuntu.com> References: <20190918084833.9369-1-christian.brauner@ubuntu.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org USER_NOTIF_MAGIC is assigned to int variables in this test so set it to INT_MAX to avoid warnings: seccomp_bpf.c: In function ‘user_notification_continue’: seccomp_bpf.c:3088:26: warning: overflow in implicit constant conversion [-Woverflow] #define USER_NOTIF_MAGIC 116983961184613L ^ seccomp_bpf.c:3572:15: note: in expansion of macro ‘USER_NOTIF_MAGIC’ resp.error = USER_NOTIF_MAGIC; ^~~~~~~~~~~~~~~~ Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace") Signed-off-by: Christian Brauner Cc: Kees Cook Cc: Andy Lutomirski Cc: Will Drewry Cc: Shuah Khan Cc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Martin KaFai Lau Cc: Song Liu Cc: Yonghong Song Cc: Tycho Andersen CC: Tyler Hicks Cc: Jann Horn Cc: stable@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: netdev@vger.kernel.org Cc: bpf@vger.kernel.org Reviewed-by: Tyler Hicks --- tools/testing/selftests/seccomp/seccomp_bpf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index ee52eab01800..921f0e26f835 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include #include @@ -3080,7 +3081,7 @@ static int user_trap_syscall(int nr, unsigned int flags) return seccomp(SECCOMP_SET_MODE_FILTER, flags, &prog); } -#define USER_NOTIF_MAGIC 116983961184613L +#define USER_NOTIF_MAGIC INT_MAX TEST(user_notification_basic) { pid_t pid; From patchwork Wed Sep 18 08:48:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Brauner X-Patchwork-Id: 1163857 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ubuntu.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46YDDD5bwrz9sNk for ; Wed, 18 Sep 2019 18:49:00 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730638AbfIRIs6 (ORCPT ); Wed, 18 Sep 2019 04:48:58 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:41706 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730522AbfIRIsr (ORCPT ); Wed, 18 Sep 2019 04:48:47 -0400 Received: from static-dcd-cqq-121001.business.bouyguestelecom.com ([212.194.121.1] helo=localhost.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1iAVdg-0004BF-Ba; Wed, 18 Sep 2019 08:48:40 +0000 From: Christian Brauner To: keescook@chromium.org, luto@amacapital.net Cc: jannh@google.com, wad@chromium.org, shuah@kernel.org, ast@kernel.org, daniel@iogearbox.net, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, Christian Brauner , Tycho Andersen , Tyler Hicks , stable@vger.kernel.org Subject: [PATCH 4/4] seccomp: test SECCOMP_RET_USER_NOTIF_ALLOW Date: Wed, 18 Sep 2019 10:48:33 +0200 Message-Id: <20190918084833.9369-5-christian.brauner@ubuntu.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190918084833.9369-1-christian.brauner@ubuntu.com> References: <20190918084833.9369-1-christian.brauner@ubuntu.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Test whether a syscall can be performed after having been intercepted by the seccomp notifier. The test uses dup() and kcmp() since it allows us to nicely test whether the dup() syscall actually succeeded by comparing whether the fd refers to the same underlying struct file. Signed-off-by: Christian Brauner Cc: Kees Cook Cc: Andy Lutomirski Cc: Will Drewry Cc: Shuah Khan Cc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Martin KaFai Lau Cc: Song Liu Cc: Yonghong Song Cc: Tycho Andersen CC: Tyler Hicks Cc: Jann Horn Cc: stable@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: netdev@vger.kernel.org Cc: bpf@vger.kernel.org --- tools/testing/selftests/seccomp/seccomp_bpf.c | 99 +++++++++++++++++++ 1 file changed, 99 insertions(+) diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 921f0e26f835..788d7e9007d5 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -44,6 +44,7 @@ #include #include #include +#include #include #include @@ -175,6 +176,10 @@ struct seccomp_metadata { #define SECCOMP_RET_USER_NOTIF 0x7fc00000U +#ifndef SECCOMP_RET_USER_NOTIF_ALLOW +#define SECCOMP_RET_USER_NOTIF_ALLOW 0x00000001 +#endif + #define SECCOMP_IOC_MAGIC '!' #define SECCOMP_IO(nr) _IO(SECCOMP_IOC_MAGIC, nr) #define SECCOMP_IOR(nr, type) _IOR(SECCOMP_IOC_MAGIC, nr, type) @@ -3489,6 +3494,100 @@ TEST(seccomp_get_notif_sizes) EXPECT_EQ(sizes.seccomp_notif_resp, sizeof(struct seccomp_notif_resp)); } +static int filecmp(pid_t pid1, pid_t pid2, int fd1, int fd2) +{ +#ifdef __NR_kcmp + return syscall(__NR_kcmp, pid1, pid2, KCMP_FILE, fd1, fd2); +#else + errno = ENOSYS; + return -1; +#endif +} + +TEST(user_notification_continue) +{ + pid_t pid; + long ret; + int status, listener; + struct seccomp_notif req = {}; + struct seccomp_notif_resp resp = {}; + struct pollfd pollfd; + + ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); + ASSERT_EQ(0, ret) { + TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!"); + } + + listener = user_trap_syscall(__NR_dup, SECCOMP_FILTER_FLAG_NEW_LISTENER); + ASSERT_GE(listener, 0); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + int dup_fd, pipe_fds[2]; + pid_t self; + + ret = pipe(pipe_fds); + if (ret < 0) + exit(EXIT_FAILURE); + + dup_fd = dup(pipe_fds[0]); + if (dup_fd < 0) + exit(EXIT_FAILURE); + + self = getpid(); + + ret = filecmp(self, self, pipe_fds[0], dup_fd); + if (ret) + exit(EXIT_FAILURE); + + exit(EXIT_SUCCESS); + } + + pollfd.fd = listener; + pollfd.events = POLLIN | POLLOUT; + + EXPECT_GT(poll(&pollfd, 1, -1), 0); + EXPECT_EQ(pollfd.revents, POLLIN); + + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0); + + pollfd.fd = listener; + pollfd.events = POLLIN | POLLOUT; + + EXPECT_GT(poll(&pollfd, 1, -1), 0); + EXPECT_EQ(pollfd.revents, POLLOUT); + + EXPECT_EQ(req.data.nr, __NR_dup); + + resp.id = req.id; + resp.flags = SECCOMP_RET_USER_NOTIF_ALLOW; + + /* check that if (flags & SECCOMP_RET_USER_NOTIF_ALLOW) the rest is 0 */ + resp.error = 0; + resp.val = USER_NOTIF_MAGIC; + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), -1); + EXPECT_EQ(errno, EINVAL); + + resp.error = USER_NOTIF_MAGIC; + resp.val = 0; + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), -1); + EXPECT_EQ(errno, EINVAL); + + resp.error = 0; + resp.val = 0; + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0) { + if (errno == EINVAL) + XFAIL(goto skip, "Kernel does not support SECCOMP_RET_USER_NOTIF_ALLOW"); + } + +skip: + EXPECT_EQ(waitpid(pid, &status, 0), pid); + EXPECT_EQ(true, WIFEXITED(status)); + EXPECT_EQ(0, WEXITSTATUS(status)); +} + /* * TODO: * - add microbenchmarks