From patchwork Wed Sep 25 20:32:24 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Benjamin Berg
X-Patchwork-Id: 1989526
From: Benjamin Berg
To: linux-um@lists.infradead.org
Cc: Benjamin Berg, Benjamin Berg
Subject: [RFC PATCH 1/9] um: Store full CSGSFS and SS register from mcontext
Date: Wed, 25 Sep 2024 22:32:24 +0200
Message-ID: <20240925203232.565086-2-benjamin@sipsolutions.net>
X-Mailer: git-send-email 2.46.1
In-Reply-To: <20240925203232.565086-1-benjamin@sipsolutions.net>
References: <20240925203232.565086-1-benjamin@sipsolutions.net>

Doing this allows using registers as retrieved from an mcontext to be
pushed to a process using PTRACE_SETREGS.

It is not entirely clear to me why CSGSFS was masked.
Doing so creates issues when using the mcontext as process state in
seccomp and simply copying the register appears to work perfectly fine
for ptrace.

Signed-off-by: Benjamin Berg
Signed-off-by: Benjamin Berg
---
 arch/x86/um/os-Linux/mcontext.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/um/os-Linux/mcontext.c b/arch/x86/um/os-Linux/mcontext.c
index e80ab7d28117..1b0d95328b2c 100644
--- a/arch/x86/um/os-Linux/mcontext.c
+++ b/arch/x86/um/os-Linux/mcontext.c
@@ -27,7 +27,6 @@ void get_regs_from_mc(struct uml_pt_regs *regs, mcontext_t *mc)
 	COPY(RIP);
 	COPY2(EFLAGS, EFL);
 	COPY2(CS, CSGSFS);
-	regs->gp[CS / sizeof(unsigned long)] &= 0xffff;
-	regs->gp[CS / sizeof(unsigned long)] |= 3;
+	regs->gp[SS / sizeof(unsigned long)] = mc->gregs[REG_CSGSFS] >> 48;
 #endif
 }

From patchwork Wed Sep 25 20:32:25 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Benjamin Berg
X-Patchwork-Id: 1989524
From: Benjamin Berg
To: linux-um@lists.infradead.org
Cc: Benjamin Berg, Benjamin Berg
Subject: [RFC PATCH 2/9] um: Move faultinfo extraction into userspace routine
Date: Wed, 25 Sep 2024 22:32:25 +0200
Message-ID: <20240925203232.565086-3-benjamin@sipsolutions.net>
X-Mailer: git-send-email 2.46.1
In-Reply-To: <20240925203232.565086-1-benjamin@sipsolutions.net>
References: <20240925203232.565086-1-benjamin@sipsolutions.net>

The segv handler is called slightly differently depending on whether
PTRACE_FULL_FAULTINFO is set or not (32bit vs. 64bit). The only
difference is that we don't try to pass the registers and instruction
pointer to the segv handler.

It would be good to either document or remove the difference, but I do
not know why this difference exists.
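For readers who want to see the shape of this change outside the UML
tree, the following is a minimal standalone sketch in plain C. It only
illustrates the control flow described above: the fault information is
fetched once, and only the final handler call depends on the mode. All
names in it (fake_faultinfo, fake_regs, fetch_faultinfo, full_handler,
reduced_handler, on_segv) are hypothetical stand-ins and are not the
UML functions or types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the UML types; not the real definitions. */
struct fake_faultinfo {
	unsigned long addr;
	int is_write;
};

struct fake_regs {
	struct fake_faultinfo faultinfo;
};

static int fetch_count;	/* how often fault info was extracted */
static int last_path;	/* 1 = full handler ran, 2 = reduced handler ran */

/* Stand-in for the extraction step; fills in fixed dummy values. */
static void fetch_faultinfo(struct fake_faultinfo *fi)
{
	fetch_count++;
	fi->addr = 0x1000;
	fi->is_write = 1;
}

/* Stand-in for the handler that receives the full register set. */
static void full_handler(struct fake_regs *regs)
{
	(void)regs;
	last_path = 1;
}

/* Stand-in for the handler that only receives the fault info. */
static void reduced_handler(struct fake_faultinfo fi)
{
	(void)fi;
	last_path = 2;
}

static void on_segv(struct fake_regs *regs, bool full_faultinfo)
{
	/* Shared step: done exactly once, whatever the mode is. */
	fetch_faultinfo(&regs->faultinfo);

	/* Only the final call differs between the two modes. */
	if (full_faultinfo)
		full_handler(regs);
	else
		reduced_handler(regs->faultinfo);
}
```

Calling on_segv() with either flag value performs one extraction and
then takes exactly one of the two handler paths, which is the property
the refactoring is meant to make obvious.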
Signed-off-by: Benjamin Berg
Signed-off-by: Benjamin Berg
---
 arch/um/os-Linux/skas/process.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/um/os-Linux/skas/process.c b/arch/um/os-Linux/skas/process.c
index 95376357fb17..24a09dc3c83e 100644
--- a/arch/um/os-Linux/skas/process.c
+++ b/arch/um/os-Linux/skas/process.c
@@ -176,12 +176,6 @@ static void get_skas_faultinfo(int pid, struct faultinfo *fi, unsigned long *aux
 	}
 }
 
-static void handle_segv(int pid, struct uml_pt_regs *regs, unsigned long *aux_fp_regs)
-{
-	get_skas_faultinfo(pid, &regs->faultinfo, aux_fp_regs);
-	segv(regs->faultinfo, 0, 1, NULL);
-}
-
 static void handle_trap(int pid, struct uml_pt_regs *regs)
 {
 	if ((UPT_IP(regs) >= STUB_START) && (UPT_IP(regs) < STUB_END))
@@ -525,13 +519,15 @@ void userspace(struct uml_pt_regs *regs, unsigned long *aux_fp_regs)
 
 			switch (sig) {
 			case SIGSEGV:
-				if (PTRACE_FULL_FAULTINFO) {
-					get_skas_faultinfo(pid,
-							   &regs->faultinfo, aux_fp_regs);
+				get_skas_faultinfo(pid,
+						   &regs->faultinfo, aux_fp_regs);
+
+				if (PTRACE_FULL_FAULTINFO)
 					(*sig_info[SIGSEGV])(SIGSEGV, (struct siginfo *)&si,
 							     regs);
-				}
-				else handle_segv(pid, regs, aux_fp_regs);
+				else
+					segv(regs->faultinfo, 0, 1, NULL);
+
 				break;
 			case SIGTRAP + 0x80:
 				handle_trap(pid, regs);

From patchwork Wed Sep 25 20:32:26 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Benjamin Berg
X-Patchwork-Id: 1989523
From: Benjamin Berg
To: linux-um@lists.infradead.org
Cc: Benjamin Berg, Benjamin Berg
Subject: [RFC PATCH 3/9] um: Add UML_SECCOMP configuration option
Date: Wed, 25 Sep 2024 22:32:26 +0200
Message-ID: <20240925203232.565086-4-benjamin@sipsolutions.net>
X-Mailer: git-send-email 2.46.1
In-Reply-To: <20240925203232.565086-1-benjamin@sipsolutions.net>
References: <20240925203232.565086-1-benjamin@sipsolutions.net>

Add the UML_SECCOMP configuration options. The next commits will add the
support itself in smaller chunks.

Only x86_64 will be supported for now.

Signed-off-by: Benjamin Berg
Signed-off-by: Benjamin Berg
---
 arch/um/Kconfig | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index 48db1c99bd46..4698e4c8ef29 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -240,6 +240,26 @@ config KASAN_SHADOW_OFFSET
 	  set to a large value. On low-memory systems, try 0x7fff8000, as it fits
 	  into the immediate of most instructions, improving performance.
 
+config UML_SECCOMP
+	bool "SECCOMP based userspace"
+	default n
+	help
+	  With SECCOMP userspace processes work collaboratively with the kernel
+	  instead of being traced using ptrace. All syscalls from the application
+	  are caught and redirected using a signal. This signal handler in turn
+	  is permitted to do the selected set of syscalls to communicate with
+	  the UML kernel and do the required memory management.
+
+	  This method is overall faster than the ptrace based userspace,
+	  primarily because it reduces the number of context switches for
+	  (minor) page faults.
+	  However, the SECCOMP filter is not (yet) restrictive enough to prevent
+	  userspace from reading and writing all physical memory. Userspace
+	  processes could also trick the stub into disabling SIGALRM which
+	  prevents it from being interrupted for scheduling purposes.
+
+	  If in doubt say N, as the feature has security implications.
+
 endmenu
 
 source "arch/um/drivers/Kconfig"

From patchwork Wed Sep 25 20:32:27 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Benjamin Berg
X-Patchwork-Id: 1989527
From: Benjamin Berg
To: linux-um@lists.infradead.org
Cc: Benjamin Berg, Johannes Berg, Benjamin Berg
Subject: [RFC PATCH 4/9] um: Add stub side of SECCOMP/futex based process handling
Date: Wed, 25 Sep 2024 22:32:27 +0200
Message-ID: <20240925203232.565086-5-benjamin@sipsolutions.net>
X-Mailer: git-send-email 2.46.1
In-Reply-To: <20240925203232.565086-1-benjamin@sipsolutions.net>
References: <20240925203232.565086-1-benjamin@sipsolutions.net>

This adds the stub side for the new seccomp process management code. In
this case we do register save/restore through the signal handler
mcontext.

For the FS_BASE/GS_BASE register we need special handling.
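As a rough illustration of the special handling mentioned above, the
sketch below shows a "dirty flag" pattern in standalone C: one side
records a new base value and sets a sync bit, the other side applies
only the flagged values and then clears the bits. The flag names and
the three struct fields follow the x86_64 part of this patch; the rest
(struct fake_cpu, request_fs_base, request_gs_base, restore_state) is
hypothetical, and plain assignments stand in for the arch_prctl()
calls that the real stub issues.

```c
#include <assert.h>

/* Flag values as in the x86_64 part of the patch. */
#define STUB_SYNC_FS_BASE (1 << 0)
#define STUB_SYNC_GS_BASE (1 << 1)

/* Same three fields as struct stub_data_arch on x86_64. */
struct arch_state {
	int sync;
	unsigned long fs_base;
	unsigned long gs_base;
};

/* Hypothetical stand-in for the CPU state; counts how often it is written. */
struct fake_cpu {
	unsigned long fs_base;
	unsigned long gs_base;
	int writes;
};

/* Requesting side: store the new value and mark it as pending. */
static void request_fs_base(struct arch_state *a, unsigned long value)
{
	a->fs_base = value;
	a->sync |= STUB_SYNC_FS_BASE;
}

static void request_gs_base(struct arch_state *a, unsigned long value)
{
	a->gs_base = value;
	a->sync |= STUB_SYNC_GS_BASE;
}

/* Restoring side: apply only what was flagged, then clear all flags. */
static void restore_state(struct arch_state *a, struct fake_cpu *cpu)
{
	if (a->sync & STUB_SYNC_FS_BASE) {
		cpu->fs_base = a->fs_base;	/* real stub: arch_prctl(ARCH_SET_FS) */
		cpu->writes++;
	}
	if (a->sync & STUB_SYNC_GS_BASE) {
		cpu->gs_base = a->gs_base;	/* real stub: arch_prctl(ARCH_SET_GS) */
		cpu->writes++;
	}

	a->sync = 0;
}
```

With this pattern a restore that follows no request is free of any
extra work, because nothing is flagged and only the flag word is reset.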
Co-authored-by: Johannes Berg
Signed-off-by: Benjamin Berg
Signed-off-by: Benjamin Berg
---
 arch/um/include/shared/common-offsets.h |  1 +
 arch/um/include/shared/skas/stub-data.h | 15 +++++++
 arch/um/kernel/skas/stub.c              | 53 +++++++++++++++++++++++++
 arch/x86/um/shared/sysdep/stub-data.h   | 18 +++++++++
 arch/x86/um/shared/sysdep/stub.h        |  2 +
 arch/x86/um/shared/sysdep/stub_32.h     | 13 ++++++
 arch/x86/um/shared/sysdep/stub_64.h     | 14 +++++++
 7 files changed, 116 insertions(+)
 create mode 100644 arch/x86/um/shared/sysdep/stub-data.h

diff --git a/arch/um/include/shared/common-offsets.h b/arch/um/include/shared/common-offsets.h
index 579ed946a3a9..253987fc78ac 100644
--- a/arch/um/include/shared/common-offsets.h
+++ b/arch/um/include/shared/common-offsets.h
@@ -29,3 +29,4 @@ DEFINE(UML_CONFIG_64BIT, CONFIG_64BIT);
 
 DEFINE(UML_CONFIG_UML_TIME_TRAVEL_SUPPORT, CONFIG_UML_TIME_TRAVEL_SUPPORT);
 #endif
+DEFINE(UM_KERN_GDT_ENTRY_TLS_ENTRIES, GDT_ENTRY_TLS_ENTRIES);
diff --git a/arch/um/include/shared/skas/stub-data.h b/arch/um/include/shared/skas/stub-data.h
index 3fbdda727373..1ee1677abeda 100644
--- a/arch/um/include/shared/skas/stub-data.h
+++ b/arch/um/include/shared/skas/stub-data.h
@@ -8,9 +8,14 @@
 #ifndef __STUB_DATA_H
 #define __STUB_DATA_H
 
+#include
 #include
 #include
 #include
+#include
+
+#define FUTEX_IN_CHILD 0
+#define FUTEX_IN_KERN 1
 
 struct stub_init_data {
 	unsigned long stub_start;
@@ -53,6 +58,16 @@ struct stub_data {
 	/* 128 leaves enough room for additional fields in the struct */
 	struct stub_syscall syscall_data[(UM_KERN_PAGE_SIZE - 128) / sizeof(struct stub_syscall)] __aligned(16);
 
+	/* data shared with signal handler (only used in seccomp mode) */
+	short restart_wait;
+	unsigned int futex;
+	int signal;
+	unsigned short si_offset;
+	unsigned short mctx_offset;
+
+	/* seccomp architecture specific state restore */
+	struct stub_data_arch arch_data;
+
 	/* Stack for our signal handlers and for calling into . */
 	unsigned char sigstack[UM_KERN_PAGE_SIZE] __aligned(UM_KERN_PAGE_SIZE);
 };
diff --git a/arch/um/kernel/skas/stub.c b/arch/um/kernel/skas/stub.c
index 5d52ffa682dc..2d0cdb701d29 100644
--- a/arch/um/kernel/skas/stub.c
+++ b/arch/um/kernel/skas/stub.c
@@ -5,6 +5,11 @@
 
 #include
 
+#ifdef CONFIG_UML_SECCOMP
+#include
+#include
+#endif
+
 static __always_inline int syscall_handler(struct stub_data *d)
 {
 	int i;
@@ -67,3 +72,51 @@ stub_syscall_handler(void)
 
 	trap_myself();
 }
+
+#ifdef CONFIG_UML_SECCOMP
+void __attribute__ ((__section__ (".__syscall_stub")))
+stub_signal_interrupt(int sig, siginfo_t *info, void *p)
+{
+	struct stub_data *d = get_stub_data();
+	ucontext_t *uc = p;
+	long res;
+
+	d->signal = sig;
+	d->si_offset = (unsigned long)info - (unsigned long)&d->sigstack[0];
+	d->mctx_offset = (unsigned long)&uc->uc_mcontext - (unsigned long)&d->sigstack[0];
+
+restart_wait:
+	d->futex = FUTEX_IN_KERN;
+	do {
+		res = stub_syscall3(__NR_futex, (unsigned long)&d->futex,
+				    FUTEX_WAKE, 1);
+	} while (res == -EINTR);
+	do {
+		res = stub_syscall4(__NR_futex, (unsigned long)&d->futex,
+				    FUTEX_WAIT, FUTEX_IN_KERN, 0);
+	} while (res == -EINTR || d->futex == FUTEX_IN_KERN);
+
+	if (res < 0 && res != -EAGAIN)
+		stub_syscall2(__NR_kill, 0, SIGKILL);
+
+	/* Try running queued syscalls. */
+	if (syscall_handler(d) < 0 || d->restart_wait) {
+		/* Report SIGSYS if we restart. */
+		d->signal = SIGSYS;
+		d->restart_wait = 0;
+		goto restart_wait;
+	}
+
+	/* Restore arch dependent state that is not part of the mcontext */
+	stub_seccomp_restore_state(&d->arch_data);
+
+	/* Return so that the host modified mcontext is restored. */
+}
+
+void __attribute__ ((__section__ (".__syscall_stub")))
+stub_signal_restorer(void)
+{
+	/* We must not have anything on the stack when doing rt_sigreturn */
+	stub_syscall0(__NR_rt_sigreturn);
+}
+#endif
diff --git a/arch/x86/um/shared/sysdep/stub-data.h b/arch/x86/um/shared/sysdep/stub-data.h
new file mode 100644
index 000000000000..15707798ae6e
--- /dev/null
+++ b/arch/x86/um/shared/sysdep/stub-data.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifdef __i386__
+#include
+#include
+
+struct stub_data_arch {
+	int sync;
+	struct user_desc tls[UM_KERN_GDT_ENTRY_TLS_ENTRIES];
+};
+#else
+#define STUB_SYNC_FS_BASE (1 << 0)
+#define STUB_SYNC_GS_BASE (1 << 1)
+struct stub_data_arch {
+	int sync;
+	unsigned long fs_base;
+	unsigned long gs_base;
+};
+#endif
diff --git a/arch/x86/um/shared/sysdep/stub.h b/arch/x86/um/shared/sysdep/stub.h
index dc89f4423454..4fa58f5b4fca 100644
--- a/arch/x86/um/shared/sysdep/stub.h
+++ b/arch/x86/um/shared/sysdep/stub.h
@@ -13,3 +13,5 @@
 
 extern void stub_segv_handler(int, siginfo_t *, void *);
 extern void stub_syscall_handler(void);
+extern void stub_signal_interrupt(int, siginfo_t *, void *);
+extern void stub_signal_restorer(void);
diff --git a/arch/x86/um/shared/sysdep/stub_32.h b/arch/x86/um/shared/sysdep/stub_32.h
index 631a18d0ff44..22d5c67c05fa 100644
--- a/arch/x86/um/shared/sysdep/stub_32.h
+++ b/arch/x86/um/shared/sysdep/stub_32.h
@@ -123,4 +123,17 @@ static __always_inline void *get_stub_data(void)
 
 	return (void *)ret;
 }
+
+static __always_inline void
+stub_seccomp_restore_state(struct stub_data_arch *arch)
+{
+	for (int i = 0; i < sizeof(arch->tls) / sizeof(arch->tls[0]); i++) {
+		if (arch->sync & (1 << i))
+			stub_syscall1(__NR_set_thread_area,
+				      (unsigned long) &arch->tls[i]);
+	}
+
+	arch->sync = 0;
+}
+
 #endif
diff --git a/arch/x86/um/shared/sysdep/stub_64.h b/arch/x86/um/shared/sysdep/stub_64.h
index 17153dfd780a..648a3bb4b406 100644
--- a/arch/x86/um/shared/sysdep/stub_64.h
+++ b/arch/x86/um/shared/sysdep/stub_64.h
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 
 #define STUB_MMAP_NR __NR_mmap
 #define MMAP_OFFSET(o) (o)
@@ -126,4 +127,17 @@ static __always_inline void *get_stub_data(void)
 
 	return (void *)ret;
 }
+
+static __always_inline void
+stub_seccomp_restore_state(struct stub_data_arch *arch)
+{
+	/* TODO: Use _writefsbase_u64/_writegsbase_u64 when possible */
+	if (arch->sync & STUB_SYNC_FS_BASE)
+		stub_syscall2(__NR_arch_prctl, ARCH_SET_FS, arch->fs_base);
+	if (arch->sync & STUB_SYNC_GS_BASE)
+		stub_syscall2(__NR_arch_prctl, ARCH_SET_GS, arch->gs_base);
+
+	arch->sync = 0;
+}
+
 #endif

From patchwork Wed Sep 25 20:32:28 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Benjamin Berg
X-Patchwork-Id: 1989528
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
	dkim=pass (2048-bit key; secure) header.d=lists.infradead.org header.i=@lists.infradead.org header.a=rsa-sha256 header.s=bombadil.20210309 header.b=QSHvX/Ob;
	dkim=fail reason="signature verification failed" (2048-bit key; secure) header.d=sipsolutions.net header.i=@sipsolutions.net header.a=rsa-sha256 header.s=mail header.b=ocqy9/4Z;
	dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org;
	spf=none (no SPF record) smtp.mailfrom=lists.infradead.org (client-ip=2607:7c80:54:3::133; helo=bombadil.infradead.org; envelope-from=linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org; receiver=patchwork.ozlabs.org)
Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4XDT2745BNz1xt0
	for ; Thu, 26 Sep 2024 06:33:03 +1000 (AEST)
DKIM-Signature:
v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=eCK5vzUWRbePHYtkZrpK4or/B1Y04fq1w6ul1uFBlxo=; b=QSHvX/ObH8XkxFrKpNvNmV0A4H h6vbkpM/DEyI648g5H3XrKRxXFWgYJIFV6712cHLt3ft8RKvopTQq6B503yBxn6fQX7qEqs8HHY5g kE9VgEFjUGvZLfIcwW8F68GpbZBJTHAYOQkgQwlG3LRPIgIAzujWvItBMCz91Qg2HDq9Jsjl7YHmh rQx2NIAInK8hv+8mx/A/BG0ZqY1NGWfhr7vDXJRToQqGwmSSeyGxGSQBcsybPof8OOCDPGpmhTf6U vd7dXDFiSW5CTwDB6uEqFOeVw9PiISXDBExSaFS0dWW+3DaHl93QzsztzvMQJq0KfM3VBx4oz5mfy oESZdYGw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1stYh3-00000006V3W-3tzd; Wed, 25 Sep 2024 20:33:01 +0000 Received: from s3.sipsolutions.net ([2a01:4f8:242:246e::2] helo=sipsolutions.net) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1stYh0-00000006V1t-3wCl for linux-um@lists.infradead.org; Wed, 25 Sep 2024 20:33:00 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=sipsolutions.net; s=mail; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Content-Type:Sender :Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From:Resent-To: Resent-Cc:Resent-Message-ID; bh=eCK5vzUWRbePHYtkZrpK4or/B1Y04fq1w6ul1uFBlxo=; t=1727296378; x=1728505978; b=ocqy9/4ZK97X6I2BhE1i+CwtOSyWCMYoXSUY211y3bJMY+P OIvcXlChZ9c7Uhipjf+04mO6cdSBEbldJjhV9qHZMhxRQ4aaZBOOQYgIb3pndhbvzeXHUaGc46KPb MbcWRMobhYJDTjbaNp+IM3mk8T/Cd9LNquUpjRkUmc2vAEzSqUrvY0RCkm1iy557KzG801fSwB/TK a1+TmP/tjgBsBz1fH9tj+envBXRjQ6czXyNv2Y91mIPMJUehpEdbrx2OBiEa+W1gXCmd9FTeI5bKT U8oFsc41BdrLxlWDMZ8RVuuDJl2357q9aek/rj4wimF3yce0MrH3/65z/H2H41qA==; 
Received: by sipsolutions.net with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.97) (envelope-from ) id 1stYgx-00000001A19-0Yme; Wed, 25 Sep 2024 22:32:55 +0200 From: Benjamin Berg To: linux-um@lists.infradead.org Cc: Benjamin Berg , Benjamin Berg Subject: [RFC PATCH 5/9] um: Add helper functions to get/set state for SECCOMP Date: Wed, 25 Sep 2024 22:32:28 +0200 Message-ID: <20240925203232.565086-6-benjamin@sipsolutions.net> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20240925203232.565086-1-benjamin@sipsolutions.net> References: <20240925203232.565086-1-benjamin@sipsolutions.net> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240925_133259_185105_D695F6A4 X-CRM114-Status: GOOD ( 21.78 ) X-Spam-Score: -2.1 (--) X-Spam-Report: Spam detection software, running on the system "bombadil.infradead.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: When not using ptrace, we need to both save and restore registers through the mcontext as provided by the host kernel to our signal handlers. Add corresponding functions to store the state to an mcontext and helpers to access the mcontext of the subprocess through the stub data. 
Content analysis details: (-2.1 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 SPF_PASS SPF: sender matches SPF record -0.0 SPF_HELO_PASS SPF: HELO matches SPF record -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-BeenThere: linux-um@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-um" Errors-To: linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org When not using ptrace, we need to both save and restore registers through the mcontext as provided by the host kernel to our signal handlers. Add corresponding functions to store the state to an mcontext and helpers to access the mcontext of the subprocess through the stub data. 
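As an illustration only (not part of the patch): on x86_64 the diff reads SS from the top 16 bits of the combined REG_CSGSFS slot, writes it back there while keeping the lower 48 bits, and sets or clears the trap flag for single stepping. A minimal sketch of that bit handling, assuming plain 64-bit integers instead of a real mcontext_t; all sketch_* names are hypothetical:

```c
#include <stdint.h>

/* Trap flag: bit 8 of EFLAGS (X86_EFLAGS_TF in the kernel headers). */
#define SKETCH_EFLAGS_TF 0x100ULL

/* SS lives in the top 16 bits of the combined CS/GS/FS/SS slot. */
static inline uint16_t sketch_csgsfs_get_ss(uint64_t csgsfs)
{
	return (uint16_t)(csgsfs >> 48);
}

/* Replace only SS; the low 48 bits are left as the host provided them. */
static inline uint64_t sketch_csgsfs_set_ss(uint64_t csgsfs, uint16_t ss)
{
	csgsfs &= 0x0000ffffffffffffULL;
	csgsfs |= (uint64_t)ss << 48;
	return csgsfs;
}

/* Set TF when single stepping is requested, clear it otherwise. */
static inline uint64_t sketch_eflags_set_step(uint64_t eflags, int single_stepping)
{
	if (single_stepping)
		return eflags | SKETCH_EFLAGS_TF;
	return eflags & ~SKETCH_EFLAGS_TF;
}
```

Masking before OR-ing matters here: whatever the host stored in the low 48 bits must survive the write back.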
Signed-off-by: Benjamin Berg Signed-off-by: Benjamin Berg --- arch/x86/um/os-Linux/mcontext.c | 200 ++++++++++++++++++++++++++- arch/x86/um/shared/sysdep/mcontext.h | 9 ++ 2 files changed, 208 insertions(+), 1 deletion(-) diff --git a/arch/x86/um/os-Linux/mcontext.c b/arch/x86/um/os-Linux/mcontext.c index 1b0d95328b2c..60589110c38a 100644 --- a/arch/x86/um/os-Linux/mcontext.c +++ b/arch/x86/um/os-Linux/mcontext.c @@ -1,9 +1,12 @@ // SPDX-License-Identifier: GPL-2.0 -#include #define __FRAME_OFFSETS +#include +#include +#include #include #include #include +#include void get_regs_from_mc(struct uml_pt_regs *regs, mcontext_t *mc) { @@ -17,6 +20,10 @@ void get_regs_from_mc(struct uml_pt_regs *regs, mcontext_t *mc) COPY2(UESP, ESP); /* sic */ COPY(EBX); COPY(EDX); COPY(ECX); COPY(EAX); COPY(EIP); COPY_SEG_CPL3(CS); COPY(EFL); COPY_SEG_CPL3(SS); +#undef COPY2 +#undef COPY +#undef COPY_SEG +#undef COPY_SEG_CPL3 #else #define COPY2(X,Y) regs->gp[X/sizeof(unsigned long)] = mc->gregs[REG_##Y] #define COPY(X) regs->gp[X/sizeof(unsigned long)] = mc->gregs[REG_##X] @@ -28,5 +35,196 @@ void get_regs_from_mc(struct uml_pt_regs *regs, mcontext_t *mc) COPY2(EFLAGS, EFL); COPY2(CS, CSGSFS); regs->gp[SS / sizeof(unsigned long)] = mc->gregs[REG_CSGSFS] >> 48; +#undef COPY2 +#undef COPY +#endif +} + +#ifdef CONFIG_UML_SECCOMP +/* Same thing, but the copy macros are turned around. 
*/ +void get_mc_from_regs(struct uml_pt_regs *regs, mcontext_t *mc, int single_stepping) +{ +#ifdef __i386__ +#define COPY2(X,Y) mc->gregs[REG_##Y] = regs->gp[X] +#define COPY(X) mc->gregs[REG_##X] = regs->gp[X] +#define COPY_SEG(X) mc->gregs[REG_##X] = regs->gp[X] & 0xffff; +#define COPY_SEG_CPL3(X) mc->gregs[REG_##X] = (regs->gp[X] & 0xffff) | 3; + COPY_SEG(GS); COPY_SEG(FS); COPY_SEG(ES); COPY_SEG(DS); + COPY(EDI); COPY(ESI); COPY(EBP); + COPY2(UESP, ESP); /* sic */ + COPY(EBX); COPY(EDX); COPY(ECX); COPY(EAX); + COPY(EIP); COPY_SEG_CPL3(CS); COPY(EFL); COPY_SEG_CPL3(SS); +#else +#define COPY2(X,Y) mc->gregs[REG_##Y] = regs->gp[X/sizeof(unsigned long)] +#define COPY(X) mc->gregs[REG_##X] = regs->gp[X/sizeof(unsigned long)] + COPY(R8); COPY(R9); COPY(R10); COPY(R11); + COPY(R12); COPY(R13); COPY(R14); COPY(R15); + COPY(RDI); COPY(RSI); COPY(RBP); COPY(RBX); + COPY(RDX); COPY(RAX); COPY(RCX); COPY(RSP); + COPY(RIP); + COPY2(EFLAGS, EFL); + mc->gregs[REG_CSGSFS] = mc->gregs[REG_CSGSFS] & 0xffffffffffffl; + mc->gregs[REG_CSGSFS] |= (regs->gp[SS / sizeof(unsigned long)] & 0xffff) << 48; #endif + + if (single_stepping) + mc->gregs[REG_EFL] |= X86_EFLAGS_TF; + else + mc->gregs[REG_EFL] &= ~X86_EFLAGS_TF; } + +int get_stub_state(struct uml_pt_regs *regs, struct stub_data *data) +{ + mcontext_t *mcontext; + struct _fpstate *fpstate_stub; + int fp_size; + + /* mctx_offset is verified by wait_stub_done_seccomp */ + mcontext = (void *)&data->sigstack[data->mctx_offset]; + + get_regs_from_mc(regs, mcontext); + + /* Assume floating point registers are on the same page */ + fpstate_stub = (void *)(((unsigned long)mcontext->fpregs & + (UM_KERN_PAGE_SIZE - 1)) + + (unsigned long)&data->sigstack[0]); + + if (fpstate_stub->sw_reserved.magic1 != FP_XSTATE_MAGIC1) { + fp_size = sizeof(struct _fpstate); + } else { + char *magic2_addr; + + magic2_addr = (void *)fpstate_stub; + magic2_addr += fpstate_stub->sw_reserved.extended_size; + magic2_addr -= FP_XSTATE_MAGIC2_SIZE; + + /* Bail 
out if there is no magic (cannot really happen here) */ + if (*(__u32 *)magic2_addr != FP_XSTATE_MAGIC2) + return -EINVAL; + + /* Remove MAGIC2 from the size, we do not save/restore it */ + fp_size = fpstate_stub->sw_reserved.extended_size - + FP_XSTATE_MAGIC2_SIZE; + } + +#ifdef __i386__ + /* + * FIXME: Storage is too small (at least on x86_64 host). See below. + */ + fp_size = sizeof(regs->fp); +#else + if (fp_size > sizeof(regs->fp)) + return -ENOSPC; + + if ((unsigned long)fpstate_stub + fp_size > + (unsigned long)data->sigstack + sizeof(data->sigstack)) + return -EINVAL; +#endif + + memcpy(&regs->fp, fpstate_stub, fp_size); + + /* We do not need to read the x86_64 FS_BASE/GS_BASE registers as + * we do not permit userspace to set them directly. + */ + + return 0; +} + +int set_stub_state(struct uml_pt_regs *regs, struct stub_data *data, + int single_stepping) +{ + mcontext_t *mcontext; +#ifndef __i386__ + struct _fpstate *fpstate; +#endif + struct _fpstate *fpstate_stub; + int fp_size; + int fp_size_stub; + + /* mctx_offset is verified by wait_stub_done_seccomp */ + mcontext = (void *)&data->sigstack[data->mctx_offset]; + + if ((unsigned long)mcontext < (unsigned long)data->sigstack || + (unsigned long)mcontext > + (unsigned long) data->sigstack + + sizeof(data->sigstack) - sizeof(*mcontext)) + return -EINVAL; + + get_mc_from_regs(regs, mcontext, single_stepping); + + /* Assume floating point registers are on the same page */ + fpstate_stub = (void *)(((unsigned long)mcontext->fpregs & + (UM_KERN_PAGE_SIZE - 1)) + + (unsigned long)&data->sigstack[0]); + + if (fpstate_stub->sw_reserved.magic1 != FP_XSTATE_MAGIC1) { + fp_size_stub = sizeof(struct _fpstate); + } else { + char *magic2_addr; + + magic2_addr = (void *)fpstate_stub; + magic2_addr += fpstate_stub->sw_reserved.extended_size; + magic2_addr -= FP_XSTATE_MAGIC2_SIZE; + + /* Bail out if there is no magic (cannot really happen here) */ + if (*(__u32 *)magic2_addr != FP_XSTATE_MAGIC2) + return -EINVAL; + + /*
Remove MAGIC2 from the size, we do not save/restore it */ + fp_size_stub = fpstate_stub->sw_reserved.extended_size - + FP_XSTATE_MAGIC2_SIZE; + } + +#ifdef __i386__ + /* + * FIXME: Our registers are too small (on an x86_64 host at least). + * We need to mark this as not being an xstate, and we need to do that + * explicitly here as the magic is not stored in the register set then. + * + * Really, we should just dynamically allocate the floating point + * registers and use the unmodified host registers. + */ + fp_size = sizeof(regs->fp); + fpstate_stub->sw_reserved.magic1 = 0; + fpstate_stub->sw_reserved.extended_size = sizeof(struct _fpstate); +#else + fpstate = (void *)regs->fp; + if (fpstate->sw_reserved.magic1 != FP_XSTATE_MAGIC1) + fp_size = sizeof(struct _fpstate); + else + fp_size = fpstate_stub->sw_reserved.xstate_size; +#endif + + /* Do our registers fit into the userspace context? */ + if (fp_size > fp_size_stub) + return -ENOSPC; + + /* And, does it really not cross a page boundary? */ + if ((unsigned long)fpstate_stub + fp_size > + (unsigned long)data->sigstack + sizeof(data->sigstack)) + return -EINVAL; + + memcpy(fpstate_stub, &regs->fp, fp_size); + +#ifdef __i386__ + /* + * On x86, the GDT entries are updated by arch_set_tls. + */ +#else + /* + * On x86_64, we need to sync the FS_BASE/GS_BASE registers using the + * arch specific data.
+ */ + if (data->arch_data.fs_base != regs->gp[FS_BASE / sizeof(unsigned long)]) { + data->arch_data.fs_base = regs->gp[FS_BASE / sizeof(unsigned long)]; + data->arch_data.sync |= STUB_SYNC_FS_BASE; + } + if (data->arch_data.gs_base != regs->gp[GS_BASE / sizeof(unsigned long)]) { + data->arch_data.gs_base = regs->gp[GS_BASE / sizeof(unsigned long)]; + data->arch_data.sync |= STUB_SYNC_GS_BASE; + } +#endif + + return 0; +} +#endif diff --git a/arch/x86/um/shared/sysdep/mcontext.h b/arch/x86/um/shared/sysdep/mcontext.h index b724c54da316..3ea6da0dbe9d 100644 --- a/arch/x86/um/shared/sysdep/mcontext.h +++ b/arch/x86/um/shared/sysdep/mcontext.h @@ -6,7 +6,16 @@ #ifndef __SYS_SIGCONTEXT_X86_H #define __SYS_SIGCONTEXT_X86_H +#include +#include + extern void get_regs_from_mc(struct uml_pt_regs *, mcontext_t *); +extern void get_mc_from_regs(struct uml_pt_regs *regs, mcontext_t *mc, + int single_stepping); + +extern int get_stub_state(struct uml_pt_regs *regs, struct stub_data *data); +extern int set_stub_state(struct uml_pt_regs *regs, struct stub_data *data, + int single_stepping); #ifdef __i386__ From patchwork Wed Sep 25 20:32:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Benjamin Berg X-Patchwork-Id: 1989529 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; secure) header.d=lists.infradead.org header.i=@lists.infradead.org header.a=rsa-sha256 header.s=bombadil.20210309 header.b=xzZJ4gpM; dkim=fail reason="signature verification failed" (2048-bit key; secure) header.d=sipsolutions.net header.i=@sipsolutions.net header.a=rsa-sha256 header.s=mail header.b=hQR40Ycw; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.infradead.org (client-ip=2607:7c80:54:3::133; helo=bombadil.infradead.org; 
envelope-from=linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org; receiver=patchwork.ozlabs.org) Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XDT291BSmz1xsM for ; Thu, 26 Sep 2024 06:33:05 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=xjUv3Y2vRz9/Hr+JVEg9jvBMuDlcJ3pejGAEkzVtx48=; b=xzZJ4gpMopLF5pQADovDM7svDw hQW/jn8p1k74mgGkZYkHpPAMMgrOaRlC9yZcDHEaQx2ZfyrixPG4rJ77AEBAd6yUs+XoLbFvSUbJl mI3f4QSm8hZLAGv/6ERmHDIJalMaPnuptLwZ3y1JPx15y6jjJ/dfzzUlNBdxQr9RgmtwONrUvOrI5 hBb4HXfNPmwCecxQhykqqS5Qu0wwydGSc3Ys1c74O9+5JLBp3Ibauk2yIGltpZofXKv56QFJzcdVT J3F4eOliue3n+8ls0iVGRaEd0mwCd5fZ92eN7n6CBcJHefkHUqRGqqYkeYrKeX5PG2fCooZdxhh2W m7ptfy4A==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1stYh5-00000006V4H-2NID; Wed, 25 Sep 2024 20:33:03 +0000 Received: from s3.sipsolutions.net ([2a01:4f8:242:246e::2] helo=sipsolutions.net) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1stYh2-00000006V2W-0y1Q for linux-um@lists.infradead.org; Wed, 25 Sep 2024 20:33:01 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=sipsolutions.net; s=mail; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Content-Type:Sender 
:Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From:Resent-To: Resent-Cc:Resent-Message-ID; bh=xjUv3Y2vRz9/Hr+JVEg9jvBMuDlcJ3pejGAEkzVtx48=; t=1727296380; x=1728505980; b=hQR40YcwJErgFXHLt2UiPoUfDeiv6uEG4T/rRsQzc6dL4Bl bvlsvKjr6p2QJ6k3G2+L1GwG86SfGgXx3mL3EhYO9QlJ0XtXt2fz3qub0VToHKC6uUA4JsIElhOIb h6SyXdGxW+rHJYN2iRSeqrUIeqEfOfe7joMBiFvcSVHwoE9sVnm1hdKQJqZDp7xXxlO7vLmVA0Ora omXI4Mw2d5Lkp62z98GQVkK44Yt2J+foK3EMxvZabGteAY/baZH0KBtw5Pi1bZpnukplUFGgW1snj qr4UkaHW9YoXroyQeUeq7MvptySKVm4+HPRWyTGCmygR/AhdEeFkXhVREIZpsIuQ==; Received: by sipsolutions.net with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.97) (envelope-from ) id 1stYgy-00000001A19-3TJI; Wed, 25 Sep 2024 22:32:57 +0200 From: Benjamin Berg To: linux-um@lists.infradead.org Cc: Benjamin Berg , Benjamin Berg Subject: [RFC PATCH 6/9] um: Add SECCOMP support detection and initialization Date: Wed, 25 Sep 2024 22:32:29 +0200 Message-ID: <20240925203232.565086-7-benjamin@sipsolutions.net> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20240925203232.565086-1-benjamin@sipsolutions.net> References: <20240925203232.565086-1-benjamin@sipsolutions.net> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240925_133300_553606_C623E178 X-CRM114-Status: GOOD ( 22.36 ) X-Spam-Score: -2.1 (--) X-Spam-Report: Spam detection software, running on the system "bombadil.infradead.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: This detects seccomp support, sets the global using_seccomp variable and initilizes the exec registers. For now, the implementation simply falls through to the ptrace startup code, meaning that it is [...] 
Content analysis details: (-2.1 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 SPF_PASS SPF: sender matches SPF record -0.0 SPF_HELO_PASS SPF: HELO matches SPF record -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-BeenThere: linux-um@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-um" Errors-To: linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org This detects seccomp support, sets the global using_seccomp variable and initializes the exec registers. For now, the implementation simply falls through to the ptrace startup code, meaning that it is unused.
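For readers unfamiliar with this kind of probe, here is a simplified standalone sketch of the detection idea (not the patch's code): fork a child, install a filter that traps a single syscall, and treat exit status 0 from the SIGSYS handler as "seccomp works". It deliberately differs from the patch: it traps getppid instead of clock_nanosleep, passes no filter flags, and uses no shared stub data page. All sketch_* names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Runs in the child once the trapped syscall raises SIGSYS. */
static void sketch_sigsys_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig; (void)info; (void)ctx;
	_exit(0);
}

/* Returns 1 if the trap was observed, 0 if not, -1 on fork/wait errors. */
static inline int sketch_probe_seccomp(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Trap getppid, allow every other syscall. */
		struct sock_filter filter[] = {
			BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
				 offsetof(struct seccomp_data, nr)),
			BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 1, 0),
			BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
			BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
		};
		struct sock_fprog prog = {
			.len = sizeof(filter) / sizeof(filter[0]),
			.filter = filter,
		};
		struct sigaction sa;

		memset(&sa, 0, sizeof(sa));
		sigemptyset(&sa.sa_mask);
		sa.sa_flags = SA_SIGINFO;
		sa.sa_sigaction = sketch_sigsys_handler;
		if (sigaction(SIGSYS, &sa, NULL) < 0)
			_exit(1);

		/* Required so an unprivileged process may install a filter. */
		if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
			_exit(2);
		if (syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog) != 0)
			_exit(2);

		syscall(__NR_getppid);	/* should be trapped */
		_exit(3);		/* only reached if the trap did not fire */
	}

	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}

	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A result of 0 covers both "seccomp is missing" and "something else went wrong in the child"; the patch distinguishes these via the exit code.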
Signed-off-by: Benjamin Berg Signed-off-by: Benjamin Berg --- arch/um/include/shared/skas/skas.h | 6 ++ arch/um/os-Linux/registers.c | 4 +- arch/um/os-Linux/skas/process.c | 3 + arch/um/os-Linux/start_up.c | 142 ++++++++++++++++++++++++++++- 4 files changed, 151 insertions(+), 4 deletions(-) diff --git a/arch/um/include/shared/skas/skas.h b/arch/um/include/shared/skas/skas.h index 85c50122ab98..2ff01c773483 100644 --- a/arch/um/include/shared/skas/skas.h +++ b/arch/um/include/shared/skas/skas.h @@ -6,8 +6,14 @@ #ifndef __SKAS_H #define __SKAS_H +#include #include +#ifdef CONFIG_UML_SECCOMP +extern int using_seccomp; +#else +#define using_seccomp 0 +#endif extern int userspace_pid[]; extern void new_thread_handler(void); diff --git a/arch/um/os-Linux/registers.c b/arch/um/os-Linux/registers.c index bd80b921add0..528381496aa7 100644 --- a/arch/um/os-Linux/registers.c +++ b/arch/um/os-Linux/registers.c @@ -13,8 +13,8 @@ /* This is set once at boot time and not changed thereafter */ -static unsigned long exec_regs[MAX_REG_NR]; -static unsigned long exec_fp_regs[FP_SIZE]; +unsigned long exec_regs[MAX_REG_NR]; +unsigned long exec_fp_regs[FP_SIZE]; int init_pid_registers(int pid) { diff --git a/arch/um/os-Linux/skas/process.c b/arch/um/os-Linux/skas/process.c index 24a09dc3c83e..2329fddf195a 100644 --- a/arch/um/os-Linux/skas/process.c +++ b/arch/um/os-Linux/skas/process.c @@ -316,6 +316,9 @@ static int __init init_stub_exe_fd(void) } __initcall(init_stub_exe_fd); +#ifdef CONFIG_UML_SECCOMP +int using_seccomp; +#endif int userspace_pid[NR_CPUS]; /** diff --git a/arch/um/os-Linux/start_up.c b/arch/um/os-Linux/start_up.c index 93fc82c01aba..bfca66db505f 100644 --- a/arch/um/os-Linux/start_up.c +++ b/arch/um/os-Linux/start_up.c @@ -1,8 +1,10 @@ // SPDX-License-Identifier: GPL-2.0 /* + * Copyright (C) 2021 Benjamin Berg * Copyright (C) 2000 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com) */ +#include #include #include #include @@ -24,6 +26,15 @@ #include #include #include 
+#ifdef CONFIG_UML_SECCOMP +#include +#include +#include +#include +#include +#include +#include +#endif #include #include #include "internal.h" @@ -224,6 +235,120 @@ static void __init check_ptrace(void) check_sysemu(); } +#ifdef CONFIG_UML_SECCOMP +extern unsigned long exec_regs[MAX_REG_NR]; +extern unsigned long exec_fp_regs[FP_SIZE]; + +static void __init sigsys_handler(int sig, siginfo_t *info, void *p) +{ + struct stub_data *data = get_stub_data(); + ucontext_t *uc = p; + + /* Stow away the location of the mcontext in the stack */ + data->mctx_offset = (unsigned long)&uc->uc_mcontext - + (unsigned long)&data->sigstack[0]; + exit(0); +} + +static bool __init init_seccomp(void) +{ + void *data_addr; + struct stub_data *data; + int pid; + int status; + int n; + + /* We check that we can install a seccomp filter and then exit(0) + * from a trapped syscall. + * + * Note that we cannot verify that no seccomp filter already exists + * for a syscall that results in the process/thread to be killed. 
+ */ + + os_info("Checking that seccomp filters can be installed..."); + + /* data needs to be page aligned, so allocate twice the amount */ + data_addr = mmap(0, 2 * sizeof(*data), + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANON, 0, 0); + + data = (void*)((long)(data_addr + STUB_DATA_PAGES * UM_KERN_PAGE_SIZE) & + (long)~(STUB_DATA_PAGES * UM_KERN_PAGE_SIZE - 1)); + + pid = fork(); + if (pid == 0) { + static struct sock_filter filter[] = { + BPF_STMT(BPF_LD | BPF_W | BPF_ABS, + offsetof(struct seccomp_data, nr)), + BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clock_nanosleep, 1, 0), + BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW), + BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP), + }; + static struct sock_fprog prog = { + .len = ARRAY_SIZE(filter), + .filter = filter, + }; + struct sigaction sa; + + set_sigstack(data->sigstack, sizeof(data->sigstack)); + + sa.sa_flags = SA_ONSTACK | SA_NODEFER | SA_SIGINFO; + sa.sa_sigaction = (void *) sigsys_handler; + sa.sa_restorer = NULL; + if (sigaction(SIGSYS, &sa, NULL) < 0) + exit(1); + + prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); + if (syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER, + SECCOMP_FILTER_FLAG_TSYNC, &prog) != 0) + exit(2); + + sleep(0); + + /* Never reached. */ + exit(3); + } + + if (pid < 0) + fatal_perror("check_seccomp : fork failed"); + + CATCH_EINTR(n = waitpid(pid, &status, 0)); + if (n < 0) + fatal_perror("check_seccomp : waitpid failed"); + + if (WIFEXITED(status) && WEXITSTATUS(status) == 0) { + int r; + struct uml_pt_regs *regs = calloc(1, sizeof(struct uml_pt_regs)); + + /* Copy registers, the init_registers function assumes ptrace. 
*/ + r = get_stub_state(regs, data); + + memcpy(exec_regs, regs->gp, sizeof(exec_regs)); + memcpy(exec_fp_regs, regs->fp, sizeof(exec_fp_regs)); + + munmap(data, sizeof(*data)); + + free(regs); + + if (r) { + os_info("failed to fetch registers: %d\n", r); + return false; + } + + os_info("OK\n"); + return true; + } + + if (WIFEXITED(status) && WEXITSTATUS(status) == 2) + os_info("missing\n"); + else + os_info("error\n"); + + munmap(data_addr, 2*sizeof(*data)); + return false; +} +#endif + + static void __init check_coredump_limit(void) { struct rlimit lim; @@ -286,13 +411,26 @@ void __init os_early_checks(void) /* Print out the core dump limits early */ check_coredump_limit(); - check_ptrace(); - /* Need to check this early because mmapping happens before the * kernel is running. */ check_tmpexec(); +#ifdef CONFIG_UML_SECCOMP + using_seccomp = 0; + + if (init_seccomp()) { + /* Not fully implemented */ +#if 0 + using_seccomp = 1; + + return; +#endif + } +#endif + + check_ptrace(); + pid = start_ptraced_child(); if (init_pid_registers(pid)) fatal("Failed to initialize default registers"); From patchwork Wed Sep 25 20:32:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Benjamin Berg X-Patchwork-Id: 1989530 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; secure) header.d=lists.infradead.org header.i=@lists.infradead.org header.a=rsa-sha256 header.s=bombadil.20210309 header.b=AyRddLhT; dkim=fail reason="signature verification failed" (2048-bit key; secure) header.d=sipsolutions.net header.i=@sipsolutions.net header.a=rsa-sha256 header.s=mail header.b=bOB48lJJ; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.infradead.org (client-ip=2607:7c80:54:3::133; helo=bombadil.infradead.org; 
envelope-from=linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org; receiver=patchwork.ozlabs.org) Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XDT2D665yz1xsM for ; Thu, 26 Sep 2024 06:33:08 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=O3Q/38rXeEUVrojPW4zmGoR/22Nv0/GfBYZQoawwncs=; b=AyRddLhTedtVbvXWbnkkjAiFCb DkftyxBJEH6FO5PlYrcdnfIUY6GXB645m0m0P+p/9eYsgGMlP1H28FFVvKbu66BQM8rnW+CCvNt5D qDx5+pBjn7fdBR6v10rYee9Bj+94/RDVN8mQTAWHHlbABMO4tZq9+qb7igiiQJ1bB3qn5yyCyOMrv aMuKMIT+B/M4tBleC+967nNTNqMC0DJPuIipH211xquAOjGob1de8t5ZDjUe9M6pNr9oN+OApSTy3 a4VVnW6rCdhgCgXeMCKcQ8f156Q8PS4NUElRPYh4+HjlYKzB3IVCeC/jYgd+5fgp9w6ZRIu3yoF5n bJ56hY7g==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1stYh9-00000006V5e-07uY; Wed, 25 Sep 2024 20:33:07 +0000 Received: from s3.sipsolutions.net ([2a01:4f8:242:246e::2] helo=sipsolutions.net) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1stYh5-00000006V46-3fIJ for linux-um@lists.infradead.org; Wed, 25 Sep 2024 20:33:05 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=sipsolutions.net; s=mail; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Content-Type:Sender 
:Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From:Resent-To: Resent-Cc:Resent-Message-ID; bh=O3Q/38rXeEUVrojPW4zmGoR/22Nv0/GfBYZQoawwncs=; t=1727296383; x=1728505983; b=bOB48lJJirwLssKGeC5/ZVSnDN5ZiD0dDHPWIi0/+bhsy5M wzoMqtP2OGfZqIHoDoozrfWGmIhhYQtsF7+q2LQEfhBK8R88amSS6JtHjailqLTdGOetW9NKNE981 JPyv1y3hDJUMcqEVf4wEacxrRtsnpH9AcVlGAWsLgNLwEsUv+94Xdu4yIRr0cx2JKYiB+sTO8noIR 6VyQOFzRKO5hIy1HHLHj0TE7TK2kSDqD4ZT1C2MOMuJm7jrnYkX9ZO1chivKmKIuf5pwYt/ei/nCo oJ7ZBy3BKsqNMVy0y3A/EAPfY8T8mXugLlBAE44k1xW2LLr5F+CDbjOtsSnZv4AA==; Received: by sipsolutions.net with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.97) (envelope-from ) id 1stYh1-00000001A19-43SP; Wed, 25 Sep 2024 22:33:00 +0200 From: Benjamin Berg To: linux-um@lists.infradead.org Cc: Benjamin Berg , Benjamin Berg Subject: [RFC PATCH 7/9] um: Track userspace children dying in SECCOMP mode Date: Wed, 25 Sep 2024 22:32:30 +0200 Message-ID: <20240925203232.565086-8-benjamin@sipsolutions.net> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20240925203232.565086-1-benjamin@sipsolutions.net> References: <20240925203232.565086-1-benjamin@sipsolutions.net> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240925_133304_092824_54E92466 X-CRM114-Status: GOOD ( 28.77 ) X-Spam-Score: -2.1 (--) X-Spam-Report: Spam detection software, running on the system "bombadil.infradead.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: When in seccomp mode, we would hang forever on the futex if a child has died unexpectedly. In contrast, ptrace mode will notice it and kill the corresponding thread when it fails to run it. Fix this issue using a new IRQ that is fired after a SIGCHLD and keeping an (internal) list of all MMs. 
In the IRQ handler, find the affected MM and set its PID to -1 as well as the futex variable to [...] Content analysis details: (-2.1 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 SPF_PASS SPF: sender matches SPF record -0.0 SPF_HELO_PASS SPF: HELO matches SPF record -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-BeenThere: linux-um@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-um" Errors-To: linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org When in seccomp mode, we would hang forever on the futex if a child has died unexpectedly. In contrast, ptrace mode will notice it and kill the corresponding thread when it fails to run it. Fix this issue using a new IRQ that is fired after a SIGCHLD and keeping an (internal) list of all MMs. In the IRQ handler, find the affected MM and set its PID to -1 as well as the futex variable to FUTEX_IN_KERN. This, together with futex returning -EINTR after the signal is sufficient to implement a race-free detection of a child dying. 
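The bookkeeping described above can be modelled in plain C roughly as follows. This is a hypothetical sketch and not the patch's implementation: it has no locking (the real list would need a lock around every access), it makes no futex calls, and the constant values and all sketch_* names are assumptions.

```c
#include <stddef.h>

/* Assumed values; the real FUTEX_IN_KERN constant is defined elsewhere. */
enum { SKETCH_FUTEX_IN_CHILD = 0, SKETCH_FUTEX_IN_KERN = 1 };

/* One entry per MM, chained so a dead child can be looked up by PID. */
struct sketch_mm {
	struct sketch_mm *next;
	int pid;
	int futex;
};

/*
 * Called after a child was reaped: mark the matching MM as dead and flip
 * the futex word so a waiter that re-checks it stops waiting.
 * Returns 1 if an MM with that PID was found, 0 otherwise.
 */
static inline int sketch_notify_kill(struct sketch_mm *list, int pid)
{
	struct sketch_mm *mm;

	for (mm = list; mm != NULL; mm = mm->next) {
		if (mm->pid == pid) {
			mm->pid = -1;
			mm->futex = SKETCH_FUTEX_IN_KERN;
			return 1;
		}
	}
	return 0;
}

/* What a waiter checks once its wait returns (for example with -EINTR). */
static inline int sketch_child_alive(const struct sketch_mm *mm)
{
	return mm->pid >= 0;
}
```

Since the signal interrupts the futex wait, the waiter always gets to re-read both fields after the handler has updated them.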
Signed-off-by: Benjamin Berg Signed-off-by: Benjamin Berg --- arch/um/include/asm/irq.h | 5 +- arch/um/include/shared/irq_user.h | 1 + arch/um/include/shared/os.h | 1 + arch/um/include/shared/skas/mm_id.h | 5 ++ arch/um/kernel/irq.c | 5 ++ arch/um/kernel/skas/mmu.c | 91 +++++++++++++++++++++++++++-- arch/um/os-Linux/process.c | 31 ++++++++++ arch/um/os-Linux/signal.c | 19 +++++- 8 files changed, 150 insertions(+), 8 deletions(-) diff --git a/arch/um/include/asm/irq.h b/arch/um/include/asm/irq.h index 749dfe8512e8..36dbedd1af48 100644 --- a/arch/um/include/asm/irq.h +++ b/arch/um/include/asm/irq.h @@ -13,17 +13,18 @@ #define TELNETD_IRQ 8 #define XTERM_IRQ 9 #define RANDOM_IRQ 10 +#define SIGCHLD_IRQ 11 #ifdef CONFIG_UML_NET_VECTOR -#define VECTOR_BASE_IRQ (RANDOM_IRQ + 1) +#define VECTOR_BASE_IRQ (SIGCHLD_IRQ + 1) #define VECTOR_IRQ_SPACE 8 #define UM_FIRST_DYN_IRQ (VECTOR_IRQ_SPACE + VECTOR_BASE_IRQ) #else -#define UM_FIRST_DYN_IRQ (RANDOM_IRQ + 1) +#define UM_FIRST_DYN_IRQ (SIGCHLD_IRQ + 1) #endif diff --git a/arch/um/include/shared/irq_user.h b/arch/um/include/shared/irq_user.h index da0f6eea30d0..53a1f0651b96 100644 --- a/arch/um/include/shared/irq_user.h +++ b/arch/um/include/shared/irq_user.h @@ -16,6 +16,7 @@ enum um_irq_type { struct siginfo; extern void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs); +extern void sigchld_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs); void sigio_run_timetravel_handlers(void); extern void free_irq_by_fd(int fd); extern void deactivate_fd(int fd, int irqnum); diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h index 33c4d2677591..929ddb437ee1 100644 --- a/arch/um/include/shared/os.h +++ b/arch/um/include/shared/os.h @@ -199,6 +199,7 @@ extern int create_mem_file(unsigned long long len); extern void report_enomem(void); /* process.c */ +pid_t os_reap_child(void); extern void os_alarm_process(int pid); extern void os_kill_process(int pid, int reap_child); 
extern void os_kill_ptraced_process(int pid, int reap_child); diff --git a/arch/um/include/shared/skas/mm_id.h b/arch/um/include/shared/skas/mm_id.h index 140388c282f6..39948f91d89b 100644 --- a/arch/um/include/shared/skas/mm_id.h +++ b/arch/um/include/shared/skas/mm_id.h @@ -7,6 +7,9 @@ #define __MM_ID_H struct mm_id { + /* Simple list containing all MMs to react to a dead child */ + struct mm_id *next; + int pid; unsigned long stack; int syscall_data_len; @@ -14,4 +17,6 @@ struct mm_id { void __switch_mm(struct mm_id *mm_idp); +void notify_mm_kill(int pid); + #endif diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c index 534e91797f89..4fed231a0deb 100644 --- a/arch/um/kernel/irq.c +++ b/arch/um/kernel/irq.c @@ -786,3 +786,8 @@ unsigned long from_irq_stack(int nested) return mask & ~1; } +extern void sigchld_handler(int sig, struct siginfo *unused_si, + struct uml_pt_regs *regs) +{ + do_IRQ(SIGCHLD_IRQ, regs); +} diff --git a/arch/um/kernel/skas/mmu.c b/arch/um/kernel/skas/mmu.c index d3fb506d5bd6..2704f0342a35 100644 --- a/arch/um/kernel/skas/mmu.c +++ b/arch/um/kernel/skas/mmu.c @@ -8,6 +8,7 @@ #include #include +#include #include #include #include @@ -19,6 +20,9 @@ /* Ensure the stub_data struct covers the allocated area */ static_assert(sizeof(struct stub_data) == STUB_DATA_PAGES * UM_KERN_PAGE_SIZE); +spinlock_t mm_list_lock; +struct mm_id *mm_list = NULL; + int init_new_context(struct task_struct *task, struct mm_struct *mm) { struct mm_id *new_id = &mm->context.id; @@ -31,9 +35,13 @@ int init_new_context(struct task_struct *task, struct mm_struct *mm) new_id->stack = stack; - block_signals_trace(); - new_id->pid = start_userspace(stack); - unblock_signals_trace(); + scoped_guard(spinlock_irqsave, &mm_list_lock) { + /* Insert into list, used for lookups when the child dies */ + new_id->next = mm_list; + mm_list = new_id; + + new_id->pid = start_userspace(stack); + } if (new_id->pid < 0) { ret = new_id->pid; @@ -55,19 +63,92 @@ int 
init_new_context(struct task_struct *task, struct mm_struct *mm) void destroy_context(struct mm_struct *mm) { struct mm_context *mmu = &mm->context; + struct mm_id *mm_idp, *prev; /* * If init_new_context wasn't called, this will be * zero, resulting in a kill(0), which will result in the * whole UML suddenly dying. Also, cover negative and * 1 cases, since they shouldn't happen either. + * + * Negative cases happen if the child died unexpectedly. */ - if (mmu->id.pid < 2) { + if (mmu->id.pid >= 0 && mmu->id.pid < 2) { printk(KERN_ERR "corrupt mm_context - pid = %d\n", mmu->id.pid); return; } - os_kill_ptraced_process(mmu->id.pid, 1); + + if (mmu->id.pid > 0) { + os_kill_ptraced_process(mmu->id.pid, 1); + mmu->id.pid = -1; + } free_pages(mmu->id.stack, ilog2(STUB_DATA_PAGES)); + + guard(spinlock_irqsave)(&mm_list_lock); + + for (prev = NULL, mm_idp = mm_list; + mm_idp; + prev = mm_idp, mm_idp = prev->next) { + if (mm_idp != &mmu->id) + continue; + + if (prev) + prev->next = mm_idp->next; + else + mm_list = mm_idp->next; + + break; + } +} + +static irqreturn_t mm_sigchld_irq(int irq, void* dev) +{ + struct mm_id *mm_idp; + pid_t pid; + + guard(spinlock)(&mm_list_lock); + + while ((pid = os_reap_child()) > 0) { + /* + * A child died, check if we have an MM with the PID. This is + * only relevant in SECCOMP mode (as ptrace will fail anyway). + * + * See wait_stub_done_seccomp for more details. + */ + for (mm_idp = mm_list; mm_idp; mm_idp = mm_idp->next) { + if (mm_idp->pid == pid) { + struct stub_data *stub_data; + printk("Unexpectedly lost MM child! Affected processes will segfault."); + + /* Marks the MM as dead */ + mm_idp->pid = -1; + + /* + * NOTE: If SMP is implemented, a futex_wake + * needs to be added here. 
+ */ + stub_data = (void *)mm_idp->stack; + stub_data->futex = FUTEX_IN_KERN; + break; + } + } + } + + return IRQ_HANDLED; +} + +static int __init init_child_tracking(void) +{ + int err; + + spin_lock_init(&mm_list_lock); + + err = request_irq(SIGCHLD_IRQ, mm_sigchld_irq, 0, "SIGCHLD", NULL); + if (err < 0) + panic("Failed to register SIGCHLD IRQ: %d", err); + + return 0; } +__initcall(init_child_tracking) diff --git a/arch/um/os-Linux/process.c b/arch/um/os-Linux/process.c index f20602e793d9..01ddaaadfa04 100644 --- a/arch/um/os-Linux/process.c +++ b/arch/um/os-Linux/process.c @@ -17,17 +17,29 @@ #include #include #include +#include void os_alarm_process(int pid) { + if (pid <= 0) + return; + kill(pid, SIGALRM); } void os_kill_process(int pid, int reap_child) { + if (pid <= 0) + return; + + /* Block signals until child is reaped */ + block_signals(); + kill(pid, SIGKILL); if (reap_child) CATCH_EINTR(waitpid(pid, NULL, __WALL)); + + unblock_signals(); } /* Kill off a ptraced child by all means available. kill it normally first, @@ -37,11 +49,27 @@ void os_kill_process(int pid, int reap_child) void os_kill_ptraced_process(int pid, int reap_child) { + if (pid <= 0) + return; + + /* Block signals until child is reaped */ + block_signals(); + kill(pid, SIGKILL); ptrace(PTRACE_KILL, pid); ptrace(PTRACE_CONT, pid); if (reap_child) CATCH_EINTR(waitpid(pid, NULL, __WALL)); + + unblock_signals(); +} + +pid_t os_reap_child(void) +{ + int status; + + /* Try to reap a child */ + return waitpid(-1, &status, WNOHANG); } /* Don't use the glibc version, which caches the result in TLS. 
It misses some @@ -201,5 +229,8 @@ void init_new_thread_signals(void) set_handler(SIGBUS); signal(SIGHUP, SIG_IGN); set_handler(SIGIO); + /* We (currently) only use the child reaper IRQ in seccomp mode */ + if (using_seccomp) + set_handler(SIGCHLD); signal(SIGWINCH, SIG_IGN); } diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c index b11ed66c8bb0..6ca72ffb8d38 100644 --- a/arch/um/os-Linux/signal.c +++ b/arch/um/os-Linux/signal.c @@ -29,6 +29,7 @@ void (*sig_info[NSIG])(int, struct siginfo *, struct uml_pt_regs *) = { [SIGBUS] = bus_handler, [SIGSEGV] = segv_handler, [SIGIO] = sigio_handler, + [SIGCHLD] = sigchld_handler, }; static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc) @@ -44,7 +45,7 @@ static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc) } /* enable signals if sig isn't IRQ signal */ - if ((sig != SIGIO) && (sig != SIGWINCH)) + if ((sig != SIGIO) && (sig != SIGWINCH) && (sig != SIGCHLD)) unblock_signals_trace(); (*sig_info[sig])(sig, si, &r); @@ -64,6 +65,9 @@ static void sig_handler_common(int sig, struct siginfo *si, mcontext_t *mc) #define SIGALRM_BIT 1 #define SIGALRM_MASK (1 << SIGALRM_BIT) +#define SIGCHLD_BIT 2 +#define SIGCHLD_MASK (1 << SIGCHLD_BIT) + int signals_enabled; #ifdef UML_CONFIG_UML_TIME_TRAVEL_SUPPORT static int signals_blocked, signals_blocked_pending; @@ -102,6 +106,11 @@ static void sig_handler(int sig, struct siginfo *si, mcontext_t *mc) return; } + if (!enabled && (sig == SIGCHLD)) { + signals_pending |= SIGCHLD_MASK; + return; + } + block_signals_trace(); sig_handler_common(sig, si, mc); @@ -181,6 +190,8 @@ static void (*handlers[_NSIG])(int sig, struct siginfo *si, mcontext_t *mc) = { [SIGIO] = sig_handler, [SIGWINCH] = sig_handler, + /* SIGCHLD is only actually registered in seccomp mode. 
*/ + [SIGCHLD] = sig_handler, [SIGALRM] = timer_alarm_handler, [SIGUSR1] = sigusr1_handler, @@ -344,6 +355,12 @@ void unblock_signals(void) if (save_pending & SIGIO_MASK) sig_handler_common(SIGIO, NULL, NULL); + if (save_pending & SIGCHLD_MASK) { + struct uml_pt_regs regs = {}; + + sigchld_handler(SIGCHLD, NULL, &regs); + } + /* Do not reenter the handler */ if ((save_pending & SIGALRM_MASK) && (!(signals_active & SIGALRM_MASK))) From patchwork Wed Sep 25 20:32:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Benjamin Berg X-Patchwork-Id: 1989532 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; secure) header.d=lists.infradead.org header.i=@lists.infradead.org header.a=rsa-sha256 header.s=bombadil.20210309 header.b=fJuuMQC1; dkim=fail reason="signature verification failed" (2048-bit key; secure) header.d=sipsolutions.net header.i=@sipsolutions.net header.a=rsa-sha256 header.s=mail header.b=aNgOXvK2; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.infradead.org (client-ip=2607:7c80:54:3::133; helo=bombadil.infradead.org; envelope-from=linux-um-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org; receiver=patchwork.ozlabs.org) Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4XDT2L5nbBz1xsM for ; Thu, 26 Sep 2024 06:33:14 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help 
:List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=sojGu6dhAaBtQ/W10d3IhKLWyvz5TXxEeiv95fjLZyo=; b=fJuuMQC1MEoD4b357nhHwmPHaP pgqEZNiFH02/t4mAYhvjUUQXFKdeFaBMWPXFwtfIMsOPifyjnoGpgXDrkGyaEIKk32CB8YEMaEDzU qnGiXBv0X2v35ny/2Yv911oskbV1sHthbutr0i2smzYHfGvUhCWDC0fVBXdv50AkuWD4gg6zmJY19 jbF2KRVhCOjERE6GubOLzJWkieA5cI2cW9+oMB1xvs2ndt47/ouWitUZb9vwohnSHHiXEfPcKKrZK 0PJ9hk5/hdC+GfNoJht7CWtZSAEoLvMU7LEyg/m90X54YVf47ZQN+Z++E1dlAtSgRl9WSRwhpNqLs wZFrFfYQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98 #2 (Red Hat Linux)) id 1stYhF-00000006V7q-0ma0; Wed, 25 Sep 2024 20:33:13 +0000 Received: from s3.sipsolutions.net ([2a01:4f8:242:246e::2] helo=sipsolutions.net) by bombadil.infradead.org with esmtps (Exim 4.98 #2 (Red Hat Linux)) id 1stYh9-00000006V5d-2r1d for linux-um@lists.infradead.org; Wed, 25 Sep 2024 20:33:11 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=sipsolutions.net; s=mail; h=Content-Transfer-Encoding:MIME-Version: References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Content-Type:Sender :Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From:Resent-To: Resent-Cc:Resent-Message-ID; bh=sojGu6dhAaBtQ/W10d3IhKLWyvz5TXxEeiv95fjLZyo=; t=1727296387; x=1728505987; b=aNgOXvK2vVheUg1nCgjY4zUkBneDxCE8o3hqfWJCA8kN7xS 5zgQ83qDUPuaP0exNCHVK+4Og0CDBYWjY+OtmDwA5vp7kbQFHbLm8A9a06b1S9upe844qDUSF0Q9u iJpmOvBrfHbKIPjBoxgCKeswJzdsCct8KyRlamukOWaIOBInJ5GWPDXjVihbDGaCOcSBaBnvY1y7D YLA7fX1lAZP/a001yQrPYqNwANk0BiRoQqX4V+1XTCzkARoDLdFc0NbEKDJ50e5bFQWNUeg9MGex3 wmJ8FfbOB/UFkYjCbBiSpUqM/4x2u8oXR4hNOBSSR+RoUcaHI1T/03germErk/Ag==; Received: by sipsolutions.net with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.97) 
(envelope-from ) id 1stYh4-00000001A19-0jTr; Wed, 25 Sep 2024 22:33:02 +0200
From: Benjamin Berg
To: linux-um@lists.infradead.org
Cc: Benjamin Berg , Johannes Berg , Benjamin Berg
Subject: [RFC PATCH 8/9] um: Implement kernel side of SECCOMP based process handling
Date: Wed, 25 Sep 2024 22:32:31 +0200
Message-ID: <20240925203232.565086-9-benjamin@sipsolutions.net>
X-Mailer: git-send-email 2.46.1
In-Reply-To: <20240925203232.565086-1-benjamin@sipsolutions.net>
References: <20240925203232.565086-1-benjamin@sipsolutions.net>
MIME-Version: 1.0

This adds the kernel side of the seccomp based process handling.

Co-authored-by: Johannes Berg
Signed-off-by: Benjamin Berg
Signed-off-by: Benjamin Berg
---
 arch/um/include/shared/common-offsets.h    |   2 +
 arch/um/include/shared/os.h                |   2 +-
 arch/um/include/shared/skas/stub-data.h    |   5 +-
 arch/um/kernel/skas/mmu.c                  |   6 +-
 arch/um/kernel/skas/stub_exe.c             | 147 +++++++-
 arch/um/os-Linux/internal.h                |   5 +
 arch/um/os-Linux/skas/mem.c                |  38 ++-
 arch/um/os-Linux/skas/process.c            | 378 +++++++++++++++------
 arch/um/os-Linux/start_up.c                |  42 ++-
 arch/x86/um/shared/sysdep/kernel-offsets.h |   2 +
 arch/x86/um/tls_32.c                       |  23 +-
 11 files changed, 491 insertions(+), 159 deletions(-)

diff --git a/arch/um/include/shared/common-offsets.h b/arch/um/include/shared/common-offsets.h index 253987fc78ac..64654bbd1176 100644 --- a/arch/um/include/shared/common-offsets.h +++ b/arch/um/include/shared/common-offsets.h @@ -30,3 +30,5 @@ DEFINE(UML_CONFIG_UML_TIME_TRAVEL_SUPPORT, CONFIG_UML_TIME_TRAVEL_SUPPORT); #endif DEFINE(UM_KERN_GDT_ENTRY_TLS_ENTRIES, GDT_ENTRY_TLS_ENTRIES); + +DEFINE(UM_SECCOMP_ARCH_NATIVE, 
SECCOMP_ARCH_NATIVE); diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h index 929ddb437ee1..45f0a94197eb 100644 --- a/arch/um/include/shared/os.h +++ b/arch/um/include/shared/os.h @@ -285,7 +285,7 @@ int protect(struct mm_id *mm_idp, unsigned long addr, /* skas/process.c */ extern int is_skas_winch(int pid, int fd, void *data); -extern int start_userspace(unsigned long stub_stack); +extern int start_userspace(struct mm_id *mm_id); extern void userspace(struct uml_pt_regs *regs, unsigned long *aux_fp_regs); extern void new_thread(void *stack, jmp_buf *buf, void (*handler)(void)); extern void switch_threads(jmp_buf *me, jmp_buf *you); diff --git a/arch/um/include/shared/skas/stub-data.h b/arch/um/include/shared/skas/stub-data.h index 1ee1677abeda..0fb8bc470331 100644 --- a/arch/um/include/shared/skas/stub-data.h +++ b/arch/um/include/shared/skas/stub-data.h @@ -18,6 +18,8 @@ #define FUTEX_IN_KERN 1 struct stub_init_data { + int seccomp; + unsigned long stub_start; int stub_code_fd; @@ -25,7 +27,8 @@ struct stub_init_data { int stub_data_fd; unsigned long stub_data_offset; - unsigned long segv_handler; + unsigned long signal_handler; + unsigned long signal_restorer; }; #define STUB_NEXT_SYSCALL(s) \ diff --git a/arch/um/kernel/skas/mmu.c b/arch/um/kernel/skas/mmu.c index 2704f0342a35..1b37f72a9c35 100644 --- a/arch/um/kernel/skas/mmu.c +++ b/arch/um/kernel/skas/mmu.c @@ -40,13 +40,11 @@ int init_new_context(struct task_struct *task, struct mm_struct *mm) new_id->next = mm_list; mm_list = new_id; - new_id->pid = start_userspace(stack); + ret = start_userspace(new_id); } - if (new_id->pid < 0) { - ret = new_id->pid; + if (ret < 0) goto out_free; - } /* Ensure the new MM is clean and nothing unwanted is mapped */ unmap(new_id, 0, STUB_START); diff --git a/arch/um/kernel/skas/stub_exe.c b/arch/um/kernel/skas/stub_exe.c index 04f75c577f1a..292de5afc06d 100644 --- a/arch/um/kernel/skas/stub_exe.c +++ b/arch/um/kernel/skas/stub_exe.c @@ -3,6 +3,9 @@ 
#include #include #include +#include +#include +#include void _start(void); @@ -25,8 +28,6 @@ noinline static void real_init(void) } sa = { /* Need to set SA_RESTORER (but the handler never returns) */ .sa_flags = SA_ONSTACK | SA_NODEFER | SA_SIGINFO | 0x04000000, - /* no need to mask any signals */ - .sa_mask = 0, }; /* set a nice name */ @@ -35,6 +36,9 @@ noinline static void real_init(void) /* Make sure this process dies if the kernel dies */ stub_syscall2(__NR_prctl, PR_SET_PDEATHSIG, SIGKILL); + /* Needed in SECCOMP mode (and safe to do anyway) */ + stub_syscall5(__NR_prctl, PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); + /* read information from STDIN and close it */ res = stub_syscall3(__NR_read, 0, (unsigned long)&init_data, sizeof(init_data)); @@ -63,18 +67,133 @@ noinline static void real_init(void) stack.ss_sp = (void *)init_data.stub_start + UM_KERN_PAGE_SIZE; stub_syscall2(__NR_sigaltstack, (unsigned long)&stack, 0); - /* register SIGSEGV handler */ - sa.sa_handler_ = (void *) init_data.segv_handler; - res = stub_syscall4(__NR_rt_sigaction, SIGSEGV, (unsigned long)&sa, 0, - sizeof(sa.sa_mask)); - if (res != 0) - stub_syscall1(__NR_exit, 13); - - stub_syscall4(__NR_ptrace, PTRACE_TRACEME, 0, 0, 0); - - stub_syscall2(__NR_kill, stub_syscall0(__NR_getpid), SIGSTOP); - - stub_syscall1(__NR_exit, 14); + /* register signal handlers */ + sa.sa_handler_ = (void *) init_data.signal_handler; + sa.sa_restorer = (void *) init_data.signal_restorer; + if (!init_data.seccomp) { + /* In ptrace mode, the SIGSEGV handler never returns */ + sa.sa_mask = 0; + + res = stub_syscall4(__NR_rt_sigaction, SIGSEGV, + (unsigned long)&sa, 0, sizeof(sa.sa_mask)); + if (res != 0) + stub_syscall1(__NR_exit, 13); + } else { + /* SECCOMP mode uses rt_sigreturn, need to mask all signals */ + sa.sa_mask = ~0ULL; + + res = stub_syscall4(__NR_rt_sigaction, SIGSEGV, + (unsigned long)&sa, 0, sizeof(sa.sa_mask)); + if (res != 0) + stub_syscall1(__NR_exit, 14); + + res = stub_syscall4(__NR_rt_sigaction, 
SIGSYS, + (unsigned long)&sa, 0, sizeof(sa.sa_mask)); + if (res != 0) + stub_syscall1(__NR_exit, 15); + + res = stub_syscall4(__NR_rt_sigaction, SIGALRM, + (unsigned long)&sa, 0, sizeof(sa.sa_mask)); + if (res != 0) + stub_syscall1(__NR_exit, 16); + + res = stub_syscall4(__NR_rt_sigaction, SIGTRAP, + (unsigned long)&sa, 0, sizeof(sa.sa_mask)); + if (res != 0) + stub_syscall1(__NR_exit, 17); + + res = stub_syscall4(__NR_rt_sigaction, SIGILL, + (unsigned long)&sa, 0, sizeof(sa.sa_mask)); + if (res != 0) + stub_syscall1(__NR_exit, 18); + + res = stub_syscall4(__NR_rt_sigaction, SIGFPE, + (unsigned long)&sa, 0, sizeof(sa.sa_mask)); + if (res != 0) + stub_syscall1(__NR_exit, 19); + } + + /* + * If in seccomp mode, install the SECCOMP filter and trigger a syscall. + * Otherwise set PTRACE_TRACEME and do a SIGSTOP. + */ + if (init_data.seccomp) { + struct sock_filter filter[] = { +#if __BITS_PER_LONG > 32 + /* [0] Load upper 32bit of instruction pointer from seccomp_data */ + BPF_STMT(BPF_LD | BPF_W | BPF_ABS, + (offsetof(struct seccomp_data, instruction_pointer) + 4)), + + /* [1] Jump forward 3 instructions if the upper address is not identical */ + BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (init_data.stub_start) >> 32, 0, 3), +#endif + /* [2] Load lower 32bit of instruction pointer from seccomp_data */ + BPF_STMT(BPF_LD | BPF_W | BPF_ABS, + (offsetof(struct seccomp_data, instruction_pointer))), + + /* [3] Mask out lower bits */ + BPF_STMT(BPF_ALU | BPF_AND | BPF_K, 0xfffff000), + + /* [4] Jump to [6] if the lower bits are not on the expected page */ + BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (init_data.stub_start) & 0xfffff000, 1, 0), + + /* [5] Trap call, allow */ + BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP), + + /* [6,7] Check architecture */ + BPF_STMT(BPF_LD | BPF_W | BPF_ABS, + offsetof(struct seccomp_data, arch)), + BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, + UM_SECCOMP_ARCH_NATIVE, 1, 0), + + /* [8] Kill (for architecture check) */ + BPF_STMT(BPF_RET | BPF_K, 
SECCOMP_RET_KILL_PROCESS), + + /* [9] Load syscall number */ + BPF_STMT(BPF_LD | BPF_W | BPF_ABS, + offsetof(struct seccomp_data, nr)), + + /* [10-14] Check against permitted syscalls */ + BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_futex, + 5, 0), + BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, STUB_MMAP_NR, + 4, 0), + BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_munmap, + 3, 0), +#ifdef __i386__ + BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_set_thread_area, + 2, 0), +#else + BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_arch_prctl, + 2, 0), +#endif + BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_rt_sigreturn, + 1, 0), + + /* [15] Not one of the permitted syscalls */ + BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS), + + /* [16] Permitted call for the stub */ + BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW), + }; + struct sock_fprog prog = { + .len = sizeof(filter) / sizeof(filter[0]), + .filter = filter, + }; + + if (stub_syscall3(__NR_seccomp, SECCOMP_SET_MODE_FILTER, + SECCOMP_FILTER_FLAG_TSYNC, + (unsigned long)&prog) != 0) + stub_syscall1(__NR_exit, 20); + + /* Fall through, the exit syscall will cause SIGSYS */ + } else { + stub_syscall4(__NR_ptrace, PTRACE_TRACEME, 0, 0, 0); + + stub_syscall2(__NR_kill, stub_syscall0(__NR_getpid), SIGSTOP); + } + + stub_syscall1(__NR_exit, 30); __builtin_unreachable(); } diff --git a/arch/um/os-Linux/internal.h b/arch/um/os-Linux/internal.h index 317fca190c2b..b4b96bb1f05b 100644 --- a/arch/um/os-Linux/internal.h +++ b/arch/um/os-Linux/internal.h @@ -2,6 +2,9 @@ #ifndef __UM_OS_LINUX_INTERNAL_H #define __UM_OS_LINUX_INTERNAL_H +#include +#include + /* * elf_aux.c */ @@ -16,5 +19,7 @@ void check_tmpexec(void); * skas/process.c */ void wait_stub_done(int pid); +void wait_stub_done_seccomp(struct mm_id *mm_idp, int running, int wait_sigsys); + #endif /* __UM_OS_LINUX_INTERNAL_H */ diff --git a/arch/um/os-Linux/skas/mem.c b/arch/um/os-Linux/skas/mem.c index 9a13ac23c606..26ff609b35c0 100644 --- a/arch/um/os-Linux/skas/mem.c +++ 
b/arch/um/os-Linux/skas/mem.c @@ -4,6 +4,7 @@ * Copyright (C) 2002 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com) */ +#include #include #include #include @@ -80,27 +81,32 @@ static inline long do_syscall_stub(struct mm_id *mm_idp) int n, i; int err, pid = mm_idp->pid; - n = ptrace_setregs(pid, syscall_regs); - if (n < 0) { - printk(UM_KERN_ERR "Registers - \n"); - for (i = 0; i < MAX_REG_NR; i++) - printk(UM_KERN_ERR "\t%d\t0x%lx\n", i, syscall_regs[i]); - panic("%s : PTRACE_SETREGS failed, errno = %d\n", - __func__, -n); - } - /* Inform process how much we have filled in. */ proc_data->syscall_data_len = mm_idp->syscall_data_len; - err = ptrace(PTRACE_CONT, pid, 0, 0); - if (err) - panic("Failed to continue stub, pid = %d, errno = %d\n", pid, - errno); - - wait_stub_done(pid); + if (using_seccomp) { + proc_data->restart_wait = 1; + wait_stub_done_seccomp(mm_idp, 0, 1); + } else { + n = ptrace_setregs(pid, syscall_regs); + if (n < 0) { + printk(UM_KERN_ERR "Registers -\n"); + for (i = 0; i < MAX_REG_NR; i++) + printk(UM_KERN_ERR "\t%d\t0x%lx\n", i, syscall_regs[i]); + panic("%s : PTRACE_SETREGS failed, errno = %d\n", + __func__, -n); + } + + err = ptrace(PTRACE_CONT, pid, 0, 0); + if (err) + panic("Failed to continue stub, pid = %d, errno = %d\n", + pid, errno); + + wait_stub_done(pid); + } /* - * proc_data->err will be non-zero if there was an (unexpected) error. + * proc_data->err will be negative if there was an (unexpected) error. * In that case, syscall_data_len points to the last executed syscall, * otherwise it will be zero (but we do not need to rely on that). 
*/ diff --git a/arch/um/os-Linux/skas/process.c b/arch/um/os-Linux/skas/process.c index 2329fddf195a..8cc180330113 100644 --- a/arch/um/os-Linux/skas/process.c +++ b/arch/um/os-Linux/skas/process.c @@ -1,9 +1,11 @@ // SPDX-License-Identifier: GPL-2.0 /* + * Copyright (C) 2021 Benjamin Berg * Copyright (C) 2015 Thomas Meyer (thomas@m3y3r.de) * Copyright (C) 2002- 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com) */ +#include #include #include #include @@ -25,8 +27,11 @@ #include #include #include +#include +#include #include #include +#include #include "../internal.h" int is_skas_winch(int pid, int fd, void *data) @@ -142,6 +147,74 @@ void wait_stub_done(int pid) fatal_sigsegv(); } +#ifdef CONFIG_UML_SECCOMP +void wait_stub_done_seccomp(struct mm_id *mm_idp, int running, int wait_sigsys) +{ + struct stub_data *data = (void *)mm_idp->stack; + int ret; + + do { + if (!running) { + data->signal = 0; + data->futex = FUTEX_IN_CHILD; + CATCH_EINTR(syscall(__NR_futex, &data->futex, + FUTEX_WAKE, 1, NULL, NULL, 0)); + } + + do { + /* + * We need to check whether the child is still alive + * before and after the FUTEX_WAIT call. Before, in + * case it just died but we still updated data->futex + * to FUTEX_IN_CHILD. And after, in case it died while + * we were waiting (and SIGCHLD woke us up, see the + * IRQ handler in mmu.c). + * + * Either way, if PID is negative, then we have no + * choice but to kill the task. + */ + if (__READ_ONCE(mm_idp->pid) < 0) + goto out_kill; + + ret = syscall(__NR_futex, &data->futex, + FUTEX_WAIT, FUTEX_IN_CHILD, + NULL, NULL, 0); + } while ((ret == -1 && errno == EINTR) && data->futex == FUTEX_IN_CHILD); + + if (__READ_ONCE(mm_idp->pid) < 0) + goto out_kill; + + running = 0; + + /* We may receive a SIGALRM before SIGSYS, iterate again. 
*/ + } while (wait_sigsys && data->signal == SIGALRM); + + if (ret < 0 && errno != EAGAIN) { + printk(UM_KERN_ERR "%s : waiting for child futex failed, errno = %d\n", + __func__, errno); + goto out_kill; + } + + if (data->mctx_offset > sizeof(data->sigstack) - sizeof(mcontext_t)) { + printk(UM_KERN_ERR "%s : invalid mcontext offset", __func__); + goto out_kill; + } + + if (wait_sigsys && data->signal != SIGSYS) { + printk(UM_KERN_ERR "%s : expected SIGSYS but got %d", + __func__, data->signal); + goto out_kill; + } + + return; + +out_kill: + printk(UM_KERN_ERR "%s : failed to wait for stub, pid = %d, errno = %d\n", + __func__, mm_idp->pid, errno); + fatal_sigsegv(); +} +#endif + extern unsigned long current_stub_stack(void); static void get_skas_faultinfo(int pid, struct faultinfo *fi, unsigned long *aux_fp_regs) @@ -194,14 +267,26 @@ static int userspace_tramp(void *stack) int pipe_fds[2]; unsigned long long offset; struct stub_init_data init_data = { + .seccomp = using_seccomp, .stub_start = STUB_START, - .segv_handler = STUB_CODE + - (unsigned long) stub_segv_handler - - (unsigned long) __syscall_stub_start, }; struct iomem_region *iomem; int ret; + if (using_seccomp) { + init_data.signal_handler = STUB_CODE + + (unsigned long) stub_signal_interrupt - + (unsigned long) __syscall_stub_start; + init_data.signal_restorer = STUB_CODE + + (unsigned long) stub_signal_restorer - + (unsigned long) __syscall_stub_start; + } else { + init_data.signal_handler = STUB_CODE + + (unsigned long) stub_segv_handler - + (unsigned long) __syscall_stub_start; + init_data.signal_restorer = 0; + } + init_data.stub_code_fd = phys_mapping(uml_to_phys(__syscall_stub_start), &offset); init_data.stub_code_offset = MMAP_OFFSET(offset); @@ -332,8 +417,9 @@ int userspace_pid[NR_CPUS]; * when negative: an error number. * FIXME: can PIDs become negative?! 
*/ -int start_userspace(unsigned long stub_stack) +int start_userspace(struct mm_id *mm_id) { + struct stub_data *proc_data = (void *)mm_id->stack; void *stack; unsigned long sp; int pid, status, n, err; @@ -352,10 +438,13 @@ int start_userspace(unsigned long stub_stack) /* set stack pointer to the end of the stack page, so it can grow downwards */ sp = (unsigned long)stack + UM_KERN_PAGE_SIZE; + if (using_seccomp) + proc_data->futex = FUTEX_IN_CHILD; + /* clone into new userspace process */ pid = clone(userspace_tramp, (void *) sp, CLONE_VFORK | CLONE_VM | SIGCHLD, - (void *)stub_stack); + (void *)mm_id->stack); if (pid < 0) { err = -errno; printk(UM_KERN_ERR "%s : clone failed, errno = %d\n", @@ -363,29 +452,34 @@ int start_userspace(unsigned long stub_stack) return err; } - do { - CATCH_EINTR(n = waitpid(pid, &status, WUNTRACED | __WALL)); - if (n < 0) { + if (using_seccomp) { + wait_stub_done_seccomp(mm_id, 1, 1); + } else { + do { + CATCH_EINTR(n = waitpid(pid, &status, + WUNTRACED | __WALL)); + if (n < 0) { + err = -errno; + printk(UM_KERN_ERR "%s : wait failed, errno = %d\n", + __func__, errno); + goto out_kill; + } + } while (WIFSTOPPED(status) && (WSTOPSIG(status) == SIGALRM)); + + if (!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGSTOP)) { + err = -EINVAL; + printk(UM_KERN_ERR "%s : expected SIGSTOP, got status = %d\n", + __func__, status); + goto out_kill; + } + + if (ptrace(PTRACE_SETOPTIONS, pid, NULL, + (void *) PTRACE_O_TRACESYSGOOD) < 0) { err = -errno; - printk(UM_KERN_ERR "%s : wait failed, errno = %d\n", + printk(UM_KERN_ERR "%s : PTRACE_SETOPTIONS failed, errno = %d\n", __func__, errno); goto out_kill; } - } while (WIFSTOPPED(status) && (WSTOPSIG(status) == SIGALRM)); - - if (!WIFSTOPPED(status) || (WSTOPSIG(status) != SIGSTOP)) { - err = -EINVAL; - printk(UM_KERN_ERR "%s : expected SIGSTOP, got status = %d\n", - __func__, status); - goto out_kill; - } - - if (ptrace(PTRACE_SETOPTIONS, pid, NULL, - (void *) PTRACE_O_TRACESYSGOOD) < 0) { - err = 
-errno; - printk(UM_KERN_ERR "%s : PTRACE_SETOPTIONS failed, errno = %d\n", - __func__, errno); - goto out_kill; } if (munmap(stack, UM_KERN_PAGE_SIZE) < 0) { @@ -395,6 +489,8 @@ int start_userspace(unsigned long stub_stack) goto out_kill; } + mm_id->pid = pid; + return pid; out_kill: @@ -408,7 +504,9 @@ extern unsigned long tt_extra_sched_jiffies; void userspace(struct uml_pt_regs *regs, unsigned long *aux_fp_regs) { int err, status, op, pid = userspace_pid[0]; - siginfo_t si; + siginfo_t si_ptrace; + siginfo_t *si; + int sig; /* Handle any immediate reschedules or signals */ interrupt_end(); @@ -438,105 +536,181 @@ void userspace(struct uml_pt_regs *regs, unsigned long *aux_fp_regs) current_mm_sync(); - /* Flush out any pending syscalls */ - err = syscall_stub_flush(current_mm_id()); - if (err) { - if (err == -ENOMEM) - report_enomem(); + if (using_seccomp) { + struct mm_id *mm_id = current_mm_id(); + struct stub_data *proc_data = (void *) mm_id->stack; + int ret; - printk(UM_KERN_ERR "%s - Error flushing stub syscalls: %d", - __func__, -err); - fatal_sigsegv(); - } + ret = set_stub_state(regs, proc_data, singlestepping()); + if (ret) { + printk(UM_KERN_ERR "%s - failed to set regs: %d", + __func__, ret); + fatal_sigsegv(); + } - /* - * This can legitimately fail if the process loads a - * bogus value into a segment register. It will - * segfault and PTRACE_GETREGS will read that value - * out of the process. However, PTRACE_SETREGS will - * fail. In this case, there is nothing to do but - * just kill the process. 
- */ - if (ptrace(PTRACE_SETREGS, pid, 0, regs->gp)) { - printk(UM_KERN_ERR "%s - ptrace set regs failed, errno = %d\n", - __func__, errno); - fatal_sigsegv(); - } + /* Must have been reset by the syscall caller */ + if (proc_data->restart_wait != 0) + panic("Programming error: Flag to only run syscalls in child was not cleared!"); + + /* Mark pending syscalls for flushing */ + proc_data->syscall_data_len = mm_id->syscall_data_len; + mm_id->syscall_data_len = 0; + + proc_data->signal = 0; + proc_data->futex = FUTEX_IN_CHILD; + CATCH_EINTR(syscall(__NR_futex, &proc_data->futex, + FUTEX_WAKE, 1, NULL, NULL, 0)); + do { + ret = syscall(__NR_futex, &proc_data->futex, + FUTEX_WAIT, FUTEX_IN_CHILD, NULL, NULL, 0); + } while ((ret == -1 && errno == EINTR) || + proc_data->futex == FUTEX_IN_CHILD); + + sig = proc_data->signal; + + if (sig == SIGTRAP && proc_data->err != 0) { + printk(UM_KERN_ERR "%s - Error flushing stub syscalls", + __func__); + syscall_stub_dump_error(mm_id); + fatal_sigsegv(); + } - if (put_fp_registers(pid, regs->fp)) { - printk(UM_KERN_ERR "%s - ptrace set fp regs failed, errno = %d\n", - __func__, errno); - fatal_sigsegv(); - } + ret = get_stub_state(regs, proc_data); + if (ret) { + printk(UM_KERN_ERR "%s - failed to get regs: %d", + __func__, ret); + fatal_sigsegv(); + } - if (singlestepping()) - op = PTRACE_SYSEMU_SINGLESTEP; - else - op = PTRACE_SYSEMU; + if (proc_data->si_offset > sizeof(proc_data->sigstack) - sizeof(*si)) + panic("%s - Invalid siginfo offset from child", + __func__); + si = (void *)&proc_data->sigstack[proc_data->si_offset]; - if (ptrace(op, pid, 0, 0)) { - printk(UM_KERN_ERR "%s - ptrace continue failed, op = %d, errno = %d\n", - __func__, op, errno); - fatal_sigsegv(); - } + regs->is_user = 1; - CATCH_EINTR(err = waitpid(pid, &status, WUNTRACED | __WALL)); - if (err < 0) { - printk(UM_KERN_ERR "%s - wait failed, errno = %d\n", - __func__, errno); - fatal_sigsegv(); - } + /* Fill in ORIG_RAX and extract fault information */ + 
PT_SYSCALL_NR(regs->gp) = si->si_syscall; + if (sig == SIGSEGV) { + mcontext_t *mcontext = (void *)&proc_data->sigstack[proc_data->mctx_offset]; - regs->is_user = 1; - if (ptrace(PTRACE_GETREGS, pid, 0, regs->gp)) { - printk(UM_KERN_ERR "%s - PTRACE_GETREGS failed, errno = %d\n", - __func__, errno); - fatal_sigsegv(); - } + GET_FAULTINFO_FROM_MC(regs->faultinfo, mcontext); + } + } else { + /* Flush out any pending syscalls */ + err = syscall_stub_flush(current_mm_id()); + if (err) { + if (err == -ENOMEM) + report_enomem(); + + printk(UM_KERN_ERR "%s - Error flushing stub syscalls: %d", + __func__, -err); + fatal_sigsegv(); + } - if (get_fp_registers(pid, regs->fp)) { - printk(UM_KERN_ERR "%s - get_fp_registers failed, errno = %d\n", - __func__, errno); - fatal_sigsegv(); - } + /* + * This can legitimately fail if the process loads a + * bogus value into a segment register. It will + * segfault and PTRACE_GETREGS will read that value + * out of the process. However, PTRACE_SETREGS will + * fail. In this case, there is nothing to do but + * just kill the process. + */ + if (ptrace(PTRACE_SETREGS, pid, 0, regs->gp)) { + printk(UM_KERN_ERR "%s - ptrace set regs failed, errno = %d\n", + __func__, errno); + fatal_sigsegv(); + } - UPT_SYSCALL_NR(regs) = -1; /* Assume: It's not a syscall */ + if (put_fp_registers(pid, regs->fp)) { + printk(UM_KERN_ERR "%s - ptrace set fp regs failed, errno = %d\n", + __func__, errno); + fatal_sigsegv(); + } - if (WIFSTOPPED(status)) { - int sig = WSTOPSIG(status); + if (singlestepping()) + op = PTRACE_SYSEMU_SINGLESTEP; + else + op = PTRACE_SYSEMU; - /* These signal handlers need the si argument. - * The SIGIO and SIGALARM handlers which constitute the - * majority of invocations, do not use it. 
- */ - switch (sig) { - case SIGSEGV: - case SIGTRAP: - case SIGILL: - case SIGBUS: - case SIGFPE: - case SIGWINCH: - ptrace(PTRACE_GETSIGINFO, pid, 0, (struct siginfo *)&si); - break; + if (ptrace(op, pid, 0, 0)) { + printk(UM_KERN_ERR "%s - ptrace continue failed, op = %d, errno = %d\n", + __func__, op, errno); + fatal_sigsegv(); + } + + CATCH_EINTR(err = waitpid(pid, &status, WUNTRACED | __WALL)); + if (err < 0) { + printk(UM_KERN_ERR "%s - wait failed, errno = %d\n", + __func__, errno); + fatal_sigsegv(); + } + + regs->is_user = 1; + if (ptrace(PTRACE_GETREGS, pid, 0, regs->gp)) { + printk(UM_KERN_ERR "%s - PTRACE_GETREGS failed, errno = %d\n", + __func__, errno); + fatal_sigsegv(); + } + + if (get_fp_registers(pid, regs->fp)) { + printk(UM_KERN_ERR "%s - get_fp_registers failed, errno = %d\n", + __func__, errno); + fatal_sigsegv(); } + if (WIFSTOPPED(status)) { + sig = WSTOPSIG(status); + + /* These signal handlers need the si argument + * and SIGSEGV needs the faultinfo. + * The SIGIO and SIGALARM handlers which constitute the + * majority of invocations, do not use it. 
+ */ + switch (sig) { + case SIGSEGV: + get_skas_faultinfo(pid, + &regs->faultinfo, + aux_fp_regs); + fallthrough; + case SIGTRAP: + case SIGILL: + case SIGBUS: + case SIGFPE: + case SIGWINCH: + ptrace(PTRACE_GETSIGINFO, pid, 0, + (struct siginfo *)&si_ptrace); + si = &si_ptrace; + break; + default: + si = NULL; + break; + } + } else { + sig = 0; + } + } + + UPT_SYSCALL_NR(regs) = -1; /* Assume: It's not a syscall */ + + if (sig) { switch (sig) { case SIGSEGV: - get_skas_faultinfo(pid, - &regs->faultinfo, aux_fp_regs); - - if (PTRACE_FULL_FAULTINFO) - (*sig_info[SIGSEGV])(SIGSEGV, (struct siginfo *)&si, - regs); + if (using_seccomp || PTRACE_FULL_FAULTINFO) + (*sig_info[SIGSEGV])(SIGSEGV, (struct siginfo *)si, + regs); else segv(regs->faultinfo, 0, 1, NULL); + break; + case SIGSYS: + handle_syscall(regs); break; case SIGTRAP + 0x80: handle_trap(pid, regs); break; case SIGTRAP: - relay_signal(SIGTRAP, (struct siginfo *)&si, regs); + relay_signal(SIGTRAP, (struct siginfo *)si, regs); break; case SIGALRM: break; @@ -546,7 +720,7 @@ void userspace(struct uml_pt_regs *regs, unsigned long *aux_fp_regs) case SIGFPE: case SIGWINCH: block_signals_trace(); - (*sig_info[sig])(sig, (struct siginfo *)&si, regs); + (*sig_info[sig])(sig, (struct siginfo *)si, regs); unblock_signals_trace(); break; default: diff --git a/arch/um/os-Linux/start_up.c b/arch/um/os-Linux/start_up.c index bfca66db505f..2f5c2af1db8a 100644 --- a/arch/um/os-Linux/start_up.c +++ b/arch/um/os-Linux/start_up.c @@ -239,21 +239,20 @@ static void __init check_ptrace(void) extern unsigned long exec_regs[MAX_REG_NR]; extern unsigned long exec_fp_regs[FP_SIZE]; +__initdata static struct stub_data *seccomp_test_stub_data; + static void __init sigsys_handler(int sig, siginfo_t *info, void *p) { - struct stub_data *data = get_stub_data(); ucontext_t *uc = p; /* Stow away the location of the mcontext in the stack */ - data->mctx_offset = (unsigned long)&uc->uc_mcontext - - (unsigned long)&data->sigstack[0]; + 
seccomp_test_stub_data->mctx_offset = (unsigned long)&uc->uc_mcontext - + (unsigned long)&seccomp_test_stub_data->sigstack[0]; exit(0); } static bool __init init_seccomp(void) { - void *data_addr; - struct stub_data *data; int pid; int status; int n; @@ -268,11 +267,9 @@ static bool __init init_seccomp(void) os_info("Checking that seccomp filters can be installed..."); /* data needs to be page aligned, so allocate twice the amount */ - data_addr = mmap(0, 2 * sizeof(*data), - PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANON, 0, 0); - - data = (void*)((long)(data_addr + STUB_DATA_PAGES * UM_KERN_PAGE_SIZE) & - (long)~(STUB_DATA_PAGES * UM_KERN_PAGE_SIZE - 1)); + seccomp_test_stub_data = mmap(0, sizeof(*seccomp_test_stub_data), + PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_ANON, 0, 0); pid = fork(); if (pid == 0) { @@ -289,7 +286,8 @@ static bool __init init_seccomp(void) }; struct sigaction sa; - set_sigstack(data->sigstack, sizeof(data->sigstack)); + set_sigstack(seccomp_test_stub_data->sigstack, + sizeof(seccomp_test_stub_data->sigstack)); sa.sa_flags = SA_ONSTACK | SA_NODEFER | SA_SIGINFO; sa.sa_sigaction = (void *) sigsys_handler; @@ -320,12 +318,12 @@ static bool __init init_seccomp(void) struct uml_pt_regs *regs = calloc(1, sizeof(struct uml_pt_regs)); /* Copy registers, the init_registers function assumes ptrace. 
*/ - r = get_stub_state(regs, data); + r = get_stub_state(regs, seccomp_test_stub_data); memcpy(exec_regs, regs->gp, sizeof(exec_regs)); memcpy(exec_fp_regs, regs->fp, sizeof(exec_fp_regs)); - munmap(data, sizeof(*data)); + munmap(seccomp_test_stub_data, sizeof(*seccomp_test_stub_data)); free(regs); @@ -343,7 +341,7 @@ static bool __init init_seccomp(void) else os_info("error\n"); - munmap(data_addr, 2*sizeof(*data)); + munmap(seccomp_test_stub_data, sizeof(*seccomp_test_stub_data)); return false; } #endif @@ -420,12 +418,22 @@ void __init os_early_checks(void) using_seccomp = 0; if (init_seccomp()) { - /* Not fully implemented */ -#if 0 +#ifdef CONFIG_X86_32 + extern int have_fpx_regs; + + /* + * FIXME: This is wrong, but the non-FPX layout is closer to + * what the mcontext presents to us. So, for all intents and + * purposes we'll behave mostly correct if we do this. + * + * At least rt_sigreturn does not corrupt the registers. + */ + have_fpx_regs = 0; +#endif + using_seccomp = 1; return; -#endif } #endif diff --git a/arch/x86/um/shared/sysdep/kernel-offsets.h b/arch/x86/um/shared/sysdep/kernel-offsets.h index 48de3a71f845..6fd1ed400399 100644 --- a/arch/x86/um/shared/sysdep/kernel-offsets.h +++ b/arch/x86/um/shared/sysdep/kernel-offsets.h @@ -4,7 +4,9 @@ #include #include #include +#include #include +#include /* workaround for a warning with -Wmissing-prototypes */ void foo(void); diff --git a/arch/x86/um/tls_32.c b/arch/x86/um/tls_32.c index fbb129023080..21cbb70cf771 100644 --- a/arch/x86/um/tls_32.c +++ b/arch/x86/um/tls_32.c @@ -12,6 +12,7 @@ #include #include #include +#include /* * If needed we can detect when it's uninitialized. 
@@ -21,13 +22,27 @@ static int host_supports_tls = -1; int host_gdt_entry_tls_min; -static int do_set_thread_area(struct user_desc *info) +static int do_set_thread_area(struct task_struct* task, struct user_desc *info) { int ret; u32 cpu; + if (info->entry_number < host_gdt_entry_tls_min || + info->entry_number >= host_gdt_entry_tls_min + GDT_ENTRY_TLS_ENTRIES) + return -EINVAL; + + if (using_seccomp) { + int idx = info->entry_number - host_gdt_entry_tls_min; + struct stub_data *data = (void *)task->mm->context.id.stack; + + data->arch_data.tls[idx] = *info; + data->arch_data.sync |= BIT(idx); + + return 0; + } + cpu = get_cpu(); - ret = os_set_thread_area(info, userspace_pid[cpu]); + ret = os_set_thread_area(info, task->mm->context.id.pid); put_cpu(); if (ret) @@ -97,7 +112,7 @@ static int load_TLS(int flags, struct task_struct *to) if (!(flags & O_FORCE) && curr->flushed) continue; - ret = do_set_thread_area(&curr->tls); + ret = do_set_thread_area(current, &curr->tls); if (ret) goto out; @@ -275,7 +290,7 @@ SYSCALL_DEFINE1(set_thread_area, struct user_desc __user *, user_desc) return -EFAULT; } - ret = do_set_thread_area(&info); + ret = do_set_thread_area(current, &info); if (ret) return ret; return set_tls_entry(current, &info, idx, 1);
From patchwork Wed Sep 25 20:32:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Benjamin Berg X-Patchwork-Id: 1989531 From: Benjamin Berg To: linux-um@lists.infradead.org Cc: Benjamin Berg Subject: [RFC PATCH 9/9] um: pass FD for memory operations when needed Date: Wed, 25 Sep 2024 22:32:32 +0200 Message-ID: <20240925203232.565086-10-benjamin@sipsolutions.net> X-Mailer: git-send-email 2.46.1 In-Reply-To: <20240925203232.565086-1-benjamin@sipsolutions.net> References: <20240925203232.565086-1-benjamin@sipsolutions.net>
From: Benjamin Berg Instead of always sharing the FDs with the userspace process, only hand over the FDs needed for mmap when required. The idea is that userspace might be able to force the stub into executing an mmap syscall, however, it will not be able to manipulate the control flow sufficiently to have access to an FD that would allow mapping arbitrary memory. Security wise, we need to be sure that only the expected syscalls are executed after the kernel sends FDs through the socket. This is currently not the case, as userspace can trivially jump the rt_sigreturn syscall instruction to execute any syscall that the stub is permitted to do. With this, it can trick the kernel to send the FD, which in turn allows userspace to freely map any physical memory. As such, this is currently *not* secure. However, in principle the approach should be fine with a more strict SECCOMP filter and a careful review of the stub control flow (as userspace can prepare a stack).
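For readers unfamiliar with the mechanism, the FD hand-over described above relies on SCM_RIGHTS ancillary data sent over an AF_UNIX socket. The following is only a minimal sketch of that general technique, assuming a single FD per message and the libc sendmsg()/recvmsg() wrappers; the helper names send_fd() and recv_fd() are hypothetical and do not appear in the patch, which uses raw syscalls in the stub and can pass up to STUB_MAX_FDS descriptors at once.

```c
/*
 * Illustrative sketch only: send_fd() and recv_fd() are hypothetical
 * helpers, not functions from the patch.
 */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one payload byte with a single FD attached as SCM_RIGHTS data. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = sizeof(byte) };
	union {
		char data[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} ctrl;
	struct msghdr msgh = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl.data,
		.msg_controllen = sizeof(ctrl.data),
	};
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	cmsg = CMSG_FIRSTHDR(&msgh);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msgh, 0) == 1 ? 0 : -1;
}

/* Receive the payload byte; return the attached FD, or -1 if none. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = sizeof(byte) };
	union {
		char data[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msgh = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl.data,
		.msg_controllen = sizeof(ctrl.data),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msgh, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msgh);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file, which is why the patch has the stub translate the index stored in the queued mmap request through the received FD array and close the received FDs again once the queued syscalls have run.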
With some care, it is likely possible to extend the security model to SMP if desired. Signed-off-by: Benjamin Berg --- arch/um/include/shared/skas/mm_id.h | 11 ++ arch/um/include/shared/skas/stub-data.h | 1 + arch/um/kernel/skas/mmu.c | 3 + arch/um/kernel/skas/stub.c | 90 +++++++++++++-- arch/um/kernel/skas/stub_exe.c | 21 +++- arch/um/kernel/tlb.c | 21 +++- arch/um/os-Linux/internal.h | 1 - arch/um/os-Linux/skas/mem.c | 66 ++++++++++- arch/um/os-Linux/skas/process.c | 142 +++++++++++++++++------- 9 files changed, 292 insertions(+), 64 deletions(-) diff --git a/arch/um/include/shared/skas/mm_id.h b/arch/um/include/shared/skas/mm_id.h index 39948f91d89b..1c157447ee57 100644 --- a/arch/um/include/shared/skas/mm_id.h +++ b/arch/um/include/shared/skas/mm_id.h @@ -6,6 +6,12 @@ #ifndef __MM_ID_H #define __MM_ID_H +#ifdef CONFIG_UML_SECCOMP +#define STUB_MAX_FDS 4 +#else +#define STUB_MAX_FDS 0 +#endif + struct mm_id { /* Simple list containing all MMs to react to a dead child */ struct mm_id *next; @@ -13,6 +19,11 @@ struct mm_id { int pid; unsigned long stack; int syscall_data_len; + + /* Only used with SECCOMP mode */ + int sock; + int syscall_fd_num; + int syscall_fd_map[STUB_MAX_FDS]; }; void __switch_mm(struct mm_id *mm_idp); diff --git a/arch/um/include/shared/skas/stub-data.h b/arch/um/include/shared/skas/stub-data.h index 0fb8bc470331..6130fc2658f9 100644 --- a/arch/um/include/shared/skas/stub-data.h +++ b/arch/um/include/shared/skas/stub-data.h @@ -13,6 +13,7 @@ #include #include #include +#include #define FUTEX_IN_CHILD 0 #define FUTEX_IN_KERN 1 diff --git a/arch/um/kernel/skas/mmu.c b/arch/um/kernel/skas/mmu.c index 1b37f72a9c35..37bad160d0db 100644 --- a/arch/um/kernel/skas/mmu.c +++ b/arch/um/kernel/skas/mmu.c @@ -82,6 +82,9 @@ void destroy_context(struct mm_struct *mm) mmu->id.pid = -1; } + if (using_seccomp && mmu->id.sock) + os_close_file(mmu->id.sock); + free_pages(mmu->id.stack, ilog2(STUB_DATA_PAGES)); guard(spinlock_irqsave)(&mm_list_lock); diff --git 
a/arch/um/kernel/skas/stub.c b/arch/um/kernel/skas/stub.c index 2d0cdb701d29..53cce4d214e5 100644 --- a/arch/um/kernel/skas/stub.c +++ b/arch/um/kernel/skas/stub.c @@ -7,24 +7,54 @@ #ifdef CONFIG_UML_SECCOMP #include +#include #include #endif -static __always_inline int syscall_handler(struct stub_data *d) +/* + * Known security issues + * + * Userspace can jump to this address to execute *any* syscall that is + * permitted by the stub. As we will return afterwards, it can do + * whatever it likes, including: + * - Tricking the kernel into handing out the memory FD + * - Using this memory FD to read/write all physical memory + * - Running in parallel to the kernel processing a syscall + * (possibly creating data races?) + * - Blocking e.g. SIGALRM to avoid time based scheduling + * + * To avoid this, the permitted location for each syscall needs to be + * checked for in the SECCOMP filter (which is reasonably simple). Also, + * more care will need to go into considerations how the code might be + * tricked by using a prepared stack (or even modifying the stack from + * another thread in case SMP support is added). + * + * As for the SIGALRM, the best counter measure will be to check in the + * kernel that the process is reporting back the SIGALRM in a timely + * fashion. 
+ */ +static __always_inline int syscall_handler(int fd_map[STUB_MAX_FDS]) { + struct stub_data *d = get_stub_data(); int i; unsigned long res; + int fd; for (i = 0; i < d->syscall_data_len; i++) { struct stub_syscall *sc = &d->syscall_data[i]; switch (sc->syscall) { case STUB_SYSCALL_MMAP: + if (fd_map) + fd = fd_map[sc->mem.fd]; + else + fd = sc->mem.fd; + res = stub_syscall6(STUB_MMAP_NR, sc->mem.addr, sc->mem.length, sc->mem.prot, MAP_SHARED | MAP_FIXED, - sc->mem.fd, sc->mem.offset); + fd, sc->mem.offset); if (res != sc->mem.addr) { d->err = res; d->syscall_data_len = i; @@ -66,19 +96,35 @@ static __always_inline int syscall_handler(struct stub_data *d) void __section(".__syscall_stub") stub_syscall_handler(void) { - struct stub_data *d = get_stub_data(); - - syscall_handler(d); + syscall_handler(NULL); trap_myself(); } #ifdef CONFIG_UML_SECCOMP -void __attribute__ ((__section__ (".__syscall_stub"))) +void __section(".__syscall_stub") stub_signal_interrupt(int sig, siginfo_t *info, void *p) { struct stub_data *d = get_stub_data(); + char rcv_data; + union { + char data[CMSG_SPACE(sizeof(int) * STUB_MAX_FDS)]; + struct cmsghdr align; + } ctrl = {}; + struct iovec iov = { + .iov_base = &rcv_data, + .iov_len = 1, + }; + struct msghdr msghdr = { + .msg_iov = &iov, + .msg_iovlen = 1, + .msg_control = &ctrl, + .msg_controllen = sizeof(ctrl), + }; ucontext_t *uc = p; + struct cmsghdr *fd_msg; + int *fd_map; + int num_fds; long res; d->signal = sig; @@ -91,19 +137,43 @@ stub_signal_interrupt(int sig, siginfo_t *info, void *p) res = stub_syscall3(__NR_futex, (unsigned long)&d->futex, FUTEX_WAKE, 1); } while (res == -EINTR); + do { res = stub_syscall4(__NR_futex, (unsigned long)&d->futex, FUTEX_WAIT, FUTEX_IN_KERN, 0); } while (res == -EINTR || d->futex == FUTEX_IN_KERN); - if (res < 0 && res != -EAGAIN) - stub_syscall2(__NR_kill, 0, SIGKILL); + if (d->syscall_data_len) { + /* Read passed FDs (if any) */ + do { + res = stub_syscall3(__NR_recvmsg, 0, (unsigned 
long)&msghdr, 0); + } while (res == -EINTR); + + /* We should never have a receive error (other than -EAGAIN) */ + if (res < 0 && res != -EAGAIN) + stub_syscall1(__NR_exit_group, 1); + + /* Receive the FDs */ + num_fds = 0; + fd_msg = msghdr.msg_control; + fd_map = (void *)&CMSG_DATA(fd_msg); + if (res == iov.iov_len && msghdr.msg_controllen > sizeof(struct cmsghdr)) + num_fds = (fd_msg->cmsg_len - CMSG_LEN(0)) / sizeof(int); + + /* Try running queued syscalls. */ + res = syscall_handler(fd_map); + + while (num_fds) + stub_syscall2(__NR_close, fd_map[--num_fds], 0); + } else { + res = 0; + } - /* Try running queued syscalls. */ - if (syscall_handler(d) < 0 || d->restart_wait) { + if (res < 0 || d->restart_wait) { /* Report SIGSYS if we restart. */ d->signal = SIGSYS; d->restart_wait = 0; + goto restart_wait; } diff --git a/arch/um/kernel/skas/stub_exe.c b/arch/um/kernel/skas/stub_exe.c index 292de5afc06d..a75a781dd998 100644 --- a/arch/um/kernel/skas/stub_exe.c +++ b/arch/um/kernel/skas/stub_exe.c @@ -1,5 +1,6 @@ #include #include +#include #include #include #include @@ -45,7 +46,11 @@ noinline static void real_init(void) if (res != sizeof(init_data)) stub_syscall1(__NR_exit, 10); - stub_syscall1(__NR_close, 0); + /* In SECCOMP mode, FD 0 is a socket and is later used for FD passing */ + if (!init_data.seccomp) + stub_syscall1(__NR_close, 0); + else + stub_syscall3(__NR_fcntl, 0, F_SETFL, O_NONBLOCK); /* map stub code + data */ res = stub_syscall6(STUB_MMAP_NR, @@ -63,6 +68,10 @@ noinline static void real_init(void) if (res != init_data.stub_start + UM_KERN_PAGE_SIZE) stub_syscall1(__NR_exit, 12); + /* In SECCOMP mode, we only need the signalling FD from now on */ + if (init_data.seccomp) + stub_syscall3(__NR_close_range, 1, ~0U, 0); + /* setup signal stack inside stub data */ stack.ss_sp = (void *)init_data.stub_start + UM_KERN_PAGE_SIZE; stub_syscall2(__NR_sigaltstack, (unsigned long)&stack, 0); @@ -153,8 +162,12 @@ noinline static void real_init(void) 
BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)), - /* [10-14] Check against permitted syscalls */ + /* [10-16] Check against permitted syscalls */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_futex, + 7, 0), + BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,__NR_recvmsg, + 6, 0), + BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,__NR_close, 5, 0), BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, STUB_MMAP_NR, 4, 0), @@ -170,10 +183,10 @@ noinline static void real_init(void) BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_rt_sigreturn, 1, 0), - /* [15] Not one of the permitted syscalls */ + /* [17] Not one of the permitted syscalls */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS), - /* [16] Permitted call for the stub */ + /* [18] Permitted call for the stub */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW), }; struct sock_fprog prog = { diff --git a/arch/um/kernel/tlb.c b/arch/um/kernel/tlb.c index 548af31d4111..912f84001153 100644 --- a/arch/um/kernel/tlb.c +++ b/arch/um/kernel/tlb.c @@ -90,8 +90,20 @@ static inline int update_pte_range(pmd_t *pmd, unsigned long addr, prot, fd, offset); } else ret = ops->unmap(ops->mm_idp, addr, PAGE_SIZE); - } else if (pte_newprot(*pte)) - ret = ops->mprotect(ops->mm_idp, addr, PAGE_SIZE, prot); + } else if (pte_newprot(*pte)) { + if (ops->mprotect) { + ret = ops->mprotect(ops->mm_idp, addr, + PAGE_SIZE, prot); + } else { + __u64 offset; + unsigned long phys = + pte_val(*pte) & PAGE_MASK; + int fd = phys_mapping(phys, &offset); + + ret = ops->mmap(ops->mm_idp, addr, PAGE_SIZE, + prot, fd, offset); + } + } *pte = pte_mkuptodate(*pte); } while (pte++, addr += PAGE_SIZE, ((addr < end) && !ret)); return ret; @@ -184,7 +196,10 @@ int um_tlb_sync(struct mm_struct *mm) } else { ops.mmap = map; ops.unmap = unmap; - ops.mprotect = protect; + if (!using_seccomp) + ops.mprotect = protect; + else + ops.mprotect = NULL; } pgd = pgd_offset(mm, addr); diff --git a/arch/um/os-Linux/internal.h b/arch/um/os-Linux/internal.h index b4b96bb1f05b..09fa232f5695 100644 --- 
a/arch/um/os-Linux/internal.h +++ b/arch/um/os-Linux/internal.h @@ -21,5 +21,4 @@ void check_tmpexec(void); void wait_stub_done(int pid); void wait_stub_done_seccomp(struct mm_id *mm_idp, int running, int wait_sigsys); - #endif /* __UM_OS_LINUX_INTERNAL_H */ diff --git a/arch/um/os-Linux/skas/mem.c b/arch/um/os-Linux/skas/mem.c index 26ff609b35c0..d0728fe52e9f 100644 --- a/arch/um/os-Linux/skas/mem.c +++ b/arch/um/os-Linux/skas/mem.c @@ -44,6 +44,16 @@ void syscall_stub_dump_error(struct mm_id *mm_idp) print_hex_dump(UM_KERN_ERR, " syscall data: ", 0, 16, 4, sc, sizeof(*sc), 0); + + if (using_seccomp) { + printk(UM_KERN_ERR "%s: FD map num: %d", __func__, + mm_idp->syscall_fd_num); + print_hex_dump(UM_KERN_ERR, + " FD map: ", 0, 16, + sizeof(mm_idp->syscall_fd_map[0]), + mm_idp->syscall_fd_map, + sizeof(mm_idp->syscall_fd_map), 0); + } } static inline unsigned long *check_init_stack(struct mm_id * mm_idp, @@ -119,6 +129,9 @@ static inline long do_syscall_stub(struct mm_id *mm_idp) mm_idp->syscall_data_len = 0; } + if (using_seccomp) + mm_idp->syscall_fd_num = 0; + return mm_idp->syscall_data_len; } @@ -181,6 +194,44 @@ static struct stub_syscall *syscall_stub_get_previous(struct mm_id *mm_idp, return NULL; } +static int get_stub_fd(struct mm_id *mm_idp, int fd) +{ + int i; + + /* Find an FD slot (or flush and use first) */ + if (!using_seccomp) + return fd; + + /* Already crashed, value does not matter */ + if (mm_idp->syscall_data_len < 0) + return 0; + + /* Find existing FD in map if we can allocate another syscall */ + if (mm_idp->syscall_data_len < + ARRAY_SIZE(((struct stub_data *)NULL)->syscall_data)) { + for (i = 0; i < mm_idp->syscall_fd_num; i++) { + if (mm_idp->syscall_fd_map[i] == fd) + return i; + } + + if (mm_idp->syscall_fd_num < STUB_MAX_FDS) { + i = mm_idp->syscall_fd_num; + mm_idp->syscall_fd_map[i] = fd; + + mm_idp->syscall_fd_num++; + + return i; + } + } + + /* FD map full or no syscall space available, continue after flush */ + 
do_syscall_stub(mm_idp);
+	mm_idp->syscall_fd_map[0] = fd;
+	mm_idp->syscall_fd_num = 1;
+
+	return 0;
+}
+
 int map(struct mm_id *mm_idp, unsigned long virt, unsigned long len, int prot,
 	int phys_fd, unsigned long long offset)
 {
@@ -188,12 +239,21 @@ int map(struct mm_id *mm_idp, unsigned long virt, unsigned long len, int prot,
 
 	/* Compress with previous syscall if that is possible */
 	sc = syscall_stub_get_previous(mm_idp, STUB_SYSCALL_MMAP, virt);
-	if (sc && sc->mem.prot == prot && sc->mem.fd == phys_fd &&
+	if (sc && sc->mem.prot == prot &&
 	    sc->mem.offset == MMAP_OFFSET(offset - sc->mem.length)) {
-		sc->mem.length += len;
-		return 0;
+		int prev_fd = sc->mem.fd;
+
+		if (using_seccomp)
+			prev_fd = mm_idp->syscall_fd_map[sc->mem.fd];
+
+		if (phys_fd == prev_fd) {
+			sc->mem.length += len;
+			return 0;
+		}
 	}
 
+	phys_fd = get_stub_fd(mm_idp, phys_fd);
+
 	sc = syscall_stub_alloc(mm_idp);
 	sc->syscall = STUB_SYSCALL_MMAP;
 	sc->mem.addr = virt;
diff --git a/arch/um/os-Linux/skas/process.c b/arch/um/os-Linux/skas/process.c
index 8cc180330113..4e93a3453ee8 100644
--- a/arch/um/os-Linux/skas/process.c
+++ b/arch/um/os-Linux/skas/process.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -154,7 +155,39 @@ void wait_stub_done_seccomp(struct mm_id *mm_idp, int running, int wait_sigsys)
 	int ret;
 
 	do {
+		const char byte = 0;
+		struct iovec iov = {
+			.iov_base = (void *)&byte,
+			.iov_len = sizeof(byte),
+		};
+		union {
+			char data[CMSG_SPACE(sizeof(mm_idp->syscall_fd_map))];
+			struct cmsghdr align;
+		} ctrl;
+		struct msghdr msgh = {
+			.msg_iov = &iov,
+			.msg_iovlen = 1,
+		};
+
 		if (!running) {
+			if (mm_idp->syscall_fd_num) {
+				unsigned int fds_size =
+					sizeof(int) * mm_idp->syscall_fd_num;
+				struct cmsghdr *cmsg;
+
+				msgh.msg_control = ctrl.data;
+				msgh.msg_controllen = CMSG_SPACE(fds_size);
+				cmsg = CMSG_FIRSTHDR(&msgh);
+				cmsg->cmsg_level = SOL_SOCKET;
+				cmsg->cmsg_type = SCM_RIGHTS;
+				cmsg->cmsg_len = CMSG_LEN(fds_size);
+				memcpy(CMSG_DATA(cmsg), mm_idp->syscall_fd_map,
+				       fds_size);
+
+				CATCH_EINTR(syscall(__NR_sendmsg, mm_idp->sock,
+						    &msgh, 0));
+			}
+
 			data->signal = 0;
 			data->futex = FUTEX_IN_CHILD;
 			CATCH_EINTR(syscall(__NR_futex, &data->futex,
@@ -190,7 +223,7 @@ void wait_stub_done_seccomp(struct mm_id *mm_idp, int running, int wait_sigsys)
 	} while (wait_sigsys && data->signal == SIGALRM);
 
 	if (ret < 0 && errno != EAGAIN) {
-		printk(UM_KERN_ERR "%s : waiting for child futex failed, errno = %d\n",
+		printk(UM_KERN_ERR "%s : waiting for child failed, errno = %d\n",
 		       __func__, errno);
 		goto out_kill;
 	}
@@ -261,10 +294,16 @@ extern char __syscall_stub_start[];
 
 static int stub_exe_fd;
 
-static int userspace_tramp(void *stack)
+struct tramp_data {
+	struct stub_data *stub_data;
+	/* 0 is inherited, 1 is the kernel side */
+	int sockpair[2];
+};
+
+static int userspace_tramp(void *data)
 {
+	struct tramp_data *tramp_data = data;
 	char *const argv[] = { "uml-userspace", NULL };
-	int pipe_fds[2];
 	unsigned long long offset;
 	struct stub_init_data init_data = {
 		.seccomp = using_seccomp,
@@ -291,31 +330,33 @@ static int userspace_tramp(void *stack)
 					      &offset);
 	init_data.stub_code_offset = MMAP_OFFSET(offset);
 
-	init_data.stub_data_fd = phys_mapping(uml_to_phys(stack), &offset);
+	init_data.stub_data_fd = phys_mapping(uml_to_phys(tramp_data->stub_data),
+					      &offset);
 	init_data.stub_data_offset = MMAP_OFFSET(offset);
 
-	/* Set CLOEXEC on all FDs and then unset on all memory related FDs */
-	close_range(0, ~0U, CLOSE_RANGE_CLOEXEC);
+	/* dup2 signaling FD/socket to STDIN */
+	close(0);
+	if (dup2(tramp_data->sockpair[0], 0) < 0)
+		exit(3);
 
-	fcntl(init_data.stub_data_fd, F_SETFD, 0);
-	for (iomem = iomem_regions; iomem; iomem = iomem->next)
-		fcntl(iomem->fd, F_SETFD, 0);
+	/*
+	 * Set CLOEXEC on all FDs except the signaling one and then unset for
+	 * the main memory FD as well as IOMEM regions (if not in SECCOMP).
+	 */
+	close_range(1, ~0U, CLOSE_RANGE_CLOEXEC);
 
-	/* Create a pipe for init_data (no CLOEXEC) and dup2 to STDIN */
-	if (pipe2(pipe_fds, 0))
-		exit(2);
+	fcntl(init_data.stub_data_fd, F_SETFD, 0);
 
-	close(0);
-	if (dup2(pipe_fds[0], 0) < 0) {
-		close(pipe_fds[0]);
-		close(pipe_fds[1]);
-		exit(3);
+	if (!using_seccomp) {
+		for (iomem = iomem_regions; iomem; iomem = iomem->next)
+			fcntl(iomem->fd, F_SETFD, 0);
 	}
-	close(pipe_fds[0]);
+
+	close(tramp_data->sockpair[0]);
 
 	/* Write init_data and close write side */
-	ret = write(pipe_fds[1], &init_data, sizeof(init_data));
-	close(pipe_fds[1]);
+	ret = write(tramp_data->sockpair[1], &init_data, sizeof(init_data));
+	close(tramp_data->sockpair[1]);
 
 	if (ret != sizeof(init_data))
 		exit(4);
@@ -408,7 +449,7 @@ int userspace_pid[NR_CPUS];
 
 /**
  * start_userspace() - prepare a new userspace process
- * @stub_stack: pointer to the stub stack.
+ * @mm_id: The corresponding struct mm_id
  *
  * Setups a new temporary stack page that is used while userspace_tramp() runs
  * Clones the kernel process into a new userspace process, with FDs only.
@@ -420,9 +461,12 @@ int userspace_pid[NR_CPUS];
 int start_userspace(struct mm_id *mm_id)
 {
 	struct stub_data *proc_data = (void *)mm_id->stack;
+	struct tramp_data tramp_data = {
+		.stub_data = proc_data,
+	};
 	void *stack;
 	unsigned long sp;
-	int pid, status, n, err;
+	int status, n, err;
 
 	/* setup a temporary stack page */
 	stack = mmap(NULL, UM_KERN_PAGE_SIZE,
@@ -438,25 +482,32 @@ int start_userspace(struct mm_id *mm_id)
 	/* set stack pointer to the end of the stack page, so it can grow downwards */
 	sp = (unsigned long)stack + UM_KERN_PAGE_SIZE;
 
+	/* socket pair for init data and SECCOMP FD passing (no CLOEXEC here) */
+	if (socketpair(AF_UNIX, SOCK_STREAM, 0, tramp_data.sockpair)) {
+		err = -errno;
+		printk(UM_KERN_ERR "%s : socketpair failed, errno = %d\n",
+		       __func__, errno);
+		return err;
+	}
+
 	if (using_seccomp)
 		proc_data->futex = FUTEX_IN_CHILD;
 
-	/* clone into new userspace process */
-	pid = clone(userspace_tramp, (void *) sp,
+	mm_id->pid = clone(userspace_tramp, (void *) sp,
 		    CLONE_VFORK | CLONE_VM | SIGCHLD,
-		    (void *)mm_id->stack);
-	if (pid < 0) {
+		    (void *)&tramp_data);
+	if (mm_id->pid < 0) {
 		err = -errno;
 		printk(UM_KERN_ERR "%s : clone failed, errno = %d\n",
 		       __func__, errno);
-		return err;
+		goto out_close;
 	}
 
 	if (using_seccomp) {
 		wait_stub_done_seccomp(mm_id, 1, 1);
 	} else {
 		do {
-			CATCH_EINTR(n = waitpid(pid, &status,
+			CATCH_EINTR(n = waitpid(mm_id->pid, &status,
 						WUNTRACED | __WALL));
 			if (n < 0) {
 				err = -errno;
@@ -473,7 +524,7 @@ int start_userspace(struct mm_id *mm_id)
 			goto out_kill;
 		}
 
-		if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
+		if (ptrace(PTRACE_SETOPTIONS, mm_id->pid, NULL,
 			   (void *) PTRACE_O_TRACESYSGOOD) < 0) {
 			err = -errno;
 			printk(UM_KERN_ERR "%s : PTRACE_SETOPTIONS failed, errno = %d\n",
@@ -489,12 +540,22 @@ int start_userspace(struct mm_id *mm_id)
 		goto out_kill;
 	}
 
-	mm_id->pid = pid;
+	close(tramp_data.sockpair[0]);
+	if (using_seccomp)
+		mm_id->sock = tramp_data.sockpair[1];
+	else
+		close(tramp_data.sockpair[1]);
 
-	return pid;
+	return 0;
+
+out_kill:
+	os_kill_ptraced_process(mm_id->pid, 1);
+out_close:
+	close(tramp_data.sockpair[0]);
+	close(tramp_data.sockpair[1]);
+
+	mm_id->pid = -1;
 
- out_kill:
-	os_kill_ptraced_process(pid, 1);
 	return err;
 }
 
@@ -554,17 +615,8 @@ void userspace(struct uml_pt_regs *regs, unsigned long *aux_fp_regs)
 
 			/* Mark pending syscalls for flushing */
 			proc_data->syscall_data_len = mm_id->syscall_data_len;
-			mm_id->syscall_data_len = 0;
 
-			proc_data->signal = 0;
-			proc_data->futex = FUTEX_IN_CHILD;
-			CATCH_EINTR(syscall(__NR_futex, &proc_data->futex,
-					    FUTEX_WAKE, 1, NULL, NULL, 0));
-			do {
-				ret = syscall(__NR_futex, &proc_data->futex,
-					      FUTEX_WAIT, FUTEX_IN_CHILD, NULL, NULL, 0);
-			} while ((ret == -1 && errno == EINTR) ||
-				 proc_data->futex == FUTEX_IN_CHILD);
+			wait_stub_done_seccomp(mm_id, 0, 0);
 
 			sig = proc_data->signal;
 
@@ -572,9 +624,13 @@ void userspace(struct uml_pt_regs *regs, unsigned long *aux_fp_regs)
 				printk(UM_KERN_ERR "%s - Error flushing stub syscalls",
 				       __func__);
 				syscall_stub_dump_error(mm_id);
+				mm_id->syscall_data_len = proc_data->err;
 				fatal_sigsegv();
 			}
 
+			mm_id->syscall_data_len = 0;
+			mm_id->syscall_fd_num = 0;
+
 			ret = get_stub_state(regs, proc_data);
 			if (ret) {
 				printk(UM_KERN_ERR "%s - failed to get regs: %d",