From patchwork Thu Aug 3 16:35:50 2023
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 1816578
From: Adhemerval Zanella Netto
Reply-To: Adhemerval Zanella
To: libc-alpha@sourceware.org
Subject: [PATCH v7 0/8] Add pidfd and cgroupv2 support for process creation
Date: Thu, 3 Aug 2023 13:35:50 -0300
Message-Id: <20230803163558.991832-1-adhemerval.zanella@linaro.org>
List-Id: Libc-alpha mailing list

glibc 2.36 added wrappers for the Linux syscalls pidfd_open,
pidfd_getfd, and pidfd_send_signal, and exported P_PIDFD for use with
waitid.  A pidfd is a race-free interface; however, pidfd_open is
subject to TOCTOU if the file descriptor is not obtained directly from
the clone or clone3 syscall (there is still a small window between the
clone return and the pidfd_open call in which the process can be reaped
and its process ID reused).

A fully race-free interface based on posix_spawn is being discussed by
GNOME [1] [2], and Qt already uses one in its QtProcess implementation
[3].
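
To make the window described above concrete, here is a minimal sketch
of the pattern available today: fork, then pidfd_open, then waitid with
P_PIDFD.  The function name fork_and_wait_with_pidfd is hypothetical
and not part of this series; pidfd_open is issued through syscall() and
P_PIDFD is defined as a fallback, on the assumption that the code may
be built against headers older than glibc 2.36.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* glibc 2.36 and later export P_PIDFD; older headers may not.  3 is the
   value used by the Linux kernel.  */
#ifndef P_PIDFD
# define P_PIDFD 3
#endif

/* Hypothetical helper: fork a child that exits with STATUS, open a pidfd
   for it after the fact, and wait on that descriptor.  Returns the exit
   status of the child, or -1 on any error (including a kernel without
   pidfd support).  */
static int
fork_and_wait_with_pidfd (int status)
{
  assert (status >= 0 && status <= 255);

  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (status);

  /* This is the window: if anything else in the process reaps the child
     before this call (another thread calling wait, or SIGCHLD set to
     SIG_IGN), PID may already refer to an unrelated process.  */
  int fd = (int) syscall (SYS_pidfd_open, pid, 0);
  if (fd < 0)
    {
      waitpid (pid, NULL, 0);   /* Fall back to reaping by PID.  */
      return -1;
    }

  siginfo_t info = { 0 };
  int r = waitid (P_PIDFD, fd, &info, WEXITED);
  close (fd);
  if (r < 0)
    {
      waitpid (pid, NULL, 0);   /* Do not leave a zombie behind.  */
      return -1;
    }
  if (info.si_code != CLD_EXITED)
    return -1;
  return info.si_status;
}
```

Calling fork_and_wait_with_pidfd (7) is expected to return 7 on a
kernel with full pidfd support.  The descriptor only becomes safe once
pidfd_open has succeeded, which is why obtaining it directly from clone
or clone3 closes the gap.
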
The Qt implementation has some pitfalls:

  - It calls clone through the syscall symbol, which does not run the
    pthread_atfork handlers, even though it really intends to use the
    clone semantics of fork (by only using CLONE_PIDFD | SIGCHLD).

  - It also does not reset any internal state, such as the internal IO,
    malloc, and loader locks.

  - It does not set the TCB tid field nor the robust list, both used by
    the pthread code.

  - It does not optimize process creation by using CLONE_VM and
    CLONE_VFORK.

Also, recent Linux kernels (starting with 5.7) provide a way to create
a new process in a cgroup version 2 different from the default one
(through the clone3 CLONE_INTO_CGROUP flag).  Providing it through
glibc interfaces makes it usable without the risk of potential breakage
from issuing the clone3 syscall directly (check the BZ#26371
discussion).

This patchset adds new interfaces that take care of these potential
issues.  The new posix_spawn / posix_spawnp extensions:

  #define POSIX_SPAWN_SETCGROUP 0x100

  int posix_spawnattr_getcgroup_np (const posix_spawnattr_t *restrict attr,
                                    int *cgroup);
  int posix_spawnattr_setcgroup_np (posix_spawnattr_t *restrict attr,
                                    int cgroup);

allow spawning a new process in a different cgroupv2.

pidfd_spawn and pidfd_spawnp are similar to posix_spawn and
posix_spawnp, but return a process file descriptor instead of a PID.

  int pidfd_spawn (int *restrict pidfd,
                   const char *restrict file,
                   const posix_spawn_file_actions_t *restrict facts,
                   const posix_spawnattr_t *restrict attrp,
                   char *const argv[restrict],
                   char *const envp[restrict]);

  int pidfd_spawnp (int *restrict pidfd,
                    const char *restrict path,
                    const posix_spawn_file_actions_t *restrict facts,
                    const posix_spawnattr_t *restrict attrp,
                    char *const argv[restrict_arr],
                    char *const envp[restrict_arr]);

The implementation requires the kernel to support the complete pidfd
interface, meaning that waitid (P_PIDFD) must be supported.
This ensures that no racy workaround is required (such as reading the
procfs fdinfo pid to use along with the old wait interfaces).  If the
kernel does not have the required support, the interface returns
ENOSYS.

A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the argument
lifetime.  Both symbols reuse the posix_spawn types
posix_spawn_file_actions_t and posix_spawnattr_t, to avoid either
rehashing the posix_spawn API or adding a new one.  It also means that
both interfaces support the same attributes and file actions, and a new
flag or file action added to posix_spawn is automatically available to
pidfd_spawn as well.  This includes POSIX_SPAWN_SETCGROUP.

Along with the spawn interface, a fork-like one is also provided:

  pid_t pidfd_fork (int *pidfd, int cgroup, unsigned int flags)

If PIDFD is NULL, no file descriptor is returned and pidfd_fork acts as
fork.  Otherwise, a new file descriptor is returned, and the kernel
already sets O_CLOEXEC on it by default.  pidfd_fork follows the
fork/_Fork convention of returning a positive or negative value to the
parent (with negative indicating an error) and zero to the child.  If
cgroup is 0 or a positive value, it is interpreted as a different
cgroup in which to place the new process (check the CLONE_INTO_CGROUP
clone flag).

Similar to fork, pidfd_fork also runs the pthread_atfork handlers.
This can be changed with the PIDFDFORK_ASYNCSAFE flag, which makes
pidfd_fork act as _Fork.  It also sends SIGCHLD to the parent when the
process terminates.

To provide a way to interoperate between process IDs and process file
descriptors, pidfd_getpid is also provided:

  pid_t pidfd_getpid (int fd)

It reads the procfs fdinfo entry of the file descriptor to get the
process ID.
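
As an illustration of the fdinfo lookup described above, the following
is a rough sketch and not the code from this series.  The names
fdinfo_parse_pid and sketch_pidfd_getpid are hypothetical.  The errno
values follow the changelog (EBADF, EREMOTE, ESRCH), while mapping a
"Pid:" value of 0 to EREMOTE and a negative value to ESRCH is an
assumption about what the kernel reports.  The sketch also uses strtol
for brevity, which the series itself avoids.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: find the "Pid:" field in fdinfo TEXT and store
   its value in *PID.  Returns 0 on success, -1 if the field is missing
   or malformed.  */
static int
fdinfo_parse_pid (const char *text, long *pid)
{
  assert (text != NULL && pid != NULL);

  for (const char *line = text; line != NULL && *line != '\0';)
    {
      /* Match only at the start of a line, so "NSpid:" is skipped.  */
      if (strncmp (line, "Pid:", 4) == 0)
        {
          char *end;
          errno = 0;
          long value = strtol (line + 4, &end, 10);
          if (end == line + 4 || errno != 0
              || (*end != '\n' && *end != '\0'))
            return -1;
          *pid = value;
          return 0;
        }
      const char *nl = strchr (line, '\n');
      line = nl != NULL ? nl + 1 : NULL;
    }
  return -1;
}

/* Hypothetical stand-in for the proposed pidfd_getpid.  Returns the
   process ID, or -1 with errno set.  */
static pid_t
sketch_pidfd_getpid (int fd)
{
  if (fd < 0)
    {
      errno = EBADF;            /* Negative input, per the v4 changelog.  */
      return -1;
    }

  char path[64];
  snprintf (path, sizeof path, "/proc/self/fdinfo/%d", fd);
  FILE *fp = fopen (path, "r");
  if (fp == NULL)
    {
      errno = EBADF;
      return -1;
    }
  char buf[1024];
  size_t n = fread (buf, 1, sizeof buf - 1, fp);
  fclose (fp);
  buf[n] = '\0';

  long pid;
  if (fdinfo_parse_pid (buf, &pid) != 0)
    {
      errno = EBADF;            /* No Pid entry, or an invalid value.  */
      return -1;
    }
  if (pid == 0)
    {
      errno = EREMOTE;          /* Assumed: separate PID namespace.  */
      return -1;
    }
  if (pid < 0)
    {
      errno = ESRCH;            /* Assumed: process already terminated.  */
      return -1;
    }
  return (pid_t) pid;
}
```

Keeping the parser separate from the procfs read makes the field
handling easy to exercise with plain strings, without needing a live
pidfd.
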
---
Changes from v6:
  - Rebased against master, adjusted symbol version and NEWS entry.
  - Added arm/mips clone3 implementation.

Changes from v5:
  - Added cgroupv2 support for posix_spawn, pidfd_spawn, and pidfd_fork.

Changes from v4:
  - Changed pidfd_fork signature to return a pid_t instead of a PID
    file descriptor.
  - Changed pidfd_getpid to return EBADF for negative input, instead of
    EINVAL.
  - Added PIDFDFORK_NOSIGCHLD option.
  - Fixed nested __BEGIN_DECLS in spawn.h.

Changes from v3:
  - Removed strtoul usage.
  - Fixed patchwork tst-pidfd_getpid.c regression.
  - Fixed manual and NEWS typos.

Changes from v2:
  - Added pidfd_fork and pidfd_getpid manual entries.
  - Changed pidfd_fork to act as fork by default, instead of as _Fork.
  - Changed PIDFD_FORK_RUNATFORK flag to PIDFDFORK_ASYNCSAFE.
  - Added pidfd_getpid test for EREMOTE.

Changes from v1:
  - Extended pidfd_getpid error codes to return EBADF if fdinfo does
    not have a Pid entry or if the value is invalid, EREMOTE if the pid
    is in a separate namespace, and ESRCH if the process has already
    terminated.
  - Extended tst-pidfd_getpid.
  - Renamed PIDFD_FORK_RUNATFORK to PIDFDFORK_RUNATFORK to avoid a
    clash with possible kernel extensions.

Adhemerval Zanella (8):
  arm: Add the clone3 wrapper
  mips: Add the clone3 wrapper
  linux: Undef __ASSUME_CLONE3 for alpha, ia64, nios2, sh, and sparc
  linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731)
  posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  posix: Add pidfd_fork (BZ 26371)
  posix: Add PIDFDFORK_NOSIGCHLD for pidfd_fork
  linux: Add pidfd_getpid

 NEWS | 22 +-
 bits/spawn_ext.h | 21 ++
 include/clone_internal.h | 21 ++
 manual/process.texi | 92 ++++++-
 posix/Makefile | 5 +-
 posix/fork-internal.c | 127 ++++++++++
 posix/fork-internal.h | 36 +++
 posix/fork.c | 107 +--------
 posix/spawn.h | 6 +-
 posix/spawn_int.h | 3 +-
 posix/spawnattr_setflags.c | 3 +-
 posix/tst-posix_spawn-setsid.c | 168 +++++++++----
 posix/tst-spawn-chdir.c | 15 +-
 posix/tst-spawn.c | 24 +-
 posix/tst-spawn.h | 36 +++
 posix/tst-spawn2.c | 17 +-
 posix/tst-spawn3.c | 100 ++++----
 posix/tst-spawn4.c | 7 +-
 posix/tst-spawn5.c | 14 +-
 posix/tst-spawn6.c | 15 +-
 posix/tst-spawn7.c | 13 +-
 sysdeps/nptl/_Fork.c | 2 +-
 sysdeps/unix/sysv/linux/Makefile | 29 +++
 sysdeps/unix/sysv/linux/Versions | 8 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist | 6 +
 .../unix/sysv/linux/alpha/kernel-features.h | 3 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/arc/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/arch-fork.h | 16 +-
 sysdeps/unix/sysv/linux/arm/be/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/arm/clone3.S | 80 ++++++
 sysdeps/unix/sysv/linux/arm/le/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/arm/sysdep.h | 1 +
 sysdeps/unix/sysv/linux/bits/spawn_ext.h | 71 ++++++
 sysdeps/unix/sysv/linux/clone-internal.c | 62 ++++-
 sysdeps/unix/sysv/linux/clone-pidfd-support.c | 58 +++++
 sysdeps/unix/sysv/linux/csky/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/i386/libc.abilist | 6 +
 .../unix/sysv/linux/ia64/kernel-features.h | 3 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist | 6 +
 .../sysv/linux/loongarch/lp64/libc.abilist | 6 +
 .../sysv/linux/m68k/coldfire/libc.abilist | 6 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist | 6 +
 .../sysv/linux/microblaze/be/libc.abilist | 6 +
 .../sysv/linux/microblaze/le/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/mips/clone3.S | 139 +++++++++++
 .../sysv/linux/mips/mips32/fpu/libc.abilist | 6 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist | 6 +
 .../sysv/linux/mips/mips64/n32/libc.abilist | 6 +
 .../sysv/linux/mips/mips64/n64/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/mips/sysdep.h | 2 +
 .../unix/sysv/linux/nios2/kernel-features.h | 23 ++
 sysdeps/unix/sysv/linux/nios2/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/pidfd_fork.c | 82 +++++++
 sysdeps/unix/sysv/linux/pidfd_getpid.c | 122 ++++++++++
 sysdeps/unix/sysv/linux/pidfd_spawn.c | 30 +++
 sysdeps/unix/sysv/linux/pidfd_spawnp.c | 30 +++
 .../linux/powerpc/powerpc32/fpu/libc.abilist | 6 +
 .../powerpc/powerpc32/nofpu/libc.abilist | 6 +
 .../linux/powerpc/powerpc64/be/libc.abilist | 6 +
 .../linux/powerpc/powerpc64/le/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/procutils.c | 104 ++++++++
 sysdeps/unix/sysv/linux/procutils.h | 35 +++
 .../unix/sysv/linux/riscv/rv32/libc.abilist | 6 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist | 6 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist | 6 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist | 6 +
 sysdeps/unix/sysv/linux/sh/kernel-features.h | 3 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist | 6 +
 .../unix/sysv/linux/sparc/kernel-features.h | 3 +
 .../sysv/linux/sparc/sparc32/libc.abilist | 6 +
 .../sysv/linux/sparc/sparc64/libc.abilist | 6 +
 .../unix/sysv/linux/spawnattr_getcgroup_np.c | 28 +++
 .../unix/sysv/linux/spawnattr_setcgroup_np.c | 27 +++
 sysdeps/unix/sysv/linux/spawni.c | 40 ++-
 sysdeps/unix/sysv/linux/sys/pidfd.h | 25 ++
 sysdeps/unix/sysv/linux/tst-pidfd.c | 47 ++++
 .../unix/sysv/linux/tst-pidfd_fork-cgroup.c | 162 +++++++++++++
 sysdeps/unix/sysv/linux/tst-pidfd_fork.c | 227 ++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c | 187 +++++++++++++++
 .../sysv/linux/tst-posix_spawn-setsid-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c | 216 +++++++++++++++++
 .../unix/sysv/linux/tst-spawn-chdir-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h | 63 +++++
 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c | 20 ++
 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c | 20 ++
 .../unix/sysv/linux/x86_64/64/libc.abilist | 6 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist | 6 +
 96 files changed, 2894 insertions(+), 270 deletions(-)
 create mode 100644 bits/spawn_ext.h
 create mode 100644 posix/fork-internal.c
 create mode 100644 posix/fork-internal.h
 create mode 100644 posix/tst-spawn.h
 create mode 100644 sysdeps/unix/sysv/linux/arm/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/bits/spawn_ext.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-pidfd-support.c
 create mode 100644 sysdeps/unix/sysv/linux/mips/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/nios2/kernel-features.h
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_fork.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawn.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawnp.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.h
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_fork-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_fork.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c