mbox series

[v9,0/6] Add pidfd and cgroupv2 support for process creation

Message ID 20230824164219.391263-1-adhemerval.zanella@linaro.org
Headers show
Series Add pidfd and cgroupv2 support for process creation | expand

Message

Adhemerval Zanella Netto Aug. 24, 2023, 4:42 p.m. UTC
The glibc 2.36 added wrappers for Linux syscall pidfd_open, pidfd_getfd,
and pidfd_send_signal, and exported the P_PIDFD to use along with
waitid. The pidfd is a race-free interface, however, the pidfd_open is
subject to TOCTOU if the file descriptor is not obtained directly from
the clone or clone3 syscall (there is still a small window between the
clone return and the pidfd_getfd where the process can be reaped and the
process ID reused).

A fully race-free interface with posix_spawn interface is being
discussed by GNOME [1] [2], and Qt already uses it in its QtProcess
implementation [3].  The Qt implementation has some pitfalls:

  - It calls clone through the syscall symbol, which does not run the
    pthread_atfork handlers even though it intends to use the clone
semantic for fork (by only using CLONE_PIDFD | SIGCHLD).

  - It also does not reset any internal state, such as internal IO,
    malloc, loader, etc. locks.

  - It does not set the TCB tid field nor the robust list, used by the
    pthread code.

  - It does not optimize process creation by using CLONE_VM and
    CLONE_VFORK.

Also, the recent Linux kernel (starting with 5.7) provides a way to
create a new process in a different cgroups version 2 than the default
one (through clone3 CLONE_INTO_CGROUP flag).  Providing it through glibc
interfaces makes it usable without the risk of potential breakage by
issuing clone3 syscall directly (check BZ#26371 discussion).

This patch set adds new interfaces that take care of these potential
issues.  The new posix_spawn / posix_spawnp extensions:

  #define POSIX_SPAWN_SETCGROUP 0x100

  int posix_spawnattr_getcgroup_np (const posix_spawnattr_t
				    restrict *attr, int *cgroup);
  int posix_spawnattr_setcgroup_np (posix_spawnattr_t *restrict attr,
                                    int cgroup);
  
Allow spawning a new process on a different cgroupv2.  

The pidfd_spawn and pidfd_spawnp is similar to posix_spawn and
posix_spawnp, but return a process file descriptor instead of a PID.

  int pidfd_spawn (int *restrict pidfd,
 		   const char *restrict file,
  		   const posix_spawn_file_actions_t *restrict facts,
  		   const posix_spawnattr_t *restrict attrp,
  		   char *const argv[restrict],
  		   char *const envp[restrict]);

  int pidfd_spawnp (int *restrict pidfd,
 		    const char *restrict path,
  		    const posix_spawn_file_actions_t *restrict facts,
  		    const posix_spawnattr_t *restrict attrp,
  		    char *const argv[restrict_arr],
  		    char *const envp[restrict_arr]);

The implementation makes sure that kernel must support the complete
pidfd interface, meaning that waitid (P_PIDFD) should be supported.  It
ensures that a non-racy workaround is required (such as reading procfs
fdinfo pid to use along with old wait interfaces).  If the kernel does
not have the required support the interface returns ENOSYS.

A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the argument
lifetime.

Both symbols reuse the posix_spawn posix_spawn_file_actions_t and
posix_spawnattr_t, to either avoid rehashing the posix_spawn API or add
a new one.  It also means that both interfaces support the same
attribute and file actions, and a new flag or file action on posix_spawn
is also added automatically for pidfd_spawn. It includes
POSIX_SPAWN_SETCGROUP.

To have a way to interop between process IDs and process file
descriptors, the pidfd_getpid is also provided:

   pid_t pidfd_getpid (int fd)

It reads the procfs fdinfo entry from the file descriptor to get the
process ID.

[1] https://gitlab.gnome.org/GNOME/glib/-/issues/1866
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=30349
[3] https://codebrowser.dev/qt6/qtbase/src/3rdparty/forkfd/forkfd_linux.c.html

---

Changes from v8:
- Removed fork_np.

Changes from v7:
- Redefine __ASSUME_CLONE3 to 0 if the architecture does not support the
  syscall.
- Fixed some failing errors to be reported by spawned processes.
- Fixed pre-commit CI for AArch64 failures.
- Rename pidfd_fork to fork_np and make the API extensible
- Document more possible pidfd_getpid errors.

Changes from v6:
- Rebased against master, adjusted symbol version and NEWS entry.
- Added arm/mips clone3 implementation.

Changes from v5:
- Added cgroupv2 support for posix_spawn, pidfd_spawn, and pidfd_fork.

Changes from v4:
- Changed pidfd_fork signature to return a pid_t instead of the PID file
  descriptor.
- Changed pidfd_getpid to return EBADF for negative input, instead of
  EINVAL.
- Added PIDFDFORK_NOSIGCHLD option.
- Fixed nested __BEGIN_DECLS on spawn.h

Changes from v3:
- Remove strtoul usage.
- Fixed patchwork tst-pidfd_getpid.c regression.
- Fixed manual and NEWS typos.

Changes from v2:
- Added pidfd_fork and pidfd_getpid manual entries
- Change pidfd_fork to act as fork as default, instead as _Fork.
- Changed PIDFD_FORK_RUNATFORK flag to PIDFDFORK_ASYNCSAFE.
- Added pidfd_getpid test for EREMOTE.

Changes from v1:
- Extended pidfd_getpid error codes to return EBADF if fdinfo does not
  have Pid entry or if the value is invalid, EREMOTE is pid is in a 
  separate namespace, and ESRCH if is already terminated.
- Extended tst-pidfd_getpid.
- Rename PIDFD_FORK_RUNATFORK to PIDFDFORK_RUNATFORK to avoid clashes
  with possible kernel extensions.

Adhemerval Zanella (6):
  arm: Add the clone3 wrapper
  mips: Add the clone3 wrapper
  linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and
    sparc
  linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731)
  posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
  linux: Add pidfd_getpid

 NEWS                                          |  17 ++
 bits/spawn_ext.h                              |  21 ++
 include/clone_internal.h                      |   4 +
 manual/process.texi                           |  52 +++-
 posix/Makefile                                |   2 +
 posix/spawn.h                                 |   6 +-
 posix/spawn_int.h                             |   3 +-
 posix/spawnattr_setflags.c                    |   3 +-
 posix/tst-posix_spawn-setsid.c                | 169 +++++++++----
 posix/tst-spawn-chdir.c                       |  15 +-
 posix/tst-spawn.c                             |  24 +-
 posix/tst-spawn.h                             |  36 +++
 posix/tst-spawn2.c                            |  17 +-
 posix/tst-spawn3.c                            |  99 ++++----
 posix/tst-spawn4.c                            |   7 +-
 posix/tst-spawn5.c                            |  14 +-
 posix/tst-spawn6.c                            |  13 +-
 posix/tst-spawn7.c                            |  13 +-
 sysdeps/unix/sysv/linux/Makefile              |  26 ++
 sysdeps/unix/sysv/linux/Versions              |   7 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   5 +
 .../unix/sysv/linux/alpha/kernel-features.h   |   4 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   5 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   5 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   5 +
 sysdeps/unix/sysv/linux/arm/clone3.S          |  80 +++++++
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   5 +
 sysdeps/unix/sysv/linux/arm/sysdep.h          |   1 +
 sysdeps/unix/sysv/linux/bits/spawn_ext.h      |  67 ++++++
 sysdeps/unix/sysv/linux/clone-pidfd-support.c |  60 +++++
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   5 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   5 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   5 +
 .../unix/sysv/linux/ia64/kernel-features.h    |   4 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   5 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   5 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   5 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   5 +
 .../sysv/linux/microblaze/be/libc.abilist     |   5 +
 .../sysv/linux/microblaze/le/libc.abilist     |   5 +
 sysdeps/unix/sysv/linux/mips/clone3.S         | 139 +++++++++++
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   5 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   5 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   5 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   5 +
 sysdeps/unix/sysv/linux/mips/sysdep.h         |   2 +
 .../unix/sysv/linux/nios2/kernel-features.h   |  24 ++
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   5 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   5 +
 sysdeps/unix/sysv/linux/pidfd_getpid.c        | 126 ++++++++++
 sysdeps/unix/sysv/linux/pidfd_spawn.c         |  30 +++
 sysdeps/unix/sysv/linux/pidfd_spawnp.c        |  30 +++
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   5 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   5 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   5 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   5 +
 sysdeps/unix/sysv/linux/procutils.c           |  97 ++++++++
 sysdeps/unix/sysv/linux/procutils.h           |  43 ++++
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   5 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   5 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   5 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   5 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   5 +
 sysdeps/unix/sysv/linux/sh/kernel-features.h  |   4 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   5 +
 .../unix/sysv/linux/sparc/kernel-features.h   |   4 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   5 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   5 +
 .../unix/sysv/linux/spawnattr_getcgroup_np.c  |  28 +++
 .../unix/sysv/linux/spawnattr_setcgroup_np.c  |  27 +++
 sysdeps/unix/sysv/linux/spawni.c              |  44 +++-
 sysdeps/unix/sysv/linux/sys/pidfd.h           |   4 +
 sysdeps/unix/sysv/linux/tst-pidfd.c           |  47 ++++
 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c    | 123 ++++++++++
 .../sysv/linux/tst-posix_spawn-setsid-pidfd.c |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c    | 223 ++++++++++++++++++
 .../unix/sysv/linux/tst-spawn-chdir-pidfd.c   |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c     |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h     |  63 +++++
 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c    |  20 ++
 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c    |  20 ++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   5 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   5 +
 87 files changed, 2018 insertions(+), 154 deletions(-)
 create mode 100644 bits/spawn_ext.h
 create mode 100644 posix/tst-spawn.h
 create mode 100644 sysdeps/unix/sysv/linux/arm/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/bits/spawn_ext.h
 create mode 100644 sysdeps/unix/sysv/linux/clone-pidfd-support.c
 create mode 100644 sysdeps/unix/sysv/linux/mips/clone3.S
 create mode 100644 sysdeps/unix/sysv/linux/nios2/kernel-features.h
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawn.c
 create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawnp.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.c
 create mode 100644 sysdeps/unix/sysv/linux/procutils.h
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c

Comments

Luca Boccassi Aug. 28, 2023, 12:42 p.m. UTC | #1
> The glibc 2.36 added wrappers for Linux syscall pidfd_open,
> pidfd_getfd,
> and pidfd_send_signal, and exported the P_PIDFD to use along with
> waitid. The pidfd is a race-free interface, however, the pidfd_open
> is
> subject to TOCTOU if the file descriptor is not obtained directly
> from
> the clone or clone3 syscall (there is still a small window between
> the
> clone return and the pidfd_getfd where the process can be reaped and
> the
> process ID reused).

Thanks, this looks great and will make use of all these new APIs from
systemd as soon as they are available, so for the series:

Acked-by: Luca Boccassi <bluca@debian.org>