Message ID | 20230706134508.422526-1-adhemerval.zanella@linaro.org |
---|---|
Headers | show |
Series | Add pidfd and cgroupv2 support for process creation (resend) | expand |
On Thu, 6 Jul 2023 at 14:45, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > > The glibc 2.36 added wrappers for Linux syscall pidfd_open, > pidfd_getfd, and pidfd_send_signal, and exported the P_PIDFD to use > along with waitid. The pidfd is a race free interface, however > the pidfd_open is subject to TOCTOU if the file descriptor > is not obtained directly from the clone or clone3 syscall (there is > still a small window between the clone return and the pidfd_getfd > where the process can be reaped and the process ID reused). > > A fully race free interface with posix_spawn interface is being > discussed by GNOME [1] [2], and Qt already uses on its QtProcess > implementation [3]. The Qt implementation has some pitfalls: > > - It calls clone through the syscall symbol, which does not run the > pthread_atfork handlers even though it really intends to use the > clone semantic for fork (by only using CLONE_PIDFD | SIGCHLD). > > - It also does not reset any internal state, such as internal IO, > malloc, loader, etc. locks. > > - It does not set the TCB tid field nor the robust list, used by > pthread code. > > - It does not optimize process creation by using CLONE_VM and > CLONE_VFORK. > > Also, recent Linux kernel (starting with 5.7) provide a way to > create a new process in a different cgroups version 2 than the > default one (through clone3 CLONE_INTO_CGROUP flag). Providing it > through glibc interfaces make is usable without the risk of potential > breakage by issuing clone3 syscall directly (check BZ#26371 discussion). > > This patchset adds new interfaces that take care of this potential > issues. The new posix_spawn / posix_spawnp extesions: > > > #define POSIX_SPAWN_SETCGROUP 0x100 > > int posix_spawnattr_getcgroup_np (const posix_spawnattr_t > restrict *attr, int *cgroup); > int posix_spawnattr_setcgroup_np (posix_spawnattr_t *restrict attr, > int cgroup); > > Allow spawn a new process on a different cgroupv2. > > The pidfd_spawn and pidfd_spawnp is similar to posix_spawn and > posix_spawnp, > but return a process file descriptor instead of a PID. > > int pidfd_spawn (int *restrict pidfd, > const char *restrict file, > const posix_spawn_file_actions_t *restrict facts, > const posix_spawnattr_t *restrict attrp, > char *const argv[restrict], > char *const envp[restrict]) > > int pidfd_spawnp (int *restrict pidfd, > const char *restrict path, > const posix_spawn_file_actions_t *restrict facts, > const posix_spawnattr_t *restrict attrp, > char *const argv[restrict_arr], > char *const envp[restrict_arr]); > > The implementation makes sure that kernel must support the complete > pidfd interface, meaning that waitid (P_PIDFD) should be supported. It > ensure that non racy workaround is required (such as reading procfs > fdinfo pid to use along with old wait interfaces). If kernel does not > have the required support the interface returns ENOSYS. > > A new symbol is used instead of a posix_spawn extension to avoid > possible issue with language bindings that might track the argument > lifetime. > > Both symbols reuse the posix_spawn posix_spawn_file_actions_t and > posix_spawnattr_t, to either avoid rehash posix_spawn API or add a new > one. It also mean that both interfaces support the same attribute and > file actions, and a new flag or file actions on posix_spawn is also > added automatically for pidfd_spawn. It includes POSIX_SPAWN_SETCGROUP. > > Along with the spawn interface, a fork like one is also provided: > > pid_t pidfd_fork (int *pidfd, int cgroup, unsigned int flags) > > If PIDFD is set to NULL, no file descriptor is returned and pidfd_fork > acts as fork. Otherwise, a new file descriptor is returned and the > kernel already sets O_CLOEXEC as default. The pidfd_fork follows > fork/_Fork convention on returning a positive or negative value to the > parent (with negative indicating an error) and zero to the child. > > If cgroup is 0 or positive value, it is interpreted as a different > cgroup to be place the new process (check CLONE_INTO_CGROUP clone > flag). > > The kernel already sets O_CLOEXEC as default and it follows fork/_Fork > convention on returning a positive or negative value to the parent > (with negative indicating an error) and zero to the child. > > Similar to fork, pidfd_fork also runs the pthread_atfork handlers > It can be change by using PIDFDFORK_ASYNCSAFE flag, which make > pidfd_fork acts a _Fork. It also send SIGCHLD to parent when > process terminates. > > To have a way to interop between process IDs and process file > descriptors, the pidfd_getpid is also provided: > > pid_t pidfd_getpid (int fd) > > It reads the procfs fdinfo entry from the file descriptor to get > the process ID. Hi, any update on this series? Kind regards, Luca Boccassi
On 08/08/23 16:35, Luca Boccassi wrote: > On Thu, 6 Jul 2023 at 14:45, Adhemerval Zanella > <adhemerval.zanella@linaro.org> wrote: >> >> The glibc 2.36 added wrappers for Linux syscall pidfd_open, >> pidfd_getfd, and pidfd_send_signal, and exported the P_PIDFD to use >> along with waitid. The pidfd is a race free interface, however >> the pidfd_open is subject to TOCTOU if the file descriptor >> is not obtained directly from the clone or clone3 syscall (there is >> still a small window between the clone return and the pidfd_getfd >> where the process can be reaped and the process ID reused). >> >> A fully race free interface with posix_spawn interface is being >> discussed by GNOME [1] [2], and Qt already uses on its QtProcess >> implementation [3]. The Qt implementation has some pitfalls: >> >> - It calls clone through the syscall symbol, which does not run the >> pthread_atfork handlers even though it really intends to use the >> clone semantic for fork (by only using CLONE_PIDFD | SIGCHLD). >> >> - It also does not reset any internal state, such as internal IO, >> malloc, loader, etc. locks. >> >> - It does not set the TCB tid field nor the robust list, used by >> pthread code. >> >> - It does not optimize process creation by using CLONE_VM and >> CLONE_VFORK. >> >> Also, recent Linux kernel (starting with 5.7) provide a way to >> create a new process in a different cgroups version 2 than the >> default one (through clone3 CLONE_INTO_CGROUP flag). Providing it >> through glibc interfaces make is usable without the risk of potential >> breakage by issuing clone3 syscall directly (check BZ#26371 discussion). >> >> This patchset adds new interfaces that take care of this potential >> issues. The new posix_spawn / posix_spawnp extesions: >> >> >> #define POSIX_SPAWN_SETCGROUP 0x100 >> >> int posix_spawnattr_getcgroup_np (const posix_spawnattr_t >> restrict *attr, int *cgroup); >> int posix_spawnattr_setcgroup_np (posix_spawnattr_t *restrict attr, >> int cgroup); >> >> Allow spawn a new process on a different cgroupv2. >> >> The pidfd_spawn and pidfd_spawnp is similar to posix_spawn and >> posix_spawnp, >> but return a process file descriptor instead of a PID. >> >> int pidfd_spawn (int *restrict pidfd, >> const char *restrict file, >> const posix_spawn_file_actions_t *restrict facts, >> const posix_spawnattr_t *restrict attrp, >> char *const argv[restrict], >> char *const envp[restrict]) >> >> int pidfd_spawnp (int *restrict pidfd, >> const char *restrict path, >> const posix_spawn_file_actions_t *restrict facts, >> const posix_spawnattr_t *restrict attrp, >> char *const argv[restrict_arr], >> char *const envp[restrict_arr]); >> >> The implementation makes sure that kernel must support the complete >> pidfd interface, meaning that waitid (P_PIDFD) should be supported. It >> ensure that non racy workaround is required (such as reading procfs >> fdinfo pid to use along with old wait interfaces). If kernel does not >> have the required support the interface returns ENOSYS. >> >> A new symbol is used instead of a posix_spawn extension to avoid >> possible issue with language bindings that might track the argument >> lifetime. >> >> Both symbols reuse the posix_spawn posix_spawn_file_actions_t and >> posix_spawnattr_t, to either avoid rehash posix_spawn API or add a new >> one. It also mean that both interfaces support the same attribute and >> file actions, and a new flag or file actions on posix_spawn is also >> added automatically for pidfd_spawn. It includes POSIX_SPAWN_SETCGROUP. >> >> Along with the spawn interface, a fork like one is also provided: >> >> pid_t pidfd_fork (int *pidfd, int cgroup, unsigned int flags) >> >> If PIDFD is set to NULL, no file descriptor is returned and pidfd_fork >> acts as fork. Otherwise, a new file descriptor is returned and the >> kernel already sets O_CLOEXEC as default. The pidfd_fork follows >> fork/_Fork convention on returning a positive or negative value to the >> parent (with negative indicating an error) and zero to the child. >> >> If cgroup is 0 or positive value, it is interpreted as a different >> cgroup to be place the new process (check CLONE_INTO_CGROUP clone >> flag). >> >> The kernel already sets O_CLOEXEC as default and it follows fork/_Fork >> convention on returning a positive or negative value to the parent >> (with negative indicating an error) and zero to the child. >> >> Similar to fork, pidfd_fork also runs the pthread_atfork handlers >> It can be change by using PIDFDFORK_ASYNCSAFE flag, which make >> pidfd_fork acts a _Fork. It also send SIGCHLD to parent when >> process terminates. >> >> To have a way to interop between process IDs and process file >> descriptors, the pidfd_getpid is also provided: >> >> pid_t pidfd_getpid (int fd) >> >> It reads the procfs fdinfo entry from the file descriptor to get >> the process ID. > > Hi, any update on this series? Florian said on last weekly call that he intends to check this in following weeks.