Message ID | 20230818140642.1623571-1-adhemerval.zanella@linaro.org |
---|---|
Headers | show |
Series | Add pidfd and cgroupv2 support for process creation | expand |
On Fri, Aug 18, 2023 at 11:06:35AM -0300, Adhemerval Zanella via Libc-alpha wrote: > The glibc 2.36 added wrappers for Linux syscall pidfd_open, pidfd_getfd, > and pidfd_send_signal, and exported the P_PIDFD to use along with > waitid. The pidfd is a race-free interface, however, the pidfd_open is > subject to TOCTOU if the file descriptor is not obtained directly from > the clone or clone3 syscall (there is still a small window between the > clone return and the pidfd_getfd where the process can be reaped and the > process ID reused). Unless I'm missing something, that window is purely programmer error. The pid belongs to the parent process, that called fork, posix_spawn, clone, or whatever, and is responsible for not freeing it until it's done using it. Yes this can happen if you install a SIGCHLD handler that reaps anything it sees, or if you're calling wait without a pid. This is programming error. If you're stuck with code outside your control that makes that mistake, you can already avoid it with clone by setting the child exit signal to 0 rather than SIGCHLD. But it's best just not to do that. IMO making a new, complex, highly nonstandard interface to work around a problem that's programmer error, and getting this nonstandard and nonportable pattern into mainstream software, has negative value. Rich
On 18/08/23 14:51, Rich Felker wrote: > On Fri, Aug 18, 2023 at 11:06:35AM -0300, Adhemerval Zanella via Libc-alpha wrote: >> The glibc 2.36 added wrappers for Linux syscall pidfd_open, pidfd_getfd, >> and pidfd_send_signal, and exported the P_PIDFD to use along with >> waitid. The pidfd is a race-free interface, however, the pidfd_open is >> subject to TOCTOU if the file descriptor is not obtained directly from >> the clone or clone3 syscall (there is still a small window between the >> clone return and the pidfd_getfd where the process can be reaped and the >> process ID reused). > > Unless I'm missing something, that window is purely programmer error. > The pid belongs to the parent process, that called fork, posix_spawn, > clone, or whatever, and is responsible for not freeing it until it's > done using it. > > Yes this can happen if you install a SIGCHLD handler that reaps > anything it sees, or if you're calling wait without a pid. This is > programming error. If you're stuck with code outside your control that > makes that mistake, you can already avoid it with clone by setting the > child exit signal to 0 rather than SIGCHLD. But it's best just not to > do that. > Yes, this is the issue GNOME is having with their code base [1] and that motivated this new interface. Systemd also seems to be interested in these interface, although I am not sure if it is also subject to same issue. I don't have a strong opinion whether this should be considered a solid reason to provide a new API, another option would to close BZ#30349 [2] as wontfix with this rationale. However, this does not really provide an workaround, and worse it will pass the idea that to fully resolve it you will need either to allow the racy condition or issue clone directly. > IMO making a new, complex, highly nonstandard interface to work around > a problem that's programmer error, and getting this nonstandard and > nonportable pattern into mainstream software, has negative value. Although I see this makes sense for the pidfd_spawn, there is still the BZ 26371 requirement for cgroupv2 support. This would required at least something analogous to posix_spawnattr_{get,set}cgroup_np. I am not sure about if my fork_np suggestion is really required here, although the idea is to allow a more extensible fork interface for possible future clone support. Florian and Lennart seems to lean for a clone3 similar to the clone interface, which I really would like to avoid. [1] https://gitlab.gnome.org/GNOME/glib/-/issues/1866 [2] https://sourceware.org/bugzilla/show_bug.cgi?id=30349
* Rich Felker: > On Fri, Aug 18, 2023 at 11:06:35AM -0300, Adhemerval Zanella via Libc-alpha wrote: >> The glibc 2.36 added wrappers for Linux syscall pidfd_open, pidfd_getfd, >> and pidfd_send_signal, and exported the P_PIDFD to use along with >> waitid. The pidfd is a race-free interface, however, the pidfd_open is >> subject to TOCTOU if the file descriptor is not obtained directly from >> the clone or clone3 syscall (there is still a small window between the >> clone return and the pidfd_getfd where the process can be reaped and the >> process ID reused). > > Unless I'm missing something, that window is purely programmer error. > The pid belongs to the parent process, that called fork, posix_spawn, > clone, or whatever, and is responsible for not freeing it until it's > done using it. > > Yes this can happen if you install a SIGCHLD handler that reaps > anything it sees, or if you're calling wait without a pid. This is > programming error. If you're stuck with code outside your control that > makes that mistake, you can already avoid it with clone by setting the > child exit signal to 0 rather than SIGCHLD. But it's best just not to > do that. I think clone3 with exit_signal set to 0 and CLONE_PIDFD allows the creation of subprocesses that are difficult to observe by accident from the rest of the process, while obtaining a stable identifier for the process. I do not think there is any other way to achieve that. I think it's desirable to expose this functionality in some way. Thanks, Florian
On Mon, Aug 21, 2023 at 08:53:53AM +0200, Florian Weimer wrote: > * Rich Felker: > > > On Fri, Aug 18, 2023 at 11:06:35AM -0300, Adhemerval Zanella via Libc-alpha wrote: > >> The glibc 2.36 added wrappers for Linux syscall pidfd_open, pidfd_getfd, > >> and pidfd_send_signal, and exported the P_PIDFD to use along with > >> waitid. The pidfd is a race-free interface, however, the pidfd_open is > >> subject to TOCTOU if the file descriptor is not obtained directly from > >> the clone or clone3 syscall (there is still a small window between the > >> clone return and the pidfd_getfd where the process can be reaped and the > >> process ID reused). > > > > Unless I'm missing something, that window is purely programmer error. > > The pid belongs to the parent process, that called fork, posix_spawn, > > clone, or whatever, and is responsible for not freeing it until it's > > done using it. > > > > Yes this can happen if you install a SIGCHLD handler that reaps > > anything it sees, or if you're calling wait without a pid. This is > > programming error. If you're stuck with code outside your control that > > makes that mistake, you can already avoid it with clone by setting the > > child exit signal to 0 rather than SIGCHLD. But it's best just not to > > do that. > > I think clone3 with exit_signal set to 0 and CLONE_PIDFD allows the > creation of subprocesses that are difficult to observe by accident from > the rest of the process, while obtaining a stable identifier for the > process. I do not think there is any other way to achieve that. I > think it's desirable to expose this functionality in some way. Indeed that seems like useful functionality to expose for cases where you can't fix some bad code, but there are lots of issues with how clone3 (and even clone) should behave with respect to the child environment when you don't exec -- is it _Fork-like or fork-like? Can you use AS-unsafe interfaces in the child of a MT parent? Etc. This should all be discussed on libc-coord not libc-alpha, IMO. Independent of that, though, I'd like to focus on the fact that randomly reaping children with no regard to what part of your process owned those pids has been wrong and broken practice for something like 4 decades. Just like looping over a range of fds and closing everything is broken. It's a type of UAF bug and it's not a pattern we suddenly need to support because systemd or whoever says so. It's a bug that's easily detected with static analysis (any calls to wait or to waitpid/waitid without the pid specified, or SIGCHLD handlers) and usually easy to fix -- and then the fix works on every POSIX-ish platform that's existed, pretty much ever, not just on GNU/Linux. Rich
* Rich Felker: > On Mon, Aug 21, 2023 at 08:53:53AM +0200, Florian Weimer wrote: >> I think clone3 with exit_signal set to 0 and CLONE_PIDFD allows the >> creation of subprocesses that are difficult to observe by accident from >> the rest of the process, while obtaining a stable identifier for the >> process. I do not think there is any other way to achieve that. I >> think it's desirable to expose this functionality in some way. > > Indeed that seems like useful functionality to expose for cases where > you can't fix some bad code, but there are lots of issues with how It's more about providing a confirming implementation of function in the wait family in case the implementation needs to create processes for some reason. It's not about “bad code”, these are standard POSIX interfaces. > clone3 (and even clone) should behave with respect to the child > environment when you don't exec -- is it _Fork-like or fork-like? Can > you use AS-unsafe interfaces in the child of a MT parent? Etc. This > should all be discussed on libc-coord not libc-alpha, IMO. Seems more like linux-api material, given that it's Linux-specific? Thanks, Florian
On Thu, Aug 24, 2023 at 09:25:05AM +0200, Florian Weimer wrote: > * Rich Felker: > > > On Mon, Aug 21, 2023 at 08:53:53AM +0200, Florian Weimer wrote: > >> I think clone3 with exit_signal set to 0 and CLONE_PIDFD allows the > >> creation of subprocesses that are difficult to observe by accident from > >> the rest of the process, while obtaining a stable identifier for the > >> process. I do not think there is any other way to achieve that. I > >> think it's desirable to expose this functionality in some way. > > > > Indeed that seems like useful functionality to expose for cases where > > you can't fix some bad code, but there are lots of issues with how > > It's more about providing a confirming implementation of function in the > wait family in case the implementation needs to create processes for > some reason. It's not about “bad code”, these are standard POSIX > interfaces. You're talking about the implementation (exit_signal=0 is very useful for this) and I thought we were talking about interfaces to expose to the application (where yes calling wait is a standard interface you can use, but the standard consequence of "bad code" doing that when another part of the program might have pids it doesn't want reaped is that things blow up and you get to keep both pieces). > > clone3 (and even clone) should behave with respect to the child > > environment when you don't exec -- is it _Fork-like or fork-like? Can > > you use AS-unsafe interfaces in the child of a MT parent? Etc. This > > should all be discussed on libc-coord not libc-alpha, IMO. > > Seems more like linux-api material, given that it's Linux-specific? Perhaps. Rich
> On 18/08/23 14:51, Rich Felker wrote: > > On Fri, Aug 18, 2023 at 11:06:35AM -0300, Adhemerval Zanella via > Libc-alpha wrote: > >> The glibc 2.36 added wrappers for Linux syscall pidfd_open, > pidfd_getfd, > >> and pidfd_send_signal, and exported the P_PIDFD to use along with > >> waitid. The pidfd is a race-free interface, however, the > pidfd_open is > >> subject to TOCTOU if the file descriptor is not obtained directly > from > >> the clone or clone3 syscall (there is still a small window between > the > >> clone return and the pidfd_getfd where the process can be reaped > and the > >> process ID reused). > > > > Unless I'm missing something, that window is purely programmer > error. > > The pid belongs to the parent process, that called fork, > posix_spawn, > > clone, or whatever, and is responsible for not freeing it until > it's > > done using it. > > > > Yes this can happen if you install a SIGCHLD handler that reaps > > anything it sees, or if you're calling wait without a pid. This is > > programming error. If you're stuck with code outside your control > that > > makes that mistake, you can already avoid it with clone by setting > the > > child exit signal to 0 rather than SIGCHLD. But it's best just not > to > > do that. > > > > Yes, this is the issue GNOME is having with their code base [1] and > that > motivated this new interface. Systemd also seems to be interested in > these interface, although I am not sure if it is also subject to same > issue. > > I don't have a strong opinion whether this should be considered a > solid > reason to provide a new API, another option would to close BZ#30349 > [2] > as wontfix with this rationale. However, this does not really > provide > an workaround, and worse it will pass the idea that to fully resolve > it > you will need either to allow the racy condition or issue clone > directly. These are real race conditions, that cannot be solved otherwise, characterizing them as 'programming errors' is very misleading and wrong. We very much need both of those interfaces in systemd, and fully intend to use them as soon as they are available. We are slowly moving towards using pidfds everywhere to be able to do end-to-end race-free process tracking and management, and these are fundamental pieces for this effort. From what I can read the GNOME developers feel the same way, and I wouldn't be surprised if QT followed suit too given what you mentioned in the cover letter. Surely implementing useful, core functionality for the direct and immediate benefit of 3 major Linux projects is a reason as solid as you could ever find to add a new interface.
* Luca Boccassi: > These are real race conditions, that cannot be solved otherwise, > characterizing them as 'programming errors' is very misleading and > wrong. > > We very much need both of those interfaces in systemd, and fully intend > to use them as soon as they are available. We are slowly moving towards > using pidfds everywhere to be able to do end-to-end race-free process > tracking and management, and these are fundamental pieces for this > effort. From what I can read the GNOME developers feel the same way, > and I wouldn't be surprised if QT followed suit too given what you > mentioned in the cover letter. > > Surely implementing useful, core functionality for the direct and > immediate benefit of 3 major Linux projects is a reason as solid as you > could ever find to add a new interface. I see value in adding fork support, too. The fundamental issue with the fork part is that it's not future-proof at all. The programming model is completely different from the kernel interface. If I start a discussion about API alternatives, who should I Cc: except you and Rich? Thanks, Florian
On Mon, 28 Aug 2023 at 14:21, Florian Weimer <fweimer@redhat.com> wrote: > > * Luca Boccassi: > > > These are real race conditions, that cannot be solved otherwise, > > characterizing them as 'programming errors' is very misleading and > > wrong. > > > > We very much need both of those interfaces in systemd, and fully intend > > to use them as soon as they are available. We are slowly moving towards > > using pidfds everywhere to be able to do end-to-end race-free process > > tracking and management, and these are fundamental pieces for this > > effort. From what I can read the GNOME developers feel the same way, > > and I wouldn't be surprised if QT followed suit too given what you > > mentioned in the cover letter. > > > > Surely implementing useful, core functionality for the direct and > > immediate benefit of 3 major Linux projects is a reason as solid as you > > could ever find to add a new interface. > > I see value in adding fork support, too. > > The fundamental issue with the fork part is that it's not future-proof > at all. The programming model is completely different from the kernel > interface. Sure that would be nice too, it is less urgent for us as we are moving towards spawn(), so it seems fine if that is handled separately from this series for me. > If I start a discussion about API alternatives, who should I Cc: except > you and Rich? Nobody else for now on my side, thanks.