diff mbox series

[2/2] nptl: Fix stray process left by tst-cancel7 blocking testing

Message ID alpine.DEB.2.21.2408051403450.61955@angie.orcam.me.uk
State New
Headers show
Series nptl: Fix termination issues with tst-cancel7 | expand

Commit Message

Maciej W. Rozycki Aug. 5, 2024, 2:22 p.m. UTC
From: Maciej W. Rozycki <macro@redhat.com>

Fix an issue with commit b74121ae4bc5 ("Update.") and prevent a stray 
process from being left behind by tst-cancel7 (and also tst-cancelx7, 
which is the same test built with '-fexceptions' additionally supplied 
to the compiler), which then blocks remote testing until the process has 
been killed by hand.

This test case creates a thread that runs an extra copy of the test via 
system(3) and using the '--direct' option so that the test wrapper does 
not interfere with this instance.  This extra copy executes its business 
and calls sigsuspend(2) and then never terminates by itself.  Instead it 
relies on being killed by the main test process directly via a thread 
cancellation request or, should that fail, by issuing SIGKILL either at 
the conclusion of 'do_test' or by the test driver via 'do_cleanup' where 
the test timeout has been hit or the test driver interrupted.

However if the main test process has been instead killed by a signal, 
such as due to incorrect execution, before it had a chance to kill the 
extra copy of the test case, then the test wrapper will terminate 
without running 'do_cleanup' and consequently the extra copy of the test 
case will remain forever in its suspended state, and in the remote case 
in particular it means that the remote test wrapper will wait forever 
for the SSH command to complete.

This has been observed with the 'alpha-linux-gnu' target, where the main 
test process triggers SIGSEGV and the test wrapper correctly records:

Didn't expect signal from child: got `Segmentation fault'

in nptl/tst-cancel7.out and terminates, but then the calling SSH command
continues waiting for the remaining process started in the same session
on the remote target to complete.

Address this problem by also registering 'do_cleanup' via atexit(3), 
observing that 'support_delete_temp_files' is registered by the test 
wrapper before the test initializing function 'do_prepare' is called and 
that we call all the functions registered in the reverse of the order in 
which they were registered, so it is safe to refer to 'pidfilename' in 
'do_cleanup' invoked by exit(3) because by that time temporary files 
have not yet been deleted.

A minor inconvenience is that if 'signal_handler' is invoked in the test 
wrapper as a result of SIGALRM rather than SIGINT, then 'do_cleanup' 
will be called twice, once as a cleanup handler and again by exit(3).  
In reality it is harmless though, because issuing SIGKILL is guarded by 
a record lock, so if the first call has succeeded in killing the extra 
copy of the test case, then the subsequent call will do nothing.
---
 nptl/tst-cancel7.c |    4 ++++
 1 file changed, 4 insertions(+)

glibc-tst-cancel7-cleanup.diff

Comments

Adhemerval Zanella Netto Aug. 5, 2024, 5:44 p.m. UTC | #1
On 05/08/24 11:22, Maciej W. Rozycki wrote:
> From: Maciej W. Rozycki <macro@redhat.com>
> 
> Fix an issue with commit b74121ae4bc5 ("Update.") and prevent a stray 
> process from being left behind by tst-cancel7 (and also tst-cancelx7, 
> which is the same test built with '-fexceptions' additionally supplied 
> to the compiler), which then blocks remote testing until the process has 
> been killed by hand.
> 
> This test case creates a thread that runs an extra copy of the test via 
> system(3) and using the '--direct' option so that the test wrapper does 
> not interfere with this instance.  This extra copy executes its business 
> and calls sigsuspend(2) and then never terminates by itself.  Instead it 
> relies on being killed by the main test process directly via a thread 
> cancellation request or, should that fail, by issuing SIGKILL either at 
> the conclusion of 'do_test' or by the test driver via 'do_cleanup' where 
> the test timeout has been hit or the test driver interrupted.
> 
> However if the main test process has been instead killed by a signal, 
> such as due to incorrect execution, before it had a chance to kill the 
> extra copy of the test case, then the test wrapper will terminate 
> without running 'do_cleanup' and consequently the extra copy of the test 
> case will remain forever in its suspended state, and in the remote case 
> in particular it means that the remote test wrapper will wait forever 
> for the SSH command to complete.
> 
> This has been observed with the 'alpha-linux-gnu' target, where the main 
> test process triggers SIGSEGV and the test wrapper correctly records:
> 
> Didn't expect signal from child: got `Segmentation fault'
> 
> in nptl/tst-cancel7.out and terminates, but then the calling SSH command
> continues waiting for the remaining process started in the same session
> on the remote target to complete.
> 
> Address this problem by also registering 'do_cleanup' via atexit(3), 
> observing that 'support_delete_temp_files' is registered by the test 
> wrapper before the test initializing function 'do_prepare' is called and 
> that we call all the functions registered in the reverse of the order in 
> which they were registered, so it is safe to refer to 'pidfilename' in 
> 'do_cleanup' invoked by exit(3) because by that time temporary files 
> have not yet been deleted.
> 
> A minor inconvenience is that if 'signal_handler' is invoked in the test 
> wrapper as a result of SIGALRM rather than SIGINT, then 'do_cleanup' 
> will be called twice, once as a cleanup handler and again by exit(3).  
> In reality it is harmless though, because issuing SIGKILL is guarded by 
> a record lock, so if the first call has succeeded in killing the extra 
> copy of the test case, then the subsequent call will do nothing.


LGTM, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
>  nptl/tst-cancel7.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> glibc-tst-cancel7-cleanup.diff
> Index: glibc/nptl/tst-cancel7.c
> ===================================================================
> --- glibc.orig/nptl/tst-cancel7.c
> +++ glibc/nptl/tst-cancel7.c
> @@ -38,6 +38,8 @@ static char *semfilename;
>  
>  static sem_t *sem;
>  
> +static void do_cleanup (void);
> +
>  static void *
>  tf (void *arg)
>  {
> @@ -108,6 +110,8 @@ do_prepare (int argc, char *argv[])
>  
>    xwrite (fd, " ", 1);
>    xclose (fd);
> +
> +  atexit (do_cleanup);
>  }
>  
>
Maciej W. Rozycki Aug. 7, 2024, 6:49 p.m. UTC | #2
On Mon, 5 Aug 2024, Adhemerval Zanella Netto wrote:

> LGTM, thanks.
> 
> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

 I have pushed both changes now, thank you for your review.

  Maciej
diff mbox series

Patch

Index: glibc/nptl/tst-cancel7.c
===================================================================
--- glibc.orig/nptl/tst-cancel7.c
+++ glibc/nptl/tst-cancel7.c
@@ -38,6 +38,8 @@  static char *semfilename;
 
 static sem_t *sem;
 
+static void do_cleanup (void);
+
 static void *
 tf (void *arg)
 {
@@ -108,6 +110,8 @@  do_prepare (int argc, char *argv[])
 
   xwrite (fd, " ", 1);
   xclose (fd);
+
+  atexit (do_cleanup);
 }