Message ID | 20230224161528.997-1-adhemerval.zanella@linaro.org |
---|---|
State | New |
Headers | show |
Series | [v3] i386: Use pthread_barrier for synchronization on tst-bz21269 | expand |
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes: > static void * > threadproc (void *ctx) > { > - while (1) > + for (int i = 0; i < NITER; i++) Ok. > - futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0); > - while (atomic_load (&ftx) != 2) > - { > - if (atomic_load (&ftx) >= 3) > - return NULL; > - } > + xpthread_barrier_wait (&barrier); - parent up to this point has set up the LDT block 1 - thread waits for parent; > /* clear LDT entry 0. */ > const struct user_desc desc = { 0 }; > xmodify_ldt (1, &desc, sizeof (desc)); > > - /* If ftx == 2, set it to zero, If ftx == 100, quit. */ > - if (atomic_fetch_add (&ftx, -2) != 2) > - return NULL; > + /* Wait for 'ss' set in main thread. */ > + xpthread_barrier_wait (&barrier); block 2 - child sets ldt, parent does nothing - post, parent sets SS. This still isn't doing what the original test case was doing. The original code did this: Parent changes LDT and sets SS - force a task switch, which should segfault - sigaction's handler either works, or fails <-- important part - child resets LDT Do you have a version of libc newer than the patch in BZ#21269 that you can test the modified test against, to make sure it still detects the failing case? Thinking about it, I suspect this change shouldn't be done: > - TEST_VERIFY_EXIT (sigaction (sig, &sa, 0) == 0); > + xsigaction (sig, &sa, 0); Although they do the same things, since it's sigaction we're actually testing here, hiding it in an xfunction isn't appropriate.
On 27/02/23 19:21, DJ Delorie wrote: > Adhemerval Zanella <adhemerval.zanella@linaro.org> writes: > >> static void * >> threadproc (void *ctx) >> { >> - while (1) >> + for (int i = 0; i < NITER; i++) > > Ok. > >> - futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0); >> - while (atomic_load (&ftx) != 2) >> - { >> - if (atomic_load (&ftx) >= 3) >> - return NULL; >> - } >> + xpthread_barrier_wait (&barrier); > > - parent up to this point has set up the LDT > block 1 - thread waits for parent; > >> /* clear LDT entry 0. */ >> const struct user_desc desc = { 0 }; >> xmodify_ldt (1, &desc, sizeof (desc)); >> >> - /* If ftx == 2, set it to zero, If ftx == 100, quit. */ >> - if (atomic_fetch_add (&ftx, -2) != 2) >> - return NULL; >> + /* Wait for 'ss' set in main thread. */ >> + xpthread_barrier_wait (&barrier); > > block 2 - child sets ldt, parent does nothing > > - post, parent sets SS. This still isn't doing what the original test > case was doing. > > The original code did this: > > > Parent changes LDT and sets SS > - force a task switch, which should segfault > - sigaction's handler either works, or fails <-- important part > - child resets LDT > > Do you have a version of libc newer than the patch in BZ#21269 that you > can test the modified test against, to make sure it still detects the > failing case? What I did was to actually revert the original fix and check if the test in fact trigger the issue: diff --git a/sysdeps/unix/sysv/linux/i386/libc_sigaction.c b/sysdeps/unix/sysv/linux/i386/libc_sigaction.c index 0665b41bbc..b2454d8d55 100644 --- a/sysdeps/unix/sysv/linux/i386/libc_sigaction.c +++ b/sysdeps/unix/sysv/linux/i386/libc_sigaction.c @@ -33,7 +33,7 @@ extern void restore (void) asm ("__restore") attribute_hidden; ? &restore_rt : &restore); \ } \ else \ - (kact)->sa_restorer = NULL; \ + (kact)->sa_restorer = (void*)0xdeadbeef; \ }) #define RESET_SA_RESTORER(act, kact) \ And it does not, because as you pointed out the barrier are not fully correct. So I circle back and I think the v2 is actually correctly, using a bogus sa_restore does trigger a failure and the test succeeds with current code. > > Thinking about it, I suspect this change shouldn't be done: > >> - TEST_VERIFY_EXIT (sigaction (sig, &sa, 0) == 0); >> + xsigaction (sig, &sa, 0); > > Although they do the same things, since it's sigaction we're actually > testing here, hiding it in an xfunction isn't appropriate. > Alright.
So I was able to reproduce the hangs in the original source, and debug it, and fix it. In doing so, I realized that we can't use anything complex to trigger the thread because that "anything" might also cause the expected segfault and force everything out of sync again. Here's what I ended up with, and it doesn't seem to hang where the original one hung quite often (in a tight while..end loop). The key changes are: 1. Calls to futex are error checked, with retries, to ensure that the futexes are actually doing what they're supposed to be doing. In the original code, nearly every futex call returned an error. 2. The main loop has checks for whether the thread ran or not, and "unlocks" the thread if it didn't (this is how the original source hangs). diff --git a/sysdeps/unix/sysv/linux/i386/tst-bz21269.c b/sysdeps/unix/sysv/linux/i386/tst-bz21269.c index c95bfcb917..51d4a1b082 100644 --- a/sysdeps/unix/sysv/linux/i386/tst-bz21269.c +++ b/sysdeps/unix/sysv/linux/i386/tst-bz21269.c @@ -1,5 +1,5 @@ /* Test for i386 sigaction sa_restorer handling (BZ#21269) - Copyright (C) 2017-2022 Free Software Foundation, Inc. + Copyright (C) 2017-2023 Free Software Foundation, Inc. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or @@ -135,7 +135,14 @@ threadproc (void *ctx) { while (1) { - futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0); + /* Continue to wait here until we've successfully waited, unless + we're supposed to be clearing the LDT already. */ + while (futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0) < 0) + if (atomic_load (&ftx) >= 2) + break; + + /* Normally there's time to hit this busy loop and wait for ftx + to be set to 2. */ while (atomic_load (&ftx) != 2) { if (atomic_load (&ftx) >= 3) @@ -189,7 +196,14 @@ do_test (void) if (sigsetjmp (jmpbuf, 1) != 0) continue; - /* Make sure the thread is ready after the last test. */ + /* We may have longjmp'd before triggering the thread. If so, + trigger the thread now and wait for it. */ + if (atomic_load (&ftx) == 1) + atomic_store (&ftx, 2); + + /* Make sure the thread is ready after the last test. FTX is + initially zero for the first loop, and set to zero each time + the thread clears the LDT. */ while (atomic_load (&ftx) != 0) ; @@ -207,15 +221,24 @@ do_test (void) xmodify_ldt (0x11, &desc, sizeof (desc)); - /* Arm the thread. */ - ftx = 1; - futex ((int*) &ftx, FUTEX_WAKE, 0, NULL, NULL, 0); + /* Arm the thread. We loop here until we've woken up one thread. */ + atomic_store (&ftx, 1); + while (futex ((int*) &ftx, FUTEX_WAKE, 1, NULL, NULL, 0) < 1) + ; + + /* Give the thread a chance to get into it's busy loop. */ + usleep (5); + /* At *ANY* point after this instruction, we may segfault and + longjump back to the top of the loop. The intention is to + have this happen when the thread clears the LDT, but it could + happen elsewhen. */ asm volatile ("mov %0, %%ss" : : "r" (0x7)); /* Fire up thread modify_ldt call. */ atomic_store (&ftx, 2); + /* And wait for it. */ while (atomic_load (&ftx) != 0) ;
On 02/03/23 02:10, DJ Delorie wrote: > > So I was able to reproduce the hangs in the original source, and debug > it, and fix it. In doing so, I realized that we can't use anything > complex to trigger the thread because that "anything" might also cause > the expected segfault and force everything out of sync again. > > Here's what I ended up with, and it doesn't seem to hang where the > original one hung quite often (in a tight while..end loop). The key > changes are: > > 1. Calls to futex are error checked, with retries, to ensure that the > futexes are actually doing what they're supposed to be doing. In the > original code, nearly every futex call returned an error. > > 2. The main loop has checks for whether the thread ran or not, and > "unlocks" the thread if it didn't (this is how the original source > hangs). This makes sense but... > > diff --git a/sysdeps/unix/sysv/linux/i386/tst-bz21269.c b/sysdeps/unix/sysv/linux/i386/tst-bz21269.c > index c95bfcb917..51d4a1b082 100644 > --- a/sysdeps/unix/sysv/linux/i386/tst-bz21269.c > +++ b/sysdeps/unix/sysv/linux/i386/tst-bz21269.c > @@ -1,5 +1,5 @@ > /* Test for i386 sigaction sa_restorer handling (BZ#21269) > - Copyright (C) 2017-2022 Free Software Foundation, Inc. > + Copyright (C) 2017-2023 Free Software Foundation, Inc. > This file is part of the GNU C Library. > > The GNU C Library is free software; you can redistribute it and/or > @@ -135,7 +135,14 @@ threadproc (void *ctx) > { > while (1) > { > - futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0); > + /* Continue to wait here until we've successfully waited, unless > + we're supposed to be clearing the LDT already. */ > + while (futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0) < 0) > + if (atomic_load (&ftx) >= 2) > + break; > + > + /* Normally there's time to hit this busy loop and wait for ftx > + to be set to 2. */ > while (atomic_load (&ftx) != 2) > { > if (atomic_load (&ftx) >= 3) > @@ -189,7 +196,14 @@ do_test (void) > if (sigsetjmp (jmpbuf, 1) != 0) > continue; > > - /* Make sure the thread is ready after the last test. */ > + /* We may have longjmp'd before triggering the thread. If so, > + trigger the thread now and wait for it. */ > + if (atomic_load (&ftx) == 1) > + atomic_store (&ftx, 2); > + > + /* Make sure the thread is ready after the last test. FTX is > + initially zero for the first loop, and set to zero each time > + the thread clears the LDT. */ > while (atomic_load (&ftx) != 0) > ; > > @@ -207,15 +221,24 @@ do_test (void) > > xmodify_ldt (0x11, &desc, sizeof (desc)); > > - /* Arm the thread. */ > - ftx = 1; > - futex ((int*) &ftx, FUTEX_WAKE, 0, NULL, NULL, 0); > + /* Arm the thread. We loop here until we've woken up one thread. */ > + atomic_store (&ftx, 1); > + while (futex ((int*) &ftx, FUTEX_WAKE, 1, NULL, NULL, 0) < 1) > + ; > + > + /* Give the thread a chance to get into it's busy loop. */ > + usleep (5); ... I shivers every time I see sleep used as synchronization mechanism, since most likely in some environment the sleep won't work as expected due scheduling pressuer and we will end up with a false positive. I am wondering if it would be better to just remove this test saying we can't really make it work reliable. > > + /* At *ANY* point after this instruction, we may segfault and > + longjump back to the top of the loop. The intention is to > + have this happen when the thread clears the LDT, but it could > + happen elsewhen. */ > asm volatile ("mov %0, %%ss" : : "r" (0x7)); > > /* Fire up thread modify_ldt call. */ > atomic_store (&ftx, 2); > > + /* And wait for it. */ > while (atomic_load (&ftx) != 0) > ; > >
Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes: >> + /* Give the thread a chance to get into it's busy loop. */ >> + usleep (5); > > ... I shivers every time I see sleep used as synchronization mechanism, since > most likely in some environment the sleep won't work as expected due > scheduling pressuer and we will end up with a false positive. Yeah, I'm using as just a "yeild if you can" operation. The code works without it, but not in the way the test is intended. > I am wondering if it would be better to just remove this test saying we > can't really make it work reliable. I'm not opposed to removing it. Even with the fixed I put in, the test is still more likely to fault "for some reason" than for the expected reason.
On 02/03/23 17:21, DJ Delorie wrote: > Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes: >>> + /* Give the thread a chance to get into it's busy loop. */ >>> + usleep (5); >> >> ... I shivers every time I see sleep used as synchronization mechanism, since >> most likely in some environment the sleep won't work as expected due >> scheduling pressuer and we will end up with a false positive. > > Yeah, I'm using as just a "yeild if you can" operation. The code works > without it, but not in the way the test is intended. > >> I am wondering if it would be better to just remove this test saying we >> can't really make it work reliable. > > I'm not opposed to removing it. Even with the fixed I put in, the test > is still more likely to fault "for some reason" than for the expected > reason. > Ok, this LGTM then. The patchwork false positive failures is being really annoying.
Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes: > On 02/03/23 17:21, DJ Delorie wrote: >> I'm not opposed to removing it. Even with the fixed I put in, the test >> is still more likely to fault "for some reason" than for the expected >> reason. >> > > Ok, this LGTM then. The patchwork false positive failures is being really > annoying. We dropped the ball on this one. We ended up with three options: 1> Adhemerval's v4 patch https://sourceware.org/pipermail/libc-alpha/2023-February/145934.html 2> DJ's v3 patch https://sourceware.org/pipermail/libc-alpha/2023-March/145999.html 3> remove the test Which option were you agreeing to, and do you still agree to it?
On 10/05/23 18:31, DJ Delorie wrote: > > Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes: >> On 02/03/23 17:21, DJ Delorie wrote: >>> I'm not opposed to removing it. Even with the fixed I put in, the test >>> is still more likely to fault "for some reason" than for the expected >>> reason. >>> >> >> Ok, this LGTM then. The patchwork false positive failures is being really >> annoying. > > We dropped the ball on this one. We ended up with three options: > > 1> Adhemerval's v4 patch > https://sourceware.org/pipermail/libc-alpha/2023-February/145934.html > 2> DJ's v3 patch > https://sourceware.org/pipermail/libc-alpha/2023-March/145999.html > 3> remove the test > > Which option were you agreeing to, and do you still agree to it? > Either 2 or 3 is fine for me, since I acked your patch I think we can go with 2 for now.
Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> writes: >> 1> Adhemerval's v4 patch >> https://sourceware.org/pipermail/libc-alpha/2023-February/145934.html >> 2> DJ's v3 patch >> https://sourceware.org/pipermail/libc-alpha/2023-March/145999.html >> 3> remove the test >> >> Which option were you agreeing to, and do you still agree to it? >> > > Either 2 or 3 is fine for me, since I acked your patch I think we can > go with 2 for now. I pushed #2. Next time we have a problem with that test, we'll just remove it.
diff --git a/sysdeps/unix/sysv/linux/i386/tst-bz21269.c b/sysdeps/unix/sysv/linux/i386/tst-bz21269.c index 1850240c34..351a9187e7 100644 --- a/sysdeps/unix/sysv/linux/i386/tst-bz21269.c +++ b/sysdeps/unix/sysv/linux/i386/tst-bz21269.c @@ -20,16 +20,13 @@ more specifically in do_multicpu_tests function. The main changes are: - - C11 atomics instead of plain access. + - Use pthread_barrier instead of atomic and futexes. - Remove x86_64 support which simplifies the syscall handling and fallbacks. - Replicate only the test required to trigger the issue for the BZ#21269. */ -#include <stdatomic.h> - #include <asm/ldt.h> -#include <linux/futex.h> #include <setjmp.h> #include <signal.h> @@ -40,6 +37,9 @@ #include <support/xunistd.h> #include <support/check.h> #include <support/xthread.h> +#include <support/xsignal.h> + +#define NITER 5 static int xset_thread_area (struct user_desc *u_info) @@ -55,13 +55,6 @@ xmodify_ldt (int func, const void *ptr, unsigned long bytecount) TEST_VERIFY_EXIT (syscall (SYS_modify_ldt, 1, ptr, bytecount) == 0); } -static int -futex (int *uaddr, int futex_op, int val, void *timeout, int *uaddr2, - int val3) -{ - return syscall (SYS_futex, uaddr, futex_op, val, timeout, uaddr2, val3); -} - static void xsethandler (int sig, void (*handler)(int, siginfo_t *, void *), int flags) { @@ -69,7 +62,7 @@ xsethandler (int sig, void (*handler)(int, siginfo_t *, void *), int flags) sa.sa_sigaction = handler; sa.sa_flags = SA_SIGINFO | flags; TEST_VERIFY_EXIT (sigemptyset (&sa.sa_mask) == 0); - TEST_VERIFY_EXIT (sigaction (sig, &sa, 0) == 0); + xsigaction (sig, &sa, 0); } static jmp_buf jmpbuf; @@ -123,33 +116,24 @@ setup_low_user_desc (void) low_user_desc_clear->seg_not_present = 1; } -/* Possible values of futex: - 0: thread is idle. - 1: thread armed. - 2: thread should clear LDT entry 0. - 3: thread should exit. */ -static atomic_uint ftx; +static pthread_barrier_t barrier; static void * threadproc (void *ctx) { - while (1) + for (int i = 0; i < NITER; i++) { - futex ((int *) &ftx, FUTEX_WAIT, 1, NULL, NULL, 0); - while (atomic_load (&ftx) != 2) - { - if (atomic_load (&ftx) >= 3) - return NULL; - } + xpthread_barrier_wait (&barrier); /* clear LDT entry 0. */ const struct user_desc desc = { 0 }; xmodify_ldt (1, &desc, sizeof (desc)); - /* If ftx == 2, set it to zero, If ftx == 100, quit. */ - if (atomic_fetch_add (&ftx, -2) != 2) - return NULL; + /* Wait for 'ss' set in main thread. */ + xpthread_barrier_wait (&barrier); } + + return NULL; } @@ -180,20 +164,18 @@ do_test (void) /* Some kernels send SIGBUS instead. */ xsethandler (SIGBUS, sigsegv_handler, 0); + xpthread_barrier_init (&barrier, NULL, 2); + thread = xpthread_create (0, threadproc, 0); asm volatile ("mov %%ss, %0" : "=rm" (orig_ss)); - for (int i = 0; i < 5; i++) + for (int i = 0; i < NITER; i++) { if (sigsetjmp (jmpbuf, 1) != 0) continue; - /* Make sure the thread is ready after the last test. */ - while (atomic_load (&ftx) != 0) - ; - - struct user_desc desc = { + const struct user_desc desc = { .entry_number = 0, .base_addr = 0, .limit = 0xffff, @@ -207,28 +189,23 @@ do_test (void) xmodify_ldt (0x11, &desc, sizeof (desc)); - /* Arm the thread. */ - ftx = 1; - futex ((int*) &ftx, FUTEX_WAKE, 0, NULL, NULL, 0); - - asm volatile ("mov %0, %%ss" : : "r" (0x7)); + /* Make sure the thread is ready after the last test. */ + xpthread_barrier_wait (&barrier); - /* Fire up thread modify_ldt call. */ - atomic_store (&ftx, 2); + /* Wait thread clear LDT entry 0. */ + xpthread_barrier_wait (&barrier); - while (atomic_load (&ftx) != 0) - ; + asm volatile ("mov %0, %%ss" : : "r" (0x7)); /* On success, modify_ldt will segfault us synchronously and we will escape via siglongjmp. */ support_record_failure (); } - atomic_store (&ftx, 100); - futex ((int*) &ftx, FUTEX_WAKE, 0, NULL, NULL, 0); - xpthread_join (thread); + xpthread_barrier_destroy (&barrier); + return 0; }