Message ID | 1543411413-23863-1-git-send-email-leitao@debian.org (mailing list archive) |
---|---|
State | Changes Requested |
Series | selftests/powerpc: New TM signal self test |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | next/apply_patch Successfully applied |
snowpatch_ozlabs/build-ppc64le | success | build succeded & removed 0 sparse warning(s) |
snowpatch_ozlabs/build-ppc64be | success | build succeded & removed 0 sparse warning(s) |
snowpatch_ozlabs/build-ppc64e | success | build succeded & removed 0 sparse warning(s) |
snowpatch_ozlabs/build-pmac32 | success | build succeded & removed 0 sparse warning(s) |
snowpatch_ozlabs/checkpatch | warning | total: 0 errors, 0 warnings, 1 checks, 137 lines checked |
On Wed, 2018-11-28 at 11:23 -0200, Breno Leitao wrote:
> A new self test that forces MSR[TS] to be set without calling any TM
> instruction. This test also tries to cause a page fault at a signal
> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
> when tm_recheckpoint() is called.
>
> This test is not deterministic since it is hard to guarantee that the page
> access will cause a page fault. Tests have shown that the bug could be
> exposed with few interactions in a buggy kernel. This test is configured to
> loop 5000x, having a good chance to hit the kernel issue in just one run.
> This self test takes less than two seconds to run.

You could try using sigaltstack() to put the ucontext somewhere else. Then you
could play tricks with that memory to try to force a fault.
madvise()+MADV_DONTNEED or fadvise()+POSIX_FADV_DONTNEED might do the trick.

This is more extra credit to make it more reliable. Not a requirement.

> This test uses set/getcontext because the kernel will recheckpoint
> zeroed structures, causing the test to segfault, which is undesired because
> the test needs to rerun, so, there is a signal handler for SIGSEGV which
> will restart the test.

Please put this description at the top of the test also.

Other than that, it looks good.

Mikey

>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  tools/testing/selftests/powerpc/tm/.gitignore |   1 +
>  tools/testing/selftests/powerpc/tm/Makefile   |   3 +-
>  .../powerpc/tm/tm-signal-force-msr.c          | 115 ++++++++++++++++++
>  3 files changed, 118 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
>
> diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore
> index c3ee8393dae8..89679822ebc9 100644
> --- a/tools/testing/selftests/powerpc/tm/.gitignore
> +++ b/tools/testing/selftests/powerpc/tm/.gitignore
> @@ -11,6 +11,7 @@ tm-signal-context-chk-fpu
>  tm-signal-context-chk-gpr
>  tm-signal-context-chk-vmx
>  tm-signal-context-chk-vsx
> +tm-signal-force-msr
>  tm-vmx-unavail
>  tm-unavailable
>  tm-trap
> diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile
> index 9fc2cf6fbc92..58a2ebd13958 100644
> --- a/tools/testing/selftests/powerpc/tm/Makefile
> +++ b/tools/testing/selftests/powerpc/tm/Makefile
> @@ -4,7 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu
>
>  TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
>  	tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \
> -	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn
> +	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-force-msr
>
>  top_srcdir = ../../../../..
>  include ../../lib.mk
> @@ -20,6 +20,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
>  $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c
>  $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized -mvsx
>  $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64
> +$(OUTPUT)/tm-signal-force-msr: CFLAGS += -pthread
>
>  SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
>  $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
> diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
> new file mode 100644
> index 000000000000..4441d61c2328
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
> @@ -0,0 +1,115 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp.
> + */
> +
> +#define _GNU_SOURCE
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <signal.h>
> +#include <string.h>
> +#include <ucontext.h>
> +#include <unistd.h>
> +
> +#include "tm.h"
> +#include "utils.h"
> +
> +#define __MASK(X)	(1UL<<(X))
> +#define MSR_TS_S_LG	33	/* Trans Mem state: Suspended */
> +#define MSR_TM	__MASK(MSR_TM_LG)	/* Transactional Mem Available */
> +#define MSR_TS_S	__MASK(MSR_TS_S_LG)	/* Transaction Suspended */

Surely we have these defined somewhere else in selftests?

> +
> +#define COUNT_MAX	5000	/* Number of interactions */
> +
> +/* Setting contexts because the test will crash and we want to recover */
> +ucontext_t init_context, main_context;
> +
> +static int count, first_time;
> +
> +void trap_signal_handler(int signo, siginfo_t *si, void *uc)
> +{
> +	ucontext_t *ucp = uc;
> +
> +	/*
> +	 * Allocating memory in a signal handler, and never freeing it on
> +	 * purpose, forcing the heap increase, so, the memory leak is what
> +	 * we want here.
> +	 */
> +	ucp->uc_link = malloc(sizeof(ucontext_t));
> +	memcpy(&ucp->uc_link, &ucp->uc_mcontext, sizeof(ucp->uc_mcontext));
> +
> +	/* Forcing to enable MSR[TM] */
> +	ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
> +
> +	/*
> +	 * A fork inside a signal handler seems to be more efficient than a
> +	 * fork() prior to the signal being raised.
> +	 */
> +	if (fork() == 0) {
> +		/*
> +		 * Both child and parent will return, but, child returns
> +		 * with count set so it will exit in the next segfault.
> +		 * Parent will continue to loop.
> +		 */
> +		count = COUNT_MAX;
> +	}
> +
> +	/*
> +	 * If the change above does not hit the bug, it will cause a
> +	 * segmentation fault, since the ck structures are NULL.
> +	 */
> +}
> +
> +void seg_signal_handler(int signo, siginfo_t *si, void *uc)
> +{
> +	if (count == COUNT_MAX) {
> +		/* Return to tm_signal_force_msr() and exit */
> +		setcontext(&main_context);
> +	}
> +
> +	count++;
> +	/* Reexecute the test */
> +	setcontext(&init_context);
> +}
> +
> +void tm_trap_test(void)
> +{
> +	struct sigaction trap_sa, seg_sa;
> +
> +	trap_sa.sa_flags = SA_SIGINFO;
> +	trap_sa.sa_sigaction = trap_signal_handler;
> +
> +	seg_sa.sa_flags = SA_SIGINFO;
> +	seg_sa.sa_sigaction = seg_signal_handler;
> +
> +	/*
> +	 * Set initial context. Will get back here from
> +	 * seg_signal_handler()
> +	 */
> +	getcontext(&init_context);
> +
> +	/* The signal handler will enable MSR_TS */
> +	sigaction(SIGUSR1, &trap_sa, NULL);
> +	/* If it does not crash, it will segfault, avoid it to retest */
> +	sigaction(SIGSEGV, &seg_sa, NULL);
> +
> +	raise(SIGUSR1);
> +}
> +
> +int tm_signal_force_msr(void)
> +{
> +	SKIP_IF(!have_htm());
> +
> +	/* Will get back here after COUNT_MAX interactions */
> +	getcontext(&main_context);
> +
> +	if (!first_time++)
> +		tm_trap_test();
> +
> +	return EXIT_SUCCESS;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	test_harness(tm_signal_force_msr, "tm_signal_force_msr");
> +}
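
The sigaltstack() plus madvise() suggestion in the reply above can be sketched in isolation. This is a minimal, portable illustration and not code from the patch: `ALT_STACK_SIZE`, `usr1_handler()` and `run_handler_on_cold_stack()` are hypothetical names, and the sketch only checks that the handler runs on a freshly mapped stack whose pages were dropped with MADV_DONTNEED. It does not touch MSR[TS] and makes no attempt to reproduce the kernel bug.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define ALT_STACK_SIZE (64 * 1024)	/* assumption: ample for a trivial handler */

static char *alt_stack_base;
static volatile sig_atomic_t ran_on_alt_stack;

static void usr1_handler(int signo, siginfo_t *si, void *uc)
{
	char marker;	/* lives on whichever stack the handler is running on */
	uintptr_t here = (uintptr_t)&marker;
	uintptr_t base = (uintptr_t)alt_stack_base;

	(void)signo;
	(void)si;
	(void)uc;
	ran_on_alt_stack = here >= base && here < base + ALT_STACK_SIZE;
}

/* Returns 1 if the handler ran on the alternate stack, 0 if not, -1 on error. */
int run_handler_on_cold_stack(void)
{
	struct sigaction sa;
	stack_t ss;
	void *p;

	p = mmap(NULL, ALT_STACK_SIZE, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	alt_stack_base = p;

	/* Drop any backing pages so the first access has to fault them in */
	if (madvise(p, ALT_STACK_SIZE, MADV_DONTNEED))
		return -1;

	memset(&ss, 0, sizeof(ss));
	ss.ss_sp = p;
	ss.ss_size = ALT_STACK_SIZE;
	if (sigaltstack(&ss, NULL))
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = usr1_handler;
	sa.sa_flags = SA_SIGINFO | SA_ONSTACK;	/* SA_ONSTACK selects the alt stack */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL))
		return -1;

	ran_on_alt_stack = 0;
	if (raise(SIGUSR1))
		return -1;
	return ran_on_alt_stack;
}
```

The sketch passes -1 as the file descriptor for the anonymous mapping and compares against MAP_FAILED, which is the conventional portable form. The mapping and the alternate stack are deliberately left in place after the call.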
Hi Mikey,

On 11/29/18 12:11 AM, Michael Neuling wrote:
> On Wed, 2018-11-28 at 11:23 -0200, Breno Leitao wrote:
>> A new self test that forces MSR[TS] to be set without calling any TM
>> instruction. This test also tries to cause a page fault at a signal
>> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
>> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
>> when tm_recheckpoint() is called.
>>
>> This test is not deterministic since it is hard to guarantee that the page
>> access will cause a page fault. Tests have shown that the bug could be
>> exposed with few interactions in a buggy kernel. This test is configured to
>> loop 5000x, having a good chance to hit the kernel issue in just one run.
>> This self test takes less than two seconds to run.
>
> You could try using sigaltstack() to put the ucontext somewhere else. Then you
> could play tricks with that memory to try to force a fault.
> madvise()+MADV_DONTNEED or fadvise()+POSIX_FADV_DONTNEED might do the trick.

Yes, it sounded interesting and I implemented the test using madvise(). Thanks
for the suggestion!

The current approach didn't seem to increase the number of page faults, as it
seems that MADV_DONTNEED makes no difference when pages are already loaded
lazily.

This is the test I did, where 'original' is my current patch and 'madvice' is
the patch below:

 Performance counter stats for './original':

                 0      major-faults
           125,100      minor-faults

       2.575479619 seconds time elapsed

 Performance counter stats for './madvice':

                 0      major-faults
           125,099      minor-faults

Other than that, I didn't see any improvement in the reproduction rate either,
although it is a bit challenging to measure, since it crashes the machine and
I can't run a full statistical model.

This is the current patch I compared to the original one

---

commit 082a9fe29412943adfa2d6a363f44bac8e81d0ce
Author: Breno Leitao <leitao@debian.org>
Date:   Tue Nov 13 18:02:57 2018 -0500

selftests/powerpc: New TM signal self test

A new self test that forces MSR[TS] to be set without calling any TM
instruction. This test also tries to cause a page fault at a signal
handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
when tm_recheckpoint() is called.

This test is not deterministic, since it is hard to guarantee that the page
access will cause a page fault. In order to force more page faults at
signal context, the signal handler and the ucontext are being mapped into a
MADV_DONTNEED memory chunks.

Tests have shown that the bug could be exposed with few interactions in a
buggy kernel. This test is configured to loop 5000x, having a good chance
to hit the kernel issue in just one run. This self test takes less than
two seconds to run.

This test uses set/getcontext because the kernel will recheckpoint zeroed
structures, causing the test to segfault, which is undesired because the
test needs to rerun, so, there is a signal handler for SIGSEGV which will
restart the test.

Signed-off-by: Breno Leitao <leitao@debian.org>

diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore
index c3ee8393dae8..89679822ebc9 100644
--- a/tools/testing/selftests/powerpc/tm/.gitignore
+++ b/tools/testing/selftests/powerpc/tm/.gitignore
@@ -11,6 +11,7 @@ tm-signal-context-chk-fpu
 tm-signal-context-chk-gpr
 tm-signal-context-chk-vmx
 tm-signal-context-chk-vsx
+tm-signal-force-msr
 tm-vmx-unavail
 tm-unavailable
 tm-trap
diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile
index 9fc2cf6fbc92..58a2ebd13958 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -4,7 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu

 TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
 	tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \
-	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn
+	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-force-msr

 top_srcdir = ../../../../..
 include ../../lib.mk
@@ -20,6 +20,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c
 $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized -mvsx
 $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64
+$(OUTPUT)/tm-signal-force-msr: CFLAGS += -pthread

 SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
 $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
new file mode 100644
index 000000000000..496596f3c1bf
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp.
+ *
+ * This test raises a SIGUSR1 signal, and toggle the MSR[TS]
+ * fields at the signal handler. With MSR[TS] being set, the kernel will
+ * force a recheckpoint, which may cause a segfault when returning to
+ * user space. Since the kernel needs to re-run, the segfault needs to be
+ * caught and handled.
+ *
+ * In order to continue the test even after a segfault, the context is
+ * saved prior to the signal being raised, and it is restored when there is
+ * a segmentation fault. This happens for COUNT_MAX times.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <string.h>
+#include <ucontext.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+#include "tm.h"
+#include "utils.h"
+#include "reg.h"
+
+#define COUNT_MAX	5000	/* Number of interactions */
+
+/* Setting contexts because the test will crash and we want to recover */
+ucontext_t init_context, main_context;
+
+static int count, first_time;
+
+void usr_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+	ucontext_t *ucp = uc;
+	int ret;
+
+	/*
+	 * Allocating memory in a signal handler, and never freeing it on
+	 * purpose, forcing the heap increase, so, the memory leak is what
+	 * we want here.
+	 */
+	ucp->uc_link = mmap(NULL, sizeof(ucontext_t),
+			    PROT_READ | PROT_WRITE,
+			    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+	if (ucp->uc_link == (void *)-1) {
+		perror("Mmap failed");
+		exit(-1);
+	}
+
+	/* Forcing the page to be allocated in a page fault */
+	ret = madvise(ucp->uc_link, sizeof(ucontext_t), MADV_DONTNEED);
+	if (ret) {
+		perror("madvise failed");
+		exit(-1);
+	}
+
+	memcpy(&ucp->uc_link, &ucp->uc_mcontext, sizeof(ucp->uc_mcontext));
+
+	/* Forcing to enable MSR[TM] */
+	ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
+
+	/*
+	 * A fork inside a signal handler seems to be more efficient than a
+	 * fork() prior to the signal being raised.
+	 */
+	if (fork() == 0) {
+		/*
+		 * Both child and parent will return, but, child returns
+		 * with count set so it will exit in the next segfault.
+		 * Parent will continue to loop.
+		 */
+		count = COUNT_MAX;
+	}
+
+	/*
+	 * If the change above does not hit the bug, it will cause a
+	 * segmentation fault, since the ck structures are NULL.
+	 */
+}
+
+void seg_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+	if (count == COUNT_MAX) {
+		/* Return to tm_signal_force_msr() and exit */
+		setcontext(&main_context);
+	}
+
+	count++;
+
+	/* Reexecute the test */
+	setcontext(&init_context);
+}
+
+void tm_trap_test(void)
+{
+	struct sigaction usr_sa, seg_sa;
+	stack_t ss;
+
+	usr_sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
+	usr_sa.sa_sigaction = usr_signal_handler;
+
+	seg_sa.sa_flags = SA_SIGINFO;
+	seg_sa.sa_sigaction = seg_signal_handler;
+
+	/*
+	 * Set initial context. Will get back here from
+	 * seg_signal_handler()
+	 */
+	getcontext(&init_context);
+
+	/* Allocated am alternative signal stack area */
+	ss.ss_sp = mmap(NULL, SIGSTKSZ, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+	ss.ss_size = SIGSTKSZ;
+	ss.ss_flags = 0;
+
+	if (ss.ss_sp == (void *)-1) {
+		perror("mmap error\n");
+		exit(-1);
+	}
+
+	/* Force the allocation through a page fault */
+	if (madvise(ss.ss_sp, SIGSTKSZ, MADV_DONTNEED)) {
+		perror("madvise\n");
+		exit(-1);
+	}
+
+	/* Setting a alternative stack to generate a page fault when
+	 * the signal is raised.
+	 */
+	if (sigaltstack(&ss, NULL)) {
+		perror("sigaltstack\n");
+		exit(-1);
+	}
+
+	/* The signal handler will enable MSR_TS */
+	sigaction(SIGUSR1, &usr_sa, NULL);
+	/* If it does not crash, it will segfault, avoid it to retest */
+	sigaction(SIGSEGV, &seg_sa, NULL);
+
+	raise(SIGUSR1);
+}
+
+int tm_signal_force_msr(void)
+{
+	SKIP_IF(!have_htm());
+
+	/* Will get back here after COUNT_MAX interactions */
+	getcontext(&main_context);
+
+	if (!first_time++)
+		tm_trap_test();
+
+	return EXIT_SUCCESS;
+}
+
+int main(int argc, char **argv)
+{
+	test_harness(tm_signal_force_msr, "tm_signal_force_msr");
+}
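
The minor-fault comparison reported in this reply used perf. A rough in-process way to make a similar measurement is to read `ru_minflt` from getrusage() before and after touching a fresh anonymous mapping, with and without MADV_DONTNEED. The sketch below is only an illustration of that idea: `NPAGES`, `minor_faults()` and `faults_on_first_touch()` are hypothetical names, and the counts depend on the kernel and on settings such as transparent huge pages, so no particular numbers are claimed.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

#define NPAGES 64	/* assumption: arbitrary small region for the demo */

/* Current minor-fault count of this process, or -1 on error. */
static long minor_faults(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru))
		return -1;
	return ru.ru_minflt;
}

/*
 * Map NPAGES anonymous pages, optionally madvise(MADV_DONTNEED) them,
 * then write one byte per page. Returns the number of minor faults
 * observed while touching the pages, or -1 on error.
 */
long faults_on_first_touch(int use_madvise)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len, off;
	volatile char *buf;
	long before, after;
	void *p;

	if (page <= 0)
		return -1;
	len = (size_t)page * NPAGES;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	if (use_madvise && madvise(p, len, MADV_DONTNEED)) {
		munmap(p, len);
		return -1;
	}

	buf = p;
	before = minor_faults();
	for (off = 0; off < len; off += (size_t)page)
		buf[off] = 1;	/* first write to each page */
	after = minor_faults();

	munmap(p, len);
	if (before < 0 || after < 0)
		return -1;
	return after - before;
}
```

Comparing `faults_on_first_touch(0)` with `faults_on_first_touch(1)` gives a quick check of whether MADV_DONTNEED changes anything for a mapping that has never been touched, which is the situation described in the reply.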
Breno Leitao <leitao@debian.org> writes:
> A new self test that forces MSR[TS] to be set without calling any TM
> instruction. This test also tries to cause a page fault at a signal
> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
> when tm_recheckpoint() is called.
>
> This test is not deterministic since it is hard to guarantee that the page
> access will cause a page fault. Tests have shown that the bug could be
> exposed with few interactions in a buggy kernel. This test is configured to
> loop 5000x, having a good chance to hit the kernel issue in just one run.
> This self test takes less than two seconds to run.
>
> This test uses set/getcontext because the kernel will recheckpoint
> zeroed structures, causing the test to segfault, which is undesired because
> the test needs to rerun, so, there is a signal handler for SIGSEGV which
> will restart the test.

Hi Breno,

Thanks for the test, some of these TM tests are getting pretty advanced! :)

Unfortunately it doesn't build in a few configurations.

On Ubuntu 18.10 built with powerpc-linux-gnu-gcc I get:

  tm-signal-force-msr.c: In function 'trap_signal_handler':
  tm-signal-force-msr.c:42:19: error: 'union uc_regs_ptr' has no member named 'gp_regs'; did you mean 'uc_regs'?
    ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
                     ^~~~~~~
                     uc_regs
  tm-signal-force-msr.c:17:29: error: left shift count >= width of type [-Werror=shift-count-overflow]
   #define __MASK(X) (1UL<<(X))
                         ^~
  tm-signal-force-msr.c:20:25: note: in expansion of macro '__MASK'
   #define MSR_TS_S __MASK(MSR_TS_S_LG) /* Transaction Suspended */
                    ^~~~~~
  tm-signal-force-msr.c:42:38: note: in expansion of macro 'MSR_TS_S'
    ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
                                        ^~~~~~~~

And using powerpc64le-linux-gnu-gcc I get:

  In file included from /usr/powerpc64le-linux-gnu/include/string.h:494,
                   from tm-signal-force-msr.c:10:
  In function 'memcpy',
      inlined from 'trap_signal_handler' at tm-signal-force-msr.c:39:2:
  /usr/powerpc64le-linux-gnu/include/bits/string_fortified.h:34:10: error: '__builtin_memcpy' accessing 1272 bytes at offsets 8 and 168 overlaps 1112 bytes at offset 168 [-Werror=restrict]
     return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

cheers
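
The shift-count-overflow error above comes from `1UL << 33` on a target where `unsigned long` is 32 bits wide. One width-independent way to write such a mask is to build it from a 64-bit constant, as in the sketch below. `MSR_BIT()` and `set_suspended()` are hypothetical names used only for illustration; `MSR_TS_S_LG` and its value 33 are taken from the patch. This addresses the constant only and says nothing about how the thread chooses to resolve the 32-bit build.

```c
#include <stdint.h>

#define MSR_TS_S_LG	33				/* from the patch: TS Suspended bit */
#define MSR_BIT(n)	(UINT64_C(1) << (n))		/* hypothetical helper, always 64-bit */
#define MSR_TS_S	MSR_BIT(MSR_TS_S_LG)

/* Checked at compile time on any target, 32-bit or 64-bit */
_Static_assert(MSR_TS_S == UINT64_C(0x200000000),
	       "bit 33 must be representable in the mask type");

/* Return msr with the Transaction Suspended bit set. */
uint64_t set_suspended(uint64_t msr)
{
	return msr | MSR_TS_S;
}
```

Because the shift operand is `uint64_t`, the expression has the same value whether the file is built with -m32 or -m64.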
Hi Michael,

On 12/20/18 10:51 AM, Michael Ellerman wrote:
> Breno Leitao <leitao@debian.org> writes:
>
>> A new self test that forces MSR[TS] to be set without calling any TM
>> instruction. This test also tries to cause a page fault at a signal
>> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
>> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
>> when tm_recheckpoint() is called.
>>
>> This test is not deterministic since it is hard to guarantee that the page
>> access will cause a page fault. Tests have shown that the bug could be
>> exposed with few interactions in a buggy kernel. This test is configured to
>> loop 5000x, having a good chance to hit the kernel issue in just one run.
>> This self test takes less than two seconds to run.
>>
>> This test uses set/getcontext because the kernel will recheckpoint
>> zeroed structures, causing the test to segfault, which is undesired because
>> the test needs to rerun, so, there is a signal handler for SIGSEGV which
>> will restart the test.
> And reference the ucontext->mcontext MSR using UCONTEXT_MSR() macro.
> Hi Breno,
>
> Thanks for the test, some of these TM tests are getting pretty advanced! :)
>
> Unfortunately it doesn't build in a few configurations.
>
> On Ubuntu 18.10 built with powerpc-linux-gnu-gcc I get:
>
>   tm-signal-force-msr.c: In function 'trap_signal_handler':
>   tm-signal-force-msr.c:42:19: error: 'union uc_regs_ptr' has no member named 'gp_regs'; did you mean 'uc_regs'?
>     ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
>                      ^~~~~~~
>                      uc_regs
>   tm-signal-force-msr.c:17:29: error: left shift count >= width of type [-Werror=shift-count-overflow]
>    #define __MASK(X) (1UL<<(X))
>                          ^~
>   tm-signal-force-msr.c:20:25: note: in expansion of macro '__MASK'
>    #define MSR_TS_S __MASK(MSR_TS_S_LG) /* Transaction Suspended */
>                     ^~~~~~
>   tm-signal-force-msr.c:42:38: note: in expansion of macro 'MSR_TS_S'
>     ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
>                                         ^~~~~~~~
>

That is because I missed the -m64 compilation flag in the Makefile. I
understand that this test only makes sense when compiled in 64 bits. Do you
agree?

I might also add a macro to address ucontext->mcontext MSR. This will avoid
problems like that in the future.

index ae43a614835d..7636bf45d5d5 100644
--- a/tools/testing/selftests/powerpc/include/utils.h
+++ b/tools/testing/selftests/powerpc/include/utils.h
@@ -102,8 +102,10 @@ do {
 #if defined(__powerpc64__)
 #define UCONTEXT_NIA(UC)	(UC)->uc_mcontext.gp_regs[PT_NIP]
+#define UCONTEXT_MSR(UC)	(UC)->uc_mcontext.gp_regs[PT_MSR]
 #elif defined(__powerpc__)
 #define UCONTEXT_NIA(UC)	(UC)->uc_mcontext.uc_regs->gregs[PT_NIP]
+#define UCONTEXT_MSR(UC)	(UC)->uc_mcontext.uc_regs->gregs[PT_MSR]
 #else
 #error implement UCONTEXT_NIA
 #endif

> And using powerpc64le-linux-gnu-gcc I get:
>
>   In file included from /usr/powerpc64le-linux-gnu/include/string.h:494,
>                    from tm-signal-force-msr.c:10:
>   In function 'memcpy',
>       inlined from 'trap_signal_handler' at tm-signal-force-msr.c:39:2:
>   /usr/powerpc64le-linux-gnu/include/bits/string_fortified.h:34:10: error: '__builtin_memcpy' accessing 1272 bytes at offsets 8 and 168 overlaps 1112 bytes at offset 168 [-Werror=restrict]
>      return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
>             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Damn, that is because I do not know how to use C pointers. Fixing it on v3
also.
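
The -Werror=restrict diagnostic quoted above is triggered because `memcpy(&ucp->uc_link, ...)` uses the address of the `uc_link` pointer field as the destination, so the copy lands inside `*ucp` itself and overlaps its source. One plausible reading of the intent is to copy the machine context into the newly allocated linked context instead. The sketch below illustrates that reading only; the thread does not show the v3 code, and `link_copy_of_mcontext()` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>

/*
 * Allocate a second ucontext_t, hang it off ucp->uc_link, and copy the
 * machine context into it. Returns 0 on success, -1 if allocation fails.
 */
int link_copy_of_mcontext(ucontext_t *ucp)
{
	ucp->uc_link = calloc(1, sizeof(*ucp->uc_link));
	if (!ucp->uc_link)
		return -1;

	/*
	 * The destination is the mcontext inside the linked context.
	 * Writing to &ucp->uc_link instead would overwrite the pointer
	 * and the fields that follow it in *ucp.
	 */
	memcpy(&ucp->uc_link->uc_mcontext, &ucp->uc_mcontext,
	       sizeof(ucp->uc_mcontext));
	return 0;
}
```

With this form the source and destination are distinct objects, so the regions cannot overlap.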
Breno Leitao <leitao@debian.org> writes:
> On 12/20/18 10:51 AM, Michael Ellerman wrote:
>> Breno Leitao <leitao@debian.org> writes:
>>
>>> A new self test that forces MSR[TS] to be set without calling any TM
>>> instruction. This test also tries to cause a page fault at a signal
>>> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
>>> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
>>> when tm_recheckpoint() is called.
>>>
>>> This test is not deterministic since it is hard to guarantee that the page
>>> access will cause a page fault. Tests have shown that the bug could be
>>> exposed with few interactions in a buggy kernel. This test is configured to
>>> loop 5000x, having a good chance to hit the kernel issue in just one run.
>>> This self test takes less than two seconds to run.
>>>
>>> This test uses set/getcontext because the kernel will recheckpoint
>>> zeroed structures, causing the test to segfault, which is undesired because
>>> the test needs to rerun, so, there is a signal handler for SIGSEGV which
>>> will restart the test.
>> And reference the ucontext->mcontext MSR using UCONTEXT_MSR() macro.
>> Hi Breno,
>>
>> Thanks for the test, some of these TM tests are getting pretty advanced! :)
>>
>> Unfortunately it doesn't build in a few configurations.
>>
>> On Ubuntu 18.10 built with powerpc-linux-gnu-gcc I get:
>>
>>   tm-signal-force-msr.c: In function 'trap_signal_handler':
>>   tm-signal-force-msr.c:42:19: error: 'union uc_regs_ptr' has no member named 'gp_regs'; did you mean 'uc_regs'?
>>     ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
>>                      ^~~~~~~
>>                      uc_regs
>>   tm-signal-force-msr.c:17:29: error: left shift count >= width of type [-Werror=shift-count-overflow]
>>    #define __MASK(X) (1UL<<(X))
>>                          ^~
>>   tm-signal-force-msr.c:20:25: note: in expansion of macro '__MASK'
>>    #define MSR_TS_S __MASK(MSR_TS_S_LG) /* Transaction Suspended */
>>                     ^~~~~~
>>   tm-signal-force-msr.c:42:38: note: in expansion of macro 'MSR_TS_S'
>>     ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
>>                                         ^~~~~~~~
>>
>
> That is because I missed the -m64 compilation flag on Makefile. I understand
> that this test only make sense when compiled in 64 bits. Do you agree?

I think the test could work as a 32-bit binary on a 64-bit kernel, but I
don't mind if you force it to build 64-bit.

cheers
diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore
index c3ee8393dae8..89679822ebc9 100644
--- a/tools/testing/selftests/powerpc/tm/.gitignore
+++ b/tools/testing/selftests/powerpc/tm/.gitignore
@@ -11,6 +11,7 @@ tm-signal-context-chk-fpu
 tm-signal-context-chk-gpr
 tm-signal-context-chk-vmx
 tm-signal-context-chk-vsx
+tm-signal-force-msr
 tm-vmx-unavail
 tm-unavailable
 tm-trap
diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile
index 9fc2cf6fbc92..58a2ebd13958 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -4,7 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu

 TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
 	tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \
-	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn
+	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-force-msr

 top_srcdir = ../../../../..
 include ../../lib.mk
@@ -20,6 +20,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c
 $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized -mvsx
 $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64
+$(OUTPUT)/tm-signal-force-msr: CFLAGS += -pthread

 SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
 $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
new file mode 100644
index 000000000000..4441d61c2328
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <string.h>
+#include <ucontext.h>
+#include <unistd.h>
+
+#include "tm.h"
+#include "utils.h"
+
+#define __MASK(X)	(1UL<<(X))
+#define MSR_TS_S_LG	33	/* Trans Mem state: Suspended */
+#define MSR_TM	__MASK(MSR_TM_LG)	/* Transactional Mem Available */
+#define MSR_TS_S	__MASK(MSR_TS_S_LG)	/* Transaction Suspended */
+
+#define COUNT_MAX	5000	/* Number of interactions */
+
+/* Setting contexts because the test will crash and we want to recover */
+ucontext_t init_context, main_context;
+
+static int count, first_time;
+
+void trap_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+	ucontext_t *ucp = uc;
+
+	/*
+	 * Allocating memory in a signal handler, and never freeing it on
+	 * purpose, forcing the heap increase, so, the memory leak is what
+	 * we want here.
+	 */
+	ucp->uc_link = malloc(sizeof(ucontext_t));
+	memcpy(&ucp->uc_link, &ucp->uc_mcontext, sizeof(ucp->uc_mcontext));
+
+	/* Forcing to enable MSR[TM] */
+	ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
+
+	/*
+	 * A fork inside a signal handler seems to be more efficient than a
+	 * fork() prior to the signal being raised.
+	 */
+	if (fork() == 0) {
+		/*
+		 * Both child and parent will return, but, child returns
+		 * with count set so it will exit in the next segfault.
+		 * Parent will continue to loop.
+		 */
+		count = COUNT_MAX;
+	}
+
+	/*
+	 * If the change above does not hit the bug, it will cause a
+	 * segmentation fault, since the ck structures are NULL.
+	 */
+}
+
+void seg_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+	if (count == COUNT_MAX) {
+		/* Return to tm_signal_force_msr() and exit */
+		setcontext(&main_context);
+	}
+
+	count++;
+	/* Reexecute the test */
+	setcontext(&init_context);
+}
+
+void tm_trap_test(void)
+{
+	struct sigaction trap_sa, seg_sa;
+
+	trap_sa.sa_flags = SA_SIGINFO;
+	trap_sa.sa_sigaction = trap_signal_handler;
+
+	seg_sa.sa_flags = SA_SIGINFO;
+	seg_sa.sa_sigaction = seg_signal_handler;
+
+	/*
+	 * Set initial context. Will get back here from
+	 * seg_signal_handler()
+	 */
+	getcontext(&init_context);
+
+	/* The signal handler will enable MSR_TS */
+	sigaction(SIGUSR1, &trap_sa, NULL);
+	/* If it does not crash, it will segfault, avoid it to retest */
+	sigaction(SIGSEGV, &seg_sa, NULL);
+
+	raise(SIGUSR1);
+}
+
+int tm_signal_force_msr(void)
+{
+	SKIP_IF(!have_htm());
+
+	/* Will get back here after COUNT_MAX interactions */
+	getcontext(&main_context);
+
+	if (!first_time++)
+		tm_trap_test();
+
+	return EXIT_SUCCESS;
+}
+
+int main(int argc, char **argv)
+{
+	test_harness(tm_signal_force_msr, "tm_signal_force_msr");
+}
A new self test that forces MSR[TS] to be set without calling any TM
instruction. This test also tries to cause a page fault at a signal
handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
when tm_recheckpoint() is called.

This test is not deterministic since it is hard to guarantee that the page
access will cause a page fault. Tests have shown that the bug could be
exposed with few interactions in a buggy kernel. This test is configured to
loop 5000x, having a good chance to hit the kernel issue in just one run.
This self test takes less than two seconds to run.

This test uses set/getcontext because the kernel will recheckpoint
zeroed structures, causing the test to segfault, which is undesired because
the test needs to rerun, so, there is a signal handler for SIGSEGV which
will restart the test.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 tools/testing/selftests/powerpc/tm/.gitignore |   1 +
 tools/testing/selftests/powerpc/tm/Makefile   |   3 +-
 .../powerpc/tm/tm-signal-force-msr.c          | 115 ++++++++++++++++++
 3 files changed, 118 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
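
The restart mechanism described in the commit message (save a context, let the SIGSEGV handler jump back to it, stop after a fixed number of rounds) can be sketched without any TM involvement. This is a simplified illustration under stated assumptions: it uses raise(SIGSEGV) in place of a real fault, a single saved context instead of the two used by the test, and the hypothetical names `retry_context`, `attempts`, `segv_handler()` and `run_with_retries()`.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <ucontext.h>

static ucontext_t retry_context;		/* where the handler jumps back to */
static volatile sig_atomic_t attempts;		/* survives the jump: not on the stack */

static void segv_handler(int signo)
{
	(void)signo;
	attempts++;
	/* Does not return: execution resumes right after getcontext() */
	setcontext(&retry_context);
}

/* Run the "crashing" step until it has crashed max_attempts times. */
int run_with_retries(int max_attempts)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = segv_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, NULL))
		return -1;

	attempts = 0;
	if (getcontext(&retry_context))
		return -1;

	/* Reached once directly, then once more after every handled signal */
	if (attempts < max_attempts)
		raise(SIGSEGV);		/* stand-in for the real segfault */

	return attempts;
}
```

The counter is a file-scope `volatile sig_atomic_t` so that its value is not lost when setcontext() restores the registers saved by getcontext(). The saved context also carries the signal mask from before the handler ran, which is why SIGSEGV can be delivered again on the next round.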