diff mbox series

selftests/powerpc: New TM signal self test

Message ID 1543411413-23863-1-git-send-email-leitao@debian.org (mailing list archive)
State Changes Requested
Series selftests/powerpc: New TM signal self test

Checks

Context                         Check    Description
snowpatch_ozlabs/apply_patch    success  next/apply_patch Successfully applied
snowpatch_ozlabs/build-ppc64le  success  build succeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-ppc64be  success  build succeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-ppc64e   success  build succeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-pmac32   success  build succeded & removed 0 sparse warning(s)
snowpatch_ozlabs/checkpatch     warning  total: 0 errors, 0 warnings, 1 checks, 137 lines checked

Commit Message

Breno Leitao Nov. 28, 2018, 1:23 p.m. UTC
A new self test that forces MSR[TS] to be set without calling any TM
instruction. This test also tries to cause a page fault at a signal
handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
when tm_recheckpoint() is called.

This test is not deterministic since it is hard to guarantee that the page
access will cause a page fault. Tests have shown that the bug could be
exposed with few interactions in a buggy kernel. This test is configured to
loop 5000x, having a good chance to hit the kernel issue in just one run.
This self test takes less than two seconds to run.

This test uses set/getcontext because the kernel will recheckpoint
zeroed structures, causing the test to segfault, which is undesired because
the test needs to rerun, so, there is a signal handler for SIGSEGV which
will restart the test.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 tools/testing/selftests/powerpc/tm/.gitignore |   1 +
 tools/testing/selftests/powerpc/tm/Makefile   |   3 +-
 .../powerpc/tm/tm-signal-force-msr.c          | 115 ++++++++++++++++++
 3 files changed, 118 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c

Comments

Michael Neuling Nov. 29, 2018, 2:11 a.m. UTC | #1
On Wed, 2018-11-28 at 11:23 -0200, Breno Leitao wrote:
> A new self test that forces MSR[TS] to be set without calling any TM
> instruction. This test also tries to cause a page fault at a signal
> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
> when tm_recheckpoint() is called.
> 
> This test is not deterministic since it is hard to guarantee that the page
> access will cause a page fault. Tests have shown that the bug could be
> exposed with few interactions in a buggy kernel. This test is configured to
> loop 5000x, having a good chance to hit the kernel issue in just one run.
> This self test takes less than two seconds to run.

You could try using sigaltstack() to put the ucontext somewhere else. Then you
could play tricks with that memory to try to force a fault.
madvise()+MADV_DONTNEED or fadvise()+POSIX_FADV_DONTNEED might do the trick.

This is more extra credit to make it more reliable. Not a requirement.
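
Editor's sketch of the suggestion above: register an mmap()ed alternate signal stack, drop its pages with madvise(MADV_DONTNEED), and let the handler run on it so that the first accesses fault the pages back in. This is a minimal sketch under stated assumptions, not the test itself; ALT_STACK_SIZE, usr1_handler and raise_on_cold_alt_stack are hypothetical names, and touching the stack before dropping it is an illustrative choice.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define ALT_STACK_SIZE (64 * 1024)	/* illustrative size, not from the patch */

static char *alt_stack;
static volatile sig_atomic_t handler_on_alt_stack;

static void usr1_handler(int signo)
{
	char marker;
	uintptr_t here = (uintptr_t)&marker;
	uintptr_t base = (uintptr_t)alt_stack;

	(void)signo;
	/* Record whether this frame really lives on the alternate stack. */
	handler_on_alt_stack = (here >= base && here < base + ALT_STACK_SIZE);
}

/* Returns 1 if the handler ran on the alternate stack, 0 if not, -1 on error. */
int raise_on_cold_alt_stack(void)
{
	struct sigaction sa;
	stack_t ss;

	alt_stack = mmap(NULL, ALT_STACK_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (alt_stack == MAP_FAILED)
		return -1;

	ss.ss_sp = alt_stack;
	ss.ss_size = ALT_STACK_SIZE;
	ss.ss_flags = 0;
	if (sigaltstack(&ss, NULL))
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = usr1_handler;
	sa.sa_flags = SA_ONSTACK;	/* deliver SIGUSR1 on the alternate stack */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL))
		return -1;

	/* Populate the pages, then drop them so the next access faults again. */
	memset(alt_stack, 0xaa, ALT_STACK_SIZE);
	if (madvise(alt_stack, ALT_STACK_SIZE, MADV_DONTNEED))
		return -1;

	handler_on_alt_stack = 0;
	raise(SIGUSR1);
	return handler_on_alt_stack;
}
```

For a private anonymous mapping, the faults taken after MADV_DONTNEED are minor faults on zero-filled pages, the same kind a freshly mmap()ed region takes on first touch.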


> This test uses set/getcontext because the kernel will recheckpoint
> zeroed structures, causing the test to segfault, which is undesired because
> the test needs to rerun, so, there is a signal handler for SIGSEGV which
> will restart the test.

Please put this description at the top of the test also.

Other than that, it looks good.

Mikey

> 
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  tools/testing/selftests/powerpc/tm/.gitignore |   1 +
>  tools/testing/selftests/powerpc/tm/Makefile   |   3 +-
>  .../powerpc/tm/tm-signal-force-msr.c          | 115 ++++++++++++++++++
>  3 files changed, 118 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
> 
> diff --git a/tools/testing/selftests/powerpc/tm/.gitignore
> b/tools/testing/selftests/powerpc/tm/.gitignore
> index c3ee8393dae8..89679822ebc9 100644
> --- a/tools/testing/selftests/powerpc/tm/.gitignore
> +++ b/tools/testing/selftests/powerpc/tm/.gitignore
> @@ -11,6 +11,7 @@ tm-signal-context-chk-fpu
>  tm-signal-context-chk-gpr
>  tm-signal-context-chk-vmx
>  tm-signal-context-chk-vsx
> +tm-signal-force-msr
>  tm-vmx-unavail
>  tm-unavailable
>  tm-trap
> diff --git a/tools/testing/selftests/powerpc/tm/Makefile
> b/tools/testing/selftests/powerpc/tm/Makefile
> index 9fc2cf6fbc92..58a2ebd13958 100644
> --- a/tools/testing/selftests/powerpc/tm/Makefile
> +++ b/tools/testing/selftests/powerpc/tm/Makefile
> @@ -4,7 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-
> signal-context-chk-fpu
>  
>  TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-
> stack \
>  	tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap 
> \
> -	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn
> +	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-force-msr
>  
>  top_srcdir = ../../../../..
>  include ../../lib.mk
> @@ -20,6 +20,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
>  $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c
>  $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-
> error=uninitialized -mvsx
>  $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64
> +$(OUTPUT)/tm-signal-force-msr: CFLAGS += -pthread
>  
>  SIGNAL_CONTEXT_CHK_TESTS := $(patsubst
> %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
>  $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
> diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
> b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
> new file mode 100644
> index 000000000000..4441d61c2328
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
> @@ -0,0 +1,115 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp.
> + */
> +
> +#define _GNU_SOURCE
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <signal.h>
> +#include <string.h>
> +#include <ucontext.h>
> +#include <unistd.h>
> +
> +#include "tm.h"
> +#include "utils.h"
> +
> +#define __MASK(X)       (1UL<<(X))
> +#define MSR_TS_S_LG     33                  /* Trans Mem state: Suspended */
> +#define MSR_TM          __MASK(MSR_TM_LG)   /* Transactional Mem Available */
> +#define MSR_TS_S        __MASK(MSR_TS_S_LG) /* Transaction Suspended */

Surely we have these defined somewhere else in selftests? 

> +
> +#define COUNT_MAX       5000                /* Number of interactions */
> +
> +/* Setting contexts because the test will crash and we want to recover */
> +ucontext_t init_context, main_context;
> +
> +static int count, first_time;
> +
> +void trap_signal_handler(int signo, siginfo_t *si, void *uc)
> +{
> +	ucontext_t *ucp = uc;
> +
> +	/*
> +	 * Allocating memory in a signal handler, and never freeing it on
> +	 * purpose, forcing the heap increase, so, the memory leak is what
> +	 * we want here.
> +	 */
> +	ucp->uc_link = malloc(sizeof(ucontext_t));
> +	memcpy(&ucp->uc_link, &ucp->uc_mcontext, sizeof(ucp->uc_mcontext));
> +
> +	/* Forcing to enable MSR[TM] */
> +	ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
> +
> +	/*
> +	 * A fork inside a signal handler seems to be more efficient than a
> +	 * fork() prior to the signal being raised.
> +	 */
> +	if (fork() == 0) {
> +		/*
> +		 * Both child and parent will return, but, child returns
> +		 * with count set so it will exit in the next segfault.
> +		 * Parent will continue to loop.
> +		 */
> +		count = COUNT_MAX;
> +	}
> +
> +	/*
> +	 * If the change above does not hit the bug, it will cause a
> +	 * segmentation fault, since the ck structures are NULL.
> +	 */
> +}
> +
> +void seg_signal_handler(int signo, siginfo_t *si, void *uc)
> +{
> +	if (count == COUNT_MAX) {
> +		/* Return to tm_signal_force_msr() and exit */
> +		setcontext(&main_context);
> +	}
> +
> +	count++;
> +	/* Reexecute the test */
> +	setcontext(&init_context);
> +}
> +
> +void tm_trap_test(void)
> +{
> +	struct sigaction trap_sa, seg_sa;
> +
> +	trap_sa.sa_flags = SA_SIGINFO;
> +	trap_sa.sa_sigaction = trap_signal_handler;
> +
> +	seg_sa.sa_flags = SA_SIGINFO;
> +	seg_sa.sa_sigaction = seg_signal_handler;
> +
> +	/*
> +	 * Set initial context. Will get back here from
> +	 * seg_signal_handler()
> +	 */
> +	getcontext(&init_context);
> +
> +	/* The signal handler will enable MSR_TS */
> +	sigaction(SIGUSR1, &trap_sa, NULL);
> +	/* If it does not crash, it will segfault, avoid it to retest */
> +	sigaction(SIGSEGV, &seg_sa, NULL);
> +
> +	raise(SIGUSR1);
> +}
> +
> +int tm_signal_force_msr(void)
> +{
> +	SKIP_IF(!have_htm());
> +
> +	/* Will get back here after COUNT_MAX interactions */
> +	getcontext(&main_context);
> +
> +	if (!first_time++)
> +		tm_trap_test();
> +
> +	return EXIT_SUCCESS;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	test_harness(tm_signal_force_msr, "tm_signal_force_msr");
> +}

Breno Leitao Dec. 4, 2018, 5:51 p.m. UTC | #2
Hi Mikey,

On 11/29/18 12:11 AM, Michael Neuling wrote:
> On Wed, 2018-11-28 at 11:23 -0200, Breno Leitao wrote:
>> A new self test that forces MSR[TS] to be set without calling any TM
>> instruction. This test also tries to cause a page fault at a signal
>> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
>> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
>> when tm_recheckpoint() is called.
>>
>> This test is not deterministic since it is hard to guarantee that the page
>> access will cause a page fault. Tests have shown that the bug could be
>> exposed with few interactions in a buggy kernel. This test is configured to
>> loop 5000x, having a good chance to hit the kernel issue in just one run.
>> This self test takes less than two seconds to run.
> 
> You could try using sigaltstack() to put the ucontext somewhere else. Then you
> could play tricks with that memory to try to force a fault.
> madvise()+MADV_DONTNEED or fadvise()+POSIX_FADV_DONTNEED might do the trick.

Yes, it sounded interesting and I implemented the test using madvise(). Thanks
for the suggestion!

The current approach didn't seem to increase the number of page faults, as it
seems that MADV_DONTNEED makes no difference when using lazy page loading.
This is the test I did, where 'original' is my current patch and 'madvice' is
the patch below:

  Performance counter stats for './original':

                 0      major-faults                                                
           125,100      minor-faults                                                

       2.575479619 seconds time elapsed


  Performance counter stats for './madvice':

                 0      major-faults                                                
           125,099      minor-faults         



Other than that, I didn't see any improvement in the reproduction rate either, although
it is a bit challenging to measure, since it crashes the machine and I can't run a
full statistical model.

This is the current patch I compared to the original one:

---

commit 082a9fe29412943adfa2d6a363f44bac8e81d0ce
Author: Breno Leitao <leitao@debian.org>
Date:   Tue Nov 13 18:02:57 2018 -0500

    selftests/powerpc: New TM signal self test
    
    A new self test that forces MSR[TS] to be set without calling any TM
    instruction. This test also tries to cause a page fault at a signal
    handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
    thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
    when tm_recheckpoint() is called.
    
    This test is not deterministic, since it is hard to guarantee that the page
    access will cause a page fault. In order to force more page faults at
    signal context, the signal handler and the ucontext are being mapped into a
    MADV_DONTNEED memory chunks.
    
    Tests have shown that the bug could be exposed with few interactions in a
    buggy kernel. This test is configured to loop 5000x, having a good chance
    to hit the kernel issue in just one run.  This self test takes less than
    two seconds to run.
    
    This test uses set/getcontext because the kernel will recheckpoint
    zeroed structures, causing the test to segfault, which is undesired because
    the test needs to rerun, so, there is a signal handler for SIGSEGV which
    will restart the test.
    
    Signed-off-by: Breno Leitao <leitao@debian.org>

diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore
index c3ee8393dae8..89679822ebc9 100644
--- a/tools/testing/selftests/powerpc/tm/.gitignore
+++ b/tools/testing/selftests/powerpc/tm/.gitignore
@@ -11,6 +11,7 @@ tm-signal-context-chk-fpu
 tm-signal-context-chk-gpr
 tm-signal-context-chk-vmx
 tm-signal-context-chk-vsx
+tm-signal-force-msr
 tm-vmx-unavail
 tm-unavailable
 tm-trap
diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile
index 9fc2cf6fbc92..58a2ebd13958 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -4,7 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu
 
 TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
 	tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \
-	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn
+	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-force-msr
 
 top_srcdir = ../../../../..
 include ../../lib.mk
@@ -20,6 +20,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c
 $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized -mvsx
 $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64
+$(OUTPUT)/tm-signal-force-msr: CFLAGS += -pthread
 
 SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
 $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
new file mode 100644
index 000000000000..496596f3c1bf
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp.
+ *
+ * This test raises a SIGUSR1 signal, and toggle the MSR[TS]
+ * fields at the signal handler. With MSR[TS] being set, the kernel will
+ * force a recheckpoint, which may cause a segfault when returning to
+ * user space. Since the kernel needs to re-run, the segfault needs to be
+ * caught and handled.
+ *
+ * In order to continue the test even after a segfault, the context is
+ * saved prior to the signal being raised, and it is restored when there is
+ * a segmentation fault. This happens for COUNT_MAX times.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <string.h>
+#include <ucontext.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+#include "tm.h"
+#include "utils.h"
+#include "reg.h"
+
+#define COUNT_MAX       5000		/* Number of interactions */
+
+/* Setting contexts because the test will crash and we want to recover */
+ucontext_t init_context, main_context;
+
+static int count, first_time;
+
+void usr_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+	ucontext_t *ucp = uc;
+	int ret;
+
+	/*
+	 * Allocating memory in a signal handler, and never freeing it on
+	 * purpose, forcing the heap increase, so, the memory leak is what
+	 * we want here.
+	 */
+	ucp->uc_link = mmap(NULL, sizeof(ucontext_t),
+			    PROT_READ | PROT_WRITE,
+			    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+	if (ucp->uc_link == (void *)-1) {
+		perror("Mmap failed");
+		exit(-1);
+	}
+
+	/* Forcing the page to be allocated in a page fault */
+	ret = madvise(ucp->uc_link, sizeof(ucontext_t), MADV_DONTNEED);
+	if (ret) {
+		perror("madvise failed");
+		exit(-1);
+	}
+
+	memcpy(&ucp->uc_link, &ucp->uc_mcontext, sizeof(ucp->uc_mcontext));
+
+	/* Forcing to enable MSR[TM] */
+	ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
+
+	/*
+	 * A fork inside a signal handler seems to be more efficient than a
+	 * fork() prior to the signal being raised.
+	 */
+	if (fork() == 0) {
+		/*
+		 * Both child and parent will return, but, child returns
+		 * with count set so it will exit in the next segfault.
+		 * Parent will continue to loop.
+		 */
+		count = COUNT_MAX;
+	}
+
+	/*
+	 * If the change above does not hit the bug, it will cause a
+	 * segmentation fault, since the ck structures are NULL.
+	 */
+}
+
+void seg_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+	if (count == COUNT_MAX) {
+		/* Return to tm_signal_force_msr() and exit */
+		setcontext(&main_context);
+	}
+
+	count++;
+
+	/* Reexecute the test */
+	setcontext(&init_context);
+}
+
+void tm_trap_test(void)
+{
+	struct sigaction usr_sa, seg_sa;
+	stack_t ss;
+
+	usr_sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
+	usr_sa.sa_sigaction = usr_signal_handler;
+
+	seg_sa.sa_flags = SA_SIGINFO;
+	seg_sa.sa_sigaction = seg_signal_handler;
+
+	/*
+	 * Set initial context. Will get back here from
+	 * seg_signal_handler()
+	 */
+	getcontext(&init_context);
+
+	/* Allocated am alternative signal stack area */
+	ss.ss_sp = mmap(NULL, SIGSTKSZ, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+	ss.ss_size = SIGSTKSZ;
+	ss.ss_flags = 0;
+
+	if (ss.ss_sp == (void *)-1) {
+		perror("mmap error\n");
+		exit(-1);
+	}
+
+	/* Force the allocation through a page fault */
+	if (madvise(ss.ss_sp, SIGSTKSZ, MADV_DONTNEED)) {
+		perror("madvise\n");
+		exit(-1);
+	}
+
+	/* Setting a alternative stack to generate a page fault when
+	 * the signal is raised.
+	 */
+	if (sigaltstack(&ss, NULL)) {
+		perror("sigaltstack\n");
+		exit(-1);
+	}
+
+	/* The signal handler will enable MSR_TS */
+	sigaction(SIGUSR1, &usr_sa, NULL);
+	/* If it does not crash, it will segfault, avoid it to retest */
+	sigaction(SIGSEGV, &seg_sa, NULL);
+
+	raise(SIGUSR1);
+}
+
+int tm_signal_force_msr(void)
+{
+	SKIP_IF(!have_htm());
+
+	/* Will get back here after COUNT_MAX interactions */
+	getcontext(&main_context);
+
+	if (!first_time++)
+		tm_trap_test();
+
+	return EXIT_SUCCESS;
+}
+
+int main(int argc, char **argv)
+{
+	test_harness(tm_signal_force_msr, "tm_signal_force_msr");
+}

Michael Ellerman Dec. 20, 2018, 12:51 p.m. UTC | #3
Breno Leitao <leitao@debian.org> writes:

> A new self test that forces MSR[TS] to be set without calling any TM
> instruction. This test also tries to cause a page fault at a signal
> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
> when tm_recheckpoint() is called.
>
> This test is not deterministic since it is hard to guarantee that the page
> access will cause a page fault. Tests have shown that the bug could be
> exposed with few interactions in a buggy kernel. This test is configured to
> loop 5000x, having a good chance to hit the kernel issue in just one run.
> This self test takes less than two seconds to run.
>
> This test uses set/getcontext because the kernel will recheckpoint
> zeroed structures, causing the test to segfault, which is undesired because
> the test needs to rerun, so, there is a signal handler for SIGSEGV which
> will restart the test.

Hi Breno,

Thanks for the test, some of these TM tests are getting pretty advanced! :)

Unfortunately it doesn't build in a few configurations.

On Ubuntu 18.10 built with powerpc-linux-gnu-gcc I get:

  tm-signal-force-msr.c: In function 'trap_signal_handler':
  tm-signal-force-msr.c:42:19: error: 'union uc_regs_ptr' has no member named 'gp_regs'; did you mean 'uc_regs'?
    ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
                     ^~~~~~~
                     uc_regs
  tm-signal-force-msr.c:17:29: error: left shift count >= width of type [-Werror=shift-count-overflow]
   #define __MASK(X)       (1UL<<(X))
                               ^~
  tm-signal-force-msr.c:20:25: note: in expansion of macro '__MASK'
   #define MSR_TS_S        __MASK(MSR_TS_S_LG) /* Transaction Suspended */
                           ^~~~~~
  tm-signal-force-msr.c:42:38: note: in expansion of macro 'MSR_TS_S'
    ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
                                        ^~~~~~~~

And using powerpc64le-linux-gnu-gcc I get:

  In file included from /usr/powerpc64le-linux-gnu/include/string.h:494,
                   from tm-signal-force-msr.c:10:
  In function 'memcpy',
      inlined from 'trap_signal_handler' at tm-signal-force-msr.c:39:2:
  /usr/powerpc64le-linux-gnu/include/bits/string_fortified.h:34:10: error: '__builtin_memcpy' accessing 1272 bytes at offsets 8 and 168 overlaps 1112 bytes at offset 168 [-Werror=restrict]
     return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

cheers

Breno Leitao Jan. 3, 2019, 1:05 p.m. UTC | #4
Hi Michael,

On 12/20/18 10:51 AM, Michael Ellerman wrote:
> Breno Leitao <leitao@debian.org> writes:
> 
>> A new self test that forces MSR[TS] to be set without calling any TM
>> instruction. This test also tries to cause a page fault at a signal
>> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
>> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
>> when tm_recheckpoint() is called.
>>
>> This test is not deterministic since it is hard to guarantee that the page
>> access will cause a page fault. Tests have shown that the bug could be
>> exposed with few interactions in a buggy kernel. This test is configured to
>> loop 5000x, having a good chance to hit the kernel issue in just one run.
>> This self test takes less than two seconds to run.
>>
>> This test uses set/getcontext because the kernel will recheckpoint
>> zeroed structures, causing the test to segfault, which is undesired because
>> the test needs to rerun, so, there is a signal handler for SIGSEGV which
>> will restart the test.
> And reference the ucontext->mcontext MSR using UCONTEXT_MSR() macro.
> Hi Breno,
> 
> Thanks for the test, some of these TM tests are getting pretty advanced! :)
> 
> Unfortunately it doesn't build in a few configurations.
> 
> On Ubuntu 18.10 built with powerpc-linux-gnu-gcc I get:
> 
>   tm-signal-force-msr.c: In function 'trap_signal_handler':
>   tm-signal-force-msr.c:42:19: error: 'union uc_regs_ptr' has no member named 'gp_regs'; did you mean 'uc_regs'?
>     ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
>                      ^~~~~~~
>                      uc_regs
>   tm-signal-force-msr.c:17:29: error: left shift count >= width of type [-Werror=shift-count-overflow]
>    #define __MASK(X)       (1UL<<(X))
>                                ^~
>   tm-signal-force-msr.c:20:25: note: in expansion of macro '__MASK'
>    #define MSR_TS_S        __MASK(MSR_TS_S_LG) /* Transaction Suspended */
>                            ^~~~~~
>   tm-signal-force-msr.c:42:38: note: in expansion of macro 'MSR_TS_S'
>     ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
>                                         ^~~~~~~~
> 

That is because I missed the -m64 compilation flag in the Makefile. I understand
that this test only makes sense when compiled as 64-bit. Do you agree?

I might also add a macro to address the ucontext->mcontext MSR. This will avoid
problems like that in the future.

 index ae43a614835d..7636bf45d5d5 100644
 --- a/tools/testing/selftests/powerpc/include/utils.h
 +++ b/tools/testing/selftests/powerpc/include/utils.h
 @@ -102,8 +102,10 @@ do {


  #if defined(__powerpc64__)
  #define UCONTEXT_NIA(UC)       (UC)->uc_mcontext.gp_regs[PT_NIP]
 +#define UCONTEXT_MSR(UC)       (UC)->uc_mcontext.gp_regs[PT_MSR]
  #elif defined(__powerpc__)
  #define UCONTEXT_NIA(UC)       (UC)->uc_mcontext.uc_regs->gregs[PT_NIP]
 +#define UCONTEXT_MSR(UC)       (UC)->uc_mcontext.uc_regs->gregs[PT_MSR]
  #else
  #error implement UCONTEXT_NIA
  #endif
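
Editor's sketch: the shift-count-overflow error quoted above comes from 1UL being only 32 bits wide in a 32-bit build, so shifting it by 33 is out of range. A mask built in a fixed 64-bit type avoids that particular diagnostic regardless of -m32/-m64. This is a minimal sketch; msr_bit() and set_ts_suspended() are hypothetical helpers, and only the bit position 33 is taken from the patch. It does not address whether a 32-bit ucontext layout can carry that bit at all.

```c
#include <stdint.h>

#define MSR_TS_S_LG	33	/* Trans Mem state: Suspended (from the patch) */

/* Build a single-bit mask in a 64-bit type, independent of sizeof(long). */
static inline uint64_t msr_bit(unsigned int lg)
{
	return UINT64_C(1) << lg;
}

/* Return msr with the "transaction suspended" bit set. */
static inline uint64_t set_ts_suspended(uint64_t msr)
{
	return msr | msr_bit(MSR_TS_S_LG);
}
```

With these helpers, set_ts_suspended(0) yields 0x200000000, the value the original __MASK(MSR_TS_S_LG) macro produces in a 64-bit build.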

> And using powerpc64le-linux-gnu-gcc I get:
> 
>   In file included from /usr/powerpc64le-linux-gnu/include/string.h:494,
>                    from tm-signal-force-msr.c:10:
>   In function 'memcpy',
>       inlined from 'trap_signal_handler' at tm-signal-force-msr.c:39:2:
>   /usr/powerpc64le-linux-gnu/include/bits/string_fortified.h:34:10: error: '__builtin_memcpy' accessing 1272 bytes at offsets 8 and 168 overlaps 1112 bytes at offset 168 [-Werror=restrict]
>      return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
>             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Damn, that is because I do not know how to use C pointers. Fixing it on v3 also.
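
Editor's sketch of the pointer issue behind the -Werror=restrict report: memcpy(&ucp->uc_link, ...) writes over the uc_link field and whatever follows it inside the same structure, which overlaps the source, instead of writing into the buffer uc_link points to. The toy types below are hypothetical stand-ins (struct ctx for ucontext_t, struct regs for the mcontext), and copying into the linked context's own register area is only one plausible reading of the intent, not the author's v3 fix.

```c
#include <stdlib.h>
#include <string.h>

struct regs {
	unsigned long gp[48];	/* hypothetical stand-in for the mcontext */
};

struct ctx {
	struct ctx *link;	/* stand-in for uc_link */
	struct regs regs;	/* stand-in for uc_mcontext */
};

/* Allocate a linked context and copy the registers into it. 0 on success. */
int link_copy_of_regs(struct ctx *c)
{
	c->link = malloc(sizeof(*c->link));
	if (!c->link)
		return -1;
	memset(c->link, 0, sizeof(*c->link));

	/*
	 * The destination is the memory c->link points to. Passing
	 * &c->link instead would overwrite the pointer itself and the
	 * bytes after it, overlapping the source.
	 */
	memcpy(&c->link->regs, &c->regs, sizeof(c->regs));
	return 0;
}
```

After a successful call the source registers are unchanged and the linked context holds an independent copy, so source and destination never overlap.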

Michael Ellerman Jan. 8, 2019, 10:16 a.m. UTC | #5
Breno Leitao <leitao@debian.org> writes:
> On 12/20/18 10:51 AM, Michael Ellerman wrote:
>> Breno Leitao <leitao@debian.org> writes:
>> 
>>> A new self test that forces MSR[TS] to be set without calling any TM
>>> instruction. This test also tries to cause a page fault at a signal
>>> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
>>> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
>>> when tm_recheckpoint() is called.
>>>
>>> This test is not deterministic since it is hard to guarantee that the page
>>> access will cause a page fault. Tests have shown that the bug could be
>>> exposed with few interactions in a buggy kernel. This test is configured to
>>> loop 5000x, having a good chance to hit the kernel issue in just one run.
>>> This self test takes less than two seconds to run.
>>>
>>> This test uses set/getcontext because the kernel will recheckpoint
>>> zeroed structures, causing the test to segfault, which is undesired because
>>> the test needs to rerun, so, there is a signal handler for SIGSEGV which
>>> will restart the test.
>> And reference the ucontext->mcontext MSR using UCONTEXT_MSR() macro.
>> Hi Breno,
>> 
>> Thanks for the test, some of these TM tests are getting pretty advanced! :)
>> 
>> Unfortunately it doesn't build in a few configurations.
>> 
>> On Ubuntu 18.10 built with powerpc-linux-gnu-gcc I get:
>> 
>>   tm-signal-force-msr.c: In function 'trap_signal_handler':
>>   tm-signal-force-msr.c:42:19: error: 'union uc_regs_ptr' has no member named 'gp_regs'; did you mean 'uc_regs'?
>>     ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
>>                      ^~~~~~~
>>                      uc_regs
>>   tm-signal-force-msr.c:17:29: error: left shift count >= width of type [-Werror=shift-count-overflow]
>>    #define __MASK(X)       (1UL<<(X))
>>                                ^~
>>   tm-signal-force-msr.c:20:25: note: in expansion of macro '__MASK'
>>    #define MSR_TS_S        __MASK(MSR_TS_S_LG) /* Transaction Suspended */
>>                            ^~~~~~
>>   tm-signal-force-msr.c:42:38: note: in expansion of macro 'MSR_TS_S'
>>     ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
>>                                         ^~~~~~~~
>> 
>
> That is because I missed the -m64 compilation flag in the Makefile. I understand
> that this test only makes sense when compiled as 64-bit. Do you agree?

I think the test could work as a 32-bit binary on a 64-bit kernel, but I
don't mind if you force it to build 64-bit.

cheers

Patch

diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore
index c3ee8393dae8..89679822ebc9 100644
--- a/tools/testing/selftests/powerpc/tm/.gitignore
+++ b/tools/testing/selftests/powerpc/tm/.gitignore
@@ -11,6 +11,7 @@  tm-signal-context-chk-fpu
 tm-signal-context-chk-gpr
 tm-signal-context-chk-vmx
 tm-signal-context-chk-vsx
+tm-signal-force-msr
 tm-vmx-unavail
 tm-unavailable
 tm-trap
diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile
index 9fc2cf6fbc92..58a2ebd13958 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -4,7 +4,7 @@  SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu
 
 TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
 	tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \
-	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn
+	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-force-msr
 
 top_srcdir = ../../../../..
 include ../../lib.mk
@@ -20,6 +20,7 @@  $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c
 $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized -mvsx
 $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64
+$(OUTPUT)/tm-signal-force-msr: CFLAGS += -pthread
 
 SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
 $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
new file mode 100644
index 000000000000..4441d61c2328
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
@@ -0,0 +1,115 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <string.h>
+#include <ucontext.h>
+#include <unistd.h>
+
+#include "tm.h"
+#include "utils.h"
+
+#define __MASK(X)       (1UL<<(X))
+#define MSR_TS_S_LG     33                  /* Trans Mem state: Suspended */
+#define MSR_TM          __MASK(MSR_TM_LG)   /* Transactional Mem Available */
+#define MSR_TS_S        __MASK(MSR_TS_S_LG) /* Transaction Suspended */
+
+#define COUNT_MAX       5000                /* Number of interactions */
+
+/* Setting contexts because the test will crash and we want to recover */
+ucontext_t init_context, main_context;
+
+static int count, first_time;
+
+void trap_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+	ucontext_t *ucp = uc;
+
+	/*
+	 * Allocating memory in a signal handler, and never freeing it on
+	 * purpose, forcing the heap increase, so, the memory leak is what
+	 * we want here.
+	 */
+	ucp->uc_link = malloc(sizeof(ucontext_t));
+	memcpy(&ucp->uc_link, &ucp->uc_mcontext, sizeof(ucp->uc_mcontext));
+
+	/* Forcing to enable MSR[TM] */
+	ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
+
+	/*
+	 * A fork inside a signal handler seems to be more efficient than a
+	 * fork() prior to the signal being raised.
+	 */
+	if (fork() == 0) {
+		/*
+		 * Both child and parent will return, but, child returns
+		 * with count set so it will exit in the next segfault.
+		 * Parent will continue to loop.
+		 */
+		count = COUNT_MAX;
+	}
+
+	/*
+	 * If the change above does not hit the bug, it will cause a
+	 * segmentation fault, since the ck structures are NULL.
+	 */
+}
+
+void seg_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+	if (count == COUNT_MAX) {
+		/* Return to tm_signal_force_msr() and exit */
+		setcontext(&main_context);
+	}
+
+	count++;
+	/* Reexecute the test */
+	setcontext(&init_context);
+}
+
+void tm_trap_test(void)
+{
+	struct sigaction trap_sa, seg_sa;
+
+	trap_sa.sa_flags = SA_SIGINFO;
+	trap_sa.sa_sigaction = trap_signal_handler;
+
+	seg_sa.sa_flags = SA_SIGINFO;
+	seg_sa.sa_sigaction = seg_signal_handler;
+
+	/*
+	 * Set initial context. Will get back here from
+	 * seg_signal_handler()
+	 */
+	getcontext(&init_context);
+
+	/* The signal handler will enable MSR_TS */
+	sigaction(SIGUSR1, &trap_sa, NULL);
+	/* If it does not crash, it will segfault, avoid it to retest */
+	sigaction(SIGSEGV, &seg_sa, NULL);
+
+	raise(SIGUSR1);
+}
+
+int tm_signal_force_msr(void)
+{
+	SKIP_IF(!have_htm());
+
+	/* Will get back here after COUNT_MAX interactions */
+	getcontext(&main_context);
+
+	if (!first_time++)
+		tm_trap_test();
+
+	return EXIT_SUCCESS;
+}
+
+int main(int argc, char **argv)
+{
+	test_harness(tm_signal_force_msr, "tm_signal_force_msr");
+}