Message ID | 20240911204158.2034295-4-seanjc@google.com |
---|---|
State | Changes Requested |
Series | KVM: selftests: Morph max_guest_mem to mmu_stress |
On Wed, Sep 11, 2024, Sean Christopherson wrote:
> Use u64_replace_bits() instead of u64p_replace_bits() to set PMCR.N in
> arm64's vPMU counter access test to fudge around what appears to be a gcc
> bug.  With the recent change to have vcpu_get_reg() return a value in lieu
> of an out-param, some versions of gcc completely ignore the operation
> performed by set_pmcr_n(), i.e. ignore the output param.

Filed a gcc bug:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116912

I'll report back if anything interesting comes out of that bug.

On Mon, Sep 30, 2024, Sean Christopherson wrote:
> On Wed, Sep 11, 2024, Sean Christopherson wrote:
> > Use u64_replace_bits() instead of u64p_replace_bits() to set PMCR.N in
> > arm64's vPMU counter access test to fudge around what appears to be a gcc
> > bug.  With the recent change to have vcpu_get_reg() return a value in lieu
> > of an out-param, some versions of gcc completely ignore the operation
> > performed by set_pmcr_n(), i.e. ignore the output param.
>
> Filed a gcc bug:
>
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116912
>
> I'll report back if anything interesting comes out of that bug.

Well, there goes several hours that I'll never get back.  Selftests are
compiled with -O2, which enables strict-aliasing optimizations, and "unsigned
long" and "unsigned long long" technically don't alias despite being the same
size on 64-bit builds, so the compiler is allowed to optimize away the load.

*sigh*

I'll replace this with a patch to disable strict-aliasing, which the kernel
has done since forever (literally predates git).

Grr.

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 48d32c5aa3eb..a6f92129bb02 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -235,10 +235,10 @@ CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \
 	-Wno-gnu-variable-sized-type-not-at-end -MD -MP -DCONFIG_64BIT \
 	-fno-builtin-memcmp -fno-builtin-memcpy \
 	-fno-builtin-memset -fno-builtin-strnlen \
-	-fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
-	-I$(LINUX_TOOL_ARCH_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude \
-	-I$(<D) -Iinclude/$(ARCH_DIR) -I ../rseq -I.. $(EXTRA_CFLAGS) \
-	$(KHDR_INCLUDES)
+	-fno-stack-protector -fno-PIE -fno-strict-aliasing \
+	-I$(LINUX_TOOL_INCLUDE) -I$(LINUX_TOOL_ARCH_INCLUDE) \
+	-I$(LINUX_HDR_PATH) -Iinclude -I$(<D) -Iinclude/$(ARCH_DIR) \
+	-I ../rseq -I.. $(EXTRA_CFLAGS) $(KHDR_INCLUDES)

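To make the aliasing explanation above concrete, here is a minimal sketch of the pattern and one way to sidestep it without changing compiler flags.  All names (demo_replace_field_ptr(), demo_update_field(), DEMO_FIELD_*) are hypothetical and are not the selftests' or the kernel's helpers.  The sketch assumes, as is typical for 64-bit Linux userspace, that uint64_t is "unsigned long" while __u64 is "unsigned long long"; the field position (bits 15:11) is taken from the "bfi x2, x1, #11, #5" in the disassembly quoted in the patch.

```c
/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define DEMO_FIELD_SHIFT	11
#define DEMO_FIELD_MASK		(0x1fULL << DEMO_FIELD_SHIFT)	/* bits 15:11 */

/* Pointer-based update, similar in shape to u64p_replace_bits(). */
static inline void demo_replace_field_ptr(unsigned long long *reg,
					  unsigned long long val)
{
	*reg = (*reg & ~DEMO_FIELD_MASK) |
	       ((val << DEMO_FIELD_SHIFT) & DEMO_FIELD_MASK);
}

/*
 * Risky pattern, shown only as a comment so that nothing here relies on
 * undefined behavior:
 *
 *	unsigned long reg = ...;
 *	demo_replace_field_ptr((unsigned long long *)&reg, val);
 *
 * Under strict aliasing the compiler may assume that a store through an
 * "unsigned long long *" cannot modify an "unsigned long" object, and so
 * may keep using the old value of reg after the call.
 */

/* Safe variant: hand the helper an object of the type it expects. */
static inline unsigned long demo_update_field(unsigned long reg,
					      unsigned long long val)
{
	unsigned long long tmp = reg;

	demo_replace_field_ptr(&tmp, val);
	return (unsigned long)tmp;
}
```

With the values from the failing run, demo_update_field(0x3040, 0) clears the field and leaves 0x40.  Building with -fno-strict-aliasing, as the Makefile change does, is the other way out: it lets the cast-based pattern behave the way the code's author intended.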
diff --git a/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c b/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c
index 30d9c9e7ae35..74da8252b884 100644
--- a/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c
+++ b/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c
@@ -45,11 +45,6 @@ static uint64_t get_pmcr_n(uint64_t pmcr)
 	return FIELD_GET(ARMV8_PMU_PMCR_N, pmcr);
 }
 
-static void set_pmcr_n(uint64_t *pmcr, uint64_t pmcr_n)
-{
-	u64p_replace_bits((__u64 *) pmcr, pmcr_n, ARMV8_PMU_PMCR_N);
-}
-
 static uint64_t get_counters_mask(uint64_t n)
 {
 	uint64_t mask = BIT(ARMV8_PMU_CYCLE_IDX);
@@ -484,13 +479,12 @@ static void test_create_vpmu_vm_with_pmcr_n(uint64_t pmcr_n, bool expect_fail)
 	vcpu = vpmu_vm.vcpu;
 
 	pmcr_orig = vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_PMCR_EL0));
-	pmcr = pmcr_orig;
 
 	/*
 	 * Setting a larger value of PMCR.N should not modify the field, and
 	 * return a success.
 	 */
-	set_pmcr_n(&pmcr, pmcr_n);
+	pmcr = u64_replace_bits(pmcr_orig, pmcr_n, ARMV8_PMU_PMCR_N);
 	vcpu_set_reg(vcpu, KVM_ARM64_SYS_REG(SYS_PMCR_EL0), pmcr);
 	pmcr = vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(SYS_PMCR_EL0));
 

Use u64_replace_bits() instead of u64p_replace_bits() to set PMCR.N in
arm64's vPMU counter access test to fudge around what appears to be a gcc
bug.  With the recent change to have vcpu_get_reg() return a value in lieu
of an out-param, some versions of gcc completely ignore the operation
performed by set_pmcr_n(), i.e. ignore the output param.

The issue is most easily observed by making set_pmcr_n() noinline and
wrapping the call with printf(), e.g. sans comments, for this code:

  printf("orig = %lx, next = %lx, want = %lu\n", pmcr_orig, pmcr, pmcr_n);
  set_pmcr_n(&pmcr, pmcr_n);
  printf("orig = %lx, next = %lx, want = %lu\n", pmcr_orig, pmcr, pmcr_n);

gcc-13 generates:

 0000000000401c90 <set_pmcr_n>:
  401c90:       f9400002        ldr     x2, [x0]
  401c94:       b3751022        bfi     x2, x1, #11, #5
  401c98:       f9000002        str     x2, [x0]
  401c9c:       d65f03c0        ret

 0000000000402660 <test_create_vpmu_vm_with_pmcr_n>:
  402724:       aa1403e3        mov     x3, x20
  402728:       aa1503e2        mov     x2, x21
  40272c:       aa1603e0        mov     x0, x22
  402730:       aa1503e1        mov     x1, x21
  402734:       940060ff        bl      41ab30 <_IO_printf>
  402738:       aa1403e1        mov     x1, x20
  40273c:       910183e0        add     x0, sp, #0x60
  402740:       97fffd54        bl      401c90 <set_pmcr_n>
  402744:       aa1403e3        mov     x3, x20
  402748:       aa1503e2        mov     x2, x21
  40274c:       aa1503e1        mov     x1, x21
  402750:       aa1603e0        mov     x0, x22
  402754:       940060f7        bl      41ab30 <_IO_printf>

with the value stored in [sp + 0x60] ignored by both printf() above and
in the test proper, resulting in a false failure due to vcpu_set_reg()
simply storing the original value, not the intended value.

  $ ./vpmu_counter_access
  Random seed: 0x6b8b4567
  orig = 3040, next = 3040, want = 0
  orig = 3040, next = 3040, want = 0
  ==== Test Assertion Failure ====
    aarch64/vpmu_counter_access.c:505: pmcr_n == get_pmcr_n(pmcr)
    pid=71578 tid=71578 errno=9 - Bad file descriptor
       1	0x400673: run_access_test at vpmu_counter_access.c:522
       2	 (inlined by) main at vpmu_counter_access.c:643
       3	0x4132d7: __libc_start_call_main at libc-start.o:0
       4	0x413653: __libc_start_main at ??:0
       5	0x40106f: _start at ??:0
    Failed to update PMCR.N to 0 (received: 6)

Somewhat bizarrely, gcc-11 also exhibits the same behavior, but only if
set_pmcr_n() is marked noinline, whereas gcc-13 fails even if set_pmcr_n()
is inlined in its sole caller.

All signs point to this being a gcc bug, as clang doesn't exhibit the same
issue, the code generated by u64p_replace_bits() is correct, and the error
is somewhat transient, e.g. varies between gcc versions and depends on
surrounding code.

For now, work around the issue to unblock the vcpu_get_reg() cleanup, and
because arguably using u64_replace_bits() makes the code a wee bit more
intuitive.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)
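
As a rough illustration of why the by-value form reads more directly, the sketch below mimics the shape of u64_replace_bits() and FIELD_GET() with hypothetical helpers.  demo_replace_bits() and demo_field_get() are assumptions made for this example and are not the kernel's implementations; both expect a non-zero, contiguous mask.  Because the value goes in and comes out by value, no pointer cast is involved and the aliasing question never comes up.

```c
#include <stdint.h>

/* Hypothetical helper: extract the field selected by a contiguous mask. */
static inline uint64_t demo_field_get(uint64_t reg, uint64_t mask)
{
	uint64_t low = mask & (~mask + 1);	/* lowest set bit of the mask */

	return (reg & mask) / low;
}

/* Hypothetical helper: return "old" with the masked field set to "val". */
static inline uint64_t demo_replace_bits(uint64_t old, uint64_t val,
					 uint64_t mask)
{
	uint64_t low = mask & (~mask + 1);	/* lowest set bit of the mask */

	return (old & ~mask) | ((val * low) & mask);
}
```

With a mask of 0xf800 (bits 15:11, per the bfi in the disassembly) and the register value 0x3040 from the failing run, demo_field_get() yields 6, matching the "received: 6" in the assertion message, and demo_replace_bits(0x3040, 0, 0xf800) yields 0x40.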