Message ID | 20210426101524.a6atjl3hyo2fpvyq@arm.com |
---|---|
State | New |
Headers | show |
Series | arm: Fix wrong code with MVE V2DImode loads and stores [PR99960] | expand |
Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568669.html On 26/04/2021 11:15, Alex Coplan via Gcc-patches wrote: > Hi, > > As the PR shows, we currently miscompile V2DImode loads and stores for > MVE. We're currently using 64-bit loads/stores, but need to be using > 128-bit vector loads and stores. > > Some intrinsics tests were checking that we (incorrectly) used the > 64-bit loads/stores: these have been updated. > > Regression tested an arm-eabi cross configured > --with-arch=armv8.1-m.main+mve --with-float=hard. The patch has the > following effect on the testsuite: > > FAIL->PASS: c-c++-common/torture/vector-compare-1.c -O0 execution test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -O0 execution test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -O1 execution test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -O2 execution test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -O3 -g execution test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -Os execution test > FAIL->PASS: gcc.dg/compat/vector-1 c_compat_x_tst.o-c_compat_y_tst.o execute > FAIL->PASS: gcc.dg/torture/pr52407.c -O0 execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -O1 execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -O2 execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -O3 -g execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -Os execution test > FAIL->PASS: gcc.dg/torture/pr57748-1.c -O0 execution test > FAIL->PASS: gcc.dg/torture/pr57748-2.c -O0 execution test > FAIL->PASS: gcc.dg/torture/pr57748-3.c -O0 execution test > FAIL->PASS: gcc.dg/torture/pr57748-4.c -O0 execution test > FAIL->PASS: gcc.dg/torture/pr58041.c -O0 execution test > FAIL->PASS: gcc.dg/torture/pr58041.c -O1 execution test > FAIL->PASS: gcc.dg/torture/pr58041.c -O2 execution test > FAIL->PASS: gcc.dg/torture/pr58041.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test > FAIL->PASS: gcc.dg/torture/pr58041.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test > FAIL->PASS: gcc.dg/torture/pr58041.c -O3 -g execution test FAIL->PASS: gcc.dg/torture/pr58041.c -Os execution test > FAIL->PASS: gcc.dg/torture/pr61346.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > FAIL->PASS: gcc.dg/torture/pr61346.c -O3 -g execution test > FAIL->PASS: gcc.dg/torture/vshuf-v2df.c -O2 execution test > FAIL->PASS: gcc.dg/torture/vshuf-v2di.c -O2 execution test > > Bootstrap and regtest on arm-linux-gnueabihf in progress. > > OK for trunk and eventual backports to 11 and 10 branches if regstrap > looks good? > > Thanks, > Alex > > gcc/ChangeLog: > > PR target/99960 > * config/arm/mve.md (*mve_mov<mode>): Simplify output code. Use > vldrw.u32 and vstrw.32 for V2D[IF]mode loads and stores. > > gcc/testsuite/ChangeLog: > > PR target/99960 > * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c: > Update now that we're (correctly) using full 128-bit vector > loads/stores. > * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise. > * gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c: > Likewise.
> -----Original Message----- > From: Alex Coplan <Alex.Coplan@arm.com> > Sent: 26 April 2021 11:15 > To: gcc-patches@gcc.gnu.org > Cc: nickc@redhat.com; Richard Earnshaw <Richard.Earnshaw@arm.com>; > Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo > Tkachov <Kyrylo.Tkachov@arm.com> > Subject: [PATCH] arm: Fix wrong code with MVE V2DImode loads and stores > [PR99960] > > Hi, > > As the PR shows, we currently miscompile V2DImode loads and stores for > MVE. We're currently using 64-bit loads/stores, but need to be using > 128-bit vector loads and stores. > > Some intrinsics tests were checking that we (incorrectly) used the > 64-bit loads/stores: these have been updated. > > Regression tested an arm-eabi cross configured > --with-arch=armv8.1-m.main+mve --with-float=hard. The patch has the > following effect on the testsuite: > > FAIL->PASS: c-c++-common/torture/vector-compare-1.c -O0 execution > test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -O0 execution test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -O1 execution test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -O2 execution test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -O2 -flto -fno-use-linker- > plugin -flto-partition=none execution test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -O2 -flto -fuse-linker-plugin - > fno-fat-lto-objects execution test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -O3 -g execution test > FAIL->PASS: gcc.c-torture/execute/pr92618.c -Os execution test > FAIL->PASS: gcc.dg/compat/vector-1 c_compat_x_tst.o-c_compat_y_tst.o > execute > FAIL->PASS: gcc.dg/torture/pr52407.c -O0 execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -O1 execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -O2 execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -O2 -flto -fno-use-linker-plugin -flto- > partition=none execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -O2 -flto -fuse-linker-plugin -fno-fat- > lto-objects execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -O3 -fomit-frame-pointer -funroll- > loops -fpeel-loops -ftracer -finline-functions execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -O3 -g execution test > FAIL->PASS: gcc.dg/torture/pr52407.c -Os execution test > FAIL->PASS: gcc.dg/torture/pr57748-1.c -O0 execution test > FAIL->PASS: gcc.dg/torture/pr57748-2.c -O0 execution test > FAIL->PASS: gcc.dg/torture/pr57748-3.c -O0 execution test > FAIL->PASS: gcc.dg/torture/pr57748-4.c -O0 execution test > FAIL->PASS: gcc.dg/torture/pr58041.c -O0 execution test > FAIL->PASS: gcc.dg/torture/pr58041.c -O1 execution test > FAIL->PASS: gcc.dg/torture/pr58041.c -O2 execution test > FAIL->PASS: gcc.dg/torture/pr58041.c -O2 -flto -fno-use-linker-plugin -flto- > partition=none execution test > FAIL->PASS: gcc.dg/torture/pr58041.c -O2 -flto -fuse-linker-plugin -fno-fat- > lto-objects execution test > FAIL->PASS: gcc.dg/torture/pr58041.c -O3 -g execution test FAIL->PASS: > gcc.dg/torture/pr58041.c -Os execution test > FAIL->PASS: gcc.dg/torture/pr61346.c -O3 -fomit-frame-pointer -funroll- > loops -fpeel-loops -ftracer -finline-functions execution test > FAIL->PASS: gcc.dg/torture/pr61346.c -O3 -g execution test > FAIL->PASS: gcc.dg/torture/vshuf-v2df.c -O2 execution test > FAIL->PASS: gcc.dg/torture/vshuf-v2di.c -O2 execution test > > Bootstrap and regtest on arm-linux-gnueabihf in progress. > > OK for trunk and eventual backports to 11 and 10 branches if regstrap > looks good? Ok. I had looked at it earlier but had forgotten to reply... Thanks, Kyrill > > Thanks, > Alex > > gcc/ChangeLog: > > PR target/99960 > * config/arm/mve.md (*mve_mov<mode>): Simplify output code. > Use > vldrw.u32 and vstrw.32 for V2D[IF]mode loads and stores. > > gcc/testsuite/ChangeLog: > > PR target/99960 > * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c: > Update now that we're (correctly) using full 128-bit vector > loads/stores. > * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c: > Likewise. > * gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise. > * gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c: > Likewise.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md index 7467d5f4d57..5c11885fb73 100644 --- a/gcc/config/arm/mve.md +++ b/gcc/config/arm/mve.md @@ -41,44 +41,19 @@ (define_insn "*mve_mov<mode>" if (which_alternative == 4 || which_alternative == 7) { - rtx ops[2]; - int regno = (which_alternative == 7) - ? REGNO (operands[1]) : REGNO (operands[0]); - - ops[0] = operands[0]; - ops[1] = operands[1]; - if (<MODE>mode == V2DFmode || <MODE>mode == V2DImode) - { - if (which_alternative == 7) - { - ops[1] = gen_rtx_REG (DImode, regno); - output_asm_insn ("vstr.64\t%P1, %E0",ops); - } - else - { - ops[0] = gen_rtx_REG (DImode, regno); - output_asm_insn ("vldr.64\t%P0, %E1",ops); - } - } - else if (<MODE>mode == TImode) + if (<MODE>mode == V2DFmode || <MODE>mode == V2DImode || <MODE>mode == TImode) { if (which_alternative == 7) - output_asm_insn ("vstr.64\t%q1, %E0",ops); + output_asm_insn ("vstrw.32\t%q1, %E0", operands); else - output_asm_insn ("vldr.64\t%q0, %E1",ops); + output_asm_insn ("vldrw.u32\t%q0, %E1",operands); } else { if (which_alternative == 7) - { - ops[1] = gen_rtx_REG (TImode, regno); - output_asm_insn ("vstr<V_sz_elem1>.<V_sz_elem>\t%q1, %E0",ops); - } + output_asm_insn ("vstr<V_sz_elem1>.<V_sz_elem>\t%q1, %E0", operands); else - { - ops[0] = gen_rtx_REG (TImode, regno); - output_asm_insn ("vldr<V_sz_elem1>.<V_sz_elem>\t%q0, %E1",ops); - } + output_asm_insn ("vldr<V_sz_elem1>.<V_sz_elem>\t%q0, %E1", operands); } return ""; } diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c index 7420d0198e7..a9b1f81b62d 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c @@ -11,6 +11,6 @@ foo (uint64x2_t * addr) } /* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler-times "vldr.64" 1 } } */ -/* { dg-final { scan-assembler-times "vstr.64" 1 } } */ +/* { dg-final { scan-assembler-times "vldrw.u32" 1 } } */ +/* { dg-final { scan-assembler-times "vstrw.32" 1 } } */ /* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c index ebe5b2fd70c..e32a06695ae 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c @@ -11,6 +11,6 @@ foo (uint64x2_t * addr) } /* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler-times "vldr.64" 1 } } */ -/* { dg-final { scan-assembler-times "vstr.64" 1 } } */ +/* { dg-final { scan-assembler-times "vldrw.u32" 1 } } */ +/* { dg-final { scan-assembler-times "vstrw.32" 1 } } */ /* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c index 231a24a1e55..bb06cf88e32 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c @@ -10,6 +10,6 @@ int64x2_t foo (uint64x2_t * addr, mve_pred16_t p) /* { dg-final { scan-assembler "vpst" } } */ /* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler-times "vldr.64" 1 } } */ -/* { dg-final { scan-assembler-times "vstr.64" 1 } } */ +/* { dg-final { scan-assembler-times "vldrw.u32" 1 } } */ +/* { dg-final { scan-assembler-times "vstrw.32" 1 } } */ /* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c index b8d9b5c1391..558115d49ef 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c @@ -10,6 +10,6 @@ uint64x2_t foo (uint64x2_t * addr, mve_pred16_t p) /* { dg-final { scan-assembler "vpst" } } */ /* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */ -/* { dg-final { scan-assembler-times "vldr.64" 1 } } */ -/* { dg-final { scan-assembler-times "vstr.64" 1 } } */ +/* { dg-final { scan-assembler-times "vldrw.u32" 1 } } */ +/* { dg-final { scan-assembler-times "vstrw.32" 1 } } */ /* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c index bf6692fe573..cc5e6358bee 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int.c @@ -27,6 +27,5 @@ foo () /* { dg-final { scan-assembler-times "vstrb.8" 2 } } */ /* { dg-final { scan-assembler-times "vstrh.16" 2 } } */ -/* { dg-final { scan-assembler-times "vstrw.32" 2 } } */ -/* { dg-final { scan-assembler-times "vstr.64" 2 } } */ +/* { dg-final { scan-assembler-times "vstrw.32" 4 } } */ /* { dg-final { scan-assembler-not "__ARM_undef" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c index 4f66a07ac29..bfeb52b4dd6 100644 --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c @@ -28,6 +28,5 @@ foo () /* { dg-final { scan-assembler-times "vstrb.8" 2 } } */ /* { dg-final { scan-assembler-times "vstrh.16" 2 } } */ -/* { dg-final { scan-assembler-times "vstrw.32" 2 } } */ -/* { dg-final { scan-assembler-times "vstr.64" 2 } } */ +/* { dg-final { scan-assembler-times "vstrw.32" 4 } } */ /* { dg-final { scan-assembler-not "__ARM_undef" } } */