diff mbox series

[committed,testsuite] Enable pr94600-{1,3}.c tests for nvptx

Message ID efc375c0-dd94-ec9d-6a77-b63befe0d069@suse.de
State New
Headers show
Series [committed,testsuite] Enable pr94600-{1,3}.c tests for nvptx | expand

Commit Message

Tom de Vries Oct. 1, 2020, 11:25 a.m. UTC
[ was: Re: [committed][testsuite] Re-enable pr94600-{1,3}.c tests for arm ]

On 10/1/20 7:38 AM, Hans-Peter Nilsson wrote:
> On Wed, 30 Sep 2020, Tom de Vries wrote:
> 
>> [ was: Re: [committed][testsuite] Require non_strict_align in
>> pr94600-{1,3}.c ]
>>
>> On 9/30/20 4:53 AM, Hans-Peter Nilsson wrote:
>>> On Thu, 24 Sep 2020, Tom de Vries wrote:
>>>
>>>> Hi,
>>>>
>>>> With the nvptx target, we run into:
>>>> ...
>>>> FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(mem/v" 6
>>>> FAIL: gcc.dg/pr94600-1.c scan-rtl-dump-times final "\\(set \\(mem/v" 6
>>>> FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(mem/v" 1
>>>> FAIL: gcc.dg/pr94600-3.c scan-rtl-dump-times final "\\(set \\(mem/v" 1
>>>> ...
>>>> The scans attempt to check for volatile stores, but on nvptx we have memcpy
>>>> instead.
>>>>
>>>> This is due to nvptx being a STRICT_ALIGNMENT target, which has the effect
>>>> that the TYPE_MODE for the store target is set to BKLmode in
>>>> compute_record_mode.
>>>>
>>>> Fix the FAILs by requiring effective target non_strict_align.
>>>
>>> No, that's wrong.  There's more than that at play; it worked for
>>> the strict-alignment targets where it was tested at the time.
>>>
>>
>> Hi,
>>
>> thanks for letting me know.
>>
>>> The test is a valuable canary for this kind of bug.  You now
>>> disabled it for strict-alignment targets.
>>>
>>> Please revert and add your target specifier instead, if you
>>> don't feel like investigating further.
>>
>> I've analyzed the compilation on strict-alignment target arm-eabi, and
> 
> An analysis should result in more than that statement.
> 

Well, it refers to the analysis in the commit log of the patch, sorry if
that was not obvious.

>> broadened the effective target to (non_strict_align ||
>> pcc_bitfield_type_matters).
> 
> That's *also* not right.  I'm guessing your nvptx fails because
> it has 64-bit alignment requirement, but no 32-bit writes.
> ...um, no that can't be it, nvptx seems to have them.  Costs?
> Yes, probably your #define MOVE_RATIO(SPEED) 4.
> 
> The writes are to 32-bit aligned addresses which gcc can deduce
> (also for strict-alignment targets) because it's a literal,
> where it isn't explicitly declared to be attribute-aligned
> 
> You should have noticed the weirness in that you "only" needed
> to tweak pr94600-1.c and -3.c, not even pr94600-2.c, which
> should be the case if it was just the test-case getting the
> predicates wrong.  This points at your MOVE_RATIO, together with
> middle-end not applying it consistently for -2.c.
> 
> Again, please just skip for nvptx (don't mix-n-match general
> predicates) unless you really look into the reason you don't get
> 6 single 32-bit-writes only in *some* of the cases.

Thanks for the pointer to pr94600-2.c.  I've compared the behaviour
between pr94600-1.c and pr94600-2.c and figured out why in one case we
get the load/store pair, and in the other the memcpy.  See rationale in
commit below.

Committed to trunk.

Thanks,
- Tom

Comments

Hans-Peter Nilsson Oct. 2, 2020, 5:32 a.m. UTC | #1
On Thu, 1 Oct 2020, Tom de Vries wrote:

> [ was: Re: [committed][testsuite] Re-enable pr94600-{1,3}.c tests for arm ]
>
> On 10/1/20 7:38 AM, Hans-Peter Nilsson wrote:
> > On Wed, 30 Sep 2020, Tom de Vries wrote:
> >> I've analyzed the compilation on strict-alignment target arm-eabi, and
> >
> > An analysis should result in more than that statement.
> >
>
> Well, it refers to the analysis in the commit log of the patch, sorry if
> that was not obvious.

Aha, I think I only saw your first commit, thanks.  Yes, that
looked more appropriate, but I would have preferred a proper
review to a commit-as-obvious (assuming that was the track when
nothing else was stated), since this was more than just target
gating or *trivial* missing predicates.

> Thanks for the pointer to pr94600-2.c.  I've compared the behaviour
> between pr94600-1.c and pr94600-2.c and figured out why in one case we
> get the load/store pair, and in the other the memcpy.  See rationale in
> commit below.

You may be on the right track judging from the commit log, and I
see my hunch about MOVE_RATIO wasn't far off either.  I guess I
should look into it too with rested eyes, but I'm happy with
this direction; not disabling the test for many targets.

Thanks for looking into it again.

brgds, H-P
diff mbox series

Patch

[testsuite] Enable pr94600-{1,3}.c tests for nvptx

When compiling test-case pr94600-1.c for nvptx, this gimple mem move:
...
  MEM[(volatile struct t0 *)655404B] ={v} a0[0];
...
is expanded into a memcpy, but when compiling pr94600-2.c instead, this similar
gimple mem move:
...
  MEM[(volatile struct t0 *)655404B] ={v} a00;
...
is expanded into a 32-bit load/store pair.

In both cases, emit_block_move is called.

In the latter case, can_move_by_pieces (4 /* byte-size */, 32 /* bit-align */)
is called, which returns true (because by_pieces_ninsns returns 1, which is
smaller than the MOVE_RATIO of 4).

In the former case, can_move_by_pieces (4 /* byte-size */, 8 /* bit-align */)
is called, which returns false (because by_pieces_ninsns returns 4, which is
not smaller than the MOVE_RATIO of 4).

So the difference in code generation is explained by the alignment.  The
difference in alignment comes from the move sources: a0[0] vs. a00.  Both
have the same type with 8-bit alignment, but a00 is on stack, which based on
the base stack align and stack variable placement happens to result in a
32-bit alignment.

Enable test-cases pr94600-{1,3}.c for nvptx by forcing the currently 8-byte
aligned variables to have a 32-bit alignment for STRICT_ALIGNMENT targets.

Tested on nvptx.

gcc/testsuite/ChangeLog:

2020-10-01  Tom de Vries  <tdevries@suse.de>

	* gcc.dg/pr94600-1.c: Force 32-bit alignment for a0 for !non_strict_align
	targets.  Remove target clauses from scan tests.
	* gcc.dg/pr94600-3.c: Same.

---
 gcc/testsuite/gcc.dg/pr94600-1.c | 11 ++++++++---
 gcc/testsuite/gcc.dg/pr94600-3.c | 11 ++++++++---
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr94600-1.c b/gcc/testsuite/gcc.dg/pr94600-1.c
index c9a7bb9007e..149e4f35dbe 100644
--- a/gcc/testsuite/gcc.dg/pr94600-1.c
+++ b/gcc/testsuite/gcc.dg/pr94600-1.c
@@ -1,6 +1,7 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target size32plus } */
 /* { dg-options "-fdump-rtl-final -O2" } */
+/* { dg-additional-options "-DALIGN_VAR" { target { ! non_strict_align } } } */
 
 /* Assignments to a whole struct of suitable size (32 bytes) must not be
    picked apart into field accesses. */
@@ -12,7 +13,11 @@  typedef struct {
   unsigned int f3 : 7;
 } t0;
 
-static t0 a0[] = {
+static t0 a0[]
+#ifdef ALIGN_VAR
+__attribute__((aligned (4)))
+#endif
+  = {
  { .f0 = 7, .f1 = 99, .f3 = 1, },
  { .f0 = 7, .f1 = 251, .f3 = 1, },
  { .f0 = 8, .f1 = 127, .f3 = 5, },
@@ -32,5 +37,5 @@  foo(void)
 }
 
 /* The only volatile accesses should be the obvious writes.  */
-/* { dg-final { scan-rtl-dump-times {\(mem/v} 6 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */
-/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 6 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */
+/* { dg-final { scan-rtl-dump-times {\(mem/v} 6 "final" } } */
+/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 6 "final" } } */
diff --git a/gcc/testsuite/gcc.dg/pr94600-3.c b/gcc/testsuite/gcc.dg/pr94600-3.c
index ff42c7db3c6..2fce9f13cfa 100644
--- a/gcc/testsuite/gcc.dg/pr94600-3.c
+++ b/gcc/testsuite/gcc.dg/pr94600-3.c
@@ -1,6 +1,7 @@ 
 /* { dg-do compile } */
 /* { dg-require-effective-target size32plus } */
 /* { dg-options "-fdump-rtl-final -O2 -fno-unroll-loops" } */
+/* { dg-additional-options "-DALIGN_VAR" { target { ! non_strict_align } } } */
 
 /* Same-address version of pr94600-1.c. */
 
@@ -11,7 +12,11 @@  typedef struct {
   unsigned int f3 : 7;
 } t0;
 
-static t0 a0[] = {
+static t0 a0[]
+#ifdef ALIGN_VAR
+__attribute__((aligned (4)))
+#endif
+  = {
  { .f0 = 7, .f1 = 99, .f3 = 1, },
  { .f0 = 7, .f1 = 251, .f3 = 1, },
  { .f0 = 8, .f1 = 127, .f3 = 5, },
@@ -31,5 +36,5 @@  foo(void)
 }
 
 /* The loop isn't unrolled. */
-/* { dg-final { scan-rtl-dump-times {\(mem/v} 1 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */
-/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 1 "final" { target { non_strict_align || pcc_bitfield_type_matters } } } } */
+/* { dg-final { scan-rtl-dump-times {\(mem/v} 1 "final" } } */
+/* { dg-final { scan-rtl-dump-times {\(set \(mem/v} 1 "final" } } */