diff mbox

Fix unaligned access when predictive commoning (PR 71083)

Message ID AM4PR0701MB2162F9A6F39A908B01FB6D4DE41C0@AM4PR0701MB2162.eurprd07.prod.outlook.com
State New
Headers show

Commit Message

Bernd Edlinger Aug. 9, 2016, 5:31 p.m. UTC
On 08/09/16 09:29, Richard Biener wrote:
> On Mon, 8 Aug 2016, Bernd Edlinger wrote:
>
>> Hi!
>>
>>
>> As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71083
>> we generate unaligned accesses because tree-predcom rewrites
>> bitfield and packed accesses in a way that looses the proper
>> alignment information.
>>
>> The attached patch fixes this by re-using the gimple memory
>> expression from the original access.
>>
>> I was not sure, if any non-constant array references would be
>> valid at where the ref_at_iteration is expanded, I set these
>> to the array-lower-bound, as the DR_OFFSET contains the folded
>> value of all non-constant array references.
>>
>> The patch compensates for the constant offset from the outer
>> reference to the inner reference, and replaces the inner
>> reference instead of the outer reference.
>>
>>
>> Boot-strapped and reg-tested on x86_64-linux-gnu.
>> OK for trunk?
>
> I don't think what you do is correct.  Consider
>
>   for (i)
>    for (j)
>     {
>       ... = a[i][j];
>     }
>
> predcom considers innermost loops only and thus the DRs will
> be analyzed with respect to that which means DR_BASE_ADDRESS
> is &a[i][0] but you end up generating (*(&a) + i * ..)[0][0]
> which is at best suspicious with respect to further analyses
> by data-ref and TBAA.  Also note that this may get alignment
> wrong as well as changing [i] to [0] may lead to false alignment
> (consider a [2][2] char array aligned to 4 bytes).
>
> Your patch goes halfway back to code we had in the past
> (looking at the 4.3 branch right now) which had numerous
> issues (don't remember exactly).
>

Hmm.  Yes.  These ARRAY_REFs ruin the whole idea :)

> I believe that if we'd handle bitfields by
>
>    if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
>        && DECL_BIT_FIELD_TYPE (TREE_OPERAND (DR_REF (dr), 1)))
>      ref = TREE_OPERAND (DR_REF (dr), 0);
>    else
>      ref = DR_REF (dr);
>    unsigned align = get_object_alignment (ref);
>
> and use the type / alignment of that ref for the built MEM_REF
> (with coff adjusted by the split bitfield component-ref offset)
> and build a COMPONENT_REF around the MEM_REF to handle the
> bitfield part this should work ok.
>

Ooookay.  I think your idea works in principle.

Attached a new version of the patch, I hope it did not become
too ugly.

I think from Eric's comment in get_inner_ref it can be possible
in Ada that the outer object is accidentally byte-aligned
but the inner reference is not.  In that case I think it is
better to generate a BIT_FIELD_REF instead of a COMPONENT_REF.

Eric do you agree?  Are there any Ada test cases where the
pcom optimization jumps in?

We prefer the COMPONENT_REF just because it is likely more
aligned than the bit field itself.  The mem_ref should be
correctly aligned in both cases.

I was not sure if I should pass the TREE_OPERAND(ref, 2) to
the COMPONENT_REF constructor. Is that used at all? All other
places where COMPONENT_REF are built, seen to pass NULL there.


Bootstrap on x86_64-linux-gnu, reg-test is still running.

Is it OK it no regressions?


Thanks
Bernd.
2016-08-08  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR tree-optimization/71083
	* tree-predcom.c (ref_at_iteration): Correctly align the
	inner reference.

testsuite:
2016-08-08  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR tree-optimization/71083
	* gcc.c-torture/execute/pr71083.c: New test.

Comments

Eric Botcazou Aug. 9, 2016, 8:48 p.m. UTC | #1
> I think from Eric's comment in get_inner_ref it can be possible
> in Ada that the outer object is accidentally byte-aligned
> but the inner reference is not.  In that case I think it is
> better to generate a BIT_FIELD_REF instead of a COMPONENT_REF.

The patch says get_bit_range instead...  The comment therein means that 
bitfields can be nested in Ada: you can have a bitfield in a record which is 
itself a bitfield in another record.

> Eric do you agree?  Are there any Ada test cases where the
> pcom optimization jumps in?

According to the comment, data-ref analysis punts on bit offsets so I'm not 
sure how boff can be not byte-aligned.
Bernd Edlinger Aug. 9, 2016, 10:22 p.m. UTC | #2
On 08/09/16 22:48, Eric Botcazou wrote:
>> I think from Eric's comment in get_inner_ref it can be possible
>> in Ada that the outer object is accidentally byte-aligned
>> but the inner reference is not.  In that case I think it is
>> better to generate a BIT_FIELD_REF instead of a COMPONENT_REF.
>
> The patch says get_bit_range instead...  The comment therein means that
> bitfields can be nested in Ada: you can have a bitfield in a record which is
> itself a bitfield in another record.
>
>> Eric do you agree?  Are there any Ada test cases where the
>> pcom optimization jumps in?
>
> According to the comment, data-ref analysis punts on bit offsets so I'm not
> sure how boff can be not byte-aligned.
>

I mean in dr_analyze_innermost, we have:

   base = get_inner_reference (ref, &pbitsize, &pbitpos, &poffset, &pmode,
                               &punsignedp, &preversep, &pvolatilep);
   gcc_assert (base != NULL_TREE);

   if (pbitpos % BITS_PER_UNIT != 0)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
         fprintf (dump_file, "failed: bit offset alignment.\n");
       return false;
     }

   if (preversep)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
         fprintf (dump_file, "failed: reverse storage order.\n");
       return false;
     }


and that means that get_inner_reference on the outermost ref
returns a byte-aligned bit-offset, and from there I would
think it has the knowledge, when to punt on reversep and
when the offset is not byte-aligned.

But in get_bit_range we have a bit-field with arbitrary
bit-offset, but surprisingly the
get_inner_reference (TREE_OPERAND (exp, 0)) did not return
byte-aligned bit-offset.  I doubt that data-ref ananlysis
ever cares about the byte-alignment of intermediate
refs.

We know that the difference between the innermost ref
and the outer ref is byte-aligned, but we do not know
that the same is true for offset between the COMPONENT_REF
and the outer ref.

I mean boff is essentially the difference between
bitpos of get_inner_reference(exp) and
bitpos of get_inner_reference(TREE_OPERAND (exp, 0))

This would be exposed by my patch, because previously
we always generated BIT_FIELD_REFS, with bit-offset 0,
but the MEM_REF at the byte-offset there is in all likelihood
pretty much unaligned, the MEM_REF at the COMPONENT_REF
is likely more aligned and allows better code for ARM processors,
but only if the MEM_REF is at a byte-aligned offset at all,
otherwise the whole transformation would be wrong.



Thanks
Bernd.
Bernd Edlinger Aug. 10, 2016, 8:47 a.m. UTC | #3
On 08/10/16, Bernd Edlinger wrote:
>On 08/09/16 22:48, Eric Botcazou wrote:
>>> I think from Eric's comment in get_inner_ref it can be possible
>>> in Ada that the outer object is accidentally byte-aligned
>>> but the inner reference is not.  In that case I think it is
>>> better to generate a BIT_FIELD_REF instead of a COMPONENT_REF.
>>

I actually meant to say get_bit_range.

I tried to translate the c-test case to an equivalent ada test case with
my not-so-fluent Ada speak...

So here is a test case that *actually* hits the if ((boff % BITS_PER_UNIT) != 0)
code path.

Before my patch there was an unaligned SImode access, and with
the patch we have 3 QImode accesses here.

cat pr71083_pkg.ads 
package Pr71083_pkg is
  type Nibble is mod 2**4;
  type Int24  is mod 2**24;
  type StructA is record
    a : Nibble;
    b : Int24;
  end record;
  pragma Pack(StructA);
  type StructB is record
    a : Nibble;
    b : StructA;
  end record;
  pragma Pack(StructB);
  type ArrayOfStructB is array(0..100) of StructB;
  procedure Foo (X : in out ArrayOfStructB);
end Pr71083_pkg;

cat pr71083_pkg.adb
-- { dg-do compile }
-- { dg-options "-O3" }
package body Pr71083_pkg is
  procedure Foo (X : in out ArrayOfStructB) is
  begin
    for K in 0..99 loop
      X (K+1).b.b := X (K).b.b;
    end loop;
  end Foo;
end Pr71083_pkg;

cat pr71083.adb 
-- { dg-do run }
-- { dg-options "-O3" }
with Pr71083_pkg;
use Pr71083_pkg;
procedure Pr71083 is
  Test : ArrayOfStructB;
begin
  Test (0).b.b := 9999;
  Foo (Test);
  if Test (100).b.b /= 9999 then
    raise Program_Error;
  end if;
end;


I was not sure which name to choose,
so I used the same convention as in the c-torture.
How do you like that?

I would like to add that to the gnat.dg because
it seems that there is zero testing for the predictive
commoning in the gnat.dg test suite.


Bernd.

>>> The patch says get_bit_range instead...  The comment therein means that
>>> bitfields can be nested in Ada: you can have a bitfield in a record which is
>>> itself a bitfield in another record.
>>>
>>>> Eric do you agree?  Are there any Ada test cases where the
>>>> pcom optimization jumps in?
>>>
>>> According to the comment, data-ref analysis punts on bit offsets so I'm not
>>> sure how boff can be not byte-aligned.
>>>
>
> I mean in dr_analyze_innermost, we have:
>
>   base = get_inner_reference (ref, &pbitsize, &pbitpos, &poffset, &pmode,
>                               &punsignedp, &preversep, &pvolatilep);
>   gcc_assert (base != NULL_TREE);
>
>   if (pbitpos % BITS_PER_UNIT != 0)
>     {
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         fprintf (dump_file, "failed: bit offset alignment.\n");
>       return false;
>     }
>
>   if (preversep)
>     {
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         fprintf (dump_file, "failed: reverse storage order.\n");
>       return false;
>     }
>
>
> and that means that get_inner_reference on the outermost ref
> returns a byte-aligned bit-offset, and from there I would
> think it has the knowledge, when to punt on reversep and
> when the offset is not byte-aligned.
>
> But in get_bit_range we have a bit-field with arbitrary
> bit-offset, but surprisingly the
> get_inner_reference (TREE_OPERAND (exp, 0)) did not return
> byte-aligned bit-offset.  I doubt that data-ref ananlysis
> ever cares about the byte-alignment of intermediate
> refs.
>
> We know that the difference between the innermost ref
> and the outer ref is byte-aligned, but we do not know
> that the same is true for offset between the COMPONENT_REF
> and the outer ref.
>
> I mean boff is essentially the difference between
> bitpos of get_inner_reference(exp) and
> bitpos of get_inner_reference(TREE_OPERAND (exp, 0))
>
> This would be exposed by my patch, because previously
> we always generated BIT_FIELD_REFS, with bit-offset 0,
> but the MEM_REF at the byte-offset there is in all likelihood
> pretty much unaligned, the MEM_REF at the COMPONENT_REF
> is likely more aligned and allows better code for ARM processors,
> but only if the MEM_REF is at a byte-aligned offset at all,
> otherwise the whole transformation would be wrong.
>
>
>
> Thanks
> Bernd.
Eric Botcazou Aug. 10, 2016, 12:19 p.m. UTC | #4
> I tried to translate the c-test case to an equivalent ada test case with
> my not-so-fluent Ada speak...
> 
> So here is a test case that *actually* hits the if ((boff % BITS_PER_UNIT)
> != 0) code path.

I see, well done then.

> Before my patch there was an unaligned SImode access, and with
> the patch we have 3 QImode accesses here.
> 
> cat pr71083_pkg.ads
> package Pr71083_pkg is
>   type Nibble is mod 2**4;
>   type Int24  is mod 2**24;
>   type StructA is record
>     a : Nibble;
>     b : Int24;
>   end record;
>   pragma Pack(StructA);
>   type StructB is record
>     a : Nibble;
>     b : StructA;
>   end record;
>   pragma Pack(StructB);
>   type ArrayOfStructB is array(0..100) of StructB;
>   procedure Foo (X : in out ArrayOfStructB);
> end Pr71083_pkg;
> 
> cat pr71083_pkg.adb
> -- { dg-do compile }
> -- { dg-options "-O3" }
> package body Pr71083_pkg is
>   procedure Foo (X : in out ArrayOfStructB) is
>   begin
>     for K in 0..99 loop
>       X (K+1).b.b := X (K).b.b;
>     end loop;
>   end Foo;
> end Pr71083_pkg;
> 
> cat pr71083.adb
> -- { dg-do run }
> -- { dg-options "-O3" }
> with Pr71083_pkg;
> use Pr71083_pkg;
> procedure Pr71083 is
>   Test : ArrayOfStructB;
> begin
>   Test (0).b.b := 9999;
>   Foo (Test);
>   if Test (100).b.b /= 9999 then
>     raise Program_Error;
>   end if;
> end;
> 
> 
> I was not sure which name to choose,
> so I used the same convention as in the c-torture.
> How do you like that?

Worst convention ever. :-)  Either opt57 or loop_optimization23 at your 
convenience (you can add a reference to the PR in a comment).

> I would like to add that to the gnat.dg because
> it seems that there is zero testing for the predictive
> commoning in the gnat.dg test suite.

Very likely so.  The renamed testcase is OK for mainline, but please replace 
"Eric's" with "the" in the patch or copy-and-paste the explanation.
Richard Biener Aug. 10, 2016, 12:29 p.m. UTC | #5
On Tue, 9 Aug 2016, Bernd Edlinger wrote:

> On 08/09/16 22:48, Eric Botcazou wrote:
> >> I think from Eric's comment in get_inner_ref it can be possible
> >> in Ada that the outer object is accidentally byte-aligned
> >> but the inner reference is not.  In that case I think it is
> >> better to generate a BIT_FIELD_REF instead of a COMPONENT_REF.
> >
> > The patch says get_bit_range instead...  The comment therein means that
> > bitfields can be nested in Ada: you can have a bitfield in a record which is
> > itself a bitfield in another record.
> >
> >> Eric do you agree?  Are there any Ada test cases where the
> >> pcom optimization jumps in?
> >
> > According to the comment, data-ref analysis punts on bit offsets so I'm not
> > sure how boff can be not byte-aligned.
> >
> 
> I mean in dr_analyze_innermost, we have:
> 
>    base = get_inner_reference (ref, &pbitsize, &pbitpos, &poffset, &pmode,
>                                &punsignedp, &preversep, &pvolatilep);
>    gcc_assert (base != NULL_TREE);
> 
>    if (pbitpos % BITS_PER_UNIT != 0)
>      {
>        if (dump_file && (dump_flags & TDF_DETAILS))
>          fprintf (dump_file, "failed: bit offset alignment.\n");
>        return false;
>      }
> 
>    if (preversep)
>      {
>        if (dump_file && (dump_flags & TDF_DETAILS))
>          fprintf (dump_file, "failed: reverse storage order.\n");
>        return false;
>      }
> 
> 
> and that means that get_inner_reference on the outermost ref
> returns a byte-aligned bit-offset, and from there I would
> think it has the knowledge, when to punt on reversep and
> when the offset is not byte-aligned.
> 
> But in get_bit_range we have a bit-field with arbitrary
> bit-offset, but surprisingly the
> get_inner_reference (TREE_OPERAND (exp, 0)) did not return
> byte-aligned bit-offset.  I doubt that data-ref ananlysis
> ever cares about the byte-alignment of intermediate
> refs.

It doesn't.  Note that it cares about byte-alignment of the ref
only because it decomposes the ref into an address plus
a byte offset - and the address is natrually to a byte-aligned thing.

> We know that the difference between the innermost ref
> and the outer ref is byte-aligned, but we do not know
> that the same is true for offset between the COMPONENT_REF
> and the outer ref.
> 
> I mean boff is essentially the difference between
> bitpos of get_inner_reference(exp) and
> bitpos of get_inner_reference(TREE_OPERAND (exp, 0))
> 
> This would be exposed by my patch, because previously
> we always generated BIT_FIELD_REFS, with bit-offset 0,
> but the MEM_REF at the byte-offset there is in all likelihood
> pretty much unaligned, the MEM_REF at the COMPONENT_REF
> is likely more aligned and allows better code for ARM processors,
> but only if the MEM_REF is at a byte-aligned offset at all,
> otherwise the whole transformation would be wrong.

Note that the important thing to ensure is that the access the
MEM_REF performs is correct TBAA-wise which means you either
have to use alias-set zero (ptr_type_node offset) or _not_
shuffle the offset arbitrarily between the MEM_REF and the
components you wrap it in.

Richard.
diff mbox

Patch

Index: gcc/tree-predcom.c
===================================================================
--- gcc/tree-predcom.c	(revision 239193)
+++ gcc/tree-predcom.c	(working copy)
@@ -213,6 +213,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "params.h"
 #include "tree-affine.h"
+#include "builtins.h"
 
 /* The maximum number of iterations between the considered memory
    references.  */
@@ -1364,11 +1365,16 @@  replace_ref_with (gimple *stmt, tree new_tree, boo
 /* Returns a memory reference to DR in the ITER-th iteration of
    the loop it was analyzed in.  Append init stmts to STMTS.  */
 
-static tree 
+static tree
 ref_at_iteration (data_reference_p dr, int iter, gimple_seq *stmts)
 {
   tree off = DR_OFFSET (dr);
   tree coff = DR_INIT (dr);
+  tree ref = DR_REF (dr);
+  enum tree_code ref_code = ERROR_MARK;
+  tree ref_type = NULL_TREE;
+  tree ref_op1 = NULL_TREE;
+  tree ref_op2 = NULL_TREE;
   if (iter == 0)
     ;
   else if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST)
@@ -1377,27 +1383,43 @@  ref_at_iteration (data_reference_p dr, int iter, g
   else
     off = size_binop (PLUS_EXPR, off,
 		      size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter)));
+  if (TREE_CODE (ref) == COMPONENT_REF
+      && DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
+    {
+      unsigned HOST_WIDE_INT boff;
+      tree field = TREE_OPERAND (ref, 1);
+      ref_type = TREE_TYPE (ref);
+      boff = tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field));
+      /* This can occur in Ada.  See Eric's comment in get_bit_range.  */
+      if ((boff % BITS_PER_UNIT) != 0)
+	{
+	  ref_code = BIT_FIELD_REF;
+	  ref_op1 = DECL_SIZE (field);
+	  ref_op2 = bitsize_zero_node;
+	}
+      else
+	{
+	  boff >>= LOG2_BITS_PER_UNIT;
+	  boff += tree_to_uhwi (component_ref_field_offset (ref));
+	  coff = size_binop (MINUS_EXPR, coff, ssize_int (boff));
+	  ref_code = COMPONENT_REF;
+	  ref_op1 = field;
+	  ref_op2 = TREE_OPERAND (ref, 2);
+	  ref = TREE_OPERAND (ref, 0);
+	}
+    }
   tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off);
   addr = force_gimple_operand_1 (unshare_expr (addr), stmts,
 				 is_gimple_mem_ref_addr, NULL_TREE);
-  tree alias_ptr = fold_convert (reference_alias_ptr_type (DR_REF (dr)), coff);
+  tree alias_ptr = fold_convert (reference_alias_ptr_type (ref), coff);
+  tree type = build_aligned_type (TREE_TYPE (ref),
+				  get_object_alignment (ref));
+  ref = build2 (MEM_REF, type, addr, alias_ptr);
   /* While data-ref analysis punts on bit offsets it still handles
-     bitfield accesses at byte boundaries.  Cope with that.  Note that
-     we cannot simply re-apply the outer COMPONENT_REF because the
-     byte-granular portion of it is already applied via DR_INIT and
-     DR_OFFSET, so simply build a BIT_FIELD_REF knowing that the bits
-     start at offset zero.  */
-  if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
-      && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1)))
-    {
-      tree field = TREE_OPERAND (DR_REF (dr), 1);
-      return build3 (BIT_FIELD_REF, TREE_TYPE (DR_REF (dr)),
-		     build2 (MEM_REF, DECL_BIT_FIELD_TYPE (field),
-			     addr, alias_ptr),
-		     DECL_SIZE (field), bitsize_zero_node);
-    }
-  else
-    return fold_build2 (MEM_REF, TREE_TYPE (DR_REF (dr)), addr, alias_ptr);
+     bitfield accesses at byte boundaries.  Cope with that.  */
+  if (ref_type)
+    ref = build3 (ref_code, ref_type, ref, ref_op1, ref_op2);
+  return ref;
 }
 
 /* Get the initialization expression for the INDEX-th temporary variable
Index: gcc/testsuite/gcc.c-torture/execute/pr71083.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/pr71083.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/pr71083.c	(working copy)
@@ -0,0 +1,43 @@ 
+struct lock_chain {
+  unsigned int irq_context: 2,
+    depth: 6,
+    base: 24;
+};
+
+__attribute__((noinline, noclone))
+struct lock_chain * foo (struct lock_chain *chain)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      chain[i+1].base = chain[i].base;
+    }
+  return chain;
+}
+
+struct lock_chain1 {
+  char x;
+  unsigned short base;
+} __attribute__((packed));
+
+__attribute__((noinline, noclone))
+struct lock_chain1 * bar (struct lock_chain1 *chain)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      chain[i+1].base = chain[i].base;
+    }
+  return chain;
+}
+
+struct lock_chain test [101];
+struct lock_chain1 test1 [101];
+
+int
+main ()
+{
+  foo (test);
+  bar (test1);
+  return 0;
+}