Message ID | AM4PR0701MB2162F9A6F39A908B01FB6D4DE41C0@AM4PR0701MB2162.eurprd07.prod.outlook.com |
---|---|
State | New |
Headers | show |
> I think from Eric's comment in get_inner_ref it can be possible > in Ada that the outer object is accidentally byte-aligned > but the inner reference is not. In that case I think it is > better to generate a BIT_FIELD_REF instead of a COMPONENT_REF. The patch says get_bit_range instead... The comment therein means that bitfields can be nested in Ada: you can have a bitfield in a record which is itself a bitfield in another record. > Eric do you agree? Are there any Ada test cases where the > pcom optimization jumps in? According to the comment, data-ref analysis punts on bit offsets so I'm not sure how boff can be not byte-aligned.
On 08/09/16 22:48, Eric Botcazou wrote: >> I think from Eric's comment in get_inner_ref it can be possible >> in Ada that the outer object is accidentally byte-aligned >> but the inner reference is not. In that case I think it is >> better to generate a BIT_FIELD_REF instead of a COMPONENT_REF. > > The patch says get_bit_range instead... The comment therein means that > bitfields can be nested in Ada: you can have a bitfield in a record which is > itself a bitfield in another record. > >> Eric do you agree? Are there any Ada test cases where the >> pcom optimization jumps in? > > According to the comment, data-ref analysis punts on bit offsets so I'm not > sure how boff can be not byte-aligned. > I mean in dr_analyze_innermost, we have: base = get_inner_reference (ref, &pbitsize, &pbitpos, &poffset, &pmode, &punsignedp, &preversep, &pvolatilep); gcc_assert (base != NULL_TREE); if (pbitpos % BITS_PER_UNIT != 0) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "failed: bit offset alignment.\n"); return false; } if (preversep) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "failed: reverse storage order.\n"); return false; } and that means that get_inner_reference on the outermost ref returns a byte-aligned bit-offset, and from there I would think it has the knowledge, when to punt on reversep and when the offset is not byte-aligned. But in get_bit_range we have a bit-field with arbitrary bit-offset, but surprisingly the get_inner_reference (TREE_OPERAND (exp, 0)) did not return byte-aligned bit-offset. I doubt that data-ref ananlysis ever cares about the byte-alignment of intermediate refs. We know that the difference between the innermost ref and the outer ref is byte-aligned, but we do not know that the same is true for offset between the COMPONENT_REF and the outer ref. I mean boff is essentially the difference between bitpos of get_inner_reference(exp) and bitpos of get_inner_reference(TREE_OPERAND (exp, 0)) This would be exposed by my patch, because previously we always generated BIT_FIELD_REFS, with bit-offset 0, but the MEM_REF at the byte-offset there is in all likelihood pretty much unaligned, the MEM_REF at the COMPONENT_REF is likely more aligned and allows better code for ARM processors, but only if the MEM_REF is at a byte-aligned offset at all, otherwise the whole transformation would be wrong. Thanks Bernd.
On 08/10/16, Bernd Edlinger wrote: >On 08/09/16 22:48, Eric Botcazou wrote: >>> I think from Eric's comment in get_inner_ref it can be possible >>> in Ada that the outer object is accidentally byte-aligned >>> but the inner reference is not. In that case I think it is >>> better to generate a BIT_FIELD_REF instead of a COMPONENT_REF. >> I actually meant to say get_bit_range. I tried to translate the c-test case to an equivalent ada test case with my not-so-fluent Ada speak... So here is a test case that *actually* hits the if ((boff % BITS_PER_UNIT) != 0) code path. Before my patch there was an unaligned SImode access, and with the patch we have 3 QImode accesses here. cat pr71083_pkg.ads package Pr71083_pkg is type Nibble is mod 2**4; type Int24 is mod 2**24; type StructA is record a : Nibble; b : Int24; end record; pragma Pack(StructA); type StructB is record a : Nibble; b : StructA; end record; pragma Pack(StructB); type ArrayOfStructB is array(0..100) of StructB; procedure Foo (X : in out ArrayOfStructB); end Pr71083_pkg; cat pr71083_pkg.adb -- { dg-do compile } -- { dg-options "-O3" } package body Pr71083_pkg is procedure Foo (X : in out ArrayOfStructB) is begin for K in 0..99 loop X (K+1).b.b := X (K).b.b; end loop; end Foo; end Pr71083_pkg; cat pr71083.adb -- { dg-do run } -- { dg-options "-O3" } with Pr71083_pkg; use Pr71083_pkg; procedure Pr71083 is Test : ArrayOfStructB; begin Test (0).b.b := 9999; Foo (Test); if Test (100).b.b /= 9999 then raise Program_Error; end if; end; I was not sure which name to choose, so I used the same convention as in the c-torture. How do you like that? I would like to add that to the gnat.dg because it seems that there is zero testing for the predictive commoning in the gnat.dg test suite. Bernd. >>> The patch says get_bit_range instead... The comment therein means that >>> bitfields can be nested in Ada: you can have a bitfield in a record which is >>> itself a bitfield in another record. >>> >>>> Eric do you agree? Are there any Ada test cases where the >>>> pcom optimization jumps in? >>> >>> According to the comment, data-ref analysis punts on bit offsets so I'm not >>> sure how boff can be not byte-aligned. >>> > > I mean in dr_analyze_innermost, we have: > > base = get_inner_reference (ref, &pbitsize, &pbitpos, &poffset, &pmode, > &punsignedp, &preversep, &pvolatilep); > gcc_assert (base != NULL_TREE); > > if (pbitpos % BITS_PER_UNIT != 0) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, "failed: bit offset alignment.\n"); > return false; > } > > if (preversep) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, "failed: reverse storage order.\n"); > return false; > } > > > and that means that get_inner_reference on the outermost ref > returns a byte-aligned bit-offset, and from there I would > think it has the knowledge, when to punt on reversep and > when the offset is not byte-aligned. > > But in get_bit_range we have a bit-field with arbitrary > bit-offset, but surprisingly the > get_inner_reference (TREE_OPERAND (exp, 0)) did not return > byte-aligned bit-offset. I doubt that data-ref ananlysis > ever cares about the byte-alignment of intermediate > refs. > > We know that the difference between the innermost ref > and the outer ref is byte-aligned, but we do not know > that the same is true for offset between the COMPONENT_REF > and the outer ref. > > I mean boff is essentially the difference between > bitpos of get_inner_reference(exp) and > bitpos of get_inner_reference(TREE_OPERAND (exp, 0)) > > This would be exposed by my patch, because previously > we always generated BIT_FIELD_REFS, with bit-offset 0, > but the MEM_REF at the byte-offset there is in all likelihood > pretty much unaligned, the MEM_REF at the COMPONENT_REF > is likely more aligned and allows better code for ARM processors, > but only if the MEM_REF is at a byte-aligned offset at all, > otherwise the whole transformation would be wrong. > > > > Thanks > Bernd.
> I tried to translate the c-test case to an equivalent ada test case with > my not-so-fluent Ada speak... > > So here is a test case that *actually* hits the if ((boff % BITS_PER_UNIT) > != 0) code path. I see, well done then. > Before my patch there was an unaligned SImode access, and with > the patch we have 3 QImode accesses here. > > cat pr71083_pkg.ads > package Pr71083_pkg is > type Nibble is mod 2**4; > type Int24 is mod 2**24; > type StructA is record > a : Nibble; > b : Int24; > end record; > pragma Pack(StructA); > type StructB is record > a : Nibble; > b : StructA; > end record; > pragma Pack(StructB); > type ArrayOfStructB is array(0..100) of StructB; > procedure Foo (X : in out ArrayOfStructB); > end Pr71083_pkg; > > cat pr71083_pkg.adb > -- { dg-do compile } > -- { dg-options "-O3" } > package body Pr71083_pkg is > procedure Foo (X : in out ArrayOfStructB) is > begin > for K in 0..99 loop > X (K+1).b.b := X (K).b.b; > end loop; > end Foo; > end Pr71083_pkg; > > cat pr71083.adb > -- { dg-do run } > -- { dg-options "-O3" } > with Pr71083_pkg; > use Pr71083_pkg; > procedure Pr71083 is > Test : ArrayOfStructB; > begin > Test (0).b.b := 9999; > Foo (Test); > if Test (100).b.b /= 9999 then > raise Program_Error; > end if; > end; > > > I was not sure which name to choose, > so I used the same convention as in the c-torture. > How do you like that? Worst convention ever. :-) Either opt57 or loop_optimization23 at your convenience (you can add a reference to the PR in a comment). > I would like to add that to the gnat.dg because > it seems that there is zero testing for the predictive > commoning in the gnat.dg test suite. Very likely so. The renamed testcase is OK for mainline, but please replace "Eric's" with "the" in the patch or copy-and-paste the explanation.
On Tue, 9 Aug 2016, Bernd Edlinger wrote: > On 08/09/16 22:48, Eric Botcazou wrote: > >> I think from Eric's comment in get_inner_ref it can be possible > >> in Ada that the outer object is accidentally byte-aligned > >> but the inner reference is not. In that case I think it is > >> better to generate a BIT_FIELD_REF instead of a COMPONENT_REF. > > > > The patch says get_bit_range instead... The comment therein means that > > bitfields can be nested in Ada: you can have a bitfield in a record which is > > itself a bitfield in another record. > > > >> Eric do you agree? Are there any Ada test cases where the > >> pcom optimization jumps in? > > > > According to the comment, data-ref analysis punts on bit offsets so I'm not > > sure how boff can be not byte-aligned. > > > > I mean in dr_analyze_innermost, we have: > > base = get_inner_reference (ref, &pbitsize, &pbitpos, &poffset, &pmode, > &punsignedp, &preversep, &pvolatilep); > gcc_assert (base != NULL_TREE); > > if (pbitpos % BITS_PER_UNIT != 0) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, "failed: bit offset alignment.\n"); > return false; > } > > if (preversep) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, "failed: reverse storage order.\n"); > return false; > } > > > and that means that get_inner_reference on the outermost ref > returns a byte-aligned bit-offset, and from there I would > think it has the knowledge, when to punt on reversep and > when the offset is not byte-aligned. > > But in get_bit_range we have a bit-field with arbitrary > bit-offset, but surprisingly the > get_inner_reference (TREE_OPERAND (exp, 0)) did not return > byte-aligned bit-offset. I doubt that data-ref ananlysis > ever cares about the byte-alignment of intermediate > refs. It doesn't. Note that it cares about byte-alignment of the ref only because it decomposes the ref into an address plus a byte offset - and the address is natrually to a byte-aligned thing. > We know that the difference between the innermost ref > and the outer ref is byte-aligned, but we do not know > that the same is true for offset between the COMPONENT_REF > and the outer ref. > > I mean boff is essentially the difference between > bitpos of get_inner_reference(exp) and > bitpos of get_inner_reference(TREE_OPERAND (exp, 0)) > > This would be exposed by my patch, because previously > we always generated BIT_FIELD_REFS, with bit-offset 0, > but the MEM_REF at the byte-offset there is in all likelihood > pretty much unaligned, the MEM_REF at the COMPONENT_REF > is likely more aligned and allows better code for ARM processors, > but only if the MEM_REF is at a byte-aligned offset at all, > otherwise the whole transformation would be wrong. Note that the important thing to ensure is that the access the MEM_REF performs is correct TBAA-wise which means you either have to use alias-set zero (ptr_type_node offset) or _not_ shuffle the offset arbitrarily between the MEM_REF and the components you wrap it in. Richard.
Index: gcc/tree-predcom.c =================================================================== --- gcc/tree-predcom.c (revision 239193) +++ gcc/tree-predcom.c (working copy) @@ -213,6 +213,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-scalar-evolution.h" #include "params.h" #include "tree-affine.h" +#include "builtins.h" /* The maximum number of iterations between the considered memory references. */ @@ -1364,11 +1365,16 @@ replace_ref_with (gimple *stmt, tree new_tree, boo /* Returns a memory reference to DR in the ITER-th iteration of the loop it was analyzed in. Append init stmts to STMTS. */ -static tree +static tree ref_at_iteration (data_reference_p dr, int iter, gimple_seq *stmts) { tree off = DR_OFFSET (dr); tree coff = DR_INIT (dr); + tree ref = DR_REF (dr); + enum tree_code ref_code = ERROR_MARK; + tree ref_type = NULL_TREE; + tree ref_op1 = NULL_TREE; + tree ref_op2 = NULL_TREE; if (iter == 0) ; else if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST) @@ -1377,27 +1383,43 @@ ref_at_iteration (data_reference_p dr, int iter, g else off = size_binop (PLUS_EXPR, off, size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter))); + if (TREE_CODE (ref) == COMPONENT_REF + && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))) + { + unsigned HOST_WIDE_INT boff; + tree field = TREE_OPERAND (ref, 1); + ref_type = TREE_TYPE (ref); + boff = tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field)); + /* This can occur in Ada. See Eric's comment in get_bit_range. */ + if ((boff % BITS_PER_UNIT) != 0) + { + ref_code = BIT_FIELD_REF; + ref_op1 = DECL_SIZE (field); + ref_op2 = bitsize_zero_node; + } + else + { + boff >>= LOG2_BITS_PER_UNIT; + boff += tree_to_uhwi (component_ref_field_offset (ref)); + coff = size_binop (MINUS_EXPR, coff, ssize_int (boff)); + ref_code = COMPONENT_REF; + ref_op1 = field; + ref_op2 = TREE_OPERAND (ref, 2); + ref = TREE_OPERAND (ref, 0); + } + } tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off); addr = force_gimple_operand_1 (unshare_expr (addr), stmts, is_gimple_mem_ref_addr, NULL_TREE); - tree alias_ptr = fold_convert (reference_alias_ptr_type (DR_REF (dr)), coff); + tree alias_ptr = fold_convert (reference_alias_ptr_type (ref), coff); + tree type = build_aligned_type (TREE_TYPE (ref), + get_object_alignment (ref)); + ref = build2 (MEM_REF, type, addr, alias_ptr); /* While data-ref analysis punts on bit offsets it still handles - bitfield accesses at byte boundaries. Cope with that. Note that - we cannot simply re-apply the outer COMPONENT_REF because the - byte-granular portion of it is already applied via DR_INIT and - DR_OFFSET, so simply build a BIT_FIELD_REF knowing that the bits - start at offset zero. */ - if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF - && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1))) - { - tree field = TREE_OPERAND (DR_REF (dr), 1); - return build3 (BIT_FIELD_REF, TREE_TYPE (DR_REF (dr)), - build2 (MEM_REF, DECL_BIT_FIELD_TYPE (field), - addr, alias_ptr), - DECL_SIZE (field), bitsize_zero_node); - } - else - return fold_build2 (MEM_REF, TREE_TYPE (DR_REF (dr)), addr, alias_ptr); + bitfield accesses at byte boundaries. Cope with that. */ + if (ref_type) + ref = build3 (ref_code, ref_type, ref, ref_op1, ref_op2); + return ref; } /* Get the initialization expression for the INDEX-th temporary variable Index: gcc/testsuite/gcc.c-torture/execute/pr71083.c =================================================================== --- gcc/testsuite/gcc.c-torture/execute/pr71083.c (revision 0) +++ gcc/testsuite/gcc.c-torture/execute/pr71083.c (working copy) @@ -0,0 +1,43 @@ +struct lock_chain { + unsigned int irq_context: 2, + depth: 6, + base: 24; +}; + +__attribute__((noinline, noclone)) +struct lock_chain * foo (struct lock_chain *chain) +{ + int i; + for (i = 0; i < 100; i++) + { + chain[i+1].base = chain[i].base; + } + return chain; +} + +struct lock_chain1 { + char x; + unsigned short base; +} __attribute__((packed)); + +__attribute__((noinline, noclone)) +struct lock_chain1 * bar (struct lock_chain1 *chain) +{ + int i; + for (i = 0; i < 100; i++) + { + chain[i+1].base = chain[i].base; + } + return chain; +} + +struct lock_chain test [101]; +struct lock_chain1 test1 [101]; + +int +main () +{ + foo (test); + bar (test1); + return 0; +}