Message ID | Zp9LSqpoZShk+eYT@tucnak
---|---
State | New
Series | ssa: Fix up maybe_rewrite_mem_ref_base complex type handling [PR116034]
On Tue, 23 Jul 2024, Jakub Jelinek wrote:

> Hi!
>
> The folding into REALPART_EXPR is correct, used only when the mem_offset
> is zero, but for IMAGPART_EXPR it didn't check the exact offset value (just
> that it is not 0).
> The following patch fixes that by using IMAGPART_EXPR only if the offset
> is right and using BIT_FIELD_REF or whatever else otherwise.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, 14.2
> and other release branches?

I think this is not enough - what kind of GIMPLE does this result in?
You should amend the checking in non_rewritable_mem_ref_base as well;
it should fail when the corresponding rewrite doesn't work.  I suspect
it falls through to the BIT_FIELD_REF code?

That said, can you match up the offset check with that of
non_rewritable_mem_ref_base then, particularly

      if ((VECTOR_TYPE_P (TREE_TYPE (decl))
           || TREE_CODE (TREE_TYPE (decl)) == COMPLEX_TYPE)
          && useless_type_conversion_p (TREE_TYPE (base),
                                        TREE_TYPE (TREE_TYPE (decl)))
          && known_ge (mem_ref_offset (base), 0)
          && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))),
                       mem_ref_offset (base))
          && multiple_p (mem_ref_offset (base),
                         wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (base)))))

I suppose this check should be adjusted to use the three-argument multiple_p
and check the factor against 0/1 for COMPLEX_TYPE.

It would probably be better maintenance-wise to merge the checking and
transform routines and give the merged routine two modes, check only or
check-and-transform.  That makes it easier to keep them in sync.

Richard.

> 2024-07-23  Jakub Jelinek  <jakub@redhat.com>
>             Andrew Pinski  <quic_apinski@quicinc.com>
>
>       PR tree-optimization/116034
>       * tree-ssa.cc (maybe_rewrite_mem_ref_base): Only use IMAGPART_EXPR
>       if MEM_REF offset is equal to element type size.
>
>       * gcc.dg/pr116034.c: New test.
>
> --- gcc/tree-ssa.cc.jj  2024-03-11 11:00:46.768915988 +0100
> +++ gcc/tree-ssa.cc     2024-07-22 21:27:02.115530861 +0200
> @@ -1506,7 +1506,10 @@ maybe_rewrite_mem_ref_base (tree *tp, bi
>         }
>        else if (TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE
>                && useless_type_conversion_p (TREE_TYPE (*tp),
> -                                            TREE_TYPE (TREE_TYPE (sym))))
> +                                            TREE_TYPE (TREE_TYPE (sym)))
> +              && (integer_zerop (TREE_OPERAND (*tp, 1))
> +                  || tree_int_cst_equal (TREE_OPERAND (*tp, 1),
> +                                         TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
>         {
>           *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
>                         ? REALPART_EXPR : IMAGPART_EXPR,
> --- gcc/testsuite/gcc.dg/pr116034.c.jj  2024-07-22 21:39:50.050994243 +0200
> +++ gcc/testsuite/gcc.dg/pr116034.c     2024-07-22 21:39:32.432213042 +0200
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/116034 */
> +/* { dg-do run } */
> +/* { dg-options "-O1 -fno-strict-aliasing" } */
> +
> +int g;
> +
> +static inline int
> +foo (_Complex unsigned short c)
> +{
> +  __builtin_memmove (&g, 1 + (char *) &c, 2);
> +  return g;
> +}
> +
> +int
> +main ()
> +{
> +  if (__SIZEOF_SHORT__ == 2
> +      && __CHAR_BIT__ == 8
> +      && foo (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ ? 0x0100 : 1) != 1)
> +    __builtin_abort ();
> +}
>
>       Jakub
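For context, here is a rough, untested sketch of what the three-argument multiple_p suggestion could look like inside non_rewritable_mem_ref_base. The variable names mirror the quoted check, but the quotient handling is an assumption rather than code from this thread, and as noted further down the existing known_gt bound already limits the factor for COMPLEX_TYPE:

```cpp
/* Hypothetical, untested sketch of the three-argument multiple_p variant
   of the non_rewritable_mem_ref_base check quoted above.  The quotient
   ("factor") being 0 or 1 is what would pin a COMPLEX_TYPE access to
   exactly the real or the imaginary part.  */
poly_offset_int factor;
if ((VECTOR_TYPE_P (TREE_TYPE (decl))
     || TREE_CODE (TREE_TYPE (decl)) == COMPLEX_TYPE)
    && useless_type_conversion_p (TREE_TYPE (base),
                                  TREE_TYPE (TREE_TYPE (decl)))
    && known_ge (mem_ref_offset (base), 0)
    && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))),
                 mem_ref_offset (base))
    && multiple_p (mem_ref_offset (base),
                   wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (base))),
                   &factor)
    /* Vectors allow any in-range lane; complex has only elements 0 and 1.  */
    && (VECTOR_TYPE_P (TREE_TYPE (decl)) || known_le (factor, 1)))
  return NULL_TREE;
```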
On Tue, Jul 23, 2024 at 08:42:24AM +0200, Richard Biener wrote:
> On Tue, 23 Jul 2024, Jakub Jelinek wrote:
> > The folding into REALPART_EXPR is correct, used only when the mem_offset
> > is zero, but for IMAGPART_EXPR it didn't check the exact offset value (just
> > that it is not 0).
> > The following patch fixes that by using IMAGPART_EXPR only if the offset
> > is right and using BIT_FIELD_REF or whatever else otherwise.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, 14.2
> > and other release branches?
>
> I think this is not enough - what kind of GIMPLE does this result in?

If it is
  __builtin_memmove (&g, 2 + (char *) &c, 2);
then
  _1 = &c + 2;
  _3 = MEM <unsigned short> [(char * {ref-all})_1];
is optimized to
  _3 = IMAGPART_EXPR <c_6(D)>;
as before.  If it is
  __builtin_memmove (&g, 1 + (char *) &c, 2);
from the testcase, then
  _1 = &c + 1;
  _3 = MEM <unsigned short> [(char * {ref-all})_1];
is optimized to
  _3 = BIT_FIELD_REF <c_6(D), 16, 8>;
and that is expanded correctly.

> You should amend the checking in non_rewritable_mem_ref_base as well,
> it should fail when the corresponding rewrite doesn't work.

That is already the case, I believe.
non_rewritable_mem_ref_base rejects it in the VECTOR_TYPE/COMPLEX_TYPE
case (so doesn't return NULL there), but then falls through to the
  /* For integral typed extracts we can use a BIT_FIELD_REF.  */
check and returns NULL_TREE there.
But then maybe_rewrite_mem_ref_base triggers already on the COMPLEX_TYPE
case.

> I suspect it falls through to the BIT_FIELD_REF code?
>
> That said, can you match up the offset check with that of
> non_rewritable_mem_ref_base then, particularly
>
>       if ((VECTOR_TYPE_P (TREE_TYPE (decl))
>            || TREE_CODE (TREE_TYPE (decl)) == COMPLEX_TYPE)
>           && useless_type_conversion_p (TREE_TYPE (base),
>                                         TREE_TYPE (TREE_TYPE (decl)))
>           && known_ge (mem_ref_offset (base), 0)
>           && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))),
>                        mem_ref_offset (base))
>           && multiple_p (mem_ref_offset (base),
>                          wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (base)))))
>
> I suppose this check should be adjusted to use the three-argument multiple_p
> and check the factor against 0/1 for COMPLEX_TYPE.

Isn't that just too complex/expensive for something as simple as checking
whether mem_ref_offset (base) is 0 or TYPE_SIZE_UNIT?
The
  integer_zerop () || tree_int_cst_equal
check looked much cheaper.
Sure, it could be done on poly_int_cst instead if that looks better:
      && (known_eq (mem_ref_offset (base), 0)
          || known_eq (mem_ref_offset (base),
                       wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl)))))
And shouldn't we cache those mem_ref_offset (base) and
wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl)))
values, which are used in multiple spots?

Anyway, yet another option, because non_rewritable_mem_ref_base has
the VECTOR/COMPLEX type cases together, would be to have them together
in maybe_rewrite_mem_ref_base too, so do:

          if ((VECTOR_TYPE_P (TREE_TYPE (sym))
               || TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE)
              && useless_type_conversion_p (TREE_TYPE (*tp),
                                            TREE_TYPE (TREE_TYPE (sym)))
              && multiple_p (mem_ref_offset (*tp),
                             wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
            {
              if (VECTOR_TYPE_P (TREE_TYPE (sym)))
                *tp = build3 (BIT_FIELD_REF, TREE_TYPE (*tp), sym,
                              TYPE_SIZE (TREE_TYPE (*tp)),
                              int_const_binop (MULT_EXPR,
                                               bitsize_int (BITS_PER_UNIT),
                                               TREE_OPERAND (*tp, 1)));
              else
                *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
                              ? REALPART_EXPR : IMAGPART_EXPR,
                              TREE_TYPE (*tp), sym);
            }

        Jakub
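Purely as an illustration, the caching Jakub asks about could hoist the repeated computations into locals along the following lines; this is a hypothetical, untested sketch of the non_rewritable_mem_ref_base check, not part of any posted patch:

```cpp
/* Hypothetical, untested sketch of the caching mentioned above: compute
   the MEM_REF offset and the sizes once and reuse them in the checks.  */
poly_offset_int offset = mem_ref_offset (base);
poly_offset_int elt_size
  = wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (base)));
poly_offset_int decl_size
  = wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl)));
if ((VECTOR_TYPE_P (TREE_TYPE (decl))
     || TREE_CODE (TREE_TYPE (decl)) == COMPLEX_TYPE)
    && useless_type_conversion_p (TREE_TYPE (base),
                                  TREE_TYPE (TREE_TYPE (decl)))
    && known_ge (offset, 0)
    && known_gt (decl_size, offset)
    && multiple_p (offset, elt_size))
  return NULL_TREE;
```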
On Tue, Jul 23, 2024 at 10:07:08AM +0200, Jakub Jelinek wrote:
> Anyway, yet another option, because non_rewritable_mem_ref_base has
> the VECTOR/COMPLEX type cases together, would be to have them together
> in maybe_rewrite_mem_ref_base too, so do:

In patch form that would be:

2024-07-23  Jakub Jelinek  <jakub@redhat.com>
            Andrew Pinski  <quic_apinski@quicinc.com>

      PR tree-optimization/116034
      * tree-ssa.cc (maybe_rewrite_mem_ref_base): Merge the VECTOR and
      COMPLEX type checks.

      * gcc.dg/pr116034.c: New test.

--- gcc/tree-ssa.cc.jj  2024-03-11 11:00:46.768915988 +0100
+++ gcc/tree-ssa.cc     2024-07-23 10:24:54.564568968 +0200
@@ -1492,25 +1492,23 @@ maybe_rewrite_mem_ref_base (tree *tp, bi
       && is_gimple_reg_type (TREE_TYPE (*tp))
       && ! VOID_TYPE_P (TREE_TYPE (*tp)))
     {
-      if (VECTOR_TYPE_P (TREE_TYPE (sym))
+      if ((VECTOR_TYPE_P (TREE_TYPE (sym))
+           || TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE)
          && useless_type_conversion_p (TREE_TYPE (*tp),
                                        TREE_TYPE (TREE_TYPE (sym)))
          && multiple_p (mem_ref_offset (*tp),
                         wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
        {
-         *tp = build3 (BIT_FIELD_REF, TREE_TYPE (*tp), sym,
-                       TYPE_SIZE (TREE_TYPE (*tp)),
-                       int_const_binop (MULT_EXPR,
-                                        bitsize_int (BITS_PER_UNIT),
-                                        TREE_OPERAND (*tp, 1)));
-       }
-      else if (TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE
-              && useless_type_conversion_p (TREE_TYPE (*tp),
-                                            TREE_TYPE (TREE_TYPE (sym))))
-       {
-         *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
-                       ? REALPART_EXPR : IMAGPART_EXPR,
-                       TREE_TYPE (*tp), sym);
+         if (VECTOR_TYPE_P (TREE_TYPE (sym)))
+           *tp = build3 (BIT_FIELD_REF, TREE_TYPE (*tp), sym,
+                         TYPE_SIZE (TREE_TYPE (*tp)),
+                         int_const_binop (MULT_EXPR,
+                                          bitsize_int (BITS_PER_UNIT),
+                                          TREE_OPERAND (*tp, 1)));
+         else
+           *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
+                         ? REALPART_EXPR : IMAGPART_EXPR,
+                         TREE_TYPE (*tp), sym);
        }
       else if (integer_zerop (TREE_OPERAND (*tp, 1))
               && DECL_SIZE (sym) == TYPE_SIZE (TREE_TYPE (*tp)))
--- gcc/testsuite/gcc.dg/pr116034.c.jj  2024-07-22 21:39:50.050994243 +0200
+++ gcc/testsuite/gcc.dg/pr116034.c     2024-07-23 10:26:29.940340508 +0200
@@ -0,0 +1,22 @@
+/* PR tree-optimization/116034 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fno-strict-aliasing" } */
+
+int g;
+
+static inline int
+foo (_Complex unsigned short c)
+{
+  __builtin_memmove (&g, 1 + (char *) &c, 2);
+  return g;
+}
+
+int
+main ()
+{
+  if (__SIZEOF_SHORT__ == 2
+      && __CHAR_BIT__ == 8
+      && (foo (__BYTE_ORDER__ != __ORDER_BIG_ENDIAN__ ? 0x100 : 1)
+          != (__BYTE_ORDER__ != __ORDER_BIG_ENDIAN__ ? 1 : 0x100)))
+    __builtin_abort ();
+}

        Jakub
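As a standalone aside on the testcase's expected values, the following self-contained program (not from the thread) walks through the byte-level reasoning. It uses a two-element array as a stand-in for _Complex unsigned short, relying on the real part being stored before the imaginary part, and assumes 2-byte shorts with 8-bit chars just as the dg test does:

```cpp
// Standalone illustration (not from the patch) of the byte-level reasoning
// behind the expected values in gcc.dg/pr116034.c.  Assumes 2-byte shorts
// and 8-bit chars, like the test itself.
#include <cassert>
#include <cstring>

int main ()
{
  // Detect byte order at run time so the check works on either endianness.
  unsigned short probe = 1;
  unsigned char first_byte;
  std::memcpy (&first_byte, &probe, 1);
  bool little_endian = (first_byte == 1);

  // Stand-in for _Complex unsigned short: real part first, imaginary second.
  // Mirror the test input: 0x100 on little-endian, 1 on big-endian.
  unsigned short c[2] = { (unsigned short) (little_endian ? 0x100 : 1), 0 };
  unsigned short g;

  // foo () in the test copies the two bytes starting at byte offset 1 of c,
  // straddling the real and imaginary parts.
  std::memcpy (&g, 1 + (char *) c, 2);

  // In both cases the bytes of c start 00 01 00 00, so the two bytes at
  // offset 1 are 01 00: that reads back as 1 on little-endian and 0x100 on
  // big-endian, which is exactly what the test compares against.
  assert (g == (little_endian ? 1 : 0x100));
  return 0;
}
```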
On Tue, 23 Jul 2024, Jakub Jelinek wrote:

> On Tue, Jul 23, 2024 at 08:42:24AM +0200, Richard Biener wrote:
> > On Tue, 23 Jul 2024, Jakub Jelinek wrote:
> > > The folding into REALPART_EXPR is correct, used only when the mem_offset
> > > is zero, but for IMAGPART_EXPR it didn't check the exact offset value (just
> > > that it is not 0).
> > > The following patch fixes that by using IMAGPART_EXPR only if the offset
> > > is right and using BIT_FIELD_REF or whatever else otherwise.
> > >
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, 14.2
> > > and other release branches?
> >
> > I think this is not enough - what kind of GIMPLE does this result in?
>
> If it is
>   __builtin_memmove (&g, 2 + (char *) &c, 2);
> then
>   _1 = &c + 2;
>   _3 = MEM <unsigned short> [(char * {ref-all})_1];
> is optimized to
>   _3 = IMAGPART_EXPR <c_6(D)>;
> as before.  If it is
>   __builtin_memmove (&g, 1 + (char *) &c, 2);
> from the testcase, then
>   _1 = &c + 1;
>   _3 = MEM <unsigned short> [(char * {ref-all})_1];
> is optimized to
>   _3 = BIT_FIELD_REF <c_6(D), 16, 8>;
> and that is expanded correctly.
>
> > You should amend the checking in non_rewritable_mem_ref_base as well,
> > it should fail when the corresponding rewrite doesn't work.
>
> That is already the case, I believe.
> non_rewritable_mem_ref_base rejects it in the VECTOR_TYPE/COMPLEX_TYPE
> case (so doesn't return NULL there), but then falls through to the
>   /* For integral typed extracts we can use a BIT_FIELD_REF.  */
> check and returns NULL_TREE there.
> But then maybe_rewrite_mem_ref_base triggers already on the COMPLEX_TYPE
> case.
>
> > I suspect it falls through to the BIT_FIELD_REF code?
> >
> > That said, can you match up the offset check with that of
> > non_rewritable_mem_ref_base then, particularly
> >
> >       if ((VECTOR_TYPE_P (TREE_TYPE (decl))
> >            || TREE_CODE (TREE_TYPE (decl)) == COMPLEX_TYPE)
> >           && useless_type_conversion_p (TREE_TYPE (base),
> >                                         TREE_TYPE (TREE_TYPE (decl)))
> >           && known_ge (mem_ref_offset (base), 0)
> >           && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))),
> >                        mem_ref_offset (base))
> >           && multiple_p (mem_ref_offset (base),
> >                          wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (base)))))
> >
> > I suppose this check should be adjusted to use the three-argument multiple_p
> > and check the factor against 0/1 for COMPLEX_TYPE.

Ah, reading the above again, it should already ensure the offset is for
the real or imag part only - I was thinking it might allow larger
aligned offsets.  Thus your original patch is OK.

I think an improvement would really be to merge the two functions.

Thanks,
Richard.

> Isn't that just too complex/expensive for something as simple as checking
> whether mem_ref_offset (base) is 0 or TYPE_SIZE_UNIT?
> The
>   integer_zerop () || tree_int_cst_equal
> check looked much cheaper.
> Sure, it could be done on poly_int_cst instead if that looks better:
>       && (known_eq (mem_ref_offset (base), 0)
>           || known_eq (mem_ref_offset (base),
>                        wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl)))))
> And shouldn't we cache those mem_ref_offset (base) and
> wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl)))
> values, which are used in multiple spots?
>
> Anyway, yet another option, because non_rewritable_mem_ref_base has
> the VECTOR/COMPLEX type cases together, would be to have them together
> in maybe_rewrite_mem_ref_base too, so do:
>
>           if ((VECTOR_TYPE_P (TREE_TYPE (sym))
>                || TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE)
>               && useless_type_conversion_p (TREE_TYPE (*tp),
>                                             TREE_TYPE (TREE_TYPE (sym)))
>               && multiple_p (mem_ref_offset (*tp),
>                              wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
>             {
>               if (VECTOR_TYPE_P (TREE_TYPE (sym)))
>                 *tp = build3 (BIT_FIELD_REF, TREE_TYPE (*tp), sym,
>                               TYPE_SIZE (TREE_TYPE (*tp)),
>                               int_const_binop (MULT_EXPR,
>                                                bitsize_int (BITS_PER_UNIT),
>                                                TREE_OPERAND (*tp, 1)));
>               else
>                 *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
>                               ? REALPART_EXPR : IMAGPART_EXPR,
>                               TREE_TYPE (*tp), sym);
>             }
>
>         Jakub
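The merge Richard suggests is not part of the approved fix; purely as an illustration, a combined routine with a check-only mode might look roughly like the sketch below, with an invented name and signature, reusing the offset checks and rewrites quoted earlier in the thread:

```cpp
/* Invented sketch, not from any posted patch: a single routine serving both
   as the predicate (CHECK_ONLY) and as the transform, so the offset checks
   in non_rewritable_mem_ref_base and maybe_rewrite_mem_ref_base cannot
   drift apart.  Returns true if *TP can be (or has been) rewritten so that
   it no longer needs SYM to be addressable.  */
static bool
rewrite_mem_ref_base_1 (tree *tp, tree sym, bool check_only)
{
  if ((VECTOR_TYPE_P (TREE_TYPE (sym))
       || TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE)
      && useless_type_conversion_p (TREE_TYPE (*tp),
                                    TREE_TYPE (TREE_TYPE (sym)))
      && known_ge (mem_ref_offset (*tp), 0)
      && known_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (sym))),
                   mem_ref_offset (*tp))
      && multiple_p (mem_ref_offset (*tp),
                     wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
    {
      if (check_only)
        return true;
      if (VECTOR_TYPE_P (TREE_TYPE (sym)))
        *tp = build3 (BIT_FIELD_REF, TREE_TYPE (*tp), sym,
                      TYPE_SIZE (TREE_TYPE (*tp)),
                      int_const_binop (MULT_EXPR,
                                       bitsize_int (BITS_PER_UNIT),
                                       TREE_OPERAND (*tp, 1)));
      else
        *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
                      ? REALPART_EXPR : IMAGPART_EXPR,
                      TREE_TYPE (*tp), sym);
      return true;
    }
  /* The remaining cases (whole-decl BIT_FIELD_REF, VIEW_CONVERT_EXPR)
     would be folded in the same way.  */
  return false;
}
```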
--- gcc/tree-ssa.cc.jj  2024-03-11 11:00:46.768915988 +0100
+++ gcc/tree-ssa.cc     2024-07-22 21:27:02.115530861 +0200
@@ -1506,7 +1506,10 @@ maybe_rewrite_mem_ref_base (tree *tp, bi
        }
       else if (TREE_CODE (TREE_TYPE (sym)) == COMPLEX_TYPE
               && useless_type_conversion_p (TREE_TYPE (*tp),
-                                            TREE_TYPE (TREE_TYPE (sym))))
+                                            TREE_TYPE (TREE_TYPE (sym)))
+              && (integer_zerop (TREE_OPERAND (*tp, 1))
+                  || tree_int_cst_equal (TREE_OPERAND (*tp, 1),
+                                         TYPE_SIZE_UNIT (TREE_TYPE (*tp)))))
        {
          *tp = build1 (integer_zerop (TREE_OPERAND (*tp, 1))
                        ? REALPART_EXPR : IMAGPART_EXPR,
--- gcc/testsuite/gcc.dg/pr116034.c.jj  2024-07-22 21:39:50.050994243 +0200
+++ gcc/testsuite/gcc.dg/pr116034.c     2024-07-22 21:39:32.432213042 +0200
@@ -0,0 +1,21 @@
+/* PR tree-optimization/116034 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fno-strict-aliasing" } */
+
+int g;
+
+static inline int
+foo (_Complex unsigned short c)
+{
+  __builtin_memmove (&g, 1 + (char *) &c, 2);
+  return g;
+}
+
+int
+main ()
+{
+  if (__SIZEOF_SHORT__ == 2
+      && __CHAR_BIT__ == 8
+      && foo (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ ? 0x0100 : 1) != 1)
+    __builtin_abort ();
+}