Message ID: 54D076BC.9080504@redhat.com
State: New
On Tue, 3 Feb 2015, Jeff Law wrote:

> +/* Given a bit-wise operation performed in mode P1 on operands
> +   in some narrower type P2 that feeds an outer masking operation.
> +   See if the mask turns off all the bits outside P2, and if so
> +   perform the all the operations in P2 and just convert the final
> +   result from P1 to P2.  */
> +(for inner_op (bit_and bit_ior bit_xor)
> + (simplify
> +  (bit_and (inner_op (convert @0) (convert @1)) INTEGER_CST@3)
> +  (if ((TREE_INT_CST_LOW (@3) & ~GET_MODE_MASK (TYPE_MODE (TREE_TYPE (@0)))) == 0

Are you sure about checking TREE_INT_CST_LOW here?  What if the inner
type is wider than HOST_WIDE_INT?  (That could occur with bit-fields of
type __int128, for example.)  I think the check for what bits are set
needs to be written in a wide-int-safe way - maybe something like
tree_int_cst_sgn (@3) > 0 && tree_int_cst_min_precision (@3, UNSIGNED)
<= TYPE_PRECISION (TREE_TYPE (@1)).

> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
> +       && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
> +       && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
> +   (convert (bit_and (inner_op @0 @1) (convert @3))))))

I still don't think this is safe.  Suppose @0 and @1 are -128 in type
int8_t and @3 is 128 in a wider type and the operation is AND.  Then the
original expression has result 128.  But if you convert @3 to int8_t you
get -128 and this would result in -128 from the simplified expression.

If the inner values are signed and the mask includes the sign bit of the
inner type, you have to zero-extend to the wider type (e.g. convert the
inner values to unsigned), not sign-extend.

(If the inner values are signed, it's *also* valid to optimize with a
mask where both the sign bit of the inner type and all higher bits are
set, such as a mask of -128 above; in that case, you do need to
sign-extend.  If the inner values are unsigned, no check on the mask
value is needed at all as all higher bits in the mask can just be
discarded.  Both of these statements only apply for bitwise operations,
not arithmetic.)
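[To make Joseph's counterexample concrete, here is a small standalone C
program.  It is illustrative only: the function and variable names are
invented, and the source-level casts stand in for the GIMPLE conversions
the pattern actually matches.]

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  int8_t a = -128, b = -128;

  /* Original form: widen, AND, then mask in the wide type.
     (int) a == (int) b == -128 == 0xffffff80, so the inner AND is
     0xffffff80, and masking with 128 keeps only bit 7: result 128.  */
  int orig = ((int) a & (int) b) & 128;

  /* Proposed simplification: convert the mask to int8_t first.
     (int8_t) 128 is -128 (implementation-defined; GCC wraps), so the
     narrow result is -128, which then sign-extends to -128, not 128.  */
  int simp = (int) (int8_t) ((int8_t) (a & b) & (int8_t) 128);

  printf ("original %d, simplified %d\n", orig, simp);  /* 128 vs -128 */
  return 0;
}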
On 02/03/15 05:23, Joseph Myers wrote:
> On Tue, 3 Feb 2015, Jeff Law wrote:
>
>> +/* Given a bit-wise operation performed in mode P1 on operands
>> +   in some narrower type P2 that feeds an outer masking operation.
>> +   See if the mask turns off all the bits outside P2, and if so
>> +   perform the all the operations in P2 and just convert the final
>> +   result from P1 to P2.  */
>> +(for inner_op (bit_and bit_ior bit_xor)
>> + (simplify
>> +  (bit_and (inner_op (convert @0) (convert @1)) INTEGER_CST@3)
>> +  (if ((TREE_INT_CST_LOW (@3) & ~GET_MODE_MASK (TYPE_MODE (TREE_TYPE (@0)))) == 0
>
> Are you sure about checking TREE_INT_CST_LOW here?  What if the inner
> type is wider than HOST_WIDE_INT?  (That could occur with bit-fields of
> type __int128, for example.)  I think the check for what bits are set
> needs to be written in a wide-int-safe way - maybe something like
> tree_int_cst_sgn (@3) > 0 && tree_int_cst_min_precision (@3, UNSIGNED)
> <= TYPE_PRECISION (TREE_TYPE (@1)).

It's a WIP :-)

>> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
>> +       && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
>> +       && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
>> +   (convert (bit_and (inner_op @0 @1) (convert @3))))))
>
> I still don't think this is safe.  Suppose @0 and @1 are -128 in type
> int8_t and @3 is 128 in a wider type and the operation is AND.  Then the
> original expression has result 128.  But if you convert @3 to int8_t you
> get -128 and this would result in -128 from the simplified expression.

Agreed.

> If the inner values are signed and the mask includes the sign bit of the
> inner type, you have to zero-extend to the wider type (e.g. convert the
> inner values to unsigned), not sign-extend.

FWIW, I did figure out how to get at an expression's type (you can
capture types just like individual operands).  This is important because
if we introduce those signed->unsigned conversions, we have to avoid
infinite recursion of the pattern, and the easiest way is actually to
look at the various types.

> (If the inner values are signed, it's *also* valid to optimize with a
> mask where both the sign bit of the inner type and all higher bits are
> set, such as a mask of -128 above; in that case, you do need to
> sign-extend.  If the inner values are unsigned, no check on the mask
> value is needed at all as all higher bits in the mask can just be
> discarded.  Both of these statements only apply for bitwise operations,
> not arithmetic.)

Noted.  Thanks.

I'm going to go through another iteration on this stuff.  I'm less
concerned about this particular PR, but the knowledge gained here looks
to be helpful for 45397, which looks like it might be solvable with
match.pd patterns too.

jeff
On 02/03/15 05:23, Joseph Myers wrote:
>> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
>> +       && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
>> +       && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
>> +   (convert (bit_and (inner_op @0 @1) (convert @3))))))
>
> I still don't think this is safe.  Suppose @0 and @1 are -128 in type
> int8_t and @3 is 128 in a wider type and the operation is AND.  Then the
> original expression has result 128.  But if you convert @3 to int8_t you
> get -128 and this would result in -128 from the simplified expression.

Looking at this again, @3 is being converted to the wrong type, plain and
simple.  I should never write code while heavily medicated.  I suspect
that, combined with trying to work around the genmatch bug Richi just
fixed, sent me down a totally wrong path.

I think the way to go is to always convert the inner operands to an
unsigned type.  In fact, everything except the outer convert should be
using an unsigned type of the same size/precision as @0's type.  The
outer convert should, of course, be the final type.

That avoids all the concerns about sign bit propagation from the narrow
type into the wider final type.

Application of this pattern (and the one I posted for 47477) is a concern
for targets that don't do sub-word arithmetic/logicals.  But I just did a
sniff test of one such target (v850-elf, because it was handy) and I
couldn't spot a change in the final code using both the 47477 patch and
my WIP patch for this BZ.

jeff
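[In C terms, the shape Jeff describes would look something like this
sketch; the function and variable names are invented for illustration,
and the mask is assumed non-negative and to fit in the low 8 bits.]

#include <stdint.h>

int
narrow_and_mask (int8_t a, int8_t b, int mask)
{
  uint8_t ua = (uint8_t) a;             /* inner operands go unsigned */
  uint8_t ub = (uint8_t) b;
  uint8_t r = (ua & ub) & (uint8_t) mask;
  return (int) r;                       /* zero-extends, so the sign bit
                                           of the narrow type cannot leak
                                           into the upper bits */
}

[Because the final widening is a zero-extension, the counterexample
a = b = -128 with mask 128 now yields 128, matching the unsimplified
((int) a & (int) b) & mask.]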
On Wed, Feb 11, 2015 at 7:43 AM, Jeff Law <law@redhat.com> wrote:
> On 02/03/15 05:23, Joseph Myers wrote:
>>
>>> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
>>> +       && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
>>> +       && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
>>> +   (convert (bit_and (inner_op @0 @1) (convert @3))))))
>>
>> I still don't think this is safe.  Suppose @0 and @1 are -128 in type
>> int8_t and @3 is 128 in a wider type and the operation is AND.  Then the
>> original expression has result 128.  But if you convert @3 to int8_t you
>> get -128 and this would result in -128 from the simplified expression.
>
> Looking at this again, @3 is being converted to the wrong type, plain and
> simple.  I should never write code while heavily medicated.  I suspect
> that, combined with trying to work around the genmatch bug Richi just
> fixed, sent me down a totally wrong path.
>
> I think the way to go is to always convert the inner operands to an
> unsigned type.  In fact, everything except the outer convert should be
> using an unsigned type of the same size/precision as @0's type.  The
> outer convert should, of course, be the final type.
>
> That avoids all the concerns about sign bit propagation from the narrow
> type into the wider final type.
>
> Application of this pattern (and the one I posted for 47477) is a concern
> for targets that don't do sub-word arithmetic/logicals.  But I just did a
> sniff test of one such target (v850-elf, because it was handy) and I
> couldn't spot a change in the final code using both the 47477 patch and
> my WIP patch for this BZ.

The c-family frontends perform this kind of narrowing already anyway
(via the shorten_* stuff which is misplaced there and should be done
elsewhere for all frontends - thus in match.pd, thanks for starting that).

Richard.

> jeff
On 02/11/15 04:16, Richard Biener wrote:
>> Application of this pattern (and the one I posted for 47477) is a concern
>> for targets that don't do sub-word arithmetic/logicals.  But I just did a
>> sniff test of one such target (v850-elf, because it was handy) and I
>> couldn't spot a change in the final code using both the 47477 patch and
>> my WIP patch for this BZ.
>
> The c-family frontends perform this kind of narrowing already anyway
> (via the shorten_* stuff which is misplaced there and should be done
> elsewhere for all frontends - thus in match.pd, thanks for starting that).

True, but I wanted to see if there was any impact, and thankfully there
isn't.

The fact that the C/C++ front-ends are doing most of the shortening now
probably explains why the fix for 47477 only affected code generation for
the Java front-end.

Jeff
On Wed, Feb 11, 2015 at 08:56:20AM -0700, Jeff Law wrote:
> On 02/11/15 04:16, Richard Biener wrote:
> >> Application of this pattern (and the one I posted for 47477) is a concern
> >> for targets that don't do sub-word arithmetic/logicals.  But I just did a
> >> sniff test of one such target (v850-elf, because it was handy) and I
> >> couldn't spot a change in the final code using both the 47477 patch and
> >> my WIP patch for this BZ.
> >
> > The c-family frontends perform this kind of narrowing already anyway
> > (via the shorten_* stuff which is misplaced there and should be done
> > elsewhere for all frontends - thus in match.pd, thanks for starting that).
>
> True, but I wanted to see if there was any impact, and thankfully there
> isn't.
>
> The fact that the C/C++ front-ends are doing most of the shortening now
> probably explains why the fix for 47477 only affected code generation for
> the Java front-end.

The C/C++ FEs are doing it only if it appears in a single statement,
though - something fold can see at once.  If you rewrite it so that there
are multiple vars holding the intermediate values, they of course can't
do that, and it is a task for match.pd or a specialized pass to handle
those.

	Jakub
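[A concrete illustration of Jakub's point, with invented function names:
the two functions compute the same value, but only the first presents
the whole expression to fold/shorten_* at once.]

unsigned char
one_stmt (unsigned char a, unsigned char b)
{
  return (a & b) & 0x7f;     /* single expression: the front end can
                                shorten this tree itself */
}

unsigned char
many_stmts (unsigned char a, unsigned char b)
{
  int wa = a;                /* the widening is split across statements, */
  int wb = b;
  int t = wa & wb;           /* so the front end never sees the whole
                                expression at once; narrowing this is */
  return t & 0x7f;           /* left to match.pd or a GIMPLE pass */
}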
On Tue, 10 Feb 2015, Jeff Law wrote:

> I think the way to go is to always convert the inner operands to an
> unsigned type.  In fact, everything except the outer convert should be
> using an unsigned type of the same size/precision as @0's type.  The
> outer convert should, of course, be the final type.
>
> That avoids all the concerns about sign bit propagation from the narrow
> type into the wider final type.

Subject of course to allowing for the cases where the original expression
uses sign-extension (signed narrow type), the mask is consistent with
this (sign bit of narrow type and all higher bits set, e.g. -128), and
the operation is bitwise not arithmetic, if you want to optimize those as
well.
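[A C sketch of that sign-extending case, with invented function names:
a signed narrow type and a mask such as -128, whose set bits are the
int8_t sign bit plus every higher bit, so sign-extension of the narrow
result reproduces exactly the bits the mask keeps.  This only holds for
bitwise operations.]

#include <stdint.h>

int
wide_mask (int8_t a, int8_t b)
{
  return ((int) a & (int) b) & -128;    /* mask = 0xffffff80 */
}

int
narrow_mask (int8_t a, int8_t b)
{
  /* Sign-extension is what's needed here: the narrow result is either
     0 or -128, and widening -128 by zero-extension would give 128,
     breaking the equivalence.  */
  return (int) (int8_t) ((int8_t) (a & b) & -128);
}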
diff --git a/gcc/match.pd b/gcc/match.pd
index 81c4ee6..d55fccd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1018,3 +1018,31 @@ along with GCC; see the file COPYING3.  If not see
  (logs (pows @0 @1))
  (mult @1 (logs @0)))))
 
+/* Given a bit-wise operation performed in mode P1 on operands
+   in some narrower type P2 that feeds an outer masking operation.
+   See if the mask turns off all the bits outside P2, and if so
+   perform the all the operations in P2 and just convert the final
+   result from P1 to P2.  */
+(for inner_op (bit_and bit_ior bit_xor)
+ (simplify
+  (bit_and (inner_op (convert @0) (convert @1)) INTEGER_CST@3)
+  (if ((TREE_INT_CST_LOW (@3) & ~GET_MODE_MASK (TYPE_MODE (TREE_TYPE (@0)))) == 0
+       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
+       && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
+       && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
+   (convert (bit_and (inner_op @0 @1) (convert @3))))))
+
+/* Similarly, but for unsigned arithmetic operations.
+
+   It would be nice to handle signed arithmetic, but that runs the
+   risk of introducing undefined behaviour where none existed before.  */
+(for inner_op (minus plus mult)
+ (simplify
+  (bit_and (inner_op (convert @0) (convert @1)) INTEGER_CST@3)
+  (if ((TREE_INT_CST_LOW (@3) & ~GET_MODE_MASK (TYPE_MODE (TREE_TYPE (@0)))) == 0
+       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
+       && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
+       /* Restricted to unsigned inner arithmetic for now.  */
+       && TYPE_UNSIGNED (TREE_TYPE (@0))
+       && TYPE_PRECISION (TREE_TYPE (@3)) > TYPE_PRECISION (TREE_TYPE (@0)))
+   (convert (bit_and (inner_op @0 @1) (convert @3))))))
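[For the second (arithmetic) pattern, the unsigned restriction can be
sanity-checked in plain C; the functions below are illustrative, not part
of the patch.  With unsigned operands the narrow and wide computations
agree because unsigned arithmetic wraps modulo 2^N, whereas a narrowed
signed addition could overflow, introducing the undefined behaviour the
patch comment warns about.]

#include <stdint.h>

unsigned int
wide_add (uint8_t a, uint8_t b)
{
  /* The sum is computed in the wide type, then masked down.  */
  return ((unsigned int) a + (unsigned int) b) & 0xff;
}

unsigned int
narrow_add (uint8_t a, uint8_t b)
{
  /* Wraps modulo 256, which is exactly the low 8 bits kept by the mask
     above, so the two functions always agree.  */
  return (uint8_t) (a + b);
}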