Message ID | mpt8r4abx42.fsf@arm.com
---|---
State | New
Series | vect: Tighten vect_determine_precisions_from_range [PR113281]
On Sat, Jan 27, 2024 at 4:44 PM Richard Sandiford <richard.sandiford@arm.com> wrote:
>
> This was another PR caused by the way that
> vect_determine_precisions_from_range handles shifts.  We tried to
> narrow 32768 >> x to a 16-bit shift based on range information for
> the inputs and outputs, with vect_recog_over_widening_pattern
> (after PR110828) adjusting the shift amount.  But this doesn't
> work for the case where x is in [16, 31], since then the 32-bit
> 32768 >> x is a well-defined zero, whereas no well-defined
> 16-bit shift 32768 >> y will produce 0.
>
> We could perhaps generate x < 16 ? 32768 >> x : 0 instead,
> but since vect_determine_precisions_from_range was never really
> supposed to rely on fix-ups, it seems better to fix that instead.
>
> The patch also makes the code more selective about which codes
> can be narrowed based on input and output ranges.  This showed
> that vect_truncatable_operation_p was missing cases for
> BIT_NOT_EXPR (equivalent to BIT_XOR_EXPR of -1) and NEGATE_EXPR
> (equivalent to BIT_NOT_EXPR followed by a PLUS_EXPR of 1).
>
> pr113281-1.c is the original testcase.  pr113281-[23].c failed
> before the patch due to overly optimistic narrowing.  pr113281-[45].c
> previously passed and are meant to protect against accidental
> optimisation regressions.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
>         PR target/113281
>         * tree-vect-patterns.cc (vect_recog_over_widening_pattern): Remove
>         workaround for right shifts.
>         (vect_truncatable_operation_p): Handle NEGATE_EXPR and BIT_NOT_EXPR.
>         (vect_determine_precisions_from_range): Be more selective about
>         which codes can be narrowed based on their input and output ranges.
>         For shifts, require at least one more bit of precision than the
>         maximum shift amount.
>
> gcc/testsuite/
>         PR target/113281
>         * gcc.dg/vect/pr113281-1.c: New test.
>         * gcc.dg/vect/pr113281-2.c: Likewise.
>         * gcc.dg/vect/pr113281-3.c: Likewise.
>         * gcc.dg/vect/pr113281-4.c: Likewise.
>         * gcc.dg/vect/pr113281-5.c: Likewise.
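To make the failure mode concrete, the sketch below (an editorial example, not part of the message or the patch) contrasts the original 32-bit semantics of 32768 >> x with what a narrowed 16-bit shift computes once the old MIN-based workaround clamps the shift amount to 15: for every x in [16, 31] the 32-bit result is a well-defined 0, while the clamped 16-bit shift yields 1.

/* Editorial sketch, not from the patch: why narrowing 32768 >> x to
   16 bits is unsafe for x in [16, 31].  "narrow" models the removed
   workaround, which clamped the shift amount to new_precision - 1
   (15 here) with a MIN_EXPR.  */
#include <stdio.h>

int
main (void)
{
  for (unsigned int x = 16; x < 32; ++x)
    {
      int wide = 32768 >> x;                       /* well defined: 0 */
      unsigned short amount = x < 15 ? x : 15;     /* old MIN-based clamp */
      unsigned short narrow = (unsigned short) 32768 >> amount;   /* 1 */
      if (wide != (int) narrow)
        printf ("x = %u: 32-bit gives %d, clamped 16-bit gives %u\n",
                x, wide, (unsigned int) narrow);
    }
  return 0;
}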
 gcc/testsuite/gcc.dg/vect/pr113281-1.c |  17 +++
 gcc/testsuite/gcc.dg/vect/pr113281-2.c |  50 +++++++++
 gcc/testsuite/gcc.dg/vect/pr113281-3.c |  39 +++++++
 gcc/testsuite/gcc.dg/vect/pr113281-4.c |  55 ++++++++++
 gcc/testsuite/gcc.dg/vect/pr113281-5.c |  66 ++++++++++++
 gcc/tree-vect-patterns.cc              | 144 +++++++++++++------------
 6 files changed, 305 insertions(+), 66 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr113281-5.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-1.c b/gcc/testsuite/gcc.dg/vect/pr113281-1.c
new file mode 100644
index 00000000000..6df4231cb5f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-1.c
@@ -0,0 +1,17 @@
+#include "tree-vect.h"
+
+unsigned char a;
+
+int main() {
+  check_vect ();
+
+  short b = a = 0;
+  for (; a != 19; a++)
+    if (a)
+      b = 32872 >> a;
+
+  if (b == 0)
+    return 0;
+  else
+    return 1;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-2.c b/gcc/testsuite/gcc.dg/vect/pr113281-2.c
new file mode 100644
index 00000000000..3a1170c28b6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-2.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+
+#define N 128
+
+short x[N];
+short y[N];
+
+void
+f1 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= y[i];
+}
+
+void
+f2 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= (y[i] < 32 ? y[i] : 32);
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= (y[i] < 31 ? y[i] : 31);
+}
+
+void
+f4 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= (y[i] & 31);
+}
+
+void
+f5 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= 0x8000 >> y[i];
+}
+
+void
+f6 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= 0x8000 >> (y[i] & 31);
+}
+
+/* { dg-final { scan-tree-dump-not {can narrow[^\n]+>>} "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-3.c b/gcc/testsuite/gcc.dg/vect/pr113281-3.c
new file mode 100644
index 00000000000..5982dd2d16f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-3.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+
+#define N 128
+
+short x[N];
+short y[N];
+
+void
+f1 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= (y[i] < 30 ? y[i] : 30);
+}
+
+void
+f2 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= ((y[i] & 15) + 2);
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= (y[i] < 16 ? y[i] : 16);
+}
+
+void
+f4 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] = 32768 >> ((y[i] & 15) + 3);
+}
+
+/* { dg-final { scan-tree-dump {can narrow to signed:31 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {can narrow to signed:18 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {can narrow to signed:17 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {can narrow to unsigned:19 without loss [^\n]+>>} "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-4.c b/gcc/testsuite/gcc.dg/vect/pr113281-4.c
new file mode 100644
index 00000000000..10fbc0e8405
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-4.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+
+#define N 128
+
+short x[N];
+short y[N];
+
+void
+f1 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= (y[i] & 15);
+}
+
+void
+f2 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= ((y[i] & 7) + 8);
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= ((y[i] & 7) ^ 11);
+}
+
+void
+f4 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= (y[i] < 15 ? y[i] : 15);
+}
+
+void
+f5 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] >>= (y[i] < 15 ? y[i] : 1);
+}
+
+void
+f6 (void)
+{
+  for (int i = 0; i < N; ++i)
+    x[i] = 32768 >> (y[i] & 15);
+}
+
+/* { dg-final { scan-tree-dump {:11:[^\n]+can narrow to signed:16 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {:18:[^\n]+can narrow to signed:16 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {:25:[^\n]+can narrow to signed:16 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {:32:[^\n]+can narrow to signed:16 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {:39:[^\n]+can narrow to signed:16 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {can narrow to unsigned:16 without loss [^\n]+>>} "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-5.c b/gcc/testsuite/gcc.dg/vect/pr113281-5.c
new file mode 100644
index 00000000000..4a4571792e2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-5.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+
+#define N 128
+
+short x[N];
+short y[N];
+
+void
+f1 (void)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      int a = y[i];
+      int b = ~a;
+      x[i] = b;
+    }
+}
+
+void
+f2 (void)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      int a = y[i];
+      int b = -a;
+      x[i] = b;
+    }
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      int a = x[i];
+      int b = a / y[i];
+      x[i] = b;
+    }
+}
+
+void
+f4 (void)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      int a = x[i];
+      int b = a < y[i] ? a : y[i];
+      x[i] = b;
+    }
+}
+
+void
+f5 (void)
+{
+  for (int i = 0; i < N; ++i)
+    {
+      int a = x[i];
+      int b = a > y[i] ? a : y[i];
+      x[i] = b;
+    }
+}
+
+/* { dg-final { scan-tree-dump {can narrow to signed:17 without loss [^\n]+= -} "vect" } } */
+/* { dg-final { scan-tree-dump {can narrow to signed:16 without loss [^\n]+= ~} "vect" } } */
+/* { dg-final { scan-tree-dump {can narrow to signed:16 without loss [^\n]+ MIN_EXPR} "vect" } } */
+/* { dg-final { scan-tree-dump {can narrow to signed:16 without loss [^\n]+ MAX_EXPR} "vect" } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index f1a95ef34b9..d562f57920f 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -3164,46 +3164,9 @@ vect_recog_over_widening_pattern (vec_info *vinfo,
   tree ops[3] = {};
   for (unsigned int i = 1; i < first_op; ++i)
     ops[i - 1] = gimple_op (last_stmt, i);
-  /* For right shifts limit the shift operand.  */
   vect_convert_inputs (vinfo, last_stmt_info, nops, &ops[first_op - 1],
                        op_type, &unprom[0], op_vectype);
 
-  /* Limit shift operands.  */
-  if (code == RSHIFT_EXPR)
-    {
-      wide_int min_value, max_value;
-      if (TREE_CODE (ops[1]) == INTEGER_CST)
-        ops[1] = wide_int_to_tree (op_type,
-                                   wi::umin (wi::to_wide (ops[1]),
-                                             new_precision - 1));
-      else if (!vect_get_range_info (ops[1], &min_value, &max_value)
-               || wi::ge_p (max_value, new_precision, TYPE_SIGN (op_type)))
-        {
-          /* ??? Note the following bad for SLP as that only supports
-             same argument widened shifts and it un-CSEs same arguments.  */
-          tree new_var = vect_recog_temp_ssa_var (op_type, NULL);
-          gimple *pattern_stmt
-            = gimple_build_assign (new_var, MIN_EXPR, ops[1],
-                                   build_int_cst (op_type, new_precision - 1));
-          gimple_set_location (pattern_stmt, gimple_location (last_stmt));
-          if (ops[1] == unprom[1].op && unprom[1].dt == vect_external_def)
-            {
-              if (edge e = vect_get_external_def_edge (vinfo, ops[1]))
-                {
-                  basic_block new_bb
-                    = gsi_insert_on_edge_immediate (e, pattern_stmt);
-                  gcc_assert (!new_bb);
-                }
-              else
-                return NULL;
-            }
-          else
-            append_pattern_def_seq (vinfo, last_stmt_info, pattern_stmt,
-                                    op_vectype);
-          ops[1] = new_var;
-        }
-    }
-
   /* Use the operation to produce a result of type OP_TYPE.  */
   tree new_var = vect_recog_temp_ssa_var (op_type, NULL);
   gimple *pattern_stmt = gimple_build_assign (new_var, code,
@@ -6359,9 +6322,11 @@ vect_truncatable_operation_p (tree_code code)
 {
   switch (code)
     {
+    case NEGATE_EXPR:
     case PLUS_EXPR:
     case MINUS_EXPR:
     case MULT_EXPR:
+    case BIT_NOT_EXPR:
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
@@ -6520,38 +6485,85 @@ vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
   unsigned int nops = gimple_num_ops (stmt);
 
   if (!vect_truncatable_operation_p (code))
-    /* Check that all relevant input operands are compatible, and update
-       [MIN_VALUE, MAX_VALUE] to include their ranges.  */
-    for (unsigned int i = 1; i < nops; ++i)
-      {
-        tree op = gimple_op (stmt, i);
-        if (TREE_CODE (op) == INTEGER_CST)
-          {
-            /* Don't require the integer to have RHS_TYPE (which it might
-               not for things like shift amounts, etc.), but do require it
-               to fit the type.  */
-            if (!int_fits_type_p (op, type))
-              return;
-
-            min_value = wi::min (min_value, wi::to_wide (op, precision), sign);
-            max_value = wi::max (max_value, wi::to_wide (op, precision), sign);
-          }
-        else if (TREE_CODE (op) == SSA_NAME)
-          {
-            /* Ignore codes that don't take uniform arguments.  */
-            if (!types_compatible_p (TREE_TYPE (op), type))
-              return;
+    {
+      /* Handle operations that can be computed in type T if all inputs
+         and outputs can be represented in type T.  Also handle left and
+         right shifts, where (in addition) the maximum shift amount must
+         be less than the number of bits in T.  */
+      bool is_shift;
+      switch (code)
+        {
+        case LSHIFT_EXPR:
+        case RSHIFT_EXPR:
+          is_shift = true;
+          break;
 
-            wide_int op_min_value, op_max_value;
-            if (!vect_get_range_info (op, &op_min_value, &op_max_value))
-              return;
+        case ABS_EXPR:
+        case MIN_EXPR:
+        case MAX_EXPR:
+        case TRUNC_DIV_EXPR:
+        case CEIL_DIV_EXPR:
+        case FLOOR_DIV_EXPR:
+        case ROUND_DIV_EXPR:
+        case EXACT_DIV_EXPR:
+          /* Modulus is excluded because it is typically calculated by doing
+             a division, for which minimum signed / -1 isn't representable in
+             the original signed type.  We could take the division range into
+             account instead, if handling modulus ever becomes important.  */
+          is_shift = false;
+          break;
 
-            min_value = wi::min (min_value, op_min_value, sign);
-            max_value = wi::max (max_value, op_max_value, sign);
-          }
-        else
+        default:
           return;
-      }
+        }
+      for (unsigned int i = 1; i < nops; ++i)
+        {
+          tree op = gimple_op (stmt, i);
+          wide_int op_min_value, op_max_value;
+          if (TREE_CODE (op) == INTEGER_CST)
+            {
+              unsigned int op_precision = TYPE_PRECISION (TREE_TYPE (op));
+              op_min_value = op_max_value = wi::to_wide (op, op_precision);
+            }
+          else if (TREE_CODE (op) == SSA_NAME)
+            {
+              if (!vect_get_range_info (op, &op_min_value, &op_max_value))
+                return;
+            }
+          else
+            return;
+
+          if (is_shift && i == 2)
+            {
+              /* There needs to be one more bit than the maximum shift amount.
+
+                 If the maximum shift amount is already 1 less than PRECISION
+                 then we can't narrow the shift further.  Dealing with that
+                 case first ensures that we can safely use an unsigned range
+                 below.
+
+                 op_min_value isn't relevant, since shifts by negative amounts
+                 are UB.  */
+              if (wi::geu_p (op_max_value, precision - 1))
+                return;
+              unsigned int min_bits = op_max_value.to_uhwi () + 1;
+
+              /* As explained below, we can convert a signed shift into an
+                 unsigned shift if the sign bit is always clear.  At this
+                 point we've already processed the ranges of the output and
+                 the first input.  */
+              auto op_sign = sign;
+              if (sign == SIGNED && !wi::neg_p (min_value))
+                op_sign = UNSIGNED;
+              op_min_value = wide_int::from (wi::min_value (min_bits, op_sign),
                                              precision, op_sign);
+              op_max_value = wide_int::from (wi::max_value (min_bits, op_sign),
                                              precision, op_sign);
+            }
+          min_value = wi::min (min_value, op_min_value, sign);
+          max_value = wi::max (max_value, op_max_value, sign);
+        }
+    }
 
   /* Try to switch signed types for unsigned types if we can.
      This is better for two reasons.  First, unsigned ops tend
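As a quick cross-check of the new shift rule against the dump lines expected by the tests (an editorial worked example, not from the patch): the narrowed type needs at least one more bit than the maximum shift amount, so

- y[i] & 15 has maximum 15, so 16 bits suffice (the signed:16 lines in pr113281-4.c);
- (y[i] & 15) + 2 has maximum 17 and needs 18 bits, y[i] < 16 ? y[i] : 16 needs 17 bits, and y[i] < 30 ? y[i] : 30 needs 31 bits (pr113281-3.c);
- 32768 >> ((y[i] & 15) + 3) has maximum shift 18 and needs 19 bits; since every value involved is non-negative, an unsigned range can be used (the unsigned:19 line in pr113281-3.c);
- in pr113281-2.c the shift amounts can reach 31 or are not usefully bounded, so no narrowing is reported at all.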