Message ID | 20200915011946.3395-7-amodra@gmail.com |
---|---|
State | New |
Headers | show |
Series | [RS6000] rs6000_rtx_costs cost IOR | expand |
Hi! On Tue, Sep 15, 2020 at 10:49:44AM +0930, Alan Modra wrote: > * config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR. > case IOR: > - /* FIXME */ > *total = COSTS_N_INSNS (1); > - return true; Hey this was okay for over five years :-) > + left = XEXP (x, 0); > + if (GET_CODE (left) == AND > + && CONST_INT_P (XEXP (left, 1))) Add a comment that this is the integer insert insns? > + // rotlsi3_insert_5 But use /* comments */. > + /* Test both regs even though the one in the mask is > + constrained to be equal to the output. Increasing > + cost may well result in rejecting an invalid insn > + earlier. */ Is that ever actually useful? So this new block is pretty huge. Can it easily be factored to a separate function? Just the insert insns part, not all IOR. Okay for trunk with the comments changed to the correct syntax, and factoring masked insert out to a separate function pre-approved if you want to do that. Thanks! Segher
On Wed, Sep 16, 2020 at 07:02:06PM -0500, Segher Boessenkool wrote: > Hi! > > On Tue, Sep 15, 2020 at 10:49:44AM +0930, Alan Modra wrote: > > * config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR. > > > case IOR: > > - /* FIXME */ > > *total = COSTS_N_INSNS (1); > > - return true; > > Hey this was okay for over five years :-) > > > + left = XEXP (x, 0); > > + if (GET_CODE (left) == AND > > + && CONST_INT_P (XEXP (left, 1))) > > Add a comment that this is the integer insert insns? > > > + // rotlsi3_insert_5 > > But use /* comments */. > > > + /* Test both regs even though the one in the mask is > > + constrained to be equal to the output. Increasing > > + cost may well result in rejecting an invalid insn > > + earlier. */ > > Is that ever actually useful? Possibly not in this particular case, but I did see cases where invalid insns were rejected early by costing non-reg sub-expressions. Beside that, the default position on rtx_costs paths that return true should be to cost any sub-expressions unless you know for sure they are zero cost. And yes, we fail to do that for some cases, eg. mul_highpart. > So this new block is pretty huge. Can it easily be factored to a > separate function? Just the insert insns part, not all IOR. Done in my local tree. > Okay for trunk with the comments changed to the correct syntax, and > factoring masked insert out to a separate function pre-approved if you > want to do that. Thanks! I'll hold off committing until the whole rtx_costs patch series is reviewed (not counting the rotate_and_mask_constant patch).
Hi! On Thu, Sep 17, 2020 at 01:12:19PM +0930, Alan Modra wrote: > On Wed, Sep 16, 2020 at 07:02:06PM -0500, Segher Boessenkool wrote: > > > + /* Test both regs even though the one in the mask is > > > + constrained to be equal to the output. Increasing > > > + cost may well result in rejecting an invalid insn > > > + earlier. */ > > > > Is that ever actually useful? > > Possibly not in this particular case, but I did see cases where > invalid insns were rejected early by costing non-reg sub-expressions. But does that ever change generated code? This makes the compiler a lot harder to read and understand. To the point that such micro-optimisations makes worthwhile optimisations hard or impossible to do. Segher
On Mon, Sep 21, 2020 at 10:49:17AM -0500, Segher Boessenkool wrote: > Hi! > > On Thu, Sep 17, 2020 at 01:12:19PM +0930, Alan Modra wrote: > > On Wed, Sep 16, 2020 at 07:02:06PM -0500, Segher Boessenkool wrote: > > > > + /* Test both regs even though the one in the mask is > > > > + constrained to be equal to the output. Increasing > > > > + cost may well result in rejecting an invalid insn > > > > + earlier. */ > > > > > > Is that ever actually useful? > > > > Possibly not in this particular case, but I did see cases where > > invalid insns were rejected early by costing non-reg sub-expressions. > > But does that ever change generated code? > > This makes the compiler a lot harder to read and understand. To the > point that such micro-optimisations makes worthwhile optimisations hard > or impossible to do. Fair enough, here's a revised patch. * config/rs6000/rs6000.c (rotate_insert_cost): New function. (rs6000_rtx_costs): Cost IOR. Tidy break/return. Tidy AND. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 5025e3c30c0..78c33cc8cba 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -21118,6 +21118,91 @@ rs6000_cannot_copy_insn_p (rtx_insn *insn) && get_attr_cannot_copy (insn); } +/* Handle rtx_costs for scalar integer rotate and insert insns. */ + +static bool +rotate_insert_cost (rtx left, rtx right, machine_mode mode, bool speed, + int *total) +{ + if (GET_CODE (right) == AND + && CONST_INT_P (XEXP (right, 1)) + && UINTVAL (XEXP (left, 1)) + UINTVAL (XEXP (right, 1)) + 1 == 0) + { + rtx leftop = XEXP (left, 0); + rtx rightop = XEXP (right, 0); + + /* rotlsi3_insert_5. */ + if (REG_P (leftop) + && REG_P (rightop) + && mode == SImode + && UINTVAL (XEXP (left, 1)) != 0 + && UINTVAL (XEXP (right, 1)) != 0 + && rs6000_is_valid_mask (XEXP (left, 1), NULL, NULL, mode)) + return true; + /* rotldi3_insert_6. */ + if (REG_P (leftop) + && REG_P (rightop) + && mode == DImode + && exact_log2 (-UINTVAL (XEXP (left, 1))) > 0) + return true; + /* rotldi3_insert_7. */ + if (REG_P (leftop) + && REG_P (rightop) + && mode == DImode + && exact_log2 (-UINTVAL (XEXP (right, 1))) > 0) + return true; + + rtx mask = 0; + rtx shift = leftop; + rtx_code shift_code = GET_CODE (shift); + /* rotl<mode>3_insert. */ + if (shift_code == ROTATE + || shift_code == ASHIFT + || shift_code == LSHIFTRT) + mask = right; + else + { + shift = rightop; + shift_code = GET_CODE (shift); + /* rotl<mode>3_insert_2. */ + if (shift_code == ROTATE + || shift_code == ASHIFT + || shift_code == LSHIFTRT) + mask = left; + } + if (mask + && CONST_INT_P (XEXP (shift, 1)) + && rs6000_is_valid_insert_mask (XEXP (mask, 1), shift, mode)) + { + *total += rtx_cost (XEXP (shift, 0), mode, shift_code, 0, speed); + *total += rtx_cost (XEXP (mask, 0), mode, AND, 0, speed); + return true; + } + } + /* rotl<mode>3_insert_3. */ + if (GET_CODE (right) == ASHIFT + && CONST_INT_P (XEXP (right, 1)) + && (INTVAL (XEXP (right, 1)) + == exact_log2 (UINTVAL (XEXP (left, 1)) + 1))) + { + *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed); + *total += rtx_cost (XEXP (right, 0), mode, ASHIFT, 0, speed); + return true; + } + /* rotl<mode>3_insert_4. */ + if (GET_CODE (right) == LSHIFTRT + && CONST_INT_P (XEXP (right, 1)) + && mode == SImode + && (INTVAL (XEXP (right, 1)) + + exact_log2 (-UINTVAL (XEXP (left, 1)))) == 32) + { + *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed); + *total += rtx_cost (XEXP (right, 0), mode, LSHIFTRT, 0, speed); + return true; + } + return false; +} + /* Compute a (partial) cost for rtx X. Return true if the complete cost has been computed, and false if subexpressions should be scanned. In either case, *TOTAL contains the cost result. @@ -21165,6 +21250,7 @@ static bool rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UNUSED, int *total, bool speed) { + rtx left, right; int code = GET_CODE (x); switch (code) @@ -21295,7 +21381,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, *total = rs6000_cost->fp; else *total = rs6000_cost->dmul; - break; + return false; case DIV: case MOD: @@ -21355,32 +21441,37 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, return false; case AND: - if (CONST_INT_P (XEXP (x, 1))) + *total = COSTS_N_INSNS (1); + right = XEXP (x, 1); + if (CONST_INT_P (right)) { - rtx left = XEXP (x, 0); + left = XEXP (x, 0); rtx_code left_code = GET_CODE (left); /* rotate-and-mask: 1 insn. */ if ((left_code == ROTATE || left_code == ASHIFT || left_code == LSHIFTRT) - && rs6000_is_valid_shift_mask (XEXP (x, 1), left, mode)) + && rs6000_is_valid_shift_mask (right, left, mode)) { - *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed); - if (!CONST_INT_P (XEXP (left, 1))) - *total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed); - *total += COSTS_N_INSNS (1); + *total += rtx_cost (XEXP (left, 0), mode, left_code, 0, speed); + *total += rtx_cost (XEXP (left, 1), mode, left_code, 1, speed); return true; } } - - *total = COSTS_N_INSNS (1); return false; case IOR: - /* FIXME */ *total = COSTS_N_INSNS (1); - return true; + left = XEXP (x, 0); + if (GET_CODE (left) == AND + && CONST_INT_P (XEXP (left, 1))) + { + right = XEXP (x, 1); + if (rotate_insert_cost (left, right, mode, speed, total)) + return true; + } + return false; case CLZ: case XOR: @@ -21465,7 +21556,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, *total = rs6000_cost->fp; return false; } - break; + return false; case NE: case EQ: @@ -21503,13 +21594,11 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, *total = 0; return true; } - break; + return false; default: - break; + return false; } - - return false; } /* Debug form of r6000_rtx_costs that is selected if -mdebug=cost. */
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 8c300b82b11..fb5fe7969a3 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -21177,6 +21177,7 @@ static bool rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UNUSED, int *total, bool speed) { + rtx left, right; int code = GET_CODE (x); switch (code) @@ -21367,16 +21368,17 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, return false; case AND: - if (CONST_INT_P (XEXP (x, 1))) + right = XEXP (x, 1); + if (CONST_INT_P (right)) { - rtx left = XEXP (x, 0); + left = XEXP (x, 0); rtx_code left_code = GET_CODE (left); /* rotate-and-mask: 1 insn. */ if ((left_code == ROTATE || left_code == ASHIFT || left_code == LSHIFTRT) - && rs6000_is_valid_shift_mask (XEXP (x, 1), left, mode)) + && rs6000_is_valid_shift_mask (right, left, mode)) { *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed); if (!CONST_INT_P (XEXP (left, 1))) @@ -21390,9 +21392,106 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code, return false; case IOR: - /* FIXME */ *total = COSTS_N_INSNS (1); - return true; + left = XEXP (x, 0); + if (GET_CODE (left) == AND + && CONST_INT_P (XEXP (left, 1))) + { + right = XEXP (x, 1); + if (GET_CODE (right) == AND + && CONST_INT_P (XEXP (right, 1)) + && UINTVAL (XEXP (left, 1)) + UINTVAL (XEXP (right, 1)) + 1 == 0) + { + rtx leftop = XEXP (left, 0); + rtx rightop = XEXP (right, 0); + + // rotlsi3_insert_5 + if (REG_P (leftop) + && REG_P (rightop) + && mode == SImode + && UINTVAL (XEXP (left, 1)) != 0 + && UINTVAL (XEXP (right, 1)) != 0 + && rs6000_is_valid_mask (XEXP (left, 1), NULL, NULL, mode)) + return true; + // rotldi3_insert_6 + if (REG_P (leftop) + && REG_P (rightop) + && mode == DImode + && exact_log2 (-UINTVAL (XEXP (left, 1))) > 0) + return true; + // rotldi3_insert_7 + if (REG_P (leftop) + && REG_P (rightop) + && mode == DImode + && exact_log2 (-UINTVAL (XEXP (right, 1))) > 0) + return true; + + rtx mask = 0; + rtx shift = leftop; + rtx_code shift_code = GET_CODE (shift); + // rotl<mode>3_insert + if (shift_code == ROTATE + || shift_code == ASHIFT + || shift_code == LSHIFTRT) + mask = right; + else + { + shift = rightop; + shift_code = GET_CODE (shift); + // rotl<mode>3_insert_2 + if (shift_code == ROTATE + || shift_code == ASHIFT + || shift_code == LSHIFTRT) + mask = left; + } + if (mask + && CONST_INT_P (XEXP (shift, 1)) + && rs6000_is_valid_insert_mask (XEXP (mask, 1), shift, mode)) + { + /* Test both regs even though the one in the mask is + constrained to be equal to the output. Increasing + cost may well result in rejecting an invalid insn + earlier. */ + rtx reg_op = XEXP (shift, 0); + if (!REG_P (reg_op)) + *total += rtx_cost (reg_op, mode, shift_code, 0, speed); + reg_op = XEXP (mask, 0); + if (!REG_P (reg_op)) + *total += rtx_cost (reg_op, mode, AND, 0, speed); + return true; + } + } + // rotl<mode>3_insert_3 + if (GET_CODE (right) == ASHIFT + && CONST_INT_P (XEXP (right, 1)) + && (INTVAL (XEXP (right, 1)) + == exact_log2 (UINTVAL (XEXP (left, 1)) + 1))) + { + rtx reg_op = XEXP (left, 0); + if (!REG_P (reg_op)) + *total += rtx_cost (reg_op, mode, AND, 0, speed); + reg_op = XEXP (right, 0); + if (!REG_P (reg_op)) + *total += rtx_cost (reg_op, mode, ASHIFT, 0, speed); + return true; + } + // rotl<mode>3_insert_4 + if (GET_CODE (right) == LSHIFTRT + && CONST_INT_P (XEXP (right, 1)) + && mode == SImode + && (INTVAL (XEXP (right, 1)) + + exact_log2 (-UINTVAL (XEXP (left, 1)))) == 32) + { + rtx reg_op = XEXP (left, 0); + if (!REG_P (reg_op)) + *total += rtx_cost (reg_op, mode, AND, 0, speed); + reg_op = XEXP (right, 0); + if (!REG_P (reg_op)) + *total += rtx_cost (reg_op, mode, LSHIFTRT, 0, speed); + return true; + } + } + return false; case CLZ: case XOR: