diff mbox series

[RS6000] rs6000_rtx_costs cost IOR

Message ID 20200915011946.3395-7-amodra@gmail.com
State New
Headers show
Series [RS6000] rs6000_rtx_costs cost IOR | expand

Commit Message

Alan Modra Sept. 15, 2020, 1:19 a.m. UTC
* config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR.

Comments

Segher Boessenkool Sept. 17, 2020, 12:02 a.m. UTC | #1
Hi!

On Tue, Sep 15, 2020 at 10:49:44AM +0930, Alan Modra wrote:
> 	* config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR.

>      case IOR:
> -      /* FIXME */
>        *total = COSTS_N_INSNS (1);
> -      return true;

Hey this was okay for over five years :-)

> +      left = XEXP (x, 0);
> +      if (GET_CODE (left) == AND
> +	  && CONST_INT_P (XEXP (left, 1)))

Add a comment that this is the integer insert insns?

> +	      // rotlsi3_insert_5

But use /* comments */.

> +		  /* Test both regs even though the one in the mask is
> +		     constrained to be equal to the output.  Increasing
> +		     cost may well result in rejecting an invalid insn
> +		     earlier.  */

Is that ever actually useful?

So this new block is pretty huge.  Can it easily be factored to a
separate function?  Just the insert insns part, not all IOR.

Okay for trunk with the comments changed to the correct syntax, and
factoring masked insert out to a separate function pre-approved if you
want to do that.  Thanks!


Segher
Alan Modra Sept. 17, 2020, 3:42 a.m. UTC | #2
On Wed, Sep 16, 2020 at 07:02:06PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 15, 2020 at 10:49:44AM +0930, Alan Modra wrote:
> > 	* config/rs6000/rs6000.c (rs6000_rtx_costs): Cost IOR.
> 
> >      case IOR:
> > -      /* FIXME */
> >        *total = COSTS_N_INSNS (1);
> > -      return true;
> 
> Hey this was okay for over five years :-)
> 
> > +      left = XEXP (x, 0);
> > +      if (GET_CODE (left) == AND
> > +	  && CONST_INT_P (XEXP (left, 1)))
> 
> Add a comment that this is the integer insert insns?
> 
> > +	      // rotlsi3_insert_5
> 
> But use /* comments */.
> 
> > +		  /* Test both regs even though the one in the mask is
> > +		     constrained to be equal to the output.  Increasing
> > +		     cost may well result in rejecting an invalid insn
> > +		     earlier.  */
> 
> Is that ever actually useful?

Possibly not in this particular case, but I did see cases where
invalid insns were rejected early by costing non-reg sub-expressions.

Beside that, the default position on rtx_costs paths that return true
should be to cost any sub-expressions unless you know for sure they
are zero cost.  And yes, we fail to do that for some cases,
eg. mul_highpart.

> So this new block is pretty huge.  Can it easily be factored to a
> separate function?  Just the insert insns part, not all IOR.

Done in my local tree.

> Okay for trunk with the comments changed to the correct syntax, and
> factoring masked insert out to a separate function pre-approved if you
> want to do that.  Thanks!

I'll hold off committing until the whole rtx_costs patch series is
reviewed (not counting the rotate_and_mask_constant patch).
Segher Boessenkool Sept. 21, 2020, 3:49 p.m. UTC | #3
Hi!

On Thu, Sep 17, 2020 at 01:12:19PM +0930, Alan Modra wrote:
> On Wed, Sep 16, 2020 at 07:02:06PM -0500, Segher Boessenkool wrote:
> > > +		  /* Test both regs even though the one in the mask is
> > > +		     constrained to be equal to the output.  Increasing
> > > +		     cost may well result in rejecting an invalid insn
> > > +		     earlier.  */
> > 
> > Is that ever actually useful?
> 
> Possibly not in this particular case, but I did see cases where
> invalid insns were rejected early by costing non-reg sub-expressions.

But does that ever change generated code?

This makes the compiler a lot harder to read and understand.  To the
point that such micro-optimisations makes worthwhile optimisations hard
or impossible to do.


Segher
Alan Modra Sept. 21, 2020, 11:54 p.m. UTC | #4
On Mon, Sep 21, 2020 at 10:49:17AM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Sep 17, 2020 at 01:12:19PM +0930, Alan Modra wrote:
> > On Wed, Sep 16, 2020 at 07:02:06PM -0500, Segher Boessenkool wrote:
> > > > +		  /* Test both regs even though the one in the mask is
> > > > +		     constrained to be equal to the output.  Increasing
> > > > +		     cost may well result in rejecting an invalid insn
> > > > +		     earlier.  */
> > > 
> > > Is that ever actually useful?
> > 
> > Possibly not in this particular case, but I did see cases where
> > invalid insns were rejected early by costing non-reg sub-expressions.
> 
> But does that ever change generated code?
> 
> This makes the compiler a lot harder to read and understand.  To the
> point that such micro-optimisations makes worthwhile optimisations hard
> or impossible to do.

Fair enough, here's a revised patch.

	* config/rs6000/rs6000.c (rotate_insert_cost): New function.
	(rs6000_rtx_costs): Cost IOR.  Tidy break/return.  Tidy AND.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5025e3c30c0..78c33cc8cba 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21118,6 +21118,91 @@ rs6000_cannot_copy_insn_p (rtx_insn *insn)
 	 && get_attr_cannot_copy (insn);
 }
 
+/* Handle rtx_costs for scalar integer rotate and insert insns.  */
+
+static bool
+rotate_insert_cost (rtx left, rtx right, machine_mode mode, bool speed,
+		    int *total)
+{
+  if (GET_CODE (right) == AND
+      && CONST_INT_P (XEXP (right, 1))
+      && UINTVAL (XEXP (left, 1)) + UINTVAL (XEXP (right, 1)) + 1 == 0)
+    {
+      rtx leftop = XEXP (left, 0);
+      rtx rightop = XEXP (right, 0);
+
+      /* rotlsi3_insert_5.  */
+      if (REG_P (leftop)
+	  && REG_P (rightop)
+	  && mode == SImode
+	  && UINTVAL (XEXP (left, 1)) != 0
+	  && UINTVAL (XEXP (right, 1)) != 0
+	  && rs6000_is_valid_mask (XEXP (left, 1), NULL, NULL, mode))
+	return true;
+      /* rotldi3_insert_6.  */
+      if (REG_P (leftop)
+	  && REG_P (rightop)
+	  && mode == DImode
+	  && exact_log2 (-UINTVAL (XEXP (left, 1))) > 0)
+	return true;
+      /* rotldi3_insert_7.  */
+      if (REG_P (leftop)
+	  && REG_P (rightop)
+	  && mode == DImode
+	  && exact_log2 (-UINTVAL (XEXP (right, 1))) > 0)
+	return true;
+
+      rtx mask = 0;
+      rtx shift = leftop;
+      rtx_code shift_code = GET_CODE (shift);
+      /* rotl<mode>3_insert.  */
+      if (shift_code == ROTATE
+	  || shift_code == ASHIFT
+	  || shift_code == LSHIFTRT)
+	mask = right;
+      else
+	{
+	  shift = rightop;
+	  shift_code = GET_CODE (shift);
+	  /* rotl<mode>3_insert_2.  */
+	  if (shift_code == ROTATE
+	      || shift_code == ASHIFT
+	      || shift_code == LSHIFTRT)
+	    mask = left;
+	}
+      if (mask
+	  && CONST_INT_P (XEXP (shift, 1))
+	  && rs6000_is_valid_insert_mask (XEXP (mask, 1), shift, mode))
+	{
+	  *total += rtx_cost (XEXP (shift, 0), mode, shift_code, 0, speed);
+	  *total += rtx_cost (XEXP (mask, 0), mode, AND, 0, speed);
+	  return true;
+	}
+    }
+  /* rotl<mode>3_insert_3.  */
+  if (GET_CODE (right) == ASHIFT
+      && CONST_INT_P (XEXP (right, 1))
+      && (INTVAL (XEXP (right, 1))
+	  == exact_log2 (UINTVAL (XEXP (left, 1)) + 1)))
+    {
+      *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed);
+      *total += rtx_cost (XEXP (right, 0), mode, ASHIFT, 0, speed);
+      return true;
+    }
+  /* rotl<mode>3_insert_4.  */
+  if (GET_CODE (right) == LSHIFTRT
+      && CONST_INT_P (XEXP (right, 1))
+      && mode == SImode
+      && (INTVAL (XEXP (right, 1))
+	  + exact_log2 (-UINTVAL (XEXP (left, 1)))) == 32)
+    {
+      *total += rtx_cost (XEXP (left, 0), mode, AND, 0, speed);
+      *total += rtx_cost (XEXP (right, 0), mode, LSHIFTRT, 0, speed);
+      return true;
+    }
+  return false;
+}
+
 /* Compute a (partial) cost for rtx X.  Return true if the complete
    cost has been computed, and false if subexpressions should be
    scanned.  In either case, *TOTAL contains the cost result.
@@ -21165,6 +21250,7 @@ static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		  int opno ATTRIBUTE_UNUSED, int *total, bool speed)
 {
+  rtx left, right;
   int code = GET_CODE (x);
 
   switch (code)
@@ -21295,7 +21381,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	*total = rs6000_cost->fp;
       else
 	*total = rs6000_cost->dmul;
-      break;
+      return false;
 
     case DIV:
     case MOD:
@@ -21355,32 +21441,37 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case AND:
-      if (CONST_INT_P (XEXP (x, 1)))
+      *total = COSTS_N_INSNS (1);
+      right = XEXP (x, 1);
+      if (CONST_INT_P (right))
 	{
-	  rtx left = XEXP (x, 0);
+	  left = XEXP (x, 0);
 	  rtx_code left_code = GET_CODE (left);
 
 	  /* rotate-and-mask: 1 insn.  */
 	  if ((left_code == ROTATE
 	       || left_code == ASHIFT
 	       || left_code == LSHIFTRT)
-	      && rs6000_is_valid_shift_mask (XEXP (x, 1), left, mode))
+	      && rs6000_is_valid_shift_mask (right, left, mode))
 	    {
-	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
-	      if (!CONST_INT_P (XEXP (left, 1)))
-		*total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
-	      *total += COSTS_N_INSNS (1);
+	      *total += rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
+	      *total += rtx_cost (XEXP (left, 1), mode, left_code, 1, speed);
 	      return true;
 	    }
 	}
-
-      *total = COSTS_N_INSNS (1);
       return false;
 
     case IOR:
-      /* FIXME */
       *total = COSTS_N_INSNS (1);
-      return true;
+      left = XEXP (x, 0);
+      if (GET_CODE (left) == AND
+	  && CONST_INT_P (XEXP (left, 1)))
+	{
+	  right = XEXP (x, 1);
+	  if (rotate_insert_cost (left, right, mode, speed, total))
+	    return true;
+	}
+      return false;
 
     case CLZ:
     case XOR:
@@ -21465,7 +21556,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	  *total = rs6000_cost->fp;
 	  return false;
 	}
-      break;
+      return false;
 
     case NE:
     case EQ:
@@ -21503,13 +21594,11 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	  *total = 0;
 	  return true;
 	}
-      break;
+      return false;
 
     default:
-      break;
+      return false;
     }
-
-  return false;
 }
 
 /* Debug form of r6000_rtx_costs that is selected if -mdebug=cost.  */
diff mbox series

Patch

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8c300b82b11..fb5fe7969a3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21177,6 +21177,7 @@  static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		  int opno ATTRIBUTE_UNUSED, int *total, bool speed)
 {
+  rtx left, right;
   int code = GET_CODE (x);
 
   switch (code)
@@ -21367,16 +21368,17 @@  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case AND:
-      if (CONST_INT_P (XEXP (x, 1)))
+      right = XEXP (x, 1);
+      if (CONST_INT_P (right))
 	{
-	  rtx left = XEXP (x, 0);
+	  left = XEXP (x, 0);
 	  rtx_code left_code = GET_CODE (left);
 
 	  /* rotate-and-mask: 1 insn.  */
 	  if ((left_code == ROTATE
 	       || left_code == ASHIFT
 	       || left_code == LSHIFTRT)
-	      && rs6000_is_valid_shift_mask (XEXP (x, 1), left, mode))
+	      && rs6000_is_valid_shift_mask (right, left, mode))
 	    {
 	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
 	      if (!CONST_INT_P (XEXP (left, 1)))
@@ -21390,9 +21392,106 @@  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case IOR:
-      /* FIXME */
       *total = COSTS_N_INSNS (1);
-      return true;
+      left = XEXP (x, 0);
+      if (GET_CODE (left) == AND
+	  && CONST_INT_P (XEXP (left, 1)))
+	{
+	  right = XEXP (x, 1);
+	  if (GET_CODE (right) == AND
+	      && CONST_INT_P (XEXP (right, 1))
+	      && UINTVAL (XEXP (left, 1)) + UINTVAL (XEXP (right, 1)) + 1 == 0)
+	    {
+	      rtx leftop = XEXP (left, 0);
+	      rtx rightop = XEXP (right, 0);
+
+	      // rotlsi3_insert_5
+	      if (REG_P (leftop)
+		  && REG_P (rightop)
+		  && mode == SImode
+		  && UINTVAL (XEXP (left, 1)) != 0
+		  && UINTVAL (XEXP (right, 1)) != 0
+		  && rs6000_is_valid_mask (XEXP (left, 1), NULL, NULL, mode))
+		return true;
+	      // rotldi3_insert_6
+	      if (REG_P (leftop)
+		  && REG_P (rightop)
+		  && mode == DImode
+		  && exact_log2 (-UINTVAL (XEXP (left, 1))) > 0)
+		return true;
+	      // rotldi3_insert_7
+	      if (REG_P (leftop)
+		  && REG_P (rightop)
+		  && mode == DImode
+		  && exact_log2 (-UINTVAL (XEXP (right, 1))) > 0)
+		return true;
+
+	      rtx mask = 0;
+	      rtx shift = leftop;
+	      rtx_code shift_code = GET_CODE (shift);
+	      // rotl<mode>3_insert
+	      if (shift_code == ROTATE
+		  || shift_code == ASHIFT
+		  || shift_code == LSHIFTRT)
+		mask = right;
+	      else
+		{
+		  shift = rightop;
+		  shift_code = GET_CODE (shift);
+		  // rotl<mode>3_insert_2
+		  if (shift_code == ROTATE
+		      || shift_code == ASHIFT
+		      || shift_code == LSHIFTRT)
+		    mask = left;
+		}
+	      if (mask
+		  && CONST_INT_P (XEXP (shift, 1))
+		  && rs6000_is_valid_insert_mask (XEXP (mask, 1), shift, mode))
+		{
+		  /* Test both regs even though the one in the mask is
+		     constrained to be equal to the output.  Increasing
+		     cost may well result in rejecting an invalid insn
+		     earlier.  */
+		  rtx reg_op = XEXP (shift, 0);
+		  if (!REG_P (reg_op))
+		    *total += rtx_cost (reg_op, mode, shift_code, 0, speed);
+		  reg_op = XEXP (mask, 0);
+		  if (!REG_P (reg_op))
+		    *total += rtx_cost (reg_op, mode, AND, 0, speed);
+		  return true;
+		}
+	    }
+	  // rotl<mode>3_insert_3
+	  if (GET_CODE (right) == ASHIFT
+	      && CONST_INT_P (XEXP (right, 1))
+	      && (INTVAL (XEXP (right, 1))
+		  == exact_log2 (UINTVAL (XEXP (left, 1)) + 1)))
+	    {
+	      rtx reg_op = XEXP (left, 0);
+	      if (!REG_P (reg_op))
+		*total += rtx_cost (reg_op, mode, AND, 0, speed);
+	      reg_op = XEXP (right, 0);
+	      if (!REG_P (reg_op))
+		*total += rtx_cost (reg_op, mode, ASHIFT, 0, speed);
+	      return true;
+	    }
+	  // rotl<mode>3_insert_4
+	  if (GET_CODE (right) == LSHIFTRT
+	      && CONST_INT_P (XEXP (right, 1))
+	      && mode == SImode
+	      && (INTVAL (XEXP (right, 1))
+		  + exact_log2 (-UINTVAL (XEXP (left, 1)))) == 32)
+	    {
+	      rtx reg_op = XEXP (left, 0);
+	      if (!REG_P (reg_op))
+		*total += rtx_cost (reg_op, mode, AND, 0, speed);
+	      reg_op = XEXP (right, 0);
+	      if (!REG_P (reg_op))
+		*total += rtx_cost (reg_op, mode, LSHIFTRT, 0, speed);
+	      return true;
+	    }
+	}
+      return false;
 
     case CLZ:
     case XOR: