diff mbox series

combine: zeroing cost for new copies

Message ID 41ddb2fb-0118-4d14-9d04-b882deb3855f@linux.ibm.com
State New
Headers show
Series combine: zeroing cost for new copies | expand

Commit Message

Kewen.Lin Dec. 9, 2020, 9:49 a.m. UTC
Hi,

This patch is to treat those new pseudo-to-pseudo copies
after hard-reg-to-pseudo-copy as zero costs.  The
justification is that these new copies are closely after
the corresponding hard-reg-to-pseudo-copy insns, register
allocation should be able to coalesce them and get them
eliminated.

Now these copies follow the normal costing scheme, the
below case dump shows the unexpected combination:

``` dump

Trying 3, 2 -> 13:
    3: r119:DI=r132:DI
      REG_DEAD r132:DI
    2: r118:DI=r131:DI
      REG_DEAD r131:DI
   13: r128:DI=r118:DI&0xffffffff|r119:DI<<0x20
      REG_DEAD r119:DI
      REG_DEAD r118:DI

Failed to match this instruction:
(set (reg:DI 128)
    (ior:DI (ashift:DI (reg:DI 132)
            (const_int 32 [0x20]))
        (reg:DI 131)))
Successfully matched this instruction:
(set (reg/v:DI 119 [ f2 ])
    (ashift:DI (reg:DI 132)
        (const_int 32 [0x20])))
Successfully matched this instruction:
(set (reg:DI 128)
    (ior:DI (reg/v:DI 119 [ f2 ])
        (reg:DI 131)))
allowing combination of insns 2, 3 and 13
original costs 4 + 4 + 4 = 12
replacement costs 4 + 4 = 8
deferring deletion of insn with uid = 2.
modifying insn i2     3: r119:DI=r132:DI<<0x20
      REG_DEAD r132:DI
deferring rescan insn with uid = 3.
modifying insn i3    13: r128:DI=r119:DI|r131:DI
      REG_DEAD r131:DI
      REG_DEAD r119:DI
deferring rescan insn with uid = 13.

``` end dump

The original insn 13 can work well as rotldi3_insert_3,
so the combination with shift/or isn't better, but the
costing doesn't matches.

With this patch, we get below instead:

rejecting combination of insns 2, 3 and 13
original costs 0 + 0 + 4 = 4
replacement costs 4 + 4 = 8


Bootstrapped/regtested on powerpc64le-linux-gnu P9.

Is it reasonable?  Any comments are highly appreciated!

BR,
Kewen
------
gcc/ChangeLog:

	* combine.c (new_copies): New static global variable declare/init.
	(combine_validate_cost): Consider zero costs from new_copies.
	(combine_instructions): Set zero cost for insns in new_copies.
	(make_more_copies): Record new pseudo-to-pseudo copies to new_copies.
	(rest_of_handle_combine): Call bitmap alloc/free for new_copies.

Comments

Kewen.Lin Jan. 14, 2021, 2:27 a.m. UTC | #1
Hi,

I'd like to gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561413.html


BR,
Kewen

on 2020/12/9 下午5:49, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> This patch is to treat those new pseudo-to-pseudo copies
> after hard-reg-to-pseudo-copy as zero costs.  The
> justification is that these new copies are closely after
> the corresponding hard-reg-to-pseudo-copy insns, register
> allocation should be able to coalesce them and get them
> eliminated.
> 
> Now these copies follow the normal costing scheme, the
> below case dump shows the unexpected combination:
> 
> ``` dump
> 
> Trying 3, 2 -> 13:
>     3: r119:DI=r132:DI
>       REG_DEAD r132:DI
>     2: r118:DI=r131:DI
>       REG_DEAD r131:DI
>    13: r128:DI=r118:DI&0xffffffff|r119:DI<<0x20
>       REG_DEAD r119:DI
>       REG_DEAD r118:DI
> 
> Failed to match this instruction:
> (set (reg:DI 128)
>     (ior:DI (ashift:DI (reg:DI 132)
>             (const_int 32 [0x20]))
>         (reg:DI 131)))
> Successfully matched this instruction:
> (set (reg/v:DI 119 [ f2 ])
>     (ashift:DI (reg:DI 132)
>         (const_int 32 [0x20])))
> Successfully matched this instruction:
> (set (reg:DI 128)
>     (ior:DI (reg/v:DI 119 [ f2 ])
>         (reg:DI 131)))
> allowing combination of insns 2, 3 and 13
> original costs 4 + 4 + 4 = 12
> replacement costs 4 + 4 = 8
> deferring deletion of insn with uid = 2.
> modifying insn i2     3: r119:DI=r132:DI<<0x20
>       REG_DEAD r132:DI
> deferring rescan insn with uid = 3.
> modifying insn i3    13: r128:DI=r119:DI|r131:DI
>       REG_DEAD r131:DI
>       REG_DEAD r119:DI
> deferring rescan insn with uid = 13.
> 
> ``` end dump
> 
> The original insn 13 can work well as rotldi3_insert_3,
> so the combination with shift/or isn't better, but the
> costing doesn't matches.
> 
> With this patch, we get below instead:
> 
> rejecting combination of insns 2, 3 and 13
> original costs 0 + 0 + 4 = 4
> replacement costs 4 + 4 = 8
> 
> 
> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
> 
> Is it reasonable?  Any comments are highly appreciated!
> 
> BR,
> Kewen
> ------
> gcc/ChangeLog:
> 
> 	* combine.c (new_copies): New static global variable declare/init.
> 	(combine_validate_cost): Consider zero costs from new_copies.
> 	(combine_instructions): Set zero cost for insns in new_copies.
> 	(make_more_copies): Record new pseudo-to-pseudo copies to new_copies.
> 	(rest_of_handle_combine): Call bitmap alloc/free for new_copies.
>
Segher Boessenkool Jan. 14, 2021, 8:43 p.m. UTC | #2
Hi!

On Wed, Dec 09, 2020 at 05:49:53PM +0800, Kewen.Lin wrote:
> This patch is to treat those new pseudo-to-pseudo copies
> after hard-reg-to-pseudo-copy as zero costs.  The
> justification is that these new copies are closely after
> the corresponding hard-reg-to-pseudo-copy insns, register
> allocation should be able to coalesce them and get them
> eliminated.

Costing things that are not free as cost zero is very problematic.
Cost zero is problematic in combine anyway (it means unknown cost, not
no cost).

> Now these copies follow the normal costing scheme, the
> below case dump shows the unexpected combination:
> 
> ``` dump
> 
> Trying 3, 2 -> 13:
>     3: r119:DI=r132:DI
>       REG_DEAD r132:DI
>     2: r118:DI=r131:DI
>       REG_DEAD r131:DI
>    13: r128:DI=r118:DI&0xffffffff|r119:DI<<0x20
>       REG_DEAD r119:DI
>       REG_DEAD r118:DI

This should not combine if 2+13 and 3+13 do not combine already.  Why
did those not combine?

> Failed to match this instruction:
> (set (reg:DI 128)
>     (ior:DI (ashift:DI (reg:DI 132)
>             (const_int 32 [0x20]))
>         (reg:DI 131)))

Likely because it results in this, and this insn isn't recognised.  So
this can be fixed by adding a pattern for it (it needs to make sure all
but the bottom 32 bits of reg 131 are zero; it can use nonzero_bits for
that).

Long ago I had the following patch for this.  Not sure why I never
submitted it, maybe there is something wronmg with it?


Segher

=
From 04c44ad71941310c84b376744cfbcc87c93a8d68 Mon Sep 17 00:00:00 2001
Message-Id: <04c44ad71941310c84b376744cfbcc87c93a8d68.1528751010.git.segher@kernel.crashing.org>
From: Segher Boessenkool <segher@kernel.crashing.org>
Date: Mon, 11 Jun 2018 20:46:31 +0000
Subject: [PATCH] rs6000: Add a splitter for a rl*imi case

An rl*imi is usually written as an IOR of an ASHIFT or similar, and an
AND of a register with a constant mask.  In some cases combine knows
that that AND doesn't do anything (because all zero bits in that mask
correspond to bits known to be already zero), and then no pattern
matches.  This patch adds a define_split for such cases.  It uses
nonzero_bits in the condition of the splitter, but does not need it
afterwards for the instruction to be recognised.  This is necessary
because later passes can see fewer nonzero_bits.

Because it is a splitter, combine will only use it when starting with
three insns (or more), even though the result is just one.  This isn't
a huge problem in practice, but some possible combinations still won't
happen.

---
 gcc/config/rs6000/rs6000.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 38555f5..bc9781b 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3920,6 +3920,24 @@ (define_insn "*rotl<mode>3_insert_3"
 }
   [(set_attr "type" "insert")])
 
+(define_code_iterator plus_ior_xor [plus ior xor])
+
+(define_split
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+	(plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand")
+				      (match_operand:SI 2 "const_int_operand"))
+			  (match_operand:GPR 3 "gpc_reg_operand")))]
+  "nonzero_bits (operands[3], <MODE>mode)
+   < HOST_WIDE_INT_1U << INTVAL (operands[2])"
+  [(set (match_dup 0)
+	(ior:GPR (and:GPR (match_dup 3)
+			  (match_dup 4))
+		 (ashift:GPR (match_dup 1)
+			     (match_dup 2))))]
+{
+  operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1);
+})
+
 (define_insn "*rotl<mode>3_insert_4"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
 	(ior:GPR (and:GPR (match_operand:GPR 3 "gpc_reg_operand" "0")
Kewen.Lin Jan. 15, 2021, 6:49 a.m. UTC | #3
Hi Segher,

Thanks for the comments!

on 2021/1/15 上午4:43, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Dec 09, 2020 at 05:49:53PM +0800, Kewen.Lin wrote:
>> This patch is to treat those new pseudo-to-pseudo copies
>> after hard-reg-to-pseudo-copy as zero costs.  The
>> justification is that these new copies are closely after
>> the corresponding hard-reg-to-pseudo-copy insns, register
>> allocation should be able to coalesce them and get them
>> eliminated.
> 
> Costing things that are not free as cost zero is very problematic.

I totally agree this.  But I'd argue that these new pseudo-to-pseudo
copies aren't counted as "not free".  Assuming they are not free, we
need to have one postpass to clean them once they are not used in
combine pass.  These new copies are closely after the corresponding
hard-reg-to-pseudo-copy insns, register allocation should be able to
coalesce them and get them eliminated.  So I think they are free
in the context of combine cost modeling.

> Cost zero is problematic in combine anyway (it means unknown cost, not
> no cost).

Yeah, zeroing cost is a bad wording here, the patch actually treats
them as free in costing, not change their costs to zero (unknown).

> 
>> Now these copies follow the normal costing scheme, the
>> below case dump shows the unexpected combination:
>>
>> ``` dump
>>
>> Trying 3, 2 -> 13:
>>     3: r119:DI=r132:DI
>>       REG_DEAD r132:DI
>>     2: r118:DI=r131:DI
>>       REG_DEAD r131:DI
>>    13: r128:DI=r118:DI&0xffffffff|r119:DI<<0x20
>>       REG_DEAD r119:DI
>>       REG_DEAD r118:DI
> 
> This should not combine if 2+13 and 3+13 do not combine already.  Why
> did those not combine?

```
Trying 2 -> 13:
    2: r118:DI=r131:DI
      REG_DEAD r131:DI
   13: r128:DI=r118:DI&0xffffffff|r119:DI<<0x20
      REG_DEAD r119:DI
      REG_DEAD r118:DI
Failed to match this instruction:
(set (reg:DI 128)
    (ior:DI (ashift:DI (reg/v:DI 119 [ f2 ])
            (const_int 32 [0x20]))
        (reg:DI 131)))

Trying 3 -> 13:
    3: r119:DI=r132:DI
      REG_DEAD r132:DI
   13: r128:DI=r118:DI&0xffffffff|r119:DI<<0x20
      REG_DEAD r119:DI
      REG_DEAD r118:DI
Failed to match this instruction:
(set (reg:DI 128)
    (ior:DI (ashift:DI (reg:DI 132)
            (const_int 32 [0x20]))
        (reg/v:DI 118 [ f1 ])))
```

We don't have the pattern to match it.  So the patch in another thread
"rs6000: Use rldimi for vec init instead of shift + ior" can't keep the
desirable *rotl<mode>3_insert_3 instructions without adjusting the
costs like this patch.  Later 2,3 -> 13 will split it into shift and or.

> 
>> Failed to match this instruction:
>> (set (reg:DI 128)
>>     (ior:DI (ashift:DI (reg:DI 132)
>>             (const_int 32 [0x20]))
>>         (reg:DI 131)))
> 
> Likely because it results in this, and this insn isn't recognised.  So
> this can be fixed by adding a pattern for it (it needs to make sure all
> but the bottom 32 bits of reg 131 are zero; it can use nonzero_bits for
> that).
> 

Yeah, that's what I thought before, as another patch mentioned, I tried
to use nonzero_bits in define_insn but failed to.  I didn't found any
nonzero_bits usages in md files, the regression tesing also showed the
recog would have more rough nonzero_bits information than what we have
in combine and then fail to validate changes, so I did give up this
direction before.

Your patch below shows a brand new way to use nonzero_bits and avoid
the recog issue, it opens a window for me, so great!!!

> Long ago I had the following patch for this.  Not sure why I never
> submitted it, maybe there is something wronmg with it?
> 

If you don't mind, I'll do a check with bootstrappping and regression
testing and then get back to you.
BR,
Kewen
Kewen.Lin Jan. 19, 2021, 7:11 a.m. UTC | #4
Hi Segher,

on 2021/1/15 下午2:49, Kewen.Lin via Gcc-patches wrote:
> on 2021/1/15 上午4:43, Segher Boessenkool wrote:

[snip...]

>> Long ago I had the following patch for this.  Not sure why I never
>> submitted it, maybe there is something wronmg with it?
>>
> 
> If you don't mind, I'll do a check with bootstrappping and regression
> testing and then get back to you.

Your posted patch was bootstrapped/regtested on powerpc64le-linux-gnu
Power9 and powerpc64-linux-gnu Power8, it looks fine to be landed.  :)

I also confirmed that the vec init with type int patch works as expected
on top of this.  Could you have a double check and commit it later
if you don't find anything wrong either?

Thanks in advance.

BR,
Kewen
diff mbox series

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index ed1ad45de83..6fb2fa82c3f 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -419,6 +419,10 @@  static struct undobuf undobuf;
 
 static int n_occurrences;
 
+/* Record the newly introduced pseudo-to-pseudo copies in function
+   make_more_copies.  */
+static bitmap new_copies = NULL;
+
 static rtx reg_nonzero_bits_for_combine (const_rtx, scalar_int_mode,
 					 scalar_int_mode,
 					 unsigned HOST_WIDE_INT *);
@@ -856,30 +860,38 @@  combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn *i2, rtx_insn *i3,
   int i0_cost, i1_cost, i2_cost, i3_cost;
   int new_i2_cost, new_i3_cost;
   int old_cost, new_cost;
+  bool i0_cost_ok, i1_cost_ok, i2_cost_ok, i3_cost_ok;
 
   /* Lookup the original insn_costs.  */
   i2_cost = INSN_COST (i2);
   i3_cost = INSN_COST (i3);
+  i2_cost_ok = (i2_cost > 0) || bitmap_bit_p (new_copies, INSN_UID (i2));
+  i3_cost_ok = (i3_cost > 0) || bitmap_bit_p (new_copies, INSN_UID (i3));
 
   if (i1)
     {
       i1_cost = INSN_COST (i1);
+      i1_cost_ok = (i1_cost > 0) || bitmap_bit_p (new_copies, INSN_UID (i1));
       if (i0)
 	{
 	  i0_cost = INSN_COST (i0);
-	  old_cost = (i0_cost > 0 && i1_cost > 0 && i2_cost > 0 && i3_cost > 0
-		      ? i0_cost + i1_cost + i2_cost + i3_cost : 0);
+	  i0_cost_ok = (i0_cost > 0)
+		       || bitmap_bit_p (new_copies, INSN_UID (i0));
+	  old_cost = (i0_cost_ok && i1_cost_ok && i2_cost_ok && i3_cost_ok
+			? i0_cost + i1_cost + i2_cost + i3_cost
+			: 0);
 	}
       else
 	{
-	  old_cost = (i1_cost > 0 && i2_cost > 0 && i3_cost > 0
-		      ? i1_cost + i2_cost + i3_cost : 0);
+	  old_cost = (i1_cost_ok && i2_cost_ok && i3_cost_ok
+			? i1_cost + i2_cost + i3_cost
+			: 0);
 	  i0_cost = 0;
 	}
     }
   else
     {
-      old_cost = (i2_cost > 0 && i3_cost > 0) ? i2_cost + i3_cost : 0;
+      old_cost = (i2_cost_ok && i3_cost_ok) ? i2_cost + i3_cost : 0;
       i1_cost = i0_cost = 0;
     }
 
@@ -1233,7 +1245,12 @@  combine_instructions (rtx_insn *f, unsigned int nregs)
 						    insn);
 
 	    /* Record the current insn_cost of this instruction.  */
-	    INSN_COST (insn) = insn_cost (insn, optimize_this_for_speed_p);
+	    if (bitmap_bit_p (new_copies, INSN_UID (insn)))
+	      /* Newly added pseudo-to-pseudo copies should not take any
+		 costs since they should be able to be coalesced.  */
+	      INSN_COST (insn) = 0;
+	    else
+	      INSN_COST (insn) = insn_cost (insn, optimize_this_for_speed_p);
 	    if (dump_file)
 	      {
 		fprintf (dump_file, "insn_cost %d for ", INSN_COST (insn));
@@ -15068,6 +15085,7 @@  make_more_copies (void)
 	  SET_SRC (set) = new_reg;
 	  emit_insn_before (new_insn, insn);
 	  df_insn_rescan (insn);
+	  bitmap_set_bit (new_copies, INSN_UID (insn));
 	}
     }
 }
@@ -15076,6 +15094,7 @@  make_more_copies (void)
 static unsigned int
 rest_of_handle_combine (void)
 {
+  new_copies = BITMAP_ALLOC (NULL);
   make_more_copies ();
 
   df_set_flags (DF_LR_RUN_DCE + DF_DEFER_INSN_RESCAN);
@@ -15102,6 +15121,7 @@  rest_of_handle_combine (void)
     }
 
   regstat_free_n_sets_and_refs ();
+  BITMAP_FREE (new_copies);
   return 0;
 }