diff mbox series

LoongArch: Only transform move/move/bstrins to srai/bstrins when -Os

Message ID 20240615134745.14671-1-xry111@xry111.site
State New
Headers show
Series LoongArch: Only transform move/move/bstrins to srai/bstrins when -Os | expand

Commit Message

Xi Ruoyao June 15, 2024, 1:47 p.m. UTC
The first form has a lower latency (due to the special handling of
"move" in LA464 and LA664) despite being longer.

gcc/ChangeLog:

	* config/loongarch/loongarch.md (define_peephole2): Require
	optimize_insn_for_size_p () for move/move/bstrins =>
	srai/bstrins transform.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
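
For reference, a C function of roughly this shape can produce the
move/move/bstrins sequence discussed below (the function name and the exact
bit width are illustrative, not taken from the patch; whether the peephole
actually fires depends on register allocation):

```c
/* Hypothetical example: merge the low 43 bits of one operand with the
   high bits of another.  The mask (1UL << 43) - 1 has the (1 << N) - 1
   shape mentioned in the pattern's comment, so on loongarch64 this kind
   of function can end up as either move/move/bstrins.d or
   srai.d/bstrins.d.  */
unsigned long
merge_low_bits (unsigned long a0, unsigned long a1)
{
  const unsigned long mask = (1UL << 43) - 1;
  return (a0 & mask) | (a1 & ~mask);
}
```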

Comments

Xi Ruoyao June 26, 2024, 7:54 a.m. UTC | #1
Ping.

On Sat, 2024-06-15 at 21:47 +0800, Xi Ruoyao wrote:
> The first form has a lower latency (due to the special handling of
> "move" in LA464 and LA664) despite being longer.
> 
> gcc/ChangeLog:
> 
> 	* config/loongarch/loongarch.md (define_peephole2): Require
> 	optimize_insn_for_size_p () for move/move/bstrins =>
> 	srai/bstrins transform.
> ---
> 
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> 
>  gcc/config/loongarch/loongarch.md | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
> index 25c1d323ba0..e4434c3bd4e 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -1617,20 +1617,23 @@ (define_insn_and_split "*bstrins_<mode>_for_ior_mask"
>    })
>  
>  ;; We always avoid the shift operation in bstrins_<mode>_for_ior_mask
> -;; if possible, but the result may be sub-optimal when one of the masks
> +;; if possible, but the result may be larger when one of the masks
>  ;; is (1 << N) - 1 and one of the src register is the dest register.
>  ;; For example:
>  ;;     move		t0, a0
>  ;;     move		a0, a1
>  ;;     bstrins.d	a0, t0, 42, 0
>  ;;     ret
> -;; using a shift operation would be better:
> +;; using a shift operation would be smaller:
>  ;;     srai.d		t0, a1, 43
>  ;;     bstrins.d	a0, t0, 63, 43
>  ;;     ret
>  ;; unfortunately we cannot figure it out in split1: before reload we cannot
>  ;; know if the dest register is one of the src register.  Fix it up in
>  ;; peephole2.
> +;;
> +;; Note that the first form has a lower latency so this should only be
> +;; done when optimizing for size.
>  (define_peephole2
>    [(set (match_operand:GPR 0 "register_operand")
>  	(match_operand:GPR 1 "register_operand"))
> @@ -1639,7 +1642,7 @@ (define_peephole2
>  			  (match_operand:SI 3 "const_int_operand")
>  			  (const_int 0))
>  	(match_dup 0))]
> -  "peep2_reg_dead_p (3, operands[0])"
> +  "peep2_reg_dead_p (3, operands[0]) && optimize_insn_for_size_p ()"
>    [(const_int 0)]
>    {
>      int len = GET_MODE_BITSIZE (<MODE>mode) - INTVAL (operands[3]);
Lulu Cheng June 26, 2024, 9:10 a.m. UTC | #2
>>   ;; We always avoid the shift operation in bstrins_<mode>_for_ior_mask
>> -;; if possible, but the result may be sub-optimal when one of the masks
>> +;; if possible, but the result may be larger when one of the masks
>>   ;; is (1 << N) - 1 and one of the src register is the dest register.
>>   ;; For example:
>>   ;;     move		t0, a0
>>   ;;     move		a0, a1
>>   ;;     bstrins.d	a0, t0, 42, 0
>>   ;;     ret
>> -;; using a shift operation would be better:
>> +;; using a shift operation would be smaller:
>>   ;;     srai.d		t0, a1, 43
>>   ;;     bstrins.d	a0, t0, 63, 43
>>   ;;     ret
>>   ;; unfortunately we cannot figure it out in split1: before reload we cannot
>>   ;; know if the dest register is one of the src register.  Fix it up in
>>   ;; peephole2.
>> +;;
>> +;; Note that the first form has a lower latency so this should only be

In my testing, the latency of these two forms is the same; is there a
problem with my test?

>> +;; done when optimizing for size.
>>   (define_peephole2
>>     [(set (match_operand:GPR 0 "register_operand")
>>   	(match_operand:GPR 1 "register_operand"))
>> @@ -1639,7 +1642,7 @@ (define_peephole2
>>   			  (match_operand:SI 3 "const_int_operand")
>>   			  (const_int 0))
>>   	(match_dup 0))]
>> -  "peep2_reg_dead_p (3, operands[0])"
>> +  "peep2_reg_dead_p (3, operands[0]) && optimize_insn_for_size_p ()"
>>     [(const_int 0)]
>>     {
>>       int len = GET_MODE_BITSIZE (<MODE>mode) - INTVAL (operands[3]);

Patch

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 25c1d323ba0..e4434c3bd4e 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1617,20 +1617,23 @@  (define_insn_and_split "*bstrins_<mode>_for_ior_mask"
   })
 
 ;; We always avoid the shift operation in bstrins_<mode>_for_ior_mask
-;; if possible, but the result may be sub-optimal when one of the masks
+;; if possible, but the result may be larger when one of the masks
 ;; is (1 << N) - 1 and one of the src register is the dest register.
 ;; For example:
 ;;     move		t0, a0
 ;;     move		a0, a1
 ;;     bstrins.d	a0, t0, 42, 0
 ;;     ret
-;; using a shift operation would be better:
+;; using a shift operation would be smaller:
 ;;     srai.d		t0, a1, 43
 ;;     bstrins.d	a0, t0, 63, 43
 ;;     ret
 ;; unfortunately we cannot figure it out in split1: before reload we cannot
 ;; know if the dest register is one of the src register.  Fix it up in
 ;; peephole2.
+;;
+;; Note that the first form has a lower latency so this should only be
+;; done when optimizing for size.
 (define_peephole2
   [(set (match_operand:GPR 0 "register_operand")
 	(match_operand:GPR 1 "register_operand"))
@@ -1639,7 +1642,7 @@  (define_peephole2
 			  (match_operand:SI 3 "const_int_operand")
 			  (const_int 0))
 	(match_dup 0))]
-  "peep2_reg_dead_p (3, operands[0])"
+  "peep2_reg_dead_p (3, operands[0]) && optimize_insn_for_size_p ()"
   [(const_int 0)]
   {
     int len = GET_MODE_BITSIZE (<MODE>mode) - INTVAL (operands[3]);
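
For what it's worth, the functional equivalence of the two sequences (for
the len = 43 case used in the comment's example) can be sanity-checked with
a small C model of the instructions; the helper names below are made up for
this sketch and are not GCC or hardware APIs:

```c
#include <stdint.h>

/* Model BSTRINS.D rd, rj, msb, lsb: replace bits msb..lsb of rd with
   the low (msb - lsb + 1) bits of rj.  */
static uint64_t
bstrins_d (uint64_t rd, uint64_t rj, int msb, int lsb)
{
  int width = msb - lsb + 1;
  uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1) << lsb;
  return (rd & ~mask) | ((rj << lsb) & mask);
}

/* The move/move/bstrins form from the comment.  */
static uint64_t
form_a (uint64_t a0, uint64_t a1)
{
  uint64_t t0 = a0;                  /* move      t0, a0        */
  a0 = a1;                           /* move      a0, a1        */
  return bstrins_d (a0, t0, 42, 0);  /* bstrins.d a0, t0, 42, 0 */
}

/* The srai/bstrins form the peephole produces.  Arithmetic right
   shift of a signed value models srai.d (GCC-defined behavior).  */
static uint64_t
form_b (uint64_t a0, uint64_t a1)
{
  uint64_t t0 = (uint64_t) ((int64_t) a1 >> 43); /* srai.d t0, a1, 43 */
  return bstrins_d (a0, t0, 63, 43); /* bstrins.d a0, t0, 63, 43 */
}
```

Both compute (a0 & ((1 << 43) - 1)) | (a1 & ~((1 << 43) - 1)), which is why
the transform only changes size and latency, not the result.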