Message ID | 20240615134745.14671-1-xry111@xry111.site |
---|---|
State | New |
Series | LoongArch: Only transform move/move/bstrins to srai/bstrins when -Os |
Ping.

On Sat, 2024-06-15 at 21:47 +0800, Xi Ruoyao wrote:
> The first form has a lower latency (due to the special handling of
> "move" in LA464 and LA664) despite being longer.
>
> gcc/ChangeLog:
>
>     * config/loongarch/loongarch.md (define_peephole2): Require
>     optimize_insn_for_size_p () for move/move/bstrins =>
>     srai/bstrins transform.
> ---
>
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
>
>  gcc/config/loongarch/loongarch.md | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
> index 25c1d323ba0..e4434c3bd4e 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -1617,20 +1617,23 @@ (define_insn_and_split "*bstrins_<mode>_for_ior_mask"
>    })
>
>  ;; We always avoid the shift operation in bstrins_<mode>_for_ior_mask
> -;; if possible, but the result may be sub-optimal when one of the masks
> +;; if possible, but the result may be larger when one of the masks
>  ;; is (1 << N) - 1 and one of the src register is the dest register.
>  ;; For example:
>  ;;     move       t0, a0
>  ;;     move       a0, a1
>  ;;     bstrins.d  a0, t0, 42, 0
>  ;;     ret
> -;; using a shift operation would be better:
> +;; using a shift operation would be smaller:
>  ;;     srai.d     t0, a1, 43
>  ;;     bstrins.d  a0, t0, 63, 43
>  ;;     ret
>  ;; unfortunately we cannot figure it out in split1: before reload we cannot
>  ;; know if the dest register is one of the src register.  Fix it up in
>  ;; peephole2.
> +;;
> +;; Note that the first form has a lower latency so this should only be
> +;; done when optimizing for size.
>  (define_peephole2
>    [(set (match_operand:GPR 0 "register_operand")
>          (match_operand:GPR 1 "register_operand"))
> @@ -1639,7 +1642,7 @@ (define_peephole2
>                            (match_operand:SI 3 "const_int_operand")
>                            (const_int 0))
>          (match_dup 0))]
> -  "peep2_reg_dead_p (3, operands[0])"
> +  "peep2_reg_dead_p (3, operands[0]) && optimize_insn_for_size_p ()"
>    [(const_int 0)]
>    {
>      int len = GET_MODE_BITSIZE (<MODE>mode) - INTVAL (operands[3]);
>>  ;; We always avoid the shift operation in bstrins_<mode>_for_ior_mask
>> -;; if possible, but the result may be sub-optimal when one of the masks
>> +;; if possible, but the result may be larger when one of the masks
>>  ;; is (1 << N) - 1 and one of the src register is the dest register.
>>  ;; For example:
>>  ;;     move       t0, a0
>>  ;;     move       a0, a1
>>  ;;     bstrins.d  a0, t0, 42, 0
>>  ;;     ret
>> -;; using a shift operation would be better:
>> +;; using a shift operation would be smaller:
>>  ;;     srai.d     t0, a1, 43
>>  ;;     bstrins.d  a0, t0, 63, 43
>>  ;;     ret
>>  ;; unfortunately we cannot figure it out in split1: before reload we cannot
>>  ;; know if the dest register is one of the src register.  Fix it up in
>>  ;; peephole2.
>> +;;
>> +;; Note that the first form has a lower latency so this should only be
>> +;; done when optimizing for size.

In my test, the two forms have the same latency. Is there a problem with my test?

>>  (define_peephole2
>>    [(set (match_operand:GPR 0 "register_operand")
>>          (match_operand:GPR 1 "register_operand"))
>> @@ -1639,7 +1642,7 @@ (define_peephole2
>>                            (match_operand:SI 3 "const_int_operand")
>>                            (const_int 0))
>>          (match_dup 0))]
>> -  "peep2_reg_dead_p (3, operands[0])"
>> +  "peep2_reg_dead_p (3, operands[0]) && optimize_insn_for_size_p ()"
>>    [(const_int 0)]
>>    {
>>      int len = GET_MODE_BITSIZE (<MODE>mode) - INTVAL (operands[3]);
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 25c1d323ba0..e4434c3bd4e 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1617,20 +1617,23 @@ (define_insn_and_split "*bstrins_<mode>_for_ior_mask"
   })

 ;; We always avoid the shift operation in bstrins_<mode>_for_ior_mask
-;; if possible, but the result may be sub-optimal when one of the masks
+;; if possible, but the result may be larger when one of the masks
 ;; is (1 << N) - 1 and one of the src register is the dest register.
 ;; For example:
 ;;     move       t0, a0
 ;;     move       a0, a1
 ;;     bstrins.d  a0, t0, 42, 0
 ;;     ret
-;; using a shift operation would be better:
+;; using a shift operation would be smaller:
 ;;     srai.d     t0, a1, 43
 ;;     bstrins.d  a0, t0, 63, 43
 ;;     ret
 ;; unfortunately we cannot figure it out in split1: before reload we cannot
 ;; know if the dest register is one of the src register.  Fix it up in
 ;; peephole2.
+;;
+;; Note that the first form has a lower latency so this should only be
+;; done when optimizing for size.
 (define_peephole2
   [(set (match_operand:GPR 0 "register_operand")
         (match_operand:GPR 1 "register_operand"))
@@ -1639,7 +1642,7 @@ (define_peephole2
                           (match_operand:SI 3 "const_int_operand")
                           (const_int 0))
         (match_dup 0))]
-  "peep2_reg_dead_p (3, operands[0])"
+  "peep2_reg_dead_p (3, operands[0]) && optimize_insn_for_size_p ()"
   [(const_int 0)]
   {
     int len = GET_MODE_BITSIZE (<MODE>mode) - INTVAL (operands[3]);
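
For readers who want to convince themselves that the two instruction sequences in the comment compute the same value, here is a minimal C sketch. It is only a software model, not GCC or LoongArch code: the helper names `bstrins_d`, `form_move`, and `form_shift` are hypothetical, and the model assumes `bstrins.d rd, rj, msb, lsb` replaces bits [msb:lsb] of `rd` with the low bits of `rj`. It says nothing about latency, which is the question raised in the review.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of bstrins.d rd, rj, msb, lsb: replace bits
   [msb:lsb] of rd with bits [msb-lsb:0] of rj.  */
static uint64_t
bstrins_d (uint64_t rd, uint64_t rj, unsigned msb, unsigned lsb)
{
  unsigned width = msb - lsb + 1;
  /* Avoid an undefined 64-bit shift when the field covers the register.  */
  uint64_t field = width == 64 ? ~UINT64_C (0)
                               : (UINT64_C (1) << width) - 1;
  uint64_t mask = field << lsb;
  return (rd & ~mask) | ((rj << lsb) & mask);
}

/* First form: move t0, a0; move a0, a1; bstrins.d a0, t0, 42, 0.  */
static uint64_t
form_move (uint64_t a0, uint64_t a1)
{
  uint64_t t0 = a0;
  a0 = a1;
  return bstrins_d (a0, t0, 42, 0);
}

/* Second form: srai.d t0, a1, 43; bstrins.d a0, t0, 63, 43.
   srai.d is an arithmetic shift, but only the low 21 bits of t0 are
   inserted, so a logical shift gives the same result in this model.  */
static uint64_t
form_shift (uint64_t a0, uint64_t a1)
{
  uint64_t t0 = a1 >> 43;
  return bstrins_d (a0, t0, 63, 43);
}
```

Both functions return the low 43 bits of `a0` combined with the high 21 bits of `a1`, so calling them with the same arguments should give the same value; the second form simply needs one instruction fewer.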