Message ID | 5746C779.8060901@foss.arm.com |
---|---|
State | New |
Headers | show |
On Thu, 26 May 2016, Kyrill Tkachov wrote: > the early RTL optimisers. This better expresses the semantics of the > operation as a whole, since the RBIT operation is represented as an > UNSPEC anyway and so will not see the benefits of combine, This doesn't affect your patch, but I think it would make sense for RBIT not to be an UNSPEC but to have architecture-independent RTL and built-in function - see bug 50481.
Ping. https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02078.html Patches 1 and 3 have been approved. Thanks, Kyrill On 26/05/16 10:52, Kyrill Tkachov wrote: > Hi all, > > On arm we don't have a dedicated instruction that corresponds to a CTZ rtx but we synthesise it > with an RBIT instruction followed by a CLZ. This is currently done at expand time. > However, I'd like to push that step until after reload and keep the CTZ rtx as a single whole in > the early RTL optimisers. This better expresses the semantics of the operation as a whole, since > the RBIT operation is represented as an UNSPEC anyway and so will not see the benefits of combine, > and a CTZ-specific optimisation that is implemented in patch 3/3 of this series won't be triggered > if the expression is broken up into an UNSPEC and a CLZ. > > Therefore this patch changes the expander to expand to a CTZ rtx and split it after reload into > an RBIT + CLZ to allow sched2 to schedule them apart if it deems necessary. > This patch enables the optimisation in patch 3/3 where the appropriate test is added. > > Bootstrapped and tested on arm-none-linux-gnueabihf. > > Ok for trunk? > > Thanks, > Kyrill > > 2016-05-26 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > PR middle-end/37780 > * config/arm/arm.md (ctzsi2): Convert to define_insn_and_split.
On Thu, May 26, 2016 at 10:52 AM, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote: > Hi all, > > On arm we don't have a dedicated instruction that corresponds to a CTZ rtx > but we synthesise it > with an RBIT instruction followed by a CLZ. This is currently done at expand > time. > However, I'd like to push that step until after reload and keep the CTZ rtx > as a single whole in > the early RTL optimisers. This better expresses the semantics of the > operation as a whole, since > the RBIT operation is represented as an UNSPEC anyway and so will not see > the benefits of combine, > and a CTZ-specific optimisation that is implemented in patch 3/3 of this > series won't be triggered > if the expression is broken up into an UNSPEC and a CLZ. > > Therefore this patch changes the expander to expand to a CTZ rtx and split > it after reload into > an RBIT + CLZ to allow sched2 to schedule them apart if it deems necessary. > This patch enables the optimisation in patch 3/3 where the appropriate test > is added. > > Bootstrapped and tested on arm-none-linux-gnueabihf. > > Ok for trunk? > > Thanks, > Kyrill > > 2016-05-26 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > PR middle-end/37780 > * config/arm/arm.md (ctzsi2): Convert to define_insn_and_split. OK. Ramana
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0d491f7ea41e4fb5fb58bbb3047294abda541a73..fcb07e7629dc14b7cf8a0cd3d4d1a57ff33efe07 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -10784,19 +10784,22 @@ (define_insn "rbitsi2" (set_attr "predicable_short_it" "no") (set_attr "type" "clz")]) -(define_expand "ctzsi2" - [(set (match_operand:SI 0 "s_register_operand" "") - (ctz:SI (match_operand:SI 1 "s_register_operand" "")))] +;; Keep this as a CTZ expression until after reload and then split +;; into RBIT + CLZ. Since RBIT is represented as an UNSPEC it is unlikely +;; to fold with any other expression. + +(define_insn_and_split "ctzsi2" + [(set (match_operand:SI 0 "s_register_operand" "=r") + (ctz:SI (match_operand:SI 1 "s_register_operand" "r")))] "TARGET_32BIT && arm_arch_thumb2" + "#" + "&& reload_completed" + [(const_int 0)] " - { - rtx tmp = gen_reg_rtx (SImode); - emit_insn (gen_rbitsi2 (tmp, operands[1])); - emit_insn (gen_clzsi2 (operands[0], tmp)); - } - DONE; - " -) + emit_insn (gen_rbitsi2 (operands[0], operands[1])); + emit_insn (gen_clzsi2 (operands[0], operands[0])); + DONE; +") ;; V5E instructions.