Message ID | 573EE130.9050009@foss.arm.com |
---|---|
State | New |
Headers | show |
On Fri, May 20, 2016 at 11:04:32AM +0100, Kyrill Tkachov wrote: > Hi all, > > The recent -frename-registers change exposed a deficiency in the way we fuse > AESE/AESMC instruction pairs in aarch64. > > Basically we want to enforce: > AESE Vn, _ > AESMC Vn, Vn > > to enable the fusion, but regrename comes along and renames the output Vn > register in AESMC to something else, killing the fusion in the hardware. > > The solution in this patch is to add an alternative that ties the input and > output registers in the AESMC pattern and enable that alternative when the > fusion is enabled. > > With this patch I've confirmed that the above preferred register sequence is > kept even with -frename-registers when tuning for a cpu that enables the > fusion and that the chain is broken by regrename otherwise and have seen the > appropriate improvement in a proprietary benchmark (that I cannot name) that > exercises this sequence. > > Bootstrapped and tested on aarch64-none-linux-gnu. > > Ok for trunk? > > Thanks, > Kyrill > > 2016-05-20 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * config/aarch64/aarch64.c (aarch64_fusion_enabled_p): New function. > * config/aarch64/aarch64-protos.h (aarch64_fusion_enabled_p): Declare > prototype. > * config/aarch64/aarch64-simd.md (aarch64_crypto_aes<aesmc_op>v16qi): > Add "=w,0" alternative. Enable it when AES/AESMC fusion is enabled. > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h > index 21cf55b60f86024429ea36ead0d2d8ae4c94b579..f6da854fbaeeab34239a1f874edaedf8a01bf9c2 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -290,6 +290,7 @@ bool aarch64_constant_address_p (rtx); > bool aarch64_expand_movmem (rtx *); > bool aarch64_float_const_zero_rtx_p (rtx); > bool aarch64_function_arg_regno_p (unsigned); > +bool aarch64_fusion_enabled_p (unsigned int); This argument type should be "enum aarch64_fusion_pairs". > bool aarch64_gen_movmemqi (rtx *); > bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *); > bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx); > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index b93f961fc4ebd9eb3f50b0580741c80ab6eca427..815973ca6e764121f2669ad160918561450e6c50 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -13359,6 +13359,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr) > return false; > } > > +/* Return true iff the instruction fusion described by OP is enabled. */ > + > +bool > +aarch64_fusion_enabled_p (unsigned int op) > +{ > + return (aarch64_tune_params.fusible_ops & op) != 0; > +} > + A follow-up patch fixing the uses in aarch_macro_fusion_pair_p to use your new function would be nice. OK with the change to argument type. Thanks, James
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 21cf55b60f86024429ea36ead0d2d8ae4c94b579..f6da854fbaeeab34239a1f874edaedf8a01bf9c2 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -290,6 +290,7 @@ bool aarch64_constant_address_p (rtx); bool aarch64_expand_movmem (rtx *); bool aarch64_float_const_zero_rtx_p (rtx); bool aarch64_function_arg_regno_p (unsigned); +bool aarch64_fusion_enabled_p (unsigned int); bool aarch64_gen_movmemqi (rtx *); bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *); bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx); diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index bd73bce64414e8bc01732d14311d742cf28f4586..a66948a28e99f4437824a8640b092f7be1c917f6 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -5424,13 +5424,25 @@ (define_insn "aarch64_crypto_aes<aes_op>v16qi" [(set_attr "type" "crypto_aese")] ) +;; When AES/AESMC fusion is enabled we want the register allocation to +;; look like: +;; AESE Vn, _ +;; AESMC Vn, Vn +;; So prefer to tie operand 1 to operand 0 when fusing. + (define_insn "aarch64_crypto_aes<aesmc_op>v16qi" - [(set (match_operand:V16QI 0 "register_operand" "=w") - (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "w")] + [(set (match_operand:V16QI 0 "register_operand" "=w,w") + (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0,w")] CRYPTO_AESMC))] "TARGET_SIMD && TARGET_CRYPTO" "aes<aesmc_op>\\t%0.16b, %1.16b" - [(set_attr "type" "crypto_aesmc")] + [(set_attr "type" "crypto_aesmc") + (set_attr_alternative "enabled" + [(if_then_else (match_test + "aarch64_fusion_enabled_p (AARCH64_FUSE_AES_AESMC)") + (const_string "yes" ) + (const_string "no")) + (const_string "yes")])] ) ;; sha1 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index b93f961fc4ebd9eb3f50b0580741c80ab6eca427..815973ca6e764121f2669ad160918561450e6c50 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -13359,6 +13359,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr) return false; } +/* Return true iff the instruction fusion described by OP is enabled. */ + +bool +aarch64_fusion_enabled_p (unsigned int op) +{ + return (aarch64_tune_params.fusible_ops & op) != 0; +} + /* If MEM is in the form of [base+offset], extract the two parts of address and set to BASE and OFFSET, otherwise return false after clearing BASE and OFFSET. */