Message ID | 20240909121134.697314-1-chenyangyu@isrc.iscas.ac.cn |
---|---|
State | New |
Headers | show |
Series | RISC-V: Implement TARGET_CAN_INLINE_P | expand |
On 9/9/24 6:11 AM, Yangyu Chen wrote: > Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V > ISA. As a result, certain functions cannot be optimized with inlining > when specific options, such as __attribute__((target("arch=+v"))) . > This can lead to potential performance issues when building > retargetable binaries for RISC-V. > > To address this, I have implemented the riscv_can_inline_p function. > This addition enables inlining when the callee either has no special > options or when the some options match, and also ensuring that the > callee's ISA is a subset of the caller's. I also check some other > options when there is no always_inline set. > > gcc/ChangeLog: > > * config/riscv/riscv.cc (riscv_can_inline_p): New function. > (TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P. I'd kind of hoped not to mess with this as I suspect keeping this up-to-date is going to end up being somewhat painful and I doubt we're going to see that many cases where folks are using target attributes and where inlining is critical for performance. What's really driving your desire to change this? Is this showing up in real code as a performance issue? > + > + if (caller_opts->x_riscv_tune_string > + && callee_opts->x_riscv_tune_string > + && strcmp (caller_opts->x_riscv_tune_string, > + callee_opts->x_riscv_tune_string) != 0) > + return false; Tuning shouldn't affect correctness of inlining. I'd just drop this clause if this patch goes forward. Jeff
> On Sep 30, 2024, at 02:49, Jeff Law <jeffreyalaw@gmail.com> wrote: > > > > On 9/9/24 6:11 AM, Yangyu Chen wrote: >> Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V >> ISA. As a result, certain functions cannot be optimized with inlining >> when specific options, such as __attribute__((target("arch=+v"))) . >> This can lead to potential performance issues when building >> retargetable binaries for RISC-V. >> To address this, I have implemented the riscv_can_inline_p function. >> This addition enables inlining when the callee either has no special >> options or when the some options match, and also ensuring that the >> callee's ISA is a subset of the caller's. I also check some other >> options when there is no always_inline set. >> gcc/ChangeLog: >> * config/riscv/riscv.cc (riscv_can_inline_p): New function. >> (TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P. > I'd kind of hoped not to mess with this as I suspect keeping this up-to-date is going to end up being somewhat painful and I doubt we're going to see that many cases where folks are using target attributes and where inlining is critical for performance. > > What's really driving your desire to change this? Is this showing up in real code as a performance issue? > Yeah. I'm working on function multi-versioning support and found that allowing the callee to be inlined in the caller will result in performance gains. Without this patch, we will need to CALL the function since we can't inline them when two functions have different attributes. A simple evaluation of coremarks shows a 13% performance degradation. > > > > > >> + >> + if (caller_opts->x_riscv_tune_string >> + && callee_opts->x_riscv_tune_string >> + && strcmp (caller_opts->x_riscv_tune_string, >> + callee_opts->x_riscv_tune_string) != 0) >> + return false; > Tuning shouldn't affect correctness of inlining. I'd just drop this clause if this patch goes forward. > OK. I will fix it in the next revision. > > > > Jeff
> On Sep 30, 2024, at 10:34, Yangyu Chen <chenyangyu@isrc.iscas.ac.cn> wrote: >> >> >> On Sep 30, 2024, at 02:49, Jeff Law <jeffreyalaw@gmail.com> wrote: >> On 9/9/24 6:11 AM, Yangyu Chen wrote: >>> Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V >>> ISA. As a result, certain functions cannot be optimized with inlining >>> when specific options, such as __attribute__((target("arch=+v"))) . >>> This can lead to potential performance issues when building >>> retargetable binaries for RISC-V. >>> To address this, I have implemented the riscv_can_inline_p function. >>> This addition enables inlining when the callee either has no special >>> options or when the some options match, and also ensuring that the >>> callee's ISA is a subset of the caller's. I also check some other >>> options when there is no always_inline set. >>> gcc/ChangeLog: >>> * config/riscv/riscv.cc (riscv_can_inline_p): New function. >>> (TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P. >> I'd kind of hoped not to mess with this as I suspect keeping this up-to-date is going to end up being somewhat painful and I doubt we're going to see that many cases where folks are using target attributes and where inlining is critical for performance. >> >> What's really driving your desire to change this? Is this showing up in real code as a performance issue? >> > > Yeah. I'm working on function multi-versioning support and found > that allowing the callee to be inlined in the caller will result > in performance gains. Without this patch, we will need to CALL the > function since we can't inline them when two functions have different > attributes. A simple evaluation of coremarks shows a 13% performance > degradation. > Specially, we can reproduce the result on BananaPi-F3 Hardware: Use this GCC branch with my patch: https://github.com/cyyself/gcc/tree/rv_can_inline And compile the coremark on this branch: https://github.com/cyyself/coremark/tree/rva22_v_hotspot With command `make CC=riscv64-unknown-linux-gnu-gcc compile` With my patch, we will get the coremark scored `Iterations/Sec : 5992.917461`. But without this patch after `git reset HEAD^` and recompile the GCC and then coremark, we will get `Iterations/Sec : 5235.602094`, which is 12.6% slower. Thanks, Yangyu Chen >>> + >>> + if (caller_opts->x_riscv_tune_string >>> + && callee_opts->x_riscv_tune_string >>> + && strcmp (caller_opts->x_riscv_tune_string, >>> + callee_opts->x_riscv_tune_string) != 0) >>> + return false; >> Tuning shouldn't affect correctness of inlining. I'd just drop this clause if this patch goes forward. > > OK. I will fix it in the next revision. > >> Jeff >
Hi Yang-Yu: > > Specially, we can reproduce the result on BananaPi-F3 Hardware: > > Use this GCC branch with my patch: > https://github.com/cyyself/gcc/tree/rv_can_inline > > And compile the coremark on this branch: > https://github.com/cyyself/coremark/tree/rva22_v_hotspot > > With command `make CC=riscv64-unknown-linux-gnu-gcc compile` > > With my patch, we will get the coremark scored `Iterations/Sec : > 5992.917461`. But without this patch after `git reset HEAD^` and > recompile the GCC and then coremark, we will get `Iterations/Sec : > 5235.602094`, which is 12.6% slower. Could you add a test case to demonstrate that ? > /* Callee's ISA should be a subset of the caller's ISA. */ This check is necessary, but this way may not scalable for longer term, I mean people may forgot to update this part when adding new extension variables, so I would suggest add a new function to construct a riscv_subset_list from options e.g. diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h index dace4de6575..e8b7c0f194b 100644 --- a/gcc/config/riscv/riscv-subset.h +++ b/gcc/config/riscv/riscv-subset.h @@ -103,6 +103,7 @@ public: riscv_subset_list *clone () const; static riscv_subset_list *parse (const char *, location_t); + static riscv_subset_list *parse (struct gcc_options *opts); const char *parse_single_ext (const char *, bool exact_single_p = true); const riscv_subset_t *begin () const {return m_head;}; And then use riscv_subset_list to do the checking
> On Sep 30, 2024, at 13:58, Kito Cheng <kito.cheng@gmail.com> wrote: > > Hi Yang-Yu: > >> >> Specially, we can reproduce the result on BananaPi-F3 Hardware: >> >> Use this GCC branch with my patch: >> https://github.com/cyyself/gcc/tree/rv_can_inline >> >> And compile the coremark on this branch: >> https://github.com/cyyself/coremark/tree/rva22_v_hotspot >> >> With command `make CC=riscv64-unknown-linux-gnu-gcc compile` >> >> With my patch, we will get the coremark scored `Iterations/Sec : >> 5992.917461`. But without this patch after `git reset HEAD^` and >> recompile the GCC and then coremark, we will get `Iterations/Sec : >> 5235.602094`, which is 12.6% slower. > > Could you add a test case to demonstrate that ? > I will try. >> /* Callee's ISA should be a subset of the caller's ISA. */ > > This check is necessary, but this way may not scalable for longer term, > I mean people may forgot to update this part when adding new extension > variables, > so I would suggest add a new function to construct a riscv_subset_list > from options > e.g. > > diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h > index dace4de6575..e8b7c0f194b 100644 > --- a/gcc/config/riscv/riscv-subset.h > +++ b/gcc/config/riscv/riscv-subset.h > @@ -103,6 +103,7 @@ public: > riscv_subset_list *clone () const; > > static riscv_subset_list *parse (const char *, location_t); > + static riscv_subset_list *parse (struct gcc_options *opts); Thanks for this hint. However, using the class riscv_subset_list is very costly and requires a copy of the ISA string. We can implement a new function like bool riscv_ext_is_subset(struct gcc_options *opts, struct gcc_options *subset) in riscv-common.cc and iterate through the riscv_ext_flag_table. This method is also scalable for the long term. I will submit the next revision lately. > const char *parse_single_ext (const char *, bool exact_single_p = true); > > const riscv_subset_t *begin () const {return m_head;}; > > And then use riscv_subset_list to do the checking
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 6efe14ff199..d36c135650e 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -7656,6 +7656,138 @@ riscv_compute_frame_info (void) /* Next points the incoming stack pointer and any incoming arguments. */ } +/* Implement TARGET_CAN_INLINE_P. */ + +static bool +riscv_can_inline_p (tree caller, tree callee) +{ + tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee); + tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller); + + /* It's safe to inline if callee has no opts. */ + if (! callee_tree) + return true; + + if (! caller_tree) + caller_tree = target_option_default_node; + + struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree); + struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree); + + int isa_flag_mask = MASK_MUL | MASK_ATOMIC | MASK_HARD_FLOAT + | MASK_DOUBLE_FLOAT | MASK_RVC | MASK_VECTOR + | MASK_FULL_V; + + /* Callee and caller should have the same target options except for ISA. */ + int callee_target_flags = callee_opts->x_target_flags & ~isa_flag_mask; + int caller_target_flags = caller_opts->x_target_flags & ~isa_flag_mask; + + if (callee_target_flags != caller_target_flags) + return false; + + /* Callee's ISA should be a subset of the caller's ISA. */ + int callee_isa_flags = callee_opts->x_target_flags & isa_flag_mask; + int caller_isa_flags = caller_opts->x_target_flags & isa_flag_mask; + + if (callee_isa_flags & ~caller_isa_flags) + return false; + + if (callee_opts->x_riscv_zi_subext & ~caller_opts->x_riscv_zi_subext) + return false; + + if (callee_opts->x_riscv_za_subext & ~caller_opts->x_riscv_za_subext) + return false; + + if (callee_opts->x_riscv_zb_subext & ~caller_opts->x_riscv_zb_subext) + return false; + + if (callee_opts->x_riscv_zinx_subext & ~caller_opts->x_riscv_zinx_subext) + return false; + + if (callee_opts->x_riscv_zk_subext & ~caller_opts->x_riscv_zk_subext) + return false; + + if (callee_opts->x_riscv_vector_elen_flags + & ~caller_opts->x_riscv_vector_elen_flags) + return false; + + if (callee_opts->x_riscv_zvl_flags & ~caller_opts->x_riscv_zvl_flags) + return false; + + if (callee_opts->x_riscv_zvb_subext & ~caller_opts->x_riscv_zvb_subext) + return false; + + if (callee_opts->x_riscv_zvk_subext & ~caller_opts->x_riscv_zvk_subext) + return false; + + if (callee_opts->x_riscv_zicmo_subext & ~caller_opts->x_riscv_zicmo_subext) + return false; + + if (callee_opts->x_riscv_mop_subext & ~caller_opts->x_riscv_mop_subext) + return false; + + if (callee_opts->x_riscv_zf_subext & ~caller_opts->x_riscv_zf_subext) + return false; + + if (callee_opts->x_riscv_zfa_subext & ~caller_opts->x_riscv_zfa_subext) + return false; + + if (callee_opts->x_riscv_zm_subext & ~caller_opts->x_riscv_zm_subext) + return false; + + if (callee_opts->x_riscv_zc_subext & ~caller_opts->x_riscv_zc_subext) + return false; + + if (callee_opts->x_riscv_sv_subext & ~caller_opts->x_riscv_sv_subext) + return false; + + if (callee_opts->x_riscv_ztso_subext & ~caller_opts->x_riscv_ztso_subext) + return false; + + if (callee_opts->x_riscv_xcv_subext & ~caller_opts->x_riscv_xcv_subext) + return false; + + if (callee_opts->x_riscv_xthead_subext & ~caller_opts->x_riscv_xthead_subext) + return false; + + if (callee_opts->x_riscv_xventana_subext + & ~caller_opts->x_riscv_xventana_subext) + return false; + + if (callee_opts->x_riscv_sifive_subext & ~caller_opts->x_riscv_sifive_subext) + return false; + + /* If the callee has always_inline set, we can ignore the rest attributes. */ + if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (callee))) + return true; + + if (caller_opts->x_riscv_cmodel != callee_opts->x_riscv_cmodel) + return false; + + if (caller_opts->x_riscv_tls_dialect != callee_opts->x_riscv_tls_dialect) + return false; + + if (caller_opts->x_riscv_stack_protector_guard_reg + != callee_opts->x_riscv_stack_protector_guard_reg) + return false; + + if (caller_opts->x_riscv_stack_protector_guard_offset + != callee_opts->x_riscv_stack_protector_guard_offset) + return false; + + if (caller_opts->x_rvv_vector_strict_align + != callee_opts->x_rvv_vector_strict_align) + return false; + + if (caller_opts->x_riscv_tune_string + && callee_opts->x_riscv_tune_string + && strcmp (caller_opts->x_riscv_tune_string, + callee_opts->x_riscv_tune_string) != 0) + return false; + + return true; +} + /* Make sure that we're not trying to eliminate to the wrong hard frame pointer. */ @@ -12540,6 +12672,9 @@ riscv_stack_clash_protection_alloca_probe_range (void) #undef TARGET_LEGITIMATE_ADDRESS_P #define TARGET_LEGITIMATE_ADDRESS_P riscv_legitimate_address_p +#undef TARGET_CAN_INLINE_P +#define TARGET_CAN_INLINE_P riscv_can_inline_p + #undef TARGET_CAN_ELIMINATE #define TARGET_CAN_ELIMINATE riscv_can_eliminate
Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V ISA. As a result, certain functions cannot be optimized with inlining when specific options, such as __attribute__((target("arch=+v"))) . This can lead to potential performance issues when building retargetable binaries for RISC-V. To address this, I have implemented the riscv_can_inline_p function. This addition enables inlining when the callee either has no special options or when the some options match, and also ensuring that the callee's ISA is a subset of the caller's. I also check some other options when there is no always_inline set. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_can_inline_p): New function. (TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P. Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn> --- gcc/config/riscv/riscv.cc | 135 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 135 insertions(+)