diff mbox series

RISC-V: Implement TARGET_CAN_INLINE_P

Message ID 20240909121134.697314-1-chenyangyu@isrc.iscas.ac.cn
State New
Headers show
Series RISC-V: Implement TARGET_CAN_INLINE_P | expand

Commit Message

Yangyu Chen Sept. 9, 2024, 12:11 p.m. UTC
Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V
ISA. As a result, certain functions cannot be optimized with inlining
when specific options, such as __attribute__((target("arch=+v"))) .
This can lead to potential performance issues when building
retargetable binaries for RISC-V.

To address this, I have implemented the riscv_can_inline_p function.
This addition enables inlining when the callee either has no special
options or when the some options match, and also ensuring that the
callee's ISA is a subset of the caller's. I also check some other
options when there is no always_inline set.

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_can_inline_p): New function.
        (TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P.

Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn>
---
 gcc/config/riscv/riscv.cc | 135 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)

Comments

Jeff Law Sept. 29, 2024, 6:49 p.m. UTC | #1
On 9/9/24 6:11 AM, Yangyu Chen wrote:
> Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V
> ISA. As a result, certain functions cannot be optimized with inlining
> when specific options, such as __attribute__((target("arch=+v"))) .
> This can lead to potential performance issues when building
> retargetable binaries for RISC-V.
> 
> To address this, I have implemented the riscv_can_inline_p function.
> This addition enables inlining when the callee either has no special
> options or when the some options match, and also ensuring that the
> callee's ISA is a subset of the caller's. I also check some other
> options when there is no always_inline set.
> 
> gcc/ChangeLog:
> 
>          * config/riscv/riscv.cc (riscv_can_inline_p): New function.
>          (TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P.
I'd kind of hoped not to mess with this as I suspect keeping this 
up-to-date is going to end up being somewhat painful and I doubt we're 
going to see that many cases where folks are using target attributes and 
where inlining is critical for performance.

What's really driving your desire to change this?  Is this showing up in 
real code as a performance issue?






> +
> +  if (caller_opts->x_riscv_tune_string
> +      && callee_opts->x_riscv_tune_string
> +      && strcmp (caller_opts->x_riscv_tune_string,
> +		 callee_opts->x_riscv_tune_string) != 0)
> +    return false;
Tuning shouldn't affect correctness of inlining.  I'd just drop this 
clause if this patch goes forward.




Jeff
Yangyu Chen Sept. 30, 2024, 2:34 a.m. UTC | #2
> On Sep 30, 2024, at 02:49, Jeff Law <jeffreyalaw@gmail.com> wrote:
> 
> 
> 
> On 9/9/24 6:11 AM, Yangyu Chen wrote:
>> Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V
>> ISA. As a result, certain functions cannot be optimized with inlining
>> when specific options, such as __attribute__((target("arch=+v"))) .
>> This can lead to potential performance issues when building
>> retargetable binaries for RISC-V.
>> To address this, I have implemented the riscv_can_inline_p function.
>> This addition enables inlining when the callee either has no special
>> options or when the some options match, and also ensuring that the
>> callee's ISA is a subset of the caller's. I also check some other
>> options when there is no always_inline set.
>> gcc/ChangeLog:
>>         * config/riscv/riscv.cc (riscv_can_inline_p): New function.
>>         (TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P.
> I'd kind of hoped not to mess with this as I suspect keeping this up-to-date is going to end up being somewhat painful and I doubt we're going to see that many cases where folks are using target attributes and where inlining is critical for performance.
> 
> What's really driving your desire to change this?  Is this showing up in real code as a performance issue?
> 

Yeah. I'm working on function multi-versioning support and found
that allowing the callee to be inlined in the caller will result
in performance gains. Without this patch, we will need to CALL the
function since we can't inline them when two functions have different
attributes. A simple evaluation of coremarks shows a 13% performance
degradation.

> 
> 
> 
> 
> 
>> +
>> +  if (caller_opts->x_riscv_tune_string
>> +      && callee_opts->x_riscv_tune_string
>> +      && strcmp (caller_opts->x_riscv_tune_string,
>> + callee_opts->x_riscv_tune_string) != 0)
>> +    return false;
> Tuning shouldn't affect correctness of inlining.  I'd just drop this clause if this patch goes forward.
> 

OK. I will fix it in the next revision.

> 
> 
> 
> Jeff
Yangyu Chen Sept. 30, 2024, 4:02 a.m. UTC | #3
> On Sep 30, 2024, at 10:34, Yangyu Chen <chenyangyu@isrc.iscas.ac.cn> wrote:
>> 
>> 
>> On Sep 30, 2024, at 02:49, Jeff Law <jeffreyalaw@gmail.com> wrote:
>> On 9/9/24 6:11 AM, Yangyu Chen wrote:
>>> Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V
>>> ISA. As a result, certain functions cannot be optimized with inlining
>>> when specific options, such as __attribute__((target("arch=+v"))) .
>>> This can lead to potential performance issues when building
>>> retargetable binaries for RISC-V.
>>> To address this, I have implemented the riscv_can_inline_p function.
>>> This addition enables inlining when the callee either has no special
>>> options or when the some options match, and also ensuring that the
>>> callee's ISA is a subset of the caller's. I also check some other
>>> options when there is no always_inline set.
>>> gcc/ChangeLog:
>>>        * config/riscv/riscv.cc (riscv_can_inline_p): New function.
>>>        (TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P.
>> I'd kind of hoped not to mess with this as I suspect keeping this up-to-date is going to end up being somewhat painful and I doubt we're going to see that many cases where folks are using target attributes and where inlining is critical for performance.
>> 
>> What's really driving your desire to change this?  Is this showing up in real code as a performance issue?
>> 
> 
> Yeah. I'm working on function multi-versioning support and found
> that allowing the callee to be inlined in the caller will result
> in performance gains. Without this patch, we will need to CALL the
> function since we can't inline them when two functions have different
> attributes. A simple evaluation of coremarks shows a 13% performance
> degradation.
> 

Specially, we can reproduce the result on BananaPi-F3 Hardware:

Use this GCC branch with my patch:
https://github.com/cyyself/gcc/tree/rv_can_inline

And compile the coremark on this branch:
https://github.com/cyyself/coremark/tree/rva22_v_hotspot

With command `make CC=riscv64-unknown-linux-gnu-gcc compile`

With my patch, we will get the coremark scored `Iterations/Sec   :
5992.917461`. But without this patch after `git reset HEAD^` and
recompile the GCC and then coremark, we will get `Iterations/Sec :
5235.602094`, which is 12.6% slower.

Thanks,
Yangyu Chen

>>> +
>>> +  if (caller_opts->x_riscv_tune_string
>>> +      && callee_opts->x_riscv_tune_string
>>> +      && strcmp (caller_opts->x_riscv_tune_string,
>>> + callee_opts->x_riscv_tune_string) != 0)
>>> +    return false;
>> Tuning shouldn't affect correctness of inlining.  I'd just drop this clause if this patch goes forward.
> 
> OK. I will fix it in the next revision.
> 
>> Jeff
>
Kito Cheng Sept. 30, 2024, 5:58 a.m. UTC | #4
Hi Yang-Yu:

>
> Specially, we can reproduce the result on BananaPi-F3 Hardware:
>
> Use this GCC branch with my patch:
> https://github.com/cyyself/gcc/tree/rv_can_inline
>
> And compile the coremark on this branch:
> https://github.com/cyyself/coremark/tree/rva22_v_hotspot
>
> With command `make CC=riscv64-unknown-linux-gnu-gcc compile`
>
> With my patch, we will get the coremark scored `Iterations/Sec   :
> 5992.917461`. But without this patch after `git reset HEAD^` and
> recompile the GCC and then coremark, we will get `Iterations/Sec :
> 5235.602094`, which is 12.6% slower.

Could you add a test case to demonstrate that ?

>  /* Callee's ISA should be a subset of the caller's ISA.  */

This check is necessary, but this way may not scalable for longer term,
I mean people may forgot to update this part when adding new extension
variables,
so I would suggest add a new function to construct a riscv_subset_list
from options
e.g.

diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
index dace4de6575..e8b7c0f194b 100644
--- a/gcc/config/riscv/riscv-subset.h
+++ b/gcc/config/riscv/riscv-subset.h
@@ -103,6 +103,7 @@ public:
  riscv_subset_list *clone () const;

  static riscv_subset_list *parse (const char *, location_t);
+  static riscv_subset_list *parse (struct gcc_options *opts);
  const char *parse_single_ext (const char *, bool exact_single_p = true);

  const riscv_subset_t *begin () const {return m_head;};

And then use riscv_subset_list to do the checking
Yangyu Chen Sept. 30, 2024, 8:22 a.m. UTC | #5
> On Sep 30, 2024, at 13:58, Kito Cheng <kito.cheng@gmail.com> wrote:
> 
> Hi Yang-Yu:
> 
>> 
>> Specially, we can reproduce the result on BananaPi-F3 Hardware:
>> 
>> Use this GCC branch with my patch:
>> https://github.com/cyyself/gcc/tree/rv_can_inline
>> 
>> And compile the coremark on this branch:
>> https://github.com/cyyself/coremark/tree/rva22_v_hotspot
>> 
>> With command `make CC=riscv64-unknown-linux-gnu-gcc compile`
>> 
>> With my patch, we will get the coremark scored `Iterations/Sec   :
>> 5992.917461`. But without this patch after `git reset HEAD^` and
>> recompile the GCC and then coremark, we will get `Iterations/Sec :
>> 5235.602094`, which is 12.6% slower.
> 
> Could you add a test case to demonstrate that ?
> 

I will try.

>> /* Callee's ISA should be a subset of the caller's ISA.  */
> 
> This check is necessary, but this way may not scalable for longer term,
> I mean people may forgot to update this part when adding new extension
> variables,
> so I would suggest add a new function to construct a riscv_subset_list
> from options
> e.g.
> 
> diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
> index dace4de6575..e8b7c0f194b 100644
> --- a/gcc/config/riscv/riscv-subset.h
> +++ b/gcc/config/riscv/riscv-subset.h
> @@ -103,6 +103,7 @@ public:
>  riscv_subset_list *clone () const;
> 
>  static riscv_subset_list *parse (const char *, location_t);
> +  static riscv_subset_list *parse (struct gcc_options *opts);

Thanks for this hint. However, using the class riscv_subset_list
is very costly and requires a copy of the ISA string. We can implement
a new function like bool riscv_ext_is_subset(struct gcc_options
*opts, struct gcc_options *subset) in riscv-common.cc and iterate
through the riscv_ext_flag_table. This method is also scalable for
the long term. I will submit the next revision lately.

>  const char *parse_single_ext (const char *, bool exact_single_p = true);
> 
>  const riscv_subset_t *begin () const {return m_head;};
> 
> And then use riscv_subset_list to do the checking
diff mbox series

Patch

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6efe14ff199..d36c135650e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7656,6 +7656,138 @@  riscv_compute_frame_info (void)
   /* Next points the incoming stack pointer and any incoming arguments. */
 }
 
+/* Implement TARGET_CAN_INLINE_P.  */
+
+static bool
+riscv_can_inline_p (tree caller, tree callee)
+{
+  tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
+  tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
+
+  /* It's safe to inline if callee has no opts.  */
+  if (! callee_tree)
+    return true;
+
+  if (! caller_tree)
+    caller_tree = target_option_default_node;
+
+  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
+
+  int isa_flag_mask = MASK_MUL | MASK_ATOMIC | MASK_HARD_FLOAT
+		      | MASK_DOUBLE_FLOAT | MASK_RVC | MASK_VECTOR
+		      | MASK_FULL_V;
+
+  /* Callee and caller should have the same target options except for ISA.  */
+  int callee_target_flags = callee_opts->x_target_flags & ~isa_flag_mask;
+  int caller_target_flags = caller_opts->x_target_flags & ~isa_flag_mask;
+
+  if (callee_target_flags != caller_target_flags)
+    return false;
+
+  /* Callee's ISA should be a subset of the caller's ISA.  */
+  int callee_isa_flags = callee_opts->x_target_flags & isa_flag_mask;
+  int caller_isa_flags = caller_opts->x_target_flags & isa_flag_mask;
+
+  if (callee_isa_flags & ~caller_isa_flags)
+    return false;
+
+  if (callee_opts->x_riscv_zi_subext & ~caller_opts->x_riscv_zi_subext)
+    return false;
+
+  if (callee_opts->x_riscv_za_subext & ~caller_opts->x_riscv_za_subext)
+    return false;
+
+  if (callee_opts->x_riscv_zb_subext & ~caller_opts->x_riscv_zb_subext)
+    return false;
+
+  if (callee_opts->x_riscv_zinx_subext & ~caller_opts->x_riscv_zinx_subext)
+    return false;
+
+  if (callee_opts->x_riscv_zk_subext & ~caller_opts->x_riscv_zk_subext)
+    return false;
+
+  if (callee_opts->x_riscv_vector_elen_flags
+      & ~caller_opts->x_riscv_vector_elen_flags)
+    return false;
+
+  if (callee_opts->x_riscv_zvl_flags & ~caller_opts->x_riscv_zvl_flags)
+    return false;
+
+  if (callee_opts->x_riscv_zvb_subext & ~caller_opts->x_riscv_zvb_subext)
+    return false;
+
+  if (callee_opts->x_riscv_zvk_subext & ~caller_opts->x_riscv_zvk_subext)
+    return false;
+
+  if (callee_opts->x_riscv_zicmo_subext & ~caller_opts->x_riscv_zicmo_subext)
+    return false;
+
+  if (callee_opts->x_riscv_mop_subext & ~caller_opts->x_riscv_mop_subext)
+    return false;
+
+  if (callee_opts->x_riscv_zf_subext & ~caller_opts->x_riscv_zf_subext)
+    return false;
+
+  if (callee_opts->x_riscv_zfa_subext & ~caller_opts->x_riscv_zfa_subext)
+    return false;
+
+  if (callee_opts->x_riscv_zm_subext & ~caller_opts->x_riscv_zm_subext)
+    return false;
+
+  if (callee_opts->x_riscv_zc_subext & ~caller_opts->x_riscv_zc_subext)
+    return false;
+
+  if (callee_opts->x_riscv_sv_subext & ~caller_opts->x_riscv_sv_subext)
+    return false;
+
+  if (callee_opts->x_riscv_ztso_subext & ~caller_opts->x_riscv_ztso_subext)
+    return false;
+
+  if (callee_opts->x_riscv_xcv_subext & ~caller_opts->x_riscv_xcv_subext)
+    return false;
+
+  if (callee_opts->x_riscv_xthead_subext & ~caller_opts->x_riscv_xthead_subext)
+    return false;
+
+  if (callee_opts->x_riscv_xventana_subext
+      & ~caller_opts->x_riscv_xventana_subext)
+    return false;
+
+  if (callee_opts->x_riscv_sifive_subext & ~caller_opts->x_riscv_sifive_subext)
+    return false;
+
+  /* If the callee has always_inline set, we can ignore the rest attributes.  */
+  if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (callee)))
+    return true;
+
+  if (caller_opts->x_riscv_cmodel != callee_opts->x_riscv_cmodel)
+    return false;
+
+  if (caller_opts->x_riscv_tls_dialect != callee_opts->x_riscv_tls_dialect)
+    return false;
+
+  if (caller_opts->x_riscv_stack_protector_guard_reg
+      != callee_opts->x_riscv_stack_protector_guard_reg)
+    return false;
+
+  if (caller_opts->x_riscv_stack_protector_guard_offset
+      != callee_opts->x_riscv_stack_protector_guard_offset)
+    return false;
+
+  if (caller_opts->x_rvv_vector_strict_align
+      != callee_opts->x_rvv_vector_strict_align)
+    return false;
+
+  if (caller_opts->x_riscv_tune_string
+      && callee_opts->x_riscv_tune_string
+      && strcmp (caller_opts->x_riscv_tune_string,
+		 callee_opts->x_riscv_tune_string) != 0)
+    return false;
+
+  return true;
+}
+
 /* Make sure that we're not trying to eliminate to the wrong hard frame
    pointer.  */
 
@@ -12540,6 +12672,9 @@  riscv_stack_clash_protection_alloca_probe_range (void)
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P	riscv_legitimate_address_p
 
+#undef TARGET_CAN_INLINE_P
+#define TARGET_CAN_INLINE_P riscv_can_inline_p
+
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE riscv_can_eliminate