diff mbox series

[RFC/PATCH] ipa-inline: Add target info into fn summary [PR102059]

Message ID f727bb3e-a553-2859-d691-2bcfebb50d7d@linux.ibm.com
State New
Headers show
Series [RFC/PATCH] ipa-inline: Add target info into fn summary [PR102059] | expand

Commit Message

Kewen.Lin Sept. 1, 2021, 7:02 a.m. UTC
Hi!

Power ISA 2.07 (Power8) introduces transactional memory feature
but ISA3.1 (Power10) removes it.  It exposes one troublesome
issue as PR102059 shows.  Users define some function with
target pragma cpu=power10 then it calls one function with
attribute always_inline which inherits command line option
cpu=power8 which enables HTM implicitly.  The current isa_flags
check doesn't allow this inlining due to "target specific
option mismatch" and error mesasge is emitted.

Normally, the callee function isn't intended to exploit HTM
feature, but the default flag setting make it look it has.
As Richi raised in the PR, we have fp_expressions flag in
function summary, and allow us to check the function actually
contains any floating point expressions to avoid overkill.
So this patch follows the similar idea but is more target
specific, for this rs6000 port specific requirement on HTM
feature check, we would like to check rs6000 specific HTM
built-in functions and inline assembly, it allows targets
to do their own customized checks and updates.

It introduces two target hooks need_ipa_fn_target_info and
update_ipa_fn_target_info.  The former allows target to do
some previous check and decides to collect target specific
information for this function or not.  For some special case,
it can predict the analysis result and push it early without
any scannings.  The latter allows the analyze_function_body
to pass gimple stmts down just like fp_expressions handlings,
target can do its own tricks.  I put them as one hook initially
with one boolean to indicates whether it's initial time, but
the code looks a bit ugly, to separate them seems to have
better readability.

To make it simple, this patch uses HOST_WIDE_INT to record the
flags just like what we use for isa_flags.  For rs6000's HTM
need, one HOST_WIDE_INT variable is quite enough, but it seems
good to have one auto_vec for scalability as I noticed some
targets have more than one HOST_WIDE_INT flag.  For now, this
target information collection is only for always_inline function,
function ipa_merge_fn_summary_after_inlining deals with target
information merging.

The patch has been bootstrapped and regress-tested on
powerpc64le-linux-gnu Power9.

Is it on the right track?

Any comments are highly appreciated!

BR,
Kewen
------
gcc/ChangeLog:

	PR ipa/102059
	* config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New
	function.
	* config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): New
	declare.
	* config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro.
	(TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
	(rs6000_need_ipa_fn_target_info): New function.
	(rs6000_update_ipa_fn_target_info): Likewise.
	(rs6000_can_inline_p): Adjust for ipa function summary target info.
	* ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
	summary target info.
	(analyze_function_body): Adjust for ipa function summary target
	info and call hook rs6000_need_ipa_fn_target_info and
	rs6000_update_ipa_fn_target_info.
	(ipa_merge_fn_summary_after_inlining): Adjust for ipa function
	summary target info.
	(inline_read_section): Likewise.
	(ipa_fn_summary_write): Likewise.
	* ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
	hook.
	(TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
	* target.def (update_ipa_fn_target_info): New hook.
	(need_ipa_fn_target_info): Likewise.
	* targhooks.c (default_need_ipa_fn_target_info): New function.
	(default_update_ipa_fn_target_info): Likewise.
	* targhooks.h (default_update_ipa_fn_target_info): New declare.
	(default_need_ipa_fn_target_info): Likewise.

gcc/testsuite/ChangeLog:

	PR ipa/102059
	* gcc.dg/lto/pr102059_0.c: New test.
	* gcc.dg/lto/pr102059_1.c: New test.
	* gcc.dg/lto/pr102059_2.c: New test.
	* gcc.target/powerpc/pr102059-5.c: New test.
	* gcc.target/powerpc/pr102059-6.c: New test.
---
 gcc/config/rs6000/rs6000-call.c               | 11 +++
 gcc/config/rs6000/rs6000-internal.h           |  1 +
 gcc/config/rs6000/rs6000.c                    | 94 ++++++++++++++++++-
 gcc/doc/tm.texi                               | 31 ++++++
 gcc/doc/tm.texi.in                            |  4 +
 gcc/ipa-fnsummary.c                           | 53 +++++++++++
 gcc/ipa-fnsummary.h                           |  6 +-
 gcc/testsuite/gcc.dg/lto/pr102059_0.c         | 12 +++
 gcc/testsuite/gcc.dg/lto/pr102059_1.c         |  9 ++
 gcc/testsuite/gcc.dg/lto/pr102059_2.c         | 11 +++
 gcc/testsuite/gcc.target/powerpc/pr102059-5.c | 21 +++++
 gcc/testsuite/gcc.target/powerpc/pr102059-6.c | 21 +++++
 gcc/target.def                                | 35 +++++++
 gcc/targhooks.c                               | 16 ++++
 gcc/targhooks.h                               |  3 +
 15 files changed, 322 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr102059_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr102059_1.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr102059_2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-5.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-6.c

Comments

Richard Biener Sept. 2, 2021, 9:25 a.m. UTC | #1
On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi!
>
> Power ISA 2.07 (Power8) introduces transactional memory feature
> but ISA3.1 (Power10) removes it.  It exposes one troublesome
> issue as PR102059 shows.  Users define some function with
> target pragma cpu=power10 then it calls one function with
> attribute always_inline which inherits command line option
> cpu=power8 which enables HTM implicitly.  The current isa_flags
> check doesn't allow this inlining due to "target specific
> option mismatch" and error mesasge is emitted.
>
> Normally, the callee function isn't intended to exploit HTM
> feature, but the default flag setting make it look it has.
> As Richi raised in the PR, we have fp_expressions flag in
> function summary, and allow us to check the function actually
> contains any floating point expressions to avoid overkill.
> So this patch follows the similar idea but is more target
> specific, for this rs6000 port specific requirement on HTM
> feature check, we would like to check rs6000 specific HTM
> built-in functions and inline assembly, it allows targets
> to do their own customized checks and updates.
>
> It introduces two target hooks need_ipa_fn_target_info and
> update_ipa_fn_target_info.  The former allows target to do
> some previous check and decides to collect target specific
> information for this function or not.  For some special case,
> it can predict the analysis result and push it early without
> any scannings.  The latter allows the analyze_function_body
> to pass gimple stmts down just like fp_expressions handlings,
> target can do its own tricks.  I put them as one hook initially
> with one boolean to indicates whether it's initial time, but
> the code looks a bit ugly, to separate them seems to have
> better readability.
>
> To make it simple, this patch uses HOST_WIDE_INT to record the
> flags just like what we use for isa_flags.  For rs6000's HTM
> need, one HOST_WIDE_INT variable is quite enough, but it seems
> good to have one auto_vec for scalability as I noticed some
> targets have more than one HOST_WIDE_INT flag.  For now, this
> target information collection is only for always_inline function,
> function ipa_merge_fn_summary_after_inlining deals with target
> information merging.
>
> The patch has been bootstrapped and regress-tested on
> powerpc64le-linux-gnu Power9.
>
> Is it on the right track?

+  if (always_inline)
+    {
+      cgraph_node *callee_node = cgraph_node::get (callee);
+      if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
+       {
+         if (dump_file)
+           ipa_dump_fn_summary (dump_file, callee_node);
+         const vec<HOST_WIDE_INT> &info =
+           ipa_fn_summaries->get (callee_node)->target_info;
+         if (!info.is_empty ())
+           always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM;
+       }
+
+      caller_isa &= ~always_inline_safe_mask;
+      callee_isa &= ~always_inline_safe_mask;
+    }

that's a bit convoluted but obviously the IPA info can be used for
non-always_inline cases as well.

As said above the info can be produced for not always-inline functions
as well, the usual case would be for LTO inlining across TUs compiled
with different target options.  In your case the special -mcpu=power10
TU would otherwise not be able to inline from a general -mcpu=power8 TU.

On the streaming side we possibly have to take care about the
GPU offloading path where we likely want to avoid pushing host target
bits to the GPU target in some way.

Your case is specifically looking for HTM target builtins - for more general
cases, like for example deciding whether we can inline across, say,
-mlzcnt on x86 the scanning would have to include at least direct internal-fns
mapping to optabs that could change.  With such inlining we might also
work against heuristic tuning decisions based on the ISA availability
making code (much) more expensive to expand without such ISA availability,
but that wouldn't be a correctness issue then.

Otherwise the overall bits look OK but I'll leave the details to our IPA folks.

Thanks,
Richard.

>
> Any comments are highly appreciated!
>
> BR,
> Kewen
> ------
> gcc/ChangeLog:
>
>         PR ipa/102059
>         * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New
>         function.
>         * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): New
>         declare.
>         * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro.
>         (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
>         (rs6000_need_ipa_fn_target_info): New function.
>         (rs6000_update_ipa_fn_target_info): Likewise.
>         (rs6000_can_inline_p): Adjust for ipa function summary target info.
>         * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
>         summary target info.
>         (analyze_function_body): Adjust for ipa function summary target
>         info and call hook rs6000_need_ipa_fn_target_info and
>         rs6000_update_ipa_fn_target_info.
>         (ipa_merge_fn_summary_after_inlining): Adjust for ipa function
>         summary target info.
>         (inline_read_section): Likewise.
>         (ipa_fn_summary_write): Likewise.
>         * ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
>         * doc/tm.texi: Regenerate.
>         * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
>         hook.
>         (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
>         * target.def (update_ipa_fn_target_info): New hook.
>         (need_ipa_fn_target_info): Likewise.
>         * targhooks.c (default_need_ipa_fn_target_info): New function.
>         (default_update_ipa_fn_target_info): Likewise.
>         * targhooks.h (default_update_ipa_fn_target_info): New declare.
>         (default_need_ipa_fn_target_info): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>         PR ipa/102059
>         * gcc.dg/lto/pr102059_0.c: New test.
>         * gcc.dg/lto/pr102059_1.c: New test.
>         * gcc.dg/lto/pr102059_2.c: New test.
>         * gcc.target/powerpc/pr102059-5.c: New test.
>         * gcc.target/powerpc/pr102059-6.c: New test.
>
Kewen.Lin Sept. 2, 2021, 11:13 a.m. UTC | #2
Hi Richi,

Thanks for the comments!

on 2021/9/2 下午5:25, Richard Biener wrote:
> On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi!
>>
>> Power ISA 2.07 (Power8) introduces transactional memory feature
>> but ISA3.1 (Power10) removes it.  It exposes one troublesome
>> issue as PR102059 shows.  Users define some function with
>> target pragma cpu=power10 then it calls one function with
>> attribute always_inline which inherits command line option
>> cpu=power8 which enables HTM implicitly.  The current isa_flags
>> check doesn't allow this inlining due to "target specific
>> option mismatch" and error mesasge is emitted.
>>
>> Normally, the callee function isn't intended to exploit HTM
>> feature, but the default flag setting make it look it has.
>> As Richi raised in the PR, we have fp_expressions flag in
>> function summary, and allow us to check the function actually
>> contains any floating point expressions to avoid overkill.
>> So this patch follows the similar idea but is more target
>> specific, for this rs6000 port specific requirement on HTM
>> feature check, we would like to check rs6000 specific HTM
>> built-in functions and inline assembly, it allows targets
>> to do their own customized checks and updates.
>>
>> It introduces two target hooks need_ipa_fn_target_info and
>> update_ipa_fn_target_info.  The former allows target to do
>> some previous check and decides to collect target specific
>> information for this function or not.  For some special case,
>> it can predict the analysis result and push it early without
>> any scannings.  The latter allows the analyze_function_body
>> to pass gimple stmts down just like fp_expressions handlings,
>> target can do its own tricks.  I put them as one hook initially
>> with one boolean to indicates whether it's initial time, but
>> the code looks a bit ugly, to separate them seems to have
>> better readability.
>>
>> To make it simple, this patch uses HOST_WIDE_INT to record the
>> flags just like what we use for isa_flags.  For rs6000's HTM
>> need, one HOST_WIDE_INT variable is quite enough, but it seems
>> good to have one auto_vec for scalability as I noticed some
>> targets have more than one HOST_WIDE_INT flag.  For now, this
>> target information collection is only for always_inline function,
>> function ipa_merge_fn_summary_after_inlining deals with target
>> information merging.
>>
>> The patch has been bootstrapped and regress-tested on
>> powerpc64le-linux-gnu Power9.
>>
>> Is it on the right track?
> 
> +  if (always_inline)
> +    {
> +      cgraph_node *callee_node = cgraph_node::get (callee);
> +      if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
> +       {
> +         if (dump_file)
> +           ipa_dump_fn_summary (dump_file, callee_node);
> +         const vec<HOST_WIDE_INT> &info =
> +           ipa_fn_summaries->get (callee_node)->target_info;
> +         if (!info.is_empty ())
> +           always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM;
> +       }
> +
> +      caller_isa &= ~always_inline_safe_mask;
> +      callee_isa &= ~always_inline_safe_mask;
> +    }
> 
> that's a bit convoluted but obviously the IPA info can be used for
> non-always_inline cases as well.
> 
> As said above the info can be produced for not always-inline functions
> as well, the usual case would be for LTO inlining across TUs compiled
> with different target options.  In your case the special -mcpu=power10
> TU would otherwise not be able to inline from a general -mcpu=power8 TU.
> 

Agree it can be extended to non-always_inline cases.  Since always_inline
is kind of user "forced" requirement and compiler emits error if it fails
to inline, while non-always_inline will have warning instead.  Considering
the scanning might be considered as costly for some big functions, I
guessed it might be good to start from always_inline as the first step.
But if different target options among LTO TUs is a common user case, I
think it's worth to extending it now.

> On the streaming side we possibly have to take care about the
> GPU offloading path where we likely want to avoid pushing host target
> bits to the GPU target in some way.
> 

I guess this comment is about lto_stream_offload_p, I just did some quick
checks, this flag seems to guard things into section offload_lto, while
the function summary has its own section, it seems fine? 

> Your case is specifically looking for HTM target builtins - for more general
> cases, like for example deciding whether we can inline across, say,
> -mlzcnt on x86 the scanning would have to include at least direct internal-fns
> mapping to optabs that could change.  With such inlining we might also
> work against heuristic tuning decisions based on the ISA availability
> making code (much) more expensive to expand without such ISA availability,
> but that wouldn't be a correctness issue then.

OK,I would assume the hook function parameter gimple* will be also enough for
this example.  :)

IMHO, even with this target information collection, we are unable to check all
ISA features, it can only work for some "dull" ISA features, like HTM on
Power which can only be exploited by builtin (or inline asm), the downstream
passes don't try to exploit it in other ways.  For some features like VSX,
vectorization pass can produce vector code after we generating fn summary and
think it doesn't use, it seems no way to get it right in early stage of pass
pipeline.
> 
> Otherwise the overall bits look OK but I'll leave the details to our IPA folks.
> 

Thanks again!

BR,
Kewen

> Thanks,
> Richard.
> 
>>
>> Any comments are highly appreciated!
>>
>> BR,
>> Kewen
>> ------
>> gcc/ChangeLog:
>>
>>         PR ipa/102059
>>         * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New
>>         function.
>>         * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): New
>>         declare.
>>         * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro.
>>         (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
>>         (rs6000_need_ipa_fn_target_info): New function.
>>         (rs6000_update_ipa_fn_target_info): Likewise.
>>         (rs6000_can_inline_p): Adjust for ipa function summary target info.
>>         * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
>>         summary target info.
>>         (analyze_function_body): Adjust for ipa function summary target
>>         info and call hook rs6000_need_ipa_fn_target_info and
>>         rs6000_update_ipa_fn_target_info.
>>         (ipa_merge_fn_summary_after_inlining): Adjust for ipa function
>>         summary target info.
>>         (inline_read_section): Likewise.
>>         (ipa_fn_summary_write): Likewise.
>>         * ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
>>         * doc/tm.texi: Regenerate.
>>         * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
>>         hook.
>>         (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
>>         * target.def (update_ipa_fn_target_info): New hook.
>>         (need_ipa_fn_target_info): Likewise.
>>         * targhooks.c (default_need_ipa_fn_target_info): New function.
>>         (default_update_ipa_fn_target_info): Likewise.
>>         * targhooks.h (default_update_ipa_fn_target_info): New declare.
>>         (default_need_ipa_fn_target_info): Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         PR ipa/102059
>>         * gcc.dg/lto/pr102059_0.c: New test.
>>         * gcc.dg/lto/pr102059_1.c: New test.
>>         * gcc.dg/lto/pr102059_2.c: New test.
>>         * gcc.target/powerpc/pr102059-5.c: New test.
>>         * gcc.target/powerpc/pr102059-6.c: New test.
>>
Richard Biener Sept. 2, 2021, 11:51 a.m. UTC | #3
On Thu, Sep 2, 2021 at 1:13 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi Richi,
>
> Thanks for the comments!
>
> on 2021/9/2 下午5:25, Richard Biener wrote:
> > On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> Hi!
> >>
> >> Power ISA 2.07 (Power8) introduces transactional memory feature
> >> but ISA3.1 (Power10) removes it.  It exposes one troublesome
> >> issue as PR102059 shows.  Users define some function with
> >> target pragma cpu=power10 then it calls one function with
> >> attribute always_inline which inherits command line option
> >> cpu=power8 which enables HTM implicitly.  The current isa_flags
> >> check doesn't allow this inlining due to "target specific
> >> option mismatch" and error mesasge is emitted.
> >>
> >> Normally, the callee function isn't intended to exploit HTM
> >> feature, but the default flag setting make it look it has.
> >> As Richi raised in the PR, we have fp_expressions flag in
> >> function summary, and allow us to check the function actually
> >> contains any floating point expressions to avoid overkill.
> >> So this patch follows the similar idea but is more target
> >> specific, for this rs6000 port specific requirement on HTM
> >> feature check, we would like to check rs6000 specific HTM
> >> built-in functions and inline assembly, it allows targets
> >> to do their own customized checks and updates.
> >>
> >> It introduces two target hooks need_ipa_fn_target_info and
> >> update_ipa_fn_target_info.  The former allows target to do
> >> some previous check and decides to collect target specific
> >> information for this function or not.  For some special case,
> >> it can predict the analysis result and push it early without
> >> any scannings.  The latter allows the analyze_function_body
> >> to pass gimple stmts down just like fp_expressions handlings,
> >> target can do its own tricks.  I put them as one hook initially
> >> with one boolean to indicates whether it's initial time, but
> >> the code looks a bit ugly, to separate them seems to have
> >> better readability.
> >>
> >> To make it simple, this patch uses HOST_WIDE_INT to record the
> >> flags just like what we use for isa_flags.  For rs6000's HTM
> >> need, one HOST_WIDE_INT variable is quite enough, but it seems
> >> good to have one auto_vec for scalability as I noticed some
> >> targets have more than one HOST_WIDE_INT flag.  For now, this
> >> target information collection is only for always_inline function,
> >> function ipa_merge_fn_summary_after_inlining deals with target
> >> information merging.
> >>
> >> The patch has been bootstrapped and regress-tested on
> >> powerpc64le-linux-gnu Power9.
> >>
> >> Is it on the right track?
> >
> > +  if (always_inline)
> > +    {
> > +      cgraph_node *callee_node = cgraph_node::get (callee);
> > +      if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
> > +       {
> > +         if (dump_file)
> > +           ipa_dump_fn_summary (dump_file, callee_node);
> > +         const vec<HOST_WIDE_INT> &info =
> > +           ipa_fn_summaries->get (callee_node)->target_info;
> > +         if (!info.is_empty ())
> > +           always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM;
> > +       }
> > +
> > +      caller_isa &= ~always_inline_safe_mask;
> > +      callee_isa &= ~always_inline_safe_mask;
> > +    }
> >
> > that's a bit convoluted but obviously the IPA info can be used for
> > non-always_inline cases as well.
> >
> > As said above the info can be produced for not always-inline functions
> > as well, the usual case would be for LTO inlining across TUs compiled
> > with different target options.  In your case the special -mcpu=power10
> > TU would otherwise not be able to inline from a general -mcpu=power8 TU.
> >
>
> Agree it can be extended to non-always_inline cases.  Since always_inline
> is kind of user "forced" requirement and compiler emits error if it fails
> to inline, while non-always_inline will have warning instead.  Considering
> the scanning might be considered as costly for some big functions, I
> guessed it might be good to start from always_inline as the first step.
> But if different target options among LTO TUs is a common user case, I
> think it's worth to extending it now.

I was merely looking at this from the perspective of test coverage - with
restricting it to always-inline we're not going to notice issues very
reliably I think.

> > On the streaming side we possibly have to take care about the
> > GPU offloading path where we likely want to avoid pushing host target
> > bits to the GPU target in some way.
> >
>
> I guess this comment is about lto_stream_offload_p, I just did some quick
> checks, this flag seems to guard things into section offload_lto, while
> the function summary has its own section, it seems fine?

Yes, but the data, since its target specific, is interpreted different by
the host target than by the offload target so IMHO we should drop this
to a conservative state when offloading?

> > Your case is specifically looking for HTM target builtins - for more general
> > cases, like for example deciding whether we can inline across, say,
> > -mlzcnt on x86 the scanning would have to include at least direct internal-fns
> > mapping to optabs that could change.  With such inlining we might also
> > work against heuristic tuning decisions based on the ISA availability
> > making code (much) more expensive to expand without such ISA availability,
> > but that wouldn't be a correctness issue then.
>
> OK,I would assume the hook function parameter gimple* will be also enough for
> this example.  :)

Yes, I think so.

> IMHO, even with this target information collection, we are unable to check all
> ISA features, it can only work for some "dull" ISA features, like HTM on
> Power which can only be exploited by builtin (or inline asm), the downstream
> passes don't try to exploit it in other ways.  For some features like VSX,
> vectorization pass can produce vector code after we generating fn summary and
> think it doesn't use, it seems no way to get it right in early stage of pass
> pipeline.

I don't think later passes are an issue - they would operate under the
constraints
of the caller flag and thus still generate valid code.  Yes, if, say, somebody
disabled VSX on a function in the attempt to not vectorize an unprofitable case
and that function gets inlined it might end up using VSX (as now active in the
caller) to vectorize the unprofitable case.  But in general it should work
for any ISA feature (and some ISA features might just change what we
expand common GIMPLE to - differences in those ISA features do not need
to prevent inlining from a correctness perspective).

Richard.

> >
> > Otherwise the overall bits look OK but I'll leave the details to our IPA folks.
> >
>
> Thanks again!
>
> BR,
> Kewen
>
> > Thanks,
> > Richard.
> >
> >>
> >> Any comments are highly appreciated!
> >>
> >> BR,
> >> Kewen
> >> ------
> >> gcc/ChangeLog:
> >>
> >>         PR ipa/102059
> >>         * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New
> >>         function.
> >>         * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): New
> >>         declare.
> >>         * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro.
> >>         (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
> >>         (rs6000_need_ipa_fn_target_info): New function.
> >>         (rs6000_update_ipa_fn_target_info): Likewise.
> >>         (rs6000_can_inline_p): Adjust for ipa function summary target info.
> >>         * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
> >>         summary target info.
> >>         (analyze_function_body): Adjust for ipa function summary target
> >>         info and call hook rs6000_need_ipa_fn_target_info and
> >>         rs6000_update_ipa_fn_target_info.
> >>         (ipa_merge_fn_summary_after_inlining): Adjust for ipa function
> >>         summary target info.
> >>         (inline_read_section): Likewise.
> >>         (ipa_fn_summary_write): Likewise.
> >>         * ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
> >>         * doc/tm.texi: Regenerate.
> >>         * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
> >>         hook.
> >>         (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
> >>         * target.def (update_ipa_fn_target_info): New hook.
> >>         (need_ipa_fn_target_info): Likewise.
> >>         * targhooks.c (default_need_ipa_fn_target_info): New function.
> >>         (default_update_ipa_fn_target_info): Likewise.
> >>         * targhooks.h (default_update_ipa_fn_target_info): New declare.
> >>         (default_need_ipa_fn_target_info): Likewise.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>         PR ipa/102059
> >>         * gcc.dg/lto/pr102059_0.c: New test.
> >>         * gcc.dg/lto/pr102059_1.c: New test.
> >>         * gcc.dg/lto/pr102059_2.c: New test.
> >>         * gcc.target/powerpc/pr102059-5.c: New test.
> >>         * gcc.target/powerpc/pr102059-6.c: New test.
> >>
Kewen.Lin Sept. 2, 2021, 1:10 p.m. UTC | #4
on 2021/9/2 下午7:51, Richard Biener wrote:
> On Thu, Sep 2, 2021 at 1:13 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi Richi,
>>
>> Thanks for the comments!
>>
>> on 2021/9/2 下午5:25, Richard Biener wrote:
>>> On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>> Hi!
>>>>
>>>> Power ISA 2.07 (Power8) introduces transactional memory feature
>>>> but ISA3.1 (Power10) removes it.  It exposes one troublesome
>>>> issue as PR102059 shows.  Users define some function with
>>>> target pragma cpu=power10 then it calls one function with
>>>> attribute always_inline which inherits command line option
>>>> cpu=power8 which enables HTM implicitly.  The current isa_flags
>>>> check doesn't allow this inlining due to "target specific
>>>> option mismatch" and error mesasge is emitted.
>>>>
>>>> Normally, the callee function isn't intended to exploit HTM
>>>> feature, but the default flag setting make it look it has.
>>>> As Richi raised in the PR, we have fp_expressions flag in
>>>> function summary, and allow us to check the function actually
>>>> contains any floating point expressions to avoid overkill.
>>>> So this patch follows the similar idea but is more target
>>>> specific, for this rs6000 port specific requirement on HTM
>>>> feature check, we would like to check rs6000 specific HTM
>>>> built-in functions and inline assembly, it allows targets
>>>> to do their own customized checks and updates.
>>>>
>>>> It introduces two target hooks need_ipa_fn_target_info and
>>>> update_ipa_fn_target_info.  The former allows target to do
>>>> some previous check and decides to collect target specific
>>>> information for this function or not.  For some special case,
>>>> it can predict the analysis result and push it early without
>>>> any scannings.  The latter allows the analyze_function_body
>>>> to pass gimple stmts down just like fp_expressions handlings,
>>>> target can do its own tricks.  I put them as one hook initially
>>>> with one boolean to indicates whether it's initial time, but
>>>> the code looks a bit ugly, to separate them seems to have
>>>> better readability.
>>>>
>>>> To make it simple, this patch uses HOST_WIDE_INT to record the
>>>> flags just like what we use for isa_flags.  For rs6000's HTM
>>>> need, one HOST_WIDE_INT variable is quite enough, but it seems
>>>> good to have one auto_vec for scalability as I noticed some
>>>> targets have more than one HOST_WIDE_INT flag.  For now, this
>>>> target information collection is only for always_inline function,
>>>> function ipa_merge_fn_summary_after_inlining deals with target
>>>> information merging.
>>>>
>>>> The patch has been bootstrapped and regress-tested on
>>>> powerpc64le-linux-gnu Power9.
>>>>
>>>> Is it on the right track?
>>>
>>> +  if (always_inline)
>>> +    {
>>> +      cgraph_node *callee_node = cgraph_node::get (callee);
>>> +      if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
>>> +       {
>>> +         if (dump_file)
>>> +           ipa_dump_fn_summary (dump_file, callee_node);
>>> +         const vec<HOST_WIDE_INT> &info =
>>> +           ipa_fn_summaries->get (callee_node)->target_info;
>>> +         if (!info.is_empty ())
>>> +           always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM;
>>> +       }
>>> +
>>> +      caller_isa &= ~always_inline_safe_mask;
>>> +      callee_isa &= ~always_inline_safe_mask;
>>> +    }
>>>
>>> that's a bit convoluted but obviously the IPA info can be used for
>>> non-always_inline cases as well.
>>>
>>> As said above the info can be produced for not always-inline functions
>>> as well, the usual case would be for LTO inlining across TUs compiled
>>> with different target options.  In your case the special -mcpu=power10
>>> TU would otherwise not be able to inline from a general -mcpu=power8 TU.
>>>
>>
>> Agree it can be extended to non-always_inline cases.  Since always_inline
>> is kind of user "forced" requirement and compiler emits error if it fails
>> to inline, while non-always_inline will have warning instead.  Considering
>> the scanning might be considered as costly for some big functions, I
>> guessed it might be good to start from always_inline as the first step.
>> But if different target options among LTO TUs is a common user case, I
>> think it's worth to extending it now.
> 
> I was merely looking at this from the perspective of test coverage - with
> restricting it to always-inline we're not going to notice issues very
> reliably I think.
> 

Got it, will extend it to support all inlinable functions in next version.

>>> On the streaming side we possibly have to take care about the
>>> GPU offloading path where we likely want to avoid pushing host target
>>> bits to the GPU target in some way.
>>>
>>
>> I guess this comment is about lto_stream_offload_p, I just did some quick
>> checks, this flag seems to guard things into section offload_lto, while
>> the function summary has its own section, it seems fine?
> 
> Yes, but the data, since its target specific, is interpreted different by
> the host target than by the offload target so IMHO we should drop this
> to a conservative state when offloading?
> 

OK, I can guard the read/write under !lto_stream_offload_p and also test
with offload cases to see whether it takes effect as expected.  There are
explicitly set/unset code before write_summary, but seems none for summary
reading, I might miss sth., need to check more.

>>> Your case is specifically looking for HTM target builtins - for more general
>>> cases, like for example deciding whether we can inline across, say,
>>> -mlzcnt on x86 the scanning would have to include at least direct internal-fns
>>> mapping to optabs that could change.  With such inlining we might also
>>> work against heuristic tuning decisions based on the ISA availability
>>> making code (much) more expensive to expand without such ISA availability,
>>> but that wouldn't be a correctness issue then.
>>
>> OK,I would assume the hook function parameter gimple* will be also enough for
>> this example.  :)
> 
> Yes, I think so.
> 
>> IMHO, even with this target information collection, we are unable to check all
>> ISA features, it can only work for some "dull" ISA features, like HTM on
>> Power which can only be exploited by builtin (or inline asm), the downstream
>> passes don't try to exploit it in other ways.  For some features like VSX,
>> vectorization pass can produce vector code after we generating fn summary and
>> think it doesn't use, it seems no way to get it right in early stage of pass
>> pipeline.
> 
> I don't think later passes are an issue - they would operate under the
> constraints of the caller flag and thus still generate valid code.  

Yeah, it makes sense.

> Yes, if, say, somebody
> disabled VSX on a function in the attempt to not vectorize an unprofitable case
> and that function gets inlined it might end up using VSX (as now active in the
> caller) to vectorize the unprofitable case.  But in general it should work

The example seems to be what we want to avoid in PR70010 if non-VSX set for callee
explicitly. :)  btw, in the current implementation, I simply mark the function
non-HTM if it has explicit option -mno-htm, it doesn't need any scannings.

One opposite example, one callee with VSX flag (implicitly or explicitly), but
after scanning it's marked as non-VSX, it's a question that if we want it to be
inlined to one caller with non-VSX or not.  Like your example, if it can be
vectorized before (meant without inlining), then to inline it is bad.  While if
it can not be vectorized before, then to inline it is good.  We can't predict
this function will have VSX code or not (vectorized or not, feature exploited
or not) when setting the target information VSX feature bit.

Anyway, target can do its own decision in that hook, it's fine.  :)

> for any ISA feature (and some ISA features might just change what we
> expand common GIMPLE to - differences in those ISA features do not need
> to prevent inlining from a correctness perspective).
> 

Yeah, it would not be a correctness issue.

BR,
Kewen

> Richard.
> 
>>>
>>> Otherwise the overall bits look OK but I'll leave the details to our IPA folks.
>>>
>>
>> Thanks again!
>>
>> BR,
>> Kewen
>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>> Any comments are highly appreciated!
>>>>
>>>> BR,
>>>> Kewen
>>>> ------
>>>> gcc/ChangeLog:
>>>>
>>>>         PR ipa/102059
>>>>         * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New
>>>>         function.
>>>>         * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): New
>>>>         declare.
>>>>         * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro.
>>>>         (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
>>>>         (rs6000_need_ipa_fn_target_info): New function.
>>>>         (rs6000_update_ipa_fn_target_info): Likewise.
>>>>         (rs6000_can_inline_p): Adjust for ipa function summary target info.
>>>>         * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
>>>>         summary target info.
>>>>         (analyze_function_body): Adjust for ipa function summary target
>>>>         info and call hook rs6000_need_ipa_fn_target_info and
>>>>         rs6000_update_ipa_fn_target_info.
>>>>         (ipa_merge_fn_summary_after_inlining): Adjust for ipa function
>>>>         summary target info.
>>>>         (inline_read_section): Likewise.
>>>>         (ipa_fn_summary_write): Likewise.
>>>>         * ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
>>>>         * doc/tm.texi: Regenerate.
>>>>         * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
>>>>         hook.
>>>>         (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
>>>>         * target.def (update_ipa_fn_target_info): New hook.
>>>>         (need_ipa_fn_target_info): Likewise.
>>>>         * targhooks.c (default_need_ipa_fn_target_info): New function.
>>>>         (default_update_ipa_fn_target_info): Likewise.
>>>>         * targhooks.h (default_update_ipa_fn_target_info): New declare.
>>>>         (default_need_ipa_fn_target_info): Likewise.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>         PR ipa/102059
>>>>         * gcc.dg/lto/pr102059_0.c: New test.
>>>>         * gcc.dg/lto/pr102059_1.c: New test.
>>>>         * gcc.dg/lto/pr102059_2.c: New test.
>>>>         * gcc.target/powerpc/pr102059-5.c: New test.
>>>>         * gcc.target/powerpc/pr102059-6.c: New test.
>>>>


BR,
Kewen
Richard Biener Sept. 2, 2021, 1:21 p.m. UTC | #5
On Thu, Sep 2, 2021 at 3:11 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2021/9/2 下午7:51, Richard Biener wrote:
> > On Thu, Sep 2, 2021 at 1:13 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> Hi Richi,
> >>
> >> Thanks for the comments!
> >>
> >> on 2021/9/2 下午5:25, Richard Biener wrote:
> >>> On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>
> >>>> Hi!
> >>>>
> >>>> Power ISA 2.07 (Power8) introduces transactional memory feature
> >>>> but ISA3.1 (Power10) removes it.  It exposes one troublesome
> >>>> issue as PR102059 shows.  Users define some function with
> >>>> target pragma cpu=power10 then it calls one function with
> >>>> attribute always_inline which inherits command line option
> >>>> cpu=power8 which enables HTM implicitly.  The current isa_flags
> >>>> check doesn't allow this inlining due to "target specific
> >>>> option mismatch" and error mesasge is emitted.
> >>>>
> >>>> Normally, the callee function isn't intended to exploit HTM
> >>>> feature, but the default flag setting make it look it has.
> >>>> As Richi raised in the PR, we have fp_expressions flag in
> >>>> function summary, and allow us to check the function actually
> >>>> contains any floating point expressions to avoid overkill.
> >>>> So this patch follows the similar idea but is more target
> >>>> specific, for this rs6000 port specific requirement on HTM
> >>>> feature check, we would like to check rs6000 specific HTM
> >>>> built-in functions and inline assembly, it allows targets
> >>>> to do their own customized checks and updates.
> >>>>
> >>>> It introduces two target hooks need_ipa_fn_target_info and
> >>>> update_ipa_fn_target_info.  The former allows target to do
> >>>> some previous check and decides to collect target specific
> >>>> information for this function or not.  For some special case,
> >>>> it can predict the analysis result and push it early without
> >>>> any scannings.  The latter allows the analyze_function_body
> >>>> to pass gimple stmts down just like fp_expressions handlings,
> >>>> target can do its own tricks.  I put them as one hook initially
> >>>> with one boolean to indicates whether it's initial time, but
> >>>> the code looks a bit ugly, to separate them seems to have
> >>>> better readability.
> >>>>
> >>>> To make it simple, this patch uses HOST_WIDE_INT to record the
> >>>> flags just like what we use for isa_flags.  For rs6000's HTM
> >>>> need, one HOST_WIDE_INT variable is quite enough, but it seems
> >>>> good to have one auto_vec for scalability as I noticed some
> >>>> targets have more than one HOST_WIDE_INT flag.  For now, this
> >>>> target information collection is only for always_inline function,
> >>>> function ipa_merge_fn_summary_after_inlining deals with target
> >>>> information merging.
> >>>>
> >>>> The patch has been bootstrapped and regress-tested on
> >>>> powerpc64le-linux-gnu Power9.
> >>>>
> >>>> Is it on the right track?
> >>>
> >>> +  if (always_inline)
> >>> +    {
> >>> +      cgraph_node *callee_node = cgraph_node::get (callee);
> >>> +      if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
> >>> +       {
> >>> +         if (dump_file)
> >>> +           ipa_dump_fn_summary (dump_file, callee_node);
> >>> +         const vec<HOST_WIDE_INT> &info =
> >>> +           ipa_fn_summaries->get (callee_node)->target_info;
> >>> +         if (!info.is_empty ())
> >>> +           always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM;
> >>> +       }
> >>> +
> >>> +      caller_isa &= ~always_inline_safe_mask;
> >>> +      callee_isa &= ~always_inline_safe_mask;
> >>> +    }
> >>>
> >>> that's a bit convoluted but obviously the IPA info can be used for
> >>> non-always_inline cases as well.
> >>>
> >>> As said above the info can be produced for not always-inline functions
> >>> as well, the usual case would be for LTO inlining across TUs compiled
> >>> with different target options.  In your case the special -mcpu=power10
> >>> TU would otherwise not be able to inline from a general -mcpu=power8 TU.
> >>>
> >>
> >> Agree it can be extended to non-always_inline cases.  Since always_inline
> >> is kind of user "forced" requirement and compiler emits error if it fails
> >> to inline, while non-always_inline will have warning instead.  Considering
> >> the scanning might be considered as costly for some big functions, I
> >> guessed it might be good to start from always_inline as the first step.
> >> But if different target options among LTO TUs is a common user case, I
> >> think it's worth to extending it now.
> >
> > I was merely looking at this from the perspective of test coverage - with
> > restricting it to always-inline we're not going to notice issues very
> > reliably I think.
> >
>
> Got it, will extend it to support all inlinable functions in next version.
>
> >>> On the streaming side we possibly have to take care about the
> >>> GPU offloading path where we likely want to avoid pushing host target
> >>> bits to the GPU target in some way.
> >>>
> >>
> >> I guess this comment is about lto_stream_offload_p, I just did some quick
> >> checks, this flag seems to guard things into section offload_lto, while
> >> the function summary has its own section, it seems fine?
> >
> > Yes, but the data, since its target specific, is interpreted different by
> > the host target than by the offload target so IMHO we should drop this
> > to a conservative state when offloading?
> >
>
> OK, I can guard the read/write under !lto_stream_offload_p and also test
> with offload cases to see whether it takes effect as expected.  There are
> explicitly set/unset code before write_summary, but seems none for summary
> reading, I might miss sth., need to check more.
>
> >>> Your case is specifically looking for HTM target builtins - for more general
> >>> cases, like for example deciding whether we can inline across, say,
> >>> -mlzcnt on x86 the scanning would have to include at least direct internal-fns
> >>> mapping to optabs that could change.  With such inlining we might also
> >>> work against heuristic tuning decisions based on the ISA availability
> >>> making code (much) more expensive to expand without such ISA availability,
> >>> but that wouldn't be a correctness issue then.
> >>
> >> OK,I would assume the hook function parameter gimple* will be also enough for
> >> this example.  :)
> >
> > Yes, I think so.
> >
> >> IMHO, even with this target information collection, we are unable to check all
> >> ISA features, it can only work for some "dull" ISA features, like HTM on
> >> Power which can only be exploited by builtin (or inline asm), the downstream
> >> passes don't try to exploit it in other ways.  For some features like VSX,
> >> vectorization pass can produce vector code after we generating fn summary and
> >> think it doesn't use, it seems no way to get it right in early stage of pass
> >> pipeline.
> >
> > I don't think later passes are an issue - they would operate under the
> > constraints of the caller flag and thus still generate valid code.
>
> Yeah, it makes sense.
>
> > Yes, if, say, somebody
> > disabled VSX on a function in the attempt to not vectorize an unprofitable case
> > and that function gets inlined it might end up using VSX (as now active in the
> > caller) to vectorize the unprofitable case.  But in general it should work
>
> The example seems to be what we want to avoid in PR70010 if non-VSX set for callee
> explicitly. :)  btw, in the current implementation, I simply mark the function
> non-HTM if it has explicit option -mno-htm, it doesn't need any scannings.
>
> One opposite example, one callee with VSX flag (implicitly or explicitly), but
> after scanning it's marked as non-VSX, it's a question that if we want it to be
> inlined to one caller with non-VSX or not.  Like your example, if it can be
> vectorized before (meant without inlining), then to inline it is bad.  While if
> it can not be vectorized before, then to inline it is good.  We can't predict
> this function will have VSX code or not (vectorized or not, feature exploited
> or not) when setting the target information VSX feature bit.

Yeah - I guess this is where any heuristic decision a target might want to
make will be different for the always-inline case.

> Anyway, target can do its own decision in that hook, it's fine.  :)
>
> > for any ISA feature (and some ISA features might just change what we
> > expand common GIMPLE to - differences in those ISA features do not need
> > to prevent inlining from a correctness perspective).
> >
>
> Yeah, it would not be a correctness issue.
>
> BR,
> Kewen
>
> > Richard.
> >
> >>>
> >>> Otherwise the overall bits look OK but I'll leave the details to our IPA folks.
> >>>
> >>
> >> Thanks again!
> >>
> >> BR,
> >> Kewen
> >>
> >>> Thanks,
> >>> Richard.
> >>>
> >>>>
> >>>> Any comments are highly appreciated!
> >>>>
> >>>> BR,
> >>>> Kewen
> >>>> ------
> >>>> gcc/ChangeLog:
> >>>>
> >>>>         PR ipa/102059
> >>>>         * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New
> >>>>         function.
> >>>>         * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): New
> >>>>         declare.
> >>>>         * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro.
> >>>>         (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
> >>>>         (rs6000_need_ipa_fn_target_info): New function.
> >>>>         (rs6000_update_ipa_fn_target_info): Likewise.
> >>>>         (rs6000_can_inline_p): Adjust for ipa function summary target info.
> >>>>         * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
> >>>>         summary target info.
> >>>>         (analyze_function_body): Adjust for ipa function summary target
> >>>>         info and call hook rs6000_need_ipa_fn_target_info and
> >>>>         rs6000_update_ipa_fn_target_info.
> >>>>         (ipa_merge_fn_summary_after_inlining): Adjust for ipa function
> >>>>         summary target info.
> >>>>         (inline_read_section): Likewise.
> >>>>         (ipa_fn_summary_write): Likewise.
> >>>>         * ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
> >>>>         * doc/tm.texi: Regenerate.
> >>>>         * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
> >>>>         hook.
> >>>>         (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
> >>>>         * target.def (update_ipa_fn_target_info): New hook.
> >>>>         (need_ipa_fn_target_info): Likewise.
> >>>>         * targhooks.c (default_need_ipa_fn_target_info): New function.
> >>>>         (default_update_ipa_fn_target_info): Likewise.
> >>>>         * targhooks.h (default_update_ipa_fn_target_info): New declare.
> >>>>         (default_need_ipa_fn_target_info): Likewise.
> >>>>
> >>>> gcc/testsuite/ChangeLog:
> >>>>
> >>>>         PR ipa/102059
> >>>>         * gcc.dg/lto/pr102059_0.c: New test.
> >>>>         * gcc.dg/lto/pr102059_1.c: New test.
> >>>>         * gcc.dg/lto/pr102059_2.c: New test.
> >>>>         * gcc.target/powerpc/pr102059-5.c: New test.
> >>>>         * gcc.target/powerpc/pr102059-6.c: New test.
> >>>>
>
>
> BR,
> Kewen
Segher Boessenkool Sept. 2, 2021, 5:44 p.m. UTC | #6
Hi!

On Wed, Sep 01, 2021 at 03:02:22PM +0800, Kewen.Lin wrote:
> It introduces two target hooks need_ipa_fn_target_info and
> update_ipa_fn_target_info.  The former allows target to do
> some previous check and decides to collect target specific
> information for this function or not.  For some special case,
> it can predict the analysis result and push it early without
> any scannings.  The latter allows the analyze_function_body
> to pass gimple stmts down just like fp_expressions handlings,
> target can do its own tricks.
> 
> To make it simple, this patch uses HOST_WIDE_INT to record the
> flags just like what we use for isa_flags.  For rs6000's HTM
> need, one HOST_WIDE_INT variable is quite enough, but it seems
> good to have one auto_vec for scalability as I noticed some
> targets have more than one HOST_WIDE_INT flag.  For now, this
> target information collection is only for always_inline function,
> function ipa_merge_fn_summary_after_inlining deals with target
> information merging.

These flags can in principle be separate from any flags the target
keeps, so 64 bits will be enough for a long time.  If we want to
architect that better, we should really architect the way all targets
do target flags first.  Let's not go there now :-)

So just one HOST_WIDE_INT, not a stack of them please?

> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -13642,6 +13642,17 @@ rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
>    return rs6000_builtin_decls[code];
>  }
>  
> +/* Return true if the builtin with CODE has any mask bits set
> +   which are specified by MASK.  */
> +
> +bool
> +rs6000_builtin_mask_set_p (unsigned code, HOST_WIDE_INT mask)
> +{
> +  gcc_assert (code < RS6000_BUILTIN_COUNT);
> +  HOST_WIDE_INT fnmask = rs6000_builtin_info[code].mask;
> +  return fnmask & mask;
> +}

The "_p" does not say that "any bits" part, which is crucial here.  So
name this something like "rs6000_fn_has_any_of_these_mask_bits"?  Yes
the name sucks, because this interface does :-P

Its it useful to have "any" semantics at all?  Otherwise, require this
to be passed just a single bit?

The implicit "!!" (or "!= 0", same thing) that casting to bool does
might be better explicit, too?  A cast to bool changes value so is more
suprising than other casts.

> +  /* Assume inline asm can use any instruction features.  */
> +  if (gimple_code (stmt) == GIMPLE_ASM)
> +    {
> +      info[0] = -1;
> +      return false;
> +    }

What is -1 here?  "All options set"?  Does that work?  Reliably?

> +      if (fndecl && fndecl_built_in_p (fndecl, BUILT_IN_MD))
> +	{
> +	  enum rs6000_builtins fcode =
> +	    (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
> +	  /* HTM bifs definitely exploit HTM insns.  */
> +	  if (rs6000_builtin_mask_set_p ((unsigned) fcode, RS6000_BTM_HTM))

Why the cast here?  Please change the parameter type, instead?  It is
fine to use enums specific to our backend in that backend itself :-)

> @@ -1146,6 +1147,16 @@ ipa_dump_fn_summary (FILE *f, struct cgraph_node *node)
>  	  fprintf (f, "  calls:\n");
>  	  dump_ipa_call_summary (f, 4, node, s);
>  	  fprintf (f, "\n");
> +	  HOST_WIDE_INT flags;
> +	  for (int i = 0; s->target_info.iterate (i, &flags); i++)
> +	    {
> +	      if (i == 0)
> +		{
> +		  fprintf (f, "  target_info flags:");
> +		}

Don't use blocks around single statements please.

> +  /* Only look for target information for inlinable always_inline functions.  */
> +  bool scan_for_target_info =
> +    (info->inlinable
> +     && DECL_DISREGARD_INLINE_LIMITS (node->decl)
> +     && lookup_attribute ("always_inline", DECL_ATTRIBUTES (node->decl))
> +     && targetm.target_option.need_ipa_fn_target_info (node->decl,
> +						       info->target_info));

Don't use unnecessary parens please.


Segher
Kewen.Lin Sept. 3, 2021, 3:09 a.m. UTC | #7
Hi Segher,

Thanks for the comments!

on 2021/9/3 上午1:44, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Sep 01, 2021 at 03:02:22PM +0800, Kewen.Lin wrote:
>> It introduces two target hooks need_ipa_fn_target_info and
>> update_ipa_fn_target_info.  The former allows target to do
>> some previous check and decides to collect target specific
>> information for this function or not.  For some special case,
>> it can predict the analysis result and push it early without
>> any scannings.  The latter allows the analyze_function_body
>> to pass gimple stmts down just like fp_expressions handlings,
>> target can do its own tricks.
>>
>> To make it simple, this patch uses HOST_WIDE_INT to record the
>> flags just like what we use for isa_flags.  For rs6000's HTM
>> need, one HOST_WIDE_INT variable is quite enough, but it seems
>> good to have one auto_vec for scalability as I noticed some
>> targets have more than one HOST_WIDE_INT flag.  For now, this
>> target information collection is only for always_inline function,
>> function ipa_merge_fn_summary_after_inlining deals with target
>> information merging.
> 
> These flags can in principle be separate from any flags the target
> keeps, so 64 bits will be enough for a long time.  If we want to
> architect that better, we should really architect the way all targets
> do target flags first.  Let's not go there now :-)
> 
> So just one HOST_WIDE_INT, not a stack of them please?

I considered this, it's fine to use this customized bit in the target hook,
but back to target hook can_inline_p, we have to decoded them to the bits
in isa_flags separately, it's inefficient than just using the whole mask
if the interesting bits are more.

As the discussion with Richi, theoretically speaking if target likes, it can
try to scan for many isa features with target's own desicions, there could be
much more bits.  Another thing inspiring me to make it with one vector is that
i386 port ix86_can_inline_p checks x_ix86_target_flags, x_ix86_isa_flags,
x_ix86_isa_flags2, arch and tune etc. now, one HOST_WIDE_INT seems not good
to it, if it wants to check more.  ;-)

> 
>> --- a/gcc/config/rs6000/rs6000-call.c
>> +++ b/gcc/config/rs6000/rs6000-call.c
>> @@ -13642,6 +13642,17 @@ rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
>>    return rs6000_builtin_decls[code];
>>  }
>>  
>> +/* Return true if the builtin with CODE has any mask bits set
>> +   which are specified by MASK.  */
>> +
>> +bool
>> +rs6000_builtin_mask_set_p (unsigned code, HOST_WIDE_INT mask)
>> +{
>> +  gcc_assert (code < RS6000_BUILTIN_COUNT);
>> +  HOST_WIDE_INT fnmask = rs6000_builtin_info[code].mask;
>> +  return fnmask & mask;
>> +}
> 
> The "_p" does not say that "any bits" part, which is crucial here.  So
> name this something like "rs6000_fn_has_any_of_these_mask_bits"?  Yes
> the name sucks, because this interface does :-P
> 

Thanks for the name, will fix it.  :)

> Its it useful to have "any" semantics at all?  Otherwise, require this
> to be passed just a single bit?
> 

Since we can not just pass in a bit, we have to assert it with something
like:

   gcc_assert (__builtin_popcount(mask) == 1);

to claim it's checking a single bit.  But the implementation logic still
supports checking any bits, so I thought we can just claim it to check
any bits and a single bit is just one special case.

Yeah, not sure if there is a need to check any bits, but something like
checking exists FRSQRTE and FRSQRTES bifs can pass (RS6000_BTM_FRSQRTE |
RS6000_BTM_FRSQRTES), so is it fine to keep it for any bits?

> The implicit "!!" (or "!= 0", same thing) that casting to bool does
> might be better explicit, too?  A cast to bool changes value so is more
> suprising than other casts.

OK, will fix it.

> 
>> +  /* Assume inline asm can use any instruction features.  */
>> +  if (gimple_code (stmt) == GIMPLE_ASM)
>> +    {
>> +      info[0] = -1;
>> +      return false;
>> +    }
> 
> What is -1 here?  "All options set"?  Does that work?  Reliably?
> 

Good question, in the current implementation it's reliable, since we do
operation "~" first then & the interesting bits (OPTION_MASK_HTM here)
but I think you concerned some conflict bits co-exists is reasonable or
not.  I was intended to cover any future interesting bits, but I agree
it's better to just set the correpsonding intersting bits to make it clear.

Will fix it.

>> +      if (fndecl && fndecl_built_in_p (fndecl, BUILT_IN_MD))
>> +	{
>> +	  enum rs6000_builtins fcode =
>> +	    (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
>> +	  /* HTM bifs definitely exploit HTM insns.  */
>> +	  if (rs6000_builtin_mask_set_p ((unsigned) fcode, RS6000_BTM_HTM))
> 
> Why the cast here?  Please change the parameter type, instead?  It is
> fine to use enums specific to our backend in that backend itself :-)
> 

Referred to the exisitng rs6000_builtin_decl, just noticed it's a hook.
Will fix it.

>> @@ -1146,6 +1147,16 @@ ipa_dump_fn_summary (FILE *f, struct cgraph_node *node)
>>  	  fprintf (f, "  calls:\n");
>>  	  dump_ipa_call_summary (f, 4, node, s);
>>  	  fprintf (f, "\n");
>> +	  HOST_WIDE_INT flags;
>> +	  for (int i = 0; s->target_info.iterate (i, &flags); i++)
>> +	    {
>> +	      if (i == 0)
>> +		{
>> +		  fprintf (f, "  target_info flags:");
>> +		}
> 
> Don't use blocks around single statements please.
> 

Good catch, will fix it.

>> +  /* Only look for target information for inlinable always_inline functions.  */
>> +  bool scan_for_target_info =
>> +    (info->inlinable
>> +     && DECL_DISREGARD_INLINE_LIMITS (node->decl)
>> +     && lookup_attribute ("always_inline", DECL_ATTRIBUTES (node->decl))
>> +     && targetm.target_option.need_ipa_fn_target_info (node->decl,
>> +						       info->target_info));
> 
> Don't use unnecessary parens please.
> 

OK, will fix it.

BR,
Kewen.
diff mbox series

Patch

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index b92928c891a..0fbdab15e67 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -13642,6 +13642,17 @@  rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
   return rs6000_builtin_decls[code];
 }
 
+/* Return true if the builtin with CODE has any mask bits set
+   which are specified by MASK.  */
+
+bool
+rs6000_builtin_mask_set_p (unsigned code, HOST_WIDE_INT mask)
+{
+  gcc_assert (code < RS6000_BUILTIN_COUNT);
+  HOST_WIDE_INT fnmask = rs6000_builtin_info[code].mask;
+  return fnmask & mask;
+}
+
 static void
 altivec_init_builtins (void)
 {
diff --git a/gcc/config/rs6000/rs6000-internal.h b/gcc/config/rs6000/rs6000-internal.h
index 88cf9bd5692..fc71b5637f1 100644
--- a/gcc/config/rs6000/rs6000-internal.h
+++ b/gcc/config/rs6000/rs6000-internal.h
@@ -179,6 +179,7 @@  extern tree rs6000_fold_builtin (tree fndecl ATTRIBUTE_UNUSED,
 			         int n_args ATTRIBUTE_UNUSED,
 			         tree *args ATTRIBUTE_UNUSED,
 			         bool ignore ATTRIBUTE_UNUSED);
+extern bool rs6000_builtin_mask_set_p (unsigned code, HOST_WIDE_INT mask);
 
 #if TARGET_ELF
 extern bool rs6000_passes_ieee128;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index c2582a3efab..2ee12275b78 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -71,6 +71,9 @@ 
 #include "tree-vector-builder.h"
 #include "context.h"
 #include "tree-pass.h"
+#include "symbol-summary.h"
+#include "ipa-prop.h"
+#include "ipa-fnsummary.h"
 #include "except.h"
 #if TARGET_XCOFF
 #include "xcoffout.h"  /* get declarations of xcoff_*_section_name */
@@ -1780,6 +1783,12 @@  static const struct attribute_spec rs6000_attribute_table[] =
 
 #undef TARGET_INVALID_CONVERSION
 #define TARGET_INVALID_CONVERSION rs6000_invalid_conversion
+
+#undef TARGET_NEED_IPA_FN_TARGET_INFO
+#define TARGET_NEED_IPA_FN_TARGET_INFO rs6000_need_ipa_fn_target_info
+
+#undef TARGET_UPDATE_IPA_FN_TARGET_INFO
+#define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
 
 
 /* Processor table.  */
@@ -25052,7 +25061,70 @@  rs6000_generate_version_dispatcher_body (void *node_p)
   return resolver;
 }
 
-
+/* Hook to decide if we need to scan function gimples to collect
+   target specific information for inlining, push the initial flag
+   into INFO if we want to make use of target info.  */
+
+static bool
+rs6000_need_ipa_fn_target_info (const_tree decl, vec<HOST_WIDE_INT> &info)
+{
+  tree target = DECL_FUNCTION_SPECIFIC_TARGET (decl);
+  if (!target)
+    target = target_option_default_node;
+  struct cl_target_option *opts = TREE_TARGET_OPTION (target);
+
+  /* See PR102059, we only handle HTM for now, so will only do
+     the consequent scannings when HTM feature enabled.  */
+  if (opts->x_rs6000_isa_flags & OPTION_MASK_HTM)
+    {
+      info.safe_push (0);
+      return true;
+    }
+
+  /* The given function disables HTM explicitly, it's safe to be
+     inlined from HTM feature's perspective and don't need any
+     further checkings.  */
+  if (opts->x_rs6000_isa_flags_explicit & OPTION_MASK_HTM)
+    {
+      info.safe_push (0);
+      return false;
+    }
+
+  return false;
+}
+
+/* Hook to update target specific information INFO for inlining by
+   scanning the given STMT.  Return false if we don't need to scan
+   any more.  */
+
+static bool
+rs6000_update_ipa_fn_target_info (vec<HOST_WIDE_INT> &info, const gimple *stmt)
+{
+  /* Assume inline asm can use any instruction features.  */
+  if (gimple_code (stmt) == GIMPLE_ASM)
+    {
+      info[0] = -1;
+      return false;
+    }
+  else if (gimple_code (stmt) == GIMPLE_CALL)
+    {
+      tree fndecl = gimple_call_fndecl (stmt);
+      if (fndecl && fndecl_built_in_p (fndecl, BUILT_IN_MD))
+	{
+	  enum rs6000_builtins fcode =
+	    (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
+	  /* HTM bifs definitely exploit HTM insns.  */
+	  if (rs6000_builtin_mask_set_p ((unsigned) fcode, RS6000_BTM_HTM))
+	    {
+	      info[0] |= OPTION_MASK_HTM;
+	      return false;
+	    }
+	}
+    }
+
+  return true;
+}
+
 /* Hook to determine if one function can safely inline another.  */
 
 static bool
@@ -25085,10 +25157,22 @@  rs6000_can_inline_p (tree caller, tree callee)
      | OPTION_MASK_P10_FUSION_ADDLOG | OPTION_MASK_P10_FUSION_2ADD
      | OPTION_MASK_PCREL_OPT);
 
-  if (always_inline) {
-    caller_isa &= ~always_inline_safe_mask;
-    callee_isa &= ~always_inline_safe_mask;
-  }
+  if (always_inline)
+    {
+      cgraph_node *callee_node = cgraph_node::get (callee);
+      if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
+	{
+	  if (dump_file)
+	    ipa_dump_fn_summary (dump_file, callee_node);
+	  const vec<HOST_WIDE_INT> &info =
+	    ipa_fn_summaries->get (callee_node)->target_info;
+	  if (!info.is_empty ())
+	    always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM;
+	}
+
+      caller_isa &= ~always_inline_safe_mask;
+      callee_isa &= ~always_inline_safe_mask;
+    }
 
   /* The callee's options must be a subset of the caller's options, i.e.
      a vsx function may inline an altivec function, but a no-vsx function
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c8f4abe3e41..4af5e3e6819 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10743,6 +10743,37 @@  default, inlining is not allowed if the callee function has function
 specific target options and the caller does not use the same options.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_UPDATE_IPA_FN_TARGET_INFO (vec<HOST_WIDE_INT>& @var{vec}, const gimple* @var{stmt})
+Allow target to analyze all gimple statements for the given function to
+record and update some target specific information for inlining.  A typical
+example is that a caller with one isa feature disabled is normally not
+allowed to inline a callee with that same isa feature enabled even which is
+attributed by always_inline, but with the conservative analysis on all
+statements of the callee if we are able to guarantee the callee does not
+exploit any instructions from the mismatch isa feature, it would be safe to
+allow the caller to inline the callee.
+@var{vec} is one @code{HOST_WIDE_INT} element vector to record information
+in which one set bit indicates one corresponding feature is detected in the
+analysis, @var{stmt} is the statement being analyzed.  Return true if target
+still need to analyze the subsequent statements, otherwise return false to
+stop subsequent analysis.
+The default version of this hook returns false.
+@end deftypefn
+
+@deftypefn {Target Hook} bool TARGET_NEED_IPA_FN_TARGET_INFO (const_tree @var{decl}, vec<HOST_WIDE_INT>& @var{vec})
+Allow target to check early whether it is necessary to analyze all gimple
+statements in the given function to update target specific information for
+inlining.  See hook @code{update_ipa_fn_target_info} for usage example of
+target specific information.  This hook is expected to be invoked ahead of
+the iterating with hook @code{update_ipa_fn_target_info}.
+@var{decl} is the function being analyzed, @var{vec} is the same as what
+in hook @code{update_ipa_fn_target_info}, target can do one time update
+into @var{vec} without iterating for some case.  Return true if target
+decides to analyze all gimple statements to collect information, otherwise
+return false.
+The default version of this hook returns false.
+@end deftypefn
+
 @deftypefn {Target Hook} void TARGET_RELAYOUT_FUNCTION (tree @var{fndecl})
 This target hook fixes function @var{fndecl} after attributes are processed.
 Default does nothing. On ARM, the default function's alignment is updated
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9c4b5016053..2b013649a45 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7196,6 +7196,10 @@  on this implementation detail.
 
 @hook TARGET_CAN_INLINE_P
 
+@hook TARGET_UPDATE_IPA_FN_TARGET_INFO
+
+@hook TARGET_NEED_IPA_FN_TARGET_INFO
+
 @hook TARGET_RELAYOUT_FUNCTION
 
 @node Emulated TLS
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index cf80ce3c040..1e5f29464e7 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -56,6 +56,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
+#include "target.h"
 #include "tree.h"
 #include "gimple.h"
 #include "alloc-pool.h"
@@ -1146,6 +1147,16 @@  ipa_dump_fn_summary (FILE *f, struct cgraph_node *node)
 	  fprintf (f, "  calls:\n");
 	  dump_ipa_call_summary (f, 4, node, s);
 	  fprintf (f, "\n");
+	  HOST_WIDE_INT flags;
+	  for (int i = 0; s->target_info.iterate (i, &flags); i++)
+	    {
+	      if (i == 0)
+		{
+		  fprintf (f, "  target_info flags:");
+		}
+	      fprintf (f, " [%d]: " HOST_WIDE_INT_PRINT_HEX "\n", i, flags);
+	    }
+	  fprintf (f, "\n");
 	}
       else
 	fprintf (f, "IPA summary for %s is missing.\n", node->dump_name ());
@@ -2608,6 +2619,7 @@  analyze_function_body (struct cgraph_node *node, bool early)
   info->conds = NULL;
   info->size_time_table.release ();
   info->call_size_time_table.release ();
+  info->target_info.release();
 
   /* When optimizing and analyzing for IPA inliner, initialize loop optimizer
      so we can produce proper inline hints.
@@ -2659,6 +2671,14 @@  analyze_function_body (struct cgraph_node *node, bool early)
 			   bb_predicate,
 		           bb_predicate);
 
+  /* Only look for target information for inlinable always_inline functions.  */
+  bool scan_for_target_info =
+    (info->inlinable
+     && DECL_DISREGARD_INLINE_LIMITS (node->decl)
+     && lookup_attribute ("always_inline", DECL_ATTRIBUTES (node->decl))
+     && targetm.target_option.need_ipa_fn_target_info (node->decl,
+						       info->target_info));
+
   if (fbi.info)
     compute_bb_predicates (&fbi, node, info, params_summary);
   const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
@@ -2876,6 +2896,10 @@  analyze_function_body (struct cgraph_node *node, bool early)
 		  if (dump_file)
 		    fprintf (dump_file, "   fp_expression set\n");
 		}
+	      if (scan_for_target_info)
+		scan_for_target_info =
+		  targetm.target_option.update_ipa_fn_target_info
+		  (info->target_info, stmt);
 	    }
 
 	  /* Account cost of address calculations in the statements.  */
@@ -4073,6 +4097,22 @@  ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge)
 
   info->fp_expressions |= callee_info->fp_expressions;
 
+  /* For now, only always_inline functions have the target info.  */
+  if (callee_info->target_info.is_empty ())
+    {
+      if (!info->target_info.is_empty ())
+	/* We don't have target info from the inlined callee, discard the
+	   current one to keep conservative.  */
+	info->target_info.truncate (0);
+    }
+  else if (!info->target_info.is_empty ())
+    {
+      unsigned len = info->target_info.length ();
+      gcc_assert (len == callee_info->target_info.length ());
+      for (unsigned i = 0; i < len; i++)
+	info->target_info[i] |= callee_info->target_info[i];
+    }
+
   if (callee_info->conds)
     {
       ipa_auto_call_arg_values avals;
@@ -4534,6 +4574,15 @@  inline_read_section (struct lto_file_decl_data *file_data, const char *data,
 	  if (info)
 	    info->builtin_constant_p_parms.quick_push (parm);
 	}
+      count2 = streamer_read_uhwi (&ib);
+      if (info && count2)
+	info->target_info.reserve_exact (count2);
+      for (j = 0; j < count2; j++)
+	{
+	  HOST_WIDE_INT flag = streamer_read_hwi (&ib);
+	  if (info)
+	    info->target_info.quick_push (flag);
+	}
       for (e = node->callees; e; e = e->next_callee)
 	read_ipa_call_summary (&ib, e, info != NULL);
       for (e = node->indirect_calls; e; e = e->next_callee)
@@ -4713,6 +4762,10 @@  ipa_fn_summary_write (void)
 	  for (i = 0; info->builtin_constant_p_parms.iterate (i, &ip);
 	       i++)
 	    streamer_write_uhwi (ob, ip);
+	  streamer_write_uhwi (ob, info->target_info.length ());
+	  HOST_WIDE_INT hwi;
+	  for (i = 0; info->target_info.iterate (i, &hwi); i++)
+	    streamer_write_hwi (ob, hwi);
 	  for (edge = cnode->callees; edge; edge = edge->next_callee)
 	    write_ipa_call_summary (ob, edge);
 	  for (edge = cnode->indirect_calls; edge; edge = edge->next_callee)
diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
index 78399b0b9bb..300b8da4507 100644
--- a/gcc/ipa-fnsummary.h
+++ b/gcc/ipa-fnsummary.h
@@ -131,7 +131,7 @@  public:
       time (0), conds (NULL),
       size_time_table (), call_size_time_table (vNULL),
       loop_iterations (NULL), loop_strides (NULL),
-      builtin_constant_p_parms (vNULL),
+      builtin_constant_p_parms (vNULL), target_info (),
       growth (0), scc_no (0)
   {
   }
@@ -146,6 +146,7 @@  public:
     call_size_time_table (vNULL),
     loop_iterations (s.loop_iterations), loop_strides (s.loop_strides),
     builtin_constant_p_parms (s.builtin_constant_p_parms),
+    target_info (s.target_info.copy()),
     growth (s.growth), scc_no (s.scc_no)
   {}
 
@@ -193,6 +194,9 @@  public:
   vec<ipa_freqcounting_predicate, va_gc> *loop_strides;
   /* Parameters tested by builtin_constant_p.  */
   vec<int, va_heap, vl_ptr> GTY((skip)) builtin_constant_p_parms;
+  /* Like fp_expressions, but it's to hold some target specific information,
+     such as some target specific isa flags.  */
+  auto_vec<HOST_WIDE_INT> GTY((skip)) target_info;
   /* Estimated growth for inlining all copies of the function before start
      of small functions inlining.
      This value will get out of date as the callers are duplicated, but
diff --git a/gcc/testsuite/gcc.dg/lto/pr102059_0.c b/gcc/testsuite/gcc.dg/lto/pr102059_0.c
new file mode 100644
index 00000000000..f164d96637e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr102059_0.c
@@ -0,0 +1,12 @@ 
+/* { dg-lto-do link } */
+/* { dg-skip-if "power only" { ! { powerpc*-*-* } } } */
+/* -Wno-attributes suppresses always_inline warnings.  */
+/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -Wno-attributes" } } */
+
+int __attribute__ ((always_inline))
+foo1 (int *b)
+{
+  *b += 100;
+  return *b;
+}
+
diff --git a/gcc/testsuite/gcc.dg/lto/pr102059_1.c b/gcc/testsuite/gcc.dg/lto/pr102059_1.c
new file mode 100644
index 00000000000..7e31fc7fbd9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr102059_1.c
@@ -0,0 +1,9 @@ 
+extern int foo1 (int *b);
+
+int __attribute__ ((always_inline)) foo2 (int *b)
+{
+  int res = foo1 (b);
+  *b += res;
+  return *b;
+}
+
diff --git a/gcc/testsuite/gcc.dg/lto/pr102059_2.c b/gcc/testsuite/gcc.dg/lto/pr102059_2.c
new file mode 100644
index 00000000000..bfa1e62fb7a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr102059_2.c
@@ -0,0 +1,11 @@ 
+extern int foo2 (int *b);
+
+#pragma GCC target "cpu=power10"
+__attribute__ ((always_inline))
+int
+main (int *a)
+{
+  *a = foo2 (a);
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-5.c b/gcc/testsuite/gcc.target/powerpc/pr102059-5.c
new file mode 100644
index 00000000000..b969c4c4750
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102059-5.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* -Wno-attributes suppresses always_inline warnings.  */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -Wno-attributes" } */
+
+/* Verify the reduced case from PR102059 won't fail.  */
+
+__attribute__ ((always_inline)) int
+foo (int *b)
+{
+  *b += 10;
+  return *b;
+}
+
+#pragma GCC target "cpu=power10"
+int
+bar (int *a)
+{
+  *a = foo (a);
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-6.c b/gcc/testsuite/gcc.target/powerpc/pr102059-6.c
new file mode 100644
index 00000000000..52e009289f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102059-6.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-htm" } */
+
+/* Verify target info for inlining still works even if callee
+   disables HTM explicitly while caller enables it.  */
+
+static inline int __attribute__ ((always_inline))
+foo (int *b)
+{
+  *b += 10;
+  return *b;
+}
+
+#pragma GCC target "htm"
+int
+bar (int *a)
+{
+  *a = foo (a);
+  return 0;
+}
+
diff --git a/gcc/target.def b/gcc/target.def
index 2e40448e6c5..92e4b0be667 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6567,6 +6567,41 @@  specific target options and the caller does not use the same options.",
  bool, (tree caller, tree callee),
  default_target_can_inline_p)
 
+DEFHOOK
+(update_ipa_fn_target_info,
+ "Allow target to analyze all gimple statements for the given function to\n\
+record and update some target specific information for inlining.  A typical\n\
+example is that a caller with one isa feature disabled is normally not\n\
+allowed to inline a callee with that same isa feature enabled even which is\n\
+attributed by always_inline, but with the conservative analysis on all\n\
+statements of the callee if we are able to guarantee the callee does not\n\
+exploit any instructions from the mismatch isa feature, it would be safe to\n\
+allow the caller to inline the callee.\n\
+@var{vec} is one @code{HOST_WIDE_INT} element vector to record information\n\
+in which one set bit indicates one corresponding feature is detected in the\n\
+analysis, @var{stmt} is the statement being analyzed.  Return true if target\n\
+still need to analyze the subsequent statements, otherwise return false to\n\
+stop subsequent analysis.\n\
+The default version of this hook returns false.",
+ bool, (vec<HOST_WIDE_INT>& vec, const gimple* stmt),
+ default_update_ipa_fn_target_info)
+
+DEFHOOK
+(need_ipa_fn_target_info,
+ "Allow target to check early whether it is necessary to analyze all gimple\n\
+statements in the given function to update target specific information for\n\
+inlining.  See hook @code{update_ipa_fn_target_info} for usage example of\n\
+target specific information.  This hook is expected to be invoked ahead of\n\
+the iterating with hook @code{update_ipa_fn_target_info}.\n\
+@var{decl} is the function being analyzed, @var{vec} is the same as what\n\
+in hook @code{update_ipa_fn_target_info}, target can do one time update\n\
+into @var{vec} without iterating for some case.  Return true if target\n\
+decides to analyze all gimple statements to collect information, otherwise\n\
+return false.\n\
+The default version of this hook returns false.",
+ bool, (const_tree decl, vec<HOST_WIDE_INT>& vec),
+ default_need_ipa_fn_target_info)
+
 DEFHOOK
 (relayout_function,
 "This target hook fixes function @var{fndecl} after attributes are processed.\n\
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 44a1facedcf..b14f5e22ba8 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1757,6 +1757,22 @@  default_target_can_inline_p (tree caller, tree callee)
   return callee_opts == caller_opts;
 }
 
+/* By default, return false to not need to collect any target information
+   for inlining.  Target maintainer should re-define the hook if the
+   target want to take advantage of it.  */
+
+bool
+default_need_ipa_fn_target_info (const_tree, vec<HOST_WIDE_INT> &)
+{
+  return false;
+}
+
+bool
+default_update_ipa_fn_target_info (vec<HOST_WIDE_INT> &, const gimple *)
+{
+  return false;
+}
+
 /* If the machine does not have a case insn that compares the bounds,
    this means extra overhead for dispatch tables, which raises the
    threshold for using them.  */
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index f70a307d26c..7950f907d27 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -193,6 +193,9 @@  extern bool default_new_address_profitable_p (rtx, rtx_insn *, rtx);
 extern bool default_target_option_valid_attribute_p (tree, tree, tree, int);
 extern bool default_target_option_pragma_parse (tree, tree);
 extern bool default_target_can_inline_p (tree, tree);
+extern bool default_update_ipa_fn_target_info (vec<HOST_WIDE_INT> &,
+					       const gimple *);
+extern bool default_need_ipa_fn_target_info (const_tree, vec<HOST_WIDE_INT> &);
 extern bool default_valid_pointer_mode (scalar_int_mode);
 extern bool default_ref_may_alias_errno (class ao_ref *);
 extern scalar_int_mode default_addr_space_pointer_mode (addr_space_t);