Message ID | f727bb3e-a553-2859-d691-2bcfebb50d7d@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [RFC/PATCH] ipa-inline: Add target info into fn summary [PR102059] | expand |
On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > Hi! > > Power ISA 2.07 (Power8) introduces transactional memory feature > but ISA3.1 (Power10) removes it. It exposes one troublesome > issue as PR102059 shows. Users define some function with > target pragma cpu=power10 then it calls one function with > attribute always_inline which inherits command line option > cpu=power8 which enables HTM implicitly. The current isa_flags > check doesn't allow this inlining due to "target specific > option mismatch" and error mesasge is emitted. > > Normally, the callee function isn't intended to exploit HTM > feature, but the default flag setting make it look it has. > As Richi raised in the PR, we have fp_expressions flag in > function summary, and allow us to check the function actually > contains any floating point expressions to avoid overkill. > So this patch follows the similar idea but is more target > specific, for this rs6000 port specific requirement on HTM > feature check, we would like to check rs6000 specific HTM > built-in functions and inline assembly, it allows targets > to do their own customized checks and updates. > > It introduces two target hooks need_ipa_fn_target_info and > update_ipa_fn_target_info. The former allows target to do > some previous check and decides to collect target specific > information for this function or not. For some special case, > it can predict the analysis result and push it early without > any scannings. The latter allows the analyze_function_body > to pass gimple stmts down just like fp_expressions handlings, > target can do its own tricks. I put them as one hook initially > with one boolean to indicates whether it's initial time, but > the code looks a bit ugly, to separate them seems to have > better readability. > > To make it simple, this patch uses HOST_WIDE_INT to record the > flags just like what we use for isa_flags. For rs6000's HTM > need, one HOST_WIDE_INT variable is quite enough, but it seems > good to have one auto_vec for scalability as I noticed some > targets have more than one HOST_WIDE_INT flag. For now, this > target information collection is only for always_inline function, > function ipa_merge_fn_summary_after_inlining deals with target > information merging. > > The patch has been bootstrapped and regress-tested on > powerpc64le-linux-gnu Power9. > > Is it on the right track? + if (always_inline) + { + cgraph_node *callee_node = cgraph_node::get (callee); + if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL) + { + if (dump_file) + ipa_dump_fn_summary (dump_file, callee_node); + const vec<HOST_WIDE_INT> &info = + ipa_fn_summaries->get (callee_node)->target_info; + if (!info.is_empty ()) + always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM; + } + + caller_isa &= ~always_inline_safe_mask; + callee_isa &= ~always_inline_safe_mask; + } that's a bit convoluted but obviously the IPA info can be used for non-always_inline cases as well. As said above the info can be produced for not always-inline functions as well, the usual case would be for LTO inlining across TUs compiled with different target options. In your case the special -mcpu=power10 TU would otherwise not be able to inline from a general -mcpu=power8 TU. On the streaming side we possibly have to take care about the GPU offloading path where we likely want to avoid pushing host target bits to the GPU target in some way. Your case is specifically looking for HTM target builtins - for more general cases, like for example deciding whether we can inline across, say, -mlzcnt on x86 the scanning would have to include at least direct internal-fns mapping to optabs that could change. With such inlining we might also work against heuristic tuning decisions based on the ISA availability making code (much) more expensive to expand without such ISA availability, but that wouldn't be a correctness issue then. Otherwise the overall bits look OK but I'll leave the details to our IPA folks. Thanks, Richard. > > Any comments are highly appreciated! > > BR, > Kewen > ------ > gcc/ChangeLog: > > PR ipa/102059 > * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New > function. > * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): New > declare. > * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro. > (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise. > (rs6000_need_ipa_fn_target_info): New function. > (rs6000_update_ipa_fn_target_info): Likewise. > (rs6000_can_inline_p): Adjust for ipa function summary target info. > * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function > summary target info. > (analyze_function_body): Adjust for ipa function summary target > info and call hook rs6000_need_ipa_fn_target_info and > rs6000_update_ipa_fn_target_info. > (ipa_merge_fn_summary_after_inlining): Adjust for ipa function > summary target info. > (inline_read_section): Likewise. > (ipa_fn_summary_write): Likewise. > * ipa-fnsummary.h (ipa_fn_summary::target_info): New member. > * doc/tm.texi: Regenerate. > * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new > hook. > (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise. > * target.def (update_ipa_fn_target_info): New hook. > (need_ipa_fn_target_info): Likewise. > * targhooks.c (default_need_ipa_fn_target_info): New function. > (default_update_ipa_fn_target_info): Likewise. > * targhooks.h (default_update_ipa_fn_target_info): New declare. > (default_need_ipa_fn_target_info): Likewise. > > gcc/testsuite/ChangeLog: > > PR ipa/102059 > * gcc.dg/lto/pr102059_0.c: New test. > * gcc.dg/lto/pr102059_1.c: New test. > * gcc.dg/lto/pr102059_2.c: New test. > * gcc.target/powerpc/pr102059-5.c: New test. > * gcc.target/powerpc/pr102059-6.c: New test. >
Hi Richi, Thanks for the comments! on 2021/9/2 下午5:25, Richard Biener wrote: > On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin <linkw@linux.ibm.com> wrote: >> >> Hi! >> >> Power ISA 2.07 (Power8) introduces transactional memory feature >> but ISA3.1 (Power10) removes it. It exposes one troublesome >> issue as PR102059 shows. Users define some function with >> target pragma cpu=power10 then it calls one function with >> attribute always_inline which inherits command line option >> cpu=power8 which enables HTM implicitly. The current isa_flags >> check doesn't allow this inlining due to "target specific >> option mismatch" and error mesasge is emitted. >> >> Normally, the callee function isn't intended to exploit HTM >> feature, but the default flag setting make it look it has. >> As Richi raised in the PR, we have fp_expressions flag in >> function summary, and allow us to check the function actually >> contains any floating point expressions to avoid overkill. >> So this patch follows the similar idea but is more target >> specific, for this rs6000 port specific requirement on HTM >> feature check, we would like to check rs6000 specific HTM >> built-in functions and inline assembly, it allows targets >> to do their own customized checks and updates. >> >> It introduces two target hooks need_ipa_fn_target_info and >> update_ipa_fn_target_info. The former allows target to do >> some previous check and decides to collect target specific >> information for this function or not. For some special case, >> it can predict the analysis result and push it early without >> any scannings. The latter allows the analyze_function_body >> to pass gimple stmts down just like fp_expressions handlings, >> target can do its own tricks. I put them as one hook initially >> with one boolean to indicates whether it's initial time, but >> the code looks a bit ugly, to separate them seems to have >> better readability. >> >> To make it simple, this patch uses HOST_WIDE_INT to record the >> flags just like what we use for isa_flags. For rs6000's HTM >> need, one HOST_WIDE_INT variable is quite enough, but it seems >> good to have one auto_vec for scalability as I noticed some >> targets have more than one HOST_WIDE_INT flag. For now, this >> target information collection is only for always_inline function, >> function ipa_merge_fn_summary_after_inlining deals with target >> information merging. >> >> The patch has been bootstrapped and regress-tested on >> powerpc64le-linux-gnu Power9. >> >> Is it on the right track? > > + if (always_inline) > + { > + cgraph_node *callee_node = cgraph_node::get (callee); > + if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL) > + { > + if (dump_file) > + ipa_dump_fn_summary (dump_file, callee_node); > + const vec<HOST_WIDE_INT> &info = > + ipa_fn_summaries->get (callee_node)->target_info; > + if (!info.is_empty ()) > + always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM; > + } > + > + caller_isa &= ~always_inline_safe_mask; > + callee_isa &= ~always_inline_safe_mask; > + } > > that's a bit convoluted but obviously the IPA info can be used for > non-always_inline cases as well. > > As said above the info can be produced for not always-inline functions > as well, the usual case would be for LTO inlining across TUs compiled > with different target options. In your case the special -mcpu=power10 > TU would otherwise not be able to inline from a general -mcpu=power8 TU. > Agree it can be extended to non-always_inline cases. Since always_inline is kind of user "forced" requirement and compiler emits error if it fails to inline, while non-always_inline will have warning instead. Considering the scanning might be considered as costly for some big functions, I guessed it might be good to start from always_inline as the first step. But if different target options among LTO TUs is a common user case, I think it's worth to extending it now. > On the streaming side we possibly have to take care about the > GPU offloading path where we likely want to avoid pushing host target > bits to the GPU target in some way. > I guess this comment is about lto_stream_offload_p, I just did some quick checks, this flag seems to guard things into section offload_lto, while the function summary has its own section, it seems fine? > Your case is specifically looking for HTM target builtins - for more general > cases, like for example deciding whether we can inline across, say, > -mlzcnt on x86 the scanning would have to include at least direct internal-fns > mapping to optabs that could change. With such inlining we might also > work against heuristic tuning decisions based on the ISA availability > making code (much) more expensive to expand without such ISA availability, > but that wouldn't be a correctness issue then. OK,I would assume the hook function parameter gimple* will be also enough for this example. :) IMHO, even with this target information collection, we are unable to check all ISA features, it can only work for some "dull" ISA features, like HTM on Power which can only be exploited by builtin (or inline asm), the downstream passes don't try to exploit it in other ways. For some features like VSX, vectorization pass can produce vector code after we generating fn summary and think it doesn't use, it seems no way to get it right in early stage of pass pipeline. > > Otherwise the overall bits look OK but I'll leave the details to our IPA folks. > Thanks again! BR, Kewen > Thanks, > Richard. > >> >> Any comments are highly appreciated! >> >> BR, >> Kewen >> ------ >> gcc/ChangeLog: >> >> PR ipa/102059 >> * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New >> function. >> * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): New >> declare. >> * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro. >> (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise. >> (rs6000_need_ipa_fn_target_info): New function. >> (rs6000_update_ipa_fn_target_info): Likewise. >> (rs6000_can_inline_p): Adjust for ipa function summary target info. >> * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function >> summary target info. >> (analyze_function_body): Adjust for ipa function summary target >> info and call hook rs6000_need_ipa_fn_target_info and >> rs6000_update_ipa_fn_target_info. >> (ipa_merge_fn_summary_after_inlining): Adjust for ipa function >> summary target info. >> (inline_read_section): Likewise. >> (ipa_fn_summary_write): Likewise. >> * ipa-fnsummary.h (ipa_fn_summary::target_info): New member. >> * doc/tm.texi: Regenerate. >> * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new >> hook. >> (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise. >> * target.def (update_ipa_fn_target_info): New hook. >> (need_ipa_fn_target_info): Likewise. >> * targhooks.c (default_need_ipa_fn_target_info): New function. >> (default_update_ipa_fn_target_info): Likewise. >> * targhooks.h (default_update_ipa_fn_target_info): New declare. >> (default_need_ipa_fn_target_info): Likewise. >> >> gcc/testsuite/ChangeLog: >> >> PR ipa/102059 >> * gcc.dg/lto/pr102059_0.c: New test. >> * gcc.dg/lto/pr102059_1.c: New test. >> * gcc.dg/lto/pr102059_2.c: New test. >> * gcc.target/powerpc/pr102059-5.c: New test. >> * gcc.target/powerpc/pr102059-6.c: New test. >>
On Thu, Sep 2, 2021 at 1:13 PM Kewen.Lin <linkw@linux.ibm.com> wrote: > > Hi Richi, > > Thanks for the comments! > > on 2021/9/2 下午5:25, Richard Biener wrote: > > On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > >> > >> Hi! > >> > >> Power ISA 2.07 (Power8) introduces transactional memory feature > >> but ISA3.1 (Power10) removes it. It exposes one troublesome > >> issue as PR102059 shows. Users define some function with > >> target pragma cpu=power10 then it calls one function with > >> attribute always_inline which inherits command line option > >> cpu=power8 which enables HTM implicitly. The current isa_flags > >> check doesn't allow this inlining due to "target specific > >> option mismatch" and error mesasge is emitted. > >> > >> Normally, the callee function isn't intended to exploit HTM > >> feature, but the default flag setting make it look it has. > >> As Richi raised in the PR, we have fp_expressions flag in > >> function summary, and allow us to check the function actually > >> contains any floating point expressions to avoid overkill. > >> So this patch follows the similar idea but is more target > >> specific, for this rs6000 port specific requirement on HTM > >> feature check, we would like to check rs6000 specific HTM > >> built-in functions and inline assembly, it allows targets > >> to do their own customized checks and updates. > >> > >> It introduces two target hooks need_ipa_fn_target_info and > >> update_ipa_fn_target_info. The former allows target to do > >> some previous check and decides to collect target specific > >> information for this function or not. For some special case, > >> it can predict the analysis result and push it early without > >> any scannings. The latter allows the analyze_function_body > >> to pass gimple stmts down just like fp_expressions handlings, > >> target can do its own tricks. I put them as one hook initially > >> with one boolean to indicates whether it's initial time, but > >> the code looks a bit ugly, to separate them seems to have > >> better readability. > >> > >> To make it simple, this patch uses HOST_WIDE_INT to record the > >> flags just like what we use for isa_flags. For rs6000's HTM > >> need, one HOST_WIDE_INT variable is quite enough, but it seems > >> good to have one auto_vec for scalability as I noticed some > >> targets have more than one HOST_WIDE_INT flag. For now, this > >> target information collection is only for always_inline function, > >> function ipa_merge_fn_summary_after_inlining deals with target > >> information merging. > >> > >> The patch has been bootstrapped and regress-tested on > >> powerpc64le-linux-gnu Power9. > >> > >> Is it on the right track? > > > > + if (always_inline) > > + { > > + cgraph_node *callee_node = cgraph_node::get (callee); > > + if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL) > > + { > > + if (dump_file) > > + ipa_dump_fn_summary (dump_file, callee_node); > > + const vec<HOST_WIDE_INT> &info = > > + ipa_fn_summaries->get (callee_node)->target_info; > > + if (!info.is_empty ()) > > + always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM; > > + } > > + > > + caller_isa &= ~always_inline_safe_mask; > > + callee_isa &= ~always_inline_safe_mask; > > + } > > > > that's a bit convoluted but obviously the IPA info can be used for > > non-always_inline cases as well. > > > > As said above the info can be produced for not always-inline functions > > as well, the usual case would be for LTO inlining across TUs compiled > > with different target options. In your case the special -mcpu=power10 > > TU would otherwise not be able to inline from a general -mcpu=power8 TU. > > > > Agree it can be extended to non-always_inline cases. Since always_inline > is kind of user "forced" requirement and compiler emits error if it fails > to inline, while non-always_inline will have warning instead. Considering > the scanning might be considered as costly for some big functions, I > guessed it might be good to start from always_inline as the first step. > But if different target options among LTO TUs is a common user case, I > think it's worth to extending it now. I was merely looking at this from the perspective of test coverage - with restricting it to always-inline we're not going to notice issues very reliably I think. > > On the streaming side we possibly have to take care about the > > GPU offloading path where we likely want to avoid pushing host target > > bits to the GPU target in some way. > > > > I guess this comment is about lto_stream_offload_p, I just did some quick > checks, this flag seems to guard things into section offload_lto, while > the function summary has its own section, it seems fine? Yes, but the data, since its target specific, is interpreted different by the host target than by the offload target so IMHO we should drop this to a conservative state when offloading? > > Your case is specifically looking for HTM target builtins - for more general > > cases, like for example deciding whether we can inline across, say, > > -mlzcnt on x86 the scanning would have to include at least direct internal-fns > > mapping to optabs that could change. With such inlining we might also > > work against heuristic tuning decisions based on the ISA availability > > making code (much) more expensive to expand without such ISA availability, > > but that wouldn't be a correctness issue then. > > OK,I would assume the hook function parameter gimple* will be also enough for > this example. :) Yes, I think so. > IMHO, even with this target information collection, we are unable to check all > ISA features, it can only work for some "dull" ISA features, like HTM on > Power which can only be exploited by builtin (or inline asm), the downstream > passes don't try to exploit it in other ways. For some features like VSX, > vectorization pass can produce vector code after we generating fn summary and > think it doesn't use, it seems no way to get it right in early stage of pass > pipeline. I don't think later passes are an issue - they would operate under the constraints of the caller flag and thus still generate valid code. Yes, if, say, somebody disabled VSX on a function in the attempt to not vectorize an unprofitable case and that function gets inlined it might end up using VSX (as now active in the caller) to vectorize the unprofitable case. But in general it should work for any ISA feature (and some ISA features might just change what we expand common GIMPLE to - differences in those ISA features do not need to prevent inlining from a correctness perspective). Richard. > > > > Otherwise the overall bits look OK but I'll leave the details to our IPA folks. > > > > Thanks again! > > BR, > Kewen > > > Thanks, > > Richard. > > > >> > >> Any comments are highly appreciated! > >> > >> BR, > >> Kewen > >> ------ > >> gcc/ChangeLog: > >> > >> PR ipa/102059 > >> * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New > >> function. > >> * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): New > >> declare. > >> * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro. > >> (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise. > >> (rs6000_need_ipa_fn_target_info): New function. > >> (rs6000_update_ipa_fn_target_info): Likewise. > >> (rs6000_can_inline_p): Adjust for ipa function summary target info. > >> * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function > >> summary target info. > >> (analyze_function_body): Adjust for ipa function summary target > >> info and call hook rs6000_need_ipa_fn_target_info and > >> rs6000_update_ipa_fn_target_info. > >> (ipa_merge_fn_summary_after_inlining): Adjust for ipa function > >> summary target info. > >> (inline_read_section): Likewise. > >> (ipa_fn_summary_write): Likewise. > >> * ipa-fnsummary.h (ipa_fn_summary::target_info): New member. > >> * doc/tm.texi: Regenerate. > >> * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new > >> hook. > >> (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise. > >> * target.def (update_ipa_fn_target_info): New hook. > >> (need_ipa_fn_target_info): Likewise. > >> * targhooks.c (default_need_ipa_fn_target_info): New function. > >> (default_update_ipa_fn_target_info): Likewise. > >> * targhooks.h (default_update_ipa_fn_target_info): New declare. > >> (default_need_ipa_fn_target_info): Likewise. > >> > >> gcc/testsuite/ChangeLog: > >> > >> PR ipa/102059 > >> * gcc.dg/lto/pr102059_0.c: New test. > >> * gcc.dg/lto/pr102059_1.c: New test. > >> * gcc.dg/lto/pr102059_2.c: New test. > >> * gcc.target/powerpc/pr102059-5.c: New test. > >> * gcc.target/powerpc/pr102059-6.c: New test. > >>
on 2021/9/2 下午7:51, Richard Biener wrote: > On Thu, Sep 2, 2021 at 1:13 PM Kewen.Lin <linkw@linux.ibm.com> wrote: >> >> Hi Richi, >> >> Thanks for the comments! >> >> on 2021/9/2 下午5:25, Richard Biener wrote: >>> On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin <linkw@linux.ibm.com> wrote: >>>> >>>> Hi! >>>> >>>> Power ISA 2.07 (Power8) introduces transactional memory feature >>>> but ISA3.1 (Power10) removes it. It exposes one troublesome >>>> issue as PR102059 shows. Users define some function with >>>> target pragma cpu=power10 then it calls one function with >>>> attribute always_inline which inherits command line option >>>> cpu=power8 which enables HTM implicitly. The current isa_flags >>>> check doesn't allow this inlining due to "target specific >>>> option mismatch" and error mesasge is emitted. >>>> >>>> Normally, the callee function isn't intended to exploit HTM >>>> feature, but the default flag setting make it look it has. >>>> As Richi raised in the PR, we have fp_expressions flag in >>>> function summary, and allow us to check the function actually >>>> contains any floating point expressions to avoid overkill. >>>> So this patch follows the similar idea but is more target >>>> specific, for this rs6000 port specific requirement on HTM >>>> feature check, we would like to check rs6000 specific HTM >>>> built-in functions and inline assembly, it allows targets >>>> to do their own customized checks and updates. >>>> >>>> It introduces two target hooks need_ipa_fn_target_info and >>>> update_ipa_fn_target_info. The former allows target to do >>>> some previous check and decides to collect target specific >>>> information for this function or not. For some special case, >>>> it can predict the analysis result and push it early without >>>> any scannings. The latter allows the analyze_function_body >>>> to pass gimple stmts down just like fp_expressions handlings, >>>> target can do its own tricks. I put them as one hook initially >>>> with one boolean to indicates whether it's initial time, but >>>> the code looks a bit ugly, to separate them seems to have >>>> better readability. >>>> >>>> To make it simple, this patch uses HOST_WIDE_INT to record the >>>> flags just like what we use for isa_flags. For rs6000's HTM >>>> need, one HOST_WIDE_INT variable is quite enough, but it seems >>>> good to have one auto_vec for scalability as I noticed some >>>> targets have more than one HOST_WIDE_INT flag. For now, this >>>> target information collection is only for always_inline function, >>>> function ipa_merge_fn_summary_after_inlining deals with target >>>> information merging. >>>> >>>> The patch has been bootstrapped and regress-tested on >>>> powerpc64le-linux-gnu Power9. >>>> >>>> Is it on the right track? >>> >>> + if (always_inline) >>> + { >>> + cgraph_node *callee_node = cgraph_node::get (callee); >>> + if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL) >>> + { >>> + if (dump_file) >>> + ipa_dump_fn_summary (dump_file, callee_node); >>> + const vec<HOST_WIDE_INT> &info = >>> + ipa_fn_summaries->get (callee_node)->target_info; >>> + if (!info.is_empty ()) >>> + always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM; >>> + } >>> + >>> + caller_isa &= ~always_inline_safe_mask; >>> + callee_isa &= ~always_inline_safe_mask; >>> + } >>> >>> that's a bit convoluted but obviously the IPA info can be used for >>> non-always_inline cases as well. >>> >>> As said above the info can be produced for not always-inline functions >>> as well, the usual case would be for LTO inlining across TUs compiled >>> with different target options. In your case the special -mcpu=power10 >>> TU would otherwise not be able to inline from a general -mcpu=power8 TU. >>> >> >> Agree it can be extended to non-always_inline cases. Since always_inline >> is kind of user "forced" requirement and compiler emits error if it fails >> to inline, while non-always_inline will have warning instead. Considering >> the scanning might be considered as costly for some big functions, I >> guessed it might be good to start from always_inline as the first step. >> But if different target options among LTO TUs is a common user case, I >> think it's worth to extending it now. > > I was merely looking at this from the perspective of test coverage - with > restricting it to always-inline we're not going to notice issues very > reliably I think. > Got it, will extend it to support all inlinable functions in next version. >>> On the streaming side we possibly have to take care about the >>> GPU offloading path where we likely want to avoid pushing host target >>> bits to the GPU target in some way. >>> >> >> I guess this comment is about lto_stream_offload_p, I just did some quick >> checks, this flag seems to guard things into section offload_lto, while >> the function summary has its own section, it seems fine? > > Yes, but the data, since its target specific, is interpreted different by > the host target than by the offload target so IMHO we should drop this > to a conservative state when offloading? > OK, I can guard the read/write under !lto_stream_offload_p and also test with offload cases to see whether it takes effect as expected. There are explicitly set/unset code before write_summary, but seems none for summary reading, I might miss sth., need to check more. >>> Your case is specifically looking for HTM target builtins - for more general >>> cases, like for example deciding whether we can inline across, say, >>> -mlzcnt on x86 the scanning would have to include at least direct internal-fns >>> mapping to optabs that could change. With such inlining we might also >>> work against heuristic tuning decisions based on the ISA availability >>> making code (much) more expensive to expand without such ISA availability, >>> but that wouldn't be a correctness issue then. >> >> OK,I would assume the hook function parameter gimple* will be also enough for >> this example. :) > > Yes, I think so. > >> IMHO, even with this target information collection, we are unable to check all >> ISA features, it can only work for some "dull" ISA features, like HTM on >> Power which can only be exploited by builtin (or inline asm), the downstream >> passes don't try to exploit it in other ways. For some features like VSX, >> vectorization pass can produce vector code after we generating fn summary and >> think it doesn't use, it seems no way to get it right in early stage of pass >> pipeline. > > I don't think later passes are an issue - they would operate under the > constraints of the caller flag and thus still generate valid code. Yeah, it makes sense. > Yes, if, say, somebody > disabled VSX on a function in the attempt to not vectorize an unprofitable case > and that function gets inlined it might end up using VSX (as now active in the > caller) to vectorize the unprofitable case. But in general it should work The example seems to be what we want to avoid in PR70010 if non-VSX set for callee explicitly. :) btw, in the current implementation, I simply mark the function non-HTM if it has explicit option -mno-htm, it doesn't need any scannings. One opposite example, one callee with VSX flag (implicitly or explicitly), but after scanning it's marked as non-VSX, it's a question that if we want it to be inlined to one caller with non-VSX or not. Like your example, if it can be vectorized before (meant without inlining), then to inline it is bad. While if it can not be vectorized before, then to inline it is good. We can't predict this function will have VSX code or not (vectorized or not, feature exploited or not) when setting the target information VSX feature bit. Anyway, target can do its own decision in that hook, it's fine. :) > for any ISA feature (and some ISA features might just change what we > expand common GIMPLE to - differences in those ISA features do not need > to prevent inlining from a correctness perspective). > Yeah, it would not be a correctness issue. BR, Kewen > Richard. > >>> >>> Otherwise the overall bits look OK but I'll leave the details to our IPA folks. >>> >> >> Thanks again! >> >> BR, >> Kewen >> >>> Thanks, >>> Richard. >>> >>>> >>>> Any comments are highly appreciated! >>>> >>>> BR, >>>> Kewen >>>> ------ >>>> gcc/ChangeLog: >>>> >>>> PR ipa/102059 >>>> * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New >>>> function. >>>> * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): New >>>> declare. >>>> * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro. >>>> (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise. >>>> (rs6000_need_ipa_fn_target_info): New function. >>>> (rs6000_update_ipa_fn_target_info): Likewise. >>>> (rs6000_can_inline_p): Adjust for ipa function summary target info. >>>> * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function >>>> summary target info. >>>> (analyze_function_body): Adjust for ipa function summary target >>>> info and call hook rs6000_need_ipa_fn_target_info and >>>> rs6000_update_ipa_fn_target_info. >>>> (ipa_merge_fn_summary_after_inlining): Adjust for ipa function >>>> summary target info. >>>> (inline_read_section): Likewise. >>>> (ipa_fn_summary_write): Likewise. >>>> * ipa-fnsummary.h (ipa_fn_summary::target_info): New member. >>>> * doc/tm.texi: Regenerate. >>>> * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new >>>> hook. >>>> (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise. >>>> * target.def (update_ipa_fn_target_info): New hook. >>>> (need_ipa_fn_target_info): Likewise. >>>> * targhooks.c (default_need_ipa_fn_target_info): New function. >>>> (default_update_ipa_fn_target_info): Likewise. >>>> * targhooks.h (default_update_ipa_fn_target_info): New declare. >>>> (default_need_ipa_fn_target_info): Likewise. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> PR ipa/102059 >>>> * gcc.dg/lto/pr102059_0.c: New test. >>>> * gcc.dg/lto/pr102059_1.c: New test. >>>> * gcc.dg/lto/pr102059_2.c: New test. >>>> * gcc.target/powerpc/pr102059-5.c: New test. >>>> * gcc.target/powerpc/pr102059-6.c: New test. >>>> BR, Kewen
On Thu, Sep 2, 2021 at 3:11 PM Kewen.Lin <linkw@linux.ibm.com> wrote: > > on 2021/9/2 下午7:51, Richard Biener wrote: > > On Thu, Sep 2, 2021 at 1:13 PM Kewen.Lin <linkw@linux.ibm.com> wrote: > >> > >> Hi Richi, > >> > >> Thanks for the comments! > >> > >> on 2021/9/2 下午5:25, Richard Biener wrote: > >>> On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > >>>> > >>>> Hi! > >>>> > >>>> Power ISA 2.07 (Power8) introduces transactional memory feature > >>>> but ISA3.1 (Power10) removes it. It exposes one troublesome > >>>> issue as PR102059 shows. Users define some function with > >>>> target pragma cpu=power10 then it calls one function with > >>>> attribute always_inline which inherits command line option > >>>> cpu=power8 which enables HTM implicitly. The current isa_flags > >>>> check doesn't allow this inlining due to "target specific > >>>> option mismatch" and error mesasge is emitted. > >>>> > >>>> Normally, the callee function isn't intended to exploit HTM > >>>> feature, but the default flag setting make it look it has. > >>>> As Richi raised in the PR, we have fp_expressions flag in > >>>> function summary, and allow us to check the function actually > >>>> contains any floating point expressions to avoid overkill. > >>>> So this patch follows the similar idea but is more target > >>>> specific, for this rs6000 port specific requirement on HTM > >>>> feature check, we would like to check rs6000 specific HTM > >>>> built-in functions and inline assembly, it allows targets > >>>> to do their own customized checks and updates. > >>>> > >>>> It introduces two target hooks need_ipa_fn_target_info and > >>>> update_ipa_fn_target_info. The former allows target to do > >>>> some previous check and decides to collect target specific > >>>> information for this function or not. For some special case, > >>>> it can predict the analysis result and push it early without > >>>> any scannings. The latter allows the analyze_function_body > >>>> to pass gimple stmts down just like fp_expressions handlings, > >>>> target can do its own tricks. I put them as one hook initially > >>>> with one boolean to indicates whether it's initial time, but > >>>> the code looks a bit ugly, to separate them seems to have > >>>> better readability. > >>>> > >>>> To make it simple, this patch uses HOST_WIDE_INT to record the > >>>> flags just like what we use for isa_flags. For rs6000's HTM > >>>> need, one HOST_WIDE_INT variable is quite enough, but it seems > >>>> good to have one auto_vec for scalability as I noticed some > >>>> targets have more than one HOST_WIDE_INT flag. For now, this > >>>> target information collection is only for always_inline function, > >>>> function ipa_merge_fn_summary_after_inlining deals with target > >>>> information merging. > >>>> > >>>> The patch has been bootstrapped and regress-tested on > >>>> powerpc64le-linux-gnu Power9. > >>>> > >>>> Is it on the right track? > >>> > >>> + if (always_inline) > >>> + { > >>> + cgraph_node *callee_node = cgraph_node::get (callee); > >>> + if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL) > >>> + { > >>> + if (dump_file) > >>> + ipa_dump_fn_summary (dump_file, callee_node); > >>> + const vec<HOST_WIDE_INT> &info = > >>> + ipa_fn_summaries->get (callee_node)->target_info; > >>> + if (!info.is_empty ()) > >>> + always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM; > >>> + } > >>> + > >>> + caller_isa &= ~always_inline_safe_mask; > >>> + callee_isa &= ~always_inline_safe_mask; > >>> + } > >>> > >>> that's a bit convoluted but obviously the IPA info can be used for > >>> non-always_inline cases as well. > >>> > >>> As said above the info can be produced for not always-inline functions > >>> as well, the usual case would be for LTO inlining across TUs compiled > >>> with different target options. In your case the special -mcpu=power10 > >>> TU would otherwise not be able to inline from a general -mcpu=power8 TU. > >>> > >> > >> Agree it can be extended to non-always_inline cases. Since always_inline > >> is kind of user "forced" requirement and compiler emits error if it fails > >> to inline, while non-always_inline will have warning instead. Considering > >> the scanning might be considered as costly for some big functions, I > >> guessed it might be good to start from always_inline as the first step. > >> But if different target options among LTO TUs is a common user case, I > >> think it's worth to extending it now. > > > > I was merely looking at this from the perspective of test coverage - with > > restricting it to always-inline we're not going to notice issues very > > reliably I think. > > > > Got it, will extend it to support all inlinable functions in next version. > > >>> On the streaming side we possibly have to take care about the > >>> GPU offloading path where we likely want to avoid pushing host target > >>> bits to the GPU target in some way. > >>> > >> > >> I guess this comment is about lto_stream_offload_p, I just did some quick > >> checks, this flag seems to guard things into section offload_lto, while > >> the function summary has its own section, it seems fine? > > > > Yes, but the data, since its target specific, is interpreted different by > > the host target than by the offload target so IMHO we should drop this > > to a conservative state when offloading? > > > > OK, I can guard the read/write under !lto_stream_offload_p and also test > with offload cases to see whether it takes effect as expected. There are > explicitly set/unset code before write_summary, but seems none for summary > reading, I might miss sth., need to check more. > > >>> Your case is specifically looking for HTM target builtins - for more general > >>> cases, like for example deciding whether we can inline across, say, > >>> -mlzcnt on x86 the scanning would have to include at least direct internal-fns > >>> mapping to optabs that could change. With such inlining we might also > >>> work against heuristic tuning decisions based on the ISA availability > >>> making code (much) more expensive to expand without such ISA availability, > >>> but that wouldn't be a correctness issue then. > >> > >> OK,I would assume the hook function parameter gimple* will be also enough for > >> this example. :) > > > > Yes, I think so. > > > >> IMHO, even with this target information collection, we are unable to check all > >> ISA features, it can only work for some "dull" ISA features, like HTM on > >> Power which can only be exploited by builtin (or inline asm), the downstream > >> passes don't try to exploit it in other ways. For some features like VSX, > >> vectorization pass can produce vector code after we generating fn summary and > >> think it doesn't use, it seems no way to get it right in early stage of pass > >> pipeline. > > > > I don't think later passes are an issue - they would operate under the > > constraints of the caller flag and thus still generate valid code. > > Yeah, it makes sense. > > > Yes, if, say, somebody > > disabled VSX on a function in the attempt to not vectorize an unprofitable case > > and that function gets inlined it might end up using VSX (as now active in the > > caller) to vectorize the unprofitable case. But in general it should work > > The example seems to be what we want to avoid in PR70010 if non-VSX set for callee > explicitly. :) btw, in the current implementation, I simply mark the function > non-HTM if it has explicit option -mno-htm, it doesn't need any scannings. > > One opposite example, one callee with VSX flag (implicitly or explicitly), but > after scanning it's marked as non-VSX, it's a question that if we want it to be > inlined to one caller with non-VSX or not. Like your example, if it can be > vectorized before (meant without inlining), then to inline it is bad. While if > it can not be vectorized before, then to inline it is good. We can't predict > this function will have VSX code or not (vectorized or not, feature exploited > or not) when setting the target information VSX feature bit. Yeah - I guess this is where any heuristic decision a target might want to make will be different for the always-inline case. > Anyway, target can do its own decision in that hook, it's fine. :) > > > for any ISA feature (and some ISA features might just change what we > > expand common GIMPLE to - differences in those ISA features do not need > > to prevent inlining from a correctness perspective). > > > > Yeah, it would not be a correctness issue. > > BR, > Kewen > > > Richard. > > > >>> > >>> Otherwise the overall bits look OK but I'll leave the details to our IPA folks. > >>> > >> > >> Thanks again! > >> > >> BR, > >> Kewen > >> > >>> Thanks, > >>> Richard. > >>> > >>>> > >>>> Any comments are highly appreciated! > >>>> > >>>> BR, > >>>> Kewen > >>>> ------ > >>>> gcc/ChangeLog: > >>>> > >>>> PR ipa/102059 > >>>> * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New > >>>> function. > >>>> * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): New > >>>> declare. > >>>> * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro. > >>>> (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise. > >>>> (rs6000_need_ipa_fn_target_info): New function. > >>>> (rs6000_update_ipa_fn_target_info): Likewise. > >>>> (rs6000_can_inline_p): Adjust for ipa function summary target info. > >>>> * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function > >>>> summary target info. > >>>> (analyze_function_body): Adjust for ipa function summary target > >>>> info and call hook rs6000_need_ipa_fn_target_info and > >>>> rs6000_update_ipa_fn_target_info. > >>>> (ipa_merge_fn_summary_after_inlining): Adjust for ipa function > >>>> summary target info. > >>>> (inline_read_section): Likewise. > >>>> (ipa_fn_summary_write): Likewise. > >>>> * ipa-fnsummary.h (ipa_fn_summary::target_info): New member. > >>>> * doc/tm.texi: Regenerate. > >>>> * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new > >>>> hook. > >>>> (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise. > >>>> * target.def (update_ipa_fn_target_info): New hook. > >>>> (need_ipa_fn_target_info): Likewise. > >>>> * targhooks.c (default_need_ipa_fn_target_info): New function. > >>>> (default_update_ipa_fn_target_info): Likewise. > >>>> * targhooks.h (default_update_ipa_fn_target_info): New declare. > >>>> (default_need_ipa_fn_target_info): Likewise. > >>>> > >>>> gcc/testsuite/ChangeLog: > >>>> > >>>> PR ipa/102059 > >>>> * gcc.dg/lto/pr102059_0.c: New test. > >>>> * gcc.dg/lto/pr102059_1.c: New test. > >>>> * gcc.dg/lto/pr102059_2.c: New test. > >>>> * gcc.target/powerpc/pr102059-5.c: New test. > >>>> * gcc.target/powerpc/pr102059-6.c: New test. > >>>> > > > BR, > Kewen
Hi! On Wed, Sep 01, 2021 at 03:02:22PM +0800, Kewen.Lin wrote: > It introduces two target hooks need_ipa_fn_target_info and > update_ipa_fn_target_info. The former allows target to do > some previous check and decides to collect target specific > information for this function or not. For some special case, > it can predict the analysis result and push it early without > any scannings. The latter allows the analyze_function_body > to pass gimple stmts down just like fp_expressions handlings, > target can do its own tricks. > > To make it simple, this patch uses HOST_WIDE_INT to record the > flags just like what we use for isa_flags. For rs6000's HTM > need, one HOST_WIDE_INT variable is quite enough, but it seems > good to have one auto_vec for scalability as I noticed some > targets have more than one HOST_WIDE_INT flag. For now, this > target information collection is only for always_inline function, > function ipa_merge_fn_summary_after_inlining deals with target > information merging. These flags can in principle be separate from any flags the target keeps, so 64 bits will be enough for a long time. If we want to architect that better, we should really architect the way all targets do target flags first. Let's not go there now :-) So just one HOST_WIDE_INT, not a stack of them please? > --- a/gcc/config/rs6000/rs6000-call.c > +++ b/gcc/config/rs6000/rs6000-call.c > @@ -13642,6 +13642,17 @@ rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED) > return rs6000_builtin_decls[code]; > } > > +/* Return true if the builtin with CODE has any mask bits set > + which are specified by MASK. */ > + > +bool > +rs6000_builtin_mask_set_p (unsigned code, HOST_WIDE_INT mask) > +{ > + gcc_assert (code < RS6000_BUILTIN_COUNT); > + HOST_WIDE_INT fnmask = rs6000_builtin_info[code].mask; > + return fnmask & mask; > +} The "_p" does not say that "any bits" part, which is crucial here. So name this something like "rs6000_fn_has_any_of_these_mask_bits"? Yes the name sucks, because this interface does :-P Its it useful to have "any" semantics at all? Otherwise, require this to be passed just a single bit? The implicit "!!" (or "!= 0", same thing) that casting to bool does might be better explicit, too? A cast to bool changes value so is more suprising than other casts. > + /* Assume inline asm can use any instruction features. */ > + if (gimple_code (stmt) == GIMPLE_ASM) > + { > + info[0] = -1; > + return false; > + } What is -1 here? "All options set"? Does that work? Reliably? > + if (fndecl && fndecl_built_in_p (fndecl, BUILT_IN_MD)) > + { > + enum rs6000_builtins fcode = > + (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl); > + /* HTM bifs definitely exploit HTM insns. */ > + if (rs6000_builtin_mask_set_p ((unsigned) fcode, RS6000_BTM_HTM)) Why the cast here? Please change the parameter type, instead? It is fine to use enums specific to our backend in that backend itself :-) > @@ -1146,6 +1147,16 @@ ipa_dump_fn_summary (FILE *f, struct cgraph_node *node) > fprintf (f, " calls:\n"); > dump_ipa_call_summary (f, 4, node, s); > fprintf (f, "\n"); > + HOST_WIDE_INT flags; > + for (int i = 0; s->target_info.iterate (i, &flags); i++) > + { > + if (i == 0) > + { > + fprintf (f, " target_info flags:"); > + } Don't use blocks around single statements please. > + /* Only look for target information for inlinable always_inline functions. */ > + bool scan_for_target_info = > + (info->inlinable > + && DECL_DISREGARD_INLINE_LIMITS (node->decl) > + && lookup_attribute ("always_inline", DECL_ATTRIBUTES (node->decl)) > + && targetm.target_option.need_ipa_fn_target_info (node->decl, > + info->target_info)); Don't use unnecessary parens please. Segher
Hi Segher, Thanks for the comments! on 2021/9/3 上午1:44, Segher Boessenkool wrote: > Hi! > > On Wed, Sep 01, 2021 at 03:02:22PM +0800, Kewen.Lin wrote: >> It introduces two target hooks need_ipa_fn_target_info and >> update_ipa_fn_target_info. The former allows target to do >> some previous check and decides to collect target specific >> information for this function or not. For some special case, >> it can predict the analysis result and push it early without >> any scannings. The latter allows the analyze_function_body >> to pass gimple stmts down just like fp_expressions handlings, >> target can do its own tricks. >> >> To make it simple, this patch uses HOST_WIDE_INT to record the >> flags just like what we use for isa_flags. For rs6000's HTM >> need, one HOST_WIDE_INT variable is quite enough, but it seems >> good to have one auto_vec for scalability as I noticed some >> targets have more than one HOST_WIDE_INT flag. For now, this >> target information collection is only for always_inline function, >> function ipa_merge_fn_summary_after_inlining deals with target >> information merging. > > These flags can in principle be separate from any flags the target > keeps, so 64 bits will be enough for a long time. If we want to > architect that better, we should really architect the way all targets > do target flags first. Let's not go there now :-) > > So just one HOST_WIDE_INT, not a stack of them please? I considered this, it's fine to use this customized bit in the target hook, but back to target hook can_inline_p, we have to decoded them to the bits in isa_flags separately, it's inefficient than just using the whole mask if the interesting bits are more. As the discussion with Richi, theoretically speaking if target likes, it can try to scan for many isa features with target's own desicions, there could be much more bits. Another thing inspiring me to make it with one vector is that i386 port ix86_can_inline_p checks x_ix86_target_flags, x_ix86_isa_flags, x_ix86_isa_flags2, arch and tune etc. now, one HOST_WIDE_INT seems not good to it, if it wants to check more. ;-) > >> --- a/gcc/config/rs6000/rs6000-call.c >> +++ b/gcc/config/rs6000/rs6000-call.c >> @@ -13642,6 +13642,17 @@ rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED) >> return rs6000_builtin_decls[code]; >> } >> >> +/* Return true if the builtin with CODE has any mask bits set >> + which are specified by MASK. */ >> + >> +bool >> +rs6000_builtin_mask_set_p (unsigned code, HOST_WIDE_INT mask) >> +{ >> + gcc_assert (code < RS6000_BUILTIN_COUNT); >> + HOST_WIDE_INT fnmask = rs6000_builtin_info[code].mask; >> + return fnmask & mask; >> +} > > The "_p" does not say that "any bits" part, which is crucial here. So > name this something like "rs6000_fn_has_any_of_these_mask_bits"? Yes > the name sucks, because this interface does :-P > Thanks for the name, will fix it. :) > Its it useful to have "any" semantics at all? Otherwise, require this > to be passed just a single bit? > Since we can not just pass in a bit, we have to assert it with something like: gcc_assert (__builtin_popcount(mask) == 1); to claim it's checking a single bit. But the implementation logic still supports checking any bits, so I thought we can just claim it to check any bits and a single bit is just one special case. Yeah, not sure if there is a need to check any bits, but something like checking exists FRSQRTE and FRSQRTES bifs can pass (RS6000_BTM_FRSQRTE | RS6000_BTM_FRSQRTES), so is it fine to keep it for any bits? > The implicit "!!" (or "!= 0", same thing) that casting to bool does > might be better explicit, too? A cast to bool changes value so is more > suprising than other casts. OK, will fix it. > >> + /* Assume inline asm can use any instruction features. */ >> + if (gimple_code (stmt) == GIMPLE_ASM) >> + { >> + info[0] = -1; >> + return false; >> + } > > What is -1 here? "All options set"? Does that work? Reliably? > Good question, in the current implementation it's reliable, since we do operation "~" first then & the interesting bits (OPTION_MASK_HTM here) but I think you concerned some conflict bits co-exists is reasonable or not. I was intended to cover any future interesting bits, but I agree it's better to just set the correpsonding intersting bits to make it clear. Will fix it. >> + if (fndecl && fndecl_built_in_p (fndecl, BUILT_IN_MD)) >> + { >> + enum rs6000_builtins fcode = >> + (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl); >> + /* HTM bifs definitely exploit HTM insns. */ >> + if (rs6000_builtin_mask_set_p ((unsigned) fcode, RS6000_BTM_HTM)) > > Why the cast here? Please change the parameter type, instead? It is > fine to use enums specific to our backend in that backend itself :-) > Referred to the exisitng rs6000_builtin_decl, just noticed it's a hook. Will fix it. >> @@ -1146,6 +1147,16 @@ ipa_dump_fn_summary (FILE *f, struct cgraph_node *node) >> fprintf (f, " calls:\n"); >> dump_ipa_call_summary (f, 4, node, s); >> fprintf (f, "\n"); >> + HOST_WIDE_INT flags; >> + for (int i = 0; s->target_info.iterate (i, &flags); i++) >> + { >> + if (i == 0) >> + { >> + fprintf (f, " target_info flags:"); >> + } > > Don't use blocks around single statements please. > Good catch, will fix it. >> + /* Only look for target information for inlinable always_inline functions. */ >> + bool scan_for_target_info = >> + (info->inlinable >> + && DECL_DISREGARD_INLINE_LIMITS (node->decl) >> + && lookup_attribute ("always_inline", DECL_ATTRIBUTES (node->decl)) >> + && targetm.target_option.need_ipa_fn_target_info (node->decl, >> + info->target_info)); > > Don't use unnecessary parens please. > OK, will fix it. BR, Kewen.
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c index b92928c891a..0fbdab15e67 100644 --- a/gcc/config/rs6000/rs6000-call.c +++ b/gcc/config/rs6000/rs6000-call.c @@ -13642,6 +13642,17 @@ rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED) return rs6000_builtin_decls[code]; } +/* Return true if the builtin with CODE has any mask bits set + which are specified by MASK. */ + +bool +rs6000_builtin_mask_set_p (unsigned code, HOST_WIDE_INT mask) +{ + gcc_assert (code < RS6000_BUILTIN_COUNT); + HOST_WIDE_INT fnmask = rs6000_builtin_info[code].mask; + return fnmask & mask; +} + static void altivec_init_builtins (void) { diff --git a/gcc/config/rs6000/rs6000-internal.h b/gcc/config/rs6000/rs6000-internal.h index 88cf9bd5692..fc71b5637f1 100644 --- a/gcc/config/rs6000/rs6000-internal.h +++ b/gcc/config/rs6000/rs6000-internal.h @@ -179,6 +179,7 @@ extern tree rs6000_fold_builtin (tree fndecl ATTRIBUTE_UNUSED, int n_args ATTRIBUTE_UNUSED, tree *args ATTRIBUTE_UNUSED, bool ignore ATTRIBUTE_UNUSED); +extern bool rs6000_builtin_mask_set_p (unsigned code, HOST_WIDE_INT mask); #if TARGET_ELF extern bool rs6000_passes_ieee128; diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index c2582a3efab..2ee12275b78 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -71,6 +71,9 @@ #include "tree-vector-builder.h" #include "context.h" #include "tree-pass.h" +#include "symbol-summary.h" +#include "ipa-prop.h" +#include "ipa-fnsummary.h" #include "except.h" #if TARGET_XCOFF #include "xcoffout.h" /* get declarations of xcoff_*_section_name */ @@ -1780,6 +1783,12 @@ static const struct attribute_spec rs6000_attribute_table[] = #undef TARGET_INVALID_CONVERSION #define TARGET_INVALID_CONVERSION rs6000_invalid_conversion + +#undef TARGET_NEED_IPA_FN_TARGET_INFO +#define TARGET_NEED_IPA_FN_TARGET_INFO rs6000_need_ipa_fn_target_info + +#undef TARGET_UPDATE_IPA_FN_TARGET_INFO +#define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info /* Processor table. */ @@ -25052,7 +25061,70 @@ rs6000_generate_version_dispatcher_body (void *node_p) return resolver; } - +/* Hook to decide if we need to scan function gimples to collect + target specific information for inlining, push the initial flag + into INFO if we want to make use of target info. */ + +static bool +rs6000_need_ipa_fn_target_info (const_tree decl, vec<HOST_WIDE_INT> &info) +{ + tree target = DECL_FUNCTION_SPECIFIC_TARGET (decl); + if (!target) + target = target_option_default_node; + struct cl_target_option *opts = TREE_TARGET_OPTION (target); + + /* See PR102059, we only handle HTM for now, so will only do + the consequent scannings when HTM feature enabled. */ + if (opts->x_rs6000_isa_flags & OPTION_MASK_HTM) + { + info.safe_push (0); + return true; + } + + /* The given function disables HTM explicitly, it's safe to be + inlined from HTM feature's perspective and don't need any + further checkings. */ + if (opts->x_rs6000_isa_flags_explicit & OPTION_MASK_HTM) + { + info.safe_push (0); + return false; + } + + return false; +} + +/* Hook to update target specific information INFO for inlining by + scanning the given STMT. Return false if we don't need to scan + any more. */ + +static bool +rs6000_update_ipa_fn_target_info (vec<HOST_WIDE_INT> &info, const gimple *stmt) +{ + /* Assume inline asm can use any instruction features. */ + if (gimple_code (stmt) == GIMPLE_ASM) + { + info[0] = -1; + return false; + } + else if (gimple_code (stmt) == GIMPLE_CALL) + { + tree fndecl = gimple_call_fndecl (stmt); + if (fndecl && fndecl_built_in_p (fndecl, BUILT_IN_MD)) + { + enum rs6000_builtins fcode = + (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl); + /* HTM bifs definitely exploit HTM insns. */ + if (rs6000_builtin_mask_set_p ((unsigned) fcode, RS6000_BTM_HTM)) + { + info[0] |= OPTION_MASK_HTM; + return false; + } + } + } + + return true; +} + /* Hook to determine if one function can safely inline another. */ static bool @@ -25085,10 +25157,22 @@ rs6000_can_inline_p (tree caller, tree callee) | OPTION_MASK_P10_FUSION_ADDLOG | OPTION_MASK_P10_FUSION_2ADD | OPTION_MASK_PCREL_OPT); - if (always_inline) { - caller_isa &= ~always_inline_safe_mask; - callee_isa &= ~always_inline_safe_mask; - } + if (always_inline) + { + cgraph_node *callee_node = cgraph_node::get (callee); + if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL) + { + if (dump_file) + ipa_dump_fn_summary (dump_file, callee_node); + const vec<HOST_WIDE_INT> &info = + ipa_fn_summaries->get (callee_node)->target_info; + if (!info.is_empty ()) + always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM; + } + + caller_isa &= ~always_inline_safe_mask; + callee_isa &= ~always_inline_safe_mask; + } /* The callee's options must be a subset of the caller's options, i.e. a vsx function may inline an altivec function, but a no-vsx function diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index c8f4abe3e41..4af5e3e6819 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -10743,6 +10743,37 @@ default, inlining is not allowed if the callee function has function specific target options and the caller does not use the same options. @end deftypefn +@deftypefn {Target Hook} bool TARGET_UPDATE_IPA_FN_TARGET_INFO (vec<HOST_WIDE_INT>& @var{vec}, const gimple* @var{stmt}) +Allow target to analyze all gimple statements for the given function to +record and update some target specific information for inlining. A typical +example is that a caller with one isa feature disabled is normally not +allowed to inline a callee with that same isa feature enabled even which is +attributed by always_inline, but with the conservative analysis on all +statements of the callee if we are able to guarantee the callee does not +exploit any instructions from the mismatch isa feature, it would be safe to +allow the caller to inline the callee. +@var{vec} is one @code{HOST_WIDE_INT} element vector to record information +in which one set bit indicates one corresponding feature is detected in the +analysis, @var{stmt} is the statement being analyzed. Return true if target +still need to analyze the subsequent statements, otherwise return false to +stop subsequent analysis. +The default version of this hook returns false. +@end deftypefn + +@deftypefn {Target Hook} bool TARGET_NEED_IPA_FN_TARGET_INFO (const_tree @var{decl}, vec<HOST_WIDE_INT>& @var{vec}) +Allow target to check early whether it is necessary to analyze all gimple +statements in the given function to update target specific information for +inlining. See hook @code{update_ipa_fn_target_info} for usage example of +target specific information. This hook is expected to be invoked ahead of +the iterating with hook @code{update_ipa_fn_target_info}. +@var{decl} is the function being analyzed, @var{vec} is the same as what +in hook @code{update_ipa_fn_target_info}, target can do one time update +into @var{vec} without iterating for some case. Return true if target +decides to analyze all gimple statements to collect information, otherwise +return false. +The default version of this hook returns false. +@end deftypefn + @deftypefn {Target Hook} void TARGET_RELAYOUT_FUNCTION (tree @var{fndecl}) This target hook fixes function @var{fndecl} after attributes are processed. Default does nothing. On ARM, the default function's alignment is updated diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 9c4b5016053..2b013649a45 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -7196,6 +7196,10 @@ on this implementation detail. @hook TARGET_CAN_INLINE_P +@hook TARGET_UPDATE_IPA_FN_TARGET_INFO + +@hook TARGET_NEED_IPA_FN_TARGET_INFO + @hook TARGET_RELAYOUT_FUNCTION @node Emulated TLS diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c index cf80ce3c040..1e5f29464e7 100644 --- a/gcc/ipa-fnsummary.c +++ b/gcc/ipa-fnsummary.c @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see #include "system.h" #include "coretypes.h" #include "backend.h" +#include "target.h" #include "tree.h" #include "gimple.h" #include "alloc-pool.h" @@ -1146,6 +1147,16 @@ ipa_dump_fn_summary (FILE *f, struct cgraph_node *node) fprintf (f, " calls:\n"); dump_ipa_call_summary (f, 4, node, s); fprintf (f, "\n"); + HOST_WIDE_INT flags; + for (int i = 0; s->target_info.iterate (i, &flags); i++) + { + if (i == 0) + { + fprintf (f, " target_info flags:"); + } + fprintf (f, " [%d]: " HOST_WIDE_INT_PRINT_HEX "\n", i, flags); + } + fprintf (f, "\n"); } else fprintf (f, "IPA summary for %s is missing.\n", node->dump_name ()); @@ -2608,6 +2619,7 @@ analyze_function_body (struct cgraph_node *node, bool early) info->conds = NULL; info->size_time_table.release (); info->call_size_time_table.release (); + info->target_info.release(); /* When optimizing and analyzing for IPA inliner, initialize loop optimizer so we can produce proper inline hints. @@ -2659,6 +2671,14 @@ analyze_function_body (struct cgraph_node *node, bool early) bb_predicate, bb_predicate); + /* Only look for target information for inlinable always_inline functions. */ + bool scan_for_target_info = + (info->inlinable + && DECL_DISREGARD_INLINE_LIMITS (node->decl) + && lookup_attribute ("always_inline", DECL_ATTRIBUTES (node->decl)) + && targetm.target_option.need_ipa_fn_target_info (node->decl, + info->target_info)); + if (fbi.info) compute_bb_predicates (&fbi, node, info, params_summary); const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count; @@ -2876,6 +2896,10 @@ analyze_function_body (struct cgraph_node *node, bool early) if (dump_file) fprintf (dump_file, " fp_expression set\n"); } + if (scan_for_target_info) + scan_for_target_info = + targetm.target_option.update_ipa_fn_target_info + (info->target_info, stmt); } /* Account cost of address calculations in the statements. */ @@ -4073,6 +4097,22 @@ ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge) info->fp_expressions |= callee_info->fp_expressions; + /* For now, only always_inline functions have the target info. */ + if (callee_info->target_info.is_empty ()) + { + if (!info->target_info.is_empty ()) + /* We don't have target info from the inlined callee, discard the + current one to keep conservative. */ + info->target_info.truncate (0); + } + else if (!info->target_info.is_empty ()) + { + unsigned len = info->target_info.length (); + gcc_assert (len == callee_info->target_info.length ()); + for (unsigned i = 0; i < len; i++) + info->target_info[i] |= callee_info->target_info[i]; + } + if (callee_info->conds) { ipa_auto_call_arg_values avals; @@ -4534,6 +4574,15 @@ inline_read_section (struct lto_file_decl_data *file_data, const char *data, if (info) info->builtin_constant_p_parms.quick_push (parm); } + count2 = streamer_read_uhwi (&ib); + if (info && count2) + info->target_info.reserve_exact (count2); + for (j = 0; j < count2; j++) + { + HOST_WIDE_INT flag = streamer_read_hwi (&ib); + if (info) + info->target_info.quick_push (flag); + } for (e = node->callees; e; e = e->next_callee) read_ipa_call_summary (&ib, e, info != NULL); for (e = node->indirect_calls; e; e = e->next_callee) @@ -4713,6 +4762,10 @@ ipa_fn_summary_write (void) for (i = 0; info->builtin_constant_p_parms.iterate (i, &ip); i++) streamer_write_uhwi (ob, ip); + streamer_write_uhwi (ob, info->target_info.length ()); + HOST_WIDE_INT hwi; + for (i = 0; info->target_info.iterate (i, &hwi); i++) + streamer_write_hwi (ob, hwi); for (edge = cnode->callees; edge; edge = edge->next_callee) write_ipa_call_summary (ob, edge); for (edge = cnode->indirect_calls; edge; edge = edge->next_callee) diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h index 78399b0b9bb..300b8da4507 100644 --- a/gcc/ipa-fnsummary.h +++ b/gcc/ipa-fnsummary.h @@ -131,7 +131,7 @@ public: time (0), conds (NULL), size_time_table (), call_size_time_table (vNULL), loop_iterations (NULL), loop_strides (NULL), - builtin_constant_p_parms (vNULL), + builtin_constant_p_parms (vNULL), target_info (), growth (0), scc_no (0) { } @@ -146,6 +146,7 @@ public: call_size_time_table (vNULL), loop_iterations (s.loop_iterations), loop_strides (s.loop_strides), builtin_constant_p_parms (s.builtin_constant_p_parms), + target_info (s.target_info.copy()), growth (s.growth), scc_no (s.scc_no) {} @@ -193,6 +194,9 @@ public: vec<ipa_freqcounting_predicate, va_gc> *loop_strides; /* Parameters tested by builtin_constant_p. */ vec<int, va_heap, vl_ptr> GTY((skip)) builtin_constant_p_parms; + /* Like fp_expressions, but it's to hold some target specific information, + such as some target specific isa flags. */ + auto_vec<HOST_WIDE_INT> GTY((skip)) target_info; /* Estimated growth for inlining all copies of the function before start of small functions inlining. This value will get out of date as the callers are duplicated, but diff --git a/gcc/testsuite/gcc.dg/lto/pr102059_0.c b/gcc/testsuite/gcc.dg/lto/pr102059_0.c new file mode 100644 index 00000000000..f164d96637e --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr102059_0.c @@ -0,0 +1,12 @@ +/* { dg-lto-do link } */ +/* { dg-skip-if "power only" { ! { powerpc*-*-* } } } */ +/* -Wno-attributes suppresses always_inline warnings. */ +/* { dg-lto-options { "-O2 -mdejagnu-cpu=power8 -flto -Wno-attributes" } } */ + +int __attribute__ ((always_inline)) +foo1 (int *b) +{ + *b += 100; + return *b; +} + diff --git a/gcc/testsuite/gcc.dg/lto/pr102059_1.c b/gcc/testsuite/gcc.dg/lto/pr102059_1.c new file mode 100644 index 00000000000..7e31fc7fbd9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr102059_1.c @@ -0,0 +1,9 @@ +extern int foo1 (int *b); + +int __attribute__ ((always_inline)) foo2 (int *b) +{ + int res = foo1 (b); + *b += res; + return *b; +} + diff --git a/gcc/testsuite/gcc.dg/lto/pr102059_2.c b/gcc/testsuite/gcc.dg/lto/pr102059_2.c new file mode 100644 index 00000000000..bfa1e62fb7a --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr102059_2.c @@ -0,0 +1,11 @@ +extern int foo2 (int *b); + +#pragma GCC target "cpu=power10" +__attribute__ ((always_inline)) +int +main (int *a) +{ + *a = foo2 (a); + return 0; +} + diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-5.c b/gcc/testsuite/gcc.target/powerpc/pr102059-5.c new file mode 100644 index 00000000000..b969c4c4750 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr102059-5.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* -Wno-attributes suppresses always_inline warnings. */ +/* { dg-options "-O2 -mdejagnu-cpu=power8 -Wno-attributes" } */ + +/* Verify the reduced case from PR102059 won't fail. */ + +__attribute__ ((always_inline)) int +foo (int *b) +{ + *b += 10; + return *b; +} + +#pragma GCC target "cpu=power10" +int +bar (int *a) +{ + *a = foo (a); + return 0; +} + diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-6.c b/gcc/testsuite/gcc.target/powerpc/pr102059-6.c new file mode 100644 index 00000000000..52e009289f7 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr102059-6.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-htm" } */ + +/* Verify target info for inlining still works even if callee + disables HTM explicitly while caller enables it. */ + +static inline int __attribute__ ((always_inline)) +foo (int *b) +{ + *b += 10; + return *b; +} + +#pragma GCC target "htm" +int +bar (int *a) +{ + *a = foo (a); + return 0; +} + diff --git a/gcc/target.def b/gcc/target.def index 2e40448e6c5..92e4b0be667 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -6567,6 +6567,41 @@ specific target options and the caller does not use the same options.", bool, (tree caller, tree callee), default_target_can_inline_p) +DEFHOOK +(update_ipa_fn_target_info, + "Allow target to analyze all gimple statements for the given function to\n\ +record and update some target specific information for inlining. A typical\n\ +example is that a caller with one isa feature disabled is normally not\n\ +allowed to inline a callee with that same isa feature enabled even which is\n\ +attributed by always_inline, but with the conservative analysis on all\n\ +statements of the callee if we are able to guarantee the callee does not\n\ +exploit any instructions from the mismatch isa feature, it would be safe to\n\ +allow the caller to inline the callee.\n\ +@var{vec} is one @code{HOST_WIDE_INT} element vector to record information\n\ +in which one set bit indicates one corresponding feature is detected in the\n\ +analysis, @var{stmt} is the statement being analyzed. Return true if target\n\ +still need to analyze the subsequent statements, otherwise return false to\n\ +stop subsequent analysis.\n\ +The default version of this hook returns false.", + bool, (vec<HOST_WIDE_INT>& vec, const gimple* stmt), + default_update_ipa_fn_target_info) + +DEFHOOK +(need_ipa_fn_target_info, + "Allow target to check early whether it is necessary to analyze all gimple\n\ +statements in the given function to update target specific information for\n\ +inlining. See hook @code{update_ipa_fn_target_info} for usage example of\n\ +target specific information. This hook is expected to be invoked ahead of\n\ +the iterating with hook @code{update_ipa_fn_target_info}.\n\ +@var{decl} is the function being analyzed, @var{vec} is the same as what\n\ +in hook @code{update_ipa_fn_target_info}, target can do one time update\n\ +into @var{vec} without iterating for some case. Return true if target\n\ +decides to analyze all gimple statements to collect information, otherwise\n\ +return false.\n\ +The default version of this hook returns false.", + bool, (const_tree decl, vec<HOST_WIDE_INT>& vec), + default_need_ipa_fn_target_info) + DEFHOOK (relayout_function, "This target hook fixes function @var{fndecl} after attributes are processed.\n\ diff --git a/gcc/targhooks.c b/gcc/targhooks.c index 44a1facedcf..b14f5e22ba8 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -1757,6 +1757,22 @@ default_target_can_inline_p (tree caller, tree callee) return callee_opts == caller_opts; } +/* By default, return false to not need to collect any target information + for inlining. Target maintainer should re-define the hook if the + target want to take advantage of it. */ + +bool +default_need_ipa_fn_target_info (const_tree, vec<HOST_WIDE_INT> &) +{ + return false; +} + +bool +default_update_ipa_fn_target_info (vec<HOST_WIDE_INT> &, const gimple *) +{ + return false; +} + /* If the machine does not have a case insn that compares the bounds, this means extra overhead for dispatch tables, which raises the threshold for using them. */ diff --git a/gcc/targhooks.h b/gcc/targhooks.h index f70a307d26c..7950f907d27 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -193,6 +193,9 @@ extern bool default_new_address_profitable_p (rtx, rtx_insn *, rtx); extern bool default_target_option_valid_attribute_p (tree, tree, tree, int); extern bool default_target_option_pragma_parse (tree, tree); extern bool default_target_can_inline_p (tree, tree); +extern bool default_update_ipa_fn_target_info (vec<HOST_WIDE_INT> &, + const gimple *); +extern bool default_need_ipa_fn_target_info (const_tree, vec<HOST_WIDE_INT> &); extern bool default_valid_pointer_mode (scalar_int_mode); extern bool default_ref_may_alias_errno (class ao_ref *); extern scalar_int_mode default_addr_space_pointer_mode (addr_space_t);