Message ID | 20230626023408.33758-1-hongyu.wang@intel.com |
---|---|
State | New |
Series | i386: Relax inline requirement for functions with different target attrs |
On Mon, Jun 26, 2023 at 4:36 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> Hi,
>
> For functions with different target attributes, the current logic refuses to
> inline the callee when any arch or tune is mismatched. Relax the
> condition to honor just prefer_vector_width_type and other flags that
> may cause safety issues, so the caller can get more optimization opportunities.

I don't think this is desirable. If we inline something with different
ISAs, we get some strange mix of ISAs when the function is inlined.
OTOH - we already inline with mismatched tune flags if the function is
marked with always_inline.

Uros.

> Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         * config/i386/i386.cc (ix86_can_inline_p): Do not check arch or
>         tune directly, just check prefer_vector_width_type and make sure
>         not to inline if they mismatch.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/inline-target-attr.c: New test.
> ---
>  gcc/config/i386/i386.cc                       | 11 +++++----
>  .../gcc.target/i386/inline-target-attr.c      | 24 +++++++++++++++++++
>  2 files changed, 30 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/inline-target-attr.c
Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Tue, Jun 27, 2023 at 17:16:
> I don't think this is desirable. If we inline something with different
> ISAs, we get some strange mix of ISAs when the function is inlined.
> OTOH - we already inline with mismatched tune flags if the function is
> marked with always_inline.

Previously, ix86_can_inline_p has

    if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
         != callee_opts->x_ix86_isa_flags)
        || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
            != callee_opts->x_ix86_isa_flags2))
      ret = false;

This makes sure the caller's ISA is a superset of the callee's, and the
inlined code should follow the caller's ISA specification.

I cannot give a real example where the caller's performance is harmed
after inlining. I added the PVW check since some callees might want to
limit their vector size while the caller has a larger preferred vector
size. At least with the current change we get more optimization
opportunities for different target_clones.

But I agree that the tuning setting may be a factor that affects
performance. One possible choice is that if the tune for the callee is
unspecified or default, we just inline it into the caller with the
caller's specified arch and tune.
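The ISA check quoted in this message is a plain bitmask subset test. A minimal standalone sketch of it follows; `struct isa_opts` and `isa_subset_p` are hypothetical names used only for illustration (GCC keeps these words in its per-function target option structure, which has many more fields), and the flag values in the usage note are arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two ISA flag words referenced in the
   thread (x_ix86_isa_flags and x_ix86_isa_flags2).  */
struct isa_opts
{
  uint64_t isa_flags;
  uint64_t isa_flags2;
};

/* Return true when every ISA bit enabled for the callee is also enabled
   for the caller, i.e. the callee's ISA set is a subset of the caller's.
   This is the negation of the condition that sets ret = false above.  */
static bool
isa_subset_p (const struct isa_opts *caller, const struct isa_opts *callee)
{
  return ((caller->isa_flags & callee->isa_flags) == callee->isa_flags
          && (caller->isa_flags2 & callee->isa_flags2) == callee->isa_flags2);
}
```

For example, a caller with flag words `{0x7, 0x1}` passes the test for a callee with `{0x3, 0x0}`, but not the other way round, because the caller would then lack bits the callee relies on.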
On Wed, Jun 28, 2023 at 3:56 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
>
> I cannot give a real example where the caller's performance is harmed
> after inlining. I added the PVW check since some callees might want to
> limit their vector size while the caller has a larger preferred vector
> size. At least with the current change we get more optimization
> opportunities for different target_clones.
>
> But I agree that the tuning setting may be a factor that affects
> performance. One possible choice is that if the tune for the callee is
> unspecified or default, we just inline it into the caller with the
> caller's specified arch and tune.

If the user specified a different arch for callee than the caller,
then the compiler will switch on different ISAs (-march is just a
shortcut for different ISA packs), and the programmer is aware that
inlining isn't intended here (we have -mtune, which is not as strong
as -march, but even functions with different -mtune are not inlined
without always_inline attribute). This is documented as:

--q--
On the x86, the inliner does not inline a function that has different
target options than the caller, unless the callee has a subset of the
target options of the caller. For example a function declared with
target("sse3") can inline a function with target("sse2"), since -msse3
implies -msse2.
--/q--

I don't think arch=skylake can be considered as a subset of arch=icelake-server.

I agree that the compiler should reject functions with different PVW.
This is also in accordance with the documentation.

Uros.
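The documented subset rule quoted above can be illustrated with a small example. This is only a sketch: the function names and the `TARGET` macro are made up for illustration, the macro expands to nothing on non-x86 or non-GNU compilers so the code still builds there, and whether the call is actually inlined still depends on the optimizer and the optimization level.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
# define TARGET(opts) __attribute__ ((target (opts)))
#else
# define TARGET(opts) /* attribute unavailable: plain functions */
#endif

/* Callee restricted to SSE2.  */
TARGET ("sse2")
static inline int
scale_sse2 (int x)
{
  return 2 * x;
}

/* Caller built with SSE3.  Since -msse3 implies -msse2, the callee's
   target options are a subset of the caller's, so the documented rule
   permits inlining scale_sse2 here.  */
TARGET ("sse3")
int
sum_scaled_sse3 (int a, int b)
{
  return scale_sse2 (a) + scale_sse2 (b);
}
```

Swapping the two attributes would give the callee an option the caller lacks, which is the case the documentation says is not inlined.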
Uros Bizjak <ubizjak@gmail.com> wrote on Wed, Jun 28, 2023 at 14:43:
> If the user specified a different arch for callee than the caller,
> then the compiler will switch on different ISAs (-march is just a
> shortcut for different ISA packs), and the programmer is aware that
> inlining isn't intended here (we have -mtune, which is not as strong
> as -march, but even functions with different -mtune are not inlined
> without always_inline attribute). This is documented as:

The original issue comes from a case like

    float callee (float a, float b, float c, float d,
                  float e, float f, float g, float h)
    {
      return a * b + c * d + e * f + g + h + a * c + b * c
        + a * d + b * e + a * f + c * h +
        b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
    }

    __attribute__((target_clones("default","arch=icelake-server")))
    void caller (int n, float *a,
                 float c1, float c2, float c3,
                 float c4, float c5, float c6,
                 float c7)
    {
      for (int i = 0; i < n; i++)
        {
          a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
        }
    }

With current GCC, the .icelake_server clone fails to inline callee due
to a target-specific option mismatch, while the .default clone
succeeds and its loop gets vectorized. I think it is not reasonable
that the specific clone with the higher arch cannot produce better code.
So I think we can at least decide to inline callees without any
arch/tune specified, but for now they are rejected by the strict arch=
and tune= checks.
On Wed, Jun 28, 2023 at 10:20 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
>
> With current GCC, the .icelake_server clone fails to inline callee due
> to a target-specific option mismatch, while the .default clone
> succeeds and its loop gets vectorized. I think it is not reasonable
> that the specific clone with the higher arch cannot produce better code.
> So I think we can at least decide to inline callees without any
> arch/tune specified, but for now they are rejected by the strict arch=
> and tune= checks.

Yes, I think it is reasonable to inline a callee without an arch/tune
specified. We expect a "default" callee to have properties that allow
inlining it into all callers, independent of the caller's arch/tune
target attribute.

Uros.
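One possible reading of where the discussion ends up can be modeled as a small decision function. This is a toy sketch, not GCC code and not the committed change: `struct target_opts`, `enum vec_width`, `can_inline_sketch`, and the convention that `0` means "not specified" are all assumptions made for illustration. It combines the PVW rule from the patch, the existing always_inline exception for tune, and the agreed idea that a callee without an explicit arch/tune may be inlined into any caller.

```c
#include <stdbool.h>

/* Hypothetical preferred-vector-width values; VW_NONE = not specified.  */
enum vec_width { VW_NONE, VW_128, VW_256, VW_512 };

/* Hypothetical per-function options; arch/tune value 0 = not specified.  */
struct target_opts
{
  int arch;
  int tune;
  enum vec_width pvw;
};

static bool
can_inline_sketch (const struct target_opts *caller,
                   const struct target_opts *callee,
                   bool always_inline)
{
  /* PVW rule from the patch: reject only when both sides specify a
     preferred vector width and the two differ.  */
  if (callee->pvw != VW_NONE && caller->pvw != VW_NONE
      && callee->pvw != caller->pvw)
    return false;

  /* A callee with an explicit arch must match the caller's arch; a
     callee with no explicit arch passes this check.  */
  if (callee->arch != 0 && callee->arch != caller->arch)
    return false;

  /* Likewise for tune, except that always_inline overrides a mismatch.  */
  if (!always_inline && callee->tune != 0 && callee->tune != caller->tune)
    return false;

  return true;
}
```

Under this model the plain `callee` from the target_clones example (no arch or tune of its own) can be inlined into the arch=icelake-server clone, while the arch=skylake callee from the submitted test case is still rejected.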
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 0761965344b..1d86384ac06 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -605,11 +605,12 @@ ix86_can_inline_p (tree caller, tree callee)
 	  != (callee_opts->x_target_flags & ~always_inline_safe_mask))
     ret = false;
 
-  /* See if arch, tune, etc. are the same.  */
-  else if (caller_opts->arch != callee_opts->arch)
-    ret = false;
-
-  else if (!always_inline && caller_opts->tune != callee_opts->tune)
+  /* Do not inline when specified prefer-vector-width mismatched between
+     callee and caller.  */
+  else if ((callee_opts->x_prefer_vector_width_type != PVW_NONE
+	   && caller_opts->x_prefer_vector_width_type != PVW_NONE)
+	   && callee_opts->x_prefer_vector_width_type
+	      != caller_opts->x_prefer_vector_width_type)
     ret = false;
 
   else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
diff --git a/gcc/testsuite/gcc.target/i386/inline-target-attr.c b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
new file mode 100644
index 00000000000..995502165f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "call\[ \t\]callee" } } */
+
+__attribute__((target("arch=skylake")))
+int callee (int n)
+{
+  int sum = 0;
+  for (int i = 0; i < n; i++)
+    {
+      if (i % 2 == 0)
+	sum +=i;
+      else
+	sum += (i - 1);
+    }
+  return sum + n;
+}
+
+__attribute__((target("arch=icelake-server")))
+int caller (int n)
+{
+  return callee (n) + n;
+}
+