Message ID | 46838de4-3d92-a270-e71a-73fbe923d306@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [v3] vect: Recog mul_highpart pattern | expand |
On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote: > > on 2021/7/14 下午2:38, Richard Biener wrote: > >> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote: > >>> > >>> on 2021/7/13 下午8:42, Richard Biener wrote: > >>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote: > >> > >>> I guess the proposed IFN would be directly mapped for [us]mul_highpart? > >> > >> Yes. > >> > > > > Thanks for confirming! The related patch v2 is attached and the testing > > is ongoing. > > > > It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and > aarch64-linux-gnu. But on x86_64-redhat-linux there are XPASSes as below: > > XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw > XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw > XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw > XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw These XFAILs should be removed after your patch. This is PR100696 [1], we want PMULH.W here, so x86 part of the patch is actually not needed. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100696 Uros. > They weren't exposed in the testing run with the previous patch which > doesn't use IFN way. By investigating it, the difference comes from > the different costing on MULT_HIGHPART_EXPR and IFN_MULH. > > For MULT_HIGHPART_EXPR, it's costed by 16 from below call: > > case MULT_EXPR: > case WIDEN_MULT_EXPR: > case MULT_HIGHPART_EXPR: > stmt_cost = ix86_multiplication_cost (ix86_cost, mode); > > While for IFN_MULH, it's costed by 4 as normal stmt so the total cost > becomes profitable and the expected vectorization happens. > > One conservative fix seems to make IFN_MULH costing go through the > unique cost interface for multiplication, that is: > > case CFN_MULH: > stmt_cost = ix86_multiplication_cost (ix86_cost, mode); > break; > > As the test case marks the checks as "xfail", probably it's good to > revisit the costing on mul_highpart to ensure it's not priced more. > > The attached patch also addressed Richard S.'s review comments on > two reformatting hunks. Is it ok for trunk? > > BR, > Kewen > ----- > gcc/ChangeLog: > > * internal-fn.c (first_commutative_argument): Add info for IFN_MULH. > * internal-fn.def (IFN_MULH): New internal function. > * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to > recog normal multiply highpart as IFN_MULH. > * config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined > function CFN_MULH.
Hi Uros, on 2021/7/15 下午3:17, Uros Bizjak wrote: > On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote: >> >> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote: >>> on 2021/7/14 下午2:38, Richard Biener wrote: >>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote: >>>>> >>>>> on 2021/7/13 下午8:42, Richard Biener wrote: >>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote: >>>> >>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart? >>>> >>>> Yes. >>>> >>> >>> Thanks for confirming! The related patch v2 is attached and the testing >>> is ongoing. >>> >> >> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and >> aarch64-linux-gnu. But on x86_64-redhat-linux there are XPASSes as below: >> >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw > > These XFAILs should be removed after your patch. > I'm curious whether it's intentional not to specify -fno-vect-cost-model for this test case. As noted above, this case is sensitive on how we cost mult_highpart. Without cost modeling, the XFAILs can be removed only with this mul_highpart pattern support, no matter how we model it (x86 part of this patch exists or not). > This is PR100696 [1], we want PMULH.W here, so x86 part of the patch > is actually not needed. > Thanks for the information! The justification for the x86 part is that: the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart optab support, i386 port has already customized costing for MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab support), if we don't follow the same way for IFN_MULH, I'm worried that we may cost the IFN_MULH wrongly. If taking IFN_MULH as normal stmt is a right thing (we shouldn't cost it specially), it at least means we have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it has direct mul_highpart optab support, I think they should be costed consistently. Does it sound reasonable? BR, Kewen > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100696 > > Uros. > >> They weren't exposed in the testing run with the previous patch which >> doesn't use IFN way. By investigating it, the difference comes from >> the different costing on MULT_HIGHPART_EXPR and IFN_MULH. >> >> For MULT_HIGHPART_EXPR, it's costed by 16 from below call: >> >> case MULT_EXPR: >> case WIDEN_MULT_EXPR: >> case MULT_HIGHPART_EXPR: >> stmt_cost = ix86_multiplication_cost (ix86_cost, mode); >> >> While for IFN_MULH, it's costed by 4 as normal stmt so the total cost >> becomes profitable and the expected vectorization happens. >> >> One conservative fix seems to make IFN_MULH costing go through the >> unique cost interface for multiplication, that is: >> >> case CFN_MULH: >> stmt_cost = ix86_multiplication_cost (ix86_cost, mode); >> break; >> >> As the test case marks the checks as "xfail", probably it's good to >> revisit the costing on mul_highpart to ensure it's not priced more. >> >> The attached patch also addressed Richard S.'s review comments on >> two reformatting hunks. Is it ok for trunk? >> >> BR, >> Kewen >> ----- >> gcc/ChangeLog: >> >> * internal-fn.c (first_commutative_argument): Add info for IFN_MULH. >> * internal-fn.def (IFN_MULH): New internal function. >> * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to >> recog normal multiply highpart as IFN_MULH. >> * config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined >> function CFN_MULH.
On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > Hi Uros, > > on 2021/7/15 下午3:17, Uros Bizjak wrote: > > On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > >> > >> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote: > >>> on 2021/7/14 下午2:38, Richard Biener wrote: > >>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote: > >>>>> > >>>>> on 2021/7/13 下午8:42, Richard Biener wrote: > >>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote: > >>>> > >>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart? > >>>> > >>>> Yes. > >>>> > >>> > >>> Thanks for confirming! The related patch v2 is attached and the testing > >>> is ongoing. > >>> > >> > >> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and > >> aarch64-linux-gnu. But on x86_64-redhat-linux there are XPASSes as below: > >> > >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw > >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw > >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw > >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw > > > > These XFAILs should be removed after your patch. > > > I'm curious whether it's intentional not to specify -fno-vect-cost-model > for this test case. As noted above, this case is sensitive on how we > cost mult_highpart. Without cost modeling, the XFAILs can be removed > only with this mul_highpart pattern support, no matter how we model it > (x86 part of this patch exists or not). > > > This is PR100696 [1], we want PMULH.W here, so x86 part of the patch > > is actually not needed. > > > > Thanks for the information! The justification for the x86 part is that: > the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart > optab support, i386 port has already customized costing for > MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab > support), if we don't follow the same way for IFN_MULH, I'm worried that > we may cost the IFN_MULH wrongly. If taking IFN_MULH as normal stmt is > a right thing (we shouldn't cost it specially), it at least means we > have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it > has direct mul_highpart optab support, I think they should be costed > consistently. Does it sound reasonable? Ah, I was under impression that i386 part was introduced to avoid generation of PMULHW instructions in the testcases above (to keep XFAILs). Based on your explanation - yes, the costing function should be the same. So, the x86 part is OK. Thanks, Uros.
on 2021/7/15 下午4:04, Kewen.Lin via Gcc-patches wrote: > Hi Uros, > > on 2021/7/15 下午3:17, Uros Bizjak wrote: >> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote: >>> >>> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote: >>>> on 2021/7/14 下午2:38, Richard Biener wrote: >>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote: >>>>>> >>>>>> on 2021/7/13 下午8:42, Richard Biener wrote: >>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote: >>>>> >>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart? >>>>> >>>>> Yes. >>>>> >>>> >>>> Thanks for confirming! The related patch v2 is attached and the testing >>>> is ongoing. >>>> >>> >>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and >>> aarch64-linux-gnu. But on x86_64-redhat-linux there are XPASSes as below: >>> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw >> >> These XFAILs should be removed after your patch. >> > I'm curious whether it's intentional not to specify -fno-vect-cost-model > for this test case. As noted above, this case is sensitive on how we > cost mult_highpart. Without cost modeling, the XFAILs can be removed > only with this mul_highpart pattern support, no matter how we model it > (x86 part of this patch exists or not). > >> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch >> is actually not needed. >> > > Thanks for the information! The justification for the x86 part is that: > the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart > optab support, i386 port has already customized costing for > MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab > support), if we don't follow the same way for IFN_MULH, I'm worried that > we may cost the IFN_MULH wrongly. If taking IFN_MULH as normal stmt is > a right thing (we shouldn't cost it specially), it at least means we > have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it > has direct mul_highpart optab support, I think they should be costed > consistently. Does it sound reasonable? > Hi Richard(s), This possibly inconsistent handling problem seems like a counter example better to use a new IFN rather than the existing tree_code, it seems hard to maintain (should remember to keep consistent for its handlings). ;) From this perspective, maybe it's better to move backward to use tree_code and guard it under can_mult_highpart_p == 1 (just like IFN and avoid costing issue Richi pointed out before)? What do you think? BR, Kewen
on 2021/7/15 下午4:23, Uros Bizjak wrote: > On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin <linkw@linux.ibm.com> wrote: >> >> Hi Uros, >> >> on 2021/7/15 下午3:17, Uros Bizjak wrote: >>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote: >>>> >>>> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote: >>>>> on 2021/7/14 下午2:38, Richard Biener wrote: >>>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote: >>>>>>> >>>>>>> on 2021/7/13 下午8:42, Richard Biener wrote: >>>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote: >>>>>> >>>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart? >>>>>> >>>>>> Yes. >>>>>> >>>>> >>>>> Thanks for confirming! The related patch v2 is attached and the testing >>>>> is ongoing. >>>>> >>>> >>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and >>>> aarch64-linux-gnu. But on x86_64-redhat-linux there are XPASSes as below: >>>> >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw >>> >>> These XFAILs should be removed after your patch. >>> >> I'm curious whether it's intentional not to specify -fno-vect-cost-model >> for this test case. As noted above, this case is sensitive on how we >> cost mult_highpart. Without cost modeling, the XFAILs can be removed >> only with this mul_highpart pattern support, no matter how we model it >> (x86 part of this patch exists or not). >> >>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch >>> is actually not needed. >>> >> >> Thanks for the information! The justification for the x86 part is that: >> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart >> optab support, i386 port has already customized costing for >> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab >> support), if we don't follow the same way for IFN_MULH, I'm worried that >> we may cost the IFN_MULH wrongly. If taking IFN_MULH as normal stmt is >> a right thing (we shouldn't cost it specially), it at least means we >> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it >> has direct mul_highpart optab support, I think they should be costed >> consistently. Does it sound reasonable? > > Ah, I was under impression that i386 part was introduced to avoid > generation of PMULHW instructions in the testcases above (to keep > XFAILs). Based on your explanation - yes, the costing function should > be the same. So, the x86 part is OK. > Thanks! It does have the effect to keep XFAILs. ;) I guess the case doesn't care about the costing much just like most vectorization cases? If so, do you want me to remove the xfails with one extra option "-fno-vect-cost-model" along with this patch? BR, Kewen
V čet., 15. jul. 2021 10:49 je oseba Kewen.Lin <linkw@linux.ibm.com> napisala: > on 2021/7/15 下午4:23, Uros Bizjak wrote: > > On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > >> > >> Hi Uros, > >> > >> on 2021/7/15 下午3:17, Uros Bizjak wrote: > >>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > >>>> > >>>> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote: > >>>>> on 2021/7/14 下午2:38, Richard Biener wrote: > >>>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> > wrote: > >>>>>>> > >>>>>>> on 2021/7/13 下午8:42, Richard Biener wrote: > >>>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> > wrote: > >>>>>> > >>>>>>> I guess the proposed IFN would be directly mapped for > [us]mul_highpart? > >>>>>> > >>>>>> Yes. > >>>>>> > >>>>> > >>>>> Thanks for confirming! The related patch v2 is attached and the > testing > >>>>> is ongoing. > >>>>> > >>>> > >>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and > >>>> aarch64-linux-gnu. But on x86_64-redhat-linux there are XPASSes as > below: > >>>> > >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw > >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw > >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw > >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw > >>> > >>> These XFAILs should be removed after your patch. > >>> > >> I'm curious whether it's intentional not to specify -fno-vect-cost-model > >> for this test case. As noted above, this case is sensitive on how we > >> cost mult_highpart. Without cost modeling, the XFAILs can be removed > >> only with this mul_highpart pattern support, no matter how we model it > >> (x86 part of this patch exists or not). > >> > >>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch > >>> is actually not needed. > >>> > >> > >> Thanks for the information! The justification for the x86 part is that: > >> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart > >> optab support, i386 port has already customized costing for > >> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab > >> support), if we don't follow the same way for IFN_MULH, I'm worried that > >> we may cost the IFN_MULH wrongly. If taking IFN_MULH as normal stmt is > >> a right thing (we shouldn't cost it specially), it at least means we > >> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it > >> has direct mul_highpart optab support, I think they should be costed > >> consistently. Does it sound reasonable? > > > > Ah, I was under impression that i386 part was introduced to avoid > > generation of PMULHW instructions in the testcases above (to keep > > XFAILs). Based on your explanation - yes, the costing function should > > be the same. So, the x86 part is OK. > > > > Thanks! It does have the effect to keep XFAILs. ;) I guess the case > doesn't care about the costing much just like most vectorization cases? > If so, do you want me to remove the xfails with one extra option > "-fno-vect-cost-model" along with this patch. > Yes, please do so. The testcase cares only about PMULHW generation. Thanks, Uros. > BR, > Kewen >
On Thu, Jul 15, 2021 at 10:41 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > > on 2021/7/15 下午4:04, Kewen.Lin via Gcc-patches wrote: > > Hi Uros, > > > > on 2021/7/15 下午3:17, Uros Bizjak wrote: > >> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote: > >>> > >>> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote: > >>>> on 2021/7/14 下午2:38, Richard Biener wrote: > >>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote: > >>>>>> > >>>>>> on 2021/7/13 下午8:42, Richard Biener wrote: > >>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote: > >>>>> > >>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart? > >>>>> > >>>>> Yes. > >>>>> > >>>> > >>>> Thanks for confirming! The related patch v2 is attached and the testing > >>>> is ongoing. > >>>> > >>> > >>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and > >>> aarch64-linux-gnu. But on x86_64-redhat-linux there are XPASSes as below: > >>> > >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw > >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw > >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw > >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw > >> > >> These XFAILs should be removed after your patch. > >> > > I'm curious whether it's intentional not to specify -fno-vect-cost-model > > for this test case. As noted above, this case is sensitive on how we > > cost mult_highpart. Without cost modeling, the XFAILs can be removed > > only with this mul_highpart pattern support, no matter how we model it > > (x86 part of this patch exists or not). > > > >> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch > >> is actually not needed. > >> > > > > Thanks for the information! The justification for the x86 part is that: > > the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart > > optab support, i386 port has already customized costing for > > MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab > > support), if we don't follow the same way for IFN_MULH, I'm worried that > > we may cost the IFN_MULH wrongly. If taking IFN_MULH as normal stmt is > > a right thing (we shouldn't cost it specially), it at least means we > > have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it > > has direct mul_highpart optab support, I think they should be costed > > consistently. Does it sound reasonable? > > > > Hi Richard(s), > > This possibly inconsistent handling problem seems like a counter example > better to use a new IFN rather than the existing tree_code, it seems hard > to maintain (should remember to keep consistent for its handlings). ;) > From this perspective, maybe it's better to move backward to use tree_code > and guard it under can_mult_highpart_p == 1 (just like IFN and avoid > costing issue Richi pointed out before)? > > What do you think? No, whenever we want to do code generation based on machine capabilities the canonical way to test for those is to look at optabs and then it's most natural to keep that 1:1 relation and emit internal function calls which directly map to supported optabs instead of going back to some tree codes. When targets "lie" and provide expanders for something they can only emulate then they have to compensate in their costing. But as I understand this isn't the case for x86 here. Now, in this case we already have the MULT_HIGHPART_EXPR tree, so yes, it might make sense to use that instead of introducing an alternate way via the direct internal function. Somebody decided that MULT_HIGHPART is generic enough to warrant this - but I see that expand_mult_highpart can fail unless can_mult_highpart_p and this is exactly one of the cases we want to avoid - either we can handle something generally in which case it can be a tree code or we can't, then it should be 1:1 tied to optabs at best (mult_highpart has scalar support only for the direct optab, vector support also for widen_mult). Richard. > > BR, > Kewen
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index a93128fa0a4..1dd9108353c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -22559,6 +22559,9 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count, mode == SFmode ? ix86_cost->fmass : ix86_cost->fmasd); break; + case CFN_MULH: + stmt_cost = ix86_multiplication_cost (ix86_cost, mode); + break; default: break; } diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index fb8b43d1ce2..b1b4289357c 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -3703,6 +3703,7 @@ first_commutative_argument (internal_fn fn) case IFN_FNMS: case IFN_AVG_FLOOR: case IFN_AVG_CEIL: + case IFN_MULH: case IFN_MULHS: case IFN_MULHRS: case IFN_FMIN: diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index c3b8e730960..ed6d7de1680 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -169,6 +169,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first, DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first, savg_ceil, uavg_ceil, binary) +DEF_INTERNAL_SIGNED_OPTAB_FN (MULH, ECF_CONST | ECF_NOTHROW, first, + smul_highpart, umul_highpart, binary) DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first, smulhs, umulhs, binary) DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first, diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c index b2e7fc2cc7a..ada89d7060b 100644 --- a/gcc/tree-vect-patterns.c +++ b/gcc/tree-vect-patterns.c @@ -1896,8 +1896,15 @@ vect_recog_over_widening_pattern (vec_info *vinfo, 1) Multiply high with scaling TYPE res = ((TYPE) a * (TYPE) b) >> c; + Here, c is bitsize (TYPE) / 2 - 1. + 2) ... or also with rounding TYPE res = (((TYPE) a * (TYPE) b) >> d + 1) >> 1; + Here, d is bitsize (TYPE) / 2 - 2. + + 3) Normal multiply high + TYPE res = ((TYPE) a * (TYPE) b) >> e; + Here, e is bitsize (TYPE) / 2. where only the bottom half of res is used. */ @@ -1942,7 +1949,6 @@ vect_recog_mulhs_pattern (vec_info *vinfo, stmt_vec_info mulh_stmt_info; tree scale_term; internal_fn ifn; - unsigned int expect_offset; /* Check for the presence of the rounding term. */ if (gimple_assign_rhs_code (rshift_input_stmt) == PLUS_EXPR) @@ -1991,25 +1997,37 @@ vect_recog_mulhs_pattern (vec_info *vinfo, /* Get the scaling term. */ scale_term = gimple_assign_rhs2 (plus_input_stmt); + /* Check that the scaling factor is correct. */ + if (TREE_CODE (scale_term) != INTEGER_CST) + return NULL; + + /* Check pattern 2). */ + if (wi::to_widest (scale_term) + target_precision + 2 + != TYPE_PRECISION (lhs_type)) + return NULL; - expect_offset = target_precision + 2; ifn = IFN_MULHRS; } else { mulh_stmt_info = rshift_input_stmt_info; scale_term = gimple_assign_rhs2 (last_stmt); + /* Check that the scaling factor is correct. */ + if (TREE_CODE (scale_term) != INTEGER_CST) + return NULL; - expect_offset = target_precision + 1; - ifn = IFN_MULHS; + /* Check for pattern 1). */ + if (wi::to_widest (scale_term) + target_precision + 1 + == TYPE_PRECISION (lhs_type)) + ifn = IFN_MULHS; + /* Check for pattern 3). */ + else if (wi::to_widest (scale_term) + target_precision + == TYPE_PRECISION (lhs_type)) + ifn = IFN_MULH; + else + return NULL; } - /* Check that the scaling factor is correct. */ - if (TREE_CODE (scale_term) != INTEGER_CST - || wi::to_widest (scale_term) + expect_offset - != TYPE_PRECISION (lhs_type)) - return NULL; - /* Check whether the scaling input term can be seen as two widened inputs multiplied together. */ vect_unpromoted_value unprom_mult[2];