[v3] vect: Recog mul_highpart pattern

Message ID	46838de4-3d92-a270-e71a-73fbe923d306@linux.ibm.com
State	New
Headers	Subject: [PATCH v3] vect: Recog mul_highpart pattern
	From: "Kewen.Lin via Gcc-patches" <gcc-patches@gcc.gnu.org>
	Reply-To: "Kewen.Lin" <linkw@linux.ibm.com>
	To: Richard Biener <richard.guenther@gmail.com>
	Cc: Richard Sandiford <richard.sandiford@arm.com>, Bill Schmidt <wschmidt@linux.ibm.com>, GCC Patches <gcc-patches@gcc.gnu.org>, Segher Boessenkool <segher@kernel.crashing.org>
	Date: Thu, 15 Jul 2021 15:06:58 +0800
Series	[v3] vect: Recog mul_highpart pattern

Kewen.Lin July 15, 2021, 7:06 a.m. UTC

on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> on 2021/7/14 2:38 PM, Richard Biener wrote:
>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>
>>> on 2021/7/13 8:42 PM, Richard Biener wrote:
>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>
>> Yes.
>>
> 
> Thanks for confirming!  The related patch v2 is attached and the testing
> is ongoing.
> 

It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:

XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw

They weren't exposed in the testing run with the previous patch, which
didn't go the IFN way.  Investigation shows that the difference comes
from the different costing of MULT_HIGHPART_EXPR and IFN_MULH.
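
For reference, both MULT_HIGHPART_EXPR and IFN_MULH denote the upper half
of the double-width product.  A minimal C sketch of the 16-bit case, which
is what pmulhw (signed) and pmulhuw (unsigned) compute per lane, could look
like the following.  The function names are hypothetical, and this is not
the content of pr100637-3w.c.

```c
#include <stdint.h>

/* High 16 bits of the signed 16x16 -> 32-bit product.  Right-shifting
   a negative value is implementation-defined in ISO C; GCC performs an
   arithmetic shift, which is what is assumed here.  */
int16_t
mulh_s16 (int16_t a, int16_t b)
{
  return (int16_t) (((int32_t) a * (int32_t) b) >> 16);
}

/* High 16 bits of the unsigned 16x16 -> 32-bit product.  */
uint16_t
mulh_u16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (((uint32_t) a * (uint32_t) b) >> 16);
}

/* A scalar loop of the general shape a highpart pattern is aimed at.  */
void
mulh_loop_s16 (int16_t *restrict r, const int16_t *restrict a,
	       const int16_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = mulh_s16 (a[i], b[i]);
}
```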

For MULT_HIGHPART_EXPR, the cost is 16, coming from the call below:

	case MULT_EXPR:
	case WIDEN_MULT_EXPR:
	case MULT_HIGHPART_EXPR:
	  stmt_cost = ix86_multiplication_cost (ix86_cost, mode);

IFN_MULH, by contrast, is costed as a normal stmt at 4, so the total cost
becomes profitable and the expected vectorization happens.

One conservative fix would be to make IFN_MULH costing go through the
same cost interface as multiplication, that is:

      case CFN_MULH:
        stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
        break;

As the test case marks the checks as "xfail", it's probably worth
revisiting the costing of mul_highpart to ensure it isn't overpriced.
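
To make the discrepancy concrete, here is a toy model of the dispatch
described above.  It is only a sketch: the enum, the function and its flag
are hypothetical and are not GCC code; only the two cost values (16 and 4)
are the ones reported in this mail.

```c
/* Toy model, not GCC code: hypothetical statement kinds.  */
enum toy_stmt_kind
{
  TOY_PLUS,
  TOY_MULT,
  TOY_MULT_HIGHPART,	/* stands in for MULT_HIGHPART_EXPR */
  TOY_IFN_MULH		/* stands in for IFN_MULH / CFN_MULH */
};

enum { TOY_DEFAULT_COST = 4, TOY_MULT_COST = 16 };

/* COST_MULH_AS_MULT models whether the extra "case CFN_MULH" is
   present in the target's cost hook.  */
int
toy_stmt_cost (enum toy_stmt_kind kind, int cost_mulh_as_mult)
{
  switch (kind)
    {
    case TOY_MULT:
    case TOY_MULT_HIGHPART:
      return TOY_MULT_COST;
    case TOY_IFN_MULH:
      /* Without the extra case, the IFN falls into the default bucket
	 and looks four times cheaper than the tree code.  */
      return cost_mulh_as_mult ? TOY_MULT_COST : TOY_DEFAULT_COST;
    default:
      return TOY_DEFAULT_COST;
    }
}
```

With the flag off, the same operation gets two different prices depending
on how it is represented, which is the inconsistency at issue here.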

The attached patch also addresses Richard S.'s review comments on
two reformatting hunks.  Is it OK for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

	* internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
	* internal-fn.def (IFN_MULH): New internal function.
	* tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
	recog normal multiply highpart as IFN_MULH.
	* config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
	function CFN_MULH.
---
 gcc/config/i386/i386.c   |  3 +++
 gcc/internal-fn.c        |  1 +
 gcc/internal-fn.def      |  2 ++
 gcc/tree-vect-patterns.c | 38 ++++++++++++++++++++++++++++----------
 4 files changed, 34 insertions(+), 10 deletions(-)

Uros Bizjak July 15, 2021, 7:17 a.m. UTC | #1

On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> > on 2021/7/14 2:38 PM, Richard Biener wrote:
> >> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>
> >>> on 2021/7/13 8:42 PM, Richard Biener wrote:
> >>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
> >>
> >> Yes.
> >>
> >
> > Thanks for confirming!  The related patch v2 is attached and the testing
> > is ongoing.
> >
>
> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
>
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw

These XFAILs should be removed after your patch.

This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
is actually not needed.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100696

Uros.

> They weren't exposed in the testing run with the previous patch which
> doesn't use IFN way.  By investigating it, the difference comes from
> the different costing on MULT_HIGHPART_EXPR and IFN_MULH.
>
> For MULT_HIGHPART_EXPR, it's costed by 16 from below call:
>
>         case MULT_EXPR:
>         case WIDEN_MULT_EXPR:
>         case MULT_HIGHPART_EXPR:
>           stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>
> While for IFN_MULH, it's costed by 4 as normal stmt so the total cost
> becomes profitable and the expected vectorization happens.
>
> One conservative fix seems to make IFN_MULH costing go through the
> unique cost interface for multiplication, that is:
>
>       case CFN_MULH:
>         stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>         break;
>
> As the test case marks the checks as "xfail", probably it's good to
> revisit the costing on mul_highpart to ensure it's not priced more.
>
> The attached patch also addressed Richard S.'s review comments on
> two reformatting hunks.  Is it ok for trunk?
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>         * internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
>         * internal-fn.def (IFN_MULH): New internal function.
>         * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
>         recog normal multiply highpart as IFN_MULH.
>         * config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
>         function CFN_MULH.

Kewen.Lin July 15, 2021, 8:04 a.m. UTC | #2

Hi Uros,

on 2021/7/15 3:17 PM, Uros Bizjak wrote:
> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
>>> on 2021/7/14 2:38 PM, Richard Biener wrote:
>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>
>>>>> on 2021/7/13 8:42 PM, Richard Biener wrote:
>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>>>
>>>> Yes.
>>>>
>>>
>>> Thanks for confirming!  The related patch v2 is attached and the testing
>>> is ongoing.
>>>
>>
>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
>>
>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> 
> These XFAILs should be removed after your patch.
> 
I'm curious whether it's intentional not to specify -fno-vect-cost-model
for this test case.  As noted above, this case is sensitive to how we
cost mult_highpart.  Without cost modeling, the XFAILs can be removed
with just this mul_highpart pattern support, no matter how we model it
(whether or not the x86 part of this patch exists).

> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> is actually not needed.
> 

Thanks for the information!  The justification for the x86 part is this:
IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart optab
support, and the i386 port has already customized the costing for
MULT_HIGHPART_EXPR (which should include the case with mul_highpart optab
support).  If we don't do the same for IFN_MULH, I'm worried that we may
cost IFN_MULH wrongly.  If treating IFN_MULH as a normal stmt is the
right thing (we shouldn't cost it specially), it at least means we have
to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it has
direct mul_highpart optab support.  I think they should be costed
consistently.  Does that sound reasonable?

BR,
Kewen

> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100696
> 
> Uros.
> 
>> They weren't exposed in the testing run with the previous patch which
>> doesn't use IFN way.  By investigating it, the difference comes from
>> the different costing on MULT_HIGHPART_EXPR and IFN_MULH.
>>
>> For MULT_HIGHPART_EXPR, it's costed by 16 from below call:
>>
>>         case MULT_EXPR:
>>         case WIDEN_MULT_EXPR:
>>         case MULT_HIGHPART_EXPR:
>>           stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>>
>> While for IFN_MULH, it's costed by 4 as normal stmt so the total cost
>> becomes profitable and the expected vectorization happens.
>>
>> One conservative fix seems to make IFN_MULH costing go through the
>> unique cost interface for multiplication, that is:
>>
>>       case CFN_MULH:
>>         stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
>>         break;
>>
>> As the test case marks the checks as "xfail", probably it's good to
>> revisit the costing on mul_highpart to ensure it's not priced more.
>>
>> The attached patch also addressed Richard S.'s review comments on
>> two reformatting hunks.  Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -----
>> gcc/ChangeLog:
>>
>>         * internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
>>         * internal-fn.def (IFN_MULH): New internal function.
>>         * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
>>         recog normal multiply highpart as IFN_MULH.
>>         * config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
>>         function CFN_MULH.

Uros Bizjak July 15, 2021, 8:23 a.m. UTC | #3

On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi Uros,
>
> on 2021/7/15 3:17 PM, Uros Bizjak wrote:
> > On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> >>> on 2021/7/14 2:38 PM, Richard Biener wrote:
> >>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>>
> >>>>> on 2021/7/13 8:42 PM, Richard Biener wrote:
> >>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>
> >>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
> >>>>
> >>>> Yes.
> >>>>
> >>>
> >>> Thanks for confirming!  The related patch v2 is attached and the testing
> >>> is ongoing.
> >>>
> >>
> >> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> >> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
> >>
> >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >
> > These XFAILs should be removed after your patch.
> >
> I'm curious whether it's intentional not to specify -fno-vect-cost-model
> for this test case.  As noted above, this case is sensitive on how we
> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> only with this mul_highpart pattern support, no matter how we model it
> (x86 part of this patch exists or not).
>
> > This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> > is actually not needed.
> >
>
> Thanks for the information!  The justification for the x86 part is that:
> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
> optab support, i386 port has already customized costing for
> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
> support), if we don't follow the same way for IFN_MULH, I'm worried that
> we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
> a right thing (we shouldn't cost it specially), it at least means we
> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
> has direct mul_highpart optab support, I think they should be costed
> consistently.  Does it sound reasonable?

Ah, I was under the impression that the i386 part was introduced to avoid
generation of PMULHW instructions in the testcases above (to keep the
XFAILs). Based on your explanation - yes, the costing function should
be the same. So, the x86 part is OK.

Thanks,
Uros.

Kewen.Lin July 15, 2021, 8:40 a.m. UTC | #4

on 2021/7/15 4:04 PM, Kewen.Lin via Gcc-patches wrote:
> Hi Uros,
> 
> on 2021/7/15 3:17 PM, Uros Bizjak wrote:
>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>
>>> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
>>>> on 2021/7/14 2:38 PM, Richard Biener wrote:
>>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>>
>>>>>> on 2021/7/13 8:42 PM, Richard Biener wrote:
>>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>
>>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>>>>
>>>>> Yes.
>>>>>
>>>>
>>>> Thanks for confirming!  The related patch v2 is attached and the testing
>>>> is ongoing.
>>>>
>>>
>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
>>>
>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>
>> These XFAILs should be removed after your patch.
>>
> I'm curious whether it's intentional not to specify -fno-vect-cost-model
> for this test case.  As noted above, this case is sensitive on how we
> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> only with this mul_highpart pattern support, no matter how we model it
> (x86 part of this patch exists or not).
> 
>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
>> is actually not needed.
>>
> 
> Thanks for the information!  The justification for the x86 part is that:
> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
> optab support, i386 port has already customized costing for 
> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
> support), if we don't follow the same way for IFN_MULH, I'm worried that
> we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
> a right thing (we shouldn't cost it specially), it at least means we
> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
> has direct mul_highpart optab support, I think they should be costed
> consistently.  Does it sound reasonable?
> 

Hi Richard(s),

This potentially inconsistent handling seems like a counterexample to the
idea that a new IFN is better than the existing tree_code: it seems hard
to maintain (we have to remember to keep the two handled consistently).  ;)
From this perspective, maybe it's better to go back to using tree_code
and guard it under can_mult_highpart_p == 1 (just like the IFN, and
avoiding the costing issue Richi pointed out before)?

What do you think?

BR,
Kewen

Kewen.Lin July 15, 2021, 8:49 a.m. UTC | #5

on 2021/7/15 4:23 PM, Uros Bizjak wrote:
> On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi Uros,
>>
>> on 2021/7/15 3:17 PM, Uros Bizjak wrote:
>>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
>>>>> on 2021/7/14 2:38 PM, Richard Biener wrote:
>>>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>>>
>>>>>>> on 2021/7/13 8:42 PM, Richard Biener wrote:
>>>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>>
>>>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>
>>>>> Thanks for confirming!  The related patch v2 is attached and the testing
>>>>> is ongoing.
>>>>>
>>>>
>>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>>>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
>>>>
>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
>>>
>>> These XFAILs should be removed after your patch.
>>>
>> I'm curious whether it's intentional not to specify -fno-vect-cost-model
>> for this test case.  As noted above, this case is sensitive on how we
>> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
>> only with this mul_highpart pattern support, no matter how we model it
>> (x86 part of this patch exists or not).
>>
>>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
>>> is actually not needed.
>>>
>>
>> Thanks for the information!  The justification for the x86 part is that:
>> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
>> optab support, i386 port has already customized costing for
>> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
>> support), if we don't follow the same way for IFN_MULH, I'm worried that
>> we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
>> a right thing (we shouldn't cost it specially), it at least means we
>> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
>> has direct mul_highpart optab support, I think they should be costed
>> consistently.  Does it sound reasonable?
> 
> Ah, I was under impression that i386 part was introduced to avoid
> generation of PMULHW instructions in the testcases above (to keep
> XFAILs). Based on your explanation - yes, the costing function should
> be the same. So, the x86 part is OK.
> 

Thanks!  It does have the effect of keeping the XFAILs.  ;)  I guess the
case doesn't care much about the costing, just like most vectorization
cases?  If so, do you want me to remove the xfails, with the extra option
"-fno-vect-cost-model", along with this patch?
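
As an illustration of that suggestion, a test in the GCC testsuite style
might look roughly like the sketch below.  The option set (apart from
-fno-vect-cost-model) and the kernel are assumptions made for the example,
not the actual contents of pr100637-3w.c, and whether this particular loop
ends up as pmulhw depends on the target and the GCC version.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -msse2 -fno-vect-cost-model" } */

/* Hypothetical kernel: high 16 bits of each signed 16x16 product.
   The operands are promoted to int before the multiply, and the
   shifted result always fits in a short.  */
void
mulh_w (short *restrict r, const short *restrict a,
	const short *restrict b)
{
  for (int i = 0; i < 4; i++)
    r[i] = (a[i] * b[i]) >> 16;
}

/* No xfail here: with the cost model disabled, the check no longer
   depends on how the target prices the highpart multiply.  */
/* { dg-final { scan-assembler "pmulhw" } } */
```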

BR,
Kewen

Uros Bizjak July 15, 2021, 9:41 a.m. UTC | #6

On Thu, Jul 15, 2021 at 10:49 AM Kewen.Lin <linkw@linux.ibm.com> wrote:

> on 2021/7/15 4:23 PM, Uros Bizjak wrote:
> > On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> Hi Uros,
> >>
> >> on 2021/7/15 3:17 PM, Uros Bizjak wrote:
> >>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>
> >>>> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> >>>>> on 2021/7/14 2:38 PM, Richard Biener wrote:
> >>>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>>>>
> >>>>>>> on 2021/7/13 8:42 PM, Richard Biener wrote:
> >>>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>>>
> >>>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
> >>>>>>
> >>>>>> Yes.
> >>>>>>
> >>>>>
> >>>>> Thanks for confirming!  The related patch v2 is attached and the testing
> >>>>> is ongoing.
> >>>>>
> >>>>
> >>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> >>>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
> >>>>
> >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>>
> >>> These XFAILs should be removed after your patch.
> >>>
> >> I'm curious whether it's intentional not to specify -fno-vect-cost-model
> >> for this test case.  As noted above, this case is sensitive on how we
> >> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> >> only with this mul_highpart pattern support, no matter how we model it
> >> (x86 part of this patch exists or not).
> >>
> >>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> >>> is actually not needed.
> >>>
> >>
> >> Thanks for the information!  The justification for the x86 part is that:
> >> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
> >> optab support, i386 port has already customized costing for
> >> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
> >> support), if we don't follow the same way for IFN_MULH, I'm worried that
> >> we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
> >> a right thing (we shouldn't cost it specially), it at least means we
> >> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
> >> has direct mul_highpart optab support, I think they should be costed
> >> consistently.  Does it sound reasonable?
> >
> > Ah, I was under impression that i386 part was introduced to avoid
> > generation of PMULHW instructions in the testcases above (to keep
> > XFAILs). Based on your explanation - yes, the costing function should
> > be the same. So, the x86 part is OK.
> >
>
> Thanks!  It does have the effect to keep XFAILs.  ;)  I guess the case
> doesn't care about the costing much just like most vectorization cases?
> If so, do you want me to remove the xfails with one extra option
> "-fno-vect-cost-model" along with this patch.
>

Yes, please do so. The testcase cares only about PMULHW generation.

Thanks,
Uros.


> BR,
> Kewen
>

Richard Biener July 15, 2021, 11:58 a.m. UTC | #7

On Thu, Jul 15, 2021 at 10:41 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2021/7/15 4:04 PM, Kewen.Lin via Gcc-patches wrote:
> > Hi Uros,
> >
> > on 2021/7/15 3:17 PM, Uros Bizjak wrote:
> >> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>
> >>> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> >>>> on 2021/7/14 2:38 PM, Richard Biener wrote:
> >>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>>>
> >>>>>> on 2021/7/13 8:42 PM, Richard Biener wrote:
> >>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>>
> >>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
> >>>>>
> >>>>> Yes.
> >>>>>
> >>>>
> >>>> Thanks for confirming!  The related patch v2 is attached and the testing
> >>>> is ongoing.
> >>>>
> >>>
> >>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> >>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:
> >>>
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>
> >> These XFAILs should be removed after your patch.
> >>
> > I'm curious whether it's intentional not to specify -fno-vect-cost-model
> > for this test case.  As noted above, this case is sensitive on how we
> > cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> > only with this mul_highpart pattern support, no matter how we model it
> > (x86 part of this patch exists or not).
> >
> >> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> >> is actually not needed.
> >>
> >
> > Thanks for the information!  The justification for the x86 part is that:
> > the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart
> > optab support, i386 port has already customized costing for
> > MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab
> > support), if we don't follow the same way for IFN_MULH, I'm worried that
> > we may cost the IFN_MULH wrongly.  If taking IFN_MULH as normal stmt is
> > a right thing (we shouldn't cost it specially), it at least means we
> > have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it
> > has direct mul_highpart optab support, I think they should be costed
> > consistently.  Does it sound reasonable?
> >
>
> Hi Richard(s),
>
> This possibly inconsistent handling problem seems like a counter example
> better to use a new IFN rather than the existing tree_code, it seems hard
> to maintain (should remember to keep consistent for its handlings).  ;)
> From this perspective, maybe it's better to move backward to use tree_code
> and guard it under can_mult_highpart_p == 1 (just like IFN and avoid
> costing issue Richi pointed out before)?
>
> What do you think?

No, whenever we want to do code generation based on machine
capabilities the canonical way to test for those is to look at optabs
and then it's most natural to keep that 1:1 relation and emit
internal function calls which directly map to supported optabs
instead of going back to some tree codes.

When targets "lie" and provide expanders for something they can
only emulate then they have to compensate in their costing.
But as I understand this isn't the case for x86 here.

Now, in this case we already have the MULT_HIGHPART_EXPR tree,
so yes, it might make sense to use that instead of introducing an
alternate way via the direct internal function.  Somebody decided
that MULT_HIGHPART is generic enough to warrant this - but I
see that expand_mult_highpart can fail unless can_mult_highpart_p
and this is exactly one of the cases we want to avoid - either
we can handle something generally in which case it can be a
tree code or we can't, then it should be 1:1 tied to optabs at best
(mult_highpart has scalar support only for the direct optab,
vector support also for widen_mult).
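
The 1:1 rule above can be pictured with a small toy model.  Everything in
it is hypothetical (the enum, its values and the function are made up for
illustration and are not GCC's API); it only encodes the idea that the
internal function is emitted when, and only when, the direct optab exists.

```c
/* Toy model only: these names are hypothetical, not GCC's API.  */
enum toy_highpart_support
{
  TOY_HIGHPART_NONE = 0,	/* target cannot do it at all */
  TOY_HIGHPART_DIRECT = 1,	/* direct mul_highpart optab */
  TOY_HIGHPART_VIA_WIDEN = 2	/* only reachable through widen_mult */
};

/* Emit the internal function only when it maps 1:1 onto a supported
   optab; the emulated route is left to other code paths.  */
int
toy_should_emit_mulh_ifn (enum toy_highpart_support support)
{
  return support == TOY_HIGHPART_DIRECT;
}
```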

Richard.

>
> BR,
> Kewen
