[2/3] Refactor widen_plus as internal_fn

Message ID	a9c739df-eba4-e0e6-b59e-4d6ecc7511e9@arm.com
State	New
Headers	Date: Fri, 28 Apr 2023 13:37:14 +0100; Subject: [PATCH 2/3] Refactor widen_plus as internal_fn; From: "Andre Vieira (lists) via Gcc-patches" <gcc-patches@gcc.gnu.org>; Reply-To: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>; To: Richard Biener <richard.guenther@gmail.com>; Cc: Richard Biener <rguenther@suse.de>, Richard Sandiford <Richard.Sandiford@arm.com>, "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>; In-Reply-To: <ba68e2e8-9670-7e9b-1467-7bc6238ecf0d@arm.com>
Series	[1/3] Refactor to allow internal_fn's; [2/3] Refactor widen_plus as internal_fn; [3/3] Remove widen_plus/minus_expr tree codes

Andre Vieira (lists) April 28, 2023, 12:37 p.m. UTC

This patch replaces the existing tree_code widen_plus and widen_minus
patterns with internal_fn versions.

DEF_INTERNAL_OPTAB_HILO_FN is like DEF_INTERNAL_OPTAB_FN except it 
provides convenience wrappers for defining conversions that require a 
hi/lo split, like widening and narrowing operations.  Each definition 
for <NAME> will require an optab named <OPTAB> and two other optabs that 
you specify for signed and unsigned. The hi/lo pair is necessary because 
the widening operations take n narrow elements as inputs and return n/2 
wide elements as outputs. The 'lo' operation operates on the first n/2 
elements of input. The 'hi' operation operates on the second n/2 
elements of input. Defining an internal_fn along with hi/lo variations 
allows a single internal function to be returned from a vect_recog 
function that will later be expanded to hi/lo.
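
As a rough illustration of the hi/lo split described above, here is a scalar model of a widening add. It is only a sketch, not GCC code: the names widen_add_half, widen_add_lo and widen_add_hi are made up for this example, and it assumes unsigned 8-bit inputs widened to 16 bits, with 'lo' meaning the first n/2 elements as in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a hi/lo widening add; not GCC code.
// Both inputs hold n narrow (8-bit) elements.  Each half-operation
// consumes n/2 of them and produces n/2 wide (16-bit) elements.
static std::vector<uint16_t>
widen_add_half (const std::vector<uint8_t> &a,
                const std::vector<uint8_t> &b, bool hi)
{
  assert (a.size () == b.size () && a.size () % 2 == 0);
  const std::size_t half = a.size () / 2;
  // 'lo' starts at element 0, 'hi' starts at element n/2.
  const std::size_t start = hi ? half : 0;
  std::vector<uint16_t> out (half);
  for (std::size_t i = 0; i < half; ++i)
    // Widen each operand before adding so the sum cannot wrap at 8 bits.
    out[i] = uint16_t (a[start + i]) + uint16_t (b[start + i]);
  return out;
}

std::vector<uint16_t>
widen_add_lo (const std::vector<uint8_t> &a, const std::vector<uint8_t> &b)
{
  return widen_add_half (a, b, false);
}

std::vector<uint16_t>
widen_add_hi (const std::vector<uint8_t> &a, const std::vector<uint8_t> &b)
{
  return widen_add_half (a, b, true);
}
```

Calling widen_add_lo and widen_add_hi on the same pair of n-element inputs yields two n/2-element results that together cover every input element, which is why a single widening operation needs both halves.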

DEF_INTERNAL_OPTAB_HILO_FN is used in internal-fn.def to register a 
widening internal_fn. It is defined differently in different places, and 
internal-fn.def is included from each of those places so that the 
parameters given can be reused:
   internal-fn.cc: first defined to expand to the hi/lo signed/unsigned 
optabs, later redefined to generate the 'expand_' functions for the 
hi/lo versions of the fn.
   internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the 
original and hi/lo variants of the internal_fn.

  For example:
  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
  for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>addl_hi_<mode> -> (u/s)addl2
               IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>addl_lo_<mode> -> (u/s)addl

This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS 
tree codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
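
The "defined differently in different places" scheme is the usual X-macro idiom. The sketch below imitates it in a single file, under some assumptions: HILO_FN_LIST stands in for including internal-fn.def, DEF_HILO stands in for DEF_INTERNAL_OPTAB_HILO_FN with its extra parameters dropped, and the base/LO/HI enumerator order is assumed rather than taken from the description.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for internal-fn.def (hypothetical): one entry per widening fn.
#define HILO_FN_LIST(X) \
  X (VEC_WIDEN_PLUS)    \
  X (VEC_WIDEN_MINUS)

// First expansion: each entry contributes three enumerators.
enum internal_fn_sketch
{
#define DEF_HILO(NAME) IFN_##NAME, IFN_##NAME##_LO, IFN_##NAME##_HI,
  HILO_FN_LIST (DEF_HILO)
#undef DEF_HILO
  IFN_LAST_SKETCH
};

// Second expansion of the same list: a parallel table of names.
static const char *const fn_names[] = {
#define DEF_HILO(NAME) #NAME, #NAME "_LO", #NAME "_HI",
  HILO_FN_LIST (DEF_HILO)
#undef DEF_HILO
  "<invalid>"
};

// Map an enumerator back to its name using the generated table.
const char *
fn_name (internal_fn_sketch fn)
{
  assert (fn <= IFN_LAST_SKETCH);
  return fn_names[fn];
}
```

Because both tables are generated from the same list, adding one entry keeps the enum and the name table in step without touching either definition by hand.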

gcc/ChangeLog:

2023-04-28  Andre Vieira  <andre.simoesdiasvieira@arm.com>
             Joel Hutton  <joel.hutton@arm.com>
             Tamar Christina  <tamar.christina@arm.com>

	* internal-fn.cc (INCLUDE_MAP): Include maps for use in optab
     lookup.
	(DEF_INTERNAL_OPTAB_HILO_FN): Macro to define an internal_fn that
     expands into multiple internal_fns (for widening).
	(ifn_cmp): Function to compare ifn's for sorting/searching.
	(lookup_hilo_ifn_optab): Add lookup function.
	(lookup_hilo_internal_fn): Add lookup function.
	(commutative_binary_fn_p): Add widen_plus fn's.
	(widening_fn_p): New function.
	(decomposes_to_hilo_fn_p): New function.
	* internal-fn.def (DEF_INTERNAL_OPTAB_HILO_FN): Define widening
     plus,minus functions.
	(VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS tree code.
	(VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS tree code.
	* internal-fn.h (GCC_INTERNAL_FN_H): Add headers.
	(lookup_hilo_ifn_optab): Add prototype.
	(lookup_hilo_internal_fn): Likewise.
	(widening_fn_p): Likewise.
	(decomposes_to_hilo_fn_p): Likewise.
	* optabs.cc (commutative_optab_p): Add widening plus, minus optabs.
	* optabs.def (OPTAB_CD): widen add, sub optabs
	* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
     patterns with a hi/lo split.
	(vect_recog_widen_plus_pattern): Refactor to return
     IFN_VEC_WIDEN_PLUS.
	(vect_recog_widen_minus_pattern): Refactor to return new
     IFN_VEC_WIDEN_MINUS.
	* tree-vect-stmts.cc (vectorizable_conversion): Add widen plus/minus
     ifn support.
	(supportable_widening_operation): Add widen plus/minus ifn support.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-widen-add.c: Test that new
     IFN_VEC_WIDEN_PLUS is being used.
	* gcc.target/aarch64/vect-widen-sub.c: Test that new
     IFN_VEC_WIDEN_MINUS is being used.

Richard Biener May 3, 2023, 12:11 p.m. UTC | #1

On Fri, 28 Apr 2023, Andre Vieira (lists) wrote:

> This patch replaces the existing tree_code widen_plus and widen_minus
> patterns with internal_fn versions.
> 
> DEF_INTERNAL_OPTAB_HILO_FN is like DEF_INTERNAL_OPTAB_FN except it provides
> convenience wrappers for defining conversions that require a hi/lo split, like
> widening and narrowing operations.  Each definition for <NAME> will require an
> optab named <OPTAB> and two other optabs that you specify for signed and
> unsigned. The hi/lo pair is necessary because the widening operations take n
> narrow elements as inputs and return n/2 wide elements as outputs. The 'lo'
> operation operates on the first n/2 elements of input. The 'hi' operation
> operates on the second n/2 elements of input. Defining an internal_fn along
> with hi/lo variations allows a single internal function to be returned from a
> vect_recog function that will later be expanded to hi/lo.
> 
> DEF_INTERNAL_OPTAB_HILO_FN is used in internal-fn.def to register a widening
> internal_fn. It is defined differently in different places and internal-fn.def
> is sourced from those places so the parameters given can be reused.
>   internal-fn.c: defined to expand to hi/lo signed/unsigned optabs, later
> defined to generate the  'expand_' functions for the hi/lo versions of the fn.
>   internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the original
> and hi/lo variants of the internal_fn
> 
>  For example:
>  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_<su>addl_hi_<mode> ->
> (u/s)addl2
>                        IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_<su>addl_lo_<mode>
> -> (u/s)addl
> 
> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.

I'll note that it's interesting we have widen multiplication as
the only existing example where we have both HI/LO and EVEN/ODD cases.
I think we want to share as much of the infrastructure to eventually
support targets doing even/odd (I guess all VLA vector targets will
be even/odd?).

DEF_INTERNAL_OPTAB_HILO_FN also looks to be implicitly directed to
widening operations (otherwise no signed/unsigned variants would be
necessary).  What I don't understand is why we need an optab
without _hi/_lo but in that case no signed/unsigned variant?

Looks like all plus, plus_lo and plus_hi are commutative but
only plus is widening?!  So is the setup that the vectorizer
doesn't know about the split and uses 'plus' but then the
expander performs the split?  It does look a bit awkward here
(the plain 'plus' is just used for the scalar case during
pattern recog it seems).

I'd rather have DEF_INTERNAL_OPTAB_HILO_FN split up, declaring
the hi/lo pairs and the scalar variant separately using
DEF_INTERNAL_FN without expander for that, and having
DEF_INTERNAL_HILO_WIDEN_OPTAB_FN and DEF_INTERNAL_EVENODD_WIDEN_OPTAB_FN
for the signed/unsigned pairs?  (if we need that helper at all)

Targets shouldn't need to implement the plain optab (it shouldn't
exist) and the vectorizer should query the hi/lo or even/odd
optabs for support instead.

The vectorizer parts look OK to me, I'd like Richard to chime
in on the optab parts as well.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> 2023-04-28  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>             Joel Hutton  <joel.hutton@arm.com>
>             Tamar Christina  <tamar.christina@arm.com>
> 
>     	* internal-fn.cc (INCLUDE_MAP): Include maps for use in optab
>     lookup.
>     	(DEF_INTERNAL_OPTAB_HILO_FN): Macro to define an internal_fn that
>     expands into multiple internal_fns (for widening).
> 	(ifn_cmp): Function to compare ifn's for sorting/searching.
> 	(lookup_hilo_ifn_optab): Add lookup function.
> 	(lookup_hilo_internal_fn): Add lookup function.
> 	(commutative_binary_fn_p): Add widen_plus fn's.
> 	(widening_fn_p): New function.
> 	(decomposes_to_hilo_fn_p): New function.
> 	* internal-fn.def (DEF_INTERNAL_OPTAB_HILO_FN): Define widening
>     plus,minus functions.
> 	(VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS tree code.
> 	(VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS tree code.
> 	* internal-fn.h (GCC_INTERNAL_FN_H): Add headers.
> 	(lookup_hilo_ifn_optab): Add prototype.
> 	(lookup_hilo_internal_fn): Likewise.
> 	(widening_fn_p): Likewise.
> 	(decomposes_to_hilo_fn_p): Likewise.
> 	* optabs.cc (commutative_optab_p): Add widening plus, minus optabs.
> 	* optabs.def (OPTAB_CD): widen add, sub optabs
> 	* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
>     patterns with a hi/lo split.
>     	(vect_recog_widen_plus_pattern): Refactor to return
>     IFN_VECT_WIDEN_PLUS.
>     	(vect_recog_widen_minus_pattern): Refactor to return new
>     IFN_VEC_WIDEN_MINUS.
>     	* tree-vect-stmts.cc (vectorizable_conversion): Add widen plus/minus
>     ifn
>     support.
> 	(supportable_widening_operation): Add widen plus/minus ifn support.
> 
> gcc/testsuite/ChangeLog:
> 
>     	* gcc.target/aarch64/vect-widen-add.c: Test that new
>     IFN_VEC_WIDEN_PLUS is being used.
>     	* gcc.target/aarch64/vect-widen-sub.c: Test that new
>     IFN_VEC_WIDEN_MINUS is being used.
>

Richard Sandiford May 3, 2023, 7:07 p.m. UTC | #2

Richard Biener <rguenther@suse.de> writes:
> On Fri, 28 Apr 2023, Andre Vieira (lists) wrote:
>
>> This patch replaces the existing tree_code widen_plus and widen_minus
>> patterns with internal_fn versions.
>> 
>> DEF_INTERNAL_OPTAB_HILO_FN is like DEF_INTERNAL_OPTAB_FN except it provides
>> convenience wrappers for defining conversions that require a hi/lo split, like
>> widening and narrowing operations.  Each definition for <NAME> will require an
>> optab named <OPTAB> and two other optabs that you specify for signed and
>> unsigned. The hi/lo pair is necessary because the widening operations take n
>> narrow elements as inputs and return n/2 wide elements as outputs. The 'lo'
>> operation operates on the first n/2 elements of input. The 'hi' operation
>> operates on the second n/2 elements of input. Defining an internal_fn along
>> with hi/lo variations allows a single internal function to be returned from a
>> vect_recog function that will later be expanded to hi/lo.
>> 
>> DEF_INTERNAL_OPTAB_HILO_FN is used in internal-fn.def to register a widening
>> internal_fn. It is defined differently in different places and internal-fn.def
>> is sourced from those places so the parameters given can be reused.
>>   internal-fn.c: defined to expand to hi/lo signed/unsigned optabs, later
>> defined to generate the  'expand_' functions for the hi/lo versions of the fn.
>>   internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the original
>> and hi/lo variants of the internal_fn
>> 
>>  For example:
>>  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
>> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_<su>addl_hi_<mode> ->
>> (u/s)addl2
>>                        IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_<su>addl_lo_<mode>
>> -> (u/s)addl
>> 
>> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
>> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
>
> I'll note that it's interesting we have widen multiplication as
> the only existing example where we have both HI/LO and EVEN/ODD cases.
> I think we want to share as much of the infrastructure to eventually
> support targets doing even/odd (I guess all VLA vector targets will
> be even/odd?).

Can't speak for all, but SVE2 certainly is.

> DEF_INTERNAL_OPTAB_HILO_FN also looks to be implicitly directed to
> widening operations (otherwise no signed/unsigned variants would be
> necessary).  What I don't understand is why we need an optab
> without _hi/_lo but in that case no signed/unsigned variant?
>
> Looks like all plus, plus_lo and plus_hi are commutative but
> only plus is widening?!  So is the setup that the vectorizer
> doesn't know about the split and uses 'plus' but then the
> expander performs the split?  It does look a bit awkward here
> (the plain 'plus' is just used for the scalar case during
> pattern recog it seems).
>
> I'd rather have DEF_INTERNAL_OPTAB_HILO_FN split up, declaring
> the hi/lo pairs and the scalar variant separately using
> DEF_INTERNAL_FN without expander for that, and having
> DEF_INTERNAL_HILO_WIDEN_OPTAB_FN and DEF_INTERNAL_EVENODD_WIDEN_OPTAB_FN
> for the signed/unsigned pairs?  (if we need that helper at all)
>
> Targets shouldn't need to implement the plain optab (it shouldn't
> exist) and the vectorizer should query the hi/lo or even/odd
> optabs for support instead.

I dread these kinds of review because I think I'm almost certain to
flatly contradict something I said last time round, but +1 FWIW.
It seems OK to define an ifn to represent the combined effect, for the
scalar case, but that shouldn't leak into optabs unless we actually want
to use the ifn for "real" scalar ops (as opposed to a temporary
placeholder during pattern recognition).

On the optabs/ifn bits:

> +static int
> +ifn_cmp (const void *a_, const void *b_)
> +{
> +  typedef std::pair<enum internal_fn, unsigned> ifn_pair;
> +  auto *a = (const std::pair<ifn_pair, optab> *)a_;
> +  auto *b = (const std::pair<ifn_pair, optab> *)b_;
> +  return (int) (a->first.first) - (b->first.first);
> +}
> +
> +/* Return the optab belonging to the given internal function NAME for the given
> +   SIGN or unknown_optab.  */
> +
> +optab
> +lookup_hilo_ifn_optab (enum internal_fn fn, unsigned sign)

There is no NAME parameter.  It also isn't clear what SIGN means:
does 1 mean unsigned or signed?  It would be better to use signop and
TYPE_SIGN IMO.

> +{
> +  typedef std::pair<enum internal_fn, unsigned> ifn_pair;
> +  typedef auto_vec <std::pair<ifn_pair, optab>>fn_to_optab_map_type;
> +  static fn_to_optab_map_type *fn_to_optab_map;
> +
> +  if (!fn_to_optab_map)
> +    {
> +      unsigned num
> +	= sizeof (internal_fn_hilo_keys_array) / sizeof (enum internal_fn);
> +      fn_to_optab_map = new fn_to_optab_map_type ();
> +      for (unsigned int i = 0; i < num - 1; ++i)
> +	{
> +	  enum internal_fn fn = internal_fn_hilo_keys_array[i];
> +	  optab v1 = internal_fn_hilo_values_array[2*i];
> +	  optab v2 = internal_fn_hilo_values_array[2*i + 1];
> +	  ifn_pair key1 (fn, 0);
> +	  fn_to_optab_map->safe_push ({key1, v1});
> +	  ifn_pair key2 (fn, 1);
> +	  fn_to_optab_map->safe_push ({key2, v2});
> +	}
> +	fn_to_optab_map->qsort (ifn_cmp);
> +    }
> +
> +  ifn_pair new_pair (fn, sign ? 1 : 0);
> +  optab tmp;
> +  std::pair<ifn_pair,optab> pair_wrap (new_pair, tmp);
> +  auto entry = fn_to_optab_map->bsearch (&pair_wrap, ifn_cmp);
> +  return entry != fn_to_optab_map->end () ? entry->second : unknown_optab;
> +}
> +

Do we need to use a map for this?  It seems like it follows mechanically
from the macro definition and could be handled using a switch statement
and preprocessor logic.

Also, it would be good to make direct_internal_fn_optab DTRT for this
case, rather than needing a separate function.
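
To make the switch-plus-preprocessor idea concrete, here is one possible shape, offered only as a sketch. Everything in it is a stand-in: HILO_OPTAB_LIST plays the role of the macro entries in internal-fn.def, the *_sketch enums replace the real internal_fn, optab and signop types, and the optab names are placeholders rather than the names the patch defines.

```cpp
#include <cassert>

// Stand-in for GCC's signop, so callers pass an explicit sign, not 0/1.
enum signop_sketch { SIGNED_SKETCH, UNSIGNED_SKETCH };

// Hypothetical list: IFN, signed optab, unsigned optab.
#define HILO_OPTAB_LIST(X)                                                    \
  X (IFN_VEC_WIDEN_PLUS_LO, vec_widen_sadd_lo_sketch, vec_widen_uadd_lo_sketch) \
  X (IFN_VEC_WIDEN_PLUS_HI, vec_widen_sadd_hi_sketch, vec_widen_uadd_hi_sketch)

enum ifn_sketch
{
#define DEF_ENTRY(FN, S, U) FN,
  HILO_OPTAB_LIST (DEF_ENTRY)
#undef DEF_ENTRY
  IFN_LAST_SKETCH
};

enum optab_sketch
{
  unknown_optab_sketch,
#define DEF_ENTRY(FN, S, U) S, U,
  HILO_OPTAB_LIST (DEF_ENTRY)
#undef DEF_ENTRY
  last_optab_sketch
};

// The lookup follows mechanically from the list: one generated case per
// entry, no map to build, sort or search at run time.
optab_sketch
lookup_optab_sketch (ifn_sketch fn, signop_sketch sign)
{
  switch (fn)
    {
#define DEF_ENTRY(FN, S, U) \
    case FN:                \
      return sign == UNSIGNED_SKETCH ? U : S;
      HILO_OPTAB_LIST (DEF_ENTRY)
#undef DEF_ENTRY
    default:
      return unknown_optab_sketch;
    }
}
```

Compared with the map in the posted patch, this needs no lazy initialisation or comparison callback, and an unknown function falls through to the default case.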

> +extern void
> +lookup_hilo_internal_fn (enum internal_fn ifn, enum internal_fn *lo,
> +			  enum internal_fn *hi)
> +{
> +  gcc_assert (decomposes_to_hilo_fn_p (ifn));
> +
> +  *lo = internal_fn (ifn + 1);
> +  *hi = internal_fn (ifn + 2);
> +}

Nit: spurious extern.  Function needs a comment.  There have been
requests to drop redundant "enum" keywords from new code.
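
For readers following along: the quoted function only works if the _LO and _HI enumerators sit immediately after the combined one. A minimal model of that layout dependency is sketched below. The enum is a stand-in for internal_fn, the *_sketch names are invented, and the static_asserts are an addition for illustration, not something the patch contains.

```cpp
#include <cassert>

// Stand-in enum laid out the way the quoted code assumes:
// combined function first, then its _LO and _HI variants.
enum ifn_sketch
{
  IFN_VEC_WIDEN_PLUS,
  IFN_VEC_WIDEN_PLUS_LO,
  IFN_VEC_WIDEN_PLUS_HI,
  IFN_VEC_WIDEN_MINUS,
  IFN_VEC_WIDEN_MINUS_LO,
  IFN_VEC_WIDEN_MINUS_HI,
  IFN_LAST_SKETCH
};

// Pin the layout at compile time so a reordering cannot go unnoticed.
static_assert (IFN_VEC_WIDEN_PLUS_LO == IFN_VEC_WIDEN_PLUS + 1, "lo follows base");
static_assert (IFN_VEC_WIDEN_PLUS_HI == IFN_VEC_WIDEN_PLUS + 2, "hi follows lo");
static_assert (IFN_VEC_WIDEN_MINUS_LO == IFN_VEC_WIDEN_MINUS + 1, "lo follows base");
static_assert (IFN_VEC_WIDEN_MINUS_HI == IFN_VEC_WIDEN_MINUS + 2, "hi follows lo");

// Return true if FN is a combined function that splits into _LO/_HI.
bool
decomposes_to_hilo_sketch (ifn_sketch fn)
{
  return fn == IFN_VEC_WIDEN_PLUS || fn == IFN_VEC_WIDEN_MINUS;
}

// Given a combined function IFN, store its _LO and _HI variants.
void
lookup_hilo_sketch (ifn_sketch ifn, ifn_sketch *lo, ifn_sketch *hi)
{
  assert (decomposes_to_hilo_sketch (ifn));
  *lo = ifn_sketch (ifn + 1);
  *hi = ifn_sketch (ifn + 2);
}
```

If the enumerators come from a macro that always emits the three names together, the adjacency holds by construction and the asserts merely document it.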

> +/* Return true if FN decomposes to _hi and _lo IFN.  If true this should also
> +   be a widening function.  */
> +
> +bool
> +decomposes_to_hilo_fn_p (internal_fn fn)
> +{
> +  if (!widening_fn_p (fn))
> +    return false;
> +
> +  switch (fn)
> +    {
> +    case IFN_VEC_WIDEN_PLUS:
> +    case IFN_VEC_WIDEN_MINUS:
> +      return true;
> +
> +    default:
> +      return false;
> +    }
> +}
> +

Similarly here I think we should use the preprocessor.  It isn't clear
why this returns false for !widening_fn_p.  Narrowing hi/lo functions
would decompose in a similar way.

As a general comment, how about naming the new macro:

  DEF_INTERNAL_SIGNED_HILO_OPTAB_FN

and make it invoke DEF_INTERNAL_SIGNED_OPTAB_FN twice, once for
the hi and once for the lo?

The new optabs need to be documented in md.texi.  I think it'd be
better to drop the "l" suffix in "addl" and "subl", since that's an
Arm convention and is redundant with the earlier "widen".

Sorry for the nitpicks and thanks for picking up this work.

Richard

Andre Vieira (lists) May 12, 2023, 12:16 p.m. UTC | #3

I think I have dealt with most of your comments. There are quite a few 
changes, and I think it's all a bit simpler now. I also made some other 
changes to the costing in tree-inline.cc and to gimple-range-op.cc, in 
which I try to preserve the same behaviour as we had with the tree codes 
before. I also added some extra checks to tree-cfg.cc that made sense to me.

I am still regression testing the gimple-range-op change, as that was a 
last-minute change, but the rest survived a bootstrap and regression 
test on aarch64-unknown-linux-gnu.

cover letter:

This patch replaces the existing tree_code widen_plus and widen_minus
patterns with internal_fn versions.

DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and 
DEF_INTERNAL_OPTAB_NARROWING_HILO_FN are like 
DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively 
except they provide convenience wrappers for defining conversions that 
require a hi/lo split.  Each definition for <NAME> will require optabs 
for _hi and _lo and each of those will also require a signed and 
unsigned version in the case of widening. The hi/lo pair is necessary 
because the widening and narrowing operations take n narrow elements as 
inputs and return n/2 wide elements as outputs. The 'lo' operation 
operates on the first n/2 elements of input. The 'hi' operation operates 
on the second n/2 elements of input. Defining an internal_fn along with 
hi/lo variations allows a single internal function to be returned from a 
vect_recog function that will later be expanded to hi/lo.


  For example:
  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
  for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
               IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> -> (u/s)addl

This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS 
tree codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.

gcc/ChangeLog:

2023-05-12  Andre Vieira  <andre.simoesdiasvieira@arm.com>
             Joel Hutton  <joel.hutton@arm.com>
             Tamar Christina  <tamar.christina@arm.com>

	* config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): Rename
	this ...
	(vec_widen_<su>add_lo_<mode>): ... to this.
	(vec_widen_<su>addl_hi_<mode>): Rename this ...
	(vec_widen_<su>add_hi_<mode>): ... to this.
	(vec_widen_<su>subl_lo_<mode>): Rename this ...
	(vec_widen_<su>sub_lo_<mode>): ... to this.
	(vec_widen_<su>subl_hi_<mode>): Rename this ...
	(vec_widen_<su>sub_hi_<mode>): ... to this.
	* doc/generic.texi: Document new IFN codes.
	* internal-fn.cc (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Macro to define
	an internal_fn that expands into multiple internal_fns for widening.
	(DEF_INTERNAL_OPTAB_NARROWING_HILO_FN): Likewise but for narrowing.
	(ifn_cmp): Function to compare ifn's for sorting/searching.
	(lookup_hilo_internal_fn): Add lookup function.
	(commutative_binary_fn_p): Add widen_plus fn's.
	(widening_fn_p): New function.
	(narrowing_fn_p): New function.
	(decomposes_to_hilo_fn_p): New function.
	(direct_internal_fn_optab): Change visibility.
	* internal-fn.def (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Define widening
	plus, minus functions.
	(VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS_EXPR tree code.
	(VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS_EXPR tree code.
	* internal-fn.h (GCC_INTERNAL_FN_H): Add headers.
	(direct_internal_fn_optab): Declare new prototype.
	(lookup_hilo_internal_fn): Likewise.
	(widening_fn_p): Likewise.
	(narrowing_fn_p): Likewise.
	(decomposes_to_hilo_fn_p): Likewise.
	* optabs.cc (commutative_optab_p): Add widening plus optabs.
	* optabs.def (OPTAB_D): Define widen add, sub optabs.
	* tree-cfg.cc (verify_gimple_call): Add checks for new widen
	add and sub IFNs.
	* tree-inline.cc (estimate_num_insns): Return same
	cost for widen add and sub IFNs as previous tree_codes.
	* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
	patterns with a hi/lo split.
	(vect_recog_sad_pattern): Refactor to use new IFN codes.
	(vect_recog_widen_plus_pattern): Likewise.
	(vect_recog_widen_minus_pattern): Likewise.
	(vect_recog_average_pattern): Likewise.
	* tree-vect-stmts.cc (vectorizable_conversion): Add support for
	_HILO IFNs.
	(supportable_widening_operation): Likewise.
	* tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-widen-add.c: Test that new
     IFN_VEC_WIDEN_PLUS is being used.
	* gcc.target/aarch64/vect-widen-sub.c: Test that new
     IFN_VEC_WIDEN_MINUS is being used.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bfc98a8d943467b33390defab9682f44efab5907..ffbbecb9409e1c2835d658c2a8855cd0e955c0f2 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4626,7 +4626,7 @@
   [(set_attr "type" "neon_<ADDSUB:optab>_long")]
 )
 
-(define_expand "vec_widen_<su>addl_lo_<mode>"
+(define_expand "vec_widen_<su>add_lo_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
@@ -4638,7 +4638,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_<su>addl_hi_<mode>"
+(define_expand "vec_widen_<su>add_hi_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
@@ -4650,7 +4650,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_<su>subl_lo_<mode>"
+(define_expand "vec_widen_<su>sub_lo_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
@@ -4662,7 +4662,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_<su>subl_hi_<mode>"
+(define_expand "vec_widen_<su>sub_hi_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 8b2882da4fe7da07d22b4e5384d049ba7d3907bf..0fd7e6cce8bbd4ecb8027b702722adcf6c32eb55 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1811,6 +1811,10 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}.
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
 @tindex VEC_WIDEN_MULT_LO_EXPR
+@tindex IFN_VEC_WIDEN_PLUS_HI
+@tindex IFN_VEC_WIDEN_PLUS_LO
+@tindex IFN_VEC_WIDEN_MINUS_HI
+@tindex IFN_VEC_WIDEN_MINUS_LO
 @tindex VEC_WIDEN_PLUS_HI_EXPR
 @tindex VEC_WIDEN_PLUS_LO_EXPR
 @tindex VEC_WIDEN_MINUS_HI_EXPR
@@ -1861,6 +1865,33 @@ vector of @code{N/2} products. In the case of @code{VEC_WIDEN_MULT_LO_EXPR} the
 low @code{N/2} elements of the two vector are multiplied to produce the
 vector of @code{N/2} products.
 
+@item IFN_VEC_WIDEN_PLUS_HI
+@itemx IFN_VEC_WIDEN_PLUS_LO
+These internal functions represent widening vector addition of the high and low
+parts of the two input vectors, respectively.  Their operands are vectors that
+contain the same number of elements (@code{N}) of the same integral type. The
+result is a vector that contains half as many elements, of an integral type
+whose size is twice as wide.  In the case of @code{IFN_VEC_WIDEN_PLUS_HI} the
+high @code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} sums.  In the case of @code{IFN_VEC_WIDEN_PLUS_LO} the low
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} sums.
+
+@item IFN_VEC_WIDEN_MINUS_HI
+@itemx IFN_VEC_WIDEN_MINUS_LO
+These internal functions represent widening vector subtraction of the high and
+low parts of the two input vectors, respectively.  Their operands are vectors
+that contain the same number of elements (@code{N}) of the same integral type.
+The high/low elements of the second vector are subtracted from the high/low
+elements of the first. The result is a vector that contains half as many
+elements, of an integral type whose size is twice as wide.  In the case of
+@code{IFN_VEC_WIDEN_MINUS_HI} the high @code{N/2} elements of the second
+vector are subtracted from the high @code{N/2} of the first to produce the
+vector of @code{N/2} differences.  In the case of
+@code{IFN_VEC_WIDEN_MINUS_LO} the low @code{N/2} elements of the second
+vector are subtracted from the low @code{N/2} of the first to produce the
+vector of @code{N/2} differences.
+
 @item VEC_WIDEN_PLUS_HI_EXPR
 @itemx VEC_WIDEN_PLUS_LO_EXPR
 These nodes represent widening vector addition of the high and low parts of
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 594bd3043f0e944299ddfff219f757ef15a3dd61..66636d82df27626e7911efd0cb8526921b39633f 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1187,6 +1187,7 @@ gimple_range_op_handler::maybe_non_standard ()
 {
   range_operator *signed_op = ptr_op_widen_mult_signed;
   range_operator *unsigned_op = ptr_op_widen_mult_unsigned;
+  bool signed1, signed2, signed_ret;
   if (gimple_code (m_stmt) == GIMPLE_ASSIGN)
     switch (gimple_assign_rhs_code (m_stmt))
       {
@@ -1202,32 +1203,55 @@ gimple_range_op_handler::maybe_non_standard ()
 	  m_op1 = gimple_assign_rhs1 (m_stmt);
 	  m_op2 = gimple_assign_rhs2 (m_stmt);
 	  tree ret = gimple_assign_lhs (m_stmt);
-	  bool signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
-	  bool signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
-	  bool signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED;
-
-	  /* Normally these operands should all have the same sign, but
-	     some passes and violate this by taking mismatched sign args.  At
-	     the moment the only one that's possible is mismatch inputs and
-	     unsigned output.  Once ranger supports signs for the operands we
-	     can properly fix it,  for now only accept the case we can do
-	     correctly.  */
-	  if ((signed1 ^ signed2) && signed_ret)
-	    return;
-
-	  m_valid = true;
-	  if (signed2 && !signed1)
-	    std::swap (m_op1, m_op2);
-
-	  if (signed1 || signed2)
-	    m_int = signed_op;
-	  else
-	    m_int = unsigned_op;
+	  signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
+	  signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
+	  signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED;
 	  break;
 	}
 	default:
-	  break;
+	  return;
       }
+  else if (gimple_code (m_stmt) == GIMPLE_CALL
+      && gimple_call_internal_p (m_stmt)
+      && gimple_get_lhs (m_stmt) != NULL_TREE)
+    switch (gimple_call_internal_fn (m_stmt))
+      {
+      case IFN_VEC_WIDEN_PLUS_LO:
+      case IFN_VEC_WIDEN_PLUS_HI:
+	  {
+	    signed_op = ptr_op_widen_plus_signed;
+	    unsigned_op = ptr_op_widen_plus_unsigned;
+	    m_valid = false;
+	    m_op1 = gimple_call_arg (m_stmt, 0);
+	    m_op2 = gimple_call_arg (m_stmt, 1);
+	    tree ret = gimple_get_lhs (m_stmt);
+	    signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
+	    signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
+	    signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED;
+	    break;
+	  }
+      default:
+	return;
+      }
+  else
+    return;
+
+    /* Normally these operands should all have the same sign, but some passes
+       violate this by taking mismatched sign args.  At the moment the only
+       one that's possible is mismatched inputs and unsigned output.  Once
+       ranger supports signs for the operands we can properly fix it; for now
+       only accept the case we can do correctly.  */
+    if ((signed1 ^ signed2) && signed_ret)
+      return;
+
+    m_valid = true;
+    if (signed2 && !signed1)
+      std::swap (m_op1, m_op2);
+
+    if (signed1 || signed2)
+      m_int = signed_op;
+    else
+      m_int = unsigned_op;
 }
 
 // Set up a gimple_range_op_handler for any built in function which can be
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 5c9da73ea11f8060b18dcf513599c9694fa4f2ad..1acea5ae33046b70de247b1688aea874d9956abc 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -90,6 +90,19 @@ lookup_internal_fn (const char *name)
   return entry ? *entry : IFN_LAST;
 }
 
+/*  Given an internal_fn IFN that is a HILO function, return its corresponding
+    LO and HI internal_fns.  */
+
+extern void
+lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi)
+{
+  gcc_assert (decomposes_to_hilo_fn_p (ifn));
+
+  *lo = internal_fn (ifn + 1);
+  *hi = internal_fn (ifn + 2);
+}
+
+
 /* Fnspec of each internal function, indexed by function number.  */
 const_tree internal_fn_fnspec_array[IFN_LAST + 1];
 
@@ -137,7 +150,16 @@ const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_OPTAB_FN(CODE, FLAGS, OPTAB, TYPE) TYPE##_direct,
 #define DEF_INTERNAL_SIGNED_OPTAB_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \
 				     UNSIGNED_OPTAB, TYPE) TYPE##_direct,
+#undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+#undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+#define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \
+					    UNSIGNED_OPTAB, TYPE)		  \
+TYPE##_direct, TYPE##_direct, TYPE##_direct,
+#define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(CODE, FLAGS, OPTAB, TYPE)	\
+TYPE##_direct, TYPE##_direct, TYPE##_direct,
 #include "internal-fn.def"
+#undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+#undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
   not_direct
 };
 
@@ -3852,7 +3874,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 
 /* Return the optab used by internal function FN.  */
 
-static optab
+optab
 direct_internal_fn_optab (internal_fn fn, tree_pair types)
 {
   switch (fn)
@@ -3971,6 +3993,9 @@ commutative_binary_fn_p (internal_fn fn)
     case IFN_UBSAN_CHECK_MUL:
     case IFN_ADD_OVERFLOW:
     case IFN_MUL_OVERFLOW:
+    case IFN_VEC_WIDEN_PLUS_HILO:
+    case IFN_VEC_WIDEN_PLUS_LO:
+    case IFN_VEC_WIDEN_PLUS_HI:
       return true;
 
     default:
@@ -4044,6 +4069,88 @@ first_commutative_argument (internal_fn fn)
     }
 }
 
+/* Return true if this CODE describes an internal_fn that returns a vector with
+   elements twice as wide as the element size of the input vectors.  */
+
+bool
+widening_fn_p (code_helper code)
+{
+  if (!code.is_fn_code ())
+    return false;
+
+  if (!internal_fn_p ((combined_fn) code))
+    return false;
+
+  internal_fn fn = as_internal_fn ((combined_fn) code);
+  switch (fn)
+    {
+    #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+    #define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(NAME, F, S, SO, UO, T) \
+    case IFN_##NAME##_HILO:\
+    case IFN_##NAME##_HI: \
+    case IFN_##NAME##_LO: \
+      return true;
+    #include "internal-fn.def"
+    #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+
+    default:
+      return false;
+    }
+}
+
+/* Return true if this CODE describes an internal_fn that returns a vector with
+   elements twice as narrow as the element size of the input vectors.  */
+
+bool
+narrowing_fn_p (code_helper code)
+{
+  if (!code.is_fn_code ())
+    return false;
+
+  if (!internal_fn_p ((combined_fn) code))
+    return false;
+
+  internal_fn fn = as_internal_fn ((combined_fn) code);
+  switch (fn)
+    {
+    #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+    #define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(NAME, F, O, T) \
+    case IFN_##NAME##_HILO:\
+    case IFN_##NAME##_HI: \
+    case IFN_##NAME##_LO: \
+      return true;
+    #include "internal-fn.def"
+    #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+
+    default:
+      return false;
+    }
+}
+
+/* Return true if FN decomposes to _hi and _lo IFN.  */
+
+bool
+decomposes_to_hilo_fn_p (internal_fn fn)
+{
+  switch (fn)
+    {
+    #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+    #define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(NAME, F, S, SO, UO, T) \
+    case IFN_##NAME##_HILO:\
+      return true;
+    #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+    #define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(NAME, F, O, T) \
+    case IFN_##NAME##_HILO:\
+      return true;
+    #include "internal-fn.def"
+    #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+    #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+
+    default:
+      return false;
+    }
+}
+
 /* Return true if IFN_SET_EDOM is supported.  */
 
 bool
@@ -4071,7 +4178,33 @@ set_edom_supported_p (void)
     optab which_optab = direct_internal_fn_optab (fn, types);		\
     expand_##TYPE##_optab_fn (fn, stmt, which_optab);			\
   }
+#define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(CODE, FLAGS, SELECTOR,	    \
+					    SIGNED_OPTAB, UNSIGNED_OPTAB,   \
+					    TYPE)			    \
+  static void								    \
+  expand_##CODE##_HILO (internal_fn fn ATTRIBUTE_UNUSED,		    \
+			gcall *stmt ATTRIBUTE_UNUSED)			    \
+  {									    \
+    gcc_unreachable ();							    \
+  }									    \
+  DEF_INTERNAL_SIGNED_OPTAB_FN(CODE##_HI, FLAGS, SELECTOR, SIGNED_OPTAB,    \
+			       UNSIGNED_OPTAB, TYPE)			    \
+  DEF_INTERNAL_SIGNED_OPTAB_FN(CODE##_LO, FLAGS, SELECTOR, SIGNED_OPTAB,    \
+			       UNSIGNED_OPTAB, TYPE)
+#define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(CODE, FLAGS, OPTAB, TYPE)	\
+  static void								\
+  expand_##CODE##_HILO (internal_fn fn ATTRIBUTE_UNUSED,		\
+			gcall *stmt ATTRIBUTE_UNUSED)			\
+  {									\
+    gcc_unreachable ();							\
+  }									\
+  DEF_INTERNAL_OPTAB_FN(CODE##_LO, FLAGS, OPTAB, TYPE)			\
+  DEF_INTERNAL_OPTAB_FN(CODE##_HI, FLAGS, OPTAB, TYPE)
 #include "internal-fn.def"
+#undef DEF_INTERNAL_OPTAB_FN
+#undef DEF_INTERNAL_SIGNED_OPTAB_FN
+#undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+#undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
 
 /* Routines to expand each internal function, indexed by function number.
    Each routine has the prototype:
@@ -4080,6 +4213,7 @@ set_edom_supported_p (void)
 
    where STMT is the statement that performs the call. */
 static void (*const internal_fn_expanders[]) (internal_fn, gcall *) = {
+
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) expand_##CODE,
 #include "internal-fn.def"
   0
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..012dd323b86dd7cfcc5c13d3a2bb2a453937155d 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -85,6 +85,13 @@ along with GCC; see the file COPYING3.  If not see
    says that the function extends the C-level BUILT_IN_<NAME>{,L,LL,IMAX}
    group of functions to any integral mode (including vector modes).
 
+   DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+   are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN
+   respectively, except they provide convenience wrappers for defining
+   conversions that require a hi/lo split, like widening and narrowing
+   operations.  Each definition for <NAME> requires _lo and _hi optabs, and
+   for widening each of those needs a signed and an unsigned version.
+
    Each entry must have a corresponding expander of the form:
 
      void expand_NAME (gimple_call stmt)
@@ -123,6 +130,20 @@ along with GCC; see the file COPYING3.  If not see
   DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE)
 #endif
 
+#ifndef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+#define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE) \
+  DEF_INTERNAL_FN (NAME##_HILO, FLAGS | ECF_LEAF, NULL) \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _LO, FLAGS, SELECTOR, SOPTAB##_lo, UOPTAB##_lo, TYPE) \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _HI, FLAGS, SELECTOR, SOPTAB##_hi, UOPTAB##_hi, TYPE)
+#endif
+
+#ifndef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+#define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(NAME, FLAGS, OPTAB, TYPE) \
+  DEF_INTERNAL_FN (NAME##_HILO, FLAGS | ECF_LEAF, NULL) \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _LO, FLAGS, OPTAB##_lo, TYPE) \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _HI, FLAGS, OPTAB##_hi, TYPE)
+#endif
+
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD, ECF_PURE, maskload, mask_load)
 DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_CONST, vec_load_lanes, load_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
@@ -315,6 +336,16 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
 DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
+DEF_INTERNAL_OPTAB_WIDENING_HILO_FN (VEC_WIDEN_PLUS,
+				     ECF_CONST | ECF_NOTHROW,
+				     first,
+				     vec_widen_sadd, vec_widen_uadd,
+				     binary)
+DEF_INTERNAL_OPTAB_WIDENING_HILO_FN (VEC_WIDEN_MINUS,
+				     ECF_CONST | ECF_NOTHROW,
+				     first,
+				     vec_widen_ssub, vec_widen_usub,
+				     binary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 08922ed4254898f5fffca3f33973e96ed9ce772f..8ba07d6d1338e75bc5a451d9e403112a608f3ea2 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -20,6 +20,10 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_INTERNAL_FN_H
 #define GCC_INTERNAL_FN_H
 
+#include "insn-codes.h"
+#include "insn-opinit.h"
+
+
 /* INTEGER_CST values for IFN_UNIQUE function arg-0.
 
    UNSPEC: Undifferentiated UNIQUE.
@@ -112,6 +116,8 @@ internal_fn_name (enum internal_fn fn)
 }
 
 extern internal_fn lookup_internal_fn (const char *);
+extern void lookup_hilo_internal_fn (internal_fn, internal_fn *, internal_fn *);
+extern optab direct_internal_fn_optab (internal_fn, tree_pair);
 
 /* Return the ECF_* flags for function FN.  */
 
@@ -210,6 +216,9 @@ extern bool commutative_binary_fn_p (internal_fn);
 extern bool commutative_ternary_fn_p (internal_fn);
 extern int first_commutative_argument (internal_fn);
 extern bool associative_binary_fn_p (internal_fn);
+extern bool widening_fn_p (code_helper);
+extern bool narrowing_fn_p (code_helper);
+extern bool decomposes_to_hilo_fn_p (internal_fn);
 
 extern bool set_edom_supported_p (void);
 
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index c8e39c82d57a7d726e7da33d247b80f32ec9236c..5a08d91e550b2d92e9572211f811fdba99a33a38 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -1314,7 +1314,15 @@ commutative_optab_p (optab binoptab)
 	  || binoptab == smul_widen_optab
 	  || binoptab == umul_widen_optab
 	  || binoptab == smul_highpart_optab
-	  || binoptab == umul_highpart_optab);
+	  || binoptab == umul_highpart_optab
+	  || binoptab == vec_widen_saddl_hi_optab
+	  || binoptab == vec_widen_saddl_lo_optab
+	  || binoptab == vec_widen_uaddl_hi_optab
+	  || binoptab == vec_widen_uaddl_lo_optab
+	  || binoptab == vec_widen_sadd_hi_optab
+	  || binoptab == vec_widen_sadd_lo_optab
+	  || binoptab == vec_widen_uadd_hi_optab
+	  || binoptab == vec_widen_uadd_lo_optab);
 }
 
 /* X is to be used in mode MODE as operand OPN to BINOPTAB.  If we're
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 695f5911b300c9ca5737de9be809fa01aabe5e01..16d121722c8c5723d9b164f5a2c616dc7ec143de 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -410,6 +410,10 @@ OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a")
 OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a")
 OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a")
 OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a")
+OPTAB_D (vec_widen_ssub_hi_optab, "vec_widen_ssub_hi_$a")
+OPTAB_D (vec_widen_ssub_lo_optab, "vec_widen_ssub_lo_$a")
+OPTAB_D (vec_widen_sadd_hi_optab, "vec_widen_sadd_hi_$a")
+OPTAB_D (vec_widen_sadd_lo_optab, "vec_widen_sadd_lo_$a")
 OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
 OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
 OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
@@ -422,6 +426,10 @@ OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
 OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
 OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
 OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
+OPTAB_D (vec_widen_usub_hi_optab, "vec_widen_usub_hi_$a")
+OPTAB_D (vec_widen_usub_lo_optab, "vec_widen_usub_lo_$a")
+OPTAB_D (vec_widen_uadd_hi_optab, "vec_widen_uadd_hi_$a")
+OPTAB_D (vec_widen_uadd_lo_optab, "vec_widen_uadd_lo_$a")
 OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
 OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
 OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
index 220bd9352a4c7acd2e3713e441d74898d3e92b30..7037673d32bd780e1c9b58a51e58e2bac3b30b7e 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -save-temps" } */
+/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */
 #include <stdint.h>
 #include <string.h>
 
@@ -86,6 +86,8 @@ main()
     return 0;
 }
 
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_LO" "vect"   } } */
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_HI" "vect"   } } */
 /* { dg-final { scan-assembler-times {\tuaddl\t} 1} } */
 /* { dg-final { scan-assembler-times {\tuaddl2\t} 1} } */
 /* { dg-final { scan-assembler-times {\tsaddl\t} 1} } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
index a2bed63affbd091977df95a126da1f5b8c1d41d2..83bc1edb6105f47114b665e24a13e6194b2179a2 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -save-temps" } */
+/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */
 #include <stdint.h>
 #include <string.h>
 
@@ -86,6 +86,8 @@ main()
     return 0;
 }
 
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_LO" "vect"   } } */
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_HI" "vect"   } } */
 /* { dg-final { scan-assembler-times {\tusubl\t} 1} } */
 /* { dg-final { scan-assembler-times {\tusubl2\t} 1} } */
 /* { dg-final { scan-assembler-times {\tssubl\t} 1} } */
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 0aeebb67fac864db284985f4a6f0653af281d62b..28464ad9e3a7ea25557ffebcdbdbc1340f9e0d8b 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "asan.h"
 #include "profile.h"
 #include "sreal.h"
+#include "internal-fn.h"
 
 /* This file contains functions for building the Control Flow Graph (CFG)
    for a function tree.  */
@@ -3411,6 +3412,52 @@ verify_gimple_call (gcall *stmt)
 	  debug_generic_stmt (fn);
 	  return true;
 	}
+      internal_fn ifn = gimple_call_internal_fn (stmt);
+      if (ifn == IFN_LAST)
+	{
+	  error ("gimple call has an invalid IFN");
+	  debug_generic_stmt (fn);
+	  return true;
+	}
+      else if (decomposes_to_hilo_fn_p (ifn))
+	{
+	  /* Non-decomposed HILO stmts should not appear in the IL; they are
+	     merely used as an internal representation in the auto-vectorizer
+	     pass and should have been expanded to their _LO/_HI variants.  */
+	  error ("gimple call has a non-decomposed HILO IFN");
+	  debug_generic_stmt (fn);
+	  return true;
+	}
+      else if (ifn == IFN_VEC_WIDEN_PLUS_LO
+	       || ifn == IFN_VEC_WIDEN_PLUS_HI
+	       || ifn == IFN_VEC_WIDEN_MINUS_LO
+	       || ifn == IFN_VEC_WIDEN_MINUS_HI)
+	{
+	  tree rhs1_type = TREE_TYPE (gimple_call_arg (stmt, 0));
+	  tree rhs2_type = TREE_TYPE (gimple_call_arg (stmt, 1));
+	  tree lhs_type = TREE_TYPE (gimple_get_lhs (stmt));
+	  if (TREE_CODE (lhs_type) == VECTOR_TYPE)
+	    {
+	      if (TREE_CODE (rhs1_type) != VECTOR_TYPE
+		  || TREE_CODE (rhs2_type) != VECTOR_TYPE)
+		{
+		  error ("invalid non-vector operands in vector IFN call");
+		  debug_generic_stmt (fn);
+		  return true;
+		}
+	      lhs_type = TREE_TYPE (lhs_type);
+	      rhs1_type = TREE_TYPE (rhs1_type);
+	      rhs2_type = TREE_TYPE (rhs2_type);
+	    }
+	  if (POINTER_TYPE_P (lhs_type)
+	      || POINTER_TYPE_P (rhs1_type)
+	      || POINTER_TYPE_P (rhs2_type))
+	    {
+	      error ("invalid (pointer) operands in vector IFN call");
+	      debug_generic_stmt (fn);
+	      return true;
+	    }
+	}
     }
   else
     {
diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index 63a19f8d1d89c6bd5d8e55a299cbffaa324b4b84..d74d8db2173b1ab117250fea89de5212d5e354ec 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -4433,7 +4433,20 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)
 	tree decl;
 
 	if (gimple_call_internal_p (stmt))
-	  return 0;
+	  {
+	    internal_fn fn = gimple_call_internal_fn (stmt);
+	    switch (fn)
+	      {
+	      case IFN_VEC_WIDEN_PLUS_HI:
+	      case IFN_VEC_WIDEN_PLUS_LO:
+	      case IFN_VEC_WIDEN_MINUS_HI:
+	      case IFN_VEC_WIDEN_MINUS_LO:
+		return 1;
+
+	      default:
+		return 0;
+	      }
+	  }
 	else if ((decl = gimple_call_fndecl (stmt))
 		 && fndecl_built_in_p (decl))
 	  {
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 1778af0242898e3dc73d94d22a5b8505628a53b5..93cebc72beb4f65249a69b2665dfeb8a0991c1d1 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -562,21 +562,30 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
 
 static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
-		      tree_code widened_code, bool shift_p,
+		      code_helper widened_code, bool shift_p,
 		      unsigned int max_nops,
 		      vect_unpromoted_value *unprom, tree *common_type,
 		      enum optab_subtype *subtype = NULL)
 {
   /* Check for an integer operation with the right code.  */
-  gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
-  if (!assign)
+  gimple *stmt = stmt_info->stmt;
+  if (!(is_gimple_assign (stmt) || is_gimple_call (stmt)))
+    return 0;
+
+  code_helper rhs_code;
+  if (is_gimple_assign (stmt))
+    rhs_code = gimple_assign_rhs_code (stmt);
+  else if (is_gimple_call (stmt))
+    rhs_code = gimple_call_combined_fn (stmt);
+  else
     return 0;
 
-  tree_code rhs_code = gimple_assign_rhs_code (assign);
-  if (rhs_code != code && rhs_code != widened_code)
+  if (rhs_code != code
+      && rhs_code != widened_code)
     return 0;
 
-  tree type = TREE_TYPE (gimple_assign_lhs (assign));
+  tree lhs = gimple_get_lhs (stmt);
+  tree type = TREE_TYPE (lhs);
   if (!INTEGRAL_TYPE_P (type))
     return 0;
 
@@ -589,7 +598,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
     {
       vect_unpromoted_value *this_unprom = &unprom[next_op];
       unsigned int nops = 1;
-      tree op = gimple_op (assign, i + 1);
+      tree op = gimple_arg (stmt, i);
       if (i == 1 && TREE_CODE (op) == INTEGER_CST)
 	{
 	  /* We already have a common type from earlier operands.
@@ -1343,7 +1352,8 @@ vect_recog_sad_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom[2];
-  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_MINUS_EXPR,
+  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR,
+			     IFN_VEC_WIDEN_MINUS_HILO,
 			     false, 2, unprom, &half_type))
     return NULL;
 
@@ -1395,14 +1405,16 @@ static gimple *
 vect_recog_widen_op_pattern (vec_info *vinfo,
 			     stmt_vec_info last_stmt_info, tree *type_out,
 			     tree_code orig_code, code_helper wide_code,
-			     bool shift_p, const char *name)
+			     bool shift_p, const char *name,
+			     optab_subtype *subtype = NULL)
 {
   gimple *last_stmt = last_stmt_info->stmt;
 
   vect_unpromoted_value unprom[2];
   tree half_type;
   if (!vect_widened_op_tree (vinfo, last_stmt_info, orig_code, orig_code,
-			     shift_p, 2, unprom, &half_type))
+			     shift_p, 2, unprom, &half_type, subtype))
+
     return NULL;
 
   /* Pattern detected.  */
@@ -1468,6 +1480,20 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
 			      type, pattern_stmt, vecctype);
 }
 
+static gimple *
+vect_recog_widen_op_pattern (vec_info *vinfo,
+			     stmt_vec_info last_stmt_info, tree *type_out,
+			     tree_code orig_code, internal_fn wide_ifn,
+			     bool shift_p, const char *name,
+			     optab_subtype *subtype = NULL)
+{
+  combined_fn ifn = as_combined_fn (wide_ifn);
+  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+				      orig_code, ifn, shift_p, name,
+				      subtype);
+}
+
+
 /* Try to detect multiplication on widened inputs, converting MULT_EXPR
    to WIDEN_MULT_EXPR.  See vect_recog_widen_op_pattern for details.  */
 
@@ -1481,26 +1507,30 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 }
 
 /* Try to detect addition on widened inputs, converting PLUS_EXPR
-   to WIDEN_PLUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
+   to IFN_VEC_WIDEN_PLUS_HILO.  See vect_recog_widen_op_pattern for details.  */
 
 static gimple *
 vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 			       tree *type_out)
 {
+  optab_subtype subtype;
   return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
-				      PLUS_EXPR, WIDEN_PLUS_EXPR, false,
-				      "vect_recog_widen_plus_pattern");
+				      PLUS_EXPR, IFN_VEC_WIDEN_PLUS_HILO,
+				      false, "vect_recog_widen_plus_pattern",
+				      &subtype);
 }
 
 /* Try to detect subtraction on widened inputs, converting MINUS_EXPR
-   to WIDEN_MINUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
+   to IFN_VEC_WIDEN_MINUS_HILO.  See vect_recog_widen_op_pattern for details.  */
 static gimple *
 vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 			       tree *type_out)
 {
+  optab_subtype subtype;
   return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
-				      MINUS_EXPR, WIDEN_MINUS_EXPR, false,
-				      "vect_recog_widen_minus_pattern");
+				      MINUS_EXPR, IFN_VEC_WIDEN_MINUS_HILO,
+				      false, "vect_recog_widen_minus_pattern",
+				      &subtype);
 }
 
 /* Function vect_recog_ctz_ffs_pattern
@@ -3078,7 +3108,7 @@ vect_recog_average_pattern (vec_info *vinfo,
   vect_unpromoted_value unprom[3];
   tree new_type;
   unsigned int nops = vect_widened_op_tree (vinfo, plus_stmt_info, PLUS_EXPR,
-					    WIDEN_PLUS_EXPR, false, 3,
+					    IFN_VEC_WIDEN_PLUS_HILO, false, 3,
 					    unprom, &new_type);
   if (nops == 0)
     return NULL;
@@ -6469,6 +6499,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_mask_conversion_pattern, "mask_conversion" },
   { vect_recog_widen_plus_pattern, "widen_plus" },
   { vect_recog_widen_minus_pattern, "widen_minus" },
+  /* These must come after the double widening ones.  */
 };
 
 const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index d152ae9ab10b361b88c0f839d6951c43b954750a..24c811ebe01fb8b003100dea494cf64fea72a975 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5038,7 +5038,9 @@ vectorizable_conversion (vec_info *vinfo,
   bool widen_arith = (code == WIDEN_PLUS_EXPR
 		 || code == WIDEN_MINUS_EXPR
 		 || code == WIDEN_MULT_EXPR
-		 || code == WIDEN_LSHIFT_EXPR);
+		 || code == WIDEN_LSHIFT_EXPR
+		 || code == IFN_VEC_WIDEN_PLUS_HILO
+		 || code == IFN_VEC_WIDEN_MINUS_HILO);
 
   if (!widen_arith
       && !CONVERT_EXPR_CODE_P (code)
@@ -5088,7 +5090,9 @@ vectorizable_conversion (vec_info *vinfo,
       gcc_assert (code == WIDEN_MULT_EXPR
 		  || code == WIDEN_LSHIFT_EXPR
 		  || code == WIDEN_PLUS_EXPR
-		  || code == WIDEN_MINUS_EXPR);
+		  || code == WIDEN_MINUS_EXPR
+		  || code == IFN_VEC_WIDEN_PLUS_HILO
+		  || code == IFN_VEC_WIDEN_MINUS_HILO);
 
 
       op1 = is_gimple_assign (stmt) ? gimple_assign_rhs2 (stmt) :
@@ -12478,10 +12482,43 @@ supportable_widening_operation (vec_info *vinfo,
       optab1 = vec_unpacks_sbool_lo_optab;
       optab2 = vec_unpacks_sbool_hi_optab;
     }
-  else
+
+  if (code.is_fn_code ())
+    {
+      internal_fn ifn = as_internal_fn ((combined_fn) code);
+      gcc_assert (decomposes_to_hilo_fn_p (ifn));
+
+      internal_fn lo, hi;
+      lookup_hilo_internal_fn (ifn, &lo, &hi);
+      *code1 = as_combined_fn (lo);
+      *code2 = as_combined_fn (hi);
+      optab1 = direct_internal_fn_optab (lo, {vectype, vectype});
+      optab2 = direct_internal_fn_optab (hi, {vectype, vectype});
+    }
+  else if (code.is_tree_code ())
     {
-      optab1 = optab_for_tree_code (c1, vectype, optab_default);
-      optab2 = optab_for_tree_code (c2, vectype, optab_default);
+      if (code == FIX_TRUNC_EXPR)
+	{
+	  /* The signedness is determined from output operand.  */
+	  optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
+	  optab2 = optab_for_tree_code (c2, vectype_out, optab_default);
+	}
+      else if (CONVERT_EXPR_CODE_P ((tree_code) code.safe_as_tree_code ())
+	       && VECTOR_BOOLEAN_TYPE_P (wide_vectype)
+	       && VECTOR_BOOLEAN_TYPE_P (vectype)
+	       && TYPE_MODE (wide_vectype) == TYPE_MODE (vectype)
+	       && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
+	{
+	  /* If the input and result modes are the same, a different optab
+	     is needed where we pass in the number of units in vectype.  */
+	  optab1 = vec_unpacks_sbool_lo_optab;
+	  optab2 = vec_unpacks_sbool_hi_optab;
+	}
+      else
+	{
+	  optab1 = optab_for_tree_code (c1, vectype, optab_default);
+	  optab2 = optab_for_tree_code (c2, vectype, optab_default);
+	}
     }
 
   if (!optab1 || !optab2)
diff --git a/gcc/tree.def b/gcc/tree.def
index 90ceeec0b512bfa5f983359c0af03cc71de32007..b37b0b35927b92a6536e5c2d9805ffce8319a240 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1374,15 +1374,16 @@ DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3)
 DEFTREECODE (WIDEN_SUM_EXPR, "widen_sum_expr", tcc_binary, 2)
 
 /* Widening sad (sum of absolute differences).
-   The first two arguments are of type t1 which should be integer.
-   The third argument and the result are of type t2, such that t2 is at least
-   twice the size of t1.  Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is
+   The first two arguments are of type t1 which should be a vector of integers.
+   The third argument and the result are of type t2, such that the size of
+   the elements of t2 is at least twice the size of the elements of t1.
+   Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is
    equivalent to:
-       tmp = WIDEN_MINUS_EXPR (arg1, arg2)
+       tmp = IFN_VEC_WIDEN_MINUS_EXPR (arg1, arg2)
        tmp2 = ABS_EXPR (tmp)
        arg3 = PLUS_EXPR (tmp2, arg3)
   or:
-       tmp = WIDEN_MINUS_EXPR (arg1, arg2)
+       tmp = IFN_VEC_WIDEN_MINUS_EXPR (arg1, arg2)
        tmp2 = ABS_EXPR (tmp)
        arg3 = WIDEN_SUM_EXPR (tmp2, arg3)
  */
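
As a reading aid (not part of the patch): lookup_hilo_internal_fn above
computes the _LO and _HI codes as ifn + 1 and ifn + 2.  That only works if
each HILO definition emits its three enumerators consecutively, in the order
_HILO, _LO, _HI, which is the order used by the default
DEF_INTERNAL_OPTAB_WIDENING_HILO_FN in internal-fn.def.  A minimal sketch of
that layout assumption follows; every demo_* / DEMO_* name is hypothetical
and stands in for GCC's real enum and .def file.

```cpp
#include <cassert>

// Hypothetical stand-in for internal-fn.def: each entry listed here expands
// to three consecutive enumerators, in the order _HILO, _LO, _HI.
#define DEMO_HILO_FNS(X) \
  X (WIDEN_PLUS)         \
  X (WIDEN_MINUS)

enum demo_fn
{
#define DEMO_ENUM(NAME) DEMO_##NAME##_HILO, DEMO_##NAME##_LO, DEMO_##NAME##_HI,
  DEMO_HILO_FNS (DEMO_ENUM)
#undef DEMO_ENUM
  DEMO_LAST
};

// True only for the combined (not yet decomposed) entries.
static bool
demo_decomposes_to_hilo_p (demo_fn fn)
{
  switch (fn)
    {
#define DEMO_CASE(NAME) case DEMO_##NAME##_HILO: return true;
      DEMO_HILO_FNS (DEMO_CASE)
#undef DEMO_CASE
    default:
      return false;
    }
}

// Mirrors the ifn + 1 / ifn + 2 arithmetic.  It is valid only because the
// three enumerators of one entry are adjacent and ordered _HILO, _LO, _HI.
static void
demo_lookup_hilo (demo_fn fn, demo_fn *lo, demo_fn *hi)
{
  assert (demo_decomposes_to_hilo_p (fn));
  *lo = static_cast<demo_fn> (fn + 1);
  *hi = static_cast<demo_fn> (fn + 2);
}
```

If the enumerator order in the .def macro were ever changed, the offset
arithmetic would silently return the wrong codes, so the two have to be kept
in sync.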

Richard Biener May 12, 2023, 1:28 p.m. UTC | #4

On Fri, 12 May 2023, Andre Vieira (lists) wrote:

> I have dealt with, I think..., most of your comments. There's quite a few
> changes, I think it's all a bit simpler now. I made some other changes to the
> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
> the same behaviour as we had with the tree codes before. Also added some extra
> checks to tree-cfg.cc that made sense to me.
> 
> I am still regression testing the gimple-range-op change, as that was a last
> minute change, but the rest survived a bootstrap and regression test on
> aarch64-unknown-linux-gnu.
> 
> cover letter:
> 
> This patch replaces the existing tree_code widen_plus and widen_minus
> patterns with internal_fn versions.
> 
> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
> except they provide convenience wrappers for defining conversions that require
> a hi/lo split.  Each definition for <NAME> will require optabs for _hi and _lo
> and each of those will also require a signed and unsigned version in the case
> of widening. The hi/lo pair is necessary because the widening and narrowing
> operations take n narrow elements as inputs and return n/2 wide elements as
> outputs. The 'lo' operation operates on the first n/2 elements of input. The
> 'hi' operation operates on the second n/2 elements of input. Defining an
> internal_fn along with hi/lo variations allows a single internal function to
> be returned from a vect_recog function that will later be expanded to hi/lo.
> 
> 
> For example:
>   IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> for aarch64:
>   IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
>   IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> -> (u/s)addl
> 
> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
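
For readers following the thread, the hi/lo split described in the cover
letter can be modelled on plain arrays.  This is only a scalar sketch under
stated assumptions: widen_plus_half is a hypothetical helper, the element
types are fixed to uint8_t -> uint16_t, and "lo" simply means the first n/2
elements as in the cover letter (the lane-to-half mapping on real hardware
is target-specific).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of an unsigned widening add on one half of the input.
// hi == false selects the first n/2 elements, hi == true the second n/2.
static std::vector<uint16_t>
widen_plus_half (const std::vector<uint8_t> &a,
                 const std::vector<uint8_t> &b, bool hi)
{
  assert (a.size () == b.size () && a.size () % 2 == 0);
  const std::size_t half = a.size () / 2;
  const std::size_t base = hi ? half : 0;
  std::vector<uint16_t> out (half);
  for (std::size_t i = 0; i < half; ++i)
    // Integer promotion performs the add in int, so the sum of two 8-bit
    // values (at most 510) is kept in full before being stored as 16 bits.
    out[i] = static_cast<uint16_t> (a[base + i] + b[base + i]);
  return out;
}
```

Calling it once with hi == false and once with hi == true covers all n
inputs and yields two vectors of n/2 wide elements, which is why one
combined function has to expand into a _LO/_HI pair.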

What I still don't understand is why we are so narrowly focused on
HI/LO.  We need a combined scalar IFN for pattern selection (not
sure why that's now called _HILO, I expected no suffix).  Then there
are three ways the target can implement this:

 1) with a widen_[su]add<mode> instruction - I _think_ that's what
    RISCV is going to offer since it is a target where vector modes
    have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
    RVV can do a V4HI to V4SI widening and widening add/subtract
    using vwadd[u] and vwsub[u] (the HI->SI widening is actually
    done with a widening add of zero - eh).
    IIRC GCN is the same here.
 2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
    codes currently support (exclusively)
 3) similar, but widen_[su]add{_even,_odd}<mode>

that said, things like decomposes_to_hilo_fn_p look to paint us into
a 2) corner without good reason.
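
To make the difference between options 1) and 3) concrete, here is a scalar
sketch on plain arrays.  It is an illustration only: widen_plus_full,
widen_plus_even_odd and interleave are hypothetical helpers, not GCC or
target APIs, and the element types are fixed to int16_t -> int32_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using narrow_vec = std::vector<int16_t>;
using wide_vec = std::vector<int32_t>;

// Option 1: a single operation widens all n lanes (n narrow in, n wide out).
static wide_vec
widen_plus_full (const narrow_vec &a, const narrow_vec &b)
{
  assert (a.size () == b.size ());
  wide_vec out (a.size ());
  for (std::size_t i = 0; i < a.size (); ++i)
    out[i] = static_cast<int32_t> (a[i]) + static_cast<int32_t> (b[i]);
  return out;
}

// Option 3: one operation handles the even lanes (odd == false) or the odd
// lanes (odd == true); each call yields n/2 wide results.
static wide_vec
widen_plus_even_odd (const narrow_vec &a, const narrow_vec &b, bool odd)
{
  assert (a.size () == b.size () && a.size () % 2 == 0);
  wide_vec out (a.size () / 2);
  for (std::size_t i = 0; i < out.size (); ++i)
    {
      const std::size_t lane = 2 * i + (odd ? 1 : 0);
      out[i] = static_cast<int32_t> (a[lane]) + static_cast<int32_t> (b[lane]);
    }
  return out;
}

// Interleave the even and odd results back into the original lane order.
static wide_vec
interleave (const wide_vec &even, const wide_vec &odd)
{
  assert (even.size () == odd.size ());
  wide_vec out;
  out.reserve (2 * even.size ());
  for (std::size_t i = 0; i < even.size (); ++i)
    {
      out.push_back (even[i]);
      out.push_back (odd[i]);
    }
  return out;
}
```

All variants compute the same per-lane sums; they differ only in how many
lanes one operation covers and in where each result ends up, which is the
part a lo/hi-only interface cannot express.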

Richard.

> gcc/ChangeLog:
> 
> 2023-05-12  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>             Joel Hutton  <joel.hutton@arm.com>
>             Tamar Christina  <tamar.christina@arm.com>
> 
> 	* config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>):
> 	Rename this ...
> 	(vec_widen_<su>add_lo_<mode>): ... to this.
> 	(vec_widen_<su>addl_hi_<mode>): Rename this ...
> 	(vec_widen_<su>add_hi_<mode>): ... to this.
> 	(vec_widen_<su>subl_lo_<mode>): Rename this ...
> 	(vec_widen_<su>sub_lo_<mode>): ... to this.
> 	(vec_widen_<su>subl_hi_<mode>): Rename this ...
> 	(vec_widen_<su>sub_hi_<mode>): ... to this.
> 	* doc/generic.texi: Document new IFN codes.
> 	* internal-fn.cc (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Macro to
> 	define an internal_fn that expands into multiple internal_fns for
> 	widening.
> 	(DEF_INTERNAL_OPTAB_NARROWING_HILO_FN): Likewise but for narrowing.
> 	(ifn_cmp): Function to compare ifn's for sorting/searching.
> 	(lookup_hilo_internal_fn): Add lookup function.
> 	(commutative_binary_fn_p): Add widen_plus fn's.
> 	(widening_fn_p): New function.
> 	(narrowing_fn_p): New function.
> 	(decomposes_to_hilo_fn_p): New function.
> 	(direct_internal_fn_optab): Change visibility.
> 	* internal-fn.def (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Define
> 	widening plus, minus functions.
> 	(VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS_EXPR tree code.
> 	(VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS_EXPR tree code.
> 	* internal-fn.h (GCC_INTERNAL_FN_H): Add headers.
> 	(direct_internal_fn_optab): Declare new prototype.
> 	(lookup_hilo_internal_fn): Likewise.
> 	(widening_fn_p): Likewise.
> 	(narrowing_fn_p): Likewise.
> 	(decomposes_to_hilo_fn_p): Likewise.
> 	* optabs.cc (commutative_optab_p): Add widening plus optabs.
> 	* optabs.def (OPTAB_D): Define widen add, sub optabs.
> 	* tree-cfg.cc (verify_gimple_call): Add checks for new widen
> 	add and sub IFNs.
> 	* tree-inline.cc (estimate_num_insns): Return same
> 	cost for widen add and sub IFNs as previous tree_codes.
> 	* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
> 	patterns with a hi/lo split.
> 	(vect_recog_sad_pattern): Refactor to use new IFN codes.
> 	(vect_recog_widen_plus_pattern): Likewise.
> 	(vect_recog_widen_minus_pattern): Likewise.
> 	(vect_recog_average_pattern): Likewise.
> 	* tree-vect-stmts.cc (vectorizable_conversion): Add support for
> 	_HILO IFNs.
> 	(supportable_widening_operation): Likewise.
> 	* tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/aarch64/vect-widen-add.c: Test that new
> 	IFN_VEC_WIDEN_PLUS is being used.
> 	* gcc.target/aarch64/vect-widen-sub.c: Test that new
> 	IFN_VEC_WIDEN_MINUS is being used.
>

Andre Vieira (lists) May 12, 2023, 1:55 p.m. UTC | #5

On 12/05/2023 14:28, Richard Biener wrote:
> On Fri, 12 May 2023, Andre Vieira (lists) wrote:
> 
>> I have dealt with, I think..., most of your comments. There's quite a few
>> changes, I think it's all a bit simpler now. I made some other changes to the
>> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
>> the same behaviour as we had with the tree codes before. Also added some extra
>> checks to tree-cfg.cc that made sense to me.
>>
>> I am still regression testing the gimple-range-op change, as that was a last
>> minute change, but the rest survived a bootstrap and regression test on
>> aarch64-unknown-linux-gnu.
>>
>> cover letter:
>>
>> This patch replaces the existing tree_code widen_plus and widen_minus
>> patterns with internal_fn versions.
>>
>> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
>> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
>> except they provide convenience wrappers for defining conversions that require
>> a hi/lo split.  Each definition for <NAME> will require optabs for _hi and _lo
>> and each of those will also require a signed and unsigned version in the case
>> of widening. The hi/lo pair is necessary because the widening and narrowing
>> operations take n narrow elements as inputs and return n/2 wide elements as
>> outputs. The 'lo' operation operates on the first n/2 elements of input. The
>> 'hi' operation operates on the second n/2 elements of input. Defining an
>> internal_fn along with hi/lo variations allows a single internal function to
>> be returned from a vect_recog function that will later be expanded to hi/lo.
>>
>>
>>   For example:
>>   IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
>> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_<su>add_hi_<mode> ->
>> (u/s)addl2
>>                         IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_<su>add_lo_<mode>
>> -> (u/s)addl
>>
>> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
>> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
> 
> What I still don't understand is how we are so narrowly focused on
> HI/LO?  We need a combined scalar IFN for pattern selection (not
> sure why that's now called _HILO, I expected no suffix).  Then there's
> three possibilities the target can implement this:
> 
>   1) with a widen_[su]add<mode> instruction - I _think_ that's what
>      RISCV is going to offer since it is a target where vector modes
>      have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
>      RVV can do a V4HI to V4SI widening and widening add/subtract
>      using vwadd[u] and vwsub[u] (the HI->SI widening is actually
>      done with a widening add of zero - eh).
>      IIRC GCN is the same here.
>   2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
>      codes currently support (exclusively)
>   3) similar, but widen_[su]add{_even,_odd}<mode>
> 
> that said, things like decomposes_to_hilo_fn_p look to paint us into
> a 2) corner without good reason.

I was mostly just keeping the existing naming. I had forgotten to mention 
that I was also going to add _EVENODD, but you are right: the pattern 
selection IFN does not need to be restrictive.

And then in supportable_widening_operation we could check what the 
target offers support for (1, 2 or 3). We can then get rid of 
decomposes_to_hilo_fn_p and just assume that for all narrowing or 
widening IFNs there are optabs (that may or may not be implemented by a 
target) for all three variants.

Having said that, that means we should have an optab to cover 1, which 
should probably just have the original name. Let me write it out...

Say we have an IFN_VEC_WIDEN_PLUS pattern and assume it's signed. 
supportable_widening_operation would then first check whether the target 
supports vec_widen_sadd_optab for, say, V8HI -> V8SI. RISC-V would take 
this path, I guess?

If the target doesn't then it could check for support for:
vec_widen_sadd_lo_optab V4HI -> V4SI
vec_widen_sadd_hi_optab V4HI -> V4SI

AArch64 Advanced SIMD would implement this.

If the target still didn't support this it would check for (not sure 
about the modes here):
vec_widen_sadd_even_optab VNx8HI -> VNx4SI
vec_widen_sadd_odd_optab VNx8HI -> VNx4SI

This is the one SVE would implement.
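
A rough standalone sketch of that lookup order, in C++ rather than real GCC
code. TargetOptabs, WidenStrategy, WidenChoice and choose_widening_variant are
hypothetical names invented for illustration, and optabs are modelled as plain
strings; a real implementation would query the target's optab tables with the
actual vector modes:

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for "the optabs this target implements".
using TargetOptabs = std::set<std::string>;

// The three ways a target might implement a widening operation.
enum class WidenStrategy { Unsplit, HiLo, EvenOdd };

struct WidenChoice
{
  WidenStrategy strategy;
  std::vector<std::string> optabs;  // one name for Unsplit, two otherwise
};

// True if the target implements every optab in NAMES.
static bool
supports_all (const TargetOptabs &target, const std::vector<std::string> &names)
{
  for (const std::string &name : names)
    if (target.count (name) == 0)
      return false;
  return true;
}

// Try the unsplit optab first, then the lo/hi pair, then the even/odd pair.
// A pair only counts if both halves are available.
std::optional<WidenChoice>
choose_widening_variant (const TargetOptabs &target, const std::string &base)
{
  assert (!base.empty ());
  const std::vector<WidenChoice> candidates = {
    {WidenStrategy::Unsplit, {base}},
    {WidenStrategy::HiLo, {base + "_lo", base + "_hi"}},
    {WidenStrategy::EvenOdd, {base + "_even", base + "_odd"}},
  };
  for (const WidenChoice &candidate : candidates)
    if (supports_all (target, candidate.optabs))
      return candidate;
  return std::nullopt;
}
```

With a target that only provides vec_widen_sadd_lo and vec_widen_sadd_hi, the
sketch picks the HiLo strategy; a target providing only one half of a pair
gets no match at all.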


So that would mean that I'd probably end up rewriting
#define DEF_INTERNAL_OPTAB_WIDENING_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)
as:
for 1)
   DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)

for 2)
   DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_LO, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)
   DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_HI, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)

for 3)
   DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_EVEN, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)
   DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_ODD, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)

And the same for narrowing (but with DEF_INTERNAL_OPTAB_FN instead of 
SIGNED_OPTAB).

So each widening and narrowing IFN would have optabs for all its 
variants and each target implements the ones it supports.
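
The grouping idea can be sketched with an X-macro. This is a simplified
illustration, not the actual internal-fn.def machinery:
FOR_EACH_WIDENING_VARIANT, FOR_EACH_WIDENING_FN, sketch_internal_fn and
sketch_fn_name are made-up names, and the FLAGS, SELECTOR, optab and TYPE
arguments are dropped to keep it short:

```cpp
#include <cassert>
#include <string>

// One widening NAME expands to the unsplit function plus the
// lo/hi and even/odd pairs (hypothetical grouping macro).
#define FOR_EACH_WIDENING_VARIANT(DEF, NAME) \
  DEF (NAME)                                 \
  DEF (NAME##_LO)                            \
  DEF (NAME##_HI)                            \
  DEF (NAME##_EVEN)                          \
  DEF (NAME##_ODD)

// The list of grouped widening functions; adding an operation here
// automatically gives it every variant.
#define FOR_EACH_WIDENING_FN(DEF)                 \
  FOR_EACH_WIDENING_VARIANT (DEF, VEC_WIDEN_PLUS) \
  FOR_EACH_WIDENING_VARIANT (DEF, VEC_WIDEN_MINUS)

// First expansion: the enum of function codes.
#define DEF_ENUM(CODE) IFN_##CODE,
enum sketch_internal_fn { FOR_EACH_WIDENING_FN (DEF_ENUM) IFN_LAST };
#undef DEF_ENUM

// Second expansion: a parallel table of names, in the same order.
#define DEF_NAME(CODE) #CODE,
static const char *const sketch_fn_names[] = { FOR_EACH_WIDENING_FN (DEF_NAME) };
#undef DEF_NAME

// Look up the printable name of a function code.
std::string
sketch_fn_name (sketch_internal_fn fn)
{
  assert (fn < IFN_LAST);
  return sketch_fn_names[fn];
}
```

The point of the structure is that every operation in the list gets the same
set of variants, so hi/lo and even/odd support cannot drift apart between
operations.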

I'm happy to do this, but implementing support to handle variants 1 and 
3 without having optabs for them right now seems a bit odd, and it would 
delay this patch. So how about I add the framework and the optabs but 
leave adding the vectorizer support for later? I can add comments 
where I think that should go.

> Richard.
> 
>> gcc/ChangeLog:
>>
>> 2023-05-12  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>              Joel Hutton  <joel.hutton@arm.com>
>>              Tamar Christina  <tamar.christina@arm.com>
>>
>>         * config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>):
>>         Rename this ...
>>         (vec_widen_<su>add_lo_<mode>): ... to this.
>>         (vec_widen_<su>addl_hi_<mode>): Rename this ...
>>         (vec_widen_<su>add_hi_<mode>): ... to this.
>>         (vec_widen_<su>subl_lo_<mode>): Rename this ...
>>         (vec_widen_<su>sub_lo_<mode>): ... to this.
>>         (vec_widen_<su>subl_hi_<mode>): Rename this ...
>>         (vec_widen_<su>sub_hi_<mode>): ... to this.
>>         * doc/generic.texi: Document new IFN codes.
>>         * internal-fn.cc (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Macro to
>>         define an internal_fn that expands into multiple internal_fns
>>         for widening.
>>         (DEF_INTERNAL_OPTAB_NARROWING_HILO_FN): Likewise but for narrowing.
>>         (ifn_cmp): Function to compare ifn's for sorting/searching.
>>         (lookup_hilo_internal_fn): Add lookup function.
>>         (commutative_binary_fn_p): Add widen_plus fn's.
>>         (widening_fn_p): New function.
>>         (narrowing_fn_p): New function.
>>         (decomposes_to_hilo_fn_p): New function.
>>         (direct_internal_fn_optab): Change visibility.
>>         * internal-fn.def (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Define
>>         widening plus,minus functions.
>>         (VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS_EXPR tree code.
>>         (VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS_EXPR tree code.
>>         * internal-fn.h (GCC_INTERNAL_FN_H): Add headers.
>>         (direct_internal_fn_optab): Declare new prototype.
>>         (lookup_hilo_internal_fn): Likewise.
>>         (widening_fn_p): Likewise.
>>         (narrowing_fn_p): Likewise.
>>         (decomposes_to_hilo_fn_p): Likewise.
>>         * optabs.cc (commutative_optab_p): Add widening plus optabs.
>>         * optabs.def (OPTAB_D): Define widen add, sub optabs.
>>         * tree-cfg.cc (verify_gimple_call): Add checks for new widen
>>         add and sub IFNs.
>>         * tree-inline.cc (estimate_num_insns): Return same
>>         cost for widen add and sub IFNs as previous tree_codes.
>>         * tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
>>         patterns with a hi/lo split.
>>         (vect_recog_sad_pattern): Refactor to use new IFN codes.
>>         (vect_recog_widen_plus_pattern): Likewise.
>>         (vect_recog_widen_minus_pattern): Likewise.
>>         (vect_recog_average_pattern): Likewise.
>>         * tree-vect-stmts.cc (vectorizable_conversion): Add support for
>>         _HILO IFNs.
>>         (supportable_widening_operation): Likewise.
>>         * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.target/aarch64/vect-widen-add.c: Test that new
>>         IFN_VEC_WIDEN_PLUS is being used.
>>         * gcc.target/aarch64/vect-widen-sub.c: Test that new
>>         IFN_VEC_WIDEN_MINUS is being used.
>>
>

Richard Sandiford May 12, 2023, 2:01 p.m. UTC | #6

Richard Biener <rguenther@suse.de> writes:
> On Fri, 12 May 2023, Andre Vieira (lists) wrote:
>
>> I have dealt with, I think..., most of your comments. There's quite a few
>> changes, I think it's all a bit simpler now. I made some other changes to the
>> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
>> the same behaviour as we had with the tree codes before. Also added some extra
>> checks to tree-cfg.cc that made sense to me.
>> 
>> I am still regression testing the gimple-range-op change, as that was a last
>> minute change, but the rest survived a bootstrap and regression test on
>> aarch64-unknown-linux-gnu.
>> 
>> cover letter:
>> 
>> This patch replaces the existing tree_code widen_plus and widen_minus
>> patterns with internal_fn versions.
>> 
>> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
>> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
>> except they provide convenience wrappers for defining conversions that require
>> a hi/lo split.  Each definition for <NAME> will require optabs for _hi and _lo
>> and each of those will also require a signed and unsigned version in the case
>> of widening. The hi/lo pair is necessary because the widening and narrowing
>> operations take n narrow elements as inputs and return n/2 wide elements as
>> outputs. The 'lo' operation operates on the first n/2 elements of input. The
>> 'hi' operation operates on the second n/2 elements of input. Defining an
>> internal_fn along with hi/lo variations allows a single internal function to
>> be returned from a vect_recog function that will later be expanded to hi/lo.
>> 
>> 
>>  For example:
>>  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
>> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_<su>add_hi_<mode> ->
>> (u/s)addl2
>>                        IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_<su>add_lo_<mode>
>> -> (u/s)addl
>> 
>> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
>> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
>
> What I still don't understand is how we are so narrowly focused on
> HI/LO?  We need a combined scalar IFN for pattern selection (not
> sure why that's now called _HILO, I expected no suffix).  Then there's
> three possibilities the target can implement this:
>
>  1) with a widen_[su]add<mode> instruction - I _think_ that's what
>     RISCV is going to offer since it is a target where vector modes
>     have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
>     RVV can do a V4HI to V4SI widening and widening add/subtract
>     using vwadd[u] and vwsub[u] (the HI->SI widening is actually
>     done with a widening add of zero - eh).
>     IIRC GCN is the same here.

SVE currently does this too, but the addition and widening are
separate operations.  E.g. in principle there's no reason why
you can't sign-extend one operand, zero-extend the other, and
then add the result together.  Or you could extend them from
different sizes (QI and HI).  All of those are supported
(if the costing allows them).
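
In scalar terms, the separate-extension approach amounts to something like
the following C++ sketch. It is illustrative only: extend_then_add is a
hypothetical helper, with QI taken as int8_t, HI as uint16_t and SI as
int32_t, and it says nothing about what the vectorizer actually emits:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Extend each operand on its own (mixed sign, mixed size), then do an
// ordinary addition in the wide type.  No combined "widening add" is needed.
std::vector<int32_t>
extend_then_add (const std::vector<int8_t> &a, const std::vector<uint16_t> &b)
{
  assert (a.size () == b.size ());
  std::vector<int32_t> out (a.size ());
  for (std::size_t i = 0; i < a.size (); ++i)
    {
      int32_t wide_a = static_cast<int32_t> (a[i]);  // sign-extend QI -> SI
      int32_t wide_b = static_cast<int32_t> (b[i]);  // zero-extend HI -> SI
      out[i] = wide_a + wide_b;                      // plain SI addition
    }
  return out;
}
```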

If the target has operations to do combined extending and adding (or
whatever), then at the moment we rely on combine to generate them.

So I think this case is separate from Andre's work.  The addition
itself is just an ordinary addition, and any widening happens by
vectorising a CONVERT/NOP_EXPR.

>  2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
>     codes currently support (exclusively)
>  3) similar, but widen_[su]add{_even,_odd}<mode>
>
> that said, things like decomposes_to_hilo_fn_p look to paint us into
> a 2) corner without good reason.

I suppose one question is: how much of the patch is really specific
to HI/LO, and how much is just grouping two halves together?  The nice
thing about the internal-fn grouping macros is that, if (3) is
implemented in future, the structure will strongly encourage even/odd
pairs to be supported for all operations that support hi/lo.  That is,
I would expect the grouping macros to be extended to define even/odd
ifns alongside hi/lo ones, rather than adding separate definitions
for even/odd functions.

If so, at least from the internal-fn.* side of things, I think the question
is whether it's OK to stick with hilo names for now, or whether we should
use more forward-looking names.

Thanks,
Richard


Richard Biener May 15, 2023, 10:20 a.m. UTC | #7

On Fri, 12 May 2023, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Fri, 12 May 2023, Andre Vieira (lists) wrote:
> >
> >> I have dealt with, I think..., most of your comments. There's quite a few
> >> changes, I think it's all a bit simpler now. I made some other changes to the
> >> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
> >> the same behaviour as we had with the tree codes before. Also added some extra
> >> checks to tree-cfg.cc that made sense to me.
> >> 
> >> I am still regression testing the gimple-range-op change, as that was a last
> >> minute change, but the rest survived a bootstrap and regression test on
> >> aarch64-unknown-linux-gnu.
> >> 
> >> cover letter:
> >> 
> >> This patch replaces the existing tree_code widen_plus and widen_minus
> >> patterns with internal_fn versions.
> >> 
> >> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
> >> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
> >> except they provide convenience wrappers for defining conversions that require
> >> a hi/lo split.  Each definition for <NAME> will require optabs for _hi and _lo
> >> and each of those will also require a signed and unsigned version in the case
> >> of widening. The hi/lo pair is necessary because the widening and narrowing
> >> operations take n narrow elements as inputs and return n/2 wide elements as
> >> outputs. The 'lo' operation operates on the first n/2 elements of input. The
> >> 'hi' operation operates on the second n/2 elements of input. Defining an
> >> internal_fn along with hi/lo variations allows a single internal function to
> >> be returned from a vect_recog function that will later be expanded to hi/lo.
> >> 
> >> 
> >>  For example:
> >>  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> >> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_<su>add_hi_<mode> ->
> >> (u/s)addl2
> >>                        IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_<su>add_lo_<mode>
> >> -> (u/s)addl
> >> 
> >> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
> >> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
> >
> > What I still don't understand is how we are so narrowly focused on
> > HI/LO?  We need a combined scalar IFN for pattern selection (not
> > sure why that's now called _HILO, I expected no suffix).  Then there's
> > three possibilities the target can implement this:
> >
> >  1) with a widen_[su]add<mode> instruction - I _think_ that's what
> >     RISCV is going to offer since it is a target where vector modes
> >     have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
> >     RVV can do a V4HI to V4SI widening and widening add/subtract
> >     using vwadd[u] and vwsub[u] (the HI->SI widening is actually
> >     done with a widening add of zero - eh).
> >     IIRC GCN is the same here.
> 
> SVE currently does this too, but the addition and widening are
> separate operations.  E.g. in principle there's no reason why
> you can't sign-extend one operand, zero-extend the other, and
> then add the result together.  Or you could extend them from
> different sizes (QI and HI).  All of those are supported
> (if the costing allows them).

I see.  So why does the target expose widen_[su]add<mode> at all?

> If the target has operations to do combined extending and adding (or
> whatever), then at the moment we rely on combine to generate them.
> 
> So I think this case is separate from Andre's work.  The addition
> itself is just an ordinary addition, and any widening happens by
> vectorising a CONVERT/NOP_EXPR.
> 
> >  2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
> >     codes currently support (exclusively)
> >  3) similar, but widen_[su]add{_even,_odd}<mode>
> >
> > that said, things like decomposes_to_hilo_fn_p look to paint us into
> > a 2) corner without good reason.
> 
> I suppose one question is: how much of the patch is really specific
> to HI/LO, and how much is just grouping two halves together?

Yep, that I don't know for sure.

>  The nice
> thing about the internal-fn grouping macros is that, if (3) is
> implemented in future, the structure will strongly encourage even/odd
> pairs to be supported for all operations that support hi/lo.  That is,
> I would expect the grouping macros to be extended to define even/odd
> ifns alongside hi/lo ones, rather than adding separate definitions
> for even/odd functions.
> 
> If so, at least from the internal-fn.* side of things, I think the question
> is whether it's OK to stick with hilo names for now, or whether we should
> use more forward-looking names.

I think for parts that are independent we could use a more
forward-looking name.  Maybe _halves?  But I'm also not sure
how much of that is really needed (it seems to be tied around
optimizing optabs space?)

Richard.


Richard Sandiford May 15, 2023, 10:47 a.m. UTC | #8

Richard Biener <rguenther@suse.de> writes:
> On Fri, 12 May 2023, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > On Fri, 12 May 2023, Andre Vieira (lists) wrote:
>> >
>> >> I have dealt with, I think..., most of your comments. There's quite a few
>> >> changes, I think it's all a bit simpler now. I made some other changes to the
>> >> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
>> >> the same behaviour as we had with the tree codes before. Also added some extra
>> >> checks to tree-cfg.cc that made sense to me.
>> >> 
>> >> I am still regression testing the gimple-range-op change, as that was a last
>> >> minute change, but the rest survived a bootstrap and regression test on
>> >> aarch64-unknown-linux-gnu.
>> >> 
>> >> cover letter:
>> >> 
>> >> This patch replaces the existing tree_code widen_plus and widen_minus
>> >> patterns with internal_fn versions.
>> >> 
>> >> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
>> >> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
>> >> except they provide convenience wrappers for defining conversions that require
>> >> a hi/lo split.  Each definition for <NAME> will require optabs for _hi and _lo
>> >> and each of those will also require a signed and unsigned version in the case
>> >> of widening. The hi/lo pair is necessary because the widening and narrowing
>> >> operations take n narrow elements as inputs and return n/2 wide elements as
>> >> outputs. The 'lo' operation operates on the first n/2 elements of input. The
>> >> 'hi' operation operates on the second n/2 elements of input. Defining an
>> >> internal_fn along with hi/lo variations allows a single internal function to
>> >> be returned from a vect_recog function that will later be expanded to hi/lo.
>> >> 
>> >> 
>> >>  For example:
>> >>  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
>> >> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_<su>add_hi_<mode> ->
>> >> (u/s)addl2
>> >>                        IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_<su>add_lo_<mode>
>> >> -> (u/s)addl
>> >> 
>> >> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
>> >> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
>> >
>> > What I still don't understand is how we are so narrowly focused on
>> > HI/LO?  We need a combined scalar IFN for pattern selection (not
>> > sure why that's now called _HILO, I expected no suffix).  Then there's
>> > three possibilities the target can implement this:
>> >
>> >  1) with a widen_[su]add<mode> instruction - I _think_ that's what
>> >     RISCV is going to offer since it is a target where vector modes
>> >     have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
>> >     RVV can do a V4HI to V4SI widening and widening add/subtract
>> >     using vwadd[u] and vwsub[u] (the HI->SI widening is actually
>> >     done with a widening add of zero - eh).
>> >     IIRC GCN is the same here.
>> 
>> SVE currently does this too, but the addition and widening are
>> separate operations.  E.g. in principle there's no reason why
>> you can't sign-extend one operand, zero-extend the other, and
>> then add the result together.  Or you could extend them from
>> different sizes (QI and HI).  All of those are supported
>> (if the costing allows them).
>
> I see.  So why does the target expose widen_[su]add<mode> at all?

It shouldn't (need to) do that.  I don't think we should have an optab
for the unsplit operation.

At least on SVE, we really want the extensions to be fused with loads
(where possible) rather than with arithmetic.

We can still do the widening arithmetic in one go.  It's just that
fusing with the loads works for the mixed-sign and mixed-size cases,
and can handle more than just doubling the element size.

>> If the target has operations to do combined extending and adding (or
>> whatever), then at the moment we rely on combine to generate them.
>> 
>> So I think this case is separate from Andre's work.  The addition
>> itself is just an ordinary addition, and any widening happens by
>> vectorising a CONVERT/NOP_EXPR.
>> 
>> >  2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
>> >     codes currently support (exclusively)
>> >  3) similar, but widen_[su]add{_even,_odd}<mode>
>> >
>> > that said, things like decomposes_to_hilo_fn_p look to paint us into
>> > a 2) corner without good reason.
>> 
>> I suppose one question is: how much of the patch is really specific
>> to HI/LO, and how much is just grouping two halves together?
>
> Yep, that I don't know for sure.
>
>>  The nice
>> thing about the internal-fn grouping macros is that, if (3) is
>> implemented in future, the structure will strongly encourage even/odd
>> pairs to be supported for all operations that support hi/lo.  That is,
>> I would expect the grouping macros to be extended to define even/odd
>> ifns alongside hi/lo ones, rather than adding separate definitions
>> for even/odd functions.
>> 
>> If so, at least from the internal-fn.* side of things, I think the question
>> is whether it's OK to stick with hilo names for now, or whether we should
>> use more forward-looking names.
>
> I think for parts that are independent we could use a more
> forward-looking name.  Maybe _halves?

Using _halves for the ifn macros sounds good to me FWIW.

> But I'm also not sure
> how much of that is really needed (it seems to be tied around
> optimizing optabs space?)

Not sure what you mean by "that".  Optabs space shouldn't be a problem
though.  The optab encoding gives us a full int to play with, and it
could easily go up to 64 bits if necessary/convenient.

At least on the internal-fn.* side, the aim is really just to establish
a regular structure, so that we don't have arbitrary differences between
different widening operations, or too much cut-&-paste.

Thanks,
Richard

Richard Biener May 15, 2023, 11:01 a.m. UTC | #9

On Mon, 15 May 2023, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Fri, 12 May 2023, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> > On Fri, 12 May 2023, Andre Vieira (lists) wrote:
> >> >
> >> >> I have dealt with, I think..., most of your comments. There's quite a few
> >> >> changes, I think it's all a bit simpler now. I made some other changes to the
> >> >> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
> >> >> the same behaviour as we had with the tree codes before. Also added some extra
> >> >> checks to tree-cfg.cc that made sense to me.
> >> >> 
> >> >> I am still regression testing the gimple-range-op change, as that was a last
> >> >> minute change, but the rest survived a bootstrap and regression test on
> >> >> aarch64-unknown-linux-gnu.
> >> >> 
> >> >> cover letter:
> >> >> 
> >> >> This patch replaces the existing tree_code widen_plus and widen_minus
> >> >> patterns with internal_fn versions.
> >> >> 
> >> >> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
> >> >> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
> >> >> except they provide convenience wrappers for defining conversions that require
> >> >> a hi/lo split.  Each definition for <NAME> will require optabs for _hi and _lo
> >> >> and each of those will also require a signed and unsigned version in the case
> >> >> of widening. The hi/lo pair is necessary because the widening and narrowing
> >> >> operations take n narrow elements as inputs and return n/2 wide elements as
> >> >> outputs. The 'lo' operation operates on the first n/2 elements of input. The
> >> >> 'hi' operation operates on the second n/2 elements of input. Defining an
> >> >> internal_fn along with hi/lo variations allows a single internal function to
> >> >> be returned from a vect_recog function that will later be expanded to hi/lo.
> >> >> 
> >> >> 
> >> >>  For example:
> >> >>  IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> >> >>  for aarch64:
> >> >>    IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
> >> >>    IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> -> (u/s)addl
> >> >> 
> >> >> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
> >> >> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
> >> >
> >> > What I still don't understand is how we are so narrowly focused on
> >> > HI/LO?  We need a combined scalar IFN for pattern selection (not
> >> > sure why that's now called _HILO, I expected no suffix).  Then there's
> >> > three possibilities the target can implement this:
> >> >
> >> >  1) with a widen_[su]add<mode> instruction - I _think_ that's what
> >> >     RISCV is going to offer since it is a target where vector modes
> >> >     have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
> >> >     RVV can do a V4HI to V4SI widening and widening add/subtract
> >> >     using vwadd[u] and vwsub[u] (the HI->SI widening is actually
> >> >     done with a widening add of zero - eh).
> >> >     IIRC GCN is the same here.
> >> 
> >> SVE currently does this too, but the addition and widening are
> >> separate operations.  E.g. in principle there's no reason why
> >> you can't sign-extend one operand, zero-extend the other, and
> >> then add the result together.  Or you could extend them from
> >> different sizes (QI and HI).  All of those are supported
> >> (if the costing allows them).
> >
> > I see.  So why does the target then expose widen_[su]add<mode> at all?
> 
> It shouldn't (need to) do that.  I don't think we should have an optab
> for the unsplit operation.
> 
> At least on SVE, we really want the extensions to be fused with loads
> (where possible) rather than with arithmetic.
> 
> We can still do the widening arithmetic in one go.  It's just that
> fusing with the loads works for the mixed-sign and mixed-size cases,
> and can handle more than just doubling the element size.
> 
> >> If the target has operations to do combined extending and adding (or
> >> whatever), then at the moment we rely on combine to generate them.
> >> 
> >> So I think this case is separate from Andre's work.  The addition
> >> itself is just an ordinary addition, and any widening happens by
> >> vectorising a CONVERT/NOP_EXPR.
> >> 
> >> >  2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
> >> >     codes currently support (exclusively)
> >> >  3) similar, but widen_[su]add{_even,_odd}<mode>
> >> >
> >> > that said, things like decomposes_to_hilo_fn_p look to paint us into
> >> > a 2) corner without good reason.
> >> 
> >> I suppose one question is: how much of the patch is really specific
> >> to HI/LO, and how much is just grouping two halves together?
> >
> > Yep, that I don't know for sure.
> >
> >>  The nice
> >> thing about the internal-fn grouping macros is that, if (3) is
> >> implemented in future, the structure will strongly encourage even/odd
> >> pairs to be supported for all operations that support hi/lo.  That is,
> >> I would expect the grouping macros to be extended to define even/odd
> >> ifns alongside hi/lo ones, rather than adding separate definitions
> >> for even/odd functions.
> >> 
> >> If so, at least from the internal-fn.* side of things, I think the question
> >> is whether it's OK to stick with hilo names for now, or whether we should
> >> use more forward-looking names.
> >
> > I think for parts that are independent we could use a more
> > forward-looking name.  Maybe _halves?
> 
> Using _halves for the ifn macros sounds good to me FWIW.
> 
> > But I'm also not sure
> > how much of that is really needed (it seems to be tied around
> > optimizing optabs space?)
> 
> Not sure what you mean by "this".  Optabs space shouldn't be a problem
> though.  The optab encoding gives us a full int to play with, and it
> could easily go up to 64 bits if necessary/convenient.
> 
> At least on the internal-fn.* side, the aim is really just to establish
> a regular structure, so that we don't have arbitrary differences between
> different widening operations, or too much cut-&-paste.

Hmm, I'm looking at the need for the std::map and 
internal_fn_hilo_keys_array and internal_fn_hilo_values_array.
The vectorizer pieces contain

+  if (code.is_fn_code ())
+     {
+      internal_fn ifn = as_internal_fn ((combined_fn) code);
+      gcc_assert (decomposes_to_hilo_fn_p (ifn));
+
+      internal_fn lo, hi;
+      lookup_hilo_internal_fn (ifn, &lo, &hi);
+      *code1 = as_combined_fn (lo);
+      *code2 = as_combined_fn (hi);
+      optab1 = lookup_hilo_ifn_optab (lo, !TYPE_UNSIGNED (vectype));
+      optab2 = lookup_hilo_ifn_optab (hi, !TYPE_UNSIGNED (vectype));

so that tries to automatically associate the scalar widening IFN
with the set(s) of IFN pairs we can split to.  But then this
list should be static and there's no need to create a std::map?
Maybe gencfn-macros.cc can be enhanced to output these static
cases?  Or the vectorizer could (as it did previously) simply
open-code the handled cases (I guess since we deal with two
cases only now I'd prefer that).

Thanks,
Richard.
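
The open-coded alternative mentioned above could look roughly like the following. This is only a minimal sketch, not the patch's code: the enum `sketch_fn` and the function `split_widening_fn` are hypothetical stand-ins for `internal_fn` and the vectorizer's lookup, reduced to the two handled cases (plus and minus).

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's internal_fn enum; only the entries
// needed for this sketch are listed.
enum sketch_fn
{
  WIDEN_PLUS,
  WIDEN_PLUS_LO,
  WIDEN_PLUS_HI,
  WIDEN_MINUS,
  WIDEN_MINUS_LO,
  WIDEN_MINUS_HI,
  SKETCH_LAST
};

// Open-coded split of a combined widening function into its lo/hi pair.
// Returns false for anything that is not one of the two handled cases,
// so no run-time table or std::map is needed.
bool
split_widening_fn (sketch_fn fn, sketch_fn *lo, sketch_fn *hi)
{
  assert (lo && hi);
  switch (fn)
    {
    case WIDEN_PLUS:
      *lo = WIDEN_PLUS_LO;
      *hi = WIDEN_PLUS_HI;
      return true;

    case WIDEN_MINUS:
      *lo = WIDEN_MINUS_LO;
      *hi = WIDEN_MINUS_HI;
      return true;

    default:
      return false;
    }
}
```

With only two cases the switch stays short, and the mapping is fixed at compile time.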


> Thanks,
> Richard
>

Richard Sandiford May 15, 2023, 11:10 a.m. UTC | #10

Richard Biener <rguenther@suse.de> writes:
> On Mon, 15 May 2023, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > But I'm also not sure
>> > how much of that is really needed (it seems to be tied around
>> > optimizing optabs space?)
>> 
>> Not sure what you mean by "this".  Optabs space shouldn't be a problem
>> though.  The optab encoding gives us a full int to play with, and it
>> could easily go up to 64 bits if necessary/convenient.
>> 
>> At least on the internal-fn.* side, the aim is really just to establish
>> a regular structure, so that we don't have arbitrary differences between
>> different widening operations, or too much cut-&-paste.
>
> Hmm, I'm looking at the need for the std::map and 
> internal_fn_hilo_keys_array and internal_fn_hilo_values_array.
> The vectorizer pieces contain
>
> +  if (code.is_fn_code ())
> +     {
> +      internal_fn ifn = as_internal_fn ((combined_fn) code);
> +      gcc_assert (decomposes_to_hilo_fn_p (ifn));
> +
> +      internal_fn lo, hi;
> +      lookup_hilo_internal_fn (ifn, &lo, &hi);
> +      *code1 = as_combined_fn (lo);
> +      *code2 = as_combined_fn (hi);
> +      optab1 = lookup_hilo_ifn_optab (lo, !TYPE_UNSIGNED (vectype));
> +      optab2 = lookup_hilo_ifn_optab (hi, !TYPE_UNSIGNED (vectype));
>
> so that tries to automatically associate the scalar widening IFN
> with the set(s) of IFN pairs we can split to.  But then this
> list should be static and there's no need to create a std::map?
> Maybe gencfn-macros.cc can be enhanced to output these static
> cases?  Or the vectorizer could (as it did previously) simply
> open-code the handled cases (I guess since we deal with two
> cases only now I'd prefer that).

Ah, yeah, I pushed back against that too.  I think it should be possible
to do it using the preprocessor, if the macros are defined appropriately.
But if it isn't possible to do it with macros then I agree that a
generator would be better than initialisation within the compiler.

Thanks,
Richard
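
One way the preprocessor approach might work is an X-macro list that generates both the enum entries and the switch cases, so the mapping is static. The sketch below is illustrative only: `SKETCH_WIDENING_FNS`, `sketch_ifn` and `sketch_lookup_halves` are hypothetical names, and the list macro merely stands in for the relevant entries of internal-fn.def.

```cpp
#include <cassert>

// Hypothetical list macro standing in for the widening entries that
// internal-fn.def would provide.
#define SKETCH_WIDENING_FNS(X) \
  X (VEC_WIDEN_PLUS)           \
  X (VEC_WIDEN_MINUS)

// Each list entry expands to three consecutive enumerators:
// the combined function, then its _LO and _HI halves.
enum sketch_ifn
{
#define X(NAME) SK_##NAME, SK_##NAME##_LO, SK_##NAME##_HI,
  SKETCH_WIDENING_FNS (X)
#undef X
  SK_LAST
};

// The same list generates the switch cases, so adding an entry to the
// list extends the enum and the lookup together.
bool
sketch_lookup_halves (sketch_ifn fn, sketch_ifn *lo, sketch_ifn *hi)
{
  assert (lo && hi);
  switch (fn)
    {
#define X(NAME)             \
    case SK_##NAME:         \
      *lo = SK_##NAME##_LO; \
      *hi = SK_##NAME##_HI; \
      return true;
      SKETCH_WIDENING_FNS (X)
#undef X

    default:
      return false;
    }
}
```

Because the cases name the halves explicitly, the lookup does not depend on the enumerators being adjacent.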

Andre Vieira (lists) May 15, 2023, 11:53 a.m. UTC | #11

On 15/05/2023 12:01, Richard Biener wrote:
> On Mon, 15 May 2023, Richard Sandiford wrote:
> 
>> Richard Biener <rguenther@suse.de> writes:
>>> On Fri, 12 May 2023, Richard Sandiford wrote:
>>>
>>>> Richard Biener <rguenther@suse.de> writes:
>>>>> On Fri, 12 May 2023, Andre Vieira (lists) wrote:
>>>>>
>>>>>> I have dealt with, I think..., most of your comments. There's quite a few
>>>>>> changes, I think it's all a bit simpler now. I made some other changes to the
>>>>>> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
>>>>>> the same behaviour as we had with the tree codes before. Also added some extra
>>>>>> checks to tree-cfg.cc that made sense to me.
>>>>>>
>>>>>> I am still regression testing the gimple-range-op change, as that was a last
>>>>>> minute change, but the rest survived a bootstrap and regression test on
>>>>>> aarch64-unknown-linux-gnu.
>>>>>>
>>>>>> cover letter:
>>>>>>
>>>>>> This patch replaces the existing tree_code widen_plus and widen_minus
>>>>>> patterns with internal_fn versions.
>>>>>>
>>>>>> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
>>>>>> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
>>>>>> except they provide convenience wrappers for defining conversions that require
>>>>>> a hi/lo split.  Each definition for <NAME> will require optabs for _hi and _lo
>>>>>> and each of those will also require a signed and unsigned version in the case
>>>>>> of widening. The hi/lo pair is necessary because the widening and narrowing
>>>>>> operations take n narrow elements as inputs and return n/2 wide elements as
>>>>>> outputs. The 'lo' operation operates on the first n/2 elements of input. The
>>>>>> 'hi' operation operates on the second n/2 elements of input. Defining an
>>>>>> internal_fn along with hi/lo variations allows a single internal function to
>>>>>> be returned from a vect_recog function that will later be expanded to hi/lo.
>>>>>>
>>>>>>
>>>>>>   For example:
>>>>>>   IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
>>>>>>   for aarch64:
>>>>>>     IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
>>>>>>     IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> -> (u/s)addl
>>>>>>
>>>>>> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
>>>>>> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
>>>>>
>>>>> What I still don't understand is how we are so narrowly focused on
>>>>> HI/LO?  We need a combined scalar IFN for pattern selection (not
>>>>> sure why that's now called _HILO, I expected no suffix).  Then there's
>>>>> three possibilities the target can implement this:
>>>>>
>>>>>   1) with a widen_[su]add<mode> instruction - I _think_ that's what
>>>>>      RISCV is going to offer since it is a target where vector modes
>>>>>      have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
>>>>>      RVV can do a V4HI to V4SI widening and widening add/subtract
>>>>>      using vwadd[u] and vwsub[u] (the HI->SI widening is actually
>>>>>      done with a widening add of zero - eh).
>>>>>      IIRC GCN is the same here.
>>>>
>>>> SVE currently does this too, but the addition and widening are
>>>> separate operations.  E.g. in principle there's no reason why
>>>> you can't sign-extend one operand, zero-extend the other, and
>>>> then add the result together.  Or you could extend them from
>>>> different sizes (QI and HI).  All of those are supported
>>>> (if the costing allows them).
>>>
>>> I see.  So why does the target then expose widen_[su]add<mode> at all?
>>
>> It shouldn't (need to) do that.  I don't think we should have an optab
>> for the unsplit operation.
>>
>> At least on SVE, we really want the extensions to be fused with loads
>> (where possible) rather than with arithmetic.
>>
>> We can still do the widening arithmetic in one go.  It's just that
>> fusing with the loads works for the mixed-sign and mixed-size cases,
>> and can handle more than just doubling the element size.
>>
>>>> If the target has operations to do combined extending and adding (or
>>>> whatever), then at the moment we rely on combine to generate them.
>>>>
>>>> So I think this case is separate from Andre's work.  The addition
>>>> itself is just an ordinary addition, and any widening happens by
>>>> vectorising a CONVERT/NOP_EXPR.
>>>>
>>>>>   2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
>>>>>      codes currently support (exclusively)
>>>>>   3) similar, but widen_[su]add{_even,_odd}<mode>
>>>>>
>>>>> that said, things like decomposes_to_hilo_fn_p look to paint us into
>>>>> a 2) corner without good reason.
>>>>
>>>> I suppose one question is: how much of the patch is really specific
>>>> to HI/LO, and how much is just grouping two halves together?
>>>
>>> Yep, that I don't know for sure.
>>>
>>>>   The nice
>>>> thing about the internal-fn grouping macros is that, if (3) is
>>>> implemented in future, the structure will strongly encourage even/odd
>>>> pairs to be supported for all operations that support hi/lo.  That is,
>>>> I would expect the grouping macros to be extended to define even/odd
>>>> ifns alongside hi/lo ones, rather than adding separate definitions
>>>> for even/odd functions.
>>>>
>>>> If so, at least from the internal-fn.* side of things, I think the question
>>>> is whether it's OK to stick with hilo names for now, or whether we should
>>>> use more forward-looking names.
>>>
>>> I think for parts that are independent we could use a more
>>> forward-looking name.  Maybe _halves?
>>
>> Using _halves for the ifn macros sounds good to me FWIW.
>>
>>> But I'm also not sure
>>> how much of that is really needed (it seems to be tied around
>>> optimizing optabs space?)
>>
>> Not sure what you mean by "this".  Optabs space shouldn't be a problem
>> though.  The optab encoding gives us a full int to play with, and it
>> could easily go up to 64 bits if necessary/convenient.
>>
>> At least on the internal-fn.* side, the aim is really just to establish
>> a regular structure, so that we don't have arbitrary differences between
>> different widening operations, or too much cut-&-paste.
> 
> Hmm, I'm looking at the need for the std::map and
> internal_fn_hilo_keys_array and internal_fn_hilo_values_array.
> The vectorizer pieces contain
> 
> +  if (code.is_fn_code ())
> +     {
> +      internal_fn ifn = as_internal_fn ((combined_fn) code);
> +      gcc_assert (decomposes_to_hilo_fn_p (ifn));
> +
> +      internal_fn lo, hi;
> +      lookup_hilo_internal_fn (ifn, &lo, &hi);
> +      *code1 = as_combined_fn (lo);
> +      *code2 = as_combined_fn (hi);
> +      optab1 = lookup_hilo_ifn_optab (lo, !TYPE_UNSIGNED (vectype));
> +      optab2 = lookup_hilo_ifn_optab (hi, !TYPE_UNSIGNED (vectype));
> 
> so that tries to automatically associate the scalar widening IFN
> with the set(s) of IFN pairs we can split to.  But then this
> list should be static and there's no need to create a std::map?
> Maybe gencfn-macros.cc can be enhanced to output these static
> cases?  Or the vectorizer could (as it did previously) simply
> open-code the handled cases (I guess since we deal with two
> cases only now I'd prefer that).
> 
> Thanks,
> Richard.
> 
> 
>> Thanks,
>> Richard
>>
> 
The patch I last uploaded no longer uses std::map,
internal_fn_hilo_keys_array or internal_fn_hilo_values_array. (I've
attached it again.)

I'm not sure I understand the _halves suggestion. Do you mean that where
I had _hilo or _HILO before, we rename it to _halves/_HALVES so that it
can later represent both the _hi/_lo and the _even/_odd separation?

And am I correct to assume we are just giving up on having an
INTERNAL_OPTAB_FN idea for 1)?

Kind regards,
Andre
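
For reference, the two ways of splitting a widening add into halves that the thread discusses (hi/lo and even/odd) can be modelled on plain arrays. This is a scalar sketch under stated assumptions, not GCC code: the element count `N`, the types `narrow_vec` and `wide_vec`, and the four function names are all hypothetical, and "lo" is taken to mean the first N/2 elements, as in the cover letter.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 8;                         // narrow elements per vector
using narrow_vec = std::array<std::int16_t, N>;      // N narrow inputs
using wide_vec = std::array<std::int32_t, N / 2>;    // N/2 wide outputs

// Widening add of N/2 input elements chosen by START and STRIDE.
static wide_vec
widen_add_part (const narrow_vec &a, const narrow_vec &b,
                std::size_t start, std::size_t stride)
{
  wide_vec r{};
  for (std::size_t i = 0; i < N / 2; ++i)
    {
      std::size_t src = start + i * stride;
      assert (src < N);
      // Extend each operand first, then add, so the sum cannot wrap.
      r[i] = std::int32_t (a[src]) + std::int32_t (b[src]);
    }
  return r;
}

// hi/lo split: first half and second half of the inputs.
wide_vec widen_add_lo (const narrow_vec &a, const narrow_vec &b)
{ return widen_add_part (a, b, 0, 1); }
wide_vec widen_add_hi (const narrow_vec &a, const narrow_vec &b)
{ return widen_add_part (a, b, N / 2, 1); }

// even/odd split: alternating input elements.
wide_vec widen_add_even (const narrow_vec &a, const narrow_vec &b)
{ return widen_add_part (a, b, 0, 2); }
wide_vec widen_add_odd (const narrow_vec &a, const narrow_vec &b)
{ return widen_add_part (a, b, 1, 2); }
```

In both schemes a pair of operations covers all N inputs and yields two vectors of N/2 wide results; only the choice of input elements differs, which is the sense in which a single "halves" grouping could describe either pair.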
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bfc98a8d943467b33390defab9682f44efab5907..ffbbecb9409e1c2835d658c2a8855cd0e955c0f2 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4626,7 +4626,7 @@
   [(set_attr "type" "neon_<ADDSUB:optab>_long")]
 )
 
-(define_expand "vec_widen_<su>addl_lo_<mode>"
+(define_expand "vec_widen_<su>add_lo_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
@@ -4638,7 +4638,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_<su>addl_hi_<mode>"
+(define_expand "vec_widen_<su>add_hi_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
@@ -4650,7 +4650,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_<su>subl_lo_<mode>"
+(define_expand "vec_widen_<su>sub_lo_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
@@ -4662,7 +4662,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_<su>subl_hi_<mode>"
+(define_expand "vec_widen_<su>sub_hi_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 8b2882da4fe7da07d22b4e5384d049ba7d3907bf..0fd7e6cce8bbd4ecb8027b702722adcf6c32eb55 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1811,6 +1811,10 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}.
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
 @tindex VEC_WIDEN_MULT_LO_EXPR
+@tindex IFN_VEC_WIDEN_PLUS_HI
+@tindex IFN_VEC_WIDEN_PLUS_LO
+@tindex IFN_VEC_WIDEN_MINUS_HI
+@tindex IFN_VEC_WIDEN_MINUS_LO
 @tindex VEC_WIDEN_PLUS_HI_EXPR
 @tindex VEC_WIDEN_PLUS_LO_EXPR
 @tindex VEC_WIDEN_MINUS_HI_EXPR
@@ -1861,6 +1865,33 @@ vector of @code{N/2} products. In the case of @code{VEC_WIDEN_MULT_LO_EXPR} the
 low @code{N/2} elements of the two vector are multiplied to produce the
 vector of @code{N/2} products.
 
+@item IFN_VEC_WIDEN_PLUS_HI
+@itemx IFN_VEC_WIDEN_PLUS_LO
+These internal functions represent widening vector addition of the high and low
+parts of the two input vectors, respectively.  Their operands are vectors that
+contain the same number of elements (@code{N}) of the same integral type. The
+result is a vector that contains half as many elements, of an integral type
+whose size is twice as wide.  In the case of @code{IFN_VEC_WIDEN_PLUS_HI} the
+high @code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} sums.  In the case of @code{IFN_VEC_WIDEN_PLUS_LO} the low
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} sums.
+
+@item IFN_VEC_WIDEN_MINUS_HI
+@itemx IFN_VEC_WIDEN_MINUS_LO
+These internal functions represent widening vector subtraction of the high and
+low parts of the two input vectors, respectively.  Their operands are vectors
+that contain the same number of elements (@code{N}) of the same integral type.
+The high/low elements of the second vector are subtracted from the high/low
+elements of the first. The result is a vector that contains half as many
+elements, of an integral type whose size is twice as wide.  In the case of
+@code{IFN_VEC_WIDEN_MINUS_HI} the high @code{N/2} elements of the second
+vector are subtracted from the high @code{N/2} elements of the first to produce
+the vector of @code{N/2} differences.  In the case of
+@code{IFN_VEC_WIDEN_MINUS_LO} the low @code{N/2} elements of the second
+vector are subtracted from the low @code{N/2} elements of the first to produce
+the vector of @code{N/2} differences.
+
 @item VEC_WIDEN_PLUS_HI_EXPR
 @itemx VEC_WIDEN_PLUS_LO_EXPR
 These nodes represent widening vector addition of the high and low parts of
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 594bd3043f0e944299ddfff219f757ef15a3dd61..66636d82df27626e7911efd0cb8526921b39633f 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1187,6 +1187,7 @@ gimple_range_op_handler::maybe_non_standard ()
 {
   range_operator *signed_op = ptr_op_widen_mult_signed;
   range_operator *unsigned_op = ptr_op_widen_mult_unsigned;
+  bool signed1, signed2, signed_ret;
   if (gimple_code (m_stmt) == GIMPLE_ASSIGN)
     switch (gimple_assign_rhs_code (m_stmt))
       {
@@ -1202,32 +1203,55 @@ gimple_range_op_handler::maybe_non_standard ()
 	  m_op1 = gimple_assign_rhs1 (m_stmt);
 	  m_op2 = gimple_assign_rhs2 (m_stmt);
 	  tree ret = gimple_assign_lhs (m_stmt);
-	  bool signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
-	  bool signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
-	  bool signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED;
-
-	  /* Normally these operands should all have the same sign, but
-	     some passes and violate this by taking mismatched sign args.  At
-	     the moment the only one that's possible is mismatch inputs and
-	     unsigned output.  Once ranger supports signs for the operands we
-	     can properly fix it,  for now only accept the case we can do
-	     correctly.  */
-	  if ((signed1 ^ signed2) && signed_ret)
-	    return;
-
-	  m_valid = true;
-	  if (signed2 && !signed1)
-	    std::swap (m_op1, m_op2);
-
-	  if (signed1 || signed2)
-	    m_int = signed_op;
-	  else
-	    m_int = unsigned_op;
+	  signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
+	  signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
+	  signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED;
 	  break;
 	}
 	default:
-	  break;
+	  return;
       }
+  else if (gimple_code (m_stmt) == GIMPLE_CALL
+      && gimple_call_internal_p (m_stmt)
+      && gimple_get_lhs (m_stmt) != NULL_TREE)
+    switch (gimple_call_internal_fn (m_stmt))
+      {
+      case IFN_VEC_WIDEN_PLUS_LO:
+      case IFN_VEC_WIDEN_PLUS_HI:
+	  {
+	    signed_op = ptr_op_widen_plus_signed;
+	    unsigned_op = ptr_op_widen_plus_unsigned;
+	    m_valid = false;
+	    m_op1 = gimple_call_arg (m_stmt, 0);
+	    m_op2 = gimple_call_arg (m_stmt, 1);
+	    tree ret = gimple_get_lhs (m_stmt);
+	    signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
+	    signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
+	    signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED;
+	    break;
+	  }
+      default:
+	return;
+      }
+  else
+    return;
+
+  /* Normally these operands should all have the same sign, but some passes
+     violate this by taking mismatched sign args.  At the moment the only
+     one that's possible is mismatched inputs and unsigned output.  Once ranger
+     supports signs for the operands we can properly fix it; for now only
+     accept the case we can do correctly.  */
+  if ((signed1 ^ signed2) && signed_ret)
+    return;
+
+  m_valid = true;
+  if (signed2 && !signed1)
+    std::swap (m_op1, m_op2);
+
+  if (signed1 || signed2)
+    m_int = signed_op;
+  else
+    m_int = unsigned_op;
 }
 
 // Set up a gimple_range_op_handler for any built in function which can be
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 5c9da73ea11f8060b18dcf513599c9694fa4f2ad..1acea5ae33046b70de247b1688aea874d9956abc 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -90,6 +90,19 @@ lookup_internal_fn (const char *name)
   return entry ? *entry : IFN_LAST;
 }
 
+/* Given an internal_fn IFN that is a HILO function, return its corresponding
+   LO and HI internal_fns.  */
+
+void
+lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi)
+{
+  gcc_assert (decomposes_to_hilo_fn_p (ifn));
+
+  *lo = internal_fn (ifn + 1);
+  *hi = internal_fn (ifn + 2);
+}
+
+
 /* Fnspec of each internal function, indexed by function number.  */
 const_tree internal_fn_fnspec_array[IFN_LAST + 1];
 
@@ -137,7 +150,16 @@ const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_OPTAB_FN(CODE, FLAGS, OPTAB, TYPE) TYPE##_direct,
 #define DEF_INTERNAL_SIGNED_OPTAB_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \
 				     UNSIGNED_OPTAB, TYPE) TYPE##_direct,
+#undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+#undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+#define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \
+					    UNSIGNED_OPTAB, TYPE)		  \
+TYPE##_direct, TYPE##_direct, TYPE##_direct,
+#define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(CODE, FLAGS, OPTAB, TYPE)	\
+TYPE##_direct, TYPE##_direct, TYPE##_direct,
 #include "internal-fn.def"
+#undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+#undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
   not_direct
 };
 
@@ -3852,7 +3874,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 
 /* Return the optab used by internal function FN.  */
 
-static optab
+optab
 direct_internal_fn_optab (internal_fn fn, tree_pair types)
 {
   switch (fn)
@@ -3971,6 +3993,9 @@ commutative_binary_fn_p (internal_fn fn)
     case IFN_UBSAN_CHECK_MUL:
     case IFN_ADD_OVERFLOW:
     case IFN_MUL_OVERFLOW:
+    case IFN_VEC_WIDEN_PLUS_HILO:
+    case IFN_VEC_WIDEN_PLUS_LO:
+    case IFN_VEC_WIDEN_PLUS_HI:
       return true;
 
     default:
@@ -4044,6 +4069,88 @@ first_commutative_argument (internal_fn fn)
     }
 }
 
+/* Return true if this CODE describes an internal_fn that returns a vector with
+   elements twice as wide as the element size of the input vectors.  */
+
+bool
+widening_fn_p (code_helper code)
+{
+  if (!code.is_fn_code ())
+    return false;
+
+  if (!internal_fn_p ((combined_fn) code))
+    return false;
+
+  internal_fn fn = as_internal_fn ((combined_fn) code);
+  switch (fn)
+    {
+    #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+    #define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(NAME, F, S, SO, UO, T) \
+    case IFN_##NAME##_HILO:\
+    case IFN_##NAME##_HI: \
+    case IFN_##NAME##_LO: \
+      return true;
+    #include "internal-fn.def"
+    #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+
+    default:
+      return false;
+    }
+}
+
+/* Return true if this CODE describes an internal_fn that returns a vector with
+   elements half as wide as the element size of the input vectors.  */
+
+bool
+narrowing_fn_p (code_helper code)
+{
+  if (!code.is_fn_code ())
+    return false;
+
+  if (!internal_fn_p ((combined_fn) code))
+    return false;
+
+  internal_fn fn = as_internal_fn ((combined_fn) code);
+  switch (fn)
+    {
+    #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+    #define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(NAME, F, O, T) \
+    case IFN_##NAME##_HILO:\
+    case IFN_##NAME##_HI: \
+    case IFN_##NAME##_LO: \
+      return true;
+    #include "internal-fn.def"
+    #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+
+    default:
+      return false;
+    }
+}
+
+/* Return true if FN decomposes to _hi and _lo IFN.  */
+
+bool
+decomposes_to_hilo_fn_p (internal_fn fn)
+{
+  switch (fn)
+    {
+    #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+    #define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(NAME, F, S, SO, UO, T) \
+    case IFN_##NAME##_HILO:\
+      return true;
+    #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+    #define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(NAME, F, O, T) \
+    case IFN_##NAME##_HILO:\
+      return true;
+    #include "internal-fn.def"
+    #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+    #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+
+    default:
+      return false;
+    }
+}
+
 /* Return true if IFN_SET_EDOM is supported.  */
 
 bool
@@ -4071,7 +4178,33 @@ set_edom_supported_p (void)
     optab which_optab = direct_internal_fn_optab (fn, types);		\
     expand_##TYPE##_optab_fn (fn, stmt, which_optab);			\
   }
+#define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(CODE, FLAGS, SELECTOR,	    \
+					    SIGNED_OPTAB, UNSIGNED_OPTAB,   \
+					    TYPE)			    \
+  static void								    \
+  expand_##CODE##_HILO (internal_fn fn ATTRIBUTE_UNUSED,		    \
+			gcall *stmt ATTRIBUTE_UNUSED)			    \
+  {									    \
+    gcc_unreachable ();							    \
+  }									    \
+  DEF_INTERNAL_SIGNED_OPTAB_FN(CODE##_HI, FLAGS, SELECTOR, SIGNED_OPTAB,    \
+			       UNSIGNED_OPTAB, TYPE)			    \
+  DEF_INTERNAL_SIGNED_OPTAB_FN(CODE##_LO, FLAGS, SELECTOR, SIGNED_OPTAB,    \
+			       UNSIGNED_OPTAB, TYPE)
+#define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(CODE, FLAGS, OPTAB, TYPE)	\
+  static void								\
+  expand_##CODE##_HILO (internal_fn fn ATTRIBUTE_UNUSED,		\
+			gcall *stmt ATTRIBUTE_UNUSED)			\
+  {									\
+    gcc_unreachable ();							\
+  }									\
+  DEF_INTERNAL_OPTAB_FN(CODE##_LO, FLAGS, OPTAB, TYPE)			\
+  DEF_INTERNAL_OPTAB_FN(CODE##_HI, FLAGS, OPTAB, TYPE)
 #include "internal-fn.def"
+#undef DEF_INTERNAL_OPTAB_FN
+#undef DEF_INTERNAL_SIGNED_OPTAB_FN
+#undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+#undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
 
 /* Routines to expand each internal function, indexed by function number.
    Each routine has the prototype:
@@ -4080,6 +4213,7 @@ set_edom_supported_p (void)
 
    where STMT is the statement that performs the call. */
 static void (*const internal_fn_expanders[]) (internal_fn, gcall *) = {
+
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) expand_##CODE,
 #include "internal-fn.def"
   0
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..012dd323b86dd7cfcc5c13d3a2bb2a453937155d 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -85,6 +85,13 @@ along with GCC; see the file COPYING3.  If not see
    says that the function extends the C-level BUILT_IN_<NAME>{,L,LL,IMAX}
    group of functions to any integral mode (including vector modes).
 
+   DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+   are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively,
+   except they provide convenience wrappers for defining conversions that
+   require a hi/lo split.  Each definition for <NAME> will require optabs for
+   _hi and _lo, each with a signed and unsigned version in the case of widening.
+
+
    Each entry must have a corresponding expander of the form:
 
      void expand_NAME (gimple_call stmt)
@@ -123,6 +130,20 @@ along with GCC; see the file COPYING3.  If not see
   DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE)
 #endif
 
+#ifndef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN
+#define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE) \
+  DEF_INTERNAL_FN (NAME##_HILO, FLAGS | ECF_LEAF, NULL) \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _LO, FLAGS, SELECTOR, SOPTAB##_lo, UOPTAB##_lo, TYPE) \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _HI, FLAGS, SELECTOR, SOPTAB##_hi, UOPTAB##_hi, TYPE)
+#endif
+
+#ifndef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
+#define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(NAME, FLAGS, OPTAB, TYPE) \
+  DEF_INTERNAL_FN (NAME##_HILO, FLAGS | ECF_LEAF, NULL) \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _LO, FLAGS, OPTAB##_lo, TYPE) \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _HI, FLAGS, OPTAB##_hi, TYPE)
+#endif
+
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD, ECF_PURE, maskload, mask_load)
 DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_CONST, vec_load_lanes, load_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
@@ -315,6 +336,16 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
 DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
+DEF_INTERNAL_OPTAB_WIDENING_HILO_FN (VEC_WIDEN_PLUS,
+				     ECF_CONST | ECF_NOTHROW,
+				     first,
+				     vec_widen_sadd, vec_widen_uadd,
+				     binary)
+DEF_INTERNAL_OPTAB_WIDENING_HILO_FN (VEC_WIDEN_MINUS,
+				     ECF_CONST | ECF_NOTHROW,
+				     first,
+				     vec_widen_ssub, vec_widen_usub,
+				     binary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 08922ed4254898f5fffca3f33973e96ed9ce772f..8ba07d6d1338e75bc5a451d9e403112a608f3ea2 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -20,6 +20,10 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_INTERNAL_FN_H
 #define GCC_INTERNAL_FN_H
 
+#include "insn-codes.h"
+#include "insn-opinit.h"
+
+
 /* INTEGER_CST values for IFN_UNIQUE function arg-0.
 
    UNSPEC: Undifferentiated UNIQUE.
@@ -112,6 +116,8 @@ internal_fn_name (enum internal_fn fn)
 }
 
 extern internal_fn lookup_internal_fn (const char *);
+extern void lookup_hilo_internal_fn (internal_fn, internal_fn *, internal_fn *);
+extern optab direct_internal_fn_optab (internal_fn, tree_pair);
 
 /* Return the ECF_* flags for function FN.  */
 
@@ -210,6 +216,9 @@ extern bool commutative_binary_fn_p (internal_fn);
 extern bool commutative_ternary_fn_p (internal_fn);
 extern int first_commutative_argument (internal_fn);
 extern bool associative_binary_fn_p (internal_fn);
+extern bool widening_fn_p (code_helper);
+extern bool narrowing_fn_p (code_helper);
+extern bool decomposes_to_hilo_fn_p (internal_fn);
 
 extern bool set_edom_supported_p (void);
 
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index c8e39c82d57a7d726e7da33d247b80f32ec9236c..5a08d91e550b2d92e9572211f811fdba99a33a38 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -1314,7 +1314,15 @@ commutative_optab_p (optab binoptab)
 	  || binoptab == smul_widen_optab
 	  || binoptab == umul_widen_optab
 	  || binoptab == smul_highpart_optab
-	  || binoptab == umul_highpart_optab);
+	  || binoptab == umul_highpart_optab
+	  || binoptab == vec_widen_saddl_hi_optab
+	  || binoptab == vec_widen_saddl_lo_optab
+	  || binoptab == vec_widen_uaddl_hi_optab
+	  || binoptab == vec_widen_uaddl_lo_optab
+	  || binoptab == vec_widen_sadd_hi_optab
+	  || binoptab == vec_widen_sadd_lo_optab
+	  || binoptab == vec_widen_uadd_hi_optab
+	  || binoptab == vec_widen_uadd_lo_optab);
 }
 
 /* X is to be used in mode MODE as operand OPN to BINOPTAB.  If we're
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 695f5911b300c9ca5737de9be809fa01aabe5e01..16d121722c8c5723d9b164f5a2c616dc7ec143de 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -410,6 +410,10 @@ OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a")
 OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a")
 OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a")
 OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a")
+OPTAB_D (vec_widen_ssub_hi_optab, "vec_widen_ssub_hi_$a")
+OPTAB_D (vec_widen_ssub_lo_optab, "vec_widen_ssub_lo_$a")
+OPTAB_D (vec_widen_sadd_hi_optab, "vec_widen_sadd_hi_$a")
+OPTAB_D (vec_widen_sadd_lo_optab, "vec_widen_sadd_lo_$a")
 OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
 OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
 OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
@@ -422,6 +426,10 @@ OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
 OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
 OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
 OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
+OPTAB_D (vec_widen_usub_hi_optab, "vec_widen_usub_hi_$a")
+OPTAB_D (vec_widen_usub_lo_optab, "vec_widen_usub_lo_$a")
+OPTAB_D (vec_widen_uadd_hi_optab, "vec_widen_uadd_hi_$a")
+OPTAB_D (vec_widen_uadd_lo_optab, "vec_widen_uadd_lo_$a")
 OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
 OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
 OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
index 220bd9352a4c7acd2e3713e441d74898d3e92b30..7037673d32bd780e1c9b58a51e58e2bac3b30b7e 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -save-temps" } */
+/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */
 #include <stdint.h>
 #include <string.h>
 
@@ -86,6 +86,8 @@ main()
     return 0;
 }
 
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_LO" "vect"   } } */
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_HI" "vect"   } } */
 /* { dg-final { scan-assembler-times {\tuaddl\t} 1} } */
 /* { dg-final { scan-assembler-times {\tuaddl2\t} 1} } */
 /* { dg-final { scan-assembler-times {\tsaddl\t} 1} } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
index a2bed63affbd091977df95a126da1f5b8c1d41d2..83bc1edb6105f47114b665e24a13e6194b2179a2 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -save-temps" } */
+/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */
 #include <stdint.h>
 #include <string.h>
 
@@ -86,6 +86,8 @@ main()
     return 0;
 }
 
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_LO" "vect"   } } */
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_HI" "vect"   } } */
 /* { dg-final { scan-assembler-times {\tusubl\t} 1} } */
 /* { dg-final { scan-assembler-times {\tusubl2\t} 1} } */
 /* { dg-final { scan-assembler-times {\tssubl\t} 1} } */
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 0aeebb67fac864db284985f4a6f0653af281d62b..28464ad9e3a7ea25557ffebcdbdbc1340f9e0d8b 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "asan.h"
 #include "profile.h"
 #include "sreal.h"
+#include "internal-fn.h"
 
 /* This file contains functions for building the Control Flow Graph (CFG)
    for a function tree.  */
@@ -3411,6 +3412,52 @@ verify_gimple_call (gcall *stmt)
 	  debug_generic_stmt (fn);
 	  return true;
 	}
+      internal_fn ifn = gimple_call_internal_fn (stmt);
+      if (ifn == IFN_LAST)
+	{
+	  error ("gimple call has an invalid IFN");
+	  debug_generic_stmt (fn);
+	  return true;
+	}
+      else if (decomposes_to_hilo_fn_p (ifn))
+	{
+	  /* Non decomposed HILO stmts should not appear in IL, these are
+	     merely used as an internal representation to the auto-vectorizer
+	     pass and should have been expanded to their _LO _HI variants.  */
+	  error ("gimple call has a non-decomposed HILO IFN");
+	  debug_generic_stmt (fn);
+	  return true;
+	}
+      else if (ifn == IFN_VEC_WIDEN_PLUS_LO
+	       || ifn == IFN_VEC_WIDEN_PLUS_HI
+	       || ifn == IFN_VEC_WIDEN_MINUS_LO
+	       || ifn == IFN_VEC_WIDEN_MINUS_HI)
+	{
+	  tree rhs1_type = TREE_TYPE (gimple_call_arg (stmt, 0));
+	  tree rhs2_type = TREE_TYPE (gimple_call_arg (stmt, 1));
+	  tree lhs_type = TREE_TYPE (gimple_get_lhs (stmt));
+	  if (TREE_CODE (lhs_type) == VECTOR_TYPE)
+	    {
+	      if (TREE_CODE (rhs1_type) != VECTOR_TYPE
+		  || TREE_CODE (rhs2_type) != VECTOR_TYPE)
+		{
+		  error ("invalid non-vector operands in vector IFN call");
+		  debug_generic_stmt (fn);
+		  return true;
+		}
+	      lhs_type = TREE_TYPE (lhs_type);
+	      rhs1_type = TREE_TYPE (rhs1_type);
+	      rhs2_type = TREE_TYPE (rhs2_type);
+	    }
+	  if (POINTER_TYPE_P (lhs_type)
+	      || POINTER_TYPE_P (rhs1_type)
+	      || POINTER_TYPE_P (rhs2_type))
+	    {
+	      error ("invalid (pointer) operands in vector IFN call");
+	      debug_generic_stmt (fn);
+	      return true;
+	    }
+	}
     }
   else
     {
diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index 63a19f8d1d89c6bd5d8e55a299cbffaa324b4b84..d74d8db2173b1ab117250fea89de5212d5e354ec 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -4433,7 +4433,20 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)
 	tree decl;
 
 	if (gimple_call_internal_p (stmt))
-	  return 0;
+	  {
+	    internal_fn fn = gimple_call_internal_fn (stmt);
+	    switch (fn)
+	      {
+	      case IFN_VEC_WIDEN_PLUS_HI:
+	      case IFN_VEC_WIDEN_PLUS_LO:
+	      case IFN_VEC_WIDEN_MINUS_HI:
+	      case IFN_VEC_WIDEN_MINUS_LO:
+		return 1;
+
+	      default:
+		return 0;
+	      }
+	  }
 	else if ((decl = gimple_call_fndecl (stmt))
 		 && fndecl_built_in_p (decl))
 	  {
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 1778af0242898e3dc73d94d22a5b8505628a53b5..93cebc72beb4f65249a69b2665dfeb8a0991c1d1 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -562,21 +562,30 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
 
 static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
-		      tree_code widened_code, bool shift_p,
+		      code_helper widened_code, bool shift_p,
 		      unsigned int max_nops,
 		      vect_unpromoted_value *unprom, tree *common_type,
 		      enum optab_subtype *subtype = NULL)
 {
   /* Check for an integer operation with the right code.  */
-  gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
-  if (!assign)
+  gimple *stmt = stmt_info->stmt;
+  if (!(is_gimple_assign (stmt) || is_gimple_call (stmt)))
+    return 0;
+
+  code_helper rhs_code;
+  if (is_gimple_assign (stmt))
+    rhs_code = gimple_assign_rhs_code (stmt);
+  else if (is_gimple_call (stmt))
+    rhs_code = gimple_call_combined_fn (stmt);
+  else
     return 0;
 
-  tree_code rhs_code = gimple_assign_rhs_code (assign);
-  if (rhs_code != code && rhs_code != widened_code)
+  if (rhs_code != code
+      && rhs_code != widened_code)
     return 0;
 
-  tree type = TREE_TYPE (gimple_assign_lhs (assign));
+  tree lhs = gimple_get_lhs (stmt);
+  tree type = TREE_TYPE (lhs);
   if (!INTEGRAL_TYPE_P (type))
     return 0;
 
@@ -589,7 +598,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
     {
       vect_unpromoted_value *this_unprom = &unprom[next_op];
       unsigned int nops = 1;
-      tree op = gimple_op (assign, i + 1);
+      tree op = gimple_arg (stmt, i);
       if (i == 1 && TREE_CODE (op) == INTEGER_CST)
 	{
 	  /* We already have a common type from earlier operands.
@@ -1343,7 +1352,8 @@ vect_recog_sad_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom[2];
-  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_MINUS_EXPR,
+  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR,
+			     IFN_VEC_WIDEN_MINUS_HILO,
 			     false, 2, unprom, &half_type))
     return NULL;
 
@@ -1395,14 +1405,16 @@ static gimple *
 vect_recog_widen_op_pattern (vec_info *vinfo,
 			     stmt_vec_info last_stmt_info, tree *type_out,
 			     tree_code orig_code, code_helper wide_code,
-			     bool shift_p, const char *name)
+			     bool shift_p, const char *name,
+			     optab_subtype *subtype = NULL)
 {
   gimple *last_stmt = last_stmt_info->stmt;
 
   vect_unpromoted_value unprom[2];
   tree half_type;
   if (!vect_widened_op_tree (vinfo, last_stmt_info, orig_code, orig_code,
-			     shift_p, 2, unprom, &half_type))
+			     shift_p, 2, unprom, &half_type, subtype))
+
     return NULL;
 
   /* Pattern detected.  */
@@ -1468,6 +1480,20 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
 			      type, pattern_stmt, vecctype);
 }
 
+static gimple *
+vect_recog_widen_op_pattern (vec_info *vinfo,
+			     stmt_vec_info last_stmt_info, tree *type_out,
+			     tree_code orig_code, internal_fn wide_ifn,
+			     bool shift_p, const char *name,
+			     optab_subtype *subtype = NULL)
+{
+  combined_fn ifn = as_combined_fn (wide_ifn);
+  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+				      orig_code, ifn, shift_p, name,
+				      subtype);
+}
+
+
 /* Try to detect multiplication on widened inputs, converting MULT_EXPR
    to WIDEN_MULT_EXPR.  See vect_recog_widen_op_pattern for details.  */
 
@@ -1481,26 +1507,30 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 }
 
 /* Try to detect addition on widened inputs, converting PLUS_EXPR
-   to WIDEN_PLUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
+   to IFN_VEC_WIDEN_PLUS_HILO.  See vect_recog_widen_op_pattern for details.  */
 
 static gimple *
 vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 			       tree *type_out)
 {
+  optab_subtype subtype;
   return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
-				      PLUS_EXPR, WIDEN_PLUS_EXPR, false,
-				      "vect_recog_widen_plus_pattern");
+				      PLUS_EXPR, IFN_VEC_WIDEN_PLUS_HILO,
+				      false, "vect_recog_widen_plus_pattern",
+				      &subtype);
 }
 
 /* Try to detect subtraction on widened inputs, converting MINUS_EXPR
-   to WIDEN_MINUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
+   to IFN_VEC_WIDEN_MINUS_HILO.  See vect_recog_widen_op_pattern for details.  */
 static gimple *
 vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 			       tree *type_out)
 {
+  optab_subtype subtype;
   return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
-				      MINUS_EXPR, WIDEN_MINUS_EXPR, false,
-				      "vect_recog_widen_minus_pattern");
+				      MINUS_EXPR, IFN_VEC_WIDEN_MINUS_HILO,
+				      false, "vect_recog_widen_minus_pattern",
+				      &subtype);
 }
 
 /* Function vect_recog_ctz_ffs_pattern
@@ -3078,7 +3108,7 @@ vect_recog_average_pattern (vec_info *vinfo,
   vect_unpromoted_value unprom[3];
   tree new_type;
   unsigned int nops = vect_widened_op_tree (vinfo, plus_stmt_info, PLUS_EXPR,
-					    WIDEN_PLUS_EXPR, false, 3,
+					    IFN_VEC_WIDEN_PLUS_HILO, false, 3,
 					    unprom, &new_type);
   if (nops == 0)
     return NULL;
@@ -6469,6 +6499,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_mask_conversion_pattern, "mask_conversion" },
   { vect_recog_widen_plus_pattern, "widen_plus" },
   { vect_recog_widen_minus_pattern, "widen_minus" },
+  /* These must come after the double widening ones.  */
 };
 
 const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index d152ae9ab10b361b88c0f839d6951c43b954750a..24c811ebe01fb8b003100dea494cf64fea72a975 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5038,7 +5038,9 @@ vectorizable_conversion (vec_info *vinfo,
   bool widen_arith = (code == WIDEN_PLUS_EXPR
 		 || code == WIDEN_MINUS_EXPR
 		 || code == WIDEN_MULT_EXPR
-		 || code == WIDEN_LSHIFT_EXPR);
+		 || code == WIDEN_LSHIFT_EXPR
+		 || code == IFN_VEC_WIDEN_PLUS_HILO
+		 || code == IFN_VEC_WIDEN_MINUS_HILO);
 
   if (!widen_arith
       && !CONVERT_EXPR_CODE_P (code)
@@ -5088,7 +5090,9 @@ vectorizable_conversion (vec_info *vinfo,
       gcc_assert (code == WIDEN_MULT_EXPR
 		  || code == WIDEN_LSHIFT_EXPR
 		  || code == WIDEN_PLUS_EXPR
-		  || code == WIDEN_MINUS_EXPR);
+		  || code == WIDEN_MINUS_EXPR
+		  || code == IFN_VEC_WIDEN_PLUS_HILO
+		  || code == IFN_VEC_WIDEN_MINUS_HILO);
 
 
       op1 = is_gimple_assign (stmt) ? gimple_assign_rhs2 (stmt) :
@@ -12478,10 +12482,43 @@ supportable_widening_operation (vec_info *vinfo,
       optab1 = vec_unpacks_sbool_lo_optab;
       optab2 = vec_unpacks_sbool_hi_optab;
     }
-  else
+
+  if (code.is_fn_code ())
+    {
+      internal_fn ifn = as_internal_fn ((combined_fn) code);
+      gcc_assert (decomposes_to_hilo_fn_p (ifn));
+
+      internal_fn lo, hi;
+      lookup_hilo_internal_fn (ifn, &lo, &hi);
+      *code1 = as_combined_fn (lo);
+      *code2 = as_combined_fn (hi);
+      optab1 = direct_internal_fn_optab (lo, {vectype, vectype});
+      optab2 = direct_internal_fn_optab (hi, {vectype, vectype});
+    }
+  else if (code.is_tree_code ())
     {
-      optab1 = optab_for_tree_code (c1, vectype, optab_default);
-      optab2 = optab_for_tree_code (c2, vectype, optab_default);
+      if (code == FIX_TRUNC_EXPR)
+	{
+	  /* The signedness is determined from output operand.  */
+	  optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
+	  optab2 = optab_for_tree_code (c2, vectype_out, optab_default);
+	}
+      else if (CONVERT_EXPR_CODE_P ((tree_code) code.safe_as_tree_code ())
+	       && VECTOR_BOOLEAN_TYPE_P (wide_vectype)
+	       && VECTOR_BOOLEAN_TYPE_P (vectype)
+	       && TYPE_MODE (wide_vectype) == TYPE_MODE (vectype)
+	       && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
+	{
+	  /* If the input and result modes are the same, a different optab
+	     is needed where we pass in the number of units in vectype.  */
+	  optab1 = vec_unpacks_sbool_lo_optab;
+	  optab2 = vec_unpacks_sbool_hi_optab;
+	}
+      else
+	{
+	  optab1 = optab_for_tree_code (c1, vectype, optab_default);
+	  optab2 = optab_for_tree_code (c2, vectype, optab_default);
+	}
     }
 
   if (!optab1 || !optab2)
diff --git a/gcc/tree.def b/gcc/tree.def
index 90ceeec0b512bfa5f983359c0af03cc71de32007..b37b0b35927b92a6536e5c2d9805ffce8319a240 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1374,15 +1374,16 @@ DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3)
 DEFTREECODE (WIDEN_SUM_EXPR, "widen_sum_expr", tcc_binary, 2)
 
 /* Widening sad (sum of absolute differences).
-   The first two arguments are of type t1 which should be integer.
-   The third argument and the result are of type t2, such that t2 is at least
-   twice the size of t1.  Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is
+   The first two arguments are of type t1 which should be a vector of integers.
+   The third argument and the result are of type t2, such that the size of
+   the elements of t2 is at least twice the size of the elements of t1.
+   Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is
    equivalent to:
-       tmp = WIDEN_MINUS_EXPR (arg1, arg2)
+       tmp = IFN_VEC_WIDEN_MINUS_EXPR (arg1, arg2)
        tmp2 = ABS_EXPR (tmp)
        arg3 = PLUS_EXPR (tmp2, arg3)
   or:
-       tmp = WIDEN_MINUS_EXPR (arg1, arg2)
+       tmp = IFN_VEC_WIDEN_MINUS_EXPR (arg1, arg2)
        tmp2 = ABS_EXPR (tmp)
        arg3 = WIDEN_SUM_EXPR (tmp2, arg3)
  */

Richard Biener May 15, 2023, 12:21 p.m. UTC | #12

On Mon, 15 May 2023, Andre Vieira (lists) wrote:

> 
> 
> On 15/05/2023 12:01, Richard Biener wrote:
> > On Mon, 15 May 2023, Richard Sandiford wrote:
> > 
> >> Richard Biener <rguenther@suse.de> writes:
> >>> On Fri, 12 May 2023, Richard Sandiford wrote:
> >>>
> >>>> Richard Biener <rguenther@suse.de> writes:
> >>>>> On Fri, 12 May 2023, Andre Vieira (lists) wrote:
> >>>>>
> >>>>>> I have dealt with, I think..., most of your comments. There's quite a
> >>>>>> few
> >>>>>> changes, I think it's all a bit simpler now. I made some other changes
> >>>>>> to the
> >>>>>> costing in tree-inline.cc and gimple-range-op.cc in which I try to
> >>>>>> preserve
> >>>>>> the same behaviour as we had with the tree codes before. Also added
> >>>>>> some extra
> >>>>>> checks to tree-cfg.cc that made sense to me.
> >>>>>>
> >>>>>> I am still regression testing the gimple-range-op change, as that was a
> >>>>>> last
> >>>>>> minute change, but the rest survived a bootstrap and regression test on
> >>>>>> aarch64-unknown-linux-gnu.
> >>>>>>
> >>>>>> cover letter:
> >>>>>>
> >>>>>> This patch replaces the existing tree_code widen_plus and widen_minus
> >>>>>> patterns with internal_fn versions.
> >>>>>>
> >>>>>> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and
> >>>>>> DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
> >>>>>> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN
> >>>>>> respectively
> >>>>>> except they provide convenience wrappers for defining conversions that
> >>>>>> require
> >>>>>> a hi/lo split.  Each definition for <NAME> will require optabs for _hi
> >>>>>> and _lo
> >>>>>> and each of those will also require a signed and unsigned version in
> >>>>>> the case
> >>>>>> of widening. The hi/lo pair is necessary because the widening and
> >>>>>> narrowing
> >>>>>> operations take n narrow elements as inputs and return n/2 wide
> >>>>>> elements as
> >>>>>> outputs. The 'lo' operation operates on the first n/2 elements of
> >>>>>> input. The
> >>>>>> 'hi' operation operates on the second n/2 elements of input. Defining
> >>>>>> an
> >>>>>> internal_fn along with hi/lo variations allows a single internal
> >>>>>> function to
> >>>>>> be returned from a vect_recog function that will later be expanded to
> >>>>>> hi/lo.
> >>>>>>
> >>>>>>
> >>>>>>   For example:
> >>>>>>   IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> >>>>>> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_<su>add_hi_<mode> ->
> >>>>>> (u/s)addl2
> >>>>>>                         IFN_VEC_WIDEN_PLUS_LO  ->
> >>>>>> vec_widen_<su>add_lo_<mode>
> >>>>>> -> (u/s)addl
> >>>>>>
> >>>>>> This gives the same functionality as the previous
> >>>>>> WIDEN_PLUS/WIDEN_MINUS tree
> >>>>>> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
> >>>>>
> >>>>> What I still don't understand is how we are so narrowly focused on
> >>>>> HI/LO?  We need a combined scalar IFN for pattern selection (not
> >>>>> sure why that's now called _HILO, I expected no suffix).  Then there's
> >>>>> three possibilities the target can implement this:
> >>>>>
> >>>>>   1) with a widen_[su]add<mode> instruction - I _think_ that's what
> >>>>>      RISCV is going to offer since it is a target where vector modes
> >>>>>      have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
> >>>>>      RVV can do a V4HI to V4SI widening and widening add/subtract
> >>>>>      using vwadd[u] and vwsub[u] (the HI->SI widening is actually
> >>>>>      done with a widening add of zero - eh).
> >>>>>      IIRC GCN is the same here.
> >>>>
> >>>> SVE currently does this too, but the addition and widening are
> >>>> separate operations.  E.g. in principle there's no reason why
> >>>> you can't sign-extend one operand, zero-extend the other, and
> >>>> then add the result together.  Or you could extend them from
> >>>> different sizes (QI and HI).  All of those are supported
> >>>> (if the costing allows them).
> >>>
> >>> I see.  So why does the target expose widen_[su]add<mode> at all?
> >>
> >> It shouldn't (need to) do that.  I don't think we should have an optab
> >> for the unsplit operation.
> >>
> >> At least on SVE, we really want the extensions to be fused with loads
> >> (where possible) rather than with arithmetic.
> >>
> >> We can still do the widening arithmetic in one go.  It's just that
> >> fusing with the loads works for the mixed-sign and mixed-size cases,
> >> and can handle more than just doubling the element size.
> >>
> >>>> If the target has operations to do combined extending and adding (or
> >>>> whatever), then at the moment we rely on combine to generate them.
> >>>>
> >>>> So I think this case is separate from Andre's work.  The addition
> >>>> itself is just an ordinary addition, and any widening happens by
> >>>> vectorising a CONVERT/NOP_EXPR.
> >>>>
> >>>>>   2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
> >>>>>      codes currently support (exclusively)
> >>>>>   3) similar, but widen_[su]add{_even,_odd}<mode>
> >>>>>
> >>>>> that said, things like decomposes_to_hilo_fn_p look to paint us into
> >>>>> a 2) corner without good reason.
> >>>>
> >>>> I suppose one question is: how much of the patch is really specific
> >>>> to HI/LO, and how much is just grouping two halves together?
> >>>
> >>> Yep, that I don't know for sure.
> >>>
> >>>>   The nice
> >>>> thing about the internal-fn grouping macros is that, if (3) is
> >>>> implemented in future, the structure will strongly encourage even/odd
> >>>> pairs to be supported for all operations that support hi/lo.  That is,
> >>>> I would expect the grouping macros to be extended to define even/odd
> >>>> ifns alongside hi/lo ones, rather than adding separate definitions
> >>>> for even/odd functions.
> >>>>
> >>>> If so, at least from the internal-fn.* side of things, I think the
> >>>> question
> >>>> is whether it's OK to stick with hilo names for now, or whether we should
> >>>> use more forward-looking names.
> >>>
> >>> I think for parts that are independent we could use a more
> >>> forward-looking name.  Maybe _halves?
> >>
> >> Using _halves for the ifn macros sounds good to me FWIW.
> >>
> >>> But I'm also not sure
> >>> how much of that is really needed (it seems to be tied around
> >>> optimizing optabs space?)
> >>
> >> Not sure what you mean by "this".  Optabs space shouldn't be a problem
> >> though.  The optab encoding gives us a full int to play with, and it
> >> could easily go up to 64 bits if necessary/convenient.
> >>
> >> At least on the internal-fn.* side, the aim is really just to establish
> >> a regular structure, so that we don't have arbitrary differences between
> >> different widening operations, or too much cut-&-paste.
> > 
> > Hmm, I'm looking at the need for the std::map and
> > internal_fn_hilo_keys_array and internal_fn_hilo_values_array.
> > The vectorizer pieces contain
> > 
> > +  if (code.is_fn_code ())
> > +     {
> > +      internal_fn ifn = as_internal_fn ((combined_fn) code);
> > +      gcc_assert (decomposes_to_hilo_fn_p (ifn));
> > +
> > +      internal_fn lo, hi;
> > +      lookup_hilo_internal_fn (ifn, &lo, &hi);
> > +      *code1 = as_combined_fn (lo);
> > +      *code2 = as_combined_fn (hi);
> > +      optab1 = lookup_hilo_ifn_optab (lo, !TYPE_UNSIGNED (vectype));
> > +      optab2 = lookup_hilo_ifn_optab (hi, !TYPE_UNSIGNED (vectype));
> > 
> > so that tries to automatically associate the scalar widening IFN
> > with the set(s) of IFN pairs we can split to.  But then this
> > list should be static and there's no need to create a std::map?
> > Maybe gencfn-macros.cc can be enhanced to output these static
> > cases?  Or the vectorizer could (as it did previously) simply
> > open-code the handled cases (I guess since we deal with two
> > cases only now I'd prefer that).
> > 
> > Thanks,
> > Richard.
> > 
> > 
> >> Thanks,
> >> Richard
> >>
> > 
> The patch I uploaded last no longer has std::map nor
> internal_fn_hilo_keys_array and internal_fn_hilo_values_array. (I've attached
> it again)

Whoops, too many patches ...

> I'm not sure I understand the _halves, do you mean that for the case where I
> had _hilo or _HILO before we rename that to _halves/_HALVES such that it later
> represents both _hi/_lo separation and _even/_odd?

I don't see much shared stuff, but I guess we'd see when we add a case
for EVEN/ODD.  The verifier contains

+      else if (decomposes_to_hilo_fn_p (ifn))
+       {
+         /* Non decomposed HILO stmts should not appear in IL, these are
+            merely used as an internal representation to the auto-vectorizer
+            pass and should have been expanded to their _LO _HI variants.  */
+         error ("gimple call has a non-decomposed HILO IFN");
+         debug_generic_stmt (fn);
+         return true;

I think to support case 1) that's not wanted.  Instead what you could
check is that the types involved are vector types, so a subset of
what you check for IFN_VEC_WIDEN_PLUS_LO etc. (but oddly it's not
verified those are all operating on vector types only?)

+/*  Given an internal_fn IFN that is a HILO function, return its corresponding
+    LO and HI internal_fns.  */
+
+extern void
+lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi)
+{
+  gcc_assert (decomposes_to_hilo_fn_p (ifn));
+
+  *lo = internal_fn (ifn + 1);
+  *hi = internal_fn (ifn + 2);

that might become fragile if we add EVEN/ODD besides HI/LO unless
we merge those with a DEF_INTERNAL_OPTAB_WIDENING_HILO_EVENODD_FN
case, right?
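
To illustrate that concern, here is a toy sketch (standalone, not GCC
code; the enum and both helper names are made up and only mimic the
ifn + 1 / ifn + 2 convention quoted above).  It assumes a single grouping
macro emits all five entries back to back, and pins the offsets down with
static_asserts so that a reordering fails at compile time rather than
silently returning the wrong function:

```cpp
#include <cassert>

// Hypothetical stand-in for the internal_fn enum.  In this sketch one
// grouping macro is assumed to emit all five entries consecutively.
enum toy_ifn
{
  TOY_VEC_WIDEN_PLUS,       // unsplit function
  TOY_VEC_WIDEN_PLUS_LO,
  TOY_VEC_WIDEN_PLUS_HI,
  TOY_VEC_WIDEN_PLUS_EVEN,
  TOY_VEC_WIDEN_PLUS_ODD,
  TOY_LAST
};

// The lookups below are only valid while these offsets hold.
static_assert (TOY_VEC_WIDEN_PLUS_LO == TOY_VEC_WIDEN_PLUS + 1, "lo offset");
static_assert (TOY_VEC_WIDEN_PLUS_HI == TOY_VEC_WIDEN_PLUS + 2, "hi offset");
static_assert (TOY_VEC_WIDEN_PLUS_EVEN == TOY_VEC_WIDEN_PLUS + 3, "even offset");
static_assert (TOY_VEC_WIDEN_PLUS_ODD == TOY_VEC_WIDEN_PLUS + 4, "odd offset");

// Return the LO and HI variants of the unsplit function IFN.
void
toy_lookup_hilo (toy_ifn ifn, toy_ifn *lo, toy_ifn *hi)
{
  assert (ifn == TOY_VEC_WIDEN_PLUS);
  *lo = toy_ifn (ifn + 1);
  *hi = toy_ifn (ifn + 2);
}

// Return the EVEN and ODD variants of the unsplit function IFN.
void
toy_lookup_evenodd (toy_ifn ifn, toy_ifn *even, toy_ifn *odd)
{
  assert (ifn == TOY_VEC_WIDEN_PLUS);
  *even = toy_ifn (ifn + 3);
  *odd = toy_ifn (ifn + 4);
}
```

If EVEN/ODD were instead defined by a separate macro elsewhere in the
.def file, nothing would tie their positions to the unsplit function,
which is why merging them into one definition looks safer.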

> And am I correct to assume we are just giving up on having an INTERNAL_OPTAB_FN
> idea for 1)?

Well, I think we want all of them in the end (or at least support them
if a target need arises): full vector, hi/lo and even/odd.
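
For reference, a scalar model of the three forms (a sketch only; the
function names are hypothetical, the element types are fixed to 8->16
bits for brevity, and "lo" is taken to mean the first N/2 elements as in
the cover letter, ignoring endianness):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using narrow_vec = std::vector<std::uint8_t>;
using wide_vec = std::vector<std::uint16_t>;

// Add COUNT element pairs starting at index START with stride STEP,
// widening each sum from 8 to 16 bits so that it cannot wrap.
wide_vec
widen_plus_select (const narrow_vec &a, const narrow_vec &b,
                   std::size_t start, std::size_t step, std::size_t count)
{
  assert (a.size () == b.size ());
  wide_vec r;
  for (std::size_t i = 0; i < count; ++i)
    {
      std::size_t idx = start + i * step;
      r.push_back (static_cast<std::uint16_t> (a[idx] + b[idx]));
    }
  return r;
}

// Case 1: full vector, N narrow elements in, N wide elements out.
wide_vec
widen_plus (const narrow_vec &a, const narrow_vec &b)
{
  return widen_plus_select (a, b, 0, 1, a.size ());
}

// Case 2: hi/lo, each half yields N/2 wide elements.
wide_vec
widen_plus_lo (const narrow_vec &a, const narrow_vec &b)
{
  return widen_plus_select (a, b, 0, 1, a.size () / 2);
}

wide_vec
widen_plus_hi (const narrow_vec &a, const narrow_vec &b)
{
  return widen_plus_select (a, b, a.size () / 2, 1, a.size () / 2);
}

// Case 3: even/odd, each yields N/2 wide elements from alternating lanes.
wide_vec
widen_plus_even (const narrow_vec &a, const narrow_vec &b)
{
  return widen_plus_select (a, b, 0, 2, a.size () / 2);
}

wide_vec
widen_plus_odd (const narrow_vec &a, const narrow_vec &b)
{
  return widen_plus_select (a, b, 1, 2, a.size () / 2);
}
```

Concatenating lo and hi reproduces the full-vector result directly,
while even and odd need an interleave to get back to element order.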

Richard.

Andre Vieira (lists) May 18, 2023, 5:15 p.m. UTC | #13

How about this?

Not sure about the DEF_INTERNAL documentation I rewrote in 
internal-fn.def, was struggling to word these, so improvements welcome!

gcc/ChangeLog:

2023-04-25  Andre Vieira  <andre.simoesdiasvieira@arm.com>
             Joel Hutton  <joel.hutton@arm.com>
             Tamar Christina  <tamar.christina@arm.com>

         * config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>):
         Rename this ...
         (vec_widen_<su>add_lo_<mode>): ... to this.
         (vec_widen_<su>addl_hi_<mode>): Rename this ...
         (vec_widen_<su>add_hi_<mode>): ... to this.
         (vec_widen_<su>subl_lo_<mode>): Rename this ...
         (vec_widen_<su>sub_lo_<mode>): ... to this.
         (vec_widen_<su>subl_hi_<mode>): Rename this ...
         (vec_widen_<su>sub_hi_<mode>): ... to this.
         * doc/generic.texi: Document new IFN codes.
	* internal-fn.cc (ifn_cmp): Function to compare ifn's for
	sorting/searching.
	(lookup_hilo_internal_fn): Add lookup function.
	(commutative_binary_fn_p): Add widen_plus fn's.
	(widening_fn_p): New function.
	(narrowing_fn_p): New function.
         (direct_internal_fn_optab): Change visibility.
	* internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
         internal_fn that expands into multiple internal_fns for widening.
         (DEF_INTERNAL_NARROWING_OPTAB_FN): Likewise but for narrowing.
         (IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
          IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD,
          IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI, IFN_VEC_WIDEN_MINUS_LO,
          IFN_VEC_WIDEN_MINUS_ODD, IFN_VEC_WIDEN_MINUS_EVEN): Define widening
         plus,minus functions.
	* internal-fn.h (direct_internal_fn_optab): Declare new prototype.
	(lookup_hilo_internal_fn): Likewise.
	(widening_fn_p): Likewise.
	(narrowing_fn_p): Likewise.
	* optabs.cc (commutative_optab_p): Add widening plus optabs.
	* optabs.def (OPTAB_D): Define widen add, sub optabs.
         * tree-cfg.cc (verify_gimple_call): Add checks for widening ifns.
         * tree-inline.cc (estimate_num_insns): Return same
         cost for widen add and sub IFNs as previous tree_codes.
	* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
         patterns with a hi/lo or even/odd split.
         (vect_recog_sad_pattern): Refactor to use new IFN codes.
         (vect_recog_widen_plus_pattern): Likewise.
         (vect_recog_widen_minus_pattern): Likewise.
         (vect_recog_average_pattern): Likewise.
	* tree-vect-stmts.cc (vectorizable_conversion): Add support for
         _HILO IFNs.
	(supportable_widening_operation): Likewise.
         * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-widen-add.c: Test that new
     IFN_VEC_WIDEN_PLUS is being used.
	* gcc.target/aarch64/vect-widen-sub.c: Test that new
     IFN_VEC_WIDEN_MINUS is being used.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bfc98a8d943467b33390defab9682f44efab5907..ffbbecb9409e1c2835d658c2a8855cd0e955c0f2 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4626,7 +4626,7 @@
   [(set_attr "type" "neon_<ADDSUB:optab>_long")]
 )
 
-(define_expand "vec_widen_<su>addl_lo_<mode>"
+(define_expand "vec_widen_<su>add_lo_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
@@ -4638,7 +4638,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_<su>addl_hi_<mode>"
+(define_expand "vec_widen_<su>add_hi_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
@@ -4650,7 +4650,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_<su>subl_lo_<mode>"
+(define_expand "vec_widen_<su>sub_lo_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
@@ -4662,7 +4662,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_<su>subl_hi_<mode>"
+(define_expand "vec_widen_<su>sub_hi_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 8b2882da4fe7da07d22b4e5384d049ba7d3907bf..5e36dac2b1a10257616f12cdfb0b12d0f2879ae9 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1811,10 +1811,16 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}.
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
 @tindex VEC_WIDEN_MULT_LO_EXPR
-@tindex VEC_WIDEN_PLUS_HI_EXPR
-@tindex VEC_WIDEN_PLUS_LO_EXPR
-@tindex VEC_WIDEN_MINUS_HI_EXPR
-@tindex VEC_WIDEN_MINUS_LO_EXPR
+@tindex IFN_VEC_WIDEN_PLUS
+@tindex IFN_VEC_WIDEN_PLUS_HI
+@tindex IFN_VEC_WIDEN_PLUS_LO
+@tindex IFN_VEC_WIDEN_PLUS_EVEN
+@tindex IFN_VEC_WIDEN_PLUS_ODD
+@tindex IFN_VEC_WIDEN_MINUS
+@tindex IFN_VEC_WIDEN_MINUS_HI
+@tindex IFN_VEC_WIDEN_MINUS_LO
+@tindex IFN_VEC_WIDEN_MINUS_EVEN
+@tindex IFN_VEC_WIDEN_MINUS_ODD
 @tindex VEC_UNPACK_HI_EXPR
 @tindex VEC_UNPACK_LO_EXPR
 @tindex VEC_UNPACK_FLOAT_HI_EXPR
@@ -1861,6 +1867,82 @@ vector of @code{N/2} products. In the case of @code{VEC_WIDEN_MULT_LO_EXPR} the
 low @code{N/2} elements of the two vector are multiplied to produce the
 vector of @code{N/2} products.
 
+@item IFN_VEC_WIDEN_PLUS
+This internal function represents widening vector addition of two input
+vectors.  Its operands are vectors that contain the same number of elements
+(@code{N}) of the same integral type.  The result is a vector that contains
+the same number (@code{N}) of elements, of an integral type twice as wide as
+that of the input vectors.  If the current target does not implement the
+corresponding optabs, the vectorizer may choose to split it into either a pair
+of @code{IFN_VEC_WIDEN_PLUS_HI} and @code{IFN_VEC_WIDEN_PLUS_LO} or
+@code{IFN_VEC_WIDEN_PLUS_EVEN} and @code{IFN_VEC_WIDEN_PLUS_ODD}, depending
+on what optabs the target implements.
+
+@item IFN_VEC_WIDEN_PLUS_HI
+@itemx IFN_VEC_WIDEN_PLUS_LO
+These internal functions represent widening vector addition of the high and low
+parts of the two input vectors, respectively.  Their operands are vectors that
+contain the same number of elements (@code{N}) of the same integral type.  The
+result is a vector that contains half as many elements, of an integral type
+whose size is twice as wide.  In the case of @code{IFN_VEC_WIDEN_PLUS_HI} the
+high @code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} additions.  In the case of @code{IFN_VEC_WIDEN_PLUS_LO} the low
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} additions.
+
+@item IFN_VEC_WIDEN_PLUS_EVEN
+@itemx IFN_VEC_WIDEN_PLUS_ODD
+These internal functions represent widening vector addition of the even and odd
+elements of the two input vectors, respectively.  Their operands are vectors
+that contain the same number of elements (@code{N}) of the same integral type.
+The result is a vector that contains half as many elements, of an integral type
+whose size is twice as wide.  In the case of @code{IFN_VEC_WIDEN_PLUS_EVEN} the
+even @code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} additions.  In the case of @code{IFN_VEC_WIDEN_PLUS_ODD} the odd
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} additions.
+
+@item IFN_VEC_WIDEN_MINUS
+This internal function represents widening vector subtraction of two input
+vectors.  Its operands are vectors that contain the same number of elements
+(@code{N}) of the same integral type.  The result is a vector that contains
+the same number (@code{N}) of elements, of an integral type twice as wide as
+that of the input vectors.  If the current target does not implement the
+corresponding optabs, the vectorizer may choose to split it into either a pair
+of @code{IFN_VEC_WIDEN_MINUS_HI} and @code{IFN_VEC_WIDEN_MINUS_LO} or
+@code{IFN_VEC_WIDEN_MINUS_EVEN} and @code{IFN_VEC_WIDEN_MINUS_ODD}, depending
+on what optabs the target implements.
+
+@item IFN_VEC_WIDEN_MINUS_HI
+@itemx IFN_VEC_WIDEN_MINUS_LO
+These internal functions represent widening vector subtraction of the high and
+low parts of the two input vectors, respectively.  Their operands are vectors
+that contain the same number of elements (@code{N}) of the same integral type.
+The high/low elements of the second vector are subtracted from the high/low
+elements of the first.  The result is a vector that contains half as many
+elements, of an integral type whose size is twice as wide.  In the case of
+@code{IFN_VEC_WIDEN_MINUS_HI} the high @code{N/2} elements of the second
+vector are subtracted from the high @code{N/2} of the first to produce the
+vector of @code{N/2} subtractions.  In the case of
+@code{IFN_VEC_WIDEN_MINUS_LO} the low @code{N/2} elements of the second
+vector are subtracted from the low @code{N/2} of the first to produce the
+vector of @code{N/2} subtractions.
+
+@item IFN_VEC_WIDEN_MINUS_EVEN
+@itemx IFN_VEC_WIDEN_MINUS_ODD
+These internal functions represent widening vector subtraction of the even and
+odd elements of the two input vectors, respectively.  Their operands are vectors
+that contain the same number of elements (@code{N}) of the same integral type.
+The even/odd elements of the second vector are subtracted from the even/odd
+elements of the first.  The result is a vector that contains half as many
+elements, of an integral type whose size is twice as wide.  In the case of
+@code{IFN_VEC_WIDEN_MINUS_EVEN} the even @code{N/2} elements of the second
+vector are subtracted from the even @code{N/2} of the first to produce the
+vector of @code{N/2} subtractions.  In the case of
+@code{IFN_VEC_WIDEN_MINUS_ODD} the odd @code{N/2} elements of the second
+vector are subtracted from the odd @code{N/2} of the first to produce the
+vector of @code{N/2} subtractions.
+
 @item VEC_WIDEN_PLUS_HI_EXPR
 @itemx VEC_WIDEN_PLUS_LO_EXPR
 These nodes represent widening vector addition of the high and low parts of
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 594bd3043f0e944299ddfff219f757ef15a3dd61..33f4b7064a2a22aad49f27b24b409e91a5b89c69 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1187,6 +1187,7 @@ gimple_range_op_handler::maybe_non_standard ()
 {
   range_operator *signed_op = ptr_op_widen_mult_signed;
   range_operator *unsigned_op = ptr_op_widen_mult_unsigned;
+  bool signed1, signed2, signed_ret;
   if (gimple_code (m_stmt) == GIMPLE_ASSIGN)
     switch (gimple_assign_rhs_code (m_stmt))
       {
@@ -1202,32 +1203,55 @@ gimple_range_op_handler::maybe_non_standard ()
 	  m_op1 = gimple_assign_rhs1 (m_stmt);
 	  m_op2 = gimple_assign_rhs2 (m_stmt);
 	  tree ret = gimple_assign_lhs (m_stmt);
-	  bool signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
-	  bool signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
-	  bool signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED;
-
-	  /* Normally these operands should all have the same sign, but
-	     some passes and violate this by taking mismatched sign args.  At
-	     the moment the only one that's possible is mismatch inputs and
-	     unsigned output.  Once ranger supports signs for the operands we
-	     can properly fix it,  for now only accept the case we can do
-	     correctly.  */
-	  if ((signed1 ^ signed2) && signed_ret)
-	    return;
-
-	  m_valid = true;
-	  if (signed2 && !signed1)
-	    std::swap (m_op1, m_op2);
-
-	  if (signed1 || signed2)
-	    m_int = signed_op;
-	  else
-	    m_int = unsigned_op;
+	  signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
+	  signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
+	  signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED;
 	  break;
 	}
 	default:
-	  break;
+	  return;
+      }
+  else if (gimple_code (m_stmt) == GIMPLE_CALL
+      && gimple_call_internal_p (m_stmt)
+      && gimple_get_lhs (m_stmt) != NULL_TREE)
+    switch (gimple_call_internal_fn (m_stmt))
+      {
+      case IFN_VEC_WIDEN_PLUS_LO:
+      case IFN_VEC_WIDEN_PLUS_HI:
+	  {
+	    signed_op = ptr_op_widen_plus_signed;
+	    unsigned_op = ptr_op_widen_plus_unsigned;
+	    m_valid = false;
+	    m_op1 = gimple_call_arg (m_stmt, 0);
+	    m_op2 = gimple_call_arg (m_stmt, 1);
+	    tree ret = gimple_get_lhs (m_stmt);
+	    signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
+	    signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
+	    signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED;
+	    break;
+	  }
+      default:
+	return;
       }
+  else
+    return;
+
+  /* Normally these operands should all have the same sign, but some passes
+     violate this by taking mismatched sign args.  At the moment the only
+     one that's possible is mismatched inputs and unsigned output.  Once ranger
+     supports signs for the operands we can properly fix it; for now only
+     accept the case we can do correctly.  */
+  if ((signed1 ^ signed2) && signed_ret)
+    return;
+
+  m_valid = true;
+  if (signed2 && !signed1)
+    std::swap (m_op1, m_op2);
+
+  if (signed1 || signed2)
+    m_int = signed_op;
+  else
+    m_int = unsigned_op;
 }
 
 // Set up a gimple_range_op_handler for any built in function which can be
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 5c9da73ea11f8060b18dcf513599c9694fa4f2ad..348bee35a35ae4ed9a8652f5349f430c2733e1cb 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -90,6 +90,71 @@ lookup_internal_fn (const char *name)
   return entry ? *entry : IFN_LAST;
 }
 
+/* Given an internal_fn IFN that is either a widening or narrowing function,
+   return its corresponding LO and HI internal_fns.  */
+
+void
+lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi)
+{
+  gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn));
+
+  switch (ifn)
+    {
+    default:
+      gcc_unreachable ();
+#undef DEF_INTERNAL_FN
+#undef DEF_INTERNAL_WIDENING_OPTAB_FN
+#undef DEF_INTERNAL_NARROWING_OPTAB_FN
+#define DEF_INTERNAL_FN(NAME, FLAGS, TYPE)
+#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T)	\
+    case IFN_##NAME:						\
+      *lo = internal_fn (IFN_##NAME##_LO);			\
+      *hi = internal_fn (IFN_##NAME##_HI);			\
+      break;
+#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, O, T)	\
+    case IFN_##NAME:					\
+      *lo = internal_fn (IFN_##NAME##_LO);		\
+      *hi = internal_fn (IFN_##NAME##_HI);		\
+      break;
+#include "internal-fn.def"
+#undef DEF_INTERNAL_FN
+#undef DEF_INTERNAL_WIDENING_OPTAB_FN
+#undef DEF_INTERNAL_NARROWING_OPTAB_FN
+    }
+}
+
+void
+lookup_evenodd_internal_fn (internal_fn ifn, internal_fn *even,
+			    internal_fn *odd)
+{
+  gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn));
+
+  switch (ifn)
+    {
+    default:
+      gcc_unreachable ();
+#undef DEF_INTERNAL_FN
+#undef DEF_INTERNAL_WIDENING_OPTAB_FN
+#undef DEF_INTERNAL_NARROWING_OPTAB_FN
+#define DEF_INTERNAL_FN(NAME, FLAGS, TYPE)
+#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T)	\
+    case IFN_##NAME:						\
+      *even = internal_fn (IFN_##NAME##_EVEN);			\
+      *odd = internal_fn (IFN_##NAME##_ODD);			\
+      break;
+#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, O, T)	\
+    case IFN_##NAME:					\
+      *even = internal_fn (IFN_##NAME##_EVEN);		\
+      *odd = internal_fn (IFN_##NAME##_ODD);		\
+      break;
+#include "internal-fn.def"
+#undef DEF_INTERNAL_FN
+#undef DEF_INTERNAL_WIDENING_OPTAB_FN
+#undef DEF_INTERNAL_NARROWING_OPTAB_FN
+    }
+}
+
+
 /* Fnspec of each internal function, indexed by function number.  */
 const_tree internal_fn_fnspec_array[IFN_LAST + 1];
 
@@ -3852,7 +3917,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 
 /* Return the optab used by internal function FN.  */
 
-static optab
+optab
 direct_internal_fn_optab (internal_fn fn, tree_pair types)
 {
   switch (fn)
@@ -3971,6 +4036,9 @@ commutative_binary_fn_p (internal_fn fn)
     case IFN_UBSAN_CHECK_MUL:
     case IFN_ADD_OVERFLOW:
     case IFN_MUL_OVERFLOW:
+    case IFN_VEC_WIDEN_PLUS:
+    case IFN_VEC_WIDEN_PLUS_LO:
+    case IFN_VEC_WIDEN_PLUS_HI:
       return true;
 
     default:
@@ -4044,6 +4112,68 @@ first_commutative_argument (internal_fn fn)
     }
 }
 
+/* Return true if this CODE describes an internal_fn that returns a vector with
+   elements twice as wide as the element size of the input vectors.  */
+
+bool
+widening_fn_p (code_helper code)
+{
+  if (!code.is_fn_code ())
+    return false;
+
+  if (!internal_fn_p ((combined_fn) code))
+    return false;
+
+  internal_fn fn = as_internal_fn ((combined_fn) code);
+  switch (fn)
+    {
+    #undef DEF_INTERNAL_WIDENING_OPTAB_FN
+    #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \
+    case IFN_##NAME:						  \
+    case IFN_##NAME##_HI:					  \
+    case IFN_##NAME##_LO:					  \
+    case IFN_##NAME##_EVEN:					  \
+    case IFN_##NAME##_ODD:					  \
+      return true;
+    #include "internal-fn.def"
+    #undef DEF_INTERNAL_WIDENING_OPTAB_FN
+
+    default:
+      return false;
+    }
+}
+
+/* Return true if this CODE describes an internal_fn that returns a vector with
+   elements half as wide as the elements of the input vectors.  */
+
+bool
+narrowing_fn_p (code_helper code)
+{
+  if (!code.is_fn_code ())
+    return false;
+
+  if (!internal_fn_p ((combined_fn) code))
+    return false;
+
+  internal_fn fn = as_internal_fn ((combined_fn) code);
+  switch (fn)
+    {
+    #undef DEF_INTERNAL_NARROWING_OPTAB_FN
+    #define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, O, T)  \
+    case IFN_##NAME:					    \
+    case IFN_##NAME##_HI:				    \
+    case IFN_##NAME##_LO:				    \
+    case IFN_##NAME##_EVEN:				    \
+    case IFN_##NAME##_ODD:				    \
+      return true;
+    #include "internal-fn.def"
+    #undef DEF_INTERNAL_NARROWING_OPTAB_FN
+
+    default:
+      return false;
+    }
+}
+
 /* Return true if IFN_SET_EDOM is supported.  */
 
 bool
@@ -4072,6 +4202,8 @@ set_edom_supported_p (void)
     expand_##TYPE##_optab_fn (fn, stmt, which_optab);			\
   }
 #include "internal-fn.def"
+#undef DEF_INTERNAL_OPTAB_FN
+#undef DEF_INTERNAL_SIGNED_OPTAB_FN
 
 /* Routines to expand each internal function, indexed by function number.
    Each routine has the prototype:
@@ -4080,6 +4212,7 @@ set_edom_supported_p (void)
 
    where STMT is the statement that performs the call. */
 static void (*const internal_fn_expanders[]) (internal_fn, gcall *) = {
+
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) expand_##CODE,
 #include "internal-fn.def"
   0
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..e9edaa201ad4ad171a49119efa9d6bff49add9f4 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -85,6 +85,34 @@ along with GCC; see the file COPYING3.  If not see
    says that the function extends the C-level BUILT_IN_<NAME>{,L,LL,IMAX}
    group of functions to any integral mode (including vector modes).
 
+   DEF_INTERNAL_WIDENING_OPTAB_FN is a wrapper that defines five internal
+   functions with DEF_INTERNAL_SIGNED_OPTAB_FN:
+   - one that describes a widening operation with the same number of elements
+   in the output and input vectors,
+   - two that describe a pair of high-low widening operations where the output
+   vectors each have half the number of elements of the input vectors,
+   corresponding to the result of the widening operation on the top half and
+   bottom half; these have the suffixes _HI and _LO,
+   - and two that describe a pair of even-odd widening operations where the
+   output vectors each have half the number of elements of the input vectors,
+   corresponding to the result of the widening operation on the even and odd
+   elements; these have the suffixes _EVEN and _ODD.
+   These five internal functions will require two optabs each, a SIGNED_OPTAB
+   and an UNSIGNED_OPTAB.
+
+   DEF_INTERNAL_NARROWING_OPTAB_FN is a wrapper that defines five internal
+   functions with DEF_INTERNAL_OPTAB_FN:
+   - one that describes a narrowing operation with the same number of elements
+   in the output and input vectors,
+   - two that describe a pair of high-low narrowing operations where the output
+   vector has the same number of elements in the top or bottom halves as the
+   full input vectors; these have the suffixes _HI and _LO,
+   - and two that describe a pair of even-odd narrowing operations where the
+   output vector has the same number of elements, in the even or odd positions,
+   as the full input vectors; these have the suffixes _EVEN and _ODD.
+   These five internal functions will require an optab each.
+
+
    Each entry must have a corresponding expander of the form:
 
      void expand_NAME (gimple_call stmt)
@@ -123,6 +151,24 @@ along with GCC; see the file COPYING3.  If not see
   DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE)
 #endif
 
+#ifndef DEF_INTERNAL_WIDENING_OPTAB_FN
+#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)		    \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)			    \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _LO, FLAGS, SELECTOR, SOPTAB##_lo, UOPTAB##_lo, TYPE)	    \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _HI, FLAGS, SELECTOR, SOPTAB##_hi, UOPTAB##_hi, TYPE)	    \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _EVEN, FLAGS, SELECTOR, SOPTAB##_even, UOPTAB##_even, TYPE) \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _ODD, FLAGS, SELECTOR, SOPTAB##_odd, UOPTAB##_odd, TYPE)
+#endif
+
+#ifndef DEF_INTERNAL_NARROWING_OPTAB_FN
+#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE)   \
+  DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE)		    \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _LO, FLAGS, OPTAB##_lo, TYPE)	    \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _HI, FLAGS, OPTAB##_hi, TYPE)	    \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _EVEN, FLAGS, OPTAB##_even, TYPE)  \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _ODD, FLAGS, OPTAB##_odd, TYPE)
+#endif
+
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD, ECF_PURE, maskload, mask_load)
 DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_CONST, vec_load_lanes, load_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
@@ -315,6 +361,16 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
 DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
+DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_PLUS,
+				ECF_CONST | ECF_NOTHROW,
+				first,
+				vec_widen_sadd, vec_widen_uadd,
+				binary)
+DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_MINUS,
+				ECF_CONST | ECF_NOTHROW,
+				first,
+				vec_widen_ssub, vec_widen_usub,
+				binary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 08922ed4254898f5fffca3f33973e96ed9ce772f..3904ba3ca36949d844532a6a9303f550533311a4 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -20,6 +20,10 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_INTERNAL_FN_H
 #define GCC_INTERNAL_FN_H
 
+#include "insn-codes.h"
+#include "insn-opinit.h"
+
+
 /* INTEGER_CST values for IFN_UNIQUE function arg-0.
 
    UNSPEC: Undifferentiated UNIQUE.
@@ -112,6 +116,10 @@ internal_fn_name (enum internal_fn fn)
 }
 
 extern internal_fn lookup_internal_fn (const char *);
+extern void lookup_hilo_internal_fn (internal_fn, internal_fn *, internal_fn *);
+extern void lookup_evenodd_internal_fn (internal_fn, internal_fn *,
+					internal_fn *);
+extern optab direct_internal_fn_optab (internal_fn, tree_pair);
 
 /* Return the ECF_* flags for function FN.  */
 
@@ -210,6 +218,8 @@ extern bool commutative_binary_fn_p (internal_fn);
 extern bool commutative_ternary_fn_p (internal_fn);
 extern int first_commutative_argument (internal_fn);
 extern bool associative_binary_fn_p (internal_fn);
+extern bool widening_fn_p (code_helper);
+extern bool narrowing_fn_p (code_helper);
 
 extern bool set_edom_supported_p (void);
 
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index c8e39c82d57a7d726e7da33d247b80f32ec9236c..5a08d91e550b2d92e9572211f811fdba99a33a38 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -1314,7 +1314,15 @@ commutative_optab_p (optab binoptab)
 	  || binoptab == smul_widen_optab
 	  || binoptab == umul_widen_optab
 	  || binoptab == smul_highpart_optab
-	  || binoptab == umul_highpart_optab);
+	  || binoptab == umul_highpart_optab
+	  || binoptab == vec_widen_saddl_hi_optab
+	  || binoptab == vec_widen_saddl_lo_optab
+	  || binoptab == vec_widen_uaddl_hi_optab
+	  || binoptab == vec_widen_uaddl_lo_optab
+	  || binoptab == vec_widen_sadd_hi_optab
+	  || binoptab == vec_widen_sadd_lo_optab
+	  || binoptab == vec_widen_uadd_hi_optab
+	  || binoptab == vec_widen_uadd_lo_optab);
 }
 
 /* X is to be used in mode MODE as operand OPN to BINOPTAB.  If we're
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 695f5911b300c9ca5737de9be809fa01aabe5e01..d41ed6e1afaddd019c7470f965c0ad21c8b2b9d7 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -410,6 +410,16 @@ OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a")
 OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a")
 OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a")
 OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a")
+OPTAB_D (vec_widen_ssub_optab, "vec_widen_ssub_$a")
+OPTAB_D (vec_widen_ssub_hi_optab, "vec_widen_ssub_hi_$a")
+OPTAB_D (vec_widen_ssub_lo_optab, "vec_widen_ssub_lo_$a")
+OPTAB_D (vec_widen_ssub_odd_optab, "vec_widen_ssub_odd_$a")
+OPTAB_D (vec_widen_ssub_even_optab, "vec_widen_ssub_even_$a")
+OPTAB_D (vec_widen_sadd_optab, "vec_widen_sadd_$a")
+OPTAB_D (vec_widen_sadd_hi_optab, "vec_widen_sadd_hi_$a")
+OPTAB_D (vec_widen_sadd_lo_optab, "vec_widen_sadd_lo_$a")
+OPTAB_D (vec_widen_sadd_odd_optab, "vec_widen_sadd_odd_$a")
+OPTAB_D (vec_widen_sadd_even_optab, "vec_widen_sadd_even_$a")
 OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
 OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
 OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
@@ -422,6 +432,16 @@ OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
 OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
 OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
 OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
+OPTAB_D (vec_widen_usub_optab, "vec_widen_usub_$a")
+OPTAB_D (vec_widen_usub_hi_optab, "vec_widen_usub_hi_$a")
+OPTAB_D (vec_widen_usub_lo_optab, "vec_widen_usub_lo_$a")
+OPTAB_D (vec_widen_usub_odd_optab, "vec_widen_usub_odd_$a")
+OPTAB_D (vec_widen_usub_even_optab, "vec_widen_usub_even_$a")
+OPTAB_D (vec_widen_uadd_optab, "vec_widen_uadd_$a")
+OPTAB_D (vec_widen_uadd_hi_optab, "vec_widen_uadd_hi_$a")
+OPTAB_D (vec_widen_uadd_lo_optab, "vec_widen_uadd_lo_$a")
+OPTAB_D (vec_widen_uadd_odd_optab, "vec_widen_uadd_odd_$a")
+OPTAB_D (vec_widen_uadd_even_optab, "vec_widen_uadd_even_$a")
 OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
 OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
 OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
index 220bd9352a4c7acd2e3713e441d74898d3e92b30..7037673d32bd780e1c9b58a51e58e2bac3b30b7e 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -save-temps" } */
+/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */
 #include <stdint.h>
 #include <string.h>
 
@@ -86,6 +86,8 @@ main()
     return 0;
 }
 
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_LO" "vect"   } } */
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_HI" "vect"   } } */
 /* { dg-final { scan-assembler-times {\tuaddl\t} 1} } */
 /* { dg-final { scan-assembler-times {\tuaddl2\t} 1} } */
 /* { dg-final { scan-assembler-times {\tsaddl\t} 1} } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
index a2bed63affbd091977df95a126da1f5b8c1d41d2..83bc1edb6105f47114b665e24a13e6194b2179a2 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -save-temps" } */
+/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */
 #include <stdint.h>
 #include <string.h>
 
@@ -86,6 +86,8 @@ main()
     return 0;
 }
 
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_LO" "vect"   } } */
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_HI" "vect"   } } */
 /* { dg-final { scan-assembler-times {\tusubl\t} 1} } */
 /* { dg-final { scan-assembler-times {\tusubl2\t} 1} } */
 /* { dg-final { scan-assembler-times {\tssubl\t} 1} } */
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 0aeebb67fac864db284985f4a6f0653af281d62b..0e847cd04ca6e33f67a86a78a36d35d42aba2627 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "asan.h"
 #include "profile.h"
 #include "sreal.h"
+#include "internal-fn.h"
 
 /* This file contains functions for building the Control Flow Graph (CFG)
    for a function tree.  */
@@ -3411,6 +3412,40 @@ verify_gimple_call (gcall *stmt)
 	  debug_generic_stmt (fn);
 	  return true;
 	}
+      internal_fn ifn = gimple_call_internal_fn (stmt);
+      if (ifn == IFN_LAST)
+	{
+	  error ("gimple call has an invalid IFN");
+	  debug_generic_stmt (fn);
+	  return true;
+	}
+      else if (widening_fn_p (ifn)
+	       || narrowing_fn_p (ifn))
+	{
+	  tree lhs = gimple_get_lhs (stmt);
+	  if (!lhs)
+	    {
+	      error ("vector IFN call with no lhs");
+	      debug_generic_stmt (fn);
+	      return true;
+	    }
+
+	  bool non_vector_operands = false;
+	  for (unsigned i = 0; i < gimple_call_num_args (stmt); ++i)
+	    if (!VECTOR_TYPE_P (TREE_TYPE (gimple_call_arg (stmt, i))))
+	      {
+		non_vector_operands = true;
+		break;
+	      }
+
+	  if (non_vector_operands
+	      || !VECTOR_TYPE_P (TREE_TYPE (lhs)))
+	    {
+	      error ("invalid non-vector operands in vector IFN call");
+	      debug_generic_stmt (fn);
+	      return true;
+	    }
+	}
     }
   else
     {
diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index 63a19f8d1d89c6bd5d8e55a299cbffaa324b4b84..d74d8db2173b1ab117250fea89de5212d5e354ec 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -4433,7 +4433,20 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)
 	tree decl;
 
 	if (gimple_call_internal_p (stmt))
-	  return 0;
+	  {
+	    internal_fn fn = gimple_call_internal_fn (stmt);
+	    switch (fn)
+	      {
+	      case IFN_VEC_WIDEN_PLUS_HI:
+	      case IFN_VEC_WIDEN_PLUS_LO:
+	      case IFN_VEC_WIDEN_MINUS_HI:
+	      case IFN_VEC_WIDEN_MINUS_LO:
+		return 1;
+
+	      default:
+		return 0;
+	      }
+	  }
 	else if ((decl = gimple_call_fndecl (stmt))
 		 && fndecl_built_in_p (decl))
 	  {
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 1778af0242898e3dc73d94d22a5b8505628a53b5..dcd4b5561600346a2c10bd5133507329206e8837 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -562,21 +562,30 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
 
 static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
-		      tree_code widened_code, bool shift_p,
+		      code_helper widened_code, bool shift_p,
 		      unsigned int max_nops,
 		      vect_unpromoted_value *unprom, tree *common_type,
 		      enum optab_subtype *subtype = NULL)
 {
   /* Check for an integer operation with the right code.  */
-  gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
-  if (!assign)
+  gimple *stmt = stmt_info->stmt;
+  if (!(is_gimple_assign (stmt) || is_gimple_call (stmt)))
+    return 0;
+
+  code_helper rhs_code;
+  if (is_gimple_assign (stmt))
+    rhs_code = gimple_assign_rhs_code (stmt);
+  else if (is_gimple_call (stmt))
+    rhs_code = gimple_call_combined_fn (stmt);
+  else
     return 0;
 
-  tree_code rhs_code = gimple_assign_rhs_code (assign);
-  if (rhs_code != code && rhs_code != widened_code)
+  if (rhs_code != code
+      && rhs_code != widened_code)
     return 0;
 
-  tree type = TREE_TYPE (gimple_assign_lhs (assign));
+  tree lhs = gimple_get_lhs (stmt);
+  tree type = TREE_TYPE (lhs);
   if (!INTEGRAL_TYPE_P (type))
     return 0;
 
@@ -589,7 +598,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
     {
       vect_unpromoted_value *this_unprom = &unprom[next_op];
       unsigned int nops = 1;
-      tree op = gimple_op (assign, i + 1);
+      tree op = gimple_arg (stmt, i);
       if (i == 1 && TREE_CODE (op) == INTEGER_CST)
 	{
 	  /* We already have a common type from earlier operands.
@@ -1343,7 +1352,8 @@ vect_recog_sad_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom[2];
-  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_MINUS_EXPR,
+  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR,
+			     IFN_VEC_WIDEN_MINUS,
 			     false, 2, unprom, &half_type))
     return NULL;
 
@@ -1395,14 +1405,16 @@ static gimple *
 vect_recog_widen_op_pattern (vec_info *vinfo,
 			     stmt_vec_info last_stmt_info, tree *type_out,
 			     tree_code orig_code, code_helper wide_code,
-			     bool shift_p, const char *name)
+			     bool shift_p, const char *name,
+			     optab_subtype *subtype = NULL)
 {
   gimple *last_stmt = last_stmt_info->stmt;
 
   vect_unpromoted_value unprom[2];
   tree half_type;
   if (!vect_widened_op_tree (vinfo, last_stmt_info, orig_code, orig_code,
-			     shift_p, 2, unprom, &half_type))
+			     shift_p, 2, unprom, &half_type, subtype))
+
     return NULL;
 
   /* Pattern detected.  */
@@ -1468,6 +1480,20 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
 			      type, pattern_stmt, vecctype);
 }
 
+static gimple *
+vect_recog_widen_op_pattern (vec_info *vinfo,
+			     stmt_vec_info last_stmt_info, tree *type_out,
+			     tree_code orig_code, internal_fn wide_ifn,
+			     bool shift_p, const char *name,
+			     optab_subtype *subtype = NULL)
+{
+  combined_fn ifn = as_combined_fn (wide_ifn);
+  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+				      orig_code, ifn, shift_p, name,
+				      subtype);
+}
+
+
 /* Try to detect multiplication on widened inputs, converting MULT_EXPR
    to WIDEN_MULT_EXPR.  See vect_recog_widen_op_pattern for details.  */
 
@@ -1481,26 +1507,30 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 }
 
 /* Try to detect addition on widened inputs, converting PLUS_EXPR
-   to WIDEN_PLUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
+   to IFN_VEC_WIDEN_PLUS.  See vect_recog_widen_op_pattern for details.  */
 
 static gimple *
 vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 			       tree *type_out)
 {
+  optab_subtype subtype;
   return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
-				      PLUS_EXPR, WIDEN_PLUS_EXPR, false,
-				      "vect_recog_widen_plus_pattern");
+				      PLUS_EXPR, IFN_VEC_WIDEN_PLUS,
+				      false, "vect_recog_widen_plus_pattern",
+				      &subtype);
 }
 
 /* Try to detect subtraction on widened inputs, converting MINUS_EXPR
-   to WIDEN_MINUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
+   to IFN_VEC_WIDEN_MINUS.  See vect_recog_widen_op_pattern for details.  */
 static gimple *
 vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 			       tree *type_out)
 {
+  optab_subtype subtype;
   return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
-				      MINUS_EXPR, WIDEN_MINUS_EXPR, false,
-				      "vect_recog_widen_minus_pattern");
+				      MINUS_EXPR, IFN_VEC_WIDEN_MINUS,
+				      false, "vect_recog_widen_minus_pattern",
+				      &subtype);
 }
 
 /* Function vect_recog_ctz_ffs_pattern
@@ -3078,7 +3108,7 @@ vect_recog_average_pattern (vec_info *vinfo,
   vect_unpromoted_value unprom[3];
   tree new_type;
   unsigned int nops = vect_widened_op_tree (vinfo, plus_stmt_info, PLUS_EXPR,
-					    WIDEN_PLUS_EXPR, false, 3,
+					    IFN_VEC_WIDEN_PLUS, false, 3,
 					    unprom, &new_type);
   if (nops == 0)
     return NULL;
@@ -6469,6 +6499,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_mask_conversion_pattern, "mask_conversion" },
   { vect_recog_widen_plus_pattern, "widen_plus" },
   { vect_recog_widen_minus_pattern, "widen_minus" },
+  /* These must come after the double widening ones.  */
 };
 
 const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index d152ae9ab10b361b88c0f839d6951c43b954750a..132c0337b7f541bfb114c0a3d2abbeffdad79880 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5038,7 +5038,8 @@ vectorizable_conversion (vec_info *vinfo,
   bool widen_arith = (code == WIDEN_PLUS_EXPR
 		 || code == WIDEN_MINUS_EXPR
 		 || code == WIDEN_MULT_EXPR
-		 || code == WIDEN_LSHIFT_EXPR);
+		 || code == WIDEN_LSHIFT_EXPR
+		 || widening_fn_p (code));
 
   if (!widen_arith
       && !CONVERT_EXPR_CODE_P (code)
@@ -5088,8 +5089,8 @@ vectorizable_conversion (vec_info *vinfo,
       gcc_assert (code == WIDEN_MULT_EXPR
 		  || code == WIDEN_LSHIFT_EXPR
 		  || code == WIDEN_PLUS_EXPR
-		  || code == WIDEN_MINUS_EXPR);
-
+		  || code == WIDEN_MINUS_EXPR
+		  || widening_fn_p (code));
 
       op1 = is_gimple_assign (stmt) ? gimple_assign_rhs2 (stmt) :
 				     gimple_call_arg (stmt, 0);
@@ -12478,26 +12479,69 @@ supportable_widening_operation (vec_info *vinfo,
       optab1 = vec_unpacks_sbool_lo_optab;
       optab2 = vec_unpacks_sbool_hi_optab;
     }
-  else
-    {
-      optab1 = optab_for_tree_code (c1, vectype, optab_default);
-      optab2 = optab_for_tree_code (c2, vectype, optab_default);
+
+  vec_mode = TYPE_MODE (vectype);
+  if (widening_fn_p (code))
+     {
+       /* If this is an internal fn then we must check whether the target
+	  supports either a low-high split or an even-odd split.  */
+      internal_fn ifn = as_internal_fn ((combined_fn) code);
+
+      internal_fn lo, hi, even, odd;
+      lookup_hilo_internal_fn (ifn, &lo, &hi);
+      *code1 = as_combined_fn (lo);
+      *code2 = as_combined_fn (hi);
+      optab1 = direct_internal_fn_optab (lo, {vectype, vectype});
+      optab2 = direct_internal_fn_optab (hi, {vectype, vectype});
+
+      /* If we don't support low-high, then check for even-odd.  */
+      if (!optab1
+	  || (icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing
+	  || !optab2
+	  || (icode2 = optab_handler (optab2, vec_mode)) == CODE_FOR_nothing)
+	{
+	  lookup_evenodd_internal_fn (ifn, &even, &odd);
+	  *code1 = as_combined_fn (even);
+	  *code2 = as_combined_fn (odd);
+	  optab1 = direct_internal_fn_optab (even, {vectype, vectype});
+	  optab2 = direct_internal_fn_optab (odd, {vectype, vectype});
+	}
+    }
+  else if (code.is_tree_code ())
+    {
+      if (code == FIX_TRUNC_EXPR)
+	{
+	  /* The signedness is determined from output operand.  */
+	  optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
+	  optab2 = optab_for_tree_code (c2, vectype_out, optab_default);
+	}
+      else if (CONVERT_EXPR_CODE_P ((tree_code) code.safe_as_tree_code ())
+	       && VECTOR_BOOLEAN_TYPE_P (wide_vectype)
+	       && VECTOR_BOOLEAN_TYPE_P (vectype)
+	       && TYPE_MODE (wide_vectype) == TYPE_MODE (vectype)
+	       && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
+	{
+	  /* If the input and result modes are the same, a different optab
+	     is needed where we pass in the number of units in vectype.  */
+	  optab1 = vec_unpacks_sbool_lo_optab;
+	  optab2 = vec_unpacks_sbool_hi_optab;
+	}
+      else
+	{
+	  optab1 = optab_for_tree_code (c1, vectype, optab_default);
+	  optab2 = optab_for_tree_code (c2, vectype, optab_default);
+	}
+      *code1 = c1;
+      *code2 = c2;
     }
 
   if (!optab1 || !optab2)
     return false;
 
-  vec_mode = TYPE_MODE (vectype);
   if ((icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing
        || (icode2 = optab_handler (optab2, vec_mode)) == CODE_FOR_nothing)
     return false;
 
-  if (code.is_tree_code ())
-  {
-    *code1 = c1;
-    *code2 = c2;
-  }
-
 
   if (insn_data[icode1].operand[0].mode == TYPE_MODE (wide_vectype)
       && insn_data[icode2].operand[0].mode == TYPE_MODE (wide_vectype))
diff --git a/gcc/tree.def b/gcc/tree.def
index 90ceeec0b512bfa5f983359c0af03cc71de32007..b37b0b35927b92a6536e5c2d9805ffce8319a240 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1374,15 +1374,16 @@ DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3)
 DEFTREECODE (WIDEN_SUM_EXPR, "widen_sum_expr", tcc_binary, 2)
 
 /* Widening sad (sum of absolute differences).
-   The first two arguments are of type t1 which should be integer.
-   The third argument and the result are of type t2, such that t2 is at least
-   twice the size of t1.  Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is
+   The first two arguments are of type t1 which should be a vector of integers.
+   The third argument and the result are of type t2, such that the size of
+   the elements of t2 is at least twice the size of the elements of t1.
+   Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is
    equivalent to:
-       tmp = WIDEN_MINUS_EXPR (arg1, arg2)
+       tmp = IFN_VEC_WIDEN_MINUS (arg1, arg2)
        tmp2 = ABS_EXPR (tmp)
        arg3 = PLUS_EXPR (tmp2, arg3)
   or:
-       tmp = WIDEN_MINUS_EXPR (arg1, arg2)
+       tmp = IFN_VEC_WIDEN_MINUS (arg1, arg2)
        tmp2 = ABS_EXPR (tmp)
        arg3 = WIDEN_SUM_EXPR (tmp2, arg3)
  */

Richard Biener May 22, 2023, 1:06 p.m. UTC | #14

On Thu, 18 May 2023, Andre Vieira (lists) wrote:

> How about this?
> 
> Not sure about the DEF_INTERNAL documentation I rewrote in internal-fn.def,
> was struggling to word these, so improvements welcome!

The even/odd variant optabs are also commutative_optab_p, so is
the vec_widen_sadd without hi/lo or even/odd.

+/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */

do you really want -all?  I think you want -details

+      else if (widening_fn_p (ifn)
+              || narrowing_fn_p (ifn))
+       {
+         tree lhs = gimple_get_lhs (stmt);
+         if (!lhs)
+           {
+             error ("vector IFN call with no lhs");
+             debug_generic_stmt (fn);

that's an error because ...?  Maybe we want to verify this
for all ECF_CONST|ECF_NOTHROW (or pure instead of const) internal
function calls, but I wouldn't add any verification as part
of this patch (not special to widening/narrowing fns either).

        if (gimple_call_internal_p (stmt))
-         return 0;
+         {
+           internal_fn fn = gimple_call_internal_fn (stmt);
+           switch (fn)
+             {
+             case IFN_VEC_WIDEN_PLUS_HI:
+             case IFN_VEC_WIDEN_PLUS_LO:
+             case IFN_VEC_WIDEN_MINUS_HI:
+             case IFN_VEC_WIDEN_MINUS_LO:
+               return 1;

this now looks incomplete.  I think that we want instead to
have a default: returning 1 and then special-cases we want
to cost as zero.  Not sure which - maybe blame tells why
this was added?  I think we can deal with this as followup
(likewise the ranger additions).
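
A minimal sketch of the shape suggested above (default to a cost of 1, list only the zero-cost exceptions), not the actual estimate_num_insns code. The enum `mock_ifn`, its `ZERO_COST_PLACEHOLDER` member and `mock_estimate_ifn_cost` are hypothetical stand-ins; which real internal functions should cost zero is left open here, as it is in the review.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's internal_fn enum; only a few members
// are modelled and ZERO_COST_PLACEHOLDER does not exist in GCC.
enum class mock_ifn
{
  VEC_WIDEN_PLUS_HI,
  VEC_WIDEN_PLUS_LO,
  VEC_WIDEN_PLUS_EVEN,
  VEC_WIDEN_MINUS_HI,
  VEC_WIDEN_MINUS_LO,
  ZERO_COST_PLACEHOLDER
};

// Every internal function costs 1 unless it is explicitly listed as free,
// so newly added functions (e.g. the _EVEN/_ODD variants) need no extra case.
int
mock_estimate_ifn_cost (mock_ifn fn)
{
  switch (fn)
    {
    case mock_ifn::ZERO_COST_PLACEHOLDER:
      return 0;
    default:
      return 1;
    }
}
```

With this shape the switch no longer has to enumerate each widening variant, which is why the explicit _HI/_LO list reads as incomplete.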

Otherwise looks good to me.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> 2023-04-25  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>             Joel Hutton  <joel.hutton@arm.com>
>             Tamar Christina  <tamar.christina@arm.com>
> 
>         * config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>):
> Rename
>         this ...
>         (vec_widen_<su>add_lo_<mode>): ... to this.
>         (vec_widen_<su>addl_hi_<mode>): Rename this ...
>         (vec_widen_<su>add_hi_<mode>): ... to this.
>         (vec_widen_<su>subl_lo_<mode>): Rename this ...
>         (vec_widen_<su>sub_lo_<mode>): ... to this.
>         (vec_widen_<su>subl_hi_<mode>): Rename this ...
>         (vec_widen_<su>sub_hi_<mode>): ...to this.
>         * doc/generic.texi: Document new IFN codes.
> 	* internal-fn.cc (ifn_cmp): Function to compare ifn's for
> sorting/searching.
> 	(lookup_hilo_internal_fn): Add lookup function.
> 	(commutative_binary_fn_p): Add widen_plus fn's.
> 	(widening_fn_p): New function.
> 	(narrowing_fn_p): New function.
> 	         (direct_internal_fn_optab): Change visibility.
> 	* internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
>         internal_fn that expands into multiple internal_fns for widening.
>         (DEF_INTERNAL_NARROWING_OPTAB_FN): Likewise but for narrowing.
>         (IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
>          IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD,
>          IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI, 
> IFN_VEC_WIDEN_MINUS_LO,
>          IFN_VEC_WIDEN_MINUS_ODD, IFN_VEC_WIDEN_MINUS_EVEN): Define widening
> 	         plus,minus functions.
> 	* internal-fn.h (direct_internal_fn_optab): Declare new prototype.
> 	(lookup_hilo_internal_fn): Likewise.
> 	(widening_fn_p): Likewise.
> 	(narrowing_fn_p): Likewise.
> 	* optabs.cc (commutative_optab_p): Add widening plus optabs.
> 	* optabs.def (OPTAB_D): Define widen add, sub optabs.
>         * tree-cfg.cc (verify_gimple_call): Add checks for widening ifns.
>         * tree-inline.cc (estimate_num_insns): Return same
>         cost for widen add and sub IFNs as previous tree_codes.
> 	* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
>         patterns with a hi/lo or even/odd split.
>         (vect_recog_sad_pattern): Refactor to use new IFN codes.
>         (vect_recog_widen_plus_pattern): Likewise.
>         (vect_recog_widen_minus_pattern): Likewise.
>         (vect_recog_average_pattern): Likewise.
> 	* tree-vect-stmts.cc (vectorizable_conversion): Add support for
> 	         _HILO IFNs.
> 	(supportable_widening_operation): Likewise.
>         * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.
> 
> gcc/testsuite/ChangeLog:
> 
>     	* gcc.target/aarch64/vect-widen-add.c: Test that new
>     IFN_VEC_WIDEN_PLUS is being used.
>     	* gcc.target/aarch64/vect-widen-sub.c: Test that new
>     IFN_VEC_WIDEN_MINUS is being used.
>

Andre Vieira (lists) June 1, 2023, 4:27 p.m. UTC | #15

Hi,

This is the updated patch and cover letter. Patches for inline and 
gimple-op changes will follow soon.

     DEF_INTERNAL_WIDENING_OPTAB_FN and DEF_INTERNAL_NARROWING_OPTAB_FN
are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN
respectively, except that they provide convenience wrappers for a single
vector-to-vector conversion, a hi/lo split or an even/odd split.  For each
of the five functions it creates, a definition for <NAME> requires either
a signed/unsigned pair of optabs named <SOPTAB> and <UOPTAB> (for
widening) or a single <OPTAB> (for narrowing).

      For example, for widening addition DEF_INTERNAL_WIDENING_OPTAB_FN
creates five internal functions: IFN_VEC_WIDEN_PLUS,
IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO, IFN_VEC_WIDEN_PLUS_EVEN and
IFN_VEC_WIDEN_PLUS_ODD.  Each requires two optabs, one signed and one
unsigned.
      AArch64 implements the hi/lo split optabs:
      IFN_VEC_WIDEN_PLUS_HI  -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
      IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_<su>add_lo_<mode> -> (u/s)addl
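
The five-way expansion described above can be sketched with a plain X-macro. This is a simplified, hypothetical model: the `MOCK_` names are not GCC identifiers, and the real macro takes more arguments (FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE) and goes through DEF_INTERNAL_SIGNED_OPTAB_FN rather than emitting enumerators directly.

```cpp
#include <cassert>

// Hypothetical list of widening definitions, standing in for the
// DEF_INTERNAL_WIDENING_OPTAB_FN entries in internal-fn.def.
#define MOCK_WIDENING_FNS(X) \
  X (VEC_WIDEN_PLUS)         \
  X (VEC_WIDEN_MINUS)

// One definition yields five enumerators: the plain function plus the
// _LO/_HI and _EVEN/_ODD variants.
#define MOCK_DEF_WIDENING(NAME) \
  MOCK_IFN_##NAME,              \
  MOCK_IFN_##NAME##_LO,         \
  MOCK_IFN_##NAME##_HI,         \
  MOCK_IFN_##NAME##_EVEN,       \
  MOCK_IFN_##NAME##_ODD,

enum mock_internal_fn
{
  MOCK_WIDENING_FNS (MOCK_DEF_WIDENING)
  MOCK_IFN_LAST
};

// Same idea as lookup_hilo_internal_fn: re-expand the list to map a plain
// widening function to its _LO/_HI pair.  Returns false for anything else
// (the patch asserts instead).
#define MOCK_HILO_CASE(NAME)    \
  case MOCK_IFN_##NAME:         \
    *lo = MOCK_IFN_##NAME##_LO; \
    *hi = MOCK_IFN_##NAME##_HI; \
    return true;

bool
mock_lookup_hilo (mock_internal_fn ifn, mock_internal_fn *lo,
		  mock_internal_fn *hi)
{
  switch (ifn)
    {
    MOCK_WIDENING_FNS (MOCK_HILO_CASE)
    default:
      return false;
    }
}
```

Keeping the list in one place means a new widening operation picks up all five variants and the lookup case without further edits.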

     This gives the same functionality as the previous
WIDEN_PLUS_EXPR/WIDEN_MINUS_EXPR tree codes, which were expanded into
VEC_WIDEN_PLUS_LO_EXPR and VEC_WIDEN_PLUS_HI_EXPR (and the corresponding
MINUS codes).
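
As a rough illustration of what the two splits compute, here is a scalar model of widening addition on an 8-lane vector of 8-bit elements. The helper `mock_widen_plus` and the type aliases are hypothetical. The sketch treats "low" as the first N/2 array elements, whereas on a real target the mapping of hi/lo to lanes depends on endianness.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 8;
using narrow_vec = std::array<std::uint8_t, N>;
using wide_half = std::array<std::uint16_t, N / 2>;

// Add N/2 pairs of input elements, starting at index FIRST and advancing
// by STEP, widening each element before the addition so nothing wraps.
wide_half
mock_widen_plus (const narrow_vec &a, const narrow_vec &b,
		 std::size_t first, std::size_t step)
{
  assert (first + (N / 2 - 1) * step < N);
  wide_half r{};
  for (std::size_t i = 0; i < N / 2; ++i)
    {
      std::size_t idx = first + i * step;
      r[i] = static_cast<std::uint16_t> (std::uint16_t (a[idx])
					 + std::uint16_t (b[idx]));
    }
  return r;
}

// The four half-width variants, in array-index terms.
wide_half mock_plus_lo (const narrow_vec &a, const narrow_vec &b)
{ return mock_widen_plus (a, b, 0, 1); }
wide_half mock_plus_hi (const narrow_vec &a, const narrow_vec &b)
{ return mock_widen_plus (a, b, N / 2, 1); }
wide_half mock_plus_even (const narrow_vec &a, const narrow_vec &b)
{ return mock_widen_plus (a, b, 0, 2); }
wide_half mock_plus_odd (const narrow_vec &a, const narrow_vec &b)
{ return mock_widen_plus (a, b, 1, 2); }
```

Either pair covers all N input lanes between its two results. They differ only in which lanes end up in which half-width result.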

gcc/ChangeLog:

2023-04-25  Andre Vieira  <andre.simoesdiasvieira@arm.com>
             Joel Hutton  <joel.hutton@arm.com>
             Tamar Christina  <tamar.christina@arm.com>

         * config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>):
         Rename this ...
         (vec_widen_<su>add_lo_<mode>): ... to this.
         (vec_widen_<su>addl_hi_<mode>): Rename this ...
         (vec_widen_<su>add_hi_<mode>): ... to this.
         (vec_widen_<su>subl_lo_<mode>): Rename this ...
         (vec_widen_<su>sub_lo_<mode>): ... to this.
         (vec_widen_<su>subl_hi_<mode>): Rename this ...
         (vec_widen_<su>sub_hi_<mode>): ... to this.
         * doc/generic.texi: Document new IFN codes.
	* internal-fn.cc (ifn_cmp): Function to compare ifn's for
	sorting/searching.
	(lookup_hilo_internal_fn): Add lookup function.
	(commutative_binary_fn_p): Add widen_plus fn's.
	(widening_fn_p): New function.
	(narrowing_fn_p): New function.
         (direct_internal_fn_optab): Change visibility.
	* internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
         internal_fn that expands into multiple internal_fns for widening.
         (DEF_INTERNAL_NARROWING_OPTAB_FN): Likewise but for narrowing.
         (IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
          IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD,
          IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI, IFN_VEC_WIDEN_MINUS_LO,
          IFN_VEC_WIDEN_MINUS_ODD, IFN_VEC_WIDEN_MINUS_EVEN): Define widening
         plus, minus functions.
	* internal-fn.h (direct_internal_fn_optab): Declare new prototype.
	(lookup_hilo_internal_fn): Likewise.
	(widening_fn_p): Likewise.
	(narrowing_fn_p): Likewise.
	* optabs.cc (commutative_optab_p): Add widening plus optabs.
	* optabs.def (OPTAB_D): Define widen add, sub optabs.
	* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
         patterns with a hi/lo or even/odd split.
         (vect_recog_sad_pattern): Refactor to use new IFN codes.
         (vect_recog_widen_plus_pattern): Likewise.
         (vect_recog_widen_minus_pattern): Likewise.
         (vect_recog_average_pattern): Likewise.
	* tree-vect-stmts.cc (vectorizable_conversion): Add support for
         _HILO IFNs.
	(supportable_widening_operation): Likewise.
         * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-widen-add.c: Test that new
     IFN_VEC_WIDEN_PLUS is being used.
	* gcc.target/aarch64/vect-widen-sub.c: Test that new
     IFN_VEC_WIDEN_MINUS is being used.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index da9c59e655465a74926b81b95b4ac8c353efb1b7..b404d5cabf9df8ea8c70ea4537deb978d351c51e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4626,7 +4626,7 @@
   [(set_attr "type" "neon_<ADDSUB:optab>_long")]
 )
 
-(define_expand "vec_widen_<su>addl_lo_<mode>"
+(define_expand "vec_widen_<su>add_lo_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
@@ -4638,7 +4638,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_<su>addl_hi_<mode>"
+(define_expand "vec_widen_<su>add_hi_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
@@ -4650,7 +4650,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_<su>subl_lo_<mode>"
+(define_expand "vec_widen_<su>sub_lo_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
@@ -4662,7 +4662,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_<su>subl_hi_<mode>"
+(define_expand "vec_widen_<su>sub_hi_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))
    (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))]
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 8b2882da4fe7da07d22b4e5384d049ba7d3907bf..5e36dac2b1a10257616f12cdfb0b12d0f2879ae9 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1811,10 +1811,16 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}.
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
 @tindex VEC_WIDEN_MULT_LO_EXPR
-@tindex VEC_WIDEN_PLUS_HI_EXPR
-@tindex VEC_WIDEN_PLUS_LO_EXPR
-@tindex VEC_WIDEN_MINUS_HI_EXPR
-@tindex VEC_WIDEN_MINUS_LO_EXPR
+@tindex IFN_VEC_WIDEN_PLUS
+@tindex IFN_VEC_WIDEN_PLUS_HI
+@tindex IFN_VEC_WIDEN_PLUS_LO
+@tindex IFN_VEC_WIDEN_PLUS_EVEN
+@tindex IFN_VEC_WIDEN_PLUS_ODD
+@tindex IFN_VEC_WIDEN_MINUS
+@tindex IFN_VEC_WIDEN_MINUS_HI
+@tindex IFN_VEC_WIDEN_MINUS_LO
+@tindex IFN_VEC_WIDEN_MINUS_EVEN
+@tindex IFN_VEC_WIDEN_MINUS_ODD
 @tindex VEC_UNPACK_HI_EXPR
 @tindex VEC_UNPACK_LO_EXPR
 @tindex VEC_UNPACK_FLOAT_HI_EXPR
@@ -1861,6 +1867,82 @@ vector of @code{N/2} products. In the case of @code{VEC_WIDEN_MULT_LO_EXPR} the
 low @code{N/2} elements of the two vector are multiplied to produce the
 vector of @code{N/2} products.
 
+@item IFN_VEC_WIDEN_PLUS
+This internal function represents widening vector addition of two input
+vectors.  Its operands are vectors that contain the same number of elements
+(@code{N}) of the same integral type.  The result is a vector that contains
+the same number (@code{N}) of elements, of an integral type that is twice as
+wide as the input element type.  If the current target does not implement the
+corresponding optabs the vectorizer may choose to split it into either a pair
+of @code{IFN_VEC_WIDEN_PLUS_HI} and @code{IFN_VEC_WIDEN_PLUS_LO} or
+@code{IFN_VEC_WIDEN_PLUS_EVEN} and @code{IFN_VEC_WIDEN_PLUS_ODD}, depending
+on what optabs the target implements.
+
+@item IFN_VEC_WIDEN_PLUS_HI
+@itemx IFN_VEC_WIDEN_PLUS_LO
+These internal functions represent widening vector addition of the high and low
+parts of the two input vectors, respectively.  Their operands are vectors that
+contain the same number of elements (@code{N}) of the same integral type. The
+result is a vector that contains half as many elements, of an integral type
+whose size is twice as wide.  In the case of @code{IFN_VEC_WIDEN_PLUS_HI} the
+high @code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} additions.  In the case of @code{IFN_VEC_WIDEN_PLUS_LO} the low
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} additions.
+
+@item IFN_VEC_WIDEN_PLUS_EVEN
+@itemx IFN_VEC_WIDEN_PLUS_ODD
+These internal functions represent widening vector addition of the even and odd
+elements of the two input vectors, respectively.  Their operands are vectors
+that contain the same number of elements (@code{N}) of the same integral type.
+The result is a vector that contains half as many elements, of an integral type
+whose size is twice as wide.  In the case of @code{IFN_VEC_WIDEN_PLUS_EVEN} the
+even @code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} additions.  In the case of @code{IFN_VEC_WIDEN_PLUS_ODD} the odd
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} additions.
+
+@item IFN_VEC_WIDEN_MINUS
+This internal function represents widening vector subtraction of two input
+vectors.  Its operands are vectors that contain the same number of elements
+(@code{N}) of the same integral type.  The result is a vector that contains
+the same number (@code{N}) of elements, of an integral type that is twice as
+wide as the input element type.  If the current target does not implement the
+corresponding optabs the vectorizer may choose to split it into either a pair
+of @code{IFN_VEC_WIDEN_MINUS_HI} and @code{IFN_VEC_WIDEN_MINUS_LO} or
+@code{IFN_VEC_WIDEN_MINUS_EVEN} and @code{IFN_VEC_WIDEN_MINUS_ODD}, depending
+on what optabs the target implements.
+
+@item IFN_VEC_WIDEN_MINUS_HI
+@itemx IFN_VEC_WIDEN_MINUS_LO
+These internal functions represent widening vector subtraction of the high and
+low parts of the two input vectors, respectively.  Their operands are vectors
+that contain the same number of elements (@code{N}) of the same integral type.
+The high/low elements of the second vector are subtracted from the high/low
+elements of the first. The result is a vector that contains half as many
+elements, of an integral type whose size is twice as wide.  In the case of
+@code{IFN_VEC_WIDEN_MINUS_HI} the high @code{N/2} elements of the second
+vector are subtracted from the high @code{N/2} of the first to produce the
+vector of @code{N/2} subtractions.  In the case of
+@code{IFN_VEC_WIDEN_MINUS_LO} the low @code{N/2} elements of the second
+vector are subtracted from the low @code{N/2} of the first to produce the
+vector of @code{N/2} subtractions.
+
+@item IFN_VEC_WIDEN_MINUS_EVEN
+@itemx IFN_VEC_WIDEN_MINUS_ODD
+These internal functions represent widening vector subtraction of the even and
+odd parts of the two input vectors, respectively.  Their operands are vectors
+that contain the same number of elements (@code{N}) of the same integral type.
+The even/odd elements of the second vector are subtracted from the even/odd
+elements of the first. The result is a vector that contains half as many
+elements, of an integral type whose size is twice as wide.  In the case of
+@code{IFN_VEC_WIDEN_MINUS_EVEN} the even @code{N/2} elements of the second
+vector are subtracted from the even @code{N/2} of the first to produce the
+vector of @code{N/2} subtractions.  In the case of
+@code{IFN_VEC_WIDEN_MINUS_ODD} the odd @code{N/2} elements of the second
+vector are subtracted from the odd @code{N/2} of the first to produce the
+vector of @code{N/2} subtractions.
+
 @item VEC_WIDEN_PLUS_HI_EXPR
 @itemx VEC_WIDEN_PLUS_LO_EXPR
 These nodes represent widening vector addition of the high and low parts of
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 5c9da73ea11f8060b18dcf513599c9694fa4f2ad..348bee35a35ae4ed9a8652f5349f430c2733e1cb 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -90,6 +90,71 @@ lookup_internal_fn (const char *name)
   return entry ? *entry : IFN_LAST;
 }
 
+/* Given an internal_fn IFN that is either a widening or narrowing function,
+   return its corresponding LO and HI internal_fns.  */
+
+extern void
+lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi)
+{
+  gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn));
+
+  switch (ifn)
+    {
+    default:
+      gcc_unreachable ();
+#undef DEF_INTERNAL_FN
+#undef DEF_INTERNAL_WIDENING_OPTAB_FN
+#undef DEF_INTERNAL_NARROWING_OPTAB_FN
+#define DEF_INTERNAL_FN(NAME, FLAGS, TYPE)
+#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T)	\
+    case IFN_##NAME:						\
+      *lo = internal_fn (IFN_##NAME##_LO);			\
+      *hi = internal_fn (IFN_##NAME##_HI);			\
+      break;
+#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, O, T)	\
+    case IFN_##NAME:					\
+      *lo = internal_fn (IFN_##NAME##_LO);		\
+      *hi = internal_fn (IFN_##NAME##_HI);		\
+      break;
+#include "internal-fn.def"
+#undef DEF_INTERNAL_FN
+#undef DEF_INTERNAL_WIDENING_OPTAB_FN
+#undef DEF_INTERNAL_NARROWING_OPTAB_FN
+    }
+}
+
+extern void
+lookup_evenodd_internal_fn (internal_fn ifn, internal_fn *even,
+			    internal_fn *odd)
+{
+  gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn));
+
+  switch (ifn)
+    {
+    default:
+      gcc_unreachable ();
+#undef DEF_INTERNAL_FN
+#undef DEF_INTERNAL_WIDENING_OPTAB_FN
+#undef DEF_INTERNAL_NARROWING_OPTAB_FN
+#define DEF_INTERNAL_FN(NAME, FLAGS, TYPE)
+#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T)	\
+    case IFN_##NAME:						\
+      *even = internal_fn (IFN_##NAME##_EVEN);			\
+      *odd = internal_fn (IFN_##NAME##_ODD);			\
+      break;
+#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, O, T)	\
+    case IFN_##NAME:					\
+      *even = internal_fn (IFN_##NAME##_EVEN);		\
+      *odd = internal_fn (IFN_##NAME##_ODD);		\
+      break;
+#include "internal-fn.def"
+#undef DEF_INTERNAL_FN
+#undef DEF_INTERNAL_WIDENING_OPTAB_FN
+#undef DEF_INTERNAL_NARROWING_OPTAB_FN
+    }
+}
+
+
 /* Fnspec of each internal function, indexed by function number.  */
 const_tree internal_fn_fnspec_array[IFN_LAST + 1];
 
@@ -3852,7 +3917,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 
 /* Return the optab used by internal function FN.  */
 
-static optab
+optab
 direct_internal_fn_optab (internal_fn fn, tree_pair types)
 {
   switch (fn)
@@ -3971,6 +4036,9 @@ commutative_binary_fn_p (internal_fn fn)
     case IFN_UBSAN_CHECK_MUL:
     case IFN_ADD_OVERFLOW:
     case IFN_MUL_OVERFLOW:
+    case IFN_VEC_WIDEN_PLUS:
+    case IFN_VEC_WIDEN_PLUS_LO:
+    case IFN_VEC_WIDEN_PLUS_HI:
       return true;
 
     default:
@@ -4044,6 +4112,68 @@ first_commutative_argument (internal_fn fn)
     }
 }
 
+/* Return true if this CODE describes an internal_fn that returns a vector with
+   elements twice as wide as the element size of the input vectors.  */
+
+bool
+widening_fn_p (code_helper code)
+{
+  if (!code.is_fn_code ())
+    return false;
+
+  if (!internal_fn_p ((combined_fn) code))
+    return false;
+
+  internal_fn fn = as_internal_fn ((combined_fn) code);
+  switch (fn)
+    {
+    #undef DEF_INTERNAL_WIDENING_OPTAB_FN
+    #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \
+    case IFN_##NAME:						  \
+    case IFN_##NAME##_HI:					  \
+    case IFN_##NAME##_LO:					  \
+    case IFN_##NAME##_EVEN:					  \
+    case IFN_##NAME##_ODD:					  \
+      return true;
+    #include "internal-fn.def"
+    #undef DEF_INTERNAL_WIDENING_OPTAB_FN
+
+    default:
+      return false;
+    }
+}
+
+/* Return true if this CODE describes an internal_fn that returns a vector with
+   elements twice as narrow as the element size of the input vectors.  */
+
+bool
+narrowing_fn_p (code_helper code)
+{
+  if (!code.is_fn_code ())
+    return false;
+
+  if (!internal_fn_p ((combined_fn) code))
+    return false;
+
+  internal_fn fn = as_internal_fn ((combined_fn) code);
+  switch (fn)
+    {
+    #undef DEF_INTERNAL_NARROWING_OPTAB_FN
+    #define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, O, T)  \
+    case IFN_##NAME:					    \
+    case IFN_##NAME##_HI:				    \
+    case IFN_##NAME##_LO:				    \
+    case IFN_##NAME##_EVEN:				    \
+    case IFN_##NAME##_ODD:				    \
+      return true;
+    #include "internal-fn.def"
+    #undef DEF_INTERNAL_NARROWING_OPTAB_FN
+
+    default:
+      return false;
+    }
+}
+
 /* Return true if IFN_SET_EDOM is supported.  */
 
 bool
@@ -4072,6 +4202,8 @@ set_edom_supported_p (void)
     expand_##TYPE##_optab_fn (fn, stmt, which_optab);			\
   }
 #include "internal-fn.def"
+#undef DEF_INTERNAL_OPTAB_FN
+#undef DEF_INTERNAL_SIGNED_OPTAB_FN
 
 /* Routines to expand each internal function, indexed by function number.
    Each routine has the prototype:
@@ -4080,6 +4212,7 @@ set_edom_supported_p (void)
 
    where STMT is the statement that performs the call. */
 static void (*const internal_fn_expanders[]) (internal_fn, gcall *) = {
+
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) expand_##CODE,
 #include "internal-fn.def"
   0
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..e9edaa201ad4ad171a49119efa9d6bff49add9f4 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -85,6 +85,34 @@ along with GCC; see the file COPYING3.  If not see
    says that the function extends the C-level BUILT_IN_<NAME>{,L,LL,IMAX}
    group of functions to any integral mode (including vector modes).
 
+   DEF_INTERNAL_WIDENING_OPTAB_FN is a wrapper that defines five internal
+   functions with DEF_INTERNAL_SIGNED_OPTAB_FN:
+   - one that describes a widening operation with the same number of elements
+   in the output and input vectors,
+   - two that describe a pair of high-low widening operations where the output
+   vectors each have half the number of elements of the input vectors,
+   corresponding to the result of the widening operation on the top half and
+   bottom half, these have the suffixes _HI and _LO,
+   - and two that describe a pair of even-odd widening operations where the
+   output vectors each have half the number of elements of the input vectors,
+   corresponding to the result of the widening operation on the even and odd
+   elements, these have the suffixes _EVEN and _ODD.
+   These five internal functions will require two optabs each, a SIGNED_OPTAB
+   and an UNSIGNED_OPTAB.
+
+   DEF_INTERNAL_NARROWING_OPTAB_FN is a wrapper that defines five internal
+   functions with DEF_INTERNAL_OPTAB_FN:
+   - one that describes a narrowing operation with the same number of elements
+   in the output and input vectors,
+   - two that describe a pair of high-low narrowing operations where the output
+   vector has the same number of elements in the top or bottom halves as the
+   full input vectors, these have the suffixes _HI and _LO.
+   - and two that describe a pair of even-odd narrowing operations where the
+   output vector has the same number of elements, in the even or odd positions,
+   as the full input vectors, these have the suffixes _EVEN and _ODD.
+   These five internal functions will require an optab each.
+
+
    Each entry must have a corresponding expander of the form:
 
      void expand_NAME (gimple_call stmt)
@@ -123,6 +151,24 @@ along with GCC; see the file COPYING3.  If not see
   DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE)
 #endif
 
+#ifndef DEF_INTERNAL_WIDENING_OPTAB_FN
+#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)		    \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)			    \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _LO, FLAGS, SELECTOR, SOPTAB##_lo, UOPTAB##_lo, TYPE)	    \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _HI, FLAGS, SELECTOR, SOPTAB##_hi, UOPTAB##_hi, TYPE)	    \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _EVEN, FLAGS, SELECTOR, SOPTAB##_even, UOPTAB##_even, TYPE) \
+  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _ODD, FLAGS, SELECTOR, SOPTAB##_odd, UOPTAB##_odd, TYPE)
+#endif
+
+#ifndef DEF_INTERNAL_NARROWING_OPTAB_FN
+#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE)   \
+  DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE)		    \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _LO, FLAGS, OPTAB##_lo, TYPE)	    \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _HI, FLAGS, OPTAB##_hi, TYPE)	    \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _EVEN, FLAGS, OPTAB##_even, TYPE)  \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _ODD, FLAGS, OPTAB##_odd, TYPE)
+#endif
+
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD, ECF_PURE, maskload, mask_load)
 DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_CONST, vec_load_lanes, load_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
@@ -315,6 +361,16 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
 DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
+DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_PLUS,
+				ECF_CONST | ECF_NOTHROW,
+				first,
+				vec_widen_sadd, vec_widen_uadd,
+				binary)
+DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_MINUS,
+				ECF_CONST | ECF_NOTHROW,
+				first,
+				vec_widen_ssub, vec_widen_usub,
+				binary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 08922ed4254898f5fffca3f33973e96ed9ce772f..3904ba3ca36949d844532a6a9303f550533311a4 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -20,6 +20,10 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_INTERNAL_FN_H
 #define GCC_INTERNAL_FN_H
 
+#include "insn-codes.h"
+#include "insn-opinit.h"
+
+
 /* INTEGER_CST values for IFN_UNIQUE function arg-0.
 
    UNSPEC: Undifferentiated UNIQUE.
@@ -112,6 +116,10 @@ internal_fn_name (enum internal_fn fn)
 }
 
 extern internal_fn lookup_internal_fn (const char *);
+extern void lookup_hilo_internal_fn (internal_fn, internal_fn *, internal_fn *);
+extern void lookup_evenodd_internal_fn (internal_fn, internal_fn *,
+					internal_fn *);
+extern optab direct_internal_fn_optab (internal_fn, tree_pair);
 
 /* Return the ECF_* flags for function FN.  */
 
@@ -210,6 +218,8 @@ extern bool commutative_binary_fn_p (internal_fn);
 extern bool commutative_ternary_fn_p (internal_fn);
 extern int first_commutative_argument (internal_fn);
 extern bool associative_binary_fn_p (internal_fn);
+extern bool widening_fn_p (code_helper);
+extern bool narrowing_fn_p (code_helper);
 
 extern bool set_edom_supported_p (void);
 
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index a12333c7169fc6219b0e34b6169780f78e033ee3..aab6ab6faf244a8236dac81be2d68fc28819bc9a 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -1314,7 +1314,17 @@ commutative_optab_p (optab binoptab)
 	  || binoptab == smul_widen_optab
 	  || binoptab == umul_widen_optab
 	  || binoptab == smul_highpart_optab
-	  || binoptab == umul_highpart_optab);
+	  || binoptab == umul_highpart_optab
+	  || binoptab == vec_widen_sadd_optab
+	  || binoptab == vec_widen_uadd_optab
+	  || binoptab == vec_widen_sadd_hi_optab
+	  || binoptab == vec_widen_sadd_lo_optab
+	  || binoptab == vec_widen_uadd_hi_optab
+	  || binoptab == vec_widen_uadd_lo_optab
+	  || binoptab == vec_widen_sadd_even_optab
+	  || binoptab == vec_widen_sadd_odd_optab
+	  || binoptab == vec_widen_uadd_even_optab
+	  || binoptab == vec_widen_uadd_odd_optab);
 }
 
 /* X is to be used in mode MODE as operand OPN to BINOPTAB.  If we're
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 695f5911b300c9ca5737de9be809fa01aabe5e01..d41ed6e1afaddd019c7470f965c0ad21c8b2b9d7 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -410,6 +410,16 @@ OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a")
 OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a")
 OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a")
 OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a")
+OPTAB_D (vec_widen_ssub_optab, "vec_widen_ssub_$a")
+OPTAB_D (vec_widen_ssub_hi_optab, "vec_widen_ssub_hi_$a")
+OPTAB_D (vec_widen_ssub_lo_optab, "vec_widen_ssub_lo_$a")
+OPTAB_D (vec_widen_ssub_odd_optab, "vec_widen_ssub_odd_$a")
+OPTAB_D (vec_widen_ssub_even_optab, "vec_widen_ssub_even_$a")
+OPTAB_D (vec_widen_sadd_optab, "vec_widen_sadd_$a")
+OPTAB_D (vec_widen_sadd_hi_optab, "vec_widen_sadd_hi_$a")
+OPTAB_D (vec_widen_sadd_lo_optab, "vec_widen_sadd_lo_$a")
+OPTAB_D (vec_widen_sadd_odd_optab, "vec_widen_sadd_odd_$a")
+OPTAB_D (vec_widen_sadd_even_optab, "vec_widen_sadd_even_$a")
 OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
 OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
 OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
@@ -422,6 +432,16 @@ OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
 OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
 OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
 OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
+OPTAB_D (vec_widen_usub_optab, "vec_widen_usub_$a")
+OPTAB_D (vec_widen_usub_hi_optab, "vec_widen_usub_hi_$a")
+OPTAB_D (vec_widen_usub_lo_optab, "vec_widen_usub_lo_$a")
+OPTAB_D (vec_widen_usub_odd_optab, "vec_widen_usub_odd_$a")
+OPTAB_D (vec_widen_usub_even_optab, "vec_widen_usub_even_$a")
+OPTAB_D (vec_widen_uadd_optab, "vec_widen_uadd_$a")
+OPTAB_D (vec_widen_uadd_hi_optab, "vec_widen_uadd_hi_$a")
+OPTAB_D (vec_widen_uadd_lo_optab, "vec_widen_uadd_lo_$a")
+OPTAB_D (vec_widen_uadd_odd_optab, "vec_widen_uadd_odd_$a")
+OPTAB_D (vec_widen_uadd_even_optab, "vec_widen_uadd_even_$a")
 OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
 OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
 OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
index 220bd9352a4c7acd2e3713e441d74898d3e92b30..b5a73867e44ec3fa04d1201decf81353a67b4c82 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -save-temps" } */
+/* { dg-options "-O3 -save-temps -fdump-tree-vect-details" } */
 #include <stdint.h>
 #include <string.h>
 
@@ -86,6 +86,8 @@ main()
     return 0;
 }
 
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_LO" "vect"   } } */
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_HI" "vect"   } } */
 /* { dg-final { scan-assembler-times {\tuaddl\t} 1} } */
 /* { dg-final { scan-assembler-times {\tuaddl2\t} 1} } */
 /* { dg-final { scan-assembler-times {\tsaddl\t} 1} } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
index a2bed63affbd091977df95a126da1f5b8c1d41d2..1686c3f2f344c367ebb9cf34e558d0878849f9bc 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -save-temps" } */
+/* { dg-options "-O3 -save-temps -fdump-tree-vect-details" } */
 #include <stdint.h>
 #include <string.h>
 
@@ -86,6 +86,8 @@ main()
     return 0;
 }
 
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_LO" "vect"   } } */
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_HI" "vect"   } } */
 /* { dg-final { scan-assembler-times {\tusubl\t} 1} } */
 /* { dg-final { scan-assembler-times {\tusubl2\t} 1} } */
 /* { dg-final { scan-assembler-times {\tssubl\t} 1} } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 1778af0242898e3dc73d94d22a5b8505628a53b5..dcd4b5561600346a2c10bd5133507329206e8837 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -562,21 +562,30 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
 
 static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
-		      tree_code widened_code, bool shift_p,
+		      code_helper widened_code, bool shift_p,
 		      unsigned int max_nops,
 		      vect_unpromoted_value *unprom, tree *common_type,
 		      enum optab_subtype *subtype = NULL)
 {
   /* Check for an integer operation with the right code.  */
-  gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
-  if (!assign)
+  gimple* stmt = stmt_info->stmt;
+  if (!(is_gimple_assign (stmt) || is_gimple_call (stmt)))
+    return 0;
+
+  code_helper rhs_code;
+  if (is_gimple_assign (stmt))
+    rhs_code = gimple_assign_rhs_code (stmt);
+  else if (is_gimple_call (stmt))
+    rhs_code = gimple_call_combined_fn (stmt);
+  else
     return 0;
 
-  tree_code rhs_code = gimple_assign_rhs_code (assign);
-  if (rhs_code != code && rhs_code != widened_code)
+  if (rhs_code != code
+      && rhs_code != widened_code)
     return 0;
 
-  tree type = TREE_TYPE (gimple_assign_lhs (assign));
+  tree lhs = gimple_get_lhs (stmt);
+  tree type = TREE_TYPE (lhs);
   if (!INTEGRAL_TYPE_P (type))
     return 0;
 
@@ -589,7 +598,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
     {
       vect_unpromoted_value *this_unprom = &unprom[next_op];
       unsigned int nops = 1;
-      tree op = gimple_op (assign, i + 1);
+      tree op = gimple_arg (stmt, i);
       if (i == 1 && TREE_CODE (op) == INTEGER_CST)
 	{
 	  /* We already have a common type from earlier operands.
@@ -1343,7 +1352,8 @@ vect_recog_sad_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom[2];
-  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_MINUS_EXPR,
+  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR,
+			     IFN_VEC_WIDEN_MINUS,
 			     false, 2, unprom, &half_type))
     return NULL;
 
@@ -1395,14 +1405,16 @@ static gimple *
 vect_recog_widen_op_pattern (vec_info *vinfo,
 			     stmt_vec_info last_stmt_info, tree *type_out,
 			     tree_code orig_code, code_helper wide_code,
-			     bool shift_p, const char *name)
+			     bool shift_p, const char *name,
+			     optab_subtype *subtype = NULL)
 {
   gimple *last_stmt = last_stmt_info->stmt;
 
   vect_unpromoted_value unprom[2];
   tree half_type;
   if (!vect_widened_op_tree (vinfo, last_stmt_info, orig_code, orig_code,
-			     shift_p, 2, unprom, &half_type))
+			     shift_p, 2, unprom, &half_type, subtype))
+
     return NULL;
 
   /* Pattern detected.  */
@@ -1468,6 +1480,20 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
 			      type, pattern_stmt, vecctype);
 }
 
+static gimple *
+vect_recog_widen_op_pattern (vec_info *vinfo,
+			     stmt_vec_info last_stmt_info, tree *type_out,
+			     tree_code orig_code, internal_fn wide_ifn,
+			     bool shift_p, const char *name,
+			     optab_subtype *subtype = NULL)
+{
+  combined_fn ifn = as_combined_fn (wide_ifn);
+  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+				      orig_code, ifn, shift_p, name,
+				      subtype);
+}
+
+
 /* Try to detect multiplication on widened inputs, converting MULT_EXPR
    to WIDEN_MULT_EXPR.  See vect_recog_widen_op_pattern for details.  */
 
@@ -1481,26 +1507,30 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 }
 
 /* Try to detect addition on widened inputs, converting PLUS_EXPR
-   to WIDEN_PLUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
+   to IFN_VEC_WIDEN_PLUS.  See vect_recog_widen_op_pattern for details.  */
 
 static gimple *
 vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 			       tree *type_out)
 {
+  optab_subtype subtype;
   return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
-				      PLUS_EXPR, WIDEN_PLUS_EXPR, false,
-				      "vect_recog_widen_plus_pattern");
+				      PLUS_EXPR, IFN_VEC_WIDEN_PLUS,
+				      false, "vect_recog_widen_plus_pattern",
+				      &subtype);
 }
 
 /* Try to detect subtraction on widened inputs, converting MINUS_EXPR
-   to WIDEN_MINUS_EXPR.  See vect_recog_widen_op_pattern for details.  */
+   to IFN_VEC_WIDEN_MINUS.  See vect_recog_widen_op_pattern for details.  */
 static gimple *
 vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 			       tree *type_out)
 {
+  optab_subtype subtype;
   return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
-				      MINUS_EXPR, WIDEN_MINUS_EXPR, false,
-				      "vect_recog_widen_minus_pattern");
+				      MINUS_EXPR, IFN_VEC_WIDEN_MINUS,
+				      false, "vect_recog_widen_minus_pattern",
+				      &subtype);
 }
 
 /* Function vect_recog_ctz_ffs_pattern
@@ -3078,7 +3108,7 @@ vect_recog_average_pattern (vec_info *vinfo,
   vect_unpromoted_value unprom[3];
   tree new_type;
   unsigned int nops = vect_widened_op_tree (vinfo, plus_stmt_info, PLUS_EXPR,
-					    WIDEN_PLUS_EXPR, false, 3,
+					    IFN_VEC_WIDEN_PLUS, false, 3,
 					    unprom, &new_type);
   if (nops == 0)
     return NULL;
@@ -6469,6 +6499,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_mask_conversion_pattern, "mask_conversion" },
   { vect_recog_widen_plus_pattern, "widen_plus" },
   { vect_recog_widen_minus_pattern, "widen_minus" },
+  /* These must come after the double widening ones.  */
 };
 
 const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index d73e7f0936435951fe05fa6b787ba053233635aa..4f1569023a4e42ad6d058bccf62687dc3fe1302e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5038,7 +5038,8 @@ vectorizable_conversion (vec_info *vinfo,
   bool widen_arith = (code == WIDEN_PLUS_EXPR
 		 || code == WIDEN_MINUS_EXPR
 		 || code == WIDEN_MULT_EXPR
-		 || code == WIDEN_LSHIFT_EXPR);
+		 || code == WIDEN_LSHIFT_EXPR
+		 || widening_fn_p (code));
 
   if (!widen_arith
       && !CONVERT_EXPR_CODE_P (code)
@@ -5088,8 +5089,8 @@ vectorizable_conversion (vec_info *vinfo,
       gcc_assert (code == WIDEN_MULT_EXPR
 		  || code == WIDEN_LSHIFT_EXPR
 		  || code == WIDEN_PLUS_EXPR
-		  || code == WIDEN_MINUS_EXPR);
-
+		  || code == WIDEN_MINUS_EXPR
+		  || widening_fn_p (code));
 
       op1 = is_gimple_assign (stmt) ? gimple_assign_rhs2 (stmt) :
 				     gimple_call_arg (stmt, 0);
@@ -12500,26 +12501,69 @@ supportable_widening_operation (vec_info *vinfo,
       optab1 = vec_unpacks_sbool_lo_optab;
       optab2 = vec_unpacks_sbool_hi_optab;
     }
-  else
-    {
-      optab1 = optab_for_tree_code (c1, vectype, optab_default);
-      optab2 = optab_for_tree_code (c2, vectype, optab_default);
+
+  vec_mode = TYPE_MODE (vectype);
+  if (widening_fn_p (code))
+     {
+       /* If this is an internal fn then we must check whether the target
+	  supports either a low-high split or an even-odd split.  */
+      internal_fn ifn = as_internal_fn ((combined_fn) code);
+
+      internal_fn lo, hi, even, odd;
+      lookup_hilo_internal_fn (ifn, &lo, &hi);
+      *code1 = as_combined_fn (lo);
+      *code2 = as_combined_fn (hi);
+      optab1 = direct_internal_fn_optab (lo, {vectype, vectype});
+      optab2 = direct_internal_fn_optab (hi, {vectype, vectype});
+
+      /* If we don't support low-high, then check for even-odd.  */
+      if (!optab1
+	  || (icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing
+	  || !optab2
+	  || (icode2 = optab_handler (optab2, vec_mode)) == CODE_FOR_nothing)
+	{
+	  lookup_evenodd_internal_fn (ifn, &even, &odd);
+	  *code1 = as_combined_fn (even);
+	  *code2 = as_combined_fn (odd);
+	  optab1 = direct_internal_fn_optab (even, {vectype, vectype});
+	  optab2 = direct_internal_fn_optab (odd, {vectype, vectype});
+	}
+    }
+  else if (code.is_tree_code ())
+    {
+      if (code == FIX_TRUNC_EXPR)
+	{
+	  /* The signedness is determined from output operand.  */
+	  optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
+	  optab2 = optab_for_tree_code (c2, vectype_out, optab_default);
+	}
+      else if (CONVERT_EXPR_CODE_P ((tree_code) code.safe_as_tree_code ())
+	       && VECTOR_BOOLEAN_TYPE_P (wide_vectype)
+	       && VECTOR_BOOLEAN_TYPE_P (vectype)
+	       && TYPE_MODE (wide_vectype) == TYPE_MODE (vectype)
+	       && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
+	{
+	  /* If the input and result modes are the same, a different optab
+	     is needed where we pass in the number of units in vectype.  */
+	  optab1 = vec_unpacks_sbool_lo_optab;
+	  optab2 = vec_unpacks_sbool_hi_optab;
+	}
+      else
+	{
+	  optab1 = optab_for_tree_code (c1, vectype, optab_default);
+	  optab2 = optab_for_tree_code (c2, vectype, optab_default);
+	}
+      *code1 = c1;
+      *code2 = c2;
     }
 
   if (!optab1 || !optab2)
     return false;
 
-  vec_mode = TYPE_MODE (vectype);
   if ((icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing
        || (icode2 = optab_handler (optab2, vec_mode)) == CODE_FOR_nothing)
     return false;
 
-  if (code.is_tree_code ())
-  {
-    *code1 = c1;
-    *code2 = c2;
-  }
-
 
   if (insn_data[icode1].operand[0].mode == TYPE_MODE (wide_vectype)
       && insn_data[icode2].operand[0].mode == TYPE_MODE (wide_vectype))
diff --git a/gcc/tree.def b/gcc/tree.def
index 90ceeec0b512bfa5f983359c0af03cc71de32007..b37b0b35927b92a6536e5c2d9805ffce8319a240 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1374,15 +1374,16 @@ DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3)
 DEFTREECODE (WIDEN_SUM_EXPR, "widen_sum_expr", tcc_binary, 2)
 
 /* Widening sad (sum of absolute differences).
-   The first two arguments are of type t1 which should be integer.
-   The third argument and the result are of type t2, such that t2 is at least
-   twice the size of t1.  Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is
+   The first two arguments are of type t1 which should be a vector of integers.
+   The third argument and the result are of type t2, such that the size of
+   the elements of t2 is at least twice the size of the elements of t1.
+   Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is
    equivalent to:
-       tmp = WIDEN_MINUS_EXPR (arg1, arg2)
+       tmp = IFN_VEC_WIDEN_MINUS (arg1, arg2)
        tmp2 = ABS_EXPR (tmp)
        arg3 = PLUS_EXPR (tmp2, arg3)
   or:
-       tmp = WIDEN_MINUS_EXPR (arg1, arg2)
+       tmp = IFN_VEC_WIDEN_MINUS (arg1, arg2)
        tmp2 = ABS_EXPR (tmp)
        arg3 = WIDEN_SUM_EXPR (tmp2, arg3)
  */

Richard Sandiford June 2, 2023, noon UTC | #16

Just some very minor things.

"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 5c9da73ea11f8060b18dcf513599c9694fa4f2ad..348bee35a35ae4ed9a8652f5349f430c2733e1cb 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -90,6 +90,71 @@ lookup_internal_fn (const char *name)
>    return entry ? *entry : IFN_LAST;
>  }
>  
> +/*  Given an internal_fn IFN that is either a widening or narrowing function, return its
> +    corresponding LO and HI internal_fns.  */

Long line and too much space after "/*":

/* Given an internal_fn IFN that is either a widening or narrowing function,
   return its corresponding _LO and _HI internal_fns in *LO and *HI.  */

> +extern void
> +lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi)
> +{
> +  gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn));
> +
> +  switch (ifn)
> +    {
> +    default:
> +      gcc_unreachable ();
> +#undef DEF_INTERNAL_FN
> +#undef DEF_INTERNAL_WIDENING_OPTAB_FN
> +#undef DEF_INTERNAL_NARROWING_OPTAB_FN
> +#define DEF_INTERNAL_FN(NAME, FLAGS, TYPE)
> +#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T)	\
> +    case IFN_##NAME:						\
> +      *lo = internal_fn (IFN_##NAME##_LO);			\
> +      *hi = internal_fn (IFN_##NAME##_HI);			\
> +      break;
> +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, O, T)	\
> +    case IFN_##NAME:					\
> +      *lo = internal_fn (IFN_##NAME##_LO);		\
> +      *hi = internal_fn (IFN_##NAME##_HI);		\
> +      break;
> +#include "internal-fn.def"
> +#undef DEF_INTERNAL_FN
> +#undef DEF_INTERNAL_WIDENING_OPTAB_FN
> +#undef DEF_INTERNAL_NARROWING_OPTAB_FN
> +    }
> +}
> +
> +extern void
> +lookup_evenodd_internal_fn (internal_fn ifn, internal_fn *even,
> +			    internal_fn *odd)

This needs a similar comment:

/* Given an internal_fn IFN that is either a widening or narrowing function,
   return its corresponding _EVEN and _ODD internal_fns in *EVEN and *ODD.  */
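
For readers unfamiliar with the X-macro pattern these lookup functions rely on, here is a minimal standalone sketch. It is a mock-up under stated assumptions, not GCC code: MOCK_WIDENING_FNS stands in for the widening entries of internal-fn.def, and mock_ifn and mock_lookup_evenodd are hypothetical names invented for illustration. Unlike the patch, which asserts on a non-widening input, the mock simply returns false.

```cpp
// Standalone mock-up; none of these names are GCC's real definitions.
#include <cassert>

// Stand-in for the widening entries of internal-fn.def (hypothetical list).
#define MOCK_WIDENING_FNS(X) \
  X (VEC_WIDEN_PLUS)         \
  X (VEC_WIDEN_MINUS)

// Each entry expands to five enumerators: the base function plus its
// _LO, _HI, _EVEN and _ODD variants.
enum mock_ifn
{
#define X(NAME)                                    \
  MOCK_##NAME, MOCK_##NAME##_LO, MOCK_##NAME##_HI, \
  MOCK_##NAME##_EVEN, MOCK_##NAME##_ODD,
  MOCK_WIDENING_FNS (X)
#undef X
  MOCK_LAST
};

/* Given a base widening function FN, store its _EVEN and _ODD variants
   in *EVEN and *ODD and return true.  Return false for anything else.  */
bool
mock_lookup_evenodd (mock_ifn fn, mock_ifn *even, mock_ifn *odd)
{
  switch (fn)
    {
#define X(NAME)                   \
    case MOCK_##NAME:             \
      *even = MOCK_##NAME##_EVEN; \
      *odd = MOCK_##NAME##_ODD;   \
      return true;
    MOCK_WIDENING_FNS (X)
#undef X
    default:
      return false;
    }
}
```

Because the case labels are generated from the same list as the enumerators, a new widening entry gets its lookup case without any further edits.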

> @@ -3971,6 +4036,9 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_UBSAN_CHECK_MUL:
>      case IFN_ADD_OVERFLOW:
>      case IFN_MUL_OVERFLOW:
> +    case IFN_VEC_WIDEN_PLUS:
> +    case IFN_VEC_WIDEN_PLUS_LO:
> +    case IFN_VEC_WIDEN_PLUS_HI:

Should include even & odd as well.
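
A minimal sketch of what covering all five variants could look like, written as a standalone mock-up rather than as the real function: demo_fn and demo_commutative_binary_fn_p are hypothetical names, and the enumerators only mirror the IFN_VEC_WIDEN_PLUS family that the patch defines. In the patch itself the change would presumably amount to two extra case labels for the _EVEN and _ODD functions.

```cpp
// Standalone mock-up; the enum and function below are illustrative only.
#include <cassert>

enum demo_fn
{
  DEMO_VEC_WIDEN_PLUS,
  DEMO_VEC_WIDEN_PLUS_LO,
  DEMO_VEC_WIDEN_PLUS_HI,
  DEMO_VEC_WIDEN_PLUS_EVEN,
  DEMO_VEC_WIDEN_PLUS_ODD,
  DEMO_VEC_WIDEN_MINUS,
  DEMO_LAST
};

/* Return true if FN is a commutative binary operation.  Widening addition
   is commutative in every variant; widening subtraction is not.  */
bool
demo_commutative_binary_fn_p (demo_fn fn)
{
  switch (fn)
    {
    case DEMO_VEC_WIDEN_PLUS:
    case DEMO_VEC_WIDEN_PLUS_LO:
    case DEMO_VEC_WIDEN_PLUS_HI:
    case DEMO_VEC_WIDEN_PLUS_EVEN:
    case DEMO_VEC_WIDEN_PLUS_ODD:
      return true;

    default:
      return false;
    }
}
```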

I'd suggest leaving out the narrowing stuff for now.  There are some
questions that would be easier to answer once we add the first use,
such as whether one of the hi/lo pair and one of the even/odd pair
merge with a vector containing the other half, whether all four
define the other half to be zero, etc.

OK for the optab/internal-fn parts with those changes from my POV.

Thanks again for doing this!

Richard

Jakub Jelinek June 6, 2023, 7 p.m. UTC | #17

On Thu, Jun 01, 2023 at 05:27:56PM +0100, Andre Vieira (lists) via Gcc-patches wrote:
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -20,6 +20,10 @@ along with GCC; see the file COPYING3.  If not see
>  #ifndef GCC_INTERNAL_FN_H
>  #define GCC_INTERNAL_FN_H
>  
> +#include "insn-codes.h"
> +#include "insn-opinit.h"

My i686-linux build configured with
../configure --enable-languages=default,obj-c++,lto,go,d,rust,m2 --enable-checking=yes,rtl,extra --enable-libstdcxx-backtrace=yes
just died with
In file included from ../../gcc/m2/gm2-gcc/gcc-consolidation.h:74,
                 from ../../gcc/m2/gm2-gcc/m2except.cc:22:
../../gcc/internal-fn.h:24:10: fatal error: insn-opinit.h: No such file or directory
   24 | #include "insn-opinit.h"
      |          ^~~~~~~~~~~~~~~
compilation terminated.
In file included from ../../gcc/m2/gm2-gcc/gcc-consolidation.h:74,
                 from ../../gcc/m2/m2pp.cc:23:
../../gcc/internal-fn.h:24:10: fatal error: insn-opinit.h: No such file or directory
   24 | #include "insn-opinit.h"
      |          ^~~~~~~~~~~~~~~
In file included from ../../gcc/m2/gm2-gcc/gcc-consolidation.h:74,
                 from ../../gcc/m2/gm2-gcc/rtegraph.cc:22:
../../gcc/internal-fn.h:24:10: fatal error: insn-opinit.h: No such file or directory
   24 | #include "insn-opinit.h"
      |          ^~~~~~~~~~~~~~~
compilation terminated.
compilation terminated.
supposedly because of this change.

Do you really need those includes there?
If yes, what is supposed to ensure that the generated includes
are generated before compiling files which include those?

From what I can see, gcc/Makefile.in has
generated_files var which includes among other things insn-opinit.h,
and
# Dependency information.

# In order for parallel make to really start compiling the expensive
# objects from $(OBJS) as early as possible, build all their
# prerequisites strictly before all objects.
$(ALL_HOST_OBJS) : | $(generated_files)

rule, plus I see $(generated_files) mentioned in a couple of dependencies
in gcc/m2/Make-lang.in .  But supposedly because of this change it now
needs to be added to tons of other spots.

	Jakub
