Message ID: 20220829052444.86744-1-hongtao.liu@intel.com
State: New
Series: [V2] Extend vectorizer to handle nonlinear induction for neg, mul/lshift/rshift with a constant.
On Mon, Aug 29, 2022 at 7:29 AM liuhongt via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> > Looks good overall - a few comments inline. Also can you please add
> > SLP support? I've tried hard to fill in gaps where SLP support is
> > missing since my goal is still to get rid of non-SLP.
> For SLP with different induction types, each iv needs a separate update
> plus a vector permutation; if they are all the same induction type, the
> iv can be updated with one instruction and no permutation. I'll add an
> incremental patch for that.
>
> > gcc_assert (is_a <gphi *> (loop_phi_node));
> > is shorter
> It looks like it doesn't accept gphi*, only gimple*. Since the node is
> already a gphi* here, I've removed the assert.
>
> > I think you should use
> > init_expr = PHI_ARG_DEF_FROM_EDGE (loop_phi_node, loop_preheader_edge (loop));
> > and loop_latch_edge (loop) for ev_expr, you can't rely on them being arg 0 / 1.
> Changed.
>
> > Likewise. Use preheader/latch edge.
> Changed.
>
> > and those two should then go away.
> Removed.
>
> > is not vectorized? I think it should be possible to relax
> > this.
> Relaxed.
>
> > def is never NULL so a cheaper way to write this is
> >   || ((def = SSA_NAME_DEF_STMT (ev_expr)), true)
> Changed.
>
> > not sure if we need to bother here - ideally vectorizable_live_operation would
> > give up instead (but I suppose the regular IV / induction analysis gives up
> > here as well?)
> Removed.
>
> > the above can at least go into the combined switch case
> Changed.
>
> > Seeing this - did you check whether you handle prologue peeling correctly? The
> > same issue might show up with a vectorized epilogue. I think you can force a
> > peeled prologue with storing unaligned and -fno-vect-cost-model (that IIRC will
> > simply optimize for the larger number of aligned memory ops)
> Updated in vect_update_ivs_after_vectorizer; peeling for unaligned cases
> is supported as well now.
>
> > since you only handle inner loop nonlinear IVs you should probably
> > swap the two checks?
> Changed.
>
> > There might be a more canonical way to build the series expr
> build_vec_series doesn't add the stmt to a sequence, so I'll still keep
> VEC_SERIES_EXPR here?
>
> > use types_compatible_p (...) instead of comparing TYPE_CANONICAL.
> > A small enhancement would be to support different signedness
> > (use tree_nop_conversion_p then).
> Different signedness is supported now.
>
> > above you asserted that the conversion is only necessary for constants
> > but then fold_convert will also generate a tree NOP_EXPR for
> > some types_compatible_p types. So maybe only do this for INTEGER_CST
> > init_expr or use init_expr = gimple_convert (...) and insert required stmts
> > on the preheader.
> Changed.
>
> > Alternatively you could perform the vector IV updates in an unsigned type?
> Changed.
>
> > why's that necessary? can we do a MIN (vector_step, { prec-1, prec-1,
> > prec-1 ... })
> It's true for ashr, but not for ashl or lshr. For the latter two, when
> vector_step >= precision the result should be zero instead of a shift
> by prec - 1.
>
> >> +      new_name = vect_create_nonlinear_iv_step (&stmts, step_expr,
> >> +						nunits, induction_type);
> >> +
> >> +      vec_step = vect_create_nonlinear_iv_vec_step (loop_vinfo, stmt_info,
> >> +						    new_name, vectype,
> >> +						    induction_type);
>
> > are these not the same as created above?
> They are different: the first one uses vf, this one uses nunits. vf can
> be a multiple of nunits; that is exactly the case this code handles, and
> the latch phi is updated in the earlier, vf-based place.
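
For reference, the ashr-vs-lshr point above as a minimal scalar sketch
(the helper names are made up for illustration; they are not part of the
patch):

    /* For an arithmetic right shift the value saturates at 0 or -1, so a
       count >= prec behaves like prec - 1 and a MIN () clamp is safe.  */
    int
    ashr_by (int b, unsigned int n, unsigned int prec)
    {
      return b >> (n >= prec ? prec - 1 : n);
    }

    /* For a logical right shift (and for a left shift) the mathematical
       result for a count >= prec is 0, not b >> (prec - 1), so clamping
       the count at prec - 1 would be wrong.  */
    unsigned int
    lshr_by (unsigned int b, unsigned int n, unsigned int prec)
    {
      return n >= prec ? 0U : b >> n;
    }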
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.

OK, and sorry for the delay.

Thanks.
Richard.

> For neg, the patch creates a vec_init of [ a, -a, a, -a, ... ]; no
> vec_step is needed to update the vectorized iv, since vf is always a
> multiple of 2 (negative * negative is positive).
>
> For shift, the patch creates a vec_init of [ a, a >> c, a >> 2*c, .. ]
> and a vec_step of [ c * nunits, c * nunits, c * nunits, ... ]; the
> vectorized iv is updated as vec_def = vec_init >>/<< vec_step.
>
> For mul, the patch creates a vec_init of [ a, a * c, a * pow(c, 2), .. ]
> and a vec_step of [ pow(c, nunits), pow(c, nunits), ... ]; the iv is
> updated as vec_def = vec_init * vec_step.
>
> The patch handles nonlinear ivs for
> 1. Integer types only; floating point is not handled.
> 2. No slp_node.
> 3. iv_loop must be the loop to be vectorized itself, not a nested loop.
> 4. No undefined behavior is introduced: for mul, an unsigned multiply
>    is used to avoid UB; for shift, the shift count must be less than
>    the type precision.
>
> gcc/ChangeLog:
>
> 	PR tree-optimization/103144
> 	* tree-vect-loop.cc (vect_is_nonlinear_iv_evolution): New function.
> 	(vect_analyze_scalar_cycles_1): Detect nonlinear ivs with the
> 	above function.
> 	(vect_create_nonlinear_iv_init): New function.
> 	(vect_peel_nonlinear_iv_init): Ditto.
> 	(vect_create_nonlinear_iv_step): Ditto.
> 	(vect_create_nonlinear_iv_vec_step): Ditto.
> 	(vect_update_nonlinear_iv): Ditto.
> 	(vectorizable_nonlinear_induction): Ditto.
> 	(vectorizable_induction): Call
> 	vectorizable_nonlinear_induction when induction_type is not
> 	vect_step_op_add.
> 	* tree-vect-loop-manip.cc (vect_update_ivs_after_vectorizer):
> 	Update nonlinear iv for epilogue loop.
> 	* tree-vectorizer.h (enum vect_induction_op_type): New enum.
> 	(STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE): New macro.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/i386/pr103144-mul-1.c: New test.
> 	* gcc.target/i386/pr103144-mul-2.c: New test.
> 	* gcc.target/i386/pr103144-neg-1.c: New test.
> 	* gcc.target/i386/pr103144-neg-2.c: New test.
> 	* gcc.target/i386/pr103144-shift-1.c: New test.
> 	* gcc.target/i386/pr103144-shift-2.c: New test.
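
To make the scheme above concrete, here is a hypothetical 8-lane unsigned
version of the mul case, using the same vector extension as the new
pr103144-mul-2.c test below (an illustration of the transform, not the code
the patch actually emits):

    typedef unsigned int v8usi __attribute__((vector_size (32)));

    void
    foo_mul_by_hand (unsigned int *a, unsigned int b)
    {
      /* vec_init = [ b, b*3, b*pow(3,2), ..., b*pow(3,7) ].  */
      v8usi iv = { b, b * 3, b * 9, b * 27, b * 81, b * 243, b * 729, b * 2187 };
      for (int i = 0; i != 10000 / 8; i++)
        {
          __builtin_memcpy (a + i * 8, &iv, sizeof (iv));
          /* vec_step = [ pow(3,8), ... ]: one multiply advances every lane
             by vf iterations; unsigned arithmetic avoids overflow UB.  */
          iv *= 6561;
        }
    }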
> ---
>  .../gcc.target/i386/pr103144-mul-1.c          |  51 ++
>  .../gcc.target/i386/pr103144-mul-2.c          |  51 ++
>  .../gcc.target/i386/pr103144-neg-1.c          |  51 ++
>  .../gcc.target/i386/pr103144-neg-2.c          |  44 ++
>  .../gcc.target/i386/pr103144-shift-1.c        |  70 ++
>  .../gcc.target/i386/pr103144-shift-2.c        |  79 ++
>  gcc/tree-vect-loop-manip.cc                   |  37 +-
>  gcc/tree-vect-loop.cc                         | 678 +++++++++++++++++-
>  gcc/tree-vectorizer.h                         |  15 +
>  9 files changed, 1062 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-neg-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-neg-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-shift-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-shift-2.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
> new file mode 100644
> index 00000000000..640c34fd959
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
> @@ -0,0 +1,51 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
> +
> +#define N 10000
> +
> +void
> +__attribute__((noipa))
> +foo_mul (int* a, int b)
> +{
> +  for (int i = 0; i != N; i++)
> +    {
> +      a[i] = b;
> +      b *= 3;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_mul_const (int* a)
> +{
> +  int b = 1;
> +  for (int i = 0; i != N; i++)
> +    {
> +      a[i] = b;
> +      b *= 3;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_mul_peel (int* a, int b)
> +{
> +  for (int i = 0; i != 39; i++)
> +    {
> +      a[i] = b;
> +      b *= 3;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_mul_peel_const (int* a)
> +{
> +  int b = 1;
> +  for (int i = 0; i != 39; i++)
> +    {
> +      a[i] = b;
> +      b *= 3;
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
> new file mode 100644
> index 00000000000..39fdea3a69d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
> @@ -0,0 +1,51 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx2 } */
> +
> +#include "avx2-check.h"
> +#include <string.h>
> +#include "pr103144-mul-1.c"
> +
> +typedef int v8si __attribute__((vector_size(32)));
> +
> +void
> +avx2_test (void)
> +{
> +  int* epi32_exp = (int*) malloc (N * sizeof (int));
> +  int* epi32_dst = (int*) malloc (N * sizeof (int));
> +
> +  __builtin_memset (epi32_exp, 0, N * sizeof (int));
> +  int b = 8;
> +  v8si init = __extension__(v8si) { b, b * 3, b * 9, b * 27, b * 81, b * 243, b * 729, b * 2187 };
> +
> +  for (int i = 0; i != N / 8; i++)
> +    {
> +      memcpy (epi32_exp + i * 8, &init, 32);
> +      init *= 6561;
> +    }
> +
> +  foo_mul (epi32_dst, b);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0)
> +    __builtin_abort ();
> +
> +  foo_mul_peel (epi32_dst, b);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * 4) != 0)
> +    __builtin_abort ();
> +
> +  init = __extension__(v8si) { 1, 3, 9, 27, 81, 243, 729, 2187 };
> +  for (int i = 0; i != N / 8; i++)
> +    {
> +      memcpy (epi32_exp + i * 8, &init, 32);
> +      init *= 6561;
> +    }
> +
> +  foo_mul_const (epi32_dst);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0)
> +    __builtin_abort ();
> +
> +  foo_mul_peel_const (epi32_dst);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * 4) != 0)
> +    __builtin_abort ();
> +
> +  return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-neg-1.c b/gcc/testsuite/gcc.target/i386/pr103144-neg-1.c
> new file mode 100644
> index 00000000000..f87b1d6e529
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-neg-1.c
> @@ -0,0 +1,51 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
> +
> +#define N 10000
> +
> +void
> +__attribute__((noipa))
> +foo_neg (int* a, int b)
> +{
> +  for (int i = 0; i != N; i++)
> +    {
> +      a[i] = b;
> +      b = -b;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_neg_const (int* a)
> +{
> +  int b = 1;
> +  for (int i = 0; i != N; i++)
> +    {
> +      a[i] = b;
> +      b = -b;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_neg_peel (int* a, int b, int n)
> +{
> +  for (int i = 0; i != n; i++)
> +    {
> +      a[i] = b;
> +      b = -b;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_neg_const_peel (int* a, int n)
> +{
> +  int b = 1;
> +  for (int i = 0; i != n; i++)
> +    {
> +      a[i] = b;
> +      b = -b;
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-neg-2.c b/gcc/testsuite/gcc.target/i386/pr103144-neg-2.c
> new file mode 100644
> index 00000000000..bb8c22b9f9e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-neg-2.c
> @@ -0,0 +1,44 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx2 } */
> +
> +#include "avx2-check.h"
> +#include <string.h>
> +#include "pr103144-neg-1.c"
> +
> +void
> +avx2_test (void)
> +{
> +  int* epi32_exp = (int*) malloc (N * sizeof (int));
> +  int* epi32_dst = (int*) malloc (N * sizeof (int));
> +  long long* epi64_exp = (long long*) malloc (N * sizeof (int));
> +
> +  __builtin_memset (epi32_exp, 0, N * sizeof (int));
> +  int b = 100;
> +
> +  for (int i = 0; i != N / 2; i++)
> +    epi64_exp[i] = ((long long) b) | (((long long) -b) << 32);
> +
> +  memcpy (epi32_exp, epi64_exp, N * sizeof (int));
> +  foo_neg (epi32_dst, b);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0)
> +    __builtin_abort ();
> +
> +  foo_neg_peel (epi32_dst, b, 39);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * sizeof (int)) != 0)
> +    __builtin_abort ();
> +
> +  for (int i = 0; i != N / 2; i++)
> +    epi64_exp[i] = ((long long) 1) | (((long long) -1) << 32);
> +
> +  memcpy (epi32_exp, epi64_exp, N * sizeof (int));
> +  foo_neg_const (epi32_dst);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0)
> +    __builtin_abort ();
> +
> +  foo_neg_const_peel (epi32_dst, 39);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * sizeof (int)) != 0)
> +    __builtin_abort ();
> +
> +  return;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-shift-1.c b/gcc/testsuite/gcc.target/i386/pr103144-shift-1.c
> new file mode 100644
> index 00000000000..2a6920350dd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-shift-1.c
> @@ -0,0 +1,70 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 6 "vect" } } */
> +
> +#define N 10000
> +
> +void
> +__attribute__((noipa))
> +foo_shl (int* a, int b)
> +{
> +  for (int i = 0; i != N; i++)
> +    {
> +      a[i] = b;
> +      b <<= 1;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_ashr (int* a, int b)
> +{
> +  for (int i = 0; i != N; i++)
> +    {
> +      a[i] = b;
> +      b >>= 1;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_lshr (unsigned int* a, unsigned int b)
> +{
> +  for (int i = 0; i != N; i++)
> +    {
> +      a[i] = b;
> +      b >>= 1U;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_shl_peel (int* a, int b)
> +{
> +  for (int i = 0; i != 39; i++)
> +    {
> +      a[i] = b;
> +      b <<= 1;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_ashr_peel (int* a, int b)
> +{
> +  for (int i = 0; i != 39; i++)
> +    {
> +      a[i] = b;
> +      b >>= 1;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_lshr_peel (unsigned int* a, unsigned int b)
> +{
> +  for (int i = 0; i != 39; i++)
> +    {
> +      a[i] = b;
> +      b >>= 1U;
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-shift-2.c b/gcc/testsuite/gcc.target/i386/pr103144-shift-2.c
> new file mode 100644
> index 00000000000..6f477191d96
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-shift-2.c
> @@ -0,0 +1,79 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx2 } */
> +
> +#include "avx2-check.h"
> +#include <string.h>
> +#include "pr103144-shift-1.c"
> +
> +typedef int v8si __attribute__((vector_size(32)));
> +typedef unsigned int v8usi __attribute__((vector_size(32)));
> +
> +void
> +avx2_test (void)
> +{
> +  int* epi32_exp = (int*) malloc (N * sizeof (int));
> +  int* epi32_dst = (int*) malloc (N * sizeof (int));
> +  unsigned int* epu32_exp = (unsigned int*) malloc (N * sizeof (int));
> +  unsigned int* epu32_dst = (unsigned int*) malloc (N * sizeof (int));
> +
> +  __builtin_memset (epi32_exp, 0, N * sizeof (int));
> +  int b = 8;
> +  v8si init = __extension__(v8si) { b, b << 1, b << 2, b << 3, b << 4, b << 5, b << 6, b << 7 };
> +
> +  for (int i = 0; i != N / 8; i++)
> +    {
> +      memcpy (epi32_exp + i * 8, &init, 32);
> +      init <<= 8;
> +    }
> +
> +  foo_shl (epi32_dst, b);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0)
> +    __builtin_abort ();
> +
> +  foo_shl_peel (epi32_dst, b);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * sizeof (int)) != 0)
> +    __builtin_abort ();
> +
> +  b = -11111;
> +  init = __extension__(v8si) { b, b >> 1, b >> 2, b >> 3, b >> 4, b >> 5, b >> 6, b >> 7 };
> +  for (int i = 0; i != N / 8; i++)
> +    {
> +      memcpy (epi32_exp + i * 8, &init, 32);
> +      init >>= 8;
> +    }
> +
> +  foo_ashr (epi32_dst, b);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0)
> +    __builtin_abort ();
> +
> +  foo_ashr_peel (epi32_dst, b);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * sizeof (int)) != 0)
> +    {
> +      for (int i = 0; i != 39; i++)
> +	{
> +	  printf ("epi32_dst[%d] is %d ----", i, epi32_dst[i]);
> +	  printf ("epi32_exp[%d] is %d\n", i, epi32_exp[i]);
> +	}
> +      __builtin_abort ();
> +    }
> +
> +  __builtin_memset (epu32_exp, 0, N * sizeof (int));
> +  unsigned int c = 11111111;
> +  v8usi initu = __extension__(v8usi) { c, c >> 1U, c >> 2U, c >> 3U, c >> 4U, c >> 5U, c >> 6U, c >> 7U };
> +  for (int i = 0; i != N / 8; i++)
> +    {
> +      memcpy (epu32_exp + i * 8, &initu, 32);
> +      initu >>= 8U;
> +    }
> +
> +  foo_lshr (epu32_dst, c);
> +  if (__builtin_memcmp (epu32_dst, epu32_exp, N * sizeof (int)) != 0)
> +    __builtin_abort ();
> +
> +  foo_lshr_peel (epu32_dst, c);
> +  if (__builtin_memcmp (epu32_dst, epu32_exp, 39 * sizeof (int)) != 0)
> +    __builtin_abort ();
> +
> +  return;
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 86d2264054a..fc7901d8a8a 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1559,15 +1559,28 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>        gcc_assert (!tree_is_chrec (step_expr));
>
>        init_expr = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
> +      gimple_seq stmts = NULL;
> +      enum vect_induction_op_type induction_type
> +	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
>
> -      off = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr),
> -			 fold_convert (TREE_TYPE (step_expr), niters),
> -			 step_expr);
> -      if (POINTER_TYPE_P (type))
> -	ni = fold_build_pointer_plus (init_expr, off);
> +      if (induction_type == vect_step_op_add)
> +	{
> +	  off = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr),
> +			     fold_convert (TREE_TYPE (step_expr), niters),
> +			     step_expr);
> +	  if (POINTER_TYPE_P (type))
> +	    ni = fold_build_pointer_plus (init_expr, off);
> +	  else
> +	    ni = fold_build2 (PLUS_EXPR, type,
> +			      init_expr, fold_convert (type, off));
> +	}
> +      /* Don't bother to call vect_peel_nonlinear_iv_init.  */
> +      else if (induction_type == vect_step_op_neg)
> +	ni = init_expr;
>        else
> -	ni = fold_build2 (PLUS_EXPR, type,
> -			  init_expr, fold_convert (type, off));
> +	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> +					  niters, step_expr,
> +					  induction_type);
>
>        var = create_tmp_var (type, "tmp");
>
> @@ -1576,9 +1589,15 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>        ni_name = force_gimple_operand (ni, &new_stmts, false, var);
>        /* Exit_bb shouldn't be empty.  */
>        if (!gsi_end_p (last_gsi))
> -	gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> +	{
> +	  gsi_insert_seq_after (&last_gsi, stmts, GSI_SAME_STMT);
> +	  gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> +	}
>        else
> -	gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> +	{
> +	  gsi_insert_seq_before (&last_gsi, stmts, GSI_SAME_STMT);
> +	  gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> +	}
>
>        /* Fix phi expressions in the successor bb.  */
>        adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 2257b29a652..c2293466c1c 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -425,6 +425,77 @@ vect_is_simple_iv_evolution (unsigned loop_nb, tree access_fn, tree * init,
>    return true;
>  }
>
> +/* Function vect_is_nonlinear_iv_evolution
> +
> +   Only support nonlinear induction for integer types:
> +   1. neg
> +   2. mul by constant
> +   3. lshift/rshift by constant.
> +
> +   For neg induction, return a fake step as integer -1.  */
> +static bool
> +vect_is_nonlinear_iv_evolution (class loop* loop, stmt_vec_info stmt_info,
> +				gphi* loop_phi_node, tree *init, tree *step)
> +{
> +  tree init_expr, ev_expr, result, op1, op2;
> +  gimple* def;
> +
> +  if (gimple_phi_num_args (loop_phi_node) != 2)
> +    return false;
> +
> +  init_expr = PHI_ARG_DEF_FROM_EDGE (loop_phi_node, loop_preheader_edge (loop));
> +  ev_expr = PHI_ARG_DEF_FROM_EDGE (loop_phi_node, loop_latch_edge (loop));
> +
> +  /* Support nonlinear induction only for integer type.  */
> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (init_expr)))
> +    return false;
> +
> +  *init = init_expr;
> +  result = PHI_RESULT (loop_phi_node);
> +
> +  if (TREE_CODE (ev_expr) != SSA_NAME
> +      || ((def = SSA_NAME_DEF_STMT (ev_expr)), false)
> +      || !is_gimple_assign (def))
> +    return false;
> +
> +  enum tree_code t_code = gimple_assign_rhs_code (def);
> +  switch (t_code)
> +    {
> +    case NEGATE_EXPR:
> +      if (gimple_assign_rhs1 (def) != result)
> +	return false;
> +      *step = build_int_cst (TREE_TYPE (init_expr), -1);
> +      STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info) = vect_step_op_neg;
> +      break;
> +
> +    case RSHIFT_EXPR:
> +    case LSHIFT_EXPR:
> +    case MULT_EXPR:
> +      op1 = gimple_assign_rhs1 (def);
> +      op2 = gimple_assign_rhs2 (def);
> +      if (TREE_CODE (op2) != INTEGER_CST
> +	  || op1 != result)
> +	return false;
> +      *step = op2;
> +      if (t_code == LSHIFT_EXPR)
> +	STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info) = vect_step_op_shl;
> +      else if (t_code == RSHIFT_EXPR)
> +	STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info) = vect_step_op_shr;
> +      /* NEGATE_EXPR and MULT_EXPR are both vect_step_op_mul.  */
> +      else
> +	STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info) = vect_step_op_mul;
> +      break;
> +
> +    default:
> +      return false;
> +    }
> +
> +  STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED (stmt_info) = *init;
> +  STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info) = *step;
> +
> +  return true;
> +}
> +
>  /* Return true if PHI, described by STMT_INFO, is the inner PHI in
>     what we are assuming is a double reduction.  For example, given
>     a structure like this:
> @@ -512,11 +583,16 @@ vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, class loop *loop,
>  	    = evolution_part_in_loop_num (access_fn, loop->num);
>  	}
>
> -      if (!access_fn
> -	  || vect_inner_phi_in_double_reduction_p (loop_vinfo, phi)
> -	  || !vect_is_simple_iv_evolution (loop->num, access_fn, &init, &step)
> -	  || (LOOP_VINFO_LOOP (loop_vinfo) != loop
> -	      && TREE_CODE (step) != INTEGER_CST))
> +      if ((!access_fn
> +	   || vect_inner_phi_in_double_reduction_p (loop_vinfo, phi)
> +	   || !vect_is_simple_iv_evolution (loop->num, access_fn,
> +					    &init, &step)
> +	   || (LOOP_VINFO_LOOP (loop_vinfo) != loop
> +	       && TREE_CODE (step) != INTEGER_CST))
> +	  /* Only handle nonlinear iv for the same loop.  */
> +	  && (LOOP_VINFO_LOOP (loop_vinfo) != loop
> +	      || !vect_is_nonlinear_iv_evolution (loop, stmt_vinfo,
> +						  phi, &init, &step)))
>  	{
>  	  worklist.safe_push (stmt_vinfo);
>  	  continue;
> @@ -8229,6 +8305,591 @@ vect_can_vectorize_without_simd_p (code_helper code)
>  	  && vect_can_vectorize_without_simd_p (tree_code (code)));
>  }
>
> +/* Create vector init for vectorized iv.  */
> +static tree
> +vect_create_nonlinear_iv_init (gimple_seq* stmts, tree init_expr,
> +			       tree step_expr, poly_uint64 nunits,
> +			       tree vectype,
> +			       enum vect_induction_op_type induction_type)
> +{
> +  unsigned HOST_WIDE_INT const_nunits;
> +  tree vec_shift, vec_init, new_name;
> +  unsigned i;
> +  tree itype = TREE_TYPE (vectype);
> +
> +  /* iv_loop is the loop to be vectorized. Create:
> +     vec_init = [X, X+S, X+2*S, X+3*S] (S = step_expr, X = init_expr).  */
> +  new_name = gimple_convert (stmts, itype, init_expr);
> +  switch (induction_type)
> +    {
> +    case vect_step_op_shr:
> +    case vect_step_op_shl:
> +      /* Build the initial value from shift_expr.  */
> +      vec_init = gimple_build_vector_from_val (stmts,
> +					       vectype,
> +					       new_name);
> +      vec_shift = gimple_build (stmts, VEC_SERIES_EXPR, vectype,
> +				build_zero_cst (itype), step_expr);
> +      vec_init = gimple_build (stmts,
> +			       (induction_type == vect_step_op_shr
> +				? RSHIFT_EXPR : LSHIFT_EXPR),
> +			       vectype, vec_init, vec_shift);
> +      break;
> +
> +    case vect_step_op_neg:
> +      {
> +	vec_init = gimple_build_vector_from_val (stmts,
> +						 vectype,
> +						 new_name);
> +	tree vec_neg = gimple_build (stmts, NEGATE_EXPR,
> +				     vectype, vec_init);
> +	/* The encoding has 2 interleaved stepped patterns.  */
> +	vec_perm_builder sel (nunits, 2, 3);
> +	sel.quick_grow (6);
> +	for (i = 0; i < 3; i++)
> +	  {
> +	    sel[2 * i] = i;
> +	    sel[2 * i + 1] = i + nunits;
> +	  }
> +	vec_perm_indices indices (sel, 2, nunits);
> +	tree perm_mask_even
> +	  = vect_gen_perm_mask_checked (vectype, indices);
> +	vec_init = gimple_build (stmts, VEC_PERM_EXPR,
> +				 vectype,
> +				 vec_init, vec_neg,
> +				 perm_mask_even);
> +      }
> +      break;
> +
> +    case vect_step_op_mul:
> +      {
> +	/* Use unsigned mult to avoid undefined integer overflow.  */
> +	gcc_assert (nunits.is_constant (&const_nunits));
> +	tree utype = unsigned_type_for (itype);
> +	tree uvectype = build_vector_type (utype,
> +					   TYPE_VECTOR_SUBPARTS (vectype));
> +	new_name = gimple_convert (stmts, utype, new_name);
> +	vec_init = gimple_build_vector_from_val (stmts,
> +						 uvectype,
> +						 new_name);
> +	tree_vector_builder elts (uvectype, const_nunits, 1);
> +	tree elt_step = build_one_cst (utype);
> +
> +	elts.quick_push (elt_step);
> +	for (i = 1; i < const_nunits; i++)
> +	  {
> +	    /* Create: new_name_i = new_name + step_expr.  */
> +	    elt_step = gimple_build (stmts, MULT_EXPR,
> +				     utype, elt_step, step_expr);
> +	    elts.quick_push (elt_step);
> +	  }
> +	/* Create a vector from [new_name_0, new_name_1, ...,
> +	   new_name_nunits-1].  */
> +	tree vec_mul = gimple_build_vector (stmts, &elts);
> +	vec_init = gimple_build (stmts, MULT_EXPR, uvectype,
> +				 vec_init, vec_mul);
> +	vec_init = gimple_convert (stmts, vectype, vec_init);
> +      }
> +      break;
> +
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> +  return vec_init;
> +}
> +
> +/* Peel init_expr by skip_niter for induction_type.  */
> +tree
> +vect_peel_nonlinear_iv_init (gimple_seq* stmts, tree init_expr,
> +			     tree skip_niters, tree step_expr,
> +			     enum vect_induction_op_type induction_type)
> +{
> +  gcc_assert (TREE_CODE (skip_niters) == INTEGER_CST);
> +  tree type = TREE_TYPE (init_expr);
> +  unsigned prec = TYPE_PRECISION (type);
> +  switch (induction_type)
> +    {
> +    case vect_step_op_neg:
> +      if (TREE_INT_CST_LOW (skip_niters) % 2)
> +	init_expr = gimple_build (stmts, NEGATE_EXPR, type, init_expr);
> +      /* else no change.  */
> +      break;
> +
> +    case vect_step_op_shr:
> +    case vect_step_op_shl:
> +      skip_niters = gimple_convert (stmts, type, skip_niters);
> +      step_expr = gimple_build (stmts, MULT_EXPR, type, step_expr, skip_niters);
> +      /* When the shift amount >= precision, we need to avoid UB.
> +	 In the original loop there is no UB, and according to the semantics
> +	 init_expr should be 0 for lshr and ashl, and >>= (prec - 1) for
> +	 ashr.  */
> +      if (!tree_fits_uhwi_p (step_expr)
> +	  || tree_to_uhwi (step_expr) >= prec)
> +	{
> +	  if (induction_type == vect_step_op_shl
> +	      || TYPE_UNSIGNED (type))
> +	    init_expr = build_zero_cst (type);
> +	  else
> +	    init_expr = gimple_build (stmts, RSHIFT_EXPR, type,
> +				      init_expr,
> +				      wide_int_to_tree (type, prec - 1));
> +	}
> +      else
> +	init_expr = gimple_build (stmts, (induction_type == vect_step_op_shr
> +					  ? RSHIFT_EXPR : LSHIFT_EXPR),
> +				  type, init_expr, step_expr);
> +      break;
> +
> +    case vect_step_op_mul:
> +      {
> +	tree utype = unsigned_type_for (type);
> +	init_expr = gimple_convert (stmts, utype, init_expr);
> +	unsigned skipn = TREE_INT_CST_LOW (skip_niters);
> +	wide_int begin = wi::to_wide (step_expr);
> +	for (unsigned i = 0; i != skipn - 1; i++)
> +	  begin = wi::mul (begin, wi::to_wide (step_expr));
> +	tree mult_expr = wide_int_to_tree (utype, begin);
> +	init_expr = gimple_build (stmts, MULT_EXPR, utype, init_expr, mult_expr);
> +	init_expr = gimple_convert (stmts, type, init_expr);
> +      }
> +      break;
> +
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> +  return init_expr;
> +}
> +
> +/* Create vector step for vectorized iv.  */
> +static tree
> +vect_create_nonlinear_iv_step (gimple_seq* stmts, tree step_expr,
> +			       poly_uint64 vf,
> +			       enum vect_induction_op_type induction_type)
> +{
> +  tree expr = build_int_cst (TREE_TYPE (step_expr), vf);
> +  tree new_name = NULL;
> +  /* Step should be pow (step, vf) for mult induction.  */
> +  if (induction_type == vect_step_op_mul)
> +    {
> +      gcc_assert (vf.is_constant ());
> +      wide_int begin = wi::to_wide (step_expr);
> +
> +      for (unsigned i = 0; i != vf.to_constant () - 1; i++)
> +	begin = wi::mul (begin, wi::to_wide (step_expr));
> +
> +      new_name = wide_int_to_tree (TREE_TYPE (step_expr), begin);
> +    }
> +  else if (induction_type == vect_step_op_neg)
> +    /* Do nothing.  */
> +    ;
> +  else
> +    new_name = gimple_build (stmts, MULT_EXPR, TREE_TYPE (step_expr),
> +			     expr, step_expr);
> +  return new_name;
> +}
> +
> +static tree
> +vect_create_nonlinear_iv_vec_step (loop_vec_info loop_vinfo,
> +				   stmt_vec_info stmt_info,
> +				   tree new_name, tree vectype,
> +				   enum vect_induction_op_type induction_type)
> +{
> +  /* No step is needed for neg induction.  */
> +  if (induction_type == vect_step_op_neg)
> +    return NULL;
> +
> +  tree t = unshare_expr (new_name);
> +  gcc_assert (CONSTANT_CLASS_P (new_name)
> +	      || TREE_CODE (new_name) == SSA_NAME);
> +  tree new_vec = build_vector_from_val (vectype, t);
> +  tree vec_step = vect_init_vector (loop_vinfo, stmt_info,
> +				    new_vec, vectype, NULL);
> +  return vec_step;
> +}
> +
> +/* Update vectorized iv with vect_step, induc_def is init.  */
> +static tree
> +vect_update_nonlinear_iv (gimple_seq* stmts, tree vectype,
> +			  tree induc_def, tree vec_step,
> +			  enum vect_induction_op_type induction_type)
> +{
> +  tree vec_def = induc_def;
> +  switch (induction_type)
> +    {
> +    case vect_step_op_mul:
> +      {
> +	/* Use unsigned mult to avoid undefined integer overflow.  */
> +	tree uvectype
> +	  = build_vector_type (unsigned_type_for (TREE_TYPE (vectype)),
> +			       TYPE_VECTOR_SUBPARTS (vectype));
> +	vec_def = gimple_convert (stmts, uvectype, vec_def);
> +	vec_step = gimple_convert (stmts, uvectype, vec_step);
> +	vec_def = gimple_build (stmts, MULT_EXPR, uvectype,
> +				vec_def, vec_step);
> +	vec_def = gimple_convert (stmts, vectype, vec_def);
> +      }
> +      break;
> +
> +    case vect_step_op_shr:
> +      vec_def = gimple_build (stmts, RSHIFT_EXPR, vectype,
> +			      vec_def, vec_step);
> +      break;
> +
> +    case vect_step_op_shl:
> +      vec_def = gimple_build (stmts, LSHIFT_EXPR, vectype,
> +			      vec_def, vec_step);
> +      break;
> +    case vect_step_op_neg:
> +      vec_def = induc_def;
> +      /* Do nothing.  */
> +      break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> +  return vec_def;
> +
> +}
> +
> +/* Function vectorizable_nonlinear_induction
> +
> +   Check if STMT_INFO performs a nonlinear induction computation that can be
> +   vectorized.  If VEC_STMT is also passed, vectorize the induction PHI: create
> +   a vectorized phi to replace it, put it in VEC_STMT, and add it to the same
> +   basic block.
> +   Return true if STMT_INFO is vectorizable in this way.  */
> +
> +static bool
> +vectorizable_nonlinear_induction (loop_vec_info loop_vinfo,
> +				  stmt_vec_info stmt_info,
> +				  gimple **vec_stmt, slp_tree slp_node,
> +				  stmt_vector_for_cost *cost_vec)
> +{
> +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  unsigned ncopies;
> +  bool nested_in_vect_loop = false;
> +  class loop *iv_loop;
> +  tree vec_def;
> +  edge pe = loop_preheader_edge (loop);
> +  basic_block new_bb;
> +  tree vec_init, vec_step;
> +  tree new_name;
> +  gimple *new_stmt;
> +  gphi *induction_phi;
> +  tree induc_def, vec_dest;
> +  tree init_expr, step_expr;
> +  tree niters_skip;
> +  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> +  unsigned i;
> +  gimple_stmt_iterator si;
> +
> +  gphi *phi = dyn_cast <gphi *> (stmt_info->stmt);
> +
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  enum vect_induction_op_type induction_type
> +    = STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info);
> +
> +  gcc_assert (induction_type > vect_step_op_add);
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +  gcc_assert (ncopies >= 1);
> +
> +  /* FORNOW. Only handle nonlinear induction in the same loop.  */
> +  if (nested_in_vect_loop_p (loop, stmt_info))
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "nonlinear induction in nested loop.\n");
> +      return false;
> +    }
> +
> +  iv_loop = loop;
> +  gcc_assert (iv_loop == (gimple_bb (phi))->loop_father);
> +
> +  /* TODO: Support SLP for nonlinear iv. There should be a separate vector
> +     iv update for each iv and a permutation to generate the wanted
> +     vector iv.  */
> +  if (slp_node)
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "SLP induction not supported for nonlinear"
> +			 " induction.\n");
> +      return false;
> +    }
> +
> +  /* Init_expr will be updated by vect_update_ivs_after_vectorizer,
> +     if niters is unknown:
> +     For shift, when the shift count >= precision, there would be UB.
> +     For mult, we don't know how to generate
> +     init_expr * pow (step, niters) for variable niters.
> +     For neg, it should be ok, since niters of the vectorized main loop
> +     will always be a multiple of 2.  */
> +  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +      && induction_type != vect_step_op_neg)
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "Peeling for epilogue is not supported"
> +			 " for nonlinear induction except neg"
> +			 " when iteration count is unknown.\n");
> +      return false;
> +    }
> +
> +  /* Also don't support peeling for neg when niter is variable.
> +     ??? generate something like niter_expr & 1 ? init_expr : -init_expr?  */
> +  niters_skip = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
> +  if (niters_skip != NULL_TREE
> +      && TREE_CODE (niters_skip) != INTEGER_CST)
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "Peeling for alignment is not supported"
> +			 " for nonlinear induction when niters_skip"
> +			 " is not constant.\n");
> +      return false;
> +    }
> +
> +  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +      && induction_type == vect_step_op_mul)
> +    if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype)))
> +      {
> +	if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			   "floating point nonlinear induction vectorization"
> +			   " not supported.\n");
> +	return false;
> +      }
> +
> +  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
> +  init_expr = vect_phi_initial_value (phi);
> +  gcc_assert (step_expr != NULL_TREE && init_expr != NULL
> +	      && TREE_CODE (step_expr) == INTEGER_CST);
> +  /* step_expr should be aligned with init_expr,
> +     i.e. uint64 a >> 1, step is int, but vector<uint64> shift is used.  */
> +  step_expr = fold_convert (TREE_TYPE (vectype), step_expr);
> +
> +  if (TREE_CODE (init_expr) == INTEGER_CST)
> +    init_expr = fold_convert (TREE_TYPE (vectype), init_expr);
> +  else
> +    gcc_assert (tree_nop_conversion_p (TREE_TYPE (vectype),
> +				       TREE_TYPE (init_expr)));
> +
> +  switch (induction_type)
> +    {
> +    case vect_step_op_neg:
> +      if (TREE_CODE (init_expr) != INTEGER_CST
> +	  && TREE_CODE (init_expr) != REAL_CST)
> +	{
> +	  /* Check for backend support of NEGATE_EXPR and vec_perm.  */
> +	  if (!directly_supported_p (NEGATE_EXPR, vectype))
> +	    return false;
> +
> +	  /* The encoding has 2 interleaved stepped patterns.  */
> +	  vec_perm_builder sel (nunits, 2, 3);
> +	  machine_mode mode = TYPE_MODE (vectype);
> +	  sel.quick_grow (6);
> +	  for (i = 0; i < 3; i++)
> +	    {
> +	      sel[i * 2] = i;
> +	      sel[i * 2 + 1] = i + nunits;
> +	    }
> +	  vec_perm_indices indices (sel, 2, nunits);
> +	  if (!can_vec_perm_const_p (mode, mode, indices))
> +	    return false;
> +	}
> +      break;
> +
> +    case vect_step_op_mul:
> +      {
> +	/* Check for backend support of MULT_EXPR.  */
> +	if (!directly_supported_p (MULT_EXPR, vectype))
> +	  return false;
> +
> +	/* ??? How to construct the vector step for a variable-length vector:
> +	   [ 1, step, pow (step, 2), pow (step, 4), .. ].  */
> +	if (!vf.is_constant ())
> +	  return false;
> +      }
> +      break;
> +
> +    case vect_step_op_shr:
> +      /* Check for backend support of RSHIFT_EXPR.  */
> +      if (!directly_supported_p (RSHIFT_EXPR, vectype, optab_vector))
> +	return false;
> +
> +      /* Don't shift more than type precision to avoid UB.  */
> +      if (!tree_fits_uhwi_p (step_expr)
> +	  || maybe_ge (nunits * tree_to_uhwi (step_expr),
> +		       TYPE_PRECISION (TREE_TYPE (init_expr))))
> +	return false;
> +      break;
> +
> +    case vect_step_op_shl:
> +      /* Check for backend support of LSHIFT_EXPR.  */
> +      if (!directly_supported_p (LSHIFT_EXPR, vectype, optab_vector))
> +	return false;
> +
> +      /* Don't shift more than type precision to avoid UB.  */
> +      if (!tree_fits_uhwi_p (step_expr)
> +	  || maybe_ge (nunits * tree_to_uhwi (step_expr),
> +		       TYPE_PRECISION (TREE_TYPE (init_expr))))
> +	return false;
> +
> +      break;
> +
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> +  if (!vec_stmt) /* transformation not required.  */
> +    {
> +      unsigned inside_cost = 0, prologue_cost = 0;
> +      /* loop cost for vec_loop.  */
> +      inside_cost = record_stmt_cost (cost_vec, ncopies, vector_stmt,
> +				      stmt_info, 0, vect_body);
> +
> +      /* Neg induction doesn't have any inside_cost.  */
> +      if (induction_type == vect_step_op_neg)
> +	inside_cost = 0;
> +
> +      /* prologue cost for vec_init and vec_step.  */
> +      prologue_cost = record_stmt_cost (cost_vec, 2, scalar_to_vec,
> +					stmt_info, 0, vect_prologue);
> +
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location,
> +			 "vect_model_induction_cost: inside_cost = %d, "
> +			 "prologue_cost = %d. \n", inside_cost,
> +			 prologue_cost);
> +
> +      STMT_VINFO_TYPE (stmt_info) = induc_vec_info_type;
> +      DUMP_VECT_SCOPE ("vectorizable_nonlinear_induction");
> +      return true;
> +    }
> +
> +  /* Transform.  */
> +
> +  /* Compute a vector variable, initialized with the first VF values of
> +     the induction variable.  E.g., for an iv with IV_PHI='X' and
> +     evolution S, for a vector of 4 units, we want to compute:
> +     [X, X + S, X + 2*S, X + 3*S].  */
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform induction phi.\n");
> +
> +  pe = loop_preheader_edge (iv_loop);
> +  /* Find the first insertion point in the BB.  */
> +  basic_block bb = gimple_bb (phi);
> +  si = gsi_after_labels (bb);
> +
> +  gimple_seq stmts = NULL;
> +
> +  /* If we are using the loop mask to "peel" for alignment then we need
> +     to adjust the start value here.  */
> +  if (niters_skip != NULL_TREE)
> +    init_expr = vect_peel_nonlinear_iv_init (&stmts, init_expr, niters_skip,
> +					     step_expr, induction_type);
> +
> +  vec_init = vect_create_nonlinear_iv_init (&stmts, init_expr,
> +					    step_expr, nunits, vectype,
> +					    induction_type);
> +  if (stmts)
> +    {
> +      new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> +      gcc_assert (!new_bb);
> +    }
> +
> +  stmts = NULL;
> +  new_name = vect_create_nonlinear_iv_step (&stmts, step_expr,
> +					    vf, induction_type);
> +  if (stmts)
> +    {
> +      new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> +      gcc_assert (!new_bb);
> +    }
> +
> +  vec_step = vect_create_nonlinear_iv_vec_step (loop_vinfo, stmt_info,
> +						new_name, vectype,
> +						induction_type);
> +  /* Create the following def-use cycle:
> +     loop prolog:
> +	 vec_init = ...
> +	 vec_step = ...
> +     loop:
> +	 vec_iv = PHI <vec_init, vec_loop>
> +	 ...
> +	 STMT
> +	 ...
> +	 vec_loop = vec_iv + vec_step;  */
> +
> +  /* Create the induction-phi that defines the induction-operand.  */
> +  vec_dest = vect_get_new_vect_var (vectype, vect_simple_var, "vec_iv_");
> +  induction_phi = create_phi_node (vec_dest, iv_loop->header);
> +  induc_def = PHI_RESULT (induction_phi);
> +
> +  /* Create the iv update inside the loop.  */
> +  stmts = NULL;
> +  vec_def = vect_update_nonlinear_iv (&stmts, vectype,
> +				      induc_def, vec_step,
> +				      induction_type);
> +
> +  gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT);
> +  new_stmt = SSA_NAME_DEF_STMT (vec_def);
> +
> +  /* Set the arguments of the phi node:  */
> +  add_phi_arg (induction_phi, vec_init, pe, UNKNOWN_LOCATION);
> +  add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop),
> +	       UNKNOWN_LOCATION);
> +
> +  STMT_VINFO_VEC_STMTS (stmt_info).safe_push (induction_phi);
> +  *vec_stmt = induction_phi;
> +
> +  /* In case that vectorization factor (VF) is bigger than the number
> +     of elements that we can fit in a vectype (nunits), we have to generate
> +     more than one vector stmt - i.e - we need to "unroll" the
> +     vector stmt by a factor VF/nunits.  For more details see documentation
> +     in vectorizable_operation.  */
> +
> +  if (ncopies > 1)
> +    {
> +      stmts = NULL;
> +      /* FORNOW. This restriction should be relaxed.  */
> +      gcc_assert (!nested_in_vect_loop);
> +
> +      new_name = vect_create_nonlinear_iv_step (&stmts, step_expr,
> +						nunits, induction_type);
> +
> +      vec_step = vect_create_nonlinear_iv_vec_step (loop_vinfo, stmt_info,
> +						    new_name, vectype,
> +						    induction_type);
> +      vec_def = induc_def;
> +      for (i = 1; i < ncopies; i++)
> +	{
> +	  /* vec_i = vec_prev + vec_step.  */
> +	  stmts = NULL;
> +	  vec_def = vect_update_nonlinear_iv (&stmts, vectype,
> +					      vec_def, vec_step,
> +					      induction_type);
> +	  gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT);
> +	  new_stmt = SSA_NAME_DEF_STMT (vec_def);
> +	  STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> +	}
> +    }
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "transform induction: created def-use cycle: %G%G",
> +		     induction_phi, SSA_NAME_DEF_STMT (vec_def));
> +
> +  return true;
> +}
> +
>  /* Function vectorizable_induction
>
>     Check if STMT_INFO performs an induction computation that can be vectorized.
> @@ -8259,6 +8920,8 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>    unsigned i;
>    tree expr;
>    gimple_stmt_iterator si;
> +  enum vect_induction_op_type induction_type
> +    = STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info);
>
>    gphi *phi = dyn_cast <gphi *> (stmt_info->stmt);
>    if (!phi)
> @@ -8271,6 +8934,11 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>    if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def)
>      return false;
>
> +  /* Handle nonlinear induction in a separate place.  */
> +  if (induction_type != vect_step_op_add)
> +    return vectorizable_nonlinear_induction (loop_vinfo, stmt_info,
> +					     vec_stmt, slp_node, cost_vec);
> +
>    tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
>
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index e5fdc9e0a14..1fa08126ad8 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -68,6 +68,15 @@ enum vect_def_type {
>    vect_unknown_def_type
>  };
>
> +/* Define operation type of linear/non-linear induction variable.  */
> +enum vect_induction_op_type {
> +   vect_step_op_add = 0,
> +   vect_step_op_neg,
> +   vect_step_op_mul,
> +   vect_step_op_shl,
> +   vect_step_op_shr
> +};
> +
>  /* Define type of reduction.  */
>  enum vect_reduction_type {
>    TREE_CODE_REDUCTION,
> @@ -1188,6 +1197,7 @@ public:
>       the version here.  */
>    tree loop_phi_evolution_base_unchanged;
>    tree loop_phi_evolution_part;
> +  enum vect_induction_op_type loop_phi_evolution_type;
>
>    /* Used for various bookkeeping purposes, generally holding a pointer to
>       some other stmt S that is in some way "related" to this stmt.
> @@ -1421,6 +1431,7 @@ struct gather_scatter_info {
>    ((S)->dr_aux.dr && DR_GROUP_FIRST_ELEMENT(S))
>  #define STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED(S) (S)->loop_phi_evolution_base_unchanged
>  #define STMT_VINFO_LOOP_PHI_EVOLUTION_PART(S) (S)->loop_phi_evolution_part
> +#define STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE(S) (S)->loop_phi_evolution_type
>  #define STMT_VINFO_MIN_NEG_DIST(S) (S)->min_neg_dist
>  #define STMT_VINFO_REDUC_TYPE(S) (S)->reduc_type
>  #define STMT_VINFO_REDUC_CODE(S) (S)->reduc_code
> @@ -2327,6 +2338,10 @@ extern int vect_get_known_peeling_cost (loop_vec_info, int, int *,
>  					stmt_vector_for_cost *);
>  extern tree cse_and_gimplify_to_preheader (loop_vec_info, tree);
>
> +/* Nonlinear induction.  */
> +extern tree vect_peel_nonlinear_iv_init (gimple_seq*, tree, tree,
> +					 tree, enum vect_induction_op_type);
> +
>  /* In tree-vect-slp.cc.  */
>  extern void vect_slp_init (void);
>  extern void vect_slp_fini (void);
> --
> 2.18.1
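
As a closing note on the peeling adjustment above: for the mul case,
vect_peel_nonlinear_iv_init boils down to scaling the remaining initial
value by pow (step, skip_niters). A scalar sketch of that arithmetic (the
function name is made up; the patch computes this with wi::mul on
wide_ints):

    unsigned int
    peel_mul_init (unsigned int init, unsigned int step, unsigned int skip_niters)
    {
      unsigned int mult = 1;
      for (unsigned int i = 0; i != skip_niters; i++)
        mult *= step;	/* pow (step, skip_niters).  */
      /* Unsigned multiply, mirroring the patch's use of an unsigned type
         to avoid signed-overflow UB.  */
      return init * mult;
    }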
diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c new file mode 100644 index 00000000000..640c34fd959 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */ + +#define N 10000 + +void +__attribute__((noipa)) +foo_mul (int* a, int b) +{ + for (int i = 0; i != N; i++) + { + a[i] = b; + b *= 3; + } +} + +void +__attribute__((noipa)) +foo_mul_const (int* a) +{ + int b = 1; + for (int i = 0; i != N; i++) + { + a[i] = b; + b *= 3; + } +} + +void +__attribute__((noipa)) +foo_mul_peel (int* a, int b) +{ + for (int i = 0; i != 39; i++) + { + a[i] = b; + b *= 3; + } +} + +void +__attribute__((noipa)) +foo_mul_peel_const (int* a) +{ + int b = 1; + for (int i = 0; i != 39; i++) + { + a[i] = b; + b *= 3; + } +} diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c new file mode 100644 index 00000000000..39fdea3a69d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c @@ -0,0 +1,51 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */ +/* { dg-require-effective-target avx2 } */ + +#include "avx2-check.h" +#include <string.h> +#include "pr103144-mul-1.c" + +typedef int v8si __attribute__((vector_size(32))); + +void +avx2_test (void) +{ + int* epi32_exp = (int*) malloc (N * sizeof (int)); + int* epi32_dst = (int*) malloc (N * sizeof (int)); + + __builtin_memset (epi32_exp, 0, N * sizeof (int)); + int b = 8; + v8si init = __extension__(v8si) { b, b * 3, b * 9, b * 27, b * 81, b * 243, b * 729, b * 2187 }; + + for (int i = 0; i != N / 8; i++) + { + memcpy (epi32_exp + i * 8, &init, 32); + init *= 6561; + } + + foo_mul (epi32_dst, b); + if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0) + __builtin_abort (); + + foo_mul_peel (epi32_dst, b); + if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * 4) != 0) + __builtin_abort (); + + init = __extension__(v8si) { 1, 3, 9, 27, 81, 243, 729, 2187 }; + for (int i = 0; i != N / 8; i++) + { + memcpy (epi32_exp + i * 8, &init, 32); + init *= 6561; + } + + foo_mul_const (epi32_dst); + if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0) + __builtin_abort (); + + foo_mul_peel_const (epi32_dst); + if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * 4) != 0) + __builtin_abort (); + + return; +} diff --git a/gcc/testsuite/gcc.target/i386/pr103144-neg-1.c b/gcc/testsuite/gcc.target/i386/pr103144-neg-1.c new file mode 100644 index 00000000000..f87b1d6e529 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103144-neg-1.c @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */ + +#define N 10000 + +void +__attribute__((noipa)) +foo_neg (int* a, int b) +{ + for (int i = 0; i != N; i++) + { + a[i] = b; + b = -b; + } +} + +void +__attribute__((noipa)) +foo_neg_const (int* a) +{ + int b = 1; + for (int i = 0; i != N; i++) + { + a[i] = b; + b = -b; + } +} + +void +__attribute__((noipa)) +foo_neg_peel (int* a, int b, int n) +{ + for (int i = 0; i != n; i++) + { + a[i] = b; + b = -b; + } +} + +void 
+__attribute__((noipa)) +foo_neg_const_peel (int* a, int n) +{ + int b = 1; + for (int i = 0; i != n; i++) + { + a[i] = b; + b = -b; + } +} diff --git a/gcc/testsuite/gcc.target/i386/pr103144-neg-2.c b/gcc/testsuite/gcc.target/i386/pr103144-neg-2.c new file mode 100644 index 00000000000..bb8c22b9f9e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103144-neg-2.c @@ -0,0 +1,44 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */ +/* { dg-require-effective-target avx2 } */ + +#include "avx2-check.h" +#include <string.h> +#include "pr103144-neg-1.c" + +void +avx2_test (void) +{ + int* epi32_exp = (int*) malloc (N * sizeof (int)); + int* epi32_dst = (int*) malloc (N * sizeof (int)); + long long* epi64_exp = (long long*) malloc (N * sizeof (int)); + + __builtin_memset (epi32_exp, 0, N * sizeof (int)); + int b = 100; + + for (int i = 0; i != N / 2; i++) + epi64_exp[i] = ((long long) b) | (((long long) -b) << 32); + + memcpy (epi32_exp, epi64_exp, N * sizeof (int)); + foo_neg (epi32_dst, b); + if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0) + __builtin_abort (); + + foo_neg_peel (epi32_dst, b, 39); + if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * sizeof (int)) != 0) + __builtin_abort (); + + for (int i = 0; i != N / 2; i++) + epi64_exp[i] = ((long long) 1) | (((long long) -1) << 32); + + memcpy (epi32_exp, epi64_exp, N * sizeof (int)); + foo_neg_const (epi32_dst); + if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0) + __builtin_abort (); + + foo_neg_const_peel (epi32_dst, 39); + if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * sizeof (int)) != 0) + __builtin_abort (); + + return; +} diff --git a/gcc/testsuite/gcc.target/i386/pr103144-shift-1.c b/gcc/testsuite/gcc.target/i386/pr103144-shift-1.c new file mode 100644 index 00000000000..2a6920350dd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103144-shift-1.c @@ -0,0 +1,70 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 6 "vect" } } */ + +#define N 10000 +void +__attribute__((noipa)) +foo_shl (int* a, int b) +{ + for (int i = 0; i != N; i++) + { + a[i] = b; + b <<= 1; + } +} + +void +__attribute__((noipa)) +foo_ashr (int* a, int b) +{ + for (int i = 0; i != N; i++) + { + a[i] = b; + b >>= 1; + } +} + +void +__attribute__((noipa)) +foo_lshr (unsigned int* a, unsigned int b) +{ + for (int i = 0; i != N; i++) + { + a[i] = b; + b >>= 1U; + } +} + +void +__attribute__((noipa)) +foo_shl_peel (int* a, int b) +{ + for (int i = 0; i != 39; i++) + { + a[i] = b; + b <<= 1; + } +} + +void +__attribute__((noipa)) +foo_ashr_peel (int* a, int b) +{ + for (int i = 0; i != 39; i++) + { + a[i] = b; + b >>= 1; + } +} + +void +__attribute__((noipa)) +foo_lshr_peel (unsigned int* a, unsigned int b) +{ + for (int i = 0; i != 39; i++) + { + a[i] = b; + b >>= 1U; + } +} diff --git a/gcc/testsuite/gcc.target/i386/pr103144-shift-2.c b/gcc/testsuite/gcc.target/i386/pr103144-shift-2.c new file mode 100644 index 00000000000..6f477191d96 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr103144-shift-2.c @@ -0,0 +1,79 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */ +/* { dg-require-effective-target avx2 } */ + +#include "avx2-check.h" +#include <string.h> +#include "pr103144-shift-1.c" + +typedef int v8si 
__attribute__((vector_size(32))); +typedef unsigned int v8usi __attribute__((vector_size(32))); + +void +avx2_test (void) +{ + int* epi32_exp = (int*) malloc (N * sizeof (int)); + int* epi32_dst = (int*) malloc (N * sizeof (int)); + unsigned int* epu32_exp = (unsigned int*) malloc (N * sizeof (int)); + unsigned int* epu32_dst = (unsigned int*) malloc (N * sizeof (int)); + + __builtin_memset (epi32_exp, 0, N * sizeof (int)); + int b = 8; + v8si init = __extension__(v8si) { b, b << 1, b << 2, b << 3, b << 4, b << 5, b << 6, b << 7 }; + + for (int i = 0; i != N / 8; i++) + { + memcpy (epi32_exp + i * 8, &init, 32); + init <<= 8; + } + + foo_shl (epi32_dst, b); + if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0) + __builtin_abort (); + + foo_shl_peel (epi32_dst, b); + if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * sizeof (int)) != 0) + __builtin_abort (); + + b = -11111; + init = __extension__(v8si) { b, b >> 1, b >> 2, b >> 3, b >> 4, b >> 5, b >> 6, b >> 7 }; + for (int i = 0; i != N / 8; i++) + { + memcpy (epi32_exp + i * 8, &init, 32); + init >>= 8; + } + + foo_ashr (epi32_dst, b); + if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0) + __builtin_abort (); + + foo_ashr_peel (epi32_dst, b); + if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * sizeof (int)) != 0) + { + for (int i = 0; i != 39; i++) + { + printf ("epi32_dst[%d] is %d ----", i, epi32_dst[i]); + printf ("epi32_exp[%d] is %d\n", i, epi32_exp[i]); + } + __builtin_abort (); + } + + __builtin_memset (epu32_exp, 0, N * sizeof (int)); + unsigned int c = 11111111; + v8usi initu = __extension__(v8usi) { c, c >> 1U, c >> 2U, c >> 3U, c >> 4U, c >> 5U, c >> 6U, c >> 7U }; + for (int i = 0; i != N / 8; i++) + { + memcpy (epu32_exp + i * 8, &initu, 32); + initu >>= 8U; + } + + foo_lshr (epu32_dst, c); + if (__builtin_memcmp (epu32_dst, epu32_exp, N * sizeof (int)) != 0) + __builtin_abort (); + + foo_lshr_peel (epu32_dst, c); + if (__builtin_memcmp (epu32_dst, epu32_exp, 39 * sizeof (int)) != 0) + __builtin_abort (); + + return; +} diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index 86d2264054a..fc7901d8a8a 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -1559,15 +1559,28 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, gcc_assert (!tree_is_chrec (step_expr)); init_expr = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop)); + gimple_seq stmts = NULL; + enum vect_induction_op_type induction_type + = STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info); - off = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr), - fold_convert (TREE_TYPE (step_expr), niters), - step_expr); - if (POINTER_TYPE_P (type)) - ni = fold_build_pointer_plus (init_expr, off); + if (induction_type == vect_step_op_add) + { + off = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr), + fold_convert (TREE_TYPE (step_expr), niters), + step_expr); + if (POINTER_TYPE_P (type)) + ni = fold_build_pointer_plus (init_expr, off); + else + ni = fold_build2 (PLUS_EXPR, type, + init_expr, fold_convert (type, off)); + } + /* Don't bother call vect_peel_nonlinear_iv_init. 
*/ + else if (induction_type == vect_step_op_neg) + ni = init_expr; else - ni = fold_build2 (PLUS_EXPR, type, - init_expr, fold_convert (type, off)); + ni = vect_peel_nonlinear_iv_init (&stmts, init_expr, + niters, step_expr, + induction_type); var = create_tmp_var (type, "tmp"); @@ -1576,9 +1589,15 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, ni_name = force_gimple_operand (ni, &new_stmts, false, var); /* Exit_bb shouldn't be empty. */ if (!gsi_end_p (last_gsi)) - gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT); + { + gsi_insert_seq_after (&last_gsi, stmts, GSI_SAME_STMT); + gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT); + } else - gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT); + { + gsi_insert_seq_before (&last_gsi, stmts, GSI_SAME_STMT); + gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT); + } /* Fix phi expressions in the successor bb. */ adjust_phi_and_debug_stmts (phi1, update_e, ni_name); diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 2257b29a652..c2293466c1c 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -425,6 +425,77 @@ vect_is_simple_iv_evolution (unsigned loop_nb, tree access_fn, tree * init, return true; } +/* Function vect_is_nonlinear_iv_evolution + + Only support nonlinear induction for integer type + 1. neg + 2. mul by constant + 3. lshift/rshift by constant. + + For neg induction, return a fake step as integer -1. */ +static bool +vect_is_nonlinear_iv_evolution (class loop* loop, stmt_vec_info stmt_info, + gphi* loop_phi_node, tree *init, tree *step) +{ + tree init_expr, ev_expr, result, op1, op2; + gimple* def; + + if (gimple_phi_num_args (loop_phi_node) != 2) + return false; + + init_expr = PHI_ARG_DEF_FROM_EDGE (loop_phi_node, loop_preheader_edge (loop)); + ev_expr = PHI_ARG_DEF_FROM_EDGE (loop_phi_node, loop_latch_edge (loop)); + + /* Support nonlinear induction only for integer type. */ + if (!INTEGRAL_TYPE_P (TREE_TYPE (init_expr))) + return false; + + *init = init_expr; + result = PHI_RESULT (loop_phi_node); + + if (TREE_CODE (ev_expr) != SSA_NAME + || ((def = SSA_NAME_DEF_STMT (ev_expr)), false) + || !is_gimple_assign (def)) + return false; + + enum tree_code t_code = gimple_assign_rhs_code (def); + switch (t_code) + { + case NEGATE_EXPR: + if (gimple_assign_rhs1 (def) != result) + return false; + *step = build_int_cst (TREE_TYPE (init_expr), -1); + STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info) = vect_step_op_neg; + break; + + case RSHIFT_EXPR: + case LSHIFT_EXPR: + case MULT_EXPR: + op1 = gimple_assign_rhs1 (def); + op2 = gimple_assign_rhs2 (def); + if (TREE_CODE (op2) != INTEGER_CST + || op1 != result) + return false; + *step = op2; + if (t_code == LSHIFT_EXPR) + STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info) = vect_step_op_shl; + else if (t_code == RSHIFT_EXPR) + STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info) = vect_step_op_shr; + /* NEGATE_EXPR and MULT_EXPR are both vect_step_op_mul. */ + else + STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info) = vect_step_op_mul; + break; + + default: + return false; + } + + STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED (stmt_info) = *init; + STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info) = *step; + + return true; +} + /* Return true if PHI, described by STMT_INFO, is the inner PHI in what we are assuming is a double reduction. 
 /* Return true if PHI, described by STMT_INFO, is the inner PHI in
    what we are assuming is a double reduction.
    For example, given a structure like this:
@@ -512,11 +583,16 @@ vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, class loop *loop,
 	    = evolution_part_in_loop_num (access_fn, loop->num);
 	}
 
-      if (!access_fn
-	  || vect_inner_phi_in_double_reduction_p (loop_vinfo, phi)
-	  || !vect_is_simple_iv_evolution (loop->num, access_fn, &init, &step)
-	  || (LOOP_VINFO_LOOP (loop_vinfo) != loop
-	      && TREE_CODE (step) != INTEGER_CST))
+      if ((!access_fn
+	   || vect_inner_phi_in_double_reduction_p (loop_vinfo, phi)
+	   || !vect_is_simple_iv_evolution (loop->num, access_fn,
+					    &init, &step)
+	   || (LOOP_VINFO_LOOP (loop_vinfo) != loop
+	       && TREE_CODE (step) != INTEGER_CST))
+	  /* Only handle nonlinear IVs defined in the loop being vectorized.  */
+	  && (LOOP_VINFO_LOOP (loop_vinfo) != loop
+	      || !vect_is_nonlinear_iv_evolution (loop, stmt_vinfo,
+						  phi, &init, &step)))
 	{
 	  worklist.safe_push (stmt_vinfo);
 	  continue;
@@ -8229,6 +8305,591 @@ vect_can_vectorize_without_simd_p (code_helper code)
 	  && vect_can_vectorize_without_simd_p (tree_code (code)));
 }
 
+/* Create the vector init for a vectorized nonlinear IV.  */
+static tree
+vect_create_nonlinear_iv_init (gimple_seq* stmts, tree init_expr,
+			       tree step_expr, poly_uint64 nunits,
+			       tree vectype,
+			       enum vect_induction_op_type induction_type)
+{
+  unsigned HOST_WIDE_INT const_nunits;
+  tree vec_shift, vec_init, new_name;
+  unsigned i;
+  tree itype = TREE_TYPE (vectype);
+
+  /* iv_loop is the loop to be vectorized.  Create the first VF values
+     of the IV, e.g. for mul: vec_init = [X, X*S, X*S^2, X*S^3]
+     (S = step_expr, X = init_expr).  */
+  new_name = gimple_convert (stmts, itype, init_expr);
+  switch (induction_type)
+    {
+    case vect_step_op_shr:
+    case vect_step_op_shl:
+      /* Build the initial value from the shift amount:
+	 [X >> 0, X >> S, X >> 2*S, ...].  */
+      vec_init = gimple_build_vector_from_val (stmts,
+					       vectype,
+					       new_name);
+      vec_shift = gimple_build (stmts, VEC_SERIES_EXPR, vectype,
+				build_zero_cst (itype), step_expr);
+      vec_init = gimple_build (stmts,
+			       (induction_type == vect_step_op_shr
+				? RSHIFT_EXPR : LSHIFT_EXPR),
+			       vectype, vec_init, vec_shift);
+      break;
+
+    case vect_step_op_neg:
+      {
+	vec_init = gimple_build_vector_from_val (stmts,
+						 vectype,
+						 new_name);
+	tree vec_neg = gimple_build (stmts, NEGATE_EXPR,
+				     vectype, vec_init);
+	/* The encoding has 2 interleaved stepped patterns.  */
+	vec_perm_builder sel (nunits, 2, 3);
+	sel.quick_grow (6);
+	for (i = 0; i < 3; i++)
+	  {
+	    sel[2 * i] = i;
+	    sel[2 * i + 1] = i + nunits;
+	  }
+	vec_perm_indices indices (sel, 2, nunits);
+	tree perm_mask_even
+	  = vect_gen_perm_mask_checked (vectype, indices);
+	vec_init = gimple_build (stmts, VEC_PERM_EXPR,
+				 vectype,
+				 vec_init, vec_neg,
+				 perm_mask_even);
+      }
+      break;
+
+    case vect_step_op_mul:
+      {
+	/* Use an unsigned multiplication to avoid undefined behavior on
+	   signed integer overflow.  */
+	gcc_assert (nunits.is_constant (&const_nunits));
+	tree utype = unsigned_type_for (itype);
+	tree uvectype = build_vector_type (utype,
+					   TYPE_VECTOR_SUBPARTS (vectype));
+	new_name = gimple_convert (stmts, utype, new_name);
+	vec_init = gimple_build_vector_from_val (stmts,
+						 uvectype,
+						 new_name);
+	tree_vector_builder elts (uvectype, const_nunits, 1);
+	tree elt_step = build_one_cst (utype);
+
+	elts.quick_push (elt_step);
+	for (i = 1; i < const_nunits; i++)
+	  {
+	    /* elts[i] = pow (S, i), i.e. elts[i - 1] * S.  */
+	    elt_step = gimple_build (stmts, MULT_EXPR,
+				     utype, elt_step, step_expr);
+	    elts.quick_push (elt_step);
+	  }
+	/* Create a vector from [1, S, pow (S, 2), ..., pow (S, nunits - 1)]
+	   and multiply it into the splat of X.  */
+	tree vec_mul = gimple_build_vector (stmts, &elts);
+	vec_init = gimple_build (stmts, MULT_EXPR, uvectype,
+				 vec_init, vec_mul);
+	vec_init = gimple_convert (stmts, vectype, vec_init);
+      }
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  return vec_init;
+}
+
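+/* A worked example of the initial vectors built above, assuming
+   X = 3, S = 2 and nunits = 4 (values chosen purely for illustration):
+
+     shr:  {3 >> 0, 3 >> 2, 3 >> 4, 3 >> 6} = {3, 0, 0, 0}
+     shl:  {3 << 0, 3 << 2, 3 << 4, 3 << 6} = {3, 12, 48, 192}
+     neg:  {3, -3, 3, -3}
+     mul:  {3 * 1, 3 * 2, 3 * 4, 3 * 8} = {3, 6, 12, 24}.  */
+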
+/* Compute the peeled value of INIT_EXPR after SKIP_NITERS iterations
+   of the induction described by INDUCTION_TYPE and STEP_EXPR.  */
+tree
+vect_peel_nonlinear_iv_init (gimple_seq* stmts, tree init_expr,
+			     tree skip_niters, tree step_expr,
+			     enum vect_induction_op_type induction_type)
+{
+  gcc_assert (TREE_CODE (skip_niters) == INTEGER_CST);
+  tree type = TREE_TYPE (init_expr);
+  unsigned prec = TYPE_PRECISION (type);
+  switch (induction_type)
+    {
+    case vect_step_op_neg:
+      if (TREE_INT_CST_LOW (skip_niters) % 2)
+	init_expr = gimple_build (stmts, NEGATE_EXPR, type, init_expr);
+      /* else no change.  */
+      break;
+
+    case vect_step_op_shr:
+    case vect_step_op_shl:
+      skip_niters = gimple_convert (stmts, type, skip_niters);
+      step_expr = gimple_build (stmts, MULT_EXPR, type, step_expr, skip_niters);
+      /* When the accumulated shift amount is >= the precision we must
+	 avoid undefined behavior.  The original loop has none, and by its
+	 semantics INIT_EXPR becomes 0 for lshr and ashl, and as if shifted
+	 by (prec - 1) for ashr.  */
+      if (!tree_fits_uhwi_p (step_expr)
+	  || tree_to_uhwi (step_expr) >= prec)
+	{
+	  if (induction_type == vect_step_op_shl
+	      || TYPE_UNSIGNED (type))
+	    init_expr = build_zero_cst (type);
+	  else
+	    init_expr = gimple_build (stmts, RSHIFT_EXPR, type,
+				      init_expr,
+				      wide_int_to_tree (type, prec - 1));
+	}
+      else
+	init_expr = gimple_build (stmts, (induction_type == vect_step_op_shr
+					  ? RSHIFT_EXPR : LSHIFT_EXPR),
+				  type, init_expr, step_expr);
+      break;
+
+    case vect_step_op_mul:
+      {
+	/* Multiply by pow (S, SKIP_NITERS) in the unsigned type to avoid
+	   undefined signed overflow.  */
+	tree utype = unsigned_type_for (type);
+	init_expr = gimple_convert (stmts, utype, init_expr);
+	unsigned skipn = TREE_INT_CST_LOW (skip_niters);
+	wide_int begin = wi::to_wide (step_expr);
+	for (unsigned i = 0; i != skipn - 1; i++)
+	  begin = wi::mul (begin, wi::to_wide (step_expr));
+	tree mult_expr = wide_int_to_tree (utype, begin);
+	init_expr = gimple_build (stmts, MULT_EXPR, utype, init_expr, mult_expr);
+	init_expr = gimple_convert (stmts, type, init_expr);
+      }
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  return init_expr;
+}
+
+/* Create the scalar step for a vectorized nonlinear IV: pow (step, vf)
+   for mul, step * vf for the shifts, and nothing for neg.  */
+static tree
+vect_create_nonlinear_iv_step (gimple_seq* stmts, tree step_expr,
+			       poly_uint64 vf,
+			       enum vect_induction_op_type induction_type)
+{
+  tree expr = build_int_cst (TREE_TYPE (step_expr), vf);
+  tree new_name = NULL;
+  /* Step should be pow (step, vf) for mult induction.  */
+  if (induction_type == vect_step_op_mul)
+    {
+      gcc_assert (vf.is_constant ());
+      wide_int begin = wi::to_wide (step_expr);
+
+      for (unsigned i = 0; i != vf.to_constant () - 1; i++)
+	begin = wi::mul (begin, wi::to_wide (step_expr));
+
+      new_name = wide_int_to_tree (TREE_TYPE (step_expr), begin);
+    }
+  else if (induction_type == vect_step_op_neg)
+    /* Do nothing.  */
+    ;
+  else
+    new_name = gimple_build (stmts, MULT_EXPR, TREE_TYPE (step_expr),
+			     expr, step_expr);
+  return new_name;
+}
+
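+/* Illustration (hypothetical numbers): peeling SKIP_NITERS = 3
+   iterations of x *= 2 turns INIT_EXPR X into X * pow (2, 3) = X * 8,
+   while for x >>= 2 it becomes X >> 6; once the accumulated shift
+   amount reaches the precision the result is instead 0 (lshr/shl) or
+   X >> (prec - 1) (ashr).  Likewise, the per-vector-iteration step
+   above is pow (2, vf) for the mul case and 2 * vf for the shifts.  */
+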
+static tree
+vect_create_nonlinear_iv_vec_step (loop_vec_info loop_vinfo,
+				   stmt_vec_info stmt_info,
+				   tree new_name, tree vectype,
+				   enum vect_induction_op_type induction_type)
+{
+  /* No step is needed for neg induction.  */
+  if (induction_type == vect_step_op_neg)
+    return NULL;
+
+  tree t = unshare_expr (new_name);
+  gcc_assert (CONSTANT_CLASS_P (new_name)
+	      || TREE_CODE (new_name) == SSA_NAME);
+  tree new_vec = build_vector_from_val (vectype, t);
+  tree vec_step = vect_init_vector (loop_vinfo, stmt_info,
+				    new_vec, vectype, NULL);
+  return vec_step;
+}
+
+/* Update the vectorized IV with VEC_STEP; INDUC_DEF is the PHI result.  */
+static tree
+vect_update_nonlinear_iv (gimple_seq* stmts, tree vectype,
+			  tree induc_def, tree vec_step,
+			  enum vect_induction_op_type induction_type)
+{
+  tree vec_def = induc_def;
+  switch (induction_type)
+    {
+    case vect_step_op_mul:
+      {
+	/* Use an unsigned multiplication to avoid undefined behavior on
+	   signed integer overflow.  */
+	tree uvectype
+	  = build_vector_type (unsigned_type_for (TREE_TYPE (vectype)),
+			       TYPE_VECTOR_SUBPARTS (vectype));
+	vec_def = gimple_convert (stmts, uvectype, vec_def);
+	vec_step = gimple_convert (stmts, uvectype, vec_step);
+	vec_def = gimple_build (stmts, MULT_EXPR, uvectype,
+				vec_def, vec_step);
+	vec_def = gimple_convert (stmts, vectype, vec_def);
+      }
+      break;
+
+    case vect_step_op_shr:
+      vec_def = gimple_build (stmts, RSHIFT_EXPR, vectype,
+			      vec_def, vec_step);
+      break;
+
+    case vect_step_op_shl:
+      vec_def = gimple_build (stmts, LSHIFT_EXPR, vectype,
+			      vec_def, vec_step);
+      break;
+
+    case vect_step_op_neg:
+      /* Do nothing.  */
+      vec_def = induc_def;
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  return vec_def;
+}
+
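+/* E.g. (illustrative), for x *= 51 with VF = 4 the update above
+   computes vec_def = vec_iv * {51^4, 51^4, 51^4, 51^4} in the
+   corresponding unsigned vector type, so lanes that wrap around do
+   not trigger undefined signed overflow.  */
+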
+/* Function vectorizable_nonlinear_induction
+
+   Check if STMT_INFO performs a nonlinear induction computation that can be
+   vectorized.  If VEC_STMT is also passed, vectorize the induction PHI: create
+   a vectorized phi to replace it, put it in VEC_STMT, and add it to the same
+   basic block.
+   Return true if STMT_INFO is vectorizable in this way.  */
+
+static bool
+vectorizable_nonlinear_induction (loop_vec_info loop_vinfo,
+				  stmt_vec_info stmt_info,
+				  gimple **vec_stmt, slp_tree slp_node,
+				  stmt_vector_for_cost *cost_vec)
+{
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  unsigned ncopies;
+  bool nested_in_vect_loop = false;
+  class loop *iv_loop;
+  tree vec_def;
+  edge pe = loop_preheader_edge (loop);
+  basic_block new_bb;
+  tree vec_init, vec_step;
+  tree new_name;
+  gimple *new_stmt;
+  gphi *induction_phi;
+  tree induc_def, vec_dest;
+  tree init_expr, step_expr;
+  tree niters_skip;
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  unsigned i;
+  gimple_stmt_iterator si;
+
+  gphi *phi = dyn_cast <gphi *> (stmt_info->stmt);
+
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  enum vect_induction_op_type induction_type
+    = STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info);
+
+  gcc_assert (induction_type > vect_step_op_add);
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+  gcc_assert (ncopies >= 1);
+
+  /* FORNOW.  Only handle nonlinear induction in the same loop.  */
+  if (nested_in_vect_loop_p (loop, stmt_info))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "nonlinear induction in nested loop.\n");
+      return false;
+    }
+
+  iv_loop = loop;
+  gcc_assert (iv_loop == (gimple_bb (phi))->loop_father);
+
+  /* TODO: Support SLP for nonlinear IVs.  This needs a separate vector IV
+     update for each IV and a permutation to generate the wanted vector IV.  */
+  if (slp_node)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "SLP induction not supported for nonlinear"
+			 " induction.\n");
+      return false;
+    }
+
+  /* INIT_EXPR will be updated by vect_update_ivs_after_vectorizer.
+     If niters is unknown:
+     For shift, a shift amount >= precision would be undefined behavior.
+     For mult, we don't know how to generate
+     init_expr * pow (step, niters) for variable niters.
+     For neg, it is fine, since niters of the vectorized main loop
+     will always be a multiple of 2.  */
+  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+      && induction_type != vect_step_op_neg)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "Peeling for epilogue is not supported"
+			 " for nonlinear induction except neg"
+			 " when iteration count is unknown.\n");
+      return false;
+    }
+
+  /* Also don't support peeling for neg when niters_skip is variable.
+     ??? Generate something like niters_skip & 1 ? init_expr : -init_expr?  */
+  niters_skip = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
+  if (niters_skip != NULL_TREE
+      && TREE_CODE (niters_skip) != INTEGER_CST)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "Peeling for alignment is not supported"
+			 " for nonlinear induction when niters_skip"
+			 " is not constant.\n");
+      return false;
+    }
+
+  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+      && induction_type == vect_step_op_mul)
+    if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype)))
+      {
+	if (dump_enabled_p ())
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "floating point nonlinear induction vectorization"
+			   " not supported.\n");
+	return false;
+      }
+
+  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
+  init_expr = vect_phi_initial_value (phi);
+  gcc_assert (step_expr != NULL_TREE && init_expr != NULL
+	      && TREE_CODE (step_expr) == INTEGER_CST);
+  /* step_expr should be aligned with init_expr, e.g. for uint64 a >> 1
+     step is int, but the vector shift is done on vector<uint64>.  */
+  step_expr = fold_convert (TREE_TYPE (vectype), step_expr);
+
+  if (TREE_CODE (init_expr) == INTEGER_CST)
+    init_expr = fold_convert (TREE_TYPE (vectype), init_expr);
+  else
+    gcc_assert (tree_nop_conversion_p (TREE_TYPE (vectype),
+				       TREE_TYPE (init_expr)));
+
+  switch (induction_type)
+    {
+    case vect_step_op_neg:
+      if (TREE_CODE (init_expr) != INTEGER_CST
+	  && TREE_CODE (init_expr) != REAL_CST)
+	{
+	  /* Check for backend support of NEGATE_EXPR and vec_perm.  */
+	  if (!directly_supported_p (NEGATE_EXPR, vectype))
+	    return false;
+
+	  /* The encoding has 2 interleaved stepped patterns.  */
+	  vec_perm_builder sel (nunits, 2, 3);
+	  machine_mode mode = TYPE_MODE (vectype);
+	  sel.quick_grow (6);
+	  for (i = 0; i < 3; i++)
+	    {
+	      sel[i * 2] = i;
+	      sel[i * 2 + 1] = i + nunits;
+	    }
+	  vec_perm_indices indices (sel, 2, nunits);
+	  if (!can_vec_perm_const_p (mode, mode, indices))
+	    return false;
+	}
+      break;
+
+    case vect_step_op_mul:
+      {
+	/* Check for backend support of MULT_EXPR.  */
+	if (!directly_supported_p (MULT_EXPR, vectype))
+	  return false;
+
+	/* ??? How to construct the vector step
+	   [1, step, pow (step, 2), pow (step, 3), ...] for a
+	   variable-length vector.  */
+	if (!vf.is_constant ())
+	  return false;
+      }
+      break;
+
+    case vect_step_op_shr:
+      /* Check for backend support of RSHIFT_EXPR.  */
+      if (!directly_supported_p (RSHIFT_EXPR, vectype, optab_vector))
+	return false;
+
+      /* Don't shift more than the type precision to avoid undefined
+	 behavior.  */
+      if (!tree_fits_uhwi_p (step_expr)
+	  || maybe_ge (nunits * tree_to_uhwi (step_expr),
+		       TYPE_PRECISION (TREE_TYPE (init_expr))))
+	return false;
+      break;
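+
+      /* E.g. (hypothetical numbers): for v8si and x >>= 4 the vector
+	 step would shift by nunits * 4 = 32 == TYPE_PRECISION (int),
+	 which is undefined, so vectorization is rejected; x >>= 3
+	 (8 * 3 = 24 < 32) is accepted.  */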
+
+    case vect_step_op_shl:
+      /* Check for backend support of LSHIFT_EXPR.  */
+      if (!directly_supported_p (LSHIFT_EXPR, vectype, optab_vector))
+	return false;
+
+      /* Don't shift more than the type precision to avoid undefined
+	 behavior.  */
+      if (!tree_fits_uhwi_p (step_expr)
+	  || maybe_ge (nunits * tree_to_uhwi (step_expr),
+		       TYPE_PRECISION (TREE_TYPE (init_expr))))
+	return false;
+
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  if (!vec_stmt) /* transformation not required.  */
+    {
+      unsigned inside_cost = 0, prologue_cost = 0;
+      /* Loop cost for vec_loop.  */
+      inside_cost = record_stmt_cost (cost_vec, ncopies, vector_stmt,
+				      stmt_info, 0, vect_body);
+
+      /* Neg induction doesn't have any inside_cost.  */
+      if (induction_type == vect_step_op_neg)
+	inside_cost = 0;
+
+      /* Prologue cost for vec_init and vec_step.  */
+      prologue_cost = record_stmt_cost (cost_vec, 2, scalar_to_vec,
+					stmt_info, 0, vect_prologue);
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "vect_model_induction_cost: inside_cost = %d, "
+			 "prologue_cost = %d. \n", inside_cost,
+			 prologue_cost);
+
+      STMT_VINFO_TYPE (stmt_info) = induc_vec_info_type;
+      DUMP_VECT_SCOPE ("vectorizable_nonlinear_induction");
+      return true;
+    }
+
+  /* Transform.  */
+
+  /* Compute a vector variable, initialized with the first VF values of
+     the induction variable.  E.g., for an IV with IV_PHI='X' and
+     evolution S, for a vector of 4 units, for mul we want to compute:
+     [X, X*S, X*S^2, X*S^3].  */
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform induction phi.\n");
+
+  pe = loop_preheader_edge (iv_loop);
+  /* Find the first insertion point in the BB.  */
+  basic_block bb = gimple_bb (phi);
+  si = gsi_after_labels (bb);
+
+  gimple_seq stmts = NULL;
+
+  /* If we are using the loop mask to "peel" for alignment then we need
+     to adjust the start value here.  */
+  if (niters_skip != NULL_TREE)
+    init_expr = vect_peel_nonlinear_iv_init (&stmts, init_expr, niters_skip,
+					     step_expr, induction_type);
+
+  vec_init = vect_create_nonlinear_iv_init (&stmts, init_expr,
+					    step_expr, nunits, vectype,
+					    induction_type);
+  if (stmts)
+    {
+      new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
+      gcc_assert (!new_bb);
+    }
+
+  stmts = NULL;
+  new_name = vect_create_nonlinear_iv_step (&stmts, step_expr,
+					    vf, induction_type);
+  if (stmts)
+    {
+      new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
+      gcc_assert (!new_bb);
+    }
+
+  vec_step = vect_create_nonlinear_iv_vec_step (loop_vinfo, stmt_info,
+						new_name, vectype,
+						induction_type);
+  /* Create the following def-use cycle:
+     loop prolog:
+	  vec_init = ...
+	  vec_step = ...
+     loop:
+	  vec_iv = PHI <vec_init, vec_loop>
+	  ...
+	  STMT
+	  ...
+	  vec_loop = vec_iv OP vec_step;  */
+
+  /* Create the induction-phi that defines the induction-operand.  */
+  vec_dest = vect_get_new_vect_var (vectype, vect_simple_var, "vec_iv_");
+  induction_phi = create_phi_node (vec_dest, iv_loop->header);
+  induc_def = PHI_RESULT (induction_phi);
+
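+  /* For instance (illustrative values), for x *= 51 and VF = nunits = 4
+     the cycle reads:
+
+       loop prolog:
+	 vec_init = {x, x*51, x*51^2, x*51^3}
+	 vec_step = {51^4, 51^4, 51^4, 51^4}
+       loop:
+	 vec_iv = PHI <vec_init, vec_loop>
+	 ...
+	 vec_loop = vec_iv * vec_step;  */
+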
+  /* Create the IV update inside the loop.  */
+  stmts = NULL;
+  vec_def = vect_update_nonlinear_iv (&stmts, vectype,
+				      induc_def, vec_step,
+				      induction_type);
+
+  gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT);
+  new_stmt = SSA_NAME_DEF_STMT (vec_def);
+
+  /* Set the arguments of the phi node:  */
+  add_phi_arg (induction_phi, vec_init, pe, UNKNOWN_LOCATION);
+  add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop),
+	       UNKNOWN_LOCATION);
+
+  STMT_VINFO_VEC_STMTS (stmt_info).safe_push (induction_phi);
+  *vec_stmt = induction_phi;
+
+  /* In case the vectorization factor (VF) is bigger than the number
+     of elements that we can fit in a vectype (nunits), we have to
+     generate more than one vector stmt, i.e. we need to "unroll" the
+     vector stmt by a factor VF/nunits.  For more details see the
+     documentation in vectorizable_operation.  */
+
+  if (ncopies > 1)
+    {
+      stmts = NULL;
+      /* FORNOW.  This restriction should be relaxed.  */
+      gcc_assert (!nested_in_vect_loop);
+
+      /* The per-copy step is built from nunits rather than vf.  */
+      new_name = vect_create_nonlinear_iv_step (&stmts, step_expr,
+						nunits, induction_type);
+
+      vec_step = vect_create_nonlinear_iv_vec_step (loop_vinfo, stmt_info,
+						    new_name, vectype,
+						    induction_type);
+      vec_def = induc_def;
+      for (i = 1; i < ncopies; i++)
+	{
+	  /* vec_i = vec_prev OP vec_step.  */
+	  stmts = NULL;
+	  vec_def = vect_update_nonlinear_iv (&stmts, vectype,
+					      vec_def, vec_step,
+					      induction_type);
+	  gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT);
+	  new_stmt = SSA_NAME_DEF_STMT (vec_def);
+	  STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+	}
+    }
+
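+  /* Illustration (hypothetical): with VF = 8 and 4-element vectors,
+     ncopies = 2; the PHI latch update uses the VF-based step (e.g.
+     pow (51, 8) for x *= 51) while the second copy is derived from
+     the first with the nunits-based step pow (51, 4).  */
+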
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "transform induction: created def-use cycle: %G%G",
+		     induction_phi, SSA_NAME_DEF_STMT (vec_def));
+
+  return true;
+}
+
 /* Function vectorizable_induction
 
    Check if STMT_INFO performs an induction computation that can be
    vectorized.
@@ -8259,6 +8920,8 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   unsigned i;
   tree expr;
   gimple_stmt_iterator si;
+  enum vect_induction_op_type induction_type
+    = STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info);
 
   gphi *phi = dyn_cast <gphi *> (stmt_info->stmt);
   if (!phi)
@@ -8271,6 +8934,11 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def)
     return false;
 
+  /* Handle nonlinear induction in a separate place.  */
+  if (induction_type != vect_step_op_add)
+    return vectorizable_nonlinear_induction (loop_vinfo, stmt_info,
+					     vec_stmt, slp_node, cost_vec);
+
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index e5fdc9e0a14..1fa08126ad8 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -68,6 +68,15 @@ enum vect_def_type {
   vect_unknown_def_type
 };
 
+/* Define operation type of linear/non-linear induction variable.  */
+enum vect_induction_op_type {
+   vect_step_op_add = 0,
+   vect_step_op_neg,
+   vect_step_op_mul,
+   vect_step_op_shl,
+   vect_step_op_shr
+};
+
 /* Define type of reduction.  */
 enum vect_reduction_type {
   TREE_CODE_REDUCTION,
@@ -1188,6 +1197,7 @@ public:
      the version here.  */
   tree loop_phi_evolution_base_unchanged;
   tree loop_phi_evolution_part;
+  enum vect_induction_op_type loop_phi_evolution_type;
 
   /* Used for various bookkeeping purposes, generally holding a pointer to
      some other stmt S that is in some way "related" to this stmt.
@@ -1421,6 +1431,7 @@ struct gather_scatter_info {
   ((S)->dr_aux.dr && DR_GROUP_FIRST_ELEMENT(S))
 #define STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED(S) (S)->loop_phi_evolution_base_unchanged
 #define STMT_VINFO_LOOP_PHI_EVOLUTION_PART(S) (S)->loop_phi_evolution_part
+#define STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE(S) (S)->loop_phi_evolution_type
 #define STMT_VINFO_MIN_NEG_DIST(S) (S)->min_neg_dist
 #define STMT_VINFO_REDUC_TYPE(S) (S)->reduc_type
 #define STMT_VINFO_REDUC_CODE(S) (S)->reduc_code
@@ -2327,6 +2338,10 @@ extern int vect_get_known_peeling_cost (loop_vec_info, int, int *,
 					stmt_vector_for_cost *);
 extern tree cse_and_gimplify_to_preheader (loop_vec_info, tree);
 
+/* Nonlinear induction.  */
+extern tree vect_peel_nonlinear_iv_init (gimple_seq*, tree, tree,
+					 tree, enum vect_induction_op_type);
+
 /* In tree-vect-slp.cc.  */
 extern void vect_slp_init (void);
 extern void vect_slp_fini (void);