[RFC,vectorizer] Fix ICE with masked vectors

Message ID	c9f7a19a-d3c4-46e2-2cb0-dc6aadd47077@codesourcery.com
State	New
From	Andrew Stubbs <ams@codesourcery.com>
To	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
CC	Richard Sandiford <richard.sandiford@arm.com>
Date	Mon, 9 Dec 2019 15:21:52 +0000
Series	[RFC,vectorizer] Fix ICE with masked vectors

Commit Message

Andrew Stubbs Dec. 9, 2019, 3:21 p.m. UTC

Hi,

This patch fixes an ICE in testcase gcc.dg/vect/vect-ctor-1.c:

during GIMPLE pass: vect
dump file: vect-ctor-1.c.159t.vect
.../gcc.dg/vect/vect-ctor-1.c: In function 'intrapred_luma_16x16':
.../gcc.dg/vect/vect-ctor-1.c:9:6: internal compiler error: in exact_div, at poly-int.h:2162
0xdf845f poly_int<1u, poly_result<unsigned long, if_nonpoly<unsigned long, unsigned long, poly_int_traits<unsigned long>::is_poly>::type, poly_coeff_pair_traits<unsigned long, if_nonpoly<unsigned long, unsigned long, poly_int_traits<unsigned long>::is_poly>::type>::result_kind>::type> exact_div<1u, unsigned long, unsigned long>(poly_int_pod<1u, unsigned long> const&, unsigned long)
         /scratch/astubbs/amd/src/gcc-mainline/gcc/poly-int.h:2162
0xdf649a poly_int<1u, poly_result<unsigned long, unsigned long, poly_coeff_pair_traits<unsigned long, unsigned long>::result_kind>::type> exact_div<1u, unsigned long, unsigned long>(poly_int_pod<1u, unsigned long> const&, poly_int_pod<1u, unsigned long> const&)
         /scratch/astubbs/amd/src/gcc-mainline/gcc/poly-int.h:2175
0x1c473cd vect_get_num_vectors
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.h:1520
0x1c4bd35 vect_enhance_data_refs_alignment(_loop_vec_info*)
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-data-refs.c:1798
0x1596732 vect_analyze_loop_2
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-loop.c:2095
0x15980f3 vect_analyze_loop(loop*, vec_info_shared*)
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-loop.c:2536
0x15d7b36 try_vectorize_loop_1
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:892
0x15d831f try_vectorize_loop
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:1044
0x15d84f9 vectorize_loops()
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:1125
0x144f0af execute
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-ssa-loop.c:414
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


The problem is that exact_div is being asked to do "8 / 64", which it 
won't, because 8 is not an exact multiple of 64. The comment on the function says "NUNITS should be based on the 
vectorization factor, so it is always a known multiple of the number of 
elements in VECTYPE". This is on the amdgcn target where the 
vectorization factor is always 64, but smaller tasks can be vectorized 
using masking.
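
A minimal sketch of the condition that fails here, assuming plain unsigned integers in place of GCC's poly_int types. The name `is_exact_multiple` is hypothetical and is not the GCC function; it only models the precondition that the ICE above shows exact_div enforcing.

```cpp
#include <cstdint>

// Hypothetical model (not GCC code): true when A is a whole multiple of B,
// which is the precondition the failing exact_div call relies on.
constexpr bool
is_exact_multiple (std::uint64_t a, std::uint64_t b)
{
  return b != 0 && a % b == 0;
}

// 128 scalars fill exactly two 64-element vectors.
static_assert (is_exact_multiple (128, 64), "128 / 64 is exact");
// The case from the backtrace: 8 scalars against a 64-element vector.
static_assert (!is_exact_multiple (8, 64), "8 / 64 is not exact");
```

In this model the second case is simply reported as false; in the compiler the same situation ends in the internal compiler error shown in the backtrace.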

I think what's happening here is that the assumption described in the 
comment is invalid in the presence of masked vectors.

The attached patch fixes the ICE in the testcase, but I suspect does not 
go far enough. Can it happen that NUNITS can be greater than the 
vectorization factor, but not a multiple? Is this even a valid fix in 
the first place? Must it be conditionalized on masking being available? 
Is the exactness even worth checking, in the presence of exceptions?
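
To make the first question concrete, here is a rough model of the patched function on plain integers. `patched_num_vectors_model` and `NumVectorsResult` are hypothetical names, and the model assumes constant (non-poly) sizes, for which known_lt reduces to an ordinary `<`.

```cpp
#include <cstdint>

// Hypothetical result type: OK is false where the real code would still
// reach exact_div with an inexact division.
struct NumVectorsResult
{
  bool ok;
  std::uint64_t count;
};

// Model of the patched vect_get_num_vectors on constant sizes only.
constexpr NumVectorsResult
patched_num_vectors_model (std::uint64_t nunits, std::uint64_t subparts)
{
  return subparts == 0            ? NumVectorsResult{false, 0}
         : nunits < subparts      ? NumVectorsResult{true, 1}  // new early return
         : nunits % subparts == 0 ? NumVectorsResult{true, nunits / subparts}
                                  : NumVectorsResult{false, 0};
}

// The testcase: 8 units, 64-element vector, now answered with one vector.
static_assert (patched_num_vectors_model (8, 64).count == 1, "partial vector");
// Exact multiples behave as before.
static_assert (patched_num_vectors_model (128, 64).count == 2, "two vectors");
// Greater than the vector size but not a multiple: the guard does not help.
static_assert (!patched_num_vectors_model (96, 64).ok, "still inexact");
```

The 96-unit input is made up for illustration; whether the vectorizer can actually produce such a value is exactly the open question.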

Thanks

Andrew

Comments

Richard Sandiford Dec. 9, 2019, 3:59 p.m. UTC | #1

Andrew Stubbs <ams@codesourcery.com> writes:
> Hi,
>
> This patch fixes an ICE in testcase gcc.dg/vect/vect-ctor-1.c:
> [...]
>
> The problem is that exact_div is being asked to do "8 / 64", which it 
> won't. The comment on the function says "NUNITS should be based on the 
> vectorization factor, so it is always a known multiple of the number of 
> elements in VECTYPE". This is on the amdgcn target where the 
> vectorization factor is always 64, but smaller tasks can be vectorized 
> using masking.
>
> I think what's happening here is that the assumption described in the 
> comment is invalid in the presence of masked vectors.

No, the assumption's correct even there.  The assert usually triggers
because something elsewhere is getting confused about the vector types.

> The attached patch fixes the ICE in the testcase, but I suspect does not 
> go far enough. Can it happen that NUNITS can be greater than the 
> vectorization factor, but not a multiple? Is this even a valid fix in 
> the first place? Must it be conditionalized on masking being available? 
> Is the exactness even worth checking, in the presence of exceptions?

The vector types and VF aren't chosen based on whether masking is available.
It happens the other way around: we first analyse the loop and pick the VF
for an unmasked loop, but record as we go whether a masked implementation
is also possible.  Then we decide at the end whether to use a masked
implementation instead of an unmasked one.

So if this assert triggers for masked loops, it could trigger for unmasked
loops too.
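
The ordering described above can be sketched as follows. This is a heavily simplified illustration, not GCC code: `LoopAnalysis`, `analyse_loop`, `LoopKind` and `choose_kind` are all hypothetical names, and how the VF itself is computed is deliberately left out.

```cpp
#include <cstdint>

// Hypothetical record of what the analysis phase leaves behind.
struct LoopAnalysis
{
  std::uint64_t vf;       // VF picked as if the loop were unmasked
  bool masking_possible;  // noted along the way, never used to pick VF
};

enum class LoopKind { Unmasked, Masked };

// Phase 1: the VF is taken as given for an unmasked loop; masking
// feasibility is only recorded.
constexpr LoopAnalysis
analyse_loop (std::uint64_t vf_for_unmasked_loop, bool all_stmts_maskable)
{
  return LoopAnalysis{vf_for_unmasked_loop, all_stmts_maskable};
}

// Phase 2: only at the end is a masked implementation chosen or not.
constexpr LoopKind
choose_kind (LoopAnalysis a, bool prefer_masked)
{
  return a.masking_possible && prefer_masked ? LoopKind::Masked
                                             : LoopKind::Unmasked;
}

// The VF is the same whether or not masking turns out to be possible.
static_assert (analyse_loop (4, true).vf == analyse_loop (4, false).vf,
               "masking does not influence the VF");
static_assert (choose_kind (analyse_loop (4, false), true)
               == LoopKind::Unmasked, "no masking recorded, so unmasked");
```

Because the VF and vector types are fixed in the first phase, any mismatch between them exists before the masked/unmasked decision is made.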

FWIW there's an instance of this for SVE that I haven't got around
to debugging yet, but from a quick look at the dump, it was somehow
combining a vector of 8 longs with a vector of 4 floats.  I'm not sure
it's going to be the same issue as yours though.

Thanks,
Richard

Andrew Stubbs Dec. 10, 2019, 1:27 p.m. UTC | #2

On 09/12/2019 15:59, Richard Sandiford wrote:
> No, the assumption's correct even there.  The assert usually triggers
> because something elsewhere is getting confused about the vector types.
> 
>> The attached patch fixes the ICE in the testcase, but I suspect does not
>> go far enough. Can it happen that NUNITS can be greater than the
>> vectorization factor, but not a multiple? Is this even a valid fix in
>> the first place? Must it be conditionalized on masking being available?
>> Is the exactness even worth checking, in the presence of exceptions?
> 
> The vector types and VF aren't chosen based on whether masking is available.
> It happens the other way around: we first analyse the loop and pick the VF
> for an unmasked loop, but record as we go whether a masked implementation
> is also possible.  Then we decide at the end whether to use a masked
> implementation instead of an unmasked one.
> 
> So if this assert triggers for masked loops, it could trigger for unmasked
> loops too.

OK, I completely misunderstood what was happening here.

What happens is that the vectorizer goes through and finds vector types for 
every statement, and then says "Updating vectorization factor to 4.", but 
doesn't then go back and check that those types are still suitable.

So, then it gets to this:

      if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
        {
          poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
                                  ? vf * DR_GROUP_SIZE (stmt_info) : vf);
          possible_npeel_number
            = vect_get_num_vectors (nscalars, vectype);

          /* NPEEL_TMP is 0 when there is no misalignment, but also
             allow peeling NELEMENTS.  */
          if (DR_MISALIGNMENT (dr_info) == 0)
            possible_npeel_number++;
        }

where "vf" is now 4, the group size appears to be 2, and "vectype" is 
V64SI, and vect_get_num_vectors blows up trying to divide 8 by 64.
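
Spelled out as arithmetic, with plain integers standing in for poly_uint64. `nscalars_model` is a hypothetical helper that mirrors the conditional in the excerpt above, and the 64-lane count for V64SI is assumed from the mode name.

```cpp
#include <cstdint>

// Hypothetical mirror of the excerpt: an SLP statement covers
// vf * group_size scalars, any other statement covers vf.
constexpr std::uint64_t
nscalars_model (bool is_slp, std::uint64_t vf, std::uint64_t group_size)
{
  return is_slp ? vf * group_size : vf;
}

// Assumed lane count of V64SI (64 SImode elements).
constexpr std::uint64_t v64si_lanes = 64;

// Values reported for the failing case: vf = 4, group size = 2.
static_assert (nscalars_model (true, 4, 2) == 8, "4 * 2 scalars");
// 8 is not a multiple of 64, so the exact division cannot succeed.
static_assert (nscalars_model (true, 4, 2) % v64si_lanes != 0,
               "8 / 64 leaves a remainder");
```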

If I switch back to the default cost model then the "vect" pass 
completes successfully, although vectorization fails due to a missing 
vector operator. The following "slp2" pass then switches to TImode fake 
vectors and works fine.

Alternatively, if I add back my patch then the pass completes the same 
way (without vectorization).

Any suggestions? I can't see how this stuff is supposed to work.

Andrew

Patch

WIP Fix vect-ctor-1.c ICE


diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 51a13f1d207..bf1c3eeda85 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1513,6 +1513,10 @@  vect_use_loop_mask_for_alignment_p (loop_vec_info loop_vinfo)
 static inline unsigned int
 vect_get_num_vectors (poly_uint64 nunits, tree vectype)
 {
+  /* Masked vectors can cause partial vector use.  */
+  if (known_lt (nunits, TYPE_VECTOR_SUBPARTS (vectype)))
+    return 1;
+
   return exact_div (nunits, TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
 }
