Message ID | c9f7a19a-d3c4-46e2-2cb0-dc6aadd47077@codesourcery.com |
---|---|
State | New |
Headers | show |
Series | [RFC,vectorizer] Fix ICE with masked vectors | expand |
Andrew Stubbs <ams@codesourcery.com> writes: > Hi, > > This patch fixes an ICE in testcase gcc.dg/vect/vect-ctor-1.c: > > during GIMPLE pass: vect > dump file: vect-ctor-1.c.159t.vect > .../gcc.dg/vect/vect-ctor-1.c: In function 'intrapred_luma_16x16': > .../gcc.dg/vect/vect-ctor-1.c:9:6: internal compiler error: in > exact_div, at poly-int.h:2162 > 0xdf845f poly_int<1u, poly_result<unsigned long, if_nonpoly<unsigned > long, unsigned long, poly_int_traits<unsigned long>::is_poly>::type, > poly_coeff_pair_traits<unsigned long, if_nonpoly<unsigned long, unsigned > long, poly_int_traits<unsigned > long>::is_poly>::type>::result_kind>::type> exact_div<1u, unsigned long, > unsigned long>(poly_int_pod<1u, unsigned long> const&, unsigned long) > /scratch/astubbs/amd/src/gcc-mainline/gcc/poly-int.h:2162 > 0xdf649a poly_int<1u, poly_result<unsigned long, unsigned long, > poly_coeff_pair_traits<unsigned long, unsigned > long>::result_kind>::type> exact_div<1u, unsigned long, unsigned > long>(poly_int_pod<1u, unsigned long> const&, poly_int_pod<1u, unsigned > long> const&) > /scratch/astubbs/amd/src/gcc-mainline/gcc/poly-int.h:2175 > 0x1c473cd vect_get_num_vectors > /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.h:1520 > 0x1c4bd35 vect_enhance_data_refs_alignment(_loop_vec_info*) > > /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-data-refs.c:1798 > 0x1596732 vect_analyze_loop_2 > /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-loop.c:2095 > 0x15980f3 vect_analyze_loop(loop*, vec_info_shared*) > /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-loop.c:2536 > 0x15d7b36 try_vectorize_loop_1 > /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:892 > 0x15d831f try_vectorize_loop > /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:1044 > 0x15d84f9 vectorize_loops() > /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:1125 > 0x144f0af execute > /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-ssa-loop.c:414 > Please submit a full bug report, > with preprocessed source if appropriate. > Please include the complete backtrace with any bug report. > See <https://gcc.gnu.org/bugs/> for instructions. > > > The problem is that exact_div is being asked to do "8 / 64", which it > won't. The comment on the function says "NUNITS should be based on the > vectorization factor, so it is always a known multiple of the number of > elements in VECTYPE". This is on the amdgcn target where the > vectorization factor is always 64, but smaller tasks can be vectorized > using masking. > > I think what's happening here is that the assumption described in the > comment is invalid in the presence of masked vectors. No, the assumption's correct even there. The assert usually triggers because something elsewhere is getting confused about the vector types. > The attached patch fixes the ICE in the testcase, but I suspect does not > go far enough. Can it happen that NUNITS can be greater than the > vectorization factor, but not a multiple? Is this even a valid fix in > the first place? Must it be conditionalized on masking being available? > Is the exactness even worth checking, in the presence of exceptions? The vector types and VF aren't chosen based on whether masking is available. It happens the other way around: we first analyse the loop and pick the VF for an unmasked loop, but record as we go whether a masked implementation is also possible. Then we decide at the end whether to use a masked implementation instead of an unmasked one. So if this assert triggers for masked loops, it could trigger for unmasked loops too. FWIW there's an instance of this for SVE that I haven't got around to debugging yet, but from a quick look at the dump, it was somehow combining a vector of 8 longs with a vector of 4 floats. I'm not sure it's going to be the same issue as yours though. Thanks, Richard > > Thanks > > Andrew > > WIP Fix vect-ctor-1.c ICE > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > index 51a13f1d207..bf1c3eeda85 100644 > --- a/gcc/tree-vectorizer.h > +++ b/gcc/tree-vectorizer.h > @@ -1513,6 +1513,10 @@ vect_use_loop_mask_for_alignment_p (loop_vec_info loop_vinfo) > static inline unsigned int > vect_get_num_vectors (poly_uint64 nunits, tree vectype) > { > + /* Masked vectors can cause partial vector use. */ > + if (known_lt (nunits, TYPE_VECTOR_SUBPARTS (vectype))) > + return 1; > + > return exact_div (nunits, TYPE_VECTOR_SUBPARTS (vectype)).to_constant (); > } >
On 09/12/2019 15:59, Richard Sandiford wrote: > No, the assumption's correct even there. The assert usually triggers > because something elsewhere is getting confused about the vector types. > >> The attached patch fixes the ICE in the testcase, but I suspect does not >> go far enough. Can it happen that NUNITS can be greater than the >> vectorization factor, but not a multiple? Is this even a valid fix in >> the first place? Must it be conditionalized on masking being available? >> Is the exactness even worth checking, in the presence of exceptions? > > The vector types and VF aren't chosen based on whether masking is available. > It happens the other way around: we first analyse the loop and pick the VF > for an unmasked loop, but record as we go whether a masked implementation > is also possible. Then we decide at the end whether to use a masked > implementation instead of an unmasked one. > > So if this assert triggers for masked loops, it could trigger for unmasked > loops too. OK, I completely misunderstood what was happening here. What happens is that it goes through and finds vector types for every statement, and then says "Updating vectorization factor to 4.", but doesn't then go back and check for suitable types. So, then it gets to this: if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) { poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info) ? vf * DR_GROUP_SIZE (stmt_info) : vf); possible_npeel_number = vect_get_num_vectors (nscalars, vectype); /* NPEEL_TMP is 0 when there is no misalignment, but also allow peeling NELEMENTS. */ if (DR_MISALIGNMENT (dr_info) == 0) possible_npeel_number++; } where "vf" is now 4, the group size appears to be 2, and "vectype" is V64SI, and vect_get_num_vectors blows up trying to divide 8 by 64. If I switch back to the default cost model then the "vect" pass completes successfully, although vectorization fails due to a missing vector operator. The following "slp2" pass then switches to TImode fake vectors and works fine. Alternatively, if I add back my patch then the pass completes the same way (without vectorization). Any suggestions? I can't see how this stuff is supposed to work? Andrew
WIP Fix vect-ctor-1.c ICE diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 51a13f1d207..bf1c3eeda85 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -1513,6 +1513,10 @@ vect_use_loop_mask_for_alignment_p (loop_vec_info loop_vinfo) static inline unsigned int vect_get_num_vectors (poly_uint64 nunits, tree vectype) { + /* Masked vectors can cause partial vector use. */ + if (known_lt (nunits, TYPE_VECTOR_SUBPARTS (vectype))) + return 1; + return exact_div (nunits, TYPE_VECTOR_SUBPARTS (vectype)).to_constant (); }