Message ID | 91d3f8ee-8b2c-4866-a3ed-beb2953b5438@arm.com |
---|---|
State | New |
Headers | show |
Series | [RFC] vect: disable multiple calls of poly simdclones | expand |
On Fri, 3 Nov 2023, Andre Vieira (lists) wrote: > Hi, > > The current codegen code to support VF's that are multiples of a simdclone > simdlen rely on BIT_FIELD_REF to create multiple input vectors. This does not > work for non-constant simdclones, so we should disable using such clones when > the VF is a multiple of the non-constant simdlen until we change the codegen > to support those. > > Enabling SVE simdclone support will cause ICEs if the vectorizer decides to > use a SVE simdclone with a VF that is larger than the simdlen. I'll be away > for the next two weeks, so cant' really discuss this further. > I initially tried to solve the problem, but the way > vectorizable_simd_clone_call is structured doesn't make it easy to replace > BIT_FIELD_REF with the poly-suitable solution right now of using > unpack_{hi,lo}. I think it should be straight-forward to use unpack_{even,odd} (it's even/odd for VLA, right? If lo/hi would be possible then doing BIT_FIELD_REF would be, too? Also you need to have multiple stages of unpack/pack when the factor is more than 2). There's plenty of time even during stage3 to address this. At least your patch should have come with a testcase (or two). Is there a bugreport tracking this issue? It should affect GCN as well I guess. Richard. > Unfortunately I only found this now as I was adding further > tests for SVE :( > > gcc/ChangeLog: > > * tree-vect-stmts.cc (vectorizable_simd_clone_call): Reject simdclones > with non-constant simdlen when VF is not exactly the same.
On 06/11/2023 07:52, Richard Biener wrote: > On Fri, 3 Nov 2023, Andre Vieira (lists) wrote: > >> Hi, >> >> The current codegen code to support VF's that are multiples of a simdclone >> simdlen rely on BIT_FIELD_REF to create multiple input vectors. This does not >> work for non-constant simdclones, so we should disable using such clones when >> the VF is a multiple of the non-constant simdlen until we change the codegen >> to support those. >> >> Enabling SVE simdclone support will cause ICEs if the vectorizer decides to >> use a SVE simdclone with a VF that is larger than the simdlen. I'll be away >> for the next two weeks, so cant' really discuss this further. >> I initially tried to solve the problem, but the way >> vectorizable_simd_clone_call is structured doesn't make it easy to replace >> BIT_FIELD_REF with the poly-suitable solution right now of using >> unpack_{hi,lo}. > > I think it should be straight-forward to use unpack_{even,odd} (it's > even/odd for VLA, right? If lo/hi would be possible then doing > BIT_FIELD_REF would be, too? Also you need to have multiple stages > of unpack/pack when the factor is more than 2). > > There's plenty of time even during stage3 to address this. > > At least your patch should have come with a testcase (or two). > > Is there a bugreport tracking this issue? It should affect GCN as well > I guess. What does "non-constant simdclones" mean? I'm not sure if this is a thing that can happen on GCN, or not? Andrew
On Mon, 6 Nov 2023, Andrew Stubbs wrote: > > > On 06/11/2023 07:52, Richard Biener wrote: > > On Fri, 3 Nov 2023, Andre Vieira (lists) wrote: > > > >> Hi, > >> > >> The current codegen code to support VF's that are multiples of a simdclone > >> simdlen rely on BIT_FIELD_REF to create multiple input vectors. This does > >> not > >> work for non-constant simdclones, so we should disable using such clones > >> when > >> the VF is a multiple of the non-constant simdlen until we change the > >> codegen > >> to support those. > >> > >> Enabling SVE simdclone support will cause ICEs if the vectorizer decides to > >> use a SVE simdclone with a VF that is larger than the simdlen. I'll be away > >> for the next two weeks, so cant' really discuss this further. > >> I initially tried to solve the problem, but the way > >> vectorizable_simd_clone_call is structured doesn't make it easy to replace > >> BIT_FIELD_REF with the poly-suitable solution right now of using > >> unpack_{hi,lo}. > > > > I think it should be straight-forward to use unpack_{even,odd} (it's > > even/odd for VLA, right? If lo/hi would be possible then doing > > BIT_FIELD_REF would be, too? Also you need to have multiple stages > > of unpack/pack when the factor is more than 2). > > > > There's plenty of time even during stage3 to address this. > > > > At least your patch should have come with a testcase (or two). > > > > Is there a bugreport tracking this issue? It should affect GCN as well > > I guess. > > What does "non-constant simdclones" mean? I'm not sure if this is a thing that > can happen on GCN, or not? simdclone with a variable (POLY_INT) vector size. Richard.
On 06/11/2023 07:52, Richard Biener wrote: > On Fri, 3 Nov 2023, Andre Vieira (lists) wrote: > >> Hi, >> >> The current codegen code to support VF's that are multiples of a simdclone >> simdlen rely on BIT_FIELD_REF to create multiple input vectors. This does not >> work for non-constant simdclones, so we should disable using such clones when >> the VF is a multiple of the non-constant simdlen until we change the codegen >> to support those. >> >> Enabling SVE simdclone support will cause ICEs if the vectorizer decides to >> use a SVE simdclone with a VF that is larger than the simdlen. I'll be away >> for the next two weeks, so cant' really discuss this further. >> I initially tried to solve the problem, but the way >> vectorizable_simd_clone_call is structured doesn't make it easy to replace >> BIT_FIELD_REF with the poly-suitable solution right now of using >> unpack_{hi,lo}. > > I think it should be straight-forward to use unpack_{even,odd} (it's > even/odd for VLA, right? If lo/hi would be possible then doing > BIT_FIELD_REF would be, too? Also you need to have multiple stages > of unpack/pack when the factor is more than 2). > > There's plenty of time even during stage3 to address this. > > At least your patch should have come with a testcase (or two). Yeah I didn't add one as it didn't trigger on AArch64 without my two outstanding aarch64 simdclone patches. > > Is there a bugreport tracking this issue? It should affect GCN as well > I guess. No, since I can't trigger them yet on trunk until the reviews on my target specific patches are done and they are committed. I don't have a GCN backend lying around but I suspect GCN doesn't use poly simdlen simdclones yet either... I haven't checked. The issue triggers for aarch64 when trying to generate SVE simdclones for functions with mixed types. I'll give the unpack thing a go locally.
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 5f262cae2aae784e3ef4fd07455b7aa742797b51..dc3e0716161838aef66cf37342499006673336d6 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -4165,7 +4165,10 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, if (!constant_multiple_p (vf * group_size, n->simdclone->simdlen, &num_calls) || (!n->simdclone->inbranch && (masked_call_offset > 0)) - || (nargs != simd_nargs)) + || (nargs != simd_nargs) + /* Currently we do not support multiple calls of non-constant + simdlen as poly vectors can not be accessed by BIT_FIELD_REF. */ + || (!n->simdclone->simdlen.is_constant () && num_calls != 1)) continue; if (num_calls != 1) this_badness += exact_log2 (num_calls) * 4096;