[0/3] vect, aarch64: Add SVE support for simdclones

Message ID 20240130143132.9575-1-andre.simoesdiasvieira@arm.com

Andre Vieira (lists) Jan. 30, 2024, 2:31 p.m. UTC
Hi,

This series collects patches that I have sent for review before. It enables initial support for SVE simd clones, with some caveats.
Caveat 1: we do not support SVE simd clones with function bodies.
To enable support for this we need to change the way we 'simdify' a function body. For each argument that maps to a vector, an array of 'simdlen' elements is created. This, however, does not work for a VLA simdlen. We will need to come up with a way to support this such that the generated code is performant; there's little reason to 'simdify' a function by generating really slow code. I have some ideas on how we might be able to do this, though I'm not convinced it's even worth trying, but I think that's a bigger discussion. For now I've disabled generating SVE simdclones for functions with function bodies. This still fits our libmvec use case, as the simd clones are handwritten using intrinsics in glibc.
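
As a rough illustration of why a fixed simdlen matters here, the following is a hand-written C model of a 'simdified' body. It is not code from the patches or from GCC's output; `SIMDLEN`, `scale_add` and `scale_add_simd` are hypothetical names chosen for the example. The point is only that the per-argument arrays need a compile-time element count, which a VLA simdlen cannot provide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed simdlen; for a VLA clone no such constant exists. */
enum { SIMDLEN = 4 };

/* Scalar body as a user might write it (hypothetical example function). */
static float scale_add(float x, float y)
{
    return 2.0f * x + y;
}

/* Hand-written model of a 'simdified' body: each vector argument is
   copied into an array of SIMDLEN elements, then the scalar body runs
   once per lane. */
static void scale_add_simd(const float *x_vec, const float *y_vec,
                           float *result)
{
    float x[SIMDLEN]; /* array size must be known at compile time */
    float y[SIMDLEN];

    assert(x_vec != NULL && y_vec != NULL && result != NULL);

    for (size_t lane = 0; lane < SIMDLEN; lane++) {
        x[lane] = x_vec[lane];
        y[lane] = y_vec[lane];
    }
    for (size_t lane = 0; lane < SIMDLEN; lane++)
        result[lane] = scale_add(x[lane], y[lane]);
}
```

With a scalable vector length, `SIMDLEN` would only be known at run time, so the arrays above have no fixed size to declare.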

Caveat 2: we cannot generate ncopies calls to an SVE simd clone.
When I first sent the second patch of this series upstream, Richi asked me to look at supporting ncopies calls to VLA simdlen simd clones. I have vectorizer code to do this; however, I found that we don't yet have enough backend support for indexing VLA vectors. I think that's something that will need to wait until GCC 15, so for now I'd simply reject vectorization where that is required.
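
To make the ncopies case concrete, here is a small fixed-length C model, assuming a vectorization factor of 8 and a clone with simdlen 4. The names `VF`, `CLONE_SIMDLEN`, `clone4` and `call_ncopies` are hypothetical and only serve the illustration; this is not the vectorizer code mentioned above. Each copy of the call takes a slice of the wider vector at a known lane offset, and it is that slicing that has no constant offset once the vectors are VLA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: the loop is vectorized by 8, the clone handles 4. */
enum { VF = 8, CLONE_SIMDLEN = 4 };

/* Stand-in for a simd clone with simdlen 4: squares each lane. */
static void clone4(const float *in, float *out)
{
    for (size_t lane = 0; lane < CLONE_SIMDLEN; lane++)
        out[lane] = in[lane] * in[lane];
}

/* Cover VF lanes by calling the clone ncopies times, each time on a
   slice starting at a compile-time-known lane offset.  Returns the
   number of calls made. */
static size_t call_ncopies(const float *in, float *out)
{
    size_t ncopies = VF / CLONE_SIMDLEN;

    assert(VF % CLONE_SIMDLEN == 0);

    for (size_t copy = 0; copy < ncopies; copy++) {
        size_t offset = copy * CLONE_SIMDLEN; /* not constant for VLA */
        clone4(in + offset, out + offset);
    }
    return ncopies;
}
```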

Caveat 3: we don't yet support SVE simdclones for VLS codegen.
We've disabled the use of SVE simdclones when the -msve-vector-bits option is used to request VLS codegen. We need this because the mangling is determined by the 'simdlen' of a simd clone, which will not be VLA when -msve-vector-bits is passed. We would like to support using VLA simd clones when generating VLS code, but for that to work right now we'd need to set the simdlen of the simd clone to the VLS value, and that messes up the mangling. In the future we will need to add a target hook to specify the mangling.
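
The mangling problem can be sketched in C, assuming the AArch64 vector function ABI naming scheme `_ZGV<isa><mask><len><params>_<name>`, where the length field is `x` for a scalable clone and a number otherwise. The helpers `mangle_clone` and `same_symbol` are hypothetical, and using `simdlen == 0` to mean "scalable" is a convention of this sketch only, not how GCC represents it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a vector-function-ABI style name into buf.
   simdlen == 0 is this sketch's (assumed) encoding of a VLA simdlen. */
static int mangle_clone(char *buf, size_t size, char isa, char mask,
                        unsigned simdlen, const char *params,
                        const char *name)
{
    assert(buf != NULL && size > 0);

    if (simdlen == 0)
        return snprintf(buf, size, "_ZGV%c%cx%s_%s",
                        isa, mask, params, name);
    return snprintf(buf, size, "_ZGV%c%c%u%s_%s",
                    isa, mask, simdlen, params, name);
}

/* Two declarations refer to the same clone only if the names match. */
static int same_symbol(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

For a masked SVE clone of `sinf` taking one vector argument, a scalable simdlen gives `_ZGVsMxv_sinf`, while a simdlen fixed to 4 (float lanes in 128 bits) gives `_ZGVsM4v_sinf`, so the two no longer name the same symbol.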

Given that the target-agnostic changes are minimal, have been suggested before and have no impact on other targets, and that the target-specific parts have been reviewed before, would this still be acceptable for Stage 4? I would really like to make use of the work that was done to support this and the SVE simdclones added to glibc.

Kind regards,
Andre

Andre Vieira (3):
vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
vect: disable multiple calls of poly simdclones
aarch64: Add SVE support for simd clones [PR 96342]