Message ID: 20130227172947.31fa279c@octopus
State: New
On 02/27/2013 09:29 AM, Julian Brown wrote:
> Index: gcc/testsuite/gcc.dg/vect/slp-cond-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-cond-3.c	(revision 196170)
> +++ gcc/testsuite/gcc.dg/vect/slp-cond-3.c	(working copy)
> @@ -79,6 +79,6 @@ int main ()
>    return 0;
> }
>
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_unpack } } } } */
> /* { dg-final { cleanup-tree-dump "vect" } } */

If this and other modified checks only fail for ARM big-endian then
they should check for that so they don't XPASS for other targets.
It's also possible now to do things like { target vect_blah xfail
arm_big_endian }, which might be useful for some tests.

Janis
On Wed, 27 Feb 2013 11:04:04 -0800
Janis Johnson <janis_johnson@mentor.com> wrote:

> On 02/27/2013 09:29 AM, Julian Brown wrote:
> > Index: gcc/testsuite/gcc.dg/vect/slp-cond-3.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/vect/slp-cond-3.c	(revision 196170)
> > +++ gcc/testsuite/gcc.dg/vect/slp-cond-3.c	(working copy)
> > @@ -79,6 +79,6 @@ int main ()
> >    return 0;
> > }
> >
> > -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_unpack } } } } */
> > /* { dg-final { cleanup-tree-dump "vect" } } */
>
> If this and other modified checks only fail for ARM big-endian then
> they should check for that so they don't XPASS for other targets.
> It's also possible now to do things like { target vect_blah xfail
> arm_big_endian }, which might be useful for some tests.

I don't think I understand -- my expectation was e.g. that that test
would fail for any target which doesn't support vect_unpack. Surely
you'd only get an XPASS if the test passed when vect_unpack was not
true?

I'm not sure why checking for a particular architecture-specific
predicate would be preferable to checking that a general feature is
supported. As time progresses, it might well be that e.g. vect_unpack
becomes supported for big-endian ARM, at which point we shouldn't need
to edit all the individual tests again...

Thanks,

Julian
On 02/28/2013 02:06 AM, Julian Brown wrote:
> On Wed, 27 Feb 2013 11:04:04 -0800
> Janis Johnson <janis_johnson@mentor.com> wrote:
>
>> On 02/27/2013 09:29 AM, Julian Brown wrote:
>>> Index: gcc/testsuite/gcc.dg/vect/slp-cond-3.c
>>> ===================================================================
>>> --- gcc/testsuite/gcc.dg/vect/slp-cond-3.c	(revision 196170)
>>> +++ gcc/testsuite/gcc.dg/vect/slp-cond-3.c	(working copy)
>>> @@ -79,6 +79,6 @@ int main ()
>>>    return 0;
>>> }
>>>
>>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
>>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_unpack } } } } */
>>> /* { dg-final { cleanup-tree-dump "vect" } } */
>>
>> If this and other modified checks only fail for ARM big-endian then
>> they should check for that so they don't XPASS for other targets.
>> It's also possible now to do things like { target vect_blah xfail
>> arm_big_endian }, which might be useful for some tests.
>
> I don't think I understand -- my expectation was e.g. that that test
> would fail for any target which doesn't support vect_unpack. Surely
> you'd only get an XPASS if the test passed when vect_unpack was not
> true?

Right.  Please ignore my mail, I was confused.

> I'm not sure why checking for a particular architecture-specific
> predicate would be preferable to checking that a general feature is
> supported. As time progresses, it might well be that e.g. vect_unpack
> becomes supported for big-endian ARM, at which point we shouldn't need
> to edit all the individual tests again...

Right.  Once again, I was confused, ignore me.

Janis
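[For readers unfamiliar with DejaGnu selectors, the two forms discussed above might appear in a reduced test like the sketch below. This is illustrative only: `vect_blah`/`arm_big_endian` in Janis's mail are placeholder effective-target names, and the loop and scan patterns here are made up rather than taken from a real testsuite file.]

```c
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-tree-vect-details" } */

int a[64], b[64];

void
foo (void)
{
  for (int i = 0; i < 64; i++)
    a[i] = b[i] + 1;
}

/* Form 1: run everywhere, but expect the scan to fail on any target
   lacking the vect_unpack capability (this is the form the patch uses).  */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_unpack } } } } */

/* Form 2: restrict the scan to targets with some capability, and XFAIL
   it on one known-bad target (selector names are hypothetical).  */
/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target vect_int xfail arm_big_endian } } } */
```

The practical difference Janis alludes to: form 1 XPASSes nowhere as long as the scan really does track the capability, while form 2 pins the expectation to a specific target and must be revisited if that target later gains support.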
On Wed, Feb 27, 2013 at 6:29 PM, Julian Brown <julian@codesourcery.com> wrote:
> Hi,
>
> Several new (ish?) autovectorizer features have apparently caused NEON
> support for same to regress quite heavily in big-endian mode. This
> patch is an attempt to fix things up, but is not without problems --
> maybe someone will have a suggestion as to how we should proceed.
>
> The problem (as ever) is that the ARM backend must lie to the
> middle-end about the layout of NEON vectors in big-endian mode (due to
> ABI requirements, VFP compatibility, and the middle-end semantics of
> vector indices being equivalent to those of an array with the same type
> of elements when stored in memory).

Why not simply give up?  Thus, make autovectorization unsupported for
ARM big-endian targets?

Do I understand correctly that the "only" issue is memory vs. register
element ordering?  Thus a fixup could be as simple as extra shuffles
inserted after vector memory loads and before vector memory stores
(with the hope of RTL optimizers optimizing those)?

Any "lies" are of course bad and you'll pay for them later.

Richard.

> A few years ago when the vectorizer was relatively less sophisticated,
> the ordering of vector elements could be ignored to some extent by
> disabling certain instruction patterns used by the vectorizer in
> big-endian mode which were sensitive to the ordering of elements: in
> fact this is still the strategy we're using, but it is clearly becoming
> less and less tenable as time progresses. Quad-word registers (being
> composed of two double-word registers, loaded/stored the "wrong way
> round" in big-endian mode) arguably cause more problems than
> double-word registers.
>
> So, the idea behind the attached patch was supposed to be to limit the
> autovectorizer to using double-word registers only, and to disable a
> few additional (or newly-used by the vectorizer) patterns in big-endian
> mode. That, plus several testsuite tweaks, gets us down to zero
> failures for vect.exp, which is good.
>
> The problem is that at the same time quite a large set of neon.exp
> tests regress (vzip/vuzp/vtrn): one of the new patterns which is
> disabled because it causes trouble (i.e. execution failures) for the
> vectorizer is vec_perm_const<mode>. However __builtin_shuffle (which
> uses that pattern) is used for arm_neon.h now -- so disabling it means
> that the proper instructions aren't generated for intrinsics any more
> in big-endian mode.
>
> I think we have a problem here. The vectorizer also tries to use
> __builtin_shuffle (for scatter/gather operations, when lane
> loading/storing ops aren't available), but does not understand the
> "special tweaks" that arm_evpc_neon_{vuzp,vzip,vtrn} does to try to
> hide the true element ordering of vectors from the middle-end. So, I'm
> left wondering:
>
> * Given our funky element ordering in BE mode, are the
>   __builtin_shuffle lists in arm_neon.h actually an accurate
>   representation of what the given intrinsic should do? (The fallback
>   code might or might not do the same thing, I'm not sure.)
>
> * The vectorizer tries to use VEC_PERM_EXPR (equivalent to
>   __builtin_shuffle) with e.g. pairs of doubleword registers loaded
>   from adjacent memory locations. Are the semantics required for this
>   (again, with our funky element ordering) even the same as those
>   required for the intrinsics? Including quad-word registers for the
>   latter? (My suspicion is "no", in which case there's a fundamental
>   incompatibility here that needs to be resolved somehow.)
>
> Anyway: the tl;dr is "fixing NEON vect tests breaks intrinsics". Any
> ideas for what to do about that? (FAOD, I don't think I'm in a
> position to do the kind of middle-end surgery required to fix the
> problem "properly" at this point :-p).
>
> (It's arguably more important for the vectorizer to not generate bad
> code than it is for intrinsics to work properly, in which case: OK to
> apply? Tested cross to ARM EABI with configury modifications to build
> LE/BE multilibs.)
>
> Thanks,
>
> Julian
>
> ChangeLog
>
> gcc/
>     * config/arm/arm.c (arm_array_mode_supported_p): No array modes for
>     big-endian NEON.
>     (arm_preferred_simd_mode): Always prefer 64-bit modes for
>     big-endian NEON.
>     (arm_autovectorize_vector_sizes): Use 8-byte vectors only for NEON.
>     (arm_vectorize_vec_perm_const_ok): No permutations are OK in
>     big-endian mode.
>     * config/arm/neon.md (vec_load_lanes<mode><mode>): Disable in
>     big-endian mode.
>     (vec_store_lanes<mode><mode>, vec_load_lanesti<mode>)
>     (vec_load_lanesoi<mode>, vec_store_lanesti<mode>)
>     (vec_store_lanesoi<mode>, vec_load_lanesei<mode>)
>     (vec_load_lanesci<mode>, vec_store_lanesei<mode>)
>     (vec_store_lanesci<mode>, vec_load_lanesxi<mode>)
>     (vec_store_lanesxi<mode>): Likewise.
>     (vec_widen_<US>shiftl_lo_<mode>, vec_widen_<US>shiftl_hi_<mode>)
>     (vec_widen_<US>mult_hi_<mode>, vec_widen_<US>mult_lo_<mode>):
>     Likewise.
>
> gcc/testsuite/
>     * gcc.dg/vect/slp-cond-3.c: XFAIL for !vect_unpack.
>     * gcc.dg/vect/slp-cond-4.c: Likewise.
>     * gcc.dg/vect/vect-1.c: Likewise.
>     * gcc.dg/vect/vect-1-big-array.c: Likewise.
>     * gcc.dg/vect/vect-35.c: Likewise.
>     * gcc.dg/vect/vect-35-big-array.c: Likewise.
>     * gcc.dg/vect/bb-slp-11.c: Likewise.
>     * gcc.dg/vect/bb-slp-26.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-3-big-array.c: XFAIL for
>     !vect_element_align.
>     * gcc.dg/vect/vect-over-widen-1.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-2.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-3.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-4.c: Likewise.
>     * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
>     * gcc.dg/vect/pr43430-2.c: Likewise.
>     * gcc.dg/vect/vect-widen-shift-u16.c: XFAIL for !vect_widen_shift
>     && !vect_unpack.
>     * gcc.dg/vect/vect-widen-shift-s8.c: Likewise.
>     * gcc.dg/vect/vect-widen-shift-u8.c: Likewise.
>     * gcc.dg/vect/vect-widen-shift-s16.c: Likewise.
>     * gcc.dg/vect/vect-93.c: Only run if !vect_intfloat_cvt.
>     * gcc.dg/vect/vect-intfloat-conversion-4a.c: Only run if
>     vect_unpack.
>     * gcc.dg/vect/vect-intfloat-conversion-4b.c: Likewise.
>     * lib/target-supports.exp (check_effective_target_vect_perm): Only
>     enable for NEON little-endian.
>     (check_effective_target_vect_widen_sum_qi_to_hi): Likewise.
>     (check_effective_target_vect_widen_mult_qi_to_hi): Likewise.
>     (check_effective_target_vect_widen_mult_hi_to_si): Likewise.
>     (check_effective_target_vect_widen_shift): Likewise.
>     (check_effective_target_vect_extract_even_odd): Likewise.
>     (check_effective_target_vect_interleave): Likewise.
>     (check_effective_target_vect_stridedN): Likewise.
>     (check_effective_target_vect_multiple_sizes): Likewise.
>     (check_effective_target_vect64): Enable for any NEON.
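[As background for the "middle-end semantics" Julian mentions: GCC's generic vectors define element [i] of a vector to be the same as element [i] of an equivalent array whenever the vector is stored in memory, regardless of endianness. This is exactly the invariant that big-endian NEON vldm/vstm ordering violates. A minimal, portable sketch of the invariant, using the GCC `vector_size` extension; the function name is ours, purely for illustration:]

```c
/* GCC generic vector type: 8 bytes holding 8 signed char elements.  */
typedef signed char v8qi __attribute__ ((vector_size (8)));

/* The middle end assumes vector element [i] lines up with array
   element [i] of the same element type when the vector sits in
   memory, on both little- and big-endian hosts.  Returns 1 if the
   invariant holds, 0 otherwise.  */
int
element_order_matches_memory (void)
{
  union { signed char arr[8]; v8qi vec; } u;
  for (int i = 0; i < 8; i++)
    u.arr[i] = (signed char) i;        /* write via the array view...  */
  for (int i = 0; i < 8; i++)
    if (u.vec[i] != i)                 /* ...read via the vector view  */
      return 0;
  return 1;
}
```

On any target whose backend tells the middle end the truth, this returns 1; the "lie" in the big-endian NEON port exists to preserve this appearance while the registers hold elements in a different order.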
On Fri, 1 Mar 2013 11:07:17 +0100
Richard Biener <richard.guenther@gmail.com> wrote:

> On Wed, Feb 27, 2013 at 6:29 PM, Julian Brown
> <julian@codesourcery.com> wrote:
> > Hi,
> >
> > Several new (ish?) autovectorizer features have apparently caused
> > NEON support for same to regress quite heavily in big-endian mode.
> > This patch is an attempt to fix things up, but is not without
> > problems -- maybe someone will have a suggestion as to how we
> > should proceed.
> >
> > The problem (as ever) is that the ARM backend must lie to the
> > middle-end about the layout of NEON vectors in big-endian mode (due
> > to ABI requirements, VFP compatibility, and the middle-end
> > semantics of vector indices being equivalent to those of an array
> > with the same type of elements when stored in memory).
>
> Why not simply give up?  Thus, make autovectorization unsupported for
> ARM big-endian targets?

That's certainly a tempting option...

> Do I understand correctly that the "only" issue is memory vs. register
> element ordering?  Thus a fixup could be as simple as extra shuffles
> inserted after vector memory loads and before vector memory stores
> (with the hope of RTL optimizers optimizing those)?

It's not even necessary to use explicit shuffles -- NEON has perfectly
good instructions for loading/storing vectors in the "right" order, in
the form of vld1 & vst1. I'm afraid the solution to this problem might
have been staring us in the face for years, which is simply to forbid
vldr/vstr/vldm/vstm (the instructions which lead to weird element
permutations in BE mode) for loading/storing NEON vectors altogether.
That way the vectorizer gets what it wants, the intrinsics can continue
to use __builtin_shuffle exactly as they are doing, and we get to
remove all the bits which fiddle vector element numbering in BE mode in
the ARM backend.

I can't exactly remember why we didn't do that to start with. I think
the problem was ABI-related, or to do with transferring NEON vectors
to/from ARM registers when it was necessary to do that... I'm planning
to do some archaeology to try to see if I can figure out a definitive
answer.

(Previous discussions include, e.g.:

http://gcc.gnu.org/ml/gcc-patches/2009-11/msg00876.html
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html
http://lists.linaro.org/pipermail/linaro-toolchain/2010-November/000437.html

it looks like ABI boundaries require vldr/vstr/vldm/vstm ordering:
maybe those can be treated as "opaque" transfers and continue to use
the same instructions & ordering, but vld1/vst1 can be used everywhere
else?)

> Any "lies" are of course bad and you'll pay for them later.

Indeed :-).

Cheers,

Julian
> > Do I understand correctly that the "only" issue is memory vs.
> > register element ordering?  Thus a fixup could be as simple as extra
> > shuffles inserted after vector memory loads and before vector memory
> > stores (with the hope of RTL optimizers optimizing those)?
>
> It's not even necessary to use explicit shuffles -- NEON has perfectly
> good instructions for loading/storing vectors in the "right" order, in
> the form of vld1 & vst1. I'm afraid the solution to this problem might
> have been staring us in the face for years, which is simply to forbid
> vldr/vstr/vldm/vstm (the instructions which lead to weird element
> permutations in BE mode) for loading/storing NEON vectors altogether.
> That way the vectorizer gets what it wants, the intrinsics can
> continue to use __builtin_shuffle exactly as they are doing, and we
> get to remove all the bits which fiddle vector element numbering in BE
> mode in the ARM backend.
>
> I can't exactly remember why we didn't do that to start with. I think
> the problem was ABI-related, or to do with transferring NEON vectors
> to/from ARM registers when it was necessary to do that... I'm planning
> to do some archaeology to try to see if I can figure out a definitive
> answer.

The ABI defined vector types (uint32x4_t etc) are defined to be in
vldm/vstm order.

Paul
On Fri, 1 Mar 2013 14:35:05 +0000
Paul Brook <paul@codesourcery.com> wrote:

> > It's not even necessary to use explicit shuffles -- NEON has
> > perfectly good instructions for loading/storing vectors in the
> > "right" order, in the form of vld1 & vst1. I'm afraid the solution
> > to this problem might have been staring us in the face for years,
> > which is simply to forbid vldr/vstr/vldm/vstm (the instructions
> > which lead to weird element permutations in BE mode) for
> > loading/storing NEON vectors altogether. That way the vectorizer
> > gets what it wants, the intrinsics can continue to use
> > __builtin_shuffle exactly as they are doing, and we get to remove
> > all the bits which fiddle vector element numbering in BE mode in
> > the ARM backend.
> >
> > I can't exactly remember why we didn't do that to start with. I
> > think the problem was ABI-related, or to do with transferring NEON
> > vectors to/from ARM registers when it was necessary to do that...
> > I'm planning to do some archaeology to try to see if I can figure
> > out a definitive answer.
>
> The ABI defined vector types (uint32x4_t etc) are defined to be in
> vldm/vstm order.

There's no conflict with the ABI-defined vector order -- the ABI
(looking at AAPCS, IHI 0042D) describes "containerized" vectors which
should be used to pass and return vector quantities at ABI boundaries,
but I couldn't find any further restrictions. Internally to a function,
we are still free to use vld1/vst1 vector ordering. Using
"containerized"/opaque transfers, the bit pattern of a vector in one
function (using vld1/vst1 ordering internally) will of course remain
unchanged if passed to another function and using the same ordering
there also.

Actually making that work (especially efficiently) with GCC is a
slightly different matter. Let's call vldm/vstm-ordered vectors
"containerized" format, and vld1/vst1-ordered vectors "array" format.
We need to introduce the concept of marshalling vector arguments from
array format to containerized format when passing them to a function,
and unmarshalling those vector arguments back the other way on function
entry. AFAICT, GCC does not have suitable infrastructure for
implementing such functionality at present: consider that e.g. vectors
passed by value on the stack should use containerized format, which
means the called function cannot simply dereference the stack pointer
to read the vector:

void foo (int dummy1, int dummy2, int dummy3, int dummy4, v4si myvec)
{
  v4si *myvec_ptr = &myvec;
  ...
}

Here the hypothetical "unmarshal" operation would need to do something
like:

  add r0, sp, #myvec_offset
  vldm r0, {q0}
  add r0, sp, #myvec_temp_offset
  vst1.32 {q0}, [r0]
  /* myvec_ptr points to myvec_temp_offset.  */

In many cases the marshall/unmarshall operations don't have to do
anything except use vldr/vstr/vldm/vstm or the core-register transfer
equivalents instead of vld1/vst1 for reading/writing vectors used as
arguments, so we generally don't have to incur any overhead like that,
though.

I experimented with a patch which tried to do marshalling/unmarshalling
in RTL, using DImode/TImode for the containerized format (splitting
neon.md/*neon_mov<mode> into DImode/TImode versions for containerized
vectors, and V*mode versions for array-format vectors with only
vmov/vld1/vst1 alternatives, and tweaking several other target macros
etc. appropriately), but that didn't work very well, and wouldn't be
able to handle the case which requires a copy described above, I don't
think. (Several optimisation passes are keen to form V*mode subregs of
DImode values, even if CANNOT_CHANGE_MODE_CLASS/MODES_TIEABLE_P are
tweaked. The hooks/macros controlling argument & function-return
promotion appear to get some of the way there to implementing the RTL
"solution", but evidently not far enough.)

So, I think the proper way of implementing this is probably at the tree
level -- maybe rewriting vector types in function argument lists to
"opaque" vectors, like e.g. rs6000 uses for some intrinsics, and
inserting machine-dependent operations for marshalling and
unmarshalling at appropriate points -- maybe still using DImode/TImode
to represent containerized (opaque) vectors at the RTL level, or maybe
introducing new machine modes if that doesn't work reliably.

The two main advantages of this approach over the status quo are:

1. Big-endian mode works as well as little-endian mode for NEON --
   intrinsics, vectorization, the lot.

2. Even in little-endian mode, using vld1/vst1 predominantly over
   vldr/vstr means that the alignment hints in those instructions can
   be used more often, which might be a minor performance boost.

Would this be a sensible approach, or am I completely wrong? I'm not
sure if I can dedicate time to implementing it at the moment in any
case. Maybe someone within ARM (or Linaro) could take it up? ;-)

(Anyway, I still think it might be a good idea to apply the original
patch until such work is done, considering vectorization -- enabled at
-O3 -- is broken with NEON turned on in big-endian mode at the moment.)

Thanks,

Julian
> > > I can't exactly remember why we didn't do that to start with. I
> > > think the problem was ABI-related, or to do with transferring NEON
> > > vectors to/from ARM registers when it was necessary to do that...
> > > I'm planning to do some archaeology to try to see if I can figure
> > > out a definitive answer.
> >
> > The ABI defined vector types (uint32x4_t etc) are defined to be in
> > vldm/vstm order.
>
> There's no conflict with the ABI-defined vector order -- the ABI
> (looking at AAPCS, IHI 0042D) describes "containerized" vectors which
> should be used to pass and return vector quantities at ABI boundaries,
> but I couldn't find any further restrictions. Internally to a
> function, we are still free to use vld1/vst1 vector ordering. Using
> "containerized"/opaque transfers, the bit pattern of a vector in one
> function (using vld1/vst1 ordering internally) will of course remain
> unchanged if passed to another function and using the same ordering
> there also.

Ah, ok. If you make the ABI defined types distinct from the GCC generic
vector types (as used by the vectorizer), then in principle that should
work. I agree that current GCC probably does not have the
infrastructure to do that, and some of the vector code plays a bit fast
and loose with type conversions/subregs.

Remember that it's not just function arguments, it's any interface
shared between functions, i.e. including structures and global
variables.

> Actually making that work (especially efficiently) with GCC is a
> slightly different matter. Let's call vldm/vstm-ordered vectors
> "containerized" format, and vld1/vst1-ordered vectors "array" format.
> We need to introduce the concept of marshalling vector arguments from
> array format to containerized format when passing them to a function,
> and unmarshalling those vector arguments back the other way on
> function entry. AFAICT, GCC does not have suitable infrastructure for
> implementing such functionality at present: consider that e.g.
> vectors passed by value on the stack should use containerized format,
> which means the called function cannot simply dereference the stack
> pointer to read the vector:

IIRC I/we tried to do something very similar (possibly the other way
around) by abusing the unaligned load mechanism. I don't remember why
that failed.

Paul
On Mon, 4 Mar 2013 13:08:57 +0000
Paul Brook <paul@codesourcery.com> wrote:

> > > > I can't exactly remember why we didn't do that to start with. I
> > > > think the problem was ABI-related, or to do with transferring
> > > > NEON vectors to/from ARM registers when it was necessary to do
> > > > that... I'm planning to do some archaeology to try to see if I
> > > > can figure out a definitive answer.
> > >
> > > The ABI defined vector types (uint32x4_t etc) are defined to be
> > > in vldm/vstm order.
> >
> > There's no conflict with the ABI-defined vector order -- the ABI
> > (looking at AAPCS, IHI 0042D) describes "containerized" vectors
> > which should be used to pass and return vector quantities at ABI
> > boundaries, but I couldn't find any further restrictions.
> > Internally to a function, we are still free to use vld1/vst1 vector
> > ordering. Using "containerized"/opaque transfers, the bit pattern
> > of a vector in one function (using vld1/vst1 ordering internally)
> > will of course remain unchanged if passed to another function and
> > using the same ordering there also.
>
> Ah, ok. If you make the ABI defined types distinct from the GCC
> generic vector types (as used by the vectorizer), then in principle
> that should work. I agree that current GCC probably does not have the
> infrastructure to do that, and some of the vector code plays a bit
> fast and loose with type conversions/subregs.

(Subregs use memory ordering for the byte offset, so I think those are
OK if we use array-order loads/stores pervasively. I'm not 100% sure
though...)

> Remember that it's not just function arguments, it's any interface
> shared between functions, i.e. including structures and global
> variables.

Ugh, I hadn't considered structures or global variables :-/. If we
decide they have to use the containerized format also, then we lose a
lot of the supposed advantage of using array-format vectors
"everywhere" (apart from at procedure call boundaries); for instance,
if we want code with a global variable like:

union {
  char myarr[8];
  v8qi myvec;
} foo;

to do the right thing (i.e., with elements of myvec corresponding
one-to-one to elements of myarr), then using the containerized format
for accesses to myvec would be a non-starter.

Skimming the AAPCS, I'm not sure it actually specifies anything about
the layout of global variables which may be shared between functions
(it'd make sense to do so -- maybe it's elsewhere in the EABI
documents). Aggregates passed by value could also be
marshalled/unmarshalled like vectors, though that starts to sound much
less tractable than dealing with vectors alone.

> > Actually making that work (especially efficiently) with GCC is a
> > slightly different matter. Let's call vldm/vstm-ordered vectors
> > "containerized" format, and vld1/vst1-ordered vectors "array"
> > format. We need to introduce the concept of marshalling vector
> > arguments from array format to containerized format when passing
> > them to a function, and unmarshalling those vector arguments back
> > the other way on function entry. AFAICT, GCC does not have suitable
> > infrastructure for implementing such functionality at present:
> > consider that e.g. vectors passed by value on the stack should use
> > containerized format, which means the called function cannot simply
> > dereference the stack pointer to read the vector:
>
> IIRC I/we tried to do something very similar (possibly the other way
> around) by abusing the unaligned load mechanism. I don't remember
> why that failed.

That'd be this conversation:

http://gcc.gnu.org/ml/gcc-patches/2009-11/msg00876.html

we only tweaked the vectorizer to always use movmisalign, leaving
intrinsics & generic vectors using vldm/vstm order. Fixing up the
resulting chaos using ad-hoc hacks didn't go down too well with
maintainers, so the patch fizzled out.

Cheers,

Julian
On Mon, 4 Mar 2013 15:29:22 +0000
Julian Brown <julian@codesourcery.com> wrote:

> > Remember that it's not just function arguments, it's any interface
> > shared between functions, i.e. including structures and global
> > variables.
>
> Ugh, I hadn't considered structures or global variables :-/. If we
> decide they have to use the containerized format also, then we lose a
> lot of the supposed advantage of using array-format vectors
> "everywhere" (apart from at procedure call boundaries), for instance
> if we want code with a global variable like:
> [...]
> Skimming the AAPCS, I'm not sure it actually specifies anything about
> the layout of global variables which may be shared between functions
> (it'd make sense to do so -- maybe it's elsewhere in the EABI
> documents). Aggregates passed by value could also be
> marshalled/unmarshalled like vectors, though that starts to sound
> much less tractable than dealing with vectors alone.

I somehow missed the "Appendix A: Support for Advanced SIMD Extensions"
in the AAPCS document (it's not in the TOC!). It looks like the builtin
vector types are indeed defined to be stored in memory in vldm/vstm
order -- I think that means we're back to square one.

So: thoughts on disabling vectorization altogether in big-endian mode?

Julian
> I somehow missed the "Appendix A: Support for Advanced SIMD
> Extensions" in the AAPCS document (it's not in the TOC!). It looks
> like the builtin vector types are indeed defined to be stored in
> memory in vldm/vstm order -- I think that means we're back to square
> one.

There's still the possibility of making gcc "generic" vector types
different from the ABI specified types[1], but that feels like it's
probably a really bad idea.

Having a distinct set of types just for the vectorizer may be a more
viable option. IIRC the type selection hooks are more flexible than
when we first looked at this problem.

Paul

[1] e.g. int gcc __attribute__((vector_size(8))); v.s. int32x2_t eabi;
On Tue, Mar 5, 2013 at 12:47 AM, Paul Brook <paul@codesourcery.com> wrote:
>> I somehow missed the "Appendix A: Support for Advanced SIMD
>> Extensions" in the AAPCS document (it's not in the TOC!). It looks
>> like the builtin vector types are indeed defined to be stored in
>> memory in vldm/vstm order -- I think that means we're back to square
>> one.
>
> There's still the possibility of making gcc "generic" vector types
> different from the ABI specified types[1], but that feels like it's
> probably a really bad idea.
>
> Having a distinct set of types just for the vectorizer may be a more
> viable option. IIRC the type selection hooks are more flexible than
> when we first looked at this problem.
>
> Paul
>
> [1] e.g. int gcc __attribute__((vector_size(8))); v.s. int32x2_t eabi;

I think int32x2_t should not be a GCC vector type (thus not have a
vector mode). The ABI specified types should map to an integer mode of
the right size instead. The vectorizer would then still use internal
GCC vector types and modes and the backend needs to provide instruction
patterns that do the right thing with the element ordering the
vectorizer expects.

How are the int32x2_t types used? I suppose they are arguments to the
intrinsics. Which means that for _most_ operations element order does
not matter, thus a plus32x2 (int32x2_t x, int32x2_t y) can simply use
the equivalent of return (int32x2_t)((gcc_int32x2_t)x +
(gcc_int32x2_t)y). In intrinsics where order matters you'd insert
appropriate __builtin_shuffle()s.

Oh, of course do the above only for big-endian mode ...

The other way around, mapping intrinsics and ABI vectors to vector
modes, will have issues ... you'd have to guard all optab queries in
the middle-end to fail for arm big-endian as they expect instruction
patterns that deal with the GCC vector ordering.

Thus: model the backend after GCC's expectations and "fixup" the rest
by fixing the ABI types and intrinsics.

Richard.
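[Richard's plus32x2 idea can be sketched concretely with GCC generic vectors. This is purely an illustration of the wrapper shape, not real arm_neon.h code: `abi_int32x2_t` stands in for an ABI type lowered to an integer mode, and memcpy models the cost-free reinterpretation between the opaque container and the GCC vector type (on a real target this would be a no-op or a single register move):]

```c
#include <stdint.h>
#include <string.h>

/* The ABI-visible type stays opaque: just a 64-bit integer container,
   so the middle end never reasons about its element order.  */
typedef uint64_t abi_int32x2_t;

/* The GCC generic vector type the wrapper uses internally.  */
typedef int32_t gcc_int32x2_t __attribute__ ((vector_size (8)));

/* Element order is irrelevant for lane-wise addition, so the wrapper
   can simply reinterpret, add, and reinterpret back.  */
static abi_int32x2_t
plus32x2 (abi_int32x2_t x, abi_int32x2_t y)
{
  gcc_int32x2_t a, b, r;
  memcpy (&a, &x, sizeof a);   /* view container as a vector */
  memcpy (&b, &y, sizeof b);
  r = a + b;                   /* lane-wise add on the GCC vector type */
  memcpy (&x, &r, sizeof x);   /* back to the opaque container */
  return x;
}
```

For order-sensitive intrinsics, the same wrapper shape would insert a __builtin_shuffle between the reinterpretations, as Richard describes.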
On Tue, 5 Mar 2013 10:42:59 +0100
Richard Biener <richard.guenther@gmail.com> wrote:

> On Tue, Mar 5, 2013 at 12:47 AM, Paul Brook <paul@codesourcery.com>
> wrote:
> >> I somehow missed the "Appendix A: Support for Advanced SIMD
> >> Extensions" in the AAPCS document (it's not in the TOC!).  It looks
> >> like the builtin vector types are indeed defined to be stored in
> >> memory in vldm/vstm order -- I think that means we're back to
> >> square one.
> >
> > There's still the possibility of making gcc "generic" vector types
> > different from the ABI specified types[1], but that feels like it's
> > probably a really bad idea.
> >
> > Having a distinct set of types just for the vectorizer may be a
> > more viable option.  IIRC the type selection hooks are more flexible
> > than when we first looked at this problem.
> >
> > Paul
> >
> > [1] e.g. int gcc __attribute__((vector_size(8))); v.s. int32x2_t
> > eabi;
>
> I think int32x2_t should not be a GCC vector type (thus not have a
> vector mode).  The ABI specified types should map to an integer mode
> of the right size instead.  The vectorizer would then still use
> internal GCC vector types and modes and the backend needs to provide
> instruction patterns that do the right thing with the element
> ordering the vectorizer expects.
>
> How are the int32x2_t types used?  I suppose they are arguments to
> the intrinsics.  Which means that for _most_ operations element order
> does not matter, thus a plus32x2 (int32x2_t x, int32x2_t y) can simply
> use the equivalent of return (int32x2_t)((gcc_int32x2_t)x +
> (gcc_int32x2_t)y).  In intrinsics where order matters you'd insert
> appropriate __builtin_shuffle()s.

Maybe there's no need to interpret the vector layout for any of the
intrinsics -- just treat all inputs & outputs as opaque (there are
intrinsics for getting/setting lanes -- IMO these shouldn't attempt to
convert lane numbers at all, though they do at present).  Several
intrinsics are currently implemented using __builtin_shuffle, e.g.:

__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
vrev64_s8 (int8x8_t __a)
{
  return (int8x8_t) __builtin_shuffle (__a, (uint8x8_t) { 7, 6, 5, 4, 3, 2, 1, 0 });
}

I'd imagine that if int8x8_t is not an actual vector type, we could
invent extra builtins to convert them to and from such types to be able
to still do this kind of thing (in arm_neon.h, not necessarily for
direct use by users), i.e.:

typedef char gcc_int8x8_t __attribute__((vector_size(8)));

int8x8_t
vrev64_s8 (int8x8_t __a)
{
  gcc_int8x8_t tmp = __builtin_neon2generic (__a);
  tmp = __builtin_shuffle (tmp, (gcc_int8x8_t) { 7, 6, 5, 4, ... });
  return __builtin_generic2neon (tmp);
}

(On re-reading, that's basically the same as what you suggested, I
think.)

> Oh, of course do the above only for big-endian mode ...
>
> The other way around, mapping intrinsics and ABI vectors to vector
> modes will have issues ... you'd have to guard all optab queries in
> the middle-end to fail for arm big-endian as they expect instruction
> patterns that deal with the GCC vector ordering.
>
> Thus: model the backend after GCC's expectations and "fixup" the rest
> by fixing the ABI types and intrinsics.

I think this plan will work fine -- it has the added advantage (which
looks like a disadvantage, but really isn't) that generic vector
operations like:

void foo (void)
{
  int8x8_t x = { 0, 1, 2, 3, 4, 5, 6, 7 };
}

will *not* work -- nor will e.g. subscripting ABI-defined vectors using
[]s.  At the moment, using these features can lead to surprising
results.

Unfortunately NEON's pretty complicated, and the ARM backend currently
uses vector modes quite heavily to implement it, so just using integer
modes for intrinsics is going to be tough.  It might work to create a
shadow set of vector modes for use only by the intrinsics (O*mode for
"opaque" instead of V*mode, say), if the middle end won't barf at that.

Thanks,

Julian
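[Editorial note on the `__builtin_shuffle` semantics used above: result
lane i takes input lane mask[i], so the { 7, 6, ..., 0 } mask is a full lane
reversal.  A plain-C model of that semantics -- not the GCC builtin itself,
just an illustration of what it computes:]

```c
#include <stdint.h>

/* Model of __builtin_shuffle's one-input form: out[i] = in[mask[i]].
   With mask { 7, 6, 5, 4, 3, 2, 1, 0 } this reverses the 8 lanes, which is
   the operation vrev64_s8 performs on a D register.  */
static void
shuffle8 (const int8_t in[8], const uint8_t mask[8], int8_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = in[mask[i]];
}
```

This is exactly why lane numbering matters for these intrinsics: the mask
indices are lane numbers, so any disagreement about lane order between the
ABI view and the vectorizer view changes the result.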
Julian Brown wrote:
> On Tue, 5 Mar 2013 10:42:59 +0100
> Richard Biener <richard.guenther@gmail.com> wrote:
>
>> On Tue, Mar 5, 2013 at 12:47 AM, Paul Brook <paul@codesourcery.com>
>> wrote:
>>>> I somehow missed the "Appendix A: Support for Advanced SIMD
>>>> Extensions" in the AAPCS document (it's not in the TOC!).  It looks
>>>> like the builtin vector types are indeed defined to be stored in
>>>> memory in vldm/vstm order -- I think that means we're back to
>>>> square one.
>>> There's still the possibility of making gcc "generic" vector types
>>> different from the ABI specified types[1], but that feels like it's
>>> probably a really bad idea.
>>>
>>> Having a distinct set of types just for the vectorizer may be a
>>> more viable option.  IIRC the type selection hooks are more flexible
>>> than when we first looked at this problem.
>>>
>>> Paul
>>>
>>> [1] e.g. int gcc __attribute__((vector_size(8))); v.s. int32x2_t
>>> eabi;
>> I think int32x2_t should not be a GCC vector type (thus not have a
>> vector mode).  The ABI specified types should map to an integer mode
>> of the right size instead.  The vectorizer would then still use
>> internal GCC vector types and modes and the backend needs to provide
>> instruction patterns that do the right thing with the element
>> ordering the vectorizer expects.
>>
>> How are the int32x2_t types used?  I suppose they are arguments to
>> the intrinsics.  Which means that for _most_ operations element order
>> does not matter, thus a plus32x2 (int32x2_t x, int32x2_t y) can simply
>> use the equivalent of return (int32x2_t)((gcc_int32x2_t)x +
>> (gcc_int32x2_t)y).  In intrinsics where order matters you'd insert
>> appropriate __builtin_shuffle()s.
>
> Maybe there's no need to interpret the vector layout for any of the
> intrinsics -- just treat all inputs & outputs as opaque (there are
> intrinsics for getting/setting lanes -- IMO these shouldn't attempt to
> convert lane numbers at all, though they do at present).  Several
> intrinsics are currently implemented using __builtin_shuffle, e.g.:
>
> __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
> vrev64_s8 (int8x8_t __a)
> {
>   return (int8x8_t) __builtin_shuffle (__a, (uint8x8_t) { 7, 6, 5, 4, 3, 2, 1, 0 });
> }
>
> I'd imagine that if int8x8_t is not an actual vector type, we could
> invent extra builtins to convert them to and from such types to be able
> to still do this kind of thing (in arm_neon.h, not necessarily for
> direct use by users), i.e.:
>
> typedef char gcc_int8x8_t __attribute__((vector_size(8)));
>
> int8x8_t
> vrev64_s8 (int8x8_t __a)
> {
>   gcc_int8x8_t tmp = __builtin_neon2generic (__a);
>   tmp = __builtin_shuffle (tmp, (gcc_int8x8_t) { 7, 6, 5, 4, ... });
>   return __builtin_generic2neon (tmp);
> }
>
> (On re-reading, that's basically the same as what you suggested, I
> think.)
>
>> Oh, of course do the above only for big-endian mode ...
>>
>> The other way around, mapping intrinsics and ABI vectors to vector
>> modes will have issues ... you'd have to guard all optab queries in
>> the middle-end to fail for arm big-endian as they expect instruction
>> patterns that deal with the GCC vector ordering.
>>
>> Thus: model the backend after GCC's expectations and "fixup" the rest
>> by fixing the ABI types and intrinsics.
>
> I think this plan will work fine -- it has the added advantage (which
> looks like a disadvantage, but really isn't) that generic vector
> operations like:
>
> void foo (void)
> {
>   int8x8_t x = { 0, 1, 2, 3, 4, 5, 6, 7 };
> }
>
> will *not* work -- nor will e.g. subscripting ABI-defined vectors using
> []s.  At the moment using these features can lead to surprising results.
>
> Unfortunately NEON's pretty complicated, and the ARM backend currently
> uses vector modes quite heavily implementing it, so just using integer
> modes for intrinsics is going to be tough.  It might work to create a
> shadow set of vector modes for use only by the intrinsics (O*mode for
> "opaque" instead of V*mode, say), if the middle end won't barf at that.

I suspect the mid-end may not be too happy with opaque modes for
vectors.  I've faced some issues in the past while experimenting with
large int modes for vector register lists, while implementing permuted
loads in AArch64 -- particularly in the area of subreg generation, where
SUBREG_BYTE is generated based on BITS_PER_WORD for all INT mode
classes, not taking into account which registers the values of the
particular mode end up in.  This causes subreg_bytes to be unaligned to
the vector register boundary.

To illustrate this, here is an example that exposed this issue.  For
aarch64, I mirrored the approach that the arm/thumb backend employs and
defined 'large int' opaque modes to represent the register lists,
i.e. OImode, CImode and XImode, and defined the standard patterns that
implement permuted load/stores --
vec_store_lanes<INT_MODE><VEC_MODE> and
vec_load_lanes<INT_MODE><VEC_MODE>.  At the time, I remember this test
case:

typedef unsigned short V __attribute__((vector_size(32)));
typedef V VI;

V in = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
VI mask = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, };
V out = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };

extern void bar(V);

int main()
{
  V r = __builtin_shuffle(in, mask);
  bar (r);
}

generated this RTL with my experimental compiler:

...
(insn 65 59 61 2 (set (reg:DI 178)
        (and:DI (ashift:DI (subreg:DI (reg:OI 74 [ mask.3 ]) 8)
                (const_int 1 [0x1]))
            (const_int 30 [0x1e]))) vs.c:24 380 {*andim_ashiftdi_bfiz}
     (nil))
...
(insn 151 145 147 2 (set (reg:DI 256)
        (and:DI (ashift:DI (subreg:DI (reg:OI 74 [ mask.3 ]) 24)
                (const_int 1 [0x1]))
            (const_int 30 [0x1e]))) vs.c:24 380 {*andim_ashiftdi_bfiz}
     (nil))
...

which is the short-value extraction out of the vectors.  I ran into
this situation where the subregs were generated with byte offsets such
that byte_offset % UNITS_PER_VREG != 0, i.e. subreg offsets that were
not aligned to the vector register boundary.  The above dump is from
before the reload phase.  During reload subreg elimination, these
subregs were converted to refer to the incorrect part of the vector
registers.

Though OImode is a large INT mode, we force these modes to live only in
FPSIMD registers, for which UNITS_PER_VREG and BITS_PER_WORD differ
from the integer word size, i.e. UNITS_PER_VREG is 16 and BITS_PER_WORD
for FPSIMD is 128.  I discovered that in the mid-end, subregs were
generated using BITS_PER_WORD, and there weren't checks during
generation to see that BITS_PER_WORD could be dependent on the mode for
which the subreg is being generated.  There was an assumption that
BITS_PER_WORD applied to all INT modes.  In this case, because OImode
was only allowed in FPSIMD regs, BITS_PER_WORD should've been 128 --
in other words, mode-dependent.  In general, shouldn't BITS_PER_WORD be
dependent on the registers that a particular mode ultimately ends up
in, as dictated by the target hook HARD_REGNO_MODE_OK?  As far as I can
see, expmed.c:store_bit_field_1 () hasn't changed much in this respect,
and I suspect this issue still remains.

We don't have the same issue in the ARM backend because the basic unit
of register allocation is 32 bits for both the FP and integer units
(arm.h, #define ARM_NUM_INTS) and the FP unit is a register-packing
architecture.

That was in the context of register lists where large opaque int modes
represent more than one vector register.  As you suggest, if we extend
opaque int modes to represent one vector register, then with SUBREG
being generated independent of modes in the mid-end, I imagine this may
cause pain for later phases (like reload subreg elimination).
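[Editorial note: the offset arithmetic described above can be checked
numerically.  This is only a model of the mismatch, with the constants
taken from the description -- 8-byte BITS_PER_WORD steps against a 16-byte
vector register:]

```c
#include <assert.h>

/* Numeric model of the SUBREG_BYTE problem: splitting a 32-byte OImode
   value in BITS_PER_WORD-sized (64-bit, i.e. 8-byte) steps yields subreg
   byte offsets 0, 8, 16, 24, but only multiples of UNITS_PER_VREG
   (16 bytes) name a whole FPSIMD register.  Offsets 8 and 24 -- the ones
   visible in the RTL dump above -- land in the middle of a register.  */
enum { WORD_BYTES = 8, UNITS_PER_VREG = 16, OI_BYTES = 32 };

static int
vreg_aligned (int byte_offset)
{
  return byte_offset % UNITS_PER_VREG == 0;
}
```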
But that said, I'm not an expert on how the mid-end handles opaque int
modes, and things might have improved in the area of SUBREG generation
since my experiments.

Thanks,
Tejas Belagod
ARM.
Index: gcc/testsuite/gcc.dg/vect/slp-cond-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-cond-3.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/slp-cond-3.c	(working copy)
@@ -79,6 +79,6 @@ int main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-1.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-1.c	(working copy)
@@ -86,5 +86,5 @@ foo (int n)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_strided2 } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_strided2 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail { vect_strided2 || { ! vect_unpack } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-cond-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-cond-4.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/slp-cond-4.c	(working copy)
@@ -82,5 +82,5 @@ int main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-1-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-1-big-array.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-1-big-array.c	(working copy)
@@ -86,5 +86,5 @@ foo (int n)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_strided2 } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_strided2 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail { vect_strided2 || { ! vect_unpack } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-35.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-35.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-35.c	(working copy)
@@ -45,6 +45,6 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail { ia64-*-* sparc*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail { { ia64-*-* sparc*-*-* } || { ! vect_unpack } } } } } */
 /* { dg-final { scan-tree-dump "can't determine dependence between" "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c	(working copy)
@@ -59,6 +59,6 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c	(working copy)
@@ -53,6 +53,6 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_widen_shift } && { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-26.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-26.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/bb-slp-26.c	(working copy)
@@ -55,6 +55,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "basic block vectorized using SLP" 1 "slp" { target vect64 } } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized using SLP" 1 "slp" { target vect64 xfail { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "slp" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c	(working copy)
@@ -62,6 +62,6 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-35-big-array.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-35-big-array.c	(working copy)
@@ -45,6 +45,6 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail { ia64-*-* sparc*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail { { ia64-*-* sparc*-*-* } || { ! vect_unpack } } } } } */
 /* { dg-final { scan-tree-dump-times "can't determine dependence between" 1 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c	(working copy)
@@ -60,6 +60,6 @@ int main (void)
 
 /* Final value stays in int, so no over-widening is detected at the moment.  */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr43430-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr43430-2.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/pr43430-2.c	(working copy)
@@ -12,5 +12,5 @@ vsad16_c (void *c, uint8_t * s1, uint8_t
   return score;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_condition } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_condition && vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c	(working copy)
@@ -53,6 +53,6 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 1 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_widen_shift } && { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c	(working copy)
@@ -61,6 +61,6 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c	(working copy)
@@ -59,6 +59,6 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump "vect_recog_over_widening_pattern: detected" "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c	(working copy)
@@ -66,6 +66,6 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { { ! vect_sizes_32B_16B } && { ! vect_widen_shift } } } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 8 "vect" { target vect_sizes_32B_16B } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c	(working copy)
@@ -65,6 +65,6 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-93.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-93.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-93.c	(working copy)
@@ -79,7 +79,7 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_no_align } } } */
 
 /* in main: */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { vect_no_align && { ! vect_intfloat_cvt } } } } } */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { xfail { vect_no_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c	(working copy)
@@ -60,5 +60,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_widen_shift } && { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c	(working copy)
@@ -35,5 +35,5 @@ int main (void)
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_intfloat_cvt } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_intfloat_cvt && vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c	(working copy)
@@ -35,5 +35,5 @@ int main (void)
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_intfloat_cvt } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_intfloat_cvt && vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c	(working copy)
@@ -60,6 +60,6 @@ int main (void)
 
 /* Final value stays in int, so no over-widening is detected at the moment.  */
 /* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_element_align } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-11.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-11.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/bb-slp-11.c	(working copy)
@@ -48,6 +48,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "basic block vectorized using SLP" 1 "slp" { target vect64 } } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized using SLP" 1 "slp" { target vect64 xfail { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "slp" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c	(revision 196170)
+++ gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c	(working copy)
@@ -102,6 +102,6 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 8 "vect" { target vect_widen_shift } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { xfail { ! vect_widen_shift } && { ! vect_unpack } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	(revision 196170)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -3089,7 +3089,8 @@ proc check_effective_target_vect_perm {
 	verbose "check_effective_target_vect_perm: using cached result" 2
     } else {
 	set et_vect_perm_saved 0
-	if { [is-effective-target arm_neon_ok]
+	if { ([is-effective-target arm_neon_ok]
+	      && [is-effective-target arm_little_endian])
 	     || [istarget aarch64*-*-*]
 	     || [istarget powerpc*-*-*]
 	     || [istarget spu-*-*]
@@ -3211,7 +3212,8 @@ proc check_effective_target_vect_widen_s
     } else {
 	set et_vect_widen_sum_qi_to_hi_saved 0
 	if { [check_effective_target_vect_unpack]
-	     || [check_effective_target_arm_neon_ok]
+	     || ([check_effective_target_arm_neon_ok]
+		 && [check_effective_target_arm_little_endian])
 	     || [istarget ia64-*-*] } {
 	    set et_vect_widen_sum_qi_to_hi_saved 1
 	}
@@ -3263,7 +3265,8 @@ proc check_effective_target_vect_widen_m
 	}
 	if { [istarget powerpc*-*-*]
	     || [istarget aarch64*-*-*]
-	     || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]) } {
+	     || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]
+		 && [check_effective_target_arm_little_endian]) } {
	    set et_vect_widen_mult_qi_to_hi_saved 1
	}
     }
@@ -3298,7 +3301,8 @@ proc check_effective_target_vect_widen_m
	     || [istarget aarch64*-*-*]
	     || [istarget i?86-*-*]
	     || [istarget x86_64-*-*]
-	     || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]) } {
+	     || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]
+		 && [check_effective_target_arm_little_endian]) } {
	    set et_vect_widen_mult_hi_to_si_saved 1
	}
     }
@@ -3368,7 +3372,8 @@ proc check_effective_target_vect_widen_s
	verbose "check_effective_target_vect_widen_shift: using cached result" 2
     } else {
	set et_vect_widen_shift_saved 0
-	if { ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]) } {
+	if { ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]
+	      && [check_effective_target_arm_little_endian]) } {
	    set et_vect_widen_shift_saved 1
	}
     }
@@ -3859,7 +3864,8 @@ proc check_effective_target_vect_extract
	set et_vect_extract_even_odd_saved 0
	if { [istarget aarch64*-*-*]
	     || [istarget powerpc*-*-*]
-	     || [is-effective-target arm_neon_ok]
+	     || ([is-effective-target arm_neon_ok]
+		 && [is-effective-target arm_little_endian])
	     || [istarget i?86-*-*]
	     || [istarget x86_64-*-*]
	     || [istarget ia64-*-*]
@@ -3885,7 +3891,8 @@ proc check_effective_target_vect_interle
	set et_vect_interleave_saved 0
	if { [istarget aarch64*-*-*]
	     || [istarget powerpc*-*-*]
-	     || [is-effective-target arm_neon_ok]
+	     || ([is-effective-target arm_neon_ok]
+		 && [is-effective-target arm_little_endian])
	     || [istarget i?86-*-*]
	     || [istarget x86_64-*-*]
	     || [istarget ia64-*-*]
@@ -3915,7 +3922,8 @@ foreach N {2 3 4 8} {
	     && [check_effective_target_vect_extract_even_odd] } {
	    set et_vect_stridedN_saved 1
	}
-	if { ([istarget arm*-*-*]
+	if { (([istarget arm*-*-*] && [is-effective-target arm_neon_ok]
+	       && [is-effective-target arm_little_endian])
	      || [istarget aarch64*-*-*])
	     && N >= 2 && N <= 4 } {
	    set et_vect_stridedN_saved 1
	}
@@ -3934,7 +3942,8 @@ proc check_effective_target_vect_multipl
	set et_vect_multiple_sizes_saved 0
	if { ([istarget aarch64*-*-*]
-	      || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok])) } {
+	      || ([istarget arm*-*-*] && [check_effective_target_arm_neon_ok]
+		  && [check_effective_target_arm_little_endian])) } {
	    set et_vect_multiple_sizes_saved 1
	}
	if { ([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
@@ -3957,8 +3966,7 @@ proc check_effective_target_vect64 { } {
     } else {
	set et_vect64_saved 0
	if { ([istarget arm*-*-*]
-	      && [check_effective_target_arm_neon_ok]
-	      && [check_effective_target_arm_little_endian]) } {
+	      && [check_effective_target_arm_neon_ok]) } {
	    set et_vect64_saved 1
	}
     }
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 196170)
+++ gcc/config/arm/arm.c	(working copy)
@@ -25041,7 +25041,7 @@ static bool
 arm_array_mode_supported_p (enum machine_mode mode,
			     unsigned HOST_WIDE_INT nelems)
 {
-  if (TARGET_NEON
+  if (TARGET_NEON && !BYTES_BIG_ENDIAN
       && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
       && (nelems >= 2 && nelems <= 4))
     return true;
@@ -25057,23 +25057,27 @@ static enum machine_mode
 arm_preferred_simd_mode (enum machine_mode mode)
 {
   if (TARGET_NEON)
-    switch (mode)
-      {
-      case SFmode:
-	return TARGET_NEON_VECTORIZE_DOUBLE ? V2SFmode : V4SFmode;
-      case SImode:
-	return TARGET_NEON_VECTORIZE_DOUBLE ? V2SImode : V4SImode;
-      case HImode:
-	return TARGET_NEON_VECTORIZE_DOUBLE ? V4HImode : V8HImode;
-      case QImode:
-	return TARGET_NEON_VECTORIZE_DOUBLE ? V8QImode : V16QImode;
-      case DImode:
-	if (!TARGET_NEON_VECTORIZE_DOUBLE)
-	  return V2DImode;
-	break;
+    {
+      bool double_only = BYTES_BIG_ENDIAN || TARGET_NEON_VECTORIZE_DOUBLE;
 
-      default:;
-      }
+      switch (mode)
+	{
+	case SFmode:
+	  return double_only ? V2SFmode : V4SFmode;
+	case SImode:
+	  return double_only ? V2SImode : V4SImode;
+	case HImode:
+	  return double_only ? V4HImode : V8HImode;
+	case QImode:
+	  return double_only ? V8QImode : V16QImode;
+	case DImode:
+	  if (!double_only)
+	    return V2DImode;
+	  break;
+
+	default:;
+	}
+    }
 
   if (TARGET_REALLY_IWMMXT)
     switch (mode)
@@ -25974,6 +25978,11 @@ arm_vector_alignment (const_tree type)
 static unsigned int
 arm_autovectorize_vector_sizes (void)
 {
+  /* Use of quad-word registers for autovectorization for NEON is fraught with
+     difficulties.  Just don't do that.  */
+  if (TARGET_NEON && BYTES_BIG_ENDIAN)
+    return 8;
+
   return TARGET_NEON_VECTORIZE_DOUBLE ? 0 : (16 | 8);
 }
@@ -27008,6 +27017,12 @@ arm_vectorize_vec_perm_const_ok (enum ma
   unsigned int i, nelt, which;
   bool ret;
 
+  /* FIXME: There appear to be element-numbering problems with vector
+     permutations in big-endian mode that cause the vectorizer to produce bad
+     code.  Disable for now.  */
+  if (BYTES_BIG_ENDIAN)
+    return false;
+
   d.vmode = vmode;
   d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
   d.testing_p = true;
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 196170)
+++ gcc/config/arm/neon.md	(working copy)
@@ -4506,7 +4506,7 @@
   [(set (match_operand:VDQX 0 "s_register_operand")
	(unspec:VDQX [(match_operand:VDQX 1 "neon_struct_operand")]
		     UNSPEC_VLD1))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vld1<mode>"
   [(set (match_operand:VDQX 0 "s_register_operand" "=w")
@@ -4618,7 +4618,7 @@
   [(set (match_operand:VDQX 0 "neon_struct_operand")
	(unspec:VDQX [(match_operand:VDQX 1 "s_register_operand")]
		     UNSPEC_VST1))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vst1<mode>"
   [(set (match_operand:VDQX 0 "neon_struct_operand" "=Um")
@@ -4683,7 +4683,7 @@
	(unspec:TI [(match_operand:TI 1 "neon_struct_operand")
		    (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
		   UNSPEC_VLD2))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vld2<mode>"
   [(set (match_operand:TI 0 "s_register_operand" "=w")
@@ -4708,7 +4708,7 @@
	(unspec:OI [(match_operand:OI 1 "neon_struct_operand")
		    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
		   UNSPEC_VLD2))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vld2<mode>"
   [(set (match_operand:OI 0 "s_register_operand" "=w")
@@ -4797,7 +4797,7 @@
	(unspec:TI [(match_operand:TI 1 "s_register_operand")
		    (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
		   UNSPEC_VST2))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vst2<mode>"
   [(set (match_operand:TI 0 "neon_struct_operand" "=Um")
@@ -4822,7 +4822,7 @@
	(unspec:OI [(match_operand:OI 1 "s_register_operand")
		    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
		   UNSPEC_VST2))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vst2<mode>"
   [(set (match_operand:OI 0 "neon_struct_operand" "=Um")
@@ -4894,7 +4894,7 @@
	(unspec:EI [(match_operand:EI 1 "neon_struct_operand")
		    (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
		   UNSPEC_VLD3))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vld3<mode>"
   [(set (match_operand:EI 0 "s_register_operand" "=w")
@@ -4918,7 +4918,7 @@
   [(match_operand:CI 0 "s_register_operand")
    (match_operand:CI 1 "neon_struct_operand")
    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   emit_insn (gen_neon_vld3<mode> (operands[0], operands[1]));
   DONE;
@@ -5068,7 +5068,7 @@
	(unspec:EI [(match_operand:EI 1 "s_register_operand")
		    (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
		   UNSPEC_VST3))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vst3<mode>"
   [(set (match_operand:EI 0 "neon_struct_operand" "=Um")
@@ -5091,7 +5091,7 @@
   [(match_operand:CI 0 "neon_struct_operand")
    (match_operand:CI 1 "s_register_operand")
    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   emit_insn (gen_neon_vst3<mode> (operands[0], operands[1]));
   DONE;
@@ -5213,7 +5213,7 @@
	(unspec:OI [(match_operand:OI 1 "neon_struct_operand")
		    (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
		   UNSPEC_VLD4))]
-  "TARGET_NEON")
+  "TARGET_NEON && !BYTES_BIG_ENDIAN")
 
 (define_insn "neon_vld4<mode>"
   [(set (match_operand:OI 0 "s_register_operand" "=w")
@@ -5237,7 +5237,7 @@
   [(match_operand:XI 0 "s_register_operand")
    (match_operand:XI 1 "neon_struct_operand")
    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   emit_insn (gen_neon_vld4<mode> (operands[0], operands[1]));
   DONE;
@@ -5394,7 
+5394,7 @@ (unspec:OI [(match_operand:OI 1 "s_register_operand") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VST4))] - "TARGET_NEON") + "TARGET_NEON && !BYTES_BIG_ENDIAN") (define_insn "neon_vst4<mode>" [(set (match_operand:OI 0 "neon_struct_operand" "=Um") @@ -5418,7 +5418,7 @@ [(match_operand:XI 0 "neon_struct_operand") (match_operand:XI 1 "s_register_operand") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { emit_insn (gen_neon_vst4<mode> (operands[0], operands[1])); DONE; @@ -5725,7 +5725,7 @@ [(set (match_operand:<V_widen> 0 "register_operand" "=w") (SE:<V_widen> (ashift:VW (match_operand:VW 1 "register_operand" "w") (match_operand:<V_innermode> 2 "const_neon_scalar_shift_amount_operand" ""))))] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { return "vshll.<US><V_sz_elem> %q0, %P1, %2"; } @@ -5771,7 +5771,7 @@ (define_expand "vec_unpack<US>_lo_<mode>" [(match_operand:<V_double_width> 0 "register_operand" "") (SE:<V_double_width>(match_operand:VDI 1 "register_operand"))] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { rtx tmpreg = gen_reg_rtx (<V_widen>mode); emit_insn (gen_neon_unpack<US>_<mode> (tmpreg, operands[1])); @@ -5784,7 +5784,7 @@ (define_expand "vec_unpack<US>_hi_<mode>" [(match_operand:<V_double_width> 0 "register_operand" "") (SE:<V_double_width>(match_operand:VDI 1 "register_operand"))] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { rtx tmpreg = gen_reg_rtx (<V_widen>mode); emit_insn (gen_neon_unpack<US>_<mode> (tmpreg, operands[1])); @@ -5800,7 +5800,7 @@ (match_operand:VDI 1 "register_operand" "w")) (SE:<V_widen> (match_operand:VDI 2 "register_operand" "w"))))] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" "vmull.<US><V_sz_elem> %q0, %P1, %P2" [(set_attr "neon_type" "neon_shift_1")] ) @@ -5809,7 +5809,7 @@ [(match_operand:<V_double_width> 0 "register_operand" "") (SE:<V_double_width> (match_operand:VDI 1 "register_operand" "")) (SE:<V_double_width> 
(match_operand:VDI 2 "register_operand" ""))] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { rtx tmpreg = gen_reg_rtx (<V_widen>mode); emit_insn (gen_neon_vec_<US>mult_<mode> (tmpreg, operands[1], operands[2])); @@ -5824,7 +5824,7 @@ [(match_operand:<V_double_width> 0 "register_operand" "") (SE:<V_double_width> (match_operand:VDI 1 "register_operand" "")) (SE:<V_double_width> (match_operand:VDI 2 "register_operand" ""))] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { rtx tmpreg = gen_reg_rtx (<V_widen>mode); emit_insn (gen_neon_vec_<US>mult_<mode> (tmpreg, operands[1], operands[2])); @@ -5839,7 +5839,7 @@ [(match_operand:<V_double_width> 0 "register_operand" "") (SE:<V_double_width> (match_operand:VDI 1 "register_operand" "")) (match_operand:SI 2 "immediate_operand" "i")] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { rtx tmpreg = gen_reg_rtx (<V_widen>mode); emit_insn (gen_neon_vec_<US>shiftl_<mode> (tmpreg, operands[1], operands[2])); @@ -5853,7 +5853,7 @@ [(match_operand:<V_double_width> 0 "register_operand" "") (SE:<V_double_width> (match_operand:VDI 1 "register_operand" "")) (match_operand:SI 2 "immediate_operand" "i")] - "TARGET_NEON" + "TARGET_NEON && !BYTES_BIG_ENDIAN" { rtx tmpreg = gen_reg_rtx (<V_widen>mode); emit_insn (gen_neon_vec_<US>shiftl_<mode> (tmpreg, operands[1], operands[2]));