Message ID | 20200925142704.GA9928@arm.com |
---|---|
Series | middle-end Add support for SLP vectorization of complex number instructions. |
On Fri, 25 Sep 2020, Tamar Christina wrote:

> Hi All,
>
> This patch series adds support for SLP vectorization of complex instructions [1].
>
> These instructions exist only in their vector forms and require you to recognize
> two statements in parallel.  Complex operations usually require a permute because
> the real and imaginary parts are stored interleaved, but these vector
> instructions expect this layout and no longer need the compiler to generate a permute.
>
> For this reason the pass also re-orders the loads in the SLP tree such that they
> become contiguous and no longer need the permutes.  The basic blocks are left
> untouched such that the scalar loop will still correctly issue permutes.
>
> The instructions also support rotations along the Argand plane, so the operands
> have to be re-ordered to coincide with their load group.
>
> For now, this patch only adds support for:
>
> * Complex Addition with rotation of 0 and 180.
> * Complex Multiplication and Multiplication where one operand is conjugated.
> * Complex FMA and FMA where one operand is conjugated.
> * Complex FMS and FMS where one operand is conjugated.
>
> Complex dot-product is not currently supported in this patch set as build_slp fails
> for it.  This will be provided as a future patch.
>
> These are supported for both integer and floating point, and as such these don't look
> for real or imaginary pairs but instead rely on the early lowering of complex
> numbers by GCC and canonicalization of the operations, such that it just recognizes any
> instruction sequence matching the operations requested.
>
> To be safe, when it is not sure it can support the operation, or if it finds something
> it does not understand, it backs off.
>
> This patch is an RFC and I am looking for feedback on the approach.  In particular,
> this series has one problem, which is when it is decided that SLP is not viable
> and that the normal loop vectorizer is to be used.
>
> In this case I dissolve the changes, but the compiler crashes because the use of
> the pattern matcher essentially undoes two_operands.  This means that the number of
> copies needed when using the patterns and when not are different.  When using
> the patterns the two operands become the same and so are treated as manually
> unrolled loops.  The problem is that nunits has already been decided
> along with the unroll factor, so when the dissolved statements are then analyzed
> they fail.  This is also the reason why I cannot analyze both the pattern and
> original statements initially.

That's the same as with "regular" patterns, btw: if vectorizing the pattern
fails, vectorization fails.  We never re-consider, and we also have no way
to choose from multiple patterns.

The way "regular" patterns make this a non-issue is that they try to only
convert things that are likely unhandled/suboptimal and most likely
vectorizable.

That said - the solution to the ICE is to _not_ dissolve the changes and
instead make vectorization fail.

Richard.

> The relevant places in the source code have comments describing the problem.
>
> [1] https://developer.arm.com/documentation/ddi0487/fc/
>
> Thanks,
> Tamar
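
For readers following the thread, here is a minimal sketch of the kind of scalar code the cover letter describes: a complex multiply over interleaved (real, imaginary) pairs, written out as it might look once complex arithmetic has been lowered to plain scalar operations. The function name cmul_interleaved and the array layout are assumptions made for illustration; this is not code from the patch series, and the exact form GCC produces after lowering may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not taken from the patch series.
   Multiplies n_complex complex numbers stored as interleaved
   (re, im) pairs: out[i] = a[i] * b[i].  */
void cmul_interleaved(const float *a, const float *b, float *out,
                      size_t n_complex)
{
    assert(a != NULL && b != NULL && out != NULL);

    for (size_t i = 0; i < n_complex; i++) {
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];

        /* Even lane (real part) subtracts, odd lane (imaginary part) adds.
           The two statements are different operations on neighbouring
           lanes, so they have to be recognized together as a pair.  */
        out[2 * i]     = ar * br - ai * bi;
        out[2 * i + 1] = ar * bi + ai * br;
    }
}
```

The two stores per iteration are the "two statements in parallel" from the cover letter. Each one reads a different arrangement of the real and imaginary elements of a and b, which is why a generic vectorization of this loop usually needs permutes on the loaded vectors.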