diff mbox series

[2/2] aarch64: Add codegen support for SVE2 faminmax

Message ID 20240913090655.1551666-3-saurabh.jha@arm.com
State New
Headers show
Series aarch64: Add support for SVE2 faminmax | expand

Commit Message

Saurabh Jha Sept. 13, 2024, 9:06 a.m. UTC
The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and
mandatory from Armv9.5-a. It introduces instructions for computing the
floating point absolute maximum and minimum of the two vectors
element-wise.

This patch adds code generation for famax and famin in terms of existing
unspecs. With this patch:
1. famax can be expressed as taking fmax/fmaxnm of the two operands and
   then taking absolute value of their result.
2. famin can be expressed as taking fmin/fminnm of the two operands and
   then taking absolute value of their result.

This fusion of operators is only possible when
-march=armv9-a+faminmax+sve flags are passed.

This code generation is only available on -O2 or -O3 as that is when
auto-vectorization is enabled.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md
	(*aarch64_pred_faminmax_fused): Instruction pattern for faminmax
	codegen.
	* config/aarch64/iterators.md: Attribute for faminmax codegen.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/faminmax.c: New test.
---
 gcc/config/aarch64/aarch64-sve.md             | 29 +++++++
 gcc/config/aarch64/iterators.md               |  6 ++
 .../gcc.target/aarch64/sve/faminmax.c         | 85 +++++++++++++++++++
 3 files changed, 120 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/faminmax.c

Comments

Kyrylo Tkachov Sept. 15, 2024, 8:42 a.m. UTC | #1
Hi Saurabh,

> On 13 Sep 2024, at 11:06, saurabh.jha@arm.com wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and
> mandatory from Armv9.5-a. It introduces instructions for computing the
> floating point absolute maximum and minimum of the two vectors
> element-wise.
> 
> This patch adds code generation for famax and famin in terms of existing
> unspecs. With this patch:
> 1. famax can be expressed as taking fmax/fmaxnm of the two operands and
>  then taking absolute value of their result.
> 2. famin can be expressed as taking fmin/fminnm of the two operands and
>  then taking absolute value of their result.
> 
> This fusion of operators is only possible when
> -march=armv9-a+faminmax+sve flags are passed.
> 
> This code generation is only available on -O2 or -O3 as that is when
> auto-vectorization is enabled.
> 
> gcc/ChangeLog:
> 
>       * config/aarch64/aarch64-sve.md
>       (*aarch64_pred_faminmax_fused): Instruction pattern for faminmax
>       codegen.
>       * config/aarch64/iterators.md: Attribute for faminmax codegen.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.target/aarch64/sve/faminmax.c: New test.
> ---
> gcc/config/aarch64/aarch64-sve.md             | 29 +++++++
> gcc/config/aarch64/iterators.md               |  6 ++
> .../gcc.target/aarch64/sve/faminmax.c         | 85 +++++++++++++++++++
> 3 files changed, 120 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/faminmax.c
> 

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index a5cd42be9d5..feb6438efde 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -11111,3 +11111,32 @@
    return "sel\t%0.<Vetype>, %3, %2.<Vetype>, %1.<Vetype>";
  }
)

A slight tangent, maybe more of a question for Richard, but should we be putting these extensions into aarch64-sve2.md or aarch64-sve.md?
It looks like the architecture has had a major extension with SVE2 or SVE so it made sense to create aarch64-sve2.md but now the incremental improvements can be considered as an extension to either?

+
+;; -------------------------------------------------------------------------
+;; -- [FP] Absolute maximum and minimum
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - FAMAX
+;; - FAMIN
+;; -------------------------------------------------------------------------
+
+;; Predicated floating-point absolute maximum and minimum.
+(define_insn "*aarch64_pred_faminmax_fused"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand" "=w")
+	(unspec:SVE_FULL_F
+	  [(match_operand:<VPRED> 1 "register_operand" "Upl")
+	   (match_operand:SI 4 "aarch64_sve_gp_strictness" "w")
+	   (unspec:SVE_FULL_F
+	     [(match_operand 5)
+	      (const_int SVE_RELAXED_GP)
+	      (match_operand:SVE_FULL_F 2 "register_operand" "w")]
+	     UNSPEC_COND_FABS)
+	   (unspec:SVE_FULL_F
+	     [(match_operand 6)
+	      (const_int SVE_RELAXED_GP)
+	      (match_operand:SVE_FULL_F 3 "register_operand" "w")]
+	     UNSPEC_COND_FABS)]
+	  SVE_COND_FP_MAXMIN))]
+  "TARGET_SVE_FAMINMAX"
+  "<faminmax_cond_uns_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>”

This output pattern is missing operand 2.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/faminmax.c b/gcc/testsuite/gcc.target/aarch64/sve/faminmax.c
new file mode 100644
index 00000000000..b70e19fa276
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/faminmax.c
@@ -0,0 +1,85 @@
+/* { dg-do assemble} */

I think dejagnu is pedantic about wanting the space between “assemble” and “}"

+/* { dg-additional-options "-O3 -ffast-math -march=armv9-a+sve+faminmax" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include “arm_sve.h"
+
+#pragma GCC target "+sve"
+

You already force +sve in the additional-options (though -march=armv9-a already implies sve2 and therefore sve). I think we want one or the other.
Maybe just use the target pragma “+sve+faminmax”?

Thanks,
Kyrill

+)
Richard Sandiford Sept. 17, 2024, 1:30 p.m. UTC | #2
Kyrylo Tkachov <ktkachov@nvidia.com> writes:
> Hi Saurabh,
>
>> On 13 Sep 2024, at 11:06, saurabh.jha@arm.com wrote:
>> 
>> External email: Use caution opening links or attachments
>> 
>> 
>> The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and
>> mandatory from Armv9.5-a. It introduces instructions for computing the
>> floating point absolute maximum and minimum of the two vectors
>> element-wise.
>> 
>> This patch adds code generation for famax and famin in terms of existing
>> unspecs. With this patch:
>> 1. famax can be expressed as taking fmax/fmaxnm of the two operands and
>>  then taking absolute value of their result.
>> 2. famin can be expressed as taking fmin/fminnm of the two operands and
>>  then taking absolute value of their result.
>> 
>> This fusion of operators is only possible when
>> -march=armv9-a+faminmax+sve flags are passed.
>> 
>> This code generation is only available on -O2 or -O3 as that is when
>> auto-vectorization is enabled.
>> 
>> gcc/ChangeLog:
>> 
>>       * config/aarch64/aarch64-sve.md
>>       (*aarch64_pred_faminmax_fused): Instruction pattern for faminmax
>>       codegen.
>>       * config/aarch64/iterators.md: Attribute for faminmax codegen.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>       * gcc.target/aarch64/sve/faminmax.c: New test.
>> ---
>> gcc/config/aarch64/aarch64-sve.md             | 29 +++++++
>> gcc/config/aarch64/iterators.md               |  6 ++
>> .../gcc.target/aarch64/sve/faminmax.c         | 85 +++++++++++++++++++
>> 3 files changed, 120 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/faminmax.c
>> 
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index a5cd42be9d5..feb6438efde 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -11111,3 +11111,32 @@
>     return "sel\t%0.<Vetype>, %3, %2.<Vetype>, %1.<Vetype>";
>   }
> )
>
> A slight tangent, maybe more of a question for Richard, but should we be putting these extensions into aarch64-sve2.md or aarch64-sve.md?
> It looks like the architecture has had a major extension with SVE2 or SVE so it made sense to create aarch64-sve2.md but now the incremental improvements can be considered as an extension to either?

Yeah, good question. :)  I guess the sve/sve2 split doesn't make much
sense any more.  But while we have it, new patterns that are specific
to SVE2+ should probably go in aarch64-sve2.md.

> +;; -------------------------------------------------------------------------
> +;; -- [FP] Absolute maximum and minimum
> +;; -------------------------------------------------------------------------
> +;; Includes:
> +;; - FAMAX
> +;; - FAMIN
> +;; -------------------------------------------------------------------------
> +
> +;; Predicated floating-point absolute maximum and minimum.
> +(define_insn "*aarch64_pred_faminmax_fused"
> +  [(set (match_operand:SVE_FULL_F 0 "register_operand" "=w")
> +	(unspec:SVE_FULL_F
> +	  [(match_operand:<VPRED> 1 "register_operand" "Upl")
> +	   (match_operand:SI 4 "aarch64_sve_gp_strictness" "w")
> +	   (unspec:SVE_FULL_F
> +	     [(match_operand 5)
> +	      (const_int SVE_RELAXED_GP)
> +	      (match_operand:SVE_FULL_F 2 "register_operand" "w")]
> +	     UNSPEC_COND_FABS)
> +	   (unspec:SVE_FULL_F
> +	     [(match_operand 6)
> +	      (const_int SVE_RELAXED_GP)
> +	      (match_operand:SVE_FULL_F 3 "register_operand" "w")]
> +	     UNSPEC_COND_FABS)]
> +	  SVE_COND_FP_MAXMIN))]
> +  "TARGET_SVE_FAMINMAX"
> +  "<faminmax_cond_uns_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>”
>
> This output pattern is missing operand 2.

Yeah.  We should use the same formulation as elsewhere to support:

- operand 2 tied to operand 0
- operand 3 tied to operand 0 (through commutativity)
- all three are separate register (using movprfx)

@aarch64_pred_<su>abd<mode> is an example of a similar commutative operation.

I don't think this distinguishes between fmax(nm)s that came from
intrinsics and fmaxnms that came from the smax optab.  The former
can't be optimised, since famax has slightly different behaviour.
The latter can, because smax on a float is inherently somewhat fuzzy.

I think we should also have tests that something like:

#include <arm_sve.h>

svfloat32_t foo(svfloat32_t x, svfloat32_t y) {
    svbool_t pg = svptrue_b8();
    return svmax_x(pg, svabs_x(pg, x), svabs_x(pg, y));
}

and

#include <arm_sve.h>

svfloat32_t foo(svfloat32_t x, svfloat32_t y) {
    svbool_t pg = svptrue_b8();
    return svmaxnm_x(pg, svabs_x(pg, x), svabs_x(pg, y));
}

are not optimised to famax even when famax is available.  This can be
done using scan-assemblers for the three individual instructions and
a scan-assembler-not for famax.

As for how to fix that: I think we'll need to use UNSPEC_COND_SMAX
and UNSPEC_COND_SMIN for "smax" and "smin" (even for floating-point
modes), rather than the current UNSPEC_COND_FMAXNM and UNSPEC_COND_FMINNM.
Code that wants to generate UNSPEC_COND_FMAXNM or UNSPEC_COND_FMINNM
directly can do it via the separate fmax/fmin optabs.

I think that can all be done by judicious tweaking of existing iterators,
but I haven't tried...

Thanks,
Richard
diff mbox series

Patch

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index a5cd42be9d5..feb6438efde 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -11111,3 +11111,32 @@ 
     return "sel\t%0.<Vetype>, %3, %2.<Vetype>, %1.<Vetype>";
   }
 )
+
+;; -------------------------------------------------------------------------
+;; -- [FP] Absolute maximum and minimum
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - FAMAX
+;; - FAMIN
+;; -------------------------------------------------------------------------
+
+;; Predicated floating-point absolute maximum and minimum.
+(define_insn "*aarch64_pred_faminmax_fused"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand" "=w")
+	(unspec:SVE_FULL_F
+	  [(match_operand:<VPRED> 1 "register_operand" "Upl")
+	   (match_operand:SI 4 "aarch64_sve_gp_strictness" "w")
+	   (unspec:SVE_FULL_F
+	     [(match_operand 5)
+	      (const_int SVE_RELAXED_GP)
+	      (match_operand:SVE_FULL_F 2 "register_operand" "w")]
+	     UNSPEC_COND_FABS)
+	   (unspec:SVE_FULL_F
+	     [(match_operand 6)
+	      (const_int SVE_RELAXED_GP)
+	      (match_operand:SVE_FULL_F 3 "register_operand" "w")]
+	     UNSPEC_COND_FABS)]
+	  SVE_COND_FP_MAXMIN))]
+  "TARGET_SVE_FAMINMAX"
+  "<faminmax_cond_uns_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index b993ac9a7f6..5bdf1970f92 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -4489,5 +4489,11 @@ 
 (define_int_attr faminmax_uns_op
   [(UNSPEC_FAMAX "famax") (UNSPEC_FAMIN "famin")])
 
+(define_int_attr faminmax_cond_uns_op
+  [(UNSPEC_COND_FMAX "famax")
+   (UNSPEC_COND_FMAXNM "famax")
+   (UNSPEC_COND_FMIN "famin")
+   (UNSPEC_COND_FMINNM "famin")])
+
 (define_code_attr faminmax_op
   [(smax "famax") (smin "famin")])
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/faminmax.c b/gcc/testsuite/gcc.target/aarch64/sve/faminmax.c
new file mode 100644
index 00000000000..b70e19fa276
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/faminmax.c
@@ -0,0 +1,85 @@ 
+/* { dg-do assemble} */
+/* { dg-additional-options "-O3 -ffast-math -march=armv9-a+sve+faminmax" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "arm_sve.h"
+
+#pragma GCC target "+sve"
+
+#define TEST_FAMAX(TYPE)						\
+  void fn_famax_##TYPE (TYPE * restrict a,				\
+			TYPE * restrict b,				\
+			TYPE * restrict c,				\
+			int n) {					\
+    for (int i = 0; i < n; i++) {					\
+      TYPE temp1 = __builtin_fabs (a[i]);				\
+      TYPE temp2 = __builtin_fabs (b[i]);				\
+      c[i] = __builtin_fmax (temp1, temp2);				\
+    }									\
+  }									\
+
+#define TEST_FAMIN(TYPE)						\
+  void fn_famin_##TYPE (TYPE * restrict a,				\
+			TYPE * restrict b,				\
+			TYPE * restrict c,				\
+			int n) {					\
+    for (int i = 0; i < n; i++) {					\
+      TYPE temp1 = __builtin_fabs (a[i]);				\
+      TYPE temp2 = __builtin_fabs (b[i]);				\
+      c[i] = __builtin_fmin (temp1, temp2);				\
+    }									\
+  }									\
+
+/*
+** fn_famax_float16_t:
+**	...
+**      famax	z31.h, p6/m, z31.h, z30.h
+**	...
+**	ret
+*/
+TEST_FAMAX (float16_t)
+
+/*
+** fn_famax_float32_t:
+**	...
+**      famax	z31.s, p6/m, z31.s, z30.s
+**	...
+**	ret
+*/
+TEST_FAMAX (float32_t)
+
+/*
+** fn_famax_float64_t:
+**	...
+**      famax	z31.d, p6/m, z31.d, z30.d
+**	...
+**	ret
+*/
+TEST_FAMAX (float64_t)
+
+/*
+** fn_famin_float16_t:
+**	...
+**      famin	z31.h, p6/m, z31.h, z30.h
+**	...
+**	ret
+*/
+TEST_FAMIN (float16_t)
+
+/*
+** fn_famin_float32_t:
+**	...
+**      famin	z31.s, p6/m, z31.s, z30.s
+**	...
+**	ret
+*/
+TEST_FAMIN (float32_t)
+
+/*
+** fn_famin_float64_t:
+**	...
+**      famin	z31.d, p6/m, z31.d, z30.d
+**	...
+**	ret
+*/
+TEST_FAMIN (float64_t)