
[10/10] autovectorizer: Test autovectorization of different dot-prod modes.

Message ID 20240710140602.1707875-11-victor.donascimento@arm.com
State New
Series Make `dot_prod' a convert-type optab

Commit Message

Victor Do Nascimento July 10, 2024, 2:06 p.m. UTC
From: Victor Do Nascimento <vicdon01@e125768.arm.com>

Now that the dot product optab is treated as a conversion optab, a
given architecture can provide dot-product patterns for several
different combinations of output mode and input mode.

An example makes this clearer. Previously, on AArch64, the following
loop could be vectorized using a dot-product instruction:

uint32_t udot4(int n, uint8_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i+=1)
    sum += data[i] * data[i];
  return sum;
}

while the following could not:

uint32_t udot2(int n, uint16_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i+=1)
    sum += data[i] * data[i];
  return sum;
}

With the dot product optab treated as a conversion, both loops can now
be vectorized as dot products on targets that provide the corresponding
instructions.
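To make the mode relationship concrete, here is a scalar model of what a single 32-bit accumulator lane receives in each case. It is an illustration only, not GCC's internal representation and not the exact semantics of any particular instruction; the function names are hypothetical. In both cases the output element is 32 bits; what changes is the input element width, and with it how many products are folded into each lane. Treating the optab as a conversion lets a target provide a separate pattern for each such output/input mode pair.

```c
#include <stdint.h>

/* Four-way case (QImode inputs, SImode output): four 8-bit products
   are added into one 32-bit accumulator lane.  */
static uint32_t
udot4_lane (uint32_t acc, const uint8_t a[4], const uint8_t b[4])
{
  for (int k = 0; k < 4; k++)
    acc += (uint32_t) a[k] * b[k];
  return acc;
}

/* Two-way case (HImode inputs, SImode output): two 16-bit products
   are added into one 32-bit accumulator lane.  */
static uint32_t
udot2_lane (uint32_t acc, const uint16_t a[2], const uint16_t b[2])
{
  for (int k = 0; k < 2; k++)
    acc += (uint32_t) a[k] * b[k];
  return acc;
}
```

For example, udot4_lane with acc = 0 and a = b = {1, 2, 3, 4} yields 30, the same value four iterations of the udot4 loop body would accumulate.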

This patch adds a target-agnostic test to check this behaviour in the
autovectorizer.

gcc/testsuite/ChangeLog:

	  * gcc.dg/vect/vect-dotprod-twoway.c: New.
---
 .../gcc.dg/vect/vect-dotprod-twoway.c         | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c

Comments

Tamar Christina July 11, 2024, 7:02 a.m. UTC | #1
Hi Victor,

> -----Original Message-----
> From: Victor Do Nascimento <victor.donascimento@arm.com>
> Sent: Wednesday, July 10, 2024 3:06 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Victor Do Nascimento
> <vicdon01@e125768.arm.com>
> Subject: [PATCH 10/10] autovectorizer: Test autovectorization of different dot-
> prod modes.
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> new file mode 100644
> index 00000000000..5caa7b81fce
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* Ensure both the two-way and four-way dot products are autovectorized.  */
> +#include <stdint.h>
> +
> +uint32_t udot4(int n, uint8_t* data) {
> +  uint32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +int32_t sdot4(int n, int8_t* data) {
> +  int32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +uint32_t udot2(int n, uint16_t* data) {
> +  uint32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +int32_t sdot2(int n, int16_t* data) {
> +  int32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */

These tests only test that you have vectorized the loops, not that the loops were vectorized
using dotprod.  I think you want to have a scan for DOT_PROD_EXPR as well, gated to the
targets that support two-way dot prod.
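One way to act on this, sketched under assumptions: move the two-way loops into their own file (here a hypothetical one holding only udot2) so that any DOT_PROD_EXPR in the vect dump can only come from a two-way loop, and gate the scan on the existing vect_udot_hi effective-target keyword, assuming it accurately describes unsigned HImode-to-SImode dot-product support (worth confirming in target-supports.exp). The signed case would follow the same shape with sdot2 and vect_sdot_hi.

```c
/* Hypothetical vect-dotprod-twoway-u16.c: only the unsigned two-way
   loop lives here, so a DOT_PROD_EXPR in the vect dump can only have
   come from it.  */
/* { dg-do compile } */
/* { dg-require-effective-target vect_int } */
#include <stdint.h>

uint32_t udot2(int n, uint16_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i+=1) {
    sum += data[i] * data[i];
  }
  return sum;
}

/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
/* Assumes vect_udot_hi is set exactly for targets with an unsigned
   HI->SI dot product.  */
/* { dg-final { scan-tree-dump "DOT_PROD_EXPR" "vect" { target vect_udot_hi } } } */
```

A plain scan-tree-dump is used rather than scan-tree-dump-times because DOT_PROD_EXPR may appear in the dump more than once per loop; a count would need to be confirmed on the target first.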

Cheers,
Tamar

Richard Biener July 11, 2024, 8:47 a.m. UTC | #2
On Thu, Jul 11, 2024 at 9:03 AM Tamar Christina <Tamar.Christina@arm.com> wrote:
>
> Hi Victor,
>
> These tests only test that you have vectorized the loops, not that the loops were vectorized
> using dotprod.  I think you want to have a scan for DOT_PROD_EXPR as well, gated to the
> targets that support two-way dot prod.

Ideally they'd also verify correctness, so make them runtime tests.
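A minimal sketch of such a runtime check for the two-way kernels, assuming GCC 14 or later for `#pragma GCC novector` (older GCC ignores the unknown pragma, so the reference loop might then be vectorized too). `check_dot2` and `N` are hypothetical names. In the testsuite the file would use `{ dg-do run }`, and `main` would typically call `check_vect ()` from tree-vect.h before `check_dot2 ()`; both are left out here so the fragment stands alone. `noipa` keeps the kernels from being inlined or specialised into the caller, so the loops actually exercised are the ones in udot2 and sdot2.

```c
#include <stdint.h>
#include <stdlib.h>

/* Not a multiple of common vector lengths, so a scalar epilogue
   (if any) also runs.  */
#define N 1027

__attribute__((noipa)) uint32_t
udot2 (int n, uint16_t *data)
{
  uint32_t sum = 0;
  for (int i = 0; i < n; i += 1)
    sum += data[i] * data[i];
  return sum;
}

__attribute__((noipa)) int32_t
sdot2 (int n, int16_t *data)
{
  int32_t sum = 0;
  for (int i = 0; i < n; i += 1)
    sum += data[i] * data[i];
  return sum;
}

uint16_t udata[N];
int16_t sdata[N];

/* Fill the inputs, build scalar reference sums in a loop the
   vectorizer is told to skip, and abort on any mismatch.  */
static void
check_dot2 (void)
{
  uint32_t uref = 0;
  int32_t sref = 0;
#pragma GCC novector
  for (int i = 0; i < N; i++)
    {
      udata[i] = (uint16_t) (i % 251);
      sdata[i] = (int16_t) (i % 251 - 125);
      uref += (uint32_t) udata[i] * udata[i];
      sref += sdata[i] * sdata[i];
    }
  if (udot2 (N, udata) != uref || sdot2 (N, sdata) != sref)
    abort ();
}
```

The input values are kept small on purpose: in the kernels, `data[i] * data[i]` is evaluated in `int`, so unsigned 16-bit inputs above 46340 would overflow the multiplication, and large sums could also overflow `int32_t` in the signed case.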

> Cheers,
> Tamar
>

Patch

diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
new file mode 100644
index 00000000000..5caa7b81fce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
@@ -0,0 +1,38 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* Ensure both the two-way and four-way dot products are autovectorized.  */
+#include <stdint.h>
+
+uint32_t udot4(int n, uint8_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot4(int n, int8_t* data) {
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+uint32_t udot2(int n, uint16_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot2(int n, int16_t* data) {
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */