Message ID | 20240710140602.1707875-11-victor.donascimento@arm.com |
---|---|
State | New |
Series | Make `dot_prod' a convert-type optab |
Hi Victor,

> -----Original Message-----
> From: Victor Do Nascimento <victor.donascimento@arm.com>
> Sent: Wednesday, July 10, 2024 3:06 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Victor Do Nascimento
> <vicdon01@e125768.arm.com>
> Subject: [PATCH 10/10] autovectorizer: Test autovectorization of different dot-prod modes.
>
> From: Victor Do Nascimento <vicdon01@e125768.arm.com>
>
> Given the novel treatment of the dot product optab as a conversion we
> are now able to target, for a given architecture, different
> relationships between output modes and input modes.
>
> This is made clearer by way of example. Previously, on AArch64, the
> following loop was vectorizable:
>
> uint32_t udot4(int n, uint8_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i+=1)
>     sum += data[i] * data[i];
>   return sum;
> }
>
> while the following wasn't:
>
> uint32_t udot2(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i+=1)
>     sum += data[i] * data[i];
>   return sum;
> }
>
> Under the new treatment of the dot product optab, they are both now
> vectorizable.
>
> This adds the relevant target-agnostic check to ensure this behaviour
> in the autovectorizer.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../gcc.dg/vect/vect-dotprod-twoway.c         | 38 +++++++++++++++++++
>  1 file changed, 38 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> new file mode 100644
> index 00000000000..5caa7b81fce
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* Ensure both the two-way and four-way dot products are autovectorized.  */
> +#include <stdint.h>
> +
> +uint32_t udot4(int n, uint8_t* data) {
> +  uint32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +int32_t sdot4(int n, int8_t* data) {
> +  int32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +uint32_t udot2(int n, uint16_t* data) {
> +  uint32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +int32_t sdot2(int n, int16_t* data) {
> +  int32_t sum = 0;
> +  for (int i=0; i<n; i+=1) {
> +    sum += data[i] * data[i];
> +  }
> +  return sum;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */

These tests only test that you have vectorized the loops, not that the loop was
vectorized using dotprod.  I think you want to have a scan for DOT_PROD_EXPR as
well, gated to the targets that support two-way dot prod.

Cheers,
Tamar

> --
> 2.34.1
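A rough sketch of what the extra scan requested above might look like as a DejaGnu directive. The selector is an assumption: it uses the existing effective-target keywords `vect_udot_hi` and `vect_sdot_hi`, and nothing in this thread confirms that these are the right gate once the series lands. The plain `scan-tree-dump` form is used because the number of times `DOT_PROD_EXPR` appears in the vect dump per loop is not established here.

```c
/* Sketch only: the target selector below is an assumption, not part of
   the posted patch.  It asks that DOT_PROD_EXPR appear somewhere in the
   "vect" dump on targets that advertise 16-bit dot-product support.  */
/* { dg-final { scan-tree-dump "DOT_PROD_EXPR" "vect" { target { vect_udot_hi && vect_sdot_hi } } } } */
```

Such a directive would sit next to the existing "vectorized 1 loops" scan rather than replace it.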
On Thu, Jul 11, 2024 at 9:03 AM Tamar Christina <Tamar.Christina@arm.com> wrote:
>
> Hi Victor,
>
> > -----Original Message-----
> > From: Victor Do Nascimento <victor.donascimento@arm.com>
> > Sent: Wednesday, July 10, 2024 3:06 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Richard Earnshaw
> > <Richard.Earnshaw@arm.com>; Victor Do Nascimento
> > <vicdon01@e125768.arm.com>
> > Subject: [PATCH 10/10] autovectorizer: Test autovectorization of different dot-prod modes.
> >
> > [...]
> >
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
>
> These tests only test that you have vectorized the loops, not that the loop was vectorized
> using dotprod.  I think you want to have a scan for DOT_PROD_EXPR as well, gated to the
> targets that support two-way dot prod.

Ideally they'd also verify correctness, thus make them have runtime checks.

> Cheers,
> Tamar
>
> > --
> > 2.34.1
>
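To illustrate the runtime-check suggestion, here is a minimal sketch for the two-way kernels. The kernels are copied from the patch; the helper `check_dotprod2`, the array size `N`, the input values and the expected constants are all hypothetical and chosen only for this example. The expected sums come from the closed form n(n+1)(2n+1)/6, so they do not depend on how the compiler vectorizes anything.

```c
#include <stdint.h>

#define N 64  /* Hypothetical element count, not taken from the patch.  */

/* Two-way (16-bit to 32-bit) kernels, as in the posted test.  */
uint32_t udot2(int n, uint16_t *data) {
  uint32_t sum = 0;
  for (int i = 0; i < n; i += 1)
    sum += data[i] * data[i];
  return sum;
}

int32_t sdot2(int n, int16_t *data) {
  int32_t sum = 0;
  for (int i = 0; i < n; i += 1)
    sum += data[i] * data[i];
  return sum;
}

/* Hypothetical helper: returns 0 if both kernels produce the expected
   sums, 1 if udot2 is wrong, 2 if sdot2 is wrong.  */
int check_dotprod2(void) {
  uint16_t u[N];
  int16_t s[N];

  for (int i = 0; i < N; i++) {
    u[i] = (uint16_t)(i + 1);      /* 1 .. 64   */
    s[i] = (int16_t)(i - N / 2);   /* -32 .. 31 */
  }

  /* Sum of k^2 for k = 1..64 is 64*65*129/6 = 89440.  */
  if (udot2(N, u) != 89440u)
    return 1;

  /* Sum of k^2 for k = 1..32 (11440) plus k = 1..31 (10416) is 21856.  */
  if (sdot2(N, s) != 21856)
    return 2;

  return 0;
}
```

In a testsuite file this would presumably go with `dg-do run` and a `main` that aborts when the helper returns non-zero; the inputs are kept small so that the promoted `int` multiplication cannot overflow.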
diff --git a/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
new file mode 100644
index 00000000000..5caa7b81fce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* Ensure both the two-way and four-way dot products are autovectorized.  */
+#include <stdint.h>
+
+uint32_t udot4(int n, uint8_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot4(int n, int8_t* data) {
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+uint32_t udot2(int n, uint16_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot2(int n, int16_t* data) {
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
From: Victor Do Nascimento <vicdon01@e125768.arm.com>

Given the novel treatment of the dot product optab as a conversion we
are now able to target, for a given architecture, different
relationships between output modes and input modes.

This is made clearer by way of example. Previously, on AArch64, the
following loop was vectorizable:

uint32_t udot4(int n, uint8_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i+=1)
    sum += data[i] * data[i];
  return sum;
}

while the following wasn't:

uint32_t udot2(int n, uint16_t* data) {
  uint32_t sum = 0;
  for (int i=0; i<n; i+=1)
    sum += data[i] * data[i];
  return sum;
}

Under the new treatment of the dot product optab, they are both now
vectorizable.

This adds the relevant target-agnostic check to ensure this behaviour
in the autovectorizer.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-dotprod-twoway.c: New.
---
 .../gcc.dg/vect/vect-dotprod-twoway.c         | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c