[AArch64] Fix vector multiplication costs

Message ID	be85ddc1-4a8f-4e15-2cbc-a9f80ca0c4d1@arm.com
State	New
Headers	show Return-Path: <gcc-patches-bounces@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 6655A3870903 To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org> Subject: [AArch64] Fix vector multiplication costs Message-ID: <be85ddc1-4a8f-4e15-2cbc-a9f80ca0c4d1@arm.com> Date: Wed, 3 Feb 2021 17:59:01 +0000 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.5.1 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------3376DE8A6F4691DAC315C1DC" Content-Language: en-US Precedence: list From: "Andre Vieira \(lists\) via Gcc-patches" <gcc-patches@gcc.gnu.org> Reply-To: "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com> Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces@gcc.gnu.org>
Series	[AArch64] Fix vector multiplication costs \| expand [AArch64] Fix vector multiplication costs

Message ID

be85ddc1-4a8f-4e15-2cbc-a9f80ca0c4d1@arm.com

State

New

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 6655A3870903
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: [AArch64] Fix vector multiplication costs
Message-ID: <be85ddc1-4a8f-4e15-2cbc-a9f80ca0c4d1@arm.com>
Date: Wed, 3 Feb 2021 17:59:01 +0000
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
 Thunderbird/78.5.1
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------3376DE8A6F4691DAC315C1DC"
Content-Language: en-US
Precedence: list
From: "Andre Vieira \(lists\) via Gcc-patches" <gcc-patches@gcc.gnu.org>
Reply-To: "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com>
Errors-To: gcc-patches-bounces@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces@gcc.gnu.org>

Series

[AArch64] Fix vector multiplication costs | expand

Commit Message

Andre Vieira (lists) Feb. 3, 2021, 5:59 p.m. UTC

This patch introduces a vect.mul RTX cost and decouples the vector 
multiplication costing from the scalar one.

After Wilco's "AArch64: Add cost table for Cortex-A76" patch we saw a 
regression in vector codegen. Reproduceable with the small test added in 
this patch.
Upon further investigation we noticed 'aarch64_rtx_mult_cost' was using 
scalar costs to calculate the cost of vector multiplication, which was 
now lower and preventing 'choose_mult_variant' from making the right 
choice to expand such vector multiplications with constants as shift and 
sub's. I also added a special case for SSRA to use the default vector 
cost rather than mult, SSRA seems to be cost using 
'aarch64_rtx_mult_cost', which to be fair is quite curious. I believe we 
should have a better look at 'aarch64_rtx_costs' altogether and 
completely decouple vector and scalar costs. Though that is something 
that requires more rewriting than I believe should be done in Stage 4.

I gave all targets a vect.mult cost of 4x the vect.alu cost, with the 
exception of targets with cost 0 for vect.alu, those I gave the cost 4.

Bootstrapped on aarch64.

Is this OK for trunk?

gcc/ChangeLog:

         * config/aarch64/aarch64-cost-tables.h: Add entries for vect.mul.
         * config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Use 
vect.mul for
         vector multiplies and vect.alu for SSRA.
         * config/arm/aarch-common-protos.h (struct vector_cost_table): 
Define
         vect.mul cost field.
         * config/arm/aarch-cost-tables.h: Add entries for vect.mul.
         * config/arm/arm.c: Likewise.

gcc/testsuite/ChangeLog:

         * gcc.target/aarch64/asimd-mul-to-shl-sub.c: New test.

Comments

Kyrylo Tkachov Feb. 8, 2021, 5:09 p.m. UTC | #1

> -----Original Message-----
> From: Andre Vieira (lists) <andre.simoesdiasvieira@arm.com>
> Sent: 03 February 2021 17:59
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: [AArch64] Fix vector multiplication costs
> 
> This patch introduces a vect.mul RTX cost and decouples the vector
> multiplication costing from the scalar one.
> 
> After Wilco's "AArch64: Add cost table for Cortex-A76" patch we saw a
> regression in vector codegen. Reproduceable with the small test added in
> this patch.
> Upon further investigation we noticed 'aarch64_rtx_mult_cost' was using
> scalar costs to calculate the cost of vector multiplication, which was
> now lower and preventing 'choose_mult_variant' from making the right
> choice to expand such vector multiplications with constants as shift and
> sub's. I also added a special case for SSRA to use the default vector
> cost rather than mult, SSRA seems to be cost using
> 'aarch64_rtx_mult_cost', which to be fair is quite curious. I believe we
> should have a better look at 'aarch64_rtx_costs' altogether and
> completely decouple vector and scalar costs. Though that is something
> that requires more rewriting than I believe should be done in Stage 4.
> 
> I gave all targets a vect.mult cost of 4x the vect.alu cost, with the
> exception of targets with cost 0 for vect.alu, those I gave the cost 4.
> 
> Bootstrapped on aarch64.
> 
> Is this OK for trunk?

Ok.
Thanks,
Kyrill

> 
> gcc/ChangeLog:
> 
>          * config/aarch64/aarch64-cost-tables.h: Add entries for vect.mul.
>          * config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Use
> vect.mul for
>          vector multiplies and vect.alu for SSRA.
>          * config/arm/aarch-common-protos.h (struct vector_cost_table):
> Define
>          vect.mul cost field.
>          * config/arm/aarch-cost-tables.h: Add entries for vect.mul.
>          * config/arm/arm.c: Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/aarch64/asimd-mul-to-shl-sub.c: New test.

Christophe Lyon March 5, 2021, 10:30 a.m. UTC | #2

On Mon, 8 Feb 2021 at 18:10, Kyrylo Tkachov via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> > -----Original Message-----
> > From: Andre Vieira (lists) <andre.simoesdiasvieira@arm.com>
> > Sent: 03 February 2021 17:59
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > Subject: [AArch64] Fix vector multiplication costs
> >
> > This patch introduces a vect.mul RTX cost and decouples the vector
> > multiplication costing from the scalar one.
> >
> > After Wilco's "AArch64: Add cost table for Cortex-A76" patch we saw a
> > regression in vector codegen. Reproduceable with the small test added in
> > this patch.
> > Upon further investigation we noticed 'aarch64_rtx_mult_cost' was using
> > scalar costs to calculate the cost of vector multiplication, which was
> > now lower and preventing 'choose_mult_variant' from making the right
> > choice to expand such vector multiplications with constants as shift and
> > sub's. I also added a special case for SSRA to use the default vector
> > cost rather than mult, SSRA seems to be cost using
> > 'aarch64_rtx_mult_cost', which to be fair is quite curious. I believe we
> > should have a better look at 'aarch64_rtx_costs' altogether and
> > completely decouple vector and scalar costs. Though that is something
> > that requires more rewriting than I believe should be done in Stage 4.
> >
> > I gave all targets a vect.mult cost of 4x the vect.alu cost, with the
> > exception of targets with cost 0 for vect.alu, those I gave the cost 4.
> >
> > Bootstrapped on aarch64.
> >
> > Is this OK for trunk?
>
> Ok.
> Thanks,
> Kyrill
>
> >
> > gcc/ChangeLog:
> >
> >          * config/aarch64/aarch64-cost-tables.h: Add entries for vect.mul.
> >          * config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Use
> > vect.mul for
> >          vector multiplies and vect.alu for SSRA.
> >          * config/arm/aarch-common-protos.h (struct vector_cost_table):
> > Define
> >          vect.mul cost field.
> >          * config/arm/aarch-cost-tables.h: Add entries for vect.mul.
> >          * config/arm/arm.c: Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >          * gcc.target/aarch64/asimd-mul-to-shl-sub.c: New test.
>


Hi Andre,

It seems you forgot to update a test, because I've noticed these
failures since you committed this patch:
FAIL: gcc.target/aarch64/sve/mul_2.c -march=armv8.2-a+sve
scan-assembler-times \\tmul\\tz[0-9]+\\.b, z[0-9]+\\.b, #-120\\n 1
FAIL: gcc.target/aarch64/sve/mul_2.c -march=armv8.2-a+sve
scan-assembler-times \\tmul\\tz[0-9]+\\.h, z[0-9]+\\.h, #-128\\n 2
FAIL: gcc.target/aarch64/sve/mul_2.c -march=armv8.2-a+sve
scan-assembler-times \\tmul\\tz[0-9]+\\.s, z[0-9]+\\.s, #-128\\n 1

Am I missing something?

Thanks,

Christophe

diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h
index c309f88cbd56f0d2347996d860c982a3a6744492..dd2e7e7cbb13d24f0b51092270cd7e2d75fabf29 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -123,7 +123,8 @@  const struct cpu_cost_table qdf24xx_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1),  /* alu.  */
+    COSTS_N_INSNS (4)   /* mult.  */
   }
 };
 
@@ -227,7 +228,8 @@  const struct cpu_cost_table thunderx_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* Alu.  */
+    COSTS_N_INSNS (1),	/* Alu.  */
+    COSTS_N_INSNS (4)	/* mult.  */
   }
 };
 
@@ -330,7 +332,8 @@  const struct cpu_cost_table thunderx2t99_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* Alu.  */
+    COSTS_N_INSNS (1),	/* Alu.  */
+    COSTS_N_INSNS (4)	/* Mult.  */
   }
 };
 
@@ -433,7 +436,8 @@  const struct cpu_cost_table thunderx3t110_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* Alu.  */
+    COSTS_N_INSNS (1),	/* Alu.  */
+    COSTS_N_INSNS (4)	/* Mult.  */
   }
 };
 
@@ -537,7 +541,8 @@  const struct cpu_cost_table tsv110_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1),  /* alu.  */
+    COSTS_N_INSNS (4)   /* mult.  */
   }
 };
 
@@ -640,7 +645,8 @@  const struct cpu_cost_table a64fx_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1),  /* alu.  */
+    COSTS_N_INSNS (4)   /* mult.  */
   }
 };
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b6192e55521004ae70cd13acbdb4dab142216845..146ed8c1b693d7204a754bc4e6d17025e0af544b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11568,7 +11568,6 @@  aarch64_rtx_mult_cost (rtx x, enum rtx_code code, int outer, bool speed)
   if (VECTOR_MODE_P (mode))
     {
       unsigned int vec_flags = aarch64_classify_vector_mode (mode);
-      mode = GET_MODE_INNER (mode);
       if (vec_flags & VEC_ADVSIMD)
 	{
 	  /* The by-element versions of the instruction have the same costs as
@@ -11582,6 +11581,17 @@  aarch64_rtx_mult_cost (rtx x, enum rtx_code code, int outer, bool speed)
 	  else if (GET_CODE (op1) == VEC_DUPLICATE)
 	    op1 = XEXP (op1, 0);
 	}
+      cost += rtx_cost (op0, mode, MULT, 0, speed);
+      cost += rtx_cost (op1, mode, MULT, 1, speed);
+      if (speed)
+	{
+	  if (GET_CODE (x) == MULT)
+	    cost += extra_cost->vect.mult;
+	  /* This is to catch the SSRA costing currently flowing here.  */
+	  else
+	    cost += extra_cost->vect.alu;
+	}
+      return cost;
     }
 
   /* Integer multiply/fma.  */
diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 251de3d61a833a2bb4b77e9211cac7fbc17c0b75..7a9cf3d324c103de74af741abe9ef30b76fea5ce 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -132,6 +132,7 @@  struct fp_cost_table
 struct vector_cost_table
 {
   const int alu;
+  const int mult;
 };
 
 struct cpu_cost_table
diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h
index d4baee4f83ad7bcdb1835a471e4eafedbf63ee2d..25ff702f01fab50d749b9a7b7b072c2be2504562 100644
--- a/gcc/config/arm/aarch-cost-tables.h
+++ b/gcc/config/arm/aarch-cost-tables.h
@@ -121,7 +121,8 @@  const struct cpu_cost_table generic_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* alu.  */
+    COSTS_N_INSNS (1),	/* alu.  */
+    COSTS_N_INSNS (4)   /* mult.  */
   }
 };
 
@@ -224,7 +225,8 @@  const struct cpu_cost_table cortexa53_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* alu.  */
+    COSTS_N_INSNS (1),	/* alu.  */
+    COSTS_N_INSNS (4)   /* mult.  */
   }
 };
 
@@ -327,7 +329,8 @@  const struct cpu_cost_table cortexa57_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1),  /* alu.  */
+    COSTS_N_INSNS (4)   /* mult.  */
   }
 };
 
@@ -430,7 +433,8 @@  const struct cpu_cost_table cortexa76_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1),  /* alu.  */
+    COSTS_N_INSNS (4)   /* mult.  */
   }
 };
 
@@ -533,7 +537,8 @@  const struct cpu_cost_table exynosm1_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (0)  /* alu.  */
+    COSTS_N_INSNS (0),  /* alu.  */
+    COSTS_N_INSNS (4)   /* mult.  */
   }
 };
 
@@ -636,7 +641,8 @@  const struct cpu_cost_table xgene1_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (2)  /* alu.  */
+    COSTS_N_INSNS (2),  /* alu.  */
+    COSTS_N_INSNS (8)   /* mult.  */
   }
 };
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index bebccc13456a8eda21fe5ca8f7bc1783fa36d29a..829671a74ebc41e6e69f31ae2a2e4dfe126e75d5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1192,7 +1192,8 @@  const struct cpu_cost_table cortexa9_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* alu.  */
+    COSTS_N_INSNS (1),	/* alu.  */
+    COSTS_N_INSNS (4)	/* mult.  */
   }
 };
 
@@ -1295,7 +1296,8 @@  const struct cpu_cost_table cortexa8_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* alu.  */
+    COSTS_N_INSNS (1),	/* alu.  */
+    COSTS_N_INSNS (4)	/* mult.  */
   }
 };
 
@@ -1399,7 +1401,8 @@  const struct cpu_cost_table cortexa5_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* alu.  */
+    COSTS_N_INSNS (1),	/* alu.  */
+    COSTS_N_INSNS (4)	/* mult.  */
   }
 };
 
@@ -1504,7 +1507,8 @@  const struct cpu_cost_table cortexa7_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* alu.  */
+    COSTS_N_INSNS (1),	/* alu.  */
+    COSTS_N_INSNS (4)	/* mult.  */
   }
 };
 
@@ -1607,7 +1611,8 @@  const struct cpu_cost_table cortexa12_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* alu.  */
+    COSTS_N_INSNS (1),	/* alu.  */
+    COSTS_N_INSNS (4)	/* mult.  */
   }
 };
 
@@ -1710,7 +1715,8 @@  const struct cpu_cost_table cortexa15_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* alu.  */
+    COSTS_N_INSNS (1),	/* alu.  */
+    COSTS_N_INSNS (4)	/* mult.  */
   }
 };
 
@@ -1813,7 +1819,8 @@  const struct cpu_cost_table v7m_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)	/* alu.  */
+    COSTS_N_INSNS (1),	/* alu.  */
+    COSTS_N_INSNS (4)	/* mult.  */
   }
 };
 
diff --git a/gcc/testsuite/gcc.target/aarch64/asimd-mul-to-shl-sub.c b/gcc/testsuite/gcc.target/aarch64/asimd-mul-to-shl-sub.c
new file mode 100644
index 0000000000000000000000000000000000000000..d7c5e5f341b2c56e9c2853859b786e1fe524eb59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/asimd-mul-to-shl-sub.c
@@ -0,0 +1,17 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-Ofast" } */
+
+/*
+**foo:
+**	shl	v1.4s, v0.4s, 16
+**	sub	v0.4s, v1.4s, v0.4s
+**	ret
+*/
+#include <arm_neon.h>
+uint32x4_t foo (uint32x4_t a)
+{
+  return a * 65535;
+}
+
+/* { dg-final { check-function-bodies "**" "" "" } } */

[AArch64] Fix vector multiplication costs

Commit Message

Comments

Patch