Message ID | 22ba05fb-774e-62b8-64a2-90c5d73fcaba@arm.com |
---|---|
State | New |
Headers | show |
Series | arm: Fix regressions around MVE predicate codegen | expand |
Hi Andre, > -----Original Message----- > From: Andre Vieira (lists) <andre.simoesdiasvieira@arm.com> > Sent: Tuesday, January 24, 2023 1:54 PM > To: gcc-patches@gcc.gnu.org > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Richard Earnshaw > <Richard.Earnshaw@arm.com>; Richard Biener <rguenther@suse.de>; > Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> > Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE > predicates before use [PR 107674] > > Hi, > > This patch teaches GCC that zero-extending a MVE predicate from 16-bits > to 32-bits and then only using 16-bits is a no-op. > It does so in two steps: > - it lets gcc know that it can access any MVE predicate mode using any > other MVE predicate mode without needing to copy it, using the > TARGET_MODES_TIEABLE_P hook, > - it teaches simplify_subreg to optimize a subreg with a vector > outermode, by replacing this outermode with a same-sized integer mode > and trying the avalailable optimizations, then if successful it > surrounds the result with a subreg casting it back to the original > vector outermode. > > This removes the unnecessary zero-extending shown on PR 107674 (though > it's a sign-extend there), that was introduced in gcc 11. > > Bootstrapped on aarch64-none-linux-gnu and regression tested on > arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. > > OK for trunk? > > gcc/ChangeLog: > > PR target/107674 > * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. > (arm_modes_tieable_p): Make MVE predicate modes tieable. > * config/arm/arm.h (VALID_MVE_PRED_MODE): New define. > * simplify-rtx.cc (simplify_context::simplify_subreg): Teach > simplify_subreg to simplify subregs where the outermode is not > scalar. The arm changes look ok to me. We'll want a midend maintainer to have a look at simplify-rtx.cc > > gcc/testsuite/ChangeLog: > > * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary > zero-extend. diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c index 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5acf9af0a2155b5c5 100644 --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) ** vldrb.8 q2, \[r0\] ** vldrb.8 q1, \[r1\] ** vcmp.i8 eq, q2, q1 -** vmrs r3, p0 @ movhi -** uxth r3, r3 -** vmsr p0, r3 @ movhi ** vpst ** vaddt.i8 q3, q2, q1 ** vpst Ah I see, that's the testcase from patch 1/3 that I criticized :) Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more robust? Thanks, Kyrill
On 26/01/2023 15:06, Kyrylo Tkachov wrote: > Hi Andre, > >> -----Original Message----- >> From: Andre Vieira (lists) <andre.simoesdiasvieira@arm.com> >> Sent: Tuesday, January 24, 2023 1:54 PM >> To: gcc-patches@gcc.gnu.org >> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Richard Earnshaw >> <Richard.Earnshaw@arm.com>; Richard Biener <rguenther@suse.de>; >> Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> >> Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE >> predicates before use [PR 107674] >> >> Hi, >> >> This patch teaches GCC that zero-extending a MVE predicate from 16-bits >> to 32-bits and then only using 16-bits is a no-op. >> It does so in two steps: >> - it lets gcc know that it can access any MVE predicate mode using any >> other MVE predicate mode without needing to copy it, using the >> TARGET_MODES_TIEABLE_P hook, >> - it teaches simplify_subreg to optimize a subreg with a vector >> outermode, by replacing this outermode with a same-sized integer mode >> and trying the avalailable optimizations, then if successful it >> surrounds the result with a subreg casting it back to the original >> vector outermode. >> >> This removes the unnecessary zero-extending shown on PR 107674 (though >> it's a sign-extend there), that was introduced in gcc 11. >> >> Bootstrapped on aarch64-none-linux-gnu and regression tested on >> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. >> >> OK for trunk? >> >> gcc/ChangeLog: >> >> PR target/107674 >> * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. >> (arm_modes_tieable_p): Make MVE predicate modes tieable. >> * config/arm/arm.h (VALID_MVE_PRED_MODE): New define. >> * simplify-rtx.cc (simplify_context::simplify_subreg): Teach >> simplify_subreg to simplify subregs where the outermode is not >> scalar. > > The arm changes look ok to me. We'll want a midend maintainer to have a look at simplify-rtx.cc > >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary >> zero-extend. > > diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > index 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5acf9af0a2155b5c5 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) > ** vldrb.8 q2, \[r0\] > ** vldrb.8 q1, \[r1\] > ** vcmp.i8 eq, q2, q1 > -** vmrs r3, p0 @ movhi > -** uxth r3, r3 > -** vmsr p0, r3 @ movhi > ** vpst > ** vaddt.i8 q3, q2, q1 > ** vpst > > Ah I see, that's the testcase from patch 1/3 that I criticized :) > Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more robust? > Thanks, > Kyrill I could, but I would rather not. I have a patch series waiting for GCC 14 that does further improvements to this (and other VPST codegen) sequences and if I do scan for 'absence' of an instruction I have to break them up into single tests each. Also it wouldn't then fail if we start spilling the predicate directly to memory for instance. Like I mentioned in the previous patch, the sequence is unlikely to be able to change through scheduling (other than maybe the reordering of the loads through some bad luck, but I could make it robust to that).
> -----Original Message----- > From: Andre Vieira (lists) <andre.simoesdiasvieira@arm.com> > Sent: Friday, January 27, 2023 9:58 AM > To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc-patches@gcc.gnu.org > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Richard Earnshaw > <Richard.Earnshaw@arm.com>; Richard Biener <rguenther@suse.de> > Subject: Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE > predicates before use [PR 107674] > > > > On 26/01/2023 15:06, Kyrylo Tkachov wrote: > > Hi Andre, > > > >> -----Original Message----- > >> From: Andre Vieira (lists) <andre.simoesdiasvieira@arm.com> > >> Sent: Tuesday, January 24, 2023 1:54 PM > >> To: gcc-patches@gcc.gnu.org > >> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Richard Earnshaw > >> <Richard.Earnshaw@arm.com>; Richard Biener <rguenther@suse.de>; > >> Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> > >> Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE > >> predicates before use [PR 107674] > >> > >> Hi, > >> > >> This patch teaches GCC that zero-extending a MVE predicate from 16-bits > >> to 32-bits and then only using 16-bits is a no-op. > >> It does so in two steps: > >> - it lets gcc know that it can access any MVE predicate mode using any > >> other MVE predicate mode without needing to copy it, using the > >> TARGET_MODES_TIEABLE_P hook, > >> - it teaches simplify_subreg to optimize a subreg with a vector > >> outermode, by replacing this outermode with a same-sized integer mode > >> and trying the avalailable optimizations, then if successful it > >> surrounds the result with a subreg casting it back to the original > >> vector outermode. > >> > >> This removes the unnecessary zero-extending shown on PR 107674 > (though > >> it's a sign-extend there), that was introduced in gcc 11. > >> > >> Bootstrapped on aarch64-none-linux-gnu and regression tested on > >> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. > >> > >> OK for trunk? > >> > >> gcc/ChangeLog: > >> > >> PR target/107674 > >> * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. > >> (arm_modes_tieable_p): Make MVE predicate modes tieable. > >> * config/arm/arm.h (VALID_MVE_PRED_MODE): New define. > >> * simplify-rtx.cc (simplify_context::simplify_subreg): Teach > >> simplify_subreg to simplify subregs where the outermode is not > >> scalar. > > > > The arm changes look ok to me. We'll want a midend maintainer to have a > look at simplify-rtx.cc > > > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary > >> zero-extend. > > > > diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > > index > 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5 > acf9af0a2155b5c5 100644 > > --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > > +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > > @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) > > ** vldrb.8 q2, \[r0\] > > ** vldrb.8 q1, \[r1\] > > ** vcmp.i8 eq, q2, q1 > > -** vmrs r3, p0 @ movhi > > -** uxth r3, r3 > > -** vmsr p0, r3 @ movhi > > ** vpst > > ** vaddt.i8 q3, q2, q1 > > ** vpst > > > > Ah I see, that's the testcase from patch 1/3 that I criticized :) > > Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more > robust? > > Thanks, > > Kyrill > I could, but I would rather not. I have a patch series waiting for GCC > 14 that does further improvements to this (and other VPST codegen) > sequences and if I do scan for 'absence' of an instruction I have to > break them up into single tests each. Also it wouldn't then fail if we > start spilling the predicate directly to memory for instance. Like I > mentioned in the previous patch, the sequence is unlikely to be able to > change through scheduling (other than maybe the reordering of the loads > through some bad luck, but I could make it robust to that). Ok, looks like it was thought through, so fine by me. Thanks, Kyrill
Changed the testcase to be more robust (as per the discussion for the first patch). Still need the OK for the mid-end (simplify-rtx) part. Kind regards, Andre On 27/01/2023 09:59, Kyrylo Tkachov wrote: > > >> -----Original Message----- >> From: Andre Vieira (lists) <andre.simoesdiasvieira@arm.com> >> Sent: Friday, January 27, 2023 9:58 AM >> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc-patches@gcc.gnu.org >> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Richard Earnshaw >> <Richard.Earnshaw@arm.com>; Richard Biener <rguenther@suse.de> >> Subject: Re: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE >> predicates before use [PR 107674] >> >> >> >> On 26/01/2023 15:06, Kyrylo Tkachov wrote: >>> Hi Andre, >>> >>>> -----Original Message----- >>>> From: Andre Vieira (lists) <andre.simoesdiasvieira@arm.com> >>>> Sent: Tuesday, January 24, 2023 1:54 PM >>>> To: gcc-patches@gcc.gnu.org >>>> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Richard Earnshaw >>>> <Richard.Earnshaw@arm.com>; Richard Biener <rguenther@suse.de>; >>>> Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> >>>> Subject: [PATCH 2/3] arm: Remove unnecessary zero-extending of MVE >>>> predicates before use [PR 107674] >>>> >>>> Hi, >>>> >>>> This patch teaches GCC that zero-extending a MVE predicate from 16-bits >>>> to 32-bits and then only using 16-bits is a no-op. >>>> It does so in two steps: >>>> - it lets gcc know that it can access any MVE predicate mode using any >>>> other MVE predicate mode without needing to copy it, using the >>>> TARGET_MODES_TIEABLE_P hook, >>>> - it teaches simplify_subreg to optimize a subreg with a vector >>>> outermode, by replacing this outermode with a same-sized integer mode >>>> and trying the avalailable optimizations, then if successful it >>>> surrounds the result with a subreg casting it back to the original >>>> vector outermode. >>>> >>>> This removes the unnecessary zero-extending shown on PR 107674 >> (though >>>> it's a sign-extend there), that was introduced in gcc 11. >>>> >>>> Bootstrapped on aarch64-none-linux-gnu and regression tested on >>>> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. >>>> >>>> OK for trunk? >>>> >>>> gcc/ChangeLog: >>>> >>>> PR target/107674 >>>> * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. >>>> (arm_modes_tieable_p): Make MVE predicate modes tieable. >>>> * config/arm/arm.h (VALID_MVE_PRED_MODE): New define. >>>> * simplify-rtx.cc (simplify_context::simplify_subreg): Teach >>>> simplify_subreg to simplify subregs where the outermode is not >>>> scalar. >>> >>> The arm changes look ok to me. We'll want a midend maintainer to have a >> look at simplify-rtx.cc >>> >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary >>>> zero-extend. >>> >>> diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c >> b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c >>> index >> 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5 >> acf9af0a2155b5c5 100644 >>> --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c >>> +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c >>> @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) >>> ** vldrb.8 q2, \[r0\] >>> ** vldrb.8 q1, \[r1\] >>> ** vcmp.i8 eq, q2, q1 >>> -** vmrs r3, p0 @ movhi >>> -** uxth r3, r3 >>> -** vmsr p0, r3 @ movhi >>> ** vpst >>> ** vaddt.i8 q3, q2, q1 >>> ** vpst >>> >>> Ah I see, that's the testcase from patch 1/3 that I criticized :) >>> Maybe if we just scan for absence of an uxth, vmrs and vmsr it will be more >> robust? >>> Thanks, >>> Kyrill >> I could, but I would rather not. I have a patch series waiting for GCC >> 14 that does further improvements to this (and other VPST codegen) >> sequences and if I do scan for 'absence' of an instruction I have to >> break them up into single tests each. Also it wouldn't then fail if we >> start spilling the predicate directly to memory for instance. Like I >> mentioned in the previous patch, the sequence is unlikely to be able to >> change through scheduling (other than maybe the reordering of the loads >> through some bad luck, but I could make it robust to that). > > Ok, looks like it was thought through, so fine by me. > Thanks, > Kyrill diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 632728371d5cef364e47bf33bfa0faba738db871..8325e7a876e2e03f14cba07385cc5a1ddd771655 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1104,6 +1104,10 @@ extern const int arm_arch_cde_coproc_bits[]; || (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \ || (MODE) == V2DFmode) +#define VALID_MVE_PRED_MODE(MODE) \ + ((MODE) == HImode \ + || (MODE) == V16BImode || (MODE) == V8BImode || (MODE) == V4BImode) + #define VALID_MVE_SI_MODE(MODE) \ ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \ || (MODE) == V16QImode) diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index efc48349dd3508e6790c1a9f3bba5da689a986bc..4d9d202cad1f39ba386df9d8e4277007fd960262 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -25656,10 +25656,7 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode) return false; if (IS_VPR_REGNUM (regno)) - return mode == HImode - || mode == V16BImode - || mode == V8BImode - || mode == V4BImode; + return VALID_MVE_PRED_MODE (mode); if (TARGET_THUMB1) /* For the Thumb we only allow values bigger than SImode in @@ -25738,6 +25735,10 @@ arm_modes_tieable_p (machine_mode mode1, machine_mode mode2) if (GET_MODE_CLASS (mode1) == GET_MODE_CLASS (mode2)) return true; + if (TARGET_HAVE_MVE + && (VALID_MVE_PRED_MODE (mode1) && VALID_MVE_PRED_MODE (mode2))) + return true; + /* We specifically want to allow elements of "structure" modes to be tieable to the structure. This more general condition allows other rarer situations too. */ diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index 7fb1e97fbea4e7b8b091f5724ebe0cb61eee7ec3..a951272186585c0a5cc3e0155285e7a635865f42 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -7652,6 +7652,22 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op, } } + /* Try simplifying a SUBREG expression of a non-integer OUTERMODE by using a + NEW_OUTERMODE of the same size instead, other simplifications rely on + integer to integer subregs and we'd potentially miss out on optimizations + otherwise. */ + if (known_gt (GET_MODE_SIZE (innermode), + GET_MODE_SIZE (outermode)) + && SCALAR_INT_MODE_P (innermode) + && !SCALAR_INT_MODE_P (outermode) + && int_mode_for_size (GET_MODE_BITSIZE (outermode), + 0).exists (&int_outermode)) + { + rtx tem = simplify_subreg (int_outermode, op, innermode, byte); + if (tem) + return simplify_gen_subreg (outermode, tem, GET_MODE (tem), byte); + } + /* If OP is a vector comparison and the subreg is not changing the number of elements or the size of the elements, change the result of the comparison to the new mode. */ diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c index 28e4697c3c5bcc89b37fcb296f4b46c861aed27d..41f4e3805d62d0343c4035a328250fb8c7b0c47f 100644 --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c @@ -16,12 +16,9 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) ** vldrb.8 q[0-9]+, \[r[0-9]+\] ** vldrb.8 q[0-9]+, \[r[0-9]+\] ** vcmp.i8 eq, q[0-9]+, q[0-9]+ -** vmrs (r[0-9]+), p0 @ movhi -** uxth \1, \1 -** vmsr p0, \1 @ movhi ** vpst ** vaddt.i8 (q[0-9]+), q[0-9]+, q[0-9]+ ** vpst -** vstrbt.8 \2, \[r[0-9]+\] +** vstrbt.8 \1, \[r[0-9]+\] ** bx lr */
"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes: > Hi, > > This patch teaches GCC that zero-extending a MVE predicate from 16-bits > to 32-bits and then only using 16-bits is a no-op. > It does so in two steps: > - it lets gcc know that it can access any MVE predicate mode using any > other MVE predicate mode without needing to copy it, using the > TARGET_MODES_TIEABLE_P hook, > - it teaches simplify_subreg to optimize a subreg with a vector > outermode, by replacing this outermode with a same-sized integer mode > and trying the avalailable optimizations, then if successful it > surrounds the result with a subreg casting it back to the original > vector outermode. > > This removes the unnecessary zero-extending shown on PR 107674 (though > it's a sign-extend there), that was introduced in gcc 11. > > Bootstrapped on aarch64-none-linux-gnu and regression tested on > arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. > > OK for trunk? > > gcc/ChangeLog: > > PR target/107674 > * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. > (arm_modes_tieable_p): Make MVE predicate modes tieable. > * config/arm/arm.h (VALID_MVE_PRED_MODE): New define. > * simplify-rtx.cc (simplify_context::simplify_subreg): Teach > simplify_subreg to simplify subregs where the outermode is not scalar. > > gcc/testsuite/ChangeLog: > > * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary > zero-extend. > > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h > index 6f7ecf9128047647fc41677e634cd9612a13242b..4352c830cb6d2e632a225edea861b5ceb35dd035 100644 > --- a/gcc/config/arm/arm.h > +++ b/gcc/config/arm/arm.h > @@ -1091,6 +1091,10 @@ extern const int arm_arch_cde_coproc_bits[]; > || (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \ > || (MODE) == V2DFmode) > > +#define VALID_MVE_PRED_MODE(MODE) \ > + ((MODE) == HImode \ > + || (MODE) == V16BImode || (MODE) == V8BImode || (MODE) == V4BImode) > + > #define VALID_MVE_SI_MODE(MODE) \ > ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \ > || (MODE) == V16QImode) > diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc > index 3f171188de513e258369397e4726afe27bd9fdbf..18460ef5280be8c1df85eff424a1bf66d6019c0a 100644 > --- a/gcc/config/arm/arm.cc > +++ b/gcc/config/arm/arm.cc > @@ -25564,10 +25564,7 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode) > return false; > > if (IS_VPR_REGNUM (regno)) > - return mode == HImode > - || mode == V16BImode > - || mode == V8BImode > - || mode == V4BImode; > + return VALID_MVE_PRED_MODE (mode); > > if (TARGET_THUMB1) > /* For the Thumb we only allow values bigger than SImode in > @@ -25646,6 +25643,10 @@ arm_modes_tieable_p (machine_mode mode1, machine_mode mode2) > if (GET_MODE_CLASS (mode1) == GET_MODE_CLASS (mode2)) > return true; > > + if (TARGET_HAVE_MVE > + && (VALID_MVE_PRED_MODE (mode1) && VALID_MVE_PRED_MODE (mode2))) > + return true; > + > /* We specifically want to allow elements of "structure" modes to > be tieable to the structure. This more general condition allows > other rarer situations too. */ > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index 7fb1e97fbea4e7b8b091f5724ebe0cb61eee7ec3..a951272186585c0a5cc3e0155285e7a635865f42 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -7652,6 +7652,22 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op, > } > } > > + /* Try simplifying a SUBREG expression of a non-integer OUTERMODE by using a > + NEW_OUTERMODE of the same size instead, other simplifications rely on > + integer to integer subregs and we'd potentially miss out on optimizations > + otherwise. */ How about: /* If the outer mode is not integral, try taking a subreg with the equivalent integer outer mode and then bitcasting the result. Other simplifications rely on integer to integer subregs and we'd potentially miss out on optimizations otherwise. */ (Think it's easier to read as two sentences, and there's no new_outermode in the code.) > + if (known_gt (GET_MODE_SIZE (innermode), > + GET_MODE_SIZE (outermode)) > + && SCALAR_INT_MODE_P (innermode) > + && !SCALAR_INT_MODE_P (outermode) > + && int_mode_for_size (GET_MODE_BITSIZE (outermode), > + 0).exists (&int_outermode)) > + { > + rtx tem = simplify_subreg (int_outermode, op, innermode, byte); > + if (tem) > + return simplify_gen_subreg (outermode, tem, GET_MODE (tem), byte); Perhaps safer as s/GET_MODE (tem)/int_outermode/, in case TEM turns out to be constant. OK for the simplify-rtx.cc part with those changes. Thanks, Richard > + } > + > /* If OP is a vector comparison and the subreg is not changing the > number of elements or the size of the elements, change the result > of the comparison to the new mode. */ > diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > index 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5acf9af0a2155b5c5 100644 > --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c > @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) > ** vldrb.8 q2, \[r0\] > ** vldrb.8 q1, \[r1\] > ** vcmp.i8 eq, q2, q1 > -** vmrs r3, p0 @ movhi > -** uxth r3, r3 > -** vmsr p0, r3 @ movhi > ** vpst > ** vaddt.i8 q3, q2, q1 > ** vpst
Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes: >> Hi, >> >> This patch teaches GCC that zero-extending a MVE predicate from 16-bits >> to 32-bits and then only using 16-bits is a no-op. >> It does so in two steps: >> - it lets gcc know that it can access any MVE predicate mode using any >> other MVE predicate mode without needing to copy it, using the >> TARGET_MODES_TIEABLE_P hook, >> - it teaches simplify_subreg to optimize a subreg with a vector >> outermode, by replacing this outermode with a same-sized integer mode >> and trying the avalailable optimizations, then if successful it >> surrounds the result with a subreg casting it back to the original >> vector outermode. >> >> This removes the unnecessary zero-extending shown on PR 107674 (though >> it's a sign-extend there), that was introduced in gcc 11. >> >> Bootstrapped on aarch64-none-linux-gnu and regression tested on >> arm-none-eabi and armeb-none-eabi for armv8.1-m.main+mve.fp. >> >> OK for trunk? >> >> gcc/ChangeLog: >> >> PR target/107674 >> * conig/arm/arm.cc (arm_hard_regno_mode_ok): Use new MACRO. >> (arm_modes_tieable_p): Make MVE predicate modes tieable. >> * config/arm/arm.h (VALID_MVE_PRED_MODE): New define. >> * simplify-rtx.cc (simplify_context::simplify_subreg): Teach >> simplify_subreg to simplify subregs where the outermode is not scalar. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/arm/mve/mve_vpt.c: Change to remove unecessary >> zero-extend. >> >> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h >> index 6f7ecf9128047647fc41677e634cd9612a13242b..4352c830cb6d2e632a225edea861b5ceb35dd035 100644 >> --- a/gcc/config/arm/arm.h >> +++ b/gcc/config/arm/arm.h >> @@ -1091,6 +1091,10 @@ extern const int arm_arch_cde_coproc_bits[]; >> || (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \ >> || (MODE) == V2DFmode) >> >> +#define VALID_MVE_PRED_MODE(MODE) \ >> + ((MODE) == HImode \ >> + || (MODE) == V16BImode || (MODE) == V8BImode || (MODE) == V4BImode) >> + >> #define VALID_MVE_SI_MODE(MODE) \ >> ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \ >> || (MODE) == V16QImode) >> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc >> index 3f171188de513e258369397e4726afe27bd9fdbf..18460ef5280be8c1df85eff424a1bf66d6019c0a 100644 >> --- a/gcc/config/arm/arm.cc >> +++ b/gcc/config/arm/arm.cc >> @@ -25564,10 +25564,7 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode) >> return false; >> >> if (IS_VPR_REGNUM (regno)) >> - return mode == HImode >> - || mode == V16BImode >> - || mode == V8BImode >> - || mode == V4BImode; >> + return VALID_MVE_PRED_MODE (mode); >> >> if (TARGET_THUMB1) >> /* For the Thumb we only allow values bigger than SImode in >> @@ -25646,6 +25643,10 @@ arm_modes_tieable_p (machine_mode mode1, machine_mode mode2) >> if (GET_MODE_CLASS (mode1) == GET_MODE_CLASS (mode2)) >> return true; >> >> + if (TARGET_HAVE_MVE >> + && (VALID_MVE_PRED_MODE (mode1) && VALID_MVE_PRED_MODE (mode2))) >> + return true; >> + >> /* We specifically want to allow elements of "structure" modes to >> be tieable to the structure. This more general condition allows >> other rarer situations too. */ >> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc >> index 7fb1e97fbea4e7b8b091f5724ebe0cb61eee7ec3..a951272186585c0a5cc3e0155285e7a635865f42 100644 >> --- a/gcc/simplify-rtx.cc >> +++ b/gcc/simplify-rtx.cc >> @@ -7652,6 +7652,22 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op, >> } >> } >> >> + /* Try simplifying a SUBREG expression of a non-integer OUTERMODE by using a >> + NEW_OUTERMODE of the same size instead, other simplifications rely on >> + integer to integer subregs and we'd potentially miss out on optimizations >> + otherwise. */ > > How about: > > /* If the outer mode is not integral, try taking a subreg with the > equivalent integer outer mode and then bitcasting the result. > Other simplifications rely on integer to integer subregs and we'd > potentially miss out on optimizations otherwise. */ > > (Think it's easier to read as two sentences, and there's no > new_outermode in the code.) > >> + if (known_gt (GET_MODE_SIZE (innermode), >> + GET_MODE_SIZE (outermode)) >> + && SCALAR_INT_MODE_P (innermode) >> + && !SCALAR_INT_MODE_P (outermode) >> + && int_mode_for_size (GET_MODE_BITSIZE (outermode), >> + 0).exists (&int_outermode)) >> + { >> + rtx tem = simplify_subreg (int_outermode, op, innermode, byte); >> + if (tem) >> + return simplify_gen_subreg (outermode, tem, GET_MODE (tem), byte); > > Perhaps safer as s/GET_MODE (tem)/int_outermode/, in case TEM turns > out to be constant. Also, the final line should pass 0 rather than byte. (The tem = line is OK.) Thanks, Richard > > OK for the simplify-rtx.cc part with those changes. > > Thanks, > Richard > > >> + } >> + >> /* If OP is a vector comparison and the subreg is not changing the >> number of elements or the size of the elements, change the result >> of the comparison to the new mode. */ >> diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c >> index 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5acf9af0a2155b5c5 100644 >> --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c >> +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c >> @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) >> ** vldrb.8 q2, \[r0\] >> ** vldrb.8 q1, \[r1\] >> ** vcmp.i8 eq, q2, q1 >> -** vmrs r3, p0 @ movhi >> -** uxth r3, r3 >> -** vmsr p0, r3 @ movhi >> ** vpst >> ** vaddt.i8 q3, q2, q1 >> ** vpst
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 6f7ecf9128047647fc41677e634cd9612a13242b..4352c830cb6d2e632a225edea861b5ceb35dd035 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1091,6 +1091,10 @@ extern const int arm_arch_cde_coproc_bits[]; || (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \ || (MODE) == V2DFmode) +#define VALID_MVE_PRED_MODE(MODE) \ + ((MODE) == HImode \ + || (MODE) == V16BImode || (MODE) == V8BImode || (MODE) == V4BImode) + #define VALID_MVE_SI_MODE(MODE) \ ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \ || (MODE) == V16QImode) diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index 3f171188de513e258369397e4726afe27bd9fdbf..18460ef5280be8c1df85eff424a1bf66d6019c0a 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -25564,10 +25564,7 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode) return false; if (IS_VPR_REGNUM (regno)) - return mode == HImode - || mode == V16BImode - || mode == V8BImode - || mode == V4BImode; + return VALID_MVE_PRED_MODE (mode); if (TARGET_THUMB1) /* For the Thumb we only allow values bigger than SImode in @@ -25646,6 +25643,10 @@ arm_modes_tieable_p (machine_mode mode1, machine_mode mode2) if (GET_MODE_CLASS (mode1) == GET_MODE_CLASS (mode2)) return true; + if (TARGET_HAVE_MVE + && (VALID_MVE_PRED_MODE (mode1) && VALID_MVE_PRED_MODE (mode2))) + return true; + /* We specifically want to allow elements of "structure" modes to be tieable to the structure. This more general condition allows other rarer situations too. */ diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index 7fb1e97fbea4e7b8b091f5724ebe0cb61eee7ec3..a951272186585c0a5cc3e0155285e7a635865f42 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -7652,6 +7652,22 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op, } } + /* Try simplifying a SUBREG expression of a non-integer OUTERMODE by using a + NEW_OUTERMODE of the same size instead, other simplifications rely on + integer to integer subregs and we'd potentially miss out on optimizations + otherwise. */ + if (known_gt (GET_MODE_SIZE (innermode), + GET_MODE_SIZE (outermode)) + && SCALAR_INT_MODE_P (innermode) + && !SCALAR_INT_MODE_P (outermode) + && int_mode_for_size (GET_MODE_BITSIZE (outermode), + 0).exists (&int_outermode)) + { + rtx tem = simplify_subreg (int_outermode, op, innermode, byte); + if (tem) + return simplify_gen_subreg (outermode, tem, GET_MODE (tem), byte); + } + /* If OP is a vector comparison and the subreg is not changing the number of elements or the size of the elements, change the result of the comparison to the new mode. */ diff --git a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c index 26a565b79dd1348e361b3aa23a1d6e6d13bffce8..8e562a9f065eff157f63ebd5acf9af0a2155b5c5 100644 --- a/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c +++ b/gcc/testsuite/gcc.target/arm/mve/mve_vpt.c @@ -16,9 +16,6 @@ void test0 (uint8_t *a, uint8_t *b, uint8_t *c) ** vldrb.8 q2, \[r0\] ** vldrb.8 q1, \[r1\] ** vcmp.i8 eq, q2, q1 -** vmrs r3, p0 @ movhi -** uxth r3, r3 -** vmsr p0, r3 @ movhi ** vpst ** vaddt.i8 q3, q2, q1 ** vpst