Message ID | 0637fdd9-f396-57e5-ed4f-4d54594020c9@arm.com |
---|---|
State | New |
Headers | show |
Series | [PR98791] : IRA: Make sure allocno copy mode's are ordered | expand |
On 2021-02-19 5:53 a.m., Andre Vieira (lists) wrote: > Hi, > > This patch makes sure that allocno copies are not created for > unordered modes. The testcases in the PR highlighted a case where an > allocno copy was being created for: > (insn 121 120 123 11 (parallel [ > (set (reg:VNx2QI 217) > (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 > ]) 0))) > (clobber (scratch:VNx16BI)) > ]) 4750 {*vec_duplicatevnx2qi_reg} > (expr_list:REG_DEAD (reg:SI 93 [ _2 ]) > (nil))) > > As the compiler detected that the vec_duplicate<mode>_reg pattern > allowed the input and output operand to be of the same register class, > it tried to create an allocno copy for these two operands, stripping > subregs in the process. However, this meant that the copy was between > VNx2QI and SI, which have unordered mode precisions. > > So at compile time we do not know which of the two modes is smaller > which is a requirement when updating allocno copy costs. > > Regression tested on aarch64-linux-gnu. > > Is this OK for trunk (and after a week backport to gcc-10) ? > OK. Yes, it is wise to wait a bit and see how the patch behaves on the trunk before submitting it to gcc-10 branch. Sometimes such changes can have quite unexpected consequences. But I guess not in this is case. Thank you for working on the issue. > gcc/ChangeLog: > 2021-02-19 Andre Vieira <andre.simoesdiasvieira@arm.com> > > PR rtl-optimization/98791 > * ira-conflicts.c (process_regs_for_copy): Don't create > allocno copies for unordered modes. > > gcc/testsuite/ChangeLog: > 2021-02-19 Andre Vieira <andre.simoesdiasvieira@arm.com> > > PR rtl-optimization/98791 > * gcc.target/aarch64/sve/pr98791.c: New test. >
Hi Andre, Thanks for fixing this. On 19/02/2021 10:53, Andre Vieira (lists) via Gcc-patches wrote: > Hi, > > This patch makes sure that allocno copies are not created for unordered > modes. The testcases in the PR highlighted a case where an allocno copy was > being created for: > (insn 121 120 123 11 (parallel [ > (set (reg:VNx2QI 217) > (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 ]) 0))) > (clobber (scratch:VNx16BI)) > ]) 4750 {*vec_duplicatevnx2qi_reg} > (expr_list:REG_DEAD (reg:SI 93 [ _2 ]) > (nil))) > > As the compiler detected that the vec_duplicate<mode>_reg pattern allowed > the input and output operand to be of the same register class, it tried to > create an allocno copy for these two operands, stripping subregs in the > process. However, this meant that the copy was between VNx2QI and SI, which > have unordered mode precisions. > > So at compile time we do not know which of the two modes is smaller which is > a requirement when updating allocno copy costs. > > Regression tested on aarch64-linux-gnu. > > Is this OK for trunk (and after a week backport to gcc-10) ? > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c > new file mode 100644 > index 0000000000000000000000000000000000000000..ee0c7b51602cacd45f9e33acecb1eaa9f9edebf2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c > @@ -0,0 +1,12 @@ > +/* PR rtl-optimization/98791 */ > +/* { dg-do compile } */ > +/* { dg-options "-O -ftree-vectorize --param=aarch64-autovec-preference=3" } */ > +extern char a[], b[]; > +short c, d; > +long *e; > +void f() { > + for (int g; g < c; g += 1) { > + a[g] = d; > + b[g] = e[g]; > + } > +} For the testcase, you might want to use the one I posted most recently: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791#c3 which avoids the dependency on the aarch64-autovec-preference param (which is in GCC 11 only) as this will simplify backporting. But if it's preferable to have a testcase without SVE intrinsics for GCC 11 then we should stick with that. Cheers, Alex
Hi Alex, On 22/02/2021 10:20, Alex Coplan wrote: > For the testcase, you might want to use the one I posted most recently: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98791#c3 > which avoids the dependency on the aarch64-autovec-preference param > (which is in GCC 11 only) as this will simplify backporting. > > But if it's preferable to have a testcase without SVE intrinsics for GCC > 11 then we should stick with that. I don't see any problem with having SVE intrinsics in the testcase, committed with your other test as I agree it makes the backport easier eventually. Thanks for pointing that out. diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c index 2c2234734c3166872d94d94c5960045cb89ff2a8..d83cfc1c1a708ba04f5e01a395721540e31173f0 100644 --- a/gcc/ira-conflicts.c +++ b/gcc/ira-conflicts.c @@ -275,7 +275,10 @@ process_regs_for_copy (rtx reg1, rtx reg2, bool constraint_p, ira_allocno_t a1 = ira_curr_regno_allocno_map[REGNO (reg1)]; ira_allocno_t a2 = ira_curr_regno_allocno_map[REGNO (reg2)]; - if (!allocnos_conflict_for_copy_p (a1, a2) && offset1 == offset2) + if (!allocnos_conflict_for_copy_p (a1, a2) + && offset1 == offset2 + && ordered_p (GET_MODE_PRECISION (ALLOCNO_MODE (a1)), + GET_MODE_PRECISION (ALLOCNO_MODE (a2)))) { cp = ira_add_allocno_copy (a1, a2, freq, constraint_p, insn, ira_curr_loop_tree_node); diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c new file mode 100644 index 0000000000000000000000000000000000000000..cc1f1831afb68ba70016cbe26f8f9273cfceafa8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c @@ -0,0 +1,12 @@ +/* PR rtl-optimization/98791 */ +/* { dg-do compile } */ +/* { dg-options "-O -ftree-vectorize" } */ +#include <arm_sve.h> +extern char a[11]; +extern long b[]; +void f() { + for (int d; d < 10; d++) { + a[d] = svaddv(svptrue_b8(), svdup_u8(0)); + b[d] = 0; + } +}
On 19/02/2021 15:05, Vladimir Makarov wrote: > > On 2021-02-19 5:53 a.m., Andre Vieira (lists) wrote: >> Hi, >> >> This patch makes sure that allocno copies are not created for >> unordered modes. The testcases in the PR highlighted a case where an >> allocno copy was being created for: >> (insn 121 120 123 11 (parallel [ >> (set (reg:VNx2QI 217) >> (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 >> ]) 0))) >> (clobber (scratch:VNx16BI)) >> ]) 4750 {*vec_duplicatevnx2qi_reg} >> (expr_list:REG_DEAD (reg:SI 93 [ _2 ]) >> (nil))) >> >> As the compiler detected that the vec_duplicate<mode>_reg pattern >> allowed the input and output operand to be of the same register >> class, it tried to create an allocno copy for these two operands, >> stripping subregs in the process. However, this meant that the copy >> was between VNx2QI and SI, which have unordered mode precisions. >> >> So at compile time we do not know which of the two modes is smaller >> which is a requirement when updating allocno copy costs. >> >> Regression tested on aarch64-linux-gnu. >> >> Is this OK for trunk (and after a week backport to gcc-10) ? >> > OK. Yes, it is wise to wait a bit and see how the patch behaves on > the trunk before submitting it to gcc-10 branch. Sometimes such > changes can have quite unexpected consequences. But I guess not in > this is case. > > Is it OK to backport now? The committed patch applies cleanly and I regression tested it on gcc-10 branch for aarch64-linux-gnu. Kind regards, Andre
On 2021-03-10 5:25 a.m., Andre Vieira (lists) wrote: > > On 19/02/2021 15:05, Vladimir Makarov wrote: >> >> On 2021-02-19 5:53 a.m., Andre Vieira (lists) wrote: >>> Hi, >>> >>> This patch makes sure that allocno copies are not created for >>> unordered modes. The testcases in the PR highlighted a case where an >>> allocno copy was being created for: >>> (insn 121 120 123 11 (parallel [ >>> (set (reg:VNx2QI 217) >>> (vec_duplicate:VNx2QI (subreg/s/v:QI (reg:SI 93 [ _2 >>> ]) 0))) >>> (clobber (scratch:VNx16BI)) >>> ]) 4750 {*vec_duplicatevnx2qi_reg} >>> (expr_list:REG_DEAD (reg:SI 93 [ _2 ]) >>> (nil))) >>> >>> As the compiler detected that the vec_duplicate<mode>_reg pattern >>> allowed the input and output operand to be of the same register >>> class, it tried to create an allocno copy for these two operands, >>> stripping subregs in the process. However, this meant that the copy >>> was between VNx2QI and SI, which have unordered mode precisions. >>> >>> So at compile time we do not know which of the two modes is smaller >>> which is a requirement when updating allocno copy costs. >>> >>> Regression tested on aarch64-linux-gnu. >>> >>> Is this OK for trunk (and after a week backport to gcc-10) ? >>> >> OK. Yes, it is wise to wait a bit and see how the patch behaves on >> the trunk before submitting it to gcc-10 branch. Sometimes such >> changes can have quite unexpected consequences. But I guess not in >> this is case. >> >> > Is it OK to backport now? The committed patch applies cleanly and I > regression tested it on gcc-10 branch for aarch64-linux-gnu. > Yes. I did not see any problems with the patch after its submitting.
diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c index 2c2234734c3166872d94d94c5960045cb89ff2a8..d83cfc1c1a708ba04f5e01a395721540e31173f0 100644 --- a/gcc/ira-conflicts.c +++ b/gcc/ira-conflicts.c @@ -275,7 +275,10 @@ process_regs_for_copy (rtx reg1, rtx reg2, bool constraint_p, ira_allocno_t a1 = ira_curr_regno_allocno_map[REGNO (reg1)]; ira_allocno_t a2 = ira_curr_regno_allocno_map[REGNO (reg2)]; - if (!allocnos_conflict_for_copy_p (a1, a2) && offset1 == offset2) + if (!allocnos_conflict_for_copy_p (a1, a2) + && offset1 == offset2 + && ordered_p (GET_MODE_PRECISION (ALLOCNO_MODE (a1)), + GET_MODE_PRECISION (ALLOCNO_MODE (a2)))) { cp = ira_add_allocno_copy (a1, a2, freq, constraint_p, insn, ira_curr_loop_tree_node); diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c new file mode 100644 index 0000000000000000000000000000000000000000..ee0c7b51602cacd45f9e33acecb1eaa9f9edebf2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98791.c @@ -0,0 +1,12 @@ +/* PR rtl-optimization/98791 */ +/* { dg-do compile } */ +/* { dg-options "-O -ftree-vectorize --param=aarch64-autovec-preference=3" } */ +extern char a[], b[]; +short c, d; +long *e; +void f() { + for (int g; g < c; g += 1) { + a[g] = d; + b[g] = e[g]; + } +}