Message ID | CACgzC7Bx7fSLmYysz=09D4yi+_QcfLo4BXK3gEGa8kmywE0_JQ@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Tue, Aug 5, 2014 at 10:31 AM, Zhenqiang Chen <zhenqiang.chen@linaro.org> wrote: > Hi, > > For some large constants, ARM will split them during expanding, which > makes impossible to hoist them out the loop or shared by different > references (refer the test case in the patch). > > The patch keeps some constants in registers. If the constant can not > be optimized, the cprop and combine passes can optimize them as what > we do in current expand pass with > > define_insn_and_split "*arm_subsi3_insn" > define_insn_and_split "*arm_andsi3_insn" > define_insn_and_split "*iorsi3_insn" > define_insn_and_split "*arm_xorsi3" > > The patch does not modify addsi3 since the define_insn_and_split > "*arm_addsi3" is only valid when (reload_completed || > !arm_eliminable_register (operands[1])). The cprop and combine passes > can not optimize the large constant if we put it in register, which > will lead to regression. > > For logic operators, the patch skips changes for constants: > > INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) > > since expand pass always uses "sign-extend" to get the value > (trunc_int_for_mode called from immed_wide_int_const) for rtl, and > logs show most negative values are UNSIGNED when they are TREE node. > And combine pass is smart enough to recover the negative value to > positive value. I am unable to verify any change in code generation for this testcase with and without the patch when I had a play with the patch. what gives ? Ramana > > Bootstrap and no make check regression on Chrome book. > For coremark, dhrystone and eembcv1, no any code size and performance > change on Cortex-M4. > No any file in CSiBE has code size change for Cortex-A15 and Cortex-M4. > No Spec2000 performance regression on Chrome book and dumped assemble > codes only show very few difference. > > OK for trunk? > > Thanks! > -Zhenqiang > > ChangeLog: > 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > * config/arm/arm.md (subsi3, andsi3, iorsi3, xorsi3): Keep some > large constants in register other than split them. > > testsuite/ChangeLog: > 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > * gcc.target/arm/maskdata.c: New test.
On 05/08/14 10:31, Zhenqiang Chen wrote: > Hi, > > For some large constants, ARM will split them during expanding, which > makes impossible to hoist them out the loop or shared by different > references (refer the test case in the patch). > > The patch keeps some constants in registers. If the constant can not > be optimized, the cprop and combine passes can optimize them as what > we do in current expand pass with > > define_insn_and_split "*arm_subsi3_insn" > define_insn_and_split "*arm_andsi3_insn" > define_insn_and_split "*iorsi3_insn" > define_insn_and_split "*arm_xorsi3" > > The patch does not modify addsi3 since the define_insn_and_split > "*arm_addsi3" is only valid when (reload_completed || > !arm_eliminable_register (operands[1])). The cprop and combine passes > can not optimize the large constant if we put it in register, which > will lead to regression. > > For logic operators, the patch skips changes for constants: > > INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) > > since expand pass always uses "sign-extend" to get the value > (trunc_int_for_mode called from immed_wide_int_const) for rtl, and > logs show most negative values are UNSIGNED when they are TREE node. > And combine pass is smart enough to recover the negative value to > positive value. > > Bootstrap and no make check regression on Chrome book. > For coremark, dhrystone and eembcv1, no any code size and performance > change on Cortex-M4. > No any file in CSiBE has code size change for Cortex-A15 and Cortex-M4. > No Spec2000 performance regression on Chrome book and dumped assemble > codes only show very few difference. > > OK for trunk? > Not yet. I think the principle is sound, but there are some issues that need sorting out. 1) We shouldn't do this when not optimizing; there's no advantage to hoisting out the constants in that case. Indeed, it may not be worth-while at -O1 either. 2) I think you should be using const_ok_for_op rather than working through all the permitted values. If that function isn't doing the right thing, then it probably needs extending to handle those cases as well. 3) Why haven't you handled addsi3? See below for some specific comments. > Thanks! > -Zhenqiang > > ChangeLog: > 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > * config/arm/arm.md (subsi3, andsi3, iorsi3, xorsi3): Keep some > large constants in register other than split them. > > testsuite/ChangeLog: > 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > * gcc.target/arm/maskdata.c: New test. > > > skip-split.patch > > > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > index bd8ea8f..c8b3001 100644 > --- a/gcc/config/arm/arm.md > +++ b/gcc/config/arm/arm.md > @@ -1162,9 +1162,16 @@ > { > if (TARGET_32BIT) > { > - arm_split_constant (MINUS, SImode, NULL_RTX, > - INTVAL (operands[1]), operands[0], > - operands[2], optimize && can_create_pseudo_p ()); > + if (!const_ok_for_arm (INTVAL (operands[1])) > + && can_create_pseudo_p ()) > + { > + operands[1] = force_reg (SImode, operands[1]); > + emit_insn (gen_subsi3 (operands[0], operands[1], operands[2])); The final emit_insn shouldn't be be needed. Once you've forced operands[1] into a register, you can just fall out the bottom of the pattern and let the default expansion take over. But... > + } > + else > + arm_split_constant (MINUS, SImode, NULL_RTX, > + INTVAL (operands[1]), operands[0], operands[2], > + optimize && can_create_pseudo_p ()); > DONE; ... You'll need to move the DONE inside the else clause. > } > else /* TARGET_THUMB1 */ > @@ -2077,6 +2084,17 @@ > emit_insn (gen_thumb2_zero_extendqisi2_v6 (operands[0], > operands[1])); > } > + else if (!(const_ok_for_arm (INTVAL (operands[2])) > + || const_ok_for_arm (~INTVAL (operands[2])) > + /* zero_extendhi instruction is efficient enough. */ > + || INTVAL (operands[2]) == 0xffff > + || (INTVAL (operands[2]) < 0 > + && const_ok_for_arm (-INTVAL (operands[2])))) > + && can_create_pseudo_p ()) > + { > + operands[2] = force_reg (SImode, operands[2]); > + emit_insn (gen_andsi3 (operands[0], operands[1], operands[2])); Same here. > + } > else > arm_split_constant (AND, SImode, NULL_RTX, > INTVAL (operands[2]), operands[0], > @@ -2882,9 +2900,20 @@ > { > if (TARGET_32BIT) > { > - arm_split_constant (IOR, SImode, NULL_RTX, > - INTVAL (operands[2]), operands[0], operands[1], > - optimize && can_create_pseudo_p ()); > + if (!(const_ok_for_arm (INTVAL (operands[2])) > + || (TARGET_THUMB2 > + && const_ok_for_arm (~INTVAL (operands[2]))) > + || (INTVAL (operands[2]) < 0 > + && const_ok_for_arm (-INTVAL (operands[2])))) > + && can_create_pseudo_p ()) > + { > + operands[2] = force_reg (SImode, operands[2]); > + emit_insn (gen_iorsi3 (operands[0], operands[1], operands[2])); And here. > + } > + else > + arm_split_constant (IOR, SImode, NULL_RTX, > + INTVAL (operands[2]), operands[0], operands[1], > + optimize && can_create_pseudo_p ()); > DONE; > } > else /* TARGET_THUMB1 */ > @@ -3052,9 +3081,18 @@ > { > if (TARGET_32BIT) > { > - arm_split_constant (XOR, SImode, NULL_RTX, > - INTVAL (operands[2]), operands[0], operands[1], > - optimize && can_create_pseudo_p ()); > + if (!(const_ok_for_arm (INTVAL (operands[2])) > + || (INTVAL (operands[2]) < 0 > + && const_ok_for_arm (-INTVAL (operands[2])))) > + && can_create_pseudo_p ()) > + { > + operands[2] = force_reg (SImode, operands[2]); > + emit_insn (gen_xorsi3 (operands[0], operands[1], operands[2])); And here. > + } > + else > + arm_split_constant (XOR, SImode, NULL_RTX, > + INTVAL (operands[2]), operands[0], operands[1], > + optimize && can_create_pseudo_p ()); > DONE; > } > else /* TARGET_THUMB1 */ > > diff --git a/gcc/testsuite/gcc.target/arm/maskdata.c b/gcc/testsuite/gcc.target/arm/maskdata.c > new file mode 100644 > index 0000000..b4231a4 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/maskdata.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options " -O2 -fno-gcse " } */ > +/* { dg-require-effective-target arm_thumb2_ok } */ > + > +#define MASK 0xfe00ff > +void maskdata (int * data, int len) > +{ > + int i = len; > + for (; i > 0; i -= 2) > + { > + data[i] &= MASK; > + data[i + 1] &= MASK; > + } > +} > +/* { dg-final { scan-assembler "254" } } */ > +/* { dg-final { scan-assembler "255" } } */ >
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index bd8ea8f..c8b3001 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -1162,9 +1162,16 @@ { if (TARGET_32BIT) { - arm_split_constant (MINUS, SImode, NULL_RTX, - INTVAL (operands[1]), operands[0], - operands[2], optimize && can_create_pseudo_p ()); + if (!const_ok_for_arm (INTVAL (operands[1])) + && can_create_pseudo_p ()) + { + operands[1] = force_reg (SImode, operands[1]); + emit_insn (gen_subsi3 (operands[0], operands[1], operands[2])); + } + else + arm_split_constant (MINUS, SImode, NULL_RTX, + INTVAL (operands[1]), operands[0], operands[2], + optimize && can_create_pseudo_p ()); DONE; } else /* TARGET_THUMB1 */ @@ -2077,6 +2084,17 @@ emit_insn (gen_thumb2_zero_extendqisi2_v6 (operands[0], operands[1])); } + else if (!(const_ok_for_arm (INTVAL (operands[2])) + || const_ok_for_arm (~INTVAL (operands[2])) + /* zero_extendhi instruction is efficient enough. */ + || INTVAL (operands[2]) == 0xffff + || (INTVAL (operands[2]) < 0 + && const_ok_for_arm (-INTVAL (operands[2])))) + && can_create_pseudo_p ()) + { + operands[2] = force_reg (SImode, operands[2]); + emit_insn (gen_andsi3 (operands[0], operands[1], operands[2])); + } else arm_split_constant (AND, SImode, NULL_RTX, INTVAL (operands[2]), operands[0], @@ -2882,9 +2900,20 @@ { if (TARGET_32BIT) { - arm_split_constant (IOR, SImode, NULL_RTX, - INTVAL (operands[2]), operands[0], operands[1], - optimize && can_create_pseudo_p ()); + if (!(const_ok_for_arm (INTVAL (operands[2])) + || (TARGET_THUMB2 + && const_ok_for_arm (~INTVAL (operands[2]))) + || (INTVAL (operands[2]) < 0 + && const_ok_for_arm (-INTVAL (operands[2])))) + && can_create_pseudo_p ()) + { + operands[2] = force_reg (SImode, operands[2]); + emit_insn (gen_iorsi3 (operands[0], operands[1], operands[2])); + } + else + arm_split_constant (IOR, SImode, NULL_RTX, + INTVAL (operands[2]), operands[0], operands[1], + optimize && can_create_pseudo_p ()); DONE; } else /* TARGET_THUMB1 */ @@ -3052,9 +3081,18 @@ { if (TARGET_32BIT) { - arm_split_constant (XOR, SImode, NULL_RTX, - INTVAL (operands[2]), operands[0], operands[1], - optimize && can_create_pseudo_p ()); + if (!(const_ok_for_arm (INTVAL (operands[2])) + || (INTVAL (operands[2]) < 0 + && const_ok_for_arm (-INTVAL (operands[2])))) + && can_create_pseudo_p ()) + { + operands[2] = force_reg (SImode, operands[2]); + emit_insn (gen_xorsi3 (operands[0], operands[1], operands[2])); + } + else + arm_split_constant (XOR, SImode, NULL_RTX, + INTVAL (operands[2]), operands[0], operands[1], + optimize && can_create_pseudo_p ()); DONE; } else /* TARGET_THUMB1 */ diff --git a/gcc/testsuite/gcc.target/arm/maskdata.c b/gcc/testsuite/gcc.target/arm/maskdata.c new file mode 100644 index 0000000..b4231a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/maskdata.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options " -O2 -fno-gcse " } */ +/* { dg-require-effective-target arm_thumb2_ok } */ + +#define MASK 0xfe00ff +void maskdata (int * data, int len) +{ + int i = len; + for (; i > 0; i -= 2) + { + data[i] &= MASK; + data[i + 1] &= MASK; + } +} +/* { dg-final { scan-assembler "254" } } */ +/* { dg-final { scan-assembler "255" } } */