Message ID | 20210824085530.356808-1-hongtao.liu@intel.com
---|---
State | New
Series | Change illegitimate constant into memref of constant pool in change_zero_ext.
Hi!

On Tue, Aug 24, 2021 at 04:55:30PM +0800, liuhongt wrote:
> This patch extend change_zero_ext to change illegitimate constant
> into constant pool, this will enable simplification of below:

It should be in a separate function.  recog_for_combine will call both.
But not both for the same RTL!  This is important.  Originally, combine
tried only one thing for every combination of input instructions it got.
Every extra attempt causes quite a bit more garbage (not to mention it
takes time as well, recog isn't super cheap), so we should try to not
make recog_for_combine exponential in the variants it tries.

And of course the function name should always be descriptive of what a
function does :-)

> change_zero_ext (rtx pat)
> @@ -11417,6 +11418,23 @@ change_zero_ext (rtx pat)
>      {
>        rtx x = **iter;
>        scalar_int_mode mode, inner_mode;
> +      machine_mode const_mode = GET_MODE (x);
> +
> +      /* Change illegitimate constant into memref of constant pool.  */
> +      if (CONSTANT_P (x)
> +          && !const_vec_duplicate_p (x)

This is x86-specific?  It makes no sense in generic code, anyway.  Or if
it does it needs a big fat comment :-)

> +          && const_mode != BLKmode
> +          && GET_CODE (x) != HIGH
> +          && GET_MODE_SIZE (const_mode).is_constant ()
> +          && !targetm.legitimate_constant_p (const_mode, x)
> +          && !targetm.cannot_force_const_mem (const_mode, x))

You should only test that it did not recog, and then force a constant
to memory.  You do not want to do this for every constant (rotate by 42
won't likely match better if you force 42 to memory) so some
sophistication will help here, but please do not make it target-
specific.

> +        {
> +          x = force_const_mem (GET_MODE (x), x);

That mode is const_mode.

Ideally you will have some example where it benefits some other target,
too.  Running recog twice for a big fraction of all combine attempts,
for no benefit at all, is not a good idea.  The zext* thing is there
because combine *itself* creates a lot of extra zext*, whether those
exist for the target or not.  So this isn't obvious precedent (and that
wouldn't mean it is a good idea anyway ;-) )


Segher
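A minimal sketch of the structure this review asks for, assuming a hypothetical helper named change_illegitimate_constants; it is not the actual combine.c code, only an illustration of trying each rewrite independently, each behind its own undo marker, so the number of recog calls stays linear in the number of variants rather than exponential:

/* Sketch only, not real combine.c code: try each rewrite separately,
   never stacking one on top of the other.  change_illegitimate_constants
   is a hypothetical name for the new helper.  */
static int
recog_for_combine_sketch (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
{
  rtx pat = *pnewpat;
  int insn_code_number = recog_for_combine_1 (pnewpat, insn, pnotes);
  if (insn_code_number >= 0 || check_asm_operands (pat))
    return insn_code_number;

  /* Attempt 1: rewrite ZERO_EXTRACT/ZERO_EXTEND as AND (and LSHIFTRT).  */
  void *marker = get_undo_marker ();
  if (GET_CODE (pat) == SET && change_zero_ext (pat))
    {
      insn_code_number = recog_for_combine_1 (pnewpat, insn, pnotes);
      if (insn_code_number >= 0)
        return insn_code_number;
      undo_to_marker (marker);
    }

  /* Attempt 2: force illegitimate constants into the constant pool,
     starting again from the unmodified pattern, not from attempt 1.  */
  marker = get_undo_marker ();
  if (GET_CODE (pat) == SET && change_illegitimate_constants (pat))
    {
      insn_code_number = recog_for_combine_1 (pnewpat, insn, pnotes);
      if (insn_code_number < 0)
        undo_to_marker (marker);
    }

  return insn_code_number;
}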
On Wed, Aug 25, 2021 at 5:14 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Tue, Aug 24, 2021 at 04:55:30PM +0800, liuhongt wrote:
> > This patch extend change_zero_ext to change illegitimate constant
> > into constant pool, this will enable simplification of below:
>
> It should be in a separate function.  recog_for_combine will call both.
> But not both for the same RTL!  This is important.  Originally, combine
> tried only one thing for every combination of input instructions it got.
> Every extra attempt causes quite a bit more garbage (not to mention it
> takes time as well, recog isn't super cheap), so we should try to not
> make recog_for_combine exponential in the variants it tries.
>
> And of course the function name should always be descriptive of what a
> function does :-)
>
Yes, will do.

> > change_zero_ext (rtx pat)
> > @@ -11417,6 +11418,23 @@ change_zero_ext (rtx pat)
> >      {
> >        rtx x = **iter;
> >        scalar_int_mode mode, inner_mode;
> > +      machine_mode const_mode = GET_MODE (x);
> > +
> > +      /* Change illegitimate constant into memref of constant pool.  */
> > +      if (CONSTANT_P (x)
> > +          && !const_vec_duplicate_p (x)
>
> This is x86-specific?  It makes no sense in generic code, anyway.  Or if
> it does it needs a big fat comment :-)
>
> > +          && const_mode != BLKmode
> > +          && GET_CODE (x) != HIGH
> > +          && GET_MODE_SIZE (const_mode).is_constant ()
> > +          && !targetm.legitimate_constant_p (const_mode, x)
> > +          && !targetm.cannot_force_const_mem (const_mode, x))
>
> You should only test that it did not recog, and then force a constant
> to memory.  You do not want to do this for every constant (rotate by 42
> won't likely match better if you force 42 to memory) so some
> sophistication will help here, but please do not make it target-
> specific.

By the time change_zero_ext is called, insn_code_number must already be
< 0, so the constant pool is only used as a second choice:

---------cut--------------
  if (insn_code_number >= 0 || check_asm_operands (pat))
    return insn_code_number;

  void *marker = get_undo_marker ();
  bool changed = false;

  if (GET_CODE (pat) == SET)
    changed = change_zero_ext (pat);
-------cut end----------------

> > +        {
> > +          x = force_const_mem (GET_MODE (x), x);
>
> That mode is const_mode.
>
> Ideally you will have some example where it benefits some other target,
> too.  Running recog twice for a big fraction of all combine attempts,
> for no benefit at all, is not a good idea.  The zext* thing is there

This optimization is similar: pass_combine extracts a const_vector from
the constant pool and simplifies it to another const_vector, but doesn't
realize that the simplified const_vector should also be put in the
constant pool.  Maybe I should just restrict this optimization to
const_vector only.

> because combine *itself* creates a lot of extra zext*, whether those
> exist for the target or not.  So this isn't obvious precedent (and that
> wouldn't mean it is a good idea anyway ;-) )
>
>
> Segher
On Wed, Aug 25, 2021 at 5:14 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Tue, Aug 24, 2021 at 04:55:30PM +0800, liuhongt wrote:
> > This patch extend change_zero_ext to change illegitimate constant
> > into constant pool, this will enable simplification of below:
>
> It should be in a separate function.  recog_for_combine will call both.
> But not both for the same RTL!  This is important.  Originally, combine
> tried only one thing for every combination of input instructions it got.
> Every extra attempt causes quite a bit more garbage (not to mention it
> takes time as well, recog isn't super cheap), so we should try to not
> make recog_for_combine exponential in the variants it tries.
>
> And of course the function name should always be descriptive of what a
> function does :-)
>
> > change_zero_ext (rtx pat)
> > @@ -11417,6 +11418,23 @@ change_zero_ext (rtx pat)
> >      {
> >        rtx x = **iter;
> >        scalar_int_mode mode, inner_mode;
> > +      machine_mode const_mode = GET_MODE (x);
> > +
> > +      /* Change illegitimate constant into memref of constant pool.  */
> > +      if (CONSTANT_P (x)
> > +          && !const_vec_duplicate_p (x)
>
> This is x86-specific?  It makes no sense in generic code, anyway.  Or if
> it does it needs a big fat comment :-)
>
> > +          && const_mode != BLKmode
> > +          && GET_CODE (x) != HIGH
> > +          && GET_MODE_SIZE (const_mode).is_constant ()
> > +          && !targetm.legitimate_constant_p (const_mode, x)
> > +          && !targetm.cannot_force_const_mem (const_mode, x))
>
> You should only test that it did not recog, and then force a constant
> to memory.  You do not want to do this for every constant (rotate by 42
> won't likely match better if you force 42 to memory) so some
> sophistication will help here, but please do not make it target-
> specific.
>
> > +        {
> > +          x = force_const_mem (GET_MODE (x), x);
>
> That mode is const_mode.
>
> Ideally you will have some example where it benefits some other target,
> too.  Running recog twice for a big fraction of all combine attempts,
> for no benefit at all, is not a good idea.  The zext* thing is there
> because combine *itself* creates a lot of extra zext*, whether those
> exist for the target or not.  So this isn't obvious precedent (and that
> wouldn't mean it is a good idea anyway ;-) )

When trying to construct a testcase for another backend, I realized
that the issue can also be solved by folding _mm_shuffle_ps into gimple
vec_perm_expr.  Let me try that way.

>
>
> Segher
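For reference, a minimal sketch of the equivalence that such folding would rely on: with both operands equal, the 8-bit shuffle immediate is just a four-lane permutation, so _mm_shuffle_ps (m, m, 0xC9) can be expressed with GCC's generic __builtin_shuffle, which is VEC_PERM_EXPR at the GIMPLE level.  The function name below is illustrative only:

#include <x86intrin.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Same lanes as _mm_shuffle_ps (m, m, 0xC9): immediate 0xC9 is binary
   11 00 10 01, i.e. selectors {1, 2, 0, 3} from low lane to high lane.  */
__m128
shuffle_as_perm (__m128 m)
{
  return __builtin_shuffle (m, (v4si) { 1, 2, 0, 3 });
}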
diff --git a/gcc/combine.c b/gcc/combine.c
index cb5fa401fcb..0b2afdf45af 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11404,7 +11404,8 @@ recog_for_combine_1 (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
 
 /* Change every ZERO_EXTRACT and ZERO_EXTEND of a SUBREG that can be
    expressed as an AND and maybe an LSHIFTRT, to that formulation.
-   Return whether anything was so changed.  */
+   Return whether anything was so changed.
+   Also change illegitimate constant into memref of constant pool.  */
 
 static bool
 change_zero_ext (rtx pat)
@@ -11417,6 +11418,23 @@ change_zero_ext (rtx pat)
     {
       rtx x = **iter;
       scalar_int_mode mode, inner_mode;
+      machine_mode const_mode = GET_MODE (x);
+
+      /* Change illegitimate constant into memref of constant pool.  */
+      if (CONSTANT_P (x)
+          && !const_vec_duplicate_p (x)
+          && const_mode != BLKmode
+          && GET_CODE (x) != HIGH
+          && GET_MODE_SIZE (const_mode).is_constant ()
+          && !targetm.legitimate_constant_p (const_mode, x)
+          && !targetm.cannot_force_const_mem (const_mode, x))
+        {
+          x = force_const_mem (GET_MODE (x), x);
+          SUBST (**iter, x);
+          changed = true;
+          continue;
+        }
+
       if (!is_a <scalar_int_mode> (GET_MODE (x), &mode))
 	continue;
       int size;
diff --git a/gcc/testsuite/gcc.target/i386/pr22076.c b/gcc/testsuite/gcc.target/i386/pr22076.c
index 427ffcd4920..866c387280f 100644
--- a/gcc/testsuite/gcc.target/i386/pr22076.c
+++ b/gcc/testsuite/gcc.target/i386/pr22076.c
@@ -15,5 +15,5 @@ void test ()
   x = _mm_add_pi8 (mm0, mm1);
 }
 
-/* { dg-final { scan-assembler-times "movq" 2 } } */
-/* { dg-final { scan-assembler-not "movl" { target nonpic } } } */
+/* { dg-final { scan-assembler-times "movq" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movl" 4 { target ia32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr43147.c b/gcc/testsuite/gcc.target/i386/pr43147.c
new file mode 100644
index 00000000000..3c30f917c06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr43147.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-final { scan-assembler "movaps" } } */
+/* { dg-final { scan-assembler-not "shufps" } } */
+
+#include <x86intrin.h>
+
+__m128
+foo (void)
+{
+  __m128 m = _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f);
+  m = _mm_shuffle_ps(m, m, 0xC9);
+  m = _mm_shuffle_ps(m, m, 0x2D);
+  return m;
+}
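A worked illustration (not part of the patch) of what the new pr43147.c testcase expects: both shuffles permute a constant operand, so their composition is itself a compile-time constant vector, which is why the scan looks for a single movaps constant-pool load and no shufps.

#include <stdio.h>

int
main (void)
{
  /* Decode the two 8-bit shuffle immediates into lane selectors.  */
  int sel1[4] = { 0xC9 & 3, (0xC9 >> 2) & 3, (0xC9 >> 4) & 3, (0xC9 >> 6) & 3 };
  int sel2[4] = { 0x2D & 3, (0x2D >> 2) & 3, (0x2D >> 4) & 3, (0x2D >> 6) & 3 };

  /* _mm_set_ps (1, 2, 3, 4) stores its arguments from high lane to low,
     so lane 0 holds 4.0f.  */
  float m[4] = { 4.0f, 3.0f, 2.0f, 1.0f };
  float r1[4], r2[4];

  for (int i = 0; i < 4; i++)
    r1[i] = m[sel1[i]];
  for (int i = 0; i < 4; i++)
    r2[i] = r1[sel2[i]];

  /* Prints 2 1 4 3: a constant vector, so combine should be able to fold
     the whole sequence into one constant-pool load.  */
  printf ("%g %g %g %g\n", r2[0], r2[1], r2[2], r2[3]);
  return 0;
}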