Message ID: 5936670F.4040404@foss.arm.com
State: New
On 06/06/2017 02:25 AM, Kyrill Tkachov wrote:
> Hi all,
>
> I'm trying to improve some of the RTL-level handling of vector lane
> operations on aarch64, and that involves dealing with a lot of vec_merge
> operations.  One simplification that I noticed missing from simplify-rtx
> is the combination of vec_merge with vec_duplicate.
> In this particular case:
> (vec_merge (vec_duplicate (X)) (const_vector [A, B]) (const_int N))
>
> which can be replaced with
>
> (vec_concat (X) (B)) if N == 1 (0b01) or
> (vec_concat (A) (X)) if N == 2 (0b10).
>
> For the aarch64 testcase in this patch this simplification allows us to
> try to combine:
> (set (reg:V2DI 77 [ x ])
>      (vec_concat:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64])
>          (const_int 0 [0])))
>
> instead of the more complex:
> (set (reg:V2DI 77 [ x ])
>      (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (reg:DI 0 x0 [ y ])
>              [1 *y_3(D)+0 S8 A64]))
>          (const_vector:V2DI [
>                  (const_int 0 [0])
>                  (const_int 0 [0])
>              ])
>          (const_int 1 [0x1])))
>
> For the simplified form above we already have an aarch64 pattern,
> *aarch64_combinez<mode>, which is missing a DImode/DFmode version due to
> an oversight, so this patch also extends that pattern to use the VDC mode
> iterator that includes DImode and DFmode (as well as V2HF, which VD_BHSI
> was missing).  The aarch64 hunk is needed to see the benefit of the
> simplify-rtx.c hunk, so I didn't split them into separate patches.
>
> Before this patch, for the testcase we'd generate:
> construct_lanedi:
>         movi    v0.4s, 0
>         ldr     x0, [x0]
>         ins     v0.d[0], x0
>         ret
>
> construct_lanedf:
>         movi    v0.2d, 0
>         ldr     d1, [x0]
>         ins     v0.d[0], v1.d[0]
>         ret
>
> but now we can generate:
> construct_lanedi:
>         ldr     d0, [x0]
>         ret
>
> construct_lanedf:
>         ldr     d0, [x0]
>         ret
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>         * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
>         Simplify vec_merge of vec_duplicate and const_vector.
>         * config/aarch64/predicates.md (aarch64_simd_or_scalar_imm_zero):
>         New predicate.
>         * config/aarch64/aarch64-simd.md (*aarch64_combinez<mode>): Use VDC
>         mode iterator.  Update predicate on operand 1 to
>         handle non-const_vec constants.  Delete constraints.
>         (*aarch64_combinez_be<mode>): Likewise for operand 2.
>
> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>         * gcc.target/aarch64/construct_lane_zero_1.c: New test.

OK for the simplify-rtx parts.

jeff
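As an aside for readers less familiar with these RTL codes, the N == 1 / N == 2 equivalence described above can be modeled on two-element vectors in plain C. This is an illustrative sketch only; the struct and function names are hypothetical stand-ins for the RTL semantics, not GCC internals:

```c
#include <assert.h>

/* Model a two-element vector as a pair of scalar lanes.  */
typedef struct { long long e0, e1; } v2;

/* (vec_duplicate (X)): broadcast X to both lanes.  */
static v2 vec_duplicate (long long x)
{
  v2 r = { x, x };
  return r;
}

/* (vec_merge A B (const_int N)): bit i of N set means lane i is
   taken from A, otherwise from B.  */
static v2 vec_merge (v2 a, v2 b, int n)
{
  v2 r;
  r.e0 = (n & 1) ? a.e0 : b.e0;
  r.e1 = (n & 2) ? a.e1 : b.e1;
  return r;
}

/* (vec_concat X Y): build the vector [X, Y].  */
static v2 vec_concat (long long x, long long y)
{
  v2 r = { x, y };
  return r;
}
```

With these definitions, for any X and const_vector [A, B], merging the duplicate of X with selector 1 yields [X, B] (= vec_concat (X, B)), and selector 2 yields [A, X] (= vec_concat (A, X)) — exactly the rewrite the patch performs, including the CONST_VECTOR_ELT (op1, 2 - sel) element choice.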
On 27/06/17 23:29, Jeff Law wrote:
> On 06/06/2017 02:25 AM, Kyrill Tkachov wrote:
>> Hi all,
>>
>> [full patch description and ChangeLog quoted above; snipped]
> OK for the simplify-rtx parts.

Thanks Jeff.
Pinging the aarch64 parts at:
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00272.html

I've re-bootstrapped and re-tested the patches on top of current trunk.

Kyrill

> jeff
On 05/07/17 16:14, Kyrill Tkachov wrote:
> On 27/06/17 23:29, Jeff Law wrote:
>> On 06/06/2017 02:25 AM, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> [full patch description and ChangeLog quoted above; snipped]
>> OK for the simplify-rtx parts.
>
> Thanks Jeff.
> Pinging the aarch64 parts at:
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00272.html

Ping.

Thanks,
Kyrill

> I've re-bootstrapped and re-tested the patches on top of current trunk.
>
> Kyrill
>
>> jeff
On Tue, Jun 06, 2017 at 09:25:51AM +0100, Kyrill Tkachov wrote:
> Hi all,
>
> [full patch description quoted above; snipped]
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?

OK.

Thanks,
James

> [ChangeLog entries quoted above; snipped]
On Tue, Jun 6, 2017 at 1:25 AM, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
> Hi all,
>
> [full patch description and ChangeLog quoted above; snipped]

This caused:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85090
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d31857db385e1695d37906f8339620c9c878b50c..77a3a7d6534e5fd3575e33d5a7c607713abd614b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2808,9 +2808,9 @@ (define_insn "aarch64_get_lane<mode>"
 
 (define_insn "*aarch64_combinez<mode>"
   [(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
-	(vec_concat:<VDBL>
-	   (match_operand:VD_BHSI 1 "general_operand" "w,?r,m")
-	   (match_operand:VD_BHSI 2 "aarch64_simd_imm_zero" "Dz,Dz,Dz")))]
+	(vec_concat:<VDBL>
+	  (match_operand:VDC 1 "general_operand" "w,?r,m")
+	  (match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero")))]
   "TARGET_SIMD && !BYTES_BIG_ENDIAN"
   "@
    mov\\t%0.8b, %1.8b
@@ -2824,8 +2824,8 @@ (define_insn "*aarch64_combinez<mode>"
 (define_insn "*aarch64_combinez_be<mode>"
   [(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
         (vec_concat:<VDBL>
-	  (match_operand:VD_BHSI 2 "aarch64_simd_imm_zero" "Dz,Dz,Dz")
-	  (match_operand:VD_BHSI 1 "general_operand" "w,?r,m")))]
+	  (match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero")
+	  (match_operand:VDC 1 "general_operand" "w,?r,m")))]
   "TARGET_SIMD && BYTES_BIG_ENDIAN"
   "@
    mov\\t%0.8b, %1.8b
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 4bd8f45562c017bca736ae466ede6b9e4de0d17a..16e864765cde3bf8f54a11e5fc8db5a53606db30 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -348,6 +348,9 @@ (define_special_predicate "aarch64_simd_imm_zero"
   return aarch64_simd_imm_zero_p (op, mode);
 })
 
+(define_special_predicate "aarch64_simd_or_scalar_imm_zero"
+  (match_test "aarch64_simd_imm_zero_p (op, mode)"))
+
 (define_special_predicate "aarch64_simd_imm_minus_one"
   (match_code "const_vector")
 {
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 51ffcbc9b5fbf71298b186c43c4a50913cc5e792..42824b6c61af37f6b005de75bd1e5ebe7522bdba 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5685,6 +5685,22 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
 	      return op1;
 	    }
 	}
+      /* Replace (vec_merge (vec_duplicate (X)) (const_vector [A, B])
+		  (const_int N))
+	 with (vec_concat (X) (B)) if N == 1 or
+	 (vec_concat (A) (X)) if N == 2.  */
+      if (GET_CODE (op0) == VEC_DUPLICATE
+	  && GET_CODE (op1) == CONST_VECTOR
+	  && CONST_VECTOR_NUNITS (op1) == 2
+	  && GET_MODE_NUNITS (GET_MODE (op0)) == 2
+	  && IN_RANGE (sel, 1, 2))
+	{
+	  rtx newop0 = XEXP (op0, 0);
+	  rtx newop1 = CONST_VECTOR_ELT (op1, 2 - sel);
+	  if (sel == 2)
+	    std::swap (newop0, newop1);
+	  return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
+	}
     }
 
   if (rtx_equal_p (op0, op1)
diff --git a/gcc/testsuite/gcc.target/aarch64/construct_lane_zero_1.c b/gcc/testsuite/gcc.target/aarch64/construct_lane_zero_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..d87f32908280cf7f6ad89d129f0005510ba7cced
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/construct_lane_zero_1.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef long long v2di __attribute__ ((vector_size (16)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+v2di
+construct_lanedi (long long *y)
+{
+  v2di x =
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+    { 0, y[0] }
+#else
+    { y[0], 0 }
+#endif
+    ;
+  return x;
+}
+
+v2df
+construct_lanedf (double *y)
+{
+  v2df x =
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+    { 0.0, y[0] }
+#else
+    { y[0], 0.0 }
+#endif
+    ;
+  return x;
+}
+
+/* Check that creating V2DI and V2DF vectors from a lane with a zero
+   makes use of the D-reg LDR rather than doing explicit lane inserts.  */
+
+/* { dg-final { scan-assembler-times "ldr\td\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-not "ins\t" } } */
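Independently of the scan-assembler checks above, the lane semantics that the testcase relies on can be checked at runtime. The sketch below assumes a little-endian host and a compiler supporting GNU C vector extensions (GCC or Clang); it reuses the construct_lanedi shape from the testcase with the big-endian branch dropped:

```c
#include <assert.h>

/* GNU C vector extension, as in the testcase.  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Little-endian variant of construct_lanedi from the testcase:
   lane 0 comes from memory, lane 1 is zero.  This is the shape
   that should collapse to a single D-register LDR.  */
v2di
construct_lanedi (long long *y)
{
  v2di x = { y[0], 0 };
  return x;
}
```

Vector subscripting (r[0], r[1]) on such types is also a GNU extension, which makes the lane contents easy to assert in a host-side check.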