Message ID: 037701d8a8fa$4f65ed80$ee31c880$@nextmovesoftware.com
State: New
Series: [x86] Move V1TI shift/rotate lowering from expand to pre-reload split.
On Fri, Aug 5, 2022 at 8:36 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> This patch moves the lowering of 128-bit V1TImode shifts and rotations by
> constant bit counts to sequences of SSE operations from the RTL expansion
> pass to the pre-reload split pass.  Postponing this splitting of shifts
> and rotates enables (will enable) the TImode equivalents of these
> operations/instructions to be considered as candidates by the (TImode)
> STV pass.  Technically, this patch changes the existing expanders to
> continue to lower shifts by variable amounts, but constant operands become
> RTL instructions, specified by define_insn_and_split, that are triggered
> by x86_pre_reload_split.  The one minor complication is that logical
> shifts by multiples of eight don't get split, but are handled by existing
> insn patterns, such as sse2_ashlv1ti3 and sse2_lshrv1ti3.  There should
> be no changes in generated code with this patch, which just adjusts the
> pass in which transformations get applied.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}, with
> no new failures.  Ok for mainline?
>
> 2022-08-05  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> 	* config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
> 	shifts by constant bit counts.
> 	(*ashlv1ti3_internal): New define_insn_and_split that lowers
> 	logical left shifts by constant bit counts, that aren't multiples
> 	of 8, before reload.
> 	(lshrv1ti3): Delay lowering of logical right shifts by constant.
> 	(*lshrv1ti3_internal): New define_insn_and_split that lowers
> 	logical right shifts by constant bit counts, that aren't multiples
> 	of 8, before reload.
> 	(ashrv1ti3): Delay lowering of arithmetic right shifts by
> 	constant bit counts.
> 	(*ashrv1ti3_internal): New define_insn_and_split that lowers
> 	arithmetic right shifts by constant bit counts before reload.
> 	(rotlv1ti3): Delay lowering of rotate left by constant.
> 	(*rotlv1ti3_internal): New define_insn_and_split that lowers
> 	rotate left by constant bit counts before reload.
> 	(rotrv1ti3): Delay lowering of rotate right by constant.
> 	(*rotrv1ti3_internal): New define_insn_and_split that lowers
> 	rotate right by constant bit counts before reload.

+(define_insn_and_split "*ashlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
	(ashift:V1TI
	 (match_operand:V1TI 1 "register_operand")
-	 (match_operand:QI 2 "general_operand")))]
-  "TARGET_SSE2 && TARGET_64BIT"
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && (INTVAL (operands[2]) & 7) != 0

Please introduce a const_0_to_255_not_mul_8_operand predicate.

Alternatively, and preferably, you can use pattern shadowing, where the
preceding, more constrained pattern will match before the following,
broader one.

Uros.
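For reference, a predicate of the kind Uros asks for could be written in gcc/config/i386/predicates.md roughly as follows. This sketch is not part of the patch or the review; the name is Uros's suggestion, and the body is modelled on the existing const_0_to_255_mul_8_operand predicate in the same file:

;; Match const_int values in 0..255 that are not multiples of 8.
(define_predicate "const_0_to_255_not_mul_8_operand"
  (and (match_code "const_int")
       (match_test "IN_RANGE (INTVAL (op), 0, 255)
		    && (INTVAL (op) & 7) != 0")))

With such a predicate, the "(INTVAL (operands[2]) & 7) != 0" test would move out of the insn condition and into the operand match itself.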
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 14d12d1..d3ea52f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15995,10 +15995,30 @@
 (define_expand "ashlv1ti3"
   [(set (match_operand:V1TI 0 "register_operand")
	(ashift:V1TI
	 (match_operand:V1TI 1 "register_operand")
	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_shift (ASHIFT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*ashlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(ashift:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && (INTVAL (operands[2]) & 7) != 0
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_shift (ASHIFT, operands);
   DONE;
 })
@@ -16011,6 +16031,26 @@
	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_shift (LSHIFTRT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*lshrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(lshiftrt:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && (INTVAL (operands[2]) & 7) != 0
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_shift (LSHIFTRT, operands);
   DONE;
 })
@@ -16022,6 +16062,26 @@
	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_ashiftrt (operands);
+      DONE;
+    }
+})
+
+
+(define_insn_and_split "*ashrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(ashiftrt:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_ashiftrt (operands);
   DONE;
 })
@@ -16033,6 +16093,25 @@
	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_rotate (ROTATE, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*rotlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(rotate:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_rotate (ROTATE, operands);
   DONE;
 })
@@ -16044,6 +16123,25 @@
	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_rotate (ROTATERT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*rotrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(rotatert:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_rotate (ROTATERT, operands);
   DONE;
 })
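To illustrate why constant counts that are multiples of 8 are special (they map to a single pslldq/psrldq byte shift, while other counts need a multi-instruction sequence), here is a standalone C sketch of the kind of SSE2 sequence ix86_expand_v1ti_shift produces for a constant V1TI left shift. The function name and the fixed count 17 are illustrative only, not taken from the patch:

```c
/* Sketch of lowering a 128-bit left shift by 17 with SSE2 intrinsics:
   17 = 1 + 16, i.e. a 1-bit shift with a cross-lane carry, followed by
   a 2-byte pslldq.  A count that is a multiple of 8 would need only
   the final _mm_slli_si128.  */
#include <emmintrin.h>
#include <string.h>
#include <assert.h>

typedef unsigned __int128 u128;

static __m128i shl_v1ti_17 (__m128i x)
{
  __m128i hi    = _mm_slli_epi64 (x, 1);   /* psllq: shift each 64-bit lane */
  __m128i carry = _mm_srli_epi64 (x, 63);  /* bit that crosses a lane */
  carry = _mm_slli_si128 (carry, 8);       /* move low-lane carry to high lane */
  __m128i r = _mm_or_si128 (hi, carry);    /* full 128-bit shift left by 1 */
  return _mm_slli_si128 (r, 2);            /* pslldq: byte shift left by 16 bits */
}
```

The split patterns above defer exactly this kind of expansion until after the STV pass has had a chance to look at the TImode form.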