Message ID | 27be90e6-4beb-5c4c-a163-9b136490d783@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [rs6000] Support vrotr<mode>3 for int vector types | expand |
On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote: > --- a/gcc/config/rs6000/vector.md > +++ b/gcc/config/rs6000/vector.md > @@ -1260,6 +1260,32 @@ > "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" > "") > > +;; Expanders for rotatert to make use of vrotl > +(define_expand "vrotr<mode>3" > + [(set (match_operand:VEC_I 0 "vint_operand") > + (rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand") > + (match_operand:VEC_I 2 "vint_reg_or_const_vector")))] > + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" > +{ > + machine_mode inner_mode = GET_MODE_INNER (<MODE>mode); > + unsigned int bits = GET_MODE_PRECISION (inner_mode); > + rtx imm_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits)); > + rtx rot_count = gen_reg_rtx (<MODE>mode); > + if (GET_CODE (operands[2]) == CONST_VECTOR) > + { > + imm_vec = simplify_const_binary_operation (MINUS, <MODE>mode, imm_vec, > + operands[2]); > + rot_count = force_reg (<MODE>mode, imm_vec); > + } > + else > + { > + rtx imm_reg = force_reg (<MODE>mode, imm_vec); > + emit_insn (gen_sub<mode>3 (rot_count, imm_reg, operands[2])); > + } Is this actually correct if one or more elements in operands[2] are 0? If vrotl<mode>3 acts with truncated shift count, that is not an issue (but then perhaps you wouldn't have to compute imm_reg - operands[2] but just - operands[2]), but if it does something else, then prec - 0 will be prec and thus outside of the allowed rotate count. Or does rs6000 allow rotate counts to be 0 to prec inclusive? Jakub
on 2019/7/17 下午4:42, Jakub Jelinek wrote: > On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote: >> --- a/gcc/config/rs6000/vector.md >> +++ b/gcc/config/rs6000/vector.md >> @@ -1260,6 +1260,32 @@ >> "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" >> "") >> >> +;; Expanders for rotatert to make use of vrotl >> +(define_expand "vrotr<mode>3" >> + [(set (match_operand:VEC_I 0 "vint_operand") >> + (rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand") >> + (match_operand:VEC_I 2 "vint_reg_or_const_vector")))] >> + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" >> +{ >> + machine_mode inner_mode = GET_MODE_INNER (<MODE>mode); >> + unsigned int bits = GET_MODE_PRECISION (inner_mode); >> + rtx imm_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits)); >> + rtx rot_count = gen_reg_rtx (<MODE>mode); >> + if (GET_CODE (operands[2]) == CONST_VECTOR) >> + { >> + imm_vec = simplify_const_binary_operation (MINUS, <MODE>mode, imm_vec, >> + operands[2]); >> + rot_count = force_reg (<MODE>mode, imm_vec); >> + } >> + else >> + { >> + rtx imm_reg = force_reg (<MODE>mode, imm_vec); >> + emit_insn (gen_sub<mode>3 (rot_count, imm_reg, operands[2])); >> + } > > Is this actually correct if one or more elements in operands[2] are 0? > If vrotl<mode>3 acts with truncated shift count, that is not an issue > (but then perhaps you wouldn't have to compute imm_reg - operands[2] but > just - operands[2]), but if it does something else, then prec - 0 will be > prec and thus outside of the allowed rotate count. Or does rs6000 allow > rotate counts to be 0 to prec inclusive? > > Jakub > Hi Jakub, Good question, the vector rotation for byte looks like (others are similar): vrlb VRT,VRA,VRB do i=0 to 127 by 8 sh = (VRB)[i+5:i+7] VRT[i:i+7] = (VRA)[i:i+7] <<< sh end It only takes care of the counts from 0 to prec-1 (inclusive) [log2(prec) bits] So it's fine even operands[2] are zero or negative. Take byte as example, prec is 8. - rot count is 0, then minus res gets 8. (out of 3 bits range), same as 0. - rot count is 9, then minus res gets -1. (3 bits parsed as 7), the original rot count 9 was parsed as 1 (in 3 bits range). - rot count is -1, then minus res gets 9, (3 bits parsed as 1), the original rot count was parsed as 7 (in 3 bits range). It's a good idea to just use negate! Thanks!! Kewen
On Wed, Jul 17, 2019 at 05:22:38PM +0800, Kewen.Lin wrote: > Good question, the vector rotation for byte looks like (others are similar): > > vrlb VRT,VRA,VRB > do i=0 to 127 by 8 > sh = (VRB)[i+5:i+7] > VRT[i:i+7] = (VRA)[i:i+7] <<< sh > end > > It only takes care of the counts from 0 to prec-1 (inclusive) [log2(prec) bits] > So it's fine even operands[2] are zero or negative. > > Take byte as example, prec is 8. > - rot count is 0, then minus res gets 8. (out of 3 bits range), same as 0. > - rot count is 9, then minus res gets -1. (3 bits parsed as 7), the original > rot count 9 was parsed as 1 (in 3 bits range). > - rot count is -1, then minus res gets 9, (3 bits parsed as 1), the original > rot count was parsed as 7 (in 3 bits range). > > It's a good idea to just use negate! Thanks!! Ok, so the hw for the vectors truncates, the question is how happy will the RTL generic code with that. rs6000 defines SHIFT_COUNT_TRUNCATED to 0, so the generic code can't assume there is a truncation going on. Either it will punt some optimizations when it sees say negative or too large shift/rotate count (that is the better case), or it might just assume there is UB. As the documentation says, for zero SHIFT_COUNT_TRUNCATED there is an option of having a pattern with the truncation being explicit, so in your case *vrotl<mode>3_and or similar that would have an explicit AND on the shift operand with say {7, 7...} vector for the byte shifts etc. but emit in the end identical instruction to vrotl<mode>3 and use the MINUS + that pattern for vrotr<mode>3. If the rotate argument is CONST_VECTOR, you can of course just canonicalize, i.e. perform -operands[2] & mask, fold that into constant and keep using vrotl<mode>3 in that case. Jakub
Hi Kewen, On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote: > Regression testing just launched, is it OK for trunk if it's bootstrapped > and regresstested on powerpc64le-unknown-linux-gnu? > +;; Expanders for rotatert to make use of vrotl > +(define_expand "vrotr<mode>3" > + [(set (match_operand:VEC_I 0 "vint_operand") > + (rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand") > + (match_operand:VEC_I 2 "vint_reg_or_const_vector")))] Having any rotatert in a define_expand or define_insn will regress. So, nope, sorry. Segher
Hi Segher, on 2019/7/17 下午9:40, Segher Boessenkool wrote: > Hi Kewen, > > On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote: >> Regression testing just launched, is it OK for trunk if it's bootstrapped >> and regresstested on powerpc64le-unknown-linux-gnu? > >> +;; Expanders for rotatert to make use of vrotl >> +(define_expand "vrotr<mode>3" >> + [(set (match_operand:VEC_I 0 "vint_operand") >> + (rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand") >> + (match_operand:VEC_I 2 "vint_reg_or_const_vector")))] > > Having any rotatert in a define_expand or define_insn will regress. > > So, nope, sorry. > Thanks for clarifying! Since regression testing passed on powerpc64le,I'd like to double confirm the meaning of "regress", does it mean it's a regression from design view? Is it specific to rotatert and its related one like vrotr? If yes, it sounds we can't go with vrotr way. :( Thanks, Kewen
On Thu, Jul 18, 2019 at 01:44:36PM +0800, Kewen.Lin wrote: > Hi Segher, > > on 2019/7/17 下午9:40, Segher Boessenkool wrote: > > Hi Kewen, > > > > On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote: > >> Regression testing just launched, is it OK for trunk if it's bootstrapped > >> and regresstested on powerpc64le-unknown-linux-gnu? > > > >> +;; Expanders for rotatert to make use of vrotl > >> +(define_expand "vrotr<mode>3" > >> + [(set (match_operand:VEC_I 0 "vint_operand") > >> + (rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand") > >> + (match_operand:VEC_I 2 "vint_reg_or_const_vector")))] > > > > Having any rotatert in a define_expand or define_insn will regress. > > > > So, nope, sorry. > > Thanks for clarifying! Since regression testing passed on powerpc64le,I'd like to double confirm the meaning of "regress", does it mean it's > a regression from design view? Is it specific to rotatert and its > related one like vrotr? You will get HAVE_rotatert defined in insn-config.h if you do this patch, and then simplify-rtx.c will not work correctly, generating rotatert by an immediate, which we have no instructions for. This might be masked because many of our rl*.c tests already fail because of other changes, I should fix that :-/ Segher
on 2019/7/19 上午3:48, Segher Boessenkool wrote: > On Thu, Jul 18, 2019 at 01:44:36PM +0800, Kewen.Lin wrote: >> Hi Segher, >> >> on 2019/7/17 下午9:40, Segher Boessenkool wrote: >>> Hi Kewen, >>> >>> On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote: >>>> Regression testing just launched, is it OK for trunk if it's bootstrapped >>>> and regresstested on powerpc64le-unknown-linux-gnu? >>> >>>> +;; Expanders for rotatert to make use of vrotl >>>> +(define_expand "vrotr<mode>3" >>>> + [(set (match_operand:VEC_I 0 "vint_operand") >>>> + (rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand") >>>> + (match_operand:VEC_I 2 "vint_reg_or_const_vector")))] >>> >>> Having any rotatert in a define_expand or define_insn will regress. >>> >>> So, nope, sorry. >> >> Thanks for clarifying! Since regression testing passed on powerpc64le,I'd like to double confirm the meaning of "regress", does it mean it's >> a regression from design view? Is it specific to rotatert and its >> related one like vrotr? > > You will get HAVE_rotatert defined in insn-config.h if you do this patch, > and then simplify-rtx.c will not work correctly, generating rotatert by > an immediate, which we have no instructions for. > > This might be masked because many of our rl*.c tests already fail because > of other changes, I should fix that :-/ > Hi Segher, Thanks for further explanation! Sorry that, but I didn't find this HAVE_rotatert definition. I guess it's due to the preparation is always "DONE"? Then it doesn't really generate rotatert. although I can see rotatert in insn like below, it seems fine in note? (insn 10 9 11 4 (set (reg:V4SI 122 [ vect__2.7 ]) (rotate:V4SI (reg:V4SI 121 [ vect__1.6 ]) (reg:V4SI 124))) "t.c":17:28 1596 {*altivec_vrlw} (expr_list:REG_EQUAL (rotatert:V4SI (reg:V4SI 121 [ vect__1.6 ]) (const_vector:V4SI [ (const_int 8 [0x8]) repeated x4 ])) (nil))) Thanks, Kewen
Hi! On Fri, Jul 19, 2019 at 10:21:06AM +0800, Kewen.Lin wrote: > on 2019/7/19 上午3:48, Segher Boessenkool wrote: > > On Thu, Jul 18, 2019 at 01:44:36PM +0800, Kewen.Lin wrote: > >> on 2019/7/17 下午9:40, Segher Boessenkool wrote: > >>> On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote: > >>>> Regression testing just launched, is it OK for trunk if it's bootstrapped > >>>> and regresstested on powerpc64le-unknown-linux-gnu? > >>> > >>>> +;; Expanders for rotatert to make use of vrotl > >>>> +(define_expand "vrotr<mode>3" > >>>> + [(set (match_operand:VEC_I 0 "vint_operand") > >>>> + (rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand") > >>>> + (match_operand:VEC_I 2 "vint_reg_or_const_vector")))] > >>> > >>> Having any rotatert in a define_expand or define_insn will regress. This is wrong. I don't know why I thought this for a while. There shouldn't be any rotatert in anything that goes through recog, but that is everything *except* define_expand. So define_insn, define_split, define_peephole, define_peephole2 (and define_insn_and_split, which is just syntactic sugar). > Thanks for further explanation! Sorry that, but I didn't find this > HAVE_rotatert definition. I guess it's due to the preparation is always > "DONE"? Then it doesn't really generate rotatert. You only had one in a define_expand. That is fine, that pattern is never recognised against. HAVE_rotatert means that something somewhere will recognise rotatert RTL insns; if it isn't set, it doesn't make sense to ever create them, because they will never match. > although I can see rotatert in insn like below, it seems fine in note? Sure, many things are allowed in notes that can never show up in RTL proper. So, this approach will work fine, and not be too bad. Could you do a new patch with it? It's simple to do, and even if the generic thing will happen eventually, this is a nice stepping stone for that. Thanks, and sorry for the confusion, Segher
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 8ca98299950..c4c74630d26 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -163,6 +163,17 @@ return VINT_REGNO_P (REGNO (op)); }) +;; Return 1 if op is a vector register that operates on integer vectors +;; or if op is a const vector with integer vector modes. +(define_predicate "vint_reg_or_const_vector" + (match_code "reg,subreg,const_vector") +{ + if (GET_CODE (op) == CONST_VECTOR && GET_MODE_CLASS (mode) == MODE_VECTOR_INT) + return 1; + + return vint_operand (op, mode); +}) + ;; Return 1 if op is a vector register to do logical operations on (and, or, ;; xor, etc.) (define_predicate "vlogical_operand" diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md index 70bcfe02e22..5c6a344e452 100644 --- a/gcc/config/rs6000/vector.md +++ b/gcc/config/rs6000/vector.md @@ -1260,6 +1260,32 @@ "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" "") +;; Expanders for rotatert to make use of vrotl +(define_expand "vrotr<mode>3" + [(set (match_operand:VEC_I 0 "vint_operand") + (rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand") + (match_operand:VEC_I 2 "vint_reg_or_const_vector")))] + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)" +{ + machine_mode inner_mode = GET_MODE_INNER (<MODE>mode); + unsigned int bits = GET_MODE_PRECISION (inner_mode); + rtx imm_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits)); + rtx rot_count = gen_reg_rtx (<MODE>mode); + if (GET_CODE (operands[2]) == CONST_VECTOR) + { + imm_vec = simplify_const_binary_operation (MINUS, <MODE>mode, imm_vec, + operands[2]); + rot_count = force_reg (<MODE>mode, imm_vec); + } + else + { + rtx imm_reg = force_reg (<MODE>mode, imm_vec); + emit_insn (gen_sub<mode>3 (rot_count, imm_reg, operands[2])); + } + emit_insn (gen_vrotl<mode>3 (operands[0], operands[1], rot_count)); + DONE; +}) + ;; Expanders for arithmetic shift left on each vector element (define_expand "vashl<mode>3" [(set (match_operand:VEC_I 0 "vint_operand") diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c new file mode 100644 index 00000000000..80aca1a94a5 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c @@ -0,0 +1,46 @@ +/* { dg-options "-O3" } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ + +/* Check vectorizer can exploit vector rotation instructions on Power, mainly + for the case rotation count is const number. */ + +#define N 256 +unsigned long long sud[N], rud[N]; +unsigned int suw[N], ruw[N]; +unsigned short suh[N], ruh[N]; +unsigned char sub[N], rub[N]; + +void +testULL () +{ + for (int i = 0; i < 256; ++i) + rud[i] = (sud[i] >> 8) | (sud[i] << (sizeof (sud[0]) * 8 - 8)); +} + +void +testUW () +{ + for (int i = 0; i < 256; ++i) + ruw[i] = (suw[i] >> 8) | (suw[i] << (sizeof (suw[0]) * 8 - 8)); +} + +void +testUH () +{ + for (int i = 0; i < 256; ++i) + ruh[i] = (unsigned short) (suh[i] >> 9) + | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - 9)); +} + +void +testUB () +{ + for (int i = 0; i < 256; ++i) + rub[i] = (unsigned char) (sub[i] >> 5) + | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - 5)); +} + +/* { dg-final { scan-assembler {\mvrld\M} } } */ +/* { dg-final { scan-assembler {\mvrlw\M} } } */ +/* { dg-final { scan-assembler {\mvrlh\M} } } */ +/* { dg-final { scan-assembler {\mvrlb\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c new file mode 100644 index 00000000000..affda6c023b --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c @@ -0,0 +1,47 @@ +/* { dg-options "-O3" } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ + +/* Check vectorizer can exploit vector rotation instructions on Power, mainly + for the case rotation count isn't const number. */ + +#define N 256 +unsigned long long sud[N], rud[N]; +unsigned int suw[N], ruw[N]; +unsigned short suh[N], ruh[N]; +unsigned char sub[N], rub[N]; +extern unsigned char rot_cnt; + +void +testULL () +{ + for (int i = 0; i < 256; ++i) + rud[i] = (sud[i] >> rot_cnt) | (sud[i] << (sizeof (sud[0]) * 8 - rot_cnt)); +} + +void +testUW () +{ + for (int i = 0; i < 256; ++i) + ruw[i] = (suw[i] >> rot_cnt) | (suw[i] << (sizeof (suw[0]) * 8 - rot_cnt)); +} + +void +testUH () +{ + for (int i = 0; i < 256; ++i) + ruh[i] = (unsigned short) (suh[i] >> rot_cnt) + | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - rot_cnt)); +} + +void +testUB () +{ + for (int i = 0; i < 256; ++i) + rub[i] = (unsigned char) (sub[i] >> rot_cnt) + | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - rot_cnt)); +} + +/* { dg-final { scan-assembler {\mvrld\M} } } */ +/* { dg-final { scan-assembler {\mvrlw\M} } } */ +/* { dg-final { scan-assembler {\mvrlh\M} } } */ +/* { dg-final { scan-assembler {\mvrlb\M} } } */