Message ID | CAMZc-byW9O2vwwozyNueO+7YPPdh54B54ne3fd+_9MnDEcfz0w@mail.gmail.com |
---|---|
State | New |
Headers | show |
Series | [PR,rtl-optimization/97249] Simplify vec_select of paradoxical subreg. | expand |
Hi! On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: > For rtx like > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > (parallel [(const_int 0) (const_int 1)])) > it could be simplified as inner. You could even simplify any vec_select of a subreg of X to just a vec_select of X, by changing the selection vector a bit (well, only do this if that is a constant vector, I suppose). Not just for paradoxical subregs either, just for *all* subregs. > gcc/ChangeLog > PR rtl-optimization/97249 > * simplify-rtx.c (simplify_binary_operation_1): Simplify > vec_select of paradoxical subreg. > > gcc/testsuite/ChangeLog > > * gcc.target/i386/pr97249-1.c: New test. > + /* For cases like > + (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > + (parallel [(const_int 0) (const_int 1)])). > + return inner directly. */ > + if (GET_CODE (trueop0) == SUBREG > + && paradoxical_subreg_p (trueop0) > + && mode == GET_MODE (XEXP (trueop0, 0)) > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > + && l0 % l1 == 0) Why this? Why does the number of elements of the input have to divide that of the output? > + { > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > + unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1; > + unsigned HOST_WIDE_INT sel = 0; > + int i = 0; > + for (;i != l1; i++) for (int i = 0; i != l1; i++) > + { > + rtx j = XVECEXP (trueop1, 0, i); > + if (!CONST_INT_P (j)) > + break; > + sel |= HOST_WIDE_INT_1U << UINTVAL (j); > + } > + /* ??? Need to simplify XEXP (trueop0, 0) here. */ > + if (sel == expect) > + return XEXP (trueop0, 0); > + } > } If you just handle the much more generic case, all the other vec_select simplifications can be done as well, not just this one. > +/* PR target/97249 */ > +/* { dg-do compile } */ > +/* { dg-options "-mavx2 -O3 -masm=att" } */ > +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ > +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ I don't know enough about the x86 backend to know if this is exactly what you need in the testsuite. I do know a case of backslashitis when I see one though -- you might want to use {} instead of "", and perhaps \m and \M and \s etc. And to make sure things are on one line, don't do all that nastiness with [^\n], just start the RE with (?n) :-) Segher
On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > Hi! > > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: > > For rtx like > > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > > (parallel [(const_int 0) (const_int 1)])) > > it could be simplified as inner. > > You could even simplify any vec_select of a subreg of X to just a > vec_select of X, by changing the selection vector a bit (well, only do Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection. > this if that is a constant vector, I suppose). Not just for paradoxical > subregs either, just for *all* subregs. > Yes, and only when X has the same inner mode and more elements. > > gcc/ChangeLog > > PR rtl-optimization/97249 > > * simplify-rtx.c (simplify_binary_operation_1): Simplify > > vec_select of paradoxical subreg. > > > > gcc/testsuite/ChangeLog > > > > * gcc.target/i386/pr97249-1.c: New test. > > > + /* For cases like > > + (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > > + (parallel [(const_int 0) (const_int 1)])). > > + return inner directly. */ > > + if (GET_CODE (trueop0) == SUBREG > > + && paradoxical_subreg_p (trueop0) > > + && mode == GET_MODE (XEXP (trueop0, 0)) > > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > > + && l0 % l1 == 0) > > Why this? Why does the number of elements of the input have to divide > that of the output? > Removed, also add condition for my upper comments. > > + { > > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > > + unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1; > > + unsigned HOST_WIDE_INT sel = 0; > > + int i = 0; > > + for (;i != l1; i++) > > for (int i = 0; i != l1; i++) > > > + { > > + rtx j = XVECEXP (trueop1, 0, i); > > + if (!CONST_INT_P (j)) > > + break; > > + sel |= HOST_WIDE_INT_1U << UINTVAL (j); > > + } > > + /* ??? Need to simplify XEXP (trueop0, 0) here. */ > > + if (sel == expect) > > + return XEXP (trueop0, 0); > > + } > > } > > If you just handle the much more generic case, all the other vec_select > simplifications can be done as well, not just this one. > Yes, changed, also selection should be inside the elements of X. > > +/* PR target/97249 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-mavx2 -O3 -masm=att" } */ > > +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ > > +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ > > +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ > > I don't know enough about the x86 backend to know if this is exactly > what you need in the testsuite. I do know a case of backslashitis when > I see one though -- you might want to use {} instead of "", and perhaps > \m and \M and \s etc. And to make sure things are on one line, don't do > all that nastiness with [^\n], just start the RE with (?n) :-) > Yes, changed and it's very clean with usage of (?n) and {}. > > Segher Update patch.
Hi! On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote: > On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: > > > For rtx like > > > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > > > (parallel [(const_int 0) (const_int 1)])) > > > it could be simplified as inner. > > > > You could even simplify any vec_select of a subreg of X to just a > > vec_select of X, by changing the selection vector a bit (well, only do > > Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection. Exactly. > > this if that is a constant vector, I suppose). Not just for paradoxical > > subregs either, just for *all* subregs. > > Yes, and only when X has the same inner mode and more elements. No, for *all*. The mode of the first argument of vec_select does not have to equal its result mode. Any (constant indices) vec_select of a subreg can be written as just a vec_select. > + /* Simplify vec_select of a subreg of X to just a vec_select of X > + when available. */ What does "when available" mean here? > + int l2; > + if (GET_CODE (trueop0) == SUBREG > + && (GET_MODE_INNER (mode) > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))) Don't use unnecessary parens here please, it makes it harder to read (there are quite enough parens already :-) ) > + gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0), > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > + &subreg_offset)); Why is this needed? > + bool success = true; > + for (int i = 0;i != l1; i++) (space after ; ) > + { > + rtx j = XVECEXP (trueop1, 0, i); (i and j and k ususally are integers, not rtx) > + if (!CONST_INT_P (j) > + || known_ge (UINTVAL (j), l2 - subreg_offset)) > + { > + success = false; > + break; > + } > + } You don't have to test if the input RTL is valid. You can assume it is. > + if (success) > + { > + rtx par = trueop1; > + if (subreg_offset) > + { > + rtvec vec = rtvec_alloc (l1); > + for (int i = 0; i < l1; i++) > + RTVEC_ELT (vec, i) > + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) > + + subreg_offset)); > + par = gen_rtx_PARALLEL (VOIDmode, vec); > + } > + return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par); > + } > + } subreg_offset will differ in meaning if big-endian; is this correct there, do all the stars align so this code works out fine there as well? Looks fine otherwise, thanks :-) Segher
On October 14, 2020 7:35:32 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote: >Hi! > >On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote: >> On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool >> <segher@kernel.crashing.org> wrote: >> > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: >> > > For rtx like >> > > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) >> > > (parallel [(const_int 0) (const_int 1)])) >> > > it could be simplified as inner. >> > >> > You could even simplify any vec_select of a subreg of X to just a >> > vec_select of X, by changing the selection vector a bit (well, only >do >> >> Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to >selection. > >Exactly. > >> > this if that is a constant vector, I suppose). Not just for >paradoxical >> > subregs either, just for *all* subregs. >> >> Yes, and only when X has the same inner mode and more elements. > >No, for *all*. The mode of the first argument of vec_select does not >have to equal its result mode. But IIRC the component mode needs to match. >Any (constant indices) vec_select of a subreg can be written as just a >vec_select. > >> + /* Simplify vec_select of a subreg of X to just a vec_select of X >> + when available. */ > >What does "when available" mean here? > >> + int l2; >> + if (GET_CODE (trueop0) == SUBREG >> + && (GET_MODE_INNER (mode) >> + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))) > >Don't use unnecessary parens here please, it makes it harder to read >(there are quite enough parens already :-) ) > >> + gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0), >> + GET_MODE_SIZE (GET_MODE_INNER (mode)), >> + &subreg_offset)); > >Why is this needed? > >> + bool success = true; >> + for (int i = 0;i != l1; i++) > >(space after ; ) > >> + { >> + rtx j = XVECEXP (trueop1, 0, i); > >(i and j and k ususally are integers, not rtx) > >> + if (!CONST_INT_P (j) >> + || known_ge (UINTVAL (j), l2 - subreg_offset)) >> + { >> + success = false; >> + break; >> + } >> + } > >You don't have to test if the input RTL is valid. You can assume it >is. > >> + if (success) >> + { >> + rtx par = trueop1; >> + if (subreg_offset) >> + { >> + rtvec vec = rtvec_alloc (l1); >> + for (int i = 0; i < l1; i++) >> + RTVEC_ELT (vec, i) >> + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) >> + + subreg_offset)); >> + par = gen_rtx_PARALLEL (VOIDmode, vec); >> + } >> + return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par); >> + } >> + } > >subreg_offset will differ in meaning if big-endian; is this correct >there, do all the stars align so this code works out fine there as >well? > >Looks fine otherwise, thanks :-) > > >Segher
On Wed, Oct 14, 2020 at 07:55:55PM +0200, Richard Biener wrote: > On October 14, 2020 7:35:32 PM GMT+02:00, Segher Boessenkool <segher@kernel.crashing.org> wrote: > >On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote: > >> On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool > >> <segher@kernel.crashing.org> wrote: > >> > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: > >> > > For rtx like > >> > > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > >> > > (parallel [(const_int 0) (const_int 1)])) > >> > > it could be simplified as inner. > >> > > >> > You could even simplify any vec_select of a subreg of X to just a > >> > vec_select of X, by changing the selection vector a bit (well, only > >do > >> > >> Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to > >selection. > > > >Exactly. > > > >> > this if that is a constant vector, I suppose). Not just for > >paradoxical > >> > subregs either, just for *all* subregs. > >> > >> Yes, and only when X has the same inner mode and more elements. > > > >No, for *all*. The mode of the first argument of vec_select does not > >have to equal its result mode. > > But IIRC the component mode needs to match. Yeah, good point, at least the i386 backend uses crazy subregs, which is why validate_subreg does not test this :-( Segher
On Thu, Oct 15, 2020 at 1:37 AM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > Hi! > > On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote: > > On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool > > <segher@kernel.crashing.org> wrote: > > > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: > > > > For rtx like > > > > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > > > > (parallel [(const_int 0) (const_int 1)])) > > > > it could be simplified as inner. > > > > > > You could even simplify any vec_select of a subreg of X to just a > > > vec_select of X, by changing the selection vector a bit (well, only do > > > > Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection. > > Exactly. > > > > this if that is a constant vector, I suppose). Not just for paradoxical > > > subregs either, just for *all* subregs. > > > > Yes, and only when X has the same inner mode and more elements. > > No, for *all*. The mode of the first argument of vec_select does not > have to equal its result mode. > > Any (constant indices) vec_select of a subreg can be written as just a > vec_select. > > > + /* Simplify vec_select of a subreg of X to just a vec_select of X > > + when available. */ > > What does "when available" mean here? > When X has same component mode as vec_select, i'll add this to comments. > > + int l2; > > + if (GET_CODE (trueop0) == SUBREG > > + && (GET_MODE_INNER (mode) > > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))) > > Don't use unnecessary parens here please, it makes it harder to read > (there are quite enough parens already :-) ) > Yes. > > + gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0), > > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > > + &subreg_offset)); > > Why is this needed? I only found this interface for poly_uint64 division to get subreg_offset. > > > + bool success = true; > > + for (int i = 0;i != l1; i++) > > (space after ; ) > > > + { > > + rtx j = XVECEXP (trueop1, 0, i); > > (i and j and k ususally are integers, not rtx) > > > + if (!CONST_INT_P (j) > > + || known_ge (UINTVAL (j), l2 - subreg_offset)) > > + { > > + success = false; > > + break; > > + } > > + } > > You don't have to test if the input RTL is valid. You can assume it is. > This test is for something like (vec_select:v2di (subreg:v4di (reg:v2di) 0)(parallel [ (const_int 2) (const_int 3)])). const_int 2 here is out of range. Are you meaning the upper rtx wouldn't exist? > > + if (success) > > + { > > + rtx par = trueop1; > > + if (subreg_offset) > > + { > > + rtvec vec = rtvec_alloc (l1); > > + for (int i = 0; i < l1; i++) > > + RTVEC_ELT (vec, i) > > + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) > > + + subreg_offset)); > > + par = gen_rtx_PARALLEL (VOIDmode, vec); > > + } > > + return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par); > > + } > > + } > > subreg_offset will differ in meaning if big-endian; is this correct Yes. > there, do all the stars align so this code works out fine there as well? > i found it's a bit tricky to adjust selection index for target BYTES_BIG_ENDIA != WORDS_BIG_ENDIAN. Especially for component mode smaller than word, Any interface to handle this? > Looks fine otherwise, thanks :-) > > > Segher
On Thu, Oct 15, 2020 at 4:14 PM Hongtao Liu <crazylht@gmail.com> wrote: > > On Thu, Oct 15, 2020 at 1:37 AM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > > > Hi! > > > > On Wed, Oct 14, 2020 at 01:43:45PM +0800, Hongtao Liu wrote: > > > On Wed, Oct 14, 2020 at 4:01 AM Segher Boessenkool > > > <segher@kernel.crashing.org> wrote: > > > > On Tue, Oct 13, 2020 at 04:40:53PM +0800, Hongtao Liu wrote: > > > > > For rtx like > > > > > (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) > > > > > (parallel [(const_int 0) (const_int 1)])) > > > > > it could be simplified as inner. > > > > > > > > You could even simplify any vec_select of a subreg of X to just a > > > > vec_select of X, by changing the selection vector a bit (well, only do > > > > > > Yes, when SUBREG_BYTE of trueop0 is not 0, we need to add offset to selection. > > > > Exactly. > > > > > > this if that is a constant vector, I suppose). Not just for paradoxical > > > > subregs either, just for *all* subregs. > > > > > > Yes, and only when X has the same inner mode and more elements. > > > > No, for *all*. The mode of the first argument of vec_select does not > > have to equal its result mode. > > > > Any (constant indices) vec_select of a subreg can be written as just a > > vec_select. > > > > > + /* Simplify vec_select of a subreg of X to just a vec_select of X > > > + when available. */ > > > > What does "when available" mean here? > > > > When X has same component mode as vec_select, i'll add this to comments. > > > > + int l2; > > > + if (GET_CODE (trueop0) == SUBREG > > > + && (GET_MODE_INNER (mode) > > > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0)))) > > > > Don't use unnecessary parens here please, it makes it harder to read > > (there are quite enough parens already :-) ) > > > > Yes. > > > > + gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0), > > > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > > > + &subreg_offset)); > > > > Why is this needed? > > I only found this interface for poly_uint64 division to get subreg_offset. > > > > > > + bool success = true; > > > + for (int i = 0;i != l1; i++) > > > > (space after ; ) > > > > > + { > > > + rtx j = XVECEXP (trueop1, 0, i); > > > > (i and j and k ususally are integers, not rtx) > > > > > + if (!CONST_INT_P (j) > > > + || known_ge (UINTVAL (j), l2 - subreg_offset)) > > > + { > > > + success = false; > > > + break; > > > + } > > > + } > > > > You don't have to test if the input RTL is valid. You can assume it is. > > > > This test is for something like (vec_select:v2di (subreg:v4di > (reg:v2di) 0)(parallel [ (const_int 2) (const_int 3)])). > const_int 2 here is out of range. Are you meaning the upper rtx wouldn't exist? > Remove the test. > > > + if (success) > > > + { > > > + rtx par = trueop1; > > > + if (subreg_offset) > > > + { > > > + rtvec vec = rtvec_alloc (l1); > > > + for (int i = 0; i < l1; i++) > > > + RTVEC_ELT (vec, i) > > > + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) > > > + + subreg_offset)); > > > + par = gen_rtx_PARALLEL (VOIDmode, vec); > > > + } > > > + return gen_rtx_VEC_SELECT (mode, XEXP (trueop0, 0), par); > > > + } > > > + } > > > > subreg_offset will differ in meaning if big-endian; is this correct > Yes. > > there, do all the stars align so this code works out fine there as well? > > > > i found it's a bit tricky to adjust selection index for target > BYTES_BIG_ENDIA != WORDS_BIG_ENDIAN. > Especially for component mode smaller than word, Any interface to handle this? > Use subreg_lsb to get offset from least significant bits, suppose this could be independent of big/little-endian. > > Looks fine otherwise, thanks :-) > > > > > > Segher > > > > -- > BR, > Hongtao Update Patch.
Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > + /* Simplify vec_select of a subreg of X to just a vec_select of X > + when X has same component mode as vec_select. */ > + int l2; > + if (GET_CODE (trueop0) == SUBREG > + && GET_MODE_INNER (mode) > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) Better to use SUBREG_REG here and below. > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > + && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0)))) > + .is_constant (&l2) > + && known_le (l1, l2)) > + { > + unsigned HOST_WIDE_INT subreg_offset = 0; > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > + gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT), > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > + &subreg_offset)); can_div_trunc_p discards the remainder, whereas it looks like here you want an exact multiple. I don't think it's absolutely guaranteed that the “if” condition makes the division by GET_MODE_SIZE exact. E.g. in principle you could have a subreg of a vector of TIs in which the subreg offset is misaligned by a DI offset. I'm not sure the subreg_lsb conversion is correct though. On big-endian targets, lane numbering follows memory layout, just like subreg byte offsets do. So ISTM that using SUBREG_BYTE (as per the earlier patch) was correct. In summary, I think the "if” condition should include something like: constant_mulitple_p (SUBREG_BYTE (trueop0), GET_MODE_UNIT_BITSIZE (mode), &subreg_offset) Thanks, Richard
On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > + /* Simplify vec_select of a subreg of X to just a vec_select of X > > + when X has same component mode as vec_select. */ > > + int l2; > > + if (GET_CODE (trueop0) == SUBREG > > + && GET_MODE_INNER (mode) > > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) > > Better to use SUBREG_REG here and below. > Yes and changed. > > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > > + && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0)))) > > + .is_constant (&l2) > > + && known_le (l1, l2)) > > + { > > + unsigned HOST_WIDE_INT subreg_offset = 0; > > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > > + gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT), > > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > > + &subreg_offset)); > > can_div_trunc_p discards the remainder, whereas it looks like here > you want an exact multiple. > > I don't think it's absolutely guaranteed that the “if” condition makes > the division by GET_MODE_SIZE exact. E.g. in principle you could have > a subreg of a vector of TIs in which the subreg offset is misaligned by > a DI offset. > > I'm not sure the subreg_lsb conversion is correct though. On big-endian > targets, lane numbering follows memory layout, just like subreg byte > offsets do. So ISTM that using SUBREG_BYTE (as per the earlier patch) > was correct. > > In summary, I think the "if” condition should include something like: > > constant_mulitple_p (SUBREG_BYTE (trueop0), > GET_MODE_UNIT_BITSIZE (mode), > &subreg_offset) > Changed. > Thanks, > Richard Update patch.
Hongtao Liu <crazylht@gmail.com> writes: > On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford > <richard.sandiford@arm.com> wrote: >> >> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> > + /* Simplify vec_select of a subreg of X to just a vec_select of X >> > + when X has same component mode as vec_select. */ >> > + int l2; >> > + if (GET_CODE (trueop0) == SUBREG >> > + && GET_MODE_INNER (mode) >> > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) >> >> Better to use SUBREG_REG here and below. >> > > Yes and changed. > >> > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) >> > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) >> > + && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0)))) >> > + .is_constant (&l2) >> > + && known_le (l1, l2)) >> > + { >> > + unsigned HOST_WIDE_INT subreg_offset = 0; >> > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); >> > + gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT), >> > + GET_MODE_SIZE (GET_MODE_INNER (mode)), >> > + &subreg_offset)); >> >> can_div_trunc_p discards the remainder, whereas it looks like here >> you want an exact multiple. >> >> I don't think it's absolutely guaranteed that the “if” condition makes >> the division by GET_MODE_SIZE exact. E.g. in principle you could have >> a subreg of a vector of TIs in which the subreg offset is misaligned by >> a DI offset. >> >> I'm not sure the subreg_lsb conversion is correct though. On big-endian >> targets, lane numbering follows memory layout, just like subreg byte >> offsets do. So ISTM that using SUBREG_BYTE (as per the earlier patch) >> was correct. >> >> In summary, I think the "if” condition should include something like: >> >> constant_mulitple_p (SUBREG_BYTE (trueop0), >> GET_MODE_UNIT_BITSIZE (mode), >> &subreg_offset) >> > > Changed. > >> Thanks, >> Richard > > > Update patch. > > -- > BR, > Hongtao > > From 8d154067963e453c337e6dc2c4f3f19bf0d6e11b Mon Sep 17 00:00:00 2001 > From: liuhongt <hongtao.liu@intel.com> > Date: Tue, 13 Oct 2020 15:35:29 +0800 > Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of > X. > > gcc/ChangeLog > PR rtl-optimization/97249 > * simplify-rtx.c (simplify_binary_operation_1): Simplify > vec_select of a subreg of X to a vec_select of X. > > gcc/testsuite/ChangeLog > > * gcc.target/i386/pr97249-1.c: New test. > --- > gcc/simplify-rtx.c | 44 +++++++++++++++++++++++ > gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++ > 2 files changed, 74 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > index 869f0d11b2e..b1009837b2b 100644 > --- a/gcc/simplify-rtx.c > +++ b/gcc/simplify-rtx.c > @@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, > return subop1; > } > } > + > + /* Simplify vec_select of a subreg of X to just a vec_select of X > + when X has same component mode as vec_select. */ > + int l2; > + unsigned HOST_WIDE_INT subreg_offset = 0; > + if (GET_CODE (trueop0) == SUBREG > + && GET_MODE_INNER (mode) > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) Nothing really relies on this last line, and nothing uses l0, so better to drop it. > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > + && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))) > + .is_constant (&l2) > + && known_le (l1, l2) I'm not sure the last two &&s are really the important condition. I think we should drop them for the suggestion below. > + && constant_multiple_p (SUBREG_BYTE (trueop0), > + GET_MODE_UNIT_BITSIZE (mode), > + &subreg_offset)) > + { > + Excess blank line. > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); This can just use ==. > + bool success = true; > + for (int i = 0; i != l1; i++) > + { > + rtx idx = XVECEXP (trueop1, 0, i); Excess space. > + if (!CONST_INT_P (idx)) Here I think we should check: || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) where: poly_uint64 nunits = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))). This makes sure that all indices are in range. In particular, it's valid for the SUBREG_REG to be narrower than mode, for appropriate vec_select indices Thanks, Richard
On Mon, Oct 19, 2020 at 11:31 PM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Hongtao Liu <crazylht@gmail.com> writes: > > On Thu, Oct 15, 2020 at 8:38 PM Richard Sandiford > > <richard.sandiford@arm.com> wrote: > >> > >> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > >> > + /* Simplify vec_select of a subreg of X to just a vec_select of X > >> > + when X has same component mode as vec_select. */ > >> > + int l2; > >> > + if (GET_CODE (trueop0) == SUBREG > >> > + && GET_MODE_INNER (mode) > >> > + == GET_MODE_INNER (GET_MODE (XEXP (trueop0, 0))) > >> > >> Better to use SUBREG_REG here and below. > >> > > > > Yes and changed. > > > >> > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > >> > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > >> > + && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0)))) > >> > + .is_constant (&l2) > >> > + && known_le (l1, l2)) > >> > + { > >> > + unsigned HOST_WIDE_INT subreg_offset = 0; > >> > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > >> > + gcc_assert (can_div_trunc_p (exact_div (subreg_lsb (trueop0), BITS_PER_UNIT), > >> > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > >> > + &subreg_offset)); > >> > >> can_div_trunc_p discards the remainder, whereas it looks like here > >> you want an exact multiple. > >> > >> I don't think it's absolutely guaranteed that the “if” condition makes > >> the division by GET_MODE_SIZE exact. E.g. in principle you could have > >> a subreg of a vector of TIs in which the subreg offset is misaligned by > >> a DI offset. > >> > >> I'm not sure the subreg_lsb conversion is correct though. On big-endian > >> targets, lane numbering follows memory layout, just like subreg byte > >> offsets do. So ISTM that using SUBREG_BYTE (as per the earlier patch) > >> was correct. > >> > >> In summary, I think the "if” condition should include something like: > >> > >> constant_mulitple_p (SUBREG_BYTE (trueop0), > >> GET_MODE_UNIT_BITSIZE (mode), > >> &subreg_offset) > >> > > > > Changed. > > > >> Thanks, > >> Richard > > > > > > Update patch. > > > > -- > > BR, > > Hongtao > > > > From 8d154067963e453c337e6dc2c4f3f19bf0d6e11b Mon Sep 17 00:00:00 2001 > > From: liuhongt <hongtao.liu@intel.com> > > Date: Tue, 13 Oct 2020 15:35:29 +0800 > > Subject: [PATCH] Simplify vec_select of a subreg of X to just a vec_select of > > X. > > > > gcc/ChangeLog > > PR rtl-optimization/97249 > > * simplify-rtx.c (simplify_binary_operation_1): Simplify > > vec_select of a subreg of X to a vec_select of X. > > > > gcc/testsuite/ChangeLog > > > > * gcc.target/i386/pr97249-1.c: New test. > > --- > > gcc/simplify-rtx.c | 44 +++++++++++++++++++++++ > > gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 ++++++++++++++++ > > 2 files changed, 74 insertions(+) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c > > > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > > index 869f0d11b2e..b1009837b2b 100644 > > --- a/gcc/simplify-rtx.c > > +++ b/gcc/simplify-rtx.c > > @@ -4170,6 +4170,50 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, > > return subop1; > > } > > } > > + > > + /* Simplify vec_select of a subreg of X to just a vec_select of X > > + when X has same component mode as vec_select. */ > > + int l2; > > + unsigned HOST_WIDE_INT subreg_offset = 0; > > + if (GET_CODE (trueop0) == SUBREG > > + && GET_MODE_INNER (mode) > > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) > > + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) > > Nothing really relies on this last line, and nothing uses l0, so better > to drop it. > Changed, so there won't be any vector mode with variable number elts. > > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > > + && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))) > > + .is_constant (&l2) > > + && known_le (l1, l2) > > I'm not sure the last two &&s are really the important condition. > I think we should drop them for the suggestion below. > Changed, assume gcc also support something like (vec_select:v4di (reg:v2di) (parallel [ (const_int 0) (const_int 1) (const_int 1) (const_int 0)])) as long as the range of selection guaranteed by || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > + && constant_multiple_p (SUBREG_BYTE (trueop0), > > + GET_MODE_UNIT_BITSIZE (mode), > > + &subreg_offset)) > > + { > > + > > Excess blank line. > Changed. > > + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); > > This can just use ==. > Changed. > > + bool success = true; > > + for (int i = 0; i != l1; i++) > > + { > > + rtx idx = XVECEXP (trueop1, 0, i); > > Excess space. Changed. > > > + if (!CONST_INT_P (idx)) > > Here I think we should check: > > || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > where: > > poly_uint64 nunits > = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))). > Changed. > This makes sure that all indices are in range. In particular, it's > valid for the SUBREG_REG to be narrower than mode, for appropriate > vec_select indices > Yes, that's what paradoxical subreg means. > Thanks, > Richard -- BR, Hongtao
Hongtao Liu <crazylht@gmail.com> writes: >> > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) >> > + && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))) >> > + .is_constant (&l2) >> > + && known_le (l1, l2) >> >> I'm not sure the last two &&s are really the important condition. >> I think we should drop them for the suggestion below. >> > > Changed, assume gcc also support something like (vec_select:v4di > (reg:v2di) (parallel [ (const_int 0) (const_int 1) (const_int 1) > (const_int 0)])) > as long as the range of selection guaranteed by > || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) Yeah, that vec_select looks OK. >> >> > + if (!CONST_INT_P (idx)) >> >> Here I think we should check: >> >> || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) >> >> where: >> >> poly_uint64 nunits >> = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))). >> > > Changed. > >> This makes sure that all indices are in range. In particular, it's >> valid for the SUBREG_REG to be narrower than mode, for appropriate >> vec_select indices >> > > Yes, that's what paradoxical subreg means. But I was comparing the mode of the vec_select with the mode of the SUBREG_REG (rather than the mode of trueop0 with the mode of the SUBREG_REG, which is what matters for paradoxical subregs). > + /* Simplify vec_select of a subreg of X to just a vec_select of X > + when X has same component mode as vec_select. */ > + unsigned HOST_WIDE_INT subreg_offset = 0; > + if (GET_CODE (trueop0) == SUBREG > + && GET_MODE_INNER (mode) > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) Unnecessary brackets around “GET_MODE_NUNITS (mode)”. > + && constant_multiple_p (SUBREG_BYTE (trueop0), > + GET_MODE_UNIT_BITSIZE (mode), > + &subreg_offset)) Sorry, my bad, this should be: && constant_multiple_p (subreg_memory_offset (trueop0), GET_MODE_UNIT_BITSIZE (mode), &subreg_offset)) > + { > + gcc_assert (XVECLEN (trueop1, 0) == l1); > + bool success = true; > + poly_uint64 nunits > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > + for (int i = 0; i != l1; i++) > + { > + rtx idx = XVECEXP (trueop1, 0, i); > + if (!CONST_INT_P (idx) > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > + { > + success = false; > + break; > + } > + } > + if (success) > + { > + rtx par = trueop1; > + if (subreg_offset) > + { > + rtvec vec = rtvec_alloc (l1); > + for (int i = 0; i < l1; i++) > + RTVEC_ELT (vec, i) > + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) > + + subreg_offset)); This is applying subreg_offset to the pointer rather than the INTVAL. It should be: = GEN_INT (UINTVAL (XVECEXP (trueop1, 0, i)) + subreg_offset); OK with those changes, thanks. Richard
On Thu, Oct 15, 2020 at 04:14:39PM +0800, Hongtao Liu wrote: > On Thu, Oct 15, 2020 at 1:37 AM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > > + gcc_assert (can_div_trunc_p (SUBREG_BYTE (trueop0), > > > + GET_MODE_SIZE (GET_MODE_INNER (mode)), > > > + &subreg_offset)); > > > > Why is this needed? > > I only found this interface for poly_uint64 division to get subreg_offset. I mean, why do you have this assert at all? > > > + if (!CONST_INT_P (j) > > > + || known_ge (UINTVAL (j), l2 - subreg_offset)) > > > + { > > > + success = false; > > > + break; > > > + } > > > + } > > > > You don't have to test if the input RTL is valid. You can assume it is. > > > > This test is for something like (vec_select:v2di (subreg:v4di > (reg:v2di) 0)(parallel [ (const_int 2) (const_int 3)])). > const_int 2 here is out of range. Are you meaning the upper rtx wouldn't exist? Assuming this is LE: yes, this is just invalid. You can do whatever you want with it (except ICE :-) ) > > subreg_offset will differ in meaning if big-endian; is this correct > Yes. > > there, do all the stars align so this code works out fine there as well? > > i found it's a bit tricky to adjust selection index for target > BYTES_BIG_ENDIA != WORDS_BIG_ENDIAN. > Especially for component mode smaller than word, Any interface to handle this? For most things you want BYTES_BIG_ENDIAN, anything in a subreg here for example. I don't know which of those vectors use; I cannot find it in the documentation, either. Segher
On Tue, Oct 20, 2020 at 11:20:48AM +0800, Hongtao Liu wrote: > + unsigned HOST_WIDE_INT subreg_offset = 0; > + if (GET_CODE (trueop0) == SUBREG > + && GET_MODE_INNER (mode) > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > + && constant_multiple_p (SUBREG_BYTE (trueop0), > + GET_MODE_UNIT_BITSIZE (mode), > + &subreg_offset)) > + { > + gcc_assert (XVECLEN (trueop1, 0) == l1); Why? If we want to check that, it should be in RTL checking (and maybe it already is!) > + bool success = true; > + poly_uint64 nunits > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > + for (int i = 0; i != l1; i++) > + { > + rtx idx = XVECEXP (trueop1, 0, i); > + if (!CONST_INT_P (idx) > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) Can that ever happen in valid code? This seems to just hide problems. > + { > + success = false; > + break; > + } > + } > + if (success) If you have a huge piece of code like this, factor it? Esp. if you now need to have all kinds of booleans where you really just want to do early returns. Segher
On Wed, Oct 21, 2020 at 12:42 AM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Hongtao Liu <crazylht@gmail.com> writes: > >> > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > >> > + && (GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))) > >> > + .is_constant (&l2) > >> > + && known_le (l1, l2) > >> > >> I'm not sure the last two &&s are really the important condition. > >> I think we should drop them for the suggestion below. > >> > > > > Changed, assume gcc also support something like (vec_select:v4di > > (reg:v2di) (parallel [ (const_int 0) (const_int 1) (const_int 1) > > (const_int 0)])) > > as long as the range of selection guaranteed by > > || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > Yeah, that vec_select looks OK. > > >> > >> > + if (!CONST_INT_P (idx)) > >> > >> Here I think we should check: > >> > >> || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > >> > >> where: > >> > >> poly_uint64 nunits > >> = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0)))). > >> > > > > Changed. > > > >> This makes sure that all indices are in range. In particular, it's > >> valid for the SUBREG_REG to be narrower than mode, for appropriate > >> vec_select indices > >> > > > > Yes, that's what paradoxical subreg means. > > But I was comparing the mode of the vec_select with the mode of the > SUBREG_REG (rather than the mode of trueop0 with the mode of the > SUBREG_REG, which is what matters for paradoxical subregs). > > > + /* Simplify vec_select of a subreg of X to just a vec_select of X > > + when X has same component mode as vec_select. */ > > + unsigned HOST_WIDE_INT subreg_offset = 0; > > + if (GET_CODE (trueop0) == SUBREG > > + && GET_MODE_INNER (mode) > > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) > > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > > Unnecessary brackets around “GET_MODE_NUNITS (mode)”. > Changed. > > + && constant_multiple_p (SUBREG_BYTE (trueop0), > > + GET_MODE_UNIT_BITSIZE (mode), > > + &subreg_offset)) > > Sorry, my bad, this should be: > > && constant_multiple_p (subreg_memory_offset (trueop0), > GET_MODE_UNIT_BITSIZE (mode), > &subreg_offset)) > Changed. > > + { > > + gcc_assert (XVECLEN (trueop1, 0) == l1); > > + bool success = true; > > + poly_uint64 nunits > > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > > + for (int i = 0; i != l1; i++) > > + { > > + rtx idx = XVECEXP (trueop1, 0, i); > > + if (!CONST_INT_P (idx) > > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > + { > > + success = false; > > + break; > > + } > > + } > > + if (success) > > + { > > + rtx par = trueop1; > > + if (subreg_offset) > > + { > > + rtvec vec = rtvec_alloc (l1); > > + for (int i = 0; i < l1; i++) > > + RTVEC_ELT (vec, i) > > + = GEN_INT (INTVAL (XVECEXP (trueop1, 0, i) > > + + subreg_offset)); > > This is applying subreg_offset to the pointer rather than the INTVAL. > It should be: > > = GEN_INT (UINTVAL (XVECEXP (trueop1, 0, i)) > + subreg_offset); > oops, sorry for typo and changed. > OK with those changes, thanks. > > Richard
On Wed, Oct 21, 2020 at 5:07 AM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Tue, Oct 20, 2020 at 11:20:48AM +0800, Hongtao Liu wrote: > > + unsigned HOST_WIDE_INT subreg_offset = 0; > > + if (GET_CODE (trueop0) == SUBREG > > + && GET_MODE_INNER (mode) > > + == GET_MODE_INNER (GET_MODE (SUBREG_REG (trueop0))) > > + && (GET_MODE_NUNITS (mode)).is_constant (&l1) > > + && constant_multiple_p (SUBREG_BYTE (trueop0), > > + GET_MODE_UNIT_BITSIZE (mode), > > + &subreg_offset)) > > + { > > + gcc_assert (XVECLEN (trueop1, 0) == l1); > > Why? If we want to check that, it should be in RTL checking (and maybe > it already is!) > Yes, RTL checking would guarantee that and it should be removed. > > + bool success = true; > > + poly_uint64 nunits > > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > > + for (int i = 0; i != l1; i++) > > + { > > + rtx idx = XVECEXP (trueop1, 0, i); > > + if (!CONST_INT_P (idx) > > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > Can that ever happen in valid code? This seems to just hide problems. > for rtx like (vec_select:v4di:(subreg:v8di (reg:v2di)) (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])), It seems valid for rtl checking. > > + { > > + success = false; > > + break; > > + } > > + } > > + if (success) > > If you have a huge piece of code like this, factor it? Esp. if you now > need to have all kinds of booleans where you really just want to do > early returns. > I want to jump out of this if branch, since later codes in this function won't simplify VEC_SELECT further when it matches my if condition, it's ok to use ealry returns. > > Segher Update patch.
Hongtao Liu <crazylht@gmail.com> writes: > + poly_uint64 nunits > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > + rtx par = trueop1; > + for (int i = 0; i != l1; i++) > + { > + rtx idx = XVECEXP (trueop1, 0, i); > + if (!CONST_INT_P (idx) > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > + return 0; > + } I think the previous version was better. We shouldn't assume that further simplification rules will fail just because the conditions for this rule haven't been met. Thanks, Richard
On Wed, Oct 21, 2020 at 04:43:29PM +0100, Richard Sandiford wrote: > Hongtao Liu <crazylht@gmail.com> writes: > > + poly_uint64 nunits > > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > > + rtx par = trueop1; > > + for (int i = 0; i != l1; i++) > > + { > > + rtx idx = XVECEXP (trueop1, 0, i); > > + if (!CONST_INT_P (idx) > > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > + return 0; > > + } > > I think the previous version was better. We shouldn't assume that > further simplification rules will fail just because the conditions > for this rule haven't been met. Yes. My suggestion was to factor this big piece of code to a separate function, and do an early return from *that*. The patch is okay for trunk without that, with the clumsy booleans. Thanks Hongtao! Segher
On Thu, Oct 22, 2020 at 12:36 AM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Wed, Oct 21, 2020 at 04:43:29PM +0100, Richard Sandiford wrote: > > Hongtao Liu <crazylht@gmail.com> writes: > > > + poly_uint64 nunits > > > + = GET_MODE_NUNITS (GET_MODE (SUBREG_REG (trueop0))); > > > + rtx par = trueop1; > > > + for (int i = 0; i != l1; i++) > > > + { > > > + rtx idx = XVECEXP (trueop1, 0, i); > > > + if (!CONST_INT_P (idx) > > > + || maybe_ge (UINTVAL (idx) + subreg_offset, nunits)) > > > + return 0; > > > + } > > > > I think the previous version was better. We shouldn't assume that > > further simplification rules will fail just because the conditions > > for this rule haven't been met. > > Yes. My suggestion was to factor this big piece of code to a separate > function, and do an early return from *that*. > > The patch is okay for trunk without that, with the clumsy booleans. > Thanks Hongtao! > > > Segher Thank you both for the review, I'll commit the patch with *bool success* keeped.
From c00369aa36d2e169b59287c58872c915953dd2a2 Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao.liu@intel.com> Date: Tue, 13 Oct 2020 15:35:29 +0800 Subject: [PATCH] Simplify vec_select of paradoxical subreg. For rtx like (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) (parallel [(const_int 0) (const_int 1)])) it could be simplified as inner. gcc/ChangeLog PR rtl-optimization/97249 * simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of paradoxical subreg. gcc/testsuite/ChangeLog * gcc.target/i386/pr97249-1.c: New test. --- gcc/simplify-rtx.c | 27 ++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr97249-1.c | 30 +++++++++++++++++++++++ 2 files changed, 57 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr97249-1.c diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 869f0d11b2e..9c397157f28 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -4170,6 +4170,33 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode, return subop1; } } + + /* For cases like + (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0) + (parallel [(const_int 0) (const_int 1)])). + return inner directly. */ + if (GET_CODE (trueop0) == SUBREG + && paradoxical_subreg_p (trueop0) + && mode == GET_MODE (XEXP (trueop0, 0)) + && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0) + && (GET_MODE_NUNITS (mode)).is_constant (&l1) + && l0 % l1 == 0) + { + gcc_assert (known_eq (XVECLEN (trueop1, 0), l1)); + unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1; + unsigned HOST_WIDE_INT sel = 0; + int i = 0; + for (;i != l1; i++) + { + rtx j = XVECEXP (trueop1, 0, i); + if (!CONST_INT_P (j)) + break; + sel |= HOST_WIDE_INT_1U << UINTVAL (j); + } + /* ??? Need to simplify XEXP (trueop0, 0) here. */ + if (sel == expect) + return XEXP (trueop0, 0); + } } if (XVECLEN (trueop1, 0) == 1 diff --git a/gcc/testsuite/gcc.target/i386/pr97249-1.c b/gcc/testsuite/gcc.target/i386/pr97249-1.c new file mode 100644 index 00000000000..bc34aa8baa6 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr97249-1.c @@ -0,0 +1,30 @@ +/* PR target/97249 */ +/* { dg-do compile } */ +/* { dg-options "-mavx2 -O3 -masm=att" } */ +/* { dg-final { scan-assembler-times "vpmovzxbw\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpmovzxwd\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpmovzxdq\[ \t\]+\\\(\[^\n\]*%xmm\[0-9\](?:\n|\[ \t\]+#)" 2 } } */ + +void +foo (unsigned char* p1, unsigned char* p2, short* __restrict p3) +{ + for (int i = 0 ; i != 8; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo1 (unsigned short* p1, unsigned short* p2, int* __restrict p3) +{ + for (int i = 0 ; i != 4; i++) + p3[i] = p1[i] + p2[i]; + return; +} + +void +foo2 (unsigned int* p1, unsigned int* p2, long long* __restrict p3) +{ + for (int i = 0 ; i != 2; i++) + p3[i] = (long long)p1[i] + (long long)p2[i]; + return; +} -- 2.18.1