Message ID | 20201015085922.qmtum7kw2dlaixq3@arm.com
State      | New
Series     | [1/2] aarch64: Remove support for extract-based addresses [PR96998]
Ping. Hopefully this is easier to review/test now that we fix the AArch64 bug first and deliberately regress code quality so that the impact of the combine patch can be measured. On 15/10/2020 09:59, Alex Coplan via Gcc-patches wrote: > Currently, make_extraction() identifies where we can emit an ASHIFT of > an extend in place of an extraction, but fails to make the corresponding > canonicalization/simplification when presented with a MULT by a power of > two. Such a representation is canonical when representing a left-shifted > address inside a MEM. > > This patch remedies this situation: after the patch, make_extraction() > now also identifies RTXs such as: > > (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n)) > > and rewrites this as: > > (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n)) > > instead of using a sign_extract. > > (This patch also fixes up a comment in expand_compound_operation() which > appears to have suffered from bitrot.) > > This fixes several quality regressions on AArch64 after removing support for > addresses represented as sign_extract insns (1/2). > > In particular, after the fix for PR96998, for the relevant testcase, we > have: > > .L2: > sxtw x0, w0 // 8 [c=4 l=4] *extendsidi2_aarch64/0 > add x0, x19, x0, lsl 2 // 39 [c=8 l=4] *add_lsl_di > bl h // 11 [c=4 l=4] *call_value_insn/1 > b .L2 // 54 [c=4 l=4] jump > > and after this patch, we have: > > .L2: > add x0, x19, w0, sxtw 2 // 39 [c=8 l=4] *add_extendsi_shft_di > bl h // 11 [c=4 l=4] *call_value_insn/1 > b .L2 // 54 [c=4 l=4] jump > > which restores the original optimal sequence we saw before the AArch64 > patch. > > Testing: > * Bootstrapped and regtested on aarch64-linux-gnu, arm-linux-gnueabihf, > and x86_64-linux-gnu. > > OK for trunk? > > Thanks, > Alex > > --- > > gcc/ChangeLog: > > * combine.c (expand_compound_operation): Tweak variable name in > comment to match source. > (make_extraction): Handle mult by power of two in addition to > ashift. 
>
> ---
>  gcc/combine.c | 37 ++++++++++++++++++++++---------------
>  1 file changed, 22 insertions(+), 15 deletions(-)
Hi Alex,

On Thu, Oct 22, 2020 at 09:36:02AM +0100, Alex Coplan wrote:
> Ping.
>
> Hopefully this is easier to review/test now that we fix the AArch64 bug first
> and deliberately regress code quality so that the impact of the combine patch
> can be measured.

Yes, I am just busy.  I'll get to it soon.


Segher
On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> Currently, make_extraction() identifies where we can emit an ASHIFT of
> an extend in place of an extraction, but fails to make the corresponding
> canonicalization/simplification when presented with a MULT by a power of
> two. Such a representation is canonical when representing a left-shifted
> address inside a MEM.
>
> This patch remedies this situation: after the patch, make_extraction()
> now also identifies RTXs such as:
>
> (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))
>
> and rewrites this as:
>
> (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))
>
> instead of using a sign_extract.

That is only correct if SUBREG_PROMOTED_VAR_P is true and
SUBREG_PROMOTED_UNSIGNED_P is false for r.  Is that guaranteed to be
true here (and how then?)


Segher
Hi!

On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> This patch remedies this situation: after the patch, make_extraction()
> now also identifies RTXs such as:
>
> (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n))
>
> and rewrites this as:
>
> (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n))
>
> instead of using a sign_extract.

Do we do that in simplify-rtx already (w/ the proper SUBREG_PROMOTED_*
stuff set)?  If not, we should; if so, why does that not work here?

(Just the subreg->{sign,zero}_extend that is, nothing with the mult.)


Segher
Hi Segher, On 22/10/2020 15:39, Segher Boessenkool wrote: > On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote: > > Currently, make_extraction() identifies where we can emit an ASHIFT of > > an extend in place of an extraction, but fails to make the corresponding > > canonicalization/simplification when presented with a MULT by a power of > > two. Such a representation is canonical when representing a left-shifted > > address inside a MEM. > > > > This patch remedies this situation: after the patch, make_extraction() > > now also identifies RTXs such as: > > > > (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n)) > > > > and rewrites this as: > > > > (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n)) > > > > instead of using a sign_extract. > > That is only correct if SUBREG_PROMOTED_VAR_P is true and > SUBREG_PROMOTED_UNSIGNED_P is false for r. Is that guaranteed to be > true here (and how then?) Sorry, I didn't give enough context here. For this subreg, SUBREG_PROMOTED_VAR_P is not set, so I agree that this transformation in isolation is not valid. The crucial piece of missing information is that we only make this transformation in calls to make_extraction where len = 32 + n and pos_rtx = pos = 0 (so we're extracting the bottom 32 + n bits), and unsignedp is false (so we're doing a sign_extract). Below is a proposed commit message, updated with this information. OK for trunk? Thanks, Alex --- Currently, make_extraction() identifies where we can emit an ashift of an extend in place of an extraction, but fails to make the corresponding canonicalization/simplification when presented with a mult by a power of two. Such a representation is canonical when representing a left-shifted address inside a mem. This patch remedies this situation. 
For rtxes such as: (mult:DI (subreg:DI (reg:SI r) 0) (const_int 2^n)) where the bottom 32 + n bits are valid (the higher-order bits are undefined) and make_extraction() is being asked to sign_extract the lower (valid) bits, after the patch, we rewrite this as: (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n)) instead of using a sign_extract. (This patch also fixes up a comment in expand_compound_operation() which appears to have suffered from bitrot.) For an example of the existing behavior in the ashift case, compiling the following C testcase at -O2 on AArch64: int h(void); struct c d; struct c { int e[1]; }; void g(void) { int k = 0; for (;; k = h()) { asm volatile ("" :: "r"(&d.e[k])); } } make_extraction gets called with len=34, pos_rtx=pos=0, unsignedp=false (so we're sign_extracting the bottom 34 bits), and the following rtx in inner: (ashift:DI (subreg:DI (reg/v:SI 93 [ k ]) 0) (const_int 2 [0x2])) where it is clear that the bottom 34 bits are valid (and the higher-order bits are undefined). We then hit the block: else if (GET_CODE (inner) == ASHIFT && CONST_INT_P (XEXP (inner, 1)) && pos_rtx == 0 && pos == 0 && len > UINTVAL (XEXP (inner, 1))) { /* We're extracting the least significant bits of an rtx (ashift X (const_int C)), where LEN > C. Extract the least significant (LEN - C) bits of X, giving an rtx whose mode is MODE, then shift it left C times. */ new_rtx = make_extraction (mode, XEXP (inner, 0), 0, 0, len - INTVAL (XEXP (inner, 1)), unsignedp, in_dest, in_compare); if (new_rtx != 0) return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1)); } and the recursive call to make_extraction() is asked to sign_extract the bottom (LEN - C) = 32 bits of: (subreg:DI (reg/v:SI 93) 0) which gives us: (sign_extend:DI (reg/v:SI 93 [ k ])). The gen_rtx_ASHIFT call then gives us the final result: (ashift:DI (sign_extend:DI (reg/v:SI 93 [ k ])) (const_int 2 [0x2])). 
Now all that this patch does is to teach the block that looks for these shifts how to handle the same thing written as a mult instead of an ashift. In particular, for the testcase in the PR (96998), we hit make_extraction with len=34, unsignedp=false, pos_rtx=pos=0, and inner as: (mult:DI (subreg:DI (reg/v:SI 92 [ g ]) 0) (const_int 4 [0x4])) It should be clear from the above that this can be handled in an analogous way: the recursive case is precisely the same, the only difference is that we take the log2 of the shift amount and write the end result as a mult instead. This fixes several quality regressions on AArch64 after removing support for addresses represented as sign_extract insns (1/2). In particular, after the fix for PR96998, for the relevant testcase, we have: .L2: sxtw x0, w0 // 8 [c=4 l=4] *extendsidi2_aarch64/0 add x0, x19, x0, lsl 2 // 39 [c=8 l=4] *add_lsl_di bl h // 11 [c=4 l=4] *call_value_insn/1 b .L2 // 54 [c=4 l=4] jump and after this patch, we have: .L2: add x0, x19, w0, sxtw 2 // 39 [c=8 l=4] *add_extendsi_shft_di bl h // 11 [c=4 l=4] *call_value_insn/1 b .L2 // 54 [c=4 l=4] jump which restores the original optimal sequence we saw before the AArch64 patch. --- gcc/ChangeLog: * combine.c (expand_compound_operation): Tweak variable name in comment to match source. (make_extraction): Handle mult by power of two in addition to ashift. diff --git a/gcc/combine.c b/gcc/combine.c index c88382efbd3..fe8eff2b464 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x) } /* If we reach here, we want to return a pair of shifts. The inner - shift is a left shift of BITSIZE - POS - LEN bits. The outer - shift is a right shift of BITSIZE - LEN bits. It is arithmetic or + shift is a left shift of MODEWIDTH - POS - LEN bits. The outer + shift is a right shift of MODEWIDTH - LEN bits. It is arithmetic or logical depending on the value of UNSIGNEDP. 
If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be @@ -7650,20 +7650,27 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos, is_mode = GET_MODE (SUBREG_REG (inner)); inner = SUBREG_REG (inner); } - else if (GET_CODE (inner) == ASHIFT + else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT) && CONST_INT_P (XEXP (inner, 1)) - && pos_rtx == 0 && pos == 0 - && len > UINTVAL (XEXP (inner, 1))) - { - /* We're extracting the least significant bits of an rtx - (ashift X (const_int C)), where LEN > C. Extract the - least significant (LEN - C) bits of X, giving an rtx - whose mode is MODE, then shift it left C times. */ - new_rtx = make_extraction (mode, XEXP (inner, 0), - 0, 0, len - INTVAL (XEXP (inner, 1)), - unsignedp, in_dest, in_compare); - if (new_rtx != 0) - return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1)); + && pos_rtx == 0 && pos == 0) + { + const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1)); + const auto code = GET_CODE (inner); + const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci; + + if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt) + { + /* We're extracting the least significant bits of an rtx + (ashift X (const_int C)) or (mult X (const_int (2^C))), + where LEN > C. Extract the least significant (LEN - C) bits + of X, giving an rtx whose mode is MODE, then shift it left + C times. */ + new_rtx = make_extraction (mode, XEXP (inner, 0), + 0, 0, len - shift_amt, + unsignedp, in_dest, in_compare); + if (new_rtx) + return gen_rtx_fmt_ee (code, mode, new_rtx, XEXP (inner, 1)); + } } else if (GET_CODE (inner) == TRUNCATE /* If trying or potentionally trying to extract
Hi! On Mon, Oct 26, 2020 at 10:09:41AM +0000, Alex Coplan wrote: > On 22/10/2020 15:39, Segher Boessenkool wrote: > > On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote: > > > Currently, make_extraction() identifies where we can emit an ASHIFT of > > > an extend in place of an extraction, but fails to make the corresponding > > > canonicalization/simplification when presented with a MULT by a power of > > > two. Such a representation is canonical when representing a left-shifted > > > address inside a MEM. > > > > > > This patch remedies this situation: after the patch, make_extraction() > > > now also identifies RTXs such as: > > > > > > (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n)) > > > > > > and rewrites this as: > > > > > > (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n)) > > > > > > instead of using a sign_extract. > > > > That is only correct if SUBREG_PROMOTED_VAR_P is true and > > SUBREG_PROMOTED_UNSIGNED_P is false for r. Is that guaranteed to be > > true here (and how then?) > > Sorry, I didn't give enough context here. For this subreg, > SUBREG_PROMOTED_VAR_P is not set, so I agree that this transformation in > isolation is not valid. > > The crucial piece of missing information is that we only make this > transformation in calls to make_extraction where len = 32 + n and > pos_rtx = pos = 0 (so we're extracting the bottom 32 + n bits), and > unsignedp is false (so we're doing a sign_extract). The high half of a DI subreg of a SI reg is *undefined* if SUBREG_PROMOTED_VAR_P is not set. So the code you get as input: > (ashift:DI (subreg:DI (reg/v:SI 93 [ k ]) 0) > (const_int 2 [0x2])) ... is already incorrect. Please fix that? > where it is clear that the bottom 34 bits are valid (and the > higher-order bits are undefined). We then hit the block: No, only the bottom 32 bits are valid. 
> diff --git a/gcc/combine.c b/gcc/combine.c
> index c88382efbd3..fe8eff2b464 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x)
>      }
>
>    /* If we reach here, we want to return a pair of shifts.  The inner
> -     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
> -     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
> +     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
> +     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
>      logical depending on the value of UNSIGNEDP.
>
>      If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be

MODEWIDTH isn't defined here yet, it is initialised just below to
MODE_PRECISION (mode).


Segher
On 26/10/2020 05:48, Segher Boessenkool wrote: > Hi! > > On Mon, Oct 26, 2020 at 10:09:41AM +0000, Alex Coplan wrote: > > On 22/10/2020 15:39, Segher Boessenkool wrote: > > > On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote: > > > > Currently, make_extraction() identifies where we can emit an ASHIFT of > > > > an extend in place of an extraction, but fails to make the corresponding > > > > canonicalization/simplification when presented with a MULT by a power of > > > > two. Such a representation is canonical when representing a left-shifted > > > > address inside a MEM. > > > > > > > > This patch remedies this situation: after the patch, make_extraction() > > > > now also identifies RTXs such as: > > > > > > > > (mult:DI (subreg:DI (reg:SI r)) (const_int 2^n)) > > > > > > > > and rewrites this as: > > > > > > > > (mult:DI (sign_extend:DI (reg:SI r)) (const_int 2^n)) > > > > > > > > instead of using a sign_extract. > > > > > > That is only correct if SUBREG_PROMOTED_VAR_P is true and > > > SUBREG_PROMOTED_UNSIGNED_P is false for r. Is that guaranteed to be > > > true here (and how then?) > > > > Sorry, I didn't give enough context here. For this subreg, > > SUBREG_PROMOTED_VAR_P is not set, so I agree that this transformation in > > isolation is not valid. > > > > The crucial piece of missing information is that we only make this > > transformation in calls to make_extraction where len = 32 + n and > > pos_rtx = pos = 0 (so we're extracting the bottom 32 + n bits), and > > unsignedp is false (so we're doing a sign_extract). > > The high half of a DI subreg of a SI reg is *undefined* if > SUBREG_PROMOTED_VAR_P is not set. So the code you get as input: > > > (ashift:DI (subreg:DI (reg/v:SI 93 [ k ]) 0) > > (const_int 2 [0x2])) > > ... is already incorrect. Please fix that? > > > where it is clear that the bottom 34 bits are valid (and the > > higher-order bits are undefined). We then hit the block: > > No, only the bottom 32 bits are valid. 
Well, only the low 32 bits of the subreg are valid. But because those low 32 bits are shifted left 2 times, the low 34 bits of the ashift are valid: the bottom 2 bits of the ashift are zeros, and the 32 bits above those are from the inner SImode reg (with the upper 62 bits being undefined). > > > diff --git a/gcc/combine.c b/gcc/combine.c > > index c88382efbd3..fe8eff2b464 100644 > > --- a/gcc/combine.c > > +++ b/gcc/combine.c > > @@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x) > > } > > > > /* If we reach here, we want to return a pair of shifts. The inner > > - shift is a left shift of BITSIZE - POS - LEN bits. The outer > > - shift is a right shift of BITSIZE - LEN bits. It is arithmetic or > > + shift is a left shift of MODEWIDTH - POS - LEN bits. The outer > > + shift is a right shift of MODEWIDTH - LEN bits. It is arithmetic or > > logical depending on the value of UNSIGNEDP. > > > > If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be > > MODEWIDTH isn't defined here yet, it is initialised just below to > MODE_PRECISION (mode). Yes, but bitsize isn't defined at all in this function AFAICT. Are comments not permitted to refer to variables defined immediately beneath them? Alex
On 26/10/2020 11:06, Alex Coplan via Gcc-patches wrote:
> Well, only the low 32 bits of the subreg are valid. But because those
> low 32 bits are shifted left 2 times, the low 34 bits of the ashift are
> valid: the bottom 2 bits of the ashift are zeros, and the 32 bits above
> those are from the inner SImode reg (with the upper 62 bits being
> undefined).

s/upper 62 bits/upper 30 bits/
On Mon, Oct 26, 2020 at 11:06:22AM +0000, Alex Coplan wrote:
> Well, only the low 32 bits of the subreg are valid. But because those
> low 32 bits are shifted left 2 times, the low 34 bits of the ashift are
> valid: the bottom 2 bits of the ashift are zeros, and the 32 bits above
> those are from the inner SImode reg (with the upper 62 bits being
> undefined).

Ugh.  Yes, I think you are right.  One more reason why we should only
use *explicit* sign/zero extends, none of this confusing subreg
business :-(

> > > diff --git a/gcc/combine.c b/gcc/combine.c
> > > index c88382efbd3..fe8eff2b464 100644
> > > --- a/gcc/combine.c
> > > +++ b/gcc/combine.c
> > > @@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x)
> > >      }
> > >
> > >    /* If we reach here, we want to return a pair of shifts.  The inner
> > > -     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
> > > -     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
> > > +     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
> > > +     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
> > >      logical depending on the value of UNSIGNEDP.
> > >
> > >      If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
> >
> > MODEWIDTH isn't defined here yet, it is initialised just below to
> > MODE_PRECISION (mode).
>
> Yes, but bitsize isn't defined at all in this function AFAICT. Are
> comments not permitted to refer to variables defined immediately beneath
> them?

Of course you can -- comments are free form text after all -- but as
written it suggests there already is an initialised variable "modewidth".

Just move the initialisation to above this comment?


Segher
Hi!

On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> @@ -7650,20 +7650,27 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
>        is_mode = GET_MODE (SUBREG_REG (inner));
>        inner = SUBREG_REG (inner);
>      }
> +  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
> +	   && pos_rtx == 0 && pos == 0)
> +    {
> +      const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
> +      const auto code = GET_CODE (inner);
> +      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;

Can you instead replace the mult by a shift somewhere earlier in
make_extract?  That would make a lot more sense :-)


Segher
On 26/10/2020 06:51, Segher Boessenkool wrote: > On Mon, Oct 26, 2020 at 11:06:22AM +0000, Alex Coplan wrote: > > Well, only the low 32 bits of the subreg are valid. But because those > > low 32 bits are shifted left 2 times, the low 34 bits of the ashift are > > valid: the bottom 2 bits of the ashift are zeros, and the 32 bits above > > those are from the inner SImode reg (with the upper 62 bits being > > undefined). > > Ugh. Yes, I think you are right. One more reason why we should only > use *explicit* sign/zero extends, none of this confusing subreg > business :-( Yeah. IIRC expand_compound_operation() introduces the subreg because it explicitly wants to rewrite the sign_extend using a pair of shifts (without using an extend rtx). Something like: (ashiftrt:DI (ashift:DI (subreg:DI (reg:SI r) 0) (const_int 32)) (const_int 32)) > > > > > diff --git a/gcc/combine.c b/gcc/combine.c > > > > index c88382efbd3..fe8eff2b464 100644 > > > > --- a/gcc/combine.c > > > > +++ b/gcc/combine.c > > > > @@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x) > > > > } > > > > > > > > /* If we reach here, we want to return a pair of shifts. The inner > > > > - shift is a left shift of BITSIZE - POS - LEN bits. The outer > > > > - shift is a right shift of BITSIZE - LEN bits. It is arithmetic or > > > > + shift is a left shift of MODEWIDTH - POS - LEN bits. The outer > > > > + shift is a right shift of MODEWIDTH - LEN bits. It is arithmetic or > > > > logical depending on the value of UNSIGNEDP. > > > > > > > > If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be > > > > > > MODEWIDTH isn't defined here yet, it is initialised just below to > > > MODE_PRECISION (mode). > > > > Yes, but bitsize isn't defined at all in this function AFAICT. Are > > comments not permitted to refer to variables defined immediately beneath > > them? 
> > Of course you can -- comments are free form text after all -- but as > written it suggest there already is an initialised variable "modewidth". > > Just move the initialisation to above this comment? Sure, see the revised patch attached. Thanks, Alex diff --git a/gcc/combine.c b/gcc/combine.c index 4782e1d9dcc..d4793c1c575 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -7418,9 +7418,11 @@ expand_compound_operation (rtx x) } + modewidth = GET_MODE_PRECISION (mode); + /* If we reach here, we want to return a pair of shifts. The inner - shift is a left shift of BITSIZE - POS - LEN bits. The outer - shift is a right shift of BITSIZE - LEN bits. It is arithmetic or + shift is a left shift of MODEWIDTH - POS - LEN bits. The outer + shift is a right shift of MODEWIDTH - LEN bits. It is arithmetic or logical depending on the value of UNSIGNEDP. If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be @@ -7433,7 +7435,6 @@ expand_compound_operation (rtx x) extraction. Then the constant of 31 would be substituted in to produce such a position. */ - modewidth = GET_MODE_PRECISION (mode); if (modewidth >= pos + len) { tem = gen_lowpart (mode, XEXP (x, 0)); @@ -7650,20 +7651,27 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos, is_mode = GET_MODE (SUBREG_REG (inner)); inner = SUBREG_REG (inner); } - else if (GET_CODE (inner) == ASHIFT + else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT) && CONST_INT_P (XEXP (inner, 1)) - && pos_rtx == 0 && pos == 0 - && len > UINTVAL (XEXP (inner, 1))) - { - /* We're extracting the least significant bits of an rtx - (ashift X (const_int C)), where LEN > C. Extract the - least significant (LEN - C) bits of X, giving an rtx - whose mode is MODE, then shift it left C times. 
*/ - new_rtx = make_extraction (mode, XEXP (inner, 0), - 0, 0, len - INTVAL (XEXP (inner, 1)), - unsignedp, in_dest, in_compare); - if (new_rtx != 0) - return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1)); + && pos_rtx == 0 && pos == 0) + { + const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1)); + const auto code = GET_CODE (inner); + const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci; + + if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt) + { + /* We're extracting the least significant bits of an rtx + (ashift X (const_int C)) or (mult X (const_int (2^C))), + where LEN > C. Extract the least significant (LEN - C) bits + of X, giving an rtx whose mode is MODE, then shift it left + C times. */ + new_rtx = make_extraction (mode, XEXP (inner, 0), + 0, 0, len - shift_amt, + unsignedp, in_dest, in_compare); + if (new_rtx) + return gen_rtx_fmt_ee (code, mode, new_rtx, XEXP (inner, 1)); + } } else if (GET_CODE (inner) == TRUNCATE /* If trying or potentionally trying to extract
On 26/10/2020 07:12, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> > @@ -7650,20 +7650,27 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
> >        is_mode = GET_MODE (SUBREG_REG (inner));
> >        inner = SUBREG_REG (inner);
> >      }
> > +  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
> > +	   && pos_rtx == 0 && pos == 0)
> > +    {
> > +      const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
> > +      const auto code = GET_CODE (inner);
> > +      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
>
> Can you instead replace the mult by a shift somewhere earlier in
> make_extract?  That would make a lot more sense :-)

I guess we could do this, the only complication being that we can't
unconditionally rewrite the expression using a shift, since mult is canonical
inside a mem (which is why we see it in the testcase in the PR).

So if we did this, we'd have to remember that we did it earlier on, and rewrite
it back to a mult accordingly.

Would you still like to see a version of the patch that does that, or is this
version OK: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557050.html ?

Thanks,
Alex
On Mon, Oct 26, 2020 at 01:28:42PM +0000, Alex Coplan wrote:
> On 26/10/2020 07:12, Segher Boessenkool wrote:
> > On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote:
> > Can you instead replace the mult by a shift somewhere earlier in
> > make_extract?  That would make a lot more sense :-)
>
> I guess we could do this, the only complication being that we can't
> unconditionally rewrite the expression using a shift, since mult is canonical
> inside a mem (which is why we see it in the testcase in the PR).

You can do it just inside the block you are already editing.

> So if we did this, we'd have to remember that we did it earlier on, and rewrite
> it back to a mult accordingly.

Yes, this function has ridiculously complicated control flow.  So I
cannot trick you into improving it?  ;-)

> Would you still like to see a version of the patch that does that, or is this
> version OK: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557050.html ?

I do not like handling both mult and ashift in one case like this, it
complicates things for no good reason.  Write it as two cases, and it
should be good.


Segher
On Mon, Oct 26, 2020 at 01:18:54PM +0000, Alex Coplan wrote:
> -  else if (GET_CODE (inner) == ASHIFT
> +  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)

As I wrote in the other mail, write this as two cases.  Write something
in the comment for the mult one that this is for the canonicalisation
of memory addresses (feel free to use swear words).

> +    {
> +      const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
> +      const auto code = GET_CODE (inner);
> +      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
> +
> +      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt)

Space after cast; better is to not need a cast at all (and you do not
need one, len is unsigned HOST_WIDE_INT already).


Segher
On 26/10/2020 12:43, Segher Boessenkool wrote: > On Mon, Oct 26, 2020 at 01:28:42PM +0000, Alex Coplan wrote: > > On 26/10/2020 07:12, Segher Boessenkool wrote: > > > On Thu, Oct 15, 2020 at 09:59:24AM +0100, Alex Coplan wrote: > > > Can you instead replace the mult by a shift somewhere earlier in > > > make_extract? That would make a lot more sense :-) > > > > I guess we could do this, the only complication being that we can't > > unconditionally rewrite the expression using a shift, since mult is canonical > > inside a mem (which is why we see it in the testcase in the PR). > > You can do it just inside the block you are already editing. > > > So if we did this, we'd have to remember that we did it earlier on, and rewrite > > it back to a mult accordingly. > > Yes, this function has ridiculously complicated cpontrol flow. So I > cannot trick you into improving it? ;-) > > > Would you still like to see a version of the patch that does that, or is this > > version OK: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557050.html ? > > I do not like handling both mult and ashift in one case like this, it > complicates things for no good reason. Write it as two cases, and it > should be good. OK, the attached patch rewrites (mult x 2^n) to (ashift x n) at the top of make_extraction so that the existing ASHIFT block can do the work for us. We remember if we did it and then convert it back if necessary. I'm not convinced that it's an improvement. What do you think? Bootstrap/regtest in progress on aarch64-none-linux-gnu. I'll test other platforms (as well as testing on top of 1/2) and repost with a proper commit message if you think it looks good. Alex --- gcc/ChangeLog: (make_extraction): Temporarily rewrite (mult x 2^n) so that we can handle it as (ashift x n) and avoid emitting an extract where extend+shift will suffice. 
diff --git a/gcc/combine.c b/gcc/combine.c index 4782e1d9dcc..991dc5eabf7 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -7628,10 +7628,25 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos, rtx new_rtx = 0; rtx orig_pos_rtx = pos_rtx; HOST_WIDE_INT orig_pos; + bool rewrote_mult_p = false; if (pos_rtx && CONST_INT_P (pos_rtx)) pos = INTVAL (pos_rtx), pos_rtx = 0; + if (GET_CODE (inner) == MULT && CONST_INT_P (XEXP (inner, 1))) + { + /* We have to handle shifts disguised as multiplications + by powers of two since this is the canonical form for + mem addresses. */ + const int shift_amt = exact_log2 (INTVAL (XEXP (inner, 1))); + if (shift_amt > 0) + { + PUT_CODE (inner, ASHIFT); + INTVAL (XEXP (inner, 1)) = shift_amt; + rewrote_mult_p = true; + } + } + if (GET_CODE (inner) == SUBREG && subreg_lowpart_p (inner) && (paradoxical_subreg_p (inner) @@ -7663,7 +7678,7 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos, 0, 0, len - INTVAL (XEXP (inner, 1)), unsignedp, in_dest, in_compare); if (new_rtx != 0) - return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1)); + new_rtx = gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1)); } else if (GET_CODE (inner) == TRUNCATE /* If trying or potentionally trying to extract @@ -7673,6 +7688,17 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos, && known_le (pos + len, GET_MODE_PRECISION (is_mode))) inner = XEXP (inner, 0); + if (rewrote_mult_p) + { + /* If we rewrote MULT -> ASHIFT, convert it back now. */ + rtx x = new_rtx ? new_rtx : inner; + PUT_CODE (x, MULT); + INTVAL (XEXP (x, 1)) = 1 << INTVAL (XEXP (x, 1)); + } + + if (new_rtx) + return new_rtx; + inner_mode = GET_MODE (inner); /* See if this can be done without an extraction. We never can if the
On 27/10/2020 10:35, Alex Coplan via Gcc-patches wrote:
> On 26/10/2020 12:43, Segher Boessenkool wrote:
> > I do not like handling both mult and ashift in one case like this, it
> > complicates things for no good reason.  Write it as two cases, and it
> > should be good.
>
> OK, the attached patch rewrites (mult x 2^n) to (ashift x n) at the top
> of make_extraction so that the existing ASHIFT block can do the work for
> us. We remember if we did it and then convert it back if necessary.
>
> I'm not convinced that it's an improvement. What do you think?
>
> Bootstrap/regtest in progress on aarch64-none-linux-gnu. I'll test other
> platforms (as well as testing on top of 1/2) and repost with a proper
> commit message if you think it looks good.

This passes bootstrap and regtest on aarch64, FWIW.

Alex

> gcc/ChangeLog:
>
> 	(make_extraction): Temporarily rewrite (mult x 2^n) so that we
> 	can handle it as (ashift x n) and avoid emitting an extract where
> 	extend+shift will suffice.
On Tue, Oct 27, 2020 at 10:35:59AM +0000, Alex Coplan wrote:
> On 26/10/2020 12:43, Segher Boessenkool wrote:
> > I do not like handling both mult and ashift in one case like this, it
> > complicates things for no good reason.  Write it as two cases, and it
> > should be good.
> 
> OK, the attached patch rewrites (mult x 2^n) to (ashift x n) at the top
> of make_extraction so that the existing ASHIFT block can do the work for
> us. We remember if we did it and then convert it back if necessary.
> 
> I'm not convinced that it's an improvement. What do you think?

Restoring it like that is just yuck.  That can be okay if it is at the
start and end of a smallish function, importantly some self-contained
piece of code, but this is not.

Just write it as two blocks?  One handling the shift, that is already
there; and add one block adding the mult case.  That should not
increase the complexity of this already way too complex code.


Segher
On 27/10/2020 17:31, Segher Boessenkool wrote:
> On Tue, Oct 27, 2020 at 10:35:59AM +0000, Alex Coplan wrote:
> > On 26/10/2020 12:43, Segher Boessenkool wrote:
> > > I do not like handling both mult and ashift in one case like this, it
> > > complicates things for no good reason.  Write it as two cases, and it
> > > should be good.
> > 
> > OK, the attached patch rewrites (mult x 2^n) to (ashift x n) at the top
> > of make_extraction so that the existing ASHIFT block can do the work for
> > us. We remember if we did it and then convert it back if necessary.
> > 
> > I'm not convinced that it's an improvement. What do you think?
> 
> Restoring it like that is just yuck.  That can be okay if it is at the
> start and end of a smallish function, importantly some self-contained
> piece of code, but this is not.
> 
> Just write it as two blocks?  One handling the shift, that is already
> there; and add one block adding the mult case.  That should not
> increase the complexity of this already way too complex code.

OK, how about the attached?

Bootstrap and regtest in progress on aarch64-none-linux-gnu.

Thanks,
Alex

---

gcc/ChangeLog:

	* combine.c (make_extraction): Also handle shifts written as
	(mult x 2^n), avoid creating an extract rtx for these.

diff --git a/gcc/combine.c b/gcc/combine.c
index 4782e1d9dcc..5040dabff98 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7665,6 +7665,24 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
       if (new_rtx != 0)
	 return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
     }
+  else if (GET_CODE (inner) == MULT
+	   && CONST_INT_P (XEXP (inner, 1))
+	   && pos_rtx == 0 && pos == 0)
+    {
+      /* We're extracting the least significant bits of an rtx
+	 (mult X (const_int 2^C)), where LEN > C.  Extract the
+	 least significant (LEN - C) bits of X, giving an rtx
+	 whose mode is MODE, then multiply it by 2^C.  */
+      const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
+      if (shift_amt > 0 && len > shift_amt)
+	{
+	  new_rtx = make_extraction (mode, XEXP (inner, 0),
+				     0, 0, len - shift_amt,
+				     unsignedp, in_dest, in_compare);
+	  if (new_rtx)
+	    return gen_rtx_MULT (mode, new_rtx, XEXP (inner, 1));
+	}
+    }
   else if (GET_CODE (inner) == TRUNCATE
	   /* If trying or potentionally trying to extract
	      bits outside of is_mode, don't look through
On 28/10/2020 09:09, Alex Coplan via Gcc-patches wrote:
> On 27/10/2020 17:31, Segher Boessenkool wrote:
> > On Tue, Oct 27, 2020 at 10:35:59AM +0000, Alex Coplan wrote:
> > > On 26/10/2020 12:43, Segher Boessenkool wrote:
> > > > I do not like handling both mult and ashift in one case like this, it
> > > > complicates things for no good reason.  Write it as two cases, and it
> > > > should be good.
> > > 
> > > OK, the attached patch rewrites (mult x 2^n) to (ashift x n) at the top
> > > of make_extraction so that the existing ASHIFT block can do the work for
> > > us. We remember if we did it and then convert it back if necessary.
> > > 
> > > I'm not convinced that it's an improvement. What do you think?
> > 
> > Restoring it like that is just yuck.  That can be okay if it is at the
> > start and end of a smallish function, importantly some self-contained
> > piece of code, but this is not.
> > 
> > Just write it as two blocks?  One handling the shift, that is already
> > there; and add one block adding the mult case.  That should not
> > increase the complexity of this already way too complex code.
> 
> OK, how about the attached?
> 
> Bootstrap and regtest in progress on aarch64-none-linux-gnu.

This fails bootstrap since we trigger -Wsign-compare without the cast to
unsigned HOST_WIDE_INT on shift_amt:

> +      const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
> +      if (shift_amt > 0 && len > shift_amt)

So, quoting an earlier reply:

> > +      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
> > +
> > +      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt)
> 
> Space after cast; better is to not need a cast at all (and you do not
> need one, len is unsigned HOST_WIDE_INT already).

Unfortunately we do need the cast here. See the revised patch attached,
bootstrap on aarch64 in progress.

Thanks,
Alex

---

gcc/ChangeLog:

	* combine.c (make_extraction): Also handle shifts written as
	(mult x 2^n), avoid creating an extract rtx for these.
diff --git a/gcc/combine.c b/gcc/combine.c
index 4782e1d9dcc..729d04b1d9e 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7665,6 +7665,24 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
       if (new_rtx != 0)
	 return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
     }
+  else if (GET_CODE (inner) == MULT
+	   && CONST_INT_P (XEXP (inner, 1))
+	   && pos_rtx == 0 && pos == 0)
+    {
+      /* We're extracting the least significant bits of an rtx
+	 (mult X (const_int 2^C)), where LEN > C.  Extract the
+	 least significant (LEN - C) bits of X, giving an rtx
+	 whose mode is MODE, then multiply it by 2^C.  */
+      const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
+      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT) shift_amt)
+	{
+	  new_rtx = make_extraction (mode, XEXP (inner, 0),
+				     0, 0, len - shift_amt,
+				     unsignedp, in_dest, in_compare);
+	  if (new_rtx)
+	    return gen_rtx_MULT (mode, new_rtx, XEXP (inner, 1));
+	}
+    }
   else if (GET_CODE (inner) == TRUNCATE
	   /* If trying or potentionally trying to extract
	      bits outside of is_mode, don't look through
Hi!

On Wed, Oct 28, 2020 at 11:13:28AM +0000, Alex Coplan wrote:
> This fails bootstrap since we trigger -Wsign-compare without the cast to
> unsigned HOST_WIDE_INT on shift_amt:

> +  else if (GET_CODE (inner) == MULT
> +	   && CONST_INT_P (XEXP (inner, 1))
> +	   && pos_rtx == 0 && pos == 0)
> +    {
> +      /* We're extracting the least significant bits of an rtx
> +	 (mult X (const_int 2^C)), where LEN > C.  Extract the
> +	 least significant (LEN - C) bits of X, giving an rtx
> +	 whose mode is MODE, then multiply it by 2^C.  */
> +      const HOST_WIDE_INT shift_amt = exact_log2 (INTVAL (XEXP (inner, 1)));
> +      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT) shift_amt)

  if (IN_RANGE (shift_amt, 1, len - 1))

> +	{
> +	  new_rtx = make_extraction (mode, XEXP (inner, 0),
> +				     0, 0, len - shift_amt,
> +				     unsignedp, in_dest, in_compare);
> +	  if (new_rtx)
> +	    return gen_rtx_MULT (mode, new_rtx, XEXP (inner, 1));
> +	}
> +    }
>    else if (GET_CODE (inner) == TRUNCATE
> 	    /* If trying or potentionally trying to extract
> 	       bits outside of is_mode, don't look through

Okay for trunk like that.  Thanks!


Segher
diff --git a/gcc/combine.c b/gcc/combine.c
index 4782e1d9dcc..88f3925aee7 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7419,8 +7419,8 @@ expand_compound_operation (rtx x)
     }
 
   /* If we reach here, we want to return a pair of shifts.  The inner
-     shift is a left shift of BITSIZE - POS - LEN bits.  The outer
-     shift is a right shift of BITSIZE - LEN bits.  It is arithmetic or
+     shift is a left shift of MODEWIDTH - POS - LEN bits.  The outer
+     shift is a right shift of MODEWIDTH - LEN bits.  It is arithmetic or
     logical depending on the value of UNSIGNEDP.
 
     If this was a ZERO_EXTEND or ZERO_EXTRACT, this pair of shifts will be
@@ -7650,20 +7650,27 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
       is_mode = GET_MODE (SUBREG_REG (inner));
       inner = SUBREG_REG (inner);
     }
-  else if (GET_CODE (inner) == ASHIFT
+  else if ((GET_CODE (inner) == ASHIFT || GET_CODE (inner) == MULT)
	   && CONST_INT_P (XEXP (inner, 1))
-	   && pos_rtx == 0 && pos == 0
-	   && len > UINTVAL (XEXP (inner, 1)))
-    {
-      /* We're extracting the least significant bits of an rtx
-	 (ashift X (const_int C)), where LEN > C.  Extract the
-	 least significant (LEN - C) bits of X, giving an rtx
-	 whose mode is MODE, then shift it left C times.  */
-      new_rtx = make_extraction (mode, XEXP (inner, 0),
-				 0, 0, len - INTVAL (XEXP (inner, 1)),
-				 unsignedp, in_dest, in_compare);
-      if (new_rtx != 0)
-	return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
+	   && pos_rtx == 0 && pos == 0)
+    {
+      const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
+      const auto code = GET_CODE (inner);
+      const HOST_WIDE_INT shift_amt = (code == MULT) ? exact_log2 (ci) : ci;
+
+      if (shift_amt > 0 && len > (unsigned HOST_WIDE_INT)shift_amt)
+	{
+	  /* We're extracting the least significant bits of an rtx
+	     (ashift X (const_int C)) or (mult X (const_int (2^C))),
+	     where LEN > C.  Extract the least significant (LEN - C) bits
+	     of X, giving an rtx whose mode is MODE, then shift it left
+	     C times.  */
+	  new_rtx = make_extraction (mode, XEXP (inner, 0),
+				     0, 0, len - shift_amt,
+				     unsignedp, in_dest, in_compare);
+	  if (new_rtx)
+	    return gen_rtx_fmt_ee (code, mode, new_rtx, XEXP (inner, 1));
+	}
     }
   else if (GET_CODE (inner) == TRUNCATE
	   /* If trying or potentionally trying to extract