Message ID | DB6PR0801MB2053FB1EE13D95936C2E700683EB0@DB6PR0801MB2053.eurprd08.prod.outlook.com |
---|---|
State | New |
Headers | show |
Series | PR82964: Fix 128-bit immediate ICEs | expand |
On Mon, Jan 15, 2018 at 11:34:19AM +0000, Wilco Dijkstra wrote: > This fixes PR82964 which reports ICEs for some CONST_WIDE_INT immediates. > It turns out decimal floating point CONST_DOUBLE get changed into > CONST_WIDE_INT without checking the constraint on the operand, which > results in failures. Avoid this by only allowing SF/DF/TF mode floating > point constants in aarch64_legitimate_constant_p. A similar issue can > occur with 128-bit immediates which may be emitted even when disallowed > in aarch64_legitimate_constant_p, and the constraints in movti_aarch64 > don't match. Fix this with a new constraint and allowing valid immediates > in aarch64_legitimate_constant_p. > > Rather than allowing all 128-bit immediates and expanding in up to 8 > MOV/MOVK instructions, limit them to 4 instructions and use a literal > load for other cases. Improve the pr79041-2.c test to use a literal and > skip it for -fpic. > > This fixes all reported failures. OK for commit? Most of this makes sense, but I don't understand this relaxation in aarch64_legitimate_constant_p > - /* Do not allow wide int constants - this requires support in movti. */ > + /* Only allow simple 128-bit immediates. */ > if (CONST_WIDE_INT_P (x)) > - return false; > + return aarch64_mov128_immediate (x); I can see why this could be correct, but it is unclear why it is neccessary to fix the bug. What goes wrong if we leave this as "return false". I think the patch looks OK otherwise, but I'd appreciate an answer on that point before you commit. Thanks, James
James Greenhalgh wrote: > - /* Do not allow wide int constants - this requires support in movti. */ > + /* Only allow simple 128-bit immediates. */ > if (CONST_WIDE_INT_P (x)) > - return false; > + return aarch64_mov128_immediate (x); > I can see why this could be correct, but it is unclear why it is neccessary > to fix the bug. What goes wrong if we leave this as "return false". It's not necessary, things only go wrong if you return true for a wider set of immediates than those directly supported by the movti pattern - and that may be a regalloc issue. However removing it (returning false in all cases) actually improves code quality due to a bug in memset expansion. Therefore I'll commit it as returning false for now (there was no change in test results) and update it once memset is fixed and inlining works as expected. Returning true means memset(p, 32, 63) expands as: mov x2, 2314885530818453536 mov x3, 2314885530818453536 mov x6, 2314885530818453536 mov w5, 538976288 mov w4, 8224 mov w1, 32 stp x2, x3, [x0] stp x2, x3, [x0, 16] stp x2, x3, [x0, 32] str x6, [x0, 48] str w5, [x0, 56] strh w4, [x0, 60] strb w1, [x0, 62] ret Wilco
Hi Wilco, On 17 January 2018 at 17:22, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > James Greenhalgh wrote: > >> - /* Do not allow wide int constants - this requires support in movti. */ >> + /* Only allow simple 128-bit immediates. */ >> if (CONST_WIDE_INT_P (x)) >> - return false; >> + return aarch64_mov128_immediate (x); > >> I can see why this could be correct, but it is unclear why it is neccessary >> to fix the bug. What goes wrong if we leave this as "return false". > > It's not necessary, things only go wrong if you return true for a wider set of > immediates than those directly supported by the movti pattern - and that may > be a regalloc issue. > > However removing it (returning false in all cases) actually improves code quality > due to a bug in memset expansion. Therefore I'll commit it as returning false > for now (there was no change in test results) and update it once memset is fixed > and inlining works as expected. > > Returning true means memset(p, 32, 63) expands as: > > mov x2, 2314885530818453536 > mov x3, 2314885530818453536 > mov x6, 2314885530818453536 > mov w5, 538976288 > mov w4, 8224 > mov w1, 32 > stp x2, x3, [x0] > stp x2, x3, [x0, 16] > stp x2, x3, [x0, 32] > str x6, [x0, 48] > str w5, [x0, 56] > strh w4, [x0, 60] > strb w1, [x0, 62] > ret > > Wilco > > After this patch (r256800), I have noticed new failures on aarch64: gcc.target/aarch64/f16_mov_immediate_1.c scan-assembler-times mov\tw[0-9]+, #?19520 3 (found 0 times) gcc.target/aarch64/f16_mov_immediate_1.c scan-assembler-times movi\tv[0-9]+\\.2s, 0x4c, lsl 8 1 (found 0 times) gcc.target/aarch64/f16_mov_immediate_1.c scan-assembler-times movi\tv[0-9]+\\.2s, 0xbc, lsl 8 1 (found 0 times) gcc.target/aarch64/f16_mov_immediate_2.c scan-assembler-times fmov\th[0-9], w[0-9]+ 1 (found 0 times) gcc.target/aarch64/f16_mov_immediate_2.c scan-assembler-times mov\tw[0-9]+, 19520 1 (found 0 times) gcc.target/aarch64/f16_mov_immediate_2.c scan-assembler-times movi\tv[0-9]+\\.2s, 0x5c, lsl 8 1 (found 0 times) gcc.target/aarch64/f16_mov_immediate_2.c scan-assembler-times movi\tv[0-9]+\\.2s, 0x7c, lsl 8 1 (found 0 times) gcc.target/aarch64/f16_mov_immediate_2.c scan-assembler-times movi\tv[0-9]+\\.2s, 0x80, lsl 8 1 (found 0 times) gcc.target/aarch64/f16_mov_immediate_2.c scan-assembler-times movi\tv[0-9]+\\.4h, ?#0 1 (found 0 times) Christophe
Christophe Lyon wrote: > After this patch (r256800), I have noticed new failures on aarch64: > gcc.target/aarch64/f16_mov_immediate_1.c scan-assembler-times > mov\tw[0-9]+, #?19520 3 (found 0 times) Thanks for spotting these, the scripts appear to have missed those (contrib/dg-cmp-results.sh skipped them eventhough they were in contrib/compare_tests...). Anyway these are fixed now. Wilco
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 15c3b46ebef8f305f960e60a8b4e85d8be07e8c7..bc93f4c5753b47c05c144f4a80ba8034603d3736 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -431,6 +431,8 @@ void aarch64_split_128bit_move (rtx, rtx); bool aarch64_split_128bit_move_p (rtx, rtx); +bool aarch64_mov128_immediate (rtx); + void aarch64_split_simd_combine (rtx, rtx, rtx); void aarch64_split_simd_move (rtx, rtx); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 7b50ab43dbc075e6b6d4541c3fb71e5cc872c88b..e6cdbe74356e395c887082cea66a468b51b2ff47 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1996,6 +1996,23 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate, return num_insns; } +/* Return whether imm is a 128-bit immediate which is simple enough to + expand inline. */ +bool +aarch64_mov128_immediate (rtx imm) +{ + if (GET_CODE (imm) == CONST_INT) + return true; + + gcc_assert (CONST_WIDE_INT_NUNITS (imm) == 2); + + rtx lo = GEN_INT (CONST_WIDE_INT_ELT (imm, 0)); + rtx hi = GEN_INT (CONST_WIDE_INT_ELT (imm, 1)); + + return aarch64_internal_mov_immediate (NULL_RTX, lo, false, DImode) + + aarch64_internal_mov_immediate (NULL_RTX, hi, false, DImode) <= 4; +} + /* Add DELTA to REGNUM in mode MODE. SCRATCHREG can be used to hold a temporary value if necessary. FRAME_RELATED_P should be true if the RTX_FRAME_RELATED flag should be set and CFA adjustments added @@ -10650,7 +10667,10 @@ static bool aarch64_legitimate_constant_p (machine_mode mode, rtx x) { /* Support CSE and rematerialization of common constants. */ - if (CONST_INT_P (x) || CONST_DOUBLE_P (x) || GET_CODE (x) == CONST_VECTOR) + if (CONST_INT_P (x) + || (CONST_DOUBLE_P (x) + && (mode == SFmode || mode == DFmode || mode == TFmode)) + || GET_CODE (x) == CONST_VECTOR) return true; /* Do not allow vector struct mode constants. We could support @@ -10658,9 +10678,9 @@ aarch64_legitimate_constant_p (machine_mode mode, rtx x) if (aarch64_vect_struct_mode_p (mode)) return false; - /* Do not allow wide int constants - this requires support in movti. */ + /* Only allow simple 128-bit immediates. */ if (CONST_WIDE_INT_P (x)) - return false; + return aarch64_mov128_immediate (x); /* Do not allow const (plus (anchor_symbol, const_int)). */ if (GET_CODE (x) == CONST) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index d14b57b0ef7f4eeca40bfdcaf3ebb02a1031cb99..382953e6ec42ae4475d66143be1e25d22e48571f 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1023,9 +1023,9 @@ (define_expand "movti" (define_insn "*movti_aarch64" [(set (match_operand:TI 0 - "nonimmediate_operand" "=r, w,r,w,r,m,m,w,m") + "nonimmediate_operand" "= r,w, r,w,r,m,m,w,m") (match_operand:TI 1 - "aarch64_movti_operand" " rn,r,w,w,m,r,Z,m,w"))] + "aarch64_movti_operand" " rUti,r, w,w,m,r,Z,m,w"))] "(register_operand (operands[0], TImode) || aarch64_reg_or_zero (operands[1], TImode))" "@ diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index af4143ef756464afac29d17f124b436520f90451..b13fec91fbc337c60d2c7e24080d25fdf1706e7b 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -69,6 +69,12 @@ (define_constraint "N" (and (match_code "const_int") (match_test "aarch64_move_imm (ival, DImode)"))) +(define_constraint "Uti" + "A constant that can be used with a 128-bit MOV immediate operation." + (and (ior (match_code "const_int") + (match_code "const_wide_int")) + (match_test "aarch64_mov128_immediate (op)"))) + (define_constraint "UsO" "A constant that can be used with a 32-bit and operation." (and (match_code "const_int") diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 2eaf0a7630169c3f4c23632d2a90be9ca15680df..fc06365cc11c4d86f7b95489a83d9a5698265290 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -257,15 +257,14 @@ (define_predicate "aarch64_mov_operand" (match_test "aarch64_mov_operand_p (op, mode)"))))) (define_predicate "aarch64_movti_operand" - (and (match_code "reg,subreg,mem,const_int") - (ior (match_operand 0 "register_operand") - (ior (match_operand 0 "memory_operand") - (match_operand 0 "const_int_operand"))))) + (ior (match_operand 0 "register_operand") + (match_operand 0 "memory_operand") + (and (match_operand 0 "const_scalar_int_operand") + (match_test "aarch64_mov128_immediate (op)")))) (define_predicate "aarch64_reg_or_imm" - (and (match_code "reg,subreg,const_int") - (ior (match_operand 0 "register_operand") - (match_operand 0 "const_int_operand")))) + (ior (match_operand 0 "register_operand") + (match_operand 0 "const_scalar_int_operand"))) ;; True for integer comparisons and for FP comparisons other than LTGT or UNEQ. (define_special_predicate "aarch64_comparison_operator" diff --git a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c index a889dfdd89549fdd19d2e167d43835ff0cbb236a..4695b5c1b2b7c9b515995e242dd38e0519a48a2b 100644 --- a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c +++ b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c @@ -1,11 +1,12 @@ /* { dg-do compile } */ /* { dg-options "-O2 -mcmodel=large -mpc-relative-literal-loads" } */ /* { dg-require-effective-target lp64 } */ +/* { dg-skip-if "-mcmodel=large, no support for -fpic" { aarch64-*-* } { "-fpic" } { "" } } */ __int128 t (void) { - return (__int128)1 << 80; + return ((__int128)0x123456789abcdef << 64) | 0xfedcba987654321; } /* { dg-final { scan-assembler "adr" } } */