PR82964: Fix 128-bit immediate ICEs

Message ID	DB6PR0801MB2053FB1EE13D95936C2E700683EB0@DB6PR0801MB2053.eurprd08.prod.outlook.com
State	New
Headers	show Return-Path: <gcc-patches-return-471262-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:content-type :content-transfer-encoding:mime-version; q=dns; s=default; b=GfN 6IfKrw+upR3QmG0O4Z4Xz2EjdakyxiNfnFsIHpylb89O5ooo3+liGYy55YtzvwOV nGMGO/sp2RdNpU/y5bL9VuCnDTc4Y7f83LFhz7BIJZKh0u9Oi6jh9iG2Xp3l+M6x m35vZaTEL6VnxKQXO+W7dQGPAO6EIAaPNsxPJHfE= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: Wilco Dijkstra <Wilco.Dijkstra@arm.com> To: GCC Patches <gcc-patches@gcc.gnu.org> CC: nd <nd@arm.com>, James Greenhalgh <James.Greenhalgh@arm.com> Subject: [PATCH] PR82964: Fix 128-bit immediate ICEs Date: Mon, 15 Jan 2018 11:34:19 +0000 Message-ID: <DB6PR0801MB2053FB1EE13D95936C2E700683EB0@DB6PR0801MB2053.eurprd08.prod.outlook.com> nodisclaimer: True received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) x-microsoft-antispam-message-info: f09Ur62+ICletb/c9652vR1lKZb2SFW2whfgN+i0uxnwi5MR7Fl2QRURwVE+x3tY+6p61tItKOXyAqiQcKVliQ== spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0
Series	PR82964: Fix 128-bit immediate ICEs \| expand PR82964: Fix 128-bit immediate ICEs

Message ID

DB6PR0801MB2053FB1EE13D95936C2E700683EB0@DB6PR0801MB2053.eurprd08.prod.outlook.com

State

New

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id:content-type
	:content-transfer-encoding:mime-version; q=dns; s=default; b=GfN
	6IfKrw+upR3QmG0O4Z4Xz2EjdakyxiNfnFsIHpylb89O5ooo3+liGYy55YtzvwOV
	nGMGO/sp2RdNpU/y5bL9VuCnDTc4Y7f83LFhz7BIJZKh0u9Oi6jh9iG2Xp3l+M6x
	m35vZaTEL6VnxKQXO+W7dQGPAO6EIAaPNsxPJHfE=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: nd <nd@arm.com>, James Greenhalgh <James.Greenhalgh@arm.com>
Subject: [PATCH] PR82964: Fix 128-bit immediate ICEs
Date: Mon, 15 Jan 2018 11:34:19 +0000
Message-ID: <DB6PR0801MB2053FB1EE13D95936C2E700683EB0@DB6PR0801MB2053.eurprd08.prod.outlook.com>
nodisclaimer: True
received-spf: None (protection.outlook.com: arm.com does not designate
	permitted sender hosts)
spamdiagnosticoutput: 1:99
spamdiagnosticmetadata: NSPM
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-Network-Message-Id: 2a898d68-ab40-4e88-c745-08d55c0beaa3
X-MS-Exchange-CrossTenant-originalarrivaltime: 15 Jan 2018 11:34:19.2540
	(UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-Transport-CrossTenantHeadersStamped: HE1PR08MB2665

Series

PR82964: Fix 128-bit immediate ICEs | expand

Commit Message

Wilco Dijkstra Jan. 15, 2018, 11:34 a.m. UTC

This fixes PR82964 which reports ICEs for some CONST_WIDE_INT immediates.
It turns out decimal floating point CONST_DOUBLE get changed into
CONST_WIDE_INT without checking the constraint on the operand, which 
results in failures.  Avoid this by only allowing SF/DF/TF mode floating
point constants in aarch64_legitimate_constant_p.  A similar issue can
occur with 128-bit immediates which may be emitted even when disallowed
in aarch64_legitimate_constant_p, and the constraints in movti_aarch64
don't match.  Fix this with a new constraint and allowing valid immediates
in aarch64_legitimate_constant_p.

Rather than allowing all 128-bit immediates and expanding in up to 8
MOV/MOVK instructions, limit them to 4 instructions and use a literal
load for other cases.  Improve the pr79041-2.c test to use a literal and
skip it for -fpic.

This fixes all reported failures. OK for commit?


ChangeLog:
2018-01-15  Wilco Dijkstra  <wdijkstr@arm.com>
	    Richard Sandiford  <richard.sandiford@linaro.org>

    gcc/
	PR target/82964
	* config/aarch64/aarch64.md (movti_aarch64): Use Uti constraint.
	* config/aarch64/aarch64.c (aarch64_mov128_immediate): New function.
	(aarch64_legitimate_constant_p): Just support CONST_DOUBLE 
	SF/DF/TF mode to avoid creating illegal CONST_WIDE_INT immediates.
	Call aarch64_mov128_immediate for CONST_WIDE_INT.
	* config/aarch64/aarch64-protos.h (aarch64_mov128_immediate): Add declaration.
	* config/aarch64/constraints.md (aarch64_movti_operand): Limit immediates.
	* config/aarch64/predicates.md (Uti): Add new constraint.

    gcc/testsuite/
	PR target/79041
	PR target/82964
	* gcc.target/aarch64/pr79041-2.c: improve test, disable with fpic.
--

Comments

James Greenhalgh Jan. 16, 2018, 5:12 p.m. UTC | #1

On Mon, Jan 15, 2018 at 11:34:19AM +0000, Wilco Dijkstra wrote:
> This fixes PR82964 which reports ICEs for some CONST_WIDE_INT immediates.
> It turns out decimal floating point CONST_DOUBLE get changed into
> CONST_WIDE_INT without checking the constraint on the operand, which 
> results in failures.  Avoid this by only allowing SF/DF/TF mode floating
> point constants in aarch64_legitimate_constant_p.  A similar issue can
> occur with 128-bit immediates which may be emitted even when disallowed
> in aarch64_legitimate_constant_p, and the constraints in movti_aarch64
> don't match.  Fix this with a new constraint and allowing valid immediates
> in aarch64_legitimate_constant_p.
> 
> Rather than allowing all 128-bit immediates and expanding in up to 8
> MOV/MOVK instructions, limit them to 4 instructions and use a literal
> load for other cases.  Improve the pr79041-2.c test to use a literal and
> skip it for -fpic.
> 
> This fixes all reported failures. OK for commit?

Most of this makes sense, but I don't understand this relaxation in
aarch64_legitimate_constant_p

> -  /* Do not allow wide int constants - this requires support in movti.  */
> +  /* Only allow simple 128-bit immediates.  */
>    if (CONST_WIDE_INT_P (x))
> -    return false;
> +    return aarch64_mov128_immediate (x);

I can see why this could be correct, but it is unclear why it is neccessary
to fix the bug. What goes wrong if we leave this as "return false".

I think the patch looks OK otherwise, but I'd appreciate an answer on that
point before you commit.

Thanks,
James

Wilco Dijkstra Jan. 17, 2018, 4:22 p.m. UTC | #2

James Greenhalgh wrote:

> -  /* Do not allow wide int constants - this requires support in movti.  */
> +  /* Only allow simple 128-bit immediates.  */
>    if (CONST_WIDE_INT_P (x))
> -    return false;
> +    return aarch64_mov128_immediate (x);

> I can see why this could be correct, but it is unclear why it is neccessary
> to fix the bug. What goes wrong if we leave this as "return false".

It's not necessary, things only go wrong if you return true for a wider set of
immediates than those directly supported by the movti pattern - and that may
be a regalloc issue.

However removing it (returning false in all cases) actually improves code quality
due to a bug in memset expansion. Therefore I'll commit it as returning false
for now (there was no change in test results) and update it once memset is fixed
and inlining works as expected.

Returning true means memset(p, 32, 63) expands as:

	mov	x2, 2314885530818453536
	mov	x3, 2314885530818453536
	mov	x6, 2314885530818453536
	mov	w5, 538976288
	mov	w4, 8224
	mov	w1, 32
	stp	x2, x3, [x0]
	stp	x2, x3, [x0, 16]
	stp	x2, x3, [x0, 32]
	str	x6, [x0, 48]
	str	w5, [x0, 56]
	strh	w4, [x0, 60]
	strb	w1, [x0, 62]
	ret

Wilco

Christophe Lyon Jan. 18, 2018, 9:33 a.m. UTC | #3

Hi Wilco,


On 17 January 2018 at 17:22, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> James Greenhalgh wrote:
>
>> -  /* Do not allow wide int constants - this requires support in movti.  */
>> +  /* Only allow simple 128-bit immediates.  */
>>    if (CONST_WIDE_INT_P (x))
>> -    return false;
>> +    return aarch64_mov128_immediate (x);
>
>> I can see why this could be correct, but it is unclear why it is neccessary
>> to fix the bug. What goes wrong if we leave this as "return false".
>
> It's not necessary, things only go wrong if you return true for a wider set of
> immediates than those directly supported by the movti pattern - and that may
> be a regalloc issue.
>
> However removing it (returning false in all cases) actually improves code quality
> due to a bug in memset expansion. Therefore I'll commit it as returning false
> for now (there was no change in test results) and update it once memset is fixed
> and inlining works as expected.
>
> Returning true means memset(p, 32, 63) expands as:
>
>         mov     x2, 2314885530818453536
>         mov     x3, 2314885530818453536
>         mov     x6, 2314885530818453536
>         mov     w5, 538976288
>         mov     w4, 8224
>         mov     w1, 32
>         stp     x2, x3, [x0]
>         stp     x2, x3, [x0, 16]
>         stp     x2, x3, [x0, 32]
>         str     x6, [x0, 48]
>         str     w5, [x0, 56]
>         strh    w4, [x0, 60]
>         strb    w1, [x0, 62]
>         ret
>
> Wilco
>
>

After this patch (r256800), I have noticed new failures on aarch64:
    gcc.target/aarch64/f16_mov_immediate_1.c scan-assembler-times
mov\tw[0-9]+, #?19520 3 (found 0 times)
    gcc.target/aarch64/f16_mov_immediate_1.c scan-assembler-times
movi\tv[0-9]+\\.2s, 0x4c, lsl 8 1 (found 0 times)
    gcc.target/aarch64/f16_mov_immediate_1.c scan-assembler-times
movi\tv[0-9]+\\.2s, 0xbc, lsl 8 1 (found 0 times)
    gcc.target/aarch64/f16_mov_immediate_2.c scan-assembler-times
fmov\th[0-9], w[0-9]+ 1 (found 0 times)
    gcc.target/aarch64/f16_mov_immediate_2.c scan-assembler-times
mov\tw[0-9]+, 19520 1 (found 0 times)
    gcc.target/aarch64/f16_mov_immediate_2.c scan-assembler-times
movi\tv[0-9]+\\.2s, 0x5c, lsl 8 1 (found 0 times)
    gcc.target/aarch64/f16_mov_immediate_2.c scan-assembler-times
movi\tv[0-9]+\\.2s, 0x7c, lsl 8 1 (found 0 times)
    gcc.target/aarch64/f16_mov_immediate_2.c scan-assembler-times
movi\tv[0-9]+\\.2s, 0x80, lsl 8 1 (found 0 times)
    gcc.target/aarch64/f16_mov_immediate_2.c scan-assembler-times
movi\tv[0-9]+\\.4h, ?#0 1 (found 0 times)

Christophe

Wilco Dijkstra Jan. 18, 2018, 4:47 p.m. UTC | #4

Christophe Lyon wrote:

> After this patch (r256800), I have noticed new failures on aarch64:
>    gcc.target/aarch64/f16_mov_immediate_1.c scan-assembler-times
> mov\tw[0-9]+, #?19520 3 (found 0 times)

Thanks for spotting these, the scripts appear to have missed those
(contrib/dg-cmp-results.sh skipped them eventhough they were in
contrib/compare_tests...). Anyway these are fixed now.

Wilco

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 15c3b46ebef8f305f960e60a8b4e85d8be07e8c7..bc93f4c5753b47c05c144f4a80ba8034603d3736 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -431,6 +431,8 @@  void aarch64_split_128bit_move (rtx, rtx);
 
 bool aarch64_split_128bit_move_p (rtx, rtx);
 
+bool aarch64_mov128_immediate (rtx);
+
 void aarch64_split_simd_combine (rtx, rtx, rtx);
 
 void aarch64_split_simd_move (rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7b50ab43dbc075e6b6d4541c3fb71e5cc872c88b..e6cdbe74356e395c887082cea66a468b51b2ff47 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1996,6 +1996,23 @@  aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
   return num_insns;
 }
 
+/* Return whether imm is a 128-bit immediate which is simple enough to
+   expand inline.  */
+bool
+aarch64_mov128_immediate (rtx imm)
+{
+  if (GET_CODE (imm) == CONST_INT)
+    return true;
+
+  gcc_assert (CONST_WIDE_INT_NUNITS (imm) == 2);
+
+  rtx lo = GEN_INT (CONST_WIDE_INT_ELT (imm, 0));
+  rtx hi = GEN_INT (CONST_WIDE_INT_ELT (imm, 1));
+
+  return aarch64_internal_mov_immediate (NULL_RTX, lo, false, DImode)
+	 + aarch64_internal_mov_immediate (NULL_RTX, hi, false, DImode) <= 4;
+}
+
 /* Add DELTA to REGNUM in mode MODE.  SCRATCHREG can be used to hold a
    temporary value if necessary.  FRAME_RELATED_P should be true if
    the RTX_FRAME_RELATED flag should be set and CFA adjustments added
@@ -10650,7 +10667,10 @@  static bool
 aarch64_legitimate_constant_p (machine_mode mode, rtx x)
 {
   /* Support CSE and rematerialization of common constants.  */
-  if (CONST_INT_P (x) || CONST_DOUBLE_P (x) || GET_CODE (x) == CONST_VECTOR)
+  if (CONST_INT_P (x)
+      || (CONST_DOUBLE_P (x)
+	  && (mode == SFmode || mode == DFmode || mode == TFmode))
+      || GET_CODE (x) == CONST_VECTOR)
     return true;
 
   /* Do not allow vector struct mode constants.  We could support
@@ -10658,9 +10678,9 @@  aarch64_legitimate_constant_p (machine_mode mode, rtx x)
   if (aarch64_vect_struct_mode_p (mode))
     return false;
 
-  /* Do not allow wide int constants - this requires support in movti.  */
+  /* Only allow simple 128-bit immediates.  */
   if (CONST_WIDE_INT_P (x))
-    return false;
+    return aarch64_mov128_immediate (x);
 
   /* Do not allow const (plus (anchor_symbol, const_int)).  */
   if (GET_CODE (x) == CONST)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d14b57b0ef7f4eeca40bfdcaf3ebb02a1031cb99..382953e6ec42ae4475d66143be1e25d22e48571f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1023,9 +1023,9 @@  (define_expand "movti"
 
 (define_insn "*movti_aarch64"
   [(set (match_operand:TI 0
-	 "nonimmediate_operand"  "=r, w,r,w,r,m,m,w,m")
+	 "nonimmediate_operand"  "=   r,w, r,w,r,m,m,w,m")
 	(match_operand:TI 1
-	 "aarch64_movti_operand" " rn,r,w,w,m,r,Z,m,w"))]
+	 "aarch64_movti_operand" " rUti,r, w,w,m,r,Z,m,w"))]
   "(register_operand (operands[0], TImode)
     || aarch64_reg_or_zero (operands[1], TImode))"
   "@
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index af4143ef756464afac29d17f124b436520f90451..b13fec91fbc337c60d2c7e24080d25fdf1706e7b 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -69,6 +69,12 @@  (define_constraint "N"
  (and (match_code "const_int")
       (match_test "aarch64_move_imm (ival, DImode)")))
 
+(define_constraint "Uti"
+ "A constant that can be used with a 128-bit MOV immediate operation."
+ (and (ior (match_code "const_int")
+	   (match_code "const_wide_int"))
+      (match_test "aarch64_mov128_immediate (op)")))
+
 (define_constraint "UsO"
  "A constant that can be used with a 32-bit and operation."
  (and (match_code "const_int")
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 2eaf0a7630169c3f4c23632d2a90be9ca15680df..fc06365cc11c4d86f7b95489a83d9a5698265290 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -257,15 +257,14 @@  (define_predicate "aarch64_mov_operand"
 		 (match_test "aarch64_mov_operand_p (op, mode)")))))
 
 (define_predicate "aarch64_movti_operand"
-  (and (match_code "reg,subreg,mem,const_int")
-       (ior (match_operand 0 "register_operand")
-	    (ior (match_operand 0 "memory_operand")
-		 (match_operand 0 "const_int_operand")))))
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "memory_operand")
+       (and (match_operand 0 "const_scalar_int_operand")
+	    (match_test "aarch64_mov128_immediate (op)"))))
 
 (define_predicate "aarch64_reg_or_imm"
-  (and (match_code "reg,subreg,const_int")
-       (ior (match_operand 0 "register_operand")
-	    (match_operand 0 "const_int_operand"))))
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "const_scalar_int_operand")))
 
 ;; True for integer comparisons and for FP comparisons other than LTGT or UNEQ.
 (define_special_predicate "aarch64_comparison_operator"
diff --git a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
index a889dfdd89549fdd19d2e167d43835ff0cbb236a..4695b5c1b2b7c9b515995e242dd38e0519a48a2b 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
@@ -1,11 +1,12 @@ 
 /* { dg-do compile } */
 /* { dg-options "-O2 -mcmodel=large -mpc-relative-literal-loads" } */
 /* { dg-require-effective-target lp64 } */
+/* { dg-skip-if "-mcmodel=large, no support for -fpic" { aarch64-*-* }  { "-fpic" } { "" } } */
 
 __int128
 t (void)
 {
-  return (__int128)1 << 80;
+  return ((__int128)0x123456789abcdef << 64) | 0xfedcba987654321;
 }
 
 /* { dg-final { scan-assembler "adr" } } */