From patchwork Tue May 22 13:13:22 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
X-Patchwork-Id: 918310
Return-Path: 
 <gcc-patches-return-478157-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
	spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org
	(client-ip=209.132.180.131; helo=sourceware.org;
	envelope-from=gcc-patches-return-478157-incoming=patchwork.ozlabs.org@gcc.gnu.org;
	receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none)
	header.from=foss.arm.com
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b="Ukb4T7tj"; dkim-atps=neutral
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 40qx122vgmz9s7X
	for <incoming@patchwork.ozlabs.org>;
	Tue, 22 May 2018 23:13:40 +1000 (AEST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:cc:subject:content-type;
	q=dns; s=default; b=QKIVwW6X2uC+St+AHgCYsGZ8pUZ2VgnSlq33tXDk0DI
	XGDAmj72biLap6IedCXZmUxiwbUX12QfXay6uPyxoT4EMY2TJyFtk74X2MOWEDcP
	22/T4eYroDKUL5taofuexTizapg2ZQlOOhCM/JBGZ/VjepyBEDbK8os95+ZKuoMQ
	=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:cc:subject:content-type;
	s=default; bh=lkKW61HcDF6ot/f5XCx+sCmacTs=; b=Ukb4T7tjjeYUoHFWo
	K/KN8lA5Mgr5MZPPPNVZHd3UH2d99ZVXayS6L/+zQSav+/zlziKB3HbD33jHC8iq
	s2p0Q9rN1YiXrfPnP/ZKkR97664W8Kc9OXKh3gKPsDJxHSbONBCV9vqrCaNBmR1/
	UIPgdmW7gmDeTTuKpDgw3pCfQg=
Received: (qmail 54528 invoked by alias); 22 May 2018 13:13:32 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 53249 invoked by uid 89); 22 May 2018 13:13:29 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-25.6 required=5.0 tests=BAYES_00, GIT_PATCH_0,
	GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3,
	KAM_LAZY_DOMAIN_SECURITY,
	KAM_LOTSOFHASH autolearn=ham version=3.3.2 spammy=jackson,
	UD:predicates.md, predicates.md, predicatesmd
X-HELO: foss.arm.com
Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com)
	(217.140.101.70) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Tue, 22 May 2018 13:13:26 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])	by
	usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id
	4271A1435; Tue, 22 May 2018 06:13:25 -0700 (PDT)
Received: from [10.2.207.77] (e100706-lin.cambridge.arm.com
	[10.2.207.77])	by usa-sjc-imap-foss1.foss.arm.com (Postfix)
	with ESMTPSA id 2DB053F577; Tue, 22 May 2018 06:13:24 -0700 (PDT)
Message-ID: <5B041772.80209@foss.arm.com>
Date: Tue, 22 May 2018 14:13:22 +0100
From: Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
CC: Marcus Shawcroft <marcus.shawcroft@arm.com>,
	"Richard Earnshaw (lists)" <richard.earnshaw@arm.com>,
	James Greenhalgh <james.greenhalgh@arm.com>
Subject: [PATCH][AArch64] Merge stores of D-register values with different
	modes

[sending on behalf of Jackson Woodruff]

Hi all,

This patch merges loads and stores from D-registers that are of different modes.

Code like this:

     typedef int __attribute__((vector_size(8))) vec;
     struct pair
     {
       vec v;
       double d;
     }

Now generates a store pair instruction:

     void
     assign (struct pair *p, vec v)
     {
       p->v = v;
       p->d = 1.0;
     }

Whereas previously it generated two `str` instructions.

This patch also merges storing of double zero values with
long integer values:

     struct pair
     {
       long long l;
       double d;
     }

     void
     foo (struct pair *p)
     {
       p->l = 10;
       p->d = 0.0;
     }

Now generates a single store pair instruction rather than two `str` instructions.

The patch basically generalises the mode iterators on the patterns in aarch64.md
and the peepholes in aarch64-ldpstp.md to take all combinations of pairs of modes
so, while it may be a large-ish patch, it does fairly mechanical stuff.

Bootstrap and testsuite run OK. OK for trunk?

Jackson

2018-05-22  Jackson Woodruff  <jackson.woodruff@arm.com>
             Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.md: New patterns to generate stp
     and ldp.
     (store_pair_sw, store_pair_dw): New patterns to generate stp for
     single words and double words.
     (load_pair_sw, load_pair_dw): Likewise.
     (store_pair_sf, store_pair_df, store_pair_si, store_pair_di):
     Delete.
     (load_pair_sf, load_pair_df, load_pair_si, load_pair_di):
     Delete.
     * config/aarch64/aarch64-ldpstp.md: Modify peephole
     for different mode ldpstp and add peephole for merged zero stores.
     Likewise for loads.
     * config/aarch64/aarch64.c (aarch64_operands_ok_for_ldpstp):
     Add size check.
     (aarch64_gen_store_pair): Rename calls to match new patterns.
     (aarch64_gen_load_pair): Rename calls to match new patterns.
     * config/aarch64/aarch64-simd.md (load_pair<mode>): Rename to...
     (load_pair<DREG:mode><DREG2:mode>): ... This.
     (store_pair<mode>): Rename to...
     (vec_store_pair<DREG:mode><DREG2:mode>): ... This.
     * config/aarch64/iterators.md (DREG, DREG2, DX2, SX, SX2, DSX):
     New mode iterators.
     (V_INT_EQUIV): Handle SImode.
     * config/aarch64/predicates.md (aarch64_reg_zero_or_fp_zero):
     New predicate.


2018-05-22  Jackson Woodruff  <jackson.woodruff@arm.com>

     * gcc.target/aarch64/ldp_stp_6.c: New.
     * gcc.target/aarch64/ldp_stp_7.c: New.
     * gcc.target/aarch64/ldp_stp_8.c: New.
diff --git a/gcc/config/aarch64/aarch64-ldpstp.md b/gcc/config/aarch64/aarch64-ldpstp.md
index c008477c741d20f530b66ad9420113fa84a54ebb..f6fe8a6a93b5466723e3ed6b892e0ec1e67ee89d 100644
--- a/gcc/config/aarch64/aarch64-ldpstp.md
+++ b/gcc/config/aarch64/aarch64-ldpstp.md
@@ -99,11 +99,11 @@ (define_peephole2
 })
 
 (define_peephole2
-  [(set (match_operand:VD 0 "register_operand" "")
-	(match_operand:VD 1 "aarch64_mem_pair_operand" ""))
-   (set (match_operand:VD 2 "register_operand" "")
-	(match_operand:VD 3 "memory_operand" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, <MODE>mode)"
+  [(set (match_operand:DREG 0 "register_operand" "")
+	(match_operand:DREG 1 "aarch64_mem_pair_operand" ""))
+   (set (match_operand:DREG2 2 "register_operand" "")
+	(match_operand:DREG2 3 "memory_operand" ""))]
+  "aarch64_operands_ok_for_ldpstp (operands, true, <DREG:MODE>mode)"
   [(parallel [(set (match_dup 0) (match_dup 1))
 	      (set (match_dup 2) (match_dup 3))])]
 {
@@ -119,11 +119,12 @@ (define_peephole2
 })
 
 (define_peephole2
-  [(set (match_operand:VD 0 "aarch64_mem_pair_operand" "")
-	(match_operand:VD 1 "register_operand" ""))
-   (set (match_operand:VD 2 "memory_operand" "")
-	(match_operand:VD 3 "register_operand" ""))]
-  "TARGET_SIMD && aarch64_operands_ok_for_ldpstp (operands, false, <MODE>mode)"
+  [(set (match_operand:DREG 0 "aarch64_mem_pair_operand" "")
+	(match_operand:DREG 1 "register_operand" ""))
+   (set (match_operand:DREG2 2 "memory_operand" "")
+	(match_operand:DREG2 3 "register_operand" ""))]
+  "TARGET_SIMD
+   && aarch64_operands_ok_for_ldpstp (operands, false, <DREG:MODE>mode)"
   [(parallel [(set (match_dup 0) (match_dup 1))
 	      (set (match_dup 2) (match_dup 3))])]
 {
@@ -138,7 +139,6 @@ (define_peephole2
     }
 })
 
-
 ;; Handle sign/zero extended consecutive load/store.
 
 (define_peephole2
@@ -181,6 +181,36 @@ (define_peephole2
     }
 })
 
+;; Handle storing of a floating point zero with integer data.
+;; This handles cases like:
+;;   struct pair { int a; float b; }
+;;
+;;   p->a = 1;
+;;   p->b = 0.0;
+;;
+;; We can match modes that won't work for a stp instruction
+;; as aarch64_operands_ok_for_ldpstp checks that the modes are
+;; compatible.
+(define_peephole2
+  [(set (match_operand:DSX 0 "aarch64_mem_pair_operand" "")
+	(match_operand:DSX 1 "aarch64_reg_zero_or_fp_zero" ""))
+   (set (match_operand:<FCVT_TARGET> 2 "memory_operand" "")
+	(match_operand:<FCVT_TARGET> 3 "aarch64_reg_zero_or_fp_zero" ""))]
+  "aarch64_operands_ok_for_ldpstp (operands, false, <V_INT_EQUIV>mode)"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+{
+  rtx base, offset_1, offset_2;
+
+  extract_base_offset_in_addr (operands[0], &base, &offset_1);
+  extract_base_offset_in_addr (operands[2], &base, &offset_2);
+  if (INTVAL (offset_1) > INTVAL (offset_2))
+    {
+      std::swap (operands[0], operands[2]);
+      std::swap (operands[1], operands[3]);
+    }
+})
+
 ;; Handle consecutive load/store whose offset is out of the range
 ;; supported by ldp/ldpsw/stp.  We firstly adjust offset in a scratch
 ;; register, then merge them into ldp/ldpsw/stp by using the adjusted
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index e96afeaf2a5cbca631c0f7f7731b59f2d986df4f..82588a706016da8207eb5aafdf076aea8e9db0fd 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -177,30 +177,30 @@ (define_insn "aarch64_store_lane0<mode>"
   [(set_attr "type" "neon_store1_1reg<q>")]
 )
 
-(define_insn "load_pair<mode>"
-  [(set (match_operand:VD 0 "register_operand" "=w")
-	(match_operand:VD 1 "aarch64_mem_pair_operand" "Ump"))
-   (set (match_operand:VD 2 "register_operand" "=w")
-	(match_operand:VD 3 "memory_operand" "m"))]
+(define_insn "load_pair<DREG:mode><DREG2:mode>"
+  [(set (match_operand:DREG 0 "register_operand" "=w")
+	(match_operand:DREG 1 "aarch64_mem_pair_operand" "Ump"))
+   (set (match_operand:DREG2 2 "register_operand" "=w")
+	(match_operand:DREG2 3 "memory_operand" "m"))]
   "TARGET_SIMD
    && rtx_equal_p (XEXP (operands[3], 0),
 		   plus_constant (Pmode,
 				  XEXP (operands[1], 0),
-				  GET_MODE_SIZE (<MODE>mode)))"
+				  GET_MODE_SIZE (<DREG:MODE>mode)))"
   "ldp\\t%d0, %d2, %1"
   [(set_attr "type" "neon_ldp")]
 )
 
-(define_insn "store_pair<mode>"
-  [(set (match_operand:VD 0 "aarch64_mem_pair_operand" "=Ump")
-	(match_operand:VD 1 "register_operand" "w"))
-   (set (match_operand:VD 2 "memory_operand" "=m")
-	(match_operand:VD 3 "register_operand" "w"))]
+(define_insn "vec_store_pair<DREG:mode><DREG2:mode>"
+  [(set (match_operand:DREG 0 "aarch64_mem_pair_operand" "=Ump")
+	(match_operand:DREG 1 "register_operand" "w"))
+   (set (match_operand:DREG2 2 "memory_operand" "=m")
+	(match_operand:DREG2 3 "register_operand" "w"))]
   "TARGET_SIMD
    && rtx_equal_p (XEXP (operands[2], 0),
 		   plus_constant (Pmode,
 				  XEXP (operands[0], 0),
-				  GET_MODE_SIZE (<MODE>mode)))"
+				  GET_MODE_SIZE (<DREG:MODE>mode)))"
   "stp\\t%d1, %d3, %0"
   [(set_attr "type" "neon_stp")]
 )
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6bf6c05535b61eef1021d46bcd8448fb3a0b25f4..b75a588eb9aa49b1796161d34494d452a6742e4d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4279,10 +4279,10 @@ aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx reg1, rtx mem2,
   switch (mode)
     {
     case E_DImode:
-      return gen_store_pairdi (mem1, reg1, mem2, reg2);
+      return gen_store_pair_dw_didi (mem1, reg1, mem2, reg2);
 
     case E_DFmode:
-      return gen_store_pairdf (mem1, reg1, mem2, reg2);
+      return gen_store_pair_dw_dfdf (mem1, reg1, mem2, reg2);
 
     default:
       gcc_unreachable ();
@@ -4299,10 +4299,10 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx mem1, rtx reg2,
   switch (mode)
     {
     case E_DImode:
-      return gen_load_pairdi (reg1, mem1, reg2, mem2);
+      return gen_load_pair_dw_didi (reg1, mem1, reg2, mem2);
 
     case E_DFmode:
-      return gen_load_pairdf (reg1, mem1, reg2, mem2);
+      return gen_load_pair_dw_dfdf (reg1, mem1, reg2, mem2);
 
     default:
       gcc_unreachable ();
@@ -16853,6 +16853,10 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
   if (!rtx_equal_p (base_1, base_2))
     return false;
 
+  /* The operands must be of the same size.  */
+  gcc_assert (known_eq (GET_MODE_SIZE (GET_MODE (mem_1)),
+			 GET_MODE_SIZE (GET_MODE (mem_2))));
+
   offval_1 = INTVAL (offset_1);
   offval_2 = INTVAL (offset_2);
   /* We should only be trying this for fixed-sized modes.  There is no
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1277e489185b4f1c8bb8f31da4c4b53d19a2249c..d1290563255d64d05c94771af13e45c002111949 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1309,15 +1309,15 @@ (define_expand "movmemdi"
 
 ;; Operands 1 and 3 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
-(define_insn "load_pairsi"
-  [(set (match_operand:SI 0 "register_operand" "=r,w")
-	(match_operand:SI 1 "aarch64_mem_pair_operand" "Ump,Ump"))
-   (set (match_operand:SI 2 "register_operand" "=r,w")
-	(match_operand:SI 3 "memory_operand" "m,m"))]
-  "rtx_equal_p (XEXP (operands[3], 0),
-		plus_constant (Pmode,
-			       XEXP (operands[1], 0),
-			       GET_MODE_SIZE (SImode)))"
+(define_insn "load_pair_sw_<SX:mode><SX2:mode>"
+  [(set (match_operand:SX 0 "register_operand" "=r,w")
+	(match_operand:SX 1 "aarch64_mem_pair_operand" "Ump,Ump"))
+   (set (match_operand:SX2 2 "register_operand" "=r,w")
+	(match_operand:SX2 3 "memory_operand" "m,m"))]
+   "rtx_equal_p (XEXP (operands[3], 0),
+		 plus_constant (Pmode,
+				XEXP (operands[1], 0),
+				GET_MODE_SIZE (<SX:MODE>mode)))"
   "@
    ldp\\t%w0, %w2, %1
    ldp\\t%s0, %s2, %1"
@@ -1325,15 +1325,16 @@ (define_insn "load_pairsi"
    (set_attr "fp" "*,yes")]
 )
 
-(define_insn "load_pairdi"
-  [(set (match_operand:DI 0 "register_operand" "=r,w")
-	(match_operand:DI 1 "aarch64_mem_pair_operand" "Ump,Ump"))
-   (set (match_operand:DI 2 "register_operand" "=r,w")
-	(match_operand:DI 3 "memory_operand" "m,m"))]
-  "rtx_equal_p (XEXP (operands[3], 0),
-		plus_constant (Pmode,
-			       XEXP (operands[1], 0),
-			       GET_MODE_SIZE (DImode)))"
+;; Storing different modes that can still be merged
+(define_insn "load_pair_dw_<DX:mode><DX2:mode>"
+  [(set (match_operand:DX 0 "register_operand" "=r,w")
+	(match_operand:DX 1 "aarch64_mem_pair_operand" "Ump,Ump"))
+   (set (match_operand:DX2 2 "register_operand" "=r,w")
+	(match_operand:DX2 3 "memory_operand" "m,m"))]
+   "rtx_equal_p (XEXP (operands[3], 0),
+		 plus_constant (Pmode,
+				XEXP (operands[1], 0),
+				GET_MODE_SIZE (<DX:MODE>mode)))"
   "@
    ldp\\t%x0, %x2, %1
    ldp\\t%d0, %d2, %1"
@@ -1341,18 +1342,17 @@ (define_insn "load_pairdi"
    (set_attr "fp" "*,yes")]
 )
 
-
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
-(define_insn "store_pairsi"
-  [(set (match_operand:SI 0 "aarch64_mem_pair_operand" "=Ump,Ump")
-	(match_operand:SI 1 "aarch64_reg_or_zero" "rZ,w"))
-   (set (match_operand:SI 2 "memory_operand" "=m,m")
-	(match_operand:SI 3 "aarch64_reg_or_zero" "rZ,w"))]
-  "rtx_equal_p (XEXP (operands[2], 0),
-		plus_constant (Pmode,
-			       XEXP (operands[0], 0),
-			       GET_MODE_SIZE (SImode)))"
+(define_insn "store_pair_sw_<SX:mode><SX2:mode>"
+  [(set (match_operand:SX 0 "aarch64_mem_pair_operand" "=Ump,Ump")
+	(match_operand:SX 1 "aarch64_reg_zero_or_fp_zero" "rYZ,w"))
+   (set (match_operand:SX2 2 "memory_operand" "=m,m")
+	(match_operand:SX2 3 "aarch64_reg_zero_or_fp_zero" "rYZ,w"))]
+   "rtx_equal_p (XEXP (operands[2], 0),
+		 plus_constant (Pmode,
+				XEXP (operands[0], 0),
+				GET_MODE_SIZE (<SX:MODE>mode)))"
   "@
    stp\\t%w1, %w3, %0
    stp\\t%s1, %s3, %0"
@@ -1360,15 +1360,16 @@ (define_insn "store_pairsi"
    (set_attr "fp" "*,yes")]
 )
 
-(define_insn "store_pairdi"
-  [(set (match_operand:DI 0 "aarch64_mem_pair_operand" "=Ump,Ump")
-	(match_operand:DI 1 "aarch64_reg_or_zero" "rZ,w"))
-   (set (match_operand:DI 2 "memory_operand" "=m,m")
-	(match_operand:DI 3 "aarch64_reg_or_zero" "rZ,w"))]
-  "rtx_equal_p (XEXP (operands[2], 0),
-		plus_constant (Pmode,
-			       XEXP (operands[0], 0),
-			       GET_MODE_SIZE (DImode)))"
+;; Storing different modes that can still be merged
+(define_insn "store_pair_dw_<DX:mode><DX2:mode>"
+  [(set (match_operand:DX 0 "aarch64_mem_pair_operand" "=Ump,Ump")
+	(match_operand:DX 1 "aarch64_reg_zero_or_fp_zero" "rYZ,w"))
+   (set (match_operand:DX2 2 "memory_operand" "=m,m")
+	(match_operand:DX2 3 "aarch64_reg_zero_or_fp_zero" "rYZ,w"))]
+   "rtx_equal_p (XEXP (operands[2], 0),
+		 plus_constant (Pmode,
+				XEXP (operands[0], 0),
+				GET_MODE_SIZE (<DX:MODE>mode)))"
   "@
    stp\\t%x1, %x3, %0
    stp\\t%d1, %d3, %0"
@@ -1376,74 +1377,6 @@ (define_insn "store_pairdi"
    (set_attr "fp" "*,yes")]
 )
 
-;; Operands 1 and 3 are tied together by the final condition; so we allow
-;; fairly lax checking on the second memory operation.
-(define_insn "load_pairsf"
-  [(set (match_operand:SF 0 "register_operand" "=w,r")
-	(match_operand:SF 1 "aarch64_mem_pair_operand" "Ump,Ump"))
-   (set (match_operand:SF 2 "register_operand" "=w,r")
-	(match_operand:SF 3 "memory_operand" "m,m"))]
-  "rtx_equal_p (XEXP (operands[3], 0),
-		plus_constant (Pmode,
-			       XEXP (operands[1], 0),
-			       GET_MODE_SIZE (SFmode)))"
-  "@
-   ldp\\t%s0, %s2, %1
-   ldp\\t%w0, %w2, %1"
-  [(set_attr "type" "neon_load1_2reg,load_8")
-   (set_attr "fp" "yes,*")]
-)
-
-(define_insn "load_pairdf"
-  [(set (match_operand:DF 0 "register_operand" "=w,r")
-	(match_operand:DF 1 "aarch64_mem_pair_operand" "Ump,Ump"))
-   (set (match_operand:DF 2 "register_operand" "=w,r")
-	(match_operand:DF 3 "memory_operand" "m,m"))]
-  "rtx_equal_p (XEXP (operands[3], 0),
-		plus_constant (Pmode,
-			       XEXP (operands[1], 0),
-			       GET_MODE_SIZE (DFmode)))"
-  "@
-   ldp\\t%d0, %d2, %1
-   ldp\\t%x0, %x2, %1"
-  [(set_attr "type" "neon_load1_2reg,load_16")
-   (set_attr "fp" "yes,*")]
-)
-
-;; Operands 0 and 2 are tied together by the final condition; so we allow
-;; fairly lax checking on the second memory operation.
-(define_insn "store_pairsf"
-  [(set (match_operand:SF 0 "aarch64_mem_pair_operand" "=Ump,Ump")
-	(match_operand:SF 1 "aarch64_reg_or_fp_zero" "w,rY"))
-   (set (match_operand:SF 2 "memory_operand" "=m,m")
-	(match_operand:SF 3 "aarch64_reg_or_fp_zero" "w,rY"))]
-  "rtx_equal_p (XEXP (operands[2], 0),
-		plus_constant (Pmode,
-			       XEXP (operands[0], 0),
-			       GET_MODE_SIZE (SFmode)))"
-  "@
-   stp\\t%s1, %s3, %0
-   stp\\t%w1, %w3, %0"
-  [(set_attr "type" "neon_store1_2reg,store_8")
-   (set_attr "fp" "yes,*")]
-)
-
-(define_insn "store_pairdf"
-  [(set (match_operand:DF 0 "aarch64_mem_pair_operand" "=Ump,Ump")
-	(match_operand:DF 1 "aarch64_reg_or_fp_zero" "w,rY"))
-   (set (match_operand:DF 2 "memory_operand" "=m,m")
-	(match_operand:DF 3 "aarch64_reg_or_fp_zero" "w,rY"))]
-  "rtx_equal_p (XEXP (operands[2], 0),
-		plus_constant (Pmode,
-			       XEXP (operands[0], 0),
-			       GET_MODE_SIZE (DFmode)))"
-  "@
-   stp\\t%d1, %d3, %0
-   stp\\t%x1, %x3, %0"
-  [(set_attr "type" "neon_store1_2reg,store_16")
-   (set_attr "fp" "yes,*")]
-)
-
 ;; Load pair with post-index writeback.  This is primarily used in function
 ;; epilogues.
 (define_insn "loadwb_pair<GPI:mode>_<P:mode>"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 36c8aa81dd700f5df48df0184cbfd6c79d2dafbf..29c198e22f79510018e3bc79b499863f857cc7a2 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -69,6 +69,12 @@ (define_mode_iterator VSDQ_I_DI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI DI])
 ;; Double vector modes.
 (define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF])
 
+;; All modes stored in registers d0-d31.
+(define_mode_iterator DREG [V8QI V4HI V4HF V2SI V2SF DF])
+
+;; Copy of the above.
+(define_mode_iterator DREG2 [V8QI V4HI V4HF V2SI V2SF DF])
+
 ;; Advanced SIMD, 64-bit container, all integer modes.
 (define_mode_iterator VD_BHSI [V8QI V4HI V2SI])
 
@@ -236,6 +242,19 @@ (define_mode_iterator VSTRUCT [OI CI XI])
 ;; Double scalar modes
 (define_mode_iterator DX [DI DF])
 
+;; Duplicate of the above
+(define_mode_iterator DX2 [DI DF])
+
+;; Single scalar modes
+(define_mode_iterator SX [SI SF])
+
+;; Duplicate of the above
+(define_mode_iterator SX2 [SI SF])
+
+;; Single and double integer and float modes
+(define_mode_iterator DSX [DF DI SF SI])
+
+
 ;; Modes available for Advanced SIMD <f>mul lane operations.
 (define_mode_iterator VMUL [V4HI V8HI V2SI V4SI
 			    (V4HF "TARGET_SIMD_F16INST")
@@ -855,7 +874,8 @@ (define_mode_attr V_INT_EQUIV [(V8QI "V8QI") (V16QI "V16QI")
 			       (V4HF "V4HI") (V8HF  "V8HI")
 			       (V2SF "V2SI") (V4SF  "V4SI")
 			       (DF   "DI")   (V2DF  "V2DI")
-			       (SF   "SI")   (HF    "HI")
+			       (SF   "SI")   (SI    "SI")
+			       (HF    "HI")
 			       (VNx16QI "VNx16QI")
 			       (VNx8HI  "VNx8HI") (VNx8HF "VNx8HI")
 			       (VNx4SI  "VNx4SI") (VNx4SF "VNx4SI")
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 5d41d4350402b2a9e5941f160c6ab6f933bfff90..7aec76d681f5eca87b7b5e1d63d12dc0205ad113 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -62,6 +62,10 @@ (define_predicate "aarch64_reg_or_fp_zero"
 	(and (match_code "const_double")
 	     (match_test "aarch64_float_const_zero_rtx_p (op)"))))
 
+(define_predicate "aarch64_reg_zero_or_fp_zero"
+  (ior (match_operand 0 "aarch64_reg_or_fp_zero")
+       (match_operand 0 "aarch64_reg_or_zero")))
+
 (define_predicate "aarch64_reg_zero_or_m1_or_1"
   (and (match_code "reg,subreg,const_int")
        (ior (match_operand 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_6.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_6.c
new file mode 100644
index 0000000000000000000000000000000000000000..2d982f3389b668f2042d48ba3db04e619fd999f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_6.c
@@ -0,0 +1,20 @@
+/* { dg-options "-O2" } */
+
+typedef float __attribute__ ((vector_size (8))) vec;
+
+struct pair
+{
+  vec e1;
+  double e2;
+};
+
+vec tmp;
+
+void
+stp (struct pair *p)
+{
+  p->e1 = tmp;
+  p->e2 = 1.0;
+
+  /* { dg-final { scan-assembler "stp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+\\\]" } } */
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_7.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_7.c
new file mode 100644
index 0000000000000000000000000000000000000000..be24c8a97009a77b510e2b1caee754a75bb9c3d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_7.c
@@ -0,0 +1,47 @@
+/* { dg-options "-O2" } */
+
+struct pair
+{
+  double a;
+  long long int b;
+};
+
+void
+stp (struct pair *p)
+{
+  p->a = 0.0;
+  p->b = 1;
+}
+
+/* { dg-final { scan-assembler "stp\txzr, x\[0-9\]+, \\\[x\[0-9\]+\\\]" } } */
+
+void
+stp2 (struct pair *p)
+{
+  p->a = 0.0;
+  p->b = 0;
+}
+
+struct reverse_pair
+{
+  long long int a;
+  double b;
+};
+
+void
+stp_reverse (struct reverse_pair *p)
+{
+  p->a = 1;
+  p->b = 0.0;
+}
+
+/* { dg-final { scan-assembler "stp\tx\[0-9\]+, xzr, \\\[x\[0-9\]+\\\]" } } */
+
+void
+stp_reverse2 (struct reverse_pair *p)
+{
+  p->a = 0;
+  p->b = 0.0;
+}
+
+/* { dg-final { scan-assembler-times "stp\txzr, xzr, \\\[x\[0-9\]+\\\]" 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_8.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_8.c
new file mode 100644
index 0000000000000000000000000000000000000000..2d9cb6b19d563a7e1bb11c174f6d642291260f29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_8.c
@@ -0,0 +1,30 @@
+/* { dg-options "-O2" } */
+
+typedef float __attribute__ ((vector_size (8))) fvec;
+typedef int __attribute__ ((vector_size (8))) ivec;
+
+struct pair
+{
+  double a;
+  fvec b;
+};
+
+void ldp (double *a, fvec *b, struct pair *p)
+{
+  *a = p->a + 1;
+  *b = p->b;
+}
+
+struct vec_pair
+{
+  fvec a;
+  ivec b;
+};
+
+void ldp2 (fvec *a, ivec *b, struct vec_pair *p)
+{
+  *a = p->a;
+  *b = p->b;
+}
+
+/* { dg-final { scan-assembler-times "ldp\td\[0-9\], d\[0-9\]+, \\\[x\[0-9\]+\\\]" 2 } } */