From patchwork Tue Nov 15 16:14:20 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@arm.com>
X-Patchwork-Id: 695114
From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: Tweak LRA handling of shared spill slots
Date: Tue, 15 Nov 2016 16:14:20 +0000
Message-ID: <87eg2c3elf.fsf@e105548-lin.cambridge.arm.com>
User-Agent: Gnus/5.130012 (Ma Gnus v0.12) Emacs/24.3 (gnu/linux)
MIME-Version: 1.0

The previous code processed the users of a stack slot in order of
decreasing size and allocated the slot based on the first user.
This seems a bit dangerous, since the ordering is based on the
mode of the biggest reference, whereas the allocation also depends
on the size of the register itself (which I think could be larger).
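
To make the concern concrete, here is a minimal standalone sketch, not
GCC code.  The struct and function names are hypothetical, and it
assumes (as suspected above, not confirmed) that a register's own size
can exceed the size of its biggest reference.  Users are sorted by the
size of their biggest reference, and the slot is sized from the first
user only:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of one pseudo that shares a spill slot.  */
struct user
{
  long inherent_size;     /* size of the register's own mode, in bytes */
  long biggest_ref_size;  /* size of the widest reference, in bytes */
};

/* Bytes this user needs: the larger of the two sizes.  */
static long
total_size (const struct user *u)
{
  return (u->inherent_size > u->biggest_ref_size
	  ? u->inherent_size : u->biggest_ref_size);
}

/* qsort comparator: decreasing biggest_ref_size.  */
static int
by_decreasing_ref_size (const void *a, const void *b)
{
  long x = ((const struct user *) a)->biggest_ref_size;
  long y = ((const struct user *) b)->biggest_ref_size;
  return (y > x) - (y < x);
}

/* Model of a first-user scheme: sort, then size the slot from the
   first user alone.  */
static long
first_user_slot_size (struct user *users, size_t n)
{
  assert (n > 0);
  qsort (users, n, sizeof *users, by_decreasing_ref_size);
  return total_size (&users[0]);
}

/* What the slot actually has to hold: the largest total size.  */
static long
largest_requirement (const struct user *users, size_t n)
{
  long max = 0;
  for (size_t i = 0; i < n; i++)
    if (total_size (&users[i]) > max)
      max = total_size (&users[i]);
  return max;
}
```

With one user of inherent size 16 and biggest reference 4, and another
of inherent size 8 and biggest reference 8, the second user sorts first
and the modelled slot gets 8 bytes, although the first user needs 16.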

That scheme doesn't scale well to polynomial sizes, since there's
no guarantee that the order of the sizes is known at compile time.
This patch instead records an upper bound on the size required
by all users of a slot.  It also records the maximum alignment
requirement.
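
The idea can be sketched outside GCC as follows.  This is only an
illustration under simplifying assumptions: sizes are plain integers
rather than polynomial values, and the type and function names are
hypothetical rather than taken from lra-spills.c.  Each user folds its
requirements into a running maximum, so the result does not depend on
the order in which users are added:

```c
#include <assert.h>

/* Hypothetical stand-in for one user (pseudo) of a shared slot.  */
struct slot_user
{
  long size;           /* bytes needed, including paradoxical subregs */
  unsigned int align;  /* required alignment, in bits */
};

/* Hypothetical stand-in for the per-slot size and alignment record.  */
struct slot_bounds
{
  long size;
  unsigned int align;
};

/* Start a new slot: no bytes needed yet, byte alignment (8 bits).  */
static void
slot_bounds_init (struct slot_bounds *s)
{
  s->size = 0;
  s->align = 8;
}

/* Fold one user's requirements into the slot's upper bounds.  */
static void
slot_bounds_add (struct slot_bounds *s, const struct slot_user *u)
{
  assert (u->size >= 0 && u->align >= 8);
  if (u->size > s->size)
    s->size = u->size;
  if (u->align > s->align)
    s->align = u->align;
}

/* Compute the bounds for N users, visiting them in the given order.  */
static struct slot_bounds
slot_bounds_for (const struct slot_user *users, int n)
{
  struct slot_bounds s;
  slot_bounds_init (&s);
  for (int i = 0; i < n; i++)
    slot_bounds_add (&s, &users[i]);
  return s;
}
```

Because only maxima are tracked, no sorting by size is needed before
the slot is allocated.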

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  <richard.sandiford@arm.com>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

	* function.h (spill_slot_alignment): Declare.
	* function.c (spill_slot_alignment): New function.
	* lra-spills.c (slot): Add align and size fields.
	(assign_mem_slot): Use them in the call to assign_stack_local.
	(add_pseudo_to_slot): Update the fields.
	(assign_stack_slot_num_and_sort_pseudos): Initialise the fields.

diff --git a/gcc/function.c b/gcc/function.c
index 0b1d168..b009a0d 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -246,6 +246,14 @@ frame_offset_overflow (HOST_WIDE_INT offset, tree func)
   return FALSE;
 }
 
+/* Return the minimum spill slot alignment for a register of mode MODE.  */
+
+unsigned int
+spill_slot_alignment (machine_mode mode ATTRIBUTE_UNUSED)
+{
+  return STACK_SLOT_ALIGNMENT (NULL_TREE, mode, GET_MODE_ALIGNMENT (mode));
+}
+
 /* Return stack slot alignment in bits for TYPE and MODE.  */
 
 static unsigned int
diff --git a/gcc/function.h b/gcc/function.h
index e854c7f..6898f7f 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -567,6 +567,8 @@ extern HOST_WIDE_INT get_frame_size (void);
    return FALSE.  */
 extern bool frame_offset_overflow (HOST_WIDE_INT, tree);
 
+extern unsigned int spill_slot_alignment (machine_mode);
+
 extern rtx assign_stack_local_1 (machine_mode, HOST_WIDE_INT, int, int);
 extern rtx assign_stack_local (machine_mode, HOST_WIDE_INT, int);
 extern rtx assign_stack_temp_for_type (machine_mode, HOST_WIDE_INT, tree);
diff --git a/gcc/lra-spills.c b/gcc/lra-spills.c
index 6e044cd..9f1d5e9 100644
--- a/gcc/lra-spills.c
+++ b/gcc/lra-spills.c
@@ -104,6 +104,10 @@ struct slot
   /* Hard reg into which the slot pseudos are spilled.	The value is
      negative for pseudos spilled into memory.	*/
   int hard_regno;
+  /* Maximum alignment required by all users of the slot.  */
+  unsigned int align;
+  /* Maximum size required by all users of the slot.  */
+  HOST_WIDE_INT size;
   /* Memory representing the all stack slot.  It can be different from
      memory representing a pseudo belonging to give stack slot because
      pseudo can be placed in a part of the corresponding stack slot.
@@ -128,51 +132,23 @@ assign_mem_slot (int i)
 {
   rtx x = NULL_RTX;
   machine_mode mode = GET_MODE (regno_reg_rtx[i]);
-  unsigned int inherent_size = PSEUDO_REGNO_BYTES (i);
-  unsigned int inherent_align = GET_MODE_ALIGNMENT (mode);
-  unsigned int max_ref_width = GET_MODE_SIZE (lra_reg_info[i].biggest_mode);
-  unsigned int total_size = MAX (inherent_size, max_ref_width);
-  unsigned int min_align = max_ref_width * BITS_PER_UNIT;
-  int adjust = 0;
+  HOST_WIDE_INT inherent_size = PSEUDO_REGNO_BYTES (i);
+  machine_mode wider_mode
+    = (GET_MODE_SIZE (mode) >= GET_MODE_SIZE (lra_reg_info[i].biggest_mode)
+       ? mode : lra_reg_info[i].biggest_mode);
+  HOST_WIDE_INT total_size = GET_MODE_SIZE (wider_mode);
+  HOST_WIDE_INT adjust = 0;
 
   lra_assert (regno_reg_rtx[i] != NULL_RTX && REG_P (regno_reg_rtx[i])
 	      && lra_reg_info[i].nrefs != 0 && reg_renumber[i] < 0);
 
-  x = slots[pseudo_slots[i].slot_num].mem;
-
-  /* We can use a slot already allocated because it is guaranteed the
-     slot provides both enough inherent space and enough total
-     space.  */
-  if (x)
-    ;
-  /* Each pseudo has an inherent size which comes from its own mode,
-     and a total size which provides room for paradoxical subregs
-     which refer to the pseudo reg in wider modes.  We allocate a new
-     slot, making sure that it has enough inherent space and total
-     space.  */
-  else
+  unsigned int slot_num = pseudo_slots[i].slot_num;
+  x = slots[slot_num].mem;
+  if (!x)
     {
-      rtx stack_slot;
-
-      /* No known place to spill from => no slot to reuse.  */
-      x = assign_stack_local (mode, total_size,
-			      min_align > inherent_align
-			      || total_size > inherent_size ? -1 : 0);
-      stack_slot = x;
-      /* Cancel the big-endian correction done in assign_stack_local.
-	 Get the address of the beginning of the slot.	This is so we
-	 can do a big-endian correction unconditionally below.	*/
-      if (BYTES_BIG_ENDIAN)
-	{
-	  adjust = inherent_size - total_size;
-	  if (adjust)
-	    stack_slot
-	      = adjust_address_nv (x,
-				   mode_for_size (total_size * BITS_PER_UNIT,
-						  MODE_INT, 1),
-				   adjust);
-	}
-      slots[pseudo_slots[i].slot_num].mem = stack_slot;
+      x = assign_stack_local (BLKmode, slots[slot_num].size,
+			      slots[slot_num].align);
+      slots[slot_num].mem = x;
     }
 
   /* On a big endian machine, the "address" of the slot is the address
@@ -335,6 +311,18 @@ add_pseudo_to_slot (int regno, int slot_num)
 {
   struct pseudo_slot *first;
 
+  /* Each pseudo has an inherent size which comes from its own mode,
+     and a total size which provides room for paradoxical subregs.
+     We need to make sure the size and alignment of the slot are
+     sufficient for both.  */
+  machine_mode mode = (GET_MODE_SIZE (PSEUDO_REGNO_MODE (regno))
+		       >= GET_MODE_SIZE (lra_reg_info[regno].biggest_mode)
+		       ? PSEUDO_REGNO_MODE (regno)
+		       : lra_reg_info[regno].biggest_mode);
+  unsigned int align = spill_slot_alignment (mode);
+  slots[slot_num].align = MAX (slots[slot_num].align, align);
+  slots[slot_num].size = MAX (slots[slot_num].size, GET_MODE_SIZE (mode));
+
   if (slots[slot_num].regno < 0)
     {
       /* It is the first pseudo in the slot.  */
@@ -385,6 +373,8 @@ assign_stack_slot_num_and_sort_pseudos (int *pseudo_regnos, int n)
 	{
 	  /* New slot.	*/
 	  slots[j].live_ranges = NULL;
+	  slots[j].size = 0;
+	  slots[j].align = BITS_PER_UNIT;
 	  slots[j].regno = slots[j].hard_regno = -1;
 	  slots[j].mem = NULL_RTX;
 	  slots_num++;