diff mbox

PowerPC64 reload failure with misaligned fp load

Message ID 20110325150905.GM13754@bubble.grove.modra.org
State New
Headers show

Commit Message

Alan Modra March 25, 2011, 3:09 p.m. UTC
Compiler errors like the following have plagued powerpc64 gcc for a
long time.

error: insn does not satisfy its constraints:
(insn 43020 13791 13792 1245 _thread.c:8306 (set (reg:DF 12 12)
        (mem:DF (plus:DI (reg:DI 3 3 [orig:4092 D.11951 ] [4092])
                (const_int 15 [0xf])) [0 S8 A64])) 388 {*movdf_hardfloat64} 

They tend to appear when there is high register pressure and
misaligned doubles.  Often the reason the memory is misaligned is due
to user error, for example, compiling generated code full of macros
highly optimized for some other architecture.  So people fix their
error and this sort of compiler bug doesn't get much attention.

I spent some time in gdb yesterday looking at what goes wrong in
reload for this insn.  First time in calculate_needs_all_insns we see

(insn 11358 11553 11359 1256 _thread.c:8306 (set (reg/v:DF 12 12 [orig:4065 ___F64V1 ] [4065])
        (mem:DF (plus:DI (reg:DI 15 15 [orig:4062 D.10743 ] [4062])
                (const_int 15 [0xf])) [16 S8 A64])) 388 {*movdf_hardfloat64} (nil))

Reload correctly chooses the r,Y alternative of movdf_hardfloat64 and
also correctly notices the address in the mem doesn't match the Y
constraint so reloads the address.  All good so far.

However, some spills are needed and r12 gets chosen as a spill reg.
Next time in calculate_needs_all_insns we see

(insn 11358 11553 11359 1256 _thread.c:8306 (set (reg/v:DF 4065 [ ___F64V1 ])
        (mem:DF (plus:DI (reg:DI 15 15 [orig:4062 D.10743 ] [4062])
                (const_int 15 [0xf])) [16 S8 A64])) 388 {*movdf_hardfloat64} (nil))

Notice we are back to a pseudo for the set dest.  reg_renumber[4065]
is -1 and reg_equiv_mem[4065] is a stack slot.
(mem/c:DF (plus:DI (reg/f:DI 1 1)
        (const_int 112 [0x70])) [19 %sfp+112 S8 A64])
No surprises there.

The problem is that the stack slot perfectly matches the Y constraint,
and we see reload choosing the Y,r alternative of movdf_hardfloat64 as
this happens to give a better "losers" value than the r,Y alternative.
This is a bit weird considering the original instruction was a load
and now we're matching a store, but reasonable when you realize that
what reload is looking at now is really a mem->mem move.
But of course now 
(mem:DF (plus:DI (reg:DI 15 15 [orig:4062 D.10743 ] [4062])
        (const_int 15 [0xf])) [16 S8 A64])
needs to be reloaded to a reg, and this is attempted with a general
reg.  Reload assumes that this reload is OK, but really hasn't made
any progress at all!

How to fix this?  Well, stack slots need to stay matching the Y
constraint since we obviously need to load and store fp values in
stack slots.  I don't think adding '?' to the first constraint is a
solution as then the problem would reappear on fp stores to misaligned
mem.  The only solution I see is a secondary reload to fix up invalid
offset addresses.

Much of the following patch is based on Michael Meissner's support for
vector reloads.  The predicates.md change teaches the predicate used
by the "Y" constraint to check cmodel medium addresses in case such
addresses should ever be generated with invalid offsets.
Bootstrapped and regression tested powerpc64-linux. OK mainline?

	* config/rs6000/predicates.md (word_offset_memref_op): Handle
	cmodel medium addresses.
	* config/rs6000/rs6000.c (rs6000_secondary_reload): Handle misaligned
	64-bit gpr loads and stores.
	(rs6000_secondary_reload_ppc64): New function.
	* config/rs6000/rs6000-protos.h: Declare it.
	* config/rs6000/rs6000.md (reload_di_store, reload_di_load): New.

Comments

David Edelsohn March 25, 2011, 7:30 p.m. UTC | #1
On Fri, Mar 25, 2011 at 11:09 AM, Alan Modra <amodra@gmail.com> wrote:

> Much of the following patch is based on Michael Meissner's support for
> vector reloads.  The predicates.md change teaches the predicate used
> by the "Y" constraint to check cmodel medium addresses in case such
> addresses should ever be generated with invalid offsets.
> Bootstrapped and regression tested powerpc64-linux. OK mainline?
>
>        * config/rs6000/predicates.md (word_offset_memref_op): Handle
>        cmodel medium addresses.
>        * config/rs6000/rs6000.c (rs6000_secondary_reload): Handle misaligned
>        64-bit gpr loads and stores.
>        (rs6000_secondary_reload_ppc64): New function.
>        * config/rs6000/rs6000-protos.h: Declare it.
>        * config/rs6000/rs6000.md (reload_di_store, reload_di_load): New.

Okay with me.

Want to give Uli a heads up in case he has any additional comments.

Thanks, David
Alan Modra March 26, 2011, 12:20 a.m. UTC | #2
On Fri, Mar 25, 2011 at 09:36:04PM +0100, Ulrich Weigand wrote:
> Looks good to me, except ...
> 
> +  mem = change_address (mem, VOIDmode, scratch_or_premodify);
> 
> Maybe replace_equiv_address instead, to avoid losing the memory
> attribute information (alignment, alias set, ...)?

Yes, in fact replace_equiv_address_nv is even better.  (As witnessed
by the use of that function in reload.c, reload1.c.  We know
validating the address is just a waste of time, as is calling
update_temp_slot_address.)  I'll bootstrap and regtest again with that
change.
diff mbox

Patch

Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 171446)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -435,9 +435,12 @@  (define_predicate "word_offset_memref_op
     op = XEXP (op, 0);
   else if (GET_CODE (op) == PRE_MODIFY)
     op = XEXP (op, 1);
+  else if (GET_CODE (op) == LO_SUM
+	   && GET_CODE (XEXP (op, 0)) == REG
+	   && GET_CODE (XEXP (op, 1)) == CONST)
+    op = XEXP (XEXP (op, 1), 0);
 
   return (GET_CODE (op) != PLUS
-	  || ! REG_P (XEXP (op, 0))
 	  || GET_CODE (XEXP (op, 1)) != CONST_INT
 	  || INTVAL (XEXP (op, 1)) % 4 == 0);
 })
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 171446)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -79,6 +79,7 @@  extern bool (*rs6000_cannot_change_mode_
 						    enum machine_mode,
 						    enum reg_class);
 extern void rs6000_secondary_reload_inner (rtx, rtx, rtx, bool);
+extern void rs6000_secondary_reload_ppc64 (rtx, rtx, rtx, bool);
 extern int paired_emit_vector_cond_expr (rtx, rtx, rtx,
                                          rtx, rtx, rtx);
 extern void paired_expand_vector_move (rtx operands[]);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 171446)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -14816,7 +14815,10 @@  rs6000_reload_register_type (enum reg_cl
    needed for the immediate register.
 
    For VSX and Altivec, we may need a register to convert sp+offset into
-   reg+sp.  */
+   reg+sp.
+
+   For misaligned 64-bit gpr loads and stores we need a register to
+   convert an offset address to indirect.  */
 
 static reg_class_t
 rs6000_secondary_reload (bool in_p,
@@ -14919,6 +14921,34 @@  rs6000_secondary_reload (bool in_p,
       else
 	default_p = true;
     }
+  else if (TARGET_POWERPC64
+	   && rs6000_reload_register_type (rclass) == GPR_REGISTER_TYPE
+	   && MEM_P (x)
+	   && GET_MODE_SIZE (GET_MODE (x)) >= UNITS_PER_WORD)
+    {
+      rtx addr = XEXP (x, 0);
+
+      if (GET_CODE (addr) == PRE_MODIFY)
+	addr = XEXP (addr, 1);
+      else if (GET_CODE (addr) == LO_SUM
+	       && GET_CODE (XEXP (addr, 0)) == REG
+	       && GET_CODE (XEXP (addr, 1)) == CONST)
+	addr = XEXP (XEXP (addr, 1), 0);
+
+      if (GET_CODE (addr) == PLUS
+	  && GET_CODE (XEXP (addr, 1)) == CONST_INT
+	  && (INTVAL (XEXP (addr, 1)) & 3) != 0)
+	{
+	  if (in_p)
+	    sri->icode = CODE_FOR_reload_di_load;
+	  else
+	    sri->icode = CODE_FOR_reload_di_store;
+	  sri->extra_cost = 2;
+	  ret = NO_REGS;
+	}
+      else
+	default_p = true;
+    }
   else
     default_p = true;
 
@@ -15207,6 +15237,56 @@  rs6000_secondary_reload_inner (rtx reg, 
   return;
 }
 
+/* Convert reloads involving 64-bit gprs and misaligned offset
+   addressing to use indirect addressing.  */
+
+void
+rs6000_secondary_reload_ppc64 (rtx reg, rtx mem, rtx scratch, bool store_p)
+{
+  int regno = true_regnum (reg);
+  enum reg_class rclass;
+  rtx addr;
+  rtx scratch_or_premodify = scratch;
+
+  if (TARGET_DEBUG_ADDR)
+    {
+      fprintf (stderr, "\nrs6000_secondary_reload_ppc64, type = %s\n",
+	       store_p ? "store" : "load");
+      fprintf (stderr, "reg:\n");
+      debug_rtx (reg);
+      fprintf (stderr, "mem:\n");
+      debug_rtx (mem);
+      fprintf (stderr, "scratch:\n");
+      debug_rtx (scratch);
+    }
+
+  gcc_assert (regno >= 0 && regno < FIRST_PSEUDO_REGISTER);
+  gcc_assert (GET_CODE (mem) == MEM);
+  rclass = REGNO_REG_CLASS (regno);
+  gcc_assert (rclass == GENERAL_REGS || rclass == BASE_REGS);
+  addr = XEXP (mem, 0);
+
+  if (GET_CODE (addr) == PRE_MODIFY)
+    {
+      scratch_or_premodify = XEXP (addr, 0);
+      gcc_assert (REG_P (scratch_or_premodify));
+      addr = XEXP (addr, 1);
+    }
+  gcc_assert (GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM);
+
+  rs6000_emit_move (scratch_or_premodify, addr, Pmode);
+
+  mem = change_address (mem, VOIDmode, scratch_or_premodify);
+
+  /* Now create the move.  */
+  if (store_p)
+    emit_insn (gen_rtx_SET (VOIDmode, mem, reg));
+  else
+    emit_insn (gen_rtx_SET (VOIDmode, reg, mem));
+
+  return;
+}
+
 /* Target hook to return the cover classes for Integrated Register Allocator.
    Cover classes is a set of non-intersected register classes covering all hard
    registers used for register allocation purpose.  Any move between two
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 171446)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -9645,6 +9645,27 @@  (define_insn "*movdf_softfloat32"
   [(set_attr "type" "two,load,store,*,*,*")
    (set_attr "length" "8,8,8,8,12,16")])
 
+;; Reload patterns to support gpr load/store with misaligned mem.
+(define_expand "reload_di_store"
+  [(parallel [(match_operand 0 "memory_operand" "=m")
+              (match_operand 1 "gpc_reg_operand" "r")
+              (match_operand:DI 2 "register_operand" "=&b")])]
+  "TARGET_POWERPC64"
+{
+  rs6000_secondary_reload_ppc64 (operands[1], operands[0], operands[2], true);
+  DONE;
+})
+
+(define_expand "reload_di_load"
+  [(parallel [(match_operand 0 "gpc_reg_operand" "=r")
+              (match_operand 1 "memory_operand" "m")
+              (match_operand:DI 2 "register_operand" "=b")])]
+  "TARGET_POWERPC64"
+{
+  rs6000_secondary_reload_ppc64 (operands[0], operands[1], operands[2], false);
+  DONE;
+})
+
 ; ld/std require word-aligned displacements -> 'Y' constraint.
 ; List Y->r and r->Y before r->r for reload.
 (define_insn "*movdf_hardfloat64_mfpgpr"