diff mbox

[avr] Take 2: Fix PR64331: insn output and insn length computation rely on REG_DEAD notes.

Message ID 54F1EEEB.6060905@gjlay.de
State New
Headers show

Commit Message

Georg-Johann Lay Feb. 28, 2015, 4:38 p.m. UTC
Am 02/26/2015 um 11:45 PM schrieb Steven Bosscher:
> On Thu, Feb 26, 2015 at 8:35 PM, Georg-Johann Lay wrote:
>> Take #2 introduces a new, avr-specific rtl pass whose sole purpose is to
>> rectify notes.  The pass is scheduled right before cfg does down (right
>> before .*free_cfg) so that cfg and hence df machinery is available.
>>
>> Regression tests look fine and for the test case the patches produce correct
>> code and correct insn length.
>
> Sorry for party-pooping, but it seems to me that the real bug is that
> the target is lying to reload.
>
> IIUC the AVR port selects the insn alternative after register
> allocation (after reload). It bases its selection on REG_DEAD notes.
> In PR64331 an alternative is used that clobbers r20 that has a
> REG_DEAD note, but r20 is not actually dead because hardreg-cprop has
> propagated it forward without adjusting the note.

It's not actually about constraint alternatives.

Let me give an example: Testing HI for 0. The usual sequence would be

   cc0 = reg.low == 0
   cc0 = cc0 && reg.high == 0

which costs 2 instructions.  If reg is unused after, then ORing can be used and 
cc0 will be set as a side effect.  This costs 1 insn:

   cc0 = (reg.low |= reg.high)

Using alternatives would double their number, i.e. 14 instead of 7 for *cmphi.

These constraint alternatives would have to express

1) reg-alloc, please, use alternative #1 (with clobber of reg)
    if the register is unused after

2) reg-alloc, please, use alternative #2 (not clobbering reg)
    if the register is used after

If we assume for a moment that we have a *testhi insn and the old *tsthi had 
just 1 alternative:


(define_insn "*tsthi"
   [(set (cc0)
         (compare
          (match_operand:ALL2 0 "register_operand"  "r")
          (const_int 0)))]
  ...)


Then the new one would be something like


(define_insn "*tsthi"
   [(set (cc0)
         (compare
          (match_operand:ALL2 0 "register_operand"  "r,r")
          (const_int 0)))
    (clobber (match_scratch:HI 1                   "=0,X")]
...)

But how can I express 1) and 2) ?

reg-alloc's choices appear to be random from BE's perspective, and it fails 
with much simpler tasks to produce good results.

The first alternative which matches is the 1st, i.e. it would always clobber op 
0 which results in up to 4 instructions and higher register pressure if that 
register is still needed after the insn.

Making alternative 1 more expensive presumably results always in 2nd 
alternateive, i.e. 2 instructions instead of 1 if op 0 is no more needed.

Similar if we swap alternatives #1 and #2:  Then "r","X" will always match and 
appear to be cheaper than "r","=0".


Currently, my preferred approach is a new drop-in replacement for the old 
reg_unused_after which uses clobbers to decide whether or not op 0 is still 
needed.  That way, reg-alloc can work like before and there is no need to 
implement dozens of new constraint alternatives across the md files.

The clobbers, in turn, are added by a target-pass which translates deadness 
information to clobbers.  The preferred placement of that pass is the same as 
for the proposed new avr pass from patch take #2.

Below you find a quick hack that outlines the idea:

New insn attribute "dead" tags insns which are worth to be taken into account 
for such dead-->clobber translation and provide the operand number.  The new 
pass scans all insns and translates dead info to clobbers as requested.

This needs much more work, of course, like adjusting peepholes and splitters to 
the new insns with their additional operand.

avr_reg_unused_after_df (the old reg_unused_after) should be rewritten to use 
DF info instead of scanning by hand.

Some of the insn templates which used reg_unused_after are move insns; for 
these insns it's a bit different.

> The "normal" way of things is that the insn alternative is selected in
> reload (or in LRA) and that the clobbers are added as necessary. In

Would you outline a concrete example? In particular, how to avoid clobbers if 
the operand is no more needed after the insn, and use clobber alternative if 
the value is no more needed?  I.e. how would you express 1) and 2) from above 
in terms of insn constraints?

> PR64331, an alternative for insn r17 would be selected that has a
> CLOBBER for r20, prevent hardreg-cprop from propagating r20.
>
> Selecting insns based on REG-notes is dangerous business. Lying to
> reload and to post-RA passes is a mortal sin, the compiler will punish
> you. There is no guarantee that nothing will change between your new
> pass to recompute notes, and the final pass that emits the insns.

Ya.  However, for 4.9 I'd still propose patch take #2.  It definitely improves 
matters, does nothing wrong, and throwing out reg_unused_after completely will 
lead to considerable increase of code size and run time.

And for 4.9 a "big change" like outlined above is way too intrusive for already 
released version, imo.

For 5.0 I am also still proposing the patchh.  It does nothing wrong and is the 
base for a complete solution like the one just outlined.

Notice that the patch below is based on the proposed patch take #2.


Johann


+   (clobber (match_operand:ALL2 3 "scratch_operand" ""))]
    ""
    {
      switch (which_alternative)
@@ -4626,6 +4631,7 @@
      return avr_out_compare (insn, operands, NULL);
    }
    [(set_attr "cc" "compare")
+   (set_attr "dead" "0")
     (set_attr "length" "1,2,2,3,4,2,4")
     (set_attr "adjust_len" "tsthi,tsthi,*,*,*,compare,compare")])

@@ -4633,7 +4639,8 @@
    [(set (cc0)
          (compare (match_operand:PSI 0 "register_operand"  "r,r,d ,r  ,d,r")
                   (match_operand:PSI 1 "nonmemory_operand" "L,r,s ,s  ,M,n")))
-   (clobber (match_scratch:QI 2                          "=X,X,&d,&d ,X,&d"))]
+   (clobber (match_scratch:QI 2                          "=X,X,&d,&d ,X,&d"))
+   (clobber (match_operand:PSI 3 "scratch_operand" ""))]
    ""
    {
      switch (which_alternative)
@@ -4666,7 +4673,8 @@
    [(set (cc0)
          (compare (match_operand:ALL4 0 "register_operand"  "r  ,r ,d,r ,r")
                   (match_operand:ALL4 1 "nonmemory_operand" "Y00,r ,M,M ,n Ynn")))
-   (clobber (match_scratch:QI 2                           "=X  ,X ,X,&d,&d"))]
+   (clobber (match_scratch:QI 2                           "=X  ,X ,X,&d,&d"))
+   (clobber (match_operand:ALL4 3 "scratch_operand" ""))]
    ""
    {
      if (0 == which_alternative)
@@ -4677,6 +4685,7 @@
      return avr_out_compare (insn, operands, NULL);
    }
    [(set_attr "cc" "compare")
+   (set_attr "dead" "0")
     (set_attr "length" "4,4,4,5,8")
     (set_attr "adjust_len" "tstsi,*,compare,compare,compare")])

@@ -4706,7 +4715,8 @@
    [(parallel [(set (cc0)
                     (compare (match_operand:ORDERED234 1 "register_operand" "")
                              (match_operand:ORDERED234 2 "nonmemory_operand" "")))
-              (clobber (match_scratch:QI 4 ""))])
+              (clobber (match_scratch:QI 4 ""))
+              (clobber (scratch:ORDERED234))])
     (set (pc)
          (if_then_else
           (match_operator 0 "ordered_comparison_operator" [(cc0)

Comments

Steven Bosscher March 2, 2015, 4:12 p.m. UTC | #1
On Sat, Feb 28, 2015 at 5:38 PM, Georg-Johann Lay wrote:
> Am 02/26/2015 um 11:45 PM schrieb Steven Bosscher:
>>
>> On Thu, Feb 26, 2015 at 8:35 PM, Georg-Johann Lay wrote:
>>>
>>> Take #2 introduces a new, avr-specific rtl pass whose sole purpose is to
>>> rectify notes.  The pass is scheduled right before cfg does down (right
>>> before .*free_cfg) so that cfg and hence df machinery is available.
>>>
>>> Regression tests look fine and for the test case the patches produce
>>> correct
>>> code and correct insn length.
>>
>>
>> Sorry for party-pooping, but it seems to me that the real bug is that
>> the target is lying to reload.
>>
>> IIUC the AVR port selects the insn alternative after register
>> allocation (after reload). It bases its selection on REG_DEAD notes.
>> In PR64331 an alternative is used that clobbers r20 that has a
>> REG_DEAD note, but r20 is not actually dead because hardreg-cprop has
>> propagated it forward without adjusting the note.
>
>
> It's not actually about constraint alternatives.
>
> Let me give an example: Testing HI for 0. The usual sequence would be
>
>   cc0 = reg.low == 0
>   cc0 = cc0 && reg.high == 0
>
> which costs 2 instructions.  If reg is unused after, then ORing can be used
> and cc0 will be set as a side effect.  This costs 1 insn:
>
>   cc0 = (reg.low |= reg.high)
>
> Using alternatives would double their number, i.e. 14 instead of 7 for
> *cmphi.
>
> These constraint alternatives would have to express
>
> 1) reg-alloc, please, use alternative #1 (with clobber of reg)
>    if the register is unused after
>
> 2) reg-alloc, please, use alternative #2 (not clobbering reg)
>    if the register is used after
>
> If we assume for a moment that we have a *testhi insn and the old *tsthi had
> just 1 alternative:
>
>
> (define_insn "*tsthi"
>   [(set (cc0)
>         (compare
>          (match_operand:ALL2 0 "register_operand"  "r")
>          (const_int 0)))]
>  ...)
>
>
> Then the new one would be something like
>
>
> (define_insn "*tsthi"
>   [(set (cc0)
>         (compare
>          (match_operand:ALL2 0 "register_operand"  "r,r")
>          (const_int 0)))
>    (clobber (match_scratch:HI 1                   "=0,X")]
> ...)
>
> But how can I express 1) and 2) ?

I don't think you can. Other ports express such transformations using
define_peephole2 and peep2_reg_dead_p.


> Currently, my preferred approach is a new drop-in replacement for the old
> reg_unused_after which uses clobbers to decide whether or not op 0 is still
> needed.  That way, reg-alloc can work like before and there is no need to
> implement dozens of new constraint alternatives across the md files.

The problem with this approach is that it may break at random. There's
just no guarantee that it will work, because you're relying on
information that just isn't valid anymore.

Unfortunately I don't know enough about CC0-targets to suggest an alternative.

Ciao!
Steven
Georg-Johann Lay March 2, 2015, 6:22 p.m. UTC | #2
Am 03/02/2015 um 05:12 PM schrieb Steven Bosscher:
> On Sat, Feb 28, 2015 at 5:38 PM, Georg-Johann Lay wrote:
>> Am 02/26/2015 um 11:45 PM schrieb Steven Bosscher:
>>>
>>> On Thu, Feb 26, 2015 at 8:35 PM, Georg-Johann Lay wrote:
>>>>
>>>> Take #2 introduces a new, avr-specific rtl pass whose sole purpose is to
>>>> rectify notes.  The pass is scheduled right before cfg does down (right
>>>> before .*free_cfg) so that cfg and hence df machinery is available.
>>>>
>>>> Regression tests look fine and for the test case the patches produce
>>>> correct
>>>> code and correct insn length.
>>>
>>>
>>> Sorry for party-pooping, but it seems to me that the real bug is that
>>> the target is lying to reload.
>>>
>>> IIUC the AVR port selects the insn alternative after register
>>> allocation (after reload). It bases its selection on REG_DEAD notes.
>>> In PR64331 an alternative is used that clobbers r20 that has a
>>> REG_DEAD note, but r20 is not actually dead because hardreg-cprop has
>>> propagated it forward without adjusting the note.
>>
>>
>> It's not actually about constraint alternatives.
>>
>> Let me give an example: Testing HI for 0. The usual sequence would be
>>
>>    cc0 = reg.low == 0
>>    cc0 = cc0 && reg.high == 0
>>
>> which costs 2 instructions.  If reg is unused after, then ORing can be used
>> and cc0 will be set as a side effect.  This costs 1 insn:
>>
>>    cc0 = (reg.low |= reg.high)
>>
>> Using alternatives would double their number, i.e. 14 instead of 7 for
>> *cmphi.
>>
>> These constraint alternatives would have to express
>>
>> 1) reg-alloc, please, use alternative #1 (with clobber of reg)
>>     if the register is unused after
>>
>> 2) reg-alloc, please, use alternative #2 (not clobbering reg)
>>     if the register is used after
>>
>> If we assume for a moment that we have a *testhi insn and the old *tsthi had
>> just 1 alternative:
>>
>>
>> (define_insn "*tsthi"
>>    [(set (cc0)
>>          (compare
>>           (match_operand:ALL2 0 "register_operand"  "r")
>>           (const_int 0)))]
>>   ...)
>>
>>
>> Then the new one would be something like
>>
>>
>> (define_insn "*tsthi"
>>    [(set (cc0)
>>          (compare
>>           (match_operand:ALL2 0 "register_operand"  "r,r")
>>           (const_int 0)))
>>     (clobber (match_scratch:HI 1                   "=0,X")]
>> ...)
>>
>> But how can I express 1) and 2) ?
>
> I don't think you can. Other ports express such transformations using
> define_peephole2 and peep2_reg_dead_p.

This means to duplicate all patterns that currently make use of 
reg_unused_after...  With proposed "dead" attribute there is no need to 
duplicate all patterns.  And it is explicit about the insns which want to use 
that information: the insn itself carries all needed information :-)


>> Currently, my preferred approach is a new drop-in replacement for the old
>> reg_unused_after which uses clobbers to decide whether or not op 0 is still
>> needed.  That way, reg-alloc can work like before and there is no need to
>> implement dozens of new constraint alternatives across the md files.
>
> The problem with this approach is that it may break at random. There's
> just no guarantee that it will work, because you're relying on
> information that just isn't valid anymore.

Would you elaborate on this?

In the proposed solution, the new reg_unused_after just scans the insn pattern 
for clobber of the register of interest, why isn't that clobber valid any more?

If a register is dead after a specific insn, isn't it valid to add a clobber of 
respective regsiter to the insn pattern?  Why does that break at random when 
the pass which adds these clobbers runs before df / cfg is down and has correct 
deadness information availabe?

If some optimization pass, which is filed after that clobber has been added, 
decides to add an insn which uses the contents of that clobbered register, then 
that pass is plain wrong.

The old reg_unused_after has been renamed to avr_reg_unused_after_df and runs 
before cfg goes down, hence is basing its decisions on valid df information -- 
modulo the fact that reg_unused_after is antique code and might need to be 
updated for different reasons.

In principle it should be in order to rely on df at that time (before pass 
free_cfg), isn't it?  Or am I still missing some gory details for why df / cfg 
is invalid and must not be used befor pass *free_cfg for some obscure reason?

Johann

> Unfortunately I don't know enough about CC0-targets to suggest an alternative.
>
> Ciao!
> Steven
diff mbox

Patch

diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
index 827b280..2510739 100644
--- a/gcc/config/avr/avr.c
+++ b/gcc/config/avr/avr.c
@@ -83,6 +83,7 @@ 
  #include "builtins.h"
  #include "context.h"
  #include "tree-pass.h"
+#include "rtl-iter.h"

  /* Maximal allowed offset for an address in the LD command */
  #define MAX_LD_OFFSET(MODE) (64 - (signed)GET_MODE_SIZE (MODE))
@@ -331,6 +332,40 @@  avr_to_int_mode (rtx x)
  }


+static int avr_reg_unused_after_df (rtx_insn *insn, rtx reg);
+
+static void
+avr_rest_of_recompute_notes (void)
+{
+  for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
+    {
+      int opno, scratchno;
+
+      if (!NONDEBUG_INSN_P (insn)
+          || recog_memoized (insn) < 0
+          || (-1 == (opno = (int8_t) get_attr_dead (insn))))
+        continue;
+
+      extract_insn (insn);
+
+      scratchno = recog_data.n_operands - 1;
+
+      rtx op = recog_data.operand[opno];
+      rtx *pscratch = recog_data.operand_loc[scratchno];
+
+      avr_dump ("\ninsn with dead attr for op %d %r --> op %d %r:\n%r\n",
+                opno, op, scratchno, *pscratch, insn);
+
+      if (REG_P (op)
+          && avr_reg_unused_after_df (insn, op))
+        {
+          *pscratch = op;
+        }
+    }
+}
+
+
  static const pass_data avr_pass_data_recompute_notes =
  {
    RTL_PASS,      // type
@@ -359,6 +394,8 @@  public:
      df_note_add_problem ();
      df_analyze ();

+    avr_rest_of_recompute_notes ();
+
      return 0;
    }
  }; // avr_pass_recompute_notes
@@ -8731,6 +8768,24 @@  avr_adjust_insn_length (rtx_insn *insn, int len)
  int
  reg_unused_after (rtx_insn *insn, rtx reg)
  {
+  subrtx_iterator::array_type array;
+
+  FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL)
+    {
+      const_rtx x = *iter;
+
+      if (CLOBBER == GET_CODE (x)
+          && rtx_equal_p (reg, XEXP (x, 0)))
+        return true;
+    }
+
+  return false;
+}
+
+
+static int
+avr_reg_unused_after_df (rtx_insn *insn, rtx reg)
+{
    return (dead_or_set_p (insn, reg)
  	  || (REG_P(reg) && _reg_unused_after (insn, reg)));
  }
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 1b39ddb..bc8f701 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -238,6 +238,9 @@ 
           ] (const_int 0)))


+(define_attr "dead" ""
+  (const_int 255))
+
  ;; Define mode iterators
  (define_mode_iterator QIHI  [QI HI])
  (define_mode_iterator QIHI2 [QI HI])
@@ -4518,6 +4521,7 @@ 
    "cp __zero_reg__,%A0
  	cpc __zero_reg__,%B0"
  [(set_attr "cc" "compare")
+ (set_attr "dead" "0")
   (set_attr "length" "2")])

  (define_insn "*negated_tstpsi"
@@ -4598,7 +4602,8 @@ 
    [(set (cc0)
          (compare (match_operand:ALL2 0 "register_operand"  "!w  ,r  ,r,d ,r 
,d,r")
                   (match_operand:ALL2 1 "nonmemory_operand"  "Y00,Y00,r,s ,s 
,M,n Ynn")))
-   (clobber (match_scratch:QI 2                            "=X  ,X  ,X,&d,&d 
,X,&d"))]
+   (clobber (match_scratch:QI 2                            "=X  ,X  ,X,&d,&d 
,X,&d"))