Message ID | 20100930182927.GL32503@codesourcery.com |
---|---|
State | New |
Headers | show |
On 09/30/2010 08:29 PM, Nathan Froyd wrote: > The solution Bernd came up with is this: if we find an equivalent > register for an output reload, we remember it in reload_override_in > rather than reg_rtx. [...] > Once we have the, the fix becomes straightforward: emit_reload_insns can > check to see if we inherited an output reload. If we did, then there's > no point in keeping that insn around, so we might as well delete it, > which is what reload_as_needed has bee modified to do. > > Bootstrapping in progress on x86-64. OK to commit and backport to 4.5 > and 4.4? Thanks for debugging and submitting the patch. I think ideally we'd get a second opinion on this, but if no one comments within a week you can check it in. Bernd
> PR44606 is bug because GCC tries to be very clever. Before register > allocation, we have the insns: > > (insn 323 319 84 6 pr44606.c:34 (set (reg:DF 750) > (mem/u/c/i:DF (reg/f:SI 719) [5 S8 A64])) 1503 {*movdf_e500_double} > (expr_list:REG_DEAD (reg/f:SI 719) (expr_list:REG_EQUAL (const_double:DF > 5.0e-1 [0x0.8p+0]) > (nil)))) > > ... > > (insn 63 502 491 6 pr44606.c:20 (set (reg/v:SI 221 [ x ]) > (const_int 0 [0x0])) 349 {*movsi_internal1} (nil)) > > During register allocation, pseudo 750 gets assigned to r27. Insn 63 > receives an output reload, and choose_reload_regs goes looking for an > possibly equivalent register that holds 0. find_equiv_regs has this > > chunk of code when looking for an equivalence: > || (goal_const && REG_NOTES (p) != 0 > > && (tem = find_reg_note (p, REG_EQUIV, NULL_RTX)) > && ((rtx_equal_p (XEXP (tem, 0), goal) > && (valueno > = true_regnum (valtry = SET_DEST (pat))) >= 0) > > || (REG_P (SET_DEST (pat)) > > && GET_CODE (XEXP (tem, 0)) == CONST_DOUBLE > && SCALAR_FLOAT_MODE_P (GET_MODE (XEXP (tem, 0))) > && CONST_INT_P (goal) > && 0 != (goaltry > = operand_subword (XEXP (tem, 0), 0, 0, > VOIDmode)) > && rtx_equal_p (goal, goaltry) > && (valtry > = operand_subword (SET_DEST (pat), 0, 0, > VOIDmode)) > && (valueno = true_regnum (valtry)) >= 0))) > > || (goal_const && (tem = find_reg_note (p, REG_EQUIV, > > NULL_RTX)) > && REG_P (SET_DEST (pat)) > && GET_CODE (XEXP (tem, 0)) == CONST_DOUBLE > && SCALAR_FLOAT_MODE_P (GET_MODE (XEXP (tem, 0))) > && CONST_INT_P (goal) > && 0 != (goaltry = operand_subword (XEXP (tem, 0), 1, 0, > VOIDmode)) > && rtx_equal_p (goal, goaltry) > && (valtry > = operand_subword (SET_DEST (pat), 1, 0, VOIDmode)) > && (valueno = true_regnum (valtry)) >= 0))) > > The complexity here comes is because we also consider the case where we > have a CONST_DOUBLE somewhere whose low or high part is equal to the > value we're trying to find an equivalence for. In this case, the low > bits of 0.5 are equal to zero, so find_equiv_regs determines that r27 is > a suitable equivalence. After register allocation, then, we have: > > (insn 323 531 530 6 pr44606.c:34 (set (reg:DF 27 27 [750]) > (mem/u/c/i:DF (reg/f:SI 11 11 [719]) [5 S8 A64])) 1503 > {*movdf_e500_double} (expr_list:REG_EQUIV (const_double:DF 5.0e-1 > [0x0.8p+0]) (nil))) > > ... > > (insn 63 547 598 6 ../pr44606.c:20 (set (reg:SI 27 27 [orig:750+4 ] [750]) > (const_int 0 [0x0])) 349 {*movsi_internal1} (expr_list:REG_EQUAL > (const_int 0 [0x0]) (nil))) > > and a later DCE determines that insn 323 is no longer needed, deleting > it and resulting in incorrect code at runtime. What's incorrect exactly? > Bootstrapping in progress on x86-64. OK to commit and backport to 4.5 > and 4.4? > > -Nathan > > gcc/ > * reload1.c (emit_reload_insns): Adjust prototype. Check for > inherited output reloads. The return value must be documented. > (reload_as_needed): Delete insn if emit_reload_insns returns > true. > (choose_reload_regs): Save equiv in reload_override_in for > output reloads. Set reg_rtx from reload_override_in. I think that deleting new insns in reload is too risky on the branches. Can't we tighten the above condition instead on the branches so that it will return false in this case?
On Thu, Sep 30, 2010 at 3:16 PM, Eric Botcazou <ebotcazou@adacore.com> wrote: >> (insn 323 531 530 6 pr44606.c:34 (set (reg:DF 27 27 [750]) >> (mem/u/c/i:DF (reg/f:SI 11 11 [719]) [5 S8 A64])) 1503 >> {*movdf_e500_double} (expr_list:REG_EQUIV (const_double:DF 5.0e-1 >> [0x0.8p+0]) (nil))) >> >> ... >> >> (insn 63 547 598 6 ../pr44606.c:20 (set (reg:SI 27 27 [orig:750+4 ] [750]) >> (const_int 0 [0x0])) 349 {*movsi_internal1} (expr_list:REG_EQUAL >> (const_int 0 [0x0]) (nil))) >> >> and a later DCE determines that insn 323 is no longer needed, deleting >> it and resulting in incorrect code at runtime. > > What's incorrect exactly? The upper part of (reg:DF 27 27 [750]) would not contain the correct value between the setting of the register to 0.5 and setting the lower (SI) part to 0. I think he forgot to mention (reg:DF 27 27 [750]) is used between those two instructions. Thanks, Andrew Pinski
On 10/01/2010 12:16 AM, Eric Botcazou wrote: >> After register allocation, then, we have: >> >> (insn 323 531 530 6 pr44606.c:34 (set (reg:DF 27 27 [750]) >> (mem/u/c/i:DF (reg/f:SI 11 11 [719]) [5 S8 A64])) 1503 >> {*movdf_e500_double} (expr_list:REG_EQUIV (const_double:DF 5.0e-1 >> [0x0.8p+0]) (nil))) >> >> ... >> >> (insn 63 547 598 6 ../pr44606.c:20 (set (reg:SI 27 27 [orig:750+4 ] [750]) >> (const_int 0 [0x0])) 349 {*movsi_internal1} (expr_list:REG_EQUAL >> (const_int 0 [0x0]) (nil))) >> >> and a later DCE determines that insn 323 is no longer needed, deleting >> it and resulting in incorrect code at runtime. > > What's incorrect exactly? Rather than reusing the portion of reg:DF 27 which holds SImode zero, it clobbered all of reg:DF 27, which is live past insn 63. This piece of code: /* If this is an output reload from a simple move insn, look if an equivalence for the input is available. */ else if (inheritance && rld[r].in == 0 && rld[r].out != 0) { rtx set = single_set (insn); if (set && rtx_equal_p (rld[r].out, SET_DEST (set)) && CONSTANT_P (SET_SRC (set))) search_equiv = SET_SRC (set); } made us look for the equivalence. If we find one, and use it, the original insn (#63 in this case) should be deleted, since it would just load up the same value we already found (and in this case, clobber other parts of the register). Bernd
On Fri, Oct 01, 2010 at 12:35:16AM +0200, Bernd Schmidt wrote: > On 10/01/2010 12:16 AM, Eric Botcazou wrote: > >> After register allocation, then, we have: > >> > >> (insn 323 531 530 6 pr44606.c:34 (set (reg:DF 27 27 [750]) > >> (mem/u/c/i:DF (reg/f:SI 11 11 [719]) [5 S8 A64])) 1503 > >> {*movdf_e500_double} (expr_list:REG_EQUIV (const_double:DF 5.0e-1 > >> [0x0.8p+0]) (nil))) > >> > >> ... > >> > >> (insn 63 547 598 6 ../pr44606.c:20 (set (reg:SI 27 27 [orig:750+4 ] [750]) > >> (const_int 0 [0x0])) 349 {*movsi_internal1} (expr_list:REG_EQUAL > >> (const_int 0 [0x0]) (nil))) > >> > >> and a later DCE determines that insn 323 is no longer needed, deleting > >> it and resulting in incorrect code at runtime. > > > > What's incorrect exactly? > > Rather than reusing the portion of reg:DF 27 which holds SImode zero, it > clobbered all of reg:DF 27, which is live past insn 63. This piece of code: > /* If this is an output reload from a simple move insn, look > if an equivalence for the input is available. */ > else if (inheritance && rld[r].in == 0 && rld[r].out != 0) > { > rtx set = single_set (insn); > > if (set > && rtx_equal_p (rld[r].out, SET_DEST (set)) > && CONSTANT_P (SET_SRC (set))) > search_equiv = SET_SRC (set); > } > > made us look for the equivalence. If we find one, and use it, the > original insn (#63 in this case) should be deleted, since it would just > load up the same value we already found (and in this case, clobber other > parts of the register). Technically, it's OK that reload did this. Regular 32-bit PPC insns do not affect the high bits of the register on E500. If we stopped after register allocation, we'd be OK. The problems arise later, when dead code elimination sees: r27:DF = [stuff] ; load 0.5 from memory ... ; no intervening use of r27 r27:SI = 0 ; DF thinks the original load is dead here, and ; deletes it, which means later code is finding ; and using garbage in the high bits of r27. -Nathan
On Thu, Sep 30, 2010 at 4:24 PM, Nathan Froyd <froydnj@codesourcery.com> wrote: > Technically, it's OK that reload did this. Regular 32-bit PPC insns do > not affect the high bits of the register on E500. If we stopped after > register allocation, we'd be OK. The problems arise later, when > dead code elimination sees: > > r27:DF = [stuff] ; load 0.5 from memory > ... ; no intervening use of r27 > r27:SI = 0 ; DF thinks the original load is dead here, and > ; deletes it, which means later code is finding > ; and using garbage in the high bits of r27. Then this sounds like a bug in DF/DCE rather than reload. -- Pinski
On 10/01/2010 01:29 AM, Andrew Pinski wrote: > On Thu, Sep 30, 2010 at 4:24 PM, Nathan Froyd <froydnj@codesourcery.com> wrote: >> Technically, it's OK that reload did this. Regular 32-bit PPC insns do >> not affect the high bits of the register on E500. If we stopped after >> register allocation, we'd be OK. The problems arise later, when >> dead code elimination sees: >> >> r27:DF = [stuff] ; load 0.5 from memory >> ... ; no intervening use of r27 >> r27:SI = 0 ; DF thinks the original load is dead here, and >> ; deletes it, which means later code is finding >> ; and using garbage in the high bits of r27. > > Then this sounds like a bug in DF/DCE rather than reload. A normal set is always considered to clobber the entire register. Bernd
> > What's incorrect exactly? > > The upper part of (reg:DF 27 27 [750]) would not contain the correct > value between the setting of the register to 0.5 and setting the lower > (SI) part to 0. I think he forgot to mention (reg:DF 27 27 [750]) is > used between those two instructions. Between the instructions? That would be a big DCE bug then. Is that not rather after both instructions, in which case patching reload would indeed seem to be the correct thing to do.
On Fri, Oct 01, 2010 at 12:16:50AM +0200, Eric Botcazou wrote: > > (reload_as_needed): Delete insn if emit_reload_insns returns > > true. > > (choose_reload_regs): Save equiv in reload_override_in for > > output reloads. Set reg_rtx from reload_override_in. > > I think that deleting new insns in reload is too risky on the branches. Can't > we tighten the above condition instead on the branches so that it will return > false in this case? I don't see a good way to tighten the above condition. Deleting the cleverness is an option, though. ;) The other possibility is to teach postreload about this sort of thing and have it delete the second insn. -Nathan
> I don't see a good way to tighten the above condition. Deleting the > cleverness is an option, though. ;) How about adding a check that the modes have the same size when this is for an output reload?
On 10/01/2010 11:38 PM, Eric Botcazou wrote: >> I don't see a good way to tighten the above condition. Deleting the >> cleverness is an option, though. ;) > > How about adding a check that the modes have the same size when this is for an > output reload? That's just papering over the real problem. I think the intent is clearly that if we find an equivalence, the move insn being reloaded is removed (as it serves no purpose if we already have the value in a register) and only the output reload remains. For the release branches it's conceivable to disable this optimization, it shouldn't trigger that often anyway. Bernd
On 10/02/2010 01:16 AM, Bernd Schmidt wrote: > For the release branches it's conceivable to disable this optimization, > it shouldn't trigger that often anyway. Actually, maybe it's best to just do that everywhere. It doesn't seem to trigger in an i686-linux bootstrap, and it's something we can clean up in postreload if we really want to. Bernd
On Sat, Oct 02, 2010 at 02:45:08AM +0200, Bernd Schmidt wrote: > On 10/02/2010 01:16 AM, Bernd Schmidt wrote: > > For the release branches it's conceivable to disable this optimization, > > it shouldn't trigger that often anyway. > > Actually, maybe it's best to just do that everywhere. It doesn't seem > to trigger in an i686-linux bootstrap, and it's something we can clean > up in postreload if we really want to. I'd be really surprised if it triggered in a bootstrap anywhere, given that GCC's floating-point usage is practically zero. I honestly can't think of many places it'd help: soft-float targets and slightly odd targets like E500 are the only ones that come to mind. Any target that permits SImode subregs of DFmode registers would benefit, I suppose, but even then the stars have to align... -Nathan
On 10/02/2010 02:50 AM, Nathan Froyd wrote: > I'd be really surprised if it triggered in a bootstrap anywhere, given > that GCC's floating-point usage is practically zero. > > I honestly can't think of many places it'd help: soft-float targets and > slightly odd targets like E500 are the only ones that come to mind. Any > target that permits SImode subregs of DFmode registers would benefit, I > suppose, but even then the stars have to align... I mean not just the specific problem that was responsible for the bug, but the entire case where we try to find an equivalence for an output reload and actually keep it. I put an additional abort in the patch and successfully bootstrapped it. Bernd
> That's just papering over the real problem. I think the intent is > clearly that if we find an equivalence, the move insn being reloaded is > removed (as it serves no purpose if we already have the value in a > register) and only the output reload remains. > For the release branches it's conceivable to disable this optimization, > it shouldn't trigger that often anyway. As already mentioned, my proposal was for the release branches only.
Index: testsuite/gcc.dg/pr44606.c =================================================================== --- testsuite/gcc.dg/pr44606.c (revision 0) +++ testsuite/gcc.dg/pr44606.c (revision 0) @@ -0,0 +1,53 @@ +/* PR target/44606 */ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +typedef struct file FILE; +extern FILE *stderr; +extern int fprintf (FILE *, const char *, ...); +extern void abort (void); + + typedef struct _PixelPacket { unsigned char r, g, b; } + PixelPacket; +#define ARRAYLEN(X) (sizeof(X)/sizeof(X[0])) +PixelPacket q[6]; +#define COLS (ARRAYLEN(q) - 1) +PixelPacket p[2*COLS + 22]; +#define Minify(POS, WEIGHT) do { \ + total_r += (WEIGHT)*(p[POS].r); \ + total_g += (WEIGHT)*(p[POS].g); \ + total_b += (WEIGHT)*(p[POS].b); \ +} while (0) +unsigned long columns = COLS; +int main(void) +{ + static const unsigned char answers[COLS] = { 31, 32, 34, 35, 36 }; + unsigned long x; + for (x = 0; x < sizeof(p)/sizeof(p[0]); x++) { + p[x].b = (x + 34) | 1; + } + for (x = 0; x < columns; x++) { + double total_r = 0, total_g = 0, total_b = 0; + double saved_r = 0, saved_g = 0, saved_b = 0; + Minify(2*x + 0, 3.0); + Minify(2*x + 1, 7.0); + Minify(2*x + 2, 7.0); + saved_r = total_r; + saved_g = total_g; + Minify(2*x + 11, 15.0); + Minify(2*x + 12, 7.0); + Minify(2*x + 18, 7.0); + Minify(2*x + 19, 15.0); + Minify(2*x + 20, 15.0); + Minify(2*x + 21, 7.0); + q[x].r = (unsigned char)(total_r/128.0 + 0.5); + q[x].g = (unsigned char)(total_g/128.0 + 0.5); + q[x].b = (unsigned char)(total_b/128.0 + 0.5); + fprintf(stderr, "r:%f g:%f b:%f\n", saved_r, saved_g, saved_b); + } + for (x = 0; x < COLS; x++) { + if (answers[x] != q[x].b) + abort(); + } + return 0; +} Index: testsuite/ChangeLog =================================================================== Index: reload1.c =================================================================== --- reload1.c (revision 164751) +++ reload1.c (working copy) @@ -441,7 +441,7 @@ static void emit_output_reload_insns (st int); static void do_input_reload (struct insn_chain *, struct reload *, int); static void do_output_reload (struct insn_chain *, struct reload *, int); -static void emit_reload_insns (struct insn_chain *); +static bool emit_reload_insns (struct insn_chain *); static void delete_output_reload (rtx, int, int, rtx); static void delete_address_reloads (rtx, rtx); static void delete_address_reloads_1 (rtx, rtx, rtx); @@ -4590,6 +4590,7 @@ reload_as_needed (int live_known) { rtx next = NEXT_INSN (insn); rtx p; + bool delete_this; prev = PREV_INSN (insn); @@ -4601,7 +4602,7 @@ reload_as_needed (int live_known) /* Generate the insns to reload operands into or out of their reload regs. */ - emit_reload_insns (chain); + delete_this = emit_reload_insns (chain); /* Substitute the chosen reload regs from reload_reg_rtx into the insn's body (or perhaps into the bodies of other @@ -4628,6 +4629,10 @@ reload_as_needed (int live_known) "impossible reload"); delete_insn (p); } + /* emit_reload_insns may have indicated this insn is + unnecessary due to inherited output reloads. */ + if (delete_this) + delete_insn (insn); } if (num_eliminable && chain->need_elim) @@ -5695,8 +5700,9 @@ static char reload_inherited[MAX_RELOADS if we know it. Otherwise, this is 0. */ static rtx reload_inheritance_insn[MAX_RELOADS]; -/* If nonzero, this is a place to get the value of the reload, - rather than using reload_in. */ +/* If nonzero, this is a place to get the value of the reload, rather + than using reload_in. For output reloads, this holds the equivalent + register if the reload is inherited. */ static rtx reload_override_in[MAX_RELOADS]; /* For each reload, the hard register number of the register used, @@ -6741,7 +6747,10 @@ choose_reload_regs (struct insn_chain *c { int nr = hard_regno_nregs[regno][rld[r].mode]; int k; - rld[r].reg_rtx = equiv; + if (rld[r].out && !rld[r].in) + reload_override_in[r] = equiv; + else + rld[r].reg_rtx = equiv; reload_spill_index[r] = regno; reload_inherited[r] = 1; @@ -6916,7 +6925,12 @@ choose_reload_regs (struct insn_chain *c actually override reload_in. */ for (j = 0; j < n_reloads; j++) if (reload_override_in[j]) - rld[j].in = reload_override_in[j]; + { + if (rld[j].in) + rld[j].in = reload_override_in[j]; + else + rld[j].reg_rtx = reload_override_in[j]; + } /* If this reload won't be done because it has been canceled or is optional and not inherited, clear reload_reg_rtx so other @@ -7947,7 +7961,7 @@ inherit_piecemeal_p (int dest ATTRIBUTE_ /* Output insns to reload values in and out of the chosen reload regs. */ -static void +static bool emit_reload_insns (struct insn_chain *chain) { rtx insn = chain->insn; @@ -8370,6 +8384,10 @@ emit_reload_insns (struct insn_chain *ch } } IOR_HARD_REG_SET (reg_reloaded_dead, reg_reloaded_died); + for (j = 0; j < n_reloads; j++) + if (!rld[j].in && rld[j].out && reload_override_in[j]) + return true; + return false; } /* Go through the motions to emit INSN and test if it is strictly valid.