ifcvt vs. expand_binop

Message ID	1442928926.2509.23.camel@t-online.de
State	New
Headers	show Return-Path: <gcc-patches-return-408022-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:subject:from:to:cc:date:content-type:mime-version :content-transfer-encoding; q=dns; s=default; b=xsuvjeCkiCz0Sm3X MODMi8f3rHqbqpPsz6TgopB8OROx/B/Z47Mb0+/fuTAeLp86oBxNet+LhGrTAJOr BgEFuwO3ZeREGyQcZJq920wlyhlOTzp21vFqCz8K9u7Bv9iCOO3ouMCqA5w3WBXT YthuXk4hg/6yAPsf3eqoeJutrVg= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org Message-ID: <1442928926.2509.23.camel@t-online.de> Subject: ifcvt vs. expand_binop From: Oleg Endo <oleg.endo@t-online.de> To: GCC Patches <gcc-patches@gcc.gnu.org> Cc: Jeff Law <law@redhat.com>, Kyrill Tkachov <kyrylo.tkachov@arm.com>, Richard Henderson <rth@redhat.com> Date: Tue, 22 Sep 2015 22:35:26 +0900 Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit

Oleg Endo Sept. 22, 2015, 1:35 p.m. UTC

On SH, the result of comparisons etc. is stored in the T_REG.  It's a 1
bit reg but described as SImode.  To get the T_REG into another reg,
there's this insn:

(define_insn "movt"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
	(match_operand:SI 1 "t_reg_operand"))]
  "TARGET_SH1"
  "movt	%0"
  [(set_attr "type" "arith")])

where "t_reg_operand" accepts various forms of the T_REG via
reg,subreg,sign_extend,zero_extend (to get better combine results).

Now I'd like to extend the "t_reg_operand" predicate to accept something
like

(set (reg:SI) (ne:SI (reg:SI 147 t) (const_int 0)))


For the test case 

int test05 (int a, int b)
{
  return a != b ? 0x7FFFFFFF : 0x80000000;
}

ifcvt (noce_emit_store_flag) tries to emit the store flag insn as

(set (reg:SI 162)
    (ne:SI (reg:SI 147 t)
        (const_int 0 [0])))

It then proceeds with a call to

expand_simple_binop
  op0: (const_int 2147483647 [0x7fffffff])
  op1: (reg:SI 162)          <<< same reg
  target: (reg:SI 162)       <<<

The problem is that expand_binop gets confused, seemingly because
op1 and target are the same.  It tries to emit a 64 bit addition:

(insn 50 49 51 (clobber (reg:DI 173)) -1
     (nil))

(insn 51 50 52 (set (subreg:SI (reg:DI 173) 0)
        (reg:SI 162)) -1
     (nil))

(insn 52 51 53 (set (reg:DI 175)
        (const_int 2147483647 [0x7fffffff])) -1
     (nil))

(insn 53 52 0 (parallel [
            (set (reg:DI 174)
                (plus:DI (reg:DI 173)
                    (reg:DI 175)))
            (clobber (reg:SI 147 t))
        ]) -1
     (nil))

ifcvt (end_ifcvt_sequence) then does a recog for each emitted insn in
the resulting sequence and insn 50 above will not match anything in the
SH md.  It then drops everything and tries something else.

Until now, because SH patterns would not accept the
(set (reg:SI 162)
    (ne:SI (reg:SI 147 t)
        (const_int 0 [0])))

the "fallback path" in noce_emit_store_flag is used.  This uses
emit_store_flag which creates a new pseudo for the result and we get

expand_simple_binop
  op0: (const_int 2147483647 [0x7fffffff])
  op1: (reg:SI 167)          <<< different regs
  target: (reg:SI 162)       <<< 

which works fine (no 64 bit addition).

A simple fix seems to be to create a new pseudo for the "short cut" path
in noce_emit_store_flag:


I haven't done full testing of this, but it fixes the problem for me.
Any opinions?

Cheers,
Oleg

Kyrylo Tkachov Sept. 22, 2015, 2:21 p.m. UTC | #1

Hi Oleg,

On 22/09/15 14:35, Oleg Endo wrote:
> On SH, the result of comparisons etc. is stored in the T_REG.  It's a 1
> bit reg but described as SImode.  To get the T_REG into another reg,
> there's this insn:
>
> (define_insn "movt"
>    [(set (match_operand:SI 0 "arith_reg_dest" "=r")
> 	(match_operand:SI 1 "t_reg_operand"))]
>    "TARGET_SH1"
>    "movt	%0"
>    [(set_attr "type" "arith")])
>
> where "t_reg_operand" accepts various forms of the T_REG via
> reg,subreg,sign_extend,zero_extend (to get better combine results).
>
> Now I'd like to extend the "t_reg_operand" predicate to accept something
> like
>
> (set (reg:SI) (ne:SI (reg:SI 147 t) (const_int 0)))
>
>
> For the test case
>
> int test05 (int a, int b)
> {
>    return a != b ? 0x7FFFFFFF : 0x80000000;
> }
>
> ifcvt (noce_emit_store_flag) tries to emit the store flag insn as
>
> (set (reg:SI 162)
>      (ne:SI (reg:SI 147 t)
>          (const_int 0 [0])))
>
> It then proceeds with a call to
>
> expand_simple_binop
>    op0: (const_int 2147483647 [0x7fffffff])
>    op1: (reg:SI 162)          <<< same reg
>    target: (reg:SI 162)       <<<

where does noce_emit_store_flag call expand_simple_binop?
Do you mean the code following the call to noce_emit_store_flag
in noce_try_store_flag_constants? (I suspect that's the code that
will get triggered for your testcase)

> The problem is that expand_binop gets confused, seemingly because
> op1 and target are the same.  It tries to emit a 64 bit addition:
>
> (insn 50 49 51 (clobber (reg:DI 173)) -1
>       (nil))
>
> (insn 51 50 52 (set (subreg:SI (reg:DI 173) 0)
>          (reg:SI 162)) -1
>       (nil))
>
> (insn 52 51 53 (set (reg:DI 175)
>          (const_int 2147483647 [0x7fffffff])) -1
>       (nil))
>
> (insn 53 52 0 (parallel [
>              (set (reg:DI 174)
>                  (plus:DI (reg:DI 173)
>                      (reg:DI 175)))
>              (clobber (reg:SI 147 t))
>          ]) -1
>       (nil))

huh, why does it try to do that? I'll admit I don't know the details
of how that expansion works. Presumably it's very target-specific

> ifcvt (end_ifcvt_sequence) then does a recog for each emitted insn in
> the resulting sequence and insn 50 above will not match anything in the
> SH md.  It then drops everything and tries something else.
>
> Until now, because SH patterns would not accept the
> (set (reg:SI 162)
>      (ne:SI (reg:SI 147 t)
>          (const_int 0 [0])))
>
> the "fallback path" in noce_emit_store_flag is used.  This uses
> emit_store_flag which creates a new pseudo for the result and we get
>
> expand_simple_binop
>    op0: (const_int 2147483647 [0x7fffffff])
>    op1: (reg:SI 167)          <<< different regs
>    target: (reg:SI 162)       <<<
>
> which works fine (no 64 bit addition).
>
> A simple fix seems to be to create a new pseudo for the "short cut" path
> in noce_emit_store_flag:
>
> Index: gcc/ifcvt.c
> ===================================================================
> --- gcc/ifcvt.c	(revision 227970)
> +++ gcc/ifcvt.c	(working copy)
> @@ -874,6 +874,7 @@
>       {
>         rtx src = gen_rtx_fmt_ee (code, GET_MODE (x), XEXP (cond, 0),
>   			    XEXP (cond, 1));
> +      x = gen_reg_rtx (GET_MODE (x));
>         rtx set = gen_rtx_SET (x, src);
>   
>         start_sequence ();
>
> I haven't done full testing of this, but it fixes the problem for me.
> Any opinions?

Looks reasonable to me with one caveat.
The comment text for noce_emit_store_flag is not very helpful
and my first expectation of the X parameter to it is that it is the
target where the flag is stored. With your case this will no longer
be the case (emit_store_flag seems to attempt to return the result in
x, judging by its comment in expmed.c). I think it would be a good idea
to expand the comment of noce_emit_store_flag to describe its arguments
and to mention that the result is returned but is not necessarily placed in X.

I think all the callers in ifcvt.c already guard against that possibility and
do something along the lines of:
  target = noce_emit_store_flag (..., x, ..., ...);
  if (target != if_info->x)
    noce_emit_move_insn (if_info->x, target);

but it would be nice to go through the callers and make sure that they don't assume
that the result of noce_emit_store_flag is in always x.

Kyrill

>
> Cheers,
> Oleg
>

Oleg Endo Sept. 22, 2015, 3:48 p.m. UTC | #2

Hi,

On Tue, 2015-09-22 at 15:21 +0100, Kyrill Tkachov wrote:

> where does noce_emit_store_flag call expand_simple_binop?
> Do you mean the code following the call to noce_emit_store_flag
> in noce_try_store_flag_constants? (I suspect that's the code that
> will get triggered for your testcase)

Sorry, forgot to mention that.  Yes, it's in
noce_try_store_flag_constants:

  /* Always use ifalse here.  It should have been swapped with itrue
     when appropriate when reversep is true.  */
  target = expand_simple_binop (mode, subtract_flag_p ? MINUS : PLUS,

> > The problem is that expand_binop gets confused, seemingly because
> > op1 and target are the same.  It tries to emit a 64 bit addition:
> >
> > (insn 50 49 51 (clobber (reg:DI 173)) -1
> >       (nil))
> >
> > (insn 51 50 52 (set (subreg:SI (reg:DI 173) 0)
> >          (reg:SI 162)) -1
> >       (nil))
> >
> > (insn 52 51 53 (set (reg:DI 175)
> >          (const_int 2147483647 [0x7fffffff])) -1
> >       (nil))
> >
> > (insn 53 52 0 (parallel [
> >              (set (reg:DI 174)
> >                  (plus:DI (reg:DI 173)
> >                      (reg:DI 175)))
> >              (clobber (reg:SI 147 t))
> >          ]) -1
> >       (nil))
> 
> huh, why does it try to do that? I'll admit I don't know the details
> of how that expansion works. Presumably it's very target-specific

I haven't checked the details.  But I guess because expand_binop wants
to somehow reuse the input and output it creates a DImode pseudo, puts
the input there, does the DImode plus and the returned "target" is a
SImode subreg of the DImode result.  The strange thing is the first
clobber (insn 50)... Maybe there's some other hidden problem in
expand_binop which is triggered by ifcvt in this case.

Even if the DImode plus worked OK, I guess the result would be counter
productive.

> Looks reasonable to me with one caveat.
> The comment text for noce_emit_store_flag is not very helpful

Which comment do you mean?  I don't see any ...

> and my first expectation of the X parameter to it is that it is the
> target where the flag is stored.

If the "return emit_store_flag" path in noce_emit_store_flag is taken,
it already returns a new reg rtx.  At least this is happening on SH
right now, because the "short cut path" is not taken.

>  With your case this will no longer
> be the case (emit_store_flag seems to attempt to return the result in
> x, judging by its comment in expmed.c). 

Hm yeah, but I guess there's no guarantee to that.  In emit_store_flag_1
(used by emit_store_flag), there's this:

  rtx tem = emit_cstore (target, icode, code, mode, compare_mode,
			 unsignedp, op0, op1, normalizep, target_mode);

And this might return a new reg with the output.

> I think it would be a good idea
> to expand the comment of noce_emit_store_flag to describe its arguments
> and to mention that the result is returned but is not necessarily placed in X.

Sure, I could add that.

> I think all the callers in ifcvt.c already guard against that possibility and
> do something along the lines of:
>   target = noce_emit_store_flag (..., x, ..., ...);
>   if (target != if_info->x)
>     noce_emit_move_insn (if_info->x, target);

Not in all the places.  It seems "target" is copied into "if_info->x" at
the very end of the sequence, which makes sense.  In between there are
several places where "if_info->x" is passed as the proposed target to
noce_emit_store_flag, but the following code always uses the actual
result reg.  At least as far as I can see. 

> but it would be nice to go through the callers and make sure that they don't assume
> that the result of noce_emit_store_flag is in always x.

It looks OK.  At least I don't see anything suspicious.  But it's better
if somebody else also takes a peek.

Cheers,
Oleg

Bernd Schmidt Sept. 22, 2015, 3:53 p.m. UTC | #3

On 09/22/2015 03:35 PM, Oleg Endo wrote:
> On SH, the result of comparisons etc. is stored in the T_REG.  It's a 1
> bit reg but described as SImode.  To get the T_REG into another reg,
> there's this insn:
>
> (define_insn "movt"
>    [(set (match_operand:SI 0 "arith_reg_dest" "=r")
> 	(match_operand:SI 1 "t_reg_operand"))]
>    "TARGET_SH1"
>    "movt	%0"
>    [(set_attr "type" "arith")])
>
> where "t_reg_operand" accepts various forms of the T_REG via
> reg,subreg,sign_extend,zero_extend (to get better combine results).
>
> Now I'd like to extend the "t_reg_operand" predicate to accept something
> like
>
> (set (reg:SI) (ne:SI (reg:SI 147 t) (const_int 0)))

Is there a reason you're not just adding that as a new pattern rather 
than trying to overload the predicate?


Bernd

Oleg Endo Sept. 22, 2015, 4:02 p.m. UTC | #4

On Tue, 2015-09-22 at 17:53 +0200, Bernd Schmidt wrote:
> On 09/22/2015 03:35 PM, Oleg Endo wrote:
> > On SH, the result of comparisons etc. is stored in the T_REG.  It's a 1
> > bit reg but described as SImode.  To get the T_REG into another reg,
> > there's this insn:
> >
> > (define_insn "movt"
> >    [(set (match_operand:SI 0 "arith_reg_dest" "=r")
> > 	(match_operand:SI 1 "t_reg_operand"))]
> >    "TARGET_SH1"
> >    "movt	%0"
> >    [(set_attr "type" "arith")])
> >
> > where "t_reg_operand" accepts various forms of the T_REG via
> > reg,subreg,sign_extend,zero_extend (to get better combine results).
> >
> > Now I'd like to extend the "t_reg_operand" predicate to accept something
> > like
> >
> > (set (reg:SI) (ne:SI (reg:SI 147 t) (const_int 0)))
> 
> Is there a reason you're not just adding that as a new pattern rather 
> than trying to overload the predicate?

Basically, because the predicate is also used several in other places,
for example in the recursive matching in treg_set_expr.  I'm trying to
avoid duplicating code to match stuff that eventually evaluates to "T is
1" or "T is 0".
Like
  (set (reg:SI) (ne:SI (reg:SI 147 t) (const_int 0)))
is the same as
  (set (reg:SI) (eq:SI (reg:SI 147 t) (const_int 1)))

Cheers,
Oleg

Oleg Endo Sept. 22, 2015, 5:27 p.m. UTC | #5

On Wed, 2015-09-23 at 00:48 +0900, Oleg Endo wrote:

> I haven't checked the details.  But I guess because expand_binop wants
> to somehow reuse the input and output it creates a DImode pseudo, puts
> the input there, does the DImode plus and the returned "target" is a
> SImode subreg of the DImode result.  The strange thing is the first
> clobber (insn 50)... Maybe there's some other hidden problem in
> expand_binop which is triggered by ifcvt in this case.

OK ... this particular problem is in SH's md.  The addsi3 expander
purposefully fails if the output reg is the same as the 1st input reg.
This was added because of some issues with LRA.  Because of that
expand_binop_directly fails and a widened op is tried, which goes wrong
for some other reason (not going to investigate now).

I guess there's no need for the ifcvt patch in this case.  Sorry for the
noise.

Cheers,
Oleg

ifcvt vs. expand_binop

Commit Message

Comments

Patch