Message ID | 1442928926.2509.23.camel@t-online.de |
---|---|
State | New |
Headers | show |
Hi Oleg, On 22/09/15 14:35, Oleg Endo wrote: > On SH, the result of comparisons etc. is stored in the T_REG. It's a 1 > bit reg but described as SImode. To get the T_REG into another reg, > there's this insn: > > (define_insn "movt" > [(set (match_operand:SI 0 "arith_reg_dest" "=r") > (match_operand:SI 1 "t_reg_operand"))] > "TARGET_SH1" > "movt %0" > [(set_attr "type" "arith")]) > > where "t_reg_operand" accepts various forms of the T_REG via > reg,subreg,sign_extend,zero_extend (to get better combine results). > > Now I'd like to extend the "t_reg_operand" predicate to accept something > like > > (set (reg:SI) (ne:SI (reg:SI 147 t) (const_int 0))) > > > For the test case > > int test05 (int a, int b) > { > return a != b ? 0x7FFFFFFF : 0x80000000; > } > > ifcvt (noce_emit_store_flag) tries to emit the store flag insn as > > (set (reg:SI 162) > (ne:SI (reg:SI 147 t) > (const_int 0 [0]))) > > It then proceeds with a call to > > expand_simple_binop > op0: (const_int 2147483647 [0x7fffffff]) > op1: (reg:SI 162) <<< same reg > target: (reg:SI 162) <<< where does noce_emit_store_flag call expand_simple_binop? Do you mean the code following the call to noce_emit_store_flag in noce_try_store_flag_constants? (I suspect that's the code that will get triggered for your testcase) > The problem is that expand_binop gets confused, seemingly because > op1 and target are the same. It tries to emit a 64 bit addition: > > (insn 50 49 51 (clobber (reg:DI 173)) -1 > (nil)) > > (insn 51 50 52 (set (subreg:SI (reg:DI 173) 0) > (reg:SI 162)) -1 > (nil)) > > (insn 52 51 53 (set (reg:DI 175) > (const_int 2147483647 [0x7fffffff])) -1 > (nil)) > > (insn 53 52 0 (parallel [ > (set (reg:DI 174) > (plus:DI (reg:DI 173) > (reg:DI 175))) > (clobber (reg:SI 147 t)) > ]) -1 > (nil)) huh, why does it try to do that? I'll admit I don't know the details of how that expansion works. Presumably it's very target-specific > ifcvt (end_ifcvt_sequence) then does a recog for each emitted insn in > the resulting sequence and insn 50 above will not match anything in the > SH md. It then drops everything and tries something else. > > Until now, because SH patterns would not accept the > (set (reg:SI 162) > (ne:SI (reg:SI 147 t) > (const_int 0 [0]))) > > the "fallback path" in noce_emit_store_flag is used. This uses > emit_store_flag which creates a new pseudo for the result and we get > > expand_simple_binop > op0: (const_int 2147483647 [0x7fffffff]) > op1: (reg:SI 167) <<< different regs > target: (reg:SI 162) <<< > > which works fine (no 64 bit addition). > > A simple fix seems to be to create a new pseudo for the "short cut" path > in noce_emit_store_flag: > > Index: gcc/ifcvt.c > =================================================================== > --- gcc/ifcvt.c (revision 227970) > +++ gcc/ifcvt.c (working copy) > @@ -874,6 +874,7 @@ > { > rtx src = gen_rtx_fmt_ee (code, GET_MODE (x), XEXP (cond, 0), > XEXP (cond, 1)); > + x = gen_reg_rtx (GET_MODE (x)); > rtx set = gen_rtx_SET (x, src); > > start_sequence (); > > I haven't done full testing of this, but it fixes the problem for me. > Any opinions? Looks reasonable to me with one caveat. The comment text for noce_emit_store_flag is not very helpful and my first expectation of the X parameter to it is that it is the target where the flag is stored. With your case this will no longer be the case (emit_store_flag seems to attempt to return the result in x, judging by its comment in expmed.c). I think it would be a good idea to expand the comment of noce_emit_store_flag to describe its arguments and to mention that the result is returned but is not necessarily placed in X. I think all the callers in ifcvt.c already guard against that possibility and do something along the lines of: target = noce_emit_store_flag (..., x, ..., ...); if (target != if_info->x) noce_emit_move_insn (if_info->x, target); but it would be nice to go through the callers and make sure that they don't assume that the result of noce_emit_store_flag is in always x. Kyrill > > Cheers, > Oleg >
Hi, On Tue, 2015-09-22 at 15:21 +0100, Kyrill Tkachov wrote: > where does noce_emit_store_flag call expand_simple_binop? > Do you mean the code following the call to noce_emit_store_flag > in noce_try_store_flag_constants? (I suspect that's the code that > will get triggered for your testcase) Sorry, forgot to mention that. Yes, it's in noce_try_store_flag_constants: /* Always use ifalse here. It should have been swapped with itrue when appropriate when reversep is true. */ target = expand_simple_binop (mode, subtract_flag_p ? MINUS : PLUS, > > The problem is that expand_binop gets confused, seemingly because > > op1 and target are the same. It tries to emit a 64 bit addition: > > > > (insn 50 49 51 (clobber (reg:DI 173)) -1 > > (nil)) > > > > (insn 51 50 52 (set (subreg:SI (reg:DI 173) 0) > > (reg:SI 162)) -1 > > (nil)) > > > > (insn 52 51 53 (set (reg:DI 175) > > (const_int 2147483647 [0x7fffffff])) -1 > > (nil)) > > > > (insn 53 52 0 (parallel [ > > (set (reg:DI 174) > > (plus:DI (reg:DI 173) > > (reg:DI 175))) > > (clobber (reg:SI 147 t)) > > ]) -1 > > (nil)) > > huh, why does it try to do that? I'll admit I don't know the details > of how that expansion works. Presumably it's very target-specific I haven't checked the details. But I guess because expand_binop wants to somehow reuse the input and output it creates a DImode pseudo, puts the input there, does the DImode plus and the returned "target" is a SImode subreg of the DImode result. The strange thing is the first clobber (insn 50)... Maybe there's some other hidden problem in expand_binop which is triggered by ifcvt in this case. Even if the DImode plus worked OK, I guess the result would be counter productive. > Looks reasonable to me with one caveat. > The comment text for noce_emit_store_flag is not very helpful Which comment do you mean? I don't see any ... > and my first expectation of the X parameter to it is that it is the > target where the flag is stored. If the "return emit_store_flag" path in noce_emit_store_flag is taken, it already returns a new reg rtx. At least this is happening on SH right now, because the "short cut path" is not taken. > With your case this will no longer > be the case (emit_store_flag seems to attempt to return the result in > x, judging by its comment in expmed.c). Hm yeah, but I guess there's no guarantee to that. In emit_store_flag_1 (used by emit_store_flag), there's this: rtx tem = emit_cstore (target, icode, code, mode, compare_mode, unsignedp, op0, op1, normalizep, target_mode); And this might return a new reg with the output. > I think it would be a good idea > to expand the comment of noce_emit_store_flag to describe its arguments > and to mention that the result is returned but is not necessarily placed in X. Sure, I could add that. > I think all the callers in ifcvt.c already guard against that possibility and > do something along the lines of: > target = noce_emit_store_flag (..., x, ..., ...); > if (target != if_info->x) > noce_emit_move_insn (if_info->x, target); Not in all the places. It seems "target" is copied into "if_info->x" at the very end of the sequence, which makes sense. In between there are several places where "if_info->x" is passed as the proposed target to noce_emit_store_flag, but the following code always uses the actual result reg. At least as far as I can see. > but it would be nice to go through the callers and make sure that they don't assume > that the result of noce_emit_store_flag is in always x. It looks OK. At least I don't see anything suspicious. But it's better if somebody else also takes a peek. Cheers, Oleg
On 09/22/2015 03:35 PM, Oleg Endo wrote: > On SH, the result of comparisons etc. is stored in the T_REG. It's a 1 > bit reg but described as SImode. To get the T_REG into another reg, > there's this insn: > > (define_insn "movt" > [(set (match_operand:SI 0 "arith_reg_dest" "=r") > (match_operand:SI 1 "t_reg_operand"))] > "TARGET_SH1" > "movt %0" > [(set_attr "type" "arith")]) > > where "t_reg_operand" accepts various forms of the T_REG via > reg,subreg,sign_extend,zero_extend (to get better combine results). > > Now I'd like to extend the "t_reg_operand" predicate to accept something > like > > (set (reg:SI) (ne:SI (reg:SI 147 t) (const_int 0))) Is there a reason you're not just adding that as a new pattern rather than trying to overload the predicate? Bernd
On Tue, 2015-09-22 at 17:53 +0200, Bernd Schmidt wrote: > On 09/22/2015 03:35 PM, Oleg Endo wrote: > > On SH, the result of comparisons etc. is stored in the T_REG. It's a 1 > > bit reg but described as SImode. To get the T_REG into another reg, > > there's this insn: > > > > (define_insn "movt" > > [(set (match_operand:SI 0 "arith_reg_dest" "=r") > > (match_operand:SI 1 "t_reg_operand"))] > > "TARGET_SH1" > > "movt %0" > > [(set_attr "type" "arith")]) > > > > where "t_reg_operand" accepts various forms of the T_REG via > > reg,subreg,sign_extend,zero_extend (to get better combine results). > > > > Now I'd like to extend the "t_reg_operand" predicate to accept something > > like > > > > (set (reg:SI) (ne:SI (reg:SI 147 t) (const_int 0))) > > Is there a reason you're not just adding that as a new pattern rather > than trying to overload the predicate? Basically, because the predicate is also used several in other places, for example in the recursive matching in treg_set_expr. I'm trying to avoid duplicating code to match stuff that eventually evaluates to "T is 1" or "T is 0". Like (set (reg:SI) (ne:SI (reg:SI 147 t) (const_int 0))) is the same as (set (reg:SI) (eq:SI (reg:SI 147 t) (const_int 1))) Cheers, Oleg
On Wed, 2015-09-23 at 00:48 +0900, Oleg Endo wrote: > I haven't checked the details. But I guess because expand_binop wants > to somehow reuse the input and output it creates a DImode pseudo, puts > the input there, does the DImode plus and the returned "target" is a > SImode subreg of the DImode result. The strange thing is the first > clobber (insn 50)... Maybe there's some other hidden problem in > expand_binop which is triggered by ifcvt in this case. OK ... this particular problem is in SH's md. The addsi3 expander purposefully fails if the output reg is the same as the 1st input reg. This was added because of some issues with LRA. Because of that expand_binop_directly fails and a widened op is tried, which goes wrong for some other reason (not going to investigate now). I guess there's no need for the ifcvt patch in this case. Sorry for the noise. Cheers, Oleg
Index: gcc/ifcvt.c =================================================================== --- gcc/ifcvt.c (revision 227970) +++ gcc/ifcvt.c (working copy) @@ -874,6 +874,7 @@ { rtx src = gen_rtx_fmt_ee (code, GET_MODE (x), XEXP (cond, 0), XEXP (cond, 1)); + x = gen_reg_rtx (GET_MODE (x)); rtx set = gen_rtx_SET (x, src); start_sequence ();