Message ID | B5E67142681B53468FAF6B7C31356562441183BD@hhmail02.hh.imgtec.org |
---|---|
State | New |
Headers | show |
On 01/14/15 12:20, Robert Suchanek wrote: > Hi Vladimir, > > An issue has been identified with LRA when running CPU2006 h264ref benchmark. > > I'll try to describe what the issue is and a fix applied as it is very > difficult to reproduce it and it is next to impossible to create a narrowed > testcase on top of the source code restrictions. > > The concerned LRA code in lra-constraints.c is the following: > > if (GET_CODE (*loc) == SUBREG) > { > reg = SUBREG_REG (*loc); > byte = SUBREG_BYTE (*loc); > if (REG_P (reg) > /* Strict_low_part requires reload the register not > the sub-register. */ > && (curr_static_id->operand[i].strict_low > || (GET_MODE_SIZE (mode) > <= GET_MODE_SIZE (GET_MODE (reg)) > && (hard_regno > = get_try_hard_regno (REGNO (reg))) >= 0 > && (simplify_subreg_regno > (hard_regno, > GET_MODE (reg), byte, mode) < 0) > && (goal_alt[i] == NO_REGS > || (simplify_subreg_regno > (ira_class_hard_regs[goal_alt[i]][0], > GET_MODE (reg), byte, mode) >= 0)))) > { > loc = &SUBREG_REG (*loc); > mode = GET_MODE (*loc); > } > } > > The above works just fine when we deal with strict_low_part or a subreg > smaller than a word. > > However, multi-word operations that were emitted as a sequence of operations > on word sized parts of the DImode register appears to expose a problem with > LRA e.g. '(set (subreg: SI (reg: DI)) ...)'. > LRA does not realize that it actually uses the other halve of the DI-mode > register leading to a situation where it modifies one halve of the result and > spills the whole register with the other halve undefined. > > In the dump I can see the following: > > Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552 > 1487: r1552:DI#4=r1404:SI+r1509:SI > REG_DEAD r1509:SI > REG_DEAD r1404:SI > Inserting insn reload after: > 1735: r521:DI=r1552:DI > > There is nothing in the dump that sets r1552:DI#0 nor a reload is inserted > to load the value before modifying it but it is spilled. > > As it is a multi-word register, the split pass emits an additional instruction > to load the whole 64-bit value but since one halve was modified, only > register $20 appears in the live-in set. In contrast to $20, $21 is being used > but not added to the live-in set. > > ... > ;; live in 4 [$4] 6 [$6] 7 [$7] 10 [$10] 11 [$11] 12 [$12] 13 [$13] > [$14] 15 [$15] 16 [$16] 17 [$17] 20 [$20] 22 [$22] 23 [$23] 24 [$24] 25 [$25] > 29 [$sp] 30 [$fp] 31 [$31] 52 [$f20] 79 [$fakec] > ... > (insn 1788 1077 1789 80 (set (reg:SI 20 $20 [orig:521 distortion ] [521]) > (mem/c:SI (plus:SI (reg/f:SI 29 $sp) > (const_int 40 [0x28])) [16 %sfp+40 S4 A64])) rdopt.c:257 288 > {*movsi_internal} > (nil)) > (insn 1789 1788 1743 80 (set (reg:SI 21 $21 [ distortion+4 ]) > (mem/c:SI (plus:SI (reg/f:SI 29 $sp) > (const_int 44 [0x2c])) [16 %sfp+44 S4 A32])) rdopt.c:257 288 > {*movsi_internal} > (nil)) > ... > > The potential fix for this is to promote the type of a subreg OP_OUT to OP_INOUT > to treat the pseudo register (r1552 in this case) as input and LRA will be forced > to insert a reload before modifying its contents. > > Handling of strict_low_part case is fine as the operand is described in the MD pattern > as IN_OUT through modifiers. > > With the above change in place, we get a reload before assignment: > > Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552 > 1487: r1552:DI#4=r1404:SI+r1509:SI > REG_DEAD r1509:SI > REG_DEAD r1404:SI > Inserting insn reload before: > 1735: r1552:DI=r521:DI > Inserting insn reload after: > 1736: r521:DI=r1552:DI > > and the benchmark happily passes the runtime check. > > The question is whether changing the type to OP_INOUT is the correct and > valid fix? > > Regards, > Robert > > 2015-01-14 Robert Suchanek <robert.suchanek@imgtec.com> > > gcc/ > * lra-constraints.c (curr_insn_transform): Change the type of a reload > pseudo to OP_INOUT. Robert, can you look at reload.c::reload_inner_reg_of_subreg and verify that the comment just before its return statement is effectively the situation you're in. There are certainly cases where a SUBREG needs to be treated as an in-out operand. We walked through them eons ago when we were poking at SSA for RTL. But the details have long since faded from memory. jeff
> Robert, can you look at reload.c::reload_inner_reg_of_subreg and verify > that the comment just before its return statement is effectively the > situation you're in. > > There are certainly cases where a SUBREG needs to be treated as an > in-out operand. We walked through them eons ago when we were poking at > SSA for RTL. But the details have long since faded from memory. The comment pretty much applies to my situation. The only difference I can see is that reload would have had hard registers at this point. In the testcase, LRA does not have hard registers assigned to the concerned pseudo(s), thus, it can't rely on the information in hard_regno_nregs to check if the number of registers in INNER is different to the number of words in INNER. Robert
On 14/01/15 02:20 PM, Robert Suchanek wrote: > Hi Vladimir, > > An issue has been identified with LRA when running CPU2006 h264ref benchmark. > > I'll try to describe what the issue is and a fix applied as it is very > difficult to reproduce it and it is next to impossible to create a narrowed > testcase on top of the source code restrictions. > > The concerned LRA code in lra-constraints.c is the following: > > if (GET_CODE (*loc) == SUBREG) > { > reg = SUBREG_REG (*loc); > byte = SUBREG_BYTE (*loc); > if (REG_P (reg) > /* Strict_low_part requires reload the register not > the sub-register. */ > && (curr_static_id->operand[i].strict_low > || (GET_MODE_SIZE (mode) > <= GET_MODE_SIZE (GET_MODE (reg)) > && (hard_regno > = get_try_hard_regno (REGNO (reg))) >= 0 > && (simplify_subreg_regno > (hard_regno, > GET_MODE (reg), byte, mode) < 0) > && (goal_alt[i] == NO_REGS > || (simplify_subreg_regno > (ira_class_hard_regs[goal_alt[i]][0], > GET_MODE (reg), byte, mode) >= 0)))) > { > loc = &SUBREG_REG (*loc); > mode = GET_MODE (*loc); > } > } > > The above works just fine when we deal with strict_low_part or a subreg > smaller than a word. > > However, multi-word operations that were emitted as a sequence of operations > on word sized parts of the DImode register appears to expose a problem with > LRA e.g. '(set (subreg: SI (reg: DI)) ...)'. > LRA does not realize that it actually uses the other halve of the DI-mode > register leading to a situation where it modifies one halve of the result and > spills the whole register with the other halve undefined. > > In the dump I can see the following: > > Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552 > 1487: r1552:DI#4=r1404:SI+r1509:SI > REG_DEAD r1509:SI > REG_DEAD r1404:SI > Inserting insn reload after: > 1735: r521:DI=r1552:DI > > There is nothing in the dump that sets r1552:DI#0 nor a reload is inserted > to load the value before modifying it but it is spilled. > > As it is a multi-word register, the split pass emits an additional instruction > to load the whole 64-bit value but since one halve was modified, only > register $20 appears in the live-in set. In contrast to $20, $21 is being used > but not added to the live-in set. > > ... > ;; live in 4 [$4] 6 [$6] 7 [$7] 10 [$10] 11 [$11] 12 [$12] 13 [$13] > [$14] 15 [$15] 16 [$16] 17 [$17] 20 [$20] 22 [$22] 23 [$23] 24 [$24] 25 [$25] > 29 [$sp] 30 [$fp] 31 [$31] 52 [$f20] 79 [$fakec] > ... > (insn 1788 1077 1789 80 (set (reg:SI 20 $20 [orig:521 distortion ] [521]) > (mem/c:SI (plus:SI (reg/f:SI 29 $sp) > (const_int 40 [0x28])) [16 %sfp+40 S4 A64])) rdopt.c:257 288 > {*movsi_internal} > (nil)) > (insn 1789 1788 1743 80 (set (reg:SI 21 $21 [ distortion+4 ]) > (mem/c:SI (plus:SI (reg/f:SI 29 $sp) > (const_int 44 [0x2c])) [16 %sfp+44 S4 A32])) rdopt.c:257 288 > {*movsi_internal} > (nil)) > ... > > The potential fix for this is to promote the type of a subreg OP_OUT to OP_INOUT > to treat the pseudo register (r1552 in this case) as input and LRA will be forced > to insert a reload before modifying its contents. > > Handling of strict_low_part case is fine as the operand is described in the MD pattern > as IN_OUT through modifiers. > > With the above change in place, we get a reload before assignment: > > Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552 > 1487: r1552:DI#4=r1404:SI+r1509:SI > REG_DEAD r1509:SI > REG_DEAD r1404:SI > Inserting insn reload before: > 1735: r1552:DI=r521:DI > Inserting insn reload after: > 1736: r521:DI=r1552:DI > > and the benchmark happily passes the runtime check. > > The question is whether changing the type to OP_INOUT is the correct and > valid fix? > > Regards, > Robert > > 2015-01-14 Robert Suchanek <robert.suchanek@imgtec.com> > > gcc/ > * lra-constraints.c (curr_insn_transform): Change the type of a reload > pseudo to OP_INOUT. > --- > gcc/lra-constraints.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c > index ec28b7f..018968b 100644 > --- a/gcc/lra-constraints.c > +++ b/gcc/lra-constraints.c > @@ -3798,6 +3798,7 @@ curr_insn_transform (void) > (ira_class_hard_regs[goal_alt[i]][0], > GET_MODE (reg), byte, mode) >= 0))))) > { > + type = OP_INOUT; > loc = &SUBREG_REG (*loc); > mode = GET_MODE (*loc); > } > Robert, thanks for working on the issue. Your approach is in the right direction. But I believe the change should be if (type == OP_OUT) type = OP_INOUT; Otherwise, I will generate still a right code but an additional insn which will not go away later. With this change the patch is ok. Please, check x86/x86-64 target bootstrap and GCC testsuite before submitting the patch. It seems there will be no problem with the patch on other targets but it is better to check to be sure.
On 01/15/15 03:13, Robert Suchanek wrote: >> Robert, can you look at reload.c::reload_inner_reg_of_subreg and verify >> that the comment just before its return statement is effectively the >> situation you're in. >> >> There are certainly cases where a SUBREG needs to be treated as an >> in-out operand. We walked through them eons ago when we were poking at >> SSA for RTL. But the details have long since faded from memory. > > The comment pretty much applies to my situation. The only difference I can > see is that reload would have had hard registers at this point. In the testcase, > LRA does not have hard registers assigned to the concerned pseudo(s), thus, > it can't rely on the information in hard_regno_nregs to check if the number of > registers in INNER is different to the number of words in INNER. The differences (hard vs pseudo regs) are primarily an implementation detail. I was really looking to see if there was existing code which would turn an output reload into an in-out reload for these subregs. The in-out nature of certain subregs is something I've personally stumbled over in various contexts (for example, this also came up during RTL-SSA investigations years ago). Jeff
> The differences (hard vs pseudo regs) are primarily an implementation > detail. I was really looking to see if there was existing code which > would turn an output reload into an in-out reload for these subregs. > > The in-out nature of certain subregs is something I've personally > stumbled over in various contexts (for example, this also came up during > RTL-SSA investigations years ago). > > Jeff Although I haven't seen an output reload changing to in-out reload in the testcase after analysing the reload code I'm inclined to say the change of the out to in-out reload can happen for these subregs in push_reload(). Robert
Jeff Law <law@redhat.com> writes: > On 01/15/15 03:13, Robert Suchanek wrote: > >> Robert, can you look at reload.c::reload_inner_reg_of_subreg and > >> verify that the comment just before its return statement is > >> effectively the situation you're in. > >> > >> There are certainly cases where a SUBREG needs to be treated as an > >> in-out operand. We walked through them eons ago when we were poking > >> at SSA for RTL. But the details have long since faded from memory. > > > > The comment pretty much applies to my situation. The only difference > > I can see is that reload would have had hard registers at this point. > > In the testcase, LRA does not have hard registers assigned to the > > concerned pseudo(s), thus, it can't rely on the information in > > hard_regno_nregs to check if the number of registers in INNER is > different to the number of words in INNER. > The differences (hard vs pseudo regs) are primarily an implementation > detail. I was really looking to see if there was existing code which > would turn an output reload into an in-out reload for these subregs. > > The in-out nature of certain subregs is something I've personally > stumbled over in various contexts (for example, this also came up during > RTL-SSA investigations years ago). Committed as r219730 on Robert's behalf. Thanks, Matthew
Jeff Law <law@redhat.com> writes: > On 01/15/15 03:13, Robert Suchanek wrote: >>> Robert, can you look at reload.c::reload_inner_reg_of_subreg and verify >>> that the comment just before its return statement is effectively the >>> situation you're in. >>> >>> There are certainly cases where a SUBREG needs to be treated as an >>> in-out operand. We walked through them eons ago when we were poking at >>> SSA for RTL. But the details have long since faded from memory. >> >> The comment pretty much applies to my situation. The only difference I can >> see is that reload would have had hard registers at this point. In >> the testcase, >> LRA does not have hard registers assigned to the concerned pseudo(s), thus, >> it can't rely on the information in hard_regno_nregs to check if the number of >> registers in INNER is different to the number of words in INNER. But the code you quote is: if (GET_CODE (*loc) == SUBREG) { reg = SUBREG_REG (*loc); byte = SUBREG_BYTE (*loc); if (REG_P (reg) /* Strict_low_part requires reload the register not the sub-register. */ && (curr_static_id->operand[i].strict_low || (GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (reg)) && (hard_regno = get_try_hard_regno (REGNO (reg))) >= 0 && (simplify_subreg_regno (hard_regno, GET_MODE (reg), byte, mode) < 0) && (goal_alt[i] == NO_REGS || (simplify_subreg_regno (ira_class_hard_regs[goal_alt[i]][0], GET_MODE (reg), byte, mode) >= 0)))) Here we do have a hard register, but it isn't valid to form the subreg on that hard register. Reload had to cope with that case too. Since the subreg on the original hard register is invalid, we can't use it to decide whether the intention was to write to only a part of the inner register. But I don't think we need to use the hard register here. The original register was a psuedo and I'm pretty sure... > The differences (hard vs pseudo regs) are primarily an implementation > detail. I was really looking to see if there was existing code which > would turn an output reload into an in-out reload for these subregs. > > The in-out nature of certain subregs is something I've personally > stumbled over in various contexts (for example, this also came up during > RTL-SSA investigations years ago). ...the rule for pseudos is that words of a multiword pseudo can be accessed independently but subword pieces of an individual word can't. This obviously isn't ideal if a mode is intended for wider-than-word registers, since not all words will be independently addressable when allocated to those registers. The code above is partly dealing with the fallout from that. It's also why we have strict_lowpart for cases like al on i386. So IMO the patch is too broad. I think it should only use INOUT reloads for !strict_low if the inner mode is wider than a word and the outer mode is strictly narrower than the inner mode. That's on top of Vlad's comment about only converting OP_OUTs, of course. Thanks, Richard
Richard Sandiford <rdsandiford@googlemail.com> writes: > Jeff Law <law@redhat.com> writes: > > On 01/15/15 03:13, Robert Suchanek wrote: > >>> Robert, can you look at reload.c::reload_inner_reg_of_subreg and > >>> verify that the comment just before its return statement is > >>> effectively the situation you're in. > >>> > >>> There are certainly cases where a SUBREG needs to be treated as an > >>> in-out operand. We walked through them eons ago when we were poking > >>> at SSA for RTL. But the details have long since faded from memory. > >> > >> The comment pretty much applies to my situation. The only difference > >> I can see is that reload would have had hard registers at this point. > >> In the testcase, LRA does not have hard registers assigned to the > >> concerned pseudo(s), thus, it can't rely on the information in > >> hard_regno_nregs to check if the number of registers in INNER is > >> different to the number of words in INNER. > > But the code you quote is: > > if (GET_CODE (*loc) == SUBREG) > { > reg = SUBREG_REG (*loc); > byte = SUBREG_BYTE (*loc); > if (REG_P (reg) > /* Strict_low_part requires reload the register not > the sub-register. */ > && (curr_static_id->operand[i].strict_low > || (GET_MODE_SIZE (mode) > <= GET_MODE_SIZE (GET_MODE (reg)) > && (hard_regno > = get_try_hard_regno (REGNO (reg))) >= 0 > && (simplify_subreg_regno > (hard_regno, > GET_MODE (reg), byte, mode) < 0) > && (goal_alt[i] == NO_REGS > || (simplify_subreg_regno > (ira_class_hard_regs[goal_alt[i]][0], > GET_MODE (reg), byte, mode) >= 0)))) > > Here we do have a hard register, but it isn't valid to form the subreg > on that hard register. Reload had to cope with that case too. > > Since the subreg on the original hard register is invalid, we can't use > it to decide whether the intention was to write to only a part of the > inner register. But I don't think we need to use the hard register > here. The original register was a psuedo and I'm pretty sure... > > > The differences (hard vs pseudo regs) are primarily an implementation > > detail. I was really looking to see if there was existing code which > > would turn an output reload into an in-out reload for these subregs. > > > > The in-out nature of certain subregs is something I've personally > > stumbled over in various contexts (for example, this also came up > > during RTL-SSA investigations years ago). > > ...the rule for pseudos is that words of a multiword pseudo can be > accessed independently but subword pieces of an individual word can't. > This obviously isn't ideal if a mode is intended for wider-than-word > registers, since not all words will be independently addressable when > allocated to those registers. The code above is partly dealing with the > fallout from that. It's also why we have strict_lowpart for cases like > al on i386. > > So IMO the patch is too broad. I think it should only use INOUT reloads > for !strict_low if the inner mode is wider than a word and the outer > mode is strictly narrower than the inner mode. That's on top of Vlad's > comment about only converting OP_OUTs, of course. This sounds correct/sensible. The change as committed will be turning some OP_OUT reloads into OP_INOUT unnecessarily. Checking for !strict_low is however probably redundant as strict_low must be OP_INOUT and never OP_OUT but we could mask backend bugs by converting strict_low subregs. I think this should be resolved for GCC 5 if others agree it is an issue. Matthew
> > Here we do have a hard register, but it isn't valid to form the subreg > > on that hard register. Reload had to cope with that case too. > > > > Since the subreg on the original hard register is invalid, we can't use > > it to decide whether the intention was to write to only a part of the > > inner register. But I don't think we need to use the hard register > > here. The original register was a psuedo and I'm pretty sure... Indeed, it is a hard register (64 IIRC) since get_try_hard_regno () must return >= 0. > > > The differences (hard vs pseudo regs) are primarily an implementation > > > detail. I was really looking to see if there was existing code which > > > would turn an output reload into an in-out reload for these subregs. > > > > > > The in-out nature of certain subregs is something I've personally > > > stumbled over in various contexts (for example, this also came up > > > during RTL-SSA investigations years ago). > > > > ...the rule for pseudos is that words of a multiword pseudo can be > > accessed independently but subword pieces of an individual word can't. > > This obviously isn't ideal if a mode is intended for wider-than-word > > registers, since not all words will be independently addressable when > > allocated to those registers. The code above is partly dealing with the > > fallout from that. It's also why we have strict_lowpart for cases like > > al on i386. > > > > So IMO the patch is too broad. I think it should only use INOUT reloads > > for !strict_low if the inner mode is wider than a word and the outer > > mode is strictly narrower than the inner mode. That's on top of Vlad's > > comment about only converting OP_OUTs, of course. > > This sounds correct/sensible. The change as committed will be turning > some OP_OUT reloads into OP_INOUT unnecessarily. Checking for !strict_low > is however probably redundant as strict_low must be OP_INOUT and never > OP_OUT but we could mask backend bugs by converting strict_low subregs. > > I think this should be resolved for GCC 5 if others agree it is an issue. > > Matthew This makes sense since we've got the inner wider than a word and I agree it'd be better to resolve this for GCC 5. Robert
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index ec28b7f..018968b 100644 --- a/gcc/lra-constraints.c +++ b/gcc/lra-constraints.c @@ -3798,6 +3798,7 @@ curr_insn_transform (void) (ira_class_hard_regs[goal_alt[i]][0], GET_MODE (reg), byte, mode) >= 0))))) { + type = OP_INOUT; loc = &SUBREG_REG (*loc); mode = GET_MODE (*loc); }