Message ID | 0DA23CC379F5F945ACB41CF394B9827720F331A8@LEMAIL01.le.imgtec.org |
---|---|
State | New |
Headers | show |
On Tue, 18 Nov 2014, Andrew Bennett wrote: > The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when > using the -mfix-r10000 option. This is due to the fact that the delay > slot of the branch instruction that checks if the atomic operation was > not successful can be filled with an operation that returns the output > result. For the branch likely case this operation can not go in the > delay slot because it wont execute when the atomic operation was > successful and therefore the wrong result will be returned. This patch > forces a nop to be placed in the delay slot if a branch likely is used. > The fix is as simple as possible; it does cause a second nop to be introduced > after the branch instruction in the branch likely case. As this option is > rarely used, it makes more sense to keep the code readable than trying to > optimise it. Can you please be a bit more elaborate on how the wrong code sequence looks like, why it is produced like that and how your fix changes it? Without seeing examples of code generated it is very hard to imagine what is really going on here. Maciej
> -----Original Message----- > From: Maciej W. Rozycki [mailto:macro@codesourcery.com] > Sent: 18 November 2014 13:48 > To: Andrew Bennett > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] If using branch likelies in MIPS sync code fill the delay > slot with a nop > > On Tue, 18 Nov 2014, Andrew Bennett wrote: > > > The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when > > using the -mfix-r10000 option. This is due to the fact that the delay > > slot of the branch instruction that checks if the atomic operation was > > not successful can be filled with an operation that returns the output > > result. For the branch likely case this operation can not go in the > > delay slot because it wont execute when the atomic operation was > > successful and therefore the wrong result will be returned. This patch > > forces a nop to be placed in the delay slot if a branch likely is used. > > The fix is as simple as possible; it does cause a second nop to be > introduced > > after the branch instruction in the branch likely case. As this option is > > rarely used, it makes more sense to keep the code readable than trying to > > optimise it. > > Can you please be a bit more elaborate on how the wrong code sequence > looks like, why it is produced like that and how your fix changes it? > Without seeing examples of code generated it is very hard to imagine > what is really going on here. Ok if we compile the following cut-down atomic-op-3.c test case with -mfix-r10000. extern void abort(void); int v, count, res; int main () { v = 0; count = 1; if (__atomic_add_fetch (&v, count, __ATOMIC_RELAXED) != 1) abort (); return 0; } The following assembly code is produced for the atomic operation: .set noat sync # 15 sync_new_addsi/2 [length = 24] 1: ll $3,0($4) addu $1,$3,$2 sc $1,0($4) beql $1,$0,1b addu $3,$3,$2 sync .set at Notice here that the addu is in the delay slot of the branch likely instruction. This is computing the value that exists in the memory location after the atomic operation has completed. Because it is a branch likely instruction this will only run when the atomic operation fails, and hence when it is successful it will not return the correct value. The offending code is in mips_process_sync_loop: mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL); /* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot]. */ if (insn1 != SYNC_INSN1_MOVE && insn1 != SYNC_INSN1_LI && tmp3 != newval) { mips_multi_copy_insn (tmp3_insn); mips_multi_set_operand (mips_multi_last_index (), 0, newval); } else if (!(required_oldval && cmp)) mips_multi_add_insn ("nop", NULL); /* CMP = 1 -- either standalone or in a delay slot. */ if (required_oldval && cmp) mips_multi_add_insn ("li\t%0,1", cmp, NULL); For the branch likely case the delay slot could be filled either with the operation to compute the value that exists in memory after the atomic operation has completed; an operation to return if the atomic operation is successful or not; or a nop. The first two operations should not be in the delay slot of the branch if it is a branch likely because they will only run if the atomic operation was unsuccessful. My fix places a nop in the delay slot of the branch likely instruction by using the %~ output operation. This then causes the sync code for the previous example to be correct: .set noat sync # 15 sync_new_addsi/2 [length = 24] 1: ll $3,0($4) addu $1,$3,$2 sc $1,0($4) beql $1,$0,1b nop addu $3,$3,$2 sync .set at Regards, Andrew
> The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when > using the -mfix-r10000 option. This is due to the fact that the delay > slot of the branch instruction that checks if the atomic operation was > not successful can be filled with an operation that returns the output > result. For the branch likely case this operation can not go in the > delay slot because it wont execute when the atomic operation was > successful and therefore the wrong result will be returned. This patch > forces a nop to be placed in the delay slot if a branch likely is used. > The fix is as simple as possible; it does cause a second nop to be > introduced > after the branch instruction in the branch likely case. As this option is > rarely used, it makes more sense to keep the code readable than trying to > optimise it. > > The atomic tests now pass successfully. The ChangeLog and patch are > below. > > Ok to commit? I'm OK with just making the fix-r10000 case safe rather than also complicating the normal code path to remove the normal delay slot NOP in this special case. I'd like to see what Catherine thinks though. To remove the second NOP I believe we would have to add a !TARGET_FIX_R10000 to the condition of the normal delay slot NOP. This seems a bit convoluted when the real reason is because branch likely is in use. To make use of the mips_branch_likely flag it would have to be set earlier in: mips_output_sync_loop and also get set in mips_sync_loop_insns. If Catherine thinks getting rid of the second NOP is important enough then please base it on mips_branch_likely and fix the callers. > gcc/ > * config/mips/mips.c (mips_process_sync_loop): Place a nop in the > delay slot of the branch likely instruction. > > > diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c > index 02268f3..6dd3728 100644 > --- a/gcc/config/mips/mips.c > +++ b/gcc/config/mips/mips.c > @@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx > *operands) > This will sometimes be a delayed branch; see the write code below > for details. */ > mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at, mem, > NULL); > - mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL); > + > + /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the delay > slot > + of the branch if it is a branch likely because they will not execute > when > + the atomic operation is successful, so place a NOP in there instead. > */ > + I found the comment hard to digest, perhaps: /* When using branch likely (-mfix-r10000), the delay slot instruction will be annulled on false. The normal delay slot instructions calculate the overall result of the atomic operation and must not be annulled. Unconditionally use a NOP instead for the branch likely case. */ > + mips_multi_add_insn ("beq%?\t%0,%.,1b%~", at, NULL); > > /* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot]. */ > if (insn1 != SYNC_INSN1_MOVE && insn1 != SYNC_INSN1_LI && tmp3 != > newval) Matthew
On Tue, 18 Nov 2014, Andrew Bennett wrote: > My fix places a nop in the delay slot of the branch likely instruction > by using the %~ output operation. This then causes the sync code for the > previous example to be correct: > > .set noat > sync # 15 sync_new_addsi/2 [length = 24] > 1: > ll $3,0($4) > addu $1,$3,$2 > sc $1,0($4) > beql $1,$0,1b > nop > addu $3,$3,$2 > sync > .set at OK, this does look to me like the correct way to address the issue, but where is the second NOP that you previously mentioned? I fail to see it here and this code can't be made any better, there isn't anything you could possibly schedule into the delay slot as there is nothing else to do in this loop. Maciej
> OK, this does look to me like the correct way to address the issue, but > where is the second NOP that you previously mentioned? I fail to see it > here and this code can't be made any better, there isn't anything you > could possibly schedule into the delay slot as there is nothing else to > do in this loop. The following testcase shows this occurring. short v, count, ret; int main () { v = 0; count = 0; __atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED); return 0; } Produces (for the atomic operation): .set noat sync 1: ll $3,0($5) and $1,$3,$4 bne $1,$7,2f and $1,$3,$6 or $1,$1,$8 sc $1,0($5) beql $1,$0,1b nop nop sync 2: .set at Regards, Andrew
On Tue, 18 Nov 2014, Andrew Bennett wrote: > Produces (for the atomic operation): > > .set noat > sync > 1: > ll $3,0($5) > and $1,$3,$4 > bne $1,$7,2f > and $1,$3,$6 > or $1,$1,$8 > sc $1,0($5) > beql $1,$0,1b > nop > nop > sync > 2: > .set at With a quick look at `mips_process_sync_loop' it looks to me the other NOP is produced from here: else if (!(required_oldval && cmp)) mips_multi_add_insn ("nop", NULL); -- correct? If so, then can't you just make it: else if (!(required_oldval && cmp) && !TARGET_FIX_R10000) mips_multi_add_insn ("nop", NULL); instead? It appears plain and simple, and if you're concerned about it being unobvious to the casual reader, then add a one-line comment or suchlike. It's not like TARGET_FIX_R10000 is going to imply anything else about branches in the future (and r6 code won't run on an R10k anyway). Maciej
> From: Maciej W. Rozycki [mailto:macro@codesourcery.com] > > On Tue, 18 Nov 2014, Andrew Bennett wrote: > > > Produces (for the atomic operation): > > > > .set noat > > sync > > 1: > > ll $3,0($5) > > and $1,$3,$4 > > bne $1,$7,2f > > and $1,$3,$6 > > or $1,$1,$8 > > sc $1,0($5) > > beql $1,$0,1b > > nop > > nop > > sync > > 2: > > .set at > > With a quick look at `mips_process_sync_loop' it looks to me the > other NOP is produced from here: > > else if (!(required_oldval && cmp)) > mips_multi_add_insn ("nop", NULL); > > -- correct? If so, then can't you just make it: Correct. > > else if (!(required_oldval && cmp) && !TARGET_FIX_R10000) > mips_multi_add_insn ("nop", NULL); > > instead? It appears plain and simple, and if you're concerned about it > being unobvious to the casual reader, then add a one-line comment or > suchlike. It's not like TARGET_FIX_R10000 is going to imply anything > else about branches in the future (and r6 code won't run on an R10k > anyway). True. Actually this is what my original patch had in it, but Matthew and I decided to leave it out for the initial submission. I am happy to add it back in, and provide a comment to explain what is happening. Catherine are you happy with this? Regards, Andrew
> > From: Maciej W. Rozycki [mailto:macro@codesourcery.com] > > > > On Tue, 18 Nov 2014, Andrew Bennett wrote: > > > > > Produces (for the atomic operation): > > > > > > .set noat > > > sync > > > 1: > > > ll $3,0($5) > > > and $1,$3,$4 > > > bne $1,$7,2f > > > and $1,$3,$6 > > > or $1,$1,$8 > > > sc $1,0($5) > > > beql $1,$0,1b > > > nop > > > nop > > > sync > > > 2: > > > .set at > > > > With a quick look at `mips_process_sync_loop' it looks to me the > > other NOP is produced from here: > > > > else if (!(required_oldval && cmp)) > > mips_multi_add_insn ("nop", NULL); > > > > -- correct? If so, then can't you just make it: > > Correct. > > > > > else if (!(required_oldval && cmp) && !TARGET_FIX_R10000) > > mips_multi_add_insn ("nop", NULL); > > > > instead? It appears plain and simple, and if you're concerned about > > it being unobvious to the casual reader, then add a one-line comment > > or suchlike. It's not like TARGET_FIX_R10000 is going to imply > > anything else about branches in the future (and r6 code won't run on > > an R10k anyway). I'm afraid I disagree about it being plain and simple even with a comment. The amount of code in the backend that makes assumptions based on derived variables is very high and it makes the code hard to read (especially w.r.t. branches). I'm OK with adding the code to avoid the extra NOP but have it based on mips_branch_likely and fix the callers so that it is set appropriately. Thanks, Matthew
On Tue, 18 Nov 2014, Matthew Fortune wrote: > > > With a quick look at `mips_process_sync_loop' it looks to me the > > > other NOP is produced from here: > > > > > > else if (!(required_oldval && cmp)) > > > mips_multi_add_insn ("nop", NULL); > > > > > > -- correct? If so, then can't you just make it: > > > > Correct. > > > > > > > > else if (!(required_oldval && cmp) && !TARGET_FIX_R10000) > > > mips_multi_add_insn ("nop", NULL); > > > > > > instead? It appears plain and simple, and if you're concerned about > > > it being unobvious to the casual reader, then add a one-line comment > > > or suchlike. It's not like TARGET_FIX_R10000 is going to imply > > > anything else about branches in the future (and r6 code won't run on > > > an R10k anyway). > > I'm afraid I disagree about it being plain and simple even with a comment. > The amount of code in the backend that makes assumptions based on derived > variables is very high and it makes the code hard to read (especially w.r.t. > branches). I'm OK with adding the code to avoid the extra NOP but have it > based on mips_branch_likely and fix the callers so that it is set > appropriately. Sure, fine with me too. It looks to me it'll be more natural even to have `mips_branch_likely' preset on entering `mips_process_sync_loop'. Maciej
> -----Original Message----- > From: Matthew Fortune [mailto:Matthew.Fortune@imgtec.com] > Sent: Tuesday, November 18, 2014 9:42 AM > To: Andrew Bennett; gcc-patches@gcc.gnu.org > Cc: Moore, Catherine; Rozycki, Maciej > Subject: RE: [PATCH] If using branch likelies in MIPS sync code fill the delay > slot with a nop > > > The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing > > when using the -mfix-r10000 option. This is due to the fact that the > > delay slot of the branch instruction that checks if the atomic > > operation was not successful can be filled with an operation that > > returns the output result. For the branch likely case this operation > > can not go in the delay slot because it wont execute when the atomic > > operation was successful and therefore the wrong result will be > > returned. This patch forces a nop to be placed in the delay slot if a branch > likely is used. > > The fix is as simple as possible; it does cause a second nop to be > > introduced after the branch instruction in the branch likely case. As > > this option is rarely used, it makes more sense to keep the code > > readable than trying to optimise it. > > > > The atomic tests now pass successfully. The ChangeLog and patch are > > below. > > > > Ok to commit? > > I'm OK with just making the fix-r10000 case safe rather than also complicating > the normal code path to remove the normal delay slot NOP in this special > case. I'd like to see what Catherine thinks though. To remove the second > NOP I believe we would have to add a !TARGET_FIX_R10000 to the condition > of the normal delay slot NOP. This seems a bit convoluted when the real > reason is because branch likely is in use. To make use of the > mips_branch_likely flag it would have to be set earlier in: > mips_output_sync_loop and also get set in mips_sync_loop_insns. > > If Catherine thinks getting rid of the second NOP is important enough then > please base it on mips_branch_likely and fix the callers. > Yes, removing the second NOP is worth the additional effort.
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 02268f3..6dd3728 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands) This will sometimes be a delayed branch; see the write code below for details. */ mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at, mem, NULL); - mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL); + + /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the delay slot + of the branch if it is a branch likely because they will not execute when + the atomic operation is successful, so place a NOP in there instead. */ + + mips_multi_add_insn ("beq%?\t%0,%.,1b%~", at, NULL); /* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot]. */ if (insn1 != SYNC_INSN1_MOVE && insn1 != SYNC_INSN1_LI && tmp3 != newval)