Message ID | CAOvf_xwDUU=gsQqHgybovAFHbn1+OVHYNBw+=xG0jau1wG2HDg@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Oct 17, 2014, at 7:08 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote: > The patch fixes 1st fail in darwin bootstarp. > When PIC register is pseudo we don't need to init it after setjmp or > non local goto. > > Is it ok? So, I don’t see commentary in the PR that all fallout and all bugs introduced are fixed by the patch. :-( Given how central pic is to code-gen, I don’t favor any patch, until all bugs and regressions are fixed. If they can’t be, then I favor reversion of the patch that broke everything. Additionally, if given the types of changes to codegen, I’d like to see the before and after changes to try and ensure that code-gen quality isn’t regressed. For example, is the pic register saved and restored? If restored, is the code to save it actually better than simply appearing it out of thin air as did previously? If the patch that did all the pic work was in one patch, it would be easier for me to see the change in its entirety.
>For example, is the pic register saved and restored? Yes, if RA decided that it is better to save it. >If restored, is the code to save it actually better than simply appearing it out of thin air as did previously? "enabling ebx" gives performance mostly to loops. There still could be some performance issues, however we got pretty good results on a set of benchmarks: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00892.html To get performance diff now you can compare r216153 svn revision and r216304 with the patch in 63534#c33 There was quite long discussion on enabling starting here: https://gcc.gnu.org/ml/gcc-patches/2014-08/msg02186.html On Fri, Oct 17, 2014 at 10:32 PM, Mike Stump <mikestump@comcast.net> wrote: > On Oct 17, 2014, at 7:08 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote: >> The patch fixes 1st fail in darwin bootstarp. >> When PIC register is pseudo we don't need to init it after setjmp or >> non local goto. >> >> Is it ok? > > So, I don’t see commentary in the PR that all fallout and all bugs introduced are fixed by the patch. :-( > > Given how central pic is to code-gen, I don’t favor any patch, until all bugs and regressions are fixed. If they can’t be, then I favor reversion of the patch that broke everything. > > Additionally, if given the types of changes to codegen, I’d like to see the before and after changes to try and ensure that code-gen quality isn’t regressed. For example, is the pic register saved and restored? If restored, is the code to save it actually better than simply appearing it out of thin air as did previously? If the patch that did all the pic work was in one patch, it would be easier for me to see the change in its entirety.
On 10/17/14 08:08, Evgeny Stupachenko wrote: > Hi, > > The patch fixes 1st fail in darwin bootstarp. > When PIC register is pseudo we don't need to init it after setjmp or > non local goto. > > Is it ok? > > ChangeLog: > > 2014-10-17 Evgeny Stupachenko <evstupac@gmail.com> > > PR target/63534 > * config/i386/i386.c (builtin_setjmp_receiver): Delete. > (nonlocal_goto_receiver): Ditto. Why do you think they're not needed? The builtin setjmp/longjmp implementation do not behave like what you're used to in the C library. Specifically, they do not save/restore register state beyond SP, FP and possibly ARGP. So let's take one step back -- what precisely about these patterns was causing a problem? My initial inclination is that we must set the PIC register here in the same manner as it's set in the prologue. Jeff
When PIC register is pseudo there is nothing special about it's value that setjmp can hurt. So if the pseudo register lives across setjmp_receiver RA should care about correct allocation (in case it is not saved/restored, it should go on stack). gcc.dg tests and specs I've tested behave like this. The initial problem comes from non-local goto as it tries to emit pseudo PIC register after reload. On Fri, Oct 31, 2014 at 11:14 PM, Jeff Law <law@redhat.com> wrote: > On 10/17/14 08:08, Evgeny Stupachenko wrote: >> >> Hi, >> >> The patch fixes 1st fail in darwin bootstarp. >> When PIC register is pseudo we don't need to init it after setjmp or >> non local goto. >> >> Is it ok? >> >> ChangeLog: >> >> 2014-10-17 Evgeny Stupachenko <evstupac@gmail.com> >> >> PR target/63534 >> * config/i386/i386.c (builtin_setjmp_receiver): Delete. >> (nonlocal_goto_receiver): Ditto. > > > Why do you think they're not needed? The builtin setjmp/longjmp > implementation do not behave like what you're used to in the C library. > Specifically, they do not save/restore register state beyond SP, FP and > possibly ARGP. > > So let's take one step back -- what precisely about these patterns was > causing a problem? My initial inclination is that we must set the PIC > register here in the same manner as it's set in the prologue. > > > > Jeff
On Nov 1, 2014, at 5:39 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote: > When PIC register is pseudo there is nothing special about it's value > that setjmp can hurt. So if the pseudo register lives across > setjmp_receiver RA should care about correct allocation (in case it is > not saved/restored, it should go on stack). So, why is consuming more stack space beneficial?
On 11/01/14 06:39, Evgeny Stupachenko wrote: > When PIC register is pseudo there is nothing special about it's value > that setjmp can hurt. So if the pseudo register lives across > setjmp_receiver RA should care about correct allocation (in case it is > not saved/restored, it should go on stack). gcc.dg tests and specs > I've tested behave like this. If the allocator picked a call-clobbered register for the PIC register, then we're obviously OK since the setjmp has to be expected to clobber the PIC register. But if the PIC register is in a call-saved register, then it's going to be assumed to not be clobbered across calls and I don't believe that is guaranteed for builtin setjmp/longjmp. Those restore SP, FP and an ARGP, but not anything else by default. So the callee might have clobbered the call saved hard register, expecting to restore its value in its epilogue. But due to the longjmp, that epilogue never gets called and thus the call-saved register won't have the right value in the receiver. Now if your argument is that IRA/LRA handle this, that's fine, a pointer to that code would be appreciated so that it can be quickly audited. Certainly the old local-alloc/global-alloc had magic for setjmp/longjmp and maybe IRA/LRA does too, but it's better to be sure than just assume. > > The initial problem comes from non-local goto as it tries to emit > pseudo PIC register after reload. ? You mean it emits a reference to the pseudo into RTL? That would indicate that the allocators never put the pseudo into a hard register?!? RTL dumps with a few pointers to key insns would help here. jeff
> Now if your argument is that IRA/LRA handle this, that's fine, a pointer > to that code would be appreciated so that it can be quickly audited. > Certainly the old local-alloc/global-alloc had magic for setjmp/longjmp > and maybe IRA/LRA does too, but it's better to be sure than just assume. See ira-lives.c:1217 and below.
On Tue, Nov 4, 2014 at 1:40 AM, Jeff Law <law@redhat.com> wrote: > On 11/01/14 06:39, Evgeny Stupachenko wrote: >> >> When PIC register is pseudo there is nothing special about it's value >> that setjmp can hurt. So if the pseudo register lives across >> setjmp_receiver RA should care about correct allocation (in case it is >> not saved/restored, it should go on stack). gcc.dg tests and specs >> I've tested behave like this. > > If the allocator picked a call-clobbered register for the PIC register, then > we're obviously OK since the setjmp has to be expected to clobber the PIC > register. > > But if the PIC register is in a call-saved register, then it's going to be > assumed to not be clobbered across calls and I don't believe that is > guaranteed for builtin setjmp/longjmp. Those restore SP, FP and an ARGP, > but not anything else by default. I still don't see what is special for PIC register here. PIC pseudo now behave as every other pseudo register. If we assume that setjmp can change a pseudo register value we need IRA/LRA "magic" for each pseudo register. I believe that when we had EBX fixed, IRA/LRA don't save/restore it anywhere. Therefore we had to care about EBX value in special cases like setjmp/non-local goto. Now RA cares about PIC pseudo as well as about correct allocation for any pseudo register. > > So the callee might have clobbered the call saved hard register, expecting > to restore its value in its epilogue. But due to the longjmp, that epilogue > never gets called and thus the call-saved register won't have the right > value in the receiver. > > Now if your argument is that IRA/LRA handle this, that's fine, a pointer to > that code would be appreciated so that it can be quickly audited. Certainly > the old local-alloc/global-alloc had magic for setjmp/longjmp and maybe > IRA/LRA does too, but it's better to be sure than just assume. > >> >> The initial problem comes from non-local goto as it tries to emit >> pseudo PIC register after reload. > > ? You mean it emits a reference to the pseudo into RTL? That would > indicate that the allocators never put the pseudo into a hard register?!? > RTL dumps with a few pointers to key insns would help here. Correct, that is why Darwin crashes with ICE on non-local goto. We still have: (define_insn_and_split "nonlocal_goto_receiver" [(unspec_volatile [(const_int 0)] UNSPECV_NLGR)] "TARGET_MACHO && !TARGET_64BIT && flag_pic" "#" "&& reload_completed" [(const_int 0)] { if (crtl->uses_pic_offset_table) { rtx xops[3]; rtx label_rtx = gen_label_rtx (); rtx tmp; /* Get a new pic base. */ emit_insn (gen_set_got_labelled (pic_offset_table_rtx, label_rtx)); ..... Here for MAC only we are trying to use pseudo PIC: pic_offset_table_rtx when reload_completed. > > jeff > >
We don't emit extra SET_GOT. That is beneficial. As for stack usage, that is RA to decide which register is more beneficial to put on stack. On Sat, Nov 1, 2014 at 8:33 PM, Mike Stump <mikestump@comcast.net> wrote: > On Nov 1, 2014, at 5:39 AM, Evgeny Stupachenko <evstupac@gmail.com> wrote: >> When PIC register is pseudo there is nothing special about it's value >> that setjmp can hurt. So if the pseudo register lives across >> setjmp_receiver RA should care about correct allocation (in case it is >> not saved/restored, it should go on stack). > > So, why is consuming more stack space beneficial?
On 11/05/14 04:54, Eric Botcazou wrote: >> Now if your argument is that IRA/LRA handle this, that's fine, a pointer >> to that code would be appreciated so that it can be quickly audited. >> Certainly the old local-alloc/global-alloc had magic for setjmp/longjmp >> and maybe IRA/LRA does too, but it's better to be sure than just assume. > > See ira-lives.c:1217 and below. Thanks. So that code creates a set of conflicts which, if I'm reading correctly, will prevent the PIC value from living in a register at all. Which ought to result in it being dumped into the stack and being reloaded for each use. Which ought to be safe (modulo the liveness bug Vlad is working on right now). Does that sound right to either of you? jeff
On 11/05/14 05:59, Evgeny Stupachenko wrote: > On Tue, Nov 4, 2014 at 1:40 AM, Jeff Law <law@redhat.com> wrote: >> On 11/01/14 06:39, Evgeny Stupachenko wrote: >>> >>> When PIC register is pseudo there is nothing special about it's >>> value that setjmp can hurt. So if the pseudo register lives >>> across setjmp_receiver RA should care about correct allocation >>> (in case it is not saved/restored, it should go on stack). gcc.dg >>> tests and specs I've tested behave like this. >> >> If the allocator picked a call-clobbered register for the PIC >> register, then we're obviously OK since the setjmp has to be >> expected to clobber the PIC register. >> >> But if the PIC register is in a call-saved register, then it's >> going to be assumed to not be clobbered across calls and I don't >> believe that is guaranteed for builtin setjmp/longjmp. Those >> restore SP, FP and an ARGP, but not anything else by default. > > I still don't see what is special for PIC register here. PIC pseudo > now behave as every other pseudo register. If we assume that setjmp > can change a pseudo register value we need IRA/LRA "magic" for each > pseudo register. And that's precisely what we have as Eric pointed out. Any allocnos that are live across calls are not allocated into hard registers if we call setjmp or can receive a nonlocal goto. > > I believe that when we had EBX fixed, IRA/LRA don't save/restore it > anywhere. Therefore we had to care about EBX value in special cases > like setjmp/non-local goto. Now RA cares about PIC pseudo as well as > about correct allocation for any pseudo register. Right. But what was missing was the explanation why this change is correct. With the knowledge about how IRA handles objects under these circumstances that Eric pointed to, it becomes much easier to see that the special handling is no longer desirable. >> ? You mean it emits a reference to the pseudo into RTL? That >> would indicate that the allocators never put the pseudo into a hard >> register?!? RTL dumps with a few pointers to key insns would help >> here. > > Correct, that is why Darwin crashes with ICE on non-local goto. We > still have: I was referring to the generated RTL to confirm what I thought you stated, namely that we ended up with a reference to the pseudo being emitted into the insn stream after reload. The only way I could see that happening would be if the pseudo wasn't allocated a hard register at all. Which we now know makes sense after Eric pointed us the magic in ira-lives.c. In the end it all comes down to what is the behaviour of the allocator for a value that is live across calls in these kinds of functions. I think I've got enough background to properly review now ;-) Jeff
> So that code creates a set of conflicts which, if I'm reading correctly, > will prevent the PIC value from living in a register at all. Which > ought to result in it being dumped into the stack and being reloaded for > each use. Which ought to be safe (modulo the liveness bug Vlad is > working on right now). > > Does that sound right to either of you? Yes, that's also my understanding of the code and the behavior required by builtin setjmp/longjmp. I can add that the Ada compiler would fall apart if this didn't work correctly in the compiler, and especially in the RA.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 624a1c1..fc3776f 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -16927,57 +16927,6 @@ "* return output_probe_stack_range (operands[0], operands[2]);" [(set_attr "type" "multi")]) -(define_expand "builtin_setjmp_receiver" - [(label_ref (match_operand 0))] - "!TARGET_64BIT && flag_pic" -{ -#if TARGET_MACHO - if (TARGET_MACHO) - { - rtx xops[3]; - rtx picreg = gen_rtx_REG (Pmode, PIC_OFFSET_TABLE_REGNUM); - rtx_code_label *label_rtx = gen_label_rtx (); - emit_insn (gen_set_got_labelled (pic_offset_table_rtx, label_rtx)); - xops[0] = xops[1] = picreg; - xops[2] = machopic_gen_offset (gen_rtx_LABEL_REF (SImode, label_rtx)); - ix86_expand_binary_operator (MINUS, SImode, xops); - } - else -#endif - emit_insn (gen_set_got (pic_offset_table_rtx)); - DONE; -}) - -(define_insn_and_split "nonlocal_goto_receiver" - [(unspec_volatile [(const_int 0)] UNSPECV_NLGR)] - "TARGET_MACHO && !TARGET_64BIT && flag_pic" - "#" - "&& reload_completed" - [(const_int 0)] -{ - if (crtl->uses_pic_offset_table) - { - rtx xops[3]; - rtx label_rtx = gen_label_rtx (); - rtx tmp; - - /* Get a new pic base. */ - emit_insn (gen_set_got_labelled (pic_offset_table_rtx, label_rtx)); - /* Correct this with the offset from the new to the old. */ - xops[0] = xops[1] = pic_offset_table_rtx; - label_rtx = gen_rtx_LABEL_REF (SImode, label_rtx); - tmp = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, label_rtx), - UNSPEC_MACHOPIC_OFFSET); - xops[2] = gen_rtx_CONST (Pmode, tmp); - ix86_expand_binary_operator (MINUS, SImode, xops); - } - else - /* No pic reg restore needed. */ - emit_note (NOTE_INSN_DELETED); - - DONE; -}) - ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode. ;; Do not split instructions with mask registers. (define_split