Message ID | AM4PR0701MB2162FA0509CEF1A1D865EB0FE4560@AM4PR0701MB2162.eurprd07.prod.outlook.com |
---|---|
State | New |
Headers | show |
On 06/16/2016 08:47 AM, Bernd Edlinger wrote: > Hi! > > > By the design of the target hook INITIAL_ELIMINATION_OFFSET > it is necessary to call this function several times with > different register combinations. > Most targets use a cached data structure that describes the > exact frame layout of the current function. > > It is safe to skip the computation when reload_completed = true, > and most targets do that already. > > However while reload is doing its work, it is not clear when to > do the computation and when not. This results in unnecessary > work. Computing the frame layout can be a simple function or an > arbitrarily complex one, that walks all instructions of the current > function for instance, which is more or less the common case. > > > This patch adds a new optional target hook that can be used > by the target to factor the INITIAL_ELIMINATION_OFFSET-hook > into a O(n) computation part, and a O(1) result function. > > The patch implements a compute_frame_layout target hook just > for ARM in the moment, to show the principle. > Other targets may also implement that hook, if it seems appropriate. > > > Boot-strapped and reg-tested on arm-linux-gnueabihf. > OK for trunk? > > > Thanks > Bernd. > > > changelog-frame-layout.txt > > > 2016-06-16 Bernd Edlinger <bernd.edlinger@hotmail.de> > > * target.def (compute_frame_layout): New optional target hook. > * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook. > * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation. > * lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout > target hook. > * reload1.c (verify_initial_elim_offsets): Likewise. > * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define. > (use_simple_return_p): Call arm_compute_frame_layout if needed. > (arm_get_frame_offsets): Split up into this ... > (arm_compute_frame_layout): ... and this function. The ARM maintainers would need to chime in on the ARM specific changes though. > Index: gcc/target.def > =================================================================== > --- gcc/target.def (Revision 233176) > +++ gcc/target.def (Arbeitskopie) > @@ -5245,8 +5245,19 @@ five otherwise. This is best for most machines.", > unsigned int, (void), > default_case_values_threshold) > > -/* Retutn true if a function must have and use a frame pointer. */ s/Retutn/Return > +/* Optional callback to advise the target to compute the frame layout. */ > DEFHOOK > +(compute_frame_layout, > + "This target hook is called immediately before reload wants to call\n\ > +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the frame\n\ > +layout instead of re-computing it on every invocation. This is particularly\n\ > +useful for targets that have an O(n) frame layout function. Implementing\n\ > +this callback is optional.", > + void, (void), > + hook_void_void) So the docs say "immediately before", but that's not actually reality in lra-eliminations. I think you can just say "This target hook is called before reload or lra-eliminations calls @code{INITIAL_ELIMINATION_OFFSET} and allows ..." How does this macro interact with INITIAL_FRAME_POINTER_OFFSET? I'm OK with this conceptually. I think you need a minor doc update and OK from the ARM maintainers before it can be installed though. jeff
On 06/21/16 23:29, Jeff Law wrote: > On 06/16/2016 08:47 AM, Bernd Edlinger wrote: >> Hi! >> >> >> By the design of the target hook INITIAL_ELIMINATION_OFFSET >> it is necessary to call this function several times with >> different register combinations. >> Most targets use a cached data structure that describes the >> exact frame layout of the current function. >> >> It is safe to skip the computation when reload_completed = true, >> and most targets do that already. >> >> However while reload is doing its work, it is not clear when to >> do the computation and when not. This results in unnecessary >> work. Computing the frame layout can be a simple function or an >> arbitrarily complex one, that walks all instructions of the current >> function for instance, which is more or less the common case. >> >> >> This patch adds a new optional target hook that can be used >> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook >> into a O(n) computation part, and a O(1) result function. >> >> The patch implements a compute_frame_layout target hook just >> for ARM in the moment, to show the principle. >> Other targets may also implement that hook, if it seems appropriate. >> >> >> Boot-strapped and reg-tested on arm-linux-gnueabihf. >> OK for trunk? >> >> >> Thanks >> Bernd. >> >> >> changelog-frame-layout.txt >> >> >> 2016-06-16 Bernd Edlinger <bernd.edlinger@hotmail.de> >> >> * target.def (compute_frame_layout): New optional target hook. >> * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook. >> * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation. >> * lra-eliminations.c (update_reg_eliminate): Call >> compute_frame_layout >> target hook. >> * reload1.c (verify_initial_elim_offsets): Likewise. >> * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define. >> (use_simple_return_p): Call arm_compute_frame_layout if needed. >> (arm_get_frame_offsets): Split up into this ... >> (arm_compute_frame_layout): ... and this function. > The ARM maintainers would need to chime in on the ARM specific changes > though. > > > >> Index: gcc/target.def >> =================================================================== >> --- gcc/target.def (Revision 233176) >> +++ gcc/target.def (Arbeitskopie) >> @@ -5245,8 +5245,19 @@ five otherwise. This is best for most machines.", >> unsigned int, (void), >> default_case_values_threshold) >> >> -/* Retutn true if a function must have and use a frame pointer. */ > s/Retutn/Return > yes, a line "+ /* Return ..." is a few lines below. >> +/* Optional callback to advise the target to compute the frame >> layout. */ >> DEFHOOK >> +(compute_frame_layout, >> + "This target hook is called immediately before reload wants to call\n\ >> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the >> frame\n\ >> +layout instead of re-computing it on every invocation. This is >> particularly\n\ >> +useful for targets that have an O(n) frame layout function. >> Implementing\n\ >> +this callback is optional.", >> + void, (void), >> + hook_void_void) > So the docs say "immediately before", but that's not actually reality in > lra-eliminations. I think you can just say "This target hook is called > before reload or lra-eliminations calls > @code{INITIAL_ELIMINATION_OFFSET} and allows ..." > > > How does this macro interact with INITIAL_FRAME_POINTER_OFFSET? What I wanted to say here, is that lra goes thru several iterations, changes something in the register allocation that has an impact on the frame layout, typically 4-5 times, and calls INITIAL_ELIMINATION_OFFSET 3-4 times in a row, and in the results must be consistent in each iteration to be usable. So I am open to suggestions, how would you explain this idea in the doc? Thanks Bernd. > > I'm OK with this conceptually. I think you need a minor doc update and > OK from the ARM maintainers before it can be installed though. > > jeff
On 06/21/16 23:29, Jeff Law wrote: > > How does this macro interact with INITIAL_FRAME_POINTER_OFFSET? That I forgot to mention: INITIAL_FRAME_POINTER_OFFSET is just a single call, so whenever it is called from lra/reload the frame layout is really expected to change, and so it does not make a difference if the target computes the frame layout in TARGET_COMPUTE_FRAME_LAYOUT or in INITIAL_FRAME_POINTER_OFFSET. But I do not know of any targets that still use INITIAL_FRAME_POINTER_OFFSET, and maybe support for this target hook could be discontinued as a follow-up patch. What do you think? Bernd.
On 06/22/2016 01:20 AM, Bernd Edlinger wrote: > On 06/21/16 23:29, Jeff Law wrote: >> >> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET? > > That I forgot to mention: INITIAL_FRAME_POINTER_OFFSET is just > a single call, so whenever it is called from lra/reload the frame layout > is really expected to change, and so it does not make a difference if the target > computes the frame layout in TARGET_COMPUTE_FRAME_LAYOUT or in > INITIAL_FRAME_POINTER_OFFSET. > > But I do not know of any targets that still use INITIAL_FRAME_POINTER_OFFSET, > and maybe support for this target hook could be discontinued as a follow-up patch. INITIAL_FRAME_POINTER_OFFSET is only defined in 4 ports: ./ft32/ft32.h:#define INITIAL_FRAME_POINTER_OFFSET(DEPTH) (DEPTH) = 0 ./m32r/m32r.h:#define INITIAL_FRAME_POINTER_OFFSET(VAR) \ ./moxie/moxie.h:#define INITIAL_FRAME_POINTER_OFFSET(DEPTH) (DEPTH) = 0 ./vax/vax.h:#define INITIAL_FRAME_POINTER_OFFSET(DEPTH) (DEPTH) = 0; However, the m32r version is actually #if 0'd out. So it's really only defined in 3 ports and always to "0". So yea, it'd be a good candidate to collapse away. Jeff
On 06/21/2016 11:12 PM, Bernd Edlinger wrote: > > What I wanted to say here, is that lra goes thru several iterations, > changes something in the register allocation that has an impact on the > frame layout, typically 4-5 times, and calls INITIAL_ELIMINATION_OFFSET > 3-4 times in a row, and in the results must be consistent in each > iteration to be usable. > > So I am open to suggestions, how would you explain this idea in the doc? I'm not sure :( The goal is still the same, you're trying to separate the O(n) from the O(1) operations. So you want the COMPUTE_FRAME_LAYOUT hook to be called once for things which don't vary and INITIAL_ELIMINATION_OFFSET multiple times for things that do vary. Thinking more about this, which port has has a particularly expensive INITIAL_ELIMINATION_OFFSET? Jeff
On 06/22/16 20:49, Jeff Law wrote: > On 06/21/2016 11:12 PM, Bernd Edlinger wrote: > >> >> What I wanted to say here, is that lra goes thru several iterations, >> changes something in the register allocation that has an impact on the >> frame layout, typically 4-5 times, and calls INITIAL_ELIMINATION_OFFSET >> 3-4 times in a row, and in the results must be consistent in each >> iteration to be usable. >> >> So I am open to suggestions, how would you explain this idea in the doc? > I'm not sure :( The goal is still the same, you're trying to separate > the O(n) from the O(1) operations. So you want the COMPUTE_FRAME_LAYOUT > hook to be called once for things which don't vary and > INITIAL_ELIMINATION_OFFSET multiple times for things that do vary. > > Thinking more about this, which port has has a particularly expensive > INITIAL_ELIMINATION_OFFSET? > I'd bet on mips for instance. Bernd.
Index: gcc/config/arm/arm.c =================================================================== --- gcc/config/arm/arm.c (Revision 233176) +++ gcc/config/arm/arm.c (Arbeitskopie) @@ -81,6 +81,7 @@ static bool arm_const_not_ok_for_debug_p (rtx); static bool arm_needs_doubleword_align (machine_mode, const_tree); static int arm_compute_static_chain_stack_bytes (void); static arm_stack_offsets *arm_get_frame_offsets (void); +static void arm_compute_frame_layout (void); static void arm_add_gc_roots (void); static int arm_gen_constant (enum rtx_code, machine_mode, rtx, unsigned HOST_WIDE_INT, rtx, rtx, int, int); @@ -669,6 +670,9 @@ static const struct attribute_spec arm_attribute_t #undef TARGET_SCALAR_MODE_SUPPORTED_P #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p +#undef TARGET_COMPUTE_FRAME_LAYOUT +#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout + #undef TARGET_FRAME_POINTER_REQUIRED #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required @@ -3813,6 +3817,10 @@ use_simple_return_p (void) { arm_stack_offsets *offsets; + /* Note this function can be called before or after reload. */ + if (!reload_completed) + arm_compute_frame_layout (); + offsets = arm_get_frame_offsets (); return offsets->outgoing_args != 0; } @@ -19238,7 +19246,7 @@ arm_compute_static_chain_stack_bytes (void) /* Compute a bit mask of which registers need to be saved on the stack for the current function. - This is used by arm_get_frame_offsets, which may add extra registers. */ + This is used by arm_compute_frame_layout, which may add extra registers. */ static unsigned long arm_compute_save_reg_mask (void) @@ -20789,12 +20797,25 @@ any_sibcall_could_use_r3 (void) alignment. */ +/* Return cached stack offsets. */ + +static arm_stack_offsets * +arm_get_frame_offsets (void) +{ + struct arm_stack_offsets *offsets; + + offsets = &cfun->machine->stack_offsets; + + return offsets; +} + + /* Calculate stack offsets. These are used to calculate register elimination offsets and in prologue/epilogue code. Also calculates which registers should be saved. */ -static arm_stack_offsets * -arm_get_frame_offsets (void) +static void +arm_compute_frame_layout (void) { struct arm_stack_offsets *offsets; unsigned long func_type; @@ -20806,19 +20827,6 @@ any_sibcall_could_use_r3 (void) offsets = &cfun->machine->stack_offsets; - /* We need to know if we are a leaf function. Unfortunately, it - is possible to be called after start_sequence has been called, - which causes get_insns to return the insns for the sequence, - not the function, which will cause leaf_function_p to return - the incorrect result. - - to know about leaf functions once reload has completed, and the - frame size cannot be changed after that time, so we can safely - use the cached value. */ - - if (reload_completed) - return offsets; - /* Initially this is the size of the local variables. It will translated into an offset once we have determined the size of preceding data. */ frame_size = ROUND_UP_WORD (get_frame_size ()); @@ -20885,7 +20893,7 @@ any_sibcall_could_use_r3 (void) { offsets->outgoing_args = offsets->soft_frame; offsets->locals_base = offsets->soft_frame; - return offsets; + return; } /* Ensure SFP has the correct alignment. */ @@ -20961,8 +20969,6 @@ any_sibcall_could_use_r3 (void) offsets->outgoing_args += 4; gcc_assert (!(offsets->outgoing_args & 7)); } - - return offsets; } Index: gcc/doc/tm.texi =================================================================== --- gcc/doc/tm.texi (Revision 233176) +++ gcc/doc/tm.texi (Arbeitskopie) @@ -3693,6 +3693,14 @@ registers. This macro must be defined if @code{EL defined. @end defmac +@deftypefn {Target Hook} void TARGET_COMPUTE_FRAME_LAYOUT (void) +This target hook is called immediately before reload wants to call +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the frame +layout instead of re-computing it on every invocation. This is particularly +useful for targets that have an O(n) frame layout function. Implementing +this callback is optional. +@end deftypefn + @node Stack Arguments @subsection Passing Function Arguments on the Stack @cindex arguments on stack Index: gcc/doc/tm.texi.in =================================================================== --- gcc/doc/tm.texi.in (Revision 233176) +++ gcc/doc/tm.texi.in (Arbeitskopie) @@ -3227,6 +3227,8 @@ registers. This macro must be defined if @code{EL defined. @end defmac +@hook TARGET_COMPUTE_FRAME_LAYOUT + @node Stack Arguments @subsection Passing Function Arguments on the Stack @cindex arguments on stack Index: gcc/lra-eliminations.c =================================================================== --- gcc/lra-eliminations.c (Revision 233176) +++ gcc/lra-eliminations.c (Arbeitskopie) @@ -1202,6 +1202,10 @@ update_reg_eliminate (bitmap insns_with_changed_of struct lra_elim_table *ep, *ep1; HARD_REG_SET temp_hard_reg_set; +#ifdef ELIMINABLE_REGS + targetm.compute_frame_layout (); +#endif + /* Clear self elimination offsets. */ for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) self_elim_offsets[ep->from] = 0; Index: gcc/reload1.c =================================================================== --- gcc/reload1.c (Revision 233176) +++ gcc/reload1.c (Arbeitskopie) @@ -3856,6 +3856,7 @@ verify_initial_elim_offsets (void) { struct elim_table *ep; + targetm.compute_frame_layout (); for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) { INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t); @@ -3880,6 +3881,7 @@ set_initial_elim_offsets (void) struct elim_table *ep = reg_eliminate; #ifdef ELIMINABLE_REGS + targetm.compute_frame_layout (); for (; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) { INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->initial_offset); Index: gcc/target.def =================================================================== --- gcc/target.def (Revision 233176) +++ gcc/target.def (Arbeitskopie) @@ -5245,8 +5245,19 @@ five otherwise. This is best for most machines.", unsigned int, (void), default_case_values_threshold) -/* Retutn true if a function must have and use a frame pointer. */ +/* Optional callback to advise the target to compute the frame layout. */ DEFHOOK +(compute_frame_layout, + "This target hook is called immediately before reload wants to call\n\ +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the frame\n\ +layout instead of re-computing it on every invocation. This is particularly\n\ +useful for targets that have an O(n) frame layout function. Implementing\n\ +this callback is optional.", + void, (void), + hook_void_void) + +/* Return true if a function must have and use a frame pointer. */ +DEFHOOK (frame_pointer_required, "This target hook should return @code{true} if a function must have and use\n\ a frame pointer. This target hook is called in the reload pass. If its return\n\