Message ID | 4C227A74.1070201@codesourcery.com |
---|---|
State | New |
Headers | show |
On Wed, Jun 23, 2010 at 2:19 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote: > This patch enables optimizations, particularly GCSE, handle calculation of > PIC addresses. GCSE tracks only single instructions, so it can't handle > two-instruction calculation of PIC address. The reg_equal note on the second instruction should have been enough to solve this issue. This is how it is optimized on PowerPC and some other targets. Why is not working for arm? Thanks, Andrew Pinski
On Wed, Jun 23, 2010 at 11:19 PM, Maxim Kuvyrkov <maxim@codesourcery.com> wrote: > This patch enables optimizations, particularly GCSE, handle calculation of > PIC addresses. GCSE tracks only single instructions, so it can't handle > two-instruction calculation of PIC address. > > With this patch, calculations of PIC addresses are represented as single > instructions allowing GCSE eliminate all but the first address calculation > for global variables. > > Any comments? Yes. This is what we added GCSE's ability to eliminate redundancies from REG_EQUAL notes for. If your PIC addresses have a REG_EQUAL note, GCSE is (or should be) already able to eliminate redundant address calculations. Ciao! Steven
On 6/24/10 1:22 AM, Steven Bosscher wrote: > On Wed, Jun 23, 2010 at 11:19 PM, Maxim Kuvyrkov<maxim@codesourcery.com> wrote: >> This patch enables optimizations, particularly GCSE, handle calculation of >> PIC addresses. GCSE tracks only single instructions, so it can't handle >> two-instruction calculation of PIC address. >> >> With this patch, calculations of PIC addresses are represented as single >> instructions allowing GCSE eliminate all but the first address calculation >> for global variables. >> >> Any comments? > > Yes. This is what we added GCSE's ability to eliminate redundancies > from REG_EQUAL notes for. If your PIC addresses have a REG_EQUAL note, > GCSE is (or should be) already able to eliminate redundant address > calculations. You know, it turns out GCSE can eliminate calculation of PIC addresses on ARM. When I started working on improving code hoisting the example with PIC address wasn't fully optimized without this patch. Now, apparently, one of the other GCSE improvements (VBEout computation, probably) fixed the underlying problem. Richard, unless you think the patch may be valuable for some other reason, I'm dropping it. Thanks,
On 6/24/10 1:50 AM, Maxim Kuvyrkov wrote: > On 6/24/10 1:22 AM, Steven Bosscher wrote: >> On Wed, Jun 23, 2010 at 11:19 PM, Maxim >> Kuvyrkov<maxim@codesourcery.com> wrote: >>> This patch enables optimizations, particularly GCSE, handle >>> calculation of >>> PIC addresses. GCSE tracks only single instructions, so it can't handle >>> two-instruction calculation of PIC address. >>> >>> With this patch, calculations of PIC addresses are represented as single >>> instructions allowing GCSE eliminate all but the first address >>> calculation >>> for global variables. >>> >>> Any comments? >> >> Yes. This is what we added GCSE's ability to eliminate redundancies >> from REG_EQUAL notes for. If your PIC addresses have a REG_EQUAL note, >> GCSE is (or should be) already able to eliminate redundant address >> calculations. > > You know, it turns out GCSE can eliminate calculation of PIC addresses > on ARM. When I started working on improving code hoisting the example > with PIC address wasn't fully optimized without this patch. It was late in the night when I checked the generated code and, although GCSE of PIC addresses for ARM is now better, it is still not as good as with this patch. GCSE cannot use (REG_EQUAL (symbol_ref)) note on the second instruction because can_assign_to_reg_without_clobbers returns false for symbol_ref when compiling PIC code. (Symbol_ref) is not a LEGITIMATE_PIC_OPERAND_P, so it not a general_operand either. The second check in can_assign_to_reg_without_clobbers returns false as (set (reg) (symbol_ref)) yields invalid instruction. AFAICT, GCSE cannot optimize a bare symbol_ref for ARM PIC code because it has no guarantee that emit_move_insn (reg, symbol_ref) will generate simple enough code. > > Now, apparently, one of the other GCSE improvements (VBEout computation, > probably) fixed the underlying problem. Improvements to GCSE'ing constants were able to optimize half of second and subsequent address calculations. > > Richard, unless you think the patch may be valuable for some other > reason, I'm dropping it. The patch stands for now. Thank you,
On 6/24/10 3:04 PM, Maxim Kuvyrkov wrote: ... > GCSE cannot use (REG_EQUAL (symbol_ref)) note on the second instruction > because can_assign_to_reg_without_clobbers returns false for symbol_ref > when compiling PIC code. (Symbol_ref) is not a LEGITIMATE_PIC_OPERAND_P, > so it not a general_operand either. The second check in > can_assign_to_reg_without_clobbers returns false as (set (reg) > (symbol_ref)) yields invalid instruction. ... Ping. Improvements to code hoisting provide about 0.8% size reduction on Thumb PIC code. Without this patch space reduction is much less. Thank you,
On Thu, 2010-06-24 at 01:19 +0400, Maxim Kuvyrkov wrote: > This patch enables optimizations, particularly GCSE, handle calculation > of PIC addresses. GCSE tracks only single instructions, so it can't > handle two-instruction calculation of PIC address. > > With this patch, calculations of PIC addresses are represented as single > instructions allowing GCSE eliminate all but the first address > calculation for global variables. > > Any comments? OK to check in? +/* Return true to X will surely end up in an index register after the first + splitting pass. */ s/Return true to/Return true if/ Other than that, this is ok. R.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 5671587..d846557 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -4897,17 +4897,13 @@ legitimize_pic_address (rtx orig, enum machine_mode mode, rtx reg) if (GET_CODE (orig) == SYMBOL_REF || GET_CODE (orig) == LABEL_REF) { - rtx pic_ref, address; rtx insn; if (reg == 0) { gcc_assert (can_create_pseudo_p ()); reg = gen_reg_rtx (Pmode); - address = gen_reg_rtx (Pmode); } - else - address = reg; /* VxWorks does not impose a fixed gap between segments; the run-time gap can be different from the object-file gap. We therefore can't @@ -4923,18 +4919,21 @@ legitimize_pic_address (rtx orig, enum machine_mode mode, rtx reg) insn = arm_pic_static_addr (orig, reg); else { + rtx pat; + rtx mem; + /* If this function doesn't have a pic register, create one now. */ require_pic_register (); - if (TARGET_32BIT) - emit_insn (gen_pic_load_addr_32bit (address, orig)); - else /* TARGET_THUMB1 */ - emit_insn (gen_pic_load_addr_thumb1 (address, orig)); + pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig); - pic_ref = gen_const_mem (Pmode, - gen_rtx_PLUS (Pmode, cfun->machine->pic_reg, - address)); - insn = emit_move_insn (reg, pic_ref); + /* Make the MEM as close to a constant as possible. */ + mem = SET_SRC (pat); + gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem)); + MEM_READONLY_P (mem) = 1; + MEM_NOTRAP_P (mem) = 1; + + insn = emit_insn (pat); } /* Put a REG_EQUAL note on this insn, so that it can be optimized @@ -5214,6 +5213,15 @@ pcrel_constant_p (rtx x) return FALSE; } +/* Return true to X will surely end up in an index register after the first + splitting pass. */ +static bool +will_be_in_index_register (const_rtx x) +{ + /* arm.md: calculate_pic_address will split this into a register. */ + return GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_PIC_SYM; +} + /* Return nonzero if X is a valid ARM state address operand. */ int arm_legitimate_address_outer_p (enum machine_mode mode, rtx x, RTX_CODE outer, @@ -5271,8 +5279,9 @@ arm_legitimate_address_outer_p (enum machine_mode mode, rtx x, RTX_CODE outer, rtx xop1 = XEXP (x, 1); return ((arm_address_register_rtx_p (xop0, strict_p) - && GET_CODE(xop1) == CONST_INT - && arm_legitimate_index_p (mode, xop1, outer, strict_p)) + && ((GET_CODE(xop1) == CONST_INT + && arm_legitimate_index_p (mode, xop1, outer, strict_p)) + || (!strict_p && will_be_in_index_register (xop1)))) || (arm_address_register_rtx_p (xop1, strict_p) && arm_legitimate_index_p (mode, xop0, outer, strict_p))); } @@ -5358,7 +5367,8 @@ thumb2_legitimate_address_p (enum machine_mode mode, rtx x, int strict_p) rtx xop1 = XEXP (x, 1); return ((arm_address_register_rtx_p (xop0, strict_p) - && thumb2_legitimate_index_p (mode, xop1, strict_p)) + && (thumb2_legitimate_index_p (mode, xop1, strict_p) + || (!strict_p && will_be_in_index_register (xop1)))) || (arm_address_register_rtx_p (xop1, strict_p) && thumb2_legitimate_index_p (mode, xop0, strict_p))); } @@ -5661,7 +5671,8 @@ thumb1_legitimate_address_p (enum machine_mode mode, rtx x, int strict_p) && XEXP (x, 0) != frame_pointer_rtx && XEXP (x, 1) != frame_pointer_rtx && thumb1_index_register_rtx_p (XEXP (x, 0), strict_p) - && thumb1_index_register_rtx_p (XEXP (x, 1), strict_p)) + && (thumb1_index_register_rtx_p (XEXP (x, 1), strict_p) + || (!strict_p && will_be_in_index_register (XEXP (x, 1))))) return 1; /* REG+const has 5-7 bit offset for non-SP registers. */ diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index b6cca49..534bfc7 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -5231,6 +5231,34 @@ ;; we use an unspec. The offset will be loaded from a constant pool entry, ;; since that is the only type of relocation we can use. +;; Wrap calculation of the whole PIC address in a single pattern for the +;; benefit of optimizers, particularly, PRE and HOIST. Calculation of +;; a PIC address involves two loads from memory, so we want to CSE it +;; as often as possible. +;; This pattern will be split into one of the pic_load_addr_* patterns +;; and a move after GCSE optimizations. +;; +;; Note: Update arm.c: legitimize_pic_address() when changing this pattern. +(define_expand "calculate_pic_address" + [(set (match_operand:SI 0 "register_operand" "") + (mem:SI (plus:SI (match_operand:SI 1 "register_operand" "") + (unspec:SI [(match_operand:SI 2 "" "")] + UNSPEC_PIC_SYM))))] + "flag_pic" +) + +;; Split calculate_pic_address into pic_load_addr_* and a move. +(define_split + [(set (match_operand:SI 0 "register_operand" "") + (mem:SI (plus:SI (match_operand:SI 1 "register_operand" "") + (unspec:SI [(match_operand:SI 2 "" "")] + UNSPEC_PIC_SYM))))] + "flag_pic" + [(set (match_dup 3) (unspec:SI [(match_dup 2)] UNSPEC_PIC_SYM)) + (set (match_dup 0) (mem:SI (plus:SI (match_dup 1) (match_dup 3))))] + "operands[3] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];" +) + ;; The rather odd constraints on the following are to force reload to leave ;; the insn alone, and to force the minipool generation pass to then move ;; the GOT symbol to memory.