Message ID | 53FE3D46.8050904@redhat.com |
---|---|
State | New |
Headers | show |
2014-08-28 0:19 GMT+04:00 Vladimir Makarov <vmakarov@redhat.com>: > On 2014-08-26 5:42 PM, Ilya Enkovich wrote: >> >> Hi, >> >> Here is a patch I tried. I apply it over revision 214215. Unfortunately >> I do not have a small reproducer but the problem can be easily reproduced on >> SPEC2000 benchmark 175.vpr. The problem is in read_arch.c:701 where float >> value is compared with float constant 1.0. It is inlined into read_arch >> function and can be easily found in RTL dump of function read_arch as a >> float comparison with 1.0 after the first call to strtod function. >> >> Here is a compilation string I use: >> >> gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math >> -mfpmath=sse -m32 -march=slm -fPIE -pie -c -o read_arch.o >> -DSPEC_CPU2000 read_arch.c >> >> In my final assembler comparison with 1.0 looks like: >> >> comiss .LC11@GOTOFF(%ebp), %xmm0 # 1101 *cmpisf_sse [length = >> 7] >> >> and %ebp here doesn't have a proper value. >> >> I'll try to make a smaller reproducer if these instructions don't help. > > > I've managed to reproduce it. Although it would be better to send the patch > as an attachment. > > The problem is actually in IRA not LRA. IRA splits pseudo used for PIC. > Then in a region when a *new* pseudo used as PIC we rematerialize a constant > which transformed in memory addressed through *original* PIC pseudo. > > To solve the problem we should prevent such splitting and guarantee that PIC > pseudo allocnos in different region gets the same hard reg. > > The following patch should solve the problem. > Thanks for the patch! I'll try it and be back with results. Ilya >
2014-08-28 12:28 GMT+04:00 Ilya Enkovich <enkovich.gnu@gmail.com>: > 2014-08-28 0:19 GMT+04:00 Vladimir Makarov <vmakarov@redhat.com>: >> On 2014-08-26 5:42 PM, Ilya Enkovich wrote: >>> >>> Hi, >>> >>> Here is a patch I tried. I apply it over revision 214215. Unfortunately >>> I do not have a small reproducer but the problem can be easily reproduced on >>> SPEC2000 benchmark 175.vpr. The problem is in read_arch.c:701 where float >>> value is compared with float constant 1.0. It is inlined into read_arch >>> function and can be easily found in RTL dump of function read_arch as a >>> float comparison with 1.0 after the first call to strtod function. >>> >>> Here is a compilation string I use: >>> >>> gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math >>> -mfpmath=sse -m32 -march=slm -fPIE -pie -c -o read_arch.o >>> -DSPEC_CPU2000 read_arch.c >>> >>> In my final assembler comparison with 1.0 looks like: >>> >>> comiss .LC11@GOTOFF(%ebp), %xmm0 # 1101 *cmpisf_sse [length = >>> 7] >>> >>> and %ebp here doesn't have a proper value. >>> >>> I'll try to make a smaller reproducer if these instructions don't help. >> >> >> I've managed to reproduce it. Although it would be better to send the patch >> as an attachment. >> >> The problem is actually in IRA not LRA. IRA splits pseudo used for PIC. >> Then in a region when a *new* pseudo used as PIC we rematerialize a constant >> which transformed in memory addressed through *original* PIC pseudo. >> >> To solve the problem we should prevent such splitting and guarantee that PIC >> pseudo allocnos in different region gets the same hard reg. >> >> The following patch should solve the problem. >> > > Thanks for the patch! I'll try it and be back with results. Seems your patch doesn't cover all cases. Attached is a modified patch (with your changes included) and a test where double constant is wrongly rematerialized. I also see in ira dump that there is still a copy of PIC reg created: Initialization of original PIC reg: (insn 23 22 24 2 (set (reg:SI 127) (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 3 bx) (nil))) ... Copy is created: (insn 135 37 25 3 (set (reg:SI 138 [127]) (reg:SI 127)) 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 127) (nil))) ... Copy is used: (insn 119 25 122 3 (set (reg:DF 134) (mem/u/c:DF (plus:SI (reg:SI 138 [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI ("*.LC0") [flags 0x2]) ] UNSPEC_GOTOFF))) [5 S8 A64])) 128 {*movdf_internal} (expr_list:REG_EQUIV (const_double:DF 2.9999999999999997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11]) (nil))) After reload we have new usage of r127 which is allocated to ecx which actually does not have any definition in this function at all. (insn 151 42 44 4 (set (reg:SI 0 ax [147]) (plus:SI (reg:SI 2 cx [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI ("*.LC0") [flags 0x2]) ] UNSPEC_GOTOFF)))) test.cc:44 213 {*leasi} (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC0") [flags 0x2]) (nil))) (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129]) (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (mem/u/c:DF (reg:SI 0 ax [147]) [5 S8 A64]))) test.cc:44 790 {*fop_df_comm_sse} (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (const_double:DF 2.9999999999999997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11])) (nil))) Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc Thanks, Ilya > > Ilya >>
On 08/29/2014 02:47 AM, Ilya Enkovich wrote: > Seems your patch doesn't cover all cases. Attached is a modified > patch (with your changes included) and a test where double constant is > wrongly rematerialized. I also see in ira dump that there is still a > copy of PIC reg created: > > Initialization of original PIC reg: > (insn 23 22 24 2 (set (reg:SI 127) > (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal} > (expr_list:REG_DEAD (reg:SI 3 bx) > (nil))) > ... > Copy is created: > (insn 135 37 25 3 (set (reg:SI 138 [127]) > (reg:SI 127)) 90 {*movsi_internal} > (expr_list:REG_DEAD (reg:SI 127) > (nil))) > ... > Copy is used: > (insn 119 25 122 3 (set (reg:DF 134) > (mem/u/c:DF (plus:SI (reg:SI 138 [127]) > (const:SI (unspec:SI [ > (symbol_ref/u:SI ("*.LC0") [flags 0x2]) > ] UNSPEC_GOTOFF))) [5 S8 A64])) 128 {*movdf_internal} > (expr_list:REG_EQUIV (const_double:DF > 2.9999999999999997371893933895137251965934410691261292e-4 > [0x0.9d495182a99308p-11]) > (nil))) > > After reload we have new usage of r127 which is allocated to ecx which > actually does not have any definition in this function at all. > > (insn 151 42 44 4 (set (reg:SI 0 ax [147]) > (plus:SI (reg:SI 2 cx [127]) > (const:SI (unspec:SI [ > (symbol_ref/u:SI ("*.LC0") [flags 0x2]) > ] UNSPEC_GOTOFF)))) test.cc:44 213 {*leasi} > (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC0") [flags 0x2]) > (nil))) > (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129]) > (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) > (mem/u/c:DF (reg:SI 0 ax [147]) [5 S8 A64]))) test.cc:44 > 790 {*fop_df_comm_sse} > (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) > (const_double:DF > 2.9999999999999997371893933895137251965934410691261292e-4 > [0x0.9d495182a99308p-11])) > (nil))) > > Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc > > Ok, Ilya. I'll look at the problem this week.
Index: ira-color.c =================================================================== --- ira-color.c (revision 214576) +++ ira-color.c (working copy) @@ -3239,9 +3239,10 @@ ira_assert (ALLOCNO_CLASS (subloop_allocno) == rclass); ira_assert (bitmap_bit_p (subloop_node->all_allocnos, ALLOCNO_NUM (subloop_allocno))); - if ((flag_ira_region == IRA_REGION_MIXED) - && (loop_tree_node->reg_pressure[pclass] - <= ira_class_hard_regs_num[pclass])) + if ((flag_ira_region == IRA_REGION_MIXED + && (loop_tree_node->reg_pressure[pclass] + <= ira_class_hard_regs_num[pclass])) + || regno == (int) REGNO (pic_offset_table_rtx)) { if (! ALLOCNO_ASSIGNED_P (subloop_allocno)) { Index: ira-emit.c =================================================================== --- ira-emit.c (revision 214576) +++ ira-emit.c (working copy) @@ -620,7 +620,8 @@ /* don't create copies because reload can spill an allocno set by copy although the allocno will not get memory slot. */ - || ira_equiv_no_lvalue_p (regno))) + || ira_equiv_no_lvalue_p (regno) + || ALLOCNO_REGNO (allocno) == REGNO (pic_offset_table_rtx))) continue; original_reg = allocno_emit_reg (allocno); if (parent_allocno == NULL