Message ID | 2e5accd9-ebde-2bbc-02df-f0ad10776c73@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [rs6000,v2] Make load cost more in vectorization cost for P8/P9 | expand |
On Thu, Nov 07, 2019 at 11:22:12AM +0800, Kewen.Lin wrote: > One updated patch to enable it everywhere attached. > 2019-11-07 Kewen Lin <linkw@gcc.gnu.org> > > * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Make > scalar_load, vector_load, unaligned_load and vector_gather_load cost > more to conform hardware latency and insn cost settings. > case unaligned_load: > case vector_gather_load: ... > - /* Word aligned. */ > - return 22; > + /* Word aligned. */ > + return 44; I don't think it should go up from 22 all the way to 44 (not all insns here are loads). But exact cost doesn't really matter. Make it 30 perhaps? 44 (as well as 22) are awfully precise numbers for a very imprecise cost like this ;-) With either cost, whatever seems reasonable to you and works well in your tests: approved for trunk. Thanks! Segher
Hi Segher, on 2019/11/8 上午6:36, Segher Boessenkool wrote: > On Thu, Nov 07, 2019 at 11:22:12AM +0800, Kewen.Lin wrote: >> One updated patch to enable it everywhere attached. > >> 2019-11-07 Kewen Lin <linkw@gcc.gnu.org> >> >> * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Make >> scalar_load, vector_load, unaligned_load and vector_gather_load cost >> more to conform hardware latency and insn cost settings. > >> case unaligned_load: >> case vector_gather_load: > > ... > >> - /* Word aligned. */ >> - return 22; > >> + /* Word aligned. */ >> + return 44; > > I don't think it should go up from 22 all the way to 44 (not all insns here > are loads). But exact cost doesn't really matter. Make it 30 perhaps? > Good point, I'll try the cost 33 (the avg. of 22 and 44). > 44 (as well as 22) are awfully precise numbers for a very imprecise cost > like this ;-) Yep! ;-) > > With either cost, whatever seems reasonable to you and works well in your > tests: approved for trunk. Thanks! Thanks! I'll kick off two regression testing on both BE and LE with new cost, then commit it if everything goes well. BR, Kewen
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 5876714..1094fbd 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -4763,15 +4763,17 @@ rs6000_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost, switch (type_of_cost) { case scalar_stmt: - case scalar_load: case scalar_store: case vector_stmt: - case vector_load: case vector_store: case vec_to_scalar: case scalar_to_vec: case cond_branch_not_taken: return 1; + case scalar_load: + case vector_load: + /* Like rs6000_insn_cost, make load insns cost a bit more. */ + return 2; case vec_perm: /* Power7 has only one permute unit, make it a bit expensive. */ @@ -4792,42 +4794,44 @@ rs6000_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost, case unaligned_load: case vector_gather_load: + /* Like rs6000_insn_cost, make load insns cost a bit more. */ if (TARGET_EFFICIENT_UNALIGNED_VSX) - return 1; - - if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN) - { - elements = TYPE_VECTOR_SUBPARTS (vectype); - if (elements == 2) - /* Double word aligned. */ - return 2; - - if (elements == 4) - { - switch (misalign) - { - case 8: - /* Double word aligned. */ - return 2; + return 2; - case -1: - /* Unknown misalignment. */ - case 4: - case 12: - /* Word aligned. */ - return 22; + if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN) + { + elements = TYPE_VECTOR_SUBPARTS (vectype); + if (elements == 2) + /* Double word aligned. */ + return 4; - default: - gcc_unreachable (); - } - } - } + if (elements == 4) + { + switch (misalign) + { + case 8: + /* Double word aligned. */ + return 4; + + case -1: + /* Unknown misalignment. */ + case 4: + case 12: + /* Word aligned. */ + return 44; + + default: + gcc_unreachable (); + } + } + } - if (TARGET_ALTIVEC) - /* Misaligned loads are not supported. */ - gcc_unreachable (); + if (TARGET_ALTIVEC) + /* Misaligned loads are not supported. */ + gcc_unreachable (); - return 2; + /* Like rs6000_insn_cost, make load insns cost a bit more. */ + return 4; case unaligned_store: case vector_scatter_store: