Message ID | a76558b3-68db-9bb3-3b31-4f601c69f453@linux.ibm.com
---|---
State | New
Series | Refactor peel_iters_{pro,epi}logue cost model handlings
"Kewen.Lin" <linkw@linux.ibm.com> writes: > Hi, > > As Richard S. suggested in the thread: > > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550633.html > > this patch is separated from the one of that thread, mainly to refactor the > existing peel_iters_{pro,epi}logue cost model handlings. > > I've addressed Richard S.'s review comments there, moreover because of one > failure of aarch64 testing, I updated it a bit more to keep the logic unchanged > as before first (refactor_cost.diff). Heh, nice when a clean-up exposes an existing bug. ;-) I agree the updates look correct. E.g. if vf is 1, we should assume that there are no peeled iterations even if the number of iterations is unknown. Both patches are OK with some very minor nits (sorry): > gcc/ChangeLog: > > * tree-vect-loop.c (vect_get_known_peeling_cost): Factor out some code > to determine peel_iters_epilogue to function ... > (vect_get_peel_iters_epilogue): ... this. New function. to determine peel_iters_epilogue to... (vect_get_peel_iters_epilogue): ...this new function. > + if (dump_enabled_p ()) > + dump_printf (MSG_NOTE, "cost model: " > + "prologue peel iters set to vf/2.\n"); Agree this looks nice, but the old formatting was correct: the string should be indented under “MSG_NOTE, ”. Same for the epilogue. > + /* If peeled iterations are unknown, count a taken branch and a not taken > + branch per peeled loop. Even if scalar loop iterations are known, > + vector iterations are not known since peeled prologue iterations are > + not known. Hence guards remain the same. */ Should be two spaces rather than one space before “Hence”. Thanks for doing this. Richard
Hi Richard,

on 2020/7/27 9:10 PM, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> Hi,
>>
>> As Richard S. suggested in the thread:
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550633.html
>>
>> this patch is separated from the one of that thread, mainly to refactor the
>> existing peel_iters_{pro,epi}logue cost model handlings.
>>
>> I've addressed Richard S.'s review comments there, moreover because of one
>> failure of aarch64 testing, I updated it a bit more to keep the logic
>> unchanged as before first (refactor_cost.diff).
>
> Heh, nice when a clean-up exposes an existing bug. ;-)  I agree the
> updates look correct.  E.g. if vf is 1, we should assume that there
> are no peeled iterations even if the number of iterations is unknown.
>
> Both patches are OK with some very minor nits (sorry):

Thanks for the comments!  I've addressed them and committed
refactor_cost.diff via r11-2378.

For adjust.diff, it works well on powerpc64le-linux-gnu (bootstrap/regtest),
but got one failure on aarch64 as below:

PASS->FAIL: gcc.target/aarch64/sve/cost_model_2.c -march=armv8.2-a+sve scan-assembler-times \\tst1w\\tz 1

I spent some time investigating it and think it's expected.  As you mentioned
above, the patch can fix the cases with VF 1, and the VF of this case is
exactly one. :)

Without the proposed adjustment, the cost for V16QI looks like:

  Vector inside of loop cost: 1
  Vector prologue cost: 4
  Vector epilogue cost: 3
  Scalar iteration cost: 4
  Scalar outside cost: 6
  Vector outside cost: 7
  prologue iterations: 0
  epilogue iterations: 0

After the change, the cost becomes:

  Vector inside of loop cost: 1
  Vector prologue cost: 1
  Vector epilogue cost: 0
  Scalar iteration cost: 4
  Scalar outside cost: 6
  Vector outside cost: 1
  prologue iterations: 0
  epilogue iterations: 0

which outperforms VNx16QI, which isn't affected by this patch with partial
vectors.

The test case expects VNx16QI to be used; the asm looks like:

        cmp     w2, 0
        ble     .L1
        mov     x1, 0
        sbfiz   x2, x2, 2, 32
        mov     z0.s, #1
        whilelo p0.s, xzr, x2
        .p2align 3,,7
.L3:
        st1w    z0.s, p0, [x0, x1, lsl 2]
        add     x1, x1, 4
        whilelo p0.s, x1, x2
        b.any   .L3

But with this patch, it instead ends up with:

.LFB0:
        .cfi_startproc
        cmp     w2, 0
        ble     .L1
        movi    v0.4s, 0x1
        sub     w1, w2, #1
        add     x2, x0, 16
        add     x1, x2, x1, uxtw 4
        .p2align 3,,7
.L3:
        str     q0, [x0], 16
        cmp     x0, x1
        bne     .L3

I updated the expected asm to match the currently generated asm; I'm not sure
whether that fix is good enough.

Bootstrapped/regtested on powerpc64le-linux-gnu and aarch64-linux-gnu.

BR,
Kewen
-----
gcc/ChangeLog:

	* tree-vect-loop.c (vect_get_known_peeling_cost): Don't consider
	branch taken costs for prologue and epilogue if they don't exist.
	(vect_estimate_min_profitable_iters): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/cost_model_2.c: Adjust due to cost model
	change.
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_2.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_2.c
index d9d707893d8..a7d748397d6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_2.c
@@ -9,4 +9,4 @@ vset (int *restrict dst, int *restrict src, int count)
       *dst++ = 1;
 }
 
-/* { dg-final { scan-assembler-times {\tst1w\tz} 1 } } */
+/* { dg-final { scan-assembler-times {\tstr\tq} 1 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 185019c3305..43ec4fb04d5 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3520,10 +3520,12 @@ vect_get_known_peeling_cost (loop_vec_info loop_vinfo, int peel_iters_prologue,
     {
       /* If peeled iterations are known but number of scalar loop
          iterations are unknown, count a taken branch per peeled loop.  */
-      retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken, NULL,
-                                 NULL_TREE, 0, vect_prologue);
-      retval += record_stmt_cost (epilogue_cost_vec, 1, cond_branch_taken, NULL,
-                                  NULL_TREE, 0, vect_epilogue);
+      if (peel_iters_prologue > 0)
+        retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
+                                   NULL, NULL_TREE, 0, vect_prologue);
+      if (*peel_iters_epilogue > 0)
+        retval += record_stmt_cost (epilogue_cost_vec, 1, cond_branch_taken,
+                                    NULL, NULL_TREE, 0, vect_epilogue);
     }
 
   stmt_info_for_cost *si;
@@ -3670,7 +3672,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
   bool prologue_need_br_not_taken_cost = false;
 
   /* Calculate peel_iters_prologue.  */
-  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+  if (vect_use_loop_mask_for_alignment_p (loop_vinfo))
     peel_iters_prologue = 0;
   else if (npeel < 0)
     {
@@ -3689,7 +3691,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
   else
     {
       peel_iters_prologue = npeel;
-      if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+      if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && peel_iters_prologue > 0)
        /* If peeled iterations are known but number of scalar loop
           iterations are unknown, count a taken branch per peeled loop.  */
        prologue_need_br_taken_cost = true;
@@ -3719,7 +3721,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
   else
     {
      peel_iters_epilogue = vect_get_peel_iters_epilogue (loop_vinfo, npeel);
-      if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+      if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && peel_iters_epilogue > 0)
        /* If peeled iterations are known but number of scalar loop
           iterations are unknown, count a taken branch per peeled loop.  */
        epilogue_need_br_taken_cost = true;
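To make the dumped numbers concrete, here is a rough worked comparison under
the simplified model total ~= outside + (niters / vf) * inside.  This is a
hedged back-of-the-envelope sketch with an assumed example trip count, not
the exact formula GCC uses in vect_estimate_min_profitable_iters.

#include <stdio.h>

/* Hedged model: total loop cost for N iterations.  Kewen notes the VF
   is 1 here (the four contiguous stores become one vector store), so
   the model degenerates to outside + N * inside.  The constants below
   are the dump values quoted above for the V16QI loop.  */
static int
loop_cost (int n, int vf, int inside, int outside)
{
  return outside + (n / vf) * inside;
}

int
main (void)
{
  int n = 16;					/* Example trip count.  */
  printf ("scalar:       %d\n", loop_cost (n, 1, 4, 6));   /* 70 */
  printf ("V16QI before: %d\n", loop_cost (n, 1, 1, 7));   /* 23 */
  printf ("V16QI after:  %d\n", loop_cost (n, 1, 1, 1));   /* 17 */
  /* The patch only changes the outside cost (7 -> 1): with zero
     prologue/epilogue iterations, the peel-guard branch costs are no
     longer charged.  The VNx16QI numbers aren't in the dump, but the
     cheaper V16QI total now wins the mode comparison.  */
  return 0;
}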
"Kewen.Lin" <linkw@linux.ibm.com> writes: > Hi Richard, > > on 2020/7/27 下午9:10, Richard Sandiford wrote: >> "Kewen.Lin" <linkw@linux.ibm.com> writes: >>> Hi, >>> >>> As Richard S. suggested in the thread: >>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550633.html >>> >>> this patch is separated from the one of that thread, mainly to refactor the >>> existing peel_iters_{pro,epi}logue cost model handlings. >>> >>> I've addressed Richard S.'s review comments there, moreover because of one >>> failure of aarch64 testing, I updated it a bit more to keep the logic unchanged >>> as before first (refactor_cost.diff). >> >> Heh, nice when a clean-up exposes an existing bug. ;-) I agree the >> updates look correct. E.g. if vf is 1, we should assume that there >> are no peeled iterations even if the number of iterations is unknown. >> >> Both patches are OK with some very minor nits (sorry): > > Thanks for the comments! I've addressed them and commit refactor_cost.diff > via r11-2378. > > For adjust.diff, it works well on powerpc64le-linux-gnu (bootstrap/regtest), > but got one failure on aarch64 as below: > > PASS->FAIL: gcc.target/aarch64/sve/cost_model_2.c -march=armv8.2-a+sve scan-assembler-times \\tst1w\\tz 1 > > I spent some time investigating it and thought it's expected. As you mentioned > above, the patch can fix the cases with VF 1, the VF of this case is exactly one. :) > > Without the proposed adjustment, the cost for V16QI looks like: > > Vector inside of loop cost: 1 > Vector prologue cost: 4 > Vector epilogue cost: 3 > Scalar iteration cost: 4 > Scalar outside cost: 6 > Vector outside cost: 7 > prologue iterations: 0 > epilogue iterations: 0 > > After the change, the cost becomes to: > > Vector inside of loop cost: 1 > Vector prologue cost: 1 > Vector epilogue cost: 0 > Scalar iteration cost: 4 > Scalar outside cost: 6 > Vector outside cost: 1 > prologue iterations: 0 > epilogue iterations: 0 > > which outperforms VNx16QI that isn't affected by this patch with partial vectors. Hmm. V16QI outperforming VNx16QI is kind-of bogus in this case, since the store is a full-vector store for both Advanced SIMD and 128-bit SVE. But we do currently still generate the SVE loop in a more general form (since doing otherwise introduces other problems) so I guess the result is self-consistent. More importantly, even if the costs were equal, V16QI would still win for 128-bit SVE. So yeah, while I had my doubts at first, I agree this is the right fix. > gcc/ChangeLog: > > * tree-vect-loop.c (vect_get_known_peeling_cost): Don't consider branch > taken costs for prologue and epilogue if they don't exist. > (vect_estimate_min_profitable_iters): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/sve/cost_model_2.c: Adjust due to cost model > change. OK, thanks. Richard
on 2020/7/31 6:57 PM, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> Hi Richard,
>>
>> on 2020/7/27 9:10 PM, Richard Sandiford wrote:
>>> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>>>> Hi,
>>>>
>>>> As Richard S. suggested in the thread:
>>>>
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550633.html
>>>>
>>>> this patch is separated from the one of that thread, mainly to refactor the
>>>> existing peel_iters_{pro,epi}logue cost model handlings.
>>>>
>>>> I've addressed Richard S.'s review comments there, moreover because of one
>>>> failure of aarch64 testing, I updated it a bit more to keep the logic
>>>> unchanged as before first (refactor_cost.diff).
>>>
>>> Heh, nice when a clean-up exposes an existing bug. ;-)  I agree the
>>> updates look correct.  E.g. if vf is 1, we should assume that there
>>> are no peeled iterations even if the number of iterations is unknown.
>>>
>>> Both patches are OK with some very minor nits (sorry):
>>
>> Thanks for the comments!  I've addressed them and committed
>> refactor_cost.diff via r11-2378.
>>
>> For adjust.diff, it works well on powerpc64le-linux-gnu (bootstrap/regtest),
>> but got one failure on aarch64 as below:
>>
>> PASS->FAIL: gcc.target/aarch64/sve/cost_model_2.c -march=armv8.2-a+sve scan-assembler-times \\tst1w\\tz 1
>>
>> I spent some time investigating it and think it's expected.  As you mentioned
>> above, the patch can fix the cases with VF 1, and the VF of this case is
>> exactly one. :)
>>
>> Without the proposed adjustment, the cost for V16QI looks like:
>>
>>   Vector inside of loop cost: 1
>>   Vector prologue cost: 4
>>   Vector epilogue cost: 3
>>   Scalar iteration cost: 4
>>   Scalar outside cost: 6
>>   Vector outside cost: 7
>>   prologue iterations: 0
>>   epilogue iterations: 0
>>
>> After the change, the cost becomes:
>>
>>   Vector inside of loop cost: 1
>>   Vector prologue cost: 1
>>   Vector epilogue cost: 0
>>   Scalar iteration cost: 4
>>   Scalar outside cost: 6
>>   Vector outside cost: 1
>>   prologue iterations: 0
>>   epilogue iterations: 0
>>
>> which outperforms VNx16QI, which isn't affected by this patch with partial
>> vectors.
>
> Hmm.  V16QI outperforming VNx16QI is kind-of bogus in this case,
> since the store is a full-vector store for both Advanced SIMD and
> 128-bit SVE.  But we do currently still generate the SVE loop in
> a more general form (since doing otherwise introduces other problems)
> so I guess the result is self-consistent.
>
> More importantly, even if the costs were equal, V16QI would still
> win for 128-bit SVE.
>
> So yeah, while I had my doubts at first, I agree this is the right fix.

Thanks for the explanation!  Committed in r11-2453.

BR,
Kewen
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index e933441b922..b3e84a590a4 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3474,42 +3474,56 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
   return NULL;
 }
 
-/* Calculate cost of peeling the loop PEEL_ITERS_PROLOGUE times.  */
-int
-vect_get_known_peeling_cost (loop_vec_info loop_vinfo, int peel_iters_prologue,
-                             int *peel_iters_epilogue,
-                             stmt_vector_for_cost *scalar_cost_vec,
-                             stmt_vector_for_cost *prologue_cost_vec,
-                             stmt_vector_for_cost *epilogue_cost_vec)
+/* Estimate the number of peeled epilogue iterations for LOOP_VINFO.
+   PEEL_ITERS_PROLOGUE is the number of peeled prologue iterations,
+   or -1 if not known.  */
+
+static int
+vect_get_peel_iters_epilogue (loop_vec_info loop_vinfo, int peel_iters_prologue)
 {
-  int retval = 0;
   int assumed_vf = vect_vf_for_cost (loop_vinfo);
-
-  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) || peel_iters_prologue == -1)
     {
-      *peel_iters_epilogue = assumed_vf / 2;
       if (dump_enabled_p ())
-        dump_printf_loc (MSG_NOTE, vect_location,
+	dump_printf_loc (MSG_NOTE, vect_location,
                          "cost model: epilogue peel iters set to vf/2 "
                          "because loop iterations are unknown .\n");
-
-      /* If peeled iterations are known but number of scalar loop
-         iterations are unknown, count a taken branch per peeled loop.  */
-      retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
-                                 NULL, NULL_TREE, 0, vect_prologue);
-      retval += record_stmt_cost (epilogue_cost_vec, 1, cond_branch_taken,
-                                  NULL, NULL_TREE, 0, vect_epilogue);
+      return assumed_vf / 2;
     }
   else
     {
       int niters = LOOP_VINFO_INT_NITERS (loop_vinfo);
-      peel_iters_prologue = niters < peel_iters_prologue ?
-                            niters : peel_iters_prologue;
-      *peel_iters_epilogue = (niters - peel_iters_prologue) % assumed_vf;
+      peel_iters_prologue = MIN (niters, peel_iters_prologue);
+      int peel_iters_epilogue = (niters - peel_iters_prologue) % assumed_vf;
       /* If we need to peel for gaps, but no peeling is required, we have to
          peel VF iterations.  */
-      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && !*peel_iters_epilogue)
-        *peel_iters_epilogue = assumed_vf;
+      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && !peel_iters_epilogue)
+        peel_iters_epilogue = assumed_vf;
+      return peel_iters_epilogue;
+    }
+}
+
+/* Calculate cost of peeling the loop PEEL_ITERS_PROLOGUE times.  */
+int
+vect_get_known_peeling_cost (loop_vec_info loop_vinfo, int peel_iters_prologue,
+                             int *peel_iters_epilogue,
+                             stmt_vector_for_cost *scalar_cost_vec,
+                             stmt_vector_for_cost *prologue_cost_vec,
+                             stmt_vector_for_cost *epilogue_cost_vec)
+{
+  int retval = 0;
+
+  *peel_iters_epilogue
+    = vect_get_peel_iters_epilogue (loop_vinfo, peel_iters_prologue);
+
+  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+    {
+      /* If peeled iterations are known but number of scalar loop
+         iterations are unknown, count a taken branch per peeled loop.  */
+      retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken, NULL,
+                                 NULL_TREE, 0, vect_prologue);
+      retval += record_stmt_cost (epilogue_cost_vec, 1, cond_branch_taken, NULL,
+                                  NULL_TREE, 0, vect_epilogue);
     }
 
   stmt_info_for_cost *si;
@@ -3652,24 +3666,110 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
      TODO: Build an expression that represents peel_iters for prologue and
      epilogue to be used in a run-time test.  */
 
+  bool prologue_need_br_taken_cost = false;
+  bool prologue_need_br_not_taken_cost = false;
+
+  /* Calculate peel_iters_prologue.  */
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+    peel_iters_prologue = 0;
+  else if (npeel < 0)
     {
-      peel_iters_prologue = 0;
-      peel_iters_epilogue = 0;
+      peel_iters_prologue = assumed_vf / 2;
+      if (dump_enabled_p ())
+        dump_printf (MSG_NOTE, "cost model: "
+                     "prologue peel iters set to vf/2.\n");
 
-      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
-        {
-          /* We need to peel exactly one iteration.  */
-          peel_iters_epilogue += 1;
-          stmt_info_for_cost *si;
-          int j;
-          FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
-                            j, si)
-            (void) add_stmt_cost (loop_vinfo, target_cost_data, si->count,
-                                  si->kind, si->stmt_info, si->vectype,
-                                  si->misalign, vect_epilogue);
-        }
+      /* If peeled iterations are unknown, count a taken branch and a not taken
+         branch per peeled loop.  Even if scalar loop iterations are known,
+         vector iterations are not known since peeled prologue iterations are
+         not known.  Hence guards remain the same.  */
+      prologue_need_br_taken_cost = true;
+      prologue_need_br_not_taken_cost = true;
+    }
+  else
+    {
+      peel_iters_prologue = npeel;
+      if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+        /* If peeled iterations are known but number of scalar loop
+           iterations are unknown, count a taken branch per peeled loop.  */
+        prologue_need_br_taken_cost = true;
+    }
+
+  bool epilogue_need_br_taken_cost = false;
+  bool epilogue_need_br_not_taken_cost = false;
+
+  /* Calculate peel_iters_epilogue.  */
+  if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
+    /* We need to peel exactly one iteration for gaps.  */
+    peel_iters_epilogue = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) ? 1 : 0;
+  else if (npeel < 0)
+    {
+      /* If peeling for alignment is unknown, loop bound of main loop
+         becomes unknown.  */
+      peel_iters_epilogue = assumed_vf / 2;
+      if (dump_enabled_p ())
+        dump_printf (MSG_NOTE, "cost model: "
+                     "epilogue peel iters set to vf/2 because "
+                     "peeling for alignment is unknown.\n");
 
+      /* See the same reason above in peel_iters_prologue calculation.  */
+      epilogue_need_br_taken_cost = true;
+      epilogue_need_br_not_taken_cost = true;
+    }
+  else
+    {
+      peel_iters_epilogue = vect_get_peel_iters_epilogue (loop_vinfo, npeel);
+      if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+        /* If peeled iterations are known but number of scalar loop
+           iterations are unknown, count a taken branch per peeled loop.  */
+        epilogue_need_br_taken_cost = true;
+    }
+
+  stmt_info_for_cost *si;
+  int j;
+  /* Add costs associated with peel_iters_prologue.  */
+  if (peel_iters_prologue)
+    FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo), j, si)
+      {
+        (void) add_stmt_cost (loop_vinfo, target_cost_data,
+                              si->count * peel_iters_prologue, si->kind,
+                              si->stmt_info, si->vectype, si->misalign,
+                              vect_prologue);
+      }
+
+  /* Add costs associated with peel_iters_epilogue.  */
+  if (peel_iters_epilogue)
+    FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo), j, si)
+      {
+        (void) add_stmt_cost (loop_vinfo, target_cost_data,
+                              si->count * peel_iters_epilogue, si->kind,
+                              si->stmt_info, si->vectype, si->misalign,
+                              vect_epilogue);
+      }
+
+  /* Add possible cond_branch_taken/cond_branch_not_taken cost.  */
+
+  if (prologue_need_br_taken_cost)
+    (void) add_stmt_cost (loop_vinfo, target_cost_data, 1, cond_branch_taken,
+                          NULL, NULL_TREE, 0, vect_prologue);
+
+  if (prologue_need_br_not_taken_cost)
+    (void) add_stmt_cost (loop_vinfo, target_cost_data, 1,
+                          cond_branch_not_taken, NULL, NULL_TREE, 0,
+                          vect_prologue);
+
+  if (epilogue_need_br_taken_cost)
+    (void) add_stmt_cost (loop_vinfo, target_cost_data, 1, cond_branch_taken,
+                          NULL, NULL_TREE, 0, vect_epilogue);
+
+  if (epilogue_need_br_not_taken_cost)
+    (void) add_stmt_cost (loop_vinfo, target_cost_data, 1,
+                          cond_branch_not_taken, NULL, NULL_TREE, 0,
+                          vect_epilogue);
+
+  /* Take care of special costs for rgroup controls of partial vectors.  */
+  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+    {
       /* Calculate how many masks we need to generate.  */
       unsigned int num_masks = 0;
       rgroup_controls *rgm;
@@ -3691,93 +3791,10 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
         simpler and safer to use the worst-case cost; if this ends up
         being the tie-breaker between vectorizing or not, then it's
         probably better not to vectorize.  */
-      (void) add_stmt_cost (loop_vinfo,
-                            target_cost_data, num_masks, vector_stmt,
-                            NULL, NULL_TREE, 0, vect_prologue);
-      (void) add_stmt_cost (loop_vinfo,
-                            target_cost_data, num_masks - 1, vector_stmt,
-                            NULL, NULL_TREE, 0, vect_body);
-    }
-  else if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
-    {
-      peel_iters_prologue = 0;
-      peel_iters_epilogue = 0;
-    }
-  else if (npeel < 0)
-    {
-      peel_iters_prologue = assumed_vf / 2;
-      if (dump_enabled_p ())
-        dump_printf (MSG_NOTE, "cost model: "
-                     "prologue peel iters set to vf/2.\n");
-
-      /* If peeling for alignment is unknown, loop bound of main loop becomes
-         unknown.  */
-      peel_iters_epilogue = assumed_vf / 2;
-      if (dump_enabled_p ())
-        dump_printf (MSG_NOTE, "cost model: "
-                     "epilogue peel iters set to vf/2 because "
-                     "peeling for alignment is unknown.\n");
-
-      /* If peeled iterations are unknown, count a taken branch and a not taken
-         branch per peeled loop.  Even if scalar loop iterations are known,
-         vector iterations are not known since peeled prologue iterations are
-         not known.  Hence guards remain the same.  */
-      (void) add_stmt_cost (loop_vinfo, target_cost_data, 1, cond_branch_taken,
-                            NULL, NULL_TREE, 0, vect_prologue);
-      (void) add_stmt_cost (loop_vinfo,
-                            target_cost_data, 1, cond_branch_not_taken,
-                            NULL, NULL_TREE, 0, vect_prologue);
-      (void) add_stmt_cost (loop_vinfo, target_cost_data, 1, cond_branch_taken,
-                            NULL, NULL_TREE, 0, vect_epilogue);
-      (void) add_stmt_cost (loop_vinfo,
-                            target_cost_data, 1, cond_branch_not_taken,
-                            NULL, NULL_TREE, 0, vect_epilogue);
-      stmt_info_for_cost *si;
-      int j;
-      FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo), j, si)
-        {
-          (void) add_stmt_cost (loop_vinfo, target_cost_data,
-                                si->count * peel_iters_prologue,
-                                si->kind, si->stmt_info, si->vectype,
-                                si->misalign,
-                                vect_prologue);
-          (void) add_stmt_cost (loop_vinfo, target_cost_data,
-                                si->count * peel_iters_epilogue,
-                                si->kind, si->stmt_info, si->vectype,
-                                si->misalign,
-                                vect_epilogue);
-        }
-    }
-  else
-    {
-      stmt_vector_for_cost prologue_cost_vec, epilogue_cost_vec;
-      stmt_info_for_cost *si;
-      int j;
-      void *data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
-
-      prologue_cost_vec.create (2);
-      epilogue_cost_vec.create (2);
-      peel_iters_prologue = npeel;
-
-      (void) vect_get_known_peeling_cost (loop_vinfo, peel_iters_prologue,
-                                          &peel_iters_epilogue,
-                                          &LOOP_VINFO_SCALAR_ITERATION_COST
-                                          (loop_vinfo),
-                                          &prologue_cost_vec,
-                                          &epilogue_cost_vec);
-
-      FOR_EACH_VEC_ELT (prologue_cost_vec, j, si)
-        (void) add_stmt_cost (loop_vinfo,
-                              data, si->count, si->kind, si->stmt_info,
-                              si->vectype, si->misalign, vect_prologue);
-
-      FOR_EACH_VEC_ELT (epilogue_cost_vec, j, si)
-        (void) add_stmt_cost (loop_vinfo,
-                              data, si->count, si->kind, si->stmt_info,
-                              si->vectype, si->misalign, vect_epilogue);
-
-      prologue_cost_vec.release ();
-      epilogue_cost_vec.release ();
+      (void) add_stmt_cost (loop_vinfo, target_cost_data, num_masks,
+                            vector_stmt, NULL, NULL_TREE, 0, vect_prologue);
+      (void) add_stmt_cost (loop_vinfo, target_cost_data, num_masks - 1,
+                            vector_stmt, NULL, NULL_TREE, 0, vect_body);
     }
 
   /* FORNOW: The scalar outside cost is incremented in one of the
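For reference, the known-trip-count arithmetic that the new
vect_get_peel_iters_epilogue centralizes boils down to the self-contained
sketch below (hypothetical parameter names; the real function also returns
assumed_vf / 2 when the trip count or the prologue peel count is unknown).

#include <stdio.h>

/* Hedged sketch of the known-niters path: clamp the prologue peel count
   to the trip count, take the remainder modulo vf, and bump a zero
   remainder to a full vf when peeling for gaps is required.
   peeling_for_gaps is a hypothetical flag standing in for
   LOOP_VINFO_PEELING_FOR_GAPS.  */
static int
peel_iters_epilogue (int niters, int peel_prologue, int vf,
                     int peeling_for_gaps)
{
  if (peel_prologue > niters)
    peel_prologue = niters;	/* MIN (niters, peel_iters_prologue).  */
  int epilogue = (niters - peel_prologue) % vf;
  if (peeling_for_gaps && epilogue == 0)
    epilogue = vf;		/* Must peel a full VF for gaps.  */
  return epilogue;
}

int
main (void)
{
  /* 23 iterations, 3 peeled for alignment, vf 4: (23 - 3) % 4 == 0, so
     no epilogue unless gaps force one full vector's worth.  */
  printf ("%d\n", peel_iters_epilogue (23, 3, 4, 0));	/* 0 */
  printf ("%d\n", peel_iters_epilogue (23, 3, 4, 1));	/* 4 */
  return 0;
}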