Message ID | 20150915135233.GA26618@msticlxl57.ims.intel.com |
---|---|
State | New |
Headers | show |
On 09/15/2015 06:52 AM, Ilya Enkovich wrote: > I made a step forward forcing vector comparisons have a mask (vec<bool>) result and disabling bool patterns in case vector comparison is supported by target. Several issues were met. > > - c/c++ front-ends generate vector comparison with integer vector result. I had to make some modifications to use vec_cond instead. Don't know if there are other front-ends producing vector comparisons. > - vector lowering fails to expand vector masks due to mismatch of type and mode sizes. I fixed vector type size computation to match mode size and added a special handling of mask expand. > - I disabled canonical type creation for vector mask because we can't layout it with VOID mode. I don't know why we may need a canonical type here. But get_mask_mode call may be moved into type layout to get it. > - Expand of vec<bool> constants/contstructors requires special handling. Common case should require target hooks/optabs to expand vector into required mode. But I suppose we want to have a generic code to handle vector of int mode case to avoid modification of multiple targets which use default vec<bool> modes. > > Currently 'make check' shows two types of regression. > - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND). This must be due to my front-end changes. Hope it will be easy to fix. > - missed vectorization. All of them appear due to bool patterns disabling. I didn't look into all of them but it seems the main problem is in mixed type sizes. With bool patterns and integer vector masks we just put int->(other sized int) conversion for masks and it gives us required mask transformation. With boolean mask we don't have a proper scalar statements to do that. I think mask widening/narrowing may be directly supported in masked statements vectorization. Going to look into it. > > I attach what I currently have for a prototype. It grows bigger so I split into several parts. The general approach looks good. > +/* By defaults a vector of integers is used as a mask. */ > + > +machine_mode > +default_get_mask_mode (unsigned nunits, unsigned vector_size) > +{ > + unsigned elem_size = vector_size / nunits; > + machine_mode elem_mode > + = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT); Why these arguments as opposed to passing elem_size? It seems that every hook is going to have to do this division... > +#define VECTOR_MASK_TYPE_P(TYPE) \ > + (TREE_CODE (TYPE) == VECTOR_TYPE \ > + && TREE_CODE (TREE_TYPE (TYPE)) == BOOLEAN_TYPE) Perhaps better as VECTOR_BOOLEAN_TYPE_P, since that's exactly what's being tested? > @@ -3464,10 +3464,10 @@ verify_gimple_comparison (tree type, tree op0, tree op1) > return true; > } > } > - /* Or an integer vector type with the same size and element count > + /* Or a boolean vector type with the same element count > as the comparison operand types. */ > else if (TREE_CODE (type) == VECTOR_TYPE > - && TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE) > + && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE) VECTOR_BOOLEAN_TYPE_P. > @@ -122,7 +122,19 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type, > tree t, tree bitsize, tree bitpos) > { > if (bitpos) > - return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos); > + { > + if (TREE_CODE (type) == BOOLEAN_TYPE) > + { > + tree itype > + = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0); > + tree field = gimplify_build3 (gsi, BIT_FIELD_REF, itype, t, > + bitsize, bitpos); > + return gimplify_build2 (gsi, NE_EXPR, type, field, > + build_zero_cst (itype)); > + } > + else > + return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos); > + } > else > return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t); > } So... this is us lowering vector operations on a target that doesn't support them. Which means that we get to decide what value is produced for a comparison? Which means that we can settle on the "normal" -1, correct? Which means that we ought not need to extract the entire element and then compare for non-zero, but rather simply extract a single bit from the element, and directly call that a boolean result, correct? I assume you tested all this code with -mno-sse or equivalent arch default? > @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt) > create_output_operand (&ops[0], target, TYPE_MODE (type)); > create_fixed_operand (&ops[1], mem); > create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); > - expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops); > + expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type), > + TYPE_MODE (TREE_TYPE (maskt))), > + 3, ops); Why do we now need a conversion here? > + if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE ((op0)))) != MODE_VECTOR_INT) > + { > + /* This is a vcond with mask. To be supported soon... */ > + gcc_unreachable (); > + } Leave this out til we need it? I can't see that you replace this later in the patch series... r~
On Tue, Sep 15, 2015 at 3:52 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: > On 08 Sep 15:37, Ilya Enkovich wrote: >> 2015-09-04 23:42 GMT+03:00 Jeff Law <law@redhat.com>: >> > >> > So do we have enough confidence in this representation that we want to go >> > ahead and commit to it? >> >> I think new representation fits nice mostly. There are some places >> where I have to make some exceptions for vector of bools to make it >> work. This is mostly to avoid target modifications. I'd like to avoid >> necessity to change all targets currently supporting vec_cond. It >> makes me add some special handling of vec<bool> in GIMPLE, e.g. I add >> a special code in vect_init_vector to build vec<bool> invariants with >> proper casting to int. Otherwise I'd need to do it on a target side. >> >> I made several fixes and current patch (still allowing integer vector >> result for vector comparison and applying bool patterns) passes >> bootstrap and regression testing on x86_64. Now I'll try to fully >> switch to vec<bool> and see how it goes. >> >> Thanks, >> Ilya >> > > Hi, > > I made a step forward forcing vector comparisons have a mask (vec<bool>) result and disabling bool patterns in case vector comparison is supported by target. Several issues were met. > > - c/c++ front-ends generate vector comparison with integer vector result. I had to make some modifications to use vec_cond instead. Don't know if there are other front-ends producing vector comparisons. > - vector lowering fails to expand vector masks due to mismatch of type and mode sizes. I fixed vector type size computation to match mode size and added a special handling of mask expand. > - I disabled canonical type creation for vector mask because we can't layout it with VOID mode. I don't know why we may need a canonical type here. But get_mask_mode call may be moved into type layout to get it. > - Expand of vec<bool> constants/contstructors requires special handling. Common case should require target hooks/optabs to expand vector into required mode. But I suppose we want to have a generic code to handle vector of int mode case to avoid modification of multiple targets which use default vec<bool> modes. One complication you might run into currently is that at the moment we require the comparison result to be of the same size as the comparison operands. This means that vector<bool> with, say, 4 elements has to support different modes for v4si < v4si vs. v4df < v4df (if you think of x86 with its multiple vector sizes). That's for the "fallback" non-mask vector<bool> only of course. Does that mean we have to use different bool types with different modes here? So the other possibility is to never expose the fallback vector<bool> anywhere but make sure to lower to vector<int> via VEC_COND_EXPRs. After all it's only the vectorizer that should create stmts with vector<bool> LHS and the vectorizer is already careful to only generate code supported by the target. > Currently 'make check' shows two types of regression. > - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND). This must be due to my front-end changes. Hope it will be easy to fix. > - missed vectorization. All of them appear due to bool patterns disabling. I didn't look into all of them but it seems the main problem is in mixed type sizes. With bool patterns and integer vector masks we just put int->(other sized int) conversion for masks and it gives us required mask transformation. With boolean mask we don't have a proper scalar statements to do that. I think mask widening/narrowing may be directly supported in masked statements vectorization. Going to look into it. > > I attach what I currently have for a prototype. It grows bigger so I split into several parts. > > Thanks, > Ilya > -- > * avx512-vec-bool-01-add-truth-vector.ChangeLog > > 2015-09-15 Ilya Enkovich <enkovich.gnu@gmail.com> > > * doc/tm.texi: Regenerated. > * doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New. > * stor-layout.c (layout_type): Use mode to get vector mask size. > (vector_type_mode): Likewise. > * target.def (get_mask_mode): New. > * targhooks.c (default_vector_alignment): Use mode alignment > for vector masks. > (default_get_mask_mode): New. > * targhooks.h (default_get_mask_mode): New. > * tree.c (make_vector_type): Vector mask has no canonical type. > (build_truth_vector_type): New. > (build_same_sized_truth_vector_type): New. > (truth_type_for): Support vector masks. > * tree.h (VECTOR_MASK_TYPE_P): New. > (build_truth_vector_type): New. > (build_same_sized_truth_vector_type): New. > > * avx512-vec-bool-02-no-int-vec-cmp.ChangeLog > > gcc/ > > 2015-09-15 Ilya Enkovich <enkovich.gnu@gmail.com> > > * tree-cfg.c (verify_gimple_comparison) Require vector mask > type for vector comparison. > (verify_gimple_assign_ternary): Likewise. > > gcc/c > > 2015-09-15 Ilya Enkovich <enkovich.gnu@gmail.com> > > * c-typeck.c (build_conditional_expr): Use vector mask > type for vector comparison. > (build_vec_cmp): New. > (build_binary_op): Use build_vec_cmp for comparison. > > gcc/cp > > 2015-09-15 Ilya Enkovich <enkovich.gnu@gmail.com> > > * call.c (build_conditional_expr_1): Use vector mask > type for vector comparison. > * typeck.c (build_vec_cmp): New. > (cp_build_binary_op): Use build_vec_cmp for comparison. > > * avx512-vec-bool-03-vec-lower.ChangeLog > > 2015-09-15 Ilya Enkovich <enkovich.gnu@gmail.com> > > * tree-vect-generic.c (tree_vec_extract): Use additional > comparison when extracting boolean value. > (do_bool_compare): New. > (expand_vector_comparison): Add casts for vector mask. > (expand_vector_divmod): Use vector mask type for vector > comparison. > (expand_vector_operations_1) Skip scalar mode mask statements. > > * avx512-vec-bool-04-vectorize.ChangeLog > > gcc/ > > 2015-09-15 Ilya Enkovich <enkovich.gnu@gmail.com> > > * expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results. > (const_vector_mask_from_tree): New. > (const_vector_from_tree): Use const_vector_mask_from_tree for vector > masks. > * internal-fn.c (expand_MASK_LOAD): Adjust to optab changes. > (expand_MASK_STORE): Likewise. > * optabs.c (vector_compare_rtx): Add OPNO arg. > (expand_vec_cond_expr): Adjust to vector_compare_rtx change. > (get_vec_cmp_icode): New. > (expand_vec_cmp_expr_p): New. > (expand_vec_cmp_expr): New. > (can_vec_mask_load_store_p): Add MASK_MODE arg. > * optabs.def (vec_cmp_optab): New. > (vec_cmpu_optab): New. > (maskload_optab): Transform into convert optab. > (maskstore_optab): Likewise. > * optabs.h (expand_vec_cmp_expr_p): New. > (expand_vec_cmp_expr): New. > (can_vec_mask_load_store_p): Add MASK_MODE arg. > * tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to > can_vec_mask_load_store_p signature change. > (predicate_mem_writes): Use boolean mask. > * tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var. > (vect_create_destination_var): Likewise. > * tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask > operations for VF. Add mask type computation. > * tree-vect-stmts.c (vect_init_vector): Support mask invariants. > (vect_get_vec_def_for_operand): Support mask constant. > (vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p > signature change. > (vectorizable_condition): Use vector mask type for vector comparison. > (vectorizable_comparison): New. > (vect_analyze_stmt): Add vectorizable_comparison. > (vect_transform_stmt): Likewise. > (get_mask_type_for_scalar_type): New. > * tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var > (enum stmt_vec_info_type): Add comparison_vec_info_type. > (get_mask_type_for_scalar_type): New. > > * avx512-vec-bool-05-bool-patterns.ChangeLog > > 2015-09-15 Ilya Enkovich <enkovich.gnu@gmail.com> > > * tree-vect-patterns.c (check_bool_pattern): Check fails > if we can vectorize comparison directly. > (search_type_for_mask): New. > (vect_recog_bool_pattern): Support cases when bool pattern > check fails. > > * avx512-vec-bool-06-i386.ChangeLog > > 2015-09-15 Ilya Enkovich <enkovich.gnu@gmail.com> > > * config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New. > (ix86_expand_int_vec_cmp): New. > (ix86_expand_fp_vec_cmp): New. > * config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for > op_true and op_false. > (ix86_int_cmp_code_to_pcmp_immediate): New. > (ix86_fp_cmp_code_to_pcmp_immediate): New. > (ix86_cmp_code_to_pcmp_immediate): New. > (ix86_expand_mask_vec_cmp): New. > (ix86_expand_fp_vec_cmp): New. > (ix86_expand_int_sse_cmp): New. > (ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp. > (ix86_expand_int_vec_cmp): New. > (ix86_get_mask_mode): New. > (TARGET_VECTORIZE_GET_MASK_MODE): New. > * config/i386/sse.md (avx512fmaskmodelower): New. > (vec_cmp<mode><avx512fmaskmodelower>): New. > (vec_cmp<mode><sseintvecmodelower>): New. > (vec_cmpv2div2di): New. > (vec_cmpu<mode><avx512fmaskmodelower>): New. > (vec_cmpu<mode><sseintvecmodelower>): New. > (vec_cmpuv2div2di): New. > (maskload<mode>): Rename to ... > (maskload<mode><sseintvecmodelower>): ... this. > (maskstore<mode>): Rename to ... > (maskstore<mode><sseintvecmodelower>): ... this. > (maskload<mode><avx512fmaskmodelower>): New. > (maskstore<mode><avx512fmaskmodelower>): New.
2015-09-17 20:35 GMT+03:00 Richard Henderson <rth@redhat.com>: > On 09/15/2015 06:52 AM, Ilya Enkovich wrote: >> I made a step forward forcing vector comparisons have a mask (vec<bool>) result and disabling bool patterns in case vector comparison is supported by target. Several issues were met. >> >> - c/c++ front-ends generate vector comparison with integer vector result. I had to make some modifications to use vec_cond instead. Don't know if there are other front-ends producing vector comparisons. >> - vector lowering fails to expand vector masks due to mismatch of type and mode sizes. I fixed vector type size computation to match mode size and added a special handling of mask expand. >> - I disabled canonical type creation for vector mask because we can't layout it with VOID mode. I don't know why we may need a canonical type here. But get_mask_mode call may be moved into type layout to get it. >> - Expand of vec<bool> constants/contstructors requires special handling. Common case should require target hooks/optabs to expand vector into required mode. But I suppose we want to have a generic code to handle vector of int mode case to avoid modification of multiple targets which use default vec<bool> modes. >> >> Currently 'make check' shows two types of regression. >> - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND). This must be due to my front-end changes. Hope it will be easy to fix. >> - missed vectorization. All of them appear due to bool patterns disabling. I didn't look into all of them but it seems the main problem is in mixed type sizes. With bool patterns and integer vector masks we just put int->(other sized int) conversion for masks and it gives us required mask transformation. With boolean mask we don't have a proper scalar statements to do that. I think mask widening/narrowing may be directly supported in masked statements vectorization. Going to look into it. >> >> I attach what I currently have for a prototype. It grows bigger so I split into several parts. > > The general approach looks good. > Great! > >> +/* By defaults a vector of integers is used as a mask. */ >> + >> +machine_mode >> +default_get_mask_mode (unsigned nunits, unsigned vector_size) >> +{ >> + unsigned elem_size = vector_size / nunits; >> + machine_mode elem_mode >> + = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT); > > Why these arguments as opposed to passing elem_size? It seems that every hook > is going to have to do this division... Every target would have nunits = vector_size / elem_size because nunits is used to create a vector mode. Thus no difference. > >> +#define VECTOR_MASK_TYPE_P(TYPE) \ >> + (TREE_CODE (TYPE) == VECTOR_TYPE \ >> + && TREE_CODE (TREE_TYPE (TYPE)) == BOOLEAN_TYPE) > > Perhaps better as VECTOR_BOOLEAN_TYPE_P, since that's exactly what's being tested? OK > >> @@ -3464,10 +3464,10 @@ verify_gimple_comparison (tree type, tree op0, tree op1) >> return true; >> } >> } >> - /* Or an integer vector type with the same size and element count >> + /* Or a boolean vector type with the same element count >> as the comparison operand types. */ >> else if (TREE_CODE (type) == VECTOR_TYPE >> - && TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE) >> + && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE) > > VECTOR_BOOLEAN_TYPE_P. > >> @@ -122,7 +122,19 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type, >> tree t, tree bitsize, tree bitpos) >> { >> if (bitpos) >> - return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos); >> + { >> + if (TREE_CODE (type) == BOOLEAN_TYPE) >> + { >> + tree itype >> + = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0); >> + tree field = gimplify_build3 (gsi, BIT_FIELD_REF, itype, t, >> + bitsize, bitpos); >> + return gimplify_build2 (gsi, NE_EXPR, type, field, >> + build_zero_cst (itype)); >> + } >> + else >> + return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos); >> + } >> else >> return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t); >> } > > So... this is us lowering vector operations on a target that doesn't support > them. Which means that we get to decide what value is produced for a > comparison? Which means that we can settle on the "normal" -1, correct? > > Which means that we ought not need to extract the entire element and then > compare for non-zero, but rather simply extract a single bit from the element, > and directly call that a boolean result, correct? Didn't think about that. I'll give it a try. > > I assume you tested all this code with -mno-sse or equivalent arch default? I didn't make some special runs for that. Just used regression testing which seems to have such tests. > >> @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt) >> create_output_operand (&ops[0], target, TYPE_MODE (type)); >> create_fixed_operand (&ops[1], mem); >> create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); >> - expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops); >> + expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type), >> + TYPE_MODE (TREE_TYPE (maskt))), >> + 3, ops); > > Why do we now need a conversion here? Mask mode was implicit for masked loads and stores. Now it becomes explicit because we may load the same value using different masks. E.g. for i386 we may load 256bit vector using both vector and scalar masks. > >> + if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE ((op0)))) != MODE_VECTOR_INT) >> + { >> + /* This is a vcond with mask. To be supported soon... */ >> + gcc_unreachable (); >> + } > > Leave this out til we need it? I can't see that you replace this later in the > patch series... Currently we just shouldn't generate such vec_cond. But later I want to enable such statements generation and this will need to handle non vector masks. I don't have it in my patches set yet. But it is not finished and have lots of unfinished work. I don't expect detailed review for these patches. Once we make a decision that this representation works fine and we want to have, I'll address opened issues and send it a separate series. Thanks, Ilya > > > r~
2015-09-18 15:29 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: > On Tue, Sep 15, 2015 at 3:52 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote: >> On 08 Sep 15:37, Ilya Enkovich wrote: >>> 2015-09-04 23:42 GMT+03:00 Jeff Law <law@redhat.com>: >>> > >>> > So do we have enough confidence in this representation that we want to go >>> > ahead and commit to it? >>> >>> I think new representation fits nice mostly. There are some places >>> where I have to make some exceptions for vector of bools to make it >>> work. This is mostly to avoid target modifications. I'd like to avoid >>> necessity to change all targets currently supporting vec_cond. It >>> makes me add some special handling of vec<bool> in GIMPLE, e.g. I add >>> a special code in vect_init_vector to build vec<bool> invariants with >>> proper casting to int. Otherwise I'd need to do it on a target side. >>> >>> I made several fixes and current patch (still allowing integer vector >>> result for vector comparison and applying bool patterns) passes >>> bootstrap and regression testing on x86_64. Now I'll try to fully >>> switch to vec<bool> and see how it goes. >>> >>> Thanks, >>> Ilya >>> >> >> Hi, >> >> I made a step forward forcing vector comparisons have a mask (vec<bool>) result and disabling bool patterns in case vector comparison is supported by target. Several issues were met. >> >> - c/c++ front-ends generate vector comparison with integer vector result. I had to make some modifications to use vec_cond instead. Don't know if there are other front-ends producing vector comparisons. >> - vector lowering fails to expand vector masks due to mismatch of type and mode sizes. I fixed vector type size computation to match mode size and added a special handling of mask expand. >> - I disabled canonical type creation for vector mask because we can't layout it with VOID mode. I don't know why we may need a canonical type here. But get_mask_mode call may be moved into type layout to get it. >> - Expand of vec<bool> constants/contstructors requires special handling. Common case should require target hooks/optabs to expand vector into required mode. But I suppose we want to have a generic code to handle vector of int mode case to avoid modification of multiple targets which use default vec<bool> modes. > > One complication you might run into currently is that at the moment we > require the comparison result to be > of the same size as the comparison operands. This means that > vector<bool> with, say, 4 elements has > to support different modes for v4si < v4si vs. v4df < v4df (if you > think of x86 with its multiple vector sizes). > That's for the "fallback" non-mask vector<bool> only of course. Does > that mean we have to use different > bool types with different modes here? I though about boolean types with different sizes/modes. I still avoid them but it causes some ugliness. E.g. sizeof(innertype)*nelems != sizeof(vectortype) for vec<bool>. I causes some special handling in type layout and problems in lowering because BIT_FIELD_REF uses more bits than resulting type has. I use additional comparison to handle it. Richard also proposed to extract one bit only for bools. Don't know if differently sized boolean types may help to resolve this issue or create more problems. > > So the other possibility is to never expose the fallback vector<bool> > anywhere but make sure to lower to > vector<int> via VEC_COND_EXPRs. After all it's only the vectorizer > that should create stmts with > vector<bool> LHS and the vectorizer is already careful to only > generate code supported by the target. In case vec<bool> has integer vector mode, comparison should be handled similar to VEC_COND_EXPR by vect lowering and expand which should be enough to have it properly handled on targets with no vec<bool> support. Thanks, Ilya > >> Currently 'make check' shows two types of regression. >> - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND). This must be due to my front-end changes. Hope it will be easy to fix. >> - missed vectorization. All of them appear due to bool patterns disabling. I didn't look into all of them but it seems the main problem is in mixed type sizes. With bool patterns and integer vector masks we just put int->(other sized int) conversion for masks and it gives us required mask transformation. With boolean mask we don't have a proper scalar statements to do that. I think mask widening/narrowing may be directly supported in masked statements vectorization. Going to look into it. >> >> I attach what I currently have for a prototype. It grows bigger so I split into several parts. >> >> Thanks, >> Ilya
On 09/18/2015 06:21 AM, Ilya Enkovich wrote: >>> +machine_mode >>> +default_get_mask_mode (unsigned nunits, unsigned vector_size) >>> +{ >>> + unsigned elem_size = vector_size / nunits; >>> + machine_mode elem_mode >>> + = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT); >> >> Why these arguments as opposed to passing elem_size? It seems that every hook >> is going to have to do this division... > > Every target would have nunits = vector_size / elem_size because > nunits is used to create a vector mode. Thus no difference. I meant passing nunits and elem_size, but not vector_size. Thus no division required. If the target does require the vector size, it could be obtained by multiplication, which is cheaper. But in cases like this we'd not require either mult or div. >>> @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt) >>> create_output_operand (&ops[0], target, TYPE_MODE (type)); >>> create_fixed_operand (&ops[1], mem); >>> create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); >>> - expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops); >>> + expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type), >>> + TYPE_MODE (TREE_TYPE (maskt))), >>> + 3, ops); >> >> Why do we now need a conversion here? > > Mask mode was implicit for masked loads and stores. Now it becomes > explicit because we may load the same value using different masks. > E.g. for i386 we may load 256bit vector using both vector and scalar > masks. Ok, sure, the mask mode is needed, I get that. But that doesn't answer the question regarding conversion. Why would convert_optab_handler be needed to *change* the mode of the mask. I assume that's not actually possible, with the target hook already having chosen the proper mode for the mask. r~
2015-09-18 19:50 GMT+03:00 Richard Henderson <rth@redhat.com>: > On 09/18/2015 06:21 AM, Ilya Enkovich wrote: >>>> +machine_mode >>>> +default_get_mask_mode (unsigned nunits, unsigned vector_size) >>>> +{ >>>> + unsigned elem_size = vector_size / nunits; >>>> + machine_mode elem_mode >>>> + = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT); >>> >>> Why these arguments as opposed to passing elem_size? It seems that every hook >>> is going to have to do this division... >> >> Every target would have nunits = vector_size / elem_size because >> nunits is used to create a vector mode. Thus no difference. > > I meant passing nunits and elem_size, but not vector_size. Thus no division > required. If the target does require the vector size, it could be obtained by > multiplication, which is cheaper. But in cases like this we'd not require > either mult or div. OK > >>>> @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt) >>>> create_output_operand (&ops[0], target, TYPE_MODE (type)); >>>> create_fixed_operand (&ops[1], mem); >>>> create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); >>>> - expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops); >>>> + expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type), >>>> + TYPE_MODE (TREE_TYPE (maskt))), >>>> + 3, ops); >>> >>> Why do we now need a conversion here? >> >> Mask mode was implicit for masked loads and stores. Now it becomes >> explicit because we may load the same value using different masks. >> E.g. for i386 we may load 256bit vector using both vector and scalar >> masks. > > Ok, sure, the mask mode is needed, I get that. But that doesn't answer the > question regarding conversion. Why would convert_optab_handler be needed to > *change* the mode of the mask. I assume that's not actually possible, with the > target hook already having chosen the proper mode for the mask. There is no any conversion here, maskload_optab is a convert_optab because it uses two modes, one for value and the other one for mask. Ilya > > > r~
On 09/21/2015 05:08 AM, Ilya Enkovich wrote: > There is no any conversion here, maskload_optab is a convert_optab > because it uses two modes, one for value and the other one for mask. Ah, I see. In which case I think we ought to come up with a different name. C.f. get_vcond_icode. r~
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index f5a1f84..acdfcd5 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5688,6 +5688,11 @@ mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}. The default is zero which means to not iterate over other vector sizes. @end deftypefn +@deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_GET_MASK_MODE (unsigned @var{nunits}, unsigned @var{length}) +This hook returns mode to be used for a mask to be used for a vector +of specified @var{length} with @var{nunits} elements. +@end deftypefn + @deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (struct loop *@var{loop_info}) This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block. The default allocates three unsigned integers for accumulating costs for the prologue, body, and epilogue of the loop or basic block. If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized. @end deftypefn diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 9d5ac0a..52e912a 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4225,6 +4225,8 @@ address; but often a machine-dependent strategy can generate better code. @hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES +@hook TARGET_VECTORIZE_GET_MASK_MODE + @hook TARGET_VECTORIZE_INIT_COST @hook TARGET_VECTORIZE_ADD_STMT_COST diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c index 938e54b..f24a0c4 100644 --- a/gcc/stor-layout.c +++ b/gcc/stor-layout.c @@ -2184,11 +2184,22 @@ layout_type (tree type) TYPE_SATURATING (type) = TYPE_SATURATING (TREE_TYPE (type)); TYPE_UNSIGNED (type) = TYPE_UNSIGNED (TREE_TYPE (type)); - TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR, - TYPE_SIZE_UNIT (innertype), - size_int (nunits)); - TYPE_SIZE (type) = int_const_binop (MULT_EXPR, TYPE_SIZE (innertype), - bitsize_int (nunits)); + if (VECTOR_MASK_TYPE_P (type)) + { + TYPE_SIZE_UNIT (type) + = size_int (GET_MODE_SIZE (type->type_common.mode)); + TYPE_SIZE (type) + = bitsize_int (GET_MODE_BITSIZE (type->type_common.mode)); + } + else + { + TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR, + TYPE_SIZE_UNIT (innertype), + size_int (nunits)); + TYPE_SIZE (type) = int_const_binop (MULT_EXPR, + TYPE_SIZE (innertype), + bitsize_int (nunits)); + } /* For vector types, we do not default to the mode's alignment. Instead, query a target hook, defaulting to natural alignment. @@ -2455,7 +2466,14 @@ vector_type_mode (const_tree t) machine_mode innermode = TREE_TYPE (t)->type_common.mode; /* For integers, try mapping it to a same-sized scalar mode. */ - if (GET_MODE_CLASS (innermode) == MODE_INT) + if (VECTOR_MASK_TYPE_P (t)) + { + mode = mode_for_size (GET_MODE_BITSIZE (mode), MODE_INT, 0); + + if (mode != VOIDmode && have_regs_of_mode[mode]) + return mode; + } + else if (GET_MODE_CLASS (innermode) == MODE_INT) { mode = mode_for_size (TYPE_VECTOR_SUBPARTS (t) * GET_MODE_BITSIZE (innermode), MODE_INT, 0); diff --git a/gcc/target.def b/gcc/target.def index 4edc209..c5b8ed9 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1789,6 +1789,15 @@ The default is zero which means to not iterate over other vector sizes.", (void), default_autovectorize_vector_sizes) +/* Function to get a target mode for a vector mask. */ +DEFHOOK +(get_mask_mode, + "This hook returns mode to be used for a mask to be used for a vector\n\ +of specified @var{length} with @var{nunits} elements.", + machine_mode, + (unsigned nunits, unsigned length), + default_get_mask_mode) + /* Target builtin that implements vector gather operation. */ DEFHOOK (builtin_gather, diff --git a/gcc/targhooks.c b/gcc/targhooks.c index 7238c8f..ac01d57 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -1087,6 +1087,20 @@ default_autovectorize_vector_sizes (void) return 0; } +/* By defaults a vector of integers is used as a mask. */ + +machine_mode +default_get_mask_mode (unsigned nunits, unsigned vector_size) +{ + unsigned elem_size = vector_size / nunits; + machine_mode elem_mode + = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT); + + gcc_assert (elem_size * nunits == vector_size); + + return mode_for_vector (elem_mode, nunits); +} + /* By default, the cost model accumulates three separate costs (prologue, loop body, and epilogue) for a vectorized loop or block. So allocate an array of three unsigned ints, set it to zero, and return its address. */ diff --git a/gcc/targhooks.h b/gcc/targhooks.h index 5ae991d..cc7263f 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -100,6 +100,7 @@ default_builtin_support_vector_misalignment (machine_mode mode, int, bool); extern machine_mode default_preferred_simd_mode (machine_mode mode); extern unsigned int default_autovectorize_vector_sizes (void); +extern machine_mode default_get_mask_mode (unsigned, unsigned); extern void *default_init_cost (struct loop *); extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt, struct _stmt_vec_info *, int, diff --git a/gcc/tree.c b/gcc/tree.c index af3a6a3..946d2ad 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -9742,8 +9742,9 @@ make_vector_type (tree innertype, int nunits, machine_mode mode) if (TYPE_STRUCTURAL_EQUALITY_P (innertype)) SET_TYPE_STRUCTURAL_EQUALITY (t); - else if (TYPE_CANONICAL (innertype) != innertype - || mode != VOIDmode) + else if ((TYPE_CANONICAL (innertype) != innertype + || mode != VOIDmode) + && !VECTOR_MASK_TYPE_P (t)) TYPE_CANONICAL (t) = make_vector_type (TYPE_CANONICAL (innertype), nunits, VOIDmode); @@ -10568,6 +10569,36 @@ build_vector_type (tree innertype, int nunits) return make_vector_type (innertype, nunits, VOIDmode); } +/* Build truth vector with specified length and number of units. */ + +tree +build_truth_vector_type (unsigned nunits, unsigned vector_size) +{ + machine_mode mask_mode = targetm.vectorize.get_mask_mode (nunits, + vector_size); + + if (mask_mode == VOIDmode) + return NULL; + + return make_vector_type (boolean_type_node, nunits, mask_mode); +} + +/* Returns a vector type corresponding to a comparison of VECTYPE. */ + +tree +build_same_sized_truth_vector_type (tree vectype) +{ + if (VECTOR_MASK_TYPE_P (vectype)) + return vectype; + + unsigned HOST_WIDE_INT size = GET_MODE_SIZE (TYPE_MODE (vectype)); + + if (!size) + size = tree_to_uhwi (TYPE_SIZE_UNIT (vectype)); + + return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype), size); +} + /* Similarly, but builds a variant type with TYPE_VECTOR_OPAQUE set. */ tree @@ -11054,9 +11085,10 @@ truth_type_for (tree type) { if (TREE_CODE (type) == VECTOR_TYPE) { - tree elem = lang_hooks.types.type_for_size - (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (type))), 0); - return build_opaque_vector_type (elem, TYPE_VECTOR_SUBPARTS (type)); + if (VECTOR_MASK_TYPE_P (type)) + return type; + return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (type), + GET_MODE_SIZE (TYPE_MODE (type))); } else return boolean_type_node; diff --git a/gcc/tree.h b/gcc/tree.h index 2cd6ec4..09fb26d 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -469,6 +469,12 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int, #define VECTOR_TYPE_P(TYPE) (TREE_CODE (TYPE) == VECTOR_TYPE) +/* Nonzero if TYPE represents a vector of booleans. */ + +#define VECTOR_MASK_TYPE_P(TYPE) \ + (TREE_CODE (TYPE) == VECTOR_TYPE \ + && TREE_CODE (TREE_TYPE (TYPE)) == BOOLEAN_TYPE) + /* Nonzero if TYPE represents an integral type. Note that we do not include COMPLEX types here. Keep these checks in ascending code order. */ @@ -3820,6 +3826,8 @@ extern tree build_reference_type_for_mode (tree, machine_mode, bool); extern tree build_reference_type (tree); extern tree build_vector_type_for_mode (tree, machine_mode); extern tree build_vector_type (tree innertype, int nunits); +extern tree build_truth_vector_type (unsigned, unsigned); +extern tree build_same_sized_truth_vector_type (tree vectype); extern tree build_opaque_vector_type (tree innertype, int nunits); extern tree build_index_type (tree); extern tree build_array_type (tree, tree);