Message ID | 5638F8EF.6050607@acm.org |
---|---|
State | New |
Headers | show |
On Tue, Nov 3, 2015 at 7:11 PM, Nathan Sidwell <nathan@acm.org> wrote: > Richard, > this patch implements VRP for the 2 openacc axis internal fns I've added. > We know the position within a dimension cannot exceed that dimensions > extend. Further, if the extend is dynamic, the target backend may well know > there's a hardware-mandated maximum value. > > Hence, added a new target hook to allow the backend to specify that upper > bound, and added smarts to extract_range_basic to process the two internal > functions. > > Incidentally, this was the bit I was working on at the cauldron, which > caused me to work on the min/max range combining. > > ok for trunk? + /* Optimizing these two internal functions helps the loop + optimizer eliminate outer comparisons. Size is [1,N] + and pos is [0,N-1]. */ + { + bool is_pos = ifn_code == IFN_GOACC_DIM_POS; + tree attr = get_oacc_fn_attrib (current_function_decl); + tree arg = gimple_call_arg (stmt, 0); + int axis = TREE_INT_CST_LOW (arg); + tree dims = TREE_VALUE (attr); + + for (int ix = axis; ix--;) + dims = TREE_CHAIN (dims); + int size = TREE_INT_CST_LOW (TREE_VALUE (dims)); + + if (!size) + /* If it's dynamic, the backend might know a hardware + limitation. */ + size = targetm.goacc.dim_limit (axis); this all seems a little bit fragile and relying on implementation details? Is the attribute always present? Is the call argument always a constant that fits in a HOST_WIDE_INT (or even int here)? Are there always enough 'dims' in the tree list? Is the 'dim' value always an INTEGER_CST that fits a HOST_WIDE_INT (or even an int here)? I hope all these constraints (esp. 'int' fitting) are verified by the parser. If so I'd like to see helper functions to hide these implementation details from generic code like this. You miss to provide the known lower bound to VRP when size is 0 in the end. Just unconditioonally do tree type = TREE_TYPE (gimple_call_lhs (stmt)); set_value_range (vr, VR_RANGE, build_int_cst (type, is_pos ? 0 : 1), size ? build_int_cst (type, size - is_pos) : vrp_val_max (type), NULL); so you get at least [1, +INF] and [0, +INF] here. Thanks, Richard. > nathan
On 11/04/15 05:26, Richard Biener wrote: > On Tue, Nov 3, 2015 at 7:11 PM, Nathan Sidwell <nathan@acm.org> wrote: >> Richard, >> this patch implements VRP for the 2 openacc axis internal fns I've added. >> We know the position within a dimension cannot exceed that dimensions >> extend. Further, if the extend is dynamic, the target backend may well know >> there's a hardware-mandated maximum value. >> >> Hence, added a new target hook to allow the backend to specify that upper >> bound, and added smarts to extract_range_basic to process the two internal >> functions. >> >> Incidentally, this was the bit I was working on at the cauldron, which >> caused me to work on the min/max range combining. >> >> ok for trunk? > > + /* Optimizing these two internal functions helps the loop > + optimizer eliminate outer comparisons. Size is [1,N] > + and pos is [0,N-1]. */ > + { > + bool is_pos = ifn_code == IFN_GOACC_DIM_POS; > + tree attr = get_oacc_fn_attrib (current_function_decl); > + tree arg = gimple_call_arg (stmt, 0); > + int axis = TREE_INT_CST_LOW (arg); > + tree dims = TREE_VALUE (attr); > + > + for (int ix = axis; ix--;) > + dims = TREE_CHAIN (dims); > + int size = TREE_INT_CST_LOW (TREE_VALUE (dims)); > + > + if (!size) > + /* If it's dynamic, the backend might know a hardware > + limitation. */ > + size = targetm.goacc.dim_limit (axis); > > this all seems a little bit fragile and relying on implementation details? > Is the attribute always present? Yes. The ifns are inserted by a pass (execute_oacc_device_lower) that only operates on such functions. (I considered adding an assert, but figured the resulting segfault would be loud enough) > Is the call argument always a constant > that fits in a HOST_WIDE_INT (or even int here)? Yes. Are there always enough > 'dims' in the tree list? Yes. The oacc_device_lower pass has already validated and canonicalized the dimensions. Is the 'dim' value always an INTEGER_CST that > fits a HOST_WIDE_INT (or even an int here)? Yes. That's part of the validation. see set_oacc_fn_attrib (omp-low.c) > I hope all these constraints (esp. 'int' fitting) are verified by the parser. It's an internal function not visible the user. Generation is entirely within the compiler > If so I'd like to see helper functions to hide these implementation details > from generic code like this. ok. > > You miss to provide the known lower bound to VRP when size is 0 > in the end. Just unconditioonally do > > tree type = TREE_TYPE (gimple_call_lhs (stmt)); > set_value_range (vr, VR_RANGE, > build_int_cst (type, is_pos ? 0 : 1), > size > ? build_int_cst (type, size - is_pos) > : vrp_val_max (type), NULL); I'm confused. If size is zero, we never execute that path, and IIUC therefore never specify a range. What you suggest looks like an additional improvement though. Is that what you meant? nathan
On Wed, Nov 4, 2015 at 2:56 PM, Nathan Sidwell <nathan@acm.org> wrote: > On 11/04/15 05:26, Richard Biener wrote: >> >> On Tue, Nov 3, 2015 at 7:11 PM, Nathan Sidwell <nathan@acm.org> wrote: >>> >>> Richard, >>> this patch implements VRP for the 2 openacc axis internal fns I've added. >>> We know the position within a dimension cannot exceed that dimensions >>> extend. Further, if the extend is dynamic, the target backend may well >>> know >>> there's a hardware-mandated maximum value. >>> >>> Hence, added a new target hook to allow the backend to specify that upper >>> bound, and added smarts to extract_range_basic to process the two >>> internal >>> functions. >>> >>> Incidentally, this was the bit I was working on at the cauldron, which >>> caused me to work on the min/max range combining. >>> >>> ok for trunk? >> >> >> + /* Optimizing these two internal functions helps the loop >> + optimizer eliminate outer comparisons. Size is [1,N] >> + and pos is [0,N-1]. */ >> + { >> + bool is_pos = ifn_code == IFN_GOACC_DIM_POS; >> + tree attr = get_oacc_fn_attrib (current_function_decl); >> + tree arg = gimple_call_arg (stmt, 0); >> + int axis = TREE_INT_CST_LOW (arg); >> + tree dims = TREE_VALUE (attr); >> + >> + for (int ix = axis; ix--;) >> + dims = TREE_CHAIN (dims); >> + int size = TREE_INT_CST_LOW (TREE_VALUE (dims)); >> + >> + if (!size) >> + /* If it's dynamic, the backend might know a hardware >> + limitation. */ >> + size = targetm.goacc.dim_limit (axis); >> >> this all seems a little bit fragile and relying on implementation details? >> Is the attribute always present? > > > Yes. The ifns are inserted by a pass (execute_oacc_device_lower) that only > operates on such functions. (I considered adding an assert, but figured the > resulting segfault would be loud enough) > >> Is the call argument always a constant >> that fits in a HOST_WIDE_INT (or even int here)? > > > Yes. > > > Are there always enough >> >> 'dims' in the tree list? > > > Yes. The oacc_device_lower pass has already validated and canonicalized the > dimensions. > > > Is the 'dim' value always an INTEGER_CST that >> >> fits a HOST_WIDE_INT (or even an int here)? > > > Yes. That's part of the validation. see set_oacc_fn_attrib (omp-low.c) > >> I hope all these constraints (esp. 'int' fitting) are verified by the >> parser. > > > It's an internal function not visible the user. Generation is entirely > within the compiler > >> If so I'd like to see helper functions to hide these implementation >> details >> from generic code like this. > > > ok. > >> >> You miss to provide the known lower bound to VRP when size is 0 >> in the end. Just unconditioonally do >> >> tree type = TREE_TYPE (gimple_call_lhs (stmt)); >> set_value_range (vr, VR_RANGE, >> build_int_cst (type, is_pos ? 0 : 1), >> size >> ? build_int_cst (type, size - is_pos) >> : vrp_val_max (type), NULL); > > > I'm confused. If size is zero, we never execute that path, and IIUC > therefore never specify a range. What you suggest looks like an additional > improvement though. Is that what you meant? Yes. > nathan
2015-11-03 Nathan Sidwell <nathan@codesourcery.com> * target.def (goacc.dim_limit): New hook. * targhooks.h (default_goacc_dim_limit): Declare. * doc/tm.texi.in (TARGET_GOACC_DIM_LIMIT): Add. * doc/tm.texi: Rebuilt. * omp-low.c (default_goacc_dim_limit): New. * config/nvptx/nvptx.c (PTX_VECTOR_LENGTH, PTX_WORKER_LENGTH): New. (nvptx_goacc_dim_limit) New. (TARGET_GOACC_DIM_LIMIT): Override. * tree-vrp.c: Include omp-low.h, target.h. (extract_range_basic): Add handling for IFN_GOACC_DIM_SIZE & IFN_GOACC_DIM_POS. Index: gcc/targhooks.h =================================================================== --- gcc/targhooks.h (revision 229535) +++ gcc/targhooks.h (working copy) @@ -110,6 +110,7 @@ extern void default_destroy_cost_data (v /* OpenACC hooks. */ extern bool default_goacc_validate_dims (tree, int [], int); +extern int default_goacc_dim_limit (int); extern bool default_goacc_fork_join (gcall *, const int [], bool); /* These are here, and not in hooks.[ch], because not all users of Index: gcc/config/nvptx/nvptx.c =================================================================== --- gcc/config/nvptx/nvptx.c (revision 229535) +++ gcc/config/nvptx/nvptx.c (working copy) @@ -3248,6 +3248,10 @@ nvptx_file_end (void) } } +/* Define dimension sizes for known hardware. */ +#define PTX_VECTOR_LENGTH 32 +#define PTX_WORKER_LENGTH 32 + /* Validate compute dimensions of an OpenACC offload or routine, fill in non-unity defaults. FN_LEVEL indicates the level at which a routine might spawn a loop. It is negative for non-routines. */ @@ -3264,6 +3268,25 @@ nvptx_goacc_validate_dims (tree ARG_UNUS return changed; } +/* Return maximum dimension size, or zero for unbounded. */ + +static int +nvptx_goacc_dim_limit (int axis) +{ + switch (axis) + { + case GOMP_DIM_WORKER: + return PTX_WORKER_LENGTH; + + case GOMP_DIM_VECTOR: + return PTX_VECTOR_LENGTH; + + default: + break; + } + return 0; +} + /* Determine whether fork & joins are needed. */ static bool @@ -3376,6 +3399,9 @@ nvptx_goacc_fork_join (gcall *call, cons #undef TARGET_GOACC_VALIDATE_DIMS #define TARGET_GOACC_VALIDATE_DIMS nvptx_goacc_validate_dims +#undef TARGET_GOACC_DIM_LIMIT +#define TARGET_GOACC_DIM_LIMIT nvptx_goacc_dim_limit + #undef TARGET_GOACC_FORK_JOIN #define TARGET_GOACC_FORK_JOIN nvptx_goacc_fork_join Index: gcc/omp-low.c =================================================================== --- gcc/omp-low.c (revision 229535) +++ gcc/omp-low.c (working copy) @@ -19380,6 +19380,18 @@ default_goacc_validate_dims (tree ARG_UN return changed; } +/* Default dimension bound is unknown on accelerator and 1 on host. */ + +int +default_goacc_dim_limit (int ARG_UNUSED (axis)) +{ +#ifdef ACCEL_COMPILER + return 0; +#else + return 1; +#endif +} + namespace { const pass_data pass_data_oacc_device_lower = Index: gcc/tree-vrp.c =================================================================== --- gcc/tree-vrp.c (revision 229535) +++ gcc/tree-vrp.c (working copy) @@ -58,8 +58,8 @@ along with GCC; see the file COPYING3. #include "tree-ssa-threadupdate.h" #include "tree-ssa-scopedtables.h" #include "tree-ssa-threadedge.h" - - +#include "omp-low.h" +#include "target.h" /* Range of values that can be associated with an SSA_NAME after VRP has executed. */ @@ -3976,7 +3976,9 @@ extract_range_basic (value_range *vr, gi else if (is_gimple_call (stmt) && gimple_call_internal_p (stmt)) { enum tree_code subcode = ERROR_MARK; - switch (gimple_call_internal_fn (stmt)) + unsigned ifn_code = gimple_call_internal_fn (stmt); + + switch (ifn_code) { case IFN_UBSAN_CHECK_ADD: subcode = PLUS_EXPR; @@ -3987,6 +3989,37 @@ extract_range_basic (value_range *vr, gi case IFN_UBSAN_CHECK_MUL: subcode = MULT_EXPR; break; + case IFN_GOACC_DIM_SIZE: + case IFN_GOACC_DIM_POS: + /* Optimizing these two internal functions helps the loop + optimizer eliminate outer comparisons. Size is [1,N] + and pos is [0,N-1]. */ + { + bool is_pos = ifn_code == IFN_GOACC_DIM_POS; + tree attr = get_oacc_fn_attrib (current_function_decl); + tree arg = gimple_call_arg (stmt, 0); + int axis = TREE_INT_CST_LOW (arg); + tree dims = TREE_VALUE (attr); + + for (int ix = axis; ix--;) + dims = TREE_CHAIN (dims); + int size = TREE_INT_CST_LOW (TREE_VALUE (dims)); + + if (!size) + /* If it's dynamic, the backend might know a hardware + limitation. */ + size = targetm.goacc.dim_limit (axis); + + if (size) + { + tree type = TREE_TYPE (gimple_call_lhs (stmt)); + + set_value_range (vr, VR_RANGE, + build_int_cst (type, is_pos ? 0 : 1), + build_int_cst (type, size - is_pos), NULL); + } + } + return; default: break; } Index: gcc/doc/tm.texi =================================================================== --- gcc/doc/tm.texi (revision 229535) +++ gcc/doc/tm.texi (working copy) @@ -5777,6 +5777,11 @@ true, if changes have been made. You mu provide dimensions larger than 1. @end deftypefn +@deftypefn {Target Hook} int TARGET_GOACC_DIM_LIMIT (int @var{axis}) +This hook should return the maximum size of a particular dimension, +or zero if unbounded. +@end deftypefn + @deftypefn {Target Hook} bool TARGET_GOACC_FORK_JOIN (gcall *@var{call}, const int *@var{dims}, bool @var{is_fork}) This hook can be used to convert IFN_GOACC_FORK and IFN_GOACC_JOIN function calls to target-specific gimple, or indicate whether they Index: gcc/doc/tm.texi.in =================================================================== --- gcc/doc/tm.texi.in (revision 229535) +++ gcc/doc/tm.texi.in (working copy) @@ -4262,6 +4262,8 @@ address; but often a machine-dependent @hook TARGET_GOACC_VALIDATE_DIMS +@hook TARGET_GOACC_DIM_LIMIT + @hook TARGET_GOACC_FORK_JOIN @node Anchored Addresses Index: gcc/target.def =================================================================== --- gcc/target.def (revision 229535) +++ gcc/target.def (working copy) @@ -1659,6 +1659,13 @@ bool, (tree decl, int *dims, int fn_leve default_goacc_validate_dims) DEFHOOK +(dim_limit, +"This hook should return the maximum size of a particular dimension,\n\ +or zero if unbounded.", +int, (int axis), +default_goacc_dim_limit) + +DEFHOOK (fork_join, "This hook can be used to convert IFN_GOACC_FORK and IFN_GOACC_JOIN\n\ function calls to target-specific gimple, or indicate whether they\n\