From patchwork Thu Jan 4 13:58:41 2018
X-Patchwork-Submitter: "Richard Earnshaw (lists)"
X-Patchwork-Id: 855598
From: Richard Earnshaw
To: gcc-patches@gcc.gnu.org
Cc: Richard Earnshaw
Subject: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
Date: Thu, 4 Jan 2018 13:58:41 +0000
Message-Id: <3f3375cf241d099400b7f90c7c6e42c2e140734c.1515072356.git.Richard.Earnshaw@arm.com>

This patch adds generic support for the new builtin
__builtin_load_no_speculate.
It provides the overloading of the different access sizes and a default fall-back expansion for targets that do not support a mechanism for inhibiting speculation. * builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): New builtin type signature. (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise. (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise. (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise. (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise. * builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin. (BUILT_IN_LOAD_NO_SPECULATE_1): Likewise. (BUILT_IN_LOAD_NO_SPECULATE_2): Likewise. (BUILT_IN_LOAD_NO_SPECULATE_4): Likewise. (BUILT_IN_LOAD_NO_SPECULATE_8): Likewise. (BUILT_IN_LOAD_NO_SPECULATE_16): Likewise. * target.def (inhibit_load_speculation): New hook. * doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to documentation. * doc/tm.texi: Regenerated. * doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE. * doc/extend.texi: Document __builtin_load_no_speculate. * c-family/c-common.c (load_no_speculate_resolve_size): New function. (load_no_speculate_resolve_params): New function. (load_no_speculate_resolve_return): New function. (resolve_overloaded_builtin): Handle overloading __builtin_load_no_speculate. * builtins.c (expand_load_no_speculate): New function. (expand_builtin): Handle new no-speculation builtins. * targhooks.h (default_inhibit_load_speculation): Declare. * targhooks.c (default_inhibit_load_speculation): New function. --- gcc/builtin-types.def | 16 +++++ gcc/builtins.c | 99 ++++++++++++++++++++++++++ gcc/builtins.def | 22 ++++++ gcc/c-family/c-common.c | 164 ++++++++++++++++++++++++++++++++++++++++++++ gcc/c-family/c-cppbuiltin.c | 5 +- gcc/doc/cpp.texi | 4 ++ gcc/doc/extend.texi | 53 ++++++++++++++ gcc/doc/tm.texi | 6 ++ gcc/doc/tm.texi.in | 2 + gcc/target.def | 20 ++++++ gcc/targhooks.c | 69 +++++++++++++++++++ gcc/targhooks.h | 3 + 12 files changed, 462 insertions(+), 1 deletion(-) diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def index bb50e60..259aacd 100644 --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -785,6 +785,22 @@ DEF_FUNCTION_TYPE_VAR_3 (BT_FN_SSIZE_STRING_SIZE_CONST_STRING_VAR, DEF_FUNCTION_TYPE_VAR_3 (BT_FN_INT_FILEPTR_INT_CONST_STRING_VAR, BT_INT, BT_FILEPTR, BT_INT, BT_CONST_STRING) +DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, + BT_I1, BT_CONST_VOLATILE_PTR, BT_CONST_VOLATILE_PTR, + BT_CONST_VOLATILE_PTR) +DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, + BT_I2, BT_CONST_VOLATILE_PTR, BT_CONST_VOLATILE_PTR, + BT_CONST_VOLATILE_PTR) +DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, + BT_I4, BT_CONST_VOLATILE_PTR, BT_CONST_VOLATILE_PTR, + BT_CONST_VOLATILE_PTR) +DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, + BT_I8, BT_CONST_VOLATILE_PTR, BT_CONST_VOLATILE_PTR, + BT_CONST_VOLATILE_PTR) +DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, + BT_I16, BT_CONST_VOLATILE_PTR, BT_CONST_VOLATILE_PTR, + BT_CONST_VOLATILE_PTR) + DEF_FUNCTION_TYPE_VAR_4 (BT_FN_INT_STRING_INT_SIZE_CONST_STRING_VAR, BT_INT, BT_STRING, BT_INT, BT_SIZE, BT_CONST_STRING) diff --git a/gcc/builtins.c b/gcc/builtins.c index 98eb804..1bdbc64 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -6602,6 +6602,97 @@ expand_stack_save (void) return ret; } +/* Expand a call to __builtin_load_no_speculate_. MODE represents the + size of the first argument to that call. 
We emit a warning if the + result isn't used (IGNORE != 0), since the implementation might + rely on the value being used to correctly inhibit speculation. */ +static rtx +expand_load_no_speculate (machine_mode mode, tree exp, rtx target, int ignore) +{ + rtx ptr, op0, op1, op2, op3, op4; + unsigned nargs = call_expr_nargs (exp); + + if (ignore) + { + warning_at (input_location, 0, + "result of __builtin_load_no_speculate must be used to " + "ensure correct operation"); + target = NULL; + } + + tree arg0 = CALL_EXPR_ARG (exp, 0); + tree arg1 = CALL_EXPR_ARG (exp, 1); + tree arg2 = CALL_EXPR_ARG (exp, 2); + + ptr = expand_expr (arg0, NULL_RTX, ptr_mode, EXPAND_SUM); + op0 = validize_mem (gen_rtx_MEM (mode, convert_memory_address (Pmode, ptr))); + + set_mem_align (op0, MAX (GET_MODE_ALIGNMENT (mode), + get_pointer_alignment (arg0))); + set_mem_alias_set (op0, get_alias_set (TREE_TYPE (TREE_TYPE (arg0)))); + + /* Mark the memory access as volatile. We don't want the optimizers to + move it or otherwise substitue an alternative value. */ + MEM_VOLATILE_P (op0) = 1; + + if (integer_zerop (tree_strip_nop_conversions (arg1))) + op1 = NULL; + else + { + op1 = expand_normal (arg1); + if (GET_MODE (op1) != ptr_mode && GET_MODE (op1) != VOIDmode) + op1 = convert_modes (ptr_mode, VOIDmode, op1, + TYPE_UNSIGNED (TREE_TYPE (arg1))); + } + + if (integer_zerop (tree_strip_nop_conversions (arg2))) + op2 = NULL; + else + { + op2 = expand_normal (arg2); + if (GET_MODE (op2) != ptr_mode && GET_MODE (op2) != VOIDmode) + op2 = convert_modes (ptr_mode, VOIDmode, op2, + TYPE_UNSIGNED (TREE_TYPE (arg2))); + } + + if (nargs > 3) + { + tree arg3 = CALL_EXPR_ARG (exp, 3); + op3 = expand_normal (arg3); + if (CONST_INT_P (op3)) + op3 = gen_int_mode (INTVAL (op3), mode); + else if (GET_MODE (op3) != mode && GET_MODE (op3) != VOIDmode) + op3 = convert_modes (mode, VOIDmode, op3, + TYPE_UNSIGNED (TREE_TYPE (arg3))); + } + else + op3 = const0_rtx; + + if (nargs > 4) + { + tree arg4 = CALL_EXPR_ARG (exp, 4); + op4 = expand_normal (arg4); + if (GET_MODE (op4) != ptr_mode && GET_MODE (op4) != VOIDmode) + op4 = convert_modes (ptr_mode, VOIDmode, op4, + TYPE_UNSIGNED (TREE_TYPE (arg4))); + } + else + op4 = ptr; + + if (op1 == NULL && op2 == NULL) + { + error_at (input_location, + "at least one speculation bound must be non-NULL"); + /* Ensure we don't crash later. */ + op1 = op4; + } + + if (target == NULL) + target = gen_reg_rtx (mode); + + return targetm.inhibit_load_speculation (mode, target, op0, op1, op2, op3, + op4); +} /* Expand an expression EXP that calls a built-in function, with result going to TARGET if that's convenient @@ -7732,6 +7823,14 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode, folding. */ break; + case BUILT_IN_LOAD_NO_SPECULATE_1: + case BUILT_IN_LOAD_NO_SPECULATE_2: + case BUILT_IN_LOAD_NO_SPECULATE_4: + case BUILT_IN_LOAD_NO_SPECULATE_8: + case BUILT_IN_LOAD_NO_SPECULATE_16: + mode = get_builtin_sync_mode (fcode - BUILT_IN_LOAD_NO_SPECULATE_1); + return expand_load_no_speculate (mode, exp, target, ignore); + default: /* just do library call, if unknown builtin */ break; } diff --git a/gcc/builtins.def b/gcc/builtins.def index 671097e..761c063 100644 --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -1017,6 +1017,28 @@ DEF_BUILTIN (BUILT_IN_EMUTLS_REGISTER_COMMON, true, true, true, ATTR_NOTHROW_LEAF_LIST, false, !targetm.have_tls) +/* Suppressing speculation. Users are expected to use the first (N) + variant, which will be translated internally into one of the other + types. 
*/ +DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_N, "load_no_speculate", + BT_FN_VOID_VAR, ATTR_NULL) + +DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_1, "load_no_speculate_1", + BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, + ATTR_NULL) +DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_2, "load_no_speculate_2", + BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, + ATTR_NULL) +DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_4, "load_no_speculate_4", + BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, + ATTR_NULL) +DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_8, "load_no_speculate_8", + BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, + ATTR_NULL) +DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_16, "load_no_speculate_16", + BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, + ATTR_NULL) + /* Exception support. */ DEF_BUILTIN_STUB (BUILT_IN_UNWIND_RESUME, "__builtin_unwind_resume") DEF_BUILTIN_STUB (BUILT_IN_CXA_END_CLEANUP, "__builtin_cxa_end_cleanup") diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 197a71f..c213ffd 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -6456,6 +6456,146 @@ builtin_type_for_size (int size, bool unsignedp) return type ? type : error_mark_node; } +/* Work out the size of the object pointed to by the first arguement + of a call to __builtin_load_no_speculate. Only pointers to + integral types and pointers are permitted. Return 0 if the + arguement type is not supported of if the size is too large. */ +static int +load_no_speculate_resolve_size (tree function, vec *params) +{ + /* Type of the argument. */ + tree type; + int size; + + if (vec_safe_is_empty (params)) + { + error ("too few arguments to function %qE", function); + return 0; + } + + type = TREE_TYPE ((*params)[0]); + + if (!POINTER_TYPE_P (type)) + goto incompatible; + + type = TREE_TYPE (type); + + if (TREE_CODE (type) == ARRAY_TYPE) + { + /* Force array-to-pointer decay for c++. */ + gcc_assert (c_dialect_cxx()); + (*params)[0] = default_conversion ((*params)[0]); + type = TREE_TYPE ((*params)[0]); + } + + if (!INTEGRAL_TYPE_P (type) && !POINTER_TYPE_P (type)) + goto incompatible; + + if (!COMPLETE_TYPE_P (type)) + goto incompatible; + + size = tree_to_uhwi (TYPE_SIZE_UNIT (type)); + if (size == 1 || size == 2 || size == 4 || size == 8 || size == 16) + return size; + + incompatible: + /* Issue the diagnostic only if the argument is valid, otherwise + it would be redundant at best and could be misleading. */ + if (type != error_mark_node) + error ("operand type %qT is incompatible with argument %d of %qE", + type, 1, function); + + return 0; +} + +/* Validate and coerce PARAMS, the arguments to ORIG_FUNCTION to fit + the prototype for FUNCTION. The first three arguments are + mandatory, but shouldn't need casting as they are all pointers and + we've already established that the first argument is a pointer to a + permitted type. The two optional arguments may need to be + fabricated if they have been omitted. */ +static bool +load_no_speculate_resolve_params (location_t loc, tree orig_function, + tree function, + vec *params) +{ + function_args_iterator iter; + + function_args_iter_init (&iter, TREE_TYPE (function)); + tree arg_type = function_args_iter_cond (&iter); + unsigned parmnum; + tree val; + + if (params->length () < 3) + { + error_at (loc, "too few arguments to function %qE", orig_function); + return false; + } + else if (params->length () > 5) + { + error_at (loc, "too many arguments to function %qE", orig_function); + return false; + } + + /* Required arguments. 
These must all be pointers. */ + for (parmnum = 0; parmnum < 3; parmnum++) + { + arg_type = function_args_iter_cond (&iter); + val = (*params)[parmnum]; + if (TREE_CODE (TREE_TYPE (val)) == ARRAY_TYPE) + val = default_conversion (val); + if (TREE_CODE (TREE_TYPE (val)) != POINTER_TYPE) + goto bad_arg; + (*params)[parmnum] = val; + } + + /* Optional integer value. */ + arg_type = function_args_iter_cond (&iter); + if (params->length () >= 4) + { + val = (*params)[parmnum]; + val = convert (arg_type, val); + (*params)[parmnum] = val; + } + else + return true; + + /* Optional pointer to compare against. */ + parmnum = 4; + arg_type = function_args_iter_cond (&iter); + if (params->length () == 5) + { + val = (*params)[parmnum]; + if (TREE_CODE (TREE_TYPE (val)) == ARRAY_TYPE) + val = default_conversion (val); + if (TREE_CODE (TREE_TYPE (val)) != POINTER_TYPE) + goto bad_arg; + (*params)[parmnum] = val; + } + + return true; + + bad_arg: + error_at (loc, "expecting argument of type %qT for argument %u", arg_type, + parmnum); + return false; +} + +/* Cast the result of the builtin back to the type pointed to by the + first argument, preserving any qualifiers that it might have. */ +static tree +load_no_speculate_resolve_return (tree first_param, tree result) +{ + tree ptype = TREE_TYPE (TREE_TYPE (first_param)); + tree rtype = TREE_TYPE (result); + ptype = TYPE_MAIN_VARIANT (ptype); + + if (tree_int_cst_equal (TYPE_SIZE (ptype), TYPE_SIZE (rtype))) + return convert (ptype, result); + + return result; +} + /* A helper function for resolve_overloaded_builtin in resolving the overloaded __sync_ builtins. Returns a positive power of 2 if the first operand of PARAMS is a pointer to a supported data type. @@ -7110,6 +7250,30 @@ resolve_overloaded_builtin (location_t loc, tree function, /* Handle BUILT_IN_NORMAL here. */ switch (orig_code) { + case BUILT_IN_LOAD_NO_SPECULATE_N: + { + int n = load_no_speculate_resolve_size (function, params); + tree new_function, first_param, result; + enum built_in_function fncode; + + if (n == 0) + return error_mark_node; + + fncode = (enum built_in_function)((int)orig_code + exact_log2 (n) + 1); + new_function = builtin_decl_explicit (fncode); + first_param = (*params)[0]; + if (!load_no_speculate_resolve_params (loc, function, new_function, + params)) + return error_mark_node; + + result = build_function_call_vec (loc, vNULL, new_function, params, + NULL); + if (result == error_mark_node) + return result; + + return load_no_speculate_resolve_return (first_param, result); + } + case BUILT_IN_ATOMIC_EXCHANGE: case BUILT_IN_ATOMIC_COMPARE_EXCHANGE: case BUILT_IN_ATOMIC_LOAD: diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c index 9e33aed..fb06ee7 100644 --- a/gcc/c-family/c-cppbuiltin.c +++ b/gcc/c-family/c-cppbuiltin.c @@ -1361,7 +1361,10 @@ c_cpp_builtins (cpp_reader *pfile) cpp_define (pfile, "__WCHAR_UNSIGNED__"); cpp_atomic_builtins (pfile); - + + /* Show support for __builtin_load_no_speculate (). */ + cpp_define (pfile, "__HAVE_LOAD_NO_SPECULATE"); + #ifdef DWARF2_UNWIND_INFO if (dwarf2out_do_cfi_asm ()) cpp_define (pfile, "__GCC_HAVE_DWARF2_CFI_ASM"); diff --git a/gcc/doc/cpp.texi b/gcc/doc/cpp.texi index 94437d5..9dca2e2 100644 --- a/gcc/doc/cpp.texi +++ b/gcc/doc/cpp.texi @@ -2381,6 +2381,10 @@ If GCC cannot determine the current date, it will emit a warning message These macros are defined when the target processor supports atomic compare and swap operations on operands 1, 2, 4, 8 or 16 bytes in length, respectively. 
+@item __HAVE_LOAD_NO_SPECULATE +This macro is defined with the value 1 to show that this version of GCC +supports @code{__builtin_load_no_speculate}. + @item __GCC_HAVE_DWARF2_CFI_ASM This macro is defined when the compiler is emitting DWARF CFI directives to the assembler. When this is defined, it is possible to emit those same diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 2a553ad..7a71f34 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -10968,6 +10968,7 @@ the built-in function returns -1. @findex __builtin_islessequal @findex __builtin_islessgreater @findex __builtin_isunordered +@findex __builtin_load_no_speculate @findex __builtin_powi @findex __builtin_powif @findex __builtin_powil @@ -11614,6 +11615,58 @@ check its compatibility with @var{size}. @end deftypefn +@deftypefn {Built-in Function} @var{type} __builtin_load_no_speculate (const volatile @var{type} *ptr, const volatile void *lower_bound, const volatile void *upper_bound, @var{type} failval, const volatile void *cmpptr) +The @code{__builtin_load_no_speculation} function provides a means to +limit the extent to which a processor can continue speculative +execution with the result of loading a value stored at @var{ptr}. +Logically, the builtin implements the following behavior: + +@smallexample +inline @var{type} __builtin_load_no_speculate + (const volatile @var{type} *ptr, + const volatile void *lower_bound, + const volatile void *upper_bound, + @var{type} failval, + const volatile void *cmpptr) +@{ + @var{type} result; + if (cmpptr >= lower_bound && cmpptr < upper_bound) + result = *ptr; + else + result = failval; + return result; +@} +@end smallexample + +but in addition target-specific code will be inserted to ensure that +speculation using @code{*ptr} cannot occur when @var{cmpptr} lies outside of +the specified bounds. + +@var{type} may be any integral type (signed, or unsigned, @code{char}, +@code{short}, @code{int}, etc) or a pointer to any type. + +The final argument, @var{cmpptr}, may be omitted. If you do this, +then the compiler will use @var{ptr} for comparison against the upper +and lower bounds. Furthermore, if you omit @var{cmpptr}, you may also +omit @var{failval} and the compiler will use @code{(@var{type})0} for +the out-of-bounds result. + +Additionally, when it is know that one of the bounds can never fail, +you can use a literal @code{NULL} argument and the compiler will +generate code that only checks the other boundary condition. It is generally +only safe to do this when your code contains a loop construct where the only +boundary of interest is the one beyond the termination condition. You cannot +omit both boundary conditions in this way. + +The logical behaviour of the builtin is supported for all architectures, but +on machines where target-specific support for inhibiting speculation is not +implemented, or not necessary, the compiler will emit a warning. + +The pre-processor macro @code{__HAVE_LOAD_NO_SPECULATE} is defined with the +value 1 on all implementations of GCC that support this builtin. + +@end deftypefn + @deftypefn {Built-in Function} int __builtin_types_compatible_p (@var{type1}, @var{type2}) You can use the built-in function @code{__builtin_types_compatible_p} to diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 9793a0e..7309ccb 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11922,6 +11922,12 @@ maintainer is familiar with. 
@end defmac +@deftypefn {Target Hook} rtx TARGET_INHIBIT_LOAD_SPECULATION (machine_mode @var{mode}, rtx @var{result}, rtx @var{mem}, rtx @var{lower_bound}, rtx @var{upper_bound}, rtx @var{fail_result}, rtx @var{cmpptr}) +Generate a target-specific code sequence that implements @code{__builtin_load_no_speculate}, returning the result in @var{result}. If @var{cmpptr} is greater than, or equal to, @var{lower_bound} and less than @var{upper_bound} then @var{mem}, a @code{MEM} of type @var{mode}, should be returned, otherwise @var{failval} should be returned. The expansion must ensure that subsequent speculation by the processor using the @var{mem} cannot occur if @var{cmpptr} lies outside of the specified bounds. At most one of @var{lower_bound} and @var{upper_bound} can be @code{NULL_RTX}, indicating that code for that bounds check should not be generated. + + The default implementation implements the logic of the builtin but cannot provide the target-specific code necessary to inhibit speculation. A warning will be emitted to that effect. +@end deftypefn + @deftypefn {Target Hook} void TARGET_RUN_TARGET_SELFTESTS (void) If selftests are enabled, run any selftests for this target. @end deftypefn diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 7bcfb37..d34e4bf 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -8075,4 +8075,6 @@ maintainer is familiar with. @end defmac +@hook TARGET_INHIBIT_LOAD_SPECULATION + @hook TARGET_RUN_TARGET_SELFTESTS diff --git a/gcc/target.def b/gcc/target.def index e9eacc8..375eb0a 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -4214,6 +4214,26 @@ DEFHOOK hook_bool_void_true) DEFHOOK +(inhibit_load_speculation, + "Generate a target-specific code sequence that implements\ + @code{__builtin_load_no_speculate}, returning the result in @var{result}.\ + If @var{cmpptr} is greater than, or equal to, @var{lower_bound} and less\ + than @var{upper_bound} then @var{mem}, a @code{MEM} of type @var{mode},\ + should be returned, otherwise @var{failval} should be returned. The\ + expansion must ensure that subsequent speculation by the processor using\ + the @var{mem} cannot occur if @var{cmpptr} lies outside of the specified\ + bounds. At most one of @var{lower_bound} and @var{upper_bound} can be\ + @code{NULL_RTX}, indicating that code for that bounds check should not be\ + generated.\n\ + \n\ + The default implementation implements the logic of the builtin\ + but cannot provide the target-specific code necessary to inhibit\ + speculation. A warning will be emitted to that effect.", + rtx, (machine_mode mode, rtx result, rtx mem, rtx lower_bound, + rtx upper_bound, rtx fail_result, rtx cmpptr), + default_inhibit_load_speculation) + +DEFHOOK (can_use_doloop_p, "Return true if it is possible to use low-overhead loops (@code{doloop_end}\n\ and @code{doloop_begin}) for a particular loop. @var{iterations} gives the\n\ diff --git a/gcc/targhooks.c b/gcc/targhooks.c index 653567c..24d9c7b 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3. If not see #include "params.h" #include "real.h" #include "langhooks.h" +#include "dojump.h" bool default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED, @@ -2307,4 +2308,72 @@ default_stack_clash_protection_final_dynamic_probe (rtx residual ATTRIBUTE_UNUSE return 0; } +/* Default implementation of the load-and-inhibit-speculation builtin. 
+ This version does not have, or know of, the target-specific + mechanisms necessary to inhibit speculation, so it simply emits a + code sequence that implements the architectural aspects of the + builtin. */ +rtx +default_inhibit_load_speculation (machine_mode mode ATTRIBUTE_UNUSED, + rtx result, + rtx mem, + rtx lower_bound, + rtx upper_bound, + rtx fail_result, + rtx cmpptr) +{ + rtx_code_label *done_label = gen_label_rtx (); + rtx_code_label *inrange_label = gen_label_rtx (); + warning_at + (input_location, 0, + "this target does not support anti-speculation operations. " + "Your program will still execute correctly, but speculation " + "will not be inhibited"); + + /* We don't have any despeculation barriers, but if we mark the branch + probabilities to be always predicting the out-of-bounds path, then + there's a higher chance that the compiler will order code so that + static prediction will fall through a safe path. */ + if (lower_bound == NULL) + { + do_compare_rtx_and_jump (cmpptr, upper_bound, LTU, true, ptr_mode, + NULL, NULL, inrange_label, + profile_probability::never ()); + emit_move_insn (result, fail_result); + emit_jump (done_label); + emit_label (inrange_label); + emit_move_insn (result, mem); + emit_label (done_label); + } + else if (upper_bound == NULL) + { + do_compare_rtx_and_jump (cmpptr, lower_bound, GEU, true, ptr_mode, + NULL, NULL, inrange_label, + profile_probability::never ()); + emit_move_insn (result, fail_result); + emit_jump (done_label); + emit_label (inrange_label); + emit_move_insn (result, mem); + emit_label (done_label); + } + else + { + rtx_code_label *oob_label = gen_label_rtx (); + do_compare_rtx_and_jump (cmpptr, lower_bound, LTU, true, ptr_mode, + NULL, NULL, oob_label, + profile_probability::always ()); + do_compare_rtx_and_jump (cmpptr, upper_bound, GEU, true, ptr_mode, + NULL, NULL, inrange_label, + profile_probability::never ()); + emit_label (oob_label); + emit_move_insn (result, fail_result); + emit_jump (done_label); + emit_label (inrange_label); + emit_move_insn (result, mem); + emit_label (done_label); + } + + return result; +} + #include "gt-targhooks.h" diff --git a/gcc/targhooks.h b/gcc/targhooks.h index e753e58..c55b43f 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -286,4 +286,7 @@ extern enum flt_eval_method default_excess_precision (enum excess_precision_type ATTRIBUTE_UNUSED); extern bool default_stack_clash_protection_final_dynamic_probe (rtx); +extern rtx +default_inhibit_load_speculation (machine_mode, rtx, rtx, rtx, rtx, rtx, rtx); + #endif /* GCC_TARGHOOKS_H */ From patchwork Thu Jan 4 13:58:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Richard Earnshaw (lists)" X-Patchwork-Id: 855597 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-470135-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="TXFN1xvy"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zC8YL6s8kz9s81 for ; Fri, 5 Jan 2018 00:59:18 +1100 (AEDT) 
From: Richard Earnshaw
To: gcc-patches@gcc.gnu.org
Cc: Richard Earnshaw
Subject: [PATCH 2/3] [aarch64] Implement support for __builtin_load_no_speculate.
Date: Thu, 4 Jan 2018 13:58:42 +0000
Message-Id: <1197003135ea0b0aeecd038a07db6b787a7db6f6.1515072356.git.Richard.Earnshaw@arm.com>

This patch implements support for __builtin_load_no_speculate on
AArch64.  On this architecture we inhibit speculation by emitting a
combination of CSEL and a hint instruction that ensures the CSEL is
fully resolved when the operands to the CSEL may involve a speculative
load.

        * config/aarch64/aarch64.c (aarch64_print_operand): Handle zero
        passed to 'H' operand qualifier.
        (aarch64_inhibit_load_speculation): New function.
        (TARGET_INHIBIT_LOAD_SPECULATION): Redefine.
        * config/aarch64/aarch64.md (UNSPECV_NOSPECULATE): New
        unspec_volatile code.
        (nospeculate, nospeculateti): New patterns.
---
 gcc/config/aarch64/aarch64.c  | 92 +++++++++++++++++++++++++++++++++++++++++++
 gcc/config/aarch64/aarch64.md | 28 +++++++++++++
 2 files changed, 120 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 93e9d9f9..7410921 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5315,6 +5315,14 @@ aarch64_print_operand (FILE *f, rtx x, int code)
       break;

     case 'H':
+      /* Print the higher numbered register of a pair (TImode) of regs.
*/ + if (x == const0_rtx + || (CONST_DOUBLE_P (x) && aarch64_float_const_zero_rtx_p (x))) + { + asm_fprintf (f, "xzr"); + break; + } + if (!REG_P (x) || !GP_REGNUM_P (REGNO (x) + 1)) { output_operand_lossage ("invalid operand for '%%%c'", code); @@ -15115,6 +15123,87 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn) } } +static rtx +aarch64_inhibit_load_speculation (machine_mode mode, rtx result, rtx mem, + rtx lower_bound, rtx upper_bound, + rtx fail_result, rtx cmpptr) +{ + rtx cond, comparison; + rtx target = gen_reg_rtx (mode); + rtx tgt2 = result; + + if (!register_operand (cmpptr, ptr_mode)) + cmpptr = force_reg (ptr_mode, cmpptr); + + if (!register_operand (tgt2, mode)) + tgt2 = gen_reg_rtx (mode); + + if (upper_bound == NULL) + { + if (!register_operand (lower_bound, ptr_mode)) + lower_bound = force_reg (ptr_mode, lower_bound); + + cond = aarch64_gen_compare_reg (LTU, cmpptr, lower_bound); + comparison = gen_rtx_LTU (VOIDmode, cond, const0_rtx); + } + else if (lower_bound == NULL) + { + if (!register_operand (upper_bound, ptr_mode)) + upper_bound = force_reg (ptr_mode, upper_bound); + + cond = aarch64_gen_compare_reg (GEU, cmpptr, upper_bound); + comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx); + } + else + { + if (!register_operand (lower_bound, ptr_mode)) + lower_bound = force_reg (ptr_mode, lower_bound); + + if (!register_operand (upper_bound, ptr_mode)) + upper_bound = force_reg (ptr_mode, upper_bound); + + rtx cond1 = aarch64_gen_compare_reg (GEU, cmpptr, lower_bound); + rtx comparison1 = gen_rtx_GEU (ptr_mode, cond1, const0_rtx); + rtx failcond = GEN_INT (aarch64_get_condition_code (comparison1)^1); + cond = gen_rtx_REG (CCmode, CC_REGNUM); + if (ptr_mode == SImode) + emit_insn (gen_ccmpsi (cond1, cond, cmpptr, upper_bound, comparison1, + failcond)); + else + emit_insn (gen_ccmpdi (cond1, cond, cmpptr, upper_bound, comparison1, + failcond)); + comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx); + } + + rtx_code_label *label = gen_label_rtx (); + emit_jump_insn (gen_condjump (comparison, cond, label)); + emit_move_insn (target, mem); + emit_label (label); + + insn_code icode; + + switch (mode) + { + case E_QImode: icode = CODE_FOR_nospeculateqi; break; + case E_HImode: icode = CODE_FOR_nospeculatehi; break; + case E_SImode: icode = CODE_FOR_nospeculatesi; break; + case E_DImode: icode = CODE_FOR_nospeculatedi; break; + case E_TImode: icode = CODE_FOR_nospeculateti; break; + default: + gcc_unreachable (); + } + + if (! insn_operand_matches (icode, 4, fail_result)) + fail_result = force_reg (mode, fail_result); + + emit_insn (GEN_FCN (icode) (tgt2, comparison, cond, target, fail_result)); + + if (tgt2 != result) + emit_move_insn (result, tgt2); + + return result; +} + /* Target-specific selftests. */ #if CHECKING_P @@ -15554,6 +15643,9 @@ aarch64_libgcc_floating_mode_supported_p #undef TARGET_CONSTANT_ALIGNMENT #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment +#undef TARGET_INHIBIT_LOAD_SPECULATION +#define TARGET_INHIBIT_LOAD_SPECULATION aarch64_inhibit_load_speculation + #if CHECKING_P #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index f1e2a07..1a1f398 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -153,6 +153,7 @@ UNSPECV_SET_FPSR ; Represent assign of FPSR content. UNSPECV_BLOCKAGE ; Represent a blockage UNSPECV_PROBE_STACK_RANGE ; Represent stack range probing. 
+ UNSPECV_NOSPECULATE ; Inhibit speculation ] ) @@ -5797,6 +5798,33 @@ DONE; }) +(define_insn "nospeculate" + [(set (match_operand:ALLI 0 "register_operand" "=r") + (unspec_volatile:ALLI + [(match_operator 1 "aarch64_comparison_operator" + [(match_operand 2 "cc_register" "") (const_int 0)]) + (match_operand:ALLI 3 "register_operand" "r") + (match_operand:ALLI 4 "aarch64_reg_or_zero" "rZ")] + UNSPECV_NOSPECULATE))] + "" + "csel\\t%0, %3, %4, %M1\;hint\t#0x14\t// CSDB" + [(set_attr "type" "csel") + (set_attr "length" "8")] +) + +(define_insn "nospeculateti" + [(set (match_operand:TI 0 "register_operand" "=r") + (unspec_volatile:TI + [(match_operator 1 "aarch64_comparison_operator" + [(match_operand 2 "cc_register" "") (const_int 0)]) + (match_operand:TI 3 "register_operand" "r") + (match_operand:TI 4 "aarch64_reg_or_zero" "rZ")] + UNSPECV_NOSPECULATE))] + "" + "csel\\t%x0, %x3, %x4, %M1\;csel\\t%H0, %H3, %H4, %M1\;hint\t#0x14\t// CSDB" + [(set_attr "type" "csel") + (set_attr "length" "12")] +) ;; AdvSIMD Stuff (include "aarch64-simd.md") From patchwork Thu Jan 4 13:58:43 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Richard Earnshaw (lists)" X-Patchwork-Id: 855599 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-470137-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="th+APBDX"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zC8Yz4wGHz9s81 for ; Fri, 5 Jan 2018 00:59:51 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :in-reply-to:references:mime-version:content-type; q=dns; s= default; b=nAlSH1TYmq0dknlYc8useH52gyuzWyWpACnBLYfbtYHJ+nYbmI1X7 gW6Clzy53t/YrJiR+d0mgDxa4MgTMWvCiQSAgu0Dln5g8ag1GZAX/zZEqlcLGgNL ayHgQVkgNd89JHIWhxIp21DfJrheBqneoJkDLl1K1gjQGOSAUWf/K0= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references :in-reply-to:references:mime-version:content-type; s=default; bh=oOfvSiRzhd2Mna63x8qBWucNTZc=; b=th+APBDXUALoqZfO5qoYjNy+9W9N Hjnedn3l6rj5zwokWEk+UPSdYVDW3gkLQm2zlKhZVaQI3HqzFFnRoKRO7BdgI7zj 4xsfwCDp6SkC2XzVKj0KUsMQ88pPcgjvAsTm13+qbUBJk6zRkQv+ktqRi9kIsfI2 OI0bTCYnqB9+d2A= Received: (qmail 88286 invoked by alias); 4 Jan 2018 13:59:01 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 88036 invoked by uid 89); 4 Jan 2018 13:58:59 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, SPF_PASS, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=multimedia, Multimedia, vec-common.md, 
From: Richard Earnshaw
To: gcc-patches@gcc.gnu.org
Cc: Richard Earnshaw
Subject: [PATCH 3/3] [arm] Implement support for the de-speculation intrinsic
Date: Thu, 4 Jan 2018 13:58:43 +0000
Message-Id: <26f1fd261a467d6b43e1d77085dfb0e169782cf7.1515072356.git.Richard.Earnshaw@arm.com>

This patch implements despeculation on ARM.  We only support it when
generating ARM or Thumb2 code (we need conditional execution), and only
for sizes up to DImode.  For unsupported cases we fall back to the
generic code generation sequence so that a suitable failure warning is
emitted.

        * config/arm/arm.c (arm_inhibit_load_speculation): New function.
        (TARGET_INHIBIT_LOAD_SPECULATION): Redefine.
        * config/arm/unspecs.md (VUNSPEC_NOSPECULATE): New unspec_volatile
        code.
        * config/arm/arm.md (cmp_ior): Make this pattern callable.
        (nospeculate, nospeculatedi): New patterns.
---
 gcc/config/arm/arm.c      | 107 ++++++++++++++++++++++++++++++++++++++++++++++
 gcc/config/arm/arm.md     |  40 ++++++++++++++++-
 gcc/config/arm/unspecs.md |   1 +
 3 files changed, 147 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 11e35ad..d1fc0b9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -321,6 +321,8 @@ static unsigned int arm_hard_regno_nregs (unsigned int, machine_mode);
 static bool arm_hard_regno_mode_ok (unsigned int, machine_mode);
 static bool arm_modes_tieable_p (machine_mode, machine_mode);
 static HOST_WIDE_INT arm_constant_alignment (const_tree, HOST_WIDE_INT);
+static rtx arm_inhibit_load_speculation (machine_mode, rtx, rtx, rtx, rtx,
+                                         rtx, rtx);

 /* Table of machine attributes.  */
 static const struct attribute_spec arm_attribute_table[] =
@@ -804,6 +806,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT arm_constant_alignment
+
+#undef TARGET_INHIBIT_LOAD_SPECULATION
+#define TARGET_INHIBIT_LOAD_SPECULATION arm_inhibit_load_speculation

 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
@@ -31523,6 +31528,108 @@ arm_constant_alignment (const_tree exp, HOST_WIDE_INT align)
   return align;
 }

+static rtx
+arm_inhibit_load_speculation (machine_mode mode, rtx result, rtx mem,
+                              rtx lower_bound, rtx upper_bound,
+                              rtx fail_result, rtx cmpptr)
+{
+  rtx cond, comparison;
+
+  /* We can't support this for Thumb1 as we have no suitable conditional
+     move operations.  Nor do we support it for TImode.  For both
+     these cases fall back to the generic code sequence which will emit
+     a suitable warning for us.
*/ + if (mode == TImode || TARGET_THUMB1) + return default_inhibit_load_speculation (mode, result, mem, lower_bound, + upper_bound, fail_result, cmpptr); + + + rtx target = gen_reg_rtx (mode); + rtx tgt2 = result; + + if (!register_operand (tgt2, mode)) + tgt2 = gen_reg_rtx (mode); + +if (!register_operand (cmpptr, ptr_mode)) + cmpptr = force_reg (ptr_mode, cmpptr); + + if (upper_bound == NULL) + { + if (!register_operand (lower_bound, ptr_mode)) + lower_bound = force_reg (ptr_mode, lower_bound); + + cond = arm_gen_compare_reg (LTU, cmpptr, lower_bound, NULL); + comparison = gen_rtx_LTU (VOIDmode, cond, const0_rtx); + } + else if (lower_bound == NULL) + { + if (!register_operand (upper_bound, ptr_mode)) + upper_bound = force_reg (ptr_mode, upper_bound); + + cond = arm_gen_compare_reg (GEU, cmpptr, upper_bound, NULL); + comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx); + } + else + { + /* We want to generate code for + result = (cmpptr < lower || cmpptr >= upper) ? 0 : *ptr; + Which can be recast to + result = (cmpptr < lower || upper <= cmpptr) ? 0 : *ptr; + which can be implemented as + cmp cmpptr, lower + cmpcs upper, cmpptr + bls 1f + ldr result, [ptr] + 1: + movls result, #0 + with suitable IT instructions as needed for thumb2. Later + optimization passes may make the load conditional. */ + + if (!register_operand (lower_bound, ptr_mode)) + lower_bound = force_reg (ptr_mode, lower_bound); + + if (!register_operand (upper_bound, ptr_mode)) + upper_bound = force_reg (ptr_mode, upper_bound); + + rtx comparison1 = gen_rtx_LTU (SImode, cmpptr, lower_bound); + rtx comparison2 = gen_rtx_LEU (SImode, upper_bound, cmpptr); + cond = gen_rtx_REG (arm_select_dominance_cc_mode (comparison1, + comparison2, + DOM_CC_X_OR_Y), + CC_REGNUM); + emit_insn (gen_cmp_ior (cmpptr, lower_bound, upper_bound, cmpptr, + comparison1, comparison2, cond)); + comparison = gen_rtx_NE (SImode, cond, const0_rtx); + } + + rtx_code_label *label = gen_label_rtx (); + emit_jump_insn (gen_arm_cond_branch (label, comparison, cond)); + emit_move_insn (target, mem); + emit_label (label); + + insn_code icode; + + switch (mode) + { + case E_QImode: icode = CODE_FOR_nospeculateqi; break; + case E_HImode: icode = CODE_FOR_nospeculatehi; break; + case E_SImode: icode = CODE_FOR_nospeculatesi; break; + case E_DImode: icode = CODE_FOR_nospeculatedi; break; + default: + gcc_unreachable (); + } + + if (! 
insn_operand_matches (icode, 4, fail_result)) + fail_result = force_reg (mode, fail_result); + + emit_insn (GEN_FCN (icode) (tgt2, comparison, cond, target, fail_result)); + + if (tgt2 != result) + emit_move_insn (result, tgt2); + + return result; +} + #if CHECKING_P namespace selftest { diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index d60c5af..e700fdf 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -9488,7 +9488,7 @@ (set_attr "type" "multiple")] ) -(define_insn "*cmp_ior" +(define_insn "cmp_ior" [(set (match_operand 6 "dominant_cc_register" "") (compare (ior:SI @@ -12015,6 +12015,44 @@ [(set_attr "length" "4") (set_attr "type" "coproc")]) +(define_insn "nospeculate" + [(set (match_operand:QHSI 0 "s_register_operand" "=l,l,r") + (unspec_volatile:QHSI + [(match_operator 1 "arm_comparison_operator" + [(match_operand 2 "cc_register" "") (const_int 0)]) + (match_operand:QHSI 3 "s_register_operand" "0,0,0") + (match_operand:QHSI 4 "arm_not_operand" "I,K,r")] + VUNSPEC_NOSPECULATE))] + "TARGET_32BIT" + { + if (TARGET_THUMB) + return \"it\\t%d1\;mov%d1\\t%0, %4\;.inst 0xf3af8014\t%@ CSDB\"; + return \"mov%d1\\t%0, %4\;.inst 0xe320f014\t%@ CSDB\"; + } + [(set_attr "type" "mov_imm,mvn_imm,mov_reg") + (set_attr "conds" "use") + (set_attr "length" "8")] +) + +(define_insn "nospeculatedi" + [(set (match_operand:DI 0 "s_register_operand" "=r") + (unspec_volatile:DI + [(match_operator 1 "arm_comparison_operator" + [(match_operand 2 "cc_register" "") (const_int 0)]) + (match_operand:DI 3 "s_register_operand" "0") + (match_operand:DI 4 "arm_rhs_operand" "rI")] + VUNSPEC_NOSPECULATE))] + "TARGET_32BIT" + { + if (TARGET_THUMB) + return \"it\\t%d1\;mov%d1\\t%Q0, %Q4\;it\\t%d1\;mov%d1\\t%R0, %R4\;.inst 0xf3af8014\t%@ CSDB\"; + return \"mov%d1\\t%Q0, %Q4\;mov%d1\\t%R0, %R4\;.inst 0xe320f014\t%@ CSDB\"; + } + [(set_attr "type" "mov_reg") + (set_attr "conds" "use") + (set_attr "length" "12")] +) + ;; Vector bits common to IWMMXT and Neon (include "vec-common.md") ;; Load the Intel Wireless Multimedia Extension patterns diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md index c474f4b..727a5ab 100644 --- a/gcc/config/arm/unspecs.md +++ b/gcc/config/arm/unspecs.md @@ -168,6 +168,7 @@ VUNSPEC_MCRR2 ; Represent the coprocessor mcrr2 instruction. VUNSPEC_MRRC ; Represent the coprocessor mrrc instruction. VUNSPEC_MRRC2 ; Represent the coprocessor mrrc2 instruction. + VUNSPEC_NOSPECULATE ; Represent a despeculation sequence. ]) ;; Enumerators for NEON unspecs.
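
For reference, here is a minimal usage sketch of the new builtin, based on the
behaviour documented in the extend.texi hunk of patch 1/3.  It is not part of
the patches themselves; the array and function names are invented for
illustration.  The three-argument form is used, so cmpptr defaults to the load
address and the out-of-bounds result defaults to zero:

#define ARR_SIZE 256

static int arr[ARR_SIZE];

int
load_element (unsigned long untrusted_idx)
{
  /* Ordinary architectural bounds check, which a processor may
     speculatively bypass.  */
  if (untrusted_idx >= ARR_SIZE)
    return 0;

  /* Architecturally this returns arr[untrusted_idx] only when the load
     address lies within [arr, arr + ARR_SIZE); otherwise it returns 0.
     The target-specific expansion (CSEL plus the CSDB hint on AArch64,
     conditional moves plus CSDB on ARM/Thumb2) additionally ensures the
     loaded value cannot be used under speculation when the bounds check
     is mispredicted.  */
  return __builtin_load_no_speculate (&arr[untrusted_idx],
                                      arr, arr + ARR_SIZE);
}

As the documentation above notes, when only one bound is of interest (for
example, a loop whose termination condition already establishes the other
bound), the uninteresting bound may be passed as a literal NULL so that only
the remaining boundary condition is checked.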