From patchwork Mon Dec 9 15:28:08 2013
X-Patchwork-Submitter: Bernd Schmidt
X-Patchwork-Id: 299102
Message-ID: <52A5E188.60209@codesourcery.com>
Date: Mon, 9 Dec 2013 16:28:08 +0100
From: Bernd Schmidt
To: GCC Patches
Subject: [gomp4, 23/23] nvptx port files
References: <52A5D8D4.2030803@codesourcery.com>
In-Reply-To: <52A5D8D4.2030803@codesourcery.com>

These are the backend files.  Once again, this is currently a work in
progress.


Bernd

gcc/
	* config/nvptx/nvptx.c: New file.
	* config/nvptx/nvptx.h: New file.
	* config/nvptx/nvptx-protos.h: New file.
	* config/nvptx/nvptx-c.c: New file.
	* config/nvptx/nvptx.md: New file.
	* config/nvptx/t-nvptx: New file.
	* config/nvptx/nvptx.opt: New file.
	* common/config/nvptx/nvptx-common.c: New file.
	* config.gcc: Handle nvptx-*-*.
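As background for the nvptx-c.c file in the patch below: it registers four
named address spaces with the C front end, giving target code keywords for
the PTX state spaces.  A rough sketch of what that is expected to look like
in user code follows; the variable and function names are invented for
illustration and are not part of the patch, and the qualifier-to-state-space
mapping follows section_from_addr_space further down in nvptx.c:

    /* Hypothetical example of the address-space keywords registered by
       nvptx_register_pragmas: __ptxconst -> .const, __ptxglobal -> .global,
       __ptxshared -> .shared, __ptxlocal -> (presumably) .local.  */
    __ptxconst float coeff[16];        /* read-only .const data        */
    __ptxglobal int results[1024];     /* device-visible .global data  */

    void
    scale_element (int i)
    {
      static __ptxshared int tmp[64];  /* per-block .shared scratch    */
      tmp[i % 64] = results[i];
      results[i] = (int) (tmp[i % 64] * coeff[i % 16]);
    }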
------------------------------------------------------------------------
Index: gcc/common/config/nvptx/nvptx-common.c
===================================================================
--- /dev/null
+++ gcc/common/config/nvptx/nvptx-common.c
@@ -0,0 +1,35 @@
+/* NVPTX common hooks.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic-core.h"
+#include "tm.h"
+#include "tm_p.h"
+#include "common/common-target.h"
+#include "common/common-target-def.h"
+#include "opts.h"
+#include "flags.h"
+
+#undef TARGET_HAVE_NAMED_SECTIONS
+#define TARGET_HAVE_NAMED_SECTIONS false
+
+struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc.orig
+++ gcc/config.gcc
@@ -425,6 +425,11 @@ nds32*)
 	cpu_type=nds32
 	extra_headers="nds32_intrinsic.h"
 	;;
+nvptx-*-*)
+	cpu_type=nvptx
+	c_target_objs="nvptx-c.o"
+	cxx_target_objs="nvptx-c.o"
+	;;
 picochip-*-*)
 	cpu_type=picochip
 	;;
@@ -2107,6 +2112,10 @@ nds32be-*-*)
 	tm_file="dbxelf.h elfos.h newlib-stdint.h ${tm_file}"
 	tmake_file="nds32/t-mlibs"
 	;;
+nvptx-*)
+	tm_file="${tm_file} newlib-stdint.h"
+	tmake_file="nvptx/t-nvptx"
+	;;
 pdp11-*-*)
 	tm_file="${tm_file} newlib-stdint.h"
 	use_gcc_stdint=wrap
Index: gcc/config/nvptx/nvptx-c.c
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx-c.c
@@ -0,0 +1,37 @@
+/* NVPTX C-specific support
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "c-family/c-common.h"
+#include "nvptx-protos.h"
+
+/* Implements REGISTER_TARGET_PRAGMAS.  */
+void
+nvptx_register_pragmas (void)
+{
+  c_register_addr_space ("__ptxglobal", ADDR_SPACE_GLOBAL);
+  c_register_addr_space ("__ptxshared", ADDR_SPACE_SHARED);
+  c_register_addr_space ("__ptxconst", ADDR_SPACE_CONST);
+  c_register_addr_space ("__ptxlocal", ADDR_SPACE_LOCAL);
+}
Index: gcc/config/nvptx/nvptx.c
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx.c
@@ -0,0 +1,2809 @@
+/* Target code for NVPTX.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tree.h"
+#include "insn-flags.h"
+#include "output.h"
+#include "insn-attr.h"
+#include "insn-codes.h"
+#include "expr.h"
+#include "regs.h"
+#include "optabs.h"
+#include "recog.h"
+#include "ggc.h"
+#include "timevar.h"
+#include "tm_p.h"
+#include "tm-preds.h"
+#include "tm-constrs.h"
+#include "function.h"
+#include "langhooks.h"
+#include "dbxout.h"
+#include "target.h"
+#include "target-def.h"
+#include "diagnostic.h"
+#include "basic-block.h"
+#include "stor-layout.h"
+#include "calls.h"
+#include "df.h"
+
+/* Allocate a new, cleared machine_function structure.  */
+
+static struct machine_function *
+nvptx_init_machine_status (void)
+{
+  return ggc_alloc_cleared_machine_function ();
+}
+
+/* Implement TARGET_OPTION_OVERRIDE.  */
+
+static void
+nvptx_option_override (void)
+{
+  init_machine_status = nvptx_init_machine_status;
+  /* Gives us a predictable order, which we need especially for variables.  */
+  flag_toplevel_reorder = 1;
+  /* Assumes that it will see only hard registers.  */
+  flag_var_tracking = 0;
+}
+
+static void
+nvptx_file_start (void)
+{
+  fputs ("// BEGIN PREAMBLE\n", asm_out_file);
+  fputs ("\t.version\t3.1\n", asm_out_file);
+  fputs ("\t.target\tsm_30\n", asm_out_file);
+  fprintf (asm_out_file, "\t.address_size %d\n", GET_MODE_BITSIZE (Pmode));
+  fputs ("// END PREAMBLE\n", asm_out_file);
+}
+
+const char *
+nvptx_ptx_type_from_mode (enum machine_mode mode, bool promote)
+{
+  switch (mode)
+    {
+    case BLKmode:
+      return ".b8";
+    case BImode:
+      return ".pred";
+    case QImode:
+      if (promote)
+	return ".u32";
+      else
+	return ".u8";
+    case HImode:
+      return ".u16";
+    case SImode:
+      return ".u32";
+    case DImode:
+      return ".u64";
+
+    case SFmode:
+      return ".f32";
+    case DFmode:
+      return ".f64";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+static bool
+nvptx_split_reg_p (enum machine_mode mode)
+{
+  if (COMPLEX_MODE_P (mode))
+    return true;
+  if (mode == TImode)
+    return true;
+  return false;
+}
+
+#define PASS_OR_RETURN_IN_REG(MODE) \
+  (GET_MODE_CLASS (MODE) == MODE_INT \
+   || GET_MODE_CLASS (MODE) == MODE_FLOAT)
+
+/* Perform a mode promotion for a function argument.  Return the promoted
+   mode.
*/
+static enum machine_mode
+arg_promotion (enum machine_mode mode)
+{
+  if (mode == QImode || mode == HImode)
+    return SImode;
+  return mode;
+}
+
+/* Write the declaration of a function arg ARG to FILE.  I is the index
+   of the argument, MODE its mode.  NO_ARG_TYPES is true if this is for
+   a call without argument type information; in that case float arguments
+   are widened to double.  Return the index of the last argument register
+   written.  */
+static int
+write_one_arg (FILE *file, tree arg, int i, enum machine_mode mode,
+	       bool no_arg_types)
+{
+  int count = 1;
+
+  if (COMPLEX_MODE_P (mode))
+    {
+      tree subtype = TREE_TYPE (arg);
+      mode = TYPE_MODE (subtype);
+      count = 2;
+    }
+  else if (mode == TImode)
+    {
+      count = 2;
+      mode = DImode;
+    }
+  if (count == 2)
+    {
+      write_one_arg (file, NULL_TREE, i, mode, false);
+      write_one_arg (file, NULL_TREE, i + 1, mode, false);
+      return i + 1;
+    }
+
+  if (!PASS_OR_RETURN_IN_REG (mode))
+    mode = Pmode;
+  mode = arg_promotion (mode);
+  if (no_arg_types && mode == SFmode)
+    mode = DFmode;
+
+  if (i > 0)
+    fprintf (file, ", ");
+  fprintf (file, ".param");
+  fprintf (file, "%s %%in_ar%d", nvptx_ptx_type_from_mode (mode, false),
+	   i + 1);
+  if (mode == BLKmode)
+    fprintf (file, "["HOST_WIDE_INT_PRINT_DEC"]", int_size_in_bytes (arg));
+  return i;
+}
+
+static void
+nvptx_write_function_decl (FILE *file, const char *name, const_tree decl)
+{
+  tree fntype = TREE_TYPE (decl);
+  tree result_type = TREE_TYPE (fntype);
+  tree args = TYPE_ARG_TYPES (fntype);
+  tree attrs = DECL_ATTRIBUTES (decl);
+  bool kernel = lookup_attribute ("kernel", attrs) != NULL_TREE;
+  bool is_main = strcmp (name, "main") == 0;
+  bool args_from_decl = false;
+
+  /* We get:
+     NULL in TYPE_ARG_TYPES, for old-style functions
+     NULL in DECL_ARGUMENTS, for builtin functions without another
+     declaration.
+     So we have to pick the best one we have.  */
+  if (args == 0)
+    {
+      args = DECL_ARGUMENTS (decl);
+      args_from_decl = true;
+    }
+
+  if (DECL_EXTERNAL (decl))
+    fprintf (file, ".extern ");
+  else if (TREE_PUBLIC (decl))
+    fprintf (file, ".visible ");
+
+  if (kernel)
+    fprintf (file, ".entry ");
+  else
+    fprintf (file, ".func ");
+
+  /* Declare the result.  */
+  bool return_in_mem = false;
+  if (TYPE_MODE (result_type) != VOIDmode)
+    {
+      enum machine_mode mode = TYPE_MODE (result_type);
+      if (!PASS_OR_RETURN_IN_REG (mode))
+	return_in_mem = true;
+      else
+	{
+	  mode = arg_promotion (mode);
+	  fprintf (file, "(.param%s %%out_retval)",
+		   nvptx_ptx_type_from_mode (mode, false));
+	}
+    }
+
+  assemble_name_raw (file, name);
+
+  /* Declare argument types.  */
+  if ((args != NULL_TREE
+       && !(TREE_CODE (args) == TREE_LIST
+	    && TREE_VALUE (args) == void_type_node))
+      || is_main
+      || return_in_mem)
+    {
+      fprintf (file, "(");
+      int i = 0;
+      bool any_args = false;
+      if (return_in_mem)
+	{
+	  fprintf (file, ".param.u%d %%in_ar1", GET_MODE_BITSIZE (Pmode));
+	  i++;
+	}
+      while (args != NULL_TREE)
+	{
+	  tree type = args_from_decl ?
TREE_TYPE (args) : TREE_VALUE (args); + enum machine_mode mode = TYPE_MODE (type); + + if (mode != VOIDmode) + { + i = write_one_arg (file, type, i, mode, + TYPE_ARG_TYPES (fntype) == 0); + any_args = true; + } + args = TREE_CHAIN (args); + i++; + } + if (stdarg_p (fntype)) + { + gcc_assert (i > 0); + fprintf (file, ", .param.u%d %%in_argp", GET_MODE_BITSIZE (Pmode)); + } + if (!any_args && is_main) + fprintf (file, ".param.u32 %%argc, .param.u%d %%argv", + GET_MODE_BITSIZE (Pmode)); + fprintf (file, ")"); + } +} + +static void +walk_args_for_param (FILE *file, tree argtypes, tree args, bool write_copy, + bool return_in_mem) +{ + int i; + + bool args_from_decl = false; + if (argtypes == 0) + args_from_decl = true; + else + args = argtypes; + + for (i = return_in_mem ? 1 : 0; args != NULL_TREE; args = TREE_CHAIN (args)) + { + tree type = args_from_decl ? TREE_TYPE (args) : TREE_VALUE (args); + enum machine_mode mode = TYPE_MODE (type); + + if (mode == VOIDmode) + break; + + int count = 1; + if (COMPLEX_MODE_P (mode)) + { + count = 2; + mode = TYPE_MODE (TREE_TYPE (type)); + } + else if (!PASS_OR_RETURN_IN_REG (mode)) + mode = Pmode; + else + mode = arg_promotion (mode); + + while (count-- > 0) + { + i++; + if (write_copy) + fprintf (file, "\tld.param%s %%ar%d, [%%in_ar%d];\n", + nvptx_ptx_type_from_mode (mode, false), i, i); + else + fprintf (file, "\t.reg%s %%ar%d;\n", + nvptx_ptx_type_from_mode (mode, false), i); + } + } +} + +static void +write_function_decl_only (FILE *file, const char *name, const_tree decl) +{ + fprintf (file, "// BEGIN FUNCTION DECL: %s\n", name); + nvptx_write_function_decl (file, name, decl); + fprintf (file, ";\n"); + fprintf (file, "// END FUNCTION DECL\n"); +} + +void +nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) +{ + tree fntype = TREE_TYPE (decl); + tree result_type = TREE_TYPE (fntype); + tree argtypes = TYPE_ARG_TYPES (fntype); + tree attrs = DECL_ATTRIBUTES (decl); + bool kernel = lookup_attribute ("kernel", attrs) != NULL_TREE; + + write_function_decl_only (file, name, decl); + + fprintf (file, "// BEGIN FUNCTION DEF: %s\n", name); + + nvptx_write_function_decl (file, name, decl); + + bool return_in_mem = false; + if (TYPE_MODE (result_type) != VOIDmode) + { + enum machine_mode mode = TYPE_MODE (result_type); + if (!PASS_OR_RETURN_IN_REG (mode)) + return_in_mem = true; + } + + fprintf (file, "\n{\n"); + + /* Ensure all arguments that should live in a register have one + declared. We'll emit the copies below. */ + walk_args_for_param (file, TYPE_ARG_TYPES (fntype), DECL_ARGUMENTS (decl), + false, return_in_mem); + if (return_in_mem) + fprintf (file, ".reg.u%d %%ar1;\n", GET_MODE_BITSIZE (Pmode)); + else if (TYPE_MODE (result_type) != VOIDmode) + { + enum machine_mode mode = arg_promotion (TYPE_MODE (result_type)); + fprintf (file, ".reg%s %%retval;\n", + nvptx_ptx_type_from_mode (mode, false)); + } + + if (stdarg_p (fntype)) + fprintf (file, ".reg.u%d %%argp;\n", GET_MODE_BITSIZE (Pmode)); + + /* Declare the pseudos we have as ptx registers. 
*/ + int maxregs = max_reg_num (); + for (int i = LAST_VIRTUAL_REGISTER + 1; i < maxregs; i++) + { + if (regno_reg_rtx[i] != const0_rtx) + { + enum machine_mode mode = PSEUDO_REGNO_MODE (i); + int count = 1; + if (nvptx_split_reg_p (mode)) + { + if (COMPLEX_MODE_P (mode)) + { + count = 2; + mode = GET_MODE_INNER (mode); + } + else + { + count = GET_MODE_SIZE (mode) / UNITS_PER_WORD; + mode = DImode; + } + while (count-- > 0) + fprintf (file, "\t.reg%s %%r%d$%d;\n", + nvptx_ptx_type_from_mode (mode, true), + i, count); + } + else + fprintf (file, "\t.reg%s %%r%d;\n", + nvptx_ptx_type_from_mode (mode, true), + i); + } + } + + /* The only reason we might be using outgoing args is if we call a stdargs + function. Allocate the space for this. If we called varargs functions + without passing any variadic arguments, we'll see a reference to outargs + even with a zero outgoing_args_size. */ + HOST_WIDE_INT sz = crtl->outgoing_args_size; + if (sz == 0) + sz = 1; + if (cfun->machine->has_call_with_varargs) + fprintf (file, "\t.local.b8 %%outargs["HOST_WIDE_INT_PRINT_DEC"];\n", + sz); + + /* Declare a local variable for the frame. */ + sz = get_frame_size (); + if (sz > 0) + fprintf (file, "\t.local.b8 %%frame["HOST_WIDE_INT_PRINT_DEC"];\n", sz); + + /* Now emit any copies necessary for arguments. */ + walk_args_for_param (file, TYPE_ARG_TYPES (fntype), DECL_ARGUMENTS (decl), + true, return_in_mem); + if (return_in_mem) + fprintf (file, "ld.param.u%d %%ar1, [%%in_ar1];\n", + GET_MODE_BITSIZE (Pmode)); + if (stdarg_p (fntype)) + fprintf (file, "ld.param.u%d %%argp, [%%in_argp];\n", + GET_MODE_BITSIZE (Pmode)); +} + +const char * +nvptx_output_return (void) +{ + tree fntype = TREE_TYPE (current_function_decl); + tree result_type = TREE_TYPE (fntype); + if (TYPE_MODE (result_type) != VOIDmode) + { + enum machine_mode mode = TYPE_MODE (result_type); + if (PASS_OR_RETURN_IN_REG (mode)) + { + mode = arg_promotion (mode); + fprintf (asm_out_file, "\tst.param%s\t[%%out_retval], %%retval;\n", + nvptx_ptx_type_from_mode (mode, false)); + } + } + + return "ret;"; +} + +static void +nvptx_asm_predeclare_function (FILE *file, const char *name, const_tree decl) +{ + if (DECL_EXTERNAL (decl)) + write_function_decl_only (file, name, decl); +} + +void +nvptx_function_end (FILE *file) +{ + fprintf (file, "\t}\n"); + fprintf (file, "// END FUNCTION DEF\n"); +} + +/* Decide whether we can make a sibling call to a function. For ptx, we + can't. */ + +static bool +nvptx_function_ok_for_sibcall (tree, tree) +{ + return false; +} + +static void +nvptx_start_call_args (rtx arg) +{ + if (cfun->machine->start_call == NULL_RTX) + { + cfun->machine->call_args = NULL_RTX; + cfun->machine->start_call = gen_start_call_block (const0_rtx); + emit_insn (cfun->machine->start_call); + } + if (arg == pc_rtx) + return; + + rtx args_so_far = cfun->machine->call_args; + if (REG_P (arg)) + cfun->machine->call_args = alloc_EXPR_LIST (VOIDmode, arg, args_so_far); +} + +static void +nvptx_end_call_args (void) +{ + cfun->machine->start_call = NULL_RTX; + free_EXPR_LIST_list (&cfun->machine->call_args); +} + +/* Emit the sequence for a call. 
*/ +void +nvptx_expand_call (rtx retval, rtx address) +{ + int nargs; + rtx callee = XEXP (address, 0); + rtx pat, t; + rtvec vec; + + if (retval != NULL_RTX) + { + rtx start_pat = cfun->machine->start_call; + XVECEXP (start_pat, 0, 0) = gen_rtx_SCRATCH (GET_MODE (retval)); + } + + nargs = 0; + for (t = cfun->machine->call_args; t; t = XEXP (t, 1)) + nargs++; + + bool has_varargs = false; + if (GET_CODE (callee) == SYMBOL_REF) + { + tree decl = SYMBOL_REF_DECL (callee); + if (decl != NULL_TREE) + { + tree type = TREE_TYPE (decl); + if (stdarg_p (type)) + { + has_varargs = true; + cfun->machine->has_call_with_varargs = true; + } + } + } + vec = rtvec_alloc (nargs + 1 + (has_varargs ? 1 : 0)); + pat = gen_rtx_PARALLEL (VOIDmode, vec); + if (has_varargs) + { + rtx this_arg = gen_reg_rtx (Pmode); + if (Pmode == DImode) + emit_insn (gen_convaddr_from_localdi (this_arg, stack_pointer_rtx)); + else + emit_insn (gen_convaddr_from_localsi (this_arg, stack_pointer_rtx)); + XVECEXP (pat, 0, nargs + 1) = gen_rtx_USE (VOIDmode, this_arg); + } + + for (t = cfun->machine->call_args; t; t = XEXP (t, 1)) + { + rtx this_arg = XEXP (t, 0); + XVECEXP (pat, 0, nargs--) = gen_rtx_USE (VOIDmode, this_arg); + } + + if (!call_insn_operand (callee, Pmode)) + { + callee = force_reg (Pmode, callee); + address = change_address (address, QImode, callee); + } + + t = gen_rtx_CALL (VOIDmode, address, const0_rtx); + if (retval != NULL_RTX) + t = gen_rtx_SET (VOIDmode, retval, t); + XVECEXP (pat, 0, 0) = t; + emit_call_insn (pat); +} + +/* Implement TARGET_FUNCTION_ARG. */ +static rtx +nvptx_function_arg (cumulative_args_t cum_v, enum machine_mode mode, + const_tree, bool named) +{ + CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v); + + if (mode == VOIDmode) + return NULL_RTX; + + if (named) + { + enum machine_mode wide_mode = arg_promotion (mode); + rtx t = gen_reg_rtx (mode); + if (wide_mode != mode) + t = gen_lowpart (mode, t); + return t; + } + else + return NULL_RTX; +} + +static rtx +nvptx_function_incoming_arg (cumulative_args_t cum_v, enum machine_mode mode, + const_tree, bool named) +{ + CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v); + if (mode == VOIDmode) + return NULL_RTX; + + if (!named) + return NULL_RTX; + + int count = 1; + if (mode == TImode) + { + count = 2; + mode = DImode; + } + + if (count == 1) + { + rtx t = gen_rtx_UNSPEC (mode, + gen_rtvec (1, GEN_INT (1 + cum->count)), + UNSPEC_ARG_REG); + return t; + } + + rtx t = gen_rtx_PARALLEL (BLKmode, rtvec_alloc (count)); + while (count-- > 0) + { + rtvec vec = gen_rtvec (1, GEN_INT (1 + cum->count + count)); + rtx arg = gen_rtx_UNSPEC (mode, vec, UNSPEC_ARG_REG); + rtx off = GEN_INT (count * GET_MODE_SIZE (mode)); + XVECEXP (t, 0, count) = gen_rtx_EXPR_LIST (VOIDmode, arg, off); + } + return t; +} + +/* Implement TARGET_FUNCTION_ARG_ADVANCE. */ +static void +nvptx_function_arg_advance (cumulative_args_t cum_v, enum machine_mode mode, + const_tree type ATTRIBUTE_UNUSED, + bool named ATTRIBUTE_UNUSED) +{ + CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v); + if (mode == TImode) + cum->count += 2; + else + cum->count++; +} + + +/* Implement TARGET_FUNCTION_ARG_BOUNDARY. */ + +static unsigned int +nvptx_function_arg_boundary (enum machine_mode mode, const_tree type) +{ + unsigned int boundary = type ? 
TYPE_ALIGN (type) : GET_MODE_BITSIZE (mode);
+
+  if (boundary > BITS_PER_WORD)
+    return 2 * BITS_PER_WORD;
+
+  if (mode == BLKmode)
+    {
+      HOST_WIDE_INT size = int_size_in_bytes (type);
+      if (size > 4)
+	return 2 * BITS_PER_WORD;
+      if (boundary < BITS_PER_WORD)
+	{
+	  if (size >= 3)
+	    return BITS_PER_WORD;
+	  if (size >= 2)
+	    return 2 * BITS_PER_UNIT;
+	}
+    }
+  return boundary;
+}
+
+/* Implement TARGET_FUNCTION_ARG_ROUND_BOUNDARY.  */
+static unsigned int
+nvptx_function_arg_round_boundary (enum machine_mode mode, const_tree type)
+{
+  return nvptx_function_arg_boundary (mode, type);
+}
+
+/* TARGET_FUNCTION_VALUE implementation.  Returns an RTX representing the
+   place where function FUNC returns or receives a value of data type
+   TYPE.  */
+
+static rtx
+nvptx_function_value (const_tree type, const_tree func ATTRIBUTE_UNUSED,
+		      bool outgoing)
+{
+  int unsignedp = TYPE_UNSIGNED (type);
+  enum machine_mode orig_mode = TYPE_MODE (type);
+  enum machine_mode mode = promote_function_mode (type, orig_mode,
+						  &unsignedp, NULL_TREE, 1);
+  if (outgoing)
+    return gen_rtx_REG (mode, 4);
+  if (cfun->machine->start_call == NULL_RTX)
+    /* Pretend to return in a hard reg for early uses before pseudos can be
+       generated.  */
+    return gen_rtx_REG (mode, 4);
+  return gen_reg_rtx (mode);
+}
+
+/* Implement TARGET_LIBCALL_VALUE.  */
+
+static rtx
+nvptx_libcall_value (enum machine_mode mode, const_rtx)
+{
+  if (cfun->machine->start_call == NULL_RTX)
+    /* Pretend to return in a hard reg for early uses before pseudos can be
+       generated.  */
+    return gen_rtx_REG (mode, 4);
+  return gen_reg_rtx (mode);
+}
+
+/* Implement TARGET_FUNCTION_VALUE_REGNO_P.  */
+
+static bool
+nvptx_function_value_regno_p (const unsigned int regno)
+{
+  return regno == 4;
+}
+
+/* Types with a mode other than those supported by the machine are passed by
+   reference in memory.  */
+
+static bool
+nvptx_pass_by_reference (cumulative_args_t, enum machine_mode mode,
+			 const_tree type, bool)
+{
+  return mode == BLKmode;
+}
+
+/* Decide whether a type should be returned in memory (true)
+   or in a register (false).  This implements the target hook
+   TARGET_RETURN_IN_MEMORY.  */
+
+static bool
+nvptx_return_in_memory (const_tree type, const_tree fntype)
+{
+  enum machine_mode mode = TYPE_MODE (type);
+  return !PASS_OR_RETURN_IN_REG (mode);
+}
+
+/* Predict just emitted jump instruction to be taken with probability PROB.  */
+static void
+predict_jump (int prob)
+{
+  rtx insn = get_last_insn ();
+  gcc_assert (JUMP_P (insn));
+  add_reg_note (insn, REG_BR_PROB, GEN_INT (prob));
+}
+
+/* Helper function for the string operations below.  Test whether VARIABLE
+   is aligned to VALUE bytes; if so, jump to the returned label, skipping
+   the code emitted between the call and the label.  */
+static rtx
+nvptx_expand_aligntest (rtx variable, int value, bool epilogue)
+{
+  rtx label = gen_label_rtx ();
+  rtx tmpcount = gen_reg_rtx (GET_MODE (variable));
+  if (GET_MODE (variable) == DImode)
+    emit_insn (gen_anddi3 (tmpcount, variable, GEN_INT (value)));
+  else
+    emit_insn (gen_andsi3 (tmpcount, variable, GEN_INT (value)));
+  emit_cmp_and_jump_insns (tmpcount, const0_rtx, EQ, 0, GET_MODE (variable),
+			   1, label);
+  return label;
+}
+
+/* Adjust COUNTREG by the VALUE.  */
+static void
+nvptx_adjust_counter (rtx countreg, HOST_WIDE_INT value)
+{
+  rtx (*gen_add)(rtx, rtx, rtx)
+    = GET_MODE (countreg) == DImode ? gen_adddi3 : gen_addsi3;
+
+  emit_insn (gen_add (countreg, countreg, GEN_INT (-value)));
+}
+
+/* Decide on alignment.  We know that the operand is already aligned to ALIGN
+   (ALIGN can be based on profile feedback and thus it is not 100%
+   guaranteed).  */
+static int
+decide_alignment (int align, int expected_size ATTRIBUTE_UNUSED,
+		  enum machine_mode move_mode)
+{
+  if (move_mode == VOIDmode)
+    return 0;
+  return align;
+}
+
+/* Return mode for the memcpy/memset loop counter.  Prefer SImode over
+   DImode for constant loop counts.  */
+
+static enum machine_mode
+counter_mode (rtx count_exp)
+{
+  if (GET_MODE (count_exp) != VOIDmode)
+    return GET_MODE (count_exp);
+  if (!CONST_INT_P (count_exp))
+    return Pmode;
+  if (INTVAL (count_exp) & ~0xffffffff)
+    return DImode;
+  return SImode;
+}
+
+/* When SRCPTR is non-NULL, output a simple loop to move memory from
+   SRCPTR to DESTPTR via chunks of MODE unrolled UNROLL times; the
+   overall size is COUNT, specified in bytes.  When SRCPTR is NULL,
+   output the equivalent loop to set memory by VALUE (supposed to be
+   in MODE).
+
+   The size is rounded down to a whole number of the chunk size moved
+   at once.  SRCMEM and DESTMEM provide MEM rtx to supply proper
+   aliasing info.  */
+
+static void
+expand_set_or_movmem_via_loop (rtx destmem, rtx srcmem,
+			       rtx destptr, rtx srcptr, rtx value,
+			       rtx count, enum machine_mode mode, int unroll,
+			       int expected_size)
+{
+  rtx out_label, top_label, iter, tmp;
+  enum machine_mode iter_mode = counter_mode (count);
+  int piece_size_n = GET_MODE_SIZE (mode) * unroll;
+  rtx piece_size = GEN_INT (piece_size_n);
+  rtx piece_size_mask = GEN_INT (~((GET_MODE_SIZE (mode) * unroll) - 1));
+  rtx size;
+  int i;
+
+  top_label = gen_label_rtx ();
+  out_label = gen_label_rtx ();
+  iter = gen_reg_rtx (iter_mode);
+
+  size = expand_simple_binop (iter_mode, AND, count, piece_size_mask,
+			      NULL, 1, OPTAB_DIRECT);
+  /* Those two should combine.  */
+  if (piece_size == const1_rtx)
+    {
+      emit_cmp_and_jump_insns (size, const0_rtx, EQ, NULL_RTX, iter_mode,
+			       true, out_label);
+    }
+  emit_move_insn (iter, const0_rtx);
+
+  emit_label (top_label);
+
+  tmp = convert_modes (Pmode, iter_mode, iter, true);
+
+  /* This assert could be relaxed - in this case we'll need to compute
+     the smallest power of two contained in PIECE_SIZE_N and pass it to
+     offset_address.
*/ + gcc_assert ((piece_size_n & (piece_size_n - 1)) == 0); + destmem = offset_address (destmem, tmp, piece_size_n); + destmem = adjust_address (destmem, mode, 0); + + if (srcmem) + { + srcmem = offset_address (srcmem, copy_rtx (tmp), piece_size_n); + srcmem = adjust_address (srcmem, mode, 0); + + rtx tmpreg[4]; + gcc_assert (unroll <= 4); + for (i = 0; i < unroll; i++) + { + tmpreg[i] = gen_reg_rtx (mode); + if (i) + { + srcmem = + adjust_address (copy_rtx (srcmem), mode, GET_MODE_SIZE (mode)); + } + emit_move_insn (tmpreg[i], srcmem); + } + for (i = 0; i < unroll; i++) + { + if (i) + { + destmem = + adjust_address (copy_rtx (destmem), mode, GET_MODE_SIZE (mode)); + } + emit_move_insn (destmem, tmpreg[i]); + } + } + else + for (i = 0; i < unroll; i++) + { + if (i) + destmem + = adjust_address (copy_rtx (destmem), mode, GET_MODE_SIZE (mode)); + emit_move_insn (destmem, value); + } + + tmp = expand_simple_binop (iter_mode, PLUS, iter, piece_size, iter, + true, OPTAB_LIB_WIDEN); + if (tmp != iter) + emit_move_insn (iter, tmp); + + emit_cmp_and_jump_insns (iter, size, LT, NULL_RTX, iter_mode, + true, top_label); + + iter = force_reg (Pmode, convert_to_mode (Pmode, iter, 1)); + tmp = expand_simple_binop (Pmode, PLUS, destptr, iter, destptr, + true, OPTAB_LIB_WIDEN); + if (tmp != destptr) + emit_move_insn (destptr, tmp); + if (srcptr) + { + tmp = expand_simple_binop (Pmode, PLUS, srcptr, iter, srcptr, + true, OPTAB_LIB_WIDEN); + if (tmp != srcptr) + emit_move_insn (srcptr, tmp); + } + emit_label (out_label); +} + +static void +emit_strset (rtx destptr, rtx destmem, rtx val) +{ + if (GET_MODE (destmem) != GET_MODE (val)) + destmem = adjust_address_nv (destmem, GET_MODE (val), 0); + + rtx incremented = gen_rtx_PLUS (Pmode, destptr, + GEN_INT (GET_MODE_SIZE (GET_MODE + (val)))); + emit_move_insn (destmem, val); + emit_insn (gen_rtx_SET (VOIDmode, destptr, incremented)); +} + +/* This function emits moves to copy SIZE_TO_MOVE bytes from SRCMEM to + DESTMEM. + SRC is passed by pointer to be updated on return. + Return value is updated DST. */ +static rtx +emit_memmov (rtx destmem, rtx *srcmem, rtx destptr, rtx srcptr, + HOST_WIDE_INT size_to_move) +{ + rtx dst = destmem, src = *srcmem, adjust, tempreg; + enum insn_code code; + enum machine_mode move_mode; + int piece_size, i; + + /* Find the widest mode in which we could perform moves. + Start with the biggest power of 2 less than SIZE_TO_MOVE and half + it until move of such size is supported. */ + piece_size = 1 << floor_log2 (size_to_move); + move_mode = mode_for_size (piece_size * BITS_PER_UNIT, MODE_INT, 0); + code = optab_handler (mov_optab, move_mode); + while (code == CODE_FOR_nothing && piece_size > 1) + { + piece_size >>= 1; + move_mode = mode_for_size (piece_size * BITS_PER_UNIT, MODE_INT, 0); + code = optab_handler (mov_optab, move_mode); + } + + /* Find the corresponding vector mode with the same size as MOVE_MODE. + MOVE_MODE is an integer mode at the moment (SI, DI, TI, etc.). 
*/ + if (GET_MODE_SIZE (move_mode) > GET_MODE_SIZE (word_mode)) + { + int nunits = GET_MODE_SIZE (move_mode) / GET_MODE_SIZE (word_mode); + move_mode = mode_for_vector (word_mode, nunits); + code = optab_handler (mov_optab, move_mode); + if (code == CODE_FOR_nothing) + { + move_mode = word_mode; + piece_size = GET_MODE_SIZE (move_mode); + code = optab_handler (mov_optab, move_mode); + } + } + gcc_assert (code != CODE_FOR_nothing); + + dst = adjust_automodify_address_nv (dst, move_mode, destptr, 0); + src = adjust_automodify_address_nv (src, move_mode, srcptr, 0); + + /* Emit moves. We'll need SIZE_TO_MOVE/PIECE_SIZES moves. */ + gcc_assert (size_to_move % piece_size == 0); + adjust = GEN_INT (piece_size); + for (i = 0; i < size_to_move; i += piece_size) + { + /* We move from memory to memory, so we'll need to do it via + a temporary register. */ + tempreg = gen_reg_rtx (move_mode); + emit_insn (GEN_FCN (code) (tempreg, src)); + emit_insn (GEN_FCN (code) (dst, tempreg)); + + emit_move_insn (destptr, + gen_rtx_PLUS (Pmode, copy_rtx (destptr), adjust)); + emit_move_insn (srcptr, + gen_rtx_PLUS (Pmode, copy_rtx (srcptr), adjust)); + + dst = adjust_automodify_address_nv (dst, move_mode, destptr, + piece_size); + src = adjust_automodify_address_nv (src, move_mode, srcptr, + piece_size); + } + + /* Update DST and SRC rtx. */ + *srcmem = src; + return dst; +} + +/* Output code to copy at most count & (max_size - 1) bytes from SRC to DEST. */ +static void +expand_movmem_epilogue (rtx destmem, rtx srcmem, + rtx destptr, rtx srcptr, rtx count, int max_size) +{ + rtx src, dest; + if (CONST_INT_P (count)) + { + HOST_WIDE_INT countval = INTVAL (count); + HOST_WIDE_INT epilogue_size = countval % max_size; + int i; + + /* For now MAX_SIZE should be a power of 2. This assert could be + relaxed, but it'll require a bit more complicated epilogue + expanding. 
*/ + gcc_assert ((max_size & (max_size - 1)) == 0); + for (i = max_size; i >= 1; i >>= 1) + { + if (epilogue_size & i) + destmem = emit_memmov (destmem, &srcmem, destptr, srcptr, i); + } + return; + } + if (max_size > 8) + { + count = expand_simple_binop (GET_MODE (count), AND, count, GEN_INT (max_size - 1), + count, 1, OPTAB_DIRECT); + expand_set_or_movmem_via_loop (destmem, srcmem, destptr, srcptr, NULL, + count, QImode, 1, 4); + return; + } + + rtx offset = force_reg (Pmode, const0_rtx); + rtx tmp; + + if (max_size > 4) + { + rtx label = nvptx_expand_aligntest (count, 4, true); + src = change_address (srcmem, SImode, srcptr); + dest = change_address (destmem, SImode, destptr); + emit_move_insn (dest, src); + tmp = expand_simple_binop (Pmode, PLUS, offset, GEN_INT (4), NULL, + true, OPTAB_LIB_WIDEN); + if (tmp != offset) + emit_move_insn (offset, tmp); + emit_label (label); + LABEL_NUSES (label) = 1; + } + if (max_size > 2) + { + rtx label = nvptx_expand_aligntest (count, 2, true); + tmp = gen_rtx_PLUS (Pmode, srcptr, offset); + src = change_address (srcmem, HImode, tmp); + tmp = gen_rtx_PLUS (Pmode, destptr, offset); + dest = change_address (destmem, HImode, tmp); + emit_move_insn (dest, src); + tmp = expand_simple_binop (Pmode, PLUS, offset, GEN_INT (2), tmp, + true, OPTAB_LIB_WIDEN); + if (tmp != offset) + emit_move_insn (offset, tmp); + emit_label (label); + LABEL_NUSES (label) = 1; + } + if (max_size > 1) + { + rtx label = nvptx_expand_aligntest (count, 1, true); + tmp = gen_rtx_PLUS (Pmode, srcptr, offset); + src = change_address (srcmem, QImode, tmp); + tmp = gen_rtx_PLUS (Pmode, destptr, offset); + dest = change_address (destmem, QImode, tmp); + emit_move_insn (dest, src); + emit_label (label); + LABEL_NUSES (label) = 1; + } +} + +/* Output code to set at most count & (max_size - 1) bytes starting by DEST. */ +static void +expand_setmem_epilogue_via_loop (rtx destmem, rtx destptr, rtx value, + rtx count, int max_size) +{ + count = + expand_simple_binop (counter_mode (count), AND, count, + GEN_INT (max_size - 1), count, 1, OPTAB_DIRECT); + expand_set_or_movmem_via_loop (destmem, NULL, destptr, NULL, + gen_lowpart (QImode, value), count, QImode, + 1, max_size / 2); +} + +/* Output code to set at most count & (max_size - 1) bytes starting by DEST. 
*/ +static void +expand_setmem_epilogue (rtx destmem, rtx destptr, rtx value, rtx count, int max_size) +{ + rtx dest; + + if (CONST_INT_P (count)) + { + HOST_WIDE_INT countval = INTVAL (count); + int offset = 0; + + if ((countval & 0x10) && max_size > 16) + { + dest = adjust_automodify_address_nv (destmem, DImode, destptr, offset); + emit_strset (destptr, dest, value); + dest = adjust_automodify_address_nv (destmem, DImode, destptr, offset + 8); + emit_strset (destptr, dest, value); + offset += 16; + } + if ((countval & 0x08) && max_size > 8) + { + dest = adjust_automodify_address_nv (destmem, DImode, destptr, offset); + emit_strset (destptr, dest, value); + offset += 8; + } + if ((countval & 0x04) && max_size > 4) + { + dest = adjust_automodify_address_nv (destmem, SImode, destptr, offset); + emit_strset (destptr, dest, gen_lowpart (SImode, value)); + offset += 4; + } + if ((countval & 0x02) && max_size > 2) + { + dest = adjust_automodify_address_nv (destmem, HImode, destptr, offset); + emit_strset (destptr, dest, gen_lowpart (HImode, value)); + offset += 2; + } + if ((countval & 0x01) && max_size > 1) + { + dest = adjust_automodify_address_nv (destmem, QImode, destptr, offset); + emit_strset (destptr, dest, gen_lowpart (QImode, value)); + offset += 1; + } + return; + } + if (max_size > 32) + { + expand_setmem_epilogue_via_loop (destmem, destptr, value, count, max_size); + return; + } + if (max_size > 16) + { + rtx label = nvptx_expand_aligntest (count, 16, true); + dest = change_address (destmem, DImode, destptr); + emit_strset (destptr, dest, value); + emit_strset (destptr, dest, value); + emit_label (label); + LABEL_NUSES (label) = 1; + } + if (max_size > 8) + { + rtx label = nvptx_expand_aligntest (count, 8, true); + dest = change_address (destmem, DImode, destptr); + emit_strset (destptr, dest, value); + emit_label (label); + LABEL_NUSES (label) = 1; + } + if (max_size > 4) + { + rtx label = nvptx_expand_aligntest (count, 4, true); + dest = change_address (destmem, SImode, destptr); + emit_strset (destptr, dest, gen_lowpart (SImode, value)); + emit_label (label); + LABEL_NUSES (label) = 1; + } + if (max_size > 2) + { + rtx label = nvptx_expand_aligntest (count, 2, true); + dest = change_address (destmem, HImode, destptr); + emit_strset (destptr, dest, gen_lowpart (HImode, value)); + emit_label (label); + LABEL_NUSES (label) = 1; + } + if (max_size > 1) + { + rtx label = nvptx_expand_aligntest (count, 1, true); + dest = change_address (destmem, QImode, destptr); + emit_strset (destptr, dest, gen_lowpart (QImode, value)); + emit_label (label); + LABEL_NUSES (label) = 1; + } +} + +/* Copy enough from DEST to SRC to align DEST known to by aligned by ALIGN to + DESIRED_ALIGNMENT. + Return value is updated DESTMEM. */ +static rtx +expand_movmem_prologue (rtx destmem, rtx srcmem, + rtx destptr, rtx srcptr, rtx count, + int align, int desired_alignment) +{ + int i; + for (i = 1; i < desired_alignment; i <<= 1) + { + if (align <= i) + { + rtx label = nvptx_expand_aligntest (destptr, i, false); + destmem = emit_memmov (destmem, &srcmem, destptr, srcptr, i); + nvptx_adjust_counter (count, i); + emit_label (label); + LABEL_NUSES (label) = 1; + set_mem_align (destmem, i * 2 * BITS_PER_UNIT); + } + } + return destmem; +} + +/* Copy enough from DST to SRC to align DST known to DESIRED_ALIGN. + ALIGN_BYTES is how many bytes need to be copied. + The function updates DST and SRC, namely, it sets proper alignment. + DST is returned via return value, SRC is updated via pointer SRCP. 
*/ +static rtx +expand_constant_movmem_prologue (rtx dst, rtx *srcp, rtx destreg, rtx srcreg, + int desired_align, int align_bytes) +{ + rtx src = *srcp; + rtx orig_dst = dst; + rtx orig_src = src; + int piece_size = 1; + int copied_bytes = 0; + int src_align_bytes = get_mem_align_offset (src, desired_align * BITS_PER_UNIT); + if (src_align_bytes >= 0) + src_align_bytes = desired_align - src_align_bytes; + + for (piece_size = 1; + piece_size <= desired_align && copied_bytes < align_bytes; + piece_size <<= 1) + { + if (align_bytes & piece_size) + { + dst = emit_memmov (dst, &src, destreg, srcreg, piece_size); + copied_bytes += piece_size; + } + } + + if (MEM_ALIGN (dst) < (unsigned int) desired_align * BITS_PER_UNIT) + set_mem_align (dst, desired_align * BITS_PER_UNIT); + if (src_align_bytes >= 0) + { + unsigned int src_align; + for (src_align = desired_align; src_align >= 2; src_align >>= 1) + { + if ((src_align_bytes & (src_align - 1)) + == (align_bytes & (src_align - 1))) + break; + } + if (src_align > (unsigned int) desired_align) + src_align = desired_align; + if (MEM_ALIGN (src) < src_align * BITS_PER_UNIT) + set_mem_align (src, src_align * BITS_PER_UNIT); + } + if (MEM_SIZE_KNOWN_P (orig_dst)) + set_mem_size (dst, MEM_SIZE (orig_dst) - align_bytes); + if (MEM_SIZE_KNOWN_P (orig_src)) + set_mem_size (src, MEM_SIZE (orig_src) - align_bytes); + *srcp = src; + return dst; +} + +/* Set enough from DEST to align DEST known to by aligned by ALIGN to + DESIRED_ALIGNMENT. */ +static void +expand_setmem_prologue (rtx destmem, rtx destptr, rtx value, rtx count, + int align, int desired_alignment) +{ + if (align <= 1 && desired_alignment > 1) + { + rtx label = nvptx_expand_aligntest (destptr, 1, false); + destmem = change_address (destmem, QImode, destptr); + emit_strset (destptr, destmem, gen_lowpart (QImode, value)); + nvptx_adjust_counter (count, 1); + emit_label (label); + LABEL_NUSES (label) = 1; + } + if (align <= 2 && desired_alignment > 2) + { + rtx label = nvptx_expand_aligntest (destptr, 2, false); + destmem = change_address (destmem, HImode, destptr); + emit_strset (destptr, destmem, gen_lowpart (HImode, value)); + nvptx_adjust_counter (count, 2); + emit_label (label); + LABEL_NUSES (label) = 1; + } + if (align <= 4 && desired_alignment > 4) + { + rtx label = nvptx_expand_aligntest (destptr, 4, false); + destmem = change_address (destmem, SImode, destptr); + emit_strset (destptr, destmem, gen_lowpart (SImode, value)); + nvptx_adjust_counter (count, 4); + emit_label (label); + LABEL_NUSES (label) = 1; + } + gcc_assert (desired_alignment <= 8); +} + +/* Set enough from DST to align DST known to by aligned by ALIGN to + DESIRED_ALIGN. ALIGN_BYTES is how many bytes need to be stored. 
*/ +static rtx +expand_constant_setmem_prologue (rtx dst, rtx destreg, rtx value, + int desired_align, int align_bytes) +{ + int off = 0; + rtx orig_dst = dst; + if (align_bytes & 1) + { + dst = adjust_automodify_address_nv (dst, QImode, destreg, 0); + off = 1; + emit_strset (destreg, dst, gen_lowpart (QImode, value)); + } + if (align_bytes & 2) + { + dst = adjust_automodify_address_nv (dst, HImode, destreg, off); + if (MEM_ALIGN (dst) < 2 * BITS_PER_UNIT) + set_mem_align (dst, 2 * BITS_PER_UNIT); + off = 2; + emit_strset (destreg, dst, gen_lowpart (HImode, value)); + } + if (align_bytes & 4) + { + dst = adjust_automodify_address_nv (dst, SImode, destreg, off); + if (MEM_ALIGN (dst) < 4 * BITS_PER_UNIT) + set_mem_align (dst, 4 * BITS_PER_UNIT); + off = 4; + emit_strset (destreg, dst, gen_lowpart (SImode, value)); + } + dst = adjust_automodify_address_nv (dst, BLKmode, destreg, off); + if (MEM_ALIGN (dst) < (unsigned int) desired_align * BITS_PER_UNIT) + set_mem_align (dst, desired_align * BITS_PER_UNIT); + if (MEM_SIZE_KNOWN_P (orig_dst)) + set_mem_size (dst, MEM_SIZE (orig_dst) - align_bytes); + return dst; +} + +/* Expand string move (memcpy) operation. Use i386 string operations + when profitable. expand_setmem contains similar code. The code + depends upon architecture, block size and alignment, but always has + the same overall structure: + + 1) Prologue guard: Conditional that jumps up to epilogues for small + blocks that can be handled by epilogue alone. This is faster + but also needed for correctness, since prologue assume the block + is larger than the desired alignment. + + Optional dynamic check for size and libcall for large + blocks is emitted here too, with -minline-stringops-dynamically. + + 2) Prologue: copy first few bytes in order to get destination + aligned to DESIRED_ALIGN. It is emitted only when ALIGN is less + than DESIRED_ALIGN and up to DESIRED_ALIGN - ALIGN bytes can be + copied. We emit either a jump tree on power of two sized + blocks, or a byte loop. + + 3) Main body: the copying loop itself, copying in SIZE_NEEDED chunks + with specified algorithm. + + 4) Epilogue: code copying tail of the block that is too small to be + handled by main body (or up to size guarded by prologue guard). */ + +bool +nvptx_expand_movmem (rtx dst, rtx src, rtx count_exp, rtx align_exp, + rtx expected_align_exp, rtx expected_size_exp) +{ + rtx destreg; + rtx srcreg; + rtx label = NULL; + rtx tmp; + rtx jump_around_label = NULL; + HOST_WIDE_INT align = 1; + unsigned HOST_WIDE_INT count = 0; + HOST_WIDE_INT expected_size = -1; + int size_needed = 0, epilogue_size_needed; + int desired_align = 0, align_bytes = 0; + bool need_zero_guard = false; + enum machine_mode move_mode = VOIDmode; + int unroll_factor = 1; + + if (CONST_INT_P (align_exp)) + align = INTVAL (align_exp); + /* ALIGN is the minimum of destination and source alignment, but we care here + just about destination alignment. */ + else if (MEM_ALIGN (dst) > (unsigned HOST_WIDE_INT) align * BITS_PER_UNIT) + align = MEM_ALIGN (dst) / BITS_PER_UNIT; + + if (CONST_INT_P (count_exp)) + count = expected_size = INTVAL (count_exp); + if (CONST_INT_P (expected_size_exp) && count == 0) + expected_size = INTVAL (expected_size_exp); + + /* Make sure we don't need to care about overflow later on. 
*/ + if (count > ((unsigned HOST_WIDE_INT) 1 << 30)) + return false; + + if (!count) + count_exp = copy_to_mode_reg (GET_MODE (count_exp), count_exp); + destreg = copy_addr_to_reg (XEXP (dst, 0)); + srcreg = copy_addr_to_reg (XEXP (src, 0)); + + unroll_factor = 1; + if (align >= 8) + move_mode = DImode; + else + move_mode = mode_for_size (align * BITS_PER_UNIT, MODE_INT, 0); + need_zero_guard = true; + size_needed = GET_MODE_SIZE (move_mode) * unroll_factor; + epilogue_size_needed = size_needed; + + desired_align = decide_alignment (align, expected_size, move_mode); + desired_align = align; + + /* Step 1: Prologue guard. */ + + /* Alignment code needs count to be in register. */ + if (CONST_INT_P (count_exp) && desired_align > align) + { + if (INTVAL (count_exp) > desired_align + && INTVAL (count_exp) > size_needed) + { + align_bytes + = get_mem_align_offset (dst, desired_align * BITS_PER_UNIT); + if (align_bytes <= 0) + align_bytes = 0; + else + align_bytes = desired_align - align_bytes; + } + if (align_bytes == 0) + count_exp = force_reg (counter_mode (count_exp), count_exp); + } + gcc_assert (desired_align >= 1 && align >= 1); + + /* Ensure that alignment prologue won't copy past end of block. */ + if (size_needed > 1 || (desired_align > 1 && desired_align > align)) + { + epilogue_size_needed = MAX (size_needed - 1, desired_align - align); + /* Epilogue always copies COUNT_EXP & EPILOGUE_SIZE_NEEDED bytes. + Make sure it is power of 2. */ + epilogue_size_needed = 1 << (floor_log2 (epilogue_size_needed) + 1); + + if (count) + { + if (count < (unsigned HOST_WIDE_INT)epilogue_size_needed) + { + /* If main algorithm works on QImode, no epilogue is needed. + For small sizes just don't align anything. */ + if (size_needed == 1) + desired_align = align; + else + goto epilogue; + } + } + else + { + label = gen_label_rtx (); + emit_cmp_and_jump_insns (count_exp, + GEN_INT (epilogue_size_needed), + LTU, 0, counter_mode (count_exp), 1, label); + if (expected_size == -1 || expected_size < epilogue_size_needed) + predict_jump (REG_BR_PROB_BASE * 60 / 100); + else + predict_jump (REG_BR_PROB_BASE * 20 / 100); + } + } + + /* Step 2: Alignment prologue. */ + + if (desired_align > align) + { + if (align_bytes == 0) + { + /* Except for the first move in epilogue, we no longer know + constant offset in aliasing info. It don't seems to worth + the pain to maintain it for the first move, so throw away + the info early. */ + src = change_address (src, BLKmode, srcreg); + dst = change_address (dst, BLKmode, destreg); + dst = expand_movmem_prologue (dst, src, destreg, srcreg, count_exp, align, + desired_align); + } + else + { + /* If we know how many bytes need to be stored before dst is + sufficiently aligned, maintain aliasing info accurately. */ + dst = expand_constant_movmem_prologue (dst, &src, destreg, srcreg, + desired_align, align_bytes); + count_exp = plus_constant (counter_mode (count_exp), + count_exp, -align_bytes); + count -= align_bytes; + } + if (need_zero_guard + && (count < (unsigned HOST_WIDE_INT) size_needed + || (align_bytes == 0 + && count < ((unsigned HOST_WIDE_INT) size_needed + + desired_align - align)))) + { + /* It is possible that we copied enough so the main loop will not + execute. 
*/ + gcc_assert (size_needed > 1); + if (label == NULL_RTX) + label = gen_label_rtx (); + emit_cmp_and_jump_insns (count_exp, + GEN_INT (size_needed), + LTU, 0, counter_mode (count_exp), 1, label); + if (expected_size == -1 + || expected_size < (desired_align - align) / 2 + size_needed) + predict_jump (REG_BR_PROB_BASE * 20 / 100); + else + predict_jump (REG_BR_PROB_BASE * 60 / 100); + } + } + if (label && size_needed == 1) + { + emit_label (label); + LABEL_NUSES (label) = 1; + label = NULL; + epilogue_size_needed = 1; + } + else if (label == NULL_RTX) + epilogue_size_needed = size_needed; + + /* Step 3: Main loop. */ + + expand_set_or_movmem_via_loop (dst, src, destreg, srcreg, NULL, + count_exp, move_mode, unroll_factor, + expected_size); + + /* Adjust properly the offset of src and dest memory for aliasing. */ + if (CONST_INT_P (count_exp)) + { + src = adjust_automodify_address_nv (src, BLKmode, srcreg, + (count / size_needed) * size_needed); + dst = adjust_automodify_address_nv (dst, BLKmode, destreg, + (count / size_needed) * size_needed); + } + else + { + src = change_address (src, BLKmode, srcreg); + dst = change_address (dst, BLKmode, destreg); + } + + /* Step 4: Epilogue to copy the remaining bytes. */ + epilogue: + if (label) + { + /* When the main loop is done, COUNT_EXP might hold original count, + while we want to copy only COUNT_EXP & SIZE_NEEDED bytes. + Epilogue code will actually copy COUNT_EXP & EPILOGUE_SIZE_NEEDED + bytes. Compensate if needed. */ + + if (size_needed < epilogue_size_needed) + { + tmp = + expand_simple_binop (counter_mode (count_exp), AND, count_exp, + GEN_INT (size_needed - 1), count_exp, 1, + OPTAB_DIRECT); + if (tmp != count_exp) + emit_move_insn (count_exp, tmp); + } + emit_label (label); + LABEL_NUSES (label) = 1; + } + + if (count_exp != const0_rtx && epilogue_size_needed > 1) + expand_movmem_epilogue (dst, src, destreg, srcreg, count_exp, + epilogue_size_needed); + if (jump_around_label) + emit_label (jump_around_label); + return true; +} + +/* Helper function for memcpy. For QImode value 0xXY produce + 0xXYXYXYXY of wide specified by MODE. This is essentially + a * 0x10101010, but we can do slightly better than + synth_mult by unwinding the sequence by hand on CPUs with + slow multiply. */ +static rtx +promote_duplicated_reg (enum machine_mode mode, rtx val) +{ + enum machine_mode valmode = GET_MODE (val); + rtx tmp; + + gcc_assert (mode == SImode || mode == DImode); + if (val == const0_rtx) + return copy_to_mode_reg (mode, const0_rtx); + + if (CONST_INT_P (val)) + { + HOST_WIDE_INT v = INTVAL (val) & 255; + + v |= v << 8; + v |= v << 16; + if (mode == DImode) + v |= (v << 16) << 16; + return copy_to_mode_reg (mode, gen_int_mode (v, mode)); + } + + if (valmode == VOIDmode) + valmode = QImode; + if (valmode != QImode) + val = gen_lowpart (QImode, val); + if (mode == QImode) + return val; + rtx reg = convert_modes (mode, QImode, val, true); + tmp = promote_duplicated_reg (mode, const1_rtx); + return expand_simple_binop (mode, MULT, reg, tmp, NULL, 1, + OPTAB_DIRECT); +} + +/* Duplicate value VAL using promote_duplicated_reg into maximal size that will + be needed by main loop copying SIZE_NEEDED chunks and prologue getting + alignment from ALIGN to DESIRED_ALIGN. 
*/ +static rtx +promote_duplicated_reg_to_size (rtx val, int size_needed, int desired_align, int align) +{ + rtx promoted_val; + + if (size_needed > 4 || (desired_align > align && desired_align > 4)) + promoted_val = promote_duplicated_reg (DImode, val); + else if (size_needed > 2 || (desired_align > align && desired_align > 2)) + promoted_val = promote_duplicated_reg (SImode, val); + else if (size_needed > 1 || (desired_align > align && desired_align > 1)) + promoted_val = promote_duplicated_reg (HImode, val); + else + promoted_val = val; + + return promoted_val; +} + +/* Expand string clear operation (bzero). Use i386 string operations when + profitable. See expand_movmem comment for explanation of individual + steps performed. */ +bool +nvptx_expand_setmem (rtx dst, rtx count_exp, rtx val_exp, rtx align_exp, + rtx expected_align_exp, rtx expected_size_exp) +{ + rtx destreg; + rtx label = NULL; + rtx tmp; + rtx jump_around_label = NULL; + HOST_WIDE_INT align = 1; + unsigned HOST_WIDE_INT count = 0; + HOST_WIDE_INT expected_size = -1; + int size_needed = 0, epilogue_size_needed; + int desired_align = 0, align_bytes = 0; + rtx promoted_val = NULL; + bool force_loopy_epilogue = false; + bool need_zero_guard = false; + enum machine_mode move_mode = VOIDmode; + int unroll_factor; + + if (CONST_INT_P (align_exp)) + align = INTVAL (align_exp); + /* i386 can do misaligned access on reasonably increased cost. */ + if (CONST_INT_P (expected_align_exp) + && INTVAL (expected_align_exp) > align) + align = INTVAL (expected_align_exp); + if (CONST_INT_P (count_exp)) + count = expected_size = INTVAL (count_exp); + if (CONST_INT_P (expected_size_exp) && count == 0) + expected_size = INTVAL (expected_size_exp); + + /* Make sure we don't need to care about overflow later on. */ + if (count > ((unsigned HOST_WIDE_INT) 1 << 30)) + return false; + + if (!count) + count_exp = copy_to_mode_reg (counter_mode (count_exp), count_exp); + destreg = copy_addr_to_reg (XEXP (dst, 0)); + + move_mode = word_mode; + unroll_factor = 1; + need_zero_guard = true; + if (1) + move_mode = QImode; + + size_needed = GET_MODE_SIZE (move_mode) * unroll_factor; + epilogue_size_needed = size_needed; + + desired_align = decide_alignment (align, expected_size, move_mode); + + /* Step 1: Prologue guard. */ + + /* Do the cheap promotion to allow better CSE across the + main loop and epilogue (ie one load of the big constant in the + front of all code. */ + if (CONST_INT_P (val_exp)) + promoted_val = promote_duplicated_reg_to_size (val_exp, size_needed, + desired_align, align); + + /* Ensure that alignment prologue won't copy past end of block. */ + if (size_needed > 1 || (desired_align > 1 && desired_align > align)) + { + epilogue_size_needed = MAX (size_needed - 1, desired_align - align); + /* Epilogue always copies COUNT_EXP & (EPILOGUE_SIZE_NEEDED - 1) bytes. + Make sure it is power of 2. */ + epilogue_size_needed = 1 << (floor_log2 (epilogue_size_needed) + 1); + + /* To improve performance of small blocks, we jump around the VAL + promoting mode. This mean that if the promoted VAL is not constant, + we might not use it in the epilogue and have to use byte + loop variant. */ + if (epilogue_size_needed > 2 && !promoted_val) + force_loopy_epilogue = true; + if (count) + { + if (count < (unsigned HOST_WIDE_INT)epilogue_size_needed) + { + /* If main algorithm works on QImode, no epilogue is needed. + For small sizes just don't align anything. 
*/ + if (size_needed == 1) + desired_align = align; + else + goto epilogue; + } + } + else + { + label = gen_label_rtx (); + emit_cmp_and_jump_insns (count_exp, + GEN_INT (epilogue_size_needed), + LTU, 0, counter_mode (count_exp), 1, label); + } + } + + /* Step 2: Alignment prologue. */ + + /* Do the expensive promotion once we branched off the small blocks. */ + if (!promoted_val) + promoted_val = promote_duplicated_reg_to_size (val_exp, size_needed, + desired_align, align); + gcc_assert (desired_align >= 1 && align >= 1); + + if (desired_align > align) + { + if (align_bytes == 0) + { + /* Except for the first move in epilogue, we no longer know + constant offset in aliasing info. It don't seems to worth + the pain to maintain it for the first move, so throw away + the info early. */ + dst = change_address (dst, BLKmode, destreg); + expand_setmem_prologue (dst, destreg, promoted_val, count_exp, align, + desired_align); + } + else + { + /* If we know how many bytes need to be stored before dst is + sufficiently aligned, maintain aliasing info accurately. */ + dst = expand_constant_setmem_prologue (dst, destreg, promoted_val, + desired_align, align_bytes); + count_exp = plus_constant (counter_mode (count_exp), + count_exp, -align_bytes); + count -= align_bytes; + } + if (need_zero_guard + && (count < (unsigned HOST_WIDE_INT) size_needed + || (align_bytes == 0 + && count < ((unsigned HOST_WIDE_INT) size_needed + + desired_align - align)))) + { + /* It is possible that we copied enough so the main loop will not + execute. */ + gcc_assert (size_needed > 1); + if (label == NULL_RTX) + label = gen_label_rtx (); + emit_cmp_and_jump_insns (count_exp, + GEN_INT (size_needed), + LTU, 0, counter_mode (count_exp), 1, label); + } + } + if (label && size_needed == 1) + { + emit_label (label); + LABEL_NUSES (label) = 1; + label = NULL; + promoted_val = val_exp; + epilogue_size_needed = 1; + } + else if (label == NULL_RTX) + epilogue_size_needed = size_needed; + + /* Step 3: Main loop. */ + + expand_set_or_movmem_via_loop (dst, NULL, destreg, NULL, promoted_val, + count_exp, move_mode, unroll_factor, + expected_size); + + /* Adjust properly the offset of src and dest memory for aliasing. */ + if (CONST_INT_P (count_exp)) + dst = adjust_automodify_address_nv (dst, BLKmode, destreg, + (count / size_needed) * size_needed); + else + dst = change_address (dst, BLKmode, destreg); + + /* Step 4: Epilogue to copy the remaining bytes. */ + + if (label) + { + /* When the main loop is done, COUNT_EXP might hold original count, + while we want to copy only COUNT_EXP & SIZE_NEEDED bytes. + Epilogue code will actually copy COUNT_EXP & EPILOGUE_SIZE_NEEDED + bytes. Compensate if needed. */ + + if (size_needed < epilogue_size_needed) + { + tmp = + expand_simple_binop (counter_mode (count_exp), AND, count_exp, + GEN_INT (size_needed - 1), count_exp, 1, + OPTAB_DIRECT); + if (tmp != count_exp) + emit_move_insn (count_exp, tmp); + } + emit_label (label); + LABEL_NUSES (label) = 1; + } + epilogue: + if (count_exp != const0_rtx && epilogue_size_needed > 1) + { + if (force_loopy_epilogue) + expand_setmem_epilogue_via_loop (dst, destreg, val_exp, count_exp, + epilogue_size_needed); + else + expand_setmem_epilogue (dst, destreg, promoted_val, count_exp, + epilogue_size_needed); + } + if (jump_around_label) + emit_label (jump_around_label); + return true; +} + +/* Emit a comparison. 
*/ +rtx +nvptx_expand_compare (rtx compare) +{ + rtx pred = gen_reg_rtx (BImode); + rtx cmp = gen_rtx_fmt_ee (GET_CODE (compare), BImode, + XEXP (compare, 0), XEXP (compare, 1)); + emit_insn (gen_rtx_SET (VOIDmode, pred, cmp)); + return gen_rtx_NE (BImode, pred, const0_rtx); +} + +/* Returns true if X is a valid address for use in a memory reference + of mode MODE. If STRICT is true, we do not allow pseudo registers + in the address. */ + +static bool +nvptx_legitimate_address_p (enum machine_mode mode, rtx x, bool strict) +{ + enum rtx_code code = GET_CODE (x); + + switch (code) + { + case REG: + return true; + + case PLUS: + if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1))) + return true; + return false; + + case CONST: + case SYMBOL_REF: + case LABEL_REF: + return true; + + default: + return false; + } +} + +/* Named address space version of legitimate_address_p. */ + +bool +nvptx_addr_space_legitimate_address_p (enum machine_mode mode, rtx mem, + bool strict, addr_space_t) +{ + return targetm.legitimate_address_p (mode, mem, strict); +} + +/* Used when assembling integers to ensure data is emitted in + pieces whose size matches the declaration we printed. */ +static int decl_chunk_size; +static enum machine_mode decl_chunk_mode; +/* Used in the same situation, to keep track of the byte offset + into the initializer. */ +static int decl_offset; +/* The initializer part we are currently processing. */ +static HOST_WIDE_INT init_part; +/* The total size of the object. */ +static HOST_WIDE_INT object_size; +/* True if we found a skip extending to the end of the object. Used to + assert that no data follows. */ +static bool object_finished; + +static void +begin_decl_field (void) +{ + /* We never see decl_offset at zero by the time we get here. */ + if (decl_offset == decl_chunk_size) + fprintf (asm_out_file, " = { "); + else + fprintf (asm_out_file, ", "); +} + +static void +output_decl_chunk (void) +{ + begin_decl_field (); + output_address (gen_int_mode (init_part, decl_chunk_mode)); + init_part = 0; +} + +static void +nvptx_assemble_value (HOST_WIDE_INT val, unsigned int size) +{ + int chunk_offset + = decl_offset % decl_chunk_size; + gcc_assert (!object_finished); + while (size > 0) + { + int this_part = size; + if (chunk_offset + this_part > decl_chunk_size) + this_part = decl_chunk_size - chunk_offset; + HOST_WIDE_INT val_part; + HOST_WIDE_INT mask = 2; + mask <<= (this_part * BITS_PER_UNIT - 1); + val_part = val & (mask - 1); + init_part |= val_part << (BITS_PER_UNIT * chunk_offset); + val >>= BITS_PER_UNIT * this_part; + size -= this_part; + decl_offset += this_part; + if (decl_offset % decl_chunk_size == 0) + output_decl_chunk (); + + chunk_offset = 0; + } +} + +/* Target hook for assembling integer objects. 
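+   Initializers are buffered through nvptx_assemble_value into
+   decl_chunk_size-sized pieces, since PTX declares the object as an
+   array of integer cells.  As a sketch (assuming a 4-byte chunk size),
+   an initialized global "int x = 5;" is expected to come out roughly
+   as:
+
+        .visible .global .align 4 .u32 x[1] = { 5 };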
   */
+
+static bool
+nvptx_assemble_integer (rtx x, unsigned int size, int aligned_p)
+{
+  if (GET_CODE (x) == AS_CONVERT || GET_CODE (x) == SYMBOL_REF
+      || GET_CODE (x) == CONST)
+    {
+      gcc_assert (size == decl_chunk_size);
+      if (decl_offset % decl_chunk_size != 0)
+        sorry ("cannot emit unaligned pointers in ptx assembly");
+      decl_offset += size;
+      begin_decl_field ();
+
+      HOST_WIDE_INT off = 0;
+      bool generic = false;
+      if (GET_CODE (x) == CONST)
+        x = XEXP (x, 0);
+      if (GET_CODE (x) == AS_CONVERT)
+        {
+          gcc_assert (ADDR_SPACE_GENERIC_P (XINT (x, 1)));
+          x = XEXP (x, 0);
+          generic = true;
+        }
+      if (GET_CODE (x) == PLUS)
+        {
+          off = INTVAL (XEXP (x, 1));
+          x = XEXP (x, 0);
+        }
+      if (GET_CODE (x) == AS_CONVERT)
+        {
+          x = XEXP (x, 0);
+          generic = true;
+        }
+      if (generic)
+        fprintf (asm_out_file, "generic(");
+      output_address (x);
+      if (generic)
+        fprintf (asm_out_file, ")");
+      if (off != 0)
+        fprintf (asm_out_file, " + " HOST_WIDE_INT_PRINT_DEC, off);
+      return true;
+    }
+
+  HOST_WIDE_INT val;
+  switch (GET_CODE (x))
+    {
+    case CONST_INT:
+      val = INTVAL (x);
+      break;
+    case CONST_DOUBLE:
+      gcc_unreachable ();
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  nvptx_assemble_value (val, size);
+  return true;
+}
+
+void
+nvptx_output_skip (FILE *, unsigned HOST_WIDE_INT size)
+{
+  if (decl_offset + size >= object_size)
+    {
+      if (decl_offset % decl_chunk_size != 0)
+        nvptx_assemble_value (0, decl_chunk_size);
+      object_finished = true;
+      return;
+    }
+
+  while (size > decl_chunk_size)
+    {
+      nvptx_assemble_value (0, decl_chunk_size);
+      size -= decl_chunk_size;
+    }
+  while (size-- > 0)
+    nvptx_assemble_value (0, 1);
+}
+
+void
+nvptx_output_ascii (FILE *, const char *str, unsigned HOST_WIDE_INT size)
+{
+  for (unsigned HOST_WIDE_INT i = 0; i < size; i++)
+    nvptx_assemble_value (str[i], 1);
+}
+
+static void
+nvptx_assemble_decl_end (void)
+{
+  if (decl_offset != 0)
+    {
+      if (!object_finished && decl_offset % decl_chunk_size != 0)
+        nvptx_assemble_value (0, decl_chunk_size);
+
+      fprintf (asm_out_file, " }");
+    }
+  fprintf (asm_out_file, ";\n");
+  fprintf (asm_out_file, "// END VAR DEF\n");
+}
+
+static void
+init_output_initializer (FILE *file, const char *name, const_tree type)
+{
+  fprintf (file, "// BEGIN VAR DEF: %s\n", name);
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    type = TREE_TYPE (type);
+  int sz = int_size_in_bytes (type);
+  if ((TREE_CODE (type) != INTEGER_TYPE
+       && TREE_CODE (type) != REAL_TYPE)
+      || sz < 0
+      || sz > HOST_BITS_PER_WIDE_INT)
+    type = ptr_type_node;
+  decl_chunk_size = int_size_in_bytes (type);
+  decl_chunk_mode = int_mode_for_mode (TYPE_MODE (type));
+  decl_offset = 0;
+  init_part = 0;
+  object_finished = false;
+}
+
+/* Return the PTX section directive for address space AS.  */
+
+static const char *
+section_from_addr_space (addr_space_t as)
+{
+  switch (as)
+    {
+    case ADDR_SPACE_CONST:
+      return ".const";
+
+    case ADDR_SPACE_GLOBAL:
+      return ".global";
+
+    case ADDR_SPACE_SHARED:
+      return ".shared";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+static void
+nvptx_asm_declare_constant_name (FILE *file, const char *name,
+                                 const_tree exp, HOST_WIDE_INT size)
+{
+  tree type = TREE_TYPE (exp);
+  init_output_initializer (file, name, type);
+  fprintf (file, "\t.const .align %d .u%d ",
+           TYPE_ALIGN (TREE_TYPE (exp)) / BITS_PER_UNIT,
+           decl_chunk_size * BITS_PER_UNIT);
+  assemble_name (file, name);
+  fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]",
+           (size + decl_chunk_size - 1) / decl_chunk_size);
+  object_size = size;
+}
+
+void
+nvptx_declare_object_name (FILE *file, const char *name, const_tree decl)
+{
+  if (decl &&
DECL_SIZE (decl)) + { + tree type = TREE_TYPE (decl); + unsigned HOST_WIDE_INT size; + + init_output_initializer (file, name, type); + size = tree_to_uhwi (DECL_SIZE_UNIT (decl)); + addr_space_t as = TYPE_ADDR_SPACE (type); + const char *section = section_from_addr_space (as); + fprintf (file, "\t%s%s .align %d .u%d ", + TREE_PUBLIC (decl) ? " .visible" : "", section, + DECL_ALIGN (decl) / BITS_PER_UNIT, + decl_chunk_size * BITS_PER_UNIT); + assemble_name (file, name); + if (size > 0) + fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]", + (size + decl_chunk_size - 1) / decl_chunk_size); + else + object_finished = true; + object_size = size; + } +} + +static void +nvptx_globalize_label (FILE *, const char *label) +{ +} + +static void +nvptx_assemble_undefined_decl (FILE *file, const char *name, const_tree decl) +{ + if (TREE_CODE (decl) != VAR_DECL) + return; + addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (decl)); + const char *section = section_from_addr_space (as); + fprintf (file, "// BEGIN VAR DEF: %s\n", name); + HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (decl)); + fprintf (file, ".extern %s .b8 %s["HOST_WIDE_INT_PRINT_DEC"];", + section, name, size >= 0 ? size : 1); + fprintf (file, "// END VAR DEF\n"); +} + +const char * +nvptx_output_start_call (rtx op) +{ + fprintf (asm_out_file, "\t{\n"); + if (op != const0_rtx) + { + fprintf (asm_out_file, "\t\t.param%s %%retval_in;\n", + nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (op)), + false)); + } + return ""; +} + +const char * +nvptx_output_call_insn (rtx insn, rtx result, rtx callee) +{ + char buf[256]; + static int labelno; + bool needs_tgt = register_operand (callee, Pmode); + rtx pat = PATTERN (insn); + int nargs = XVECLEN (pat, 0) - 1; + + fprintf (asm_out_file, "{"); + if (needs_tgt) + { + ASM_GENERATE_INTERNAL_LABEL (buf, "LCT", labelno); + labelno++; + ASM_OUTPUT_LABEL (asm_out_file, buf); + fprintf (asm_out_file, "\t.callprototype\t"); + if (result != NULL_RTX) + fprintf (asm_out_file, "(.param%s _) ", + nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)), + false)); + fprintf (asm_out_file, "_"); + if (nargs > 0) + { + fprintf (asm_out_file, " ("); + for (int i = 0; i < nargs; i++) + { + rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0); + enum machine_mode mode = arg_promotion (GET_MODE (t)); + fprintf (asm_out_file, ".param%s _", + nvptx_ptx_type_from_mode (mode, false)); + if (i + 1 < nargs) + fprintf (asm_out_file, ", "); + } + fprintf (asm_out_file, ")"); + } + fprintf (asm_out_file, ";\n"); + } + + for (int i = 0; i < nargs; i++) + { + rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0); + fprintf (asm_out_file, "\t\t.param%s %%out_arg%d;\n", + nvptx_ptx_type_from_mode (GET_MODE (t), false), i); + } + for (int i = 0; i < nargs; i++) + { + rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0); + gcc_assert (REG_P (t)); + fprintf (asm_out_file, "\t\tst.param%s [%%out_arg%d], %%r%d;\n", + nvptx_ptx_type_from_mode (GET_MODE (t), false), i, + REGNO (t)); + } + + fprintf (asm_out_file, "\t\tcall "); + if (result != NULL_RTX) + fprintf (asm_out_file, "(%%retval_in), "); + + output_address (callee); + if (nargs > 0) + { + fprintf (asm_out_file, ", ("); + for (int i = 0; i < nargs; i++) + { + fprintf (asm_out_file, "%%out_arg%d", i); + if (i + 1 < nargs) + fprintf (asm_out_file, ", "); + } + fprintf (asm_out_file, ")"); + } + if (needs_tgt) + { + fprintf (asm_out_file, ", "); + assemble_name (asm_out_file, buf); + } + fprintf (asm_out_file, ";\n\t}\n"); + if (result != NULL_RTX) + return "\tld.param%t0\t%0, [%%retval_in];\n}"; + + return "}"; 
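+  /* (Sketch: the strings returned above are output templates; for a
+     call with a return value the tail expands after operand
+     substitution to something like
+     "ld.param.u32 %r23, [%retval_in];" followed by "}", closing the
+     brace block opened by nvptx_output_start_call.)  */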
+}
+
+/* Implement TARGET_PRINT_OPERAND_PUNCT_VALID_P.  */
+
+static bool
+nvptx_print_operand_punct_valid_p (unsigned char c)
+{
+  return c == '.' || c == '#';
+}
+
+static void nvptx_print_operand (FILE *, rtx, int);
+
+/* Subroutine of nvptx_print_operand; used to print a memory reference X
+   to FILE.  */
+
+static void
+nvptx_print_address_operand (FILE *file, rtx x, enum machine_mode)
+{
+  rtx off;
+  switch (GET_CODE (x))
+    {
+    case PLUS:
+      off = XEXP (x, 1);
+      output_address (XEXP (x, 0));
+      fprintf (file, "+");
+      output_address (off);
+      break;
+
+    case SYMBOL_REF:
+    case CONST:
+    case LABEL_REF:
+      output_addr_const (file, x);
+      break;
+
+    default:
+      gcc_assert (GET_CODE (x) != MEM);
+      nvptx_print_operand (file, x, 0);
+      break;
+    }
+}
+
+/* Output assembly language output for the address ADDR to FILE.  */
+
+static void
+nvptx_print_operand_address (FILE *file, rtx addr)
+{
+  nvptx_print_address_operand (file, addr, VOIDmode);
+}
+
+/* Print an operand, X, to FILE, with an optional modifier in CODE.
+
+   Meaning of CODE:
+   . -- print the predicate for the instruction or an empty string for an
+        unconditional one.
+   # -- print a rounding mode for the instruction
+
+   A -- print an address space identifier for a MEM
+   c -- print an opcode suffix for a comparison operator, including a type
+        code
+   j -- print the operand prefixed with "@", for use as a branch predicate
+   J -- likewise, but with "@!" (negated predicate)
+   t -- print a type opcode suffix, promoting QImode to 32 bits
+   T -- print a type size in bits
+   u -- print a type opcode suffix without promotions  */
+
+static void
+nvptx_print_operand (FILE *file, rtx x, int code)
+{
+  rtx orig_x = x;
+  enum machine_mode op_mode;
+
+  if (code == '.')
+    {
+      x = current_insn_predicate;
+      if (x)
+        {
+          unsigned int regno = REGNO (XEXP (x, 0));
+          fputs ("[", file);
+          if (GET_CODE (x) == EQ)
+            fputs ("!", file);
+          fputs (reg_names[regno], file);
+          fputs ("]", file);
+        }
+      return;
+    }
+  else if (code == '#')
+    {
+      fputs (".rn", file);
+      return;
+    }
+
+  enum rtx_code x_code = GET_CODE (x);
+
+  switch (code)
+    {
+    case 'A':
+      switch (MEM_ADDR_SPACE (x))
+        {
+        case ADDR_SPACE_GLOBAL:
+          fputs (".global", file);
+          break;
+        case ADDR_SPACE_LOCAL:
+          fputs (".local", file);
+          break;
+        case ADDR_SPACE_SHARED:
+          fputs (".shared", file);
+          break;
+        case ADDR_SPACE_PARAM:
+          fputs (".param", file);
+          break;
+        case ADDR_SPACE_CONST:
+          fputs (".const", file);
+          break;
+        default:
+          break;
+        }
+      break;
+
+    case 't':
+      if (x_code == SUBREG)
+        x = SUBREG_REG (x);
+      fprintf (file, "%s", nvptx_ptx_type_from_mode (GET_MODE (x), true));
+      break;
+
+    case 'u':
+      if (x_code == SUBREG)
+        x = SUBREG_REG (x);
+      fprintf (file, "%s", nvptx_ptx_type_from_mode (GET_MODE (x), false));
+      break;
+
+    case 'T':
+      fprintf (file, "%d", GET_MODE_BITSIZE (GET_MODE (x)));
+      break;
+
+    case 'j':
+      fprintf (file, "@");
+      goto common;
+
+    case 'J':
+      fprintf (file, "@!");
+      goto common;
+
+    case 'c':
+      op_mode = GET_MODE (XEXP (x, 0));
+      switch (x_code)
+        {
+        case EQ:
+          fputs (".eq", file);
+          break;
+        case NE:
+          if (FLOAT_MODE_P (op_mode))
+            fputs (".neu", file);
+          else
+            fputs (".ne", file);
+          break;
+        case LE:
+          fputs (".le", file);
+          break;
+        case GE:
+          fputs (".ge", file);
+          break;
+        case LT:
+          fputs (".lt", file);
+          break;
+        case GT:
+          fputs (".gt", file);
+          break;
+        case LEU:
+          fputs (".ls", file);
+          break;
+        case GEU:
+          fputs (".hs", file);
+          break;
+        case LTU:
+          fputs (".lo", file);
+          break;
+        case GTU:
+          fputs (".hi", file);
+          break;
+        case LTGT:
+          fputs (".ne", file);
+          break;
+        case UNEQ:
+          fputs (".equ", file);
+          break;
+        case UNLE:
+          fputs (".leu", file);
+          break;
+        case UNGE:
+          fputs (".geu", file);
+          break;
+        case UNLT:
+          fputs (".ltu", file);
+          break;
case UNGT: + fputs (".gtu", file); + break; + default: + gcc_unreachable (); + } + if (FLOAT_MODE_P (op_mode) + || x_code == EQ || x_code == NE + || x_code == GEU || x_code == GTU + || x_code == LEU || x_code == LTU) + fputs (nvptx_ptx_type_from_mode (op_mode, true), file); + else + fprintf (file, ".s%d", GET_MODE_BITSIZE (op_mode)); + break; + default: + common: + switch (x_code) + { + case SUBREG: + x = SUBREG_REG (x); + /* fall through */ + + case REG: + if (HARD_REGISTER_P (x)) + fprintf (file, "%s", reg_names[REGNO (x)]); + else + fprintf (file, "%%r%d", REGNO (x)); + if (nvptx_split_reg_p (GET_MODE (x))) + { + gcc_assert (GET_CODE (orig_x) == SUBREG + && !nvptx_split_reg_p (GET_MODE (orig_x))); + fprintf (file, "$%d", SUBREG_BYTE (orig_x) / UNITS_PER_WORD); + } + break; + + case MEM: + fputc ('[', file); + nvptx_print_address_operand (file, XEXP (x, 0), GET_MODE (x)); + fputc (']', file); + break; + + case CONST_INT: + output_addr_const (file, x); + break; + + case CONST: + case SYMBOL_REF: + case LABEL_REF: + /* We could use output_addr_const, but that can print things like + "x-8", which breaks ptxas. Need to ensure it is output as + "x+-8". */ + nvptx_print_address_operand (file, x, VOIDmode); + break; + + case CONST_DOUBLE: + long vals[2]; + REAL_VALUE_TYPE real; + REAL_VALUE_FROM_CONST_DOUBLE (real, x); + real_to_target (vals, &real, GET_MODE (x)); + vals[0] &= 0xffffffff; + vals[1] &= 0xffffffff; + if (GET_MODE (x) == SFmode) + fprintf (file, "0f%08lx", vals[0]); + else + fprintf (file, "0d%08lx%08lx", vals[1], vals[0]); + break; + + default: + output_addr_const (file, x); + } + } +} + +struct reg_replace +{ + rtx replacement[MAX_RECOG_OPERANDS]; + enum machine_mode mode; + int n_allocated; + int n_in_use; +}; + +static rtx +get_replacement (struct reg_replace *r) +{ + if (r->n_allocated == r->n_in_use) + r->replacement[r->n_allocated++] = gen_reg_rtx (r->mode); + return r->replacement[r->n_in_use++]; +} + +static void +nvptx_reorg (void) +{ + struct reg_replace qiregs, hiregs, siregs, diregs; + rtx insn, next; + + /* We are freeing block_for_insn in the toplev to keep compatibility + with old MDEP_REORGS that are not CFG based. Recompute it now. */ + compute_bb_for_insn (); + + df_clear_flags (DF_LR_RUN_DCE); + df_analyze (); + + thread_prologue_and_epilogue_insns (); + + qiregs.n_allocated = 0; + hiregs.n_allocated = 0; + siregs.n_allocated = 0; + diregs.n_allocated = 0; + qiregs.mode = QImode; + hiregs.mode = HImode; + siregs.mode = SImode; + diregs.mode = DImode; + + for (insn = get_insns (); insn; insn = next) + { + next = NEXT_INSN (insn); + if (!NONDEBUG_INSN_P (insn) + || asm_noperands (insn) >= 0 + || GET_CODE (PATTERN (insn)) == USE + || GET_CODE (PATTERN (insn)) == CLOBBER) + continue; + qiregs.n_in_use = 0; + hiregs.n_in_use = 0; + siregs.n_in_use = 0; + diregs.n_in_use = 0; + extract_insn (insn); + enum attr_subregs_ok s_ok = get_attr_subregs_ok (insn); + for (int i = 0; i < recog_data.n_operands; i++) + { + rtx op = recog_data.operand[i]; + if (GET_CODE (op) != SUBREG) + continue; + + rtx inner = SUBREG_REG (op); + + enum machine_mode outer_mode = GET_MODE (op); + enum machine_mode inner_mode = GET_MODE (inner); + if (s_ok + && (GET_MODE_PRECISION (inner_mode) + >= GET_MODE_PRECISION (outer_mode))) + continue; + gcc_assert (SCALAR_INT_MODE_P (outer_mode)); + struct reg_replace *r = (outer_mode == QImode ? &qiregs + : outer_mode == HImode ? &hiregs + : outer_mode == SImode ? 
&siregs + : &diregs); + rtx new_reg = get_replacement (r); + + if (recog_data.operand_type[i] != OP_OUT) + { + enum rtx_code code; + if (GET_MODE_PRECISION (inner_mode) + < GET_MODE_PRECISION (outer_mode)) + code = ZERO_EXTEND; + else + code = TRUNCATE; + + rtx pat = gen_rtx_SET (VOIDmode, new_reg, + gen_rtx_fmt_e (code, outer_mode, inner)); + emit_insn_before (pat, insn); + } + + if (recog_data.operand_type[i] != OP_IN) + { + enum rtx_code code; + if (GET_MODE_PRECISION (inner_mode) + < GET_MODE_PRECISION (outer_mode)) + code = TRUNCATE; + else + code = ZERO_EXTEND; + + rtx pat = gen_rtx_SET (VOIDmode, inner, + gen_rtx_fmt_e (code, inner_mode, new_reg)); + emit_insn_after (pat, insn); + } + validate_change (insn, recog_data.operand_loc[i], new_reg, false); + } + } + + int maxregs = max_reg_num (); + regstat_init_n_sets_and_refs (); + + for (int i = LAST_VIRTUAL_REGISTER + 1; i < maxregs; i++) + if (REG_N_SETS (i) == 0 && REG_N_REFS (i) == 0) + regno_reg_rtx[i] = const0_rtx; + regstat_free_n_sets_and_refs (); +} + +static addr_space_t +nvptx_addr_space_for_frame (void) +{ + return ADDR_SPACE_LOCAL; +} + +static addr_space_t +nvptx_addr_space_for_global (bool constp) +{ + return constp ? ADDR_SPACE_CONST : ADDR_SPACE_GLOBAL; +} + +static bool +nvptx_addr_space_subset_p (addr_space_t as1, addr_space_t as2) +{ + return as2 == ADDR_SPACE_GENERIC; +} + +static rtx +nvptx_addr_space_convert (rtx op, tree from_type, tree to_type) +{ + addr_space_t from_as = TYPE_ADDR_SPACE (TREE_TYPE (from_type)); + addr_space_t to_as = TYPE_ADDR_SPACE (TREE_TYPE (to_type)); + + gcc_assert (from_as == ADDR_SPACE_GENERIC || to_as == ADDR_SPACE_GENERIC); + + enum unspec code; + if (from_as == ADDR_SPACE_GENERIC) + { + code = (to_as == ADDR_SPACE_GLOBAL ? UNSPEC_TO_GLOBAL + : to_as == ADDR_SPACE_LOCAL ? UNSPEC_TO_LOCAL + : to_as == ADDR_SPACE_SHARED ? UNSPEC_TO_SHARED + : to_as == ADDR_SPACE_CONST ? UNSPEC_TO_CONST + : UNSPEC_TO_PARAM); + } + else + { + code = (from_as == ADDR_SPACE_GLOBAL ? UNSPEC_FROM_GLOBAL + : from_as == ADDR_SPACE_LOCAL ? UNSPEC_FROM_LOCAL + : from_as == ADDR_SPACE_SHARED ? UNSPEC_FROM_SHARED + : from_as == ADDR_SPACE_CONST ? UNSPEC_FROM_CONST + : UNSPEC_FROM_PARAM); + } + rtx dest = gen_reg_rtx (Pmode); + if (!REG_P (op)) + op = force_reg (Pmode, op); + emit_insn (gen_rtx_SET (VOIDmode, dest, + gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op), code))); + return dest; +} + +/* Return the mode for a pointer to a given ADDRSPACE. */ + +enum machine_mode +nvptx_addr_space_pointer_mode (addr_space_t addrspace ATTRIBUTE_UNUSED) +{ + return ptr_mode; +} + +/* Return the mode for an address in a given ADDRSPACE. */ + +enum machine_mode +nvptx_addr_space_address_mode (addr_space_t addrspace ATTRIBUTE_UNUSED) +{ + return Pmode; +} + +/* Handle a "kernel" attribute; arguments as in + struct attribute_spec.handler. */ + +static tree +nvptx_handle_kernel_attribute (tree *node, tree name, tree ARG_UNUSED (args), + int ARG_UNUSED (flags), bool *no_add_attrs) +{ + tree decl = *node; + + if (TREE_CODE (decl) != FUNCTION_DECL) + { + error ("%qE attribute only applies to functions", name); + *no_add_attrs = true; + } + + else if (TREE_TYPE (TREE_TYPE (decl)) != void_type_node) + { + error ("%qE attribute requires a void return type", name); + *no_add_attrs = true; + } + + return NULL_TREE; +} + +/* Table of valid machine attributes. 
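+   Currently only "kernel" is recognized.  As a usage sketch, a function
+   declared
+
+        void foo (int *out) __attribute__((kernel));
+
+   is meant to become a PTX kernel entry point; the handler above
+   enforces that such a function returns void.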
*/ +static const struct attribute_spec nvptx_attribute_table[] = +{ + /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler, + affects_type_identity } */ + { "kernel", 0, 0, true, false, false, nvptx_handle_kernel_attribute, false }, + { NULL, 0, 0, false, false, false, NULL, false } +}; + +static void +nvptx_asm_named_section (const char *name, unsigned int flags, tree decl) +{ + default_elf_asm_named_section (name, flags, decl); +} +/* Implement TARGET_ASM_INITIALIZE_SECTIONS. */ + +static void +nvptx_asm_init_sections (void) +{ +#if 0 + debug_frame_section = get_unnamed_section (0, output_section_asm_op, + "\t.section .debug_frame, \"\", @progbits"); +#endif +} + +#undef TARGET_OPTION_OVERRIDE +#define TARGET_OPTION_OVERRIDE nvptx_option_override + +#undef TARGET_ATTRIBUTE_TABLE +#define TARGET_ATTRIBUTE_TABLE nvptx_attribute_table + +#undef TARGET_LEGITIMATE_ADDRESS_P +#define TARGET_LEGITIMATE_ADDRESS_P nvptx_legitimate_address_p +#undef TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P +#define TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P nvptx_addr_space_legitimate_address_p + +#undef TARGET_PROMOTE_FUNCTION_MODE +#define TARGET_PROMOTE_FUNCTION_MODE default_promote_function_mode_always_promote +//#undef TARGET_PROMOTE_PROTOTYPES +//#define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_true + +#undef TARGET_FUNCTION_ARG +#define TARGET_FUNCTION_ARG nvptx_function_arg +#undef TARGET_FUNCTION_INCOMING_ARG +#define TARGET_FUNCTION_INCOMING_ARG nvptx_function_incoming_arg +#undef TARGET_FUNCTION_ARG_ADVANCE +#define TARGET_FUNCTION_ARG_ADVANCE nvptx_function_arg_advance +#undef TARGET_FUNCTION_ARG_BOUNDARY +#define TARGET_FUNCTION_ARG_BOUNDARY nvptx_function_arg_boundary +#undef TARGET_FUNCTION_ARG_ROUND_BOUNDARY +#define TARGET_FUNCTION_ARG_ROUND_BOUNDARY \ + nvptx_function_arg_round_boundary +#undef TARGET_PASS_BY_REFERENCE +#define TARGET_PASS_BY_REFERENCE nvptx_pass_by_reference +#undef TARGET_FUNCTION_VALUE_REGNO_P +#define TARGET_FUNCTION_VALUE_REGNO_P nvptx_function_value_regno_p +#undef TARGET_FUNCTION_VALUE +#define TARGET_FUNCTION_VALUE nvptx_function_value +#undef TARGET_LIBCALL_VALUE +#define TARGET_LIBCALL_VALUE nvptx_libcall_value +#undef TARGET_FUNCTION_OK_FOR_SIBCALL +#define TARGET_FUNCTION_OK_FOR_SIBCALL nvptx_function_ok_for_sibcall +#undef TARGET_SPLIT_COMPLEX_ARG +#define TARGET_SPLIT_COMPLEX_ARG hook_bool_const_tree_true +#undef TARGET_RETURN_IN_MEMORY +#define TARGET_RETURN_IN_MEMORY nvptx_return_in_memory +#undef TARGET_OMIT_STRUCT_RETURN_REG +#define TARGET_OMIT_STRUCT_RETURN_REG true +#undef TARGET_STRICT_ARGUMENT_NAMING +#define TARGET_STRICT_ARGUMENT_NAMING hook_bool_CUMULATIVE_ARGS_true + +#undef TARGET_START_CALL_ARGS +#define TARGET_START_CALL_ARGS nvptx_start_call_args +#undef TARGET_END_CALL_ARGS +#define TARGET_END_CALL_ARGS nvptx_end_call_args + +#undef TARGET_ASM_FILE_START +#define TARGET_ASM_FILE_START nvptx_file_start +#undef TARGET_ASM_GLOBALIZE_LABEL +#define TARGET_ASM_GLOBALIZE_LABEL nvptx_globalize_label +#undef TARGET_ASM_ASSEMBLE_UNDEFINED_DECL +#define TARGET_ASM_ASSEMBLE_UNDEFINED_DECL nvptx_assemble_undefined_decl +#undef TARGET_PRINT_OPERAND +#define TARGET_PRINT_OPERAND nvptx_print_operand +#undef TARGET_PRINT_OPERAND_ADDRESS +#define TARGET_PRINT_OPERAND_ADDRESS nvptx_print_operand_address +#undef TARGET_PRINT_OPERAND_PUNCT_VALID_P +#define TARGET_PRINT_OPERAND_PUNCT_VALID_P nvptx_print_operand_punct_valid_p +#undef TARGET_ASM_INTEGER +#define TARGET_ASM_INTEGER nvptx_assemble_integer +#undef TARGET_ASM_DECL_END +#define 
TARGET_ASM_DECL_END nvptx_assemble_decl_end
+#undef TARGET_ASM_DECLARE_CONSTANT_NAME
+#define TARGET_ASM_DECLARE_CONSTANT_NAME nvptx_asm_declare_constant_name
+#undef TARGET_ASM_PREDECLARE_FUNCTION
+#define TARGET_ASM_PREDECLARE_FUNCTION nvptx_asm_predeclare_function
+#undef TARGET_ASM_NEED_VAR_DECL_BEFORE_USE
+#define TARGET_ASM_NEED_VAR_DECL_BEFORE_USE true
+
+#undef TARGET_MACHINE_DEPENDENT_REORG
+#define TARGET_MACHINE_DEPENDENT_REORG nvptx_reorg
+#undef TARGET_NO_REGISTER_ALLOCATION
+#define TARGET_NO_REGISTER_ALLOCATION true
+
+#undef TARGET_ADDR_SPACE_FOR_FRAME
+#define TARGET_ADDR_SPACE_FOR_FRAME nvptx_addr_space_for_frame
+#undef TARGET_ADDR_SPACE_FOR_GLOBAL
+#define TARGET_ADDR_SPACE_FOR_GLOBAL nvptx_addr_space_for_global
+#undef TARGET_ADDR_SPACE_SUBSET_P
+#define TARGET_ADDR_SPACE_SUBSET_P nvptx_addr_space_subset_p
+#undef TARGET_ADDR_SPACE_CONVERT
+#define TARGET_ADDR_SPACE_CONVERT nvptx_addr_space_convert
+#undef TARGET_ADDR_SPACE_ADDRESS_MODE
+#define TARGET_ADDR_SPACE_ADDRESS_MODE nvptx_addr_space_address_mode
+#undef TARGET_ADDR_SPACE_POINTER_MODE
+#define TARGET_ADDR_SPACE_POINTER_MODE nvptx_addr_space_pointer_mode
+
+#undef TARGET_ASM_INIT_SECTIONS
+#define TARGET_ASM_INIT_SECTIONS nvptx_asm_init_sections
+#undef TARGET_ASM_NAMED_SECTION
+#define TARGET_ASM_NAMED_SECTION nvptx_asm_named_section
+
+struct gcc_target targetm = TARGET_INITIALIZER;
Index: gcc/config/nvptx/nvptx.opt
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx.opt
@@ -0,0 +1,28 @@
+; Options for the NVPTX port
+; Copyright (C) 2013 Free Software Foundation, Inc.
+;
+; This file is part of GCC.
+;
+; GCC is free software; you can redistribute it and/or modify it under
+; the terms of the GNU General Public License as published by the Free
+; Software Foundation; either version 3, or (at your option) any later
+; version.
+;
+; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+; FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+; for more details.
+;
+; You should have received a copy of the GNU General Public License
+; along with GCC; see the file COPYING3.  If not see
+; <http://www.gnu.org/licenses/>.
+
+m64
+Target Report RejectNegative Mask(ABI64)
+Generate code for a 64-bit ABI
+
+m32
+Target Report RejectNegative InverseMask(ABI64)
+Generate code for a 32-bit ABI
+
+
Index: gcc/config/nvptx/t-nvptx
===================================================================
--- /dev/null
+++ gcc/config/nvptx/t-nvptx
@@ -0,0 +1,4 @@
+#
+
+nvptx-c.o: $(srcdir)/config/nvptx/nvptx-c.c $(RTL_H) $(TREE_H) $(CONFIG_H) $(TM_H)
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $<
Index: gcc/config/nvptx/nvptx.h
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx.h
@@ -0,0 +1,327 @@
+/* Target Definitions for NVPTX.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+ + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_NVPTX_H +#define GCC_NVPTX_H + +/* Run-time Target. */ + +#define TARGET_CPU_CPP_BUILTINS() \ + do \ + { \ + builtin_assert ("machine=nvptx"); \ + builtin_assert ("cpu=nvptx"); \ + builtin_define ("__nvptx__"); \ + } while (0) + +/* Storage Layout. */ + +#define BITS_BIG_ENDIAN 0 +#define BYTES_BIG_ENDIAN 0 +#define WORDS_BIG_ENDIAN 0 + +/* Chosen such that we won't have to deal with multi-word subregs. */ +#define UNITS_PER_WORD 8 + +#define PARM_BOUNDARY 8 +#define STACK_BOUNDARY 64 +#define FUNCTION_BOUNDARY 32 +#define BIGGEST_ALIGNMENT 64 +#define STRICT_ALIGNMENT 1 + +/* Type Layout. */ + +#define DEFAULT_SIGNED_CHAR 1 + +#define SHORT_TYPE_SIZE 16 +#define INT_TYPE_SIZE 32 +#define LONG_TYPE_SIZE (TARGET_ABI64 ? 64 : 32) +#define LONG_LONG_TYPE_SIZE 64 +#define FLOAT_TYPE_SIZE 32 +#define DOUBLE_TYPE_SIZE 64 +#define LONG_DOUBLE_TYPE_SIZE 64 + +#undef SIZE_TYPE +#define SIZE_TYPE "long unsigned int" +#undef PTRDIFF_TYPE +#define PTRDIFF_TYPE "int" + +#define POINTER_SIZE (TARGET_ABI64 ? 64 : 32) + +#define Pmode (TARGET_ABI64 ? DImode : SImode) + +/* Registers. */ + +#define FIRST_PSEUDO_REGISTER 16 +#define FIXED_REGISTERS \ + { 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1 } +#define CALL_USED_REGISTERS \ + { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 } + +#define HARD_REGNO_NREGS(regno, mode) 1 + +#define HARD_REGNO_MODE_OK(reg, mode) 1 + +/* Register Classes. */ + +enum reg_class + { + NO_REGS, + ALL_REGS, + LIM_REG_CLASSES + }; + +#define N_REG_CLASSES (int) LIM_REG_CLASSES + +#define REG_CLASS_NAMES { \ + "NO_REGS", \ + "ALL_REGS" } + +#define REG_CLASS_CONTENTS \ +{ \ + /* NO_REGS. */ \ + { 0x0000 }, \ + /* ALL_REGS. */ \ + { 0xFFFF }, \ +} + +#define GENERAL_REGS ALL_REGS + +#define REGNO_REG_CLASS(reg) ALL_REGS + +#define BASE_REG_CLASS ALL_REGS +#define INDEX_REG_CLASS NO_REGS + +#define REGNO_OK_FOR_BASE_P(X) true +#define REGNO_OK_FOR_INDEX_P(X) false + +#define CLASS_MAX_NREGS(class, mode) \ + ((GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD) + +#define MODES_TIEABLE_P(M1, M2) false + +#define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE) \ + if (GET_MODE_CLASS (MODE) == MODE_INT \ + && GET_MODE_SIZE (MODE) < GET_MODE_SIZE (SImode)) \ + { \ + (MODE) = SImode; \ + } + +/* Address spaces. */ +#define ADDR_SPACE_GLOBAL 1 +#define ADDR_SPACE_SHARED 3 +#define ADDR_SPACE_CONST 4 +#define ADDR_SPACE_LOCAL 5 +#define ADDR_SPACE_PARAM 101 + +#define REGISTER_TARGET_PRAGMAS() nvptx_register_pragmas() + +/* Stack and Calling. */ + +#define STARTING_FRAME_OFFSET 0 +#define FRAME_GROWS_DOWNWARD 0 +#define STACK_GROWS_DOWNWARD + +#define STACK_POINTER_REGNUM 1 +#define HARD_FRAME_POINTER_REGNUM 2 +#define FRAME_POINTER_REGNUM 15 +#define ARG_POINTER_REGNUM 14 +#define RETURN_ADDR_REGNO 13 + +#define STATIC_CHAIN_REGNUM 12 +#define OUTGOING_ARG_POINTER_REGNUM 11 + +#define FIRST_PARM_OFFSET(FNDECL) 0 + +#define ACCUMULATE_OUTGOING_ARGS 1 + +struct nvptx_args { + /* Number of arguments passed in registers so far. */ + int count; + /* Offset into the stdarg area so far. 
*/ + HOST_WIDE_INT off; +}; + +#define CUMULATIVE_ARGS struct nvptx_args + +#define INIT_CUMULATIVE_ARGS(cum, fntype, libname, fndecl, n_named_args) \ + do { cum.count = 0; cum.off = 0; } while (0) + +#define FUNCTION_ARG_REGNO_P(r) 0 + +#define DEFAULT_PCC_STRUCT_RETURN 0 + +#define FUNCTION_PROFILER(file, labelno) \ + fatal_error ("profiling is not yet implemented for this architecture") + +#define TRAMPOLINE_SIZE 32 +#define TRAMPOLINE_ALIGNMENT 256 + +/* We don't run reload, so this isn't actually used, but it still needs to be + defined. */ +#define ELIMINABLE_REGS \ +{{ FRAME_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM}} + +/* Define the offset between two registers, one to be eliminated, and the other + its replacement, at the start of a routine. */ + +#define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET) \ + ((OFFSET) = 0) + +/* Addressing Modes. */ + +#define MAX_REGS_PER_ADDRESS 1 + +#define LEGITIMATE_PIC_OPERAND_P(X) 1 + + +struct nvptx_pseudo_info +{ + int true_size; + int renumber; +}; + +struct GTY(()) machine_function +{ + rtx call_args; + rtx start_call; + bool has_call_with_varargs; + struct GTY((skip)) nvptx_pseudo_info *pseudos; + HOST_WIDE_INT outgoing_stdarg_size; +}; + + +/* Costs. */ + +#define NO_FUNCTION_CSE 1 +#define SLOW_BYTE_ACCESS 0 +#define BRANCH_COST(speed_p, predictable_p) 6 + +/* Assembler Format. */ + +#undef ASM_DECLARE_FUNCTION_NAME +#define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL) \ + nvptx_declare_function_name (FILE, NAME, DECL) + +#undef ASM_DECLARE_FUNCTION_SIZE +#define ASM_DECLARE_FUNCTION_SIZE(STREAM, NAME, DECL) \ + nvptx_function_end (STREAM) + +#define DWARF2_ASM_LINE_DEBUG_INFO 1 + +#undef ASM_APP_ON +#define ASM_APP_ON "\t// #APP \n" +#undef ASM_APP_OFF +#define ASM_APP_OFF "\t// #NO_APP \n" + +#define ASM_OUTPUT_COMMON(stream, name, size, rounded) +#define ASM_OUTPUT_LOCAL(stream, name, size, rounded) + +#define REGISTER_NAMES \ + { \ + "%hr0", "%outargs", "%hr2", "%hr3", "%retval", "%retval_in", "%hr6", "%hr7", \ + "%hr8", "%hr9", "%hr10", "%hr11", "%hr12", "%hr13", "%argp", "%frame" \ + } + +#define DBX_REGISTER_NUMBER(N) N + +#define TEXT_SECTION_ASM_OP "" +#define DATA_SECTION_ASM_OP "" + +/* This is how to store into the string LABEL + the symbol_ref name of an internal numbered label where + PREFIX is the class of label and NUM is the number within the class. + This is suitable for output with `assemble_name'. */ + +#undef ASM_GENERATE_INTERNAL_LABEL +#define ASM_GENERATE_INTERNAL_LABEL(LABEL, PREFIX, NUM) \ + do \ + { \ + char *__p; \ + __p = stpcpy (&(LABEL)[1], PREFIX); \ + (LABEL)[0] = '$'; \ + sprint_ul (__p, (unsigned long) (NUM)); \ + } \ + while (0) + +#define ASM_OUTPUT_ALIGN(FILE, POWER) +#define ASM_OUTPUT_SKIP(FILE, N) \ + nvptx_output_skip (FILE, N) +#undef ASM_OUTPUT_ASCII +#define ASM_OUTPUT_ASCII(FILE, STR, LENGTH) \ + nvptx_output_ascii (FILE, STR, LENGTH); + +#define ASM_DECLARE_OBJECT_NAME(FILE, NAME, DECL) \ + nvptx_declare_object_name (FILE, NAME, DECL) + +#undef ASM_OUTPUT_ALIGNED_DECL_COMMON +#define ASM_OUTPUT_ALIGNED_DECL_COMMON(FILE, DECL, NAME, SIZE, ALIGN) \ + do \ + { \ + fprintf (FILE, "// BEGIN VAR DEF: %s\n", NAME); \ + fprintf (FILE, ".visible.global.align %d .b8 ", \ + (ALIGN) / BITS_PER_UNIT); \ + assemble_name ((FILE), (NAME)); \ + fprintf (FILE, "["HOST_WIDE_INT_PRINT_DEC"];\n", (SIZE)); \ + fprintf (FILE, "// END VAR DEF\n"); \ + } \ + while (0) + +/* This says how to output assembler code to declare an + uninitialized internal linkage data object. 
*/ + +#undef ASM_OUTPUT_ALIGNED_DECL_LOCAL +#define ASM_OUTPUT_ALIGNED_DECL_LOCAL(FILE, DECL, NAME, SIZE, ALIGN) \ + do \ + { \ + fputs (".global", FILE); \ + fputs (nvptx_ptx_type_from_mode (DECL_MODE (DECL), false), FILE); \ + fprintf (FILE, ".align %d %s", \ + exact_log2((ALIGN) / BITS_PER_UNIT), (NAME)); \ +} while (0) + +#define CASE_VECTOR_PC_RELATIVE flag_pic +#define JUMP_TABLES_IN_TEXT_SECTION flag_pic + +#define ADDR_VEC_ALIGN(VEC) (JUMP_TABLES_IN_TEXT_SECTION ? 5 : 2) + +/* Misc. */ + +#define DWARF2_DEBUGGING_INFO 1 + +#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ + ((VALUE) = GET_MODE_BITSIZE ((MODE)), 2) +#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ + ((VALUE) = GET_MODE_BITSIZE ((MODE)), 2) + +#define NO_DOT_IN_LABEL +#define ASM_COMMENT_START "//" + +#define STORE_FLAG_VALUE -1 +#define FLOAT_STORE_FLAG_VALUE(MODE) REAL_VALUE_ATOF("1.0", (MODE)) + +#define CASE_VECTOR_MODE SImode +#define MOVE_MAX 4 +#define MOVE_RATIO(SPEED) 4 +#define TRULY_NOOP_TRUNCATION(outprec, inprec) 1 +#define FUNCTION_MODE QImode +#define HAS_INIT_SECTION 1 + +#endif /* GCC_NVPTX_H */ Index: gcc/config/nvptx/nvptx-protos.h =================================================================== --- /dev/null +++ gcc/config/nvptx/nvptx-protos.h @@ -0,0 +1,43 @@ +/* Prototypes for exported functions defined in nvptx.c. + Copyright (C) 2013 Free Software Foundation, Inc. + Contributed by Bernd Schmidt + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + . */ + +#ifndef GCC_NVPTX_PROTOS_H +#define GCC_NVPTX_PROTOS_H + +extern void nvptx_declare_function_name (FILE *, const char *, const_tree decl); +extern void nvptx_declare_object_name (FILE *file, const char *name, + const_tree decl); +extern void nvptx_function_end (FILE *); +extern void nvptx_output_skip (FILE *, unsigned HOST_WIDE_INT); +extern void nvptx_output_ascii (FILE *, const char *, unsigned HOST_WIDE_INT); +extern void nvptx_register_pragmas (void); + +#ifdef RTX_CODE +extern void nvptx_expand_call (rtx, rtx); +extern rtx nvptx_expand_compare (rtx); +extern const char *nvptx_ptx_type_from_mode (enum machine_mode, bool); +extern const char *nvptx_output_call_insn (rtx, rtx, rtx); +extern const char *nvptx_output_start_call (rtx); +extern const char *nvptx_output_return (void); +extern bool nvptx_expand_setmem (rtx, rtx, rtx, rtx, rtx, rtx); +extern bool nvptx_expand_movmem (rtx, rtx, rtx, rtx, rtx, rtx); +#endif +#endif + Index: gcc/config/nvptx/nvptx.md =================================================================== --- /dev/null +++ gcc/config/nvptx/nvptx.md @@ -0,0 +1,1109 @@ +;; Machine description for NVPTX. +;; Copyright (C) 2013 Free Software Foundation, Inc. +;; Contributed by Bernd Schmidt +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. 
+;; +;; GCC is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; . + +(define_c_enum "unspecv" [ + UNSPECV_START_CALL + UNSPECV_END_CALL + UNSPECV_SET_RET_REG +]) + +(define_c_enum "unspec" [ + UNSPEC_ARG_REG + UNSPEC_FROM_GLOBAL + UNSPEC_FROM_LOCAL + UNSPEC_FROM_PARAM + UNSPEC_FROM_SHARED + UNSPEC_FROM_CONST + UNSPEC_TO_GLOBAL + UNSPEC_TO_LOCAL + UNSPEC_TO_PARAM + UNSPEC_TO_SHARED + UNSPEC_TO_CONST + + UNSPEC_COPYSIGN + UNSPEC_LOG2 + UNSPEC_EXP2 + UNSPEC_SIN + UNSPEC_COS + + UNSPEC_BITREV + + UNSPEC_ALLOCA +]) + +(define_attr "subregs_ok" "false,true" + (const_string "false")) + +(define_predicate "nvptx_register_operand" + (match_code "reg,subreg") +{ + if (REG_P (op)) + return !HARD_REGISTER_P (op); + if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op))) + return false; + if (GET_CODE (op) == SUBREG) + return false; + return register_operand (op, mode); +}) + +(define_predicate "nvptx_reg_or_mem_operand" + (match_code "mem,reg,subreg") +{ + if (REG_P (op)) + return !HARD_REGISTER_P (op); + if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op))) + return false; + if (GET_CODE (op) == SUBREG) + return false; + return memory_operand (op, mode) || register_operand (op, mode); +}) + +;; Allow registers or symbolic constants. We can allow frame, arg or stack +;; pointers here since they are actually symbolic constants. +(define_predicate "nvptx_register_or_symbolic_operand" + (match_code "reg,subreg,symbol_ref,const") +{ + if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op))) + return false; + if (GET_CODE (op) == SUBREG) + return false; + if (CONSTANT_P (op)) + return true; + return register_operand (op, mode); +}) + +;; Registers or constants for normal instructions. Does not allow symbolic +;; constants. +(define_predicate "nvptx_nonmemory_operand" + (match_code "reg,subreg,const_int,const_double") +{ + if (REG_P (op)) + return !HARD_REGISTER_P (op); + if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op))) + return false; + if (GET_CODE (op) == SUBREG) + return false; + return nonmemory_operand (op, mode); +}) + +;; A source operand for a move instruction. This is the only predicate we use +;; that accepts symbolic constants. +(define_predicate "nvptx_general_operand" + (match_code "reg,subreg,mem,const,symbol_ref,label_ref,const_int,const_double") +{ + if (REG_P (op)) + return !HARD_REGISTER_P (op); + return general_operand (op, mode); +}) + +;; A destination operand for a move instruction. This is the only destination +;; predicate that accepts the return register since it requires special handling. 
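+;; (For instance, the return value register must be accepted here, since
+;; it is written by the instruction sequence emitted for a call, while
+;; the frame, arg and stack pointers must not be: on this target they
+;; print as symbolic addresses, not registers.)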
+(define_predicate "nvptx_nonimmediate_operand" + (match_code "reg,subreg,mem") +{ + if (REG_P (op)) + return (op != frame_pointer_rtx + && op != arg_pointer_rtx + && op != stack_pointer_rtx); + return nonimmediate_operand (op, mode); +}) + +(define_predicate "const_0_operand" + (and (match_code "const_int,const_double,const_vector") + (match_test "op == CONST0_RTX (GET_MODE (op))"))) + +(define_predicate "global_mem_operand" + (and (match_code "mem") + (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_GLOBAL"))) + +(define_predicate "const_mem_operand" + (and (match_code "mem") + (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_CONST"))) + +(define_predicate "param_mem_operand" + (and (match_code "mem") + (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_PARAM"))) + +(define_predicate "shared_mem_operand" + (and (match_code "mem") + (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_SHARED"))) + +(define_predicate "const0_operand" + (and (match_code "const_int") + (match_test "op == const0_rtx"))) + +;; True if this operator is valid for predication. +(define_predicate "predicate_operator" + (match_code "eq,ne")) + +(define_predicate "ne_operator" + (match_code "ne")) + +(define_predicate "nvptx_comparison_operator" + (match_code "eq,ne,le,ge,lt,gt,leu,geu,ltu,gtu")) + +(define_predicate "nvptx_float_comparison_operator" + (match_code "eq,ne,le,ge,lt,gt,uneq,unle,unge,unlt,ungt")) + +;; Test for a valid operand for a call instruction. +(define_special_predicate "call_insn_operand" + (match_code "symbol_ref,reg") +{ + if (GET_CODE (op) == SYMBOL_REF) + { + tree decl = SYMBOL_REF_DECL (op); + if (decl && TYPE_ARG_TYPES (TREE_TYPE (decl)) == NULL_TREE) + return false; + } + return true; +}) + +;; Return true if OP is a call with parallel USEs of the argument +;; pseudos. +(define_predicate "call_operation" + (match_code "parallel") +{ + unsigned i; + + for (i = 1; i < XVECLEN (op, 0); i++) + { + rtx elt = XVECEXP (op, 0, i); + enum machine_mode mode; + unsigned regno; + + if (GET_CODE (elt) != USE + || GET_CODE (XEXP (elt, 0)) != REG + || XEXP (elt, 0) == frame_pointer_rtx + || XEXP (elt, 0) == arg_pointer_rtx + || XEXP (elt, 0) == stack_pointer_rtx) + + return false; + } + return true; +}) + +(define_constraint "P0" + "An integer with the value 0." + (and (match_code "const_int") + (match_test "ival == 0"))) + +(define_constraint "P1" + "An integer with the value 1." + (and (match_code "const_int") + (match_test "ival == 1"))) + +(define_constraint "Pn" + "An integer with the value -1." + (and (match_code "const_int") + (match_test "ival == -1"))) + +(define_constraint "R" + "A pseudo register." + (match_code "reg")) + +(define_constraint "Ia" + "Any integer constant." + (and (match_code "const_int") (match_test "true"))) + +(define_mode_iterator QHSDISDFM [QI HI SI DI SF DF]) +(define_mode_iterator QHSDIM [QI HI SI DI]) +(define_mode_iterator HSDIM [HI SI DI]) +(define_mode_iterator BHSDIM [BI HI SI DI]) +(define_mode_iterator SDIM [SI DI]) +(define_mode_iterator SDISDFM [SI DI SF DF]) +(define_mode_iterator QHIM [QI HI]) +(define_mode_iterator QHSIM [QI HI SI]) +(define_mode_iterator SDFM [SF DF]) + +;; This mode iterator allows :P to be used for patterns that operate on +;; pointer-sized quantities. Exactly one of the two alternatives will match. +(define_mode_iterator P [(SI "Pmode == SImode") (DI "Pmode == DImode")]) + +;; We should get away with not defining memory alternatives, since we don't +;; get variables in this mode and pseudos are never spilled. 
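+;; (BImode is the PTX .pred register class.  Sketch of the alternatives
+;; below: a plain predicate-to-predicate move, and
+;; "setp.eq.u32 %p, 1, 0;" / "setp.eq.u32 %p, 1, 1;" to materialize
+;; constant false and true respectively.)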
+(define_insn "movbi" + [(set (match_operand:BI 0 "nvptx_register_operand" "=R,R,R") + (match_operand:BI 1 "nvptx_nonmemory_operand" "R,P0,Pn"))] + "" + "@ + %.\\tmov%t0\\t%0, %1; + %.\\tsetp.eq.u32\\t%0, 1, 0; + %.\\tsetp.eq.u32\\t%0, 1, 1;") + +(define_insn "*mov_insn" + [(set (match_operand:QHSDIM 0 "nvptx_nonimmediate_operand" "=R,R,R,m") + (match_operand:QHSDIM 1 "general_operand" "n,Ri,m,R"))] + "!(MEM_P (operands[0]) + && (!REG_P (operands[1]) || REGNO (operands[1]) <= LAST_VIRTUAL_REGISTER))" +{ + if (which_alternative == 2) + return "%.\\tld%A1%u1\\t%0, %1;"; + if (which_alternative == 3) + return "%.\\tst%A0%u0\\t%0, %1;"; + + rtx dst = operands[0]; + rtx src = operands[1]; + if (GET_CODE (dst) == SUBREG) + dst = SUBREG_REG (dst); + if (GET_CODE (src) == SUBREG) + src = SUBREG_REG (src); + enum machine_mode dst_mode = GET_MODE (dst); + enum machine_mode src_mode = GET_MODE (src); + if (CONSTANT_P (operands[1])) + { + if (GET_MODE_CLASS (dst_mode) != MODE_INT) + return "%.\\tmov.b%T0\\t%0, %1;"; + else + return "%.\\tmov%t0\\t%0, %1;"; + } + + if (src_mode == QImode) + src_mode = SImode; + if (dst_mode == QImode) + dst_mode = SImode; + /* Special handling for the return register; we allow this register to + only occur in the destination of a move insn. */ + if (REG_P (dst) && REGNO (dst) == 4 && dst_mode == HImode) + dst_mode = SImode; + if (dst_mode == src_mode) + return "%.\\tmov%t0\\t%0, %1;"; + /* Mode-punning between floating point and integer. */ + if (GET_MODE_SIZE (dst_mode) == GET_MODE_SIZE (src_mode)) + return "%.\\tmov.b%T0\\t%0, %1;"; + return "%.\\tcvt%t0%t1\\t%0, %1;"; +} + [(set_attr "subregs_ok" "true")]) + +(define_insn "*mov_insn" + [(set (match_operand:SDFM 0 "nvptx_nonimmediate_operand" "=R,R,m") + (match_operand:SDFM 1 "general_operand" "RF,m,R"))] + "!(MEM_P (operands[0]) && !REG_P (operands[1]))" +{ + if (which_alternative == 1) + return "%.\\tld%A1%u1\\t%0, %1;"; + if (which_alternative == 2) + return "%.\\tst%A0%u0\\t%0, %1;"; + + rtx dst = operands[0]; + rtx src = operands[1]; + if (GET_CODE (dst) == SUBREG) + dst = SUBREG_REG (dst); + if (GET_CODE (src) == SUBREG) + src = SUBREG_REG (src); + enum machine_mode dst_mode = GET_MODE (dst); + enum machine_mode src_mode = GET_MODE (src); + if (dst_mode == src_mode) + return "%.\\tmov%t0\\t%0, %1;"; + if (GET_MODE_SIZE (dst_mode) == GET_MODE_SIZE (src_mode)) + return "%.\\tmov.b%T0\\t%0, %1;"; + gcc_unreachable (); +} + [(set_attr "subregs_ok" "true")]) + +(define_insn "load_arg_reg" + [(set (match_operand:QHIM 0 "nvptx_register_operand" "=R") + (unspec:QHIM [(match_operand 1 "const_int_operand" "i")] + UNSPEC_ARG_REG))] + "" + "%.\\tcvt%t0.u32\\t%0, %%ar%1;") + +(define_insn "load_arg_reg" + [(set (match_operand:SDISDFM 0 "nvptx_register_operand" "=R") + (unspec:SDISDFM [(match_operand 1 "const_int_operand" "i")] + UNSPEC_ARG_REG))] + "" + "%.\\tmov%t0\\t%0, %%ar%1;") + +(define_expand "mov" + [(set (match_operand:QHSDISDFM 0 "nvptx_nonimmediate_operand" "") + (match_operand:QHSDISDFM 1 "general_operand" ""))] + "" +{ + /* Hard registers are often actually symbolic operands on this target. + Don't allow them when storing to memory. 
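+     For example (sketch), storing the frame pointer to memory must go
+     through a scratch register: "%frame" prints as a symbolic address,
+     which a PTX st instruction cannot take as its data operand.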
*/ + if (MEM_P (operands[0]) + && (!REG_P (operands[1]) + || REGNO (operands[1]) <= LAST_VIRTUAL_REGISTER)) + { + rtx tmp = gen_reg_rtx (mode); + emit_move_insn (tmp, operands[1]); + emit_move_insn (operands[0], tmp); + DONE; + } +}) + +(define_expand "movmem" + [(use (match_operand:BLK 0 "memory_operand")) + (use (match_operand:BLK 1 "memory_operand")) + (use (match_operand:QHSDIM 2 "nonmemory_operand")) + (use (match_operand:QHSDIM 3 "const_int_operand")) + (use (match_operand:SI 4 "const_int_operand")) + (use (match_operand:SI 5 "const_int_operand"))] + "" +{ + if (nvptx_expand_movmem (operands[0], operands[1], operands[2], operands[3], + operands[4], operands[5])) + DONE; + else + FAIL; +}) + +(define_expand "setmem" + [(use (match_operand:BLK 0 "memory_operand")) + (use (match_operand:QHSDIM 1 "nonmemory_operand")) + (use (match_operand:QI 2 "nonmemory_operand")) + (use (match_operand 3 "const_int_operand")) + (use (match_operand:SI 4 "const_int_operand")) + (use (match_operand:SI 5 "const_int_operand"))] + "" +{ + if (nvptx_expand_setmem (operands[0], operands[1], + operands[2], operands[3], + operands[4], operands[5])) + DONE; + else + FAIL; +}) + +(define_insn "zero_extendqihi2" + [(set (match_operand:HI 0 "nvptx_register_operand" "=R,R") + (zero_extend:HI (match_operand:QI 1 "nvptx_reg_or_mem_operand" "R,m")))] + "" + "@ + %.\\tcvt.u16.u%T1\\t%0, %1; + %.\\tld%A1.u8\\t%0, %1;" + [(set_attr "subregs_ok" "true")]) + +(define_insn "zero_extendsi2" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R,R") + (zero_extend:SI (match_operand:QHIM 1 "nvptx_reg_or_mem_operand" "R,m")))] + "" + "@ + %.\\tcvt.u32.u%T1\\t%0, %1; + %.\\tld%A1.u%T1\\t%0, %1;" + [(set_attr "subregs_ok" "true")]) + +(define_insn "zero_extenddi2" + [(set (match_operand:DI 0 "nvptx_register_operand" "=R,R") + (zero_extend:DI (match_operand:QHSIM 1 "nvptx_reg_or_mem_operand" "R,m")))] + "" + "@ + %.\\tcvt.u64.u%T1\\t%0, %1; + %.\\tld%A1%u1\\t%0, %1;" + [(set_attr "subregs_ok" "true")]) + +(define_insn "extendsi2" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R,R") + (sign_extend:SI (match_operand:QHIM 1 "nvptx_reg_or_mem_operand" "R,m")))] + "" + "@ + %.\\tcvt.s32.s%T1\\t%0, %1; + %.\\tld%A1.s%T1\\t%0, %1;" + [(set_attr "subregs_ok" "true")]) + +(define_insn "extenddi2" + [(set (match_operand:DI 0 "nvptx_register_operand" "=R,R") + (sign_extend:DI (match_operand:QHSIM 1 "nvptx_reg_or_mem_operand" "R,m")))] + "" + "@ + %.\\tcvt.s64.s%T1\\t%0, %1; + %.\\tld%A1.s%T1\\t%0, %1;" + [(set_attr "subregs_ok" "true")]) + +(define_insn "trunchiqi2" + [(set (match_operand:QI 0 "nvptx_reg_or_mem_operand" "=R,m") + (truncate:QI (match_operand:HI 1 "nvptx_register_operand" "R,R")))] + "" + "@ + %.\\tcvt%t0.u16\\t%0, %1; + %.\\tst%A0.u8\\t%0, %1;" + [(set_attr "subregs_ok" "true")]) + +(define_insn "truncsi2" + [(set (match_operand:QHIM 0 "nvptx_reg_or_mem_operand" "=R,m") + (truncate:QHIM (match_operand:SI 1 "nvptx_register_operand" "R,R")))] + "" + "@ + %.\\tcvt%t0.u32\\t%0, %1; + %.\\tst%A0.u%T0\\t%0, %1;" + [(set_attr "subregs_ok" "true")]) + +(define_insn "truncdi2" + [(set (match_operand:QHSIM 0 "nvptx_reg_or_mem_operand" "=R,m") + (truncate:QHSIM (match_operand:DI 1 "nvptx_register_operand" "R,R")))] + "" + "@ + %.\\tcvt%t0.u64\\t%0, %1; + %.\\tst%A0.u%T0\\t%0, %1;" + [(set_attr "subregs_ok" "true")]) + +;; Pointer address space conversions + +(define_int_iterator cvt_code + [UNSPEC_FROM_GLOBAL + UNSPEC_FROM_LOCAL + UNSPEC_FROM_SHARED + UNSPEC_FROM_CONST + UNSPEC_TO_GLOBAL + UNSPEC_TO_LOCAL + UNSPEC_TO_SHARED + 
UNSPEC_TO_CONST]) + +(define_int_attr cvt_name + [(UNSPEC_FROM_GLOBAL "from_global") + (UNSPEC_FROM_LOCAL "from_local") + (UNSPEC_FROM_SHARED "from_shared") + (UNSPEC_FROM_CONST "from_const") + (UNSPEC_TO_GLOBAL "to_global") + (UNSPEC_TO_LOCAL "to_local") + (UNSPEC_TO_SHARED "to_shared") + (UNSPEC_TO_CONST "to_const")]) + +(define_int_attr cvt_str + [(UNSPEC_FROM_GLOBAL ".global") + (UNSPEC_FROM_LOCAL ".local") + (UNSPEC_FROM_SHARED ".shared") + (UNSPEC_FROM_CONST ".const") + (UNSPEC_TO_GLOBAL ".to.global") + (UNSPEC_TO_LOCAL ".to.local") + (UNSPEC_TO_SHARED ".to.shared") + (UNSPEC_TO_CONST ".to.const")]) + +(define_insn "convaddr_" + [(set (match_operand:P 0 "nvptx_register_operand" "=R") + (unspec:P [(match_operand:P 1 "nvptx_register_or_symbolic_operand" "Rs")] cvt_code))] + "" + "%.\\tcvta%t0\\t%0, %1;") + +;; Integer arithmetic + +(define_insn "add3" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (plus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R") + (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tadd%t0\\t%0, %1, %2;") + +(define_insn "sub3" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (minus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R") + (match_operand:HSDIM 2 "nvptx_register_operand" "R")))] + "" + "%.\\tsub%t0\\t%0, %1, %2;") + +(define_insn "mul3" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (mult:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R") + (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tmul.lo%t0\\t%0, %1, %2;") + +(define_insn "*mad3" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (plus:HSDIM (mult:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R") + (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")) + (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tmad.lo%t0\\t%0, %1, %2, %3;") + +(define_insn "div3" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (div:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R") + (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tdiv.s%T0\\t%0, %1, %2;") + +(define_insn "udiv3" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (udiv:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R") + (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tdiv.u%T0\\t%0, %1, %2;") + +(define_insn "mod3" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (mod:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "Ri") + (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\trem.s%T0\\t%0, %1, %2;") + +(define_insn "umod3" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (umod:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "Ri") + (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\trem.u%T0\\t%0, %1, %2;") + +(define_insn "smin3" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (smin:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R") + (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tmin.s%T0\\t%0, %1, %2;") + +(define_insn "umin3" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (umin:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R") + (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tmin.u%T0\\t%0, %1, %2;") + +(define_insn "smax3" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (smax:HSDIM (match_operand:HSDIM 1 
"nvptx_register_operand" "R") + (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tmax.s%T0\\t%0, %1, %2;") + +(define_insn "umax3" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (umax:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R") + (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tmax.u%T0\\t%0, %1, %2;") + +(define_insn "abs2" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (abs:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")))] + "" + "%.\\tabs.s%T0\\t%0, %1;") + +(define_insn "neg2" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (neg:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")))] + "" + "%.\\tneg.s%T0\\t%0, %1;") + +(define_insn "one_cmpl2" + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R") + (not:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")))] + "" + "%.\\tnot.b%T0\\t%0, %1;") + +(define_insn "bitrev2" + [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R") + (unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")] + UNSPEC_BITREV))] + "" + "%.\\tbrev.b%T0\\t%0, %1;") + +(define_insn "clz2" + [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R") + (clz:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")))] + "" + "%.\\tclz.b%T0\\t%0, %1;") + +(define_expand "ctz2" + [(set (match_operand:SDIM 0 "nvptx_register_operand" "") + (ctz:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "")))] + "" +{ + rtx tmpreg = gen_reg_rtx (mode); + emit_insn (gen_bitrev2 (tmpreg, operands[1])); + emit_insn (gen_clz2 (operands[0], tmpreg)); + DONE; +}) + +;; Shifts + +(define_insn "ashl3" + [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R") + (ashift:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R") + (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tshl.b%T0\\t%0, %1, %2;") + +(define_insn "ashr3" + [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R") + (ashiftrt:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R") + (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tshr.s%T0\\t%0, %1, %2;") + +(define_insn "lshr3" + [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R") + (lshiftrt:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R") + (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tshr.u%T0\\t%0, %1, %2;") + +;; Logical operations + +(define_insn "and3" + [(set (match_operand:BHSDIM 0 "nvptx_register_operand" "=R") + (and:BHSDIM (match_operand:BHSDIM 1 "nvptx_register_operand" "R") + (match_operand:BHSDIM 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tand.b%T0\\t%0, %1, %2;") + +(define_insn "ior3" + [(set (match_operand:BHSDIM 0 "nvptx_register_operand" "=R") + (ior:BHSDIM (match_operand:BHSDIM 1 "nvptx_register_operand" "R") + (match_operand:BHSDIM 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\tor.b%T0\\t%0, %1, %2;") + +(define_insn "xor3" + [(set (match_operand:BHSDIM 0 "nvptx_register_operand" "=R") + (xor:BHSDIM (match_operand:BHSDIM 1 "nvptx_register_operand" "R") + (match_operand:BHSDIM 2 "nvptx_nonmemory_operand" "Ri")))] + "" + "%.\\txor.b%T0\\t%0, %1, %2;") + +;; Comparisons and branches + +(define_insn "*cmp" + [(set (match_operand:BI 0 "nvptx_register_operand" "=R") + (match_operator:BI 1 "nvptx_comparison_operator" + [(match_operand:HSDIM 2 "nvptx_register_operand" "R") + (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))] + "" + "%.\\tsetp%c1 %0,%2,%3;") + +(define_insn "*cmp" + [(set (match_operand:BI 0 
"nvptx_register_operand" "=R") + (match_operator:BI 1 "nvptx_float_comparison_operator" + [(match_operand:SDFM 2 "nvptx_register_operand" "R") + (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))] + "" + "%.\\tsetp%c1 %0,%2,%3;") + +(define_insn "jump" + [(set (pc) + (label_ref (match_operand 0 "" "")))] + "" + "%.\\tbra\\t%l0;") + +(define_insn "br_true" + [(set (pc) + (if_then_else (ne (match_operand:BI 0 "nvptx_register_operand" "R") + (const_int 0)) + (label_ref (match_operand 1 "" "")) + (pc)))] + "" + "%j0\\tbra\\t%l1;") + +(define_insn "br_false" + [(set (pc) + (if_then_else (eq (match_operand:BI 0 "nvptx_register_operand" "R") + (const_int 0)) + (label_ref (match_operand 1 "" "")) + (pc)))] + "" + "%J0\\tbra\\t%l1;") + +(define_expand "cbranch4" + [(set (pc) + (if_then_else (match_operator 0 "nvptx_comparison_operator" + [(match_operand:HSDIM 1 "nvptx_register_operand" "") + (match_operand:HSDIM 2 "nvptx_register_operand" "")]) + (label_ref (match_operand 3 "" "")) + (pc)))] + "" +{ + rtx t = nvptx_expand_compare (operands[0]); + operands[0] = t; + operands[1] = XEXP (t, 0); + operands[2] = XEXP (t, 1); +}) + +(define_expand "cbranch4" + [(set (pc) + (if_then_else (match_operator 0 "nvptx_float_comparison_operator" + [(match_operand:SDFM 1 "nvptx_register_operand" "") + (match_operand:SDFM 2 "nvptx_register_operand" "")]) + (label_ref (match_operand 3 "" "")) + (pc)))] + "" +{ + rtx t = nvptx_expand_compare (operands[0]); + operands[0] = t; + operands[1] = XEXP (t, 0); + operands[2] = XEXP (t, 1); +}) + +(define_expand "cbranchbi4" + [(set (pc) + (if_then_else (match_operator 0 "predicate_operator" + [(match_operand:BI 1 "nvptx_register_operand" "") + (match_operand:BI 2 "const0_operand" "")]) + (label_ref (match_operand 3 "" "")) + (pc)))] + "" + "") + +;; Conditional stores + +(define_insn "setcc_from_bi" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R") + (ne:SI (match_operand:BI 1 "nvptx_register_operand" "R") + (const_int 0)))] + "" + "%.\\tselp%t0 %0,-1,0,%1;") + +(define_insn "setcc_int" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R") + (match_operator:SI 1 "nvptx_comparison_operator" + [(match_operand:HSDIM 2 "nvptx_register_operand" "R") + (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))] + "" + "%.\\tset%t0%c1 %0,%2,%3;") + +(define_insn "setcc_int" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R") + (match_operator:SI 1 "nvptx_float_comparison_operator" + [(match_operand:SDFM 2 "nvptx_register_operand" "R") + (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))] + "" + "%.\\tset%t0%c1 %0,%2,%3;") + +(define_insn "setcc_float" + [(set (match_operand:SF 0 "nvptx_register_operand" "=R") + (match_operator:SF 1 "nvptx_comparison_operator" + [(match_operand:HSDIM 2 "nvptx_register_operand" "R") + (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))] + "" + "%.\\tset%t0%c1 %0,%2,%3;") + +(define_insn "setcc_float" + [(set (match_operand:SF 0 "nvptx_register_operand" "=R") + (match_operator:SF 1 "nvptx_float_comparison_operator" + [(match_operand:SDFM 2 "nvptx_register_operand" "R") + (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))] + "" + "%.\\tset%t0%c1 %0,%2,%3;") + +(define_expand "cstorebi4" + [(set (match_operand:SI 0 "nvptx_register_operand") + (match_operator:SI 1 "ne_operator" + [(match_operand:BI 2 "nvptx_register_operand") + (match_operand:BI 3 "const0_operand")]))] + "" + "") + +(define_expand "cstore4" + [(set (match_operand:SI 0 "nvptx_register_operand") + (match_operator:SI 1 "nvptx_comparison_operator" 
+
+;; Conditional stores
+
+(define_insn "setcc_from_bi"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (ne:SI (match_operand:BI 1 "nvptx_register_operand" "R")
+               (const_int 0)))]
+  ""
+  "%.\\tselp%t0 %0,-1,0,%1;")
+
+(define_insn "setcc_int<mode>"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (match_operator:SI 1 "nvptx_comparison_operator"
+          [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
+           (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_insn "setcc_int<mode>"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (match_operator:SI 1 "nvptx_float_comparison_operator"
+          [(match_operand:SDFM 2 "nvptx_register_operand" "R")
+           (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_insn "setcc_float<mode>"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+        (match_operator:SF 1 "nvptx_comparison_operator"
+          [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
+           (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_insn "setcc_float<mode>"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+        (match_operator:SF 1 "nvptx_float_comparison_operator"
+          [(match_operand:SDFM 2 "nvptx_register_operand" "R")
+           (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_expand "cstorebi4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+        (match_operator:SI 1 "ne_operator"
+          [(match_operand:BI 2 "nvptx_register_operand")
+           (match_operand:BI 3 "const0_operand")]))]
+  ""
+  "")
+
+(define_expand "cstore<mode>4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+        (match_operator:SI 1 "nvptx_comparison_operator"
+          [(match_operand:HSDIM 2 "nvptx_register_operand")
+           (match_operand:HSDIM 3 "nvptx_nonmemory_operand")]))]
+  ""
+  "")
+
+(define_expand "cstore<mode>4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+        (match_operator:SI 1 "nvptx_float_comparison_operator"
+          [(match_operand:SDFM 2 "nvptx_register_operand")
+           (match_operand:SDFM 3 "nvptx_nonmemory_operand")]))]
+  ""
+  "")
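+
+;; In PTX, a call's arguments and return value are passed through
+;; .param variables that must be declared, together with the call
+;; itself, inside an enclosing { } block; start_call_block and
+;; end_call_block presumably emit the two halves of that wrapper
+;; (end_call_block literally prints the closing brace).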
+
+;; Calls
+
+(define_insn "start_call_block"
+  [(unspec_volatile [(match_operand 0 "" "")] UNSPECV_START_CALL)]
+  ""
+{
+  return nvptx_output_start_call (operands[0]);
+})
+
+(define_insn "end_call_block"
+  [(unspec_volatile [(const_int 0)] UNSPECV_END_CALL)]
+  ""
+  "}")
+
+(define_insn "call_insn"
+  [(match_parallel 2 "call_operation"
+     [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "Rs"))
+            (match_operand 1))])]
+  ""
+{
+  return nvptx_output_call_insn (insn, NULL_RTX, operands[0]);
+})
+
+(define_insn "call_value_insn"
+  [(match_parallel 3 "call_operation"
+     [(set (match_operand 0 "nvptx_register_operand" "=R")
+           (call (mem:QI (match_operand:SI 1 "call_insn_operand" "Rs"))
+                 (match_operand 2)))])]
+  ""
+{
+  return nvptx_output_call_insn (insn, operands[0], operands[1]);
+})
+
+(define_expand "call"
+  [(match_operand 0 "" "")]
+  ""
+{
+  nvptx_expand_call (NULL_RTX, operands[0]);
+  DONE;
+})
+
+(define_expand "call_value"
+  [(match_operand 0 "" "")
+   (match_operand 1 "" "")]
+  ""
+{
+  nvptx_expand_call (operands[0], operands[1]);
+  DONE;
+})
+
+;; Floating point arithmetic.
+
+(define_insn "add<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (plus:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+                   (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tadd%t0\\t%0, %1, %2;")
+
+(define_insn "sub<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (minus:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+                    (match_operand:SDFM 2 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tsub%t0\\t%0, %1, %2;")
+
+(define_insn "mul<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (mult:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+                   (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tmul%t0\\t%0, %1, %2;")
+
+(define_insn "fma<mode>4"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (fma:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+                  (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")
+                  (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tfma%#%t0\\t%0, %1, %2, %3;")
+
+(define_insn "div<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (div:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+                  (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tdiv%#%t0\\t%0, %1, %2;")
+
+(define_insn "copysign<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (unspec:SDFM [(match_operand:SDFM 1 "nvptx_register_operand" "R")
+                      (match_operand:SDFM 2 "nvptx_register_operand" "R")]
+                     UNSPEC_COPYSIGN))]
+  ""
+  "%.\\tcopysign%t0\\t%0, %2, %1;")
+
+(define_insn "smin<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (smin:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+                   (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tmin%t0\\t%0, %1, %2;")
+
+(define_insn "smax<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (smax:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+                   (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tmax%t0\\t%0, %1, %2;")
+
+(define_insn "abs<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (abs:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tabs%t0\\t%0, %1;")
+
+(define_insn "neg<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (neg:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tneg%t0\\t%0, %1;")
+
+(define_insn "sqrt<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (sqrt:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tsqrt%#%t0\\t%0, %1;")
+
+(define_insn "sinsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+        (unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+                   UNSPEC_SIN))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tsin.approx%t0\\t%0, %1;")
+
+(define_insn "cossf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+        (unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+                   UNSPEC_COS))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tcos.approx%t0\\t%0, %1;")
+
+(define_insn "log2sf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+        (unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+                   UNSPEC_LOG2))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tlg2.approx%t0\\t%0, %1;")
+
+(define_insn "exp2sf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+        (unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+                   UNSPEC_EXP2))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tex2.approx%t0\\t%0, %1;")
+
+;; Conversions involving floating point
+
+(define_insn "extendsfdf2"
+  [(set (match_operand:DF 0 "nvptx_register_operand" "=R")
+        (float_extend:DF (match_operand:SF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%t0%t1\\t%0, %1;")
+
+(define_insn "truncdfsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+        (float_truncate:SF (match_operand:DF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0%t1\\t%0, %1;")
+
+(define_insn "floatunssi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (unsigned_float:SDFM (match_operand:SI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.u%T1\\t%0, %1;")
+
+(define_insn "floatsi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (float:SDFM (match_operand:SI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.s%T1\\t%0, %1;")
+
+(define_insn "floatunsdi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (unsigned_float:SDFM (match_operand:DI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.u%T1\\t%0, %1;")
+
+(define_insn "floatdi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+        (float:SDFM (match_operand:DI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.s%T1\\t%0, %1;")
+
+(define_insn "fixuns_trunc<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (unsigned_fix:SI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#i.u%T0%t1\\t%0, %1;")
+
+(define_insn "fix_trunc<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (fix:SI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#i.s%T0%t1\\t%0, %1;")
+
+(define_insn "fixuns_trunc<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
+        (unsigned_fix:DI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#i.u%T0%t1\\t%0, %1;")
+
+(define_insn "fix_trunc<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
+        (fix:DI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#i.s%T0%t1\\t%0, %1;")
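+
+;; Note that %# in the fma/div/sqrt/cvt templates above stands for a
+;; rounding-mode suffix filled in by the output code, since PTX
+;; requires an explicit rounding mode on these instructions
+;; (e.g. "fma.rn.f32").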
+
+;; Miscellaneous
+
+(define_insn "nop"
+  [(const_int 0)]
+  ""
+  "")
+
+(define_insn "return"
+  [(return)]
+  ""
+{
+  return nvptx_output_return ();
+})
+
+(define_expand "epilogue"
+  [(clobber (const_int 0))]
+  ""
+{
+  emit_jump_insn (gen_return ());
+  DONE;
+})
+
+(define_insn "allocate_stack"
+  [(set (match_operand 0 "nvptx_register_operand" "=R")
+        (unspec [(match_operand 1 "nvptx_register_operand" "R")]
+                UNSPEC_ALLOCA))]
+  ""
+  "%.\\tcall (%0), %%alloca, (%1);")
+
+(define_expand "restore_stack_block"
+  [(match_operand 0 "register_operand" "")
+   (match_operand 1 "register_operand" "")]
+  ""
+{
+  DONE;
+})
+
+(define_expand "restore_stack_function"
+  [(match_operand 0 "register_operand" "")
+   (match_operand 1 "register_operand" "")]
+  ""
+{
+  DONE;
+})
+
+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 0))]
+  ""
+  "trap;")
+
+(define_insn "trap_if_true"
+  [(trap_if (ne (match_operand:BI 0 "nvptx_register_operand" "R")
+                (const_int 0))
+            (const_int 0))]
+  ""
+  "%j0 trap;")
+
+(define_insn "trap_if_false"
+  [(trap_if (eq (match_operand:BI 0 "nvptx_register_operand" "R")
+                (const_int 0))
+            (const_int 0))]
+  ""
+  "%J0 trap;")
+
+(define_expand "ctrap<mode>4"
+  [(trap_if (match_operator 0 "nvptx_comparison_operator"
+              [(match_operand:SDIM 1 "nvptx_register_operand")
+               (match_operand:SDIM 2 "nvptx_nonmemory_operand")])
+            (match_operand 3 "const_0_operand"))]
+  ""
+{
+  rtx t = nvptx_expand_compare (operands[0]);
+  emit_insn (gen_trap_if_true (t));
+  DONE;
+})