From patchwork Thu Nov 5 21:59:53 2015
X-Patchwork-Submitter: Martin Jambor
X-Patchwork-Id: 540716
Date: Thu, 5 Nov 2015 22:59:53 +0100
From: Martin Jambor
To: GCC Patches
Cc: Martin Liska, Jan Hubicka
Subject: [hsa 6/12] IPA-HSA pass
Message-ID: <20151105215953.GI9264@virgil.suse.cz>
In-Reply-To: <20151105215108.GC9264@virgil.suse.cz>

Hi,

when a target construct is gridified, the HSA GPU function is associated
with the CPU function throughout the compilation, so that they can be
registered as a pair in libgomp.  When a target or a parallel construct
is not gridified, its body emerges from OMP expansion as one gimple
function.  However, at some point we need to create a special HSA
function representation so that we can modify the behavior of a (very)
few optimization passes for them.  Similarly, "omp declare target"
functions, which ought to be callable from HSA, should get their own
representation for exactly the same reason.

Both are handled by the following new IPA pass, which creates new HSA
clones in these cases.  Moreover, it redirects the appropriate call
graph edges so that calls made from HSA implementations go to other HSA
implementations, marks HSA kernel clones with the flatten attribute to
minimize call overhead (which is much more significant on GPUs), and
makes sure that the CPU and GPU functions are coupled together and
remain in the same LTO partition so that they can be registered
together with libgomp.
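The clone-and-redirect scheme described above can be modeled on a toy call graph. The C++ sketch below is a simplified, hypothetical model and not GCC's cgraph API. `Node`, `Graph`, `Kind`, `create_hsa_clones` and `redirect_hsa_edges` are all invented for illustration, and callable functions are simply pre-marked with `Kind::Function` instead of being detected with `hsa_callable_function_p`.

The model mirrors the logic of `process_hsa_functions`:

1. Every HSA-marked host function gets a GPU clone linked to it in both directions, and only kernel clones are flattened.
2. Inside GPU clones, calls to host functions that have an HSA counterpart are retargeted to that counterpart.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified call graph model; NOT GCC's cgraph API.
enum class Kind { None, Kernel, Function };

struct Node {
  std::string name;
  Kind kind = Kind::None;       // plays the role of the summary's m_kind
  bool gpu_impl = false;        // plays the role of m_gpu_implementation_p
  bool flatten = false;         // stands in for the "flatten" attribute
  Node *bound = nullptr;        // plays the role of m_binded_function
  std::vector<Node *> callees;  // outgoing call edges
};

struct Graph {
  std::vector<std::unique_ptr<Node>> nodes;

  Node *add(const std::string &name, Kind kind) {
    nodes.push_back(std::make_unique<Node>());
    nodes.back()->name = name;
    nodes.back()->kind = kind;
    return nodes.back().get();
  }
};

// Step 1: give every HSA-marked host function a GPU clone and link the two.
// Only kernel clones are flattened.
void create_hsa_clones(Graph &g) {
  // Clones appended during the loop are not visited.
  const std::size_t original_count = g.nodes.size();
  for (std::size_t i = 0; i < original_count; ++i) {
    Node *host = g.nodes[i].get();
    // Unmarked functions and already linked ones are skipped.
    if (host->kind == Kind::None || host->bound != nullptr)
      continue;
    // Nodes are heap-allocated, so 'host' stays valid if the vector grows.
    Node *clone = g.add(host->name + ".hsa", host->kind);
    clone->callees = host->callees;  // a clone starts with the same calls
    clone->gpu_impl = true;
    clone->flatten = (host->kind == Kind::Kernel);
    clone->bound = host;
    host->bound = clone;
  }
}

// Step 2: inside GPU clones, retarget calls to host functions that have an
// HSA counterpart, so that GPU code calls GPU code.
void redirect_hsa_edges(Graph &g) {
  for (const auto &owner : g.nodes) {
    Node *caller = owner.get();
    if (caller->kind == Kind::None || !caller->gpu_impl)
      continue;
    for (Node *&callee : caller->callees)
      if (callee->kind != Kind::None && !callee->gpu_impl && callee->bound)
        callee = callee->bound;
  }
}
```

Consider a kernel that calls an "omp declare target" helper and an ordinary host function. After both steps, the kernel's GPU clone calls the helper's GPU clone, while its call to the ordinary function is left untouched. The host kernel keeps its original edges.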
Thanks,

Martin


2015-11-05  Martin Liska
            Martin Jambor

        * ipa-hsa.c: New file.

diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
new file mode 100644
index 0000000..b4cb58e
--- /dev/null
+++ b/gcc/ipa-hsa.c
@@ -0,0 +1,334 @@
+/* Callgraph based creation of HSA clones.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Martin Liska
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+/* Interprocedural HSA pass is responsible for creation of HSA clones.
+   For all these HSA clones, we emit HSAIL instructions and pass processing
+   is terminated.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "hash-set.h"
+#include "vec.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "dumpfile.h"
+#include "gimple-pretty-print.h"
+#include "tree-streamer.h"
+#include "stringpool.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+
+namespace {
+
+/* If NODE is not versionable, warn about not emitting HSAIL and return false.
+   Otherwise return true.  */
+
+static bool
+check_warn_node_versionable (cgraph_node *node)
+{
+  if (!node->local.versionable)
+    {
+      if (warning_at (DECL_SOURCE_LOCATION (node->decl), OPT_Whsa,
+                      HSA_SORRY_MSG))
+        inform (DECL_SOURCE_LOCATION (node->decl),
+                "function cannot be cloned");
+      return false;
+    }
+  return true;
+}
+
+/* The function creates HSA clones for all functions that were either
+   marked as HSA kernels or are callable HSA functions.  Apart from that,
+   we redirect all edges that come from an HSA clone and end in another
+   HSA clone to connect these two functions.  */
+
+static unsigned int
+process_hsa_functions (void)
+{
+  struct cgraph_node *node;
+
+  if (hsa_summaries == NULL)
+    hsa_summaries = new hsa_summary_t (symtab);
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      /* A linked function is skipped.  */
+      if (s->m_binded_function != NULL)
+        continue;
+
+      if (s->m_kind != HSA_NONE)
+        {
+          if (!check_warn_node_versionable (node))
+            continue;
+          cgraph_node *clone = node->create_virtual_clone
+            (vec <cgraph_edge *> (), NULL, NULL, "hsa");
+          TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+          if (s->m_kind == HSA_KERNEL)
+            DECL_ATTRIBUTES (clone->decl)
+              = tree_cons (get_identifier ("flatten"), NULL_TREE,
+                           DECL_ATTRIBUTES (clone->decl));
+
+          clone->force_output = true;
+          hsa_summaries->link_functions (clone, node, s->m_kind);
+
+          if (dump_file)
+            fprintf (dump_file, "HSA creates a new clone: %s, type: %s\n",
+                     clone->name (),
+                     s->m_kind == HSA_KERNEL ? "kernel" : "function");
+        }
+      else if (hsa_callable_function_p (node->decl))
+        {
+          if (!check_warn_node_versionable (node))
+            continue;
+          cgraph_node *clone = node->create_virtual_clone
+            (vec <cgraph_edge *> (), NULL, NULL, "hsa");
+          TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+
+          if (!cgraph_local_p (node))
+            clone->force_output = true;
+          hsa_summaries->link_functions (clone, node, HSA_FUNCTION);
+
+          if (dump_file)
+            fprintf (dump_file, "HSA creates a new function clone: %s\n",
+                     clone->name ());
+        }
+    }
+
+  /* Redirect all edges that are between HSA clones.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      cgraph_edge *e = node->callees;
+
+      while (e)
+        {
+          hsa_function_summary *src = hsa_summaries->get (node);
+          if (src->m_kind != HSA_NONE && src->m_gpu_implementation_p)
+            {
+              hsa_function_summary *dst = hsa_summaries->get (e->callee);
+              if (dst->m_kind != HSA_NONE && !dst->m_gpu_implementation_p)
+                {
+                  e->redirect_callee (dst->m_binded_function);
+                  if (dump_file)
+                    fprintf (dump_file,
+                             "Redirecting edge to HSA function: %s->%s\n",
+                             xstrdup_for_dump (e->caller->name ()),
+                             xstrdup_for_dump (e->callee->name ()));
+                }
+            }
+
+          e = e->next_callee;
+        }
+    }
+
+  return 0;
+}
+
+/* Iterate over all HSA functions and stream out their summaries.  */
+
+static void
+ipa_hsa_write_summary (void)
+{
+  struct bitpack_d bp;
+  struct cgraph_node *node;
+  struct output_block *ob;
+  unsigned int count = 0;
+  lto_symtab_encoder_iterator lsei;
+  lto_symtab_encoder_t encoder;
+
+  if (!hsa_summaries)
+    return;
+
+  ob = create_output_block (LTO_section_ipa_hsa);
+  encoder = ob->decl_state->symtab_node_encoder;
+  ob->symbol = NULL;
+  for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
+       lsei_next_function_in_partition (&lsei))
+    {
+      node = lsei_cgraph_node (lsei);
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      if (s->m_kind != HSA_NONE)
+        count++;
+    }
+
+  streamer_write_uhwi (ob, count);
+
+  /* Process all of the functions.  */
+  for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
+       lsei_next_function_in_partition (&lsei))
+    {
+      node = lsei_cgraph_node (lsei);
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      if (s->m_kind != HSA_NONE)
+        {
+          encoder = ob->decl_state->symtab_node_encoder;
+          int node_ref = lto_symtab_encoder_encode (encoder, node);
+          streamer_write_uhwi (ob, node_ref);
+
+          bp = bitpack_create (ob->main_stream);
+          bp_pack_value (&bp, s->m_kind, 2);
+          bp_pack_value (&bp, s->m_gpu_implementation_p, 1);
+          bp_pack_value (&bp, s->m_binded_function != NULL, 1);
+          streamer_write_bitpack (&bp);
+          if (s->m_binded_function)
+            stream_write_tree (ob, s->m_binded_function->decl, true);
+        }
+    }
+
+  streamer_write_char_stream (ob->main_stream, 0);
+  produce_asm (ob, NULL);
+  destroy_output_block (ob);
+}
+
+/* Read section in file FILE_DATA of length LEN with data DATA.  */
+
+static void
+ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data,
+                      size_t len)
+{
+  const struct lto_function_header *header =
+    (const struct lto_function_header *) data;
+  const int cfg_offset = sizeof (struct lto_function_header);
+  const int main_offset = cfg_offset + header->cfg_size;
+  const int string_offset = main_offset + header->main_size;
+  struct data_in *data_in;
+  unsigned int i;
+  unsigned int count;
+
+  lto_input_block ib_main ((const char *) data + main_offset,
+                           header->main_size, file_data->mode_table);
+
+  data_in =
+    lto_data_in_create (file_data, (const char *) data + string_offset,
+                        header->string_size, vNULL);
+  count = streamer_read_uhwi (&ib_main);
+
+  for (i = 0; i < count; i++)
+    {
+      unsigned int index;
+      struct cgraph_node *node;
+      lto_symtab_encoder_t encoder;
+
+      index = streamer_read_uhwi (&ib_main);
+      encoder = file_data->symtab_node_encoder;
+      node = dyn_cast <cgraph_node *> (lto_symtab_encoder_deref (encoder,
+                                                                 index));
+      gcc_assert (node->definition);
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      struct bitpack_d bp = streamer_read_bitpack (&ib_main);
+      s->m_kind = (hsa_function_kind) bp_unpack_value (&bp, 2);
+      s->m_gpu_implementation_p = bp_unpack_value (&bp, 1);
+      bool has_tree = bp_unpack_value (&bp, 1);
+
+      if (has_tree)
+        {
+          tree decl = stream_read_tree (&ib_main, data_in);
+          s->m_binded_function = cgraph_node::get_create (decl);
+        }
+    }
+  lto_free_section_data (file_data, LTO_section_ipa_hsa, NULL, data,
+                         len);
+  lto_data_in_delete (data_in);
+}
+
+/* Load streamed HSA function summaries and assign them to functions.  */
+
+static void
+ipa_hsa_read_summary (void)
+{
+  struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data ();
+  struct lto_file_decl_data *file_data;
+  unsigned int j = 0;
+
+  if (hsa_summaries == NULL)
+    hsa_summaries = new hsa_summary_t (symtab);
+
+  while ((file_data = file_data_vec[j++]))
+    {
+      size_t len;
+      const char *data = lto_get_section_data (file_data, LTO_section_ipa_hsa,
+                                               NULL, &len);
+
+      if (data)
+        ipa_hsa_read_section (file_data, data, len);
+    }
+}
+
+const pass_data pass_data_ipa_hsa =
+{
+  IPA_PASS, /* type */
+  "hsa", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_IPA_HSA, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_dump_symtab, /* todo_flags_finish */
+};
+
+class pass_ipa_hsa : public ipa_opt_pass_d
+{
+public:
+  pass_ipa_hsa (gcc::context *ctxt)
+    : ipa_opt_pass_d (pass_data_ipa_hsa, ctxt,
+                      NULL, /* generate_summary */
+                      ipa_hsa_write_summary, /* write_summary */
+                      ipa_hsa_read_summary, /* read_summary */
+                      ipa_hsa_write_summary, /* write_optimization_summary */
+                      ipa_hsa_read_summary, /* read_optimization_summary */
+                      NULL, /* stmt_fixup */
+                      0, /* function_transform_todo_flags_start */
+                      NULL, /* function_transform */
+                      NULL) /* variable_transform */
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+
+  virtual unsigned int execute (function *) { return process_hsa_functions (); }
+
+}; // class pass_ipa_hsa
+
+bool
+pass_ipa_hsa::gate (function *)
+{
+  return hsa_gen_requested_p () || in_lto_p;
+}
+
+} // anon namespace
+
+ipa_opt_pass_d *
+make_pass_ipa_hsa (gcc::context *ctxt)
+{
+  return new pass_ipa_hsa (ctxt);
+}
diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index e7ace09..840e26b 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -51,7 +51,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "ipcp_trans",
   "icf",
   "offload_table",
-  "mode_table"
+  "mode_table",
+  "hsa"
 };
 
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 5aae9e9..b29ff18 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -244,6 +244,7 @@ enum lto_section_type
   LTO_section_ipa_icf,
   LTO_section_offload_table,
   LTO_section_mode_table,
+  LTO_section_ipa_hsa,
   LTO_N_SECTION_TYPES           /* Must be last.  */
 };
 
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 03ed72b..a966014 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-inline.h"
 #include "ipa-utils.h"
 #include "lto-partition.h"
+#include "hsa.h"
 
 vec<ltrans_partition> ltrans_partitions;
 
@@ -177,6 +178,24 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
          Therefore put it into the same partition.  */
       if (cnode->instrumented_version)
         add_symbol_to_partition_1 (part, cnode->instrumented_version);
+
+      /* Add an HSA function associated with the symbol.  */
+      if (hsa_summaries != NULL)
+        {
+          hsa_function_summary *s = hsa_summaries->get (cnode);
+          if (s->m_kind == HSA_KERNEL)
+            {
+              /* Add the bound function.  */
+              bool added = add_symbol_to_partition_1 (part,
+                                                      s->m_binded_function);
+              gcc_assert (added);
+              if (symtab->dump_file)
+                fprintf (symtab->dump_file,
+                         "adding an HSA function (host/gpu) to the "
+                         "partition: %s\n",
+                         s->m_binded_function->name ());
+            }
+        }
     }
 
   add_references_to_partition (part, node);
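`ipa_hsa_write_summary` packs three fields of each HSA function summary into one bitpack. The function kind takes 2 bits, the GPU-implementation flag takes 1 bit, and a final 1-bit flag says whether a bound function's decl follows in the stream. `ipa_hsa_read_section` unpacks them in the same order and widths, and it reads the decl only when that last bit is set.

The standalone C++ sketch below reproduces only that bit layout, using a plain 32-bit integer. It is a hypothetical illustration, not GCC's `bitpack_d` machinery:

- The helper names `pack_summary` and `unpack_summary` are invented.
- The enum values are invented.
- The least-significant-bit-first ordering is an assumption about `bp_pack_value`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the three HSA function kinds; the numeric values
// are assumptions, only the fact that they fit in 2 bits matters here.
enum Kind : std::uint32_t { KIND_NONE = 0, KIND_KERNEL = 1, KIND_FUNCTION = 2 };

struct Summary {
  Kind kind;
  bool gpu_implementation_p;
  bool has_bound_function;  // true if a bound function decl follows
};

// Pack fields LSB-first: bits 0-1 kind, bit 2 GPU flag, bit 3 "has bound".
std::uint32_t pack_summary(const Summary &s) {
  std::uint32_t word = 0;
  unsigned pos = 0;
  word |= (static_cast<std::uint32_t>(s.kind) & 0x3u) << pos;
  pos += 2;
  word |= (s.gpu_implementation_p ? 1u : 0u) << pos;
  pos += 1;
  word |= (s.has_bound_function ? 1u : 0u) << pos;
  return word;
}

// Unpack in exactly the same order and widths as pack_summary.
Summary unpack_summary(std::uint32_t word) {
  Summary s;
  s.kind = static_cast<Kind>(word & 0x3u);
  word >>= 2;
  s.gpu_implementation_p = (word & 0x1u) != 0;
  word >>= 1;
  s.has_bound_function = (word & 0x1u) != 0;
  return s;
}
```

The writer and the reader must agree on both the order and the width of every field. The "has bound function" bit is read before the optional decl, which is what tells the reader whether a tree follows in the stream.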