From patchwork Wed Jan 13 17:39:32 2016
X-Patchwork-Submitter: Martin Jambor
X-Patchwork-Id: 567089
Message-Id: <20160113173925.776317025@virgil.suse.cz>
In-Reply-To: <20160113173925.220029649@virgil.suse.cz>
Date: Wed, 13 Jan 2016 18:39:32 +0100
From: Martin Jambor
To: GCC Patches
Cc: Jan Hubicka, Martin Liska
Subject: [hsa merge 07/10] IPA-HSA pass

Hi,

this patch contains IPA-related changes that we need to bring about
for HSA.  The patch is a re-post of
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00720.html but so far we
have not received any feedback.
Let me quote the original accompanying email here for reference:

When a target construct is gridified, the HSA GPU function is
associated with the CPU function throughout the compilation, so that
they can be registered as a pair in libgomp.  Ungridified target
constructs and, more importantly, "pragma omp declare target" marked
functions emerge out of OMP expansion as one gimple function for both
the host and the accelerator.  However, at some point we need to
create a special HSA function representation so that we can modify
the behavior of a (very) few optimization passes for them.

Both are done by the following new IPA pass, which creates new HSA
clones in these cases.  Moreover, it redirects the appropriate call
graph edges so that they connect HSA implementations, marks HSA clones
with the flatten attribute to minimize any call overhead (which is
much more significant on GPUs) and makes sure both the CPU and GPU
functions are coupled together and remain in the same LTO partition so
that they can be registered together with libgomp.

Thanks,

Martin


2016-01-13  Martin Liska
	    Martin Jambor

	* ipa-hsa.c: New file.
	* lto-section-in.c (lto_section_name): Add hsa section name.
	* lto-streamer.h (lto_section_type): Add hsa section.
	* lto-partition.c: Include "hsa.h".
	(add_symbol_to_partition_1): Put hsa implementations into the
	same partition as host implementations.
	* timevar.def (TV_IPA_HSA): New.

diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
new file mode 100644
index 0000000..dd47995
--- /dev/null
+++ b/gcc/ipa-hsa.c
@@ -0,0 +1,329 @@
+/* IPA pass creating HSA clones of functions.
+   Copyright (C) 2015-2016 Free Software Foundation, Inc.
+   Contributed by Martin Liska
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Interprocedural HSA pass is responsible for creation of HSA clones.
+   For all these HSA clones, we emit HSAIL instructions and pass processing
+   is terminated.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "hash-set.h"
+#include "vec.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "dumpfile.h"
+#include "gimple-pretty-print.h"
+#include "tree-streamer.h"
+#include "stringpool.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+
+namespace {
+
+/* If NODE is not versionable, warn about not emitting HSAIL and return false.
+   Otherwise return true.  */
+
+static bool
+check_warn_node_versionable (cgraph_node *node)
+{
+  if (!node->local.versionable)
+    {
+      warning_at (EXPR_LOCATION (node->decl), OPT_Whsa,
+                  "could not emit HSAIL for function %s: function cannot be "
+                  "cloned", node->name ());
+      return false;
+    }
+  return true;
+}
+
+/* The function creates HSA clones for all functions that were either
+   marked as HSA kernels or are callable HSA functions.  Apart from that,
+   we redirect all edges that come from an HSA clone and end in another
+   HSA clone to connect these two functions.  */
+
+static unsigned int
+process_hsa_functions (void)
+{
+  struct cgraph_node *node;
+
+  if (hsa_summaries == NULL)
+    hsa_summaries = new hsa_summary_t (symtab);
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      /* A linked function is skipped.  */
+      if (s->m_binded_function != NULL)
+        continue;
+
+      if (s->m_kind != HSA_NONE)
+        {
+          if (!check_warn_node_versionable (node))
+            continue;
+          cgraph_node *clone = node->create_virtual_clone
+            (vec <cgraph_edge *> (), NULL, NULL, "hsa");
+          TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+
+          clone->force_output = true;
+          hsa_summaries->link_functions (clone, node, s->m_kind, false);
+
+          if (dump_file)
+            fprintf (dump_file, "Created a new HSA clone: %s, type: %s\n",
+                     clone->name (),
+                     s->m_kind == HSA_KERNEL ? "kernel" : "function");
+        }
+      else if (hsa_callable_function_p (node->decl))
+        {
+          if (!check_warn_node_versionable (node))
+            continue;
+          cgraph_node *clone = node->create_virtual_clone
+            (vec <cgraph_edge *> (), NULL, NULL, "hsa");
+          TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+
+          if (!cgraph_local_p (node))
+            clone->force_output = true;
+          hsa_summaries->link_functions (clone, node, HSA_FUNCTION, false);
+
+          if (dump_file)
+            fprintf (dump_file, "Created a new HSA function clone: %s\n",
+                     clone->name ());
+        }
+    }
+
+  /* Redirect all edges that are between HSA clones.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      cgraph_edge *e = node->callees;
+
+      while (e)
+        {
+          hsa_function_summary *src = hsa_summaries->get (node);
+          if (src->m_kind != HSA_NONE && src->m_gpu_implementation_p)
+            {
+              hsa_function_summary *dst = hsa_summaries->get (e->callee);
+              if (dst->m_kind != HSA_NONE && !dst->m_gpu_implementation_p)
+                {
+                  e->redirect_callee (dst->m_binded_function);
+                  if (dump_file)
+                    fprintf (dump_file,
+                             "Redirecting edge to HSA function: %s->%s\n",
+                             xstrdup_for_dump (e->caller->name ()),
+                             xstrdup_for_dump (e->callee->name ()));
+                }
+            }
+
+          e = e->next_callee;
+        }
+    }
+
+  return 0;
+}
+
+/* Iterate all HSA functions and stream out HSA function summary.  */
+
+static void
+ipa_hsa_write_summary (void)
+{
+  struct bitpack_d bp;
+  struct cgraph_node *node;
+  struct output_block *ob;
+  unsigned int count = 0;
+  lto_symtab_encoder_iterator lsei;
+  lto_symtab_encoder_t encoder;
+
+  if (!hsa_summaries)
+    return;
+
+  ob = create_output_block (LTO_section_ipa_hsa);
+  encoder = ob->decl_state->symtab_node_encoder;
+  ob->symbol = NULL;
+  for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
+       lsei_next_function_in_partition (&lsei))
+    {
+      node = lsei_cgraph_node (lsei);
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      if (s->m_kind != HSA_NONE)
+        count++;
+    }
+
+  streamer_write_uhwi (ob, count);
+
+  /* Process all of the functions.  */
+  for (lsei = lsei_start_function_in_partition (encoder); !lsei_end_p (lsei);
+       lsei_next_function_in_partition (&lsei))
+    {
+      node = lsei_cgraph_node (lsei);
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      if (s->m_kind != HSA_NONE)
+        {
+          encoder = ob->decl_state->symtab_node_encoder;
+          int node_ref = lto_symtab_encoder_encode (encoder, node);
+          streamer_write_uhwi (ob, node_ref);
+
+          bp = bitpack_create (ob->main_stream);
+          bp_pack_value (&bp, s->m_kind, 2);
+          bp_pack_value (&bp, s->m_gpu_implementation_p, 1);
+          bp_pack_value (&bp, s->m_binded_function != NULL, 1);
+          streamer_write_bitpack (&bp);
+          if (s->m_binded_function)
+            stream_write_tree (ob, s->m_binded_function->decl, true);
+        }
+    }
+
+  streamer_write_char_stream (ob->main_stream, 0);
+  produce_asm (ob, NULL);
+  destroy_output_block (ob);
+}
+
+/* Read section in file FILE_DATA of length LEN with data DATA.  */
+
+static void
+ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data,
+                      size_t len)
+{
+  const struct lto_function_header *header =
+    (const struct lto_function_header *) data;
+  const int cfg_offset = sizeof (struct lto_function_header);
+  const int main_offset = cfg_offset + header->cfg_size;
+  const int string_offset = main_offset + header->main_size;
+  struct data_in *data_in;
+  unsigned int i;
+  unsigned int count;
+
+  lto_input_block ib_main ((const char *) data + main_offset,
+                           header->main_size, file_data->mode_table);
+
+  data_in =
+    lto_data_in_create (file_data, (const char *) data + string_offset,
+                        header->string_size, vNULL);
+  count = streamer_read_uhwi (&ib_main);
+
+  for (i = 0; i < count; i++)
+    {
+      unsigned int index;
+      struct cgraph_node *node;
+      lto_symtab_encoder_t encoder;
+
+      index = streamer_read_uhwi (&ib_main);
+      encoder = file_data->symtab_node_encoder;
+      node = dyn_cast<cgraph_node *> (lto_symtab_encoder_deref (encoder,
+                                                                index));
+      gcc_assert (node->definition);
+      hsa_function_summary *s = hsa_summaries->get (node);
+
+      struct bitpack_d bp = streamer_read_bitpack (&ib_main);
+      s->m_kind = (hsa_function_kind) bp_unpack_value (&bp, 2);
+      s->m_gpu_implementation_p = bp_unpack_value (&bp, 1);
+      bool has_tree = bp_unpack_value (&bp, 1);
+
+      if (has_tree)
+        {
+          tree decl = stream_read_tree (&ib_main, data_in);
+          s->m_binded_function = cgraph_node::get_create (decl);
+        }
+    }
+  lto_free_section_data (file_data, LTO_section_ipa_hsa, NULL, data,
+                         len);
+  lto_data_in_delete (data_in);
+}
+
+/* Load streamed HSA functions summary and assign the summary to a function.  */
+
+static void
+ipa_hsa_read_summary (void)
+{
+  struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data ();
+  struct lto_file_decl_data *file_data;
+  unsigned int j = 0;
+
+  if (hsa_summaries == NULL)
+    hsa_summaries = new hsa_summary_t (symtab);
+
+  while ((file_data = file_data_vec[j++]))
+    {
+      size_t len;
+      const char *data = lto_get_section_data (file_data, LTO_section_ipa_hsa,
+                                               NULL, &len);
+
+      if (data)
+        ipa_hsa_read_section (file_data, data, len);
+    }
+}
+
+const pass_data pass_data_ipa_hsa =
+{
+  IPA_PASS, /* type */
+  "hsa", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_IPA_HSA, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_dump_symtab, /* todo_flags_finish */
+};
+
+class pass_ipa_hsa : public ipa_opt_pass_d
+{
+public:
+  pass_ipa_hsa (gcc::context *ctxt)
+    : ipa_opt_pass_d (pass_data_ipa_hsa, ctxt,
+                      NULL, /* generate_summary */
+                      ipa_hsa_write_summary, /* write_summary */
+                      ipa_hsa_read_summary, /* read_summary */
+                      ipa_hsa_write_summary, /* write_optimization_summary */
+                      ipa_hsa_read_summary, /* read_optimization_summary */
+                      NULL, /* stmt_fixup */
+                      0, /* function_transform_todo_flags_start */
+                      NULL, /* function_transform */
+                      NULL) /* variable_transform */
+    {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+
+  virtual unsigned int execute (function *) { return process_hsa_functions (); }
+
+}; // class pass_ipa_hsa
+
+bool
+pass_ipa_hsa::gate (function *)
+{
+  return hsa_gen_requested_p () || in_lto_p;
+}
+
+} // anon namespace
+
+ipa_opt_pass_d *
+make_pass_ipa_hsa (gcc::context *ctxt)
+{
+  return new pass_ipa_hsa (ctxt);
+}
diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index 972f062..93b82be 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -51,7 +51,8 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "ipcp_trans",
   "icf",
   "offload_table",
-  "mode_table"
+  "mode_table",
+  "hsa"
 };
 
 
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 42654f5..0cb200e 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -244,6 +244,7 @@ enum lto_section_type
   LTO_section_ipa_icf,
   LTO_section_offload_table,
   LTO_section_mode_table,
+  LTO_section_ipa_hsa,
   LTO_N_SECTION_TYPES           /* Must be last.  */
 };
 
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 81a63a5..0a56170 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-prop.h"
 #include "ipa-inline.h"
 #include "lto-partition.h"
+#include "hsa.h"
 
 vec<ltrans_partition> ltrans_partitions;
 
@@ -170,6 +171,24 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
          Therefore put it into the same partition.  */
       if (cnode->instrumented_version)
         add_symbol_to_partition_1 (part, cnode->instrumented_version);
+
+      /* Add an HSA associated with the symbol.  */
+      if (hsa_summaries != NULL)
+        {
+          hsa_function_summary *s = hsa_summaries->get (cnode);
+          if (s->m_kind == HSA_KERNEL)
+            {
+              /* Add binded function.  */
+              bool added = add_symbol_to_partition_1 (part,
+                                                      s->m_binded_function);
+              gcc_assert (added);
+              if (symtab->dump_file)
+                fprintf (symtab->dump_file,
+                         "adding an HSA function (host/gpu) to the "
+                         "partition: %s\n",
+                         s->m_binded_function->name ());
+            }
+        }
     }
 
   add_references_to_partition (part, node);
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 2765179..d9a5066 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -97,6 +97,7 @@ DEFTIMEVAR (TV_WHOPR_WPA_IO          , "whopr wpa I/O")
 DEFTIMEVAR (TV_WHOPR_PARTITIONING    , "whopr partitioning")
 DEFTIMEVAR (TV_WHOPR_LTRANS          , "whopr ltrans")
 DEFTIMEVAR (TV_IPA_REFERENCE         , "ipa reference")
+DEFTIMEVAR (TV_IPA_HSA               , "ipa HSA")
 DEFTIMEVAR (TV_IPA_PROFILE           , "ipa profile")
 DEFTIMEVAR (TV_IPA_AUTOFDO           , "auto profile")
 DEFTIMEVAR (TV_IPA_PURE_CONST        , "ipa pure const")
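
For readers who want the shape of process_hsa_functions without the
GCC internals, here is a rough stand-alone sketch of its two phases:
first clone every HSA-relevant host function and bind the pair, then
retarget calls inside GPU clones from host functions to their clones.
Everything in it (Kind, Node, Graph, add_node, process, the ".hsa"
name suffix) is a hypothetical stand-in invented for illustration and
is not part of the cgraph or hsa_summary API.  It also skips the
versionability check and treats "declare target" functions as already
marked, so take it as a toy model under those assumptions rather than
as what the pass does in detail.

```cpp
// Toy model only: Kind, Node, Graph, add_node and process are hypothetical
// stand-ins, not GCC's cgraph_node / hsa_function_summary API.
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

enum class Kind { None, Kernel, Function };

struct Node {
  std::string name;
  Kind kind = Kind::None;        // plays the role of m_kind
  bool gpu_impl = false;         // plays the role of m_gpu_implementation_p
  Node *bound = nullptr;         // plays the role of m_binded_function
  std::vector<Node *> callees;   // outgoing call edges
};

using Graph = std::vector<std::unique_ptr<Node>>;

// Append a host function of the given kind and return a stable pointer to it.
Node *add_node (Graph &g, const std::string &name, Kind kind)
{
  g.push_back (std::make_unique<Node> ());
  g.back ()->name = name;
  g.back ()->kind = kind;
  return g.back ().get ();
}

void process (Graph &g)
{
  // Phase 1: clone each HSA-relevant host function that has no partner yet.
  // Only the nodes present on entry are visited, so clones are not re-cloned.
  const std::size_t original_count = g.size ();
  for (std::size_t i = 0; i < original_count; ++i)
    {
      Node *host = g[i].get ();
      if (host->bound != nullptr || host->kind == Kind::None)
        continue;
      auto clone = std::make_unique<Node> (*host);  // copies kind and edges
      clone->name += ".hsa";
      clone->gpu_impl = true;
      clone->bound = host;
      host->bound = clone.get ();
      g.push_back (std::move (clone));
    }

  // Phase 2: in GPU clones, redirect edges that still point at a host
  // implementation of an HSA function to that function's GPU clone.
  for (auto &n : g)
    {
      if (n->kind == Kind::None || !n->gpu_impl)
        continue;
      for (Node *&callee : n->callees)
        if (callee->kind != Kind::None && !callee->gpu_impl)
          callee = callee->bound;
    }
}
```

With a kernel that calls a helper and a host-only caller of the
kernel, the kernel's clone ends up calling the helper's clone, while
the host-side edges are left alone.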