diff mbox

[4/4] Back-end and IPA bits of hsa branch merge

Message ID c71b3c0875d3d372441194f99971af084148b75b.1479076755.git.mjambor@suse.cz
State New
Headers show

Commit Message

Martin Jambor Nov. 13, 2016, 10:22 p.m. UTC
Hi,

so this patch bundles together all the various fixes, cleanups and
improvements to the HSAIL generation itself which are far too many to
list here individually, more details can be found in the email
messages that i sent when committing a given change to the branch.

As the HSA maintainer I am going to approve this after the previous
two patches are approved by others, but if anybody has any comment or
suggestion, I will be glad to know.

Thanks,

Martin



2016-11-11  Martin Jambor  <mjambor@suse.cz>
	    Martin Liska  <mliska@suse.cz>

	* hsa.h (hsa_bb): Add method method append_phi.
	(hsa_insn_br): Renamed to hsa_insn_cbr, renamed all
	occurences in all files too.
	(hsa_insn_br): New class, now the ancestor of hsa_incn_cbr.
	(is_a_helper <hsa_insn_br *>::test): New function.
	(is_a_helper <hsa_insn_cbr *>::test): Adjust to only cover conditional
	branch instructions.
	(hsa_insn_signal): Make a direct descendant of
	hsa_insn_basic.  Add memorder constructor parameter and
	m_memory_order and m_signalop member variables.
	(hsa_insn_queue): Changed constructor parameters to common form.
	Added m_segment and m_memory_order member variables.
	(hsa_summary_t): Add private member function
	process_gpu_implementation_attributes.
	(hsa_function_summary): Rename m_binded_function to
	m_bound_function.
	(hsa_insn_basic_p): Remove typedef.
	(hsa_op_with_type): Change hsa_insn_basic_p into plain pointers.
	(hsa_op_reg_p): Remove typedef.
	(hsa_function_representation): Change hsa_op_reg_p into plain
	pointers.
	(hsa_insn_phi): Removed new and delete operators.
	(hsa_insn_br): Likewise.
	(hsa_insn_cbr): Likewise.
	(hsa_insn_sbr): Likewise.
	(hsa_insn_cmp): Likewise.
	(hsa_insn_mem): Likewise.
	(hsa_insn_atomic): Likewise.
	(hsa_insn_signal): Likewise.
	(hsa_insn_seg): Likewise.
	(hsa_insn_call): Likewise.
	(hsa_insn_arg_block): Likewise.
	(hsa_insn_comment): Likewise.
	(hsa_insn_srctype): Likewise.
	(hsa_insn_packed): Likewise.
	(hsa_insn_cvt): Likewise.
	(hsa_insn_alloca): Likewise.

	* hsa.c (hsa_destroy_insn): Also handle instances of hsa_insn_br.
	(process_gpu_implementation_attributes): New function.
	(link_functions): Move some functionality into it.  Adjust after
	renaming m_binded_functions to m_bound_functions.
	(hsa_insn_basic::op_output_p): Add BRIG_OPCODE_DEBUGTRAP
	to the list of instructions with no output registers.
	(get_in_type): Return this if it is a register of
	matching size.
	(hsa_get_declaration_name): Moved to...

        * hsa-gen.c (hsa_get_declaration_name): ...here.  Allocate
	temporary string on an obstack instead from ggc.
	(query_hsa_grid): Renamed to query_hsa_grid_dim, reimplemented, cut
	down to two overloads.
	(hsa_allocp_operand_address): Removed.
	(hsa_allocp_operand_immed): Likewise.
	(hsa_allocp_operand_reg): Likewise.
	(hsa_allocp_operand_code_list): Likewise.
	(hsa_allocp_operand_operand_list): Likewise.
	(hsa_allocp_inst_basic): Likewise.
	(hsa_allocp_inst_phi): Likewise.
	(hsa_allocp_inst_mem): Likewise.
	(hsa_allocp_inst_atomic): Likewise.
	(hsa_allocp_inst_signal): Likewise.
	(hsa_allocp_inst_seg): Likewise.
	(hsa_allocp_inst_cmp): Likewise.
	(hsa_allocp_inst_br): Likewise.
	(hsa_allocp_inst_sbr): Likewise.
	(hsa_allocp_inst_call): Likewise.
	(hsa_allocp_inst_arg_block): Likewise.
	(hsa_allocp_inst_comment): Likewise.
	(hsa_allocp_inst_queue): Likewise.
	(hsa_allocp_inst_srctype): Likewise.
	(hsa_allocp_inst_packed): Likewise.
	(hsa_allocp_inst_cvt): Likewise.
	(hsa_allocp_inst_alloca): Likewise.
	(hsa_allocp_bb): Likewise.
	(hsa_obstack): New.
	(hsa_init_data_for_cfun): Initialize obstack.
	(hsa_deinit_data_for_cfun): Release memory of the obstack.
	(hsa_op_immed::operator new): Use obstack instead of object_allocator.
	(hsa_op_reg::operator new): Likewise.
	(hsa_op_address::operator new): Likewise.
	(hsa_op_code_list::operator new): Likewise.
	(hsa_op_operand_list::operator new): Likewise.
	(hsa_insn_basic::operator new): Likewise.
	(hsa_insn_phi::operator new): Likewise.
	(hsa_insn_br::operator new): Likewise.
	(hsa_insn_sbr::operator new): Likewise.
	(hsa_insn_cmp::operator new): Likewise.
	(hsa_insn_mem::operator new): Likewise.
	(hsa_insn_atomic::operator new): Likewise.
	(hsa_insn_signal::operator new): Likewise.
	(hsa_insn_seg::operator new): Likewise.
	(hsa_insn_call::operator new): Likewise.
	(hsa_insn_arg_block::operator new): Likewise.
	(hsa_insn_comment::operator new): Likewise.
	(hsa_insn_srctype::operator new): Likewise.
	(hsa_insn_packed::operator new): Likewise.
	(hsa_insn_cvt::operator new): Likewise.
	(hsa_insn_alloca::operator new): Likewise.
	(hsa_init_new_bb): Likewise.
	(hsa_bb::append_phi): New function.
	(gen_hsa_phi_from_gimple_phi): Use it.
	(get_symbol_for_decl): Fix dinstinguishing between
	global and local functions.  Put local variables into a segment
	according to their attribute or static flag, if there is one.
	(hsa_insn_br::hsa_insn_br): New.
	(hsa_insn_br::operator new): Likewise.
	(hsa_insn_cbr::hsa_insn_cbr): Set width via ancestor constructor.
	(query_hsa_grid_nodim): New function.
	(multiply_grid_dim_characteristics): Likewise.
	(gen_get_num_threads): Likewise.
	(gen_get_num_teams): Reimplemented.
	(gen_get_team_num): Likewise.
	(gen_hsa_insns_for_known_library_call): Updated calls to the above
	helper functions.
	(get_memory_order_name): Removed.
	(get_memory_order): Likewise.
	(hsa_memorder_from_tree): New function.
	(gen_hsa_ternary_atomic_for_builtin): Renamed to
	gen_hsa_atomic_for_builtin, can also create signals.
	(gen_hsa_insns_for_call): Handle many new builtins.  Adjust to use
	hsa_memory_order_from_tree and gen_hsa_atomic_for_builtin.
	(hsa_insn_atomic): Fix function comment.
	(hsa_insn_signal::hsa_insn_signal): Fix comment.  Update call to
	ancestor constructor and initialization of new member variables.
	(hsa_insn_queue::hsa_insn_queue): Added initialization of new
	member variables.
	(hsa_get_host_function): Handle functions with no bound CPU
	implementation.  Fix binded to bound.
	(get_brig_function_name): Likewise.
	(HSA_SORRY_ATV): Remove semicolon after macro.
	(HSA_SORRY_AT): Likewise.
	(omp_simple_builtin::generate): Add missing semicolons.
	(hsa_insn_phi::operator new): Removed.
	(hsa_insn_br::operator new): Likewise.
	(hsa_insn_cbr::operator new): Likewise.
	(hsa_insn_sbr::operator new): Likewise.
	(hsa_insn_cmp::operator new): Likewise.
	(hsa_insn_mem::operator new): Likewise.
	(hsa_insn_atomic::operator new): Likewise.
	(hsa_insn_signal::operator new): Likewise.
	(hsa_insn_seg::operator new): Likewise.
	(hsa_insn_call::operator new): Likewise.
	(hsa_insn_arg_block::operator new): Likewise.
	(hsa_insn_comment::operator new): Likewise.
	(hsa_insn_srctype::operator new): Likewise.
	(hsa_insn_packed::operator new): Likewise.
	(hsa_insn_cvt::operator new): Likewise.
	(hsa_insn_alloca::operator new): Likewise.
	(get_symbol_for_decl): Accept CONST_DECLs, put them to
	readonly segment.
	(gen_hsa_addr): Also process CONST_DECLs.
	(gen_hsa_addr_insns): Process CONST_DECLs by creating private
	copies.
	(gen_hsa_unary_operation): Make sure the function does
	not use bittype source type for firstbit and lastbit operations.
	(gen_hsa_popcount_to_dest): Make sure the function uses a bittype
	source type.

	* hsa-brig.c (emit_insn_operands): Cope with zero operands in an
	instruction.
	(emit_branch_insn): Renamed to emit_cond_branch_insn.
	Emit the width stored in the class.
	(emit_generic_branch_insn): New function.
	(emit_insn): Call emit_generic_branch_insn.
	(emit_signal_insn): Remove obsolete comment.  Update
	member variable name, pick a type according to profile.
	(emit_alloca_insn): Remove obsolete comment.
	(emit_atomic_insn): Likewise.
	(emit_queue_insn): Get segment and memory order from the IR object.
	(hsa_brig_section): Make allocate_new_chunk, chunks
	and cur_chunk provate, add a default NULL parameter to add method.
	(hsa_brig_section::add): Added a new parameter, store pointer to
	output data there if it is non-NULL.
	(emit_function_directives): Use this new parameter instead of
	calculating the pointer itself, fix function comment.
	(hsa_brig_emit_function): Add forgotten endian conversion.
	(hsa_output_kernels): Remove unnecessary building of
	kernel_dependencies_vector_type.
	(emit_immediate_operand): Declare.
	(emit_directive_variable): Also emit initializers of CONST_DECLs.
	(gen_hsa_insn_for_internal_fn_call): Also handle IFN_RSQRT.
	(verify_function_arguments): Properly detect variadic
	arguments.

	* hsa-dump.c (hsa_width_specifier_name): New function.
	(dump_hsa_insn_1): Dump generic branch instructions, update signal
	member variable name.  Special dumping for queue objects.

	* ipa-hsa.c (process_hsa_functions): Adjust after renaming
	m_binded_functions to m_bound_functions.  Copy externally visible flag
	to the node.
	(ipa_hsa_write_summary): Likewise.
	(ipa_hsa_read_section): Likewise.

libgomp/
	* testsuite/libgomp.hsa.c/bits-insns.c: New test.
---
 gcc/hsa-brig.c                               | 140 ++--
 gcc/hsa-dump.c                               | 107 +++-
 gcc/hsa-gen.c                                | 914 ++++++++++++++-------------
 gcc/hsa.c                                    |  60 +-
 gcc/hsa.h                                    | 157 ++---
 gcc/ipa-hsa.c                                |  14 +-
 libgomp/testsuite/libgomp.hsa.c/bits-insns.c |  73 +++
 7 files changed, 838 insertions(+), 627 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/bits-insns.c
diff mbox

Patch

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 66ff8f9..acd9164 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -161,19 +161,21 @@  public:
   /* The size of the header of the section without any padding.  */
   unsigned header_byte_delta;
 
-  /* Buffers of binary data, each containing BRIG_CHUNK_MAX_SIZE bytes.  */
-  vec <struct hsa_brig_data_chunk> chunks;
-
-  /* More convenient access to the last chunk from the vector above.  */
-  struct hsa_brig_data_chunk *cur_chunk;
-
-  void allocate_new_chunk ();
   void init (const char *name);
   void release ();
   void output ();
-  unsigned add (const void *data, unsigned len);
+  unsigned add (const void *data, unsigned len, void **output = NULL);
   void round_size_up (int factor);
   void *get_ptr_by_offset (unsigned int offset);
+
+private:
+  void allocate_new_chunk ();
+
+  /* Buffers of binary data, each containing BRIG_CHUNK_MAX_SIZE bytes.  */
+  vec <struct hsa_brig_data_chunk> chunks;
+
+  /* More convenient access to the last chunk from the vector above.  */
+  struct hsa_brig_data_chunk *cur_chunk;
 };
 
 static struct hsa_brig_section brig_data, brig_code, brig_operand;
@@ -271,10 +273,11 @@  hsa_brig_section::output ()
 }
 
 /* Add to the stream LEN bytes of opaque binary DATA.  Return the offset at
-   which it was stored.  */
+   which it was stored.  If OUTPUT is not NULL, store into it the pointer to
+   the place where DATA was actually stored.  */
 
 unsigned
-hsa_brig_section::add (const void *data, unsigned len)
+hsa_brig_section::add (const void *data, unsigned len, void **output)
 {
   unsigned offset = total_size;
 
@@ -282,7 +285,10 @@  hsa_brig_section::add (const void *data, unsigned len)
   if (cur_chunk->size > (BRIG_CHUNK_MAX_SIZE - len))
     allocate_new_chunk ();
 
-  memcpy (cur_chunk->data + cur_chunk->size, data, len);
+  char *dst = cur_chunk->data + cur_chunk->size;
+  memcpy (dst, data, len);
+  if (output)
+    *output = dst;
   cur_chunk->size += len;
   total_size += len;
 
@@ -565,6 +571,7 @@  enqueue_op (hsa_op_base *op)
   return ret;
 }
 
+static void emit_immediate_operand (hsa_op_immed *imm);
 
 /* Emit directive describing a symbol if it has not been emitted already.
    Return the offset of the directive.  */
@@ -603,7 +610,14 @@  emit_directive_variable (struct hsa_symbol *symbol)
     }
 
   dirvar.name = lendian32 (name_offset);
-  dirvar.init = 0;
+
+  if (symbol->m_decl && TREE_CODE (symbol->m_decl) == CONST_DECL)
+    {
+      hsa_op_immed *tmp = new hsa_op_immed (DECL_INITIAL (symbol->m_decl));
+      dirvar.init = lendian32 (enqueue_op (tmp));
+    }
+  else
+    dirvar.init = 0;
   dirvar.type = lendian16 (symbol->m_type);
   dirvar.segment = symbol->m_segment;
   dirvar.align = symbol->m_align;
@@ -626,8 +640,12 @@  emit_directive_variable (struct hsa_symbol *symbol)
   return symbol->m_directive_offset;
 }
 
-/* Emit directives describing either a function declaration or
-   definition F.  */
+/* Emit directives describing either a function declaration or definition F and
+   return the produced BrigDirectiveExecutable structure.  The function does
+   not take into account any instructions when calculating nextModuleEntry
+   field of the produced BrigDirectiveExecutable structure so when emitting
+   actual definitions, this field needs to be updated after all of the function
+   is actually added to the code section.  */
 
 static BrigDirectiveExecutable *
 emit_function_directives (hsa_function_representation *f, bool is_declaration)
@@ -635,7 +653,7 @@  emit_function_directives (hsa_function_representation *f, bool is_declaration)
   struct BrigDirectiveExecutable fndir;
   unsigned name_offset, inarg_off, scoped_off, next_toplev_off;
   int count = 0;
-  BrigDirectiveExecutable *ptr_to_fndir;
+  void *ptr_to_fndir;
   hsa_symbol *sym;
 
   if (!f->m_declaration_p)
@@ -693,17 +711,7 @@  emit_function_directives (hsa_function_representation *f, bool is_declaration)
       *slot = int_fn;
     }
 
-  brig_code.add (&fndir, sizeof (fndir));
-  /* terrible hack: we need to set instCount after we emit all
-     insns, but we need to emit directive in order, and we emit directives
-     during insn emitting.  So we need to emit the FUNCTION directive
-     early, then the insns, and then we need to set instCount, so remember
-     a pointer to it, in some horrible way.  cur_chunk.data+size points
-     directly to after fndir here.  */
-  ptr_to_fndir
-      = (BrigDirectiveExecutable *)(brig_code.cur_chunk->data
-				    + brig_code.cur_chunk->size
-				    - sizeof (fndir));
+  brig_code.add (&fndir, sizeof (fndir), &ptr_to_fndir);
 
   if (f->m_output_arg)
     emit_directive_variable (f->m_output_arg);
@@ -724,7 +732,7 @@  emit_function_directives (hsa_function_representation *f, bool is_declaration)
 	}
     }
 
-  return ptr_to_fndir;
+  return (BrigDirectiveExecutable *) ptr_to_fndir;
 }
 
 /* Emit a label directive for the given HBB.  We assume it is about to start on
@@ -1237,20 +1245,20 @@  emit_insn_operands (hsa_insn_basic *insn)
     operand_offsets;
 
   unsigned l = insn->operand_count ();
-  operand_offsets.safe_grow (l);
-
-  for (unsigned i = 0; i < l; i++)
-    operand_offsets[i] = lendian32 (enqueue_op (insn->get_op (i)));
 
   /* We have N operands so use 4 * N for the byte_count.  */
   uint32_t byte_count = lendian32 (4 * l);
-
   unsigned offset = brig_data.add (&byte_count, sizeof (byte_count));
-  brig_data.add (operand_offsets.address (),
-		 l * sizeof (BrigOperandOffset32_t));
+  if (l > 0)
+    {
+      operand_offsets.safe_grow (l);
+      for (unsigned i = 0; i < l; i++)
+	operand_offsets[i] = lendian32 (enqueue_op (insn->get_op (i)));
 
+      brig_data.add (operand_offsets.address (),
+		     l * sizeof (BrigOperandOffset32_t));
+    }
   brig_data.round_size_up (4);
-
   return offset;
 }
 
@@ -1334,10 +1342,6 @@  emit_signal_insn (hsa_insn_signal *mem)
 {
   struct BrigInstSignal repr;
 
-  /* This is necessary because of the erroneous typedef of
-     BrigMemoryModifier8_t which introduces padding which may then contain
-     random stuff (which we do not want so that we can test things don't
-     change).  */
   memset (&repr, 0, sizeof (repr));
   repr.base.base.byteCount = lendian16 (sizeof (repr));
   repr.base.base.kind = lendian16 (BRIG_KIND_INST_SIGNAL);
@@ -1345,9 +1349,9 @@  emit_signal_insn (hsa_insn_signal *mem)
   repr.base.type = lendian16 (mem->m_type);
   repr.base.operands = lendian32 (emit_insn_operands (mem));
 
-  repr.memoryOrder = mem->m_memoryorder;
-  repr.signalOperation = mem->m_atomicop;
-  repr.signalType = BRIG_TYPE_SIG64;
+  repr.memoryOrder = mem->m_memory_order;
+  repr.signalOperation = mem->m_signalop;
+  repr.signalType = hsa_machine_large_p () ? BRIG_TYPE_SIG64 : BRIG_TYPE_SIG32;
 
   brig_code.add (&repr, sizeof (repr));
   brig_insn_count++;
@@ -1368,10 +1372,6 @@  emit_atomic_insn (hsa_insn_atomic *mem)
   else
     addr = as_a <hsa_op_address *> (mem->get_op (1));
 
-  /* This is necessary because of the erroneous typedef of
-     BrigMemoryModifier8_t which introduces padding which may then contain
-     random stuff (which we do not want so that we can test things don't
-     change).  */
   memset (&repr, 0, sizeof (repr));
   repr.base.base.byteCount = lendian16 (sizeof (repr));
   repr.base.base.kind = lendian16 (BRIG_KIND_INST_ATOMIC);
@@ -1448,10 +1448,6 @@  emit_alloca_insn (hsa_insn_alloca *alloca)
   struct BrigInstMem repr;
   gcc_checking_assert (alloca->operand_count () == 2);
 
-  /* This is necessary because of the erroneous typedef of
-     BrigMemoryModifier8_t which introduces padding which may then contain
-     random stuff (which we do not want so that we can test things don't
-     change).  */
   memset (&repr, 0, sizeof (repr));
   repr.base.base.byteCount = lendian16 (sizeof (repr));
   repr.base.base.kind = lendian16 (BRIG_KIND_INST_MEM);
@@ -1497,11 +1493,29 @@  emit_cmp_insn (hsa_insn_cmp *cmp)
   brig_insn_count++;
 }
 
-/* Emit an HSA branching instruction and all necessary directives, schedule
-   necessary operands for writing.  */
+/* Emit an HSA generic branching/sycnronization instruction.  */
+
+static void
+emit_generic_branch_insn (hsa_insn_br *br)
+{
+  struct BrigInstBr repr;
+  repr.base.base.byteCount = lendian16 (sizeof (repr));
+  repr.base.base.kind = lendian16 (BRIG_KIND_INST_BR);
+  repr.base.opcode = lendian16 (br->m_opcode);
+  repr.width = br->m_width;
+  repr.base.type = lendian16 (br->m_type);
+  repr.base.operands = lendian32 (emit_insn_operands (br));
+  memset (&repr.reserved, 0, sizeof (repr.reserved));
+
+  brig_code.add (&repr, sizeof (repr));
+  brig_insn_count++;
+}
+
+/* Emit an HSA conditional branching instruction and all necessary directives,
+   schedule necessary operands for writing.  */
 
 static void
-emit_branch_insn (hsa_insn_br *br)
+emit_cond_branch_insn (hsa_insn_cbr *br)
 {
   struct BrigInstBr repr;
 
@@ -1514,7 +1528,7 @@  emit_branch_insn (hsa_insn_br *br)
   repr.base.base.byteCount = lendian16 (sizeof (repr));
   repr.base.base.kind = lendian16 (BRIG_KIND_INST_BR);
   repr.base.opcode = lendian16 (br->m_opcode);
-  repr.width = BRIG_WIDTH_1;
+  repr.width = br->m_width;
   /* For Conditional jumps the type is always B1.  */
   repr.base.type = lendian16 (BRIG_TYPE_B1);
 
@@ -1730,8 +1744,8 @@  emit_queue_insn (hsa_insn_queue *insn)
   repr.base.base.kind = lendian16 (BRIG_KIND_INST_QUEUE);
   repr.base.opcode = lendian16 (insn->m_opcode);
   repr.base.type = lendian16 (insn->m_type);
-  repr.segment = BRIG_SEGMENT_GLOBAL;
-  repr.memoryOrder = BRIG_MEMORY_ORDER_SC_RELEASE;
+  repr.segment = insn->m_segment;
+  repr.memoryOrder = insn->m_memory_order;
   repr.base.operands = lendian32 (emit_insn_operands (insn));
   brig_data.round_size_up (4);
   brig_code.add (&repr, sizeof (repr));
@@ -1886,8 +1900,8 @@  emit_insn (hsa_insn_basic *insn)
     emit_segment_insn (seg);
   else if (hsa_insn_cmp *cmp = dyn_cast <hsa_insn_cmp *> (insn))
     emit_cmp_insn (cmp);
-  else if (hsa_insn_br *br = dyn_cast <hsa_insn_br *> (insn))
-    emit_branch_insn (br);
+  else if (hsa_insn_cbr *br = dyn_cast <hsa_insn_cbr *> (insn))
+    emit_cond_branch_insn (br);
   else if (hsa_insn_sbr *sbr = dyn_cast <hsa_insn_sbr *> (insn))
     {
       if (switch_instructions == NULL)
@@ -1896,6 +1910,8 @@  emit_insn (hsa_insn_basic *insn)
       switch_instructions->safe_push (sbr);
       emit_switch_insn (sbr);
     }
+  else if (hsa_insn_br *br = dyn_cast <hsa_insn_br *> (insn))
+    emit_generic_branch_insn (br);
   else if (hsa_insn_arg_block *block = dyn_cast <hsa_insn_arg_block *> (insn))
     emit_arg_block_insn (block);
   else if (hsa_insn_call *call = dyn_cast <hsa_insn_call *> (insn))
@@ -2006,7 +2022,7 @@  hsa_brig_emit_function (void)
       prev_bb = bb;
     }
   perhaps_emit_branch (prev_bb, NULL);
-  ptr_to_fndir->nextModuleEntry = brig_code.total_size;
+  ptr_to_fndir->nextModuleEntry = lendian32 (brig_code.total_size);
 
   /* Fill up label references for all sbr instructions.  */
   if (switch_instructions)
@@ -2225,11 +2241,6 @@  hsa_output_kernels (tree *host_func_table, tree *kernels)
       tree gridified_kernel_p_tree = build_int_cstu (boolean_type_node,
 						     gridified_kernel_p);
       unsigned count = 0;
-
-      kernel_dependencies_vector_type
-	= build_array_type (build_pointer_type (char_type_node),
-			    build_index_type (size_int (0)));
-
       vec<constructor_elt, va_gc> *kernel_dependencies_vec = NULL;
       if (hsa_decl_kernel_dependencies)
 	{
@@ -2279,6 +2290,7 @@  hsa_output_kernels (tree *host_func_table, tree *kernels)
       if (count > 0)
 	{
 	  ASM_GENERATE_INTERNAL_LABEL (tmp_name, "__hsa_dependencies_list", i);
+	  gcc_checking_assert (kernel_dependencies_vector_type);
 	  tree dependencies_list = build_decl (UNKNOWN_LOCATION, VAR_DECL,
 					       get_identifier (tmp_name),
 					       kernel_dependencies_vector_type);
diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
index 985caca..7e3b9f0 100644
--- a/gcc/hsa-dump.c
+++ b/gcc/hsa-dump.c
@@ -621,6 +621,88 @@  hsa_m_atomicop_name (enum BrigAtomicOperation op)
     }
 }
 
+/* Return textual name for atomic operation.  */
+
+static const char *
+hsa_width_specifier_name (BrigWidth8_t width)
+{
+  switch (width)
+    {
+    case BRIG_WIDTH_NONE:
+      return "none";
+    case BRIG_WIDTH_1:
+      return "1";
+    case BRIG_WIDTH_2:
+      return "2";
+    case BRIG_WIDTH_4:
+      return "4";
+    case BRIG_WIDTH_8:
+      return "8";
+    case BRIG_WIDTH_16:
+      return "16";
+    case BRIG_WIDTH_32:
+      return "32";
+    case BRIG_WIDTH_64:
+      return "64";
+    case BRIG_WIDTH_128:
+      return "128";
+    case BRIG_WIDTH_256:
+      return "256";
+    case BRIG_WIDTH_512:
+      return "512";
+    case BRIG_WIDTH_1024:
+      return "1024";
+    case BRIG_WIDTH_2048:
+      return "2048";
+    case BRIG_WIDTH_4096:
+      return "4096";
+    case BRIG_WIDTH_8192:
+      return "8192";
+    case BRIG_WIDTH_16384:
+      return "16384";
+    case BRIG_WIDTH_32768:
+      return "32768";
+    case BRIG_WIDTH_65536:
+      return "65536";
+    case BRIG_WIDTH_131072:
+      return "131072";
+    case BRIG_WIDTH_262144:
+      return "262144";
+    case BRIG_WIDTH_524288:
+      return "524288";
+    case BRIG_WIDTH_1048576:
+      return "1048576";
+    case BRIG_WIDTH_2097152:
+      return "2097152";
+    case BRIG_WIDTH_4194304:
+      return "4194304";
+    case BRIG_WIDTH_8388608:
+      return "8388608";
+    case BRIG_WIDTH_16777216:
+      return "16777216";
+    case BRIG_WIDTH_33554432:
+      return "33554432";
+    case BRIG_WIDTH_67108864:
+      return "67108864";
+    case BRIG_WIDTH_134217728:
+      return "134217728";
+    case BRIG_WIDTH_268435456:
+      return "268435456";
+    case BRIG_WIDTH_536870912:
+      return "536870912";
+    case BRIG_WIDTH_1073741824:
+      return "1073741824";
+    case BRIG_WIDTH_2147483648:
+      return "2147483648";
+    case BRIG_WIDTH_WAVESIZE:
+      return "wavesize";
+    case BRIG_WIDTH_ALL:
+      return "all";
+    default:
+      return "UNKNOWN_WIDTH";
+    }
+}
+
 /* Dump textual representation of HSA IL register REG to file F.  */
 
 static void
@@ -793,9 +875,9 @@  dump_hsa_insn_1 (FILE *f, hsa_insn_basic *insn, int *indent)
       hsa_insn_signal *mem = as_a <hsa_insn_signal *> (insn);
 
       fprintf (f, "%s", hsa_opcode_name (mem->m_opcode));
-      fprintf (f, "_%s", hsa_m_atomicop_name (mem->m_atomicop));
-      if (mem->m_memoryorder != BRIG_MEMORY_ORDER_NONE)
-	fprintf (f, "_%s", hsa_memsem_name (mem->m_memoryorder));
+      fprintf (f, "_%s", hsa_m_atomicop_name (mem->m_signalop));
+      if (mem->m_memory_order != BRIG_MEMORY_ORDER_NONE)
+	fprintf (f, "_%s", hsa_memsem_name (mem->m_memory_order));
       fprintf (f, "_%s ", hsa_type_name (mem->m_type));
 
       dump_hsa_operands (f, mem);
@@ -884,9 +966,9 @@  dump_hsa_insn_1 (FILE *f, hsa_insn_basic *insn, int *indent)
       fprintf (f, ", ");
       dump_hsa_operand (f, cmp->get_op (2));
     }
-  else if (is_a <hsa_insn_br *> (insn))
+  else if (is_a <hsa_insn_cbr *> (insn))
     {
-      hsa_insn_br *br = as_a <hsa_insn_br *> (insn);
+      hsa_insn_cbr *br = as_a <hsa_insn_cbr *> (insn);
       basic_block target = NULL;
       edge_iterator ei;
       edge e;
@@ -921,6 +1003,12 @@  dump_hsa_insn_1 (FILE *f, hsa_insn_basic *insn, int *indent)
 	    fprintf (f, ", ");
 	}
     }
+  else if (is_a <hsa_insn_br *> (insn))
+    {
+      hsa_insn_br *br = as_a <hsa_insn_br *> (insn);
+      fprintf (f, "%s_width(%s) ", hsa_opcode_name (br->m_opcode),
+	       hsa_width_specifier_name (br->m_width));
+    }
   else if (is_a <hsa_insn_arg_block *> (insn))
     {
       hsa_insn_arg_block *arg_block = as_a <hsa_insn_arg_block *> (insn);
@@ -1018,6 +1106,15 @@  dump_hsa_insn_1 (FILE *f, hsa_insn_basic *insn, int *indent)
 
       dump_hsa_operands (f, insn);
     }
+  else if (hsa_insn_queue *qi = dyn_cast <hsa_insn_queue *> (insn))
+    {
+      fprintf (f, "%s_%s_%s_%s ", hsa_opcode_name (qi->m_opcode),
+	       hsa_seg_name (qi->m_segment),
+	       hsa_memsem_name (qi->m_memory_order),
+	       hsa_type_name (qi->m_type));
+
+      dump_hsa_operands (f, qi);
+    }
   else
     {
       fprintf (f, "%s_%s ", hsa_opcode_name (insn->m_opcode),
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 21c35e6..a88294e 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -39,7 +39,6 @@  along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "gimple-pretty-print.h"
 #include "diagnostic-core.h"
-#include "alloc-pool.h"
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
 #include "stringpool.h"
@@ -72,7 +71,7 @@  along with GCC; see the file COPYING3.  If not see
 		    HSA_SORRY_MSG)) \
       inform (location, message, __VA_ARGS__); \
   } \
-  while (false);
+  while (false)
 
 /* Same as previous, but highlight a location.  */
 
@@ -84,7 +83,7 @@  along with GCC; see the file COPYING3.  If not see
 		    HSA_SORRY_MSG)) \
       inform (location, message); \
   } \
-  while (false);
+  while (false)
 
 /* Default number of threads used by kernel dispatch.  */
 
@@ -127,31 +126,7 @@  struct hsa_queue
   uint64_t id;
 };
 
-/* Alloc pools for allocating basic hsa structures such as operands,
-   instructions and other basic entities.  */
-static object_allocator<hsa_op_address> *hsa_allocp_operand_address;
-static object_allocator<hsa_op_immed> *hsa_allocp_operand_immed;
-static object_allocator<hsa_op_reg> *hsa_allocp_operand_reg;
-static object_allocator<hsa_op_code_list> *hsa_allocp_operand_code_list;
-static object_allocator<hsa_op_operand_list> *hsa_allocp_operand_operand_list;
-static object_allocator<hsa_insn_basic> *hsa_allocp_inst_basic;
-static object_allocator<hsa_insn_phi> *hsa_allocp_inst_phi;
-static object_allocator<hsa_insn_mem> *hsa_allocp_inst_mem;
-static object_allocator<hsa_insn_atomic> *hsa_allocp_inst_atomic;
-static object_allocator<hsa_insn_signal> *hsa_allocp_inst_signal;
-static object_allocator<hsa_insn_seg> *hsa_allocp_inst_seg;
-static object_allocator<hsa_insn_cmp> *hsa_allocp_inst_cmp;
-static object_allocator<hsa_insn_br> *hsa_allocp_inst_br;
-static object_allocator<hsa_insn_sbr> *hsa_allocp_inst_sbr;
-static object_allocator<hsa_insn_call> *hsa_allocp_inst_call;
-static object_allocator<hsa_insn_arg_block> *hsa_allocp_inst_arg_block;
-static object_allocator<hsa_insn_comment> *hsa_allocp_inst_comment;
-static object_allocator<hsa_insn_queue> *hsa_allocp_inst_queue;
-static object_allocator<hsa_insn_srctype> *hsa_allocp_inst_srctype;
-static object_allocator<hsa_insn_packed> *hsa_allocp_inst_packed;
-static object_allocator<hsa_insn_cvt> *hsa_allocp_inst_cvt;
-static object_allocator<hsa_insn_alloca> *hsa_allocp_inst_alloca;
-static object_allocator<hsa_bb> *hsa_allocp_bb;
+static struct obstack hsa_obstack;
 
 /* List of pointers to all instructions that come from an object allocator.  */
 static vec <hsa_insn_basic *> hsa_instructions;
@@ -486,52 +461,7 @@  static void
 hsa_init_data_for_cfun ()
 {
   hsa_init_compilation_unit_data ();
-  hsa_allocp_operand_address
-    = new object_allocator<hsa_op_address> ("HSA address operands");
-  hsa_allocp_operand_immed
-    = new object_allocator<hsa_op_immed> ("HSA immediate operands");
-  hsa_allocp_operand_reg
-    = new object_allocator<hsa_op_reg> ("HSA register operands");
-  hsa_allocp_operand_code_list
-    = new object_allocator<hsa_op_code_list> ("HSA code list operands");
-  hsa_allocp_operand_operand_list
-    = new object_allocator<hsa_op_operand_list> ("HSA operand list operands");
-  hsa_allocp_inst_basic
-    = new object_allocator<hsa_insn_basic> ("HSA basic instructions");
-  hsa_allocp_inst_phi
-    = new object_allocator<hsa_insn_phi> ("HSA phi operands");
-  hsa_allocp_inst_mem
-    = new object_allocator<hsa_insn_mem> ("HSA memory instructions");
-  hsa_allocp_inst_atomic
-    = new object_allocator<hsa_insn_atomic> ("HSA atomic instructions");
-  hsa_allocp_inst_signal
-    = new object_allocator<hsa_insn_signal> ("HSA signal instructions");
-  hsa_allocp_inst_seg
-    = new object_allocator<hsa_insn_seg> ("HSA segment conversion "
-					  "instructions");
-  hsa_allocp_inst_cmp
-    = new object_allocator<hsa_insn_cmp> ("HSA comparison instructions");
-  hsa_allocp_inst_br
-    = new object_allocator<hsa_insn_br> ("HSA branching instructions");
-  hsa_allocp_inst_sbr
-    = new object_allocator<hsa_insn_sbr> ("HSA switch branching instructions");
-  hsa_allocp_inst_call
-    = new object_allocator<hsa_insn_call> ("HSA call instructions");
-  hsa_allocp_inst_arg_block
-    = new object_allocator<hsa_insn_arg_block> ("HSA arg block instructions");
-  hsa_allocp_inst_comment
-    = new object_allocator<hsa_insn_comment> ("HSA comment instructions");
-  hsa_allocp_inst_queue
-    = new object_allocator<hsa_insn_queue> ("HSA queue instructions");
-  hsa_allocp_inst_srctype
-    = new object_allocator<hsa_insn_srctype> ("HSA source type instructions");
-  hsa_allocp_inst_packed
-    = new object_allocator<hsa_insn_packed> ("HSA packed instructions");
-  hsa_allocp_inst_cvt
-    = new object_allocator<hsa_insn_cvt> ("HSA convert instructions");
-  hsa_allocp_inst_alloca
-    = new object_allocator<hsa_insn_alloca> ("HSA alloca instructions");
-  hsa_allocp_bb = new object_allocator<hsa_bb> ("HSA basic blocks");
+  gcc_obstack_init (&hsa_obstack);
 }
 
 /* Deinitialize HSA subsystem and free all allocated memory.  */
@@ -565,29 +495,7 @@  hsa_deinit_data_for_cfun (void)
       omp_simple_builtins = NULL;
     }
 
-  delete hsa_allocp_operand_address;
-  delete hsa_allocp_operand_immed;
-  delete hsa_allocp_operand_reg;
-  delete hsa_allocp_operand_code_list;
-  delete hsa_allocp_operand_operand_list;
-  delete hsa_allocp_inst_basic;
-  delete hsa_allocp_inst_phi;
-  delete hsa_allocp_inst_atomic;
-  delete hsa_allocp_inst_mem;
-  delete hsa_allocp_inst_signal;
-  delete hsa_allocp_inst_seg;
-  delete hsa_allocp_inst_cmp;
-  delete hsa_allocp_inst_br;
-  delete hsa_allocp_inst_sbr;
-  delete hsa_allocp_inst_call;
-  delete hsa_allocp_inst_arg_block;
-  delete hsa_allocp_inst_comment;
-  delete hsa_allocp_inst_queue;
-  delete hsa_allocp_inst_srctype;
-  delete hsa_allocp_inst_packed;
-  delete hsa_allocp_inst_cvt;
-  delete hsa_allocp_inst_alloca;
-  delete hsa_allocp_bb;
+  obstack_free (&hsa_obstack, NULL);
   delete hsa_cfun;
 }
 
@@ -873,6 +781,49 @@  hsa_needs_cvt (BrigType16_t dtype, BrigType16_t stype)
   return false;
 }
 
+/* Return declaration name if it exists or create one from UID if it does not.
+   If DECL is a local variable, make UID part of its name.  */
+
+const char *
+hsa_get_declaration_name (tree decl)
+{
+  if (!DECL_NAME (decl))
+    {
+      char buf[64];
+      snprintf (buf, 64, "__hsa_anon_%u", DECL_UID (decl));
+      size_t len = strlen (buf);
+      char *copy = (char *) obstack_alloc (&hsa_obstack, len + 1);
+      memcpy (copy, buf, len + 1);
+      return copy;
+    }
+
+  tree name_tree;
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      || (TREE_CODE (decl) == VAR_DECL && is_global_var (decl)))
+    name_tree = DECL_ASSEMBLER_NAME (decl);
+  else
+    name_tree = DECL_NAME (decl);
+
+  const char *name = IDENTIFIER_POINTER (name_tree);
+  /* User-defined assembly names have prepended asterisk symbol.  */
+  if (name[0] == '*')
+    name++;
+
+  if ((TREE_CODE (decl) == VAR_DECL)
+      && decl_function_context (decl))
+    {
+      size_t len = strlen (name);
+      char *buf = (char *) alloca (len + 32);
+      snprintf (buf, len + 32, "%s_%u", name, DECL_UID (decl));
+      len = strlen (buf);
+      char *copy = (char *) obstack_alloc (&hsa_obstack, len + 1);
+      memcpy (copy, buf, len + 1);
+      return copy;
+    }
+  else
+    return name;
+}
+
 /* Lookup or create the associated hsa_symbol structure with a given VAR_DECL
    or lookup the hsa_structure corresponding to a PARM_DECL.  */
 
@@ -884,11 +835,13 @@  get_symbol_for_decl (tree decl)
 
   gcc_assert (TREE_CODE (decl) == PARM_DECL
 	      || TREE_CODE (decl) == RESULT_DECL
-	      || VAR_P (decl));
+	      || TREE_CODE (decl) == VAR_DECL
+	      || TREE_CODE (decl) == CONST_DECL);
 
   dummy.m_decl = decl;
 
-  bool is_in_global_vars = VAR_P (decl) && is_global_var (decl);
+  bool is_in_global_vars = ((TREE_CODE (decl) == VAR_DECL)
+			    && !decl_function_context (decl));
 
   if (is_in_global_vars)
     slot = hsa_global_variable_symbols->find_slot (&dummy, INSERT);
@@ -925,11 +878,14 @@  get_symbol_for_decl (tree decl)
   else
     {
       hsa_symbol *sym;
-      gcc_assert (VAR_P (decl));
+      /* PARM_DECLs and RESULT_DECL should be already in m_local_symbols.  */
+      gcc_assert (TREE_CODE (decl) == VAR_DECL
+		  || TREE_CODE (decl) == CONST_DECL);
       BrigAlignment8_t align = hsa_object_alignment (decl);
 
       if (is_in_global_vars)
 	{
+	  gcc_checking_assert (TREE_CODE (decl) != CONST_DECL);
 	  sym = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_GLOBAL,
 				BRIG_LINKAGE_PROGRAM, true,
 				BRIG_ALLOCATION_PROGRAM, align);
@@ -951,12 +907,25 @@  get_symbol_for_decl (tree decl)
 	  if (AGGREGATE_TYPE_P (TREE_TYPE (decl)))
 	    align = MAX ((BrigAlignment8_t) BRIG_ALIGNMENT_8, align);
 
-	  /* PARM_DECL and RESULT_DECL should be already in m_local_symbols.  */
-	  gcc_assert (VAR_P (decl));
+	  BrigAllocation allocation = BRIG_ALLOCATION_AUTOMATIC;
+	  BrigSegment8_t segment;
+	  if (TREE_CODE (decl) == CONST_DECL)
+	    {
+	      segment = BRIG_SEGMENT_READONLY;
+	      allocation = BRIG_ALLOCATION_AGENT;
+	    }
+	  else if (lookup_attribute ("hsa_group_segment",
+				     DECL_ATTRIBUTES (decl)))
+	    segment = BRIG_SEGMENT_GROUP;
+	  else if (TREE_STATIC (decl)
+		   || lookup_attribute ("hsa_global_segment",
+					DECL_ATTRIBUTES (decl)))
+	    segment = BRIG_SEGMENT_GLOBAL;
+	  else
+	    segment = BRIG_SEGMENT_PRIVATE;
 
-	  sym = new hsa_symbol (BRIG_TYPE_NONE, BRIG_SEGMENT_PRIVATE,
-				BRIG_LINKAGE_FUNCTION);
-	  sym->m_align = align;
+	  sym = new hsa_symbol (BRIG_TYPE_NONE, segment, BRIG_LINKAGE_FUNCTION,
+				false, allocation, align);
 	  sym->fillup_for_decl (decl);
 	  hsa_cfun->m_private_variables.safe_push (sym);
 	}
@@ -978,7 +947,7 @@  hsa_get_host_function (tree decl)
   gcc_assert (s->m_kind != HSA_NONE);
   gcc_assert (s->m_gpu_implementation_p);
 
-  return s->m_binded_function->decl;
+  return s->m_bound_function ? s->m_bound_function->decl : NULL;
 }
 
 /* Return true if function DECL has a host equivalent function.  */
@@ -989,8 +958,10 @@  get_brig_function_name (tree decl)
   tree d = decl;
 
   hsa_function_summary *s = hsa_summaries->get (cgraph_node::get_create (d));
-  if (s->m_kind != HSA_NONE && s->m_gpu_implementation_p)
-    d = s->m_binded_function->decl;
+  if (s->m_kind != HSA_NONE
+      && s->m_gpu_implementation_p
+      && s->m_bound_function)
+    d = s->m_bound_function->decl;
 
   /* IPA split can create a function that has no host equivalent.  */
   if (d == NULL)
@@ -1066,6 +1037,14 @@  hsa_op_with_type::get_in_type (BrigType16_t dtype, hsa_bb *hbb)
       dest = new hsa_op_reg (dtype);
       hbb->append_insn (new hsa_insn_cvt (dest, this));
     }
+  else if (is_a <hsa_op_reg *> (this))
+    {
+      /* In the end, HSA registers do not really have types, only sizes, so if
+	 the sizes match, we can use the register directly.  */
+      gcc_checking_assert (hsa_type_bit_size (dtype)
+			   == hsa_type_bit_size (m_type));
+      return this;
+    }
   else
     {
       dest = new hsa_op_reg (m_type);
@@ -1128,12 +1107,12 @@  hsa_op_immed::hsa_op_immed ()
 {
 }
 
-/* New operator to allocate immediate operands from pool alloc.  */
+/* New operator to allocate immediate operands from obstack.  */
 
 void *
-hsa_op_immed::operator new (size_t)
+hsa_op_immed::operator new (size_t size)
 {
-  return hsa_allocp_operand_immed->allocate_raw ();
+  return obstack_alloc (&hsa_obstack, size);
 }
 
 /* Destructor.  */
@@ -1160,12 +1139,12 @@  hsa_op_reg::hsa_op_reg (BrigType16_t t)
 {
 }
 
-/* New operator to allocate a register from pool alloc.  */
+/* New operator to allocate a register from obstack.  */
 
 void *
-hsa_op_reg::operator new (size_t)
+hsa_op_reg::operator new (size_t size)
 {
-  return hsa_allocp_operand_reg->allocate_raw ();
+  return obstack_alloc (&hsa_obstack, size);
 }
 
 /* Verify register operand.  */
@@ -1244,12 +1223,12 @@  hsa_op_address::hsa_op_address (hsa_op_reg *r, HOST_WIDE_INT offset)
 {
 }
 
-/* New operator to allocate address operands from pool alloc.  */
+/* New operator to allocate address operands from obstack.  */
 
 void *
-hsa_op_address::operator new (size_t)
+hsa_op_address::operator new (size_t size)
 {
-  return hsa_allocp_operand_address->allocate_raw ();
+  return obstack_alloc (&hsa_obstack, size);
 }
 
 /* Constructor of an operand referring to HSAIL code.  */
@@ -1269,12 +1248,12 @@  hsa_op_code_list::hsa_op_code_list (unsigned elements)
   m_offsets.safe_grow_cleared (elements);
 }
 
-/* New operator to allocate code list operands from pool alloc.  */
+/* New operator to allocate code list operands from obstack.  */
 
 void *
-hsa_op_code_list::operator new (size_t)
+hsa_op_code_list::operator new (size_t size)
 {
-  return hsa_allocp_operand_code_list->allocate_raw ();
+  return obstack_alloc (&hsa_obstack, size);
 }
 
 /* Constructor of an operand representing an operand list.
@@ -1287,12 +1266,12 @@  hsa_op_operand_list::hsa_op_operand_list (unsigned elements)
   m_offsets.safe_grow (elements);
 }
 
-/* New operator to allocate operand list operands from pool alloc.  */
+/* New operator to allocate operand list operands from obstack.  */
 
 void *
-hsa_op_operand_list::operator new (size_t)
+hsa_op_operand_list::operator new (size_t size)
 {
-  return hsa_allocp_operand_operand_list->allocate_raw ();
+  return obstack_alloc (&hsa_obstack, size);
 }
 
 hsa_op_operand_list::~hsa_op_operand_list ()
@@ -1437,12 +1416,12 @@  hsa_insn_basic::hsa_insn_basic (unsigned nops, int opc, BrigType16_t t,
   hsa_instructions.safe_push (this);
 }
 
-/* New operator to allocate basic instruction from pool alloc.  */
+/* New operator to allocate basic instruction from obstack.  */
 
 void *
-hsa_insn_basic::operator new (size_t)
+hsa_insn_basic::operator new (size_t size)
 {
-  return hsa_allocp_inst_basic->allocate_raw ();
+  return obstack_alloc (&hsa_obstack, size);
 }
 
 /* Verify the instruction.  */
@@ -1495,32 +1474,27 @@  hsa_insn_phi::hsa_insn_phi (unsigned nops, hsa_op_reg *dst)
   dst->set_definition (this);
 }
 
-/* New operator to allocate PHI instruction from pool alloc.  */
+/* Constructor of class representing instructions for control flow and
+   sychronization,   */
 
-void *
-hsa_insn_phi::operator new (size_t)
+hsa_insn_br::hsa_insn_br (unsigned nops, int opc, BrigType16_t t,
+			  BrigWidth8_t width, hsa_op_base *arg0,
+			  hsa_op_base *arg1, hsa_op_base *arg2,
+			  hsa_op_base *arg3)
+  : hsa_insn_basic (nops, opc, t, arg0, arg1, arg2, arg3),
+    m_width (width)
 {
-  return hsa_allocp_inst_phi->allocate_raw ();
 }
 
 /* Constructor of class representing instruction for conditional jump, CTRL is
    the control register determining whether the jump will be carried out, the
    new instruction is automatically added to its uses list.  */
 
-hsa_insn_br::hsa_insn_br (hsa_op_reg *ctrl)
-  : hsa_insn_basic (1, BRIG_OPCODE_CBR, BRIG_TYPE_B1, ctrl),
-    m_width (BRIG_WIDTH_1)
+hsa_insn_cbr::hsa_insn_cbr (hsa_op_reg *ctrl)
+  : hsa_insn_br (1, BRIG_OPCODE_CBR, BRIG_TYPE_B1, BRIG_WIDTH_1, ctrl)
 {
 }
 
-/* New operator to allocate branch instruction from pool alloc.  */
-
-void *
-hsa_insn_br::operator new (size_t)
-{
-  return hsa_allocp_inst_br->allocate_raw ();
-}
-
 /* Constructor of class representing instruction for switch jump, CTRL is
    the index register.  */
 
@@ -1531,14 +1505,6 @@  hsa_insn_sbr::hsa_insn_sbr (hsa_op_reg *index, unsigned jump_count)
 {
 }
 
-/* New operator to allocate switch branch instruction from pool alloc.  */
-
-void *
-hsa_insn_sbr::operator new (size_t)
-{
-  return hsa_allocp_inst_sbr->allocate_raw ();
-}
-
 /* Replace all occurrences of OLD_BB with NEW_BB in the statements
    jump table.  */
 
@@ -1565,14 +1531,6 @@  hsa_insn_cmp::hsa_insn_cmp (BrigCompareOperation8_t cmp, BrigType16_t t,
 {
 }
 
-/* New operator to allocate compare instruction from pool alloc.  */
-
-void *
-hsa_insn_cmp::operator new (size_t)
-{
-  return hsa_allocp_inst_cmp->allocate_raw ();
-}
-
 /* Constructor of classes representing memory accesses.  OPC is the opcode (must
    be BRIG_OPCODE_ST or BRIG_OPCODE_LD) and T is the type.  The instruction
    operands are provided as ARG0 and ARG1.  */
@@ -1598,18 +1556,9 @@  hsa_insn_mem::hsa_insn_mem (unsigned nops, int opc, BrigType16_t t,
 {
 }
 
-/* New operator to allocate memory instruction from pool alloc.  */
-
-void *
-hsa_insn_mem::operator new (size_t)
-{
-  return hsa_allocp_inst_mem->allocate_raw ();
-}
-
-/* Constructor of class representing atomic instructions and signals.  OPC is
-   the principal opcode, aop is the specific atomic operation opcode.  T is the
-   type of the instruction.  The instruction operands
-   are provided as ARG[0-3].  */
+/* Constructor of class representing atomic instructions.  OPC is the principal
+   opcode, AOP is the specific atomic operation opcode.  T is the type of the
+   instruction.  The instruction operands are provided as ARG[0-3].  */
 
 hsa_insn_atomic::hsa_insn_atomic (int nops, int opc,
 				  enum BrigAtomicOperation aop,
@@ -1627,34 +1576,18 @@  hsa_insn_atomic::hsa_insn_atomic (int nops, int opc,
 		       opc == BRIG_OPCODE_SIGNALNORET);
 }
 
-/* New operator to allocate signal instruction from pool alloc.  */
-
-void *
-hsa_insn_atomic::operator new (size_t)
-{
-  return hsa_allocp_inst_atomic->allocate_raw ();
-}
-
 /* Constructor of class representing signal instructions.  OPC is the prinicpal
-   opcode, sop is the specific signal operation opcode.  T is the type of the
+   opcode, SOP is the specific signal operation opcode.  T is the type of the
    instruction.  The instruction operands are provided as ARG[0-3].  */
 
 hsa_insn_signal::hsa_insn_signal (int nops, int opc,
 				  enum BrigAtomicOperation sop,
-				  BrigType16_t t, hsa_op_base *arg0,
-				  hsa_op_base *arg1, hsa_op_base *arg2,
-				  hsa_op_base *arg3)
-  : hsa_insn_atomic (nops, opc, sop, t, BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE,
-		     arg0, arg1, arg2, arg3)
-{
-}
-
-/* New operator to allocate signal instruction from pool alloc.  */
-
-void *
-hsa_insn_signal::operator new (size_t)
+				  BrigType16_t t, BrigMemoryOrder memorder,
+				  hsa_op_base *arg0, hsa_op_base *arg1,
+				  hsa_op_base *arg2, hsa_op_base *arg3)
+  : hsa_insn_basic (nops, opc, t, arg0, arg1, arg2, arg3),
+    m_memory_order (memorder), m_signalop (sop)
 {
-  return hsa_allocp_inst_signal->allocate_raw ();
 }
 
 /* Constructor of class representing segment conversion instructions.  OPC is
@@ -1672,14 +1605,6 @@  hsa_insn_seg::hsa_insn_seg (int opc, BrigType16_t dest, BrigType16_t srct,
   gcc_checking_assert (opc == BRIG_OPCODE_STOF || opc == BRIG_OPCODE_FTOS);
 }
 
-/* New operator to allocate address conversion instruction from pool alloc.  */
-
-void *
-hsa_insn_seg::operator new (size_t)
-{
-  return hsa_allocp_inst_seg->allocate_raw ();
-}
-
 /* Constructor of class representing a call instruction.  CALLEE is the tree
    representation of the function being called.  */
 
@@ -1696,14 +1621,6 @@  hsa_insn_call::hsa_insn_call (hsa_internal_fn *fn)
 {
 }
 
-/* New operator to allocate call instruction from pool alloc.  */
-
-void *
-hsa_insn_call::operator new (size_t)
-{
-  return hsa_allocp_inst_call->allocate_raw ();
-}
-
 hsa_insn_call::~hsa_insn_call ()
 {
   for (unsigned i = 0; i < m_input_args.length (); i++)
@@ -1724,14 +1641,6 @@  hsa_insn_arg_block::hsa_insn_arg_block (BrigKind brig_kind,
 {
 }
 
-/* New operator to allocate argument block instruction from pool alloc.  */
-
-void *
-hsa_insn_arg_block::operator new (size_t)
-{
-  return hsa_allocp_inst_arg_block->allocate_raw ();
-}
-
 hsa_insn_comment::hsa_insn_comment (const char *s)
   : hsa_insn_basic (0, BRIG_KIND_DIRECTIVE_COMMENT)
 {
@@ -1743,14 +1652,6 @@  hsa_insn_comment::hsa_insn_comment (const char *s)
   m_comment = buf;
 }
 
-/* New operator to allocate comment instruction from pool alloc.  */
-
-void *
-hsa_insn_comment::operator new (size_t)
-{
-  return hsa_allocp_inst_comment->allocate_raw ();
-}
-
 hsa_insn_comment::~hsa_insn_comment ()
 {
   gcc_checking_assert (m_comment);
@@ -1759,17 +1660,14 @@  hsa_insn_comment::~hsa_insn_comment ()
 }
 
 /* Constructor of class representing the queue instruction in HSAIL.  */
-hsa_insn_queue::hsa_insn_queue (int nops, BrigOpcode opcode)
-  : hsa_insn_basic (nops, opcode, BRIG_TYPE_U64)
-{
-}
 
-/* New operator to allocate source type instruction from pool alloc.  */
-
-void *
-hsa_insn_srctype::operator new (size_t)
+hsa_insn_queue::hsa_insn_queue (int nops, int opcode, BrigSegment segment,
+				BrigMemoryOrder memory_order,
+				hsa_op_base *arg0, hsa_op_base *arg1,
+				hsa_op_base *arg2, hsa_op_base *arg3)
+  : hsa_insn_basic (nops, opcode, BRIG_TYPE_U64, arg0, arg1, arg2, arg3),
+    m_segment (segment), m_memory_order (memory_order)
 {
-  return hsa_allocp_inst_srctype->allocate_raw ();
 }
 
 /* Constructor of class representing the source type instruction in HSAIL.  */
@@ -1782,14 +1680,6 @@  hsa_insn_srctype::hsa_insn_srctype (int nops, BrigOpcode opcode,
     m_source_type (srct)
 {}
 
-/* New operator to allocate packed instruction from pool alloc.  */
-
-void *
-hsa_insn_packed::operator new (size_t)
-{
-  return hsa_allocp_inst_packed->allocate_raw ();
-}
-
 /* Constructor of class representing the packed instruction in HSAIL.  */
 
 hsa_insn_packed::hsa_insn_packed (int nops, BrigOpcode opcode,
@@ -1801,14 +1691,6 @@  hsa_insn_packed::hsa_insn_packed (int nops, BrigOpcode opcode,
   m_operand_list = new hsa_op_operand_list (nops - 1);
 }
 
-/* New operator to allocate convert instruction from pool alloc.  */
-
-void *
-hsa_insn_cvt::operator new (size_t)
-{
-  return hsa_allocp_inst_cvt->allocate_raw ();
-}
-
 /* Constructor of class representing the convert instruction in HSAIL.  */
 
 hsa_insn_cvt::hsa_insn_cvt (hsa_op_with_type *dest, hsa_op_with_type *src)
@@ -1816,14 +1698,6 @@  hsa_insn_cvt::hsa_insn_cvt (hsa_op_with_type *dest, hsa_op_with_type *src)
 {
 }
 
-/* New operator to allocate alloca from pool alloc.  */
-
-void *
-hsa_insn_alloca::operator new (size_t)
-{
-  return hsa_allocp_inst_alloca->allocate_raw ();
-}
-
 /* Constructor of class representing the alloca in HSAIL.  */
 
 hsa_insn_alloca::hsa_insn_alloca (hsa_op_with_type *dest,
@@ -1854,6 +1728,20 @@  hsa_bb::append_insn (hsa_insn_basic *insn)
     m_first_insn = insn;
 }
 
+void
+hsa_bb::append_phi (hsa_insn_phi *hphi)
+{
+  hphi->m_bb = m_bb;
+
+  hphi->m_prev = m_last_phi;
+  hphi->m_next = NULL;
+  if (m_last_phi)
+    m_last_phi->m_next = hphi;
+  m_last_phi = hphi;
+  if (!m_first_phi)
+    m_first_phi = hphi;
+}
+
 /* Insert HSA instruction NEW_INSN immediately before an existing instruction
    OLD_INSN.  */
 
@@ -2078,6 +1966,7 @@  gen_hsa_addr (tree ref, hsa_bb *hbb, HOST_WIDE_INT *output_bitsize = NULL,
     case PARM_DECL:
     case VAR_DECL:
     case RESULT_DECL:
+    case CONST_DECL:
       gcc_assert (!symbol);
       symbol = get_symbol_for_decl (ref);
       addrtype = hsa_get_segment_addr_type (symbol->m_segment);
@@ -2295,6 +2184,34 @@  gen_hsa_addr_insns (tree val, hsa_op_reg *dest, hsa_bb *hbb)
     val = TREE_OPERAND (val, 0);
   addr = gen_hsa_addr (val, hbb);
 
+  if (TREE_CODE (val) == CONST_DECL
+      && is_gimple_reg_type (TREE_TYPE (val)))
+    {
+      gcc_assert (addr->m_symbol
+		  && addr->m_symbol->m_segment == BRIG_SEGMENT_READONLY);
+      /* CONST_DECLs are in readonly segment which however does not have
+	 addresses convertible to flat segments.  So copy it to a private one
+	 and take address of that.  */
+      BrigType16_t csttype
+	= mem_type_for_type (hsa_type_for_scalar_tree_type (TREE_TYPE (val),
+							    false));
+      hsa_op_reg *r = new hsa_op_reg (csttype);
+      hbb->append_insn (new hsa_insn_mem (BRIG_OPCODE_LD, csttype, r,
+					  new hsa_op_address (addr->m_symbol)));
+      hsa_symbol *copysym = hsa_cfun->create_hsa_temporary (csttype);
+      hbb->append_insn (new hsa_insn_mem (BRIG_OPCODE_ST, csttype, r,
+					  new hsa_op_address (copysym)));
+      addr->m_symbol = copysym;
+    }
+  else if (addr->m_symbol && addr->m_symbol->m_segment == BRIG_SEGMENT_READONLY)
+    {
+      HSA_SORRY_ATV (EXPR_LOCATION (val), "support for HSA does "
+		     "not implement taking addresses of complex "
+		     "CONST_DECLs such as %E", val);
+      return;
+    }
+
+
   convert_addr_to_flat_segment (addr, dest, hbb);
 }
 
@@ -2324,8 +2241,10 @@  hsa_reg_or_immed_for_gimple_op (tree op, hsa_bb *hbb)
 void
 hsa_build_append_simple_mov (hsa_op_reg *dest, hsa_op_base *src, hsa_bb *hbb)
 {
-  hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV, dest->m_type,
-					     dest, src);
+  /* Moves of packed data between registers need to adhere to the same type
+     rules like when dealing with memory.  */
+  BrigType16_t tp = mem_type_for_type (dest->m_type);
+  hsa_insn_basic *insn = new hsa_insn_basic (2, BRIG_OPCODE_MOV, tp, dest, src);
   if (hsa_op_reg *sreg = dyn_cast <hsa_op_reg *> (src))
     gcc_assert (hsa_type_bit_size (dest->m_type)
 		== hsa_type_bit_size (sreg->m_type));
@@ -3054,8 +2973,12 @@  gen_hsa_unary_operation (BrigOpcode opcode, hsa_op_reg *dest,
   if (opcode == BRIG_OPCODE_MOV && hsa_needs_cvt (dest->m_type, op1->m_type))
     insn = new hsa_insn_cvt (dest, op1);
   else if (opcode == BRIG_OPCODE_FIRSTBIT || opcode == BRIG_OPCODE_LASTBIT)
-    insn = new hsa_insn_srctype (2, opcode, BRIG_TYPE_U32, op1->m_type, NULL,
-				 op1);
+    {
+      BrigType16_t srctype = hsa_type_integer_p (op1->m_type) ? op1->m_type
+	: hsa_unsigned_type_for_type (op1->m_type);
+      insn = new hsa_insn_srctype (2, opcode, BRIG_TYPE_U32, srctype, NULL,
+				   op1);
+    }
   else
     {
       insn = new hsa_insn_basic (2, opcode, dest->m_type, dest, op1);
@@ -3169,6 +3092,23 @@  gen_hsa_insns_for_operation_assignment (gimple *assign, hsa_bb *hbb)
     case NEGATE_EXPR:
       opcode = BRIG_OPCODE_NEG;
       break;
+    case FMA_EXPR:
+      /* There is a native HSA instruction for scalar FMAs but not for vector
+	 ones.  */
+      if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE)
+	{
+	  hsa_op_reg *dest
+	    = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign));
+	  hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb);
+	  hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb);
+	  hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
+	  hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
+	  gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb);
+	  gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb);
+	  return;
+	}
+      opcode = BRIG_OPCODE_MAD;
+      break;
     case MIN_EXPR:
       opcode = BRIG_OPCODE_MIN;
       break;
@@ -3368,14 +3308,18 @@  gen_hsa_insns_for_operation_assignment (gimple *assign, hsa_bb *hbb)
   switch (rhs_class)
     {
     case GIMPLE_TERNARY_RHS:
-      gcc_unreachable ();
+      {
+	hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb);
+	hsa_insn_basic *insn = new hsa_insn_basic (4, opcode, dest->m_type, dest,
+						   op1, op2, op3);
+	hbb->append_insn (insn);
+      }
       return;
 
-      /* Fall through */
     case GIMPLE_BINARY_RHS:
       gen_hsa_binary_operation (opcode, dest, op1, op2, hbb);
       break;
-      /* Fall through */
+
     case GIMPLE_UNARY_RHS:
       gen_hsa_unary_operation (opcode, dest, op1, hbb);
       break;
@@ -3392,14 +3336,14 @@  static void
 gen_hsa_insns_for_cond_stmt (gimple *cond, hsa_bb *hbb)
 {
   hsa_op_reg *ctrl = new hsa_op_reg (BRIG_TYPE_B1);
-  hsa_insn_br *cbr;
+  hsa_insn_cbr *cbr;
 
   gen_hsa_cmp_insn_from_gimple (gimple_cond_code (cond),
 				gimple_cond_lhs (cond),
 				gimple_cond_rhs (cond),
 				ctrl, hbb);
 
-  cbr = new hsa_insn_br (ctrl);
+  cbr = new hsa_insn_cbr (ctrl);
   hbb->append_insn (cbr);
 }
 
@@ -3476,7 +3420,7 @@  gen_hsa_insns_for_switch_stmt (gswitch *s, hsa_bb *hbb)
   hbb->append_insn (new hsa_insn_basic (3, BRIG_OPCODE_AND, cmp_reg->m_type,
 					cmp_reg, cmp1_reg, cmp2_reg));
 
-  hbb->append_insn (new hsa_insn_br (cmp_reg));
+  hbb->append_insn (new hsa_insn_cbr (cmp_reg));
 
   tree default_label = gimple_switch_default_label (s);
   basic_block default_label_bb = label_to_block_fn (func,
@@ -3537,13 +3481,14 @@  gen_hsa_insns_for_switch_stmt (gswitch *s, hsa_bb *hbb)
 static void
 verify_function_arguments (tree decl)
 {
+  tree type = TREE_TYPE (decl);
   if (DECL_STATIC_CHAIN (decl))
     {
       HSA_SORRY_ATV (EXPR_LOCATION (decl),
 		     "HSA does not support nested functions: %D", decl);
       return;
     }
-  else if (!TYPE_ARG_TYPES (TREE_TYPE (decl)))
+  else if (!TYPE_ARG_TYPES (type) || stdarg_p (type))
     {
       HSA_SORRY_ATV (EXPR_LOCATION (decl),
 		     "HSA does not support functions with variadic arguments "
@@ -3839,33 +3784,58 @@  hsa_insn_basic::set_output_in_type (hsa_op_reg *dest, unsigned op_index,
    HBB.  */
 
 static void
-query_hsa_grid (hsa_op_reg *dest, BrigType16_t opcode, int dimension,
-		hsa_bb *hbb)
+query_hsa_grid_dim (hsa_op_reg *dest, int opcode, hsa_op_immed *dimension,
+		    hsa_bb *hbb)
 {
-  /* We're using just one-dimensional kernels, so hard-coded
-     dimension X.  */
-  hsa_op_immed *imm
-    = new hsa_op_immed (dimension, (BrigKind16_t) BRIG_TYPE_U32);
   hsa_insn_basic *insn = new hsa_insn_basic (2, opcode, BRIG_TYPE_U32, NULL,
-					     imm);
+					     dimension);
   hbb->append_insn (insn);
   insn->set_output_in_type (dest, 0, hbb);
 }
 
-/* Generate a special HSA-related instruction for gimple STMT.
-   Instructions are appended to basic block HBB.  */
+/* Generate instruction OPCODE to query a property of HSA grid along the given
+   dimension which is an immediate in first argument of STMT.  Store result
+   into the register corresponding to LHS of STMT and append the instruction to
+   HBB.  */
 
 static void
-query_hsa_grid (gimple *stmt, BrigOpcode16_t opcode, int dimension,
-		hsa_bb *hbb)
+query_hsa_grid_dim (gimple *stmt, int opcode, hsa_bb *hbb)
 {
   tree lhs = gimple_call_lhs (dyn_cast <gcall *> (stmt));
   if (lhs == NULL_TREE)
     return;
 
+  tree arg = gimple_call_arg (stmt, 0);
+  unsigned HOST_WIDE_INT dim = 5;
+  if (tree_fits_uhwi_p (arg))
+    dim = tree_to_uhwi (arg);
+  if (dim > 2)
+    {
+      HSA_SORRY_AT (gimple_location (stmt),
+		    "HSA grid query dimension must be immediate constant 0, 1 "
+		    "or 2");
+      return;
+    }
+
+  hsa_op_immed *hdim = new hsa_op_immed (dim, (BrigKind16_t) BRIG_TYPE_U32);
   hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  query_hsa_grid_dim (dest, opcode, hdim, hbb);
+}
+
+/* Generate instruction OPCODE to query a property of HSA grid that is
+   independent of any dimension.  Store result into the register corresponding
+   to LHS of STMT and append the instruction to HBB.  */
 
-  query_hsa_grid (dest, opcode, dimension, hbb);
+static void
+query_hsa_grid_nodim (gimple *stmt, BrigOpcode16_t opcode, hsa_bb *hbb)
+{
+  tree lhs = gimple_call_lhs (dyn_cast <gcall *> (stmt));
+  if (lhs == NULL_TREE)
+    return;
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  BrigType16_t brig_type = hsa_unsigned_type_for_type (dest->m_type);
+  hsa_insn_basic *insn = new hsa_insn_basic (1, opcode, brig_type, dest);
+  hbb->append_insn (insn);
 }
 
 /* Emit instructions that set hsa_num_threads according to provided VALUE.
@@ -4012,6 +3982,44 @@  gen_num_threads_for_dispatch (hsa_bb *hbb)
   return as_a <hsa_op_reg *> (dest);
 }
 
+/* Build OPCODE query for all three hsa dimensions, multiply them and store the
+   result into DEST.  */
+
+static void
+multiply_grid_dim_characteristics (hsa_op_reg *dest, int opcode, hsa_bb *hbb)
+{
+  hsa_op_reg *dimx = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (dimx, opcode,
+		      new hsa_op_immed (0, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+  hsa_op_reg *dimy = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (dimy, opcode,
+		      new hsa_op_immed (1, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+  hsa_op_reg *dimz = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (dimz, opcode,
+		      new hsa_op_immed (2, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+  hsa_op_reg *tmp = new hsa_op_reg (dest->m_type);
+  gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp,
+			    dimx->get_in_type (dest->m_type, hbb),
+			    dimy->get_in_type (dest->m_type, hbb), hbb);
+  gen_hsa_binary_operation (BRIG_OPCODE_MUL, dest, tmp,
+			    dimz->get_in_type (dest->m_type, hbb), hbb);
+}
+
+/* Emit instructions that assign number of threads to lhs of gimple STMT.
+   Instructions are appended to basic block HBB.  */
+
+static void
+gen_get_num_threads (gimple *stmt, hsa_bb *hbb)
+{
+  if (gimple_call_lhs (stmt) == NULL_TREE)
+    return;
+
+  hbb->append_insn (new hsa_insn_comment ("omp_get_num_threads"));
+  tree lhs = gimple_call_lhs (stmt);
+  hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
+  multiply_grid_dim_characteristics (dest, BRIG_OPCODE_CURRENTWORKGROUPSIZE,
+				     hbb);
+}
 
 /* Emit instructions that assign number of teams to lhs of gimple STMT.
    Instructions are appended to basic block HBB.  */
@@ -4023,15 +4031,9 @@  gen_get_num_teams (gimple *stmt, hsa_bb *hbb)
     return;
 
   hbb->append_insn (new hsa_insn_comment ("omp_get_num_teams"));
-
   tree lhs = gimple_call_lhs (stmt);
   hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
-  hsa_op_immed *one = new hsa_op_immed (1, dest->m_type);
-
-  hsa_insn_basic *basic
-    = new hsa_insn_basic (2, BRIG_OPCODE_MOV, dest->m_type, dest, one);
-
-  hbb->append_insn (basic);
+  multiply_grid_dim_characteristics (dest, BRIG_OPCODE_GRIDGROUPS, hbb);
 }
 
 /* Emit instructions that assign a team number to lhs of gimple STMT.
@@ -4044,15 +4046,42 @@  gen_get_team_num (gimple *stmt, hsa_bb *hbb)
     return;
 
   hbb->append_insn (new hsa_insn_comment ("omp_get_team_num"));
-
   tree lhs = gimple_call_lhs (stmt);
   hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs);
-  hsa_op_immed *zero = new hsa_op_immed (0, dest->m_type);
 
-  hsa_insn_basic *basic
-    = new hsa_insn_basic (2, BRIG_OPCODE_MOV, dest->m_type, dest, zero);
-
-  hbb->append_insn (basic);
+  hsa_op_reg *gnum_x = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (gnum_x, BRIG_OPCODE_GRIDGROUPS,
+		      new hsa_op_immed (0, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+  hsa_op_reg *gnum_y = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (gnum_y, BRIG_OPCODE_GRIDGROUPS,
+		      new hsa_op_immed (1, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+
+  hsa_op_reg *gno_z = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (gno_z, BRIG_OPCODE_WORKGROUPID,
+		      new hsa_op_immed (2, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+
+  hsa_op_reg *tmp1 = new hsa_op_reg (dest->m_type);
+  gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp1,
+			    gnum_x->get_in_type (dest->m_type, hbb),
+			    gnum_y->get_in_type (dest->m_type, hbb), hbb);
+  hsa_op_reg *tmp2 = new hsa_op_reg (dest->m_type);
+  gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp2, tmp1,
+			    gno_z->get_in_type (dest->m_type, hbb), hbb);
+
+  hsa_op_reg *gno_y = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (gno_y, BRIG_OPCODE_WORKGROUPID,
+		      new hsa_op_immed (1, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+  hsa_op_reg *tmp3 = new hsa_op_reg (dest->m_type);
+  gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp3,
+			    gnum_x->get_in_type (dest->m_type, hbb),
+			    gno_y->get_in_type (dest->m_type, hbb), hbb);
+  hsa_op_reg *tmp4 = new hsa_op_reg (dest->m_type);
+  gen_hsa_binary_operation (BRIG_OPCODE_ADD, tmp4, tmp3, tmp2, hbb);
+  hsa_op_reg *gno_x = new hsa_op_reg (BRIG_TYPE_U32);
+  query_hsa_grid_dim (gno_x, BRIG_OPCODE_WORKGROUPID,
+		      new hsa_op_immed (0, (BrigKind16_t) BRIG_TYPE_U32), hbb);
+  gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp4,
+			    gno_x->get_in_type (dest->m_type, hbb), hbb);
 }
 
 /* Emit instructions that get levels-var ICV to lhs of gimple STMT.
@@ -4263,12 +4292,13 @@  gen_hsa_popcount_to_dest (hsa_op_reg *dest, hsa_op_with_type *arg, hsa_bb *hbb)
   if (hsa_type_bit_size (arg->m_type) < 32)
     arg = arg->get_in_type (BRIG_TYPE_B32, hbb);
 
+  BrigType16_t srctype = hsa_bittype_for_type (arg->m_type);
   if (!hsa_btype_p (arg->m_type))
-    arg = arg->get_in_type (hsa_bittype_for_type (arg->m_type), hbb);
+    arg = arg->get_in_type (srctype, hbb);
 
   hsa_insn_srctype *popcount
     = new hsa_insn_srctype (2, BRIG_OPCODE_POPCOUNT, BRIG_TYPE_U32,
-			    arg->m_type, NULL, arg);
+			    srctype, NULL, arg);
   hbb->append_insn (popcount);
   popcount->set_output_in_type (dest, 0, hbb);
 }
@@ -4339,11 +4369,11 @@  omp_simple_builtin::generate (gimple *stmt, hsa_bb *hbb)
   if (m_sorry)
     {
       if (m_warning_message)
-	HSA_SORRY_AT (gimple_location (stmt), m_warning_message)
+	HSA_SORRY_AT (gimple_location (stmt), m_warning_message);
       else
 	HSA_SORRY_ATV (gimple_location (stmt),
 		       "Support for HSA does not implement calls to %s\n",
-		       m_name)
+		       m_name);
     }
   else if (m_warning_message != NULL)
     warning_at (gimple_location (stmt), OPT_Whsa, m_warning_message);
@@ -4398,12 +4428,12 @@  gen_hsa_insns_for_known_library_call (gimple *stmt, hsa_bb *hbb)
       else if (strcmp (name, "omp_get_thread_num") == 0)
 	{
 	  hbb->append_insn (new hsa_insn_comment (name));
-	  query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, 0, hbb);
+	  query_hsa_grid_nodim (stmt, BRIG_OPCODE_WORKITEMFLATABSID, hbb);
 	}
       else if (strcmp (name, "omp_get_num_threads") == 0)
 	{
 	  hbb->append_insn (new hsa_insn_comment (name));
-	  query_hsa_grid (stmt, BRIG_OPCODE_GRIDSIZE, 0, hbb);
+	  gen_get_num_threads (stmt, hbb);
 	}
       else if (strcmp (name, "omp_get_num_teams") == 0)
 	gen_get_num_teams (stmt, hbb);
@@ -4589,7 +4619,7 @@  expand_string_operation_builtin (gimple *stmt, hsa_bb *hbb,
 {
   edge e = split_block (hbb->m_bb, stmt);
   basic_block condition_bb = e->src;
-  hbb->append_insn (new hsa_insn_br (misaligned_flag));
+  hbb->append_insn (new hsa_insn_cbr (misaligned_flag));
 
   /* Prepare the control flow.  */
   edge condition_edge = EDGE_SUCC (condition_bb, 0);
@@ -4718,95 +4748,86 @@  expand_memory_set (gimple *stmt, unsigned HOST_WIDE_INT n,
   expand_lhs_of_string_op (stmt, n, merge_bb, builtin);
 }
 
-/* Return string for MEMMODEL.  */
+/* Store into MEMORDER the memory order specified by tree T, which must be an
+   integer constant representing a C++ memory order.  If it isn't, issue an HSA
+   sorry message using LOC and return true, otherwise return false and store
+   the name of the requested order to *MNAME.  */
 
-static const char *
-get_memory_order_name (unsigned memmodel)
+static bool
+hsa_memorder_from_tree (tree t, BrigMemoryOrder *memorder, const char **mname,
+			location_t loc)
 {
-  switch (memmodel & MEMMODEL_BASE_MASK)
+  if (!tree_fits_uhwi_p (t))
     {
-    case MEMMODEL_RELAXED:
-      return "relaxed";
-    case MEMMODEL_CONSUME:
-      return "consume";
-    case MEMMODEL_ACQUIRE:
-      return "acquire";
-    case MEMMODEL_RELEASE:
-      return "release";
-    case MEMMODEL_ACQ_REL:
-      return "acq_rel";
-    case MEMMODEL_SEQ_CST:
-      return "seq_cst";
-    default:
-      return NULL;
+      HSA_SORRY_ATV (loc, "support for HSA does not implement memory model %E",
+		     t);
+      return true;
     }
-}
-
-/* Return memory order according to predefined __atomic memory model
-   constants.  LOCATION is provided to locate the problematic statement.  */
 
-static BrigMemoryOrder
-get_memory_order (unsigned memmodel, location_t location)
-{
-  switch (memmodel & MEMMODEL_BASE_MASK)
+  unsigned HOST_WIDE_INT mm = tree_to_uhwi (t);
+  switch (mm & MEMMODEL_BASE_MASK)
     {
     case MEMMODEL_RELAXED:
-      return BRIG_MEMORY_ORDER_RELAXED;
+      *memorder = BRIG_MEMORY_ORDER_RELAXED;
+      *mname = "relaxed";
+      break;
     case MEMMODEL_CONSUME:
       /* HSA does not have an equivalent, but we can use the slightly stronger
 	 ACQUIRE.  */
+      *memorder = BRIG_MEMORY_ORDER_SC_ACQUIRE;
+      *mname = "consume";
+      break;
     case MEMMODEL_ACQUIRE:
-      return BRIG_MEMORY_ORDER_SC_ACQUIRE;
+      *memorder = BRIG_MEMORY_ORDER_SC_ACQUIRE;
+      *mname = "acquire";
+      break;
     case MEMMODEL_RELEASE:
-      return BRIG_MEMORY_ORDER_SC_RELEASE;
+      *memorder = BRIG_MEMORY_ORDER_SC_RELEASE;
+      *mname = "release";
+      break;
     case MEMMODEL_ACQ_REL:
+      *memorder = BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE;
+      *mname = "acq_rel";
+      break;
     case MEMMODEL_SEQ_CST:
       /* Callers implementing a simple load or store need to remove the release
 	 or acquire part respectively.  */
-      return BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE;
+      *memorder = BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE;
+      *mname = "seq_cst";
+      break;
     default:
       {
-	const char *mmname = get_memory_order_name (memmodel);
-	HSA_SORRY_ATV (location,
-		       "support for HSA does not implement the specified "
-		       " memory model%s %s",
-		       mmname ? ": " : "", mmname ? mmname : "");
-	return BRIG_MEMORY_ORDER_NONE;
+	HSA_SORRY_AT (loc, "support for HSA does not implement the specified "
+		      "memory model");
+	return true;
       }
     }
+  return false;
 }
 
-/* Helper function to create an HSA atomic binary operation instruction out of
-   calls to atomic builtins.  RET_ORIG is true if the built-in is the variant
-   that return s the value before applying operation, and false if it should
-   return the value after applying the operation (if it returns value at all).
-   ACODE is the atomic operation code, STMT is a gimple call to a builtin.  HBB
-   is the HSA BB to which the instruction should be added.  */
+/* Helper function to create an HSA atomic operation instruction out of calls
+   to atomic builtins.  RET_ORIG is true if the built-in is the variant that
+   return s the value before applying operation, and false if it should return
+   the value after applying the operation (if it returns value at all).  ACODE
+   is the atomic operation code, STMT is a gimple call to a builtin.  HBB is
+   the HSA BB to which the instruction should be added.  If SIGNAL is true, the
+   created operation will work on HSA signals rather than atomic variables.  */
 
 static void
-gen_hsa_ternary_atomic_for_builtin (bool ret_orig,
- 				    enum BrigAtomicOperation acode,
-				    gimple *stmt,
-				    hsa_bb *hbb)
+gen_hsa_atomic_for_builtin (bool ret_orig, enum BrigAtomicOperation acode,
+			    gimple *stmt, hsa_bb *hbb, bool signal)
 {
   tree lhs = gimple_call_lhs (stmt);
 
   tree type = TREE_TYPE (gimple_call_arg (stmt, 1));
   BrigType16_t hsa_type = hsa_type_for_scalar_tree_type (type, false);
   BrigType16_t mtype = mem_type_for_type (hsa_type);
-  tree model = gimple_call_arg (stmt, 2);
+  BrigMemoryOrder memorder;
+  const char *mmname;
 
-  if (!tree_fits_uhwi_p (model))
-    {
-      HSA_SORRY_ATV (gimple_location (stmt),
-		     "support for HSA does not implement memory model %E",
-		     model);
-      return;
-    }
-
-  unsigned HOST_WIDE_INT mmodel = tree_to_uhwi (model);
-
-  BrigMemoryOrder memorder = get_memory_order (mmodel, gimple_location (stmt));
+  if (hsa_memorder_from_tree (gimple_call_arg (stmt, 2), &memorder, &mmname,
+			      gimple_location (stmt)))
+    return;
 
   /* Certain atomic insns must have Bx memory types.  */
   switch (acode)
@@ -4831,13 +4852,13 @@  gen_hsa_ternary_atomic_for_builtin (bool ret_orig,
 	dest = hsa_cfun->reg_for_gimple_ssa (lhs);
       else
 	dest = new hsa_op_reg (hsa_type);
-      opcode = BRIG_OPCODE_ATOMIC;
+      opcode = signal ? BRIG_OPCODE_SIGNAL : BRIG_OPCODE_ATOMIC;
       nops = 3;
     }
   else
     {
       dest = NULL;
-      opcode = BRIG_OPCODE_ATOMICNORET;
+      opcode = signal ? BRIG_OPCODE_SIGNALNORET : BRIG_OPCODE_ATOMICNORET;
       nops = 2;
     }
 
@@ -4852,35 +4873,44 @@  gen_hsa_ternary_atomic_for_builtin (bool ret_orig,
 	{
 	  HSA_SORRY_ATV (gimple_location (stmt),
 			 "support for HSA does not implement memory model for "
-			 "ATOMIC_ST: %s", get_memory_order_name (mmodel));
+			 "ATOMIC_ST: %s", mmname);
 	  return;
 	}
     }
 
-  hsa_insn_atomic *atominsn = new hsa_insn_atomic (nops, opcode, acode, mtype,
-						   memorder);
-
-  hsa_op_address *addr;
-  addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
-  if (addr->m_symbol && addr->m_symbol->m_segment == BRIG_SEGMENT_PRIVATE)
+  hsa_insn_basic *atominsn;
+  hsa_op_base *tgt;
+  if (signal)
     {
-      HSA_SORRY_AT (gimple_location (stmt),
-		    "HSA does not implement atomic operations in private "
-		    "segment");
-      return;
+      atominsn = new hsa_insn_signal (nops, opcode, acode, mtype, memorder);
+      tgt = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 0), hbb);
     }
+  else
+    {
+      atominsn = new hsa_insn_atomic (nops, opcode, acode, mtype, memorder);
+      hsa_op_address *addr;
+      addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
+      if (addr->m_symbol && addr->m_symbol->m_segment == BRIG_SEGMENT_PRIVATE)
+	{
+	  HSA_SORRY_AT (gimple_location (stmt),
+			"HSA does not implement atomic operations in private "
+			"segment");
+	  return;
+	}
+      tgt = addr;
+    }
+
   hsa_op_base *op = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 1),
 						    hbb);
-
   if (lhs)
     {
       atominsn->set_op (0, dest);
-      atominsn->set_op (1, addr);
+      atominsn->set_op (1, tgt);
       atominsn->set_op (2, op);
     }
   else
     {
-      atominsn->set_op (0, addr);
+      atominsn->set_op (0, tgt);
       atominsn->set_op (1, op);
     }
 
@@ -4950,6 +4980,10 @@  gen_hsa_insn_for_internal_fn_call (gcall *stmt, hsa_bb *hbb)
       gen_hsa_unaryop_for_builtin (BRIG_OPCODE_SQRT, stmt, hbb);
       break;
 
+    case IFN_RSQRT:
+      gen_hsa_unaryop_for_builtin (BRIG_OPCODE_NRSQRT, stmt, hbb);
+      break;
+
     case IFN_TRUNC:
       gen_hsa_unaryop_for_builtin (BRIG_OPCODE_TRUNC, stmt, hbb);
       break;
@@ -5068,6 +5102,12 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
   if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
     {
       tree function_decl = gimple_call_fndecl (stmt);
+      /* Prefetch pass can create type-mismatching prefetch builtin calls which
+	 fail the gimple_call_builtin_p test above.  Handle them here.  */
+      if (DECL_BUILT_IN_CLASS (function_decl)
+	  && DECL_FUNCTION_CODE (function_decl) == BUILT_IN_PREFETCH)
+	return;
+
       if (function_decl == NULL_TREE)
 	{
 	  HSA_SORRY_AT (gimple_location (stmt),
@@ -5185,21 +5225,14 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_LOAD_16:
       {
 	BrigType16_t mtype;
-	hsa_op_address *addr;
-	addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
-	tree model = gimple_call_arg (stmt, 1);
-	if (!tree_fits_uhwi_p (model))
-	  {
-	    HSA_SORRY_ATV (gimple_location (stmt),
-			   "support for HSA does not implement "
-			   "memory model: %E",
-			   model);
-	    return;
-	  }
+	hsa_op_base *src;
+	src = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
 
-	unsigned HOST_WIDE_INT mmodel = tree_to_uhwi (model);
-	BrigMemoryOrder memorder = get_memory_order (mmodel,
-						     gimple_location (stmt));
+	BrigMemoryOrder memorder;
+	const char *mmname;
+	if (hsa_memorder_from_tree (gimple_call_arg (stmt, 1), &memorder,
+				    &mmname, gimple_location (stmt)))
+	  return;
 
 	if (memorder == BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE)
 	  memorder = BRIG_MEMORY_ORDER_SC_ACQUIRE;
@@ -5210,8 +5243,7 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	  {
 	    HSA_SORRY_ATV (gimple_location (stmt),
 			   "support for HSA does not implement "
-			   "memory model for ATOMIC_LD: %s",
-			   get_memory_order_name (mmodel));
+			   "memory model for atomic loads: %s", mmname);
 	    return;
 	  }
 
@@ -5229,9 +5261,9 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	    dest = new hsa_op_reg (mtype);
 	  }
 
-	hsa_insn_atomic *atominsn
-	  = new hsa_insn_atomic (2, BRIG_OPCODE_ATOMIC, BRIG_ATOMIC_LD, mtype,
-				 memorder, dest, addr);
+	hsa_insn_basic *atominsn;
+	atominsn = new hsa_insn_atomic (2, BRIG_OPCODE_ATOMIC, BRIG_ATOMIC_LD,
+					mtype, memorder, dest, src);
 
 	hbb->append_insn (atominsn);
 	break;
@@ -5242,7 +5274,8 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_EXCHANGE_4:
     case BUILT_IN_ATOMIC_EXCHANGE_8:
     case BUILT_IN_ATOMIC_EXCHANGE_16:
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_EXCH, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_EXCH, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_FETCH_ADD_1:
@@ -5250,7 +5283,8 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_FETCH_ADD_4:
     case BUILT_IN_ATOMIC_FETCH_ADD_8:
     case BUILT_IN_ATOMIC_FETCH_ADD_16:
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_ADD, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_ADD, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_FETCH_SUB_1:
@@ -5258,7 +5292,8 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_FETCH_SUB_4:
     case BUILT_IN_ATOMIC_FETCH_SUB_8:
     case BUILT_IN_ATOMIC_FETCH_SUB_16:
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_SUB, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_SUB, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_FETCH_AND_1:
@@ -5266,7 +5301,8 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_FETCH_AND_4:
     case BUILT_IN_ATOMIC_FETCH_AND_8:
     case BUILT_IN_ATOMIC_FETCH_AND_16:
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_AND, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_AND, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_FETCH_XOR_1:
@@ -5274,7 +5310,8 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_FETCH_XOR_4:
     case BUILT_IN_ATOMIC_FETCH_XOR_8:
     case BUILT_IN_ATOMIC_FETCH_XOR_16:
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_XOR, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_XOR, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_FETCH_OR_1:
@@ -5282,7 +5319,8 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_FETCH_OR_4:
     case BUILT_IN_ATOMIC_FETCH_OR_8:
     case BUILT_IN_ATOMIC_FETCH_OR_16:
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_OR, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_OR, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_STORE_1:
@@ -5291,7 +5329,8 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_STORE_8:
     case BUILT_IN_ATOMIC_STORE_16:
       /* Since there cannot be any LHS, the first parameter is meaningless.  */
-      gen_hsa_ternary_atomic_for_builtin (true, BRIG_ATOMIC_ST, stmt, hbb);
+      gen_hsa_atomic_for_builtin (true, BRIG_ATOMIC_ST, stmt, hbb, false);
+      break;
       break;
 
     case BUILT_IN_ATOMIC_ADD_FETCH_1:
@@ -5299,7 +5338,7 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_ADD_FETCH_4:
     case BUILT_IN_ATOMIC_ADD_FETCH_8:
     case BUILT_IN_ATOMIC_ADD_FETCH_16:
-      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_ADD, stmt, hbb);
+      gen_hsa_atomic_for_builtin (false, BRIG_ATOMIC_ADD, stmt, hbb, false);
       break;
 
     case BUILT_IN_ATOMIC_SUB_FETCH_1:
@@ -5307,7 +5346,7 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_SUB_FETCH_4:
     case BUILT_IN_ATOMIC_SUB_FETCH_8:
     case BUILT_IN_ATOMIC_SUB_FETCH_16:
-      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_SUB, stmt, hbb);
+      gen_hsa_atomic_for_builtin (false, BRIG_ATOMIC_SUB, stmt, hbb, false);
       break;
 
     case BUILT_IN_ATOMIC_AND_FETCH_1:
@@ -5315,7 +5354,7 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_AND_FETCH_4:
     case BUILT_IN_ATOMIC_AND_FETCH_8:
     case BUILT_IN_ATOMIC_AND_FETCH_16:
-      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_AND, stmt, hbb);
+      gen_hsa_atomic_for_builtin (false, BRIG_ATOMIC_AND, stmt, hbb, false);
       break;
 
     case BUILT_IN_ATOMIC_XOR_FETCH_1:
@@ -5323,7 +5362,7 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_XOR_FETCH_4:
     case BUILT_IN_ATOMIC_XOR_FETCH_8:
     case BUILT_IN_ATOMIC_XOR_FETCH_16:
-      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_XOR, stmt, hbb);
+      gen_hsa_atomic_for_builtin (false, BRIG_ATOMIC_XOR, stmt, hbb, false);
       break;
 
     case BUILT_IN_ATOMIC_OR_FETCH_1:
@@ -5331,7 +5370,7 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_ATOMIC_OR_FETCH_4:
     case BUILT_IN_ATOMIC_OR_FETCH_8:
     case BUILT_IN_ATOMIC_OR_FETCH_16:
-      gen_hsa_ternary_atomic_for_builtin (false, BRIG_ATOMIC_OR, stmt, hbb);
+      gen_hsa_atomic_for_builtin (false, BRIG_ATOMIC_OR, stmt, hbb, false);
       break;
 
     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_1:
@@ -5340,27 +5379,23 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_8:
     case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_16:
       {
-	/* TODO: Use the appropriate memory model for now.  */
 	tree type = TREE_TYPE (gimple_call_arg (stmt, 1));
-
 	BrigType16_t atype
 	  = hsa_bittype_for_type (hsa_type_for_scalar_tree_type (type, false));
-
-	hsa_insn_atomic *atominsn
-	  = new hsa_insn_atomic (4, BRIG_OPCODE_ATOMIC, BRIG_ATOMIC_CAS, atype,
-				 BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE);
-	hsa_op_address *addr;
-	addr = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
+	BrigMemoryOrder memorder = BRIG_MEMORY_ORDER_SC_ACQUIRE_RELEASE;
+	hsa_insn_basic *atominsn;
+	hsa_op_base *tgt;
+	atominsn = new hsa_insn_atomic (4, BRIG_OPCODE_ATOMIC,
+					BRIG_ATOMIC_CAS, atype, memorder);
+	tgt = get_address_from_value (gimple_call_arg (stmt, 0), hbb);
 
 	if (lhs != NULL)
 	  dest = hsa_cfun->reg_for_gimple_ssa (lhs);
 	else
 	  dest = new hsa_op_reg (atype);
 
-	/* Should check what the memory scope is.  */
-	atominsn->m_memoryscope = BRIG_MEMORY_SCOPE_WORKGROUP;
 	atominsn->set_op (0, dest);
-	atominsn->set_op (1, addr);
+	atominsn->set_op (1, tgt);
 
 	hsa_op_with_type *op
 	  = hsa_reg_or_immed_for_gimple_op (gimple_call_arg (stmt, 1), hbb);
@@ -5371,20 +5406,42 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	hbb->append_insn (atominsn);
 	break;
       }
+
+    case BUILT_IN_HSA_WORKGROUPID:
+      query_hsa_grid_dim (stmt, BRIG_OPCODE_WORKGROUPID, hbb);
+      break;
+    case BUILT_IN_HSA_WORKITEMID:
+      query_hsa_grid_dim (stmt, BRIG_OPCODE_WORKITEMID, hbb);
+      break;
+    case BUILT_IN_HSA_WORKITEMABSID:
+      query_hsa_grid_dim (stmt, BRIG_OPCODE_WORKITEMABSID, hbb);
+      break;
+    case BUILT_IN_HSA_GRIDSIZE:
+      query_hsa_grid_dim (stmt, BRIG_OPCODE_GRIDSIZE, hbb);
+      break;
+    case BUILT_IN_HSA_CURRENTWORKGROUPSIZE:
+      query_hsa_grid_dim (stmt, BRIG_OPCODE_CURRENTWORKGROUPSIZE, hbb);
+      break;
+
+    case BUILT_IN_GOMP_BARRIER:
+      hbb->append_insn (new hsa_insn_br (0, BRIG_OPCODE_BARRIER, BRIG_TYPE_NONE,
+					 BRIG_WIDTH_ALL));
+      break;
     case BUILT_IN_GOMP_PARALLEL:
       HSA_SORRY_AT (gimple_location (stmt),
 		    "support for HSA does not implement non-gridified "
 		    "OpenMP parallel constructs.");
       break;
+
     case BUILT_IN_OMP_GET_THREAD_NUM:
       {
-	query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, 0, hbb);
+	query_hsa_grid_nodim (stmt, BRIG_OPCODE_WORKITEMFLATABSID, hbb);
 	break;
       }
 
     case BUILT_IN_OMP_GET_NUM_THREADS:
       {
-	query_hsa_grid (stmt, BRIG_OPCODE_GRIDSIZE, 0, hbb);
+	gen_get_num_threads (stmt, hbb);
 	break;
       }
     case BUILT_IN_GOMP_TEAMS:
@@ -5469,9 +5526,19 @@  gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 	gen_hsa_alloca (call, hbb);
 	break;
       }
+    case BUILT_IN_PREFETCH:
+      break;
     default:
       {
-	gen_hsa_insns_for_direct_call (stmt, hbb);
+	tree name_tree = DECL_NAME (fndecl);
+	const char *s = IDENTIFIER_POINTER (name_tree);
+	size_t len = strlen (s);
+	if (len > 4 && (strncmp (s, "__builtin_GOMP_", 15) == 0))
+	  HSA_SORRY_ATV (gimple_location (stmt),
+			 "support for HSA does not implement GOMP function %s",
+			 s);
+	else
+	  gen_hsa_insns_for_direct_call (stmt, hbb);
 	return;
       }
     }
@@ -5601,13 +5668,7 @@  gen_hsa_phi_from_gimple_phi (gimple *phi_stmt, hsa_bb *hbb)
 	}
     }
 
-  hphi->m_prev = hbb->m_last_phi;
-  hphi->m_next = NULL;
-  if (hbb->m_last_phi)
-    hbb->m_last_phi->m_next = hphi;
-  hbb->m_last_phi = hphi;
-  if (!hbb->m_first_phi)
-    hbb->m_first_phi = hphi;
+  hbb->append_phi (hphi);
 }
 
 /* Constructor of class containing HSA-specific information about a basic
@@ -5650,7 +5711,8 @@  hsa_bb::~hsa_bb ()
 hsa_bb *
 hsa_init_new_bb (basic_block bb)
 {
-  return new (*hsa_allocp_bb) hsa_bb (bb);
+  void *m = obstack_alloc (&hsa_obstack, sizeof (hsa_bb));
+  return new (m) hsa_bb (bb);
 }
 
 /* Initialize OMP in an HSA basic block PROLOGUE.  */
diff --git a/gcc/hsa.c b/gcc/hsa.c
index 168cfe3..f881e78 100644
--- a/gcc/hsa.c
+++ b/gcc/hsa.c
@@ -170,6 +170,7 @@  hsa_insn_basic::op_output_p (unsigned opnum)
     case BRIG_OPCODE_SBR:
     case BRIG_OPCODE_ST:
     case BRIG_OPCODE_SIGNALNORET:
+    case BRIG_OPCODE_DEBUGTRAP:
       /* FIXME: There are probably missing cases here, double check.  */
       return false;
     case BRIG_OPCODE_EXPAND:
@@ -605,8 +606,8 @@  hsa_destroy_insn (hsa_insn_basic *insn)
 {
   if (hsa_insn_phi *phi = dyn_cast <hsa_insn_phi *> (insn))
     phi->~hsa_insn_phi ();
-  else if (hsa_insn_br *br = dyn_cast <hsa_insn_br *> (insn))
-    br->~hsa_insn_br ();
+  else if (hsa_insn_cbr *br = dyn_cast <hsa_insn_cbr *> (insn))
+    br->~hsa_insn_cbr ();
   else if (hsa_insn_cmp *cmp = dyn_cast <hsa_insn_cmp *> (insn))
     cmp->~hsa_insn_cmp ();
   else if (hsa_insn_mem *mem = dyn_cast <hsa_insn_mem *> (insn))
@@ -621,6 +622,8 @@  hsa_destroy_insn (hsa_insn_basic *insn)
     block->~hsa_insn_arg_block ();
   else if (hsa_insn_sbr *sbr = dyn_cast <hsa_insn_sbr *> (insn))
     sbr->~hsa_insn_sbr ();
+  else if (hsa_insn_br *br = dyn_cast <hsa_insn_br *> (insn))
+    br->~hsa_insn_br ();
   else if (hsa_insn_comment *comment = dyn_cast <hsa_insn_comment *> (insn))
     comment->~hsa_insn_comment ();
   else
@@ -783,32 +786,22 @@  hsa_brig_function_name (const char *p)
   return buf;
 }
 
-/* Return declaration name if exists.  */
+/* Add a flatten attribute and disable vectorization for gpu implementation
+   function decl GDECL.  */
 
-const char *
-hsa_get_declaration_name (tree decl)
+void hsa_summary_t::process_gpu_implementation_attributes (tree gdecl)
 {
-  if (!DECL_NAME (decl))
-    {
-      char buf[64];
-      snprintf (buf, 64, "__hsa_anonymous_%i", DECL_UID (decl));
-      const char *ggc_str = ggc_strdup (buf);
-      return ggc_str;
-    }
-
-  tree name_tree;
-  if (TREE_CODE (decl) == FUNCTION_DECL
-      || (VAR_P (decl) && is_global_var (decl)))
-    name_tree = DECL_ASSEMBLER_NAME (decl);
-  else
-    name_tree = DECL_NAME (decl);
-
-  const char *name = IDENTIFIER_POINTER (name_tree);
-  /* User-defined assembly names have prepended asterisk symbol.  */
-  if (name[0] == '*')
-    name++;
+  DECL_ATTRIBUTES (gdecl)
+    = tree_cons (get_identifier ("flatten"), NULL_TREE,
+		 DECL_ATTRIBUTES (gdecl));
 
-  return name;
+  tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl);
+  if (fn_opts == NULL_TREE)
+    fn_opts = optimization_default_node;
+  fn_opts = copy_node (fn_opts);
+  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
+  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
+  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
 }
 
 void
@@ -827,21 +820,10 @@  hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host,
   gpu_summary->m_gridified_kernel_p = gridified_kernel_p;
   host_summary->m_gridified_kernel_p = gridified_kernel_p;
 
-  gpu_summary->m_binded_function = host;
-  host_summary->m_binded_function = gpu;
-
-  tree gdecl = gpu->decl;
-  DECL_ATTRIBUTES (gdecl)
-    = tree_cons (get_identifier ("flatten"), NULL_TREE,
-		 DECL_ATTRIBUTES (gdecl));
+  gpu_summary->m_bound_function = host;
+  host_summary->m_bound_function = gpu;
 
-  tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl);
-  if (fn_opts == NULL_TREE)
-    fn_opts = optimization_default_node;
-  fn_opts = copy_node (fn_opts);
-  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
-  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
-  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
+  process_gpu_implementation_attributes (gpu->decl);
 
   /* Create reference between a kernel and a corresponding host implementation
      to quarantee LTO streaming to a same LTRANS.  */
diff --git a/gcc/hsa.h b/gcc/hsa.h
index 1b57a3c..c00ffd5 100644
--- a/gcc/hsa.h
+++ b/gcc/hsa.h
@@ -50,7 +50,6 @@  class hsa_insn_basic;
 class hsa_op_address;
 class hsa_op_reg;
 class hsa_bb;
-typedef hsa_insn_basic *hsa_insn_basic_p;
 
 /* Class representing an input argument, output argument (result) or a
    variable, that will eventually end up being a symbol directive.  */
@@ -72,7 +71,8 @@  struct hsa_symbol
   void fillup_for_decl (tree decl);
 
   /* Pointer to the original tree, which is PARM_DECL for input parameters and
-     RESULT_DECL for the output parameters.  */
+     RESULT_DECL for the output parameters.  Also can be CONST_DECL for Fortran
+     constants which need to be put into readonly segment.  */
   tree m_decl;
 
   /* Name of the symbol, that will be written into output and dumps.  Can be
@@ -259,11 +259,9 @@  private:
   /* Set definition where the register is defined.  */
   void set_definition (hsa_insn_basic *insn);
   /* Uses of the value while still in SSA.  */
-  auto_vec <hsa_insn_basic_p> m_uses;
+  auto_vec <hsa_insn_basic *> m_uses;
 };
 
-typedef class hsa_op_reg *hsa_op_reg_p;
-
 /* Report whether or not P is a register operand.  */
 
 template <>
@@ -490,17 +488,12 @@  class hsa_insn_phi : public hsa_insn_basic
 public:
   hsa_insn_phi (unsigned nops, hsa_op_reg *dst);
 
-  void *operator new (size_t);
-
   /* Destination.  */
   hsa_op_reg *m_dest;
 
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_phi () : hsa_insn_basic (1, HSA_OPCODE_PHI) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is a PHI node.  */
@@ -513,35 +506,56 @@  is_a_helper <hsa_insn_phi *>::test (hsa_insn_basic *p)
   return p->m_opcode == HSA_OPCODE_PHI;
 }
 
-/* HSA instruction for branches.  Currently we explicitely represent only
-   conditional branches.  */
-
+/* HSA instruction for  */
 class hsa_insn_br : public hsa_insn_basic
 {
 public:
-  hsa_insn_br (hsa_op_reg *ctrl);
-
-  void *operator new (size_t);
+  hsa_insn_br (unsigned nops, int opc, BrigType16_t t, BrigWidth8_t width,
+	       hsa_op_base *arg0 = NULL, hsa_op_base *arg1 = NULL,
+	       hsa_op_base *arg2 = NULL, hsa_op_base *arg3 = NULL);
 
-  /* Width as described in HSA documentation.  */
+  /* Number of work-items affected in the same way by the instruction.  */
   BrigWidth8_t m_width;
+
 private:
   /* Make the default constructor inaccessible.  */
-  hsa_insn_br () : hsa_insn_basic (1, BRIG_OPCODE_CBR) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
+  hsa_insn_br () : hsa_insn_basic (0, BRIG_OPCODE_BR) {}
 };
 
-/* Report whether P is a branching instruction.  */
+/* Return true if P is a branching/synchronization instruction.  */
 
 template <>
 template <>
 inline bool
 is_a_helper <hsa_insn_br *>::test (hsa_insn_basic *p)
 {
-  return p->m_opcode == BRIG_OPCODE_BR
-    || p->m_opcode == BRIG_OPCODE_CBR;
+  return p->m_opcode == BRIG_OPCODE_BARRIER
+    || p->m_opcode == BRIG_OPCODE_BR;
+}
+
+/* HSA instruction for conditional branches.  Structurally the same as
+   hsa_insn_br but we represent it specially because of inherent control
+   flow it represents.  */
+
+class hsa_insn_cbr : public hsa_insn_br
+{
+public:
+  hsa_insn_cbr (hsa_op_reg *ctrl);
+
+private:
+  /* Make the default constructor inaccessible.  */
+  hsa_insn_cbr () : hsa_insn_br (0, BRIG_OPCODE_CBR, BRIG_TYPE_B1,
+				 BRIG_WIDTH_1) {}
+};
+
+/* Report whether P is a contitional branching instruction.  */
+
+template <>
+template <>
+inline bool
+is_a_helper <hsa_insn_cbr *>::test (hsa_insn_basic *p)
+{
+  return p->m_opcode == BRIG_OPCODE_CBR;
 }
 
 /* HSA instruction for switch branches.  */
@@ -554,8 +568,6 @@  public:
   /* Default destructor.  */
   ~hsa_insn_sbr ();
 
-  void *operator new (size_t);
-
   void replace_all_labels (basic_block old_bb, basic_block new_bb);
 
   /* Width as described in HSA documentation.  */
@@ -570,9 +582,6 @@  public:
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_sbr () : hsa_insn_basic (1, BRIG_OPCODE_SBR) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether P is a switch branching instruction.  */
@@ -594,8 +603,6 @@  public:
 		hsa_op_base *arg0 = NULL, hsa_op_base *arg1 = NULL,
 		hsa_op_base *arg2 = NULL);
 
-  void *operator new (size_t);
-
   /* Source type should be derived from operand types.  */
 
   /* The comparison operation.  */
@@ -606,9 +613,6 @@  public:
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_cmp () : hsa_insn_basic (1, BRIG_OPCODE_CMP) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is a comparison instruction.  */
@@ -628,8 +632,6 @@  class hsa_insn_mem : public hsa_insn_basic
 public:
   hsa_insn_mem (int opc, BrigType16_t t, hsa_op_base *arg0, hsa_op_base *arg1);
 
-  void *operator new (size_t);
-
   /* Set alignment to VALUE.  */
 
   void set_align (BrigAlignment8_t value);
@@ -652,9 +654,6 @@  protected:
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_mem () : hsa_insn_basic (1, BRIG_OPCODE_LD) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is a memory instruction.  */
@@ -677,7 +676,6 @@  public:
 		   BrigType16_t t, BrigMemoryOrder memorder,
 		   hsa_op_base *arg0 = NULL, hsa_op_base *arg1 = NULL,
 		   hsa_op_base *arg2 = NULL, hsa_op_base *arg3 = NULL);
-  void *operator new (size_t);
 
   /* The operation itself.  */
   enum BrigAtomicOperation m_atomicop;
@@ -691,9 +689,6 @@  public:
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_atomic () : hsa_insn_mem (1, BRIG_KIND_NONE, BRIG_TYPE_NONE) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is an atomic instruction.  */
@@ -709,20 +704,19 @@  is_a_helper <hsa_insn_atomic *>::test (hsa_insn_basic *p)
 
 /* HSA instruction for signal operations.  */
 
-class hsa_insn_signal : public hsa_insn_atomic
+class hsa_insn_signal : public hsa_insn_basic
 {
 public:
   hsa_insn_signal (int nops, int opc, enum BrigAtomicOperation sop,
-		   BrigType16_t t, hsa_op_base *arg0 = NULL,
-		   hsa_op_base *arg1 = NULL,
+		   BrigType16_t t, BrigMemoryOrder memorder,
+		   hsa_op_base *arg0 = NULL, hsa_op_base *arg1 = NULL,
 		   hsa_op_base *arg2 = NULL, hsa_op_base *arg3 = NULL);
 
-  void *operator new (size_t);
+  /* Things like acquire/release/aligned.  */
+  enum BrigMemoryOrder m_memory_order;
 
-private:
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
+  /* The operation itself.  */
+  enum BrigAtomicOperation m_signalop;
 };
 
 /* Report whether or not P is a signal instruction.  */
@@ -744,8 +738,6 @@  public:
   hsa_insn_seg (int opc, BrigType16_t destt, BrigType16_t srct,
 		BrigSegment8_t seg, hsa_op_base *arg0, hsa_op_base *arg1);
 
-  void *operator new (size_t);
-
   /* Source type.  Depends on the source addressing/segment.  */
   BrigType16_t m_src_type;
   /* The segment we are converting from or to.  */
@@ -753,9 +745,6 @@  public:
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_seg () : hsa_insn_basic (1, BRIG_OPCODE_STOF) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is a segment conversion instruction.  */
@@ -812,8 +801,6 @@  public:
   /* Default destructor.  */
   ~hsa_insn_call ();
 
-  void *operator new (size_t);
-
   /* Called function.  */
   tree m_called_function;
 
@@ -840,9 +827,6 @@  public:
 private:
   /* Make the default constructor inaccessible.  */
   hsa_insn_call () : hsa_insn_basic (0, BRIG_OPCODE_CALL) {}
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is a call instruction.  */
@@ -866,17 +850,11 @@  class hsa_insn_arg_block : public hsa_insn_basic
 public:
   hsa_insn_arg_block (BrigKind brig_kind, hsa_insn_call * call);
 
-  void *operator new (size_t);
-
   /* Kind of argument block.  */
   BrigKind m_kind;
 
   /* Call instruction.  */
   hsa_insn_call *m_call_insn;
-private:
-  /* All objects are deallocated by destroying their pool, so make delete
-     inaccessible too.  */
-  void operator delete (void *) {}
 };
 
 /* Report whether or not P is a call block instruction.  */
@@ -900,8 +878,6 @@  public:
   /* Default destructor.  */
   ~hsa_insn_comment ();
 
-  void *operator new (size_t);
-
   char *m_comment;
 };
 
@@ -920,10 +896,18 @@  is_a_helper <hsa_insn_comment *>::test (hsa_insn_basic *p)
 class hsa_insn_queue: public hsa_insn_basic
 {
 public:
-  hsa_insn_queue (int nops, BrigOpcode opcode);
+  hsa_insn_queue (int nops, int opcode, BrigSegment segment,
+		  BrigMemoryOrder memory_order,
+		  hsa_op_base *arg0 = NULL, hsa_op_base *arg1 = NULL,
+		  hsa_op_base *arg2 = NULL, hsa_op_base *arg3 = NULL);
 
   /* Destructor.  */
   ~hsa_insn_queue ();
+
+  /* Segment used to refer to the queue.  Must be global or flat.  */
+  BrigSegment m_segment;
+  /* Memory order used to specify synchronization.  */
+  BrigMemoryOrder m_memory_order;
 };
 
 /* Report whether or not P is a queue instruction.  */
@@ -933,7 +917,12 @@  template <>
 inline bool
 is_a_helper <hsa_insn_queue *>::test (hsa_insn_basic *p)
 {
-  return (p->m_opcode == BRIG_OPCODE_ADDQUEUEWRITEINDEX);
+  return (p->m_opcode == BRIG_OPCODE_ADDQUEUEWRITEINDEX
+	  || p->m_opcode == BRIG_OPCODE_CASQUEUEWRITEINDEX
+	  || p->m_opcode == BRIG_OPCODE_LDQUEUEREADINDEX
+	  || p->m_opcode == BRIG_OPCODE_LDQUEUEWRITEINDEX
+	  || p->m_opcode == BRIG_OPCODE_STQUEUEREADINDEX
+	  || p->m_opcode == BRIG_OPCODE_STQUEUEWRITEINDEX);
 }
 
 /* HSA source type instruction.  */
@@ -945,9 +934,6 @@  public:
 		   BrigType16_t srct, hsa_op_base *arg0, hsa_op_base *arg1,
 		   hsa_op_base *arg2);
 
-  /* Pool allocator.  */
-  void *operator new (size_t);
-
   /* Source type.  */
   BrigType16_t m_source_type;
 
@@ -976,9 +962,6 @@  public:
 		   BrigType16_t srct, hsa_op_base *arg0, hsa_op_base *arg1,
 		   hsa_op_base *arg2);
 
-  /* Pool allocator.  */
-  void *operator new (size_t);
-
   /* Operand list for an operand of the instruction.  */
   hsa_op_operand_list *m_operand_list;
 
@@ -1003,9 +986,6 @@  class hsa_insn_cvt: public hsa_insn_basic
 {
 public:
   hsa_insn_cvt (hsa_op_with_type *dest, hsa_op_with_type *src);
-
-  /* Pool allocator.  */
-  void *operator new (size_t);
 };
 
 /* Report whether or not P is a convert instruction.  */
@@ -1028,9 +1008,6 @@  public:
 
   /* Required alignment of the allocation.  */
   BrigAlignment8_t m_align;
-
-  /* Pool allocator.  */
-  void *operator new (size_t);
 };
 
 /* Report whether or not P is an alloca instruction.  */
@@ -1055,6 +1032,9 @@  public:
   /* Append an instruction INSN into the basic block.  */
   void append_insn (hsa_insn_basic *insn);
 
+  /* Add a PHI instruction.  */
+  void append_phi (hsa_insn_phi *phi);
+
   /* The real CFG BB that this HBB belongs to.  */
   basic_block m_bb;
 
@@ -1217,7 +1197,7 @@  public:
   unsigned m_temp_symbol_count;
 
   /* SSA names mapping.  */
-  vec <hsa_op_reg_p> m_ssa_map;
+  vec <hsa_op_reg *> m_ssa_map;
 
   /* Flag whether a function needs update of dominators before RA.  */
   bool m_modified_cfg;
@@ -1239,9 +1219,9 @@  struct hsa_function_summary
   hsa_function_kind m_kind;
 
   /* Pointer to a cgraph node which is a HSA implementation of the function.
-     In case of the function is a HSA function, the binded function points
+     In case of the function is a HSA function, the bound function points
      to the host function.  */
-  cgraph_node *m_binded_function;
+  cgraph_node *m_bound_function;
 
   /* Identifies if the function is an HSA function or a host function.  */
   bool m_gpu_implementation_p;
@@ -1252,7 +1232,7 @@  struct hsa_function_summary
 
 inline
 hsa_function_summary::hsa_function_summary (): m_kind (HSA_NONE),
-  m_binded_function (NULL), m_gpu_implementation_p (false)
+  m_bound_function (NULL), m_gpu_implementation_p (false)
 {
 }
 
@@ -1270,6 +1250,9 @@  public:
 
   void link_functions (cgraph_node *gpu, cgraph_node *host,
 		       hsa_function_kind kind, bool gridified_kernel_p);
+
+private:
+  void process_gpu_implementation_attributes (tree gdecl);
 };
 
 /* OMP simple builtin describes behavior that should be done for
diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
index 769657f..0fbe2e2 100644
--- a/gcc/ipa-hsa.c
+++ b/gcc/ipa-hsa.c
@@ -79,7 +79,7 @@  process_hsa_functions (void)
       hsa_function_summary *s = hsa_summaries->get (node);
 
       /* A linked function is skipped.  */
-      if (s->m_binded_function != NULL)
+      if (s->m_bound_function != NULL)
 	continue;
 
       if (s->m_kind != HSA_NONE)
@@ -90,6 +90,7 @@  process_hsa_functions (void)
 	    = node->create_virtual_clone (vec <cgraph_edge *> (),
 					  NULL, NULL, "hsa");
 	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+	  clone->externally_visible = node->externally_visible;
 
 	  clone->force_output = true;
 	  hsa_summaries->link_functions (clone, node, s->m_kind, false);
@@ -107,6 +108,7 @@  process_hsa_functions (void)
 	    = node->create_virtual_clone (vec <cgraph_edge *> (),
 					  NULL, NULL, "hsa");
 	  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+	  clone->externally_visible = node->externally_visible;
 
 	  if (!cgraph_local_p (node))
 	    clone->force_output = true;
@@ -131,7 +133,7 @@  process_hsa_functions (void)
 	      hsa_function_summary *dst = hsa_summaries->get (e->callee);
 	      if (dst->m_kind != HSA_NONE && !dst->m_gpu_implementation_p)
 		{
-		  e->redirect_callee (dst->m_binded_function);
+		  e->redirect_callee (dst->m_bound_function);
 		  if (dump_file)
 		    fprintf (dump_file,
 			     "Redirecting edge to HSA function: %s->%s\n",
@@ -193,10 +195,10 @@  ipa_hsa_write_summary (void)
 	  bp = bitpack_create (ob->main_stream);
 	  bp_pack_value (&bp, s->m_kind, 2);
 	  bp_pack_value (&bp, s->m_gpu_implementation_p, 1);
-	  bp_pack_value (&bp, s->m_binded_function != NULL, 1);
+	  bp_pack_value (&bp, s->m_bound_function != NULL, 1);
 	  streamer_write_bitpack (&bp);
-	  if (s->m_binded_function)
-	    stream_write_tree (ob, s->m_binded_function->decl, true);
+	  if (s->m_bound_function)
+	    stream_write_tree (ob, s->m_bound_function->decl, true);
 	}
     }
 
@@ -249,7 +251,7 @@  ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data,
       if (has_tree)
 	{
 	  tree decl = stream_read_tree (&ib_main, data_in);
-	  s->m_binded_function = cgraph_node::get_create (decl);
+	  s->m_bound_function = cgraph_node::get_create (decl);
 	}
     }
   lto_free_section_data (file_data, LTO_section_ipa_hsa, NULL, data,
diff --git a/libgomp/testsuite/libgomp.hsa.c/bits-insns.c b/libgomp/testsuite/libgomp.hsa.c/bits-insns.c
new file mode 100644
index 0000000..21cac72
--- /dev/null
+++ b/libgomp/testsuite/libgomp.hsa.c/bits-insns.c
@@ -0,0 +1,73 @@ 
+#include <math.h>
+
+#define N 12
+
+int main()
+{
+  unsigned int arguments[N] = {0u, 1u, 2u, 3u, 111u, 333u, 444u, 0x80000000u, 0x0000ffffu, 0xf0000000u, 0xff000000u, 0xffffffffu};
+  int clrsb[N] = {};
+  int clz[N] = {};
+  int ctz[N] = {};
+  int ffs[N] = {};
+  int parity[N] = {};
+  int popcount[N] = {};
+
+  int ref_clrsb[N] = {};
+  int ref_clz[N] = {};
+  int ref_ctz[N] = {};
+  int ref_ffs[N] = {};
+  int ref_parity[N] = {};
+  int ref_popcount[N] = {};
+
+  for (unsigned i = 0; i < N; i++)
+    {
+      ref_clrsb[i] = __builtin_clrsb (arguments[i]);
+      ref_clz[i] = __builtin_clz (arguments[i]);
+      ref_ctz[i] = __builtin_ctz (arguments[i]);
+      ref_ffs[i] = __builtin_ffs (arguments[i]);
+      ref_parity[i] = __builtin_parity (arguments[i]);
+      ref_popcount[i] = __builtin_popcount (arguments[i]);
+    }
+
+  #pragma omp target map(from:clz, ctz, ffs, parity, popcount)
+  {
+    for (unsigned i = 0; i < N; i++)
+    {
+      clrsb[i] = __builtin_clrsb (arguments[i]);
+      clz[i] = __builtin_clz (arguments[i]);
+      ctz[i] = __builtin_ctz (arguments[i]);
+      ffs[i] = __builtin_ffs (arguments[i]);
+      parity[i] = __builtin_parity (arguments[i]);
+      popcount[i] = __builtin_popcount (arguments[i]);
+    }
+  }
+
+  for (unsigned i = 0; i < N; i++)
+    if (ref_clrsb[i] != clrsb[i])
+      __builtin_abort ();
+
+  /* CLZ of zero is undefined for zero.  */
+  for (unsigned i = 1; i < N; i++)
+    if (ref_clz[i] != clz[i])
+      __builtin_abort ();
+
+  /* Likewise for ctz */
+  for (unsigned i = 1; i < N; i++)
+    if (ref_ctz[i] != ctz[i])
+      __builtin_abort ();
+
+  for (unsigned i = 0; i < N; i++)
+    if (ref_ffs[i] != ffs[i])
+      __builtin_abort ();
+
+  for (unsigned i = 0; i < N; i++)
+    if (ref_parity[i] != parity[i])
+      __builtin_abort ();
+
+  for (unsigned i = 0; i < N; i++)
+    if (ref_popcount[i] != popcount[i])
+      __builtin_abort ();
+
+  return 0;
+}
+