diff mbox

_Cilk_for for C and C++

Message ID BF230D13CA30DD48930C31D4099330003A4AC599@FMSMSX101.amr.corp.intel.com
State New
Commit Message

Iyer, Balaji V Nov. 26, 2013, 4:03 a.m. UTC
Hi Jason,
	I am attaching a fixed patch. I have resolved all the issues you have mentioned below and I have added answers to your questions below. I have not regenerated the C patch since nothing has changed on it.

Here are the ChangeLog entries:
gcc/cp/ChangeLog.
2013-11-25  Balaji V. Iyer  <balaji.v.iyer@intel.com>

        * cp-cilkplus.c: Added cgraph.h, gimple.h and gimplify.h.
        (callable): New function.
        (calc_count_up_count_down): Likewise.
        (compute_loop_var_cp_iter_hdl): Likewise.
        (cp_create_cilk_for_body): Likewise.
        (create_cilk_for_nested_fn): Likewise.
        (gimplify_cilk_for_1): Likewise.
        (cp_extract_cilk_for_fields): Likewise.
        (cp_gimplify_cilk_for): Likewise.
        * cp-gimplify.c (genericize_cilk_for_stmt): Likewise.
        (cp_genericize_r): Added a check for CILK_FOR_STMT.
        * cp-objcp-common.h (LANG_HOOKS_CILKPLUS_GIMPLIFY_CILK_FOR): New
        #define.
        * cp-tree.h (begin_cilk_for_stmt): New prototype.
        (finish_cilk_for_stmt): Likewise.
        (finish_cilk_for_init_stmt): Likewise.
        (cp_gimplify_cilk_for): Likewise.
        * parser.c (cp_parser_cilk_grainsize): New function and prototype.
        (cp_parser_init_declarator): Added a new parameter to hold the
        initial value.
        (cp_parser_statement): Added RID_CILK_FOR case.
        (cp_parser_iteration_statement): Likewise.
        (cp_parser_jump_statement): Added IN_CILK_FOR_STMT case (twice).
        (cp_parser_pragma): Added PRAGMA_CILK_GRAINSIZE case.
        (cp_parser_cilk_for_init_statement): New function.
        (cp_parser_cilk_for): Renamed a parameter and added support for
        parsing _Cilk_for loops that are part of Cilk keywords.
        * parser.h (IN_CILK_FOR_STMT): New #define.
        * pt.c (tsubst_expr): Added CILK_FOR_STMT case.
        * semantics.c (begin_for_scope): Added "_Cilk_for statement" in the
        header comment.
        (finish_for_expr): Added support for CILK_FOR_STMT to use this
        function.
        (finish_cilk_for_cond): Added support for processing templates.
        (begin_cilk_for_stmt): New function.
        (finish_cilk_for_init_stmt): Likewise.
        (finish_cilk_for_stmt): Likewise.

gcc/testsuite/ChangeLog.
2013-11-15  Balaji V. Iyer  <balaji.v.iyer@intel.com>

        * g++.dg/cilk-plus/CK/cilk-for-start-at-5.cc: New test.
        * g++.dg/cilk-plus/CK/cilk-for-tplt.cc: Likewise.
        * g++.dg/cilk-plus/CK/cilk-for.cc: Likewise.
        * g++.dg/cilk-plus/CK/cilk_for_cont_inside_for.cc: Likewise.
        * g++.dg/cilk-plus/CK/cilk_for_cont_with_for.cc: Likewise.
        * g++.dg/cilk-plus/CK/cilk_for_cont_with_if.cc: Likewise.
        * g++.dg/cilk-plus/CK/cilk_for_cont_with_while.cc: Likewise.
        * g++.dg/cilk-plus/CK/cilk_for_genricize_test.cc: Likewise.
        * g++.dg/cilk-plus/CK/cilk_for_grainsize.cc: Likewise.
        * g++.dg/cilk-plus/CK/cilk_for_p_errors.cc: Likewise.
        * g++.dg/cilk-plus/CK/cilk_for_t_errors.cc: Likewise.
        * g++.dg/cilk-plus/CK/explicit_ctor.cc: Likewise.
        * g++.dg/cilk-plus/CK/label_test.cc: Likewise.
        * g++.dg/cilk-plus/CK/no-opp-overload-error.cc: Likewise.
        * g++.dg/cilk-plus/CK/plus-equal-one.cc: Likewise.
        * g++.dg/cilk-plus/CK/plus-equal-test.cc: Likewise.
        * g++.dg/cilk-plus/CK/stl_iter.cc: Likewise.
        * g++.dg/cilk-plus/CK/stl_test.cc: Likewise.
        * g++.dg/cilk-plus/cilk-plus.exp: Added support to call _Cilk_for
        testcodes.

Thanks,

Balaji V. Iyer.

> -----Original Message-----
> From: Jason Merrill [mailto:jason@redhat.com]
> Sent: Friday, November 22, 2013 11:46 AM
> To: Iyer, Balaji V; Aldy Hernandez
> Cc: gcc-patches@gcc.gnu.org; Jeff Law; rth@redhat.com
> Subject: Re: [PATCH] _Cilk_for for C and C++
> 
> On 11/18/2013 04:50 PM, Iyer, Balaji V wrote:
> > +  int flags = LOOKUP_PROTECT | LOOKUP_ONLYCONVERTING;
> 
> Why not LOOKUP_NORMAL? LOOKUP_ONLYCONVERTING isn't relevant in
> this context.
> 

Fixed. I used LOOKUP_NORMAL

> > +  tree exp = build_new_op (EXPR_LOCATION (op1), code, flags, op0, op1,
> > +			   NULL_TREE, NULL, 0);
> 
> Use tf_none instead of 0.
> 

Fixed.

> > +  if (exp == error_mark_node)
> > +    exp = build_x_modify_expr (EXPR_LOCATION (op1), op0, code, op1, tf_none);
> > +  if (exp && exp != error_mark_node)
> > +    return exp;
> 
> This doesn't make sense to me.  build_x_modify_expr takes codes like
> PLUS_EXPR and then does an assignment afterward; we don't want to
> quietly do += just because there's some error with the evaluation of the
> + operation.  What is this code trying to do?
> 

Yes, that was a mistake on my side. The exp cannot be error_mark_node at this point. I am sorry. It is removed.

> > +/* Handler for iterator to compute the loop variable.  ADD_OP indicates
> > +   whether we need a '+' or '-' operation. LOW indicates the starting point
> > +   and LOOP_VAR is the induction variable.  Returns an expression (or a
> > +   STATEMENT_LIST of expressions).  If it is unable to find the appropriate
> > +   iteration, then it returns an error mark node and its parent will set
> > +   the loop as invalid.  */
> 
> This doesn't explain what VAR2 is.  And it seems like you're also using LOW as
> the increment?
> 

var2 is the copy of the induction variable in _Cilk_for but its context is the cilk_for nested function.

> > +      tree new_stmt = build_x_modify_expr (loc, new_var, INIT_EXPR,
> > +					   build_zero_cst (TREE_TYPE (new_var)),
> > +					   tf_warning_or_error);
> > +      if (new_stmt == error_mark_node)
> > +	return error_mark_node;
> > +      append_to_statement_list (new_stmt, &exp);
> > +      new_stmt = build_x_modify_expr (loc, new_var, NOP_EXPR, low,
> > +				      tf_warning_or_error);
> 
> Why assign 0 if you're going to immediately assign low afterwards?
> 

This part is fixed as I mentioned above.

> > +  /* We have to manually create this loop for two reasons:
> > +     a. We need to have access to continue and start label since we need
> > +        to resolve continue and breaks by hand.
> 
> Why do you need to resolve them by hand?  It looks like break isn't even
> allowed.
> 

You are correct, I don't need to do them. I just need to emit a FOR_STMT with the body inside it and then when I do a cp_genericize, it will automatically resolve it. I fixed it accordingly.

> > +     b. C++ doesn't provide a c_finish_loop function like C does.  */
> 
> Why is that important?
> 

Please see my note above. By the way, I hope you didn't read my above comment as knocking the C++ implementation. I just gave a detailed explanation as to why I did the loop creation by hand. I have removed that now since it is no longer applicable.

> >    sk_for,	     /* The scope of the variable declared in a
> >  			for-init-statement.  */
> > +  sk_cilk_for,       /* The scope of the variable declared in _Cilk_for init
> > +			statement.  */
> 
> How is this different from a normal for-init-statement?  Nothing seems to
> use it.
> 

Yep. Removed.

> Jason

Comments

Jason Merrill Nov. 27, 2013, 5:06 p.m. UTC | #1
On 11/25/2013 11:03 PM, Iyer, Balaji V wrote:

On a broad note, I think there's a lot of OpenMP code you could be 
reusing here rather than writing it all again.  And that way Cilk code 
will benefit from improvements to OpenMP handling, and vice versa.  It 
probably makes sense to turn Cilk_for into an OMP_FOR loop, and then 
gimplify into GIMPLE_OMP_FOR, rather than create a new tree code and 
handle everything at the tree level.  But I don't know the OMP code well 
enough to suggest exactly how that would work.

Finer-grained comments:

> +  tree exp = build_new_op (EXPR_LOCATION (op1), code, flags, op0, op1,
> +			   NULL_TREE, NULL, tf_none);
> +  if (exp == error_mark_node)
> +    exp = build_x_modify_expr (EXPR_LOCATION (op1), op0, code, op1, tf_none);
> +  if (exp && exp != error_mark_node)
> +    return exp;

I thought you were changing this?

> +/* Handler for iterator to compute the loop variable.  ADD_OP indicates
> +   whether we need a '+' or '-' operation. LOW indicates the starting point
> +   and LOOP_VAR is the induction variable.  This functin returns an
> +   INIT_EXPR.  */

This comment still doesn't document VAR2.

"function"

> +  tree exp = build_new_op (loc, add_op, 0, low, loop_var, NULL_TREE, 0,
> +			   tf_none);
> +  gcc_assert (exp != error_mark_node);
> +  exp = cp_build_modify_expr (var2, INIT_EXPR, exp, tf_warning_or_error);

Looking at online Cilk documentation I see:

> The increment expression must add to or subtract from the control variable using one of the following supported operations:
> +=
> -=
> ++ (prefix or postfix)
> -- (prefix or postfix)

From this, I think people would expect the increment to use a 
user-defined operator+=/-=/++/--, but your code above uses operator+/- 
instead.
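
As a rough illustration of this concern, consider a control-variable type that provides operator+= and operator++ but no operator+. The sketch below is only an assumption about what user code might look like; `Counter` and `advance_by` are invented names and do not appear in the patch.

```cpp
#include <cassert>

// Hypothetical control-variable type: it provides operator+= and
// operator++, but deliberately no operator+.
struct Counter {
  int value;
  int plus_equal_calls;  // counts how often operator+= was used

  explicit Counter(int v) : value(v), plus_equal_calls(0) {}

  Counter &operator+=(int n) {
    value += n;
    ++plus_equal_calls;
    return *this;
  }

  // Prefix increment is expressed in terms of operator+=.
  Counter &operator++() { return *this += 1; }
};

// Advances C by N the way a user who wrote "c += n" would expect:
// through the user-defined operator+=.  Writing "c = c + n" here would
// not compile for Counter, because no operator+ exists.
inline void advance_by(Counter &c, int n) {
  assert(n >= 0);
  c += n;
}
```

With such a type, an increment lowered through operator+ would either be rejected or resolve to a different function than the one the user wrote, which is the behaviour the documentation quoted above seems to rule out.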

> +		    "-fcilkplus must be enabled t use %<_Cilk_for%>");

"to"

> +cp_parser_cilk_for (cp_parser *parser, tree grain)

Please reuse cp_parser_omp_for, like Aldy did for #pragma simd 
(cp_parser_cilk_simd) rather than write yet another for-statement 
parser.  This should reduce the patch size quite a bit.

> +    case PRAGMA_CILK_GRAINSIZE:
> +      if (context == pragma_external)
> +	{
> +	  error_at (pragma_tok->location,
> +		    "%<#pragma cilk grainsize%> may only be be used inside a "
> +		    "function");
> +	  break;
> +	}
> +
> +      /* Ignore the pragma if Cilk Plus is not enabled.  */
> +      if (flag_enable_cilkplus)
> +	{
> +	  cp_parser_cilk_grainsize (parser, pragma_tok);
> +	  return true;
> +	}
>      default:

Do you mean to fall through to the default case if Cilk+ is not enabled?
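
The control flow in question can be reproduced in isolation. The following is a minimal sketch, not the parser code: `pragma_kind`, `outcome` and `handle_pragma` are hypothetical names chosen for the illustration, and the three-way result only makes the fall-through observable.

```cpp
#include <cassert>

enum pragma_kind { PRAGMA_GRAINSIZE, PRAGMA_OTHER };
enum outcome { HANDLED, REJECTED_OUTSIDE_FUNCTION, UNKNOWN_PRAGMA };

// Mirrors the shape of the quoted case: when cilk_enabled is false, nothing
// returns or breaks before the default label, so control reaches default.
outcome handle_pragma(pragma_kind kind, bool at_file_scope, bool cilk_enabled) {
  assert(kind == PRAGMA_GRAINSIZE || kind == PRAGMA_OTHER);
  outcome result = HANDLED;
  switch (kind) {
  case PRAGMA_GRAINSIZE:
    if (at_file_scope) {
      result = REJECTED_OUTSIDE_FUNCTION;
      break;
    }
    if (cilk_enabled)
      return HANDLED;
    // Falls through - reached only when Cilk Plus is disabled.
  default:
    result = UNKNOWN_PRAGMA;
    break;
  }
  return result;
}
```

If ignoring the pragma silently is the intent, an explicit `break` (or `return`) after the `if` would state that; if the default handling is wanted, a comment saying so would make the fall-through visibly deliberate.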

> +	tmp = CILK_FOR_COND (t);
> +	if (COMPARISON_CLASS_P (tmp))
> +	  {
> +	    tree op0 = RECUR (TREE_OPERAND (tmp, 0));
> +	    tree op1 = RECUR (TREE_OPERAND (tmp, 1));
> +	    tmp = build2 (TREE_CODE (tmp), boolean_type_node, op0, op1);
> +	  }
> +	CILK_FOR_COND (stmt) = tmp;

Why not just recur into CILK_FOR_COND?

> +	tmp = CILK_FOR_EXPR (t);
> +	if (TREE_CODE (tmp) == MODIFY_EXPR)
> +	  {
> +	    tree lhs = TREE_OPERAND (tmp, 0);
> +	    tree rhs = TREE_OPERAND (tmp, 1);
> +	    lhs = RECUR (lhs);
> +	    rhs = build2 (TREE_CODE (rhs), TREE_TYPE (lhs),
> +			  RECUR (TREE_OPERAND (rhs, 0)),
> +			  RECUR (TREE_OPERAND (rhs, 1)));
> +	    tmp = build2 (MODIFY_EXPR, void_type_node, lhs, rhs);
> +	  }
> +	else
> +	  tmp = build2 (TREE_CODE (tmp), void_type_node,
> +			RECUR (TREE_OPERAND (tmp, 0)),
> +			RECUR (TREE_OPERAND (tmp, 1)));
> +	finish_for_expr (tmp, stmt);

And CILK_FOR_EXPR?

Jason
Jeff Law Nov. 27, 2013, 7:43 p.m. UTC | #2
On 11/27/13 10:06, Jason Merrill wrote:
> On 11/25/2013 11:03 PM, Iyer, Balaji V wrote:
>
> On a broad note, I think there's a lot of OpenMP code you could be
> reusing here rather than writing it all again.  And that way Cilk code
> will benefit from improvements to OpenMP handling, and vice versa.  It
> probably makes sense to turn Cilk_for into an OMP_FOR loop, and then
> gimplify into GIMPLE_OMP_FOR, rather than create a new tree code and
> handle everything at the tree level.  But I don't know the OMP code well
> enough to suggest exactly how that would work.
That's certainly the direction I'd like to see this work go as well.  To 
the fullest extent possible Cilk+ should be layering on top of the 
OpenMP 4 work -- ie, Cilk+ should really be dealing with parsing issues, 
then handoff to OpenMP for the real work.

Jeff
Iyer, Balaji V Nov. 27, 2013, 9:14 p.m. UTC | #3
> -----Original Message-----
> From: Jeff Law [mailto:law@redhat.com]
> Sent: Wednesday, November 27, 2013 2:43 PM
> To: Jason Merrill; Iyer, Balaji V; Aldy Hernandez
> Cc: gcc-patches@gcc.gnu.org; rth@redhat.com; Jakub Jelinek
> Subject: Re: [PATCH] _Cilk_for for C and C++
> 
> On 11/27/13 10:06, Jason Merrill wrote:
> > On 11/25/2013 11:03 PM, Iyer, Balaji V wrote:
> >
> > On a broad note, I think there's a lot of OpenMP code you could be
> > reusing here rather than writing it all again.  And that way Cilk code
> > will benefit from improvements to OpenMP handling, and vice versa.  It
> > probably makes sense to turn Cilk_for into an OMP_FOR loop, and then
> > gimplify into GIMPLE_OMP_FOR, rather than create a new tree code and
> > handle everything at the tree level.  But I don't know the OMP code
> > well enough to suggest exactly how that would work.
> That's certainly the direction I'd like to see this work go as well.  To the fullest
> extent possible Cilk+ should be layering on top of the OpenMP 4 work -- ie,
> Cilk+ should really be dealing with parsing issues, then handoff to OpenMP
> for the real work.

Hello Jeff and Jason,
	I completely agree with you that certain parts of Cilk Plus are similar to OMP4, namely #pragma simd and SIMD-enabled functions (formerly called elemental functions). But the Cilk keywords are almost completely orthogonal to OpenMP. They are semantically different, and one cannot be transformed into the other. Cilk uses automatically load-balanced work stealing through the Cilk runtime, whereas OMP uses work sharing through the OMP runtime. There are a number of other semantic differences, but this is the core issue. #pragma simd and #pragma omp have converged in several places, but the Cilk part has always been different from OpenMP.

	I have thought about sharing routines with OpenMP and have done it in several parts of Cilk Plus. It is not possible to share any middle-end work between the Cilk keywords and OpenMP because they are fundamentally different. I have shared some parsing parts with OMP in C.

	Since we are talking about _Cilk_for loops, maybe an example of how a compiler is supposed to break down a _Cilk_for loop will help. Please see the example below. It is a simple main routine with one _Cilk_for in it, and it returns a local variable X that may or may not be read or written in the body:


int main (void)
{
	int X = 0;
	_Cilk_for (int ii = 5; ii < 15; ii++)
	{
		<body>
	}
	return X;
}

This program is converted to the following:

/* The low and high fields are passed in by the runtime, using the user-defined
   grainsize or the runtime-computed one.  The data field is ignored in GCC;
   please see below.  */

cilk_for_helper_function (void *data, int low, int high) {
	for (ii = low; ii < high; ii++)
		<body>;
}

int main (void)
{
	int X = 0;
	/* This function is actually a call to the runtime, whose implementation
	   is in libcilkrts/runtime/cilk-abi-cilk-for.c.  */
	__cilkrts_cilk_for_64 (__cilk_for_001, /* Nested/lambda function.  */
	                       __cilk_for_001, /* Data used by the lambda function.  The
	                                          runtime does not worry about it; it is
	                                          an interface to pass the information to
	                                          the lambda function.  In GCC we create a
	                                          nested function, so it is ignored.  */
	                       10,             /* loop_count (15 - 5).  */
	                       0               /* Grain value from the #pragma grainsize
	                                          pragma.  */);

	/* Note: if the trip count is 32-bit, then __cilkrts_cilk_for_64 is replaced
	   by __cilkrts_cilk_for_32.  */

	return X;
}


As you can tell, this is not how openmp handles a #pragma omp for loop.

Thanks,

Balaji V. Iyer.

> 
> Jeff
Jason Merrill Nov. 28, 2013, 12:52 a.m. UTC | #4
On 11/27/2013 04:14 PM, Iyer, Balaji V wrote:
> 	I completely agree with you that there are certain parts of Cilk Plus that is similar to OMP4, namely #pragma simd and SIMD-enabled functions (formerly called elemental functions). But, the Cilk keywords is almost completely orthogonal to OpenMP. They are semantically different  and one cannot be transformed to another. Cilk uses automatically load-balanced work-stealing using the Cilk runtime, whereas OMP uses work sharing via OMP runtime. There are a number of other semantic differences but this is the core-issue. #pragma simd and #pragma omp have converged in several places but the Cilk part has always been different from OpenMP.

Yes, Cilk for loops will use the Cilk runtime and OMP for loops will use 
the OMP runtime, but that doesn't mean they can't share a lot of the 
middle end code along the way.

We already have several different varieties of parallel/simd loops all 
represented by GIMPLE_OMP_FOR, and I think this could be another 
GP_OMP_FOR_KIND_.

...
> As you can tell, this is not how openmp handles a #pragma omp for loop.

It's different in detail, but #pragma omp parallel for works very 
similarly: it creates a separate function for the body of the loop and 
then passes that to GOMP_parallel along with any shared data.
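
The shared shape described here (an outlined body plus a runtime entry point that decides which iterations each call handles) can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_runtime_for`, `outlined_body` and `run_example` are hypothetical names, the driver is a serial stand-in for neither __cilkrts_cilk_for_64 nor GOMP_parallel, and it assumes the runtime hands out zero-based iteration indices that the helper offsets by the loop's start value.

```cpp
#include <cassert>
#include <vector>

// Signature of the outlined loop body: an opaque context pointer plus a
// half-open iteration range [low, high).
typedef void (*loop_helper)(void *data, int low, int high);

// Toy stand-in for a runtime entry point.  It splits [low, high) in half
// until a piece is no larger than GRAIN, then calls the helper.  A real
// runtime would execute the halves in parallel; here they run serially.
void toy_runtime_for(loop_helper fn, void *data, int low, int high, int grain) {
  assert(grain > 0);
  if (high - low <= grain) {
    if (low < high)
      fn(data, low, high);
    return;
  }
  int mid = low + (high - low) / 2;
  toy_runtime_for(fn, data, low, mid, grain);
  toy_runtime_for(fn, data, mid, high, grain);
}

// Outlined body for:  for (int ii = 5; ii < 15; ii++) visited.push_back (ii);
// Assumption: the driver passes indices 0..count-1 and the helper adds 5.
void outlined_body(void *data, int low, int high) {
  std::vector<int> *visited = static_cast<std::vector<int> *>(data);
  for (int i = low; i < high; i++)
    visited->push_back(5 + i);
}

// What the enclosing function could look like after outlining.
std::vector<int> run_example(int grain) {
  std::vector<int> visited;
  toy_runtime_for(outlined_body, &visited, 0, 15 - 5, grain);
  return visited;
}
```

Only the driver differs between a work-stealing and a work-sharing runtime in this picture; the outlining step and the helper's signature look the same from the compiler's side, which is the part that could plausibly be shared.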

Jason
Jeff Law Dec. 3, 2013, 6:30 a.m. UTC | #5
On 11/27/13 17:52, Jason Merrill wrote:
> On 11/27/2013 04:14 PM, Iyer, Balaji V wrote:
>>     I completely agree with you that there are certain parts of Cilk
>> Plus that is similar to OMP4, namely #pragma simd and SIMD-enabled
>> functions (formerly called elemental functions). But, the Cilk
>> keywords is almost completely orthogonal to OpenMP. They are
>> semantically different  and one cannot be transformed to another. Cilk
>> uses automatically load-balanced work-stealing using the Cilk runtime,
>> whereas OMP uses work sharing via OMP runtime. There are a number of
>> other semantic differences but this is the core-issue. #pragma simd
>> and #pragma omp have converged in several places but the Cilk part has
>> always been different from OpenMP.
>
> Yes, Cilk for loops will use the Cilk runtime and OMP for loops will use
> the OMP runtime, but that doesn't mean they can't share a lot of the
> middle end code along the way.
>
> We already have several different varieties of parallel/simd loops all
> represented by GIMPLE_OMP_FOR, and I think this could be another
> GP_OMP_FOR_KIND_.
Right.  It's not a question of what runtime they call back into, but 
that both share a common overall structure.

Conceptually I look at a for loop as having 4 main components.

Initializer, test condition, increment and the body.

I'd like to hope things like the syntactic & semantic analysis of the 
first three would be largely the same.  Most of the Cilk specific bits 
would be in the handling of the body -- but there may be some 
significant code sharing that can happen there too.


>
> ...
>> As you can tell, this is not how openmp handles a #pragma omp for loop.
>
> It's different in detail, but #pragma omp parallel for works very
> similarly: it creates a separate function for the body of the loop and
> then passes that to GOMP_parallel along with any shared data.
My thoughts exactly.
jeff
Iyer, Balaji V Dec. 3, 2013, 1:25 p.m. UTC | #6
> -----Original Message-----
> From: Jeff Law [mailto:law@redhat.com]
> Sent: Tuesday, December 3, 2013 1:30 AM
> To: Jason Merrill; Iyer, Balaji V; Aldy Hernandez
> Cc: gcc-patches@gcc.gnu.org; rth@redhat.com; Jakub Jelinek
> Subject: Re: [PATCH] _Cilk_for for C and C++
> 
> On 11/27/13 17:52, Jason Merrill wrote:
> > On 11/27/2013 04:14 PM, Iyer, Balaji V wrote:
> >>     I completely agree with you that there are certain parts of Cilk
> >> Plus that is similar to OMP4, namely #pragma simd and SIMD-enabled
> >> functions (formerly called elemental functions). But, the Cilk
> >> keywords is almost completely orthogonal to OpenMP. They are
> >> semantically different  and one cannot be transformed to another.
> >> Cilk uses automatically load-balanced work-stealing using the Cilk
> >> runtime, whereas OMP uses work sharing via OMP runtime. There are a
> >> number of other semantic differences but this is the core-issue.
> >> #pragma simd and #pragma omp have converged in several places but the
> >> Cilk part has always been different from OpenMP.
> >
> > Yes, Cilk for loops will use the Cilk runtime and OMP for loops will
> > use the OMP runtime, but that doesn't mean they can't share a lot of
> > the middle end code along the way.
> >
> > We already have several different varieties of parallel/simd loops all
> > represented by GIMPLE_OMP_FOR, and I think this could be another
> > GP_OMP_FOR_KIND_.
> Right.  It's not a question of what runtime they call back into, but that both
> share a common overall structure.
> 
> Conceptually I look at a for loop as having 4 main components.
> 
> Initializer, test condition, increment and the body.
> 
> I'd like to hope things like the syntactic & semantic analysis of the first three
> would be largely the same.  Most of the Cilk specific bits would be in the
> handling of the body -- but there may be some significant code sharing that
> can happen there too.
> 
> 
> >
> > ...
> >> As you can tell, this is not how openmp handles a #pragma omp for loop.
> >
> > It's different in detail, but #pragma omp parallel for works very
> > similarly: it creates a separate function for the body of the loop and
> > then passes that to GOMP_parallel along with any shared data.
> My thoughts exactly.

I understand you both now. Let me look into the OMP routines and see what it is doing and see how I can port it to _Cilk_for. 

Thanks,

Balaji V. Iyer.

> jeff
Jakub Jelinek Dec. 3, 2013, 1:40 p.m. UTC | #7
On Tue, Dec 03, 2013 at 01:25:57PM +0000, Iyer, Balaji V wrote:
> > >> As you can tell, this is not how openmp handles a #pragma omp for loop.
> > >
> > > It's different in detail, but #pragma omp parallel for works very
> > > similarly: it creates a separate function for the body of the loop and
> > > then passes that to GOMP_parallel along with any shared data.
> > My thoughts exactly.
> 
> I understand you both now. Let me look into the OMP routines and see what
> it is doing and see how I can port it to _Cilk_for.

Yeah.  The work is actually multi-stage, first during gimplification
the code does determine what variables are used in the #pragma omp parallel
(etc., in your case _Cilk_for) region, and whether they should be shared,
or privatized (and in that case in what way, normal private, firstprivate,
lastprivate, firstprivate+lastprivate, reduction, ...).  Then there is
omplower pass (already enabled for Cilk+ due to #pragma simd) that e.g.
lowers accesses to shared variables, creates new VAR_DECLs for the
privatized vars etc. and then ompexp pass that will create the outlined body
of the function and create call to the runtime library.
I have no idea what privatization behavior _Cilk_for wants, I'd expect that
at least the IV must be privatized, otherwise it would be racy, but about
other vars?
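
To make the shared/private distinction concrete, here is a hand-written approximation of what the lowered code might look like. It is a sketch under assumptions, not the output of omplower or ompexp: `shared_record`, `outlined_chunk` and `run_chunks` are invented names, and the chunks run serially so that the result is deterministic.

```cpp
#include <cassert>

// Hypothetical data-sharing record: shared variables are reached through
// pointers, firstprivate ones are copied in by value.
struct shared_record {
  int *sum;   // shared: every chunk updates the caller's variable
  int scale;  // firstprivate: each chunk starts from its own copy
};

// Outlined body.  The induction variable ii is a local of the helper, so
// each chunk has its own copy (privatized) and chunks cannot race on it.
void outlined_chunk(shared_record *rec, int low, int high) {
  int scale = rec->scale;  // private copy initialised from the record
  for (int ii = low; ii < high; ii++)
    *rec->sum += scale * ii;  // shared access goes through the pointer
}

// Serial driver that hands out [begin, end) in pieces of CHUNK iterations.
int run_chunks(int begin, int end, int chunk) {
  assert(chunk > 0);
  int sum = 0;
  shared_record rec = { &sum, 2 };
  for (int low = begin; low < end; low += chunk) {
    int high = low + chunk < end ? low + chunk : end;
    outlined_chunk(&rec, low, high);
  }
  return sum;
}
```

If the chunks really ran in parallel, the update of the shared `sum` would itself need synchronization or a reduction; the sketch only shows where each class of variable lives.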

	Jakub
Iyer, Balaji V Dec. 3, 2013, 2:01 p.m. UTC | #8
> -----Original Message-----
> From: Jakub Jelinek [mailto:jakub@redhat.com]
> Sent: Tuesday, December 3, 2013 8:40 AM
> To: Iyer, Balaji V
> Cc: Jeff Law; Jason Merrill; Aldy Hernandez; gcc-patches@gcc.gnu.org;
> rth@redhat.com
> Subject: Re: [PATCH] _Cilk_for for C and C++
> 
> On Tue, Dec 03, 2013 at 01:25:57PM +0000, Iyer, Balaji V wrote:
> > > >> As you can tell, this is not how openmp handles a #pragma omp for loop.
> > > >
> > > > It's different in detail, but #pragma omp parallel for works very
> > > > similarly: it creates a separate function for the body of the loop
> > > > and then passes that to GOMP_parallel along with any shared data.
> > > My thoughts exactly.
> >
> > I understand you both now. Let me look into the OMP routines and see
> > what it is doing and see how I can port it to _Cilk_for.
> 
> Yeah.  The work is actually multi-stage, first during gimplification the code
> does determine what variables are used in the #pragma omp parallel (etc., in
> your case _Cilk_for) region, and whether they should be shared, or
> privatized (and in that case in what way, normal private, firstprivate,
> lastprivate, firstprivate+lastprivate, reduction, ...).  Then there is omplower
> pass (already enabled for Cilk+ due to #pragma simd) that e.g.
> lowers accesses to shared variables, creates new VAR_DECLs for the
> privatized vars etc. and then ompexp pass that will create the outlined body
> of the function and create call to the runtime library.
> I have no idea what privatization behavior _Cilk_for wants, I'd expect that at
> least the IV must be privatized, otherwise it would be racy, but about other
> vars?
> 

In Cilk_for you don't need to privatize any variables. I need to pass in the loop_count, the grain (if the user specifies one), the nested function and its context to a Cilk-specific function: __cilkrts_cilk_for_64 (or 32). The nested function has the body of the _Cilk_for and it has 3 parameters: context, start and end, where start and end are passed in by the runtime to tell it which part of the loop to execute. This thread has an example: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03567.html

Thanks,

Balaji V. Iyer.

> 	Jakub
Jakub Jelinek Dec. 3, 2013, 2:09 p.m. UTC | #9
On Tue, Dec 03, 2013 at 02:01:17PM +0000, Iyer, Balaji V wrote:
> In Cilk_for you don't need to privatize any variables. I need to pass in
> the loop_count, the grain (if the user specifies one), the nested function
> and its context to a Cilk specific function:__cilkrts_cilk_for_64 (or 32). 
> The nested function has the body of the _Cilk_for and it has 3 parameter,
> context, start and end, where the start and end are passed in by the
> runtime which will tell what parts of the loop to execute.  This thread
> has an example: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03567.html

So Cilk+ only allows say:
	_Cilk_for (int ii = 5; ii < 15; ii++)
	{
		<body>
	}
and not
	int ii;
	_Cilk_for (ii = 5; ii < 15; ii++)
	{
		<body>
	}
?  Other variables can be all shared, that is after all the default
for #pragma omp parallel for, except for the IVs and a couple of other
exceptions (e.g. readonly vars etc.), if somebody wants private vars,
they can be surely declared inside of the <body> somewhere.

	Jakub
Jeff Law Dec. 3, 2013, 7:44 p.m. UTC | #10
On 12/03/13 06:25, Iyer, Balaji V wrote:

>
> I understand you both now. Let me look into the OMP routines and see what it is doing and see how I can port it to _Cilk_for.
Thanks.  I know it's a bit of a pain, but part of what's driving the desire 
to share is to reduce the long term maintenance cost for everyone.

jeff

Patch

diff --git a/gcc/cp/cp-cilkplus.c b/gcc/cp/cp-cilkplus.c
index 414f71e..1c5cfb2
--- a/gcc/cp/cp-cilkplus.c
+++ b/gcc/cp/cp-cilkplus.c
@@ -28,7 +28,10 @@ 
 #include "tree-iterator.h"
 #include "tree-inline.h"  /* for copy_tree_body_r.  */
 #include "ggc.h"
+#include "cgraph.h"
+#include "gimple.h"
 #include "cilk.h"
+#include "gimplify.h"
 
 /* Callback for cp_walk_tree to validate the body of a pragma simd loop
    or _cilk_for loop.
@@ -180,3 +183,431 @@  cp_cilk_install_body_wframe_cleanup (tree fndecl, tree orig_body)
 			    &list);
 }
 
+/* Returns an overloaded function that does operation based on CODE using
+   OP0 and OP1.  If CRY is set to true, then the function complains when
+   it is unable to find an overloaded operator.  */
+
+static tree
+callable (location_t loc, enum tree_code code, tree op0, tree op1, bool cry)
+{
+  vec<tree, va_gc> *op1_vec = make_tree_vector_single (op1);
+  if (code == INIT_EXPR)
+    return build_special_member_call (NULL_TREE, complete_ctor_identifier,
+				      &op1_vec,
+				      TYPE_MAIN_VARIANT (TREE_TYPE (op1)), 0,
+				      cry);
+    
+  if (code == PSEUDO_DTOR_EXPR)
+    return build_special_member_call (NULL_TREE, complete_dtor_identifier,
+				      &op1_vec,
+				      TYPE_MAIN_VARIANT (TREE_TYPE (op1)), 0,
+				      cry);
+
+  int flags = LOOKUP_NORMAL;
+  tree exp = build_new_op (EXPR_LOCATION (op1), code, flags, op0, op1,
+			   NULL_TREE, NULL, tf_none);
+  if (exp == error_mark_node)
+    exp = build_x_modify_expr (EXPR_LOCATION (op1), op0, code, op1, tf_none);
+  if (exp && exp != error_mark_node)
+    return exp;
+
+  const char *op = operator_name_info[(int) code].name;
+  const char *explain = cry ? "" : "accessible, unambiguous";
+  if (op1) 
+    error_at (loc, "No%s operator%s(%T,%T) for _Cilk_for loop", explain, op, 
+	      TREE_TYPE (op0), TREE_TYPE (op1)); 
+  else 
+    error_at (loc, "No%s operator%s(%T,%T) for _Cilk_for loop", explain, op, 
+	      TREE_TYPE (op0), TREE_TYPE (op0));
+  return NULL_TREE;
+}
+
+/* Calculates the COUNT_UP and/or COUNT_DOWN values for a _Cilk_for loop using
+   its characteristics stored in *CFD.  */
+
+static void
+calc_count_up_count_down (struct cilk_for_desc *cfd, tree *count_up,
+			  tree *count_down)
+{
+  /* Reasoning for high and low variables can be found in
+     cilk_compute_loop_count in c-family/cilk.c.  */
+  tree high = cfd->end_var ? cfd->end_var : cfd->end_expr;
+  tree low = cfd->lower_bound ? cfd->lower_bound : cfd->var;
+
+  /* When these are invalid, we flag them in cilk_compute_loop_var.  This
+     condition is a bit rare.  */
+  if (high == error_mark_node || low == error_mark_node)
+    return;
+  
+  /* Only call this function if we are using an iterator.  */
+  gcc_assert (cfd->iterator);
+  
+  if (TREE_CODE (high) == TARGET_EXPR)
+    high = TARGET_EXPR_INITIAL (high);
+  if (TREE_CODE (low) == TARGET_EXPR)
+    low = TARGET_EXPR_INITIAL (low);
+  
+  if (TREE_CODE (low) == TREE_LIST)
+    low = TREE_VALUE (low);
+  high = cilk_tree_operand_noconv (high);
+  if (cfd->direction >= 0)
+    {
+      *count_up = build_x_binary_op (cfd->loc, MINUS_EXPR, high,
+				     TREE_CODE (high), low, TREE_CODE (low),
+				     NULL, tf_warning_or_error);
+      /* We should have already failed if this operator is not callable.  */
+      gcc_assert (*count_up != error_mark_node);
+    }
+  else
+    {
+      *count_down = build_x_binary_op (cfd->loc, MINUS_EXPR, low,
+				       TREE_CODE (low), high, TREE_CODE (high),
+				       NULL, tf_warning_or_error);
+      /* ...same reasoning as count up for the assert below.  */
+      gcc_assert (*count_down != error_mark_node);
+    }
+}
+
+/* Handler for iterator to compute the loop variable.  ADD_OP indicates
+   whether we need a '+' or '-' operation. LOW indicates the starting point
+   and LOOP_VAR is the induction variable.  This functin returns an 
+   INIT_EXPR.  */
+
+static tree
+compute_loop_var_cp_iter_hdl (location_t loc, enum tree_code add_op,
+			      tree low, tree loop_var, tree var2)
+{
+  tree exp = build_new_op (loc, add_op, 0, low, loop_var, NULL_TREE, 0,
+			   tf_none);
+  gcc_assert (exp != error_mark_node);
+  exp = cp_build_modify_expr (var2, INIT_EXPR, exp, tf_warning_or_error);
+  return exp;
+}
+
+/* Returns the body of the nested function for a _Cilk_for using the loop's
+   characteristic information from CFD.  The returned tree will be a
+   STATEMENT LIST.  */
+
+static tree
+cp_create_cilk_for_body (struct cilk_for_desc *cfd)
+{
+  push_function_context ();
+  declare_cilk_for_parms (cfd);
+  cfd->wd.fntype = build_function_type (void_type_node, cfd->wd.argtypes);
+
+  tree fndecl = cilk_create_cilk_helper_decl (&cfd->wd);
+  fndecl = build_lang_decl (FUNCTION_DECL, DECL_NAME (fndecl), cfd->wd.fntype);
+  if (cfd->nested_ok)
+    DECL_CONTEXT (fndecl) = current_function_decl;
+  else
+    DECL_CONTEXT (fndecl) = DECL_CONTEXT (current_function_decl);
+
+  tree outer = current_function_decl;
+  SET_DECL_LANGUAGE (fndecl, lang_c);
+  start_preparsed_function (fndecl, NULL_TREE, SF_PRE_PARSED);
+
+  declare_cilk_for_vars (cfd, fndecl);
+  
+  tree lower_bound = cfd->lower_bound;
+  struct gimplify_ctx gctx;
+
+  tree body = begin_compound_stmt (BCS_FN_BODY);
+  push_gimplify_context (&gctx);
+
+  gimple_add_tmp_var (cfd->var2);
+
+  /* Get the lower bound into a variable unless it is a constant or a
+     non-copyable value.  If non-copyable value, then reference value from
+     the outer frame.  */
+  if (!lower_bound)
+    {
+      lower_bound = cfd->var;
+      tree hack = build_decl (cfd->loc, VAR_DECL, NULL_TREE,
+			      TREE_TYPE (lower_bound));
+      DECL_CONTEXT (hack) = DECL_CONTEXT (lower_bound);
+      *pointer_map_insert (cfd->wd.decl_map, hack) = lower_bound;
+      lower_bound = hack;
+    }
+  tree cast_max_expr, count_type, pre, loop_var;
+  if (INTEGRAL_TYPE_P (cfd->var_type))
+    {
+      loop_var = create_tmp_var (cfd->var_type, NULL);
+      count_type = cfd->var_type;
+      tree cvt_expr = cp_fold_convert (cfd->var_type, cfd->min_parm);
+      pre = build_x_modify_expr (cfd->loc, loop_var, NOP_EXPR, cvt_expr,
+				 tf_warning_or_error);
+      cast_max_expr = cp_fold_convert (count_type, cfd->max_parm);
+    }
+  else
+    {
+      loop_var = create_tmp_var (TREE_TYPE (cfd->min_parm), NULL);
+      count_type = cfd->count_type;
+      pre = fold_build2 (INIT_EXPR, void_type_node, loop_var, cfd->min_parm);
+      cast_max_expr = cfd->max_parm;
+    }
+
+  tree loop_body = alloc_stmt_list ();
+  
+  /* Concatenate the control variable initialization with the loop body.
+     Do not call gimplify_and_add to append to list because we need
+     to wrap the entire list in a cleanup point expr to delay destruction
+     of the control variable to the end of the loop if it is an iterator.  */
+  tree loop_end_comp = cilk_compute_loop_var (cfd, loop_var, lower_bound,
+					      compute_loop_var_cp_iter_hdl);
+  if (loop_end_comp == error_mark_node)
+    {
+      cfd->invalid = true;
+      return error_mark_node;
+    }
+  append_to_statement_list (loop_end_comp, &loop_body);
+  tree cleanup = cxx_maybe_build_cleanup (cfd->var2, tf_none);
+  if (cleanup)
+    {
+      append_to_statement_list (cfd->body, &loop_body);
+      append_to_statement_list (cleanup, &loop_body);
+    }
+  else
+    append_to_statement_list (cfd->body, &loop_body);
+
+  loop_body = fold_build_cleanup_point_expr (void_type_node, loop_body);
+  DECL_SEEN_IN_BIND_EXPR_P (cfd->var2) = 1;
+
+  cfd->wd.context = outer;
+  bool throws = flag_exceptions ? cp_function_chain->can_throw : false;
+  cilk_outline_body (fndecl, &loop_body, &cfd->wd, &throws);
+  cp_function_chain->can_throw = throws;
+
+  tree loop_cond = fold_build2 (LT_EXPR, boolean_type_node, loop_var,
+				cast_max_expr);
+  tree mod_expr = fold_build2 (MODIFY_EXPR, void_type_node, loop_var,
+				build2 (PLUS_EXPR, count_type, loop_var,
+					build_one_cst (count_type)));
+
+  /* The for loop will look like this (assuming start < end):
+     for (ii = start; ii < end; ii++)
+       <_Cilk_for body>  */
+  add_stmt (build5 (FOR_STMT, void_type_node, pre, loop_cond, mod_expr,
+		    loop_body, NULL_TREE));
+
+  DECL_INITIAL (fndecl) = make_node (BLOCK);
+  TREE_USED (DECL_INITIAL (fndecl)) = 1;
+  BLOCK_VARS (DECL_INITIAL (fndecl)) = loop_var;
+  TREE_CHAIN (loop_var) = cfd->var2;
+
+  body = build3 (BIND_EXPR, void_type_node, loop_var, body,
+		 DECL_INITIAL (fndecl));
+  DECL_CONTEXT (cfd->var2) = fndecl;
+  pop_gimplify_context (0);
+
+  finish_function_body (body);
+  
+  /* A nested function cannot be expanded or deferred until its parent is done.
+     So, don't call expand_or_defer_fn here.  A non-nested function must be
+     done here.  */
+  if (!cfd->nested_ok)
+    expand_or_defer_fn (fndecl);
+  
+  pop_function_context ();
+  return fndecl;
+}
+
+/* Creates a nested function for the _Cilk_for statement using its information
+   in CFD.  PRE_P is the preceding gimple sequence.  */
+
+static tree
+create_cilk_for_nested_fn (struct cilk_for_desc *cfd, gimple_seq *pre_p)
+{
+  tree var = cfd->var;
+  DECL_CONTEXT (var) = current_function_decl;
+
+  if (POINTER_TYPE_P (TREE_TYPE (var)))
+    cilk_extract_free_variables (cfd->lower_bound, &cfd->wd, ADD_WRITE);
+  else
+    cilk_extract_free_variables (cfd->lower_bound, &cfd->wd, ADD_READ);
+
+  tree incr = cfd->incr;
+
+  /* If the loop increment is not an integer constant and is not a DECL,
+     copy it to a temporary.  If it is modified during the loop the behavior
+     is undefined.  Races could be avoided by copying it to a temporary
+     variable.  */
+  if (TREE_CODE (incr) != INTEGER_CST && !DECL_P (incr))
+    {
+      incr = get_formal_tmp_var (incr, pre_p);
+      cfd->incr = incr;
+    }
+
+  if (DECL_P (incr) && !TREE_STATIC (incr) && !DECL_EXTERNAL (incr))
+    *pointer_map_insert (cfd->wd.decl_map, incr) = incr;
+
+  /* Map the loop variable to integer_minus_one_node if we won't really be
+     passing it into the loop body.  Otherwise map to integer_zero_node.  */
+  *pointer_map_insert (cfd->wd.decl_map, var) =
+    (void *) (cfd->lower_bound ? integer_minus_one_node : integer_zero_node);
+  cilk_extract_free_variables (cfd->body, &cfd->wd, ADD_READ);
+
+  tree fn = cp_create_cilk_for_body (cfd);
+
+  /* One of the reasons why FN is error_mark_node is because the function
+     couldn't find the appropriate overloaded operation.  */
+  if (fn == error_mark_node)
+    return error_mark_node;
+
+  DECL_UNINLINABLE (fn) = 1;
+  DECL_STATIC_CHAIN (fn) = 1;
+
+  current_function_decl = fn;
+  /* Genericize the _Cilk_for body, mainly split up the _Cilk_for body and
+     the for-loop we inserted.  */
+  cp_genericize (fn);
+  return fn;
+}
+
+/* Helper function to gimplify a CILK_FOR_STMT.  CFD holds all the values
+   extracted from a CILK_FOR_STMT and *PRE_P is the preceding sequence.  */
+
+static void
+gimplify_cilk_for_1 (struct cilk_for_desc cfd, gimple_seq *pre_p)
+{
+  bool order_variable = false;
+  tree parent_function = current_function_decl;
+  
+  if (TREE_SIDE_EFFECTS (cfd.end_expr))
+    {
+      enum tree_code ecode = TREE_CODE (cfd.end_expr);
+      if (ecode == INIT_EXPR || ecode == MODIFY_EXPR)
+	cfd.end_var = TREE_OPERAND (cfd.end_expr, 0);
+      else if (ecode == TARGET_EXPR)
+	{
+	  cfd.end_var = TARGET_EXPR_INITIAL (cfd.end_expr);
+	  if (TREE_CODE (cfd.end_var) == AGGR_INIT_EXPR)
+	    cfd.end_var = TARGET_EXPR_SLOT (cfd.end_expr);
+	  else
+	    cfd.end_var = get_formal_tmp_var (cfd.end_var, pre_p);
+	}
+      else if (ecode == CALL_EXPR)
+	cfd.end_var = cfd.end_expr;
+      else
+	{
+	  tree ii_tree = cfd.end_expr;
+	  while (TREE_CODE_CLASS (TREE_CODE (ii_tree)) == tcc_unary)
+	    ii_tree = TREE_OPERAND (ii_tree, 0);
+	  if (TREE_CODE (ii_tree) == ADDR_EXPR)
+	    ii_tree = TREE_OPERAND (ii_tree, 0);
+	  ecode = TREE_CODE (ii_tree);
+	  tree tmp_var = cilk_tree_operand_noconv (cfd.end_expr);
+	  cfd.end_var = get_formal_tmp_var (tmp_var, pre_p);
+	  order_variable = true;
+	}
+    }
+  tree cond = cfd.cond;
+  tree op1 = TREE_OPERAND (cond, 1);
+  tree op0 = TREE_OPERAND (cond, 0);
+  enum tree_code cond_code = TREE_CODE (cond);
+
+  /* In this case below, we have an overloaded boolean comparison operation.  */
+  if (cond_code == CALL_EXPR)
+    {
+      cond_code = cilk_find_code_from_call (CALL_EXPR_FN (cond));
+      op1 = cilk_tree_operand_noconv (CALL_EXPR_ARG (cond, 1));
+      op0 = cilk_tree_operand_noconv (CALL_EXPR_ARG (cond, 0));
+      if (TREE_CODE (op0) == ADDR_EXPR || TREE_CODE (op0) == INDIRECT_REF)
+	op0 = TREE_OPERAND (op0, 0);
+    }
+  if (order_variable && op1 == cfd.end_expr)
+    op1 = cfd.end_var;
+  else if (order_variable && op0 == cfd.end_expr)
+    op0 = cfd.end_var;
+  
+  cond = callable (cfd.loc, cond_code, op0, op1, false);
+  gcc_assert (cond != NULL_TREE);
+
+  if (TREE_CODE (TREE_TYPE (cond)) != BOOLEAN_TYPE)
+    cond = perform_implicit_conversion (boolean_type_node, cond,
+					tf_warning_or_error);
+  enum tree_code div_op = NOP_EXPR;
+  tree forward = NULL_TREE, count_up = NULL_TREE, count_down = NULL_TREE;
+  cilk_calc_forward_div_op (&cfd, &div_op, &forward);
+  if (cfd.iterator)
+    calc_count_up_count_down (&cfd, &count_up, &count_down);
+  
+  tree count = cilk_compute_loop_count (&cfd, div_op, forward, count_up,
+					count_down);
+  tree fn = create_cilk_for_nested_fn (&cfd, pre_p);
+  if (fn == error_mark_node)
+    return;
+  cfd.cond = cond;
+  
+  current_function_decl = parent_function;
+  gimple_seq inner_seq = insert_cilk_for_nested_fn (&cfd, count, fn);
+  gimple_seq_add_seq (pre_p, inner_seq);
+}
+
+/* Extract all the relevant information from CFOR, a CILK_FOR_STMT tree
+   and store them in CFD structure.  */
+
+static void
+cp_extract_cilk_for_fields (struct cilk_for_desc *cfd, tree cfor)
+{
+  cfd->var = CILK_FOR_VAR (cfor);
+  cfd->cond = CILK_FOR_COND (cfor);
+  cfd->lower_bound = CILK_FOR_INIT (cfor);
+  cfd->incr = CILK_FOR_EXPR (cfor);
+  cfd->loc = EXPR_LOCATION (cfor);
+  cfd->body = CILK_FOR_BODY (cfor);
+  cfd->grain = CILK_FOR_GRAIN (cfor);
+  cfd->invalid = false;
+
+  /* This function shouldn't be setting these two variables.  */
+  cfd->ctx_arg = NULL_TREE;
+  cfd->count = NULL_TREE;
+  
+  cilk_set_init_info (cfd);
+  cilk_set_inclusive_and_direction (cfd);
+  cilk_set_iter_difftype (cfd);
+
+  if (cfd->iterator)
+    {
+      tree exp = NULL_TREE;
+      tree hack = build_decl (cfd->loc, VAR_DECL, NULL_TREE,
+			      TREE_TYPE (cfd->var));
+      if (cfd->direction >= 0)
+	exp = callable (cfd->loc, MINUS_EXPR, hack, cfd->var, true);
+      else
+	exp = callable (cfd->loc, MINUS_EXPR, cfd->var, hack, true);
+      if (!exp) 
+	{ 
+	  cfd->invalid = true;
+	  return;
+	}
+      cfd->difference_type = TYPE_MAIN_VARIANT (TREE_TYPE (exp));
+    }
+  cfd->count_type = cilk_check_loop_difference_type (cfd->difference_type);
+  cilk_set_incr_info (cfd, true);
+}
+
+/* Entry function to gimplify a CILK_FOR_STMT, *FOR_P.  *PRE_P and *POST_P are
+   the preceding and following gimple sequences of *FOR_P, respectively.  */
+
+int
+cp_gimplify_cilk_for (tree *for_p, gimple_seq *pre_p,
+		      gimple_seq *post_p ATTRIBUTE_UNUSED)
+{
+  struct cilk_for_desc cfd;
+
+  cfun->is_cilk_function = 1;
+  cilk_init_cfd (&cfd);
+
+  cp_extract_cilk_for_fields (&cfd, *for_p);
+  if (cfd.invalid)
+    {
+      *for_p = build_empty_stmt (cfd.loc);
+      return GS_ERROR;
+    }
+  cfd.nested_ok = !DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (current_function_decl);
+  gimplify_cilk_for_1 (cfd, pre_p);
+  *for_p = NULL_TREE;
+
+  return GS_ALL_DONE;
+}
+
diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index c464719..b40e9a6 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -269,6 +269,23 @@  genericize_cp_loop (tree *stmt_p, location_t start_locus, tree cond, tree body,
   *stmt_p = stmt_list;
 }
 
+/* Genericize a CILK_FOR_STMT node *STMT_P.  */
+
+static void
+genericize_cilk_for_stmt (tree *stmt_p, int *walk_subtrees, void *data)
+{
+  tree stmt = *stmt_p;
+  cp_walk_tree (&CILK_FOR_COND (stmt), cp_genericize_r, data, NULL);
+  cp_walk_tree (&CILK_FOR_INIT (stmt), cp_genericize_r, data, NULL);
+  cp_walk_tree (&CILK_FOR_GRAIN (stmt), cp_genericize_r, data, NULL);
+  cp_walk_tree (&CILK_FOR_VAR (stmt), cp_genericize_r, data, NULL);
+  cp_walk_tree (&CILK_FOR_EXPR (stmt), cp_genericize_r, data, NULL);
+
+  /* _Cilk_for body will be resolved after it is inserted into a nested
+     function.  */
+  *walk_subtrees = 0;
+} 
+
 /* Genericize a FOR_STMT node *STMT_P.  */
 
 static void
@@ -1121,6 +1138,8 @@  cp_genericize_r (tree *stmt_p, int *walk_subtrees, void *data)
     gcc_assert (!CONVERT_EXPR_VBASE_PATH (stmt));
   else if (TREE_CODE (stmt) == FOR_STMT)
     genericize_for_stmt (stmt_p, walk_subtrees, data);
+  else if (TREE_CODE (stmt) == CILK_FOR_STMT)
+    genericize_cilk_for_stmt (stmt_p, walk_subtrees, data);
   else if (TREE_CODE (stmt) == WHILE_STMT)
     genericize_while_stmt (stmt_p, walk_subtrees, data);
   else if (TREE_CODE (stmt) == DO_STMT)
diff --git a/gcc/cp/cp-objcp-common.h b/gcc/cp/cp-objcp-common.h
index eb81cb2..4d1c45e 100644
--- a/gcc/cp/cp-objcp-common.h
+++ b/gcc/cp/cp-objcp-common.h
@@ -167,4 +167,7 @@  extern void cp_common_init_ts (void);
 #undef  LANG_HOOKS_CILKPLUS_FRAME_CLEANUP
 #define LANG_HOOKS_CILKPLUS_FRAME_CLEANUP cp_cilk_install_body_wframe_cleanup
 
+#undef  LANG_HOOKS_CILKPLUS_GIMPLIFY_CILK_FOR
+#define LANG_HOOKS_CILKPLUS_GIMPLIFY_CILK_FOR cp_gimplify_cilk_for
+
 #endif /* GCC_CP_OBJCP_COMMON */
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 77daeb8..605f9b0 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5687,6 +5687,10 @@  extern void finish_for_init_stmt		(tree);
 extern void finish_for_cond			(tree, tree, bool);
 extern void finish_for_expr			(tree, tree);
 extern void finish_for_stmt			(tree);
+extern tree begin_cilk_for_stmt                 (tree, tree);
+extern void finish_cilk_for_init_stmt           (tree);
+extern tree finish_cilk_for_stmt                (tree);
+extern tree finish_cilk_for_cond                (tree);
 extern tree begin_range_for_stmt		(tree, tree);
 extern void finish_range_for_decl		(tree, tree, tree);
 extern void finish_range_for_stmt		(tree);
@@ -6182,6 +6186,8 @@  extern void vtv_build_vtable_verify_fndecl      (void);
 extern bool cpp_validate_cilk_plus_loop		(tree);
 extern void cp_cilk_install_body_wframe_cleanup (tree, tree);
 extern tree cp_cilk_copy_tree_body_r            (tree *, int *, void *);
+extern int cp_gimplify_cilk_for                 (tree *, gimple_seq *,
+						 gimple_seq *);
 
 /* In cp/cp-array-notations.c */
 extern tree expand_array_notation_exprs         (tree);
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 0f9b29b..aac3ca5 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -235,6 +235,10 @@  static tree cp_literal_operator_id
 
 static void cp_parser_cilk_simd
   (cp_parser *, cp_token *);
+static tree cp_parser_cilk_for
+  (cp_parser *, tree);
+static void cp_parser_cilk_grainsize
+  (cp_parser *, cp_token *);
 static bool cp_parser_omp_declare_reduction_exprs
   (tree, cp_parser *);
 
@@ -2060,7 +2064,8 @@  static tree cp_parser_decltype
 /* Declarators [gram.dcl.decl] */
 
 static tree cp_parser_init_declarator
-  (cp_parser *, cp_decl_specifier_seq *, vec<deferred_access_check, va_gc> *, bool, bool, int, bool *, tree *);
+  (cp_parser *, cp_decl_specifier_seq *, vec<deferred_access_check, va_gc> *,
+   bool, bool, int, bool *, tree *, tree *);
 static cp_declarator *cp_parser_declarator
   (cp_parser *, cp_parser_declarator_kind, int *, bool *, bool);
 static cp_declarator *cp_parser_direct_declarator
@@ -9353,6 +9358,7 @@  cp_parser_statement (cp_parser* parser, tree in_statement_expr,
 
 	case RID_WHILE:
 	case RID_DO:
+	case RID_CILK_FOR:
 	case RID_FOR:
 	  statement = cp_parser_iteration_statement (parser, false);
 	  break;
@@ -10508,6 +10514,17 @@  cp_parser_iteration_statement (cp_parser* parser, bool ivdep)
       }
       break;
 
+    case RID_CILK_FOR:
+      if (!flag_enable_cilkplus)
+	{ 
+	  error_at (token->location,
+		    "-fcilkplus must be enabled to use %<_Cilk_for%>");
+	  statement = error_mark_node;
+	}
+      else
+	statement = cp_parser_cilk_for (parser, NULL_TREE);
+      break;
+
     default:
       cp_parser_error (parser, "expected iteration-statement");
       statement = error_mark_node;
@@ -10627,9 +10644,15 @@  cp_parser_jump_statement (cp_parser* parser)
 	case IN_OMP_FOR:
 	  error_at (token->location, "break statement used with OpenMP for loop");
 	  break;
+
 	case IN_CILK_SIMD_FOR:
 	  error_at (token->location, "break statement used with Cilk Plus for loop");
 	  break;
+
+	case IN_CILK_FOR_STMT:
+	  error_at (token->location,
+		    "break statement used in _Cilk_for loop body");
+	  break;
 	}
       cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
       break;
@@ -10645,6 +10668,7 @@  cp_parser_jump_statement (cp_parser* parser)
 		    "continue statement within %<#pragma simd%> loop body");
 	  /* Fall through.  */
 	case IN_ITERATION_STMT:
+	case IN_CILK_FOR_STMT:
 	case IN_OMP_FOR:
 	  statement = finish_continue_stmt ();
 	  break;
@@ -11191,7 +11215,7 @@  cp_parser_simple_declaration (cp_parser* parser,
 					/*member_p=*/false,
 					declares_class_or_enum,
 					&function_definition_p,
-					maybe_range_for_decl);
+					maybe_range_for_decl, NULL);
       /* If an error occurred while parsing tentatively, exit quickly.
 	 (That usually happens when in the body of a function; each
 	 statement is treated as a declaration-statement until proven
@@ -16442,7 +16466,8 @@  cp_parser_init_declarator (cp_parser* parser,
 			   bool member_p,
 			   int declares_class_or_enum,
 			   bool* function_definition_p,
-			   tree* maybe_range_for_decl)
+			   tree* maybe_range_for_decl,
+			   tree* init)
 {
   cp_token *token = NULL, *asm_spec_start_token = NULL,
            *attributes_start_token = NULL;
@@ -16450,7 +16475,9 @@  cp_parser_init_declarator (cp_parser* parser,
   tree prefix_attributes;
   tree attributes = NULL;
   tree asm_specification;
-  tree initializer;
+  /* Initialize INITIALIZER to remove a "using potentially unset variable"
+     warning/error.  */
+  tree initializer = NULL_TREE;
   tree decl = NULL_TREE;
   tree scope;
   int is_initialized;
@@ -16587,7 +16614,8 @@  cp_parser_init_declarator (cp_parser* parser,
 	      DECL_STRUCT_FUNCTION (decl)->function_start_locus
 		= func_brace_location;
 	    }
-
+	  if (init)
+	    *init = initializer;
 	  return decl;
 	}
     }
@@ -16822,6 +16850,8 @@  cp_parser_init_declarator (cp_parser* parser,
 	finish_fully_implicit_template (parser, /*member_decl_opt=*/0);
     }
 
+  if (init)
+    *init = initializer;
   return decl;
 }
 
@@ -22987,6 +23017,7 @@  cp_parser_single_declaration (cp_parser* parser,
 				        member_p,
 				        declares_class_or_enum,
 				        &function_definition_p,
+					NULL,
 					NULL);
 
     /* 7.1.1-1 [dcl.stc]
@@ -31256,6 +31287,21 @@  cp_parser_pragma (cp_parser *parser, enum pragma_context context)
       cp_parser_cilk_simd (parser, pragma_tok);
       return true;
 
+    case PRAGMA_CILK_GRAINSIZE:
+      if (context == pragma_external)
+	{
+	  error_at (pragma_tok->location,
+		    "%<#pragma cilk grainsize%> may only be used inside a "
+		    "function");
+	  break;
+	}
+
+      /* Ignore the pragma if Cilk Plus is not enabled.  */
+      if (flag_enable_cilkplus)
+	{
+	  cp_parser_cilk_grainsize (parser, pragma_tok);
+	  return true;
+	}
     default:
       gcc_assert (id >= PRAGMA_FIRST_EXTERNAL);
       c_invoke_pragma_handler (id);
@@ -31572,6 +31618,213 @@  cp_parser_cilk_simd (cp_parser *parser, cp_token *pragma_token)
   return;
 }
 
+static tree
+cp_parser_cilk_for_init_statement (cp_parser *parser, tree *init)
+{
+  cp_token *token = cp_lexer_peek_token (parser->lexer);
+  location_t loc = token->location;
+  tree decl_init = NULL_TREE;
+  if (token->type == CPP_SEMICOLON)
+    {
+      error_at (loc, "expected induction variable");
+      return error_mark_node;
+    }
+
+  if (cp_lexer_next_token_is_keyword (parser->lexer, RID_STATIC)
+      || cp_lexer_next_token_is_keyword (parser->lexer, RID_REGISTER)
+      || cp_lexer_next_token_is_keyword (parser->lexer, RID_EXTERN)
+      || cp_lexer_next_token_is_keyword (parser->lexer, RID_MUTABLE)
+      || cp_lexer_next_token_is_keyword (parser->lexer, RID_THREAD))
+    {
+      error_at (loc, "storage class is not allowed");
+      cp_lexer_consume_token (parser->lexer);
+    }
+
+  if (token->type == CPP_NAME)
+    {
+      tree type = cp_parser_lookup_name_simple (parser, token->u.value, loc);
+      if (TREE_CODE (type) == VAR_DECL || TREE_CODE (type) == PARM_DECL)
+	{
+	  error_at (loc, "_Cilk_for loop initializer must declare variable");
+	  cp_parser_skip_to_end_of_statement (parser);
+	  return error_mark_node;
+	}
+    }
+  int flags = 0;
+  cp_decl_specifier_seq specs;
+  cp_parser_decl_specifier_seq (parser, CP_PARSER_FLAGS_NONE, &specs, &flags);
+  tree decl = cp_parser_init_declarator (parser, &specs, NULL, false, false,
+					 flags, NULL, NULL, &decl_init);
+  /* Sometimes if the initial is constant, it won't save in DECL_INITIAL,
+     and thus we need to get the initial value.  Now, if it saved the
+     DECL_INITIAL value, then just use it since it will have all the
+     necessary type casting.  */
+  if (DECL_INITIAL (decl))
+    decl_init = DECL_INITIAL (decl);
+
+  
+  if (processing_template_decl)
+    add_stmt (decl_init);
+  else
+    *init = decl_init;
+  parser->scope = NULL_TREE;
+  parser->qualifying_scope = NULL_TREE;
+  parser->object_scope = NULL_TREE;
+
+  if (decl == error_mark_node || DECL_INITIAL (decl) == error_mark_node
+      || TREE_TYPE (decl) == error_mark_node)
+    {
+      cp_parser_skip_to_end_of_statement (parser);
+      gcc_assert (errorcount || sorrycount);
+      return error_mark_node;
+    }
+  return decl;
+}
+
+static void
+cp_parser_cilk_grainsize (cp_parser *parser, cp_token *pragma_tok)
+{
+  if (cp_parser_require (parser, CPP_EQ, RT_EQ))
+    {
+      tree exp = cp_parser_binary_expression (parser, false, false,
+					      PREC_NOT_OPERATOR, NULL);
+      cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+      if (!exp || exp == error_mark_node)
+	{
+	  error_at (pragma_tok->location, "invalid grainsize for _Cilk_for");
+	  return;
+	}
+      cp_token *n_tok = cp_lexer_peek_token (parser->lexer);
+
+      /* Make sure the next token is _Cilk_for, it is invalid otherwise.  */
+      if (n_tok && n_tok->type == CPP_KEYWORD && n_tok->keyword == RID_CILK_FOR)
+	{
+	  cp_lexer_consume_token (parser->lexer);
+	  tree cfor = cp_parser_cilk_for (parser, exp);
+	  if (cfor && STATEMENT_CODE_P (TREE_CODE (cfor)))
+	    SET_EXPR_LOCATION (cfor, n_tok->location);
+	}
+      else
+	warning (0, "%<#pragma cilk grainsize%> is not followed by "
+		 "%<_Cilk_for%>");
+      return;
+    }
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+}
+
+/* Top-level function to parse _Cilk_for and the for statement
+   following <#pragma simd>.  */
+
+static tree
+cp_parser_cilk_for (cp_parser *parser, tree grain)
+{
+  bool valid = true;
+  tree cond = NULL_TREE;
+  tree incr_expr = NULL_TREE;
+  tree init = NULL_TREE;
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+
+  tree scope = begin_for_scope (&init); 
+  tree statement = begin_cilk_for_stmt (scope, init);
+
+  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
+    {
+      cp_parser_skip_to_end_of_statement (parser);
+      return error_mark_node;
+    }
+
+  /* Parse initialization.  */ 
+  tree decl = cp_parser_cilk_for_init_statement (parser, &init);
+    
+  if (decl == error_mark_node)
+    valid = false;
+  else if (!decl || (TREE_CODE (decl) != VAR_DECL
+		     && TREE_CODE (decl) != DECL_EXPR))
+    {
+      error_at (loc, "_Cilk_for loop initializer does not declare a variable");
+      valid = false;
+      decl = error_mark_node;
+    }
+  if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA))
+    {
+      error_at (loc, "_Cilk_for loop initializer cannot have multiple variable"
+		" declarations");
+      cp_parser_skip_to_end_of_statement (parser);
+      valid = false;
+    }
+
+  if (!valid)
+    /* Skip to the semicolon ending the init.  */
+    cp_parser_skip_to_end_of_statement (parser);
+  else
+    {
+      CILK_FOR_INIT (statement) = init;
+      CILK_FOR_VAR (statement) = decl;
+      finish_cilk_for_init_stmt (statement);
+    }
+
+  /* Parse condition.  */
+  if (!cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON))
+    return error_mark_node;
+  if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
+    {
+      error_at (loc, "missing condition");
+      cond = error_mark_node;
+    }
+  else
+    { 
+      cond = cp_parser_condition (parser);
+      cond = finish_cilk_for_cond (cond); 
+      CILK_FOR_COND (statement) = cond;
+    }
+
+  if (cond == error_mark_node)
+    valid = false;
+  cp_parser_consume_semicolon_at_end_of_statement (parser);
+
+  /* Parse increment.  */
+  if (cp_lexer_next_token_is (parser->lexer, CPP_CLOSE_PAREN))
+    {
+      error_at (loc, "missing increment");
+      incr_expr = error_mark_node;
+    }
+  else
+    incr_expr = cp_parser_expression (parser, false, NULL);
+  if (TREE_CODE (incr_expr) == ERROR_MARK)
+    {
+      cp_parser_skip_to_closing_parenthesis (parser, true, false, false);
+      valid = false;
+    }
+  if (!cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
+    {
+      cp_parser_skip_to_end_of_statement (parser);
+      valid = false;
+    }
+  
+  if (!valid)
+    {
+      gcc_assert (sorrycount || errorcount);
+      return error_mark_node;
+    }
+  
+  finish_for_expr (incr_expr, statement);
+  CILK_FOR_EXPR (statement) = incr_expr;
+  int saved_in_statement = parser->in_statement;
+  parser->in_statement = IN_CILK_FOR_STMT;
+  cp_parser_already_scoped_statement (parser);
+  parser->in_statement = saved_in_statement;
+  
+  CILK_FOR_GRAIN (statement) = grain;
+  statement = finish_cilk_for_stmt (statement);
+
+  /* Check if the body satisfies all the requirement of _Cilk_for.
+     If invalid, then just return error_mark_node.  */
+  if (statement == error_mark_node
+      || !cpp_validate_cilk_plus_loop (CILK_FOR_BODY (statement)))
+    return error_mark_node;
+  return statement;
+}
+
 /* Create an identifier for a generic parameter type (a synthesized
    template parameter implied by `auto' or a concept identifier). */
 
diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
index e26e350..8d1ce44 100644
--- a/gcc/cp/parser.h
+++ b/gcc/cp/parser.h
@@ -302,6 +302,8 @@  typedef struct GTY(()) cp_parser {
 #define IN_IF_STMT             16
 #define IN_CILK_SIMD_FOR       32
 #define IN_CILK_SPAWN          64
+#define IN_CILK_FOR_STMT       128
+  
   unsigned char in_statement;
 
   /* TRUE if we are presently parsing the body of a switch statement.
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 1b34434..302163f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13335,6 +13335,45 @@  tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
       finish_for_stmt (stmt);
       break;
 
+    case CILK_FOR_STMT:
+      {
+	stmt = begin_cilk_for_stmt (NULL_TREE, NULL_TREE);
+	CILK_FOR_INIT (stmt) = RECUR (CILK_FOR_INIT (t));
+	finish_cilk_for_init_stmt (stmt);
+	tmp = RECUR (CILK_FOR_VAR (t));
+	CILK_FOR_VAR (stmt) = tmp;
+	CILK_FOR_GRAIN (stmt) = CILK_FOR_GRAIN (t);
+
+	tmp = CILK_FOR_COND (t);
+	if (COMPARISON_CLASS_P (tmp))
+	  {
+	    tree op0 = RECUR (TREE_OPERAND (tmp, 0));
+	    tree op1 = RECUR (TREE_OPERAND (tmp, 1));
+	    tmp = build2 (TREE_CODE (tmp), boolean_type_node, op0, op1);
+	  }
+	CILK_FOR_COND (stmt) = tmp;
+
+	tmp = CILK_FOR_EXPR (t);
+	if (TREE_CODE (tmp) == MODIFY_EXPR)
+	  {
+	    tree lhs = TREE_OPERAND (tmp, 0);
+	    tree rhs = TREE_OPERAND (tmp, 1);
+	    lhs = RECUR (lhs);
+	    rhs = build2 (TREE_CODE (rhs), TREE_TYPE (lhs),
+			  RECUR (TREE_OPERAND (rhs, 0)),
+			  RECUR (TREE_OPERAND (rhs, 1)));
+	    tmp = build2 (MODIFY_EXPR, void_type_node, lhs, rhs);
+	  }
+	else
+	  tmp = build2 (TREE_CODE (tmp), void_type_node,
+			RECUR (TREE_OPERAND (tmp, 0)),
+			RECUR (TREE_OPERAND (tmp, 1)));
+	finish_for_expr (tmp, stmt);
+	RECUR (CILK_FOR_BODY (t));
+	stmt = finish_cilk_for_stmt (stmt);
+	CILK_FOR_GRAIN (stmt) = RECUR (CILK_FOR_GRAIN (t));	
+	break;
+      }
     case RANGE_FOR_STMT:
       {
         tree decl, expr;
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index a07b0ef..3086009 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -826,7 +826,8 @@  finish_return_stmt (tree expr)
   return r;
 }
 
-/* Begin the scope of a for-statement or a range-for-statement.
+/* Begin the scope of a for-statement, a _Cilk_for statement,
+   or a range-for-statement.
    Both the returned trees are to be used in a call to
    begin_for_stmt or begin_range_for_stmt.  */
 
@@ -899,7 +900,7 @@  finish_for_cond (tree cond, tree for_stmt, bool ivdep)
 }
 
 /* Finish the increment-EXPRESSION in a for-statement, which may be
-   given by FOR_STMT.  */
+   given by FOR_STMT or CILK_FOR_STMT.  */
 
 void
 finish_for_expr (tree expr, tree for_stmt)
@@ -926,7 +927,10 @@  finish_for_expr (tree expr, tree for_stmt)
   expr = maybe_cleanup_point_expr_void (expr);
   if (check_for_bare_parameter_packs (expr))
     expr = error_mark_node;
-  FOR_EXPR (for_stmt) = expr;
+  if (TREE_CODE (for_stmt) == CILK_FOR_STMT)
+    CILK_FOR_EXPR (for_stmt) = expr;
+  else
+    FOR_EXPR (for_stmt) = expr;
 }
 
 /* Finish the body of a for-statement, which may be given by
@@ -6655,6 +6659,18 @@  finish_omp_cancellation_point (tree clauses)
   finish_expr_stmt (stmt);
 }
 
+
+/* Perform any canonicalization of the conditional in a Cilk for loop.  */
+tree
+finish_cilk_for_cond (tree cond)
+{
+  if (!processing_template_decl)
+    return cp_truthvalue_conversion (cond);
+  else
+    return cond;
+}
+
+
 /* Begin a __transaction_atomic or __transaction_relaxed statement.
    If PCOMPOUND is non-null, this is for a function-transaction-block, and we
    should create an extra compound stmt.  */
@@ -10603,4 +10619,51 @@  capture_decltype (tree decl)
   return type;
 }
 
+/* Begin a _Cilk_for-statement.  Returns a new CILK_FOR_STMT.
+   SCOPE and INIT should be the return of begin_for_scope,
+   or both NULL_TREE.  */
+
+tree
+begin_cilk_for_stmt (tree scope, tree init)
+{
+  tree cilk_for_stmt = build_stmt (input_location, CILK_FOR_STMT, NULL_TREE,
+				   NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE,
+				   NULL_TREE, NULL_TREE);
+  if (scope == NULL_TREE)
+    {
+      if (!init)
+	scope = begin_for_scope (&init);
+    }
+  CILK_FOR_INIT (cilk_for_stmt) = init;
+  CILK_FOR_SCOPE (cilk_for_stmt) = scope;
+  return cilk_for_stmt;
+}
+
+/* Finish the for-init-statement of a _Cilk_for statement, which is given
+   by C_FOR_STMT.  */
+
+void
+finish_cilk_for_init_stmt (tree c_for_stmt)
+{
+  if (processing_template_decl)
+    CILK_FOR_INIT (c_for_stmt) = pop_stmt_list (CILK_FOR_INIT (c_for_stmt));
+  CILK_FOR_BODY (c_for_stmt) = do_pushlevel (sk_block);
+}
+
+/* Finish the body of a _Cilk_for statement, which is given by
+   CILK_FOR_STMT.  Returns a CILK_FOR_STMT that is type checked.  */
+
+tree
+finish_cilk_for_stmt (tree cilk_for_stmt)
+{
+  CILK_FOR_BODY (cilk_for_stmt) = do_poplevel (CILK_FOR_BODY (cilk_for_stmt));
+  tree *scope_ptr = &CILK_FOR_SCOPE (cilk_for_stmt);
+  tree scope = *scope_ptr;
+  *scope_ptr = NULL;
+  add_stmt (do_poplevel (scope));
+  cp_finish_cilk_for_loop (&cilk_for_stmt, processing_template_decl);
+  add_stmt (cilk_for_stmt);
+  return cilk_for_stmt;
+}
+
 #include "gt-cp-semantics.h"
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-start-at-5.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-start-at-5.cc
new file mode 100644
index 0000000..dec650c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-start-at-5.cc
@@ -0,0 +1,42 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+
+
+int j[10];
+
+int main(void)
+{
+  int error = 0;
+  int j_serial[10];
+  for (int ii = 0; ii < 10; ii++)
+    {
+      j[ii] = 10;
+      j_serial[ii] = 10;
+    }
+  _Cilk_for (int ii = 5; ii < 10; ii++)
+    {
+      j[ii]=ii;
+    }
+
+  for (int ii = 5; ii < 10; ii++)
+    {
+      j_serial[ii] = ii;
+    }
+
+  for (int ii = 0; ii < 10; ii++)
+    {
+      if (j[ii] != j_serial[ii]) 
+	error = 1;    
+    }
+
+  if (error)
+    __builtin_abort ();
+  else
+    return 0;
+
+  return j[9];
+}
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
new file mode 100644
index 0000000..8221371
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for-tplt.cc
@@ -0,0 +1,25 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+#define SIZE 100
+#define CHECK_VALUE 5
+
+template <class T>
+int func (T start, T end)
+{
+  int Array[SIZE];
+  _Cilk_for (T ii = 0; ii < end; ii++)
+    Array[ii] = CHECK_VALUE;
+  
+  for (T ii = 0; ii < end; ii++)
+    if (Array[ii] != CHECK_VALUE)
+      __builtin_abort ();
+
+  return 0;
+}
+
+int main (void)
+{
+  return func <int> (0, 100) + func <long> (0, 100);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for.cc
new file mode 100644
index 0000000..30ea29d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk-for.cc
@@ -0,0 +1,34 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+int main(int argc, char **argv)
+{
+  char Array1[26], Array2[26];
+  char Array1_Serial[26], Array2_Serial[26];
+
+  for (int ii = 0; ii < 26; ii++)  
+    { 
+      Array1[ii] = 'A'+ii;
+      Array1_Serial[ii] = 'A'+ii;
+    }
+  for (int ii = 0; ii < 26; ii++)
+    {
+      Array2[ii] = 'a'+ii;
+      Array2_Serial[ii] = 'a'+ii;
+    }
+
+  _Cilk_for (int ii = 0 ; ii < 26; ii++) 
+    Array1[ii] = Array2[ii];
+
+  for (int ii = 0; ii < 26; ii++)
+    Array1_Serial[ii] = Array2_Serial[ii];
+
+  for (int ii = 0; ii < 26; ii++)  {
+    if (Array1_Serial[ii] != Array1[ii])  { 
+	__builtin_abort ();
+    }
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_cont_inside_for.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_cont_inside_for.cc
new file mode 100644
index 0000000..3759a36
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_cont_inside_for.cc
@@ -0,0 +1,22 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+int q[10], seq[10];
+int main (int argc, char** argv)
+{
+   
+    int max = 10, start = 0;
+      _Cilk_for(int ii=max - 1; ii>=start; ii--) 
+	{ 
+	  for (int jj = 0; jj < 10; jj++)  
+	    {
+	      if (seq[jj] == 5)
+		continue;
+	      else
+		seq[jj] = 2;
+	    }
+	}
+        return 0;
+}
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_cont_with_for.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_cont_with_for.cc
new file mode 100644
index 0000000..38c4d51
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_cont_with_for.cc
@@ -0,0 +1,19 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+int q[10], seq2[10];
+int main (int argc, char** argv)
+{
+   
+    int max = 10, start = 0;
+      _Cilk_for(int ii=max - 1; ii>=start; ii--) 
+	{ 
+	  for (int jj = 0; jj < 10; jj++) 
+	    seq2[jj] = 5;
+	  continue;
+	}
+        return 0;
+}
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_cont_with_if.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_cont_with_if.cc
new file mode 100644
index 0000000..e68c700
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_cont_with_if.cc
@@ -0,0 +1,18 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+int q[10], seq2[10];
+int main (int argc, char** argv)
+{
+   
+    int max = 10, start = 0;
+      _Cilk_for(int ii = max - 1; ii >= start; ii--) 
+	{ 
+	  if (q[ii] != 0) 
+	    continue;
+	}
+        return 0;
+}
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_cont_with_while.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_cont_with_while.cc
new file mode 100644
index 0000000..17fd064
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_cont_with_while.cc
@@ -0,0 +1,23 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+int q[10], seq2[10];
+int main (int argc, char** argv)
+{
+   
+    int max = 10, start = 0;
+      _Cilk_for(int ii=max - 1; ii>=start; ii--) 
+	{ 
+	  int jj = 0;
+	  while (jj < 10)
+	    {
+	      seq2[jj] = 1;
+	      jj++;
+	    }
+	  continue;
+	}
+        return 0;
+}
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_genricize_test.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_genricize_test.cc
new file mode 100644
index 0000000..f0ad2a3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_genricize_test.cc
@@ -0,0 +1,42 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <assert.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <vector>
+#include <list>
+#if HAVE_IO 
+#include <stdio.h>
+#endif
+#define NUMBER 500
+#include <stdlib.h>
+typedef std::pair<int, int> my_type_t;
+
+long
+valid_pairs(std::vector< my_type_t > my_list) 
+{
+  _Cilk_for (int ii = 0; ii < my_list.size(); ii++) 
+    {
+#if HAVE_IO
+    fprintf(stderr, "my_list index: %d, size: %zu.\n", ii, my_list.size());
+#endif
+      if (ii < 0 || ii >= my_list.size())
+	abort (); 
+    }
+  return 0;
+}
+
+int main(int argc, char **argv) 
+{
+  std::vector<my_type_t> my_list;
+
+  for (int ii = 0; ii < NUMBER; ii++) 
+    my_list.push_back(my_type_t(ii, ii));
+  long res = valid_pairs(my_list);
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_grainsize.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_grainsize.cc
new file mode 100644
index 0000000..7d54828
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_grainsize.cc
@@ -0,0 +1,77 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+int x = 5;
+int q = 25;
+int z = 2;
+
+int square (int b)
+{
+  return (b*b);
+}
+
+template<class T>
+int templated_func (T a, T b, T c)
+{
+  T Array[10];
+#pragma cilk grainsize = a
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array[ii] = a;
+
+  for (int ii = 0; ii < 10; ii++)
+    if (Array[ii] != a)
+      __builtin_abort ();
+
+#pragma cilk grainsize = square ((int) (b/c))
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array[ii] = b;
+
+  for (int ii = 0; ii < 10; ii++)
+    if (Array[ii] != b)
+      __builtin_abort ();
+
+#pragma cilk grainsize = 2
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array[ii] = c;
+
+  for (int ii = 0; ii < 10; ii++)
+    if (Array[ii] != c)
+      __builtin_abort ();
+
+  return 0;
+}
+
+ 
+
+int main (void)
+{
+  int Array[10];
+#pragma cilk grainsize = 5
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array[ii] = 5;
+
+  for (int ii = 0; ii < 10; ii++)
+    if (Array[ii] != 5)
+      __builtin_abort ();
+
+
+#pragma cilk grainsize = x
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array[ii] = 10;
+
+  for (int ii = 0; ii < 10; ii++)
+    if (Array[ii] != 10)
+      __builtin_abort ();
+
+#pragma cilk grainsize = square (z)
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    Array[ii] = 15;
+
+  for (int ii = 0; ii < 10; ii++)
+    if (Array[ii] != 15)
+      __builtin_abort ();
+
+  int r = 5, s=10, t =15;
+  return templated_func (r, s, t);
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_p_errors.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_p_errors.cc
new file mode 100644
index 0000000..4c69712
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_p_errors.cc
@@ -0,0 +1,52 @@ 
+/* { dg-options "-fcilkplus -Wunknown-pragmas" } */
+
+int main (void)
+{
+  int a, iii = 0;
+  _Cilk_for (; iii < 10; iii++) /* { dg-error "expected induction variable" } */
+    a = 5;
+
+  _Cilk_for (iii = 0; iii < 10; iii++) /* { dg-error " must declare variable" } */
+    a = 5;
+
+  _Cilk_for (int qq = 0, jj = 0; qq < 10; qq++) /* { dg-error " initializer cannot have multiple variable declarations" } */
+    a = 5;
+
+  _Cilk_for (int ii = 0, int jj = 0; ii < 10; ii++) /* { dg-error " initializer cannot have multiple variable declarations" } */
+    a = 5;
+
+  _Cilk_for (int rr = 0; ; rr++) /* { dg-error "missing condition" } */
+    a = 5;
+
+  _Cilk_for (int ii = 0; ii = 5; ii++) /* { dg-error "invalid controlling predicate" } */
+    a = 5;
+
+  _Cilk_for (int ii = 0; ii == 5; ii++) /* { dg-error "invalid controlling predicate" } */
+    a = 5;
+
+  _Cilk_for (int ii = 0; ii < 10;) /* { dg-error "missing increment" } */
+    a = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ii ) /* { dg-error "invalid increment expression" } */
+    a = 5;
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    {
+      a = 5;
+      if (ii == 5)
+	break; /* { dg-error "break statement used in _Cilk_for loop body" } */
+    }
+
+#pragma cilk grainsize 5 /* { dg-error "expected '=' before numeric constant" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    a = 5;
+
+#pragma Silk grainsize = 5 /* { dg-warning "ignoring #pragma Silk grainsize" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    a = 5;
+#pragma cilk grainsiz = 5 /* { dg-warning "ignoring #pragma cilk grainsiz" } */
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    a = 5;
+
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_t_errors.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_t_errors.cc
new file mode 100644
index 0000000..b597764
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/cilk_for_t_errors.cc
@@ -0,0 +1,30 @@ 
+/* { dg-options "-fcilkplus" } */
+
+#include <setjmp.h>
+int main (void)
+{
+  int a, iii = 0;
+
+  _Cilk_for (volatile int ii = 0; ii < 10; ii++) /* { dg-error "iteration variable cannot be volatile" } */
+    a = 5;
+
+  _Cilk_for (static int ii = 0; ii < 10; ii++) /* { dg-error "storage class is not allowed" } */
+    a = 5;
+  _Cilk_for (register int ii = 0; ii < 10; ii++) /* { dg-error "storage class is not allowed" } */
+    a = 5;
+
+  _Cilk_for (extern int ii = 0; ii < 10; ii++) /* { dg-error "storage class is not allowed" } */
+    a = 5;
+
+  _Cilk_for (float ii = 0.0; ii < 10.0; ii += 0.5) /* { dg-error "induction variable must be of integral record or pointer type" } */
+    a = 5;
+
+  jmp_buf env;
+  _Cilk_for (int ii = 0; ii < 10; ii++) 
+    {
+      a = 5;
+      setjmp (env); /* { dg-error "calls to setjmp are not allowed within" } */
+    }
+
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/explicit_ctor.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/explicit_ctor.cc
new file mode 100644
index 0000000..89f6403
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/explicit_ctor.cc
@@ -0,0 +1,27 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+
+struct BruceBoxleitner {
+    int m;
+    BruceBoxleitner (int n = 0) : m(n) { }
+    BruceBoxleitner operator--() { --m; return *this; }
+};
+
+int operator- (BruceBoxleitner a, BruceBoxleitner b) { return a.m - b.m; }
+
+struct BruceLee {
+    int m;
+    explicit BruceLee (int n) : m(n) { }
+};
+
+bool operator> (BruceBoxleitner a, BruceLee b) { return a.m > b.m; }
+int operator- (BruceBoxleitner a, BruceLee b) { return a.m - b.m; }
+
+int main () {
+    _Cilk_for (BruceBoxleitner i = 10; i > BruceLee(0); --i)
+      ;
+    return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/label_test.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/label_test.cc
new file mode 100644
index 0000000..495e9b4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/label_test.cc
@@ -0,0 +1,26 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+int main(void)
+{
+  int jj = 0;
+  int total = 0;
+
+  _Cilk_for (int ii = 0; ii < 10; ii++)
+    {
+      if ((ii % 2) == 0)
+	goto hello_label;
+      else
+	goto world_label;
+
+hello_label:
+     __sync_fetch_and_add (&total, 1);
+world_label:
+     __sync_fetch_and_add (&total, 1);
+    }
+  if (total != 15)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/no-opp-overload-error.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/no-opp-overload-error.cc
new file mode 100644
index 0000000..582ef60
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/no-opp-overload-error.cc
@@ -0,0 +1,88 @@ 
+/* { dg-options "-fcilkplus" } */
+
+
+#define NUMBER_OF_ELEMENTS 10
+
+#include <cstdlib>
+
+class my_class {
+private:
+  int value;
+public:
+
+  my_class ();
+  my_class (const my_class &val);
+  my_class (my_class &val);
+  my_class (int val);
+  ~my_class ();
+  int getValue();
+  my_class &operator= (my_class &new_value);
+  my_class &operator= (int x);
+  my_class &operator+= (int val)
+  {
+    value += val;
+    return *this;
+  }
+  bool operator< (const my_class &val)
+  {
+    return (value < val.value);
+  }
+};
+
+
+my_class::my_class ()
+{
+  value = 0;
+}
+
+my_class::my_class(int val)
+{
+  value = val;
+}
+
+my_class::my_class (my_class &val)
+{
+  value = val.value;
+}
+
+my_class::my_class (const my_class &val)
+{
+  value = val.value;
+}
+
+my_class::~my_class ()
+{
+  value = -1;
+}
+
+int my_class::getValue ()
+{
+  return value;
+}
+
+my_class & my_class::operator= (my_class &new_value)
+{
+  value = new_value.value;
+  return *this;
+}
+
+my_class &my_class::operator= (int x)
+{
+  value = x;
+  return *this;
+}
+
+int main (void)
+{
+  int n, *array_parallel;
+  my_class length (NUMBER_OF_ELEMENTS);
+    n = NUMBER_OF_ELEMENTS;
+  
+  array_parallel = new int[NUMBER_OF_ELEMENTS];
+  _Cilk_for (my_class ii (0); ii < length; ii += 1) { /* { dg-error " No operator-" } */
+      int x = ii.getValue();
+    array_parallel [x] = x * 2;
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/plus-equal-one.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/plus-equal-one.cc
new file mode 100644
index 0000000..1326308
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/plus-equal-one.cc
@@ -0,0 +1,59 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#if HAVE_IO
+#include <cstdio>
+#endif
+#define TEST 1
+
+
+#define ITER 300
+
+int n_errors;
+#if TEST
+void test (int *array, int n, int val) {
+#if HAVE_IO
+    for (int i = 0; i < n; i++)
+      std::printf("array[%3d] = %2d\n", i, array[i]);
+#endif
+    for (int i = 0; i < n; ++i) {
+        if (array[i] != val) {
+           __builtin_abort (); 
+        }
+    }
+}
+#endif
+ 
+
+int main () {
+    int array[ITER];
+  
+    for (int ii = 0; ii < ITER; ii++)
+      array[ii] = 9;
+    _Cilk_for (int *j = (array); j < array + ITER; j += 1)  {
+       *j = 6; 
+    }
+#if TEST
+    test(array, ITER, 6);
+#endif
+
+    _Cilk_for (int *i = array; i < array + ITER; i += 1) {
+        *i = 1;
+    }
+
+#if TEST
+    test(array, ITER, 1);
+#endif
+
+    _Cilk_for (int *k = array+ITER-1; k >= array; k -= 1) {
+        *k = 8;
+    }
+#if TEST
+    test(array, ITER, 8);
+#endif
+  
+    return 0;
+
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/plus-equal-test.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/plus-equal-test.cc
new file mode 100644
index 0000000..0ca588d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/plus-equal-test.cc
@@ -0,0 +1,111 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#define NUMBER_OF_ELEMENTS 10
+
+#include <cstdlib>
+
+#if HAVE_IO
+#include <cstdio>
+#endif
+
+class my_class {
+private:
+  int value;
+public:
+
+  my_class ();
+  my_class (const my_class &val);
+  my_class (my_class &val);
+  my_class (int val);
+  ~my_class ();
+  int getValue();
+  my_class &operator= (my_class &new_value);
+  my_class &operator+= (int val)
+  {
+    value += val;
+    return *this;
+  }
+  bool operator< (const my_class &val)
+  {
+    return (value < val.value);
+  }
+};
+
+
+my_class::my_class ()
+{
+  value = 0;
+}
+
+my_class::my_class(int val)
+{
+  value = val;
+}
+
+my_class::my_class (my_class &val)
+{
+  value = val.value;
+}
+
+my_class::my_class (const my_class &val)
+{
+  value = val.value;
+}
+
+my_class::~my_class ()
+{
+  value = -1;
+}
+
+int my_class::getValue ()
+{
+  return value;
+}
+
+my_class & my_class::operator= (my_class &new_value)
+{
+  value = new_value.value;
+  return *this;
+}
+
+int operator- (my_class x, my_class y)
+{
+  int val_x = x.getValue ();
+  int val_y = y.getValue ();
+  return (val_x - val_y);
+}
+
+
+int main (void)
+{
+  int n, *array_parallel, *array_serial;
+  my_class length (NUMBER_OF_ELEMENTS);
+    n = NUMBER_OF_ELEMENTS;
+  
+  array_parallel = new int[NUMBER_OF_ELEMENTS];
+  array_serial = new int[NUMBER_OF_ELEMENTS];
+
+  _Cilk_for (my_class ii (0); ii < length; ii += 1) {
+#if HAVE_IO
+    std::printf("ii.getValue() = %d\n", ii.getValue ());
+#endif
+    array_parallel [ii.getValue ()] = ii.getValue() * 2;
+  }
+
+  for (my_class ii (0); ii < length; ii += 1)
+    array_serial [ii.getValue ()] = ii.getValue () * 2;
+  
+  for (int ii = 0; ii < NUMBER_OF_ELEMENTS; ii++)
+    if (array_serial[ii] != array_parallel[ii]) {
+#if HAVE_IO
+      std::printf("array_serial[%3d] = %6d\tarray_parallel[%3d] = %6d\n", ii,
+		  array_serial[ii], ii, array_parallel[ii]);
+#endif
+      __builtin_abort ();
+    }
+
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
new file mode 100644
index 0000000..e4f2ee5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_iter.cc
@@ -0,0 +1,58 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+
+
+#include <vector>
+#include <cstdio>
+#include <iostream>
+#include <algorithm>
+
+using namespace std;
+
+int main(void)
+{
+vector <int> array;
+vector <int> array_serial;
+
+#if 1
+for (int ii = -1; ii < 10; ii++)
+{   
+  array.push_back(ii);
+  array_serial.push_back (ii);
+}
+#endif
+#if 1
+_Cilk_for (vector<int>::iterator iter = array.begin(); iter != array.end(); 
+	   iter++)
+{
+   if (*iter  == 6) 
+     *iter = 13;
+}
+#endif
+for (vector<int>::iterator iter2 = array_serial.begin(); 
+     iter2 != array_serial.end(); iter2++)
+{
+   if (*iter2  == 6) 
+     *iter2 = 13;
+}
+sort (array.begin(), array.end());
+sort (array_serial.begin(), array_serial.end());
+
+vector <int>::iterator iter3 = array.begin ();
+vector <int>::iterator iter_serial = array_serial.begin ();
+
+while (iter3 != array.end () && iter_serial != array_serial.end ())
+{
+  if (*iter3 != *iter_serial)
+    __builtin_abort ();
+  iter3++;
+  iter_serial++;
+}
+
+return 0;
+}   
+
+
diff --git a/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
new file mode 100644
index 0000000..3e350a1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cilk-plus/CK/stl_test.cc
@@ -0,0 +1,50 @@ 
+/* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fcilkplus" } */
+/* { dg-additional-options "-lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
+
+
+#include <iostream>
+#include <cstdio>
+#include <cstdlib>
+#include <vector>
+#include <algorithm>
+#include <list>
+
+using namespace std;
+
+
+int main(int argc, char **argv)
+{
+  vector <int> number_list, number_list_serial;
+  int new_number = 0;
+  int no_elements = 0;
+  
+  if (argc != 2)
+  {
+    no_elements = 10000;
+  }
+
+
+  number_list.clear();
+  number_list_serial.clear();
+  for (int ii = 0; ii < no_elements; ii++)
+  {
+    number_list.push_back(new_number);
+    number_list_serial.push_back(new_number);
+  }
+
+  _Cilk_for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list[jj] = jj + no_elements;
+  }
+  for (int jj = 0; jj < no_elements; jj++)
+  {
+    number_list_serial[jj] = jj + no_elements;
+  }
+
+  for (int jj = 0; jj < no_elements; jj++)
+    if (number_list_serial[jj] != number_list[jj])
+      abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp b/gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp
index 90faca4..20a8b55 100644
--- a/gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp
+++ b/gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp
@@ -80,7 +80,7 @@  dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] " -f
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] " -O1 -fcilkplus $ALWAYS_CFLAGS" " "
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] " -O2 -fcilkplus $ALWAYS_CFLAGS" " "
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] " -O3 -fcilkplus $ALWAYS_CFLAGS" " "
-dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] " -g -fcilkplus" " "
+dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] " -g -fcilkplus $ALWAYS_CFLAGS" " "
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] " -g -O2 -fcilkplus $ALWAYS_CFLAGS" " "
 dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-plus/CK/*.c]] " -g -O3 -fcilkplus $ALWAYS_CFLAGS" " "
 dg-finish