#pragma GCC unroll support

Jason, Joseph, this is stage 1 material (unless someone else wants to make an argument for it sooner).  If you could review the parser (front-end) bits, that would be wonderful.  Richard was reviewing the mid-end and back-end bits.

On Jan 8, 2015, at 4:45 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> But I'd make the 2nd operand optional (thus use NULL_TREE,
> not integer_zero_node for existing builds).

Fixed.

> You also need to check that 'unroll' fits in the target integer_type_node
> (consider AVR!) and otherwise diagnose it.

Yeah, I’ve tightened this down.

> The 'unroll' member in struct loop should be unsigned

Fixed.

> (or you need to document what negative values are supposed to mean).

Fixed.

> It also should be smaller than int (struct loop may be heavily used), possibly
> short or even (un)signed char(?).

We talked it over here and felt unsigned short would be a good choice.  Seems to be more in the GNU spirit as well.  Cray’s documentation has a limit of 64.
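To illustrate the narrowing to unsigned short, here is a minimal sketch of a range check applied before storing a requested factor.  The helper name store_unroll_factor is hypothetical and is not taken from the patch; the patch's actual diagnostic is not shown here.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, not from the patch: store VALUE in *OUT only if
   it is representable in an unsigned short.  Returns false (and leaves
   *OUT untouched) for negative or too-large values, which a front end
   would then diagnose.  */
bool
store_unroll_factor (long value, unsigned short *out)
{
  if (value < 0 || value > USHRT_MAX)
    return false;
  *out = (unsigned short) value;
  return true;
}
```

With a check of this shape, a negative argument never reaches the field, so the member needs no special meaning for negative values.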

> @@ -341,7 +341,10 @@ tree_estimate_loop_size (struct loop *lo
>              if (likely_eliminated || likely_eliminated_last)
>                size->last_iteration_eliminated_by_peeling += num;
>            }
> -         if ((size->overall * 3 / 2 - size->eliminated_by_peeling
> +         /* A loop that we want to unroll is never too large.  */
> +         if (loop->unroll > 0)
> +           ;
> 
> but if we end up unrolling more than loop->unroll then it may be too large?
> That is this check should be in the caller, not here.

Fixed.

> I think you miss updating copy_loop_info (and places where we don't
> use that but still copy loops somehow).

I’ve fixed up copy_loop_info.  I searched for other routines based on the other members and found two.  The first one is:

/* Allocates and returns new loop structure.  */

struct loop *
alloc_loop (void)

and I thought about adding code to initialize it:

  loop->unroll = 0;

but after tracing to ensure that ggc_cleared_alloc meant what I thought it meant, I concluded that would be redundant, so I did not.
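The reasoning can be sketched outside of GCC with calloc standing in for ggc_cleared_alloc, on the assumption (the one traced above) that the allocator hands back zero-filled storage.  The struct toy_loop and toy_alloc_loop names are made up for this illustration.

```c
#include <stdlib.h>

/* Toy stand-in for struct loop; only the fields needed here.  */
struct toy_loop
{
  int num;
  unsigned short unroll;
};

/* Allocates a zero-filled toy_loop.  Because calloc clears the
   storage, unroll is already 0 and needs no explicit assignment.  */
struct toy_loop *
toy_alloc_loop (void)
{
  return (struct toy_loop *) calloc (1, sizeof (struct toy_loop));
}
```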

I had missed print_loop, so I added this to it:

+  if (loop->unroll)
+    fprintf (file, ", unroll = %d", loop->unroll);

I didn’t find any other hits.

>> I didn’t engineer ivdeps and unroll together.  Does it sound reasonable to allow both to be used at the same time on the same loop?  If so, I can add the two other cases, presently I just handle one of them then the loop.
> 
> Yes.

Fixed.
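As a sketch of what using both on one loop might look like in user code: the order of the two pragmas below is an assumption, and the function name scale is made up for the example.  The loop has no loop-carried dependence as long as dst and src do not overlap, which is what ivdep asserts.

```c
#include <stddef.h>

/* Multiplies each element of SRC by K into DST.  Callers must pass
   non-overlapping arrays, since ivdep tells the compiler to assume
   there are no loop-carried dependencies.  */
void
scale (int *dst, const int *src, size_t n, int k)
{
#pragma GCC ivdep
#pragma GCC unroll 4
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] * k;
}
```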

>> Does unroll 8 mean that the loop is repeated 8 times?
> 
> Up to you to define - what do other compilers do for #pragma unroll 0
> and #pragma unroll 1?

I googled for cray pragma unroll and found it.  They have a pragma unroll that takes a numeric argument, and 0 and 1 both mean don’t unroll.  That roughly mirrors what I was thinking those two numbers meant, so I pushed the code in that direction and got rid of the -1.  I don’t see any compelling reason to deviate from what they did, and this should make the compiler more predictable for people in this space.  I’m happier now, having read the Cray documentation about the mapping.
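A small usage sketch of those semantics, assuming the pragma is spelled as in the subject line and is placed directly before the loop.  The function names are made up for the example; both functions compute the same result, and only the generated code is expected to differ.

```c
#include <stddef.h>

/* Requests an unroll factor of 4 for the summation loop.  */
long
sum_unroll4 (const int *v, size_t n)
{
  long total = 0;
#pragma GCC unroll 4
  for (size_t i = 0; i < n; i++)
    total += v[i];
  return total;
}

/* An argument of 1 (like 0) asks that the loop not be unrolled.  */
long
sum_no_unroll (const int *v, size_t n)
{
  long total = 0;
#pragma GCC unroll 1
  for (size_t i = 0; i < n; i++)
    total += v[i];
  return total;
}
```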

>> Can I turn on peeling in try_peel_loop by simply wanted to do it for 1 loop?
> 
> ?

So, after fixing up all the other code to unroll, it seems sufficient to rely upon it and not have the peeler do anything.  This allows me to not use the peeler, so this question is now moot.

> #pragma unroll 0
> 
> or
> 
> #pragma nounroll
> 
> what do other compilers do?

Good question.  I googled for a hypothetical pragma unroll and found it.  :-)  They support 0 and 1 as don’t unroll the loop.  I’ve adjusted the patch to do the same.

>> Yes, I’m aware that this isn’t the right phase for this, but such are business cycles.  It would not go in until we reenter stage 1.  I see no value in trying to squeeze it in past stage 1.

> the middle-end bits look fine apart from the above issues (it feels like you need to add too many
> checks for loop->unroll in the peeler…)

I did a cleanup pass to collapse what tests I could across the entire patch set.  I was also able to engineer the peeler out of the unrolling.  Should be nicer now.

Tested on x86_64-unknown-linux-gnu.

Back-end/mid-end bits Ok?

C front-end bits Ok?

C++ front-end bits Ok?
