Message ID | alpine.LNX.2.00.1108031540480.810@zhemvz.fhfr.qr |
---|---|
State | New |
Headers | show |
This patch causes regression in one of the Spec2000 benchmarks on arm-none-linux-gnueabi cortex-a9. The benchmark 173.applu from CFP2000 dropped performance by about 8% between revisions 177367 and 177368. Other benchmarks are not affected. In the assembly generated by the new version, the two main subroutines of applu (blts and buts) seem to use the stack a lot more. I couldn't localize the problem to a single loop. Perhaps the array index optimization increases register pressure? Thanks, Greta > -----Original Message----- > From: Richard Guenther [mailto:rguenther@suse.de] > Sent: 03 August 2011 14:48 > To: gcc-patches@gcc.gnu.org > Cc: fortran@gcc.gnu.org > Subject: [PATCH][RFC] Fix PR49957 - build array index differently > > > This fixes PR49957 by keeping the array index into a multi-dimensional > array in optimal associated form which is ((off + outermost) + ...) + > innermost) + constant) so that dependence analysis can properly > handle it. It doesn't work right now because we build the > expression in reverse order, fold thinks it can do some fancy and > the expression is of signed type and thus we know it doesn't overflow > but we also won't re-associate it to a more optimal form. > > I tried reversing the loop in gfc_conv_array_ref but that doesn't > work (for example aliasing_dummy_4.f90 ICEs), thus the funny way > of chaining the pluses. > > I also don't know if there is maybe another place we build similar > expressions that should be adjusted, too - this one is where > we build the expression for the testcase I looked at. > > The patch doesn't make 410.bwaves measurably faster, but at least > it also doesn't get slower. > > Bootstrap and regtest is currently running on x86_64-unknown-linux-gnu, > the reversed loop one was ok (well, apart from those 2-3 fails). > > Comments? Any idea why reversing the loop would break? > The ICE I got is > > /space/rguenther/src/svn/trunk/gcc/testsuite/gfortran.dg/aliasing_dummy > _4.f90: > In function 'test_f90': > /space/rguenther/src/svn/trunk/gcc/testsuite/gfortran.dg/aliasing_dummy > _4.f90:21:0: > internal compiler error: in gfc_conv_constant, at > fortran/trans-const.c:387 > Please submit a full bug report, > with preprocessed source if appropriate. > See <http://gcc.gnu.org/bugs.html> for instructions. > > Thanks, > Richard. > > 2011-08-03 Richard Guenther <rguenther@suse.de> > > PR fortran/49957 > * trans-array.c (gfc_conv_array_ref): Build the array index > expression in optimally associated order. > > * gfortran.dg/vect/O3-pr49957.f: New testcase. > > Index: gcc/fortran/trans-array.c > =================================================================== > --- gcc/fortran/trans-array.c (revision 177094) > +++ gcc/fortran/trans-array.c (working copy) > @@ -2634,7 +2634,7 @@ gfc_conv_array_ref (gfc_se * se, gfc_arr > locus * where) > { > int n; > - tree index; > + tree index, offset, *indexp; > tree tmp; > tree stride; > gfc_se indexse; > @@ -2669,9 +2669,16 @@ gfc_conv_array_ref (gfc_se * se, gfc_arr > return; > } > > - index = gfc_index_zero_node; > + offset = gfc_conv_array_offset (se->expr); > + if (TREE_CODE (offset) != INTEGER_CST) > + index = offset; > + else > + index = gfc_index_zero_node; > > - /* Calculate the offsets from all the dimensions. */ > + indexp = &index; > + > + /* Calculate the offsets from all the dimensions. Make sure to > associate > + the final offset so that we form a chain of loop invariant > summands. */ > for (n = 0; n < ar->dimen; n++) > { > /* Calculate the index for this dimension. */ > @@ -2740,15 +2747,38 @@ gfc_conv_array_ref (gfc_se * se, gfc_arr > tmp = fold_build2_loc (input_location, MULT_EXPR, > gfc_array_index_type, > indexse.expr, stride); > > - /* And add it to the total. */ > - index = fold_build2_loc (input_location, PLUS_EXPR, > - gfc_array_index_type, index, tmp); > + /* And add it to the total. Avoid folding as that re-associates > + in a non-optimal way. We want to have constant offsets as > + the outermost addition and the rest of the additions in order > + of the loop depth. */ > + if (!integer_zerop (index)) > + { > + if (TREE_CODE (tmp) == INTEGER_CST) > + { > + bool reset = indexp == &index; > + index = fold_build2_loc (input_location, PLUS_EXPR, > + gfc_array_index_type, index, tmp); > + if (reset) > + indexp = &index; > + } > + else > + { > + *indexp > + = build2_loc (input_location, PLUS_EXPR, > + gfc_array_index_type, *indexp, tmp); > + indexp = &TREE_OPERAND (*indexp, 0); > + } > + } > + else > + { > + index = tmp; > + indexp = &index; > + } > } > > - tmp = gfc_conv_array_offset (se->expr); > - if (!integer_zerop (tmp)) > + if (TREE_CODE (offset) == INTEGER_CST) > index = fold_build2_loc (input_location, PLUS_EXPR, > - gfc_array_index_type, index, tmp); > + gfc_array_index_type, index, offset); > > /* Access the calculated element. */ > tmp = gfc_conv_array_data (se->expr); > Index: gcc/testsuite/gfortran.dg/vect/O3-pr49957.f > =================================================================== > --- gcc/testsuite/gfortran.dg/vect/O3-pr49957.f (revision 0) > +++ gcc/testsuite/gfortran.dg/vect/O3-pr49957.f (revision 0) > @@ -0,0 +1,16 @@ > +! { dg-do compile } > +! { dg-require-effective-target vect_double } > + subroutine shell(nx,ny,nz) > + implicit none > + integer i,j,k,l,nx,ny,nz > + real*8 q(5,nx,ny),dq(5,nx,ny) > + do j=1,ny > + do i=1,nx > + do l=1,5 > + q(l,i,j)=q(l,i,j)+dq(l,i,j) > + enddo > + enddo > + enddo > + return > + end > +! { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail > vect_no_align } } }
Index: gcc/fortran/trans-array.c =================================================================== --- gcc/fortran/trans-array.c (revision 177094) +++ gcc/fortran/trans-array.c (working copy) @@ -2634,7 +2634,7 @@ gfc_conv_array_ref (gfc_se * se, gfc_arr locus * where) { int n; - tree index; + tree index, offset, *indexp; tree tmp; tree stride; gfc_se indexse; @@ -2669,9 +2669,16 @@ gfc_conv_array_ref (gfc_se * se, gfc_arr return; } - index = gfc_index_zero_node; + offset = gfc_conv_array_offset (se->expr); + if (TREE_CODE (offset) != INTEGER_CST) + index = offset; + else + index = gfc_index_zero_node; - /* Calculate the offsets from all the dimensions. */ + indexp = &index; + + /* Calculate the offsets from all the dimensions. Make sure to associate + the final offset so that we form a chain of loop invariant summands. */ for (n = 0; n < ar->dimen; n++) { /* Calculate the index for this dimension. */ @@ -2740,15 +2747,38 @@ gfc_conv_array_ref (gfc_se * se, gfc_arr tmp = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type, indexse.expr, stride); - /* And add it to the total. */ - index = fold_build2_loc (input_location, PLUS_EXPR, - gfc_array_index_type, index, tmp); + /* And add it to the total. Avoid folding as that re-associates + in a non-optimal way. We want to have constant offsets as + the outermost addition and the rest of the additions in order + of the loop depth. */ + if (!integer_zerop (index)) + { + if (TREE_CODE (tmp) == INTEGER_CST) + { + bool reset = indexp == &index; + index = fold_build2_loc (input_location, PLUS_EXPR, + gfc_array_index_type, index, tmp); + if (reset) + indexp = &index; + } + else + { + *indexp + = build2_loc (input_location, PLUS_EXPR, + gfc_array_index_type, *indexp, tmp); + indexp = &TREE_OPERAND (*indexp, 0); + } + } + else + { + index = tmp; + indexp = &index; + } } - tmp = gfc_conv_array_offset (se->expr); - if (!integer_zerop (tmp)) + if (TREE_CODE (offset) == INTEGER_CST) index = fold_build2_loc (input_location, PLUS_EXPR, - gfc_array_index_type, index, tmp); + gfc_array_index_type, index, offset); /* Access the calculated element. */ tmp = gfc_conv_array_data (se->expr); Index: gcc/testsuite/gfortran.dg/vect/O3-pr49957.f =================================================================== --- gcc/testsuite/gfortran.dg/vect/O3-pr49957.f (revision 0) +++ gcc/testsuite/gfortran.dg/vect/O3-pr49957.f (revision 0) @@ -0,0 +1,16 @@ +! { dg-do compile } +! { dg-require-effective-target vect_double } + subroutine shell(nx,ny,nz) + implicit none + integer i,j,k,l,nx,ny,nz + real*8 q(5,nx,ny),dq(5,nx,ny) + do j=1,ny + do i=1,nx + do l=1,5 + q(l,i,j)=q(l,i,j)+dq(l,i,j) + enddo + enddo + enddo + return + end +! { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail vect_no_align } } }