Message ID | 201407091602.s69G2QOm014547@d06av02.portsmouth.uk.ibm.com |
---|---|
State | New |
Headers | show |
On Wed, Jul 9, 2014 at 12:02 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > Hello, > > the implementation of homogenous float aggregates for the ELFv2 ABI has > unfortunately shown to have a bug in a corner case. > > The problem is that because such aggregates are packed in the argument > save area, but each (4-byte) float occupies one of just 13 registers > on its own, we may run out of registers while we're still within the > first 64 bytes of the argument save area. > > Usually, any argument that doesn't fit into register should go in > memory. But that rule doesn't apply within the first 64 bytes, where > such arguments need to go into GPRs first. This is important since > the ABI guarantees that the first 64 bytes of the save area are free, > e.g. to store GPRs into. If an argument is actually passed within > those first 64 bytes, some code (e.g. libffi assembler stubs) may > clobber its contents. > > Now, the existing rs6000_function_arg code will handle this case > correctly if the extra floats come in a *new* argument. For example, > in the following test case > > struct float2 { float x[2]; }; > struct float6 { float x[6]; }; > struct float8 { float x[8]; }; > > float func (struct float8 a, struct float6 b, struct float2 c); > > both parts of "c" are correctly expected in r10. > > However, the code handles incorrectly the case where a *single* > aggregate argument is split between FPRs and "extra" floats. > For example, "b.x[5]" is expected in memory, although it ought > to reside in r9. > > The appended patch fixes the implementation to comply with the ABI. > > This is an ABI change for the affected corner cases of the ELFv2 > ABI. However, those cases should be extremely rare; the full > compat.exe and struct-layout-1.exp ABI compatibility test suite > passed, with the exception of two tests specifically intended > to test multiple homogeneous float aggregates. > > Tested on powerpc64le-linux. > > OK for mainline? > [ The patch should then also go into the 4.8 and 4.9 branches for > consistency. ] Okay everywhere. Thanks, David
On Wed, Jul 9, 2014 at 6:02 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > Hello, > > the implementation of homogenous float aggregates for the ELFv2 ABI has > unfortunately shown to have a bug in a corner case. > > The problem is that because such aggregates are packed in the argument > save area, but each (4-byte) float occupies one of just 13 registers > on its own, we may run out of registers while we're still within the > first 64 bytes of the argument save area. > > Usually, any argument that doesn't fit into register should go in > memory. But that rule doesn't apply within the first 64 bytes, where > such arguments need to go into GPRs first. This is important since > the ABI guarantees that the first 64 bytes of the save area are free, > e.g. to store GPRs into. If an argument is actually passed within > those first 64 bytes, some code (e.g. libffi assembler stubs) may > clobber its contents. > > Now, the existing rs6000_function_arg code will handle this case > correctly if the extra floats come in a *new* argument. For example, > in the following test case > > struct float2 { float x[2]; }; > struct float6 { float x[6]; }; > struct float8 { float x[8]; }; > > float func (struct float8 a, struct float6 b, struct float2 c); > > both parts of "c" are correctly expected in r10. > > However, the code handles incorrectly the case where a *single* > aggregate argument is split between FPRs and "extra" floats. > For example, "b.x[5]" is expected in memory, although it ought > to reside in r9. > > The appended patch fixes the implementation to comply with the ABI. > > This is an ABI change for the affected corner cases of the ELFv2 > ABI. However, those cases should be extremely rare; the full > compat.exe and struct-layout-1.exp ABI compatibility test suite > passed, with the exception of two tests specifically intended > to test multiple homogeneous float aggregates. > > Tested on powerpc64le-linux. > > OK for mainline? > [ The patch should then also go into the 4.8 and 4.9 branches for > consistency. ] Can you add -Wpsabi warnings for all the issues you found? That way a world re-build will at least show if anything important is affected. Thanks, Richard. > Bye, > Ulrich > > > ChangeLog: > > * config/rs6000/rs6000.c (rs6000_function_arg): If a float argument > does not fit fully into floating-point registers, and there is still > space in the register parameter area, use GPRs to pass those parts > of the argument. > (rs6000_arg_partial_bytes): Likewise. > > Index: gcc/config/rs6000/rs6000.c > =================================================================== > --- gcc/config/rs6000/rs6000.c (revision 212100) > +++ gcc/config/rs6000/rs6000.c (working copy) > @@ -10227,6 +10227,7 @@ > rtx r, off; > int i, k = 0; > unsigned long n_fpreg = (GET_MODE_SIZE (elt_mode) + 7) >> 3; > + int fpr_words; > > /* Do we also need to pass this argument in the parameter > save area? */ > @@ -10255,6 +10256,37 @@ > rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off); > } > > + /* If there were not enough FPRs to hold the argument, the rest > + usually goes into memory. However, if the current position > + is still within the register parameter area, a portion may > + actually have to go into GPRs. > + > + Note that it may happen that the portion of the argument > + passed in the first "half" of the first GPR was already > + passed in the last FPR as well. > + > + For unnamed arguments, we already set up GPRs to cover the > + whole argument in rs6000_psave_function_arg, so there is > + nothing further to do at this point. */ > + fpr_words = (i * GET_MODE_SIZE (elt_mode)) / (TARGET_32BIT ? 4 : 8); > + if (i < n_elts && align_words + fpr_words < GP_ARG_NUM_REG > + && cum->nargs_prototype > 0) > + { > + enum machine_mode rmode = TARGET_32BIT ? SImode : DImode; > + int n_words = rs6000_arg_size (mode, type); > + > + align_words += fpr_words; > + n_words -= fpr_words; > + > + do > + { > + r = gen_rtx_REG (rmode, GP_ARG_MIN_REG + align_words); > + off = GEN_INT (fpr_words++ * GET_MODE_SIZE (rmode)); > + rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off); > + } > + while (++align_words < GP_ARG_NUM_REG && --n_words != 0); > + } > + > return rs6000_finish_function_arg (mode, rvec, k); > } > else if (align_words < GP_ARG_NUM_REG) > @@ -10330,8 +10362,23 @@ > /* Otherwise, we pass in FPRs only. Check for partial copies. */ > passed_in_gprs = false; > if (cum->fregno + n_elts * n_fpreg > FP_ARG_MAX_REG + 1) > - ret = ((FP_ARG_MAX_REG + 1 - cum->fregno) > - * MIN (8, GET_MODE_SIZE (elt_mode))); > + { > + /* Compute number of bytes / words passed in FPRs. If there > + is still space available in the register parameter area > + *after* that amount, a part of the argument will be passed > + in GPRs. In that case, the total amount passed in any > + registers is equal to the amount that would have been passed > + in GPRs if everything were passed there, so we fall back to > + the GPR code below to compute the appropriate value. */ > + int fpr = ((FP_ARG_MAX_REG + 1 - cum->fregno) > + * MIN (8, GET_MODE_SIZE (elt_mode))); > + int fpr_words = fpr / (TARGET_32BIT ? 4 : 8); > + > + if (align_words + fpr_words < GP_ARG_NUM_REG) > + passed_in_gprs = true; > + else > + ret = fpr; > + } > } > > if (passed_in_gprs > -- > Dr. Ulrich Weigand > GNU/Linux compilers and toolchain > Ulrich.Weigand@de.ibm.com >
On Wed, Jul 09, 2014 at 02:19:36PM -0400, David Edelsohn wrote: > > This is an ABI change for the affected corner cases of the ELFv2 > > ABI. However, those cases should be extremely rare; the full > > compat.exe and struct-layout-1.exp ABI compatibility test suite > > passed, with the exception of two tests specifically intended > > to test multiple homogeneous float aggregates. > > > > Tested on powerpc64le-linux. > > > > OK for mainline? > > [ The patch should then also go into the 4.8 and 4.9 branches for > > consistency. ] > > Okay everywhere. IMHO it should only go to the trunk, same as https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00540.html (see also related PRs, has been discussed on IRC too). If it is a corner case, the better, fewer people will be affected when transitioning from 4.9.x to 5.0 (or 4.10?). But 4.9.0 is for a few months in the wild already, I don't think ABI changes are something people would like to see between 4.9.0 and 4.9.1 or any other patchlevel revision, most people aren't prepared to rebuild the world just because they have upgraded to a newer compiler snapshot or new patchlevel version. Jakub
Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 212100) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -10227,6 +10227,7 @@ rtx r, off; int i, k = 0; unsigned long n_fpreg = (GET_MODE_SIZE (elt_mode) + 7) >> 3; + int fpr_words; /* Do we also need to pass this argument in the parameter save area? */ @@ -10255,6 +10256,37 @@ rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off); } + /* If there were not enough FPRs to hold the argument, the rest + usually goes into memory. However, if the current position + is still within the register parameter area, a portion may + actually have to go into GPRs. + + Note that it may happen that the portion of the argument + passed in the first "half" of the first GPR was already + passed in the last FPR as well. + + For unnamed arguments, we already set up GPRs to cover the + whole argument in rs6000_psave_function_arg, so there is + nothing further to do at this point. */ + fpr_words = (i * GET_MODE_SIZE (elt_mode)) / (TARGET_32BIT ? 4 : 8); + if (i < n_elts && align_words + fpr_words < GP_ARG_NUM_REG + && cum->nargs_prototype > 0) + { + enum machine_mode rmode = TARGET_32BIT ? SImode : DImode; + int n_words = rs6000_arg_size (mode, type); + + align_words += fpr_words; + n_words -= fpr_words; + + do + { + r = gen_rtx_REG (rmode, GP_ARG_MIN_REG + align_words); + off = GEN_INT (fpr_words++ * GET_MODE_SIZE (rmode)); + rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off); + } + while (++align_words < GP_ARG_NUM_REG && --n_words != 0); + } + return rs6000_finish_function_arg (mode, rvec, k); } else if (align_words < GP_ARG_NUM_REG) @@ -10330,8 +10362,23 @@ /* Otherwise, we pass in FPRs only. Check for partial copies. */ passed_in_gprs = false; if (cum->fregno + n_elts * n_fpreg > FP_ARG_MAX_REG + 1) - ret = ((FP_ARG_MAX_REG + 1 - cum->fregno) - * MIN (8, GET_MODE_SIZE (elt_mode))); + { + /* Compute number of bytes / words passed in FPRs. If there + is still space available in the register parameter area + *after* that amount, a part of the argument will be passed + in GPRs. In that case, the total amount passed in any + registers is equal to the amount that would have been passed + in GPRs if everything were passed there, so we fall back to + the GPR code below to compute the appropriate value. */ + int fpr = ((FP_ARG_MAX_REG + 1 - cum->fregno) + * MIN (8, GET_MODE_SIZE (elt_mode))); + int fpr_words = fpr / (TARGET_32BIT ? 4 : 8); + + if (align_words + fpr_words < GP_ARG_NUM_REG) + passed_in_gprs = true; + else + ret = fpr; + } } if (passed_in_gprs