Message ID | 20240731200735.229898-1-morin-mikael@orange.fr |
---|---|
Headers | show |
Series | fortran: Inline MINLOC/MAXLOC without DIM argument [PR90608] | expand |
Hi Mikael, thanks for this nice set of patches! I've played around a bit, and it seems to look good. I have only minor comments left (besides the nan issue raised): - inline expansion is inhibited at -Os. But wouldn't it be good if we make this expansion also dependent on -ffrontend-optimize? (This was the case for rank-1 before your patch). - in the case where two sets of loop(nest)s are generated, i.e. for non-integral argument x, I wonder if the predictors for conditionals (-> _likely/_unlikely) are chosen optimally. E.g. for this code: subroutine minloc_real (x, m, back) implicit none real, intent(in), contiguous :: x(:) integer :: m(*) logical, optional :: back m(1:rank(x)) = minloc (x,back=back) end the first loop becomes: S.10 = second_loop_entry.9 ? idx0.7 : 1; while (1) { if (S.10 > D.4310) goto L.3; if (__builtin_expect ((integer(kind=8)) ((*x.0)[S.10 * D.4314 + D.4309] <= limit.8), 0, 8)) { limit.8 = (*x.0)[S.10 * D.4314 + D.4309]; pos0.5 = S.10 + offset0.6; idx0.7 = S.10; second_loop_entry.9 = 1; goto L.1; } S.10 = S.10 + 1; } This results from this code in trans-intrinsic.cc: if (!lab1 || HONOR_NANS (DECL_MODE (limit))) { ... cond = gfc_unlikely (cond, PRED_BUILTIN_EXPECT); ifbody = build3_v (COND_EXPR, cond, ifbody, build_empty_stmt (input_location)); } As the reason for this separate loop is finding a first non-nan value, I would expect gfc_likely to be more reasonable for the common case. (There is also the oddity S.10 = second_loop_entry.9 ? ..., where idx0.7 seems to be not initialized, but luckily it seems to be handled by the optimizer and seen that this is no problem.) Having gfc_unlikely in the second set of loops is fine, as this passes over all array elements. Note that this is pre-existing without/before your patch, but since you are at it, you may want to check. Otherwise this is fine for mainline with the said issues considered. Thanks for the patch-set! Harald Am 31.07.24 um 22:07 schrieb Mikael Morin: > From: Mikael Morin <mikael@gcc.gnu.org> > > This series of patches enable the generation of inline code for the MINLOC > and MAXLOC intrinsics, when the DIM argument is not present. The > generated code is based on the inline implementation already generated in > the scalar case, that is when ARRAY has rank 1 and DIM is present. The > code is extended by using several variables (one for each dimension) where > the scalar code used just one, and collecting the variables to an array > before returning. > > The patches are split in a way that allows inlining in more and more cases > as controlled by the gfc_inline_intrinsic_p predicate which evolves with > the patches. > > They have been generated on top of the patch: > https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657959.html > > Mikael Morin (8): > fortran: Add tests covering inline MINLOC/MAXLOC without DIM [PR90608] > fortran: Disable frontend passes for inlinable MINLOC/MAXLOC [PR90608] > fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1 > [PR90608] > fortran: Outline array bound check generation code > fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK > [PR90608] > fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK > [PR90608] > fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608] > fortran: Continue MINLOC/MAXLOC second loop where the first stopped > [PR90608] > > gcc/fortran/frontend-passes.cc | 3 +- > gcc/fortran/trans-array.cc | 382 ++++++++------- > gcc/fortran/trans-intrinsic.cc | 454 +++++++++++++----- > gcc/testsuite/gfortran.dg/maxloc_7.f90 | 220 +++++++++ > gcc/testsuite/gfortran.dg/maxloc_bounds_4.f90 | 4 +- > gcc/testsuite/gfortran.dg/maxloc_bounds_5.f90 | 4 +- > gcc/testsuite/gfortran.dg/maxloc_bounds_6.f90 | 4 +- > gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90 | 4 +- > .../gfortran.dg/maxloc_with_mask_1.f90 | 393 +++++++++++++++ > gcc/testsuite/gfortran.dg/minloc_8.f90 | 220 +++++++++ > .../gfortran.dg/minloc_with_mask_1.f90 | 392 +++++++++++++++ > 11 files changed, 1792 insertions(+), 288 deletions(-) > create mode 100644 gcc/testsuite/gfortran.dg/maxloc_7.f90 > create mode 100644 gcc/testsuite/gfortran.dg/maxloc_with_mask_1.f90 > create mode 100644 gcc/testsuite/gfortran.dg/minloc_8.f90 > create mode 100644 gcc/testsuite/gfortran.dg/minloc_with_mask_1.f90 >
Hi Mikael and Harald, > - inline expansion is inhibited at -Os. But wouldn't it be good if > we make this expansion also dependent on -ffrontend-optimize? > (This was the case for rank-1 before your patch). The original idea was to have -ffrontend-optimize as a check if anything went wrong with front-end optimization in particular - if the bug went away with -fno-frontend-optimize, we knew where to look (and I knew I had to look). So, probably better to not do this at -Os. One thought: Should we also do the inlining without optimization? Very nice set of patches! Best regards Thomas
Hello, Le 06/08/2024 à 22:57, Thomas Koenig a écrit : > Hi Mikael and Harald, > >> - inline expansion is inhibited at -Os. But wouldn't it be good if >> we make this expansion also dependent on -ffrontend-optimize? >> (This was the case for rank-1 before your patch). > By the way, I disabled the minmaxloc frontend optimization without too much thought, because it was preventing me from seeing the effects of my patches in the dumps. Now that both of you have put some focus on it, I think the optimization should be completely removed instead, because the patches make it unreachable. > The original idea was to have -ffrontend-optimize as a check if anything > went wrong with front-end optimization in particular - if the bug went > away with -fno-frontend-optimize, we knew where to look (and I knew > I had to look). > It also provides a way for users to workaround bugs in frontend optimizations. If inline expansion were dependent on the flag, it would also provide the same benefit, but it would be using the flag outside of its intended scope, so I would rather not do it. > So, probably better to not do this at -Os. One thought: Should we > also do the inlining without optimization? > At -Os: no inline expansion. Don't we all agree on that? I'm fine with also disabling expansion at -O0. Mikael
Hello, Le 06/08/2024 à 22:05, Harald Anlauf a écrit : > Hi Mikael, > > thanks for this nice set of patches! > > I've played around a bit, and it seems to look good. > > I have only minor comments left (besides the nan issue raised): > > - inline expansion is inhibited at -Os. But wouldn't it be good if > we make this expansion also dependent on -ffrontend-optimize? > (This was the case for rank-1 before your patch). See my answer to Thomas' message. > - in the case where two sets of loop(nest)s are generated, i.e. for > non-integral argument x, I wonder if the predictors for conditionals > (-> _likely/_unlikely) are chosen optimally. E.g. for this code: > > subroutine minloc_real (x, m, back) > implicit none > real, intent(in), contiguous :: x(:) > integer :: m(*) > logical, optional :: back > m(1:rank(x)) = minloc (x,back=back) > end > > the first loop becomes: > > S.10 = second_loop_entry.9 ? idx0.7 : 1; > while (1) > { > if (S.10 > D.4310) goto L.3; > if (__builtin_expect ((integer(kind=8)) ((*x.0)[S.10 * D.4314 > + D.4309] <= limit.8), 0, 8)) > { > limit.8 = (*x.0)[S.10 * D.4314 + D.4309]; > pos0.5 = S.10 + offset0.6; > idx0.7 = S.10; > second_loop_entry.9 = 1; > goto L.1; > } > S.10 = S.10 + 1; > } > > This results from this code in trans-intrinsic.cc: > > if (!lab1 || HONOR_NANS (DECL_MODE (limit))) > { > ... > cond = gfc_unlikely (cond, PRED_BUILTIN_EXPECT); > ifbody = build3_v (COND_EXPR, cond, ifbody, > build_empty_stmt (input_location)); > } > > As the reason for this separate loop is finding a first non-nan value, > I would expect gfc_likely to be more reasonable for the common case. > I think it is unlikely in the sense that it will be true at most once, because of the goto breaking out of the loop in the if body. Your expectation is reasonable, I can't tell which is the better choice. It seems also reasonable to generate code that is fast when there are many NANs and is slow but without visible effect when there are none, because there is a single slow iteration in that case. > (There is also the oddity S.10 = second_loop_entry.9 ? ..., where > idx0.7 seems to be not initialized, but luckily it seems to be > handled by the optimizer and seen that this is no problem.) > Yeah, I considered adding initializations, but finally decided to not pollute the frontend code (already complicated enough) and the generated code with useless stuff, that the optimizers would have to remove. > Having gfc_unlikely in the second set of loops is fine, as this passes > over all array elements. > > Note that this is pre-existing without/before your patch, but since you > are at it, you may want to check. > Not sure how to check that. I need a (realistic) benchmark. > Otherwise this is fine for mainline with the said issues considered. > Thanks for the review. Will send a second version once we have settled on the topic of the frontend optimization flag. Mikael
Hi Mikael, Thomas! Am 07.08.24 um 11:11 schrieb Mikael Morin: > Hello, > > Le 06/08/2024 à 22:57, Thomas Koenig a écrit : >> Hi Mikael and Harald, >> >>> - inline expansion is inhibited at -Os. But wouldn't it be good if >>> we make this expansion also dependent on -ffrontend-optimize? >>> (This was the case for rank-1 before your patch). >> > By the way, I disabled the minmaxloc frontend optimization without too > much thought, because it was preventing me from seeing the effects of my > patches in the dumps. Now that both of you have put some focus on it, I > think the optimization should be completely removed instead, because the > patches make it unreachable. > >> The original idea was to have -ffrontend-optimize as a check if anything >> went wrong with front-end optimization in particular - if the bug went >> away with -fno-frontend-optimize, we knew where to look (and I knew >> I had to look). >> > It also provides a way for users to workaround bugs in frontend > optimizations. If inline expansion were dependent on the flag, it would > also provide the same benefit, but it would be using the flag outside of > its intended scope, so I would rather not do it. > >> So, probably better to not do this at -Os. One thought: Should we >> also do the inlining without optimization? >> > At -Os: no inline expansion. Don't we all agree on that? > I'm fine with also disabling expansion at -O0. The following change to patch 2/8 does what I had in mind: diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc index 9f3c3ce47bc..cc0d00f4e39 100644 --- a/gcc/fortran/trans-intrinsic.cc +++ b/gcc/fortran/trans-intrinsic.cc @@ -11650,6 +11650,29 @@ gfc_inline_intrinsic_function_p (gfc_expr *expr) case GFC_ISYM_TRANSPOSE: return true; + case GFC_ISYM_MINLOC: + case GFC_ISYM_MAXLOC: + { + /* Disable inline expansion if code size matters. */ + if (optimize_size) + return false; /* Disable inline expansion if frontend optimization is disabled. */ if (!flag_frontend_optimize) return false; As a result, the following happens: - at -Os, inlining will never happen (as you had it) - at -O0, the default is -fno-frontend-optimize, and we get the library implementation. Inlining is forced with -ffrontend-optimize. - at higher -Ox, the default is -ffrontend-optimize. I believe this is also what Thomas' original motivation was. (This flag actually helps to see that the inlining code in gcc-14 is currently broken for minloc/maxloc and optinional back argument). As we are not planning to remove the library implementation (-Os!), this is also the best way to compare library to inline code. Cheers, Harald > Mikael >
Le 07/08/2024 à 12:03, Harald Anlauf a écrit : > Hi Mikael, Thomas! > > Am 07.08.24 um 11:11 schrieb Mikael Morin: >> Hello, >> >> Le 06/08/2024 à 22:57, Thomas Koenig a écrit : >>> Hi Mikael and Harald, >>> >>>> - inline expansion is inhibited at -Os. But wouldn't it be good if >>>> we make this expansion also dependent on -ffrontend-optimize? >>>> (This was the case for rank-1 before your patch). >>> >> By the way, I disabled the minmaxloc frontend optimization without too >> much thought, because it was preventing me from seeing the effects of my >> patches in the dumps. Now that both of you have put some focus on it, I >> think the optimization should be completely removed instead, because the >> patches make it unreachable. >> >>> The original idea was to have -ffrontend-optimize as a check if anything >>> went wrong with front-end optimization in particular - if the bug went >>> away with -fno-frontend-optimize, we knew where to look (and I knew >>> I had to look). >>> >> It also provides a way for users to workaround bugs in frontend >> optimizations. If inline expansion were dependent on the flag, it would >> also provide the same benefit, but it would be using the flag outside of >> its intended scope, so I would rather not do it. >> >>> So, probably better to not do this at -Os. One thought: Should we >>> also do the inlining without optimization? >>> >> At -Os: no inline expansion. Don't we all agree on that? >> I'm fine with also disabling expansion at -O0. > > The following change to patch 2/8 does what I had in mind: > > diff --git a/gcc/fortran/trans-intrinsic.cc > b/gcc/fortran/trans-intrinsic.cc > index 9f3c3ce47bc..cc0d00f4e39 100644 > --- a/gcc/fortran/trans-intrinsic.cc > +++ b/gcc/fortran/trans-intrinsic.cc > @@ -11650,6 +11650,29 @@ gfc_inline_intrinsic_function_p (gfc_expr *expr) > case GFC_ISYM_TRANSPOSE: > return true; > > + case GFC_ISYM_MINLOC: > + case GFC_ISYM_MAXLOC: > + { > + /* Disable inline expansion if code size matters. */ > + if (optimize_size) > + return false; > > /* Disable inline expansion if frontend optimization is disabled. */ > if (!flag_frontend_optimize) > return false; > > > As a result, the following happens: > > - at -Os, inlining will never happen (as you had it) > - at -O0, the default is -fno-frontend-optimize, and we get the > library implementation. Inlining is forced with -ffrontend-optimize. > - at higher -Ox, the default is -ffrontend-optimize. > > I believe this is also what Thomas' original motivation was. > > (This flag actually helps to see that the inlining code in gcc-14 > is currently broken for minloc/maxloc and optinional back argument). > > As we are not planning to remove the library implementation (-Os!), > this is also the best way to compare library to inline code. > This makes perfect sense, but why reuse the -ffrontend-optimize option? The manual describes it as: > This option performs front-end optimization, based on manipulating parts the Fortran parse tree These patches are about inlining, there is no manipulation of the parse tree. So I would rather use a separate option (-finline-intrinsics?).
On Thu, Aug 08, 2024 at 11:09:10AM +0200, Mikael Morin wrote: > > These patches are about inlining, there is no manipulation of the parse > tree. So I would rather use a separate option (-finline-intrinsics?). I've only followed the discussion from afar, but gcc already supports a -finline and -fno-inline option. % gfortran13 -o z -fno-inline a.f90 % ./z 0 -2.459358E-01 5.567117E-02 1 4.347283E-02 -9.840712E-03 2 2.546304E-01 -5.763932E-02 3 5.837931E-02 -1.321501E-02 4 -2.196027E-01 4.971030E-02 5 -2.340615E-01 5.298326E-02 6 -1.445877E-02 3.272955E-03 7 2.167110E-01 -4.905571E-02 8 3.178541E-01 -7.195095E-02 9 2.918557E-01 -6.606582E-02 4.347275E-02 2.490154E-01 gcc/opts.cc: SET_OPTION_IF_UNSET (opts, opts_set, flag_inline_functions, value); This, of course, controls all inlining not just intrinsic subprograms. PS: Thanks for the series of cleanup and improvement patches!
Am 08.08.24 um 11:09 schrieb Mikael Morin: >> >> As we are not planning to remove the library implementation (-Os!), >> this is also the best way to compare library to inline code. >> > This makes perfect sense, but why reuse the -ffrontend-optimize option? > The manual describes it as: >> This option performs front-end optimization, based on manipulating >> parts the Fortran parse tree > > These patches are about inlining, there is no manipulation of the parse > tree. So I would rather use a separate option (-finline-intrinsics?). I concur, -ffrontend-optimize should remain specific to manipulating the parse tree. Having a -finline-intrinsic options, which would be set at any optimization level except -Os, may be the right way forward. Or something else :-) Best regards Thomas
Am 08.08.24 um 19:13 schrieb Thomas Koenig: > Am 08.08.24 um 11:09 schrieb Mikael Morin: >>> >>> As we are not planning to remove the library implementation (-Os!), >>> this is also the best way to compare library to inline code. >>> >> This makes perfect sense, but why reuse the -ffrontend-optimize option? >> The manual describes it as: >>> This option performs front-end optimization, based on manipulating >>> parts the Fortran parse tree >> >> These patches are about inlining, there is no manipulation of the >> parse tree. So I would rather use a separate option >> (-finline-intrinsics?). > > I concur, -ffrontend-optimize should remain specific to manipulating > the parse tree. > > Having a -finline-intrinsic options, which would be set at any > optimization level except -Os, may be the right way forward. > > Or something else :-) Well, I am not so convinced that -finline-intrinsics is a good choice. What do you think users will expect from it? Quoting from the current documentation: -ffrontend-optimize This option performs front-end optimization, based on manipulating parts the Fortran parse tree. Enabled by default by any -O option except -O0 and -Og. Optimizations enabled by this option include: inlining calls to MATMUL, elimination of identical function calls within expressions, removing unnecessary calls to TRIM in comparisons and assignments, replacing TRIM(a) with a(1:LEN_TRIM(a)) and short-circuiting of logical operators (.AND. and .OR.). It can be deselected by specifying -fno-frontend-optimize. So we currently already do selective inlining, which we extend to more intrinsics, and may further extend in the future. But if you think we really need a new option I'm fine with it. Harald > Best regards > > Thomas >
From: Mikael Morin <mikael@gcc.gnu.org> This series of patches enable the generation of inline code for the MINLOC and MAXLOC intrinsics, when the DIM argument is not present. The generated code is based on the inline implementation already generated in the scalar case, that is when ARRAY has rank 1 and DIM is present. The code is extended by using several variables (one for each dimension) where the scalar code used just one, and collecting the variables to an array before returning. The patches are split in a way that allows inlining in more and more cases as controlled by the gfc_inline_intrinsic_p predicate which evolves with the patches. They have been generated on top of the patch: https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657959.html Mikael Morin (8): fortran: Add tests covering inline MINLOC/MAXLOC without DIM [PR90608] fortran: Disable frontend passes for inlinable MINLOC/MAXLOC [PR90608] fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1 [PR90608] fortran: Outline array bound check generation code fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK [PR90608] fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK [PR90608] fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608] fortran: Continue MINLOC/MAXLOC second loop where the first stopped [PR90608] gcc/fortran/frontend-passes.cc | 3 +- gcc/fortran/trans-array.cc | 382 ++++++++------- gcc/fortran/trans-intrinsic.cc | 454 +++++++++++++----- gcc/testsuite/gfortran.dg/maxloc_7.f90 | 220 +++++++++ gcc/testsuite/gfortran.dg/maxloc_bounds_4.f90 | 4 +- gcc/testsuite/gfortran.dg/maxloc_bounds_5.f90 | 4 +- gcc/testsuite/gfortran.dg/maxloc_bounds_6.f90 | 4 +- gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90 | 4 +- .../gfortran.dg/maxloc_with_mask_1.f90 | 393 +++++++++++++++ gcc/testsuite/gfortran.dg/minloc_8.f90 | 220 +++++++++ .../gfortran.dg/minloc_with_mask_1.f90 | 392 +++++++++++++++ 11 files changed, 1792 insertions(+), 288 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/maxloc_7.f90 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_with_mask_1.f90 create mode 100644 gcc/testsuite/gfortran.dg/minloc_8.f90 create mode 100644 gcc/testsuite/gfortran.dg/minloc_with_mask_1.f90