mbox series

[0/8] fortran: Inline MINLOC/MAXLOC without DIM argument [PR90608]

Message ID 20240731200735.229898-1-morin-mikael@orange.fr
Headers show
Series fortran: Inline MINLOC/MAXLOC without DIM argument [PR90608] | expand

Message

Mikael Morin July 31, 2024, 8:07 p.m. UTC
From: Mikael Morin <mikael@gcc.gnu.org>

This series of patches enable the generation of inline code for the MINLOC
and MAXLOC intrinsics, when the DIM argument is not present.  The
generated code is based on the inline implementation already generated in
the scalar case, that is when ARRAY has rank 1 and DIM is present.  The
code is extended by using several variables (one for each dimension) where
the scalar code used just one, and collecting the variables to an array
before returning.

The patches are split in a way that allows inlining in more and more cases
as controlled by the gfc_inline_intrinsic_p predicate which evolves with
the patches.

They have been generated on top of the patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657959.html

Mikael Morin (8):
  fortran: Add tests covering inline MINLOC/MAXLOC without DIM [PR90608]
  fortran: Disable frontend passes for inlinable MINLOC/MAXLOC [PR90608]
  fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1
    [PR90608]
  fortran: Outline array bound check generation code
  fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK
    [PR90608]
  fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK
    [PR90608]
  fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608]
  fortran: Continue MINLOC/MAXLOC second loop where the first stopped
    [PR90608]

 gcc/fortran/frontend-passes.cc                |   3 +-
 gcc/fortran/trans-array.cc                    | 382 ++++++++-------
 gcc/fortran/trans-intrinsic.cc                | 454 +++++++++++++-----
 gcc/testsuite/gfortran.dg/maxloc_7.f90        | 220 +++++++++
 gcc/testsuite/gfortran.dg/maxloc_bounds_4.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_5.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_6.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90 |   4 +-
 .../gfortran.dg/maxloc_with_mask_1.f90        | 393 +++++++++++++++
 gcc/testsuite/gfortran.dg/minloc_8.f90        | 220 +++++++++
 .../gfortran.dg/minloc_with_mask_1.f90        | 392 +++++++++++++++
 11 files changed, 1792 insertions(+), 288 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_with_mask_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_8.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_with_mask_1.f90

Comments

Harald Anlauf Aug. 6, 2024, 8:05 p.m. UTC | #1
Hi Mikael,

thanks for this nice set of patches!

I've played around a bit, and it seems to look good.

I have only minor comments left (besides the nan issue raised):

- inline expansion is inhibited at -Os.  But wouldn't it be good if
   we make this expansion also dependent on -ffrontend-optimize?
   (This was the case for rank-1 before your patch).

- in the case where two sets of loop(nest)s are generated, i.e. for
   non-integral argument x, I wonder if the predictors for conditionals
   (-> _likely/_unlikely) are chosen optimally.  E.g. for this code:

subroutine minloc_real (x, m, back)
   implicit none
   real, intent(in), contiguous :: x(:)
   integer                      :: m(*)
   logical, optional            :: back
   m(1:rank(x)) = minloc (x,back=back)
end

the first loop becomes:

       S.10 = second_loop_entry.9 ? idx0.7 : 1;
       while (1)
         {
           if (S.10 > D.4310) goto L.3;
           if (__builtin_expect ((integer(kind=8)) ((*x.0)[S.10 * D.4314 
+ D.4309] <= limit.8), 0, 8))
             {
               limit.8 = (*x.0)[S.10 * D.4314 + D.4309];
               pos0.5 = S.10 + offset0.6;
               idx0.7 = S.10;
               second_loop_entry.9 = 1;
               goto L.1;
             }
           S.10 = S.10 + 1;
         }

This results from this code in trans-intrinsic.cc:

   if (!lab1 || HONOR_NANS (DECL_MODE (limit)))
     {
       ...
       cond = gfc_unlikely (cond, PRED_BUILTIN_EXPECT);
       ifbody = build3_v (COND_EXPR, cond, ifbody,
			 build_empty_stmt (input_location));
     }

As the reason for this separate loop is finding a first non-nan value,
I would expect gfc_likely to be more reasonable for the common case.

(There is also the oddity S.10 = second_loop_entry.9 ? ..., where
idx0.7 seems to be not initialized, but luckily it seems to be
handled by the optimizer and seen that this is no problem.)

Having gfc_unlikely in the second set of loops is fine, as this passes
over all array elements.

Note that this is pre-existing without/before your patch, but since you
are at it, you may want to check.

Otherwise this is fine for mainline with the said issues considered.

Thanks for the patch-set!

Harald

Am 31.07.24 um 22:07 schrieb Mikael Morin:
> From: Mikael Morin <mikael@gcc.gnu.org>
> 
> This series of patches enable the generation of inline code for the MINLOC
> and MAXLOC intrinsics, when the DIM argument is not present.  The
> generated code is based on the inline implementation already generated in
> the scalar case, that is when ARRAY has rank 1 and DIM is present.  The
> code is extended by using several variables (one for each dimension) where
> the scalar code used just one, and collecting the variables to an array
> before returning.
> 
> The patches are split in a way that allows inlining in more and more cases
> as controlled by the gfc_inline_intrinsic_p predicate which evolves with
> the patches.
> 
> They have been generated on top of the patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657959.html
> 
> Mikael Morin (8):
>    fortran: Add tests covering inline MINLOC/MAXLOC without DIM [PR90608]
>    fortran: Disable frontend passes for inlinable MINLOC/MAXLOC [PR90608]
>    fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1
>      [PR90608]
>    fortran: Outline array bound check generation code
>    fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK
>      [PR90608]
>    fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK
>      [PR90608]
>    fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608]
>    fortran: Continue MINLOC/MAXLOC second loop where the first stopped
>      [PR90608]
> 
>   gcc/fortran/frontend-passes.cc                |   3 +-
>   gcc/fortran/trans-array.cc                    | 382 ++++++++-------
>   gcc/fortran/trans-intrinsic.cc                | 454 +++++++++++++-----
>   gcc/testsuite/gfortran.dg/maxloc_7.f90        | 220 +++++++++
>   gcc/testsuite/gfortran.dg/maxloc_bounds_4.f90 |   4 +-
>   gcc/testsuite/gfortran.dg/maxloc_bounds_5.f90 |   4 +-
>   gcc/testsuite/gfortran.dg/maxloc_bounds_6.f90 |   4 +-
>   gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90 |   4 +-
>   .../gfortran.dg/maxloc_with_mask_1.f90        | 393 +++++++++++++++
>   gcc/testsuite/gfortran.dg/minloc_8.f90        | 220 +++++++++
>   .../gfortran.dg/minloc_with_mask_1.f90        | 392 +++++++++++++++
>   11 files changed, 1792 insertions(+), 288 deletions(-)
>   create mode 100644 gcc/testsuite/gfortran.dg/maxloc_7.f90
>   create mode 100644 gcc/testsuite/gfortran.dg/maxloc_with_mask_1.f90
>   create mode 100644 gcc/testsuite/gfortran.dg/minloc_8.f90
>   create mode 100644 gcc/testsuite/gfortran.dg/minloc_with_mask_1.f90
>
Thomas Koenig Aug. 6, 2024, 8:57 p.m. UTC | #2
Hi Mikael and Harald,

> - inline expansion is inhibited at -Os.  But wouldn't it be good if
>    we make this expansion also dependent on -ffrontend-optimize?
>    (This was the case for rank-1 before your patch).

The original idea was to have -ffrontend-optimize as a check if anything
went wrong with front-end optimization in particular - if the bug went
away with -fno-frontend-optimize, we knew where to look (and I knew
I had to look).

So, probably better to not do this at -Os.  One thought: Should we
also do the inlining without optimization?

Very nice set of patches!

Best regards

	Thomas
Mikael Morin Aug. 7, 2024, 9:11 a.m. UTC | #3
Hello,

Le 06/08/2024 à 22:57, Thomas Koenig a écrit :
> Hi Mikael and Harald,
> 
>> - inline expansion is inhibited at -Os.  But wouldn't it be good if
>>    we make this expansion also dependent on -ffrontend-optimize?
>>    (This was the case for rank-1 before your patch).
> 
By the way, I disabled the minmaxloc frontend optimization without too 
much thought, because it was preventing me from seeing the effects of my 
patches in the dumps.  Now that both of you have put some focus on it, I 
think the optimization should be completely removed instead, because the 
patches make it unreachable.

> The original idea was to have -ffrontend-optimize as a check if anything
> went wrong with front-end optimization in particular - if the bug went
> away with -fno-frontend-optimize, we knew where to look (and I knew
> I had to look).
> 
It also provides a way for users to workaround bugs in frontend 
optimizations.  If inline expansion were dependent on the flag, it would 
also provide the same benefit, but it would be using the flag outside of 
its intended scope, so I would rather not do it.

> So, probably better to not do this at -Os.  One thought: Should we
> also do the inlining without optimization?
> 
At -Os: no inline expansion.  Don't we all agree on that?
I'm fine with also disabling expansion at -O0.

Mikael
Mikael Morin Aug. 7, 2024, 9:38 a.m. UTC | #4
Hello,

Le 06/08/2024 à 22:05, Harald Anlauf a écrit :
> Hi Mikael,
> 
> thanks for this nice set of patches!
> 
> I've played around a bit, and it seems to look good.
> 
> I have only minor comments left (besides the nan issue raised):
> 
> - inline expansion is inhibited at -Os.  But wouldn't it be good if
>    we make this expansion also dependent on -ffrontend-optimize?
>    (This was the case for rank-1 before your patch).
See my answer to Thomas' message.

> - in the case where two sets of loop(nest)s are generated, i.e. for
>    non-integral argument x, I wonder if the predictors for conditionals
>    (-> _likely/_unlikely) are chosen optimally.  E.g. for this code:
> 
> subroutine minloc_real (x, m, back)
>    implicit none
>    real, intent(in), contiguous :: x(:)
>    integer                      :: m(*)
>    logical, optional            :: back
>    m(1:rank(x)) = minloc (x,back=back)
> end
> 
> the first loop becomes:
> 
>        S.10 = second_loop_entry.9 ? idx0.7 : 1;
>        while (1)
>          {
>            if (S.10 > D.4310) goto L.3;
>            if (__builtin_expect ((integer(kind=8)) ((*x.0)[S.10 * D.4314
> + D.4309] <= limit.8), 0, 8))
>              {
>                limit.8 = (*x.0)[S.10 * D.4314 + D.4309];
>                pos0.5 = S.10 + offset0.6;
>                idx0.7 = S.10;
>                second_loop_entry.9 = 1;
>                goto L.1;
>              }
>            S.10 = S.10 + 1;
>          }
> 
> This results from this code in trans-intrinsic.cc:
> 
>    if (!lab1 || HONOR_NANS (DECL_MODE (limit)))
>      {
>        ...
>        cond = gfc_unlikely (cond, PRED_BUILTIN_EXPECT);
>        ifbody = build3_v (COND_EXPR, cond, ifbody,
>               build_empty_stmt (input_location));
>      }
> 
> As the reason for this separate loop is finding a first non-nan value,
> I would expect gfc_likely to be more reasonable for the common case.
> 
I think it is unlikely in the sense that it will be true at most once, 
because of the goto breaking out of the loop in the if body.

Your expectation is reasonable, I can't tell which is the better choice. 
  It seems also reasonable to generate code that is fast when there are 
many NANs and is slow but without visible effect when there are none, 
because there is a single slow iteration in that case.

> (There is also the oddity S.10 = second_loop_entry.9 ? ..., where
> idx0.7 seems to be not initialized, but luckily it seems to be
> handled by the optimizer and seen that this is no problem.)
> 
Yeah, I considered adding initializations, but finally decided to not 
pollute the frontend code (already complicated enough) and the generated 
code with useless stuff, that the optimizers would have to remove.

> Having gfc_unlikely in the second set of loops is fine, as this passes
> over all array elements.
> 
> Note that this is pre-existing without/before your patch, but since you
> are at it, you may want to check.
> 
Not sure how to check that.  I need a (realistic) benchmark.

> Otherwise this is fine for mainline with the said issues considered.
> 
Thanks for the review.
Will send a second version once we have settled on the topic of the 
frontend optimization flag.

Mikael
Harald Anlauf Aug. 7, 2024, 10:03 a.m. UTC | #5
Hi Mikael, Thomas!

Am 07.08.24 um 11:11 schrieb Mikael Morin:
> Hello,
> 
> Le 06/08/2024 à 22:57, Thomas Koenig a écrit :
>> Hi Mikael and Harald,
>>
>>> - inline expansion is inhibited at -Os.  But wouldn't it be good if
>>>    we make this expansion also dependent on -ffrontend-optimize?
>>>    (This was the case for rank-1 before your patch).
>>
> By the way, I disabled the minmaxloc frontend optimization without too 
> much thought, because it was preventing me from seeing the effects of my 
> patches in the dumps.  Now that both of you have put some focus on it, I 
> think the optimization should be completely removed instead, because the 
> patches make it unreachable.
> 
>> The original idea was to have -ffrontend-optimize as a check if anything
>> went wrong with front-end optimization in particular - if the bug went
>> away with -fno-frontend-optimize, we knew where to look (and I knew
>> I had to look).
>>
> It also provides a way for users to workaround bugs in frontend 
> optimizations.  If inline expansion were dependent on the flag, it would 
> also provide the same benefit, but it would be using the flag outside of 
> its intended scope, so I would rather not do it.
> 
>> So, probably better to not do this at -Os.  One thought: Should we
>> also do the inlining without optimization?
>>
> At -Os: no inline expansion.  Don't we all agree on that?
> I'm fine with also disabling expansion at -O0.

The following change to patch 2/8 does what I had in mind:

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 9f3c3ce47bc..cc0d00f4e39 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -11650,6 +11650,29 @@ gfc_inline_intrinsic_function_p (gfc_expr *expr)
      case GFC_ISYM_TRANSPOSE:
        return true;

+    case GFC_ISYM_MINLOC:
+    case GFC_ISYM_MAXLOC:
+      {
+	/* Disable inline expansion if code size matters.  */
+	if (optimize_size)
+	  return false;

	/* Disable inline expansion if frontend optimization is disabled.  */
	if (!flag_frontend_optimize)
	  return false;


As a result, the following happens:

- at -Os, inlining will never happen (as you had it)
- at -O0, the default is -fno-frontend-optimize, and we get the
   library implementation.  Inlining is forced with -ffrontend-optimize.
- at higher -Ox, the default is -ffrontend-optimize.

I believe this is also what Thomas' original motivation was.

(This flag actually helps to see that the inlining code in gcc-14
is currently broken for minloc/maxloc and optinional back argument).

As we are not planning to remove the library implementation (-Os!),
this is also the best way to compare library to inline code.

Cheers,
Harald

> Mikael
>
Mikael Morin Aug. 8, 2024, 9:09 a.m. UTC | #6
Le 07/08/2024 à 12:03, Harald Anlauf a écrit :
> Hi Mikael, Thomas!
> 
> Am 07.08.24 um 11:11 schrieb Mikael Morin:
>> Hello,
>>
>> Le 06/08/2024 à 22:57, Thomas Koenig a écrit :
>>> Hi Mikael and Harald,
>>>
>>>> - inline expansion is inhibited at -Os.  But wouldn't it be good if
>>>>    we make this expansion also dependent on -ffrontend-optimize?
>>>>    (This was the case for rank-1 before your patch).
>>>
>> By the way, I disabled the minmaxloc frontend optimization without too
>> much thought, because it was preventing me from seeing the effects of my
>> patches in the dumps.  Now that both of you have put some focus on it, I
>> think the optimization should be completely removed instead, because the
>> patches make it unreachable.
>>
>>> The original idea was to have -ffrontend-optimize as a check if anything
>>> went wrong with front-end optimization in particular - if the bug went
>>> away with -fno-frontend-optimize, we knew where to look (and I knew
>>> I had to look).
>>>
>> It also provides a way for users to workaround bugs in frontend
>> optimizations.  If inline expansion were dependent on the flag, it would
>> also provide the same benefit, but it would be using the flag outside of
>> its intended scope, so I would rather not do it.
>>
>>> So, probably better to not do this at -Os.  One thought: Should we
>>> also do the inlining without optimization?
>>>
>> At -Os: no inline expansion.  Don't we all agree on that?
>> I'm fine with also disabling expansion at -O0.
> 
> The following change to patch 2/8 does what I had in mind:
> 
> diff --git a/gcc/fortran/trans-intrinsic.cc 
> b/gcc/fortran/trans-intrinsic.cc
> index 9f3c3ce47bc..cc0d00f4e39 100644
> --- a/gcc/fortran/trans-intrinsic.cc
> +++ b/gcc/fortran/trans-intrinsic.cc
> @@ -11650,6 +11650,29 @@ gfc_inline_intrinsic_function_p (gfc_expr *expr)
>       case GFC_ISYM_TRANSPOSE:
>         return true;
> 
> +    case GFC_ISYM_MINLOC:
> +    case GFC_ISYM_MAXLOC:
> +      {
> +    /* Disable inline expansion if code size matters.  */
> +    if (optimize_size)
> +      return false;
> 
>      /* Disable inline expansion if frontend optimization is disabled.  */
>      if (!flag_frontend_optimize)
>        return false;
> 
> 
> As a result, the following happens:
> 
> - at -Os, inlining will never happen (as you had it)
> - at -O0, the default is -fno-frontend-optimize, and we get the
>    library implementation.  Inlining is forced with -ffrontend-optimize.
> - at higher -Ox, the default is -ffrontend-optimize.
> 
> I believe this is also what Thomas' original motivation was.
> 
> (This flag actually helps to see that the inlining code in gcc-14
> is currently broken for minloc/maxloc and optinional back argument).
> 
> As we are not planning to remove the library implementation (-Os!),
> this is also the best way to compare library to inline code.
> 
This makes perfect sense, but why reuse the -ffrontend-optimize option?
The manual describes it as:
> This option performs front-end optimization, based on manipulating parts the Fortran parse tree

These patches are about inlining, there is no manipulation of the parse 
tree.  So I would rather use a separate option (-finline-intrinsics?).
Steve Kargl Aug. 8, 2024, 4:37 p.m. UTC | #7
On Thu, Aug 08, 2024 at 11:09:10AM +0200, Mikael Morin wrote:
> 
> These patches are about inlining, there is no manipulation of the parse
> tree.  So I would rather use a separate option (-finline-intrinsics?).

I've only followed the discussion from afar, but gcc already supports
a -finline and -fno-inline option.

% gfortran13 -o z -fno-inline a.f90
% ./z
 0  -2.459358E-01  5.567117E-02
 1   4.347283E-02 -9.840712E-03
 2   2.546304E-01 -5.763932E-02
 3   5.837931E-02 -1.321501E-02
 4  -2.196027E-01  4.971030E-02
 5  -2.340615E-01  5.298326E-02
 6  -1.445877E-02  3.272955E-03
 7   2.167110E-01 -4.905571E-02
 8   3.178541E-01 -7.195095E-02
 9   2.918557E-01 -6.606582E-02
     4.347275E-02  2.490154E-01

gcc/opts.cc:  SET_OPTION_IF_UNSET (opts, opts_set, flag_inline_functions, value);

This, of course, controls all inlining not just intrinsic subprograms.

PS: Thanks for the series of cleanup and improvement patches!
Thomas Koenig Aug. 8, 2024, 5:13 p.m. UTC | #8
Am 08.08.24 um 11:09 schrieb Mikael Morin:
>>
>> As we are not planning to remove the library implementation (-Os!),
>> this is also the best way to compare library to inline code.
>>
> This makes perfect sense, but why reuse the -ffrontend-optimize option?
> The manual describes it as:
>> This option performs front-end optimization, based on manipulating 
>> parts the Fortran parse tree
> 
> These patches are about inlining, there is no manipulation of the parse 
> tree.  So I would rather use a separate option (-finline-intrinsics?).

I concur, -ffrontend-optimize should remain specific to manipulating
the parse tree.

Having a -finline-intrinsic options, which would be set at any
optimization level except -Os, may be the right way forward.

Or something else :-)

Best regards

	Thomas
Harald Anlauf Aug. 8, 2024, 6:38 p.m. UTC | #9
Am 08.08.24 um 19:13 schrieb Thomas Koenig:
> Am 08.08.24 um 11:09 schrieb Mikael Morin:
>>>
>>> As we are not planning to remove the library implementation (-Os!),
>>> this is also the best way to compare library to inline code.
>>>
>> This makes perfect sense, but why reuse the -ffrontend-optimize option?
>> The manual describes it as:
>>> This option performs front-end optimization, based on manipulating 
>>> parts the Fortran parse tree
>>
>> These patches are about inlining, there is no manipulation of the 
>> parse tree.  So I would rather use a separate option 
>> (-finline-intrinsics?).
> 
> I concur, -ffrontend-optimize should remain specific to manipulating
> the parse tree.
> 
> Having a -finline-intrinsic options, which would be set at any
> optimization level except -Os, may be the right way forward.
> 
> Or something else :-)

Well, I am not so convinced that -finline-intrinsics is a good choice.
What do you think users will expect from it?


Quoting from the current documentation:

-ffrontend-optimize

     This option performs front-end optimization, based on manipulating 
parts the Fortran parse tree. Enabled by default by any -O option except 
-O0 and -Og. Optimizations enabled by this option include:

         inlining calls to MATMUL,
         elimination of identical function calls within expressions,
         removing unnecessary calls to TRIM in comparisons and assignments,
         replacing TRIM(a) with a(1:LEN_TRIM(a)) and
         short-circuiting of logical operators (.AND. and .OR.).

     It can be deselected by specifying -fno-frontend-optimize.


So we currently already do selective inlining, which we extend to
more intrinsics, and may further extend in the future.

But if you think we really need a new option I'm fine with it.

Harald

> Best regards
> 
>      Thomas
>