diff mbox series

Add param for bb limit to invoke fast_vrp.

Message ID 1bb04945-d13b-4805-b9ed-0be4a5c773fc@redhat.com
State New
Headers show
Series Add param for bb limit to invoke fast_vrp. | expand

Commit Message

Andrew MacLeod June 21, 2024, 1:02 p.m. UTC
This patch adds

     --param=vrp-block-limit=N

When the basic block counter for a function exceeded 'N' , VRP is 
invoked with the new fast_vrp algorithm instead.   This algorithm uses a 
lot less memory and processing power, although it does get a few less 
things.

Primary motivation is cases like 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3  VRP 
passes consume about 600 seconds of the compile time, and a lot of 
memory.      With fast_vrp, it spends less than 10 seconds total in the 
3 passes of VRP.     This test case has about 400,000 basic blocks.

The default for N in this patch is 150,000,  arbitrarily chosen.

This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0 
as well) on x86_64-pc-linux-gnu, with no regressions.

What do you think, OK for trunk?

Andrew

PS sorry,. it doesn't help the threader in that PR :-(

Comments

Richard Biener June 22, 2024, 1:15 p.m. UTC | #1
On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> This patch adds
>
>      --param=vrp-block-limit=N
>
> When the basic block counter for a function exceeded 'N' , VRP is
> invoked with the new fast_vrp algorithm instead.   This algorithm uses a
> lot less memory and processing power, although it does get a few less
> things.
>
> Primary motivation is cases like
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3  VRP
> passes consume about 600 seconds of the compile time, and a lot of
> memory.      With fast_vrp, it spends less than 10 seconds total in the
> 3 passes of VRP.     This test case has about 400,000 basic blocks.
>
> The default for N in this patch is 150,000,  arbitrarily chosen.
>
> This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0
> as well) on x86_64-pc-linux-gnu, with no regressions.
>
> What do you think, OK for trunk?

+      if (last_basic_block_for_fn (fun) > param_vrp_block_limit ||
+         &data == &pass_data_fast_vrp)

|| goes to the next line.

Btw, we have -Wdisabled-optimization for these cases which should
say sth like "function has excess of %d number of basic blocks
(--param vrp-block-limit=%d), using fast VRP algorithm"
or so in this case.

As I wrote in the PR the priority should be -O1 compile-time
performance and memory use.

Richard.

> Andrew
>
> PS sorry,. it doesn't help the threader in that PR :-(

It's probably one of the known bottlenecks there - IIRC the path range
queries make O(n^2) work.  I can look at this at some point as I've
dealt with the slowness of this pass in the past.

There is param_max_jump_thread_paths that should limit searching
but there is IIRC no way to limit the work path ranger actually does
when resolving the query.

Richard.
Andrew MacLeod June 25, 2024, 2:19 a.m. UTC | #2
On 6/22/24 09:15, Richard Biener wrote:
> On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>> This patch adds
>>
>>       --param=vrp-block-limit=N
>>
>> When the basic block counter for a function exceeded 'N' , VRP is
>> invoked with the new fast_vrp algorithm instead.   This algorithm uses a
>> lot less memory and processing power, although it does get a few less
>> things.
>>
>> Primary motivation is cases like
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3  VRP
>> passes consume about 600 seconds of the compile time, and a lot of
>> memory.      With fast_vrp, it spends less than 10 seconds total in the
>> 3 passes of VRP.     This test case has about 400,000 basic blocks.
>>
>> The default for N in this patch is 150,000,  arbitrarily chosen.
>>
>> This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0
>> as well) on x86_64-pc-linux-gnu, with no regressions.
>>
>> What do you think, OK for trunk?
> +      if (last_basic_block_for_fn (fun) > param_vrp_block_limit ||
> +         &data == &pass_data_fast_vrp)
>
> || goes to the next line.
>
> Btw, we have -Wdisabled-optimization for these cases which should
> say sth like "function has excess of %d number of basic blocks
> (--param vrp-block-limit=%d), using fast VRP algorithm"
> or so in this case.
>
> As I wrote in the PR the priority should be -O1 compile-time
> performance and memory use.


Yeah, I just wanted to use it as a model for "bad" cases for ranger.   
Adjusted patch attached which now issues the warning.

I also found that the transitive relations were causing a small blowup 
in time for relation processing now that I turned relations on for fast 
VRP.  I commited a patch and fast_vrp no longer does transitives.

If you want to experiment with enabling fast VRP at -O1, it should be 
fast all the time now.  I think :-)    This testcase runs in about 95 
seconds on my test machine.  if I turn on VRP, a single VRP pass takes 
about 2.5 seconds.    Its all set up, you can just add:

NEXT_PASS (pass_fast_vrp);

at an appropriate spot.

> Richard.
>
>> Andrew
>>
>> PS sorry,. it doesn't help the threader in that PR :-(
> It's probably one of the known bottlenecks there - IIRC the path range
> queries make O(n^2) work.  I can look at this at some point as I've
> dealt with the slowness of this pass in the past.
>
> There is param_max_jump_thread_paths that should limit searching
> but there is IIRC no way to limit the work path ranger actually does
> when resolving the query.
>
Yeah, Id like to talk to Aldy about revamping the threader now that some 
of the newer facilities are available that fast_vrp uses.

We can calculate all the outgoing ranges for a block at once with :

   // Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
   bool gori_on_edge (class ssa_cache &r, edge e, range_query *query = 
NULL);

This is what the fast_vrp routines uses.  We can gather all range 
restrictions generated from an edge efficiently just once and then 
intersect them with a known range as we walk the different paths. We 
don't need the gori exports , nor any of the other on-demand bits where 
we calculate each export range dynamically.. I suspect it would reduce 
the workload and memory impact quite a bit, but I'm not really familiar 
with exactly how the threader uses those things.

It'd require some minor tweaking to the lazy_ssa_cache to make the 
bitmap of names set accessible. This  would provide similar 
functionality to what the gori export () routine provides.  Both 
relations and inferred ranges should only need to be calculated once per 
block as well and could/should/would be applied the same way if they are 
present.   I don't *think* the threader uses any of the def chains, but 
Aldy can chip in.

Andrew
Andrew Pinski June 25, 2024, 2:35 a.m. UTC | #3
On Mon, Jun 24, 2024 at 7:20 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
>
> On 6/22/24 09:15, Richard Biener wrote:
> > On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> >> This patch adds
> >>
> >>       --param=vrp-block-limit=N
> >>
> >> When the basic block counter for a function exceeded 'N' , VRP is
> >> invoked with the new fast_vrp algorithm instead.   This algorithm uses a
> >> lot less memory and processing power, although it does get a few less
> >> things.
> >>
> >> Primary motivation is cases like
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3  VRP
> >> passes consume about 600 seconds of the compile time, and a lot of
> >> memory.      With fast_vrp, it spends less than 10 seconds total in the
> >> 3 passes of VRP.     This test case has about 400,000 basic blocks.
> >>
> >> The default for N in this patch is 150,000,  arbitrarily chosen.
> >>
> >> This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0
> >> as well) on x86_64-pc-linux-gnu, with no regressions.
> >>
> >> What do you think, OK for trunk?
> > +      if (last_basic_block_for_fn (fun) > param_vrp_block_limit ||
> > +         &data == &pass_data_fast_vrp)
> >
> > || goes to the next line.
> >
> > Btw, we have -Wdisabled-optimization for these cases which should
> > say sth like "function has excess of %d number of basic blocks
> > (--param vrp-block-limit=%d), using fast VRP algorithm"
> > or so in this case.
> >
> > As I wrote in the PR the priority should be -O1 compile-time
> > performance and memory use.
>
>
> Yeah, I just wanted to use it as a model for "bad" cases for ranger.
> Adjusted patch attached which now issues the warning.
>
> I also found that the transitive relations were causing a small blowup
> in time for relation processing now that I turned relations on for fast
> VRP.  I commited a patch and fast_vrp no longer does transitives.
>
> If you want to experiment with enabling fast VRP at -O1, it should be
> fast all the time now.  I think :-)    This testcase runs in about 95
> seconds on my test machine.  if I turn on VRP, a single VRP pass takes
> about 2.5 seconds.    Its all set up, you can just add:
>
> NEXT_PASS (pass_fast_vrp);
>
> at an appropriate spot.
>
> > Richard.
> >
> >> Andrew
> >>
> >> PS sorry,. it doesn't help the threader in that PR :-(
> > It's probably one of the known bottlenecks there - IIRC the path range
> > queries make O(n^2) work.  I can look at this at some point as I've
> > dealt with the slowness of this pass in the past.
> >
> > There is param_max_jump_thread_paths that should limit searching
> > but there is IIRC no way to limit the work path ranger actually does
> > when resolving the query.
> >
> Yeah, Id like to talk to Aldy about revamping the threader now that some
> of the newer facilities are available that fast_vrp uses.
>
> We can calculate all the outgoing ranges for a block at once with :
>
>    // Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
>    bool gori_on_edge (class ssa_cache &r, edge e, range_query *query =
> NULL);
>
> This is what the fast_vrp routines uses.  We can gather all range
> restrictions generated from an edge efficiently just once and then
> intersect them with a known range as we walk the different paths. We
> don't need the gori exports , nor any of the other on-demand bits where
> we calculate each export range dynamically.. I suspect it would reduce
> the workload and memory impact quite a bit, but I'm not really familiar
> with exactly how the threader uses those things.
>
> It'd require some minor tweaking to the lazy_ssa_cache to make the
> bitmap of names set accessible. This  would provide similar
> functionality to what the gori export () routine provides.  Both
> relations and inferred ranges should only need to be calculated once per
> block as well and could/should/would be applied the same way if they are
> present.   I don't *think* the threader uses any of the def chains, but
> Aldy can chip in.

+   warning (OPT_Wdisabled_optimization,
+    "Using fast VRP algorithm. %d basic blocks"
+    " exceeds %s%d limit",
+    n_basic_blocks_for_fn (fun),
+    "--param=vrp-block-limit=",
+    param_vrp_block_limit);

This should be:
warning (OPT_Wdisabled_optimization, "Using fast VRP algorithm. %d basic blocks"
    " exceeds %<%--param=vrp-block-limit=d%> limit",
n_basic_blocks_for_fn (fun), param_vrp_block_limit);

I had thought it was mentioned that options should be quoted but it is
not mentioned in the coding conventions:
https://gcc.gnu.org/codingconventions.html#Diagnostics

But it is mentioned in
https://inbox.sourceware.org/gcc/2d2bd844-2de4-ecff-7a07-b2235075074c@gmail.com/
; This is why you were getting an error as you mentioned on IRC.


>
> Andrew
Andrew Pinski June 25, 2024, 4:27 a.m. UTC | #4
On Mon, Jun 24, 2024 at 7:35 PM Andrew Pinski <pinskia@gmail.com> wrote:
>
> On Mon, Jun 24, 2024 at 7:20 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> >
> >
> > On 6/22/24 09:15, Richard Biener wrote:
> > > On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> > >> This patch adds
> > >>
> > >>       --param=vrp-block-limit=N
> > >>
> > >> When the basic block counter for a function exceeded 'N' , VRP is
> > >> invoked with the new fast_vrp algorithm instead.   This algorithm uses a
> > >> lot less memory and processing power, although it does get a few less
> > >> things.
> > >>
> > >> Primary motivation is cases like
> > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which the 3  VRP
> > >> passes consume about 600 seconds of the compile time, and a lot of
> > >> memory.      With fast_vrp, it spends less than 10 seconds total in the
> > >> 3 passes of VRP.     This test case has about 400,000 basic blocks.
> > >>
> > >> The default for N in this patch is 150,000,  arbitrarily chosen.
> > >>
> > >> This bootstraps, (and I bootstrapped it with --param=vrp-block-limit=0
> > >> as well) on x86_64-pc-linux-gnu, with no regressions.
> > >>
> > >> What do you think, OK for trunk?
> > > +      if (last_basic_block_for_fn (fun) > param_vrp_block_limit ||
> > > +         &data == &pass_data_fast_vrp)
> > >
> > > || goes to the next line.
> > >
> > > Btw, we have -Wdisabled-optimization for these cases which should
> > > say sth like "function has excess of %d number of basic blocks
> > > (--param vrp-block-limit=%d), using fast VRP algorithm"
> > > or so in this case.
> > >
> > > As I wrote in the PR the priority should be -O1 compile-time
> > > performance and memory use.
> >
> >
> > Yeah, I just wanted to use it as a model for "bad" cases for ranger.
> > Adjusted patch attached which now issues the warning.
> >
> > I also found that the transitive relations were causing a small blowup
> > in time for relation processing now that I turned relations on for fast
> > VRP.  I commited a patch and fast_vrp no longer does transitives.
> >
> > If you want to experiment with enabling fast VRP at -O1, it should be
> > fast all the time now.  I think :-)    This testcase runs in about 95
> > seconds on my test machine.  if I turn on VRP, a single VRP pass takes
> > about 2.5 seconds.    Its all set up, you can just add:
> >
> > NEXT_PASS (pass_fast_vrp);
> >
> > at an appropriate spot.
> >
> > > Richard.
> > >
> > >> Andrew
> > >>
> > >> PS sorry,. it doesn't help the threader in that PR :-(
> > > It's probably one of the known bottlenecks there - IIRC the path range
> > > queries make O(n^2) work.  I can look at this at some point as I've
> > > dealt with the slowness of this pass in the past.
> > >
> > > There is param_max_jump_thread_paths that should limit searching
> > > but there is IIRC no way to limit the work path ranger actually does
> > > when resolving the query.
> > >
> > Yeah, Id like to talk to Aldy about revamping the threader now that some
> > of the newer facilities are available that fast_vrp uses.
> >
> > We can calculate all the outgoing ranges for a block at once with :
> >
> >    // Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
> >    bool gori_on_edge (class ssa_cache &r, edge e, range_query *query =
> > NULL);
> >
> > This is what the fast_vrp routines uses.  We can gather all range
> > restrictions generated from an edge efficiently just once and then
> > intersect them with a known range as we walk the different paths. We
> > don't need the gori exports , nor any of the other on-demand bits where
> > we calculate each export range dynamically.. I suspect it would reduce
> > the workload and memory impact quite a bit, but I'm not really familiar
> > with exactly how the threader uses those things.
> >
> > It'd require some minor tweaking to the lazy_ssa_cache to make the
> > bitmap of names set accessible. This  would provide similar
> > functionality to what the gori export () routine provides.  Both
> > relations and inferred ranges should only need to be calculated once per
> > block as well and could/should/would be applied the same way if they are
> > present.   I don't *think* the threader uses any of the def chains, but
> > Aldy can chip in.
>
> +   warning (OPT_Wdisabled_optimization,
> +    "Using fast VRP algorithm. %d basic blocks"
> +    " exceeds %s%d limit",
> +    n_basic_blocks_for_fn (fun),
> +    "--param=vrp-block-limit=",
> +    param_vrp_block_limit);
>
> This should be:
> warning (OPT_Wdisabled_optimization, "Using fast VRP algorithm. %d basic blocks"
>     " exceeds %<%--param=vrp-block-limit=d%> limit",
> n_basic_blocks_for_fn (fun), param_vrp_block_limit);
>
> I had thought it was mentioned that options should be quoted but it is
> not mentioned in the coding conventions:
> https://gcc.gnu.org/codingconventions.html#Diagnostics
>
> But it is mentioned in
> https://inbox.sourceware.org/gcc/2d2bd844-2de4-ecff-7a07-b2235075074c@gmail.com/
> ; This is why you were getting an error as you mentioned on IRC.

Note I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115627 to
add it to the coding conventions. I might take a stab at adding the
rules mentioned in
https://inbox.sourceware.org/gcc-patches/8ac62fe2-e4bf-0922-4947-fca9567a0561@gmail.com/
since those are the ones which are detected right now.

Thanks,
Andrew

>
>
> >
> > Andrew
Andrew MacLeod June 25, 2024, 1:23 p.m. UTC | #5
On 6/24/24 22:35, Andrew Pinski wrote:
> On Mon, Jun 24, 2024 at 7:20 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>>     // Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
>>     bool gori_on_edge (class ssa_cache &r, edge e, range_query *query =
>> NULL);
>>
>> This is what the fast_vrp routines uses.  We can gather all range
>> restrictions generated from an edge efficiently just once and then
>> intersect them with a known range as we walk the different paths. We
>> don't need the gori exports , nor any of the other on-demand bits where
>> we calculate each export range dynamically.. I suspect it would reduce
>> the workload and memory impact quite a bit, but I'm not really familiar
>> with exactly how the threader uses those things.
>>
>> It'd require some minor tweaking to the lazy_ssa_cache to make the
>> bitmap of names set accessible. This  would provide similar
>> functionality to what the gori export () routine provides.  Both
>> relations and inferred ranges should only need to be calculated once per
>> block as well and could/should/would be applied the same way if they are
>> present.   I don't *think* the threader uses any of the def chains, but
>> Aldy can chip in.
> +   warning (OPT_Wdisabled_optimization,
> +    "Using fast VRP algorithm. %d basic blocks"
> +    " exceeds %s%d limit",
> +    n_basic_blocks_for_fn (fun),
> +    "--param=vrp-block-limit=",
> +    param_vrp_block_limit);
>
> This should be:
> warning (OPT_Wdisabled_optimization, "Using fast VRP algorithm. %d basic blocks"
>      " exceeds %<%--param=vrp-block-limit=d%> limit",
> n_basic_blocks_for_fn (fun), param_vrp_block_limit);
>
> I had thought it was mentioned that options should be quoted but it is
> not mentioned in the coding conventions:
> https://gcc.gnu.org/codingconventions.html#Diagnostics
>
> But it is mentioned in
> https://inbox.sourceware.org/gcc/2d2bd844-2de4-ecff-7a07-b2235075074c@gmail.com/
> ; This is why you were getting an error as you mentioned on IRC.
>
>
I didnt do that because when I did, everything in bracketed by the %< %> 
in the warning came out in bold text.  is that the intended effect?


Andrew.
David Malcolm June 25, 2024, 9:32 p.m. UTC | #6
On Mon, 2024-06-24 at 21:27 -0700, Andrew Pinski wrote:
> On Mon, Jun 24, 2024 at 7:35 PM Andrew Pinski <pinskia@gmail.com>
> wrote:
> > 
> > On Mon, Jun 24, 2024 at 7:20 PM Andrew MacLeod
> > <amacleod@redhat.com> wrote:
> > > 
> > > 
> > > On 6/22/24 09:15, Richard Biener wrote:
> > > > On Fri, Jun 21, 2024 at 3:02 PM Andrew MacLeod
> > > > <amacleod@redhat.com> wrote:
> > > > > This patch adds
> > > > > 
> > > > >       --param=vrp-block-limit=N
> > > > > 
> > > > > When the basic block counter for a function exceeded 'N' ,
> > > > > VRP is
> > > > > invoked with the new fast_vrp algorithm instead.   This
> > > > > algorithm uses a
> > > > > lot less memory and processing power, although it does get a
> > > > > few less
> > > > > things.
> > > > > 
> > > > > Primary motivation is cases like
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855 in which
> > > > > the 3  VRP
> > > > > passes consume about 600 seconds of the compile time, and a
> > > > > lot of
> > > > > memory.      With fast_vrp, it spends less than 10 seconds
> > > > > total in the
> > > > > 3 passes of VRP.     This test case has about 400,000 basic
> > > > > blocks.
> > > > > 
> > > > > The default for N in this patch is 150,000,  arbitrarily
> > > > > chosen.
> > > > > 
> > > > > This bootstraps, (and I bootstrapped it with --param=vrp-
> > > > > block-limit=0
> > > > > as well) on x86_64-pc-linux-gnu, with no regressions.
> > > > > 
> > > > > What do you think, OK for trunk?
> > > > +      if (last_basic_block_for_fn (fun) >
> > > > param_vrp_block_limit ||
> > > > +         &data == &pass_data_fast_vrp)
> > > > 
> > > > > > goes to the next line.
> > > > 
> > > > Btw, we have -Wdisabled-optimization for these cases which
> > > > should
> > > > say sth like "function has excess of %d number of basic blocks
> > > > (--param vrp-block-limit=%d), using fast VRP algorithm"
> > > > or so in this case.
> > > > 
> > > > As I wrote in the PR the priority should be -O1 compile-time
> > > > performance and memory use.
> > > 
> > > 
> > > Yeah, I just wanted to use it as a model for "bad" cases for
> > > ranger.
> > > Adjusted patch attached which now issues the warning.
> > > 
> > > I also found that the transitive relations were causing a small
> > > blowup
> > > in time for relation processing now that I turned relations on
> > > for fast
> > > VRP.  I commited a patch and fast_vrp no longer does transitives.
> > > 
> > > If you want to experiment with enabling fast VRP at -O1, it
> > > should be
> > > fast all the time now.  I think :-)    This testcase runs in
> > > about 95
> > > seconds on my test machine.  if I turn on VRP, a single VRP pass
> > > takes
> > > about 2.5 seconds.    Its all set up, you can just add:
> > > 
> > > NEXT_PASS (pass_fast_vrp);
> > > 
> > > at an appropriate spot.
> > > 
> > > > Richard.
> > > > 
> > > > > Andrew
> > > > > 
> > > > > PS sorry,. it doesn't help the threader in that PR :-(
> > > > It's probably one of the known bottlenecks there - IIRC the
> > > > path range
> > > > queries make O(n^2) work.  I can look at this at some point as
> > > > I've
> > > > dealt with the slowness of this pass in the past.
> > > > 
> > > > There is param_max_jump_thread_paths that should limit
> > > > searching
> > > > but there is IIRC no way to limit the work path ranger actually
> > > > does
> > > > when resolving the query.
> > > > 
> > > Yeah, Id like to talk to Aldy about revamping the threader now
> > > that some
> > > of the newer facilities are available that fast_vrp uses.
> > > 
> > > We can calculate all the outgoing ranges for a block at once with
> > > :
> > > 
> > >    // Fill ssa-cache R with any outgoing ranges on edge E, using
> > > QUERY.
> > >    bool gori_on_edge (class ssa_cache &r, edge e, range_query
> > > *query =
> > > NULL);
> > > 
> > > This is what the fast_vrp routines uses.  We can gather all range
> > > restrictions generated from an edge efficiently just once and
> > > then
> > > intersect them with a known range as we walk the different paths.
> > > We
> > > don't need the gori exports , nor any of the other on-demand bits
> > > where
> > > we calculate each export range dynamically.. I suspect it would
> > > reduce
> > > the workload and memory impact quite a bit, but I'm not really
> > > familiar
> > > with exactly how the threader uses those things.
> > > 
> > > It'd require some minor tweaking to the lazy_ssa_cache to make
> > > the
> > > bitmap of names set accessible. This  would provide similar
> > > functionality to what the gori export () routine provides.  Both
> > > relations and inferred ranges should only need to be calculated
> > > once per
> > > block as well and could/should/would be applied the same way if
> > > they are
> > > present.   I don't *think* the threader uses any of the def
> > > chains, but
> > > Aldy can chip in.
> > 
> > +   warning (OPT_Wdisabled_optimization,
> > +    "Using fast VRP algorithm. %d basic blocks"
> > +    " exceeds %s%d limit",
> > +    n_basic_blocks_for_fn (fun),
> > +    "--param=vrp-block-limit=",
> > +    param_vrp_block_limit);
> > 
> > This should be:
> > warning (OPT_Wdisabled_optimization, "Using fast VRP algorithm. %d
> > basic blocks"
> >     " exceeds %<%--param=vrp-block-limit=d%> limit",
> > n_basic_blocks_for_fn (fun), param_vrp_block_limit);
> > 
> > I had thought it was mentioned that options should be quoted but it
> > is
> > not mentioned in the coding conventions:
> > https://gcc.gnu.org/codingconventions.html#Diagnostics
> > 
> > But it is mentioned in
> > https://inbox.sourceware.org/gcc/2d2bd844-2de4-ecff-7a07-b2235075074c@gmail.com/
> > ; This is why you were getting an error as you mentioned on IRC.
> 
> Note I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115627 to
> add it to the coding conventions. I might take a stab at adding the
> rules mentioned in
> https://inbox.sourceware.org/gcc-patches/8ac62fe2-e4bf-0922-4947-fca9567a0561@gmail.com/
> since those are the ones which are detected right now.

FWIW there's some overlap between:
https://gcc.gnu.org/codingconventions.html#Diagnostics
and:
https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html#Quoting

Command-line options ought to be quoted in diagnostic messages.  In
particular as of GCC 14  [1] such quoted strings will automatically get
a URL to the documentation for the option in a sufficiently capable
terminal.  I'm hoping to add something similar for quoted *attributes*
in GCC 15; see [2].  FWIW I'm also looking at something that could
support better HTML output (e.g. div classes) for such quoted output
(e.g. via markdown via SARIF output), but that's more experimental.

Dave
[1] e.g. r14-6920-g9e49746da303b8
[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655541.html
diff mbox series

Patch

From 3bb9bd3ca8038676e45b0bddcda91cbed7e51662 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod <amacleod@redhat.com>
Date: Mon, 17 Jun 2024 11:38:46 -0400
Subject: [PATCH 4/5] Add param for bb limit to invoke fast_vrp.

If the basic block count is too high, simply use fast_vrp for all
VRP passes.

	gcc/doc/
	* invoke.texi (vrp-block-limit): Document.

	gcc/
	* params.opt (-param=vrp-block-limit): New.
	* tree-vrp.cc (fvrp_folder::execute): Invoke fast_vrp if block
	count exceeds limit.
---
 gcc/doc/invoke.texi | 3 +++
 gcc/params.opt      | 4 ++++
 gcc/tree-vrp.cc     | 4 ++--
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5d7a87fde86..f2f8f6334dc 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16840,6 +16840,9 @@  this parameter.  The default value of this parameter is 50.
 @item vect-induction-float
 Enable loop vectorization of floating point inductions.
 
+@item vrp-block-limit
+Maximum number of basic blocks before VRP switches to a lower memory algorithm.
+
 @item vrp-sparse-threshold
 Maximum number of basic blocks before VRP uses a sparse bitmap cache.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index d34ef545bf0..c17ba17b91b 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1198,6 +1198,10 @@  The maximum factor which the loop vectorizer applies to the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
 Enable loop vectorization of floating point inductions.
 
+-param=vrp-block-limit=
+Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization Param
+Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
+
 -param=vrp-sparse-threshold=
 Common Joined UInteger Var(param_vrp_sparse_threshold) Init(3000) Optimization Param
 Maximum number of basic blocks before VRP uses a sparse bitmap cache.
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index 4fc33e63e7d..eef02146ec6 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -1330,9 +1330,9 @@  public:
   unsigned int execute (function *fun) final override
     {
       // Check for fast vrp.
-      if (&data == &pass_data_fast_vrp)
+      if (last_basic_block_for_fn (fun) > param_vrp_block_limit ||
+	  &data == &pass_data_fast_vrp)
 	return execute_fast_vrp (fun, final_p);
-
       return execute_ranger_vrp (fun, final_p);
     }
 
-- 
2.45.0