Message ID | 1615258.mc8dMGPkXm@polaris |
---|---|
State | New |
Series | Use simple LRA algorithm at -O0 |
On 2019-12-17 1:02 p.m., Eric Botcazou wrote:
> Hi,
>
> LRA is getting measurably slower since GCC 8, at least on x86, and things are
> worsening since GCC 9. While this might be legitimate when optimization is
> enabled, it's a pure waste of cycles at -O0 so the attached patch switches LRA
> over to using the simple algorithm when optimization is disabled. The effect
> on code size is tiny (typically 0.2% on x86).
>
> Tested on x86_64-suse-linux, OK for the mainline?
>
Eric, thank you for reporting this issue and providing the patch.

Simple LRA algorithms switch off hard register splitting, so there might be a slightly bigger chance of hitting the "can't find reload register" error (e.g. when -O0 -fschedule-insns is used). But this error is still not solved in the general case, and in my experience the chance of this error is even bigger for optimized modes than for -O0 with simple LRA algorithms.

That said, I believe the patch is OK for the trunk.

> 2019-12-17 Eric Botcazou <ebotcazou@adacore.com>
>
>     * ira.c (ira): Use simple LRA algorithm when not optimizing.
>
> Simple LRA algorithms switch off hard register splitting, so there might
> be a slightly bigger chance of hitting the "can't find reload register" error
> (e.g. when -O0 -fschedule-insns is used). But this error is still not
> solved in the general case, and in my experience the chance of this error is
> even bigger for optimized modes than for -O0 with simple LRA algorithms.

I see, thanks for the explanation. So this could occur for register variables or something along these lines?

> That said, I believe the patch is OK for the trunk.

OK, let's see how it fares. We have been using it with a GCC 9 compiler for some time, without any problem so far.
On 12/19/19 6:29 AM, Eric Botcazou wrote:
>> Simple LRA algorithms switch off hard register splitting, so there might
>> be a slightly bigger chance of hitting the "can't find reload register" error
>> (e.g. when -O0 -fschedule-insns is used). But this error is still not
>> solved in the general case, and in my experience the chance of this error is
>> even bigger for optimized modes than for -O0 with simple LRA algorithms.
> I see, thanks for the explanation. So this could occur for register variables
> or something along these lines?

It might occur when the live ranges of hard registers explicitly present in RTL are extended. A typical example is moving a hard register (e.g. x86-64 dx used as a function call argument) across an insn that always requires this hard register (e.g. an x86-64 div insn, which uses the ax/dx hard registers). There are also more complicated cases. The reload pass never tried to solve this problem. LRA tries to, but the problem is still not solved in the general case. That is why the first insn scheduler is switched off by default on some targets. Still, GCC users can switch it on and run into the problem with or without the patch.

>> That said, I believe the patch is OK for the trunk.
> OK, let's see how it fares. We have been using it with a GCC 9 compiler for
> some time, without any problem so far.
>
As I wrote, for typical GCC use the patch will not create any problems. But GCC users (or automatically generated tests run with an artificial option set) can still run into the problem, just as they could before the patch.
```diff
Index: ira.c
===================================================================
--- ira.c	(revision 279442)
+++ ira.c	(working copy)
@@ -5192,8 +5192,6 @@ ira (FILE *f)
   int ira_max_point_before_emit;
   bool saved_flag_caller_saves = flag_caller_saves;
   enum ira_region saved_flag_ira_region = flag_ira_region;
-  unsigned int i;
-  int num_used_regs = 0;
 
   clear_bb_flags ();
 
@@ -5207,18 +5205,28 @@ ira (FILE *f)
   /* Perform target specific PIC register initialization.  */
   targetm.init_pic_reg ();
 
-  ira_conflicts_p = optimize > 0;
+  if (optimize)
+    {
+      ira_conflicts_p = true;
 
-  /* Determine the number of pseudos actually requiring coloring.  */
-  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
-    num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
-
-  /* If there are too many pseudos and/or basic blocks (e.g. 10K
-     pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
-     use simplified and faster algorithms in LRA.  */
-  lra_simple_p
-    = (ira_use_lra_p
-       && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
+      /* Determine the number of pseudos actually requiring coloring.  */
+      unsigned int num_used_regs = 0;
+      for (unsigned int i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
+        if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
+          num_used_regs++;
+
+      /* If there are too many pseudos and/or basic blocks (e.g. 10K
+         pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
+         use simplified and faster algorithms in LRA.  */
+      lra_simple_p
+        = ira_use_lra_p
+          && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
+    }
+  else
+    {
+      ira_conflicts_p = false;
+      lra_simple_p = ira_use_lra_p;
+    }
 
   if (lra_simple_p)
     {
```