RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]

Message ID	20240905185257.22411-1-palmer@rivosinc.com
State	New

Palmer Dabbelt Sept. 5, 2024, 6:52 p.m. UTC

We have cheap logical ops, so let's just move this back to the default
to take advantage of the standard branch/op heuristics.

gcc/ChangeLog:

	PR target/116615
	* config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
---
There's a bunch more discussion in the bug, but it's starting to smell
like this was just a holdover from MIPS (where maybe it also shouldn't
be set).  I haven't tested this, but I figured I'd send the patch to get
a little more visibility.

I guess we should also kick off something like a SPEC run to make sure
there are no regressions?
---
 gcc/config/riscv/riscv.h | 2 --
 1 file changed, 2 deletions(-)

Palmer Dabbelt Sept. 5, 2024, 6:59 p.m. UTC | #1

On Thu, 05 Sep 2024 11:52:57 PDT (-0700), Palmer Dabbelt wrote:
> We have cheap logical ops, so let's just move this back to the default
> to take advantage of the standard branch/op heuristics.
>
> gcc/ChangeLog:
>
> 	PR target/116615
> 	* config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> ---
> There's a bunch more discussion in the bug, but it's starting to smell
> like this was just a holdover from MIPS (where maybe it also shouldn't
> be set).  I haven't tested this, but I figured I'd send the patch to get
> a little more visibility.
>
> I guess we should also kick off something like a SPEC run to make sure
> there's no regressions?

Sorry I missed it in the bug, but Ruoyao points to dddafe94823 
("LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT") where 
short-circuiting the FP comparisons helps on LoongArch.

Not sure if I'm also missing something here, but it kind of feels like
that should be handled by a more generic optimization decision than just
a global "should we short circuit logical ops" -- assuming it really is
the FP comparisons that are causing the cost, as opposed to the actual
logical ops themselves.

Probably best to actually run the benchmarks, though...

> ---
>  gcc/config/riscv/riscv.h | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index ead97867eb8..a0ccd1fc762 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -939,8 +939,6 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
>  #define TARGET_VECTOR_MISALIGN_SUPPORTED \
>     riscv_vector_unaligned_access_p
>
> -#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
> -
>  /* Control the assembler format that we output.  */
>
>  /* Output to assembler file text saying following lines

Xi Ruoyao Sept. 5, 2024, 7:20 p.m. UTC | #2

On Thu, 2024-09-05 at 11:59 -0700, Palmer Dabbelt wrote:
> On Thu, 05 Sep 2024 11:52:57 PDT (-0700), Palmer Dabbelt wrote:
> > We have cheap logical ops, so let's just move this back to the default
> > to take advantage of the standard branch/op heuristics.
> > 
> > gcc/ChangeLog:
> > 
> > 	PR target/116615
> > 	* config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> > ---
> > There's a bunch more discussion in the bug, but it's starting to smell
> > like this was just a holdover from MIPS (where maybe it also shouldn't
> > be set).  I haven't tested this, but I figured I'd send the patch to get
> > a little more visibility.
> > 
> > I guess we should also kick off something like a SPEC run to make sure
> > there's no regressions?
> 
> Sorry I missed it in the bug, but Ruoyao points to dddafe94823 
> ("LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT") where 
> short-circuiting the FP comparisons helps on LoongArch.
> 
> Not sure if I'm also missing something here, but it kind of feels like
> that should be handled by a more generic optimization decision that just 
> globally "should we short circuit logical ops" -- assuming it really is 
> the FP comparisons that are causing the cost, as opposed to the actual
> logical ops themselves.

IIUC there are some contributing factors here:

1. On LoongArch FP comparison is slow (costing 5 cycles).
2. On LoongArch the FP comparison result is stored into FCC registers,
and to do logical operations on two comparison results they need to be
moved into GPR first.  The move costs one or two cycles (depending on
the uarch).

and maybe

3. The FP comparison results in the SPEC tests are somewhat predictable.
IIRC when I tested dddafe94823 I made a test program where the FP
comparison results are "randomized" (so the branch predictor is
defeated), then the branch-less code generated with -Ofast --param
logical-op-non-short-circuit=1 was actually faster than the code
generated with -Ofast --param logical-op-non-short-circuit=0.

AFAIK 2 isn't an issue for RISC-V (where the FP comparison result is just
in a GPR) but 1 and 3 may still need to be considered.
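
For anyone who wants to reproduce the kind of experiment described in
point 3, here is a minimal sketch of such a test kernel.  It is not the
program used when testing dddafe94823; the function names, the 0.5
threshold and the LCG constants are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill v[0..n) with pseudo-random values in [0, 1)
   from a small LCG, so comparisons against 0.5 are hard to predict.  */
void
fill_pseudo_random (double *v, size_t n, uint32_t seed)
{
  uint32_t state = seed;
  for (size_t i = 0; i < n; i++)
    {
      state = state * 1664525u + 1013904223u;
      /* Keep the top 24 bits and scale by 2^24 to land in [0, 1).  */
      v[i] = (double) (state >> 8) / 16777216.0;
    }
}

/* Short-circuit form: the second FP comparison is only evaluated when
   the first one is true, so this can become two conditional branches.  */
size_t
count_short_circuit (const double *a, const double *b, size_t n)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    if (a[i] < 0.5 && b[i] < 0.5)
      hits++;
  return hits;
}

/* Non-short-circuit form: both comparisons are always evaluated and
   their 0/1 results are combined with a bitwise AND.  */
size_t
count_non_short_circuit (const double *a, const double *b, size_t n)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    hits += (a[i] < 0.5) & (b[i] < 0.5);
  return hits;
}
```

Both functions compute the same count.  To see what the parameter does,
one would build count_short_circuit twice, with --param
logical-op-non-short-circuit=0 and =1, and time it on randomized versus
predictable inputs.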

Jeff Law Sept. 5, 2024, 9:51 p.m. UTC | #3

On 9/5/24 12:52 PM, Palmer Dabbelt wrote:
> We have cheap logical ops, so let's just move this back to the default
> to take advantage of the standard branch/op heuristics.
> 
> gcc/ChangeLog:
> 
> 	PR target/116615
> 	* config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> ---
> There's a bunch more discussion in the bug, but it's starting to smell
> like this was just a holdover from MIPS (where maybe it also shouldn't
> be set).  I haven't tested this, but I figured I'd send the patch to get
> a little more visibility.
> 
> I guess we should also kick off something like a SPEC run to make sure
> there's no regressions?
Yea, I'd definitely want to see some hard data on an implementation for
this.  I wouldn't want to rely just on icounts and eyeballing given it's
dependent on branch predictor accuracy and such.  BPI is probably the
best platform for this kind of testing right now.

I probably can't spin it this week, but probably could next week.

jeff

Andrew Waterman Sept. 5, 2024, 9:55 p.m. UTC | #4

On Thu, Sep 5, 2024 at 2:52 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 9/5/24 12:52 PM, Palmer Dabbelt wrote:
> > We have cheap logical ops, so let's just move this back to the default
> > to take advantage of the standard branch/op heuristics.
> >
> > gcc/ChangeLog:
> >
> >       PR target/116615
> >       * config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> > ---
> > There's a bunch more discussion in the bug, but it's starting to smell
> > like this was just a holdover from MIPS (where maybe it also shouldn't
> > be set).  I haven't tested this, but I figured I'd send the patch to get
> > a little more visibility.
> >
> > I guess we should also kick off something like a SPEC run to make sure
> > there's no regressions?
> Yea, I'd definitely want to see some hard data on an implementation for
> this.  I wouldn't want to rely just on icounts and eyeballing given it's
> dependent on branch predictor accuracy and such.  BPI is probably the
> best platform for this kind of testing right now.
>
> I probably can't spin it this week, but probably could next week.

Thanks.  If you don't mind, please also collect the static code-size
statistics so we can decide if we need to choose different strategies
when optimizing for size vs. speed.

>
> jeff
>

Jeff Law Sept. 5, 2024, 9:56 p.m. UTC | #5

On 9/5/24 12:59 PM, Palmer Dabbelt wrote:
> On Thu, 05 Sep 2024 11:52:57 PDT (-0700), Palmer Dabbelt wrote:
>> We have cheap logical ops, so let's just move this back to the default
>> to take advantage of the standard branch/op heuristics.
>>
>> gcc/ChangeLog:
>>
>>     PR target/116615
>>     * config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
>> ---
>> There's a bunch more discussion in the bug, but it's starting to smell
>> like this was just a holdover from MIPS (where maybe it also shouldn't
>> be set).  I haven't tested this, but I figured I'd send the patch to get
>> a little more visibility.
>>
>> I guess we should also kick off something like a SPEC run to make sure
>> there's no regressions?
> 
> Sorry I missed it in the bug, but Ruoyao points to dddafe94823 
> ("LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT") where short- 
> circuiting the FP comparisons helps on LoongArch.
> 
> Not sure if I'm also missing something here, but it kind of feels like 
> that should be handled by a more generic optimization decision that just 
> globally "should we short circuit logical ops" -- assuming it really is 
> the FP comparisons that are causing the cost, as opposed to the actual 
> logical ops themselves.
> 
> Probably best to actually run the benchmarks, though...
The #define essentially is overriding the generic heuristics which look
at branch cost to determine how aggressively to try and combine several 
conditional branch conditions using logical ops so they can use a single 
conditional branch in the end.
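
As a rough illustration of that override: if I'm remembering the
sources correctly, when a target leaves the macro undefined the generic
code falls back to a test along the lines of "BRANCH_COST >= 2".  The
sketch below models that decision.  It is not GCC code; the struct, its
fields and the function name are hypothetical, and the threshold of 2
is my recollection rather than something stated in this thread.

```c
#include <stdbool.h>

/* Hypothetical model of a target's settings; not a GCC data structure.  */
struct target_model
{
  int branch_cost;      /* What BRANCH_COST would report.  */
  bool has_override;    /* Target defines LOGICAL_OP_NON_SHORT_CIRCUIT.  */
  bool override_value;  /* The value it defines it to.  */
};

/* Return true when "a && b" style conditions should be turned into
   non-short-circuit logical ops feeding a single branch.  */
bool
use_non_short_circuit (const struct target_model *t)
{
  /* A target #define wins unconditionally, whatever the branch cost.  */
  if (t->has_override)
    return t->override_value;

  /* Assumed generic fallback: only combine when branches are costly.  */
  return t->branch_cost >= 2;
}
```

With has_override set and override_value 0, which is what riscv.h does
today, the branch cost is never consulted at all.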

I don't remember all the history here, but in retrospect, the mere 
existence of that #define points to a failing in the costing models.

FWIW, my general sense is that the gimple phases shouldn't work *too* 
hard to try and combine logical ops, but the if-converters in the RTL 
phases should be fairly aggressive.  The fact that we use BRANCH_COST
to drive both is likely sub-optimal.
jeff

Andrew Pinski Sept. 5, 2024, 10:08 p.m. UTC | #6

On Thu, Sep 5, 2024 at 2:57 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 9/5/24 12:59 PM, Palmer Dabbelt wrote:
> > On Thu, 05 Sep 2024 11:52:57 PDT (-0700), Palmer Dabbelt wrote:
> >> We have cheap logical ops, so let's just move this back to the default
> >> to take advantage of the standard branch/op heuristics.
> >>
> >> gcc/ChangeLog:
> >>
> >>     PR target/116615
> >>     * config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> >> ---
> >> There's a bunch more discussion in the bug, but it's starting to smell
> >> like this was just a holdover from MIPS (where maybe it also shouldn't
> >> be set).  I haven't tested this, but I figured I'd send the patch to get
> >> a little more visibility.
> >>
> >> I guess we should also kick off something like a SPEC run to make sure
> >> there's no regressions?
> >
> > Sorry I missed it in the bug, but Ruoyao points to dddafe94823
> > ("LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT") where short-
> > circuiting the FP comparisons helps on LoongArch.
> >
> > Not sure if I'm also missing something here, but it kind of feels like
> > that should be handled by a more generic optimization decision that just
> > globally "should we short circuit logical ops" -- assuming it really is
> > the FP comparisons that are causing the cost, as opposed to the actual
> > logical ops themselves.
> >
> > Probably best to actually run the benchmarks, though...
> The #define essentially is overriding the generic heuristics which look
> at branch cost to determine how aggressively to try and combine several
> conditional branch conditions using logical ops so they can use a single
> conditional branch in the end.
>
> I don't remember all the history here, but in retrospect, the mere
> existence of that #define points to a failing in the costing models.

I provided the original history of LOGICAL_OP_NON_SHORT_CIRCUIT in the
RISC-V bug report.
And yes, there is a cost model failure here.
LOGICAL_OP_NON_SHORT_CIRCUIT was useful if you have a decent cset (or,
these days, a ccmp optab).
One cost model issue is that LOGICAL_OP_NON_SHORT_CIRCUIT does not
distinguish whether the comparison is FP or integer (handling that would
cover the Loongson and MIPS cases and, to a lesser extent, RISC-V).
The PowerPC backend does not implement the ccmp optab, nor does it have a
decently costed cset, so having it as 0 is correct there even though the
branch cost might be low for the target (it could implement the ccmp optab
now, but nobody has done that yet).
Note RISC-V's cset is cheap (both in size and speed) because, like MIPS,
it just has instructions which set a GPR and then compares against 0.
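
To make the cset point concrete, here is a small sketch of the two
shapes for an integer range check.  The function names are made up, and
the instruction names in the comments are only what I'd expect from the
base ISA; I have not checked the generated code.

```c
#include <stdbool.h>

/* Short-circuit shape: the second comparison runs only if the first
   one passes, so this may compile to two conditional branches.  */
bool
in_range_branchy (long x, long lo, long hi)
{
  if (lo <= x && x < hi)
    return true;
  return false;
}

/* Non-short-circuit shape: each comparison produces 0 or 1 in a GPR
   (a "cset", presumably slt-style instructions on RISC-V), the two
   values are ANDed, and the result is compared against 0 once.  */
bool
in_range_branchless (long x, long lo, long hi)
{
  return (lo <= x) & (x < hi);
}
```

The two functions return the same value for every input; which one is
faster depends on how predictable the first comparison is.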

I don't have time until next year to start looking at improving the
situation with respect to LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST; it
is on my radar since I want to improve how aarch64's ccmp is done and
move the use of LOGICAL_OP_NON_SHORT_CIRCUIT out of fold-const so that it
is only in the ifcombine (or maybe even just in isel) pass.

Thanks,
Andrew Pinski

>
> FWIW, my general sense is that the gimple phases shouldn't work *too*
> hard to try and combine logical ops, but the if-converters in the RTL
> phases should be fairly aggressive.  The fact that we use BRANCH_COST
> to drive both is likely sub-optimal.
> jeff
