RISC-V: improve codegen for repeating large constants [3]

Message ID	20230630233315.212700-1-vineetg@rivosinc.com
State	New
Headers	show Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B27073858D33 From: Vineet Gupta <vineetg@rivosinc.com> To: gcc-patches@gcc.gnu.org Cc: kito.cheng@gmail.com, Jeff Law <jeffreyalaw@gmail.com>, Palmer Dabbelt <palmer@rivosinc.com>, gnu-toolchain@rivosinc.com, Vineet Gupta <vineetg@rivosinc.com> Subject: [PATCH] RISC-V: improve codegen for repeating large constants [3] Date: Fri, 30 Jun 2023 16:33:15 -0700 Message-Id: <20230630233315.212700-1-vineetg@rivosinc.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
Series	RISC-V: improve codegen for repeating large constants [3] \| expand RISC-V: improve codegen for repeating large constants [3]

Vineet Gupta June 30, 2023, 11:33 p.m. UTC

Ran into a minor snafu in const splitting code when playing with test
case from an old PR/23813.

	long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }

This currently generates

	li	a5,-252645376
	addi	a5,a5,241
	li	a0,-252645376
	slli	a5,a5,32
	addi	a0,a0,240
	add	a0,a5,a0
	ret

The signed math in hival extraction introduces an additional bit,
causing loval == hival check to fail.

| riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at ../gcc/config/riscv/riscv.cc:702
| 702	  unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
| (gdb)n
| 703	  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
| (gdb)
| 704	  rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
| (gdb) p/x val
| $2 = 0xf0f0f0f0f0f0f0f0
| (gdb) p/x loval
| $3 = 0xfffffffff0f0f0f0
| (gdb) p/x hival
| $4 = 0xfffffffff0f0f0f1
                       ^^^
Fix that by eliding the subtraction in shift.

With patch:

	li	a5,-252645376
	addi	a5,a5,240
	slli	a0,a5,32
	add	a0,a0,a5
	ret

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_split_integer): hival computation
	  do elide subtraction of loval.
	* (riscv_split_integer_cost): Ditto.
	* (riscv_build_integer): Ditto

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
---
I wasn't planning to do any more work on large const stuff, but just ran into it this
on a random BZ entry when trying search for redundant constant stuff.
The test seemed too good to pass :-)
---
 gcc/config/riscv/riscv.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Vineet Gupta June 30, 2023, 11:41 p.m. UTC | #1

On 6/30/23 16:33, Vineet Gupta wrote:
> Ran into a minor snafu in const splitting code when playing with test
> case from an old PR/23813.
>
> 	long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
>
> This currently generates
>
> 	li	a5,-252645376
> 	addi	a5,a5,241
> 	li	a0,-252645376
> 	slli	a5,a5,32
> 	addi	a0,a0,240
> 	add	a0,a5,a0
> 	ret
>
> The signed math in hival extraction introduces an additional bit,
> causing loval == hival check to fail.
>
> | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at ../gcc/config/riscv/riscv.cc:702
> | 702	  unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> | (gdb)n
> | 703	  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> | (gdb)

FWIW (and I missed adding this observation to the changelog) I pondered 
about using unsigned loval/hival with zext_hwi() but that in certain 
cases can cause additional insns

e.g. constant 0x8000_0000 is codegen to LI 1 +SLLI 31 vs, LI 
0xFFFFFFFF_80000000


> | 704	  rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> | (gdb) p/x val
> | $2 = 0xf0f0f0f0f0f0f0f0
> | (gdb) p/x loval
> | $3 = 0xfffffffff0f0f0f0
> | (gdb) p/x hival
> | $4 = 0xfffffffff0f0f0f1
>                         ^^^
> Fix that by eliding the subtraction in shift.
>
> With patch:
>
> 	li	a5,-252645376
> 	addi	a5,a5,240
> 	slli	a0,a5,32
> 	add	a0,a0,a5
> 	ret
>
> gcc/ChangeLog:
>
> 	* config/riscv/riscv.cc (riscv_split_integer): hival computation
> 	  do elide subtraction of loval.
> 	* (riscv_split_integer_cost): Ditto.
> 	* (riscv_build_integer): Ditto
>
> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> ---
> I wasn't planning to do any more work on large const stuff, but just ran into it this
> on a random BZ entry when trying search for redundant constant stuff.
> The test seemed too good to pass :-)
> ---
>   gcc/config/riscv/riscv.cc | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 5ac187c1b1b4..377d3aac794b 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
>         && (value > INT32_MAX || value < INT32_MIN))
>       {
>         unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
> -      unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
> +      unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
>         struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
>         struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
>         int hi_cost, lo_cost;
> @@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
>   {
>     int cost;
>     unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
>     struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
>   
>     cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
> @@ -700,7 +700,7 @@ static rtx
>   riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
>   {
>     unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
>     rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>   
>     riscv_move_integer (lo, lo, loval, mode);

Andrew Waterman June 30, 2023, 11:50 p.m. UTC | #2

I don't believe this is correct; the subtraction is needed to account
for the fact that the low part might be negative, resulting in a
borrow from the high part.  See the output for your test case below:

$ cat test.c
#include <stdio.h>

int main()
{
  unsigned long result, tmp;

asm (
  "li      %1,-252645376\n"
  "addi    %1,%1,240\n"
  "slli    %0,%1,32\n"
  "add     %0,%0,%1"
    : "=r" (result), "=r" (tmp));

  printf("%lx\n", result);

  return 0;
}
$ riscv64-unknown-elf-gcc -O2 test.c
$ spike pk a.out
bbl loader
f0f0f0eff0f0f0f0
$


On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>
>
>
> On 6/30/23 16:33, Vineet Gupta wrote:
> > Ran into a minor snafu in const splitting code when playing with test
> > case from an old PR/23813.
> >
> >       long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
> >
> > This currently generates
> >
> >       li      a5,-252645376
> >       addi    a5,a5,241
> >       li      a0,-252645376
> >       slli    a5,a5,32
> >       addi    a0,a0,240
> >       add     a0,a5,a0
> >       ret
> >
> > The signed math in hival extraction introduces an additional bit,
> > causing loval == hival check to fail.
> >
> > | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at ../gcc/config/riscv/riscv.cc:702
> > | 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> > | (gdb)n
> > | 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> > | (gdb)
>
> FWIW (and I missed adding this observation to the changelog) I pondered
> about using unsigned loval/hival with zext_hwi() but that in certain
> cases can cause additional insns
>
> e.g. constant 0x8000_0000 is codegen to LI 1 +SLLI 31 vs, LI
> 0xFFFFFFFF_80000000
>
>
> > | 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> > | (gdb) p/x val
> > | $2 = 0xf0f0f0f0f0f0f0f0
> > | (gdb) p/x loval
> > | $3 = 0xfffffffff0f0f0f0
> > | (gdb) p/x hival
> > | $4 = 0xfffffffff0f0f0f1
> >                         ^^^
> > Fix that by eliding the subtraction in shift.
> >
> > With patch:
> >
> >       li      a5,-252645376
> >       addi    a5,a5,240
> >       slli    a0,a5,32
> >       add     a0,a0,a5
> >       ret
> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/riscv.cc (riscv_split_integer): hival computation
> >         do elide subtraction of loval.
> >       * (riscv_split_integer_cost): Ditto.
> >       * (riscv_build_integer): Ditto
> >
> > Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> > ---
> > I wasn't planning to do any more work on large const stuff, but just ran into it this
> > on a random BZ entry when trying search for redundant constant stuff.
> > The test seemed too good to pass :-)
> > ---
> >   gcc/config/riscv/riscv.cc | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 5ac187c1b1b4..377d3aac794b 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
> >         && (value > INT32_MAX || value < INT32_MIN))
> >       {
> >         unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
> > -      unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
> > +      unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
> >         struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
> >         struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
> >         int hi_cost, lo_cost;
> > @@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
> >   {
> >     int cost;
> >     unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> > -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> > +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
> >     struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
> >
> >     cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
> > @@ -700,7 +700,7 @@ static rtx
> >   riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
> >   {
> >     unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> > -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> > +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
> >     rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> >
> >     riscv_move_integer (lo, lo, loval, mode);
>

Vineet Gupta July 1, 2023, 12:13 a.m. UTC | #3

On 6/30/23 16:50, Andrew Waterman wrote:
> I don't believe this is correct; the subtraction is needed to account
> for the fact that the low part might be negative, resulting in a
> borrow from the high part.  See the output for your test case below:
>
> $ cat test.c
> #include <stdio.h>
>
> int main()
> {
>    unsigned long result, tmp;
>
> asm (
>    "li      %1,-252645376\n"
>    "addi    %1,%1,240\n"
>    "slli    %0,%1,32\n"
>    "add     %0,%0,%1"
>      : "=r" (result), "=r" (tmp));
>
>    printf("%lx\n", result);
>
>    return 0;
> }
> $ riscv64-unknown-elf-gcc -O2 test.c
> $ spike pk a.out
> bbl loader
> f0f0f0eff0f0f0f0
> $

Thx for the quick feedback Andew. I'm clearly lacking in signed math :-(
So is it possible to have a better code seq for the testcase at all ?

-Vineet

>
>
> On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>>
>>
>> On 6/30/23 16:33, Vineet Gupta wrote:
>>> Ran into a minor snafu in const splitting code when playing with test
>>> case from an old PR/23813.
>>>
>>>        long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
>>>
>>> This currently generates
>>>
>>>        li      a5,-252645376
>>>        addi    a5,a5,241
>>>        li      a0,-252645376
>>>        slli    a5,a5,32
>>>        addi    a0,a0,240
>>>        add     a0,a5,a0
>>>        ret
>>>
>>> The signed math in hival extraction introduces an additional bit,
>>> causing loval == hival check to fail.
>>>
>>> | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at ../gcc/config/riscv/riscv.cc:702
>>> | 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
>>> | (gdb)n
>>> | 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
>>> | (gdb)
>> FWIW (and I missed adding this observation to the changelog) I pondered
>> about using unsigned loval/hival with zext_hwi() but that in certain
>> cases can cause additional insns
>>
>> e.g. constant 0x8000_0000 is codegen to LI 1 +SLLI 31 vs, LI
>> 0xFFFFFFFF_80000000
>>
>>
>>> | 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>>> | (gdb) p/x val
>>> | $2 = 0xf0f0f0f0f0f0f0f0
>>> | (gdb) p/x loval
>>> | $3 = 0xfffffffff0f0f0f0
>>> | (gdb) p/x hival
>>> | $4 = 0xfffffffff0f0f0f1
>>>                          ^^^
>>> Fix that by eliding the subtraction in shift.
>>>
>>> With patch:
>>>
>>>        li      a5,-252645376
>>>        addi    a5,a5,240
>>>        slli    a0,a5,32
>>>        add     a0,a0,a5
>>>        ret
>>>
>>> gcc/ChangeLog:
>>>
>>>        * config/riscv/riscv.cc (riscv_split_integer): hival computation
>>>          do elide subtraction of loval.
>>>        * (riscv_split_integer_cost): Ditto.
>>>        * (riscv_build_integer): Ditto
>>>
>>> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
>>> ---
>>> I wasn't planning to do any more work on large const stuff, but just ran into it this
>>> on a random BZ entry when trying search for redundant constant stuff.
>>> The test seemed too good to pass :-)
>>> ---
>>>    gcc/config/riscv/riscv.cc | 6 +++---
>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>> index 5ac187c1b1b4..377d3aac794b 100644
>>> --- a/gcc/config/riscv/riscv.cc
>>> +++ b/gcc/config/riscv/riscv.cc
>>> @@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
>>>          && (value > INT32_MAX || value < INT32_MIN))
>>>        {
>>>          unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
>>> -      unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
>>> +      unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
>>>          struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
>>>          struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
>>>          int hi_cost, lo_cost;
>>> @@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
>>>    {
>>>      int cost;
>>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
>>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
>>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
>>>      struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
>>>
>>>      cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
>>> @@ -700,7 +700,7 @@ static rtx
>>>    riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
>>>    {
>>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
>>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
>>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
>>>      rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>>>
>>>      riscv_move_integer (lo, lo, loval, mode);

Andrew Waterman July 1, 2023, 12:25 a.m. UTC | #4

On Fri, Jun 30, 2023 at 5:13 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>
>
>
> On 6/30/23 16:50, Andrew Waterman wrote:
> > I don't believe this is correct; the subtraction is needed to account
> > for the fact that the low part might be negative, resulting in a
> > borrow from the high part.  See the output for your test case below:
> >
> > $ cat test.c
> > #include <stdio.h>
> >
> > int main()
> > {
> >    unsigned long result, tmp;
> >
> > asm (
> >    "li      %1,-252645376\n"
> >    "addi    %1,%1,240\n"
> >    "slli    %0,%1,32\n"
> >    "add     %0,%0,%1"
> >      : "=r" (result), "=r" (tmp));
> >
> >    printf("%lx\n", result);
> >
> >    return 0;
> > }
> > $ riscv64-unknown-elf-gcc -O2 test.c
> > $ spike pk a.out
> > bbl loader
> > f0f0f0eff0f0f0f0
> > $
>
> Thx for the quick feedback Andew. I'm clearly lacking in signed math :-(
> So is it possible to have a better code seq for the testcase at all ?

You're welcome!

When Zba is implemented, then inserting a zext.w would do the trick;
see below.  (The generalization is that the zext.w is needed if the
32-bit constant is negative.)  When Zba is not implemented, I think
the original sequence is optimal.

li      a5, -252645376
addi    a5, a5, 240
slli    a0, a5, 32
zext.w  a5, a5
add     a0, a0, a5


>
> -Vineet
>
> >
> >
> > On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
> >>
> >>
> >> On 6/30/23 16:33, Vineet Gupta wrote:
> >>> Ran into a minor snafu in const splitting code when playing with test
> >>> case from an old PR/23813.
> >>>
> >>>        long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
> >>>
> >>> This currently generates
> >>>
> >>>        li      a5,-252645376
> >>>        addi    a5,a5,241
> >>>        li      a0,-252645376
> >>>        slli    a5,a5,32
> >>>        addi    a0,a0,240
> >>>        add     a0,a5,a0
> >>>        ret
> >>>
> >>> The signed math in hival extraction introduces an additional bit,
> >>> causing loval == hival check to fail.
> >>>
> >>> | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at ../gcc/config/riscv/riscv.cc:702
> >>> | 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> >>> | (gdb)n
> >>> | 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> >>> | (gdb)
> >> FWIW (and I missed adding this observation to the changelog) I pondered
> >> about using unsigned loval/hival with zext_hwi() but that in certain
> >> cases can cause additional insns
> >>
> >> e.g. constant 0x8000_0000 is codegen to LI 1 +SLLI 31 vs, LI
> >> 0xFFFFFFFF_80000000
> >>
> >>
> >>> | 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> >>> | (gdb) p/x val
> >>> | $2 = 0xf0f0f0f0f0f0f0f0
> >>> | (gdb) p/x loval
> >>> | $3 = 0xfffffffff0f0f0f0
> >>> | (gdb) p/x hival
> >>> | $4 = 0xfffffffff0f0f0f1
> >>>                          ^^^
> >>> Fix that by eliding the subtraction in shift.
> >>>
> >>> With patch:
> >>>
> >>>        li      a5,-252645376
> >>>        addi    a5,a5,240
> >>>        slli    a0,a5,32
> >>>        add     a0,a0,a5
> >>>        ret
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>        * config/riscv/riscv.cc (riscv_split_integer): hival computation
> >>>          do elide subtraction of loval.
> >>>        * (riscv_split_integer_cost): Ditto.
> >>>        * (riscv_build_integer): Ditto
> >>>
> >>> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> >>> ---
> >>> I wasn't planning to do any more work on large const stuff, but just ran into it this
> >>> on a random BZ entry when trying search for redundant constant stuff.
> >>> The test seemed too good to pass :-)
> >>> ---
> >>>    gcc/config/riscv/riscv.cc | 6 +++---
> >>>    1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >>> index 5ac187c1b1b4..377d3aac794b 100644
> >>> --- a/gcc/config/riscv/riscv.cc
> >>> +++ b/gcc/config/riscv/riscv.cc
> >>> @@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
> >>>          && (value > INT32_MAX || value < INT32_MIN))
> >>>        {
> >>>          unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
> >>> -      unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
> >>> +      unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
> >>>          struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
> >>>          struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
> >>>          int hi_cost, lo_cost;
> >>> @@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
> >>>    {
> >>>      int cost;
> >>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> >>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> >>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
> >>>      struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
> >>>
> >>>      cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
> >>> @@ -700,7 +700,7 @@ static rtx
> >>>    riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
> >>>    {
> >>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> >>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> >>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
> >>>      rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> >>>
> >>>      riscv_move_integer (lo, lo, loval, mode);
>

Palmer Dabbelt July 1, 2023, 12:36 a.m. UTC | #5

On Fri, 30 Jun 2023 17:25:54 PDT (-0700), Andrew Waterman wrote:
> On Fri, Jun 30, 2023 at 5:13 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>>
>>
>>
>> On 6/30/23 16:50, Andrew Waterman wrote:
>> > I don't believe this is correct; the subtraction is needed to account
>> > for the fact that the low part might be negative, resulting in a
>> > borrow from the high part.  See the output for your test case below:
>> >
>> > $ cat test.c
>> > #include <stdio.h>
>> >
>> > int main()
>> > {
>> >    unsigned long result, tmp;
>> >
>> > asm (
>> >    "li      %1,-252645376\n"
>> >    "addi    %1,%1,240\n"
>> >    "slli    %0,%1,32\n"
>> >    "add     %0,%0,%1"
>> >      : "=r" (result), "=r" (tmp));
>> >
>> >    printf("%lx\n", result);
>> >
>> >    return 0;
>> > }
>> > $ riscv64-unknown-elf-gcc -O2 test.c
>> > $ spike pk a.out
>> > bbl loader
>> > f0f0f0eff0f0f0f0
>> > $
>>
>> Thx for the quick feedback Andew. I'm clearly lacking in signed math :-(
>> So is it possible to have a better code seq for the testcase at all ?
>
> You're welcome!
>
> When Zba is implemented, then inserting a zext.w would do the trick;
> see below.  (The generalization is that the zext.w is needed if the
> 32-bit constant is negative.)  When Zba is not implemented, I think
> the original sequence is optimal.
>
> li      a5, -252645376
> addi    a5, a5, 240
> slli    a0, a5, 32
> zext.w  a5, a5
> add     a0, a0, a5

For the non-Zba case, I think we can leverage the two high parts 
starting out the same to save an instruction generating the constant.  
So for the original code sequence of 

        li      a5,-252645376
        addi    a5,a5,241
        li      a0,-252645376
        slli    a5,a5,32
        addi    a0,a0,240
        add     a0,a5,a0
        ret

we could instead generate

        li      a5,-252645376
        addi    a0,a5,240
        addi    a5,a5,241
        slli    a5,a5,32
        add     a0,a5,a0
        ret

which IIUC produces the same result.  I think something along the lines 
of this (with the corresponding cost function updates) would do it

    diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
    index de578b5b899..32b6033a966 100644
    --- a/gcc/config/riscv/riscv.cc
    +++ b/gcc/config/riscv/riscv.cc
    @@ -704,7 +704,13 @@ riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
       rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
     
       riscv_move_integer (hi, hi, hival, mode);
    -  riscv_move_integer (lo, lo, loval, mode);
    +  if (riscv_integer_cost (loval - hival) + 1 < riscv_integer_cost (loval)) {
    +    rtx delta = gen_reg_rrtx (mode);
    +    riscv_move_integer (delta, delta, loval - hival, mode);
    +    lo = gen_rtx_fmt_ee (PLUS, mode, hi, delta);
    +  } else {
    +    riscv_move_integer (lo, lo, loval, mode);
    +  }
     
       hi = gen_rtx_fmt_ee (ASHIFT, mode, hi, GEN_INT (32));
       hi = force_reg (mode, hi);

though I suppose that would produce a slightly different sequence that has the
same number of instructions but a slightly longer dependency chain, something
more like

        li      a5,-252645376
        addi    a5,a5,241
        addi    a0,a5,-1
        slli    a5,a5,32
        add     a0,a5,a0
        ret

Take that all with a grain of salt, though, as I just ate some very spicy
chicken and can barely see straight :)


>
>
>>
>> -Vineet
>>
>> >
>> >
>> > On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>> >>
>> >>
>> >> On 6/30/23 16:33, Vineet Gupta wrote:
>> >>> Ran into a minor snafu in const splitting code when playing with test
>> >>> case from an old PR/23813.
>> >>>
>> >>>        long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
>> >>>
>> >>> This currently generates
>> >>>
>> >>>        li      a5,-252645376
>> >>>        addi    a5,a5,241
>> >>>        li      a0,-252645376
>> >>>        slli    a5,a5,32
>> >>>        addi    a0,a0,240
>> >>>        add     a0,a5,a0
>> >>>        ret
>> >>>
>> >>> The signed math in hival extraction introduces an additional bit,
>> >>> causing loval == hival check to fail.
>> >>>
>> >>> | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at ../gcc/config/riscv/riscv.cc:702
>> >>> | 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
>> >>> | (gdb)n
>> >>> | 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
>> >>> | (gdb)
>> >> FWIW (and I missed adding this observation to the changelog) I pondered
>> >> about using unsigned loval/hival with zext_hwi() but that in certain
>> >> cases can cause additional insns
>> >>
>> >> e.g. constant 0x8000_0000 is codegen to LI 1 +SLLI 31 vs, LI
>> >> 0xFFFFFFFF_80000000
>> >>
>> >>
>> >>> | 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>> >>> | (gdb) p/x val
>> >>> | $2 = 0xf0f0f0f0f0f0f0f0
>> >>> | (gdb) p/x loval
>> >>> | $3 = 0xfffffffff0f0f0f0
>> >>> | (gdb) p/x hival
>> >>> | $4 = 0xfffffffff0f0f0f1
>> >>>                          ^^^
>> >>> Fix that by eliding the subtraction in shift.
>> >>>
>> >>> With patch:
>> >>>
>> >>>        li      a5,-252645376
>> >>>        addi    a5,a5,240
>> >>>        slli    a0,a5,32
>> >>>        add     a0,a0,a5
>> >>>        ret
>> >>>
>> >>> gcc/ChangeLog:
>> >>>
>> >>>        * config/riscv/riscv.cc (riscv_split_integer): hival computation
>> >>>          do elide subtraction of loval.
>> >>>        * (riscv_split_integer_cost): Ditto.
>> >>>        * (riscv_build_integer): Ditto
>> >>>
>> >>> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
>> >>> ---
>> >>> I wasn't planning to do any more work on large const stuff, but just ran into it this
>> >>> on a random BZ entry when trying search for redundant constant stuff.
>> >>> The test seemed too good to pass :-)
>> >>> ---
>> >>>    gcc/config/riscv/riscv.cc | 6 +++---
>> >>>    1 file changed, 3 insertions(+), 3 deletions(-)
>> >>>
>> >>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> >>> index 5ac187c1b1b4..377d3aac794b 100644
>> >>> --- a/gcc/config/riscv/riscv.cc
>> >>> +++ b/gcc/config/riscv/riscv.cc
>> >>> @@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
>> >>>          && (value > INT32_MAX || value < INT32_MIN))
>> >>>        {
>> >>>          unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
>> >>> -      unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
>> >>> +      unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
>> >>>          struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
>> >>>          struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
>> >>>          int hi_cost, lo_cost;
>> >>> @@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
>> >>>    {
>> >>>      int cost;
>> >>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
>> >>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
>> >>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
>> >>>      struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
>> >>>
>> >>>      cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
>> >>> @@ -700,7 +700,7 @@ static rtx
>> >>>    riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
>> >>>    {
>> >>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
>> >>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
>> >>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
>> >>>      rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>> >>>
>> >>>      riscv_move_integer (lo, lo, loval, mode);
>>

Andrew Waterman July 1, 2023, 8 a.m. UTC | #6

On Fri, Jun 30, 2023 at 5:36 PM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>
> On Fri, 30 Jun 2023 17:25:54 PDT (-0700), Andrew Waterman wrote:
> > On Fri, Jun 30, 2023 at 5:13 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
> >>
> >>
> >>
> >> On 6/30/23 16:50, Andrew Waterman wrote:
> >> > I don't believe this is correct; the subtraction is needed to account
> >> > for the fact that the low part might be negative, resulting in a
> >> > borrow from the high part.  See the output for your test case below:
> >> >
> >> > $ cat test.c
> >> > #include <stdio.h>
> >> >
> >> > int main()
> >> > {
> >> >    unsigned long result, tmp;
> >> >
> >> > asm (
> >> >    "li      %1,-252645376\n"
> >> >    "addi    %1,%1,240\n"
> >> >    "slli    %0,%1,32\n"
> >> >    "add     %0,%0,%1"
> >> >      : "=r" (result), "=r" (tmp));
> >> >
> >> >    printf("%lx\n", result);
> >> >
> >> >    return 0;
> >> > }
> >> > $ riscv64-unknown-elf-gcc -O2 test.c
> >> > $ spike pk a.out
> >> > bbl loader
> >> > f0f0f0eff0f0f0f0
> >> > $
> >>
> >> Thx for the quick feedback Andew. I'm clearly lacking in signed math :-(
> >> So is it possible to have a better code seq for the testcase at all ?
> >
> > You're welcome!
> >
> > When Zba is implemented, then inserting a zext.w would do the trick;
> > see below.  (The generalization is that the zext.w is needed if the
> > 32-bit constant is negative.)  When Zba is not implemented, I think
> > the original sequence is optimal.
> >
> > li      a5, -252645376
> > addi    a5, a5, 240
> > slli    a0, a5, 32
> > zext.w  a5, a5
> > add     a0, a0, a5
>
> For the non-Zba case, I think we can leverage the two high parts
> starting out the same to save an instruction generating the constant.
> So for the original code sequence of
>
>         li      a5,-252645376
>         addi    a5,a5,241
>         li      a0,-252645376
>         slli    a5,a5,32
>         addi    a0,a0,240
>         add     a0,a5,a0
>         ret
>
> we could instead generate
>
>         li      a5,-252645376
>         addi    a0,a5,240
>         addi    a5,a5,241
>         slli    a5,a5,32
>         add     a0,a5,a0
>         ret
>
> which IIUC produces the same result.  I think something along the lines
> of this (with the corresponding cost function updates) would do it
>
>     diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>     index de578b5b899..32b6033a966 100644
>     --- a/gcc/config/riscv/riscv.cc
>     +++ b/gcc/config/riscv/riscv.cc
>     @@ -704,7 +704,13 @@ riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
>        rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>
>        riscv_move_integer (hi, hi, hival, mode);
>     -  riscv_move_integer (lo, lo, loval, mode);
>     +  if (riscv_integer_cost (loval - hival) + 1 < riscv_integer_cost (loval)) {
>     +    rtx delta = gen_reg_rrtx (mode);
>     +    riscv_move_integer (delta, delta, loval - hival, mode);
>     +    lo = gen_rtx_fmt_ee (PLUS, mode, hi, delta);
>     +  } else {
>     +    riscv_move_integer (lo, lo, loval, mode);
>     +  }
>
>        hi = gen_rtx_fmt_ee (ASHIFT, mode, hi, GEN_INT (32));
>        hi = force_reg (mode, hi);
>
> though I suppose that would produce a slightly different sequence that has the
> same number of instructions but a slightly longer dependency chain, something
> more like
>
>         li      a5,-252645376
>         addi    a5,a5,241
>         addi    a0,a5,-1
>         slli    a5,a5,32
>         add     a0,a5,a0
>         ret
>
> Take that all with a grain of salt, though, as I just ate some very spicy
> chicken and can barely see straight :)

Yeah, that might end up being a false economy for superscalars.

In general, I wouldn't recommend spending too many cleverness beans on
non-Zba+Zbb implementations.  Going forward, we should expect that
even very simple cores provide those extensions.

>
>
> >
> >
> >>
> >> -Vineet
> >>
> >> >
> >> >
> >> > On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
> >> >>
> >> >>
> >> >> On 6/30/23 16:33, Vineet Gupta wrote:
> >> >>> Ran into a minor snafu in const splitting code when playing with test
> >> >>> case from an old PR/23813.
> >> >>>
> >> >>>        long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
> >> >>>
> >> >>> This currently generates
> >> >>>
> >> >>>        li      a5,-252645376
> >> >>>        addi    a5,a5,241
> >> >>>        li      a0,-252645376
> >> >>>        slli    a5,a5,32
> >> >>>        addi    a0,a0,240
> >> >>>        add     a0,a5,a0
> >> >>>        ret
> >> >>>
> >> >>> The signed math in hival extraction introduces an additional bit,
> >> >>> causing loval == hival check to fail.
> >> >>>
> >> >>> | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at ../gcc/config/riscv/riscv.cc:702
> >> >>> | 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> >> >>> | (gdb)n
> >> >>> | 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> >> >>> | (gdb)
> >> >> FWIW (and I missed adding this observation to the changelog) I pondered
> >> >> about using unsigned loval/hival with zext_hwi() but that in certain
> >> >> cases can cause additional insns
> >> >>
> >> >> e.g. constant 0x8000_0000 is codegen to LI 1 +SLLI 31 vs, LI
> >> >> 0xFFFFFFFF_80000000
> >> >>
> >> >>
> >> >>> | 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> >> >>> | (gdb) p/x val
> >> >>> | $2 = 0xf0f0f0f0f0f0f0f0
> >> >>> | (gdb) p/x loval
> >> >>> | $3 = 0xfffffffff0f0f0f0
> >> >>> | (gdb) p/x hival
> >> >>> | $4 = 0xfffffffff0f0f0f1
> >> >>>                          ^^^
> >> >>> Fix that by eliding the subtraction in shift.
> >> >>>
> >> >>> With patch:
> >> >>>
> >> >>>        li      a5,-252645376
> >> >>>        addi    a5,a5,240
> >> >>>        slli    a0,a5,32
> >> >>>        add     a0,a0,a5
> >> >>>        ret
> >> >>>
> >> >>> gcc/ChangeLog:
> >> >>>
> >> >>>        * config/riscv/riscv.cc (riscv_split_integer): hival computation
> >> >>>          do elide subtraction of loval.
> >> >>>        * (riscv_split_integer_cost): Ditto.
> >> >>>        * (riscv_build_integer): Ditto
> >> >>>
> >> >>> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> >> >>> ---
> >> >>> I wasn't planning to do any more work on large const stuff, but just ran into it this
> >> >>> on a random BZ entry when trying search for redundant constant stuff.
> >> >>> The test seemed too good to pass :-)
> >> >>> ---
> >> >>>    gcc/config/riscv/riscv.cc | 6 +++---
> >> >>>    1 file changed, 3 insertions(+), 3 deletions(-)
> >> >>>
> >> >>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >> >>> index 5ac187c1b1b4..377d3aac794b 100644
> >> >>> --- a/gcc/config/riscv/riscv.cc
> >> >>> +++ b/gcc/config/riscv/riscv.cc
> >> >>> @@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
> >> >>>          && (value > INT32_MAX || value < INT32_MIN))
> >> >>>        {
> >> >>>          unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
> >> >>> -      unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
> >> >>> +      unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
> >> >>>          struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
> >> >>>          struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
> >> >>>          int hi_cost, lo_cost;
> >> >>> @@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
> >> >>>    {
> >> >>>      int cost;
> >> >>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> >> >>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> >> >>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
> >> >>>      struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
> >> >>>
> >> >>>      cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
> >> >>> @@ -700,7 +700,7 @@ static rtx
> >> >>>    riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
> >> >>>    {
> >> >>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> >> >>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> >> >>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
> >> >>>      rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> >> >>>
> >> >>>      riscv_move_integer (lo, lo, loval, mode);
> >>

Jeff Law July 1, 2023, 2:04 p.m. UTC | #7

On 7/1/23 02:00, Andrew Waterman wrote:

> 
> Yeah, that might end up being a false economy for superscalars.
> 
> In general, I wouldn't recommend spending too many cleverness beans on
> non-Zba+Zbb implementations.  Going forward, we should expect that
> even very simple cores provide those extensions.
I suspect you under-estimate how difficult it is to get the distros to 
move forward on baseline ISAs.

jeff

Palmer Dabbelt July 1, 2023, 2:34 p.m. UTC | #8

On Sat, 01 Jul 2023 07:04:16 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
>
> On 7/1/23 02:00, Andrew Waterman wrote:
>
>>
>> Yeah, that might end up being a false economy for superscalars.
>>
>> In general, I wouldn't recommend spending too many cleverness beans on
>> non-Zba+Zbb implementations.  Going forward, we should expect that
>> even very simple cores provide those extensions.
> I suspect you under-estimate how difficult it is to get the distros to
> move forward on baseline ISAs.

Ya, we haven't even gotten to the point where most implementations are 
shipping with the B extensions, much less to the point where we can 
start ignoring all the pre-B hardware.

Andrew Waterman July 1, 2023, 3:08 p.m. UTC | #9

On Sat, Jul 1, 2023 at 7:04 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 7/1/23 02:00, Andrew Waterman wrote:
>
> >
> > Yeah, that might end up being a false economy for superscalars.
> >
> > In general, I wouldn't recommend spending too many cleverness beans on
> > non-Zba+Zbb implementations.  Going forward, we should expect that
> > even very simple cores provide those extensions.
> I suspect you under-estimate how difficult it is to get the distros to
> move forward on baseline ISAs.

Yeah, true.

>
> jeff

RISC-V: improve codegen for repeating large constants [3]

Commit Message

Comments

Patch