diff mbox series

[v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]

Message ID 20240811104317.531505-1-pan2.li@intel.com
State New
Headers show
Series [v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278] | expand

Commit Message

Li, Pan2 Aug. 11, 2024, 10:43 a.m. UTC
From: Pan Li <pan2.li@intel.com>

For QI/HImode of .SAT_ADD,  the operands may be sign-extended and the
high bits of Xmode may be all 1 which is not expected.  For example as
below code.

signed char b[1];
unsigned short c;
signed char *d = b;
int main() {
  b[0] = -40;
  c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsigned short)d[0] : 0xFFF6; }) + 9;
  __builtin_printf("%d\n", c);
}

After expanding we have:

;; _6 = .SAT_ADD (_3, 9);
(insn 8 7 9 (set (reg:DI 143)
        (high:DI (symbol_ref:DI ("d") [flags 0x86]  <var_decl d>)))
     (nil))
(insn 9 8 10 (set (reg/f:DI 142)
        (mem/f/c:DI (lo_sum:DI (reg:DI 143)
                (symbol_ref:DI ("d") [flags 0x86]  <var_decl d>)) [1 d+0 S8 A64]))
     (nil))
(insn 10 9 11 (set (reg:HI 144 [ _3 ])
        (sign_extend:HI (mem:QI (reg/f:DI 142) [0 *d.0_1+0 S1 A8]))) "test.c":7:10 -1
     (nil))

The convert from signed char to unsigned short will have sign_extend rtl
as above.  And finally become the lb insn as below:

lb      a1,0(a5)   // a1 is -40, aka 0xffffffffffffffd8
lui     a0,0x1a
addi    a5,a1,9
slli    a5,a5,0x30
srli    a5,a5,0x30 // a5 is 65505
sltu    a1,a5,a1   // compare 65505 and 0xffffffffffffffd8 => TRUE

The sltu try to compare 65505 and 0xffffffffffffffd8 here,  but we
actually want to compare 65505 and 65496 (0xffd8).  Thus we need to
clean up the high bits to ensure this.

The below test suites are passed for this patch:
* The rv64gcv fully regression test.

	PR target/116278

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new
	func impl to zero extend rtx.
	(riscv_expand_usadd): Leverage above func to cleanup operands
	and sum.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr116278-run-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/riscv.cc                     | 19 ++++++++++++++++++-
 .../gcc.target/riscv/pr116278-run-1.c         | 16 ++++++++++++++++
 2 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr116278-run-1.c

Comments

Jeff Law Aug. 12, 2024, 4:58 p.m. UTC | #1
On 8/11/24 4:43 AM, pan2.li@intel.com wrote:

> +static rtx
> +riscv_gen_zero_extend_rtx (rtx x, machine_mode mode)
> +{
> +  if (mode != HImode && mode != QImode)
> +    return gen_lowpart (Xmode, x);
Isn't this wrong for SImode on rv64?  It seems to me the right test is 
mode != word_mode?


Assuming that works, it's OK for the trunk.

jeff
Li, Pan2 Aug. 13, 2024, 2:09 a.m. UTC | #2
> Isn't this wrong for SImode on rv64?  It seems to me the right test is 
> mode != word_mode?
> Assuming that works, it's OK for the trunk.

Thanks Jeff, Simode version of test file doesn't have this issue. Thus, only HI and QI here.
I will add a new test for SImode in v3 to ensure this.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Tuesday, August 13, 2024 12:58 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; rdapp.gcc@gmail.com
Subject: Re: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]



On 8/11/24 4:43 AM, pan2.li@intel.com wrote:

> +static rtx
> +riscv_gen_zero_extend_rtx (rtx x, machine_mode mode)
> +{
> +  if (mode != HImode && mode != QImode)
> +    return gen_lowpart (Xmode, x);
Isn't this wrong for SImode on rv64?  It seems to me the right test is 
mode != word_mode?


Assuming that works, it's OK for the trunk.

jeff
Jeff Law Aug. 14, 2024, 4:03 a.m. UTC | #3
On 8/12/24 8:09 PM, Li, Pan2 wrote:
>> Isn't this wrong for SImode on rv64?  It seems to me the right test is
>> mode != word_mode?
>> Assuming that works, it's OK for the trunk.
> 
> Thanks Jeff, Simode version of test file doesn't have this issue. Thus, only HI and QI here.
> I will add a new test for SImode in v3 to ensure this.
How specifically is it avoided for SI?  ISTM it should have the exact 
same problem with a constant like 0x80000000 in SImode on rv64 which is 
going to be extended to 0xffffffff80000000.

Jeff
Li, Pan2 Aug. 14, 2024, 4:16 a.m. UTC | #4
> How specifically is it avoided for SI?  ISTM it should have the exact 
> same problem with a constant like 0x80000000 in SImode on rv64 which is 
> going to be extended to 0xffffffff80000000.

HI and QI need some special handling for sum. For example, for HImode.

65535 + 2 = 65537, when compare sum and 2, we need to cleanup the high bits (aka make 65537 become 1) to tell the HImode overflow.
Thus, for HI and QI, we need to clean up highest bits of mode.

But for SI, we don't need that as we have addw insn, the sign extend will take care of this as well as the sltu. For example, SImode.

lw      a1,0(a5)  // a1 is -40, aka 0xffffffffffffffd8
lui     a0,0x1a   // 
addwi    a5,a1,9   // a5 is -31, aka 0xffffffffffffffe1
                           // For QI and HI, we need to mask the highbits, but not applicable for SI.
sltu    a1,a5,a1  // compare a1 and a5, a5 > a1, then no-overflow as expected.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Wednesday, August 14, 2024 12:03 PM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; rdapp.gcc@gmail.com
Subject: Re: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]



On 8/12/24 8:09 PM, Li, Pan2 wrote:
>> Isn't this wrong for SImode on rv64?  It seems to me the right test is
>> mode != word_mode?
>> Assuming that works, it's OK for the trunk.
> 
> Thanks Jeff, Simode version of test file doesn't have this issue. Thus, only HI and QI here.
> I will add a new test for SImode in v3 to ensure this.
How specifically is it avoided for SI?  ISTM it should have the exact 
same problem with a constant like 0x80000000 in SImode on rv64 which is 
going to be extended to 0xffffffff80000000.

Jeff
Jeff Law Aug. 16, 2024, 4:22 a.m. UTC | #5
On 8/13/24 10:16 PM, Li, Pan2 wrote:
>> How specifically is it avoided for SI?  ISTM it should have the exact
>> same problem with a constant like 0x80000000 in SImode on rv64 which is
>> going to be extended to 0xffffffff80000000.
> 
> HI and QI need some special handling for sum. For example, for HImode.
> 
> 65535 + 2 = 65537, when compare sum and 2, we need to cleanup the high bits (aka make 65537 become 1) to tell the HImode overflow.
> Thus, for HI and QI, we need to clean up highest bits of mode.
> 
> But for SI, we don't need that as we have addw insn, the sign extend will take care of this as well as the sltu. For example, SImode.
> 
> lw      a1,0(a5)  // a1 is -40, aka 0xffffffffffffffd8
> lui     a0,0x1a   //
> addwi    a5,a1,9   // a5 is -31, aka 0xffffffffffffffe1
>                             // For QI and HI, we need to mask the highbits, but not applicable for SI.
> sltu    a1,a5,a1  // compare a1 and a5, a5 > a1, then no-overflow as expected.
What's more important is that we get the RTL semantics right, the fact 
that it seems to work due to addiw seems to be more of an accident than 
by design.  Also note that addiw isn't available unless ZBA is enabled, 
so we don't want to depend on that to save us.

I still think we should be handling SI on rv64 in a manner similar to 
QI/HI are handled on rv32/rv64.

jeff
Andrew Waterman Aug. 16, 2024, 4:28 a.m. UTC | #6
On Thu, Aug 15, 2024 at 9:23 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 8/13/24 10:16 PM, Li, Pan2 wrote:
> >> How specifically is it avoided for SI?  ISTM it should have the exact
> >> same problem with a constant like 0x80000000 in SImode on rv64 which is
> >> going to be extended to 0xffffffff80000000.
> >
> > HI and QI need some special handling for sum. For example, for HImode.
> >
> > 65535 + 2 = 65537, when compare sum and 2, we need to cleanup the high bits (aka make 65537 become 1) to tell the HImode overflow.
> > Thus, for HI and QI, we need to clean up highest bits of mode.
> >
> > But for SI, we don't need that as we have addw insn, the sign extend will take care of this as well as the sltu. For example, SImode.
> >
> > lw      a1,0(a5)  // a1 is -40, aka 0xffffffffffffffd8
> > lui     a0,0x1a   //
> > addwi    a5,a1,9   // a5 is -31, aka 0xffffffffffffffe1
> >                             // For QI and HI, we need to mask the highbits, but not applicable for SI.
> > sltu    a1,a5,a1  // compare a1 and a5, a5 > a1, then no-overflow as expected.
> What's more important is that we get the RTL semantics right, the fact
> that it seems to work due to addiw seems to be more of an accident than
> by design.  Also note that addiw isn't available unless ZBA is enabled,
> so we don't want to depend on that to save us.

addiw is always available in RV64; you're probably thinking of add.uw,
which is an RV64_Zba instruction.  I think your overall point still
holds, though.

>
> I still think we should be handling SI on rv64 in a manner similar to
> QI/HI are handled on rv32/rv64.
>
> jeff
>
Li, Pan2 Aug. 16, 2024, 12:38 p.m. UTC | #7
Thanks Jeff and waterman for comments.

> What's more important is that we get the RTL semantics right, the fact
> that it seems to work due to addiw seems to be more of an accident than
> by design.

The SImode has different handling from day 1 which follow the algorithm up to a point.

11842   if (mode == SImode && mode != Xmode)
11843     { /* Take addw to avoid the sum truncate.  
11844       rtx simode_sum = gen_reg_rtx (SImode
11845       riscv_emit_binary (PLUS, simode_sum, x, y
11846       emit_move_insn (xmode_sum, gen_lowpart (Xmode, simode_sum));                                                                                                                                                                  
11847     }

> I think your overall point still holds, though.

Got the point here but I would like to double confirm the below 2 more insn is acceptable for this change. (or we can eliminate it later)

sat_u_add_uint32_t_fmt_1:
        slli    a5,a0,32   // additional insn for taking care SI in rv64
        srli    a5,a5,32   // Ditto.
        addw    a0,a0,a1
        sltu    a5,a0,a5                                                                                                                                                                                                                           
        neg     a5,a5                                                                                                                                                                                                                              
        or      a0,a5,a0
        sext.w  a0,a0
        ret

If so, I will prepare the v3 for the SImode in RV64.

Pan

-----Original Message-----
From: Andrew Waterman <andrew@sifive.com> 
Sent: Friday, August 16, 2024 12:28 PM
To: Jeff Law <jeffreyalaw@gmail.com>
Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; rdapp.gcc@gmail.com
Subject: Re: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]

On Thu, Aug 15, 2024 at 9:23 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 8/13/24 10:16 PM, Li, Pan2 wrote:
> >> How specifically is it avoided for SI?  ISTM it should have the exact
> >> same problem with a constant like 0x80000000 in SImode on rv64 which is
> >> going to be extended to 0xffffffff80000000.
> >
> > HI and QI need some special handling for sum. For example, for HImode.
> >
> > 65535 + 2 = 65537, when compare sum and 2, we need to cleanup the high bits (aka make 65537 become 1) to tell the HImode overflow.
> > Thus, for HI and QI, we need to clean up highest bits of mode.
> >
> > But for SI, we don't need that as we have addw insn, the sign extend will take care of this as well as the sltu. For example, SImode.
> >
> > lw      a1,0(a5)  // a1 is -40, aka 0xffffffffffffffd8
> > lui     a0,0x1a   //
> > addwi    a5,a1,9   // a5 is -31, aka 0xffffffffffffffe1
> >                             // For QI and HI, we need to mask the highbits, but not applicable for SI.
> > sltu    a1,a5,a1  // compare a1 and a5, a5 > a1, then no-overflow as expected.
> What's more important is that we get the RTL semantics right, the fact
> that it seems to work due to addiw seems to be more of an accident than
> by design.  Also note that addiw isn't available unless ZBA is enabled,
> so we don't want to depend on that to save us.

addiw is always available in RV64; you're probably thinking of add.uw,
which is an RV64_Zba instruction.  I think your overall point still
holds, though.

>
> I still think we should be handling SI on rv64 in a manner similar to
> QI/HI are handled on rv32/rv64.
>
> jeff
>
diff mbox series

Patch

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5fe4273beb7..cfdb3d82972 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11564,6 +11564,23 @@  riscv_get_raw_result_mode (int regno)
   return default_get_reg_raw_mode (regno);
 }
 
+/* Generate a new rtx of Xmode based on the rtx and mode in define pattern.
+   The rtx x will be zero extended to Xmode if the mode is HI/QImode,  and
+   the new zero extended Xmode rtx will be returned.
+   Or the gen_lowpart rtx of Xmode will be returned.  */
+
+static rtx
+riscv_gen_zero_extend_rtx (rtx x, machine_mode mode)
+{
+  if (mode != HImode && mode != QImode)
+    return gen_lowpart (Xmode, x);
+
+  rtx xmode_reg = gen_reg_rtx (Xmode);
+  riscv_emit_unary (ZERO_EXTEND, xmode_reg, x);
+
+  return xmode_reg;
+}
+
 /* Implements the unsigned saturation add standard name usadd for int mode.
 
    z = SAT_ADD(x, y).
@@ -11580,7 +11597,7 @@  riscv_expand_usadd (rtx dest, rtx x, rtx y)
   machine_mode mode = GET_MODE (dest);
   rtx xmode_sum = gen_reg_rtx (Xmode);
   rtx xmode_lt = gen_reg_rtx (Xmode);
-  rtx xmode_x = gen_lowpart (Xmode, x);
+  rtx xmode_x = riscv_gen_zero_extend_rtx (x, mode);
   rtx xmode_y = gen_lowpart (Xmode, y);
   rtx xmode_dest = gen_reg_rtx (Xmode);
 
diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
new file mode 100644
index 00000000000..f6268e290ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
@@ -0,0 +1,16 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+signed char b[1];
+int c;
+signed char *d = b;
+
+int main() {
+  b[0] = -40;
+  c = ({
+    (unsigned short)d[0] < 0xFFF6 ? (unsigned short)d[0] : 0xFFF6;
+  }) + 9;
+
+  if (c != 65505)
+    __builtin_abort ();
+}