diff mbox series

[v2] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32.

Message ID 20240906113207.1095-1-jinma@linux.alibaba.com
State New
Headers show
Series [v2] RISC-V: Fixed incorrect semantic description in DF to DI pattern in the Zfa extension on rv32. | expand

Commit Message

Jin Ma Sept. 6, 2024, 11:32 a.m. UTC
In the process of DF to SI, we generally use "unsigned_fix" rather than
"truncate" for conversion. Although this has no effect in general,
unexpected ICE often occurs when precise semantic analysis is required.

gcc/ChangeLog:

	* config/riscv/riscv.md:  Change "truncate" to "unsigned_fix" for
        the Zfa extension on rv32.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test.
---
 gcc/config/riscv/riscv.md                            | 4 ++--
 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-bug.c | 9 +++++++++
 2 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-bug.c

Comments

Jeff Law Sept. 7, 2024, 5:08 p.m. UTC | #1
On 9/6/24 5:32 AM, Jin Ma wrote:
> In the process of DF to SI, we generally use "unsigned_fix" rather than
> "truncate" for conversion. Although this has no effect in general,
> unexpected ICE often occurs when precise semantic analysis is required.
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/riscv.md:  Change "truncate" to "unsigned_fix" for
>          the Zfa extension on rv32.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test.
This doesn't look correct.

fmv.x.w just moves the bits from one place to another, no conversion.

So I don't see how the original pattern was correct.  Using truncate on 
an FP mode source isn't defined.  But I also don't think the updated 
pattern is correct either.
jeff
Christoph Müllner Sept. 8, 2024, 9:28 p.m. UTC | #2
On Sat, Sep 7, 2024 at 7:08 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 9/6/24 5:32 AM, Jin Ma wrote:
> > In the process of DF to SI, we generally use "unsigned_fix" rather than
> > "truncate" for conversion. Although this has no effect in general,
> > unexpected ICE often occurs when precise semantic analysis is required.
> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/riscv.md:  Change "truncate" to "unsigned_fix" for
> >          the Zfa extension on rv32.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test.
> This doesn't look correct.
>
> fmv.x.w just moves the bits from one place to another, no conversion.
>
> So I don't see how the original pattern was correct.  Using truncate on
> an FP mode source isn't defined.  But I also don't think the updated
> pattern is correct either.
> jeff

Having matching pattern for these Zfa moves seems pointless because
the optimization that utilizes the instructions in
riscv_split_doubleword_move() uses:
gen_movdfsisi3_rv32(), gen_movsidf2_low_rv32() and gen_movsidf2_high_rv32().
In the XTheadFmv pattern, we use unspec, so the pattern won't match.
I think this should be done for Zfa as well.
Jin Ma Sept. 9, 2024, 3:16 a.m. UTC | #3
> Having matching pattern for these Zfa moves seems pointless because
> the optimization that utilizes the instructions in
> riscv_split_doubleword_move() uses:
> gen_movdfsisi3_rv32(), gen_movsidf2_low_rv32() and gen_movsidf2_high_rv32().
> In the XTheadFmv pattern, we use unspec, so the pattern won't match.
> I think this should be done for Zfa as well.

Yes, that's a very good suggestion. I will try.

Thanks.
Jin
Jeff Law Sept. 10, 2024, 3:25 p.m. UTC | #4
On 9/8/24 3:28 PM, Christoph Müllner wrote:
> On Sat, Sep 7, 2024 at 7:08 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>>
>> On 9/6/24 5:32 AM, Jin Ma wrote:
>>> In the process of DF to SI, we generally use "unsigned_fix" rather than
>>> "truncate" for conversion. Although this has no effect in general,
>>> unexpected ICE often occurs when precise semantic analysis is required.
>>>
>>> gcc/ChangeLog:
>>>
>>>        * config/riscv/riscv.md:  Change "truncate" to "unsigned_fix" for
>>>           the Zfa extension on rv32.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>        * gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test.
>> This doesn't look correct.
>>
>> fmv.x.w just moves the bits from one place to another, no conversion.
>>
>> So I don't see how the original pattern was correct.  Using truncate on
>> an FP mode source isn't defined.  But I also don't think the updated
>> pattern is correct either.
>> jeff
> 
> Having matching pattern for these Zfa moves seems pointless because
> the optimization that utilizes the instructions in
> riscv_split_doubleword_move() uses:
> gen_movdfsisi3_rv32(), gen_movsidf2_low_rv32() and gen_movsidf2_high_rv32().
> In the XTheadFmv pattern, we use unspec, so the pattern won't match.
> I think this should be done for Zfa as well.
But if the generated code is just moving bits, why can't we use the 
standard movXX patterns for the data movement?  Clearly there's 
something about this that I'm missing.

Jeff
Christoph Müllner Sept. 10, 2024, 4:09 p.m. UTC | #5
On Tue, Sep 10, 2024 at 5:25 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 9/8/24 3:28 PM, Christoph Müllner wrote:
> > On Sat, Sep 7, 2024 at 7:08 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
> >>
> >>
> >>
> >> On 9/6/24 5:32 AM, Jin Ma wrote:
> >>> In the process of DF to SI, we generally use "unsigned_fix" rather than
> >>> "truncate" for conversion. Although this has no effect in general,
> >>> unexpected ICE often occurs when precise semantic analysis is required.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>        * config/riscv/riscv.md:  Change "truncate" to "unsigned_fix" for
> >>>           the Zfa extension on rv32.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>        * gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test.
> >> This doesn't look correct.
> >>
> >> fmv.x.w just moves the bits from one place to another, no conversion.
> >>
> >> So I don't see how the original pattern was correct.  Using truncate on
> >> an FP mode source isn't defined.  But I also don't think the updated
> >> pattern is correct either.
> >> jeff
> >
> > Having matching pattern for these Zfa moves seems pointless because
> > the optimization that utilizes the instructions in
> > riscv_split_doubleword_move() uses:
> > gen_movdfsisi3_rv32(), gen_movsidf2_low_rv32() and gen_movsidf2_high_rv32().
> > In the XTheadFmv pattern, we use unspec, so the pattern won't match.
> > I think this should be done for Zfa as well.
> But if the generated code is just moving bits, why can't we use the
> standard movXX patterns for the data movement?  Clearly there's
> something about this that I'm missing.

This is a special case for rv32 + D, where we have SI, SF and DF, but no DI.
This raises the question of how to model a 2x 32-bit register <-> DF
register transfer.
This is currently done using movdf_hardfloat_rv32, which could either go through
memory (no Zfa), or create an "artificial" 64-bit move using the
constraint "zmvr",
which will later be picked up by the doubleword-move splitter, which
converts the move
into a Zfa move via riscv_split_doubleword_move().
Jeff Law Sept. 18, 2024, 1:54 p.m. UTC | #6
On 9/10/24 10:09 AM, Christoph Müllner wrote:

>> But if the generated code is just moving bits, why can't we use the
>> standard movXX patterns for the data movement?  Clearly there's
>> something about this that I'm missing.
> > This is a special case for rv32 + D, where we have SI, SF and DF,
> but no DI. This raises the question of how to model a 2x 32-bit
> register <-> DF register transfer. This is currently done using
> movdf_hardfloat_rv32, which could either go through memory (no Zfa),
> or create an "artificial" 64-bit move using the constraint "zmvr", 
> which will later be picked up by the doubleword-move splitter, which 
> converts the move into a Zfa move via riscv_split_doubleword_move().
My point is we shouldn't need a special pattern if we're just doing data 
movement.  A special pattern indicates either a misunderstanding of how 
this should work or some behind the scenes semantics of the instruction 
we're using.

Normally for the case where you have 32bit GPRs but 64bit FPRs, a move 
from an FPR<->GPR move will likely go through memory as there often 
isn't a good way to read/write the halves of the FPR.  Which often means 
defining secondary memory reloads.  It's painful, but necessary.

For this to go directly from FPR<->GPR there would have to be 
instructions which either access those halves of the FPR independently 
or instructions that read/write register pairs on the GPR side.

So I think there's some semantic thing I must be missing WRT Zfa on 
rv32.  Does Zfa on rv32 provide one of the two mechanisms above?

jeff
Christoph Müllner Sept. 18, 2024, 2:03 p.m. UTC | #7
On Wed, Sep 18, 2024 at 3:55 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 9/10/24 10:09 AM, Christoph Müllner wrote:
>
> >> But if the generated code is just moving bits, why can't we use the
> >> standard movXX patterns for the data movement?  Clearly there's
> >> something about this that I'm missing.
> > > This is a special case for rv32 + D, where we have SI, SF and DF,
> > but no DI. This raises the question of how to model a 2x 32-bit
> > register <-> DF register transfer. This is currently done using
> > movdf_hardfloat_rv32, which could either go through memory (no Zfa),
> > or create an "artificial" 64-bit move using the constraint "zmvr",
> > which will later be picked up by the doubleword-move splitter, which
> > converts the move into a Zfa move via riscv_split_doubleword_move().
> My point is we shouldn't need a special pattern if we're just doing data
> movement.  A special pattern indicates either a misunderstanding of how
> this should work or some behind the scenes semantics of the instruction
> we're using.
>
> Normally for the case where you have 32bit GPRs but 64bit FPRs, a move
> from an FPR<->GPR move will likely go through memory as there often
> isn't a good way to read/write the halves of the FPR.  Which often means
> defining secondary memory reloads.  It's painful, but necessary.
>
> For this to go directly from FPR<->GPR there would have to be
> instructions which either access those halves of the FPR independently
> or instructions that read/write register pairs on the GPR side.
>
> So I think there's some semantic thing I must be missing WRT Zfa on
> rv32.  Does Zfa on rv32 provide one of the two mechanisms above?

Yes:
* fmvh.x.d moves bits 63:32 of an FPR into a 32-bit GPR
* fmvp.d.x moves two 32-bit GPR registers into a 64-bit FPR
Jeff Law Sept. 18, 2024, 2:43 p.m. UTC | #8
On 9/18/24 8:03 AM, Christoph Müllner wrote:
> On Wed, Sep 18, 2024 at 3:55 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>>
>> On 9/10/24 10:09 AM, Christoph Müllner wrote:
>>
>>>> But if the generated code is just moving bits, why can't we use the
>>>> standard movXX patterns for the data movement?  Clearly there's
>>>> something about this that I'm missing.
>>>> This is a special case for rv32 + D, where we have SI, SF and DF,
>>> but no DI. This raises the question of how to model a 2x 32-bit
>>> register <-> DF register transfer. This is currently done using
>>> movdf_hardfloat_rv32, which could either go through memory (no Zfa),
>>> or create an "artificial" 64-bit move using the constraint "zmvr",
>>> which will later be picked up by the doubleword-move splitter, which
>>> converts the move into a Zfa move via riscv_split_doubleword_move().
>> My point is we shouldn't need a special pattern if we're just doing data
>> movement.  A special pattern indicates either a misunderstanding of how
>> this should work or some behind the scenes semantics of the instruction
>> we're using.
>>
>> Normally for the case where you have 32bit GPRs but 64bit FPRs, a move
>> from an FPR<->GPR move will likely go through memory as there often
>> isn't a good way to read/write the halves of the FPR.  Which often means
>> defining secondary memory reloads.  It's painful, but necessary.
>>
>> For this to go directly from FPR<->GPR there would have to be
>> instructions which either access those halves of the FPR independently
>> or instructions that read/write register pairs on the GPR side.
>>
>> So I think there's some semantic thing I must be missing WRT Zfa on
>> rv32.  Does Zfa on rv32 provide one of the two mechanisms above?
> 
> Yes:
> * fmvh.x.d moves bits 63:32 of an FPR into a 32-bit GPR
> * fmvp.d.x moves two 32-bit GPR registers into a 64-bit FPR
Thanks for clarifying.  If that's what we're using, then yea, hiding 
behind an unspec is reasonable.

In theory we could try to improve things and expose all the underlying 
semantics.  fmvh.x.d is a subreg high part access to the fpr.  fmvp.d.x 
is just a standard 64bit movdi/movdf move, those exposing it as such may 
have some fallout.  BUt I don't think either of those are necessary to 
tackle right now.

So, no objection to the patch from me anymore.  Let's go with the v2.

Jeff
diff mbox series

Patch

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 9f94b5aa0232..36d7b333c456 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2627,7 +2627,7 @@  (define_insn "*movdf_softfloat"
 
 (define_insn "movsidf2_low_rv32"
   [(set (match_operand:SI      0 "register_operand" "=  r")
-	(truncate:SI
+	(unsigned_fix:SI
 	    (match_operand:DF 1 "register_operand"  "zmvf")))]
   "TARGET_HARD_FLOAT && !TARGET_64BIT && TARGET_ZFA"
   "fmv.x.w\t%0,%1"
@@ -2638,7 +2638,7 @@  (define_insn "movsidf2_low_rv32"
 
 (define_insn "movsidf2_high_rv32"
   [(set (match_operand:SI      0 "register_operand"    "=  r")
-	(truncate:SI
+	(unsigned_fix:SI
             (lshiftrt:DF
                 (match_operand:DF 1 "register_operand" "zmvf")
                 (const_int 32))))]
diff --git a/gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-bug.c b/gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-bug.c
new file mode 100644
index 000000000000..e00047b09e3a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zfa-fmovh-fmovp-bug.c
@@ -0,0 +1,9 @@ 
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zfa -mabi=ilp32d -O2 -g" } */
+
+unsigned int
+foo (double a) {
+  unsigned int tt = *(unsigned long long *)&a & 0xffff;
+  return tt;
+}