Message ID | 770bdc23-24cc-4699-af13-38eab3f32b80@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [patch-1,rs6000] enable fctiw on old archs [PR112707] | expand |
Hi Haochen, on 2023/12/1 10:41, HAO CHEN GUI wrote: > Hi, > SImode in float register is supported on P7 above. It causes "fctiw" > can be generated on old 32-bit processors as the output operand of typo? I guess you meant to say "can NOT"? > fctiw insn is a SImode in float/double register. This patch fixes the > problem by adding an expand and an insn pattern for fctiw. The output > of new pattern is SFmode. When the target doesn't support SImode in > float register, it calls the new pattern and convert the SFmode to > SImode via stack. Assuming that due to the inconsistent ISA support levels between stfiwx and lfiw[az]x, it's not practical to support SImode in FP regs. > > Bootstrapped and tested on x86 and powerpc64-linux BE and LE with > no regressions. Is this OK for trunk? > > Thanks > Gui Haochen > > ChangeLog > rs6000: enable fctiw on old archs > > The powerpc 32-bit processors (e.g. 5470) supports "fctiw" instruction, > but the instruction can't be generated on such platforms as the insn is > guard by TARGET_POPCNTD. The root cause is SImode in float register is > supported from Power7. Actually implementation of "fctiw" only needs > stfiwx which is supported by the old 320-bit processors. This patch > enables "fctiw" expand for these processors. > > gcc/ > PR target/112707 > * config/rs6000/rs6000.md (UNSPEC_STFIWX_SF, UNSPEC_FCTIW_SF): New. > (expand lrint<mode>si2): New. > (insn lrint<mode>si2): Rename to... > (lrint<mode>si_internal): ...this, and remove guard TARGET_POPCNTD. > (lrint<mode>si_internal2): New. > (stfiwx_sf): New. > > gcc/testsuite/ > PR target/112707 > * gcc.target/powerpc/pr112707-1.c: New. > > patch.diff > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index d4337ce42a9..1b207522ad5 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -90,6 +90,7 @@ (define_c_enum "unspec" > UNSPEC_TLSTLS_PCREL > UNSPEC_FIX_TRUNC_TF ; fadd, rounding towards zero > UNSPEC_STFIWX > + UNSPEC_STFIWX_SF > UNSPEC_POPCNTB > UNSPEC_FRES > UNSPEC_SP_SET > @@ -111,6 +112,7 @@ (define_c_enum "unspec" > UNSPEC_PARITY > UNSPEC_CMPB > UNSPEC_FCTIW > + UNSPEC_FCTIW_SF > UNSPEC_FCTID > UNSPEC_LFIWAX > UNSPEC_LFIWZX > @@ -6722,11 +6724,39 @@ (define_insn "lrint<mode>di2" > "fctid %0,%1" > [(set_attr "type" "fp")]) > > -(define_insn "lrint<mode>si2" > +(define_expand "lrint<mode>si2" > [(set (match_operand:SI 0 "gpc_reg_operand" "=d") > (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")] > UNSPEC_FCTIW))] > - "TARGET_HARD_FLOAT && TARGET_POPCNTD" > + "TARGET_HARD_FLOAT && TARGET_STFIWX" > +{ > + /* For those old archs in which SImode can't be hold in float registers, > + call lrint<mode>si_internal2 to put the result in SFmode then > + convert it via stack. */ > + if (!TARGET_POPCNTD) > + { > + rtx tmp = gen_reg_rtx (SFmode); > + emit_insn (gen_lrint<mode>si_internal2 (tmp, operands[1])); Considering some existing supports eg: "fix_trunc<mode>si2_stfiwx" adopting DImode, I think we can do the similar thing here, ie: rtx tmp = gen_reg_rtx (DImode); emit_insn (gen_lrint<mode>si_di (tmp, operands[1])); > + rtx stack = rs6000_allocate_stack_temp (SImode, false, true); > + emit_insn (gen_stfiwx_sf (stack, tmp)); ... emit_insn (gen_stfiwx (stack, tmp)); Theoretically even if !TARGET_STFIWX, we can still save the fpr into memory and load the appropriate 4 bytes from that, but TARGET_STFIWX (PPC) is quite old already, introducing such complexity here seems not worthy. > + emit_move_insn (operands[0], stack); > + DONE; > + } > +}) > + > +(define_insn "lrint<mode>si_internal" Nit: This can be unnamed, maybe something like "*lrint<mode>si"? > + [(set (match_operand:SI 0 "gpc_reg_operand" "=d") > + (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")] > + UNSPEC_FCTIW))] > + "TARGET_HARD_FLOAT" Nit: Add "&& TARGET_POPCNTD" to the condition. > + "fctiw %0,%1" > + [(set_attr "type" "fp")]) > + > +(define_insn "lrint<mode>si_internal2" can be: lrint<mode>si_di > + [(set (match_operand:SF 0 "gpc_reg_operand" "=d") > + (unspec:SF [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")] > + UNSPEC_FCTIW_SF))] Use DI to replace SF here and just still use UNSPEC_FCTIW. > + "TARGET_HARD_FLOAT" Add "&& !TARGET_POPCNTD" to the condition. > "fctiw %0,%1" > [(set_attr "type" "fp")]) > > @@ -6801,6 +6831,14 @@ (define_insn "stfiwx" > [(set_attr "type" "fpstore") > (set_attr "isa" "*,p8v")]) > > +(define_insn "stfiwx_sf" > + [(set (match_operand:SI 0 "memory_operand" "=Z") > + (unspec:SI [(match_operand:SF 1 "gpc_reg_operand" "d")] > + UNSPEC_STFIWX_SF))] > + "TARGET_STFIWX" > + "stfiwx %1,%y0" > + [(set_attr "type" "fpstore")]) Then this part isn't needed. > + > ;; If we don't have a direct conversion to single precision, don't enable this > ;; conversion for 32-bit without fast math, because we don't have the insn to > ;; generate the fixup swizzle to avoid double rounding problems. > diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-1.c b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c > new file mode 100644 > index 00000000000..32f708c5402 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c > @@ -0,0 +1,15 @@ > +/* { dg-do compile { target { powerpc*-*-* && be } } } */ > +/* { dg-options "-O2 -mdejagnu-cpu=7450 -m32 -fno-math-errno" } */ > +/* { dg-require-effective-target ilp32 } */ Nit: "powerpc*-*-* && be" and "-m32" are redundant as pointed out in the review of patch-2. BR, Kewen > +/* { dg-final { scan-assembler-times {\mfctiw\M} 2 } } */ > +/* { dg-final { scan-assembler-times {\mstfiwx\M} 2 } } */ > + > +int test1 (double a) > +{ > + return __builtin_irint (a); > +} > + > +int test2 (float a) > +{ > + return __builtin_irint (a); > +} >
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index d4337ce42a9..1b207522ad5 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -90,6 +90,7 @@ (define_c_enum "unspec" UNSPEC_TLSTLS_PCREL UNSPEC_FIX_TRUNC_TF ; fadd, rounding towards zero UNSPEC_STFIWX + UNSPEC_STFIWX_SF UNSPEC_POPCNTB UNSPEC_FRES UNSPEC_SP_SET @@ -111,6 +112,7 @@ (define_c_enum "unspec" UNSPEC_PARITY UNSPEC_CMPB UNSPEC_FCTIW + UNSPEC_FCTIW_SF UNSPEC_FCTID UNSPEC_LFIWAX UNSPEC_LFIWZX @@ -6722,11 +6724,39 @@ (define_insn "lrint<mode>di2" "fctid %0,%1" [(set_attr "type" "fp")]) -(define_insn "lrint<mode>si2" +(define_expand "lrint<mode>si2" [(set (match_operand:SI 0 "gpc_reg_operand" "=d") (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")] UNSPEC_FCTIW))] - "TARGET_HARD_FLOAT && TARGET_POPCNTD" + "TARGET_HARD_FLOAT && TARGET_STFIWX" +{ + /* For those old archs in which SImode can't be hold in float registers, + call lrint<mode>si_internal2 to put the result in SFmode then + convert it via stack. */ + if (!TARGET_POPCNTD) + { + rtx tmp = gen_reg_rtx (SFmode); + emit_insn (gen_lrint<mode>si_internal2 (tmp, operands[1])); + rtx stack = rs6000_allocate_stack_temp (SImode, false, true); + emit_insn (gen_stfiwx_sf (stack, tmp)); + emit_move_insn (operands[0], stack); + DONE; + } +}) + +(define_insn "lrint<mode>si_internal" + [(set (match_operand:SI 0 "gpc_reg_operand" "=d") + (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")] + UNSPEC_FCTIW))] + "TARGET_HARD_FLOAT" + "fctiw %0,%1" + [(set_attr "type" "fp")]) + +(define_insn "lrint<mode>si_internal2" + [(set (match_operand:SF 0 "gpc_reg_operand" "=d") + (unspec:SF [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")] + UNSPEC_FCTIW_SF))] + "TARGET_HARD_FLOAT" "fctiw %0,%1" [(set_attr "type" "fp")]) @@ -6801,6 +6831,14 @@ (define_insn "stfiwx" [(set_attr "type" "fpstore") (set_attr "isa" "*,p8v")]) +(define_insn "stfiwx_sf" + [(set (match_operand:SI 0 "memory_operand" "=Z") + (unspec:SI [(match_operand:SF 1 "gpc_reg_operand" "d")] + UNSPEC_STFIWX_SF))] + "TARGET_STFIWX" + "stfiwx %1,%y0" + [(set_attr "type" "fpstore")]) + ;; If we don't have a direct conversion to single precision, don't enable this ;; conversion for 32-bit without fast math, because we don't have the insn to ;; generate the fixup swizzle to avoid double rounding problems. diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-1.c b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c new file mode 100644 index 00000000000..32f708c5402 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile { target { powerpc*-*-* && be } } } */ +/* { dg-options "-O2 -mdejagnu-cpu=7450 -m32 -fno-math-errno" } */ +/* { dg-require-effective-target ilp32 } */ +/* { dg-final { scan-assembler-times {\mfctiw\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstfiwx\M} 2 } } */ + +int test1 (double a) +{ + return __builtin_irint (a); +} + +int test2 (float a) +{ + return __builtin_irint (a); +}