[rs6000] fix mffsl emulation

Message ID	ortv1amxns.fsf@livre.home
State	New
Headers	show Return-Path: <gcc-patches-bounces@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 6FF3B3851C06 From: Alexandre Oliva <oliva@adacore.com> To: gcc-patches@gcc.gnu.org Subject: [rs6000] fix mffsl emulation Organization: Free thinker, does not speak for AdaCore Date: Thu, 23 Apr 2020 05:08:55 -0300 Message-ID: <ortv1amxns.fsf@livre.home> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Precedence: list Cc: Segher Boessenkool <segher@kernel.crashing.org>, David Edelsohn <dje.gcc@gmail.com> Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces@gcc.gnu.org>
Series	[rs6000] fix mffsl emulation \| expand [rs6000] fix mffsl emulation

Alexandre Oliva April 23, 2020, 8:08 a.m. UTC

The emulation of mffsl with mffs, used when !TARGET_P9_MISC, is going
through the motions, but not storing the result in the given
operands[0]; it rather modifies operands[0] without effect.  It also
creates a DImode pseudo that it doesn't use, a DFmode pseudo that's
unnecessary AFAICT, and it's not indented per the GNU Coding
Standards.  The patch below fixes all of these.

I wasn't sure simplify_gen_subreg might possibly emit any code in
obscure cases, so I left it in place, and accommodated the possibility
that the result of the mode conversion back might need copying to the
requested output operand.  In my tests, the output was always a
REG:DF, so the subreg was trivial and the conversion back got the
original REG:DF output back.


I'm concerned about several issues in the mffsl testcase.  First, I
don't see that comparing the values as doubles rather than as long
longs is desirable.  These are FPSCR bitfields, not FP numbers.  I
understand mffs et al use double because they output to FP registers,
but...  The bit patterns might not even be well-formed FP numbers.

Another issue with the test is that, if the compare fails, it calls
mffsl again to print the value, as if it would yield the same result.
But part of the FPSCR that mffsl (emulated with mmfl or not) copies to
the output FP register is the FPCC, so the fcmpu used to compare the
result of the first mmfsl will modify FPSCR and thus the result of the
second mmfsl call.

Yet another issue is that the test assumed the mmfs bits not extracted
by mffsl are all zero.  This appears to be the case, as the bits left
out are for sticky exceptions, but there are reserved parts of FPSCR
that might turn out to be set in the future, and then the masking in
the GCC-emulated version of mffsl would zero out those bits and cause
the compare to fail.

So I put in masking in the mffs result before the compare, but then,
what if mffsl is changed so that it copies additional nonzero bits?
Should we mask both mffs and mffsl outputs?  Or is it safe to leave
those bits alone and assume them to be zero at the entry point of
main(), as the test used to do?


Regstrapped on powerpc64le-linux-gnu.  Ok to install?


for  gcc/ChangeLog

	* gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to
        output operand in emulation.  Simplify.

for  gcc/testsuite/ChangeLog

	* gcc.target/powerpc/test_mffsl.c: Call mffsl only once.
        Reinterpret the doubles as long longs for compares.  Mask out
        mffs bits that are not expected from mffsl.
---
 gcc/config/rs6000/rs6000.md                   |   26 +++++++++++++------------
 gcc/testsuite/gcc.target/powerpc/test_mffsl.c |   12 ++++++++----
 2 files changed, 22 insertions(+), 16 deletions(-)

Segher Boessenkool April 23, 2020, 11:22 a.m. UTC | #1

Hi!

On Thu, Apr 23, 2020 at 05:08:55AM -0300, Alexandre Oliva wrote:
> The emulation of mffsl with mffs, used when !TARGET_P9_MISC, is going
> through the motions, but not storing the result in the given
> operands[0]; it rather modifies operands[0] without effect.

Heh, oops.

> It also
> creates a DImode pseudo that it doesn't use, a DFmode pseudo that's
> unnecessary AFAICT, and it's not indented per the GNU Coding
> Standards.  The patch below fixes all of these.
> 
> I wasn't sure simplify_gen_subreg might possibly emit any code in
> obscure cases,

It never does, it just returns an RTX.  Where would it emit the insns to?

> so I left it in place, and accommodated the possibility
> that the result of the mode conversion back might need copying to the
> requested output operand.  In my tests, the output was always a
> REG:DF, so the subreg was trivial and the conversion back got the
> original REG:DF output back.

> I'm concerned about several issues in the mffsl testcase.  First, I
> don't see that comparing the values as doubles rather than as long
> longs is desirable.  These are FPSCR bitfields, not FP numbers.  I
> understand mffs et al use double because they output to FP registers,
> but...  The bit patterns might not even be well-formed FP numbers.

"Desirable", probably not, no.  But it will always work: since all the
top bits are zeros always, it will always be a subnormal number, so all
comparisons will work as expected / wanted.

> Another issue with the test is that, if the compare fails, it calls
> mffsl again to print the value, as if it would yield the same result.
> But part of the FPSCR that mffsl (emulated with mmfl or not) copies to
> the output FP register is the FPCC, so the fcmpu used to compare the
> result of the first mmfsl will modify FPSCR and thus the result of the
> second mmfsl call.

Yeah, good point.  We never *use* FPCC (nowhere, not just in this test),
but that is somewhat beside the point.

OTOH, all this is only done if we are debugging the testcase, so it
isn't important either way.

> Yet another issue is that the test assumed the mmfs bits not extracted
> by mffsl are all zero.  This appears to be the case, as the bits left
> out are for sticky exceptions, but there are reserved parts of FPSCR
> that might turn out to be set in the future, and then the masking in
> the GCC-emulated version of mffsl would zero out those bits and cause
> the compare to fail.

All those extra bits are required to be set to 0.  From the ISA:
  For Move From FPSCR Lightweight (mffsl), do the following.  The
  contents of the control bits in the FPSCR, that is, bits 29:31 (DRN)
  and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), and the non-sticky status
  bits in the FPSCR, that is, bits 45:51 (FR, FI, C, FL, FG, FE, FU),
  are placed into the corresponding bits in register FRT. All other bits
  in register FRT are set to 0.

> So I put in masking in the mffs result before the compare, but then,
> what if mffsl is changed so that it copies additional nonzero bits?
> Should we mask both mffs and mffsl outputs?  Or is it safe to leave
> those bits alone and assume them to be zero at the entry point of
> main(), as the test used to do?

The code as-is was correct here (the compiler code as well as the
testcase code).

> for  gcc/ChangeLog
> 
> 	* gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to
>         output operand in emulation.  Simplify.

Indent with a tab please.

> +      rtx tmp_df = operands[0];

Please don't reuse pseudos (or worse, non-pseudo registers).  This was
correct in the original code.

> +      rtx tmp_di;

Please just declare at first use, like the original did.  It's the more
modern style, and it actually is nicer ;-)

> +      /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl
> +	 instruction using the mffs instruction and masking off the bits
> +	 the mmsl instruciton actually reads.  */

"mffsl instruction".  Heh, I see the original was bad already, oops.

As I said above, the insn is *required* to zero all other bits.

> +      emit_insn (gen_rs6000_mffs (tmp_df));
> +      tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> +      emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x70007f0ffLL)));
> +
> +      tmp_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
> +      if (operands[0] != tmp_df)
> +	emit_move_insn (operands[0], tmp_df);
> +      DONE;

That "==" won't do what you want, I guess?  It's not needed, anyway, it
is just a distracting premature micro-optimisation.

So just do the emit_move_insn please, with whitespace and comment fixed
if you want?

Thanks,

Segher

(Is there a PR, btw?)

Alexandre Oliva April 24, 2020, 7:52 a.m. UTC | #2

Hello, Segher,

On Apr 23, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Thu, Apr 23, 2020 at 05:08:55AM -0300, Alexandre Oliva wrote:

>> I wasn't sure simplify_gen_subreg might possibly emit any code in
>> obscure cases,

> It never does, it just returns an RTX.  Where would it emit the insns to?

I'm pretty sure there are or were subreg-extraction functions that could
emit new insns onto whatever insn stream was active in certain unusual
circumstances; I wasn't sure whether this was one of them.

> since all the top bits are zeros always, it will always be a subnormal
> number, so all comparisons will work as expected / wanted.

*nod*, as long as there's no trapping on subnormals.

>> Yet another issue is that the test assumed the mmfs bits not extracted
>> by mffsl are all zero.

> All those extra bits are required to be set to 0.

Not in the mffs output, no.  See, at this point I'm not talking about
the emulation, but about the mffs use in the test proper.

Consider the case of emulated mffsl: we end up comparing the mffs output
value with a masked version thereof.  If any of the mffs output bits
could possibly be nonzero, the compare would fail, not because the
emulation is wrong, but because the test is comparing the output of
mffsl with that of mffs, with all its bits.

In my testing, mffs output zero in this program, so it worked, but are
all the non-mffsl bits of FPSCR guaranteed to be zero, so the output of
mffs doesn't have to be masked out by the test?

What if some future CPU defines some of the reserved bits in FPSCR in a
way that they should be nonzero?

(And, could the ISA of such a future CPU possibly specify that mffsl
should preserve those extra bits from FPSCR, rather than zero them out?)

>> for  gcc/ChangeLog
>> 
>> * gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to
>> output operand in emulation.  Simplify.

> Indent with a tab please.

Erhm, what I posted had TABs there.  Did it get mangled? :-(

>> +      rtx tmp_df = operands[0];

> Please don't reuse pseudos

Aah, I see, that's what the original tmp_di was trying to accomplish!

I'm familiar with that general guidance, but I didn't think there's much
for optimizers to do with an mffs output, and the predicate ensures we
have a reg, but...  I suppose additional pseudos make the subregging
safer, and not adding extra uses for operands[0] might help with
register allocation for it, so...  How about this, to also avoid reusing
tmpdi as anddi input and output?

      rtx tmp_df = gen_reg_rtx (DFmode);

      /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl
	 instruction using the mffs instruction and masking off the bits
	 the mmsl instruction actually reads.  */
      emit_insn (gen_rs6000_mffs (tmp_df));

      rtx tmp_df_as_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
      rtx tmp_di = gen_reg_rtx (DImode);
      emit_insn (gen_anddi3 (tmp_di, tmp_df_as_di, GEN_INT (0x70007f0ffLL)));

      rtx tmp_di_as_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
      emit_move_insn (operands[0], tmp_di_as_df);
      DONE;

> "mffsl instruction".  Heh, I see the original was bad already, oops.

Double oops.  I had that typo fixed in some earlier version of the
patch, but somehow I managed to drop that fix.

>> +      if (operands[0] != tmp_df)
>> +	emit_move_insn (operands[0], tmp_df);
>> +      DONE;

> That "==" won't do what you want, I guess?

If the incoming operands[0] is a REG, we'd be back to it after
subregging back and forth; if it was a SUBREG to begin with, the cheap
test would fail and we'd output the redundant move, that later passes
will catch.  Anyway, that's gone in the snippet above.

> (Is there a PR, btw?)

Not that I know of; we've hit the testcase FAIL in our own testing.

Segher Boessenkool April 24, 2020, 2:40 p.m. UTC | #3

Hi!

On Fri, Apr 24, 2020 at 04:52:46AM -0300, Alexandre Oliva wrote:
> On Apr 23, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > On Thu, Apr 23, 2020 at 05:08:55AM -0300, Alexandre Oliva wrote:
> >> I wasn't sure simplify_gen_subreg might possibly emit any code in
> >> obscure cases,
> 
> > It never does, it just returns an RTX.  Where would it emit the insns to?
> 
> I'm pretty sure there are or were subreg-extraction functions that could
> emit new insns onto whatever insn stream was active in certain unusual
> circumstances;

Yeah, I think so too.

> I wasn't sure whether this was one of them.

Doesn't look that way, thankfully :-)

> > since all the top bits are zeros always, it will always be a subnormal
> > number, so all comparisons will work as expected / wanted.
> 
> *nod*, as long as there's no trapping on subnormals.

There isn't :-)  I did say this isn't clean at all, just *happens* to
work, right?  :-)

> >> Yet another issue is that the test assumed the mmfs bits not extracted
> >> by mffsl are all zero.
> 
> > All those extra bits are required to be set to 0.
> 
> Not in the mffs output, no.  See, at this point I'm not talking about
> the emulation, but about the mffs use in the test proper.

Hrm, I don't see it then.

> Consider the case of emulated mffsl: we end up comparing the mffs output
> value with a masked version thereof.  If any of the mffs output bits
> could possibly be nonzero, the compare would fail, not because the
> emulation is wrong, but because the test is comparing the output of
> mffsl with that of mffs, with all its bits.

Oh.  I thought that was masked as well.

> In my testing, mffs output zero in this program, so it worked, but are
> all the non-mffsl bits of FPSCR guaranteed to be zero, so the output of
> mffs doesn't have to be masked out by the test?

No, and there are many bits not yet defined, so it may change for future
versions of the ISA.

> What if some future CPU defines some of the reserved bits in FPSCR in a
> way that they should be nonzero?

In a way different between mffs and mffsl.  Yeah.

> (And, could the ISA of such a future CPU possibly specify that mffsl
> should preserve those extra bits from FPSCR, rather than zero them out?)

Yes, I don't see why not?  Hrm, the exact language for the mffsl
description does not allow that, but that wouldn't be the best thing to
reply on.

> >> for  gcc/ChangeLog
> >> 
> >> * gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to
> >> output operand in emulation.  Simplify.
> 
> > Indent with a tab please.
> 
> Erhm, what I posted had TABs there.  Did it get mangled? :-(

Yes, to eight spaces, so that aligned "output" under "gcc".

> >> +      rtx tmp_df = operands[0];
> 
> > Please don't reuse pseudos
> 
> Aah, I see, that's what the original tmp_di was trying to accomplish!

:-)

> I'm familiar with that general guidance, but I didn't think there's much
> for optimizers to do with an mffs output, and the predicate ensures we
> have a reg, but...  I suppose additional pseudos make the subregging
> safer, and not adding extra uses for operands[0] might help with
> register allocation for it, so...  How about this, to also avoid reusing
> tmpdi as anddi input and output?

No pseudo should be assigned to more than once.  A la SSA, or "webs".
Many of our optimisers do not work well otherwise.

>       rtx tmp_df = gen_reg_rtx (DFmode);
> 
>       /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl
> 	 instruction using the mffs instruction and masking off the bits
> 	 the mmsl instruction actually reads.  */
>       emit_insn (gen_rs6000_mffs (tmp_df));
> 
>       rtx tmp_df_as_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
>       rtx tmp_di = gen_reg_rtx (DImode);
>       emit_insn (gen_anddi3 (tmp_di, tmp_df_as_di, GEN_INT (0x70007f0ffLL)));
> 
>       rtx tmp_di_as_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
>       emit_move_insn (operands[0], tmp_di_as_df);
>       DONE;

Sure, that looks fine.  Maybe tmp1, tmp2, etc. woukld be clearer here?
:-)

> > "mffsl instruction".  Heh, I see the original was bad already, oops.
> 
> Double oops.  I had that typo fixed in some earlier version of the
> patch, but somehow I managed to drop that fix.
> 
> >> +      if (operands[0] != tmp_df)
> >> +	emit_move_insn (operands[0], tmp_df);
> >> +      DONE;
> 
> > That "==" won't do what you want, I guess?
> 
> If the incoming operands[0] is a REG, we'd be back to it after
> subregging back and forth; if it was a SUBREG to begin with, the cheap
> test would fail and we'd output the redundant move, that later passes
> will catch.  Anyway, that's gone in the snippet above.
> 
> > (Is there a PR, btw?)
> 
> Not that I know of; we've hit the testcase FAIL in our own testing.

Could you open one?  This patch will need backporting, etc.  I can do
it, if you really hate bugzilla, but I think it is easier for you to
do it :-)

Thanks!


Segher

Alexandre Oliva April 28, 2020, 3:25 a.m. UTC | #4

On Apr 24, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:

>> > since all the top bits are zeros always, it will always be a subnormal
>> > number, so all comparisons will work as expected / wanted.
>> 
>> *nod*, as long as there's no trapping on subnormals.

> There isn't :-)  I did say this isn't clean at all, just *happens* to
> work, right?  :-)

I hope you don't mind my using the union in the testcase, that was
otherwise unused, to IMHO improve on that, then ;-)

>> Erhm, what I posted had TABs there.  Did it get mangled? :-(

> Yes, to eight spaces, so that aligned "output" under "gcc".

Aah, found them in lines other than the ones I'd verified.  Fixed.

> Sure, that looks fine.  Maybe tmp1, tmp2, etc. woukld be clearer here?

That works for me.

> Could you open one?

Sure, PR94812

Tested test_mffsl.c with and without -mpower9-misc.
Regstrapping on ppc64le-linux-gnu.  Ok to install?

[rs6000] fix mffsl emulation

From: Alexandre Oliva <oliva@adacore.com>

The emulation of mffsl with mffs, used when !TARGET_P9_MISC, is going
through the motions, but not storing the result in the given
operands[0]; it rather modifies operands[0] without effect.  It also
creates a DImode pseudo that it doesn't use, overwriting subregs
instead.

The patch below fixes all of these, the indentation and a typo.

I'm concerned about several issues in the mffsl testcase.  First, I
don't see that comparing the values as doubles rather than as long
longs is desirable.  These are FPSCR bitfields, not FP numbers.  I
understand mffs et al use double because they output to FP registers,
and the bit patterns are subnormal FP numbers, so it works, but given
the need for bit masking of at least one side, I'm changing the
compare to long longs.

Another issue with the test is that, if the compare fails, it calls
mffsl again to print the value, as if it would yield the same result.
But part of the FPSCR that mffsl (emulated with mmfl or not) copies to
the output FP register is the FPCC, so the fcmpu used to compare the
result of the first mmfsl will modify FPSCR and thus the result of the
second mmfsl call.  After changing the compare, this is no longer the
case, but I still think it's better to make absolutely sure what we
print is what we compared: the least desirable thing during debugging
is to find out, after hours of investigation, that you were led down
the wrong path by incorrect information.

Yet another issue is that the test assumed the mmfs bits that are not
to be extracted by mffsl to be already zero, instead of masking them
out explicitly.  This is not about the mffs emulation in the mffsl
implementation, but about the mffs use in the test proper.  The bits
appear to be zero indeed, as the bits left out are for sticky
exceptions, but there are reserved parts of FPSCR that might turn out
to be set in the future, so we're better off masking them out
explicitly, otherwise those bits could cause the compare to fail.

If some future mffsl is changed so that it copies additional nonzero
bits, the test will fail, and then we'll have a chance to adjust it
and the emulation.

for  gcc/ChangeLog

	PR target/94812
	* gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to
	output operand in emulation.  Don't overwrite pseudos.

for  gcc/testsuite/ChangeLog

	PR target/94812
	* gcc.target/powerpc/test_mffsl.c: Call mffsl only once.
	Reinterpret the doubles as long longs for compares.  Mask out
	mffs bits that are not expected from mffsl.
---
 gcc/config/rs6000/rs6000.md                   |   26 +++++++++++++------------
 gcc/testsuite/gcc.target/powerpc/test_mffsl.c |   12 ++++++++----
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 11ab745..86a16dd 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -13620,18 +13620,20 @@

   if (!TARGET_P9_MISC)
     {
-       rtx tmp_di = gen_reg_rtx (DImode);
-       rtx tmp_df = gen_reg_rtx (DFmode);
-
-       /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl
-          instruction using the mffs instruction and masking off the bits
-          the mmsl instruciton actually reads.  */
-       emit_insn (gen_rs6000_mffs (tmp_df));
-       tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
-       emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x70007f0ffLL)));
-
-       operands[0] = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
-       DONE;
+      rtx tmp1 = gen_reg_rtx (DFmode);
+
+      /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl
+	 instruction using the mffs instruction and masking off the bits
+	 the mmsl instruction actually reads.  */
+      emit_insn (gen_rs6000_mffs (tmp1));
+
+      rtx tmp1di = simplify_gen_subreg (DImode, tmp1, DFmode, 0);
+      rtx tmp2 = gen_reg_rtx (DImode);
+      emit_insn (gen_anddi3 (tmp2, tmp1di, GEN_INT (0x70007f0ffLL)));
+
+      rtx tmp2df = simplify_gen_subreg (DFmode, tmp2, DImode, 0);
+      emit_move_insn (operands[0], tmp2df);
+      DONE;
     }

     emit_insn (gen_rs6000_mffsl_hw (operands[0]));
diff --git a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
index 93a8ec2..41377ef 100644
--- a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
+++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
@@ -14,17 +14,21 @@ int main ()
   union blah {
     double d;
     unsigned long long ll;
-  } conv_val;
+  } mffs_val, mffsl_val;

   /* Test reading the FPSCR register.  */
   __asm __volatile ("mffs %0" : "=f"(f14));
-  conv_val.d = f14;
+  mffs_val.d = f14;
+  /* Select the same bits as mffsl.  */
+  mffs_val.ll &= 0x70007f0ffLL;

-  if (conv_val.d != __builtin_mffsl())
+  mffsl_val.d = __builtin_mffsl ();
+
+  if (mffs_val.ll != mffsl_val.ll)
     {
 #ifdef DEBUG
       printf("ERROR, __builtin_mffsl() returned 0x%llx, not the expecected value 0x%llx\n",
-	     __builtin_mffsl(), conv_val.d);
+	     mffsl_val.ll, mffs_val.ll);
 #else
       abort();
 #endif

Segher Boessenkool April 28, 2020, 9:20 p.m. UTC | #5

Hi!

On Tue, Apr 28, 2020 at 12:25:04AM -0300, Alexandre Oliva wrote:
> On Apr 24, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> >> > since all the top bits are zeros always, it will always be a subnormal
> >> > number, so all comparisons will work as expected / wanted.
> >> 
> >> *nod*, as long as there's no trapping on subnormals.
> 
> > There isn't :-)  I did say this isn't clean at all, just *happens* to
> > work, right?  :-)
> 
> I hope you don't mind my using the union in the testcase, that was
> otherwise unused, to IMHO improve on that, then ;-)

That is fine, makes things clearer and more obviously correct.

> > Could you open one?
> 
> Sure, PR94812

Thanks!

> But part of the FPSCR that mffsl (emulated with mmfl or not) copies to

s/mmfl/mffs/

> the least desirable thing during debugging
> is to find out, after hours of investigation, that you were led down
> the wrong path by incorrect information.

If you think that is the worst that can happen...  :-P

> Yet another issue is that the test assumed the mmfs bits that are not

s/mmfs/mffs/

> +      rtx tmp1 = gen_reg_rtx (DFmode);
> +
> +      /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl
> +	 instruction using the mffs instruction and masking off the bits
> +	 the mmsl instruction actually reads.  */
> +      emit_insn (gen_rs6000_mffs (tmp1));

s/mmsl/mffsl/

I'd just say "and masking the result".

Okay for trunk with such tweaks.  Thank you!  Also okay for the 9 branch,
after waiting to see if any fallout appears.


Segher

Alexandre Oliva April 28, 2020, 10:15 p.m. UTC | #6

On Apr 28, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> s/mmfl/mffs/
> s/mmfs/mffs/
> s/mmsl/mffsl/

Oh, my, looks like I missed some mispellings of ffmls :-P

Sorry about the typos, for some reason mffs makes it worse than usual
for me.

> Okay for trunk with such tweaks.  Thank you!  Also okay for the 9 branch,
> after waiting to see if any fallout appears.

Thanks, here's what I'm checking in...

[rs6000] fix mffsl emulation

From: Alexandre Oliva <oliva@adacore.com>

The emulation of mffsl with mffs, used when !TARGET_P9_MISC, is going
through the motions, but not storing the result in the given
operands[0]; it rather modifies operands[0] without effect.  It also
creates a DImode pseudo that it doesn't use, overwriting subregs
instead.

The patch below fixes all of these, the indentation and a typo.

I'm concerned about several issues in the mffsl testcase.  First, I
don't see that comparing the values as doubles rather than as long
longs is desirable.  These are FPSCR bitfields, not FP numbers.  I
understand mffs et al use double because they output to FP registers,
and the bit patterns are subnormal FP numbers, so it works, but given
the need for bit masking of at least one side, I'm changing the
compare to long longs.

Another issue with the test is that, if the compare fails, it calls
mffsl again to print the value, as if it would yield the same result.
But part of the FPSCR that mffsl (emulated with mffs or not) copies to
the output FP register is the FPCC, so the fcmpu used to compare the
result of the first mffsl will modify FPSCR and thus the result of the
second mffsl call.  After changing the compare, this is no longer the
case, but I still think it's better to make absolutely sure what we
print is what we compared.

Yet another issue is that the test assumed the mffs bits that are not
to be extracted by mffsl to be already zero, instead of masking them
out explicitly.  This is not about the mffs emulation in the mffsl
implementation, but about the mffs use in the test proper.  The bits
appear to be zero indeed, as the bits left out are for sticky
exceptions, but there are reserved parts of FPSCR that might turn out
to be set in the future, so we're better off masking them out
explicitly, otherwise those bits could cause the compare to fail.

If some future mffsl is changed so that it copies additional nonzero
bits, the test will fail, and then we'll have a chance to adjust it
and the emulation.

for  gcc/ChangeLog

	PR target/94812
	* gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to
	output operand in emulation.  Don't overwrite pseudos.

for  gcc/testsuite/ChangeLog

	PR target/94812
	* gcc.target/powerpc/test_mffsl.c: Call mffsl only once.
	Reinterpret the doubles as long longs for compares.  Mask out
	mffs bits that are not expected from mffsl.
---
 gcc/config/rs6000/rs6000.md                   |   25 +++++++++++++------------
 gcc/testsuite/gcc.target/powerpc/test_mffsl.c |   12 ++++++++----
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 11ab745..6173994 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -13620,18 +13620,19 @@

   if (!TARGET_P9_MISC)
     {
-       rtx tmp_di = gen_reg_rtx (DImode);
-       rtx tmp_df = gen_reg_rtx (DFmode);
-
-       /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl
-          instruction using the mffs instruction and masking off the bits
-          the mmsl instruciton actually reads.  */
-       emit_insn (gen_rs6000_mffs (tmp_df));
-       tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
-       emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x70007f0ffLL)));
-
-       operands[0] = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
-       DONE;
+      rtx tmp1 = gen_reg_rtx (DFmode);
+
+      /* The mffs instruction reads the entire FPSCR.  Emulate the mffsl
+	 instruction using the mffs instruction and masking the result.  */
+      emit_insn (gen_rs6000_mffs (tmp1));
+
+      rtx tmp1di = simplify_gen_subreg (DImode, tmp1, DFmode, 0);
+      rtx tmp2 = gen_reg_rtx (DImode);
+      emit_insn (gen_anddi3 (tmp2, tmp1di, GEN_INT (0x70007f0ffLL)));
+
+      rtx tmp2df = simplify_gen_subreg (DFmode, tmp2, DImode, 0);
+      emit_move_insn (operands[0], tmp2df);
+      DONE;
     }

     emit_insn (gen_rs6000_mffsl_hw (operands[0]));
diff --git a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
index 93a8ec2..41377ef 100644
--- a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
+++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
@@ -14,17 +14,21 @@ int main ()
   union blah {
     double d;
     unsigned long long ll;
-  } conv_val;
+  } mffs_val, mffsl_val;

   /* Test reading the FPSCR register.  */
   __asm __volatile ("mffs %0" : "=f"(f14));
-  conv_val.d = f14;
+  mffs_val.d = f14;
+  /* Select the same bits as mffsl.  */
+  mffs_val.ll &= 0x70007f0ffLL;

-  if (conv_val.d != __builtin_mffsl())
+  mffsl_val.d = __builtin_mffsl ();
+
+  if (mffs_val.ll != mffsl_val.ll)
     {
 #ifdef DEBUG
       printf("ERROR, __builtin_mffsl() returned 0x%llx, not the expecected value 0x%llx\n",
-	     __builtin_mffsl(), conv_val.d);
+	     mffsl_val.ll, mffs_val.ll);
 #else
       abort();
 #endif

Segher Boessenkool April 28, 2020, 11:54 p.m. UTC | #7

On Tue, Apr 28, 2020 at 07:15:58PM -0300, Alexandre Oliva wrote:
> On Apr 28, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> > s/mmfl/mffs/
> > s/mmfs/mffs/
> > s/mmsl/mffsl/
> 
> Oh, my, looks like I missed some mispellings of ffmls :-P

It helps to read the mnemonics as the full name -- the Power mnemonics
are normally very sensible.  Here, "move from floating status
[lightweight]" ("move from FPSCR", but let's not abbr abbrs).

> Sorry about the typos, for some reason mffs makes it worse than usual
> for me.

No problem; just trying to help make the commit message better :-)

Segher

Alexandre Oliva May 26, 2020, 2:38 p.m. UTC | #8

On Apr 28, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> Okay for trunk with such tweaks.  Thank you!  Also okay for the 9 branch,
> after waiting to see if any fallout appears.

Thanks, I've just pushed it to the gcc-9 branch.

[rs6000] fix mffsl emulation

Commit Message

Comments

Patch