Message ID | ortv1amxns.fsf@livre.home |
---|---|
State | New |
Headers | show |
Series | [rs6000] fix mffsl emulation | expand |
Hi! On Thu, Apr 23, 2020 at 05:08:55AM -0300, Alexandre Oliva wrote: > The emulation of mffsl with mffs, used when !TARGET_P9_MISC, is going > through the motions, but not storing the result in the given > operands[0]; it rather modifies operands[0] without effect. Heh, oops. > It also > creates a DImode pseudo that it doesn't use, a DFmode pseudo that's > unnecessary AFAICT, and it's not indented per the GNU Coding > Standards. The patch below fixes all of these. > > I wasn't sure simplify_gen_subreg might possibly emit any code in > obscure cases, It never does, it just returns an RTX. Where would it emit the insns to? > so I left it in place, and accommodated the possibility > that the result of the mode conversion back might need copying to the > requested output operand. In my tests, the output was always a > REG:DF, so the subreg was trivial and the conversion back got the > original REG:DF output back. > I'm concerned about several issues in the mffsl testcase. First, I > don't see that comparing the values as doubles rather than as long > longs is desirable. These are FPSCR bitfields, not FP numbers. I > understand mffs et al use double because they output to FP registers, > but... The bit patterns might not even be well-formed FP numbers. "Desirable", probably not, no. But it will always work: since all the top bits are zeros always, it will always be a subnormal number, so all comparisons will work as expected / wanted. > Another issue with the test is that, if the compare fails, it calls > mffsl again to print the value, as if it would yield the same result. > But part of the FPSCR that mffsl (emulated with mmfl or not) copies to > the output FP register is the FPCC, so the fcmpu used to compare the > result of the first mmfsl will modify FPSCR and thus the result of the > second mmfsl call. Yeah, good point. We never *use* FPCC (nowhere, not just in this test), but that is somewhat beside the point. OTOH, all this is only done if we are debugging the testcase, so it isn't important either way. > Yet another issue is that the test assumed the mmfs bits not extracted > by mffsl are all zero. This appears to be the case, as the bits left > out are for sticky exceptions, but there are reserved parts of FPSCR > that might turn out to be set in the future, and then the masking in > the GCC-emulated version of mffsl would zero out those bits and cause > the compare to fail. All those extra bits are required to be set to 0. From the ISA: For Move From FPSCR Lightweight (mffsl), do the following. The contents of the control bits in the FPSCR, that is, bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), and the non-sticky status bits in the FPSCR, that is, bits 45:51 (FR, FI, C, FL, FG, FE, FU), are placed into the corresponding bits in register FRT. All other bits in register FRT are set to 0. > So I put in masking in the mffs result before the compare, but then, > what if mffsl is changed so that it copies additional nonzero bits? > Should we mask both mffs and mffsl outputs? Or is it safe to leave > those bits alone and assume them to be zero at the entry point of > main(), as the test used to do? The code as-is was correct here (the compiler code as well as the testcase code). > for gcc/ChangeLog > > * gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to > output operand in emulation. Simplify. Indent with a tab please. > + rtx tmp_df = operands[0]; Please don't reuse pseudos (or worse, non-pseudo registers). This was correct in the original code. > + rtx tmp_di; Please just declare at first use, like the original did. It's the more modern style, and it actually is nicer ;-) > + /* The mffs instruction reads the entire FPSCR. Emulate the mffsl > + instruction using the mffs instruction and masking off the bits > + the mmsl instruciton actually reads. */ "mffsl instruction". Heh, I see the original was bad already, oops. As I said above, the insn is *required* to zero all other bits. > + emit_insn (gen_rs6000_mffs (tmp_df)); > + tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); > + emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x70007f0ffLL))); > + > + tmp_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0); > + if (operands[0] != tmp_df) > + emit_move_insn (operands[0], tmp_df); > + DONE; That "==" won't do what you want, I guess? It's not needed, anyway, it is just a distracting premature micro-optimisation. So just do the emit_move_insn please, with whitespace and comment fixed if you want? Thanks, Segher (Is there a PR, btw?)
Hello, Segher, On Apr 23, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote: > On Thu, Apr 23, 2020 at 05:08:55AM -0300, Alexandre Oliva wrote: >> I wasn't sure simplify_gen_subreg might possibly emit any code in >> obscure cases, > It never does, it just returns an RTX. Where would it emit the insns to? I'm pretty sure there are or were subreg-extraction functions that could emit new insns onto whatever insn stream was active in certain unusual circumstances; I wasn't sure whether this was one of them. > since all the top bits are zeros always, it will always be a subnormal > number, so all comparisons will work as expected / wanted. *nod*, as long as there's no trapping on subnormals. >> Yet another issue is that the test assumed the mmfs bits not extracted >> by mffsl are all zero. > All those extra bits are required to be set to 0. Not in the mffs output, no. See, at this point I'm not talking about the emulation, but about the mffs use in the test proper. Consider the case of emulated mffsl: we end up comparing the mffs output value with a masked version thereof. If any of the mffs output bits could possibly be nonzero, the compare would fail, not because the emulation is wrong, but because the test is comparing the output of mffsl with that of mffs, with all its bits. In my testing, mffs output zero in this program, so it worked, but are all the non-mffsl bits of FPSCR guaranteed to be zero, so the output of mffs doesn't have to be masked out by the test? What if some future CPU defines some of the reserved bits in FPSCR in a way that they should be nonzero? (And, could the ISA of such a future CPU possibly specify that mffsl should preserve those extra bits from FPSCR, rather than zero them out?) >> for gcc/ChangeLog >> >> * gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to >> output operand in emulation. Simplify. > Indent with a tab please. Erhm, what I posted had TABs there. Did it get mangled? :-( >> + rtx tmp_df = operands[0]; > Please don't reuse pseudos Aah, I see, that's what the original tmp_di was trying to accomplish! I'm familiar with that general guidance, but I didn't think there's much for optimizers to do with an mffs output, and the predicate ensures we have a reg, but... I suppose additional pseudos make the subregging safer, and not adding extra uses for operands[0] might help with register allocation for it, so... How about this, to also avoid reusing tmpdi as anddi input and output? rtx tmp_df = gen_reg_rtx (DFmode); /* The mffs instruction reads the entire FPSCR. Emulate the mffsl instruction using the mffs instruction and masking off the bits the mmsl instruction actually reads. */ emit_insn (gen_rs6000_mffs (tmp_df)); rtx tmp_df_as_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); rtx tmp_di = gen_reg_rtx (DImode); emit_insn (gen_anddi3 (tmp_di, tmp_df_as_di, GEN_INT (0x70007f0ffLL))); rtx tmp_di_as_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0); emit_move_insn (operands[0], tmp_di_as_df); DONE; > "mffsl instruction". Heh, I see the original was bad already, oops. Double oops. I had that typo fixed in some earlier version of the patch, but somehow I managed to drop that fix. >> + if (operands[0] != tmp_df) >> + emit_move_insn (operands[0], tmp_df); >> + DONE; > That "==" won't do what you want, I guess? If the incoming operands[0] is a REG, we'd be back to it after subregging back and forth; if it was a SUBREG to begin with, the cheap test would fail and we'd output the redundant move, that later passes will catch. Anyway, that's gone in the snippet above. > (Is there a PR, btw?) Not that I know of; we've hit the testcase FAIL in our own testing.
Hi! On Fri, Apr 24, 2020 at 04:52:46AM -0300, Alexandre Oliva wrote: > On Apr 23, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Thu, Apr 23, 2020 at 05:08:55AM -0300, Alexandre Oliva wrote: > >> I wasn't sure simplify_gen_subreg might possibly emit any code in > >> obscure cases, > > > It never does, it just returns an RTX. Where would it emit the insns to? > > I'm pretty sure there are or were subreg-extraction functions that could > emit new insns onto whatever insn stream was active in certain unusual > circumstances; Yeah, I think so too. > I wasn't sure whether this was one of them. Doesn't look that way, thankfully :-) > > since all the top bits are zeros always, it will always be a subnormal > > number, so all comparisons will work as expected / wanted. > > *nod*, as long as there's no trapping on subnormals. There isn't :-) I did say this isn't clean at all, just *happens* to work, right? :-) > >> Yet another issue is that the test assumed the mmfs bits not extracted > >> by mffsl are all zero. > > > All those extra bits are required to be set to 0. > > Not in the mffs output, no. See, at this point I'm not talking about > the emulation, but about the mffs use in the test proper. Hrm, I don't see it then. > Consider the case of emulated mffsl: we end up comparing the mffs output > value with a masked version thereof. If any of the mffs output bits > could possibly be nonzero, the compare would fail, not because the > emulation is wrong, but because the test is comparing the output of > mffsl with that of mffs, with all its bits. Oh. I thought that was masked as well. > In my testing, mffs output zero in this program, so it worked, but are > all the non-mffsl bits of FPSCR guaranteed to be zero, so the output of > mffs doesn't have to be masked out by the test? No, and there are many bits not yet defined, so it may change for future versions of the ISA. > What if some future CPU defines some of the reserved bits in FPSCR in a > way that they should be nonzero? In a way different between mffs and mffsl. Yeah. > (And, could the ISA of such a future CPU possibly specify that mffsl > should preserve those extra bits from FPSCR, rather than zero them out?) Yes, I don't see why not? Hrm, the exact language for the mffsl description does not allow that, but that wouldn't be the best thing to reply on. > >> for gcc/ChangeLog > >> > >> * gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to > >> output operand in emulation. Simplify. > > > Indent with a tab please. > > Erhm, what I posted had TABs there. Did it get mangled? :-( Yes, to eight spaces, so that aligned "output" under "gcc". > >> + rtx tmp_df = operands[0]; > > > Please don't reuse pseudos > > Aah, I see, that's what the original tmp_di was trying to accomplish! :-) > I'm familiar with that general guidance, but I didn't think there's much > for optimizers to do with an mffs output, and the predicate ensures we > have a reg, but... I suppose additional pseudos make the subregging > safer, and not adding extra uses for operands[0] might help with > register allocation for it, so... How about this, to also avoid reusing > tmpdi as anddi input and output? No pseudo should be assigned to more than once. A la SSA, or "webs". Many of our optimisers do not work well otherwise. > rtx tmp_df = gen_reg_rtx (DFmode); > > /* The mffs instruction reads the entire FPSCR. Emulate the mffsl > instruction using the mffs instruction and masking off the bits > the mmsl instruction actually reads. */ > emit_insn (gen_rs6000_mffs (tmp_df)); > > rtx tmp_df_as_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); > rtx tmp_di = gen_reg_rtx (DImode); > emit_insn (gen_anddi3 (tmp_di, tmp_df_as_di, GEN_INT (0x70007f0ffLL))); > > rtx tmp_di_as_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0); > emit_move_insn (operands[0], tmp_di_as_df); > DONE; Sure, that looks fine. Maybe tmp1, tmp2, etc. woukld be clearer here? :-) > > "mffsl instruction". Heh, I see the original was bad already, oops. > > Double oops. I had that typo fixed in some earlier version of the > patch, but somehow I managed to drop that fix. > > >> + if (operands[0] != tmp_df) > >> + emit_move_insn (operands[0], tmp_df); > >> + DONE; > > > That "==" won't do what you want, I guess? > > If the incoming operands[0] is a REG, we'd be back to it after > subregging back and forth; if it was a SUBREG to begin with, the cheap > test would fail and we'd output the redundant move, that later passes > will catch. Anyway, that's gone in the snippet above. > > > (Is there a PR, btw?) > > Not that I know of; we've hit the testcase FAIL in our own testing. Could you open one? This patch will need backporting, etc. I can do it, if you really hate bugzilla, but I think it is easier for you to do it :-) Thanks! Segher
On Apr 24, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote: >> > since all the top bits are zeros always, it will always be a subnormal >> > number, so all comparisons will work as expected / wanted. >> >> *nod*, as long as there's no trapping on subnormals. > There isn't :-) I did say this isn't clean at all, just *happens* to > work, right? :-) I hope you don't mind my using the union in the testcase, that was otherwise unused, to IMHO improve on that, then ;-) >> Erhm, what I posted had TABs there. Did it get mangled? :-( > Yes, to eight spaces, so that aligned "output" under "gcc". Aah, found them in lines other than the ones I'd verified. Fixed. > Sure, that looks fine. Maybe tmp1, tmp2, etc. woukld be clearer here? That works for me. > Could you open one? Sure, PR94812 Tested test_mffsl.c with and without -mpower9-misc. Regstrapping on ppc64le-linux-gnu. Ok to install? [rs6000] fix mffsl emulation From: Alexandre Oliva <oliva@adacore.com> The emulation of mffsl with mffs, used when !TARGET_P9_MISC, is going through the motions, but not storing the result in the given operands[0]; it rather modifies operands[0] without effect. It also creates a DImode pseudo that it doesn't use, overwriting subregs instead. The patch below fixes all of these, the indentation and a typo. I'm concerned about several issues in the mffsl testcase. First, I don't see that comparing the values as doubles rather than as long longs is desirable. These are FPSCR bitfields, not FP numbers. I understand mffs et al use double because they output to FP registers, and the bit patterns are subnormal FP numbers, so it works, but given the need for bit masking of at least one side, I'm changing the compare to long longs. Another issue with the test is that, if the compare fails, it calls mffsl again to print the value, as if it would yield the same result. But part of the FPSCR that mffsl (emulated with mmfl or not) copies to the output FP register is the FPCC, so the fcmpu used to compare the result of the first mmfsl will modify FPSCR and thus the result of the second mmfsl call. After changing the compare, this is no longer the case, but I still think it's better to make absolutely sure what we print is what we compared: the least desirable thing during debugging is to find out, after hours of investigation, that you were led down the wrong path by incorrect information. Yet another issue is that the test assumed the mmfs bits that are not to be extracted by mffsl to be already zero, instead of masking them out explicitly. This is not about the mffs emulation in the mffsl implementation, but about the mffs use in the test proper. The bits appear to be zero indeed, as the bits left out are for sticky exceptions, but there are reserved parts of FPSCR that might turn out to be set in the future, so we're better off masking them out explicitly, otherwise those bits could cause the compare to fail. If some future mffsl is changed so that it copies additional nonzero bits, the test will fail, and then we'll have a chance to adjust it and the emulation. for gcc/ChangeLog PR target/94812 * gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to output operand in emulation. Don't overwrite pseudos. for gcc/testsuite/ChangeLog PR target/94812 * gcc.target/powerpc/test_mffsl.c: Call mffsl only once. Reinterpret the doubles as long longs for compares. Mask out mffs bits that are not expected from mffsl. --- gcc/config/rs6000/rs6000.md | 26 +++++++++++++------------ gcc/testsuite/gcc.target/powerpc/test_mffsl.c | 12 ++++++++---- 2 files changed, 22 insertions(+), 16 deletions(-) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 11ab745..86a16dd 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -13620,18 +13620,20 @@ if (!TARGET_P9_MISC) { - rtx tmp_di = gen_reg_rtx (DImode); - rtx tmp_df = gen_reg_rtx (DFmode); - - /* The mffs instruction reads the entire FPSCR. Emulate the mffsl - instruction using the mffs instruction and masking off the bits - the mmsl instruciton actually reads. */ - emit_insn (gen_rs6000_mffs (tmp_df)); - tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); - emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x70007f0ffLL))); - - operands[0] = simplify_gen_subreg (DFmode, tmp_di, DImode, 0); - DONE; + rtx tmp1 = gen_reg_rtx (DFmode); + + /* The mffs instruction reads the entire FPSCR. Emulate the mffsl + instruction using the mffs instruction and masking off the bits + the mmsl instruction actually reads. */ + emit_insn (gen_rs6000_mffs (tmp1)); + + rtx tmp1di = simplify_gen_subreg (DImode, tmp1, DFmode, 0); + rtx tmp2 = gen_reg_rtx (DImode); + emit_insn (gen_anddi3 (tmp2, tmp1di, GEN_INT (0x70007f0ffLL))); + + rtx tmp2df = simplify_gen_subreg (DFmode, tmp2, DImode, 0); + emit_move_insn (operands[0], tmp2df); + DONE; } emit_insn (gen_rs6000_mffsl_hw (operands[0])); diff --git a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c index 93a8ec2..41377ef 100644 --- a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c +++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c @@ -14,17 +14,21 @@ int main () union blah { double d; unsigned long long ll; - } conv_val; + } mffs_val, mffsl_val; /* Test reading the FPSCR register. */ __asm __volatile ("mffs %0" : "=f"(f14)); - conv_val.d = f14; + mffs_val.d = f14; + /* Select the same bits as mffsl. */ + mffs_val.ll &= 0x70007f0ffLL; - if (conv_val.d != __builtin_mffsl()) + mffsl_val.d = __builtin_mffsl (); + + if (mffs_val.ll != mffsl_val.ll) { #ifdef DEBUG printf("ERROR, __builtin_mffsl() returned 0x%llx, not the expecected value 0x%llx\n", - __builtin_mffsl(), conv_val.d); + mffsl_val.ll, mffs_val.ll); #else abort(); #endif
Hi! On Tue, Apr 28, 2020 at 12:25:04AM -0300, Alexandre Oliva wrote: > On Apr 24, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote: > > >> > since all the top bits are zeros always, it will always be a subnormal > >> > number, so all comparisons will work as expected / wanted. > >> > >> *nod*, as long as there's no trapping on subnormals. > > > There isn't :-) I did say this isn't clean at all, just *happens* to > > work, right? :-) > > I hope you don't mind my using the union in the testcase, that was > otherwise unused, to IMHO improve on that, then ;-) That is fine, makes things clearer and more obviously correct. > > Could you open one? > > Sure, PR94812 Thanks! > But part of the FPSCR that mffsl (emulated with mmfl or not) copies to s/mmfl/mffs/ > the least desirable thing during debugging > is to find out, after hours of investigation, that you were led down > the wrong path by incorrect information. If you think that is the worst that can happen... :-P > Yet another issue is that the test assumed the mmfs bits that are not s/mmfs/mffs/ > + rtx tmp1 = gen_reg_rtx (DFmode); > + > + /* The mffs instruction reads the entire FPSCR. Emulate the mffsl > + instruction using the mffs instruction and masking off the bits > + the mmsl instruction actually reads. */ > + emit_insn (gen_rs6000_mffs (tmp1)); s/mmsl/mffsl/ I'd just say "and masking the result". Okay for trunk with such tweaks. Thank you! Also okay for the 9 branch, after waiting to see if any fallout appears. Segher
On Apr 28, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote: > s/mmfl/mffs/ > s/mmfs/mffs/ > s/mmsl/mffsl/ Oh, my, looks like I missed some mispellings of ffmls :-P Sorry about the typos, for some reason mffs makes it worse than usual for me. > Okay for trunk with such tweaks. Thank you! Also okay for the 9 branch, > after waiting to see if any fallout appears. Thanks, here's what I'm checking in... [rs6000] fix mffsl emulation From: Alexandre Oliva <oliva@adacore.com> The emulation of mffsl with mffs, used when !TARGET_P9_MISC, is going through the motions, but not storing the result in the given operands[0]; it rather modifies operands[0] without effect. It also creates a DImode pseudo that it doesn't use, overwriting subregs instead. The patch below fixes all of these, the indentation and a typo. I'm concerned about several issues in the mffsl testcase. First, I don't see that comparing the values as doubles rather than as long longs is desirable. These are FPSCR bitfields, not FP numbers. I understand mffs et al use double because they output to FP registers, and the bit patterns are subnormal FP numbers, so it works, but given the need for bit masking of at least one side, I'm changing the compare to long longs. Another issue with the test is that, if the compare fails, it calls mffsl again to print the value, as if it would yield the same result. But part of the FPSCR that mffsl (emulated with mffs or not) copies to the output FP register is the FPCC, so the fcmpu used to compare the result of the first mffsl will modify FPSCR and thus the result of the second mffsl call. After changing the compare, this is no longer the case, but I still think it's better to make absolutely sure what we print is what we compared. Yet another issue is that the test assumed the mffs bits that are not to be extracted by mffsl to be already zero, instead of masking them out explicitly. This is not about the mffs emulation in the mffsl implementation, but about the mffs use in the test proper. The bits appear to be zero indeed, as the bits left out are for sticky exceptions, but there are reserved parts of FPSCR that might turn out to be set in the future, so we're better off masking them out explicitly, otherwise those bits could cause the compare to fail. If some future mffsl is changed so that it copies additional nonzero bits, the test will fail, and then we'll have a chance to adjust it and the emulation. for gcc/ChangeLog PR target/94812 * gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to output operand in emulation. Don't overwrite pseudos. for gcc/testsuite/ChangeLog PR target/94812 * gcc.target/powerpc/test_mffsl.c: Call mffsl only once. Reinterpret the doubles as long longs for compares. Mask out mffs bits that are not expected from mffsl. --- gcc/config/rs6000/rs6000.md | 25 +++++++++++++------------ gcc/testsuite/gcc.target/powerpc/test_mffsl.c | 12 ++++++++---- 2 files changed, 21 insertions(+), 16 deletions(-) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 11ab745..6173994 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -13620,18 +13620,19 @@ if (!TARGET_P9_MISC) { - rtx tmp_di = gen_reg_rtx (DImode); - rtx tmp_df = gen_reg_rtx (DFmode); - - /* The mffs instruction reads the entire FPSCR. Emulate the mffsl - instruction using the mffs instruction and masking off the bits - the mmsl instruciton actually reads. */ - emit_insn (gen_rs6000_mffs (tmp_df)); - tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); - emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x70007f0ffLL))); - - operands[0] = simplify_gen_subreg (DFmode, tmp_di, DImode, 0); - DONE; + rtx tmp1 = gen_reg_rtx (DFmode); + + /* The mffs instruction reads the entire FPSCR. Emulate the mffsl + instruction using the mffs instruction and masking the result. */ + emit_insn (gen_rs6000_mffs (tmp1)); + + rtx tmp1di = simplify_gen_subreg (DImode, tmp1, DFmode, 0); + rtx tmp2 = gen_reg_rtx (DImode); + emit_insn (gen_anddi3 (tmp2, tmp1di, GEN_INT (0x70007f0ffLL))); + + rtx tmp2df = simplify_gen_subreg (DFmode, tmp2, DImode, 0); + emit_move_insn (operands[0], tmp2df); + DONE; } emit_insn (gen_rs6000_mffsl_hw (operands[0])); diff --git a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c index 93a8ec2..41377ef 100644 --- a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c +++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c @@ -14,17 +14,21 @@ int main () union blah { double d; unsigned long long ll; - } conv_val; + } mffs_val, mffsl_val; /* Test reading the FPSCR register. */ __asm __volatile ("mffs %0" : "=f"(f14)); - conv_val.d = f14; + mffs_val.d = f14; + /* Select the same bits as mffsl. */ + mffs_val.ll &= 0x70007f0ffLL; - if (conv_val.d != __builtin_mffsl()) + mffsl_val.d = __builtin_mffsl (); + + if (mffs_val.ll != mffsl_val.ll) { #ifdef DEBUG printf("ERROR, __builtin_mffsl() returned 0x%llx, not the expecected value 0x%llx\n", - __builtin_mffsl(), conv_val.d); + mffsl_val.ll, mffs_val.ll); #else abort(); #endif
On Tue, Apr 28, 2020 at 07:15:58PM -0300, Alexandre Oliva wrote: > On Apr 28, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote: > > > s/mmfl/mffs/ > > s/mmfs/mffs/ > > s/mmsl/mffsl/ > > Oh, my, looks like I missed some mispellings of ffmls :-P It helps to read the mnemonics as the full name -- the Power mnemonics are normally very sensible. Here, "move from floating status [lightweight]" ("move from FPSCR", but let's not abbr abbrs). > Sorry about the typos, for some reason mffs makes it worse than usual > for me. No problem; just trying to help make the commit message better :-) Segher
On Apr 28, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote: > Okay for trunk with such tweaks. Thank you! Also okay for the 9 branch, > after waiting to see if any fallout appears. Thanks, I've just pushed it to the gcc-9 branch.
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 11ab745..8f1ab55 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -13620,18 +13620,20 @@ if (!TARGET_P9_MISC) { - rtx tmp_di = gen_reg_rtx (DImode); - rtx tmp_df = gen_reg_rtx (DFmode); - - /* The mffs instruction reads the entire FPSCR. Emulate the mffsl - instruction using the mffs instruction and masking off the bits - the mmsl instruciton actually reads. */ - emit_insn (gen_rs6000_mffs (tmp_df)); - tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); - emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x70007f0ffLL))); - - operands[0] = simplify_gen_subreg (DFmode, tmp_di, DImode, 0); - DONE; + rtx tmp_df = operands[0]; + rtx tmp_di; + + /* The mffs instruction reads the entire FPSCR. Emulate the mffsl + instruction using the mffs instruction and masking off the bits + the mmsl instruciton actually reads. */ + emit_insn (gen_rs6000_mffs (tmp_df)); + tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0); + emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (0x70007f0ffLL))); + + tmp_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0); + if (operands[0] != tmp_df) + emit_move_insn (operands[0], tmp_df); + DONE; } emit_insn (gen_rs6000_mffsl_hw (operands[0])); diff --git a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c index 93a8ec2..a1f73aa 100644 --- a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c +++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c @@ -14,17 +14,21 @@ int main () union blah { double d; unsigned long long ll; - } conv_val; + } mffs_val, mffsl_val; /* Test reading the FPSCR register. */ __asm __volatile ("mffs %0" : "=f"(f14)); - conv_val.d = f14; + mffs_val.d = f14; + /* Select the bits obtained by mffsl. */ + mffs_val.ll &= 0x70007f0ffLL; - if (conv_val.d != __builtin_mffsl()) + mffsl_val.d = __builtin_mffsl (); + + if (mffs_val.ll != mffsl_val.ll) { #ifdef DEBUG printf("ERROR, __builtin_mffsl() returned 0x%llx, not the expecected value 0x%llx\n", - __builtin_mffsl(), conv_val.d); + mffsl_val.ll, mffs_val.ll); #else abort(); #endif