diff mbox series

[v10,03/16] target/riscv: clwz must ignore high bits (use shift-left & changed logic)

Message ID 20210904203516.2570119-4-philipp.tomsich@vrull.eu
State New
Headers show
Series target/riscv: Update QEmu for Zb[abcs] 1.0.0 | expand

Commit Message

Philipp Tomsich Sept. 4, 2021, 8:35 p.m. UTC
Assume clzw being executed on a register that is not sign-extended, such
as for the following sequence that uses (1ULL << 63) | 392 as the operand
to clzw:
	bseti	a2, zero, 63
	addi	a2, a2, 392
	clzw    a3, a2
The correct result of clzw would be 23, but the current implementation
returns -32 (as it performs a 64bit clz, which results in 0 leading zero
bits, and then subtracts 32).

Fix this by changing the implementation to:
 1. shift the original register up by 32
 2. performs a target-length (64bit) clz
 3. return 32 if no bits are set

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---

Changes in v10:
- New patch, fixing correctnes for clzw called on a register with undefined
  (as in: not properly sign-extended) upper bits.

 target/riscv/insn_trans/trans_rvb.c.inc | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Richard Henderson Sept. 5, 2021, 8:11 a.m. UTC | #1
On 9/4/21 10:35 PM, Philipp Tomsich wrote:
> Assume clzw being executed on a register that is not sign-extended, such
> as for the following sequence that uses (1ULL << 63) | 392 as the operand
> to clzw:
> 	bseti	a2, zero, 63
> 	addi	a2, a2, 392
> 	clzw    a3, a2
> The correct result of clzw would be 23, but the current implementation
> returns -32 (as it performs a 64bit clz, which results in 0 leading zero
> bits, and then subtracts 32).
> 
> Fix this by changing the implementation to:
>   1. shift the original register up by 32
>   2. performs a target-length (64bit) clz
>   3. return 32 if no bits are set
> 
> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> ---
> 
> Changes in v10:
> - New patch, fixing correctnes for clzw called on a register with undefined
>    (as in: not properly sign-extended) upper bits.

But we have

     return gen_unary(ctx, a, EXT_ZERO, gen_clzw);

should *not* be undefined.  Ah, what's missing is

     ctx->w = true;

within trans_clzw to cause the extend to take effect.

There are a few other "w" functions that are missing that set, though they use EXT_NONE so 
there is no visible bug, it would probably be best to set w anyway.


r~
Philipp Tomsich Sept. 5, 2021, 9:01 a.m. UTC | #2
On Sun 5. Sep 2021 at 11:11, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 9/4/21 10:35 PM, Philipp Tomsich wrote:
> > Assume clzw being executed on a register that is not sign-extended, such
> > as for the following sequence that uses (1ULL << 63) | 392 as the operand
> > to clzw:
> >       bseti   a2, zero, 63
> >       addi    a2, a2, 392
> >       clzw    a3, a2
> > The correct result of clzw would be 23, but the current implementation
> > returns -32 (as it performs a 64bit clz, which results in 0 leading zero
> > bits, and then subtracts 32).
> >
> > Fix this by changing the implementation to:
> >   1. shift the original register up by 32
> >   2. performs a target-length (64bit) clz
> >   3. return 32 if no bits are set
> >
> > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > ---
> >
> > Changes in v10:
> > - New patch, fixing correctnes for clzw called on a register with undefined
> >    (as in: not properly sign-extended) upper bits.
>
> But we have
>
>      return gen_unary(ctx, a, EXT_ZERO, gen_clzw);
>
> should *not* be undefined.  Ah, what's missing is
>
>      ctx->w = true;
>
> within trans_clzw to cause the extend to take effect.
>
> There are a few other "w" functions that are missing that set, though they use EXT_NONE so
> there is no visible bug, it would probably be best to set w anyway.


I had originally considered that (it would have to be ctx->w = true;
and EXT_SIGN),
but that has the side-effect of performing an extension on the result
of the clzw as
well (i.e. bot get_gpr and set_gpr are extended).

However, this is not what clzw does: it only ignores the upper bits
and then produces
a result in the value-range [0..32]. An extension on the result of
either a clz or clzw
is never needed (we have been fighting that problem in GCC and had to
use patterns
for the combiner), so I don't want to introduce this inefficiency in qemu.

If you have EXT_SIGN (or _ZERO), we will end up with 2 additional
extensions (one
on the argument, one on the result) in addition to the 2 other tcg
operations that we
need (i.e. clzi/subi for the extending case, shli/clzi for the proposed fix).

So I believe that this commit is the best fix:
1. It doesn't mark this as a w-form (i.e. ignores the high-bits on the
input _and_
extends the output), as it only treats the input as 32bit, but the
output is xlen.
2. It doesn't introduce any unnecessary extensions, but handles the
case in-place.

Philipp.
Philipp Tomsich Sept. 10, 2021, 1:36 p.m. UTC | #3
Richard,

Did you have a chance to consider what to do with clzw?
I would prefer to avoid the extra extension instructions and change the
implementation (and would update the commit message to provide more
context), but if you insist on setting 'ctx->w' I'll just have the extra
extensions emitted than delay this series further…

Thanks,
Philipp.

On Sun, 5 Sept 2021 at 12:01, Philipp Tomsich <philipp.tomsich@vrull.eu>
wrote:

> On Sun 5. Sep 2021 at 11:11, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> >
> > On 9/4/21 10:35 PM, Philipp Tomsich wrote:
> > > Assume clzw being executed on a register that is not sign-extended,
> such
> > > as for the following sequence that uses (1ULL << 63) | 392 as the
> operand
> > > to clzw:
> > >       bseti   a2, zero, 63
> > >       addi    a2, a2, 392
> > >       clzw    a3, a2
> > > The correct result of clzw would be 23, but the current implementation
> > > returns -32 (as it performs a 64bit clz, which results in 0 leading
> zero
> > > bits, and then subtracts 32).
> > >
> > > Fix this by changing the implementation to:
> > >   1. shift the original register up by 32
> > >   2. performs a target-length (64bit) clz
> > >   3. return 32 if no bits are set
> > >
> > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > > ---
> > >
> > > Changes in v10:
> > > - New patch, fixing correctnes for clzw called on a register with
> undefined
> > >    (as in: not properly sign-extended) upper bits.
> >
> > But we have
> >
> >      return gen_unary(ctx, a, EXT_ZERO, gen_clzw);
> >
> > should *not* be undefined.  Ah, what's missing is
> >
> >      ctx->w = true;
> >
> > within trans_clzw to cause the extend to take effect.
> >
> > There are a few other "w" functions that are missing that set, though
> they use EXT_NONE so
> > there is no visible bug, it would probably be best to set w anyway.
>
>
> I had originally considered that (it would have to be ctx->w = true;
> and EXT_SIGN),
> but that has the side-effect of performing an extension on the result
> of the clzw as
> well (i.e. bot get_gpr and set_gpr are extended).
>
> However, this is not what clzw does: it only ignores the upper bits
> and then produces
> a result in the value-range [0..32]. An extension on the result of
> either a clz or clzw
> is never needed (we have been fighting that problem in GCC and had to
> use patterns
> for the combiner), so I don't want to introduce this inefficiency in qemu.
>
> If you have EXT_SIGN (or _ZERO), we will end up with 2 additional
> extensions (one
> on the argument, one on the result) in addition to the 2 other tcg
> operations that we
> need (i.e. clzi/subi for the extending case, shli/clzi for the proposed
> fix).
>
> So I believe that this commit is the best fix:
> 1. It doesn't mark this as a w-form (i.e. ignores the high-bits on the
> input _and_
> extends the output), as it only treats the input as 32bit, but the
> output is xlen.
> 2. It doesn't introduce any unnecessary extensions, but handles the
> case in-place.
>
> Philipp.
>
Richard Henderson Sept. 10, 2021, 1:40 p.m. UTC | #4
On 9/10/21 3:36 PM, Philipp Tomsich wrote:
> Richard,
> 
> Did you have a chance to consider what to do with clzw?
> I would prefer to avoid the extra extension instructions and change the implementation 
> (and would update the commit message to provide more context), but if you insist on 
> setting 'ctx->w' I'll just have the extra extensions emitted than delay this series further…

I don't mind not setting ctx->w, but bear in mind that UXL is going to automatically set 
this flag when executing RV32 on RV64.  That's why I have written a tcg patch set to 
eliminate unnecessary sign-extensions.

Do swap out EXT_ZERO for EXT_NONE in the gen_unary call if you're not going to make use of 
any kind of source extension.


r~
Philipp Tomsich Sept. 10, 2021, 1:47 p.m. UTC | #5
On Fri, 10 Sept 2021 at 16:40, Richard Henderson <
richard.henderson@linaro.org> wrote:

> On 9/10/21 3:36 PM, Philipp Tomsich wrote:
> > Richard,
> >
> > Did you have a chance to consider what to do with clzw?
> > I would prefer to avoid the extra extension instructions and change the
> implementation
> > (and would update the commit message to provide more context), but if
> you insist on
> > setting 'ctx->w' I'll just have the extra extensions emitted than delay
> this series further…
>
> I don't mind not setting ctx->w, but bear in mind that UXL is going to
> automatically set
> this flag when executing RV32 on RV64.  That's why I have written a tcg
> patch set to
> eliminate unnecessary sign-extensions.
>

Ok, thanks!  Updated patches follow, once all test workloads have run…

Just wondering regarding the UXL-comment: the clzw instruction will be an
illegal encoding for RV32 (the w-form instructions are present on RV64
only), so it should never be encountered in a RV32 instruction stream.  Did
you mean that clz (the instruction operating on xlen-registers) would have
ctx->w set for RV32 executing on RV64 ... or am I missing something
fundamental?

Philipp.
Richard Henderson Sept. 10, 2021, 1:57 p.m. UTC | #6
On 9/10/21 3:47 PM, Philipp Tomsich wrote:
> Just wondering regarding the UXL-comment: the clzw instruction will be an illegal encoding 
> for RV32 (the w-form instructions are present on RV64 only), so it should never be 
> encountered in a RV32 instruction stream.

Correct.

> Did you mean that clz (the instruction operating on xlen-registers) would have ctx->w
> set for RV32 executing on RV64 ... or am I missing something fundamental?

Yes.

Or, as some test patches I was planning to post this weekend, replacing "w" as boolean 
with an "operation length" (ol) as an enum of MXL_RV*, so that we can represent "d" 
operations on RV128 with the same mechanism.


r~
diff mbox series

Patch

diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index 6c85c89f6d..8d29cadd20 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -349,8 +349,10 @@  GEN_TRANS_SHADD(3)
 
 static void gen_clzw(TCGv ret, TCGv arg1)
 {
-    tcg_gen_clzi_tl(ret, arg1, 64);
-    tcg_gen_subi_tl(ret, ret, 32);
+    TCGv t = tcg_temp_new();
+    tcg_gen_shli_tl(t, arg1, 32);
+    tcg_gen_clzi_tl(ret, t, 32);
+    tcg_temp_free(t);
 }
 
 static bool trans_clzw(DisasContext *ctx, arg_clzw *a)