Message ID: 20231007090232.3329594-1-rjones@redhat.com
State:      New
Series:     target/riscv: Use a direct cast for better performance
If you're interested in how I found this problem, it was done using
'perf report -a -g' & flamegraphs.  This is the flamegraph of qemu (on
the host) while the guest runs the parallel compile:

http://oirase.annexia.org/tmp/qemu-riscv.svg

If you click into 'CPU_0/TCG' at the bottom left (all the vCPUs
basically act alike), and then go to 'cpu_get_tb_cpu_state', you can
see the call to 'object_dynamic_cast_assert' taking considerable time.
If you zoom out, hit Ctrl-F and type 'object_dynamic_cast_assert' into
the search box, the flamegraph will tell you this call takes about
6.6% of total time (not all, but most, attributable to the call from
'cpu_get_tb_cpu_state' -> 'object_dynamic_cast_assert').

There are several other issues in the flamegraph which I'm trying to
address, but this was the simplest one.

Rich.
On 10/7/23 06:02, Richard W.M. Jones wrote:
> RISCV_CPU(cs) uses a checked cast.  When QOM cast debugging is enabled
> this adds about 5% total overhead when emulating RV64 on x86-64 host.
>
> Using a RISC-V guest with 16 vCPUs, 16 GB of guest RAM, virtio-blk
> disk.  The guest has a copy of the qemu source tree.  The test
> involves compiling the qemu source tree with 'make clean; time make -j16'.
>
> Before making this change the compile step took 449 & 447 seconds over
> two consecutive runs.
>
> After making this change, 428 & 422 seconds.
>
> The saving is about 5%.
>
> Thanks: Paolo Bonzini
> Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
> ---

Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>

>  target/riscv/cpu_helper.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 3a02079290..6174d99fb2 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -66,7 +66,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
>                             uint64_t *cs_base, uint32_t *pflags)
>  {
>      CPUState *cs = env_cpu(env);
> -    RISCVCPU *cpu = RISCV_CPU(cs);
> +    /*
> +     * Using the checked cast RISCV_CPU(cs) imposes ~ 5% overhead when
> +     * qemu cast debugging is enabled, so use a direct cast instead.
> +     */
> +    RISCVCPU *cpu = (RISCVCPU *)cs;
>      RISCVExtStatus fs, vs;
>      uint32_t flags = 0;
>
On 10/7/23 02:02, Richard W.M. Jones wrote:
> RISCV_CPU(cs) uses a checked cast.  When QOM cast debugging is enabled
> this adds about 5% total overhead when emulating RV64 on x86-64 host.
>
> Using a RISC-V guest with 16 vCPUs, 16 GB of guest RAM, virtio-blk
> disk.  The guest has a copy of the qemu source tree.  The test
> involves compiling the qemu source tree with 'make clean; time make -j16'.
>
> Before making this change the compile step took 449 & 447 seconds over
> two consecutive runs.
>
> After making this change, 428 & 422 seconds.
>
> The saving is about 5%.
>
> Thanks: Paolo Bonzini
> Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
> ---
>  target/riscv/cpu_helper.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 3a02079290..6174d99fb2 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -66,7 +66,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
>                             uint64_t *cs_base, uint32_t *pflags)
>  {
>      CPUState *cs = env_cpu(env);
> -    RISCVCPU *cpu = RISCV_CPU(cs);
> +    /*
> +     * Using the checked cast RISCV_CPU(cs) imposes ~ 5% overhead when
> +     * qemu cast debugging is enabled, so use a direct cast instead.
> +     */
> +    RISCVCPU *cpu = (RISCVCPU *)cs;

    RISCVCPU *cpu = env_archcpu(env);

and avoid "CPUState *cs" entirely.

r~
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 3a02079290..6174d99fb2 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -66,7 +66,11 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, vaddr *pc,
                           uint64_t *cs_base, uint32_t *pflags)
 {
     CPUState *cs = env_cpu(env);
-    RISCVCPU *cpu = RISCV_CPU(cs);
+    /*
+     * Using the checked cast RISCV_CPU(cs) imposes ~ 5% overhead when
+     * qemu cast debugging is enabled, so use a direct cast instead.
+     */
+    RISCVCPU *cpu = (RISCVCPU *)cs;
     RISCVExtStatus fs, vs;
     uint32_t flags = 0;
RISCV_CPU(cs) uses a checked cast.  When QOM cast debugging is enabled
this adds about 5% total overhead when emulating RV64 on x86-64 host.

Using a RISC-V guest with 16 vCPUs, 16 GB of guest RAM, virtio-blk
disk.  The guest has a copy of the qemu source tree.  The test
involves compiling the qemu source tree with 'make clean; time make -j16'.

Before making this change the compile step took 449 & 447 seconds over
two consecutive runs.

After making this change, 428 & 422 seconds.

The saving is about 5%.

Thanks: Paolo Bonzini
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
---
 target/riscv/cpu_helper.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)