mbox series

[v2,0/6] riscv: Apply Zawrs when available

Message ID 20240419135321.70781-8-ajones@ventanamicro.com
Headers show
Series riscv: Apply Zawrs when available | expand

Message

Andrew Jones April 19, 2024, 1:53 p.m. UTC
Zawrs provides two instructions (wrs.nto and wrs.sto), where both are
meant to allow the hart to enter a low-power state while waiting on a
store to a memory location. The instructions also both wait an
implementation-defined "short" duration (unless the implementation
terminates the stall for another reason). The difference is that while
wrs.sto will terminate when the duration elapses, wrs.nto, depending on
configuration, will either just keep waiting or an ILL exception will be
raised. Linux will use wrs.nto, so if platforms have an implementation
which falls in the "just keep waiting" category, then it should _not_
advertise Zawrs in the hardware description. Text to that effect has
been added to the Zawrs DT definition.

Like wfi (and with the same {m,h}status bits to configure it), when
wrs.nto is configured to raise exceptions it's expected that the higher
privilege level will see the instruction was a wait instruction, do
something, and then resume execution following the instruction. For
example, KVM does configure exceptions for wfi (hstatus.VTW=1) and
therefore also for wrs.nto. KVM does this for wfi since it's better to
allow other tasks to be scheduled while a VCPU waits for an interrupt.
For waits such as those where wrs.nto/sto would be used, which are
typically locks, it is also a good idea for KVM to be involved, as it
can attempt to schedule the lock holding VCPU.

This series starts with Christoph's addition of riscv smp_cond_load*
functions which apply wrs.sto when available. That patch has been
reworked to use wrs.nto and to use the same approach as Arm for the
wait loop, since we can't have arbitrary C code between the load-
reserved and the wrs. Then, hwprobe support is added (since the
instructions are also usable from usermode), and finally KVM is
taught about wrs.nto, allowing guests to see and use the Zawrs
extension.

We still don't have test results from hardware, and it's not possible to
prove that using Zawrs is a win when testing on QEMU, not even when
oversubscribing VCPUs to guests. However, it is possible to use KVM
selftests to force a scenario where we can prove Zawrs does its job and
does it well. [4] is a test which does this and, on my machine, without
Zawrs it takes 16 seconds to complete and with Zawrs it takes 0.25
seconds.

This series is also available here [1]. In order to use QEMU for testing
a build with [2] is needed. In order to enable guests to use Zawrs with
KVM using kvmtool, the branch at [3] may be used.

[1] https://github.com/jones-drew/linux/commits/riscv/zawrs-v2/
[2] https://lore.kernel.org/all/20240312152901.512001-2-ajones@ventanamicro.com/
[3] https://github.com/jones-drew/kvmtool/commits/riscv/zawrs/
[4] https://github.com/jones-drew/linux/commit/9311702bcd118bdbfa8b9be4a8ec355c40559499

Thanks,
drew

v2:
 - Added DT bindings patch with additional Linux specifications due
   to wrs.nto potentially never terminating, as suggested by Palmer
 - Added patch to share pause insn definition
 - Rework main Zawrs support patch to use Arm approach (which is
   also the approach that Andrea Parri suggested)
 - Dropped the riscv implementation of smp_cond_load_acquire().
   afaict, the generic implementation, which will use the riscv
   implementation of smp_cond_load_relaxed() is sufficient for riscv.
 - The rework was large enough (IMO) to drop Heiko's s-o-b and to
   add myself as a co-developer


Andrew Jones (5):
  riscv: Provide a definition for 'pause'
  dt-bindings: riscv: Add Zawrs ISA extension description
  riscv: hwprobe: export Zawrs ISA extension
  KVM: riscv: Support guest wrs.nto
  KVM: riscv: selftests: Add Zawrs extension to get-reg-list test

Christoph Müllner (1):
  riscv: Add Zawrs support for spinlocks

 Documentation/arch/riscv/hwprobe.rst          |  4 ++
 .../devicetree/bindings/riscv/extensions.yaml | 12 +++++
 arch/riscv/Kconfig                            | 20 +++++---
 arch/riscv/Makefile                           |  3 --
 arch/riscv/include/asm/barrier.h              | 45 ++++++++++------
 arch/riscv/include/asm/cmpxchg.h              | 51 +++++++++++++++++++
 arch/riscv/include/asm/hwcap.h                |  1 +
 arch/riscv/include/asm/insn-def.h             |  4 ++
 arch/riscv/include/asm/kvm_host.h             |  1 +
 arch/riscv/include/asm/vdso/processor.h       |  8 +--
 arch/riscv/include/uapi/asm/hwprobe.h         |  1 +
 arch/riscv/include/uapi/asm/kvm.h             |  1 +
 arch/riscv/kernel/cpufeature.c                |  1 +
 arch/riscv/kernel/sys_hwprobe.c               |  1 +
 arch/riscv/kvm/vcpu.c                         |  1 +
 arch/riscv/kvm/vcpu_insn.c                    | 15 ++++++
 arch/riscv/kvm/vcpu_onereg.c                  |  2 +
 .../selftests/kvm/riscv/get-reg-list.c        |  4 ++
 18 files changed, 144 insertions(+), 31 deletions(-)

Comments

Conor Dooley April 19, 2024, 3:22 p.m. UTC | #1
On Fri, Apr 19, 2024 at 03:53:25PM +0200, Andrew Jones wrote:

> +config RISCV_ISA_ZAWRS
> +	bool "Zawrs extension support for more efficient busy waiting"
> +	depends on RISCV_ALTERNATIVE
> +	default y
> +	help
> +	  The Zawrs extension defines instructions to be used in polling loops
> +	  which allow a hart to enter a low-power state or to trap to the
> +	  hypervisor while waiting on a store to a memory location. Enable the
> +	  use of these instructions in the kernel when the Zawrs extension is
> +	  detected at boot.

Ignoring the rest of the patch, and focusing on the bit relevant to our
other conversation, I think this description satisfies what I was trying
to do with the other options in terms of being clear about what exactly
it does.
Andrea Parri April 21, 2024, 9:16 p.m. UTC | #2
On Fri, Apr 19, 2024 at 03:53:25PM +0200, Andrew Jones wrote:
> From: Christoph M??llner <christoph.muellner@vrull.eu>
> 
> RISC-V code uses the generic ticket lock implementation, which calls
> the macros smp_cond_load_relaxed() and smp_cond_load_acquire().
> Introduce a RISC-V specific implementation of smp_cond_load_relaxed()
> which applies WRS.NTO of the Zawrs extension in order to reduce power
> consumption while waiting and allows hypervisors to enable guests to
> trap while waiting. smp_cond_load_acquire() doesn't need a RISC-V
> specific implementation as the generic implementation is based on
> smp_cond_load_relaxed() and smp_acquire__after_ctrl_dep() sufficiently
> provides the acquire semantics.
> 
> This implementation is heavily based on Arm's approach which is the
> approach Andrea Parri also suggested.
> 
> The Zawrs specification can be found here:
> https://github.com/riscv/riscv-zawrs/blob/main/zawrs.adoc
> 
> Signed-off-by: Christoph M??llner <christoph.muellner@vrull.eu>
> Co-developed-by: Andrew Jones <ajones@ventanamicro.com>
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  arch/riscv/Kconfig                | 13 ++++++++
>  arch/riscv/include/asm/barrier.h  | 45 ++++++++++++++++++---------
>  arch/riscv/include/asm/cmpxchg.h  | 51 +++++++++++++++++++++++++++++++
>  arch/riscv/include/asm/hwcap.h    |  1 +
>  arch/riscv/include/asm/insn-def.h |  2 ++
>  arch/riscv/kernel/cpufeature.c    |  1 +
>  6 files changed, 98 insertions(+), 15 deletions(-)

Doesn't apply to riscv/for-next (due to, AFAIU,

  https://lore.kernel.org/all/171275883330.18495.10110341843571163280.git-patchwork-notify@kernel.org/ ).

But other than that, this LGTM.  One nit below.


> -#define __smp_store_release(p, v)					\
> -do {									\
> -	compiletime_assert_atomic_type(*p);				\
> -	RISCV_FENCE(rw, w);						\
> -	WRITE_ONCE(*p, v);						\
> -} while (0)
> -
> -#define __smp_load_acquire(p)						\
> -({									\
> -	typeof(*p) ___p1 = READ_ONCE(*p);				\
> -	compiletime_assert_atomic_type(*p);				\
> -	RISCV_FENCE(r, rw);						\
> -	___p1;								\
> -})
> -
>  /*
>   * This is a very specific barrier: it's currently only used in two places in
>   * the kernel, both in the scheduler.  See include/linux/spinlock.h for the two
> @@ -70,6 +56,35 @@ do {									\
>   */
>  #define smp_mb__after_spinlock()	RISCV_FENCE(iorw, iorw)
>  
> +#define __smp_store_release(p, v)					\
> +do {									\
> +	compiletime_assert_atomic_type(*p);				\
> +	RISCV_FENCE(rw, w);						\
> +	WRITE_ONCE(*p, v);						\
> +} while (0)
> +
> +#define __smp_load_acquire(p)						\
> +({									\
> +	typeof(*p) ___p1 = READ_ONCE(*p);				\
> +	compiletime_assert_atomic_type(*p);				\
> +	RISCV_FENCE(r, rw);						\
> +	___p1;								\
> +})

Unrelated/unmotivated changes.

  Andrea
Andrew Jones April 22, 2024, 8:36 a.m. UTC | #3
On Sun, Apr 21, 2024 at 11:16:47PM +0200, Andrea Parri wrote:
> On Fri, Apr 19, 2024 at 03:53:25PM +0200, Andrew Jones wrote:
> > From: Christoph M??llner <christoph.muellner@vrull.eu>
> > 
> > RISC-V code uses the generic ticket lock implementation, which calls
> > the macros smp_cond_load_relaxed() and smp_cond_load_acquire().
> > Introduce a RISC-V specific implementation of smp_cond_load_relaxed()
> > which applies WRS.NTO of the Zawrs extension in order to reduce power
> > consumption while waiting and allows hypervisors to enable guests to
> > trap while waiting. smp_cond_load_acquire() doesn't need a RISC-V
> > specific implementation as the generic implementation is based on
> > smp_cond_load_relaxed() and smp_acquire__after_ctrl_dep() sufficiently
> > provides the acquire semantics.
> > 
> > This implementation is heavily based on Arm's approach which is the
> > approach Andrea Parri also suggested.
> > 
> > The Zawrs specification can be found here:
> > https://github.com/riscv/riscv-zawrs/blob/main/zawrs.adoc
> > 
> > Signed-off-by: Christoph M??llner <christoph.muellner@vrull.eu>
> > Co-developed-by: Andrew Jones <ajones@ventanamicro.com>
> > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > ---
> >  arch/riscv/Kconfig                | 13 ++++++++
> >  arch/riscv/include/asm/barrier.h  | 45 ++++++++++++++++++---------
> >  arch/riscv/include/asm/cmpxchg.h  | 51 +++++++++++++++++++++++++++++++
> >  arch/riscv/include/asm/hwcap.h    |  1 +
> >  arch/riscv/include/asm/insn-def.h |  2 ++
> >  arch/riscv/kernel/cpufeature.c    |  1 +
> >  6 files changed, 98 insertions(+), 15 deletions(-)
> 
> Doesn't apply to riscv/for-next (due to, AFAIU,
> 
>   https://lore.kernel.org/all/171275883330.18495.10110341843571163280.git-patchwork-notify@kernel.org/ ).

I based it on -rc1. We recently discussed what we should base on, but
I couldn't recall the final decision, so I fell back to the old approach.
I can rebase on for-next or the latest rc if that's the new, improved
approach.

> 
> But other than that, this LGTM.  One nit below.
> 
> 
> > -#define __smp_store_release(p, v)					\
> > -do {									\
> > -	compiletime_assert_atomic_type(*p);				\
> > -	RISCV_FENCE(rw, w);						\
> > -	WRITE_ONCE(*p, v);						\
> > -} while (0)
> > -
> > -#define __smp_load_acquire(p)						\
> > -({									\
> > -	typeof(*p) ___p1 = READ_ONCE(*p);				\
> > -	compiletime_assert_atomic_type(*p);				\
> > -	RISCV_FENCE(r, rw);						\
> > -	___p1;								\
> > -})
> > -
> >  /*
> >   * This is a very specific barrier: it's currently only used in two places in
> >   * the kernel, both in the scheduler.  See include/linux/spinlock.h for the two
> > @@ -70,6 +56,35 @@ do {									\
> >   */
> >  #define smp_mb__after_spinlock()	RISCV_FENCE(iorw, iorw)
> >  
> > +#define __smp_store_release(p, v)					\
> > +do {									\
> > +	compiletime_assert_atomic_type(*p);				\
> > +	RISCV_FENCE(rw, w);						\
> > +	WRITE_ONCE(*p, v);						\
> > +} while (0)
> > +
> > +#define __smp_load_acquire(p)						\
> > +({									\
> > +	typeof(*p) ___p1 = READ_ONCE(*p);				\
> > +	compiletime_assert_atomic_type(*p);				\
> > +	RISCV_FENCE(r, rw);						\
> > +	___p1;								\
> > +})
> 
> Unrelated/unmotivated changes.

The relation/motivation was to get the load/store macros in one part of
the file with the barrier macros in another. With this change we have

 __mb
 __rmb
 __wmb

 __smp_mb
 __smp_rmb
 __smp_wmb

 smp_mb__after_spinlock

 __smp_store_release
 __smp_load_acquire
 smp_cond_load_relaxed

Without the change, smp_mb__after_spinlock is either after all the
load/stores or in between them.

I didn't think the reorganization was worth its own patch, but I could
split it out (or just drop it).

Thanks,
drew