diff mbox series

[v1,1/2] powerpc/64s: system call rfscv workaround for TM bugs

Message ID 20210908101718.118522-1-npiggin@gmail.com
State New
Headers show
Series [v1,1/2] powerpc/64s: system call rfscv workaround for TM bugs | expand

Commit Message

Nicholas Piggin Sept. 8, 2021, 10:17 a.m. UTC
The rfscv instruction does not work correctly with the fake-suspend mode
in POWER9, which can end up with the hypervisor restoring an incorrect
checkpoint.

Work around this by setting the _TIF_RESTOREALL flag if a system call
returns to a transaction active state, causing rfid to be used instead
of rfscv to return, which will do the right thing. The contents of the
registers are irrelevant because they will be overwritten in this case
anyway.

Reported-by: Eirik Fuller <efuller@redhat.com>
Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/interrupt.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Daniel Axtens Sept. 17, 2021, 8:02 a.m. UTC | #1
Nicholas Piggin <npiggin@gmail.com> writes:

> The rfscv instruction does not work correctly with the fake-suspend mode
> in POWER9, which can end up with the hypervisor restoring an incorrect
> checkpoint.

If I understand correctly from commit 4bb3c7a0208f ("KVM: PPC: Book3S
HV: Work around transactional memory bugs in POWER9"), this is because
rfscv does not cause a soft-patch interrupt in the way that rfid etc do.
So we need to avoid calling rfscv if we are in fake-suspend state -
instead we must call something that does indeed get soft-patched - like
rfid.

> Work around this by setting the _TIF_RESTOREALL flag if a system call
> returns to a transaction active state, causing rfid to be used instead
> of rfscv to return, which will do the right thing. The contents of the
> registers are irrelevant because they will be overwritten in this case
> anyway.

I can follow that this will indeed cause syscall_exit_prepare to return
non-zero and therefore we should take the
syscall_vectored_*_restore_regs path which does an RFID_TO_USER rather
than a RFSCV_TO_USER. My only question/concern is:

.Lsyscall_vectored_\name\()_exit:
	addi	r4,r1,STACK_FRAME_OVERHEAD
	li	r5,1 /* scv */
	bl	syscall_exit_prepare            <-------- we get r3 != 0  here
	std	r1,PACA_EXIT_SAVE_R1(r13) /* save r1 for restart */
.Lsyscall_vectored_\name\()_rst_start:
	lbz	r11,PACAIRQHAPPENED(r13)
	andi.	r11,r11,(~PACA_IRQ_HARD_DIS)@l
	bne-	syscall_vectored_\name\()_restart <-- can we end up taking
	                                              this branch?

Are there any circumstances that would take us down the _restart path,
and if so, will we still go through the correct RFID_TO_USER branch
rather than the RFSCV_TO_USER branch?

Apart from that this looks good to me, although with the heavy
disclaimer that I only learned about fake suspend for the first time
while reviewing the patch.

Kind regards,
Daniel

>
> Reported-by: Eirik Fuller <efuller@redhat.com>
> Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/kernel/interrupt.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index c77c80214ad3..917a2ac4def6 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -139,6 +139,19 @@ notrace long system_call_exception(long r3, long r4, long r5,
>  	 */
>  	irq_soft_mask_regs_set_state(regs, IRQS_ENABLED);
>  
> +	/*
> +	 * If system call is called with TM active, set _TIF_RESTOREALL to
> +	 * prevent RFSCV being used to return to userspace, because POWER9
> +	 * TM implementation has problems with this instruction returning to
> +	 * transactional state. Final register values are not relevant because
> +	 * the transaction will be aborted upon return anyway. Or in the case
> +	 * of unsupported_scv SIGILL fault, the return state does not much
> +	 * matter because it's an edge case.
> +	 */
> +	if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
> +			unlikely(MSR_TM_TRANSACTIONAL(regs->msr)))
> +		current_thread_info()->flags |= _TIF_RESTOREALL;
> +
>  	/*
>  	 * If the system call was made with a transaction active, doom it and
>  	 * return without performing the system call. Unless it was an
> -- 
> 2.23.0
Michael Ellerman Sept. 19, 2021, 12:20 p.m. UTC | #2
On Wed, 8 Sep 2021 20:17:17 +1000, Nicholas Piggin wrote:
> The rfscv instruction does not work correctly with the fake-suspend mode
> in POWER9, which can end up with the hypervisor restoring an incorrect
> checkpoint.
> 
> Work around this by setting the _TIF_RESTOREALL flag if a system call
> returns to a transaction active state, causing rfid to be used instead
> of rfscv to return, which will do the right thing. The contents of the
> registers are irrelevant because they will be overwritten in this case
> anyway.
> 
> [...]

Applied to powerpc/fixes.

[1/2] powerpc/64s: system call rfscv workaround for TM bugs
      https://git.kernel.org/powerpc/c/ae7aaecc3f2f78b76ab3a8d6178610f55aadfa56
[2/2] KVM: PPC: Book3S HV: Tolerate treclaim. in fake-suspend mode changing registers
      https://git.kernel.org/powerpc/c/267cdfa21385d78c794768233678756e32b39ead

cheers
diff mbox series

Patch

diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index c77c80214ad3..917a2ac4def6 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -139,6 +139,19 @@  notrace long system_call_exception(long r3, long r4, long r5,
 	 */
 	irq_soft_mask_regs_set_state(regs, IRQS_ENABLED);
 
+	/*
+	 * If system call is called with TM active, set _TIF_RESTOREALL to
+	 * prevent RFSCV being used to return to userspace, because POWER9
+	 * TM implementation has problems with this instruction returning to
+	 * transactional state. Final register values are not relevant because
+	 * the transaction will be aborted upon return anyway. Or in the case
+	 * of unsupported_scv SIGILL fault, the return state does not much
+	 * matter because it's an edge case.
+	 */
+	if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
+			unlikely(MSR_TM_TRANSACTIONAL(regs->msr)))
+		current_thread_info()->flags |= _TIF_RESTOREALL;
+
 	/*
 	 * If the system call was made with a transaction active, doom it and
 	 * return without performing the system call. Unless it was an