
powerpc/32: Clear volatile regs on syscall exit

Message ID: 28b040bd2357a1879df0ca1b74094323f778a472.1645636285.git.christophe.leroy@csgroup.eu (mailing list archive)
State: Changes Requested
Series: powerpc/32: Clear volatile regs on syscall exit

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 7 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 24 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.

Commit Message

Christophe Leroy Feb. 23, 2022, 5:11 p.m. UTC
Commit a82adfd5c7cb ("hardening: Introduce CONFIG_ZERO_CALL_USED_REGS")
added zeroing of used registers at function exit.

At the moment, PPC64 clears volatile registers on syscall exit but
PPC32 doesn't, for performance reasons.

Add that clearing to the PPC32 syscall exit path as well, but only when
CONFIG_ZERO_CALL_USED_REGS is selected.

On an 8xx, the null_syscall selftest gives:
- Without CONFIG_ZERO_CALL_USED_REGS		: 288 cycles
- With CONFIG_ZERO_CALL_USED_REGS		: 305 cycles
- With CONFIG_ZERO_CALL_USED_REGS + this patch	: 319 cycles

Note that (independently of this patch), with pmac32_defconfig,
vmlinux size is as follows with/without CONFIG_ZERO_CALL_USED_REGS:

     text     data     bss       dec     hex  filename
  9578869  2525210  194400  12298479  bba8ef  vmlinux.without
 10318045  2525210  194400  13037655  c6f057  vmlinux.with

That is a 7.7% increase in text size and 6.0% in overall size.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/entry_32.S | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Kees Cook Feb. 23, 2022, 7:34 p.m. UTC | #1
On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> [...]
> @@ -151,6 +151,21 @@ syscall_exit_finish:
>  	bne	3f
>  	mtcr	r5
>  
> +#ifdef CONFIG_ZERO_CALL_USED_REGS
> +	/* Zero volatile regs that may contain sensitive kernel data */
> +	li	r0,0
> +	li	r4,0
> +	li	r5,0
> +	li	r6,0
> +	li	r7,0
> +	li	r8,0
> +	li	r9,0
> +	li	r10,0
> +	li	r11,0
> +	li	r12,0
> +	mtctr	r0
> +	mtxer	r0
> +#endif

I think this should probably be unconditional -- if this is actually
leaking kernel pointers (or data) that's pretty bad. :|

If you really want to leave it build-time selectable, maybe add a new
config that gets "select"ed by CONFIG_ZERO_CALL_USED_REGS?
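
Something like this, maybe (PPC_CLEAR_SYSCALL_REGS is a made-up
placeholder name, untested):

# hypothetical arch symbol, e.g. in arch/powerpc/Kconfig:
config PPC_CLEAR_SYSCALL_REGS
	bool "Zero volatile registers on syscall exit"

# and in the existing ZERO_CALL_USED_REGS definition:
	select PPC_CLEAR_SYSCALL_REGS if PPC32

The #ifdef above would then test CONFIG_PPC_CLEAR_SYSCALL_REGS instead.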

(And you may want to consider wiping all "unused" registers at syscall
entry as well.)

-Kees

Gabriel Paubert Feb. 23, 2022, 8:48 p.m. UTC | #2
On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> [...]
> @@ -151,6 +151,21 @@ syscall_exit_finish:
>  	bne	3f
>  	mtcr	r5
>  
> +#ifdef CONFIG_ZERO_CALL_USED_REGS
> +	/* Zero volatile regs that may contain sensitive kernel data */
> +	li	r0,0
> +	li	r4,0
> +	li	r5,0
> +	li	r6,0
> +	li	r7,0
> +	li	r8,0
> +	li	r9,0
> +	li	r10,0
> +	li	r11,0
> +	li	r12,0
> +	mtctr	r0
> +	mtxer	r0

Here, I'm almost sure that on some processors, it would be better to
separate mtctr from mtxer. mtxer is typically very expensive (pipeline
flush) but I don't know what's the best ordering for the average core.

And what about lr? Should it also be cleared?

	Gabriel

Segher Boessenkool Feb. 23, 2022, 11:27 p.m. UTC | #3
On Wed, Feb 23, 2022 at 09:48:09PM +0100, Gabriel Paubert wrote:
> On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> > +	/* Zero volatile regs that may contain sensitive kernel data */
> > +	li	r0,0
> > +	li	r4,0
> > +	li	r5,0
> > +	li	r6,0
> > +	li	r7,0
> > +	li	r8,0
> > +	li	r9,0
> > +	li	r10,0
> > +	li	r11,0
> > +	li	r12,0
> > +	mtctr	r0
> > +	mtxer	r0
> 
> Here, I'm almost sure that on some processors, it would be better to
> separate mtctr from mtxer. mtxer is typically very expensive (pipeline
> flush) but I don't know what's the best ordering for the average core.

mtxer is cheaper than mtctr on many cores :-)

On p9 mtxer is cracked into two latency 3 ops (which run in parallel).
While mtctr has latency 5.

On p8 mtxer was horrible indeed (but nothing near as bad as a pipeline
flush).


Segher
Christophe Leroy Feb. 24, 2022, 6:41 a.m. UTC | #4
On 23/02/2022 at 21:48, Gabriel Paubert wrote:
> On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
>> [...]
>> @@ -151,6 +151,21 @@ syscall_exit_finish:
>>   	bne	3f
>>   	mtcr	r5
>>   
>> +#ifdef CONFIG_ZERO_CALL_USED_REGS
>> +	/* Zero volatile regs that may contain sensitive kernel data */
>> +	li	r0,0
>> +	li	r4,0
>> +	li	r5,0
>> +	li	r6,0
>> +	li	r7,0
>> +	li	r8,0
>> +	li	r9,0
>> +	li	r10,0
>> +	li	r11,0
>> +	li	r12,0
>> +	mtctr	r0
>> +	mtxer	r0
> 
> Here, I'm almost sure that on some processors, it would be better to
> separate mtctr from mtxer. mtxer is typically very expensive (pipeline
> flush) but I don't know what's the best ordering for the average core.

On the 8xx, CTR and LR are handled by the BPU like any other register
(latency 1, blockage 1).
AFAIU, XER is serializing + 1.

> 
> And what about lr? Should it also be cleared?

LR is restored from stack.

Christophe
Christophe Leroy Feb. 24, 2022, 7 a.m. UTC | #5
On 23/02/2022 at 20:34, Kees Cook wrote:
> On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
>> [...]
>> @@ -151,6 +151,21 @@ syscall_exit_finish:
>>   	bne	3f
>>   	mtcr	r5
>>   
>> +#ifdef CONFIG_ZERO_CALL_USED_REGS
>> +	/* Zero volatile regs that may contain sensitive kernel data */
>> +	li	r0,0
>> +	li	r4,0
>> +	li	r5,0
>> +	li	r6,0
>> +	li	r7,0
>> +	li	r8,0
>> +	li	r9,0
>> +	li	r10,0
>> +	li	r11,0
>> +	li	r12,0
>> +	mtctr	r0
>> +	mtxer	r0
>> +#endif
> 
> I think this should probably be unconditional -- if this is actually
> leaking kernel pointers (or data) that's pretty bad. :|
> 
> If you really want to leave it build-time selectable, maybe add a new
> config that gets "select"ed by CONFIG_ZERO_CALL_USED_REGS?

You mean a CONFIG that is selected by CONFIG_ZERO_CALL_USED_REGS and may
also be selected by the user when CONFIG_ZERO_CALL_USED_REGS is not
selected?

At exit:
- the content of r4 is loaded into LR
- the content of r5 is loaded into CR
- the content of r7 is where we branch after switching back to user mode
- the content of r8 is loaded into MSR. Although MSR can't be read by
the user, there is nothing secret in it.
- XER contains arithmetic flags, nothing really sensitive.

That leaves r0, r6, r9 to r12 and CTR.

Maybe a compromise could be to always clear those, and only clear the
others when CONFIG_ZERO_CALL_USED_REGS is selected?
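
i.e. something like this (completely untested):

	/* Always zero regs that may hold leftover kernel data */
	li	r0,0
	li	r6,0
	li	r9,0
	li	r10,0
	li	r11,0
	li	r12,0
	mtctr	r0
#ifdef CONFIG_ZERO_CALL_USED_REGS
	/* These only hold values userspace can see anyway */
	li	r4,0
	li	r5,0
	li	r7,0
	li	r8,0
	mtxer	r0
#endif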

> 
> (And you may want to consider wiping all "unused" registers at syscall
> entry as well.)

How "unused" ?

At syscall entry we have syscall NR in r0, syscall args in r3 to r8.
The handler uses r9, r10, r11 and r12 prior to re-enabling MMU and 
taking any conditional branche.
r1 and r2 are also soon set and used (r1 is stack ptr, r2 is ptr to 
current task struct) and restored from stack at the end.
r13-r31 are callee saved/restored.

Christophe
Gabriel Paubert Feb. 24, 2022, 8:29 a.m. UTC | #6
On Wed, Feb 23, 2022 at 05:27:39PM -0600, Segher Boessenkool wrote:
> On Wed, Feb 23, 2022 at 09:48:09PM +0100, Gabriel Paubert wrote:
> > On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> > > +	/* Zero volatile regs that may contain sensitive kernel data */
> > > +	li	r0,0
> > > +	li	r4,0
> > > +	li	r5,0
> > > +	li	r6,0
> > > +	li	r7,0
> > > +	li	r8,0
> > > +	li	r9,0
> > > +	li	r10,0
> > > +	li	r11,0
> > > +	li	r12,0
> > > +	mtctr	r0
> > > +	mtxer	r0
> > 
> > Here, I'm almost sure that on some processors, it would be better to
> > separate mtctr from mtxer. mtxer is typically very expensive (pipeline
> > flush) but I don't know what's the best ordering for the average core.
> 
> mtxer is cheaper than mtctr on many cores :-)

We're speaking of 32 bit here I believe; on my (admittedly old) paper
copy of PowerPC 604 user's manual, I read in a footnote:

"The mtspr (XER) instruction causes instructions to be flushed when it
executes." 

Also a paragraph about "PostDispatch Serialization Mode" which reads:
"All instructions following the postdispatch serialization instruction
are flushed, refetched, and reexecuted."

Then it goes on to list the affected instructions, starting with:
mtspr(xer), mcrxr, isync, ...

I know there are probably very few 604 left in the field, but in this
case mtspr(xer) looks very much like a superset of isync.

I also just had a look at the documentation of a more widespread core:

https://www.nxp.com/docs/en/reference-manual/MPC7450UM.pdf

and mtspr(xer) is marked as both execution and refetch serialized;
actually it is the only instruction to have both.

Maybe there is a subtle difference between "refetch serialization" and
"pipeline flush", but in this case please educate me.

Besides that, the back-to-back mtctr/mtspr(xer) may limit instruction
decoding and issuing bandwidth.  I'd rather move one of them up by a few
lines, since they can only go to one of the execution units on some
(or even most?) cores. This was my main point initially.
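
For instance (untested), so the two mtspr instructions never end up
back to back:

	li	r0,0
	mtxer	r0
	li	r4,0
	li	r5,0
	li	r6,0
	li	r7,0
	li	r8,0
	mtctr	r0
	li	r9,0
	li	r10,0
	li	r11,0
	li	r12,0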

	Gabriel

Segher Boessenkool Feb. 24, 2022, 12:49 p.m. UTC | #7
Hi!

On Thu, Feb 24, 2022 at 09:29:55AM +0100, Gabriel Paubert wrote:
> On Wed, Feb 23, 2022 at 05:27:39PM -0600, Segher Boessenkool wrote:
> > On Wed, Feb 23, 2022 at 09:48:09PM +0100, Gabriel Paubert wrote:
> > > On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> > > > [...]
> > > > +	mtctr	r0
> > > > +	mtxer	r0
> > > 
> > > Here, I'm almost sure that on some processors, it would be better to
> > > separate mtctr from mtxer. mtxer is typically very expensive (pipeline
> > > flush) but I don't know what's the best ordering for the average core.
> > 
> > mtxer is cheaper than mtctr on many cores :-)
> 
> We're speaking of 32 bit here I believe;

32-bit userland, yes.  Which runs fine on non-ancient cores, too.

> on my (admittedly old) paper
> copy of PowerPC 604 user's manual, I read in a footnote:
> 
> "The mtspr (XER) instruction causes instructions to be flushed when it
> executes." 

And the 604 has a trivial depth pipeline anyway.

> I know there are probably very few 604 left in the field, but in this
> case mtspr(xer) looks very much like a superset of isync.

It hasn't been like that for decades.  On the 750, mtxer was already
only execution-synchronised, for example.

> I also just had a look at the documentation of a more widespread core:
> 
> https://www.nxp.com/docs/en/reference-manual/MPC7450UM.pdf
> 
> and mtspr(xer) is marked as execution and refetch serialized, actually
> it is the only instruction to have both.

This looks like a late addition (it messes up the table, for example,
being put after "mtspr (other)").  It also is different from 7400 and
750 and everything else.  A late bugfix?  Curious :-)

> Maybe there is a subtle difference between "refetch serialization" and
> "pipeline flush", but in this case please educate me.

There is a subtle difference, but it goes the other way: refetch
serialisation doesn't stop fetch or flush everything after it; only when
the instruction completes does it reject everything after it.  So it can
waste a bit more :-)

> Besides that the back to back mtctr/mtspr(xer) may limit instruction
> decoding and issuing bandwidth.

It doesn't limit decode or dispatch (not issue fwiw) bandwidth on any
core I have ever heard of.

> I'd rather move one of them up by a few
> lines since they can only go to one of the execution units on some
> (or even most?) cores. This was my main point initially.

I think it is much more beneficial to *not* do these insns than to
shift them back and forth a cycle.


Segher

Patch

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 7748c278d13c..199f23092c02 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -151,6 +151,21 @@ syscall_exit_finish:
 	bne	3f
 	mtcr	r5
 
+#ifdef CONFIG_ZERO_CALL_USED_REGS
+	/* Zero volatile regs that may contain sensitive kernel data */
+	li	r0,0
+	li	r4,0
+	li	r5,0
+	li	r6,0
+	li	r7,0
+	li	r8,0
+	li	r9,0
+	li	r10,0
+	li	r11,0
+	li	r12,0
+	mtctr	r0
+	mtxer	r0
+#endif
 1:	lwz	r2,GPR2(r1)
 	lwz	r1,GPR1(r1)
 	rfi