diff mbox series

[SRU,Trusty,Xenial,v2] UBUNTU: SAUCE: x86/speculation: Fix the IBRS synchronization

Message ID 1543571077-25165-1-git-send-email-gavin.guo@canonical.com
State New
Headers show
Series [SRU,Trusty,Xenial,v2] UBUNTU: SAUCE: x86/speculation: Fix the IBRS synchronization | expand

Commit Message

Gavin Guo Nov. 30, 2018, 9:44 a.m. UTC
BugLink: https://launchpad.net/bugs/1764956

Ubuntu v4.4 kernel uses the in-house patches for IBRS. The backports
still have some issues causing the IBRS status wrong when
context-switching between the VM and host. For example, the IBRS would
be mistakenly enabled in the host when the switching from an IBRS-enabled
VM and that causes the performance overhead in the host. The other
condition could also mistakenly disables the IBRS in VM when
context-switching from the host. And this could be considered a CVE host.

The detail different situations analysis:

The reproducing environment:
Guest kernel version: 4.4.0-138.164
Host kernel version: 4.4.0-140.166

(host IBRS, guest IBRS)

- 1). (0, 1).
The case can be reproduced by the following instructions:
guest$ echo 1 | sudo tee /proc/sys/kernel/ibrs_enabled
1

<Several minutes later...>

host$ cat /proc/sys/kernel/ibrs_enabled
0
host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
11111111111111000000000000000000010010100000000000000000

Some of the IBRS bit inside the SPEC_CTRL MSR are mistakenly
enabled.

host$ taskset -c 5 stress-ng -c 1 --cpu-ops 2500
stress-ng: info:  [11264] defaulting to a 86400 second run per stressor
stress-ng: info:  [11264] dispatching hogs: 1 cpu
stress-ng: info:  [11264] cache allocate: default cache size: 35840K
stress-ng: info:  [11264] successful run completed in 33.48s

The host kernel didn't notice the IBRS bit is enabled. So, the situation
is the same as "echo 2 > /proc/sys/kernel/ibrs_enabled" in the host.
And running the stress-ng is a pure userspace CPU capability
calculation. So, the performance downgrades to about 1/3. Without the
IBRS enabled, it needs about 10s.

- 2). (1, 1) disables IBRS in host -> (0, 1) actually it becomes (0, 0).
The guest IBRS has been mistakenly disabled.

guest$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
11111111111111111111111111111111111111111111111111111111

host$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
11111111111111111111111111111111111111111111111111111111
host$ echo 0 | sudo tee /proc/sys/kernel/ibrs_enabled
host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
00000000000000000000000000000000000000000000000000000000

guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
00000000000000000000000000000000000000000000000000000000

Fixes: 4d8d3dbed275 ("UBUNTU: SAUCE: x86/bugs, KVM: Support the combination ...")
Fixes: f676aa34b402 ("x86/kvm: add MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD ...")
Signed-off-by: Gavin Guo <gavin.guo@canonical.com>
---
 arch/x86/kernel/cpu/bugs.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Juerg Haefliger Dec. 3, 2018, 1:18 p.m. UTC | #1
On Fri, 30 Nov 2018 17:44:37 +0800
Gavin Guo <gavin.guo@canonical.com> wrote:

> BugLink: https://launchpad.net/bugs/1764956
> 
> Ubuntu v4.4 kernel uses the in-house patches for IBRS. The backports
> still have some issues causing the IBRS status wrong when
> context-switching between the VM and host. For example, the IBRS would
> be mistakenly enabled in the host when the switching from an IBRS-enabled
> VM and that causes the performance overhead in the host. The other
> condition could also mistakenly disables the IBRS in VM when
> context-switching from the host. And this could be considered a CVE host.
> 
> The detail different situations analysis:
> 
> The reproducing environment:
> Guest kernel version: 4.4.0-138.164
> Host kernel version: 4.4.0-140.166
> 
> (host IBRS, guest IBRS)
> 
> - 1). (0, 1).
> The case can be reproduced by the following instructions:
> guest$ echo 1 | sudo tee /proc/sys/kernel/ibrs_enabled
> 1
> 
> <Several minutes later...>
> 
> host$ cat /proc/sys/kernel/ibrs_enabled
> 0
> host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
> 11111111111111000000000000000000010010100000000000000000
> 
> Some of the IBRS bit inside the SPEC_CTRL MSR are mistakenly
> enabled.
> 
> host$ taskset -c 5 stress-ng -c 1 --cpu-ops 2500
> stress-ng: info:  [11264] defaulting to a 86400 second run per stressor
> stress-ng: info:  [11264] dispatching hogs: 1 cpu
> stress-ng: info:  [11264] cache allocate: default cache size: 35840K
> stress-ng: info:  [11264] successful run completed in 33.48s
> 
> The host kernel didn't notice the IBRS bit is enabled. So, the situation
> is the same as "echo 2 > /proc/sys/kernel/ibrs_enabled" in the host.
> And running the stress-ng is a pure userspace CPU capability
> calculation. So, the performance downgrades to about 1/3. Without the
> IBRS enabled, it needs about 10s.
> 
> - 2). (1, 1) disables IBRS in host -> (0, 1) actually it becomes (0, 0).
> The guest IBRS has been mistakenly disabled.
> 
> guest$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
> guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
> 11111111111111111111111111111111111111111111111111111111
> 
> host$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
> host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
> 11111111111111111111111111111111111111111111111111111111
> host$ echo 0 | sudo tee /proc/sys/kernel/ibrs_enabled
> host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
> 00000000000000000000000000000000000000000000000000000000
> 
> guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
> 00000000000000000000000000000000000000000000000000000000
> 
> Fixes: 4d8d3dbed275 ("UBUNTU: SAUCE: x86/bugs, KVM: Support the combination ...")
> Fixes: f676aa34b402 ("x86/kvm: add MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD ...")
> Signed-off-by: Gavin Guo <gavin.guo@canonical.com>
> ---
>  arch/x86/kernel/cpu/bugs.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 60907abf12f5..e5f1ba148e3c 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -185,6 +185,13 @@ x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool setguest)
>  		guestval = hostval & ~x86_spec_ctrl_mask;
>  		guestval |= guest_spec_ctrl & x86_spec_ctrl_mask;
>  
> +		/*
> +		 * Check the host IBRS status to make IBRS regsiter update
> +		 * correctly.
> +		 */
> +		if (ibrs_enabled)
> +			hostval |= SPEC_CTRL_IBRS;

This means we're setting IBRS on the host even if ibrs_enabled == 1. I
don't think that's correct. With ibrs_enabled == 1 we could VMENTER with
IBRS cleared but with the above we will set it on VMEXIT which is
incorrect, no?

...Juerg


> +
>  		/* SSBD controlled in MSR_SPEC_CTRL */
>  		if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD))
>  			hostval |= ssbd_tif_to_spec_ctrl(ti->flags);
Gavin Guo Dec. 3, 2018, 3:59 p.m. UTC | #2
On Mon, Dec 3, 2018 at 9:18 PM Juerg Haefliger
<juerg.haefliger@canonical.com> wrote:
>
> On Fri, 30 Nov 2018 17:44:37 +0800
> Gavin Guo <gavin.guo@canonical.com> wrote:
>
> > BugLink: https://launchpad.net/bugs/1764956
> >
> > Ubuntu v4.4 kernel uses the in-house patches for IBRS. The backports
> > still have some issues causing the IBRS status wrong when
> > context-switching between the VM and host. For example, the IBRS would
> > be mistakenly enabled in the host when the switching from an IBRS-enabled
> > VM and that causes the performance overhead in the host. The other
> > condition could also mistakenly disables the IBRS in VM when
> > context-switching from the host. And this could be considered a CVE host.
> >
> > The detail different situations analysis:
> >
> > The reproducing environment:
> > Guest kernel version: 4.4.0-138.164
> > Host kernel version: 4.4.0-140.166
> >
> > (host IBRS, guest IBRS)
> >
> > - 1). (0, 1).
> > The case can be reproduced by the following instructions:
> > guest$ echo 1 | sudo tee /proc/sys/kernel/ibrs_enabled
> > 1
> >
> > <Several minutes later...>
> >
> > host$ cat /proc/sys/kernel/ibrs_enabled
> > 0
> > host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
> > 11111111111111000000000000000000010010100000000000000000
> >
> > Some of the IBRS bit inside the SPEC_CTRL MSR are mistakenly
> > enabled.
> >
> > host$ taskset -c 5 stress-ng -c 1 --cpu-ops 2500
> > stress-ng: info:  [11264] defaulting to a 86400 second run per stressor
> > stress-ng: info:  [11264] dispatching hogs: 1 cpu
> > stress-ng: info:  [11264] cache allocate: default cache size: 35840K
> > stress-ng: info:  [11264] successful run completed in 33.48s
> >
> > The host kernel didn't notice the IBRS bit is enabled. So, the situation
> > is the same as "echo 2 > /proc/sys/kernel/ibrs_enabled" in the host.
> > And running the stress-ng is a pure userspace CPU capability
> > calculation. So, the performance downgrades to about 1/3. Without the
> > IBRS enabled, it needs about 10s.
> >
> > - 2). (1, 1) disables IBRS in host -> (0, 1) actually it becomes (0, 0).
> > The guest IBRS has been mistakenly disabled.
> >
> > guest$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
> > guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
> > 11111111111111111111111111111111111111111111111111111111
> >
> > host$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
> > host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
> > 11111111111111111111111111111111111111111111111111111111
> > host$ echo 0 | sudo tee /proc/sys/kernel/ibrs_enabled
> > host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
> > 00000000000000000000000000000000000000000000000000000000
> >
> > guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
> > 00000000000000000000000000000000000000000000000000000000
> >
> > Fixes: 4d8d3dbed275 ("UBUNTU: SAUCE: x86/bugs, KVM: Support the combination ...")
> > Fixes: f676aa34b402 ("x86/kvm: add MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD ...")
> > Signed-off-by: Gavin Guo <gavin.guo@canonical.com>
> > ---
> >  arch/x86/kernel/cpu/bugs.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > index 60907abf12f5..e5f1ba148e3c 100644
> > --- a/arch/x86/kernel/cpu/bugs.c
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -185,6 +185,13 @@ x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool setguest)
> >               guestval = hostval & ~x86_spec_ctrl_mask;
> >               guestval |= guest_spec_ctrl & x86_spec_ctrl_mask;
> >
> > +             /*
> > +              * Check the host IBRS status to make IBRS regsiter update
> > +              * correctly.
> > +              */
> > +             if (ibrs_enabled)
> > +                     hostval |= SPEC_CTRL_IBRS;
>
> This means we're setting IBRS on the host even if ibrs_enabled == 1. I
> don't think that's correct. With ibrs_enabled == 1 we could VMENTER with
> IBRS cleared but with the above we will set it on VMEXIT which is
> incorrect, no?

Good point! The reason why it should be set to one is based on the
definition of the "ibrs_enabled == 1". When setting "ibrs_enabled ==
1", the IBRS is enabled _ONLY_ in the _KERNEL_ mode. In user space,
the IBRS is disabled.

It's possible from the two paths:

i). Boot time
start_kernel -> check_bugs -> spectre_v2_select_mitigation ->
set_ibrs_enabled(1)

ii). Sysctl path
ibrs_enabled_handler -> set_ibrs_enabled(1)

In set_ibrs_enabled(1), the IBRS register isn't set. It just assigns
one to the ibrs_enabled variable.

So, where is the point to enable the IBRS register in kernel mode?

When ibrs_enabled = 1, it won't jump over the __ASM_ENABLE_IBRS, so,
the IBRS is set(for 0/2, it will jump over the __ASM_ENABLE_IBRS to
"10:", then, the IBRS won't set in the ENABLE_IBRS).

.macro ENABLE_IBRS
        testl   $1, ibrs_enabled
        jz      10f
        __ASM_ENABLE_IBRS
        jmp 20f
10:
        lfence
20:
.endm

The ENABLE_IBRS macro is expanded in the point where user space
switches to kernel space, such as:

arch/x86/entry/entry_64.S
1). ENTRY(entry_SYSCALL_64)
2). common_interrupt
3). error_entry
4). ENTRY(nmi)


And DISABLE_IBRS will disable the IBRS register when switching back to
user space from the kernel space to keep the performance in user space.

.macro DISABLE_IBRS
        testl   $1, ibrs_enabled
        jz      9f
        __ASM_DISABLE_IBRS
9:
.endm

Finally, the reason is that before the VMENTRY, it's in the kernel mode,
so, the IBRS register status should be enabled by the above-mentioned
path from the user space to kernel space when "ibrs_enabled == 1". And
SPEC_CTRL IBRS bit should be one. Let assume that if the guest clear the IBRS
register, after the VMEXIT, we should set one back to the SPEC_CTRL
IBRS bit again to re-enable in the kernel mode. Or we can save the
IBRS bit status before the VMENTRY when "ibrs_enabled == 1." But I
think it's redundant. Or do you figure out if there are any
possibilities that in the kernel mode before VMENTRY, the SPEC_CTRL
IBRS bit could be zero?


>
> ...Juerg
>
>
> > +
> >               /* SSBD controlled in MSR_SPEC_CTRL */
> >               if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD))
> >                       hostval |= ssbd_tif_to_spec_ctrl(ti->flags);
>
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 60907abf12f5..e5f1ba148e3c 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -185,6 +185,13 @@  x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool setguest)
 		guestval = hostval & ~x86_spec_ctrl_mask;
 		guestval |= guest_spec_ctrl & x86_spec_ctrl_mask;
 
+		/*
+		 * Check the host IBRS status to make IBRS regsiter update
+		 * correctly.
+		 */
+		if (ibrs_enabled)
+			hostval |= SPEC_CTRL_IBRS;
+
 		/* SSBD controlled in MSR_SPEC_CTRL */
 		if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD))
 			hostval |= ssbd_tif_to_spec_ctrl(ti->flags);